I had a discussion at work today where we were adding some fields to a model, and we were talking about whether it should be split into a separate data model. This made me wonder what type of guidance there was out there in the universe.
Turns out, there’s not much. I searched around for any posts about it, and couldn’t find any. There’s lots of info about how to model data upfront, but not a lot of advice about the ongoing maintenance of a data model. So I figured I’d write my own.
I’m using “data model” to describe a single class that gets data from a matching database table, and provides some small and common amounts of processing/filtering/transformation logic for that data. You could also refactor your usage of models to separate the concerns, but many projects don’t.
There are three things to consider when deciding if you should take one data model and split it into two, but I think the ideas can be applied to other designs.
You want your data model to be simple and easy to understand. One model should be equivalent to one concept.
The issue is when there’s another concept that’s similar, but not the same. For example, your Store table requires a mailing address, but what about an online store? Do you add a type column, and then validate that the address, or URL is present, depending on the type? Or, does it need to be an OnlineStore vs a PhysicalStore?
You end up with a table where you have 20 columns, only some of which are required under certain circumstances, but not others.
I think that validations with lots of conditionals are a warning sign that the table might be modeling more than one thing.
Many model classes contain some amount of presentation logic. By “presentation logic,” I mean that it filters the data so that only certain attributes are returned, or that it does some type of transformation to make the data ready to use.
If you notice that you end up with substantial amounts of this logic for presenting data, you should consider if the data model can be improved. Is there some reason that you might be devoting a lot of code to filtering data out of a single table? Would it be better if it was split into a separate table?
I think the lifecycle of an object matters. An extreme example, for explanation purposes, would be a table that stores a data for a TemporaryMessage and a LongLivedMessage. These two types of data are managed differently, their access is probably controlled differently, and they are purged out of the system according to different business logic.
I think this is especially nefarious because if one data model has different lifecycles, it means that every time an engineer is working with those models in new code, they need to remember that there are different types, and they need special treatment for each type. This can be avoided if you have different classes for different things.