Why is using a common-lookup table to restrict the status of entity wrong?
According to Five Simple Database Design Errors You Should Avoid by Anith Sen, using a common-lookup table to store the possible statuses for an entity is a common mistake.
Edit + Answer: The figures in Anith's article aren't well labelled - I thought both Figure 1 and Figure 2 are examples of bad design, whereas Figure 2 is good design. Phew, got worried there for a moment.
- Lookup tables: good.
- Common-lookup tables: bad.
I'll keep my question below for reference.
The following reasons are given:
"You lose the means to ensure accurate data; constraints. By combining different entities into a single table, you have no declarative means to restrain values of a certain category." How is constraining values losing accuracy?
"You are forced to represent every data type as a string with this type of generic lookup table." If I want to represent another data type, I can add a column for it to my lookup table.
"You commit yourself to rigidity and subsequent complexity." How?
Fourthly and finally, you are faced with the physical implementation issues. I don't see why.
I disagree with most of the reasons given and would like some objective critique on my incorrect? logic.
Citing the example of jobs at a repair service with many possible statuses that generally have a natural flow, let's take a JobStatus table:
- Booked In
- Assigned to Technician
- Diagnosing problem
- Waiting for Client Confirmation
- Repaired & Ready for Pickup
- Repaired & Couriered
- Irreparable & Ready for Pickup
- Quote Rejected
Arguably, some of these statuses can be normalised to tables like Couriered Items, Completed Jobs and Quotes (with Pending/Accepted/Rejected statuses), but that feels like unnecessary schema complication.
Another common example would be an OrderStatus table to restrict the status of an order:
The status titles and descriptions are in one place for editing and are easy to scaffold as a drop-down with a foreign key for dynamic data applications. This has worked well for me in the past. If the business rules dictate the creation of a new order status, I can just add it to OrderStatus table, without rebuilding my code.
Why is this a bad practice?
Edits: I added Anith's reason to my question and tried to remain objective.
What Anith Sen is advising against is having a single lookup table for all lookup codes. That is the significance of the category column in his example. Having a separate table for each category is definitely the way to go.
This is because:
- we can use the look-up tables to restrict values through enforce foreign keys
- it makes it easier for the database to optimize queries which join data tables with look-up tables
- it scales better: one large lookup category can really skew performance
In your examples JobStatus and OrderStatus are separate categories. applicable to separate entities. That is why they need different look-up tables. There is not even a problem with sharing the same code table across several different data tables. Where it becomes problematic is when we have separate data tables (entities) for which some statuses are not appropriate: that is the time to split the codes out into separate look-up tables.
I see you have edited your post to cite all of Anith's points. I think the most important point one is the first one, regarding constraints. If you want to restrict the ORDERS.STATUS column to have values from the OrderStatus category then you have to have a separate table to enforce a foreign key. Your alternatives are:
- include a CodeCategory column on the ORDERS table and enforce a compound foreign key against the common CODES table, which now needs a unique key of (Category,Code).
- duplicate the OrderStatus values in a check constraint
- don't enforce the values in the database, and rely on the application's drop-down list to restrict the values.
All of those options suck, from the database perspective.
You already have a correct answer, so this remark is extra.
The big problem with OTLT (One true lookup table) is that you end up placing values from different domains in the same column, and then using a separate column to disambiguate.
In your examples, you have used numbers alongside each status description. If those numbers are numeric codes, as I think they must be, then you do not want the value 4, meaning "waiting for client confirmation" in the same column as the value 4, meaning "Cancelled". If you do this, then you can't use this column as the PK for your one true lookup table.
If you give your one true lookup table another column, call it "CodeType" and use "CodeType" and "Code" as a compound PK, you have introduced more complexity than you introduce by having a separate lookup table for each code type.
The short answer is this: don't put values from different domains in the same column. It always causes more trouble than it saves.
Incidentally, it's possible to create a view that combines all the separate lookup tables into what appears to be a single giant lookup table. This can be useful in certain very unusual situations.
You should have a a separate lookup table for each lookup you need, not just one. When you have one, it becomes a bottleneck in the system as most queries may need to join to the table or query just that table. Plus frankly it makes the system harder to maintain and understand when you come in new to the company.
Additionally, you have destroyed any data integrity from have a PK/FK relationship. If you have separate tables the FK means you can't enter any values not in the lookup table. If you use One big table, you can enter values that are not appropriate to the situation. I remember one database I worked in where the value for the person_type was "Yes" for some records.
I don't think your Order Status example really fits what is discussed in the article - an OrderStatus table seems to be how the author thinks you should do it.
There are numerous reasons that the "lookup table to end all lookup tables" is a bad approach, among them the fact that you can't define relationships, and joining becomes more difficult than it should be.
The issue is with creating a central table that has many different codes comingled. You either a) have to include extra columns so you ensure that orders only reference order statuses within that table, or b) you end up with a Job being Shipped and an Order being "Irreparable & Ready for Pickup"
Now you've added Anith's reasons and your rejection reasons. Again, what Anith is talking about is having one central table for all lookup values within your database. So let's look at number 2:
"You are forced to represent every data type as a string with this type of generic lookup table."
If I want to represent another data type, I can add a column for it to my lookup table.
But with one central table for all lookups, you're now adding a column that should only be populated for one particular lookup operation. But are you going to add the CHECK constraints that enforce this (effectively creating a NULL, but NOT NULL if Category='X' constraint on the column). This can turn into a maintenance nightmare.