I mean, every time you write to the database, you open a transaction, do your stuff, and close it. It either works or it doesn't. If it doesn't, then it rolls back. Isn't that enough?
I guess I am saying that for complex integrity checks, it should be the responsibility of the application's data layer.
On the other hand, what is meant by "complex"? Is it "complex" because of numerous, ever-changing applications that touch the database, or is it just one application and a database with a lot of tables?
In the case of wild, numerous applications, I suppose that database-side integrity controls are needed. If you really have ONE application touching the database, it might be better to have the application police integrity. Perhaps there's more than one answer?
One way to resolve it with triggers is to have a stored procedure fetch all the records that intersect the new booking's interval, and procedurally count the number of concurrent bookings at each record's start and end times; the constraint would be violated if and only if the count at any point plus the pax of the new booking exceeds capacity.
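The procedural check described above can be sketched in plain Python (the booking/capacity model, field names, and sample numbers are all invented for illustration; a real trigger would do this in the database's procedural language over the actual bookings table):

```python
# Sketch of the trigger's logic: among existing bookings that intersect the
# new booking's interval, count concurrent pax at each interval boundary and
# reject the new booking if any point would exceed capacity.

def booking_allowed(existing, new_start, new_end, new_pax, capacity):
    """existing: list of (start, end, pax) tuples with comparable times."""
    # Only bookings that intersect the new interval can affect the check.
    overlapping = [(s, e, p) for (s, e, p) in existing
                   if s < new_end and e > new_start]
    # Concurrency can only peak at a start or end time of some booking.
    points = {s for (s, _, _) in overlapping} | {e for (_, e, _) in overlapping}
    points |= {new_start}
    for t in points:
        if not (new_start <= t < new_end):
            continue  # only points inside the new booking's interval matter
        concurrent = sum(p for (s, e, p) in overlapping if s <= t < e)
        if concurrent + new_pax > capacity:
            return False  # constraint would be violated at time t
    return True

# Two overlapping 3-pax bookings already occupy 6 of capacity 8 at 10:00,
# so another 3-pax booking covering 10:00-11:00 must be rejected.
existing = [(10, 12, 3), (9, 11, 3)]
print(booking_allowed(existing, 10, 11, 3, 8))  # False
print(booking_allowed(existing, 12, 13, 3, 8))  # True
```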
I am not sure that I would call these "tricks"
There are two major problems with handling these on the application side:
1. You can no longer guarantee consistency.
2. When a check has to be done procedurally, your application can no longer ask the database a single yes/no query to determine whether a constraint is violated; it has to pull intermediate results over the wire, which is a performance penalty compared to doing the whole check inside the database server.
edit: of course, when you need to ensure consistency with some data source outside the database, there's no choice but to do it application-side.
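The yes/no query pattern from point 2 can be illustrated with a toy sqlite3 example (the `seats` table and function names are made up): the "is the constraint violated?" question is answered inside the database, so only a single boolean crosses the wire rather than a set of intermediate rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE seats (room TEXT, seat INTEGER)")
conn.executemany("INSERT INTO seats VALUES (?, ?)",
                 [("101", 1), ("101", 2), ("102", 1)])

def seat_taken(conn, room, seat):
    # EXISTS collapses the check to a single 0/1 value; no rows are shipped
    # to the application for it to loop over.
    row = conn.execute(
        "SELECT EXISTS(SELECT 1 FROM seats WHERE room = ? AND seat = ?)",
        (room, seat)).fetchone()
    return bool(row[0])

print(seat_taken(conn, "101", 2))  # True
print(seat_taken(conn, "101", 3))  # False
```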
shudder
Triggers are almost always the wrong level at which to handle consistency. Having to handle consistency problems after the fact flies in the face of 'fail fast', and if you need 'last line' protection, why not bake it into the structure instead?
If you put the validation in the database, any application can leverage it. If you want to fail fast, you can still do validation in the application layer while relying on the DB layer as the ultimate validator.
- You check for consistency issues on the client-side UI level, and give the user instant feedback when something is not right.
- You also create a db trigger to enforce the rule in the database itself. This way you, as a developer, can write your code with the assumption that the data is gonna be consistent when it comes from the database, even if something went wrong on the client UI level at some point.
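A minimal sqlite3 sketch of the second bullet (the table and rule are invented for illustration; real systems would use their database's trigger syntax): the UI can pre-validate all it wants, but the trigger makes the database itself the backstop.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (id INTEGER PRIMARY KEY, quantity INTEGER NOT NULL);

-- The rule lives in the database, so no code path can bypass it.
CREATE TRIGGER orders_quantity_positive
BEFORE INSERT ON orders
WHEN NEW.quantity <= 0
BEGIN
    SELECT RAISE(ABORT, 'quantity must be positive');
END;
""")

conn.execute("INSERT INTO orders (quantity) VALUES (5)")      # accepted
try:
    conn.execute("INSERT INTO orders (quantity) VALUES (0)")  # trigger fires
except sqlite3.IntegrityError as e:
    print(e)  # quantity must be positive
```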
With more complex checks it might be necessary to build a check mechanism that can be run periodically. Running it at startup might be a good approach, or some sort of cron job. I can imagine that in larger systems there might even be separate services to check data integrity across other services.
Sometimes I add the same or similar constraints in the application layer because it leads to better error/validation messages.
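One way this duplication can look in practice, sketched with sqlite3 (the `users` table and messages are made up): the CHECK constraint is the backstop, while the application repeats the rule to produce a friendlier message than a raw SQL error.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, age INTEGER CHECK (age >= 0))")

def add_user(conn, name, age):
    if age < 0:                      # application-layer copy of the rule
        return "Age cannot be negative - please check your input."
    try:
        conn.execute("INSERT INTO users VALUES (?, ?)", (name, age))
        return "OK"
    except sqlite3.IntegrityError:   # database backstop, terse error
        return "Database rejected the record."

print(add_user(conn, "Ada", 36))   # OK
print(add_user(conn, "Bob", -1))   # friendly validation message
```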
IMO having data integrity enforced in the database instead of application code is similar to having a type system enforcing code integrity at compile time instead of at runtime. It's about catching errors as early and as close to the root as possible.
There's also a neat little trick with indexed views and cross join[1] that you can use for more complex constraints. I've never used it for anything more than reference tables, so I can't comment on its ability to scale.
[1] https://spaghettidba.com/2011/08/03/enforcing-complex-constr...
Reference tables for things that don't change or change very seldom are a valuable tool if data across other tables must agree. Check constraints otherwise.
But that's just the bones -- the last line of defense. Having a framework that can push such constraints through the application is very helpful so you are not waiting for SQL errors to enforce integrity.
It is easy to go overboard, though, and it is worth thinking about how important data integrity and agreement actually is before writing it in stone, so to speak.
In fact, the foreign side of the FK needs to be covered by a unique constraint (including a PK, which is simply a UNIQUE constraint with slight differences).
I've also heard them called 'Verification Tables'. They are basically check constraints, but stored in a table for reference. If you are using a check constraint in more than one place, and it goes beyond a simple 'Y'/'N', it's probably worth thinking about using a reference table instead.
These tables will generally not be changed very often and can be considered 'read only' for the most part. At their most pure, an application can count on certain values to exist in a reference table and might even verify this at startup or populate them with application properties. They could be application level properties if not for the referential integrity needs.
It is often useful, however, to have some reference tables that are only 'mostly read only'. Changeable but largely static.
As a trivial example, a table like state_abbreviations. It is a finite list of data that is unlikely to change.
Data that comes from the government is a pretty good candidate for this kind of thing. It changes, but not often, and you don't want to give users the option to put whatever they want. e.g. Federal School Codes.
Other good candidates are structural aspects of your application. Say, a list of modules that users can navigate to in order to perform a certain job function. Perhaps you want to give them the ability to compose security roles around those and then apply those roles to users. Also, say you want to use the same names for menus or reference them elsewhere in a consistent way: a reference table might be the way to go.
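A minimal sketch of the pattern with sqlite3 (schema invented; the state_abbreviations example is from the comment above): the codes live in their own table, other tables point at it with a foreign key, and the application can verify the values it counts on at startup.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # sqlite needs this per connection
conn.executescript("""
CREATE TABLE state_abbreviations (
    code TEXT PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE addresses (
    id    INTEGER PRIMARY KEY,
    city  TEXT,
    state TEXT NOT NULL REFERENCES state_abbreviations(code)
);
INSERT INTO state_abbreviations VALUES ('OH', 'Ohio'), ('TX', 'Texas');
""")

# Startup check: the values the application counts on must exist.
codes = {row[0] for row in conn.execute("SELECT code FROM state_abbreviations")}
assert {"OH", "TX"} <= codes

conn.execute("INSERT INTO addresses (city, state) VALUES ('Austin', 'TX')")
try:
    conn.execute("INSERT INTO addresses (city, state) VALUES ('Nowhere', 'ZZ')")
except sqlite3.IntegrityError:
    print("rejected: 'ZZ' is not in the reference table")
```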
So basically a normalised database
In Microsoft SQL Server, you can often achieve the desired result with check constraints, primary and foreign keys, indexed views, and unique indexes. When more logic is required, I usually restrict write access from the application at the table or column level and force modifications to occur through a stored procedure that enforces integrity.
Unfortunately, unlike in compiled, typed languages, database foreign keys are just straight-up runtime overhead.
Using types correctly, such as storing numbers, dates, and UUIDs in the actual database type rather than as strings, is an obvious one that typically improves space and performance; there is normally not much trade-off here. VARCHAR with a specific length I go back and forth on: it has saved me now and then by catching something not properly validated for length in the app, but it does get in the way when the length needs to expand, which happens quite often.
For coded values I like a code table per code type with foreign keys rather than, say, EAV; this has saved me many times in data quality and allows extra attributes to be added to specific types. It matches more closely to the type system in the client too, which will have a specific type for a specific coded value. It also allows reflection-like qualities where you can query the schema and follow the types. The downside is that in a large app you can have many hundreds or more of code tables, and it's more work to cross-cut features, but the benefits outweigh the downsides for me. This also goes for more full-blown "entities", which are more obvious.
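A hedged illustration of "a code table per code type" (schema and names invented), sketched with sqlite3: each code type gets its own table, so type-specific attributes can be added to one without polluting the others, unlike a single generic EAV/codes table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
-- One table per code type, not one generic 'codes' table.
CREATE TABLE order_status (
    code        TEXT PRIMARY KEY,
    description TEXT NOT NULL,
    is_terminal INTEGER NOT NULL DEFAULT 0  -- attribute specific to this type
);
CREATE TABLE shipping_method (
    code        TEXT PRIMARY KEY,
    description TEXT NOT NULL
);
CREATE TABLE orders (
    id       INTEGER PRIMARY KEY,
    status   TEXT NOT NULL REFERENCES order_status(code),
    shipping TEXT NOT NULL REFERENCES shipping_method(code)
);
INSERT INTO order_status VALUES ('OPEN', 'Open', 0), ('DONE', 'Completed', 1);
INSERT INTO shipping_method VALUES ('STD', 'Standard'), ('EXP', 'Express');
""")

conn.execute("INSERT INTO orders (status, shipping) VALUES ('OPEN', 'EXP')")
# A typo'd code is rejected by the foreign key, not silently stored:
try:
    conn.execute("INSERT INTO orders (status, shipping) VALUES ('OEPN', 'STD')")
except sqlite3.IntegrityError:
    print("rejected: unknown status code")
```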
I don't normally use more complicated constraints or triggers; they tend to get in the way of things like ETL loads, and many complex rules end up working out much better in the app code. I am torn about it, but the procedural languages in the DB are usually less expressive, offer less reuse, and can't do the other things needed, such as alerting and messaging.
Bottom line, though: don't be dogmatic. Use the database as you can, test performance, and make the trade-offs. Sometimes constraints and triggers are better due to their locality to the data; sometimes leaving off foreign keys is better due to the overhead. Sometimes EAV is better for flexibility (or JSON).
In an ideal world we would run the same code in any tier (client, app server, database) for this stuff, choosing the best location to run it based on the needs (or run it in multiple tiers: client for user experience, database for transactional correctness). You can almost get there with JavaScript and PL/V8 nowadays, and sort of with .NET and SQL Server, but it's really not there yet.
- I use Postgres. This is important. Postgres has long been one of the most stable databases. It has multiversion concurrency control and even lets you make changes to tables within transactions, which makes such changes easier. It has many features, like check constraints, triggers, procedural languages, and JSON, which make it easy to move more stuff into your database.
- I have shell access to Linux virtual machines. I don't have layers of frameworks. I use the psql command-line tool and can log in as the superuser, postgres, when I need to. So my use of the database is not inhibited in any way.
- I write internal business apps. They can have complex business rules, but it is not web scale --- though I would be comfortable serving hundreds of requests per second for tens of thousands of users with my set-up, which is just a single virtual machine. I'm being conservative, and it could probably handle 100 times that.
I began 14 years ago as most web programmers do: keeping the database as a dumb data store and writing everything in the middle layer. But I have found maintenance easier, and the overall codebase much, much smaller, by keeping my database in the know as much as possible.
- Are the values in a column supposed to be from a limited set of choices? Tell it! Put those choices in a table, and link the column to that table with a foreign key.
- Is a number supposed to be within a certain range? Is the date in a column supposed to be after a date in another column? Tell it! Use CHECK constraints.
- Is the data supposed to be sorted or summed or transformed before being printed on the page? Tell it! Put the gob of SQL into a view, so that all your middle layer does is "select * from view".
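The three "tell the database" bullets above can be sketched in one go with sqlite3 (the tickets schema and names are invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
-- 1. Limited set of choices: put them in a table, link with a foreign key.
CREATE TABLE ticket_status (status TEXT PRIMARY KEY);
INSERT INTO ticket_status VALUES ('open'), ('closed');

-- 2. Ranges and cross-column rules: CHECK constraints.
CREATE TABLE tickets (
    id       INTEGER PRIMARY KEY,
    status   TEXT NOT NULL REFERENCES ticket_status(status),
    priority INTEGER NOT NULL CHECK (priority BETWEEN 1 AND 5),
    opened   TEXT NOT NULL,
    closed   TEXT,
    CHECK (closed IS NULL OR closed >= opened)
);

-- 3. Sorting/summarising lives in a view; the middle layer just selects it.
CREATE VIEW open_tickets AS
    SELECT id, priority FROM tickets
    WHERE status = 'open'
    ORDER BY priority DESC;
""")

conn.execute("INSERT INTO tickets (status, priority, opened) "
             "VALUES ('open', 3, '2024-01-01')")
print(conn.execute("SELECT * FROM open_tickets").fetchall())  # [(1, 3)]
```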
Some might suggest that you take a performance hit by not using stored procs etc, but in reality it's never been a problem for my apps (the largest one has over 150k active users - https://usebx.com).
If the application is the only way to access the data, it would appear consistent until it wasn't. Assuming two application processes, process one would not immediately see the mistake it made, but process two would see it on startup. So, an audit table might be a good idea, to see which process (or user) made which change, including the original erroneous entry and any change needed to remedy the failed constraint. If the first process is still active, the second process could warn it, but that leaves a lot to chance.

In comparison, once in place, relational database constraints are always "on" and enforced globally. Enforcement on a per-transaction basis might impose a consistent slowdown, but still be preferable to a long startup time spent checking the integrity of a large dataset with potentially many errors.

To keep startup times from growing under boot-time integrity checking, you may have to partition the data into more recently and less recently accessed data. In that case, you may find inconsistencies in reporting where the application only touched an earlier set of data once and didn't enforce integrity checking. On the other hand, you could be presented with an opportunity to remedy the data before you receive the report. If the permitted entries change over time, you would need some kind of date ranges on those values, so that the person receiving the report doesn't enter historically inaccurate data while fixing their own report.
I've opted to build an "API" within the database consisting of views for read-access and stored procedures that can validate relationships upon insert/update. It's pretty clean so far as it allows the data needed for integrity checks to remain in the database.
This hasn't made it to production yet, so take it with a grain of salt :)
Columns should be NOT NULL WITH DEFAULT; nulls introduce unnecessary additional WHERE-clause checks.
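The extra WHERE-clause check in question, shown with a throwaway sqlite3 table (names invented): a nullable column silently drops rows from a simple predicate, forcing an `OR col IS NULL` that a NOT NULL column with a default would make unnecessary.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE flags (name TEXT, archived TEXT)")  # archived is nullable
conn.executemany("INSERT INTO flags VALUES (?, ?)",
                 [("a", "N"), ("b", "Y"), ("c", None)])

# Intent: "everything not archived". The NULL row vanishes, because
# NULL <> 'Y' evaluates to unknown, not true:
print(conn.execute(
    "SELECT name FROM flags WHERE archived <> 'Y'").fetchall())
# [('a',)]  -- 'c' is missing

# The extra check that a NOT NULL DEFAULT 'N' column would avoid:
print(conn.execute(
    "SELECT name FROM flags "
    "WHERE archived <> 'Y' OR archived IS NULL").fetchall())
# [('a',), ('c',)]
```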
SQL requires defining integrity in advance, and if your database gets corrupted, there is little hope of recovery. Cleaning up your data is nearly impossible because your schemas and logic are tightly coupled.
Writing garbage collection scripts is easy if you make a practice of it, and makes you think about your schema in the ways that are important and that the ease of triggers hides from you. You can monitor and metricize it in ways that you can't with triggers.
These problems probably exist, but they’re very rare. Much more rare than people THINKING they have a problem like this, hacking around it with poorly implemented application code, and ending up with data that has integrity issues, when it could have been solved well using the DB properly.
There’s no “room” there for foreign key constraints nor pervasive use of transactions. We used limited transactions in payments area only (and eventually even retired the last of those) and no foreign keys.
Reasonably careful coding, doing a lot of work in stored procedures, and “fix it if it breaks” was the order of the day. I won’t claim it was academically pure, but it sure did work based on the scoreboard (cash register).
The main reason is change. As I evolve the code base, I might want to do something today which is principled, but seemed unprincipled yesterday. Database constraints have tended to hit me the hardest at the worst times too, e.g. being unable to make an emergency fix in prod. Or prod can't come up because the flyway script (which succeeded in test/stage) got snagged on a constraint when it ran on prod data.
Could you not then simply use a NoSQL solution?
I keep all of my logic in the application and only one “service” can write to related tables.
I also think that you should always write software as domain specific “microservices”. Not necessarily out of process, separately released microservices, but the Domain Driven Design concept of “services” that can just as easily be part of a monolithic project with different modules or if necessary shared between teams via versioned modules.
It is so much easier to deploy, rollback, and version code than databases.
Complex constraints sound like an app/domain responsibility: much easier to test, version, troubleshoot, and evolve.
When a solution cannot be tested quickly and easily, it impacts overall agility (small iterations become small waterfalls). Having complex constraints in the database could mean slower release cycles, heavier processes (schema-change reviews, etc.), and more migrations.