From what I'm reading there's still a lot to be fleshed out to be maximally useful to me, but even in its current state I could imagine using this.
— I'd like to have a field property that limits stored values to a single version, and thus is automatically cleared whenever the row is updated. This would be useful for inlining change annotations, and for associating a user_id with specific changes.
— I'd like to be able to arbitrarily select the n-1 value of fields regardless of their time period. E.g.
select username, previous(username)
from users
— When viewing a specific version, I'd like to know whether a field's value was supplied in that revision. That's distinct from whether the field was changed: I want to know if the value was supplied, even if it was identical to the previous value.
— This might be possible already (it's hard to tell), but I'd like to be able to query/join on any revision. For example I might want to ask the question "show me all products that james has ever modified". That could then get more specific, e.g. "show me all products where james changed the price".
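For what it's worth, the last case can roughly be approximated with MariaDB's existing system-versioned tables, assuming the application records the editor in a column (the `modified_by` column below is hypothetical; system versioning itself does not record who made a change):

```sql
-- Every historical row version is visible with FOR SYSTEM_TIME ALL.
-- modified_by is an application-maintained column (an assumption here).
SELECT DISTINCT p.id, p.name
FROM products FOR SYSTEM_TIME ALL AS p
WHERE p.modified_by = 'james';

-- Narrowing to "james changed the price" means comparing each version
-- with its predecessor, e.g. by joining adjacent validity intervals
-- (MariaDB exposes them as the invisible ROW_START/ROW_END columns):
SELECT cur.id, old.price AS old_price, cur.price AS new_price
FROM products FOR SYSTEM_TIME ALL AS cur
JOIN products FOR SYSTEM_TIME ALL AS old
  ON old.id = cur.id
 AND old.row_end = cur.row_start
WHERE cur.modified_by = 'james'
  AND cur.price <> old.price;
```

The self-join is exactly the kind of boilerplate a built-in previous() would remove.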
It's a long and messy history (no pun intended), but essentially it was rarely practical to consider retaining database history for the first few decades of SQL, due to physical storage costs and limitations. Snodgrass and Jensen proposed initial bitemporal extensions in the 90s and a lot of research was done subsequently, but most vendors didn't make their move until the SQL:2011 standard was published (Oracle Flashback being the most notable exception). Unfortunately the rollout of the SQL:2011 temporal features has been underwhelming across the board, as each vendor ended up implementing something subtly different, which I think has massively hindered adoption. Since then I would guess that "immutability" has been the largest driving force behind the resurgence of interest.
That does make sense from a historical perspective and I don't doubt that's why. But still I find it unsatisfying because any competent database schema will always retain the history that needs to be retained. If you don't have the storage capacity, you choose to not store so much history. If you don't have native concepts for storing history, you kludge it yourself.
Whether you have native temporal support or have to kludge a DIY solution in the schema, the data you need to store gets stored.
My frustration is that I feel that temporal concepts should have been deeply native to SQL right to its core. History should have been as fundamental to database design as columns and rows. It should be a thing you turn off when you don't want it, not a thing you turn on when you do.
Essentially PostgreSQL has copy-on-write semantics: updates leave the old row version in place, so historical records exist until a vacuum marks them as no longer needed and subsequent inserts/updates overwrite the space.
Back when PostgreSQL used the POSTQUEL query language (before SQL support was added), there was special syntax to access data at specific points in time.
This is nicely outlined in "The Implementation of POSTGRES" by Michael Stonebraker, Lawrence A. Rowe and Michael Hirohama[1]. Go ahead and open the PDF and search for "time travel", or read the quotes below.
> The second benefit of a no-overwrite storage manager is the possibility of time travel. As noted earlier, a user can ask a historical query and POSTGRES will automatically return information from the record valid at the correct time.
Quoting the paper again:
> For example to find the salary of Sam at time T one would query:
retrieve (EMP.salary)
using EMP [T]
where EMP.name = "Sam"
> POSTGRES will automatically find the version of Sam's record valid at the correct time and get the appropriate salary.

I am using triggers and audit tables, which works, but my data requirements are relatively small so I won't face any challenges that way. However, re-using the old rows like this would be a far more efficient approach if it were supported natively.
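A minimal sketch of that trigger-and-audit-table setup, with illustrative table and column names:

```sql
-- Hypothetical base table and a shadow table receiving old row versions.
CREATE TABLE emp (
    id     INT PRIMARY KEY,
    name   VARCHAR(100),
    salary DECIMAL(10,2)
);

CREATE TABLE emp_audit (
    id         INT,
    name       VARCHAR(100),
    salary     DECIMAL(10,2),
    changed_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Copy the pre-update image of each row into the audit table.
CREATE TRIGGER emp_audit_upd
BEFORE UPDATE ON emp
FOR EACH ROW
INSERT INTO emp_audit (id, name, salary)
VALUES (OLD.id, OLD.name, OLD.salary);
```

A matching DELETE trigger and query-side joins against emp_audit are also needed; native system versioning makes all of that, plus the duplicated storage of full row images, unnecessary.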
We implemented the “minimum viable” features (i.e. automatic expiring, non-destructive updates, generated indexes and generated table declarations), but left some of the “harder” ideas up to the application designer (adding semantic versioning on top of temporal versioning, schema migrations).
It’s worked really well for us. I can’t think of anything we’ve done that’s had a higher ROI than this. I’ll really miss it when I leave!
Given this caveat, this seems unusable for production systems.
I'm guessing that "backups" would actually have to be live replicas set up from the start, and if the master fails, you convert a replica to master.
In addition, you could perform actual static backups by pausing a replica, backing up the actual table files themselves, then resuming the replica (and it will catch up). In case of total failure, you just dump the table files into a fresh install of MariaDB. (Copying database files is a common technique for migrating data, not just SQL command import/export.)
Is there any reason why these wouldn't work?
From the description it looks like it would be easy to do backups, it's just that mysqldump is not currently aware of temporal tables.
Just use

SELECT * FROM t FOR SYSTEM_TIME ALL;

and export it in an appropriate format.
For anyone wondering why temporality matters and how this is different from adding a "create_time" column to each row, I would highly recommend watching Rich Hickey's talk "The Value of Values".
It could be nice to see what magic goes on behind the scenes in some applications.
For example, when you do some clicks in the WordPress backend and wonder what it actually did to the data.
1) Is this always going to be performant with indices? "Time" seems like another index here, and when designing queries, which indices are used and in which order can be the difference between taking milliseconds and taking an hour. It's not obvious to me whether this will have hidden gotchas or query-execution complexities, or whether it's designed in a way so restricted and integrated into the indices themselves that query performance will always remain within the same order of magnitude.
2) What is the advantage of building this into the database, instead of adding your own timestamp columns, e.g. 'created_timestamp' and 'expunged_timestamp'? Not only does that seem relatively simple, but it gives you the flexibility of creating indices across multiple columns (including them) for the desired performance, the ability to work with tools like mysqldump, and a database that's conceptually simpler to understand. And if the question is data security, is there a real difference between a "security layer" built around the database and one built into it? It would be fairly simple to write a command-line tool that changes the MariaDB data files directly, no?
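For comparison, a rough side-by-side sketch of the two approaches (the DIY column names follow the comment above and are illustrative; the second statement uses MariaDB's actual system-versioning syntax):

```sql
-- DIY: application-maintained validity columns, indexable like any other.
-- The application must set and clear them correctly on every write.
CREATE TABLE orders_diy (
    id                 INT,
    total              DECIMAL(10,2),
    created_timestamp  TIMESTAMP(6) NOT NULL,
    expunged_timestamp TIMESTAMP(6) NULL,  -- NULL while the row is current
    INDEX (id, expunged_timestamp)
);

-- Native: the server maintains history and rewrites queries for you.
CREATE TABLE orders_native (
    id    INT PRIMARY KEY,
    total DECIMAL(10,2)
) WITH SYSTEM VERSIONING;

-- A plain SELECT sees only current rows; history is opt-in per query.
SELECT * FROM orders_native
FOR SYSTEM_TIME AS OF TIMESTAMP '2024-01-01 00:00:00';
```

The trade-off is roughly control (DIY lets you pick indexes and tooling) versus correctness (native versioning can't be bypassed by a buggy or malicious application write).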
Also, DDL migrations become a nightmare.
I haven't thought about this too deeply, but I think "simple" is overstating it. Being able to turn on versioning for any table by basically just pushing a button seems really powerful.
There's application-layer stuff like paper_trail for rails that can do this for you, but you're stuck if your language doesn't have a good one.
Building it into the db also means that any out-of-band direct edits to the DB also get tracked.
If it's present in every table, the database can be optimised for it.
This feature seems well fitted to support some of the cases where event sourcing is introduced. I wonder if anyone has successfully applied event sourcing on top of temporal tables to reduce the amount of work that has to be done in the application code (Akka, etc.).
It’s an excellent point to be aware of.
Now all they need is materialized views and they'll be close to postgres.