I personally would argue with every single point this article makes, except scalability.
Maybe you could be more explicit about what you don't like about their ideas? I personally do like a lot of their ideas, such as the following:
> In the future, I’d expect to see a tighter coupling between the frameworks we’re using for reactive frontends – React, Vue, etc. – and the database, via hooks or otherwise.
This builds on the behavior that made MongoDB so phenomenally popular, as the article points out. Data management is pervasive in modern applications and anything that makes it easier for devs to implement is goodness.
I know posts with ThOuGhT LeaDeRshIp titles like this are usually annoying, but I thought it would be interesting to write down some of the lessons I've been gathering as I've spent more time covering and using specific databases. My background is in data science / analytics with a couple of years of more traditional full stack here and there.
Broadly we've seen this pattern with infrastructure in general – it's a lot easier to set up a server than it used to be, all things considered. Now obviously if you're a tiny startup, you're more comfortable outsourcing everything to Heroku, and if you're a hyperscale enterprise, you probably want more control on exactly what your database is doing.
The thesis here is that on the tail end (hyper scale), things are getting more under control and predictable, and developers there want the same "nice things" you get with platforms like Heroku. Elsewhere in the ecosystem, more and more parts of the stack are getting turned into "simple APIs" (Stripe for payments, Twilio for comms, etc.). And perhaps most interestingly, as serverless for compute seems kind of stuck (maybe?), it may be the case that serverless for databases – whatever that ends up meaning – is actually an easier paradigm for application developers to work with.
It's absolutely true that a low friction developer experience is necessary for a database product to be successful. But this in no way implies that database internals are being commoditized or relegated to minor importance.
Snowflake is a particularly bad example: its clean-sheet, novel approach to internals is the very fulcrum that creates the easy developer experience.
Admittedly it's been a while since I looked at Vitess, but my recollection is that its cross-shard functionality is so limited that claiming internals no longer matter seems a bit dubious.
The reason there's only a handful of spanner style systems is exactly because the internals both matter and are quite daunting to get right.
Partitioning:
1) DynamoDB: partitioning is explicit and one of the most important parts of schema design.
2) Spanner, Cockroach: the database automatically partitions the key ranges.
3) Postgres: you will probably never reach the scale where you need to partition your dataset!
Transactions:
1) Spanner, Firestore: no stored procedures; client-side transactions are important.
2) DynamoDB: no stored procedures, no client-side transactions; only transactions where all items involved are known by primary key in advance.
3) Fauna, Supabase: stored procedures are the way to go! You do not need application code; access your database from the client!
4) Postgres: we have everything, use what fits your particular use case!
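For illustration, here is what the "client-side transaction" pattern looks like with Python's stdlib sqlite3 -- a sketch of the general pattern only, not Spanner's or Firestore's actual API, and the accounts schema is invented:

```python
import sqlite3

# Hypothetical two-account transfer using Python's stdlib sqlite3.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 100), (2, 0)])

# The `with` block opens a transaction in application code: it commits if
# the body succeeds and rolls back if it raises, so both updates apply
# atomically -- no stored procedure involved.
with conn:
    conn.execute("UPDATE accounts SET balance = balance - 10 WHERE id = 1")
    conn.execute("UPDATE accounts SET balance = balance + 10 WHERE id = 2")

balances = dict(conn.execute("SELECT id, balance FROM accounts"))
print(balances)  # {1: 90, 2: 10}
```

The stored-procedure camp would push exactly this logic down into the database instead; the trade-off is where your application logic lives.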
If database internals did not matter, why are they all doing something different and are sometimes quite opinionated about it?
Everything else pales in comparison.
Create a database?
sqlite3 mydata.db
Where is the database? In the current directory
How is it structured on disk? It's a single file
How do I backup the DB? cp mydata.db /my/backups/mydata.db
Do I have to install a server? No
Do I have to configure anything? No
During setup and deployment I usually dabble a while with the whole GRANT *.* ON localhost IDENTIFIED BY PASSWORD or something. How do I do that with SQLite? It just works
Do I have to close / protect any specific ports? No, it's just a file
Which field types should I use for ... ? None. It just works.

You could be on a single-node, 4-core, $5/month VPS instance and still run into this issue. SQLite requires exclusive access to the database to handle writes (meaning that while writing, no other connection can be reading it), especially if your transactions start to become complex.
In contrast, MySQL and PostgreSQL allow for simultaneous reads while writes are occurring.
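SQLite's WAL journal mode narrows that gap: readers and a writer can proceed concurrently, with readers seeing a consistent snapshot. A minimal sketch, assuming a throwaway temp-file database and an invented one-column schema:

```python
import os
import sqlite3
import tempfile

# Sketch: enable WAL mode so readers don't block the writer and vice versa.
path = os.path.join(tempfile.mkdtemp(), "mydata.db")
conn = sqlite3.connect(path)
mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
print(mode)  # "wal"

conn.execute("CREATE TABLE t (x INTEGER)")
conn.execute("INSERT INTO t VALUES (1)")
conn.commit()

# A second connection can read while the first holds the write lock.
reader = sqlite3.connect(path)
conn.execute("BEGIN IMMEDIATE")           # writer takes the write lock
conn.execute("INSERT INTO t VALUES (2)")  # uncommitted write in flight
rows = reader.execute("SELECT COUNT(*) FROM t").fetchone()[0]
print(rows)  # 1 -- the reader sees the pre-write snapshot, not a block
conn.commit()
```

In the default rollback-journal mode, that `SELECT` would instead contend with the writer's lock.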
There are a lot of fun little tricks you learn along the way.
The most important one is to ignore all the naysayers who claim it won't scale. I can easily saturate any IO subsystem by way of SQLite.
At first you may think it's impossible, but then you start to go outside the lines a bit. You start by turning on WAL. That helps a lot... But then you want another 100x? One thing you can do is heavily abuse the fact that any SQLite db is "just a file" by opening as many as you need to in parallel. 9/10 times you can break down a typical business app into grains roughly approximating the current state of each user.
Might make regulatory compliance easier if each user is stored in their own database too. Makes removing someone's entire presence from your system 1 file deletion activity.
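The one-file-per-user layout described above can be sketched in a few lines -- directory layout, naming scheme, and schema here are all invented for illustration:

```python
import os
import sqlite3
import tempfile

# Sketch: each user gets their own SQLite file ("grain").
root = tempfile.mkdtemp()

def user_db(user_id: str) -> sqlite3.Connection:
    """Open (creating if needed) the database file for one user."""
    conn = sqlite3.connect(os.path.join(root, f"{user_id}.db"))
    conn.execute("CREATE TABLE IF NOT EXISTS events (ts REAL, payload TEXT)")
    return conn

for uid in ("alice", "bob"):
    conn = user_db(uid)
    with conn:  # commits on success
        conn.execute("INSERT INTO events VALUES (0.0, 'signup')")
    conn.close()

# "Removing someone's entire presence" really is one file deletion:
os.remove(os.path.join(root, "alice.db"))
remaining = sorted(os.listdir(root))
print(remaining)  # ['bob.db']
```

Writes for different users never contend with each other, since each user's writes lock only that user's file.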
https://manual.calibre-ebook.com/faq.html#i-am-getting-error...
I guess that's a bad -dev- user experience?
The same goes for a local filesystem. SQLite requires the filesystem to have certain features; that is independent of how the filesystem stores the data physically.
Also, it supports more stuff (like stored procedures!) that I miss a lot in SQLite.
My only complaint with Firebird is that it's not easy to embed into iOS/Android, or it would be my main DB.
Not when you have threads.
If there are specific differences that make you choose some other DB, let us know.
Hierarchical databases were much like a filesystem, but for records instead of files.
Navigational databases allowed data to be linked in a network. Look up CODASYL for the details.
The relational database design was first proposed in the 1970s.
Wait, is that a feature or a bug?
So funny to me that NoSQL boosters have only recently understood that designing sane schemas and knowing the order in which your data is inserted are important for data integrity. It's like an entire generation of highly paid software devs never learned fundamental computer science principles.
That being said: going back to 1970 to read the original "A Relational Model of Data for Large Shared Data Banks" by Codd (the paper which started the relational-database + normalization model) is incredibly useful.
But yeah, all of this debate about "how data should be modeled" was the same in 1970 as it is today.
-----
SQL doesn't quite fit 100% into the relational model, but it's certainly inspired by Codd's relational model and designed to work with the principles from that paper.
And strangely enough, legions of authors and teachers and courses do a worse job at explaining relational databases than Codd's original 11 page paper.
SQL is a lot easier once you start viewing it as relational algebra: a declarative expression of intent that can be manipulated and re-expressed, like other purely mathematical statements. Then, when performance tuning becomes the watchword, you take that flexible expression and slice and dice it according to what the DBMS you're working with requires, to align it with performance. Think of your queries as complex summoning spells that draw in the necessary resources in particular patterns and then impose an expression form on that blob of data, and you'll skate through all things SQL.
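A concrete instance of "re-expressing the same intent": the two queries below state the same semi-join two different ways and must return identical rows, so you are free to pick whichever form your DBMS executes best. Schema and data are made up, run here against stdlib sqlite3:

```python
import sqlite3

# Toy schema: customers and their orders.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER);
    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace'), (3, 'Edsger');
    INSERT INTO orders VALUES (10, 1), (11, 1), (12, 3);
""")

# Form 1: "customers with at least one order" as IN (subquery).
q1 = conn.execute("""
    SELECT name FROM customers
    WHERE id IN (SELECT customer_id FROM orders)
    ORDER BY name
""").fetchall()

# Form 2: the same relational intent as a correlated EXISTS.
q2 = conn.execute("""
    SELECT name FROM customers c
    WHERE EXISTS (SELECT 1 FROM orders o WHERE o.customer_id = c.id)
    ORDER BY name
""").fetchall()

print(q1)  # [('Ada',), ('Edsger',)]
assert q1 == q2  # equivalent algebraic forms, identical results
```

Which form is faster is a property of the planner and the data, not of the intent -- that separation is the whole point of the declarative model.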
And these performance differences are not small once your database has a decent size. And if your tables are really large, it's not a question of fast or slow but of fast enough, or so slow it's indistinguishable from the database being down.
Of course to some extent you can simply throw hardware or money at the problem. This certainly works for smaller inefficiencies, but sometimes knowing the database will give you orders of magnitude better performance. Hardware and money also don't scale indefinitely.
In my view the author has absolutely zero basis to make such a claim.
It was my impression that everyone picked (and still picks) MySQL, MongoDB, and Firebase _because_ they were the easiest to use. It seemed like developer experience was by far the most important thing to them (compared to sane behavior initially in the case of Mongo and MySQL, some of which has since evolved).
I've found that to be the case, except for enterprise development, which has different concerns than how quickly code gets written to use a database.
You know what is practical? Schema design. Query language. That's what made MongoDB super popular; no schemas to worry about. A query is just '{ firstName: "John" }'.
I cannot emphasize this enough: I cannot summon even a milliliter of desire to care about whether Mongo's way of doing these things is actually "better" or "worse". But it is what made it popular.
Of course you need to know the internals of your database. If you've ever come across a project where the team treated a key/value, or document database as a relational one (probably because the query syntax looks similar), then you will know just how important database internals are.
I guess RethinkDB really was ahead of its time.