Whether it's a good idea depends on your goal, and what alternative building blocks you have available.
(Eg if you are building your distributed relational database to run on top of lots of computers with spinning hard disks, you might want to expose some more characteristics of the hard disk directly to your database, so you can manage them; instead of trying to hide them behind an abstraction.)
I think I'd prefer to stop calling _large_ resources that are only K:V a 'database' though.
A 'database' shouldn't require SQL, but a distributed filesystem, however similar, isn't quite a database.
Perhaps the distinction is more pragmatic than fundamentally technical. We typically use the term "database" to describe systems designed primarily for structured data management with query capabilities, while filesystems optimize for hierarchical storage of opaque binary objects.
Therefore a KV is not a database either.
It's certainly a base for data and can be used to implement all core concepts of relational calculus but it isn't designed for such and doesn't do so with performance in mind. Conversely, filesystems are often implemented using B-trees as many RDBMSes are but aren't designed for many of the operations one might typically ascribe to a database.
Nomenclature is tricky... how does that saying about the two hardest problems in CS go again?
Think of KV databases as a persistent associative mapping/hash map that needs to store data in a safe and secure way; then we can build advanced stuff on top of it. Take TiDB for example: it is a distributed database compatible with MySQL (its own query language can be considered a subset of MySQL's), but actually most of the heavy lifting is handled by TiKV, which is a distributed KV datastore with Raft distributed consensus.
And then SurrealDB also leveraged TiKV, as one of its storage backends, to build their own graph-document hybrid database product. P.S.: I used to be a contributor to SurrealDB.
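To make the "persistent hash map" framing concrete, here is a minimal sketch of layering rows and a secondary index on top of pure get/put/prefix-scan operations. A plain Python dict stands in for a durable KV store like TiKV; the key layout and helper names are my own illustration, not any real system's API:

```python
import json

kv = {}  # stand-in for a durable KV store: only get/put/scan-by-prefix

def put_user(user_id, user):
    # primary record: one key per row, value is the serialized row
    kv[f"user:{user_id}"] = json.dumps(user)
    # secondary index: empty value; the information lives in the key itself
    kv[f"idx:user_by_city:{user['city']}:{user_id}"] = ""

def users_in_city(city):
    # "index scan" = prefix scan over index keys, then point-gets on primary keys
    prefix = f"idx:user_by_city:{city}:"
    ids = [k[len(prefix):] for k in sorted(kv) if k.startswith(prefix)]
    return [json.loads(kv[f"user:{i}"]) for i in ids]

put_user("1", {"name": "Ada", "city": "London"})
put_user("2", {"name": "Linus", "city": "Helsinki"})
put_user("3", {"name": "Grace", "city": "London"})
print([u["name"] for u in users_in_city("London")])  # → ['Ada', 'Grace']
```

Everything "advanced" a TiDB-style layer adds (indexes, transactions, query execution) ultimately compiles down to key encodings and scans like this.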
If your workload has even a whiff of analytics to it, operational or slow-time, KV databases are almost the pathological architecture in theory. Their intrinsically poor locality exacts a steep performance price.
These database architectures are all equivalent in the same sense that almost everything is a Turing Machine. Some manifestations and implementations are much more efficient than others in the real world. While I am not as emotionally invested in it as the article’s author seems to be, he is generally correct that KV databases have poor properties for most applications.
The first one I encountered was DrFTPD circa 2004. But these days, any object storage system qualifies because they all support varying replication schemes and reading from any valid in-sync replica.
Some generalisations are close enough to true to be worth it.
Eg I'm fairly confident in generalising and saying that for most beginners in 2025, picking Python is a better choice than PHP or Cobol.
Of course, you can come up with some contrived scenarios where the beginner would be better served with Cobol.
IMO key value stores tend to live in the space between the third-normal-form, ultra-relational, UML-diagrammed database the college textbooks assure you exists, and a high-chaos cowboy document storage system like MongoDB.
They enable you to make a lot of things up as you go and iterate on your design. I like them because they remove a lot of ceremony and let me get on with persisting things without ALTER TABLE or CREATE TABLE and all that entails. At the same time, they're constrained and often organized in a way that big ol' JSON blobs aren't. I like them for doing multiplayer gamedev things.
But you are right that in practice you sometimes want to deviate from them. And then it is still useful to be aware of what the normal form of your database _would_ be, and how you are deviating.
Similarly to how sometimes you might want to manually unroll a loop in your code, and it's still useful to keep in mind conceptually how the original loop would have looked like.
Fundamentally, this isn't a theoretical limitation of the relational model; it's a historical artefact that "doesn't have to be that way".
Systems like Kubernetes or Azure Resource Manager show how this ought to have been implemented: Declarative resource definitions via an API with idempotency as a core paradigm.
I.e.: Instead of the web developer having to figure out the delta change between two schema versions, they should just "declare" that they want some specific schema. This could happen with every code deployment, with most such changes doing nothing. If the schema does change, then it's up to the database engine to figure out how to make it happen, much like how a Kubernetes cluster reconciles the cluster state with a deployment definition.
KV stores became popular because they do this implicitly, with no explicit schema at all.
Relational databases (or even KV stores!) could use an explicit schema with automatic reconciliation rather than manual delta changes implemented by developers.
TL;DR: The tooling is bad. KV is an over-reaction to this, but instead we just needed better tools.
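The reconciliation idea described above can be sketched in a few lines. This is a toy, using SQLite via Python's `sqlite3`; real reconciliation would handle drops, renames, and type changes, while this sketch (function names are mine) only handles the common case of added columns:

```python
import sqlite3

def reconcile(conn, table, desired_columns):
    """Make `table` match the declared schema, Kubernetes-style:
    compare desired state to actual state, apply only the delta."""
    cur = conn.execute(f"PRAGMA table_info({table})")
    actual = {row[1] for row in cur.fetchall()}  # existing column names
    if not actual:
        cols = ", ".join(f"{n} {t}" for n, t in desired_columns.items())
        conn.execute(f"CREATE TABLE {table} ({cols})")
        return
    for name, coltype in desired_columns.items():
        if name not in actual:
            conn.execute(f"ALTER TABLE {table} ADD COLUMN {name} {coltype}")

conn = sqlite3.connect(":memory:")
# first "deployment": declaring the schema creates the table
reconcile(conn, "users", {"id": "INTEGER", "name": "TEXT"})
# redeploy with no change: a no-op, like re-applying a Kubernetes manifest
reconcile(conn, "users", {"id": "INTEGER", "name": "TEXT"})
# declare a new column: the engine figures out the ALTER itself
reconcile(conn, "users", {"id": "INTEGER", "name": "TEXT", "city": "TEXT"})
```

The developer only ever states the desired end state; the delta lives in the reconciler, not in hand-written migration scripts.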
Does that imply you should give up on KV datastores today, when the product category he's asking for barely exists? No, obviously not.
KVs give you this behavior; they just drop everything else along with it.
Or maybe there is a higher level DSL that you could apply to create query plans (something like MongoDB aggregation pipelines maybe?), but it quickly becomes basically the same as SQL.
https://github.com/permazen/permazen/blob/master/README.md
It's a bit like the record layer in FoundationDB but more advanced. You specify query plans manually, so you can't accidentally forget an index for example.
I think it is because most people can make something work with SQL.
I don't want the dynamic nature of the planner. I don't want to send SQL over the wire, I want to send the already completed plan that I either generated or wrote by hand. So many annoying performance bugs are because the planner did the slow thing. Just let me write/adjust it.
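A "completed plan" in the sense the comment describes is just an explicit operator tree that the caller assembles once and that never silently changes. A sketch with Python generators (operator names are mine):

```python
# Hand-assembled physical plan: no planner involved; the caller fixes
# the operator tree (scan -> filter -> project) by hand.

customers = [
    {"contact": "A. Lovelace", "company": "Acme", "city": "Stockholm", "country": "Sweden"},
    {"contact": "N. Wirth", "company": "ETH", "city": "Zurich", "country": "Switzerland"},
    {"contact": "D. Ritchie", "company": "Bell", "city": "Malmo", "country": "Sweden"},
]

def seq_scan(rows):            # physical operator: full table scan
    yield from rows

def filter_op(pred, child):    # physical operator: selection
    return (r for r in child if pred(r))

def project(cols, child):      # physical operator: projection
    return ({c: r[c] for c in cols} for r in child)

plan = project(
    ["contact", "company", "city"],
    filter_op(lambda r: r["country"].upper() == "SWEDEN", seq_scan(customers)),
)
for row in plan:
    print(row)
```

Swapping `seq_scan` for an index lookup is an edit you make deliberately, in code review, rather than a decision a planner revisits at runtime.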
If your use-case is a data warehouse, then you absolutely want more than a K/V database and likely dynamic query plans because the point is dynamic usage. If your use-case is the serving frontend for a >1m request per second API, then sure, you probably don't want the complexity of a relational database and query planner.
Most things are somewhere in the middle and need to give serious consideration to this.
The poster complains about query plans being unnecessarily dynamic; for certain queries, it should be pinned, and only changed in a controlled way. Compare it to something like pip or npm; not being able to pin versions of certain packages could be a source of endless frustrations.
Pinning a query plan to a query could very well be a feature of a relational DB, and it is. Postgres (the pg_hint_plan extension), Oracle (a bunch of mechanisms), MS SQL (plan guides): they all have ways to pin the query plan. Not sending SQL is calling a stored procedure, also a long-standing feature of relational databases.
Knowing your tech stack goes a long way in battling frustration.
Let the database enforce a serialization format (JSON, BSON, MessagePack, protobuf... anything really), plus create and maintain indices, using the fancy crash-proof logic it has. That'll cover 95% of all my database needs.
(OP also asks for row-based layout, types, and non-trivial language. I think those parts are entirely optional)
You can attain what you desire by using an RDBMS, and having all tables with one key column, and a TEXT column with your serialized non-key fields; it's going to be a fun approximation of 6NF. Realistically, you can have all joinable columns as normal columns, indexed as you desire, and the rest of the columns as a serialized blob.
When you want high parallelism for guaranteed independent segments of data, use sharding.
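The hybrid layout described above (joinable columns as real columns, the rest as a serialized blob) looks like this in SQLite, as a sketch; the table and field names are illustrative:

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
# joinable/indexable fields get real columns; everything else rides in `body`
conn.execute("CREATE TABLE orders (id TEXT PRIMARY KEY, customer_id TEXT, body TEXT)")
conn.execute("CREATE INDEX orders_by_customer ON orders (customer_id)")

def put_order(order):
    # the whole record is serialized into `body`; only key fields are duplicated
    conn.execute(
        "INSERT OR REPLACE INTO orders VALUES (?, ?, ?)",
        (order["id"], order["customer_id"], json.dumps(order)),
    )

put_order({"id": "o1", "customer_id": "c1", "items": ["book"], "note": "gift wrap"})
put_order({"id": "o2", "customer_id": "c2", "items": ["pen"]})

# index-backed lookup on the relational column, blob deserialized on the way out
row = conn.execute("SELECT body FROM orders WHERE customer_id = ?", ("c1",)).fetchone()
print(json.loads(row[0])["items"])  # → ['book']
```

You keep crash-safe storage, indexes, and transactions from the RDBMS, while schema changes to the non-key fields cost nothing, since they only ever touch the blob.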
Indexes, triggers (very good abstraction covering everything from computed fields to dependent fields), transactions.
In Fox, you more or less write `physical query plans` as syntax:
USE customer && Opens Customer table
CLEAR
SCAN FOR UPPER(country) = 'SWEDEN'
? contact, company, city
ENDSCAN
And what makes this even better is that you can also write `SQL`, so you can have the best of both worlds. BTW, I think this idea can be taken even further; my take on it is at https://tablam.org