Really!? Are you building applications, by chance, or something else? Are you doing raw SQL mostly, or an ORM/ORM-like library? This surprises me because my experience dabbling in JSON fields for CRUD apps has been mostly trouble stemming from the lack of type checks. SQLite's fluid type system has been a nice middle ground for me personally. For reference, my application layer is kysely/typescript.
Well, you effectively move the type checks from the database to the app. Not a new idea by any means (and a bad idea in many cases), but with JSON it can actually work out nicely-ish, as long as there are no significant relationships between tables.
Practical example: I recently wrote my own SMTP server (bad idea!), mostly to be able to control spam (even worse idea! don't listen to me!). Initially, I thought I would be really interested in remote IPs, reverse DNS domains, and whatever was claimed in the (E)HLO.
So, I designed my initial database around those concepts. Turns out, after like half a million session records: I'm much more interested in things like the Azure tenant ID, the Google 'groups' ID, the HTML body tag fingerprint, and other data points.
Fortunately, my session database is just 'JSON(B) in a single table', so I was able to add those additional fields without the need for any migrations. And SQLite's `json_extract` makes adding indexes after-the-fact super-easy.
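The expression-index trick is plain SQL regardless of the application language; a minimal Python/SQLite sketch (table, column, and field names invented, not the actual schema) of indexing a JSON path after the fact:

```python
import sqlite3

# Hypothetical single-table session store: one JSON blob per row.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE sessions (id INTEGER PRIMARY KEY, data TEXT NOT NULL)")
db.execute(
    "INSERT INTO sessions (data) VALUES (?)",
    ('{"remote_ip": "203.0.113.7", "azure_tenant_id": "abc-123"}',),
)

# A field added long after the original design can still be indexed directly:
db.execute(
    "CREATE INDEX idx_sessions_tenant "
    "ON sessions (json_extract(data, '$.azure_tenant_id'))"
)

# Queries on that JSON path can now use the expression index.
row = db.execute(
    "SELECT id FROM sessions WHERE json_extract(data, '$.azure_tenant_id') = ?",
    ("abc-123",),
).fetchone()
print(row)  # (1,)
```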
Of course, these additional fields need to be explicitly nullable, and I need to skip processing based on them if they're absent, but fortunately modern C# makes that easy as well.
And, no, no need for an ORM, except `JsonSerializer.Deserialize<T>`... (And yeah, all of this is just a horrible hack, but one that seems surprisingly resilient so far. YMMV.)
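A rough Python analogue of that `Deserialize<T>`-plus-nullable-fields pattern (the `Session` type and its fields are made up for illustration; the real code is C#):

```python
import json
from dataclasses import dataclass
from typing import Optional

@dataclass
class Session:
    remote_ip: str
    # Fields added after the original design must tolerate absence.
    azure_tenant_id: Optional[str] = None
    html_body_fingerprint: Optional[str] = None

def load_session(raw: str) -> Session:
    d = json.loads(raw)
    return Session(
        remote_ip=d["remote_ip"],
        azure_tenant_id=d.get("azure_tenant_id"),
        html_body_fingerprint=d.get("html_body_fingerprint"),
    )

# A pre-migration record simply comes back with the new fields as None.
old = load_session('{"remote_ip": "203.0.113.7"}')
if old.azure_tenant_id is None:
    pass  # skip tenant-based processing for this record
print(old.azure_tenant_id)  # None
```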
Our solution for a similar situation involving semi-structured data (in postgres) was to double it up: put all the json we send/receive with a vendor into a json field, then anything we actually need to work on gets extracted into regular table/columns. We get all the safety/performance guarantees the database would normally give us, plus historical data for debugging or to extract into a new column if we now need it. The one thing we had to monitor in code reviews was to never use the json field directly for functionality.
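A minimal sketch of that double-up pattern, in Python/SQLite for brevity even though the comment describes Postgres (all table, column, and payload names invented):

```python
import sqlite3, json

db = sqlite3.connect(":memory:")
# Raw vendor payload kept verbatim; working fields promoted to real columns.
db.execute("""
    CREATE TABLE vendor_orders (
        id INTEGER PRIMARY KEY,
        raw_json TEXT NOT NULL,    -- full payload, for debugging/backfill only
        order_ref TEXT NOT NULL,   -- extracted, typed
        price_cents INTEGER NOT NULL
    )
""")

payload = {"order_ref": "A-1001", "price": "12.50", "extra": {"carrier": "acme"}}
db.execute(
    "INSERT INTO vendor_orders (raw_json, order_ref, price_cents) VALUES (?, ?, ?)",
    (json.dumps(payload), payload["order_ref"], int(float(payload["price"]) * 100)),
)

# Application code reads only the typed columns, never raw_json.
ref, cents = db.execute(
    "SELECT order_ref, price_cents FROM vendor_orders"
).fetchone()
print(ref, cents)  # A-1001 1250
```

If a new field is needed later, `raw_json` is the historical record to backfill the new column from.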
That way you're throwing away 50% of the reason you use a relational database in the first place. Has it occurred to you that MongoDB exists?
Also I don't understand why you're afraid of migrations, especially since you're the only developer on your own SMTP server.
My original comment started with "but it feels "prior to the MongoDB-is-webscale memes""
So, care to take another guess? And, while we're here, does MongoDB run fully in-process these days? And/or allow easy pagination by ROWID?
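(In SQLite, that last bit is a one-line keyset query; a quick Python sketch with an invented table:)

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE sessions (data TEXT)")
db.executemany("INSERT INTO sessions (data) VALUES (?)",
               [(f"session {i}",) for i in range(10)])

def page(db, after_rowid, limit):
    # Keyset pagination on rowid: stable and cheap, no OFFSET scan.
    return db.execute(
        "SELECT rowid, data FROM sessions WHERE rowid > ? ORDER BY rowid LIMIT ?",
        (after_rowid, limit),
    ).fetchall()

first = page(db, 0, 3)
second = page(db, first[-1][0], 3)
print([r[0] for r in second])  # [4, 5, 6]
```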
What gain would MongoDB offer here?
You certainly would lose a lot of things, like a well-supported path to linking with the database engine, and a straightforward way to introduce relational tables as the project matures. Nothing completely insurmountable, of course, but it carries a lot of extra effort for what benefit?
Did you miss that he’s using sqlite? The dev experience with a sqlitedb is way better than running yet another service, especially for personal projects.
Sqlite is used just as much as an application file format as it is a relational database.
> And SQLite's `json_extract` makes adding indexes after-the-fact super-easy.
That's a migration.
> Of course, these additional fields need to be explicitly nullable, and I need to skip processing based on them if they're absent
That's an effect of not migrating: having to handle both null and absent fields instead of just null fields. After doing more of these, you'll run into the same thing that made people stop using NoSQL databases: with no schema, your code has to parse all previous versions of the data format, and they probably aren't even well documented. An RDBMS can just set the new column to null in existing rows.
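To make the contrast concrete, a small Python/SQLite sketch of the RDBMS side (names invented): after the migration, every pre-existing row uniformly reads back NULL, so code never has to distinguish "key absent" from "key null":

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE sessions (id INTEGER PRIMARY KEY, remote_ip TEXT)")
db.execute("INSERT INTO sessions (remote_ip) VALUES ('203.0.113.7')")

# The migration: existing rows get NULL in the new column, uniformly.
db.execute("ALTER TABLE sessions ADD COLUMN azure_tenant_id TEXT")

row = db.execute("SELECT azure_tenant_id FROM sessions WHERE id = 1").fetchone()
print(row)  # (None,)
```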
I do the same thing with serde_json in Rust for a desktop app sqlitedb and it works great so +1 on that technique.
In Rust you can also tell serde to ignore unknown fields and use individual view structs to deserialize part of the JSON instead of the whole thing and use string references to make it zero copy.
Recently I’ve been tempted to make an SMTP server that translates emails into a web-hook. Please tell me more horror stories so that I might be convinced not to do it.
For the data schema, we're using Protobufs with buf validate. This works surprisingly well, you can use the same types in the backend API and on the frontend. We even have a cron job that reads all the data periodically and verifies that the JSON blobs conform to the schema. Our next intern is going to write a PostgreSQL extension to do that on commit/update :)
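The real check runs against Protobuf schemas via buf validate; this hand-rolled Python stand-in (schema, table, and rows all invented) just shows the shape of the periodic sweep over stored blobs:

```python
import sqlite3, json

# Stand-in for the real Protobuf/buf-validate schema: field -> expected type.
SCHEMA = {"order_ref": str, "price_cents": int}

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE blobs (id INTEGER PRIMARY KEY, data TEXT)")
db.executemany("INSERT INTO blobs (data) VALUES (?)", [
    ('{"order_ref": "A-1", "price_cents": 1250}',),
    ('{"order_ref": "A-2", "price_cents": "12.50"}',),  # wrong type sneaks in
])

def bad_rows(db):
    # The cron-style sweep: load every blob, flag rows violating the schema.
    bad = []
    for row_id, raw in db.execute("SELECT id, data FROM blobs"):
        d = json.loads(raw)
        if not all(isinstance(d.get(k), t) for k, t in SCHEMA.items()):
            bad.append(row_id)
    return bad

print(bad_rows(db))  # [2]
```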
One real advantage of this approach is that it's easy to do stuff like "search everywhere".
Classic fine-grained schemas are not that much different. A lot of high-level ORM frameworks simply save all the objects' fields on update, without doing fine-grained diffs.
In addition, our frontend apps also support offline mode. They can get all the relevant objects, and then operate on them locally. So our API was designed from the start to deal with conflicts.
The actual "data" is a mess: different data types for the same JSON field. Imagine storing "price" as both a float and a string, depending on whatever buggy code happened to do the inserts.
It worked enough for a prototype and that was enough for management to believe the project was a success. I can't wait until we actually try and deploy it.