We aren't using SQLite exactly as intended either. We have databases in the 100-1000 gigabyte range that are concurrently utilized by potentially hundreds or thousands of simultaneous users. Performance is hardly a concern when you have reasonable hardware (NVMe/SSD) and utilize appropriate configuration (PRAGMA journal_mode=WAL).
In our testing, our usage of SQLite vastly outperformed an identical schema on top of SQL Server. It is my understanding that something about not having to take a network hop and being able to directly invoke the database methods makes a huge difference. Are you able to execute queries and reliably receive results within microseconds with your current database setup?
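For the curious, the in-process latency claim is easy to sanity-check yourself. A minimal sketch (Python's stdlib sqlite3, with a made-up single-row lookup table):

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE kv (k TEXT PRIMARY KEY, v TEXT)")
conn.execute("INSERT INTO kv VALUES ('session:1', 'alice')")

# Time 10,000 point lookups; no network hop, no wire-protocol parsing,
# just a direct call into the library.
start = time.perf_counter()
for _ in range(10_000):
    conn.execute("SELECT v FROM kv WHERE k = 'session:1'").fetchone()
elapsed = time.perf_counter() - start
print(f"{elapsed / 10_000 * 1e6:.1f} microseconds per query")
```

On typical hardware this lands in the single-digit microsecond range, whereas a networked database round trip is usually hundreds of microseconds at best.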
Sure, there is no way we are going to be able to distribute/cluster our product by way of our database provider alone, but this is a constraint we decided was worth it, especially considering all of the other reduction in complexity you get with single-machine business systems. I am aware of things like dqlite/rqlite et al., but we simply don't have a business case that demands that level of resilience (and complexity) yet.
Some other tricks we employ - We do not use 1 gigantic SQLite database for the entire product. It's more like a collection of microservices that live inside 1 executable with each owning an independent SQLite database copy. So, we would have databases like Users.db, UserSessions.db, Settings.db, etc. We don't have any use cases that would require us to write some complex reporting query across multiple databases.
With SQL Server you can get a very fast local connection by specifying "server=(local)" in the connection string - this uses the shared memory protocol, bypassing the network stack.
If I am constraining my SQL Server usage to fit on 1 box, I might as well use SQLite (assuming no future plans for horizontal scaling).
For remote databases, setting the TWO_TASK environment variable to the server's TNS descriptor is one way to force a network login.
It is not.
> What happens if there is a hardware failure?
The product would suffer a total outage until manual intervention takes place. A restore of the VM from snapshot would be carried out by the customer. Some loss of the most recent business data would occur (i.e. between latest snapshot and time of failure). All of this is understood and agreed to by our customers.
> How do you partition access in a way that means an extremely active user doesn't impact availability?
Partitioning is not applicable. Our entire product fits on 1 machine and dividing the I/O along any dimension does not add much value to the performance equation.
That said, I love the idea of this architecture. Might use it for whatever next dumb little web service I cook up! I love how it simplifies a lot of dev/deployment ops; perfect for a side project.
It would be similar here. According to the author each sqlite DB belongs to a single microservice, which will naturally group together the most common of joins. Anything else will indeed have to be manually joined.
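If a cross-database join ever did become necessary, SQLite's ATTACH can do it in a single query without merging the files. A sketch (Python stdlib; the Users.db/UserSessions.db file names and the column names are hypothetical, borrowed from the comment above):

```python
import os
import sqlite3
import tempfile

d = tempfile.mkdtemp()
users_path = os.path.join(d, "Users.db")
sessions_path = os.path.join(d, "UserSessions.db")

# Two independent database files, one per "microservice".
with sqlite3.connect(users_path) as u:
    u.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    u.execute("INSERT INTO users VALUES (1, 'alice')")
with sqlite3.connect(sessions_path) as s:
    s.execute("CREATE TABLE sessions (user_id INTEGER, last_seen TEXT)")
    s.execute("INSERT INTO sessions VALUES (1, '2021-10-01')")

# ATTACH makes the second file visible under a schema prefix,
# so the join is ordinary SQL.
conn = sqlite3.connect(users_path)
conn.execute("ATTACH DATABASE ? AS sess", (sessions_path,))
rows = conn.execute(
    "SELECT u.name, s.last_seen FROM users u "
    "JOIN sess.sessions s ON s.user_id = u.id"
).fetchall()
```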
Part of the reason NoSQL became popular for a bit. That's since reversed, and sharded SQL is pretty common now, but it definitely adds more cognitive load to schema design.
How do you do joins?
Because SQLite is effectively serializing all the writes for us, we have zero locking in our code. We used to have to lock when inserting new items (to get the LastInsertRowId), but the newer version of SQLite supports the RETURNING keyword, so we don't even have to lock on inserts now.
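The RETURNING path (available in SQLite 3.35+) might look like this, with a lastrowid fallback for older library versions. This is a sketch using Python's stdlib sqlite3, with a made-up items table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")

if sqlite3.sqlite_version_info >= (3, 35, 0):
    # One statement: the new id comes back with the insert itself,
    # so no lock around insert-then-read-LastInsertRowId is needed.
    (new_id,) = conn.execute(
        "INSERT INTO items (name) VALUES (?) RETURNING id", ("widget",)
    ).fetchone()
else:
    # Older SQLite: lastrowid is tracked per connection, so it is
    # still safe without a lock as long as each thread has its own
    # connection.
    new_id = conn.execute(
        "INSERT INTO items (name) VALUES (?)", ("widget",)
    ).lastrowid
```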
Also, the fact that we have the databases divided across function helps free up some of the lock contention in the provider. We don't really have any mixed-workload databases - it's either "slow" gigantic operations (JSON blob access), or super quick tiny things (updating session timestamps). So, there is some minor isolation of contention issues on a subsystem basis.
SQL.js[0] and the incredible “Absurd SQL”[1] are making it possible to build PWAs and hybrid mobile apps with a local SQL db. Absurd SQL uses IndexedDB as a block store fs for SQLite so you don’t have to load the whole db into memory and get atomic writes.
Also I recently discovered the Session Extension[2] which would potentially enable offline distributed updates with eventual consistency!
I can imagine building a SAAS app where each customer has a “workspace”, each as a single SQLite db, and a hybrid/PWA app which either uses a local copy of the SQLite db synced with the session extension or uses a serverless backend (like CloudFlare workers) where a lightweight function performs the db operations. I haven’t yet found a nice way to run SQLite on CloudFlare workers, it needs some sort of block storage, but it can’t be far off.
That's what absurd SQL is for (link in the parent comment).
[1] https://aws.amazon.com/blogs/compute/using-amazon-efs-for-aw...
I did just that at my (now defunct) startup a few years ago. We were building a collaborative database query tool. The software we built used sqlite to keep a local db for storing user credentials, caching query results, etc. Bonus, we were able to have the local database file be encrypted protecting the data at rest.
Very useful for pure go applications!
So you need to be careful how you design your schema, but it's very possible.
One option is to use something like Yjs and a JSON column to get proper CRDTs for merging.
As an aside, what about distributed derivatives of sqlite, like rqlite, as a response to the criticism that sqlite requires your database server to also be your web server. Could something like rqlite also provide a way for an sqlite database to grow into a distributed cluster at a later point?
The dirty secret is: most of them don't really need that architecture. Most, and I mean 95%+, of websites would run just fine on a single box running the web server, the DB, and whatever script interpreter the backend runs on.
Sure, it may be a bit slower if it's not hooked up to a global CDN, and if the box goes down the user gets a 404. But it's also a lot easier for an admin to simply give the one box a good wallop and put it back online, since it's so simple.
The problem to me isn't ability to scale on one server, it's the single point of failure. My biggest site is a wordpress box with one instance on a pretty big VPS. In the last year I've had several outages big enough to require a post-mortem (not complete outages, but periods with high levels of error rates/failures), and every time it has been because of internal data center networking issues at my preferred cloud provider (and thankfully their customer service is amazing and they will tell me honestly what the problem was instead of leaving me to wonder and guess). So the main incentive for me to achieve horizontal scalability in that app is not scaling, it's high availability so I can survive temporary outages because of hardware or networking, and other stuff outside of my control.
Except now you are not in control of plugging the cable back in, and pay (less?) for that convenience.
Which cloud provider is this, if you don’t mind sharing?
Then, of course, there was ISDN and xDSL, which would give you a true-to-god whopping 128 kbit/s for a while. 64 kbit/s if you were cheap. It took a while to get to affordable multiples of Mbit/s.
Now that there's at least 10 Mbps of uplink from each residential subscriber, it doesn't take long to DoS even a beefy server.
And I'd say that server-side, things improved vastly with advent of FastCGI and its equivalents. Back in that heyday of your P166-MMX server, it was CGI with Perl, spawning a process for each incoming request, or "blazing-fast" Apache's server-side includes, or other things like that. Maybe mod_perl with its caveats on memory sharing.
Anyway, you're right in that whenever you show them a wider pipe, they will find more stuff to congest it with.
It was only with the advent of heroku and the sudden shift to the cloud (AWS) that the DB stopped being software you ran on your box as a daemon, and became essentially a dedicated resource. Even for services running at a completely trivial scale.
Whether most things actually require HA is debatable, but a lot of businesses make it a core requirement and so it gets baked into the architecture. Personally I feel like most stuff would be better suited to having fast fail-over and recovery early on, but my advice rarely gets taken. Instead you end up with complicated HA architectures that nobody totally understands, which then (inevitably still) fall over and take hours to recover.
You don’t have to distribute everything to scale pretty high.
It is not client/server; a process must be able to fopen() the database file. NFS and SMB are options that can convey access to remote systems, but performance will not likely be good.
Only a single process can write to the database at any time; it does not support concurrent writers.
The backup tools do not support point-in-time recovery to a specific past time.
If your application can live with these limitations, then it does have some wonderful features.
And the safety of your data depends on the quality of your network filesystem's locking implementation. It's not too difficult to design a locking method that works most of the time, but it's a lot harder to build something that guarantees mutual exclusion over an imperfect network.
On a single machine, file locking is generally reliable (mediated by the kernel). Multiple clients can access the same database simultaneously, with readers and writers taking turns, and all that happens is you pay the performance penalty of serialized access.
But if your client thinks it has a lock on a remotely-stored database file, but the server-side lock has actually expired and another client is writing to the database, your data is likely to get hosed.
I can't parse [0] right now about the overhead of TCP and round trips with the database server, but it's basically a question of whether one SQLite write is 10x, 100x, 1000x, or more faster than a database server write. That should make a lot of difference.
In terms of latency it'll still be difficult to beat a database that lives in the same process as your application, but it won't be as bad as going over the network might be.
That's like saying immortality is trivial modulo death and whatnot. If you don't integrate with sqlite's locking system, you can easily "backup" a file mid-write, corrupted and unusable.
That's why sqlite has a built-in backup command.
There are ways around this (filesystem snapshots and potentially file locking) but it's not trivial and failing to handle this correctly is a very good way to build a system which passes tests under low load (when the file-copying operation will win all the relevant races) but fail catastrophically under heavy load.
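SQLite's online backup API is the safe alternative to copying the file: it cooperates with the locking system and produces a consistent copy even while writers are active. A sketch via Python's stdlib binding (throwaway file paths):

```python
import os
import sqlite3
import tempfile

d = tempfile.mkdtemp()
src = sqlite3.connect(os.path.join(d, "live.db"))
src.execute("CREATE TABLE t (x INTEGER)")
src.execute("INSERT INTO t VALUES (42)")
src.commit()

# Connection.backup wraps the sqlite3_backup_* API: it copies pages
# while cooperating with concurrent writers, restarting as needed,
# so the destination is always a consistent snapshot.
dst = sqlite3.connect(os.path.join(d, "backup.db"))
src.backup(dst)

print(dst.execute("SELECT x FROM t").fetchone())  # prints (42,)
```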
What you are proposing may be trivial, but in comparison it might just as well be rocket science. Doing more work to use a simpler database that is harder to set up and has fewer features is not exactly compelling.
RECOVER DATABASE UNTIL TIME '2021-10-01 02:00:00' USING BACKUP CONTROLFILE;
SQLite does not implement such a feature.

I think you can do concurrent writing now with Write-Ahead Logging (WAL): https://www.sqlite.org/wal.html
I've never tried it though, so I don't know how suitable it is for web apps that might potentially have multiple processes trying to write to the DB at the same time.
But transactions in SQLite are often fast enough that this could still be acceptable for a lot of purposes, especially on an SSD.
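A small WAL demonstration of why serialized writes are often tolerable (Python stdlib, throwaway file): a second connection keeps reading a consistent snapshot while another connection's write transaction is still open.

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.db")

writer = sqlite3.connect(path, isolation_level=None)  # autocommit; explicit BEGIN below
writer.execute("PRAGMA journal_mode=WAL")
writer.execute("CREATE TABLE t (x INTEGER)")

writer.execute("BEGIN")
writer.execute("INSERT INTO t VALUES (1)")

# In WAL mode this read does not block on the open write transaction;
# it simply sees the last committed state of the database.
reader = sqlite3.connect(path)
during = reader.execute("SELECT COUNT(*) FROM t").fetchone()[0]  # 0

writer.execute("COMMIT")
after = reader.execute("SELECT COUNT(*) FROM t").fetchone()[0]   # 1
```

Multiple processes can do the same thing against the same file; only a second concurrent *writer* would have to wait (which is what busy_timeout is for).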
The one time I actually wanted to do that, I wrote the server that `accept`ed incoming connections and used the single `fopen`ed SQLite DB.
It can be very flexible that way, TBH, but if you really need that type of thing a more traditional DB is better.
Potential plus in cloud/container deployments.
> It is not client/server; a process must be able to fopen() the database file. NFS and SMB are options that can convey access to remote systems, but performance will not likely be good.
There are other possibilities given a bit of additional wrappers (over the db process) and federating nodes with some (consensus) protocol. It may actually be a good fit as the 'node-unit' for very large scale distributed metadata store that would benefit from relational DB semantics instead of basic K/V.
With appropriate pragmas and database handles stored persistently between requests in server processes, the rate (in bytes of data affected by INSERT/UPDATE/DELETE) at which users could make DB changes got pretty darn close to the rate at which a file could be sequentially written on the filesystem, which is more than enough aggregate throughput for 1000s of concurrent non-bulk changes in most web applications, I'd imagine.
PRAGMA journal_mode = WAL;
PRAGMA busy_timeout = 30000;
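Applied from application code, those two pragmas might look like this (a sketch using Python's stdlib sqlite3; the helper name is made up):

```python
import os
import sqlite3
import tempfile

def open_db(path: str) -> sqlite3.Connection:
    conn = sqlite3.connect(path, timeout=30)   # driver-side wait, mirrors busy_timeout
    conn.execute("PRAGMA journal_mode=WAL")    # readers stop blocking the writer
    conn.execute("PRAGMA busy_timeout=30000")  # wait up to 30s for locks instead of erroring
    return conn

# Hypothetical usage against a throwaway file:
conn = open_db(os.path.join(tempfile.mkdtemp(), "app.db"))
mode = conn.execute("PRAGMA journal_mode").fetchone()[0]  # 'wal'
```

Note that journal_mode is persistent in the database file, but busy_timeout is per-connection, so it has to be set on every handle you open.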
This is what it takes to run a basic postgres database on my own PC (in a docker compose file):
  postgres:
    image: postgres:12.7
    container_name: postgres
    environment:
      - PGDATA=/var/lib/postgresql/data/pgdata
      - POSTGRES_PASSWORD=<pw>
    volumes:
      - ./volumes/postgres/:/var/lib/postgresql/data/
For someone who's completely allergic to SSH and Linux, a managed Postgres service will take care of all that too.

SQLite seems simple in that it's "just a file". But it's not. You can't pretend a backup is just copying the file while a DB is operating and expect it to be consistent. You can't put the file on NFS, have multiple writers, and expect it to work. You can't use complex datatypes or have the database catch simple type errors for you. It's "simple" in precisely the wrong way: it looks simple, but actually using it well is not simple. It doesn't truly reduce operational burden; it only hides it until you find that it matters.
Similarly postgres is not automatically complex simply because it _can_ scale. It really is a good technology that can be simple at small scale yet complex if you need it.
Performs much better than Postgres in terms of query latency, which is ultra important for the domain we operate in.
I take machine-level backups every 2 hours, so in the event of an outage, just boot the disk image on a new VM and it's off and running.
I would never do this on my professional job due to the stigma, but for this side project, it has been incredible
I love controversy so I was able to push SQLite through as the sole persistence mechanism for our product. Virtually every constraint lined up with our business perfectly.
We sell a B2B product that gets installed by our customers, so they love it too. No more sweaty-hands conversations about why their database server can't talk to their application server over their networks while we are caught in the middle trying to avoid taking ownership of their internal IT problems.
> They’d double click on the thing and a dialog box would pop that says, “Can’t connect to database server,” and it wasn’t our fault. We didn’t have any control over the database server, but what do you do if you can’t connect to the server, so we got the blame all the same because we were painting the dialog box.
Using the right tool for the job does indeed carry a lot of stigma in many professional environments. Instead you use the tool that some VP has been sold by some salesman.
I presume that's block level backups? Or some snapshotting?
As far as I know, block level filesystem copies can get inconsistent (so we have journalling file systems). But assuming it works well, like a filesystem aware snapshot, can sqlite deal with files snapshotted in the middle of an operation?
Now obviously if I wanted to scale up, at some point you would have too many users to fit in memory. But do programs at that scale actually need to exist? Why can't everyone be on a federated server with state that fits in memory/JSON? I guess that's more of a philosophical question about big tech. But I think it's interesting that most of our tech stack choices are driven by projects designed to work at a scale most of us will never need, and maybe nobody needs.
As an aside, is there something like SQLite but closer to my use cases? So I guess like the nosql version of SQLite.
[1]: https://github.com/boringproxy/boringproxy/blob/master/datab...
I almost always choose SQLite for persisting to disk over JSON files. It essentially removes a large class of bugs and is robust enough that I’m not worried about introducing new problems.
Since they are using Go, couldn't you say the same thing about the Golang std library? As long as they know how to use a local file as a database (do the swap, flush, etc...) I don't see the problem.
Of course you have to then mess around with SQL but you only have to write it once and encapsulate it somewhere and you're done.
I do a very similar thing for many of my at-home projects. They generally have a single-user, me, and it works great.
This latter point matters a lot to me since the disk in question is a Raspberry Pi's SD card, which I've heard have a tendency to wear out quickly, so I wanted to avoid writing to it unnecessarily. And yeah, this system is fragile, not actually ACID, etc etc… but I'm the only one who relies on it (it's not even accessible on the public internet, only over my Wireguard VPN), so it doesn't matter if it breaks!
I will probably rewrite this service later, backed by either SQLite or Postgres, but writing it this way saved me a fair bit of thinking and lines of code, which was great for getting it off the ground and usable to me in like two hours.
And also a query language, which is easy to write and modify queries for, presenting a great benefit over writing large amounts of boilerplate code for looping over records and accessing the right indices every time (keep abstracting that and you'll end up with your own relational database system).
I often import data into SQLite just to work with it, without necessarily even an application in between. Depending on the nature of the data it's either that or Matlab.
In our cloud infrastructure though we have 10k+ distributed sites so the interface reads/writes to an API backed by a database.
A major benefit to the config file is that you can just open it in a text editor and muck around during development.
With SQLite, you can read or write a few bytes without having to process the others.
Also, SQLite would allow observing the current state by querying the file, with the JSON version you would need to keep dumping the state every few seconds.
But I agree with you that most languages have solid support for JSON and it prevents you having to break down those tree structures into rows and then back again. For typed languages the struct becomes the schema.
SQLite indexes and joins also would not be a benefit if your state is small enough that queries are just filter functions with “full table scans”.
Only serializing. The db is kept in memory, and only deserialized on startup.
But why is this a problem at the scale I'm working with?
Or LMDB, LevelDB, RocksDB, etc, if performance is important.
Maybe a heroku that knows about your database file and automatically loads the latest version for you?
I kind of feel like GP is a troll comment, as there's no real value add for a managed SQLite.
I used sqlite for that, and had a mysql server for the user data and stuff that needed to be written to. Performance was fantastic, users were happy, data updates were instantaneous ; copy the new data to the server then repoint a symlink.
Most of my work is modeling and simulation. Sqlite is almost always my output format ; one case per database is really natural and convenient, both for analysis, and run management.
Anyway. Sqlite is amazing.
I need to import some 30k JSONs of external monitor data from Lunar (https://lunar.fyi) into a normalized form so that everyone can query it.
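One low-tech way to do that import is a small normalizing loader. The sketch below (Python stdlib; the field names are invented, not Lunar's actual export format) generates its sample JSON files in a temp directory so it runs standalone:

```python
import json
import os
import sqlite3
import tempfile

d = tempfile.mkdtemp()
# Stand-ins for the exported per-monitor JSON files.
for i, vendor in enumerate(["dell", "lg"]):
    with open(os.path.join(d, f"{i}.json"), "w") as f:
        json.dump({"serial": f"SN{i}", "vendor": vendor,
                   "width": 2560, "height": 1440}, f)

conn = sqlite3.connect(os.path.join(d, "monitors.db"))
conn.execute("""CREATE TABLE monitors (
    serial TEXT PRIMARY KEY, vendor TEXT, width INT, height INT)""")

# Normalize each JSON blob into a row; INSERT OR REPLACE makes
# the import idempotent across re-runs.
for name in os.listdir(d):
    if not name.endswith(".json"):
        continue
    with open(os.path.join(d, name)) as f:
        doc = json.load(f)
    conn.execute("INSERT OR REPLACE INTO monitors VALUES (?, ?, ?, ?)",
                 (doc["serial"], doc["vendor"], doc["width"], doc["height"]))
conn.commit()

count = conn.execute("SELECT COUNT(*) FROM monitors").fetchone()[0]  # 2
```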
I'd love to get this into a single SQLite file that can be served and cached through CDN and local browser cache.
But is there something akin to Metabase that could be used to query the db file after it was downloaded?
I know I could have a Metabase server that could query the SQLite DB on my server, but I'd like the db and the queries to run locally for faster iteration and less load on my server.
Besides, I'm reluctant to run a public Metabase instance given the log4j vulnerabilities that keep coming.
It's not perfect, but it fills the niche nicely.
One thing that I always wondered, though: does anyone know of a big project/service that uses Golang and is backed by SQLite? This is because SQLite would require CGO, and CGO generally adds extra complexity and performance costs. I wonder how big Golang applications fare with this.
Isn't there some issue where SQLite basically has to be single-threaded in Golang programs, at least if you use the stdlib SQL library?
I guess not exactly, but IIRC there were some caveats/performance issues around C threads and goroutines. It's touched on a bit in this post: https://www.cockroachlabs.com/blog/the-cost-and-complexity-o...
But well, it seems arp242 is GoatCounter's developer. Based on the post content, it is powered by SQLite and this is probably a good write-heavy application example. I wonder if there are any blog posts about the SQLite performance for this service.
CGO isn't too big a problem, and if it is a real dealbreaker, something like https://pkg.go.dev/modernc.org/sqlite will work, as it transpiles the C into Go and passes the SQLite test suite. I think there is some performance degradation with writes, but reads are still pretty quick.
https://pkg.go.dev/modernc.org/sqlite
It works well, but the performance is worse than the C version. Not a big deal for what I used it for, though. It was approx. 6x slower at inserts.
No reader-writer lock: readers don't block the writer. Still only 1 concurrent writer, but writing via appends to the WAL file is cheaper. You can adjust read vs. write performance by checkpointing the WAL file more or less often. You can also increase performance at lower durability by not syncing the WAL file to disk as often.
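The durability knob mentioned at the end is PRAGMA synchronous. A sketch of the usual WAL pairing (Python stdlib, throwaway file):

```python
import os
import sqlite3
import tempfile

conn = sqlite3.connect(os.path.join(tempfile.mkdtemp(), "app.db"))
conn.execute("PRAGMA journal_mode=WAL")
# In WAL mode, synchronous=NORMAL (value 1) fsyncs the WAL only at
# checkpoint time: commits get much faster, and a power loss can drop
# only the most recent transactions, never corrupt the database.
conn.execute("PRAGMA synchronous=NORMAL")
sync_level = conn.execute("PRAGMA synchronous").fetchone()[0]  # 1
```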
My team at work has adopted it and generally likes it, but the biggest hurdle we've found is that it's not easy to inspect or fix data in production the way we would with postgres.
I assume because you're using a remote socket connection from the client?
I haven't tried it in a serious setting yet, but I did play around with dqlite and was impressed. Canonical uses it as the backing data store for lxd. Basically sqlite with raft clustering and the ability for clients to connect remotely via a wire protocol. https://dqlite.io/
Yeah, it's common for all developers to connect to and query against prod postgres DBs via DataGrip or similar.
dqlite definitely looks interesting, but I worry it's a bit heavy given that our only use case for remote access is prod troubleshooting. I think I saw something recently where you could spin up a server on top of a sqlite file temporarily - that might be ideal for us.
That's essentially what we do: copy the file locally if we need to inspect it. It's slightly more cumbersome though.
> I think that'd be advisable even if you were running postgresql.
Connecting to live prod servers is definitely not a 10/10 on the "best practices" scale, but it works well for our business (trading), where there are small developer teams that also operate, no PII in the database, and critical realtime functionality isn't directly involved with the database anyway.
The value in SQLite is its light weight, not its SQL side. If you're building a mobile app and you're loading a lot of local data, it might be the right choice.
We are using it as a replacement for RocksDB - we need a richer way to store data than a simple key value store. It still runs on a server though, and therefore it would be useful to be able to read data remotely, even if that isn't the primary purpose.
Does this mean that devs need to copy the production database file locally to then inspect it? Or are there tools to connect/bridge to a remote sqlite file?
Yeah.
> Does this mean that devs need to copy the production database file locally to then inspect it? Or are there tools to connect/bridge to a remote sqlite file?
We use "kubectl copy" currently when we want to inspect it, and we haven't actually had to write back to a production file yet. We've explored the "remote" option, but since it's just a file, everything seems to boil back down to "copy locally" then "copy back to remote prod".
It's only a small part of our stack at the moment, so we haven't invested in tooling very much - but I'd be curious if others have solved similar problems.
https://corecursive.com/066-sqlite-with-richard-hipp/
An interview with the creator, Richard Hipp, for a better and deeper understanding of the pitch they took and the industries they were in; their approach to overcoming the database world they saw in front of them. See the obstacles and the solutions, and how it came to be that this underestimated SQLite powers a good chunk of all the mobile actions triggered by your apps. But just read the interview; I cannot reproduce the drama here in my own words.
It's a "read only" and small website (at least for now), with just a bunch of daily visitors, a perfect use case for SQLite.
Funny thing is that in my case the database is so small that it's pushed directly to the repo.
Especially for startups and little projects, SQLite is your best friend.
The scant upside of a supposed 10-50x query latency improvement is likely to be worth little. In the extreme this is low single-digit milliseconds, so it will be dwarfed by network hops elsewhere.
In return for the above, you've coupled your request handler and its state, so you won't be able to treat services as ephemeral. Docker and Kubernetes, for instance, become difficult. You now require a heap of error-prone gymnastics to manage your service.
If the query latency really matters, use an in memory db such as Redis.
SQLite is great for embedded systems where you’re particularly hardware constrained and know your machine in advance. It could also be a reasonable option for small webservers running locally. For anything remote, or with the slightest ambition of scale, using SQLite is very likely a bad trade off.
There must be some scaling limits to encounter using this combination, but wouldn't you love to have that problem?
It gives me a handy way to run queries on data.
I tried to make it super smart too and really make educated guesses on what data types to use and even linking foreign keys.
I'm surprised by this. Sure, in-process is always going to be faster, but I still find it hard to believe that SQLite can beat Postgres on a single machine.
"VACUUM INTO somefile.sqlite;"
to dump the RAM copy to disk.
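The in-memory-plus-VACUUM-INTO pattern, sketched with Python's stdlib (VACUUM INTO needs SQLite 3.27+, hence the version guard; the file path is a throwaway):

```python
import os
import sqlite3
import tempfile

mem = sqlite3.connect(":memory:")
mem.execute("CREATE TABLE t (x INTEGER)")
mem.execute("INSERT INTO t VALUES (7)")
mem.commit()  # VACUUM refuses to run inside an open transaction

snapshot = os.path.join(tempfile.mkdtemp(), "snapshot.sqlite")
if sqlite3.sqlite_version_info >= (3, 27, 0):
    # Writes a compacted, consistent copy of the RAM database to disk.
    mem.execute(f"VACUUM INTO '{snapshot}'")
    on_disk = sqlite3.connect(snapshot)
    row = on_disk.execute("SELECT x FROM t").fetchone()  # (7,)
```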
What a great tool.
Once I had this data in Python I needed a way to analyze it. I never worked with Databases but decided to install a local copy of SQLite. The rest is history. I feel like I learned how to use databases in an organic way: by looking for a solution from raw data. A couple of queries later and python was exporting excel sheets with color coded boxes that indicated something based on the analysis I did.
Of course this could be done with any database application but the low weight nature of sqlite allowed me to prototype a solution so easily. We just backed up that native sqlite dump with the cloud and had an easy (super easy) solution to analyze raw data.
Currently using Postgres and I'm open to switching but I haven't seen any libraries or implementations of SQLite being used as a client/server database.
Read the interview with the creator of SQLite for more information on all the running topics that make you go 'huh' up to this point in time: https://corecursive.com/066-sqlite-with-richard-hipp/