Here it’s being used for web application firewall rules.
Another place I’ve thought about using this is feature flag configuration. Feature flags can be checked dozens of times per request and often need the kind of queries (user is a member of group A and has an IP located in country B) which could be well served by a local SQLite - and feature flags have a tolerance for updates taking a few seconds (or longer) to roll out.
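To make that concrete, here is a minimal sketch of what such a local flag lookup might look like. The schema, table names, and the `flag_enabled` helper are all hypothetical, just to illustrate "user is in group A and has an IP in country B" as a single in-process SQLite query:

```python
import sqlite3

# In-memory stand-in for a locally distributed flags database (schema hypothetical).
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE group_members (user_id TEXT, group_name TEXT);
CREATE TABLE ip_countries (ip_start INTEGER, ip_end INTEGER, country TEXT);
INSERT INTO group_members VALUES ('u1', 'beta');
INSERT INTO ip_countries VALUES (167772160, 184549375, 'US');  -- 10.0.0.0/8
""")

def flag_enabled(user_id: str, ip_int: int) -> bool:
    # "user is a member of group A and has an IP located in country B"
    # as one local query, no network round trip.
    row = db.execute("""
        SELECT 1 FROM group_members g
        JOIN ip_countries c ON ? BETWEEN c.ip_start AND c.ip_end
        WHERE g.user_id = ? AND g.group_name = 'beta' AND c.country = 'US'
    """, (ip_int, user_id)).fetchone()
    return row is not None

print(flag_enabled('u1', 167772161))  # True
```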
BTW, this is also the model used by all CDNs, where the global configuration file containing the certificates, HTTP routing rules, etc. for all customers is packed into a single-file b-tree structure*, and that "bundle" is distributed to all edge locations frequently.
* I've yet to see someone use SQLite for this purpose; it's usually DBM-style databases like LMDB or Kyoto Cabinet.
Now, that's a name I've not heard in a long time.
Are people still using Kyoto Cabinet in new projects? Are people still using DBM-style storage generally? I thought that whole branch of the evolutionary tree had sort of died out.
Periodically we would run a full sync to replace the database. Between the periodic full syncs, we had a background process keep changes applied on a rolling basis.
All-in-all, it worked pretty well at the time! The full database file sync ensured a bad database was timeboxed and we got a bootstrapping mechanism for free.
In any case, SQLite would serve this solution, but almost certainly with a measurable level of inefficiency built in.
I'm not knowledgeable on this, but my understanding was that a b-tree is a way of storing values that can be ordered. For example, this would be a b-tree of IDs:
```
              [8]
           /       \
      [3, 5]       [10, 12]
     /   |   \     /   |   \
  [1]  [4] [6,7] [9] [11] [13]
```

You traverse by comparing your needle to the keys in each node and descending into the left, middle, or right child depending on the result.
How is that done with configuration options? That seems like it would just be a regular hashmap which is already efficient to read. What would a b-tree of key/values even look like that wouldn't be less efficient than a hashmap?
In Ruby we called this “hammerspace” https://github.com/airbnb/hammerspace
2. If you were making it today, would you just use SQLite?
My strategy for resolving this is to fetch the flag value once, but to store it in the request object, so that a) you never have to take the expensive lookup hit more than once per request, and b) there's no risk of an inconsistent value if the flag is updated mid-request.
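A minimal sketch of that request-scoped memoization, with hypothetical class and method names: the first lookup hits the flag store, and the value is then pinned on the request object so a mid-request flag flip can't produce inconsistent behavior.

```python
class FlagStore:
    """Stand-in for the backing store (e.g. a SQLite-backed flag table)."""
    def __init__(self, values):
        self.values = values

    def lookup(self, name):
        return self.values.get(name, False)

class Request:
    def __init__(self, store):
        self._store = store
        self._flag_cache = {}  # per-request pinned flag values

    def flag(self, name):
        # Only the first check pays the lookup cost; later checks in the
        # same request reuse the pinned value.
        if name not in self._flag_cache:
            self._flag_cache[name] = self._store.lookup(name)
        return self._flag_cache[name]

store = FlagStore({"new_checkout": True})
req = Request(store)
assert req.flag("new_checkout") is True
store.values["new_checkout"] = False     # flag flips mid-request...
assert req.flag("new_checkout") is True  # ...but this request stays consistent
```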
I can see why you need to check multiple different flags in a session and I understand the parent point about looking in SQLite for them (effectively a function call into a library in process address space rather than a call over the network for each flag).
* SQLite, but every change you make increments a global version counter.
* Some way to retrieve and/or apply just the changes between version counter numbers.
Then, you could layer all manner of distribution mechanisms on top of that. You could imagine gossip networks, an explicit tree-shaped cache/distribution network, etc. Anyone who has a later version than you would be able to give you the update.
What would this get you?
* You can bundle the current version of a DB into your app artifact, but efficiently update it once your app runs. This would let you fall back to much better defaults than having no data at all. (eg. For feature flags or live config, this would let you fall back to a recent state rather than code-level defaults.)
* Any kind of client can send the global counter as an ETAG and get just the delta
* Reconnections after network blips are really simple and cheap
* If the system also let you keep a couple of minutes of history, you could annotate the counter onto a microservice call (using W3C Baggage headers, for example), and evaluate your flags/config/data/whatever at a single version number across multiple microservices. Even without evaluate-at-time functionality, logging the generation number would help a lot when debugging what happened after the fact
I'm honestly surprised it isn't more pervasive.
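The counter-plus-delta scheme above could be sketched in plain SQLite with triggers. Everything here (the `meta`/`config`/`changelog` names, the trigger shape) is a hypothetical illustration, not an existing system: every write bumps one global version number, and a changelog lets a client at version N fetch just the changes since N (the ETag idea).

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE meta (key TEXT PRIMARY KEY, value INTEGER);
INSERT INTO meta VALUES ('version', 0);
CREATE TABLE config (k TEXT PRIMARY KEY, v TEXT);
CREATE TABLE changelog (version INTEGER, k TEXT);
-- Every write bumps the global counter and records which key changed.
CREATE TRIGGER config_ins AFTER INSERT ON config BEGIN
  UPDATE meta SET value = value + 1 WHERE key = 'version';
  INSERT INTO changelog SELECT value, NEW.k FROM meta WHERE key = 'version';
END;
CREATE TRIGGER config_upd AFTER UPDATE ON config BEGIN
  UPDATE meta SET value = value + 1 WHERE key = 'version';
  INSERT INTO changelog SELECT value, NEW.k FROM meta WHERE key = 'version';
END;
""")

db.execute("INSERT INTO config VALUES ('rate_limit', '100')")
db.execute("UPDATE config SET v = '200' WHERE k = 'rate_limit'")

def delta_since(version):
    # A client sends its last-seen version and gets only what changed after it.
    return db.execute(
        "SELECT version, k FROM changelog WHERE version > ? ORDER BY version",
        (version,)).fetchall()

print(delta_since(0))  # [(1, 'rate_limit'), (2, 'rate_limit')]
```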
The write tracking needs to intercept all writes with a custom VFS, but once registered and configured, it's regular in-memory SQLite (no additional daemon).
That said, it may not be practical to have hundreds of read-only nodes, but for moderate-size needs it should work fine.
Disclaimer: I'm the creator of rqlite.
While SQLite can be used on SSDs, it is not as heavily optimized for SSDs as RocksDB. SQLite is a general-purpose embedded database and was primarily designed for use on devices like mobile phones or lightweight embedded systems, where storage is often slower and smaller (e.g., flash storage or basic SSDs).
SQLite’s traditional B-tree structure involves more random I/O compared to RocksDB's LSM tree, which is less ideal for maximizing the performance of high-speed SSDs like NVMe.
SQLite’s limited concurrency (single-writer, multiple-readers) also means it cannot fully utilize the parallelism and high throughput potential of NVMe SSDs.
You could look into WAL replication if you wanted an efficient way to update the copies. Something like Litestream.
As requests come in, gather their features, convert them to effectively an int vector, filter for rows that match, and serve the request
This doesn't sound right. A feature flag only requires checking if a request comes from a user that is in a specific feature group. This is a single key:value check.
The business logic lies in assigning a user to a specific feature group; the simplest way is pre-assigning the user, and in the most complex cases the assignment happens at app start or on the first request, to dynamically control dialups.
Either way, it's a single key:value check where the key is user ID + feature ID, or session ID + feature ID.
I mean, I guess you can send a boatload of data to perform the same complex query over and over again. But that requires never having invested any thought in the issue and insisting on making things very hard for yourself, your QAs, and your users too. Read your own description: why are you making the exact same complex query over and over and over again, multiple times in the same request? At most, do it once, cache the result, and from there on just do a key:value check. You can use SQLite for that if you'd like.
This meant you could roll features out to:
- Specific user IDs
- Every user ID in a specific group
- Every object owned by a specific user ID (feature flags might apply to nested objects in the system)
- Requests from IP addresses in certain countries
- Requests served by specific website TLDs
- Users who are paid members of a specific plan
- etc etc etc
It was an enormously complicated system, that had evolved over 5-10 years.
Not saying that level of complexity is recommended, but that's what we had.
Looks like I gave a talk about this back in 2014: https://speakerdeck.com/simon/feature-flags
We had a different problem, where our monolithic app used both Postgres / Redis for different use cases and worked relatively well. However - it was a lot easier to shove new functionality in the shared Redis cluster. Because Redis is single-threaded, one inconsiderate feature that does bulk reads (100K+ keys) may start to slow down other things. One of the guidelines I proposed was that Redis is really good when we're reading/writing a key, or small fixed-cardinality set of keys at a time, because we have a lot of random things using Redis (things like locks and rate limits on popular endpoints, etc).
However, in your case, I'm guessing Redis shines in the case of a naive single-key (IP address) lookup, but also doesn't do well with more complicated reads (representing your range query representation?). Cool write up overall, I don't have a deeper understanding of how SQLite performs so well when compared to a local Redis instance, so that was unexpected and interesting to observe.
I think it's best to consider Redis a cache with richer primitives. It excels at this and used appropriately will be both fast and solid.
But then people start wanting to use it for things that don't fit into the primary rdbms. Soon you have a job queue, locks of various sorts, etc. And then it just becomes a matter of time until performance crosses a cliff, or the thing falls down for some other reason, and you're left with a pretty ugly mess to restore things, usually resulting in just accepting some data loss.
It takes some discipline to avoid this, because it happens easily by increments.
As for SQLite's performance, besides avoiding network overhead, a lot of people underestimate serialization and deserialization costs. Even though Redis uses a pretty minimalist protocol it adds up. With SQLite a lot of things boil down to an in process memcopy.
Other alternatives may use static files loaded in-memory, but I'm guessing the data is more than you'd want to keep in memory in this case, making SQLite a nice alternative.
Ultimately an RDBMS like SQLite is what you'd get if you started with loading static files into memory and then kept adding the features you need for the most common use cases. Except it's rock solid, very performant, and exceptionally well tested.
I've only worked on one production Rails application in my career (and it did use Redis!), so I'm way out of the loop – is the ecosystem turning against Redis from a business perspective (I know there have been some license changes), or is it a YAGNI situation, or something else?
IIRC we used it mainly with Resque to schedule asynchronous jobs like indexing, transcoding, etc., but it seemed like a neat tool at the time.
The 2024 Rails community survey just came out and Redis is still listed as the top datastore that people use in their apps.
FWIW - we found that while many folks are _using_ Redis in their apps, they're just using it for things like Sidekiq and not actually taking advantage of it for holding things like real time leaderboards, vector db functions, etc. so it's a little fuzzy the actual usage.
Right now, most rails setups with decent traffic will have frontend boxes, a sql db, a KV store (redis or memcached), and a cache store pointed at the kv store, with, annoyingly, very different usage patterns than typical KV store usage, eg for maintaining api quotas or rate limiting.
Disk performance has gotten fast enough and SQL performance has gotten good enough that there's a movement to drop the KV store and split the usages (for traditional KV use and also backing a cache) to the sql db and disk, respectively. Plus new nvme disks are almost as fast and still much cheaper than ram so you can cache more.
In the era of k8s, and redis-as-a-service, though? It’s gonna be “shared memory” on another VM on another rack. At that point, just read & write a damn file off S3, you’ve already abandoned all hope of efficient use of resources.
Every database, relational or non-relational, requires approximately the same level of management and maintenance when you start dealing with non-toy levels of transactions.
The "Fast" part is a little funny. If you don't care about joins, then row inserts and retrievals are pretty damn fast too =)
What makes SQLite exceptionally fast in a server environment is that you do not require a network call to do the query or even retrieve the data. Your remarks about joins and transactions are meaningless once you understand you're just reading stuff from your very own local HD, which is already orders of magnitude faster.
Modern cloud architecture hates local sockets and wants to push everything to the network, but if you care about performance and are using Redis, that’s how you’d favor deploying & connecting to it.
The last time I shipped an application on SQLite, I ended up just using a counter and vacuuming after a large number of write operations.
Doing a vacuum after a large number of deletes might make sense. The only real purpose of vacuum IMO is to recover free space from a database. Vacuum may also optimize certain access patterns for a short while, though I have never tested this, and it would be highly dependent on the queries used. If fragmentation is a bigger concern for you than recovering free space, you can also compute the fragmentation to decide whether to vacuum by using the dbstat table:
https://www.sqlite.org/dbstat.html
Then again, computing this will require accessing most of the database pages I'm guessing, so might take nearly as long as a vacuum. The other gotcha here is that just because db pages appear to be sequential in a file doesn't mean they are sequential on a physical drive, though filesystems do strive for that.
SQLite has pragma commands to tell you the number of total and free db pages. When the percentage of free pages is greater than x% and it's a convenient time, do a vacuum. For a highly volatile db, you can add a table containing this percentage, update it every day, and make your decision based on an average, but IMO it's easier just to check for more than 50% free (or whatever) and do the vacuum.
Vacuums used to be (circa 2019) pretty slow operations, but the SQLite team has sped them up greatly since then. Vacuuming a 3GB SQLite db on a SSD takes less than a minute these days. That's with the db 100% full; with only 50% used pages, it would be considerably faster.
Vacuums are done in a statement transaction, so you don't have to worry about a "half vacuum that runs out of disk space" screwing up your database.
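The "vacuum when more than x% free" heuristic described above can be sketched with SQLite's own pragmas. The `maybe_vacuum` helper and the threshold are illustrative, not a standard API:

```python
import sqlite3

# Build a db, then delete everything so most pages land on the freelist.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE t (x BLOB)")
db.executemany("INSERT INTO t VALUES (zeroblob(?))", [(4096,)] * 500)
db.execute("DELETE FROM t")
db.commit()

def maybe_vacuum(conn, threshold=0.5):
    # PRAGMA page_count / freelist_count report total vs free pages.
    total = conn.execute("PRAGMA page_count").fetchone()[0]
    free = conn.execute("PRAGMA freelist_count").fetchone()[0]
    if total and free / total > threshold:
        conn.execute("VACUUM")
        return True
    return False

did = maybe_vacuum(db)
print(did)  # True: the DELETE freed well over half the pages
```

After the vacuum the freelist is empty, so calling `maybe_vacuum` again is a cheap no-op.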
I disagree with this statement. Surely there is a difference on the effort for "management and maintenance" if your database requires the operation of independent server processes.
Going to extreme examples, do you really believe it makes no difference whether you use SQLite or Oracle?
>According to the benchmarks, Redka is several times slower than Redis.
Still a cool project, don't get me wrong. But this kind of doesn't give me any incentive to switch.
It abandons Redis' in-memory data and sqlite's in-process speed... for what?
Thanks for sharing.
v1:
1) In v1 they had the WAF and Redis on the same server.
2) The client went to the admin panel to set new rules.
3) The rules went to the Redis instance on the same server as the admin panel.
4) Thanks to Redis's internal synchronization mechanism, the rules were propagated to all the Redis instances (which sit locally with each WAF all over the globe).
5) When a new request came to some WAF, the WAF verified the request/IP against the updated Redis rules.

Do I understand v1 correctly? The Redis infrastructure itself was used to spread the new rules?
v2:
1) They deleted the Redis cluster.
2) Every WAF server now has a SQLite db.
3) They built some synchronization mechanism to spread new rules from the admin panel to every server that runs the WAF and SQLite.
4) When a new request comes to some WAF, the WAF verifies the request/IP against the updated SQLite rules. And that is very fast!

Is that the case?
"SQLite does not compete with client/server databases. SQLite competes with fopen()."
If this is uncompressed IPv4 addresses, it's just 4.8 MB; and with some trivial compression (like a 2-level trie), it could be about 2x smaller. Even if it's uncompressed IPv6, that's still just 32 megabytes.
Does Ruby support mmap? If yes, I'd suggest direct IP list. Lots of fun to write, big speedup over sqlite, and zero startup time.
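The "direct IP list" idea is a flat, sorted array of fixed-width integers searched in place with binary search. A minimal sketch (in Python for brevity; in production the buffer would be an mmap'd file, here it's an in-memory stand-in, and all names are illustrative):

```python
import struct

# Sorted IPv4 addresses as big-endian u32s, packed into one flat buffer.
ips = sorted([0x0A000001, 0x0A000005, 0xC0A80101])  # 10.0.0.1, 10.0.0.5, 192.168.1.1
buf = b"".join(struct.pack(">I", ip) for ip in ips)

def contains(buf: bytes, ip: int) -> bool:
    # Classic binary search over 4-byte records, no deserialization step.
    n = len(buf) // 4
    lo, hi = 0, n
    while lo < hi:
        mid = (lo + hi) // 2
        val = struct.unpack_from(">I", buf, mid * 4)[0]
        if val < ip:
            lo = mid + 1
        else:
            hi = mid
    return lo < n and struct.unpack_from(">I", buf, lo * 4)[0] == ip

print(contains(buf, 0x0A000005))  # True
print(contains(buf, 0x0A000002))  # False
```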
But always happy to see a discovery of a better solution. I agree removing the network is a win.
However, Redis runs as an out of process server with marshaling and unmarshaling of data across sockets. SQLite is in process and with a prepared query is basically one library call to a purpose built data access VM. So I’m not sure why it would be hard to believe this cache and TLB friendly setup can beat Redis.
1 - Redis sorted sets - https://redis.io/docs/latest/develop/data-types/sorted-sets/
With that in mind, it's setting higher levels of limiting and doing the math to push that out over many machines/instances/dynos. That helps for things like scraping prevention, etc.
For issues like credential stuffing attacks, you'd want a lower limit but also coupled with mitigations like IP bans, IP reputation, etc. to deal with underlying issue.
But if we are sending deltas to a central server performance isn't critical: there can be a traditional web service to call, without uploading databases.
I have not used Redis myself, but have been using Sqlite more and more over the years.. and found a perfect application I wrote using Sqlite under the hood.
Powerful and convenient database system!
Is it even "worth" using SQLite at this point? What about a configuration file, and straight-up code that works with in-memory data structures?
- Really strong support across multiple platforms (we have clients for most of the major web frameworks)
- Efficiency, sure we have lots of RAM on servers nowadays but on some platforms it's constrained and if you don't have to burn it, we'd just rather not.
- When we started mapping this out, we ended up with something that looked like a JSON format that we were adding indexes to....and then we were re-inventing SQLite.
Edit: reading the docs it looks like it operates the same way, just reading sections of the db from memory instead of disk
See the other response from the article's author.
The client is responsible for checking with our servers and, if rule updates are found, downloading a new database file. To avoid locking and contention issues, these are each uniquely named, and which DB is "current" is just updated.
Note: This is only in "managed" mode. If you'd rather, you can distribute a SQLite database of the rules alongside your app.
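The swap-by-unique-filename pattern described above can be sketched as follows. This is an illustration under my own assumptions (class and path names are invented), not the client's actual implementation: each downloaded database gets its own file, and "current" is just a reference that gets repointed, so no file is ever modified in place under a reader.

```python
import os
import sqlite3
import tempfile

class RuleStore:
    def __init__(self):
        self.conn = None  # "current" database connection

    def swap_in(self, new_db_path):
        # Open the freshly downloaded, uniquely named file, then repoint
        # "current" at it. The old file is never written to, so there is
        # no locking or contention with readers mid-request.
        new_conn = sqlite3.connect(new_db_path)
        old, self.conn = self.conn, new_conn
        if old:
            old.close()

def make_db(version):
    # Simulate downloading a new rules database to a uniquely named file.
    path = os.path.join(tempfile.mkdtemp(), f"rules-{version}.db")
    c = sqlite3.connect(path)
    c.execute("CREATE TABLE rules (id INTEGER, body TEXT)")
    c.execute("INSERT INTO rules VALUES (1, ?)", (f"ruleset v{version}",))
    c.commit()
    c.close()
    return path

store = RuleStore()
store.swap_in(make_db(1))
store.swap_in(make_db(2))  # new uniquely named file replaces the old one
print(store.conn.execute("SELECT body FROM rules").fetchone()[0])  # ruleset v2
```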
Due to the magic of the page cache, the answer to that can be "both".
If the sqlite database is being read often and not being written, the page cache will be valid and reads will pretty much never go to the filesystem.
Not seeing a mention of that in the article.
In the article, we refer to this as "decimal lexical" formatting, where we're taking the IPs and making them integers but actually treating them as strings. We're doing this in both Redis with sorted sets and then in a single table in SQLite.
I was going to explain all this in the article, but it was too long already, so it will be a future blog post.
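A minimal sketch of the "decimal lexical" idea as I understand it from this description: convert each IP to its integer form, then zero-pad it so that string (lexicographic) order matches numeric order. The padding width of 10 digits (enough for any IPv4 integer) and the helper name are my assumptions, not the article's code.

```python
import ipaddress

def decimal_lexical(ip: str) -> str:
    # Integer form of the IP, zero-padded so lexical order == numeric order.
    return str(int(ipaddress.ip_address(ip))).zfill(10)

# Without padding, string comparison breaks on different lengths:
assert str(9) > str(10)  # '9' > '10' lexically
# With padding, string order matches numeric order:
assert decimal_lexical("0.0.0.9") < decimal_lexical("0.0.0.10")
print(decimal_lexical("10.0.0.1"))
```

This is what lets the same values work both as members of a Redis sorted set and as an indexed text column in SQLite.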
But about the formatting of the data: is it completely inherent to the rest of the system / unchangeable? Off the top of my head, I would have guessed that, for example, a bitfield in Redis would have performed better. Did you test any other formats?
... Why not run redis on localhost?
As someone who was a relatively-early adopter of it on real servers and hand-managed VMs (i.e. we also controlled the hardware and host OS for the VMs) for a higher traffic site than most of the ones that think they need auto-scaling cloud shit from day one will ever reach, and was/is very enthusiastic about redis, I have exactly no idea why this is a popular way to use it. Cargo-culting and marketing (how else are you gonna sell it as a service?) are all I can figure as the motivations.
Connecting to it over a network is a handy feature for, like, ancillary things that need to connect to it remotely (maybe for reporting or something?) but as the main or only way of using it? Yeah I don’t get it, you’ve just wiped out a ton of its benefits.
Isn’t “redis to sqlite” effectively what FoundationDB is?