https://download.huihoo.com/google/gdgdevkit/DVD1/developers...
Also in Brett Slatkin's "Building Scalable Web Apps with App Engine" (2008)
> You can get higher concurrency by keeping more than one row and updating a random row... just choose a random slot and update it
https://www.oreilly.com/library/view/high-performance-mysql/...
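The sharded-counter trick from that quote can be sketched in Python with SQLite (the table layout and slot count here are illustrative, not from either book):

```python
import random
import sqlite3

# One logical counter split across N slot rows; concurrent writers
# rarely collide because each picks a random slot.
N_SLOTS = 8

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE counters (name TEXT, slot INTEGER, count INTEGER, "
    "PRIMARY KEY (name, slot))"
)
conn.executemany(
    "INSERT INTO counters VALUES (?, ?, 0)",
    [("page_views", s) for s in range(N_SLOTS)],
)

def increment(name: str) -> None:
    # Choose a random slot and update it, per the quote above.
    slot = random.randrange(N_SLOTS)
    conn.execute(
        "UPDATE counters SET count = count + 1 WHERE name = ? AND slot = ?",
        (name, slot),
    )

def read(name: str) -> int:
    # Reads pay the cost of aggregating across all slots.
    (total,) = conn.execute(
        "SELECT SUM(count) FROM counters WHERE name = ?", (name,)
    ).fetchone()
    return total

for _ in range(1000):
    increment("page_views")
print(read("page_views"))  # → 1000
```

The trade is classic: writes spread across rows, reads become a small aggregation.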
For example, my experience in payments has been that transactions debit many different accounts on one side but credit only a handful of accounts on the other, so contention concentrates on a few hot rows and group commit can't fully amortize the fsync cost.
Performance panaceas mostly come down to sharding, but sharding doesn't work well where you need strict updates to balances.
At work, we saw this play out several times in different systems, and decided to do something about it. We took the ledger of an open-source payments switch called Mojaloop, and extracted it as a distributed financial accounting database called TigerBeetle, designed to track financial transactions at scale.
The key performance insight was to dial up group commit. We batch up balance updates so that a single DB query can do on the order of 10k balance updates. We then fsync the batch with a single write before commit, moving the performance needle out from 1k-10k TPS to 1m TPS.
This is the advantage of a purpose-built database that's designed for counting at scale.
More information, including our design decisions, is in the repo here: https://github.com/coilhq/tigerbeetle
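This isn't TigerBeetle's actual implementation, but the batching idea is easy to illustrate with SQLite: ride many balance updates in a single transaction so the commit (and, on a durable database, its fsync) is paid once per batch rather than once per transfer:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE balances (account INTEGER PRIMARY KEY, amount INTEGER)")
conn.executemany("INSERT INTO balances VALUES (?, 0)", [(i,) for i in range(100)])

def apply_batch(transfers):
    # transfers is a list of (account, amount) pairs.
    # All updates ride in ONE transaction, so there is one commit
    # per batch instead of one per transfer.
    with conn:
        conn.executemany(
            "UPDATE balances SET amount = amount + ? WHERE account = ?",
            [(amount, account) for account, amount in transfers],
        )

# 10,000 single-unit transfers, committed once.
apply_batch([(i % 100, 1) for i in range(10_000)])
(total,) = conn.execute("SELECT SUM(amount) FROM balances").fetchone()
print(total)  # → 10000
```

An in-memory SQLite database has no real fsync, of course; the point is only the shape of the amortization.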
UPDATE counters SET count = count + 1
WHERE name = ? AND slot = (
  SELECT slot FROM counters WHERE name = ?
  FOR UPDATE SKIP LOCKED
  LIMIT 1
)

(Postgres doesn't support LIMIT on UPDATE, and the subquery needs to be scoped to the same counter, but that's the idea.)
SKIP LOCKED doesn’t seem to be designed for that purpose: https://www.enterprisedb.com/blog/what-skip-locked-postgresq...
The whole point of the article was that idle-in-transaction waits, caused by another transaction holding a lock on the counter row, create the bottleneck; skipping locked rows with SKIP LOCKED, given enough slots, eliminates this. Randomly selecting a slot can still pick the locked one, causing a wait.
One side benefit of this approach is that getting the final aggregate is cheap, where compacting an append-only log table might not be.
This makes some intuitive sense, though: general purpose databases are expected to be _pretty good_ at handling the case of "add new data" with no other specific conditions, e.g. on other rows or tables' existing data.
I also agree with your last point. Running count() all day on this wouldn't be great, and compaction would take real time. I assumed that most high throughput write scenarios for something like an event count or view count can be a few minutes (or hours, or days) out of date, at which point read caching would be my first stop before clever summation algorithms, which are still pretty cool.
I had the same issue, and I fixed it by inserting an associated 'count_table' row for each hit, then deleting each row once it had been folded into the final count. That actually fixed the issue. I later refactored it so each user or IP had its own 'count_table' row. The downside was that the final total lagged a bit, by 60 seconds or so, until the count_table rows had been summed and deleted, but that was totally acceptable.
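That staging-row approach can be sketched with SQLite (table and counter names are made up; in a real concurrent system the compaction job should delete exactly the rows it counted, e.g. with DELETE ... RETURNING, rather than counting and deleting separately):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE totals (name TEXT PRIMARY KEY, count INTEGER);
CREATE TABLE count_table (name TEXT);  -- one staging row per hit
INSERT INTO totals VALUES ('hits', 0);
""")

def record_hit():
    # Inserts never contend on the shared total row.
    conn.execute("INSERT INTO count_table VALUES ('hits')")

def compact():
    # Periodic job: fold staged rows into the total, then delete them.
    with conn:
        (n,) = conn.execute(
            "SELECT COUNT(*) FROM count_table WHERE name = 'hits'"
        ).fetchone()
        conn.execute(
            "UPDATE totals SET count = count + ? WHERE name = 'hits'", (n,)
        )
        conn.execute("DELETE FROM count_table WHERE name = 'hits'")

for _ in range(500):
    record_hit()
compact()
(total,) = conn.execute("SELECT count FROM totals WHERE name = 'hits'").fetchone()
print(total)  # → 500
```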
I wish I'd thought of this, it's much better and simpler I think haha.
or just buffer the counters in redis and then flush them out
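A sketch of that buffer-and-flush pattern; a plain in-memory Counter stands in for Redis here (a real setup would use redis-py's `incr` on the hot path and drain the keys atomically on flush):

```python
import sqlite3
from collections import Counter

# Stand-in for Redis: increments go to memory, not the database.
buffer = Counter()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE counters (name TEXT PRIMARY KEY, count INTEGER)")

def hit(name):
    buffer[name] += 1  # cheap in-memory increment on the hot path

def flush():
    # Periodic job: move buffered increments to the database in one transaction.
    with conn:
        for name, n in buffer.items():
            conn.execute(
                "INSERT INTO counters VALUES (?, ?) "
                "ON CONFLICT(name) DO UPDATE SET count = count + excluded.count",
                (name, n),
            )
    buffer.clear()

for _ in range(300):
    hit("page_views")
flush()
(total,) = conn.execute(
    "SELECT count FROM counters WHERE name = 'page_views'"
).fetchone()
print(total)  # → 300
```

The trade-off is durability: increments sitting in the buffer are lost if the process (or Redis, without persistence) dies before a flush.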
I have a counters API that does precisely this
Docs are here: cmd+f: 'Counters API now live'
For one use-case it might be ok, although if we expect more counters in the future, it's easier to make it now than refactor later.