A couple of years ago there was an interesting tidbit at re:Invent about customers moving from DynamoDB to Aurora to save significant costs.[1] The Aurora team made the point that DynamoDB suffers from hotspots despite your best efforts to evenly distribute keys, so you end up overprovisioning, whereas with Aurora you just pay for I/O. And the scalability is great. Plus you get other nice stuff with Aurora, like traditional multi-operation SQL transactions.
It was kind of buried in a presentation from the Aurora team, and the high-level messaging from Amazon was still that NoSQL is the most scalable thing. Aurora was, and still seems to be, positioned against other solutions within the SQL realm. I sort of get it in theory: NoSQL is theoretically infinitely scalable, whereas Aurora is bounded by 15 read replicas and one write master. But in practice these days those limits are huge. I think one write master can handle something like 100K transactions a second.
So, I'm really curious where this has gone in the past couple years if anywhere. Is NoSQL still the best approach?
But as far as the "you end up overprovisioning" because of hotspots thing goes, DynamoDB does offer autoscaling these days, which should alleviate a lot of provisioning-related headaches and save you money compared to static provisioning, from what I understand.
Granted, I don't think I'd want to use Dynamo for anything other than temporary data. Lock-in makes me nervous, and the way it scales up/down really makes it difficult to use for hourly workloads: by the time it scales up we're close to done needing the extra capacity, and then it doesn't scale back down for something like 40 minutes. We set up caps, and our overflow mechanism keeps things from grinding to a halt.
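For reference, the autoscaling mentioned above is configured through the Application Auto Scaling service rather than on the table itself. A minimal sketch with boto3, where the table name, capacity bounds ("caps"), and utilization target are all illustrative assumptions:

```python
# Sketch: DynamoDB autoscaling via Application Auto Scaling (boto3).
# "MyTable", the capacity numbers, and the 70% target are made-up
# illustrative values, not recommendations.
target = {
    "ServiceNamespace": "dynamodb",
    "ResourceId": "table/MyTable",
    "ScalableDimension": "dynamodb:table:ReadCapacityUnits",
    "MinCapacity": 5,
    "MaxCapacity": 500,  # the kind of cap described above
}
policy = {
    "PolicyName": "my-table-read-scaling",
    "ServiceNamespace": "dynamodb",
    "ResourceId": "table/MyTable",
    "ScalableDimension": "dynamodb:table:ReadCapacityUnits",
    "PolicyType": "TargetTrackingScaling",
    "TargetTrackingScalingPolicyConfiguration": {
        # Scale when consumed capacity exceeds 70% of provisioned.
        "TargetValue": 70.0,
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "DynamoDBReadCapacityUtilization"
        },
    },
}

# import boto3
# client = boto3.client("application-autoscaling")
# client.register_scalable_target(**target)
# client.put_scaling_policy(**policy)
```

The scale-down lag described above is inherent to this model: target tracking reacts to CloudWatch metrics after the fact, so bursty hourly workloads can finish before the new capacity arrives.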
The problem they noted isn't lack of autoscaling, it's that you have to provision the entire datastore to accommodate your hottest partition, because each partition only gets a slice of the table's throughput.[0]
[0] total throughput/num shards
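That footnote formula is the whole problem in back-of-envelope form. A quick worked example (all numbers are illustrative assumptions, not documented DynamoDB limits):

```python
# DynamoDB divides provisioned throughput across partitions, so per
# the footnote: per-partition capacity = total throughput / num shards.
provisioned_rcu = 10_000   # table-wide capacity you pay for
num_partitions = 20        # assumption: table has 20 partitions
per_partition_rcu = provisioned_rcu / num_partitions  # 500 RCU each

# If one hot partition actually needs 2,000 RCU, you must raise the
# TABLE-wide number until total / num_partitions >= 2,000 ...
hot_partition_need = 2_000
required_total = hot_partition_need * num_partitions

# ... i.e. pay for 40,000 RCU to serve a 2,000 RCU hotspot.
overprovision_factor = required_total / provisioned_rcu
```

So a single hot key can force you to buy several times the capacity your aggregate traffic actually uses, which is the overprovisioning complaint from the top of the thread.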
We use Aurora or Postgres for key/value unless we need something specific, like multi-regional capacity or really high-end performance. For that we run ScyllaDB.
I'd be really surprised if the client library introduced latency significant enough to be comparable to the network latency between the app server and the database server.
That being said, one advantage of DynamoDB is that it is API based and you can make a true serverless web app where all of the logic is on the client, you use Web Federation for authentication to DynamoDB, and you host your JavaScript files, html and CSS on S3.
Another advantage, until two days ago, was that with most of the data stores on AWS you kept your databases behind a VPC, and if you used Lambda, your Lambda also had to be in a VPC, which increased warm-up time for the Lambda.
Now there is the Read Only Data API for serverless Aurora. You don't have to worry about traditional connection pooling or being in a VPC.
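The Data API turns a query into a plain HTTPS call via the boto3 `rds-data` client, which is why neither a connection pool nor VPC attachment is needed on the caller's side. A hedged sketch; the ARNs, database, and table names below are placeholders:

```python
# Sketch: querying Aurora Serverless through the Data API (boto3
# "rds-data" client). All ARNs and names are placeholder assumptions.
request = {
    "resourceArn": "arn:aws:rds:us-east-1:123456789012:cluster:my-cluster",
    "secretArn": "arn:aws:secretsmanager:us-east-1:123456789012:secret:my-secret",
    "database": "mydb",
    # Named parameters instead of string interpolation.
    "sql": "SELECT id, name FROM users WHERE id = :id",
    "parameters": [{"name": "id", "value": {"longValue": 1}}],
}

# import boto3
# client = boto3.client("rds-data")
# rows = client.execute_statement(**request)["records"]
```

Because credentials come from Secrets Manager via the `secretArn` and the call is signed like any other AWS API request, a Lambda outside the VPC can issue it directly.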
NoSQL has such a niche usage!
* Fast one-time data import without permanently creating a lot of shards (important if you are restoring from a backup)
* Better visibility into what causes throttling (e.g. was it a hot shard? Was it a brief but large burst of traffic?)
* Lower p99.9 latency. It occasionally has huge latency spikes.
* Indexes of more than 2 columns
* A solution for streaming out updates that is better than dynamodb streams
Wish Dynamo had something similar
A way of doing this without expending all that effort is on my wish list too.
What bothers you about dynamodb streams specifically?
There is a new breed of databases that use consensus algorithms to enable global multi-region consistency. Google Spanner and FaunaDB, where I work, are part of this group. I didn't catch anything about the implementation details of DynamoDB transactions in the article. If they are using a consensus approach, expect them to add multi-region consistency soon. If they are using a traditional active/active replication approach, they'll be limited to regional replication.
Uh... this is just not true.
https://googleappengine.blogspot.com/2011/10/app-engine-155-...
Google Cloud Spanner: https://cloud.google.com/spanner/docs/transactions
Google Cloud Firestore: https://firebase.google.com/docs/firestore/manage-data/trans...
Plus if you use Cloud Firestore in Datastore Mode then Google Cloud Datastore would satisfy this requirement as well.
“Multi-document transactions can be used across multiple operations, collections, databases, and documents.”
There is globalization and intermingling happening in technology too.
On a similar thought, a few years back C# gained dynamic typing (the `dynamic` type), while Python/JS got static types (via Python 3 type hints, TypeScript).
https://stackoverflow.com/questions/2690623/what-is-the-dyna...
You are still responsible for implementing a queue or a lock on the items you want to mutate.
That said, this is a huge milestone for DynamoDB: we can now safely mutate multiple items while remaining ACID.
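For the curious, the multi-item ACID mutation in question goes through DynamoDB's TransactWriteItems API. A hedged boto3 sketch, where the table names, keys, and attributes are made up for illustration:

```python
# Sketch: an all-or-nothing write across two tables with DynamoDB's
# TransactWriteItems (boto3). Table/attribute names are illustrative.
transact_items = [
    {
        "Put": {
            "TableName": "Orders",
            "Item": {"pk": {"S": "order#123"}, "status": {"S": "PLACED"}},
            # Abort the whole transaction if the order already exists.
            "ConditionExpression": "attribute_not_exists(pk)",
        }
    },
    {
        "Update": {
            "TableName": "Inventory",
            "Key": {"pk": {"S": "sku#42"}},
            "UpdateExpression": "SET stock = stock - :one",
            # Guard against overselling; also aborts the transaction.
            "ConditionExpression": "stock >= :one",
            "ExpressionAttributeValues": {":one": {"N": "1"}},
        }
    },
]

# import boto3
# client = boto3.client("dynamodb")
# client.transact_write_items(TransactItems=transact_items)
```

Either both writes commit or neither does, which is exactly the multi-item safety being celebrated above; the point about still needing your own queue or lock stands, since the transaction only guards the instant of the write, not a longer read-modify-write workflow.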