A couple of years ago there was an interesting tidbit at re:Invent about customers moving from DynamoDB to Aurora to save significant costs.[1] The Aurora team made the point that DynamoDB suffers from hotspots despite your best efforts to evenly distribute keys, so you end up overprovisioning, whereas with Aurora you just pay for I/O. And the scalability is great. Plus you get other nice stuff with Aurora, like traditional multi-operation SQL transactions.
It was kind of buried in a presentation from the Aurora team, and the high-level messaging from Amazon was still that NoSQL is the most scalable thing. Aurora was, and still seems to be, positioned against other solutions within the SQL realm. I sort of get it in theory: NoSQL is theoretically infinitely scalable, whereas Aurora is bounded by 15 read replicas and one write master. But in practice these days those limits are huge. I think one write master can handle something like 100K transactions a second.
So, I'm really curious where this has gone in the past couple years if anywhere. Is NoSQL still the best approach?
But as far as the "you end up overprovisioning" because of hotspots thing goes, DynamoDB does offer autoscaling these days, which should alleviate a lot of provisioning-related headaches and save you money compared to static provisioning, from what I understand.
Granted, I don't think I'd want to use Dynamo for anything other than temporary data. Lock-in makes me nervous, and the way it scales up/down really makes it difficult to use for hourly workloads: by the time it scales up we're close to done needing the extra capacity, and then it doesn't scale back down for something like 40 minutes. We set up caps, and our overflow mechanism keeps things from grinding to a halt.
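For reference, the autoscaling mentioned above is configured through the Application Auto Scaling service rather than on the table itself. A minimal sketch with boto3, where the table name, capacity bounds ("caps"), and utilization target are all illustrative assumptions:

```python
# Sketch: DynamoDB autoscaling via Application Auto Scaling (boto3).
# "MyTable", the capacity numbers, and the 70% target are made-up
# illustrative values, not recommendations.
target = {
    "ServiceNamespace": "dynamodb",
    "ResourceId": "table/MyTable",
    "ScalableDimension": "dynamodb:table:ReadCapacityUnits",
    "MinCapacity": 5,
    "MaxCapacity": 500,  # the kind of cap described above
}
policy = {
    "PolicyName": "my-table-read-scaling",
    "ServiceNamespace": "dynamodb",
    "ResourceId": "table/MyTable",
    "ScalableDimension": "dynamodb:table:ReadCapacityUnits",
    "PolicyType": "TargetTrackingScaling",
    "TargetTrackingScalingPolicyConfiguration": {
        # Scale when consumed capacity exceeds 70% of provisioned.
        "TargetValue": 70.0,
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "DynamoDBReadCapacityUtilization"
        },
    },
}

# import boto3
# client = boto3.client("application-autoscaling")
# client.register_scalable_target(**target)
# client.put_scaling_policy(**policy)
```

The scale-down lag described above is inherent to this model: target tracking reacts to CloudWatch metrics after the fact, so bursty hourly workloads can finish before the new capacity arrives.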
The problem they noted isn't lack of autoscaling, it's that you have to provision the entire datastore to accommodate your hottest partition, because each partition only gets a slice of the table's throughput.[0]
[0] total throughput/num shards
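That footnote formula is the whole problem in back-of-envelope form. A quick worked example (all numbers are illustrative assumptions, not documented DynamoDB limits):

```python
# DynamoDB divides provisioned throughput across partitions, so per
# the footnote: per-partition capacity = total throughput / num shards.
provisioned_rcu = 10_000   # table-wide capacity you pay for
num_partitions = 20        # assumption: table has 20 partitions
per_partition_rcu = provisioned_rcu / num_partitions  # 500 RCU each

# If one hot partition actually needs 2,000 RCU, you must raise the
# TABLE-wide number until total / num_partitions >= 2,000 ...
hot_partition_need = 2_000
required_total = hot_partition_need * num_partitions

# ... i.e. pay for 40,000 RCU to serve a 2,000 RCU hotspot.
overprovision_factor = required_total / provisioned_rcu
```

So a single hot key can force you to buy several times the capacity your aggregate traffic actually uses, which is the overprovisioning complaint from the top of the thread.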
We use Aurora or Postgres for key/value unless we need something specific, like multi-regional capacity or really high-end performance. For that we run ScyllaDB.
I'd be really surprised if the client library introduced latency significant enough to be comparable to the network latency between the app server and the database server.
That being said, one advantage of DynamoDB is that it is API based and you can make a true serverless web app where all of the logic is on the client, you use Web Federation for authentication to DynamoDB, and you host your JavaScript files, html and CSS on S3.
Another advantage, until two days ago, was that with most of the data stores on AWS you kept your databases behind a VPC, and if you used Lambda, your Lambda also had to be in a VPC, which increased warm-up time for the Lambda.
Now there is the Read Only Data API for serverless Aurora. You don't have to worry about traditional connection pooling or being in a VPC.
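The Data API turns a query into a plain HTTPS call via the boto3 `rds-data` client, which is why neither a connection pool nor VPC attachment is needed on the caller's side. A hedged sketch; the ARNs, database, and table names below are placeholders:

```python
# Sketch: querying Aurora Serverless through the Data API (boto3
# "rds-data" client). All ARNs and names are placeholder assumptions.
request = {
    "resourceArn": "arn:aws:rds:us-east-1:123456789012:cluster:my-cluster",
    "secretArn": "arn:aws:secretsmanager:us-east-1:123456789012:secret:my-secret",
    "database": "mydb",
    # Named parameters instead of string interpolation.
    "sql": "SELECT id, name FROM users WHERE id = :id",
    "parameters": [{"name": "id", "value": {"longValue": 1}}],
}

# import boto3
# client = boto3.client("rds-data")
# rows = client.execute_statement(**request)["records"]
```

Because credentials come from Secrets Manager via the `secretArn` and the call is signed like any other AWS API request, a Lambda outside the VPC can issue it directly.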
NoSQL has such a niche usage!
* Fast one-time data import without permanently creating a lot of shards (important if you are restoring from a backup)
* Better visibility into what causes throttling (e.g. was it a hot shard? Was it a brief but large burst of traffic?)
* Lower p99.9 latency. It occasionally has huge latency spikes.
* Indexes of more than 2 columns
* A solution for streaming out updates that is better than dynamodb streams
Wish Dynamo had something similar
A way of doing this without expending all that effort is on my wish list too.
What bothers you about dynamodb streams specifically?
There is a new breed of databases that use consensus algorithms to enable global multi-region consistency. Google Spanner and FaunaDB, where I work, are part of this group. I didn't catch anything about the implementation details of DynamoDB transactions in the article. If they are using a consensus approach, expect them to add multi-region consistency soon. If they are using a traditional active/active replication approach, they'll be limited to regional replication.
Uh... this is just not true.
https://googleappengine.blogspot.com/2011/10/app-engine-155-...
Google Cloud Spanner: https://cloud.google.com/spanner/docs/transactions
Google Cloud Firestore: https://firebase.google.com/docs/firestore/manage-data/trans...
Plus if you use Cloud Firestore in Datastore Mode then Google Cloud Datastore would satisfy this requirement as well.
“Multi-document transactions can be used across multiple operations, collections, databases, and documents.”
There is globalization and intermingling happening in technology too.
On a similar thought, a few years back C# gained dynamic typing (the `dynamic` type), while Python/JS got static types (via Python 3 type hints, TypeScript).
https://stackoverflow.com/questions/2690623/what-is-the-dyna...
You are still responsible for implementing a queue or a lock on the items you want to mutate.
That said, this is a huge milestone for DynamoDB: we can now safely mutate multiple items while remaining ACID.
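For the curious, the multi-item ACID mutation in question goes through DynamoDB's TransactWriteItems API. A hedged boto3 sketch, where the table names, keys, and attributes are made up for illustration:

```python
# Sketch: an all-or-nothing write across two tables with DynamoDB's
# TransactWriteItems (boto3). Table/attribute names are illustrative.
transact_items = [
    {
        "Put": {
            "TableName": "Orders",
            "Item": {"pk": {"S": "order#123"}, "status": {"S": "PLACED"}},
            # Abort the whole transaction if the order already exists.
            "ConditionExpression": "attribute_not_exists(pk)",
        }
    },
    {
        "Update": {
            "TableName": "Inventory",
            "Key": {"pk": {"S": "sku#42"}},
            "UpdateExpression": "SET stock = stock - :one",
            # Guard against overselling; also aborts the transaction.
            "ConditionExpression": "stock >= :one",
            "ExpressionAttributeValues": {":one": {"N": "1"}},
        }
    },
]

# import boto3
# client = boto3.client("dynamodb")
# client.transact_write_items(TransactItems=transact_items)
```

Either both writes commit or neither does, which is exactly the multi-item safety being celebrated above; the point about still needing your own queue or lock stands, since the transaction only guards the instant of the write, not a longer read-modify-write workflow.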