1. CosmosDB has a hardcoded 60-second timeout for queries, so any query that takes longer simply cannot run unless you break it into smaller chunks. This is worse than it sounds, because CosmosDB lacks some basic optimizations that exist in other databases. For example, finding all distinct values of an indexed field required a full scan, which wasn't doable in 60 seconds. Another example: deleting all documents with a specific value in an indexed field - again, not doable in 60 seconds. When deleting or updating multiple documents, we'd write short snippets of code that queried for the ids of all documents that needed to change, then updated or deleted them by id, one by one.
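The query-ids-then-delete-one-by-one workaround can be sketched as follows. The `Container` class here is a hypothetical in-memory stand-in for a CosmosDB container client, not the real SDK; the shape of the two-phase pattern is the point.

```python
# Workaround sketch: a bulk delete can't finish inside the 60-second query
# timeout, so phase 1 runs a cheap indexed query that returns ids only, and
# phase 2 deletes each document individually - each call well under the limit.
# Container is a hypothetical in-memory stand-in for a real container client.

class Container:
    def __init__(self, docs):
        self.docs = {d["id"]: d for d in docs}

    def query_ids(self, field, value):
        # Narrow projection: return ids only, keeping the query small.
        return [i for i, d in self.docs.items() if d.get(field) == value]

    def delete_item(self, doc_id):
        del self.docs[doc_id]

container = Container([
    {"id": "a", "status": "stale"},
    {"id": "b", "status": "fresh"},
    {"id": "c", "status": "stale"},
])

# Phase 1: collect the ids of every document that must go.
stale_ids = container.query_ids("status", "stale")

# Phase 2: delete them by id, one at a time.
for doc_id in stale_ids:
    container.delete_item(doc_id)
```

The same two-phase loop works for bulk updates: fetch ids first, then patch each document individually.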
2. Scaling up and then back down can cause irreversible performance changes, because there's a direct link between the number of provisioned RUs and the number of "physical partitions" the database creates. A new physical partition is created for every 10K RUs or 50GB of data. CosmosDB knows how to split off new physical partitions when scaling up, but doesn't know how to merge them back when scaling down.
Say you have 10 logical partitions on 5 physical partitions, and you're paying for 50K RUs. Each physical partition holds exactly 2 logical partitions and is allocated 10K RUs. Now suppose you temporarily scale the database up to 100K RUs for some reason, so you get 10 physical partitions with one logical partition on each. When you scale back down to 50K RUs, you'll still have 10 physical partitions, each now allocated only 5K RUs. Every logical partition is left with exactly 5K RUs, where before it had 10K RUs shared with one other logical partition.
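The arithmetic above can be sketched in a few lines, assuming the rule already stated: one physical partition per 10K provisioned RUs, with partitions added on scale-up but never merged on scale-down.

```python
import math

RUS_PER_PARTITION = 10_000  # one physical partition per 10K RUs (per the rule above)

def physical_partitions(ru_history):
    """Partition count after a sequence of provisioned-RU settings.

    Partitions are created when scaling up but never merged when scaling
    down, so the count is the maximum ever required along the way.
    """
    count = 0
    for rus in ru_history:
        count = max(count, math.ceil(rus / RUS_PER_PARTITION))
    return count

def rus_per_partition(ru_history):
    # Current RUs are split evenly across however many partitions exist.
    return ru_history[-1] / physical_partitions(ru_history)

# Staying at 50K RUs: 5 physical partitions, 10K RUs each.
assert rus_per_partition([50_000]) == 10_000

# Scaling 50K -> 100K -> 50K: stuck at 10 partitions, only 5K RUs each.
assert rus_per_partition([50_000, 100_000, 50_000]) == 5_000
```

The temporary scale-up permanently halves the RU budget available to each partition.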
3. The allocation of logical partitions to physical partitions is static and hash-based, and there's no control over it. That makes hot logical partitions a performance problem: two hot logical partitions can end up on the same physical partition and starve for resources while other physical partitions sit over-provisioned. Of course, you can spread data across partitions completely at random and hope for the best, but there's a performance penalty for querying multiple logical partitions. Plus, updates/deletes are limited to a single logical partition, so you'd lose the ability to batch update/delete related documents.
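A minimal sketch of why static hash placement hurts: each logical partition key hashes to a fixed physical partition, with no knob to move it. The hash below is an illustrative stand-in (CosmosDB's actual hash is internal), but the failure mode is the same: placement depends only on the key, never on load.

```python
# Static hash-based placement: a key always lands on the same physical
# partition, regardless of how hot it is. The md5-based hash here is a
# stand-in for CosmosDB's internal hash, used purely for illustration.
import hashlib

def physical_for(partition_key, num_physical):
    digest = hashlib.md5(partition_key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_physical

num_physical = 5
placement = {}
for key in ["tenant-%d" % i for i in range(10)]:
    placement.setdefault(physical_for(key, num_physical), []).append(key)

# If the two hottest tenants happen to hash to the same bucket, that
# physical partition is starved while the others sit idle - and because
# placement is a pure function of the key, nothing can move them apart.
```

Randomizing partition keys sidesteps hot spots but, as noted above, costs you cross-partition query performance and single-partition batch updates/deletes.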
4. Index construction is asynchronous and very slow, because it runs on some pool of "free" RUs that scales with the RUs allocated to your collection. It used to take us over 12 hours to build simple indexes on a ~30GB collection. Worse, if you issue multiple index modification commands, they are queued even when they cancel each other out. So issuing a "create index" command, realizing you've made a mistake, then issuing a "drop index" followed by another "create index" is a 24-hour adventure: over the next 12 hours the original index is created, then immediately dropped, then created again. To top it off, there's no visibility into which indexes are actually being used, and the commands for checking the progress of index construction were broken and never worked for us.
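The queued-commands behavior can be modeled as a plain FIFO that never coalesces entries, which is an assumption about the mechanism but matches the observed create/drop/create sequence:

```python
# Sketch of the index-command queue: commands execute strictly in arrival
# order and are never cancelled against each other, so a drop queued behind
# a pending create still pays for the full build first.
from collections import deque

queue = deque(["create idx_a", "drop idx_a", "create idx_a"])
indexes, work_done = set(), []

while queue:
    cmd, name = queue.popleft().split()
    work_done.append(f"{cmd} {name}")  # every command is a full, slow pass
    if cmd == "create":
        indexes.add(name)
    else:
        indexes.discard(name)

# Three expensive passes for what is logically a single "create":
# the index is built, dropped, and built again.
```

A queue that coalesced opposing commands would collapse this to one build; CosmosDB's doesn't.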