extra_rice on Hacker News

Ask HN: How do I efficiently track record count for big (distributed) DBs?

I've run into this interesting question multiple times, mostly during systems design interviews, but I'm never quite sure what the "best" way to address it is, since unfortunately I've not encountered it in any of my real-world projects. It usually comes up in the context of high-demand items such as limited-time promotional vouchers, concert tickets, limited-stock items, etc. that customers will certainly rush to buy and that will sell out in a relatively short period of time.

My naive solution is to count the number of reservations made (something like an SQL count of all the rows) and subtract it from some constant limit, which can be safely copied across multiple instances of the reservation/ticketing service. However, with millions of rows this may be slow, and doing it for thousands of requests per second could be fairly inefficient. Adding some caching in front of a (probably sharded/distributed) database might help, but because the numbers are updated very quickly, I'm not sure the cache would help that much.
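To make the naive approach concrete, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for the real database. The `reservations` table, the `LIMIT` constant, and the `remaining`/`reserve` helpers are all hypothetical names chosen for illustration, and a single in-memory connection sidesteps the concurrency questions a distributed setup would raise.

```python
import sqlite3

LIMIT = 5  # hypothetical total stock, known to every service instance


def remaining(conn, item_id, limit=LIMIT):
    """Remaining stock = fixed limit minus the number of reservation rows."""
    (taken,) = conn.execute(
        "SELECT COUNT(*) FROM reservations WHERE item_id = ?", (item_id,)
    ).fetchone()
    return limit - taken


def reserve(conn, item_id, customer):
    """Insert a reservation only if stock remains; return True on success."""
    # Safe here because there is a single connection. With concurrent writers,
    # the check and the insert would have to run in one serialized transaction.
    with conn:  # commits on success, rolls back on error
        if remaining(conn, item_id) <= 0:
            return False
        conn.execute(
            "INSERT INTO reservations (item_id, customer) VALUES (?, ?)",
            (item_id, customer),
        )
        return True


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE reservations ("
    "id INTEGER PRIMARY KEY, item_id TEXT, customer TEXT)"
)

# Seven customers compete for five vouchers.
results = [reserve(conn, "voucher", f"customer-{i}") for i in range(7)]
```

Every call to `reserve` pays for a full count of the matching rows, which is the cost I'm worried about at millions of rows and thousands of requests per second.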

One alternative I can think of is to set up a lightweight(?) store (something like Redis perhaps, though I don't have first-hand experience with it) to keep track of the count. This will introduce problems related to eventual consistency, but I think it could work. It can be a durable store, or we can make it semi-durable and just run a count query if we ever need to restore the service. I always ask whether we can tolerate overbooking, because there's a chance it could occur. Strong consistency guarantees will prevent it, but at the cost of performance.
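A rough sketch of the counter idea, assuming the store offers an atomic check-and-decrement: the `StockCounter` class below is a hypothetical in-process stand-in guarded by a lock, not Redis itself (Redis does have atomic commands such as DECR, but nothing here talks to it). `run_rush` is likewise an invented helper that simulates many buyers arriving at once.

```python
import threading


class StockCounter:
    """In-process stand-in for a shared counter store (hypothetical)."""

    def __init__(self, limit):
        self._remaining = limit
        self._lock = threading.Lock()

    def try_reserve(self):
        """Atomically take one unit; return False once stock is exhausted."""
        with self._lock:
            if self._remaining <= 0:
                return False
            self._remaining -= 1
            return True

    def remaining(self):
        with self._lock:
            return self._remaining


def run_rush(limit, buyers):
    """Simulate `buyers` concurrent purchase attempts against `limit` units."""
    counter = StockCounter(limit)
    successes = []

    def buy():
        if counter.try_reserve():
            successes.append(1)  # list.append is thread-safe in CPython

    threads = [threading.Thread(target=buy) for _ in range(buyers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return len(successes), counter.remaining()


sold, left = run_rush(limit=100, buyers=250)
```

Because the check and the decrement happen under one lock, the sketch never oversells. A real shared store would only give the same guarantee if the two steps are a single atomic operation there too, which is where the consistency trade-off comes in.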

Another alternative I can think of is adding a sequence number to the reservation records and querying the max instead. I'm not sure whether this performs better than a count or whether the two are pretty similar in performance.
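A small sketch of the sequence-number idea, again with sqlite3 standing in for the real database and a hypothetical `reservations` table keyed by `seq`. A max over an indexed column can usually be read from the index rather than by scanning rows, but the example also shows a case where the max and the count disagree: any gap in the sequence, here a cancelled reservation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE reservations (seq INTEGER PRIMARY KEY, customer TEXT)"
)
conn.executemany(
    "INSERT INTO reservations (seq, customer) VALUES (?, ?)",
    [(i, f"customer-{i}") for i in range(1, 6)],
)

# A cancellation leaves a gap in the sequence.
conn.execute("DELETE FROM reservations WHERE seq = 3")


def scalar(sql):
    """Run a query that returns a single value and unwrap it."""
    return conn.execute(sql).fetchone()[0]


max_seq = scalar("SELECT MAX(seq) FROM reservations")     # highest number issued
row_count = scalar("SELECT COUNT(*) FROM reservations")  # reservations still held
```

So the max only equals the number of reservations if sequence numbers are never skipped or freed, which would need to be part of the design.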

Any insights or pointers on this?

Ask HN: What are the best resources for learning caching strategies?

There are myriad ways to improve and optimise application performance, but one of the most commonly used is caching. I want to learn more about caching strategies, especially in the context of modern distributed applications. What do you think are the best resources for building a better understanding of this topic?

Thanks!

Ask HN: Where else can I buy O’Reilly eBooks online?

I know it’s actually been a while, but I’ve only recently learned that they stopped selling eBooks online in favour of their subscription service. I do tend to prefer physical books over eBooks, but at the right price, I’d get both. I’m lucky to get access to all their content for free through my current place of employment, but I hate using their mobile app (on both iOS and Android) to read anything, so I still find myself wanting to get copies of their eBooks so I can read them in another app.

I know Amazon Kindle and Google Play Books have them. I prefer to get them through Google Play because there’s a good chance I’ll be allowed to download the eBook from there. I’m curious if there are other places that sell them at competitive prices. I checked InformIT, but they don’t seem to carry them (at least not the books I’m currently looking for).

Ask HN: Do you run databases on Kubernetes?

I'm curious whether development teams prefer to run databases in or out of Kubernetes. For those who do, how do you make it work? What are the key points you think anyone considering the same should think about before going for it? For those who eventually decided against it, what were the main factors in the decision?

I was at KubeCon earlier this year, and the impression I got was that, in general, persistence on Kubernetes is still something of a challenge, and I think that's true given what I see first hand.

I work in a very small team that, sadly, does not have enough expertise in maintaining databases. I mean, we can use them, but we're not at the level where we can make databases sing. I think there are database people somewhere in our organization, but currently none of them are directly involved in our project. We are, at the moment, running MongoDB in our Kubernetes cluster, but we recently ran into some issues with it causing problems for the rest of the deployments. I was wondering if it's time for us to consider moving it elsewhere so Kubernetes doesn't have to worry about it.

Ask HN: What are some of the most utilised patterns for querying large datasets?

I'm currently working on a software project where I need to query datasets that could be very large (maybe hundreds of thousands of records per single context), and then do some computations on the results. Basically, it's finding some sort of "median" from the set, but it could be a bit more complex than that, like finding the smallest of the most common values. My impression is that most modern databases should be able to handle queries like this with some built-in mechanism. However, one of the concerns is that, because the datasets could be very large, queries would end up taking very long. The data being queried is also highly dynamic, so caching may be a little tricky.
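To pin down the kind of computation I mean, here is a minimal in-memory sketch in Python, assuming the values for one context have already been fetched. The `smallest_most_common` helper and the `sample` data are made up for illustration; this is the application-side version of the calculation, not a database-side one.

```python
from collections import Counter
from statistics import median


def smallest_most_common(values):
    """Return the smallest value among those that occur most often.

    Raises ValueError on an empty input, since there is no mode to pick.
    """
    counts = Counter(values)
    top = max(counts.values())
    return min(value for value, count in counts.items() if count == top)


sample = [7, 3, 3, 9, 7, 1, 12]

middle = median(sample)                # middle value of the sorted data
tie_break = smallest_most_common(sample)  # 3 and 7 both occur twice; pick 3
```

Doing this in the application means pulling every value for the context over the wire first, which is exactly what I'd like the database to avoid if it has a built-in way to do it.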

I'm pretty sure this isn't something unique to this project, but I'm interested to know how other practitioners address this kind of situation. Also, to note, while I'm asking this in general terms, it'd be interesting to know how MongoDB users in particular handle this.
