Reducing BigQuery Costs (opens in new tab)

(blog.peerdb.io)

76 pointscauchyk2y ago38 comments

38 comments

We've recently been struggling with BigQuery and various other GCP services (i.e. CloudRun and Pub sub) as it feels like utilizing these services can feel like a minefield of gotcha's. With their documentation, and limits/quotas being spread all over the place. It's given us more problems than solutions thus far; albeit that could very well be our fault.

Has anybody else had this experience? Or are we just doing it wrong?

This is not intended to be a rant, just curious.

caffeinated_me2y ago

I've generally found something similar- lots of gotchas, but also some very useful products.

The best way I've found to approach it is to treat GCP as something that has to be evaluated at an individual service level. It's great if you're on one of their expected workflows/golden paths, and you can get lucky with a good fit if you aren't, but they seem to have a lot of unspoken assumptions and limits baked in that might or might not align with your use case.

Disclaimer: My use cases are pretty unusual from talking to our account rep, so this might be over-fitting to weird data.

boredemployee2y ago

At the beginning of my career, I incurred some hundreds of dollars in losses with BigQuery and Google Cloud Function. The problem with these services is that they are easy and intuitive enough for a beginner to use, but a nightmare to maintain.

winrid2y ago

That's nothing. Wait until you incur $150k of Lambda costs in a day!

2 more replies

hilti2y ago

I‘ve had almost the same experience. First I was super impressed how easy it is to get data into BigQuery and retrieve it using their IDE.

But really soon I noticed the slow startup … simple queries took too long (1.2 sec vs milliseconds in a traditional database)

Then I learned a lot about BigQuery views. That helped a little.

At some point I simply wanted to export data. New Google tools needed to be learned: Cloud Storage, Data Flow.

After 18 months of using BigQuery on roughly 850 million rows, I switched back to a traditional database.

dharmab2y ago

I'm glad you learned that lesson for less than $1k. I think everyone who's ever worked with large amounts of data in BigQuery has a story like that, and sometimes the number is six or seven digits.

1 more reply

rdli2y ago

I agree that some GCP services are better than others.

I’ve never used Pub/Sub or Cloud Run, but have been quite happy with BigQuery and GKE.

SOLAR_FIELDS2y ago

BigQuery has more footguns than GKE in my experience, but that’s perhaps because I have a lot more experience with GKE and know how to avoid those footguns. To me at least it’s understandable enough to say More Nodes is More Money but completely non-straightforward to say that this query I wrote is going to scan the data in a new and expensive way. Am I doing it wrong?

2 more replies

danpalmer2y ago

At my previous workplace we had a mix of bare metal (most services), AWS (one service), Digital Ocean (misc), and GCP (BigQuery), and eventually moved almost entirely onto GCP, retaining just a bit of Digital Ocean stuff.

We found that all of these had significant caveats that required careful planning. We had a few instances of runaway AWS costs due to basically not knowing enough about AWS and we had to be careful to only use the "good" AWS products, Digital Ocean never had runaway costs but they did keep turning off production services because our use-case was not one they were familiar with (dev machines, off-site backups). Bare metal was a minefield, we found we couldn't reliably run Prometheus because it ate SSDs. As for GCP, it did require understanding the pricing and it was possible to shoot yourself in the foot with things, but no more than anything else.

There are going to be gotchas everywhere. Overall we had a great experience with GCP, to the point that the company has remained on GCP post-acquisition by another company who were mostly on Azure.

likis2y ago

We have been using all those GCP products and more without any significant problems.

But I do agree, there are some gotchas. PubSub examples: Duplicated messages, shitty DLQ implementation (in my opinion), some developers had improper error handling which lead to to infinite resends of messages (because they nacked message on error), etc..

However, I think the scaling and setup weighs up for all of that. You just need to specify a topic and subscription, and then you don't really have to care about resources or scaling at all, and that is SUPER nice. Also, PubSub is stupidly cheap in comparison to any other similar product, at least that I know of.

dharmab2y ago

Yes, I've found that you need to scrutinize the documentation, quotas, SKUs and billing statements quite closely, and you need to test everything before you run production at scale. I've seen unexpected billing due to an operation or resource using a different SKU than expected which didn't qualify for an account's discount, for example.

antoniojtorres2y ago

Very much. We ended up writing our own queries to try and figure out where the costs were coming from in BQ. Ultimately we decided to offload as much as possible to a self-managed ClickHouse cluster.

nikau2y ago

Yes, try and find the pause button for a push pub sub for example...

natnat2y ago

We use a kind of funky BigQuery setup at my shop to reduce our spend, and I think it's based on a quirk in BigQuery pricing that Google doesn't explicitly recommend.

BigQuery has two completely separate pricing models: on-demand, and slot-based. The two models bill on completely separate things:

* On-demand pricing bills per GB of data processed by your query.

* Slot-based (or editions) pricing allocates your project a number of CPU/memory slots, and you pay per CPU-second.

You can find the costs each query would have by looking at the total_bytes_billed and total_slot_ms columns in the INFORMATION_SCHEMA.JOBS_BY_ORGANIZATION table, and multiplying those values by the slot-ms cost (total_slot_ms * 0.01111) and the bytes-billed cost (total_bytes_billed * 0.0000059). Then you can go through your queries and allocate them to either on-demand or slot-based pricing, depending on which is cheaper.

Usually slot-based is cheaper, but queries that do a lot of internal joins can have really huge CPU costs but cost very little in on-demand if they're not reading a lot of bytes.

Somewhat annoyingly, these billing models are configured at a per-project level; you can't switch between the two of them in a single project. Fortunately, you can query tables from other projects easily.

coreyp_12y ago

Rule #1. BQ is not a standard database. If you use it like one, it will cost a fortune.

Rule #2. BQ is amazing for being able to churn through and analyze massive amounts of data, and can very well be the best option in some use cases.

Rule #3. Letting "just anyone" run queries is as dangerous as casually handing a credit card to your drug-addicted cousin. Just wait until you get the bill!

Rule #4: Partition and cluster your data wisely. You don't have indexes.

Rule #5: Duplicate data. Throw all of the normal forms out the window. Storage is cheap, computation is expensive.

Rule #6: BQ is not meant to be used like MySQL. It's "spin up" time is too slow, but you would be hard-pressed to beat its performance on truly large data sets.

My perspective: One of our customers has a database growing by 17 gigs a day. One of them. There's several on the same scale. Yes, it's necessary. Another instance: One of our customers spent $8k in one month because limits were not properly placed on the account and we didn't catch it until the bill came. We monitor better now. A different instance: We had a dev trying to optimize a query, and they spent $250 in queries to get the cost down from $50/query to $15/query. Most of the time, though, our queries are only pennies.

Now that I've written all of this out, I feel like I need to record a video about it. There's not a lot of BQ info aside from the marketing fluff put out by "teh Google".

saisrirampur2y ago

OP here. 100% agreed on your analysis. Thanks for chiming in. Coming from the Postgres world, this was very counter intuitive for me. I am still not convinced if a database should charge 1000s of $s due to lack of an index (cluster). It could either create the index automatically or explicitly (on the face) warn the user that this can be expensive or else slow.

ethbr12y ago

BigQuery seems to suffer from being overly internal Googly.

A bizarre conversation I witnessed between the BigQuery team and my company at the time (a major customer):

Company: "We need to be able to see our consumption and a breakdown of our bill"

BQ team: "Oh, yeah. We can see how that would be useful. We should probably build that..."

Like, this was a GA product without any thought given to self-serve billing visibility.

I realize billing is usually the last thing bolted on, but I'd expect some basics to be in place before the product ships.

tobad3572y ago

We use BQ quite extensively there are a number of billing tuning options which are not that well documented.

1. for some it will make sense to move to pricing based on CPU time per query vs billing on scanned TB of data. This can be done through the commits in UI.

2. there is option to have storage billed on logical or physical bytes. If you have data with a lot of duplication (enums, customer ids etc) then physical billing can be a lot better option. Last I looked this was only available through CLI setting in dataset and you may need to ask Google to include you in their beta. We lowered our billing with 30% for storage.

I try to keep an eye on GCP release notes to find things like the physical vs logical billing.

ckdarby2y ago

Use BQ to crunch the larger set into a smaller subset that you need and ram that into PG/MySQL.

Used this to power a +$30M revenue affiliate platform tracking.

varshithr2y ago

Thanks for these rules, as a budding engineer, this is very insightful. Will look forward to your video.

andrewstuart2y ago

The Cloud is where Moore’s Law goes to die.

Ditch the cloud. Go buy a bunch of fast computers. You’d be surprised at how fast modern computers are.

Re-learn the ancient art of running your own systems…. careful, all the naysayers will tell you that no company can run their own systems, to try would be crazy!

larodi2y ago

Absolutely! Best way to optimize the costs of this abomination-of-storage called BigQuery from that abomination-of-tech company is to actually NOT use it. Postgres has pretty much everything you'd get from BigQuery if not more. Put on a proper hardware with proper tuning it becomes unbelievably fast. Much faster than BigQuery jobs for sure.

jojobas2y ago

So far the best solution I've seen to save on BigQuery costs is to deploy your own ClickHouse.

It has its own gotchas and quirks, but cost-wise it's not even comparable.

tbragin2y ago

Disclaimer: I work at ClickHouse

Thank you! Glad it helped you.

ClickHouse is in particular well-optimized to take on real-time analytical workloads from BigQuery - both functionally and from a cost perspective.

For anyone wanting to try it yourself, there guide to follow when syncing or migrating workloads from BigQuery to ClickHouse, with some benchmarks and back-of-the-envelope cost comparisons: https://clickhouse.com/blog/clickhouse-bigquery-migrating-da...

ople2y ago

I have had the same experience. I was constantly bumping into unexpected limitations. Moving to CH felt like the opposite, with many more ”Wow, I didn’t expect this to be possible but it is” experiences.

There is a place for BQ but it is good to set expectations correctly and also look at the constraints. They are sometimes not obvious. The docs do helpfully outline the limitations, for example:

- Materialized views: https://cloud.google.com/bigquery/docs/materialized-views-in... - Indexes: https://cloud.google.com/bigquery/docs/search-intro#limitati...

ruslandanilin2y ago

We are working on a product aimed at enhancing the cost efficiency of cloud data warehouses, particularly for users of platforms like Google BigQuery. The core features are intelligent clustering suggestions, automatic materialized view creation, cost monitoring/alerts and more. We're in the phase of seeking potential users who can provide valuable feedback and help validate our idea. If this strikes a chord with your needs or interests, we'd be thrilled to connect with you. Please write me at rdanilin@gmail.com Thanks!

jakozaur2y ago

Some observation:

1. Google Big Query is super powerful, but it's straightforward to shoot yourself in the foot and end up with a huge bill. I would agree that Big Query shines for advanced users. For regular, it's better to use tech that can't be misused by business analytics that just try to get their job done with BI and SQL.

2. I wish a lot of tooling provided better defaults and suggestions to use it efficiently.

intesars2y ago

We paid $14k/mo on GCP's VertexAI :)

garciasn2y ago

BQ works better and costs less with clustered and partitioned tables.

Who would have thought?!

nine_k2y ago

"Any recent graduate can come up with an architecture better than the current one. What takes a senior engineer is to transition to a better architecture while the business is continuing to operate."

klodolph2y ago

And it takes a staff engineer to know which system to refactor.

j / k navigate · click thread line to collapse

38 comments

stairlane2y ago

Has anybody else had this experience? Or are we just doing it wrong?

This is not intended to be a rant, just curious.

caffeinated_me2y ago

I've generally found something similar- lots of gotchas, but also some very useful products.

Disclaimer: My use cases are pretty unusual from talking to our account rep, so this might be over-fitting to weird data.

boredemployee2y ago

winrid2y ago

That's nothing. Wait until you incur $150k of Lambda costs in a day!

2 more replies

hilti2y ago

I‘ve had almost the same experience. First I was super impressed how easy it is to get data into BigQuery and retrieve it using their IDE.

But really soon I noticed the slow startup … simple queries took too long (1.2 sec vs milliseconds in a traditional database)

Then I learned a lot about BigQuery views. That helped a little.

At some point I simply wanted to export data. New Google tools needed to be learned: Cloud Storage, Data Flow.

After 18 months of using BigQuery on roughly 850 million rows, I switched back to a traditional database.

dharmab2y ago

I'm glad you learned that lesson for less than $1k. I think everyone who's ever worked with large amounts of data in BigQuery has a story like that, and sometimes the number is six or seven digits.

1 more reply

rdli2y ago

I agree that some GCP services are better than others.

I’ve never used Pub/Sub or Cloud Run, but have been quite happy with BigQuery and GKE.

SOLAR_FIELDS2y ago

2 more replies

danpalmer2y ago

There are going to be gotchas everywhere. Overall we had a great experience with GCP, to the point that the company has remained on GCP post-acquisition by another company who were mostly on Azure.

likis2y ago

We have been using all those GCP products and more without any significant problems.

dharmab2y ago

antoniojtorres2y ago

Very much. We ended up writing our own queries to try and figure out where the costs were coming from in BQ. Ultimately we decided to offload as much as possible to a self-managed ClickHouse cluster.

nikau2y ago

Yes, try and find the pause button for a push pub sub for example...

natnat2y ago

We use a kind of funky BigQuery setup at my shop to reduce our spend, and I think it's based on a quirk in BigQuery pricing that Google doesn't explicitly recommend.

BigQuery has two completely separate pricing models: on-demand, and slot-based. The two models bill on completely separate things:

* On-demand pricing bills per GB of data processed by your query.

* Slot-based (or editions) pricing allocates your project a number of CPU/memory slots, and you pay per CPU-second.

Usually slot-based is cheaper, but queries that do a lot of internal joins can have really huge CPU costs but cost very little in on-demand if they're not reading a lot of bytes.

coreyp_12y ago

Rule #1. BQ is not a standard database. If you use it like one, it will cost a fortune.

Rule #2. BQ is amazing for being able to churn through and analyze massive amounts of data, and can very well be the best option in some use cases.

Rule #3. Letting "just anyone" run queries is as dangerous as casually handing a credit card to your drug-addicted cousin. Just wait until you get the bill!

Rule #4: Partition and cluster your data wisely. You don't have indexes.

Rule #5: Duplicate data. Throw all of the normal forms out the window. Storage is cheap, computation is expensive.

Rule #6: BQ is not meant to be used like MySQL. It's "spin up" time is too slow, but you would be hard-pressed to beat its performance on truly large data sets.

Now that I've written all of this out, I feel like I need to record a video about it. There's not a lot of BQ info aside from the marketing fluff put out by "teh Google".

saisrirampur2y ago

ethbr12y ago

BigQuery seems to suffer from being overly internal Googly.

A bizarre conversation I witnessed between the BigQuery team and my company at the time (a major customer):

Company: "We need to be able to see our consumption and a breakdown of our bill"

BQ team: "Oh, yeah. We can see how that would be useful. We should probably build that..."

Like, this was a GA product without any thought given to self-serve billing visibility.

I realize billing is usually the last thing bolted on, but I'd expect some basics to be in place before the product ships.

tobad3572y ago

We use BQ quite extensively there are a number of billing tuning options which are not that well documented.

1. for some it will make sense to move to pricing based on CPU time per query vs billing on scanned TB of data. This can be done through the commits in UI.

I try to keep an eye on GCP release notes to find things like the physical vs logical billing.

ckdarby2y ago

Use BQ to crunch the larger set into a smaller subset that you need and ram that into PG/MySQL.

Used this to power a +$30M revenue affiliate platform tracking.

varshithr2y ago

Thanks for these rules, as a budding engineer, this is very insightful. Will look forward to your video.

andrewstuart2y ago

The Cloud is where Moore’s Law goes to die.

Ditch the cloud. Go buy a bunch of fast computers. You’d be surprised at how fast modern computers are.

Re-learn the ancient art of running your own systems…. careful, all the naysayers will tell you that no company can run their own systems, to try would be crazy!

larodi2y ago

jojobas2y ago

So far the best solution I've seen to save on BigQuery costs is to deploy your own ClickHouse.

It has its own gotchas and quirks, but cost-wise it's not even comparable.

tbragin2y ago

Disclaimer: I work at ClickHouse

Thank you! Glad it helped you.

ClickHouse is in particular well-optimized to take on real-time analytical workloads from BigQuery - both functionally and from a cost perspective.

ople2y ago

There is a place for BQ but it is good to set expectations correctly and also look at the constraints. They are sometimes not obvious. The docs do helpfully outline the limitations, for example:

- Materialized views: https://cloud.google.com/bigquery/docs/materialized-views-in... - Indexes: https://cloud.google.com/bigquery/docs/search-intro#limitati...

ruslandanilin2y ago

jakozaur2y ago

Some observation:

2. I wish a lot of tooling provided better defaults and suggestions to use it efficiently.

intesars2y ago

We paid $14k/mo on GCP's VertexAI :)

garciasn2y ago

BQ works better and costs less with clustered and partitioned tables.

Who would have thought?!

nine_k2y ago

"Any recent graduate can come up with an architecture better than the current one. What takes a senior engineer is to transition to a better architecture while the business is continuing to operate."

klodolph2y ago

And it takes a staff engineer to know which system to refactor.

j / k navigate · click thread line to collapse