For all others with burst workloads, training in the cloud can make sense, but that has been the case for a while already.
... uh, you sure about that? Let me go check on the 3 models I have concurrently training for my organization on 3 separate GPU servers (all 2-year-old hardware, to boot) that have been running continuously for the past 36 hours. It pretty much works out to 24/7 training for the past several months.
And BTW, this is massively cheaper for us than training in the cloud.
Pretraining BERT takes 44 minutes on 1024 V100 GPUs [1]
This requires dedicated instances, since shared instances won't reach peak performance, if only because of the "noisy neighbour" effect.
At GCP, a V100 costs $2.48/h [2]; rounding the 44 minutes up to one billed hour per GPU, Microsoft's experiment would've cost 1024 * $2.48 = $2,539.52.
Smaller providers offer the same GPU at just $1.375/h [3], so a reasonable lower limit would be around $1,408.
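As a quick sanity check on those cloud figures (a sketch; the per-hour rates are hard-coded from [2] and [3], and I'm assuming each of the 1024 GPUs is billed for a full hour, i.e. 44 min rounded up):

```python
# Back-of-the-envelope cloud cost for the 44-minute, 1024-GPU BERT run [1].
gpus = 1024
billed_hours_per_gpu = 1.0  # 44 min rounded up to one billed hour

gcp_rate = 2.48      # $/GPU-hour at GCP [2]
budget_rate = 1.375  # $/GPU-hour at a smaller provider [3]

gcp_cost = gpus * billed_hours_per_gpu * gcp_rate
budget_cost = gpus * billed_hours_per_gpu * budget_rate
print(gcp_cost, budget_cost)  # 2539.52 1408.0
```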
For a one-off BERT pretraining, provided highly optimised workflows and distributed training scripts are already at hand, renting GPUs seems to be the way to go.
The cost of V100-equivalent end-user hardware (we don't need to run in a datacentre, dedicated workstations will do) is about $6,000 (e.g. a Quadro RTX 6000), provided you don't need double precision. The card has equal FP32 performance, a lower TGP, and 24 GB of VRAM, which sits between the 16 GB and 32 GB versions of the V100.
Workstation hardware to go with such a card will cost about $2,000, so $8,000 is a reasonable cost estimate. The cost of electricity varies between regions, but in the EU the average non-household price is about 0.13€/kWh [4].
Pretraining BERT therefore costs an estimated 1024 h * 0.13€/kWh * 0.5 kW ≈ 67€ in electricity (power consumption estimated from the TGP plus the typical draw of an Intel Xeon workstation, from my own measurements when training models).
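The electricity estimate, spelled out (a sketch; the 0.5 kW full-system draw is my own measurement-based assumption, and the 1024 GPU-hours again round the 44-minute run up to one hour per GPU):

```python
# Electricity cost for one pretraining's worth of compute (1024 GPU-hours)
# run sequentially on a single local workstation.
gpu_hours = 1024          # 1024 GPUs x 44 min, rounded up to 1 h each
price_eur_per_kwh = 0.13  # EU average non-household price [4]
draw_kw = 0.5             # assumed full-system draw while training

cost_eur = gpu_hours * price_eur_per_kwh * draw_kw
print(round(cost_eur, 2))  # 66.56
```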
In order to get the break-even point we can use the following equation: t * $1,408 = $8,000 + t * $69 (the ~67€ of electricity per run converted to dollars), which gives t = 8,000/(1,408 - 69) ≈ 5.97, so t > 5.
In short, if you pretrain BERT 6 times, you save money by BUYING a workstation and running it locally over renting cloud GPUs from a reasonably cheap provider.
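The break-even calculation as a sketch, using the numbers above (the $69 per-run electricity figure assumes a rough euro-to-dollar conversion of the ~67€ estimate):

```python
import math

# Break-even between renting cloud GPUs and buying a workstation.
cloud_per_run = 1408        # $ per pretraining at the cheaper provider [3]
workstation = 8000          # $ one-time hardware cost
electricity_per_run = 69    # $ per pretraining, ~67 euros converted (assumption)

# Solve t * 1408 = 8000 + t * 69  ->  t = 8000 / (1408 - 69)
break_even = workstation / (cloud_per_run - electricity_per_run)
runs_to_save = math.ceil(break_even)
print(round(break_even, 2), runs_to_save)  # 5.97 6
```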
This example only concerns BERT, but you can use the same reasoning for any model that you know the required compute time and VRAM requirements of.
This only concerns training, too - inference is a different can of worms entirely.
[1] https://www.deepspeed.ai/news/2020/05/27/fastest-bert-traini...
[2] https://cloud.google.com/compute/gpus-pricing
[3] https://www.exoscale.com/syslog/new-tesla-v100-gpu-offering/
[4] https://ec.europa.eu/eurostat/statistics-explained/index.php...