Better HN
Deploying Llama3 70B on AWS – GPU Requirement, Cost and Step-by-Step Guide
(slashml.com)
3 points
JJneid
1y ago
5 comments
5 comments
rini17
1y ago
Note that quantized versions of Llama 3 70B can be run on CPU on a much cheaper server. I am personally using it via llama.cpp on a bare-metal 6-core Xeon with 128 GB RAM for ~€50/month.
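A quick back-of-the-envelope check of why this works: quantized weights shrink the model enough to fit in ordinary server RAM. The sketch below estimates the weight footprint; the bits-per-weight figures are rough assumptions (Q4_K_M in llama.cpp averages around 4.5 bits per weight), not measurements.

```python
def weight_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights in GiB."""
    return n_params * bits_per_weight / 8 / 2**30

params = 70e9      # Llama 3 70B parameter count
q4_bits = 4.5      # assumed average for a 4-bit GGUF quant (e.g. Q4_K_M)
fp16_bits = 16     # unquantized half-precision baseline

print(f"4-bit quant weights: ~{weight_gib(params, q4_bits):.0f} GiB")   # fits in 128 GiB RAM
print(f"FP16 weights:        ~{weight_gib(params, fp16_bits):.0f} GiB") # exceeds 128 GiB, hence the GPU requirement
```

The ~37 GiB quantized footprint leaves headroom in 128 GB for the KV cache and OS, which is why a CPU-only box can host the model at all; the FP16 version is what drives the multi-GPU AWS setup the article describes.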
JJneid
OP
1y ago
Is inference speed an issue for you?
rini17
1y ago
Sufficient for fluent conversation.
JJneid
OP
1y ago
Usually quality takes a hit with quantization. Are you getting good responses?
rini17
1y ago
Since llama3, yes, quite satisfying.