anonzzzies 3mo ago

Is that true? Because that's indeed FAR less than I thought. That would definitely make me worry a lot less about energy consumption (not that I would go and consume more, but I wouldn't feel guilty, I guess).

derekdahmer 3mo ago

An H100 uses about 1000 W including networking gear and can generate 80-150 tokens/s for a 70B model like Llama.

So, back of the napkin: a decently sized 1000-token response takes about 8 s at roughly 125 tokens/s, so you’re talking about 8 s / 3600 s/h × 1000 W ≈ 2.2 Wh, which even in California is about $0.001 of electricity.
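The napkin math above can be sketched in a few lines of Python. The 1000 W draw, the 1000-token response, and the 80-150 tokens/s range come from the comment. The $0.30/kWh price is an assumed round figure for California residential rates, and the function names are made up for illustration.

```python
def energy_wh(power_w: float, tokens: int, tokens_per_s: float) -> float:
    """Energy in watt-hours to generate `tokens` at a steady power draw."""
    seconds = tokens / tokens_per_s
    return power_w * seconds / 3600.0  # 3600 seconds per hour


def cost_usd(wh: float, price_per_kwh: float) -> float:
    """Electricity cost in dollars for `wh` watt-hours."""
    return wh / 1000.0 * price_per_kwh


POWER_W = 1000.0      # H100 plus networking gear, per the comment
TOKENS = 1000         # "decently sized" response
PRICE_PER_KWH = 0.30  # ASSUMPTION: rough California rate, not from the thread

# Low end, midpoint-ish, and high end of the quoted throughput range.
for tps in (80, 125, 150):
    wh = energy_wh(POWER_W, TOKENS, tps)
    usd = cost_usd(wh, PRICE_PER_KWH)
    print(f"{tps:>3} tokens/s: {wh:.2f} Wh, ${usd:.5f}")
```

Across the quoted 80-150 tokens/s range this works out to roughly 1.9 to 3.5 Wh per response, so the conclusion holds at either end: a fraction of a cent per response, assuming a single unbatched request.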

pshc 3mo ago

With batched parallel requests, the energy per response scales down further. Even a MacBook M3 on battery power can do inference quickly and efficiently. Large-scale training is the power hog.
