An H100 draws about 1000 W including networking gear and can generate 80-150 tokens/s for a 70B model like Llama.
So back of the napkin, for a decently sized 1000-token response you’re talking about 8 s / 3600 s per hour * 1000 W ≈ 2 Wh, which even in California is about $0.001 of electricity.
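If you want to play with the numbers, here is a rough sketch of the same arithmetic. The function names are made up for illustration, and the $0.30/kWh price is my own assumption for a high California rate, not a figure from the estimate above.

```python
def response_energy_wh(tokens: int, tokens_per_s: float, power_w: float) -> float:
    """Energy in watt-hours to generate `tokens` at a given throughput and power draw."""
    seconds = tokens / tokens_per_s
    return power_w * seconds / 3600.0  # W * s -> Wh


def response_cost_usd(energy_wh: float, usd_per_kwh: float) -> float:
    """Electricity cost in dollars for a given energy in watt-hours."""
    return energy_wh / 1000.0 * usd_per_kwh


POWER_W = 1000       # H100 plus networking gear, from the estimate above
USD_PER_KWH = 0.30   # ASSUMPTION: illustrative high-end California price

if __name__ == "__main__":
    # 125 tokens/s corresponds to the 8 s used in the napkin math.
    for tps in (80, 125, 150):
        wh = response_energy_wh(1000, tps, POWER_W)
        usd = response_cost_usd(wh, USD_PER_KWH)
        print(f"{tps:>3} tokens/s: {wh:.2f} Wh, ${usd:.5f}")
```

Across the whole 80-150 tokens/s range the result stays around a tenth of a cent per response under that assumed price, so the conclusion doesn't hinge on the exact throughput.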