I think it's really interesting to look at how the GPU market is evolving. TensorPool [1], for example, is a startup working on lowering GPU inference costs (I'm not affiliated with them).
There was some research on energy consumption a couple of years back [2], but after a brief search I haven't found anything more recent.
I'd be really interested to hear the community's thoughts on energy costs and provisioning spend as usage continues to grow.
[1] https://tensorpool.dev/

[2] GPT-4 energy consumption: https://www.sciencedirect.com/science/article/pii/S2542435123003653