https://apxml.com/models/glm-5
To run GLM-5 you need access to a large number of consumer-grade GPUs, or multiple data-center-class GPUs.
>They will likely get cheaper to run over time as well (better hardware).
Unless they magically solve the problem of chip scarcity, I don't see this happening. VRAM is king, and to get more of it you have to pay a lot more. Take the RTX 3090 as an example. The card is ~6 years old now, yet it still runs you around $1.3k. If you wanted to run the GLM-5 I4 quantization (the lowest listed in the link above) with a 32k context window, you would need *32 RTX 3090s*. That's roughly $42k spent on obsolete silicon. If you wanted to run this on newer hardware, you could reasonably expect to double that number.
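If you want to sanity-check that figure, here's a rough back-of-the-envelope sketch. The parameter count and KV-cache size below are placeholder assumptions (picked so the output lands near the 32-card, ~$42k estimate above), not GLM-5's actual specs; plug in the numbers from the linked page instead.

```python
import math

# Rough VRAM / cost estimate for running a large model on RTX 3090s.
# Model-specific numbers are placeholder assumptions, not GLM-5's
# real specs -- substitute the figures from the linked page.
params_billion  = 1300   # assumed total parameter count (placeholder)
bytes_per_param = 0.5    # ~4-bit quantization
kv_cache_gb     = 40     # assumed KV cache for a 32k context (placeholder)
overhead_factor = 1.10   # activations, buffers, fragmentation

weights_gb = params_billion * bytes_per_param
total_gb   = (weights_gb + kv_cache_gb) * overhead_factor

gpu_vram_gb   = 24       # RTX 3090
gpu_price_usd = 1300     # roughly what a 3090 goes for today

n_gpus = math.ceil(total_gb / gpu_vram_gb)
print(f"~{total_gb:.0f} GB VRAM -> {n_gpus} x RTX 3090 (~${n_gpus * gpu_price_usd:,})")
```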
Also, how much bang for the buck do those 3090s actually give you compared to enterprise-grade products?