vLLM in a docker container, FP16 quantized on an 8x MI300X cluster. Very lazy hackjob, I didn't even set up an interface. Was constructing curl commands from string templates. I worked out if I paid that compute cost over a whole month, it was twice as expensive as the monthlies you'd pay for owning a very nice 2000sqft non-coop apartment in Midtown Manhattan. I was paying rock bottom prices, too.