Better HN

0 points · ux266478 · 2d ago · 0 comments

The base model doesn't have these problems FWIW


cosmojg · 2d ago

How are you running the base model?

ux266478 (OP) · 2d ago

vLLM in a Docker container, running in FP16 on an 8x MI300X cluster. Very lazy hack job, I didn't even set up an interface. Was constructing curl commands from string templates. I worked out that if I paid that compute cost for a whole month, it would be twice as expensive as the monthlies you'd pay for owning a very nice 2000 sqft non-co-op apartment in Midtown Manhattan. I was paying rock-bottom prices, too.
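For anyone wondering what "curl commands from string templates" might look like against a vLLM container, here is a minimal sketch. It assumes vLLM's OpenAI-compatible server is exposed on its default port 8000 and that the raw `/v1/completions` endpoint is used, which suits a base model since no chat template is applied. The thread doesn't show the actual templates, so the host, the model name, the sampling parameters, and the helper names `build_payload` and `build_curl` are all placeholders.

```python
import json
import shlex

# Assumptions (not from the thread): host, port, and model name are placeholders.
BASE_URL = "http://localhost:8000"   # default port of vLLM's OpenAI-compatible server
MODEL = "my-org/my-base-model"       # hypothetical model identifier


def build_payload(prompt, max_tokens=128, temperature=0.8):
    """Build the JSON body for a raw text completion (no chat template)."""
    return {
        "model": MODEL,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }


def build_curl(prompt, **kwargs):
    """Render a curl command string that POSTs the payload to /v1/completions."""
    body = json.dumps(build_payload(prompt, **kwargs))
    # shlex.quote keeps the header and JSON body intact as single shell arguments,
    # even when the prompt contains quotes or spaces.
    return (
        f"curl -s {BASE_URL}/v1/completions "
        f"-H {shlex.quote('Content-Type: application/json')} "
        f"-d {shlex.quote(body)}"
    )


if __name__ == "__main__":
    # Prints a command you could paste into a shell; nothing is sent from here.
    print(build_curl("Once upon a time", max_tokens=32))
```

Quoting the body with `shlex.quote` is the one part worth keeping even in a throwaway setup: prompts with apostrophes or newlines otherwise break the shell command in ways that are easy to miss.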