Both models currently use around 3GB (converted to FP16 for speed). But I checked that the (slower) FP32 version uses only 2.3GB, so we are probably doing something suboptimal here — FP16 weights should take half the memory of FP32, not more.
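For reference, the expected weight footprint is just parameter count times bytes per element, so FP16 should come out to roughly half of FP32. A quick back-of-the-envelope sketch (the parameter count below is illustrative, not the actual models'):

```python
def model_weight_gb(n_params: int, bytes_per_param: int) -> float:
    """Expected memory for dense model weights, in GiB."""
    return n_params * bytes_per_param / 1024**3

# Hypothetical ~1.5B-parameter model for illustration:
n = 1_500_000_000
fp32_gb = model_weight_gb(n, 4)  # ~5.6 GiB
fp16_gb = model_weight_gb(n, 2)  # ~2.8 GiB, exactly half of FP32
```

Any gap beyond that ratio usually points to weights kept around in both precisions, or activation/KV-cache buffers being counted in.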
We only support CUDA right now, although it should not be too hard to port to whisper.cpp/llama.cpp or Apple's MLX, since it's a pretty straightforward transformer architecture.