It's not fixed, and our chip wasn't designed with LLMs in mind. It's a general-purpose, low-latency, high-throughput compute fabric. Our compiler toolchain is also general purpose and can compile arbitrary high-performance numerical programs without handwritten kernels. Because of the current importance of ML/AI we're focusing on PyTorch and ONNX models as input, but it really could be anything.
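For concreteness, here's a minimal sketch of what producing those two input formats looks like on the user side. It uses only the standard PyTorch export API; the toy model and filename are made up for illustration, and the compile step itself isn't shown:

```python
# Minimal sketch (illustrative model and filename): producing the two
# input formats mentioned above. torch.onnx.export is the standard
# PyTorch API; nothing here is specific to our toolchain.
import torch
import torch.nn as nn

class TinyMLP(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 10)
        )

    def forward(self, x):
        return self.net(x)

model = TinyMLP().eval()
example_input = torch.randn(1, 64)

# The nn.Module itself is one valid input; exporting it to ONNX
# gives the other.
torch.onnx.export(model, example_input, "tiny_mlp.onnx")
```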
We can also deploy speech models such as Whisper, or image generation models. I don't know if we have any MoE architectures running yet, but we'll be implementing Mixtral soon for sure!
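Whisper is a good illustration of why the input path generalizes: it ships as an ordinary PyTorch model, so it enters through the same PyTorch/ONNX front door. A hedged sketch using the Hugging Face transformers API (the checkpoint name is just an example; again, the compile step isn't shown):

```python
# Hedged sketch: Whisper is an ordinary PyTorch model, so it fits the
# same PyTorch/ONNX input path described above. The checkpoint name is
# just an example.
from transformers import WhisperForConditionalGeneration

model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-tiny")
model.eval()
# From here it's plain torch.nn modules and tensors, so the same
# compiler front end applies.
```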