0 points · sushid · 1y ago · 0 comments

Is that not just traditional OCR applied on top of an LLM?

It's possible they have a software layer that does that. But I was assuming they don't, because the open source multimodal models don't.

No, it’s not; it’s a multimodal transformer model.
