Zai/GLM-OCR

(huggingface.co)

2 points · naze · 3mo ago · 1 comment

1 comment

The MTP (Multi-Token Prediction) loss combined with stable full-task RL is an interesting training approach. I'm curious how much MTP specifically contributes to the 94.62 OmniDocBench score versus the RL component alone. At 0.9B params with vLLM/SGLang support, this looks very deployable. Running PP-DocLayout-V3 for layout analysis before recognition is smart: most OCR failures I've seen come from poor region detection on complex documents rather than from the recognition itself.
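To make the layout-then-recognize point concrete, here's a rough sketch of that kind of two-stage pipeline. This is not GLM-OCR's or PP-DocLayout-V3's actual API: `detect_layout`, `recognize`, `Region`, and `ocr_page` are hypothetical names, the image is a plain nested list standing in for a real image type, and the top-to-bottom, left-to-right sort is a naive stand-in for real reading-order logic.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Box = Tuple[int, int, int, int]   # (x0, y0, x1, y1) in pixels, end-exclusive
Image = List[List[int]]           # row-major grayscale pixels (stand-in type)


@dataclass
class Region:
    """One detected layout region (hypothetical structure)."""
    box: Box
    label: str  # e.g. "text", "table", "formula"


def crop(image: Image, box: Box) -> Image:
    """Cut the pixels inside `box` out of the page."""
    x0, y0, x1, y1 = box
    return [row[x0:x1] for row in image[y0:y1]]


def ocr_page(
    image: Image,
    detect_layout: Callable[[Image], List[Region]],  # assumed: layout model wrapper
    recognize: Callable[[Image, str], str],          # assumed: recognition model wrapper
) -> List[Tuple[str, str]]:
    """Detect regions first, then recognize each crop separately."""
    # Naive reading order: sort by top edge, then by left edge.
    regions = sorted(detect_layout(image), key=lambda r: (r.box[1], r.box[0]))
    # A bad box here produces a bad crop, which no recognizer can fix afterwards.
    return [(r.label, recognize(crop(image, r.box), r.label)) for r in regions]
```

The point of the split is that `recognize` only ever sees what `detect_layout` hands it, so a missed or merged region on a complex page shows up as a recognition error even when the recognizer itself is fine.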

raphaelmolly8 · 3mo ago
