1 link tagged with all of: ocr + benchmarks + open-source + document-processing + multilingual
Links
Chandra OCR 2, a 4-billion-parameter model from Datalab, outperforms GPT-4o and Gemini on AllenAI’s olmOCR benchmark and a 90-language test while halving the model size. It preserves layout, reads complex tables and math notation, converts diagrams to Mermaid, and runs at two pages per second on an NVIDIA H100. The code is licensed under Apache 2.0, but the model weights use an OpenRAIL-M license with commercial restrictions.
ocr ✓
open-source ✓
benchmarks ✓
document-processing ✓
multilingual ✓