Links
Grab built a specialized Vision LLM to improve the accuracy of information extraction from user documents for eKYC verification. After running into the limits of traditional OCR systems, they fine-tuned existing models and ultimately created a model that can process Southeast Asian languages and diverse document formats. The article details their technical approach and training methods.
Grab developed a specialized Vision LLM to enhance document processing for eKYC in Southeast Asia. The project focused on improving OCR accuracy for diverse languages and document formats, ultimately creating a lightweight model tailored to their needs.
GLM-OCR is a multimodal optical character recognition (OCR) model designed for complex document understanding. Built on the GLM-V architecture, it features a robust two-stage pipeline for layout analysis and recognition, achieving high accuracy in varied real-world scenarios. The model is open-sourced and comes with an easy-to-use SDK for integration.
The article explains how optical character recognition (OCR) models, such as DeepSeek-OCR, convert images of text into machine-readable formats. It details the roles of the encoder and decoder in transforming visual data into structured text, and highlights advances in learning techniques that reduce the need for manual coding.
Nanonets has launched Nanonets-OCR-s, an image-to-markdown OCR model that recognizes document structure and content and produces formatted markdown output suitable for downstream processing. The model handles complex elements such as LaTeX equations, images, signatures, and tables, making it useful in academic, legal, healthcare, and corporate settings.