4 links tagged with all of: machine-learning + ocr + document-processing
Links
Grab built a specialized Vision LLM to improve the accuracy of information extraction from user documents for eKYC verification. After running into the limits of traditional OCR systems, they fine-tuned existing models and ultimately created a model that can process Southeast Asian languages and diverse document formats. The article details their technical approach and training methods.
Grab developed a specialized Vision LLM to enhance document processing for eKYC in Southeast Asia. The project focused on improving OCR accuracy for diverse languages and document formats, ultimately creating a lightweight model tailored to their needs.
GLM-OCR is a multimodal optical character recognition (OCR) model designed for complex document understanding. Built on the GLM-V architecture, it features a robust two-stage pipeline for layout analysis and recognition, achieving high accuracy in varied real-world scenarios. The model is open-sourced and comes with an easy-to-use SDK for integration.
Nanonets has launched Nanonets-OCR-s, an image-to-markdown OCR model that recognizes document structure and content and produces formatted markdown output suitable for downstream processing. The model handles complex elements such as LaTeX equations, images, signatures, and tables, which makes it useful in fields including academia, legal, healthcare, and corporate work.