Links
A new open-source OCR model outperformed all major commercial tools on standard text and handwriting tests. It accurately transcribed a 1913 handwritten letter by Ramanujan, preserving layout, math notation, and faint ink details.
This article introduces the Gemma 4 family of models from Google DeepMind, detailing their architectures and improvements over the previous version, Gemma 3. It highlights key features such as interleaved attention layers and efficiency enhancements in global attention mechanisms.
Liquid AI has launched the LFM2.5-350M, an enhanced version of its 350M-parameter model, pre-trained on 28 trillion tokens and offering improved performance in data extraction and tool use. The model runs efficiently on various hardware, making it suitable for large-scale data pipelines and edge deployments.
The article discusses the shifting landscape for data scientists and machine learning engineers in the age of large language models (LLMs). It emphasizes the importance of data science fundamentals in evaluating AI systems, addressing common pitfalls in metrics, experimental design, and data quality. The author argues that the core work of data scientists remains vital, even as their roles evolve.
This article explores an unconventional method for classifying text by leveraging compression algorithms. The author demonstrates how to concatenate labeled documents, compress them, and use the compressed sizes to predict labels for new texts. While the method shows promise, it is computationally expensive and generally underperforms compared to traditional classifiers.
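The concatenate-compress-compare loop can be sketched in a few lines. This is a minimal illustration of the idea (not the author's code), using the stdlib zlib compressor; the `corpora` dictionary and helper names are invented for the example. The label whose concatenated corpus yields the smallest compression-size increase when the new text is appended wins:

```python
import zlib


def compressed_size(text: str) -> int:
    """Length in bytes of the zlib-compressed UTF-8 encoding of text."""
    return len(zlib.compress(text.encode("utf-8")))


def classify(text: str, corpora: dict) -> str:
    """Assign text the label whose corpus 'explains' it most cheaply.

    For each label, measure the extra bytes needed to compress the new
    text together with that label's concatenated documents; shared
    vocabulary turns into back-references, lowering the delta.
    """
    best_label, best_delta = None, float("inf")
    for label, corpus in corpora.items():
        delta = compressed_size(corpus + " " + text) - compressed_size(corpus)
        if delta < best_delta:
            best_label, best_delta = label, delta
    return best_label


# Toy labeled corpora (illustrative only).
corpora = {
    "spam": "win money now win money now free prize click here free prize",
    "ham": "meeting agenda attached please review the quarterly report figures",
}
```

Note the cost the article flags: every prediction recompresses each label's entire corpus, so this scales poorly with corpus size.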
This article explores how Python 3.14's zstd module enables efficient text classification through incremental compression. It outlines a method where text is classified based on the size of compressed output from different class-specific compressors, demonstrating improved speed and accuracy over traditional methods.
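The article targets Python 3.14's new zstd module, but the incremental idea can be sketched portably with stdlib zlib, whose compressor objects also support priming and copying: prime one compressor per class on that class's corpus once, then charge each new text only the marginal bytes it adds. The corpus data and helper names below are invented for illustration; this is not the article's exact implementation:

```python
import zlib


def prime_compressor(corpus: str):
    """Build a compressor whose history window already contains the corpus."""
    c = zlib.compressobj(level=9)
    c.compress(corpus.encode("utf-8"))  # output discarded; we only want the state
    c.flush(zlib.Z_SYNC_FLUSH)  # emit pending bytes but keep the stream open
    return c


def incremental_cost(primed, text: str) -> int:
    """Marginal compressed size of text given a class-primed compressor."""
    c = primed.copy()  # cheap snapshot, so the primed state stays reusable
    out = c.compress(text.encode("utf-8")) + c.flush(zlib.Z_SYNC_FLUSH)
    return len(out)


def classify(text: str, primed_by_label: dict) -> str:
    """Pick the class whose primed compressor encodes the text most cheaply."""
    return min(primed_by_label,
               key=lambda lbl: incremental_cost(primed_by_label[lbl], text))


# Toy labeled corpora (illustrative only), primed once up front.
corpora = {
    "spam": "win money now win money now free prize click here free prize",
    "ham": "meeting agenda attached please review the quarterly report figures",
}
primed = {label: prime_compressor(c) for label, c in corpora.items()}
```

Because priming happens once and each prediction touches only the new text, this avoids recompressing the corpora on every call, which is the speedup the article attributes to incremental compression.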
Organizations are increasingly faced with the decision of whether to implement Retrieval-Augmented Generation (RAG) or fine-tuning for their AI initiatives. RAG connects large language models to external databases, allowing access to real-time information, reducing inaccuracies, and enhancing security and traceability. However, implementing RAG comes with its own technical challenges that require careful planning and maintenance.
Deep Think with Confidence (DeepConf) is introduced as a method to improve reasoning efficiency and performance in large language models by using internal confidence signals to filter out low-quality reasoning traces. It requires no additional training or tuning and can be easily integrated into existing systems. Evaluations show significant accuracy improvements and a reduction in generated tokens on various reasoning tasks.
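The filtering step can be illustrated with a simplified sketch: score each reasoning trace by its mean token log-probability (a deliberately crude stand-in for DeepConf's confidence signals, which are more refined), keep the top fraction, and majority-vote over the survivors. All names and the scoring rule here are illustrative, not the paper's exact method:

```python
from collections import Counter


def trace_confidence(token_logprobs):
    """Mean token log-probability as a simple confidence proxy."""
    return sum(token_logprobs) / len(token_logprobs)


def confidence_filtered_vote(traces, keep_fraction=0.5):
    """Majority vote over the most confident reasoning traces.

    traces: list of (answer, token_logprobs) pairs, one per sampled trace.
    Low-confidence traces are dropped before voting, so fewer bad
    traces dilute the final answer.
    """
    scored = sorted(traces, key=lambda t: trace_confidence(t[1]), reverse=True)
    kept = scored[:max(1, int(len(scored) * keep_fraction))]
    return Counter(answer for answer, _ in kept).most_common(1)[0][0]
```

DeepConf additionally uses its confidence signal for early termination of low-quality traces during generation, which is where the reported token savings come from; the sketch above shows only the offline filtering side.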
Deep Atlas offers an intensive curriculum designed to compress months of AI and machine learning education into just weeks. Through hands-on projects and community learning, participants can quickly gain the skills needed for a career in AI, and the program points to successful alumni as evidence.
Qwen has released the Qwen3-VL-Embedding and Qwen3-VL-Reranker models, designed for advanced multimodal information retrieval and cross-modal understanding. These models support various inputs, including text and images, and enhance retrieval accuracy through a two-stage process of initial recall and precise re-ranking.
SleepFM is a novel foundation model developed to analyze polysomnography (PSG) recordings, facilitating accurate predictions of various health conditions based on sleep data. Trained on over 585,000 hours of sleep recordings, it demonstrates strong performance in predicting diseases such as dementia and heart failure, while also supporting standard sleep analysis tasks.
The effectiveness of coding agents hinges on well-chosen user input, constraints, and context. By applying Steven Johnson's patterns for generating ideas, the article demonstrates how to improve coding-agent outputs through structured prompting and feedback mechanisms. This approach encourages incremental development, reuses existing solutions, and fosters collaboration between humans and AI.
A comprehensive collection of over 123 scientific skills has been developed for Claude, enabling it to function as an AI research assistant across various scientific fields. These skills support complex workflows in areas such as bioinformatics, cheminformatics, clinical research, and machine learning, providing users with extensive tools and resources for their scientific tasks.
PostHog AI has evolved significantly over its first year, transforming from a basic tool to a comprehensive AI agent capable of complex data analysis and task execution. Key learnings highlight the importance of model improvements, context, and user trust in AI interactions. The platform is now utilized by thousands weekly, offering insights into product usage and error management.
Livedocs is a collaborative platform that merges the functionality of notebooks with app-building simplicity, ideal for various data tasks such as exploration, analysis, and visualization. It supports powerful AI tools, enabling users to perform advanced analytics, create interactive dashboards, and share insights effortlessly.
Pingkit is a toolkit designed for training reproducible, capacity-aware models using transformer activations. It offers features for extracting embeddings, training neural architectures, and creating custom probes tailored to specific research needs. The toolkit is integrated with Hugging Face models and provides various utilities for data processing and model training.
The Smol Training Playbook on Hugging Face provides a comprehensive guide for efficiently training machine learning models using the Hugging Face ecosystem. It emphasizes best practices and methodologies for optimizing training processes, making it accessible for both beginners and experienced practitioners. The playbook also includes practical examples and resources to enhance the learning experience.
Foundation models in pathology are failing not due to size or training duration but because they are built on flawed assumptions about data scalability and generalization. Clinical performance has plateaued, as models struggle with variability across institutions and real-world applications, highlighting a need for task-specific approaches instead of generalized solutions. Alternative methods, like weakly supervised learning, have shown promise in achieving high accuracy without the limitations of foundation models.