Links
ShapeR offers a method for generating 3D shapes from image sequences. It processes input images to extract relevant data, then uses a transformer model to create a mesh representation of each object in the scene. The project includes tools for setup, data exploration, and evaluation.
Grab built a specialized Vision LLM to improve the accuracy of information extraction from user documents for eKYC verification. They faced challenges with traditional OCR systems and fine-tuned existing models, ultimately creating a model that can process Southeast Asian languages and diverse document formats. The article details their technical approach and training methods.
This article discusses how machine learning techniques can improve acoustic eavesdropping attacks using gyroscopes and accelerometers in smartphones. It highlights recent research that bypasses the need for microphone access by utilizing these sensors to extract speech data. The series will explore the success of previous projects and attempt to reproduce and enhance their results.
The article outlines a structured approach to creating product evaluations for language models. It emphasizes the importance of labeling, aligning evaluators, and setting up an evaluation harness to ensure accurate and efficient assessments. The author shares practical tips on handling binary labels, dataset balance, and the integration of evaluators for scalable results.
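As a rough sketch of what "aligning an evaluator" can mean in practice (the evaluator rule and examples below are hypothetical, not from the article): compare an automated pass/fail judge against human binary labels, and report agreement separately on positives and negatives so an imbalanced dataset cannot hide failures on the rare class.

```python
def evaluator(text: str) -> bool:
    """Stand-in automated evaluator (hypothetical rule for illustration)."""
    return "error" not in text.lower()

# Hypothetical human-labeled examples: (model output, human pass/fail label).
labeled = [
    ("The refund was processed successfully.", True),
    ("Error: unable to parse the request.", False),
    ("Your order ships tomorrow.", True),
    ("error occurred mid-stream, retrying", False),
]

# Score agreement per class, not just overall accuracy.
pos = [t for t, y in labeled if y]
neg = [t for t, y in labeled if not y]
tpr = sum(evaluator(t) for t in pos) / len(pos)      # agreement on positives
tnr = sum(not evaluator(t) for t in neg) / len(neg)  # agreement on negatives
```

Only once both rates are acceptable would such an evaluator be trusted inside a larger harness.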
Gregor Zunic argues that traditional agent frameworks complicate AI model interactions without adding value. Instead, he advocates for minimalism, allowing models maximum freedom to operate effectively, especially in tasks like browsing. The focus should be on leveraging the model's capabilities rather than imposing restrictive abstractions.
The article discusses how Dash evolved from a basic search system to an agentic AI by implementing context engineering. It highlights strategies like limiting tool definitions, filtering relevant context, and introducing specialized agents to improve decision-making and performance.
The article explains how Yelp developed a Back-Testing Engine to simulate ad budget allocation changes using historical data. This tool allows the company to test new algorithms and strategies safely without impacting live campaigns, helping optimize performance and maintain advertiser trust.
This article discusses how Netflix uses Metaflow to improve machine learning and AI workflows. It introduces a new feature called Spin, which accelerates iterative development by allowing users to run and test code quickly while managing inputs and outputs effectively.
This article outlines the development of Expedia Group's centralized Embedding Store Service, which streamlines the management and querying of vector embeddings for machine learning applications. It emphasizes the importance of metadata management, discoverability, and efficient similarity searches to support various ML workflows.
The article discusses various open problems in machine learning inspired by a graduate class. It critiques current methodologies, emphasizing the need for a design-based perspective, better evaluation methods, and innovations in large language models. The author encourages researchers to explore these under-addressed areas.
This article details how Dropbox created a custom feature store to enhance the search and ranking system in Dropbox Dash. It discusses the challenges of integrating on-premises and cloud systems, achieving low latency for feature retrieval, and ensuring data freshness in response to user behavior.
Apple has launched MLX, a machine learning framework optimized for their silicon chips. It supports various tasks including training transformer models, text and image generation, and speech recognition. The article also touches on a phenomenon called "grokking" related to neural network learning.
Perfectly is an AI-driven recruiting agency that helps startups fill roles quickly, often within days. Their system uses machine learning to identify and rank candidates, aiming to double interview pass rates and reduce the time to hire. Startups can request access and get started with a brief intake process.
This article outlines the role of machine learning engineers, detailing their responsibilities in transitioning ML models from research to production. It covers essential skills, methodologies, and career paths in the field, emphasizing the importance of collaboration between data science and engineering.
This article details the development of AI systems that remember and learn from interactions, enhancing contextual understanding. Key features include coherent narratives, evidence-based perception, and dynamic user profiles, achieving high reasoning accuracy. Contributions from the community are encouraged.
This article discusses how Faire uses graph neural networks (GNNs) to improve personalized product recommendations in its marketplace. It details the challenges of traditional recommendation systems and explains how GNNs model relationships between retailers and products to surface relevant items. The approach involves building a bipartite engagement graph and optimizing embeddings for better accuracy.
This article explores the potential of a new AI model capable of recognizing and interacting with computer interfaces in real-time without relying on APIs. It outlines the challenges of achieving quick reaction times, complex reasoning, and flawless execution, suggesting that success in these areas could revolutionize automation across various fields.
This article explains the shift from analytical machine learning to real-time machine learning, highlighting its importance in making instantaneous business decisions. It details how companies like Uber leverage real-time data for applications such as fraud detection and personalized recommendations.
Clara Collier interviews Abhishaike Mahajan from Noetik about the role of AI in developing cancer treatments. Mahajan shares his experience in machine learning applications across health insurance and genetic therapies, focusing on using AI to better understand tumor microenvironments and improve drug response predictions.
The article explains reinforcement learning through a psychological lens, focusing on feedback mechanisms in both humans and computers. It outlines how computer programs learn by receiving scores, updating their responses, and emphasizes a specific approach called Reformist RL, which simplifies implementation for generative models.
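The score-and-update loop the article describes can be sketched with a minimal two-response bandit (this toy setup is an illustration of the general feedback mechanism, not the article's specific method): the program tries a response, receives a score, and shifts its estimates toward whichever response scores higher.

```python
import random

rng = random.Random(0)
true_reward = {"A": 0.2, "B": 0.8}  # hidden average payoff of each response
value = {"A": 0.0, "B": 0.0}        # the program's running estimates
counts = {"A": 0, "B": 0}

for step in range(1000):
    # Mostly exploit the current best estimate, occasionally explore.
    if rng.random() < 0.1:
        choice = rng.choice(["A", "B"])
    else:
        choice = max(value, key=value.get)
    # Noisy score from the environment (Bernoulli reward).
    reward = 1.0 if rng.random() < true_reward[choice] else 0.0
    # Update: move the estimate toward the observed score.
    counts[choice] += 1
    value[choice] += (reward - value[choice]) / counts[choice]
```

After enough feedback, the learned values rank the responses correctly even though the program only ever sees scores, never the hidden payoffs.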
The article explores Moravec's paradox, highlighting the disparity between tasks that are easy for machines and those that are difficult, like everyday physical actions. It discusses experiments with a robotic model tackling simple tasks, revealing both successes and limitations in achieving "gold medal" standards. The work emphasizes the need for diverse data to improve robots' physical intelligence.
Anthropic has released Claude Opus 4.6, an upgraded AI model that enhances coding skills, multitasking, and reasoning capabilities. It features a 1M token context window and outperforms previous models and competitors in various evaluations, making it suitable for complex tasks in finance, coding, and document creation.
This article discusses how AI technologies are reshaping data quality processes in modern enterprises. It explains the shift from traditional rule-based systems to AI-driven frameworks that enhance data accuracy, automate cleaning, and create trust scores based on data reliability. The use of deep learning, generative models, and reinforcement learning plays a key role in adapting to complex data environments.
This article introduces WebGym, an extensive open-source environment for training visual web agents using nearly 300,000 tasks from real websites. It details a reinforcement learning approach that improves agent performance, achieving a notable increase in success rates on unseen tasks compared to other models.
Egocentric-10K is the largest dataset focused on egocentric video collected in real factory settings, featuring over 1 billion frames across nearly 193,000 clips. It includes detailed camera intrinsics and metadata for each video, making it valuable for research in human-robot interaction and computer vision.
Tony Zhao announces the ACT-1, a new robotic AI model that does not rely on prior robot data. It features capabilities for long-horizon tasks and can generalize without specific training examples. The model aims to enhance robotic dexterity and performance.
STARFlow and STARFlow-V are open-source models designed for generating high-quality images and videos from text prompts. They combine autoregressive models with normalizing flows to achieve impressive results in both text-to-image and text-to-video tasks. Users can easily set up the models and start generating content with provided scripts and configurations.
Meta has launched Ax 1.0, an open-source platform that uses machine learning to streamline complex experimentation. It employs Bayesian optimization to help researchers efficiently identify optimal configurations across various applications, from AI model tuning to infrastructure optimization.
This article discusses a new data platform model called Da2a, which shifts from centralized systems to a network of specialized agents. Each agent handles specific domains and collaborates through a protocol to answer business questions, reducing reliance on technical teams and streamlining the data analysis process.
jax-js is a JavaScript library that brings JAX-style machine learning capabilities to the browser. It allows users to perform high-performance numerical computations using familiar NumPy-like syntax and runs entirely client-side. The framework supports GPU acceleration through WebGPU and offers features like automatic differentiation and JIT compilation.
INTELLECT-3 is a Mixture-of-Experts model with over 100 billion parameters, trained using a custom reinforcement learning framework. It outperforms larger models across various benchmarks in math, code, and reasoning. The training infrastructure and datasets are open-sourced for public use and research.
This article outlines how researchers trained a GPT-2 model using a carefully crafted 1 billion token dataset, achieving over 90% of the performance of models trained on 10 times more data. They found that a static mix of 50% finePDFs, 30% DCLM-baseline, and 20% FineWeb-Edu outperformed traditional curriculum learning methods. Key insights include the importance of dataset quality and the dangers of abrupt transitions between data distributions.
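The static mix described above amounts to drawing each training document from a fixed categorical distribution over sources. A minimal sketch (the sampling helper is illustrative; only the 50/30/20 weights come from the article):

```python
import random

# Static mixture weights from the article.
MIX = {"finePDFs": 0.5, "DCLM-baseline": 0.3, "FineWeb-Edu": 0.2}

def sample_source(rng: random.Random) -> str:
    """Pick the data source for the next training document."""
    r = rng.random()
    cumulative = 0.0
    for source, weight in MIX.items():
        cumulative += weight
        if r < cumulative:
            return source
    return source  # guard against floating-point round-off

rng = random.Random(0)
counts = {s: 0 for s in MIX}
for _ in range(100_000):
    counts[sample_source(rng)] += 1
# Empirical proportions converge on the target mix, with no abrupt
# distribution shift at any point in training.
```

Keeping the mix static avoids the distribution transitions that the article flags as harmful in curriculum-style schedules.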
GLM-OCR is a multimodal optical character recognition (OCR) model designed for complex document understanding. Built on the GLM-V architecture, it features a robust two-stage pipeline for layout analysis and recognition, achieving high accuracy in varied real-world scenarios. The model is open-sourced and comes with an easy-to-use SDK for integration.
This article discusses the concept of federated fine-tuning specifically for tabular data models. It explores how this approach can enhance model performance while addressing privacy concerns by keeping data decentralized. The piece delves into the implications for machine learning and data collaboration.
The article explores how Google Maps influences the survival of restaurants in London through its ranking system. By analyzing over 13,000 restaurants using machine learning, the author reveals that visibility on the platform disproportionately benefits chains and established venues, while new independents struggle to gain traction. A dashboard has been created to visualize these dynamics and identify underrated restaurants.
This article analyzes design patterns for autonomous agents, emphasizing context management and the use of computers to enhance agent functionality. It discusses various techniques like progressive disclosure, context offloading, and the use of sub-agents to optimize performance and reduce token costs.
This article discusses the significance of the Chain Rule of Probability and the Chain Rule of Calculus in machine learning advancements. It explains how these rules help compute complex probabilities in language models by breaking them down into smaller events, like predicting tokens based on previous ones. The author also highlights notable achievements in deep learning and diversity efforts within the AI community.
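The decomposition the article describes is P(t1..tn) = Π P(t_i | t_1..t_{i-1}). A toy sketch with made-up conditional probabilities (the vocabulary and numbers are purely illustrative):

```python
# Toy next-token model: probability of each token given the prefix.
cond_prob = {
    (): {"the": 0.6, "a": 0.4},
    ("the",): {"cat": 0.7, "dog": 0.3},
    ("the", "cat"): {"sat": 0.9, "ran": 0.1},
}

def sequence_prob(tokens):
    """Chain rule: P(t1..tn) = prod_i P(t_i | t_1..t_{i-1})."""
    p = 1.0
    for i, tok in enumerate(tokens):
        p *= cond_prob[tuple(tokens[:i])][tok]
    return p

p = sequence_prob(["the", "cat", "sat"])  # 0.6 * 0.7 * 0.9
```

The joint probability of an arbitrarily long sequence is thus reduced to a product of small, tractable next-token predictions.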
This article critiques the use of perplexity as a metric for evaluating machine learning models, particularly Transformers. It argues that a model can achieve low perplexity while failing to predict certain sequences accurately, highlighting the metric's inadequacy in reliably selecting the best model. The authors provide analytical insights into how model confidence and accuracy relate to perplexity.
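The failure mode the article points to is easy to see numerically: perplexity is the exponential of the mean negative log-likelihood, so a model that is confident on most tokens can average out a few near-zero-probability predictions. A sketch with made-up per-token probabilities:

```python
import math

def perplexity(token_probs):
    """exp of the mean negative log-likelihood assigned to observed tokens."""
    nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(nll)

# 99 confident tokens plus one catastrophic miss (p = 1e-6):
# the average still yields a low perplexity.
mostly_right = [0.9] * 99 + [1e-6]
ppl = perplexity(mostly_right)
```

Here `ppl` stays below 2 even though the model effectively failed on one token, illustrating why low perplexity alone does not guarantee reliable sequence prediction.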
The article discusses the importance of verifiability over model performance in AI cybersecurity. It highlights how offensive AI has a clear advantage due to easy verification of tasks, while defensive security struggles with complex, hard-to-verify challenges. Effective verifiers are essential for improving defense strategies against AI-driven attacks.
This article explains how to train a WordPiece tokenizer specifically for BERT models. It covers dataset selection and the tokenization process, emphasizing the importance of capturing sub-word components. The author also provides related resources for further exploration.
This article explores the development of Matic, a home robot designed to automate cleaning tasks with advanced navigation and mapping capabilities. Founders Navneet Dalal and Mehul Nariyawala aim to free up time for families by addressing the repetitive chores that consume daily life.
AWS has introduced a Responsible AI Lens and updated its Machine Learning and Generative AI Lenses within the Well-Architected Framework. These updates aim to help professionals design and manage AI systems with a focus on ethics, risk management, and operational best practices.
This repository provides the implementation details for Multiplex Thinking, a token-wise branch-and-merge approach to efficient multi-pattern reasoning. It includes setup instructions using Docker or Conda, and details for training and evaluating models.
This article discusses a method for shaping language model capabilities during pretraining by filtering tokens from the training data. The authors demonstrate that token filtering is more effective and efficient than document filtering, particularly for minimizing unwanted medical capabilities. They also introduce a new labeling methodology and show that this approach remains effective even with noisy labels.
This article discusses the common reasons why enterprise ontologies and knowledge graphs often fail. The author draws on personal experience with machine learning projects to highlight key issues in design and implementation.
The article introduces the FLUX.2 [klein] model family, which offers rapid image generation and editing capabilities in under half a second. It combines text-to-image and multi-reference generation in a compact architecture that runs efficiently on consumer hardware. Open weights are available for customization and fine-tuning.
AIRS-Bench evaluates the research capabilities of large language model agents across 20 tasks in machine learning. Each task includes a problem, dataset, metric, and state-of-the-art value, allowing for performance comparison among various agent configurations. The framework supports contributions from the AI research community for further development.
This article explains tensor parallelism (TP) in transformer models, focusing on how it allows for efficient matrix multiplication across multiple GPUs. It details the application of TP in both the Multi-Head Attention and Feed-Forward Network components, highlighting its constraints and practical usage with the Hugging Face library.
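The FFN case can be simulated on one machine with NumPy (a sketch of the standard column-then-row sharding pattern, not code from the article): split the up-projection by columns and the down-projection by rows, so each "GPU" works independently and a single all-reduce recombines the outputs.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))    # activations: (batch, d_model)
w1 = rng.standard_normal((8, 16))  # FFN up-projection
w2 = rng.standard_normal((16, 8))  # FFN down-projection

# Reference: single-device forward pass with a ReLU in between.
ref = np.maximum(x @ w1, 0) @ w2

# Two-way tensor parallelism: w1 split by columns, w2 by rows.
w1_shards = np.split(w1, 2, axis=1)
w2_shards = np.split(w2, 2, axis=0)

# Each shard computes independently; the elementwise ReLU needs no
# communication because the column blocks are disjoint.
partials = [np.maximum(x @ a, 0) @ b for a, b in zip(w1_shards, w2_shards)]

# One all-reduce (here just a sum) recombines the partial outputs.
out = sum(partials)
```

The column/row pairing is what keeps communication down to a single all-reduce per FFN block, which is also the constraint that shapes how attention heads are sharded.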
The article critiques the common practices in machine learning system design interviews, highlighting their inefficiencies and failure modes. It advocates for a reassessment of interview structures to focus on relevant skills and realistic scenarios, rather than outdated or superficial questions.
This article discusses the evolving role of data engineers in the age of AI, emphasizing the need to adapt data preparation strategies. It highlights the shift from traditional data workflows to flexible, context-aware systems that prioritize data curation over mere collection.
This article discusses a new method for understanding user intent by breaking down interactions on mobile devices into two stages. By summarizing individual screens and then extracting intent from those summaries, small models can achieve results similar to larger models without needing server processing. The approach improves efficiency and maintains user privacy.
This article provides an overview of agents in the context of data science and machine learning on Kaggle. It explains their role in automating tasks, making decisions based on data, and improving efficiency in projects. Readers can expect to learn about the fundamental concepts and applications of agents.
The huggingface_hub library has reached version 1.0 after five years of development, introducing significant changes and performance improvements. The release is depended on by over 200,000 libraries and provides access to millions of models, datasets, and Spaces, while preserving backward compatibility for most machine learning libraries.
This article discusses BaNEL, a new algorithm that improves generative models by training them using only negative reward samples. It addresses the challenges of reward sparsity and costly evaluations in complex problem-solving scenarios, demonstrating its effectiveness through various experiments.
The article discusses the evolution of voice computing, tracing its journey from early attempts with Siri to the current advancements driven by AI and machine learning. It emphasizes the potential for real functionality in voice interaction, especially with new models and hardware innovations on the horizon. The author expresses cautious optimism that 2026 could be a turning point for practical voice computing.
This article discusses Apple's MLX framework, designed for efficient use of M-series chips in protein folding tasks. It highlights the advantages of unified memory architecture and provides a detailed example of adapting OpenFold3 code to work with MLX. The author shares performance results showing significant speed improvements compared to traditional setups.
This article introduces a comprehensive resource for learning AI engineering, focusing on building efficient and reliable intelligent systems. It offers a textbook, hands-on activities, and hardware kits, emphasizing real-world application and constraints. The goal is to train engineers who can create dependable AI systems.
This article discusses how Pinterest uses behavioral sequence modeling to improve ad candidate generation. By analyzing user behavior, the platform predicts which advertisers and products users are likely to engage with, leading to more personalized and relevant ad experiences.
This article highlights the features and benefits of the Vectra AI Platform, which can detect threats more quickly and accurately. It includes testimonials from various security professionals who discuss improved detection rates and reduced response times after implementing Vectra AI.
This article discusses how Netflix integrates its Foundation Model into personalization applications. It outlines three approaches—embeddings, subgraph, and fine-tuning—each with distinct trade-offs and complexities tailored to different application needs and performance requirements.
This article discusses WarpGrep, a model designed for efficient code search. It highlights how WarpGrep uses reinforcement learning for quick and parallel code retrieval, achieving results comparable to leading models in a fraction of the time.
The article discusses an experiment where a summarizer and a generator were co-trained to create a compression scheme for text. The model learned to effectively use Mandarin and punctuation to reduce text size while preserving meaning, achieving a compression rate of about 90%.
ShareChat engineers faced scalability issues with their ML feature store, initially unable to handle the required load. After a series of architectural optimizations and a shift in focus, they successfully rebuilt the system to support 1 billion features per second without increasing database capacity.
Qwen-Image-Layered is a model that breaks down images into multiple editable RGBA layers, allowing for precise manipulation of individual components. Users can perform tasks like recoloring, resizing, and repositioning without altering other layers. The model supports variable-layer decomposition and can recursively decompose layers.
This article discusses the extraction and analysis of Claude 4.5 Opus's "soul document," a key component in its training. The author details the process of retrieving this document, the consistency of its outputs, and its implications for understanding the model's knowledge and behavior. Insights into Claude's system message and how it interacts with user prompts are also examined.
This article explores Leash Bio, a startup focused on machine learning for drug discovery. It highlights the challenges of ensuring model accuracy and the potential for unintentional bias in research, while showcasing Leash Bio's commitment to ethical practices in a complex field.
The article discusses TabPFN, a foundation model designed to improve predictions on tabular datasets without needing to retrain for each new dataset. It highlights how TabPFN uses in-context learning and synthetic data to achieve efficient inference, demonstrating its effectiveness through a Kaggle competition comparison with XGBoost.
John Giannandrea, Apple's senior VP for AI and Machine Learning, will retire in spring 2026 but remain as an advisor until then. Amar Subramanya has been appointed as the new VP of AI, tasked with leading significant projects in AI research and development.
The SpecForge team, in partnership with industry leaders, has launched SpecBundle (Phase 1), a collection of production-ready EAGLE-3 model checkpoints aimed at enhancing speculative decoding in large language models. This release addresses the lack of accessible tools and high-quality draft models, while SpecForge v0.2 introduces major usability upgrades and multi-backend support for improved performance.
GLM-5 is a new model designed for complex systems engineering and long-horizon tasks, boasting 744 billion parameters and improved training efficiency. It outperforms its predecessor, GLM-4.7, on various benchmarks and is capable of generating professional documents directly from text.
This article provides a Jupyter notebook for implementing the OLMo3 model from scratch. It includes code examples and explanations for building and training the model. The focus is on practical application rather than theoretical concepts.
This article explains how AI transforms traditional ETL processes by automating schema mapping, data transformations, and anomaly detection. It highlights the challenges of traditional ETL, such as handling unstructured data and adapting to schema changes, and shows how AI-driven methods improve efficiency and scalability.
This article discusses the challenges and solutions in developing large-scale generative recommendation systems, particularly in managing user data and improving training efficiency. It highlights techniques like multi-modal item towers and sampled softmax to enhance performance while addressing issues like cold-start and latency.
The author reflects on teaching machine learning without relying on the concept of data-generating distributions, arguing that such distributions don’t exist in practice. Instead, the focus should be on population models and how they inform decision-making based on sample data. The article emphasizes the importance of understanding how samples relate to the populations they represent.
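The population view can be made concrete with a small simulation (the numbers are illustrative, not the author's): treat the population as a fixed, finite list of values rather than an assumed generating distribution, and study how a random sample's statistics relate to the population's.

```python
import random

# A fixed, finite population of recorded values; no distributional
# assumption is needed about where these numbers "came from".
gen = random.Random(1)
population = [gen.gauss(50, 10) for _ in range(10_000)]
pop_mean = sum(population) / len(population)

# Inference is then a question about sampling: how close does a
# random sample's mean land to the population mean?
rng = random.Random(2)
sample = rng.sample(population, 100)
sample_mean = sum(sample) / len(sample)
```

Everything here is a statement about a concrete sample drawn from a concrete population, which is the framing the author argues for.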
Saber is a zero-shot framework for reference-to-video generation that relies solely on video-text pairs instead of costly reference image-video-text triplets. It uses masked training with dynamic substitutes to enhance subject integration and generalization across diverse scenarios. The model shows improved performance in generating videos that maintain subject identity while following text prompts.
This article discusses the challenges and methods for teaching a language model to generate humor. It details the use of specific rubrics to evaluate comedic content and describes the data collection process from various platforms like Twitter and TikTok. The author shares successes and failures in refining the model's ability to produce funny responses.
OptiMind is a language model developed by Microsoft Research that converts natural language optimization problems into mathematical models ready for solvers. It aims to streamline the modeling process, making it quicker and easier for users in various fields like supply chain and finance. Available on Hugging Face, it allows for hands-on experimentation and integration into existing workflows.
Google Cloud's AlphaEvolve uses AI to help solve complex optimization problems by evolving algorithms through a feedback loop. Users provide a problem specification and initial code, and AlphaEvolve generates improved versions, optimizing efficiency over time. It's currently in private preview for businesses looking to enhance their algorithmic challenges.
Sarah Usher discusses the limitations of using BigQuery as a data warehouse, particularly in machine learning applications. She highlights common issues like data disorganization, performance slowdowns, and the pitfalls of maintaining multiple data cleaning processes. Usher emphasizes the importance of defining a clear source of truth and designing data lineage effectively.
DFlash introduces a lightweight block diffusion model that enhances speculative decoding by enabling faster and more accurate parallel drafting. It combines the speed of diffusion models with the verification strength of autoregressive models, achieving significant performance improvements over existing methods like EAGLE-3. The approach demonstrates how to leverage the benefits of both model types without sacrificing quality.
Gemini 3 Pro advances AI's ability to understand and reason with visual information, excelling in document processing, spatial awareness, screen interaction, and video analysis. It outperforms human benchmarks in complex tasks and offers solutions for education, medical imaging, and legal workflows.
This article outlines the development of Pinterest's AI infrastructure over ten years, highlighting key phases and challenges faced by the machine learning teams. It discusses the importance of organizational alignment and shared foundations in driving adoption and improving efficiency.
This article introduces CoLog, a framework designed to detect both point and collective anomalies in operating system logs using collaborative transformers. It effectively handles different log modalities and has demonstrated high precision and recall across multiple benchmark datasets.
Rmlx is an R package that connects to Apple's MLX framework, allowing users to leverage GPU computing on Apple Silicon. It supports various backend configurations for efficient matrix operations and automatic differentiation. The package facilitates high-performance computations directly from R, making it suitable for data analysis and machine learning tasks.
The article explains how optical character recognition (OCR) models, like deepseek-ocr, process images of text into machine-readable formats. It details the roles of the encoder and decoder in transforming visual data into structured text while highlighting the advancements in learning techniques that reduce the need for manual coding.
Isomorphic Labs has launched the Drug Design Engine (IsoDDE), which significantly improves predictive accuracy for drug discovery. It outperforms previous models like AlphaFold 3 in predicting protein-ligand interactions and identifying novel binding sites, facilitating faster and more effective drug design.
Researchers trained an AI model to detect Alzheimer's using blood samples, focusing on DNA fragment length patterns. They created a more interpretable classifier that outperforms classifiers built on traditional biomarkers in detecting the disease.
This article discusses Grab's approach to optimizing CPU provisioning for Flink applications using machine learning. It highlights the limitations of reactive autoscaling and proposes a predictive model that forecasts workload demands to improve resource allocation and reduce inefficiencies.
This article introduces Nested Learning, a machine learning paradigm that addresses catastrophic forgetting by treating models as interconnected optimization problems. It highlights how this approach can enhance continual learning and improve memory management in AI systems, demonstrated through a new architecture called Hope.
This article discusses Autocomp, a framework designed to optimize code for tensor accelerators using large language models. It highlights how Autocomp outperforms human experts in efficiency and portability, particularly when applied to AWS Trainium. The authors explore the challenges of programming tensor accelerators and the unique optimizations required for effective performance.
This article introduces Dynamic Large Concept Models (DLCM), a new framework that enhances language processing by shifting focus from individual tokens to broader concepts. It learns semantic boundaries and reallocates computational resources for better reasoning, achieving improvements in language model performance on various benchmarks.
Depth Anything 3 (DA3) is a model designed for accurate depth estimation and 3D geometry recovery from various visual inputs, regardless of camera pose. It simplifies the process using a single transformer backbone and a depth-ray representation, outperforming previous models in both monocular and multi-view scenarios. Various specialized models within the DA3 series cater to different depth estimation tasks.
This article details Lyft's Feature Store, highlighting its role in managing and deploying machine learning features at scale. It covers architectural improvements, batch feature ingestion, online serving mechanisms, and the importance of metadata for governance and discoverability. The post illustrates how these advancements enhance developer experience and support data-driven decision-making.
IBM has patented a method for using derivatives to find convergents of generalized continued fractions, a technique that dates back over 200 years. The implementation merely applies established number theory concepts in PyTorch, raising concerns about the validity of the patent given the existing mathematical knowledge. This patent could impact various fields that utilize continued fractions, including engineering and mathematics.
Facebook Reels enhanced its video recommendations by implementing a User True Interest Survey that directly collects user feedback on content relevance. This approach helps surface niche content, boosts user engagement, and addresses challenges like data sparsity and bias.
This article introduces Tangle, a platform for creating machine learning pipelines using a visual editor. Users can build, edit, and run workflows without needing extensive coding skills, and the tool supports various programming languages and frameworks. It also offers features like execution caching to improve efficiency.
This article details the implementation of Google's Nested Learning (HOPE) architecture, focusing on its mechanism-level components and testing procedures. It provides guidance on installation, usage, and evaluation, including various training configurations and memory management strategies for machine learning models.
This article details how to create a football chatbot that assists defensive coordinators by analyzing opponent tendencies. It outlines the process of building and continuously optimizing the chatbot using expert feedback and specific domain knowledge.
This article presents Agentic Rubrics, a method for verifying software engineering agents without executing code. By using a context-grounded checklist created by an expert agent, candidate patches are scored efficiently, providing a more interpretable alternative to traditional verification methods. The results show significant improvements in scoring compared to existing baselines.