12 links
tagged with all of: reinforcement-learning + open-source
Links
DeepCoder-14B-Preview is a new open-source code reasoning model from Agentica and Together AI, achieving 60.6% Pass@1 on LiveCodeBench with only 14B parameters. It is trained with reinforcement learning on a carefully curated dataset of 24K verified coding problems, improving its performance and generalization. The training recipe, dataset, and system optimizations are open-sourced for further development in the coding domain.
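Pass@1 here is the standard pass@k estimator evaluated at k=1. A minimal sketch of the general metric (the widely used unbiased estimator, not DeepCoder's own evaluation code):

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: probability that at least one of k
    samples is correct, given n samples of which c passed."""
    if n - c < k:
        return 1.0  # too few failures to fill k slots: guaranteed pass
    return 1.0 - comb(n - c, k) / comb(n, k)

# For k=1 this reduces to the pass rate c/n.
```

With n samples per problem, a reported 60.6% Pass@1 is simply the mean of this quantity at k=1 over the benchmark problems.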
The article describes a DeepSeek R1-zero-style training implementation for large language models (LLMs) on one or more GPUs, with a focus on simplicity and efficiency. It highlights the nanoAhaMoment project, which provides full-parameter tuning, multi-GPU support, and a full evaluation suite while keeping complexity minimal and performance competitive. The repository offers interactive Jupyter notebooks and training scripts, complete with installation instructions and dependency management.
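R1-zero-style recipes typically compute group-relative advantages (GRPO-style): several completions are sampled per prompt and each reward is normalized against its group. A hedged sketch of that normalization step, assuming this is the variant used here rather than quoting nanoAhaMoment's actual code:

```python
import statistics

def group_relative_advantages(rewards: list[float]) -> list[float]:
    """GRPO-style advantages: normalize each completion's reward by the
    mean and std of its own group (all completions for one prompt),
    removing the need for a learned value/critic network."""
    mu = statistics.mean(rewards)
    sigma = statistics.pstdev(rewards)
    return [(r - mu) / (sigma + 1e-8) for r in rewards]
```

These per-completion advantages then weight the policy-gradient loss on the corresponding tokens.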
INTELLECT-2 has been launched as the first decentralized reinforcement-learning training run of a 32-billion-parameter model, allowing anyone to contribute compute resources. It introduces an asynchronous training paradigm that supports heterogeneous nodes and focuses on efficient validation and communication, enabling the training of state-of-the-art reasoning models under controlled thinking budgets. The initiative aims to build a sovereign open-source AI ecosystem, with mechanisms to ensure honest participation and verify contributions.
Kimi-Dev-72B is an advanced open-source coding language model designed for software engineering tasks, achieving a state-of-the-art performance of 60.4% on the SWE-bench Verified benchmark. It leverages large-scale reinforcement learning to autonomously patch real repositories and ensures high-quality solutions by only rewarding successful test suite completions. Developers and researchers are encouraged to explore and contribute to its capabilities, available for download on Hugging Face and GitHub.
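The all-or-nothing reward described above (credit only when the whole test suite passes) can be sketched as a simple function over test outcomes; the function name and shape are illustrative, not Kimi-Dev's actual training code:

```python
def suite_reward(test_results: list[bool]) -> float:
    """Binary outcome reward for an RL rollout that patches a repository:
    1.0 only when every test in the suite passes, 0.0 otherwise.
    Partial passes earn nothing, which discourages superficial fixes."""
    return 1.0 if test_results and all(test_results) else 0.0
```

A sparse reward like this trades learning signal for solution quality: the policy is never rewarded for patches that break even one test.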
Qwen3-Coder has been launched as a powerful 480-billion-parameter code model with exceptional capabilities in coding and agentic tasks, and a context length of up to 1 million tokens. The release includes the Qwen Code CLI tool for enhanced coding workflows and emphasizes advances in reinforcement learning for real-world coding applications. Ongoing development aims to improve performance and explore self-improvement capabilities for coding agents.
Mini-o3 introduces an advanced system that enhances tool-based interactions for visual reasoning by supporting deep, multi-turn reasoning and achieving state-of-the-art performance on visual search tasks. The system utilizes a novel over-turn masking strategy to effectively manage response lengths during reinforcement learning, combined with a comprehensive dataset designed for exploratory reasoning. Open-source code and models are provided to facilitate reproducibility and further research.
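One plausible reading of the over-turn masking idea: rollouts cut off by the turn budget before producing a final answer contribute no policy-gradient loss, so long exploratory trajectories are never penalized for their length. A hypothetical sketch (the data layout and names are assumptions, not Mini-o3's API):

```python
def overturn_loss_masks(trajectories: list[dict], max_turns: int) -> list[list[float]]:
    """Per-token loss masks for multi-turn RL rollouts.

    A trajectory that hits the turn budget without answering is fully
    masked (all zeros), so it neither earns reward nor gets punished;
    completed trajectories keep their tokens in the loss (all ones)."""
    masks = []
    for traj in trajectories:
        truncated = traj["num_turns"] >= max_turns and not traj["answered"]
        weight = 0.0 if truncated else 1.0
        masks.append([weight] * traj["num_tokens"])
    return masks
```

Masking, rather than assigning a negative reward, avoids biasing the policy toward short responses during training while still allowing longer reasoning at test time.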
Reinforcement learning (RL) is becoming essential in developing large language models (LLMs), particularly for aligning them with human preferences and enhancing their capabilities through multi-turn interactions. This article reviews various open-source RL libraries, analyzing their designs and trade-offs to assist researchers in selecting the appropriate tools for specific applications. Key libraries discussed include TRL, Verl, OpenRLHF, and several others, each catering to different RL needs and architectures.
INTELLECT-2 is a groundbreaking 32 billion parameter model trained using a decentralized reinforcement learning framework called PRIME-RL, enabling fully asynchronous training across a global network of contributors. The model demonstrates significant improvements in reasoning tasks and is open-sourced to foster further research in decentralized AI training methodologies.
Tunix is a new open-source, JAX-native library designed to simplify the post-training process for large language models (LLMs). It offers a comprehensive toolkit for model alignment, including various algorithms for supervised fine-tuning, preference tuning, reinforcement learning, and knowledge distillation, all optimized for performance on TPUs. The library enhances the developer experience with a white-box design and seamless integration into the JAX ecosystem.
InternVL3.5 introduces a new family of open-source multimodal models that enhance versatility, reasoning capabilities, and inference efficiency. A key innovation is the Cascade Reinforcement Learning framework, which improves reasoning tasks significantly while a Visual Resolution Router optimizes visual token resolution. The model achieves notable performance gains and supports advanced capabilities like GUI interaction and embodied agency, positioning it competitively against leading commercial models.
OpenThinkIMG is an open-source framework that enables Large Vision-Language Models (LVLMs) to engage in interactive visual cognition, allowing AI agents to effectively think with images. It features a flexible tool management system, a dynamic inference pipeline, and a novel reinforcement learning approach called V-ToolRL, which enhances the adaptability and performance of visual reasoning tasks. The project aims to bridge the gap between human-like visual cognition and AI capabilities by providing a standardized platform for tool-augmented reasoning.
The Environments Hub is being launched as an open, community-driven platform for reinforcement learning (RL) environments, aiming to provide a shared space for researchers and developers to build, share, and utilize these environments effectively. This initiative seeks to democratize access to high-quality RL tools, fostering innovation in AI by lowering barriers to creating and training models, while also promoting open-source development in contrast to proprietary systems used by large labs.