12 links
tagged with all of: reinforcement-learning + open-source
Links
DeepCoder-14B-Preview is a new open-source code reasoning model from Agentica and Together AI, achieving 60.6% Pass@1 on LiveCodeBench with only 14B parameters. It is trained with reinforcement learning on a carefully curated dataset of 24K verified coding problems, improving its performance and generalization. The training recipe, dataset, and system optimizations are open-sourced for further development in the coding domain.
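Pass@1 here is the standard pass@k estimator evaluated at k=1. A minimal sketch of the general metric (the widely used unbiased estimator, not DeepCoder's own evaluation code):

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: probability that at least one of k
    samples is correct, given n samples of which c passed."""
    if n - c < k:
        return 1.0  # too few failures to fill k slots: guaranteed pass
    return 1.0 - comb(n - c, k) / comb(n, k)

# For k=1 this reduces to the pass rate c/n.
```

With n samples per problem, a reported 60.6% Pass@1 is simply the mean of this quantity at k=1 over the benchmark problems.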
The article describes a DeepSeek R1-zero-style training implementation for large language models (LLMs) on one or more GPUs, with a focus on simplicity and efficiency. It highlights the nanoAhaMoment project, which provides full-parameter tuning, multi-GPU support, and a full evaluation suite while keeping complexity minimal and performance competitive. The repository offers interactive Jupyter notebooks and training scripts, complete with installation instructions and dependency management.
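R1-zero-style recipes typically compute group-relative advantages (GRPO-style): several completions are sampled per prompt and each reward is normalized against its group. A hedged sketch of that normalization step, assuming this is the variant used here rather than quoting nanoAhaMoment's actual code:

```python
import statistics

def group_relative_advantages(rewards: list[float]) -> list[float]:
    """GRPO-style advantages: normalize each completion's reward by the
    mean and std of its own group (all completions for one prompt),
    removing the need for a learned value/critic network."""
    mu = statistics.mean(rewards)
    sigma = statistics.pstdev(rewards)
    return [(r - mu) / (sigma + 1e-8) for r in rewards]
```

These per-completion advantages then weight the policy-gradient loss on the corresponding tokens.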
INTELLECT-2 has been launched as the first decentralized reinforcement-learning training run of a 32-billion-parameter model, allowing anyone to contribute compute resources. It introduces an asynchronous training paradigm that supports heterogeneous nodes and focuses on efficient validation and communication, enabling the training of state-of-the-art reasoning models under controlled thinking budgets. The initiative aims to build a sovereign open-source AI ecosystem, with mechanisms to ensure honest participation and verify contributions.
Kimi-Dev-72B is an advanced open-source coding language model designed for software engineering tasks, achieving a state-of-the-art performance of 60.4% on the SWE-bench Verified benchmark. It leverages large-scale reinforcement learning to autonomously patch real repositories and ensures high-quality solutions by only rewarding successful test suite completions. Developers and researchers are encouraged to explore and contribute to its capabilities, available for download on Hugging Face and GitHub.
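The all-or-nothing reward described above (credit only when the whole test suite passes) can be sketched as a simple function over test outcomes; the function name and shape are illustrative, not Kimi-Dev's actual training code:

```python
def suite_reward(test_results: list[bool]) -> float:
    """Binary outcome reward for an RL rollout that patches a repository:
    1.0 only when every test in the suite passes, 0.0 otherwise.
    Partial passes earn nothing, which discourages superficial fixes."""
    return 1.0 if test_results and all(test_results) else 0.0
```

A sparse reward like this trades learning signal for solution quality: the policy is never rewarded for patches that break even one test.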
Qwen3-Coder has been launched as a powerful 480-billion-parameter code model with exceptional capabilities in coding and agentic tasks, and a context length of up to 1 million tokens. The release includes the Qwen Code CLI tool for enhanced coding workflows and emphasizes advances in reinforcement learning for real-world coding applications. Ongoing development aims to improve performance and explore self-improvement capabilities for coding agents.
Mini-o3 introduces an advanced system that enhances tool-based interactions for visual reasoning by supporting deep, multi-turn reasoning and achieving state-of-the-art performance on visual search tasks. The system utilizes a novel over-turn masking strategy to effectively manage response lengths during reinforcement learning, combined with a comprehensive dataset designed for exploratory reasoning. Open-source code and models are provided to facilitate reproducibility and further research.
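One plausible reading of the over-turn masking idea: rollouts cut off by the turn budget before producing a final answer contribute no policy-gradient loss, so long exploratory trajectories are never penalized for their length. A hypothetical sketch (the data layout and names are assumptions, not Mini-o3's API):

```python
def overturn_loss_masks(trajectories: list[dict], max_turns: int) -> list[list[float]]:
    """Per-token loss masks for multi-turn RL rollouts.

    A trajectory that hits the turn budget without answering is fully
    masked (all zeros), so it neither earns reward nor gets punished;
    completed trajectories keep their tokens in the loss (all ones)."""
    masks = []
    for traj in trajectories:
        truncated = traj["num_turns"] >= max_turns and not traj["answered"]
        weight = 0.0 if truncated else 1.0
        masks.append([weight] * traj["num_tokens"])
    return masks
```

Masking, rather than assigning a negative reward, avoids biasing the policy toward short responses during training while still allowing longer reasoning at test time.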
Reinforcement learning (RL) is becoming essential in developing large language models (LLMs), particularly for aligning them with human preferences and enhancing their capabilities through multi-turn interactions. This article reviews various open-source RL libraries, analyzing their designs and trade-offs to assist researchers in selecting the appropriate tools for specific applications. Key libraries discussed include TRL, Verl, OpenRLHF, and several others, each catering to different RL needs and architectures.
INTELLECT-2 is a groundbreaking 32 billion parameter model trained using a decentralized reinforcement learning framework called PRIME-RL, enabling fully asynchronous training across a global network of contributors. The model demonstrates significant improvements in reasoning tasks and is open-sourced to foster further research in decentralized AI training methodologies.
Tunix is a new open-source, JAX-native library designed to simplify the post-training process for large language models (LLMs). It offers a comprehensive toolkit for model alignment, including various algorithms for supervised fine-tuning, preference tuning, reinforcement learning, and knowledge distillation, all optimized for performance on TPUs. The library enhances the developer experience with a white-box design and seamless integration into the JAX ecosystem.
InternVL3.5 introduces a new family of open-source multimodal models that enhance versatility, reasoning capabilities, and inference efficiency. A key innovation is the Cascade Reinforcement Learning framework, which improves reasoning tasks significantly while a Visual Resolution Router optimizes visual token resolution. The model achieves notable performance gains and supports advanced capabilities like GUI interaction and embodied agency, positioning it competitively against leading commercial models.
OpenThinkIMG is an open-source framework that enables Large Vision-Language Models (LVLMs) to engage in interactive visual cognition, allowing AI agents to effectively think with images. It features a flexible tool management system, a dynamic inference pipeline, and a novel reinforcement learning approach called V-ToolRL, which enhances the adaptability and performance of visual reasoning tasks. The project aims to bridge the gap between human-like visual cognition and AI capabilities by providing a standardized platform for tool-augmented reasoning.
The Environments Hub is being launched as an open, community-driven platform for reinforcement learning (RL) environments, aiming to provide a shared space for researchers and developers to build, share, and utilize these environments effectively. This initiative seeks to democratize access to high-quality RL tools, fostering innovation in AI by lowering barriers to creating and training models, while also promoting open-source development in contrast to proprietary systems used by large labs.