5 links tagged with all of: reinforcement-learning + software-engineering
Click any tag below to further narrow down your results
Links
The article discusses the release of SWE-1.5, a new coding agent that balances speed and performance through a unified system. It highlights the development process, including reinforcement learning and custom coding environments, which improve task execution and code quality. SWE-1.5 aims to surpass previous models in both speed and effectiveness.
Composer is a new model designed to assist software engineers by generating code and solutions quickly. It uses reinforcement learning to optimize its performance in real-world coding scenarios, enhancing productivity for developers. The model has been tested against real requests to ensure its usefulness in software development.
This article presents Agentic Rubrics, a method for verifying software engineering agents without executing code. By using a context-grounded checklist created by an expert agent, candidate patches are scored efficiently, providing a more interpretable alternative to traditional verification methods. The results show significant improvements in scoring compared to existing baselines.
MiniMax-M1 is a groundbreaking open-weight hybrid-attention reasoning model featuring a Mixture-of-Experts architecture and lightning attention mechanism, optimized for handling complex tasks with long inputs. It excels in various benchmarks, particularly in mathematics, software engineering, and long-context understanding, outperforming existing models with efficient test-time compute scaling. The model is trained through large-scale reinforcement learning and offers function calling capabilities, positioning it as a robust tool for next-generation AI applications.
Kimi-Dev-72B is an advanced open-source coding language model designed for software engineering tasks, achieving a state-of-the-art performance of 60.4% on the SWE-bench Verified benchmark. It leverages large-scale reinforcement learning to autonomously patch real repositories and ensures high-quality solutions by only rewarding successful test suite completions. Developers and researchers are encouraged to explore and contribute to its capabilities, available for download on Hugging Face and GitHub.