7 links tagged with all of: reasoning + machine-learning
Links
R-Zero is a self-evolving framework for Large Language Models (LLMs) that generates its own training data autonomously, removing the reliance on human-curated tasks. It pairs two models: a Challenger, which poses increasingly difficult tasks, and a Solver, which attempts them. The two co-evolve, yielding significant improvements in reasoning capability across various benchmarks, with notable gains reported for the Qwen3-4B-Base model.
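A minimal toy sketch of the Challenger/Solver co-evolution loop described above; the function names, reward shaping, and update rules are illustrative assumptions, not R-Zero's actual training procedure.

```python
# Toy co-evolution loop: a Challenger that proposes tasks and a Solver that
# attempts them. Reward shaping and updates are illustrative assumptions,
# not R-Zero's actual training procedure.
import random

def challenger_propose(difficulty: float) -> dict:
    """Stand-in for the Challenger: emit an arithmetic task scaled by difficulty."""
    hi = int(10 * difficulty) + 1
    a, b = random.randint(1, hi), random.randint(1, hi)
    return {"question": f"{a} + {b} = ?", "answer": a + b}

def solver_attempt(task: dict, skill: float) -> bool:
    """Stand-in for the Solver: succeed with probability tied to its skill."""
    return random.random() < min(1.0, skill / (task["answer"] + 1))

difficulty, skill = 1.0, 5.0
for _ in range(1000):
    task = challenger_propose(difficulty)
    if solver_attempt(task, skill):
        skill += 0.05        # the Solver improves on tasks it can solve
        difficulty += 0.02   # the Challenger pushes toward harder tasks
    else:
        difficulty = max(1.0, difficulty - 0.01)  # back off past the Solver's frontier

print(f"final difficulty={difficulty:.2f}, solver skill={skill:.2f}")
```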
Daily-Omni is introduced as a new benchmark for audio-visual reasoning, featuring 684 videos and 1197 QA pairs across various tasks. The study highlights how much current multimodal large language models struggle to integrate audio and visual information, and shows that combining visual and audio models with temporal alignment techniques improves performance. The paper also presents a QA generation pipeline that makes evaluation more efficient and scalable.
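A small sketch of the kind of temporal alignment step mentioned above, pairing visual and audio events by timestamp before handing them to a QA model; the event format, window size, and merge rule are assumptions for illustration, not Daily-Omni's pipeline.

```python
# Illustrative temporal alignment of visual and audio events before QA.
# The event format, window size, and merge rule are assumptions, not
# Daily-Omni's actual pipeline.
visual_events = [(0.0, "a person enters the kitchen"), (4.5, "they open the fridge")]
audio_events = [(1.2, "door creaks"), (5.0, "speech: 'where is the milk?'")]

def align(visual, audio, window=2.0):
    """Pair each visual event with audio events within `window` seconds."""
    merged = []
    for t_v, v in visual:
        nearby = [a for t_a, a in audio if abs(t_a - t_v) <= window]
        merged.append({"time": t_v, "visual": v, "audio": nearby})
    return merged

for item in align(visual_events, audio_events):
    print(item)  # aligned context that a QA model could reason over
```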
Robix is a unified model that integrates robot reasoning, task planning, and natural language interaction, enhancing human-robot collaboration as part of a hierarchical robot system. It supports capabilities such as proactive dialogue and context-aware reasoning, and delivers strong interactive task execution across a range of user-involved scenarios. Extensive evaluations show that Robix outperforms leading models in both foundational and interactive capabilities.
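A toy sketch of a hierarchical loop in the spirit of the description above, where a high-level reasoning layer decides between executing a subtask, asking a clarifying question, or replying; all function names and decision rules are hypothetical, not Robix's implementation.

```python
# Hypothetical hierarchical loop: a high-level reasoning layer chooses between
# executing, asking, or replying, and hands subtasks to a low-level controller.
# Names and decision rules are illustrative, not Robix's implementation.
def low_level_controller(subtask: str) -> str:
    return f"executed: {subtask}"

def reasoning_layer(instruction: str) -> dict:
    """Decide the next move from the current user instruction."""
    if instruction.strip().endswith("?"):
        # Ambiguous or open question -> proactive dialogue instead of acting blindly.
        return {"action": "ask", "utterance": "Could you clarify which object you mean?"}
    if "stop" in instruction.lower():
        # Interruption handling: acknowledge and halt.
        return {"action": "reply", "utterance": "Okay, stopping the current task."}
    return {"action": "execute", "subtask": f"plan and perform: {instruction}"}

history = []
for user_turn in ["Bring me the red cup", "Can you grab the other one?", "stop"]:
    decision = reasoning_layer(user_turn)
    if decision["action"] == "execute":
        history.append(low_level_controller(decision["subtask"]))
    else:
        history.append(decision["utterance"])
print(history)
```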
Recent advances in large language models (LLMs) have prompted debate about their reasoning capabilities. This study introduces a representation engineering approach that derives control vectors from model activations and uses them to modulate the model at inference time, improving reasoning performance on various tasks without any additional training.
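A minimal numpy sketch of the generic control-vector recipe: contrast activations from two prompt sets, take the difference of means, and add the scaled vector back at inference. The placeholder extractor, prompt sets, layer choice, and scaling factor are assumptions, not necessarily the study's exact procedure.

```python
# Minimal numpy sketch of an activation-based control vector. The placeholder
# extractor, prompt sets, and scaling factor are assumptions, not necessarily
# the study's exact procedure.
import numpy as np

def get_hidden_states(prompts, dim=64, seed=0):
    """Placeholder for extracting a chosen layer's hidden states from an LLM."""
    rng = np.random.default_rng(seed)
    return rng.normal(size=(len(prompts), dim))

reasoning_prompts = ["Let's think step by step: ...", "First, break the problem down: ..."]
baseline_prompts = ["Answer immediately: ...", "Just state the result: ..."]

# 1. Collect activations for two contrasting prompt sets at the same layer.
h_reason = get_hidden_states(reasoning_prompts, seed=1)
h_base = get_hidden_states(baseline_prompts, seed=2)

# 2. The control vector is the difference of the mean activations.
control_vector = h_reason.mean(axis=0) - h_base.mean(axis=0)

# 3. At inference, add the scaled vector to that layer's activations.
def steer(hidden_state, alpha=4.0):
    return hidden_state + alpha * control_vector

print(steer(get_hidden_states(["2 + 2 = ?"], seed=3)[0])[:5])
```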
M1 introduces a hybrid linear RNN reasoning model based on the Mamba architecture, designed for scalable test-time computation on complex mathematical problems. Built by distilling from existing reasoning models and then applying reinforcement learning, M1 achieves substantially faster, more memory-efficient inference than comparable transformer models while matching the accuracy of state-of-the-art distilled reasoning models.
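A brief sketch of why faster inference matters for test-time scaling: a higher-throughput model can draw more samples within the same wall-clock budget and aggregate them, for example by majority voting. The random sampler below is a stand-in, not M1 itself.

```python
# Test-time scaling via self-consistency: more samples per wall-clock budget,
# then a majority vote. The random sampler is a stand-in, not M1 itself.
from collections import Counter
import random

def sample_answer(question: str) -> str:
    """Placeholder generator; a real setup would sample from the reasoning model."""
    return random.choice(["42", "42", "42", "41", "40"])

def majority_vote(question: str, n_samples: int) -> str:
    votes = Counter(sample_answer(question) for _ in range(n_samples))
    return votes.most_common(1)[0][0]

# A model with higher generation throughput can afford more samples within the
# same latency budget, which is where the extra accuracy comes from.
print(majority_vote("What is 6 * 7?", n_samples=32))
```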
JudgeLRM introduces a novel approach to using Large Language Models (LLMs) as evaluators, particularly for complex reasoning tasks. By employing reinforcement learning with judge-wise rewards, JudgeLRM models significantly outperform both supervised fine-tuning (SFT) baselines and current leading models, with the clearest advantage on evaluation tasks that require deep reasoning.
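A hedged sketch of what a judge-wise reward could look like, combining a structural check on the judge's output with agreement against a gold preference label; the tag formats, terms, and weights are assumptions, not JudgeLRM's actual reward.

```python
# Hedged sketch of a judge-wise reward: a structural check on the judge's
# output plus agreement with a gold preference. Tag formats and weights are
# assumptions, not JudgeLRM's actual reward.
import re

def judge_reward(judge_output: str, gold_choice: str) -> float:
    """Score one judgment: well-formed reasoning plus a correct final verdict."""
    reward = 0.0
    if "<think>" in judge_output and "</think>" in judge_output:
        reward += 0.2  # structural term: the judge exposes its reasoning
    verdict = re.search(r"\[\[([AB])\]\]", judge_output)
    if verdict:
        reward += 0.2  # structural term: a parseable verdict is present
        if verdict.group(1) == gold_choice:
            reward += 1.0  # outcome term: agreement with the ground-truth preference
    return reward

sample = "<think>Response A cites the theorem correctly...</think> Verdict: [[A]]"
print(judge_reward(sample, gold_choice="A"))  # 1.4
```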
XBai o4 is the latest fourth-generation open-source large model release, showcasing enhanced complex reasoning capabilities that surpass OpenAI-o3-mini in Medium mode. It employs a novel reflective generative training form to significantly reduce inference costs and improve response quality. The repository includes training and evaluation code, along with setup instructions and benchmark results.