Links
The SGLang RL team developed an end-to-end INT4 Quantization-Aware Training (QAT) pipeline that enhances training efficiency and model stability. By using fake quantization during training and real quantization at inference, they achieved significant performance improvements for large models on a single GPU. The article details the technical steps taken and results of their approach.
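The core trick the summary mentions — fake quantization during training — means weights stay in floating point but are rounded through an INT4 grid in the forward pass, so the model learns to tolerate the quantization error it will see at inference. The sketch below is a generic illustration of that quantize–dequantize step in NumPy; the group size and symmetric scaling scheme are illustrative assumptions, not details of the SGLang pipeline.

```python
import numpy as np

def fake_quant_int4(w, group_size=32):
    """Pass w through a symmetric INT4 grid (codes in -8..7) and back.

    The returned tensor is still float32 but takes at most 16 distinct
    values per group -- this is the 'fake quantization' used in QAT.
    """
    w = np.asarray(w, dtype=np.float32)
    flat = w.reshape(-1, group_size)
    # Per-group scale: the largest magnitude maps to the INT4 extreme 7.
    scale = np.abs(flat).max(axis=1, keepdims=True) / 7.0
    scale = np.where(scale == 0, 1.0, scale)          # guard all-zero groups
    q = np.clip(np.round(flat / scale), -8, 7)        # integer INT4 codes
    return (q * scale).reshape(w.shape)               # dequantized weights

w = np.random.randn(2, 32).astype(np.float32)
wq = fake_quant_int4(w)
```

At inference, the same codes and scales would be stored as real INT4 values instead of being dequantized back to float, which is where the memory and speed savings come from.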
DeepSeek published a paper detailing a training method it calls Manifold-Constrained Hyper-Connections. The approach aims to enhance scalability and reduce energy use in AI development, addressing challenges tied to limited access to Nvidia chips in China.
Flow-GRPO-Fast is a newly introduced accelerated variant of the Flow-GRPO model that enhances training efficiency by reducing the number of denoising steps required per trajectory. Recent updates include support for various models and reward mechanisms, as well as improvements in training parameters to optimize performance on tasks such as image editing and generation. The article outlines detailed instructions for setup, training, and model implementation across multiple environments.
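The speedup described above comes from integrating each sampled trajectory with fewer denoising steps. As a toy illustration of why step count dominates rollout cost, here is an Euler integration of a flow ODE where the per-trajectory work scales linearly with the number of steps; the velocity field and step counts are hypothetical and not taken from the Flow-GRPO-Fast codebase.

```python
import numpy as np

def timestep_schedule(num_steps):
    """Uniform flow-matching times from t=1 (noise) down to t=0 (data)."""
    return np.linspace(1.0, 0.0, num_steps + 1)

def rollout(x, velocity_fn, num_steps):
    """Euler integration of dx/dt = v(x, t); one model call per step,
    so halving num_steps halves the cost of sampling a trajectory."""
    ts = timestep_schedule(num_steps)
    for t0, t1 in zip(ts[:-1], ts[1:]):
        x = x + (t1 - t0) * velocity_fn(x, t0)  # (t1 - t0) < 0: denoising
    return x

# Hypothetical linear velocity field, for illustration only.
v = lambda x, t: -x
x0 = np.ones(4)
full = rollout(x0, v, 40)  # evaluation-quality rollout
fast = rollout(x0, v, 10)  # training rollout with 4x fewer model calls
```

The training-time trajectories trade some integration accuracy for a proportional reduction in model calls, which is the efficiency lever the summary refers to.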
TRL has introduced co-located vLLM to improve the efficiency of training large language models by allowing both training and inference to run on the same GPUs, eliminating idle time and reducing hardware costs. This integration enhances throughput, simplifies deployment, and makes the system more robust for online learning setups like GRPO. The new approach is supported by a series of performance experiments demonstrating significant speedups compared to traditional server setups.
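In TRL, switching GRPO from a separate vLLM server to the co-located mode is a configuration change. The fragment below is a minimal sketch based on TRL's `GRPOConfig` fields as documented at the time of writing (`use_vllm`, `vllm_mode`, `vllm_gpu_memory_utilization`); exact names and defaults may differ across TRL versions, so treat it as an assumption to check against the release you install.

```python
from trl import GRPOConfig

# Co-located mode runs vLLM inside the training processes, on the same
# GPUs, instead of talking to a standalone `trl vllm-serve` server.
config = GRPOConfig(
    output_dir="grpo-colocate",       # hypothetical output path
    use_vllm=True,
    vllm_mode="colocate",             # vs. the default "server" mode
    vllm_gpu_memory_utilization=0.3,  # leave headroom for training tensors
)
```

Because generation and optimization share GPUs, the memory-utilization cap matters: vLLM's KV cache must leave enough room for the model's gradients and optimizer state.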