4 links tagged with all of: language-models + inference
Links
Set Block Decoding (SBD) accelerates inference in autoregressive language models by integrating next-token prediction with masked-token prediction, allowing multiple tokens to be sampled in parallel. Fine-tuning existing models such as Llama-3.1 and Qwen-3 with SBD yields a 3-5x reduction in the forward passes needed for generation while matching the accuracy of standard next-token training.
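A loose sketch of what such a decoding loop might look like, using a hypothetical `model.fill_masks` helper and a made-up mask sentinel rather than the paper's actual interface:

```python
# Hypothetical sketch of block-wise parallel decoding: instead of one forward
# pass per token, each pass appends a block of masked positions and asks the
# model to fill them all at once.
MASK_ID = -1  # invented sentinel for a masked position

def set_block_decode(model, prompt_ids, block_size=4, max_new_tokens=64):
    tokens = list(prompt_ids)
    passes = 0
    while len(tokens) - len(prompt_ids) < max_new_tokens:
        block = [MASK_ID] * block_size
        # model.fill_masks is an assumed helper: one forward pass returning
        # token ids for every masked slot in the sequence.
        filled = model.fill_masks(tokens + block)
        tokens.extend(filled[-block_size:])
        passes += 1
    return tokens, passes  # roughly max_new_tokens / block_size passes
```

With a block size of 4, generating 64 tokens takes about 16 passes instead of 64, which is where the reduction in forward passes comes from.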
Recursive Language Models (RLMs) are introduced as an inference strategy in which a language model recursively interacts with unbounded input context through a REPL environment. The approach aims to mitigate context rot and improve performance on long-context benchmarks, with promising early results suggesting that RLMs may enhance general-purpose inference capabilities.
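One way to picture the recursion, as a rough illustration rather than the authors' implementation: the full context stays outside the prompt, and the model is invoked on programmatically selected slices, recursing whenever a slice is still too long. The `llm()` function below is a hypothetical stand-in for a real model call.

```python
def llm(prompt: str) -> str:
    # Hypothetical stand-in for an actual model API call.
    raise NotImplementedError("stand-in for a real model call")

def recursive_answer(question: str, context: str, max_chars: int = 8000) -> str:
    if len(context) <= max_chars:
        return llm(f"{context}\n\nQuestion: {question}")
    # Split the context, answer each piece recursively, then merge the answers.
    partials = [
        recursive_answer(question, context[i:i + max_chars], max_chars)
        for i in range(0, len(context), max_chars)
    ]
    return llm("Combine these partial answers into one:\n"
               + "\n".join(partials)
               + f"\n\nQuestion: {question}")
```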
Reproducibility in large language model (LLM) inference is hard to achieve because of inherent nondeterminism, commonly blamed on the combination of floating-point non-associativity and concurrency. Yet most LLM kernels do not actually use atomic adds, the usual concurrency culprit, which suggests the real causes of output variability are more subtle. The article digs into these causes and shows how to obtain truly reproducible results in LLM inference.
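As a quick illustration of the floating-point side of this: summing the same numbers in a different order can change the result, so any reduction whose evaluation order varies between runs is not bit-stable.

```python
# Floating-point addition is not associative: grouping changes the result.
a, b, c = 1e16, -1e16, 1.0

left = (a + b) + c   # 0.0 + 1.0 -> 1.0
right = a + (b + c)  # the 1.0 is absorbed into -1e16, then cancelled -> 0.0

print(left, right, left == right)  # 1.0 0.0 False
```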
The article examines the economics of language-model inference, breaking down the costs of deploying models in real-world applications and the factors that drive pricing and efficiency. The analysis aims to help businesses in various sectors balance performance against cost when building on these models.