The article discusses using discrete language diffusion models for text generation, specifically highlighting how BERT's masked language modeling objective can be generalized into a diffusion framework. It traces the evolution from traditional models like BERT and GPT to the newer Gemini Diffusion model, and introduces the idea of turning BERT's training objective into a generative process by training with variable masking rates and generating through iterative unmasking. The author also notes related work, such as DiffusionBERT, which develops the same idea with more rigorous evaluation.
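The core trick is small: instead of BERT's fixed ~15% masking rate, sample a masking rate per example during training, and at inference start from a fully masked sequence and unmask it over several steps. Below is a minimal sketch of that idea under my own assumptions (the article's code is not reproduced here); `TinyMaskedLM`, `diffusion_mlm_loss`, and `generate` are illustrative names, and the toy encoder stands in for a real BERT-style model.

```python
# Sketch: masked language modeling as discrete diffusion.
# Training uses a random masking rate t ~ U(0, 1); generation starts fully
# masked and reveals the most confident positions over a fixed number of steps.
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB_SIZE, MASK_ID = 1000, 0
D_MODEL = 64

class TinyMaskedLM(nn.Module):
    """Toy bidirectional encoder standing in for a BERT-style model."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB_SIZE, D_MODEL)
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(D_MODEL, nhead=4, batch_first=True),
            num_layers=2,
        )
        self.lm_head = nn.Linear(D_MODEL, VOCAB_SIZE)

    def forward(self, tokens):
        return self.lm_head(self.encoder(self.embed(tokens)))

def diffusion_mlm_loss(model, tokens):
    """Variable-rate masking: each sequence gets its own masking ratio t."""
    batch, seq_len = tokens.shape
    t = torch.rand(batch, 1)                      # per-example masking rate
    mask = torch.rand(batch, seq_len) < t         # positions to corrupt
    if not mask.any():
        mask[..., 0] = True                       # guard: mask at least one position
    corrupted = tokens.masked_fill(mask, MASK_ID)
    logits = model(corrupted)
    # As in standard MLM, the loss is computed only on the masked positions.
    return F.cross_entropy(logits[mask], tokens[mask])

@torch.no_grad()
def generate(model, seq_len, steps=8):
    """Iterative unmasking: start fully masked, reveal a fraction each step."""
    tokens = torch.full((1, seq_len), MASK_ID)
    for step in range(steps):
        still_masked = tokens == MASK_ID
        if not still_masked.any():
            break
        probs = model(tokens).softmax(-1)
        conf, pred = probs.max(-1)
        conf = conf.masked_fill(~still_masked, -1.0)   # only rank masked slots
        # Unmask the most confident remaining positions this step.
        k = max(1, int(still_masked.sum() / (steps - step)))
        idx = conf.topk(k, dim=-1).indices
        tokens[0, idx[0]] = pred[0, idx[0]]
    return tokens

model = TinyMaskedLM()
batch = torch.randint(2, VOCAB_SIZE, (4, 16))
print("loss:", diffusion_mlm_loss(model, batch).item())
print("sample:", generate(model, seq_len=16))
```

With the masking rate fixed at 1.0 this reduces to "predict everything from nothing", and with a small fixed rate it reduces to ordinary BERT pretraining; the variable rate is what lets one model serve every step of the unmasking schedule.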
The article introduces Fast-dLLM, a method for accelerating diffusion-based large language models (LLMs) with a block-wise approximate Key-Value (KV) cache and a confidence-aware parallel decoding strategy. The approach addresses the slow inference speed of diffusion LLMs and mitigates the quality degradation that naive parallel token decoding causes. Reported experiments show up to 27.6 times higher throughput with minimal accuracy loss, making diffusion LLMs more practical to deploy.
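To make the decoding strategy concrete, here is a rough sketch of confidence-aware parallel decoding as I understand it, not the Fast-dLLM implementation: within the current block, every masked position whose top-1 probability clears a threshold is committed in the same step, rather than one token per forward pass. `toy_logits`, `decode_block`, the threshold value, and the block size are all illustrative assumptions; in Fast-dLLM the already-decoded prefix and finished blocks would additionally be served from the approximate block-wise KV cache instead of being recomputed.

```python
# Sketch: confidence-aware parallel decoding over fixed-size blocks.
import torch

VOCAB_SIZE, MASK_ID = 1000, 0

def toy_logits(tokens: torch.Tensor) -> torch.Tensor:
    """Stand-in denoiser: random logits of shape (batch, seq, vocab)."""
    return torch.randn(*tokens.shape, VOCAB_SIZE)

@torch.no_grad()
def decode_block(tokens: torch.Tensor, block: slice,
                 threshold: float = 0.9, max_steps: int = 32) -> torch.Tensor:
    """Fill the masked positions inside `block` in as few steps as possible."""
    for _ in range(max_steps):
        masked = tokens[:, block] == MASK_ID
        if not masked.any():
            break                                   # block fully decoded
        probs = toy_logits(tokens).softmax(-1)[:, block]
        conf, pred = probs.max(-1)                  # top-1 confidence per position
        # Commit every masked position that is confident enough...
        accept = masked & (conf >= threshold)
        if not accept.any():
            # ...but always commit the single most confident masked position,
            # so decoding is guaranteed to make progress.
            best = conf.masked_fill(~masked, -1.0).argmax(-1)
            accept = torch.zeros_like(masked)
            accept[torch.arange(accept.shape[0]), best] = True
        tokens[:, block] = torch.where(accept, pred, tokens[:, block])
    return tokens

# Usage: decode a 32-token sequence block by block (block size 8).
seq = torch.full((1, 32), MASK_ID)
for start in range(0, 32, 8):
    seq = decode_block(seq, slice(start, start + 8))
print(seq)
```

The threshold trades speed for quality: a high threshold behaves like cautious step-by-step unmasking, while a low one commits many tokens per pass and risks the degradation the article says the method is designed to avoid.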