Quit Emailing Yourself

1 link tagged with all of: reproducibility + floating-point + gpu-inference + nondeterminism + kernels

Links

Defeating Nondeterminism in LLM Inference

This article digs into why repeated LLM calls can produce different outputs even at zero temperature. It argues that floating-point non-associativity and kernel implementation details, rather than thread scheduling or atomic adds, are the real sources of run-to-run variation, and it outlines ways to make inference fully reproducible.
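
As a rough illustration of the floating-point non-associativity the summary mentions, here is a minimal Python sketch. It is not taken from the article: the values and the `chunked_sum` helper are hypothetical, and the chunk size only loosely stands in for how a kernel might split a reduction. It shows that regrouping the same additions can change the rounded result.

```python
# Floating-point addition is not associative: grouping changes the rounded result.
a, b, c = 0.1, 1e20, -1e20

left = (a + b) + c   # 0.1 is absorbed into 1e20, then the large terms cancel
right = a + (b + c)  # the large terms cancel first, so 0.1 survives


def sequential_sum(values):
    """Add values left to right with plain float addition."""
    total = 0.0
    for v in values:
        total += v
    return total


def chunked_sum(values, chunk):
    """Hypothetical helper: sum fixed-size chunks, then add the partial sums."""
    partials = [
        sequential_sum(values[i:i + chunk])
        for i in range(0, len(values), chunk)
    ]
    return sequential_sum(partials)


values = [1e20, 0.1, -1e20, 0.1]

print(left, right)             # the two groupings disagree
print(chunked_sum(values, 2))  # one way of splitting the reduction
print(chunked_sum(values, 4))  # another split, same inputs
```

The inputs are identical in every call; only the order of the additions differs. That is the sense in which a change in reduction strategy, with no randomness involved, can be enough to change a result.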

Last saved Apr 14, 2026 · 6 min read

nondeterminism floating-point kernels gpu-inference reproducibility