# reproducibility → nondeterminism → gpu-inference

1 link tagged with all of: reproducibility + nondeterminism + gpu-inference

Links

Defeating Nondeterminism in LLM Inference

This article examines why repeated LLM calls can produce different outputs even at temperature zero. It argues that floating-point non-associativity combined with kernel implementation details, rather than thread scheduling or atomic adds, is the real source of run-to-run variation, and it outlines ways to make inference fully reproducible.
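The floating-point point in the summary can be illustrated with a small sketch. It is not taken from the article: the helper names `sum_left_to_right` and `sum_in_chunks`, the sample values, and the chunk size are all hypothetical, and plain Python on the CPU stands in for a GPU reduction kernel. It shows only that the same numbers added in a different grouping can give a different result, while a fixed grouping gives the same result every time.

```python
import math


def sum_left_to_right(values):
    """Accumulate in list order, like one sequential reduction."""
    total = 0.0
    for v in values:
        total += v
    return total


def sum_in_chunks(values, chunk_size):
    """Reduce each chunk separately, then add the partial sums.

    Loosely mimics a kernel that splits a reduction into tiles.
    chunk_size is a hypothetical stand-in for a tiling choice.
    """
    partials = [
        sum_left_to_right(values[i:i + chunk_size])
        for i in range(0, len(values), chunk_size)
    ]
    return sum_left_to_right(partials)


# Floating-point addition is not associative.
print((0.1 + 0.2) + 0.3 == 0.1 + (0.2 + 0.3))  # False

# Same inputs, different grouping, different answers.
values = [1e20, 1.0, -1e20, 1.0]
sequential = sum_left_to_right(values)          # 1.0
chunked = sum_in_chunks(values, chunk_size=2)   # 0.0
exact = math.fsum(values)                       # 2.0 (correctly rounded sum)
print(sequential, chunked, exact)

# A fixed grouping is deterministic: repeating it gives the same bits.
print(sum_in_chunks(values, chunk_size=2) == chunked)  # True
```

In this toy setup the variation comes entirely from how the reduction is grouped, not from timing or concurrency, which is consistent with the summary's emphasis on kernel implementation details.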

Last saved Apr 14, 2026 · 6 min read

Tags: nondeterminism, floating-point, kernels, gpu-inference, reproducibility