1 link tagged with all of: reproducibility + floating-point + gpu-inference + nondeterminism + kernels
Links
This article digs into why repeated LLM calls with identical input can produce different outputs even at zero temperature. It argues that floating-point non-associativity and kernel implementation details, rather than thread scheduling or atomic adds, are the real sources of run-to-run variation, and it outlines ways to make inference fully reproducible.
nondeterminism
floating-point
kernels
gpu-inference
reproducibility
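The key idea in the description, floating-point non-associativity, can be seen in a few lines. The sketch below is an illustration only, not code from the article: `sequential_sum` and `chunked_sum` are hypothetical helpers, and the chunked version only loosely mimics how a kernel might split a reduction. It shows that the same numbers summed with a different grouping give different results, while each grouping on its own is perfectly repeatable.

```python
def sequential_sum(xs):
    """Left-to-right reduction: ((x0 + x1) + x2) + ..."""
    total = 0.0
    for x in xs:
        total += x
    return total


def chunked_sum(xs, chunk):
    """Hypothetical split reduction: sum each chunk, then sum the partials.

    Loosely mimics a kernel that divides a reduction into pieces; the
    chunk size is an illustrative parameter, not a real kernel setting.
    """
    partials = [sequential_sum(xs[i:i + chunk]) for i in range(0, len(xs), chunk)]
    return sequential_sum(partials)


# 0.1 is far below the rounding step of 1e20, so it is absorbed whenever
# it is added to 1e20 before the large values cancel.
vals = [0.1, 1e20, -1e20, 0.1]

print(sequential_sum(vals))  # 0.1  (large values cancel, then the last 0.1 survives)
print(chunked_sum(vals, 2))  # 0.0  (both 0.1s are absorbed inside their chunks)
```

Neither result is random: calling either function again on the same input returns the same value. The outputs diverge only because the grouping of the additions changed.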