1 link tagged with all of: reproducibility + nondeterminism + gpu-inference
Links
This article examines why repeated LLM calls can produce different outputs even at temperature zero. It argues that floating-point non-associativity combined with kernel implementation details, rather than thread scheduling or atomic adds, is the real source of run-to-run variation, and it outlines ways to make inference fully reproducible.
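To make the non-associativity point concrete, here is a minimal Python sketch. It is not taken from the linked article: the helper names `sequential_sum` and `chunked_sum` and the input values are hypothetical, chosen only to show that the same numbers added in a different grouping can round to a different result. The chunked version loosely stands in for a kernel that splits a reduction into partial sums.

```python
# Illustrative only: hypothetical helpers, not code from the linked article.

def sequential_sum(values):
    """Add values left to right using plain float addition."""
    total = 0.0
    for v in values:
        total += v
    return total


def chunked_sum(values, chunk_size):
    """Sum each chunk separately, then add the partial sums.

    This mimics (very loosely) a reduction that is split into pieces,
    where the split size is an implementation detail of the kernel.
    """
    partials = [
        sequential_sum(values[i:i + chunk_size])
        for i in range(0, len(values), chunk_size)
    ]
    return sequential_sum(partials)


# Grouping changes the rounded result of floating-point addition.
a, b, c = 0.1, 0.2, 0.3
print((a + b) + c == a + (b + c))  # False

# Same four numbers, two reduction strategies, two different answers.
# The exact mathematical sum is 2.0; neither strategy recovers it.
values = [1e16, 1.0, -1e16, 1.0]
print(sequential_sum(values))   # 1.0
print(chunked_sum(values, 2))   # 0.0
```

Each strategy is fully deterministic on its own. The results differ only because the grouping of the additions differs, which is why a change in how a reduction is split can change the output without any race condition being involved.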