Achieving reproducibility in large language model (LLM) inference is challenging due to inherent nondeterminism, often attributed to floating-point non-associativity and concurrency issues. However, most kernels in LLMs do not require atomic adds, which are a common source of nondeterminism, suggesting that the causes of variability in outputs are more complex. The article explores these complexities and offers insights into obtaining truly reproducible results in LLM inference.
+ nondeterminism
reproducibility ✓
floating-point ✓
inference ✓
language-models ✓