Parallel Scaling (ParScale) is introduced as a new scaling paradigm for language models that scales parallel computation during both training and inference rather than parameter count. Compared with traditional parameter scaling, it is reported to improve reasoning performance and inference efficiency while incurring lower memory and latency costs. The authors release models and tools for implementing and experimenting with this new scaling law.
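To make the idea concrete, below is a minimal sketch of what scaling parallel computation (rather than parameters) can look like: P streams share one backbone, each stream applies a learnable transformation to the input, and the P outputs are combined with learned aggregation weights. The class name `ParallelScaledLM`, the use of per-stream learnable embeddings, and the softmax-weighted aggregation are illustrative assumptions for this sketch, not the authors' exact implementation.

```python
# Sketch: P parallel streams over a shared backbone with learned output aggregation.
import torch
import torch.nn as nn


class ParallelScaledLM(nn.Module):
    def __init__(self, backbone: nn.Module, d_model: int, num_streams: int = 4):
        super().__init__()
        self.backbone = backbone                      # weights shared across all streams
        self.num_streams = num_streams
        # One learnable "stream embedding" per parallel stream (hypothetical input transformation).
        self.stream_emb = nn.Parameter(torch.randn(num_streams, d_model) * 0.02)
        # Small head producing per-stream aggregation logits (hypothetical aggregator).
        self.agg_head = nn.Linear(d_model, 1)

    def forward(self, token_emb: torch.Tensor) -> torch.Tensor:
        # token_emb: (batch, seq_len, d_model)
        b, t, d = token_emb.shape
        p = self.num_streams
        # Replicate the input P times and add a distinct learnable offset per stream.
        x = token_emb.unsqueeze(0) + self.stream_emb.view(p, 1, 1, d)   # (P, B, T, D)
        x = x.reshape(p * b, t, d)
        # One batched forward pass through the shared backbone covers all P streams.
        h = self.backbone(x).reshape(p, b, t, d)
        # Dynamically weight the P stream outputs per token and sum them.
        w = torch.softmax(self.agg_head(h), dim=0)                      # (P, B, T, 1)
        return (w * h).sum(dim=0)                                       # (B, T, D)


if __name__ == "__main__":
    # Toy backbone standing in for a transformer language model.
    backbone = nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True)
    model = ParallelScaledLM(backbone, d_model=64, num_streams=4)
    out = model(torch.randn(2, 16, 64))
    print(out.shape)  # torch.Size([2, 16, 64])
```

Because the streams are processed as one larger batch through shared weights, extra capability comes from added parallel compute rather than added parameters, which is the trade-off the summary contrasts with parameter scaling.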
parallel-scaling ✓
language-models ✓
reasoning ✓
inference-efficiency ✓
+ cost-analysis