1 link tagged with all of: machine-learning + pytorch + training-stability + gradient-clipping
Links
ZClip is an adaptive gradient clipping technique for mitigating gradient spikes during LLM pre-training. It uses exponential moving averages (EMAs) of the gradient norm to adjust the clipping threshold dynamically. Because the threshold tracks changes in gradient norms instead of being fixed, it improves training stability and efficiency. The implementation is compatible with PyTorch and PyTorch Lightning and can be integrated into existing training pipelines.
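The description says only that EMAs drive the threshold. The following is a minimal pure-Python sketch of EMA-based adaptive clipping. It assumes a z-score rule: EMAs track the mean and variance of the gradient norm, and a norm more than `z_thresh` standard deviations above the mean is clipped back to that bound. The class name `EMAClipController`, its parameters, and its default values are hypothetical. They are not the ZClip package's API, and the released implementation may use a different rule.

```python
import math
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class EMAClipController:
    """Hypothetical EMA-based adaptive clipping threshold (not the ZClip API).

    Tracks EMAs of the gradient norm's mean and variance and treats norms whose
    z-score exceeds `z_thresh` as spikes to be clipped.
    """
    alpha: float = 0.97        # EMA smoothing factor (assumed value)
    z_thresh: float = 2.5      # z-score above which a norm counts as a spike (assumed)
    warmup_steps: int = 25     # raw norms collected before clipping starts (assumed)
    eps: float = 1e-6          # avoids division by zero when the variance is 0
    mean: Optional[float] = None
    var: float = 0.0
    _warmup: List[float] = field(default_factory=list)

    def step(self, grad_norm: float) -> float:
        """Return the norm the gradients should be clipped to at this step."""
        if self.mean is None:
            # Warm-up: record raw norms without clipping, then initialise the stats.
            self._warmup.append(grad_norm)
            if len(self._warmup) >= self.warmup_steps:
                n = len(self._warmup)
                self.mean = sum(self._warmup) / n
                self.var = sum((g - self.mean) ** 2 for g in self._warmup) / n
            return grad_norm

        std = math.sqrt(self.var)
        z = (grad_norm - self.mean) / (std + self.eps)
        # Clip spikes back to mean + z_thresh * std; leave normal norms untouched.
        clipped = self.mean + self.z_thresh * std if z > self.z_thresh else grad_norm

        # Update the EMAs with the clipped norm so a spike does not inflate them.
        delta = clipped - self.mean
        self.mean = self.alpha * self.mean + (1 - self.alpha) * clipped
        self.var = self.alpha * self.var + (1 - self.alpha) * delta ** 2
        return clipped


if __name__ == "__main__":
    ctrl = EMAClipController(warmup_steps=4)
    for g in [1.0, 3.0, 1.0, 3.0, 2.5, 100.0]:
        print(g, "->", round(ctrl.step(g), 4))
```

In a PyTorch loop, one option is to read the total norm with `torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=float("inf"))`, which returns the norm and leaves the gradients unscaled. Pass that value to `step`, then call `clip_grad_norm_` again with the returned value as `max_norm`. Updating the statistics with the clipped norm, rather than the raw one, is a design choice that keeps a single spike from raising the threshold for the following steps.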