10 links
tagged with all of: pytorch + machine-learning
Links
Researchers demonstrated the use of torchft and torchtitan for training a model under extreme synthetic failure rates, achieving fault tolerance without relying on checkpoints. By employing a novel asynchronous weight transfer method, they successfully isolated failures and maintained training continuity across multiple GPU groups.
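To give a sense of the mechanism, here is a minimal sketch of checkpoint-free recovery: a replica group that restarts after a failure pulls live weights from a healthy peer over the network instead of restoring from disk. This is an illustrative, synchronous stand-in for the asynchronous transfer described in the article, written against plain torch.distributed rather than torchft's actual API; recover_from_peer is a hypothetical helper.

```python
import torch
import torch.distributed as dist

def recover_from_peer(model: torch.nn.Module, healthy_rank: int,
                      group: dist.ProcessGroup | None = None) -> None:
    """Hypothetical helper: overwrite this replica's weights with live
    weights broadcast from a healthy peer, skipping checkpoints entirely."""
    for param in model.parameters():
        # The healthy rank acts as the broadcast source; recovering ranks
        # receive the current weights directly into their parameter storage.
        dist.broadcast(param.data, src=healthy_rank, group=group)
```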
FlashPack is a new file format and loading mechanism for PyTorch that significantly speeds up model checkpoint loading, achieving 3-6 times faster performance than existing methods. By flattening weights into a contiguous byte stream and optimizing parallel processing between CPU and GPU, FlashPack enhances efficiency in model I/O, making it ideal for machine learning applications. Users can easily convert and integrate their models with FlashPack to benefit from faster loading times.
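The core trick is easy to sketch: serialize every tensor's raw bytes into one contiguous stream plus a small index, so loading becomes a single sequential read instead of many small ones. The save_flat helper below is hypothetical and only illustrates the flattening idea; it is not FlashPack's actual format or API.

```python
import json
import torch

def save_flat(state_dict: dict, path: str) -> None:
    """Illustrative flattening: one JSON index + one contiguous byte stream."""
    index, blobs, offset = {}, [], 0
    for name, t in state_dict.items():
        data = t.detach().cpu().contiguous().flatten().view(torch.uint8)
        index[name] = {"offset": offset, "shape": list(t.shape),
                       "dtype": str(t.dtype)}
        blobs.append(data)
        offset += data.numel()
    header = json.dumps(index).encode()
    with open(path, "wb") as f:
        f.write(len(header).to_bytes(8, "little"))   # header length prefix
        f.write(header)                              # tensor index
        f.write(torch.cat(blobs).numpy().tobytes())  # one sequential write
```

A matching loader would read the index, pull the payload in one pass, and reinterpret slices back into tensors, which is what makes overlapping CPU reads with GPU transfers possible.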
The article covers accelerating graph learning models with PyG (PyTorch Geometric) and torch.compile, highlighting methods that improve the performance and efficiency of processing graph data. It details practical implementations and the impact of these optimizations on graph-based machine learning tasks.
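The general recipe is short. A minimal sketch, assuming torch_geometric is installed (the two-layer GCN below is a placeholder model, not one from the article):

```python
import torch
import torch.nn.functional as F
from torch_geometric.nn import GCNConv

class GCN(torch.nn.Module):
    def __init__(self, in_dim: int, hidden: int, out_dim: int):
        super().__init__()
        self.conv1 = GCNConv(in_dim, hidden)
        self.conv2 = GCNConv(hidden, out_dim)

    def forward(self, x, edge_index):
        x = F.relu(self.conv1(x, edge_index))
        return self.conv2(x, edge_index)

model = GCN(16, 32, 7)
# torch.compile captures the message-passing graph and fuses it into
# optimized kernels; the first call pays a one-time compilation cost.
compiled = torch.compile(model)
```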
The Kubeflow Trainer project has been integrated into the PyTorch ecosystem, providing a scalable and community-supported solution for running PyTorch on Kubernetes. It simplifies distributed training of AI models and fine-tuning of large language models (LLMs) while optimizing GPU utilization and supporting advanced scheduling capabilities. The integration enhances the deployment of distributed PyTorch applications and offers a streamlined experience for AI practitioners and platform admins alike.
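On the workload side, the script Kubeflow launches in each pod is ordinary distributed PyTorch, because the trainer injects the standard rendezvous environment variables (MASTER_ADDR, MASTER_PORT, RANK, WORLD_SIZE). A minimal sketch of such an entrypoint, with a toy model and one synthetic step standing in for a real training loop:

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    dist.init_process_group("nccl")  # reads rendezvous info from the env
    local_rank = int(os.environ.get("LOCAL_RANK", 0))
    torch.cuda.set_device(local_rank)

    model = DDP(torch.nn.Linear(128, 10).cuda(), device_ids=[local_rank])
    opt = torch.optim.AdamW(model.parameters(), lr=1e-3)

    x = torch.randn(32, 128, device="cuda")
    y = torch.randint(0, 10, (32,), device="cuda")
    loss = torch.nn.functional.cross_entropy(model(x), y)
    loss.backward()
    opt.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```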
ZClip is an adaptive gradient clipping technique for mitigating gradient spikes during LLM pre-training, utilizing Exponential Moving Averages to adjust clipping thresholds dynamically. It enhances training stability and efficiency by responding to changes in gradient norms without relying on fixed thresholds. The implementation is compatible with PyTorch and PyTorch Lightning, allowing seamless integration into training pipelines.
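A minimal sketch of the idea, in the spirit of ZClip rather than a faithful reimplementation: track an exponential moving average of the gradient-norm mean and variance, and rescale gradients whenever the current norm's z-score exceeds a threshold. The class name and defaults below are illustrative.

```python
import torch

class ZClipSketch:
    """EMA-based adaptive gradient clipping (illustrative hyperparameters)."""

    def __init__(self, alpha: float = 0.97, z_thresh: float = 2.5):
        self.alpha = alpha        # EMA smoothing factor
        self.z_thresh = z_thresh  # z-score beyond which we clip
        self.mean = None          # EMA of gradient norms
        self.var = None           # EMA of squared deviations

    def step(self, model: torch.nn.Module) -> float:
        # max_norm=inf computes the total norm without actually clipping.
        norm = torch.nn.utils.clip_grad_norm_(model.parameters(), float("inf")).item()
        if self.mean is None:     # warm-up: initialize the statistics
            self.mean, self.var = norm, 0.0
            return norm
        std = self.var ** 0.5
        limit = self.mean + self.z_thresh * std
        if std > 0 and norm > limit:
            # Spike detected: rescale gradients down to the adaptive limit.
            torch.nn.utils.clip_grad_norm_(model.parameters(), limit)
            norm = limit
        # Update the EMA statistics with the (possibly clipped) norm.
        self.mean = self.alpha * self.mean + (1 - self.alpha) * norm
        self.var = self.alpha * self.var + (1 - self.alpha) * (norm - self.mean) ** 2
        return norm
```

Call step(model) between loss.backward() and optimizer.step(), in place of a fixed-threshold clip.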
Helion introduces a high-level domain-specific language that simplifies kernel development for machine learning by compiling Python-embedded code into optimized Triton code. It automates complex tasks like memory management and tuning, allowing developers to focus on algorithmic logic rather than hardware specifics. Helion's autotuning engine enhances performance portability across different hardware architectures with minimal effort.
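In the style of the project's announcement examples, a Helion kernel is ordinary tensor code inside an hl.tile loop, with tile sizes and launch parameters left to the autotuner. The API details below (the helion.kernel decorator and hl.tile) follow the announcement and may differ across versions:

```python
import torch
import helion
import helion.language as hl

@helion.kernel()
def vector_add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    out = torch.empty_like(x)
    # hl.tile splits the iteration space; the autotuner picks tile sizes,
    # and the loop body compiles down to an optimized Triton kernel.
    for tile in hl.tile(out.size()):
        out[tile] = x[tile] + y[tile]
    return out
```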
The article introduces the PyTorch Native Agentic Stack, a framework for building agentic AI applications directly on top of PyTorch. It focuses on simplifying the implementation of agent-based systems while improving performance through tighter integration with PyTorch's existing capabilities.
PyTorch has released native quantized models, including Phi4-mini-instruct and Qwen3, optimized for both server and mobile platforms using int4 and float8 quantization methods. These models offer efficient inference with minimal accuracy degradation and come with comprehensive recipes for users to apply quantization to their own models. Future updates will include new features and collaborations aimed at enhancing quantization techniques and performance.
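Applying the same quantization to your own model is a short step with torchao; the sketch below assumes a recent torchao release with the config-object API (older versions used int4_weight_only() instead of Int4WeightOnlyConfig):

```python
import torch
from torchao.quantization import quantize_, Int4WeightOnlyConfig

model = torch.nn.Sequential(
    torch.nn.Linear(4096, 4096),
).to(device="cuda", dtype=torch.bfloat16)

# quantize_ rewrites the Linear weights in place to int4, trading a small
# accuracy cost for lower memory use and faster inference on supported kernels.
quantize_(model, Int4WeightOnlyConfig(group_size=128))
```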
ZeroGPU enables efficient use of Nvidia H200 hardware in Hugging Face Spaces by releasing GPUs during idle periods instead of keeping them locked. The article shows how ahead-of-time (AoT) compilation with PyTorch cuts image and video generation time, delivering speedups of 1.3x to 1.8x, and provides a guide to implementing AoT compilation in ZeroGPU Spaces, including advanced techniques like FP8 quantization.
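Under the hood, AoT compilation in recent PyTorch boils down to exporting a graph once and compiling it into a reusable artifact. A minimal sketch using torch.export plus AOTInductor (PyTorch 2.5+); the ZeroGPU blog wraps similar steps in its own Spaces helpers, which may differ:

```python
import torch

model = torch.nn.Linear(64, 64).cuda().eval()
example_inputs = (torch.randn(8, 64, device="cuda"),)

# Export once, compile ahead of time, and reload without recompiling.
exported = torch.export.export(model, example_inputs)
package_path = torch._inductor.aoti_compile_and_package(exported)
compiled = torch._inductor.aoti_load_package(package_path)
out = compiled(*example_inputs)
```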
PyTorch Conference 2025 will take place in San Francisco from October 22-23, featuring keynotes, workshops, and technical sessions focused on advancements in AI. The event includes co-located summits and the launch of PyTorch training and certification, aimed at connecting AI innovators and practitioners. Session recordings and presentation slides will be available for attendees to review after the conference.