3 links
tagged with all of: deep-learning + performance
Links
NUMA (Non-Uniform Memory Access) awareness matters for high-performance deep learning applications because memory accesses that cross NUMA nodes incur higher latency and lower bandwidth than local ones. By understanding the system's NUMA topology and keeping compute and the memory it touches on the same node, developers can significantly improve the performance of deep learning workloads on multi-socket systems.
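As a minimal sketch of one such strategy (not taken from the linked article), the snippet below pins the current process to the cores of a single NUMA node on Linux so that, under the default first-touch policy, its allocations land on that node. The core range for node 0 is an assumed placeholder; the real mapping would come from `lscpu` or `numactl --hardware`.

```python
# Minimal sketch: keep a worker's threads and allocations on one NUMA node (Linux only).
# Assumption: cores 0-15 belong to NUMA node 0; verify with `lscpu` or `numactl --hardware`.
import os

NODE0_CPUS = set(range(0, 16))  # hypothetical core IDs for NUMA node 0


def pin_to_node0() -> None:
    """Restrict the current process to node-0 cores so memory allocated
    after this point is served locally (first-touch) from node 0."""
    os.sched_setaffinity(0, NODE0_CPUS)


if __name__ == "__main__":
    pin_to_node0()
    print("CPU affinity:", sorted(os.sched_getaffinity(0)))
```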
DeepNVMe has been updated to enhance I/O performance in deep learning applications by improving checkpointing with FastPersist and model inference with ZeRO-Inference. These advancements include support for CPU-only environments, offset-based I/O operations, and tensor data type casting, along with significant speedups facilitated by Gen5 NVMe SSDs. The updates aim to democratize access to large models and optimize I/O-bound workloads for various users.
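To make the "offset-based I/O" idea concrete, here is a generic standard-library sketch of reading one byte range of a checkpoint file with positional reads, so several workers can pull disjoint shards of the same file in parallel. It deliberately does not use the DeepNVMe API, and the file name, offset, and length are hypothetical values for illustration.

```python
# Generic sketch of offset-based reads: pull one shard of a checkpoint file
# without reading the whole file. Path and sizes are made-up examples.
import os

OFFSET = 64 * 1024 * 1024   # hypothetical byte offset of the shard within the file
LENGTH = 16 * 1024 * 1024   # hypothetical number of bytes to read


def read_shard(path: str, offset: int, length: int) -> bytes:
    """Read `length` bytes starting at `offset` using a positional read,
    leaving the file offset untouched for other readers."""
    fd = os.open(path, os.O_RDONLY)
    try:
        return os.pread(fd, length, offset)
    finally:
        os.close(fd)


if __name__ == "__main__":
    shard = read_shard("model.ckpt", OFFSET, LENGTH)
    print(f"read {len(shard)} bytes")
```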
ParetoQ is a novel algorithm for low-bit quantization of large language models, unifying binary, ternary, and 2-to-4 bit quantization-aware training. It achieves state-of-the-art performance across all bit widths and offers a reliable framework for comparing quantization methods, demonstrating that lower-bit quantization can surpass traditional 4-bit methods in both accuracy and efficiency. The integration of ParetoQ into the torchao library facilitates easy deployment on edge devices while optimizing accuracy and compression trade-offs.
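As a rough illustration of the kind of low-bit quantization-aware training ParetoQ covers (this is not the ParetoQ algorithm or the torchao API), the sketch below fake-quantizes a weight tensor to ternary values with a straight-through estimator; the per-tensor mean-absolute-value scale is a simple assumed choice.

```python
# Generic ternary fake-quantization with a straight-through estimator (STE).
# Illustrative stand-in only, not ParetoQ or its torchao integration.
import torch


def ternary_fake_quant(w: torch.Tensor) -> torch.Tensor:
    """Forward pass uses weights quantized to {-1, 0, +1} * scale;
    backward pass lets gradients flow through unchanged (STE)."""
    scale = w.abs().mean()                                   # assumed per-tensor scale
    w_q = torch.round(torch.clamp(w / (scale + 1e-8), -1.0, 1.0)) * scale
    return w + (w_q - w).detach()                            # forward: w_q, backward: identity


if __name__ == "__main__":
    w = torch.randn(4, 4, requires_grad=True)
    loss = ternary_fake_quant(w).sum()
    loss.backward()                                          # gradients pass straight through to w
    print("grad matches STE expectation:", torch.allclose(w.grad, torch.ones_like(w)))
```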