2 links tagged with all of: performance + inference + gpu
Links
Azure's ND GB300 v6 virtual machines set a record aggregate inference throughput of 1.1 million tokens per second on the Llama 2 70B model, a 27% improvement over the previous record. The VMs pair hardware upgrades with software optimizations targeted at inference workloads, and the results were independently verified by Signal65.
GPUs are critical for high-performance computing, particularly for neural network inference workloads, but achieving high GPU utilization is challenging. This guide defines three distinct metrics of GPU utilization—GPU allocation utilization, GPU kernel utilization, and model FLOP/s utilization (MFU)—and discusses strategies for improving each. Modal's platform focuses on raising allocation and kernel utilization, helping users get better performance per dollar from their GPUs.
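The third metric above, model FLOP/s utilization, is commonly estimated from model size and decoding throughput. A minimal sketch, assuming the standard ~2 FLOPs-per-parameter-per-token approximation for a decoder-only transformer forward pass; the function name and the peak-throughput figure below are illustrative assumptions, not values from the linked guide:

```python
def mfu(params: float, tokens_per_sec: float, peak_flops: float) -> float:
    """Model FLOP/s utilization: achieved FLOP/s divided by hardware peak.

    A decoder-only transformer's forward pass costs roughly 2 * params
    FLOPs per generated token, so achieved FLOP/s is approximately
    2 * params * tokens_per_sec.
    """
    return (2 * params * tokens_per_sec) / peak_flops

# Example (illustrative numbers): a 70B-parameter model decoding
# 1,000 tokens/s on a GPU with a nominal 989 TFLOP/s dense BF16 peak.
print(f"MFU: {mfu(70e9, 1_000, 989e12):.1%}")  # → MFU: 14.2%
```

MFU is usually well below 100% for latency-bound decoding, since memory bandwidth rather than compute tends to be the bottleneck; that gap is why the guide treats it separately from allocation and kernel utilization.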