Links
Azure's ND GB300 v6 virtual machines achieved a record inference throughput of 1.1 million tokens per second on the Llama 2 70B model. This beats the previous record by 27%, and the VMs include hardware enhancements aimed at inference workloads. Signal65 verified the results.
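Both headline figures are rounded, so the previous record they imply can only be estimated. The back-of-the-envelope check below assumes the 27% gain is measured relative to the old record. The result is an approximation derived from the summary's numbers, not a published figure.

```python
# Estimate the previous record implied by the summary's rounded figures.
# Assumption: the 27% improvement is relative to the previous record.
new_throughput = 1_100_000  # tokens per second (rounded, from the summary)
improvement = 0.27          # 27% over the previous record (rounded)

# new = previous * (1 + improvement)  =>  previous = new / (1 + improvement)
previous_record = new_throughput / (1 + improvement)
print(f"Implied previous record: ~{previous_record:,.0f} tokens/s")
```

The result is roughly 866,000 tokens per second. Treat it as a ballpark figure only, because the 1.1 million and 27% inputs are both rounded.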
This guide walks through deploying AI models with vLLM on Azure Kubernetes Service (AKS) using NVIDIA H100 GPUs and Multi-Instance GPU (MIG). It covers the prerequisites, creating the infrastructure, installing the GPU components, and deploying the model. MIG partitions each H100 into up to seven hardware-isolated GPU instances, so several workloads can share one physical GPU, which improves utilization and lowers cost.
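The sketch below shows the core idea of MIG-based deployment: a pod requests a single MIG slice rather than a whole GPU. It builds a Kubernetes Deployment manifest for a vLLM server in Python and prints it as JSON, which `kubectl apply -f` accepts. Several details are assumptions rather than taken from the guide:

- The resource name `nvidia.com/mig-1g.10gb` matches the naming the NVIDIA device plugin uses under the "mixed" MIG strategy. Under the "single" strategy, slices appear as `nvidia.com/gpu` instead.
- The `vllm/vllm-openai` image and the small `facebook/opt-125m` model are illustrative choices.
- The helper `vllm_mig_deployment` is a hypothetical name.

```python
import json


def vllm_mig_deployment(
    name: str,
    model: str,
    mig_resource: str = "nvidia.com/mig-1g.10gb",  # assumed: "mixed" MIG strategy naming
    image: str = "vllm/vllm-openai:latest",        # assumed: public vLLM server image
    replicas: int = 1,
) -> dict:
    """Hypothetical helper: a Deployment that pins a vLLM server to one MIG slice."""
    labels = {"app": name}
    container = {
        "name": "vllm",
        "image": image,
        # The OpenAI-compatible vLLM server takes the model to load via --model.
        "args": ["--model", model, "--port", "8000"],
        "ports": [{"containerPort": 8000}],
        # Request exactly one MIG instance instead of a full H100.
        "resources": {"limits": {mig_resource: 1}},
    }
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {"containers": [container]},
            },
        },
    }


manifest = vllm_mig_deployment("vllm-small", "facebook/opt-125m")
print(json.dumps(manifest, indent=2))
```

To use it, save the script as `build.py`. Then run `python build.py > vllm.json` followed by `kubectl apply -f vllm.json`. Each replica consumes one slice, so a fully partitioned H100 can host several such small model servers side by side.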