Links
Kubernetes v1.35 introduces workload-aware scheduling, which improves how groups of related Pods are scheduled together. It adds a new Workload API for declaring a workload's scheduling requirements and supports gang scheduling, in which a group of Pods is scheduled all-or-nothing, so large workloads do not hold resources while waiting for their remaining Pods. The release also adds opportunistic batching, which speeds up scheduling of identical Pods.
This article explains how to run large-scale language model inference on Kubernetes. It covers key concepts such as batching strategies, performance metrics, and intelligent routing for better GPU utilization. It also walks through practical deployment examples and the challenges of managing inference workloads.