5 min read | Saved February 14, 2026
Do you care about this?
Kthena is a new Kubernetes-native system that optimizes the routing, orchestration, and scheduling of Large Language Model (LLM) inference. It addresses key challenges such as low resource utilization and high latency, offering intelligent routing and production-grade orchestration. As a sub-project of Volcano, it extends the community's support for AI lifecycle management.
If you do, here's more
Kthena is a new project from the Volcano community aimed at improving the deployment of Large Language Models (LLMs) on Kubernetes. It's designed to tackle the challenges developers face when efficiently serving LLMs at scale. Key features include topology-aware scheduling and KV Cache-aware routing, which enhance GPU/NPU utilization and reduce latency. Kthena acts as an orchestration layer, integrating seamlessly into Kubernetes rather than replacing existing inference engines. Its two main components are the Kthena Router, which directs inference requests, and the Kthena Controller Manager, responsible for workload orchestration and lifecycle management.
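To make the KV Cache-aware routing idea concrete, here is a minimal sketch of the underlying heuristic: send a request to the replica whose KV cache already covers the longest prefix of the incoming prompt, so fewer tokens need recomputing, and break ties by current load. All names and data shapes here are hypothetical illustrations, not Kthena's actual API.

```python
# Sketch of KV Cache-aware routing (hypothetical names, not Kthena's API):
# prefer the replica with the longest cached prompt prefix, then lowest load.

def shared_prefix_len(a: list[int], b: list[int]) -> int:
    """Length of the common token prefix between two token sequences."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

def pick_replica(prompt_tokens: list[int], replicas: list[dict]) -> dict:
    """Each replica dict carries 'name', 'cached_tokens', and 'load'.
    Maximize cache overlap; among equal overlaps, minimize load."""
    return max(
        replicas,
        key=lambda r: (shared_prefix_len(prompt_tokens, r["cached_tokens"]),
                       -r["load"]),
    )

replicas = [
    {"name": "pod-a", "cached_tokens": [1, 2, 3, 4], "load": 0.9},
    {"name": "pod-b", "cached_tokens": [1, 2, 9],    "load": 0.2},
    {"name": "pod-c", "cached_tokens": [],           "load": 0.1},
]
print(pick_replica([1, 2, 3, 4, 5], replicas)["name"])  # pod-a
```

The tie-break on load matters: without it, a purely cache-greedy router can pile traffic onto one hot replica, which is exactly the latency-vs-throughput tension the article describes.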
The article outlines the significant hurdles in deploying LLMs on Kubernetes, such as low resource utilization, the latency vs. throughput trade-off, and the complexity of managing multiple models. Kthena addresses these issues with features like multi-model routing and cost-driven autoscaling. Its intelligent routing capabilities support various algorithms and facilitate non-disruptive model updates. Performance benchmarks show impressive results, with Kthena achieving up to a 2.73x increase in throughput and a 73.5% reduction in time to first token using its KV Cache-aware strategy.
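The cost-driven autoscaling mentioned above can be pictured as a small policy that sizes the replica count to meet a throughput target while staying under a spending cap. The parameter names and the policy itself are illustrative assumptions for this sketch, not Kthena's actual algorithm.

```python
import math

# Sketch of a cost-driven autoscaling policy (illustrative, not Kthena's
# actual algorithm): scale out to meet demand, but never past the budget.

def desired_replicas(demand_tps: float, per_replica_tps: float,
                     hourly_budget: float, cost_per_replica: float,
                     min_replicas: int = 1) -> int:
    # Replicas needed to serve the demanded tokens/sec.
    needed = math.ceil(demand_tps / per_replica_tps)
    # Replicas the hourly budget can pay for.
    affordable = int(hourly_budget // cost_per_replica)
    # Budget wins over demand; never drop below the floor.
    return max(min_replicas, min(needed, affordable))

print(desired_replicas(demand_tps=900, per_replica_tps=120,
                       hourly_budget=40.0, cost_per_replica=6.5))  # 6
```

In the example, demand alone would call for 8 replicas (900 / 120, rounded up), but the $40/hour budget only covers 6 at $6.50 each, so the budget cap binds.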
Community support for Kthena is strong, with endorsements from industry leaders like Huawei Cloud and China Telecom AI. Both organizations highlight Kthena's role in enhancing cloud-native AI infrastructure and resource efficiency. These collaborations emphasize Kthena's potential to streamline AI workloads and foster an open ecosystem for developers.