3 links
tagged with all of: kubernetes + autoscaling
Links
KServe v0.15 has been released, enhancing capabilities for serving generative AI models, including support for large language models (LLMs) and advanced caching mechanisms. Key features include integration with Envoy AI Gateway, multi-node inference, and autoscaling with KEDA, aimed at improving performance and scalability for AI workloads. The update also introduces a dedicated documentation section for generative AI and various performance optimizations.
Building Elastic Cloud Serverless required Elastic to shift from a stateful system to a stateless design, leveraging cloud-native object storage and Kubernetes for orchestration. The changes aimed to meet evolving customer needs for simplified infrastructure management and scalability while optimizing performance and reducing operational complexity. Key strategies included using a push model for control and data communication, automated upgrades, and flexible usage-based pricing.
The blog post covers the configurable tolerance setting added to the Horizontal Pod Autoscaler (HPA) in Kubernetes 1.33, which lets developers define how sensitive the HPA is to changes in resource demand. The goal is to improve application stability and performance through finer-grained control over scaling behavior.
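To make the role of tolerance concrete, here is a minimal Python sketch of the replica calculation described in the Kubernetes HPA documentation: the autoscaler compares the current metric to the target and skips scaling when the ratio is within the tolerance of 1.0 (10% by default). The function name and parameters are hypothetical, chosen for illustration only, and the sketch ignores stabilization windows, min/max replica bounds, and missing-metric handling.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     scale_up_tolerance=0.1, scale_down_tolerance=0.1):
    """Illustrative HPA-style calculation; not the actual controller code."""
    ratio = current_metric / target_metric

    # Use the tolerance that matches the direction of the change.
    if ratio >= 1.0:
        within_tolerance = (ratio - 1.0) <= scale_up_tolerance
    else:
        within_tolerance = (1.0 - ratio) <= scale_down_tolerance

    if within_tolerance:
        # Change is too small to act on: keep the current replica count.
        return current_replicas

    # Documented formula: ceil(currentReplicas * currentMetric / targetMetric)
    return math.ceil(current_replicas * current_metric / target_metric)


if __name__ == "__main__":
    # A 5% overshoot is ignored with the default 10% tolerance...
    print(desired_replicas(10, 105, 100))
    # ...but triggers a scale-up when the scale-up tolerance is tightened to 1%.
    print(desired_replicas(10, 105, 100, scale_up_tolerance=0.01))
```

A smaller tolerance makes the autoscaler react to smaller deviations from the target, while a larger one trades responsiveness for fewer scaling events.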