Links
This article discusses Google's latest advancements in Google Kubernetes Engine (GKE) as it marks its 10th anniversary. Key updates include the introduction of Agent Sandbox for AI workloads, enhancements to autoscaling, and new compute classes to improve efficiency and performance across various workloads.
This article discusses the challenges of scaling Kubernetes nodes from zero, focusing on the startup latency that can occur. It introduces concepts like reservation and overprovisioning placeholders to reduce delays and improve user experience, especially during spikes in traffic.
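The overprovisioning placeholders mentioned above are commonly implemented as low-priority "balloon" pods that hold spare node capacity and get preempted the moment real workloads arrive. A minimal sketch of that pattern, assuming hypothetical names and illustrative sizes (none of these values come from the article):

```yaml
# A negative-priority class so placeholders are always evicted first.
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: overprovisioning-placeholder   # hypothetical name
value: -10
globalDefault: false
description: "Placeholder pods that reserve warm node capacity."
---
# Pause pods that request resources but do no work; the cluster
# autoscaler keeps nodes around to host them.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: capacity-placeholder           # hypothetical name
spec:
  replicas: 2                          # how much spare capacity to keep warm
  selector:
    matchLabels:
      app: capacity-placeholder
  template:
    metadata:
      labels:
        app: capacity-placeholder
    spec:
      priorityClassName: overprovisioning-placeholder
      containers:
      - name: pause
        image: registry.k8s.io/pause:3.9
        resources:
          requests:
            cpu: "500m"                # sized to match expected burst needs
            memory: "512Mi"
```

When a real pod with normal priority is scheduled, the scheduler preempts a placeholder, so the incoming workload starts on an already-provisioned node instead of waiting for one to boot.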
The article examines the rising adoption of GPUs for AI workloads and the growing use of serverless compute services like AWS Lambda and Google Cloud Run. It highlights inefficiencies in resource utilization across platforms and the increasing reliance on Kubernetes features like the Horizontal Pod Autoscaler to optimize resource management.
KServe v0.15 has been released, enhancing capabilities for serving generative AI models, including support for large language models (LLMs) and advanced caching mechanisms. Key features include integration with Envoy AI Gateway, multi-node inference, and autoscaling with KEDA, aimed at improving performance and scalability for AI workloads. The update also introduces a dedicated documentation section for generative AI and various performance optimizations.
Elastic's transformation to a serverless architecture for Elastic Cloud Serverless involved shifting from a stateful system to a stateless design, leveraging cloud-native object storage and Kubernetes for orchestration. The changes aimed to meet evolving customer needs for simplified infrastructure management and scalability while optimizing performance and reducing operational complexity. Key strategies included using a push model for control and data communication, automated upgrades, and flexible usage-based pricing.
The blog discusses the introduction of configurable tolerance settings in the Horizontal Pod Autoscaler (HPA) in Kubernetes 1.33, letting developers define how aggressively the HPA responds to changes in resource demand. This enhancement aims to improve application stability and performance through finer-grained control over scaling behavior.
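As an alpha feature in 1.33 (behind the HPAConfigurableTolerance feature gate), the tolerance is set per scaling direction under `spec.behavior`. A sketch, assuming a hypothetical Deployment named `web` and illustrative threshold values:

```yaml
# Requires Kubernetes 1.33+ with the HPAConfigurableTolerance gate enabled (alpha).
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa            # hypothetical name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web              # hypothetical workload
  minReplicas: 2
  maxReplicas: 20
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  behavior:
    scaleUp:
      tolerance: 0.05      # react once usage deviates >5% from target
    scaleDown:
      tolerance: 0.2       # require a >20% deviation before removing pods
```

Asymmetric tolerances like these let an HPA scale up eagerly while scaling down conservatively, replacing the previous cluster-wide default of 10% in both directions.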