Links
This article provides a guide to 15 essential metrics for monitoring Kubernetes environments. It focuses on how these metrics can help optimize performance, troubleshoot issues, and maintain system health. The content is aimed at developers and IT operations teams.
A security researcher revealed a Kubernetes vulnerability that allows users with read-only permissions to execute arbitrary commands in pods. The exploit stems from GET access to the nodes/proxy resource, a permission that many monitoring tools request, and it poses a significant risk to cluster security. Until KEP-2862 is fully implemented, organizations need to audit their permissions and consider stricter access controls.
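As a rough illustration of the permission audit mentioned above, the sketch below scans RBAC role objects (already parsed into Python dicts, for example from exported YAML or JSON) for rules that grant `get` on `nodes/proxy`. The function names and the sample roles are hypothetical and do not come from the linked article; the matching covers only exact names and the `*` and `*/proxy` wildcard forms, so treat it as a starting point rather than a complete audit.

```python
def grants_nodes_proxy_get(rule):
    """Return True if a single RBAC rule allows GET on nodes/proxy."""
    api_groups = rule.get("apiGroups", [])
    resources = rule.get("resources", [])
    verbs = rule.get("verbs", [])

    # nodes live in the core API group, which RBAC writes as "".
    group_ok = "" in api_groups or "*" in api_groups
    # Exact match, the all-resources wildcard, or the */subresource form.
    resource_ok = any(r in ("nodes/proxy", "*", "*/proxy") for r in resources)
    verb_ok = "get" in verbs or "*" in verbs
    return group_ok and resource_ok and verb_ok


def find_risky_roles(roles):
    """Return the names of roles with at least one matching rule."""
    return [
        role["metadata"]["name"]
        for role in roles
        if any(grants_nodes_proxy_get(r) for r in role.get("rules") or [])
    ]


# Hypothetical sample data, shaped like ClusterRole objects.
sample_roles = [
    {
        "metadata": {"name": "monitoring-reader"},
        "rules": [
            {"apiGroups": [""], "resources": ["nodes", "nodes/proxy"], "verbs": ["get", "list"]},
        ],
    },
    {
        "metadata": {"name": "pod-viewer"},
        "rules": [
            {"apiGroups": [""], "resources": ["pods"], "verbs": ["get", "list", "watch"]},
        ],
    },
]

risky = find_risky_roles(sample_roles)
print(risky)
```

A role flagged this way is not necessarily exploitable in a given cluster; it only marks where the permission is granted so that the binding can be reviewed.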
This article discusses the clopus-watcher, an autonomous agent designed to monitor applications in Kubernetes and apply hotfixes as needed. The author argues that such systems could eventually replace many roles currently held by 24/7 on-call engineers.
This article introduces Container Network Observability for Amazon EKS, a feature that enhances visibility into network performance and traffic patterns within Kubernetes clusters. It details key functionalities like performance metrics, service maps, and flow tables to help teams troubleshoot and optimize their containerized applications.
This article explains Kubernetes metrics and their importance in monitoring cluster health and performance. It covers various types of metrics, such as cluster, node, pod, network, storage, and application metrics, along with tools for effective monitoring.
Pinterest encountered a significant performance issue during the migration of its search infrastructure, Manas, to Kubernetes, where one in a million search requests experienced latency spikes. The investigation revealed that cAdvisor’s memory monitoring processes were causing excessive contention, leading to these delays. The team resolved the issue by disabling a specific metric in cAdvisor, allowing them to continue their migration efforts without compromising performance.
Setting up a local Langfuse server with Kubernetes allows developers to manage traces and metrics for sensitive LLM applications without relying on third-party services. The article details the necessary tools and configurations, including Helm, Kustomize, and Traefik, to successfully deploy and access Langfuse on a local GPU cluster. It also provides insights on managing secrets and testing the setup through a Python container.
Octopus has introduced the Kubernetes Live Object Status feature to enhance its Kubernetes agent, enabling simplified deployments and robust post-deployment monitoring for applications running on Kubernetes. This feature allows users to view the status of Kubernetes resources in real-time and provides detailed insights for troubleshooting, aiming to streamline the continuous delivery process.
Microsoft has introduced container network logs in the public preview of Advanced Container Networking Services for Azure Kubernetes Service, providing detailed insights into network traffic. The feature supports troubleshooting, security enforcement, and operational efficiency by monitoring several traffic layers, and it offers two modes of log storage. Users can visualize the logs in Azure Managed Grafana dashboards for analysis and monitoring.
Memory usage in Prometheus can escalate dramatically in enterprise Kubernetes environments due to high-cardinality metrics and labels. This article details methods to analyze and reduce memory consumption effectively, including identifying redundant metrics and employing scripts to optimize monitoring without losing essential data.
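To make the idea of high cardinality concrete, here is a minimal sketch that counts series per metric name and distinct values per label. It assumes the series are available as a list of label dictionaries that include `__name__`, which is the shape returned in the `data` field of the Prometheus `/api/v1/series` endpoint. The helper names and sample data are hypothetical, and this is not the script from the linked article.

```python
from collections import Counter, defaultdict


def series_per_metric(series):
    """Count how many distinct series exist for each metric name."""
    return Counter(s.get("__name__", "") for s in series)


def label_cardinality(series):
    """Count the distinct values seen for each label (excluding __name__)."""
    values = defaultdict(set)
    for s in series:
        for label, value in s.items():
            if label != "__name__":
                values[label].add(value)
    return {label: len(seen) for label, seen in values.items()}


# Hypothetical sample: one metric with a per-request "path" label.
sample_series = [
    {"__name__": "http_requests_total", "pod": "web-1", "path": "/users/1"},
    {"__name__": "http_requests_total", "pod": "web-1", "path": "/users/2"},
    {"__name__": "http_requests_total", "pod": "web-2", "path": "/users/3"},
    {"__name__": "up", "pod": "web-1"},
]

per_metric = series_per_metric(sample_series)
per_label = label_cardinality(sample_series)

# Metrics with the most series come first.
for name, count in per_metric.most_common():
    print(name, count)
print(per_label)
```

Labels whose distinct-value count grows with traffic (such as the `path` label in the sample) are the usual candidates for dropping or aggregating.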
By implementing a php-fpm-exporter in a Kubernetes environment, the author identified severe underutilization of PHP-FPM processes due to a misconfigured shared configuration file. After analyzing the traffic patterns and adjusting the PHP-FPM settings accordingly, memory utilization was reduced by over 80% without sacrificing performance. The article emphasizes the importance of customizing configurations based on specific application needs rather than relying on default settings.