Links
This article outlines Zendesk's approach to reducing costs associated with observability data while maintaining essential visibility for engineers. It details their methods for identifying valuable traces and logs, implementing targeted changes, and enhancing cost transparency. The results included significant savings and improved performance monitoring.
This article covers how Pipeline Performance Profiling helps teams analyze and optimize CI/CD pipeline performance. It breaks down execution into measurable phases and provides insights on resource usage, bottlenecks, and cost efficiency. The tool integrates with existing observability tools, making it easier to track performance trends and identify areas for improvement.
Uber improved its data observability by implementing a system that tracks I/O patterns across its cloud and on-prem infrastructure. This allows for real-time insights into application performance, network usage, and data access, aiding in migration to a hybrid cloud model. The solution aggregates metrics without requiring code changes, benefiting various workloads.
New Relic migrated its Lambda Extension from Go to Rust, resulting in a 40% reduction in billed duration and improved memory efficiency. The rewrite also enhanced reliability and introduced a more robust telemetry pipeline.
The article introduces pgX, a tool designed to integrate PostgreSQL monitoring with application and infrastructure observability. It emphasizes the need for a unified approach to diagnose performance issues effectively, moving away from isolated database metrics. This shift helps engineers understand the system's behavior as a whole, improving troubleshooting and optimization efforts.
This article explains Kubernetes metrics and their importance in monitoring cluster health and performance. It covers various types of metrics, such as cluster, node, pod, network, storage, and application metrics, along with tools for effective monitoring.
This article discusses an ebook that analyzes current trends in operating cloud-native applications. It highlights the need for faster deployment and emphasizes the importance of IT automation and customer satisfaction in improving application performance.
This article discusses the limitations of traditional monitoring tools for AI systems and the need for improved observability. It highlights strategies to manage complexity, control costs, and prevent performance issues in AI workflows.
The article discusses the importance of observability in the context of retrieval-augmented generation (RAG) agents, emphasizing how effective monitoring can enhance their performance and reliability. It explores various strategies and tools that can be employed to achieve better insights and control over RAG systems, ultimately leading to improved user experiences.
Observability in applications comes with instrumentation overhead, which can impact performance and resource consumption. A benchmark of OpenTelemetry in a Go application revealed a CPU usage increase of about 35% and some additional memory usage, while still maintaining stable throughput. For teams prioritizing incident resolution, the tradeoff for detailed observability is often justified, though eBPF-based instrumentation offers a lighter alternative for monitoring without significant resource costs.
Amazon ElastiCache now supports Valkey 8.1, introducing new features such as native Bloom filter support, enhanced hash table implementation, and the COMMANDLOG feature for improved performance and observability. These updates aim to enhance application responsiveness while reducing infrastructure costs. The new version is available at no extra cost and allows for easy upgrades without downtime.
eBPF (extended Berkeley Packet Filter) is emerging as a transformative technology for cloud-native applications, enabling developers to execute code in the kernel without modifying the kernel itself. This capability enhances performance, security, and observability in cloud environments, positioning eBPF as a critical component in the next phase of cloud-native development.
Modern observability is essential for developers, enabling them to understand code behavior in production and improve performance and reliability. By integrating observability into development workflows, developers can gain real-time insights, trace issues efficiently, and enhance collaboration across teams. The right observability tools help streamline the debugging process and reduce the cognitive load on developers.
Organizations are struggling with the high costs of traditional log management solutions like Splunk as data volumes grow, prompting a shift towards OpenSearch as a sustainable alternative. OpenSearch supports log analysis through its Piped Processing Language (PPL), relies on Apache Calcite for enterprise-grade performance, and unifies the observability experience for users. The platform aims to empower teams with advanced analytics capabilities and community-driven development.
The article describes how the trace details page of an observability tool was made to display a million spans, giving users a more complete view of system performance. It highlights the technical challenges faced and the solutions implemented to efficiently manage and visualize large amounts of trace data.
Observability in applications introduces instrumentation overhead that can impact performance, particularly when using OpenTelemetry with Go. A benchmark comparing a Go HTTP server's performance with and without OpenTelemetry revealed a notable increase in CPU and memory usage, but maintained stable throughput. The choice of observability method should balance the need for detailed tracing against resource costs, with eBPF-based instrumentation offering a more lightweight alternative for high-load environments.
Dynatrace's video discusses the challenges organizations face when adopting AI and large language models, focusing on optimizing performance, understanding costs, and ensuring accurate responses. It outlines how Dynatrace utilizes OpenTelemetry for comprehensive observability across the AI stack, including infrastructure, model performance, and accuracy analysis.
Grafana Cloud introduces a new approach to observability by shifting from traditional pillars of logs, metrics, and traces to interconnected rings that optimize performance and reduce telemetry waste. By combining these signals in a context-rich manner, Grafana offers opinionated observability solutions that enhance operational efficiency, lower costs, and provide actionable insights. The article also highlights the integration of AI to further improve observability workflows and decision-making.
New Relic has announced support for the Model Context Protocol (MCP) within its AI Monitoring solution, enhancing application performance management for agentic AI systems. This integration offers improved visibility into MCP interactions, allowing developers to track tool usage, performance bottlenecks, and optimize AI agent strategies effectively. The new feature aims to eliminate data silos and provide a holistic view of AI application performance.
Arc is a high-performance time-series database capable of ingesting 2.4 million metrics per second, along with logs, traces, and events, using a unified MessagePack columnar protocol. It is currently in alpha release, with a stable core and development still ongoing. Features include advanced SQL analytics via DuckDB, flexible storage options, and built-in token-based authentication, and at this stage it is suitable for development and testing environments. The system is designed for high-throughput ingestion, low latency, and efficient data management, aiming to support observability across various telemetry types.
Character.AI has replaced its fragmented logging system with a centralized one, significantly improving query speeds and enabling real-time visibility for developers. The company captures logs selectively and has introduced features such as live tailing and keyword search. It also aims to unify metrics to enhance observability and support future growth.