7 links
tagged with all of: infrastructure + machine-learning
Links
LinkedIn has developed OpenConnect, a next-generation AI pipeline ecosystem that improves the efficiency and reliability of processing large volumes of data for AI applications. By addressing the limitations of its previous ProML system, OpenConnect reduces launch times, speeds up iteration, and supports robust experimentation, helping LinkedIn deliver AI features to over 1.2 billion members.
OpenAI uses Kubernetes together with Apache technologies to manage its infrastructure at scale, so that machine learning models can be deployed and maintained reliably. This tooling provides efficient resource management and orchestration, enabling OpenAI to handle complex workloads and improve service delivery.
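The summary stays at the architecture level; as a rough sketch of how a model-serving workload can be declared on Kubernetes, the snippet below uses the official Kubernetes Python client to create a Deployment. The namespace, image, labels, and resource requests are hypothetical placeholders, not OpenAI's actual configuration.

```python
# Hypothetical sketch: declaring a GPU-backed model-serving Deployment with the
# official Kubernetes Python client. Names, image, and resources are illustrative.
from kubernetes import client, config

config.load_kube_config()  # or config.load_incluster_config() when running in a pod

container = client.V1Container(
    name="model-server",
    image="registry.example.com/model-server:latest",  # placeholder image
    ports=[client.V1ContainerPort(container_port=8080)],
    resources=client.V1ResourceRequirements(
        requests={"cpu": "4", "memory": "16Gi", "nvidia.com/gpu": "1"},
        limits={"nvidia.com/gpu": "1"},
    ),
)

deployment = client.V1Deployment(
    api_version="apps/v1",
    kind="Deployment",
    metadata=client.V1ObjectMeta(name="model-server"),
    spec=client.V1DeploymentSpec(
        replicas=3,
        selector=client.V1LabelSelector(match_labels={"app": "model-server"}),
        template=client.V1PodTemplateSpec(
            metadata=client.V1ObjectMeta(labels={"app": "model-server"}),
            spec=client.V1PodSpec(containers=[container]),
        ),
    ),
)

client.AppsV1Api().create_namespaced_deployment(namespace="ml-serving", body=deployment)
```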
Discord has moved from a single-node setup to multi-GPU clusters, making distributed computing more accessible to its machine learning engineers. The shift improves performance and efficiency on complex machine learning workloads; the article details the technical changes and their impact on Discord's infrastructure.
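For orientation on what multi-GPU training typically looks like (not Discord's actual code), here is a minimal sketch using PyTorch DistributedDataParallel, with one process per GPU launched via torchrun; the model and data are toy placeholders.

```python
# Minimal multi-GPU training sketch with PyTorch DDP, launched via
# `torchrun --nproc_per_node=<num_gpus> train.py`. Model and data are toys.
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    dist.init_process_group(backend="nccl")           # one process per GPU
    local_rank = int(os.environ["LOCAL_RANK"])        # set by torchrun
    torch.cuda.set_device(local_rank)

    model = torch.nn.Linear(128, 1).cuda(local_rank)  # placeholder model
    model = DDP(model, device_ids=[local_rank])
    optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)

    for step in range(100):
        x = torch.randn(64, 128, device=local_rank)   # placeholder batch
        y = torch.randn(64, 1, device=local_rank)
        loss = torch.nn.functional.mse_loss(model(x), y)
        optimizer.zero_grad()
        loss.backward()                               # gradients all-reduced across GPUs
        optimizer.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```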
Grab has evolved its machine learning feature store, moving from a traditional model to a feature table design backed by Amazon Aurora PostgreSQL for efficient data management and retrieval. The new architecture handles high-cardinality data and provides atomic updates, ensuring consistency and reliability in ML model serving. The feature tables improve the user experience, streamline the model lifecycle, and ultimately lead to better-performing ML models.
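The write-up describes the design rather than code; as a hedged sketch of what an atomic feature-table write on Postgres could look like, the snippet below upserts a row of features inside a single transaction using psycopg2. The table name, columns, and connection string are hypothetical, not Grab's schema.

```python
# Hypothetical sketch of an atomic feature-table upsert on Postgres with psycopg2.
# Table name, columns, and DSN are illustrative only.
import json
import psycopg2

UPSERT_SQL = """
INSERT INTO driver_features (driver_id, features, updated_at)
VALUES (%s, %s, NOW())
ON CONFLICT (driver_id)
DO UPDATE SET features = EXCLUDED.features, updated_at = EXCLUDED.updated_at;
"""

def write_features(dsn: str, driver_id: str, features: dict) -> None:
    # The connection context manager wraps the statement in a transaction,
    # so the row is either fully updated or left untouched.
    with psycopg2.connect(dsn) as conn:
        with conn.cursor() as cur:
            cur.execute(UPSERT_SQL, (driver_id, json.dumps(features)))

write_features(
    "postgresql://user:pass@aurora-endpoint:5432/featurestore",  # placeholder DSN
    "driver-123",
    {"completed_trips_7d": 42, "avg_rating_30d": 4.8},
)
```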
Pinterest has enhanced its machine learning (ML) infrastructure by extending the capabilities of Ray beyond just training and inference. By addressing challenges such as slow data pipelines and inefficient compute usage, Pinterest implemented a Ray-native ML infrastructure that improves feature development, sampling, and labeling, leading to faster, more scalable ML iteration.
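As a rough illustration of the Ray Data primitives such a pipeline builds on (not Pinterest's internal code), the sketch below reads a Parquet dataset, applies a feature transform in parallel batches, and draws a random sample; the S3 path, columns, and transform are placeholders.

```python
# Rough sketch of Ray Data primitives for feature processing and sampling.
# Path, columns, and transform are placeholders.
import ray
import pandas as pd

ray.init()  # connects to a local or existing Ray cluster

def add_engagement_feature(batch: pd.DataFrame) -> pd.DataFrame:
    # Placeholder feature transform applied in parallel across the cluster.
    batch["engagement_rate"] = batch["clicks"] / batch["impressions"].clip(lower=1)
    return batch

ds = (
    ray.data.read_parquet("s3://example-bucket/training-events/")  # placeholder path
    .map_batches(add_engagement_feature, batch_format="pandas")
    .random_sample(0.1)  # downsample to 10% for faster iteration
)

for batch in ds.iter_batches(batch_size=1024, batch_format="pandas"):
    pass  # feed batches into training, labeling, or analysis
```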
The complexity of modern infrastructure demands advanced observability tooling, built on cost-effective storage, standardized data collection with OpenTelemetry, and the integration of machine learning and AI for better insight and efficiency. This evolution in observability is driven by the need for high-fidelity data, seamless signal correlation, and intelligent alert management that can keep pace with scaling systems. Ultimately, effective observability will hinge on these innovations to maintain operational efficacy in increasingly complex environments.
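As a minimal example of the standardized collection the piece points to, the sketch below configures OpenTelemetry tracing in Python with a console exporter; a real deployment would export to a collector or backend, and the service and span names here are illustrative.

```python
# Minimal OpenTelemetry tracing setup in Python. The console exporter and
# span names are illustrative; production setups export to a collector.
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor, ConsoleSpanExporter

provider = TracerProvider()
provider.add_span_processor(BatchSpanProcessor(ConsoleSpanExporter()))
trace.set_tracer_provider(provider)

tracer = trace.get_tracer("inference.service")  # instrumentation scope name

def handle_request(payload: dict) -> dict:
    with tracer.start_as_current_span("handle_request") as span:
        span.set_attribute("payload.size", len(payload))
        with tracer.start_as_current_span("model_inference"):
            result = {"score": 0.42}  # placeholder model call
        return result

handle_request({"feature_a": 1.0, "feature_b": 2.0})
```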
Grab has modernized its machine learning model serving platform, Catwalk, by adopting NVIDIA Triton Inference Server to improve performance and reduce costs. The transition involved building a "Triton manager" for seamless integration and backward compatibility, yielding significant reductions in latency and infrastructure spend for deployed models.
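For context on the serving side, the sketch below shows a typical client request to a Triton Inference Server over HTTP using the tritonclient package; the model name, tensor names, and shapes depend on the model's config.pbtxt and are hypothetical here.

```python
# Hypothetical client request to a Triton Inference Server over HTTP.
# Model name, tensor names, and shapes depend on the model's config.pbtxt.
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

batch = np.random.rand(1, 128).astype(np.float32)        # placeholder features
infer_input = httpclient.InferInput("INPUT__0", list(batch.shape), "FP32")
infer_input.set_data_from_numpy(batch)

response = client.infer(
    model_name="ranking_model",                          # placeholder model name
    inputs=[infer_input],
    outputs=[httpclient.InferRequestedOutput("OUTPUT__0")],
)
scores = response.as_numpy("OUTPUT__0")
print(scores)
```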