5 links tagged with all of: machine-learning + kubernetes
Links
Setting up a local Langfuse server with Kubernetes allows developers to manage traces and metrics for sensitive LLM applications without relying on third-party services. The article details the tools and configurations needed, including Helm, Kustomize, and Traefik, to deploy and access Langfuse on a local GPU cluster. It also offers guidance on managing secrets and on testing the setup from a Python container.
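The secrets-management step lends itself to a small sketch: Kubernetes Secret manifests expect base64-encoded values in their `data` field, so credentials passed to a Langfuse deployment must be encoded before being applied. A minimal Python helper, with key names that are illustrative assumptions rather than values from the article:

```python
import base64

def k8s_secret_data(values: dict[str, str]) -> dict[str, str]:
    """Base64-encode plaintext values for the `data` field of a Kubernetes Secret."""
    return {k: base64.b64encode(v.encode()).decode() for k, v in values.items()}

# Hypothetical Langfuse credentials -- names chosen for illustration only.
secret = k8s_secret_data({
    "DATABASE_URL": "postgresql://langfuse:pw@postgres:5432/langfuse",
    "NEXTAUTH_SECRET": "change-me",
})
```

The resulting mapping can be dropped into a Secret manifest's `data:` section; alternatively, Kubernetes accepts plaintext under `stringData:` and performs the encoding itself.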
Software is transitioning towards genuine autonomy through agentic AI, which utilizes Large Language Models for proactive, goal-driven operations. Kubernetes offers a robust platform engineering foundation to meet the unique demands of agentic workloads, addressing challenges such as dynamic compute, persistent state management, and complex orchestration, while emphasizing the need for a platform-centric approach in deploying agentic AI at scale.
OpenAI leverages Kubernetes and Apache technologies to manage their scalable infrastructure effectively, ensuring that machine learning models can be deployed and maintained seamlessly. The integration of these tools allows for efficient resource management and orchestration, enabling OpenAI to handle complex workloads and enhance their service delivery.
The Kubeflow Trainer project has been integrated into the PyTorch ecosystem, providing a scalable and community-supported solution for running PyTorch on Kubernetes. It simplifies distributed training of AI models and fine-tuning of large language models (LLMs) while optimizing GPU utilization and supporting advanced scheduling capabilities. The integration enhances the deployment of distributed PyTorch applications and offers a streamlined experience for AI practitioners and platform admins alike.
Amazon Web Services has launched AI on EKS, an open source initiative aimed at simplifying the deployment and scaling of AI/ML workloads on Amazon Elastic Kubernetes Service. The project provides deployment-ready blueprints, Terraform templates, and best practices for optimizing infrastructure for large language models and other AI tasks. It is maintained separately from the earlier Data on EKS initiative to keep each project focused and maintainable.