7 links tagged with all of: machine-learning + deployment
Links
The EdgeAI for Beginners course offers a comprehensive introduction to deploying artificial intelligence on edge devices, emphasizing practical applications, privacy, and real-time performance. It covers small language models, optimization techniques, and production strategies, with hands-on workshops and resources for various technical roles across multiple industries. Participants can follow a structured learning path and engage with a community of developers for support.
Setting up a local Langfuse server with Kubernetes allows developers to manage traces and metrics for sensitive LLM applications without relying on third-party services. The article details the necessary tools and configurations, including Helm, Kustomize, and Traefik, to successfully deploy and access Langfuse on a local GPU cluster. It also provides insights on managing secrets and testing the setup through a Python container.
The article discusses the concept of an AI engineering stack, outlining the components and tools needed to build and deploy AI systems effectively. It emphasizes the importance of a structured approach to integrating AI into existing workflows and highlights key technologies that facilitate this process.
The article covers deploying machine learning agents as real-time APIs, emphasizing the efficiency and responsiveness gains such systems offer, and explores the technical considerations involved in implementing these agents effectively across various applications.
Inferless is a serverless GPU platform designed for effortless machine learning model deployment, allowing users to scale from zero to hundreds of GPUs quickly and efficiently. With features like automatic redeployment, zero infrastructure management, and enterprise-level security, it enables companies to save costs and enhance performance without the hassles of traditional GPU clusters. The platform will sunset on October 31, 2025.
PyTorch has released native quantized models, including Phi4-mini-instruct and Qwen3, optimized for both server and mobile platforms using int4 and float8 quantization methods. These models offer efficient inference with minimal accuracy degradation and come with comprehensive recipes for users to apply quantization to their own models. Future updates will include new features and collaborations aimed at enhancing quantization techniques and performance.
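The int4 scheme mentioned in that summary can be illustrated with plain arithmetic: weights are mapped to 4-bit signed integers through a scale factor and dequantized back at inference time. A minimal sketch of symmetric per-tensor int4 quantization, assuming a simple round-and-clamp scheme (the actual PyTorch/torchao recipes quantize per-group on tensors and are more involved):

```python
def quantize_int4(weights):
    """Symmetric per-tensor int4 quantization: map floats to integers in [-8, 7]."""
    scale = max(abs(w) for w in weights) / 7  # 7 = largest positive int4 value
    q = [max(-8, min(7, round(w / scale))) for w in weights]
    return q, scale

def dequantize_int4(q, scale):
    """Recover approximate float weights from the int4 codes."""
    return [v * scale for v in q]

# Toy example (illustrative values, not real model weights).
weights = [0.12, -0.5, 0.33, 0.07, -0.21, 0.44]
q, scale = quantize_int4(weights)
restored = dequantize_int4(q, scale)
# Round-trip error is bounded by half the quantization step (scale / 2),
# which is why well-chosen scales keep accuracy degradation small.
```

Storing 4-bit codes plus one scale instead of 16- or 32-bit floats is where the memory and bandwidth savings come from; float8 follows the same scale-and-round idea with a wider 8-bit floating-point code.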
The webpage provides an overview of Baseten's Model APIs, which facilitate the deployment and management of machine learning models. It emphasizes ease of integration, scalability, and the ability to create robust APIs for various applications. Users can leverage these APIs to streamline their machine learning workflows and enhance application performance.