Links tagged with: ai-models + inference
OpenAI has adopted MXFP4, a micro-scaling block floating-point format that makes models smaller and faster, cutting inference costs by up to 75%. The format allows large language models (LLMs) to run efficiently on less hardware, potentially transforming how AI models are deployed across platforms. OpenAI's move demonstrates the format's efficacy in practice, effectively setting a new standard for model quantization in the industry.
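To make the format concrete, here is a minimal NumPy sketch of the micro-scaling idea, assuming the OCP MX layout that MXFP4 follows (blocks of 32 four-bit E2M1 elements sharing one power-of-two E8M0 scale); the function names are illustrative, not OpenAI's implementation.

```python
import numpy as np

# Representable magnitudes of the 4-bit E2M1 element type in the OCP
# Microscaling (MX) spec; MXFP4 packs 32 of these per shared scale.
FP4_VALUES = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def quantize_mxfp4_block(block):
    """Quantize 32 floats to one shared scale exponent (stands in for the
    8-bit E8M0 scale) plus 32 E2M1 element values."""
    assert block.size == 32
    amax = np.abs(block).max()
    if amax == 0:
        return 0, np.zeros_like(block)
    # Smallest power-of-two scale so the largest magnitude fits within
    # E2M1's maximum representable value of 6.0.
    exp = int(np.ceil(np.log2(amax / 6.0)))
    scaled = block / 2.0 ** exp
    # Round each element to the nearest representable FP4 magnitude, keep sign.
    idx = np.abs(np.abs(scaled)[:, None] - FP4_VALUES[None, :]).argmin(axis=1)
    return exp, np.sign(scaled) * FP4_VALUES[idx]

def dequantize_mxfp4_block(exp, codes):
    """Recover approximate values: element * 2**shared_exponent."""
    return codes * 2.0 ** exp

x = np.random.randn(32).astype(np.float32)
exp, codes = quantize_mxfp4_block(x)
print("max abs error:", np.abs(x - dequantize_mxfp4_block(exp, codes)).max())
```

Because each element costs 4 bits plus a small amortized share of the scale, a block like this takes roughly a quarter of the memory of FP16 weights, which is where the cost reduction comes from.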
Hugging Face has launched a new deployment option for OpenAI's Whisper model on Inference Endpoints, delivering up to 8x faster transcription. The deployment leverages optimizations such as PyTorch compilation and CUDA graphs to speed up audio transcription while maintaining high accuracy. Users can deploy their own ASR pipelines with minimal effort and choose from a range of powerful hardware options.
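As a rough sketch, calling such a deployed Whisper endpoint with the huggingface_hub client might look like this; the endpoint URL and audio file name are placeholders for your own.

```python
from huggingface_hub import InferenceClient

# Placeholder URL: substitute the URL shown on your Inference Endpoint's
# page after deploying the Whisper image.
client = InferenceClient("https://my-whisper-endpoint.endpoints.huggingface.cloud")

# Send a local audio file; the endpoint returns the transcription.
result = client.automatic_speech_recognition("meeting.flac")
print(result.text)
```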
Cohere has become a supported Inference Provider on the Hugging Face Hub, giving users access to a range of enterprise-focused AI models for generative AI, embeddings, and vision-language tasks. The article highlights several of Cohere's models, their features, and how to run them on the Hugging Face platform, including serverless inference and integration with client SDKs.
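A minimal sketch of serverless inference through the provider, using the huggingface_hub client's provider routing; the model ID and token value here are illustrative assumptions, not fixed choices.

```python
from huggingface_hub import InferenceClient

# provider="cohere" routes the request through Cohere's serverless
# infrastructure while authenticating with your Hugging Face token.
client = InferenceClient(provider="cohere", api_key="hf_xxx")

response = client.chat_completion(
    model="CohereLabs/c4ai-command-a-03-2025",  # illustrative Cohere model ID
    messages=[{"role": "user", "content": "Name three uses of text embeddings."}],
    max_tokens=256,
)
print(response.choices[0].message.content)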
Scaleway has been added as a new Inference Provider on the Hugging Face Hub, letting users access a variety of AI models through a serverless API. The service offers competitive pricing and low latency, and supports features such as structured outputs and multimodal processing, making it suitable for production use. Users can manage their API keys and provider preferences directly in their accounts for seamless integration.
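A similar sketch for Scaleway, here with streaming to illustrate the low-latency path; the model ID is an assumption, so substitute any model Scaleway serves.

```python
from huggingface_hub import InferenceClient

# provider="scaleway" sends the request to Scaleway's serverless API.
client = InferenceClient(provider="scaleway", api_key="hf_xxx")

# Streaming keeps perceived latency low: tokens print as they arrive.
for chunk in client.chat_completion(
    model="meta-llama/Llama-3.1-8B-Instruct",  # illustrative model ID
    messages=[{"role": "user", "content": "What is a serverless API?"}],
    stream=True,
):
    print(chunk.choices[0].delta.content or "", end="")
```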