Links
AssemblyAI offers an API for building voice AI applications. Its features include speech-to-text, speaker identification, and PII (personally identifiable information) redaction, all intended to simplify development and improve transcription accuracy. The platform supports multiple languages and is aimed at enterprises that need efficient voice AI tooling.
Mistral AI has released two new Voxtral speech-to-text models: Voxtral Mini Transcribe V2 for batch processing and Voxtral Realtime for live applications. Both models support 13 languages and are designed for accurate, efficient transcription in use cases such as meeting transcription and voice applications.
xAI has launched the Grok Voice Agent API. Developers can use it to build voice agents that support multiple languages and can search real-time data. The API is built on the same technology used in Tesla vehicles, is designed for low latency, and is priced at a flat $0.05 per minute.
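With a flat per-minute rate, cost scales linearly with total session minutes. The sketch below is a back-of-the-envelope estimator that assumes billing is purely per minute of session time. The announcement does not say how partial minutes are rounded, so this version bills fractional minutes pro rata and rounds the total to the nearest cent. The function name `estimate_cost_usd` is hypothetical, not part of any xAI SDK.

```python
from decimal import Decimal, ROUND_HALF_UP

# Flat rate quoted in the announcement, in USD per minute.
RATE_PER_MINUTE = Decimal("0.05")


def estimate_cost_usd(total_minutes) -> Decimal:
    """Estimate cost for a number of session minutes (hypothetical helper).

    Assumption: fractional minutes are billed pro rata; the real rounding
    policy is not stated in the announcement.
    """
    # Go through str() so floats like 2.5 become exact Decimals.
    minutes = Decimal(str(total_minutes))
    if minutes < 0:
        raise ValueError("minutes must be non-negative")
    # Round the total to whole cents.
    return (minutes * RATE_PER_MINUTE).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


if __name__ == "__main__":
    # Example: 1,200 calls averaging 4 minutes each.
    print(estimate_cost_usd(1200 * 4))  # prints 240.00
```

`Decimal` avoids binary floating-point drift when summing many small charges. If real billing rounds each call up to a whole minute, apply that rounding per call before summing.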
Google has launched the Gemini Embedding model (gemini-embedding-001), now available to developers through the Gemini API and Vertex AI. The model posts leading results on the Massive Text Embedding Benchmark (MTEB). It supports over 100 languages and offers flexible output dimensions, so developers can trade off retrieval quality against storage and compute cost. Developers on older embedding models are encouraged to migrate before those models' deprecation dates. Further features, such as Batch API support, are coming soon.
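Flexible output dimensions of this kind are commonly achieved with Matryoshka-style training, where a prefix of the full vector remains a usable embedding on its own. A truncated prefix is usually re-normalized to unit length before comparing vectors. The sketch below assumes that approach and works on embeddings already retrieved as plain Python lists. It makes no Gemini API calls, and the toy vectors and helper names (`truncate_and_normalize`, `cosine`) are hypothetical illustrations, not real model output.

```python
import math
from typing import Sequence


def truncate_and_normalize(vec: Sequence[float], dim: int) -> list[float]:
    """Keep the first `dim` components and rescale them to unit L2 norm.

    Assumption: the embedding was trained so that prefixes stay meaningful
    (Matryoshka-style); for other models truncation may hurt quality.
    """
    if not 0 < dim <= len(vec):
        raise ValueError("dim must be between 1 and len(vec)")
    prefix = list(vec[:dim])
    norm = math.sqrt(sum(x * x for x in prefix))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in prefix]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cannot compare zero vectors")
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    # Toy 6-dimensional vectors standing in for full-length embeddings.
    doc = [0.12, -0.40, 0.33, 0.05, 0.91, -0.02]
    query = [0.10, -0.35, 0.30, 0.00, 0.88, 0.01]
    # Compare at full length and after truncating to 3 dimensions.
    print(cosine(doc, query))
    print(cosine(truncate_and_normalize(doc, 3), truncate_and_normalize(query, 3)))
```

Storing shorter vectors cuts index size and similarity-search cost roughly in proportion to the dimension. Whether the quality loss is acceptable is an empirical question for each workload.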
Meta's Llama 4 Maverick is a state-of-the-art multilingual model designed for image and text understanding, creative writing, and enterprise applications. Together AI lists the model but does not yet support it. Registered Together AI users can still access the platform's other APIs, including image generation, video generation, chat completions, audio transcription, and embeddings.
Alibaba's Qwen team has updated Qwen-MT to improve its multilingual understanding and translation, with accurate, fluent support for 92 languages. The model is trained with reinforcement learning and keeps latency and cost low. It offers customization features for specialized translation needs, such as terminology intervention, domain prompts, and translation memory. Published evaluations report better results than comparable models across a range of contexts and domain-specific terminology.