Links
This article outlines how teams can switch their inference infrastructure to FriendliAI for improved efficiency and cost savings. FriendliAI claims 99.99% reliability, up to 90% lower costs, and higher throughput, with minimal code changes required for migration. Teams that switch can receive up to $50,000 in credits.
Tokasaurus is a newly released LLM inference engine designed for high-throughput workloads. In its reported benchmarks, it outperforms existing engines such as vLLM and SGLang by more than 3x. It includes optimizations for both small and large models, among them dynamic prefix identification and several parallelism techniques, aimed at improving efficiency and reducing CPU overhead. The engine supports multiple model families and is available as an open-source project on GitHub and PyPI.