Quit Emailing Yourself

# throughput → inference

2 links tagged with all of: throughput + inference


Links

Switch to FriendliAI | Get up to $50k Inference Credits

This article outlines how teams can switch their inference infrastructure to FriendliAI for improved efficiency and cost savings. FriendliAI claims 99.99% reliability, up to 90% lower costs, and higher throughput, with minimal code changes required to migrate. Users who switch can get up to $50,000 in credits.

Saved by tldr-importer · Last saved February 14, 2026 · 4 min read

Tags: inference, cost-savings, migration, throughput, reliability

Tokasaurus: An LLM Inference Engine for High-Throughput Workloads

Tokasaurus is a newly released LLM inference engine designed for high-throughput workloads. In benchmarks it outperforms existing engines such as vLLM and SGLang by more than 3x. It includes optimizations for both small and large models, among them dynamic prefix identification and several parallelism techniques, to improve efficiency and reduce CPU overhead. The engine supports multiple model families and is available as an open-source project on GitHub and PyPI.

Saved by tldr-importer · Last saved October 29, 2025 · 6 min read

Tags: llm, inference, throughput, optimization, open-source