Tokasaurus is a newly released LLM inference engine designed for high-throughput workloads; on throughput-focused benchmarks it can outperform existing engines such as vLLM and SGLang by up to 3x or more. It includes optimizations for both small and large models: for small models it keeps CPU overhead low and dynamically identifies shared prefixes across requests, while for large models it supports tensor and pipeline parallelism. The engine supports the Llama-3 and Qwen-2 model families and is available as an open-source project on GitHub and PyPI.
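The "dynamic prefix identification" idea can be illustrated with a small sketch: incoming requests that share a long common token prefix (for example, the same system prompt) are grouped so the prefix's attention/KV work can be done once per group rather than once per request. The function names and greedy grouping heuristic below are hypothetical and only illustrate the general technique, not Tokasaurus's actual implementation.

```python
# Hypothetical sketch of dynamic prefix grouping: bucket requests that share a
# sufficiently long common token prefix so prefix computation can be reused.
from collections import defaultdict
from typing import Dict, List, Tuple


def shared_prefix(a: List[int], b: List[int]) -> Tuple[int, ...]:
    """Return the longest common token prefix of two token sequences."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return tuple(a[:n])


def group_by_prefix(
    requests: List[List[int]], min_len: int = 4
) -> Dict[Tuple[int, ...], List[List[int]]]:
    """Greedily group requests whose shared prefix is at least min_len tokens."""
    groups: Dict[Tuple[int, ...], List[List[int]]] = defaultdict(list)
    for tokens in requests:
        placed = False
        for prefix in list(groups):
            common = shared_prefix(list(prefix), tokens)
            if len(common) >= min_len:
                # Shrink the group's key to the common part and add this request.
                members = groups.pop(prefix)
                groups[common].extend(members)
                groups[common].append(tokens)
                placed = True
                break
        if not placed:
            groups[tuple(tokens)].append(tokens)
    return groups


if __name__ == "__main__":
    # Two requests share a long system-prompt prefix; the third does not.
    reqs = [
        [1, 2, 3, 4, 5, 6, 10, 11],
        [1, 2, 3, 4, 5, 6, 20, 21],
        [7, 8, 9, 9, 9, 9, 9, 9],
    ]
    for prefix, members in group_by_prefix(reqs).items():
        print(f"prefix len {len(prefix)}: {len(members)} request(s)")
```

In a real engine this grouping would happen continuously as the batch changes, and the shared-prefix KV cache would be attended to once per group during decoding.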
Tags: llm, inference, throughput, optimization, open-source