3 links tagged with all of: optimization + inference + llm
Links
bitnet.cpp is a framework for efficient inference of 1-bit large language models (LLMs), delivering significant speedups and energy savings on both ARM and x86 CPUs. It can run large models locally at speeds comparable to human reading, and aims to inspire further development of 1-bit LLMs. Planned work includes GPU support and extensions to other low-bit models.
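As a rough illustration of what "1-bit" weights mean in practice (BitNet b1.58 actually uses ternary values, i.e. about 1.58 bits per weight), here is a minimal NumPy sketch of absmean-style quantization. The function names are hypothetical and this is a toy emulation of the arithmetic, not bitnet.cpp's optimized kernels:

```python
import numpy as np

def quantize_ternary(w: np.ndarray) -> tuple[np.ndarray, float]:
    # Absmean scheme from the BitNet b1.58 paper: scale by the mean
    # absolute weight, then round each weight to {-1, 0, +1}.
    scale = float(np.abs(w).mean())
    q = np.clip(np.round(w / (scale + 1e-8)), -1, 1)
    return q.astype(np.int8), scale

def matmul_ternary(x: np.ndarray, q: np.ndarray, scale: float) -> np.ndarray:
    # With ternary weights the "multiplies" degenerate into adds and
    # subtracts; a real kernel exploits that, here we only emulate the math.
    return (x @ q.astype(np.float32)) * scale

w = np.random.randn(256, 256).astype(np.float32)
x = np.random.randn(1, 256).astype(np.float32)
q, s = quantize_ternary(w)
err = np.abs(x @ w - matmul_ternary(x, q, s)).mean()
print(f"mean absolute error from ternary quantization: {err:.4f}")
```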
Charlotte Qi discusses the challenges of serving large language models (LLMs) at Meta, focusing on the complexities of LLM inference and the need for efficient hardware and software co-design. She outlines the key steps in optimizing LLM serving: fitting models to the available hardware, managing latency, and applying techniques such as continuous batching and disaggregation to improve performance.
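For readers unfamiliar with continuous batching, here is a minimal toy scheduler sketching the core idea: finished sequences free their batch slot after every decode step and queued requests join immediately, rather than waiting for the whole batch to drain as in static batching. The names are hypothetical and the model call is elided:

```python
from collections import deque

class Request:
    def __init__(self, rid: int, max_new_tokens: int):
        self.rid = rid
        self.remaining = max_new_tokens

def serve(queue: deque, batch_size: int = 4) -> None:
    active: list[Request] = []
    step = 0
    while queue or active:
        # Continuous batching: admit waiting requests at every decode
        # step instead of only when the entire batch has finished.
        while queue and len(active) < batch_size:
            active.append(queue.popleft())
        for r in active:        # one decode step per active sequence
            r.remaining -= 1    # (actual model forward pass elided)
        for r in active:
            if r.remaining == 0:
                print(f"step {step}: request {r.rid} done, slot freed")
        active = [r for r in active if r.remaining > 0]
        step += 1

serve(deque(Request(i, max_new_tokens=2 + i) for i in range(6)))
```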
Tokasaurus is a newly released LLM inference engine designed for high-throughput workloads, outperforming existing engines such as vLLM and SGLang by more than 3x on throughput-focused benchmarks. It includes optimizations for both small and large models, among them dynamic identification of shared prefixes, multiple parallelism strategies, and reduced CPU overhead. The engine supports several model families and is available as an open-source project on GitHub and PyPI.
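A minimal sketch of the idea behind shared-prefix detection (hypothetical helper names, not Tokasaurus's implementation): if the sequences in a batch share a long common prefix, such as a system prompt, attention over that prefix can be computed once and reused across all of them:

```python
def longest_common_prefix(seqs: list[list[int]]) -> int:
    # Length of the token prefix shared by every sequence in the group.
    n = min(len(s) for s in seqs)
    for i in range(n):
        if any(s[i] != seqs[0][i] for s in seqs):
            return i
    return n

def split_shared_prefix(batch: list[list[int]], min_share: int = 4):
    # If the batch shares a sufficiently long prefix, return it separately
    # so downstream attention work over it can be done once, not per sequence.
    k = longest_common_prefix(batch)
    if k >= min_share:
        return batch[0][:k], [s[k:] for s in batch]
    return [], batch

prompts = [[1, 2, 3, 4, 5, 9], [1, 2, 3, 4, 5, 7, 8], [1, 2, 3, 4, 5, 6]]
shared, suffixes = split_shared_prefix(prompts)
print("shared prefix:", shared)          # computed once
print("per-sequence suffixes:", suffixes)
```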