Charlotte Qi discusses the challenges of serving large language models (LLMs) at Meta, focusing on the complexity of LLM inference and the need for efficient hardware and software solutions. She outlines the key steps in optimizing LLM serving: fitting models to the available hardware, managing latency, and applying techniques such as continuous batching and disaggregation to improve performance.
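Continuous batching, one of the techniques she mentions, can be sketched in a few lines: instead of waiting for an entire batch to finish before starting new requests, the scheduler admits and retires requests at every decode step. The sketch below is illustrative only, not the system described in the talk; `model.prefill`, `model.decode_step`, and `eos_token_id` are hypothetical stand-ins for a real inference engine's API.

```python
from collections import deque
from dataclasses import dataclass, field

@dataclass
class Request:
    prompt_tokens: list          # tokens to run through prefill
    max_new_tokens: int
    output_tokens: list = field(default_factory=list)

class ContinuousBatcher:
    """Admit new requests into the running batch at every decode step,
    rather than waiting for the whole batch to finish (static batching)."""

    def __init__(self, model, max_batch_size: int):
        self.model = model                 # hypothetical engine API, see note above
        self.max_batch_size = max_batch_size
        self.waiting = deque()             # requests not yet admitted
        self.running = []                  # requests currently decoding

    def submit(self, req: Request):
        self.waiting.append(req)

    def step(self):
        # Admit waiting requests while there is batch capacity.
        while self.waiting and len(self.running) < self.max_batch_size:
            req = self.waiting.popleft()
            self.model.prefill(req)        # hypothetical: build this request's KV cache
            self.running.append(req)

        if not self.running:
            return

        # One batched decode step produces the next token for every in-flight request.
        next_tokens = self.model.decode_step(self.running)   # hypothetical API
        for req, tok in zip(self.running, next_tokens):
            req.output_tokens.append(tok)

        # Retire finished requests immediately, freeing slots for waiting ones.
        self.running = [
            r for r in self.running
            if len(r.output_tokens) < r.max_new_tokens
            and r.output_tokens[-1] != self.model.eos_token_id
        ]
```

The design point this illustrates is the one the talk's framing implies: because requests finish at different times, per-step admission keeps the GPU batch full and avoids idle slots that static batching would leave behind.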
Tags: llm, inference, optimization, meta, infrastructure