# tokenization → kv-cache → transformer → quantization

1 link tagged with all of: tokenization + kv-cache + transformer + quantization

Links

How LLM Inference Works

This article breaks down how an LLM turns your prompt into streamed tokens, covering tokenization, embeddings, transformer attention, and the two-phase pipeline of compute-bound prefill and memory-bound decode. It explains KV caching, quantization, and metrics like Time to First Token and Inter-Token Latency to show why inference speed depends on both compute and memory.

Last saved May 04, 2026 · 7 min read

tokenization · attention · transformer · kv-cache · quantization
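
The two latency metrics named in the summary above, Time to First Token and Inter-Token Latency, can be illustrated with a small sketch. It is not taken from the linked article: the function name `latency_metrics` and the sample timestamps are hypothetical, and it assumes you have recorded the time the request was sent and the arrival time of each streamed token, all in seconds.

```python
from statistics import mean


def latency_metrics(request_time, token_times):
    """Return (ttft, mean_itl) in seconds.

    request_time: when the prompt was sent (hypothetical input).
    token_times:  arrival time of each streamed token, in order.
    """
    if not token_times:
        raise ValueError("need at least one token timestamp")

    # Time to First Token: the wait before the first token arrives,
    # which covers the prefill phase.
    ttft = token_times[0] - request_time

    # Inter-Token Latency: the average gap between consecutive tokens,
    # which reflects the per-step cost of the decode phase.
    gaps = [later - earlier for earlier, later in zip(token_times, token_times[1:])]
    mean_itl = mean(gaps) if gaps else 0.0

    return ttft, mean_itl


# Made-up timestamps for illustration only.
ttft, itl = latency_metrics(0.0, [0.5, 0.75, 1.0, 1.25])
print(f"TTFT: {ttft:.2f} s, mean ITL: {itl:.2f} s")
```

Measuring the two values separately is useful because, as the summary notes, they are dominated by different phases: a long prompt mainly raises the first number, while slow per-token generation raises the second.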