This article explains how Large Language Models (LLMs) process prompts, from tokenization to response generation. It covers the transformer architecture, including self-attention and feed-forward networks, and details how the KV cache speeds up inference by storing the keys and values of already-processed tokens so they are not recomputed at every decoding step.
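To make the KV-cache idea concrete, here is a minimal sketch of cached decoding with NumPy. The random vectors stand in for learned query/key/value projections; the point is that each step computes attention only for the newest token while reusing the cached keys and values of all earlier tokens.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attend(q, K, V):
    # Scaled dot-product attention for one query against all cached keys/values.
    scores = q @ K.T / np.sqrt(q.shape[-1])  # shape (1, tokens_so_far)
    return softmax(scores) @ V               # shape (1, d)

d = 8
rng = np.random.default_rng(0)
K_cache = np.empty((0, d))  # one cached key row per processed token
V_cache = np.empty((0, d))  # one cached value row per processed token

for step in range(4):
    # New token's q/k/v projections (random stand-ins for real projections).
    q, k, v = rng.normal(size=(3, 1, d))
    # Append only this token's key/value; earlier rows are reused, not recomputed.
    K_cache = np.vstack([K_cache, k])
    V_cache = np.vstack([V_cache, v])
    out = attend(q, K_cache, V_cache)
    print(f"step {step}: cache holds {len(K_cache)} tokens, output {out.shape}")
```

Without the cache, every decoding step would recompute keys and values for the entire prefix, which is exactly the quadratic-growth cost the article identifies the KV cache as avoiding.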
The article presents ChunkLLM, a lightweight, pluggable framework designed to improve the inference efficiency of large transformer models. It introduces two key components, the QK Adapter and the Chunk Adapter, which support feature compression and chunk attention acquisition while maintaining strong performance on both long- and short-text benchmarks. Experimental results indicate a significant speedup on long-text processing compared to a vanilla transformer.
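As an illustration of the general chunk-attention idea, the sketch below splits the key sequence into fixed-size chunks, scores each chunk against the query via a mean-pooled summary, and attends only within the top-scoring chunks. The pooling-based selection is an illustrative stand-in, not ChunkLLM's actual trained adapters; function names and parameters here are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def chunk_selective_attention(q, K, V, chunk_size=4, top_k=2):
    """Attend only within the top_k chunks whose mean-pooled key summary
    best matches the query. A hypothetical stand-in for learned chunk
    selection such as ChunkLLM's adapters."""
    n, d = K.shape
    n_chunks = n // chunk_size
    # Summarize each chunk by mean-pooling its keys.
    summaries = K[: n_chunks * chunk_size].reshape(n_chunks, chunk_size, d).mean(axis=1)
    # Score chunk summaries against the query; keep the best top_k chunks.
    chunk_scores = (q @ summaries.T).ravel()
    keep = np.sort(np.argsort(chunk_scores)[-top_k:])
    idx = np.concatenate([np.arange(c * chunk_size, (c + 1) * chunk_size) for c in keep])
    # Standard scaled dot-product attention over the selected tokens only.
    scores = q @ K[idx].T / np.sqrt(d)
    return softmax(scores) @ V[idx], keep

rng = np.random.default_rng(1)
K = rng.normal(size=(16, 8))
V = rng.normal(size=(16, 8))
q = rng.normal(size=(1, 8))
out, kept = chunk_selective_attention(q, K, V)
print("attended chunks:", kept, "output shape:", out.shape)
```

The speedup comes from the attention cost scaling with the number of selected tokens (top_k * chunk_size) rather than the full sequence length, which is why the gains grow with context length.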