6 min read | Saved February 14, 2026
Do you care about this?
The article describes an experiment in which a summarizer and a generator were co-trained to learn a compression scheme for text. The model learned to use Mandarin and dense punctuation to shrink the text while preserving its meaning, reaching a compression rate of roughly 90%.
If you do, here's more
The article presents an approach to text compression in language models based on co-training a summarizer and a generator. The goal is a compression scheme that preserves full-context behavior while consuming far fewer context tokens. The summarizer is trained to compress the input text into tokens drawn from the original model's vocabulary, so the generator can condition on the compressed text and still predict the next token effectively. Notably, the model converged on distinctive strategies: aggressive pruning, dense punctuation, and switching to Mandarin for its higher information density per token.
The experimental setup divides a text document into three segments: the initial context, the future tokens to be predicted, and a local window of recent tokens. A teacher model, which sees the full context, guides a generator model that sees only the compressed representation (the gist) and the local window. The results hinge on a reward function that trades off the improvement in prediction accuracy gained from the gist against the gist's length. Because the generator already sees the local window, the gist is pushed to carry only global information rather than redundantly repeating local content.
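The reward described above can be sketched as a simple function: the gain in log-probability of the future tokens when the generator conditions on the gist, minus a penalty proportional to the gist's length. The function and variable names, and the `length_cost` coefficient, are assumptions for illustration, not details from the article.

```python
def gist_reward(logp_with_gist: float,
                logp_local_only: float,
                gist_len: int,
                length_cost: float = 0.01) -> float:
    """Reward = prediction improvement from the gist minus a length penalty.

    logp_with_gist:  total log-prob of the future tokens given gist + local window
    logp_local_only: total log-prob of the future tokens given local window alone
    gist_len:        number of tokens in the generated gist
    length_cost:     assumed penalty per gist token (hypothetical value)
    """
    improvement = logp_with_gist - logp_local_only
    return improvement - length_cost * gist_len

# Example: the gist raises the total log-prob from -120.0 to -95.0
# at a cost of 40 gist tokens (improvement 25.0, penalty 0.4).
print(gist_reward(-95.0, -120.0, gist_len=40))
```

A reward of this shape makes the gist earn its keep: any token that does not improve prediction of the future segment costs more than it contributes, which is consistent with the aggressive pruning the model learned.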
The training process uses reinforcement learning, treating gist creation as a constrained optimization problem. Performance is measured by the log-probability improvement from conditioning on the compressed representation versus the local window alone. Over 1,500 training steps, the KL divergence between the generator's predictions and the teacher's full-context predictions decreased, indicating that the summarizer and generator learned to collaborate on an effective compression scheme.