2 min read | Saved February 14, 2026
Do you care about this?
This article examines how language models alter their internal representations over the course of a conversation. Notably, a statement the model initially represents as factual can come to be represented as non-factual as the discussion progresses, depending on its content. These shifts challenge static interpretations of model behavior and suggest new avenues for research.
If you do, here's more
Language model representations can shift significantly during a conversation. Researchers tracked how these representations evolve, focusing on directions in activation space tied to high-level concepts such as factuality. They found that a statement the model represents as factual at the start of a conversation may be represented as non-factual later on, and that this shift depends on the content discussed. Representations tied to the conversation's context change, while more generic information tends to remain stable. These dynamics appeared across different models and layers, suggesting broad applicability.
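To make the measurement concrete, here is a minimal sketch of the general probing technique: projecting a model's hidden states onto a concept direction at each conversation turn and watching the projection drift. This is not the paper's code; the model, layer, and "factuality direction" below are placeholders (a real direction would come from a trained linear probe).

```python
# Illustrative sketch: track how the projection of a hidden state onto a
# hypothetical "factuality direction" drifts across conversation turns.
# The direction here is a random placeholder standing in for a probe.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "gpt2"  # assumed small model for illustration
tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL, output_hidden_states=True)
model.eval()

LAYER = 6  # assumed mid-network layer to inspect
direction = torch.randn(model.config.hidden_size)
direction /= direction.norm()  # unit vector in place of a learned probe direction

turns = [
    "User: Water boils at 100 degrees Celsius at sea level.",
    "User: Imagine we're writing a sci-fi story set on a waterless planet.",
    "User: In that story, water boils at 100 degrees Celsius at sea level.",
]

transcript = ""
for turn in turns:
    transcript += turn + "\n"
    inputs = tok(transcript, return_tensors="pt")
    with torch.no_grad():
        out = model(**inputs)
    # Hidden state of the final token at the chosen layer.
    h = out.hidden_states[LAYER][0, -1]
    print(f"{turn[:50]:50s} projection = {torch.dot(h, direction):+.3f}")
```

With a real probe direction, a falling projection over turns would correspond to the reported shift from "factual" toward "non-factual" as the conversational context accumulates.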
One key finding is that feeding a model conversation transcripts generated by a different model triggers similar representational changes. In contrast, merely framing the content as a narrative, such as a sci-fi story, produces a much weaker adaptation. The study also shows that steering a model along a specific representational direction can have different effects depending on how far the conversation has progressed, as the sketch below illustrates. This challenges traditional views of interpretability: a static interpretation of a model feature may be misleading if the feature's meaning drifts over a conversation. The evolving nature of representations calls for new research into how models adapt to context, pointing to a complex interplay between conversation dynamics and language understanding.
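For reference, activation steering of the kind described is typically done by adding a scaled concept vector to a layer's residual-stream output. The following is a minimal sketch under assumed settings (model, layer, coefficient, and direction are all placeholders, not the study's), not the authors' method:

```python
# Illustrative sketch: steer a model by adding a scaled "concept direction"
# to one transformer block's output via a forward hook.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "gpt2"  # assumed model for illustration
tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL)
model.eval()

LAYER = 6    # assumed layer to steer
COEFF = 4.0  # assumed steering strength
direction = torch.randn(model.config.hidden_size)
direction /= direction.norm()  # placeholder for a learned concept direction

def steer(module, inputs, output):
    # GPT-2 blocks return a tuple; the first element is the hidden states.
    hidden = output[0] + COEFF * direction
    return (hidden,) + output[1:]

handle = model.transformer.h[LAYER].register_forward_hook(steer)
try:
    prompt = "User: Tell me a fact about the ocean.\nAssistant:"
    ids = tok(prompt, return_tensors="pt")
    out = model.generate(**ids, max_new_tokens=30, do_sample=False)
    print(tok.decode(out[0][ids["input_ids"].shape[1]:]))
finally:
    handle.remove()  # always detach the hook afterwards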