The Anthropic interpretability team shares preliminary research on cross-modal features in language models, particularly the models' ability to recognize and generate visual concepts in text-based formats such as ASCII art and SVG. They demonstrate how specific features activate based on context and how steering those features can alter the generated visuals, yielding insights into the models' internal workings and pointing to future research directions.
interpretability ✓
language-models ✓
cross-modal ✓
+ svg
+ ascii