3 links tagged with all of: language-models + transformers
Links
This article discusses RePo, a module that improves transformer-based language models by assigning semantic positions to tokens rather than relying on raw sequence order, enhancing their ability to manage context. It shows that RePo reduces the model's cognitive load, helping it better handle noisy inputs, structured data, and long contexts, and reports significant performance gains across a range of tasks.
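The core idea in the summary above, decoupling a token's position id from its raw sequence index, can be illustrated with a toy sketch. Everything here is hypothetical (the function name, the noise mask, the collapsing rule are not the paper's API); it only shows what "semantic positions" might look like: runs of filler tokens share their predecessor's position, so content tokens keep compact, contiguous position ids.

```python
# Toy illustration (not RePo's actual method): remap position ids so that
# "noise" tokens do not advance the position counter, keeping the positions
# of content tokens contiguous despite noisy interruptions.

def assign_semantic_positions(tokens, is_noise):
    """Return one position id per token; noise tokens reuse the
    position of the preceding content token."""
    positions = []
    pos = -1
    for tok, noisy in zip(tokens, is_noise):
        if not noisy:
            pos += 1              # content token: advance the counter
        positions.append(max(pos, 0))
    return positions

tokens   = ["The", "uh", "um", "cat", "sat"]
is_noise = [False, True, True, False, False]
print(assign_semantic_positions(tokens, is_noise))  # [0, 0, 0, 1, 2]
```

Under this remapping, "cat" sits at position 1 regardless of how much filler precedes it, which is one concrete way positional structure could be made robust to noisy input.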
This article introduces Mixture-of-Recursions (MoR), a framework that improves the efficiency of language models by combining parameter sharing with adaptive computation. MoR dynamically assigns a recursion depth to each token, reusing a shared block of layers, which reduces compute and memory-access costs while maintaining model performance. It reports improved validation perplexity and few-shot accuracy across a range of model sizes.
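The mechanism described above, one shared block applied a variable number of times per token, can be sketched minimally. This is an assumption-laden toy, not the paper's implementation: the `shared_block` stand-in, the threshold-style router, and the difficulty scores are all illustrative.

```python
# Hedged sketch of the Mixture-of-Recursions idea: a single shared block is
# applied recursively, and a router picks each token's recursion depth, so
# "harder" tokens get more computation with no extra parameters.

def shared_block(h):
    # Stand-in for a shared transformer block: a simple affine map.
    return 0.5 * h + 1.0

def route_depth(score, max_depth=3):
    # Hypothetical router: higher scores map to deeper recursion.
    return 1 + min(int(score * max_depth), max_depth - 1)

def mixture_of_recursions(hidden, difficulty, max_depth=3):
    out = []
    for h, d in zip(hidden, difficulty):
        depth = route_depth(d, max_depth)
        for _ in range(depth):        # adaptive per-token recursion
            h = shared_block(h)       # same parameters reused each step
        out.append(h)
    return out

hidden     = [0.0, 0.0, 0.0]
difficulty = [0.1, 0.5, 0.9]          # routing scores in [0, 1]
print(mixture_of_recursions(hidden, difficulty))  # [1.0, 1.5, 1.75]
```

The design point the sketch captures is that depth, not width or parameter count, becomes the per-token degree of freedom: all tokens share weights, but easy tokens exit after one pass while hard tokens recurse further.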
Researchers discovered that language models fail on long conversations once the initial tokens are evicted from the key-value cache: those tokens act as "attention sinks" that stabilize the attention distribution. Their solution, StreamingLLM, keeps the sink tokens in the cache permanently alongside a sliding window of recent tokens, allowing models to process sequences of over 4 million tokens effectively. This approach has been integrated into major frameworks like HuggingFace Transformers and into OpenAI's latest models.
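The cache policy summarized above reduces to a simple rule: never evict the first few tokens, keep a window of the most recent ones, and drop everything in between. A minimal sketch, with illustrative parameter names (the real systems also handle position re-indexing inside the cache, which is omitted here):

```python
# Toy sketch of the StreamingLLM-style eviction rule: retain num_sinks
# initial "attention sink" entries plus a sliding window of recent entries.

def evict_kv_cache(cache, num_sinks=4, window=8):
    """cache: list of cached per-token entries in sequence order."""
    if len(cache) <= num_sinks + window:
        return cache                          # nothing to evict yet
    return cache[:num_sinks] + cache[-window:]  # sinks + recent window

cache = list(range(20))       # stand-in for 20 cached token entries
print(evict_kv_cache(cache))  # [0, 1, 2, 3, 12, 13, 14, 15, 16, 17, 18, 19]
```

Because the cache size is now bounded by `num_sinks + window` regardless of conversation length, memory stays constant while the sink tokens keep the attention distribution stable.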