2 links tagged with all of: language-models + neural-networks
Links
This article explores how large language models (LLMs) adopt the "Assistant" persona during interactions. It discusses the "Assistant Axis," a direction in the model's internal activations associated with the Assistant persona, and shows how steering along this direction can either stabilize or destabilize the model's responses. The research highlights the challenges of keeping the Assistant's character consistent and the risks of persona drift.
Modern language models that use sliding window attention (SWA) struggle to access information from distant words because of information dilution across attention hops and the influence of residual connections. Although stacked windows theoretically cover a vast context, these practical constraints shrink the effective memory to roughly 1,500 words. The article explores these limits through mathematical modeling, showing how the architecture shapes information flow and retention.
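The gap between theoretical and effective reach can be sketched numerically. In the toy model below, every value (window size, layer count, per-hop signal retention, and the cutoff threshold) is an illustrative assumption, not a figure from the article: information relayed across h attention hops is attenuated geometrically, so the usable range ends long before the theoretical one.

```python
# Toy model of sliding window attention reach.
# All constants are hypothetical, chosen only to illustrate the effect.
window = 512      # tokens visible per layer (assumed)
layers = 32       # transformer depth (assumed)
retention = 0.5   # fraction of signal surviving each attention hop (assumed)
threshold = 0.01  # signal level below which information is treated as lost (assumed)

# Each layer extends reach by one window, so in principle the top layer
# can draw on window * layers tokens of context.
theoretical_reach = window * layers

# Signal from a token h hops away is attenuated by ~ retention**h.
# The effective reach ends at the last hop whose signal clears the threshold.
effective_reach = 0
for hops in range(1, layers + 1):
    if retention ** hops < threshold:
        break
    effective_reach = hops * window

print(f"theoretical reach: {theoretical_reach} tokens")  # 16384
print(f"effective reach:   {effective_reach} tokens")    # 3072
```

Under these assumed constants the theoretical reach is 16,384 tokens but the effective reach collapses to 3,072, the same qualitative gap the article derives with its own model.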