Links
MIT CSAIL researchers built Recursive Language Models, which store full documents outside the model’s context window and let the model query them through code, slicing, and parallel sub-instances. The approach handles inputs of up to 10 million tokens, doubles benchmark performance, and costs about the same as or less than massive-context calls.
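As a rough illustration of the idea summarized above, the sketch below keeps a document in an ordinary variable, slices it by lines, and hands each slice to a parallel worker. It is not the researchers' implementation: `sub_instance` is a hypothetical stand-in for a sub-model call (here it only does a keyword match), and the function names, slice size, and worker count are assumptions chosen for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def slice_lines(document: str, lines_per_slice: int) -> list[str]:
    """Split the stored document into slices of whole lines."""
    lines = document.splitlines()
    return [
        "\n".join(lines[i:i + lines_per_slice])
        for i in range(0, len(lines), lines_per_slice)
    ]


def sub_instance(chunk: str, keyword: str) -> list[str]:
    """Hypothetical stand-in for a sub-model call.

    A real system would send the chunk to a model; this stub just
    returns the lines that mention the keyword.
    """
    return [line for line in chunk.splitlines() if keyword in line]


def query_document(document: str, keyword: str, lines_per_slice: int = 100) -> list[str]:
    """Query a document held outside the 'context' by fanning out over slices."""
    chunks = slice_lines(document, lines_per_slice)
    with ThreadPoolExecutor(max_workers=4) as pool:
        # pool.map preserves slice order, so results stay in document order.
        partials = list(pool.map(lambda c: sub_instance(c, keyword), chunks))
    return [hit for part in partials for hit in part]
```

The caller only ever sees the short list of hits, never the full document, which is the property the summary attributes to the approach.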
This article shows that SKILL.md files aren’t static prompts but loader specifications defining what to load, when, and at what cost. It breaks down the three progressive-disclosure levels, explains how architecture—not instruction content—drives context consumption, and highlights common antipatterns that bloat or break skills.
Anthropic reduced Claude Code’s prompt cache TTL from one hour to five minutes, causing higher token write costs and faster quota depletion for long coding sessions. Developers report frequent cache misses—especially with large context windows—hitting usage limits and degrading performance. Anthropic says it will tweak default context windows but won’t offer a global TTL setting.
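To see why a shorter TTL leads to more cache writes in a long session, here is a minimal sketch that counts cache misses for a list of request times. It assumes a simplified model that is not taken from the article: every request refreshes the cache entry, and a request misses when the gap since the previous request exceeds the TTL. The function name and the sample timestamps are illustrative.

```python
def count_cache_misses(request_minutes: list[float], ttl_minutes: float) -> int:
    """Count requests that must rewrite the cache under a simplified TTL model.

    Assumption: each request (hit or miss) refreshes the entry, and a gap
    longer than the TTL since the previous request is a miss.
    """
    misses = 0
    last_request = None
    for t in request_minutes:
        if last_request is None or t - last_request > ttl_minutes:
            misses += 1  # first request, or the entry expired during the pause
        last_request = t
    return misses


# Illustrative session: bursts of activity separated by pauses (minutes).
session = [0, 3, 10, 12, 50, 52]
misses_5_min = count_cache_misses(session, ttl_minutes=5)
misses_60_min = count_cache_misses(session, ttl_minutes=60)
```

For this sample session, the five-minute TTL yields three misses and the one-hour TTL yields one, since only the pauses longer than five minutes force a rewrite.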
This article breaks down the core concepts behind LLMs—from next-token prediction training to tokens, vectors and attention layers—to show how they generate text. It also covers context windows, parameters and why model scale affects performance.