3 links tagged with all of: language-models + scaling-laws
Links
This article quantitatively analyzes the performance of language-model-based agent systems. Through experiments with various agent architectures, it identifies key scaling laws and coordination strategies, yielding insights into tool coordination, capability saturation, and error amplification. The findings help predict the optimal coordination strategy for a given task.
A new method for estimating the memorization capacity of language models is proposed, distinguishing between unintended memorization and generalization. The study finds that GPT-style models have an estimated capacity of 3.6 bits per parameter, revealing that models memorize data until their capacity is reached, after which generalization begins to take precedence.
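As a rough illustration of what the 3.6 bits-per-parameter estimate implies, total memorization capacity scales linearly with parameter count (the 1B-parameter figure below is a hypothetical example, not from the study):

```python
BITS_PER_PARAM = 3.6  # capacity estimate reported by the summarized study


def memorization_capacity_mb(num_params: float) -> float:
    """Estimated raw memorization capacity in megabytes."""
    total_bits = BITS_PER_PARAM * num_params
    return total_bits / 8 / 1e6  # bits -> bytes -> megabytes


# A hypothetical 1B-parameter model would top out at roughly 450 MB
# of memorized training data before generalization takes over.
print(memorization_capacity_mb(1e9))  # -> 450.0
```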
Mixture-of-Experts (MoE) architectures improve the efficiency of large language models (LLMs) by decoupling total parameter count from per-token computational cost. This study introduces the Efficiency Leverage (EL) metric to quantify the computational advantage of MoE models and establishes a unified scaling law that predicts EL from configuration parameters. It shows that an MoE model with far fewer active parameters can match the performance of a larger dense model at lower computational cost.