A new method for estimating the memorization capacity of language models is proposed, one that formally separates unintended memorization from generalization. The study estimates that GPT-style models store roughly 3.6 bits per parameter, and finds that models memorize training data until this capacity is saturated, after which unintended memorization gives way to generalization.
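As a rough back-of-the-envelope sketch (not code from the study), a linear 3.6 bits-per-parameter capacity law implies that a model's total memorization budget grows linearly with parameter count, and the dataset size at which that budget saturates follows once you assume a per-token information content; the 16 bits/token used below is an illustrative assumption, not a figure from the paper.

```python
# Illustrative sketch of a linear bits-per-parameter capacity law.
# The 3.6 figure comes from the summary above; everything else
# (bits_per_token, the example model sizes) is assumed for illustration.

ALPHA_BITS_PER_PARAM = 3.6  # reported capacity estimate


def memorization_capacity_bits(n_params: float) -> float:
    """Total memorization budget implied by a linear scaling law."""
    return ALPHA_BITS_PER_PARAM * n_params


def saturating_dataset_size_tokens(n_params: float,
                                   bits_per_token: float = 16.0) -> float:
    """Rough dataset size (tokens) at which memorization would saturate,
    assuming each token carries ~bits_per_token bits (an assumption,
    not a figure from the study)."""
    return memorization_capacity_bits(n_params) / bits_per_token


if __name__ == "__main__":
    for n in (125e6, 1.3e9, 7e9):  # hypothetical model sizes
        cap_gb = memorization_capacity_bits(n) / 8 / 1e9
        toks = saturating_dataset_size_tokens(n)
        print(f"{n:>14,.0f} params -> {cap_gb:6.3f} GB capacity, "
              f"~{toks / 1e9:5.2f}B tokens to saturate")
```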
language-models ✓
memorization ✓
generalization ✓
+ scaling-laws
data-analysis ✓