1 link tagged with all of: tokenization + integer-linear-programming
Links
The author frames tokenizer design as an integer linear program, relaxes it to a continuous LP, and uses cutting planes to close the gap between fractional and integral solutions. They automate cut discovery with Codex, apply cycle constraints on overlapping token edges, and report provably optimal tokenizers on small pretokenized datasets.
tokenization
integer-linear-programming
+ cutting-planes
+ byte-pair-encoding
+ optimization
+ tldr-a-byte-sized-daily-tech-newsletter
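
The summary above does not reproduce the linked article's formulation, so the following is only a minimal sketch of the kind of objective such an integer program could optimize: choose k multi-character tokens so that a pretokenized word-count table is segmented into as few tokens as possible. It assumes single characters are always available as fallback tokens, and it uses exhaustive search in place of an LP relaxation with cutting planes, so it is feasible only on toy inputs. The names `min_tokens` and `optimal_vocab` are hypothetical and do not come from the article.

```python
from itertools import combinations


def min_tokens(word, vocab):
    """Fewest pieces needed to segment `word`, using single characters
    (assumed always available) plus the multi-character tokens in `vocab`."""
    n = len(word)
    best = [0] + [n + 1] * n  # best[i] = fewest pieces covering word[:i]
    for end in range(1, n + 1):
        for start in range(end):
            piece = word[start:end]
            if len(piece) == 1 or piece in vocab:
                best[end] = min(best[end], best[start] + 1)
    return best[n]


def optimal_vocab(counts, k):
    """Try every set of k candidate tokens and return (total_cost, tokens)
    for the cheapest one. Brute force stands in for an ILP solver here."""
    # Candidate tokens: every substring of length >= 2 seen in the data.
    candidates = sorted(
        {w[i:j] for w in counts for i in range(len(w)) for j in range(i + 2, len(w) + 1)}
    )

    def cost(vocab):
        return sum(c * min_tokens(w, vocab) for w, c in counts.items())

    return min((cost(set(v)), v) for v in combinations(candidates, k))


# Toy pretokenized corpus: word -> frequency (made-up data for illustration).
toy_counts = {"abab": 3, "abc": 2}
print(optimal_vocab(toy_counts, 1))
print(optimal_vocab(toy_counts, 2))
```

Because every vocabulary of size k is checked, the result is optimal by construction for this objective. The number of candidate sets grows combinatorially with the data, which is the usual reason to reach for an LP relaxation and cutting planes on anything larger.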