3 min read | Saved February 14, 2026
Do you care about this?
This article discusses the challenges and methods of teaching a language model to generate humor. It details the use of specific rubrics to evaluate comedic content and describes data collection from platforms such as Twitter and TikTok. The author shares successes and failures in refining the model's ability to produce funny responses.
If you do, here's more
The article explores the challenge of teaching large language models (LLMs) like Kimi K2 to understand and generate humor. The author discusses a past conversation about the subjective nature of comedy and the absence of a clear reward function for humor. Using Moonshot's Kimi K2, the author applies a rubric-based reinforcement learning (RL) approach to break down humor into specific, measurable components. They identify criteria such as relevance, specificity, and depth of understanding as key elements that can be quantitatively assessed.
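The rubric idea above can be sketched as a weighted sum of per-criterion scores. The criterion names (relevance, specificity, depth of understanding) come from the article; the weights and the scoring interface are illustrative assumptions, since in practice an LLM judge would produce the sub-scores.

```python
# Hedged sketch of a rubric-based reward: each criterion is scored in [0, 1]
# (e.g. by an LLM judge) and combined with illustrative weights.
RUBRIC_WEIGHTS = {
    "relevance": 0.4,                 # does the joke connect to the prompt?
    "specificity": 0.35,              # concrete names/details vs. generic filler
    "depth_of_understanding": 0.25,   # shows real grasp of the subject
}

def rubric_reward(scores: dict) -> float:
    """Combine per-criterion scores (each in [0, 1]) into one scalar reward."""
    return sum(RUBRIC_WEIGHTS[k] * scores.get(k, 0.0) for k in RUBRIC_WEIGHTS)

# A response that is on-topic and somewhat specific, but shallow:
reward = rubric_reward(
    {"relevance": 1.0, "specificity": 0.5, "depth_of_understanding": 0.0}
)
# 0.4 * 1.0 + 0.35 * 0.5 = 0.575
```

Breaking humor into measurable parts like this is what makes RL tractable here: the scalar reward is quantifiable even though "funny" as a whole is not.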
To gather training data, the author drew on several sources, including Twitter, TikTok, Reddit, and humor publications like The Harvard Lampoon. They used a scraper to collect 200,000 tweets and filtered the combined multi-platform pool down to 48,000 examples. For RL training, they created rubrics that evaluate responses on specific attributes, such as the use of particular names or commitment to a premise. The model initially overused laughing emojis, prompting a new rule that penalizes such signals, which were seen as signs of weak humor construction.
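The laughing-emoji rule described above can be sketched as a simple penalty term subtracted from the rubric reward. The specific emoji set and penalty magnitude are assumptions for illustration; the article only says such signals were penalized.

```python
# Hedged sketch: penalize laughing-emoji "laugh track" signals in a response.
# Emoji set and penalty size are illustrative, not from the source.
LAUGH_EMOJIS = {"\U0001F602", "\U0001F923", "\U0001F606"}  # 😂 🤣 😆

def emoji_penalty(response: str, penalty_per_emoji: float = 0.2) -> float:
    """Return a non-negative penalty proportional to laughing-emoji count."""
    count = sum(response.count(e) for e in LAUGH_EMOJIS)
    return penalty_per_emoji * count
```

The intuition is that a joke signaling its own punchline with emojis is compensating for weak construction, so the training signal should push against it rather than reward it.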
The article also details what didn't work in the training process. Attempts to use TikTok comment rankings for training resulted in unhelpful signals, as the top comments often consisted of emojis rather than meaningful content. Similarly, generating "funny" and "unfunny" pairs with GPT-4o-mini led to verbose outputs that missed the essence of humor. The successful strategy involved careful iteration of the rubrics and the right mix of data sources, focusing on comedy bits considered broadly funny and timely. The author provides access to the models and training code for those interested in further exploring this approach.
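The TikTok failure mode above suggests a simple pre-filter: drop comments whose text content is mostly emoji before using rankings as a signal. This is a hypothetical sketch of that filtering problem, not the author's pipeline; the threshold is an illustrative assumption.

```python
# Hedged sketch: top TikTok comments were often pure emoji, so ranking them
# gave no usable text signal. A minimal filter keeps only comments with a
# reasonable proportion of actual letters (threshold is an assumption).
def has_text_signal(comment: str, min_alpha_ratio: float = 0.5) -> bool:
    """Keep a comment only if at least half its characters are letters."""
    if not comment:
        return False
    alpha = sum(c.isalpha() for c in comment)
    return alpha / len(comment) >= min_alpha_ratio

comments = [
    "\U0001F602\U0001F602\U0001F602",            # emoji-only: no signal
    "the way he said it with a straight face",   # actual text: keep
]
kept = [c for c in comments if has_text_signal(c)]
```

Even with such a filter, the article's conclusion stands: the winning recipe was careful rubric iteration and a good mix of broadly funny, timely material, not any single ranking source.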