2 min read | Saved February 14, 2026
Do you care about this?
This article discusses the significance of the Chain Rule of Probability and the Chain Rule of Calculus in machine learning advancements. It explains how these rules help compute complex probabilities in language models by breaking them down into smaller events, like predicting tokens based on previous ones. The author also highlights notable achievements in deep learning and diversity efforts within the AI community.
If you do, here's more
The article highlights the significance of two distinct Chain Rules in machine learning: the Chain Rule of Probability and the Chain Rule of Calculus. The Chain Rule of Probability plays a crucial role in developing Large Language Models (LLMs). It allows the calculation of complex event probabilities by breaking them down into simpler, conditional probabilities. For instance, the probability of a sequence of tokens can be expressed as the product of individual token probabilities given their preceding tokens. This method is fundamental in language modeling, where models typically operate with a vocabulary of around 100,000 tokens.
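The factorization described above can be sketched in a few lines. This is a minimal illustration with made-up probabilities for a hypothetical three-token sequence (the numbers are not from the article); it shows how the joint probability P(t1, t2, t3) decomposes into P(t1) · P(t2 | t1) · P(t3 | t1, t2), and why models typically sum log-probabilities instead of multiplying raw ones.

```python
import math

# Toy conditional probabilities for a hypothetical 3-token sequence.
# (Values are illustrative, not taken from any real model.)
p_t1 = 0.2             # P(t1)
p_t2_given_t1 = 0.1    # P(t2 | t1)
p_t3_given_t1t2 = 0.3  # P(t3 | t1, t2)

# Chain Rule of Probability: the joint is the product of conditionals.
joint = p_t1 * p_t2_given_t1 * p_t3_given_t1t2
print(joint)  # 0.006 (up to floating-point rounding)

# In practice, multiplying many probabilities underflows, so models
# work in log-space and sum log-probabilities instead.
log_joint = (math.log(p_t1)
             + math.log(p_t2_given_t1)
             + math.log(p_t3_given_t1t2))
assert math.isclose(joint, math.exp(log_joint))
```

With a vocabulary of around 100,000 tokens, each conditional above would in reality be one entry of a 100,000-way softmax distribution produced by the model at that position.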
The piece also covers advances in deep learning, notably those showcased at Tesla's AI Day. It highlights a neural network architecture that predicts valid lanes directly from camera images: by combining CNNs and transformers, the approach constructs a graph representing the lanes and their characteristics. The author's reflections on 2021 include personal milestones, such as diversity and inclusion work at DeepMind and mentoring within the Khipu AI community in Latin America. They also mention the Perceiver architecture, which treats multiple data modalities as sequences, aligning with their long-held aspirations in deep learning.