3 links tagged with all of: llms + reinforcement-learning
Links
This article introduces Generative Adversarial Distillation (GAD), a method for training student models using only teacher-generated text. Unlike traditional knowledge distillation, GAD frames training as a two-player game between a generator (the student) and a discriminator, enabling effective learning without access to the teacher's output probabilities. The results show that students trained with GAD can approach the performance of their larger teacher models.
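The two-player setup described above can be sketched in miniature. This is a toy illustration, not the paper's implementation: a fixed 1-D Gaussian stands in for teacher outputs, the student "generator" is a linear map of noise, and the discriminator is logistic regression; all names, models, and hyperparameters here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Stand-in for teacher-generated outputs: samples from N(3, 1).
def teacher_batch(n):
    return rng.normal(3.0, 1.0, n)

# Student generator g(z) = a*z + b, initialized far from the teacher.
a, b = 1.0, 0.0
# Discriminator D(x) = sigmoid(w*x + c); label teacher=1, student=0.
w, c = 0.0, 0.0

lr, n = 0.05, 64
for step in range(2000):
    z = rng.normal(0.0, 1.0, n)
    fake = a * z + b
    real = teacher_batch(n)

    # Discriminator step: ascend log D(real) + log(1 - D(fake)).
    p_real = sigmoid(w * real + c)
    p_fake = sigmoid(w * fake + c)
    gw = np.mean((1 - p_real) * real) + np.mean(-p_fake * fake)
    gc = np.mean(1 - p_real) + np.mean(-p_fake)
    w += lr * gw
    c += lr * gc

    # Generator step: non-saturating loss, ascend log D(fake).
    p_fake = sigmoid(w * fake + c)
    g_fake = (1 - p_fake) * w        # d/dfake of log D(fake)
    a += lr * np.mean(g_fake * z)
    b += lr * np.mean(g_fake)

# E[a*z + b] with z ~ N(0, 1) is b, so b tracks the student's mean.
print(f"student mean ~ {b:.2f} (teacher mean 3.0)")
```

The point of the sketch is the alternating updates: the discriminator learns to separate teacher samples from student samples, and the student is updated only through the discriminator's signal, never through the teacher's probabilities.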
The article discusses the evolution of large language models (LLMs), highlighting the shift in how researchers perceive their capabilities. It emphasizes the role of chain-of-thought (CoT) reasoning in improving LLM outputs and the potential of reinforcement learning to drive further gains. The piece also touches on programmers' changing attitudes toward AI-assisted coding and the ongoing exploration of new model architectures.
Sutton critiques the prevalent approach to LLM development, arguing that today's models are heavily shaped by human biases and lack the "bitter lesson pilled" quality that would let them learn independently from experience. He contrasts LLMs with animal learning, emphasizing intrinsic motivation and continual learning, and suggests that current AI systems are more like engineered "ghosts" than truly intelligent entities. The discussion highlights the need to draw inspiration from animal intelligence to innovate beyond current methods.