Emergent Behaviors
Delta Belief-RL: Revolutionizing Long-Horizon Interaction through Intrinsic Credit Assignment

Let's Dive In... 1. The Challenge of Uncertainty in Multi-Turn Agents: In the current landscape of artificial intelligence, large-scale agents have…
20 Feb 2026 16 min read
SKILLRL: Evolving Agents via Recursive Skill-Augmented Reinforcement Learning

Let's Dive In... 1. The Challenge of Episodic Isolation in LLM Agents: Current Large Language Model (LLM) agents have demonstrated remarkable…
18 Feb 2026 14 min read
Maximum Likelihood Reinforcement Learning (MaxRL)

https://arxiv.org/pdf/2602.02710 Fahim Tajwar, Guanning Zeng, Yueer Zhou, Yuda Song, Daman Arora, Yiding Jiang, Jeff Schneider, Ruslan Salakhutdinov, Haiwen Feng, Andrea Zanette. Carnegie Mellon University, Tsinghua University, Zhejiang University, UC Berkeley, Impossible, Inc. 🚀 Unlocking the Future of Reinforcement Learning with MaxRL! In this post, we explore…
16 Feb 2026 16 min read
Reinforcement Learning via Self-Distillation

https://arxiv.org/pdf/2601.20802 Jonas Hübotter, Frederike Lübeck, Lejs Behric, Anton Baumann, Marco Bagatella, Daniel Marta, Ido Hakimi, Idan Shenfeld, Thomas Kleine Buening, Carlos Guestrin, Andreas Krause. ETH Zurich, Max Planck Institute for Intelligent Systems, MIT, Stanford. 🚀 Unlocking Reinforcement Learning: The Power of Self-Distillation! In this post, we…
14 Feb 2026 18 min read
Solving Catastrophic Forgetting via Self-Distillation Fine-Tuning (SDFT)

Self-Distillation Enables Continual Learning https://arxiv.org/pdf/2601.19897 Idan Shenfeld, MIT; Mehul Damani, MIT; Jonas Hübotter, ETH Zurich; Pulkit Agrawal, MIT In this engaging post, we delve into the innovative research paper "Self-Distillation Enables Continual Learning" by the talented team from MIT, Improbable AI Lab, and
14 Feb 2026 15 min read
Teaching Models to Teach Themselves: Reasoning at the Edge of Learnability (SOAR)

Teaching Models to Teach Themselves: Reasoning at the Edge of Learnability https://arxiv.org/abs/2601.18778 Shobhita Sundaram, MIT; John Quan, Meta FAIR; Ariel Kwiatkowski, MIT; Kartik Ahuja, MIT; Yann Ollivier, Meta FAIR; Julia Kempe, MIT.
13 Feb 2026 17 min read
Learning to Discover at Test Time (TTT-Discover)

Learning to Discover at Test Time https://arxiv.org/pdf/2601.16175 Mert Yuksekgonul, Daniel Koceja, Xinhao Li, Federico Bianchi, Jed McCaleb, Xiaolong Wang, Jan Kautz, Yejin Choi, James Zou, Carlos Guestrin, Yu Sun. Stanford University, NVIDIA, Astera Institute, UC San Diego, Together AI.
25 Jan 2026 21 min read
Reasoning Models Generate Societies of Thought

Reasoning Models Generate Societies of Thought https://arxiv.org/pdf/2601.10825 Junsol Kim, Shiyang Lai, Nino Scherrer, Blaise Agüera y Arcas, and James Evans. Google Paradigms of Intelligence Team, University of Chicago, Santa Fe Institute. 🧠 Unpacking AI Reasoning: Are Models a Debate Club? In this post, we explore the…
24 Jan 2026 20 min read
Dr. Zero: Self-Evolving Search Agents without Training Data

Dr. Zero: Self-Evolving Search Agents Without Training Data https://arxiv.org/pdf/2601.07055 Zhenrui Yue, Kartikeya Upasani, Xianjun Yang, Suyu Ge, Shaoliang Nie, Yuning Mao, Zhe Liu, Dong Wang. Meta Superintelligence Labs, University of Illinois Urbana-Champaign. 🚀 Dive into the Future of Self-Evolving AI Agents! In this post, we explore…
24 Jan 2026 19 min read
STEM: Scaling Transformers with Embedding Modules

In this post, we explore the innovative approach of "STEM: Scaling Transformers with Embedding Modules." This amazing research from Carnegie Mellon University and Meta AI presents a solution to the inefficiencies of traditional Transformer architectures. Learn how STEM leverages embedding modules to enhance model performance while minimizing computational
21 Jan 2026 21 min read
GDPO: Group reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization
Large Language Models

https://arxiv.org/pdf/2601.05242
19 Jan 2026 16 min read
DeepSeek: Conditional Memory via Scalable Lookup: A New Axis of Sparsity for Large Language Models

In this post, we explore the innovative concepts presented in the paper "Conditional Memory via Scalable Lookup: A New Axis of Sparsity for Large Language Models" by leading researchers at DeepSeek. Discover how memory techniques can transform the way large language models (LLMs) access and utilize information, making
17 Jan 2026 14 min read
MemRL: A Framework for Continuous, Non-Parametric Learning in AI Agents

Can AI actually learn from its mistakes the same way we do? 🧠 In this post, we explore MemRL (Memory-Augmented Reinforcement Learning), an interesting AI framework designed to mimic the human brain’s ability to balance long-term knowledge with new, real-world experiences. We dive into the "Stability-Plasticity Dilemma," why
09 Jan 2026 21 min read
The AI Superhighway: How Manifold-Constrained Hyper-Connections (mHC) Prevent Traffic Jams in Large Language Models by DeepSeek
Manifold-Constrained Hyper-Connections

🤖 Taming the AI Titans: The Secret to Scaling Giant Models As AI models get bigger, they often become more unstable during training. In this post, we dive into a breakthrough in AI architecture that solves the "exploding signal" problem, allowing us to build larger, smarter, and more stable
02 Jan 2026 16 min read
An Analysis of "MIT Recursive Language Models" Approach
Recursive Language Models

🧠 STOP AI FROM FORGETTING! The End of "Goldfish Memory". Recursive Language Models https://arxiv.org/pdf/2512.24601v1 Alex L. Zhang, MIT CSAIL, altzhang@mit.edu; Tim Kraska, MIT CSAIL, kraska@mit.edu; Omar Khattab, MIT CSAIL, okhattab@mit.edu. Ever feel like your AI has the memory…
01 Jan 2026 19 min read
The Goldilocks Principle: The Secret to Unlocking True AI Reasoning
Reinforcement Learning

Is AI actually thinking—or just really good at guessing? 🤔 In this post we’re unpacking "The Recipe for AI Reasoning." We head inside the CMU AI Laboratory to see how researchers are moving past the "messy internet" to build a controlled world for synthetic reasoning.
15 Dec 2025 16 min read
DeepSeek-V3.2: A Technical Report on Architectural Efficiency and Agentic Reasoning
Open Source Models

Is the gap between open-source and closed-source AI finally closed? 🚀 In this post, we dive deep into DeepSeek-V3.2, the open-source challenger that is officially taking on the giants. For years, proprietary models held a massive lead, but DeepSeek has engineered a way to achieve parity with models like GPT-5-high
08 Dec 2025 21 min read
TiDAR: Think in Diffusion, Talk in Autoregression
Autoregressive Generation

🖥️ NVIDIA Research: How TiDAR Achieves 5.9x Speedup in LLMs This post explores TiDAR, a new architecture from researchers at NVIDIA that solves one of the biggest bottlenecks in modern AI: speed. By combining the "thinking" power of diffusion models with the "talking" precision of autoregressive
24 Nov 2025 21 min read
Emergent Behaviors © 2026