Emergent Behaviors
Delta Belief-RL: Revolutionizing Long-Horizon Interaction through Intrinsic Credit Assignment

Let's Dive In... 1. The Challenge of Uncertainty in Multi-Turn Agents: In the current landscape of artificial intelligence, large-scale agents have…
20 Feb 2026 16 min read
SKILLRL: Evolving Agents via Recursive Skill-Augmented Reinforcement Learning

Let's Dive In... 1. The Challenge of Episodic Isolation in LLM Agents: Current Large Language Model (LLM) agents have demonstrated remarkable…
18 Feb 2026 14 min read
Maximum Likelihood Reinforcement Learning (MaxRL)

https://arxiv.org/pdf/2602.02710 Fahim Tajwar, Guanning Zeng, Yueer Zhou, Yuda Song, Daman Arora, Yiding Jiang, Jeff Schneider, Ruslan Salakhutdinov, Haiwen Feng, Andrea Zanette. Carnegie Mellon University, Tsinghua University, Zhejiang University, UC Berkeley, Impossible, Inc. 🚀 Unlocking the Future of Reinforcement Learning with MaxRL! In this post, we explore…
16 Feb 2026 16 min read
Reinforcement Learning via Self-Distillation

https://arxiv.org/pdf/2601.20802 Jonas Hübotter, Frederike Lübeck, Lejs Behric, Anton Baumann, Marco Bagatella, Daniel Marta, Ido Hakimi, Idan Shenfeld, Thomas Kleine Buening, Carlos Guestrin, Andreas Krause. ETH Zurich, Max Planck Institute for Intelligent Systems, MIT, Stanford. 🚀 Unlocking Reinforcement Learning: The Power of Self-Distillation! In this post, we…
14 Feb 2026 18 min read
Solving Catastrophic Forgetting via Self-Distillation Fine-Tuning (SDFT)

Self-Distillation Enables Continual Learning https://arxiv.org/pdf/2601.19897 Idan Shenfeld, MIT; Mehul Damani, MIT; Jonas Hübotter, ETH Zurich; Pulkit Agrawal, MIT In this engaging post, we delve into the innovative research paper "Self-Distillation Enables Continual Learning" by the talented team from MIT, Improbable AI Lab, and
14 Feb 2026 15 min read
Teaching Models to Teach Themselves: Reasoning at the Edge of Learnability (SOAR)

Teaching Models to Teach Themselves: Reasoning at the Edge of Learnability https://arxiv.org/abs/2601.18778 Shobhita Sundaram, MIT; John Quan, Meta FAIR; Ariel Kwiatkowski, MIT; Kartik Ahuja, MIT; Yann Ollivier, Meta FAIR; Julia Kempe, MIT.
13 Feb 2026 17 min read
Learning to Discover at Test Time (TTT-Discover)

Learning to Discover at Test Time https://arxiv.org/pdf/2601.16175 Mert Yuksekgonul, Daniel Koceja, Xinhao Li, Federico Bianchi, Jed McCaleb, Xiaolong Wang, Jan Kautz, Yejin Choi, James Zou, Carlos Guestrin, Yu Sun. Stanford University, NVIDIA, Astera Institute, UC San Diego, Together AI.
25 Jan 2026 21 min read
Reasoning Models Generate Societies of Thought

Reasoning Models Generate Societies of Thought https://arxiv.org/pdf/2601.10825 Junsol Kim, Shiyang Lai, Nino Scherrer, Blaise Agüera y Arcas, and James Evans. Google Paradigms of Intelligence Team, University of Chicago, Santa Fe Institute. 🧠 Unpacking AI Reasoning: Are Models a Debate Club? In this post, we explore the…
24 Jan 2026 20 min read
Dr. Zero: Self-Evolving Search Agents without Training Data

Dr. Zero: Self-Evolving Search Agents Without Training Data https://arxiv.org/pdf/2601.07055 Zhenrui Yue, Kartikeya Upasani, Xianjun Yang, Suyu Ge, Shaoliang Nie, Yuning Mao, Zhe Liu, Dong Wang. Meta Superintelligence Labs, University of Illinois Urbana-Champaign. 🚀 Dive into the Future of Self-Evolving AI Agents! In this post, we explore…
24 Jan 2026 19 min read
STEM: Scaling Transformers with Embedding Modules

In this post, we explore the innovative approach of "STEM: Scaling Transformers with Embedding Modules." This amazing research from Carnegie Mellon University and Meta AI presents a solution to the inefficiencies of traditional Transformer architectures. Learn how STEM leverages embedding modules to enhance model performance while minimizing computational
21 Jan 2026 21 min read
GDPO: Group reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization
Large Language Models

https://arxiv.org/pdf/2601.05242
19 Jan 2026 16 min read
DeepSeek: Conditional Memory via Scalable Lookup: A New Axis of Sparsity for Large Language Models

In this post, we explore the innovative concepts presented in the paper "Conditional Memory via Scalable Lookup: A New Axis of Sparsity for Large Language Models" by leading researchers at DeepSeek. Discover how memory techniques can transform the way large language models (LLMs) access and utilize information, making
17 Jan 2026 14 min read
MemRL: A Framework for Continuous, Non-Parametric Learning in AI Agents

Can AI actually learn from its mistakes the same way we do? 🧠 In this post, we explore MemRL (Memory-Augmented Reinforcement Learning), an interesting AI framework designed to mimic the human brain’s ability to balance long-term knowledge with new, real-world experiences. We dive into the "Stability-Plasticity Dilemma," why
09 Jan 2026 21 min read
The AI Superhighway: How Manifold-Constrained Hyper-Connections (mHC) Prevent Traffic Jams in Large Language Models by DeepSeek
Manifold-Constrained Hyper-Connections

🤖 Taming the AI Titans: The Secret to Scaling Giant Models As AI models get bigger, they often become more unstable during training. In this post, we dive into a breakthrough in AI architecture that solves the "exploding signal" problem, allowing us to build larger, smarter, and more stable
02 Jan 2026 16 min read
An Analysis of "MIT Recursive Language Models" Approach
Recursive Language Models

🧠 STOP AI FROM FORGETTING! The End of "Goldfish Memory". Recursive Language Models https://arxiv.org/pdf/2512.24601v1 Alex L. Zhang, MIT CSAIL, altzhang@mit.edu; Tim Kraska, MIT CSAIL, kraska@mit.edu; Omar Khattab, MIT CSAIL, okhattab@mit.edu. Ever feel like your AI has the memory…
01 Jan 2026 19 min read
The Goldilocks Principle: The Secret to Unlocking True AI Reasoning
Reinforcement Learning

Is AI actually thinking—or just really good at guessing? 🤔 In this post we’re unpacking "The Recipe for AI Reasoning." We head inside the CMU AI Laboratory to see how researchers are moving past the "messy internet" to build a controlled world for synthetic reasoning.
15 Dec 2025 16 min read
DeepSeek-V3.2: A Technical Report on Architectural Efficiency and Agentic Reasoning
Open Source Models

Is the gap between open-source and closed-source AI finally closed? 🚀 In this post, we dive deep into DeepSeek-V3.2, the open-source challenger that is officially taking on the giants. For years, proprietary models held a massive lead, but DeepSeek has engineered a way to achieve parity with models like GPT-5-high
08 Dec 2025 21 min read
TiDAR: Think in Diffusion, Talk in Autoregression
Autoregressive Generation

🖥️ NVIDIA Research: How TiDAR Achieves 5.9x Speedup in LLMs This post explores TiDAR, a new architecture from researchers at NVIDIA that solves one of the biggest bottlenecks in modern AI: speed. By combining the "thinking" power of diffusion models with the "talking" precision of autoregressive
24 Nov 2025 21 min read
Emergent Behaviors © 2026