The Landscape of Agentic Reinforcement Learning for LLMs

6 days ago

Agentic reinforcement learning (Agentic RL) marks a shift from conventional reinforcement learning for LLMs: it treats LLMs as autonomous decision-making agents.
The survey contrasts the single-step Markov Decision Processes (MDPs) of conventional LLM-RL with the temporally extended, partially observable MDPs (POMDPs) that characterize Agentic RL.
A twofold taxonomy is proposed: one based on core agentic capabilities (planning, tool use, memory, reasoning, self-improvement, perception) and another on applications across task domains.
Reinforcement learning is highlighted as the key mechanism for transforming static capabilities into adaptive, robust agentic behavior.
The survey consolidates open-source environments, benchmarks, and frameworks to support future research.
Over five hundred recent works are synthesized to outline the field's rapid evolution and identify opportunities and challenges for scalable, general-purpose AI agents.
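
The contrast between a single-step MDP and a temporally extended POMDP can be illustrated with a small Python sketch. It is not taken from the survey: the toy environment `HiddenNumberEnv`, the helpers `single_step_episode` and `multi_step_episode`, and the policy `bisect_policy` are all hypothetical names chosen for illustration. In the single-step case the policy sees a prompt, emits one response, and receives one reward. In the agentic case the policy never sees the hidden state directly and must act over several steps using only the observations it has gathered.

```python
from dataclasses import dataclass


def single_step_episode(policy, prompt, reward_fn):
    """Single-step view: one state (the prompt), one action (the response), one reward."""
    response = policy(prompt)
    return reward_fn(prompt, response)


@dataclass
class HiddenNumberEnv:
    """Hypothetical toy POMDP: the target is hidden; only feedback strings are observed."""
    target: int
    max_steps: int = 8
    steps: int = 0

    def reset(self):
        self.steps = 0

    def step(self, guess):
        """Return (observation, reward, done) for one guess."""
        self.steps += 1
        if guess == self.target:
            return "correct", 1.0, True
        observation = "higher" if guess < self.target else "lower"
        return observation, 0.0, self.steps >= self.max_steps


def multi_step_episode(env, agent_policy):
    """Agentic view: the policy acts repeatedly, conditioning on its own history."""
    env.reset()
    history = []  # list of (action, observation) pairs seen so far
    total_reward = 0.0
    done = False
    while not done:
        action = agent_policy(history)
        observation, reward, done = env.step(action)
        history.append((action, observation))
        total_reward += reward
    return total_reward, len(history)


def bisect_policy(history, lo=0, hi=100):
    """Infer bounds on the hidden target from past feedback, then guess the midpoint."""
    for guess, feedback in history:
        if feedback == "higher":
            lo = guess + 1
        elif feedback == "lower":
            hi = guess - 1
    return (lo + hi) // 2


if __name__ == "__main__":
    one_shot = single_step_episode(
        policy=lambda prompt: "4",
        prompt="2+2=",
        reward_fn=lambda prompt, response: 1.0 if response == "4" else 0.0,
    )
    print("single-step reward:", one_shot)
    print("multi-step (reward, steps):", multi_step_episode(HiddenNumberEnv(37), bisect_policy))
```

In this sketch the reward in the multi-step episode arrives only at the end, and the policy has to reconstruct what it knows from its interaction history. That is the structural difference the POMDP framing is meant to capture; real agentic settings replace the guessing game with tool calls, memory, and environment feedback.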
