Memory as Action: Autonomous Context Curation for Long-Horizon Agentic Tasks
- #Reinforcement Learning
- #Artificial Intelligence
- #Memory Management
- Long-context Large Language Models need effective working memory management to avoid attention dilution during long-horizon tasks.
- Existing approaches for memory management lack awareness of the agent's reasoning state, leading to suboptimal decisions.
- Memory-as-Action (MemAct) treats working memory management as a set of learnable policy actions, expressed as in-place context editing operations such as deletion and insertion.
- MemAct enables joint optimization of information retention and task performance through end-to-end reinforcement learning.
- The authors introduce Dynamic Context Policy Optimization to address the computational challenges of training with an edited context, keeping training efficient without compromising reasoning integrity.
- Experiments show that MemAct-RL-14B matches the accuracy of models 16 times larger while reducing average context length by 51%, and that its learned memory strategies adapt and generalize across tasks.
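
To make the idea of in-place memory editing more concrete, here is a minimal toy sketch of deletion and insertion applied to a working memory. It is not the paper's implementation: the `WorkingMemory` class, the `apply_actions` helper, the action tuple format, and the whitespace-based token count are all hypothetical names and simplifications chosen for illustration. In MemAct the edit actions would be chosen by the learned policy; here they are hard-coded.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class WorkingMemory:
    """Toy working memory: an ordered list of context entries (hypothetical)."""
    entries: List[str] = field(default_factory=list)

    def delete(self, index: int) -> None:
        # Remove one entry in place.
        del self.entries[index]

    def insert(self, index: int, text: str) -> None:
        # Insert a new entry (for example, a summary) at the given position.
        self.entries.insert(index, text)

    def token_count(self) -> int:
        # Crude stand-in for context length: whitespace-separated words.
        return sum(len(entry.split()) for entry in self.entries)


# An action is (op, index, text); text is ignored for "delete".
# This format is an assumption made for the sketch.
Action = Tuple[str, int, str]


def apply_actions(memory: WorkingMemory, actions: List[Action]) -> None:
    """Apply a sequence of memory-edit actions in order."""
    for op, index, text in actions:
        if op == "delete":
            memory.delete(index)
        elif op == "insert":
            memory.insert(index, text)
        else:
            raise ValueError(f"unknown memory op: {op}")


memory = WorkingMemory([
    "task: find the release year of the library",
    "search result page one with many irrelevant links and ads",
    "search result page two says the library was released in 2015",
])
before = memory.token_count()

# Hard-coded stand-in for a policy decision: drop both raw search
# results and keep a short note with the one fact that matters.
apply_actions(memory, [
    ("delete", 2, ""),
    ("delete", 1, ""),
    ("insert", 1, "note: library released in 2015"),
])
after = memory.token_count()

print(f"context length: {before} -> {after} words")
```

Deleting from the highest index first keeps the remaining indices valid while editing. The task description survives untouched, which mirrors the point that curation decisions should depend on what the agent still needs for its reasoning.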