Memory as Action: Autonomous Context Curation for Long-Horizon Agentic Tasks


Long-context Large Language Models need effective working memory management to avoid attention dilution during long-horizon tasks.
Existing approaches for memory management lack awareness of the agent's reasoning state, leading to suboptimal decisions.
Memory-as-Action (MemAct) treats working memory management as learnable policy actions using in-place editing operations like deletion and insertion.
MemAct enables joint optimization of information retention and task performance through end-to-end reinforcement learning.
Because in-place edits change the context partway through a trajectory, training becomes computationally harder; Dynamic Context Policy Optimization is introduced to keep training efficient without compromising reasoning integrity.
Experiments show that MemAct-RL-14B matches the accuracy of models 16 times larger while reducing average context length by 51%, and that its learned memory strategies adapt and generalize across tasks.
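
To make the idea of memory edits as actions more concrete, here is a minimal Python sketch of a working memory that supports in-place deletion and insertion. It is an illustration only, not the MemAct implementation: the `WorkingMemory` class, the `apply_memory_action` function, the dictionary action format, and the word-count length proxy are all hypothetical names and assumptions chosen for this example.

```python
from dataclasses import dataclass, field


@dataclass
class WorkingMemory:
    """Hypothetical working memory: an ordered list of context segments."""
    segments: list[str] = field(default_factory=list)

    def delete(self, indices: list[int]) -> None:
        # Drop the segments at the given positions, keeping the rest in order.
        drop = set(indices)
        self.segments = [s for i, s in enumerate(self.segments) if i not in drop]

    def insert(self, index: int, text: str) -> None:
        # Place a new segment (for example, a short summary) at a position.
        self.segments.insert(index, text)

    def length(self) -> int:
        # Crude stand-in for token count: whitespace-separated words.
        return sum(len(s.split()) for s in self.segments)


def apply_memory_action(memory: WorkingMemory, action: dict) -> None:
    """Apply one memory-editing action emitted by a policy (assumed format)."""
    op = action.get("op")
    if op == "delete":
        memory.delete(action["indices"])
    elif op == "insert":
        memory.insert(action["index"], action["text"])
    else:
        raise ValueError(f"unknown memory operation: {op!r}")


# Example: replace two verbose tool outputs with one short summary.
memory = WorkingMemory([
    "task: find the release year of the library",
    "tool output: a long page of search results with many irrelevant links",
    "tool output: a long changelog listing every minor version in detail",
])
before = memory.length()
apply_memory_action(memory, {"op": "delete", "indices": [1, 2]})
apply_memory_action(memory, {"op": "insert", "index": 1,
                             "text": "summary: changelog found, first entry is the release"})
after = memory.length()
print(before, after)
```

In this sketch the policy's output is just a dictionary; in a learned setting the same edits would be chosen by the model alongside its task actions, so that what to keep and what to discard is decided with knowledge of the current reasoning state.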
