Continual Harness: Online Adaptation for Self-Improving Foundation Agents

Continual Harness is a new system that lets embodied AI agents self-improve online without human intervention, starting from a minimal environment interface.
Tested on Pokémon games, the system reduced button-press costs and recovered most of the gap to expert-crafted solutions, despite starting with no prior knowledge or tools.
An online process-reward co-learning loop allows open-source agents to learn from a frontier teacher's relabeled rollouts, driving progress without resetting the environment.
