Following the Text Gradient at Scale
- #Feedback Optimization
- #Machine Learning
- #Reinforcement Learning
- Reinforcement learning (RL) algorithms currently discard rich feedback by compressing it into a single scalar reward, a 'scalar bottleneck' that hinders learning efficiency.
- Rich feedback, such as natural-language critiques or detailed evaluations, provides actionable information that enables targeted improvements, reducing the need for extensive trial-and-error.
- An emerging paradigm avoids this bottleneck by using structured feedback directly, exemplified by Feedback Descent, which outperforms specialized RL methods in domains like molecular design and prompt optimization.
- Feedback Descent pairs two components: evaluators that produce detailed textual feedback and editors (LLMs) that revise artifacts based on the accumulated feedback, forming an iterative improvement loop (see the sketch after this list).
- In tests across domains (molecular design, SVG optimization, prompt optimization), Feedback Descent matched or exceeded state-of-the-art methods, demonstrating sample efficiency and cross-domain applicability without task-specific modifications.
- Text-based optimization suggests learning can occur in 'semantic space' by accumulating textual artifacts rather than updating weights, which benefits continual learning by sidestepping pitfalls like catastrophic forgetting.
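
As a rough illustration, here is a minimal Python sketch of the evaluator/editor loop described above. It is an assumption-laden reconstruction, not the paper's actual API: the names (`feedback_descent`, `Feedback`, `evaluate`, `edit`) are hypothetical, and in the real method `edit` would be an LLM call conditioned on the accumulated critiques.

```python
# Minimal sketch of a Feedback Descent-style loop. All names here are
# illustrative assumptions, not the paper's API; evaluate/edit stand in
# for the evaluator and the LLM editor.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Feedback:
    score: float   # scalar kept only for selection/stopping
    critique: str  # the rich textual signal the editor actually uses

def feedback_descent(
    artifact: str,
    evaluate: Callable[[str], Feedback],
    edit: Callable[[str, list[Feedback]], str],
    steps: int = 10,
) -> str:
    """Iteratively revise `artifact` using accumulated textual feedback."""
    history: list[Feedback] = []
    best, best_score = artifact, float("-inf")
    for _ in range(steps):
        fb = evaluate(artifact)
        history.append(fb)
        if fb.score > best_score:
            best, best_score = artifact, fb.score
        # The editor sees the full critique history, not just a scalar,
        # so each revision can target the specific flaws named so far.
        artifact = edit(artifact, history)
    return best

# Toy usage: grow a string toward a target length. In practice, `edit`
# would prompt an LLM with the artifact plus the critiques.
if __name__ == "__main__":
    target = 20
    ev = lambda a: Feedback(-abs(len(a) - target),
                            f"length {len(a)}, want {target}")
    ed = lambda a, h: a + "x" if len(a) < target else a[:-1]
    print(feedback_descent("seed", ev, ed, steps=30))
```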