Following the Text Gradient at Scale
- #Feedback Optimization
- #Machine Learning
- #Reinforcement Learning
- Reinforcement learning (RL) algorithms currently discard rich feedback by compressing it into a single scalar reward, a 'scalar bottleneck' that hinders learning efficiency.
- Rich feedback, such as natural-language critiques or detailed evaluations, provides actionable information that enables targeted improvements, reducing the need for extensive trial-and-error.
- An emerging paradigm avoids this bottleneck by using structured feedback directly, exemplified by Feedback Descent, which outperforms specialized RL methods in domains like molecular design and prompt optimization.
- Feedback Descent pairs two components: evaluators that produce detailed textual feedback and editors (LLMs) that revise artifacts based on the accumulated feedback, forming an iterative improvement loop (see the sketch after this list).
- In tests across domains (molecular design, SVG optimization, prompt optimization), Feedback Descent matched or exceeded state-of-the-art methods, demonstrating sample efficiency and cross-domain applicability without task-specific modifications.
- Text-based optimization suggests learning can occur in 'semantic space' by accumulating textual artifacts rather than updating weights, which benefits continual learning by sidestepping pitfalls like catastrophic forgetting.
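
As a rough illustration, here is a minimal Python sketch of the evaluator/editor loop described above. It is an assumption-laden reconstruction, not the paper's actual API: the names (`feedback_descent`, `Feedback`, `evaluate`, `edit`) are hypothetical, and in the real method `edit` would be an LLM call conditioned on the accumulated critiques.

```python
# Minimal sketch of a Feedback Descent-style loop. All names here are
# illustrative assumptions, not the paper's API; evaluate/edit stand in
# for the evaluator and the LLM editor.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Feedback:
    score: float   # scalar kept only for selection/stopping
    critique: str  # the rich textual signal the editor actually uses

def feedback_descent(
    artifact: str,
    evaluate: Callable[[str], Feedback],
    edit: Callable[[str, list[Feedback]], str],
    steps: int = 10,
) -> str:
    """Iteratively revise `artifact` using accumulated textual feedback."""
    history: list[Feedback] = []
    best, best_score = artifact, float("-inf")
    for _ in range(steps):
        fb = evaluate(artifact)
        history.append(fb)
        if fb.score > best_score:
            best, best_score = artifact, fb.score
        # The editor sees the full critique history, not just a scalar,
        # so each revision can target the specific flaws named so far.
        artifact = edit(artifact, history)
    return best

# Toy usage: grow a string toward a target length. In practice, `edit`
# would prompt an LLM with the artifact plus the critiques.
if __name__ == "__main__":
    target = 20
    ev = lambda a: Feedback(-abs(len(a) - target),
                            f"length {len(a)}, want {target}")
    ed = lambda a, h: a + "x" if len(a) < target else a[:-1]
    print(feedback_descent("seed", ev, ed, steps=30))
```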