Hasty Briefs (beta)

Following the Text Gradient at Scale

2 days ago
  • #Feedback Optimization
  • #Machine Learning
  • #Reinforcement Learning
  • Current reinforcement learning (RL) algorithms discard rich feedback by compressing it into a single scalar reward, a 'scalar bottleneck' that hinders learning efficiency.
  • Rich feedback, such as natural-language critiques or detailed evaluations, provides actionable information that enables targeted improvements, reducing the need for extensive trial-and-error.
  • An emerging learning paradigm avoids this bottleneck by using structured feedback directly, exemplified by methods like Feedback Descent, which outperforms specialized RL approaches in domains such as molecular design and prompt optimization.
  • Feedback Descent utilizes two components: evaluators that provide detailed textual feedback and editors (LLMs) that revise artifacts based on accumulated feedback, forming an iterative loop for improvement.
  • In tests across domains (molecular design, SVG optimization, prompt optimization), Feedback Descent matched or exceeded state-of-the-art methods, demonstrating sample efficiency and cross-domain applicability without task-specific modifications.
  • Text-based optimization suggests learning can occur in 'semantic space' through accumulating textual artifacts, offering advantages in continual learning by avoiding pitfalls like catastrophic forgetting associated with weight updates.
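The evaluator/editor loop described above can be sketched in a few lines. This is a hypothetical toy illustration, not the paper's implementation: the evaluator here scores a string against a hard-coded target and emits a textual critique naming the wrong positions, and the `edit` function stands in for the LLM editor by parsing the latest critique. All names (`Feedback`, `feedback_descent`, `evaluate`, `edit`, `TARGET`) are invented for this sketch.

```python
import re
from dataclasses import dataclass


@dataclass
class Feedback:
    score: float    # the scalar an RL method would keep
    critique: str   # the structured textual feedback Feedback Descent also uses


def feedback_descent(artifact, evaluate, edit, steps=10):
    """Iterate evaluator -> editor, accumulating textual critiques.

    Sketch of the loop described above: the editor conditions on the
    full feedback history rather than a single scalar reward.
    """
    history = []
    best, best_score = artifact, evaluate(artifact).score
    for _ in range(steps):
        fb = evaluate(artifact)
        if fb.score > best_score:
            best, best_score = artifact, fb.score
        history.append((artifact, fb.critique))
        artifact = edit(artifact, history)  # editor sees all accumulated critiques
    final = evaluate(artifact)
    if final.score > best_score:
        best, best_score = artifact, final.score
    return best, best_score


# --- Toy domain: revise a string toward a fixed target ---
TARGET = "hello"


def evaluate(s):
    wrong = [i for i, (a, b) in enumerate(zip(s, TARGET)) if a != b]
    score = 1.0 - len(wrong) / len(TARGET)
    critique = f"positions {wrong} are wrong" if wrong else "perfect"
    return Feedback(score=score, critique=critique)


def edit(s, history):
    # Stand-in for the LLM editor: act on the latest critique by
    # fixing the first position it names.
    _, critique = history[-1]
    if critique == "perfect":
        return s
    i = int(re.findall(r"\d+", critique)[0])
    return s[:i] + TARGET[i] + s[i + 1:]
```

Starting from `"axllo"`, each iteration repairs one position named in the critique, so the loop converges in a handful of steps where a scalar reward alone would give the editor no direction. A real evaluator would be a domain scorer (e.g., a docking score with a natural-language rationale) and the editor an LLM prompted with the accumulated feedback.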