Hasty Briefs (beta)

Learning from failure to tackle hard problems

23 days ago
  • #generative models
  • #machine learning
  • #negative rewards
  • The blog post introduces BaNEL (Bayesian Negative Evidence Learning), an algorithm designed to post-train generative models using only negative rewards.
  • BaNEL addresses two key challenges in machine learning: sparsity of positive rewards and costly reward evaluations.
  • The algorithm learns from failures by modeling the underlying structure of negative samples, avoiding past mistakes without requiring positive examples.
  • BaNEL fits a separate generative model to past failures and uses it to approximate a rejection region, filtering out new samples that resemble those failures.
  • In experiments, BaNEL significantly improves success rates on tasks such as adversarial attacks on toy language models and reasoning problems.
  • The method trades compute for reward efficiency: it spends extra offline computation to reduce the number of reward evaluations, so it works best when such compute is available.
  • BaNEL provides qualitative insights into failure modes, guiding human intuition in solving hard problems.
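
The filtering idea in the bullets above (fit a separate model to failures, then reject samples that fall in its region) can be illustrated with a toy sketch. This is not the BaNEL implementation from the post. It assumes a uniform base model over ten symbols, a smoothed categorical "failure model", and a likelihood-ratio threshold `tau` as the rejection rule. All function names, the data, and the threshold are hypothetical choices made for illustration.

```python
import random
from collections import Counter

def fit_failure_model(failures, vocab, alpha=0.5):
    """Fit a smoothed categorical distribution to the failed samples."""
    counts = Counter(failures)
    total = len(failures) + alpha * len(vocab)
    return {x: (counts[x] + alpha) / total for x in vocab}

def in_rejection_region(x, failure_model, base_model, tau=1.0):
    """Assumed rule: reject x when the failure model rates it more likely
    than the base model does (likelihood ratio above tau)."""
    return failure_model[x] / base_model[x] > tau

def sample_avoiding_failures(base_model, failure_model, rng, tau=1.0, max_tries=1000):
    """Rejection-sample from the base model, skipping the rejection region."""
    vocab = list(base_model)
    weights = [base_model[x] for x in vocab]
    for _ in range(max_tries):
        x = rng.choices(vocab, weights=weights, k=1)[0]
        if not in_rejection_region(x, failure_model, base_model, tau):
            return x
    raise RuntimeError("no sample found outside the rejection region")

# Hypothetical toy setup: a uniform base model over the symbols 0..9.
vocab = list(range(10))
base_model = {x: 1 / len(vocab) for x in vocab}

# Every past attempt failed, and all failures fall on the symbols 0..3.
failures = [0, 1, 2, 3, 0, 1, 2, 3, 1, 2, 0, 3]
failure_model = fit_failure_model(failures, vocab)

rng = random.Random(0)
filtered = [sample_avoiding_failures(base_model, failure_model, rng) for _ in range(200)]
print(sorted(set(filtered)))  # only symbols outside the rejection region appear
```

No positive example is used anywhere in this sketch: the filter is built from failures alone, and the only cost paid is the extra sampling needed to skip the rejected symbols, which mirrors the compute-for-reward trade-off described above.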