Learning from failure to tackle hard problems
23 days ago
- #generative models
- #machine learning
- #negative rewards
- The blog post introduces BaNEL (Bayesian Negative Evidence Learning), an algorithm designed to post-train generative models using only negative rewards.
- BaNEL targets two challenges that arise when post-training on hard problems: positive rewards are extremely sparse, and each reward evaluation is costly.
- The algorithm learns from failures by modeling the underlying structure of negative samples, avoiding past mistakes without requiring positive examples.
- BaNEL trains a separate generative model on past failures to approximate a rejection region, then filters out new samples that resemble those failures.
- Experiments show BaNEL significantly improves success rates on tasks such as adversarial attacks against toy language models and reasoning problems.
- The method trades extra compute for fewer reward evaluations, so it performs best when additional offline computation is available.
- BaNEL provides qualitative insights into failure modes, guiding human intuition in solving hard problems.
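The filtering idea in the summary can be illustrated with a toy sketch. It is not the authors' implementation, and every concrete choice in it is an assumption made for illustration:

- The base generator samples fixed-length bit strings uniformly.
- The hypothetical "negative model" is a per-bit Bernoulli fit to failures with add-one smoothing.
- A sample falls in the rejection region when its log-likelihood ratio against the base model exceeds a threshold `log_tau`.
- The helper names `sample_base`, `NegativeModel`, `in_rejection_region` and `search` are invented here.

The sketch shows only the loop structure. Cheap redraws (offline compute) are spent so that each expensive reward call, which returns only success or failure, lands away from regions that resemble past failures.

```python
import math
import random


def sample_base(rng, n_bits):
    # Assumed base generator: each bit is an independent fair coin.
    return tuple(rng.randint(0, 1) for _ in range(n_bits))


def base_log_prob(x):
    # Log-likelihood of x under the uniform base generator.
    return len(x) * math.log(0.5)


class NegativeModel:
    """Hypothetical failure model: one Bernoulli per bit, add-one smoothed.

    Because it models per-bit structure rather than memorizing strings, it also
    assigns high likelihood to unseen strings that share the failures' pattern.
    """

    def __init__(self, n_bits):
        self.ones = [0] * n_bits
        self.count = 0

    def update(self, x):
        # Fit only on a failed sample (negative evidence).
        self.count += 1
        for i, bit in enumerate(x):
            self.ones[i] += bit

    def log_prob(self, x):
        total = 0.0
        for i, bit in enumerate(x):
            p_one = (self.ones[i] + 1) / (self.count + 2)
            total += math.log(p_one if bit else 1.0 - p_one)
        return total


def in_rejection_region(x, neg, log_tau):
    # Assumed criterion: reject x when the failure model explains it much
    # better than the base generator does.
    if neg.count == 0:
        return False  # no failures observed yet, so nothing to reject
    return neg.log_prob(x) - base_log_prob(x) > log_tau


def search(reward_fn, n_bits, budget, log_tau, use_filter, seed=0, max_proposals=10_000):
    """Return (reward_calls, proposals, success) for a sparse-reward search."""
    rng = random.Random(seed)
    neg = NegativeModel(n_bits)
    proposals = 0
    for call in range(1, budget + 1):
        # Cheap offline compute: redraw until a sample leaves the rejection
        # region, or evaluate the last draw after max_proposals attempts.
        for _ in range(max_proposals):
            x = sample_base(rng, n_bits)
            proposals += 1
            if not (use_filter and in_rejection_region(x, neg, log_tau)):
                break
        # Expensive step: one reward call that returns only success/failure.
        if reward_fn(x):
            return call, proposals, x
        neg.update(x)  # learn only from the failure
    return budget, proposals, None
```

For example, `search(lambda x: x == (1, 0, 1, 1), 4, 200, 0.0, True)` runs the filtered search for a single hidden target. Comparing its `reward_calls` and `proposals` with the `use_filter=False` run over many seeds shows the compute-for-reward trade-off in miniature. No particular numbers are claimed here. A per-bit model is deliberately crude, so it can also reject good samples that happen to resemble failures, and a real negative model would need to capture richer structure.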