Learning from failure to tackle hard problems

23 days ago

The blog post introduces BaNEL (Bayesian Negative Evidence Learning), an algorithm designed to post-train generative models using only negative rewards.
BaNEL addresses two key challenges in machine learning: sparsity of positive rewards and costly reward evaluations.
The algorithm learns from failures by modeling the underlying structure of negative samples, avoiding past mistakes without requiring positive examples.
BaNEL uses a separate generative model to approximate a rejection region, filtering out samples similar to past failures.
In experiments, BaNEL significantly improves success rates on tasks such as adversarial attacks on toy language models and reasoning problems.
The method trades compute for reward efficiency: it spends extra offline computation to reduce the number of reward evaluations, so it works best when that computation is available.
BaNEL provides qualitative insights into failure modes, guiding human intuition in solving hard problems.
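
The filtering idea summarized above can be illustrated with a toy sketch. This is not the authors' implementation: the three-letter alphabet, the uniform base generator, the unigram "negative model", the likelihood-ratio threshold, and every function name here are assumptions chosen only to keep the example small. It fits a model on failed samples alone, treats anything that model finds more likely than the base generator does as lying in the rejection region, and keeps the rest.

```python
import itertools
import math
from collections import Counter

# Toy setup (all assumptions): strings of 3 symbols over a 3-letter alphabet.
ALPHABET = "abc"
LENGTH = 3


def prior_prob(x):
    """Probability of x under a uniform base generator."""
    return (1 / len(ALPHABET)) ** len(x)


def fit_negative_model(failures, smoothing=1.0):
    """Fit a Laplace-smoothed per-symbol model using failed samples only."""
    counts = Counter(ch for x in failures for ch in x)
    total = sum(counts.values()) + smoothing * len(ALPHABET)
    return {ch: (counts[ch] + smoothing) / total for ch in ALPHABET}


def negative_prob(model, x):
    """Probability of x under the negative (failure) model."""
    return math.prod(model[ch] for ch in x)


def in_rejection_region(model, x, threshold=1.0):
    """Flag x as similar to past failures when the negative model gives it
    more probability than the base generator (ratio above the threshold)."""
    return negative_prob(model, x) / prior_prob(x) > threshold


def filtered_support(model, threshold=1.0):
    """Enumerate every candidate and keep those outside the rejection region."""
    candidates = ("".join(t) for t in itertools.product(ALPHABET, repeat=LENGTH))
    return [x for x in candidates if not in_rejection_region(model, x, threshold)]


# Hypothetical failures: every one of them contains at least two 'a' symbols.
failures = ["aaa", "aab", "aba", "baa", "aac"]
neg_model = fit_negative_model(failures)
kept = filtered_support(neg_model)

print(len(kept), "of", len(ALPHABET) ** LENGTH, "candidates survive filtering")
print("'aca' rejected:", "aca" not in kept)
```

In this toy run the string "aca" is rejected even though it never appeared among the failures, because the negative model has picked up the shared pattern (many 'a' symbols) rather than memorizing individual samples. That generalization is the property the summary attributes to modeling the structure of negative samples; a real system would use a learned generative model in place of the unigram counts.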
