We Only Learn from Error

6 hours ago

The article describes training a vision-language-action model to follow faces. The model initially showed high error because its training data contained few scenarios that required a correction.
A behavioral cloning approach was used, where the model imitated an oracle that tracked and adjusted to faces, but the dataset was skewed towards frames where the oracle had already corrected to the center.
To address this, the face target was periodically teleported to a new position, a disturbance that added high-error samples to the training data. This significantly improved the model's accuracy, reducing average error from 20 degrees to under 5 degrees.
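
The effect of the disturbances on the dataset can be illustrated with a toy simulation. The sketch below is not the article's setup: it assumes a one-dimensional gaze angle, a proportional-control oracle, and hypothetical values for the gain, the teleport period, the angle range, and the 5-degree "high error" threshold. It only shows how an oracle that converges quickly leaves almost no high-error frames to imitate, and how periodic teleports restore them.

```python
import random


def collect_errors(steps, gain=0.5, teleport_every=None, max_angle=30.0, seed=0):
    """Simulate a 1-D oracle tracker; return per-frame absolute error in degrees.

    All parameters are illustrative assumptions, not values from the article.
    """
    rng = random.Random(seed)
    face = rng.uniform(-max_angle, max_angle)  # target angle
    gaze = 0.0                                 # current camera angle
    errors = []
    for t in range(steps):
        # Optional disturbance: jump the target to a fresh random angle.
        if teleport_every and t > 0 and t % teleport_every == 0:
            face = rng.uniform(-max_angle, max_angle)
        error = face - gaze
        errors.append(abs(error))
        # Oracle action recorded for cloning: a proportional correction.
        gaze += gain * error
    return errors


def high_error_fraction(errors, threshold=5.0):
    """Fraction of recorded frames whose error exceeds the threshold."""
    return sum(e > threshold for e in errors) / len(errors)


baseline = collect_errors(1000)
disturbed = collect_errors(1000, teleport_every=10)

print(f"high-error frames without disturbances: {high_error_fraction(baseline):.1%}")
print(f"high-error frames with disturbances:    {high_error_fraction(disturbed):.1%}")
```

In this toy version the undisturbed oracle centers the target within a few frames, so nearly every recorded frame is a near-zero-error example. A shorter teleport period raises the share of corrective frames further, at the cost of fewer settled-tracking examples.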
