AutoHarness: Improving LLM agents by automatically synthesizing a code harness
2 days ago
- #GameAI
- #LLM
- #Automation
- LLM agents often perform prohibited actions in external environments, leading to failures.
- Manual 'harnesses' are commonly written to prevent such LLM failures.
- Gemini-2.5-Flash can automatically synthesize a code harness to prevent illegal moves.
- The synthesized harness prevents all illegal moves in 145 TextArena games.
- With the synthesized harness, the smaller Gemini-2.5-Flash can outperform larger models like Gemini-2.5-Pro and GPT-5.2-High.
- Generating the entire policy in code eliminates the need for LLM decision-making at runtime.
- The code-policy approach is more cost-effective and achieves higher average rewards than those larger models.
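
To make the harness idea concrete, here is a minimal hand-written sketch of what such a wrapper might do: check each proposed move against the game rules, ask again if the move is illegal, and fall back to a known-legal move if retries run out. This is not the synthesized harness from the work summarized above (that one is generated by the model). The toy 9-cell board, the names `legal_moves`, `is_legal`, and `harnessed_move`, and the `propose` callback standing in for an LLM call are all assumptions made for illustration.

```python
from typing import Callable, List

Board = List[str]  # hypothetical toy board: 9 cells, " " marks an empty cell


def legal_moves(board: Board) -> List[int]:
    """Return the indices of all empty cells."""
    return [i for i, cell in enumerate(board) if cell == " "]


def is_legal(board: Board, move: object) -> bool:
    """A move is legal if it is an in-range integer index of an empty cell."""
    return (
        isinstance(move, int)
        and not isinstance(move, bool)
        and 0 <= move < len(board)
        and board[move] == " "
    )


def harnessed_move(
    propose: Callable[[Board], object],
    board: Board,
    max_retries: int = 3,
) -> int:
    """Ask `propose` (a stand-in for the LLM) for a move, rejecting illegal ones.

    After `max_retries` illegal proposals, fall back to the first legal move so
    the environment never receives an illegal action.
    """
    for _ in range(max_retries):
        move = propose(board)
        if is_legal(board, move):
            return move  # type: ignore[return-value]
    fallback = legal_moves(board)
    if not fallback:
        raise ValueError("no legal moves available")
    return fallback[0]


# Stand-in for a model that keeps proposing an occupied cell.
def stubborn_proposer(board: Board) -> int:
    return 0


board = ["X", " ", " ", " ", "O", " ", " ", " ", " "]
chosen = harnessed_move(stubborn_proposer, board)
```

In this sketch the proposer keeps returning cell 0, which is occupied, so the harness falls back to the first empty cell. A pure code policy, as described in the last two bullets, would go one step further and replace `propose` itself with ordinary code, so that no model call is needed at decision time.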