Why LLMs still lack taste
9 hours ago
- #LLMs
- #Software Development
- #AI Taste
- LLMs demonstrate advanced capabilities in software development but lack 'taste': the ability to choose the best option among several correct alternatives.
- Taste is context-dependent, subjective, and crucial for long-term maintainability, but LLMs struggle due to their reliance on verifiable rewards in training.
- Humans acquire taste through years of experience in varied contexts, learning which code properties are desirable; LLM training, by contrast, is short and focused on objective outcomes.
- RLVR (Reinforcement Learning from Verifiable Rewards) improves coding but fails to capture long-term goals like maintainability and uptime, as rewards are narrow.
- The proposed solution is a long-horizon RLVR harness that simulates real-world SaaS environments, with diverse users and monetary rewards, to teach taste.
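
As a rough illustration of the contrast drawn in the last two bullets, the sketch below compares a narrow verifiable reward (did the tests pass?) with a long-horizon monetary reward accumulated over a simulated service lifetime. Everything in it is hypothetical: the names `Candidate`, `verifiable_reward`, and `long_horizon_reward`, the `maintainability` score, and the revenue and outage numbers are assumptions made for illustration, not details of the proposal summarized above.

```python
from dataclasses import dataclass
import random


@dataclass
class Candidate:
    """A hypothetical code change produced by a model."""
    name: str
    tests_pass: bool
    maintainability: float  # assumed score in [0, 1]; higher means easier to evolve


def verifiable_reward(c: Candidate) -> float:
    """Narrow RLVR-style reward: 1.0 if the tests pass, else 0.0."""
    return 1.0 if c.tests_pass else 0.0


def long_horizon_reward(c: Candidate, months: int = 24, users: int = 100,
                        price: float = 10.0, seed: int = 0) -> float:
    """Toy monetary reward over a simulated service lifetime.

    Assumption: less maintainable code has a higher monthly outage chance,
    and each outage forfeits half of that month's revenue.
    """
    if not c.tests_pass:
        return 0.0  # a broken change never ships in this toy model
    rng = random.Random(seed)  # seeded so the simulation is repeatable
    outage_prob = 0.5 * (1.0 - c.maintainability)
    total = 0.0
    for _ in range(months):
        revenue = users * price
        if rng.random() < outage_prob:
            revenue *= 0.5  # outage month: half the revenue is lost
        total += revenue
    return total


if __name__ == "__main__":
    clean = Candidate("clean", tests_pass=True, maintainability=1.0)
    hacky = Candidate("hacky", tests_pass=True, maintainability=0.0)
    for c in (clean, hacky):
        print(c.name, verifiable_reward(c), long_horizon_reward(c))
```

Both candidates receive the same score from `verifiable_reward`, so that signal alone cannot prefer one over the other. In this toy model, with a shared seed, the more maintainable of two passing candidates never earns less under `long_horizon_reward`, which is the kind of separation a long-horizon harness would aim for.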