Why LLMs still lack taste

LLMs demonstrate advanced capabilities in software development but lack "taste": the ability to choose the best option among several correct alternatives.
Taste is context-dependent, subjective, and crucial for long-term maintainability, but LLMs struggle to acquire it because their training relies on verifiable rewards.
Humans acquire taste over years of experience in varied contexts, learning which code properties are desirable; LLM training, by contrast, consists of short episodes focused on objective targets.
RLVR (Reinforcement Learning from Verifiable Rewards) improves coding performance but fails to capture long-term goals such as maintainability and uptime, because its rewards are narrow (for example, whether the unit tests pass).
The proposed solution is a long-horizon RLVR harness that simulates a real-world SaaS environment, with diverse users and monetary rewards, in order to teach taste.
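To make the contrast between a narrow verifiable reward and a long-horizon monetary reward concrete, here is a minimal toy sketch. It is not the article's harness: the function names, the debt/uptime/churn model, the price, and every constant are hypothetical assumptions chosen only for illustration. Two policies ship changes that pass every test, so the narrow reward cannot tell them apart, while simulated revenue over many months can.

```python
PRICE_PER_USER = 10.0  # hypothetical monthly subscription price (assumption)


def verifiable_reward(tests_passed: int, tests_total: int) -> float:
    """Narrow RLVR-style reward: 1.0 if every unit test passes, else 0.0."""
    return 1.0 if tests_total > 0 and tests_passed == tests_total else 0.0


def long_horizon_reward(change_quality: float, months: int = 24) -> float:
    """Hypothetical long-horizon reward: total revenue of a toy SaaS product.

    change_quality is in [0, 1]. Low-quality (but test-passing) changes add
    technical debt; debt lowers uptime; lower uptime drives user churn.
    All dynamics and constants here are illustrative assumptions.
    """
    users = 100.0
    debt = 0.0
    revenue = 0.0
    for _ in range(months):
        debt += 1.0 - change_quality          # hacky changes accumulate debt
        uptime = max(0.0, 1.0 - 0.02 * debt)  # more debt -> more outages
        users *= 0.9 + 0.15 * uptime          # grow when reliable, churn when not
        revenue += users * PRICE_PER_USER * uptime  # this month's income
    return revenue


if __name__ == "__main__":
    # Two hypothetical policies whose changes both pass all 50 tests.
    for name, quality in [("hasty", 0.2), ("careful", 0.9)]:
        print(name, verifiable_reward(50, 50), round(long_horizon_reward(quality), 2))
```

Both policies receive the same verifiable reward, yet the careful one earns more simulated revenue. A reward with this long-horizon shape is what would, in principle, let a training signal distinguish between alternatives that are all "correct".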