It's Hard to Eval Is a Product Smell
9 hours ago
- #Verification
- #Product design
- #AI evals
- The statement 'our product is hard to eval' is a product smell, indicating potential usability issues for both developers and users.
- Designing products for ease of verification should prioritize user trust and reduce the need for users to redo work to verify AI outputs.
- Example 1: AI data agents should provide checkable artifacts, like interactive notebooks with progressive disclosure, to allow verification through trusted sources, metric definitions, and sanity checks.
- Example小孩2: An AI tool for physical education lesson plans should anchor generated plans in vetted, existing plans, showing diffs and reasons for changes to reduce cognitive load during review.
- Example 3: A workers' compensation medical report tool should act as a research assistant, surfacing evidence, contradictions, and gaps for doctors to verify before assembling final reports.
- General principles include understanding user verification needs, using provenance and progressive disclosure, and breaking workflows into smaller, verifiable units.
- Designing for verification aligns with established principles like needfinding and sensemaking, and is crucial in the AI era where verification is a bottleneck.