TaxCalcBench: Evaluating Frontier Models on the Tax Calculation Task
- #Benchmarking
- #Tax Calculation
- #Artificial Intelligence
- AI currently cannot accurately file US personal income taxes.
- TaxCalcBench is introduced as a benchmark for evaluating AI models on tax calculation.
- State-of-the-art models correctly calculate fewer than a third of the federal income tax returns in the benchmark.
- Common errors include misuse of the tax tables, calculation mistakes, and incorrect eligibility determinations.
- Additional infrastructure is needed before AI can be reliably applied to tax calculation.
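To make the tax-table error concrete, here is a minimal Python sketch. It is an illustration, not code from the benchmark. It assumes the 2024 single-filer brackets and a simplified version of the IRS Tax Table rule for taxable income from $3,000 up to $100,000: income falls into a $50-wide row, and the listed tax is the rate-schedule tax on that row's midpoint, rounded to the nearest dollar. A model that applies the marginal rates to the exact income instead of looking up the table can end up a few dollars off the expected figure. The function names are hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP

# Assumed 2024 single-filer brackets: (upper bound of bracket, marginal rate).
# None marks the open-ended top bracket.
BRACKETS_2024_SINGLE = [
    (11_600, Decimal("0.10")),
    (47_150, Decimal("0.12")),
    (100_525, Decimal("0.22")),
    (191_950, Decimal("0.24")),
    (243_725, Decimal("0.32")),
    (609_350, Decimal("0.35")),
    (None, Decimal("0.37")),
]


def rate_schedule_tax(income: Decimal) -> Decimal:
    """Apply the marginal rates directly to the exact taxable income."""
    tax = Decimal("0")
    lower = Decimal("0")
    for upper, rate in BRACKETS_2024_SINGLE:
        if upper is None or income <= upper:
            # Income ends inside this bracket: tax the remainder and stop.
            tax += (income - lower) * rate
            break
        # Income exceeds this bracket: tax the full bracket width.
        tax += (Decimal(upper) - lower) * rate
        lower = Decimal(upper)
    return tax


def tax_table_tax(income: Decimal) -> int:
    """Hypothetical, simplified Tax Table lookup for 3,000 <= income < 100,000.

    Assumes $50-wide rows whose tax is computed at the row midpoint and
    rounded to the nearest whole dollar.
    """
    if not Decimal(3000) <= income < Decimal(100000):
        raise ValueError("simplified table only covers 3,000 <= income < 100,000")
    row_start = (income // 50) * 50  # bottom of the $50 row containing income
    midpoint = row_start + 25
    rounded = rate_schedule_tax(midpoint).quantize(Decimal("1"), rounding=ROUND_HALF_UP)
    return int(rounded)
```

Under these assumptions, a taxable income of $50,010 gives $6,055.20 from `rate_schedule_tax` but $6,059 from `tax_table_tax`. A gap of a few dollars like this is enough to fail any check that requires the exact table amount.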
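One plausible reading of a per-return success rate is all-or-nothing scoring, where a single wrong line fails the whole return. The sketch below implements such a rule. The benchmark's actual scoring criteria are not described in this summary, so the line names, the `tolerance` parameter, and both functions are assumptions for illustration only.

```python
from typing import Iterable, Mapping, Tuple

# Hypothetical scoring rule: a return counts as correct only if every expected
# line is present in the model's output and within `tolerance` dollars of it.
def return_is_correct(
    expected: Mapping[str, int],
    predicted: Mapping[str, int],
    tolerance: int = 0,
) -> bool:
    return all(
        line in predicted and abs(predicted[line] - amount) <= tolerance
        for line, amount in expected.items()
    )


def success_rate(
    cases: Iterable[Tuple[Mapping[str, int], Mapping[str, int]]],
    tolerance: int = 0,
) -> float:
    """Fraction of (expected, predicted) return pairs scored as fully correct."""
    results = [return_is_correct(exp, pred, tolerance) for exp, pred in cases]
    return sum(results) / len(results) if results else 0.0


# Usage with hypothetical line names and amounts:
cases = [
    ({"tax": 6059, "total_tax": 6059}, {"tax": 6059, "total_tax": 6059}),
    ({"tax": 6059, "total_tax": 6059}, {"tax": 6055, "total_tax": 6055}),
    ({"tax": 303}, {}),  # the model produced no tax line at all
]
strict = success_rate(cases)                 # only the first return passes
lenient = success_rate(cases, tolerance=5)   # the $4 miss now also passes
```

With `tolerance=0`, a tax figure that is off by only $4 fails the whole return. Loosening the tolerance changes the reported rate, which is why the exact scoring rule matters when comparing models.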