Show HN: Alumnium – SOTA Browsing for Claude Code
17 hours ago
- #Benchmark
- #AI
- #WebVoyager
- Alumnium MCP with Claude Code achieves 98.5% on the WebVoyager benchmark, setting a new state of the art.
- Alumnium is open source, and the project shares its full results, session transcripts, and code for reproducibility.
- Alumnium sits between fully autonomous browser agents and raw browser primitives, offering high-level tools like do(), get(), and check().
- The benchmark setup used Claude Code running Sonnet 4.6, connected to Alumnium MCP 0.18 running GPT-5 Nano.
- The benchmark comprised 610 tasks; adjustments to the task set included restoring 20 tasks and updating date-specific references.
- The Alumnium MCP API calls cost approximately $5 in total, or less than a cent per task.
- Alumnium works primarily from accessibility trees, making vision unnecessary for most tasks.
- The experiment indicates that custom browser agents and modern browser stacks may not be necessary to achieve strong results.
- Future plans include submitting results to the Steel browser agent leaderboard and evaluating on other benchmarks.
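As a quick check of the cost claim, dividing the approximate $5 total by the 610 tasks gives the per-task figure. Both inputs come from the summary above; because the $5 total is itself approximate, the result is only an estimate.

```python
# Approximate figures taken from the summary above.
total_cost_usd = 5.00  # total Alumnium MCP API spend (approximate)
num_tasks = 610        # WebVoyager tasks in this run

cost_per_task = total_cost_usd / num_tasks
print(f"${cost_per_task:.4f} per task")  # prints "$0.0082 per task", i.e. under one cent
```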
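The claim that accessibility trees make vision unnecessary for most tasks is easier to picture with a concrete example. The following minimal sketch flattens a small accessibility tree into compact, indented text that a language model could read in place of a screenshot. The `AXNode` class, the `IGNORED_ROLES` set, and the output format are hypothetical and chosen for illustration. They are not Alumnium's actual internal representation.

```python
from dataclasses import dataclass, field


@dataclass
class AXNode:
    """Hypothetical accessibility node: a role, an accessible name, and children."""
    role: str
    name: str = ""
    children: list["AXNode"] = field(default_factory=list)


# Assumed wrapper roles that carry no meaning on their own; their children are hoisted up.
IGNORED_ROLES = {"generic", "none"}


def serialize(node: AXNode, depth: int = 0) -> list[str]:
    """Flatten the tree into indented 'role "name"' lines, skipping ignored wrappers."""
    if node.role in IGNORED_ROLES:
        # Emit the wrapper's children at the wrapper's own depth.
        lines: list[str] = []
        for child in node.children:
            lines.extend(serialize(child, depth))
        return lines
    label = f'{node.role} "{node.name}"' if node.name else node.role
    lines = ["  " * depth + label]
    for child in node.children:
        lines.extend(serialize(child, depth + 1))
    return lines


# A tiny hypothetical search page.
page = AXNode("RootWebArea", "Search", [
    AXNode("generic", children=[
        AXNode("textbox", "Search query"),
        AXNode("button", "Search"),
    ]),
    AXNode("link", "Advanced search"),
])

print("\n".join(serialize(page)))
```

The four resulting lines (`RootWebArea "Search"` followed by the indented textbox, button, and link) name every interactive element a model would need to type a query and submit it. This is the kind of information a screenshot would otherwise have to convey, but here it arrives as a short text string.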