Show HN: Alumnium – SOTA Browsing for Claude Code

19 hours ago
  • #Benchmark
  • #AI
  • #WebVoyager
  • Alumnium MCP with Claude Code achieves 98.5% on WebVoyager benchmark, setting a new state of the art.
  • Alumnium is an open-source project, sharing full results, session transcripts, and code for reproducibility.
  • Alumnium sits between fully autonomous browser agents and raw browser primitives, offering high-level tools like do(), get(), and check().
  • The benchmark ran Claude Code (Sonnet 4.6) together with Alumnium MCP 0.18 (GPT-5 Nano).
  • The benchmark included 610 tasks, with adjustments like restoring 20 tasks and updating date-specific references.
  • Total cost for Alumnium MCP API calls was approximately $5, less than a cent per task.
  • Alumnium works primarily from accessibility trees, making vision unnecessary for most tasks.
  • The experiment suggests that custom browser agents and modern browser stacks may not be necessary to reach state-of-the-art results.
  • Future plans include submitting results to the Steel browser agent leaderboard and evaluating on other benchmarks.
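
The "less than a cent per task" figure can be checked with a quick calculation. The sketch below uses only the two numbers quoted in the summary (roughly $5 in total and 610 tasks). The total is approximate, so the result is an estimate, and the function name is purely illustrative.

```python
# Figures quoted in the summary; the total is stated only as "approximately $5".
TOTAL_COST_USD = 5.00   # approximate total for Alumnium MCP API calls
TASK_COUNT = 610        # tasks in the benchmark run


def cost_per_task_cents(total_usd: float, tasks: int) -> float:
    """Return the average cost per task in US cents."""
    return total_usd / tasks * 100


per_task = cost_per_task_cents(TOTAL_COST_USD, TASK_COUNT)
print(f"{per_task:.2f} cents per task")  # prints "0.82 cents per task"
```

This covers only the Alumnium MCP API calls mentioned in the summary, not any other cost of the run.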