Hasty Briefs

Sakana Fugu

4 hours ago
  • #AI Research
  • #Autonomous Agents
  • #Benchmark Performance
  • An AI agent autonomously improves a small GPT's training recipe using AutoResearch, running 123 experiments over ~14 hours on a single H100 GPU. It achieves the best mean bits-per-byte (BPB) score (0.9774 ± 0.0019; lower is better) and the best single-run BPB of 0.9748, outperforming frontier models A, B, and C.
  • Fugu-Ultra successfully recovers the reading order of scattered classical Japanese kana letters (ryōmei shōsoku) with a NED score of 0.80, far exceeding Model A (0.24) and Model B (similar to Model A); Model C fails to produce a viable predictor.
  • In a Rubik's Cube solver benchmark, Fugu-Ultra and Model A generate solvers that solve all 300 scrambled cubes, with Fugu-Ultra averaging 19.72 moves versus Model A's 19.76, while Models B and C produce code that crashes and solves none.
  • Fugu-Ultra generates a functional mechanical iris CAD design with blades rotating around outer pins to open and close the aperture properly, unlike other models that produce designs with gaps, weak linkages, or incomplete closure.
  • Fugu-Ultra wins four consecutive blindfold chess games against three frontier models and a 2100-Elo Stockfish engine, maintaining accuracy and ending each game in checkmate.
  • In a simulated equity trading task over 50 weeks, Fugu-Ultra achieves a mean return of +19.43% (portfolio growth to $11,943.22 ± $633.86), outperforming other frontier models, which all yield less than +15% return.
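The trading figures above can be cross-checked with simple return arithmetic. The sketch below is illustrative only: the summary does not state the starting capital, so `ASSUMED_START` of $10,000 is an inference from the fact that +19.43% lines up with a final value of $11,943.22. The meaning of the ± figure (standard deviation, standard error, or something else) is also not given, so the band is labeled only as "mean ± reported spread". The function and variable names are hypothetical.

```python
# Assumption: starting capital is not stated in the summary. $10,000 is
# inferred because +19.43% matches a final portfolio value of $11,943.22.
ASSUMED_START = 10_000.00


def pct_return(final_value: float, start_value: float = ASSUMED_START) -> float:
    """Simple percentage return over the whole simulated period."""
    return (final_value / start_value - 1.0) * 100.0


# Figures reported in the summary.
mean_final = 11_943.22
reported_spread = 633.86  # the "±" value; its exact statistic is unspecified

mean_return = pct_return(mean_final)
low_return = pct_return(mean_final - reported_spread)
high_return = pct_return(mean_final + reported_spread)

print(f"mean return: {mean_return:+.2f}%")
print(f"mean ± reported spread: {low_return:+.2f}% to {high_return:+.2f}%")
```

Under the assumed starting capital, the mean works out to +19.43%, matching the reported figure. The lower end of the band falls below +15%, so the margin over the other models depends on how the ± value was computed and how many runs it covers.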