Hasty Briefs (beta)


Reading MAI's efficiency gain: how to pick architectures like serious people

4 hours ago
  • #model architecture
  • #efficiency metric
  • #computational trade-offs
  • The MAI-Thinking-1 report introduces a method for comparing model architectures using Efficiency Gain (EG), a metric that accounts for the trade-off between compute budget and final loss.
  • EG measures how much more or less efficient a candidate design is than a baseline. It can be computed on different cost axes, such as FLOPs or wall-clock time, and the design that is best on one axis is often not the best on the other.
  • Counting FLOPs is independent of implementation, which makes it useful for evaluating new ideas before they are optimized; wall-clock time reflects real-world costs such as cloud rental or time on a shared cluster.
  • A key insight is that an architecture that is cheap in FLOPs can still be slow in wall-clock time because of inefficient kernels, so measuring EG on both axes helps avoid costly design mistakes.
  • Example from Table 2: an MoE variant with 7+1 shared layers shows a 3% EG gain in FLOPs but an 18% loss in wall-clock time, so the interleaved layout is preferred even though FLOPs alone suggest otherwise.
  • EG is computed by fitting a power law of loss versus cost to the baseline runs, inverting the fit to get the cost the baseline would need to reach a given loss, and comparing that with the candidate's actual cost; values above 1 indicate an efficiency gain.
  • The method generalizes to any architectural change, helping assess whether a FLOPs reduction justifies the engineering effort it requires and whether an idea is viable on actual hardware.
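Written as formulas, the three EG steps might look like the following. This is a sketch that assumes a pure power law with no irreducible-loss term (the brief does not state the exact form the report fits), with C measured on whichever cost axis (FLOPs or wall-clock time) is being compared, and a candidate that reaches loss L_cand at cost C_cand:

```latex
L_{\mathrm{base}}(C) = a\,C^{-b}
\;\Longrightarrow\;
C_{\mathrm{base}}(L) = \left(\frac{L}{a}\right)^{-1/b},
\qquad
\mathrm{EG} = \frac{C_{\mathrm{base}}(L_{\mathrm{cand}})}{C_{\mathrm{cand}}}
```

EG > 1 then means the baseline would need more cost than the candidate to reach the same loss.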
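The EG procedure can be sketched in Python as follows, assuming a pure power law L(C) = a * C**(-b) with no irreducible-loss term, fitted by least squares in log-log space. The names `PowerLaw`, `fit_power_law` and `efficiency_gain` are hypothetical, and all baseline and candidate numbers are synthetic. They were chosen only to show how one design can win on FLOPs and lose on wall-clock time, and are not data from the report.

```python
import math
from dataclasses import dataclass


@dataclass
class PowerLaw:
    """Assumed baseline scaling curve: L(C) = a * C**(-b), no irreducible term."""
    a: float
    b: float

    def loss(self, cost: float) -> float:
        return self.a * cost ** (-self.b)

    def cost_for_loss(self, loss: float) -> float:
        # Invert L = a * C**(-b)  =>  C = (L / a) ** (-1 / b)
        return (loss / self.a) ** (-1.0 / self.b)


def fit_power_law(costs: list[float], losses: list[float]) -> PowerLaw:
    """Least-squares fit of the line log L = log a - b * log C."""
    if len(costs) != len(losses) or len(costs) < 2:
        raise ValueError("need at least two (cost, loss) pairs")
    xs = [math.log(c) for c in costs]
    ys = [math.log(v) for v in losses]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("costs must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                    # slope of the log-log line, equals -b
    intercept = mean_y - slope * mean_x  # equals log a
    return PowerLaw(a=math.exp(intercept), b=-slope)


def efficiency_gain(baseline: PowerLaw, cand_cost: float, cand_loss: float) -> float:
    """Baseline cost needed to reach the candidate's loss, divided by the
    candidate's actual cost; a value > 1 means the candidate is more efficient."""
    return baseline.cost_for_loss(cand_loss) / cand_cost


# Synthetic baseline runs (made-up numbers): loss = 4 * C**-0.5 in FLOPs units.
flops = [1.0, 2.0, 4.0, 8.0]
losses = [4.0 * c ** -0.5 for c in flops]
# Assume the baseline runs at a constant 10 hours per FLOPs unit.
hours = [10.0 * c for c in flops]

# One baseline fit per cost axis.
base_flops = fit_power_law(flops, losses)
base_time = fit_power_law(hours, losses)

# Hypothetical candidate: reaches loss 2.0 with 2 FLOPs units but slow kernels (50 h).
cand_loss, cand_flops, cand_hours = 2.0, 2.0, 50.0
eg_flops = efficiency_gain(base_flops, cand_flops, cand_loss)
eg_time = efficiency_gain(base_time, cand_hours, cand_loss)
print(f"EG (FLOPs): {eg_flops:.2f}")  # EG (FLOPs): 2.00
print(f"EG (time):  {eg_time:.2f}")   # EG (time):  0.80
```

Fitting a separate baseline curve per cost axis is what lets the two EG values disagree. The candidate needs only half the baseline's FLOPs to reach loss 2.0 (2 units versus 4), but its 50 hours exceed the 40 hours the baseline would need.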