Hasty Briefs


Qwen3.7-Max Ran for 35 Hours on Unfamiliar Hardware and Achieved a 10× Speedup

3 days ago
  • #agentic AI
  • #kernel development
  • #AI optimization
  • Qwen3.7-Max autonomously optimized a kernel on unfamiliar hardware (T-Head ZW-M890 PPUs) over 35 hours, achieving a 10x speedup.
  • The model made 1,158 tool calls and performed 432 kernel evaluations, diagnosing failures and redesigning the architecture multiple times without human guidance.
  • On the same task, GLM 5.1 reached a 7.3x speedup, Kimi K2.6 reached 5x, and DeepSeek V4 Pro reached 3.3x.
  • Benchmark results show Qwen3.7-Max trades blows with top models in coding (e.g., SWE-Verified) and leads in reasoning tasks like GPQA Diamond and HLE.
  • Training via 'environment scaling' across diverse agentic environments enables cross-harness generalization and robust problem-solving.
  • Limitations include being a proprietary API model (no open weights) and potential gaps in complex instruction following compared to competitors.
  • Suitable for agentic workflows that can use a proprietary API, but not for those requiring open weights or local deployment.
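The brief reports results as speedup multiples over a baseline kernel but does not describe the evaluation harness. The minimal Python sketch below assumes the conventional definition (baseline runtime divided by optimized runtime) to turn the reported figures into remaining runtime, and shows a hypothetical keep-best loop in which failed evaluations are skipped rather than ranked. `Evaluation`, `best_candidate`, and any runtime values are illustrative names and numbers, not part of any published Qwen tooling.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


def speedup(baseline_s: float, optimized_s: float) -> float:
    """Speedup as a multiple of the baseline: baseline time / optimized time."""
    if optimized_s <= 0:
        raise ValueError("optimized runtime must be positive")
    return baseline_s / optimized_s


def runtime_fraction(speedup_x: float) -> float:
    """Fraction of the baseline runtime that remains after a given speedup."""
    return 1.0 / speedup_x


@dataclass
class Evaluation:
    # Hypothetical record of one kernel evaluation. runtime_s is None when the
    # candidate failed (e.g. did not compile or produced wrong results).
    candidate: str
    runtime_s: Optional[float]


def best_candidate(baseline_s: float, evals: List[Evaluation]) -> Tuple[str, float]:
    """Return the fastest correct candidate and its speedup; failures are skipped."""
    best_name, best_x = "baseline", 1.0
    for e in evals:
        if e.runtime_s is None:
            continue  # a failure to diagnose, not a result to rank
        x = speedup(baseline_s, e.runtime_s)
        if x > best_x:
            best_name, best_x = e.candidate, x
    return best_name, best_x


if __name__ == "__main__":
    # Speedups reported in the brief, expressed as remaining runtime.
    reported = [("Qwen3.7-Max", 10.0), ("GLM 5.1", 7.3), ("Kimi K2.6", 5.0), ("DeepSeek V4 Pro", 3.3)]
    for model, x in reported:
        print(f"{model}: {x}x -> {runtime_fraction(x):.1%} of baseline runtime")
```

Read this way, a 10x speedup leaves 10% of the baseline runtime and 7.3x leaves about 13.7%, so the gap between the top two results is roughly 3.7 percentage points of baseline runtime, while 3.3x still leaves about 30%.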