Kimi K2.6 just beat Claude, GPT-5.5, and Gemini in a coding challenge

5 hours ago

Kimi K2.6, an open-weights Chinese model from Moonshot AI, won a programming challenge (Word Gem Puzzle) by scoring 22 match points, beating Western models like GPT-5.5 and Claude Opus 4.7.
The Word Gem Puzzle involves sliding tiles on grids to form valid English words, with scoring that rewards longer words and penalizes short ones.
Kimi K2.6's strategy was aggressive sliding to unlock high-value words, which worked well on larger scrambled grids (e.g., 30x30), while other models like MiMo V2-Pro relied on static scanning of intact seed words.
MiMo V2-Pro placed second with 20 match points using its static-scanning approach. The results highlight how a model's strategy (sliding vs. scanning) interacts with grid conditions to determine performance.
The challenge reveals that open-weights models are closing the capability gap with frontier labs, with Kimi K2.6 scoring close to top Western models on benchmarks, indicating a shift in the competitive landscape.
Notable underperformers included Muse Spark, which scored -15,309 because it claimed every word it found and ignored the penalty on short words, and DeepSeek V4, which produced malformed output.
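
To illustrate why claiming every word can drive a score deeply negative, here is a minimal sketch. The summary does not give the challenge's actual scoring formula, so the rule below (a word earns its length minus a break-even threshold) and the names `BREAK_EVEN`, `word_score`, `claim_all`, and `claim_selective` are assumptions made purely for illustration.

```python
# Hypothetical scoring rule, NOT the challenge's real formula:
# a word earns (length - BREAK_EVEN) points, so short words cost points.
BREAK_EVEN = 6  # assumed threshold, chosen only for this example


def word_score(word: str) -> int:
    """Score one word under the assumed rule."""
    return len(word) - BREAK_EVEN


def claim_all(words: list[str]) -> int:
    """Claim every word found, including the ones that lose points."""
    return sum(word_score(w) for w in words)


def claim_selective(words: list[str]) -> int:
    """Claim only the words whose score is positive."""
    return sum(s for s in map(word_score, words) if s > 0)


found = ["cat", "at", "tile", "puzzle", "sliding", "scrambled"]
print(claim_all(found), claim_selective(found))
```

Under this assumed rule, the short words outweigh the long ones when everything is claimed, while filtering to positive-scoring words keeps the total above zero. On a large grid with many short words, the same effect would compound.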

Hasty Briefs (beta)