The frontier is open-source today

6 hours ago

GLM-5.2 outperformed Opus 4.8 on an AI-resistant take-home test, delivering higher-quality transcriptions, better speaker identification, closer instruction-following, and more maintainable code.
offmute-v2 combines insights from previous projects into a multi-step pipeline using a regular STT model and a multimodal LLM to produce accurate, diarized, timestamp-correct transcripts with identified speakers.
The tool is extensible, runs in the browser, and allows for customization, such as fixing common misspellings or focusing on conversations in noisy environments.
GLM's version, offmute-v2@glm, is now the primary version, with Opus's version preserved as offmute-v2@opus. Opus's best ideas are being integrated into the GLM version.
Both implementations had problems: GLM's had a silent bug that served cached transcripts, while Opus's crashed on audio-only files and drifted from its own spec.
Opus posted a better raw WER (word error rate), but the gap narrowed once GLM's deduplication bug was fixed. WER is not the only metric that counts: output quality, speaker matching, and maintainability mattered more.
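For readers unfamiliar with the metric, WER is conventionally the word-level edit distance (substitutions, deletions, and insertions) between a reference transcript and the model's output, divided by the number of reference words. The following is a minimal sketch of that standard definition, not the scoring code used in the comparison; the function name `word_error_rate` and the lowercase, whitespace-split normalization are assumptions made for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by reference length.

    Assumption: words are compared after lowercasing and splitting on
    whitespace. Real evaluations often also strip punctuation.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion (reference word missing)
                curr[j - 1] + 1,                  # insertion (extra hypothesis word)
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    reference = "the quick brown fox"
    # Hypothetical example: a repeated word counts as one insertion.
    print(word_error_rate(reference, "the quick quick brown fox"))
```

Because every extra word counts as an insertion, duplicated text in a transcript raises WER even when each word was recognized correctly, which is one way a deduplication fix can move the score. The details of the actual bug are not described here, so the repeated-word example is purely illustrative.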
This marks a significant milestone: an open-source model (GLM) outperformed a frontier model (Opus) across multiple axes while offering cost-effectiveness, open weights, and intelligence competitive with the frontier.
The advancement in open-source models like GLM-5.2 enables new possibilities for secure, reliable data use at scale, with potential for more accessible pretraining and tuning in the future.
