GPT-OSS vs. Qwen3 and a detailed look at how things evolved since GPT-2
14 days ago
- #LLM
- #architecture
- #OpenAI
- OpenAI released new open-weight LLMs, gpt-oss-120b and gpt-oss-20b, their first open-weight models since GPT-2 in 2019.
- The architecture includes optimizations such as MXFP4 quantization of the MoE expert weights, which lets gpt-oss-20b run on a single ~16 GB GPU and gpt-oss-120b on a single 80 GB GPU (see the MXFP4 sketch after this list).
- Key architectural changes from GPT-2 include removing dropout, replacing learned absolute positional embeddings with RoPE, and swapping GELU for Swish/SwiGLU in the feed-forward blocks (SwiGLU sketch after this list).
- Mixture-of-Experts (MoE) layers replace the single feed-forward module, increasing total parameter count (capacity) while keeping inference efficient because only a few experts are active per token (routing sketch after this list).
- Grouped Query Attention (GQA) shrinks the KV cache by sharing key/value heads across groups of query heads, and sliding-window attention restricts each token to a local attention window; both cut compute and memory (GQA sketch after this list).
- RMSNorm replaces LayerNorm; dropping the mean-centering and bias terms makes it slightly cheaper to compute (sketch after this list).
- Compared with Qwen3, gpt-oss is wider but shallower, and it uses fewer, larger experts where Qwen3 uses many smaller ones.
- gpt-oss models support adjustable reasoning effort (low/medium/high), set via a line in the system prompt (prompt example after this list).
- Benchmarks show gpt-oss is competitive with proprietary models and Qwen3, despite being smaller.
- GPT-5 was released shortly after gpt-oss, with gpt-oss performing surprisingly well in comparison.
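
Below are small PyTorch sketches of the architecture points referenced above. They are illustrative only, not the actual gpt-oss implementation; names and dimensions are made up.

First, the MXFP4 idea: blocks of 32 FP4 (E2M1) values share a single power-of-two scale, so each weight costs a little over 4 bits on average. A rough dequantization sketch:

```python
import torch

# Illustrative only: the 16 values representable in FP4 E2M1 (sign in the high bit).
E2M1_VALUES = torch.tensor(
    [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0,
     -0.0, -0.5, -1.0, -1.5, -2.0, -3.0, -4.0, -6.0]
)

def dequantize_mxfp4_block(codes: torch.Tensor, shared_exponent: int) -> torch.Tensor:
    """codes: 32 four-bit indices into the E2M1 table; one shared power-of-two scale per block."""
    return E2M1_VALUES[codes.long()] * (2.0 ** shared_exponent)
```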
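The GELU-to-SwiGLU change is easiest to see in code. A minimal sketch; the module and layer names are mine, not gpt-oss's:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SwiGLUFeedForward(nn.Module):
    """Gated feed-forward block: silu(x @ W_gate) * (x @ W_up), projected back down."""
    def __init__(self, d_model: int, d_hidden: int):
        super().__init__()
        self.w_gate = nn.Linear(d_model, d_hidden, bias=False)
        self.w_up = nn.Linear(d_model, d_hidden, bias=False)
        self.w_down = nn.Linear(d_hidden, d_model, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Swish/SiLU acts as a learned gate on the second linear projection.
        return self.w_down(F.silu(self.w_gate(x)) * self.w_up(x))
```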
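The MoE bullet boils down to routing each token to a small subset of expert feed-forward blocks. A naive sketch, continuing the same file and reusing the SwiGLUFeedForward class above (real implementations batch tokens per expert instead of looping):

```python
class SparseMoE(nn.Module):
    """A router picks top-k experts per token; only those experts run for that token."""
    def __init__(self, d_model: int, d_hidden: int, num_experts: int, top_k: int):
        super().__init__()
        self.router = nn.Linear(d_model, num_experts, bias=False)
        self.experts = nn.ModuleList(
            [SwiGLUFeedForward(d_model, d_hidden) for _ in range(num_experts)]
        )
        self.top_k = top_k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        scores = self.router(x)                                # (batch, seq, num_experts)
        weights, idx = torch.topk(scores, self.top_k, dim=-1)  # keep only the top-k experts
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[..., slot] == e                     # tokens whose slot-th pick is expert e
                if mask.any():
                    out[mask] += weights[..., slot][mask].unsqueeze(-1) * expert(x[mask])
        return out
```

Total parameters grow with the number of experts, but per-token compute only grows with `top_k`, which is the capacity-vs-inference-cost trade-off the bullet describes.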
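Grouped Query Attention is also compact in code: queries keep all heads, while keys and values use fewer heads shared across groups of query heads, which shrinks the KV cache. A minimal causal-attention sketch in the same file, assuming PyTorch 2.x (sliding-window masking omitted for brevity):

```python
class GroupedQueryAttention(nn.Module):
    """num_kv_heads < num_heads: each K/V head is shared by a group of query heads."""
    def __init__(self, d_model: int, num_heads: int, num_kv_heads: int):
        super().__init__()
        assert num_heads % num_kv_heads == 0
        self.num_heads, self.num_kv_heads = num_heads, num_kv_heads
        self.head_dim = d_model // num_heads
        self.wq = nn.Linear(d_model, num_heads * self.head_dim, bias=False)
        self.wk = nn.Linear(d_model, num_kv_heads * self.head_dim, bias=False)
        self.wv = nn.Linear(d_model, num_kv_heads * self.head_dim, bias=False)
        self.wo = nn.Linear(num_heads * self.head_dim, d_model, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, t, _ = x.shape
        q = self.wq(x).view(b, t, self.num_heads, self.head_dim).transpose(1, 2)
        k = self.wk(x).view(b, t, self.num_kv_heads, self.head_dim).transpose(1, 2)
        v = self.wv(x).view(b, t, self.num_kv_heads, self.head_dim).transpose(1, 2)
        # Only num_kv_heads K/V projections are cached; expand them to match the query heads.
        repeat = self.num_heads // self.num_kv_heads
        k = k.repeat_interleave(repeat, dim=1)
        v = v.repeat_interleave(repeat, dim=1)
        out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        return self.wo(out.transpose(1, 2).reshape(b, t, -1))
```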
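Finally, the RMSNorm swap: unlike LayerNorm, it skips mean subtraction and the bias term, normalizing only by the root-mean-square of the features. A minimal version, same imports as above:

```python
class RMSNorm(nn.Module):
    """Scale-only normalization: divide by the RMS of the features, then apply a learned gain."""
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(dim))
        self.eps = eps

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        rms = torch.sqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return self.weight * (x / rms)
```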
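And the reasoning-effort bullet in practice: the effort level is just a line in the system message that the serving stack renders into the model's chat template. The wording below is a plausible example, not the verbatim gpt-oss template:

```python
# Hypothetical chat request: the "Reasoning: high" line in the system message
# asks the model for a longer chain of thought before it answers.
messages = [
    {"role": "system", "content": "You are a helpful assistant. Reasoning: high"},
    {"role": "user", "content": "How many primes are there below 100?"},
]
```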