OpenAI's new open weight (Apache 2) models are good
- #AI
- #Machine Learning
- #OpenAI
- OpenAI released two new open weight models under the Apache 2.0 license: gpt-oss-120b and gpt-oss-20b.
- gpt-oss-120b achieves near-parity with the proprietary o4-mini on reasoning benchmarks while running on a single 80GB GPU.
- gpt-oss-20b matches o3-mini performance and is suitable for edge devices with 16GB of memory.
- Both models use a mixture-of-experts architecture, activating 5.1B and 3.6B parameters per token respectively (see the routing sketch after this list).
- Models perform well on PhD-level science questions (GPQA Diamond benchmark).
- gpt-oss-20b runs efficiently on a Mac with 32GB of RAM, using ~12GB for inference (see the local-inference example after this list).
- Models support reasoning levels (low, medium, high) that trade speed against accuracy; the same example shows one way to set this.
- OpenAI Harmony introduced as a new prompt template format with system, developer, user, assistant, and tool roles; a raw-template sketch follows this list.
- Models trained on trillions of tokens focused on STEM, coding, and general knowledge, with safety filtering applied to the training data.
- Training costs estimated at $4.2M to $23.1M for gpt-oss-120b and $420K to $2.3M for gpt-oss-20b.
- Models support tool calling for web browsing, Python execution, and developer-defined functions (see the tool-calling sketch below).
- Competitive with recent Chinese open models (Qwen, Moonshot, Z.ai), potentially surpassing them.
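
To make the mixture-of-experts point concrete, here's a minimal top-k routing sketch in numpy. The expert count, width, and `top_k` below are illustrative toys, not the actual gpt-oss configuration.

```python
import numpy as np

def moe_layer(x, gate_w, experts, top_k=4):
    """Route one token's hidden state x through its top-k experts."""
    logits = x @ gate_w                        # router score for each expert
    top = np.argsort(logits)[-top_k:]          # indices of the k highest scores
    weights = np.exp(logits[top] - logits[top].max())
    weights /= weights.sum()                   # softmax over the selected experts
    # Only the chosen experts execute, which is why active parameters
    # per token stay far below the total parameter count.
    return sum(w * experts[i](x) for w, i in zip(weights, top))

# Toy configuration: 8 experts of width 16, 4 active per token.
rng = np.random.default_rng(0)
d_model, n_experts = 16, 8
experts = [lambda v, W=rng.normal(size=(d_model, d_model)): v @ W
           for _ in range(n_experts)]
gate_w = rng.normal(size=(d_model, n_experts))
out = moe_layer(rng.normal(size=d_model), gate_w, experts)
print(out.shape)  # (16,)
```

The point of the sparse sum is that per-token compute scales with `top_k`, not with the total number of experts.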
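For running gpt-oss-20b locally, one approach is any runner that exposes an OpenAI-compatible endpoint, Ollama being a common choice. Treat the model tag, port, and reasoning-level handling here as assumptions about that particular setup.

```python
# Assumes a local runner exposing an OpenAI-compatible endpoint, e.g.
# Ollama on its default port; the model tag "gpt-oss:20b" and the
# "Reasoning: high" system line are assumptions about that setup.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

resp = client.chat.completions.create(
    model="gpt-oss:20b",
    messages=[
        # gpt-oss reads its reasoning level from the system prompt;
        # exact handling varies by runner.
        {"role": "system", "content": "Reasoning: high"},
        {"role": "user", "content": "Summarize mixture-of-experts in two sentences."},
    ],
)
print(resp.choices[0].message.content)
```

Dropping the level to `Reasoning: low` trades some benchmark accuracy for noticeably faster responses.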
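Here's a rough sketch of what a rendered Harmony prompt looks like, showing the role and channel structure. I'm reconstructing the token strings from the published format, so verify against the openai-harmony reference before depending on them.

```python
# Illustrative reconstruction of a rendered Harmony prompt. The role and
# channel tokens follow the published format as I understand it; check the
# openai-harmony spec before treating any exact string as normative.
prompt = (
    "<|start|>system<|message|>You are a helpful assistant.\n"
    "Reasoning: medium\n"
    "# Valid channels: analysis, commentary, final.<|end|>"
    "<|start|>developer<|message|># Instructions\n"
    "Answer concisely.<|end|>"
    "<|start|>user<|message|>What is 2 + 2?<|end|>"
    "<|start|>assistant"
)
# The model then answers on channels: chain of thought goes to
# `analysis`, tool traffic to `commentary`, and the user-visible
# reply to `final`, e.g.:
#   <|channel|>analysis<|message|>...<|end|>
#   <|start|>assistant<|channel|>final<|message|>4<|return|>
print(prompt)
```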
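Developer-defined tool calling, at least through an OpenAI-compatible runner, uses the familiar tools schema. The `get_weather` function below is hypothetical, and whether a given local runner surfaces gpt-oss tool calls this way is an assumption.

```python
# Same assumed local endpoint as above; get_weather is a hypothetical
# developer-defined function, described with the standard chat-completions
# tools schema.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

resp = client.chat.completions.create(
    model="gpt-oss:20b",
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=tools,
)

msg = resp.choices[0].message
if msg.tool_calls:  # the model asked us to run a function
    call = msg.tool_calls[0]
    print(call.function.name, call.function.arguments)
else:               # the model answered directly
    print(msg.content)
```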