OpenAI's new open-source model is basically Phi-5
17 days ago
- #Synthetic Data
- #AI Safety
- #OpenAI
- OpenAI released its first open-source large language models, gpt-oss-120b and gpt-oss-20b, with mixed performance on benchmarks.
- The models excel on some benchmarks but underperform on others, such as SimpleQA, and show little out-of-domain knowledge.
- Microsoft's Phi-series models, developed under Sébastien Bubeck, were trained largely on synthetic data; they performed well on benchmarks but poorly on real-world tasks.
- Synthetic data offers control over training content, making models safer but potentially less versatile.
- OpenAI likely adopted synthetic data for safety, ensuring the open-source models avoid subversive behavior while still performing well on benchmarks.
- OpenAI's main business remains closed-source models, reducing the need for its open-source models to excel at real-world applications.
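The control that synthetic data affords can be sketched in miniature: instead of scraping the web, the trainer samples from a curated fact bank and filters anything touching excluded topics. Everything below is a hypothetical toy illustration, not OpenAI's or Microsoft's actual pipeline; the fact bank, blocklist, and function names are invented for the sketch.

```python
import random

# Toy fact bank standing in for a curated source of synthetic QA pairs.
FACTS = [
    ("What is the capital of France?", "Paris"),
    ("What is 2 + 2?", "4"),
    ("Who wrote 'Hamlet'?", "William Shakespeare"),
]

# Topics the curator chooses to exclude entirely (hypothetical blocklist).
BLOCKLIST = {"exploit", "malware"}

def generate_examples(n, seed=0):
    """Sample n question-answer pairs from the curated fact bank."""
    rng = random.Random(seed)
    return [rng.choice(FACTS) for _ in range(n)]

def is_safe(example):
    """Reject any example whose text mentions a blocked topic."""
    text = " ".join(example).lower()
    return not any(term in text for term in BLOCKLIST)

# Because every candidate comes from the curated bank, the trainer knows
# exactly what the corpus can and cannot contain.
corpus = [ex for ex in generate_examples(5) if is_safe(ex)]
```

The point of the sketch is the trade-off from the bullets above: total control over corpus content (safety) at the cost of the breadth that web-scale data would provide (versatility).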