Arcee Trinity Mini: US-Trained MoE Model
9 days ago
- #AI
- #Open Source
- #Machine Learning
- Arcee's MergeKit returns to the GNU Lesser General Public License v3, effective October 31, 2025.
- Arcee introduces Trinity Mini, a compact MoE model trained in the U.S., offering open weights and strong reasoning.
- Chinese labs such as Qwen and DeepSeek currently lead in open-weight MoE models.
- With the Trinity family, Arcee AI aims to provide open-weight models trained end-to-end in the United States.
- Trinity Nano and Mini are available now; Trinity Large is still in training and will arrive in January 2026.
- Trinity Mini is a fully post-trained reasoning model, while Trinity Nano is an experimental chat model.
- Arcee shifted from post-training open base models to training its own foundation models, enabling long-term improvements.
- AFM-4.5B was Arcee's initial dense-model experiment and led to the development of Trinity.
- Trinity uses the afmoe architecture with gated attention, the Muon optimizer, and a U.S.-controlled data pipeline.
- The architecture combines grouped-query attention, gated attention, and interleaved local/global attention patterns (see the attention sketch after this list).
- MoE layers follow the DeepSeekMoE design, with 128 routed experts and 8 active per token (a routing sketch follows this list).
- Training uses the Muon optimizer with TorchTitan in bf16 precision, following a curriculum of 10T tokens across three phases (a simplified Muon update sketch follows this list).
- Trinity Large is a 420B-parameter model with 13B active parameters per token.
- Arcee encourages the community to test and provide feedback on Trinity models to shape future developments.
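To make the attention bullet concrete, here is a minimal PyTorch sketch of grouped-query attention with a sigmoid output gate, interleaving local (sliding-window) and global layers. All dimensions, the 3:1 local-to-global ratio, the window size, and the gate placement are illustrative assumptions, not Trinity's published configuration.

```python
# Sketch of gated grouped-query attention with interleaved local/global layers.
# Sizes, window, and gating placement are assumptions for illustration only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GatedGQAttention(nn.Module):
    def __init__(self, d_model=512, n_heads=8, n_kv_heads=2, window=None):
        super().__init__()
        self.n_heads, self.n_kv_heads = n_heads, n_kv_heads
        self.head_dim = d_model // n_heads
        self.window = window  # None = global attention, else sliding-window size
        self.q_proj = nn.Linear(d_model, n_heads * self.head_dim, bias=False)
        self.kv_proj = nn.Linear(d_model, 2 * n_kv_heads * self.head_dim, bias=False)
        self.gate_proj = nn.Linear(d_model, n_heads * self.head_dim, bias=False)
        self.o_proj = nn.Linear(n_heads * self.head_dim, d_model, bias=False)

    def forward(self, x):
        B, T, _ = x.shape
        q = self.q_proj(x).view(B, T, self.n_heads, self.head_dim).transpose(1, 2)
        k, v = self.kv_proj(x).view(B, T, 2, self.n_kv_heads, self.head_dim).permute(2, 0, 3, 1, 4)
        # Grouped-query attention: each KV head serves n_heads // n_kv_heads query heads.
        k = k.repeat_interleave(self.n_heads // self.n_kv_heads, dim=1)
        v = v.repeat_interleave(self.n_heads // self.n_kv_heads, dim=1)
        # Causal mask, optionally restricted to a local window.
        i, j = torch.arange(T).unsqueeze(1), torch.arange(T).unsqueeze(0)
        mask = j <= i
        if self.window is not None:
            mask &= (i - j) < self.window
        attn = F.scaled_dot_product_attention(q, k, v, attn_mask=mask.to(x.device))
        attn = attn.transpose(1, 2).reshape(B, T, -1)
        # Gated attention: a learned sigmoid gate modulates the attention output.
        return self.o_proj(attn * torch.sigmoid(self.gate_proj(x)))

# Interleave local and global layers, e.g. three local layers per global layer.
layers = nn.ModuleList([
    GatedGQAttention(window=None if (i + 1) % 4 == 0 else 128) for i in range(8)
])
x = torch.randn(2, 64, 512)
for layer in layers:
    x = x + layer(x)  # residual connection around each attention block
```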
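For the MoE bullet, the sketch below shows DeepSeekMoE-style routing with 128 routed experts and 8 active per token, plus a shared expert that every token passes through. The expert hidden size, the single shared expert, and softmax normalization over the selected experts are assumptions for illustration, not Trinity's published configuration.

```python
# Sketch of top-k expert routing in the spirit of DeepSeekMoE: 128 routed
# experts, 8 active per token, one shared expert. Sizes are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoELayer(nn.Module):
    def __init__(self, d_model=512, d_expert=128, n_experts=128, top_k=8):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_expert), nn.SiLU(), nn.Linear(d_expert, d_model))
            for _ in range(n_experts)
        ])
        self.shared_expert = nn.Sequential(
            nn.Linear(d_model, d_expert), nn.SiLU(), nn.Linear(d_expert, d_model)
        )

    def forward(self, x):                        # x: (tokens, d_model)
        scores = self.router(x)                  # (tokens, n_experts) routing logits
        top_vals, top_idx = scores.topk(self.top_k, dim=-1)
        gates = F.softmax(top_vals, dim=-1)      # weights over the 8 selected experts
        shared_out = self.shared_expert(x)       # shared expert sees every token
        routed_out = torch.zeros_like(x)
        for slot in range(self.top_k):           # dispatch each of the k routing slots
            for e in top_idx[:, slot].unique().tolist():
                rows = top_idx[:, slot] == e
                routed_out[rows] += gates[rows, slot].unsqueeze(-1) * self.experts[e](x[rows])
        return shared_out + routed_out

tokens = torch.randn(16, 512)                    # 16 tokens, d_model = 512
print(MoELayer()(tokens).shape)                  # torch.Size([16, 512])
```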
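Finally, for the training bullet, this sketch shows the core Muon idea: accumulate a momentum matrix per 2-D weight and orthogonalize it with a few Newton-Schulz iterations before applying the update. The coefficients follow the widely circulated reference implementation; the learning rate, momentum value, and how this is wired into TorchTitan are assumptions, not details published by Arcee.

```python
# Simplified Muon-style update: orthogonalize the momentum matrix via
# Newton-Schulz iterations. Hyperparameters here are illustrative only.
import torch

def newton_schulz_orthogonalize(g, steps=5, eps=1e-7):
    """Approximately map g to the nearest (semi-)orthogonal matrix."""
    a, b, c = 3.4445, -4.7750, 2.0315            # quintic iteration coefficients
    x = g / (g.norm() + eps)                     # normalize so the iteration converges
    if g.shape[0] > g.shape[1]:
        x = x.T                                  # iterate on the short side
    for _ in range(steps):
        s = x @ x.T
        x = a * x + (b * s + c * s @ s) @ x
    return x.T if g.shape[0] > g.shape[1] else x

def muon_step(weight, grad, momentum_buf, lr=0.02, beta=0.95):
    """One simplified Muon update for a single 2-D weight matrix (no weight decay)."""
    momentum_buf.mul_(beta).add_(grad)           # classic momentum accumulation
    update = newton_schulz_orthogonalize(momentum_buf)
    weight.add_(update, alpha=-lr)

w = torch.randn(256, 512)
buf = torch.zeros_like(w)
muon_step(w, torch.randn_like(w), buf)
```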