Hasty Briefs

Trinity Large: An open 400B sparse MoE model

  • #AI
  • #Machine Learning
  • #MoE
  • Arcee introduces Trinity Large, an MoE model trained end-to-end in the U.S., offering open weights, strong reasoning, and full control for developers.
  • Trinity Large is a 400B-parameter sparse MoE with 13B active parameters per token, using 256 experts with 4 active per token (see the routing sketch after this list).
  • Three variants of Trinity-Large are being released: Preview (lightly post-trained and chat-ready), Base (best pretraining checkpoint), and TrueBase (early checkpoint without instruct data).
  • Trinity-Large-Base matches or exceeds peer open base models on benchmarks spanning math, coding, scientific reasoning, and knowledge absorption.
  • Pretraining ran on 2048 Nvidia B300 GPUs and finished in 33 days, one of the fastest runs for a model of this scale (a rough throughput estimate follows this list).
  • The dataset includes 17T tokens curated by DatologyAI, with over 8T tokens of synthetic data across web, code, math, reasoning, and multilingual domains.
  • Trinity-Large-Preview excels at creative writing, storytelling, role-play, chat scenarios, and real-time voice assistance, and is free on OpenRouter during the preview period (a sample API call follows this list).
  • Trinity-Large-TrueBase offers a pure pretraining checkpoint without instruction data, ideal for researchers studying high-quality pretraining.
  • The entire effort cost $20 million, a fraction of what frontier labs typically spend.
  • Trinity Large natively supports 512k context; the preview API runs at 128k context with 8-bit quantization.
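
To make the sparse-MoE figures concrete, here is a minimal sketch of top-k expert routing with the announced counts (256 experts, 4 active per token). The gate design, hidden size, and all names are illustrative assumptions, not Arcee's actual implementation.

```python
# Minimal top-k MoE routing sketch (illustrative; not Trinity's actual code).
import torch
import torch.nn.functional as F

NUM_EXPERTS = 256   # experts per MoE layer, as announced
TOP_K = 4           # experts activated per token, as announced
D_MODEL = 1024      # hypothetical hidden size for this sketch

class TopKRouter(torch.nn.Module):
    def __init__(self, d_model: int, num_experts: int, top_k: int):
        super().__init__()
        self.gate = torch.nn.Linear(d_model, num_experts, bias=False)
        self.top_k = top_k

    def forward(self, x: torch.Tensor):
        logits = self.gate(x)                          # (tokens, num_experts)
        top_vals, top_idx = logits.topk(self.top_k, dim=-1)
        weights = F.softmax(top_vals, dim=-1)          # normalize over the chosen experts only
        return top_idx, weights                        # which experts to run, and their mixing weights

router = TopKRouter(D_MODEL, NUM_EXPERTS, TOP_K)
tokens = torch.randn(8, D_MODEL)                       # a batch of 8 token embeddings
expert_idx, expert_w = router(tokens)
print(expert_idx.shape, expert_w.shape)                # torch.Size([8, 4]) torch.Size([8, 4])
```

Because only the 4 selected experts execute for each token, a 400B-parameter model can decode with roughly 13B active parameters.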
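
The training-speed claim can be put in rough perspective with a back-of-the-envelope calculation. It assumes the run consumed approximately the full 17T-token dataset, which the post reports separately rather than as a per-run figure, so treat the numbers as an estimate.

```python
# Rough throughput estimate for the 33-day, 2048-GPU pretraining run.
# Assumes ~17T tokens were consumed in that run (an assumption, not a reported number).
tokens = 17e12
gpus = 2048
days = 33

gpu_days = gpus * days
tokens_per_gpu_day = tokens / gpu_days
tokens_per_gpu_sec = tokens_per_gpu_day / 86_400

print(f"{gpu_days:,.0f} GPU-days")                  # ~67,584 GPU-days
print(f"{tokens_per_gpu_day:,.0f} tokens/GPU-day")  # ~252 million
print(f"{tokens_per_gpu_sec:,.0f} tokens/GPU-sec")  # ~2,900
```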
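
For trying Trinity-Large-Preview during the free period, OpenRouter exposes an OpenAI-compatible chat completions endpoint. The model slug below is an assumption; check OpenRouter's model listing for the exact identifier.

```python
# Minimal sketch of a chat request to the preview via OpenRouter.
import os
import requests

resp = requests.post(
    "https://openrouter.ai/api/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"},
    json={
        "model": "arcee-ai/trinity-large-preview",  # assumed slug; confirm on OpenRouter
        "messages": [{"role": "user", "content": "Outline a short story in three beats."}],
        "max_tokens": 512,
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```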