Hasty Briefs (beta)

Trinity-Large-Thinking: Open-source 398B MoE (13B active) for agentic tasks

a day ago
  • #Open Source AI
  • #AI Agents
  • #Mixture-of-Experts
  • Trinity-Large-Thinking is a Mixture-of-Experts (MoE) model with 398B total parameters but only 13B active per token during inference, so it runs at roughly the cost of a 13B dense model while drawing on the capacity of the full parameter set.
  • It was pretrained on 17 trillion tokens and specifically post-trained on agentic tasks like tool-calling trajectories and multi-step reasoning, integrating reasoning and tool use from the start.
  • The model maintains thinking tokens across the entire agent loop, preserving reasoning traces in context to inform later decisions rather than resetting memory between steps, which is key for multi-step tasks.
  • Benchmarks show Trinity excels in agentic scenarios, outperforming models like Opus 4.6 on Tau2-Airline (88.0 vs. 82.0) and Tau2-Telecom (94.7 vs. 92.1), and scoring 98.2 on LiveCodeBench for coding tasks.
  • Trinity requires significant infrastructure and is not for consumer GPUs; it's accessible via the OpenRouter API or vLLM for custom deployments, with the caveat that thinking blocks must be preserved in context for the model to operate effectively.
  • It is best suited for production agent systems (e.g., OpenClaw or Hermes Agent integration), not for general-purpose tasks, as it is specifically designed for multi-step agentic workflows.
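The sparse-activation arithmetic in the first bullet (13B active out of 398B total, about 3.3% per token) comes from top-k expert routing: the router scores every expert, but only the top few actually run for each token. A minimal sketch in Python, with a hypothetical expert count and top-k value (the summary does not give Trinity's real router configuration):

```python
import math

def route_token(router_logits, k):
    """Pick the top-k experts for one token and return softmax-normalized
    weights over just those experts (standard top-k MoE routing)."""
    topk = sorted(range(len(router_logits)),
                  key=lambda i: router_logits[i], reverse=True)[:k]
    exps = [math.exp(router_logits[i]) for i in topk]
    total = sum(exps)
    return {i: e / total for i, e in zip(topk, exps)}

# Hypothetical configuration: 128 experts per MoE layer, 2 routed per token.
# Only the chosen experts' weights are touched at inference, which is why a
# 398B-total model can run with roughly 13B active parameters per token.
weights = route_token([0.1, 2.0, -1.0, 1.5] + [0.0] * 124, k=2)
active_fraction = 13 / 398  # about 3.3% of total parameters per token
```

The same routing happens independently at every MoE layer, so different tokens exercise different slices of the full parameter set.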
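The point about preserving thinking blocks can be made concrete: the agent loop appends every assistant turn, including its reasoning trace, back into the running message list instead of stripping it before the next step. A minimal sketch, where the message schema and the `thinking` field are illustrative assumptions, not Trinity's actual API:

```python
def run_agent_step(messages, model_reply, tool_result=None):
    """Append an assistant reply (with its thinking trace intact) and any
    tool result to the running context. The key rule: never drop the
    'thinking' field between steps, so later decisions can see earlier
    reasoning."""
    messages.append({
        "role": "assistant",
        "thinking": model_reply["thinking"],  # preserved, not stripped
        "content": model_reply["content"],
    })
    if tool_result is not None:
        messages.append({"role": "tool", "content": tool_result})
    return messages

history = [{"role": "user", "content": "Rebook my flight."}]
history = run_agent_step(
    history,
    {"thinking": "Need the booking ID first; call lookup_booking.",
     "content": "<tool_call>lookup_booking</tool_call>"},
    tool_result='{"booking_id": "AB123"}',
)
# Every prior thinking trace is still in context for the next model call.
```

A loop that instead rebuilds context from only the final answers would reset the model's working memory at each step, which is exactly what the summary warns against.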
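Access via OpenRouter follows the familiar OpenAI-compatible chat-completions shape; a sketch of building such a request payload (the model slug here is a guess, and actually sending it requires POSTing to OpenRouter's chat-completions endpoint with a real API key):

```python
import json

def build_request(messages, model="trinity-large-thinking"):  # slug assumed
    """Build an OpenAI-compatible chat-completions payload. To run it,
    POST the JSON to OpenRouter's /api/v1/chat/completions endpoint with
    an Authorization: Bearer <key> header."""
    return json.dumps({"model": model, "messages": messages})

payload = build_request(
    [{"role": "user", "content": "Plan the next tool call."}]
)
```

The same messages list from the agent loop, thinking blocks included, is what goes into the `messages` field on every call.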