DeepSeek-TNG-R1T2-Chimera

  • #DeepSeek
  • #AI
  • #Chimera
  • DeepSeek-TNG-R1T2-Chimera is a new model and the successor to the original DeepSeek R1T Chimera.
  • It is constructed with the Assembly of Experts method from three parent models: R1-0528, R1, and V3-0324 (a weight-merge sketch follows this list).
  • Fixes the <think> token consistency issue present in the original R1T Chimera.
  • Hits a new sweet spot in the trade-off between intelligence and output token length: about 20% faster than R1 (it emits fewer output tokens) while scoring higher on benchmarks such as GPQA and AIME-24.
  • Recommended as a drop-in replacement for R1 and a cheaper alternative to R1-0528; generally preferred over R1T unless R1T's specific traits are needed (see the API sketch below).
  • Not recommended for function-calling-intensive applications, a limitation inherited from the R1 parent model.
  • Benchmarking now includes AIME-24, AIME-25, and GPQA Diamond, which show larger score differences between R1 and R1T Chimera.
  • The architecture is DeepSeek-MoE, a transformer-based mixture-of-experts language model; the model was released on 2025-07-02.
  • Users in the EU are advised to comply with the EU AI Act guidelines effective August 2nd, 2025, or cease using the model.
  • Feedback is encouraged via email or X.com, and citation details are provided for academic use.
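
The Assembly of Experts construction referenced above is, at its core, a per-tensor weighted merge of same-architecture parents in weight space, with no additional gradient training. The sketch below illustrates that idea only: the tensor selection and coefficients are made-up placeholders, not TNG's published recipe, and the toy tensors stand in for the real DeepSeek checkpoints.

```python
import torch

def assemble_experts(parents, coeffs):
    """Per-tensor weighted merge of same-architecture parent models.

    parents: {parent_name: state_dict}, identical tensor names and shapes.
    coeffs:  {tensor_name: {parent_name: weight}}, weights summing to 1.0.
    """
    child = {}
    for name in next(iter(parents.values())):
        mix = coeffs[name]
        child[name] = sum(w * parents[p][name] for p, w in mix.items())
    return child

# Toy stand-ins for the three parents; in reality these are the full
# DeepSeek checkpoints loaded tensor by tensor.
shape = (4, 4)
parents = {
    "R1-0528": {"expert.w": torch.randn(shape), "attn.w": torch.randn(shape)},
    "R1":      {"expert.w": torch.randn(shape), "attn.w": torch.randn(shape)},
    "V3-0324": {"expert.w": torch.randn(shape), "attn.w": torch.randn(shape)},
}
# Hypothetical coefficients: routed-expert tensors lean on the reasoning
# parents, everything else stays on the V3-0324 base.
coeffs = {
    "expert.w": {"R1-0528": 0.6, "R1": 0.4},
    "attn.w":   {"V3-0324": 1.0},
}
child = assemble_experts(parents, coeffs)
print({name: t.shape for name, t in child.items()})
```

Because the merge is purely arithmetic on weights, building a child like R1T2 is cheap compared with fine-tuning, and each tensor's behavior tracks whichever parent dominates its coefficients.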
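Since R1T2 is billed as a drop-in replacement for R1, migrating usually amounts to changing the model string on an OpenAI-compatible endpoint. A minimal sketch with the openai Python client, assuming a hypothetical provider URL; the Hugging Face-style model id below is the likely identifier, but check your provider's catalog.

```python
from openai import OpenAI

# Hypothetical endpoint and key: substitute whatever OpenAI-compatible
# provider hosts the checkpoint.
client = OpenAI(base_url="https://example-provider.com/v1", api_key="YOUR_KEY")

resp = client.chat.completions.create(
    model="tngtech/DeepSeek-TNG-R1T2-Chimera",  # was e.g. "deepseek-ai/DeepSeek-R1"
    messages=[{"role": "user", "content": "Why is the sky blue?"}],
)

text = resp.choices[0].message.content
# R1T2 consistently wraps its reasoning in <think>...</think>, so the
# final answer can be split off deterministically.
answer = text.split("</think>")[-1].strip()
print(answer)
```

The `</think>` split also shows the token-consistency fix in practice: unlike the original R1T Chimera, R1T2 reliably emits and closes its reasoning block, so the final answer can be extracted deterministically.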