Ministral 3 – pruning via Cascade Distillation
4 months ago
- #language-models
- #machine-learning
- #distillation
- Introduces the Ministral 3 series, a family of parameter-efficient dense language models for compute- and memory-constrained applications.
- Available in three sizes (3B, 8B, and 14B parameters), each in three variants: base, instruction-finetuned, and reasoning.
- Derives the models via Cascade Distillation, a technique that iteratively alternates pruning with continued training under a distillation objective.
- Includes image understanding capabilities and is released under the Apache 2.0 license.
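The prune-then-recover loop behind Cascade Distillation can be sketched in miniature. The note above does not give the actual recipe, so everything here is an illustrative assumption: magnitude pruning as the pruning criterion, a temperature-scaled KL divergence as the distillation loss, and a random linear map standing in for the teacher model. Only the loop structure (prune harder each stage, then train the remaining weights against the teacher) reflects the described technique.

```python
import numpy as np

def magnitude_prune(w, frac):
    """Zero out the smallest-magnitude `frac` of the entries of w."""
    k = int(w.size * frac)
    pruned = w.copy()
    if k == 0:
        return pruned
    thresh = np.sort(np.abs(w), axis=None)[k - 1]
    pruned[np.abs(pruned) <= thresh] = 0.0
    return pruned

def softmax(z, t=1.0):
    z = z / t - (z / t).max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def kl_distill_loss(teacher_logits, student_logits, t=2.0):
    """Mean KL(teacher || student) at temperature t, a standard distillation loss."""
    p = softmax(teacher_logits, t)
    q = softmax(student_logits, t)
    return float((p * (np.log(p + 1e-12) - np.log(q + 1e-12))).sum(axis=-1).mean())

rng = np.random.default_rng(0)
x = rng.normal(size=(64, 16))           # toy input batch
w_teacher = rng.normal(size=(16, 8))    # stand-in "teacher" weights (hypothetical)
w_student = w_teacher.copy()            # student starts as a copy of the teacher
t_logits = x @ w_teacher

for frac in (0.25, 0.5, 0.75):          # cascade: prune harder at each stage
    w_student = magnitude_prune(w_student, frac)
    kl_after_prune = kl_distill_loss(t_logits, x @ w_student)
    for _ in range(300):                # continued training with distillation
        q = softmax(x @ w_student, 2.0)
        p = softmax(t_logits, 2.0)
        grad = x.T @ (q - p) / len(x)   # gradient step toward teacher outputs
        grad[w_student == 0] = 0.0      # pruned weights stay pruned
        w_student -= 0.2 * grad
    kl_final = kl_distill_loss(t_logits, x @ w_student)

print(f"sparsity={np.mean(w_student == 0):.2f}  "
      f"KL after final prune={kl_after_prune:.4f}  after recovery={kl_final:.4f}")
```

Each stage recovers much of the KL gap the pruning step opened before the next, harsher prune, which is the point of cascading the prunes rather than pruning to the target sparsity in one shot.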