Hasty Briefs (beta)

Ministral 3 – pruning via Cascade Distillation

4 months ago
  • #language-models
  • #machine-learning
  • #distillation
  • Introduces the Ministral 3 series, a family of parameter-efficient dense language models for compute- and memory-constrained applications.
  • Available in three sizes: 3B, 8B, and 14B parameters, each with three variants: base, instruction-finetuned, and reasoning.
  • Derives the models via Cascade Distillation, a technique that iteratively prunes a larger model and continues training it with distillation from the original.
  • Includes image understanding capabilities and is released under the Apache 2.0 license.
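The general prune-then-distill loop behind a technique like Cascade Distillation can be illustrated with a toy model. The sketch below is an assumption-laden illustration, not Mistral's actual method: it uses a single-layer softmax classifier in NumPy, magnitude pruning in several increasing-sparsity stages, and after each stage trains the pruned student to match the teacher's soft outputs. The function names (`magnitude_mask`, `distill`, `cascade_distill`) and the sparsity schedule are invented for this example.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def magnitude_mask(W, sparsity):
    """Boolean mask keeping the largest-magnitude (1 - sparsity) fraction of weights."""
    k = int(sparsity * W.size)
    if k == 0:
        return np.ones_like(W, dtype=bool)
    thresh = np.sort(np.abs(W), axis=None)[k - 1]
    return np.abs(W) > thresh

def distill(W_s, mask, W_t, X, steps=300, lr=0.2):
    """Train the pruned student to match the teacher's soft targets.

    Minimizes cross-entropy between student and teacher softmax outputs;
    pruned weights are forced back to zero after every update.
    """
    P_t = softmax(X @ W_t)
    for _ in range(steps):
        P_s = softmax(X @ W_s)
        grad = X.T @ (P_s - P_t) / len(X)  # gradient of soft-target CE
        W_s = (W_s - lr * grad) * mask
    return W_s

def cascade_distill(W_t, X, sparsities=(0.3, 0.5, 0.7)):
    """Iteratively prune a bit more, then recover accuracy by distillation."""
    W_s = W_t.copy()
    for s in sparsities:
        mask = magnitude_mask(W_s, s)
        W_s = distill(W_s * mask, mask, W_t, X)
    return W_s
```

The key property of the cascade is that each stage prunes only moderately and lets distillation repair the damage before the next, harsher pruning round, which tends to preserve quality better than one-shot pruning to the final sparsity.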