Ministral 3 – pruning via Cascade Distillation
4 months ago
- #language-models
- #machine-learning
- #distillation
- Introduces the Ministral 3 series, a family of parameter-efficient dense language models for compute- and memory-constrained applications.
- Available in three sizes (3B, 8B, and 14B parameters), each in three variants: base, instruction-finetuned, and reasoning.
- Derives the models via Cascade Distillation, a technique that iteratively alternates pruning with continued training under a distillation objective.
- Includes image understanding capabilities and is released under the Apache 2.0 license.
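The prune-then-recover loop behind Cascade Distillation can be sketched in miniature. The note above does not give the actual recipe, so everything here is an illustrative assumption: magnitude pruning as the pruning criterion, a temperature-scaled KL divergence as the distillation loss, and a random linear map standing in for the teacher model. Only the loop structure (prune harder each stage, then train the remaining weights against the teacher) reflects the described technique.

```python
import numpy as np

def magnitude_prune(w, frac):
    """Zero out the smallest-magnitude `frac` of the entries of w."""
    k = int(w.size * frac)
    pruned = w.copy()
    if k == 0:
        return pruned
    thresh = np.sort(np.abs(w), axis=None)[k - 1]
    pruned[np.abs(pruned) <= thresh] = 0.0
    return pruned

def softmax(z, t=1.0):
    z = z / t - (z / t).max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def kl_distill_loss(teacher_logits, student_logits, t=2.0):
    """Mean KL(teacher || student) at temperature t, a standard distillation loss."""
    p = softmax(teacher_logits, t)
    q = softmax(student_logits, t)
    return float((p * (np.log(p + 1e-12) - np.log(q + 1e-12))).sum(axis=-1).mean())

rng = np.random.default_rng(0)
x = rng.normal(size=(64, 16))           # toy input batch
w_teacher = rng.normal(size=(16, 8))    # stand-in "teacher" weights (hypothetical)
w_student = w_teacher.copy()            # student starts as a copy of the teacher
t_logits = x @ w_teacher

for frac in (0.25, 0.5, 0.75):          # cascade: prune harder at each stage
    w_student = magnitude_prune(w_student, frac)
    kl_after_prune = kl_distill_loss(t_logits, x @ w_student)
    for _ in range(300):                # continued training with distillation
        q = softmax(x @ w_student, 2.0)
        p = softmax(t_logits, 2.0)
        grad = x.T @ (q - p) / len(x)   # gradient step toward teacher outputs
        grad[w_student == 0] = 0.0      # pruned weights stay pruned
        w_student -= 0.2 * grad
    kl_final = kl_distill_loss(t_logits, x @ w_student)

print(f"sparsity={np.mean(w_student == 0):.2f}  "
      f"KL after final prune={kl_after_prune:.4f}  after recovery={kl_final:.4f}")
```

Each stage recovers much of the KL gap the pruning step opened before the next, harsher prune, which is the point of cascading the prunes rather than pruning to the target sparsity in one shot.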