Amália and the Future of European Portuguese LLMs
2 days ago
- #NLP
- #European Portuguese
- #Open Source
- AMÁLIA is a European Portuguese LLM project backed by a €5.5 million investment from the Portuguese government.
- The model builds on EuroLLM and aims to prioritize European Portuguese data, but only 5.5% of pre-training tokens are clearly from Portuguese sources.
- It outperforms models such as Qwen3-8B on some Portuguese benchmarks, but it is not fully open: its weights, training data, and training logs are not all openly available.
- New benchmarks like ALBA were created, but they may not adequately measure the model's knowledge about Portugal-specific topics.
- Critiques include the limited use of Portuguese data, the need for more transparency, and the need for a stronger focus on the model's intrinsic knowledge of Portugal.