Hasty Briefs (beta)

Amália and the Future of European Portuguese LLMs

2 days ago
  • #NLP
  • #European Portuguese
  • #Open Source
  • AMÁLIA is a European Portuguese LLM project backed by a €5.5 million investment from the Portuguese government.
  • The model builds on EuroLLM and aims to prioritize European Portuguese data, yet only 5.5% of its pre-training tokens verifiably come from Portuguese sources.
  • It outperforms models such as Qwen3-8B on some Portuguese benchmarks, but its weights, data, and training logs are not fully open.
  • New benchmarks such as ALBA were created, but they may not adequately measure the model's knowledge of Portugal-specific topics.
  • Critiques include the limited use of Portuguese data and calls for greater transparency and a stronger focus on intrinsic knowledge of Portugal.