
AMÁLIA is an open-source LLM for European Portuguese, focusing on data utilization and benchmarking in NLP.
AMÁLIA is a large language model (LLM) developed for European Portuguese, backed by a significant investment from the Portuguese government. The initiative aims to improve the representation of European Portuguese in natural language processing.
Despite its promising features, there are concerns about how much European Portuguese data was used in training: only 5.5% of the training tokens are clearly identified as European Portuguese. The article discusses the implications of this figure and the importance of transparency in the model's development.