Massively Multilingual Adaptation of Large Language Models Using Bilingual Translation Data
Shaoxiong Ji,
Zihao Li,
Jaakko Paavola,
Indraneil Paul,
Hengyu Luo,
Jörg Tiedemann
arXiv
[Paper] /
[GitHub] /
[Model] /
[Datasets]
|
Scaling Low-Resource MT via Synthetic Data Generation with LLMs
Ona de Gibert,
Joseph Attieh,
Teemu Vahtola,
Mikko Aulamo,
Zihao Li,
Tiancheng Hu,
Raúl Vázquez,
Jörg Tiedemann
arXiv
[Paper]
|
Rethinking Multilingual Continual Pretraining: Data Mixing for Adapting LLMs Across Languages and Resources
Zihao Li,
Shaoxiong Ji,
Hengyu Luo,
Jörg Tiedemann
arXiv
[Paper] /
[GitHub] /
[Models & Datasets]
|
GlotEval: A Test Suite for Massively Multilingual Evaluation of Large Language Models
Hengyu Luo,
Zihao Li,
Joseph Attieh*,
Sawal Devkota*,
Ona de Gibert*,
Shaoxiong Ji*,
Peiqin Lin*,
Bhavani Sai Praneeth Varma Mantina*,
Ananda Sreenidhi*,
Raúl Vázquez*,
Mengjie Wang*,
Samea Yusofi*,
Jörg Tiedemann
arXiv
[Paper] /
[GitHub]
|
EMMA-500: Enhancing Massively Multilingual Adaptation of Large Language Models
Shaoxiong Ji,
Zihao Li,
Indraneil Paul,
Jaakko Paavola,
Peiqin Lin,
Pinzhen Chen,
Dayyán O'Brien,
Hengyu Luo,
Hinrich Schütze,
Jörg Tiedemann,
Barry Haddow
arXiv
[Paper] /
[GitHub] /
[Model] /
[Datasets]
|
A Comparison of Language Modeling and Translation as Multilingual Pretraining Objectives
Zihao Li,
Shaoxiong Ji*,
Timothee Mickus*,
Vincent Segonne,
Jörg Tiedemann
EMNLP 2024
[Paper] /
[GitHub]
|