In this post we look at the systems submitted to the shared task on multilingual low-resource translation for Indo-European languages at the Sixth Conference on Machine Translation (WMT21), co-located with EMNLP 2021. The conference will be held on November 10-11, 2021 in Punta Cana (Dominican Republic) and online. The full evaluation results are available at this link.
The automatic evaluation metrics are BLEU, TER, chrF, COMET and BERTScore. The final ranking for each family is the average of a system's rankings under the individual metrics, with ties on individual metrics taken into account. Two baselines were used: the M2M-100 1.2B model and mT5-devFinetuned.
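For readers who want to score their own outputs, BLEU, TER and chrF are all available in the sacrebleu library (COMET and BERTScore ship as the separate unbabel-comet and bert-score packages). The snippet below is a minimal sketch with toy strings, not the task's official scoring pipeline; note that sacrebleu reports chrF and TER on a 0–100 scale, while the tables below show 0–1 values.

```python
# Minimal sketch: scoring hypotheses against references with sacrebleu.
# The strings here are toy examples, not task data.
from sacrebleu.metrics import BLEU, CHRF, TER

hypotheses = ["the cat sat on the mat"]
references = [["the cat is on the mat"]]  # one reference stream

bleu = BLEU().corpus_score(hypotheses, references)
ter = TER().corpus_score(hypotheses, references)    # lower is better
chrf = CHRF().corpus_score(hypotheses, references)

print(bleu.score, ter.score, chrf.score)
```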
Romance Family (Wikipedia)
| System (official ranking) | Average Ranking | BLEU | TER | chrF | COMET | BERTScore |
|---|---|---|---|---|---|---|
| CUNI-Primary | 1.2±0.4 | 50.06 | 0.401 | 0.694 | 0.566 | 0.901 |
| CUNI-Contrastive | 1.6±0.5 | 49.48 | 0.404 | 0.693 | 0.569 | 0.901 |
| Tencent-Contrastive | 3.0±0.0 | 43.45 | 0.460 | 0.670 | 0.444 | 0.894 |
| Tencent-Primary | 3.8±0.4 | 43.31 | 0.462 | 0.668 | 0.442 | 0.894 |
| BSC-Primary (*) | 5.0±0.7 | 41.33 | 0.462 | 0.647 | 0.363 | 0.884 |
| M2M-100 (baseline) | 5.8±0.4 | 40.02 | 0.478 | 0.634 | 0.414 | 0.878 |
| UBCNLP-Primary | 7.2±0.4 | 35.41 | 0.528 | 0.588 | 0.007 | 0.854 |
| mT5-devFinetuned (baseline) | 8.0±0.7 | 29.28 | 0.592 | 0.553 | 0.059 | 0.850 |
| UBCNLP-Contrastive | 8.6±0.5 | 28.51 | 0.591 | 0.529 | -0.374 | 0.825 |
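To make the averaging concrete, here is a minimal sketch of one plausible reading of the ranking scheme, applied to the first three Romance rows above: rank the systems under each metric, let ties share the lowest rank, and report the mean rank ± the sample standard deviation across the five metrics. Both the tie handling and the standard-deviation convention are assumptions; the organizers' exact procedure may differ.

```python
# One plausible reading of the ranking scheme, applied to the first three
# Romance rows above. Tie handling (shared lowest rank) and the use of the
# sample standard deviation for the "±" value are assumptions.
import numpy as np
from scipy.stats import rankdata

systems = ["CUNI-Primary", "CUNI-Contrastive", "Tencent-Contrastive"]
# Columns: BLEU, TER, chrF, COMET, BERTScore (values from the table above).
scores = np.array([
    [50.06, 0.401, 0.694, 0.566, 0.901],
    [49.48, 0.404, 0.693, 0.569, 0.901],
    [43.45, 0.460, 0.670, 0.444, 0.894],
])
higher_is_better = [True, False, True, True, True]  # only TER: lower is better

# Rank each metric column separately; ties share the lowest rank.
ranks = np.column_stack([
    rankdata(-col if hib else col, method="min")
    for col, hib in zip(scores.T, higher_is_better)
])
for name, row in zip(systems, ranks):
    print(f"{name}: {row.mean():.1f}±{row.std(ddof=1):.1f}")
# -> 1.2±0.4, 1.6±0.5, 3.0±0.0, matching the table above.
```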
North-Germanic Family (Europeana)
| System (official ranking) | Average Ranking | BLEU | TER | chrF | COMET | BERTScore |
|---|---|---|---|---|---|---|
| M2M-100 (baseline) | 1.0±0.0 | 31.45 | 0.54 | 0.55 | 0.399 | 0.862 |
| Edinsaar-Contrastive | 2.2±0.4 | 27.07 | 0.57 | 0.54 | 0.283 | 0.856 |
| Edinsaar-Primary | 2.8±0.4 | 27.54 | 0.58 | 0.52 | 0.276 | 0.849 |
| UBCNLP-Primary | 4.0±0.0 | 24.94 | 0.60 | 0.50 | 0.076 | 0.847 |
| UBCNLP-Contrastive | 5.0±0.0 | 24.02 | 0.61 | 0.49 | -0.068 | 0.837 |
| mT5-devFinetuned (baseline) | 6.0±0.0 | 18.53 | 0.78 | 0.42 | -0.102 | 0.810 |
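For reference, the stronger of the two baselines is publicly available: the snippet below is a minimal sketch of translating with the M2M-100 1.2B checkpoint via Hugging Face Transformers. The Catalan→Italian direction is purely illustrative (an assumption on our part), not necessarily the official baseline setup.

```python
# Minimal sketch: translating with the M2M-100 1.2B baseline via
# Hugging Face Transformers. The language pair (ca -> it) is illustrative;
# the task's official baseline configuration may differ.
from transformers import M2M100ForConditionalGeneration, M2M100Tokenizer

model = M2M100ForConditionalGeneration.from_pretrained("facebook/m2m100_1.2B")
tokenizer = M2M100Tokenizer.from_pretrained("facebook/m2m100_1.2B")

tokenizer.src_lang = "ca"
encoded = tokenizer("El gat és sobre l'estora.", return_tensors="pt")
generated = model.generate(
    **encoded,
    forced_bos_token_id=tokenizer.get_lang_id("it"),  # target language
)
print(tokenizer.batch_decode(generated, skip_special_tokens=True)[0])
```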