The goal of this shared task is to benchmark and promote speech translation technology for a diverse range of dialects and low-resource languages. Although recent research has made significant progress on popular datasets, many of the world’s dialects and low-resource languages lack parallel data at the scale needed for standard supervised learning. Progress on them will likely require creative approaches to leveraging disparate resources.
For example, to translate dialectal speech such as Tunisian Arabic, one may leverage existing speech and text resources in Modern Standard Arabic. Likewise, to translate a low-resource language such as Tamasheq, one may need to leverage word-level translation resources and raw audio.
We will provide training and evaluation data for a range of language-pairs. Participants may take part in any number of language-pairs in this track. We encourage both dedicated systems designed for a single language-pair and general recipes aimed at improving speech translation broadly across a typologically diverse range of languages.
General Information for All Language-Pairs
The submission format will be standardized across all language-pairs. Participants can submit systems under two conditions:
- Constrained condition: systems are trained only on the data provided by the organizers.
- Unconstrained condition: systems can be trained with any resource, including pre-trained models. Multilingual models are allowed.