The advancement of large-scale pretrained models has given a boost to research in automatic text generation. However, the current state of the art is dominated by a few languages for which training data is easy to acquire. At the same time, there has been a recent increase in interest in generating text for under-resourced and under-represented languages, as well as in multilingual models and applications in NLP. In response to these trends, the WebNLG 2023 Challenge focuses on generating text in few-shot and/or zero-shot settings for four under-resourced languages. The challenge involves converting RDF triples sourced from DBpedia into natural language text.
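To make the input side of the task concrete, the following is a minimal sketch of how a set of RDF triples might be flattened into a single source string before being passed to a generation model. It assumes triples are written as "subject | property | object" strings; the <S>, <P> and <O> markers and all function names are hypothetical choices made for this illustration and are not part of the official task format. The released data in the repository is the authoritative reference for the actual format.

```python
import re
from typing import List, NamedTuple


class Triple(NamedTuple):
    subject: str
    prop: str
    obj: str


def parse_triple(text: str) -> Triple:
    """Parse a 'subject | property | object' string (assumed format)."""
    parts = [part.strip() for part in text.split("|")]
    if len(parts) != 3:
        raise ValueError(f"expected 3 fields, got {len(parts)}: {text!r}")
    return Triple(*parts)


def humanize_entity(term: str) -> str:
    """Drop surrounding quotes and turn underscores into spaces."""
    return term.strip('"').replace("_", " ")


def humanize_property(term: str) -> str:
    """Split camelCase property names and lowercase them."""
    return re.sub(r"(?<=[a-z])(?=[A-Z])", " ", term).lower()


def linearize(triples: List[Triple]) -> str:
    """Flatten triples into one string with hypothetical role markers."""
    return " ".join(
        f"<S> {humanize_entity(t.subject)} "
        f"<P> {humanize_property(t.prop)} "
        f"<O> {humanize_entity(t.obj)}"
        for t in triples
    )


if __name__ == "__main__":
    example = parse_triple("Aarhus_Airport | cityServed | Aarhus")
    print(linearize([example]))
    # prints: <S> Aarhus Airport <P> city served <O> Aarhus
```

A string of this kind is one possible model input; the expected output would be a sentence in one of the target languages expressing the same facts.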
The new edition of WebNLG focuses on four under-resourced languages which are severely under-represented in research on text generation, namely Maltese, Irish, Breton and Welsh. In addition, WebNLG 2023 will once again include Russian, which was first featured in WebNLG 2020. For WebNLG 2023, we are soliciting submissions encompassing a variety of approaches to automatic text generation, from neural architectures to rule-based systems. We especially encourage submissions addressing generation in few-shot or zero-shot settings.
Development and test data is now available for all five languages: Breton, Irish, Maltese and Welsh (the target languages for WebNLG 2023), as well as Russian. Participants can download the development data; the test data will be reserved for the final evaluation. Data for each language was obtained by sourcing high-quality, professional translations of the original English texts in the WebNLG 2020 dev and test sets.
Training data is also available for the original WebNLG English data and, as per WebNLG 2020, for Russian. In addition, we provide ‘noisy’ training data for the target languages (Maltese, Breton, Welsh and Irish), obtained via machine translation of the texts in the English WebNLG 2020 train split.
As in previous editions of WebNLG, submitted results will be evaluated using both automatic and human evaluation methods.
Data and instructions for the task are available from the WebNLG repo: https://github.com/WebNLG/2023-Challenge
Teams who submit systems for evaluation at WebNLG 2023 will subsequently be invited to contribute a short paper describing their approach and results. The task as a whole, as well as individual submissions, will be presented at a special session in an event to be announced later.
General information about the WebNLG challenges can be found at the following URL: https://synalp.gitlabpages.inria.fr/webnlg-challenge/challenge_2023/
- February 2023: First call for participation. Development data and noisy training data available.
- 8 June 2023: Release of test data.
- 15 June 2023: Deadline for submission of system outputs.
- 15 August 2023: Deadline for submission of short papers describing systems.
The final presentation of results will be held during a workshop. Current plans are to hold this in September 2023.
WebNLG 2023 is being organised under the auspices of LT-Bridge, supported by the Horizon 2020 Work Programme Spreading Excellence and Widening Participation (WIDESPREAD) 2018-2020 and the ANR-funded xNLG Chair on multi-lingual, multi-source NLG.
Contact: [email protected]