£1.8m funding for large-scale online resource of contemporary Welsh language
As a leading authority on Welsh language technologies, Bangor University will be participating in a multi-institution project to develop the first mass corpus to capture and inform the past, present and future use of the Welsh language.
Led by Cardiff University’s , the £1.8 million interdisciplinary, collaborative project, funded by the Economic and Social Research Council (ESRC) and the Arts and Humanities Research Council (AHRC), entitled The National Corpus of Contemporary Welsh, or Corpws Cenedlaethol Cymraeg Cyfoes (CorCenCC), will compile an initial data set of 10 million Welsh words.
Commencing in March 2016, the project will run for three and a half years. It will draw on expertise from Bangor, Cardiff, Swansea and Lancaster Universities and break new ground as both a language resource and a model of corpus construction.
Professor Enlli Thomas from Bangor University, already the home of the , will co-lead on the development and evaluation of a dedicated resource for teachers and learners of Welsh that will result from the corpus as part of the research.
Professor Thomas said: "This work will be a major contribution to the field. It will be a substantial collection of different kinds of language patterns of Welsh speakers - both verbally and in writing - and a living record of the language."
The corpus - a large collection of texts, or a body of written or spoken material for linguistic analysis - will represent Welsh language use across all communication types. This will include spoken, written and digital language, encompassing different genres, language varieties (regional and social) and contexts.
Contributors will be drawn from the 562,000 Welsh speakers in the UK, who will contribute via crowdsourcing digital technologies and community collaboration.
Further detail on the project’s construction and the ways in which users will be able to participate will be shared once it is live in 2016.
, from Cardiff University’s School of English, Communication and Philosophy, who is leading the project, said: “What we hope to achieve is the development of the first large-scale living and evolving corpus, representing the Welsh language across communication types and informed by real, current, users of the language.
“We will be engaging with the public in a number of ways, and using new technologies to do so. This is a project about the past, present and future use of the Welsh language and will inform us about variation and change in real language use, such as regional differences or use of mutations over time.
“The project will have a positive impact on the work of translators, publishers, policy-makers, language technology developers and academics, and a bespoke toolkit will be constructed for teachers and learners, integrating basic corpus functionalities for the exploration of language use.”
The range of stakeholders for the project - including the Welsh Government, Welsh Joint Education Committee, Welsh for Adults, Gwasg y Lolfa and University of Wales Dictionary - is representative of the linguistic, cultural and social relevance of the project.
Publication date: 13 October 2015