Irish Language Resource, TwittIrish, Launched

27 May 2021

A new language resource for Irish, TwittIrish, was released in May following research by DCU researcher Lauren Cassidy. TwittIrish, the Irish Twitter Universal Dependencies Treebank, is a dataset of Irish tweets with linguistic annotation. It is the first treebank of user-generated content in Irish parsed within the Universal Dependencies framework, an open-source annotation scheme that is consistent across languages.

TwittIrish will be useful for researching how Irish is used online, including many phenomena not found in standard Irish text, such as code-switching, transliteration and abbreviation.

TwittIrish is part of the GaelTech project, funded by the Irish Government's Department of Tourism, Culture, Arts, Gaeltacht, Sport and Media. It is partially supported by Science Foundation Ireland through the ADAPT Centre for Digital Content Technology and co-funded under the European Regional Development Fund.

Universal Dependencies (UD) is a framework for consistent annotation of grammar across different human languages. UD is an open community effort with over 300 contributors producing nearly 200 treebanks in over 100 languages. 

TwittIrish is the work of Lauren Cassidy, supervised by Dr Teresa Lynn and Dr Jennifer Foster of DCU.