eSTÓR Internship

JobRef: 23165

Level: Internship | Clár Intéirneachta (Internship Programme)
Posted: April 30, 2024
Location: DCU
Duration:
Reports to: Dr Brian Davis
Salary:
Closing Date: May 10, 2024

Please note that the following is a shortened version of the full job specification. For more details, please refer to the full Job Description document.

Apply by sending your cover letter and CV to Abigail Walsh at [email protected] by midday on the 10th May 2024.

Project Description:

The eSTÓR data repository was initially set up in 2019 as the National Relay Station (NRS), the first national platform for sharing bilingual language data. It was created by researchers at the ADAPT Centre through the EU-funded European Language Resource Infrastructure (ELRI) project. After receiving Irish government funding in 2021, the NRS was redesigned and relaunched as the eSTÓR website (www.estor.ie) in 2022. As eSTÓR, it continues to be used by people working with the Irish language to upload language data to one central location, with the ultimate aim of improving translation technology for public administration nationally and across Europe.

The role of the student in this project:

The successful candidate will be primarily involved in cleaning and developing specialised idiomatic datasets for use in NLP applications. They will process and validate collections of idioms taken from electronic dictionaries and other lexical resources, usually in XML format.

  • Processing steps include removing duplicated or noisy entries, automatically supplying translations and other missing linguistic information, and structuring entries according to license and idiom category.
  • Validation steps include performing sanity checks during processing, manually inspecting the data, and manually annotating idiom categories.
  • The candidate will also be required to write a report on the processing and validation tasks, including an estimate of data quality and a list of the cleaning tasks that remain to be carried out.
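As an illustration of the kind of scripting involved, the sketch below deduplicates idiom entries from an XML file, drops entries with no headword as noise, and flags entries missing a translation so a later step can supply one. It is a minimal sketch under stated assumptions: the `<lexicon>`/`<entry>`/`<idiom>`/`<translation>` layout and the function names `normalise` and `clean_entries` are hypothetical, not the schema or tooling of any dictionary used in the project, and the automatic translation step itself is not shown.

```python
import unicodedata
import xml.etree.ElementTree as ET

# Hypothetical XML layout, assumed for illustration only.
SAMPLE = """<lexicon>
  <entry><idiom>Ní neart go cur le chéile</idiom><translation>There is no strength without unity</translation></entry>
  <entry><idiom>ní neart  go cur le chéile</idiom><translation>There is no strength without unity</translation></entry>
  <entry><idiom></idiom><translation>stray text</translation></entry>
  <entry><idiom>Is fearr Gaeilge bhriste ná Béarla cliste</idiom></entry>
</lexicon>"""


def normalise(text):
    """Build a comparison key: NFC Unicode, collapsed whitespace, case-folded."""
    text = unicodedata.normalize("NFC", text or "")
    return " ".join(text.split()).casefold()


def clean_entries(xml_text):
    """Return (kept entries, idioms still needing a translation)."""
    root = ET.fromstring(xml_text)
    seen = set()
    kept, missing_translation = [], []
    for entry in root.iter("entry"):
        # Keep the original wording, only tidying whitespace.
        idiom = " ".join((entry.findtext("idiom") or "").split())
        key = normalise(idiom)
        if not key:  # sanity check: no headword, so treat as a noisy entry
            continue
        if key in seen:  # duplicate once case and spacing are ignored
            continue
        seen.add(key)
        translation = " ".join((entry.findtext("translation") or "").split())
        kept.append((idiom, translation))
        if not translation:  # flag for a later translation-supplying step
            missing_translation.append(idiom)
    return kept, missing_translation


if __name__ == "__main__":
    kept, todo = clean_entries(SAMPLE)
    print(len(kept), todo)
```

Here the second entry is dropped as a duplicate of the first (it differs only in case and spacing), the empty entry is dropped as noise, and the last proverb is kept but flagged for translation. Matching on a normalised key while storing the original text keeps the data itself untouched; a real pipeline would likely also compare translations and record licence and idiom category.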

Subject to data quality and licensing, subsets of this dataset will be added to the eSTÓR collection for use in downstream NLP applications, including machine translation.

Skills Needed

  • Strong Irish language skills
  • Well organised
  • Experience with linguistics and translation, as well as an interest in language technology, especially for the Irish language
  • Strong scripting skills (Python and Bash preferred)

Benefits Gained

The successful candidate will have a unique opportunity to build new datasets and upload them to the first national digital repository for Irish language data. The eSTÓR platform is used by public servants and also supports the development of machine translation (MT) technology at EU level. With additional specialised resources, NLP applications, including MT, can be improved and extended for use by all members of the public. The candidate will therefore gain insight into:

  • The future plans for Irish language technology, the National Digital Strategy, and the Digital Plan for the Irish language, at both government and wider public levels
  • The future plans for EU machine translation projects
  • How research projects are run
  • Practical skills in data processing for building tailored NLP models for low-resource languages
  • Research skills, including developing and following methodologies, reporting on results, and communicating with a larger research team
