Dr John P. McCrae

Research Group Leader

I am the leader of the Unit for Linguistic Data at the Data Science Institute of the National University of Ireland Galway. My work has focussed on the intersection of NLP and data science, and I have led the development of the linguistic linked open data cloud, a large-scale integration of many language resources. I am the co-ordinator of the Prêt-à-LLOD project, funded under the European Union's H2020 programme, which aims to make linguistic linked open data ready-to-use. I am also a work package leader in the ELEXIS project on building a new lexicographic infrastructure for Europe. In addition, I am funded by the Irish Research Council under the Laureate program with the Cardamom project, which focuses on the development of comparable deep models for minority and historical languages. Finally, I am a PI in the SFI Insight Centre for Data Analytics and, from 2021, a PI in the SFI ADAPT Centre. I am also a member of the Centre for Applied Linguistics and Multilingualism (CALM) and have active research collaborations with Fidelity Investments and Huawei.

I completed my PhD within 3 years while still publishing a journal article (with 47 citations) and contributing to the BioCaster system for detecting disease outbreaks by processing texts in East Asian languages. After joining Bielefeld University in 2009, I played a leading role in at least two major scientific breakthroughs. Firstly, the development of the lemon Lexicon Model for Ontologies was a major contribution to the representation of semantics relative to natural language. It is now used by most relevant research groups and was one of the most significant outcomes of the Monnet project, an FP7-funded project. Secondly, building on this work, I have been instrumental in establishing linguistic linked open data as a major research theme, which has been supported by over a dozen workshops and events and was a major theme of the 2016 Language Resources and Evaluation Conference (LREC). This theme led to the Lider project, funded by FP7, which used linguistic linked open data as an enabler for content analytics in enterprise, and in which I played a major role in writing the grant and in implementing the work plan. More recently, my work in linked data has played a pivotal role in obtaining funding for the ELEXIS project (under H2020-INFRAIA), where we will apply linked data technologies to lexicography.

My work has led to over 100 publications, and nearly all of the citations to them are for work that did not involve my PhD supervisor. I have published with over 150 co-authors from institutions around the world.

Research Domains

Artificial Intelligence
Data Analytics
Data Integration and Quality
Data Management
Deep Learning
Digital Humanities
Emotion
FinTech
Knowledge & Data Engineering
Languages
Linguistics
Linked Data / Semantic Web / Knowledge Graphs
Machine Learning
Machine Translation
Natural Language Processing
Standards

Publications by John P. McCrae

SHACL4GW: SHACL Shapes for the Global Wordnet Association RDF Schema

PUBLICATION:	GWC 2025 - 13th Global Wordnet Conference
AUTHOR(S):	Fahad Khan, John P. McCrae
DATE:	01 February 2025
TYPE:	Conf papers

Remedying Gender Bias in Open English Wordnet

PUBLICATION:	GWC 2025 - 13th Global Wordnet Conference
AUTHOR(S):	John P. McCrae, Haotian Zhu, Fei Xia, Al Waskow, Kexin Gao
DATE:	01 February 2025
TYPE:	Conf papers

Renovating the Verb Hierarchy of English Wordnet

PUBLICATION:	GWC 2025 - 13th Global Wordnet Conference
AUTHOR(S):	John P. McCrae
DATE:	01 February 2025
TYPE:	Conf papers

MOOC on Linguistic Linked Data

PUBLICATION:	ESWC 2025 - 22nd European Semantic Web Conference
AUTHOR(S):	Jorge Gracia, Slavko Zitnik, Max Ionov, Christian Chiarcos, Dagmar Gromann, Francesco Mambrini, Marco Passarotti, Armando Stellato, John P. McCrae, Gilles Serasset, Elena Montiel-Ponsoda, Sara Carvalho, Penny Labropoulou, Rute Costa
DATE:	01 September 2025
TYPE:	Conf papers

DA-ATE: Data Augmentation for Automatic Term Extraction

PUBLICATION:	SEMANTiCS 2025 - 21st International Conference on Semantic Systems
AUTHOR(S):	Shubhanker Banerjee, Bharathi Raja Chakravarthi, John P. McCrae
DATE:	26 August 2025
TYPE:	Conf papers

Cuac: Fast and Small Universal Representations of Corpora

PUBLICATION:	LDK 2025 - 5th Conference on Language, Data and Knowledge
AUTHOR(S):	John P. McCrae, Bernardo Stearns, Alamgir Munir Qazi, Shubhanker Banerjee, Atul Kr. Ojha
DATE:	01 September 2025
TYPE:	Conf papers

Benchmarking Hindi Term Extraction in Education: A Dataset and Analysis

PUBLICATION:	LDK 2025 - 5th Conference on Language, Data and Knowledge
AUTHOR(S):	Shubhanker Banerjee, Bharathi Raja Chakravarthi, John P. McCrae
DATE:	01 September 2025
TYPE:	Conf papers

MG2P: An Empirical Study Of Multilingual Training for Manx G2P

PUBLICATION:	LDK 2023 - 4th Conference on Language, Data and Knowledge
AUTHOR(S):	Shubhanker Banerjee, Bharathi Raja Chakravarthi, John P. McCrae
DATE:	01 September 2023
TYPE:	Conf papers

Documenting the Open Multilingual Wordnet

PUBLICATION:	GWC 2023 - 12th Global WordNet Conference
AUTHOR(S):	Francis Bond, Michael Wayne Goodman, Ewa Rudnicka, Luis Morgado da Costa, Alexandre Rademaker, John P. McCrae
DATE:	01 January 2023
TYPE:	Conf papers

Enriching a terminology for under-resourced languages using knowledge graphs

PUBLICATION:	eLex 2021 - 7th Biennial Conference on Electronic Lexicography
AUTHOR(S):	John P. McCrae, Atul Kumar Ojha, Bharathi Raja Chakravarthi, Ian Kelly, Patricia Buffini, Grace Tang, Eric Paquin, Manuel Locria
DATE:	05 July 2021
TYPE:	Conf papers