ADAPT Radio: AI in Action, Professor Naomi Harte on Enhancing Speech Communication between Humans and Technology

30 May 2022

ADAPT Radio continues this month with an enlightening discussion with a founding member of the ADAPT SFI Research Centre. Professor Naomi Harte is an ADAPT Co-PI and established Associate Professor in the School of Engineering at Trinity College Dublin. Within ADAPT, Professor Harte has led a major Research Theme centred on Multimodal Interaction, involving researchers from universities across Ireland, and was instrumental in developing the future vision for the Centre for 2021-2026. She is also a lead academic of the hugely successful Sigmedia Research Group in the School of Engineering.

Professor Harte begins the podcast with an illuminating discussion of her current research projects, which include audio-visual speech recognition, speech synthesis, evaluation, multimodal speech analysis, and birdsong. She informs us that the production system of birdsong is very similar to the way humans produce speech, and her research attempts to automate the process of identifying individual birds from birdsong recordings. Similar techniques apply when analysing human speech, with one difference: much of the information in human communication is non-verbal, e.g. mouth movements or body language. These cues help us understand speech, and Professor Harte integrates them into her analysis.

During the podcast, Professor Harte’s fascinating project ‘Room Reader’ was also discussed in detail. ‘Room Reader’ was a pandemic innovation that grew from the Professor’s frustration with teaching over Zoom, particularly with the rise of Zoom fatigue. The project specifically addressed questions such as how a teacher could read the room to monitor student engagement when, as in online lectures, eye contact and other visual cues can no longer be relied upon. ‘Room Reader’ predicts student engagement by using deep learning, or AI, to read students’ typical visual cues. Professor Harte also discusses how deep learning models have transformed speech technology for people. At this stage, we have all encountered speech tools like Alexa, Siri and Google Home, which make speech recognition a reliable alternative to typing and provide hands-free technology. Professor Harte discusses how seamless the interactions between smart speakers and humans could be and the possibility of devices and assistants taking visual cues from us.

Another area of speech research Professor Harte discusses is ‘spoofing detection’. Two branches of speech research run in parallel here: speaker verification, i.e. biometric systems, and speech synthesis (the technology that takes recordings of your voice and builds a new voice for you). Biometric systems need to understand how a voice can be created artificially, and which weaknesses in that technology Professor Harte’s systems can exploit to detect it. In turn, there are representations of speech that Professor Harte can use, such as a time-frequency representation: just as you can break music into notes, you can break speech down into a mixture of frequencies, or notes. Some of the resulting patterns can be perceived when listening to the voice, making it obvious when the voice is not real. She explains that the ultimate challenge will be interpreting this data as the spoofs improve over time.
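
For readers curious about what a time-frequency representation looks like in practice, the short Python sketch below may help. It is only an illustration and is not taken from the podcast or from Professor Harte's systems: the two test tones, the sample rate, the frame length and the function name `spectrogram` are all assumptions chosen for the example. It slices a signal into short overlapping frames and measures how much of each frequency is present in each frame, much like reading off which note is being played at each moment.

```python
import numpy as np


def spectrogram(signal, frame_len=512, hop=256):
    """Magnitude short-time Fourier transform: one row per time frame.

    frame_len and hop are illustrative defaults, not values from the source.
    """
    window = np.hanning(frame_len)  # taper each frame to reduce spectral leakage
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([
        signal[i * hop : i * hop + frame_len] * window
        for i in range(n_frames)
    ])
    # Real FFT of each frame gives the strength of each frequency in that frame.
    return np.abs(np.fft.rfft(frames, axis=1))


# Hypothetical test signal: one second of audio at 8 kHz,
# a 440 Hz "note" for the first half and an 880 Hz "note" for the second half.
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
signal = np.where(t < 0.5,
                  np.sin(2 * np.pi * 440 * t),
                  np.sin(2 * np.pi * 880 * t))

spec = spectrogram(signal)
freqs = np.fft.rfftfreq(512, d=1 / sample_rate)  # frequency of each FFT bin in Hz

# The strongest frequency in each frame traces the "melody" over time.
dominant = freqs[spec.argmax(axis=1)]
print("first frame:", dominant[0], "Hz; last frame:", dominant[-1], "Hz")
```

The early frames peak close to 440 Hz and the late frames close to 880 Hz, to within the frequency resolution of the frames. Real speech produces a far richer pattern than two pure tones, and it is patterns of that kind that a spoofing detector would have to interpret.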

Professor Harte also discusses the challenges her research faced over the last twelve months, in particular keeping her research group going when meetings were happening entirely online and missing the spontaneous conversations that come from being together in person. Despite these challenges, Professor Harte is grateful for the great team she has around her. Finally, she concludes the podcast with an insight into the opportunities for people in the areas of engineering and speech technology.

The full podcast with Professor Naomi Harte is available here.

ADAPT Radio: AI in Action is ADAPT’s newest podcast series highlighting pioneering ADAPT AI research that is empowering individuals, businesses and society to get the most from AI-Driven Digital Media Technology. ADAPT Radio is available on SoundCloud, iTunes, Spotify, and Google Podcasts.