Innovative Research Sheds Light on Limitations of Popular Evaluation Metric
New research led by ADAPT researchers Dr Sébastien Le Maguer and Professor Naomi Harte of Trinity College Dublin, with Professor Simon King of the University of Edinburgh, has called into question the reliability of the Mean Opinion Score (MOS), a metric widely used to evaluate speech synthesis technologies. The paper, titled “The Limits of the Mean Opinion Score for Speech Synthesis Evaluation”, was recently published in Computer Speech and Language. It raises significant concerns about the MOS, particularly in light of advances in technologies such as WaveNet and Tacotron.
The Issue with MOS
The MOS, derived from the Absolute Category Rating (ACR) protocol, is a common method for assessing the quality of synthetic speech. It is simple and scalable, but as the technology has advanced, its reliability has become debatable. The study shows that although the MOS is treated as an absolute measure, it actually behaves more like a relative score: the rating a system receives is influenced by the quality of the other systems it is evaluated alongside.
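To make the calculation concrete, here is a minimal sketch of how an MOS is conventionally computed from ACR ratings: listeners rate each stimulus on a five-point scale (1 = Bad to 5 = Excellent) and the scores are averaged, often reported with an approximate 95% confidence interval. The function name `mean_opinion_score` and the sample ratings are hypothetical and for illustration only; this is not the procedure used in the study. Note that the function sees only the ratings, so nothing in the resulting number records which other systems listeners heard in the same test, which is why the study argues it behaves as a relative rather than an absolute score.

```python
from math import sqrt
from statistics import mean, stdev

# Standard five-point ACR labels (1 = worst, 5 = best).
ACR_SCALE = {1: "Bad", 2: "Poor", 3: "Fair", 4: "Good", 5: "Excellent"}


def mean_opinion_score(ratings):
    """Return (MOS, half-width of an approximate 95% confidence interval).

    Hypothetical helper: assumes independent ratings and uses a normal
    approximation (1.96 x standard error) for the interval.
    """
    if len(ratings) < 2:
        raise ValueError("need at least two ratings to estimate spread")
    if any(r not in ACR_SCALE for r in ratings):
        raise ValueError("ACR ratings must be integers from 1 to 5")
    mos = mean(ratings)
    # Sample standard deviation divided by sqrt(n) gives the standard error.
    half_width = 1.96 * stdev(ratings) / sqrt(len(ratings))
    return mos, half_width


# Usage with made-up ratings for one hypothetical system:
mos, ci = mean_opinion_score([4, 5, 3, 4, 4])
print(f"MOS = {mos:.2f} +/- {ci:.2f}")
```

The same ratings would yield the same MOS whether the system was heard next to much weaker or much stronger systems, even though, according to the study, listeners' ratings themselves shift with that context.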
The research team conducted four experiments, replicating and extending the 2013 edition of the Blizzard Challenge, to examine the effectiveness of the MOS in depth. The experiments confirmed that modern technologies outperform older speech synthesis systems, but they also showed that the MOS is sensitive to the presence of lower- and higher-quality systems in the same test. This suggests that the MOS, despite its popularity, is not suitable for evaluating a field as nuanced and rapidly advancing as speech synthesis.
The study calls for a re-evaluation of current methodologies and recommends relying on more robust alternatives, such as MUSHRA, until more insightful protocols are proposed. This shift is crucial, especially given the increasing reliance on automatic MOS predictors in deep learning.
Read the publication here: https://www.sciencedirect.com/science/article/abs/pii/S0885230823000967