Post-editing the output of a statistical machine translation (SMT) system to obtain high-quality translation has become an increasingly common application of SMT, which we henceforth refer to as post-editing-based SMT (PE-SMT). PE-SMT is often deployed as an incrementally retrained system that incorporates knowledge from human post-edits as early as possible, augmenting the SMT models in order to reduce post-editing (PE) time. In this scenario, the order in which input segments are presented plays an important role in reducing the overall PE time. Within an active learning (AL) framework, this paper provides an empirical study of several typical segment prioritization methods, namely cross-entropy difference (CED), n-grams, perplexity (PPL), and translation confidence, and verifies their performance on different data sets and language pairs. Experiments in a simulated setting show that translation confidence performs best, with average absolute decreases of 1.72-4.55 TER points compared to a sequentially post-edited, incrementally retrained SMT system.
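To make the confidence-based prioritization concrete, the following is a minimal sketch (not the paper's actual implementation) of how segments might be ordered for post-editing by translation confidence. It assumes the decoder exposes per-token log-probabilities for each translation hypothesis; the length-normalised geometric mean of token probabilities is one common confidence proxy, and the least-confident segments are presented to the post-editor first so their corrections can be fed back into retraining earliest.

```python
import math

def sentence_confidence(token_logprobs):
    """Length-normalised confidence: geometric mean of token probabilities."""
    avg_logprob = sum(token_logprobs) / len(token_logprobs)
    return math.exp(avg_logprob)

def prioritize(segments):
    """Order segments for post-editing, least-confident translations first."""
    return sorted(segments, key=lambda s: sentence_confidence(s["logprobs"]))

# Hypothetical segments with decoder token log-probabilities.
segments = [
    {"id": 1, "logprobs": [-0.1, -0.2, -0.1]},  # fairly confident translation
    {"id": 2, "logprobs": [-1.5, -2.0, -1.2]},  # low-confidence translation
    {"id": 3, "logprobs": [-0.5, -0.7, -0.4]},
]
order = [s["id"] for s in prioritize(segments)]
# The low-confidence segment (id 2) is scheduled for post-editing first.
```

The field names (`id`, `logprobs`) and the choice of confidence estimator are illustrative assumptions; real confidence estimation for SMT typically draws on richer decoder features such as word posterior probabilities from n-best lists.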
This item is licensed under a Creative Commons Attribution-NonCommercial-Share Alike 3.0 License.
Funders: Science Foundation Ireland through the ADAPT Centre (Grant 13/RC/2106) (www.adaptcentre.ie) at Dublin City University and Trinity College Dublin; European Commission Grant 610879 for the Falcon project
ID Code: 23216
Deposited On: 01 May 2019 15:31 by Thomas Murtagh. Last Modified: 20 May 2021 13:59