Joss Moorkens is a Post-doctoral Researcher at the ADAPT Centre, within the School of Computing at Dublin City University (DCU), with interests in user evaluation of translation technology and post-editing.

Joss Moorkens

will present…

Correlations of perceived post-editing effort with measurements of actual effort


Abstract

As the quality of machine translation (MT) has incrementally improved in recent years, research has indicated that translation productivity may be improved by introducing post-editing (PE) of MT for certain domains and language pairs, provided MT quality is sufficient. Translators are, however, still faced with varied MT quality. To improve productivity and reduce cognitive friction (Cooper, 2004), it would seem useful to present translators with only high-quality MT for post-editing. Post-editors themselves have previously requested that MT confidence estimations be displayed within their translation interface (Moorkens and O’Brien, 2013). Such confidence scores need to be trustworthy and reliable, and should reflect how much post-editing effort is really required.

In this paper, we report on a three-stage study whereby experienced translators were first asked to rate segments of English-to-Brazilian Portuguese machine translated output from two texts for estimated post-editing effort. The effort categories were simplified to fit with a three-colour ‘traffic light’ confidence indicator to be used at Stage 3 of the study. In the second stage, after a break of several weeks, the same raters were asked to post-edit the machine translated texts (using a beta PE environment) and their estimated and actual post-editing effort were compared. This allows us to see whether post-editors’ perceptions about PE effort at the segment level are borne out by correlations with multiple measurements of their actual effort.
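The abstract does not specify the exact effort scale or the bin boundaries used for the ‘traffic light’ simplification. Purely as an illustration of the idea, here is a minimal Python sketch, assuming a 1–5 perceived-effort scale averaged across raters; the thresholds are invented for the example:

```python
# Illustrative mapping of averaged perceived-effort ratings to a three-colour
# "traffic light" confidence indicator. The 1-5 scale and the bin boundaries
# are assumptions for this sketch, not the study's actual categories.

def traffic_light(mean_rating: float) -> str:
    """Map a mean perceived-effort rating (1 = little effort, 5 = heavy
    effort) to a confidence colour shown next to the MT segment."""
    if mean_rating <= 2.0:
        return "green"   # little PE effort expected: trust the MT
    elif mean_rating <= 3.5:
        return "amber"   # moderate PE effort expected
    else:
        return "red"     # heavy PE effort expected

ratings = [1.8, 3.2, 4.5]
print([traffic_light(r) for r in ratings])  # ['green', 'amber', 'red']
```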

PE effort for Stage 2 was measured in three ways, as specified in Krings (2001): cognitive, temporal, and technical effort. Eye-tracking techniques were used to capture fixation count and duration as a proxy for cognitive effort; temporal effort was captured using timestamps taken from user activity data (UAD), i.e. logs of PE activity recorded within the editing environment; and technical effort was measured using the TER (Translation Edit Rate; Snover et al., 2006) metric, which counts the minimum number of edits required to change the machine translated segment into the post-edited segment. Measurements of perceived effort using scales similar to this study’s have been used to build sentence-level confidence estimation (CE) models, intended to give post-editors an indication of whether an unseen MT segment is worth post-editing (Specia et al., 2009).
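For readers unfamiliar with TER, the sketch below shows the core idea: token-level edits between the raw MT segment and its post-edited version, normalised by the length of the post-edited reference. Note that true TER (Snover et al., 2006) also permits block shifts at unit cost; this simplified version omits shifts, and the function names are illustrative only:

```python
# Simplified TER-style score: word-level edit distance between the MT output
# (hypothesis) and the post-edited segment (reference), divided by reference
# length. Real TER also allows block shifts; those are omitted here.

def edit_distance(hyp: list[str], ref: list[str]) -> int:
    """Levenshtein distance over tokens (insertions, deletions, substitutions),
    computed with a single rolling row of the DP table."""
    d = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, 1):
        prev, d[0] = d[0], i
        for j, r in enumerate(ref, 1):
            prev, d[j] = d[j], min(d[j] + 1,         # delete a hypothesis token
                                   d[j - 1] + 1,     # insert a reference token
                                   prev + (h != r))  # substitute (or match)
    return d[-1]

def ter(mt_segment: str, post_edited: str) -> float:
    hyp, ref = mt_segment.split(), post_edited.split()
    return edit_distance(hyp, ref) / max(len(ref), 1)

# 2 edits (substitute "in" -> "on", insert "the") over 6 reference words ~ 0.33
print(ter("the cat sat in mat", "the cat sat on the mat"))
```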
We found that agreement among the six raters on perceived PE effort was not strong, but participants’ averaged perceived post-editing effort showed a strong correlation with actual technical effort and a moderate correlation with actual effort in temporal and cognitive measurements. In Stage 3, over 20 student participants with minimal post-editing experience completed the same two post-editing tasks, and their technical and temporal effort was recorded. One of the tasks was completed with colour-coded indicators of estimated effort displayed for each segment, based on the ratings from Stage 1. The motivation for this was to test whether the display of indicators impacted on technical or temporal PE effort. Although there was a strong correlation between the technical post-editing effort expended by the participant groups in each stage, the correlation between Stage 1 ratings and Stage 3 effort was moderate. Moreover, in this experiment the display of confidence indicators was found not to impact on post-editing effort.
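The abstract does not state which correlation statistic was used. A minimal sketch of this kind of segment-level analysis, with placeholder numbers rather than the study’s data, might look as follows (both Pearson and Spearman are shown for illustration):

```python
# Sketch of correlating averaged perceived-effort ratings (Stage 1) with
# actual technical effort measured as TER (Stage 2), per segment.
# The values below are invented placeholders, not the study's data.

from scipy.stats import pearsonr, spearmanr

perceived = [1.2, 2.5, 3.0, 4.1, 2.2, 3.8]          # mean rating per segment
ter_scores = [0.05, 0.22, 0.31, 0.55, 0.18, 0.47]   # TER per segment

r, p = pearsonr(perceived, ter_scores)
rho, p_rho = spearmanr(perceived, ter_scores)
print(f"Pearson r = {r:.2f} (p = {p:.3f}); Spearman rho = {rho:.2f} (p = {p_rho:.3f})")
```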

Although the study is applied to only one language pair, our findings suggest that human ratings of perceived PE effort are not completely reliable indicators of actual effort. This may explain why post-editors in this study displayed no significant behavioural change whether or not indicators based on perceived effort were presented. Despite differences between expert and novice groups found in previous research (Moorkens and O’Brien, 2015), there was a strong correlation between the technical and temporal effort of each group in this study, which suggests that actual effort, even from a different user group, would be a more reliable indicator of future effort, and as such would be more reliable as training data for MT CE scores.

——
References

Cooper, Alan. 2004. The Inmates are Running the Asylum: Why Hi-Tech Products Drive us Crazy and How to Restore the Sanity. Indianapolis: Sams Publishing.
Krings, Hans P. 2001. Repairing Texts. Kent State University Press, Ohio, USA.
Moorkens, Joss, and Sharon O’Brien. 2013. User Attitudes to the Post-Editing Interface. In Sharon O’Brien, Michel Simard, and Lucia Specia (eds.), Proceedings of the MT Summit XIV Workshop on Post-editing Technology and Practice, Nice, France, September 2, 19–25.
Moorkens, Joss, and Sharon O’Brien. 2015. Post-Editing Evaluations: Trade-offs between Novice and Professional Participants. In İlknur Durgar El-Kahlout, Mehmed Özkan, Felipe Sánchez-Martínez, Gema Ramírez-Sánchez, Fred Hollowood, and Andy Way (eds.), Proceedings of the European Association for Machine Translation (EAMT) 2015, Antalya, Turkey, May 11–13, 75–81.
Snover, Matthew, Bonnie Dorr, Richard Schwartz, Linnea Micciulla, and John Makhoul. 2006. A Study of Translation Edit Rate with Targeted Human Annotation. In Proceedings of the Association for Machine Translation in the Americas, 2006, 223–231.
Specia, Lucia, Nicola Cancedda, Marc Dymetman, Marco Turchi, and Nello Cristianini. 2009. Estimating the Sentence-Level Quality of Machine Translation Systems. In Proceedings of the 13th Annual Conference of the EAMT, Barcelona, May, 28–35.


Co-authors
Sharon O’Brien is the Director of the Centre for Translation and Textual Studies, a Challenge Leader in the ADAPT Centre, and Senior Lecturer in SALIS at DCU. She is interested in translation, and especially translation technology, with a specific focus on controlled language, machine translation, post-editing, localisation, and translation in crisis scenarios. She is also interested in end users of translation and in concepts such as translatability, usability, readability, comprehensibility, and the measurement of cognitive load.

Igor A. Lourenço da Silva is a lecturer in Translation Studies at Universidade Federal de Uberlândia (UFU), Brazil. His lectures focus on English writing, inverse translation, and introduction to translation practice. He holds Master’s and PhD degrees in Applied Linguistics from Universidade Federal de Minas Gerais (UFMG), Brazil. His main fields of research comprise translation process research, translation expertise, and text linguistics. He has developed projects in partnership with UFMG, Dublin City University, and the University of Macau. He is currently a research member of the Translation research group (UFU) and LETRA (Laboratory for Experimentation in Translation, UFMG). He has worked as a freelance translator and proofreader since 2005, and worked as an assistant researcher at the University of Saarland, Germany, in 2011–2012.

Norma B. de Lima Fonseca is currently a PhD candidate in Applied Linguistics in the Graduate Program in Linguistics and Applied Linguistics (POSLIN) at the Federal University of Minas Gerais (UFMG) in Brazil, where she develops empirical-experimental research in Translation Studies. She obtained a master’s degree in Applied Linguistics from the same program, and received her bachelor’s degree in English and Portuguese Languages from the Federal University of Viçosa (UFV).

Fabio Alves is Professor of Translation Studies at Universidade Federal de Minas Gerais (UFMG), Brazil, where he carries out empirical-experimental research at the Laboratory for Experimentation in Translation (LETRA). His research interests encompass expertise and expert knowledge in translation; cognitive approaches to translation; translation and technology; and human-machine interaction in translation. He has published extensively in journals such as Across Languages and Cultures, Meta, and Target, as well as book chapters in Continuum, Routledge, and John Benjamins book series.