TC39 Keynote Speakers
Prof. Dr. Alexander Waibel
Carnegie Mellon University, USA
Karlsruher Institut für Technologie, Germany
Dr. Alexander Waibel is a Professor of Computer Science at Carnegie Mellon University, Pittsburgh and at the Karlsruhe Institute of Technology, Germany. He is the director of the International Center for Advanced Communication Technologies (interACT). The Center works in a network with eight of the world’s top research institutions. The Center’s mission is to develop multimodal and multilingual human communication technologies based on advanced machine learning algorithms to improve human-human and human-machine communication. Prof. Waibel and his team developed many statistical and neural network learning algorithms that made a number of communication breakthroughs possible. These included early multimodal interfaces, the first neural network speech and language processing systems, the first speech translation systems in Europe and the USA (1990/1991), the world’s first simultaneous lecture translation system (2005), and Jibbigo, the world’s first commercial speech translator on a phone (2009).
Dr. Waibel founded C-STAR, the Consortium for Speech Translation Advanced Research, in 1991 and served as its chairman. Since then he has directed many research programs in speech, translation, multimodal interfaces and machine learning in the US, Europe and Asia. He served as director of EU-Bridge (2012-2015) and CHIL (2004-2007), two large-scale European multi-site Integrated Project initiatives on intelligent assistants and speech translation services. He also served as co-director of IMMI, a joint venture between KIT, CNRS & RWTH.
Dr. Waibel is an IEEE Fellow and received many awards for pioneering work on multilingual and multimodal speech communication and translation. He published extensively (>700 publications, >24,000 citations, h-index 80) in the field and received/filed numerous patents.
During his career, Dr. Waibel founded and built 10 successful companies. Following the acquisition of Jibbigo by Facebook, Waibel served as founding director of the Language Technology Group at FB. He also deployed speech translation technologies in humanitarian and disaster relief missions. His team recently deployed the first simultaneous interpretation service for lectures at Universities and interpretation tools at the European Parliament.
Dr. Waibel received his BS, MS and PhD degrees at MIT and CMU, respectively.
After centuries of separation and misunderstandings, we are lucky to be living in the generation that will see an end to language barriers between the peoples of our planet. Automatic translation of text is now becoming ubiquitous on the internet, and even communication by voice between people speaking different languages is now becoming a reality for everyone.
Early breakthroughs in large vocabulary speech recognition, machine translation and neural networks prepared the way for the development of the first speech-to-speech translation systems in the early 90’s. Over the 25 years of research that followed, what seemed a crazy idea at first blossomed into an array of practical interpreting systems that revolutionize modern human communication today: cross-language interpretation systems that bring people closer together than ever before.
In this talk, I will review the technologies and deployed interpreting solutions available today:
- Speech translators running on servers, laptops and smartphones for tourists, medical doctors and international relief workers,
- Communication on tablets in Humanitarian and Government Missions
- Road sign interpreters that translate road signs while traveling abroad
- Multilingual subtitling and translation of TV broadcasts
- Automatic simultaneous Interpretation of lectures given in foreign languages
- Tools and Technology that facilitate and support human interpreters at the European Parliament
I will review algorithmic advances, progress in performance and usability, and discuss remaining scientific challenges. And we will speculate on a future without language barriers that involves human and machine interpretation.
Prof. Dr. Roberto Navigli
Sapienza Università di Roma
Linguistic Computing Laboratory
Roberto Navigli is Professor of Computer Science at the Sapienza University of Rome, where he heads the multilingual Natural Language Processing group. He was awarded the Marco Somalvico 2013 AI*IA Prize for the best young researcher in AI. He is one of the few Europeans to have received two prestigious ERC grants in computer science, namely an ERC Starting Grant on multilingual word sense disambiguation (2011-2016) and an ERC Consolidator Grant on multilingual language- and syntax-independent open-text unified representations (2017-2022).
He was also a co-PI of a Google Focused Research Award on NLP. In 2015 he received the META prize for groundbreaking work in overcoming language barriers with BabelNet, a project also highlighted in TIME magazine and presented in the most cited 2012 paper in the Artificial Intelligence Journal, a journal for which he is currently an Associate Editor.
Based on the success of BabelNet and its multilingual disambiguation technology, he co-founded Babelscape, a Sapienza startup company which enables HLT in hundreds of languages.
In this talk I will introduce the most recent developments of the BabelNet technology, winner of several scientific awards and of great interest to interpreters and translators. I will first describe BabelNet live – the largest, continuously-updated multilingual encyclopedic dictionary – and then discuss a range of cutting-edge industrial use cases implemented by Babelscape, our Sapienza startup company, including: multilingual interpretation and mapping of terms; multilingual concept and entity extraction from text; semantically-enhanced multilingual translation, and cross-lingual text similarity.
Presentations
listed in order by surname of (1st) presenter
Kristiina Abdallah (Universities of Vaasa and Jyväskylä) Kristiina Abdallah has worked as a translator, subtitler and technical writer. Since 2001 she has held various positions at the University of Tampere, namely that of an assistant, a lecturer and researcher. As of 2010 she has worked as a university teacher, first at the University of Eastern Finland, and currently at the Universities of Vaasa and Jyväskylä. She defended her doctoral thesis entitled Translators in Production Networks. Reflections on Agency, Quality and Ethics in 2012. Her research interests are translation sociology and, more specifically, translators’ workplace studies. |
Three-dimensional Quality Model: The Focal Point of Workflow Management in Organisational Ergonomics Although quality is a central concept in every act of translating, it has been considered difficult to define and therefore remained elusive. Generally, approaches to quality, both in translation studies and in translation industry, have concentrated on the product and/or process quality. Yet, in the present-day man and machine mediated, collaborative translation production networks, the challenge to define and manage quality comprehensively has become more acute than ever before. This paper participates in the discussion on organizational ergonomics of translation by presenting a three-dimensional quality model. The model encompasses not only the so-far familiar product and process dimensions, but also a third dimension called social quality. Social quality, the focus of this paper, addresses the relations of the actors involved, both human and non-human, and their organizational interaction. The theoretical discussion on quality will be complemented by a recent case from Finland regarding the working conditions of the audio-visual translators of Star Wars: The Force Awakens and their impact on translation quality. By emphasizing the point that quality is a multidimensional concept which also includes social and ethical aspects, the paper argues for workflow management that caters to the needs of people, who are the bedrock of the industry. |
Natalia Bondonno (Department for General Assembly and Conference Management - United Nations New York) Natalia Bondonno has been a United Nations staff member at the Department for General Assembly and Conference Management in New York since 2014. She is the Project Manager for machine-readable documents and for the UNTERM portal under the gText Project, which offers a suite of language applications, including eLUNa, an in-house developed CAT tool designed for UN language professionals. Ms. Bondonno has a degree in Legal Translation from the University of Buenos Aires, a master’s in translation from the University of Alicante and a master’s in International Law from Fundación Ortega y Gasset. Before joining the UN, she worked as a project manager and financial translator, and was a staff interpreter in NY Civil Court for four years. |
LUNA - The Web-based Family of Language Tools of the United Nations This presentation introduces the eLUNa family of language tools developed by the United Nations: a web-based computer-assisted tool, an editorial interface and a search engine, all specially designed for UN language professionals. The presentation will also cover recent and future developments in eLUNa, as well as a short update on the projects to produce machine-readable documents and to share eLUNa with other organizations. |
Sheila Castilho, Joss Moorkens, Federico Gaspari, Andy Way, Yota Georgakopoulou, Maria Gialama, Vilelmini Sosoni and Rico Sennrich (Dublin City University) Sheila Castilho is a post-doc researcher in the ADAPT Centre in Dublin City University. Her research interests include human and usability evaluation of machine translation, translation technology and audio-visual translation.
Co-authors: Joss Moorkens is an Assistant Professor and researcher in the ADAPT Centre, within the School of Applied Languages and Intercultural Studies in Dublin City University (DCU) with interests in human evaluation of translation technology, ethics and translation technology, and translation evaluation. Federico Gaspari teaches English linguistics and translation studies at the University for Foreigners “Dante Alighieri” of Reggio Calabria (Italy) and is a postdoctoral researcher at the ADAPT Centre in Dublin City University, where he works on EU projects focusing on machine translation evaluation. Andy Way is a Professor of Computing at Dublin City University (DCU) and Deputy Director of the ADAPT Centre. He is a former President of the European Association for Machine Translation and edits the journal Machine Translation. Panayota (Yota) Georgakopoulou holds a PhD in translation and subtitling and is a seasoned operations executive in the subtitling and translation industries, with significant experience in translation academia as well. She is currently Senior Director, Research & Int’l Development at Deluxe Media, leading translation initiatives and research on language technologies and tools, and their application in subtitling workflows. Maria Gialama is currently working as Account Manager, R&D at Deluxe Media, focusing on the application of language technologies in subtitling. Maria received her MA in translation and subtitling from the University of Surrey and has extensive experience in translation ops management. Vilelmini Sosoni is Lecturer at the Ionian University in Greece. She has taught Specialised Translation in the UK and Greece and has extensive industrial experience. Her research interests lie in the areas of the translation of institutional texts, translation technology and audiovisual translation. Rico Sennrich is a research associate at the University of Edinburgh. His main research interest is machine learning, especially in the area of machine translation and natural language processing.
|
Crowdsourcing for NMT Evaluation: Professional Translators versus the Crowd The use of machine translation (MT) has become widespread in many areas, from household users to the translation and localization industries. Recently, the great interest shown in neural machine translation (NMT) models by the research community has made more detailed evaluation of this new paradigm essential, since several comparative studies using human and automatic evaluation of statistical and neural MT have shown that results concerning the improvements of NMT are not yet conclusive (e.g. Castilho et al. 2017). Crowdsourcing has become a frequently-employed option to evaluate MT output. In the field of translation, such crowds may consist of translation professionals, bilingual fans or amateurs, or a combination thereof. Crowdsourcing activities are at the heart of the European-funded research and innovation project TraMOOC (Translation for Massive Open Online Courses). In this presentation, we will focus on the MT evaluation crowdsourcing activities performed by professional translators and amateur crowd contributors. We will present the results of this evaluation based on automated metrics and post-editing effort and compare how translators and the general crowd assessed the quality of the NMT output. |
Emmanuelle Esperança-Rodier, Laurent Besacier and Caroline Rossi (Université de Grenoble-Alpes) Emmanuelle Esperança-Rodier is a lecturer at Univ. Grenoble Alpes (UGA), France, where she teaches English for Specific Purposes, and a member of the Laboratoire d’Informatique de Grenoble (LIG). After defending a PhD in computational linguistics, on “Création d’un diagnostic générique de langues contrôlées, avec application particulière à l’anglais simplifié”, she worked as a post-editor in a translation agency. Back at University, she participated in IWSLT and WMT evaluation campaigns, as well as in several LIG projects. She now works on the evaluation of MT systems based on competences and focused on tasks, translation error analysis and multilingualism. Co-authors: Prof. Laurent Besacier defended his PhD thesis (Univ. Avignon, France) in Computer Science in 1998 on “A parallel model for automatic speaker recognition”. Then he spent one and a half years at the Institute of Microengineering (EPFL, Neuchatel site, Switzerland) as an associate researcher working on multimodal person authentication (M2VTS European project). Since 1999 he has been an associate professor (full professor since 2009) in Computer Science at Univ. Grenoble Alpes (he was formerly at U. Joseph Fourier). From September 2005 to October 2006, he was an invited scientist at IBM Watson Research Center (NY, USA) working on Speech to Speech Translation. Caroline Rossi is a lecturer in the Applied Modern Languages department at Univ. Grenoble Alpes, where she teaches English and translation. She is a member of the Multilingual Research Group on Specialized Translation (GREMUTS) within ILCEA4 (Institut des Langues et Cultures d’Europe, Amérique, Afrique, Asie, Australie). Her current research focus is on integrating critical skills and understanding of both statistical and neural machine translation in translator training. Since 2014, Alexandre Bérard has been a PhD Student at the University of Lille (with Prof. Laurent Besacier, and Prof. Olivier Pietquin). He worked with the SequeL team (specialized in Machine Learning) at Inria Lille, and then from 2016, with GETALP (specialized in NLP) at the University of Grenoble. |
Evaluation of NMT and SMT systems: A Study on Uses and Perceptions Statistical and neural approaches have permitted fast improvement in the quality of machine translation, but we are yet to discover how those technologies can best “serve translators and end users of translations” (Kenny, 2017). To address human issues in machine translation, we propose an interdisciplinary approach linking Translation Studies, Natural Language Processing and Philosophy of Cognition. Our collaborative project is a first step in connecting sound knowledge of Machine Translation (MT) systems to a reflection on their implications for the translator. It focuses on the most recent Statistical MT (SMT) and Neural MT (NMT) systems, and their impact on the translator’s activity. BTEC-corpus machine translations, from in-house SMT and NMT systems, are subjected to a comparative quantitative analysis, based on BLEU, TER (Translation Edit Rate) and the modified version of METEOR from the LIG (Servan et al., 2016). Then, we qualitatively analyse translation errors using linguistic criteria (Vilar, 2006) or the MQM (Multidimensional Quality Metrics) with LIG tools, to determine, for each MT system, which syntactic patterns lead to translation errors and which error type occurs most often. We finally assess translators’ interactions with the main error types in a short post-editing task, completed by 10 freelance translators and 20 trainees. |
Claudio Fantinuoli (Johannes Gutenberg Universität Mainz/Germersheim) Claudio Fantinuoli is Lecturer at the Johannes Gutenberg University Mainz in Germersheim and at the Institute for Translation Studies in Innsbruck. His research and teaching areas are Language Technologies in Translation and Interpreting. |
Speech Recognition in the Interpreter Workstation In recent years, computer-assisted interpreting (CAI) programs have been used by professional interpreters to prepare assignments, to organize terminological information, and to share event-related information among colleagues (Fantinuoli, 2016, 2017). One of the key features of such tools is the ability to support users in accessing terminology during simultaneous interpretation (SI). With state-of-the-art CAI tools, interpreters need to manually input a term or part of it in order to query the database. The main drawback to this approach is that from the booth it is considered both time-consuming and, to some extent, distracting during an activity that requires concentration and rapid information processing (Tripepi Winteringham, 2010). However, initial empirical studies on the use of such tools seem to support the idea that interpreters on the job may have the time and the cognitive ability to look up terms. Furthermore, CAI tools seem to contribute to improving terminology and overall interpreting performance (Prandi 2015; Biagini 2016). With this in mind, the automatization of the querying system would represent a step forward in reducing the additional cognitive effort needed to perform this human-machine interaction. With more free cognitive ability at their disposal, it is reasonable to assume that a CAI tool equipped with an automatic lookup system would contribute to further improving the terminology and overall performance of interpreters during the simultaneous interpretation of specialized texts. Speech Recognition (SR) has been proposed as a methodology and technology to automatize the querying system of CAI tools (Fantinuoli 2016; Hansen-Schirra 2012). In the past, the difficulty of building SR systems that were accurate enough to be useful outside of a carefully controlled environment has hindered its deployment in the interpreting setting. 
However, recent advances in Artificial Intelligence, especially since the dissemination of deep learning and neural networks, have considerably increased the quality of SR (Yu and Deng, 2015). In order to be successfully integrated in an interpreter workstation, both SR and CAI tools must fulfil a series of specific requirements. For example, SR must be truly speaker-independent, have a short reaction time, and be accurate in the recognition of specialized vocabulary. On the other hand, CAI tools need to overcome some shortcomings of current implementations and need, for instance, to handle morphological variants of the selection of results and new ways to present extracted terminology. In the first part of the paper, a framework for the integration of SR in CAI tools will be defined. In particular, much attention will be devoted to the analysis of state-of-the-art SR and the problems that may arise with its integration into an interpreter workstation. Secondly, the adaptation of traditional querying systems used in CAI tools to allow for keyword spotting will be discussed and a prototype will be presented. Finally, general advantages and shortcomings of SR-CAI integration will be highlighted and prospective developments of the use of SR that support the access of terminological data will be introduced, i.e. the recognition of numbers and entities. |
Michael Farrell (Traduzioni Inglese) Michael Farrell is an untenured lecturer in computer tools for translators and interpreters at the International University of Languages and Media (IULM), Milan, Italy, the developer of the terminology search tool IntelliWebSearch, a qualified member of the Italian Association of Translators and Interpreters (AITI), and member of the Mediterranean Editors and Translators association. Besides this, he is also a freelance translator and transcreator. Over the years, he has acquired experience in the cultural tourism field and in transcreating advertising copy and press releases, chiefly for the promotion of technology products. Being a keen amateur cook, he also translates texts on Italian cuisine.
|
Building a Custom Machine Translation Engine as Part of a Postgraduate University Course: a Case Study In 2015, I was asked to design a postgraduate course on machine translation (MT) and post-editing. Following a preliminary theoretical part, the module concentrated on the building and practical use of custom machine translation (CMT) engines. This was a particularly ambitious proposition since it was not certain that students with degrees in languages, translation and interpreting, without particular knowledge of computer science or computational linguistics, would succeed in assembling the necessary corpora and building a CMT engine. This paper looks at how the task was successfully achieved using KantanMT to build the CMT engines and Wordfast Anywhere to convert and align the training data. The course was clearly a success since all students were able to train a working CMT engine and assess its output. The majority agreed their raw CMT engine output was better than Google Translate’s for the kinds of text it was trained for, and better than the raw output (pre-translation) from a translation memory tool. There was some initial scepticism among the students regarding the effective usefulness of MT, but the mood clearly changed at the end of the course with virtually all students agreeing that post-edited MT has a legitimate role to play. |
Joshua Goldsmith (Interpreter & Université de Genève) Josh Goldsmith is an EU-accredited interpreter working from Spanish, French, Italian and Catalan into English. He splits his time between interpreting and working as a trainer and researcher at the University of Geneva, where he focuses on the intersection between interpreting, technology and education. A lover of all things tech, Josh shares tips about technology and interpreting in conferences and workshops, the Interpreter’s Toolkit column (https://aiic.net/search/tags/the-interpreter’s-toolkit), and on Twitter (@Goldsmith_Josh). |
A Comparative User Evaluation of Tablets and Tools for Consecutive Interpreters Since the release of the first modern tablets, practicing interpreters have begun to consider how tablets could be used to support their interpreting practice. The first phase of a recent mixed methods study assessed the pros and cons of different tablets, applications and styluses, finding that professional interpreters were effectively using tablets for consecutive interpreting in a wide range of settings. Results also indicated that certain types of tablets, applications and styluses were especially appreciated by practitioners (Goldsmith & Holley, 2015). This paper presents the second phase of that study, building on previous conclusions to derive an instrument for carrying out a comparative user evaluation of these tablet interpreting tools. Using this instrument, it compares and contrasts the different tablets and accessories currently available on the market. Its conclusions are expected to serve as a useful guide to allow interpreters to pick the tablets, applications and styluses which best meet their needs. |
Sarah Griffin-Mason (ITI Chairperson & University of Portsmouth) I am currently Chair of the Institute of Translation and Interpreting (ITI) and Senior Lecturer in Translation Studies at the University of Portsmouth. I am an experienced freelance translator, editor and educator, teaching translation at MA and UG levels on a half-time contract while running a freelance translation and editing business. In all my roles I express a deep commitment to improving translator training. |
The Human and the Machine: Perspectives to 2045 and Beyond As Chair of the Institute of Translation and Interpreting (ITI) and Senior Lecturer in Translation Studies at the University of Portsmouth (UoP), I am constantly forced to consider the messages I give out to UoP students and ITI members aiming to work as the translators and interpreters of the future. The machines will inevitably move on and it is likely that the bulk of human translators and interpreters will have to move on with them, so, ideally, the professional associations and training centres should prepare their members and students for mutually beneficial symbiotic relationships with the machines, helping them adapt to the possibilities of new modes and models of work. But just what might these be? Starting with Ray Kurzweil (who pencilled in ‘the singularity’ for 2045), I will take a trip through content from experts including Nicholas Carr, Spence Green and Dorothy Kenny to create some semblance of what the sector, skills profiles and working patterns might look like for the humans working with the machines, outlining some potential niches for humans in the sector of the future. |
Martin Kappus, Erik Angelone, Romina Schaub-Torsello, Martin Schuler and Maureen Ehrensberger-Dow (Zürich Institute of Translation and Interpreting) Martin Kappus is a lecturer in the ZHAW Institute of Translation and Interpreting. Before joining the ZHAW faculty, he worked for a CAT tool manufacturer and a large language service provider. His research and teaching interests are language technology in general, translation technology in particular, and barrier-free communication. Martin Schuler is a research associate and head of the usability lab at the ZHAW School of Applied Linguistics. He has a BA in technical communication and an MA in Human Computer Interaction Design. He has been involved in several types of usability projects for a variety of clients. Co-authors Maureen Ehrensberger-Dow is professor of translation studies in the ZHAW Institute of Translation and Interpreting. She has been the (co)investigator in several interdisciplinary projects investigating the reality and ergonomics of professional translation. Romina Schaub-Torsello is a research assistant at the ZHAW Institute of Translation and Interpreting. She is a trained translator and graduated from the ZHAW MA program in Applied Linguistics with a thesis about the impact that disturbances can have on translator’s cognitive flow and behaviour. Erik Angelone is professor of translation studies at the ZHAW Institute of Translation and Interpreting. His research interests are in process-oriented translator training, translation pedagogy and curricular design, and empirical translation studies.
|
When is Less Actually More? A Usability Comparison of Two CAT Interfaces The body of evidence is growing that CAT tools have fundamentally altered the tasks that most non-literary translators engage in and possibly also their cognitive processing. Recent research suggests that translators may be exposing themselves to unnecessary cognitive friction by the way they use their tools (O’Brien et al. 2017). If tool settings and features do not align with translators’ ways of working, then flow can be interrupted and cognitive load increased. Fatigue and reduced attention are two consequences of cognitive overload over extended periods, both of which have been associated with an increase in errors and lower productivity. We report on a usability comparison of two interfaces for translation work that differ with respect to the information and functions available on the screen when the factory default settings are used (i.e. one interface has several fields with supporting functions visible and the other has a simpler look). Eye tracking measures and indicators from retrospective commentaries and interviews highlight how novices interact with the two interfaces and various features. We consider the implications of our findings in light of recent calls for less cluttered user interfaces and open the discussion of how cognitive load can be reduced. |
Samuel Läubli (University of Zürich) and David Orrego-Carmona (Aston University) Samuel Läubli David Orrego-Carmona |
When Google Translate is better than Some Human Colleagues, those People are no longer Colleagues Expressing discomfort with machine translation is a recurrent theme in translator forums on social media. We analyse posts from Facebook, LinkedIn, and Twitter, showing that translators spend considerable time and effort on mocking and discussing machine translation failures. Research has shown a disconnect between translation technology researchers and practitioners, while at the same time indicating the benefits of collaboration between them. Reflecting on an analysis of posts and engagement of translators in social media, we outline three suggestions to bridge this gap: (i) identify and report patterns rather than isolated errors, (ii) organise or participate in evaluation campaigns, and (iii) engage in cross-disciplinary discourse. Rather than pointing out each other’s deficiencies, we call for computer scientists, translation scholars, and professional translators to advance translation technology by acting in concert. |
Bianca Prandi (Johannes Gutenberg Universität Mainz/Germersheim) Bianca Prandi is a doctoral student at the University of Mainz/Germersheim. She holds a BA in Intercultural Linguistic Mediation and a MA in Interpreting from the University of Bologna/Forlì. She graduated with a dissertation on the integration of the CAI tool InterpretBank in the curriculum of interpreting students. She is currently working on her doctoral dissertation at Mainz University under the supervision of Prof. Dr. Hansen-Schirra. Her main research interests are new technologies in interpreting and cognition. |
Designing a Multimethod Study on the Use of CAI Tools during Simultaneous Interpreting Even though studies on computer-assisted interpreting still represent a very small percentage in the body of research, the topic is starting to gain attention in the interpreting community. So far, only a handful of studies have focused on the use of CAI tools in the interpreting booth (Gacek, 2015; Biagini, 2015; Prandi, 2015a, 2015b). While they did shed some light on the usability and the reception of CAI tools as well as on the terminological quality of simultaneous interpreting performed with the support of such tools, these studies were only product-oriented. We still lack process-oriented, empirical research on computer-aided interpreting. A pilot study currently underway at the University of Mainz/Germersheim (Prandi, 2016, 2017) aims at bridging this gap by combining process- and product-oriented methods. After discussing the theoretical models adopted to date in CAI research, this paper will suggest how an adaptation of Seeber’s (2011) Cognitive Load Model can be better suited than Gile’s (1988, 1997, 1999) Effort Model to operationalize hypotheses on the use of CAI tools in the booth. The paper will then introduce the experimental design adopted in the study with a focus on the features of the texts used and on the rationale behind their creation. |
Jon Riding and Neil Boulton (United Bible Societies) Jon Riding leads the Glossing Technologies Project for United Bible Societies. The project develops language independent NLP systems to assist Bible translators by automatically analysing elements of natural languages. He is a Visiting Researcher at Oxford Brookes University.
Neil Boulton works as part of the Glossing Technologies Project for United Bible Societies. The project develops language independent NLP systems to assist Bible translators by automatically analysing elements of natural languages. Previously most of his working life has been spent in various IT roles for British and Foreign Bible Society, based in Swindon, UK. |
Learning from Sparse Data - Meeting the Needs Big Data can't Reach The Bible Societies have for many years built systems to help translators working in scores of languages. Their focus is on linguistic analysis rather than synthesis. But there is a problem, shared by all MT systems. Until there is enough text to train the system, the output is limited. By the time there is enough training data, much of the task may already be complete. To address this, United Bible Societies has begun to re-imagine what we expect from our translation support systems. A system is now in development which begins learning about the target language at the very start of a Bible translation project. Rather than building stand-alone morphology analysers, glossing engines and aligners, the project constructs a learning framework within which all of these machines, and more, can operate with very small amounts of text, using outputs from one context to strengthen a hypothesis from another. This paper describes a framework within which such processing might take place, how that framework enables learning to take place from very small amounts of data, how that learning is gradually aggregated into a coherent language model and how this model is used to support the translator in their work.
|
Marianne Starlander (Université de Genève) Dr. Marianne Starlander is a CAT tool specialist and lecturer at the Faculty of Translation and Interpreting of the University of Geneva. She joined the multilingual information processing department in 2000, where she worked as a teaching and research assistant and now as teaching staff. She originally trained as a translator at the same faculty and also holds a post-graduate degree in European studies from the European Institute of the University of Geneva (2000). |
Adapting a Computer Assisted Translation MA Course to New Trends This paper presents how we adapted our MA CAT tool class to two current trends. The first trend is the drastic rise in the number of students enrolled in the class. We report on the impact of this rise by presenting how challenging it has become to give students the assignment described in Starlander and Morado Vazquez (2013) and in Starlander (2015). The discussion is oriented towards how to teach this evaluation methodology in a different way, by adapting the content and most of all the teaching methods (crowdsourcing, online quiz). We describe in detail how these new activities fit into the entire course content. The second trend is the integration of MT into CAT tools. How can we best introduce this evolution in our teaching? We present the main results of a preliminary experience of integrating a translation exercise involving the use of MT. The final discussion is dedicated to more general teaching challenges implied by the ever-moving trends in teaching translation technology. |
Carlos Teixeira, Joss Moorkens, Daniel Turner, Joris Vreeke and Andy Way (Dublin City University) Carlos Teixeira is a post-doctoral researcher in the ADAPT Centre for Intelligent Digital Content Technology and a member of the Centre for Translation and Textual Studies (CTTS) at Dublin City University (DCU). He holds a PhD in Translation and Intercultural Studies and Bachelor degrees in Electrical Engineering and Linguistics. His research interests include Translation Technology, Translation Process Research, Translator-Computer Interaction, Localisation and Specialised Translation. He has vast experience in the use of eye tracking for assessing the usability of translation tools. His industry experience includes over 15 years working as a translator, localiser and language consultant. Joss Moorkens is a lecturer in Translation Technology in the School of Applied Language and Intercultural Studies (SALIS) at DCU and a researcher in the ADAPT Centre and CTTS. Within ADAPT, he has contributed to the development of translation tools for both desktop and mobile. He is co-editor of a book on human and machine translation quality and evaluation (due in 2018) and has authored journal articles and book chapters on topics such as translation technology, post-editing of machine translation, human and automatic translation quality evaluation, and ethical issues in translation technology in relation to both machine learning and professional practice. Daniel Turner is a research engineer in the ADAPT Centre’s Design & Innovation Lab (dLab). Within ADAPT, he has contributed to projects with a strong focus on rapid prototyping of user interfaces. He is proficient in full stack development with experience using a variety of languages and tools. Joris Vreeke is Scrum Master and Senior Software Engineer in the ADAPT Centre’s dLab. He has a background in software development and design with a preference for graphics, UI/UX and web application development. 
Andy Way is a professor in the School of Computing at DCU and leads ADAPT Centre’s Transforming Digital Content theme as well as the Localisation spoke, supervising projects with prominent industry partners. He has published over 350 peer-reviewed papers and successfully graduated numerous PhD and MSc students. His research interests include all areas of machine translation such as statistical MT, example-based MT, neural MT, rule-based MT, hybrid models of MT, MT evaluation and MT teaching. |
Creating a Tool for Multimodal Translation and Post-editing on Touch-Screen Devices Only a few translation tools have been created with an ‘organic’ integration of TM and MT, i.e. tools that were designed to work equally well for post-editing MT and for handling TM matches. Still, these scarce options are based on the traditional input modes comprised of keyboard and mouse. Building on our experience in creating a prototype mobile post-editing interface for smartphones, we have created a translation editing environment that accepts additional input modes, such as touch commands (on devices equipped with touch screens, such as tablets and select laptops) and voice commands – using automatic speech recognition. Another important tool feature is the inclusion of accessibility principles from the outset, with the aim of opening translation editing to professionals with special needs. In particular, the tool is being designed to cater for blind translators. Our presentation will report on initial usability tests with an early version of the tool. The results include productivity measurements as well as data collected using satisfaction reports. Our ultimate goal is to test whether the tool can help alleviate some of the pain points of the edit-intensive, mechanical task of desktop post-editing. |
Vincent Vandeghinste, Sven Coppers, Jan Van den Bergh, Tom Vanallemeersch, Els Lefever, Ayla Rigouts Terryn, Iulianna van der Lek-Ciudin, Bram Bulté, Frieda Steurs and Karin Coninx (Katholieke Universiteit Leuven) Dr. Vincent Vandeghinste is a post-doctoral researcher at the KU Leuven, and has been working on natural language processing and translation technologies since 1998. He is the project coordinator of the SCATE project (Smart Computer-Aided Translation Environment), and (co)-authored about 70 publications in the areas of corpus building, treebanking, machine translation, augmented alternative communication and text-to-pictograph translation. He teaches Natural Language Processing, Language Engineering Applications and Linguistics And Artificial Intelligence in the advanced masters program for Artificial Intelligence at KU Leuven, as well as Computational Linguistics to students of Linguistics. He has been working on machine translation since 2004. SCATE is a 3 million euro Flemish project to improve the translation environment of professional translators. Sven Coppers studied computer science at Hasselt University (UHasselt) and is interested in various aspects of Human Computer Interaction, such as 2D and 3D visualizations, user-centered software engineering, context-awareness and intelligibility (comprehensibility). Currently, he is doing a PhD about making context-aware Internet-of-things applications more understandable and controllable for end-users. In addition, he is working on user interfaces for translation environments within the SCATE project, with a focus on usability, intelligibility, customization and collaboration. Dr. Jan Van den Bergh is a post-doctoral researcher and research assistant at Hasselt University and member of the HCI group in the research institute Expertise Centre for Digital Media. 
His research is situated in user-centred engineering of context-aware, mobile or collaborative systems. His recent research is focused on how interactive technology can support knowledge workers and/or end users in specific domains, including professional translation and human-robot collaboration in manufacturing. He obtained a PhD in computer science (human-computer interaction) from Hasselt University. He co-organized several scientific workshops and served as PC member for several conferences and workshops. He is a member of the IFIP working group 13.2 on User-Centred Systems Design. Tom Vanallemeersch is a researcher in the field of translation technology at KU Leuven. After his studies in translation and in language technology during the early nineties, he focused his attention on various forms of translation software, including translation memories, automated alignment and machine translation. His career involves both academia and industry. He worked in two Belgian translation agencies (Xplanation, LNE International), a French MT development company (Systran), and the MT team of the Commission’s DG Translation in Luxembourg. In academia, he taught the ins and outs of TM and MT to Applied Linguistics students at Lessius Hogeschool (now KU Leuven), then started working at the University’s Centre for Computational Linguistics. He currently performs research in SCATE (Smart Computer-Aided Translation Environment), an extensive, six-team project coordinated by the Centre. While Tom is passionate about translation technology, his career sporadically shifted to other types of natural language processing, such as terminology extraction (coordination of the TermTreffer project at Dutch Language Union). Dr. Els Lefever is an assistant professor at the LT3 language and translation technology team at Ghent University. She started her career as a computational linguist at the R&D-department of Lernout & Hauspie Speech products. 
She holds a PhD in computer science from Ghent University on ParaSense: Parallel Corpora for Word Sense Disambiguation (2012). She has a strong expertise in machine learning of natural language and multilingual natural language processing, with a special interest for computational semantics, cross-lingual word sense disambiguation, event extraction and multilingual terminology extraction. She is currently involved in the SCATE project (work package on bilingual terminology extraction from comparable corpora) and the Multilingual IsA project (multilingual database of hypernym relations) and supervises PhD projects on terminology extraction from comparable corpora, semantic operability of medical terminology, irony detection and disambiguation of terminology in a cross-disciplinary context. She teaches Terminology and Translation Technology, Language Technology and Digital Humanities courses. Ayla Rigouts Terryn is a PhD researcher at the Language and Translation Technology Team (LT3) research group, at the Department of Translation, Interpreting and Communication of Ghent University. She graduated from the University of Antwerp in 2014 with a Master’s in Translation and worked there as a scientific fellow on a one-year research project about translation revision competence. In 2015, she joined the LT3 research group to work on the terminology work package of the SCATE project. She is currently working as an FWO scholar on her PhD about bilingual terminology extraction from comparable corpora. Bram Bulté obtained an MA in linguistics and literature (2005) and a PhD in linguistics (2013) from Brussels University, and an MA in statistics (2016) and in artificial intelligence, option speech and language technology (2017) from KU Leuven. He worked as a translator for the European Parliament (2007-2015) and as a guest professor at Brussels University (2015-2017). He currently works for the Centre for Computational Linguistics at KU Leuven. 
His research focuses on second language acquisition, multilingual education and natural language processing. Iulianna van der Lek is passionate about Language Technologies, always looking for ways to improve the translators’ efficiency. She is currently working as a Research Associate and teaching assistant at KU Leuven, Faculty of Arts, Campus Antwerp. Her research focuses on computer-assisted translation tools and their impact on the translation process, usability, and methods of acquiring domain-specific terminology. As a certified memoQ, Memsource and SDL trainer, she is teaching computer-assisted translation tools both to students and freelance translators. Besides research and teaching activities, she is also coordinating the Postgraduate Programme in Specialised Translation, developing new modules on language technologies and training programs for professional translators. Prof. Dr. Karin Coninx is full professor at Hasselt University (UHasselt), Belgium. She obtained a PhD in sciences, computer science after a study of Human-Computer interaction (HCI) in immersive virtual environments. Her research interests include user-centred methodologies, persuasive applications in the context of eHealth, technology-supported rehabilitation, serious games, (multimodal) interaction in virtual environments, haptic feedback, intelligibility, mobile and context-sensitive systems, interactive work spaces, and the model-based realisation of user interfaces. Prof. Dr. Frieda Steurs is full professor at the KU Leuven, Faculty of Arts, campus Antwerpen. She works in the field of terminology, language technology, specialized translation and multilingual document management. She is a member of the research group Quantitative Lexicology and Variation Linguistics (QLVL). Her research includes projects with industrial partners and public institutions. She is the founder and former president of NL-TERM, the Dutch terminology association for both the Netherlands and Flanders. 
She is also the head of the ISO TC/37 standardization committee for Flanders and the Netherlands. She is the president of TermNet, the International Network for Terminology (Vienna). Since 2016, she has been the head of research of the INT, the Dutch Language Institute in Leiden. In this capacity, she is responsible for the collection, development and hosting of all digital language resources for the Dutch Language. The INT is the CLARIN centre for Flanders, Belgium. |
The SCATE Prototype: A Smart Computer-Aided Translation Environment We present SCATE: A Smart Computer-Aided Translation Environment developed in the SCATE research project. It is a carefully designed prototype of the user interface of a translation environment that displays different translation suggestions coming from different resources in an intelligible and interactive way. Our environment contains carefully designed representations that show relevant context to clarify why certain suggestions are given. In addition, several relationships between the source text and the suggestions are made explicit, to help the user understand how a suggestion can be used and select the most appropriate one. Well-designed interaction techniques are included that improve the efficiency of the user interface. The suggestions are generated through different web services, such as translation memory (TM) fuzzy matching, machine translation (MT) and support for terminology. A lookup mechanism highlights terms in the source segment that are available with their translation equivalents in the bilingual glossary. |
Geoffrey Westgate (World Intellectual Property Organization) Geoffrey Westgate is Head of the Support Section, PCT Translation Division, at WIPO in Geneva, Switzerland. After obtaining a DPhil in 1999 from the University of Oxford, UK, where he also taught German language and literature, he worked initially as a translator and then a reviser in WIPO’s patent translation department. Since 2009 he has headed the Division’s Support Section, with responsibility for computer-assisted translation tools, translation project management, and terminology management, including WIPO’s online terminology portal, WIPO Pearl. |
WIPO Pearl - The Terminology Portal of the World Intellectual Property Organization In this paper we shall present WIPO Pearl, the multilingual terminology portal of the World Intellectual Property Organization, a specialized agency of the United Nations. The nature of the linguistic dataset made available in WIPO Pearl will be described and we shall show how multilingual knowledge representation is achieved and graphically displayed. Secondly, we shall demonstrate how such data is exploited to facilitate search of prior art for patent filing or examination purposes, by leveraging the validated linguistic content as well as the validated conceptual relations that are presented in “concept maps”. We shall discuss how, in addition to humanly validated concept maps, “concept clouds” are generated by means of machine learning algorithms which automatically cluster concepts in the database by exploiting textual data embedded in the terminology repository. And finally, we shall present opportunities for collaborations with WIPO in the field of terminology. |
Andrzej Zydroń (XTM) Andrzej Zydroń MBCS CITP, CTO at XTM International, is one of the leading IT experts on Localization and related Open Standards. Zydroń sits or has sat on the following Open Standard Technical Committees: 1. LISA OSCAR GMX Zydroń has been responsible for the architecture of the essential word and character count GMX-V (Global Information Management Metrics eXchange) standard, as well as the revolutionary xml:tm (XML based text memory) standard which will change the way in which we view and use translation memory. Zydroń is also chair of the OASIS OAXAL (Open Architecture for XML Authoring and Localization) reference architecture technical committee which provides an automated environment for authoring and localization based on Open Standards. Specific areas of specialization: |
Beyond Neural MT A lot of hype and excitement has surrounded the latest advances in Neural Machine Translation (NMT), and generally with some justification: output is more fluent and closer to normal human output. Nevertheless, some of the claims need to be qualified, and practical implementation of NMT is not without difficulty: typically double the training material is required when compared to Statistical Machine Translation (SMT), and training a new engine can take weeks, not days. Although NMT can produce much more fluent output than SMT, it can have limited impact concerning real world localization tasks. Extensive tests have proven that in the end there is no great improvement in post-editing throughput and NMT is no panacea for omission or mistranslation. With NMT it is also impossible to ‘tune’ the output as can be done with SMT: you have no knowledge of how the NMT engine has made its decisions. Most practical translation projects do not have anywhere near enough training data and do not have the luxury of waiting weeks for an engine to be trained even if there were enough data. If the training has been done on unrelated material or material that is not directly relevant to the customer’s terminology, then misleading results can be produced: due to its improved fluency, NMT can make identifying mistakes more difficult. NMT quality can also drop with the length of sentences, and translating from a morphologically rich language to one that is morphologically impoverished can in fact produce worse results than SMT. |