Posters – TC38

Rocío Palomares-Perraut and Carmen Gómez-Camarero (University of Málaga)

Carmen Gómez-Camarero is an Assistant Professor at the University of Málaga, specializing in Library and Information Science. She teaches and researches information literacy in the degree in Translation and Interpreting, and also works on gender studies and information literacy skills. She has participated in several academic research projects in these areas and has published work in national and international venues.

Rocío Palomares-Perraut is an Assistant Professor at the University of Málaga, specializing in Library and Information Science. She teaches and researches information literacy in the degree in Translation and Interpreting, and also works on gender studies and information literacy skills. She has participated in several academic research projects in these areas and has published work in national and international venues.

How translators can improve multilingual terminology in a link: teaching case study examples

The Web has become an indispensable tool for obtaining information. It is a platform where nodes such as objects, people and identities are interconnected. Translators use it to collaborate and exchange information and knowledge in order to obtain the most accurate and appropriate data and so ensure the quality of their translations. To do so, however, translators need strong information literacy skills, which are crucial when deciding whether a given term or expression is the appropriate one.

In this work we describe a method that lets translators and interpreters enrich and improve, from a single link, the multilingual terminology they need to translate specialized texts. The method relies on controlled vocabularies such as thesauri, classification schemes, subject heading systems and taxonomies that are published as Linked Open Data within the framework of the Semantic Web. Linked Data uses the Web to connect related data that was not previously linked; in the context of terminology, each concept is uniquely identified by a URI (Uniform Resource Identifier) and described with SKOS (Simple Knowledge Organization System), a W3C standard that supports the use of knowledge organization systems on the Web.
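To make the URI and SKOS idea concrete, here is a minimal sketch that models one thesaurus concept as plain Python data and serializes it as Turtle. The concept URI under `http://example.org/`, the labels, and the helper names `Concept`, `literal` and `to_turtle` are hypothetical illustrations, not taken from any of the vocabularies discussed; only the SKOS namespace and the properties `skos:prefLabel`, `skos:altLabel` and `skos:broader` come from the SKOS standard.

```python
from dataclasses import dataclass, field

# SKOS core namespace, as defined in the W3C SKOS Reference.
SKOS_NS = "http://www.w3.org/2004/02/skos/core#"


@dataclass
class Concept:
    uri: str                                   # unique identifier of the concept
    pref_labels: dict[str, str]                # language tag -> preferred label
    alt_labels: dict[str, list[str]] = field(default_factory=dict)  # language tag -> variants
    broader: list[str] = field(default_factory=list)                # URIs of broader concepts


def literal(text: str, lang: str) -> str:
    """Return a language-tagged Turtle literal, escaping backslashes and quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"@{lang}'


def to_turtle(concept: Concept) -> str:
    """Serialize one concept as a small Turtle document using SKOS properties."""
    statements = ["a skos:Concept"]
    for lang, label in sorted(concept.pref_labels.items()):
        statements.append(f"skos:prefLabel {literal(label, lang)}")
    for lang, labels in sorted(concept.alt_labels.items()):
        for label in labels:
            statements.append(f"skos:altLabel {literal(label, lang)}")
    for uri in concept.broader:
        statements.append(f"skos:broader <{uri}>")
    body = " ;\n    ".join(statements)
    return f"@prefix skos: <{SKOS_NS}> .\n\n<{concept.uri}> {body} .\n"


# Hypothetical example concept; the URIs are placeholders, not a real thesaurus.
renewable = Concept(
    uri="http://example.org/thesaurus/renewable-energy",
    pref_labels={"en": "Renewable energy", "es": "Energía renovable",
                 "fr": "Énergie renouvelable"},
    alt_labels={"en": ["Alternative energy"], "es": ["Energía alternativa"]},
    broader=["http://example.org/thesaurus/energy"],
)

if __name__ == "__main__":
    print(to_turtle(renewable))
```

Because every label hangs off the same URI, a translator who reaches the concept through any one language gets the equivalents in all the others.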

These controlled vocabularies serve various functions, including gathering variant terms and synonyms for each concept, linking concepts in a logical order, sorting terms into categories, promoting consistent use of preferred terms, and improving and enriching translators’ multilingual and specialized terminology. We describe some examples used in class to teach these functions and the usefulness of controlled vocabularies. The specialized texts are taken from SINC (Servicio de Información y Noticias Científicas), a Spanish website for scientific communication. Students must identify and select the appropriate keywords by consulting the following controlled vocabularies: the UNESCO Thesaurus, the Cultural Heritage of Spain Thesauri (Tesauros del Patrimonio Cultural de España), the Répertoire d’autorité-matière encyclopédique et alphabétique unifié (RAMEAU) of the National Library of France, and the Library of Congress Subject Headings (LCSH).
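The functions listed above (variant terms, preferred terms, links between concepts) are what let a thesaurus lead a student from a keyword to its equivalent. The following is a minimal sketch, assuming a tiny hand-built vocabulary rather than a real service such as the UNESCO Thesaurus: a keyword, which may be a non-preferred variant, is resolved to its concept, and the preferred label in the target language is returned together with the broader concept. The data, URIs and the names `Concept`, `build_index` and `lookup` are hypothetical.

```python
import unicodedata
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Concept:
    uri: str
    pref: dict[str, str]                                      # language -> preferred term
    alt: dict[str, list[str]] = field(default_factory=dict)   # language -> variant terms
    broader: Optional[str] = None                             # URI of the broader concept


def normalize(term: str) -> str:
    """Canonical form for matching: Unicode NFC, trimmed, case-folded."""
    return unicodedata.normalize("NFC", term).strip().casefold()


def build_index(concepts: list[Concept]) -> dict[tuple[str, str], Concept]:
    """Map (language, normalized term) to its concept, for preferred and variant terms."""
    index: dict[tuple[str, str], Concept] = {}
    for c in concepts:
        for lang, label in c.pref.items():
            index[(lang, normalize(label))] = c
        for lang, labels in c.alt.items():
            for label in labels:
                index[(lang, normalize(label))] = c
    return index


def lookup(term: str, source: str, target: str,
           concepts: list[Concept]) -> Optional[dict[str, Optional[str]]]:
    """Resolve a source-language term to the preferred target-language term."""
    concept = build_index(concepts).get((source, normalize(term)))
    if concept is None:
        return None
    by_uri = {c.uri: c for c in concepts}
    parent = by_uri.get(concept.broader) if concept.broader else None
    return {
        "uri": concept.uri,
        "preferred": concept.pref.get(target),
        "broader": parent.pref.get(target) if parent else None,
    }


# Hypothetical sample vocabulary with placeholder URIs.
ENERGY = Concept("http://example.org/thesaurus/energy",
                 {"en": "Energy", "es": "Energía", "fr": "Énergie"})
RENEWABLE = Concept("http://example.org/thesaurus/renewable-energy",
                    {"en": "Renewable energy", "es": "Energía renovable",
                     "fr": "Énergie renouvelable"},
                    {"en": ["Alternative energy"], "es": ["Energía alternativa"]},
                    broader=ENERGY.uri)
VOCAB = [ENERGY, RENEWABLE]

if __name__ == "__main__":
    # A variant Spanish term resolves to the preferred French term.
    print(lookup("energía alternativa", "es", "fr", VOCAB))
    # {'uri': 'http://example.org/thesaurus/renewable-energy', 'preferred': 'Énergie renouvelable', 'broader': 'Énergie'}
```

Indexing variant terms alongside preferred ones is what allows a student’s first guess to land on the right concept even when it is not the vocabulary’s preferred form.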

Mark Unitt (Capita Translation and Interpreting)

Mark began his career in the translation industry in 1991 as a localization engineer. From the early days, he was involved in the development of computer-assisted translation environments, developing methodology for the IBM TM2, Trados and DejaVu systems.
In 2004, as part of an initiative by Applied Language Solutions, Mark started working with a team of NLP professionals on the development of Moses-based Machine Translation environments and the application of Machine Translation in localization workflows. Currently, Mark is Head of Language Product Development at Capita Translation and Interpreting, where he manages a team of language experts and software developers working on a number of projects that support the integration of technology into translation workflows. These projects include predictive analysis of Machine Translation quality, workflow automation, big data analysis, and data mining and collection for corpus creation.

Combining different tools to build a semi-supervised data collection model to increase MT quality and performance

The quality of Machine Translation (MT) output is driven by the quality of the input. LSPs such as Capita, seeking the best way to implement MT within their industry- and domain-specific workflows, are continuously looking for mechanisms to help them collect high-quality corpora for engine building and training.

Over the past three years, our NLP team has been building and training a number of “non-customer” domain-specific engines for different language pairs. In the process, they have built a semi-supervised corpus collection tool that has significantly reduced collection time and increased the quality of the data.

We will share our model and explain how we have combined existing tools with our in-house developments to build a data collection model that uses keywords generated from domain-specific material to assess whether candidate data is compatible with the domain. These keywords are then used to identify compatible monolingual data, which is then paired with data in the target languages. Once bilingual data is available, we produce segmented bilingual corpora using alignment tools.
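The abstract does not say how keywords are generated or how domain compatibility is scored. As one plausible approach, and explicitly not Capita’s implementation, the following sketch ranks words by a smoothed log-ratio of their frequency in domain material versus a general reference corpus, then keeps monolingual sentences whose share of keyword tokens meets a threshold. The function names and the `top_n`, `min_count` and `threshold` values are assumptions chosen for illustration.

```python
import math
import re
from collections import Counter

# Runs of letters only (Unicode-aware); digits and punctuation are ignored.
WORD = re.compile(r"[^\W\d_]+")


def tokens(text: str) -> list[str]:
    return [w.casefold() for w in WORD.findall(text)]


def domain_keywords(domain_docs: list[str], reference_docs: list[str],
                    top_n: int = 20, min_count: int = 2) -> list[str]:
    """Rank words over-represented in domain material relative to a reference corpus."""
    dom = Counter(t for d in domain_docs for t in tokens(d))
    ref = Counter(t for d in reference_docs for t in tokens(d))
    dom_total, ref_total = sum(dom.values()), sum(ref.values())
    vocab = len(dom.keys() | ref.keys())

    def log_ratio(word: str) -> float:
        # Add-one smoothing so words unseen in the reference corpus get a finite score.
        p_dom = (dom[word] + 1) / (dom_total + vocab)
        p_ref = (ref[word] + 1) / (ref_total + vocab)
        return math.log(p_dom / p_ref)

    # Keep only reasonably frequent words that are more typical of the domain.
    candidates = [w for w, c in dom.items() if c >= min_count and log_ratio(w) > 0]
    candidates.sort(key=lambda w: (-log_ratio(w), w))
    return candidates[:top_n]


def domain_score(sentence: str, keywords: list[str]) -> float:
    """Fraction of a sentence's tokens that are domain keywords."""
    toks = tokens(sentence)
    if not toks:
        return 0.0
    kw = set(keywords)
    return sum(t in kw for t in toks) / len(toks)


def select_in_domain(sentences: list[str], keywords: list[str],
                     threshold: float = 0.2) -> list[str]:
    """Keep monolingual candidate sentences that look compatible with the domain."""
    return [s for s in sentences if domain_score(s, keywords) >= threshold]


if __name__ == "__main__":
    domain = ["The turbine blade converts wind energy.",
              "Offshore wind turbine farms need turbine maintenance."]
    reference = ["The meeting starts at noon.", "She converts the file to text."]
    kws = domain_keywords(domain, reference, top_n=5)
    print(kws)
    print(select_in_domain(["Wind turbine output rose sharply.",
                            "The cat sat on the mat."], kws))
```

A log-ratio against a reference corpus favours terms that are characteristic of the domain rather than merely frequent, which is why function words such as “the” drop out.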

Finally, we obtain a dataset that can be evaluated and cleaned using both automated tools and human evaluation.
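The automated part of this cleaning step is not specified in the abstract. The following is a hedged sketch using common heuristics (empty segments, exact duplicates, untranslated segments, and character-length ratio), with borderline pairs routed to human reviewers to reflect the semi-supervised set-up. The `Pair` type, the `triage` function and the ratio thresholds are hypothetical; suitable thresholds depend on the language pair.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pair:
    source: str
    target: str


def triage(pairs: list[Pair], max_ratio: float = 2.0,
           review_ratio: float = 1.5) -> tuple[list[Pair], list[Pair], list[Pair]]:
    """Split aligned pairs into (keep, review, reject) using simple heuristics.

    Pairs whose character-length ratio falls between review_ratio and max_ratio
    are not discarded automatically but sent to a human reviewer.
    """
    keep: list[Pair] = []
    review: list[Pair] = []
    reject: list[Pair] = []
    seen: set[tuple[str, str]] = set()
    for p in pairs:
        src, tgt = p.source.strip(), p.target.strip()
        if not src or not tgt or (src, tgt) in seen:   # empty side or exact duplicate
            reject.append(p)
            continue
        seen.add((src, tgt))
        if src.casefold() == tgt.casefold():            # likely left untranslated
            reject.append(p)
            continue
        ratio = max(len(src), len(tgt)) / min(len(src), len(tgt))
        if ratio > max_ratio:                           # probable misalignment
            reject.append(p)
        elif ratio > review_ratio:                      # borderline: ask a human
            review.append(p)
        else:
            keep.append(p)
    return keep, review, reject
```

Character length is only a rough proxy for correspondence; for language pairs with very different scripts, a token-based ratio or a score from the alignment tool itself would likely be a better signal.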