Posters – Archive of TC36

(alphabetical order of presenter)

Hanna Bechara

I previously studied at Eberhard Karls University of Tübingen, where I earned my Bachelor’s degree in Computational Linguistics and Computer Science in 2010.

After completing an internship at Dublin City University, I started research on statistical post-editing for machine translation and was awarded my Master of Science (by Research) by Dublin City University in 2013.

I joined the EXPERT project as ESR 12 at the University of Wolverhampton on 6 January 2014.
My research centres on evaluation methods for different machine translation systems, specifically the hybrid systems proposed by the EXPERT project.

This work is under the supervision of Dr Constantin Orasan and Prof Ruslan Mitkov.

Rohit Gupta

I am happy to be a part of the Research Group in Computational Linguistics, Research Institute of Information and Language Processing, University of Wolverhampton, UK.

I am working here as an Early Stage Researcher in the Marie Curie EXPERT project and am also pursuing a PhD under the supervision of Dr Constantin Orasan, Professor Ruslan Mitkov and Dr Iustin Dornescu.

In the past, I worked at CDAC Mumbai on the English to Indian Language Machine Translation (EILMT) project in collaboration with IIT Bombay and CDAC Pune.

I completed my Master of Technology (silver medallist) in Information Technology (with specialization in Intelligent Systems) at the Indian Institute of Information Technology Allahabad, India, and my Bachelor of Technology at the University Institute of Engineering and Technology, CSJM University Kanpur, India.

My research interests include Computer Aided Translation, Translation Memory, Statistical Machine Translation, Anaphora Resolution and Information Retrieval. For more information, please refer to my website (http://pers-www.wlv.ac.uk/~in4089/index.html).

Constantin Orasan

I am currently a reader in computational linguistics at the University of Wolverhampton and the deputy head of the Research Group in Computational Linguistics.

I am also the Local Course Coordinator (Wolverhampton) for the Erasmus Mundus International Masters in Natural Language Processing and Human Language Technology (IM NLP&HLT), coordinator of the FIRST project and coordinator of the EXPERT project.

In the past, I managed two European projects, QALL-ME and MESSAGE, and was involved in many other projects.

Before becoming a lecturer, I worked as a research fellow on the CAST project. I received my BSc in computer science from Babes-Bolyai University, Cluj-Napoca, Romania, and my PhD from the University of Wolverhampton with a thesis entitled “Comparative evaluation of modular automatic summarisation systems using CAST”, under the supervision of Prof. Ruslan Mitkov and Dr. Chris Paice.

I am currently the module leader of a course on programming for corpus linguistics and am also teaching on two other masters courses.

My research interests in computational linguistics are text summarisation, question answering, anaphora and coreference resolution, the building, annotation and exploitation of corpora, and machine learning for natural language processing. Please refer to my website for more information (http://pers-www.wlv.ac.uk/~in6093/).

Intelligent Translation Memory Matching and Retrieval Metric Exploiting Linguistic Technology

Translation Memories (TMs) help translators in their task by retrieving previously translated matches, including fuzzy matches to be edited when no exact match is found in the system. Current TM systems use simple edit distance or some variation of it, which relies largely on the surface form of the sentences and does not necessarily reflect the semantic similarity of segments as judged by humans. In this paper, we propose an intelligent metric for computing the fuzzy match score, inspired by similarity and entailment techniques developed in Natural Language Processing.
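For reference, a word-level edit-distance fuzzy match score of the kind criticised above can be sketched as follows. This is a minimal illustration that assumes whitespace tokenisation, lowercasing and normalisation by the length of the longer segment; it is not OmegaT's exact formula, and the function names are hypothetical.

```python
def word_edit_distance(a: list[str], b: list[str]) -> int:
    """Levenshtein distance over tokens: insertions, deletions and substitutions cost 1."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, tok_a in enumerate(a, start=1):
        curr = [i]
        for j, tok_b in enumerate(b, start=1):
            cost = 0 if tok_a == tok_b else 1
            curr.append(min(prev[j] + 1,          # delete tok_a
                            curr[j - 1] + 1,      # insert tok_b
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def fuzzy_match_score(source: str, tm_segment: str) -> float:
    """Surface-form similarity in [0, 1]; 1.0 means identical token sequences."""
    a, b = source.lower().split(), tm_segment.lower().split()
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - word_edit_distance(a, b) / longest
```

For example, "The cat sat on the mat" against "The cat sat on a mat" differs by one substitution over six tokens, giving about 0.83, whereas a passive paraphrase such as "The proposal was approved by the committee" against "The committee approved the proposal" scores much lower despite meaning the same thing. This is the gap the proposed metric targets.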

We have adapted our SemEval textual similarity and entailment system (Gupta et al., 2014) to measure the similarity of TM segments. Given the amount of computation involved in the task, we kept from the original system only those features that can be calculated quickly and have the most impact. The system uses features based on surface form, part-of-speech information, lemmas, typed dependency parsing, named entities, paraphrasing, machine translation evaluation, and corpus pattern analysis (Hanks, 2013).
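The surface-form part of such a feature set can be illustrated with a few cheap overlap features. The sketch below assumes whitespace tokenisation and uses hypothetical feature names of our own; the lemma, POS, dependency, named-entity and paraphrase features would require CoreNLP and PPDB output and are not shown.

```python
def surface_features(s1: str, s2: str) -> dict[str, float]:
    """A few cheap surface-form similarity features (hypothetical feature set)."""
    t1, t2 = s1.lower().split(), s2.lower().split()

    def jaccard(x: set, y: set) -> float:
        union = x | y
        return len(x & y) / len(union) if union else 1.0  # two empty sets count as identical

    def bigrams(tokens: list[str]) -> set[tuple[str, str]]:
        return set(zip(tokens, tokens[1:]))

    longest = max(len(t1), len(t2))
    return {
        "token_jaccard": jaccard(set(t1), set(t2)),           # shared vocabulary
        "bigram_jaccard": jaccard(bigrams(t1), bigrams(t2)),  # shared local word order
        "length_ratio": min(len(t1), len(t2)) / longest if longest else 1.0,
    }
```

In a system like the one described, values such as these would be concatenated with the other feature groups before being passed to the regression model.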

The Stanford CoreNLP toolkit (Manning et al., 2014) provides the lemmas, parts of speech (POS), named entities and dependency relations of the words in each sentence. We used the PPDB paraphrase database (Ganitkevitch et al., 2013) to obtain paraphrases. After extracting all features, we trained a support vector machine regression model to predict semantic similarity, using as training corpus 5000 sentence pairs from the SICK dataset (Marelli et al., 2014), which are annotated with human similarity scores.

We carried out a preliminary evaluation using 500 sentences as input and 5000 sentences as a TM from the DGT-TM corpus (Steinberger et al., 2012), with English as the source and French as the target language. The French target side of the 500 input sentences was used as the reference for evaluation. We used the edit-distance measure implemented by OmegaT as a baseline, and evaluated the results with Meteor (Denkowski and Lavie, 2014) and BLEU (Papineni et al., 2002). For each of the 500 input sentences, we retrieved the most similar TM sentence (and its proposed French translation) as indicated by the baseline and by our similarity metric. The proposed method yields better results for both BLEU and Meteor: the BLEU score improved from 7.20 to 7.80 and the Meteor score from 0.175 to 0.193. Furthermore, when retrieval was restricted to matches scoring 70% or above, the Meteor score improved from 0.915 to 0.926 and the BLEU score from 77.32 to 81.61.
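The retrieval step of this evaluation can be sketched as below: for each input sentence, pick the TM entry with the highest similarity and, optionally, discard it when its score falls below a threshold (0.7 for the 70% setting). The scoring function is a parameter so that either the baseline or a learned metric could be plugged in; `token_jaccard` is only a toy stand-in, and the BLEU/Meteor scoring of the retrieved translations against the references is not shown.

```python
from typing import Callable, Optional

TMEntry = tuple[str, str]  # (source segment, target translation)


def retrieve_best(
    query: str,
    tm: list[TMEntry],
    score: Callable[[str, str], float],
    threshold: float = 0.0,
) -> Optional[tuple[str, str, float]]:
    """Return (source, target, score) of the best TM entry, or None if the TM
    is empty or the best score falls below the threshold."""
    best: Optional[tuple[str, str, float]] = None
    for src, tgt in tm:
        s = score(query, src)
        if best is None or s > best[2]:
            best = (src, tgt, s)
    if best is None or best[2] < threshold:
        return None
    return best


def token_jaccard(a: str, b: str) -> float:
    """Toy stand-in for a real similarity metric."""
    x, y = set(a.lower().split()), set(b.lower().split())
    return len(x & y) / len(x | y) if x | y else 1.0


tm = [("the cat sat", "le chat s'est assis"), ("a dog ran", "un chien a couru")]
print(retrieve_best("the cat sat down", tm, token_jaccard))
print(retrieve_best("the cat sat down", tm, token_jaccard, threshold=0.8))
```

Here the first call returns the "the cat sat" entry with a score of 0.75, and the second returns None because 0.75 is below the 0.8 threshold.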

We are currently improving and speeding up our system in order to carry out a larger-scale evaluation. As stated above, the initial results show improvements even though the system was trained on a corpus of a different genre: the SICK dataset consists mostly of simple sentences extracted from image captions, while the DGT-TM corpus contains much longer and more complex sentences.

————-
References

Michael Denkowski and Alon Lavie. 2014. Meteor Universal: Language specific translation evaluation for any target language. In Proceedings of the EACL 2014 Workshop on Statistical Machine Translation.
Juri Ganitkevitch, Benjamin Van Durme, and Chris Callison-Burch. 2013. PPDB: The paraphrase database. In Proceedings of NAACL-HLT, pages 758–764, Atlanta, Georgia.
Rohit Gupta, Hanna Béchara, Ismail El Maarouf, and Constantin Orasan. 2014. UoW: NLP techniques developed at the University of Wolverhampton for Semantic Similarity and Textual Entailment. In Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval-2014).
Patrick Hanks. 2013. Lexical Analysis: Norms and Exploitations. MIT Press.
Christopher D. Manning, Mihai Surdeanu, John Bauer, Jenny Finkel, Steven J. Bethard, and David McClosky. 2014. The Stanford CoreNLP natural language processing toolkit. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics: System Demonstrations, pages 55–60.
Marco Marelli, Stefano Menini, Marco Baroni, Luisa Bentivogli, Raffaella Bernardi, and Roberto Zamparelli. 2014. A SICK cure for the evaluation of compositional distributional semantic models. In Proceedings of LREC 2014.
Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. BLEU: a method for automatic evaluation of machine translation. In Proceedings of the ACL, pages 311–318.
Ralf Steinberger, Andreas Eisele, Szymon Klocek, Spyridon Pilos, and Patrick Schlüter. 2012. DGT-TM: A freely available Translation Memory in 22 languages. In Proceedings of LREC, pages 454–459.

Hernani Costa

Hernani Costa is currently a Marie Curie Early Stage Researcher in the Department of Translation and Interpreting at the Faculty of Philosophy and Humanities, University of Malaga, Spain. His main research interests lie in Computational Linguistics and Artificial Intelligence, especially their practical application in Translation Technologies, Natural Language Processing, Information Extraction and Information Retrieval. He is also interested in (or has worked on) a number of other topics, such as Recommender Systems, Multiagent Systems and Affective Computing.

Hernani completed his BSc and MSc in Informatics Engineering (Bologna model) at the Department of Informatics Engineering of the University of Coimbra (UC) in 2010.

It was during his Master’s degree that he started his research activities. In particular, he developed a system capable of acquiring semantic knowledge from any kind of Portuguese text. He also analysed the benefits of applying distributional similarity metrics, based on the occurrence of words in documents, to the system’s outputs. In the same academic year, he applied for a research grant to work on the “Automatic Construction of Ontologies in Portuguese” project, where he explored popular distributional similarity measures with the aim of quantifying relational triples automatically. Furthermore, in September 2011, he was invited by LAP LAMBERT Academic Publishing to publish his MSc thesis; the book, entitled “Automatic Extraction and Validation of Lexical Ontologies from Text: Creating Lexical Ontologies from text”, was published in October 2011. He studied at the University of Coimbra for six years in total, developing his skills in Computer Science, except for the academic year 2007/2008, which he spent at the University of Vigo, Spain, on the Erasmus programme. During that year, besides starting to acquire skills in Natural Language Processing, he developed his interpersonal skills (meeting Erasmus students from other cultures and languages), as well as teamwork, research, organisation and autonomy, and improved his written and spoken Spanish and English.
In October 2010, he applied for a scholarship and, between December 2010 and August 2013, worked on the project “Forms of Selective Attention in Intelligent Transportation Systems” at the Cognitive and Media Systems (CMS) group of the Department of Informatics Engineering, University of Coimbra. In this project, a two-part agent architecture was implemented, with one agent responsible for gathering Points of Interest (POIs) from a location-based service, and a set of Personal Assistant Agents (PAAs) collecting information about the context and intentions of their respective users. Each PAA embedded a set of Machine Learning algorithms, with the aim of ascertaining how well suited these classifiers are to filtering out irrelevant POIs fully automatically.
In autumn 2011, he also developed an online service for browsing Portuguese semantic relations for the Linguateca project. He also has three years of teaching experience: he taught at Lousã Professional School in 2010/2011 and 2011/2012, and at the Coimbra Institute of Engineering (ISEC) in 2012/2013. Always keen to find new challenges for his competences and skills in computer science, he enrolled in the doctoral programme in September 2013 at the Department of Translation and Interpreting, Faculty of Philosophy and Humanities, University of Malaga, Spain.

Co-authors did not provide biographical data.

iCompileCorpora: A Web-based Application to Semi-automatically Compile Multilingual Comparable Corpora

Co-authors: Hernani Costa, Gloria Corpas Pastor and Miriam Seghiri
This article presents a robust and agile web-based application to semi-automatically compile monolingual and multilingual comparable corpora, which we named iCompileCorpora. The dimensions that comprise iCompileCorpora can be represented in a layered model comprising a manual, a semi-automatic web-based and a semi-automatic Cross-Language Information Retrieval (CLIR) layer.
This design not only speeds up the compilation process, but also allows the features of the manual layer to be extended hierarchically to the semi-automatic web-based layer and then to the semi-automatic CLIR layer.

The manual layer provides the option of compiling monolingual or multilingual corpora by manually uploading documents from a local or remote directory into the platform.
The second layer makes it possible to exploit either monolingual or multilingual corpora mined from the Internet.
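One way to picture the hierarchical extension of layers is class inheritance, where each layer keeps the storage and upload features of the layer below and adds a new way of acquiring documents. The sketch is hypothetical (class and method names are ours, not iCompileCorpora’s), and the actual crawling, searching and CLIR steps are left out: they are assumed to hand over already extracted text.

```python
from dataclasses import dataclass


@dataclass
class Document:
    text: str
    language: str
    origin: str  # "upload" for manually added documents, otherwise a URL


class ManualLayer:
    """Layer 1: documents added by hand, e.g. uploaded from a directory."""

    def __init__(self) -> None:
        self.corpus: list[Document] = []

    def add_document(self, text: str, language: str, origin: str = "upload") -> None:
        self.corpus.append(Document(text, language, origin))

    def languages(self) -> set[str]:
        return {doc.language for doc in self.corpus}


class WebLayer(ManualLayer):
    """Layer 2: inherits layer 1 and adds documents mined from the Web."""

    def add_web_results(self, results: list[tuple[str, str]], language: str) -> None:
        # results: (url, extracted text) pairs from a search/crawl step not shown here
        for url, text in results:
            self.add_document(text, language, origin=url)


class CLIRLayer(WebLayer):
    """Layer 3: inherits layers 1 and 2 and adds documents found in another language."""

    def add_cross_language_results(
        self, results: list[tuple[str, str]], target_language: str
    ) -> None:
        # The CLIR retrieval itself is out of scope; results are stored like web results.
        self.add_web_results(results, target_language)
```

Because every layer stores documents through the same `add_document` method, a corpus can mix manually uploaded, web-mined and cross-language documents.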

Since there is an increasing demand for systems that can cross language boundaries by retrieving information in various languages with a single query, the third layer aims to meet this demand by using CLIR techniques to find relevant information written in a language other than the one retrieved semi-automatically by the methodology used in the previous layer.
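A common CLIR technique, dictionary-based query translation, gives an idea of how such a layer could reach documents in another language. Each query term is mapped to its candidate translations, and terms missing from the dictionary, such as proper names, are kept as they are. This is a generic sketch with a toy dictionary, not necessarily the technique iCompileCorpora uses.

```python
def translate_query(query: str, bilingual_dict: dict[str, list[str]]) -> list[list[str]]:
    """Map each query term to a list of candidate translations.

    Terms missing from the dictionary (e.g. proper names) are kept unchanged,
    so the result always has one list of alternatives per query term.
    """
    return [list(bilingual_dict.get(term, [term])) for term in query.lower().split()]


# Toy English-Spanish dictionary (illustrative only).
en_es = {"travel": ["viaje"], "insurance": ["seguro", "seguros"]}
print(translate_query("travel insurance Schengen", en_es))
```

The alternatives for each term could then be combined into a search query, for instance with OR inside each group and AND between groups.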

The benefits of using corpora in translation have been demonstrated by various authors (cf. Zanettin et al., 2003; Corpas Pastor and Seghiri, 2009). The main advantages of corpora are their objectivity, reusability, multiplicity and applicability of uses, ease of handling and quick access to large volumes of data. Moreover, they are a suitable tool for translators, who can easily determine how specific words and their synonyms collocate and vary in practical use. Furthermore, in the last decade, researchers working in other fields, such as terminology and specialised language, automatic and assisted translation, language teaching and Natural Language Processing, have shown a growing interest in bilingual and multilingual corpora.

Although corpora can be manually compiled, nowadays specialised tools can be used to automate this tedious process.
By way of example, BootCaT <http://bootcat.sslmit.unibo.it> (Baroni and Bernardini, 2004) was built to compile specialised monolingual corpora from the Web. It compiles a corpus through automated search queries and requires only a small set of seed words as input. This tool has been used, for example, to create specialised comparable corpora on travel insurance (Corpas Pastor and Seghiri, 2009), medical treatments (Gutiérrez Florido et al., 2013) and other narrow domains.
WebBootCat <www.sketchengine.co.uk/documentation/wiki/Website/Features#WebBootCat> (Baroni et al., 2006) is similar to BootCaT, but instead of having to be downloaded and installed, it can be used online.
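The seed-based querying used by BootCaT can be sketched as follows: random combinations of seed terms are turned into search queries, and the top results of each query would then be downloaded and cleaned (not shown). The tuple size and number of tuples are illustrative defaults, and the function names are hypothetical rather than BootCaT’s own.

```python
import itertools
import random
from typing import Optional


def seed_tuples(
    seeds: list[str],
    tuple_size: int = 3,
    n_tuples: int = 10,
    rng: Optional[random.Random] = None,
) -> list[tuple[str, ...]]:
    """Draw up to n_tuples distinct random combinations of seed terms."""
    rng = rng or random.Random()
    combos = list(itertools.combinations(sorted(set(seeds)), tuple_size))
    rng.shuffle(combos)
    return combos[:n_tuples]


def tuples_to_queries(tuples: list[tuple[str, ...]]) -> list[str]:
    """Quote multi-word terms so that a search engine treats them as phrases."""
    return [" ".join(f'"{t}"' if " " in t else t for t in tup) for tup in tuples]


seeds = ["travel insurance", "policyholder", "premium", "claim", "coverage"]
for query in tuples_to_queries(seed_tuples(seeds, rng=random.Random(42))):
    print(query)
```

Enumerating every combination is fine for a handful of seeds, but the number grows quickly; a tool handling long seed lists would sample tuples directly instead.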

Even though they were designed for other purposes, Terminus <http://terminus.iula.upf.edu//cgi-bin/terminus2.0/terminus.pl> and Corpógrafo <http://www.linguateca.pt/corpografo/> should also be mentioned as examples of web-based compilation tools.

As we can see, several semi-automatic compilation tools have been proposed so far. Nevertheless, their simplicity, lack of features, performance issues and usability problems create a pressing need for compilation tools tailored to the needs not only of translators and interpreters, but also of other professionals and ordinary users. Starting from a careful analysis of their strengths and weaknesses, we designed and developed a robust and agile web-based application to semi-automatically compile monolingual and multilingual comparable corpora, which we named iCompileCorpora. The dimensions that comprise iCompileCorpora can be represented in a layered model comprising a manual, a semi-automatic web-based and a semi-automatic Cross-Language Information Retrieval (CLIR) layer. This design not only speeds up the compilation process, but also allows the features of the manual layer to be extended hierarchically to the semi-automatic web-based layer and then to the semi-automatic CLIR layer. In detail, the manual layer provides the option of compiling monolingual and multilingual corpora by manually uploading documents from a local or remote directory into the platform. The second layer makes it possible to exploit both monolingual and multilingual corpora mined from the Internet. Although this layer can be considered similar to the approaches used by BootCaT and WebBootCat, it has been designed to address some of their limitations (e.g. by allowing more than one Boolean operator when creating search query strings), as well as to improve the User Experience (UX) with this type of software.
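To illustrate what allowing more than one Boolean operator in a query string can look like, the sketch below builds a query with OR between alternative terms, AND between groups and an optional NOT for exclusions. The operator syntax is a generic, search-engine-style assumption (engines differ in what they support), and this is not iCompileCorpora’s actual query format.

```python
from typing import Sequence


def quote(term: str) -> str:
    """Quote multi-word terms so they are searched as exact phrases."""
    return f'"{term}"' if " " in term else term


def build_query(all_of: Sequence[Sequence[str]], none_of: Sequence[str] = ()) -> str:
    """Combine groups of alternatives (OR) with AND, then append NOT exclusions."""
    parts = []
    for alternatives in all_of:
        terms = [quote(t) for t in alternatives]
        if not terms:
            continue  # skip empty groups
        parts.append(terms[0] if len(terms) == 1 else "(" + " OR ".join(terms) + ")")
    parts.extend(f"NOT {quote(t)}" for t in none_of)
    return " AND ".join(parts)


print(build_query([["travel insurance", "trip insurance"], ["policy"]], none_of=["car"]))
```

With the arguments above, the result is `("travel insurance" OR "trip insurance") AND policy AND NOT car`, which mixes three operators in a single query string.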
Since there is an increasing demand for systems that can cross language boundaries by retrieving information in various languages with a single query, the third layer aims to meet this demand by using CLIR techniques to find relevant information written in a language other than the one retrieved semi-automatically by the methodology used in the previous layer. To conclude, this article presents an ongoing project that aims to speed up the compilation of monolingual and multilingual comparable corpora. iCompileCorpora is intended to meet the needs not only of translators and interpreters, but also of other professionals and ordinary users, either by overcoming some of the usability problems found in the compilation tools currently available on the market or by addressing their limitations and performance issues.
——————–
References:

Baroni, M. and Bernardini, S. (2004). BootCaT: Bootstrapping Corpora and Terms from the Web. In 4th Int. Conf. on Language Resources and Evaluation, LREC’04, pages 1313–1316.
Baroni, M., Kilgarriff, A., Pomikálek, J., and Rychlý, P. (2006). WebBootCaT: instant domain-specific corpora to support human translators. In 11th Annual Conf. of the European Association for Machine Translation, EAMT’06, pages 247–252, Oslo, Norway. The Norwegian National LOGON Consortium and The Departments of Computer Science and Linguistics and Nordic Studies at Oslo University (Norway).
Corpas Pastor, G. and Seghiri, M. (2009). Virtual Corpora as Documentation Resources: Translating Travel Insurance Documents (English-Spanish). In Beeby, A., Inés, P., and Sánchez-Gijón, P., editors, Corpus Use and Translating: Corpus Use for Learning to Translate and Learning Corpus Use to Translate, Benjamins translation library, chapter 5, pages 75–107. John Benjamins Publishing Company.
Gutiérrez Florido, R., Corpas Pastor, G., and Seghiri, M. (2013). Using semi-automatic compiled corpora for medical terminology and vocabulary building in the healthcare domain. In 10th Int. Conf. on Terminology and Artificial Intelligence (TIA’13), Workshop on Optimizing Understanding in Multilingual Hospital Encounters, Paris, France.
Zanettin, F., Bernardini, S., and Stewart, D. (2003). Corpora in Translator Education. Manchester: St. Jerome Publishing.

Sabine Hunsicker

Sabine Hunsicker is a computational linguist with experience in statistical natural language processing, with a focus on machine translation. Sabine completed her M.Sc. in computational linguistics at Saarland University in 2009. Her thesis concerned example-based machine translation enhanced with statistical methods. Before joining euroscript, she worked at the German Research Center for Artificial Intelligence (DFKI) in Saarbrücken. Her research focused on hybrid machine translation, and she was involved in the Euromatrix Plus and ACCURAT research projects, with a strong focus on integrating linguistic analysis into the SMT workflow. Her areas of expertise include linguistic analysis, data mining and information retrieval.

Alexandru Ceausu is a computational linguist with experience in statistical machine translation. Alexandru completed his PhD with a thesis on statistical machine translation for morphologically rich languages at the Research Institute for Artificial Intelligence of the Romanian Academy in Bucharest. Before joining euroscript, from 2010 to 2012, he worked as a post-doctoral researcher for the Centre for Next Generation Localisation (CNGL) at Dublin City University. In Dublin, he was involved in the PLUTO project, for which he developed machine translation systems adapted to invention patents. Before that, he worked as a researcher at the Research Institute for Artificial Intelligence in Bucharest, where he was involved in several European projects, including ACCURAT, CLARIN and SEE-ERA.NET. His areas of expertise include pre-ordering and domain adaptation for machine translation, cross-lingual information retrieval, lexical ontologies, and distributed processing using Apache projects (Hadoop, Mahout, Solr, Nutch, UIMA, etc.).

Machine Translation Quality Estimation Adapted to the Translation Workflow

No short abstract available. We invite you to read the full text below.

1 Introduction

The varying quality of MT poses a problem in the translation workflow, as it requires different levels of post-editing effort. While one sentence may be translated well, the next may turn out to be completely unusable. There might be cases where discarding the MT suggestion and translating from scratch is faster, and the time spent making this decision also adds to the post-editing time.

In order to exploit the full potential of MT suggestions, they should be annotated with a score that is indicative of their translation quality. Translators are already familiar with such scores from using translation memories (TMs).

MT does not assign such a score to its output. The decoder calculates internal scores to find the best hypothesis among the translation options, but these decoder scores cannot be used to estimate the level of quality.

For an LSP, a predictive score for MT quality would be very useful, as it would be in line with the way TMs are used in the workflow. Another important advantage is that it provides an upfront estimate of the cost of a given translation.

2 Confidence Score

The translation workflow at euroscript involves several MT-related stages. One of these stages contains the quality estimation component, which we call the confidence score, i.e. a component that answers the question ‘how confident is the MT system that a particular sentence was translated well?’

In order to reduce the annotation effort, we derived this score from automatic scores. The approach can easily be automated, so that it can be run immediately after training a new MT system. Another advantage is that no time is lost finding human annotators and creating data before the new MT system is deployed in production.

The prediction model uses a combination of system-independent and system-dependent features. For example, the sentence lengths of the source, the reference and the MT candidate are taken into account. The system-dependent features vary with the MT system being evaluated; SMT systems usually provide different scores calculated during decoding.

Each training instance includes the source sentence, the target sentence, the MT candidate, the feature vector and the automatic score.
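A training instance as described above could be represented as follows. This is a sketch under stated assumptions: lengths are counted in whitespace tokens, the system-dependent part is a dictionary of decoder scores whose names (`lm` and `tm` below) are hypothetical, and decoder features are ordered by name so that every vector has the same layout.

```python
from dataclasses import dataclass, field


@dataclass
class TrainingInstance:
    source: str
    target: str                       # post-edited reference translation
    mt_candidate: str
    decoder_scores: dict[str, float]  # system-dependent features from the decoder
    automatic_score: float            # e.g. BLEU, normalised ED or FM against the target
    features: list[float] = field(init=False)

    def __post_init__(self) -> None:
        # System-independent features: sentence lengths in tokens.
        lengths = [len(s.split()) for s in (self.source, self.target, self.mt_candidate)]
        # System-dependent features, in a fixed (sorted) order.
        decoder = [self.decoder_scores[name] for name in sorted(self.decoder_scores)]
        self.features = [float(n) for n in lengths] + decoder


inst = TrainingInstance("a b c", "x y", "x y z", {"tm": -1.0, "lm": -4.2}, 0.5)
print(inst.features)
```

Here the feature vector is the three lengths followed by the `lm` and `tm` scores, while `automatic_score` is the value the prediction model learns to estimate.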

The training algorithm automatically chooses a well-distributed sample of training instances to train the prediction model. As the confidence score is integrated into the MT workflow, each MT request is automatically annotated with the confidence score.
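The text does not say how the well-distributed sample is chosen. One plausible approach, shown here purely as an assumption, is to bin instances by their automatic score and cap the number drawn from each bin, so that very frequent score ranges do not dominate training.

```python
import random
from collections import defaultdict
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def stratified_sample(
    instances: Sequence[T],
    score_of: Callable[[T], float],
    n_bins: int = 10,
    per_bin: int = 100,
    seed: int = 0,
) -> list[T]:
    """Take at most per_bin instances from each equal-width score bin over [0, 1]."""
    bins: dict[int, list[T]] = defaultdict(list)
    for inst in instances:
        s = min(max(score_of(inst), 0.0), 1.0)  # clamp to [0, 1]
        bins[min(int(s * n_bins), n_bins - 1)].append(inst)  # 1.0 goes in the top bin
    rng = random.Random(seed)
    sample: list[T] = []
    for b in sorted(bins):
        items = bins[b]
        sample.extend(items if len(items) <= per_bin else rng.sample(items, per_bin))
    return sample
```

For example, with 500 instances scoring 0.05 and 5 scoring 0.95, a cap of 10 per bin keeps 10 low-scoring and all 5 high-scoring instances.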

The confidence score is optimized to predict which of the following quality levels the current translation belongs to:

good (no or little editing needed)
usable (some editing is required)
useless (discard)

3 Experiments

As an example, we present our confidence score experiments on the English-Danish language direction.

The texts used in our experiments come from the public domain. The training data was created by translating these texts with MT and then post-editing the results. As a consequence, the translations are usually very close to the MT candidates, except where the MT output was of such poor quality that it was discarded. The test set contains 1074 sentences.

During translation with MT, the automatic scores for the classifier were collected, and a confidence model was trained on the resulting data. After integration into the translation workflow, the model was evaluated on unseen texts from the same domain.

We trained prediction models for the following three automatic scores:

BLEU
normalised Editing Distance
Fuzzy Match

To compare the different metrics, we calculate three types of measures: the mean absolute error (MAE), the root mean squared error (RMSE) and Pearson’s correlation coefficient (r).
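For reference, the three measures can be computed as below, comparing the predicted confidence scores with the actual automatic scores of the same sentences. This is a plain-Python sketch, not the code used to produce Table 1.

```python
import math
from typing import Sequence


def _check(x: Sequence[float], y: Sequence[float]) -> None:
    if len(x) != len(y) or not x:
        raise ValueError("expected two non-empty sequences of equal length")


def mae(pred: Sequence[float], gold: Sequence[float]) -> float:
    """Mean absolute error."""
    _check(pred, gold)
    return sum(abs(p - g) for p, g in zip(pred, gold)) / len(gold)


def rmse(pred: Sequence[float], gold: Sequence[float]) -> float:
    """Root mean squared error; penalises large errors more heavily than MAE."""
    _check(pred, gold)
    return math.sqrt(sum((p - g) ** 2 for p, g in zip(pred, gold)) / len(gold))


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson's correlation coefficient."""
    _check(x, y)
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)
```

Lower MAE and RMSE mean the predicted scores are closer to the actual ones, while r measures how well they rank sentences relative to each other, which is why a model can have low errors but a weak correlation.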

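Confidence score	Score	RMSE	MAE	r
confBLEU	BLEU	0.4577	0.2894	-0.1615
confBLEU	ED	0.6765	0.6469	-0.2414
confBLEU	FM	0.5305	0.4618	0.2358*
confED	BLEU	0.5743	0.5422	0.0063
confED	ED	0.1878	0.1537	0.1980
confED	FM	0.3611	0.2658	-0.1414
confFM	BLEU	0.3861	0.3257	0.4063*
confFM	ED	0.4707	0.4348	0.2743
confFM	FM	0.3858	0.3483	-0.1044

Table 1. Evaluation statistics for the English-Danish (EN-DA) confidence score correlation. Confidence score: prediction model, named after the automatic score it was trained to predict; Score: automatic score it is compared against (ED = normalised editing distance, FM = fuzzy match).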

Table 1 shows the evaluation results for all three prediction models. The error rates differ considerably between the evaluation metrics. The prediction model based on the editing distance performs quite well: it achieves the lowest error rates and correlates moderately with the score it tries to predict.

As predicting the full range of scores is a very complex task, we decided to simplify it and predict only the three quality levels described in Section 2.

To determine the thresholds for these levels, we ran two experiments: one with strict thresholds (95% for good, 75% for usable) and one with moderate thresholds (75% for good, 50% for usable).
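Mapping a predicted score to the three levels under either threshold setting can be sketched as follows. The sketch assumes a score in [0, 1] where higher means better (an editing distance would first be turned into a similarity, e.g. 1 - ED) and treats each threshold as inclusive; the text does not specify either detail.

```python
def quality_level(score: float, good: float, usable: float) -> str:
    """Map a quality score in [0, 1] (higher is better) to good / usable / useless."""
    if not 0.0 <= usable <= good <= 1.0:
        raise ValueError("expected 0 <= usable <= good <= 1")
    if score >= good:
        return "good"
    if score >= usable:
        return "usable"
    return "useless"


STRICT = (0.95, 0.75)    # (good, usable) thresholds of the first experiment
MODERATE = (0.75, 0.50)  # (good, usable) thresholds of the second experiment

print(quality_level(0.8, *STRICT), quality_level(0.8, *MODERATE))
```

The same score of 0.8 is only "usable" under the strict setting but "good" under the moderate one, which shows how the choice of thresholds shifts sentences between levels.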

These experiments show that the moderate quality levels are easier to predict, as they yield higher correlation values. The editing distance model performs well here too: with the lower thresholds, we achieve a correlation of 0.2588 between the confidence score and the actual editing distance score.

In a human evaluation, professional translators judged a random sample of the evaluation data. Of the 221 sentences, the scores of 102 were judged appropriate and those of 119 inappropriate, indicating that the scoring mechanism needs further fine-tuning but is still usable.

4 Conclusion

We presented a practical example of how to incorporate the confidence score into the traditional translation workflow. The first prototype of our confidence score works well in our production set-up. Fine-tuning the predictions requires more evaluation and more data, both of which are created automatically during post-editing in production.

Emmanuel Rayner

Emmanuel (Manny) Rayner is a Collaborateur Scientifique (senior non-teaching research post) at the University of Geneva’s Multilingual Processing Group, and has previously held positions at SRI International (1991-1999) and NASA Ames Research Center (1999-2005).
Over the last 20 years, his research has focused on the construction of speech-enabled dialogue systems. He has played key roles in several major projects in this area, including SRI’s Spoken Language Translator, NASA’s Clarissa and the Open Source Regulus platform.
At the University of Geneva, he has worked primarily with speech translation and CALL.
He has more than 100 refereed publications in speech technology, computational linguistics, machine translation and artificial intelligence.

Alejandro Armando is a certified translator who graduated from the School of Languages, Universidad Nacional de Córdoba, Argentina. He works as a freelance translator for a number of international organizations and private sector clients. He is currently finishing a Master of Arts in Translation at the University of Geneva and conducting research on automatic translation and speech recognition.

Nikos Tsourakis is also a Collaborateur Scientifique at UNIGE’s Multilingual Processing Group. He has played important roles in several UNIGE projects, where his work has primarily been concerned with deployment and evaluation of speech-enabled systems on mobile platforms. He has over 30 refereed publications in speech technology.

Pierrette Bouillon is a full professor at Geneva University’s Faculty of Interpretation and Translation, where she also holds the position of Vice-Dean. She has published extensively in the fields of machine translation, lexical semantics and speech technology, and has acted as PI on numerous projects, most recently ACCEPT, a Seventh Framework project whose focus is machine translation for user generated text, and CALL-SLT, a Swiss National Science Foundation project centered on development of methods for speech-enabled CALL systems.

A Tool for Building Multilingual Voice Questionnaires

We describe an easy-to-use architecture which can be used to generate multilingual voice questionnaires, deployed on mobile platforms linked over a 3G connection to a remote server. The person administering the questionnaire chooses the next question by speaking it in their own language. The application uses speech recognition to identify it, speaks a pre-recorded translation in the respondent language, and displays a set of icons on the touch-screen. The respondent answers by pressing an icon. The questionnaire designer specifies the questionnaire using a simple format, in which there are three types of unit: Groups, Questions, and Answers; this is compiled into an accurate limited-vocabulary speech recogniser and a set of tables.

The paper describes the tool in detail, taking as the main example an initial case study using a questionnaire designed to gather information about the availability of anti-malaria measures in sub-Saharan Africa.

There are many circumstances where it is useful to be able to administer multilingual voice questionnaires. A familiar example in Western society is admission to the accident and emergency room of a hospital: the nurse on duty will most likely start by asking for personal details, the nature of the patient’s immediate problem, previous medical history, and so on. If the nurse and the patient do not share a common language, difficulties arise. Another example, which will occupy us more in this paper, is information gathering for demographic and health topics (DHS Program 2014).

We describe an easy-to-use architecture, inspired by the RAMP framework (Salihu 2013), which can be used to generate voice questionnaires of this type. Our questionnaires are deployed on mobile platforms (smartphones, tablets or laptops) linked over a 3G connection to a remote server. The person administering the questionnaire chooses the next question by speaking it in their own language. The application uses speech recognition (performed on the server) to identify it, speaks a pre-recorded translation in the respondent language, and displays a set of icons on the touch-screen. The respondent answers by pressing the icons; each icon has an associated voice recording, in the respondent language, identifying its function.

The questionnaire designer specifies the questionnaire using a simple format, in which there are three types of unit: Groups, Questions, and Answers. A Group specifies a top-level item on the questionnaire, a list of permitted Fillers, and a pointer to the next Group. A Question specifies one possible way to attempt to assign a Filler to a Group; it defines the Group which the Question belongs to, a list of surface realizations of the question in the questionnaire administrator’s language (specified compactly in a minimal regular expression notation), a translation in each target language, and a list of permitted Answers, each one optionally associated with a Filler. An Answer defines a visual icon and an associated translation for each target language. The questionnaire description is compiled into an accurate limited-vocabulary speech recogniser and a set of tables. It is deployed over the web using methods developed at Geneva University and elsewhere in the context of various speech translation and CALL projects (Rayner et al 2006, 2014; Fuchs et al 2012).

The value of the tool, as compared to an application which mechanically asks a set of questions in a fixed order, is that it allows human judgement to be used to select appropriate questions within each Group. For example, four of the Questions associated with a Group “Religion”, whose possible Fillers are “Christian”, “Muslim” and “No religion”, might be realized as “What is your religion?”, “Do you have a religion?”, “Are you a Muslim?” and “Are you a Christian?” Evidently, the situation may make some choices more appropriate than others; in certain cases, the difference may be very important.
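The Group, Question and Answer units described above, applied to the “Religion” example, can be pictured as simple records. The sketch is hypothetical: field names are ours, the French strings and icon file names are invented for illustration, `next_group=None` is assumed to mark the end of the questionnaire, and speech recognition, which the real tool compiles from the Questions, is replaced by exact matching of a normalised transcript against the listed surface forms.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Answer:
    icon: str                     # image shown on the touch-screen
    translations: dict[str, str]  # respondent language -> spoken label
    filler: Optional[str] = None  # Group Filler assigned if this Answer is chosen


@dataclass
class Question:
    group: str                    # name of the Group this Question belongs to
    surface_forms: list[str]      # ways the administrator may phrase it
    translations: dict[str, str]  # respondent language -> pre-recorded translation
    answers: list[Answer]


@dataclass
class Group:
    name: str
    fillers: list[str]
    next_group: Optional[str]


def normalise(text: str) -> str:
    return " ".join(text.lower().replace("?", " ").split())


def find_question(transcript: str, questions: list[Question]) -> Optional[Question]:
    """Stand-in for recognition: exact match of a normalised transcript."""
    target = normalise(transcript)
    for q in questions:
        if any(normalise(form) == target for form in q.surface_forms):
            return q
    return None


religion = Group("Religion", ["Christian", "Muslim", "No religion"], next_group=None)
questions = [
    Question("Religion", ["Do you have a religion?"], {"fr": "Avez-vous une religion ?"},
             [Answer("yes.png", {"fr": "Oui"}),
              Answer("no.png", {"fr": "Non"}, filler="No religion")]),
    Question("Religion", ["Are you a Muslim?"], {"fr": "Êtes-vous musulman ?"},
             [Answer("yes.png", {"fr": "Oui"}, filler="Muslim"),
              Answer("no.png", {"fr": "Non"})]),
]

q = find_question("do you have a religion", questions)
if q is not None:
    print(q.translations["fr"], [a.icon for a in q.answers])
```

Note that answering “yes” to “Do you have a religion?” assigns no Filler, so the administrator would go on to choose another Question in the same Group; this is where the human judgement described above comes in.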

In the full paper, we will describe the tool in more detail, taking as our main example an initial case study using a questionnaire, consisting of 45 Groups and about 300 Questions, designed to gather information about the availability of anti-malaria measures in sub-Saharan Africa. The overall goal is to measure nine internationally recognized vector control indicators that can be generated through household surveys. Some of the questions concern preventive measures (namely, the use of mosquito nets and indoor residual spraying) and bear more immediate relevance to these indicators; others concern the characteristics of the household’s dwelling unit and are important for establishing a connection between the socioeconomic status of the population and public health issues. All questions were extracted from model questionnaires recommended by the cluster of humanitarian actors that developed the indicators, which include the World Health Organization and the Roll Back Malaria Partnership. Our evaluation focuses in particular on usability, habitability, rapid development of content, and the quality of speech recognition performance.
——————–
References

DHS Program (2014). http://dhsprogram.com/
Fuchs, M., Tsourakis, N. and Rayner, M. (2012). A Scalable Architecture For Web Deployment of Spoken Dialogue Systems. Proc. LREC.
Rayner, M., Hockey, B.A. and Bouillon, P. (2006). Putting Linguistics into Speech Recognition. CSLI Press.
Rayner, M., Baur, C. and Tsourakis, N. (2014). CALL-SLT Lite: A Minimal Framework for Building Interactive Speech-Enabled CALL Applications. Proc. WOCCI.
Salihu, M. (2013). Piloting the first-ever RAMP survey using DHIS mobile. In 141st APHA Annual Meeting.