Cite as: Rossetti, A. and O’Brien, S. (2019). Helping the helpers: Evaluating the impact of a
controlled language checker on the intralingual and interlingual translation tasks involving
volunteer health professionals. In McDonough Dolmaya, J. and Del Mar, M. (eds.) Special
Issue of Translation Studies. Social Translation: New Roles, New Actors.
Helping the helpers: Evaluating the impact of a controlled language checker on the
intralingual and interlingual translation tasks involving volunteer health professionals
Dr Alessandra Rossetti (corresponding author)
Dublin City University, Ireland
alessandra.rossetti2@mail.dcu.ie
Prof Sharon O’Brien
Dublin City University, Ireland
sharon.obrien@dcu.ie
Cochrane is a non-profit organization which mainly relies on volunteer health
professionals for the production, simplification, evaluation, and multilingual
dissemination of high-quality health content. The approach that Cochrane volunteers
adopt for the simplification (or intralingual translation) of English health content is
non-automated and involves the manual checking and implementation of plain
language guidelines. This study investigated whether and to what extent the
introduction of a controlled language (CL) checker—which would make the
simplification approach semi-automated—increased authors’ satisfaction and machine
translation (MT) quality. Twelve Cochrane authors completed a standardized
questionnaire and answered follow-up questions on their level of satisfaction and
preferences. Forty-one Cochrane evaluators assessed the quality of the Spanish MT
outputs of simplified texts. Authors showed a preference for the introduction of a CL
checker. Differences in MT quality scores were slight.
Keywords: controlled language checker; text simplification; volunteer satisfaction;
machine translation quality; health information; non-profit organization
1 Introduction and Related Work
Volunteers play an increasingly important role in the production, editing, and (multilingual)
dissemination of health content, particularly online. Websites that are widely consulted for
health-related purposes often rely on contributions from volunteers. For instance, Wikipedia,
which has become an important source of online health information, is supported by the
Wikimedia Foundation—a non-profit organization—and is written, maintained and edited
collaboratively by volunteers (Laurent and Vickers 2009; Heilman et al. 2011). Translators
without Borders (TwB) is another example of a non-profit organization relying on volunteers
to translate critical information (including medical content) during crises. TwB also partnered
with Wikipedia for the 100x100 project, which involves the translation of 100 of Wikipedia’s
highest ranking medical articles into 100 developing world languages (TwB 2018a).
Epistemonikos is an online database that collates and translates high-quality health evidence
by relying on volunteer domain experts (Rada, Pérez and Capurro 2013). Gigliotti (2017)
discusses the widespread reliance on volunteer translators at non-profit organizations and
goes on to analyse the quality of the translations produced at TwB, The Rosetta Foundation,
PerMondo, and Translations for Progress. These initiatives can be defined as cause-driven
since they contribute to the development of non-profit and humanitarian agendas
(McDonough Dolmaya 2012).
Often the volunteers involved in the authoring, editing and translation activities are
also non-professional writers/translators, i.e. individuals who have not received formal
training in the areas of linguistics/translation (Pérez-González and Susam-Saraeva 2012). It is
not uncommon then for volunteers/non-professional writers/translators to be provided with
guidelines and instructions, particularly when they are health domain experts, who might find
the usage of layperson terms challenging (Zethsen 2009). For example, Simple English
Wikipedia contributors are expected to adhere to an editorial policy including
recommendations on plain language writing (Den Besten and Dalle 2008). Similarly, TwB
volunteers, who can be either professional translators or simply individuals fluent in two
languages, are expected to comply with a code of conduct according to which “[t]ranslators
shall always thoroughly and faithfully render the source language message, omitting or
adding nothing, giving consideration to linguistic variations in both source and target
languages, and conserving the tone and spirit of the source language message” (TwB 2018b,
no page number).
Technology can support non-professional volunteers. Pym (2011, 6) argues that
technology “enables new social locations for translation, particularly in the non-professional
sphere.” He also points out that volunteer and professional translators might collaborate, for
instance, with volunteers post-editing MT outputs that are later revised by professionals
(ibid.). A few examples can help illustrate how this kind of collaboration can take place. With
regard to simplification or intralingual translation, Leroy, Kauchak and Mouradi (2013)
developed a writer support tool to help health practitioners simplify texts for patients and
save time. Regarding interlingual translation, Abekawa and Kageura (2007) describe a
translation aid system for volunteers translating from English into Japanese, which facilitated
the consultation of references, among other tasks. Translation at Epistemonikos often
involves the adoption of machine translation (MT) systems and the subsequent collaborative
editing of the MT output (Rada, Pérez and Capurro 2013). As of 2013, 99.6% of
Epistemonikos articles had been translated with MT into languages other than English (ibid.).
TwB is also increasingly relying on MT technology to speed up the translation process, e.g.
into Kurdish languages (TwB 2016, 2019).
In summary, the important role played by non-professional volunteers for the
production, editing, and (multilingual) dissemination of health content has been recognized,
and writing/translation technology is increasingly supporting their contributions. However, to
the best of our knowledge, no studies have investigated the impact that technologies for
intralingual translation—such as authoring support tools and controlled language (CL)
checkers—can have on the satisfaction of non-professional volunteers, and on the machine
translatability of the simplified texts that they produce.
2 The Case of Cochrane
For our study, we focused on Cochrane[i], a non-profit organization addressing the demand for
simplified and multilingual medical content online through intralingual and interlingual
translation tasks mainly conducted by volunteer health practitioners, who act as content
creators, editors, translators, and quality evaluators. This organization was chosen because of
its non-automated approach to intralingual translation (which might have benefited from the
introduction of technological assistance), and its increasing reliance on MT technology for
simplified texts. This section will delve into the intralingual and interlingual translation
workflows at Cochrane, while also outlining the goals of our study.
Cochrane provides online high-quality health-related information by producing
Systematic Reviews of empirical evidence answering a specific research question—mainly
on the impact of treatments and interventions (Chandler et al. 2017). The specialised medical
language that characterizes these Systematic Reviews might be difficult for the lay public to
comprehend. Therefore, each Cochrane Systematic Review is accompanied by a plain
language summary (PLS)[ii], which both summarizes and simplifies its content (The Cochrane
Collaboration 2013). Many readers consult only the PLS rather than the entire
Systematic Review (Maguire and Clarke 2014).
To produce PLS, Cochrane mainly relies on volunteer authors with a health
background (i.e. health domain experts), who are therefore involved in the workflow of
intralingual translation (i.e. simplification of content to render it accessible to lay readers)
(Muñoz-Miquel 2012; Kajzer-Wietrzny, Whyatt, and Stachowiak 2016). Different sets of
written guidelines are available to the authors of PLS, which deal with both content (e.g.
which information in the Systematic Review to include or exclude), and language/style (e.g.
sentence length or terminology). These guidelines can be found in a variety of documents,
such as the Cochrane Handbook for Systematic Reviews of Interventions (Higgins and Green
2011), or the Standards for the Reporting of Plain Language Summaries in New Cochrane
Intervention Reviews (PLEACS) (The Cochrane Collaboration 2013). Our analysis of
Cochrane PLS guidelines has shown that they are characterized by vagueness and
contradictions, for instance on the length of PLS and on the use of acronyms. Moreover, this
simplification approach is non-automated, because there is no automatic checking for
adherence to the PLS guidelines—volunteers are asked to manually check and implement
guidelines, which can represent a difficult and time-consuming task (Temnikova 2012),
particularly for volunteer health practitioners acting as new agents in the field of linguistics
or translation.
Against this background, the first goal of this study was to investigate the impact that
introducing Acrolinx[iii] into Cochrane’s non-automated simplification/intralingual translation
workflow would have on the satisfaction of volunteer authors. Acrolinx is a CL checker
developed at the German Research Center for Artificial Intelligence. A CL is a set of rules
applied to text production, which aims at making the language more readable and
translatable, and a checker is the software that checks for adherence to those rules (O’Brien
2010). Acrolinx automatically and consistently flags readability and translatability issues in a
text, while also providing suggestions on how to solve them (Rodríguez Vázquez 2016). We
therefore hypothesized that, by availing of this tool, Cochrane volunteer authors would be
more satisfied with the overall simplification workflow. We refer to the approach involving
Acrolinx as semi-automated because, even though readability and translatability issues are
automatically and consistently flagged, the author needs to manually apply the edits.
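To make this behaviour concrete, the workflow of a CL checker can be illustrated with a toy sketch. The two rules below (a sentence-length limit and a crude passive-voice pattern) are simplified examples in the spirit of CL rule sets, not Acrolinx’s actual rules, and all names are ours:

```python
import re

# Each rule pairs a test with a suggestion shown to the author.
RULES = [
    (lambda s: len(s.split()) > 25,
     "Long sentence: consider splitting (readability)"),
    (lambda s: re.search(r"\b(?:is|are|was|were|been|being)\s+\w+ed\b", s),
     "Possible passive voice: consider an active construction (translatability)"),
]

def check(text):
    """Return (sentence, message) pairs for every rule a sentence violates."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [(s, msg) for s in sentences for test, msg in RULES if test(s)]

issues = check("The drug was tested in two trials. Results were mixed.")
```

In a real checker the rules are far more numerous and linguistically informed; the point here is only the division of labour: issues are flagged automatically and consistently, but the author decides which edits to apply.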
Cochrane has also been broadening its international audience, thus adopting strategies
for the multilingual dissemination of its content (Von Elm et al. 2013). In particular, PLS are
being translated into a variety of languages, including Chinese, Portuguese and Russian
(Birch et al. 2017). Due to the non-profit nature of Cochrane, the interlingual translation
workflow has largely relied on volunteer health practitioners. Only a small number of
Cochrane volunteers are professional translators. Moreover, even when translations are
conducted by professionals, a final revision of the accuracy of translated content from
volunteer health practitioners/domain experts is common practice (Birch et al. 2017).
Cochrane has observed the potential impact of publishing translations. In 2016, 66% of all
visits to cochrane.org were conducted through browsers set to a language other than English,
and Internet users from Spanish- and French-speaking countries represented a large
percentage of this audience (ibid.).
To make the translation effort more sustainable and to increase the number of
available translations, Cochrane has been considering the integration of technologies such as
CL checkers and MT systems into their workflow (Von Elm et al. 2013). Customized MT
systems are being developed for Cochrane content (Bojar et al. 2017). However, the impact
that simplification with a CL checker might have on the quality of the Spanish MT output of
PLS obtained with freely available MT systems has not been investigated. The second goal of
this study was therefore to determine whether introducing the Acrolinx CL checker into the
workflow of PLS production would increase the machine translatability of Cochrane PLS.
To summarize, our goal was to test the impact of the Acrolinx CL checker on the
satisfaction of Cochrane volunteer PLS authors, and on the machine translatability of the
PLS. In the following sections, we will describe methodology, participants, experimental
materials, experimental design and tasks within our study. Section 7 will present the main
results, while Section 8 will outline conclusions and areas for future work.
3 Methodology
Authoring Study
In this study, we defined satisfaction as the “extent to which the user’s physical, cognitive
and emotional responses that result from the use of a system, product or service meet the
user’s needs and expectations” (ISO 9241-11:2018, 3.1.14). To measure satisfaction, we
asked twelve participants recruited from the pool of Cochrane volunteer authors (Section 4)
to revise their PLS previously produced with Cochrane guidelines by following Acrolinx
automatic suggestions on readability and translatability (Section 6). Subsequently, these
authors were asked to complete the System Usability Scale (SUS) (Brooke 1996) focusing on
their interaction with both Cochrane PLS guidelines and the Acrolinx CL checker.
The SUS is a Likert scale questionnaire composed of ten statements (Appendix A).
We slightly modified the wording of the statements: the word “system” in the original
SUS was replaced with “Acrolinx” and with “Cochrane PLS guidance”, respectively. Slight
wording changes have been shown not to alter the reliability of the SUS (Lewis and Sauro
2009). This questionnaire was selected because it is technology-agnostic and has been
adopted for various tools and applications, from websites (Sauro and Lewis 2011) to punched
voting cards (Byrne, Greene, and Everett 2007), to CL checkers (Miyata et al. 2017).
Moreover, several studies have proven the reliability and validity of the SUS (Bangor,
Kortum and Miller 2008).
The SUS provides one single score—the higher the score, the higher the users’
satisfaction. However, it does not provide diagnostic information (Chaparro et al. 2014).
Therefore, authors also answered follow-up questions on their future preferences for a
specific approach. In particular, they were asked if, in the future, they would use: (i) both
Cochrane PLS guidance and Acrolinx; (ii) Acrolinx only; (iii) Cochrane PLS guidance only;
(iv) other types of authoring support; or (v) no support at all. Finally, authors were also asked
about the reasons for their preferences.
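For reference, Brooke’s (1996) scoring procedure converts the ten 1–5 item responses into the single 0–100 score mentioned above: odd-numbered (positively worded) items contribute their response minus 1, even-numbered (negatively worded) items contribute 5 minus their response, and the summed contributions are multiplied by 2.5. A minimal sketch (the function name is ours):

```python
def sus_score(responses):
    """Convert ten SUS responses (each on a 1-5 scale) into a 0-100 score."""
    if len(responses) != 10 or not all(1 <= r <= 5 for r in responses):
        raise ValueError("expected ten responses on a 1-5 scale")
    contributions = [
        (r - 1) if i % 2 == 0 else (5 - r)  # i=0 is item 1, an odd-numbered item
        for i, r in enumerate(responses)
    ]
    return sum(contributions) * 2.5

# A neutral respondent (all 3s) lands exactly at the midpoint of the scale:
print(sus_score([3] * 10))  # → 50.0
```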
MT Evaluation Study
The MT evaluation experiment was conducted with forty-one participants recruited from the
pool of Spanish-speaking Cochrane volunteer domain experts (Section 4). Each Spanish-speaking Cochrane volunteer was presented with two English PLS (one produced following
Cochrane PLS guidelines, henceforth non-automated PLS, and the other revised with
Acrolinx, henceforth semi-automated PLS) and their Spanish MT outputs, produced by
Google Translate, and segmented at the sentence level. Although Google Translate is a
generic MT engine, it was selected because it is not unrealistic to expect lay people and
experts to use freely available MT systems for assimilation purposes (Section 4).
The presentation order of the sentences followed the structure of the text, rather than
being randomized (Doherty 2017). Each MT output was evaluated against its source PLS.
Evaluators were asked to assign two scores (from 1, the lowest, to 4, the highest) to each
machine-translated sentence. One score was based on fluency and the other score on
adequacy (Miyata et al. 2017). A 4-point scale was selected to avoid mid-point bias.
Evaluators were instructed to provide their intuitive reaction to each sentence, rather than
pondering their decision (Linguistic Data Consortium 2002).
Most evaluators did not have a background in linguistics/translation and might have
found it difficult to understand what was meant by fluency or adequacy. Therefore, drawing
upon Castilho et al. (2017), for each pair of sentences, participants were presented with the
two following questions, dealing with adequacy and fluency, respectively: “How much of the
information contained in the English source sentence appears in the Spanish target
sentence?”; and “Indicate the extent to which the Spanish target sentence is in grammatically
well-formed and fluent Spanish”.
Human evaluation of MT has several drawbacks, including subjectivity and time
commitment (Fiederer and O’Brien 2009). However, due to the importance of accuracy in the
health field, the MT output of Cochrane texts requires human validation. Therefore, involving
human evaluators created a more realistic scenario. Furthermore, fluency and adequacy are
very common in MT evaluation studies, less time consuming than, for example, error
annotation, and more accessible as tasks for non-linguist evaluators.
4 Participants
Authoring Study
We recruited authors from the pool of Cochrane volunteer contributors with a health
background (Section 2). Twelve Cochrane authors who had written a PLS in the past agreed
to revise their PLS by using the Acrolinx CL checker (Section 6). They were volunteer
contributors to various Cochrane Review Groups, such as Cochrane Vascular Group or
Stroke Group. Most authors were native speakers of English (n=7). The remainder reported
Dutch (n=2), German (n=1), Portuguese (n=1) and Russian (n=1) as their native languages.
All authors were health domain experts and worked either as academics (n=9) or health
practitioners (n=3). Authors differed in the number of PLS that they had produced in the past
by following Cochrane PLS guidelines, ranging from one to ten. There was also wide
variability in the time elapsed between the production of PLS with the non-automated
approach and the use of Acrolinx—from one month to three years. In other words, our
sample of authors was heterogeneous in terms of native language, familiarity with plain
language writing, and memory of Cochrane PLS guidelines.
We tried to recruit a larger and more homogeneous sample of Cochrane authors by
distributing our call for participation via a variety of channels, and by sending multiple
reminders. However, this was not possible—it is important to remember that Cochrane
contributors are academics and health professionals with busy schedules. As a result of the
variability in authors’ background characteristics, our results cannot be generalized (Section
8).
MT Evaluation Study
Forty-one Cochrane volunteers participated in the MT evaluation task. Like authors,
evaluators were recruited from the pool of Cochrane volunteer contributors. All evaluators
were native speakers of Spanish and working or training to work in the health field. They
were health practitioners (n=22), academics (n=8), both academics and health practitioners
(n=2), students or trainees (n=8), and a pharmaceutical consultant (n=1). Unlike other MT
evaluation studies, evaluators were not professional translators, so their scores were unlikely
to be influenced by the perception of MT as a threat (Fiederer and O’Brien 2009).
Nonetheless, it is worth noting that eight participants also reported some experience or
training as translators. For instance, one participant had translated pharmaceutical/medical
documents.
Prior to conducting the evaluation tasks, and as part of a pre-task questionnaire
(Section 6), participants were asked to complete the Cambridge English test[iv] for a quick
assessment of their level of English proficiency (Parra Escartín et al. 2017). This assessment
was needed for the purpose of the adequacy evaluation of the MT output (Section 3), which
requires source language competence (Castilho et al. 2018). The Cambridge English test
provides a score which is then converted into one of the six CEFR (Common European
Framework of Reference for Languages) levels, from A1 (lowest proficiency) to C2 (highest
proficiency), each associated with different reception, production and interaction skills in a
foreign language (Council of Europe 2011).
Evaluators varied in terms of English proficiency—Table 1 shows the number of
evaluators who were assigned to each CEFR level by the Cambridge English test.
Accordingly, during data analysis (Section 7), we calculated mean adequacy scores from all
evaluators first, and then only from evaluators who were assigned a B1 CEFR level or higher,
in order to identify and remove potential biases due to low English language proficiency. B1
was selected as a threshold because, unlike those at A1-A2 levels, users at B1 level “[c]an
understand the main points of clear standard input on familiar matters regularly encountered
in work, school, leisure, etc.” (Council of Europe 2011, 5).
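The B1 threshold can be applied programmatically when recalculating the grand mean. In the sketch below the evaluator records are illustrative, not the study’s data, and the helper names are ours; CEFR spans such as “B2 - C1” are resolved to their lower bound:

```python
CEFR_ORDER = ["A1", "A2", "B1", "B2", "C1", "C2"]

def at_least_b1(level):
    """True if a CEFR label (e.g. 'B2', or a span such as 'B2 - C1') reaches B1."""
    lower_bound = level.split("-")[0].strip()
    return CEFR_ORDER.index(lower_bound) >= CEFR_ORDER.index("B1")

# Illustrative (evaluator, CEFR level, mean adequacy score) records:
evaluators = [("E01", "A2", 3.43), ("E02", "B1", 4.0), ("E03", "C1 - C2", 3.9)]
kept = [score for _, level, score in evaluators if at_least_b1(level)]
grand_mean = sum(kept) / len(kept)  # mean over B1+ evaluators only
```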
Finally, regarding MT use, the vast majority of participants (n=36) reported using MT
to understand information in an unknown language. Most of them reported using MT either
always or frequently (n=28), while the remainder indicated that they used MT rarely (n=8).
| Level of English | Number of evaluators |
|---|---|
| A1 | 1 |
| A2 | 8 |
| B1 | 13 |
| B2 | 5 |
| B2 - C1 | 4 |
| C1 | 2 |
| C1 - C2 | 5 |
| C2 | 3 |

Table 1: Level of English of MT evaluators
5 Experimental Materials
Since twelve Cochrane volunteers took part in our authoring study (Section 4), the following
experimental materials were available to us: (i) a corpus of twelve non-automated PLS and
their Spanish MT outputs; (ii) the corresponding corpus of semi-automated PLS (i.e. revised
by using Acrolinx) and their Spanish MT outputs.
Due to the inconsistencies in Cochrane PLS guidance (Section 2), the length of the
experimental materials produced with the non-automated approach varied between 300 and
700 words (M=437 words, SD=120.95). The set of semi-automated PLS contained roughly
the same average number of words (M=436.41 words, SD=132.85).
6 Experimental Design, Tasks and Procedures
Authoring Study
We adopted a within-subject design—each author engaged with both Cochrane PLS
guidance and the Acrolinx CL checker on the same PLS. More precisely, Cochrane authors
who agreed to take part in this study were asked to:
(i) Complete a pre-task questionnaire to determine if they were eligible (i.e. if they had
authored a PLS in the past) and to collect background information;
(ii) Complete the SUS on their satisfaction with the non-automated approach (i.e. Cochrane
PLS guidance);
(iii) Install the TeamViewer[v] free software on their computers, which allowed for remote
access to other computers;
(iv) By using TeamViewer, access a computer owned by one of the researchers, where
Acrolinx was installed as a Microsoft Word plugin, and where their past PLS was ready for
them to edit/revise using this CL checker. However, before the main editing task, authors
conducted a warm-up task on a sample PLS (also provided on the researcher’s computer) to
familiarize themselves with Acrolinx. Figure 1 shows a participant conducting the warm-up
task.
Figure 1: Warm-up task with Acrolinx
(v) Edit their past PLS in Microsoft Word by following Acrolinx suggestions. However,
authors were instructed to use their common sense in deciding whether to apply a change
recommended by Acrolinx; and
(vi) Complete the SUS on their satisfaction with Acrolinx and answer the remaining
preference questions.
Authors were provided with instructions for each of the tasks described above. No
time limit was set for the warm-up task or the main editing task. This decision was taken to
maximize the ecological validity of our study since Cochrane authors do not produce PLS
under time pressure[vi]. With regard to the characteristics of Acrolinx, we used the Microsoft
Word Sidebar Edition which shows readability and translatability issues in a sidebar in Word,
along with examples and suggestions on how to edit them in the text. Examples of Acrolinx
recommendations are: “Simplify word”, or “Use ‘this/that/these/those’ with noun”.
Customizing Acrolinx CL rules for Cochrane content was not possible. However, to make the
two simplification approaches as comparable as possible, we deactivated any Acrolinx CL
rules which contravened Cochrane guidelines (e.g. on the use of hyphens after prefixes).
MT Evaluation Study
A within-subject design was also adopted for the MT evaluation study, where each
participant conducted two evaluation/scoring tasks (Section 3). In particular, Cochrane
evaluators were asked to:
(i) Complete a pre-task questionnaire to test their eligibility (i.e. that they were native speakers
of Spanish and had a background in the health field). Additional questions were asked about
their profiles, particularly on their level of English and on their use of MT; and
(ii) Conduct the evaluation tasks of two Spanish MT outputs.
Both the pre-task questionnaire and the two MT evaluation tasks were presented to
participants online, on Google Forms. Each sentence in each MT output was evaluated in
terms of fluency and adequacy by either three or four participants, depending on the number
of volunteers recruited. Dyson and Hannah (1987) recommend at least three evaluators.
To compensate for fatigue effects, the order in which non-automated and semi-automated PLS were presented to evaluators was counterbalanced (MacKenzie 2013).
Overall, evaluators were divided into twelve groups and different texts were assigned to each
group. In groups 1-6, the semi-automated PLS was presented first, while in groups 7-12 the
non-automated PLS was presented first. Moreover, the two PLS (and corresponding MT
outputs) assigned to each evaluator dealt with two different health-related topics.
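The counterbalancing scheme above reduces to a simple assignment rule; the sketch below (function name ours) maps a group number to the order in which the two conditions were presented:

```python
def presentation_order(group):
    """Order of the two PLS conditions for a given evaluator group (1-12):
    groups 1-6 saw the semi-automated PLS first, groups 7-12 the non-automated."""
    if not 1 <= group <= 12:
        raise ValueError("groups are numbered 1-12")
    if group <= 6:
        return ("semi-automated", "non-automated")
    return ("non-automated", "semi-automated")
```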
7 Results and Discussion
Volunteer authors’ satisfaction and preferences
Table 2 shows the SUS scores that the twelve authors assigned to Cochrane PLS guidelines
and Acrolinx. Results from the ten Likert-type items that compose the SUS were combined
into a single score prior to being analysed because, taken individually, they do not relate to
any specific characteristic of a system (Brooke 2013). In addition to the mean and the
standard deviation (SD), the median was also included in the analysis because it is less
influenced by extreme values (Doane and Seward 2011).
| Simplification approach | SUS score (mean) | SUS score (SD) | SUS score (median) |
|---|---|---|---|
| Non-automated (Cochrane PLS guidelines) | 62.29 | 26.53 | 70 |
| Semi-automated (Acrolinx) | 75.41 | 14.49 | 78.75 |

Table 2: Descriptive statistics of the SUS scores assigned by the twelve authors
The mean and median scores in Table 2 indicate that using Acrolinx to check the
readability and translatability of a PLS (i.e. making the simplification approach semi-automated) resulted in a higher level of author satisfaction. However, a paired t-test
conducted using the statistical software package Stata 12.1 showed that the difference was
not statistically significant, t(11)=1.2549, p=0.2355.
To complement the SUS results, authors were also asked which type of authoring
support they would use when producing PLS in the future. Figure 2 reports the authors’
responses. The horizontal axis indicates the number of participants who selected each option.
Figure 2: Future use of authoring support for PLS
Most participants (n=9) reported that, in the future, they would use both Cochrane PLS
guidance and Acrolinx, i.e. they would welcome the integration of a CL checker into the
simplification workflow[vii]. Only a few participants reported that they would
use either Cochrane guidance only or Acrolinx only. No participant reported that they would
produce a PLS without using any support. This result is not surprising considering that all the
participants were either health professionals or academics in the health field who were likely
to be more familiar with specialised medical language than with plain language (Section 4).
Two authors (P05 and P15) answered that they would like to use only Acrolinx in the
future. P05 provided as a reason the fact that Acrolinx readability suggestions were specific.
However, this participant also specified that they struggled to increase the Acrolinx
readability score of their PLS because the software did not recognize medical terminology.
They added that this issue could be solved by creating an Acrolinx term set for the medical
field (which is indeed possible). P15 reported that they found the software very useful to
improve both the readability and style of the text. Moreover, P15 appreciated Acrolinx
suggestions on translatability and reported that they had already been using an authoring
support tool called Grammarly, thus suggesting that Cochrane PLS guidance alone might not
be enough.
This was the first time that participants had interacted with Acrolinx. Therefore, there
was some agreement on the need to practice with Acrolinx before becoming familiar with the
tool. For this study, participants were assigned a warm-up task before conducting the main
simplification task on their PLS. Nonetheless, the time that participants dedicated to the
warm-up varied greatly.
MT quality
To analyse the quality of the Spanish MT outputs, we calculated the average adequacy and
fluency score (across all text sentences) per evaluator (Koehn and Monz 2006). The average
adequacy score per evaluator ranged from 2.47 to 4, while the average fluency score per
evaluator ranged from 1.04 to 4. For each evaluator, an independent-samples t-test was
conducted in Stata 12.1 to determine if the average score that each evaluator assigned to the
non-automated PLS was significantly different from the average score that they assigned to
the semi-automated PLS at the 0.05 significance level. For each evaluator, Table 3 reports the
average adequacy scores assigned to both texts. Similarly, Table 4 presents the average
fluency scores per evaluator. Statistically significant differences are signalled with an
asterisk. Moreover, we calculated a grand mean of fluency and adequacy scores for both
corpora to get an overall picture of the evaluations.
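The per-evaluator comparison can be sketched as follows. The study ran the tests in Stata 12.1; the code below reproduces only the Student’s t statistic (pooled variance) for one evaluator, on illustrative rather than actual sentence scores. P-values would follow from the t distribution with len(a) + len(b) - 2 degrees of freedom.

```python
from math import sqrt
from statistics import mean, variance  # `variance` is the sample variance

def independent_t(scores_a, scores_b):
    """Student's independent-samples t statistic with pooled variance."""
    na, nb = len(scores_a), len(scores_b)
    pooled = ((na - 1) * variance(scores_a)
              + (nb - 1) * variance(scores_b)) / (na + nb - 2)
    return (mean(scores_a) - mean(scores_b)) / sqrt(pooled * (1 / na + 1 / nb))

# Illustrative adequacy scores (1-4) for one evaluator's sentences:
semi_auto = [4, 4, 3, 4, 4, 3, 4, 4]
non_auto = [4, 3, 4, 2, 3, 4, 3, 4]
t_stat = independent_t(semi_auto, non_auto)
```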
| Groups | Evaluators | Non-automated PLS: Mean (SD) adequacy score | Semi-automated PLS: Mean (SD) adequacy score |
|---|---|---|---|
| 1–8 | E13 | 3.73 (0.44) | 3.89 (0.31) |
| | E25 | 4 (0) | 4 (0) |
| | E01 | 3.43 (0.66) (*) | 3.94 (0.22) (*) |
| | E37 | 4 (0) | 4 (0) |
| | E14 | 3.88 (0.47) | 4 (0) |
| | E02 | 4 (0) | 3.96 (0.2) |
| | E26 | 3.88 (0.47) | 3.68 (0.69) |
| | E15 | 3.76 (0.53) | 3.72 (0.45) |
| | E03 | 4 (0) | 3.97 (0.16) |
| | E27 | 4 (0) | 3.89 (0.31) |
| | E39 | 3.52 (0.81) | 3.13 (0.78) |
| | E16 | 4 (0) | 3.86 (0.44) |
| | E04 | 3.89 (0.4) | 3.82 (0.46) |
| | E40 | 4 (0) | 3.89 (0.3) |
| | E28 | 3.93 (0.25) | 3.82 (0.6) |
| | E17 | 4 (0) | 4 (0) |
| | E05 | 2.7 (0.6) | 2.54 (0.8) |
| | E41 | 3.48 (0.75) | 3.68 (0.47) |
| | E18 | 3.55 (0.82) | 3.5 (0.84) |
| | E42 | 3.9 (0.44) | 3.75 (0.61) |
| | E30 | 3.95 (0.22) | 3.93 (0.25) |
| | E31 | 3.85 (0.6) | 4 (0) |
| | E49 | 3.25 (0.98) (*) | 3.78 (0.61) (*) |
| | E19 | 3.29 (0.77) (*) | 3.95 (0.21) (*) |
| | E08 | 3.86 (0.35) | 3.89 (0.3) |
| 9–12 | E20 | 3.93 (0.25) | 4 (0) |
| | E32 | 3.55 (0.73) | 3.34 (0.76) |
| | E44 | 3.82 (0.46) | 3.93 (0.37) |
| | E21 | 3.93 (0.25) | 3.96 (0.19) |
| | E09 | 4 (0) | 4 (0) |
| | E33 | 3.83 (0.46) | 3.85 (0.36) |
| | E45 | 3.86 (0.34) | 3.77 (0.42) |
| | E10 | 3.63 (0.72) | 3.77 (0.61) |
| | E34 | 3.77 (0.52) | 3.83 (0.37) |
| | E46 | 3.86 (0.35) | 3.96 (0.17) |
| | E23 | 3.82 (0.38) | 3.76 (0.6) |
| | E50 | 3.73 (0.54) | 3.8 (0.54) |
| | E52 | 2.47 (0.59) (*) | 3.5 (0.69) (*) |
| | E12 | 3.34 (0.7) | 3.5 (0.65) |
| | E24 | 3.84 (0.36) | 3.75 (0.44) |
| | E48 | 3.46 (0.62) | 3.7 (0.46) |
| | GRAND Means (SD) | 3.72 (0.33) | 3.78 (0.27) |

Table 3: Descriptive and inferential statistics on adequacy scores
| Groups | Evaluators | Non-automated PLS: Mean (SD) fluency score | Semi-automated PLS: Mean (SD) fluency score |
|---|---|---|---|
| 1–7 | E13 | 3.56 (0.58) | 3.47 (0.77) |
| | E25 | 3.17 (1.07) | 3.21 (1.13) |
| | E01 | 2.26 (1.17) | 2.73 (1.04) |
| | E37 | 3.65 (0.57) | 3.63 (0.68) |
| | E14 | 3.11 (0.9) (*) | 3.64 (0.56) (*) |
| | E02 | 3.38 (0.69) | 3.36 (0.63) |
| | E26 | 3.33 (0.76) | 3.12 (1.01) |
| | E15 | 3.71 (0.56) | 3.62 (0.59) |
| | E03 | 3.71 (0.46) | 3.59 (0.64) |
| | E27 | 3.04 (1.07) | 3.13 (0.91) |
| | E39 | 2.8 (0.81) | 2.67 (0.7) |
| | E16 | 3.79 (0.49) | 3.65 (0.61) |
| | E04 | 3.79 (0.55) | 3.65 (0.72) |
| | E40 | 3.24 (0.57) | 3.44 (0.63) |
| | E28 | 3.68 (0.66) | 3.75 (0.73) |
| | E17 | 2.85 (0.76) | 3.13 (0.63) |
| | E05 | 2.07 (0.72) | 1.95 (1.04) |
| | E41 | 2.55 (1.01) | 2.77 (0.86) |
| | E18 | 3.9 (0.3) | 3.72 (0.54) |
| | E42 | 3.45 (0.75) | 3.54 (0.72) |
| | E30 | 2.55 (1.19) | 2.7 (1) |
| | E31 | 3.59 (0.79) (*) | 3.92 (0.34) (*) |
| 8–12 | E49 | 3.22 (0.93) | 3.53 (0.67) |
| | E19 | 3 (0.96) | 3.36 (0.58) |
| | E08 | 3.75 (0.51) | 3.86 (0.35) |
| | E20 | 3.89 (0.3) | 3.96 (0.18) |
| | E32 | 3.13 (1.02) | 2.62 (1.04) |
| | E44 | 3.68 (0.54) | 3.62 (0.67) |
| | E21 | 3.8 (0.48) | 3.51 (0.7) |
| | E09 | 3.86 (0.34) | 3.85 (0.45) |
| | E33 | 3.46 (0.62) | 3.59 (0.57) |
| | E45 | 3.36 (0.88) | 3.37 (0.92) |
| | E10 | 3.09 (1.1) | 3.61 (0.84) |
| | E34 | 2.22 (0.97) | 2.09 (0.9) |
| | E46 | 3.04 (0.84) | 3.35 (0.7) |
| | E23 | 3.69 (0.47) | 3.84 (0.36) |
| | E50 | 3.78 (0.51) | 3.63 (0.71) |
| | E52 | 1.04 (0.2) (*) | 1.6 (1.1) (*) |
| | E12 | 3.78 (0.42) (*) | 4 (0) (*) |
| | E24 | 3.9 (0.29) | 3.87 (0.33) |
| | E48 | 3.31 (0.78) | 2.91 (0.77) |
| | GRAND Means (SD) | 3.27 (0.6) | 3.33 (0.55) |

Table 4: Descriptive and inferential statistics on fluency scores
From Tables 3 and 4 it emerges that: (i) in terms of adequacy, the number of evaluators who
rated semi-automated PLS higher (n=19) was just one greater than the number of evaluators
who rated non-automated PLS higher (n=18); (ii) the number of evaluators who assigned a
higher average fluency score to semi-automated PLS (n=22) was very slightly higher than the
number of evaluators who assigned a higher average fluency score to non-automated PLS
(n=19); (iii) most differences in average scores were not statistically significant. The only
statistically significant increases in adequacy and fluency scores (for eight participants in
total) were observed for semi-automated PLS. Moreover, after excluding evaluators with an A1-A2 level of English proficiency (Section 4) and recalculating the grand mean of adequacy scores, we observed that the difference between the grand mean score assigned to non-automated PLS (M=3.76, SD=0.28) and the grand mean score assigned to semi-automated
PLS (M=3.82, SD=0.25) remained slight. Overall, there was little difference in evaluators’
ratings of fluency and adequacy between the two corpora of PLS, thus indicating that the MT
system used (i.e. Google Translate) did not consistently produce better raw MT quality when
Acrolinx was integrated into the simplification process.
Overall fluency and adequacy scores were relatively high, suggesting that the MT
system produced reasonably good raw output. A more detailed examination of the Spanish
output would be needed to verify that the evaluations are fair. At the same time, the
evaluators are domain experts with some knowledge of the source texts, and we assume that
they are reasonably capable of making a credible judgement of the content. A random sample
check of the scores suggests that the evaluation task was executed credibly and that the
scores are fair judgements of quality.
Moreover, evaluators who consistently assigned a score of four for adequacy gave varied
scores for fluency (and vice versa), a further indication that they paid attention to the
different characteristics of the sentences. The data reported in Tables 3 and 4 also show that
the vast majority of evaluators (n=38) assigned higher mean scores for adequacy than for
fluency for both corpora of PLS—this is also reflected in the grand means, which are higher
for the adequacy measure.
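The per-evaluator comparison described above can be sketched in a few lines of Python. This is an illustrative fragment, not the authors' analysis code; the evaluator IDs and sentence-level 1-4 ratings in the example are invented:

```python
from statistics import mean, stdev

def per_evaluator_stats(ratings):
    """Mean (SD) of one evaluator's sentence-level 1-4 ratings, as in Tables 3-4."""
    return round(mean(ratings), 2), round(stdev(ratings), 2)

def count_preferences(non_auto, semi_auto):
    """Count how many evaluators rated each PLS corpus higher on average."""
    higher_non = sum(mean(non_auto[e]) > mean(semi_auto[e]) for e in non_auto)
    higher_semi = sum(mean(semi_auto[e]) > mean(non_auto[e]) for e in non_auto)
    return higher_non, higher_semi

def grand_mean(scores_by_evaluator):
    """Grand mean: the mean of the per-evaluator mean scores."""
    return round(mean(mean(s) for s in scores_by_evaluator.values()), 2)

# Invented ratings for two hypothetical evaluators:
non = {"EA": [4, 3, 4, 3], "EB": [4, 4, 4, 4]}
semi = {"EA": [4, 4, 4, 4], "EB": [3, 4, 4, 3]}
print(per_evaluator_stats(non["EA"]))   # (3.5, 0.58)
print(count_preferences(non, semi))     # (1, 1)
print(grand_mean(non))                  # 3.75
```

A statistical test on each evaluator's two sets of sentence-level ratings would then identify the significant differences marked (*) in the tables.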
8 Conclusions and Future Work
This study focused on volunteer health practitioners facilitating the intralingual and
interlingual translation of medical content at Cochrane. More precisely, we examined
whether introducing the Acrolinx CL checker into Cochrane’s non-automated simplification
approach for the production of PLS would be beneficial in terms of volunteer authors’
satisfaction and machine translatability (into Spanish) of their PLS. In this section, we will
summarize and discuss the main findings, and we will present areas for future work.
We observed that most authors were satisfied with the CL checker, and would
welcome its introduction to supplement Cochrane PLS guidelines during the
simplification/intralingual translation tasks. This tool would reduce the need to remember,
check and manually implement plain language guidelines, which might represent a burden,
particularly for individuals with no training in linguistics. A possible way to maximize the
benefits of integrating a CL checker and guidelines would be to use the former to check for
compliance with simplification rules that can be formalized, and the latter at the
summarization stage, i.e. when authors need guidance on which content of the Systematic
Reviews should be included in the PLS. However, CL checkers should be tailored to the
characteristics of Cochrane content, and authors would need some time to familiarise
themselves with this semi-automated approach.
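To illustrate what "formalizable" simplification rules look like in practice, the following is a minimal Python sketch of a rule-based plain-language check. The sentence-length threshold and jargon list are invented for illustration and bear no relation to Acrolinx's actual rule set or to Cochrane's guidelines:

```python
import re

MAX_WORDS = 25                        # illustrative sentence-length threshold
PLAIN_SWAPS = {"utilize": "use",      # illustrative jargon-to-plain mapping
               "commence": "begin",
               "administer": "give"}

def check_plain_language(text):
    """Return (sentence, message) pairs for formalizable rule violations."""
    findings = []
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        words = sentence.split()
        if len(words) > MAX_WORDS:
            findings.append((sentence, f"sentence has {len(words)} words"))
        for term, plain in PLAIN_SWAPS.items():
            if term in sentence.lower():
                findings.append((sentence, f"replace '{term}' with '{plain}'"))
    return findings

print(check_plain_language("Doctors utilize this drug."))
```

Rules of this kind can be checked automatically on every draft, whereas decisions about which content of a Systematic Review belongs in the PLS cannot, which is why the guidelines would remain necessary at the summarization stage.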
When volunteering involves content editing and production (as in the case of
Cochrane or Wikipedia), Nov and Rao (2008, 85) shed light on the threat of asymmetry,
defined as “a lack of contributed resources for maintaining and improving the common pool
of resources”. In other words, there is some risk that volunteers’ contributions will be scarce
and not sufficient to expand and update content. Ensuring the commitment of volunteers to
the authoring task (by boosting their satisfaction) might reduce the threat of asymmetry, and
result in an increase in “volunteering rates” (Olohan 2014, 19), namely the amount of health
content made available in plain English on different websites: as part of the Cochrane-Wikipedia partnership, evidence from Cochrane Systematic Reviews is being used in
Wikipedia medical articles to ensure their accuracy (Cochrane 2016).
However, satisfaction with a task might not always result in motivation to volunteer.
Motivation is a complex psychological construct. Previous works have shown that volunteers
involved in online communities can be self-motivated to write in the first place (Joyce and
Kraut 2006), or that motivation is determined by the perceived importance and uniqueness of
the contributions (Ling et al. 2005). According to Clary et al. (1998), volunteering can serve
six functions, which are: expressing altruistic values; engaging in favourably viewed
activities; advancing one’s career; reducing one’s sense of guilt; enhancing positive mood;
and understanding, which is defined as the possibility for volunteers to be involved in new
learning experiences, and to put their skills and abilities into practice. Future research, e.g.
involving structured or semi-structured interviews, might shed light on the factors that
motivate Cochrane authors to volunteer, and on whether the use of automated tools such as
CL checkers affects their motivation.
Future work should also measure the duration of the authoring tasks, since time
commitment is another factor likely to influence authors’ motivation to engage in volunteer
intralingual translation tasks. Moreover, these findings regarding the satisfaction and
preference of Cochrane authors should be tested in studies with a larger and more
homogeneous sample of participants (in terms of native language, plain language writing
skills, and familiarity with Cochrane sets of guidelines) in order to be generalisable.
Regarding machine translatability, using the Acrolinx CL checker did not result in a
significant increase in quality of the Spanish MT output of PLS. In line with Wu et al. (2011),
we also observed that Google Translate produced output of relatively high quality. This result
seems encouraging for Cochrane and other non-profit organizations that are increasingly
relying on MT systems to streamline their translation workflow and to encourage
contributions from volunteers, particularly non-professional translators (Section 1).
Furthermore, for both PLS corpora, adequacy was rated higher than fluency. This finding is
also promising since, compared with adequacy/content errors, fluency/style issues are easier
to correct and less likely to have a detrimental effect on the well-being of readers (Koponen
2010; Stymne 2013).
It remains to be investigated if volunteer health professionals would be willing to
post-edit MT outputs rather than translating PLS from scratch. Future work should also
compare human interlingual translation of simplified texts with the post-editing of their MT outputs.
Moreover, it might be interesting to analyse the impact of other CL checkers on machine
translatability (into other languages), as well as to conduct the evaluation of the MT output at
the document (rather than the sentence) level (Läubli, Sennrich and Volk 2018). Even though
volunteer health professionals might need some training on how to validate or post-edit MT
outputs, combining CL checkers and MT might reduce their workload and increase the
amount of health information that is made available in different languages.
Funding and acknowledgments
This work was supported by the European Union’s Horizon 2020 research and innovation
programme under the Marie Skłodowska-Curie grant agreement N. 734211 (“Interact”); and
by the Irish Research Council (GOIPG/2017/1409). The authors would like to thank the
Acrolinx team for their technical support, and Dr Silvia Rodríguez Vázquez for her assistance
with the setup of the authoring study. The authors would also like to thank the anonymous
participants, and Therese Docherty, Juliane Ried, Hayley Hassan, and Andrea Cervera from
Cochrane for helping with recruitment.
Disclosure Statement
The authors have no financial interest or benefit arising from the direct applications of their
research.
References
Abekawa, Takeshi, and Kyo Kageura. 2007. “A Translation Aid System with a Stratified
Lookup Interface.” In Proceedings of the 45th Annual Meeting of the Association for
Computational Linguistics, Poster and Demo Session, 5-8. https://bit.ly/2LYqqy5
Bangor, Aaron, Philip Kortum, and James Miller. 2008. “An Empirical Evaluation of the
System Usability Scale.” International Journal of Human-Computer Interaction
24(6): 574-594. doi:10.1080/10447310802205776.
Birch, Alexandra, Ondřej Bojar, Rudolf Rosa, Juliane Ried, Hayley Hassan, and Colin
Davenport. 2017. D5.5: Report on User Surveys, Impact Assessment and Automatic
Semantic Metrics. https://goo.gl/EGQJXK
Bojar, Ondřej, Barry Haddow, David Mareček, Roman Sudarikov, Aleš Tamchyna, and
Dušan Variš. 2017. D1.1: Report on Building Translation Systems for Public Health
Domain. https://goo.gl/MzXdKj
Brooke, John. 1996. “SUS – A Quick and Dirty Usability Scale.” Usability Evaluation in
Industry 89 (194): 4-7. https://goo.gl/kyFkmJ
Brooke, John. 2013. “SUS: A Retrospective.” Journal of Usability Studies 8 (2): 29-40.
https://goo.gl/ckpbm4
Byrne, Michael, Kristen Greene, and Sarah Everett. 2007. “Usability of Voting Systems:
Baseline Data for Paper, Punch Cards, and Lever Machines.” In Proceedings of the
SIGCHI Conference on Human factors in Computing Systems, edited by Bo Begole,
Stephen Payne, Elizabeth Churchill, Rob Amant, David Gilmore, and Mary Beth
Rosson, 171-180. New York: ACM.
Castilho, Sheila, Joss Moorkens, Federico Gaspari, Rico Sennrich, Vilelmini Sosoni,
Panayota Georgakopoulou, Pintu Lohar, Andy Way, Antonio Miceli Barone, and
Maria Gialama. 2017. “A Comparative Quality Evaluation of PBSMT and NMT
Using Professional Translators.” In Proceedings of MT Summit XVI, vol. 1, edited by
Sadao Kurohashi and Pascale Fung, 116-131. https://goo.gl/pHwfdB
Castilho, Sheila, Stephen Doherty, Federico Gaspari, and Joss Moorkens. 2018. “Approaches
to Human and Machine Translation Quality Assessment.” In Translation Quality
Assessment: From Principles to Practice, edited by Joss Moorkens, Sheila Castilho,
Federico Gaspari, and Stephen Doherty, 9-38. Basel, Switzerland: Springer
International Publishing AG.
Chandler, Jackie, Julian Higgins, Jonathan Deeks, Clare Davenport, and Mike Clarke. 2017.
“Chapter 1: Introduction.” In Cochrane Handbook for Systematic Reviews of
Interventions, Version 5.2.0, edited by Julian Higgins, Rachel Churchill, Jackie
Chandler, and Miranda Cumpston. https://goo.gl/BRfoYQ
Chaparro, Barbara, Mikki Phan, Christina Siu, and Jo Jardina. 2014. “User Performance and
Satisfaction of Tablet Physical Keyboards.” Journal of Usability Studies 9 (2): 70-80.
https://goo.gl/qB1x9P
Clary, Gyl, Mark Snyder, Robert Ridge, John Copeland, Arthur Stukas, Julie Haugen, and
Peter Miene. 1998. “Understanding and Assessing the Motivations of Volunteers: A
Functional Approach.” Journal of Personality and Social Psychology 74 (6): 1516-1530. doi:10.1037/0022-3514.74.6.1516
Cochrane. 2016. “The Cochrane-Wikipedia Partnership in 2016”. https://goo.gl/Lh7b7z
Council of Europe 2011. Common European Framework of Reference for Languages:
Learning, Teaching, Assessment. https://bit.ly/2w4iMwg
Den Besten, Matthijs, and Jean‐Michel Dalle. 2008. “Keep it Simple: A Companion for
Simple Wikipedia?” Industry and Innovation 15 (2): 169-178.
doi:10.1080/13662710801970126
Doane, David, and Lori Seward. 2011. “Measuring Skewness: A Forgotten Statistic?”
Journal of Statistics Education 19 (2): 1-18. doi:10.1080/10691898.2011.11889611
Doherty, Stephen. 2017. “Issues in Human and Automatic Translation Quality Assessment.”
In Human Issues in Translation Technology, edited by Dorothy Kenny, 131-148.
London: Routledge.
Dyson, Mary, and Jean Hannah. 1987. “Toward a Methodology for the Evaluation of
Machine Assisted Translation Systems.” Computers and Translation 2 (3): 163-176.
https://goo.gl/fWC1ZX
Fiederer, Rebecca, and Sharon O’Brien. 2009. “Quality and Machine Translation: A Realistic
Objective.” Journal of Specialised Translation 11: 52-74. https://goo.gl/D5wf5z
Gigliotti, Giulia. 2017. “The Quality of Mercy: A Corpus-Based Analysis of the Quality of
Volunteer Translations for Non-Profit Organisations (NPOs).” New Voices in
Translation Studies 17: 52-81. https://goo.gl/L8eqjL
Heilman, James, Eckhard Kemmann, Michael Bonert, Anwesh Chatterjee, Brent Ragar,
Graham Beards, David Iberri et al. 2011. “Wikipedia: A Key Tool for Global Public
Health Promotion.” Journal of Medical Internet Research 13(1): e14.
doi:10.2196/jmir.1589
Higgins, Julian, and Sally Green, eds. 2011. Cochrane Handbook for Systematic Reviews of
Interventions, Version 5.1.0. https://goo.gl/x1mgRS
ISO (International Organization for Standardization) 2018. ISO 9241-11:2018. Ergonomics of
Human-System Interaction — Part 11: Usability: Definitions and Concepts. Preview:
https://goo.gl/57dkf8
Joyce, Elizabeth, and Robert Kraut. 2006. “Predicting Continued Participation in
Newsgroups.” Journal of Computer-Mediated Communication 11: 723-747.
doi:10.1111/j.1083-6101.2006.00033.x
Kajzer-Wietrzny, Marta, Boguslawa Whyatt, and Katarzyna Stachowiak. 2016.
“Simplification in Inter- and Intralingual Translation — Combining Corpus
Linguistics, Key Logging and Eye-Tracking.” Poznan Studies in Contemporary
Linguistics 52 (2): 235-237. doi:10.1515/psicl-2016-0009
Koehn, Philipp, and Christof Monz. 2006. “Manual and Automatic Evaluation of Machine
Translation between European Languages.” In Proceedings of the Workshop on
Statistical Machine Translation (HLT-NAACL 06), edited by Philipp Koehn, and
Christof Monz, 102-121. Stroudsburg: Association for Computational Linguistics.
Koponen, Maarit. 2010. “Assessing Machine Translation Quality with Error Analysis.” In
Electronic Proceedings of the KäTu Symposium on Translation and Interpreting
Studies. https://bit.ly/2Ni6c7l
Laurent, Michaël, and Tim Vickers. 2009. “Seeking Health Information Online: Does
Wikipedia Matter?” Journal of the American Medical Informatics Association 16 (4):
471-479. doi:10.1197/jamia.M3059
Läubli, Samuel, Rico Sennrich, and Martin Volk. 2018. “Has Machine Translation Achieved
Human Parity? A Case for Document-Level Evaluation.” https://bit.ly/2M4cumf
Leroy, Gondy, David Kauchak, and Obay Mouradi. 2013. “A User-Study Measuring the
Effects of Lexical Simplification and Coherence Enhancement on Perceived and
Actual Text Difficulty.” International Journal of Medical Informatics 82(8): 717-730.
doi:10.1016/j.ijmedinf.2013.03.001
Lewis, James, and Jeff Sauro. 2009. “The Factor Structure of the System Usability Scale.” In
Proceedings of the 13th International Conference on Human Computer Interaction
(HCII 2009), edited by Masaaki Kurosu, 94-103. New York: Springer.
Ling, Kimberly, Gerard Beenen, Pamela Ludford, Xiaoqing Wang, Klarissa Chang, Xin Li,
Dan Cosley et al. 2005. “Using Social Psychology to Motivate Contributions to
Online Communities.” Journal of Computer-Mediated Communication 10(4).
doi:10.1111/j.1083-6101.2005.tb00273.x
Linguistic Data Consortium 2002. Linguistic Data Annotation Specification: Assessment of
Fluency and Adequacy in Arabic-English and Chinese-English Translations.
https://goo.gl/FMp5wj
MacKenzie, Scott. 2013. Human-Computer Interaction: An Empirical Research
Perspective. Burlington, Massachusetts: Morgan Kaufmann.
Maguire, Lisa, and Mike Clarke. 2014. “How Much do You Need: A Randomised
Experiment of whether Readers Can Understand the Key Messages from Summaries
of Cochrane Reviews without Reading the Full Review.” Journal of the Royal Society
of Medicine 107 (22): 444-449. doi:10.1177/0141076814546710
McDonough Dolmaya, Julie. 2012. “Analyzing the Crowdsourcing Model and Its Impact on
Public Perceptions of Translation.” The Translator 18 (2): 167-191.
doi:10.1080/13556509.2012.10799507
Miyata, Rei, Anthony Hartley, Kyo Kageura, and Cécile Paris. 2017. “Evaluating the
Usability of a Controlled Language Authoring Assistant.” The Prague Bulletin of
Mathematical Linguistics 108: 147-158. doi:10.1515/pralin-2017-0016
Muñoz-Miquel, Ana. 2012. “From the Original Article to the Summary for Patients:
Reformulation Procedures in Intralingual Translation.” Linguistica Antverpiensia,
New Series–Themes in Translation Studies 11: 187-206. https://bit.ly/2w8FOCR
Nov, Oded, and Bharat Rao. 2008. “Technology-Facilitated ‘Give According to Your
Abilities, Receive According to Your Needs’”. Communications of the ACM 51 (5):
83-87. doi:10.1145/1342327.1342342
O’Brien, Sharon. 2010. “Controlled Language and Readability.” In Translation and
Cognition - ATA Scholarly Monograph XV Series, edited by Gregory Shreve, and Erik
Angelone, 143-168. Amsterdam: John Benjamins.
Olohan, Maeve. 2014. “Why do You Translate? Motivation to Volunteer and TED
Translation.” Translation Studies 7 (1): 17-33. doi:10.1080/14781700.2013.781952
Parra Escartín, Carla, Sharon O’Brien, Marie-Josée Goulet, and Michel Simard. 2017.
“Machine Translation as an Academic Writing Aid for Medical Practitioners.” In
Proceedings of MT Summit XVI, vol. 1, edited by Sadao Kurohashi, and Pascale Fung,
254-267. https://goo.gl/pHwfdB
Pérez-González, Luis, and Şebnem Susam-Saraeva. 2012. “Nonprofessionals Translating and
Interpreting.” The Translator 18 (2): 149-165. doi:10.1080/13556509.2012.10799506
Pym, Anthony. 2011. “What Technology Does to Translating.” Translation & Interpreting
3(1): 1-9. https://bit.ly/2PEhfoG
Rada, Gabriel, Daniel Pérez, and Daniel Capurro. 2013. “Epistemonikos: A Free, Relational,
Collaborative, Multilingual Database of Health Evidence.” Studies in Health
Technology and Informatics 192: 486-490. doi:10.3233/978-1-61499-289-9-486
Rodríguez Vázquez, Silvia. 2016. “Assuring Accessibility during Web Localisation: An
Empirical Investigation on the Achievement of Appropriate Text Alternatives for
Images.” PhD dissertation, University of Geneva.
Sauro, Jeff, and James Lewis. 2011. “When Designing Usability Questionnaires, does It Hurt
to Be Positive?” In Proceedings of the SIGCHI Conference on Human Factors in
Computing Systems (CHI 2011), edited by Desney Tan, Geraldine Fitzpatrick, Carl
Gutwin, Bo Begole, and Wendy Kellogg, 2215-2224. New York: Association for
Computing Machinery.
Schriver, Karen. 2017. “Plain Language in the US Gains Momentum: 1940-2015.” IEEE
Transactions on Professional Communication 60(4): 343-366.
doi:10.1109/TPC.2017.2765118
Stymne, Sara. 2013. “Using a Grammar Checker and Its Error Typology for Annotation of
Statistical Machine Translation Errors.” In Proceedings of the 24th Scandinavian
Conference of Linguistics (24SCL). https://bit.ly/2x5HRbr
Temnikova, Irina. 2012. “Text Complexity and Text Simplification in the Crisis Management
Domain.” PhD dissertation, University of Wolverhampton.
The Cochrane Collaboration 2013. Standards for the Reporting of Plain Language
Summaries in new Cochrane Intervention Reviews (PLEACS). https://goo.gl/w8FpFv
TwB (Translators without Borders). 2016. “Translators without Borders Develops the
World’s First Crisis-Specific Machine Translation System for Kurdish Languages.”
https://goo.gl/tuvv2W
TwB (Translators without Borders). 2018a. “Development & Preparedness.”
https://goo.gl/RP2BDb
TwB (Translators without Borders). 2018b. “Translators without Borders Code of Conduct
for Translators.” https://bit.ly/2NONC3L
TwB (Translators without Borders). 2019. “Kató – TWB’s Translation Platform.”
https://goo.gl/si6aHp
Von Elm, Erik, Philippe Ravaud, Harriet MacLehose, Lawrence Mbuagbaw, Paul Garner,
Juliane Ried, and Xavier Bonfill. 2013. “Translating Cochrane Reviews to Ensure that
Healthcare Decision-Making Is Informed by High-Quality Research Evidence.” PLOS
Medicine 10 (9): e1001516. doi:10.1371/journal.pmed.1001516
Wu, Cuijun, Fei Xia, Louise Deleger, and Imre Solti. 2011. “Statistical Machine Translation
for Biomedical Text: Are We There Yet?” In Proceedings of the AMIA Annual
Symposium, 1290-1299. Bethesda, Maryland: American Medical Informatics
Association.
Zethsen, Karen Korning. 2009. “Intralingual Translation: An Attempt at Description.” Meta
54 (4): 795–812. doi:10.7202/038904ar
Appendix A.
The System Usability Scale (SUS) statements, each rated on a scale from Strongly disagree to Strongly agree:
1. I think that I would like to use this system frequently
2. I found the system unnecessarily complex
3. I thought the system was easy to use
4. I think that I would need the support of a technical person to be able to use this system
5. I found the various functions in this system were well integrated
6. I thought there was too much inconsistency in this system
7. I would imagine that most people would learn to use this system very quickly
8. I found the system very cumbersome to use
9. I felt very confident using the system
10. I needed to learn a lot of things before I could get going with this system
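For reference, responses to the ten statements above are conventionally converted into a single 0-100 usability score (Brooke 1996): odd-numbered (positive) items contribute the rating minus 1, even-numbered (negative) items contribute 5 minus the rating, and the sum is multiplied by 2.5. A minimal sketch of the standard scoring:

```python
def sus_score(ratings):
    """Convert ten 1-5 SUS ratings (in questionnaire order) to a 0-100 score."""
    if len(ratings) != 10 or not all(1 <= r <= 5 for r in ratings):
        raise ValueError("SUS requires ten ratings between 1 and 5")
    total = sum(r - 1 if i % 2 == 0 else 5 - r   # items 1,3,5,7,9 are positive
                for i, r in enumerate(ratings))
    return total * 2.5

# A uniformly neutral respondent (all 3s) lands at the midpoint:
print(sus_score([3] * 10))  # 50.0
```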
i The Cochrane Collaboration website is available at: https://www.cochrane.org/ [Accessed 7 January 2019]
ii An example of Cochrane PLS is available at: https://goo.gl/WKGpBS [Accessed 7 January 2019]
iii The Acrolinx website is available at: https://www.acrolinx.com/ [Accessed 7 January 2019]
iv The Cambridge English Test is available at: https://goo.gl/BFt6zE [Accessed 7 January 2019]
v A description of TeamViewer is available at: https://www.teamviewer.com/en/ [Accessed 7 January 2019]
vi We could not recruit authors who would produce PLS from scratch for our experiment. Producing a PLS from
an entire Systematic Review is an onerous and time-consuming task that volunteer authors conduct in different
sessions during their free time. Our only option was to recruit authors who had already produced a PLS in the
past, and ask them to check and edit their PLS using Acrolinx. While this choice meant that we were not able to
measure task duration, nor to control the conditions under which the production of PLS with Cochrane
guidelines took place, it did enhance the ecological validity of the study since a laboratory/controlled setting
would have represented an unrealistic scenario for Cochrane authors.
vii It is worth noting that authoring a PLS from scratch while consulting sets of guidelines and at the same time
receiving automatic notifications on violations of readability and translatability might be overwhelming and
distracting for authors. Therefore, we suggest that if Acrolinx (or similar) were integrated into the overall
workflow of PLS production, it be used as a way to check the readability and translatability of an already
existing draft. Moreover, text simplification is often an iterative process, in which one draft is successively
edited and refined (Schriver 2017).