Assessing Support for Community Workflows in Localisation
Aram Morera, Lamine Aouad, and J.J. Collins
Localisation Research Centre - Centre for Next Generation Localisation (CNGL)
Dept. of Computer Science and Information Systems
University of Limerick, Limerick, Ireland
aram.morera-mesa,lamine.aouad,[email protected]
Abstract. This paper identifies a set of workflow patterns necessary to support
community-oriented localisation. Workflow pattern discovery is based on use
case analysis of five community translation tools, and modelled using the Yet
Another Workflow Language (YAWL) notation. An analysis is presented of the
support for these baseline patterns in two mainstream enterprise-oriented
Translation Management Systems (TMS) - GlobalSight and WorldServer. A
gap is identified with respect to the emerging need for community-oriented
workflows and their potential support in mainstream enterprise localisation
architectures.
Keywords: crowdsourcing, workflows, workflow patterns, localization,
automation
1 Introduction
The Localization Industry Standards Association (LISA) describes localisation as
“the process of modifying products or services to account for differences in distinct
markets” [1]. Localization also includes infrastructural support such as project
management, engineering, quality assurance, and human-computer interface design
issues [2]. Globalisation and pervasive internet access are driving demand for
localised content at lower cost without sacrificing quality. However, this demand is
not currently satisfied, and the resulting deficit is referred to as the digital divide.
Overcoming this challenge will require higher levels of automation [3], such as the
use of workflow enactment engines to support the business processes. In addition, it is
argued that use of the crowdsourcing paradigm through community-oriented
localisation infrastructures will be necessary [3]. Combining crowdsourcing and
automation through workflows requires the specification of community-oriented
localisation workflows.
Traditional approaches to localisation as embodied in enterprise level platforms
have embraced automation support at various points in the workflow. For example,
Translation Memory (TM) tools such as TRADOS, Deja vu, and Wordfast, to name a
few, facilitate increasing consistency within and across projects, and free translators
from manually intensive operations such as copying and pasting of text that had
already been translated. Machine Translation has made it possible to deliver lower
quality translation at almost no cost in near real-time. In addition, Translation
Management Systems (TMS) orchestrate the business functions, project tasks, process
workflows and language technologies that underpin large-scale translation activity
[4].
These systems were designed to help enterprise level Language Service Providers
(LSPs) to translate large amounts of content using a predominantly freelance
workforce [5]. The emergence of the crowdsourcing paradigm and Web 2.0 has
allowed companies and NGOs to leverage the community to do the translation [6].
Two examples of NGOs doing this are The Rosetta Foundation and Translate.org.za.
Facebook [7] and PcTools [8] are for profit companies that have developed
proprietary tools in order to leverage their communities to translate their strings for
free. Other examples include open source projects such as Ubuntu [9], LibreOffice
and Firefox that are localized by their communities [10]. This community-based
approach is seen as a necessary tactic to address the ever growing demand for
localised content [3].
Different strategies and technologies have been adopted by these organizations to
carry out their community-oriented localisation projects. This paper analyzes the
community-oriented approaches enabled by the localisation technologies of Crowdin,
Facebook, Asia Online, Pootle, and LaunchPad. Use cases are recovered for these
community translation tools through manual screen scrapes and/or analysis of user
manuals. Workflow patterns are suggested for these use cases that will best support
the desired functionality, with some additional patterns incorporated to enhance the
quality of the service. These patterns discovered through use case analysis constitute a
baseline for community-oriented localisation workflows. This baseline is used for
comparison with the patterns supported in two mainstream enterprise-oriented
localisation industry TMS tools - GlobalSight and WorldServer. This facilitates the
identification of the gap in workflow support between enterprise and the emerging
community-oriented paradigm based on crowdsourcing.
Section 2 of this paper presents a series of use cases for community translation
tools and the workflows that emerge from them. Section 3 presents the TMSs that
were used; and a mapping study showing their support for the patterns that were
discovered in section 2. The paper concludes with a discussion in section 4 that
outlines future research directions.
2 Discovering Community-Oriented Workflows
Use cases are captured in order to identify the patterns that should be supported.
Use cases are descriptions of sequences of interactions between the system and its
users [11]. If a pattern can be mapped to the actions in a use case, it is deemed
necessary to support it for community translation. The sequence pattern is not
included in these use cases as it is an implicit requirement for any kind of workflow.
For Asia Online and Facebook, the process was followed as captured from talks given
by Losse [7] and Vashee [12]. For Crowdin, Pootle, and LaunchPad, a number of
projects and user accounts were created. Projects were executed by simulating a
crowd using the different user accounts that were harnessed to provide translations,
votes and comments by iterating through the fields in the screens presented. These
screens were used to specify use cases. The use cases are the basis for workflows that
are modelled using Yet Another Workflow Language (YAWL) [13] with the notation
shown in figure 1. Note that the OR Join and the OR Split can map to different
patterns, that the shading does not carry any implicit semantics, and that the sequence
pattern has not been made explicit in the use cases.
Fig. 1. Workflow notations
2.1 Use cases
The use cases are presented next for a subset of the targeted platforms.
Crowdin
Crowdin has been used to localize products such as the Android app Titanium
Backup, which has 0.5 to 1 million users. Three different accounts were created, one for
the project manager and the other two in order to simulate a crowd. A project can be
configured as managed in that members of the crowd have to be accepted by the
project manager; or open in that anyone can participate. The crowd has access to the
source text, translations from Crowdin's TM, and translations from Google's and
Microsoft's Machine Translation (MT) systems, once the source files have been
uploaded in the designated file format. They can then suggest alternative translations
or vote for or against other translations. The project creator can give extra rights at
any time to any user who then becomes a group leader. The project creator and the
group leaders can also approve translations that are then hidden from the crowd. This
system creates a Multiple Instances without Synchronization task for each new file.
Each instance corresponds to a translation unit (TU). The number of instances is
known at run time, and each instance is an independent thread that does not block
the progress of the workflow if it is not executed. These instances consist of a
subworkflow that commences with a pre-translation stage that has three tasks:
Google MT translation, Bing MT translation and Crowdin's TM leverage. These three
tasks happen in parallel and thus a Parallel Split must precede them. Crowdin's TM
may return zero or more matches, which means that the leverage is a Multiple
Instances without a priori Run-Time Knowledge task. Because the MT systems may
fail and the TM may produce no results, the control thread must converge over an
Acyclic Synchronizing Merge.
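The pre-translation stage described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Crowdin's implementation: the functions `google_mt`, `bing_mt` and `tm_leverage` are hypothetical stand-ins, and the merge is approximated by waiting for every branch to either deliver results or fail.

```python
from concurrent.futures import ThreadPoolExecutor


def google_mt(source):
    # Hypothetical stand-in for an MT call; in this sketch it succeeds.
    return [f"google:{source}"]


def bing_mt(source):
    # Hypothetical stand-in for an MT call that fails.
    raise RuntimeError("MT service unavailable")


def tm_leverage(source):
    # Hypothetical TM lookup; zero or more matches may come back.
    return []


def pre_translate(source, tasks=(google_mt, bing_mt, tm_leverage)):
    """Start all tasks in parallel, then merge whatever they produced."""
    suggestions = []
    with ThreadPoolExecutor(max_workers=len(tasks)) as pool:
        # Parallel Split: every pre-translation task runs as its own branch.
        futures = [pool.submit(task, source) for task in tasks]
        for future in futures:
            try:
                suggestions.extend(future.result())
            except Exception:
                # A failed branch contributes nothing and does not stall the merge.
                continue
    return suggestions
```

In this sketch a failing MT branch and an empty TM result simply leave the list of suggestions shorter; the flow still reaches the point where the crowd can start suggesting translations.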
Fig. 2. YAWL representation - Crowdin's translate and vote subworkflow
After the merge, the crowd can suggest their own translations. Since multiple users
can suggest translations and any single user can suggest a number of translations, this
is a Multiple Instances without a priori Run-Time knowledge task. A translation can
be voted upon until it is approved. As with the suggest-translation task, the number of
votes that a translation will receive before it is approved is not known and it is
therefore modelled as a Multiple Instances without a priori Run-Time knowledge task.
Since a translation can be approved before it receives any votes, the split that follows
the suggest-translation task must be a Multichoice Split. As the control flow can pass
directly from the suggest-translation task to the approve-translation task, the approve
translation task must be preceded by an Acyclic Synchronizing Merge, so that an
absence of votes does not stall the flow. At any time a user with the right permissions
can declare the project finished. To be able to react to this signal the system must
support the Transient Trigger pattern. This triggers the cancellation of all the
activities in the case without making the case unsuccessful, thus requiring the Cancel
Region pattern. The cancellation of tasks prevents the creation of new work items and
causes the implicit termination of the case, which requires support for the Implicit
Termination pattern.
Figure 2 illustrates a YAWL representation of the translate-and-vote subworkflow.
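As a rough illustration of the behaviour just described, the following Python sketch models a translation unit that accepts an open-ended number of suggestions and votes, can be approved without any votes, and reacts to a "project finished" signal by cancelling whatever is still open. The class and function names are hypothetical and chosen for this example only; they do not reflect Crowdin's internals.

```python
class TranslationUnit:
    """One instance of the translate-and-vote subworkflow (illustrative only)."""

    def __init__(self, source):
        self.source = source
        self.suggestions = {}  # suggested text -> net votes
        self.approved = None
        self.cancelled = False

    def is_open(self):
        return self.approved is None and not self.cancelled

    def suggest(self, text):
        # Any number of suggestions may arrive while the unit is open.
        if not self.is_open():
            return False
        self.suggestions.setdefault(text, 0)
        return True

    def vote(self, text, delta):
        # Any number of votes may arrive; none are required for approval.
        if not self.is_open() or text not in self.suggestions:
            return False
        self.suggestions[text] += delta
        return True

    def approve(self, text):
        if not self.is_open() or text not in self.suggestions:
            return False
        self.approved = text
        return True


def finish_project(units):
    """React to the 'project finished' signal by cancelling every open unit."""
    for unit in units:
        if unit.is_open():
            unit.cancelled = True
```

Units that were already approved keep their result when the signal arrives, so the case ends without being marked unsuccessful.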
Asia Online
Asia Online has translated part of the content of English Wikipedia to Thai using
MT and a selected community of users [12]. Each document goes through an MT
translation process. After this, three instances of a post-editing task are created. To
support this it is necessary to use the Multiple Instances with a priori Design-Time
Knowledge pattern. Each post-editing instance is carried out by one community member.
Then, their corrections are compared. As waiting for the three instances to finish is
required, support for the Synchronization pattern is also required. If
two corrections are the same they are automatically sent to the authoritative
translation database; otherwise, the corrections go to an administrator. Supporting
this requires the Exclusive Choice pattern. The administrator selects the
authoritative version, the alternative translations for storage and the bad translations
that are discarded. This action requires support for the Multichoice
Split pattern. In the case of the alternative translations, zero, one or two could be
sent for storage, which requires supporting the Multiple Instances without a
priori Run-Time Knowledge pattern.
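The comparison step above amounts to a synchronization followed by an exclusive choice. A minimal Python sketch, assuming the three post-edited versions are available as plain strings and using the hypothetical route labels `authoritative_db` and `administrator`:

```python
from collections import Counter


def route_post_edits(post_edits):
    """Take all three post-edits, then choose exactly one route for them."""
    if len(post_edits) != 3:
        # Synchronization: the comparison only makes sense once all three exist.
        raise ValueError("expected exactly three post-edited versions")
    text, count = Counter(post_edits).most_common(1)[0]
    if count >= 2:
        # At least two community members agree: store automatically.
        return ("authoritative_db", text)
    # No agreement: an administrator has to decide.
    return ("administrator", list(post_edits))
```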
Fig. 3. YAWL representation of Asia Online’s workflow
The case of the bad translations being discarded is the same. Zero or more
translations could be discarded thus requiring the Multiple Instances without a priori
Run-Time knowledge pattern. The translations that are added to the authoritative TM
are then used both in the delivered translation and to train the MT. This means that the Parallel
Split pattern is required. The delivery of the translation, discarding the bad
translation(s), storing the alternative translations, and training the MT engine all
happen before closing the project. However, the project could close without any
translation being stored in the alternative translation TM or being discarded. To
support this, one must use the Acyclic Synchronizing Merge. After this is done, the
project is closed in an explicit manner. This requires support for the Explicit
Termination pattern. A YAWL representation of this workflow is depicted in figure 3.
Facebook
Facebook uses different models for localisation depending on the language [7]. One
of their models lets the users suggest translations for the strings in the User Interface
(UI). Because we have a known number of strings and they may individually undergo
the translation and voting subworkflow, we need a Multiple Instances without
Synchronization task where each TU corresponds to an instance.
Each string can receive multiple translation suggestions from an unknown number
of users in the subworkflow. Supporting this requires supporting the Multiple
Instances without a priori Run-Time Knowledge pattern. Once a translation is
suggested, an unknown number of users will rate it. This again requires supporting the
Multiple Instances without a priori Run-Time Knowledge pattern. The suggestion with
the highest rating is selected and becomes the translation that appears on the UI, but
this may be replaced later by a more popular translation. In order to support the
translations being replaced over time, we need once more support for the Multiple
Instances without a priori Run-Time Knowledge pattern. A YAWL representation of
the translate and vote subworkflow in Facebook is depicted in figure 4.
Facebook's workflow would ideally support the Implicit Termination pattern, but this
has not been explicitly indicated by Losse [7].
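The selection rule described for Facebook, where the best-rated suggestion is shown and can later be displaced, can be illustrated with a short Python sketch. The function name and the dictionary layout are assumptions made for this example; how ties are broken is not stated in the source, so the sketch simply keeps the first of the tied suggestions.

```python
def displayed_translation(ratings):
    """Return the suggestion with the highest rating, or None if there are none.

    `ratings` maps each suggested translation to its current rating.
    """
    if not ratings:
        return None
    return max(ratings, key=ratings.get)
```

Because the function is re-evaluated whenever ratings change, a suggestion that becomes more popular later replaces the one currently shown.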
Fig. 4. YAWL representation of Facebook's translate and vote subworkflow
Pootle
Pootle has been used to localize products such as Firefox and LibreOffice. Pootle
can obtain suggestions from MT systems, but these were not enabled in our instance
and hence do not appear in the workflow. When a user creates a project, a file is
divided into TUs that can be translated individually, thus requiring the Multiple
Instances without Synchronization pattern.
The subworkflow starts with a pre-translation task that leverages Pootle's TM. Two
tasks can happen concurrently after the leverage: translation suggestion and
translation submission. To support this concurrency we need to precede the tasks with
a Parallel Split. The translation-suggestion task can be carried out an unknown
number of times by an unknown number of users. The Multiple Instances without a
priori Run-Time Knowledge pattern is required to support this interaction. The submit
translation task can be carried out directly after the leverage, or after some
translations have been suggested. This implies the need for an Acyclic Synchronizing
Merge where the arcs from the leverage and suggest-translation tasks meet. After
the submission, forty-seven automatic quality checks are carried out concurrently.
Concurrence of different tasks again requires support for the Parallel Split pattern.
The system waits until all the tests are finished to display the errors, thus requiring the
Synchronization pattern. The number of errors is not known until the checks are
carried out and displayed, which implies support for the Multiple Instances without a
priori Run-Time Knowledge pattern. Allowing users with the right permissions to
solve zero or more of the issues likewise requires the Multiple Instances
without a priori Run-Time Knowledge pattern. A YAWL representation of this
subworkflow is shown in figure 5.
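The quality-check stage differs from a tolerant merge in that every check has to finish before any error is displayed. The following Python sketch shows that strict synchronization with two hypothetical checks standing in for Pootle's forty-seven; the check functions and their messages are invented for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def check_not_empty(source, target):
    # Hypothetical check: the translation must contain some text.
    return None if target.strip() else "target is empty"


def check_end_punctuation(source, target):
    # Hypothetical check: sentence-final punctuation should be preserved.
    for mark in ".!?":
        if source.endswith(mark) and not target.endswith(mark):
            return "end punctuation differs"
    return None


def run_checks(source, target, checks=(check_not_empty, check_end_punctuation)):
    with ThreadPoolExecutor() as pool:
        # Parallel Split: every check starts as its own task.
        futures = [pool.submit(check, source, target) for check in checks]
        # Synchronization: results are collected only once all checks are done.
        results = [future.result() for future in futures]
    # The number of reported errors is only known at this point.
    return [message for message in results if message is not None]
```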
Fig. 5. YAWL representation of Pootle's translation subworkflow
At any time a user with the right permissions can declare the project finished. To
be able to react to this signal the system must support the Transient Trigger pattern.
This triggers the cancellation of all the activities in the case without making the case
unsuccessful, thus requiring the Cancel Region pattern. The cancellation of tasks
prevents the creation of new work items and causes implicit termination of the case,
which requires support for the Implicit Termination pattern.
3 Mapping Study and Analysis
A set of control flow patterns have emerged from the use cases above. In this section,
the support for those patterns in the workflow engine of two TMSs is analysed. TMSs
were developed to satisfy the automation needs of traditional LSPs; however their
workflow modules may not be flexible enough to render them suitable for community
localisation. A number of additional patterns are identified and added to the list
because of their potential suitability for crowdsourcing in an enterprise workflow.
3.1 TMS Selection
According to Rinsche [5], eighty-eight out of five hundred and sixty-two companies
claim to be using a TMS. Of these, eighty-six stated that they were using systems
developed in-house. While these numbers agree with Sargent [18], the report
questions the validity of these responses given the confusion with respect to the
description of a TMS. Two of the systems named in the report will serve as a
reference - GlobalSight and WorldServer. These systems were chosen because both of
them fit the description given by Sargent and DePalma [4], and because they support
workflow configurations that go beyond the linear workflow.
Both GlobalSight and WorldServer include a series of workflows by default and
have workflow management engines that can be used to enact any other workflow
that they support. Russell [14] suggested that the support for certain patterns could be
used to assess the suitability of a workflow system for a project. This suggestion is
followed by analysing the suitability of these off-the-shelf products for localisation
projects that involve a community.
3.2 Relevant Patterns
Basic Control Patterns
Table 1 shows support for basic control patterns in GlobalSight and WorldServer.
Only Parallel Split and Synchronization are not supported by GlobalSight. This is due
to GlobalSight's inability to support concurrent tasks, and does not negatively impact
its suitability for enterprise based localisation workflows. There are examples of
successful deployments in companies such as salesforce.com [15], Spartan
Consulting, and YYZ Translations [16].
WorldServer’s support for the Parallel Split and Synchronization patterns is
limited and offered via its parallel review and parallel subworkflow constructs. The
parallel review construct ties each Parallel Split to a Synchronization that is always
followed by an Exclusive Choice split. The construct thus limits the power of these
patterns, which could otherwise have been combined in other ways.
Although its functionality differs, the parallel subworkflow construct works like the
parallel review construct from the point of view of control flow.
In the context of crowdsourcing, it would also be necessary to support parallelism
followed by a free choice of joins, especially the Acyclic Synchronizing Merge.
WorldServer’s support for the Parallel Split is therefore considered
incomplete.
Basic Control        GlobalSight   WorldServer
Parallel Split       0             1*
Synchronization      0             1*
Exclusive Choice     1             1
Table 1. Support for basic control patterns (1 = supported, 0 = not supported, * = limited support).
Advanced Branching and Synchronizing Patterns
Table 2 shows support for advanced branching and synchronization patterns in
GlobalSight and WorldServer. Neither of the systems supports any of the advanced
branching patterns that emerged from the use cases. This illustrates that traditional
TMSs are probably not suited for the management of community translation efforts.
Supporting the Acyclic Synchronizing Merge appears to be a pre-requisite, given that,
in crowdsourcing, tasks may be delayed or not undertaken for long periods of time.
Advanced Branching             GlobalSight   WorldServer
Multichoice                    0             0
Acyclic Synchronizing Merge    0             0
Table 2. Support for advanced branching and synchronization patterns.
Structural Patterns
Table 3 shows support for structural patterns in GlobalSight and WorldServer.
Although neither of the two looping patterns emerged from the use cases, one can argue
that support for them would be useful in crowdsourcing scenarios, and both TMSs support
these constructs. Whereas in traditional localisation workflows it is common to find a
translate-review loop, crowdsourcing processes replace this with multiple instances
that include the corrections that would usually emerge from the reviews. This approach
prevents collaborators from finding out why their translation was not approved and
thus hinders their learning experience. Although support for the Structured Loop
pattern is a requirement for the traditional translate-review loop, neither TMS has
functionality to count the number of iterations carried out, which would be necessary in
community workflows. This issue is implicitly acknowledged by GlobalSight, as none
of its preconfigured workflows uses any kind of loop. In the case of the
Arbitrary Cycles pattern, the lack of a counter likewise means that the use of this
pattern could result in an infinite loop.
Structural Patterns     GlobalSight   WorldServer
Arbitrary Cycles        1             1
Structured Loop         1             1
Recursion               0             1
Implicit Termination    0             0
Explicit Termination    1             1
Table 3. Support for structural patterns.
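The iteration counter that the discussion of the loop patterns finds missing can be sketched as a bounded translate-review loop. This is a hypothetical Python illustration, not functionality of either TMS: `translate` and `review` are caller-supplied functions, and `max_iterations` is an assumed parameter.

```python
def translate_review_loop(translate, review, max_iterations=3):
    """Run translate -> review until approval or until the counter runs out.

    `translate(feedback)` returns a draft; `review(draft)` returns a pair
    (approved, feedback). Returns (draft, iterations_used), or
    (None, max_iterations) if no draft was approved in time.
    """
    feedback = None
    for iteration in range(1, max_iterations + 1):
        draft = translate(feedback)
        approved, feedback = review(draft)
        if approved:
            return draft, iteration
    # The counter guarantees termination even if reviews never succeed.
    return None, max_iterations
```

Passing the reviewer's feedback back into the next translation round also gives collaborators the explanation that the multiple-instance approach withholds from them.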
The use case analysis did not demonstrate a need for the Recursion pattern, but it
could be useful for crowdsourcing. For example, it would be useful to let testing tasks
call themselves if another bug emerged during a bug test. Although it is not apparent,
WorldServer supports Recursion through the subworkflow and parallel subworkflow
constructs. Both systems support the explicit termination pattern that appeared only in
the Asia Online use case, but neither implements the implicit termination pattern that
appears in several of the other use cases.
Multiple Instance Patterns
Table 4 shows support for multiple instance patterns in GlobalSight and WorldServer,
again emphasizing the fact that GlobalSight and WorldServer are traditional TMS
systems developed to support LSP project management practices. Only WorldServer
supports one of the multiple instance patterns - Multiple Instances with a priori
Design-Time Knowledge pattern. This makes perfect sense with respect to the use
cases of traditional localisation where resource utilization is maximized by having a
one-to-one mapping only, for example, between translators and files.
Multiple Instances                                        GlobalSight   WorldServer
Multiple Instances without Synchronization                0             0
Multiple Instances with a priori Design-Time Knowledge    0             1
Multiple Instances with a priori Run-Time Knowledge       0             0
Multiple Instances without a priori Run-Time Knowledge    0             0
Table 4. Support for multiple instance patterns.
The Multiple Instances with a priori Run-Time Knowledge pattern, like the Multiple
Instances with a priori Design-Time Knowledge pattern, implies a need for
synchronization later on. If these patterns are used in crowdsourcing, they may
stall the progression of the workflow, as the more difficult tasks may not be
tackled by any member. However, applying them guarantees that the tasks
involved in the pattern are completed before moving on to the next step. This feature
is potentially useful and is the reason why the pattern has been added to the list of
required patterns.
Supporting Multiple Instances without Synchronization allows a number of
activities to start and be carried out independently without blocking the progress of
other activities at any point.
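The difference between the synchronized and unsynchronized variants can be made concrete with a small Python sketch. Both function names are hypothetical; `instances_done` is assumed to be a list of booleans, one per task instance.

```python
def next_step_enabled(instances_done):
    """With synchronization, the next step starts only when every instance is done."""
    return all(instances_done)


def deliverable_units(instances_done):
    """Without synchronization, each finished instance moves on by itself."""
    return [index for index, done in enumerate(instances_done) if done]
```

With one difficult task left untouched, for example `[True, False, True]`, the synchronized variant stalls while the unsynchronized variant still lets instances 0 and 2 proceed.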
Cancellation Patterns
Both systems support cancelling a case, but only in reaction to a trigger given by the
project administrators.
Cancellation Patterns    GlobalSight   WorldServer
Cancel Case              1             1
Cancel Region            0             0
Table 5. Support for cancellation patterns.
Trigger Patterns
Both systems support triggers from manual cancellation signals, but this does not
constitute proper support of the pattern.
Trigger Patterns     GlobalSight   WorldServer
Transient Trigger    1             1
Table 6. Support for trigger patterns.
4 Conclusion and discussion
This comparative study identifies a list of seventeen patterns with thirteen
emerging from the use cases and the remaining four being added for completeness of
the specification. GlobalSight has partial/full support for six of these patterns, of
which four appear in the use cases. Likewise, WorldServer has partial/full support for
ten, seven of which were recovered from use cases. While this coverage is
incomplete, Van Der Aalst et al. state that none of the general-purpose workflow
systems offers support for all the patterns in the catalogue [17]. Furthermore, TMSs,
being specialized tools, are invariably developed using a subset of the patterns in this catalogue.
Both systems can be extended by means of their Application Programming Interfaces
(API) potentially enabling support for missing patterns. The mapping study
demonstrates that TMSs designed to support traditional enterprise-oriented
localisation workflows do not map cleanly to crowdsourced localisation scenarios
because of the gaps identified.
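The totals quoted in this section can be re-derived from Tables 1 to 6. The Python sketch below transcribes the table values into a dictionary, abbreviating "Multiple Instances" as "MI" and counting the limited support marked 1* in Table 1 as support; the helper names `coverage` and `missing` are introduced here for illustration only.

```python
# Support values transcribed from Tables 1-6 as (GlobalSight, WorldServer).
# Limited support (1* in Table 1) is counted as support here.
SUPPORT = {
    "Parallel Split": (0, 1),
    "Synchronization": (0, 1),
    "Exclusive Choice": (1, 1),
    "Multichoice": (0, 0),
    "Acyclic Synchronizing Merge": (0, 0),
    "Arbitrary Cycles": (1, 1),
    "Structured Loop": (1, 1),
    "Recursion": (0, 1),
    "Implicit Termination": (0, 0),
    "Explicit Termination": (1, 1),
    "MI without Synchronization": (0, 0),
    "MI with a priori Design-Time Knowledge": (0, 1),
    "MI with a priori Run-Time Knowledge": (0, 0),
    "MI without a priori Run-Time Knowledge": (0, 0),
    "Cancel Case": (1, 1),
    "Cancel Region": (0, 0),
    "Transient Trigger": (1, 1),
}

GLOBALSIGHT, WORLDSERVER = 0, 1


def coverage(system):
    """Number of patterns with partial or full support in the given system."""
    return sum(values[system] for values in SUPPORT.values())


def missing(system):
    """Names of the patterns the given system does not support."""
    return [name for name, values in SUPPORT.items() if not values[system]]
```

Under these assumptions the dictionary holds seventeen patterns, with six supported by GlobalSight and ten by WorldServer, which matches the figures given above.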
A limitation of this comparative study is the number of systems evaluated.
Enterprise tools such as Lingotek support a workflow that uses crowdsource-like
translation, and MemoQ with its online document management module allows a type
of interaction that fits with the crowdsourcing approach to localisation [18].
Furthermore, GlobalSight has been extended with a module called CrowdSight that
intends to make it suitable to support crowdsourcing.
Besides this limitation, the community tools discussed focus on the translation
task. The crowdsourcing model, with processes unconstrained by deadlines, executed by
many actors, and with tasks that can be left incomplete, will probably
generate the same set of patterns if applied to other processes, such as terminology and QA.
However, at the time of writing, no tools or data were available to back up this
claim.
The next phase of this research will expand the number of subjects and include the
platforms mentioned. However, initial modelling of the patterns required to support
the use cases of the community tools reveals that a crowdsourcing workflow system
would have to implement several patterns for parallel tasks, multiple instance tasks,
and advanced merging patterns that allow the progression of the workflow without the
tasks being complete.
Acknowledgement - this research is supported by the Science Foundation Ireland
(Grant 07/CE/I1142) as part of the Centre for Next Generation Localisation
(www.cngl.ie) at University of Limerick.
References
[1] Lommel, A. (2003). The Localization Industry Primer. 2nd ed, SMP Marketing and LISA.
Available at: http://www.cit.griffith.edu.au/~davidt/cit3611/LISAprimer.pdf
[2] Schaeler, R. (2008). Communication as a Key to Global Business. Connecting People
with Technology: Issues in Professional Communication. G. F. Hayhoe and H. M. Grady
(Eds.), Baywood Publishing Company.
[3] Van Genabith, J. (2009). Next Generation Localisation. Localisation Focus 8(1):4-10.
[4] Sargent, B. and D. DePalma (2008). Translation Management Systems: Assessment of
Commercial and LSP specific TMS Offerings. Common Sense Advisory.
[5] Rinsche, A. and N. Portera-Zanotti (2009). Study on the size of the language industry in
the EU. European Commission.
[6] Ray, R. and N. Kelly (2011). Crowdsourced Translation Best Practices for
Implementation. Common Sense Advisory.
[7] Losse, K. (2008). Facebook - Achieving Quality in a Crowd-sourced Translation
Environment. LRC XIII Localisation4All Conference, Ireland.
[8] Rickard, J. (2009). Translation in the Community. LRC XIV Localisation in The Cloud
Conference, Limerick, Ireland, September 2009.
[9] Mackenzie, A. (2006). Internationalization: software, universality and otherness.
Internationalization In Java.
[10] Dalvit, L., A. Terzoli, et al. (2008). Opensource software and localisation in indigenous
South African languages with Pootle. SATNAC'2008.
[11] Cockburn, A. (2001). Writing effective use cases. Addison-Wesley.
[12] Vashee, K. (2009). MT Technology in the Cloud - An evolving model. LRC XIV,
Localisation in The Cloud Conference, Limerick, Ireland, 2009.
[13] Russell, N., A. H. M. ter Hofstede, et al. (2007). newYAWL: achieving comprehensive
patterns support in workflow for the control-flow, data and resource perspectives. BPM
Center, Report BPM-07-05, BPMcenter.org.
[14] Russell, N., A. H. M. Ter Hofstede, et al. (2006). Workflow Control-Flow Patterns: A
Revised View. BPM Center Technical Report BPM-06-22.
[15] Wunderlich, M. (2011). Our Globalsight migration - lessons learnt. Accessed at:
http://www.martinwunderlich.com/?p=48 June 2011.
[16] Ghaznawi, S. (2010). GlobalSight and LSPs. ELIA Networking Days, Istanbul.
[17] Van Der Aalst, W. M. P., A. H. M. Ter Hofstede, et al. (2003). Workflow patterns.
Distributed and Parallel Databases 14(1):5-51.
[18] Sargent, B. (2007). Translation Management Systems and Subcategories. Multilingual
18(3): 83-86.