
Assessing Support for Community Workflows in Localisation

Aram Morera, Lamine Aouad, and J.J. Collins

Localisation Research Centre - Centre for Next Generation Localisation (CNGL)

Dept. of Computer Science and Information Systems

University of Limerick, Limerick, Ireland

aram.morera-mesa,lamine.aouad,[email protected]

Abstract. This paper identifies a set of workflow patterns necessary to support

community-oriented localisation. Workflow pattern discovery is based on use

case analysis of five community translation tools, and modelled using the Yet

Another Workflow Language (YAWL) notation. An analysis is presented of the

support for these baseline patterns in two mainstream enterprise-oriented

Translation Management Systems (TMS) - GlobalSight and WorldServer. A

gap is identified with respect to the emerging need for community-oriented

workflows and their potential support in mainstream enterprise localisation

architectures.

Keywords: crowdsourcing, workflows, workflow patterns, localization,

automation

1 Introduction

The Localization Industry Standards Association (LISA) describes localisation as

“the process of modifying products or services to account for differences in distinct

markets” [1]. Localization also includes infrastructural support such as project

management, engineering, quality assurance, and human-computer interface design

issues [2]. Globalisation and pervasive internet access are driving demand for

localised content at lower cost without sacrificing quality. However, this demand is

not currently satisfied, and the resulting deficit is referred to as the digital divide.

Overcoming this challenge will require higher levels of automation [3], such as the

use of workflow enactment engines to support the business processes. In addition, it is

argued that use of the crowdsourcing paradigm through community-oriented

localisation infrastructures will be necessary [3]. Combining crowdsourcing and


automation through workflows requires the specification of community-oriented

localisation workflows.

Traditional approaches to localisation as embodied in enterprise level platforms

have embraced automation support at various points in the workflow. For example,

Translation Memory (TM) tools such as TRADOS, Deja vu, and Wordfast, to name a

few, facilitate increasing consistency within and across projects, and free translators

from manually intensive operations such as copying and pasting of text that had

already been translated. Machine Translation has made it possible to deliver lower

quality translation at almost no cost in near real-time. In addition, Translation

Management Systems (TMS) orchestrate the business functions, project tasks, process

workflows and language technologies that underpin large-scale translation activity

[4].

These systems were designed to help enterprise level Language Service Providers

(LSPs) to translate large amounts of content using a predominantly freelance

workforce [5]. The emergence of the crowdsourcing paradigm and Web 2.0 has

allowed companies and NGOs to leverage the community to do the translation [6].

Two examples of NGOs doing this are The Rosetta Foundation and Translate.org.za.

Facebook [7] and PcTools [8] are for profit companies that have developed

proprietary tools in order to leverage their communities to translate their strings for

free. Other examples include open source projects such as Ubuntu [9], LibreOffice

and Firefox that are localized by their communities [10]. This community-based

approach is seen as a necessary tactic to address the ever growing demand for

localised content [3].

Different strategies and technologies have been adopted by these organizations to

carry out their community-oriented localisation projects. This paper analyzes the

community-oriented approaches enabled by the localisation technologies of Crowdin,

Facebook, Asia Online, Pootle, and LaunchPad. Use cases are recovered for these

community translation tools through manual screen scrapes and/or analysis of user

manuals. Workflow patterns are suggested for these use cases that will best support

the desired functionality, with some additional patterns incorporated to enhance the

quality of the service. These patterns discovered through use case analysis constitute a

baseline for community-oriented localisation workflows. This baseline is used for

comparison with the patterns supported in two mainstream enterprise-oriented

localisation industry TMS tools - GlobalSight and WorldServer. This facilitates the

identification of the gap in workflow support between enterprise and the emerging

community-oriented paradigm based on crowdsourcing.

Section 2 of this paper presents a series of use cases for community translation

tools and the workflows that emerge from them. Section 3 presents the TMSs that

were used; and a mapping study showing their support for the patterns that were

discovered in section 2. The paper concludes with a discussion in section 4 that

outlines future research directions.


2 Discovering Community-Oriented Workflows

Use cases are captured in order to identify the patterns that should be supported.

Use cases are descriptions of sequences of interactions between the system and its

users [11]. If a pattern can be mapped to the actions in a use case, it is deemed

necessary to support it for community translation. The sequence pattern is not

included in these use cases as it is an implicit requirement for any kind of workflow.

For Asia Online and Facebook, the process was followed as captured from talks given

by Losse [7] and Vashee [12]. For Crowdin, Pootle, and LaunchPad, a number of

projects and user accounts were created. Projects were executed by simulating a

crowd using the different user accounts that were harnessed to provide translations,

votes and comments by iterating through the fields in the screens presented. These

screens were used to specify use cases. The use cases are the basis for workflows that

are modelled using Yet Another Workflow Language (YAWL) [13] with the notation

shown in figure 1. Note that the OR Join and the OR Split can map to different

patterns, that the shading does not carry any implicit semantics, and that the sequence

pattern has not been made explicit in the use cases.

Fig. 1. Workflow notations

2.1 Use cases

The use cases are presented next for a subset of the targeted platforms.

Crowdin

Crowdin has been used to localize products such as the Android app Titanium

Backup that has 0.5 - 1 million users. Three different accounts were created, one for

the project manager and the other two in order to simulate a crowd. A project can be

configured as managed in that members of the crowd have to be accepted by the

project manager; or open in that anyone can participate. The crowd has access to the

source text, translations from Crowdin's TM, and translations from Google's and


Microsoft's Machine Translation (MT) systems, once the source files have been

uploaded in the designated file format. They can then suggest alternative translations

or vote for or against other translations. The project creator can give extra rights at

any time to any user who then becomes a group leader. The project creator and the

group leaders can also approve translations that are then hidden from the crowd. This

system creates a Multiple Instances without Synchronization task for each new file.

Each instance corresponds to a translation unit; their number is known at run time, and

each instance is an independent thread that does not block the progress of the

workflow if it is not executed. These instances consist of a subworkflow that

commences with a pre-translation stage that has three tasks: Google MT translation,

Bing MT translation and Crowdin's TM leverage. These three tasks happen in parallel

and thus a Parallel Split must precede them. Crowdin's TM may present zero or more

matches, which means that it is a Multiple Instances without a priori Run-Time

Knowledge task. Because the MT systems may fail and the TM may produce no

results, the control thread must converge over an Acyclic Synchronizing Merge.

Fig. 2. YAWL representation - Crowdin's translate and vote subworkflow

After the merge, the crowd can suggest their own translations. Since multiple users

can suggest translations and any single user can suggest a number of translations, this

is a Multiple Instances without a priori Run-Time knowledge task. A translation can

be voted upon until it is approved. As with the suggest-translation task, the number of

votes that a translation will receive before it is approved is not known and it is

therefore modelled as a Multiple Instances without a priori Run-Time knowledge task.

Since a translation can be approved before it receives any votes, the split that follows

the suggest-translation task must be a Multichoice Split. As the control flow can pass

directly from the suggest-translation task to the approve-translation task, the approve

translation task must be preceded by an Acyclic Synchronizing Merge, so that an

absence of votes does not stall the flow. At any time a user with the right permissions

can declare the project finished. To be able to react to this signal the system must

support the Transient Trigger pattern. This triggers the cancellation of all the

activities in the case without making the case unsuccessful, thus requiring the Cancel


Region pattern. The cancellation of tasks prevents the creation of new work items and

causes the implicit termination of the case, which requires support for the Implicit

Termination pattern.

Figure 2 illustrates a YAWL representation of the translate-and-vote subworkflow.
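
The pre-translation stage described above for Crowdin can be sketched in Python as a rough analogue, not as Crowdin's implementation. The three task functions and their return values are hypothetical stand-ins: one MT call succeeds, one fails, and the TM lookup returns no matches. The merge collects whatever the branches produced, so a failed or empty branch does not stall the flow.

```python
from concurrent.futures import ThreadPoolExecutor


def google_mt(source):
    # Hypothetical stand-in for an MT call that succeeds.
    return ["[google] " + source]


def bing_mt(source):
    # Hypothetical stand-in for an MT call that fails.
    raise RuntimeError("MT service unavailable")


def tm_leverage(source):
    # Hypothetical TM lookup; it may return zero or more matches.
    return []


def pre_translate(source, tasks=(google_mt, bing_mt, tm_leverage)):
    """Run the pre-translation tasks in parallel and merge what arrives."""
    suggestions = []
    with ThreadPoolExecutor(max_workers=len(tasks)) as pool:
        # Parallel Split: every branch is started at once.
        futures = [pool.submit(task, source) for task in tasks]
        # Merge: a failed or empty branch contributes nothing and
        # does not prevent the flow from continuing.
        for future in futures:
            try:
                suggestions.extend(future.result())
            except Exception:
                continue
    return suggestions


print(pre_translate("Save file"))
```

With these stand-ins only the first branch contributes a suggestion; the subworkflow would then continue to the suggest-translation task.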

Asia Online

Asia Online has translated part of the content of English Wikipedia to Thai using

MT and a selected community of users [12]. Each document goes through an MT

translation process. After this, three instances of a post-editing task are created. To support

this it is necessary to use the Multiple Instances with a priori Design-Time Knowledge

pattern. These post-editing instances are carried out by one community member each.

Then, their corrections are compared. As waiting for the three instances to finish is

required, this means that support for the Synchronization pattern is also required. If

two corrections are the same they are automatically sent to the authoritative

translation database; otherwise, the corrections go to an administrator. This requires

support for the Exclusive Choice pattern. The administrator selects the

authoritative version, the alternative translations for storage and the bad translations

that are discarded. This is an example action requiring the support of the Multichoice

split pattern. In the case of the alternative translations, there could be one, two or none

being sent for storage and that requires supporting the Multiple Instances without a

priori Run-Time knowledge pattern.

Fig. 3. YAWL representation of Asia Online’s workflow

The case of the bad translations being discarded is the same. Zero or more

translations could be discarded thus requiring the Multiple Instances without a priori

Run-Time knowledge pattern. The translations that are added to the authoritative TM

are then used in the delivered translation and to train the MT. This means that the Parallel

Split pattern is required. The delivery of the translation, discarding the bad

translation(s), storing the alternative translations, and training the MT engine all

happen before closing the project. However, the project could close without any


translation being stored in the alternative translation TM or being discarded. To

support this, one must use the Acyclic Synchronizing Merge. After this is done, the

project is closed in an explicit manner. This requires support for the Explicit

Termination pattern. A YAWL representation of this workflow is depicted in figure 3.
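
The routing step after the three post-editing instances can be illustrated with a short Python sketch. It assumes that "the same" means exact string equality and that agreement between at least two post-edits is sufficient; the function name and return labels are hypothetical and are not taken from Asia Online.

```python
from collections import Counter


def route_post_edits(post_edits):
    """Decide where three completed post-edits go next.

    Returns ("authoritative", text) when at least two post-edits agree,
    otherwise ("administrator", post_edits).
    """
    if len(post_edits) != 3:
        # Synchronization: all three instances must have finished.
        raise ValueError("expected exactly three post-edits")
    text, count = Counter(post_edits).most_common(1)[0]
    # Exclusive Choice: exactly one of the two branches is taken.
    if count >= 2:
        return ("authoritative", text)
    return ("administrator", list(post_edits))


print(route_post_edits(["sawasdee", "sawasdee", "sawatdi"]))
print(route_post_edits(["a", "b", "c"]))
```

The first call routes the agreed text to the authoritative database; the second hands all three candidates to the administrator.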

Facebook

Facebook uses different models for localisation depending on the language [7]. One

of their models lets the users suggest translations for the strings in the User Interface

(UI). Because we have a known number of strings and they may individually undergo

the translation and voting subworkflow, we need a Multiple Instances without

Synchronization task where each TU corresponds to an instance.

Each string can receive multiple translation suggestions from an unknown number

of users in the subworkflow. Supporting this requires supporting the Multiple

Instances without a priori Run-Time Knowledge pattern. Once a translation is

suggested, an unknown number of users will rate it. This again requires supporting the

Multiple Instances without a priori Run-Time Knowledge pattern. The suggestion with

the highest rating is selected and becomes the translation that appears on the UI, but

this may be replaced later by a more popular translation. In order to support the

translations being replaced over time, we need once more support for the Multiple

Instances without a priori Run-Time Knowledge pattern. A YAWL representation of

the translate and vote subworkflow in Facebook is depicted in figure 4.

Facebook's workflow would ideally support the Implicit Termination pattern, but this has not

been explicitly indicated by Losse [7].
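
The selection rule described for Facebook, where the highest-rated suggestion is displayed and may later be displaced, can be sketched as follows. How ratings are aggregated is not stated in the source; summing votes per suggestion is an assumption made here for illustration, and all names are hypothetical.

```python
def displayed_translation(ratings):
    """Return the suggestion with the highest accumulated rating, or None."""
    if not ratings:
        return None
    return max(ratings, key=ratings.get)


def rate(ratings, suggestion, vote=1):
    # Each rating is an independent instance; their number is not known
    # in advance, so the tally is simply updated as ratings arrive.
    ratings[suggestion] = ratings.get(suggestion, 0) + vote
    return displayed_translation(ratings)


ratings = {}
print(rate(ratings, "Me gusta"))
print(rate(ratings, "Gustar"))
print(rate(ratings, "Gustar"))
```

After the third rating the displayed string changes from the first suggestion to the second, which mirrors a translation being replaced by a more popular one.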

Fig. 4. YAWL representation of Facebook's translate and vote subworkflow

Pootle

Pootle has been used to localize products such as Firefox and LibreOffice. Pootle

can obtain suggestions from MT systems, but these were not enabled in our instance

and hence do not appear in the workflow. When a user creates a project, a file is

divided in TUs that can be translated individually thus requiring the Multiple

Instances without Synchronization pattern.

The subworkflow starts with a pre-translation task that leverages Pootle's TM. Two

tasks can happen concurrently after the leverage: translation suggestion and

translation submission. To support this concurrency we need to precede the tasks with

a Parallel Split. The translation-suggestion task can be carried out an unknown

number of times by an unknown number of users. The Multiple Instances without a

priori Run-Time Knowledge pattern is required to support this interaction. The submit

translation task can be carried out directly after the leverage, or after some

translations have been suggested. This implies the need for an Acyclic Synchronizing


Merge where the arcs from the leverage and suggest-translation tasks meet. After

the submission, forty-seven automatic quality checks are carried out concurrently.

Concurrency of different tasks again requires support for the Parallel Split pattern.

The system waits until all the tests are finished to display the errors, thus requiring the

Synchronization pattern. The number of errors is not known until the checks are

carried out and displayed, and implies support for the Multiple Instances without a

priori Run-Time Knowledge pattern. To allow users with the right permissions to

solve zero or more of the issues the system must support the Multiple Instances

without a priori Run-Time Knowledge pattern. A YAWL representation of this

subworkflow is shown in figure 5.
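
The quality-check stage can be sketched as a Parallel Split followed by a Synchronization. The two checks below are hypothetical examples and are not Pootle's own checks; the sketch only shows that every check runs concurrently and that errors are reported once all checks have finished.

```python
from concurrent.futures import ThreadPoolExecutor


def check_end_punctuation(source, target):
    # Hypothetical check: flag a mismatch in final punctuation.
    if source.endswith((".", "!", "?")) and not target.endswith(source[-1]):
        return ["end punctuation differs"]
    return []


def check_not_empty(source, target):
    # Hypothetical check: flag an empty translation.
    return [] if target.strip() else ["translation is empty"]


def run_checks(source, target,
               checks=(check_end_punctuation, check_not_empty)):
    """Run all checks concurrently and return the errors once all finish."""
    with ThreadPoolExecutor(max_workers=len(checks)) as pool:
        # Parallel Split: one branch per check.
        futures = [pool.submit(check, source, target) for check in checks]
        # Synchronization: result() blocks until each branch has finished.
        results = [future.result() for future in futures]
    # The number of errors is only known at this point.
    return [error for errors in results for error in errors]


print(run_checks("Save.", "Guardar"))
```

Each reported error would then become an optional work item that a user with the right permissions may or may not resolve.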

Fig. 5. YAWL representation of Pootle's translation subworkflow

At any time a user with the right permissions can declare the project finished. To

be able to react to this signal the system must support the Transient Trigger pattern.

This triggers the cancellation of all the activities in the case without making the case

unsuccessful, thus requiring the Cancel Region pattern. The cancellation of tasks

prevents the creation of new work items and causes implicit termination of the case,

which requires support for the Implicit Termination pattern.
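
The reaction to the "project finished" signal can be sketched as below. This is a simplified illustration with hypothetical names, not the behaviour of any of the tools: the signal is handled immediately (as a transient trigger would be), open work items are cancelled without marking the case as failed, and the case ends because nothing remains to be done.

```python
class TranslationCase:
    """Hypothetical case holding one work item per translation unit."""

    def __init__(self, units):
        self.open_units = set(units)   # the region that can be cancelled
        self.approved = {}
        self.finished = False

    def approve(self, unit, translation):
        # No new work is accepted once the case has finished.
        if self.finished or unit not in self.open_units:
            return False
        self.open_units.remove(unit)
        self.approved[unit] = translation
        return True

    def declare_finished(self):
        """React to the 'project finished' signal from a privileged user."""
        cancelled = sorted(self.open_units)
        self.open_units.clear()        # Cancel Region: drop open work items
        self.finished = True           # nothing remains, so the case ends
        return cancelled


case = TranslationCase(["tu1", "tu2", "tu3"])
case.approve("tu1", "Guardar")
print(case.declare_finished())
print(case.approved)
```

Translations approved before the signal are kept, which is why the cancellation does not make the case unsuccessful.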

3 Mapping Study and Analysis

A set of control flow patterns has emerged from the use cases above. In this section,

the support for those patterns in the workflow engine of two TMSs is analysed. TMSs

were developed to satisfy the automation needs of traditional LSPs; however their

workflow modules may not be flexible enough to render them suitable for community

localisation. A number of additional patterns are identified and added to the list

because of their potential suitability for crowdsourcing in an enterprise workflow.


3.1 TMS Selection

According to Rinsche [5], eighty-eight out of five hundred and sixty-two companies

claim to be using a TMS. Of these, eighty-six stated that they were using systems

developed in-house. While these numbers agree with Sargent [18], the report

questions the validity of these responses given the confusion with respect to the

description of a TMS. Two of the systems named in the report will serve as a

reference - GlobalSight and WorldServer. These systems were chosen because both of

them fit the description given by Sargent and DePalma [4], and because they support

workflow configurations that go beyond the linear workflow.

Both GlobalSight and WorldServer include a series of workflows by default and

have workflow management engines that can be used to enact any other workflow

that they support. Russell [14] suggested that the support for certain patterns could be

used to assess the suitability of a workflow system for a project. This suggestion is

followed by analysing the suitability of these off-the-shelf products for localisation

projects that involve a community.

3.2 Relevant Patterns

Basic Control Patterns

Table 1 shows support for basic control patterns in GlobalSight and WorldServer.

Only Parallel Split and Synchronization are not supported by GlobalSight. This is due

to GlobalSight's inability to support concurrent tasks, and does not negatively impact

its suitability for enterprise based localisation workflows. There are examples of

successful deployments in companies such as salesforce.com [15], Spartan

Consulting, and YYZ Translations [16].

WorldServer’s support for the Parallel Split and Synchronization patterns is

limited and offered via its parallel review and parallel subworkflow constructs. The

parallel review construct ties each Parallel Split to a Synchronization that is always

followed by an Exclusive Choice split. The construct thus limits the power of these

patterns, which could otherwise have been combined in other ways.

Although its functionality differs, the parallel subworkflow construct works like the

parallel review construct from the point of view of control flow.

In the context of crowdsourcing, it would also be necessary to support parallelism

followed by a free choice of joins, especially the Acyclic Synchronizing Merge.

WorldServer’s support for the Parallel Split is therefore considered

incomplete.

Basic Control       GlobalSight   WorldServer
Parallel Split      0             1*
Synchronization     0             1*
Exclusive Choice    1             1


Table 1.

Advanced Branching and Synchronizing Patterns

Table 2 shows support for advanced branching and synchronization patterns in

GlobalSight and WorldServer. Neither of the systems supports any of the advanced

branching patterns that emerged from the use cases. This illustrates that traditional

TMSs are probably not suited for the management of community translation efforts.

Supporting the Acyclic Synchronizing Merge appears to be a pre-requisite, given that,

in crowdsourcing, tasks may be delayed or not undertaken for long periods of time.

Advanced Branching             GlobalSight   WorldServer
Multichoice                    0             0
Acyclic Synchronizing Merge    0             0

Table 2.

Structural Patterns

Table 3 shows support for structural patterns in GlobalSight and WorldServer.

Although no use cases brought up either of the two looping patterns, one can argue that

support for them would be useful in crowdsourcing scenarios and both TMSs support

these constructs. Where in traditional localisation workflows it is common to find a

translate-review loop, crowdsourcing processes replace this with multiple instances

that include the corrections that would usually emerge from the reviews. This system

prevents collaborators from finding out why their translation was not approved and

thus hinders their learning experience. Although support for the Structured Loop

pattern is a requirement for the traditional translate-review loop, neither of the TMSs has

functionality to count the number of iterations carried out, which would be necessary in

community workflows. This issue is implicitly acknowledged by GlobalSight as none

of its preconfigured workflows uses any kind of loop. Also in the case of the

Arbitrary Cycle pattern, the issue of the lack of a counter means that the use of this

pattern could result in an infinite loop.

Structural Patterns     GlobalSight   WorldServer
Arbitrary Cycles        1             1
Structured Loop         1             1
Recursion               0             1
Implicit Termination    0             0
Explicit Termination    1             1

Table 3.

The use case analysis did not demonstrate a need for the Recursion pattern, but it

could be useful for crowdsourcing. For example, it would be useful to let testing tasks call

themselves if another bug emerged during a bug test. Although it is not apparent,

WorldServer supports Recursion through the subworkflow and parallel subworkflows


constructs. Both systems support the explicit termination pattern that appeared only in

the Asia Online use case, but neither implements the implicit termination pattern that

appears in several of the other use cases.

Multiple Instance Patterns

Table 4 shows support for multiple instance patterns in GlobalSight and WorldServer,

again emphasizing the fact that GlobalSight and WorldServer are traditional TMS

systems developed to support LSP project management practices. Only WorldServer

supports one of the multiple instance patterns - Multiple Instances with a priori

Design-Time Knowledge pattern. This makes perfect sense with respect to the use

cases of traditional localisation where resource utilization is maximized by having a

one-to-one mapping only, for example, between translators and files.

Multiple Instances                                        GlobalSight   WorldServer
Multiple Instances without Synchronization                0             0
Multiple Instances with a priori Design-Time Knowledge    0             1
Multiple Instances with a priori Run-Time Knowledge       0             0
Multiple Instances without a priori Run-Time Knowledge    0             0

Table 4.

The Multiple Instances with a priori Run-Time Knowledge, like the Multiple

Instances with a priori Design-Time Knowledge pattern, implies a need for

synchronization later on. If these patterns are used in crowdsourcing, they may cause

a stall of the progression of the workflow as the more difficult tasks may not be

tackled by any member. However, applying them implies guaranteeing that tasks

involved in the pattern are completed before moving on to the next step. This feature

is potentially useful and is the reason why the pattern has been added to the list of

required patterns.

Supporting Multiple Instances without Synchronization allows a number of

activities to start and be carried out independently without blocking the progress of

other activities at any point.

Cancellation Patterns

Both systems support cancelling a case, but only in reaction to a trigger given by the

project administrators.

Cancellation Patterns    GlobalSight   WorldServer
Cancel Case              1             1
Cancel Region            0             0

Table 5.


Trigger Patterns

Both systems support triggers from manual cancellation signals, but this does not

constitute proper support of the pattern.

Trigger Patterns     GlobalSight   WorldServer
Transient Trigger    1             1

Table 6.

4 Conclusion and discussion

This comparative study identifies a list of seventeen patterns with thirteen

emerging from the use cases and the remaining four being added for completeness of

the specification. GlobalSight has partial/full support for six of these patterns, of

which four appear in the use cases. Likewise, WorldServer has partial/full support for

ten, seven of which were recovered from use cases. While this coverage is

incomplete, Van Der Aalst states that none of the general purpose workflow systems

offer support for all the patterns in the catalogue [17]. Furthermore, TMSs being

specialized tools are invariably developed using a subset of patterns in this catalogue.

Both systems can be extended by means of their Application Programming Interfaces

(API) potentially enabling support for missing patterns. The mapping study

demonstrates that TMSs designed to support traditional enterprise-oriented

localisation workflows do not map cleanly to crowdsourced localisation scenarios

because of the gaps identified.
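
The counts quoted above can be re-derived from Tables 1 to 6 with a small Python tally. The support flags are copied from the tables, with the starred "1*" entries (limited support) recorded as 1. The set of added patterns lists the four that the text describes as not arising from a use case; treating every other pattern as use-case derived is an assumption of this sketch.

```python
# (GlobalSight, WorldServer) support flags copied from Tables 1 to 6.
# A starred "1*" (limited support) is recorded here as 1.
SUPPORT = {
    "Parallel Split": (0, 1),
    "Synchronization": (0, 1),
    "Exclusive Choice": (1, 1),
    "Multichoice": (0, 0),
    "Acyclic Synchronizing Merge": (0, 0),
    "Arbitrary Cycles": (1, 1),
    "Structured Loop": (1, 1),
    "Recursion": (0, 1),
    "Implicit Termination": (0, 0),
    "Explicit Termination": (1, 1),
    "MI without Synchronization": (0, 0),
    "MI with a priori Design-Time Knowledge": (0, 1),
    "MI with a priori Run-Time Knowledge": (0, 0),
    "MI without a priori Run-Time Knowledge": (0, 0),
    "Cancel Case": (1, 1),
    "Cancel Region": (0, 0),
    "Transient Trigger": (1, 1),
}

# Patterns the text describes as added rather than found in a use case.
ADDED = {
    "Arbitrary Cycles",
    "Structured Loop",
    "Recursion",
    "MI with a priori Run-Time Knowledge",
}


def tally(system_index):
    """Return (patterns supported, of which use-case derived)."""
    supported = {p for p, flags in SUPPORT.items() if flags[system_index]}
    return len(supported), len(supported - ADDED)


print(len(SUPPORT), len(SUPPORT) - len(ADDED))  # 17 13
print(tally(0))  # GlobalSight: (6, 4)
print(tally(1))  # WorldServer: (10, 7)
```

The tally reproduces the figures in the text: seventeen patterns with thirteen from the use cases, six (four) for GlobalSight and ten (seven) for WorldServer.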

A limitation of this comparative study is the number of systems evaluated.

Enterprise tools such as Lingotek support a workflow that uses crowdsource-like

translation, and MemoQ with its online document management module allows a type

of interaction that fits with the crowdsourcing approach to localisation [18].

Furthermore, GlobalSight has been extended with a module called CrowdSight that

intends to make it suitable to support crowdsourcing.

Besides this limitation, the community tools discussed focus on the translation

task. The crowdsourcing model, with processes unmarred by deadlines, executed by

many actors, and with tasks that can be left incomplete, will probably generate the

same set of patterns if applied to other processes such as terminology and QA.

However, at the time of this writing, no tools or data were available to back up this

claim.

The next phase of this research will expand the number of subjects and include the

platforms mentioned. However, initial modelling of the patterns required to support

the use cases of the community tools reveals that a crowdsourcing workflow system

would have to implement several patterns for parallel tasks, multiple instance tasks,

and advanced merging patterns that allow the progression of the workflow without the

tasks being complete.


Acknowledgement. This research is supported by the Science Foundation Ireland

(Grant 07/CE/I1142) as part of the Centre for Next Generation Localisation

(www.cngl.ie) at University of Limerick.

References

[1] Lommel, A. (2003). The Localization Industry Primer. 2nd ed, SMP Marketing and LISA.

Available at: http://www.cit.griffith.edu.au/~davidt/cit3611/LISAprimer.pdf

[2] Schaeler, R. (2008). Communication as a Key to Global Business. Connecting People

with Technology: Issues in Professional Communication. G. F. Hayhoe and H. M. Grady

(Eds.), Baywood Publishing Company.

[3] Van Genabith, J. (2009). Next Generation Localisation. Localisation Focus 8(1):4-10.

[4] Sargent, B. and D. DePalma (2008). Translation Management Systems: Assessment of

Commercial and LSP specific TMS Offerings. Common Sense Advisory.

[5] Rinsche, A. and N. Portera-Zanotti (2009). Study on the size of the language industry in

the EU. European Commission.

[6] Ray, R. and N. Kelly (2011). Crowdsourced Translation Best Practices for

Implementation. Common Sense Advisory.

[7] Losse, K. (2008). Facebook - Achieving Quality in a Crowd-sourced Translation

Environment. LRC XIII Localisation4All Conference, Ireland.

[8] Rickard, J. (2009). Translation in the Community. LRC XIV Localisation in The Cloud

Conference, Limerick, Ireland, September 2009.

[9] Mackenzie, A. (2006). Internationalization: software, universality and otherness.

Internationalization In Java.

[10] Dalvit, L., A. Terzoli, et al. (2008). Opensource software and localisation in indigenous

South African languages with Pootle. SATNAC'2008.

[11] Cockburn, A. (2001). Writing effective use cases. Addison-Wesley.

[12] Vashee, K. (2009). MT Technology in the Cloud - An evolving model. LRC XIV,

Localisation in The Cloud Conference, Limerick, Ireland, 2009.

[13] Russell, N., A. H. M. ter Hofstede, et al. (2007). newYAWL: achieving comprehensive

patterns support in workflow for the control-flow, data and resource perspectives. BPM

Center, Report BPM-07-05, BPMcenter.org.

[14] Russell, N., A. H. M. Ter Hofstede, et al. (2006). Workflow Control-Flow Patterns: A

Revised View. BPM Center Technical Report BPM-06-22.

[15] Wunderlich, M. (2011). Our Globalsight migration - lessons learnt. Accessed at:

http://www.martinwunderlich.com/?p=48 June 2011.

[16] Ghaznawi, S. (2010). GlobalSight and LSPs. ELIA Networking Days, Istanbul.

[17] Van Der Aalst, W. M. P., A. H. M. Ter Hofstede, et al. (2003). Workflow patterns.

Distributed and Parallel Databases 14(1):5-51.

[18] Sargent, B. (2007). Translation Management Systems and Subcategories. Multilingual

18(3): 83-86.
