

A rule triggering system for automatic text-to-Sign translation

Michael Filhol
LIMSI–CNRS
B.P. 133, 91403 Orsay cedex, France
[email protected]

Mohamed Nassime Hadjadj
LIMSI–CNRS, France
[email protected]

Benoît Testu
LIMSI–CNRS, France
[email protected]

ABSTRACT
The topic of this paper is machine translation from French text to French Sign Language (LSF). After arguing in favour of a rule-based method, it presents the architecture of an MT system, built on two distinct efforts: formalising LSF production rules and triggering LSF rules by text processing. The former is made without any concern for text or translation and involves corpus analysis to link LSF form features to linguistic functions. It produces a set of production rules which we propose can constitute a full production grammar. The latter is an information extraction task from text, broken down into as many subtasks as there are rules in the grammar. After discussing this architecture, comparing it to the traditional methods and presenting the methodology for each task, we present the set of production rules found to govern event precedence and duration in LSF, and give a progress report on the implementation of the rule triggering system. With this proposal, we hope also to show how MT can benefit today from Sign Language processing.

Categories and Subject Descriptors
I.2.7 [Artificial intelligence]: Natural language processing—Language models, Machine translation

General Terms
Translation

Keywords
Automatic translation, Sign Language

1. INTRODUCTION
The Sign Language (SL) community interested in computer applications has spent significant time in the past decade on language modelling and animation software implementation. Applications like machine translation (MT)

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
SLTAT 2013, Chicago, IL, USA
Copyright 20XX ACM X-XXXXX-XX-X/XX/XX ...$10.00.

systems involving SLs would increase the accessibility of information to the deaf public.

The problem when addressing MT often comes from the less-resourced status of SLs. We explain this in the next section when presenting the different available approaches to the problem, and propose a few choices to tackle the issue. We propose to base our MT system on a grammar of SL production rules, following a methodology and corpus presented in section 3. Then, we present a new type of transfer-based MT architecture, based on rule triggering from selected text processing, which we take the time to discuss in section 4. Before concluding, we report on the progress we have made in both the production rule formalising and text processing modules.

2. MACHINE TRANSLATION
There are two methods used for MT in general: rule-based approaches and data-driven approaches, sometimes hybridised. Presently, the latter are clearly dominant when processing written or spoken languages.

2.1 Traditional and present methods
A rule-based approach tries to model linguistic knowledge so as to formalise rules that process data from the input source through more abstract representations and over to the target language. As Vauquois illustrates (fig. 1), the more abstract the intermediate representation levels, the further we work from word-to-word translation and the easier the inter-language transfer step, hence probably the better the output [2]. Ideally, an “interlingua” could serve as a language-independent representation half-way through the process (top corner in the picture), a goal argued to be unreachable.

Of course, most of the literature considers text-to-text processes, and very little exists on sign languages. For SL, ViSiCAST is the most significant rule-based achievement [8]. It presents a forward pipeline from English to a chosen SL, with Discourse Representation Theory as the intermediate semantic representation. As far as we know, no translation system was actually demonstrated, but the full process was described, including an HPSG grammar to support a syntactic level [10]. Other efforts like ZARDOZ [17] and TEAM [20] have used rule-based approaches, but data-driven methods seem to have been favoured over purely rule-based ones since.

Data-driven methods are either example-based or statistical. Both rely on word-aligned bitexts, i.e. parallel texts whose words in one language are identified in the other’s


Figure 1: The Vauquois triangle

word sequence.

Example-based methods employ an analogy principle, comparing the input sentence against the bilingual corpus to look for candidate translations. A translation generally involves three steps:

1. finding the closest matches for the input on the source side of the parallel corpus;

2. retrieving the aligned segments on the target side;

3. recombining the target language segments to build the output translation.
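The three steps above can be sketched in a few lines of Python. This is purely our illustration, with a hypothetical toy corpus and a naive token-overlap similarity, not a real example-based MT system (step 3 is trivialised to reusing the best whole segment):

```python
# Toy example-based MT sketch (illustrative only, not a real system).

def similarity(a, b):
    """Jaccard token-overlap score between two sentences."""
    sa, sb = set(a.split()), set(b.split())
    return len(sa & sb) / len(sa | sb)

def translate(sentence, bitext):
    # 1. find the closest match on the source side of the parallel corpus
    ranked = sorted(bitext, key=lambda pair: similarity(sentence, pair[0]),
                    reverse=True)
    best_source, best_target = ranked[0]
    # 2. retrieve the aligned segment on the target side
    # 3. recombine target segments (here trivially: reuse the whole segment)
    return best_target

bitext = [
    ("the meeting starts tomorrow", "la réunion commence demain"),
    ("the train leaves tonight", "le train part ce soir"),
]
print(translate("the meeting starts tonight", bitext))
```

A real system would of course match sub-sentential fragments and recombine several retrieved target segments in step 3.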

The basic idea of the statistical methods is to maximise p(T|S), the probability that sentence T is a target translation of source sentence S. By Bayes’ rule, this is equivalent to maximising the product of p(T), the probability that T occurs in the target language (a.k.a. the language model), and p(S|T), the probability that S is a translation of T (a.k.a. the translation model). Both language and translation models are built by training machine learning algorithms on large sets of bitexts. To work properly, the statistical approach therefore necessitates:

(c1) that source and target productions be viewed as sequences of units arranged in segmented sentences;

(c2) large pre-aligned parallel corpora to train the system enough.
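In symbols, this is the standard noisy-channel formulation (note that the language model scores the target side):

\[
\hat{T} \;=\; \operatorname*{arg\,max}_{T}\, p(T \mid S)
\;=\; \operatorname*{arg\,max}_{T}\, \frac{p(T)\, p(S \mid T)}{p(S)}
\;=\; \operatorname*{arg\,max}_{T}\, p(T)\, p(S \mid T),
\]

since \(p(S)\) is constant across candidate translations \(T\).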

Here again, there is little work to be found on SL data-driven approaches, compared to the large literature on text, but the newer attempts fall mostly into this category [19, 11, 14, 4, 16]. Hybrid approaches have been proposed, mostly using rules where statistical learning appeared impossible [18].

2.2 Choices for SL
We have long argued that SLs should be studied as fundamentally oral and multi-articulator languages, whose simultaneous movements and gestures need to be synchronised in a multi-linear fashion, and this remains a strong requirement in this work. Quite to the contrary, almost all known past and current models impose a string of lexical signs in sequence. Whether or not they add syntactic constraints onto the sequence, the sign (gloss) is the basic unit for all productions.

None views productions as fully multi-linear. Also, hands are generally the primary—if not the only—articulators of a Bebian/Stokoe parametric combination, which to us discards too much of SL’s productive expressiveness. The only multi-linear formalisms are P/C [9], a model that allows partitioning the signing activity into simultaneous tracks, and AZee [5], which not only allows partitioning but also flexible generalisation of partition patterns. We will be using the latter formalism in this work.

Besides, SL corpora are scarce, let alone sufficient parallel data for statistical training. Had we enough video or mocap data, the question would remain of what to learn from them. A stream of pixels or body movements would need to be linearised, segmented and then aligned before being usable here. Regardless of the problem of linearisation, this is a tedious task, difficult to afford. We therefore believe that too many barriers stand in the way of satisfying conditions (c1) and (c2) of §2.1, which is why we prefer to take the path of a rule-based approach.

Modelling linguistic rules is not trivial, and even less so as our choice is to account for LSF as a language of its own, with as little bias as possible. To do so, we choose to look for patterns linking:

• signed forms: the visible features produced, i.e. the states, movements and synchronisation of the language’s articulators, e.g. eyes closed, lower jaw drop, head nod;

• intentional linguistic functions, i.e. the interpreted purpose of the production (meaning), whether rhetoric, semantic or lexical, e.g. depicting a path or stating the place of an event.

Our working proposition is that:

1. a sufficient set of functions can be found to build a full SL production grammar, able to generate any LSF utterance—though at this early stage, it is impossible to really determine how many are needed;

2. each function can potentially be triggered from a detection process on the text, which we explain in a later section as well.

We explain these two steps in the next two sections.

3. SL RULE CREATION
Overall, modelling SL regardless of text is the first step for us to take. It is the object of this section.

3.1 Methodology
Given a corpus, the starting point of our methodology is the search for links between SL observable body and face articulations (forms) and their linguistic interpretations (functions). Beginning with either a form or a function feature, we search through a corpus looking for invariants in the respective counterpart functions or forms. When a group of similar features is found, we switch form and function around and look for occurrences of those features. This is repeated back and forth to refine the features until an invariant in form can be found for (almost) every occurrence of an identified function. Describing that form and parametrising it with contextual arguments yields what we call a “production rule”, which can be animated by SL synthesis software with a virtual signer on the output end.


Figure 2: The 2-view video corpus of news items

Conversely, a definite function invariably detected for every occurrence of a certain form criterion yields an “interpretation rule”, i.e. one to be triggered in SL recognition tasks. But this is not relevant to our task, in which LSF is the target language, not the source. Note that no one-to-one mapping is required between form and function criteria.

This is typically a task where linguistics and computer processing overlap. In fact, this effort may well be fully regarded as Sign linguistics.

3.2 Corpus
The corpus we used for most of the work reported here is one we created in late 2012 with professional translators, with the aim of building the first French–LSF parallel corpus, in collaboration with WebSourd®1. We have sometimes used other SL resources too, in particular the narrative sections (vs. dialogue turn taking) of the DictaSign corpus2.

The parallel corpus consists of a set of 40 paragraph-long texts, each linked to 3 different LSF translations, totalling 120 videos (face and side views) and one hour of journalistic signing. The texts were real-life news, “secretly” collected from the 2006 archives, chosen to balance around 20 semantic criteria. While we hope to publish the full creation protocol in the near future, what matters here is that we elicited substantial SL material on various chronological precedence relations between open-domain events. It is filled with various (French) expressions meaning “A happened before B”, “C happened long/shortly/10 mins after D”, “E has lasted since F”, with lettered elements standing for relative or absolute dates, events or states, with or without a specified date.

The fact that it is a parallel corpus is not relevant to this particular part of our work, as the source texts are not used. But it is important to note that the videos are not captures of on-the-fly interpretation. They are the result of a prepared translation process by native (deaf) signers. A point was actually made in the elaboration process to remain as close as possible to their normal daily experience, that is, discovering the texts in the morning, using a 2-hour preparation time before facing the camera, and being validated by a colleague. For this paper, translation can be considered as a way of eliciting selected aspects of language with non-constructed examples.

3.3 Annotation
To be studied, forms were systematically annotated following the scheme summarised below, without looking at the text. For each observed articulator, we list the categories

1 www.websourd.org
2 EU-FP7 project, ended Feb. 2012, www.dictasign.eu; corpus at www.sign-lang.uni-hamburg.de/dicta-sign/portal.

used for annotation. Some categories are missing, e.g. lateral chin movements or eyelids opened wide, but we preferred leaving the exceptional occurrences blank and coming back to add categories afterwards if needed, so as to accumulate more video time and articulator annotation. For all articulators, a “standard” category is available, meaning the articulator is in the same state as before signing began.

• Eyelids: semi-closed, near-closed, closed.

• Eyebrows: mod-raised, mod-lowered (“mod” = not fully).

• Head nod, rotation and tilt (yes/no/maybe axes): nod-up, nod-down, rot-left, rot-right.

• Head displacement (absolute head orientation unchanged): fwd, back, left, right.

• Eye gaze direction: s-sp (signing space), left, right.

• Hand strokes.

Identifying a function systematically signed with a definite form (yet potentially parametrised with arguments) establishes a production rule. Function-to-form rules are established regardless of the linguistic levels their functions would intuitively pertain to, and together form an SL generation grammar. Our findings on event durations and precedence are given in §5.1.
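As an illustration of the annotation scheme, the articulator categories could be encoded as follows. This is a hypothetical encoding of our own, not the authors' annotation tool; the category names are taken from the list above:

```python
# Hypothetical encoding of the annotation categories listed above.
# Each articulator has a closed category set; a "standard" value is
# additionally available for every articulator.

CATEGORIES = {
    "eyelids": {"semi-closed", "near-closed", "closed"},
    "eyebrows": {"mod-raised", "mod-lowered"},
    "head-move": {"nod-up", "nod-down", "rot-left", "rot-right"},
    "head-displacement": {"fwd", "back", "left", "right"},
    "eye-gaze": {"s-sp", "left", "right"},
}

def check(articulator, value):
    """Accept 'standard' for any articulator, else require a listed category."""
    return value == "standard" or value in CATEGORIES.get(articulator, set())

# A timed annotation could then pair (start, end) intervals with such labels:
annotation = [(0.0, 0.4, "eyelids", "closed"),
              (0.2, 0.8, "head-move", "nod-down")]
assert all(check(a, v) for _, _, a, v in annotation)
```

Leaving exceptional occurrences blank, as described above, amounts to simply omitting a label rather than extending the category sets.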

4. AN MT ARCHITECTURE FOR SL
Rule entries represent a function interpreted on the receiver’s side, so it is no surprise that they are usually loaded semantically. Without being exactly equivalent to semantic graphs or first-order logic predicates, they do offer quite a high level of abstraction in representing the discourse meaning and rhetorics. Having chosen to go for rule-based MT into LSF, this property makes us propose that they serve directly as the pivot of our system.

4.1 Rule triggering
That said, the problem of translation becomes one of detecting the rule header functions in the input text, thereby making the corresponding rules good candidates for use in a correct translation. In other words, it amounts to looking for the forms in French that bear the function specified by the rule in SL. The production of the SL output forms then becomes a trivial (non-linguistic) task, as it is precisely what the invoked rule describes, and what made the function available in the first place. Of course, a wrapper function and some combination (nesting) of the triggered rules will still be missing; we come back to this further down.

This breaks our full problem down into as many text processing problems as there are rules available, where each rule becomes an NLP information extraction problem in its traditional sense. To implement this framework, an NLP detection module can be programmed for every production rule header (named function), though some processing may serve several tasks. Each module may be more or less complex, and call for any number of helper tools such as syntactic parsing, complex anaphora resolution or simple lexical triggers. Of course, factorisation of common processing tasks is eventually welcome. For example, morpho-syntactic types (POS tags) are obviously useful for many purposes, so a POS-tagging tool can be run in an earlier stage, which we call the “preprocessing stage” and discuss later on.
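A minimal sketch of this one-detector-per-rule idea, with a factored preprocessing stage, might look as follows. The detector names, lexical cues and interface are our own illustration (a trivial tokeniser stands in for real POS tagging), not the actual system:

```python
# One detection module per production rule, sharing a preprocessing stage.
# Names and trigger patterns are illustrative, not the real system's.

def preprocess(text):
    """Shared preprocessing stage, run once for all detectors."""
    return {"tokens": text.lower().split()}

def detect_precedence(pre):
    """Trigger candidate for a precedence rule, on a simple lexical cue."""
    return "before" in pre["tokens"] or "after" in pre["tokens"]

def detect_duration(pre):
    """Trigger candidate for a duration rule, on a simple lexical cue."""
    return any(t in {"days", "weeks", "months"} for t in pre["tokens"])

DETECTORS = {"precedence": detect_precedence, "duration": detect_duration}

def triggered_rules(text):
    pre = preprocess(text)  # factored out, as in the preprocessing stage
    return sorted(name for name, d in DETECTORS.items() if d(pre))

print(triggered_rules("He left three days before the election"))
```

In the real system, each detector would of course be far richer (parsing, anaphora resolution, etc.), but the shape of the framework stays the same: independent detectors consuming shared preprocessing output.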


Figure 3: The AZee translation diagram

4.2 Discussion
Definitely rule-based, our framework is not trivially comparable to the pipeline Vauquois describes. Taking a step back from the proposed scheme, we discuss this in the present section, referring to the diagram in figure 3.

The most abstract elements in our proposal are the functional entries described by the AZee rules combined in the production, which makes AZee the top-most corner of a Vauquois-type illustration. However, they are built exclusively from SL corpus analysis, regardless of text or indeed of any other language or translation purpose. Once the AZee representation is reached, the composition is already “fully Sign”, so we place it on the right-hand side of the diagram, the SL side.

On the LSF side, every rule can equally be relevant to:

• the lexical level, if it specifies the stabilised vocabulary;

• syntax, when ordering units as necessary;

• SL grammatical structure in a wider (multi-linear) sense, when sequence alone is not sufficient and finer synchronisation rules for overlapping gestural parts are needed;

• discourse structure, if it defines an outline sequence, emphasises header topics, adds the right pauses between the sections...

In any of these cases, from any rule entry, the AZee animation system will synthesise the sign form it describes, usually falling through recursive rules, all the way down to the articulator movements (cf. phonetics). This explains why no stages are represented in the synthesis process on the right-hand side of the diagram. It is to be considered as a whole, with high-level entries chosen as input, and low-level articulation/animation as output. The system requires neither prior identification nor systematic use of, for instance, a syntactic or a lexical layer.

On the text side, linguistic layers can be identified, in particular by the tools analysing the source, but they do not have to be taken in the same order or up to the same level for every rule. Also, rules may need to process data on more than one identified level. This is represented on the diagram by the set of arrows pointing rightwards across the vertical language separation line. Theoretically, the starting point of these arrows could be located anywhere on the vertical axis, even higher than the AZee target level itself, but

no reliable text processing tool really allows that yet. So, we end up with a set of up-and-rightward paths from many (rather lower) layers of French to high-level SL-specific rules. In this sense, we do compare our approach to a transfer approach, and more specifically to a “multi-level ascending transfer” [2]. There is no language-independent representation or interlingua; we are implementing a multi-rule and multi-layer transfer.

With a traditional rule-based approach, more and more abstract representations would be produced as the text is processed, to be rephrased in SL in a second step. On the contrary, our framework searches the text for patterns, clues, forms, etc. whose functions it already knows the SL production rules for. In other words, instead of a forward pipeline approach taking steps towards the right, upwards first and then downwards, our proposal is built backwards from existing candidate SL rules to text processing tasks.

Only the preprocessing stage, introduced in §4.1, runs forward from the text before knowing what rules are to be triggered. However, we have only introduced it for cases where the preprocessing is needed by several triggers. In a sense, it can be regarded as mere code factoring, to avoid running duplicate code when triggering rules. No preprocessing is performed if not to serve a rule trigger. Thus the whole architecture, from the earliest to the latest stage—the SL synthesis part by design—is driven by the AZee rule set, which is exclusively an SL model.

The two-fold benefit is:

• no distinctions or categories present in the source text are carried “rightwards” if they are not relevant on the SL side;

• no ambiguity can be transferred over if the SL grammar makes a distinction between two (or more) cases.

This is an original target-driven approach to rule-based machine translation, which to us perfectly recalls that professional translators unanimously prefer translating into their native language. One performs better translation when taking foreign input and producing output with a target-language mindset. We compare this to our picture here: the system runs with AZee rules as a starting point, and looks through the input text for forms it is prepared to produce in a “perfectly Sign” way.

However, once the rules are triggered separately, they still have to be arranged in a single representation in a last stage. Comparably to example-based MT, where retrieved segments must be combined to build a final translation proposal, rules triggered here may not be fully nested; different parts may be isolated and will still have to be combined. We leave this problem aside for now, and come back to it in the prospects.

5. OUR PROGRESS
Over the last few months, substantial progress was made in the fields of corpus analysis methodology, SL rule building and SL animation platform engineering. Besides, the text information extraction task has made a good start, and we present our overall progress in this section, leaving aside LIMSI’s synthesis platform Kazoo, whose development is to be published in these proceedings ??.

5.1 SL rule functions


Figure 4: Still shots of signs (a) and (b)

In this section, we give a summary of the function/form links we have explored in our corpus, and conclude with the identified production rules.

Around 70 occurrences of chronological precedence and duration of events were found in our corpus and were subject to analysis. Following our methodology, we alternated between searching for a function and describing the forms of its occurrences, and vice versa. A few relevant forms in our searches were:

• forward movement of a finger-spread hand, starting above the shoulder, with a non-simultaneous trill of all fingers and eye gaze directed to the hand (see still in fig. 4a);

• precedence of the event chronologically occurring first;

• use of the sign inconsistently glossed “your/his turn”, “immediately after” or “consequence” (fig. 4b)...

Some relevant functions were:

• length of a time-specified duration;

• whether an event occurs during the specified period,before or after;

• if a date is absolute or relative...

Analysis produced two major findings. One confirms a result already present in the literature [3], namely the systematic preference for chronological order when expressing time relations between events and dates. The second, much more surprising as it is not semantically intuitive, reveals a clear categorical distinction between time periods lasting no more than 10 days, and those lasting longer.

In the videos, whether translated from French sentences using avant (before), après (after) or other constructions introducing event, date or period precedence, all are signed in an order that is congruent with their chronological order. The only exceptions are interpreted as interpolated constructions, i.e. pieces that do not affect the primary meaning of the production if taken out. We set these occurrences aside for future work on interpolated clauses specifically. Pauses are sometimes noted between two consecutive parts on the signed “time line”, but we could not find any predictive rule, other than allowing them without enforcing them.

For all expressions of durations, whether an event is taking place before, during and/or after the duration, we found a clear separation line in the signed form, based on the duration length itself, drawn at about 10 days of absolute time. All durations exceeding this time lapse are expressed using the form shown in figure 4a, whether it is a period of separation between events or a period during which an event is taking place. All durations shorter than 10 days are expressed differently, and the surrounding construction differs depending on whether a signed event is said to happen in the given period or not. For these shorter periods, durations of events are expressed with a lexical sign meaning “duration” (several available), while durations between events make use of what is shown in figure 4b.

Less surprisingly, we have also observed that when events were given a date or a time reference, the date was always signed immediately before the event. This holds whether the date was absolute or relative, and in either chronological rank in an event sequence. The only comment we have about dating is that we have detected no use of the previously documented [3] transverse time axis for absolute dates (the horizontal axis normal to the sagittal plane). All chronological arrangements were performed along the sagittal axis. Of course, the corpus containing only real news events, all texts had a link to the time of speech. Therefore, we do not conclude that no absolute dating takes place on the transverse axis, rather that dates, even though absolute, are signed along the sagittal axis if in the same sequence as a relative date. For similar reasons, we exclude time sequences referencing the future.

From these observations, we have derived the six rules below, whose form specifications are illustrated in fig. 5. In the figure, the boxes bound the time intervals during which the contained form description occurs. The arrangement on the vertical axis is arbitrary; the diagrams are not to be read as annotation tiers where one would find one articulator per tier. Italics in the boxes are the rule arguments, and the ar:cat indications are form specifications using category cat for articulator ar (hd = head, el = eyelids, eg = eye gaze, sh = strong hand), cf. §3.3.

(r1) Separation of two events or dates by a period under 10 days (arguments: the event happening first chronologically pre, the second event post, the optional duration dur separating the events)

(r2) Chronological sequence (arguments: list of chronologically ordered events, dates or durations)

(r3) Period of at least 10 days (argument: the duration dur of the period)

(r4) An event lasts for at least 10 days’ time (arguments: the event and the duration dur)

(r5) An event lasts less than 10 days (arguments: the event and the duration dur)

(r6) Dated/time-stamped event (arguments: event and date)

Here are a few examples of LSF constructions using these rules, with possible English equivalents for each:

(e1) r2(Ev1, r3(dur: “3 months”), Ev2): “Ev2 three months after Ev1”;

(e2) r1(pre: Ev, post: r6(event: “election”, date: “today”), dur: “1 week”): “Ev one week before today’s election”;

(e3) r1(pre: “left France”, post: r5(event: “hostages”, dur: “3 days”)): “they were held as hostages for 3 days just after they left France”.
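The nested structure of such rule applications can be mimicked with plain data structures. The sketch below is a hypothetical encoding of our own, purely to show the nesting; the real system manipulates AZee expressions, not these tuples:

```python
# Hypothetical encoding of nested rule applications (e1)-(e2) as plain
# tuples, to illustrate the nesting; not the actual AZee representation.

def r1(pre, post, dur=None):   # separation under 10 days
    return ("r1", {"pre": pre, "post": post, "dur": dur})

def r2(*items):                # chronological sequence
    return ("r2", list(items))

def r3(dur):                   # period of at least 10 days
    return ("r3", {"dur": dur})

def r6(event, date):           # dated/time-stamped event
    return ("r6", {"event": event, "date": date})

# (e1): "Ev2 three months after Ev1"
e1 = r2("Ev1", r3(dur="3 months"), "Ev2")
# (e2): "Ev one week before today's election"
e2 = r1(pre="Ev", post=r6(event="election", date="today"), dur="1 week")

print(e1)
print(e2)
```

Note how (e1) nests a duration rule (r3) inside a sequence (r2), while (e2) nests a dated event (r6) as an argument of the short-separation rule (r1), exactly as written above.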


Figure 5: Form production for rules (r1–6)

Our first thought on the 10-day distinction was that it pertained less to the absolute duration than to some form of judgement on the duration. For example, one could deem any duration to be of one category (“rather long”) or the other. But our study disconfirmed this hypothesis. Similarly, it is surprising that event precedence has a rule (r1) with two argument events in the case of short durations, without offering a counterpart for the other case. But corpus observation has revealed no further relevance in sequences of two events separated by a long period than in any other combination of long periods and events. Only the time sequence (r2) has surfaced as a useful production rule. This shows how important it is to write SL rules with no preconception of a syntactic or semantic structure, as in all likelihood no semantic representation graph would have made such a distinction.

Of course, we have not fully described every possible form feature, in particular those related to the dynamics of the changes, and we want to mention that in this matter the human limit of SL video observation is eventually hit. All measurements here were made by hand, by means of frame-by-frame viewing and video annotation, and are thus rather qualitative, which is sometimes not sufficient even to discriminate between form categories. Indeed, very little on dynamics can be properly described with such tools alone, whereas motion capture recordings could achieve great precision in measuring movement velocities and accelerations, as well as lengths of pauses and holds, all of which may well participate in a rule’s form specification. We expect a lot of additional information from technologies like mocap in the future.

Figure 6: The text processing software architecture

Encoding these synchronising rules in AZee is useful for automatic generation, but only represents half of the translation process, which the next section goes on to address.

5.2 Text processing

This is the part of the system we have started most recently. Again, before producing anything from the text, the point was to have a linguistically informed SL animation system with no concern for (hence no bias from) written-language lexicon, clause order or linear structure. This section presents what we have done on the text processing, i.e. rule triggering, side of the system.

The underlying architecture consists of a list of rule triggering processes, one for every available target SL rule, each potentially making use of some pre-processing output, e.g. general-purpose POS tags or more dedicated information. To unify this, and to make it easy to add new parsing and rule triggering abilities over time, we implemented a generic plugin system allowing for modularity on both aspects, as illustrated in fig. 6.

When executed, the program first runs all available pre-processing (stage 1) modules. They save useful output in a shared and easily explorable data space (in the middle in the figure), either saving text files or extending the pre-existing multi-tag&tree structure. Initially, this structure contains the list of the lexical units of the input text, which tag-oriented modules can label with attributes and values and which tree-building modules can use as leaf nodes. Any sort of additional data file can be stored as well, for the purpose of specific rules.

Stage 2 comes next, where all available rule triggering modules are run. Each stage-2 module searches the shared data space (or even, though not explicitly shown in fig. 6, the text itself if helpful) for clues that its rule is to be considered for use in a translation of the input text. A rule can of course be triggered several times for a single text, or not at all, and can populate some of its arguments if it is able. Every trigger is associated with a reason for the trigger and a reliability mark.
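The two-stage plugin flow described above can be sketched in a few lines. Everything here is illustrative (the `Pipeline`, `Trigger` and module names are ours, not the system’s actual interfaces): stage-1 modules write into a shared data space, stage-2 modules read it and emit trigger records carrying a reason and a reliability mark.

```python
# Minimal sketch of the two-stage plugin architecture: this is an
# illustration of the idea, not the actual module API of the system.

from dataclasses import dataclass, field

@dataclass
class Trigger:
    rule: str           # name of the triggered SL production rule
    reason: str         # why the rule was triggered
    reliability: float  # reliability mark, e.g. 0.0 .. 1.0
    args: dict = field(default_factory=dict)

class Pipeline:
    def __init__(self):
        self.pre_modules = []          # stage 1: pre-processing
        self.trigger_modules = []      # stage 2: rule triggering
        self.shared = {}               # shared, explorable data space

    def register_pre(self, fn):
        self.pre_modules.append(fn)
        return fn

    def register_trigger(self, fn):
        self.trigger_modules.append(fn)
        return fn

    def run(self, text):
        self.shared = {"tokens": text.split()}
        for pre in self.pre_modules:           # stage 1
            pre(self.shared)
        triggers = []
        for trig in self.trigger_modules:      # stage 2
            triggers.extend(trig(self.shared, text))
        return triggers

pipe = Pipeline()

@pipe.register_pre
def toy_pos_tag(shared):
    # stand-in for an external tagger call writing into the data space
    shared["pos"] = ["NOUN" if t[0].isupper() else "X"
                     for t in shared["tokens"]]

@pipe.register_trigger
def toy_open_list(shared, text):
    # naive cue: a comma series closed by "et"/"ou"
    if "," in text and (" et " in text or " ou " in text):
        return [Trigger("open list", "comma series + final conjunction",
                        0.6, {})]
    return []

print(pipe.run("des pommes, des poires et des bananes"))
```

Registering a new module is a one-decorator operation, which is the modularity the plugin system is meant to provide.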

All this has recently been implemented and is bound to undergo further development, but the architectural diagram drawn in fig. 6 is ready, and modules can already be plugged in or out of the system for both stages. Today, the system is composed of 5 stage-1 modules, and of most triggers for the rules described in §5.1, plus one for the “open list” function. This rule was added first, because it was already available and well established. It is renamed from the earlier published “enumeration” rule [6], as strictly speaking enumerations may have different types, and we believe it only covers lists to which more items can be appended.

The five pre-processing modules are: “tree tagger”, “XIP”, “enum”, “Wmatch-timer” and “time seq graph”. The first one, “tree tagger”, performs an external call to the tool it is named after, which provides the POS tags for the sentence tokens [15]. XIP, the Xerox® Incremental Parser, is another external tool that builds syntactic trees for the input sentences [1]. As these tasks are very often needed, their place is in stage 1. The last three are explained later, where useful. We now describe the trigger modules implemented in the software, our intention being to cover the largest number of triggering rules rather than to compete with the state of the art in every NLP sub-task addressed.

The “open list” trigger mostly relies on typographical and lexical cues, and includes some syntactic analysis as well. Lists in French are often strings of comma-separated elements of the same syntactic type (noun, verb phrase...), followed by a last item introduced by the conjunction “et” or “ou”. Typography also helps to detect parenthesised or colon-initiated lists, and ellipses. Lexical checks are useful for including patterns like example (e4) below, and for excluding example (e5). Case (e6) can be ruled out with a semantic hyponym check.

(e4) W. Pickett, qui avait signé des titres comme “In the midnight hour” ou “Mustang Sally”, est décédé jeudi. – W. P., author of songs like ... or ..., died on Thursday.

(e5) Le gouvernement a annoncé ses trois priorités pour l’année : la recherche, l’éducation et la culture. – The government announced its three priorities for the year (to come): public research, education and culture.

(e6) Deux personnes, un Français et un Britannique, ont été arrêtés hier soir. – Two people, one French and one British, were arrested last night.

There is no enumeration pattern in (e4) per se, but the use of “like ... or ...” in French clearly has the function of an open list, which should trigger the same LSF production rule. Conversely, (e5) and (e6) have an enumeration pattern, but should not be included. The introductory clause in (e5) contains the lexical unit for “three”, giving away a fully exhaustive list after it, and while (e6) is theoretically ambiguous, the hyperonym status of the first item relative to all the others, combined with a lexical count as in (e5), favours a closed list over an open one.

Except for this last, (e6)-type criterion, everything here is implemented. In preparation for the other types of lists (e.g. lists of mutually exclusive options, revealed in the DictaSign project under the name “alternative constructions”), textual detection of enumerations was moved into a pre-processing module named “enum”, whose output is a file with all detected lists of comma-separated items, tagged with a few markers such as the conjunction used before the last enumerated item. This way, the upcoming trigger for the “option list” rule will share part of its processing with the “open list” one. Each trigger module simply filters out the enumerations it is not concerned with, and performs any additional search needed to find textual patterns that are not enumerations, as in (e4).
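The typographical and lexical cues above can be sketched as a small detector. This is a simplified illustration of the idea, not the “enum” module itself: the regular expression and the count-word list are our own reductions of the cues described (a comma series closed by “et”/“ou”, excluded when an introductory count word announces an exhaustive list, as in (e5)).

```python
# Rough sketch of French enumeration detection with the cues described
# in the text; patterns and word lists are illustrative only.

import re

CONJ = r"(?:et|ou)"
COUNTS = {"deux", "trois", "quatre", "cinq"}  # closed-list giveaways

# comma-separated items, last one introduced by "et" or "ou"
ENUM_RE = re.compile(r"(?:[^,:]+,\s*)+[^,]+?\s" + CONJ + r"\s[^,.]+")

def detect_enums(sentence):
    """Return (span_text, conjunction, looks_open) for each enumeration."""
    results = []
    for m in ENUM_RE.finditer(sentence):
        span = m.group(0)
        conj = re.search(r"\b" + CONJ + r"\b", span).group(0)
        # exclusion cue: a count word before the list suggests it is closed
        before = sentence[:m.start()].lower()
        looks_open = not any(c in before.split() for c in COUNTS)
        results.append((span.strip(), conj, looks_open))
    return results

print(detect_enums(
    "Le gouvernement a annoncé ses trois priorités : "
    "la recherche, l'éducation et la culture."))
```

On example (e5) the detector finds the enumeration but marks it as closed because of the count word “trois”; patterns like (e4), which are not enumerations at all, would need the separate lexical search mentioned above.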

Similarly, modules were written for most rules (r1–6) of section 5.1. Rules (r1–r5) need duration detection, and rules (r1) and (r2) are about sequences, which calls for two pre-processing modules: respectively “Wmatch-timer” and “time seq graph”, the latter still under development.

Wmatch is a semantic parser engine for text, developed at LIMSI [13]. It allows custom grammars, recognises sequences of lexical units or subtrees and builds partial semantic/syntactic trees over the input. We used this tool to detect occurrences of durations, as well as binary syntactic connectors like “trois jours avant” (three days before). This pre-processing step is contained in the module “Wmatch-timer”. Using its output and more syntactic analysis, another pre-processing module, “time seq graph”, detects sequences of times and events, and each item’s duration when possible. We leave out the details about the output form, all the more so as it is still incomplete. The idea is to help rules (r1) and (r2), both requiring sequence detection.
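As a simplified stand-in for what “Wmatch-timer” extracts with Wmatch grammars, duration detection and the 10-day threshold from §5.1 can be sketched with a regular expression. The number words, unit table and day conversions below are illustrative only, not the module’s actual grammar.

```python
# Toy duration detector over French text, applying the 10-day
# threshold that separates short-period rules (r1)/(r5) from
# long-period rules (r3)/(r4). Illustrative stand-in, not Wmatch.

import re

NUM = {"un": 1, "une": 1, "deux": 2, "trois": 3, "dix": 10}
UNIT_DAYS = {"jour": 1, "jours": 1, "semaine": 7, "semaines": 7,
             "mois": 30, "an": 365, "ans": 365}

DUR_RE = re.compile(
    r"\b(\d+|un|une|deux|trois|dix)\s+(jours?|semaines?|mois|ans?)\b")

def durations(text):
    """Yield (phrase, days, is_long) with is_long = at least 10 days."""
    for m in DUR_RE.finditer(text.lower()):
        n = NUM.get(m.group(1)) or int(m.group(1))
        days = n * UNIT_DAYS[m.group(2)]
        yield m.group(0), days, days >= 10

print(list(durations("trois jours avant, puis une attente de 3 mois")))
# e.g. [('trois jours', 3, False), ('3 mois', 90, True)]
```

The `is_long` flag is what a trigger module would use to decide between, say, wrapping a separating period in (r3) or keeping it as an (r1) argument.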

Here are the clues that each trigger module uses for the time being; of course, most of them can be extended to detect more forms in French supporting the same function:

(r1) Look in the output of “time seq graph” for two consecutive nodes separated by a short duration.

(r2) Build the sequence from the output of “time seq graph”, triggering (r4) to include long durations of separation and (r1) when two consecutive nodes are separated by a shorter duration.

(r3) Nothing triggers this rule alone; it is indirectly triggered by the (r2) and (r4) modules when needed.

(r4&5) These rules use the output of “Wmatch-timer”, additional syntactic clues and a duration check.

(r6) Not implemented yet. We intend to start with mostly syntactic clues, combined with an already available Wmatch grammar good at recognising date patterns.
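One plausible reading of the sequence clues above can be sketched as a walk over a time-sequence graph. This is our own illustration, not the modules’ code: nodes are (event, gap-in-days-to-next) pairs, short gaps yield (r1) applications over both neighbouring events, long separating periods are wrapped as (r3) nodes (as in example (e1)), and any multi-item result is wrapped in (r2).

```python
# Sketch of (r1)/(r2) triggering from a chronologically ordered node
# list; the node format and return shape are illustrative assumptions.

def trigger_sequence(nodes):
    """nodes: list of (event, gap_days_to_next or None).
    Returns nested (rule, payload) pairs for (r1)/(r2)/(r3)."""
    items = []
    i = 0
    while i < len(nodes):
        event, gap = nodes[i]
        if gap is not None and gap < 10 and i + 1 < len(nodes):
            # short separation: (r1) takes both events as arguments
            items.append(("r1", {"pre": event,
                                 "post": nodes[i + 1][0],
                                 "dur": gap}))
            i += 2
        else:
            items.append(("event", event))
            if gap is not None and gap >= 10:
                items.append(("r3", {"dur": gap}))  # long period node
            i += 1
    # a sequence of more than one item triggers (r2)
    return ("r2", items) if len(items) > 1 else items[0]

out = trigger_sequence([("left France", 2), ("hostages", None)])
print(out)   # a single (r1, ...) application, as in example (e3)
```

A long-gap input such as `[("Ev1", 90), ("Ev2", None)]` instead produces an (r2) sequence containing an (r3) period node, mirroring example (e1).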

We have started measuring precision and recall values for a few of the implemented modules. But evaluating each subtask independently will not assess the proposed architecture as a whole, and we have no intention of challenging the NLP state of the art for the individual modules (we would rather collaborate with text NLP experts to associate them with the SL translation field). Not yet knowing how to approach this problem, though fully aware that it will become a crucial question soon enough, we list this as future work and discuss it in the last section.

6. CONCLUSION AND FUTURE WORK

After showing the limitations of data-driven methods for SL translation, i.e. lack of segmented data and linearity, we have suggested an original rule-based software architecture, which we believe could be explored for other language pairs, even in the well-addressed field of text-to-text translation, and presented our progress on its implementation. It is based on an SL model allowing multi-linear formalisation of SL-specific structures without influence from text structure, after thorough corpus analysis. These rules drive the translation, in the sense that it is the rules’ functions that tell the system what to look for in the text. We have started with open lists and a subset of rules found to govern event, date and period precedence as well as event durations.

After implementing enough rules, we need to think of a way to evaluate such a system. Currently, the way to evaluate an output translation is either to measure similarity between the output and a set of human-provided references with metrics like BLEU [12], or to ask for human assessment of the output, either with judgements on criteria like adequacy and fluency, by ranking translations, or by modifying the output in as few steps as possible to reach a satisfactory translation (HTER). But these techniques are based on lexical segmentation and sequence, and we discarded statistical approaches for the same reason. Moreover, we do not produce full output translations, but rather SL rule triggerings from text analysis, while all the methods above assume fully generated sentences to operate on. This in fact raises a new challenge pertaining to a full research field: MT evaluation. Hence, our evaluation protocol will need time and work to be established properly, and the question may even be premature before we work on the missing final stage of the system: triggered rule combination.

Before concluding, we would actually like to address this problem through an interesting prospect for our work. From what we have presented, we end up with a set of triggered rules, and are still unsure of how these are to be put together, nested under one another or combined. While we do not believe that full automatic translation, i.e. comparable to human production, is within reach any time soon, an application of this work could be to assist human translation, leaving the final arrangement task to the translator, for example through an interface inspired by DictaSign’s Sign wiki prototype [7].

With this wiki, the user could edit/create SL animations by manipulating items taken from a set of basic signs and “linguistic structures”, all identifiable by production rules in our terms today. He could insert and remove signs, arrange the hierarchy of rules, etc., to ultimately build the desired output, which could be generated in one click. The application prospect for us here is to use a similar interface and propose, after processing an input text, an initial set of production rules whose functions (and arguments when possible) are identified in the text, more or less still to be combined, filled with missing contents and relieved of false positives. This basically merges the final arrangement task with the professional human post-editing process that would be necessary anyway. Working on text, professionals already use the help of translation memory managers to easily recall previously translated equivalents for repeated text patterns. Comparably, our proposal can assist human translators with pre-formatted pieces of output from a text processing stage.

Such a project would have to include the participation of translators, for example to poll them about the type of interface they would find helpful, or about their preference for recall- or precision-oriented rule triggers. Indeed, it is not obvious whether throwing out superfluous suggestions is more comfortable or time-saving than trusting most of the output, even if it means looking for the missing pieces by hand. This is a very exciting prospect to us, as it will be the first to explore the potential benefit of our work in the translation service industry.

7. REFERENCES

[1] S. Aït-Mokhtar, J.-P. Chanod, and C. Roux. Robustness beyond shallowness: Incremental deep parsing. Natural Language Engineering, 8:121–144, 2002.

[2] C. Boitet. Automated translation. Revue française de linguistique appliquée (in English), 8:99–121, 2003.

[3] C. Cuxac. Langue des signes française, les voies de l’iconicité, volume 15–16. Ophrys, 2000.

[4] S. Dandapat, S. Morrissey, A. Way, and M. L. Forcada. Using example-based MT to support statistical MT when translating homogeneous data in a resource-poor setting. In Proceedings of the 15th Annual Meeting of the European Association for Machine Translation (EAMT 2011), 2011.

[5] M. Filhol. Combining two synchronisation methods in a linguistic model to describe sign language. Gesture and Sign Language in Human-Computer Interaction and Embodied Communication, Springer LNCS/LNAI, 7206, 2012.

[6] M. Filhol and A. Braffort. DictaSign deliverable D4.2: Report on the linguistic structures modelled for the Sign wiki, 2010. DictaSign project, EU-FP7.

[7] J. Glauert et al. DictaSign deliverable D7.3: Sign wiki prototype, 2012.

[8] T. Hanke et al. ViSiCAST deliverable D5-1: Interface definitions, 2002. ViSiCAST project report.

[9] M. Huenerfauth. Generating American Sign Language classifier predicates for English-to-ASL machine translation. PhD thesis, University of Pennsylvania, 2006.

[10] I. Marshall and E. Safar. Sign language generation in an ALE HPSG. In Proceedings of HPSG04, 2004.

[11] S. Morrissey. Data-driven Machine Translation for Sign Languages. PhD thesis, Dublin City University, 2008.

[12] K. Papineni, S. Roukos, T. Ward, and W. J. Zhu. BLEU: a method for automatic evaluation of machine translation. In Proceedings of the 40th Annual Meeting of the ACL, 2002.

[13] S. Rosset, O. Galibert, G. Bernard, E. Bilinski, and G. Adda. The LIMSI participation in the QAst track. In Working Notes of the CLEF Workshop, 2008.

[14] R. San-Segundo, R. Barra, R. Córdoba, L. F. D’Haro, F. Fernández, J. Ferreiros, J. M. Lucas, J. Macías-Guarasa, J. M. Montero, and J. M. Pardo. Speech to sign language translation system for Spanish. Speech Communication, 50(11-12), 2008.

[15] H. Schmid. TreeTagger website: http://www.cis.uni-muenchen.de/~schmid/tools/treetagger.

[16] D. Stein, C. Schmidt, and H. Ney. Analysis, preparation, and optimization of statistical sign language machine translation. Machine Translation, 26(4):325–357, 2012.

[17] T. Veale, A. Conway, and B. Collins. The challenges of cross-modal translation: English to sign language translation in the Zardoz system. Machine Translation, 13:81–106, 1998.

[18] C. Vertan and D. Karagiozov. ATLAS deliverable D6.1: Machine translation in ATLAS, 2012.

[19] C. H. Wu, H. Y. Su, Y. H. Chiu, and C. H. Linal. Transfer-based statistical translation of Taiwanese sign language using PCFG. ACM Transactions on Asian Language Information Processing (TALIP), 2007.

[20] L. Zhao, K. Kipper, W. Schuler, C. Vogler, N. Badler, and M. Palmer. A machine translation system from English to American Sign Language. Association for Machine Translation in the Americas, 2000.