Parallel Bilingual Paraphrase Rules for Noun Compounds:
Concepts and Rules for Exploring Web Language Resources
Mar 25 2004
Kageura, K.*, Yoshikane, F.$ and Nozawa, T.$
*National Institute of Informatics, Tokyo.
$National Institution for Academic Degrees and University Evaluation, Tokyo.
Outline of Presentation
Purpose
What is the paraphrase of terms?
What is the parallel paraphrase?
Parallel paraphrase rules for terms (or noun compounds)
Web-based paraphrase explorations
Conclusions and Outlook
Purpose
With the advent of neo-cons, the world has become more dangerous than ever.
  → 「ネオコンの出現により」 or 「ネオコンが出現したことで」; the latter sounds “softer”.
The advent of neo-cons made the world…
  → 「ネオコンの出現は」 is good (given that the overall sentence constructions keep correspondences between E and J).
Purpose
The best forms among possible variations at various levels of construction in translation depend on many factors.
Expert translators know them, though not explicitly or in articulated form.
Is it possible to describe this knowledge?
Partly yes, if we start by describing limited local variation rules and patterns in loose forms for terms/noun compounds.
Purpose
Parallel variation rules will necessarily be loose and over-generative; they cannot be used for making the best choice automatically.
Rather, they will be used for showing the range of possible variations from examples.
There are no suitable corpora for this aim.
The Web? Yes, though it is usually the first and most convenient resort for a cheap, unarticulated and mediocre escape from difficulties….
Purpose
On the other hand, multilingual utilisation of the Web has not been fully explored, owing to the non-parallel, non-comparable nature of the mixed-language world of the Web.
Using the Web to extract multilingual paraphrase samples by means of loose parallel paraphrase rules for terms/noun compounds is a good combination: each complements the other’s limitations.
Paraphrase/Variations of Terms?
Variations: different forms with the same meaning
Morphological and morpho-syntactic variations of terms in context have been studied intensively since the 1990s.
Systems for detecting term variations in monolingual texts have been developed.
  They use paraphrase/variation rules at the level of morpho-syntactic patterns, together with source terms whose paraphrased forms are to be detected.
Parallel Paraphrases/Variations
Essentially, variations of focal expressions within the multiple (all potential) sets of texts made from translations and back-translations constitute parallel variations.
Nature of parallel variations:
  Constructions with a finite number of variations are expected to be limited to lexical items/phrases.
  All potential sets of translated and back-translated texts are in no way available.
Parallel Paraphrases/Variations
Given these limitations, what should we do?
There are units which are regarded as having (rough) correspondences across languages, i.e. words and compounds (and some phrases).
There are linguistic concepts which are regarded as having (rough) correspondences among languages, i.e. part-of-speech, head-modifier, argument….
Parallel Paraphrases/Variations
Using these, “local” quasi-parallelism among variation patterns can be defined:
  Start from corresponding POS patterns of noun compounds as anchoring points;
  Classify monolingual paraphrase/variation rules for Japanese and English noun compounds;
  Align Japanese and English paraphrase rules by anchoring points and rule patterns, on the basis of correspondences between POS, head-modifier, argument, etc.
Parallel Paraphrases/Variations
An example: term verbalisation rule
Japanese rule: X NS → XをNSする
  e.g. 概念学習 → 概念を学習する
English rule: N1 N2 → V2 {ART?} N1
  e.g. concept learning → learn concepts
NS ⇔ N2/V2, X ⇔ N, verbalise ⇔ verbalise, etc.
  ⇒ so the above J & E rules are parallel.
“learn concepts” corresponds to 概念を学習する as variations of “concept learning” = 概念学習.
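
A minimal Python sketch of how the two verbalisation rules above might be applied side by side. The two lookup tables and all function names are hypothetical stand-ins introduced here for illustration; the slides do not say how noun-to-verb roots are resolved. The naive plural on N1 is likewise an assumption made only to reproduce the slide's "learn concepts" example.

```python
from typing import Optional, Tuple

# Hypothetical mini-lexicons (assumptions, not from the slides). A real
# system would obtain these from morphological analysis.
EN_NOUN_TO_VERB = {"learning": "learn", "classification": "classify"}
JA_VERBAL_NOUNS = {"学習", "分類"}  # NS: nouns that can take する


def verbalise_en(n1: str, n2: str) -> Optional[str]:
    """English rule N1 N2 -> V2 N1 (article omitted, naive plural on N1)."""
    verb = EN_NOUN_TO_VERB.get(n2)
    if verb is None:
        return None
    return f"{verb} {n1}s"


def verbalise_ja(x: str, ns: str) -> Optional[str]:
    """Japanese rule X NS -> X を NS する."""
    if ns not in JA_VERBAL_NOUNS:
        return None
    return f"{x}を{ns}する"


def parallel_verbalise(
    en_pair: Tuple[str, str], ja_pair: Tuple[str, str]
) -> Optional[Tuple[str, str]]:
    """Return the (English, Japanese) variants only if both rules apply."""
    en = verbalise_en(*en_pair)
    ja = verbalise_ja(*ja_pair)
    if en is None or ja is None:
        return None
    return en, ja


print(parallel_verbalise(("concept", "learning"), ("概念", "学習")))
```

Returning `None` when either side fails mirrors the point that parallel instantiation is not guaranteed for individual lexical items.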
Parallel Paraphrases/Variations
This only gives “local” and “internal” rule correspondences.
The rules can be used only when anchoring points are instantiated by actual lexical items, but parallel instantiation is not guaranteed for individual lexical items/compounds.
They can still be used for:
  Exploring the full notion of “parallel paraphrases”;
  Looking up relevant corresponding variations.
Parallel Paraphrase Rules
We used English and Japanese term variation rules developed by Jacquemin (1999) and Yoshikane (2003).
We assumed that the parallel rules would be used for variation detection with Fastr (Jacquemin 1999).
POS correspondences were made on the basis of Treetagger (E) and ChaSen (J).
Some lexical correspondences were defined, e.g. “of” and の.
Parallel Paraphrase Rules
Three major paraphrase types:
  Major category shift: paraphrases that change the grammatical category of the original compound, e.g. “concept classification” → “classify concepts”;
  Head swap: paraphrases that change/swap the head elements, e.g. “memory sharing” → “shared memory”;
  Internal variants: paraphrases that retain the head and overall category, e.g. “concept classification” → “classification of concept”.
Parallel Paraphrase Rules
Major category shift (2 subtypes):
  Argument-Verb: e.g. “system implementation” to “to implement (a) system”.
    J: X1 NS1 → X1 を NS1 VS
    E: N1 N2 → V2 ART? N1 (root(N2)=root(V2))
  Modification-Verb: e.g. “ambiguous classification” to “to classify ambiguously”.
    J: NA1 NS2 → NA1 S4 NS2 VS
    E: A1 N2 → ADV1 V2 (root(A1)=root(ADV1))
Parallel Paraphrase Rules
Head swap:
  e.g. “added material” to “material addition”;
  J: NS1 NX2 → NX2 の NS1
  E: V1 N2 → N2 V1
Parallel Paraphrase Rules
Internal variations (3 major subtypes):
  Functional operations;
  Content-word operations;
  Morphological operations.
Functional operations:
  e.g. “job amount” to “amount of jobs”.
  J: NX1 NX2 → NX1 の NX2.
  E: N1 N2 → N2 “of” N1.
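
One way to hold a Japanese rule and its English counterpart together is a small record type, sketched below for the functional operation. The `ParallelRule` class is a hypothetical structure introduced here, not the representation used by the authors or by Fastr, and 仕事 / 量 is an assumed Japanese counterpart of "job amount" chosen for illustration. The sketch does no inflection, so it yields "amount of job" where the slide example has the plural "amount of jobs".

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass(frozen=True)
class ParallelRule:
    """A Japanese rule and an English rule linked as one parallel rule."""
    name: str
    ja: Callable[[str, str], str]
    en: Callable[[str, str], str]

    def apply(
        self, ja_pair: Tuple[str, str], en_pair: Tuple[str, str]
    ) -> Tuple[str, str]:
        # Apply each monolingual rule to its own two-element compound.
        return self.ja(*ja_pair), self.en(*en_pair)


# J: NX1 NX2 -> NX1 の NX2      E: N1 N2 -> N2 "of" N1
FUNCTIONAL = ParallelRule(
    name="functional operation",
    ja=lambda nx1, nx2: f"{nx1}の{nx2}",
    en=lambda n1, n2: f"{n2} of {n1}",
)

print(FUNCTIONAL.apply(("仕事", "量"), ("job", "amount")))
```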
Parallel Paraphrase Rules
Content-word operations:
  Modifications: e.g. “big cat” to “big noisy cat”.
    J: NX1 NX2 → NX1 {NX TPX?}+ NX2.
    E: N1 N2 → N1 {A|N|V}+ N2.
  Coordinations: e.g. “word class” to “word and concept class”.
    J: NX1 NX2 → NX1 C NX S NX2.
    E: N1 N2 → N1 C N N2.
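
The English modification pattern N1 {A|N|V}+ N2 can be read as "the two anchor words with one or more content words inserted between them". The sketch below detects such insertions in a POS-tagged token list. The tag names `A`, `N`, `V`, `C`, `DET` are a simplified tagset assumed here, not the Treetagger tagset, and the function is an illustration rather than how Fastr matches rules.

```python
from typing import List, Sequence, Tuple

Tagged = Sequence[Tuple[str, str]]  # (word, simplified POS tag)


def find_modification_variants(
    tagged: Tagged, n1: str, n2: str, allowed: Tuple[str, ...] = ("A", "N", "V")
) -> List[List[str]]:
    """Return the word sequences inserted between n1 and n2.

    A hit requires at least one inserted word, and every inserted word
    must carry one of the allowed tags (the {A|N|V}+ part of the rule).
    """
    hits: List[List[str]] = []
    for i, (word, _) in enumerate(tagged):
        if word != n1:
            continue
        inserted: List[str] = []
        for w, tag in tagged[i + 1:]:
            if w == n2:
                if inserted:  # zero insertions is the base term, not a variant
                    hits.append(inserted)
                break
            if tag not in allowed:
                break  # e.g. a conjunction: not a modification variant
            inserted.append(w)
    return hits


sentence = [("the", "DET"), ("big", "A"), ("noisy", "A"), ("cat", "N")]
print(find_modification_variants(sentence, "big", "cat"))
```

A coordination such as "word and concept class" is rejected by this pattern because the conjunction is not in the allowed set; it would need its own rule.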
Parallel Paraphrase Rules
Morphological operations:
  N to N: “word class” to “word classification”;
  N to V: “index grammar” to “indexed grammar”;
  V to N: “indexed grammar” to “index grammar”;
  N to A: “category grammar” to “categorial grammar”;
  A to N: “categorial grammar” to “category grammar”;
  A to A: “syntactic information” to “syntactical information”.
Parallel Paraphrase Rules
Coverage of rules for lexical pairs:
  Actual corresponding Japanese and English lexical items do not always take the POS patterns provided by the parallel paraphrase rules.
  So it is useful to observe, given a set of actual bilingual vocabulary, to what extent the parallel paraphrase rules can be invoked.
  We checked this on a different-word (type) basis, using the 19,532 entries of a bilingual terminological list.
Parallel Paraphrase Rules
Paraphrase type              Entries   Coverage
Major Category Shift           6312     32.3%
  Argument-Verb                1316      6.7%
  Modification-Verb            6279     32.1%
Head Swap                      6141     31.4%
Internal Variants             12405     63.5%
  Functional operations       10792     55.3%
  Content-word operations     10433     53.4%
    Modifications             10361     53.0%
    Coordinations             10433     53.4%
  Morphological Operations     8870     45.4%
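
The percentages in the coverage figures are consistent with dividing each count by the 19,532 entries of the bilingual list. The short sketch below recomputes them as a check; the pairing of labels with counts follows a reading of the flattened slide table and is an assumption, while the counts and percentages themselves are taken from the slide.

```python
TOTAL_ENTRIES = 19532  # size of the bilingual terminological list

# (count, percentage as reported on the slide)
REPORTED = {
    "Major Category Shift": (6312, 32.3),
    "Argument-Verb": (1316, 6.7),
    "Modification-Verb": (6279, 32.1),
    "Head Swap": (6141, 31.4),
    "Internal Variants": (12405, 63.5),
    "Functional operations": (10792, 55.3),
    "Content-word operations": (10433, 53.4),
    "Modifications": (10361, 53.0),
    "Coordinations": (10433, 53.4),
    "Morphological Operations": (8870, 45.4),
}


def coverage(count: int, total: int = TOTAL_ENTRIES) -> float:
    """Share of entries to which a rule type applies, in percent."""
    return round(100 * count / total, 1)


def recompute() -> dict:
    """Map each rule type to its recomputed coverage percentage."""
    return {name: coverage(count) for name, (count, _) in REPORTED.items()}


for name, pct in recompute().items():
    print(f"{name:26s} {pct:5.1f}%")
```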
Parallel Paraphrase Rules
Coverage of rules for lexical pairs:
  7,090 (36.3%) of the 19,532 complex terms listed in the bilingual terminology have no parallel paraphrase rule that can be applied to them.
  The number of term pairs to which neither Japanese nor English rules can be applied is 558 (2.9%).
Web-based Paraphrase Explorations
All in all, roughly 100 Japanese rules and 70 English rules have been established and linked as parallel rules.
They are currently under review and re-examination, on the basis of analytical and empirical observations.
Some of these rules are implemented on the Web, to explore Web spaces.
Web-based Paraphrase Explorations
[Figure: system flow. Japanese or English complex terms → Dictionary lookup → Japanese & English complex terms → Web search by constituent elements → Japanese pages, English pages → Fastr, with parallel rules for variants → Output: corresponding forms of variants]
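
The flow in the diagram can be sketched roughly as follows. Everything here is a stand-in: the dictionary and variant tables are hypothetical, the "pages" are in-memory strings rather than Web search results, and plain substring matching replaces Fastr. The sketch also accepts only English input, whereas the system takes either language. It illustrates the order of the steps, not the actual implementation.

```python
from typing import Dict, List, Tuple

# Hypothetical bilingual dictionary (stand-in for the dictionary lookup).
BILINGUAL: Dict[str, str] = {"concept learning": "概念学習"}

# Hypothetical variants per parallel rule; the real system would detect
# these with Fastr and the parallel rules.
VARIANTS: Dict[str, Dict[str, str]] = {
    "verbalisation": {
        "concept learning": "learn concepts",
        "概念学習": "概念を学習する",
    },
}


def explore(
    term_en: str, pages_en: List[str], pages_ja: List[str]
) -> List[Tuple[str, str, str]]:
    """Return (rule, English variant, Japanese variant) triples for which
    both variants are attested in the given pages."""
    term_ja = BILINGUAL.get(term_en)
    if term_ja is None:
        return []  # dictionary lookup failed: no bilingual anchor
    results = []
    for rule, table in VARIANTS.items():
        var_en, var_ja = table.get(term_en), table.get(term_ja)
        if var_en is None or var_ja is None:
            continue  # rule is not instantiated on both sides
        found_en = any(var_en in page.lower() for page in pages_en)
        found_ja = any(var_ja in page for page in pages_ja)
        if found_en and found_ja:
            results.append((rule, var_en, var_ja))
    return results


print(explore(
    "concept learning",
    ["Children learn concepts quickly."],
    ["子供は概念を学習する。"],
))
```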
Web-based Paraphrase Explorations
The experimental system for Web-based paraphrase detection is currently running at:
http://svrrd2.niad.ac.jp/faculty/nozawa/VSearch/index.html
Summary
Introduced the concept of parallel paraphrases within the translation context.
Defined local monolingual paraphrase rules for terms and parallelised them in terms of correspondences.
Implemented the rules using Fastr and ran them experimentally on the Web.
Outlook
Multilingual expansion, especially to French, Spanish and Asian languages.
Focusing on some paraphrase patterns and observing contextual factors to understand parallel paraphrases, e.g.:
  Due to the irresponsibility of people → 「国民の無責任さにより」「国民が無責任なために」
  Activities of election observation → 「選挙監視活動」「選挙の監視活動」「選挙監視の活動」