selecting proper lexical paraphrase for children

Tomoyuki Kajiwara, Hiroshi Matsumoto, Kazuhide Yamamoto (Nagaoka University of Technology)

Posted on 06-Jul-2015


DESCRIPTION

Tomoyuki Kajiwara, Hiroshi Matsumoto and Kazuhide Yamamoto. Selecting Proper Lexical Paraphrase for Children. The 25th Conference on Computational Linguistics and Speech Processing (ROCLING 2013), pp.59-73 (2013.10)

TRANSCRIPT

Page 1: Selecting Proper Lexical Paraphrase for Children


Tomoyuki Kajiwara, Hiroshi Matsumoto, Kazuhide Yamamoto

Nagaoka University of Technology

Page 2

Lexical Paraphrase for Children

• Newspaper for Adults (total annual vocabulary: 200,000 words)
  大詰めの大一番 (the big match of the final stage)
• Elementary school Japanese dictionary
  【大詰め: final stage】芝居の最後の場面 (the last scene of the play)
• Newspaper for Children (Basic Vocabulary to Learn: 5,404 words)
  最後の大一番 (the last big match)

The simple word is selected from the definition by its similarity to the headword.

Page 3

BVL: Basic Vocabulary to Learn

• Basic Vocabulary (2,000 words): the minimum vocabulary necessary for daily life
• Basic Vocabulary to Learn, BVL (5,404 words): vocabulary that elementary school students can use sufficiently
• Vocabulary to Learn, VL (25,000 words): vocabulary registered in the elementary school dictionary
• General Vocabulary, GV: vocabulary registered in a general dictionary

Words from GV and VL are paraphrased into BVL as reading assistance for elementary school students.

Page 4

Related Works
• Paraphrasing with a dictionary
  – headword → headword: Fujita et al. (2000), Mino and Tanaka (2011)
  – headword → word at the end of the definition statement: Kaji et al. (2002), Mino and Tanaka (2011), Kajiwara and Yamamoto (2013)

Assumptions of these works: "Definition statements are simpler than their headwords" and "The last segment of a definition represents the meaning of the headword".

Page 5

Problem of Related Works

Definition: 【大詰め】芝居の最後の場面 (【final stage】the last scene of the play)

Paraphrase:
✕ 大詰めの大一番 → 場面の大一番 (the big match of the final stage → the big match of the scene)
✔ 大詰めの大一番 → 最後の大一番 (the big match of the final stage → the last big match)

Appropriate target words are not always found at the end of definitions.
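The failure above can be reproduced with a minimal Python sketch of the end-of-definition heuristic. It assumes the definition of 大詰め has already been segmented and POS-tagged (the token list and tag names are hypothetical, not the output of a particular tagger), and it reads the heuristic strictly: only the final word of the definition is taken, and only if its part of speech matches the headword's.

```python
from typing import List, Optional, Tuple

Token = Tuple[str, str]  # (surface form, part of speech)

# Hypothetical segmentation of the definition of 大詰め ("final stage"):
# 芝居 / の / 最後 / の / 場面 = "the last scene of the play"
DEFINITION: List[Token] = [
    ("芝居", "noun"), ("の", "particle"), ("最後", "noun"),
    ("の", "particle"), ("場面", "noun"),
]

def end_of_definition_candidate(definition: List[Token], pos: str) -> Optional[str]:
    """Take the last word of the definition if it shares the headword's POS."""
    if not definition:
        return None
    word, word_pos = definition[-1]
    return word if word_pos == pos else None

print(end_of_definition_candidate(DEFINITION, "noun"))  # 場面 ("scene"), not 最後 ("last")
```

The better target, 最後, is a noun in the same definition, but the heuristic never considers it because it is not in final position.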

Page 6

Proposed Method

Page 7

Proposed Method (1/2)
• Acquisition of the Target Word Candidates
  ① The difficult word is extracted from the original sentence
  ② The dictionary entries for the difficult word are looked up
  ③ Words with the same part of speech as the difficult word are extracted from all of its definitions

Example (original sentence: "・・・ professor ・・・")
  Japanese dictionary entries:
    【professor】People of status as professor.
    【professor】Status as professor.
    【professor】Teach learning and skill.
    【professor】University teacher.
  Extracted nouns (③): People, Status, Professor, Learning, Skill, University, Teacher
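Steps ② and ③ can be sketched in Python as follows. The dictionary here is a hypothetical toy stand-in: definitions are assumed to be pre-tokenized and POS-tagged, and English words are used in place of the Japanese originals. Candidates are collected from every definition, not just the last word of one, and duplicates are dropped while keeping first-seen order.

```python
from typing import Dict, List, Tuple

Token = Tuple[str, str]  # (word, part of speech)

# Hypothetical toy dictionary: headword -> list of tagged definitions.
DICTIONARY: Dict[str, List[List[Token]]] = {
    "professor": [
        [("people", "noun"), ("of", "adp"), ("status", "noun"), ("as", "adp"), ("professor", "noun")],
        [("status", "noun"), ("as", "adp"), ("professor", "noun")],
        [("teach", "verb"), ("learning", "noun"), ("and", "conj"), ("skill", "noun")],
        [("university", "noun"), ("teacher", "noun")],
    ],
}

def acquire_candidates(difficult_word: str, pos: str,
                       dictionary: Dict[str, List[List[Token]]]) -> List[str]:
    """Collect every same-POS word from all definitions of the difficult word."""
    candidates: List[str] = []
    for definition in dictionary.get(difficult_word, []):  # step 2: look up entries
        for word, word_pos in definition:
            if word_pos == pos and word not in candidates:   # step 3: same POS, no duplicates
                candidates.append(word)
    return candidates

print(acquire_candidates("professor", "noun", DICTIONARY))
# ['people', 'status', 'professor', 'learning', 'skill', 'university', 'teacher']
```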

Page 8

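Proposed Method (2/2)
• Selection of the Proper Target Word
  ④ Simple words (those in the Basic Vocabulary to Learn) are extracted from the candidates
  ⑤ The semantic similarity between each simple word and the difficult word is calculated
  ⑥ The simple word with the highest similarity is selected

Example (difficult word: professor)
  Candidates: People, Status, Professor, Learning, Skill, University, Teacher
  Simple words in BVL (④) and their similarities (⑤):
    People 0.17, Learning 0.11, University 0.08, Skill 0.13, Teacher 0.25
  Selected (⑥): Teacher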
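Steps ④ to ⑥ reduce to a filter followed by an argmax. The Python sketch below uses the similarity values from the slide but does not reimplement the WordNet-based measure itself; the BVL set is a hypothetical subset that assumes "status" and "professor" fall outside BVL, as the slide suggests.

```python
from typing import Dict, Iterable, Optional, Set

# Hypothetical BVL subset; "status" and "professor" are assumed not to be in BVL.
BVL: Set[str] = {"people", "learning", "skill", "university", "teacher"}

# Similarities to "professor" as shown on the slide (measure not reimplemented).
SIMILARITY: Dict[str, float] = {
    "people": 0.17, "learning": 0.11, "university": 0.08,
    "skill": 0.13, "teacher": 0.25,
}

def select_target(candidates: Iterable[str], bvl: Set[str],
                  sim: Dict[str, float]) -> Optional[str]:
    """Keep only simple words, then return the one most similar to the difficult word."""
    simple = [w for w in candidates if w in bvl]        # step 4: simple words only
    if not simple:
        return None
    return max(simple, key=lambda w: sim.get(w, 0.0))   # steps 5-6: highest similarity

candidates = ["people", "status", "professor", "learning", "skill", "university", "teacher"]
print(select_target(candidates, BVL, SIMILARITY))  # teacher
```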

Page 9

Experiments

Page 10

Comparative Methods
• Acquisition of the Target Word Candidates
  One word is extracted from the end of each definition statement if it has the same part of speech as the difficult word
• Selection of the Proper Target Word
  Weighted voting over the following methods:
  (1) Frequency
  (2) Co-occurrence frequency
  (3) Pointwise Mutual Information
  (4) Trigram frequency
  (5) Cosine similarity between document vectors
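The slides do not spell out how the votes are combined, so the Python sketch below assumes one common scheme: each method votes for its top-scoring candidate, and each vote counts with that method's weight. All scores and weights are hypothetical placeholders. Methods missing from the weight table default to 1.0, so passing an empty table gives weightless voting.

```python
from collections import defaultdict
from typing import Dict

# Hypothetical per-method scores for two candidate target words.
SCORES: Dict[str, Dict[str, float]] = {
    "frequency":    {"people": 120.0, "teacher": 80.0},
    "cooccurrence": {"people": 5.0,   "teacher": 9.0},
    "pmi":          {"people": 0.4,   "teacher": 1.2},
    "trigram":      {"people": 3.0,   "teacher": 1.0},
    "cosine":       {"people": 0.21,  "teacher": 0.35},
}
# Hypothetical weights, standing in for weights tuned on development words.
WEIGHTS: Dict[str, float] = {"frequency": 0.5, "cooccurrence": 1.0,
                             "pmi": 1.5, "trigram": 0.5, "cosine": 1.0}

def weighted_vote(scores_by_method: Dict[str, Dict[str, float]],
                  weights: Dict[str, float]) -> str:
    """Each method votes for its top candidate; votes are summed by method weight."""
    tally: Dict[str, float] = defaultdict(float)
    for method, scores in scores_by_method.items():
        best = max(scores, key=lambda w: scores[w])
        tally[best] += weights.get(method, 1.0)  # unlisted methods count as 1.0
    return max(tally, key=lambda w: tally[w])

print(weighted_vote(SCORES, WEIGHTS))  # teacher
```

With these numbers, "people" collects 1.0 (frequency, trigram) and "teacher" collects 3.5 (co-occurrence, PMI, cosine).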

Page 11

Experimental Setup
• Test set: 152 difficult words that
  – do not appear in BVL,
  – appear more than 50 times in the Mainichi Newspaper published in 2000, and
  – have paraphrasable simple words in their definition statements
• Dictionaries: three Japanese dictionaries
• Thesaurus: Japanese WordNet
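The first two selection criteria are mechanical and can be sketched in Python as below. The corpus is a hypothetical toy token list standing in for the tokenized 2000 Mainichi articles. The third criterion, that the definitions contain paraphrasable simple words, requires human judgment and is not modeled here.

```python
from collections import Counter
from typing import Iterable, List, Set

def select_difficult_words(corpus_tokens: Iterable[str], bvl: Set[str],
                           min_count: int = 50) -> List[str]:
    """Words outside BVL that occur more than min_count times in the corpus."""
    counts = Counter(corpus_tokens)
    return sorted(w for w, c in counts.items() if w not in bvl and c > min_count)

# Hypothetical toy corpus: "tribunal" is too rare, "teacher" is already in BVL.
corpus = ["professor"] * 60 + ["teacher"] * 100 + ["tribunal"] * 10
print(select_difficult_words(corpus, bvl={"teacher"}))  # ['professor']
```

The comparison is strict (`c > min_count`), matching "more than 50 times": a word seen exactly 50 times is excluded.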

Page 12

Procedure (1/2)
• Experiments on 52 of the difficult words
  – The voting weights are decided
• Experiments on the remaining 100 difficult words
  – Weighted voting is applied with those weights
• Evaluation
  – Three evaluators judge each result
  – The final judgment is decided by majority vote
  – Definition of "paraphrasable": the simple word can replace the difficult word in the original sentence
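The evaluation protocol amounts to a majority vote per item followed by a rate over all items. The Python sketch below illustrates that arithmetic; the judgments are hypothetical, and the function names are illustrative only.

```python
from typing import Sequence

def majority_paraphrasable(judgments: Sequence[bool]) -> bool:
    """True when more than half of the evaluators judge the word paraphrasable."""
    return sum(judgments) * 2 > len(judgments)

def paraphrasable_rate(all_judgments: Sequence[Sequence[bool]]) -> float:
    """Percentage of items whose majority judgment is 'paraphrasable'."""
    decisions = [majority_paraphrasable(j) for j in all_judgments]
    return 100.0 * sum(decisions) / len(decisions)

# Hypothetical judgments by three evaluators for four candidate words.
judgments = [
    [True, True, False],
    [False, False, True],
    [True, True, True],
    [False, True, False],
]
print(paraphrasable_rate(judgments))  # 50.0
```

With three evaluators there can be no tie, so at least two agreeing judgments always decide the outcome.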

Page 13

Procedure (2/2)

① The difficult word is extracted from the original sentence ("・・・ professor ・・・")
② The dictionary entries for "professor" are looked up:
   【professor】People of status as professor.
   【professor】Status as professor.
   【professor】Teach learning and skill.
   【professor】University teacher.
③ Nouns are extracted: People, Status, Professor, Learning, Skill, University, Teacher
④ Simple words (in the Basic Vocabulary to Learn) are extracted: People, Learning, University, Skill, Teacher
⑤ Similarities of meaning are calculated: People 0.17, Learning 0.11, University 0.08, Skill 0.13, Teacher 0.25

Page 14

Result (1/3)
• Acquisition of the Target Word Candidates
  – The proposed method acquires more paraphrasable simple words
  – The difference is only 3.2 points, since many paraphrasable simple words already appear at the end of definition statements

Number (percentage) of paraphrasable words:
  – Proposed: 165 / 221 (74.7 %)
  – Comparative: 158 / 221 (71.5 %)
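The percentages and the 3.2-point gap follow directly from the counts. A quick Python check (variable names are illustrative):

```python
TOTAL = 221  # denominator shown on the slide

def percentage(count: int, total: int = TOTAL) -> float:
    """Share of paraphrasable words, in percent."""
    return 100.0 * count / total

proposed = percentage(165)
comparative = percentage(158)
print(f"proposed {proposed:.1f} %, comparative {comparative:.1f} %, "
      f"difference {proposed - comparative:.1f} points")
# proposed 74.7 %, comparative 71.5 %, difference 3.2 points
```

The gap corresponds to 7 additional paraphrasable words out of 221.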

Page 15

Result (2/3)

[Figure: bar chart on a 0 to 70 scale comparing target-word selection by 【Baseline】Randomness, 【Proposed】WordNet-similarity, and the comparative methods (1) Frequency, (2) Co-occurrence frequency, (3) Pointwise Mutual Information, (4) Trigram frequency and (5) Cosine similarity, each shown for candidates from acquisition by the comparative method and acquisition by the proposed method]

Page 16

Result (3/3)

[Figure: bar chart on a 0 to 70 scale comparing, for candidates from acquisition by the comparative method and acquisition by the proposed method: 【Baseline】Randomness; 【Proposed】WordNet-similarity; A) weightless voting by comparative methods (1)-(5); B) weighted voting by comparative methods (1)-(5); C) weightless voting adding WordNet similarity to A); D) weighted voting adding WordNet similarity to B)]

Page 17

Erroneous Examples (1/2)
• Two or more simple words share the highest similarity
  Example
  • Original: A summary of the main points.
  • Definition: 【Points】essential, score, game, spot
  • Similarities: essential 1.0, score 1.0, game 1.0, spot 1.0

The methods using frequency or context information selected a paraphrasable word.
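A plain argmax cannot resolve the tie above. The Python sketch shows one possible remedy, which the slides do not report evaluating: detect all candidates tied at the top similarity and break the tie with a secondary score. Since the slide notes that frequency or context methods chose a paraphrasable word here, the secondary score could be such a measure; the values below are hypothetical placeholders.

```python
from typing import Dict, List

# Similarities from the slide: every candidate for "points" scores 1.0.
SIM: Dict[str, float] = {"essential": 1.0, "score": 1.0, "game": 1.0, "spot": 1.0}
# Hypothetical secondary scores (e.g. corpus frequencies); not from the slides.
SECONDARY: Dict[str, float] = {"essential": 40.0, "score": 25.0, "game": 30.0, "spot": 12.0}

def top_ties(sim: Dict[str, float], eps: float = 1e-9) -> List[str]:
    """All candidates whose similarity is within eps of the maximum."""
    best = max(sim.values())
    return [w for w, s in sim.items() if best - s <= eps]

def select_with_tiebreak(sim: Dict[str, float], secondary: Dict[str, float]) -> str:
    """Pick by similarity; consult the secondary score only among tied candidates."""
    tied = top_ties(sim)
    return max(tied, key=lambda w: secondary.get(w, 0.0))

print(top_ties(SIM))                         # ['essential', 'score', 'game', 'spot']
print(select_with_tiebreak(SIM, SECONDARY))  # essential
```

This does not help with the second error type, where a non-paraphrasable word has a strictly higher similarity.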

Page 18

Erroneous Examples (2/2)
• A non-paraphrasable word has the highest similarity
  Example
  • Original: I can play the program during recording.
  • Definition: 【Play】Use the garbage again. What was gone regains power and life once again.
  • use: paraphrasable, similarity 0.8
  • power: non-paraphrasable, similarity 1.0

The methods using frequency or context information selected a paraphrasable word.

Page 19

Conclusion
We paraphrase a difficult word into the simple word with the highest similarity, drawing candidates from the whole of its definition statements.
• Acquisition of the Target Word Candidates
  – More paraphrasable simple words are acquired
  – Many of them appear at the end of definitions
• Selection of the Proper Target Word
  – Selection based on similarity performs better than selection by frequency or context information