From Web n-grams to collocation learning Shaoqun Wu University of Waikato

Post on 15-Jan-2016

Page 1: From Web n-grams to collocation learning

From Web n-grams to collocation learning

Shaoqun Wu

University of Waikato

Page 2: From Web n-grams to collocation learning

strong smoker

powerful tea

big rain

look book

one sex product

heavy smoker

strong tea

heavy rain

read a book

disposable product

Page 3: From Web n-grams to collocation learning

What is collocation?

Choice of one word conditions the choice of the next, and of the next again

John McHardy Sinclair

Page 4: From Web n-grams to collocation learning

Learning collocation is important

Page 9: From Web n-grams to collocation learning

Existing collocation resources

Page 14: From Web n-grams to collocation learning

What are you doing tonight?

I am going to look TV.

Page 15: From Web n-grams to collocation learning

Watch TV?

Page 18: From Web n-grams to collocation learning

Z Z Z

Page 19: From Web n-grams to collocation learning

Building a collocation database from the Web text

Page 20: From Web n-grams to collocation learning

Google n-gram collection

word_1 <space> word_2 <space>… word_n <tab> count

tokens       1,024,908,267,229    ≈ 10^12

sentences    95,119,665,584       ≈ 95 × 10^9

unigrams     13,588,391           ≈ 0.014 × 10^9

bi-grams     314,843,401          ≈ 0.3 × 10^9

trigrams     977,069,902          ≈ 1.0 × 10^9

four-grams   1,313,818,354        ≈ 1.3 × 10^9

five-grams   1,176,470,663        ≈ 1.2 × 10^9
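
A minimal sketch of reading one record in the line format shown on this slide (words separated by single spaces, then a tab, then the count). The names NGram and parse_ngram_line are illustrative, not part of the Google collection or of the talk, and the counts in the usage lines are invented.

```python
from typing import NamedTuple, Tuple


class NGram(NamedTuple):
    """One record: the words of the n-gram and its frequency count."""
    words: Tuple[str, ...]
    count: int


def parse_ngram_line(line: str) -> NGram:
    """Parse 'word_1 word_2 ... word_n<TAB>count' into an NGram."""
    # Split on the LAST tab so the count is always the final field.
    text, sep, count = line.rstrip("\n").rpartition("\t")
    if not sep:
        raise ValueError(f"no tab separator in line: {line!r}")
    return NGram(tuple(text.split(" ")), int(count))


# Usage with made-up counts (not real corpus figures).
bigram = parse_ngram_line("heavy rain\t12345\n")
trigram = parse_ngram_line("read a book\t250")
print(bigram.words, bigram.count)
print(len(trigram.words))
```

Splitting on the last tab rather than the first keeps the parser simple and means a malformed line without a tab fails loudly instead of yielding a wrong count.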

Page 21: From Web n-grams to collocation learning

Building a Web collocation collection

• clean up n-grams,

• assign syntactic tags to the words of n-grams,

• match tagged n-grams with the syntactic patterns, and

• discard ones that occur fewer than 100 times.
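
The four steps above can be sketched roughly as follows. Only the threshold of 100 comes from the slide; everything else is an assumption made for illustration: the clean-up rule (lowercase, alphabetic tokens only), the toy lexicon that stands in for a real part-of-speech tagger, the two syntactic patterns, and the sample counts.

```python
import re

MIN_COUNT = 100  # threshold stated on the slide

# Hypothetical toy lexicon standing in for a real part-of-speech tagger.
TOY_LEXICON = {
    "heavy": "ADJ", "strong": "ADJ",
    "rain": "NOUN", "tea": "NOUN", "book": "NOUN",
    "read": "VERB", "a": "DET", "the": "DET",
}

# Hypothetical syntactic patterns; the talk's actual pattern set is not given here.
PATTERNS = {("ADJ", "NOUN"), ("VERB", "DET", "NOUN")}

WORD = re.compile(r"[a-z]+(?:-[a-z]+)*")


def clean(words):
    """Step 1 (assumed rule): lowercase; reject n-grams with non-word tokens."""
    lowered = [w.lower() for w in words]
    return lowered if all(WORD.fullmatch(w) for w in lowered) else None


def tag(words):
    """Step 2: assign a tag to each word; unknown words get 'X'."""
    return tuple(TOY_LEXICON.get(w, "X") for w in words)


def extract_collocations(ngrams):
    """Run clean-up, tagging, pattern matching, then the frequency cut-off."""
    totals = {}
    for words, count in ngrams:
        cleaned = clean(words)
        if cleaned is None:
            continue
        if tag(cleaned) in PATTERNS:  # step 3
            key = " ".join(cleaned)
            totals[key] = totals.get(key, 0) + count
    # Step 4: discard candidates seen fewer than MIN_COUNT times.
    return {k: c for k, c in totals.items() if c >= MIN_COUNT}


# Invented sample data, not real corpus counts.
sample = [
    (("Heavy", "rain"), 5000),
    (("read", "a", "book"), 300),
    (("strong", "tea"), 40),
    (("<S>", "the", "book"), 900),
    (("rain", "heavy"), 700),
]
print(extract_collocations(sample))
```

In this sketch the frequency cut-off is applied last, as in the list above, so that counts merged by lowercasing are summed before they are compared with the threshold.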

Page 23: From Web n-grams to collocation learning

Parents have a wide range of options to cultivate their children with a good art understanding

cultivate their children

Page 25: From Web n-grams to collocation learning

Evaluation

• language learners from the education school: 86% vs 67%

• myself: diary recording …

• and you?

Page 26: From Web n-grams to collocation learning

Questions

specific questions

serious questions

tough questions

difficult questions

hard questions

critical questions

simple questions

general questions

interesting questions

various questions

common questions

no further questions

Page 27: From Web n-grams to collocation learning

THERE ARE

no stupid questions