GDEX: Automatically finding good dictionary examples in a corpus

GDEX: Automatically finding good dictionary examples in a corpus

Kivik 2013 Kilgarriff: GDEX 1

Users appreciate examples

Paper: space constraints
Electronic: no space constraints
  Give lots of examples
  Constraint: cost of selection, editing

Project

Macmillan English Dictionary
  Already had 1000 collocation boxes
  Average 8 per box
New electronic version
  All 8000 collocations need examples
  Authentic; from corpus

Old method

Lexicographer
  Gets concordance for collocation
  Reads through until they find a good example
  Cut, paste, edit

New method

Lexicographer
  Gets sorted concordance
  20 best examples in spreadsheet
  Less reading through
  Tick the first good one, edit

What makes a good example?

Readable
  EFL users
Informative
  Typical, for the collocation
  Gives context which helps user understand the target word/phrase

Readability

70 years of research
Not just (or mainly) EFL
  Educational theory
    Teaching children to read
  Instruction manuals
    Early work: US military
  Publishing
    People like newspapers and magazines that they find easy to read

Readability tests

Flesch-Kincaid Reading Ease test
  1948
  Average sentence length, average word length
  In some word processing software
Many similar measures
Recent work
  Training data for different reading levels
  Language modelling
  Tailored readability according to domain, L1
Target levels
  US grades
  Now, increasingly: Common European Framework
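To make the "average sentence length, average word length" idea concrete, here is a minimal sketch of the Flesch Reading Ease score (higher means easier to read). The constants are those of the published formula, with word length measured in syllables; the sentence splitter and the vowel-group syllable counter are crude assumptions made only for illustration, and this is not the readability measure GDEX itself uses.

```python
import re


def count_syllables(word: str) -> int:
    """Rough syllable estimate (assumption): count runs of consecutive vowels."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def flesch_reading_ease(text: str) -> float:
    """Flesch Reading Ease: 206.835 - 1.015 * (words/sentences) - 84.6 * (syllables/words)."""
    # Naive sentence split on terminal punctuation (assumption).
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    avg_sentence_length = len(words) / len(sentences)
    avg_syllables_per_word = syllables / len(words)
    return 206.835 - 1.015 * avg_sentence_length - 84.6 * avg_syllables_per_word
```

Short sentences of short words score high; long sentences of polysyllabic words score low, which is the behaviour a learner-oriented example selector wants to reward.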

GDEX

Get concordance for collocation
For each sentence
  Score it
Sort
Show best ones to lexicographer
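The steps on this slide can be sketched as a small pipeline. This is an illustration under stated assumptions, not the Sketch Engine implementation: `concordance`, `best_examples` and `placeholder_score` are hypothetical names, the corpus is a plain list of sentences, the collocation is matched by simple word presence (a real system would match lemmas or grammatical relations), and the scoring function is a stand-in that merely prefers shorter sentences.

```python
import re
from typing import Callable, List, Tuple


def tokens(sentence: str) -> List[str]:
    """Lower-cased word tokens (naive tokenisation, an assumption)."""
    return re.findall(r"[a-z']+", sentence.lower())


def concordance(corpus: List[str], collocation: Tuple[str, str]) -> List[str]:
    """Get concordance: keep sentences containing both words of the collocation."""
    node, collocate = collocation
    return [s for s in corpus if node in tokens(s) and collocate in tokens(s)]


def best_examples(lines: List[str], score: Callable[[str], float], n: int = 20) -> List[str]:
    """Score each sentence, sort best-first, and return the top n for the lexicographer."""
    ranked = sorted(lines, key=score, reverse=True)
    return ranked[:n]


def placeholder_score(sentence: str) -> float:
    """Stand-in scorer (assumption): shorter sentences score higher."""
    return -float(len(tokens(sentence)))
```

The default `n=20` mirrors the "20 best examples in spreadsheet" of the new method; the scorer is passed in as a parameter so that a real heuristic score can replace the placeholder.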

GDEX heuristics

Sentence length (10-26 words)
Mostly common words is good
Rare words are bad
Sentences
  Start with capital, end with one of .!?
  No [, ], <, >, http, \
  Not much other punctuation, numbers
  Not too many capitals
Typicality: third collocate is a plus
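A few of these heuristics are easy to express as code. The sketch below covers four of them and returns one value per heuristic in the range 0 to 1. The function name, the heuristic names, the all-or-nothing scoring, and the `common_words` set (standing in for a frequency list) are all assumptions made for illustration; the slides do not say how GDEX scores each heuristic internally.

```python
import re
from typing import Dict, Set

# Character sequences the slide lists as unwanted in an example sentence.
FORBIDDEN = ("[", "]", "<", ">", "http", "\\")


def heuristic_scores(sentence: str, common_words: Set[str]) -> Dict[str, float]:
    """Score one sentence on four of the listed heuristics (hypothetical scoring scheme)."""
    words = re.findall(r"[A-Za-z']+", sentence)
    n_words = len(sentence.split())
    starts_with_capital = sentence[:1].isupper()
    ends_properly = sentence.rstrip().endswith((".", "!", "?"))
    common = sum(w.lower() in common_words for w in words)
    return {
        # Sentence length between 10 and 26 words.
        "length": 1.0 if 10 <= n_words <= 26 else 0.0,
        # Starts with a capital and ends with one of . ! ?
        "well_formed": 1.0 if starts_with_capital and ends_properly else 0.0,
        # None of the forbidden character sequences.
        "clean": 1.0 if not any(mark in sentence for mark in FORBIDDEN) else 0.0,
        # Share of words found in the common-word list.
        "common": common / len(words) if words else 0.0,
    }
```

Returning a dictionary keeps each heuristic separate, so the scores can be weighted and combined in a later step.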

Weighting

For each sentence
  Score on each heuristic
  Weight scores
  Add together: weighted score
How to set weights?
  Two students: manually judged 1000 “good examples”
  Weights set so system makes same choices as students
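The weighted sum, and one possible way of choosing weights to agree with human judgements, can be sketched as follows. The slides do not describe the fitting procedure that was actually used, so the grid search here is purely an assumption, as are the function names and the representation of the judgements as (chosen, rejected) pairs of heuristic scores.

```python
from itertools import product
from typing import Dict, List, Sequence, Tuple

Scores = Dict[str, float]


def weighted_score(scores: Scores, weights: Dict[str, float]) -> float:
    """Weight each heuristic score and add them together."""
    return sum(weights.get(name, 0.0) * value for name, value in scores.items())


def agreement(weights: Dict[str, float], pairs: List[Tuple[Scores, Scores]]) -> float:
    """Fraction of (chosen, rejected) pairs where the chosen sentence scores higher."""
    hits = sum(weighted_score(good, weights) > weighted_score(bad, weights)
               for good, bad in pairs)
    return hits / len(pairs)


def fit_weights(pairs: List[Tuple[Scores, Scores]], names: Sequence[str],
                grid: Sequence[float] = (0.0, 0.5, 1.0)) -> Tuple[Dict[str, float], float]:
    """Try every weight combination on a small grid; keep the one that agrees most often."""
    best_weights: Dict[str, float] = {}
    best_agreement = -1.0
    for combo in product(grid, repeat=len(names)):
        candidate = dict(zip(names, combo))
        score = agreement(candidate, pairs)
        if score > best_agreement:
            best_weights, best_agreement = candidate, score
    return best_weights, best_agreement
```

A grid search is only workable for a handful of heuristics, since the number of combinations grows exponentially; with more heuristics, a standard learning-to-rank or regression method would be the natural replacement.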

Was it successful?

Did it save lexicographer time?
  Definitely (says project manager)
Rough guess
  Average number of corpus lines to read until you find a good one:
    Unsorted: 20
    Sorted: 5

Corpus choice

Started with BNC but
  Too old
  Not enough examples
    If no good examples in corpus, GDEX can’t help
Changed to UKWaC
  20 times bigger; from web; contemporary
  Better
  Most web junk filtered out
  Usually a good example in top twenty

GDEX and TALC

TALC
  Teaching and Language Corpora
  Goal: bring corpora into language teaching
Usual problem
  Concordances are tough for learners to read
Way forward
  GDEX examples
  Half way between dictionary and corpus

GDEX: Models for use

More examples for dictionaries
  Speed up, as with MED
  or
  Fully automatic “more examples”
Corpus query tool
  Option in the Sketch Engine
  Only show concordances with high scores
Automatic collocations dictionary
  http://forbetterenglish.com

Recent developments

Configurable GDEX
  For other languages
  Interface to help set up
Commonest string
  Between ‘bare collocate’ and example
