PASCAL Challenge on Evaluating Machine Learning for Information Extraction


PASCAL CHALLENGE ON EVALUATING MACHINE LEARNING FOR INFORMATION EXTRACTION

Designing Knowledge Management using Adaptive Information Extraction from Text

PASCAL Network of Excellence on Pattern Analysis, Statistical Modelling and Computational Learning

Call for participation:

Evaluating Machine Learning for Information Extraction

July 2004 - November 2004

The Dot.Kom European project and the PASCAL Network of Excellence invite you to participate in the Challenge on Evaluating Machine Learning for Information Extraction from Documents. The goal of the challenge is to assess the current state of Machine Learning (ML) algorithms for Information Extraction (IE), to identify future challenges and to foster additional research in the field. Given a corpus of annotated documents, participants will be expected to perform a number of tasks, each examining different aspects of the learning process.

Corpus

A standardised corpus of 1100 Workshop Calls for Papers (CFPs) will be provided. 600 of these documents will be annotated with 12 tags that relate to pertinent information (names, locations, dates, etc.). Of the annotated documents, 400 will be provided to the participants as a training set; the remaining 200 will form the unseen test set used in the final evaluation. All the documents will be pre-processed to include tokenisation, part-of-speech and named-entity information.

Tasks

Full scenario: The only mandatory task for participants is learning to annotate implicit information: given the 400 training documents, learn the textual patterns necessary to extract the annotated information. Each participant provides results of a four-fold cross-validation experiment using the same document partitions for pre-competitive tests. A final test will be performed on the 200 unseen documents.

Active learning: Learning to select documents. The 400 training documents will be divided into fixed subsets of increasing size (e.g. 10, 20, 30, 50, 75, 100, 150 and 200). Training on these subsets will show the effect of limited resources on the learning process. Secondly, given each subset, participants can select the documents to add in order to reach the next size (i.e. 10 to 20, 20 to 30, etc.), thus showing their ability to select the most suitable set of documents to annotate.

Enriched scenario: The same procedure as the full scenario, except that participants will be able to use the unannotated part of the corpus (500 documents). This will show how the use of unsupervised or semi-supervised methods can improve the results of supervised approaches. An interesting variant of this task could concern the use of unlimited resources, e.g. the Web.
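As an illustration of the subset protocol described above, the following Python sketch (not part of the challenge materials; document ids and the selection score are hypothetical) builds fixed training subsets of increasing size and grows a subset to the next size using a participant-supplied selection criterion, as in the active-learning variant.

```python
import random

# Subset sizes named as examples in the task description.
SUBSET_SIZES = [10, 20, 30, 50, 75, 100, 150, 200]

def fixed_subsets(doc_ids, sizes=SUBSET_SIZES, seed=0):
    """Nested training subsets of increasing size drawn from the training documents."""
    rng = random.Random(seed)
    shuffled = list(doc_ids)
    rng.shuffle(shuffled)
    return {n: shuffled[:n] for n in sizes}

def grow_subset(current, pool, n_extra, score_fn):
    """Active-learning variant: add the n_extra unused documents that the
    participant's own selection criterion (score_fn) ranks highest."""
    used = set(current)
    remaining = [d for d in pool if d not in used]
    chosen = sorted(remaining, key=score_fn, reverse=True)[:n_extra]
    return list(current) + chosen

if __name__ == "__main__":
    docs = [f"cfp_{i:03d}" for i in range(400)]   # hypothetical document ids
    subsets = fixed_subsets(docs)
    rng = random.Random(1)
    # Grow the 10-document seed set to 20 using a dummy selection score.
    grown = grow_subset(subsets[10], docs, 10, score_fn=lambda d: rng.random())
    print(len(subsets[10]), len(grown))            # 10 20
```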

Participation

Participants from different fields such as machine learning, text mining, natural language processing, etc. are welcome. Participation in the challenge is free. After registration, participants will receive the corpus of documents to train on and precise instructions on the tasks to be performed. At an established date, participants will be required to submit their systems' answers via a Web portal; an automatic scorer will compute the accuracy of extraction. Participants will also produce a paper describing their system and the results obtained. Results of the challenge will be discussed in a dedicated workshop.

Timetable

5th July 2004: Formal definition of the tasks, annotated corpus and evaluation server
15th October 2004: Formal evaluation
November 2004: Presentation of evaluation at the PASCAL workshop

Organizers

Fabio Ciravegna, University of Sheffield, UK (coordinator)
Mary Elaine Califf, Illinois State University, USA

Neil Ireson

Local Challenge Coordinator

Web Intelligent Group, Department of Computer Science, University of Sheffield, UK


Organisers

• Sheffield – Fabio Ciravegna

• UCD Dublin – Nicholas Kushmerick

• ITC-IRST – Alberto Lavelli

• University of Illinois – Mary-Elaine Califf

• FairIsaac – Dayne Freitag


Outline

• Challenge Goals

• Data

• Tasks

• Participants

• Experimental Results

• Conclusions

Goal: Provide a testbed for comparative evaluation of ML-based IE

• Standardisation
  – Data
    • Partitioning
    • Same set of features
      – Corpus preprocessed using GATE
      – No features allowed other than the ones provided
  – Explicit tasks
  – Evaluation metrics
• For future use
  – Available for further tests with the same or new systems
  – Possible to publish new corpora or tasks

Data (Workshop CFP)

[Slide diagrams: the corpus consists of Workshop Calls for Papers dated 1993-2005. Training data: 400 Workshop CFPs, divided into four partitions (Set0-Set3) for cross-validation, each partition further split into ten subsets (0-9) for the learning-curve task. Testing data: 200 Workshop CFPs. Additional unannotated data: 250 Workshop CFPs and 250 Conference CFPs, plus documents from the Web (WWW).]

Annotation Slots                        Training Corpus       Test Corpus

Workshop
  name                                    543   11.8%          245   10.8%
  acronym                                 566   12.3%          243   10.7%
  homepage                                367    8.0%          215    9.5%
  location                                457   10.0%          224    9.9%
  date                                    586   12.8%          326   14.3%
  paper submission date                   590   12.9%          316   13.9%
  notification of acceptance date         391    8.5%          190    8.4%
  camera-ready copy date                  355    7.7%          163    7.2%
Conference
  name                                    204    4.5%           90    4.0%
  acronym                                 420    9.2%          187    8.2%
  homepage                                104    2.3%           75    3.3%

Total                                    4583  100.0%         2274  100.0%

Preprocessing

• GATE
  – Tokenisation
  – Part-of-speech
  – Named entities: Date, Location, Person, Number, Money
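The corpus was distributed already preprocessed with GATE. Purely to illustrate the kind of token-level features participants received (token, part-of-speech tag, named-entity type), here is a minimal sketch using spaCy as a stand-in; the organisers did not use spaCy, and its tag sets differ from GATE's.

```python
import spacy

# Illustrative only: the challenge corpus was preprocessed with GATE, not spaCy,
# and GATE's named-entity types were Date, Location, Person, Number and Money.
nlp = spacy.load("en_core_web_sm")

def preprocess(text):
    """Return (token, part-of-speech, named-entity type) triples per token;
    the entity type is empty for tokens outside any entity."""
    doc = nlp(text)
    return [(tok.text, tok.pos_, tok.ent_type_) for tok in doc]

if __name__ == "__main__":
    sample = "The workshop will be held in Sheffield on 15 October 2004."
    for token, pos, entity in preprocess(sample):
        print(f"{token}\t{pos}\t{entity}")
```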

Evaluation Tasks

• Task1 – ML for IE: annotating implicit information
  – 4-fold cross-validation on the 400 training documents
  – Final test on the 200 unseen test documents
• Task2a – Learning Curve
  – Effect of increasing amounts of training data on learning
• Task2b – Active Learning: learning to select documents
  – Given seed documents, select the documents to add to the training set
• Task3a – Semi-supervised Learning: given data
  – Same as Task1, but the 500 unannotated documents can be used
• Task3b – Semi-supervised Learning: any data
  – Same as Task1, but all available unannotated documents can be used

Evaluation

• Precision / Recall / F1 measure

• MUC Scorer

• Automatic Evaluation Server

• Exact matching

• Extract every slot occurrence
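The official scoring used the MUC scorer via the automatic evaluation server. The sketch below is only a simplified illustration of exact-match scoring over slot occurrences, making the precision, recall and F1 definitions concrete (F1 = 2PR / (P + R)); it is not the challenge scorer, and the slot names are just examples.

```python
from collections import Counter

def score(gold, predicted):
    """Exact-match precision, recall and F1 over (slot, fill) occurrences.
    Simplified stand-in for the MUC scorer used by the evaluation server."""
    gold_counts, pred_counts = Counter(gold), Counter(predicted)
    # An extracted occurrence is correct only if slot name and fill match exactly.
    correct = sum(min(gold_counts[k], c) for k, c in pred_counts.items())
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

if __name__ == "__main__":
    gold = [("workshop date", "15 October 2004"), ("location", "Sheffield")]
    pred = [("workshop date", "15 October 2004"), ("location", "Sheffield, UK")]
    print(score(gold, pred))   # (0.5, 0.5, 0.5)
```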

Participants

Participant | ML | 4-fold X-validation: 1, 2a, 2b, 3a, 3b | Test Corpus: 1, 2a, 2b, 3a, 3b

Amilcare (Sheffield, UK) LP2 2 2 1 1 1 1 1

Bechet (Avignon, France) HMM 2 1 2 2

Canisius (Netherlands) SVM, IBL 1 1

Finn (Dublin, Ireland) SVM 1 1

Hachey (Edinburgh, UK) MaxEnt, HMM 1 1

ITC-IRST (Italy) SVM 3 3 1

Kerloch (France) HMM 2 2 3 2

Sigletos (Greece) LP2, BWI, ? 1 3

Stanford (USA) CRF 1 1

TRex (Sheffield, UK) SVM 2

Yaoyong (Sheffield, UK) SVM 3 3 3 3 3 3

Total 15 8 4 0 0 20 10 5 1 1


Task1

Information Extraction with all the available data

Task1: Test Corpus

[Precision vs. recall scatter plots for the test corpus: a full-scale view (0-1) and a zoomed view (0.2-0.9) of all ten systems (Amilcare, Stanford, Yaoyong, ITC-IRST, Sigletos, Canisius, Trex, Bechet, Finn, Kerloch), plus a zoomed view restricted to Yaoyong, ITC-IRST, Canisius, Trex and Finn.]

Task1: 4-Fold Cross-validation and Test Corpus

[Precision vs. recall scatter plots (0.2-0.9) for the 4-fold cross-validation results, and for the cross-validation and test-corpus results together, covering Amilcare, Yaoyong, ITC-IRST, Sigletos, Canisius, Bechet, Finn and Kerloch.]

Task1: Slot FMeasure

[Plot of F-measure per slot (0 to 1), showing the mean and maximum across systems.]

Best Slot FMeasures, Task1: Test Corpus

Slot                              Amilcare1  Yaoyong1  Stanford1  Yaoyong2  ITC-IRST2
Workshop
  name                               0.352     0.580     0.596      0.542     0.660
  acronym                            0.865     0.612     0.496      0.600     0.383
  date                               0.694     0.731     0.752      0.690     0.589
  homepage                           0.721     0.748     0.671      0.705     0.516
  location                           0.488     0.641     0.647      0.660     0.542
  paper submission date              0.864     0.740     0.712      0.696     0.712
  notification of acceptance date    0.889     0.843     0.819      0.856     0.853
  camera-ready copy date             0.870     0.750     0.784      0.747     0.783
Conference
  name                               0.551     0.503     0.493      0.477     0.481
  acronym                            0.905     0.445     0.491      0.387     0.348
  homepage                           0.393     0.149     0.151      0.116     0.119


Task 2a

Learning Curve

Task2a: Learning Curves

[Line plots of F-measure, precision and recall against increasing amounts of training data (x-axis 0.1 to 0.9), for Amilcare, Yaoyong1-3, ITC-IRST1, Bechet1-2, Kerloch2-3, Hachey and the mean over systems.]


Task 2b

Active Learning

Task2b: Active Learning

• Amilcare
  – Maximum divergence from the expected number of tags
• Hachey
  – Maximum divergence between two classifiers built on different feature sets (sketched below)
• Yaoyong (Gram-Schmidt)
  – Maximum divergence between example subsets
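As a rough illustration of the committee-style strategy attributed to Hachey above (two classifiers trained on different feature sets), the sketch below ranks unlabelled documents by classifier disagreement and selects the most divergent ones for annotation. The classifiers here are dummy stand-ins; none of this is the participants' actual code.

```python
import random

def disagreement(labels_a, labels_b):
    """Fraction of token labels on which two classifiers disagree."""
    return sum(a != b for a, b in zip(labels_a, labels_b)) / max(len(labels_a), 1)

def select_for_annotation(unlabelled, predict_a, predict_b, n):
    """Rank unlabelled documents by the disagreement between two classifiers
    (assumed trained on different feature sets) and return the n most divergent."""
    ranked = sorted(unlabelled,
                    key=lambda doc: disagreement(predict_a(doc), predict_b(doc)),
                    reverse=True)
    return ranked[:n]

if __name__ == "__main__":
    rng = random.Random(0)
    docs = [[f"tok{i}" for i in range(20)] for _ in range(50)]   # dummy tokenised documents
    # Dummy stand-ins for two classifiers trained on different feature views.
    predict_a = lambda doc: ["O"] * len(doc)
    predict_b = lambda doc: [rng.choice(["O", "B-date"]) for _ in doc]
    print(len(select_for_annotation(docs, predict_a, predict_b, n=10)))   # 10
```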

Task2b: Active Learning – Increased FMeasure over random selection

[Line plot of the gain in F-measure relative to random document selection (roughly -0.03 to +0.05) against increasing amounts of training data (x-axis 0.1 to 0.9), for Amilcare, Yaoyong1-3 and Hachey.]

Task 3

Semi-supervised Learning

(insufficient participation)

Conclusions (Task1)

• The top three (four) systems use different algorithms
  – rule induction, SVM, CRF and HMM
• The same algorithm (SVM) produced different results across systems
• Brittle performance
• Large variation in per-slot performance
• Post-processing

Conclusion (Task2 & Task3)

• Task 2a: Learning Curve
  – Systems' performance is largely as expected
• Task 2b: Active Learning
  – Two approaches, Amilcare and Hachey, showed benefits
• Task 3: Semi-supervised Learning
  – Not sufficient participation to evaluate the use of enriched data

Future Work

• Performance differences
  – Systems: what determines good/bad performance
  – Slots: different systems were better/worse at identifying different slots
• Combine approaches
  – Active Learning
  – Semi-supervised Learning
  – Overcoming the need for annotated data
• Extensions
  – Data: use different data sets and other features, including (HTML) structured data
  – Tasks: relation extraction


Thank You

http://tyne.shef.ac.uk/Pascal
