Transcript
Page 1: Machine Learning for Information Extraction

Machine Learning for Information Extraction

Li Xu

Page 2: Machine Learning for Information Extraction

Objective

• Learn how to apply machine learning concepts to an application

• Learn how to improve the performance of an existing application by applying machine learning algorithms

Page 3: Machine Learning for Information Extraction

Introduction

• Information Extraction (IE) is concerned with extracting relevant data from a collection of documents.

• Key component: extraction patterns.

• Machine Learning algorithms.

Page 4: Machine Learning for Information Extraction

IE for Free Text

• Syntactic and semantic constraints

• AutoSlog

• LIEP

• PALKA

• CRYSTAL

• CRYSTAL + Webfoot

• HASTEN

Page 5: Machine Learning for Information Extraction

IE from Online Documents

• WHISK (Soderland 1998)
  – Domain: rental ads
  – Precision: ~95%; Recall: 73%-90%

• RAPIER (Califf & Mooney 1997)
  – Domain: software jobs
  – Precision: 84%; Recall: 53%

• SRV (Freitag 1998)
  – Domain: seminar announcements
  – Precision: speaker, 75%; location, 75%; start time, 99%; end time, 96%
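The precision and recall figures quoted for these systems follow the usual definitions: precision is the fraction of extracted items that are correct, and recall is the fraction of correct items that were extracted. A minimal sketch of that arithmetic is given here; the function name `precision_recall` and the rental-ad slot values are hypothetical and are not taken from any of the cited systems.

```python
def precision_recall(extracted, gold):
    """Return (precision, recall) for extracted items against gold items."""
    extracted, gold = set(extracted), set(gold)
    correct = len(extracted & gold)  # items that are both extracted and correct
    precision = correct / len(extracted) if extracted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical rental-ad example: (slot, filler) pairs.
gold = {("bedrooms", "2"), ("price", "$675"), ("neighborhood", "Capitol Hill")}
extracted = {("bedrooms", "2"), ("price", "$675")}

precision, recall = precision_recall(extracted, gold)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this made-up case every extracted pair is correct, but one gold pair is missed, so precision is high while recall is lower.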

Page 6: Machine Learning for Information Extraction

WHISK

Page 7: Machine Learning for Information Extraction

RAPIER

Page 8: Machine Learning for Information Extraction

SRV

Page 9: Machine Learning for Information Extraction

Problems

• Bottom-up search
  – RAPIER
  – WHISK

• Single-slot extraction rules
  – SRV
  – RAPIER

• Heavy dependence on the layout pattern
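One way to see why single-slot rules can be a limitation is sketched here. The regular expressions and the ad text are invented for illustration and do not reproduce the rule languages of SRV, RAPIER, or WHISK. Single-slot rules fill each slot independently, so the link between values that belong to the same record is lost; a multi-slot rule extracts related values together.

```python
import re

# Hypothetical ad text containing two rental listings.
ad = "Capitol Hill 1 br $675. Also Ballard 3 br $995."

# Single-slot style: each rule fills one slot on its own.
bedrooms = re.findall(r"(\d+)\s*br", ad)
prices = re.findall(r"\$(\d+)", ad)

# Multi-slot style: one rule extracts the related values as a tuple.
listings = re.findall(r"(\d+)\s*br\s*\$(\d+)", ad)

print(bedrooms, prices)  # two separate lists with no explicit pairing
print(listings)          # (bedrooms, price) pairs kept together
```

With the single-slot output, a later step still has to decide which price belongs to which bedroom count; the multi-slot output already carries that association.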

Page 10: Machine Learning for Information Extraction

Obituary Ontology

Page 11: Machine Learning for Information Extraction

Improvement

Page 12: Machine Learning for Information Extraction

Lexical Object

• Relational Learning
  – FOIL
  – Feature design

• Regular expression

• Rote Learning
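Two of the options listed above, regular expressions and rote learning, can be sketched briefly. This is an illustrative sketch under assumptions, not the method from the slides: the date pattern, the `RoteLearner` class, and the obituary sentence are all hypothetical. The regular expression recognizes a lexical object by its form, while the rote learner simply memorizes fillers seen in training data and looks for them in new text.

```python
import re

# Assumed pattern for dates such as "March 3, 1998".
DATE_RE = re.compile(
    r"\b(?:January|February|March|April|May|June|July|August|"
    r"September|October|November|December)\s+\d{1,2},\s+\d{4}\b"
)


class RoteLearner:
    """Memorizes slot fillers seen in training and matches them verbatim."""

    def __init__(self):
        self.known = set()

    def train(self, fillers):
        # Store fillers in lowercase so matching is case-insensitive.
        self.known.update(f.lower() for f in fillers)

    def extract(self, text):
        lowered = text.lower()
        return sorted(f for f in self.known if f in lowered)


text = "John Smith died on March 3, 1998 in Provo."
dates = DATE_RE.findall(text)

places = RoteLearner()
places.train(["Provo", "Salt Lake City"])
found = places.extract(text)
```

The rote learner cannot recognize a filler it has never seen, which is one reason it is usually combined with more general techniques such as patterns or learned rules.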

Page 13: Machine Learning for Information Extraction

Multi-slot Hierarchy

Page 14: Machine Learning for Information Extraction

Multi-slot Boundary

• Relational Learning

• Feature Design
  – Individual heuristics
  – Combining heuristics

Page 15: Machine Learning for Information Extraction

Conclusion

• How can machine learning algorithms be applied to IE?

• What are the problems with each system?

• How can an existing IE approach be improved through machine learning, and how can the problems that appeared in other machine-learning-based IE systems be avoided?

