Machine Learning for Information Extraction
Li Xu
Objective
• Learn how to apply machine learning concepts to an application
• Learn how to improve the performance of an existing application by applying machine learning algorithms
Introduction
• Information Extraction (IE) is concerned with extracting the relevant data from a collection of documents.
• Key component: extraction patterns.
• Machine Learning algorithms.
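To make the notion of an extraction pattern concrete, here is a minimal sketch, assuming a hand-written pattern for a hypothetical rental-ad sentence. The pattern, the slot names (`bedrooms`, `price`), and the sample text are illustrative assumptions and are not taken from any of the systems named in these slides. A learning-based IE system would induce such patterns from annotated examples instead of having them written by hand.

```python
import re

# Hypothetical hand-written extraction pattern with two slots.
# A learning-based IE system would induce patterns like this from labeled examples.
RENTAL_PATTERN = re.compile(
    r"(?P<bedrooms>\d+)\s*(?:br|bedroom)s?\b.*?\$\s*(?P<price>\d+)",
    re.IGNORECASE,
)


def extract_rental(text):
    """Return a dict of slot fillers, or None if the pattern does not match."""
    match = RENTAL_PATTERN.search(text)
    if match is None:
        return None
    return {
        "bedrooms": int(match.group("bedrooms")),
        "price": int(match.group("price")),
    }


# Made-up sample ad, for illustration only.
print(extract_rental("Capitol Hill - 2 br apartment, close to bus line, $675"))
# {'bedrooms': 2, 'price': 675}
```

Because one pattern fills both slots at once, this is a multi-slot rule; a single-slot rule would extract only one field per pattern.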
IE for Free Text
• Syntactic and semantic constraints
• AutoSlog
• LIEP
• PALKA
• CRYSTAL
• CRYSTAL + Webfoot
• HASTEN
IE from Online Documents
• WHISK (Soderland 1998)
– Domain: rental ads
– Precision: ~95%; Recall: 73%-90%
• RAPIER (Califf & Mooney 1997)
– Domain: software jobs
– Precision: 84%; Recall: 53%
• SRV (Freitag 1998)
– Domain: seminar announcements
– Precision: speaker, 75%; location, 75%; start time, 99%; end time, 96%.
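The precision and recall figures above can be read as follows. This is a minimal sketch, assuming exact-match scoring over (document, slot, value) triples; the cited systems may have used different matching criteria, and the sample data below is made up.

```python
def precision_recall(predicted, gold):
    """Compute precision and recall over sets of (doc_id, slot, value) triples."""
    predicted, gold = set(predicted), set(gold)
    correct = len(predicted & gold)  # extractions that exactly match the answer key
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical answer key and system output, for illustration only.
gold = {
    ("ad1", "price", "675"),
    ("ad1", "bedrooms", "2"),
    ("ad2", "price", "900"),
    ("ad2", "bedrooms", "1"),
}
predicted = {
    ("ad1", "price", "675"),
    ("ad1", "bedrooms", "2"),
    ("ad2", "price", "950"),  # wrong value, counts against precision
}

p, r = precision_recall(predicted, gold)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.67 recall=0.50
```

High precision with lower recall, as reported for RAPIER, means the rules rarely extract wrong values but miss many correct ones.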
WHISK
RAPIER
SRV
Problems
• Bottom-up search
– RAPIER
– WHISK
• Single-slot extraction rules
– SRV
– RAPIER
• Heavy dependence on layout patterns
Obituary Ontology
Improvement
Lexical Object
• Relational Learning
– FOIL
– Feature design
• Regular expression
• Rote Learning
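Two of the options above, rote learning and regular expressions, can be sketched briefly. This is a minimal illustration, assuming that rote learning means memorizing the fillers seen in training and matching them verbatim; the class name `RoteLearner`, the date pattern, and the obituary-style sample text are all hypothetical and not taken from the slides.

```python
import re


class RoteLearner:
    """Memorize slot fillers seen in training and match them verbatim in new text."""

    def __init__(self):
        self.fillers = set()

    def train(self, examples):
        # examples: iterable of filler strings annotated in training documents
        for filler in examples:
            self.fillers.add(filler.lower())

    def extract(self, text):
        lowered = text.lower()
        # Return memorized fillers (lowercased) that occur in the text, longest first.
        found = [f for f in self.fillers if f in lowered]
        return sorted(found, key=lambda f: (-len(f), f))


# Hypothetical regular expression for dates written like "March 3, 1998".
MONTHS = (
    "January|February|March|April|May|June|July|"
    "August|September|October|November|December"
)
DATE_RE = re.compile(rf"\b(?:{MONTHS})\s+\d{{1,2}},\s+\d{{4}}\b")

learner = RoteLearner()
learner.train(["Memorial Chapel", "Oak Hill Cemetery"])  # made-up training fillers

text = "Services will be held March 3, 1998 at Memorial Chapel."
print(learner.extract(text))  # ['memorial chapel']
print(DATE_RE.findall(text))  # ['March 3, 1998']
```

Rote learning only recognizes values it has already seen, while a regular expression generalizes to unseen values that share a fixed surface form.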
Multi-slot Hierarchy
Multi-slot Boundary
• Relational Learning
• Feature Design
– Individual heuristics
– Combining heuristics
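Relational learners in the FOIL family grow a rule by repeatedly adding the candidate literal (here, a feature test) with the highest information gain. The sketch below shows that gain computation under the standard FOIL formulation, simplified to one binding per example. The feature names and counts are made up for illustration and do not come from the slides.

```python
import math


def foil_gain(p0, n0, p1, n1):
    """FOIL information gain for adding one literal to a rule.

    p0, n0: positive/negative examples covered before adding the literal.
    p1, n1: positive/negative examples covered after adding it.
    Simplification (assumption): each example yields a single binding, so the
    number of positives still covered equals p1.
    """
    if p0 == 0 or p1 == 0:
        return 0.0  # convention in this sketch: no positives covered, no gain
    before = math.log2(p0 / (p0 + n0))
    after = math.log2(p1 / (p1 + n1))
    return p1 * (after - before)


# Hypothetical candidate features for a boundary rule that currently
# covers 50 positive and 50 negative examples.
candidates = {
    "capitalized": (45, 30),     # (positives, negatives) covered after adding the test
    "preceded_by_at": (30, 2),
}
best = max(candidates, key=lambda name: foil_gain(50, 50, *candidates[name]))
print(best)  # preceded_by_at
```

The gain rewards literals that keep many positive examples while excluding negatives, which is why the more selective test wins here even though it covers fewer positives.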
Conclusion
• How can machine learning algorithms be applied to IE?
• What are the problems of each system?
• How can an existing IE approach be improved through machine learning, and how can the problems that appeared in other machine-learning-based IE systems be avoided?