Bootstrapping Regular-Expression Recognizer to Help Human Annotators
Tae Woo Kim
Background
• Human annotators annotate entities
• Top to bottom, one person at a time
• They annotate whatever they can find
Background
[Figure: annotation form "Person" with fields Name, Birth date, Death date, Residence, Father, Mother]
Values entered on the form: Mary Eliza Warner; 1826; Samuel Selden Warner; Azubah Tully Warner
Background
[Figure: annotation form "Person" with fields Name, Birth date, Death date, Residence, Father, Mother]
Values entered on the form: Samuel Selden Warner
Background
• The form fills out the ontology snippet
Motivation
• Too many genealogical documents for human annotators
• 611,923 historical documents and family trees with Ely
• The documents present information in similar patterns
• Why not use these patterns?
Solution
• While human annotators annotate entities, the system watches and learns
• Break the text of the documents into sentence fragments
• Find sentence fragments that follow the same pattern
• Turn the pattern into regular expressions
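The three steps above can be sketched roughly as follows. This is only an illustration under stated assumptions: the slides do not say how fragments are split or abstracted, so the line-based splitting, the `[name]`/`[date]`/`[num]` token classes, the `min_support` threshold, and all function names here are hypothetical, and the sample text is invented.

```python
import re
from collections import defaultdict

# Hypothetical token classes used to abstract a fragment into a pattern.
ABSTRACTION = re.compile(
    r"(?P<name>[A-Z][a-z]+(?: [A-Z][a-z]+)+)"   # two or more capitalized words
    r"|(?P<date>\b\d{4}\b)"                      # four-digit year
    r"|(?P<num>\b\d+\b)"                         # any other number
)
# Capturing sub-expression emitted for each token class.
CAPTURE = {
    "name": r"([A-Z][a-z]+(?:\s[A-Z][a-z]+)+)",
    "date": r"(\d{4})",
    "num": r"(\d+)",
}

def split_fragments(text):
    """Assumption: one sentence fragment per non-empty line."""
    return [line.strip() for line in text.splitlines() if line.strip()]

def abstract(fragment):
    """Return (pattern key, regex source) for one fragment."""
    key, rx, pos = [], [], 0
    for m in ABSTRACTION.finditer(fragment):
        literal = fragment[pos:m.start()]
        key += [literal, f"[{m.lastgroup}]"]
        rx += [re.escape(literal), CAPTURE[m.lastgroup]]
        pos = m.end()
    tail = fragment[pos:]
    return "".join(key + [tail]), "".join(rx + [re.escape(tail)])

def learn_patterns(text, min_support=2):
    """Keep only patterns shared by at least min_support fragments."""
    counts, regex_for = defaultdict(int), {}
    for fragment in split_fragments(text):
        key, rx = abstract(fragment)
        counts[key] += 1
        regex_for[key] = rx
    return {k: re.compile(regex_for[k]) for k, n in counts.items() if n >= min_support}

# Invented sample text, not taken from the slides.
sample = """1. Mary Warner, b. 1826, d. 1850.
2. Samuel Warner, b. 1830, d. 1901.
Lived in Lyme."""
patterns = learn_patterns(sample)
```

Here the first two lines share one pattern and yield a single compiled expression, while the third line has no repeated pattern and is dropped.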
[Figure panels: "What human annotators have" and "What the system has"]
Solution
Pattern (where _ stands for a space): [1-digit num.]._[name],_b._[date],_d._[date].
Regular expression (literal periods escaped): (\d)\.\s([A-Z][a-z]+\s[A-Z][a-z]+),\sb\.\s(\d{4}),\sd\.\s(\d{4})\.
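As a quick check of the expression on this slide, the sketch below compiles it with the literal periods written as `\.` and applies it to a sample line. The sample line and the variable names are invented for illustration. An unescaped `.` matches any character, so the loose variant also accepts malformed input.

```python
import re

# Periods escaped so that "." only matches a literal period.
STRICT = re.compile(
    r"(\d)\.\s([A-Z][a-z]+\s[A-Z][a-z]+),\sb\.\s(\d{4}),\sd\.\s(\d{4})\."
)
# Same expression with unescaped periods, which match any character.
LOOSE = re.compile(
    r"(\d).\s([A-Z][a-z]+\s[A-Z][a-z]+),\sb.\s(\d{4}),\sd.\s(\d{4})."
)

# Invented sample lines, not taken from the slides.
good_line = "1. Mary Warner, b. 1826, d. 1850."
bad_line = "1x Mary Warner, bx 1826, dx 1850x"

strict_match = STRICT.fullmatch(good_line)
number, name, birth, death = strict_match.groups()

strict_rejects_bad = STRICT.fullmatch(bad_line) is None
loose_accepts_bad = LOOSE.fullmatch(bad_line) is not None
```

Note that the name sub-expression only covers exactly two capitalized words, so a three-part name such as "Mary Eliza Warner" would need a broader pattern.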
Solution
• Run the regular expressions over the rest of the documents
• The ontology snippet can be filled out with the extracted data
• The system fills out the form for the annotators
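A minimal sketch of this pre-filling step, assuming the form fields shown earlier (Name, Birth date, Death date, Residence, Father, Mother). The `Person` class, the `prefill_forms` function, the named groups, and the sample documents are all hypothetical; the slides do not describe the actual data structures.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Person:
    """Hypothetical stand-in for the annotation form / ontology snippet."""
    name: Optional[str] = None
    birth_date: Optional[str] = None
    death_date: Optional[str] = None
    residence: Optional[str] = None
    father: Optional[str] = None
    mother: Optional[str] = None

# A learned expression, with named groups matching the form fields.
RECORD = re.compile(
    r"\d\.\s(?P<name>[A-Z][a-z]+\s[A-Z][a-z]+),"
    r"\sb\.\s(?P<birth_date>\d{4}),\sd\.\s(?P<death_date>\d{4})\."
)

def prefill_forms(documents: List[str]) -> List[Person]:
    """Run the expression over every document and pre-fill one form per match."""
    forms = []
    for text in documents:
        for match in RECORD.finditer(text):
            # Fields the expression does not capture stay empty for the annotator.
            forms.append(Person(**match.groupdict()))
    return forms

# Invented documents, not taken from the slides.
documents = [
    "1. Mary Warner, b. 1826, d. 1850. 2. Samuel Warner, b. 1830, d. 1901.",
    "No matching records in this document.",
]
forms = prefill_forms(documents)
```

Fields left as `None` remain for the human annotator to complete or correct.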
Conclusion
• The regular-expression recognizer watches and learns from human annotators
• It generates regular expressions that find entities for the annotators
• The system gets better and better as it learns more patterns