Proposal for Synergistic Name Extraction from Historical Text Documents

Upload: roberta-lee

Post on 29-Dec-2015




Synergistic Name Extraction

• System operation
  – Shift burden to the system to the extent possible
  – Smoothly ranges from fully manual to fully automatic
  – Full control over level of automation
• Expectations
  – Immediate improvement (click to fill-in form ready for tech transfer)
    • Accuracy can be as good as manual extraction (can guarantee)
    • Time to extract reduced (likely significantly; potential reduction to 0)
  – Can use while still being studied and initially developed
• Green Interaction
  – Improves with use
  – Learns from observation and being corrected
  – Can be bootstrapped from scratch (needed for new language)

Synergy: The interaction of two or more agents or forces so that their combined effect is greater than the sum of their individual effects.


Levels of Automation

• Manual
  – Type in all info of interest, both stated and inferred
  – Is the church’s current extraction system (except using DEG’s form-based interface)
• Manual-minimal
  – Click-only form fill-in of stated information
  – Reasoner provides inferred information
  – Manual check
    • All info of interest displayed
    • Correction option through manual editing
• Synergistic
  – Initial automatic form fill-in of info, both stated and inferred
  – Manual check and edit
• Automatic with auditing
  – Automatic form fill-in for a batch of names
  – Auditor samples and checks accuracy
    • If accuracy is deemed sufficient, accept
    • If insufficient, redo synergistically
• Automatic without auditing
  – Fully automatic extraction and linking to the FamilySearch tree
  – Patrons notified when viewing information extracted automatically
    • Patrons can view source and extraction & inference results
    • Patrons can, and indeed have the responsibility to, make corrections
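The "Automatic with auditing" level can be illustrated with a small sketch. It is only one possible reading of "auditor samples and checks accuracy": the function name `audit_batch`, the `is_correct` callback (standing in for the human auditor's judgment), the sample size, and the accuracy threshold are all assumptions made for illustration and do not come from the proposal.

```python
import random


def audit_batch(records, is_correct, sample_size, min_accuracy, seed=None):
    """Audit an automatically extracted batch by random sampling.

    records      -- list of extracted records (hypothetical structure)
    is_correct   -- callable standing in for the auditor's check of one record
    sample_size  -- how many records the auditor inspects (assumed parameter)
    min_accuracy -- acceptance threshold in [0, 1] (assumed parameter)
    Returns (observed_accuracy, decision).
    """
    if not records:
        raise ValueError("cannot audit an empty batch")
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")

    rng = random.Random(seed)
    k = min(sample_size, len(records))  # never sample more than the batch holds
    sample = rng.sample(records, k)     # sampling without replacement

    correct = sum(1 for record in sample if is_correct(record))
    accuracy = correct / k

    # Accept the batch if the sampled accuracy is sufficient;
    # otherwise send it back to the synergistic (check-and-edit) level.
    decision = "accept" if accuracy >= min_accuracy else "redo_synergistically"
    return accuracy, decision
```

In practice the threshold and sample size would be policy decisions; the sketch only shows the accept-or-redo control flow described on the slide.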


Obituaries Project

• Demo with the synergistic system in mind
• Demo
  – 1998 system demo
  – Korean & French demo
  – FROntIER demo
    • Includes extraction and inference
    • Caution: knowledge engineering not complete
  – Synergistic (mock-up of editing)
  – Manual click-only demo
  – NewsBank ??
    • Can we strike a deal now based on click-only?
    • Should we first run a pilot experiment to provide numbers for decision making?
      – Fully manual to establish baseline
      – Manual-minimal to see if we’re enough better than the baseline to commit now
      – Do knowledge engineering for FROntIER obit extraction to see if we can either:
        » Develop and use automated synergistic system
        » Accept the accuracy levels to go fully automatic


Scanned Book Project

• Alter envisioned experiment (slightly)
  – New objective: investigate accuracy & cost with respect to levels of automation
  – Only a slight change; should be acceptable to all
• Missionary task changes
  – Single form with fields for all info of interest
  – Two modes of operation (with different groups of missionaries)
    • Type all
    • Click-only (& let FROntIER get inferred info)
• Thesis experiment changes
  – No change to the experiment we’re planning
  – Additional follow-on experimental work
    • Using the OntologyEditor, display and check results (as directed by low precision & recall)
    • Edit by hand and rerun to get improved precision & recall results
• What we can learn from the slightly altered experiment
  – Type-in missionaries establish baseline
  – Click-only missionaries yield estimated results for manual-minimal level
  – Knowledge-engineering experiment:
    • Estimates the cost and expertise of knowledge engineering (as originally planned)
    • Establishes the potential for the fully automatic level (as originally planned)
    • Roughly estimates accuracy & cost for the synergistic level (i.e., with added fix)
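Since the follow-on work is steered by precision and recall, a minimal sketch of how those two figures might be computed for one document is given here. It assumes extracted facts can be represented as (field, value) pairs and compared by exact match against a hand-checked set; the function name and the sample values are invented for illustration and are not taken from the experiment.

```python
def precision_recall(extracted, gold):
    """Compute precision and recall of extracted facts against a gold standard.

    Both arguments are iterables of hashable facts, here assumed to be
    (field, value) pairs compared by exact match.
    """
    extracted, gold = set(extracted), set(gold)
    true_positives = len(extracted & gold)

    # Precision: share of extracted facts that are correct.
    precision = true_positives / len(extracted) if extracted else 0.0
    # Recall: share of gold-standard facts that were found.
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Made-up illustrative facts, not real extraction results.
extracted_facts = [("name", "Mary Ely"), ("death_date", "1998"), ("spouse", "John")]
gold_facts = [("name", "Mary Ely"), ("death_date", "1998"),
              ("birth_date", "1920"), ("child", "Ann")]

precision, recall = precision_recall(extracted_facts, gold_facts)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Rerunning such a measurement after each round of hand edits would show whether precision and recall improve, which is the loop the follow-on work describes.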