Proposal for Synergistic Name Extraction from Historical Text Documents
Synergistic Name Extraction
• System operation
  – Shift the burden to the system to the extent possible
  – Smoothly ranges from fully manual to fully automatic
  – Full control over the level of automation
• Expectations
  – Immediate improvement (click-to-fill-in form ready for tech transfer)
    • Accuracy can be as good as manual extraction (can be guaranteed)
    • Time to extract reduced (likely significantly; potentially to zero)
  – Usable while still being studied and initially developed
• Green Interaction
  – Improves with use
  – Learns from observation and from being corrected
  – Can be bootstrapped from scratch (needed for a new language)
Synergy: The interaction of two or more agents or forces so that their combined effect is greater than the sum of their individual effects.
Levels of Automation
• Manual
  – Type in all info of interest, both stated and inferred
  – Matches the Church's current extraction system (except for using DEG's form-based interface)
• Manual minimal
  – Click-only form fill-in of stated information
  – Reasoner provides inferred information
  – Manual check
    • All info of interest displayed
    • Correction option through manual editing
• Synergistic
  – Initial automatic form fill-in of info, both stated and inferred
  – Manual check and edit
• Automatic with auditing
  – Automatic form fill-in for a batch of names
  – Auditor samples and checks accuracy
    • If accuracy is deemed sufficient, accept
    • If insufficient, redo synergistically
• Automatic without auditing
  – Fully automatic extraction and linking to the FamilySearch tree
  – Patrons notified when viewing information extracted automatically
    • Patrons can view the source and the extraction & inference results
    • Patrons can, and indeed have the responsibility to, make corrections
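The "automatic with auditing" level amounts to an acceptance check on a random sample. The sketch below is a minimal illustration, not the project's auditing procedure: `audit_batch`, `next_step`, the sample size of 50, and the 95% acceptance threshold are all hypothetical, and the `is_correct` callback stands in for the auditor's manual judgment.

```python
import random
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class AuditResult:
    sampled: int    # number of records the auditor checked
    correct: int    # number judged correct
    accepted: bool  # True if the batch is accepted as is

    @property
    def estimated_accuracy(self) -> float:
        return self.correct / self.sampled


def audit_batch(
    records: Sequence[dict],
    is_correct: Callable[[dict], bool],  # stands in for the auditor's manual check
    sample_size: int = 50,               # hypothetical; the proposal gives no number
    threshold: float = 0.95,             # hypothetical "sufficient" accuracy
    seed: int = 0,                       # fixed seed so an audit can be reproduced
) -> AuditResult:
    """Randomly sample an automatically filled batch and decide whether to accept it."""
    if not records:
        raise ValueError("cannot audit an empty batch")
    rng = random.Random(seed)
    k = min(sample_size, len(records))
    sample = rng.sample(list(records), k)
    correct = sum(1 for r in sample if is_correct(r))
    return AuditResult(sampled=k, correct=correct, accepted=correct / k >= threshold)


def next_step(result: AuditResult) -> str:
    # Accept the batch, or send it back to the synergistic level
    # (automatic fill-in followed by manual check and edit).
    return "accept" if result.accepted else "redo synergistically"
```

A point estimate from a small sample is rough; a stricter audit might accept a batch only if a lower confidence bound on accuracy clears the threshold.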
Obituaries Project
• Demo with the synergistic system in mind
• Demo
  – 1998 system demo
  – Korean & French demo
  – FROntIER demo
    • Includes extraction and inference
    • Caution: knowledge engineering not complete
  – Synergistic (mock-up of editing)
  – Manual click-only demo
  – NewsBank ??
    • Can we strike a deal now based on click-only?
    • Should we first run a pilot experiment to provide numbers for decision making?
      – Fully manual to establish a baseline
      – Manual minimal to see whether it improves enough on the baseline to commit now
      – Do knowledge engineering for FROntIER obituary extraction to see if we can either:
        » Develop and use the automated synergistic system
        » Accept the accuracy levels to go fully automatic
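The proposed pilot compares conditions on cost and accuracy. As a sketch of the arithmetic only, assuming each condition records time spent and fields checked against a reference, comparing manual minimal with the fully manual baseline might look like this. `PilotRun`, `compare_to_baseline`, and both decision thresholds are hypothetical; the proposal does not say what counts as "enough" improvement.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class PilotRun:
    """Measurements from one pilot condition (e.g., fully manual or manual minimal)."""
    records: int          # records extracted in this condition
    total_minutes: float  # total extraction time for those records
    correct_fields: int   # extracted fields that matched a checked reference
    total_fields: int     # fields checked against that reference

    @property
    def minutes_per_record(self) -> float:
        return self.total_minutes / self.records

    @property
    def accuracy(self) -> float:
        return self.correct_fields / self.total_fields


def compare_to_baseline(
    baseline: PilotRun,
    candidate: PilotRun,
    min_time_reduction: float = 0.25,  # hypothetical: require at least 25% less time per record
    max_accuracy_drop: float = 0.0,    # hypothetical: allow no loss of accuracy
) -> Tuple[float, float, bool]:
    """Return (fraction of time saved per record, accuracy change, whether to commit)."""
    time_reduction = 1.0 - candidate.minutes_per_record / baseline.minutes_per_record
    accuracy_change = candidate.accuracy - baseline.accuracy
    commit = time_reduction >= min_time_reduction and accuracy_change >= -max_accuracy_drop
    return time_reduction, accuracy_change, commit
```

Per-record time is used rather than total time so that conditions with different numbers of records stay comparable.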
Scanned Book Project
• Alter the envisioned experiment (slightly)
  – New objective: investigate accuracy & cost with respect to levels of automation
  – Only a slight change; should be acceptable to all
• Missionary task changes
  – Single form with fields for all info of interest
  – Two modes of operation (with different groups of missionaries)
    • Type all
    • Click-only (& let FROntIER get inferred info)
• Thesis experiment changes
  – No change to the experiment we're planning
  – Additional follow-on experimental work
    • Using the OntologyEditor, display and check results (guided by where precision & recall are low)
    • Edit by hand and rerun to get improved precision & recall results
• What we can learn from the slightly altered experiment
  – Type-in missionaries establish the baseline
  – Click-only missionaries yield estimated results for the manual-minimal level
  – Knowledge-engineering experiment:
    • Estimates the cost and expertise of knowledge engineering (as originally planned)
    • Establishes the potential for the fully automatic level (as originally planned)
    • Roughly estimates accuracy & cost for the synergistic level (i.e., with the added fix)
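The follow-on work uses precision & recall to direct checking. A minimal sketch, assuming extraction output and a hand-checked reference can both be reduced to (record id, field, value) triples; the representation and the function names are hypothetical, and the OntologyEditor itself is not modeled here.

```python
from typing import List, Set, Tuple

# Hypothetical representation: one extracted fact per (record id, field name, value).
Fact = Tuple[str, str, str]


def precision_recall(extracted: Set[Fact], reference: Set[Fact]) -> Tuple[float, float]:
    """Precision and recall of extracted facts against a hand-checked reference set."""
    hits = len(extracted & reference)
    precision = hits / len(extracted) if extracted else 0.0
    recall = hits / len(reference) if reference else 0.0
    return precision, recall


def errors_to_review(extracted: Set[Fact], reference: Set[Fact]) -> Tuple[List[Fact], List[Fact]]:
    """Spurious facts (they lower precision) and missed facts (they lower recall)."""
    return sorted(extracted - reference), sorted(reference - extracted)
```

Listing spurious and missed facts separately shows which problem to chase: spurious facts pull precision down, missed facts pull recall down. Recomputing both numbers after each hand edit and rerun would show whether the edit helped.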