![Page 1: WoK: A Web of Knowledge](https://reader035.vdocuments.mx/reader035/viewer/2022081503/568143a0550346895db02137/html5/thumbnails/1.jpg)
WoK: A Web of Knowledge
David W. Embley
Brigham Young University
Provo, Utah, USA
![Page 2: WoK: A Web of Knowledge](https://reader035.vdocuments.mx/reader035/viewer/2022081503/568143a0550346895db02137/html5/thumbnails/2.jpg)
A Web of Pages
A Web of Facts
• Birthdate of my great grandpa Orson
• Price and mileage of red Nissans, 1990 or newer
• Location and size of chromosome 17
• US states with property crime rates above 1%
![Page 3: WoK: A Web of Knowledge](https://reader035.vdocuments.mx/reader035/viewer/2022081503/568143a0550346895db02137/html5/thumbnails/3.jpg)
Toward a Web of Knowledge
• Fundamental questions
  – What is knowledge?
  – What are facts?
  – How does one know?
• Philosophy
  – Ontology
  – Epistemology
  – Logic and reasoning
![Page 4: WoK: A Web of Knowledge](https://reader035.vdocuments.mx/reader035/viewer/2022081503/568143a0550346895db02137/html5/thumbnails/4.jpg)
Ontology
• Existence: asks “What exists?”
• Concepts, relationships, and constraints with formal foundation
![Page 5: WoK: A Web of Knowledge](https://reader035.vdocuments.mx/reader035/viewer/2022081503/568143a0550346895db02137/html5/thumbnails/5.jpg)
Epistemology
• The nature of knowledge: asks “What is knowledge?” and “How is knowledge acquired?”
• Populated conceptual model
![Page 6: WoK: A Web of Knowledge](https://reader035.vdocuments.mx/reader035/viewer/2022081503/568143a0550346895db02137/html5/thumbnails/6.jpg)
Logic and Reasoning
• Principles of valid inference: asks “What is known?” and “What can be inferred?”
• For us, it answers: what can be inferred (in a formal sense) from conceptualized data.
Find price and mileage of red Nissans, 1990 or newer
![Page 7: WoK: A Web of Knowledge](https://reader035.vdocuments.mx/reader035/viewer/2022081503/568143a0550346895db02137/html5/thumbnails/7.jpg)
![Page 8: WoK: A Web of Knowledge](https://reader035.vdocuments.mx/reader035/viewer/2022081503/568143a0550346895db02137/html5/thumbnails/8.jpg)
Making this Work: How?
• Distill knowledge from the wealth of digital web data
• Annotate web pages
• Need a computational alembic to algorithmically turn raw symbols contained in web pages into knowledge
[Diagram labels: Fact, Fact, Fact, …; Annotation, Annotation, …]
![Page 9: WoK: A Web of Knowledge](https://reader035.vdocuments.mx/reader035/viewer/2022081503/568143a0550346895db02137/html5/thumbnails/9.jpg)
Turning Raw Symbols into Knowledge
• Symbols: $ 11,500 117K Nissan CD AC
• Data: price(11,500) mileage(117K) make(Nissan)
• Conceptualized data:
  – Car(C123) has Price($11,500)
  – Car(C123) has Mileage(117,000)
  – Car(C123) has Make(Nissan)
  – Car(C123) has Feature(AC)
• Knowledge
  – “Correct” facts
  – Provenance
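The progression from symbols to conceptualized data can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `conceptualize`, the tiny make and feature lists, and the regular expressions are invented here and are not taken from the WoK system. Provenance and correctness checking are not modelled.

```python
import re

# Hypothetical helper: turns the raw symbols of one car ad into
# (object, relationship, value) facts. Patterns and word lists are assumptions.
def conceptualize(ad_text, car_id):
    facts = []
    price = re.search(r"\$\s*(\d{1,3}(?:,\d{3})*)", ad_text)
    if price:
        facts.append((car_id, "has Price", int(price.group(1).replace(",", ""))))
    mileage = re.search(r"\b(\d+)K\b", ad_text)
    if mileage:
        # "117K" is read as 117 thousand miles
        facts.append((car_id, "has Mileage", int(mileage.group(1)) * 1000))
    for make in ("Nissan", "Toyota", "Ford"):
        if re.search(rf"\b{make}\b", ad_text):
            facts.append((car_id, "has Make", make))
    for feature in ("AC", "CD"):
        if re.search(rf"\b{feature}\b", ad_text):
            facts.append((car_id, "has Feature", feature))
    return facts

facts = conceptualize("$ 11,500 117K Nissan CD AC", "C123")
for fact in facts:
    print(fact)
```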
![Page 10: WoK: A Web of Knowledge](https://reader035.vdocuments.mx/reader035/viewer/2022081503/568143a0550346895db02137/html5/thumbnails/10.jpg)
Actualization (with Extraction Ontologies)
Find me the price and mileage of all red Nissans – I want a 1990 or newer.
![Page 11: WoK: A Web of Knowledge](https://reader035.vdocuments.mx/reader035/viewer/2022081503/568143a0550346895db02137/html5/thumbnails/11.jpg)
Data Extraction Demo
![Page 12: WoK: A Web of Knowledge](https://reader035.vdocuments.mx/reader035/viewer/2022081503/568143a0550346895db02137/html5/thumbnails/12.jpg)
Semantic Annotation Demo
![Page 13: WoK: A Web of Knowledge](https://reader035.vdocuments.mx/reader035/viewer/2022081503/568143a0550346895db02137/html5/thumbnails/13.jpg)
Free-Form Query Demo
![Page 14: WoK: A Web of Knowledge](https://reader035.vdocuments.mx/reader035/viewer/2022081503/568143a0550346895db02137/html5/thumbnails/14.jpg)
Explanation: How it Works
• Extraction Ontologies
• Semantic Annotation
• Free-Form Query Interpretation
![Page 15: WoK: A Web of Knowledge](https://reader035.vdocuments.mx/reader035/viewer/2022081503/568143a0550346895db02137/html5/thumbnails/15.jpg)
Extraction Ontologies
Object sets
Relationship sets
Participation constraints
Lexical
Non-lexical
Primary object set
Aggregation
Generalization/Specialization
![Page 16: WoK: A Web of Knowledge](https://reader035.vdocuments.mx/reader035/viewer/2022081503/568143a0550346895db02137/html5/thumbnails/16.jpg)
Extraction Ontologies
Data Frame:
  Internal Representation: float
  Values
    External Rep.: \s*[$]\s*(\d{1,3})*(\.\d{2})?
    Left Context: $
  Key Word Phrase
    Key Words: ([Pp]rice)|([Cc]ost)| …
  Operators
    Operator: >
    Key Words: (more\s*than)|(more\s*costly)|…
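A data frame of this kind can be approximated as a small Python class. This is a sketch under assumptions, not the actual extraction-ontology implementation: the class name `DataFrame` (unrelated to pandas), its method names, and the value pattern, which is a variant extended here to accept thousands separators, are all illustrative.

```python
import re
from dataclasses import dataclass

@dataclass
class DataFrame:
    """Hypothetical data frame for one lexical object set (not pandas)."""
    name: str
    value_pattern: str     # external representation of values
    keyword_pattern: str   # words signalling that the object set is mentioned
    operators: dict        # operator symbol -> keyword pattern

    def values(self, text):
        # Internal representation: float
        return [float(m.group(1).replace(",", ""))
                for m in re.finditer(self.value_pattern, text)]

    def mentioned(self, text):
        return re.search(self.keyword_pattern, text) is not None

    def operator(self, text):
        for symbol, pattern in self.operators.items():
            if re.search(pattern, text):
                return symbol
        return None

price = DataFrame(
    name="Price",
    # Assumed variant of the slide's pattern that also accepts "11,500"
    value_pattern=r"\$\s*(\d{1,3}(?:,\d{3})*(?:\.\d{2})?)",
    keyword_pattern=r"[Pp]rice|[Cc]ost",
    operators={">": r"more\s*than|more\s*costly"},
)

print(price.values("Asking $11,500 or best offer"))
print(price.operator("anything more than $5,000"))
```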
![Page 17: WoK: A Web of Knowledge](https://reader035.vdocuments.mx/reader035/viewer/2022081503/568143a0550346895db02137/html5/thumbnails/17.jpg)
Generality & Resiliency of Extraction Ontologies
• Generality: assumptions about web pages
  – Data rich
  – Narrow domain
  – Document types
    • Single-record documents (hard, but doable)
    • Multiple-record documents (harder)
    • Records with scattered components (even harder)
• Resiliency: declarative
  – Still works when web pages change
  – Works for new, unseen pages in the same domain
  – Scalable, but takes work to declare the extraction ontology
![Page 18: WoK: A Web of Knowledge](https://reader035.vdocuments.mx/reader035/viewer/2022081503/568143a0550346895db02137/html5/thumbnails/18.jpg)
Semantic Annotation
![Page 19: WoK: A Web of Knowledge](https://reader035.vdocuments.mx/reader035/viewer/2022081503/568143a0550346895db02137/html5/thumbnails/19.jpg)
Free-Form Query Interpretation
• Parse Free-Form Query (with respect to data extraction ontology)
• Select Ontology
• Formulate Query Expression
• Run Query Over Semantically Annotated Data
![Page 20: WoK: A Web of Knowledge](https://reader035.vdocuments.mx/reader035/viewer/2022081503/568143a0550346895db02137/html5/thumbnails/20.jpg)
Parse Free-Form Query
“Find me the price and mileage of all red Nissans – I want a 1996 or newer”
Recognized terms: price, mileage, red, Nissan, 1996, “or newer” (>= Operator)
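Parsing a free-form query against an extraction ontology might look roughly like the following. Everything in this sketch is an assumption made for illustration: the `CAR_ONTOLOGY` dictionary, its keyword and value patterns, and the `parse_query` function are invented here and only mimic the recognizers described on the slides.

```python
import re

# Hypothetical, heavily simplified car-ad ontology: each object set has a
# keyword recognizer and, where useful, a value recognizer.
CAR_ONTOLOGY = {
    "Price":   {"keywords": r"\bprice\b|\bcost\b",    "values": None},
    "Mileage": {"keywords": r"\bmileage\b|\bmiles\b", "values": None},
    "Color":   {"keywords": r"\bcolou?r\b", "values": r"\b(red|blue|white|black)\b"},
    "Make":    {"keywords": r"\bmake\b",    "values": r"\b(Nissan|Toyota|Ford)s?\b"},
    "Year":    {"keywords": r"\byear\b",    "values": r"\b(19\d{2}|20\d{2})\b"},
}
OPERATOR_KEYWORDS = {">=": r"or\s+newer", "<=": r"or\s+older"}

def parse_query(query):
    """Return (mentioned object sets, conditions) recognized in the query."""
    mentioned, conditions = [], []
    for name, frame in CAR_ONTOLOGY.items():
        if re.search(frame["keywords"], query, re.IGNORECASE):
            mentioned.append(name)
        if frame["values"] is None:
            continue
        match = re.search(frame["values"], query, re.IGNORECASE)
        if match is None:
            continue
        value = match.group(1)
        op = "="
        # An operator keyword directly after the value changes the comparison.
        for symbol, pattern in OPERATOR_KEYWORDS.items():
            if re.match(r"\s*" + pattern, query[match.end():], re.IGNORECASE):
                op = symbol
        conditions.append((name, op, int(value) if value.isdigit() else value))
    return mentioned, conditions

query = "Find me the price and mileage of all red Nissans - I want a 1996 or newer"
mentioned, conditions = parse_query(query)
print(mentioned)
print(conditions)
```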
![Page 21: WoK: A Web of Knowledge](https://reader035.vdocuments.mx/reader035/viewer/2022081503/568143a0550346895db02137/html5/thumbnails/21.jpg)
Select Ontology
“Find me the price and mileage of all red Nissans – I want a 1996 or newer”
![Page 22: WoK: A Web of Knowledge](https://reader035.vdocuments.mx/reader035/viewer/2022081503/568143a0550346895db02137/html5/thumbnails/22.jpg)
Formulate Query Expression
• Conjunctive queries and aggregate queries
• Mentioned object sets are all of interest.
• Values and operator keywords determine conditions.
  – Color = “red”
  – Make = “Nissan”
  – Year >= 1996 (>= Operator)
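A conjunctive query of this shape can be evaluated over annotated records as sketched below. The record layout, the sample cars, and the `run_query` function are invented for illustration; the real system formulates a query expression over semantically annotated data rather than filtering Python dictionaries.

```python
import operator

# Map condition operators to Python comparison functions.
OPS = {"=": operator.eq, ">=": operator.ge, "<=": operator.le,
       ">": operator.gt, "<": operator.lt}

def run_query(records, conditions, wanted):
    """Keep records satisfying every condition; return only the wanted fields."""
    results = []
    for record in records:
        if all(field in record and OPS[op](record[field], value)
               for field, op, value in conditions):
            results.append({name: record.get(name) for name in wanted})
    return results

# Invented sample data standing in for facts extracted from car ads.
cars = [
    {"Make": "Nissan", "Color": "red",  "Year": 1998, "Price": 11500, "Mileage": 117000},
    {"Make": "Nissan", "Color": "red",  "Year": 1989, "Price": 2500,  "Mileage": 180000},
    {"Make": "Toyota", "Color": "red",  "Year": 2001, "Price": 9000,  "Mileage": 90000},
    {"Make": "Nissan", "Color": "blue", "Year": 1999, "Price": 8000,  "Mileage": 100000},
]
conditions = [("Color", "=", "red"), ("Make", "=", "Nissan"), ("Year", ">=", 1996)]
results = run_query(cars, conditions, wanted=["Price", "Mileage"])
print(results)
```

The mentioned object sets (Price, Mileage) become the returned fields, and the recognized values and operators become the conjunctive conditions.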
![Page 23: WoK: A Web of Knowledge](https://reader035.vdocuments.mx/reader035/viewer/2022081503/568143a0550346895db02137/html5/thumbnails/23.jpg)
Formulate Query Expression
[Query expression with For, Let, Where, and Return clauses]
![Page 24: WoK: A Web of Knowledge](https://reader035.vdocuments.mx/reader035/viewer/2022081503/568143a0550346895db02137/html5/thumbnails/24.jpg)
Run Query Over Semantically Annotated Data
![Page 25: WoK: A Web of Knowledge](https://reader035.vdocuments.mx/reader035/viewer/2022081503/568143a0550346895db02137/html5/thumbnails/25.jpg)
Great! But Problems Still Need Resolution
• How do we create extraction ontologies?
  – Manual creation requires several dozen person hours
  – Semi-automatic creation
    • TISP (Table Interpretation by Sibling Pages)
    • TANGO (Table ANalysis for Generating Ontologies)
    • Nested Schemas with Regular Expressions
    • Synergistic Bootstrapping
    • Form-based Information Harvesting
• How do we scale up?
  – Practicalities of technology transfer and usage
  – Millions of queries over zillions of facts for thousands of ontologies
![Page 26: WoK: A Web of Knowledge](https://reader035.vdocuments.mx/reader035/viewer/2022081503/568143a0550346895db02137/html5/thumbnails/26.jpg)
Manual Creation
![Page 27: WoK: A Web of Knowledge](https://reader035.vdocuments.mx/reader035/viewer/2022081503/568143a0550346895db02137/html5/thumbnails/27.jpg)
Manual Creation
![Page 28: WoK: A Web of Knowledge](https://reader035.vdocuments.mx/reader035/viewer/2022081503/568143a0550346895db02137/html5/thumbnails/28.jpg)
Manual Creation
– Library of instance recognizers
– Library of lexicons
![Page 29: WoK: A Web of Knowledge](https://reader035.vdocuments.mx/reader035/viewer/2022081503/568143a0550346895db02137/html5/thumbnails/29.jpg)
Automatic Annotation with TISP (Table Interpretation by Sibling Pages)
• Recognize tables (discard non-tables)
• Locate table labels
• Locate table values
• Find label/value associations
![Page 30: WoK: A Web of Knowledge](https://reader035.vdocuments.mx/reader035/viewer/2022081503/568143a0550346895db02137/html5/thumbnails/30.jpg)
Recognize Tables
Data Table
Layout Tables (discard)
Nested Data Tables
![Page 31: WoK: A Web of Knowledge](https://reader035.vdocuments.mx/reader035/viewer/2022081503/568143a0550346895db02137/html5/thumbnails/31.jpg)
Locate Table Labels
Examples:
  Identification.Gene model(s).Protein
  Identification.Gene model(s).2
![Page 32: WoK: A Web of Knowledge](https://reader035.vdocuments.mx/reader035/viewer/2022081503/568143a0550346895db02137/html5/thumbnails/32.jpg)
Locate Table Labels
Examples:
  Identification.Gene model(s).Gene Model
  Identification.Gene model(s).2
![Page 33: WoK: A Web of Knowledge](https://reader035.vdocuments.mx/reader035/viewer/2022081503/568143a0550346895db02137/html5/thumbnails/33.jpg)
Locate Table Values
Value
![Page 34: WoK: A Web of Knowledge](https://reader035.vdocuments.mx/reader035/viewer/2022081503/568143a0550346895db02137/html5/thumbnails/34.jpg)
Find Label/Value Associations
Example:
  (Identification.Gene model(s).Protein, Identification.Gene model(s).2) = WP:CE28918
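One way to picture a label/value association is as a cell value keyed by its column-label path and its row-label path. The sketch below assumes a nested table with a header row and a leading row-label column; the function `associate` and every cell value except WP:CE28918 are placeholders invented for illustration.

```python
def associate(prefix, header, rows):
    """Key each cell by (column label path, row label path).

    `prefix` is the dotted path of the enclosing labels, `header` holds the
    column labels (first entry is the corner cell), and each row starts with
    its row label.
    """
    facts = {}
    for row in rows:
        row_label, *cells = row
        for col_label, cell in zip(header[1:], cells):
            facts[(f"{prefix}.{col_label}", f"{prefix}.{row_label}")] = cell
    return facts

# Only WP:CE28918 comes from the slide; the other cells are placeholders.
header = ["", "Gene Model", "Protein"]
rows = [
    ["1", "gene-model-1", "protein-1"],
    ["2", "gene-model-2", "WP:CE28918"],
]
facts = associate("Identification.Gene model(s)", header, rows)
key = ("Identification.Gene model(s).Protein", "Identification.Gene model(s).2")
print(facts[key])
```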
![Page 35: WoK: A Web of Knowledge](https://reader035.vdocuments.mx/reader035/viewer/2022081503/568143a0550346895db02137/html5/thumbnails/35.jpg)
Interpretation Technique: Sibling Page Comparison
![Page 36: WoK: A Web of Knowledge](https://reader035.vdocuments.mx/reader035/viewer/2022081503/568143a0550346895db02137/html5/thumbnails/36.jpg)
Interpretation Technique: Sibling Page Comparison
Same
![Page 37: WoK: A Web of Knowledge](https://reader035.vdocuments.mx/reader035/viewer/2022081503/568143a0550346895db02137/html5/thumbnails/37.jpg)
Interpretation Technique: Sibling Page Comparison
Almost Same
![Page 38: WoK: A Web of Knowledge](https://reader035.vdocuments.mx/reader035/viewer/2022081503/568143a0550346895db02137/html5/thumbnails/38.jpg)
Interpretation Technique: Sibling Page Comparison
Different
Same
![Page 39: WoK: A Web of Knowledge](https://reader035.vdocuments.mx/reader035/viewer/2022081503/568143a0550346895db02137/html5/thumbnails/39.jpg)
Technique Details
• Unnest tables
• Match tables in sibling pages
  – “Perfect” match (table for layout; discard)
  – “Reasonable” match (sibling table)
• Determine & use table-structure pattern
  – Discover pattern
  – Pattern usage
  – Dynamic pattern adjustment
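The idea behind sibling page comparison can be sketched as a cell-by-cell comparison of corresponding tables: cells that stay the same across sibling pages are likely labels, cells that differ are likely values. The functions, the 0.3 threshold, and the sample tables below are assumptions made for illustration, not the TISP algorithm itself, which also unnests tables and adjusts patterns dynamically.

```python
def compare_siblings(table_a, table_b):
    """Split cell positions into labels (same in both tables) and values."""
    labels, values = [], []
    for r, (row_a, row_b) in enumerate(zip(table_a, table_b)):
        for c, (cell_a, cell_b) in enumerate(zip(row_a, row_b)):
            (labels if cell_a == cell_b else values).append((r, c))
    return labels, values

def classify_match(table_a, table_b, threshold=0.3):
    """'Perfect' match means layout table; 'reasonable' match means sibling data table."""
    labels, values = compare_siblings(table_a, table_b)
    total = len(labels) + len(values)
    ratio = len(labels) / total if total else 0.0
    if ratio == 1.0:
        return "layout table (discard)"
    if ratio >= threshold:
        return "sibling data table"
    return "no match"

# Invented sibling tables; only WP:CE28918 appears on the slides.
page_a = [["Gene model", "gene-a"], ["Protein", "WP:CE28918"]]
page_b = [["Gene model", "gene-b"], ["Protein", "protein-b"]]
menu = [["Home", "Search", "Help"]]

labels, values = compare_siblings(page_a, page_b)
print(labels, values)
print(classify_match(page_a, page_b))
print(classify_match(menu, menu))
```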
![Page 40: WoK: A Web of Knowledge](https://reader035.vdocuments.mx/reader035/viewer/2022081503/568143a0550346895db02137/html5/thumbnails/40.jpg)
Generated RDF
![Page 41: WoK: A Web of Knowledge](https://reader035.vdocuments.mx/reader035/viewer/2022081503/568143a0550346895db02137/html5/thumbnails/41.jpg)
WoK Demo (via TISP)
![Page 42: WoK: A Web of Knowledge](https://reader035.vdocuments.mx/reader035/viewer/2022081503/568143a0550346895db02137/html5/thumbnails/42.jpg)
Semi-Automatic Annotation with TANGO (Table Analysis for Generating Ontologies)
• Recognize and normalize table information
• Construct mini-ontologies from tables
• Discover inter-ontology mappings
• Merge mini-ontologies into a growing ontology
![Page 43: WoK: A Web of Knowledge](https://reader035.vdocuments.mx/reader035/viewer/2022081503/568143a0550346895db02137/html5/thumbnails/43.jpg)
Recognize Table Information
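| Country | Population (July 2001 est.) | Religion: Albanian Orthodox | Religion: Muslim | Religion: Roman Catholic | Religion: Shi’a Muslim | Religion: Sunni Muslim | Religion: other |
|---|---|---|---|---|---|---|---|
| Afghanistan | 26,813,057 | | | | 15% | 84% | 1% |
| Albania | 3,510,484 | 20% | 70% | 10% | | | |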
![Page 44: WoK: A Web of Knowledge](https://reader035.vdocuments.mx/reader035/viewer/2022081503/568143a0550346895db02137/html5/thumbnails/44.jpg)
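Construct Mini-Ontology

| Country | Population (July 2001 est.) | Religion: Albanian Orthodox | Religion: Muslim | Religion: Roman Catholic | Religion: Shi’a Muslim | Religion: Sunni Muslim | Religion: other |
|---|---|---|---|---|---|---|---|
| Afghanistan | 26,813,057 | | | | 15% | 84% | 1% |
| Albania | 3,510,484 | 20% | 70% | 10% | | | |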
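A mini-ontology derived from such a table can be imagined as object sets, relationship sets, and the facts that populate them. The sketch below is an assumption-laden illustration: the nested-dictionary encoding of the normalized table, the function `build_mini_ontology`, and the choice to model the spanning Religion header as a Country-Religion-Percent relationship are all choices made here, not TANGO's documented output.

```python
# Normalized form of the table: the spanning "Religion" header becomes a
# nested mapping from religion to percentage.
TABLE = [
    {"Country": "Afghanistan", "Population": 26813057,
     "Religion": {"Shi'a Muslim": 15, "Sunni Muslim": 84, "other": 1}},
    {"Country": "Albania", "Population": 3510484,
     "Religion": {"Albanian Orthodox": 20, "Muslim": 70, "Roman Catholic": 10}},
]

def build_mini_ontology(rows, key="Country"):
    """Derive object sets, relationship sets, and facts from normalized rows."""
    object_sets, relationship_sets, facts = {key}, set(), []
    for row in rows:
        subject = row[key]
        for column, cell in row.items():
            if column == key:
                continue
            object_sets.add(column)
            if isinstance(cell, dict):
                # Spanning header: ternary relationship with a Percent object set
                object_sets.add("Percent")
                relationship_sets.add((key, column, "Percent"))
                for member, percent in cell.items():
                    facts.append((subject, column, member, percent))
            else:
                relationship_sets.add((key, column))
                facts.append((subject, column, cell))
    return object_sets, relationship_sets, facts

object_sets, relationship_sets, facts = build_mini_ontology(TABLE)
print(sorted(object_sets))
print(sorted(relationship_sets))
```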
![Page 45: WoK: A Web of Knowledge](https://reader035.vdocuments.mx/reader035/viewer/2022081503/568143a0550346895db02137/html5/thumbnails/45.jpg)
Discover Mappings
![Page 46: WoK: A Web of Knowledge](https://reader035.vdocuments.mx/reader035/viewer/2022081503/568143a0550346895db02137/html5/thumbnails/46.jpg)
Merge
![Page 47: WoK: A Web of Knowledge](https://reader035.vdocuments.mx/reader035/viewer/2022081503/568143a0550346895db02137/html5/thumbnails/47.jpg)
Bootstrapping: Cost-effective and Accurate Extraction
• Focus on semi-structured elements first
• Bootstrap synergistically
  – Extract from semi-structured elements
  – Learn extraction ontologies
  – Extract from plain text
![Page 48: WoK: A Web of Knowledge](https://reader035.vdocuments.mx/reader035/viewer/2022081503/568143a0550346895db02137/html5/thumbnails/48.jpg)
ListReader: Wrapper Induction for Lists
![Page 49: WoK: A Web of Knowledge](https://reader035.vdocuments.mx/reader035/viewer/2022081503/568143a0550346895db02137/html5/thumbnails/49.jpg)
Part I: Semi-supervised
![Page 50: WoK: A Web of Knowledge](https://reader035.vdocuments.mx/reader035/viewer/2022081503/568143a0550346895db02137/html5/thumbnails/50.jpg)
OCR
newline First row, left to right: C. Paulson, G. Whaley, E Eastlund, B. Krohg, D. Bakken, R. Norgaard, 0. Bakken, A. Vig, newline H. Megorden, D Wynne newline Second row- Mr. See bach, D. Colligan, J. Wogsland, F Knudson, A. Hagen, R. Myhrum, R. Nienaber, J. Mittun, newline Mr. Bohnsack. newline Third row: G. Carlm, R. Reterson, K Larson, J Skatvold, A. Enckson, R Roysland, L.Johnson, L. Nystrom. newLine Fourth row: R. Kvare, H. Haugen, R. Lubken, R Larson, A. Carlson, A. Nienaber, W Ram bo I, V Hanson, K. Ny- newline newline QootLaM "leam newline newline Captain Donald "Dude" Bakken ............... Right Half Back newline LeRoy "Sonny' Johnson ..................,.... Lcft Half Back newline Orley Bakken ...........,...........,.......... Quarter Back newline Roger Myhrum ................................... Full Back newline Bill "Schnozz" Krohg .............................. Center newline Howard "Little Huby" Megorden ................ Right Guard newline Royce "Shorty" Norgaard ....................... Left Guard newline Eugene "Mad Russian" Easthind ............... Right Tackle newline Alvin "Stuben" Hagen ......................... Left Tackle newline Richard "Dick" Nienabcr ........................ Right End newline James "Oakie" Wogsland .......................... Lcft End newline newline Other lettermen were- newline Glenn "Doc" Whaley newline Allen "Swede" Enckson newline James "Snooky" Mittun newline Curtis "Curt" Paulson newline Arthur "Art" Vig newline Forrest "Forry" Knudson newline Robert "Bobby" Roysland newline Page 26 newline
![Page 51: WoK: A Web of Knowledge](https://reader035.vdocuments.mx/reader035/viewer/2022081503/568143a0550346895db02137/html5/thumbnails/51.jpg)
Hand Form Creation & Labeling
![Page 52: WoK: A Web of Knowledge](https://reader035.vdocuments.mx/reader035/viewer/2022081503/568143a0550346895db02137/html5/thumbnails/52.jpg)
Hand Form Creation & Labeling
√
![Page 53: WoK: A Web of Knowledge](https://reader035.vdocuments.mx/reader035/viewer/2022081503/568143a0550346895db02137/html5/thumbnails/53.jpg)
Hand Form Creation & Labeling
Donald √
![Page 54: WoK: A Web of Knowledge](https://reader035.vdocuments.mx/reader035/viewer/2022081503/568143a0550346895db02137/html5/thumbnails/54.jpg)
Hand Form Creation & Labeling
Donald Bakken √
![Page 55: WoK: A Web of Knowledge](https://reader035.vdocuments.mx/reader035/viewer/2022081503/568143a0550346895db02137/html5/thumbnails/55.jpg)
Hand Form Creation & Labeling
Donald Bakken Dude √
![Page 56: WoK: A Web of Knowledge](https://reader035.vdocuments.mx/reader035/viewer/2022081503/568143a0550346895db02137/html5/thumbnails/56.jpg)
Hand Form Creation & Labeling
Donald Bakken Dude Right Half Back √
![Page 57: WoK: A Web of Knowledge](https://reader035.vdocuments.mx/reader035/viewer/2022081503/568143a0550346895db02137/html5/thumbnails/57.jpg)
Generate Wrapper for First Record
Captain Donald "Dude" Bakken ............... Right Half Back newline LeRoy "Sonny' Johnson ..................,.... Lcft Half Back newline Orley Bakken ...........,...........,.......... Quarter Back newline Roger Myhrum ................................... Full Back newline Bill "Schnozz" Krohg .............................. Center newline Howard "Little Huby" Megorden ................ Right Guard newline Royce "Shorty" Norgaard ....................... Left Guard newline Eugene "Mad Russian" Easthind ............... Right Tackle newline Alvin "Stuben" Hagen ......................... Left Tackle newline Richard "Dick" Nienabcr ........................ Right End newline James "Oakie" Wogsland .......................... Lcft End newline newline Other lettermen were- newline Glenn "Doc" Whaley newline Allen "Swede" Enckson newline James "Snooky" Mittun newline Curtis "Curt" Paulson newline Arthur "Art" Vig newline Forrest "Forry" Knudson newline Robert "Bobby" Roysland newline Page 26 newline
Capture groups: 1. Captain, 2. Given Name, 3. Nickname, 4. Surname, 5. Position
(Captain) (\w{6,6}) "(\w{4,4})" (\w{6,6}) \.{14,14} ((\w{4,5}){3,3})\n
![Page 58: WoK: A Web of Knowledge](https://reader035.vdocuments.mx/reader035/viewer/2022081503/568143a0550346895db02137/html5/thumbnails/58.jpg)
Update Wrapper & Annotate Records
Captain Donald "Dude" Bakken ............... Right Half Back newline LeRoy "Sonny' Johnson ..................,.... Lcft Half Back newline Orley Bakken ...........,...........,.......... Quarter Back newline Roger Myhrum ................................... Full Back newline Bill "Schnozz" Krohg .............................. Center newline Howard "Little Huby" Megorden ................ Right Guard newline Royce "Shorty" Norgaard ....................... Left Guard newline Eugene "Mad Russian" Easthind ............... Right Tackle newline Alvin "Stuben" Hagen ......................... Left Tackle newline Richard "Dick" Nienabcr ........................ Right End newline James "Oakie" Wogsland .......................... Lcft End newline newline Other lettermen were- newline Glenn "Doc" Whaley newline Allen "Swede" Enckson newline James "Snooky" Mittun newline Curtis "Curt" Paulson newline Arthur "Art" Vig newline Forrest "Forry" Knudson newline Robert "Bobby" Roysland newline Page 26 newline
Capture groups: 2. Captain, 3. Given Name, 5. Nickname, 6. Surname, 7. Position
((Captain) )?(\w{5,6})( "(\w{4,5}) ['"] )? (\w{6,7}) [\.,]{14,34} ((\w{4,7} ){2,3})\n
![Page 59: WoK: A Web of Knowledge](https://reader035.vdocuments.mx/reader035/viewer/2022081503/568143a0550346895db02137/html5/thumbnails/59.jpg)
Final Wrapper and Annotation
Captain Donald "Dude" Bakken ............... Right Half Back newline LeRoy "Sonny' Johnson ..................,.... Lcft Half Back newline Orley Bakken ...........,...........,.......... Quarter Back newline Roger Myhrum ................................... Full Back newline Bill "Schnozz" Krohg .............................. Center newline Howard "Little Huby" Megorden ................ Right Guard newline Royce "Shorty" Norgaard ....................... Left Guard newline Eugene "Mad Russian" Easthind ............... Right Tackle newline Alvin "Stuben" Hagen ......................... Left Tackle newline Richard "Dick" Nienabcr ........................ Right End newline James "Oakie" Wogsland .......................... Lcft End newline
Capture groups: 2. Captain, 3. Given Name, 5. Nickname, 7. Surname, 8. Position
((Captain) )?(\w{4,7})( “((\w{4,7}){1,2})['"] )? (\w{5,8} ) [\.,]{14,34} ((\w{4,7} ){1,3})\n
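The effect of a generalized wrapper can be tried out in Python. The pattern below is not the wrapper shown on the slides; it is a looser, hand-written regular expression (with its own group numbering and the invented helper `extract`) that tolerates the OCR noise in the sample records, such as commas inside the dot leader and a nickname closed by an apostrophe.

```python
import re

# Assumed, simplified wrapper: optional "Captain", given name, optional quoted
# nickname, surname, dot leader, position.
WRAPPER = re.compile(
    r"""^(?:(Captain)\s)?       # 1. optional Captain marker
        (\w+)                   # 2. given name
        (?:\s"([\w ]+)['"])?    # 3. optional nickname, closing quote may be misread
        \s(\w+)                 # 4. surname
        \s[.,]{5,}\s            # dot leader, tolerating stray commas
        (.+)$                   # 5. position
    """,
    re.VERBOSE,
)

def extract(record):
    """Return the labeled fields of one list record, or None if it does not fit."""
    match = WRAPPER.match(record)
    if match is None:
        return None
    captain, given, nickname, surname, position = match.groups()
    return {"captain": captain is not None, "given": given,
            "nickname": nickname, "surname": surname, "position": position}

records = [
    'Captain Donald "Dude" Bakken ............... Right Half Back',
    'LeRoy "Sonny\' Johnson ..................,.... Lcft Half Back',
    'Orley Bakken ...........,...........,.......... Quarter Back',
    'Howard "Little Huby" Megorden ................ Right Guard',
]
for record in records:
    print(extract(record))
```

OCR errors inside a field (for example "Lcft") pass through unchanged; the wrapper only segments and labels the record.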
![Page 60: WoK: A Web of Knowledge](https://reader035.vdocuments.mx/reader035/viewer/2022081503/568143a0550346895db02137/html5/thumbnails/60.jpg)
Part II: Weakly-supervised
![Page 61: WoK: A Web of Knowledge](https://reader035.vdocuments.mx/reader035/viewer/2022081503/568143a0550346895db02137/html5/thumbnails/61.jpg)
Apply Extraction Ontologies
![Page 62: WoK: A Web of Knowledge](https://reader035.vdocuments.mx/reader035/viewer/2022081503/568143a0550346895db02137/html5/thumbnails/62.jpg)
Find List and Generate Wrapper
• Base list finding on whether a wrapper can be generated.
• Base wrapper generation on best-labeled record.
![Page 63: WoK: A Web of Knowledge](https://reader035.vdocuments.mx/reader035/viewer/2022081503/568143a0550346895db02137/html5/thumbnails/63.jpg)
Extract Synergistically from Text
![Page 64: WoK: A Web of Knowledge](https://reader035.vdocuments.mx/reader035/viewer/2022081503/568143a0550346895db02137/html5/thumbnails/64.jpg)
Extract Synergistically from Text
![Page 65: WoK: A Web of Knowledge](https://reader035.vdocuments.mx/reader035/viewer/2022081503/568143a0550346895db02137/html5/thumbnails/65.jpg)
Form Creation
Basic form-construction facilities:
• single-entry field
• multiple-entry field
• nested form
• …
![Page 66: WoK: A Web of Knowledge](https://reader035.vdocuments.mx/reader035/viewer/2022081503/568143a0550346895db02137/html5/thumbnails/66.jpg)
Created Sample Form
![Page 67: WoK: A Web of Knowledge](https://reader035.vdocuments.mx/reader035/viewer/2022081503/568143a0550346895db02137/html5/thumbnails/67.jpg)
Generated Ontology View
![Page 68: WoK: A Web of Knowledge](https://reader035.vdocuments.mx/reader035/viewer/2022081503/568143a0550346895db02137/html5/thumbnails/68.jpg)
Source-to-Form Mapping
![Page 69: WoK: A Web of Knowledge](https://reader035.vdocuments.mx/reader035/viewer/2022081503/568143a0550346895db02137/html5/thumbnails/69.jpg)
Source-to-Form Mapping
![Page 70: WoK: A Web of Knowledge](https://reader035.vdocuments.mx/reader035/viewer/2022081503/568143a0550346895db02137/html5/thumbnails/70.jpg)
Source-to-Form Mapping
![Page 71: WoK: A Web of Knowledge](https://reader035.vdocuments.mx/reader035/viewer/2022081503/568143a0550346895db02137/html5/thumbnails/71.jpg)
Source-to-Form Mapping
![Page 72: WoK: A Web of Knowledge](https://reader035.vdocuments.mx/reader035/viewer/2022081503/568143a0550346895db02137/html5/thumbnails/72.jpg)
Almost Ready to Harvest
• Need reading path: DOM-tree structure
• Need to resolve mapping problems
  – Split/Merge
  – Union/Selection
![Page 73: WoK: A Web of Knowledge](https://reader035.vdocuments.mx/reader035/viewer/2022081503/568143a0550346895db02137/html5/thumbnails/73.jpg)
Almost Ready to Harvest …
• Need reading path: DOM-tree structure
• Need to resolve mapping problems
  – Split/Merge
  – Union/Selection
Name
  Voltage-dependent anion-selective channel protein 3
  VDAC-3
  hVDAC3
  Outer mitochondrial membrane protein porin 3
![Page 74: WoK: A Web of Knowledge](https://reader035.vdocuments.mx/reader035/viewer/2022081503/568143a0550346895db02137/html5/thumbnails/74.jpg)
Almost Ready to Harvest …
• Need reading path: DOM-tree structure
• Need to resolve mapping problems
  – Split/Merge
  – Union/Selection
Name
  Voltage-dependent anion-selective channel protein 3
  VDAC-3
  hVDAC3
  Outer mitochondrial membrane protein porin 3
![Page 75: WoK: A Web of Knowledge](https://reader035.vdocuments.mx/reader035/viewer/2022081503/568143a0550346895db02137/html5/thumbnails/75.jpg)
Almost Ready to Harvest …
• Need reading path: DOM-tree structure
• Need to resolve mapping problems
  – Split/Merge
  – Union/Selection
Name
  T-complex protein 1 subunit theta
  TCP-1-theta
  CCT-theta
  Renal carcinoma antigen NY-REN-15
![Page 76: WoK: A Web of Knowledge](https://reader035.vdocuments.mx/reader035/viewer/2022081503/568143a0550346895db02137/html5/thumbnails/76.jpg)
Almost Ready to Harvest …
• Need reading path: DOM-tree structure
• Need to resolve mapping problems
  – Split/Merge
  – Union/Selection
Name
  T-complex protein 1 subunit theta
  TCP-1-theta
  CCT-theta
  Renal carcinoma antigen NY-REN-15
![Page 77: WoK: A Web of Knowledge](https://reader035.vdocuments.mx/reader035/viewer/2022081503/568143a0550346895db02137/html5/thumbnails/77.jpg)
Can Now Harvest
Name
![Page 78: WoK: A Web of Knowledge](https://reader035.vdocuments.mx/reader035/viewer/2022081503/568143a0550346895db02137/html5/thumbnails/78.jpg)
Can Now Harvest
Name
  14-3-3 protein epsilon
  Mitochondrial import stimulation factor L subunit
  Protein kinase C inhibitor protein-1
  KCIP-1
  14-3-3E
![Page 79: WoK: A Web of Knowledge](https://reader035.vdocuments.mx/reader035/viewer/2022081503/568143a0550346895db02137/html5/thumbnails/79.jpg)
Can Now Harvest
Name
  Voltage-dependent anion-selective channel protein 3
  VDAC-3
  hVDAC3
  Outer mitochondrial membrane protein porin 3
![Page 80: WoK: A Web of Knowledge](https://reader035.vdocuments.mx/reader035/viewer/2022081503/568143a0550346895db02137/html5/thumbnails/80.jpg)
Can Now Harvest
Name
  Tryptophanyl-tRNA synthetase, mitochondrial precursor
  EC 6.1.1.2
  Tryptophan--tRNA ligase
  TrpRS
  (Mt)TrpRS
![Page 81: WoK: A Web of Knowledge](https://reader035.vdocuments.mx/reader035/viewer/2022081503/568143a0550346895db02137/html5/thumbnails/81.jpg)
Harvesting Populates Ontology
![Page 82: WoK: A Web of Knowledge](https://reader035.vdocuments.mx/reader035/viewer/2022081503/568143a0550346895db02137/html5/thumbnails/82.jpg)
Harvesting Populates Ontology
Also helps adjust ontology constraints
![Page 83: WoK: A Web of Knowledge](https://reader035.vdocuments.mx/reader035/viewer/2022081503/568143a0550346895db02137/html5/thumbnails/83.jpg)
Can Harvest from Additional Sites
Name
  T-complex protein 1 subunit theta
  TCP-1-theta
  CCT-theta
  Renal carcinoma antigen NY-REN-15
![Page 84: WoK: A Web of Knowledge](https://reader035.vdocuments.mx/reader035/viewer/2022081503/568143a0550346895db02137/html5/thumbnails/84.jpg)
Automating Extraction Ontology Creation
Lexicons
Name
  14-3-3 protein epsilon
  Mitochondrial import stimulation factor L subunit
  Protein kinase C inhibitor protein-1
  KCIP-1
  14-3-3E
Name
  T-complex protein 1 subunit theta
  TCP-1-theta
  CCT-theta
  Renal carcinoma antigen NY-REN-15
Name
  Tryptophanyl-tRNA synthetase, mitochondrial precursor
  EC 6.1.1.2
  Tryptophan--tRNA ligase
  TrpRS
  (Mt)TrpRS
Resulting lexicon:
  …
  14-3-3 protein epsilon
  Mitochondrial import stimulation factor L subunit
  Protein kinase C inhibitor protein-1
  KCIP-1
  14-3-3E
  …
  T-complex protein 1 subunit theta
  TCP-1-theta
  CCT-theta
  Renal carcinoma antigen NY-REN-15
  …
  Tryptophanyl-tRNA synthetase, mitochondrial precursor
  EC 6.1.1.2
  Tryptophan--tRNA ligase
  TrpRS
  (Mt)TrpRS
  …
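Harvested names can serve as a lexicon-based recognizer for new text. The sketch below shows one simple way to do that, assuming the lexicon is compiled into a single alternation with longer names tried first; the function name `build_lexicon_recognizer` and the sample sentence are invented for illustration.

```python
import re

def build_lexicon_recognizer(names):
    """Compile harvested names into one pattern, longest alternatives first."""
    alternatives = sorted((re.escape(name) for name in names), key=len, reverse=True)
    return re.compile("|".join(alternatives))

# A few of the protein names harvested on the slides.
harvested = [
    "14-3-3 protein epsilon", "KCIP-1", "14-3-3E",
    "T-complex protein 1 subunit theta", "TCP-1-theta", "CCT-theta",
    "Voltage-dependent anion-selective channel protein 3", "VDAC-3", "hVDAC3",
]
recognizer = build_lexicon_recognizer(harvested)

# Invented sentence standing in for a page from an additional site.
text = "Binding of KCIP-1 to VDAC-3 was compared with TCP-1-theta."
found = recognizer.findall(text)
print(found)
```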
![Page 85: WoK: A Web of Knowledge](https://reader035.vdocuments.mx/reader035/viewer/2022081503/568143a0550346895db02137/html5/thumbnails/85.jpg)
Automating Extraction Ontology Creation
Instance Recognizers
• Number Patterns
• Context Keywords and Phrases
![Page 86: WoK: A Web of Knowledge](https://reader035.vdocuments.mx/reader035/viewer/2022081503/568143a0550346895db02137/html5/thumbnails/86.jpg)
Automatic Source-to-Form Mapping
![Page 87: WoK: A Web of Knowledge](https://reader035.vdocuments.mx/reader035/viewer/2022081503/568143a0550346895db02137/html5/thumbnails/87.jpg)
Automatic Semantic Annotation
Recognize and annotate with respect to an ontology
![Page 88: WoK: A Web of Knowledge](https://reader035.vdocuments.mx/reader035/viewer/2022081503/568143a0550346895db02137/html5/thumbnails/88.jpg)
Practicalities: WoK Query Interfaces (Future Work)
• Advanced free-form queries with disjunction and negation
• Form-based query language
• Table-based query languages
• Graphical query languages
![Page 89: WoK: A Web of Knowledge](https://reader035.vdocuments.mx/reader035/viewer/2022081503/568143a0550346895db02137/html5/thumbnails/89.jpg)
Practicalities: Bootstrapping the WoK (Future Work)
• Won’t just happen without sufficient content
• Niche applications
  – Historical Data (e.g. Genealogy)
  – Topical Blogs
• Local WoKs
  – Intra-organizational effort
  – Individual interests
![Page 90: WoK: A Web of Knowledge](https://reader035.vdocuments.mx/reader035/viewer/2022081503/568143a0550346895db02137/html5/thumbnails/90.jpg)
Practicalities: Scalability (Future Work)
• Potential rapid growth
  – Thousands of ontologies
  – Millions of simultaneous queries
  – Billions of annotated pages
  – Trillions of facts
• Search-engine-like caching & query processing
![Page 91: WoK: A Web of Knowledge](https://reader035.vdocuments.mx/reader035/viewer/2022081503/568143a0550346895db02137/html5/thumbnails/91.jpg)
Key to Success: Simplicity via Automation
• Automatic (or near automatic) creation of extraction ontologies
• Automatic (or near automatic) annotation of web pages
• Simple but accurate query specification without specialized training
www.deg.byu.edu