semantic understanding

Semantic Understanding An Approach Based on Information Extraction Ontologies David W. Embley Brigham Young University Funded in part by the National Science Foundation


Post on 03-Jan-2016


Page 1: Semantic Understanding

Semantic Understanding

An Approach Based on Information Extraction Ontologies

David W. Embley
Brigham Young University

Funded in part by the National Science Foundation

Page 2: Semantic Understanding

Presentation Outline
• Grand Challenge
• Meaning, Knowledge, Information, Data
• Fun and Games with Data
• Information Extraction Ontologies
• Applications
• Limitations and Pragmatics
• Summary and Challenges

Page 3: Semantic Understanding

Grand Challenge

Semantic Understanding

Can we quantify & specify the nature of this grand challenge?

Page 4: Semantic Understanding

Grand Challenge

Semantic Understanding

“If ever there were a technology that could generate trillions of dollars in savings worldwide …, it would be the technology that makes business information systems interoperable.”

(Jeffrey T. Pollock, VP of Technology Strategy, Modulant Solutions)

Page 5: Semantic Understanding

Grand Challenge

Semantic Understanding

“The Semantic Web: … content that is meaningful to computers [and that] will unleash a revolution of new possibilities … Properly designed, the Semantic Web can assist the evolution of human knowledge …”

(Tim Berners-Lee, …, Weaving the Web)

Page 6: Semantic Understanding

Grand Challenge

Semantic Understanding

“20th Century: Data Processing”
“21st Century: Data Exchange”
“The issue now is mutual understanding.”

(Stefano Spaccapietra, Editor in Chief, Journal on Data Semantics)

Page 7: Semantic Understanding

Grand Challenge

Semantic Understanding

“The Grand Challenge [of semantic understanding] has become mission critical. Current solutions … won’t scale. Businesses need economic growth dependent on the web working and scaling (cost: $1 trillion/year).”

(Michael Brodie, Chief Scientist, Verizon Communications)

Page 8: Semantic Understanding

What is Semantic Understanding?

Understanding: “To grasp or comprehend [what’s] intended or expressed.”

Semantics: “The meaning or the interpretation of a word, sentence, or other language form.”

- Dictionary.com

Page 9: Semantic Understanding

Can We Achieve Semantic Understanding?

“A computer doesn’t truly ‘understand’ anything.”

But computers can manipulate terms “in ways that are useful and meaningful to the human user.”

- Tim Berners-Lee

Key Point: it only has to be good enough.
And that’s our challenge and our opportunity!

Page 10: Semantic Understanding

Presentation Outline
• Grand Challenge
• Meaning, Knowledge, Information, Data
• Fun and Games with Data
• Information Extraction Ontologies
• Applications
• Limitations and Pragmatics
• Summary and Challenges

Page 11: Semantic Understanding

Information Value Chain

Meaning

Knowledge

Information

Data

Translating data into meaning

Page 12: Semantic Understanding

Foundational Definitions

Meaning: knowledge that is relevant or activates
Knowledge: information with a degree of certainty or community agreement
Information: data in a conceptual framework
Data: attribute-value pairs

- Adapted from [Meadow92]

Page 13: Semantic Understanding

Foundational Definitions

Meaning: knowledge that is relevant or activates
Knowledge: information with a degree of certainty or community agreement (ontology)
Information: data in a conceptual framework
Data: attribute-value pairs

- Adapted from [Meadow92]


Page 16: Semantic Understanding

Data

Attribute-Value Pairs
• Fundamental for information
• Thus, fundamental for knowledge & meaning

Page 17: Semantic Understanding

Data

Attribute-Value Pairs
• Fundamental for information
• Thus, fundamental for knowledge & meaning

Data Frame
• Extensive knowledge about a data item
  - Everyday data: currency, dates, time, weights & measures
  - Textual appearance, units, context, operators, I/O conversion
• Abstract data type with an extended framework
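
One way to picture a data frame is as an abstract data type that bundles recognizers for textual appearance, context keywords, and an input conversion. The Python sketch below is only an illustration of that idea: the class name `DataFrame`, its fields, and the currency pattern are assumptions made for this example and do not come from the talk.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class DataFrame:
    """Hypothetical data frame: knowledge about one kind of data item."""
    name: str
    value_patterns: List[str]                      # textual appearance
    context_keywords: List[str] = field(default_factory=list)
    to_internal: Callable[[str], object] = str     # input conversion

    def recognize(self, text: str) -> List[object]:
        """Return the internal value of every textual match in `text`."""
        found = []
        for pattern in self.value_patterns:
            for match in re.finditer(pattern, text):
                found.append(self.to_internal(match.group()))
        return found


# Assumed example: US currency amounts such as "$11,995".
price = DataFrame(
    name="Price",
    value_patterns=[r"\$\d{1,3}(?:,\d{3})*(?:\.\d{2})?"],
    context_keywords=["asking", "price"],
    to_internal=lambda s: float(s.lstrip("$").replace(",", "")),
)

values = price.recognize("Asking only $11,995. Was $12,500.00 last month.")
print(values)
```

Operators (for example, comparing two prices) could be added as further methods on the same class.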

Page 18: Semantic Understanding

Presentation Outline
• Grand Challenge
• Meaning, Knowledge, Information, Data
• Fun and Games with Data
• Information Extraction Ontologies
• Applications
• Limitations and Pragmatics
• Summary and Challenges

Page 19: Semantic Understanding

?

Olympus C-750 Ultra Zoom

Sensor Resolution: 4.2 megapixels
Optical Zoom: 10 x
Digital Zoom: 4 x
Installed Memory: 16 MB
Lens Aperture: F/8-2.8/3.7
Focal Length min: 6.3 mm
Focal Length max: 63.0 mm


Page 22: Semantic Understanding

?

Olympus C-750 Ultra Zoom

Sensor Resolution 4.2 megapixels
Optical Zoom 10 x
Digital Zoom 4 x
Installed Memory 16 MB
Lens Aperture F/8-2.8/3.7
Focal Length min 6.3 mm
Focal Length max 63.0 mm

Page 23: Semantic Understanding

Digital Camera

Olympus C-750 Ultra Zoom

Sensor Resolution: 4.2 megapixels
Optical Zoom: 10 x
Digital Zoom: 4 x
Installed Memory: 16 MB
Lens Aperture: F/8-2.8/3.7
Focal Length min: 6.3 mm
Focal Length max: 63.0 mm

Page 24: Semantic Understanding

?

Year 2002
Make Ford
Model Thunderbird
Mileage 5,500 miles
Features Red, ABS, 6 CD changer, keyless entry
Price $33,000
Phone (916) 972-9117


Page 28: Semantic Understanding

Car Advertisement

Year 2002
Make Ford
Model Thunderbird
Mileage 5,500 miles
Features Red, ABS, 6 CD changer, keyless entry
Price $33,000
Phone (916) 972-9117

Page 29: Semantic Understanding

?

Flight #   Class  From  Time/Date          To   Time/Date          Stops
Delta 16   Coach  JFK   6:05 pm 02 01 04   CDG  7:35 am 03 01 04   0
Delta 119  Coach  CDG   10:20 am 09 01 04  JFK  1:00 pm 09 01 04   0


Page 31: Semantic Understanding

Airline Itinerary

Flight #   Class  From  Time/Date          To   Time/Date          Stops
Delta 16   Coach  JFK   6:05 pm 02 01 04   CDG  7:35 am 03 01 04   0
Delta 119  Coach  CDG   10:20 am 09 01 04  JFK  1:00 pm 09 01 04   0

Page 32: Semantic Understanding

?

Monday, October 13th

Group A      W  L  T  GF  GA  Pts.
USA          3  0  0  11   1  9
Sweden       2  1  0   5   3  6
North Korea  1  2  0   3   4  3
Nigeria      0  3  0   0  11  0

Group B      W  L  T  GF  GA  Pts.
Brazil       2  0  1   8   2  7
…


Page 34: Semantic Understanding

World Cup Soccer

Monday, October 13th

Group A      W  L  T  GF  GA  Pts.
USA          3  0  0  11   1  9
Sweden       2  1  0   5   3  6
North Korea  1  2  0   3   4  3
Nigeria      0  3  0   0  11  0

Group B      W  L  T  GF  GA  Pts.
Brazil       2  0  1   8   2  7
…

Page 35: Semantic Understanding

?

Calories 250 cal
Distance 2.50 miles
Time 23.35 minutes
Incline 1.5 degrees
Speed 5.2 mph
Heart Rate 125 bpm


Page 38: Semantic Understanding

Treadmill Workout

Calories 250 cal
Distance 2.50 miles
Time 23.35 minutes
Incline 1.5 degrees
Speed 5.2 mph
Heart Rate 125 bpm

Page 39: Semantic Understanding

?

Place Bonnie Lake
County Duchesne
State Utah
Type Lake
Elevation 10,000 feet
USGS Quad Mirror Lake
Latitude 40.711°N
Longitude 110.876°W


Page 42: Semantic Understanding

Maps

Place Bonnie Lake
County Duchesne
State Utah
Type Lake
Elevation 10,100 feet
USGS Quad Mirror Lake
Latitude 40.711°N
Longitude 110.876°W

Page 43: Semantic Understanding

Presentation Outline
• Grand Challenge
• Meaning, Knowledge, Information, Data
• Fun and Games with Data
• Information Extraction Ontologies
• Applications
• Limitations and Pragmatics
• Summary and Challenges

Page 44: Semantic Understanding

Information Extraction Ontologies

[Figure: diagram with the labels Source, Target, Information Extraction, and Information Exchange]

Page 45: Semantic Understanding

What is an Extraction Ontology?

Augmented Conceptual-Model Instance
• Object & relationship sets
• Constraints
• Data frame value recognizers

Robust Wrapper (Ontology-Based Wrapper)
• Extracts information
• Works even when site changes or when new sites come on-line

Page 46: Semantic Understanding

[Figure: CarAds extraction ontology diagram. Object sets: CarAds, Year, Make, Model, Trim, ModelTrim, Mileage, Price, PhoneNr, Color, Feature, Accessory, BodyType, OtherFeature, Engine, Transmission. Relationship sets are labeled "has" and carry participation constraints such as 0:1, 1:*, 0:*, 0:0.7:1, 0:0.9:1, and 0:0.78:1.]

CarAds Extraction Ontology

<ObjectSet x="329" y="51" lexical="true" name="Mileage" id="osmx50">
  <DataFrame>
    <InternalRepresentation>
      <DataType typeName="String"/>
    </InternalRepresentation>
    <ValuePhraseList>
      <ValuePhrase hint="Mileage Pattern 1">
        <ValueExpression color="ffffff">
          <ExpressionText>[1-9]\d{0,2}[kK]</ExpressionText>
        </ValueExpression>
        <LeftContextExpression color="ffffff">
        …
    <KeywordPhraseList>
      <KeywordPhrase hint="New phrase 1">
        <KeywordExpression color="ffffff">
          <ExpressionText>\bmiles\b</ExpressionText>
          …
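
The two regular expressions visible in the Mileage data frame above can be tried directly. The following Python sketch applies only those two expressions; the function name `find_mileage` and the sample sentence are assumptions for illustration, and the elided parts of the data frame (left context and further patterns) are not modeled.

```python
import re

VALUE_EXPRESSION = r"[1-9]\d{0,2}[kK]"   # value pattern shown in the data frame
KEYWORD_EXPRESSION = r"\bmiles\b"        # keyword pattern shown in the data frame


def find_mileage(text):
    """Return (values, keywords), each a list of (string, start, end) with 0-based offsets."""
    values = [(m.group(), m.start(), m.end())
              for m in re.finditer(VALUE_EXPRESSION, text)]
    keywords = [(m.group(), m.start(), m.end())
                for m in re.finditer(KEYWORD_EXPRESSION, text)]
    return values, keywords


# Made-up sample text, not from the talk.
values, keywords = find_mileage("2001 Civic, 75K miles, runs great")
print(values, keywords)
```

Note that the value pattern on its own would also fire on strings such as "4K" in a screen-resolution ad, which is why the data frame pairs it with context and keyword expressions.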

Page 47: Semantic Understanding

Extraction Ontologies: An Example of Semantic Understanding

• “Intelligent” Symbol Manipulation
• Gives the “Illusion of Understanding”
• Obtains Meaningful and Useful Results

Page 48: Semantic Understanding

Presentation Outline
• Grand Challenge
• Meaning, Knowledge, Information, Data
• Fun and Games with Data
• Information Extraction Ontologies
• Applications
• Limitations and Pragmatics
• Summary and Challenges

Page 49: Semantic Understanding

A Variety of Applications
• Information Extraction
• Semantic Web Page Annotation
• Free-Form Semantic Web Queries
• Task Ontologies for Free-Form Service Requests
• High-Precision Classification
• Schema Mapping for Ontology Alignment
• Record Linkage
• Accessing the Hidden Web
• Ontology Discovery and Generation
• Challenging Applications (e.g. BioInformatics)

Page 50: Semantic Understanding

Application #1

Information Extraction

Page 51: Semantic Understanding

'97 CHEVY Cavalier, Red, 5 spd, only 7,000 miles. Previous owner heart broken! Asking only $11,995. #1415. JERRY SEINER MIDVALE, 566-3800 or 566-3888

Constant/Keyword Recognition

Descriptor/String/Position(start/end)

Year|97|2|3
Make|CHEV|5|8
Make|CHEVY|5|9
Model|Cavalier|11|18
Feature|Red|21|23
Feature|5 spd|26|30
Mileage|7,000|38|42
KEYWORD(Mileage)|miles|44|48
Price|11,995|100|105
Mileage|11,995|100|105
PhoneNr|566-3800|136|143
PhoneNr|566-3888|148|155
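
A table like the one above can be produced by running one regular expression per recognizer over the ad and recording 1-based, inclusive character positions. The Python sketch below is an assumption-laden illustration: the recognizer patterns are invented for this single ad (the real ontology's data frames are not shown in full), and it uses the "miles on her" wording of the ad from the later slides because that is the wording the slide's positions for 11,995 correspond to.

```python
import re

AD = ("'97 CHEVY Cavalier, Red, 5 spd, only 7,000 miles on her. "
      "Previous owner heart broken! Asking only $11,995. #1415. "
      "JERRY SEINER MIDVALE, 566-3800 or 566-3888")

# Hypothetical recognizers: (descriptor, regular expression).
RECOGNIZERS = [
    ("Year", r"(?<=')\d{2}\b"),
    ("Make", r"CHEV"),
    ("Make", r"CHEVY"),
    ("Model", r"\bCavalier\b"),
    ("Feature", r"\bRed\b"),
    ("Feature", r"\b5 spd\b"),
    ("Mileage", r"\b\d{1,3},\d{3}\b"),
    ("KEYWORD(Mileage)", r"\bmiles\b"),
    ("Price", r"(?<=\$)\d{1,3},\d{3}\b"),
    ("PhoneNr", r"\b\d{3}-\d{4}\b"),
]


def recognize(text):
    """Return (descriptor, string, start, end) records with 1-based, inclusive positions."""
    records = []
    for descriptor, pattern in RECOGNIZERS:
        for m in re.finditer(pattern, text):
            records.append((descriptor, m.group(), m.start() + 1, m.end()))
    return sorted(records, key=lambda r: (r[2], r[3]))


records = recognize(AD)
for record in records:
    print("|".join(str(part) for part in record))
```

With this ad text the sketch yields twelve records. The first ten agree with the positions on the slide; the two phone numbers come out one position later than the slide shows, which suggests the original ad text differed slightly from this transcript.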

Page 52: Semantic Understanding

Heuristics

• Keyword proximity
• Subsumed and overlapping constants
• Functional relationships
• Nonfunctional relationships
• First occurrence without constraint violation

Page 53: Semantic Understanding


Keyword Proximity

'97 CHEVY Cavalier, Red, 5 spd, only 7,000 miles on her. Previous owner heart broken! Asking only $11,995. #1415. JERRY SEINER MIDVALE, 566-3800 or 566-3888
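
The slide names the heuristic without spelling out a rule, so the following Python sketch shows one plausible reading: when an object set has several candidate constants and a keyword, keep the candidate closest to the keyword. The function names and the distance measure are assumptions; the records are taken from the recognition table.

```python
# Candidate constants and keywords as (descriptor, string, start, end).
RECORDS = [
    ("Mileage", "7,000", 38, 42),
    ("KEYWORD(Mileage)", "miles", 44, 48),
    ("Price", "11,995", 100, 105),
    ("Mileage", "11,995", 100, 105),
]


def gap(a, b):
    """Number of characters between two spans (0 if they touch or overlap)."""
    return max(0, max(a[2], b[2]) - min(a[3], b[3]) - 1)


def keep_closest_to_keyword(records, object_set):
    """Keep only the candidate of `object_set` nearest to one of its keywords."""
    keywords = [r for r in records if r[0] == f"KEYWORD({object_set})"]
    candidates = [r for r in records if r[0] == object_set]
    if not keywords or len(candidates) < 2:
        return records
    best = min(candidates, key=lambda c: min(gap(c, k) for k in keywords))
    return [r for r in records if r[0] != object_set or r == best]


resolved = keep_closest_to_keyword(RECORDS, "Mileage")
print(resolved)
```

Here "7,000" sits one character from "miles" while "11,995" is 51 characters away, so "11,995" is dropped as a Mileage candidate and remains only as a Price.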


Page 54: Semantic Understanding

Subsumed/Overlapping Constants

'97 CHEVY Cavalier, Red, 5 spd, only 7,000 miles. Previous owner heart broken! Asking only $11,995. #1415. JERRY SEINER MIDVALE, 566-3800 or 566-3888
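
In the recognition table, "CHEV" (positions 5 to 8) lies inside "CHEVY" (positions 5 to 9). A minimal Python sketch of one way to discard such subsumed constants follows; the rule (drop a match contained in a longer match of the same object set) and the function name are assumptions, since the slide gives only the heuristic's name.

```python
# (descriptor, string, start, end) records taken from the recognition table.
RECORDS = [
    ("Make", "CHEV", 5, 8),
    ("Make", "CHEVY", 5, 9),
    ("Model", "Cavalier", 11, 18),
]


def drop_subsumed(records):
    """Remove a constant whose span lies inside a longer span of the same object set."""
    def subsumed(r):
        return any(
            o is not r
            and o[0] == r[0]
            and o[2] <= r[2] and r[3] <= o[3]
            and (o[3] - o[2]) > (r[3] - r[2])
            for o in records
        )
    return [r for r in records if not subsumed(r)]


kept = drop_subsumed(RECORDS)
print(kept)
```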


Page 55: Semantic Understanding


Functional Relationships

'97 CHEVY Cavalier, Red, 5 spd, only 7,000 miles on her. Previous owner heart broken! Asking only $11,995. #1415. JERRY SEINER MIDVALE, 566-3800 or 566-3888


Page 56: Semantic Understanding

Nonfunctional Relationships

'97 CHEVY Cavalier, Red, 5 spd, only 7,000 miles on her. Previous owner heart broken! Asking only $11,995. #1415. JERRY SEINER MIDVALE, 566-3800 or 566-3888


Page 57: Semantic Understanding

First Occurrence without Constraint Violation

'97 CHEVY Cavalier, Red, 5 spd, only 7,000 miles on her. Previous owner heart broken! Asking only $11,995. #1415. JERRY SEINER MIDVALE, 566-3800 or 566-3888


Page 58: Semantic Understanding


Database-Instance Generator

insert into Car values(1001, '97', 'CHEVY', 'Cavalier', '7,000', '11,995', '566-3800');
insert into CarFeature values(1001, 'Red');
insert into CarFeature values(1001, '5 spd');
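
The generated statements can be exercised against an in-memory SQLite database. In this Python sketch the table names `Car` and `CarFeature` come from the slide, while the column names, the function `generate_instance`, and the use of SQLite are assumptions for illustration.

```python
import sqlite3


def generate_instance(car_id, values, features):
    """Load one extracted car ad into an in-memory SQLite database."""
    db = sqlite3.connect(":memory:")
    # Column names are assumed; the slide shows only the table names.
    db.execute("CREATE TABLE Car (id INTEGER PRIMARY KEY, Year TEXT, Make TEXT, "
               "Model TEXT, Mileage TEXT, Price TEXT, PhoneNr TEXT)")
    db.execute("CREATE TABLE CarFeature (car_id INTEGER, Feature TEXT)")
    db.execute("INSERT INTO Car VALUES (?, ?, ?, ?, ?, ?, ?)",
               (car_id, values["Year"], values["Make"], values["Model"],
                values["Mileage"], values["Price"], values["PhoneNr"]))
    db.executemany("INSERT INTO CarFeature VALUES (?, ?)",
                   [(car_id, feature) for feature in features])
    db.commit()
    return db


db = generate_instance(
    1001,
    {"Year": "97", "Make": "CHEVY", "Model": "Cavalier",
     "Mileage": "7,000", "Price": "11,995", "PhoneNr": "566-3800"},
    ["Red", "5 spd"],
)
car = db.execute("SELECT * FROM Car").fetchone()
features = [row[0] for row in
            db.execute("SELECT Feature FROM CarFeature ORDER BY rowid")]
print(car, features)
```

Parameterized statements are used instead of string concatenation so that extracted text containing quotes cannot break the SQL.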

Page 59: Semantic Understanding

Application #2

Semantic Web Page Annotation

Page 60: Semantic Understanding

Annotated Web Page(Demo)

Page 61: Semantic Understanding

OWL

<owl:Class rdf:ID="CarAds">
  <rdfs:label xml:lang="en">CarAds</rdfs:label>
  ......
  <rdfs:subClassOf>
    <owl:Restriction>
      <owl:onProperty rdf:resource="#hasMileage" />
      <owl:minCardinality rdf:datatype="&xsd;nonNegativeInteger">0</owl:minCardinality>
    </owl:Restriction>
  </rdfs:subClassOf>
  <rdfs:subClassOf>
    <owl:Restriction>
      <owl:onProperty rdf:resource="#hasMileage" />
      <owl:maxCardinality rdf:datatype="&xsd;nonNegativeInteger">1</owl:maxCardinality>
    </owl:Restriction>
  </rdfs:subClassOf>
  <rdfs:subClassOf>
    <owl:Restriction>
      <owl:onProperty rdf:resource="#hasMileage" />
      <owl:allValuesFrom rdf:resource="#Mileage" />
    </owl:Restriction>
  </rdfs:subClassOf>
  ……
</owl:Class>
……
<owl:Class rdf:ID="Mileage">
  <rdfs:label xml:lang="en">Mileage</rdfs:label>
  ……
</owl:Class>
……
<CarAds rdf:ID="CarAdsIns2">
  <CarAdsValue rdf:datatype="&xsd;string">2</CarAdsValue>
</CarAds>
……
<Mileage rdf:ID="MileageIns2">
  <StartingCharPosition rdf:datatype="&xsd;nonNegativeInteger">237</StartingCharPosition>
  <EndingCharPosition rdf:datatype="&xsd;nonNegativeInteger">241</EndingCharPosition>
</Mileage>
…….
<owl:Thing rdf:about="#CarAdsIns2">
  <hasMake rdf:resource="#MakeIns2" />
  <hasModel rdf:resource="#ModelIns2" />
  <hasYear rdf:resource="#YearIns2" />
  <hasMileage rdf:resource="#MileageIns2" />
  <hasPhoneNr rdf:resource="#PhoneNrIns2" />
  <hasPrice rdf:resource="#PriceIns2" />
</owl:Thing>
……

Page 62: Semantic Understanding

Application #3

Free-Form Semantic Web Queries

Page 63: Semantic Understanding

Find Ontology

“Tell me about cruises on San Francisco Bay. I’d like to know scheduled times, cost, and the duration of cruises on Friday of next week.”

Page 64: Semantic Understanding

Formulate Query

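Selection Constants: San Francisco Bay; Friday, Oct. 29th
Projection: scheduled times, cost, duration
Join Path

[Figure: the query assembled from these parts, shown as "Result = ( )"; the expression itself did not survive extraction]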

Page 65: Semantic Understanding

StartTime | Price | Duration | Source
10:45 am, 12:00 pm, 1:15, 2:30, 4:00 | $20.00, $16.00, $12.00 | | 1
10:00 am, 10:45 am, 11:15 am, 12:00 pm, 12:30 pm, 1:15 pm, 1:45 pm, 2:30 pm, 3:00 pm, 3:45 pm, 4:15 pm, 5:00 pm | $17.00, $16.00, $12.00 | 1 Hour | 2

Page 66: Semantic Understanding

Application #4

Task Ontologies for Free-Form Service Requests

Page 67: Semantic Understanding

Basic Idea

Service Request

Match with Task Ontology
• Domain Ontology
• Process Ontology

Complete, Negotiate, Finalize

I want to see a dermatologist next week; any day would be ok for me, at 4:00 p.m. The dermatologist must be within 20 miles from my home and must accept my insurance.

Page 68: Semantic Understanding

Domain Ontology

[Figure: domain ontology diagram. Object sets: Appointment, Date, Time, Place, Address, Person, Name, Service Provider, Service Description, Duration, Cost, Insurance, Medical Service Provider, Doctor, Dermatologist, Pediatrician, Auto Service Provider, Auto Mechanic. Relationship labels: has, is at, is on, is with, is for, provides, accepts. Values shown: "IHC", "DMBA".]

Page 69: Semantic Understanding

[Figure: domain ontology diagram. Object sets: Appointment, Date, Time, Place, Address, Person, Name, Service Provider, Service Description, Duration, Cost, Insurance, Medical Service Provider, Doctor, Dermatologist, Pediatrician, Auto Service Provider, Auto Mechanic. Relationship labels: has, is at, is on, is with, is for, provides, accepts. Values shown: "IHC", "DMBA".]

Appointment …
context keywords/phrases: “appointment | want to see a | …”

Dermatologist …
context keywords/phrases: “([D|d]ermatologist) | …”

I want to see a dermatologist next week; any day would be ok for me, at 4:00 p.m. The dermatologist must be within 20 miles from my home and must accept my insurance.


Page 73: Semantic Understanding

[Figure: domain ontology diagram. Object sets: Appointment, Date, Time, Place, Address, Person, Name, Service Provider, Service Description, Duration, Cost, Insurance, Medical Service Provider, Doctor, Dermatologist, Pediatrician, Auto Service Provider, Auto Mechanic. Relationship labels: has, is at, is on, is with, is for, provides, accepts. Values shown: "IHC", "DMBA".]

Appointment …
context keywords/phrases: “appointment | want to see a | …”

Dermatologist …
context keywords/phrases: “([D|d]ermatologist) | …”

I want to see a dermatologist next week; any day would be ok for me, at 4:00 p.m. The dermatologist must be within 20 miles from my home and must accept my insurance.

Date …
NextWeek(d1: Date, d2: Date) returns (Boolean {T, F})
context keywords/phrases: next week | week from now | …

Distance
internal representation: real;
input (s: String)
context keywords/phrases: miles | mile | mi | kilometers | kilometer | meters | meter | centimeter | …
Within(d1: Distance, “20”) returns (Boolean {T or F})
context keywords/phrases: within | not more than | …
return (d1 ≤ d2) …
end;
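
The Distance data frame above pairs an input conversion with a Boolean operation. A minimal Python sketch of that pairing follows. It assumes miles as the internal unit, and the conversion factors and the function names `parse_distance` and `within` are illustrative choices, not details from the talk.

```python
import re

# Assumed conversion factors to miles for some of the units named on the slide.
MILES_PER_UNIT = {
    "mile": 1.0, "miles": 1.0, "mi": 1.0,
    "kilometer": 0.621371, "kilometers": 0.621371,
    "meter": 0.000621371, "meters": 0.000621371,
}


def parse_distance(s):
    """Input conversion: turn text such as '20 miles' into a real number of miles."""
    m = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([A-Za-z]+)\s*", s)
    if m is None or m.group(2).lower() not in MILES_PER_UNIT:
        raise ValueError(f"unrecognized distance: {s!r}")
    return float(m.group(1)) * MILES_PER_UNIT[m.group(2).lower()]


def within(d1, d2):
    """Boolean operation of the data frame: is d1 at most d2?"""
    return d1 <= d2


limit = parse_distance("20 miles")
print(within(parse_distance("12 kilometers"), limit))
print(within(parse_distance("22 miles"), limit))
```

In the request, the constant "20" would be bound to the second argument once the phrase "within 20 miles" is recognized.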

Page 74: Semantic Understanding

[Figure: domain ontology diagram. Object sets: Appointment, Date, Time, Place, Address, Person, Name, Service Provider, Service Description, Duration, Cost, Insurance, Medical Service Provider, Doctor, Dermatologist, Pediatrician, Auto Service Provider, Auto Mechanic. Relationship labels: has, is at, is on, is with, is for, provides, accepts. Values shown: "IHC", "DMBA".]

Page 75: Semantic Understanding

[Figure: the domain ontology reduced to fewer object sets: Appointment, Date, Time, Place, Address, Dermatologist, Person, Name. Relationship labels: has, is at, is on, is with, is for.]

Page 76: Semantic Understanding

Process Ontology

[Figure: process ontology drawn as a state net. The labels that survive extraction are listed below in the order the process appears to run.]

@process ontology(domain ontology), @create:
  task-view = create-task-view(domain ontology);
  task-constraints = create-task-constraints(task-view);

ready
  initialize
  …

initial-task-view ready
  no missing information
  missing information:
    task-view = get-from-system(task-view);
    if (still missing values) task-view = get-from-user(task-view);

ready to schedule
  task-view = null:
    report that the appointment cannot be scheduled
  task-view != null:
    schedule-appointment(task-view.Person.Name, task-view.Service Provider.Name, task-view.Date, task-view.Time, task-view.Address);
    report that the appointment is scheduled;

Page 77: Semantic Understanding

Specification Satisfaction

[Figure: instance diagram with the objects Dermatologist0, Person100, and Appointment7 and the values "Dr. Carter", "Lynn Jones", "IHC", "DMBA", "4:00", "5 Jan 05", "Orem 600 State St.", and "Provo 300 State St."]

Date(“28 Dec 04”) and NextWeek(“28 Dec 04”, “5 Jan 05”)
Dermatologist(Dermatologist0) is at Address(“Orem 600 State St.”) and Within(DistanceBetween(“Provo 300 State St.”, “Orem 600 State St.”), “22”)
∃i2 (Dermatologist(Dermatologist0) accepts Insurance(i2) and Equal(“IHC”, i2))

Page 78: Semantic Understanding

Application #5

High-Precision Classification

Page 79: Semantic Understanding

An Extraction Ontology Solution

Page 80: Semantic Understanding

Density Heuristic

Document 1: Car Ads

Document 2: Items for Sale or Rent

Page 81: Semantic Understanding

Expected Values Heuristic

Document 1: Car Ads
Year: 3
Make: 2
Model: 3
Mileage: 1
Price: 1
Feature: 15
PhoneNr: 3

Document 2: Items for Sale or Rent
Year: 1
Make: 0
Model: 0
Mileage: 1
Price: 0
Feature: 0
PhoneNr: 4

Page 82: Semantic Understanding

Vector Space of Expected Values

          OV     D1   D2
Year      0.98   16    6
Make      0.93   10    0
Model     0.91   12    0
Mileage   0.45    6    2
Price     0.80   11    8
Feature   2.10   29    0
PhoneNr   1.15   15   11

D1: 0.996
D2: 0.567

[Figure: the vectors OV, D1, and D2 plotted in the vector space]
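
The slide does not name the similarity measure. Cosine similarity between the expected-values vector OV and each document vector reproduces both reported figures, so the Python sketch below assumes that measure; the function name `cosine` is an illustrative choice.

```python
import math

ATTRIBUTES = ["Year", "Make", "Model", "Mileage", "Price", "Feature", "PhoneNr"]
OV = [0.98, 0.93, 0.91, 0.45, 0.80, 2.10, 1.15]   # expected values per ad
D1 = [16, 10, 12, 6, 11, 29, 15]                  # counts found in Document 1
D2 = [6, 0, 0, 2, 8, 0, 11]                       # counts found in Document 2


def cosine(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


print(f"D1: {cosine(OV, D1):.3f}")  # D1: 0.996
print(f"D2: {cosine(OV, D2):.3f}")  # D2: 0.567
```

Because cosine similarity ignores vector length, a document with many ads and a document with few ads score alike as long as the proportions of object sets match the expected profile.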

Page 83: Semantic Understanding

Grouping Heuristic

Document 1: Car Ads
Year, Make, Model, Price, Year, Model, Year, Make, Model, Mileage, …

Document 2: Items for Sale or Rent
Year, Mileage, …, Mileage, Year, Price, Price, …

[Figure: braces grouping consecutive object sets in each list]

Page 84: Semantic Understanding

Grouping

Car Ads
Year, Year, Make, Model: 3
Price, Year, Model, Year: 3
Make, Model, Mileage, Year: 4
Model, Mileage, Price, Year: 4
…
Grouping: 0.875

Sale Items
Year, Year, Year, Mileage: 2
Mileage, Year, Price, Price: 3
Year, Price, Price, Year: 2
Price, Price, Price, Price: 1
…
Grouping: 0.500

Expected Number in Group = floor(∑ Ave) = 4 (for our example)

Grouping = (Sum of Distinct 1-Max Object Sets in each Group) / (Number of Groups * Expected Number in a Group)

Car Ads: (3+3+4+4) / (4*4) = 0.875
Sale Items: (2+3+2+1) / (4*4) = 0.500
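
The grouping computation can be replayed on the two example sequences. In the Python sketch below, the averages are taken from the OV column of the vector-space slide for Year, Make, Model, Mileage, and Price; treating exactly those five as the 1-max object sets is an assumption, though their sum (4.07) does give the stated group size of 4. The function name `grouping_score` is likewise illustrative.

```python
import math

# Assumed 1-max object sets with their average occurrences per ad.
ONE_MAX_AVERAGES = {"Year": 0.98, "Make": 0.93, "Model": 0.91,
                    "Mileage": 0.45, "Price": 0.80}

CAR_ADS = ["Year", "Year", "Make", "Model",
           "Price", "Year", "Model", "Year",
           "Make", "Model", "Mileage", "Year",
           "Model", "Mileage", "Price", "Year"]

SALE_ITEMS = ["Year", "Year", "Year", "Mileage",
              "Mileage", "Year", "Price", "Price",
              "Year", "Price", "Price", "Year",
              "Price", "Price", "Price", "Price"]


def grouping_score(sequence, averages=ONE_MAX_AVERAGES):
    """Sum of distinct object sets per group / (number of groups * group size)."""
    n = math.floor(sum(averages.values()))   # expected number in a group
    groups = [sequence[i:i + n] for i in range(0, len(sequence) - n + 1, n)]
    if not groups:
        return 0.0
    distinct = sum(len(set(group)) for group in groups)
    return distinct / (len(groups) * n)


print(grouping_score(CAR_ADS))     # 0.875
print(grouping_score(SALE_ITEMS))  # 0.5
```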

Page 85: Semantic Understanding

Application #6

Schema Mapping forOntology Alignment

Page 86: Semantic Understanding

Problem: Different Schemas

Target Database Schema
{Car, Year, Make, Model, Mileage, Price, PhoneNr}, {PhoneNr, Extension}, {Car, Feature}

Different Source Table Schemas
• {Run #, Yr, Make, Model, Tran, Color, Dr}
• {Make, Model, Year, Colour, Price, Auto, Air Cond., AM/FM, CD}
• {Vehicle, Distance, Price, Mileage}
• {Year, Make, Model, Trim, Invoice/Retail, Engine, Fuel Economy}

Page 87: Semantic Understanding

Solution: Remove Internal Factoring

Discover Nesting: Make, (Model, (Year, Colour, Price, Auto, Air Cond, AM/FM, CD)*)*

Unnest: μ(Model, Year, Colour, Price, Auto, Air Cond, AM/FM, CD)* μ(Year, Colour, Price, Auto, Air Cond, AM/FM, CD)* Table

[Figure: nested source table; visible cell values: ACURA, Legend]
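
Unnesting can be pictured as flattening one level of nesting at a time. The Python sketch below applies two such steps to a small nested table. The representation (dictionaries with the hypothetical keys `Models` and `Cars`) and the function name `unnest` are assumptions; the Honda row reuses values from the later attribute-value slide, while the ACURA years and prices are made up for illustration.

```python
# Hypothetical nested source table: Make -> Models -> individual cars.
NESTED = [
    {"Make": "ACURA", "Models": [
        {"Model": "Legend", "Cars": [
            {"Year": "1992", "Price": "$9500"},    # made-up values
            {"Year": "1994", "Price": "$12000"},   # made-up values
        ]},
    ]},
    {"Make": "Honda", "Models": [
        {"Model": "Civic EX", "Cars": [
            {"Year": "1995", "Price": "$6300"},
        ]},
    ]},
]


def unnest(rows, nested_key):
    """One unnest step: emit one flat row per element of row[nested_key]."""
    flat = []
    for row in rows:
        outer = {k: v for k, v in row.items() if k != nested_key}
        for inner in row[nested_key]:
            flat.append({**outer, **inner})
    return flat


table = unnest(unnest(NESTED, "Models"), "Cars")
for row in table:
    print(row)
```

After both steps every row carries its own Make and Model, so each row describes one car.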

Page 88: Semantic Understanding

Solution: Replace Boolean Values

[Figure: source table with Boolean cells replaced; visible cell values: ACURA, Legend, Auto, Air Cond., AM/FM, CD]

β^{Auto}_{Yes,} β^{Air Cond.}_{Yes,} β^{AM/FM}_{Yes,} β^{CD}_{Yes,} Table

Page 89: Semantic Understanding

Solution: Form Attribute-Value Pairs

[Figure: source table; visible cell values: ACURA, Legend, Auto, Air Cond., AM/FM, CD]

<Make, Honda>, <Model, Civic EX>, <Year, 1995>, <Colour, White>, <Price, $6300>, <Auto, Auto>, <Air Cond., Air Cond.>, <AM/FM, AM/FM>, <CD, >

Page 90: Semantic Understanding

Solution: Adjust Attribute-Value Pairs


<Make, Honda>, <Model, Civic EX>, <Year, 1995>, <Colour, White>, <Price, $6300>, <Auto>, <Air Cond>, <AM/FM>
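The two steps "form attribute-value pairs" and "adjust attribute-value pairs" can be sketched as follows. The adjustment rule is inferred from the two example rows on these slides (a pair whose value repeats the attribute name collapses to the attribute alone, and a pair with no value is dropped); the function names are hypothetical.

```python
def form_pairs(row):
    """Turn one table row into a list of (attribute, value) pairs."""
    return list(row.items())


def adjust_pairs(pairs):
    """Collapse <A, A> to <A> and drop pairs whose value is empty."""
    adjusted = []
    for attribute, value in pairs:
        if value == "":
            continue                       # e.g. <CD, > is dropped
        if value == attribute:
            adjusted.append((attribute,))  # e.g. <Auto, Auto> becomes <Auto>
        else:
            adjusted.append((attribute, value))
    return adjusted


row = {"Make": "Honda", "Model": "Civic EX", "Year": "1995",
       "Colour": "White", "Price": "$6300", "Auto": "Auto",
       "Air Cond.": "Air Cond.", "AM/FM": "AM/FM", "CD": ""}

pairs = form_pairs(row)
adjusted = adjust_pairs(pairs)
print(adjusted)
```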

Page 91: Semantic Understanding

Solution: Do Extraction


Page 92: Semantic Understanding

Solution: Infer Mappings


{Car, Year, Make, Model, Mileage, Price, PhoneNr}, {PhoneNr, Extension}, {Car, Feature}

Each row is a car.

π_Model μ(Year, Colour, Price, Auto, Air Cond, AM/FM, CD)* Table
π_Make μ(Model, Year, Colour, Price, Auto, Air Cond, AM/FM, CD)* μ(Year, Colour, Price, Auto, Air Cond, AM/FM, CD)* Table
π_Year Table

Note: Mappings produce sets for attributes. Joining to form records is trivial because we have OIDs for table rows (e.g. for each Car).

Page 93: Semantic Understanding

Solution: Infer Mappings


{Car, Year, Make, Model, Mileage, Price, PhoneNr}, {PhoneNr, Extension}, {Car, Feature}

π_Model μ(Year, Colour, Price, Auto, Air Cond, AM/FM, CD)* Table

Page 94: Semantic Understanding

Solution: Do Extraction


{Car, Year, Make, Model, Mileage, Price, PhoneNr}, {PhoneNr, Extension}, {Car, Feature}

π_Price Table

Page 95: Semantic Understanding

Solution: Do Extraction


{Car, Year, Make, Model, Mileage, Price, PhoneNr}, {PhoneNr, Extension}, {Car, Feature}

ρ_{Colour←Feature} π_{Colour} Table
∪ ρ_{Auto←Feature} π_{Auto} β^{Yes,}_{Auto} Table
∪ ρ_{Air Cond.←Feature} π_{Air Cond.} β^{Yes,}_{Air Cond.} Table
∪ ρ_{AM/FM←Feature} π_{AM/FM} β^{Yes,}_{AM/FM} Table
∪ ρ_{CD←Feature} π_{CD} β^{Yes,}_{CD} Table
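One way to read this mapping is that several source columns are each projected, renamed to Feature, and unioned into the target's {Car, Feature} relation. The sketch below assumes the Boolean columns have already been replaced by attribute names, that each row carries an OID under a hypothetical `Car` key, and that empty values contribute nothing; `feature_set` is an illustrative name.

```python
def feature_set(rows, feature_attributes, oid_key="Car"):
    """Union the non-empty values of several columns into (Car, Feature) pairs."""
    features = set()
    for row in rows:
        for attribute in feature_attributes:
            value = row.get(attribute, "")
            if value:  # skip empty values such as a missing CD player
                features.add((row[oid_key], value))
    return features


# Hypothetical row after Boolean replacement; "car1" stands in for the row's OID.
rows = [{"Car": "car1", "Colour": "White", "Auto": "Auto",
         "Air Cond.": "Air Cond.", "AM/FM": "AM/FM", "CD": ""}]

features = feature_set(rows, ["Colour", "Auto", "Air Cond.", "AM/FM", "CD"])
print(sorted(features))
```

Keeping the OID in every pair is what makes the later join back to Year, Make, Model, and Price straightforward.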

Page 96: Semantic Understanding

Application #7

Record Linkage

Page 97: Semantic Understanding

“Kelly Flanagan” Query

Page 98: Semantic Understanding

A Multi-faceted Approach

Gather evidence from each of several different facets
• Attributes
• Links
• Page Similarity

Combine the evidence

Page 99: Semantic Understanding

Attributes

Phone number, email address, state, city, zip code
Data-frame recognizers

Page 100: Semantic Understanding

Links

Page 101: Semantic Understanding

Page Similarity

“adjacent cap-word pairs”: Cap-Word (Connector | Preposition (Article)? | (Capital-Letter Dot))? Cap-Word
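The cap-word pattern translates fairly directly into a regular expression. The sketch below is one possible rendering: the word lists for Connector, Preposition, and Article are assumptions (the slide does not enumerate them), and a lookahead is used so that overlapping pairs are all reported.

```python
import re

CAP_WORD = r"[A-Z][a-z]+"
CONNECTOR = r"(?:and|&)"               # assumed connector words
PREPOSITION = r"(?:of|for|in|at|on)"   # assumed preposition list
ARTICLE = r"(?:the|a|an)"              # assumed article list
INITIAL = r"[A-Z]\."                   # Capital-Letter Dot, e.g. a middle initial

# Cap-Word (Connector | Preposition (Article)? | Initial)? Cap-Word
# The lookahead keeps each match zero-width, so overlapping pairs are found.
PAIR = re.compile(
    rf"\b(?=({CAP_WORD})\s+"
    rf"(?:(?:{CONNECTOR}|{PREPOSITION}(?:\s+{ARTICLE})?|{INITIAL})\s+)?"
    rf"({CAP_WORD})\b)"
)


def cap_word_pairs(text):
    """Return every adjacent cap-word pair in the text as a (first, second) tuple."""
    return PAIR.findall(text)


print(cap_word_pairs("David W. Embley"))
print(cap_word_pairs("Kelly Flanagan of Brigham Young University"))
```

Two pages could then be compared by the cap-word pairs they share; how the shared pairs are scored is not stated on the slide.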

Page 102: Semantic Understanding

Confidence Matrix for Each Facet

      C1    C2    …    Ci    …    Cj    …    Cn
C1    1     C12        C1i        C1j        C1n
C2          1          C2i        C2j        C2n
:                      :          :          :
Ci                     1          Cij        Cin
:                                 :          :
Cj                                1          Cjn
:                                            :
Cn                                           1

Cij = P(Ci and Cj refer to the same person | evidence for a facet f), or 0 if there is no evidence for facet f

Training set to compute the conditional probabilities

Page 103: Semantic Understanding

Final Matrix

Confidence Matrix for Attributes, Confidence Matrix for Links, Confidence Matrix for Page Similarity

0.96 + 0 + 0.78 - 0.96 * 0 - 0.96 * 0.78 - 0.78 * 0 + 0.96 * 0 * 0.78 = 0.9912
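The inclusion-exclusion sum on this slide is algebraically the same as 1 minus the product of (1 - c) over the facets, which is easier to compute for any number of facets. The sketch below uses that equivalent form; the function names and the small example matrices are hypothetical, and treating the facets as independent is an assumption implied by the formula rather than stated on the slide.

```python
def combine(confidences):
    """Combine per-facet confidences as 1 - product of (1 - c)."""
    remaining = 1.0
    for c in confidences:
        remaining *= 1.0 - c
    return 1.0 - remaining


def final_matrix(*facet_matrices):
    """Combine several n-by-n facet matrices cell by cell."""
    n = len(facet_matrices[0])
    return [
        [combine([m[i][j] for m in facet_matrices]) for j in range(n)]
        for i in range(n)
    ]


# The slide's example: attributes 0.96, links 0 (no evidence), page similarity 0.78.
print(round(combine([0.96, 0.0, 0.78]), 4))  # 0.9912

# Hypothetical 2-by-2 facet matrices holding only that one pairwise entry.
attributes = [[1.0, 0.96], [0.0, 1.0]]
links = [[1.0, 0.0], [0.0, 1.0]]
similarity = [[1.0, 0.78], [0.0, 1.0]]
final = final_matrix(attributes, links, similarity)
```

A facet with no evidence contributes 0 and leaves the combined value unchanged, so missing facets never lower the confidence.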

Page 104: Semantic Understanding

Grouping Algorithm

Input: final confidence matrix
Output: citations grouped by same person
The idea: if {Ci, Cj} and {Cj, Ck} are each highly confident pairs, then group {Ci, Cj, Ck}

The threshold we use for “highly confident” is 0.8.
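The transitive grouping idea can be sketched with a union-find structure. This is one possible realization, not necessarily the authors' algorithm: the function name is hypothetical, the matrix is assumed to be upper triangular as on the confidence-matrix slide, and treating a value exactly equal to 0.8 as "highly confident" is an assumption.

```python
def group_citations(matrix, threshold=0.8):
    """Group citation indices whose pairwise confidence reaches the threshold."""
    n = len(matrix)
    parent = list(range(n))

    def find(x):
        # Follow parent links to the group representative, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Only the upper triangle is read, matching the confidence matrix layout.
    for i in range(n):
        for j in range(i + 1, n):
            if matrix[i][j] >= threshold:
                parent[find(j)] = find(i)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Hypothetical final confidence matrix for four citations C0..C3.
final = [
    [1.0, 0.90, 0.10, 0.00],
    [0.0, 1.00, 0.85, 0.20],
    [0.0, 0.00, 1.00, 0.30],
    [0.0, 0.00, 0.00, 1.00],
]
print(group_citations(final))  # [[0, 1, 2], [3]]
```

Here C0 and C2 end up together even though their direct confidence is only 0.10, because each is highly confident with C1.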

Page 105: Semantic Understanding

Experimental Results

Page 106: Semantic Understanding

Application #8

Accessing the Hidden Web

Page 107: Semantic Understanding

Obtaining Data Behind Forms

• Web information is stored in databases

• Databases are accessed through forms

• Forms are designed in various ways

Page 108: Semantic Understanding

Hidden Web Extraction System

Input Analyzer

Retrieved Page(s)

User Query

Site Form

Output Analyzer

Extracted Information

Application Extraction Ontology

“Find green cars costing no more than $9000.”

Page 109: Semantic Understanding

Application #9

Ontology Discovery & Generation

Page 110: Semantic Understanding

TANGO: Table Analysis for Generating Ontologies

• Recognize and normalize table information
• Construct mini-ontologies from tables
• Discover inter-ontology mappings
• Merge mini-ontologies into a growing ontology

Page 111: Semantic Understanding

Recognize Table Information

              Population          Religion
Country       (July 2001 est.)    Albanian Orthodox   Muslim   Roman Catholic   Shi’a Muslim   Sunni Muslim   other
Afghanistan   26,813,057                                                        15%            84%            1%
Albania       3,510,484           20%                 70%      10%

Page 112: Semantic Understanding

Construct Mini-Ontology

              Population          Religion
Country       (July 2001 est.)    Albanian Orthodox   Muslim   Roman Catholic   Shi’a Muslim   Sunni Muslim   other
Afghanistan   26,813,057                                                        15%            84%            1%
Albania       3,510,484           20%                 70%      10%

Page 113: Semantic Understanding

Discover Mappings

Page 114: Semantic Understanding

Merge

Page 115: Semantic Understanding

Application #10

Challenging Applications (e.g. BioInformatics)

Page 116: Semantic Understanding

Large Extraction Ontologies

Page 117: Semantic Understanding

Complex Semi-Structured Pages

Page 118: Semantic Understanding

Additional Analysis Opportunities

• Sibling Page Comparison
• Semi-automatic Lexicon Update
• Seed Ontology Recognition

Page 119: Semantic Understanding

Sibling Page Comparison

Page 120: Semantic Understanding

Sibling Page Comparison

Attributes

Page 121: Semantic Understanding

Sibling Page Comparison

Page 122: Semantic Understanding

Sibling Page Comparison

Page 123: Semantic Understanding

Semi-automatic Lexicon Update

Additional Protein Names

Additional Source Species or Organisms

Page 124: Semantic Understanding

Seed Ontology Recognition

nucleus;
nucleus; zinc ion binding; nucleic acid binding;
zinc ion binding; nucleic acid binding;
linear;
NP_079345;
9606;
Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo;
NP_079345;
Homo sapiens; human;
GTTTTTGTGTT……….ATAAGTGCATTAACGGCCCACATG;
FLJ14299
msdspagsnprtpessgsgsgg………tagpyyspyalygqrlasasalgyq;
hypothetical protein FLJ14299;
8; eight;
“8:?p\s?12”; “8:?p11.2”; “8:?p11.23”; …
“37,?612,?680”;
“37,?610,?585”;

Page 125: Semantic Understanding

Seed Ontology Recognition

Page 126: Semantic Understanding

Presentation Outline

• Grand Challenge
• Meaning, Knowledge, Information, Data
• Fun and Games with Data
• Information Extraction Ontologies
• Applications
• Limitations and Pragmatics
• Summary and Challenges

Page 127: Semantic Understanding

Limitations and Pragmatics

• Data-Rich, Narrow Domain
• Ambiguities ~ Context Assumptions
• Incompleteness ~ Implicit Information
• Common Sense Requirements
• Knowledge Prerequisites
• …

Page 128: Semantic Understanding

Busiest Airport?

Chicago - 928,735 Landings (Nat. Air Traffic Controllers Assoc.)
        - 931,000 Landings (Federal Aviation Admin.)
Atlanta - 58,875,694 Passengers (Sep., latest numbers available)
Memphis - 2,494,190 Metric Tons (Airports Council Int’l.)


Page 131: Semantic Understanding

Busiest Airport?

Chicago - 928,735 Landings (Nat. Air Traffic Controllers Assoc.)
        - 931,000 Landings (Federal Aviation Admin.)
Atlanta - 58,875,694 Passengers (Sep., latest numbers available)
Memphis - 2,494,190 Metric Tons (Airports Council Int’l.)

Ambiguous
Whom do we trust? (How do they count?)

Page 132: Semantic Understanding

Busiest Airport?

Chicago - 928,735 Landings (Nat. Air Traffic Controllers Assoc.)
        - 931,000 Landings (Federal Aviation Admin.)
Atlanta - 58,875,694 Passengers (Sep., latest numbers available)
Memphis - 2,494,190 Metric Tons (Airports Council Int’l.)

Important qualification

Page 133: Semantic Understanding

Dow Jones Industrial Average

            High       Low        Last       Chg
30 Indus    10527.03   10321.35   10409.85   +85.18
20 Transp   3038.15    2998.60    3008.16    +9.83
15 Utils    268.78     264.72     266.45     +1.72
66 Stocks   3022.31    2972.94    2993.12    +19.65

44.07

10,409.85

Graphics, Icons, …

Page 134: Semantic Understanding

Dow Jones Industrial Average

            High       Low        Last       Chg
30 Indus    10527.03   10321.35   10409.85   +85.18
20 Transp   3038.15    2998.60    3008.16    +9.83
15 Utils    268.78     264.72     266.45     +1.72
66 Stocks   3022.31    2972.94    2993.12    +19.65

44.07

10,409.85

Reported on same date

Weekly
Daily

Implicit information: weekly stated in upper corner of page; daily not stated.

Page 135: Semantic Understanding

Presentation Outline

• Grand Challenge
• Meaning, Knowledge, Information, Data
• Fun and Games with Data
• Information Extraction Ontologies
• Applications
• Limitations and Pragmatics
• Summary and Challenges

Page 136: Semantic Understanding

Some Key Ideas

Data, Information, and Knowledge

Data Frames
• Knowledge about everyday data items
• Recognizers for data in context

Ontologies
• Resilient Extraction Ontologies
• Shared Conceptualizations

Limitations and Pragmatics

Page 137: Semantic Understanding

Some Research Issues

Building a library of open source data recognizers

Precisely finding and gathering relevant information
• Subparts of larger data
• Scattered data (linked, factored, implied)
• Data behind forms in the hidden web

Improving concept matching
• Indirect matching
• Calculations, unit conversions, data normalization, …

Achieving the potential of the presented applications

www.deg.byu.edu