nlp linked open data "is a" solution

Upload: wilsmith73

Post on 15-Jan-2015

Category:

Technology


DESCRIPTION

Demonstration of solving the "is a" natural language problem using linked open data, the DBPedia.org RDF store, and Semantic Web technologies.

TRANSCRIPT

Page 1: NLP Linked Open Data "Is a" Solution

LINKED OPEN DATA SERVICES

Is a “cat” a “mammal”? … true

Is a “lizard” a “reptile”? … true

Is a “cat” a “reptile”? … false

Is a “lizard” an “animal”? … true

…..

Page 2: NLP Linked Open Data "Is a" Solution

Four types of “IS A” query

• Shallow query TO rdf:type: “Cat” is an “Animal”?
  • “Cat” -> rdf:type -> http://dbpedia.org/ontology/Animal
• Deep query THROUGH rdf:type: “Cat” is a “Eukaryote”?
  • “Cat” -> rdf:type -> http://dbpedia.org/ontology/Animal
  • “Animal” -> rdfs:subClassOf -> http://dbpedia.org/ontology/Eukaryote

Page 3: NLP Linked Open Data "Is a" Solution

Four types of “IS A” query

• Shallow query TO dcterms:subject: “Cat” is a “Feline”?
  • “Cat” -> dcterms:subject -> http://dbpedia.org/resource/Category:Felines
• Deep query THROUGH dcterms:subject: “Cat” is a “Felid”?
  • “Cat” -> dcterms:subject -> http://dbpedia.org/resource/Category:Felines
  • http://dbpedia.org/resource/Category:Felines -> skos:broader -> http://dbpedia.org/resource/Category:Felids
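The four query types can be sketched over a tiny in-memory triple list. This is only an illustration: the short names (`dbo:Animal`, `Category:Felines`) stand in for the full DBpedia URIs, and the helper names `objects`, `shallow_is_a`, and `deep_is_a` are hypothetical, not part of the service.

```python
# Toy triples taken from the slides; prefixes are shorthand for the full URIs.
TRIPLES = [
    ("Cat", "rdf:type", "dbo:Animal"),
    ("dbo:Animal", "rdfs:subClassOf", "dbo:Eukaryote"),
    ("Cat", "dcterms:subject", "Category:Felines"),
    ("Category:Felines", "skos:broader", "Category:Felids"),
]


def objects(subject, predicate):
    """Return every object linked from `subject` by `predicate`."""
    return {o for s, p, o in TRIPLES if s == subject and p == predicate}


def shallow_is_a(term, target, link):
    """Shallow query: the target must be directly attached to the term."""
    return target in objects(term, link)


def deep_is_a(term, target, link, parent_link):
    """Deep query: follow `link` once, then climb `parent_link` edges."""
    frontier = objects(term, link)
    seen = set()
    while frontier:
        node = frontier.pop()
        if node == target:
            return True
        if node in seen:
            continue  # already expanded; guards against cycles
        seen.add(node)
        frontier |= objects(node, parent_link)
    return False


print(shallow_is_a("Cat", "dbo:Animal", "rdf:type"))
print(deep_is_a("Cat", "dbo:Eukaryote", "rdf:type", "rdfs:subClassOf"))
print(deep_is_a("Cat", "Category:Felids", "dcterms:subject", "skos:broader"))
```

The shallow query stops at the first hop, so “Eukaryote” and “Felid” are only reachable through the deep variant.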

Page 4: NLP Linked Open Data "Is a" Solution

CONVERT TRIPLES TO NEW VOCABULARY

Re-link the Linked Data

Page 5: NLP Linked Open Data "Is a" Solution

Original Graph from DBPedia.org

[Diagram: Term -> rdf:type -> Ontology Resource -> rdfs:subClassOf -> Ontology Resource (Structured Hierarchy); Term -> dcterms:subject -> Category Resource -> skos:broader -> Category Resource (Wikipedia Categories)]

Page 6: NLP Linked Open Data "Is a" Solution

Normalize Labels for Search

• http://dbpedia.org/page/United_States
  • “united states”
• http://dbpedia.org/ontology/PopulatedPlace
  • “populated place”
• http://dbpedia.org/class/yago/CountriesBorderingTheAtlanticOcean
  • “countries bordering the atlantic ocean”
• http://dbpedia.org/resource/Category:Former_British_colonies
  • “former british colonies”

[Diagram: Term / Resource -> rdfs:label -> String: Label]
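The slides do not show how the labels were produced. One possible normalization that reproduces the four examples above is sketched here; the function name `normalize_label` and the exact rules (drop the `Category:` prefix, split underscores and camel case, lowercase) are assumptions inferred from those examples.

```python
import re
from urllib.parse import unquote


def normalize_label(uri: str) -> str:
    """Derive a lowercase search label from the last segment of a URI."""
    local = unquote(uri.rstrip("/").rsplit("/", 1)[-1])
    if local.startswith("Category:"):
        local = local[len("Category:"):]
    local = local.replace("_", " ")
    # Insert a space at each lowercase/digit -> uppercase boundary (camel case).
    local = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", local)
    return " ".join(local.lower().split())


for uri in [
    "http://dbpedia.org/page/United_States",
    "http://dbpedia.org/ontology/PopulatedPlace",
    "http://dbpedia.org/class/yago/CountriesBorderingTheAtlanticOcean",
    "http://dbpedia.org/resource/Category:Former_British_colonies",
]:
    print(normalize_label(uri))
```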

Page 7: NLP Linked Open Data "Is a" Solution

Update Types to Vulcan Vocabulary

prefix: halo-uri http://halo.vulcan.com/lod/2013/11/isa-vocabulary#

• http://dbpedia.org/page/United_States
  • halo-uri:term
  • halo-uri:term-id
• http://dbpedia.org/ontology/PopulatedPlace
  • halo-uri:rdf-type
• http://dbpedia.org/resource/Category:Former_British_colonies
  • halo-uri:wikipedia-category

[Diagram: Term / Resource -> rdf:type -> Halo Vocabulary]

Page 8: NLP Linked Open Data "Is a" Solution

Add “Is A” Connection for Graphs

prefix: halo-uri http://halo.vulcan.com/lod/2013/11/isa-vocabulary#

• http://dbpedia.org/page/United_States
  • halo-uri:isA http://dbpedia.org/ontology/Place
• http://dbpedia.org/page/United_States
  • halo-uri:isA http://dbpedia.org/resource/Category:Republic

[Diagram: Term -> halo-uri:isA -> Type, Category, or Graph]

Page 9: NLP Linked Open Data "Is a" Solution

Add “Is A” Connection for Ontology

prefix: halo-uri http://halo.vulcan.com/lod/2013/11/isa-vocabulary#

• http://dbpedia.org/page/United_States
  • halo-uri:isA http://dbpedia.org/ontology/Place

[Diagram: Term -> halo-uri:isA -> Ontology Tree nodes: Place, Populated Place, Country, Thing]

Page 10: NLP Linked Open Data "Is a" Solution

Add “Is A” Connection for Categories

prefix: halo-uri http://halo.vulcan.com/lod/2013/11/isa-vocabulary#

• http://dbpedia.org/page/United_States
  • halo-uri:isA http://dbpedia.org/resource/Category:Republic

[Diagram: Term -> halo-uri:isA -> Category Tree nodes: Countries, Republics, Political Theories, Philosophies. “Is A” connections not applied higher up category tree.]

Page 11: NLP Linked Open Data "Is a" Solution

New Graph with Links to DBPedia.org

[Diagram: Term -> halo-uri:isA -> Ontology graph and Category graph; nodes carry a Halo type and a Halo label; Search Terms and Context]

Custom graph replaces hierarchy

Page 12: NLP Linked Open Data "Is a" Solution

Original Graph from DBPedia.org

[Diagram: Term -> rdf:type -> Ontology Resource -> rdfs:subClassOf -> Ontology Resource (Structured Hierarchy); Term -> dcterms:subject -> Category Resource -> skos:broader -> Category Resource (Wikipedia Categories)]

Page 13: NLP Linked Open Data "Is a" Solution

EXAMPLE: 1

Query TO and THROUGH rdf:type

Page 14: NLP Linked Open Data "Is a" Solution

Query TO and THROUGH rdf:type

• Is a “cat” a “mammal”?
• Is a “cat” an “animal”?
• Is a “lizard” a “reptile”?
• Is a “lizard” an “animal”?

[Diagram: halo-uri:isA links connecting Cat to Mammal, Lizard to Reptile, and both to Animal]

Page 15: NLP Linked Open Data "Is a" Solution

Query TO and THROUGH rdf:type

• Is a “cat” a “reptile”?
• Is a “cat” an “animal”?

[Diagram: Cat -> halo-uri:isA -> Animal; Cat -> halo-uri:isA -> Reptile]

Page 16: NLP Linked Open Data "Is a" Solution

Query TO and THROUGH rdf:type

• Is a “cat” a “mal”?

[Diagram: Cat -> halo-uri:isA -> Animal; Cat -> halo-uri:isA -> Mammal; ?]
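The “mal” question shows what an unanchored, case-insensitive regex on the domain label does. A minimal sketch using Python's `re` as a stand-in for the SPARQL `regex` filter (the two engines differ in details, but not for these simple patterns):

```python
import re

# Domain labels attached to "cat" in the rdf-type example.
labels = ["animal", "mammal"]


def matches(pattern: str, label: str) -> bool:
    """Case-insensitive search, comparable to regex(?label, pattern, 'i')."""
    return re.search(pattern, label, flags=re.IGNORECASE) is not None


partial = [label for label in labels if matches("mal", label)]
anchored = [label for label in labels if matches("^mal$", label)]
exact_mammal = [label for label in labels if matches("^mammal$", label)]

print(partial)       # both labels contain "mal"
print(anchored)      # neither label is exactly "mal"
print(exact_mammal)  # only the literal match survives
```

Anchoring the domain pattern with `^` and `$`, as the term filter already does, turns the substring match into a literal match.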

Page 17: NLP Linked Open Data "Is a" Solution

Query TO and THROUGH rdf:type

• XML: http://halo.vulcan.com:8080/isa/cat/type/animal.xml
• XML: http://halo.vulcan.com:8080/isa/cat/type-graph/animal.xml

<item>

<id>9c5eebf630d626279fa6acbe1f50c9b9</id>

<term>cat</term>

<domain>animal</domain>

<match>true</match>

<triples>

<s>http://dbpedia.org/resource/Cat</s>

<p>http://halo.vulcan.com/lod/2013/11/isa-vocabulary#isA</p>

<o>http://dbpedia.org/ontology/Animal</o>

<search>

<p>http://www.w3.org/2000/01/rdf-schema#label</p>

<o>animal</o>

</search>

</triples>

</item>

Page 18: NLP Linked Open Data "Is a" Solution

Query TO and THROUGH rdf:type

• JSON: http://halo.vulcan.com:8080/isa/cat/type/animal.json
• JSON: http://halo.vulcan.com:8080/isa/cat/type-graph/animal.json

{

"id":"9c5eebf630d626279fa6acbe1f50c9b9",

"term":"cat",

"domain":"animal",

"match":true,

"triples":[{

"s":"http://dbpedia.org/resource/Cat",

"p":"http://halo.vulcan.com/lod/2013/11/isa-vocabulary#isA",

"o":"http://dbpedia.org/ontology/Animal",

"search":{

"p":"http://www.w3.org/2000/01/rdf-schema#label",

"o":"animal"

}

}]

}
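A client could read this response as follows. The sketch parses the sample payload shown above from a string instead of calling the service; the helper name `isa_objects` is hypothetical.

```python
import json

# Sample payload copied from the slide above.
SAMPLE = """
{
  "id": "9c5eebf630d626279fa6acbe1f50c9b9",
  "term": "cat",
  "domain": "animal",
  "match": true,
  "triples": [{
    "s": "http://dbpedia.org/resource/Cat",
    "p": "http://halo.vulcan.com/lod/2013/11/isa-vocabulary#isA",
    "o": "http://dbpedia.org/ontology/Animal",
    "search": {
      "p": "http://www.w3.org/2000/01/rdf-schema#label",
      "o": "animal"
    }
  }]
}
"""


def isa_objects(response: dict) -> list:
    """Return the object URIs of the matching triples, or [] when no match."""
    if not response.get("match"):
        return []
    return [triple["o"] for triple in response.get("triples", [])]


response = json.loads(SAMPLE)
print(response["term"], "is a", response["domain"], "->", response["match"])
print(isa_objects(response))
```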

Page 19: NLP Linked Open Data "Is a" Solution

Query TO and THROUGH rdf:type

• Halo.Vulcan SPARQL: Is a “cat” a “mammal”?

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

PREFIX halo: <http://halo.vulcan.com/lod/2013/11/isa-vocabulary#>

SELECT DISTINCT ?p ?o ?domainLabel WHERE { GRAPH ?G {

?term halo:isA ?o .

?term ?p ?o .

?term <http://www.w3.org/2000/01/rdf-schema#label> ?termLabel .

?o <http://www.w3.org/2000/01/rdf-schema#label> ?domainLabel .

?o rdf:type halo:rdf-type .

FILTER (regex(str(?termLabel), '^cat$', 'i')) .

FILTER (regex(str(?domainLabel), 'mammal', 'i'))

}} LIMIT 100
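The query text could be generated per request. The sketch below assembles the same query from a term and a domain; the function name `build_isa_query`, the `exact_domain` switch, and the restriction of inputs to letters, digits, and spaces are assumptions made for illustration, not the service's actual code.

```python
import re

HALO = "http://halo.vulcan.com/lod/2013/11/isa-vocabulary#"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
_SAFE = re.compile(r"[A-Za-z0-9 ]+")


def build_isa_query(term, domain, halo_type="rdf-type",
                    exact_domain=False, limit=100):
    """Build the "is a" SPARQL query for one term/domain pair."""
    for value in (term, domain):
        # Reject quotes and regex metacharacters instead of escaping them.
        if not _SAFE.fullmatch(value):
            raise ValueError(f"unsupported characters in {value!r}")
    domain_pattern = f"^{domain}$" if exact_domain else domain
    return "\n".join([
        "PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>",
        f"PREFIX halo: <{HALO}>",
        "SELECT DISTINCT ?p ?o ?domainLabel WHERE { GRAPH ?G {",
        "  ?term halo:isA ?o .",
        "  ?term ?p ?o .",
        f"  ?term <{RDFS_LABEL}> ?termLabel .",
        f"  ?o <{RDFS_LABEL}> ?domainLabel .",
        f"  ?o rdf:type halo:{halo_type} .",
        f"  FILTER (regex(str(?termLabel), '^{term}$', 'i')) .",
        f"  FILTER (regex(str(?domainLabel), '{domain_pattern}', 'i'))",
        f"}}}} LIMIT {limit}",
    ])


print(build_isa_query("cat", "mammal"))
```

Passing `halo_type="wikipedia-category"` would target the category graphs instead.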

Page 20: NLP Linked Open Data "Is a" Solution

EXAMPLE: 2

Query TO and THROUGH category

Page 21: NLP Linked Open Data "Is a" Solution

Query TO and THROUGH category

• Is a “cat” an “animal”?

[Diagram: Cat -> halo-uri:isA -> Invasive animal species; Animals described in 1758; Domesticated animals]

Page 22: NLP Linked Open Data "Is a" Solution

Searches are Returned with Triples

"triples":[{

"s":"http://dbpedia.org/resource/Cat",

"p":"http://halo.vulcan.com/lod/2013/11/isa-vocabulary#isA",

"o":"http://dbpedia.org/resource/Category:Invasive_animal_species",

"search":{

"p":"http://www.w3.org/2000/01/rdf-schema#label",

"o":"invasive animal species"

}

},{

"s":"http://dbpedia.org/resource/Cat",

"p":"http://halo.vulcan.com/lod/2013/11/isa-vocabulary#isA",

"o":"http://dbpedia.org/resource/Category:Animals_described_in_1758",

"search":{

"p":"http://www.w3.org/2000/01/rdf-schema#label",

"o":"animals described in 1758"

}

}]

Page 23: NLP Linked Open Data "Is a" Solution

Query TO and THROUGH category

• Halo.Vulcan SPARQL: Is a “cat” an “animal”?

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

PREFIX halo: <http://halo.vulcan.com/lod/2013/11/isa-vocabulary#>

SELECT DISTINCT ?p ?o ?domainLabel WHERE { GRAPH ?G {

?term halo:isA ?o .

?term ?p ?o .

?term <http://www.w3.org/2000/01/rdf-schema#label> ?termLabel .

?o <http://www.w3.org/2000/01/rdf-schema#label> ?domainLabel .

?o rdf:type halo:wikipedia-category .

FILTER (regex(str(?termLabel), '^cat$', 'i')) .

FILTER (regex(str(?domainLabel), 'animal', 'i'))

}} LIMIT 100

Page 24: NLP Linked Open Data "Is a" Solution

ADDITIONAL SERVICES

More graphs, flexible service points, and unexpected features…

Page 25: NLP Linked Open Data "Is a" Solution

There are 4 graphs in Virtuoso

• http://halo.vulcan.com:8890/conductor/sparql_graph.vspx
• http://halo.vulcan.com:8890/isa/rdf-type
  • ~87,069 Triples
• http://halo.vulcan.com:8890/isa/rdf-type-graph
  • ~101,823 Triples
• http://halo.vulcan.com:8890/isa/category-type
  • ~292,239 Triples
• http://halo.vulcan.com:8890/isa/category-type-graph
  • ~560,906 Triples

652,677 NORMALIZED TRIPLES ACROSS ALL GRAPHS

Page 26: NLP Linked Open Data "Is a" Solution

“Find All” Service Points

Why query every instance? Just ask the service for all relations to a term in a given graph.

• All “is a” matches in rdf-type
  • http://halo.vulcan.com:8080/isa/cat/type/.xml
• All “is a” matches in rdf-type graph
  • http://halo.vulcan.com:8080/isa/cat/type-graph/.xml
• All “is a” matches in categories
  • http://halo.vulcan.com:8080/isa/cat/category/.xml
• All “is a” matches in categories graph
  • http://halo.vulcan.com:8080/isa/cat/category-graph/.xml
• All “is a” matches in every graph
  • http://halo.vulcan.com:8080/isa/cat/.xml

Page 27: NLP Linked Open Data "Is a" Solution

Unanticipated Features

• Absolute matching on domain for “is a” relations
  • Add a URL for literal matching and update SPARQL regex
• Spacing and special characters need to be URL encoded because each call is a ‘GET’
• Wikipedia categories are of poor quality
  • User defined and often inaccurate
  • Very specific: “Animal species described in 1705”
  • Cyclical: “Republics -> Countries -> United States -> Republics …”
  • All paths lead to: Philosophy
• Plural categories make for difficult literal matching and odd “is a” statements: Is a cat a felines?
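Cyclical categories mean any walk up the category tree has to remember where it has been. A minimal sketch, assuming the parent links are held in a plain dictionary; the name `broader_closure` and the toy data (built from the cycle quoted above) are illustrative only.

```python
from collections import deque

# Toy "broader" links reproducing the cycle quoted in the slide.
BROADER = {
    "Republics": ["Countries"],
    "Countries": ["United States"],
    "United States": ["Republics"],
}


def broader_closure(start, broader):
    """Collect every category reachable from `start`, visiting each once."""
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for parent in broader.get(node, ()):
            if parent not in seen:  # skip revisits so cycles terminate
                seen.add(parent)
                queue.append(parent)
    return seen


print(sorted(broader_closure("Republics", BROADER)))
```

Because of the cycle, the start category ends up in its own closure, which is one way to detect such loops.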

Page 28: NLP Linked Open Data "Is a" Solution

Example Service URLs

Query rdf-types graph:
• Domain-range Query – http://halo.vulcan.com:8080/isa/cat/type/animal.json
• Term isa * Query – http://halo.vulcan.com:8080/isa/cat/type/.json

Query rdf-types and parents graph:
• Domain-range Query – http://halo.vulcan.com:8080/isa/lizard/type-graph/reptile.json
• Term isa * Query – http://halo.vulcan.com:8080/isa/lizard/type-graph/.json

Query category graph:
• Domain-range Query – http://halo.vulcan.com:8080/isa/cat/category/animal.json
• Term isa * Query – http://halo.vulcan.com:8080/isa/cat/category/.json

Query category and parents graph:
• Domain-range Query – http://halo.vulcan.com:8080/isa/cat/category-graph/animal.json
• Term isa * Query – http://halo.vulcan.com:8080/isa/cat/category-graph/.json

Query all IsA graphs (every associated entity types, categories, and parents):
• Domain-range Query – http://halo.vulcan.com:8080/isa/cat/animal.json
• Term isa * Query – http://halo.vulcan.com:8080/isa/cat/.json
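The URL pattern behind these examples can be captured in a small helper. This is a client-side sketch only: the function name `build_isa_url` is hypothetical, and percent-encoding via `urllib.parse.quote` is one way to satisfy the URL-encoding requirement for GET calls.

```python
from urllib.parse import quote

BASE = "http://halo.vulcan.com:8080/isa"
GRAPHS = {"type", "type-graph", "category", "category-graph"}
FORMATS = {"json", "xml"}


def build_isa_url(term, graph=None, domain=None, fmt="json"):
    """Build a service URL; omit `graph` for all graphs, `domain` for find-all."""
    if graph is not None and graph not in GRAPHS:
        raise ValueError(f"unknown graph: {graph!r}")
    if fmt not in FORMATS:
        raise ValueError(f"unknown format: {fmt!r}")
    parts = [BASE, quote(term, safe="")]
    if graph is not None:
        parts.append(graph)
    # An empty domain yields the "Term isa *" form, e.g. ".../type/.json".
    leaf = quote(domain, safe="") if domain else ""
    parts.append(f"{leaf}.{fmt}")
    return "/".join(parts)


print(build_isa_url("cat", "type", "animal"))
print(build_isa_url("lizard", "type-graph", fmt="json"))
print(build_isa_url("cat", fmt="xml"))
print(build_isa_url("united states", "category", "former british colonies"))
```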