NLP Linked Open Data "Is A" Solution

Post on 15-Jan-2015

DESCRIPTION

Demonstration of solving the "is a" natural language problem using linked open data, the DBPedia.org RDF store, and Semantic Web technologies.

TRANSCRIPT

LINKED OPEN DATA SERVICES

Is a “cat” a “mammal”? … true

Is a “lizard” a “reptile”? … true

Is a “cat” a “reptile”? … false

Is a “lizard” an “animal”? … true

…..

Four types of “IS A” query

• Shallow query TO rdf:type: Is a “Cat” an “Animal”?
  • “Cat” -> rdf:type -> http://dbpedia.org/ontology/Animal

• Deep query THROUGH rdf:type: Is a “Cat” a “Eukaryote”?
  • “Cat” -> rdf:type -> http://dbpedia.org/ontology/Animal
  • “Animal” -> rdfs:subClassOf -> http://dbpedia.org/ontology/Eukaryote

Four types of “IS A” query

• Shallow query TO dcterms:subject: Is a “Cat” a “Feline”?
  • “Cat” -> dcterms:subject -> http://dbpedia.org/resource/Category:Felines

• Deep query THROUGH dcterms:subject: Is a “Cat” a “Felid”?
  • “Cat” -> dcterms:subject -> http://dbpedia.org/resource/Category:Felines
  • http://dbpedia.org/resource/Category:Felines -> skos:broader -> http://dbpedia.org/resource/Category:Felids
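The four query types can be sketched over a handful of in-memory triples. This is a minimal illustration, not the service's implementation: the `TRIPLES` set, the short node names, and the functions `shallow_is_a` and `deep_is_a` are assumptions made for the example, and a real store such as DBPedia.org holds full URIs and far more data.

```python
# Toy triples mirroring the slide examples (short names stand in for full URIs).
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"
SUBJECT = "dcterms:subject"
BROADER = "skos:broader"

TRIPLES = {
    ("Cat", RDF_TYPE, "Animal"),
    ("Animal", SUBCLASS_OF, "Eukaryote"),
    ("Cat", SUBJECT, "Category:Felines"),
    ("Category:Felines", BROADER, "Category:Felids"),
}


def objects(subject, predicate):
    """Return every object o such that (subject, predicate, o) is in TRIPLES."""
    return {o for s, p, o in TRIPLES if s == subject and p == predicate}


def shallow_is_a(term, target, link):
    """Shallow query: exactly one hop from the term over `link`."""
    return target in objects(term, link)


def deep_is_a(term, target, link, parent_link):
    """Deep query: one hop over `link`, then any number of hops over `parent_link`."""
    frontier = list(objects(term, link))
    seen = set()
    while frontier:
        node = frontier.pop()
        if node == target:
            return True
        if node in seen:
            continue  # guard against cycles in the hierarchy
        seen.add(node)
        frontier.extend(objects(node, parent_link))
    return False


print(shallow_is_a("Cat", "Animal", RDF_TYPE))                 # True
print(deep_is_a("Cat", "Eukaryote", RDF_TYPE, SUBCLASS_OF))    # True
print(shallow_is_a("Cat", "Category:Felines", SUBJECT))        # True
print(deep_is_a("Cat", "Category:Felids", SUBJECT, BROADER))   # True
print(shallow_is_a("Cat", "Eukaryote", RDF_TYPE))              # False
```

The shallow form only inspects the term's direct links, which is why “Cat” is a “Eukaryote” needs the deep form.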

CONVERT TRIPLES TO NEW VOCABULARY

Re-link the Linked Data

Original Graph from DBPedia.org

[Diagram: Term -> rdf:type -> Ontology Resource -> rdfs:subClassOf -> Ontology Resource (structured hierarchy); Term -> dcterms:subject -> Category Resource -> skos:broader -> Category Resource (Wikipedia categories)]

Normalize Labels for Search

• http://dbpedia.org/page/United_States
  • “united states”

• http://dbpedia.org/ontology/PopulatedPlace
  • “populated place”

• http://dbpedia.org/class/yago/CountriesBorderingTheAtlanticOcean
  • “countries bordering the atlantic ocean”

• http://dbpedia.org/resource/Category:Former_British_colonies
  • “former british colonies”

[Diagram: Term / Resource -> rdfs:label -> String: Label]
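The slides show the normalized labels but not how they were produced. One plausible way to derive them from the URIs is sketched below; the function name `normalize_label` and the specific rules (drop the `Category:` prefix, turn underscores into spaces, split CamelCase, lowercase) are assumptions inferred from the four examples, not the author's documented method.

```python
import re
from urllib.parse import unquote


def normalize_label(uri):
    """Derive a lowercase search label from the last path segment of a URI."""
    local = unquote(uri.rstrip("/").rsplit("/", 1)[-1])
    if local.startswith("Category:"):
        local = local[len("Category:"):]
    local = local.replace("_", " ")
    # Insert a space at each lower-to-upper boundary: "PopulatedPlace" -> "Populated Place".
    local = re.sub(r"(?<=[a-z])(?=[A-Z])", " ", local)
    return local.lower()


for uri in (
    "http://dbpedia.org/page/United_States",
    "http://dbpedia.org/ontology/PopulatedPlace",
    "http://dbpedia.org/class/yago/CountriesBorderingTheAtlanticOcean",
    "http://dbpedia.org/resource/Category:Former_British_colonies",
):
    print(normalize_label(uri))
```

In practice a store may already carry an rdfs:label for each resource, in which case lowercasing that label would be the simpler route.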

Update Types to Vulcan Vocabulary

prefix: halo-uri http://halo.vulcan.com/lod/2013/11/isa-vocabulary#

• http://dbpedia.org/page/United_States
  • halo-uri:term
  • halo-uri:term-id

• http://dbpedia.org/ontology/PopulatedPlace
  • halo-uri:rdf-type

• http://dbpedia.org/resource/Category:Former_British_colonies
  • halo-uri:wikipedia-category

[Diagram: Term / Resource -> rdf:type -> Halo Vocabulary]

Add “Is A” Connection for Graphs

prefix: halo-uri http://halo.vulcan.com/lod/2013/11/isa-vocabulary#

• http://dbpedia.org/page/United_States
  • halo-uri:isA http://dbpedia.org/ontology/Place

• http://dbpedia.org/page/United_States
  • halo-uri:isA http://dbpedia.org/resource/Category:Republic

[Diagram: Term -> halo-uri:isA -> Type, Category, or Graph]

Add “Is A” Connection for Ontology

prefix: halo-uri http://halo.vulcan.com/lod/2013/11/isa-vocabulary#

• http://dbpedia.org/page/United_States
  • halo-uri:isA http://dbpedia.org/ontology/Place

[Diagram: Term linked by halo-uri:isA to each node of the ontology tree: Place, Populated Place, Country, Thing]

Add “Is A” Connection for Categories

prefix: halo-uri http://halo.vulcan.com/lod/2013/11/isa-vocabulary#

• http://dbpedia.org/page/United_States
  • halo-uri:isA http://dbpedia.org/resource/Category:Republic

[Diagram: Term linked by halo-uri:isA into the category tree: Countries, Republics, Political Theories, Philosophies. Note on the slide: “Is A” connections are not applied higher up the category tree.]
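The re-linking step flattens each hierarchy into direct halo-uri:isA triples: every ontology ancestor gets a link, while category links stop short of the top of the tree. The sketch below shows one way to do that with a depth cap. The helper names, the `max_depth` parameter, and the parent maps are hypothetical; in particular the category parents are invented from the slide's label order and are not taken from DBPedia.org.

```python
from collections import deque

HALO_IS_A = "http://halo.vulcan.com/lod/2013/11/isa-vocabulary#isA"


def ancestors(start, parents, max_depth=None):
    """Breadth-first walk up `parents` (dict: node -> list of parent nodes)."""
    found, queue, seen = [], deque([(start, 0)]), {start}
    while queue:
        node, depth = queue.popleft()
        if max_depth is not None and depth >= max_depth:
            continue  # do not climb past the depth cap
        for parent in parents.get(node, []):
            if parent not in seen:
                seen.add(parent)
                found.append(parent)
                queue.append((parent, depth + 1))
    return found


def materialize_is_a(term, direct, parents, max_depth=None):
    """Emit one flat isA triple per directly linked node and per ancestor in range."""
    triples = []
    for node in direct:
        triples.append((term, HALO_IS_A, node))
        for up in ancestors(node, parents, max_depth):
            triples.append((term, HALO_IS_A, up))
    return list(dict.fromkeys(triples))  # drop duplicates, keep order


# Ontology tree: follow rdfs:subClassOf all the way up.
ontology_parents = {"Country": ["Populated Place"], "Populated Place": ["Place"], "Place": ["Thing"]}
print(materialize_is_a("United States", ["Country"], ontology_parents))

# Category tree (hypothetical parents): stop at the directly linked category.
category_parents = {"Republics": ["Political Theories"], "Political Theories": ["Philosophies"]}
print(materialize_is_a("United States", ["Republics"], category_parents, max_depth=0))
```

With the ontology map the term receives four isA triples (Country, Populated Place, Place, Thing); with `max_depth=0` the category map yields only the link to Republics.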

New Graph with Links to DBPedia.org

[Diagram: Term, with a Halo type and Halo label, linked by halo-uri:isA to the Ontology graph and the Category graph; each linked node also carries a Halo type and Halo label. Side label: Search Terms and Context]

Custom graph replaces hierarchy

Original Graph from DBPedia.org

[Diagram: Term -> rdf:type -> Ontology Resource -> rdfs:subClassOf -> Ontology Resource (structured hierarchy); Term -> dcterms:subject -> Category Resource -> skos:broader -> Category Resource (Wikipedia categories)]

EXAMPLE 1: Query TO and THROUGH rdf:type

Query TO and THROUGH rdf:type

• Is a “cat” a “mammal”?
• Is a “cat” an “animal”?
• Is a “lizard” a “reptile”?
• Is a “lizard” an “animal”?

[Diagram: Cat -> halo-uri:isA -> Mammal; Lizard -> halo-uri:isA -> Reptile; Animal also shown as a halo-uri:isA target]

Query TO and THROUGH rdf:type

• Is a “cat” a “reptile”?
• Is a “cat” an “animal”?

[Diagram: candidate links Cat -> halo-uri:isA -> Animal and Cat -> halo-uri:isA -> Reptile]

Query TO and THROUGH rdf:type

• Is a “cat” a “mal”?

[Diagram: Cat -> halo-uri:isA -> Animal; Cat -> halo-uri:isA -> Mammal; marked “?”]

Query TO and THROUGH rdf:type

• XML: http://halo.vulcan.com:8080/isa/cat/type/animal.xml
• XML: http://halo.vulcan.com:8080/isa/cat/type-graph/animal.xml

<item>
  <id>9c5eebf630d626279fa6acbe1f50c9b9</id>
  <term>cat</term>
  <domain>animal</domain>
  <match>true</match>
  <triples>
    <s>http://dbpedia.org/resource/Cat</s>
    <p>http://halo.vulcan.com/lod/2013/11/isa-vocabulary#isA</p>
    <o>http://dbpedia.org/ontology/Animal</o>
    <search>
      <p>http://www.w3.org/2000/01/rdf-schema#label</p>
      <o>animal</o>
    </search>
  </triples>
</item>

Query TO and THROUGH rdf:type

• JSON: http://halo.vulcan.com:8080/isa/cat/type/animal.json
• JSON: http://halo.vulcan.com:8080/isa/cat/type-graph/animal.json

{
  "id":"9c5eebf630d626279fa6acbe1f50c9b9",
  "term":"cat",
  "domain":"animal",
  "match":true,
  "triples":[{
    "s":"http://dbpedia.org/resource/Cat",
    "p":"http://halo.vulcan.com/lod/2013/11/isa-vocabulary#isA",
    "o":"http://dbpedia.org/ontology/Animal",
    "search":{
      "p":"http://www.w3.org/2000/01/rdf-schema#label",
      "o":"animal"
    }
  }]
}
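A client only needs the `match` flag for a yes/no answer, and the `triples` list to see which links justified it. The sketch below parses the JSON response shown on this slide; the response text is copied from the slide, while the function name `summarize` and the shape of its return value are assumptions made for the example.

```python
import json

# Response body copied from the slide (cat / type / animal).
RESPONSE = """
{
  "id": "9c5eebf630d626279fa6acbe1f50c9b9",
  "term": "cat",
  "domain": "animal",
  "match": true,
  "triples": [{
    "s": "http://dbpedia.org/resource/Cat",
    "p": "http://halo.vulcan.com/lod/2013/11/isa-vocabulary#isA",
    "o": "http://dbpedia.org/ontology/Animal",
    "search": {
      "p": "http://www.w3.org/2000/01/rdf-schema#label",
      "o": "animal"
    }
  }]
}
"""


def summarize(payload):
    """Return (match, [(subject URI, object URI, matched label), ...])."""
    data = json.loads(payload)
    links = [(t["s"], t["o"], t["search"]["o"]) for t in data.get("triples", [])]
    return data["match"], links


matched, links = summarize(RESPONSE)
print(matched)
for subject, obj, label in links:
    print(subject, "isA", obj, "via label", repr(label))
```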

Query TO and THROUGH rdf:type

• Halo.Vulcan SPARQL: Is a “cat” a “mammal”?

PREFIX halo: <http://halo.vulcan.com/lod/2013/11/isa-vocabulary#>
# rdf: is declared explicitly so the query also runs on endpoints that do not predefine it
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT DISTINCT ?p ?o ?domainLabel WHERE { GRAPH ?G {
  ?term halo:isA ?o .
  ?term ?p ?o .
  ?term <http://www.w3.org/2000/01/rdf-schema#label> ?termLabel .
  ?o <http://www.w3.org/2000/01/rdf-schema#label> ?domainLabel .
  ?o rdf:type halo:rdf-type .
  FILTER (regex(str(?termLabel), '^cat$', 'i')) .
  FILTER (regex(str(?domainLabel), 'mammal', 'i'))
}} LIMIT 100

EXAMPLE 2: Query TO and THROUGH category

Query TO and THROUGH category

• Is a “cat” an “animal”?

[Diagram: Cat -> halo-uri:isA -> Invasive animal species; Animals described in 1758; Domesticated animals]

Searches are Returned with Triples

"triples":[{
  "s":"http://dbpedia.org/resource/Cat",
  "p":"http://halo.vulcan.com/lod/2013/11/isa-vocabulary#isA",
  "o":"http://dbpedia.org/resource/Category:Invasive_animal_species",
  "search":{
    "p":"http://www.w3.org/2000/01/rdf-schema#label",
    "o":"invasive animal species"
  }
},{
  "s":"http://dbpedia.org/resource/Cat",
  "p":"http://halo.vulcan.com/lod/2013/11/isa-vocabulary#isA",
  "o":"http://dbpedia.org/resource/Category:Animals_described_in_1758",
  "search":{
    "p":"http://www.w3.org/2000/01/rdf-schema#label",
    "o":"animals described in 1758"
  }
}]

Query TO and THROUGH category

• Halo.Vulcan SPARQL: Is a “cat” an “animal”?

PREFIX halo: <http://halo.vulcan.com/lod/2013/11/isa-vocabulary#>
# rdf: is declared explicitly so the query also runs on endpoints that do not predefine it
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT DISTINCT ?p ?o ?domainLabel WHERE { GRAPH ?G {
  ?term halo:isA ?o .
  ?term ?p ?o .
  ?term <http://www.w3.org/2000/01/rdf-schema#label> ?termLabel .
  ?o <http://www.w3.org/2000/01/rdf-schema#label> ?domainLabel .
  ?o rdf:type halo:wikipedia-category .
  FILTER (regex(str(?termLabel), '^cat$', 'i')) .
  FILTER (regex(str(?domainLabel), 'animal', 'i'))
}} LIMIT 100
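The rdf:type and category queries differ only in the Halo type and the two label patterns, so they can be generated from one template. The sketch below is an assumption about how a client might do that, not the service's code: `build_is_a_query`, `escape_regex`, and `sparql_literal` are hypothetical helpers, and the escaping rules are a conservative guess at what a SPARQL regex inside a single-quoted literal needs.

```python
import re

HALO = "http://halo.vulcan.com/lod/2013/11/isa-vocabulary#"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"


def escape_regex(text):
    """Backslash-escape regex metacharacters so user input matches literally."""
    return re.sub(r"[\\.^$*+?()\[\]{}|]", lambda m: "\\" + m.group(0), text)


def sparql_literal(text):
    """Quote text as a single-quoted SPARQL string literal."""
    return "'" + text.replace("\\", "\\\\").replace("'", "\\'") + "'"


def build_is_a_query(term, domain, halo_type, limit=100):
    """Build the "is a" query; halo_type is 'rdf-type' or 'wikipedia-category'."""
    if halo_type not in ("rdf-type", "wikipedia-category"):
        raise ValueError("unknown Halo type: " + halo_type)
    term_pattern = sparql_literal("^" + escape_regex(term) + "$")
    domain_pattern = sparql_literal(escape_regex(domain))
    return "\n".join([
        f"PREFIX halo: <{HALO}>",
        f"PREFIX rdf: <{RDF}>",
        f"PREFIX rdfs: <{RDFS}>",
        "SELECT DISTINCT ?p ?o ?domainLabel WHERE { GRAPH ?G {",
        "  ?term halo:isA ?o .",
        "  ?term ?p ?o .",
        "  ?term rdfs:label ?termLabel .",
        "  ?o rdfs:label ?domainLabel .",
        f"  ?o rdf:type halo:{halo_type} .",
        f"  FILTER (regex(str(?termLabel), {term_pattern}, 'i')) .",
        f"  FILTER (regex(str(?domainLabel), {domain_pattern}, 'i'))",
        f"}}}} LIMIT {int(limit)}",
    ])


print(build_is_a_query("cat", "animal", "wikipedia-category"))
```

Because the domain pattern is unanchored, a domain of “animal” also matches labels such as “invasive animal species”, which is the behavior the returned triples show.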

ADDITIONAL SERVICES

More graphs, flexible service points, and unexpected features…

There are 4 graphs in Virtuoso

• http://halo.vulcan.com:8890/conductor/sparql_graph.vspx

• http://halo.vulcan.com:8890/isa/rdf-type
  • ~87,069 Triples

• http://halo.vulcan.com:8890/isa/rdf-type-graph
  • ~101,823 Triples

• http://halo.vulcan.com:8890/isa/category-type
  • ~292,239 Triples

• http://halo.vulcan.com:8890/isa/category-type-graph
  • ~560,906 Triples

652,677 NORMALIZED TRIPLES ACROSS ALL GRAPHS

“Find All” Service Points

Why query every instance? Just ask the service for all relations to a term in a given graph.

• All “is a” matches in rdf-type
  • http://halo.vulcan.com:8080/isa/cat/type/.xml

• All “is a” matches in rdf-type graph
  • http://halo.vulcan.com:8080/isa/cat/type-graph/.xml

• All “is a” matches in categories
  • http://halo.vulcan.com:8080/isa/cat/category/.xml

• All “is a” matches in categories graph
  • http://halo.vulcan.com:8080/isa/cat/category-graph/.xml

• All “is a” matches in every graph
  • http://halo.vulcan.com:8080/isa/cat/.xml

Unanticipated Features

• Absolute matching on domain for “is a” relations

• Add a URL for literal matching and update SPARQL regex

• Spacing and special characters need to be URL encoded because each call is a ‘GET’

• Wikipedia categories are of poor quality
  • User defined and often inaccurate
  • Very specific: “Animal species described in 1705”
  • Cyclical: “Republics -> Countries -> United States -> Republics …”
  • All paths lead to: Philosophy

• Plural categories make for difficult literal matching and odd “is a” statements: Is a cat a felines?
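Cyclical categories matter because any traversal that climbs the category tree must either cap its depth or track visited nodes, or it will loop. A minimal depth-first check for such cycles is sketched below; `find_cycle` and the `LINKS` map are hypothetical, with the edges copied from the “Republics -> Countries -> United States -> Republics” example on the slide.

```python
def find_cycle(start, links):
    """Return the first cycle reachable from `start` as a list of nodes, or None.

    `links` maps each node to the nodes it points at (its broader categories).
    """
    path, on_path, done = [], set(), set()

    def visit(node):
        if node in on_path:
            # Back on a node of the current path: slice out the loop.
            return path[path.index(node):] + [node]
        if node in done:
            return None  # already fully explored, no cycle through here
        path.append(node)
        on_path.add(node)
        for parent in links.get(node, []):
            cycle = visit(parent)
            if cycle:
                return cycle
        path.pop()
        on_path.remove(node)
        done.add(node)
        return None

    return visit(start)


LINKS = {
    "Republics": ["Countries"],
    "Countries": ["United States"],
    "United States": ["Republics"],
}
print(find_cycle("Republics", LINKS))
# ['Republics', 'Countries', 'United States', 'Republics']
print(find_cycle("Cat", {"Cat": ["Felines"], "Felines": ["Felids"]}))
# None
```

The recursive form is fine for small examples; a very deep category graph would call for an explicit stack instead.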

Example Service URLs

Query rdf-types graph:
• Domain-range Query: http://halo.vulcan.com:8080/isa/cat/type/animal.json
• Term isa * Query: http://halo.vulcan.com:8080/isa/cat/type/.json

Query rdf-types and parents graph:
• Domain-range Query: http://halo.vulcan.com:8080/isa/lizard/type-graph/reptile.json
• Term isa * Query: http://halo.vulcan.com:8080/isa/lizard/type-graph/.json

Query category graph:
• Domain-range Query: http://halo.vulcan.com:8080/isa/cat/category/animal.json
• Term isa * Query: http://halo.vulcan.com:8080/isa/cat/category/.json

Query category and parents graph:
• Domain-range Query: http://halo.vulcan.com:8080/isa/cat/category-graph/animal.json
• Term isa * Query: http://halo.vulcan.com:8080/isa/cat/category-graph/.json

Query all IsA graphs (all associated entity types, categories, and parents):
• Domain-range Query: http://halo.vulcan.com:8080/isa/cat/animal.json
• Term isa * Query: http://halo.vulcan.com:8080/isa/cat/.json
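The URLs above follow one pattern: base, term, optional graph, then an optional domain followed by the format extension. A small builder makes the pattern explicit and applies the URL encoding that GET requests need for spaces and special characters. The function `service_url` is a hypothetical client-side helper that only assembles strings; how the service treats percent-encoded terms is an assumption.

```python
from urllib.parse import quote

BASE = "http://halo.vulcan.com:8080/isa"
GRAPHS = {"type", "type-graph", "category", "category-graph"}


def service_url(term, graph=None, domain="", fmt="json"):
    """Build a service URL.

    graph=None queries every graph; an empty domain asks for all "is a"
    matches for the term ("Term isa *").
    """
    if graph is not None and graph not in GRAPHS:
        raise ValueError("unknown graph: " + graph)
    if fmt not in ("json", "xml"):
        raise ValueError("unknown format: " + fmt)
    parts = [BASE, quote(term, safe="")]
    if graph is not None:
        parts.append(graph)
    parts.append(f"{quote(domain, safe='')}.{fmt}")
    return "/".join(parts)


print(service_url("cat", "type", "animal"))
print(service_url("lizard", "type-graph"))
print(service_url("cat", domain="animal"))
print(service_url("united states", "type", "populated place", fmt="xml"))
```

The first three calls reproduce URLs of the same shape as those listed on the slide; the last shows a multi-word term and domain encoded with `%20`.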
