fact forge aimsa2012

FactForge: Data Service or the Diversity of Inferred Knowledge over LOD. Mariana Damova, PhD, Kiril Simov, Zdravko Tashev, Atanas Kiryakov. AIMSA’2012, September 2012

Upload: mariana-damova

Post on 08-May-2015


DESCRIPTION

This presentation describes FactForge, a public data service. It is a reason-able view of a segment of the LOD cloud and the largest body of general knowledge on which inference is performed, supplied with a reference layer for quick access.

TRANSCRIPT

Page 1: Fact forge aimsa2012

FactForge: Data Service or the Diversity of Inferred Knowledge over LOD

Mariana Damova, PhD, Kiril Simov, Zdravko Tashev, Atanas Kiryakov

AIMSA’2012

September 2012

Page 2: Fact forge aimsa2012

Ontotext

– Top-5 provider of core Semantic Technology

– Established in year 2000; offices in Bulgaria, UK, USA

– Active both in research and commercial projects (FP7 funding for 10 years)

• 360° semantic technology – unique portfolio:

– Semantic Databases: high-performance RDF DBMS, scalable reasoning

– Semantic Search: text-mining (IE), metadata generation, Information Retrieval (IR)

– Web Mining: focused crawling, screen scraping, data fusion

– Linked Data Management and Data Integration

Good recognition in the SemTech community

– Ontotext pages are ranked #1 for “semantic annotation” and “semantic repository” at GYM, #3 for “linked data management” at Google

Several joint ventures and subsidiaries

– Innovantage: leading online recruitment intelligence provider in UK

Page 3: Fact forge aimsa2012

Ontotext Clients (selected)

British Broadcasting Corporation (BBC)

– Ran its World Cup 2010 sites on top of OWLIM

– Since Mar’12 BBC Sports

– 2012 Olympics sections are driven by OWLIM and a Concept Extraction service developed by Ontotext

Press Association (UK)

– Analysis of Sports news

– Concept extraction

– Linked data generation

Top-3 USA media (not allowed to name)

The National Archives (UK): contracted Ontotext to implement semantic KB and semantic search for the Government Web Archive

British Museum (UK): Ontotext leads the development of Phase 3 of the ResearchSpace project on collaborative research in cultural heritage; British Museum’s public SPARQL end-point is powered by OWLIM

de Bibliothek (Holland): aggregation of data from 150 library databases

Page 4: Fact forge aimsa2012

Semantic Web and Linked Open Data

• Semantic Web

a set of standards that enable computers to interpret the semantics of data on the web

• Linked Open Data

a set of principles for publishing structured data and interlinking them so that they can be browsed the way HTML pages are browsable

- Use URIs to identify things.

- Use HTTP URIs so that these things can be referred to and looked up ("dereferenced") by people and user agents.

- Provide useful information about the thing when its URI is dereferenced, using standard formats such as RDF/XML.

- Include links to other, related URIs in the exposed data to improve discovery of other related information on the Web.

Page 5: Fact forge aimsa2012

Linked Open Data cloud

[Figure: LOD cloud diagrams for 2008, 2009 and 2011]

295 datasets, more than 30 billion triples

Page 6: Fact forge aimsa2012

Linked Open Data is maturing

LOD cloud grows by billions of triples yearly

Technologies and guidelines about

– how to produce linked data fast

– how to assure their quality

– how to provide vertically oriented data services

LOD2, LATC, baseKB

Page 7: Fact forge aimsa2012

This talk is about

reasoning

and

coping with diversity of the data on the web of data

Page 8: Fact forge aimsa2012

Outline

• FactForge (beta)

• Reference Layer

• Access Modes

• Querying

– Airports around London

– US city – a subject of a novel

– US city – contactInformation

• Challenges

• Conclusion

Page 9: Fact forge aimsa2012

FactForge (beta)

the largest body of heterogeneous general knowledge on which inference has been performed

– powered by OWLIM 5.2

– supporting SPARQL 1.1

Page 10: Fact forge aimsa2012

Datasets

REASON-ABLE VIEW of LOD datasets

DBpedia 3.7, Freebase, Geonames, CIA FactBook, Lingvoj, Lexvo, MusicBrainz, NY Times, Wordnet 3.0

Number of explicit statements: 1,686,804,539

Implicit statements: 1,264,199,839

Retrievable statements: 12,646,674,554

Materialization is performed with respect to the semantics of OWL-Horst optimized

Page 11: Fact forge aimsa2012

Reference Layer

Linking at schema level:

(1) using rdfs:subClassOf and rdfs:subPropertyOf statements;

(2) using OWL expressions where there is a difference in the conceptualization;

(3) using inference rules if additional individuals are necessary in the repository to support the mapping

PROTON – lightweight upper-level ontology, ~500 classes, ~150 properties

http://www.ontotext.com/proton-ontology
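
As a rough illustration of approach (1), schema-level links of this kind can be written as plain RDFS statements. The sketch below is not taken from FactForge's actual mapping files: the prot: namespace URI is a placeholder, and the three class and property pairs are only guessed from the vocabulary used in this talk's query examples, so every name in it should be read as an assumption.

```sparql
# Hypothetical schema-level mappings from dataset vocabularies to PROTON.
# The prot: namespace below is a placeholder, not the published PROTON URI.
PREFIX rdfs:    <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dbp-ont: <http://dbpedia.org/ontology/>
PREFIX geo-ont: <http://www.geonames.org/ontology#>
PREFIX prot:    <http://example.org/proton#>

INSERT DATA {
  # Every DBpedia politician is also a PROTON politician.
  dbp-ont:Politician    rdfs:subClassOf    prot:Politician .
  # Dataset-specific properties specialise the PROTON ones.
  dbp-ont:birthPlace    rdfs:subPropertyOf prot:birthPlace .
  geo-ont:parentFeature rdfs:subPropertyOf prot:subRegionOf .
}
```

Once statements like these are materialized, a query phrased only in PROTON terms can also match data asserted with the DBpedia and Geonames vocabularies, because RDFS entailment propagates instance types and property values up to the mapped class or property.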

Page 12: Fact forge aimsa2012

Access modes

RDF Search - retrieve ranked list of URIs related to literals, which contain specific keywords


Page 13: Fact forge aimsa2012

Access modes (cont'd)

Exploration - traversing the data, one resource at a time

Page 14: Fact forge aimsa2012

Access modes (cont'd)

Exploration - traversing the data, one resource at a time, inspecting inferred knowledge

- locatedIn – Bulgaria, Eastern Europe

- Geonames types/FeatureCodes (dc:type P.PPL)

- parentFeature – Bulgaria, Europe

- containsLocation – Cherno More Sports Complex, Varna Archeological Museum

- isBirthPlaceOf – Aleksander Kraev, Martin Hristov…

Page 15: Fact forge aimsa2012

Access modes (cont'd)

Exploration - traversing the data, one resource at a time, inspecting inferred knowledge

- locatedIn - Europe

- subRegionOf - Europe

- hasContactInfo – website via Freebase

- containsLocation

- partOf…

Page 16: Fact forge aimsa2012

Access modes (cont'd)

SPARQL endpoint

Page 17: Fact forge aimsa2012

Access modes (cont'd)

RelFinder

Page 18: Fact forge aimsa2012

Querying

Using LOD concepts

SELECT * WHERE {

  ?Person dbp-ont:birthPlace ?BirthPlace ;

          rdf:type dbp-ont:Politician .

  ?BirthPlace geo-ont:parentFeature dbpedia:Germany .

}

Using the intermediary layer

SELECT * WHERE {

  ?Person prot:birthPlace ?BirthPlace ;

          rdf:type prot:Politician .

  ?BirthPlace prot:subRegionOf dbpedia:Germany .

}

Page 19: Fact forge aimsa2012

Find Airports near London

Standard LOD vs. PROTON query: 13 vs. 20 results

DBpedia vs. DBpedia and Geonames

Page 20: Fact forge aimsa2012

Find airports near London - Results comparison


Using Geospatial index of OWLIM
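
A query of roughly the following shape could produce such an airport list. It is a sketch recalled from OWLIM's geospatial extension, not the query shown in the talk: the omgeo:nearby predicate and its namespace should be checked against the OWLIM documentation, and the use of dbp-ont:Airport and of WGS84 coordinates on dbpedia:London is an assumption about the data.

```sparql
PREFIX dbpedia: <http://dbpedia.org/resource/>
PREFIX dbp-ont: <http://dbpedia.org/ontology/>
PREFIX geo-pos: <http://www.w3.org/2003/01/geo/wgs84_pos#>
PREFIX omgeo:   <http://www.ontotext.com/owlim/geo#>

SELECT DISTINCT ?airport
WHERE {
  # Coordinates of the reference point (assumed to be present for London).
  dbpedia:London geo-pos:lat  ?latBase ;
                 geo-pos:long ?longBase .
  # Index lookup: resources within 50 miles of the reference point,
  # restricted to things typed as airports.
  ?airport omgeo:nearby(?latBase ?longBase "50mi") ;
           a dbp-ont:Airport .
}
```

The distance filter is answered from the index, so the store does not need to compare London's coordinates against every located resource.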

Page 21: Fact forge aimsa2012

City – a subject of a science fiction author


Page 22: Fact forge aimsa2012

OWLIM 5.0 and SPARQL 1.1

Exemplary queries:

GROUP BY, min

– Minimal and maximal population counts of European countries

Federated Query between FactForge and LinkedLifeData

– Drugs that cure the disease from which Alexander Graham Bell died

Literal index over dates

– World governors in office between 1980 and 2005

Literal index over digits

– European countries with population above 20 million

Geospatial index

– Show the distance from London of airports located at most 50 miles away from it
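
To make the GROUP BY and numeric-literal items concrete, the following SPARQL 1.1 sketch combines grouping, MIN/MAX aggregates, and a population threshold. It is not the query used in the talk. The prot: namespace is a placeholder, and the choice of dbp-ont:Country and dbp-ont:populationTotal, as well as the idea that several reported counts are reachable through one property, are assumptions about how the data is modelled.

```sparql
PREFIX dbpedia: <http://dbpedia.org/resource/>
PREFIX dbp-ont: <http://dbpedia.org/ontology/>
PREFIX prot:    <http://example.org/proton#>   # placeholder namespace

SELECT ?country
       (MIN(?population) AS ?minPopulation)
       (MAX(?population) AS ?maxPopulation)
WHERE {
  ?country a dbp-ont:Country ;
           prot:subRegionOf dbpedia:Europe ;
           dbp-ont:populationTotal ?population .
}
GROUP BY ?country
# Keep only countries whose largest reported count exceeds 20 million.
HAVING (MAX(?population) > 20000000)
ORDER BY DESC(?maxPopulation)
```

When a country carries more than one population value, the two aggregates expose the spread between sources, which is one small view of the data diversity this talk is about.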


Page 23: Fact forge aimsa2012

Challenges and usage

• Clean data

– Clean up input data

• At model level

– Contradiction detection

– Consistency checking

• Curation and upgrading methodology

FactForge has been used as data layer infrastructure in FP7 projects, like RENDER

FactForge has been used in tasks of

– linked data generation from unstructured data

– metadata enrichment of structured data

– providing linkage to the entire LOD cloud

for example The National Archives (UK)

EDAMAM - food recommendation app

Page 24: Fact forge aimsa2012

Acknowledgements

Colleagues

Ivan Peikov, Ontotext

Rouslan Velkov, Ontotext

Barry Bishop, Ontotext

Barry Norton, Ontotext

Marin Dimitrov, Ontotext

Alex Simov, Ontotext

Jordan Dichev, Ontotext

Konstantin Penchev, Ontotext

Partial funding

Links

http://ff-dev.ontotext.com

http://www.ontotext.com/owlim

http://www.ontotext.com/factforge

Email: [email protected]

Page 25: Fact forge aimsa2012

Thank you for your attention!

[email protected]