This presentation describes FactForge, a public data service. It is a reason-able view of a segment of the LOD cloud and the largest body of general knowledge on which inference is performed, supplied with a reference layer for quick access.


<ul><li>1. FactForge: Data Service or the Diversity of Inferred Knowledge over LOD. Mariana Damova, PhD, Kiril Simov, Zdravko Tashev, Atanas Kiryakov. AIMSA2012, September 2012</li></ul> <p>2. Ontotext: Top-5 provider of core Semantic Technology. Established in 2000; offices in Bulgaria, UK, USA. Active in both research and commercial projects (FP7 funding for 10 years). Unique 360° semantic technology portfolio: Semantic Databases: high-performance RDF DBMS, scalable reasoning; Semantic Search: text mining (IE), metadata generation, Information Retrieval (IR); Web Mining: focused crawling, screen scraping, data fusion; Linked Data Management and Data Integration. Good recognition in the SemTech community: Ontotext pages are ranked #1 for semantic annotation and semantic repository at GYM, #3 for linked data management at Google. Several joint ventures and subsidiaries: Innovantage, a leading online recruitment intelligence provider in the UK.
3. Ontotext Clients (selected). British Broadcasting Corporation (BBC): ran its World Cup 2010 sites on top of OWLIM; since Mar12 the BBC Sports 2012 Olympics sections are driven by OWLIM and a Concept Extraction service developed by Ontotext. Press Association (UK): analysis of sports news, concept extraction, linked data generation. Top-3 USA media (not allowed to name). The National Archives (UK): contracted Ontotext to implement a semantic KB and semantic search for the Government Web Archive. British Museum (UK): Ontotext leads the development of Phase 3 of the ResearchSpace project on collaborative research in cultural heritage; the British Museum's public SPARQL end-point is powered by OWLIM. de Bibliothek (Holland): aggregation of data from 150 library databases.
4. Semantic Web and Linked Open Data. Semantic Web: a set of standards that enable computers to interpret the semantics of data on the web. Linked Open Data: a set of principles for publishing structured data and interlinking them so that they can be browsed in the way HTML pages are browsable:
- Use URIs to identify things.
- Use HTTP URIs so that these things can be referred to and looked up ("dereferenced") by people and user agents.
- Provide useful information about the thing when its URI is dereferenced, using standard formats such as RDF/XML.
- Include links to other, related URIs in the exposed data to improve discovery of other related information on the Web.
5. Linked Open Data cloud (snapshots: 2008, 2009, 2011): 295 datasets, more than 30 billion triples.
6. Linked Open Data is maturing. The LOD cloud grows by billions of triples yearly. Technologies and guidelines about: how to produce linked data fast; how to assure their quality; how to provide vertical-oriented data services. LOD2, LATC, baseKB.
7. This talk is about reasoning and coping with the diversity of the data on the web of data.
8. Outline: FactForge (beta); Reference Layer; Access Modes; Querying (Airports around London; US city a subject of a Novel; US city contactInformation); Challenges; Conclusion.
9. FactForge (beta): the largest body of heterogeneous general knowledge on which inference has been performed; powered by OWLIM 5.2, supporting SPARQL 1.1.
10. Datasets: REASON-ABLE VIEW of LOD datasets. Number of explicit statements: 1,686,804,539. Implicit statements: 1,264,199,839. Retrievable statements: 12,646,674,554. Included datasets: CIA FactBook, DBpedia 3.7, Freebase, NY Times, Lexvo, Wordnet 3.0, Geonames, Lingvoj, MusicBrainz. Materialization is performed with respect to the semantics of OWL-Horst (optimized).
11.
Reference Layer: PROTON, a lightweight upper-level ontology; ~500 classes, ~150 properties; http://www.ontotext.com/proton-ontology. Linking at schema level: (1) using rdfs:subClassOf and rdfs:subPropertyOf statements; (2) using OWL expressions where there is a difference in the conceptualization; (3) using inference rules if additional individuals are necessary in the repository to support the mapping.
12. Access modes: RDF Search - retrieve a ranked list of URIs related to literals that contain specific keywords.
13. Access modes (cont'd): Exploration - traversing the data, one resource at a time.
14. Access modes (cont'd): Exploration - traversing the data, one resource at a time, inspecting inferred knowledge:
- locatedIn: Bulgaria, Eastern Europe
- Geonames types/FeatureCodes (dc:type P.PPL)
- parentFeature: Bulgaria, Europe
- containsLocation: Cherno More Sports Complex, Varna Archeological Museum
- isBirthPlaceOf: Aleksander Kraev, Martin Hristov
15. Access modes (cont'd): Exploration - traversing the data, one resource at a time, inspecting inferred knowledge:
- locatedIn: Europe
- subRegionOf: Europe
- hasContactInfo: website via Freebase
- containsLocation
- partOf
16. Access modes (cont'd): SPARQL endpoint.
17. Access modes (cont'd): RelFinder.
18. Querying. Using LOD concepts: SELECT * WHERE { ?Person dbp-ont:birthPlace ?BirthPlace ; rdf:type dbp-ont:Politician . ?BirthPlace geo-ont:parentFeature dbpedia:Germany . } Using the intermediary layer: SELECT * WHERE { ?Person prot:birthPlace ?BirthPlace ; rdf:type prot:Politician . ?BirthPlace prot:subRegionOf dbpedia:Germany . }
19. Find airports near London: standard LOD vs. PROTON query; 13 vs. 20 results; DBpedia vs. DBpedia and Geonames.
20.
Find airports near London - Results comparison. Using the geospatial index of OWLIM.
21. City a subject of a science fiction author.
22. OWLIM 5.0 and SPARQL 1.1. Exemplary queries: GROUP BY, min: minimal and maximal population counts of European countries. Federated query between FactForge and LinkedLifeData: drugs that cure the disease from which Alexander Graham Bell died. Literal index over dates: world governors in office between 1980 and 2005. Literal index over digits: European countries with population above 20 MLN. Geospatial index: show the distance from London of airports located at most 50 miles away from it.
23. Challenges and usage. Clean data: clean up input data. At model level: contradiction detection, consistency checking. Curation and upgrading methodology. FactForge has been used as data layer infrastructure in FP7 projects, like RENDER. FactForge has been used in tasks of linked data generation from unstructured data and metadata enrichment of structured data, providing linkage to the entire LOD cloud, for example: The National Archives (UK); EDAMAM, a food recommendation app.
24. Acknowledgements. Partial funding. Colleagues: Ivan Peikov, Rouslan Velkov, Barry Bishop, Barry Norton, Marin Dimitrov, Alex Simov, Jordan Dichev, Konstantin Penchev (all Ontotext). Links: http://ff-dev.ontotext.com, http://www.ontotext.com/owlim, http://www.ontotext.com/factforge. Email: info@factforge.net
25. Thank you for your attention! mariana.damova@ontotext.com </p>
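The reference-layer slides say that dataset schemas are linked to PROTON with rdfs:subClassOf and rdfs:subPropertyOf statements and that implicit statements are materialized. A minimal Python sketch of that idea follows, using only the two standard RDFS entailment rules rdfs7 and rdfs9. It is a toy illustration, not OWLIM's reasoner or the OWL-Horst rule set; the specific mappings and the `ex:` resources are assumptions made up for the example.

```python
# Toy forward-chaining materialization over a handful of triples.
# The mappings and the ex: resources are hypothetical illustrations.

SUBPROP = "rdfs:subPropertyOf"
SUBCLASS = "rdfs:subClassOf"
TYPE = "rdf:type"


def materialize(triples):
    """Return the closure of `triples` under RDFS rules rdfs7 and rdfs9."""
    closure = set(triples)
    changed = True
    while changed:
        changed = False
        sub_props = {(s, o) for s, p, o in closure if p == SUBPROP}
        sub_classes = {(s, o) for s, p, o in closure if p == SUBCLASS}
        inferred = set()
        for s, p, o in closure:
            # rdfs7: (s p o), (p subPropertyOf q)  =>  (s q o)
            for sub, sup in sub_props:
                if p == sub:
                    inferred.add((s, sup, o))
            # rdfs9: (s type C), (C subClassOf D)  =>  (s type D)
            if p == TYPE:
                for sub, sup in sub_classes:
                    if o == sub:
                        inferred.add((s, TYPE, sup))
        # Repeat until a pass adds nothing new (fixpoint).
        if not inferred <= closure:
            closure |= inferred
            changed = True
    return closure


explicit = {
    # Schema-level links to the reference layer (assumed for illustration).
    ("dbp-ont:birthPlace", SUBPROP, "prot:birthPlace"),
    ("dbp-ont:Politician", SUBCLASS, "prot:Politician"),
    # Instance data (hypothetical).
    ("ex:PersonA", "dbp-ont:birthPlace", "ex:CityB"),
    ("ex:PersonA", TYPE, "dbp-ont:Politician"),
}

closure = materialize(explicit)
implicit = closure - explicit  # statements that exist only through inference

for triple in sorted(implicit):
    print(triple)
```

With the inferred statements stored alongside the explicit ones, a query phrased only in reference-layer terms (prot:birthPlace, prot:Politician) can match data that was originally published with dataset-specific vocabulary.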
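The exemplary geospatial query asks for airports located at most 50 miles from London, together with their distance. As a rough sketch of the underlying computation, the Python below filters places by great-circle (haversine) distance. This is plain arithmetic, not OWLIM's geospatial index, and the coordinates are approximate values assumed for illustration rather than data taken from FactForge.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def within_radius(center, places, max_miles):
    """Return (name, distance) pairs for places within max_miles, nearest first."""
    hits = []
    for name, (lat, lon) in places.items():
        distance = haversine_miles(center[0], center[1], lat, lon)
        if distance <= max_miles:
            hits.append((name, distance))
    return sorted(hits, key=lambda pair: pair[1])


# Approximate coordinates, assumed for illustration only.
london = (51.5074, -0.1278)
airports = {
    "Heathrow": (51.4700, -0.4543),
    "Paris CDG": (49.0097, 2.5479),
}

nearby = within_radius(london, airports, max_miles=50)
for name, distance in nearby:
    print(f"{name}: {distance:.1f} miles")
```

A repository with a geospatial index can avoid computing the distance to every candidate, which is the point of the index mentioned on the slides; the linear scan here only shows what the filter means.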