

Social Tags and Linked Data for Ontology Development:

A Case Study in the Financial Domain

Andrés García-Silva†, Leyla Jael García-Castro±,

Alexander García*, Oscar Corcho†

†{hgarcia, ocorcho}@fi.upm.es

Ontology Engineering Group

Universidad Politécnica de Madrid, Spain

± [email protected]

Universitat Jaume I, Castellón de la Plana, Spain

*[email protected]

State University, Florida, USA

June 2014

FPI grant BES-2008-007622


Introduction: Folksonomies

[Diagram: Web 2.0, user-generated content and social networks provide tools for organizing, sharing & discovering information. Tagging systems give rise to a folksonomy. Other labels shown in the figure: Java, Programming language, Tutorial, Java Persistent Access, Database, Knowledge Base.]


Introduction: Folksonomies as a source of knowledge

Folksonomies

• Vocabulary emerges around resources and users: Golder and Huberman (2006), Marlow et al. (2006)
• Maintained by a large user community
• Flexible (not restricted)
• Up-to-date
• Emergent semantics from the aggregation of individual classifications: Gruber (2007), Mika (2007), Specia and Motta (2007)


State of the art: Folksonomies

[Diagram labels: "Two tags are related if..", "relation?", Folksonomy, Ontology]

Statistical-based

• Tag Similarity Measures: Cattuto et al. (2008), Markines et al. (2009), Körner et al. (2010), Benz et al. (2011)
• Ontology Generation: Heymann and Garcia-Molina (2006), Begelman et al. (2006), Hamasaki et al. (2007), Jäschke et al. (2008), Kennedy et al. (2007), Mika (2007), Benz et al. (2010), Limpens et al. (2010)

Ontology-based

• Angeletou et al. (2008), Cantador et al. (2008), García-Silva et al. (2009), Maala et al. (2008), Passant (2007), Tesconi et al. (2008)

Hybrid approaches

• Giannakidou et al. (2008), Specia and Motta (2007)


State of the art: Folksonomies

Approach | Type | Auto | Dat Src. | Select. & Cleaning | Context Ident. | Disambiguation | Sem. Ident | Output | Evaluation | Domain Knowledge
Mika, 2007 | Stat | Yes | Del,Oth | Yes | Yes | No | Yes | Onto | Desc Study | No
Hamasaki et al., 2007 | Stat | Yes | Pol | No | Yes | Yes | No | Onto | Task-based | No
Jäschke et al., 2008 | Stat | Yes | Del,Bib | Yes | Yes | No | No | Hier | Desc Study | No
Limpens et al., 2010 | Stat | Semi | Oth | No | No | Yes | Yes | Enri | Pres/Rec | No
Begelman et al., 2006 | Stat | Yes | Del,Raw | Yes | Yes | No | No | Clus | Desc Study | No
Kennedy et al., 2007 | Stat | Yes | Fli | Yes | Yes | Yes | Yes | Inst | Pres/Rec | No
Heymann & Garcia-Molina, 2006 | Stat | Yes | Del,Cit | No | Yes | No | No | Hier | Task-based | No
Benz et al., 2010 | Stat | Yes | Del | No | Yes | Yes | Yes | Hier | Pres/Rec | No
Giannakidou et al., 2008 | Hyb | Yes | Fli | Yes | Yes | Yes | No | Clus | No | No
Specia & Motta, 2007 | Hyb | Semi | Del,Fli | Yes | Yes | Yes | Yes | Onto | Desc Study | No
Angeletou et al., 2008 | Ont | Yes | Fli | Yes | Yes | Yes | Yes | Enri | Pres/Rec | No
Cantador et al., 2008 | Ont | Yes | Fli,Del | Yes | Yes | No | Yes | Inst | Pres/Rec | No
Tesconi et al., 2008 | Ont | Yes | Del | Yes | Yes | Yes | Yes | Enri | Pres/Rec | No
Passant, 2007 | Ont | No | Oth | Yes | Yes | Yes | Yes | Enri | Desc Study | No
Maala et al., 2008 | Ont | Yes | Fli | Yes | Yes | No | Yes | Enri | Desc Study | No

Limitations

Statistical-based
• Most of the approaches do not distinguish between classes and instances
• Relation semantics is limited to some types and is not precisely defined
• No domain knowledge

Ontology-based
• All the approaches produce either enrichments or instances (no classes)
• Relations are not identified
• No domain knowledge

Hybrid
• Semi-automatic ontology generation
• No domain knowledge


Proposal

Goal: Generate a domain baseline ontology, containing classes and relationships, out of folksonomy information.

[Diagram: Domain Experts supply domain relevant resources (URL). Terminology Extraction takes the folksonomy and produces a list of domain terms. In Semantic Elicitation, the domain terms drive the extraction of domain classes and relationships from Linked Open Data* (LOD).]

*"Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/"


Benefits

We propose a process to extract domain knowledge from large, generic knowledge bases, driven by the domain terminology in the folksonomy.

• It may save time in the ontology development process.
• It allows ontology engineers to understand the domain with limited participation of domain experts.
• It yields smaller and more focused ontologies, which are potentially easier to understand and maintain.
• Complex queries and reasoning tasks may execute faster on smaller data sets.
• In observance of methodological practice, our technique harvests community knowledge and reuses existing ontologies.
• The ontology has links to external classes and relationships available in the Linked Open Data cloud.


Challenges

Problem: Tags lack semantics

• Ambiguity
• Synonyms
• Acronyms
• Morphological variations
  • Plurals
  • Singulars
  • Verb conjugations
• Misspellings


Approach: Terminology Extraction

Goal: To extract domain terminology from the folksonomy

Folksonomy: $A \subseteq U \times T \times R$, with graph $G = (V, E)$ where $V = U \cup T \cup R$ and $E = \{(u, t, r) \mid (u, t, r) \in A\}$

Resource graph: $G' = (V', E')$ where $V' = R$ and $E' = \{(r_i, r_j) \mid \exists\,((u, t_m, r_i) \in A \wedge (u, t_n, r_j) \in A \wedge t_m = t_n)\}$

Spreading Activation

• Seeds: domain relevant resources from domain experts
• Nodes are weighted with an activation value used to start the search.
• The activation value spreads to adjacent nodes through an activation function.
• Activation function: a function of the tags shared between the visited node and the source node, and of the source node activation value.
• Activation function > threshold: the node is marked as activated and the spreading continues to adjacent nodes.
• Tags of activated nodes are collected as domain terms.
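The spreading activation steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the slide says the activation function depends on the shared tags and on the source activation value but does not give a formula, so the sketch assumes the source value multiplied by the Jaccard overlap of the two tag sets. The names `build_resource_graph`, `spread_activation` and the toy annotations are hypothetical, and seeds are treated as activated nodes.

```python
from collections import defaultdict, deque


def build_resource_graph(annotations):
    """Link two resources when the same user applied the same tag to both."""
    by_user_tag = defaultdict(set)
    tags_of = defaultdict(set)
    for user, tag, resource in annotations:
        by_user_tag[(user, tag)].add(resource)
        tags_of[resource].add(tag)
    neighbours = defaultdict(set)
    for resources in by_user_tag.values():
        for resource in resources:
            neighbours[resource] |= resources - {resource}
    return neighbours, tags_of


def spread_activation(annotations, seeds, threshold=0.8, seed_value=1.0):
    """Return the tags of every resource activated from the seed resources."""
    neighbours, tags_of = build_resource_graph(annotations)
    activation = {seed: seed_value for seed in seeds}
    queue = deque(seeds)
    while queue:
        source = queue.popleft()
        for node in neighbours[source]:
            if node in activation:
                continue
            shared = tags_of[source] & tags_of[node]
            union = tags_of[source] | tags_of[node]
            # Assumed activation function: source value times Jaccard overlap.
            value = activation[source] * len(shared) / len(union)
            if value > threshold:
                activation[node] = value
                queue.append(node)
    terms = set()
    for resource in activation:
        terms |= tags_of[resource]
    return terms


# Toy (user, tag, resource) annotations, invented for illustration.
annotations = [
    ("u1", "finance", "r1"), ("u1", "stock", "r1"),
    ("u1", "finance", "r2"), ("u1", "stock", "r2"),
    ("u1", "finance", "r3"), ("u1", "recipes", "r3"), ("u1", "cooking", "r3"),
]
print(sorted(spread_activation(annotations, ["r1"])))  # ['finance', 'stock']
```

With the threshold at 0.8, resource r3 shares only one of four tags with the seed and is not activated, so its off-topic tags stay out of the term list. Lowering the threshold admits more resources and therefore noisier terminology.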


Approach: Semantic Elicitation

Goal: To relate domain terms (tags) to DBpedia resources

• Normalize the tag to the standard notation of DBpedia resource titles
• Search for a resource with a label equal to the normalized tag using SPARQL
  • If it does not exist: use a spelling suggestion service and search again
  • If it exists: check whether it is related to a disambiguation resource
    • If true: retrieve the disambiguation candidates and select the candidate most similar to the tag context
      • Vector space model
      • Candidate resources represented using their textual descriptions
      • Tag represented using its context (i.e., co-occurring tags)
      • Selection of the most similar candidate using cosine similarity
    • If false: select the resource (default sense in Wikipedia)

Reference: A. García-Silva, I. Cantador, Ó. Corcho (2012). Enabling folksonomies for knowledge extraction: A semantic grounding approach. International Journal on Semantic Web and Information Systems 8 (3), 24-41.
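The candidate selection in the disambiguation branch can be illustrated with a small vector space sketch. It assumes raw term-frequency weights and whitespace tokenization, which the slide does not specify; the function names, the candidate identifiers (`ex:Bank`, `ex:Bank_(geography)`) and their descriptions are hypothetical stand-ins for DBpedia resources and abstracts.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def select_candidate(context_tags, candidates):
    """Pick the candidate whose description is closest to the tag context.

    context_tags: tags co-occurring with the ambiguous tag.
    candidates: mapping from candidate identifier to textual description.
    """
    context = Counter(tag.lower() for tag in context_tags)
    scores = {
        uri: cosine(context, Counter(text.lower().split()))
        for uri, text in candidates.items()
    }
    return max(scores, key=scores.get)


# Hypothetical candidates for the ambiguous tag "bank".
candidates = {
    "ex:Bank": "financial institution that accepts deposits and lends money through a loan",
    "ex:Bank_(geography)": "land alongside a river or lake",
}
print(select_candidate(["finance", "money", "loan"], candidates))  # ex:Bank
```

A production version would likely weight terms (for example with TF-IDF) and remove stop words, but the selection rule stays the same: the highest cosine score wins.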


Approach: Semantic Elicitation

Goal: Identify classes from resources

• Use the SPARQL ASK form to verify whether the entity is a class (Query 1)
• If not: create queries to traverse all the possible paths of equivalence relations between the entity and a class in the RDF graph (Queries 2 and 3)

# <resource> is a placeholder for the IRI of the entity being checked
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>

# Query 1: is the resource itself a class?
ASK { <resource> rdf:type rdfs:Class }

# Query 2: classes one owl:sameAs hop away
SELECT ?class
WHERE {
  <resource> ?rel1 ?class .
  ?class rdf:type rdfs:Class .
  FILTER (?rel1 = owl:sameAs)
}

# Query 3: classes two owl:sameAs hops away
SELECT ?class
WHERE {
  <resource> ?rel1 ?node .
  ?node ?rel2 ?class .
  ?class rdf:type rdfs:Class .
  FILTER ((?rel1 = owl:sameAs) && (?rel2 = owl:sameAs))
}

Reference: Philipp Heim, Sebastian Hellmann, Jens Lehmann, Steffen Lohmann and Timo Stegemann. RelFinder: Revealing Relationships in RDF Knowledge Bases. In: Proceedings of the 4th International Conference on Semantic and Digital Media Technologies (SAMT 2009), pages 182-187. Springer, Berlin/Heidelberg, 2009.
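Queries 2 and 3 follow a regular pattern, so queries for longer paths can be generated rather than written by hand. The sketch below builds the query text for a given number of hops. The function name `class_path_query` and the example IRI are hypothetical, and intermediate variables are numbered `?node1`, `?node2`, ... instead of the single `?node` used on the slide.

```python
PREFIXES = "\n".join([
    "PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>",
    "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>",
    "PREFIX owl: <http://www.w3.org/2002/07/owl#>",
])


def class_path_query(resource_iri, hops, relation="owl:sameAs"):
    """Build a SELECT query for classes reachable in `hops` equivalence steps."""
    if hops < 1:
        raise ValueError("hops must be at least 1")
    # Chain: <resource> -> ?node1 -> ... -> ?class
    nodes = [f"<{resource_iri}>"]
    nodes += [f"?node{i}" for i in range(1, hops)]
    nodes.append("?class")
    patterns = [
        f"  {subj} ?rel{i} {obj} ."
        for i, (subj, obj) in enumerate(zip(nodes, nodes[1:]), start=1)
    ]
    filters = " && ".join(f"(?rel{i} = {relation})" for i in range(1, hops + 1))
    lines = [PREFIXES, "SELECT ?class", "WHERE {"]
    lines += patterns
    lines += ["  ?class rdf:type rdfs:Class .", f"  FILTER ({filters})", "}"]
    return "\n".join(lines)


# Example IRI is illustrative only.
print(class_path_query("http://example.org/resource/Stock_exchange", 2))
```

Calling the function with `hops=1` reproduces the shape of Query 2 and `hops=2` the shape of Query 3. The generated text can then be sent to any SPARQL endpoint with the client of your choice.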


Approach: Semantic Elicitation

Goal: To identify relations between classes

• For each pair of classes: create queries to traverse all the possible paths between the two classes in the RDF graph, and retrieve the relationships.

Caveats

• May result in adding non-relevant domain information to the ontology
  • Long paths
  • Paths that pass through abstract concepts or relationships
    • cyc:ObjectType
    • umbel:RefConcept
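Both caveats suggest a simple filter over the retrieved paths: drop a path when it is too long or when it touches an abstract concept. The sketch below is one possible form of such a filter, not the authors' code. The path representation (a list alternating class, relation, class, ...), the default maximum of two hops and the example terms prefixed with `ex:` are assumptions; the two blacklisted concepts are the ones named on the slide.

```python
# Abstract concepts named on the slide; a real list would be much longer.
ABSTRACT_CONCEPTS = frozenset({"cyc:ObjectType", "umbel:RefConcept"})


def keep_path(path, max_hops=2, blacklist=ABSTRACT_CONCEPTS):
    """Decide whether a path between two classes is worth keeping.

    path alternates nodes and relations: [class, relation, node, ..., class],
    so a path with n hops has 2n + 1 elements.
    """
    hops = len(path) // 2
    if hops > max_hops:
        return False
    return not any(term in blacklist for term in path)


direct = ["ex:Bank", "rdfs:subClassOf", "ex:Organization"]
abstract = ["ex:Bank", "rdf:type", "umbel:RefConcept", "rdf:type", "ex:Driver"]
long_path = ["ex:A", "ex:p", "ex:B", "ex:p", "ex:C", "ex:p", "ex:D"]

print([keep_path(p) for p in (direct, abstract, long_path)])  # [True, False, False]
```

Checking every element of the path, relations included, matches the slide's wording that a path may pass through abstract concepts or relationships.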


Approach: Semantic Elicitation

Minimizing the risk of adding non-relevant information to the ontology

• Keep the path length short
  • Our experiments show satisfactory results with short path lengths, which allow us to enrich the initial set of classes while preserving the precision of the ontology
• Avoid high-level concepts
  • Create lists of high-level concepts collected from the knowledge base vocabularies to filter out the paths containing those concepts
  • Knowledge base core vocabularies are usually well documented
    • http://umbel.org/specifications/vocabulary
    • http://mappings.dbpedia.org/server/ontology/classes/
    • http://www.cyc.com/kb/thing
• Use semantic similarity distances
  • Wu and Palmer, 1994: depth of the classes and of their common subsumer in the taxonomy
  • Jiang and Conrath, 1997: subclasses per class, class depth, information content, etc.
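As an illustration of the Wu and Palmer measure mentioned above, the similarity of two classes is twice the depth of their deepest common subsumer divided by the sum of their own depths. The sketch below computes it on a toy single-rooted taxonomy. The taxonomy, the class names and the convention that the root has depth 1 are assumptions made for the example.

```python
# Toy taxonomy as child -> parent links; "Thing" is the root.
PARENT = {
    "Agent": "Thing",
    "Organization": "Agent",
    "Person": "Agent",
    "Company": "Organization",
    "Bank": "Organization",
}


def ancestors(node, parent):
    """Return the chain from node up to the root, node included."""
    chain = [node]
    while chain[-1] in parent:
        chain.append(parent[chain[-1]])
    return chain


def depth(node, parent):
    """Depth of a node, counting the root as depth 1."""
    return len(ancestors(node, parent))


def wu_palmer(a, b, parent):
    """Wu and Palmer similarity: 2 * depth(lcs) / (depth(a) + depth(b))."""
    up_from_b = set(ancestors(b, parent))
    # Walking up from a, the first node also above b is the deepest subsumer.
    lcs = next(node for node in ancestors(a, parent) if node in up_from_b)
    return 2 * depth(lcs, parent) / (depth(a, parent) + depth(b, parent))


print(wu_palmer("Company", "Bank", PARENT))  # 0.75
```

Sibling classes such as Company and Bank score higher than Company and Person (4/7), so a cut-off on this score is one way to decide whether a relationship found along a path links classes that are close enough to keep.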


Evaluation: Experiment in the financial domain

[Diagram labels: Input, Finance vocabulary, Evaluation]


Evaluation: Experiment in the financial domain

[Diagram labels: Terminology Extraction, Finance Ontology, Finance vocabulary]


Evaluation: Inspecting a financial ontology

• We ran the process with an activation threshold of 0.8.
• The ontology produced consists of 187 classes, 378 relations of 8 different types, and 12 modules.


Evaluation: Inspecting a financial ontology

Class Precision = 80.67%, Relation Precision = 96.4%

Ontology Modules

Module | Precision (Class)
Organization | 77.80%
Company | 88.50%
Person | 55.60%
Union | 3.74%
Banker | 100%
Human | 100%
Stock Exchange | 84.60%
Money Transactions | 100%
Country | 100%
Research | 100%
Driver | 0%
Member | 100%


Conclusions

• We have developed a method for automatically building domain ontologies
  • Limited user participation
  • We benefit from the aggregation of individual classifications to extract an emergent domain vocabulary
  • In accordance with methodological guidelines, we reuse existing knowledge (the Web of Data)
  • We tap into existing links between data sets to collect related semantic information
    • We avoid, to some extent, semantic mismatches
    • We avoid heterogeneous representations
• In practice, we expect the method to be used by ontology engineers to generate baseline ontologies that can be refined later according to the ontology requirements.


Future Work

• Develop a method to automatically assess the validity of the relationships found in the linked data cloud:
  • OpenCyc Stock Exchange is owl:sameAs UMBEL Exchange of User Rights
  • However:
    • Stock Exchange is an organization
    • Exchange of User Rights is an event
• Use semantic similarity measures to decide whether or not to include the relationships found along a path between two classes.
• Discover and use datasets in the linked data cloud that cover the domain of interest.
