CONCEPT DRIFT IN ONTOLOGY MAPPING AND SEMANTIC ANNOTATION ADAPTATION

Cédric PRUSKI

Drift-a-LOD@EKAW 2016, November 20th, Bologna, Italy


Page 1: CONCEPT DRIFT IN ONTOLOGY MAPPING AND SEMANTIC …event.cwi.nl/drift-a-lod/2016/slides/Keynote_Drift-a-LOD_Cedric.pdf · V2 =(s’, t, r’) Hypothesis: There is a correlation between



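MOTIVATION

[Figure: a mapping between "malignancy" in the source KOS (KS) and "Malignant neoplasm" in the target KOS (KT), with data annotated using these concepts; after evolution, the correspondence is questioned (?) and the data become inaccessible.]

• Outdated mappings and annotations may trigger undesirable results in biomedical systems
• It is crucial to keep mappings and annotations valid
• The large size and complexity of knowledge organization systems (KOSs) prevent fully manual maintenance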


PROBLEM STATEMENT

• What is the impact of concept drift (or ontology evolution) on ontology mappings and semantic annotations?
  • Quantitative
  • Qualitative
• How can we formally characterize concept drift?
  • Basic changes (addition/deletion of concepts)
  • Complex changes (split, merge, move of concepts)
• Can we reuse the information that characterizes concept drift to adapt ontology mappings and semantic annotations?
  • Avoid re-aligning or re-annotating whole datasets


AGENDA

① Concept drift for mapping adaptation
   a. DynaMO research project
   b. Change patterns
② Concept drift for semantic annotation maintenance
   a. ELISA research project
   b. Background knowledge
③ Discussion
   a. Concept drift for LOD


THE CASE OF MAPPING ADAPTATION



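ONTOLOGY MAPPING ADAPTATION

Definition and problem

"Adaptation of existing mappings according to modifications affecting KOS elements at evolution time"

$M_{V_1} = (s, t, r) \;\rightarrow\; M_{V_2} = (s', t, r')$, where $s$ and $t$ are the source and target concepts and $r$ is the semantic relation between them; after evolution, the source concept and the relation may change while the target stays the same.

Hypothesis: There is a correlation between the way KOS elements evolve and the way mappings are adapted.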
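The notation above can be made concrete with a minimal Python sketch, assuming a mapping is a (source, target, relation) triple and relations are written as "≡", "≤" or "≥". The class `Mapping`, the function `describe_adaptation` and the example concepts are hypothetical and only illustrate how $M_{V_1}$ and $M_{V_2}$ differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mapping:
    """Hypothetical mapping M = (s, t, r): source concept, target concept, relation."""
    source: str
    target: str
    relation: str  # assumed notation: "≡", "≤" (s more specific than t) or "≥"


def describe_adaptation(old: Mapping, new: Mapping) -> list[str]:
    """Report which components differ between M_V1 and M_V2 (target assumed fixed)."""
    if old.target != new.target:
        raise ValueError("this sketch assumes the target concept t does not change")
    changes = []
    if old.source != new.source:
        changes.append("source changed")
    if old.relation != new.relation:
        changes.append("relation changed")
    return changes or ["unchanged"]


# Illustrative example: the mapping is moved to a more specific source concept.
mv1 = Mapping("malignancy", "Malignant neoplasm", "≡")
mv2 = Mapping("malignancy of colon", "Malignant neoplasm", "≤")
print(describe_adaptation(mv1, mv2))  # ['source changed', 'relation changed']
```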


UNDERSTANDING MAPPING EVOLUTION

• Identify potential interdependencies between changes affecting KOS entities and the evolution of mappings
• Empirically examine official, real-world mappings over time
  • Case study: evolution of SNOMED CT and ICD9CM

~400,000 mappings analyzed

[Figure: SNOMED CT releases Jan/10, Jul/10, Jan/11 and Jul/11, ICD9CM releases 2009 and 2010, and the official mapping sets between them: MST 1 (Jan/10), MST 2 (Jul/10), MST 3 (Jan/11) and MST 4 (Jul/11).]

How does concept drift impact mappings?
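Comparing consecutive releases of a mapping set, as in the study above, starts with a set difference. The sketch below is a hedged illustration, not the method of the study: the helper names are hypothetical, the identifiers are toy values rather than real SNOMED CT or ICD9CM codes, and pairing a removed and an added mapping that share the same target is an assumption used to spot candidate adaptations.

```python
from collections import defaultdict

Triple = tuple[str, str, str]  # (source, target, relation)


def diff_mapping_sets(older: set[Triple], newer: set[Triple]) -> dict[str, set[Triple]]:
    """Split two consecutive releases of a mapping set into kept/removed/added."""
    return {
        "kept": older & newer,
        "removed": older - newer,
        "added": newer - older,
    }


def pair_by_target(removed: set[Triple], added: set[Triple]) -> list[tuple[Triple, Triple]]:
    """Pair removed and added mappings that share the same target concept.

    Such pairs are treated as candidate adaptations (M_V1 -> M_V2); this
    pairing rule is an assumption made for illustration.
    """
    added_by_target = defaultdict(list)
    for m in added:
        added_by_target[m[1]].append(m)
    pairs = []
    for old in sorted(removed):
        for new in sorted(added_by_target.get(old[1], [])):
            pairs.append((old, new))
    return pairs


# Toy releases with hypothetical identifiers.
mst_1 = {("S1", "T1", "≡"), ("S2", "T2", "≡")}
mst_2 = {("S1", "T1", "≡"), ("S3", "T2", "≤")}

diff = diff_mapping_sets(mst_1, mst_2)
print(pair_by_target(diff["removed"], diff["added"]))
# [(('S2', 'T2', '≡'), ('S3', 'T2', '≤'))]
```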


KEY FINDINGS

[Figure: mappings between ICD9CM (560.39, 560.32) and SNOMED CT (168000, 29162007, 40515007, 44635007, 197063004) before and after evolution, with is-a links and ≡/≤ relations; one concept changed and one concept was added. Labels involved include "Enterolith (disorder)", "Typhlolithiasis (disorder)", "Concretion of intestine (disorder)", "Impaction of intestine", "Fecal impaction" and "Fecal impaction of colon". The observed modifications over time concern the attributes "Concretion of intestine", "Enterolith" and "Fecal impaction", compared by similarity.]

Mapping adaptation is based on the evolution of relevant concept attributes.

How can these attributes be identified?
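One way to approach the question above is to score each attribute of the source concept against the target label and keep the closest ones. The sketch below uses token Jaccard similarity as a stand-in measure with an assumed threshold of 0.5 and a tiny assumed stopword list; it is not necessarily the similarity used in the cited work, and the function names are hypothetical.

```python
import re

STOPWORDS = {"of", "the", "and", "in"}  # assumption: tiny illustrative list


def tokens(label: str) -> set[str]:
    """Lower-case word tokens, dropping parenthesised tags such as "(disorder)"."""
    label = re.sub(r"\(.*?\)", " ", label.lower())
    return {t for t in re.findall(r"[a-z0-9]+", label) if t not in STOPWORDS}


def jaccard(a: str, b: str) -> float:
    """Token-set Jaccard similarity between two labels."""
    ta, tb = tokens(a), tokens(b)
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0


def relevant_attributes(source_attrs: list[str], target_label: str,
                        threshold: float = 0.5) -> list[str]:
    """Keep the source attributes whose similarity to the target label reaches the threshold."""
    return [a for a in source_attrs if jaccard(a, target_label) >= threshold]


attrs = ["Concretion of intestine", "Enterolith", "Fecal impaction"]
print(relevant_attributes(attrs, "Fecal impaction of colon"))  # ['Fecal impaction']
```

The threshold trades recall for precision: lowering it lets weaker lexical overlaps through, which a real pipeline would then need to check.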


CHARACTERIZATION OF CHANGES

Lexical change patterns

• Total Copy (TC)
• Total Transfer (TT)
• Partial Copy (PC)
• Partial Transfer (PT)

CONTEXT = SUP ∪ SUB ∪ SIB (the super concepts, sub concepts and siblings of the evolving concept)

[Figure: between time j and time j+1, the attribute values $a_1, \dots, a_n$ of concept $c_{s0}$ are compared with the attributes of its context, i.e., super concepts ($a_{sup_1}, \dots, a_{sup_n}$), sub concepts ($a_{sub_1}, \dots, a_{sub_n}$) and siblings ($a_{sib_1}, a_{sib_2}$), such as concept $c_{s1}$. Example values: "unspecified mental behavioral problem", "specified behavioral problem", "bronzed diabetes", "inflammatory bowel diseases".]
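The four lexical patterns can be sketched as a two-part decision: how much of an attribute value reappears in a context concept (total or partial), and whether the value stays with $c_{s0}$ (copy) or leaves it (transfer). This is an assumed, simplified reading of the patterns, not the authors' definition; the function names and example values are hypothetical.

```python
import re


def norm_tokens(value: str) -> set[str]:
    """Lower-case word tokens of an attribute value."""
    return set(re.findall(r"[a-z0-9]+", value.lower()))


def lexical_pattern(value: str, cs0_after: list[str], candidate_after: list[str]):
    """Classify how an attribute value of cs0 at time j reappears at time j+1.

    Assumed, simplified reading of the four patterns:
      Total    = all tokens of the value occur in one candidate attribute,
      Partial  = only some of its tokens occur in a candidate attribute;
      Copy     = the value is still an attribute of cs0 at j+1,
      Transfer = the value no longer belongs to cs0 at j+1.
    The candidate is assumed to be a concept of CONTEXT = SUP ∪ SUB ∪ SIB.
    Returns "TC", "TT", "PC", "PT", or None if nothing was carried over.
    """
    v = norm_tokens(value)
    candidate = [norm_tokens(a) for a in candidate_after]
    if any(v <= c for c in candidate):
        scope = "T"
    elif any(v & c for c in candidate):
        scope = "P"
    else:
        return None
    still_in_cs0 = any(v == norm_tokens(a) for a in cs0_after)
    return scope + ("C" if still_in_cs0 else "T")


# Hypothetical attribute values, loosely inspired by the labels on the slide.
print(lexical_pattern("inflammatory bowel diseases",
                      cs0_after=["inflammatory bowel diseases"],
                      candidate_after=["inflammatory bowel diseases"]))  # TC
print(lexical_pattern("bronzed diabetes inflammatory bowel diseases",
                      cs0_after=["bronzed diabetes"],
                      candidate_after=["inflammatory bowel diseases"]))  # PT
```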


CHARACTERIZATION OF CHANGES

Semantic change patterns

• Equivalent (EQV)
• Partial Match (PTM)
• More Specific (MSP)
• Less Specific (LSP)

CONTEXT = SUP ∪ SUB ∪ SIB

[Figure: between time j and time j+1, the attribute values of $c_{s0}$ are compared with those of concepts in its context (e.g., $c_{s1}$). Example label pairs: "Diabetes type 1" / "Diabetes type I", "Focal atelectasis" / "Helical atelectasis", "familial chylomicronemia" / "familial hyperchylomicronemia", "Kappa chain disease" / "Kappa light chain disease".]
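A crude, purely lexical stand-in for the semantic patterns compares the token sets of two labels: identical sets suggest EQV, added qualifiers suggest MSP, dropped ones LSP, and any other overlap PTM. This heuristic is an assumption made for illustration; it can misjudge specificity (for instance "familial hyperchylomicronemia" only counts as PTM here), and the verdicts it prints are not the classifications shown on the slide. The roman-numeral table is likewise assumed.

```python
import re

# Assumption: a tiny normalisation table so that "type I" and "type 1" match.
ROMAN = {"i": "1", "ii": "2", "iii": "3", "iv": "4"}


def label_tokens(label: str) -> set[str]:
    return {ROMAN.get(t, t) for t in re.findall(r"[a-z0-9]+", label.lower())}


def semantic_pattern(before: str, after: str):
    """Compare the label at time j+1 (`after`) with the label at time j (`before`).

    Same tokens -> EQV; extra qualifiers -> MSP; fewer -> LSP; some overlap -> PTM.
    Returns None when the labels share no token (background knowledge needed).
    """
    b, a = label_tokens(before), label_tokens(after)
    if a == b:
        return "EQV"
    if a > b:
        return "MSP"
    if a < b:
        return "LSP"
    if a & b:
        return "PTM"
    return None


for pair in [("Diabetes type 1", "Diabetes type I"),
             ("Kappa chain disease", "Kappa light chain disease"),
             ("Focal atelectasis", "Helical atelectasis"),
             ("Cancer", "Malignant neoplasm")]:
    print(pair, semantic_pattern(*pair))
```

The last pair returns None: with no shared tokens, lexical evidence alone cannot relate the two labels.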


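LINKING CP AND MAINTENANCE ACTIONS

Heuristics

[Figure: concept $c_{s0}$ (attributes $a_1, \dots, a_k$, semType) of the source KOS $K_S$ is mapped to concept $c_t$ of the target KOS $K_T$ and is affected by KOS changes. Its relevant attributes ($a_{s_1}, \dots, a_{s_n}$) are traced from time j to time j+1 through its context (CONTEXT = SUP ∪ SUB ∪ SIB, including siblings $a_{sib_1}, \dots, a_{sib_n}$) to a candidate concept $c_{cand1}$. Example labels: "Kappa light chain disease", "Kappa chain disease".]

Rule: if exactly one candidate ($\exists!$) shows a lexical change pattern (CP) of type Total Transfer, together with a semantic CP, apply $\mathrm{MoveM}(m_{st}, c_{cand1})$.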
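The rule above can be sketched as follows. The names `move_mapping` and `COMPOSE` are hypothetical, and so is the way the new relation is derived: the table only lists the combinations that follow from transitivity of subsumption (e.g., if $c_{cand1}$ is more specific than the old source and the old source is equivalent to the target, then $c_{cand1} \le c_t$). It is a sketch under these assumptions, not the published adaptation procedure.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Mapping:
    source: str
    target: str
    relation: str  # assumed: "≡", "≤" (source ⊑ target) or "≥" (source ⊒ target)


# New relation (candidate vs. target) from the semantic CP (candidate vs. old
# source) and the old relation (old source vs. target). Only combinations that
# follow logically are listed; anything else is left for manual review.
COMPOSE = {
    ("EQV", "≡"): "≡", ("EQV", "≤"): "≤", ("EQV", "≥"): "≥",
    ("MSP", "≡"): "≤", ("MSP", "≤"): "≤",
    ("LSP", "≡"): "≥", ("LSP", "≥"): "≥",
}


def move_mapping(m: Mapping, candidates: dict[str, tuple[str, str]]):
    """Hypothetical MoveM(m_st, c_cand1) heuristic.

    `candidates` maps each context concept to its (lexical CP, semantic CP)
    with respect to the old source concept. The mapping is moved only when
    exactly one candidate shows a Total Transfer (TT).
    """
    transfers = [c for c, (lexical, _) in candidates.items() if lexical == "TT"]
    if len(transfers) != 1:
        return None  # no candidate or several: leave for manual review
    cand = transfers[0]
    relation = COMPOSE.get((candidates[cand][1], m.relation))
    if relation is None:
        return None
    return replace(m, source=cand, relation=relation)


m_st = Mapping("cs0", "ct", "≡")
print(move_mapping(m_st, {"ccand1": ("TT", "MSP"), "ccand2": ("PC", "PTM")}))
# Mapping(source='ccand1', target='ct', relation='≤')
```

Returning None instead of guessing keeps ambiguous cases visible for a curator, which matches the point that fully automatic maintenance is not always possible.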


CONCEPT DRIFT FOR MAPPING ADAPTATION

Lessons learned

• Concept drift has a huge impact on ontology mappings, but some changes in concepts do not affect mappings
• The drift of attribute values governs the mapping adaptation process
• In most cases, concept drift results in local changes
  • Changes in super concepts, sub concepts and siblings
• Considering ontology versions alone is not enough to characterize concept drift
  • External background knowledge is needed to better determine the semantic relationship between versions of a concept
  • Cf. semantic annotation adaptation
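Checking whether a change is local amounts to computing CONTEXT = SUP ∪ SUB ∪ SIB in the is-a hierarchy. The sketch assumes the hierarchy is given as a dictionary from each concept to its direct super concepts and only looks at direct neighbours; the toy hierarchy and the helper names are hypothetical.

```python
def context(concept: str, parents: dict[str, set[str]]) -> set[str]:
    """CONTEXT = SUP ∪ SUB ∪ SIB, restricted to direct neighbours (an assumption).

    `parents` maps every concept to the set of its direct super concepts (is-a).
    """
    sup = set(parents.get(concept, set()))
    sub = {c for c, ps in parents.items() if concept in ps}
    sib = {c for c, ps in parents.items() if c != concept and ps & sup}
    return sup | sub | sib


def is_local_change(concept: str, changed: set[str], parents: dict[str, set[str]]) -> bool:
    """True if every changed concept is the concept itself or lies in its context."""
    return changed <= context(concept, parents) | {concept}


# Toy is-a hierarchy, for illustration only.
parents = {
    "Disorder": set(),
    "Bowel disease": {"Disorder"},
    "Inflammatory bowel disease": {"Bowel disease"},
    "Crohn disease": {"Inflammatory bowel disease"},
    "Fecal impaction": {"Bowel disease"},
}
print(sorted(context("Inflammatory bowel disease", parents)))
# ['Bowel disease', 'Crohn disease', 'Fecal impaction']
```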


THE CASE OF SEMANTIC ANNOTATION ADAPTATION

www.elisa-project.lu


SEMANTIC ANNOTATION ADAPTATION

Problem


METHODOLOGY

Impact of concept drift on semantic annotations


RESULTS



CONCEPT DRIFT FOR ANNOTATIONS

Use of external knowledge sources

• A concept may have labels before and after evolution that are disjoint from a syntactic or lexical point of view
  • Ex: "Cancer" vs. "Malignant neoplasm"
• Lexical and semantic change patterns cannot be applied
• External knowledge sources are required to characterize the evolution of concepts in such situations
• We propose a method exploiting BioPortal to overcome this limitation
  • Ontologies
  • Mappings
• The method is able to find the semantic relationship between two versions of the same concept
  • Equivalent, less specific, more specific, unrelated, partially matched
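Once both versions of a concept are located in a common ontology, part of the relationship can be read from that ontology's is-a hierarchy. The sketch below is an assumption about how such a step could look, not the proposed method: it covers equivalent, more specific, less specific and unrelated, reports siblings separately because how siblings are graded is not stated here, and omits partial matches. The hierarchy and names are hypothetical.

```python
def ancestors(concept: str, parents: dict[str, set[str]]) -> set[str]:
    """All transitive super concepts of `concept` (parents: concept -> direct supers)."""
    seen, stack = set(), list(parents.get(concept, set()))
    while stack:
        p = stack.pop()
        if p not in seen:
            seen.add(p)
            stack.extend(parents.get(p, set()))
    return seen


def relation_between_versions(c_old: str, c_new: str, parents: dict[str, set[str]]) -> str:
    """Relation of the new version w.r.t. the old one, read from a common ontology."""
    if c_old == c_new:
        return "equivalent"
    if c_old in ancestors(c_new, parents):
        return "more specific"   # the new version lies below the old one
    if c_new in ancestors(c_old, parents):
        return "less specific"
    if parents.get(c_old, set()) & parents.get(c_new, set()):
        return "siblings"        # grading of siblings is left open in this sketch
    return "unrelated"


# Toy hierarchy with abstract concept identifiers.
parents = {"A": set(), "B": {"A"}, "C": {"A"}, "D": {"B"}}
print(relation_between_versions("B", "D", parents))  # more specific
```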


USE OF EXTERNAL KNOWLEDGE SOURCE

Example: "Pituitary dwarfism" (MeSH) vs. "Pituitary dwarfism II" (MeSH)

1. Search in ontologies (direct method)
   • "Pituitary dwarfism" is found in SNOMED CT, ICD9CM, MEDDRA, NCIT, DOID, RCD, HP, DERMLEX, NATPRO, CRISP, SOPHARM, BDO, SNMI
   • "Pituitary dwarfism II" is found in OMIM, NDFRT
   • No common ontologies
2. Use mappings (indirect method)
   • 15 mappings available (OMIM ontology)
   • "Pituitary dwarfism II" (OMIM) mapped_to "Laron-type isolated somatotropin defect" (SNOMED CT)
   • SNOMED CT is the common ontology
3. "Laron-type isolated somatotropin defect" and "Pituitary dwarfism" have the same super concept ("short stature disorder"): they are siblings
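The three steps of the example can be replayed on in-memory toy data transcribed from the slide. A real implementation would query BioPortal for ontology search results and mappings; that access is deliberately not shown, and `find_common_ontology` and the dictionaries are hypothetical names. Only one of the 15 OMIM mappings is included.

```python
# Toy data transcribed from the example above (not fetched from BioPortal).
FOUND_IN = {
    "Pituitary dwarfism": {"SNOMED CT", "ICD9CM", "MEDDRA", "NCIT", "DOID", "RCD",
                           "HP", "DERMLEX", "NATPRO", "CRISP", "SOPHARM", "BDO", "SNMI"},
    "Pituitary dwarfism II": {"OMIM", "NDFRT"},
}
# (label, ontology) -> [(mapped label, ontology)]; 1 of the 15 OMIM mappings.
MAPPINGS = {
    ("Pituitary dwarfism II", "OMIM"): [("Laron-type isolated somatotropin defect", "SNOMED CT")],
}
# Direct super concepts in SNOMED CT, as stated on the slide.
SNOMED_PARENTS = {
    "Pituitary dwarfism": {"short stature disorder"},
    "Laron-type isolated somatotropin defect": {"short stature disorder"},
}


def find_common_ontology(label_a: str, label_b: str):
    """Step 1 (direct): an ontology containing both labels.
    Step 2 (indirect): follow the mappings of one label into an ontology
    that contains the other label.
    Returns (method, ontology, label to compare, other label) or None."""
    direct = FOUND_IN.get(label_a, set()) & FOUND_IN.get(label_b, set())
    if direct:
        return ("direct", sorted(direct)[0], label_a, label_b)
    for x, y in ((label_a, label_b), (label_b, label_a)):
        for onto in sorted(FOUND_IN.get(x, set())):
            for mapped, target in MAPPINGS.get((x, onto), []):
                if target in FOUND_IN.get(y, set()):
                    return ("indirect", target, mapped, y)
    return None


result = find_common_ontology("Pituitary dwarfism", "Pituitary dwarfism II")
print(result)
# ('indirect', 'SNOMED CT', 'Laron-type isolated somatotropin defect', 'Pituitary dwarfism')

# Step 3: compare positions in the common ontology.
_, _, c1, c2 = result
print("siblings" if SNOMED_PARENTS[c1] & SNOMED_PARENTS[c2] else "not siblings")  # siblings
```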


CONCEPT DRIFT IN ANNOTATION ADAPTATION

Lessons learned (so far …)

• Ontology regions do not evolve in the same way
  • Unstable regions → handle with care
  • Interesting for predicting concept drift
• Concept drift has a different impact on different annotation tools
  • GATE
  • NCBO Annotator
• Background knowledge gives promising results for characterizing concept drift
  • BioPortal ontologies
  • RDF datasets and Web data under investigation
• Will machine learning help in understanding concept drift?
  • Identification of relevant features
  • Which ML techniques should be used?


DISCUSSION

Concept drift for LOD

• Linked Open Data requires vocabularies for semantic interoperability
• LOD for characterizing concept drift
  • The quality of LOD is problematic
  • Some datasets rely on outdated vocabularies
• Concept drift impacting LOD
  • FOAF and DC are not as dynamic as domain ontologies
  • No control over the datasets that use controlled vocabularies

→ How can changes observed in a vocabulary be propagated to RDF datasets?
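A naive starting point for the open question above, assuming triples are plain (subject, predicate, object) string tuples and the vocabulary change is available as a one-to-one rename table plus a set of deprecated terms. The `example.org` URIs and the `propagate` helper are hypothetical; complex changes such as splits and merges cannot be resolved by a lookup table, so such triples are only flagged for review.

```python
VOC = "http://example.org/vocab/"   # hypothetical vocabulary namespace
DATA = "http://example.org/data/"   # hypothetical dataset namespace

Triple = tuple[str, str, str]


def propagate(triples: list[Triple], replaced_by: dict[str, str],
              deprecated: set[str]) -> tuple[list[Triple], list[Triple]]:
    """Rewrite terms renamed in a new vocabulary version and flag triples that
    still use terms deprecated without a one-to-one replacement."""
    updated, flagged = [], []
    for triple in triples:
        new = tuple(replaced_by.get(term, term) for term in triple)
        updated.append(new)
        if any(term in deprecated for term in new):
            flagged.append(new)
    return updated, flagged


triples = [
    (DATA + "patient1", VOC + "diagnosis", VOC + "Malignancy"),
    (DATA + "patient2", VOC + "diagnosis", VOC + "SplitConcept"),
]
updated, flagged = propagate(
    triples,
    replaced_by={VOC + "Malignancy": VOC + "MalignantNeoplasm"},
    deprecated={VOC + "SplitConcept"},
)
print(updated[0][2])  # http://example.org/vocab/MalignantNeoplasm
print(len(flagged))   # 1
```

Since publishers of third-party datasets are outside one's control, output like this would more realistically be offered as a suggested patch than applied in place.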


COLLABORATORS

• Silvio Cardoso
• Dr. Marcos Da Silveira
• Dr. Duy Dinh
• Dr. Julio Dos Reis
• Dr. Anika Gross
• Prof. Erhard Rahm
• Prof. Chantal Reynaud-Delaître
• And all the others …


REFERENCES

M. Da Silveira, J. C. Dos Reis, C. Pruski, Management of Dynamic Biomedical Terminologies: Current Status and Future Challenges, IMIA Yearbook of Medical Informatics, 10(1), 125-133, 2015.

J. C. Dos Reis, D. Dinh, M. Da Silveira, C. Pruski, C. Reynaud-Delaître, Recognizing lexical and semantic change patterns in evolving life science ontologies to inform mapping adaptation, Artificial Intelligence in Medicine, 63(3), 153-170, 2015. DOI: http://dx.doi.org/10.1016/j.artmed.2014.11.002

J. C. Dos Reis, C. Pruski, M. Da Silveira, C. Reynaud-Delaître, Understanding semantic mapping evolution by observing changes in biomedical ontologies, Journal of Biomedical Informatics, 47, 71-82, 2014.

S. D. Cardoso, C. Pruski, M. Da Silveira, Y.-C. Lin, A. Gross, E. Rahm, C. Reynaud-Delaître, Leveraging the Impact of Ontology Evolution on Semantic Annotations, Knowledge Engineering and Knowledge Management, 20th International Conference (EKAW 2016), Bologna, Italy, November 19-23, 2016.

J. C. Dos Reis, C. Pruski, M. Da Silveira, C. Reynaud-Delaître, Characterizing Semantic Mappings Adaptation via Biomedical KOS Evolution: A Case Study Investigating SNOMED CT and ICD, AMIA 2013 Annual Symposium, Washington DC (USA), 2013.

J. C. Dos Reis, D. Dinh, C. Pruski, M. Da Silveira, C. Reynaud-Delaître, Mapping Adaptation Actions for the Automatic Reconciliation of Dynamic Ontologies, ACM International Conference on Information and Knowledge Management (CIKM 2013), San Francisco, CA (USA), 2013.

J. C. Dos Reis, D. Dinh, C. Pruski, M. Da Silveira, C. Reynaud-Delaître, The influence of similarity between concepts in evolving biomedical ontologies for mapping adaptation, European Medical Informatics Conference (MIE), 31/08 - 03/09, Istanbul, Turkey, 2014.

J. C. Dos Reis, D. Dinh, C. Pruski, M. Da Silveira, C. Reynaud-Delaître, Identifying change patterns of concept attributes in ontology evolution, Proc. of the 11th ESWC, Anissaras, Crete (Greece), 2014.

C. Pruski, J. C. Dos Reis, M. Da Silveira, Capturing the relationship between evolving biomedical concepts via background knowledge, 9th International SWAT4LS Conference, Amsterdam, 2016.