1
Topic: Identifying the Data Schema behind SNOMED CT
Jon Patrick, Centre for Health Informatics Research & Development, University of Sydney
Ming Zhang, Donna Truran, National Centre for Classification in Health
2
Outline
Project description
Research methodology
Experiments and Results
Conclusion
Limitation
Recommendation for future work
3
Project Description
Project background
SNOMED CT: the core content is stored in simple tables
Project objective
To discover the conceptual model of SNOMED CT by reverse engineering
4
Research methodology
Data preparation
Transfer the SNOMED CT core content tables from text files into an RDBMS (MySQL)
Ontology structure investigation
Database querying: explicit characteristics
Programming: implicit characteristics
Data modelling
Analyse the different characteristics and features so as to generate the conceptual data model
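The data preparation step can be sketched as follows. This is a minimal illustration only: it uses Python's built-in sqlite3 as a stand-in for MySQL, and the column names and the two-row sample are assumptions made for the demo rather than the actual SNOMED CT file layout.

```python
import csv
import io
import sqlite3

# Illustrative tab-delimited extract; the column names are assumptions,
# not the real SNOMED CT release layout.
SAMPLE = (
    "concept_id\tstatus\tname\tis_primitive\n"
    "16953009\t0\telbow joint structure\t1\n"
    "182353008\t0\tside\t1\n"
)

def load_concepts(conn, text_file):
    """Create a concepts table and bulk-insert rows from a tab-delimited file."""
    conn.execute(
        "CREATE TABLE concepts ("
        "concept_id INTEGER PRIMARY KEY, status INTEGER, "
        "name TEXT, is_primitive INTEGER)"
    )
    reader = csv.DictReader(text_file, delimiter="\t")
    rows = [
        (int(r["concept_id"]), int(r["status"]), r["name"], int(r["is_primitive"]))
        for r in reader
    ]
    conn.executemany("INSERT INTO concepts VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)

# sqlite3 in-memory database stands in for the MySQL server used in the project.
conn = sqlite3.connect(":memory:")
loaded = load_concepts(conn, io.StringIO(SAMPLE))
name = conn.execute(
    "SELECT name FROM concepts WHERE concept_id = ?", (16953009,)
).fetchone()[0]
print(loaded, name)
```

With a real release file, the io.StringIO object would be replaced by an open file handle, and the same pattern would be repeated for the descriptions and relationships tables.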
5
Experiment and Result
Explicit characteristics of the ontology
Original data overview
Fully defined and primitive
Relationship types
Hierarchy structure
Multiple inheritance
Full structure
Implicit characteristics of the ontology
Classification principles
Relationship patterns
6
Original Data model
3 data tables:
Concepts: one clinical idea is recorded as one concept
Example row: 16953009 0 elbow joint structure Xa1q8 T15430 1
Descriptions: one clinical idea can have more than one description in this table
Example row: 28696014 0 16953009 elbow joint 0 2 en
Relationships: each row represents a relationship between two concepts
Example row: 711822028 16953009 272741003 182353008 1 2 0
(272741003 is the relationship type "laterality"; 182353008 is the concept "side")
7
Fully defined and primitive concepts
Primitive concepts
A concept is primitive if its defining characteristics are insufficient to define it, that is, it has more content than is indicated by its attributes and relationships, e.g. clinical finding.
Fully defined concepts
A concept is fully defined if its defining characteristics are sufficient to define it.
"Sufficient" and "insufficient" are determined by SNOMED experts.
Currently 41244 (11%) concepts are fully defined.
8
Relationship types
A relationship type names the relationship between two concepts; for example, "laterality" is a relationship type.
According to the statistics there are 1.4 million relationship records.
62 relationship types are currently used to represent the relationships between two concepts.
Example row: 711822028 16953009 272741003 182353008 1 2 0
(272741003 = laterality, 182353008 = side)
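The "database querying" route to these statistics can be illustrated with a GROUP BY over the relationships table. The sketch below again uses sqlite3 in place of MySQL; the column names are assumptions, and only the first row comes from the slide (the other two are made up for the demo).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE relationships ("
    "relationship_id INTEGER PRIMARY KEY, source_id INTEGER, "
    "type_id INTEGER, target_id INTEGER)"
)
# First row is the sample shown on the slide; the other two are invented
# placeholders so that the query has something to group.
conn.executemany(
    "INSERT INTO relationships VALUES (?, ?, ?, ?)",
    [
        (711822028, 16953009, 272741003, 182353008),
        (1, 100, 272741003, 182353008),
        (2, 100, 200, 300),
    ],
)

# Number of records per relationship type, most frequent first.
type_counts = conn.execute(
    "SELECT type_id, COUNT(*) FROM relationships "
    "GROUP BY type_id ORDER BY COUNT(*) DESC, type_id"
).fetchall()
distinct_types = len(type_counts)
print(distinct_types, type_counts)
```

Run against the full table, the length of the result would give the number of relationship types in use and the per-type counts would sum to the total number of relationship records.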
9
Relationship types
Time aspect; Access instrument; Laterality; Revision status
WAS A; Has specimen; Interprets; Procedure context
Indirect device; After; MAY BE A; Associated with
Measurement method; Has focus; Has active ingredient; Due to
Specimen source identity; Approach; Causative agent; Specimen source topography
Scale type; REPLACED BY; SAME AS; Associated procedure
Specimen source morphology; Using; Access; Has intent
Property; Has dose form; Procedure site; Associated finding
Recipient category; Direct device; Part of; Direct morphology
Procedure morphology; Finding context; Priority; Has definitional manifestation
Specimen substance; Procedure site - Direct; Method; Occurrence
Pathological process; Has interpretation; Associated morphology; Component
Procedure device; Direct substance; Episodicity; Onset
Indirect morphology; Procedure site - Indirect; Severity; Is a
Specimen procedure; Temporal context; Course
MOVED TO; Subject relationship context; Finding site
10
Hierarchy structure
In the collection of relationship types, "IS_A" represents the hierarchical relationship.
485,335 records in the relationships table store the hierarchical information of SNOMED CT.
The main hierarchical features:
Root level (no parents): one root, "SNOMED CT CONCEPT"
Middle node level (has parents and children): 80895 (22%) concepts; 25687 nodes have only 1 child
Leaf node level (no children): 285283 (78%) concepts
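The three levels above follow directly from the IS_A records: a root appears only as a parent, a leaf only as a child, and a middle node as both. A minimal sketch, assuming the IS_A rows have already been reduced to (child, parent) pairs; the toy edges reuse concept names from the slides but are illustrative only.

```python
def classify_nodes(is_a_edges):
    """Split concepts into root, middle and leaf sets from (child, parent) pairs."""
    children = {child for child, _ in is_a_edges}   # concepts that have a parent
    parents = {parent for _, parent in is_a_edges}  # concepts that have a child
    roots = parents - children    # no parents
    middle = parents & children   # both parents and children
    leaves = children - parents   # no children
    return roots, middle, leaves

# Toy hierarchy for the demo; not actual SNOMED CT content.
EDGES = [
    ("Clinical finding", "SNOMED CT CONCEPT"),
    ("Disease", "Clinical finding"),
    ("Infective pneumonia", "Disease"),
    ("Bacterial infectious disease", "Disease"),
    ("Bacterial pneumonia", "Infective pneumonia"),
    ("Bacterial pneumonia", "Bacterial infectious disease"),
]
roots, middle, leaves = classify_nodes(EDGES)
print(len(roots), len(middle), len(leaves))
```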
11
Multiple inheritance
One concept in SNOMED CT may have many children and many parents.
Distribution of the number of children (chart data)
Number of children    Number of concepts
1      25687
2      15213
3      9678
4      6621
5      4768
6      3311
7      2530
8      1901
9      1545
10     1626
>10    8016
12
Multiple inheritance
Distribution of the number of parents (chart data)
Number of parents    Number of concepts
1      282775
2      59910
3      15711
4      4979
5      1804
>5     999
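Distributions of this kind can be derived by counting, for every concept, how many IS_A records it appears in as a child (its number of parents) or as a parent (its number of children). The following is a sketch under the assumption that the IS_A rows are available as (child, parent) pairs; the function name and toy edges are hypothetical.

```python
from collections import Counter

def count_distribution(is_a_edges, position, cap):
    """Histogram of how many IS_A edges each concept has in the given position.

    position=0 counts parents per child; position=1 counts children per parent.
    Counts above `cap` are pooled into a single bucket labelled f">{cap}".
    """
    per_concept = Counter(edge[position] for edge in is_a_edges)
    histogram = Counter()
    for n in per_concept.values():
        histogram[n if n <= cap else f">{cap}"] += 1
    return dict(histogram)

# Toy (child, parent) pairs; illustrative only.
EDGES = [
    ("Clinical finding", "SNOMED CT CONCEPT"),
    ("Disease", "Clinical finding"),
    ("Infective pneumonia", "Disease"),
    ("Bacterial infectious disease", "Disease"),
    ("Bacterial pneumonia", "Infective pneumonia"),
    ("Bacterial pneumonia", "Bacterial infectious disease"),
]
parents_hist = count_distribution(EDGES, position=0, cap=5)
children_hist = count_distribution(EDGES, position=1, cap=10)
print(parents_hist, children_hist)
```

The caps of 5 and 10 mirror the ">5" and ">10" buckets used in the two charts.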
13
Hierarchy structure - example
[Figure: example hierarchy showing the root, middle nodes, a leaf, and a concept with multiple parents]
14
Full structure
[Figure: full structure around "Bacterial pneumonia". Concept labels: Bacterial pneumonia, Infective pneumonia, Bacterial infectious disease, Disease, Sudden onset, courses, Episodicities, bacteria, structure of interstitial tissue of lung. Relationship labels: Causative agent, Finding site, onset, course, episodicity.]
15
Experiment and Result
Explicit characteristics of the ontology
Original data overview
Fully defined and primitive
Relationship types
Hierarchy structure
Multiple inheritance
Full structure
Implicit characteristics of the ontology
Classification principles
Relationship patterns
16
Classification principle
Top level categories: the 18 direct children of the root
Each concept belongs to only one top level category
So all concepts in SNOMED CT can be divided into 18 groups
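One way to check the "only one top level category" claim programmatically is to walk up the IS_A links from each concept until a direct child of the root is reached. The sketch below is an assumption about how this could be done, not the authors' program; it expects an acyclic hierarchy given as (child, parent) pairs, and the toy edges are illustrative.

```python
def top_level_categories(is_a_edges, root):
    """Map each concept to the set of root children it descends from."""
    parents_of = {}
    for child, parent in is_a_edges:
        parents_of.setdefault(child, set()).add(parent)

    cache = {}

    def resolve(concept):
        # Memoised upward walk; assumes the IS_A graph has no cycles.
        if concept in cache:
            return cache[concept]
        found = set()
        for parent in parents_of.get(concept, ()):
            if parent == root:
                found.add(concept)  # the concept is itself a top level category
            else:
                found |= resolve(parent)
        cache[concept] = found
        return found

    return {concept: resolve(concept) for concept in parents_of}

# Toy hierarchy; not actual SNOMED CT content.
EDGES = [
    ("Clinical finding", "SNOMED CT CONCEPT"),
    ("Body structure", "SNOMED CT CONCEPT"),
    ("Disease", "Clinical finding"),
    ("Infective pneumonia", "Disease"),
    ("Bacterial infectious disease", "Disease"),
    ("Bacterial pneumonia", "Infective pneumonia"),
    ("Bacterial pneumonia", "Bacterial infectious disease"),
]
categories = top_level_categories(EDGES, root="SNOMED CT CONCEPT")
single_category = all(len(cats) == 1 for cats in categories.values())
print(categories["Bacterial pneumonia"], single_category)
```

If every resulting set has exactly one member, the concepts partition cleanly into the top level groups even where multiple inheritance occurs lower down.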
17
Implicit
Top level category    Number of concepts
Physical force 200
Specimen 1044
Staging and scales 1108
Linkage concept 1129
Events 1642
Environments and geographical locations 1666
Physical object 4355
Social context 5188
Context-dependent categories 6836
Observable entity 7568
Qualifier value 8266
Pharmaceutical / biologic product 19639
Substance 23022
Organism 26134
Body structure 31760
Procedure 52741
Special concept 62014
Clinical finding 111866
18
Relationship patterns
Relationship table
Relationship ID    Source concept    Relationship type    Target concept
582896029    254837009    363698007    76752008 ….
Concept names: Breast cancer, Finding site, Breast structure
Top categories: Clinical finding (source), Body structure (target)
Pattern: Clinical finding, Finding site, Body structure
A pattern is the specific relationship type between any two top categories.
19
Relationship patterns
Pattern: {C1, type, C2}
C1 is one of the 18 top categories
type is one of the 62 relationship types
C2 is one of the 18 top categories
There are 18 x 62 x 18 = 20088 possible patterns.
Each of the 1.4 million relationship records matches one pattern.
To avoid ambiguity, the scope of this study covers only "active" concepts.
The results show that only 78 patterns have instances in the relationship table.
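Pattern discovery amounts to replacing the source and target concept of each relationship record with its top category and counting the distinct triples that result. A minimal sketch, assuming a precomputed concept-to-category mapping; the function name, the mapping and the three relationship triples are illustrative, not taken from the real tables.

```python
from collections import Counter

def count_patterns(relationships, category_of):
    """Count {C1, type, C2} patterns over (source, type, target) triples."""
    patterns = Counter()
    for source, rel_type, target in relationships:
        patterns[(category_of[source], rel_type, category_of[target])] += 1
    return patterns

# Hypothetical concept-to-top-category mapping for the demo.
CATEGORY_OF = {
    "Breast cancer": "Clinical finding",
    "Breast structure": "Body structure",
    "Bacterial pneumonia": "Clinical finding",
    "structure of interstitial tissue of lung": "Body structure",
    "bacteria": "Organism",
}
RELATIONSHIPS = [
    ("Breast cancer", "Finding site", "Breast structure"),
    ("Bacterial pneumonia", "Finding site", "structure of interstitial tissue of lung"),
    ("Bacterial pneumonia", "Causative agent", "bacteria"),
]

patterns = count_patterns(RELATIONSHIPS, CATEGORY_OF)
possible = 18 * 62 * 18  # upper bound on the number of patterns
print(len(patterns), possible)
```

Here three records collapse into two observed patterns, which is the same kind of reduction that takes the 1.4 million records down to a small set of patterns.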
20
Data modelling based on patterns
For example: to find the relationships between "clinical finding" and the other top categories.
Clinical finding (finding) Causative agent (attribute) Pharmaceutical / biologic product (product)
Clinical finding (finding) Course (attribute) Qualifier value (qualifier value)
Clinical finding (finding) Due to (attribute) Clinical finding (finding)
Clinical finding (finding) Episodicity (attribute) Qualifier value (qualifier value)
Clinical finding (finding) Finding site (attribute) Body structure (body structure)
…………………..
Clinical finding (finding) Has definitional manifestation (attribute) Clinical finding (finding)
21
Conceptual Data Model
[Figure: Conceptual Model for SNOMED CT (marked "Confidential -- Draft"). Jon Patrick & Ming Zhang, School of Information Technologies, University of Sydney, 01/02/2006. Node labels: SNOMED CT CONCEPT (with an ISA label) and the 18 top level categories. Edge labels are relationship types, including Causative agent, Finding site, Due to, After, Associated with, Interprets, Procedure site, Has active ingredient, Has dose form and Part of.]
22
Future Work
Design a method of defining real-world constraints over the relationships, e.g. suicide can have slow onset.
Develop storage and maintenance procedures for managing the data, e.g. there is no constraint over the data model as it exists at the moment.
Design a terminology server to deliver SCT to vendors.
Work with vendors to define a transport mechanism for vendors to be able to install SCT.
Create Internet access to SCT content for ad hoc users.
Start working on systems that demonstrate the value of SCT for clinical and administrative work.