Experience with Using the UMLS Semantic Network to
Coordinate Controlled Terminologies for a Large Clinical Data Repository
James J. Cimino
Department of Biomedical Informatics
Columbia University College of Physicians and Surgeons
National Library of Medicine, April 8, 2005
Overview
• Background
• History
• General principles
• Empiric observations: Semantic Network in the Medical Entities Dictionary
• Lessons to be learned
Clinical Data Architecture
• Central repository to collect data from myriad sources
• Myriad users of data - some not yet imagined
New York Presbyterian Hospital
Clinical Information Systems Architecture
[Figure: architecture diagram. Labels: Clinical Database, Medical Entities Dictionary, Database Monitor, Medical Logic Modules, Database Interface, Research, Administrative, Alerts & Reminders, Results Review. Source systems shown: Radiology, Laboratory, Discharge Summaries, and others (". . ."), each with a Reformatter.]
Entity-Attribute-Value Clinical Data Repository
Clinical Data Architecture
• Central repository to collect data from myriad sources
• Myriad users of data - some not yet imagined
• Patient-oriented, not visit-oriented, database
• Relational, not hierarchical, model
• Entity-attribute-value model
• Coded data wherever possible
• Unify terminology
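A minimal sketch of what a patient-oriented, entity-attribute-value table holding coded data might look like, using Python's built-in sqlite3 module. The table name, column names, and numeric codes are hypothetical and are not taken from the New York Presbyterian schema.

```python
import sqlite3

def build_repository():
    """Create an in-memory EAV table; every column name here is an assumption."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE observation (
               patient_id     INTEGER NOT NULL,  -- rows are keyed by patient, not visit
               entity_code    INTEGER NOT NULL,  -- coded entity (e.g. a test), hypothetical code
               attribute_code INTEGER NOT NULL,  -- coded attribute (e.g. numeric result)
               value          TEXT,
               observed_at    TEXT NOT NULL
           )"""
    )
    return conn

def record(conn, patient_id, entity_code, attribute_code, value, observed_at):
    """Store one entity-attribute-value fact for a patient."""
    conn.execute(
        "INSERT INTO observation VALUES (?, ?, ?, ?, ?)",
        (patient_id, entity_code, attribute_code, value, observed_at),
    )

def patient_history(conn, patient_id):
    """Return all facts for one patient in time order, regardless of source system."""
    cur = conn.execute(
        "SELECT observed_at, entity_code, attribute_code, value "
        "FROM observation WHERE patient_id = ? ORDER BY observed_at",
        (patient_id,),
    )
    return cur.fetchall()
```

Because new kinds of data become new codes rather than new columns, a design along these lines can accept sources that were not anticipated when the table was created.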
Medical Entities Dictionary: A Central Terminology Repository
MED Structure
[Figure: fragment of the MED semantic network. Classes: Medical Entity, Substance, Chemical, Carbohydrate, Glucose, Bioactive Substance, Anatomic Substance, Plasma, Laboratory Specimen, Plasma Specimen, Event, Diagnostic Procedure, Laboratory Procedure, Laboratory Test, CHEM-7, Plasma Glucose Test. Semantic links: Substance Sampled, Substance Measured, Has Specimen, Part of.]
Communicating Terminology Changes
[Figure: results reported under three separate test codes, K#1, K#2, and K#3. Values shown: K#1 = 4.2, K#1 = 3.3, K#1 = 3.0, K#2 = 3.2, K#3 = 2.6.]
Solution: Hierarchical Integration
[Figure: K#1, K#2, and K#3 shown together with a single class K. Value shown: K#3 = 2.6.]
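The idea behind hierarchical integration can be sketched as follows, assuming that K#1, K#2, and K#3 are each placed under the class K so that a query phrased in terms of K keeps working when a new code is introduced. The function names, the dictionary layout, and the unrelated code "NA#1" are hypothetical; the K values are the ones shown on the slides.

```python
# Child code -> parent class. The arrangement under "K" is assumed from the slide title.
PARENT_OF = {"K#1": "K", "K#2": "K", "K#3": "K", "NA#1": "NA"}

# (code, value) pairs; the K values appear on the slides, the NA#1 row is invented.
RESULTS = [
    ("K#1", 4.2),
    ("K#1", 3.3),
    ("K#2", 3.2),
    ("K#1", 3.0),
    ("K#3", 2.6),
    ("NA#1", 140.0),
]

def is_a(code, cls, parent_of):
    """True if `code` equals `cls` or has `cls` somewhere above it."""
    while code is not None:
        if code == cls:
            return True
        code = parent_of.get(code)
    return False

def results_for_class(results, cls, parent_of):
    """Select every result whose code falls under the requested class."""
    return [(code, value) for code, value in results if is_a(code, cls, parent_of)]
```

Asking for `results_for_class(RESULTS, "K", PARENT_OF)` returns the results stored under all three codes, so an application written against K does not need to be told when K#3 is added.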
Use of the UMLS in Patient Care
James J. Cimino, M.D.
Center for Medical Informatics
Columbia University
Mont Pelerin, Switzerland 1994
UMLS Semantic Network
• Strict hierarchy
• Semantic types: 132 (135)
• Semantic relations: 46 (53)
• Inheritance of relations: 6233 (6700)
UMLS Metathesaurus
• Terms from 22 (100+) controlled vocabularies
• Total source terms: 311,046
• Total strings: 279,237 (5,000,000)
• Total concepts: 152,444 (1,000,000)
• Relationships: 1,484,994 (16,000,000)
Medical Entities Dictionary
• Semantic Network
• Sources: 5
• Strings: 108,492
• Concepts: 35,281
• Semantic relations: 23 pairs
• Semantic Links: 145,672
Comparisons - Methods
• CPMC Entities vs. UMLS Semantic Types
• MED Classes vs. UMLS Semantic Types
• MED Semantic Links vs. UMLS Semantic Relations
• MED Concepts vs. Metathesaurus Concepts
• MED Semantic Links vs. Meta Relations
Comparisons - Results
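[Table: CPMC items (DB Entities, Classes, Links, Concepts) compared against UMLS items (Types, Relations, Concepts, Meta Links). Ratings in the order they appear on the slide: ++++, +++, +/-, ++, +++.]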
Summary
• Semantic Types provide good coverage
• Concepts provide good coverage in certain domains
• No technical reason why UMLS could not incorporate clinical vocabulary
Where We Are Today - Repository
• Patients: 2.6 million
• Visits: >10 million since 1996, with archives going back to 1979
• Visit diagnoses, locations, procedures, providers, insurance
• Lab procedures: 16 million with 130 million results (to 1989)
• Radiology procedure reports: 5.7 million
• Pathology: 1.4 million
• Cardiology procedures: 1.5 million
• Resident signout notes: 760,000
• Operative Notes: 426,000
• Clinical Notes: 400,000
• Discharge Summaries: 420,000
• Medication orders: >60 million
• ObGyn Procedure Reports: 241,000
• GI Procedure Reports: 101,000
• Neurology Procedure Reports: 54,000
• Ideatel BPs: 215,000
• Ideatel Glucose: 650,000
• Consult Events: 18,000
• HEENT Events: 13,000
• Hospitalist Notes: 30,000
• PFT: 25,000
• Provider profiles: 11,000
• IDX: 1.4 million
• East Campus
Where We Are Today - MED
• Domains: 7++ (5)
– HP lab terms
– Misys lab terms
– Cerner lab terms
– Misys Radiology
– Digimedix drugs
– Cerner Drugs
– ICD9-based problem list terms
– Other applications
– Knowledge terms
• Size:
– Concept-based: 95,641 (35,281)
– Multiple hierarchy: 141,306
– Synonyms: 239,581 (108,492)
– Translations: 141,717
– Semantic link pairs: 52 (23)
– Semantic links: 225,698 (145,672)
– Attributes: 210,456
What does this have to do with the SN?
• MED was initially based on UMLS design (creationism)
• UMLS SN was the “starter set”
• MED is “local UMLS” for CPMC
• General principles were established
• MED has developed without further conscious attention to the SN (evolution)
• MED content represents real-world terminology
• What follows are empiric observations, open to criticism; perhaps indefensible
General Principles
• Everything is a class
• Multiple hierarchy
• Some relations are definitional
• At most, one part of relation pair is definitional
• Properties introduced at single points
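A minimal sketch of how these principles might fit together: every node is a class, a class may have more than one parent, and each property is introduced at exactly one class and inherited by everything below it. The class and property names come from the slides, but the parent links chosen here, the dictionary layout, and the function names are assumptions made for illustration.

```python
from collections import Counter

# Class -> list of parents (multiple hierarchy). Parent links are assumed for the example.
PARENTS = {
    "Medical Entity": [],
    "Event": ["Medical Entity"],
    "Health Care Activity (Procedure)": ["Event"],
    "Diagnostic Procedure": ["Health Care Activity (Procedure)"],
    "Assessment Procedures": ["Medical Entity"],
    "Laboratory Diagnostic Procedure": ["Diagnostic Procedure", "Assessment Procedures"],
}

# Class -> properties introduced at that class (its "introduction point").
INTRODUCED = {
    "Medical Entity": ["NAME", "SYNONYMS"],
    "Health Care Activity (Procedure)": ["ACTION-SITE"],
    "Diagnostic Procedure": ["UNITS", "OBSERVATION-SITE"],
    "Laboratory Diagnostic Procedure": ["COLLECTED-BY"],
}

def ancestors(cls):
    """All classes reachable by following parent links upward."""
    seen = set()
    stack = list(PARENTS[cls])
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(PARENTS[current])
    return seen

def properties(cls):
    """Properties available at a class: its own plus everything inherited."""
    props = set(INTRODUCED.get(cls, []))
    for ancestor in ancestors(cls):
        props.update(INTRODUCED.get(ancestor, []))
    return props

def multiply_introduced():
    """Properties that violate the single-introduction-point principle."""
    counts = Counter(p for plist in INTRODUCED.values() for p in plist)
    return sorted(p for p, n in counts.items() if n > 1)
```

In this sketch "Laboratory Diagnostic Procedure" picks up UNITS from "Diagnostic Procedure" without UNITS being declared a second time, and `multiply_introduced()` returns an empty list as long as no property has two introduction points.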
Observations on the SN in the MED
• Arrangement of SN in MED
• Multiple hierarchy of STs
• Size of ST classes in MED (vs Meta?)
• STs as introduction points
• Intersections
UMLS Semantic Net in the MED
A: T071: Medical Entity [94729]
. A1: T072: Physical Object [5618]
. +*A1.2: T017: Anatomical Structure [577]
. A2: T077: Conceptual Entity [77861]
. *B: T051: Event [55450]
Key:
“A1.2”: UMLS tree address
“T071”: Semantic type ID
“Event”: MED name
“+”: Multiple locations
“*”: Discontinuous tree address
“[577]”: Number of MED concepts
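The notation in the key can be read mechanically. The sketch below parses one listing entry into its parts and derives the parent of a UMLS tree address; the `Entry` type, the function names, and the regular expression are hypothetical. It also assumes one reading of "discontinuous": an entry is discontinuous when the class it is listed under in the MED is not the parent implied by its own tree address.

```python
import re
from typing import NamedTuple, Optional

class Entry(NamedTuple):
    multiple: bool          # "+" marker: appears in more than one location
    discontinuous: bool     # "*" marker: tree address does not match its MED parent
    address: str            # UMLS tree address, e.g. "A1.2"
    type_id: str            # semantic type ID, e.g. "T017"
    name: str               # MED name
    count: Optional[int]    # number of MED concepts; None when shown as "[???]"

ENTRY_RE = re.compile(r"^(\+?)(\*?)([AB][0-9.]*): (T\d{3}): (.+) \[(\d+|\?+)\]$")

def parse_entry(text):
    """Parse one entry such as '+*A1.2: T017: Anatomical Structure [577]'."""
    match = ENTRY_RE.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognized entry: {text!r}")
    plus, star, address, type_id, name, count = match.groups()
    return Entry(bool(plus), bool(star), address, type_id, name,
                 int(count) if count.isdigit() else None)

def parent_address(address):
    """'A1.2.3' -> 'A1.2', 'A1.2' -> 'A1', 'A1' -> 'A', 'A' -> None."""
    if "." in address:
        return address.rsplit(".", 1)[0]
    if len(address) > 1:
        return address[0]
    return None

def is_discontinuous(address, listed_under):
    """True when the MED parent differs from the parent implied by the address."""
    return parent_address(address) != listed_under
```

Under this reading, an entry with address A1.2.3.2 listed directly beneath A1.2 is discontinuous, because its address implies the parent A1.2.3.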
UMLS Semantic Net in the MED
A: T071: Medical Entity [94729]
. A1: T072: Physical Object [5618]
. . A1.1: T001: Organism [3153]
. . . A1.1.1: T002: Plant [1]
. . . . A1.1.1.1: T003: Alga [0]
. . . A1.1.2: T004: Fungus [273]
. . . A1.1.3: T005: Virus [169]
. . . A1.1.4: T006: Rickettsia or Chlamydia [5]
. . . A1.1.5: T007: Bacterium [992]
. . . A1.1.6: T194: Archaeon [0]
. . . A1.1.7: T008: Animal [93]
. . . . A1.1.7.1: T009: Invertebrate [85]
. . . . A1.1.7.2: T010: Vertebrate [6]
. . . . . A1.1.7.2.1: T011: Amphibian [0]
. . . . . A1.1.7.2.2: T012: Bird [0]
. . . . . A1.1.7.2.3: T013: Fish [0]
. . . . . A1.1.7.2.4: T014: Reptile [0]
. . . . . A1.1.7.2.5: T015: Mammal [1]
. . . . . . A1.1.7.2.5.1: T016: Human [0]
UMLS Semantic Net in the MED
A: T071: Medical Entity [94729]
. +*A1.2: T017: Anatomical Structure [577]
. . A1.2.3: T021: Fully Formed Anatomical Structure [230]
. . . A1.2.3.1: T023: Body Part, Organ, or Organ Component [204]
. . . *A1.2.1: T018: Embryonic Structure [2]
. . . *A1.2.2: T190: Anatomical Abnormality [20]
. . . . A1.2.2.1: T019: Congenital Abnormality [0]
. . . . A1.2.2.2: T020: Acquired Abnormality [18]
. . *A1.2.3.2: T024: Tissue [66]
. . *A1.2.3.3: T025: Cell [61]
. . *A1.2.3.4: T026: Cell Component [11]
. . *A1.2.3.5: T028: Gene or Genome [0]
. . *A1.4.2: T031: Body Substance [56]
. . +*A2.1.4.1: T022: Body System [65]
. . +*A2.1.5.1: T030: Body Space or Junction [43]
. . +*A2.1.5.2: T029: Body Location or Region [117]
. . *A1.3: T073: Manufactured Object [16]
. . . A1.3.1: T074: Medical Device [6]
. . . A1.3.2: T075: Research Device [0]
. . . A1.3.3: T200: Clinical Drug [0]
. . A1.4: T167: Substance [???]
. . . A1.4.1: T103: Chemical [1942]
. . . . A1.4.1.1: T120: Chemical Viewed Functionally [1828]
. . . . . A1.4.1.1.1: T121: Pharmacologic Substance [1468]
. . . . . . +*A1.4.1.1.3.4: T127: Vitamin [20]
. . . . . . A1.4.1.1.1.1: T195: Antibiotic [130]
. . . . . A1.4.1.1.3: T123: Biologically Active Substance [530]
. . . . . . +A1.4.1.1.3.4: T127: Vitamin [20]
1: Medical Entity [T071]
    MED-CODE
    UMLS-CODE
    NAME
    SUBCLASS-OF -> SUBCLASS (1: Medical Entity [T071])
    SUBCLASS -> SUBCLASS-OF (1: Medical Entity [T071])
    SYNONYMS
    PRINT-NAME
    HAS-PARTS -> PART-OF (1: Medical Entity [T071])
    PART-OF -> HAS-PARTS (1: Medical Entity [T071])
    DEFINITION
    MAIN-MESH
    SUPPLEMENTARY-MESH
    NAME-TOKEN
    DEFAULT-SHORT-DISPLAY-NAME
    DEFAULT-DISPLAY-NAME
    SPEECH-SYNONYM
    SPEECH-SYNTHESIS-NAME
    ENTITY-(HAS-RELATED)-PAGER-NUMBER
    ENTITY-(HAS)-MEDLEE-TARGET-TERM
    HIERARCHY-SELECTOR
Property Introduction Points
7: Body System [T022]
    ACTION-SITE-OF -> ACTION-SITE (98: Health Care Activity (Procedure) [T058])
14: Anatomical Structure [T017]
    SITE-OF-PROBLEM -> HAS-PROBLEM-SITE (30007: Patient Problem)
    OBSERVATION-SITE-OF -> OBSERVATION-SITE (94: Diagnostic Procedure [T060])
43: Chemical [T103]
    PHARMACEUTIC-COMPONENT-OF -> PHARMACEUTIC-COMPONENT (28103: Pharmacy Items (Drugs and Nondrugs))
50: Measureable Entity
    MEASURED-BY-PROCEDURE -> ENTITY-MEASURED (64964: Assessment Procedures)
    LOINC-ANALYTE-NAME
76: Disease or Syndrome [C0391828]
    ETIOLOGY -> CAUSES-DISEASES (135: Etiologic Agent)
    IS-HISTORIC-DISEASE-FOR -> HISTORIC-DISEASE (56164: Factors Related to Past Disease Influencing Health Status)
Medical Properties
83: Laboratory Finding or Test Result [T034]
    RESULT-TYPE-->TESTS -> TEST-->RESULT-TYPE (94: Diagnostic Procedure [T060])
86: Finding [T033]
    FINDING-(REFERS-TO)->ORGANISM
93: Laboratory Diagnostic Procedure
    COLLECTED-BY -> COLLECTED-FOR (33023: Specimen Collection [C0200345])
94: Diagnostic Procedure [T060]
    UNITS
    TEST-->RESULT-TYPE -> RESULT-TYPE-->TESTS (83: Laboratory Finding or Test Result [T034])
    OBSERVATION-SITE -> OBSERVATION-SITE-OF (14: Anatomical Structure [T017])
    TEST-(HAS)-ABNORMAL-FLAG -> ABNORMAL-FLAG-(FOR)-TEST (77746: Abnormal Flag Value)
98: Health Care Activity (Procedure) [T058]
    PROCEDURE-(INDICATES)->PT-PROBLEM -> PT-PROBLEM-(INDICATED-BY)->PROCEDURE (30007: Patient Problem)
    ACTION-SITE -> ACTION-SITE-OF (7: Body System [T022])
Medical Properties
135: Etiologic Agent
    CAUSES-DISEASES -> ETIOLOGY (76: Disease or Syndrome [C0391828])
1181: Antibiotic Sensitivity Tests
    SENSITIVITY-ANALYTE -> SENSITIVITY-ANALYTE-OF (44440: Antibiotic or Bacterial Enzyme Inhibitor)
32291: Sampleable Entity
    SAMPLED-BY -> SYSTEM-SAMPLED (64970: Sample Entity)
    LOINC-SYSTEM-CODE
44440: Antibiotic or Bacterial Enzyme Inhibitor
    SENSITIVITY-ANALYTE-OF -> SENSITIVITY-ANALYTE (1181: Antibiotic Sensitivity Tests)
Medical Properties
59511: Clinical Repository Table
    TABLE-HAS-COLUMN -> COLUMN-IS-IN-TABLE (59512: Clinical Repository Column)
59512: Clinical Repository Column
    COLUMN-IS-IN-TABLE -> TABLE-HAS-COLUMN (59511: Clinical Repository Table)
59528: Generic Column
    COLUMN-HAS-PERMITTED-VALUES -> IS-PERMITTED-VALUE-FOR-COLUMN (67164: Verification Concept for Generic Column)
59729: Data Entry Form Component
    REPEAT-TYPE(DATA-ENTRY-COMPONENT)
    NUMBER-REPEATS(DATA-ENTRY-COMPONENT)
    REPEAT-LAYOUT-TYPE(DATA-ENTRY-COMPONENT)
59732: Form Field Allowable Values
    ALLOWABLE-VALUE-(FOR)->DATA-ENTRY-FIELD -> DATA-ENTRY-FIELD-(HAS)->ALLOWABLE-VALUE (42646: Data Entry Form Field)
Data Dictionary Properties
21762: ICD9 Element
    ICD9-CODE
    ICD9-ENTRY-CODE
    OLD-ICD9-CODE
    ICD9-NAME
23147: American Hospital Formulary Service Class
    AHFS-CLASS-CODE
28104: Drug Enforcement Administration (DEA) Controlled Substance Category
    DEA-CODE
Controlled Terminology Properties
1178: Number or String Result
    EVENT-ID-OF -> EVENT-ID (9876: CPMC Event)
    EVENT-PATIENT-ID-OF -> EVENT-PATIENT-ID (9876: CPMC Event)
    EVENT-ORGANIZATION-OF -> EVENT-ORGANIZATION (9876: CPMC Event)
    EVENT-LOCATION-OF -> EVENT-LOCATION (9876: CPMC Event)
    PARTICIPANT-ID-OF -> PARTICIPANT-ID (30352: Medical Event Participant)
9876: CPMC Event
    EVENT-ID -> EVENT-ID-OF (1178: Number or String Result)
    EVENT-DATE -> EVENT-DATE-OF (30349: Date Result)
    EVENT-PATIENT-ID -> EVENT-PATIENT-ID-OF (1178: Number or String Result)
    EVENT-PARTICIPANT -> PARTICIPANT-OF (30352: Medical Event Participant)
    EVENT-ORGANIZATION -> EVENT-ORGANIZATION-OF (1178: Number or String Result)
    EVENT-LOCATION -> EVENT-LOCATION-OF (1178: Number or String Result)
    EVENT-STATUS -> STATUS-OF (30355: CPMC Status Term)
    EVENT-(HAS)-ORGANIZATION -> ORGANIZATION-(FOR)-EVENT (81475: CPMC Coded Organizations)
30344: CPMC Order
    ORDER-QUANTITY -> ORDER-QUANTITY-OF (30350: Quantity Result)
    ORDER-FREQUENCY -> ORDER-FREQUENCY-OF (32504: Order Frequency)
    ORDER-START-DATE -> ORDER-START-DATE-OF (30349: Date Result)
    ORDER-STOP-DATE -> ORDER-STOP-DATE-OF (30349: Date Result)
30352: Medical Event Participant
    PARTICIPANT-OF -> EVENT-PARTICIPANT (9876: CPMC Event)
    PARTICIPANT-ID -> PARTICIPANT-ID-OF (1178: Number or String Result)
    PARTICIPANT-NAME -> PARTICIPANT-NAME-OF (32653: ID Number Plus Text Result)
Data Modeling Properties
40441: Display Information [C0010996]
    DEFAULT-DISPLAY-FOR -> HAS-DEFAULT-DISPLAYS (94: Diagnostic Procedure [T060])
    DISPLAYS-ELEMENTS-OF -> ELEMENTS-DISPLAYED-BY (94: Diagnostic Procedure [T060])
    HAS-DISPLAY-PARAMETERS -> IS-DISPLAY-PARAMETER-OF (94: Diagnostic Procedure [T060])
    DISPLAY-PARAMETER-ORDER
Application Properties
42645: Data Entry Form
    FORM-(IS-PART-OF)->FORMSET -> FORMSET-(CONTAINS)->FORM (66436: Data Entry Form Sets)
42646: Data Entry Form Field
    DATA-ENTRY-FIELD-(HAS)->ALLOWABLE-VALUE -> ALLOWABLE-VALUE-(FOR)->DATA-ENTRY-FIELD (59732: Form Field Allowable Values)
    FORM-FIELD-(HAS)->FIELD-TYPE -> FIELD-TYPE-(FOR)->FORM-FIELD (66295: Data Entry Field Type)
    FORM-FIELD-(OBEYS)->PREFILL-RULE -> PREFILL-RULE-(FOR)->FORM-FIELD (66311: Prefill Rules)
    FORM-FIELD-MAXIMUM-VALUE
    FORM-FIELD-MINIMUM-VALUE
    FORM-FIELD-MAXIMUM-CHARACTER-COUNT
59732: Form Field Allowable Values
    ALLOWABLE-VALUE-(FOR)->DATA-ENTRY-FIELD -> DATA-ENTRY-FIELD-(HAS)->ALLOWABLE-VALUE (42646: Data Entry Form Field)
66295: Data Entry Field Type
    FIELD-TYPE-(FOR)->FORM-FIELD -> FORM-FIELD-(HAS)->FIELD-TYPE (42646: Data Entry Form Field)
66308: Layout Type
    LAYOUT-TYPE-(FOR)->FORM-STRUCTURE -> FORM-STRUCTURE-(HAS)->LAYOUT-TYPE (66405: Data Entry Form Structure)
Document Properties
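Each relation in the property listings is written as a pair, such as ACTION-SITE-OF -> ACTION-SITE, naming the link and its reciprocal. A minimal sketch of keeping the two directions in step is shown here. The relation names and class names are from the listings, while the `INVERSE` table, the triple representation, and the function names are assumptions made for illustration.

```python
# Relation -> reciprocal relation, for two of the pairs named in the listings.
INVERSE = {
    "ACTION-SITE-OF": "ACTION-SITE",
    "ACTION-SITE": "ACTION-SITE-OF",
    "CAUSES-DISEASES": "ETIOLOGY",
    "ETIOLOGY": "CAUSES-DISEASES",
}

def add_link(links, source, relation, target):
    """Store a semantic link together with its reciprocal."""
    links.add((source, relation, target))
    links.add((target, INVERSE[relation], source))

def missing_reciprocals(links):
    """List the reciprocal links that should exist but do not."""
    return sorted(
        (target, INVERSE[relation], source)
        for (source, relation, target) in links
        if (target, INVERSE[relation], source) not in links
    )
```

A check like `missing_reciprocals` is one way to audit a link table after bulk edits; whether the MED enforces reciprocity this way is not stated in the slides.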
• Chemical [T103]; Measureable Entity; Etiologic Agent: 1780 cases
• Measureable Entity; Laboratory Finding or Test Result [T034]; Finding [T033]; Etiologic Agent; Microbiology Result; Patient Problem; Laboratory Results: 1399 cases
• Laboratory Finding or Test Result [T034]; Finding [T033]; Patient Problem; Laboratory Results: 3309 cases
• Laboratory Finding or Test Result [T034]; Finding [T033]; Patient Problem; Laboratory Results; New York Hospital (NYH) Laboratory Nomenclature Term: 1601 cases
207 Intersection Classes
• Laboratory Finding or Test Result [T034]; Finding [T033]; Patient Problem; New York Hospital (NYH) Laboratory Nomenclature Term: 2906 cases
• Laboratory Diagnostic Procedure; Diagnostic Procedure [T060]; Health Care Activity (Procedure) [T058]; Event [T051]; Laboratory Diagnostic Batteries; Single-Result Laboratory Test; New York Hospital (NYH) Laboratory Concept; Assessment Procedures: 1197 cases
• Laboratory Diagnostic Procedure; Diagnostic Procedure [T060]; Health Care Activity (Procedure) [T058]; Event [T051]; Laboratory Diagnostic Batteries; New York Hospital (NYH) Laboratory Concept; Assessment Procedures: 1822 cases
207 Intersection Classes
• Laboratory Diagnostic Procedure; Diagnostic Procedure [T060]; Health Care Activity (Procedure) [T058]; Event [T051]; Single-Result Laboratory Test; New York Hospital (NYH) Laboratory Concept; Assessment Procedures: 3200 cases
• Laboratory Diagnostic Procedure; Diagnostic Procedure [T060]; Health Care Activity (Procedure) [T058]; Event [T051]; Single-Result Laboratory Test; CPMC Single-Result Laboratory Test; Assessment Procedures: 3197 cases
• Health Care Activity (Procedure) [T058]; Event [T051]; ICD9 Element; Verification Concept for Generic Column: 10048 cases
207 Intersection Classes
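One way intersection counts like those on these slides could be tallied is to group concepts by the exact set of property-introducing classes each one falls under. This is a sketch under that assumption, not a description of how the 207 intersections were actually derived; the concept names in the usage note are invented and the function names are hypothetical.

```python
from collections import Counter

def intersection_counts(concept_classes):
    """Map each distinct combination of classes to the number of concepts having it.

    `concept_classes` maps a concept name to the collection of classes it belongs to.
    """
    return Counter(frozenset(classes) for classes in concept_classes.values())

def true_intersections(counts):
    """Keep only combinations that involve more than one class."""
    return {combo: n for combo, n in counts.items() if len(combo) > 1}
```

For example, with three invented concepts, two of which are both "Finding [T033]" and "Patient Problem" and one of which is only "Chemical [T103]", `true_intersections(intersection_counts(...))` reports a single intersection of two classes holding two concepts.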
Revisiting Recommendations - General
• Make “Event” a temporal concept
• Conceptual vs. Physical polarization
• Directed Acyclic Graph
• Merge Network and Metathesaurus
Revisiting Recommendations - Specific
• Tests have Specimens
• Tests have Parts
• Separate Medications from Chemicals
• Liberalize assignment of Relations
Revisiting Summary
• Semantic Types provide good coverage
• Concepts provide good coverage in certain domains
• No technical reason why UMLS could not incorporate clinical vocabulary
Lessons to be Learned
• The MED is representative of clinical care
• MED classes work well as introduction points
• Multiple hierarchy works
• Semantic Network is largely intact
• Unifying organization for anatomy needed
• Further study of MED will suggest additional types and relations