UIMA for NLP based Researchers’ Workplaces in Medical Domains
Manuela Kunze, Dietmar Rösner
University of Magdeburg, Germany
Researchers’ Workplace in Medical Domains
Manuela Kunze UIMA Workshop 2008 2
• GUI
• UIMA components
• epicrises (clinical reports): psychotherapists
• autopsy protocols: forensic medicine
Outline
• use cases
1. Processing clinical reports
2. Processing autopsy protocols
• architecture
– preprocessing
– analyses
– presentation of results
• summary
Corpus: Clinical Reports
• clinical reports: summary reports of diagnoses and treatment for specific patients
• research question by psychotherapists:
– to detect significant changes in the distribution of diagnoses
• e.g. related to the fundamental changes of the socio-political system after 1989 in former East Germany
• started with a feasibility study
– diagnostic summaries (parts of epicrises)
– span a time period of 20 years
– ca. 1000 summaries
Corpus: Clinical Reports
• a diagnostic summary can contain the following parts:
– psychopathological symptoms
– relevant personality traits and interpersonal problems
– diagnostic label
– (optional) related social incidents
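2 examples:
• Schwere depressive Störung im Zusammenhang mit beruflicher und partnerschaftlicher Konfliktsituation bei schizoider Persönlichkeit mit narzisstischen Anteilen. (Severe depressive disorder in connection with an occupational and relationship conflict situation, in a schizoid personality with narcissistic traits.)
• Ängstlich-depressives Syndrom mit multiplen Somatisierungen. (Anxious-depressive syndrome with multiple somatisations.)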
Corpus: Autopsy Protocols
• 120 forensic autopsy protocols
– ca. 1 million running word forms
– strictly defined format and content
• contain different parts, each with its own sublanguage:
– findings
– histological findings
– background
– causes of death
– discussion
– …
• research question:
– detection of injury patterns and creation of the respective statistics
Preprocessing
clinical reports
• splitting:
– documents contain the collected diagnostic summaries made in a year
• CPE (Collection Processing Engine):
– detection of a diagnostic summary
– diagnostic printer
autopsy protocols
• anonymisation:
– names of persons, locations, birth dates
• CPE:
– detection of sensitive data
– replacement by placeholders
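The anonymisation step above (detect sensitive data, replace it by placeholders) can be sketched in a few lines of Python. This is only an illustration outside of UIMA: the name and location lists, the placeholder strings and the date pattern are assumptions, and the sketch replaces every date in DD.MM.YYYY form rather than birth dates only.

```python
import re

# Hypothetical lists; a real system would load them from external resources.
PERSON_NAMES = ["Max Mustermann", "Erika Musterfrau"]
LOCATIONS = ["Magdeburg"]

# Assumption: dates are written as DD.MM.YYYY and every such date is sensitive.
DATE_PATTERN = re.compile(r"\b\d{1,2}\.\d{1,2}\.\d{4}\b")


def anonymise(text: str) -> str:
    """Replace detected sensitive spans with typed placeholders."""
    for name in PERSON_NAMES:
        text = text.replace(name, "<PERSON>")
    for location in LOCATIONS:
        text = text.replace(location, "<LOCATION>")
    return DATE_PATTERN.sub("<DATE>", text)


print(anonymise("Max Mustermann, geboren am 03.05.1961 in Magdeburg."))
```

Typed placeholders (instead of simply deleting the spans) keep the sentence structure intact for the later analysis steps.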
Analyses of Documents
Analysis Module
• general (medical) analysis engines
– structure tagger
– POS tagger
– Gazetteer annotator
– GermaNet annotator
– UMLS annotator
• processing of epicrises
– discourse marker annotator
– subfragment annotator
– synonym annotator
– rule-based classifier
– OpenNLP-Maxent classifier
• processing of autopsy protocols
– Personal data annotator
– Traumata annotator
– Weapons annotator
– Summary annotator
– Criminal offense signs annotator
– Context based analysis
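The layering in this module (general engines first, domain-specific annotators on top) can be illustrated with a minimal Python sketch. It does not use the UIMA API: `Document` is a simplified stand-in for a CAS, and the annotator names and the tiny weapon lexicon are hypothetical.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Annotation:
    kind: str
    begin: int
    end: int


@dataclass
class Document:
    """Simplified stand-in for a CAS: the text plus a shared annotation list."""
    text: str
    annotations: List[Annotation] = field(default_factory=list)


Annotator = Callable[[Document], None]


def token_annotator(doc: Document) -> None:
    # General step: mark every run of word characters as a Token.
    for m in re.finditer(r"\w+", doc.text):
        doc.annotations.append(Annotation("Token", m.start(), m.end()))


def weapon_annotator(doc: Document) -> None:
    # Domain-specific step: reuses the Token annotations of the general step.
    weapons = {"Messer", "Schlagstock"}  # hypothetical mini lexicon
    for a in list(doc.annotations):
        if a.kind == "Token" and doc.text[a.begin:a.end] in weapons:
            doc.annotations.append(Annotation("Weapon", a.begin, a.end))


def run_pipeline(doc: Document, annotators: List[Annotator]) -> Document:
    """Run the annotators in order; each one sees the results of its predecessors."""
    for annotate in annotators:
        annotate(doc)
    return doc


doc = run_pipeline(Document("Stichverletzung durch ein Messer."),
                   [token_annotator, weapon_annotator])
print([doc.text[a.begin:a.end] for a in doc.annotations if a.kind == "Weapon"])
```

Because every annotator only reads and writes the shared document, the same general engines can be reused in front of either the epicrisis or the autopsy-specific annotators.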
Analyses of Documents: General Medical Tools
• structure tagger: sentence boundaries, numbers, abbreviations, …
• POS Tagger: word categories, stems, case, number
• Gazetteer annotator: lists of syndromes, symptoms, diseases, …
• GermaNet annotator: information about GermaNet synsets
• UMLS annotator: information about concepts from the UMLS Metathesaurus
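As an illustration of the gazetteer idea, the following Python sketch looks up list entries in a text and returns their offsets and categories. The entries and category names are hypothetical, since the actual lists are not shown on the slides, and the plain substring matching is a simplification.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical entries mapping a surface form to a category.
GAZETTEER: Dict[str, str] = {
    "Syndrom": "syndrome",
    "Somatisierungen": "symptom",
    "depressive Störung": "disease",
}


def gazetteer_annotate(text: str) -> List[Tuple[int, int, str, str]]:
    """Return (begin, end, covered text, category) for every gazetteer hit."""
    hits = []
    for entry, category in GAZETTEER.items():
        # Substring search; it ignores word boundaries and inflection.
        for m in re.finditer(re.escape(entry), text):
            hits.append((m.start(), m.end(), m.group(), category))
    return sorted(hits)


for hit in gazetteer_annotate("Ängstlich-depressives Syndrom mit multiplen Somatisierungen."):
    print(hit)
```

Keeping the lists as data, separate from the lookup code, means they can be extended without touching the annotator itself.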
Analyses of Documents: Processing Clinical Reports
• discourse marker annotator
• subfragment annotator
– annotates the different parts of a diagnostic summary
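Example: Akute depressive Symptomatik im Zusammenhang mit Partnerschaftskonflikt auf der Basis einer primär neurotischen Fehlentwicklung bei selbstunsicherer depressiv strukturierter abhängiger Persönlichkeit. (Acute depressive symptoms in connection with a partnership conflict, on the basis of a primarily neurotic maldevelopment, in an insecure, depressively structured, dependent personality.)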
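One plausible reading of how discourse markers and subfragments fit together is that markers such as "im Zusammenhang mit", "auf der Basis" and "bei" delimit the parts of a summary. The Python sketch below splits a summary at these three markers. The marker list is an assumption taken from the example sentence only, and the sketch does not assign part labels to the fragments.

```python
import re
from typing import List, Optional, Tuple

# Assumed discourse markers, taken from the example sentence only.
MARKER_PATTERN = re.compile(r"\b(im Zusammenhang mit|auf der Basis|bei)\b")


def split_subfragments(summary: str) -> List[Tuple[Optional[str], str]]:
    """Split a summary into (introducing marker, fragment text) pairs."""
    fragments = []
    marker = None  # the first fragment has no introducing marker
    start = 0
    for m in MARKER_PATTERN.finditer(summary):
        fragments.append((marker, summary[start:m.start()].strip()))
        marker = m.group(1)
        start = m.end()
    fragments.append((marker, summary[start:].strip().rstrip(".")))
    return fragments


example = ("Akute depressive Symptomatik im Zusammenhang mit Partnerschaftskonflikt "
           "auf der Basis einer primär neurotischen Fehlentwicklung "
           "bei selbstunsicherer depressiv strukturierter abhängiger Persönlichkeit.")
for marker, fragment in split_subfragments(example):
    print(marker, "->", fragment)
```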
Analyses of Documents: Processing Clinical Reports
• synonym annotator
– domain specific synonymous terms
• e.g. psychosomatic disorders, depression
• fragment classifier
– OpenNLP classifier
– rule-based classifier
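The interplay of synonym normalisation and a rule-based fragment classifier might look like the following Python sketch. All synonym sets, keyword rules and class labels are hypothetical; the slides do not show the real rules, and the statistical (OpenNLP Maxent) classifier is not covered here.

```python
from typing import Dict, List, Tuple

# Hypothetical synonym sets: each surface form maps to one canonical term.
SYNONYMS: Dict[str, str] = {
    "depressive störung": "depression",
    "depressive symptomatik": "depression",
    "depressives syndrom": "depression",
    "somatisierungen": "psychosomatic disorder",
}

# Hypothetical keyword rules, checked in order.
RULES: List[Tuple[str, str]] = [
    ("persönlichkeit", "personality traits"),
    ("konflikt", "social incident"),
]


def canonical_terms(fragment: str) -> List[str]:
    """Return the canonical terms whose synonyms occur in the fragment."""
    lowered = fragment.lower()
    return sorted({canon for form, canon in SYNONYMS.items() if form in lowered})


def classify(fragment: str) -> str:
    """Assign a class label to a fragment of a diagnostic summary."""
    lowered = fragment.lower()
    for keyword, label in RULES:
        if keyword in lowered:
            return label
    # Fallback: a fragment that mentions a known disorder term counts as symptoms.
    if canonical_terms(fragment):
        return "symptoms"
    return "unknown"


print(classify("Akute depressive Symptomatik"))
print(classify("Partnerschaftskonflikt"))
```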
Analyses of Documents: Processing Autopsy Protocols
• Personal Data Annotator: age, weight
• Traumata Annotator: fractures, hematoma, stab wound, etc.
• Criminal Offense Signs Annotator: signs of a criminal offense (e.g. ‘stab canal’)
• Weapons Annotator: thrusting weapon, baton, …
• Summary Annotator: cause and manner of death
• Context based Analysis: relations between injuries and their respective locations
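To make the context based analysis more concrete, here is a small Python sketch that relates each injury mention to a body location in the same sentence. The heuristic (take the first location that follows the injury) and the English mini lexicons are assumptions for illustration. The protocols themselves are German, and the slides do not describe the actual method.

```python
import re
from typing import List, Tuple

# Hypothetical mini lexicons (English glosses for readability).
INJURIES = ["fracture", "hematoma", "stab wound"]
LOCATIONS = ["skull", "left thigh", "chest"]


def injury_location_pairs(text: str) -> List[Tuple[str, str]]:
    """Pair each injury with the first location mentioned after it in the same sentence."""
    pairs = []
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        lowered = sentence.lower()
        # Locations found in this sentence, ordered by position.
        locations = sorted((lowered.find(loc), loc) for loc in LOCATIONS if loc in lowered)
        for injury in INJURIES:
            pos = lowered.find(injury)  # only the first mention per sentence is considered
            if pos < 0:
                continue
            following = [loc for start, loc in locations if start > pos]
            if following:
                pairs.append((injury, following[0]))
    return pairs


report = "Fracture of the skull. Hematoma on the left thigh and stab wound in the chest."
print(injury_location_pairs(report))
```

Counting such pairs over all protocols would give a first approximation of the injury-pattern statistics named in the research question.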
Processing of Results
• different CAS Consumers
– text summaries
– for indexing: UIMA search engine, Lucene
• different user interfaces
– presentation of annotations
– search engines
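A consumer that produces a text summary can be approximated by counting annotation values across all processed documents. The sketch below is plain Python, not a UIMA CAS Consumer; the representation of a document as (annotation type, covered text) pairs and the sample data are assumptions.

```python
from collections import Counter
from typing import Iterable, List, Tuple

# Assumption: each processed document is a list of (annotation type, covered text) pairs.
AnnotatedDoc = List[Tuple[str, str]]


def summarise(docs: Iterable[AnnotatedDoc], annotation_type: str) -> str:
    """Count the values of one annotation type across all documents."""
    counts = Counter(
        value for doc in docs for a_type, value in doc if a_type == annotation_type
    )
    # One line per value, most frequent first.
    return "\n".join(f"{value}: {n}" for value, n in counts.most_common())


docs = [
    [("Trauma", "fracture"), ("Weapon", "baton")],
    [("Trauma", "fracture"), ("Trauma", "hematoma")],
]
print(summarise(docs, "Trauma"))
```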
Processing Results
• a generic interface to the UIMA Search Engine
– input:
• directory of indexed files
• directory of CAS files
• type system descriptor
• XML descriptor for indexing
• XML based description of possible values for features
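The last input, an XML description of possible feature values, could be used to offer only valid choices in a search form. The Python sketch below parses such a description and checks a query against it. The XML layout, element names and attribute names are entirely hypothetical, since the actual format is not shown on the slides.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# Hypothetical format for the description of possible feature values.
FEATURE_VALUES_XML = """
<features>
  <feature type="Trauma" name="kind">
    <value>fracture</value>
    <value>hematoma</value>
  </feature>
  <feature type="Weapon" name="kind">
    <value>baton</value>
  </feature>
</features>
"""


def load_feature_values(xml_text: str) -> Dict[str, List[str]]:
    """Map 'Type.feature' to the list of values offered in the search form."""
    root = ET.fromstring(xml_text.strip())
    options: Dict[str, List[str]] = {}
    for feature in root.findall("feature"):
        key = f"{feature.get('type')}.{feature.get('name')}"
        options[key] = [value.text for value in feature.findall("value")]
    return options


def is_valid_query(options: Dict[str, List[str]], key: str, value: str) -> bool:
    """Accept a query only if the value is listed for that feature."""
    return value in options.get(key, [])


options = load_feature_values(FEATURE_VALUES_XML)
print(options)
print(is_valid_query(options, "Trauma.kind", "fracture"))
```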
Summary
• Why UIMA?
– modular architecture, interfaces
– strict separation of resources and processing methods
– simple changes by domain experts are possible