
Recent Efforts in Clinical NLP: Clinical Text Analysis and Knowledge Extraction System (cTAKES)

Guergana K. Savova, PhD
Children’s Hospital Boston and Harvard Medical School

Acknowledgements

Software developers and contributors at different times (in no specific order)
• James Masanz, Mayo Clinic
• Patrick Duffy, Mayo Clinic
• Philip Ogren, University of Colorado
• Sean Murphy, Mayo Clinic
• Vinod Kaggal, Mayo Clinic
• Jiaping Zheng, Children’s Hospital Boston
• Pei Chen, Children’s Hospital Boston
• Jihno Choi, University of Colorado

Investigators (in no specific order)
• Christopher Chute, MD, DrPH, Mayo Clinic
• James Buntrock, MS, Mayo Clinic
• Guergana Savova, PhD, Children’s Hospital Boston

Overview

• Background
• Clinical Text Analysis and Knowledge Extraction System (cTAKES)
• cTAKES for developers
  • Download and install of cTAKES
  • How to build the dictionary
• cTAKES: graphical user interface


Definitions

• Information Extraction (IE)
  • Extracting existing facts from unstructured or loosely structured text into a structured form
• Information Retrieval (IR)
  • Finding documents relevant to a user query
• Named Entity Recognition (NER)
  • Discovery of groups of textual mentions that belong to a certain semantic class
• Natural Language Processing (NLP)
  • Computational methods for text processing based on linguistically sound principles
• Clinical NLP – NLP for the clinical narrative
• Biomedical NLP – NLP for the clinical narrative and biomedical literature


Problem Space

• Structured information
  • Relational databases
  • Easy to extract information from them
• Semi-structured information
  • Loosely formatted XML, CSV tables
  • Not challenging to extract information
• Unstructured information
  • Scholarly literature, clinical notes, research reports, webpages
  • Majority of information is unstructured!!
  • Real challenge to extract the information

Overarching Goal

• Open-source, general-purpose clinical NLP toolkit
• Phenotype extraction from unstructured data
• Library of modules
• Cohesive with other initiatives
• Cutting-edge methodologies
• Best software development practices

Our principles
• Open source
• Scalable and robust
• Modular and expandable
• Based on existing standards and conventions
• Scalable, adaptable methodologies through open collaboration in the open-source development


Processing Clinical Notes

A 43-year-old woman was diagnosed with type 2 diabetes mellitus by her family physician 3 months before this presentation. Her initial blood glucose was 340 mg/dL. Glyburide 2.5 mg once daily was prescribed. Since then, self-monitoring of blood glucose (SMBG) showed blood glucose levels of 250-270 mg/dL. She was referred to an endocrinologist for further evaluation.

On examination, she was normotensive and not acutely ill. Her body mass index (BMI) was 18.7 kg/m2 following a recent 10 lb weight loss. Her thyroid was symmetrically enlarged and ankle reflexes absent. Her blood glucose was 272 mg/dL, and her hemoglobin A1c (HbA1c) was 10.3%. A lipid profile showed a total cholesterol of 261 mg/dL, triglyceride level of 321 mg/dL, HDL level of 48 mg/dL, and an LDL of 150 mg/dL. Thyroid function was normal. Urinalysis showed trace ketones.

She adhered to a regular exercise program and vitamin regimen, smoked 2 packs of cigarettes daily for the past 25 years, and limited her alcohol intake to 1 drink daily. Her mother's brother was diabetic.

Clinical Element Model
http://intermountainhealthcare.org/cem/Pages/home.aspx

Disorder CEM: text: diabetes mellitus; code: 73211009; subject: patient; relative temporal context: 3 months ago; negation indicator: not negated

Disorder CEM: text: diabetes mellitus; code: 73211009; subject: family member; relative temporal context: (none); negation indicator: not negated

Tobacco Use CEM: text: smoking; code: 365981007; subject: patient; relative temporal context: 25 years; negation indicator: not negated

Medication CEM: text: Glyburide; code: 315989; subject: patient; frequency: once daily; negation indicator: not negated; strength: 2.5 mg
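
The four CEM instances above can be read as simple typed records. The following is a minimal Python sketch of that idea, not the actual CEM schema or a cTAKES type: the class name `ClinicalElement` and its field names are hypothetical and chosen only to mirror the fields shown on the slide.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class ClinicalElement:
    """Hypothetical record mirroring the fields shown on the slide."""
    kind: str                      # e.g. "Disorder", "Medication"
    text: str
    code: str
    subject: str = "patient"
    negated: bool = False
    temporal_context: Optional[str] = None
    extra: Dict[str, str] = field(default_factory=dict)


elements = [
    ClinicalElement("Disorder", "diabetes mellitus", "73211009",
                    temporal_context="3 months ago"),
    ClinicalElement("Disorder", "diabetes mellitus", "73211009",
                    subject="family member"),
    ClinicalElement("Tobacco Use", "smoking", "365981007",
                    temporal_context="25 years"),
    ClinicalElement("Medication", "Glyburide", "315989",
                    extra={"frequency": "once daily", "strength": "2.5 mg"}),
]

# Keep only asserted (non-negated) findings about the patient herself.
patient_findings = [e for e in elements
                    if e.subject == "patient" and not e.negated]
for e in patient_findings:
    print(e.kind, e.text, e.code)
```

Keeping `subject` and `negated` as explicit fields is what lets a downstream consumer separate the patient's own diabetes from the family history entry.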


Comparative Effectiveness

Disorder CEM: text: diabetes mellitus; code: 73211009; subject: patient; relative temporal context: 3 months ago; negation indicator: not negated

Disorder CEM: text: diabetes mellitus; code: 73211009; subject: family member; relative temporal context: (none); negation indicator: not negated

Tobacco Use CEM: text: smoking; code: 365981007; subject: patient; relative temporal context: 25 years; negation indicator: not negated

Medication CEM: text: Glyburide; code: 315989; subject: patient; frequency: once daily; negation indicator: not negated; strength: 2.5 mg

Compare the effectiveness of different treatment strategies (e.g., modifying target levels for glucose, lipid, or blood pressure) in reducing cardiovascular complications in newly diagnosed adolescents and adults with type 2 diabetes.

Compare the effectiveness of traditional behavioral interventions versus economic incentives in motivating behavior changes (e.g., weight loss, smoking cessation, avoiding alcohol and substance abuse) in children and adults.

Meaningful Use

Disorder CEM: text: diabetes mellitus; code: 73211009; subject: patient; relative temporal context: 3 months ago; negation indicator: not negated

Disorder CEM: text: diabetes mellitus; code: 73211009; subject: family member; relative temporal context: (none); negation indicator: not negated

Tobacco Use CEM: text: smoking; code: 365981007; subject: patient; relative temporal context: 25 years; negation indicator: not negated

Medication CEM: text: Glyburide; code: 315989; subject: patient; frequency: once daily; negation indicator: not negated; strength: 2.5 mg

• Maintain problem list

• Maintain active med list

• Record smoking status

• Provide clinical summaries for each office visit

• Generate patient lists for specific conditions

• Submit syndromic surveillance data

Clinical Practice

Disorder CEM: text: diabetes mellitus; code: 73211009; subject: patient; relative temporal context: 3 months ago; negation indicator: not negated

Medication CEM: text: Glyburide; code: 315989; subject: patient; frequency: once daily; negation indicator: not negated; strength: 2.5 mg

• Provide problem list and meds from the visit

Applications

• Meaningful use of the EMR
• Comparative effectiveness
• Clinical investigation
  • Patient cohort identification
  • Phenotype extraction
• Epidemiology
• Clinical practice
• and many more…

With deep semantic processing, the sky is the limit for applications.

Partnerships

• NCBC-funded initiatives
  • Integrating Data for Analysis, Anonymization and Sharing (iDASH)
  • Ontology Development and Information Extraction (ODIE)
• Veterans Administration
• Strategic Health Advanced Research Projects (SHARP)
  • SHARP 3: SMaRT app (http://www.smartplatforms.org/)
  • SHARP 4: www.sharpn.org
• R01s
  • Shared annotated lexical resource
  • Temporal relation discovery for the clinical domain
  • Multi-source integrated platform for answering clinical questions
• eMERGE, PGRN (Pharmacogenomics Research Network)
• Linguistic Data Consortium and Penn Treebank
• MITRE Corporation

Integrating cTAKES within i2b2

• Querying encrypted clinical notes stored in the i2b2 database
• Processing the result notes through cTAKES
• Persisting extracted concepts into the i2b2 database
• Thus, the concepts are now searchable by the researcher
• Enabling the training and running of classifiers directly from the i2b2 workbench
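
The query, process, persist, search flow listed above can be sketched end to end in a few lines of Python. Everything here is an assumption made for illustration: the `notes` and `concepts` tables are not the i2b2 schema, `decrypt` is a base64 placeholder rather than real decryption, and `extract_concepts` is a toy term lookup standing in for a cTAKES call.

```python
import base64
import sqlite3

# Stand-in dictionary; a real deployment would call cTAKES here.
TERMS = {"diabetes mellitus": "73211009", "glyburide": "315989"}


def decrypt(blob: str) -> str:
    """Placeholder for real decryption (base64 is NOT encryption)."""
    return base64.b64decode(blob).decode("utf-8")


def extract_concepts(text: str):
    """Toy substitute for the cTAKES step: case-insensitive term lookup."""
    lowered = text.lower()
    return [(term, code) for term, code in TERMS.items() if term in lowered]


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE notes (note_id INTEGER PRIMARY KEY, body TEXT)")
db.execute("CREATE TABLE concepts (note_id INTEGER, term TEXT, code TEXT)")
note = "Diagnosed with type 2 diabetes mellitus. Glyburide 2.5 mg once daily."
db.execute("INSERT INTO notes (body) VALUES (?)",
           (base64.b64encode(note.encode("utf-8")).decode("ascii"),))

# 1. query the stored notes, 2. process each one, 3. persist the concepts
for note_id, body in db.execute("SELECT note_id, body FROM notes").fetchall():
    for term, code in extract_concepts(decrypt(body)):
        db.execute("INSERT INTO concepts VALUES (?, ?, ?)",
                   (note_id, term, code))

# 4. the extracted concepts are now searchable by code
hits = db.execute(
    "SELECT note_id FROM concepts WHERE code = ?", ("73211009",)).fetchall()
print(hits)
```

The point of persisting concepts in their own table is that a researcher can search by code without touching, or decrypting, the note text again.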

https://www.i2b2.org/events/slides/i2b2_AMIA_Tutorial_20100310.pdf

….a scalable informatics framework that will enable clinical researchers to use existing clinical data for discovery research and, when combined with IRB-approved genomic data, facilitate the design of targeted therapies for individual patients with diseases having genetic origins.


clinical Text Analysis and Knowledge Extraction System (cTAKES)


cTAKES Adoption

• May 2011: 2306 downloads*
• eMERGE (SGH, NW)
• PGRN (HMS, NW)
• Extensions: Yale (YATEX), MITRE

* Source: http://sourceforge.net/project/stats/?group_id=255545&ugn=ohnlp&type=&mode=alltime


cTAKES Technical Details

• Open source
  • Apache v2.0 license
  • http://sourceforge.net/projects/ohnlp/
  • Java 1.5
  • Dependency on UMLS, which requires a UMLS license (free)
• Framework
  • IBM’s Unstructured Information Management Architecture (UIMA), an open-source framework and Apache project
• Methods
  • Natural Language Processing (NLP) methods
  • Based on standards and conventions to foster interoperability
• Application
  • High-throughput system


cTAKES: Components

• Sentence boundary detection (OpenNLP technology)

• Tokenization (rule-based)

• Morphologic normalization (NLM’s LVG)

• POS tagging (OpenNLP technology)

• Shallow parsing (OpenNLP technology)

• Named Entity Recognition
  • Dictionary mapping (lookup algorithm)
  • Machine learning (MAWUI)
  • Types: diseases/disorders, signs/symptoms, anatomical sites, procedures, medications

• Negation and context identification (NegEx)

• Dependency parser

• Drug Profile module

• Smoking status classifier

• CEM normalization module (soon to be released)
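
To make the first stages of the component list concrete, here is a minimal Python sketch of sentence splitting, rule-based tokenization, and dictionary lookup. It is a toy under stated assumptions: the regular expressions are simple substitutes for the OpenNLP models, the two-entry `DICTIONARY` is illustrative rather than a UMLS-derived resource, and none of the function names come from cTAKES.

```python
import re

# Toy dictionary keyed by lower-cased token tuples; entries are illustrative.
DICTIONARY = {
    ("diabetes", "mellitus"): ("disease/disorder", "73211009"),
    ("glyburide",): ("medication", "315989"),
}
MAX_LEN = max(len(key) for key in DICTIONARY)


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tokenize(sentence):
    """Rule-based tokenizer: decimal numbers, words, or single symbols."""
    return re.findall(r"\d+(?:\.\d+)?|\w+|[^\w\s]", sentence)


def lookup(tokens):
    """Greedy longest-match dictionary lookup over token windows."""
    found, i = [], 0
    while i < len(tokens):
        for n in range(min(MAX_LEN, len(tokens) - i), 0, -1):
            key = tuple(t.lower() for t in tokens[i:i + n])
            if key in DICTIONARY:
                found.append((" ".join(tokens[i:i + n]),) + DICTIONARY[key])
                i += n
                break
        else:
            # No dictionary entry starts at this token; move on.
            i += 1
    return found


note = ("She was diagnosed with type 2 diabetes mellitus. "
        "Glyburide 2.5 mg once daily was prescribed.")
mentions = [m for s in split_sentences(note) for m in lookup(tokenize(s))]
print(mentions)
```

Running the stages in this order matters: lookup operates on tokens within one sentence, so a dictionary term can never match across a sentence boundary.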


Output Example: Drug Object

• “Tamoxifen 20 mg po daily started on March 1, 2005.”
• Drug

• Text: Tamoxifen

• Associated code: C0351245

• Strength: 20 mg

• Start date: March 1, 2005

• End date: null

• Dosage: 1.0

• Frequency: 1.0

• Frequency unit: daily

• Duration: null

• Route: Enteral Oral

• Form: null

• Status: current

• Change Status: no change

• Certainty: null
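
As a rough illustration of how a sentence like the one above could be turned into a few of these fields, the sketch below uses a single regular expression. This is not how the Drug Profile module is implemented; the pattern, the one-entry `ROUTES` mapping, and the output keys are assumptions that only cover this sentence shape.

```python
import re
from datetime import datetime

ROUTES = {"po": "Enteral Oral"}  # illustrative route mapping, one entry only

PATTERN = re.compile(
    r"(?P<drug>[A-Z][a-z]+)\s+"
    r"(?P<strength>\d+(?:\.\d+)?\s*mg)\s+"
    r"(?P<route>po)\s+"
    r"(?P<frequency_unit>daily|weekly)"
    r"(?:\s+started\s+on\s+(?P<start>[A-Z][a-z]+ \d{1,2}, \d{4}))?"
)


def parse_drug(sentence):
    """Return a dict of drug attributes, or None if the pattern does not fit."""
    m = PATTERN.search(sentence)
    if m is None:
        return None
    start = m.group("start")
    return {
        "text": m.group("drug"),
        "strength": m.group("strength"),
        "route": ROUTES[m.group("route")],
        "frequency_unit": m.group("frequency_unit"),
        # Month names are parsed with the default (English) locale.
        "start_date": (datetime.strptime(start, "%B %d, %Y").date()
                       if start else None),
    }


drug = parse_drug("Tamoxifen 20 mg po daily started on March 1, 2005.")
print(drug)
```

Fields the sentence does not mention, such as end date or duration, are simply absent here, which corresponds to the `null` values in the slide's output.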


Output Example: Disorder Object

• “No evidence of cholangiocarcinoma.”
• Disorder

• Text: cholangiocarcinoma

• Associated code: SNOMED 70179006

• Certainty: 1

• Context: current

• Relatedness to patient: true

• Status: negated
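
The "negated" status above comes from negation detection in the style of NegEx. The sketch below shows the core idea only, assuming a handful of pre-negation triggers and a fixed token window; the published NegEx algorithm also handles post-negation triggers, pseudo-negations, and scope terminators, and the `affirmed` label and function names are hypothetical.

```python
import re

# A few pre-negation triggers; the published NegEx lists are much longer.
TRIGGERS = ["no evidence of", "no", "denies", "without"]
WINDOW = 5  # maximum number of tokens between trigger and mention


def is_negated(sentence, mention):
    """Return True if a trigger precedes the mention within WINDOW tokens."""
    lowered = sentence.lower()
    pos = lowered.find(mention.lower())
    if pos < 0:
        raise ValueError("mention not found in sentence")
    before = re.findall(r"\w+", lowered[:pos])
    for trigger in TRIGGERS:
        t = trigger.split()
        for i in range(len(before) - len(t) + 1):
            gap = len(before) - (i + len(t))  # tokens between trigger, mention
            if before[i:i + len(t)] == t and gap <= WINDOW:
                return True
    return False


def disorder(sentence, mention, code):
    """Build a small disorder record with a negation status."""
    negated = is_negated(sentence, mention)
    return {"text": mention, "code": code,
            "status": "negated" if negated else "affirmed"}


result = disorder("No evidence of cholangiocarcinoma.",
                  "cholangiocarcinoma", "SNOMED 70179006")
print(result)
```

Matching triggers against whole tokens rather than substrings avoids false hits such as the "no" inside "normotensive".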

(1) cTAKES for developers
• Download and install of cTAKES
• Building the dictionary

Jiaping Zheng
Children’s Hospital Boston

Introduction

See separate pdf for the slides


Graphical User Interface (GUI) to cTAKES: a Prototype

Pei J. Chen
Children’s Hospital Boston

cTAKES as a Service

Objectives
1. Demo cTAKES prototype web application
   • Empower end users to leverage cTAKES
2. Gather feedback for future cTAKES GUI
3. Potential system integrations with other applications (i.e. i2b2, ARC, Web Annotator)

Developed within i2b2 to integrate cTAKES in the i2b2 NLP cell

cTAKES Web Application: a Prototype

http://chipweb2.chip.org/cTakes_webservice_trunk/index.html

Single clinical note

Technologies

• Front-End: Web GUI, ExtJS, JavaScript
• Back-End: cTAKES, Java, UIMA
• Middleware: Web Services, Java, Apache CXF, JSON

Deployment Considerations

• Deployment model
• Security
• Performance
• Licensing (UMLS, Apache, GPL v.3)