Biomedical Named Entity Recognition

Page 1: Biomedical Named Entity Recognition

Citation

Biomedical Informatics

Data ➜ Information ➜ Knowledge

BMI

Biomedical Named Entity Recognition

Ramakanth Kavuluru

NLP Seminar – 8/21/2012

Page 2: Biomedical Named Entity Recognition

What are named entities?

• The benefits of taking cholesterol lowering statin drugs outweigh the risks even among people who are likely to develop diabetes.

• Acute exposure to resveratrol inhibits AMPK activity in human skeletal muscle cells

Page 3: Biomedical Named Entity Recognition

What are named entities?

• The benefits of taking cholesterol lowering statin drugs outweigh the risks even among people who are likely to develop diabetes.

• Acute exposure to resveratrol inhibits AMPK activity in human skeletal muscle cells

Entity type labels: Biologically Active Substance, Drug, Disorder, Organic Chemical, Enzyme, Cell

Page 4: Biomedical Named Entity Recognition

What are named entities?

• The benefits of taking cholesterol lowering statin drugs outweigh the risks even among people who are likely to develop diabetes.

• Acute exposure to resveratrol inhibits AMPK activity in human skeletal muscle cells

Cholesterol lowering drugs

Drug

Biological Function

Page 5: Biomedical Named Entity Recognition

Why do we need to extract them?

• To provide effective semantic search
  – Find all discharge summaries of patients that have a history of diabetes and obesity and have taken statins as part of their treatment. (Clinical Trial Recruitment)
  – Find all biomedical articles that discuss the dopamine neurotransmitter in the context of depressive disorders. (Literature Review)
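A minimal sketch of how the first query could be answered once entities have been extracted. It assumes each document has already been reduced to a set of entity labels by some NER step; the document ids, the labels, and the `semantic_search` helper are made up for illustration and are not part of the talk.

```python
# Toy index: document id -> set of entity labels found by an NER step.
# Ids and labels are invented for illustration only.
annotated_docs = {
    "summary-1": {"diabetes", "obesity", "statin"},
    "summary-2": {"diabetes", "statin"},
    "summary-3": {"obesity", "hypertension"},
}


def semantic_search(docs, required):
    """Return sorted ids of documents whose entity set contains every required entity."""
    required = set(required)
    return sorted(doc_id for doc_id, entities in docs.items() if required <= entities)


# Patients with a history of diabetes and obesity who have taken statins.
matches = semantic_search(annotated_docs, ["diabetes", "obesity", "statin"])
print(matches)
```

Matching on extracted entities rather than raw keywords is what lets such a query ignore documents that merely mention a word without it denoting the concept.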

Page 6: Biomedical Named Entity Recognition

Why do we need to extract them?

• To use as features in machine learning for effective text classification

• To build semantic clusters of textual documents to understand evolving themes

• Reduce noise by avoiding keywords that are not indicative of the classes or clusters

• Recently, as a first step in relation extraction and hence in knowledge discovery

Page 7: Biomedical Named Entity Recognition

A major task in text mining

• Extract information from textual data
• Use this information to solve problems
• What type of information?
  – Relevant concepts – a medical condition or finding, a drug, a gene or protein, an emotion (hope, love, …)
  – Relevant (binary) relations – drug TREATS a condition, protein CAUSES a disease
• What are the typical questions?
  – Does a pathology report indicate a reportable case?
  – Which patients satisfy the criteria for a clinical trial?

Page 8: Biomedical Named Entity Recognition

Knowledge Discovery

• VIP Peptide – increases – Catecholamine Biosynthesis

• Catecholamines – induce – β-adrenergic receptor activity

• β-adrenergic receptors – are involved in – fear conditioning

VIP Peptide – affects – fear conditioning?

In Cattle

In Rats

In Humans
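The chain of relations on this slide can be sketched as a walk over subject–predicate–object triples. This is only an illustration: the `same_concept` table that links "Catecholamine Biosynthesis" to "Catecholamines" (and receptor activity to the receptors) is an assumption added here so the triples connect, and the resulting path is a hypothesis to test, not an established finding, especially since the individual findings come from different species.

```python
# Relation triples transcribed from the slide: (subject, predicate, object).
triples = [
    ("VIP Peptide", "increases", "Catecholamine Biosynthesis"),
    ("Catecholamines", "induce", "β-adrenergic receptor activity"),
    ("β-adrenergic receptors", "are involved in", "fear conditioning"),
]

# Assumed normalization: treat these object mentions as referring to the
# subject of the next triple. This table is hypothetical.
same_concept = {
    "Catecholamine Biosynthesis": "Catecholamines",
    "β-adrenergic receptor activity": "β-adrenergic receptors",
}


def chain(triples, start, aliases):
    """Follow triples from `start` and return the concepts reached, in order."""
    by_subject = {subj: obj for subj, _, obj in triples}
    path = [start]
    visited = {start}
    current = start
    while current in by_subject:
        reached = by_subject[current]
        path.append(reached)
        current = aliases.get(reached, reached)
        if current in visited:  # stop if the relations loop back
            break
        visited.add(current)
    return path


path = chain(triples, "VIP Peptide", same_concept)
print(" -> ".join(path))
```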

Page 9: Biomedical Named Entity Recognition

Clinical NER

Concept Type

• Disorder/Symptom
• Medication
• Procedures

Attributes

Present/historical/absent, Acute? Uncertain?

Present/historical/future

Page 10: Biomedical Named Entity Recognition

Why is NER Hard?

Page 11: Biomedical Named Entity Recognition

Linguistic Variation

• Derivational variation: cranial, cranium
• Inflectional variation: coughed, coughing
• Synonymy
  – neurofibromin 2, merlin, NF2 protein, and schwannomin
  – Addison’s disease, adrenal insufficiency, hypocortisolism, bronzed disease
  – Feeding problems in newborn – The mother said she was having trouble feeding the baby.
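A small sketch of why variation is a problem for dictionary lookup. It assumes a hand-made table from surface forms to concept labels; the table, the labels (placeholders, not real identifiers), and the `normalize` helper are hypothetical. Simple variants and synonyms resolve, while the paraphrased sentence from the slide does not.

```python
# Hand-made toy lexicon: surface form -> placeholder concept label.
# Labels are invented; they are not identifiers from any real vocabulary.
SURFACE_TO_CONCEPT = {
    "neurofibromin 2": "NF2_PROTEIN",
    "merlin": "NF2_PROTEIN",
    "nf2 protein": "NF2_PROTEIN",
    "schwannomin": "NF2_PROTEIN",
    "adrenal insufficiency": "ADRENAL_INSUFFICIENCY",
    "hypocortisolism": "ADRENAL_INSUFFICIENCY",
    "coughed": "COUGH",
    "coughing": "COUGH",
    "feeding problems in newborn": "NEWBORN_FEEDING_PROBLEM",
}


def normalize(mention):
    """Look up a mention in the toy lexicon; return None when it is unknown."""
    return SURFACE_TO_CONCEPT.get(mention.strip().lower())


print(normalize("Schwannomin"))
print(normalize("The mother said she was having trouble feeding the baby."))
```

The second lookup finds nothing, because no string match connects the paraphrase to the concept; that gap is what makes this kind of variation hard.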

Page 12: Biomedical Named Entity Recognition

Polysemy

• Merlin – both a bird and a protein in UMLS
• Discharge
  – Patient was prescribed codeine upon discharge
  – The discharge was yellow and purulent
• Abbreviations
  – APC: Activated protein C, Adenomatous polyposis coli, antigen presenting cell, aerobic plate count, advanced pancreatic cancer, age period cohort, antibody producing cells, atrial premature complex

Page 13: Biomedical Named Entity Recognition

Negation

• Nearly half of all clinical concepts in dictated narratives are negated
  – There is no maxillary sinus tenderness
• Implied absence without negation
  – Lungs are clear upon auscultation
  So,
  – Rales: Absent
  – Rhonchi: Absent
  – Wheezing: Absent
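To make the two cases concrete, here is a toy trigger-word heuristic for explicit negation. It is not a method described in the talk: the trigger list, the window size, and the `is_negated` helper are assumptions chosen for illustration. It handles the explicit "no ..." example but, by design, cannot detect implied absence.

```python
import re

# Assumption: a tiny illustrative trigger list, far from complete.
NEGATION_TRIGGERS = {"no", "not", "without", "denies"}


def is_negated(sentence, concept, window=5):
    """True if a trigger word occurs within `window` tokens before the concept mention."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    concept_tokens = re.findall(r"[a-z]+", concept.lower())
    n = len(concept_tokens)
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] == concept_tokens:
            preceding = tokens[max(0, i - window):i]
            return any(tok in NEGATION_TRIGGERS for tok in preceding)
    return False  # concept not mentioned at all


print(is_negated("There is no maxillary sinus tenderness", "maxillary sinus tenderness"))
print(is_negated("Lungs are clear upon auscultation", "rales"))
```

The second call returns `False` because rales are never mentioned; concluding that they are absent requires domain knowledge, not trigger words.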

Page 14: Biomedical Named Entity Recognition

Controlled Terminologies

Controlled vocabularies or taxonomies
  – Gene Ontology (gene products)
    • most cited, 450 per year in PubMed
    • Total of 33000+ terms
  – SNOMED CT (about 300K+ concepts)
  – NCI Thesaurus, ICD-9/10, ICD-O-3, LOINC, MedlinePlus
  – UMLS Metathesaurus (integration of 140+ vocabularies)
    • 2.3 million concepts

Page 16: Biomedical Named Entity Recognition

Semantic Types and Relations

• NLM Semantic Network, the type system behind UMLS Metathesaurus
  – Semantic Types (135)
    • Semantic Groups (15)
  – Semantic Relations (54)
• Specialist Lexicon
  – Malaria, malarial
  – Hyperplasia, hyperplastic

How do we extract named entities?

Page 17: Biomedical Named Entity Recognition

MetaMap from NLM

Identify phrases: Use SPECIALIST parser

Map to CUIs: Use SPECIALIST Lexicon, Metathesaurus and Semantic Network

Page 18: Biomedical Named Entity Recognition

Output of syntactic analysis

• Syntactic Analysis – “ocular complications of myasthenia gravis”
  – Ocular (adj), complications (noun), of (prep), myasthenia (noun), gravis (noun)
  – gives noun phrases (NP): “Ocular complications” and “Myasthenia gravis”
  – Prepositions are ignored
  – In a given NP, you have a head and modifiers:
    • Ocular (mod) and complications (head)
    • How about “male pattern baldness”?
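A rough sketch of the idea, not of the SPECIALIST parser itself. It assumes two simplifications that are not from the talk: noun phrases are obtained by splitting at a short list of prepositions, and the head is taken to be the last word of each phrase. The last-word rule is only a rule of thumb for English noun phrases; it picks "gravis" for "myasthenia gravis", which a real parser may analyze differently.

```python
# Assumption: a short, illustrative preposition list.
PREPOSITIONS = {"of", "in", "on", "with", "for", "to"}


def split_noun_phrases(text):
    """Split text into word lists at prepositions, dropping the prepositions."""
    phrases, current = [], []
    for word in text.lower().split():
        if word in PREPOSITIONS:
            if current:
                phrases.append(current)
            current = []
        else:
            current.append(word)
    if current:
        phrases.append(current)
    return phrases


def head_and_modifiers(phrase):
    """Last-word heuristic: the final word is the head, the rest are modifiers."""
    return phrase[-1], phrase[:-1]


for np in split_noun_phrases("ocular complications of myasthenia gravis"):
    print(np, head_and_modifiers(np))
print(head_and_modifiers(["male", "pattern", "baldness"]))
```

Under this heuristic "male pattern baldness" has the head "baldness" with "male" and "pattern" as modifiers.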

Page 19: Biomedical Named Entity Recognition

Variant Generation

Page 20: Biomedical Named Entity Recognition

Variant Generation

Page 21: Biomedical Named Entity Recognition

Candidate identification

• Look for all variants in Metathesaurus strings and identify those candidate concepts (CUIs) that contain at least one variant as a substring
• Example: For ocular complication, obtain all Metathesaurus strings that contain any of the following as substrings
  – Optic complication
  – Eyes complication
  – Ophthalmic complicated
  – ….
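The lookup described above can be sketched as follows. Everything concrete here is an assumption for illustration: the per-word variant table is a toy built from the slide's example, the concept strings are invented, and the ids `CUI_A`, `CUI_B`, `CUI_C` are placeholders rather than real CUIs. The sketch builds phrase variants by combining word variants, then keeps concepts whose string contains any variant as a substring.

```python
from itertools import product

# Toy per-word variant table based on the slide's example.
WORD_VARIANTS = {
    "ocular": ["ocular", "optic", "eyes", "ophthalmic"],
    "complication": ["complication", "complications", "complicated"],
}

# Invented concept strings with placeholder ids (not real CUIs).
CONCEPT_STRINGS = {
    "CUI_A": "optic complication of diabetes",
    "CUI_B": "eyes complication after surgery",
    "CUI_C": "cardiac complication",
}


def phrase_variants(phrase):
    """All combinations of the known variants of each word in the phrase."""
    per_word = [WORD_VARIANTS.get(w, [w]) for w in phrase.lower().split()]
    return {" ".join(combo) for combo in product(*per_word)}


def candidates(phrase):
    """Ids of concepts whose string contains at least one phrase variant."""
    variants = phrase_variants(phrase)
    return sorted(
        cui for cui, text in CONCEPT_STRINGS.items()
        if any(variant in text for variant in variants)
    )


print(len(phrase_variants("ocular complication")))
print(candidates("ocular complication"))
```

The third concept is excluded because it shares only the word "complication" and no full phrase variant.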

Page 22: Biomedical Named Entity Recognition

Mapping and Evaluation

• So now we have a bunch of candidate CUIs based on the presence of variants of the given phrase in Metathesaurus strings. How do we select the best candidate?
• Use several measures to compute a rank
  – Centrality (involvement of head)
  – Variation (average of inverse distance scores)
  – Coverage
  – Cohesiveness

Page 23: Biomedical Named Entity Recognition

Final Score
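The formula on this slide did not survive in the transcript. As a hedged reconstruction, Aronson's paper "Effective Mapping of Biomedical Text to the UMLS Metathesaurus: The MetaMap Program" is generally summarized as combining the four measures in a weighted average, with coverage and cohesiveness weighted twice as heavily as centrality and variation, scaled to a range of 0 to 1000. The sketch below encodes that reading; treat the weights and the scaling as assumptions to check against the paper, and `final_score` as a hypothetical helper name.

```python
def final_score(centrality, variation, coverage, cohesiveness):
    """Combine four component scores (each assumed to lie in [0, 1]) into a 0-1000 score."""
    # Assumed weights: coverage and cohesiveness count twice as much
    # as centrality and variation; the weights sum to 6.
    weighted = centrality + variation + 2 * coverage + 2 * cohesiveness
    return 1000 * weighted / 6


# A candidate that involves the head, with middling scores elsewhere.
print(round(final_score(1.0, 0.5, 0.5, 0.5), 1))
```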

Page 24: Biomedical Named Entity Recognition

MetaMap Options

• Types of variants: include or exclude derivational variants
• Word sense disambiguation
  – Discharge (bodily secretion vs. release the patient)
• Concept gaps
  – Obstructive apnea mapping to “obstructive sleep apnea” or “obstructive neonatal apnea”
• Term processing
  – Process the input string as a single concept, that is, don’t split it into noun phrases
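The concept-gaps option can be illustrated with an in-order word match that tolerates extra words in between. This is a sketch of the idea only, not of how the tool implements it; `matches_with_gaps` is a hypothetical helper.

```python
def matches_with_gaps(phrase, concept_string):
    """True if the words of `phrase` occur in `concept_string` in order,
    possibly with other words in between."""
    remaining = iter(concept_string.lower().split())
    # `word in remaining` advances the iterator, so order is enforced.
    return all(word in remaining for word in phrase.lower().split())


print(matches_with_gaps("obstructive apnea", "obstructive sleep apnea"))
print(matches_with_gaps("obstructive apnea", "obstructive neonatal apnea"))
```

Both concept strings match, which is why allowing gaps increases recall but can also yield several competing candidates for one phrase.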

Page 25: Biomedical Named Entity Recognition

Output options

• Human readable format
• XML format
• Restrictions based on certain vocabularies: consider only ICD-9
• Restrictions based on certain types: consider only pharmacological substances (i.e., drugs)
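The two kinds of restriction amount to filtering mapped concepts by source vocabulary or by semantic type. The sketch below shows that filtering on invented records; the field names, the example records, and the `restrict` helper are hypothetical and do not reflect the tool's actual output format.

```python
# Invented mapping records standing in for tool output; field names are hypothetical.
mappings = [
    {"text": "statin", "source": "RXNORM", "semantic_type": "Pharmacologic Substance"},
    {"text": "diabetes", "source": "ICD-9", "semantic_type": "Disease or Syndrome"},
    {"text": "codeine", "source": "RXNORM", "semantic_type": "Pharmacologic Substance"},
]


def restrict(records, sources=None, semantic_types=None):
    """Keep records that satisfy every restriction that was supplied."""
    kept = []
    for record in records:
        if sources is not None and record["source"] not in sources:
            continue
        if semantic_types is not None and record["semantic_type"] not in semantic_types:
            continue
        kept.append(record)
    return kept


only_icd9 = restrict(mappings, sources={"ICD-9"})
only_drugs = restrict(mappings, semantic_types={"Pharmacologic Substance"})
print([r["text"] for r in only_icd9])
print([r["text"] for r in only_drugs])
```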

DEMO TIME: Daniel Harris

Page 26: Biomedical Named Entity Recognition

References

• An overview of MetaMap: Historical Perspectives and Recent Advances, Alan Aronson and Francois Lang
• Effective Mapping of Biomedical Text to the UMLS Metathesaurus: The MetaMap Program, Alan Aronson
• Comparison of LVG and MetaMap Functionality, Alan Aronson
• Lexical, Terminological, and Ontological Resources for Biological Text Mining, Olivier Bodenreider