Information Extraction From Medical Records by Alexander Barsky

Upload: adrian-webb

Post on 29-Dec-2015

Current Methodology:

A broad assessment of the patient appears at the beginning of the chart, with references to more specific areas. Specific divisions follow the broad assessment. Records are listed in chronological order of activity.

Chart Example:


Problem:

A patient's medical chart is highly detailed and complex. Any attempt to locate specific information quickly is likely to end in frustration.

Solution:

Create a system that extracts the desired information based on a predefined set of parameters. Example: "Hormonal imbalance during puberty". Retrieve all references to hormonal imbalances, but only those that fall between two specific points in time in the medical chart.
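
As a rough illustration of the idea (not the GATE-based system described in these slides), the sketch below filters chart entries by a phrase and a date range in plain Python. The chart entries, the `extract` function, and the dates are all hypothetical and exist only for this example.

```python
from datetime import date

# Hypothetical chart entries: (date of activity, free-text note).
CHART = [
    (date(1998, 3, 14), "Routine check-up. No complaints."),
    (date(2001, 6, 2), "Labs suggest hormonal imbalance; endocrinology referral."),
    (date(2002, 1, 20), "Follow-up: hormonal imbalance improving on treatment."),
    (date(2010, 9, 5), "Hormonal imbalance noted during annual physical."),
]

def extract(chart, phrase, start, end):
    """Return entries that mention `phrase` and are dated within [start, end]."""
    needle = phrase.lower()
    return [
        (day, note)
        for day, note in chart
        if start <= day <= end and needle in note.lower()
    ]

# Only references between the two chosen dates are kept.
hits = extract(CHART, "hormonal imbalance", date(2000, 1, 1), date(2004, 12, 31))
for day, note in hits:
    print(day.isoformat(), note)
```

A plain substring match like this misses paraphrases and misspellings, which is one reason a rule-based annotation pipeline is attractive for real charts.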

Tools at Our Disposal:

JAPE: Java Annotation Patterns Engine. Use: pattern matching and semantic extraction.
GATE: General Architecture for Text Engineering. Use: information extraction, document annotation, and XML output.
C#: Visual C# WinForms. Use: medium for conversion between XML and .csv file formats.

Solution Methodology:

1. Create corpus of documents in GATE.
2. Introduce rules for information extraction.
3. Annotate documents in corpus.
4. Output annotated documents in XML.
5. Strip file of unnecessary elements and convert to .csv.

ANNIE

A Nearly-New Information Extraction System

- Tokeniser: splits a sentence into simple tokens.
- Gazetteer: identifies entity names contained in lists.
- Sentence Splitter: splits text into sentences based on lists.
- Part-of-Speech Tagger: tags text with its different parts of speech.
- Coreference Matcher: identifies relationships between previously defined entities.
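
To make the first three components more concrete, here is a toy Python sketch of a sentence splitter, a tokeniser, and a gazetteer lookup. It does not use GATE or ANNIE and is far simpler than either. The gazetteer lists, function names, and sample text are assumptions made up for this example.

```python
import re

# Hypothetical gazetteer: list name -> lowercase entries.
GAZETTEER = {
    "hormone": {"estrogen", "testosterone", "cortisol"},
    "symptom": {"fatigue", "acne"},
}

def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' when followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def tokenise(sentence):
    """Naive tokeniser: runs of word characters, or single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", sentence)

def lookup(tokens):
    """Gazetteer lookup: pair each token found in a list with that list's name."""
    found = []
    for token in tokens:
        for list_name, entries in GAZETTEER.items():
            if token.lower() in entries:
                found.append((token, list_name))
    return found

text = "Patient reports fatigue and acne. Cortisol was elevated."
for sentence in split_sentences(text):
    print(lookup(tokenise(sentence)))
```

Each stage consumes the previous stage's output, which mirrors why the components are most useful when combined.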

Success in information extraction depends on integrating most, if not all, of the ANNIE components.

JAPE: Key to Extraction

JAPE Example
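
A JAPE rule pairs a pattern over existing annotations (the left-hand side) with an action that creates a new annotation (the right-hand side). The Python sketch below imitates that idea only; it is not JAPE syntax and not a rule from these slides. The trigger words, the `HormoneProblem` annotation type, and the token format are all hypothetical.

```python
# Hypothetical trigger words that turn a hormone mention into a finding.
TRIGGERS = {"imbalance", "deficiency", "excess"}

def hormone_problem_rule(tokens):
    """Pattern: a token tagged 'hormone' directly followed by a trigger word.

    `tokens` is a list of (text, lookup_type_or_None) pairs.
    Action: emit a new 'HormoneProblem' annotation covering both tokens.
    """
    annotations = []
    for i in range(len(tokens) - 1):
        word, kind = tokens[i]
        next_word, _ = tokens[i + 1]
        if kind == "hormone" and next_word.lower() in TRIGGERS:
            annotations.append({
                "type": "HormoneProblem",
                "text": f"{word} {next_word}",
                "rule": "hormone_problem_rule",
            })
    return annotations

tokens = [
    ("Noted", None), ("estrogen", "hormone"), ("deficiency", None),
    ("and", None), ("fatigue", "symptom"), (".", None),
]
print(hormone_problem_rule(tokens))
```

The rule builds on gazetteer output (the `hormone` tag), the same way a JAPE pattern can match annotations produced by earlier processing steps.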

XML Output:


Problem: Too much unorganized information.

Solution:

XSLT to the rescue!

XSLT (Extensible Stylesheet Language Transformations): add specific rules to separate needed information from unnecessary information.

XSLT Example

- Find all the <Lookup> nodes and output the string between their tags.
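
The same selection step can be sketched in Python with the standard library's ElementTree, used here as a stand-in for an XSLT stylesheet. The sample XML is a hypothetical inline-annotated fragment; the structure and attribute names of real GATE output may differ.

```python
import xml.etree.ElementTree as ET

# Hypothetical annotated document; real GATE XML may be structured differently.
SAMPLE_XML = """<doc>
<Sentence>Patient shows <Lookup majorType="hormone">estrogen</Lookup> deficiency and
<Lookup majorType="symptom">fatigue</Lookup>.</Sentence>
</doc>"""

def lookup_strings(xml_text):
    """Return (majorType, text) for every <Lookup> element, in document order."""
    root = ET.fromstring(xml_text)
    return [
        (node.get("majorType", ""), "".join(node.itertext()).strip())
        for node in root.iter("Lookup")
    ]

rows = lookup_strings(SAMPLE_XML)
print(rows)
```

Everything outside the `<Lookup>` elements is dropped, which is the "separate needed from unnecessary" step in miniature.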

CSV File Type

Comma-Separated Values: used to present information in tabular form. Useful for analyzing large amounts of data in an easy-to-understand format. The most common program used to open it is Excel.
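
The slides use a C# WinForms program for the XML-to-CSV conversion. As a minimal sketch of the output step only, the Python below writes extracted rows with the standard `csv` module. The column names and sample rows are assumptions for illustration.

```python
import csv
import io

def to_csv(rows, header=("type", "text")):
    """Serialize rows to CSV text; fields containing commas are quoted."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()

# Hypothetical extracted rows: (annotation type, matched text).
rows = [("hormone", "estrogen"), ("symptom", "fatigue, chronic")]
print(to_csv(rows))
```

Using a CSV writer rather than joining strings by hand matters for clinical text, since notes often contain commas and quotation marks.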

Potential Problem:

Regardless of how well the ANNIE tools are utilized and how well the JAPE rules are defined, recall will never be perfect.
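
Recall is the fraction of relevant items that the system actually extracted; precision is the fraction of extracted items that are actually relevant. The sketch below computes both for a made-up example. The note identifiers are hypothetical and do not come from the slides.

```python
def precision_recall(extracted, relevant):
    """Return (precision, recall) for two collections of item identifiers."""
    extracted, relevant = set(extracted), set(relevant)
    true_positives = len(extracted & relevant)
    precision = true_positives / len(extracted) if extracted else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical evaluation: 4 relevant notes, 3 extracted, 2 of them correct.
relevant = {"note-2", "note-3", "note-7", "note-9"}
extracted = {"note-2", "note-3", "note-5"}
p, r = precision_recall(extracted, relevant)
print(f"precision={p:.2f} recall={r:.2f}")
```

Here two of the three extracted notes are correct (precision 2/3), but only two of the four relevant notes were found (recall 1/2).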

Solution: Machine Learning

Machine learning is our best chance to increase the precision of the output. Training a computer to recognize commonly used reporting phraseology will organize extraction better and give more precise, concise outputs. Lucky for us, GATE includes plugins for machine learning.