Visualization of Relational Text Information
for Biomedical Knowledge Discovery
James W. Cooper
IBM T J Watson Research Center
Hawthorne, NY
Overview
Prior work
Java-based text mining
Computation of unnamed relations
Graphical display of relations
Relations between terms
Noun phrase co-occurrence statistics [Roark, Charniak]
Choose seed words and look for terms near them. [Brin] [Gravano, Agichtein]
– Repeat
Biomedical domain
– Blaschke used dictionary of common verbs
– Pustejovsky found inhibit relations
Stevens, Palakal, Mostafa
– Detected abstract-wide co-occurrence using dictionary of genes and useful verbs.
Graphical Displays
BioLayout – protein similarity
ProtInAct – interactive system using yFiles
Zhang – interactive 3D system
Jenssen – gene network
Leroy – GeneScene
BioLayout – Enright and Ouzounis
Spheres represent proteins and lines represent protein similarities.
Five related protein families and their corresponding relationships.
ProInAct – Spencer and Bennett
Proteins clustered by functional interaction
Zhang – Protein interaction mapping
Jenssen – A literature network
Lines connect genes that have co-occurred in 1 or more papers.
Leroy – GeneScene
What would we like to do?
Find scientifically meaningful connections between important terms.
– Such as Swanson’s Raynaud’s disease – fish oil connection.
Allow exploration of relations by user.
Filter the relations by ontology or term types
Perform path analysis
Let the user vary the graphical display.
Data we analyzed
Two sets of patent data
– 584 patents on Viagra and phosphodiesterase inhibitors.
– 1514 patents on quinolones (like Cipro)
Recognized major technical terms in each patent.
Filtered organic chemical nomenclature.
The Talent text mining system
Text Analysis and Language Engineering Tools
– Finds multiword noun phrases
– Does shallow parse
– Can extract NPs and VGs, as well as all other sentence parts
The JTalent Library
Java class library with JNI interface
– To Talent DLL
Creates database load files of terms
– Paragraph
– Sentence
– Offset
– Term type (NP, VG)
TalentShow Demo
The KSS Library
Java class library of functions for
– Accessing a database (DB2, Access)
– Manipulating a search engine
– Manipulating tables of information created by JTalent.
Database Tables
Documents
– Title, author, URL, ID
TermDocs
– Term
– Paragraph
– Sentence
– Offset
– Type
Dictionary of terms, types and IDs
– Such as MeSH
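The three tables above could be laid out roughly as follows. This is a minimal sketch, not the KSS schema: the slide names DB2 and Access, while this example uses an in-memory SQLite database, and the column types, the `doc_id` link from TermDocs to Documents, and the names `term_offset`, `term_id`, and `source` are all assumptions made for illustration.

```python
import sqlite3

# Hypothetical schema: table names and most column names follow the slide;
# column types, doc_id, term_offset, term_id and source are assumptions.
SCHEMA = """
CREATE TABLE Documents (
    id     INTEGER PRIMARY KEY,
    title  TEXT,
    author TEXT,
    url    TEXT
);
CREATE TABLE TermDocs (
    doc_id      INTEGER REFERENCES Documents(id),
    term        TEXT,
    paragraph   INTEGER,
    sentence    INTEGER,
    term_offset INTEGER,
    type        TEXT      -- 'NP' (noun phrase) or 'VG' (verb group)
);
CREATE TABLE Dictionary (
    term_id INTEGER PRIMARY KEY,
    term    TEXT,
    type    TEXT,
    source  TEXT          -- for example 'MeSH'
);
"""


def create_tables():
    """Create the three tables in a fresh in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


conn = create_tables()
conn.execute(
    "INSERT INTO Documents VALUES (1, 'Example patent', 'A. Author', 'http://example.org/1')"
)
conn.execute(
    "INSERT INTO TermDocs VALUES (1, 'phosphodiesterase inhibitor', 2, 5, 17, 'NP')"
)
rows = conn.execute("SELECT term, type FROM TermDocs WHERE doc_id = 1").fetchall()
```

Storing one row per term occurrence keeps the position information (paragraph, sentence, offset) available for later proximity calculations.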
Computing term information
Compute unique terms from TermDocs
Compute frequency
Compute salience
– Based on frequency
– Number of docs they appear in more than once
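The two inputs to salience named above can be gathered in one pass over the term occurrences. The sketch below only computes those inputs; how they are combined into a salience score is not given on the slide, so no formula is attempted. The function name, the `(doc_id, term)` input shape, and the lower-casing of terms are assumptions.

```python
from collections import Counter, defaultdict


def term_statistics(term_docs):
    """Return {term: (frequency, docs_with_repeats)}.

    term_docs is an iterable of (doc_id, term) pairs, one per occurrence.
    docs_with_repeats counts documents where the term appears more than once.
    """
    frequency = Counter()
    per_doc = defaultdict(Counter)
    for doc_id, term in term_docs:
        key = term.lower()          # assumption: terms are compared case-insensitively
        frequency[key] += 1
        per_doc[key][doc_id] += 1

    stats = {}
    for term, total in frequency.items():
        repeats = sum(1 for n in per_doc[term].values() if n > 1)
        stats[term] = (total, repeats)
    return stats


occurrences = [
    (1, "sildenafil"), (1, "sildenafil"), (1, "cGMP"),
    (2, "sildenafil"), (2, "cGMP"), (2, "cGMP"),
    (3, "sildenafil"), (3, "sildenafil"),
]
stats = term_statistics(occurrences)
```

Here "sildenafil" occurs five times and is repeated in two documents, while "cGMP" occurs three times and is repeated in one.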
Compute term relations
Named relations based on abbreviation expansions.
Unnamed relations based on proximity, with weight based on how frequently they occur near each other.
Mutual information weight:
$$ m = \log \frac{\mathrm{paircount} \cdot \mathrm{totalterms}}{\mathrm{freq}_1 \cdot \mathrm{freq}_2} $$
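As an illustration of the proximity weighting, the sketch below counts term pairs and scores them, assuming the weight has the pointwise mutual information form m = log(paircount · totalterms / (freq1 · freq2)). Treating "near each other" as "in the same sentence" is an assumption, as are the natural logarithm and the function names; the slide does not specify the window or the log base.

```python
import math
from collections import Counter
from itertools import combinations


def count_pairs(sentences):
    """Count unordered term pairs that share a sentence.

    sentences is a list of term lists. Same-sentence co-occurrence is an
    assumed stand-in for the proximity test.
    """
    pairs = Counter()
    for terms in sentences:
        for a, b in combinations(sorted(set(terms)), 2):
            pairs[(a, b)] += 1
    return pairs


def mutual_information(pair_count, total_terms, freq1, freq2):
    """m = log(pair_count * total_terms / (freq1 * freq2)), natural log assumed."""
    if min(pair_count, total_terms, freq1, freq2) <= 0:
        raise ValueError("all counts must be positive")
    return math.log(pair_count * total_terms / (freq1 * freq2))


pairs = count_pairs([["a", "b"], ["a", "b", "c"], ["c"]])
weight = mutual_information(pair_count=10, total_terms=10000, freq1=50, freq2=40)
```

A pair that co-occurs exactly as often as chance predicts gets a weight of zero; pairs that co-occur more often than chance get positive weights.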
Tuning Computed relations
Select only terms above a salience threshold.
Keep only relations in which one or both terms are members of an ontology.
Store relations in a database table for rapid access:
Term | weight | term
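The two tuning rules can be expressed as a simple filter over (term, weight, term) triples. This is a sketch under assumptions: the function name, the dictionary of salience scores, and the reading of "above" as strictly greater than the threshold are not taken from the source.

```python
def tune_relations(relations, salience, ontology, threshold):
    """Keep relations whose terms are salient and touch the ontology.

    relations: list of (term1, weight, term2) triples
    salience:  dict mapping term -> salience score (missing terms score 0)
    ontology:  set of terms that belong to an ontology such as MeSH
    """
    kept = []
    for term1, weight, term2 in relations:
        # Rule 1: both terms must be above the salience threshold.
        if salience.get(term1, 0) <= threshold or salience.get(term2, 0) <= threshold:
            continue
        # Rule 2: at least one term must be an ontology member.
        if term1 not in ontology and term2 not in ontology:
            continue
        kept.append((term1, weight, term2))
    return kept


relations = [
    ("sildenafil", 3.9, "phosphodiesterase"),
    ("sildenafil", 1.2, "tablet"),
    ("tablet", 0.8, "coating"),
]
salience = {"sildenafil": 80, "phosphodiesterase": 75, "tablet": 20, "coating": 60}
ontology = {"sildenafil", "phosphodiesterase"}

strict = tune_relations(relations, salience, ontology, threshold=50)
loose = tune_relations(relations, salience, ontology, threshold=10)
```

With the higher threshold only the first relation survives; with the lower one the second also survives, and the third is still dropped because neither of its terms is in the ontology.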
Original System
Visual client
SOAP server
– Queries database to get relations
– Round trip for each new query
Instead, we export the data for the user to visualize as they wish.
Exporting relations
Save relations and ontology information in an XML file.
<relation>
  <term> <iq>78</iq> <source>MeSH</source>
    <relationDocuments>
      <doc>34</doc>
    </relationDocuments>
  </term>
  <term> </term>
</relation>
This XML file is a portable version of the computed relations that we can then use with any number of viewers.
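A viewer could read such a file with a few lines of standard-library code. The sketch below is only an illustration: the slide does not show how the term text itself is stored or what the root element is called, so the `<name>` element, the `<relations>` root, and the sample values are hypothetical.

```python
import xml.etree.ElementTree as ET

# Sample file. <relations> and <name> are hypothetical; <iq>, <source>,
# <relationDocuments> and <doc> follow the slide.
SAMPLE = """
<relations>
  <relation>
    <term>
      <name>sildenafil</name>
      <iq>78</iq>
      <source>MeSH</source>
      <relationDocuments>
        <doc>34</doc>
      </relationDocuments>
    </term>
    <term>
      <name>phosphodiesterase</name>
    </term>
  </relation>
</relations>
"""


def read_relations(xml_text):
    """Return one list of term dictionaries per <relation> element."""
    root = ET.fromstring(xml_text)
    relations = []
    for rel in root.iter("relation"):
        terms = []
        for term in rel.findall("term"):
            iq = term.findtext("iq")
            terms.append({
                "name": (term.findtext("name") or "").strip(),
                "iq": int(iq) if iq is not None else None,
                "source": term.findtext("source"),
                "docs": [int(d.text) for d in term.iter("doc")],
            })
        relations.append(terms)
    return relations


parsed = read_relations(SAMPLE)
```

Missing elements come back as `None` or an empty list, so a term with no ontology information still loads.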
A Graphical Relations Viewer
Creates a Java Relations object for each relation it reads from the XML file.
Inserts them into a Trie structure based on lower-cased first term.
– If there is already a Relation at that point, it adds them to a Vector for that term.
Creates an alphabetical list of all terms in a 2nd Trie.
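The trie-based lookup can be sketched as follows. The viewer itself is written in Java and uses two tries; this Python sketch is a simplification that uses a single trie for both the relation lookup and the prefix listing, with a plain list standing in for the Vector. The class and method names are assumptions.

```python
class Trie:
    """Character trie keyed on lower-cased terms; each node holds a list of values."""

    def __init__(self):
        self.children = {}
        self.values = []    # relations whose first term ends at this node

    def insert(self, key, value):
        node = self
        for ch in key.lower():
            node = node.children.setdefault(ch, Trie())
        node.values.append(value)   # several relations may share one term

    def _find(self, prefix):
        node = self
        for ch in prefix.lower():
            node = node.children.get(ch)
            if node is None:
                return None
        return node

    def get(self, key):
        """All relations stored under exactly this term."""
        node = self._find(key)
        return list(node.values) if node is not None else []

    def keys_with_prefix(self, prefix):
        """Alphabetical list of stored terms that start with prefix."""
        start = self._find(prefix)
        if start is None:
            return []
        found = []
        stack = [(start, prefix.lower())]
        while stack:
            node, key = stack.pop()
            if node.values:
                found.append(key)
            for ch, child in node.children.items():
                stack.append((child, key + ch))
        return sorted(found)


trie = Trie()
for t1, w, t2 in [
    ("Sildenafil", 3.9, "phosphodiesterase"),
    ("sildenafil", 2.1, "cGMP"),
    ("Sulfonamide", 1.5, "quinolone"),
]:
    trie.insert(t1, (t1, w, t2))

matches = trie.keys_with_prefix("s")
related = trie.get("SILDENAFIL")
```

Typing a fragment corresponds to `keys_with_prefix`, and clicking a term corresponds to `get`; both cost time proportional to the length of the typed text rather than the number of stored terms.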
Using the Viewer
When you enter part of a term, it shows all terms starting with that fragment in the left list box.
When you click on a term, it shows all its relations in the right list box.
Lexical Navigation
Displays relations between terms graphically and allows you to explore them without formulating a specific query.
Possible enhancements
Show only terms belonging to an ontology.
Show only higher IQ terms
Show the documents the relations occur in.
Show the ontology reference.
Show computed paths
Show more kinds of named relations.
– Inhibits, expresses
Evaluations of Information Visualization
Few, if any, graphical displays have been evaluated thus far for effectiveness.
Usability studies are hard to construct and carry out.
Intuition seems to show
– That exploration may result in discoveries.
– That relations more than one step apart are best displayed graphically.
It remains to be shown that such visualizations are actually useful.
Differences in Intent
Displays may represent information your system has discovered.
– Gene – protein relations
Or they may represent data from which the user may discover new information.
– New 2nd or 3rd order relationships
These are rather different applications of visualization technology.
Summary
Java-based text mining system
Database of terms and positions
Computation of relations
Export as XML
Graphical relations viewer
The value of such visual interfaces has not yet been established.
Acknowledgements
Bhavani Iyer – XML export
Eric Brown – DictMatcher hash code
Daniel Tunkelang – graphical layout
Bob Mack – paper suggestions