![Page 1: Visualization of Relational Text Information for Biomedical Knowledge Discovery James W. Cooper IBM T J Watson Research Center Hawthorne, NY](https://reader036.vdocuments.mx/reader036/viewer/2022081603/5697c0051a28abf838cc511e/html5/thumbnails/1.jpg)
Visualization of Relational Text Information
for Biomedical Knowledge Discovery
James W. Cooper
IBM T J Watson Research Center
Hawthorne, NY
![Page 2: Visualization of Relational Text Information for Biomedical Knowledge Discovery James W. Cooper IBM T J Watson Research Center Hawthorne, NY](https://reader036.vdocuments.mx/reader036/viewer/2022081603/5697c0051a28abf838cc511e/html5/thumbnails/2.jpg)
Overview
Prior work
Java-based text mining
Computation of unnamed relations
Graphical display of relations
![Page 3: Visualization of Relational Text Information for Biomedical Knowledge Discovery James W. Cooper IBM T J Watson Research Center Hawthorne, NY](https://reader036.vdocuments.mx/reader036/viewer/2022081603/5697c0051a28abf838cc511e/html5/thumbnails/3.jpg)
Relations between terms
Noun phrase co-occurrence statistics [Roark, Charniak]
Choose seed words and look for terms near them. [Brin] [Gravano, Agichtein]
– Repeat
Biomedical domain
– Blaschke used dictionary of common verbs
– Pustejovsky found inhibit relations
Stevens, Palakal, Mostafa
– Detected abstract-wide co-occurrence using dictionary of genes and useful verbs.
![Page 4: Visualization of Relational Text Information for Biomedical Knowledge Discovery James W. Cooper IBM T J Watson Research Center Hawthorne, NY](https://reader036.vdocuments.mx/reader036/viewer/2022081603/5697c0051a28abf838cc511e/html5/thumbnails/4.jpg)
Graphical Displays
BioLayout – protein similarity
ProtInAct – interactive system using yFiles
Zhang – interactive 3D system
Jenssen – gene network
Leroy – GeneScene
![Page 5: Visualization of Relational Text Information for Biomedical Knowledge Discovery James W. Cooper IBM T J Watson Research Center Hawthorne, NY](https://reader036.vdocuments.mx/reader036/viewer/2022081603/5697c0051a28abf838cc511e/html5/thumbnails/5.jpg)
BioLayout – Enright and Ouzounis
Spheres represent proteins and lines represent protein similarities.
Five related protein families and their corresponding relationships.
![Page 6: Visualization of Relational Text Information for Biomedical Knowledge Discovery James W. Cooper IBM T J Watson Research Center Hawthorne, NY](https://reader036.vdocuments.mx/reader036/viewer/2022081603/5697c0051a28abf838cc511e/html5/thumbnails/6.jpg)
ProInAct – Spencer and Bennett
Proteins clustered by functional interaction
![Page 7: Visualization of Relational Text Information for Biomedical Knowledge Discovery James W. Cooper IBM T J Watson Research Center Hawthorne, NY](https://reader036.vdocuments.mx/reader036/viewer/2022081603/5697c0051a28abf838cc511e/html5/thumbnails/7.jpg)
Zhang – Protein interaction mapping
![Page 8: Visualization of Relational Text Information for Biomedical Knowledge Discovery James W. Cooper IBM T J Watson Research Center Hawthorne, NY](https://reader036.vdocuments.mx/reader036/viewer/2022081603/5697c0051a28abf838cc511e/html5/thumbnails/8.jpg)
Jenssen – A literature network
Lines connect genes that have co-occurred in 1 or more papers.
![Page 9: Visualization of Relational Text Information for Biomedical Knowledge Discovery James W. Cooper IBM T J Watson Research Center Hawthorne, NY](https://reader036.vdocuments.mx/reader036/viewer/2022081603/5697c0051a28abf838cc511e/html5/thumbnails/9.jpg)
Leroy – GeneScene
![Page 10: Visualization of Relational Text Information for Biomedical Knowledge Discovery James W. Cooper IBM T J Watson Research Center Hawthorne, NY](https://reader036.vdocuments.mx/reader036/viewer/2022081603/5697c0051a28abf838cc511e/html5/thumbnails/10.jpg)
What would we like to do?
Find scientifically meaningful connections between important terms.
– Such as Swanson’s Raynaud’s disease – fish oil connection.
Allow exploration of relations by the user.
Filter the relations by ontology or term types.
Perform path analysis.
Let the user vary the graphical display.
![Page 11: Visualization of Relational Text Information for Biomedical Knowledge Discovery James W. Cooper IBM T J Watson Research Center Hawthorne, NY](https://reader036.vdocuments.mx/reader036/viewer/2022081603/5697c0051a28abf838cc511e/html5/thumbnails/11.jpg)
Data we analyzed
Two sets of patent data
– 584 patents on Viagra and phosphodiesterase inhibitors.
– 1514 patents on quinolones (like Cipro)
Recognized major technical terms in each patent.
Filtered organic chemical nomenclature.
![Page 12: Visualization of Relational Text Information for Biomedical Knowledge Discovery James W. Cooper IBM T J Watson Research Center Hawthorne, NY](https://reader036.vdocuments.mx/reader036/viewer/2022081603/5697c0051a28abf838cc511e/html5/thumbnails/12.jpg)
The Talent text mining system
Text Analysis and Language Engineering Tools
– Finds multiword noun phrases
– Does shallow parse
– Can extract noun phrases (NPs) and verb groups (VGs), as well as all other sentence parts
![Page 13: Visualization of Relational Text Information for Biomedical Knowledge Discovery James W. Cooper IBM T J Watson Research Center Hawthorne, NY](https://reader036.vdocuments.mx/reader036/viewer/2022081603/5697c0051a28abf838cc511e/html5/thumbnails/13.jpg)
The JTalent Library
Java class library with JNI interface
– To Talent DLL
Creates database load files of terms
– Paragraph
– Sentence
– Offset
– Term type (NP, VG)
![Page 14: Visualization of Relational Text Information for Biomedical Knowledge Discovery James W. Cooper IBM T J Watson Research Center Hawthorne, NY](https://reader036.vdocuments.mx/reader036/viewer/2022081603/5697c0051a28abf838cc511e/html5/thumbnails/14.jpg)
TalentShow Demo
![Page 15: Visualization of Relational Text Information for Biomedical Knowledge Discovery James W. Cooper IBM T J Watson Research Center Hawthorne, NY](https://reader036.vdocuments.mx/reader036/viewer/2022081603/5697c0051a28abf838cc511e/html5/thumbnails/15.jpg)
The KSS Library
Java class library of functions for
– Accessing a database (DB2, Access)
– Manipulating a search engine
– Manipulating tables of information created by JTalent.
![Page 16: Visualization of Relational Text Information for Biomedical Knowledge Discovery James W. Cooper IBM T J Watson Research Center Hawthorne, NY](https://reader036.vdocuments.mx/reader036/viewer/2022081603/5697c0051a28abf838cc511e/html5/thumbnails/16.jpg)
Database Tables
Documents
– Title, author, URL, ID
TermDocs
– Term
– Paragraph
– Sentence
– Offset
– Type
Dictionary of terms, types and IDs
– Such as MeSH
![Page 17: Visualization of Relational Text Information for Biomedical Knowledge Discovery James W. Cooper IBM T J Watson Research Center Hawthorne, NY](https://reader036.vdocuments.mx/reader036/viewer/2022081603/5697c0051a28abf838cc511e/html5/thumbnails/17.jpg)
Computing term information
Compute unique terms from TermDocs
Compute frequency
Compute salience
– Based on frequency
– Number of docs they appear in more than once
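The two ingredients named above can be sketched in Java as follows. This is only an illustration: the `Occurrence` class is a hypothetical stand-in for a row of the TermDocs table, and the slides do not say how the two numbers are combined into a salience score, so the sketch stops at computing them.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class TermStats {
    /** One term occurrence in one document (hypothetical slice of a TermDocs row). */
    static final class Occurrence {
        final String term;
        final int docId;

        Occurrence(String term, int docId) {
            this.term = term;
            this.docId = docId;
        }
    }

    /** Total number of occurrences of each term across all documents. */
    static Map<String, Integer> frequency(List<Occurrence> rows) {
        Map<String, Integer> freq = new HashMap<>();
        for (Occurrence o : rows) {
            freq.merge(o.term, 1, Integer::sum);
        }
        return freq;
    }

    /** For each term, the number of documents in which it occurs more than once. */
    static Map<String, Integer> docsWithRepeats(List<Occurrence> rows) {
        // term -> (docId -> occurrences of the term in that document)
        Map<String, Map<Integer, Integer>> perDoc = new HashMap<>();
        for (Occurrence o : rows) {
            perDoc.computeIfAbsent(o.term, t -> new HashMap<>())
                  .merge(o.docId, 1, Integer::sum);
        }
        Map<String, Integer> result = new HashMap<>();
        for (Map.Entry<String, Map<Integer, Integer>> e : perDoc.entrySet()) {
            int docs = 0;
            for (int count : e.getValue().values()) {
                if (count > 1) {
                    docs++;
                }
            }
            result.put(e.getKey(), docs);
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up rows, for illustration only.
        List<Occurrence> rows = List.of(
            new Occurrence("quinolone", 1),
            new Occurrence("quinolone", 1),
            new Occurrence("quinolone", 2),
            new Occurrence("inhibitor", 2));
        System.out.println(frequency(rows));
        System.out.println(docsWithRepeats(rows));
    }
}
```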
![Page 18: Visualization of Relational Text Information for Biomedical Knowledge Discovery James W. Cooper IBM T J Watson Research Center Hawthorne, NY](https://reader036.vdocuments.mx/reader036/viewer/2022081603/5697c0051a28abf838cc511e/html5/thumbnails/18.jpg)
Compute term relations
Named relations based on abbreviation expansions.
Unnamed relations based on proximity, with weight based on how frequently they occur near each other.
Mutual information weight:

$$m = \log \frac{\mathit{totalterms} \cdot \mathit{paircount}}{\mathit{freq}_1 \cdot \mathit{freq}_2}$$
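As a rough Java sketch, the weight m = log(totalterms × paircount / (freq1 × freq2)) could be computed as below. The slide does not state the base of the logarithm, so the natural log is an assumption here, and the class and parameter names are illustrative rather than taken from the KSS library.

```java
class MutualInformation {
    /**
     * Mutual information weight for a pair of terms.
     * totalTerms: number of terms in the collection
     * pairCount:  number of times the two terms occur near each other
     * freq1, freq2: individual frequencies of the two terms
     * Natural log is an assumption; the slide does not name a base.
     */
    static double weight(long totalTerms, long pairCount, long freq1, long freq2) {
        if (totalTerms <= 0 || pairCount <= 0 || freq1 <= 0 || freq2 <= 0) {
            throw new IllegalArgumentException("all counts must be positive");
        }
        // Convert to double before multiplying to avoid long overflow.
        return Math.log(((double) totalTerms * pairCount) / ((double) freq1 * freq2));
    }

    public static void main(String[] args) {
        // Made-up counts: the pair occurs together twice as often as chance would suggest.
        System.out.println(weight(1000, 20, 100, 100));
    }
}
```

A weight of zero means the pair co-occurs about as often as independent terms would; larger positive values indicate a stronger association.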
![Page 19: Visualization of Relational Text Information for Biomedical Knowledge Discovery James W. Cooper IBM T J Watson Research Center Hawthorne, NY](https://reader036.vdocuments.mx/reader036/viewer/2022081603/5697c0051a28abf838cc511e/html5/thumbnails/19.jpg)
Tuning Computed relations
Select only terms above a salience threshold.
Keep only relations in which one or both terms are members of an ontology.
Store relations in a database table for rapid access:
Term | weight | term
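A minimal Java sketch of this tuning step follows. It assumes that both filters are applied together and that both terms of a relation must clear the salience threshold; the slides do not spell out either point. The `Relation` class simply mirrors the Term | weight | term row, and all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

class RelationFilter {
    /** Mirrors one "Term | weight | term" row (hypothetical shape). */
    static final class Relation {
        final String term1;
        final double weight;
        final String term2;

        Relation(String term1, double weight, String term2) {
            this.term1 = term1;
            this.weight = weight;
            this.term2 = term2;
        }
    }

    /**
     * Keeps relations whose terms are both above the salience threshold
     * (assumption) and where at least one term is in the ontology.
     */
    static List<Relation> filter(List<Relation> relations,
                                 Map<String, Double> salience,
                                 double threshold,
                                 Set<String> ontology) {
        List<Relation> kept = new ArrayList<>();
        for (Relation r : relations) {
            boolean salient = salience.getOrDefault(r.term1, 0.0) > threshold
                    && salience.getOrDefault(r.term2, 0.0) > threshold;
            boolean inOntology = ontology.contains(r.term1) || ontology.contains(r.term2);
            if (salient && inOntology) {
                kept.add(r);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Made-up data, for illustration only.
        List<Relation> relations = List.of(
            new Relation("a", 1.0, "b"),
            new Relation("a", 2.0, "c"));
        Map<String, Double> salience = Map.of("a", 5.0, "b", 5.0, "c", 1.0);
        for (Relation r : filter(relations, salience, 2.0, Set.of("b"))) {
            System.out.println(r.term1 + " | " + r.weight + " | " + r.term2);
        }
    }
}
```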
![Page 20: Visualization of Relational Text Information for Biomedical Knowledge Discovery James W. Cooper IBM T J Watson Research Center Hawthorne, NY](https://reader036.vdocuments.mx/reader036/viewer/2022081603/5697c0051a28abf838cc511e/html5/thumbnails/20.jpg)
Original System
Visual client
SOAP server
– Queries database to get relations
– Round trip for each new query
Instead, we export the data for the user to visualize as they wish.
![Page 21: Visualization of Relational Text Information for Biomedical Knowledge Discovery James W. Cooper IBM T J Watson Research Center Hawthorne, NY](https://reader036.vdocuments.mx/reader036/viewer/2022081603/5697c0051a28abf838cc511e/html5/thumbnails/21.jpg)
Exporting relations
Save relations and ontology information in an XML file.
<relation>
  <term>
    <iq>78</iq>
    <source>MeSH</source>
    <relationDocuments>
      <doc>34</doc>
    </relationDocuments>
  </term>
  <term> </term>
</relation>
This XML file is a portable version of the computed relations that we can then use with any number of viewers.
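A viewer could read such a file with the standard Java DOM parser, roughly as sketched below. The element names `relation`, `term`, `iq`, `source`, `relationDocuments` and `doc` come from the slide; the wrapping `<relations>` root element and the class and method names are assumptions made for this example, since the slide shows only a single fragment.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class RelationXmlReader {
    // Sample input; the <relations> root element is an assumption.
    static final String SAMPLE =
        "<relations><relation>"
        + "<term><iq>78</iq><source>MeSH</source>"
        + "<relationDocuments><doc> 34</doc></relationDocuments></term>"
        + "<term></term>"
        + "</relation></relations>";

    /** Parses the XML text into a DOM document, wrapping checked exceptions. */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse relations XML", e);
        }
    }

    /** Text of the first descendant with the given tag, or "" if absent. */
    static String text(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? "" : nodes.item(0).getTextContent().trim();
    }

    /** One summary line per <term>: its iq, its source, and its document ids. */
    static List<String> summarizeTerms(String xml) {
        List<String> out = new ArrayList<>();
        NodeList terms = parse(xml).getElementsByTagName("term");
        for (int i = 0; i < terms.getLength(); i++) {
            Element term = (Element) terms.item(i);
            List<String> docs = new ArrayList<>();
            NodeList docNodes = term.getElementsByTagName("doc");
            for (int j = 0; j < docNodes.getLength(); j++) {
                docs.add(docNodes.item(j).getTextContent().trim());
            }
            out.add("iq=" + text(term, "iq")
                + " source=" + text(term, "source")
                + " docs=" + docs);
        }
        return out;
    }

    public static void main(String[] args) {
        summarizeTerms(SAMPLE).forEach(System.out::println);
    }
}
```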
![Page 22: Visualization of Relational Text Information for Biomedical Knowledge Discovery James W. Cooper IBM T J Watson Research Center Hawthorne, NY](https://reader036.vdocuments.mx/reader036/viewer/2022081603/5697c0051a28abf838cc511e/html5/thumbnails/22.jpg)
A Graphical Relations Viewer
Creates a Java Relations object for each relation it reads from the XML file.
Inserts them into a Trie structure based on lower cased first term.
– If there is already a Relation at that point, it adds them to a Vector for that term.
Creates an alphabetical list of all terms in a 2nd Trie.
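The trie idea can be sketched in Java as below. This is not the viewer's actual code: it folds the two tries into one for brevity, stores only the related term's name instead of a full Relations object, and uses an `ArrayList` where the slide mentions a `Vector`. Sorted child maps make prefix lookups come back in alphabetical order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

class RelationTrie {
    private static final class Node {
        // TreeMap keeps children sorted, so traversal is alphabetical.
        final Map<Character, Node> children = new TreeMap<>();
        // All relations stored under the term that ends at this node.
        final List<String> relations = new ArrayList<>();
    }

    private final Node root = new Node();

    /** Stores a relation under the lower-cased first term. */
    void insert(String firstTerm, String relatedTerm) {
        Node node = root;
        for (char c : firstTerm.toLowerCase(Locale.ROOT).toCharArray()) {
            node = node.children.computeIfAbsent(c, k -> new Node());
        }
        node.relations.add(relatedTerm);
    }

    /** All relations stored for a term, ignoring case. */
    List<String> relationsOf(String term) {
        Node node = find(term.toLowerCase(Locale.ROOT));
        return node == null ? List.of() : List.copyOf(node.relations);
    }

    /** All stored terms that start with the given fragment, in alphabetical order. */
    List<String> termsWithPrefix(String prefix) {
        String p = prefix.toLowerCase(Locale.ROOT);
        List<String> out = new ArrayList<>();
        Node node = find(p);
        if (node != null) {
            collect(node, new StringBuilder(p), out);
        }
        return out;
    }

    private Node find(String key) {
        Node node = root;
        for (char c : key.toCharArray()) {
            node = node.children.get(c);
            if (node == null) {
                return null;
            }
        }
        return node;
    }

    private void collect(Node node, StringBuilder path, List<String> out) {
        if (!node.relations.isEmpty()) {
            out.add(path.toString());
        }
        for (Map.Entry<Character, Node> e : node.children.entrySet()) {
            path.append(e.getKey());
            collect(e.getValue(), path, out);
            path.setLength(path.length() - 1);
        }
    }

    public static void main(String[] args) {
        // Made-up relations, for illustration only.
        RelationTrie trie = new RelationTrie();
        trie.insert("Sildenafil", "phosphodiesterase");
        trie.insert("sildenafil", "erectile dysfunction");
        trie.insert("silica", "gel");
        System.out.println(trie.termsWithPrefix("Sil"));
        System.out.println(trie.relationsOf("SILDENAFIL"));
    }
}
```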
![Page 23: Visualization of Relational Text Information for Biomedical Knowledge Discovery James W. Cooper IBM T J Watson Research Center Hawthorne, NY](https://reader036.vdocuments.mx/reader036/viewer/2022081603/5697c0051a28abf838cc511e/html5/thumbnails/23.jpg)
Using the Viewer
When you enter part of a term, it shows all terms starting with that fragment in the left list box.
When you click on a term, it shows all its relations in the right list box.
![Page 24: Visualization of Relational Text Information for Biomedical Knowledge Discovery James W. Cooper IBM T J Watson Research Center Hawthorne, NY](https://reader036.vdocuments.mx/reader036/viewer/2022081603/5697c0051a28abf838cc511e/html5/thumbnails/24.jpg)
Lexical Navigation
Displays relations between terms graphically and allows you to explore them without formulating a specific query.
![Page 25: Visualization of Relational Text Information for Biomedical Knowledge Discovery James W. Cooper IBM T J Watson Research Center Hawthorne, NY](https://reader036.vdocuments.mx/reader036/viewer/2022081603/5697c0051a28abf838cc511e/html5/thumbnails/25.jpg)
Possible enhancements
Show only terms belonging to an ontology.
Show only higher IQ terms.
Show the documents the relations occur in.
Show the ontology reference.
Show computed paths.
Show more kinds of named relations.
– Inhibits, expresses
![Page 26: Visualization of Relational Text Information for Biomedical Knowledge Discovery James W. Cooper IBM T J Watson Research Center Hawthorne, NY](https://reader036.vdocuments.mx/reader036/viewer/2022081603/5697c0051a28abf838cc511e/html5/thumbnails/26.jpg)
Evaluations of Information Visualization
Few, if any, graphical displays have been evaluated thus far for effectiveness.
Usability studies are hard to construct and carry out.
Intuition seems to show that
– Exploration may result in discoveries.
– Relations more than one step apart seem best displayed graphically.
It remains to be shown that such visualizations are actually useful.
![Page 27: Visualization of Relational Text Information for Biomedical Knowledge Discovery James W. Cooper IBM T J Watson Research Center Hawthorne, NY](https://reader036.vdocuments.mx/reader036/viewer/2022081603/5697c0051a28abf838cc511e/html5/thumbnails/27.jpg)
Differences in Intent
Displays may represent information your system has discovered.
– Gene – protein relations
Or they may represent data from which the user may discover new information.
– New 2nd or 3rd order relationships
These are rather different applications of visualization technology.
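To make "2nd order relationships" concrete, here is a small Java sketch that lists terms two steps away from a starting term but not directly related to it. The graph representation, the method names and the sample pairs are all assumptions for illustration; the sample data only loosely echoes the Swanson example mentioned earlier in the talk.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class SecondOrder {
    /** Builds an undirected graph from pairs of related terms. */
    static Map<String, Set<String>> buildGraph(String[][] pairs) {
        Map<String, Set<String>> graph = new HashMap<>();
        for (String[] p : pairs) {
            graph.computeIfAbsent(p[0], k -> new TreeSet<>()).add(p[1]);
            graph.computeIfAbsent(p[1], k -> new TreeSet<>()).add(p[0]);
        }
        return graph;
    }

    /** Terms reachable in exactly two steps that are not direct neighbors of start. */
    static Set<String> secondOrder(Map<String, Set<String>> graph, String start) {
        Set<String> direct = graph.getOrDefault(start, Set.of());
        Set<String> result = new TreeSet<>();
        for (String middle : direct) {
            for (String far : graph.getOrDefault(middle, Set.of())) {
                if (!far.equals(start) && !direct.contains(far)) {
                    result.add(far);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Illustrative pairs only, not data from the patent collections.
        String[][] pairs = {
            {"Raynaud's disease", "blood viscosity"},
            {"blood viscosity", "fish oil"},
            {"Raynaud's disease", "vasoconstriction"},
        };
        System.out.println(secondOrder(buildGraph(pairs), "Raynaud's disease"));
    }
}
```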
![Page 28: Visualization of Relational Text Information for Biomedical Knowledge Discovery James W. Cooper IBM T J Watson Research Center Hawthorne, NY](https://reader036.vdocuments.mx/reader036/viewer/2022081603/5697c0051a28abf838cc511e/html5/thumbnails/28.jpg)
Summary
Java-based text mining system
Database of terms and positions
Computation of relations
Export as XML
Graphical relations viewer
The value of such visual interfaces has not yet been established.
![Page 29: Visualization of Relational Text Information for Biomedical Knowledge Discovery James W. Cooper IBM T J Watson Research Center Hawthorne, NY](https://reader036.vdocuments.mx/reader036/viewer/2022081603/5697c0051a28abf838cc511e/html5/thumbnails/29.jpg)
Acknowledgements
Bhavani Iyer – XML export
Eric Brown – DictMatcher hash code
Daniel Tunkelang – graphical layout
Bob Mack – paper suggestions