Delroy Cameron's dissertation defense: a context-driven subgraph model for literature-based...
DESCRIPTION
Literature-Based Discovery (LBD) refers to the process of uncovering hidden connections that are implicit in scientific literature. Numerous hypotheses have been generated from scientific literature, and these have influenced innovations in diagnosis, treatment, prevention and overall public health. However, much of the existing research on discovering hidden connections among concepts has used distributional statistics and graph-theoretic measures to capture implicit associations. Such metrics do not explicitly capture the semantics of hidden connections. ... While effective in some situations, the practice of relying on domain expertise, structured background knowledge and heuristics to complement distributional and graph-theoretic approaches has serious limitations. ...
This dissertation proposes an innovative context-driven, automatic subgraph creation method for finding hidden and complex associations among concepts, along multiple thematic dimensions. It outlines definitions for context and shared context, based on implicit and explicit (or formal) semantics, which compensate for deficiencies in statistical and graph-based metrics. It also eliminates the need for heuristics a priori. An evidence-based evaluation of the proposed framework showed that 8 out of 9 existing scientific discoveries could be recovered using this approach. Additionally, insights into the meaning of associations could be obtained using provenance provided by the system. In a statistical evaluation to determine the interestingness of the generated subgraphs, it was observed that an arbitrary association is mentioned in only approximately 4 articles in MEDLINE, on average. These results suggest that leveraging implicit and explicit context, as defined in this dissertation, advances the state of the art in LBD research.
Ph.D. Committee: Drs. Amit Sheth (Advisor), TK Prasad, Michael Raymer, Ramakanth Kavuluru (UKY), Thomas C. Rindflesch (NLM) and Varun Bhagwan (Yahoo! Labs)
Relevant Publications (more at: http://knoesis.wright.edu/students/delroy/)
D. Cameron, R. Kavuluru, T. C. Rindflesch, O. Bodenreider, A. P. Sheth, K. Thirunarayan. Leveraging Distributional Semantics for Domain Agnostic Literature-Based Discovery (under preparation)
D. Cameron, O. Bodenreider, H. Yalamanchili, T. Danh, S. Vallabhaneni, K. Thirunarayan, A. P. Sheth, T. C. Rindflesch. A Graph-based Recovery and Decomposition of Swanson’s Hypothesis using Semantic Predications. Journal of Biomedical Informatics (JBI13), 46(2): 238–251, 2013
D. Cameron, R. Kavuluru, O. Bodenreider, P. N. Mendes, A. P. Sheth, K. Thirunarayan. Semantic Predications for Complex Information Needs in Biomedical Literature. International Bioinformatics and Biomedical Conference (BIBM11), pp. 512–519, 2011 (acceptance rate=19.4%)
D. Cameron, P. N. Mendes, A. P. Sheth, V. Chan. Semantics-empowered Text Exploration for Knowledge Discovery. ACM Southeast Conference (ACMSE10), 14, 2010
TRANSCRIPT
A CONTEXT-DRIVEN SUBGRAPH MODEL FOR LITERATURE-BASED DISCOVERY
PH.D. DISSERTATION DEFENSE
DELROY CAMERON
AUGUST 18, 2014
PH.D. COMMITTEE
AMIT P. SHETH (ADVISOR)
KRISHNAPRASAD THIRUNARAYAN
MICHAEL RAYMER
RAMAKANTH KAVULURU (UKY)
THOMAS C. RINDFLESCH (NIH)
VARUN BHAGWAN (YAHOO! LABS)
All truths are easy to understand once they are discovered; the point is to discover them. (Galileo Galilei, 1564–1642)
2
Historical Perspectives
Walter Sutton (1877–1916)
Theodor Boveri (1862–1915)
Gregor Johann Mendel (1822–1884)
Mendelian Laws of Inheritance (1866)
Boveri-Sutton Chromosome Theory (1903)
3
Science of Making Discoveries
Discovery
Information Processing System
×What is promising?
4
Thesis Statement
An information processing system that leverages rich representations of textual content from scientific literature based on implicit and explicit context can provide effective means for literature-based discovery.
5
Motivation
[Timeline figure: Rofecoxib (Vioxx), Merck & Co.]
1999: Rofecoxib TREATS Osteoarthritis
Years marked on the timeline: 2002, 2004, 2005, 2007, 2011, 2013
Events shown: Increased risk of Heart Attack; Confirmed by Clinical Trial; Vioxx Withdrawn; $254.3 million Settlement; $4.85 billion Settlement; $950 million Settlement; $23 million Settlement
6
Motivation
Literature-Based Discovery (LBD)
7
Literature-Based Discovery (LBD)
[Figure: timeline of LBD systems; years marked: 1986, 1996, 1999, 2001, 2003, 2004, 2005, 2006, 2007, 2010, 2011, 2013]
Models: ABC Model (A, B, C); AnC Model (A; B1, B2, ..., Bi; C); Context-Driven Subgraph Model
Photo source: Wikipedia - http://en.wikipedia.org/wiki/Don_R._Swanson
Technique types: Keyword-based, Concept-based, Relations-based, Graph-based, Hybrid
Systems and their techniques, as labeled on the slide:
ARROWSMITH v1: Term Frequency
IRIDESCENT: Term Co-occurrence
DAD: MetaMap, UMLS
Litlinker: MeSH, UMLS, Rules, Level of Support
Manjal: UMLS, MeSH, Topic Profiles, TF-IDF
Rajolink: MeSH, Rarity
BioSbKDS: UMLS Relations, MeSH
BITOLA: UMLS, MeSH, Assoc. Rules, Confidence
ACS (2004): MeSH, Hebbian Learning
SemBT: Semantic Predications, Level of Support
Discovery Browsing: Degree Centrality, Cooperative Reciprocity, Manual
ARROWSMITH v2: 8 Features (2007)
Semantic MEDLINE: Summarization, Discovery Browsing
Epiphanet: Predications-based Semantic Indexing
CoPub: Keywords, Mutual Information
Discovery Patterns
Contribution #1: Context-Driven Subgraph Model for LBD
Predicates shown in the example diagrams: CAUSES, INHIBITS, DISRUPTS, PRODUCES, STIMULATES, ISA, TREATS
Literature-based discovery refers to the use of papers and other academic publications (the “literature”) to find new relationships between existing knowledge (the “discovery”).
Definition courtesy of Wikipedia: http://en.wikipedia.org/wiki/Literature-based_discovery
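The ABC Model named on this slide can be illustrated with a minimal sketch. It assumes a toy corpus in which each document is reduced to a set of terms; the function name `abc_candidates` and the four sample documents are hypothetical, and plain co-occurrence stands in for the more refined measures used by the systems on the timeline.

```python
from collections import defaultdict


def abc_candidates(docs, a, c):
    """Return intermediate terms B that co-occur with both A and C."""
    cooccurs = defaultdict(set)
    for terms in docs:
        for term in terms:
            cooccurs[term] |= set(terms) - {term}
    # B must be linked to A and to C, and must be neither of them.
    return sorted((cooccurs[a] & cooccurs[c]) - {a, c})


# Hypothetical toy corpus: A and C never appear in the same document.
docs = [
    {"dietary fish oils", "platelet aggregation"},
    {"dietary fish oils", "blood viscosity"},
    {"platelet aggregation", "raynaud syndrome"},
    {"blood viscosity", "raynaud syndrome"},
]
intermediates = abc_candidates(docs, "dietary fish oils", "raynaud syndrome")
print(intermediates)
```

The sketch only finds candidate intermediates; it says nothing about what the A-B and B-C links mean, which is the gap the relations-based and subgraph-based approaches on this slide address.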
8
Application: Raynaud Syndrome – Fish Oil
[Figure: subgraph linking Dietary Fish Oils to Raynaud Syndrome. Concepts: Dietary Fish Oils, Prostaglandin, Prostaglandin I3, Epoprostenol, Platelet Aggregation, Raynaud Syndrome. Predicates: ISA, CONVERTS_TO, DISRUPTS, STIMULATES, TREATS, CAUSES]
D. Cameron, O. Bodenreider, H. Yalamanchili, T. Danh, S. Vallabhaneni, K. Thirunarayan, A. P. Sheth, T. C. Rindflesch. A Graph-based Recovery and Decomposition of Swanson’s Hypothesis using Semantic Predications. Journal of Biomedical Informatics (JBI13). 46(2): 238–251, 2013.
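[Figure: the same association at three levels of representation]
Keyword/Concept-based: Dietary Fish Oils, Platelet Aggregation, Raynaud Syndrome
Relations-based: Dietary Fish Oils DISRUPTS Platelet Aggregation; Platelet Aggregation CAUSES Raynaud Syndrome
Subgraph-based: Dietary Fish Oils, Platelet Aggregation, Raynaud Syndrome, with inferred predicates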
9
Comparison
Scenario: Raynaud Syndrome – Dietary Fish Oils
Intermediate | Cameron [19] | Srinivasan [88, 89] | Weeber [101, 102] | Gordon [36, 37, 38] | Hristovski [40] | Ramakrishnan [72]*
Blood Viscosity × × × × × ?
Platelet Aggregation × × × × × ?
Vascular Reactivity × × × × ?
Table 1: Comparison of intermediates rediscovered for Raynaud Syndrome – Dietary Fish Oil
[Figure: manually created subgraphs for Raynaud Syndrome – Dietary Fish Oils]
Platelet Aggregation subgraph. Concepts: Dietary Fish Oils, Prostaglandins, Prostacyclin (PGI2), Prostaglandin I3 (PGI3), Platelet Aggregation, Raynaud Syndrome. Predicates: ISA, CONVERTS_TO, DISRUPTS, STIMULATES, TREATS, CAUSES
Blood Viscosity subgraph. Concepts: Dietary Fish Oils, Fatty Acid, Essential Fatty Acid, Triglyceride, Lipid, Blood Viscosity, Cellular Activity, Blood Physiology, Tissue Function, Raynaud Syndrome. Predicates: ISA, DISRUPTS, CAUSES, INHIBITS, AFFECTS
Problem
How to automate this?
D. Cameron, O. Bodenreider, H. Yalamanchili, T. Danh, S. Vallabhaneni, K. Thirunarayan, A. P. Sheth, T. C. Rindflesch. A Graph-based Recovery and Decomposition of Swanson’s Hypothesis using Semantic Predications. Journal of Biomedical Informatics (JBI13). 46(2): 238–251, 2013.
[Figure: Vasoconstriction subgraph. Concepts: Dietary Fish Oils, Prostaglandin I3 (PGI3), Prostacyclin (PGI2), Vasoconstriction, Raynaud Syndrome. Predicates: ISA, CONVERTS_TO, DISRUPTS, INHIBITS, AFFECTS, TREATS, CAUSES]
Literature-Based Discovery
Context-Driven Subgraph Model
Foundations
Automatic Subgraph Creation
Experimental Results
Dissertation Contributions
Knowledge Exploration
Limitations & Future Work
PREDICATIONS GRAPH
13
. . .
Subgraph Model
Predications Graph (G)
Candidate Graph (RG)
Subgraphs (SG)
No two contexts are the same
R(s,t)
R(s,t)(c1), R(s,t)(c2), . . ., R(s,t)(ck)
What is context?
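The first step of the subgraph model, going from the predications graph G to candidate paths between a source s and a target t, can be sketched as follows. This is a minimal illustration and not the dissertation's implementation: the adjacency-list layout, the function name `simple_paths`, the `max_hops` limit, and the toy predications are all assumptions.

```python
def simple_paths(graph, source, target, max_hops=3):
    """Enumerate acyclic predication paths from source to target (depth-first)."""
    paths = []

    def walk(node, path, visited):
        if node == target:
            paths.append(list(path))
            return
        if len(path) == max_hops:
            return
        for predicate, nxt in graph.get(node, []):
            if nxt not in visited:
                path.append((node, predicate, nxt))
                walk(nxt, path, visited | {nxt})
                path.pop()

    walk(source, [], {source})
    return paths


# Hypothetical predications graph: subject -> [(predicate, object), ...]
graph = {
    "Dietary Fish Oils": [
        ("INHIBITS", "Platelet Aggregation"),
        ("INHIBITS", "Blood Viscosity"),
    ],
    "Platelet Aggregation": [("CAUSES", "Raynaud Syndrome")],
    "Blood Viscosity": [("CAUSES", "Raynaud Syndrome")],
}
candidate_paths = simple_paths(graph, "Dietary Fish Oils", "Raynaud Syndrome")
for path in candidate_paths:
    print(path)
```

Each path is a list of (subject, predicate, object) triples. Grouping such paths into subgraphs requires a notion of context, which the following slides develop.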
15
• Path Relatedness
• Semantic Predication Context
Context Distribution Assumption: The context of a semantic predication can be expressed as the distribution of all MeSH descriptors associated with all articles that contain it.
Semantic Underpinnings
Relational Semantic Summary
Textual Semantic Summary
Concept-Level Semantic Summary
Interchangeability Assumption: The concept-level and relational semantic summary of a MEDLINE article are interchangeable.
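The Context Distribution Assumption can be sketched in a few lines. The example assumes two lookup tables that are not described on the slide: one mapping article identifiers to the predications extracted from them, and one mapping article identifiers to their MeSH descriptors. The identifiers `a1` to `a3` and the function name `predication_context` are hypothetical.

```python
from collections import Counter


def predication_context(predication, article_predications, article_mesh):
    """Count MeSH descriptors over all articles that contain the predication."""
    context = Counter()
    for article_id, predications in article_predications.items():
        if predication in predications:
            context.update(article_mesh.get(article_id, []))
    return context


# Hypothetical toy data (identifiers and descriptor assignments are made up).
fish_oil = ("Dietary Fish Oils", "INHIBITS", "Platelet Aggregation")
other = ("Magnesium", "INHIBITS", "Serotonin")
article_predications = {"a1": {fish_oil}, "a2": {fish_oil, other}, "a3": {other}}
article_mesh = {
    "a1": ["Platelet Aggregation", "Fish Oils"],
    "a2": ["Platelet Aggregation", "Epoprostenol"],
    "a3": ["Migraine Disorders"],
}
context = predication_context(fish_oil, article_predications, article_mesh)
print(context)
```

The resulting counter is the descriptor distribution for one predication; article `a3` contributes nothing because it does not contain the predication.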
16
Linguistic Underpinnings
Linguistic items with similar distributions have similar meanings
“You shall know a word by the company it keeps”
– J. R. Firth 1957
Semantic Predications with shared contexts in their distributions are related
Distributional Semantics
Context-sensitive nature of meaning
18
MeSH Hierarchy
Automatic Subgraph Creation
[Figure: two paths pi and pj over nodes m1, m2, m5, m7, m8, m9 (for example m1 m7 m2 m8 and m1 m5 m9 m8), compared by Semantic Relatedness of MeSH Context Vectors]
Contribution #2: Context of a path as a vector of MeSH Descriptors
19
Path Relatedness
Objective #1: Maximize weights of In-Context Descriptors
Objective #2: Minimize weights of Out-Of-Context Descriptors
p – path; t – semantic predication
[Figure: weighted MeSH context vectors C(pi), over descriptors m1–m5, and C(pj), over descriptors m1, m2, m6–m13, aligned over the combined descriptor set m1–m13; a descriptor absent from a path has weight 0]
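A minimal sketch of how two path contexts might be built and aligned over a common descriptor set is shown below. How the dissertation aggregates predication contexts into a path context is not recoverable from the slide, so summing the per-predication counts is an assumption, as are the function names and the toy weights.

```python
from collections import Counter


def path_context(predication_contexts):
    """Assumed aggregation: sum the descriptor counts of a path's predications."""
    total = Counter()
    for context in predication_contexts:
        total.update(context)
    return total


def align(context_i, context_j):
    """Align two contexts over the union of their descriptors, zero-filling gaps."""
    descriptors = sorted(set(context_i) | set(context_j))
    vector_i = [context_i.get(d, 0) for d in descriptors]
    vector_j = [context_j.get(d, 0) for d in descriptors]
    return descriptors, vector_i, vector_j


# Hypothetical weights for two paths pi and pj.
c_pi = path_context([Counter(m1=3, m2=2), Counter(m2=1, m3=5)])
c_pj = path_context([Counter(m1=1, m6=2)])
descriptors, v_pi, v_pj = align(c_pi, c_pj)
print(descriptors, v_pi, v_pj)
```

Descriptors with a nonzero weight in both vectors are the in-context descriptors; the rest are out-of-context for this pair of paths.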
20
Path Relatedness: Shared Context
[Figure: binary context vectors C(pi) and C(pj) over descriptors m1–m13; descriptors such as Platelet aggregation, Platelet activation, Epoprostenol, Platelet adhesiveness and Prostaglandins (among m3, m4, m5, m9–m13) are not identical but are related through the MeSH hierarchy]
G-Tree (MeSH) nodes shown: Circulatory and respiratory physiological phenomena; Blood physiological phenomena; Blood physiological process; hemostasis; platelet aggregation; platelet adhesiveness; platelet activation
D-Tree (MeSH) nodes shown: Lipids; Fatty Acids; Fatty Acids, Unsaturated; Arachidonic Acids; Eicosanoids; Prostaglandins; Prostaglandins I; Epoprostenol
Contribution #3: Structured Background Knowledge for computing shared context of paths
C(pi)
C(pj)
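One way to use a hierarchy to relate non-identical descriptors is a Dice similarity over ancestor sets, sketched below. The deck names Dice Similarity as the metric for MeSH semantic similarity but does not show its exact form, so this particular formulation is an assumption. The `parent` map is a hypothetical, simplified stand-in for the MeSH tree and does not reproduce actual MeSH tree numbers.

```python
def ancestors(descriptor, parent):
    """Return the descriptor together with all of its ancestors."""
    chain = {descriptor}
    while descriptor in parent:
        descriptor = parent[descriptor]
        chain.add(descriptor)
    return chain


def dice_similarity(a, b, parent):
    """Dice coefficient over the ancestor sets of two descriptors."""
    set_a, set_b = ancestors(a, parent), ancestors(b, parent)
    return 2 * len(set_a & set_b) / (len(set_a) + len(set_b))


# Hypothetical child -> parent edges (illustrative only, not real MeSH data).
parent = {
    "platelet aggregation": "hemostasis",
    "platelet activation": "hemostasis",
    "hemostasis": "blood physiological phenomena",
}
related = dice_similarity("platelet aggregation", "platelet activation", parent)
unrelated = dice_similarity("platelet aggregation", "lipids", parent)
print(related, unrelated)
```

Under this sketch, two descriptors count as shared context when their similarity exceeds a threshold, which the deck says was set manually.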
21
Path Relatedness Score
*Dictionary of Distances, Elena Deza, Michel-Marie Deza, Elsevier, 2006
22
Hierarchical Agglomerative Clustering
[Figure: A–C paths grouped into buckets over iterations 1 . . . n; each iteration applies Bucket Population and then Bucket Merging, governed by the Path Relatedness Threshold]
1. Bucket Population
2. Bucket Merging
3. Subgraph Ranking
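The first two steps above can be sketched as a simple threshold-based agglomerative procedure. The slide does not give the exact assignment or merging rules, so the ones used here (join the first bucket containing a sufficiently related path; merge any two buckets whose similarity reaches the threshold) are assumptions, as are the function names and the toy relatedness and similarity functions.

```python
def populate_buckets(paths, relatedness, threshold):
    """Step 1: put each path in the first bucket holding a related path."""
    buckets = []
    for path in paths:
        for bucket in buckets:
            if any(relatedness(path, other) >= threshold for other in bucket):
                bucket.append(path)
                break
        else:
            buckets.append([path])
    return buckets


def merge_buckets(buckets, similarity, threshold):
    """Step 2: repeatedly merge bucket pairs whose similarity meets the threshold."""
    buckets = [list(bucket) for bucket in buckets]
    merged = True
    while merged:
        merged = False
        for i in range(len(buckets)):
            for j in range(i + 1, len(buckets)):
                if similarity(buckets[i], buckets[j]) >= threshold:
                    buckets[i].extend(buckets.pop(j))
                    merged = True
                    break
            if merged:
                break
    return buckets


# Toy stand-ins: a path is a set of descriptors; relatedness is overlap size.
def overlap(p, q):
    return len(p & q)


def jaccard(bucket_a, bucket_b):
    a, b = set().union(*bucket_a), set().union(*bucket_b)
    return len(a & b) / len(a | b)


paths = [{"m1", "m2"}, {"m2", "m3"}, {"m8", "m9"}, {"m3", "m9"}]
buckets = populate_buckets(paths, overlap, threshold=1)
print(len(buckets), len(merge_buckets(buckets, jaccard, threshold=0.1)))
```

Subgraph ranking (step 3) is omitted here because the ranking formulas appear only as images in the deck.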
23
Summary of Metrics
• Path Relatedness
  – Model: MeSH Context Vectors
  – Metrics: Semantics-enhanced shared context, Log Reduction
  – Threshold: ??
• MeSH Semantic Similarity
  – Model: MeSH Hierarchy
  – Metrics: Dice Similarity
  – Threshold: Manually
24
Automatic Threshold Selection
RS-DFO Experiment
Manual Threshold = 3.0
Gaussian Distribution
[Plot: Number of Path Pairs (y-axis) against Path Relatedness Score (x-axis)]
25
Automatic Threshold Selection
Gaussian Function
[Plot: Expected Value (y-axis) against Path Relatedness Score (x-axis)]
26
Automatic Threshold Selection
• Gaussian Distribution
Diagram courtesy of Wikipedia*
Points of Inflection
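A threshold placed a fixed number of standard deviations above the mean of the score distribution can be computed as in the sketch below. It assumes the path relatedness scores are approximately Gaussian, as the slides suggest; the function name and the sample scores are hypothetical, and the population standard deviation is an arbitrary choice for the illustration.

```python
import statistics


def gaussian_threshold(scores, k=2.0):
    """Return mean + k standard deviations of the observed scores."""
    mu = statistics.mean(scores)
    sigma = statistics.pstdev(scores)
    return mu + k * sigma


# Hypothetical path relatedness scores.
scores = [1.0, 2.0, 3.0, 4.0, 5.0]
two_sigma = gaussian_threshold(scores, k=2.0)
three_sigma = gaussian_threshold(scores, k=3.0)
print(round(two_sigma, 3), round(three_sigma, 3))
```

For a Gaussian, the points of inflection lie one standard deviation on either side of the mean, so `gaussian_threshold(scores, k=1.0)` gives the upper inflection point.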
27
Threshold Comparisons
Scenario | Path Relatedness Score
Max | 2 Std Dev. | Manual | 3 Std Dev.
RS-DFO 2.68 3.0 3.04 3.38
Testosterone-Sleep 3.35 3.5 3.8262 6.22
DEHP-Sepsis 3.94 4.0 4.53 4.84
Table 2: Path Relatedness Threshold Comparisons
28
Bucket Merging
Ba
Bb
Christopher D. Manning, Prabhakar Raghavan, Hinrich Schütze: Introduction to information retrieval. Cambridge University Press 2008, ISBN 978-0-521-86571-5, pp. I-XXI, 1-482
Straggly Clusters Compact Clusters
Broad Clusters
29
Subgraph Ranking
Intra-Cluster Rank
30
Singleton Ranking
Association Rarity
31
Summary of Metrics
• Path Relatedness
  – Model: MeSH Context Vectors
  – Metrics: Semantics-enhanced shared context, Log Reduction
  – Manual Threshold for Semantic Similarity, Dice Similarity
  – Threshold: 2nd Standard Deviation from Mean of Gaussian
• Bucket Relatedness
  – Model: Set of Paths
  – Metric: Inter-Cluster Similarity
  – Threshold: 2nd Standard Deviation from Mean of Gaussian
• Subgraph Ranking
  – Metrics: Intra-Cluster Similarity, Singleton Rank (Association Rarity)
32
Algorithm
Time Complexity: Θ(N² log N)
34
Raynaud Syndrome – Dietary Fish Oil
Inferred predicates
Path Relatedness Threshold = 3σ
Scenario 1: Raynaud Syndrome – Dietary Fish Oil
Details: Cut-off date: Nov. 1985. By: D. R. Swanson (Article)
Intermediate | Association | Status
Blood Viscosity | Dietary Fish Oils INHIBITS Blood Viscosity; Blood Viscosity CAUSES Raynaud Syndrome | ZR-15
Platelet Aggregation | Dietary Fish Oils INHIBITS Platelet Aggregation; Platelet Aggregation CAUSES Raynaud Syndrome | S1
Vasoconstriction | Dietary Fish Oils INHIBITS Vasoconstriction; Vasoconstriction CAUSES Raynaud Syndrome |
Legend: ZR – zero rarity singleton; S – Subgraph; Not Found
Results available online: http://wiki.knoesis.org/index.php/Obvio#Automatic_Subgraph_Creation
Obvio Web Application: http://knoesis-hpco.cs.wright.edu/obvio/
Scenario 2: Magnesium – Migraine
Details: Cut-off date: Apr. 1987. By: D. R. Swanson (Article)
Intermediate | Association | Status
Calcium Channel Blockers | Magnesium ISA Calcium Channel Blocker; Calcium Channel Blockers TREATS Migraine | S22
Epilepsy | Magnesium AFFECTS Epilepsy; Epilepsy CO_EXISTS_WITH Migraine | S9
Hypoxia | Magnesium INHIBITS Hypoxia; Hypoxia ASSOCIATED_WITH Migraine |
Inflammation | Magnesium INHIBITS Inflammation; Inflammation CAUSES Migraine | ZR-3
Platelet Activity | Magnesium INHIBITS Platelet Aggregation; Platelet Aggregation CAUSES Migraine | S1
Prostaglandins | Magnesium STIMULATES Prostaglandins; Prostaglandins DISRUPTS Migraine | S4
Stress/Type A Personality | Stress INHIBITS Magnesium; Stress ASSOCIATED_WITH Migraine |
Serotonin | Magnesium INHIBITS Serotonin; Serotonin CAUSES Migraine | S1
Cortical Depression | Magnesium INHIBITS Spreading Cortical Depression; Spreading Cortical Depression CAUSES Migraine |
Substance P | Magnesium INHIBITS Substance P; Substance P CAUSES Migraine |
Vascular Mechanisms | Magnesium INHIBITS Vasoconstriction; Vasoconstriction CAUSES Migraine | S9
Scenario 3: Somatomedin C – Arginine
Details: Cut-off date: Apr. 1989. By: D. R. Swanson (Article)
Intermediate | Association | Status
Growth Hormone | Arginine STIMULATES Growth Hormone; Growth Hormone STIMULATES Somatomedins (IGF1) | S5
Body Weight (body mass) | Somatomedins (IGF1) STIMULATES Growth; Arginine STIMULATES Growth | S7
Malnutrition | Somatomedins TREATS Malnutrition; Arginine TREATS Malnutrition | S7
Wound Healing (NK activity) | Somatomedins STIMULATES Wound Healing; Arginine STIMULATES Wound Healing |
Scenario 4: Indomethacin – Alzheimer’s Disease
Details: Cut-off date: Jul. 1995. By: Swanson/Smalheiser (Article)
Intermediate | Association | Status
Acetylcholine | Indomethacin INHIBITS Acetylcholine; Acetylcholine CAUSES Alzheimers | S4
Lipid Peroxidation | Indomethacin INHIBITS Lipid Peroxidation; Lipid Peroxidation CAUSES Alzheimers | S2
M2-Muscarinic | Indomethacin INHIBITS M2-Muscarinic; M2-Muscarinic CAUSES Alzheimers |
Membrane Fluidity | Indomethacin INHIBITS Membrane Fluidity; Membrane Fluidity CAUSES Alzheimers |
Lymphocytes | Indomethacin STIMULATES Natural Killer T-Cell Activity; T-Cell Activity INHIBITS Alzheimers | S14
Thyrotropin | Indomethacin STIMULATES Thyrotropin; Thyrotropin AFFECTS Alzheimers | ZR-20
T-lymphocytes (T-Cells) | Indomethacin STIMULATES T-lymphocytes; T-lymphocyte Activity INHIBITS Alzheimers | S3
Scenario 5: Estrogen – Alzheimer’s Disease
Details: Cut-off date: Jul. 1995. By: Swanson/Smalheiser (Article)
Intermediate | Association | Status
Antioxidant Activity | Estrogen INHIBITS Antioxidant Activity; Antioxidant Activity CAUSES Alzheimers | S4
Apolipoprotein E (ApoE) | Estrogen INHIBITS ApoE; ApoE CAUSES Alzheimers | S3
Calbindin D28k | Estrogen REGULATES Calbindin D28k; Calbindin D28k AFFECTS Alzheimers | S4
Cathepsin D | Estrogen STIMULATES Cathepsin D; Cathepsin D PREVENTS Alzheimers |
Cytochrome C Oxidase Subunit III | Estrogen STIMULATES Cytochrome C Oxidase Subunit III; Cytochrome C Oxidase Subunit III AFFECTS Alzheimers |
Glutamate | Estrogen STIMULATES Glutamate; Glutamate AFFECTS Alzheimers |
Receptor Polymorphism | Estrogen EXHIBITS Receptor Polymorphism; Receptor Polymorphism AFFECTS Alzheimers |
Scenario 6: Calcium Independent PLA2 – Schizophrenia
Details: Cut-off date: 1997. By: Swanson/Smalheiser (Article)
Intermediate | Association | Status
Oxidative Stress | Oxidative Stress INHIBITS Calcium-Independent PLA2; Oxidative Stress CAUSES Schizophrenia | ZR-2
Selenium | Selenium INHIBITS Calcium-Independent PLA2; Selenium PREVENTS Schizophrenia | ZR-2
Vitamin E | Vitamin E INHIBITS Calcium-Independent PLA2; Vitamin E PREVENTS Schizophrenia | ZR-2
Scenario 7: Chlorpromazine – Cardiac Hypertrophy
Details: Cut-off date: 01/01/2002. By: J. D. Wren (Article)
Intermediate | Association | Status
Calcineurin | Chlorpromazine INHIBITS Calcineurin; Calcineurin CAUSES Cardiac Hypertrophy | S5
Isoproterenol | Chlorpromazine INHIBITS Isoproterenol; Isoproterenol CAUSES Cardiomegaly | S12
Scenario 8: Testosterone – Sleep
Details: Cut-off date: 01/01/2012. By: Miller/Rindflesch (Article)
Intermediate | Association | Status
Cortisol/Hydrocortisone | Testosterone INHIBITS Cortisol; Cortisol DISRUPTS Sleep | S7
Scenario 9: Diethylhexyl Phthalate (DEHP) – Sepsis
Details: Cut-off date: 01/01/2013. By: Cairelli/Rindflesch (Article)
Intermediate | Association | Status
PParGamma | DEHP STIMULATES PParGamma; PParGamma INHIBITS Sepsis |
44
Statistical Evaluation
Association Rarity Interestingness
45
Statistical Evaluation
Experiment | # Unique Associations | Total MEDLINE Frequency | Rarity r(E) | Interestingness I(E)
Raynaud-Fish Oil | 10 | 0 | 0.00 | 1.00
Magnesium-Migraine | 48 | 27 | 0.56 | 0.64
SomaC-Arginine | 18 | 306 | 17.00 | 0.06
Indomethacin-Alzheimers | 21 | 9 | 0.43 | 0.70
Estrogen-Alzheimers | 42 | 36 | 0.86 | 0.54
PLA2-Schizophrenia | 10 | 0 | 0.00 | 1.00
CPZ-Cardiac Hypertrophy | 21 | 2 | 0.10 | 0.91
Testosterone-Sleep | 61 | 654 | 10.72 | 0.09
Average | 29 | 129 | 3.71 | 0.62
Table 3: Rarity and Interestingness score of the subgraphs in the rediscoveries
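The rarity and interestingness formulas appear only as images in the deck, but the values in Table 3 are consistent with r(E) = total MEDLINE frequency / number of unique associations and I(E) = 1 / (1 + r(E)). The sketch below recomputes the table under that assumption; the formulas are inferred from the numbers and are not quoted from the dissertation.

```python
# (experiment, unique associations, total MEDLINE frequency) from Table 3
rows = [
    ("Raynaud-Fish Oil", 10, 0),
    ("Magnesium-Migraine", 48, 27),
    ("SomaC-Arginine", 18, 306),
    ("Indomethacin-Alzheimers", 21, 9),
    ("Estrogen-Alzheimers", 42, 36),
    ("PLA2-Schizophrenia", 10, 0),
    ("CPZ-Cardiac Hypertrophy", 21, 2),
    ("Testosterone-Sleep", 61, 654),
]


def rarity(unique_associations, total_frequency):
    """Assumed form: average number of MEDLINE mentions per association."""
    return total_frequency / unique_associations


def interestingness(r):
    """Assumed form: decreases from 1 toward 0 as rarity grows."""
    return 1.0 / (1.0 + r)


rarities = [rarity(u, f) for _, u, f in rows]
scores = [interestingness(r) for r in rarities]
average_rarity = sum(rarities) / len(rarities)
average_interestingness = sum(scores) / len(scores)
print(round(average_rarity, 2), round(average_interestingness, 2))
```

Under these assumed formulas the averages reproduce the last row of the table, and the average rarity matches the abstract's figure of approximately 4 MEDLINE articles per association.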
47
Predications-based Knowledge Exploration
Corpus
Predications Graph
Definitional Knowledge (UMLS + MeSH)
Provenance
Knowledge Abstraction
D. Cameron, R. Kavuluru, O. Bodenreider, P. N. Mendes, A. P. Sheth, K. Thirunarayan. Semantic Predications for Complex Information Needs in Biomedical Literature. International Bioinformatics and Biomedical Conference (BIBM11). 512–519, 2011.
Contribution #4: Combining Assertional and Definitional Knowledge for Knowledge Exploration
48
Levels of Contexts
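[Figure: levels of context]
Predication Context: a single predication along A, B, C
Path Context: paths from A to C through intermediates B1, B2, ..., Bi
Shared Context: related paths from A to C (through B1, B2, B3)
Subgraph Context: a subgraph between A and C with predicates CAUSES, DISRUPTS, PRODUCES, INHIBITS, STIMULATES, ISA, TREATS
Dimensions: multiple A–C subgraphs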
50
Dissertation Contributions
1. Context-Driven Subgraph Model
   – Knowledge Rediscovery & Decomposition
2. Predication/Path Context
   – Vector of MeSH Descriptors
3. Shared Context
   – Background Knowledge (MeSH Hierarchy)
4. Semantic Predications-based Text Exploration
   – Obvio Web Application
51
Innovation
System/Technique | Technique Type | Automatic | Relational | Evidence-based | Thematic | Results: #Discoveries | #Rediscoveries
IRIDESCENT [108] Keyword 1 0
ARROWSMITH [84] Keyword/Concept 5 0
DAD [101,102] Concept 0 2
BITOLA [46] Concept 0 1
Litlinker [110] Concept 0 2
Manjal [87,88] Concept × 0 5
SemBT [40,41,42] Relations × × 0 1
BioSbKDS [47] Relations × × 0 1
Wilkowski [107] Graph × × 0 0
Ramakrishnan [72] Graph × × 0 1*
Zhang [114] Graph × × × 0 0
Obvio [19, 21] Graph × × × × 0 8
ARROWSMITH v2 [86,98] Hybrid × 0 6*
Semantic MEDLINE [18,63] Hybrid × × 2 0
Note: References are from the PhD Dissertation manuscript entitled: A Context Driven Subgraph Model for Literature-Based Discovery
Table 4: Comparison of capabilities and accomplishments of LBD techniques
53
Limitations
1. Manual Threshold
   – MeSH Semantic Similarity
2. Path Relatedness Threshold
   – Only Approximate Gaussian
3. Definition of Context
54
Levels of Semantic Representation
Keywords
Concepts
MeSH Descriptors
Semantic Predications
Ensemble of Features
Relationships
Semantic Predication: A PREDICATE B
55
Limitations
1. Manual Threshold
   – MeSH Semantic Similarity
2. Path Relatedness Threshold
   – Only Approximate Gaussian
3. Definition of Context
4. MEDLINE Querying
   – Deep integration of Assertional/Definitional
5. Contradiction Detection
6. Statistical Evaluation
7. Scalability of Clustering Algorithm
8. Subgraph Labeling
56
Take Away
• Future of Information Processing
  – Rich Knowledge Representations
    o Implicit, Formal, Powerful semantics
  – Application to Literature-Based Discovery
57
Conclusion
• Context-Driven Subgraph Model
  – Manually create Complex Associations
  – Automatic Subgraph Creation
    o Novel definitions for Context and Shared Context
    o Multiple Thematic Dimensions
  – Predications-based Knowledge Exploration
    o Predicates
    o Highlighted MEDLINE sentences
  – Knowledge Rediscovery
    o 8 out of 9 existing scientific discoveries
58
Publications
1. D. Cameron, R. Kavuluru, T. C. Rindflesch, O. Bodenreider, A. P. Sheth, K. Thirunarayan. Context-Driven Automatic Subgraph Creation for Literature-Based Discovery (under preparation)
2. D. Cameron, A. P. Sheth, N. Jaykumar, G. Anand, K. Thirunarayan, G. A. Smith. A Hybrid Approach to Finding Relevant Social Media Content for Domain Specific Information Needs. (submitted to the Journal of Web Semantics)
3. D. Cameron, O. Bodenreider, H. Yalamanchili, T. Danh, S. Vallabhaneni, K. Thirunarayan, A. P. Sheth, T. C. Rindflesch. A Graph-based Recovery and Decomposition of Swanson’s Hypothesis using Semantic Predications. Journal of Biomedical Informatics (JBI13). 46(2): 238–251, 2013.
4. D. Cameron, G. A. Smith, R. Daniulaityte, A. P. Sheth, D. Dave, L. Chen, G. Anand, R. Carlson, K. Z. Watkins, R. Falck. PREDOSE: A Semantic Web Platform for Drug Abuse Epidemiology using Social Media. Journal of Biomedical Informatics (JBI13). 46(6): 985–997, 2013.
5. R. Daniulaityte, R. Carlson, R. Falck, D. Cameron, S. Perera, L. Chen, A. P. Sheth. “I just wanted to tell you that Loperamide WILL WORK”: A Web-Based Study of Extra-medical use of Loperamide. Journal of Drug and Alcohol Dependence (DAD13) 130(1–3): 241–244, 2013.
6. D. Cameron, V. Bhagwan, A. P. Sheth. Towards Comprehensive Longitudinal Healthcare Data Capture. International Workshop on Semantic Web in Literature-Based Discovery (SWLBD12). 241–247, 2012.
7. R. Daniulaityte, R. Carlson, R. Falck, D. Cameron, S. Perera, L. Chen, A. P. Sheth. A Web-Based Study of Extra-medical use of Loperamide. The College on Problems of Drug Dependence (CPDD12), 2012.
8. D. Cameron, R. Kavuluru, O. Bodenreider, P. N. Mendes, A. P. Sheth, K. Thirunarayan. Semantic Predications for Complex Information Needs in Biomedical Literature. International Bioinformatics and Biomedical Conference (BIBM11). 512–519, 2011.
9. D. Cameron, B. Aleman-Meza, I. B. Arpinar, S. L. Decker, A. P. Sheth. A Taxonomy-based Model for Expertise Extrapolation. International Conference on Semantic Computing (ICSC10). 333–240, 2010.
10. D. Cameron, P. N. Mendes, A. P. Sheth, V. Chan. Semantics-empowered Text Exploration for Knowledge Discovery. ACM Southeast Conference (ACMSE10). 14, 2010.
11. C. Thomas, W. Wang, P. Mehra, D. Cameron, P. N. Mendes, A. P. Sheth. What Goes Around Comes Around – Improving Linked Open Data through On-Demand Model Creation. Web Science Conference (WebSci10), 2010.
12. P. N. Mendes, P. Kapanipathi, D. Cameron, A. P. Sheth. Dynamic Associative Relationships on the Linked Data Web. Web Science Conference (WebSci10), 2010.
59
Research Expertise
Literature-Based Discovery
Text Mining
Question Answering
Information Retrieval
[Figure: publications [1], [2], [3], [4], [5], [6], [7], [8] and [10] from the list above, placed among the four research areas]
60
Parting Words
“...some day the piecing together of dissociated knowledge will open up such terrifying vistas of reality,...that we shall either go mad from the revelation or flee from the deadly light into the peace and safety of a new dark age.”
– H. P. Lovecraft (The Call of Cthulhu, The Horror in Clay).
H. P. Lovecraft. The Call of Cthulhu. In S. T. Joshi, editor. The Call of Cthulhu and Other Weird Stories. Penguin Books Ltd., London, 1999
61
Acknowledgements
• Olivier Bodenreider
• Marcelo Fiszman
• Mike Cairelli
• Swapna Abhyankar
• Drashti Dave
• Dongwook Shin
• Special Thanks
  o Pavan
  o Shreyansh
  o Swapnil
  o Nishita
• PREDOSE Team
  o Nishita
  o Gaurish
  o Alan
  o Revathy
62
Ph.D. Committee Members
Amit P. Sheth (Advisor)
T.K. Prasad Michael Raymer
Ramakanth Kavuluru Thomas C. Rindflesch Varun Bhagwan