biomedical text mining: inferring hidden relationships from biological literature
![Page 1: Biomedical Text Mining: Inferring Hidden Relationships from Biological Literature](https://reader034.vdocuments.mx/reader034/viewer/2022042821/56649e185503460f94b04ef1/html5/thumbnails/1.jpg)
Biomedical Text Mining: Inferring Hidden Relationships
from Biological Literature
![Page 2: Biomedical Text Mining: Inferring Hidden Relationships from Biological Literature](https://reader034.vdocuments.mx/reader034/viewer/2022042821/56649e185503460f94b04ef1/html5/thumbnails/2.jpg)
Biomedical Text Mining (BTM)
Why biomedicine?
Consider just MEDLINE: more than 20,000,000 references, with 40,000 added per month
Dynamic nature of the domain: new terms (genes, proteins, chemical compounds, drugs) are constantly created
Impossible to manage such an information overload
![Page 3: Biomedical Text Mining: Inferring Hidden Relationships from Biological Literature](https://reader034.vdocuments.mx/reader034/viewer/2022042821/56649e185503460f94b04ef1/html5/thumbnails/3.jpg)
From Text to Knowledge: tackling the data deluge through text mining
Unstructured text (implicit knowledge)
Structured content (explicit knowledge)
Information extraction
Semantic metadata
Knowledge discovery
Information retrieval
Advanced information retrieval
![Page 4: Biomedical Text Mining: Inferring Hidden Relationships from Biological Literature](https://reader034.vdocuments.mx/reader034/viewer/2022042821/56649e185503460f94b04ef1/html5/thumbnails/4.jpg)
Information Deluge
Bio-databases, controlled vocabularies and bio-ontologies encode only a small fraction of the information
Linking text to databases and ontologies
Curators are struggling to process the scientific literature
Discovery of facts and events is crucial for gaining insights in the biosciences: need for text mining
![Page 5: Biomedical Text Mining: Inferring Hidden Relationships from Biological Literature](https://reader034.vdocuments.mx/reader034/viewer/2022042821/56649e185503460f94b04ef1/html5/thumbnails/5.jpg)
Aims of Biomedical Text Mining
Text mining: discover and extract unstructured knowledge hidden in text (Hearst, 1999)
Text mining helps construct hypotheses from associations derived from text:
- protein-protein interactions
- associations between genes and phenotypes
- functional relationships among genes
![Page 6: Biomedical Text Mining: Inferring Hidden Relationships from Biological Literature](https://reader034.vdocuments.mx/reader034/viewer/2022042821/56649e185503460f94b04ef1/html5/thumbnails/6.jpg)
Impact of biomedical text mining
Extraction of named entities (genes, proteins, metabolites, etc.)
Discovery of concepts allows semantic annotation of documents
Improves information access by going beyond index terms, enabling semantic querying
Construction of concept networks from text
Allows clustering and classification of documents
Visualization of concept maps
![Page 7: Biomedical Text Mining: Inferring Hidden Relationships from Biological Literature](https://reader034.vdocuments.mx/reader034/viewer/2022042821/56649e185503460f94b04ef1/html5/thumbnails/7.jpg)
Impact of BTM
Extraction of relationships (events and facts) for knowledge discovery
Information extraction, more sophisticated annotation of texts (event annotation)
Beyond named entities: facts, events
Enables even more advanced semantic querying
![Page 8: Biomedical Text Mining: Inferring Hidden Relationships from Biological Literature](https://reader034.vdocuments.mx/reader034/viewer/2022042821/56649e185503460f94b04ef1/html5/thumbnails/8.jpg)
Literature Based Discovery (LBD)
Swanson’s experiments (1986) influenced conceptual biology
Rapid ‘mining’ of candidate hypotheses from the literature:
- migraine and magnesium deficiency (Swanson, 1988)
- indomethacin and Alzheimer’s disease (Swanson and Smalheiser, 1994)
- Curcuma longa and retinal diseases, Crohn's disease and disorders related to the spinal cord (Srinivasan and Libbus, 2004)
- thalidomide for treating a series of diseases such as acute pancreatitis and chronic hepatitis C (Weeber et al., 2003)
![Page 9: Biomedical Text Mining: Inferring Hidden Relationships from Biological Literature](https://reader034.vdocuments.mx/reader034/viewer/2022042821/56649e185503460f94b04ef1/html5/thumbnails/9.jpg)
Literature Based Discovery (LBD)
Conceptual Biology?
Swanson’s ABC model
Drug repositioning
[Figure: network diagram with nodes Alzheimer, Insulin, PKC1, CATS, SOS2 and numeric labels 3, 5, 2, 8, 9, 4]
![Page 10: Biomedical Text Mining: Inferring Hidden Relationships from Biological Literature](https://reader034.vdocuments.mx/reader034/viewer/2022042821/56649e185503460f94b04ef1/html5/thumbnails/10.jpg)
Literature-based discovery (LBD)? The very idea.
1. It means deriving, from the public record of science, new solutions to scientific problems.
2. The possibility arises, for example, when two articles considered together for the first time suggest new information of scientific interest not apparent from either article alone.
![Page 11: Biomedical Text Mining: Inferring Hidden Relationships from Biological Literature](https://reader034.vdocuments.mx/reader034/viewer/2022042821/56649e185503460f94b04ef1/html5/thumbnails/11.jpg)
Venn Diagram -- ABC Model
A B C
AB: articles about an AB relationship.
BC: articles about a BC relationship.
AB and BC are complementary but disjoint: they can reveal an implicit relationship between A and C in the absence of any explicit relation.
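The ABC idea can be sketched in a few lines of Python. This is a minimal illustration, not the system described in these slides: the function name `abc_candidates` and the input format (a list of co-occurring term pairs) are assumptions made for the example.

```python
from collections import defaultdict

def abc_candidates(cooccurrences, a_term):
    """Return C terms that reach a_term only through intermediate B terms.

    cooccurrences: iterable of (term, term) pairs seen together in some article.
    The result maps each candidate C to the set of B terms that link it to A.
    """
    neighbors = defaultdict(set)
    for x, y in cooccurrences:
        neighbors[x].add(y)
        neighbors[y].add(x)

    direct = neighbors[a_term]          # B terms: explicitly linked to A
    candidates = defaultdict(set)
    for b in direct:
        for c in neighbors[b]:
            # Keep C only if there is no explicit A-C relation.
            if c != a_term and c not in direct:
                candidates[c].add(b)
    return dict(candidates)

# Toy input mirroring the A/B/C roles used in the slides.
pairs = [("magnesium", "epilepsy"), ("epilepsy", "migraine")]
hidden = abc_candidates(pairs, "magnesium")
print(hidden)
```

With this toy input, migraine is proposed as a C term for magnesium through the B term epilepsy, because no pair links magnesium and migraine directly.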
![Page 12: Biomedical Text Mining: Inferring Hidden Relationships from Biological Literature](https://reader034.vdocuments.mx/reader034/viewer/2022042821/56649e185503460f94b04ef1/html5/thumbnails/12.jpg)
An ABC example based on title words in Medline
Magnesium-deficient rat as a model of epilepsy. Lab Animal Sci 28:680-5, 1978
The relation of migraine and epilepsy. Brain 92:285-300, 1969
An unintended link
Venn diagram: sets of Medline records; A and C are disjoint.
A: magnesium (88204)
B: epilepsy
C: migraine (26923)
Overlap labels in the diagram: 1018, 1710
![Page 13: Biomedical Text Mining: Inferring Hidden Relationships from Biological Literature](https://reader034.vdocuments.mx/reader034/viewer/2022042821/56649e185503460f94b04ef1/html5/thumbnails/13.jpg)
Research problems
Information model: biological information, multi-level
Automation: gigantic amount of data; Swanson’s ABC model is semi-automatic
How to discover novelty: find novel information
Novel hypothesis generation
[Diagram: A1 (Fish Oil), B1 (Blood Viscosity), C1 (Raynaud Disease); edge labels: Reduce, Aggregate]
![Page 14: Biomedical Text Mining: Inferring Hidden Relationships from Biological Literature](https://reader034.vdocuments.mx/reader034/viewer/2022042821/56649e185503460f94b04ef1/html5/thumbnails/14.jpg)
Information Model
Information Model: category-based interaction model
Interactor node
- Connects the whole relation
- Represents the action by a verb

| Interactor | Type |
|---|---|
| Induce | Increase |
| Contribute | Increase |
| Reduce | Reduction |
| Increase | Increase |
| Resistant | Reduction |
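The interactor-to-type table can be read as a simple lookup. The sketch below encodes only the five rows listed on this slide; the names `INTERACTOR_TYPES` and `interaction_type` are hypothetical, and the real system presumably covers more verbs and inflected forms.

```python
# Interactor verb -> interaction type, copied from the table on this slide.
INTERACTOR_TYPES = {
    "induce": "Increase",
    "contribute": "Increase",
    "reduce": "Reduction",
    "increase": "Increase",
    "resistant": "Reduction",
}

def interaction_type(verb):
    """Look up the interaction type for a connector verb (case-insensitive).

    Returns None for verbs that are not in the table.
    """
    return INTERACTOR_TYPES.get(verb.lower())

print(interaction_type("Induce"))
print(interaction_type("bind"))
```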
![Page 15: Biomedical Text Mining: Inferring Hidden Relationships from Biological Literature](https://reader034.vdocuments.mx/reader034/viewer/2022042821/56649e185503460f94b04ef1/html5/thumbnails/15.jpg)
Information Model
Each node is represented by mapping a semantic type of the node to its corresponding UMLS top category.
![Page 16: Biomedical Text Mining: Inferring Hidden Relationships from Biological Literature](https://reader034.vdocuments.mx/reader034/viewer/2022042821/56649e185503460f94b04ef1/html5/thumbnails/16.jpg)
Methods
Data Flow of BioDiscovery
[Diagram components: MEDLINE / PubMed Abstracts, Sentence Splitter, Sentence Parser, Entity Extractor, Relation Extractor, Similar Entity Detector, UPK, UMLS, Extracted Entities/Relations, Graph Builder, Visualizer]
![Page 17: Biomedical Text Mining: Inferring Hidden Relationships from Biological Literature](https://reader034.vdocuments.mx/reader034/viewer/2022042821/56649e185503460f94b04ef1/html5/thumbnails/17.jpg)
Methods
Data Flow of BioDiscovery
=Sentence Parser=
- Input : Split Sentence
- Output : Sentence tree by Link Grammar Parser
Sentence Parsing Phase: Split Sentences, Tagger, Parsed Tree
![Page 18: Biomedical Text Mining: Inferring Hidden Relationships from Biological Literature](https://reader034.vdocuments.mx/reader034/viewer/2022042821/56649e185503460f94b04ef1/html5/thumbnails/18.jpg)
Sentence Parsing - Example
Original Sentence: After the DF1 cells had been cultured for 9 d, the ALV p27 antigen in the supernatants of the two sets was detected by ELISA
![Page 19: Biomedical Text Mining: Inferring Hidden Relationships from Biological Literature](https://reader034.vdocuments.mx/reader034/viewer/2022042821/56649e185503460f94b04ef1/html5/thumbnails/19.jpg)
Sentence Parsing - Example
![Page 20: Biomedical Text Mining: Inferring Hidden Relationships from Biological Literature](https://reader034.vdocuments.mx/reader034/viewer/2022042821/56649e185503460f94b04ef1/html5/thumbnails/20.jpg)
Entity Extractor
An NER technique is used to detect entities
LingPipe NER and the GENIA corpus are used for detection
The accuracy of entity extraction by LingPipe is low.
Validation of the entity type of extracted entities: by looking up the UMLS Semantic Network
Assignment of the category tag for each entity: by utilizing UMLS top categories such as Anatomical Structure, Substance, and Phenomenon or Process
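The validation and category-tagging steps can be pictured as a table lookup. The sketch below is an assumption-laden illustration: `TOP_CATEGORY` is a small hand-written subset standing in for the UMLS Semantic Network, and `categorize` is a hypothetical helper, not part of LingPipe or UMLS tooling.

```python
# Hand-written subset: UMLS semantic type -> top category named on this slide.
# A real system would derive this from the UMLS Semantic Network hierarchy.
TOP_CATEGORY = {
    "Organic Chemical": "Substance",
    "Pharmacologic Substance": "Substance",
    "Cell": "Anatomical Structure",
    "Embryonic Structure": "Anatomical Structure",
    "Cell Function": "Phenomenon or Process",
    "Disease or Syndrome": "Phenomenon or Process",
}

def categorize(entities):
    """Validate (name, semantic_type) pairs and tag each with a top category.

    Entities whose semantic type is not in the lookup are dropped,
    which plays the role of the validation step.
    """
    tagged = []
    for name, semantic_type in entities:
        category = TOP_CATEGORY.get(semantic_type)
        if category is not None:
            tagged.append((name, semantic_type, category))
    return tagged

extracted = [
    ("wogonin", "Organic Chemical"),
    ("apoptosis", "Cell Function"),
    ("xyz", "Unknown Type"),  # made-up NER false positive
]
tagged = categorize(extracted)
print(tagged)
```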
![Page 21: Biomedical Text Mining: Inferring Hidden Relationships from Biological Literature](https://reader034.vdocuments.mx/reader034/viewer/2022042821/56649e185503460f94b04ef1/html5/thumbnails/21.jpg)
Relation Extractor
Selection of the key connector term (i.e., the verb)
Difficult decision where complex sentences contain many verbs
Utilize Link Grammar link types such as V and MV to determine the key connector
Entities that appear before the key connector are set as Interactor entities
Entities that appear after the key connector are set as Interactee entities
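The before/after rule for interactor and interactee entities can be sketched as follows. The function `split_entities` and its token-index inputs are assumptions for illustration; choosing the key connector from Link Grammar link types is not shown here.

```python
def split_entities(tokens, connector_index, entity_indices):
    """Split entity tokens into interactors (before the key connector)
    and interactees (after the key connector)."""
    interactors = [tokens[i] for i in entity_indices if i < connector_index]
    interactees = [tokens[i] for i in entity_indices if i > connector_index]
    return interactors, interactees

tokens = ["Wogonin", "increases", "apoptosis", "in", "HCT-116", "cells"]
# Assume the key connector is "increases" (index 1) and that
# entities were detected at token positions 0, 2 and 4.
interactors, interactees = split_entities(tokens, 1, [0, 2, 4])
print(interactors, interactees)
```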
![Page 22: Biomedical Text Mining: Inferring Hidden Relationships from Biological Literature](https://reader034.vdocuments.mx/reader034/viewer/2022042821/56649e185503460f94b04ef1/html5/thumbnails/22.jpg)
Interaction Graph Builder
The maximal connected graph that can be built by our interaction model has a bow-tie shape.
Each node represents an entity.
An edge between entities is determined by proximity in the sentence.
The first two nodes to be connected are the interactor entity and the interactee entity located closest to the connector.
Entities that belong to the same category are connected to each other.
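A rough sketch of these edge rules is given below. It assumes each entity is a `(name, category, token_position)` tuple and that same-category entities are linked regardless of which side of the connector they fall on; the slide does not settle that detail, and `build_interaction_graph` is a hypothetical name.

```python
from itertools import combinations

def build_interaction_graph(interactors, interactees, connector_pos):
    """Return a set of undirected edges (frozensets of two entity names)."""
    edges = set()
    if interactors and interactees:
        # First edge: the interactor and interactee closest to the connector.
        left = min(interactors, key=lambda e: abs(e[2] - connector_pos))
        right = min(interactees, key=lambda e: abs(e[2] - connector_pos))
        edges.add(frozenset((left[0], right[0])))
    # Remaining edges: entities that share a category are inter-connected.
    for a, b in combinations(interactors + interactees, 2):
        if a[1] == b[1]:
            edges.add(frozenset((a[0], b[0])))
    return edges

# Made-up token positions for illustration only.
interactors = [("wogonin", "Substance", 0), ("fisetin", "Substance", 2)]
interactees = [("apoptosis", "Process", 4), ("HCT-116 cells", "Anatomical Structure", 6)]
edges = build_interaction_graph(interactors, interactees, connector_pos=3)
print(edges)
```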
![Page 23: Biomedical Text Mining: Inferring Hidden Relationships from Biological Literature](https://reader034.vdocuments.mx/reader034/viewer/2022042821/56649e185503460f94b04ef1/html5/thumbnails/23.jpg)
Methods
Data Flow of MKEM
* See Appendix B for description
[Diagram components: MEDLINE / PubMed Abstracts, Sentence Selector, Information Element Recognizer, Entity Extractor, Relation Extractor, Similar Entity Detector, Similarity Measure, UPK, Ranking scores, Graph Builder, Visualizer, UMLS]
=UPK Inference=
- Input : Extracted Entity/Relation
- Output : UPK
Similarity Measure*
Measures: MetaMap Type, Structural, Atomic Count, Semantic Similarity
Score scales shown, in slide order:
- 0 : Not Similar, 1 : Similar
- 0 : Not Similar, 0.5 : Substructure, 1 : Similar
- 0 : Not Similar, 1 : Similar
![Page 24: Biomedical Text Mining: Inferring Hidden Relationships from Biological Literature](https://reader034.vdocuments.mx/reader034/viewer/2022042821/56649e185503460f94b04ef1/html5/thumbnails/24.jpg)
Similarity Measures
Semantic Type
- UMLS Semantic Type
Structural Similarity
- Calculated using the SMSD (Small Molecule Subgraph Detector) system
Atomic Count
- Taken from the chemDB database. The atomic count enumerates the constituent atoms of the chemical of interest.
Semantic Similarity
- Relative importance-based graph similarity
Topological Similarity (not implemented yet)
- Graph topology-based similarity
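To make "atomic count" concrete, the sketch below parses a plain molecular formula into per-element counts. It is only an illustration: the slides take these counts from chemDB, the helper `atomic_counts` is hypothetical, and it does not handle parentheses, charges or hydrates. How two counts are turned into a similarity score is not specified on the slide and is not shown.

```python
import re

def atomic_counts(formula):
    """Parse a simple molecular formula such as 'C16H12O5' into element counts."""
    counts = {}
    # Each match is an element symbol followed by an optional number.
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + (int(number) if number else 1)
    return counts

wogonin = atomic_counts("C16H12O5")
fisetin = atomic_counts("C15H10O6")
print(wogonin, fisetin)
```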
![Page 25: Biomedical Text Mining: Inferring Hidden Relationships from Biological Literature](https://reader034.vdocuments.mx/reader034/viewer/2022042821/56649e185503460f94b04ef1/html5/thumbnails/25.jpg)
Semantic Similarity
Build a dependency tree of the sentence
Create semantic distributional models (based on feature vectors) by tensor Singular Value Decomposition (SVD)
- The edge statistics form a 3-dimensional tensor with the shape Head-Relation-Dependency
- Dependency edges are also added in the reverse direction
Calculate term weights by Pointwise Mutual Information (PMI)
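The PMI weighting step can be sketched on (head, relation, dependent) triples. This is a simplified illustration under stated assumptions: the relation and dependent are flattened into a single feature, the relation labels are made up, and no tensor decomposition is performed. `pmi_weights` is a hypothetical name.

```python
import math
from collections import Counter

def pmi_weights(triples):
    """PMI between each head word and each (relation, dependent) feature.

    PMI(h, f) = log2( p(h, f) / (p(h) * p(f)) ), estimated from triple counts.
    """
    pair_counts = Counter((h, (r, d)) for h, r, d in triples)
    head_counts = Counter(h for h, _, _ in triples)
    feat_counts = Counter((r, d) for _, r, d in triples)
    total = len(triples)

    weights = {}
    for (head, feat), n in pair_counts.items():
        p_joint = n / total
        p_head = head_counts[head] / total
        p_feat = feat_counts[feat] / total
        weights[(head, feat)] = math.log2(p_joint / (p_head * p_feat))
    return weights

# Toy dependency edges; labels "subj" and "nmod" are illustrative only.
triples = [
    ("wogonin", "subj", "induce"),
    ("fisetin", "subj", "induce"),
    ("wogonin", "nmod", "cell"),
    ("caffeine", "nmod", "cell"),
]
weights = pmi_weights(triples)
print(weights)
```

A head that occurs only with one feature (caffeine here) gets a higher weight for that feature than a head spread across several features.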
![Page 26: Biomedical Text Mining: Inferring Hidden Relationships from Biological Literature](https://reader034.vdocuments.mx/reader034/viewer/2022042821/56649e185503460f94b04ef1/html5/thumbnails/26.jpg)
Tensor Example
![Page 27: Biomedical Text Mining: Inferring Hidden Relationships from Biological Literature](https://reader034.vdocuments.mx/reader034/viewer/2022042821/56649e185503460f94b04ef1/html5/thumbnails/27.jpg)
Tensors are useful for 3 or more modes
![Page 28: Biomedical Text Mining: Inferring Hidden Relationships from Biological Literature](https://reader034.vdocuments.mx/reader034/viewer/2022042821/56649e185503460f94b04ef1/html5/thumbnails/28.jpg)
Tensor SVD Decomposition
![Page 29: Biomedical Text Mining: Inferring Hidden Relationships from Biological Literature](https://reader034.vdocuments.mx/reader034/viewer/2022042821/56649e185503460f94b04ef1/html5/thumbnails/29.jpg)
2D Analog of Tensor SVD Decomposition
![Page 30: Biomedical Text Mining: Inferring Hidden Relationships from Biological Literature](https://reader034.vdocuments.mx/reader034/viewer/2022042821/56649e185503460f94b04ef1/html5/thumbnails/30.jpg)
Methods
UPK Inference Example
Wogonin: Apoptosis, N/A, Malignant T-Cells, Increase
Fisetin: Apoptosis, N/A, HCT-116 Cells, Increase
![Page 31: Biomedical Text Mining: Inferring Hidden Relationships from Biological Literature](https://reader034.vdocuments.mx/reader034/viewer/2022042821/56649e185503460f94b04ef1/html5/thumbnails/31.jpg)
Methods
UPK Inference Example
Wogonin: Apoptosis, N/A, Malignant T-Cells, Increase
Fisetin: Apoptosis, N/A, HCT-116 Cells, Increase
Similarity measure: Wogonin vs. Fisetin
UMLS Semantic Type: Organic Chemical; similarity 1
Structural Similarity: 0.75; similarity 1
Atomic Count: C16H12O5 (Wogonin), C15H10O6 (Fisetin); similarity 1
Semantic Similarity: 0.265
![Page 32: Biomedical Text Mining: Inferring Hidden Relationships from Biological Literature](https://reader034.vdocuments.mx/reader034/viewer/2022042821/56649e185503460f94b04ef1/html5/thumbnails/32.jpg)
Methods
UPK Inference Example
Wogonin: Apoptosis, N/A, Malignant T-Cells, Increase
Fisetin: Apoptosis, N/A, HCT-116 Cells, Increase
[Diagram: Wogonin and Fisetin linked]
![Page 33: Biomedical Text Mining: Inferring Hidden Relationships from Biological Literature](https://reader034.vdocuments.mx/reader034/viewer/2022042821/56649e185503460f94b04ef1/html5/thumbnails/33.jpg)
Results & Discussion
Input data: 500 PubMed abstracts related to ‘apoptosis’
Extraction result

| Entity Type | # of extracted entities |
|---|---|
| Substances | 410 |
| Processes | 357 |
| Diseases | 44 |
| Body Parts | 82 |
![Page 34: Biomedical Text Mining: Inferring Hidden Relationships from Biological Literature](https://reader034.vdocuments.mx/reader034/viewer/2022042821/56649e185503460f94b04ef1/html5/thumbnails/34.jpg)
Results & Discussion: Semantic Similarity with Wogonin
Similarity between wogonin_NN1 and docetaxel_NN1 : 0.0555
Similarity between wogonin_NN1 and serotonin_NN1 : 0.0558
Similarity between wogonin_NN1 and amisulpride_NN1 : 0.0
Similarity between wogonin_NN1 and ranolazine_NN1 : 0.0
Similarity between wogonin_NN1 and genistein_NN1 : 0.0429
Similarity between wogonin_NN1 and brivaracetam_NN1 : 0.0
Similarity between wogonin_NN1 and carisbamate_NN1 : 0.0
Similarity between wogonin_NN1 and riboflavin_NN1 : 0.0
Similarity between wogonin_NN1 and fisetin_NN1 : 0.0532
Similarity between wogonin_NN1 and daidzein_NN1 : 0.0
Similarity between wogonin_NN1 and caffeine_NN1 : 0.0
Similarity between wogonin_NN1 and enzyme_NN1 : -1.530258524063656E-4
Similarity between wogonin_NN1 and topiramate_NN1 : 0.0
Similarity between wogonin_NN1 and melatonin_NN1 : 0.084
Similarity between wogonin_NN1 and nimodipine_NN1 : 0.086
![Page 35: Biomedical Text Mining: Inferring Hidden Relationships from Biological Literature](https://reader034.vdocuments.mx/reader034/viewer/2022042821/56649e185503460f94b04ef1/html5/thumbnails/35.jpg)
Results & Discussion: PageRank Score
| Substance Name | Semantic Type | PageRank Similarity |
|---|---|---|
| NAG-1 | Gene or Genome | 0.007264810642471977 |
| apoptosis | Cell Function | 0.0072537088024985635 |
| Flou-3 AM | Pharmacologic Substance | 0.007244320944332344 |
| wogonin | Organic Chemical | 0.007134948358585843 |
| Docetaxel | Organic Chemical | 0.0070126880085477124 |
| Jarisch-Herxheimer reaction | Functional Concept | 0.0070126880085477124 |
| apoptotic cells | Cell | 0.0067827690545702755 |
| Genistein | Organic Chemical | 0.006781234384834052 |
| p53 | Gene or Genome | 0.006771667214596861 |
| docetaxel+SN | Organic Compound | 0.006762759924385635 |
| adverse reactions | Finding | 0.006762759924385635 |
| atRA | Organic Chemical | 0.006762759924385635 |
| HCT-116 | Cell Line | 0.006762759924385635 |
| HCT-116 cells | Cell Line | 0.006762759924385635 |
| SN-38 | Organic Chemical | 0.006521739130434784 |
| mesenchyme | Embryonic Structure | 0.006521739130434784 |
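For readers unfamiliar with PageRank, here is a minimal power-iteration sketch on a tiny directed entity graph. The damping factor of 0.85, the iteration count, the edge directions and the toy edges are all assumptions; the slides do not state how the scores in the table were computed.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Plain power-iteration PageRank over a list of (source, target) edges."""
    nodes = sorted({n for edge in edges for n in edge})
    out_links = {n: [] for n in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        new_rank = {n: (1.0 - damping) / len(nodes) for n in nodes}
        for src in nodes:
            # Nodes without outgoing edges spread their rank over all nodes.
            targets = out_links[src] or nodes
            share = damping * rank[src] / len(targets)
            for dst in targets:
                new_rank[dst] += share
        rank = new_rank
    return rank

# Toy graph for illustration only.
toy_edges = [
    ("wogonin", "apoptosis"),
    ("genistein", "apoptosis"),
    ("apoptosis", "HCT-116 cells"),
]
scores = pagerank(toy_edges)
print(scores)
```

In this toy graph, apoptosis outranks the two substances because both of them point to it.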
![Page 36: Biomedical Text Mining: Inferring Hidden Relationships from Biological Literature](https://reader034.vdocuments.mx/reader034/viewer/2022042821/56649e185503460f94b04ef1/html5/thumbnails/36.jpg)
Results & Discussion: Semantic Similarity for Magnesium and Migraine
[Diagram: A1 (Magnesium), B1 (Epilepsy), C1 (Migraine); edge labels: "Positive impact on", "Is related to"; semantic types shown: Element, Ion, or Isotope; Disease or Syndrome; Disease or Syndrome]
Magnesium – Epilepsy: 0.033
Magnesium – Malaria: 0.011
Magnesium – Sarcoidosis: 0.015
Magnesium – Diabetes: 0.017
Magnesium – Asthma: 0.021
Magnesium – Hyperoxaluria: 0.026
Magnesium – Hepatitis: 0.018
Epilepsy – Migraine: 0.158
Epilepsy – Malaria: 0.004
Epilepsy – Sarcoidosis: 0.041
Epilepsy – Diabetes: 0.049
Epilepsy – Asthma: 0.058
Epilepsy – Hyperoxaluria: 0.002
Epilepsy – Hepatitis: 0.009
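Using the similarity values listed on this slide, the strongest link at each step can be picked with a simple maximum. The dictionary names are hypothetical, and taking the top-scoring term is only one possible selection rule; the slides do not state how candidates were ranked.

```python
# Semantic similarity scores copied from this slide.
magnesium_scores = {
    "Epilepsy": 0.033, "Malaria": 0.011, "Sarcoidosis": 0.015,
    "Diabetes": 0.017, "Asthma": 0.021, "Hyperoxaluria": 0.026,
    "Hepatitis": 0.018,
}
epilepsy_scores = {
    "Migraine": 0.158, "Malaria": 0.004, "Sarcoidosis": 0.041,
    "Diabetes": 0.049, "Asthma": 0.058, "Hyperoxaluria": 0.002,
    "Hepatitis": 0.009,
}

# Pick the highest-scoring B term for magnesium, then the best C term for epilepsy.
best_b = max(magnesium_scores, key=magnesium_scores.get)
best_c = max(epilepsy_scores, key=epilepsy_scores.get)
print(best_b, best_c)
```

Among the listed diseases, epilepsy scores highest for magnesium and migraine scores highest for epilepsy, which matches the A1-B1-C1 chain shown on the slide.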
![Page 37: Biomedical Text Mining: Inferring Hidden Relationships from Biological Literature](https://reader034.vdocuments.mx/reader034/viewer/2022042821/56649e185503460f94b04ef1/html5/thumbnails/37.jpg)
Results & Discussion
Sample of new relationships
Supporting Papers
Wogonin increases apoptosis in HCT-116 cells
- “Reactive oxygen species up-regulate p53 and Puma; a possible mechanism for apoptosis during combined treatment with TRAIL and wogonin”, Dae-Hee Lee et al.
Genistein can induce apoptosis in HCT-116 cells
- “Genistein, a Dietary Isoflavone, down-regulates the MDM2 Oncogene at Both Transcriptional and Posttranslational Levels”, Mao Li et al.

| Substance | Effect Type | Process | Disease | Body Part |
|---|---|---|---|---|
| Wogonin | Increase | Apoptosis | N/A | HCT-116 Cells |
| Fisetin | Increase | Apoptosis | N/A | Malignant T Cells |
| Docetaxel | Increase | mRNA expression of IL-1 | N/A | N/A |
| Genistein | Increase | Apoptosis | N/A | HCT-116 Cells |
| Fisetin | Increase | Apoptosis | N/A | Tumor Cells |
![Page 38: Biomedical Text Mining: Inferring Hidden Relationships from Biological Literature](https://reader034.vdocuments.mx/reader034/viewer/2022042821/56649e185503460f94b04ef1/html5/thumbnails/38.jpg)
Summary & Future Work
This is an ongoing project.
The system was applied to the entity relations identified by our information model.
We proposed a new system that extracts relationships from biomedical text and infers new information.
Future work
- Other techniques for NER
- Anaphoric relationship extraction
- Further enhancement of the Link Grammar lexicon
- Rule generalization to provide better coverage
![Page 39: Biomedical Text Mining: Inferring Hidden Relationships from Biological Literature](https://reader034.vdocuments.mx/reader034/viewer/2022042821/56649e185503460f94b04ef1/html5/thumbnails/39.jpg)
Demo
Retrieve stored entities and relations: http://informatics.yonsei.ac.kr/relex/SelectDatabase.jsp
Download a PubMed record and extract entities and relations: http://informatics.yonsei.ac.kr/relex/DownloadPubMedRecord.jsp
![Page 40: Biomedical Text Mining: Inferring Hidden Relationships from Biological Literature](https://reader034.vdocuments.mx/reader034/viewer/2022042821/56649e185503460f94b04ef1/html5/thumbnails/40.jpg)
Conclusion
We proposed context vectors to infer unknown relationships based on biologically meaningful terms.
We constructed a multi-level entity dictionary to recognize multi-level entities in the literature.
We used our context vectors to discover putative drug-disease relationships.
We evaluated the results against drug-disease relations curated from the literature (PharmGKB, CTD).
On 77,711 papers about Alzheimer’s disease, we found that our context-vector-based hybrid approach has better precision than the previous frequency-based ABC model.
![Page 41: Biomedical Text Mining: Inferring Hidden Relationships from Biological Literature](https://reader034.vdocuments.mx/reader034/viewer/2022042821/56649e185503460f94b04ef1/html5/thumbnails/41.jpg)
Thank you!
Questions?
![Page 42: Biomedical Text Mining: Inferring Hidden Relationships from Biological Literature](https://reader034.vdocuments.mx/reader034/viewer/2022042821/56649e185503460f94b04ef1/html5/thumbnails/42.jpg)
Appendix: Future Study: Different Approach to Context Terms
Based on interaction words (verb terms), define possible direct interactions among entities, and assume that the remaining entities are context.
Sentence 1: I-verb, I-Ent1, I-Ent2, C-Ent, C-Ent, C-Ent
Sentence 2: I-verb, I-Ent1, I-Ent2, C-Ent, C-Ent, C-Ent
Sentence 3: I-verb, C-Ent, I-Ent1, C-Ent, C-Ent, I-Ent2