![Page 1: Associating Biomedical Terms: Case Study for Acetylation Aaron Buechlein Indiana University School of Informatics Advisor: Dr. Predrag Radivojac](https://reader036.vdocuments.mx/reader036/viewer/2022062721/56649f275503460f94c3f329/html5/thumbnails/1.jpg)
Associating Biomedical Terms: Case Study for Acetylation
Aaron Buechlein
Indiana University School of Informatics
Advisor: Dr. Predrag Radivojac
Overview
• Background
• Previous Work
• Methods
• Results
Central Dogma
Figure source: http://www.accessexcellence.org/RC/VL/GG/images/central.gif
Post-Translational Modifications (PTMs)
Acetylation
• Acetylation substitutes an acetyl group (-COCH3) for a hydrogen atom
• Typically occurs at protein N-termini (N-terminal tails) and on lysine residues (Lys or K)
Previous Predictors
• Several PTM predictors existed before this work
• Acetylation-specific predictors also existed:
  • NetAcet predicts N-terminal acetylation sites only
  • AutoMotif Server predicts various PTMs and includes an acetylation component
  • PAIL predicts lysine acetylation sites
Methods
• Create dataset
  • Download articles relevant to acetylation and extract sites
  • Rank articles so that sites can be identified quickly
  • Add sites from SwissProt and the Human Protein Reference Database (HPRD)
• Create predictors
  • Leave-one-protein-out validation
  • Matlab
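Leave-one-protein-out validation holds out all lysines of one protein at a time, so a predictor is never tested on a protein it was trained on. The original work used Matlab; the following is only a minimal Python sketch, assuming each example is a `(protein_id, features, label)` tuple shaped like the rows of the predictor input table. The function name `leave_one_protein_out` is hypothetical.

```python
from collections import defaultdict


def leave_one_protein_out(rows):
    """Yield (held_out_protein, train_rows, test_rows) for each protein.

    Each row is assumed to be a (protein_id, features, label) tuple.
    All rows of one protein go to the test fold together, so no protein
    contributes to both training and testing within a fold.
    """
    by_protein = defaultdict(list)
    for row in rows:
        by_protein[row[0]].append(row)
    for protein in sorted(by_protein):
        test_rows = by_protein[protein]
        train_rows = [row for other, group in by_protein.items()
                      if other != protein for row in group]
        yield protein, train_rows, test_rows


# Usage with three hypothetical proteins (features shortened to one value).
rows = [(1, [0.49], 1), (1, [0.51], 0), (2, [0.20], 1), (3, [0.89], 1)]
for protein, train, test in leave_one_protein_out(rows):
    print(protein, len(train), len(test))  # 1 2 2, then 2 3 1, then 3 3 1
```

Each fold would train a predictor on `train` and evaluate it on `test`; one common choice is to pool the test-fold predictions before computing the evaluation measures.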
Article Retrieval
• Searched individual journal sites for articles relevant to acetylation
• Saved the resulting HTML search-result pages for each journal
• These pages were then used as input to a web crawler that downloaded the articles
• Because each journal site is structured differently, each journal required its own regular expression to extract article links
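A sketch of the per-journal link extraction step, assuming each saved results page is read into a string. The journal key and the pattern in `LINK_PATTERNS` are hypothetical placeholders; the real expressions depended on each site's markup and are not given in the slides. The download itself (for example with `urllib.request.urlopen`) is left out.

```python
import re
from urllib.parse import urljoin

# Hypothetical per-journal link patterns; each real journal site needed its
# own expression, and the actual ones are not given in the slides.
LINK_PATTERNS = {
    "example_journal": re.compile(r'href="(/content/[^"]+\.full)"'),
}


def extract_article_links(html, journal, base_url):
    """Return absolute article URLs from a saved search-results page."""
    pattern = LINK_PATTERNS[journal]
    links = []
    for match in pattern.finditer(html):
        url = urljoin(base_url, match.group(1))
        if url not in links:  # keep first-seen order, skip repeated links
            links.append(url)
    return links


page = ('<a href="/content/12/3.full">PDF</a> '
        '<a href="/content/12/3.full">HTML</a> <a href="/about">About</a>')
print(extract_article_links(page, "example_journal", "https://journal.example.org/"))
```

Keeping one compiled pattern per journal in a dictionary mirrors the need for a separate expression per site while sharing the rest of the crawler logic.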
Rank Articles
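• First, locate all occurrences of the first phrase, “phrase 1”:
  • $A = \{a_1, a_2, \dots, a_{|A|}\}$
• Next, locate all occurrences of the second phrase, “phrase 2”:
  • $R = \{r_1, r_2, \dots, r_{|R|}\}$
• Each occurrence $r \in R$ is weighted by $f(x) = c\,e^{-dx}$, where:
  • $c$ and $d$ are constants
  • $x$ is the distance in characters between $r$ and the nearest occurrence $a \in A$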
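The ranking rule can be computed directly from match positions. This is a minimal sketch, assuming positions are the character offsets where each match starts and that the weight has the exponential form $f(x) = c\,e^{-dx}$ used in the acetylation example; returning 0 for an article with no phrase-1 match is an added assumption, and the function names are hypothetical.

```python
import math
from bisect import bisect_left


def nearest_distance(r, a_sorted):
    """Characters between position r and the nearest position in a_sorted."""
    i = bisect_left(a_sorted, r)
    candidates = []
    if i < len(a_sorted):
        candidates.append(a_sorted[i] - r)      # nearest a at or after r
    if i > 0:
        candidates.append(r - a_sorted[i - 1])  # nearest a before r
    return min(candidates)


def article_score(a_positions, r_positions, c, d):
    """S = sum over r in R of c * exp(-d * x), x = distance to nearest a."""
    if not a_positions:
        return 0.0  # assumption: no phrase-1 match means nothing to anchor to
    a_sorted = sorted(a_positions)
    return sum(c * math.exp(-d * nearest_distance(r, a_sorted))
               for r in r_positions)


print(article_score([0, 100], [90], c=10, d=0.005))
```

Sorting A once and using binary search keeps each nearest-neighbour lookup logarithmic, which matters little for a single article but helps when scoring thousands of them.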
An example: acetylation
1. Phrase 1: the word stem “acetylat”
   $A = \{a_1, a_2, \dots, a_m\}$
2. Phrase 2: the regular expression (k|lys|lysine)(space)*(digit)+
   $R = \{r_1, r_2, \dots, r_n\}$
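In Python, the two phrases might be written as follows. Case-insensitive matching and the leading word boundary `\b` are assumptions not stated in the slides; they let "K120", "Lys 9" and "lysine 12" match while a "k" inside another word does not. The names `PHRASE1`, `PHRASE2` and `find_positions` are hypothetical.

```python
import re

# Phrase 1: the stem "acetylat" (acetylation, acetylated, acetylates, ...).
PHRASE1 = re.compile(r"acetylat", re.IGNORECASE)
# Phrase 2: (k|lys|lysine), optional whitespace, then a residue number.
# IGNORECASE and the leading \b are assumptions, not taken from the slides.
PHRASE2 = re.compile(r"\b(?:k|lys|lysine)\s*\d+", re.IGNORECASE)


def find_positions(text):
    """Return (A, R): start offsets of phrase-1 and phrase-2 matches."""
    a_positions = [m.start() for m in PHRASE1.finditer(text)]
    r_positions = [m.start() for m in PHRASE2.finditer(text)]
    return a_positions, r_positions


print(find_positions("p53 is acetylated at K120 and Lys 164."))  # ([7], [21, 30])
```

The stem "acetylat" deliberately matches every inflection at once, so A collects mentions of acetylation regardless of word form.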
An example: acetylation
Score for article S:
$S = \sum_{i=1}^{n} \mathrm{score}(r_i, A)$
An example: acetylation
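Score for article S:
$S = \sum_{i=1}^{n} \mathrm{score}(r_i, A)$
where
$\mathrm{score}(r_i, A) = f\left(\left|\mathrm{position}(r_i) - \mathrm{position}(a_k)\right|\right)$
and
$k = \arg\min_{j = 1, \dots, m} \left|\mathrm{position}(r_i) - \mathrm{position}(a_j)\right|$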
Papers with S > 100 are rich in sites; papers with S < 30 fall into a “twilight” zone
[Figure: the weighting function $f(x) = 10e^{-0.005x}$ plotted against x, the distance in characters (0 to 1000); f(x) decays from 10 toward 0]
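A quick numeric check of the plotted weighting and the score thresholds, using c = 10 and d = 0.005 as read from the plot. Because every occurrence contributes at most f(0) = 10, an article needs more than ten lysine-position mentions (not necessarily distinct sites) to exceed S = 100. The label "intermediate" for scores from 30 to 100 is a hypothetical name, not one used in the slides.

```python
import math


def f(x, c=10.0, d=0.005):
    """Distance weight from the example plot: f(x) = 10 * exp(-0.005 * x)."""
    return c * math.exp(-d * x)


def score_zone(s):
    """Label an article score using the S > 100 and S < 30 thresholds."""
    if s > 100:
        return "rich in sites"
    if s < 30:
        return "twilight zone"
    return "intermediate"  # hypothetical label for 30 <= S <= 100


for x in (0, 100, 500, 1000):
    print(x, round(f(x), 3))
```

This prints 10.0, 6.065, 0.821 and 0.067: a mention 100 characters from the nearest "acetylat" still counts for about 60% of a directly adjacent one, while one 1000 characters away contributes almost nothing.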
Elucidate Sites
• Sites were manually extracted from articles, starting with the highest-ranked
• For each site, the original experimental paper was checked for traceable evidence
• Sites were extracted from SwissProt
• Sites were extracted from HPRD
Predictors
• Support Vector Machine
• Artificial Neural Network
• Decision Tree
Predictor Input
• Positives: all lysines found to be acetylated
• Negatives: all lysines not found to be acetylated
• Features computed from characteristics of the sequence surrounding each lysine
• Amino acid content, hydrophobicity, charge, disorder, etc.
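A sketch of window-based features around each lysine. The window half-width of 10 residues, the Kyte-Doolittle hydrophobicity scale, and the crude charge count (+1 for K/R, -1 for D/E) are all assumptions, since the slides do not specify them; disorder would require a separate disorder predictor and is omitted. The function names are hypothetical.

```python
# Kyte-Doolittle hydropathy values (an assumed choice of hydrophobicity scale).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def lysine_windows(seq, half_width=10):
    """Yield (index, window) for each K; '-' pads positions past either end."""
    padded = "-" * half_width + seq + "-" * half_width
    for i, residue in enumerate(seq):
        if residue == "K":
            # seq[i] sits at padded[i + half_width], so this slice is centred on it.
            yield i, padded[i:i + 2 * half_width + 1]


def window_features(window):
    """Amino acid composition, mean hydrophobicity and crude net charge."""
    residues = [aa for aa in window if aa != "-"]
    n = len(residues)  # >= 1 for windows from lysine_windows (central K)
    composition = {aa: residues.count(aa) / n for aa in AMINO_ACIDS}
    hydrophobicity = sum(KYTE_DOOLITTLE.get(aa, 0.0) for aa in residues) / n
    charge = sum(aa in "KR" for aa in residues) - sum(aa in "DE" for aa in residues)
    return composition, hydrophobicity, charge


for index, window in lysine_windows("MSGRGKGGKGLGK", half_width=3):
    composition, hydro, charge = window_features(window)
    print(index, window, round(hydro, 2), charge)
```

Padding with "-" keeps every window the same length, so lysines near either terminus still produce a fixed-size feature vector.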
Predictor Input
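| Protein | Feature 1 | Feature 2 | Feature 3 | Feature 4 | Feature 5 | Feature 6 | Acetylated |
|---|---|---|---|---|---|---|---|
| 1 | 8 | 1 | 0.48609 | 0.001767 | 0.48979 | 0.51508 | 1 |
| 1 | 7 | 1 | 0.92146 | 0.03019 | 0.96423 | 0.79416 | 1 |
| 1 | 0 | 0 | 0.50622 | 0.015251 | 0.52335 | 0.51855 | 0 |
| 2 | 10 | 2 | 0.2008 | 0.038708 | 0.25441 | 0.36071 | 1 |
| 2 | 1 | 0 | 0.62016 | 0.009772 | 0.62846 | 0.67525 | 0 |
| 2 | 0 | 0 | 0.27783 | 0.028957 | 0.32162 | 0.34207 | 0 |
| 3 | 11 | 1 | 0.89239 | 0.018354 | 0.91884 | 0.88125 | 1 |
| 3 | 12 | 2 | 0.87354 | 0.022307 | 0.90349 | 0.87446 | 1 |
| 3 | 8 | 1 | 0.81549 | 0.025339 | 0.85289 | 0.85702 | 1 |
| 3 | 2 | 0 | 0.84588 | 0.024766 | 0.88219 | 0.86599 | 0 |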
Article and Ranking Results
• 4888 articles from 10 journal sites were searched
  • Nature provided 2147 articles
  • Science Direct provided 1519 articles
• The highest-ranked article came from the Journal of Biological Chemistry
  • Score of 151.87
  • Contained 10 acetylation sites
• When histones are excluded, the highest-ranked article came from Nature
  • Previously ranked #5
  • Score of 116.36
  • Contained 9 unique acetylation sites
Top 25
| Rank | Score | Sites | Article source |
|---|---|---|---|
| 1 | 151.8667 | 10 | Journal of Biological Chemistry |
| 2 | 123.2314 | 12 | Cell / Science Direct |
| 3 | 121.9031 | 6 | Nature |
| 4 | 117.7988 | 9 | Journal of Proteome Research |
| 5 | 116.3582 | 9 | Nature |
| 6 | 111.1745 | 14 | Biochemistry |
| 7 | 104.4652 | 6 | Cell / Science Direct |
| 8 | 104.0166 | 7 | Nature |
| 9 | 102.0683 | 13 | Molecular Cell / Science Direct |
| 10 | 98.80812 | 6 | Journal of Biological Chemistry |
| 11 | 97.64634 | 6 | Biochemistry |
| 12 | 96.76536 | 6 | Journal of Biological Chemistry |
| 13 | 96.0845 | 9 | International Journal of Mass Spectrometry / Science Direct |
| 14 | 88.12967 | 9 | Biochemistry |
| 15 | 86.17157 | 6 | Journal of Biological Chemistry |
| 16 | 81.78705 | 5 | Nucleic Acids Research |
| 17 | 81.30967 | 6 | Biochemistry |
| 18 | 81.06128 | 6 | Molecular Cell / Science Direct |
| 19 | 80.74899 | 9 | Journal of Biological Chemistry |
| 20 | 80.16261 | 9 | Nature |
| 21 | 79.65658 | 6 | Molecular Cell / Science Direct |
| 22 | 77.9022 | 4 | Cell / Science Direct |
| 23 | 77.88304 | 5 | Nucleic Acids Research |
| 24 | 77.60087 | 8 | Gene / Science Direct |
| 25 | 77.44198 | 6 | Journal of the American Society for Mass Spectrometry |
Ranking Results
• Articles with scores greater than 30 had the potential to provide at least one site
• As scores approached 30, articles yielded fewer sites
Dataset Results
• The dataset included 1442 total sites, 1085 of them non-redundant
  • HPRD contributed 90 sites
  • Swiss-Prot contributed 825 sites
  • Our study contributed 527 sites
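The source counts add up to the total (90 + 825 + 527 = 1442), so 1442 - 1085 = 357 entries were redundant. A minimal sketch of that bookkeeping, assuming a site is identified by a hypothetical `(protein_id, residue_position)` pair; the toy data below is invented for illustration and is not the real dataset.

```python
def merge_sites(sources):
    """Return (total entries, set of non-redundant sites) across sources.

    `sources` maps a source name to a list of sites; a site is assumed to be
    a (protein_id, residue_position) pair, so the same site reported by two
    sources counts twice in the total but once in the non-redundant set.
    """
    total = sum(len(sites) for sites in sources.values())
    non_redundant = set()
    for sites in sources.values():
        non_redundant.update(sites)
    return total, non_redundant


# Hypothetical toy data, not the real dataset.
toy = {
    "HPRD": [("P1", 5)],
    "Swiss-Prot": [("P1", 5), ("P2", 9)],
    "this study": [("P2", 9), ("P3", 14)],
}
total, unique = merge_sites(toy)
print(total, len(unique))  # 5 3
```

In practice the identifiers from the different databases would first need mapping to a common protein accession for this set-based deduplication to work.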
Sensitivity, Specificity, and Precision
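• Sensitivity (sn): $\mathrm{sn} = \frac{TP}{TP + FN}$
• Specificity (sp): $\mathrm{sp} = \frac{TN}{TN + FP}$
• Precision (pr): $\mathrm{pr} = \frac{TP}{TP + FP}$
• TP, FN, TN, and FP are the counts of true positives, false negatives, true negatives, and false positives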
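These definitions translate directly into code. A minimal sketch, assuming binary labels and predictions where 1 marks an acetylated lysine; the function names are hypothetical.

```python
def confusion_counts(labels, predictions):
    """Count TP, FN, TN, FP for binary labels (1 = acetylated lysine)."""
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)
    tn = sum(1 for y, p in pairs if y == 0 and p == 0)
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)
    return tp, fn, tn, fp


def sn_sp_pr(tp, fn, tn, fp):
    sensitivity = tp / (tp + fn)   # share of acetylated lysines that were found
    specificity = tn / (tn + fp)   # share of other lysines correctly rejected
    precision = tp / (tp + fp)     # share of positive calls that are correct
    return sensitivity, specificity, precision


print(sn_sp_pr(*confusion_counts([1, 1, 0, 0, 0], [1, 0, 0, 0, 1])))
# (0.5, 0.6666666666666666, 0.5)
```

Precision depends on the ratio of negatives to positives while sensitivity and specificity do not, which is why precision stays low in the result tables when non-acetylated lysines far outnumber acetylated ones.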
Accuracy and AUC
• Accuracy (acc): $\mathrm{acc} = \frac{\mathrm{sn} + \mathrm{sp}}{2}$
• Area Under Curve (AUC): the area under the Receiver Operating Characteristic (ROC) curve
• The ROC curve plots sensitivity against 1 - specificity
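Two observations help when reading the result tables. First, in every reported row acc equals (sn + sp)/2 to within rounding (e.g. SVM: (54.1 + 72.1)/2 = 63.1), so acc appears to be balanced accuracy. Second, AUC equals the probability that a randomly chosen positive is scored higher than a randomly chosen negative, counting ties as one half, which allows computing it without drawing the curve. A sketch with hypothetical function names:

```python
def balanced_accuracy(sn, sp):
    """Mean of sensitivity and specificity."""
    return (sn + sp) / 2


def auc(pos_scores, neg_scores):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair. This O(P*N) pair count equals the
    Mann-Whitney U statistic divided by P*N.
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


print(round(balanced_accuracy(54.1, 72.1), 1))  # 63.1 (SVM row)
print(auc([0.9, 0.3], [0.5, 0.1]))              # 0.75
```

An AUC of 0.5 corresponds to random ranking, so a value below 0.5 (as for the decision tree) means its scores ranked positives below negatives more often than not.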
SVM Predictor
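Polynomial kernel (values in %)

| Degree | sn | sp | pr | acc | AUC |
|---|---|---|---|---|---|
| p = 1 | 52.3 | 71.0 | 24.6 | 61.6 | 65.2 |
| p = 2 | 46.1 | 69.8 | 20.3 | 57.9 | 62.8 |
| p = 3 | 31.6 | 80.8 | 23.5 | 56.2 | 60.3 |

Gaussian kernel (values in %)

| Kernel parameter | sn | sp | pr | acc | AUC |
|---|---|---|---|---|---|
| $\sigma = 10^{-2}$ | 43.8 | 75.8 | 24.9 | 59.8 | 64.3 |
| $\sigma = 10^{-3}$ | 54.1 | 72.1 | 25.9 | 63.1 | 68.1 |
| $\sigma = 10^{-6}$ | 52.8 | 70.7 | 24.6 | 61.8 | 65.3 |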
Artificial Neural Network
| Hidden neurons | sn | sp | pr | acc | AUC |
|---|---|---|---|---|---|
| 1 | 68.0 | 47.7 | 20.7 | 57.8 | 61.9 |
| 3 | 65.2 | 47.7 | 19.4 | 56.4 | 58.9 |
| 5 | 65.0 | 47.2 | 19.1 | 56.1 | 57.5 |
Decision Tree
| Algorithm | sn | sp | pr | acc | AUC |
|---|---|---|---|---|---|
| Decision Tree | 61.7 | 45.9 | 18.3 | 53.8 | 42.1 |
Algorithm Comparison
| Algorithm | sn | sp | pr | acc | AUC |
|---|---|---|---|---|---|
| SVM | 54.1 | 72.1 | 25.9 | 63.1 | 68.1 |
| Neural Network | 68.0 | 47.7 | 20.7 | 57.8 | 61.9 |
| Decision Tree | 61.7 | 45.9 | 18.3 | 53.8 | 42.1 |
I would like to thank everyone who helped me throughout this project: Dr. Predrag Radivojac, Dr. Haixu Tang, and Wyatt Clark.
I welcome your questions and/or comments