Nothing in (computational) biology makes sense except in the light of evolution
after Theodosius Dobzhansky (1970)
Using (and abusing) sequence analysis to make biological discoveries
Only a small fraction of amino acid residues is directly involved in protein function (including catalysis); the rest of the protein serves largely as a structural scaffold.
Significant sequence similarity is evidence of homology.
Conserved sequence motifs are determinants of conserved ancestral functions.
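One common way to decide whether similarity is "significant" is the Karlin-Altschul statistics used by BLAST: the expected number of chance hits with raw score at least S, for a query of length m and a database of total length n, is E = K·m·n·e^(-λS). A minimal sketch, assuming the λ = 0.267 and K = 0.041 values commonly quoted for BLOSUM62 with gap costs 11/1, and ignoring the effective-length corrections BLAST applies; the function names are ours.

```python
import math

def karlin_altschul_evalue(raw_score, m, n, K=0.041, lam=0.267):
    """Expected number of chance hits scoring >= raw_score: E = K*m*n*exp(-lam*S).
    Default K and lam are assumed values (often quoted for BLOSUM62, gaps 11/1)."""
    return K * m * n * math.exp(-lam * raw_score)

def bit_score(raw_score, K=0.041, lam=0.267):
    """Normalized score S' = (lam*S - ln K) / ln 2, so that E = m*n*2**(-S')."""
    return (lam * raw_score - math.log(K)) / math.log(2)
```

The same raw score becomes less significant as the query or database grows: doubling m or n doubles E, which is why E-values, not raw scores, are compared across searches.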
The evolving roles of computational analysis in biology
Pre-sequencing era (before 1978-80): study biological function
Pre-genomic era (1980-1996): study biological function → clone/sequence gene → analyze/interpret sequence
Post-genomic era (1996 onward): sequence genome → analyze/interpret sequences of all genes → prioritize targets → study biological function
Sequence complexity: a measure of the randomness of a sequence
Random sequence: highest complexity (entropy); characteristic of globular protein domains
Homopolymer: lowest complexity (entropy); characteristic of non-globular structures
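Compositional complexity is usually quantified as the Shannon entropy of the residue frequencies, in bits per residue: 0 for a homopolymer and at most log2(20) ≈ 4.32 for a stretch that uses all 20 amino acids equally often. A minimal sketch (the function name is ours), applied to the three example sequences on this slide:

```python
import math
from collections import Counter

def shannon_entropy(seq: str) -> float:
    """Compositional complexity in bits per residue: sum of p_i * log2(1/p_i)."""
    n = len(seq)
    if n == 0:
        raise ValueError("empty sequence")
    return sum((c / n) * math.log2(n / c) for c in Counter(seq).values())

for s in ["QQQQQQQQQQQQQ", "KRKRKRKRKRKR", "ASDFGHKLCVNM"]:
    print(s, round(shannon_entropy(s), 3))
# QQQQQQQQQQQQQ 0.0
# KRKRKRKRKRKR 1.0
# ASDFGHKLCVNM 3.585
```

Note that entropy only sees composition: (KR)n and a shuffled mix of six K and six R score the same, which is where the algorithmic view of complexity differs.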
Algorithmic complexity
QQQQQQQQQQQQQ = (Q)n
KRKRKRKRKRKR = (KR)n
ASDFGHKLCVNM: random sequence; no algorithm can derive it from a simpler one
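Algorithmic (Kolmogorov) complexity cannot be computed exactly, but compressed size is a common rough proxy: a sequence generated by a short rule such as (Q)n or (KR)n compresses well, while a non-repetitive one barely compresses. A sketch using Python's standard `zlib` module; the 60-residue test strings are our choice, the last being residues 1-60 of human BRCA1 as listed in the SEG output in this section.

```python
import zlib

def compression_ratio(seq: str) -> float:
    """Compressed size / original size: a crude, computable stand-in for
    algorithmic complexity (lower = more compressible = simpler)."""
    raw = seq.encode("ascii")
    return len(zlib.compress(raw, 9)) / len(raw)

examples = {
    "homopolymer (Q)n": "Q" * 60,
    "dipeptide repeat (KR)n": "KR" * 30,
    "BRCA1 residues 1-60": "MDLSALRVEEVQNVINAMQKILECPICLELIKEPVSTKCDHIFCKFCMLKLLNQKKGPSQ",
}
for name, seq in examples.items():
    print(f"{name:25s} {compression_ratio(seq):.2f}")
```

For strings this short the fixed zlib header and checksum dominate, so the non-repetitive sequence may show a ratio above 1; only the relative ordering is meaningful.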
seg BRCA1 45 3.4 3.7 > BRCA1.seg
>gi|728984|sp|P38398|BRC1_HUMAN Breast cancer type 1 susceptibility protein
Non-globular regions (lowercase) | Globular domains (UPPERCASE)
1-388 MDLSALRVEEVQNVINAMQKILECPICLEL IKEPVSTKCDHIFCKFCMLKLLNQKKGPSQ CPLCKNDITKRSLQESTRFSQLVEELLKII CAFQLDTGLEYANSYNFAKKENNSPEHLKD EVSIIQSMGYRNRAKRLLQSEPENPSLQET SLSVQLSNLGTVRTLRTKQRIQPQKTSVYI ELGSDSSEDTVNKATYCSVGDQELLQITPQ GTRDEISLDSAKKAACEFSETDVTNTEHHQ PSNNDLNTTEKRAAERHPEKYQGSSVSNLH VEPCGTNTHASSLQHENSSLLLTKDRMNVE KAEFCNKSKQPGLARSQHNRWAGSKETCND RRTPSTEKKVDLNADPLCERKEWNKQKLPC SENPRDTEDVPWITLNSSIQKVNEWFSR
389-458 sdellgsddshdgesesnakvadvldvlne vdeysgssekidllasdphealickservh sksvesnied
459-526 KIFGKTYRKKASLPNLSHVTENLIIGAFVT EPQIIQERPLTNKLKRKRRPTSGLHPEDFI KKADLAVQ
527-635 ktpeminqgtnqteqngqvmnitnsghenk tkgdsiqneknpnpieslekesafktkaep isssisnmelelnihnskapkknrlrrkss trhihalelvvsrnlsppn
636-995 CTELQIDSCSSSEEIKKKKYNQMPVRHSRN LQLMEGKEPATGAKKSNKPNEQTSKRHDSD TFPELKLTNAPGSFTKCSNTSELKEFVNPS LPREEKEEKLETVKVSNNAEDPKDLMLSGE RVLQTERSVESSSISLVPGTDYGTQESISL LEVSTLGKAKTEPNKCVSQCAAFENPKGLI HGCSKDNRNDTEGFKYPLGHEVNHSRETSI EMEESELDAQYLQNTFKVSKRQSFAPFSNP GNAEEECATFSAHSGSLKKQSPKVTFECEQ KEENQGKNESNIKPVQTVNITAGFPVVGQK DKPVDNAKCSIKGGSRFCLSSQFRGNETGL ITPNKHGLLQNPYRIPPLFPIKSFVKTKCK
996-1089 knlleenfeehsmsperemgnenipstvst isrnnirenvfkeasssninevgsstnevg ssineigssdeniqaelgrnrgpklnamlr lgvl
1090-1238 QPEVYKQSLPGSNCKHPEIKKQEYEEVVQT VNTDFSPYLISDNLEQPMGSSHASQVCSET PDDLLDDGEIKEDTSFAENDIKESSAVFSK SVQKGELSRSPSPFTHTHLAQGYRRGAKKL ESSEENLSSEDEELPCFQHLLFGKVNNIP
1239-1312 sqstrhstvateclsknteenllslknsln dcsnqvilakasqehhlseetkcsaslfss qcseledltantnt
1313-1316 QDPF
1422-1513 GSQPSNSYPSIISDSSALEDLRNPEQSTSE KAVLTSQKSSEYPISQNPEGLSADKFEVSA DSSTSKNKEPGVERSSPSKCPSLDDRWYMH SC
1514-1616 sgslqnrnypsqeelikvvdveeqqleesg phdltetsylprqdlegtpylesgislfsd dpesdpsedrapesarvgnipsstsalkvp qlkvaesaqspaa
1617-1863 AHTTDTAGYNAMEESVSREKPELTASTERV NKRMSMVVSGLTPEEFMLVYKFARKHHITL TNLITEETTHVVMKTDAEFVCERTLKYFLG IAGGKWVVSYFWVTQSIKERKMLNEHDFEV RGDVVNGRNHQGPKRARESQDRKIFRGLEI CCYGPFTNMPTDQLEWMVQLCGASVVKELS SFTLGTGVHPIVVVQPDAWTEDNGFHAIGQ MCEAPVVTREWVLDSVALYQCQELDTYLIP QIPHSHY
Removing spurious database hits for the low-sequence-complexity protein BRCA1 by modifying SEG parameters

| SEG parameter setᵃ | Residues masked | Residues unmasked | E-value: Dentin | E-value: Plant BRCA1 | E-value: Opossum BRCA1 |
|---|---|---|---|---|---|
| No filtering | 0 | 1,863 | 3e-11 | 4e-15 | 1e-28 |
| 12 2.1 2.4 | 35 | 1,828 | 4e-9 | 4e-15 | 1e-28 |
| 12 2.2 2.5 (default) | 117 | 1,746 | 5e-4 | 5e-12 | 7e-22 |
| 12 2.3 2.6 | 172 | 1,691 | - | 5e-11 | 3e-21 |
| 12 2.4 2.7 | 279 | 1,584 | - | 5e-11 | 1e-14 |
| 12 2.5 2.8 | 487 | 1,376 | - | 6e-11 | 8e-10 |
| 12 2.6 2.9 | 616 | 1,247 | - | 2e-10 | 5e-9 |
| 12 2.7 3.0 | 908 | 955 | - | 4e-6 | 2e-8 |
| 12 2.8 3.1 | 1,164 | 699 | - | 0.003 | 6e-7 |
| Composition-based filtering | 0 | 1,863 | - | 3e-12 | 1e-20 |

ᵃ SEG parameters are trigger window length, trigger complexity, and extension complexity.
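To make the three SEG parameters concrete, here is a simplified SEG-like masker. It is not SEG itself: it uses plain Shannon entropy (bits per residue) as the complexity measure and skips SEG's step that trims each segment to an optimal subsequence, so its masked spans will differ from real SEG output. The defaults mirror the "12 2.2 2.5" row of the table; the function names are ours.

```python
import math
from collections import Counter

def entropy(window: str) -> float:
    """Shannon entropy (bits per residue) of the residue composition."""
    n = len(window)
    return sum((c / n) * math.log2(n / c) for c in Counter(window).values())

def seg_like_mask(seq: str, window: int = 12, k1: float = 2.2, k2: float = 2.5) -> str:
    """Lowercase low-complexity stretches. A window with entropy <= k1 (trigger
    complexity) starts a segment, which is extended while the neighbouring
    windows stay <= k2 (extension complexity). No optimal-trimming step."""
    n = len(seq)
    masked = [False] * n
    for start in range(n - window + 1):
        if entropy(seq[start:start + window]) > k1:
            continue
        lo, hi = start, start + window  # current masked span is seq[lo:hi]
        # Extend left while the window starting one residue earlier is still low-complexity.
        while lo > 0 and entropy(seq[lo - 1:lo - 1 + window]) <= k2:
            lo -= 1
        # Extend right while the window ending one residue later is still low-complexity.
        while hi < n and entropy(seq[hi - window + 1:hi + 1]) <= k2:
            hi += 1
        for i in range(lo, hi):
            masked[i] = True
    return "".join(c.lower() if m else c for c, m in zip(seq, masked))
```

Raising `k1` and `k2` masks more residues, mirroring the trend in the table; a call such as `seg_like_mask(seq, 45, 3.4, 3.7)` would roughly approximate the long-window setting used above to separate non-globular regions from globular domains.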
Paradigm shift in database searching
Traditional (PSI-BLAST): query sequence → sequence database → set of homologs → PSSM
New (RPS-BLAST): query sequence → PSSM database → domain architecture
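Both directions can be illustrated with a toy position-specific scoring matrix (PSSM). This is not PSI-BLAST or RPS-BLAST: it assumes gap-free alignments, a uniform 1/20 background, a flat pseudocount and ungapped scoring, whereas the real programs use sequence weighting, substitution-matrix-based pseudocounts and gapped alignment. The motif names and alignments below are made up for illustration.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def build_pssm(alignment, pseudocount=1.0):
    """Toy log-odds PSSM (bits) from equal-length, gap-free aligned sequences."""
    background = 1.0 / len(AMINO_ACIDS)
    total = len(alignment) + pseudocount * len(AMINO_ACIDS)
    pssm = []
    for col in range(len(alignment[0])):
        counts = Counter(seq[col] for seq in alignment)
        pssm.append({aa: math.log2((counts[aa] + pseudocount) / total / background)
                     for aa in AMINO_ACIDS})
    return pssm

def best_hit(query, pssm):
    """Best ungapped (score, start) of one PSSM slid along the query.
    Assumes query is uppercase, uses only the 20 standard residues, and is
    at least as long as the PSSM."""
    width = len(pssm)
    return max((sum(pssm[i][query[s + i]] for i in range(width)), s)
               for s in range(len(query) - width + 1))

def scan_library(query, library):
    """RPS-BLAST-like direction: one query against many PSSMs, best score first."""
    hits = [(name, *best_hit(query, pssm)) for name, pssm in library.items()]
    return sorted(hits, key=lambda h: h[1], reverse=True)

# Hypothetical toy library built from made-up alignments.
library = {
    "motif_CPxC": build_pssm(["CPIC", "CPVC", "CPLC"]),
    "motif_GxGKS": build_pssm(["GAGKS", "GPGKT", "GSGKS"]),
}
print(scan_library("MDLSALRVEECPICLEL", library)[0][:2])
```

The PSI-BLAST direction would instead take one PSSM and run `best_hit` against every sequence in a database; the RPS-BLAST direction precomputes the PSSM library once, so annotating a new query is a single pass over the profiles.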
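DOMAIN ARCHITECTURE OF SELECTED BRCT PROTEINS
Figure: domain diagrams of BRCT-containing proteins, with other labeled domains in parentheses: BRCA1 (RING finger); BARD1; REV1, yeast (CMP-transferase); DPB11, yeast; DNA ligase III, human (ATP-dependent ligase); PARP, vertebrates (zinc finger, PARP); TdT, eukaryotes (HhH, polX); RFC1, eukaryotes (ATP- and PCNA-binding); DNA ligase, bacteria (NAD-dependent ligase); BRCA1/BARD homolog, plant (PHD-like).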
Use of profile libraries to examine domain representation in individual proteomes
Profile library → detect domains in the yeast (6,200 proteins) and worm (~20,000 proteins) proteomes using PSI-BLAST and IMPALA → compare domain distributions
Chervitz SA, Aravind L, Sherlock G, Ball CA, Koonin EV, Dwight SS, Harris MA, Dolinski K, Mohr S, Smith T, Weng S, Cherry JM, Botstein D. 1998. Comparison of the complete protein sets of worm and yeast: orthology and divergence. Science 282: 2022-8.
Normalized domain counts in worm and yeast
Figure: scatter plot of normalized domain counts, yeast (x-axis, 0-12) versus worm (y-axis, 0-16); numbered points correspond to the domains listed below.
Domains: 1. Hormone receptor; 2. POZ; 3. EGF; 4. MATH; 5. PTPase; 6. Cation channels; 7. PDZ; 8. SH2; 9. FNIII; 10. Homeodomain; 11. LRR; 12. EF hands; 13. Ankyrin; 14. RING finger; 15. C2H2 finger; 16. Small GTPase; 17. RRM; 18. AAA+; 19. C6 finger
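The comparison in this figure needs domain counts put on a common scale, because the worm proteome is roughly three times larger than the yeast one. The slide does not state the normalization used; the sketch below assumes "occurrences per 1,000 proteins" and ranks domains by their worm/yeast ratio. The helper names and the toy hit lists in the usage note are hypothetical.

```python
from collections import Counter

def normalized_domain_counts(domain_hits, proteome_size, per=1000):
    """Domain occurrences per `per` proteins (assumed normalization scheme)."""
    counts = Counter(domain_hits)
    return {dom: n * per / proteome_size for dom, n in counts.items()}

def compare(worm, yeast):
    """Rows of (domain, worm_norm, yeast_norm, worm/yeast ratio), most worm-biased first.
    Domains absent from yeast get an infinite ratio."""
    rows = []
    for dom in sorted(set(worm) | set(yeast)):
        w, y = worm.get(dom, 0.0), yeast.get(dom, 0.0)
        ratio = w / y if y > 0 else float("inf")
        rows.append((dom, w, y, ratio))
    return sorted(rows, key=lambda r: r[3], reverse=True)
```

For example, with made-up inputs `worm = normalized_domain_counts(["SH2"] * 40 + ["RRM"] * 20, 20000)` and `yeast = normalized_domain_counts(["RRM"] * 6 + ["SH2"], 6000)`, `compare(worm, yeast)` lists SH2 first (expanded in worm) and RRM last (similar in both).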
• Searching a domain library is often easier and more informative than searching the entire sequence database. However, the latter yields complementary information and should not be skipped if details are of interest.
• Varying the search parameters, e.g. switching composition-based statistics on and off, can make a difference.
• Using subsequences, preferably chosen according to objective criteria (e.g. separation from the rest of the protein by a low-complexity linker), may improve search performance.
• Trying different queries is a must when analyzing protein (super)families.
• Even hits below the threshold of statistical significance are often worth analyzing, albeit with extreme care.
• Transferring functional information between homologs on the basis of a database description alone is dangerous.
• Conservation of domain architectures, active sites and other features needs to be analyzed (hence automated identification of protein families is difficult, and automated prediction of functions is extremely error-prone).
• Always do a reality check!
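The advice about subsequences can be partly automated when a SEG-style masked sequence is available (lowercase = low complexity, as in the BRCA1 output earlier in this section). A hypothetical helper that cuts at lowercase linkers and keeps uppercase runs of at least `min_len` residues as separate queries; the 50-residue default is an arbitrary assumption, not a published criterion.

```python
import re

def globular_segments(masked_seq: str, min_len: int = 50):
    """Split a case-masked sequence at lowercase (low-complexity) stretches.
    Returns (start, end, subsequence) with 1-based inclusive coordinates for
    each uppercase run of at least min_len residues."""
    return [(m.start() + 1, m.end(), m.group())
            for m in re.finditer(r"[A-Z]+", masked_seq)
            if len(m.group()) >= min_len]
```

Each returned subsequence can then be searched on its own, and the results compared with a full-length search, since short globular domains are otherwise easily swamped by hits to the rest of the protein.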