introducci ó a la bioinformàtica roderic guigó i serra [email protected]
DESCRIPTION
Introducci ó a la Bioinformàtica Roderic Guigó i Serra [email protected]. Bioinform àtica , UPF Curs 2013-2014. Van Leeuwenhoek. - PowerPoint PPT PresentationTRANSCRIPT
Introducció a la BioinformàticaRoderic Guigó i [email protected]
Bioinformàtica, UPF Curs 2013-2014
Van Leeuwenhoek
In 1676 his credibility was questioned when he sent the Royal Society a copy of his first observations of microscopic single celled organisms. Heretofore, the existence of single celled organisms was entirely unknown … The Royal Society arranged to send an English vicar, as well as a team of respected jurists and doctors to Delft, Holland to determine whether it was in fact Van Leeuwenhoek's ability to observe and reason clearly (wikipedia)
ACTCAGCCCCAGCGGAGGTGAAGGACGTCCTTCCCCAGGAGCCGGTGAGAAGCGCAGTCGGGGGCACGGGGATGAGCTCAGGGGCCTCTAGAAAGATGTAGCTGGGACCTCGGGAAGCCCTGGCCTCCAGGTAGTCTCAGGAGAGCTACTCAGGGTCGGGCTTGGGGAGAGGAGGAGCGGGGGTGAGGCCAGCAGCAGGGGACTGGACCTGGGAAGGGCTGGGCAGCAGAGACGACCCGACCCGCTAGAAGGTGGGGTGGGGAGAGCATGTGGACTAGGAGCTAAGCCACAGCAGGACCCCCACGAGTTGTCACTGTCATTTATCGAGCACCTACTGGGTGTCCCCAGTGTCCTCAGATCTCCATAACTGGGAAGCCAGGGGCAGCGACACGGTAGCTAGCCGTCGATTGGAGAACTTTAAAATGAGGACTGAATTAGCTCATAAATGGAAAACGGCGCTTAAATGTGAGGTTAGAGCTTAGAATGTGAAGGGAGAATGAGGAATGCGAGACTGGGACTGAGATGGAACCGGCGGTGGGGAGGGGGAGGGGGTGTGGAATTTGAACCCCGGGAGAGAAAGATGGAATTTTGGCTATGGAGGCCGACCTGGGGATGGGGAAATAAGAGAAGACCAGGAGGGAGTTAAATAGGGAATGGGTTGGGGGCGGCTTGGTAACTGTTTGTGCTGGGATTAGGCTGTTGCAGATAATGGAGCAAGGCTTGGAAGGCTAACCTGGGGTGGGGCCGGGTTGGGGTCGGGCTGGGGGCGGGAGGAGTCCTCACTGGCGGTTGATTGACAGTTTCTCCTTCCCCAGACTGGCCAATCACAGGCAGGAAGATGAAGGTTCTGTGGGCTGCGTTGCTGGTCACATTCCTGGCAGGTATGGGGCGGGGCTTGCTCGGTTTTCCCCGCTTCTCCCCCTCTCATCCTCACCTCAACCTCCTGGCCCCATTCAAGCACACCCTGGGCCCCCTCTTCTTCTGCTGGTCTGTCCCCTGAGGGGAAAGCCCAGGTCTGAGGCTTCTATGCTGCTTTCTGGCTCAGAACAGCGATTTGACGCTCTGTGAGCCTCGGTTCCTCCCCCGCTTTTTTTTTTTCAGCCAGAGTCTCACTCTGTCGCCCAGGCTGGAGTGCAGTGGCGCAATCTCAGCTCACTGCAAGCTCCGCCTCCCGGGTTCACGCTATTCTCCCGCCTCAGCCTCCCGAGTAGCTGGGACTACAGGCGCCCGCCACCATGCCCGGCTAATTTTTTGTACTTTGAGTAGGGAAGGGGTTTCACTGTATTATCCAGGATGGTCTCTATCTCCTGACCTCGTGATCTGCCCGCCTGGCCTCCCAAAGTGCTGGAATTACAGGCGTGAGCCTCCGCGCCCGGCCTCCCCATCCTTAATATAGGAGTTAGAAGTTTTTGTTTGTTTGTTTTGTTTTGTTTTTGTTTTGTTTTGAGATGAAGTCCCTCTGTCGCCCAGGCTGGAGTGCAGTGGCTCCCAGGCTGGAGTTCAGTGGCTGGATCTCGGCTCACTGCAAGCTCCGCCTCCCAGGTTCACGCCATTCTCCTGCCTCAGCCTCCGGAGTAGCTGGGACTACAGGAACATGCCACCACACCCGACTAACTTTTTTTGTATTTTTAGTAGAGACGGGGTTTCACCATGTTGGCCAGGCTGGTCTGGAACTCCTGACCTCAGGTGATCTGCCTGCTTCAACCTCCCAAAGTGCTGGGATTACAGACGTGGGCCACCGCGCCCGGCTGGGAGTTAAGAGGTTTCTAATGCATTGCATTAGAATACCAGACACGGGACAGCTGTGATCTTTATTCTCCATCACCCCACACAGCCCTGCCTGGGGCACACAAGGACACTCAATACACGCTTTTCGGGCGCGGTGGCTCAAGCTGTAATCCCAGCACTTTGGGAGGCTGAGGCGGGTGGTACATGAGGTCAGGAGATCGAGACCATCCTGGCTAACATGGTGAAACCCCGTCTCTACTAAAAATACAAAAAACTAGCCCGGGCGTGGTGGCGGGCGCCTGTAGTCCCAGCTACTCGGAGGCTGAGGCAGGAGAATGGCGTGAACCTGGGAGGCGGAGCTTGCAGTGAGCCGAGATCGCGCCACTGCACTCCAGCCTGGGTGACACAGCGCGAGACTCCGTCTCAAAAAAAAAAAAAAAAAAAAAAAAAAAAAATACACGCTTTTCCGCTAGGCACGGTGGCTCACCCCTGTAATCCCAGCATTTTGGGAGGCCAAGGTGGGAGGATCACTTGAGCCCAGGAGTTCAACACCAGACTCAGCAACATAGTGAGACTCTCTCTACTAAAAATACAAAAATTAGCCAGGCCTGGTGCCACACACCTGTGGTCCCAGCTACTCAGAAGGCTAAGGCAGGAGGATCGCTTAAGCCCAGAAGGTCAAGGTTGCAGTGAACCACGTTCAGGCCACTGCAGTCCAGCCTGGGTGACAGAGCAAGACCCTGTCTGTAAATAAATAACGCTTTTCAAGTGATTAAACAGACTCCCCCCTCACCCTGCCCACCATGGCTCCAAAGCAGCATTTGTGGAGCACCTTCTGTGTGCCCCTAGGTACTAGCTGCCTGGACGGGGTCAGAAGGAACCTGAACCACCTTCAACTTGTTCCACACAGGATGCCAGGCCAAGGTGGAGCAACCGGTGGAGCCAGAGACAGAACCCGACGTTCGCCAGCAGGCTGAGTGGCAGAGCGGCCAGCCCTGGGAGCTGGCACTGGGTCGCTTTTGGGATTACCTGCGCTGGGTGCAGACACTGTCTGAGCAGGTGCAGGAGGAGCTGCTCAGCCCCCAGGTCACCCAGGAACTGACGTGAGTGTCCCCATCCCGGCCCTTGACCCTCCTGGTGGGCGGCTATACCTCCCCAGGTCCAGGTTTCATTCTGCCCCTGCCACTAAGTCTTGGGGGCCTGGGTCTCTGCTGGTTCTAGCTTCCTCTTCCCATTTCTGACTCCTGGCTTTAGCTCTCTGGAATTCTCTCTCTCAGTTCTGTTTCTCCCTCTTCCCTTCTGACTCAGCCTGTCACACTCGTCCTGGCGCTGTCTCTGTCCTTCACTAGCTCTTTTATATAGAGACAGAGAGATGGGGTCTCACTGTGTTGCCCAGGCTGGTCTTGAACTTCTGGGCTCAAGCGATCCTCCCACCTCGCCTCCCAAAGTGCTGGGAATAGAGACATGAGCCACCTTGCTCGGCCTCCTAGCTCTTTCTTCGTCTCTGCCTCTGCTCTCTGCGTCTGTCTTTGTCTCCTCTCTGCCTCTGTCCCGTTCCTTCTCTCTTGGTTCACTGCCCTTCTGTCTCTCCCTGTTCTCCTTAGGAGACTCTCCTCTCTTCCTTCTCGAGTCTCTCTGGCTGATCCCCATCTCACCCACACCTATCC
ACTCAGCCCCAGCGGAGGTGAAGGACGTCCTTCCCCAGGAGCCGGTGAGAAGCGCAGTCGGGGGCACGGGGATGAGCTCAGGGGCCTCTAGAAAGATGTAGCTGGGACCTCGGGAAGCCCTGGCCTCCAGGTAGTCTCAGGAGAGCTACTCAGGGTCGGGCTTGGGGAGAGGAGGAGCGGGGGTGAGGCCAGCAGCAGGGGACTGGACCTGGGAAGGGCTGGGCAGCAGAGACGACCCGACCCGCTAGAAGGTGGGGTGGGGAGAGCATGTGGACTAGGAGCTAAGCCACAGCAGGACCCCCACGAGTTGTCACTGTCATTTATCGAGCACCTACTGGGTGTCCCCAGTGTCCTCAGATCTCCATAACTGGGAAGCCAGGGGCAGCGACACGGTAGCTAGCCGTCGATTGGAGAACTTTAAAATGAGGACTGAATTAGCTCATAAATGGAAAACGGCGCTTAAATGTGAGGTTAGAGCTTAGAATGTGAAGGGAGAATGAGGAATGCGAGACTGGGACTGAGATGGAACCGGCGGTGGGGAGGGGGAGGGGGTGTGGAATTTGAACCCCGGGAGAGAAAGATGGAATTTTGGCTATGGAGGCCGACCTGGGGATGGGGAAATAAGAGAAGACCAGGAGGGAGTTAAATAGGGAATGGGTTGGGGGCGGCTTGGTAACTGTTTGTGCTGGGATTAGGCTGTTGCAGATAATGGAGCAAGGCTTGGAAGGCTAACCTGGGGTGGGGCCGGGTTGGGGTCGGGCTGGGGGCGGGAGGAGTCCTCACTGGCGGTTGATTGACAGTTTCTCCTTCCCCAGACTGGCCAATCACAGGCAGGAAGATGAAGGTTCTGTGGGCTGCGTTGCTGGTCACATTCCTGGCAGGTATGGGGCGGGGCTTGCTCGGTTTTCCCCGCTTCTCCCCCTCTCATCCTCACCTCAACCTCCTGGCCCCATTCAAGCACACCCTGGGCCCCCTCTTCTTCTGCTGGTCTGTCCCCTGAGGGGAAAGCCCAGGTCTGAGGCTTCTATGCTGCTTTCTGGCTCAGAACAGCGATTTGACGCTCTGTGAGCCTCGGTTCCTCCCCCGCTTTTTTTTTTTCAGCCAGAGTCTCACTCTGTCGCCCAGGCTGGAGTGCAGTGGCGCAATCTCAGCTCACTGCAAGCTCCGCCTCCCGGGTTCACGCTATTCTCCCGCCTCAGCCTCCCGAGTAGCTGGGACTACAGGCGCCCGCCACCATGCCCGGCTAATTTTTTGTACTTTGAGTAGGGAAGGGGTTTCACTGTATTATCCAGGATGGTCTCTATCTCCTGACCTCGTGATCTGCCCGCCTGGCCTCCCAAAGTGCTGGAATTACAGGCGTGAGCCTCCGCGCCCGGCCTCCCCATCCTTAATATAGGAGTTAGAAGTTTTTGTTTGTTTGTTTTGTTTTGTTTTTGTTTTGTTTTGAGATGAAGTCCCTCTGTCGCCCAGGCTGGAGTGCAGTGGCTCCCAGGCTGGAGTTCAGTGGCTGGATCTCGGCTCACTGCAAGCTCCGCCTCCCAGGTTCACGCCATTCTCCTGCCTCAGCCTCCGGAGTAGCTGGGACTACAGGAACATGCCACCACACCCGACTAACTTTTTTTGTATTTTTAGTAGAGACGGGGTTTCACCATGTTGGCCAGGCTGGTCTGGAACTCCTGACCTCAGGTGATCTGCCTGCTTCAACCTCCCAAAGTGCTGGGATTACAGACGTGGGCCACCGCGCCCGGCTGGGAGTTAAGAGGTTTCTAATGCATTGCATTAGAATACCAGACACGGGACAGCTGTGATCTTTATTCTCCATCACCCCACACAGCCCTGCCTGGGGCACACAAGGACACTCAATACACGCTTTTCGGGCGCGGTGGCTCAAGCTGTAATCCCAGCACTTTGGGAGGCTGAGGCGGGTGGTACATGAGGTCAGGAGATCGAGACCATCCTGGCTAACATGGTGAAACCCCGTCTCTACTAAAAATACAAAAAACTAGCCCGGGCGTGGTGGCGGGCGCCTGTAGTCCCAGCTACTCGGAGGCTGAGGCAGGAGAATGGCGTGAACCTGGGAGGCGGAGCTTGCAGTGAGCCGAGATCGCGCCACTGCACTCCAGCCTGGGTGACACAGCGCGAGACTCCGTCTCAAAAAAAAAAAAAAAAAAAAAAAAAAAAAATACACGCTTTTCCGCTAGGCACGGTGGCTCACCCCTGTAATCCCAGCATTTTGGGAGGCCAAGGTGGGAGGATCACTTGAGCCCAGGAGTTCAACACCAGACTCAGCAACATAGTGAGACTCTCTCTACTAAAAATACAAAAATTAGCCAGGCCTGGTGCCACACACCTGTGGTCCCAGCTACTCAGAAGGCTAAGGCAGGAGGATCGCTTAAGCCCAGAAGGTCAAGGTTGCAGTGAACCACGTTCAGGCCACTGCAGTCCAGCCTGGGTGACAGAGCAAGACCCTGTCTGTAAATAAATAACGCTTTTCAAGTGATTAAACAGACTCCCCCCTCACCCTGCCCACCATGGCTCCAAAGCAGCATTTGTGGAGCACCTTCTGTGTGCCCCTAGGTACTAGCTGCCTGGACGGGGTCAGAAGGAACCTGAACCACCTTCAACTTGTTCCACACAGGATGCCAGGCCAAGGTGGAGCAACCGGTGGAGCCAGAGACAGAACCCGACGTTCGCCAGCAGGCTGAGTGGCAGAGCGGCCAGCCCTGGGAGCTGGCACTGGGTCGCTTTTGGGATTACCTGCGCTGGGTGCAGACACTGTCTGAGCAGGTGCAGGAGGAGCTGCTCAGCCCCCAGGTCACCCAGGAACTGACGTGAGTGTCCCCATCCCGGCCCTTGACCCTCCTGGTGGGCGGCTATACCTCCCCAGGTCCAGGTTTCATTCTGCCCCTGCCACTAAGTCTTGGGGGCCTGGGTCTCTGCTGGTTCTAGCTTCCTCTTCCCATTTCTGACTCCTGGCTTTAGCTCTCTGGAATTCTCTCTCTCAGTTCTGTTTCTCCCTCTTCCCTTCTGACTCAGCCTGTCACACTCGTCCTGGCGCTGTCTCTGTCCTTCACTAGCTCTTTTATATAGAGACAGAGAGATGGGGTCTCACTGTGTTGCCCAGGCTGGTCTTGAACTTCTGGGCTCAAGCGATCCTCCCACCTCGCCTCCCAAAGTGCTGGGAATAGAGACATGAGCCACCTTGCTCGGCCTCCTAGCTCTTTCTTCGTCTCTGCCTCTGCTCTCTGCGTCTGTCTTTGTCTCCTCTCTGCCTCTGTCCCGTTCCTTCTCTCTTGGTTCACTGCCCTTCTGTCTCTCCCTGTTCTCCTTAGGAGACTCTCCTCTCTTCCTTCTCGAGTCTCTCTGGCTGATCCCCATCTCACCCACACCTATCC
ACTCAGCCCCAGCGGAGGTGAAGGACGTCCTTCCCCAGGAGCCGGTGAGAAGCGCAGTCGGGGGCACGGGGATGAGCTCAGGGGCCTCTAGAAAGATGTAGCTGGGACCTCGGGAAGCCCTGGCCTCCAGGTAGTCTCAGGAGAGCTACTCAGGGTCGGGCTTGGGGAGAGGAGGAGCGGGGGTGAGGCCAGCAGCAGGGGACTGGACCTGGGAAGGGCTGGGCAGCAGAGACGACCCGACCCGCTAGAAGGTGGGGTGGGGAGAGCATGTGGACTAGGAGCTAAGCCACAGCAGGACCCCCACGAGTTGTCACTGTCATTTATCGAGCACCTACTGGGTGTCCCCAGTGTCCTCAGATCTCCATAACTGGGAAGCCAGGGGCAGCGACACGGTAGCTAGCCGTCGATTGGAGAACTTTAAAATGAGGACTGAATTAGCTCATAAATGGAAAACGGCGCTTAAATGTGAGGTTAGAGCTTAGAATGTGAAGGGAGAATGAGGAATGCGAGACTGGGACTGAGATGGAACCGGCGGTGGGGAGGGGGAGGGGGTGTGGAATTTGAACCCCGGGAGAGAAAGATGGAATTTTGGCTATGGAGGCCGACCTGGGGATGGGGAAATAAGAGAAGACCAGGAGGGAGTTAAATAGGGAATGGGTTGGGGGCGGCTTGGTAACTGTTTGTGCTGGGATTAGGCTGTTGCAGATAATGGAGCAAGGCTTGGAAGGCTAACCTGGGGTGGGGCCGGGTTGGGGTCGGGCTGGGGGCGGGAGGAGTCCTCACTGGCGGTTGATTGACAGTTTCTCCTTCCCCAGACTGGCCAATCACAGGCAGGAAGATGAAGGTTCTGTGGGCTGCGTTGCTGGTCACATTCCTGGCAGGTATGGGGCGGGGCTTGCTCGGTTTTCCCCGCTTCTCCCCCTCTCATCCTCACCTCAACCTCCTGGCCCCATTCAAGCACACCCTGGGCCCCCTCTTCTTCTGCTGGTCTGTCCCCTGAGGGGAAAGCCCAGGTCTGAGGCTTCTATGCTGCTTTCTGGCTCAGAACAGCGATTTGACGCTCTGTGAGCCTCGGTTCCTCCCCCGCTTTTTTTTTTTCAGCCAGAGTCTCACTCTGTCGCCCAGGCTGGAGTGCAGTGGCGCAATCTCAGCTCACTGCAAGCTCCGCCTCCCGGGTTCACGCTATTCTCCCGCCTCAGCCTCCCGAGTAGCTGGGACTACAGGCGCCCGCCACCATGCCCGGCTAATTTTTTGTACTTTGAGTAGGGAAGGGGTTTCACTGTATTATCCAGGATGGTCTCTATCTCCTGACCTCGTGATCTGCCCGCCTGGCCTCCCAAAGTGCTGGAATTACAGGCGTGAGCCTCCGCGCCCGGCCTCCCCATCCTTAATATAGGAGTTAGAAGTTTTTGTTTGTTTGTTTTGTTTTGTTTTTGTTTTGTTTTGAGATGAAGTCCCTCTGTCGCCCAGGCTGGAGTGCAGTGGCTCCCAGGCTGGAGTTCAGTGGCTGGATCTCGGCTCACTGCAAGCTCCGCCTCCCAGGTTCACGCCATTCTCCTGCCTCAGCCTCCGGAGTAGCTGGGACTACAGGAACATGCCACCACACCCGACTAACTTTTTTTGTATTTTTAGTAGAGACGGGGTTTCACCATGTTGGCCAGGCTGGTCTGGAACTCCTGACCTCAGGTGATCTGCCTGCTTCAACCTCCCAAAGTGCTGGGATTACAGACGTGGGCCACCGCGCCCGGCTGGGAGTTAAGAGGTTTCTAATGCATTGCATTAGAATACCAGACACGGGACAGCTGTGATCTTTATTCTCCATCACCCCACACAGCCCTGCCTGGGGCACACAAGGACACTCAATACACGCTTTTCGGGCGCGGTGGCTCAAGCTGTAATCCCAGCACTTTGGGAGGCTGAGGCGGGTGGTACATGAGGTCAGGAGATCGAGACCATCCTGGCTAACATGGTGAAACCCCGTCTCTACTAAAAATACAAAAAACTAGCCCGGGCGTGGTGGCGGGCGCCTGTAGTCCCAGCTACTCGGAGGCTGAGGCAGGAGAATGGCGTGAACCTGGGAGGCGGAGCTTGCAGTGAGCCGAGATCGCGCCACTGCACTCCAGCCTGGGTGACACAGCGCGAGACTCCGTCTCAAAAAAAAAAAAAAAAAAAAAAAAAAAAAATACACGCTTTTCCGCTAGGCACGGTGGCTCACCCCTGTAATCCCAGCATTTTGGGAGGCCAAGGTGGGAGGATCACTTGAGCCCAGGAGTTCAACACCAGACTCAGCAACATAGTGAGACTCTCTCTACTAAAAATACAAAAATTAGCCAGGCCTGGTGCCACACACCTGTGGTCCCAGCTACTCAGAAGGCTAAGGCAGGAGGATCGCTTAAGCCCAGAAGGTCAAGGTTGCAGTGAACCACGTTCAGGCCACTGCAGTCCAGCCTGGGTGACAGAGCAAGACCCTGTCTGTAAATAAATAACGCTTTTCAAGTGATTAAACAGACTCCCCCCTCACCCTGCCCACCATGGCTCCAAAGCAGCATTTGTGGAGCACCTTCTGTGTGCCCCTAGGTACTAGCTGCCTGGACGGGGTCAGAAGGAACCTGAACCACCTTCAACTTGTTCCACACAGGATGCCAGGCCAAGGTGGAGCAACCGGTGGAGCCAGAGACAGAACCCGACGTTCGCCAGCAGGCTGAGTGGCAGAGCGGCCAGCCCTGGGAGCTGGCACTGGGTCGCTTTTGGGATTACCTGCGCTGGGTGCAGACACTGTCTGAGCAGGTGCAGGAGGAGCTGCTCAGCCCCCAGGTCACCCAGGAACTGACGTGAGTGTCCCCATCCCGGCCCTTGACCCTCCTGGTGGGCGGCTATACCTCCCCAGGTCCAGGTTTCATTCTGCCCCTGCCACTAAGTCTTGGGGGCCTGGGTCTCTGCTGGTTCTAGCTTCCTCTTCCCATTTCTGACTCCTGGCTTTAGCTCTCTGGAATTCTCTCTCTCAGTTCTGTTTCTCCCTCTTCCCTTCTGACTCAGCCTGTCACACTCGTCCTGGCGCTGTCTCTGTCCTTCACTAGCTCTTTTATATAGAGACAGAGAGATGGGGTCTCACTGTGTTGCCCAGGCTGGTCTTGAACTTCTGGGCTCAAGCGATCCTCCCACCTCGCCTCCCAAAGTGCTGGGAATAGAGACATGAGCCACCTTGCTCGGCCTCCTAGCTCTTTCTTCGTCTCTGCCTCTGCTCTCTGCGTCTGTCTTTGTCTCCTCTCTGCCTCTGTCCCGTTCCTTCTCTCTTGGTTCACTGCCCTTCTGTCTCTCCCTGTTCTCCTTAGGAGACTCTCCTCTCTTCCTTCTCGAGTCTCTCTGGCTGATCCCCATCTCACCCACACCTATCC
La matèria cromosòmica és “un cristall aperiòdic”, constituït per la successió d'un nombre petit d'elements isomèrics*, la seqüència concreta dels quals és la responsable de la seva funcionalitat.
(*) “the number of atoms in such a structure need not to be very large to produce an almost unlimited number of possible arrangements. For illustration, think of the Morse code…”
1943: Schroëdinger, “What is life?”
ENIAC
Late 40s: first digital computers
MALWTRLRPLLALLALWPPPPARAFVNQHLCGSHLVEALYLVCGERGFFYTPKARREVEGPQVGALELAGGPGAGGLEGPPQKRGIVEQCCASVCSLYQLENYCN
Amino acid sequence of the bovine insuline
http://www.ict-science-to-society.org/
Early 60s: the genetic code
GAGTTTTATCGCTTCCATGACGCAGAAGTTAACACTTTCGGATATTTCTGATGAGTCGAAAAATTATCTTGATAAAGCAGGAATTACTACTGCTTGTTTACGAATTAAATCGAAGTGGACTGCTGGCGGAAAATGAGAAAATTCGACCTATCCTTGCGCAGCTCGAGAAGCTCTTACTTTGCGACCTTTCGCCATCAACTAACGATTCTGTCAAAAACTGACGCGTTGGATGAGGAGAAGTGGCTTAATATGCTTGGCACGTTCGTCAAGGACTGGTTTAGATATGAGTCACATTTTGTTCATGGTAGAGATTCTCTTGT
MALWTRLRPLLALLALWPPPPARAFVNQHLCGSHLVEALYLVCGERGFFYTPKARREVEGPQVGALELAGGPGAGGLEGPPQKRGIVEQCCASVCSLYQLENYCN
1957: invention of the programming language FORTRAN
Computers become smaller and therefore faster and cheaper
During the 60s computers are introduced into bancs, financial institutions, universities and research centers
60s: Transistors and integrated circuits
Sequence alignment and comparison
substitution matrices
Sequence alignment The substitution matrices provided a model
under which the concept of optimal alignment could be formalized, and computed. The optimal alignment between two sequences is the alignment that maximizes the sum of the amino acid substitution values at each aligned position.
A R N D C Q A R N D C Q S K - E A E - S K E A E +1+3-1+3-2+2=6 -1+0+1+3-2+2=3
The total number of possible alignments between two sequences of length 100 is approximately 10200.
With DP the number of operations required to obtain the optimal alignment is aproximately 3x1002
Query: 25 IPREVIERLARSQIHSIRDLQRLLEIDSVGSEDSLDTSLRAHGVHATKHVPEKRPLPIRR 84 IP E+ + L+ I S DLQRLL+ DS G ED + L H+ + R Sbjct: 10 IPEELYKMLSGHSIRSFDDLQRLLQGDS-GKEDGAELDLNMTRSHSGGELESLA----RG 64
Query: 85 KRSI------EEAVPAVCKTRTVIYEIPRSQVDPTSANFLIWPPCVEVKRCTGCCNTSSV 138 KRS+ E A+ A CKTRT ++EI R +D T+ANFL+WPPCVEV+RC+GCCN +VSbjct: 65 KRSLGSLSVAEPAMIAECKTRTEVFEISRRLIDRTNANFLVWPPCVEVQRCSGCCNNRNV 124
Query: 139 KCQPSRVHHRSVKVAKVEYVRKKPKLKEVQVRLEEHLECAC 179 +C+P++V R V+V K+E VRKKP K+ V LE+HL C CSbjct: 125 QCRPTQVQLRPVQVRKIEIVRKKPIFKKATVTLEDHLACKC 165
DYNAMIC PROGRAMMING,Nedleman and Wunsch, 1970Smith and Waterman, 1981
70’s: Optimal sequence alignment
mid70’s: DNA sequencing, Sanger. Maxam and Gilbert
By the end of the sixties, hundreds of proteins had been sequenced, but the sequencing on nucleic acids remained elusive
Sanger (Cambridge)Maxam and Gilbert (Harvard)
Anys 70: Internet. Advanced Research Projects Agency
gagttttatcgcttccatgacgcagaagttaacactttcggatatttctgatgagtcgaaaaattatcttgataaagcaggaattactactgcttgtttacgaattaaatcgaagtggactgctggcggaaaatgagaaaattcgacctatccttgcgcagctcgagaagctcttactttgcgacctttcgccatcaactaacgattctgtcaaaaactgacgcgttggatgaggagaagtggcttaatatgcttggcacgttcgtcaaggactggtttagatatgagtcacattttgttcatggtagagattctcttgttgacattttaaaagagcgtggattactatctgagtccgatgctgttcaaccactaataggtaagaaatcatgagtcaagttactgaacaatccgtacgtttccagaccgctttggcctctattaagctcattcaggcttctgccgttttggatttaaccgaagatgatttcgattttctgacgagtaacaaagtttggattgctactgaccgctctcgtgctcgtcgctgcgttgaggcttgcgtttatggtacgctggactttgtgggataccctcgctttcctgctcctgttgagtttattgctgccgtcattgcttattatgttcatcccgtcaacattcaaacggcctgtctcatcatggaaggcgctgaatttacggaaaacattattaatggcgtcgagcgtccggttaaagccgctgaattgttcgcgtttaccttgcgtgtacgcgcaggaaacactgacgttcttactgacgcagaagaaaacgtgcgtcaaaaattacgtgcggaaggagtgatgtaatgtctaaaggtaaaaaacgttctggcgctcgccctggtcgtccgcagccgttgcgaggtactaaaggcaagcgtaaaggcgctcgtctttggtatgtaggtggtcaacaattttaattgcaggggcttcggccccttacttgaggataaattatgtctaatattcaaactggcgccgagcgtatgccgcatgacctttcccatcttggcttccttgctggtcagattggtcgtcttattaccatttcaactactccggttatcgctggcgactccttcgagatggacgccgttggcgctctccgtctttctccattgcgtcgtggccttgctattgactctactgtagacatttttactttttatgtccctcatcgtcacgtttatggtgaacagtggattaagttcatgaaggatggtgttaatgccactcctctcccgactgttaacactactggttatattgaccatgccgcttttcttggcacgattaaccctgataccaataaaatccctaagcatttgtttcagggttatttgaatatctataacaactattttaaagcgccgtggatgcctgaccgtaccgaggctaaccctaatgagcttaatcaagatgatgctcgttatggtttccgttgctgccatctcaaaaacatttggactgctccgcttcctcctgagactgagctttctcgccaaatgacgacttctaccacatctattgacattatgggtctgcaagctgcttatgctaatttgcatactgaccaagaacgtgattacttcatgcagcgttaccatgatgttatttcttcatttggaggtaaaacctcttatgacgctgacaaccgtcctttacttgtcatgcgctctaatctctgggcatctggctatgatgttgatggaactgaccaaacgtcgttaggccagttttctggtcgtgttcaacagacctataaacattctgtgccgcgtttctttgttcctgagcatggcactatgtttactcttgcgcttgttcgttttccgcctactgcgactaaagagattcagtaccttaacgctaaaggtgctttgacttataccgatattgctggcgaccctgttttgtatggcaacttgccgccgcgtgaaatttctatgaaggatgttttccgttctggtgattcgtctaagaagtttaagattgctgagggtcagtggtatcgttatgcgccttcgtatgtttctcctgcttatcaccttcttgaaggcttcccattcattcaggaaccgccttctggtgatttgcaagaacgcgtacttattcgccaccatgattatgaccagtgtttccagtccgttcagttgttgcagtggaatagtcaggttaaatttaatgtgaccgtttatcgcaatctgccgaccactcgcgattcaatcatgacttcgtgataaaagattgagtgtgaggttataacgccgaagcggtaaaaattttaatttttgccgctgaggggttgaccaagcgaagcgcggtaggttttctgcttaggagtttaatcatgtttcagacttttatttctcgccataattcaaactttttttctgataagctggttctcacttctgttactccagcttcttcggcacctgttttacagacacctaaagctacatcgtcaacgttatattttgatagtttgacggttaatgctggtaatggtggttttcttcattgcattcagatggatacatctgtcaacgccgctaatcaggttgtttctgttggtgctgatattgcttttgatgccgaccctaaattttttgcctgtttggttcgctttgagtcttcttcggttccgactaccctcccgactgcctatgatgtttatcctttgaatggtcgccatgatggtggttattataccgtcaaggactgtgtgactattgacgtccttccccgtacgccgggcaataacgtttatgttggtttcatggtttggtctaactttaccgctactaaatgccgcggattggtttcgctgaatcaggttattaaagagattatttgtctccagccacttaagtgaggtgatttatgtttggtgctattgctggcggtattgcttctgctcttgctggtggcgccatgtctaaattgtttggaggcggtcaaaaagccgcctccggtggcattcaaggtgatgtgcttgctaccgataacaatactgtaggcatgggtgatgctggtattaaatctgccattcaaggctctaatgttcctaaccctgatgaggccgcccctagttttgtttctggtgctatggctaaagctggtaaaggacttcttgaaggtacgttgcaggctggcacttctgccgtttctgataagttgcttgatttggttggacttggtggcaagtctgccgctgataaaggaaaggatactcgtgattatcttgctgctgcatttcctgagcttaatgcttgggagcgtgctggtgctgatgcttcctctgctggtatggttgacgccggatttgagaatcaaaaagagcttactaaaatgcaactggacaatcagaaagagattgccgagatgcaaaatgagactcaaaaagagattgctggcattcagtcggcgacttcacgccagaatacgaaagaccaggtatatgcacaaaatgagatgcttgcttatcaacagaaggagtctactgctcgcgttgcgtctattatggaaaacaccaatcttcccaagcaacagcaggtttccgagattatgcgccaaatgcttactcaagctcaaacggctggtcagtattttaccaatgaccaaatcaaagaaatgactcgcaaggttagtgctgaggttgacttagttcatcagcaaacgcagaatcagcggtatggctcttctcatattggcgctactgcaaaggatatttctaatgtcgtcactgatgctgcttctggtgtggttgatatttttcatggtattgataaagctgttgccgatacttggaacaatttctggaaagacggtaaagctgatggtattggctctaatttgtctaggaaataaccgtcaggattgacaccctcccaattgtatgttttcatgcctccaaatcttggaggcttttttatggttcgttcttattacccttctgaatgtcacgctgattattttgactttgag
1977: X174 virus genome
1982: the first electronic databases
FASTA, 1982: Wilbur and Lipman, 1985: Lipman and PearsonBLAST, 1990: Altschul, Gish, Miller, Myers and Lipman
accelerating database searches hash methods
1 2 3 4 5 6 7 8 9 10 11 12 13
W A T S N A N D C R I C K
A C D I K N R S T W26
912
8 11
13
57
10 4 3 1
Query Sequence
Hash table K=1
http://www.ccl.rutgers.edu/~ouyang/5020/FASTA-BLAST.ppt
Search of the Platelet Derived Growth Factor sequence
1982, Doolittle: relationship between oncogenes and growth factors
1990:The human genome projectTHE HUMAN GENOME PROGRAM (HGP) is producing large quantities of complex map and DNA sequence data. Informatics projects in algorithms, software, and databases are crucial in accumulating and interpreting these data in a robust and automated fashion at genome and sequencing centers
Computer systems play essential roles in all aspects of genome research, from data acquisition and analysis to data management. Without powerful computers and appropriately designed data–management systems, high–volume genome research cannot proceed.
This proposal concerns the management of general information about accelerators and experiments at CERN. It discusses the problems of loss of information about complex evolving systems and derives a solution based on a distributed hypertext system (Tim Berners-Lee)
1990:WWW at CERN
Human Genome Project Milestones
2001: la culminació del projecte
bioinformatics
Medline articles with keyword Bioinformatics.
year # articlesTo 1990 0
bioinformatics
Medline articles with keyword Bioinformatics.
year # articlesTo 1990 0
1990-1994 15
bioinformatics
Medline articles with keyword Bioinformatics.
year # articlesTo 1990 0
1990-1994 151995-1999 823
bioinformatics
Medline articles with keyword Bioinformatics.
year # articlesTo 1990 0
1990-1994 151995-1999 8232000-2004 7827
bioinformatics
Medline articles with keyword Bioinformatics.
year # articlesTo 1990 0
1990-1994 151995-1999 8232000-2004 78272005-2008 18822
bioinformatics
Medline articles with keyword Bioinformatics.
year # articlesTo 1990 0
1990-1994 151995-1999 8232000-2004 78272005-2009 188222010-2013 52835
Bioinformatics, Genomics, Systems Biology in Medline
What is past, is a prologue
W. Shakespeare, La Tempestad,
Evolution of Sequencing technologies
Evolution of sequencing technologies
• The human genome project– 12-15 years– 5 large centers: hundreds of instruments– Hundreds of scientists world wide– 3,000 millions dollars
• Today– A single instrument
2-3 genomes/today– 3,000 dollars/genoma
ENIAC, 1950s2.4 x 0.9 x 30 (m) 385 operations/second. 10-6 operations/second/cm3
ENIAC, 1950s2.4 x 0.9 x 30 (m) 385 operations/second. 10-6 operations/second/cm3
MAC AIR, 2010s~1 x 32.5 x 22.7 (cm) 133,656,056 operations/second. 105 operations/second/cm3
CELERA GENOMICS, year 20001,000 m2. 2 yr. 3GB at 10x5x10-6 Gb/day/m3
CELERA GENOMICS, year 20001,000 m2. 2 yr. 3GB at 10x5x10-6 Gb/day/m3
HISEQ 2500. year 2012119 x 94 x 76 (cm). 1 day 120 Gb102 Gb/day/m3
Moore’s Law
En un futur proper,disposarem del
genoma individual de cadascu de nosaltres
El futur es aqui: el nostre genoma