What is Bioinformatics?
Bioinformatics is conceptualizing biology in terms of molecules and then applying “informatics” techniques from math, computer science, and statistics to understand and organize the information associated with these molecules on a large scale.
Focus
Profile of a bioinformatician
• (General) knowledge of biology and genome sciences
• Translation biology <-> informatics
• Knowledge of Unix-based operating systems
• Programming skills (Java, Python, Shell/Perl scripting, R)
• (Parallel) computing environments
• Data storage and database technology
• Statistics
• Mathematics
Freely adapted from Richter et al (2009) PLoS computational biology
How do we use Bioinformatics?
• Store/retrieve biological information (databases)
• Retrieve/compare gene sequences
• Predict function of unknown genes/proteins
• Search for previously known functions of a gene
• Compare data with other researchers
• Compile/distribute data for other researchers
Other bioinformatics organisations
• European Bioinformatics Institute (EBI)
– http://www.ebi.ac.uk/
• National Center for Biotechnology Information (NCBI)
– http://www.ncbi.nlm.nih.gov/
• EMBnet
– http://www.embnet.org/
• International Society for Computational Biology (ISCB)
– http://www.iscb.org/
History of bioinformatics
1965 Margaret Dayhoff's Atlas of Protein Sequences
1970 Needleman-Wunsch algorithm (global alignment)
1977 DNA sequencing and software to analyze it (Staden)
1981 Smith-Waterman algorithm developed (local sequence alignment)
1981 The concept of a sequence motif (Doolittle)
1982 GenBank made public
1983 Sequence database searching algorithm (Wilbur-Lipman)
1987 Perl (Practical Extraction and Report Language) is released by Larry Wall.
1988 National Center for Biotechnology Information (NCBI) created at NIH/NLM
1988 EMBnet network for database distribution
1990 BLAST: fast sequence similarity searching
1990 The HTTP 1.0 specification is published. First HTML document.
1990 Grid computing as a metaphor for making computer power as easy to access as an electric power grid.
1994 EMBL European Bioinformatics Institute (EBI), Hinxton, UK
1995 Microsoft version 1.0 of IE. Sun version 1.0 of Java. Version 1.0 of Apache.
1997 PSI-BLAST
1997 International Society for Computational Biology was founded
1998 Worm (multicellular) genome completely sequenced
1999 e-Science was introduced by John Taylor, the Director General of the United Kingdom's Office of Science and Technology
2000 Gene Ontology (GO)
2001 The human genome (3 Giga base pairs) is published.
2001 Minimum information about a microarray experiment (MIAME; Brazma).
2001 Genetical Genomics (Ritsert Jansen)
2002 BioMoby. Web-service repository
2003 myGrid: personalised bioinformatics on the information grid (e.g., Taverna).
2004 Bioconductor: open software development for computational biology and bioinformatics
2005 Reactome: knowledge base of biological pathways
Global alignment (toy example)
CATGATGACTGAGAT
Can you “align” these two sequences, that is, introduce “gaps” in them such that you maximize the number of matching nucleotides?
CATGATGACTGAGAT
CATGATGA-C-TGA-GAT
Helps us to understand the function of ‘new’ DNA
Dynamic programming gives an optimal solution, but it is slow. Heuristic methods are often used instead (BLAST, BLAT).
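The dynamic programming approach to global alignment can be sketched in a few lines of Python. This is a minimal illustration in the style of Needleman-Wunsch, not the implementation used by any particular tool. The scoring values (match +1, mismatch -1, gap -1) and the second demo sequence are assumptions chosen for the example; the slides do not specify them.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment of strings a and b.

    Returns (score, aligned_a, aligned_b). The scoring parameters are
    illustrative assumptions, not values taken from the slides.
    """
    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap  # a[:i] aligned against gaps only
    for j in range(1, m + 1):
        score[0][j] = j * gap  # b[:j] aligned against gaps only

    def substitution(i, j):
        return match if a[i - 1] == b[j - 1] else mismatch

    # Fill the table: every cell depends on its three neighbours.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + substitution(i, j),  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,                     # gap in b
                score[i][j - 1] + gap,                     # gap in a
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + substitution(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    # The second sequence is hypothetical and only serves as a demo input.
    s, top, bottom = needleman_wunsch("CATGATGACTGAGAT", "CATGTGACGAGAT")
    print(s)
    print(top)
    print(bottom)
```

The table has (n+1) × (m+1) cells, so time and memory grow with the product of the two sequence lengths. That is fine for a toy example but costly when a query is compared against a whole database, which is why heuristic tools are used in practice.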
Hogeweg, P. (1978). Simulating the growth of cellular forms. Simulation 31, 90-96;
Hogeweg, P. and Hesper, B. (1978) Interactive instruction on population interactions. Comput Biol Med 8:319-27.
Paulien Hogeweg (born 1943) is a Dutch theoretical biologist and complex systems researcher who studies biological systems as dynamic information processing systems at many interconnected levels.
Together with Ben Hesper she coined the term Bioinformatics in 1978 as the study of informatic processes in biotic systems.
1978
1965 Margaret Dayhoff's Atlas of Protein Sequences
1967 Scientific director of NBIC was born
1970 Needleman-Wunsch algorithm (global alignment)
1977 DNA sequencing and software to analyze it (Staden)
1981 Smith-Waterman algorithm developed (local sequence alignment)
1981 The concept of a sequence motif (Doolittle)
1982 GenBank made public
1983 Sequence database searching algorithm (Wilbur-Lipman)
1987 Perl (Practical Extraction and Report Language) is released by Larry Wall.
1988 National Center for Biotechnology Information (NCBI) created at NIH/NLM
1988 EMBnet network for database distribution
1990 BLAST: fast sequence similarity searching
1990 The HTTP 1.0 specification is published. First HTML document.
1990 Grid computing as a metaphor for making computer power as easy to access as an electric power grid.
1994 EMBL European Bioinformatics Institute (EBI), Hinxton, UK
1995 Microsoft version 1.0 of IE. Sun version 1.0 of Java. Version 1.0 of Apache.
1997 PSI-BLAST
1997 International Society for Computational Biology was founded
1998 Worm (multicellular) genome completely sequenced
1999 The term e-Science was created by John Taylor, the Director General of the United Kingdom's Office of Science and Technology
2000 Gene Ontology (GO)
2001 The human genome (3 Giga base pairs) is published.
2001 Minimum information about a microarray experiment (MIAME; Brazma).
2001 Genetical Genomics (Ritsert Jansen, Jan Peter Nap)
2002 BioMoby. Web-service repository
2003 myGrid: personalised bioinformatics on the information grid (e.g., Taverna).
2004 Bioconductor: open software development for computational biology and bioinformatics
2005 Reactome: knowledge base of biological pathways
Bioinformatics in the Netherlands
1976 Paulien Hogeweg (theoretical biology)
1979 Gert Vriend (proteins)
1985 Computer Assisted Organic Synthesis/Computer Assisted Molecular Modelling Centre (CAOS/CAMM) was founded (Nijmegen, Jan Noordik)
1989 Jack Leunissen (first Dutch researcher with PhD in Bioinformatics)
1990s Driving forces: Herman Berendsen, Charles Buys, Jacob de Vlieg
1999 CAOS/CAMM was reorganized; Gert Vriend becomes director of CMBI.
1999 KNAW committee (chaired by Berendsen) wrote the report ‘Bioexact’ in which strong stimulation of bioinformatics was recommended.
2000 KNCV working group bioinformatics
2000 NWO-BMI (Biomolecular informatics); program committee chaired by De Vlieg
2001 NWO/KNAW workshop ‘The future of bioinformatics in the Netherlands’
2002 Position paper ‘De toekomst van de bioinformatica in Nederland’ (‘The future of bioinformatics in the Netherlands’) representing the vision of the NWO/KNAW
2003 NBIC was founded
2003 First BioRange proposal (Vriend, Berendsen, Hertzberger, Tellegen)
2005 Start of BioRange (NBIC-I)
2008 ……………
Publication history
http://dan.corlan.net/medline-trend.html
Bioinformatics tools and databases
• Many different bioinformatic tools are freely available
– BLAST, EMBOSS, EnsEMBL, GenScan, BioConductor, ...
• Many different biological databases are freely available
– GenBank, UniProtKB, KEGG, ...
• Many publications in open access journals
– BMC bioinformatics
– PLoS computational biology
• Also many commercial software packages available
– Spotfire, Rosetta Resolver, Genelogic, ...
• Bioinformaticians write their own tools for specialized tasks
National Center for Biotechnology Information
Sequence retrieval:
– GenBank and other genome databases
Protein Structure:
– 3D modeling programs: RasMol, Protein Explorer
Sequence comparison programs:
– BLAST, GCG, MacVector
Similarity Search: BLAST
A tool for searching gene or protein sequence databases for related genes of interest
The structure, function, and evolution of a gene may be determined by such comparisons
Alignments between the query sequence and any given database sequence, allowing for mismatches and gaps, indicate their degree of similarity
http://www.ncbi.nlm.nih.gov/BLAST/
% identity
MRCKTETGAR
MRCGTETGAR
90%
CATTATGATA
GTTTATGATT
70%
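The percentages above are simply the fraction of aligned positions that carry the same residue. A minimal Python sketch of that calculation follows, assuming the two sequences are already aligned, have equal length, and contain no gap characters. The function name `percent_identity` is illustrative and not taken from any library; tools such as BLAST may define the denominator differently when gaps are present.

```python
def percent_identity(seq_a, seq_b):
    """Percentage of positions at which two aligned sequences are identical.

    Assumes equal-length, ungapped sequences (an assumption of this sketch).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    # Count the columns in which both sequences show the same character.
    matches = sum(x == y for x, y in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


if __name__ == "__main__":
    print(percent_identity("MRCKTETGAR", "MRCGTETGAR"))  # protein pair from the slide
    print(percent_identity("CATTATGATA", "GTTTATGATT"))  # DNA pair from the slide
```

The protein pair differs at one of ten positions and the DNA pair at three of ten, which gives the 90% and 70% shown on the slide.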
Strengths:
Accessibility
Growing rapidly
User friendly
Weaknesses:
Sometimes not up-to-date
Limited possibilities
Limited comparisons and information
Not accurate
Need for improved Bioinformatics
Genomics:
– Human Genome Project
– Gene array technology
– Comparative genomics
– Functional genomics
Proteomics:
– Global view of protein function/interactions
– Protein motifs
– Structural databases
Data Mining
Handling enormous amounts of data
Sort through what is important and what is not
Manipulate and analyze data to find patterns and variations that correlate with biological function
[Diagram: bioinformatics connecting students, educators, researchers, and institutions]