Biology/Computer Science 251: Introduction to Bioinformatics
What is bioinformatics?
Definition: “Bioinformatics is nothing but good, sound, regular biology appropriately dressed so that it can fit into a computer” -- Claverie & Notredame, Bioinformatics for Dummies
Class Web Site: http://cs.gettysburg.edu/~leinbach/Bio_CS251/
• This site will contain all important documents related to the class.
Power points of all lectures
Labs
Exam Answer Keys (Posted after the exam)
Homework Assignments and Answer Keys
• Note the updated syllabus contained at this site. It supersedes the one distributed on paper.
Where to start: A brief history of Bioinformatics
Gregor Mendel: 1866 - first described a set of mathematical rules by which the appearance of an organism (its PHENOTYPE) could be related to its inherited genetic makeup (GENOTYPE)
7 pea traits, or characters, studied by Mendel
Two Scientists Who In ~900 Words Reshaped the Way In Which We View Life on Earth
Red blood cells undergo sickling due to a single base change in the DNA of the beta-globin gene. This base change in DNA changes one residue in the protein.
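The effect of a single base change can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a substitute for the GenBank record: the 15-base fragment below is assumed to cover codons 4 through 8 of the beta-globin coding sequence, the codon table holds only the codons this fragment needs, and the names `CODON_TABLE` and `translate` are made up for the example. The sickle allele differs by one base (GAG becomes GTG), which turns one glutamate into a valine.

```python
# Partial codon table: only the codons that occur in this short fragment.
CODON_TABLE = {
    "ACT": "Thr", "CCT": "Pro", "GAG": "Glu", "GTG": "Val", "AAG": "Lys",
}

def translate(dna):
    """Translate a coding-strand DNA string three bases at a time."""
    return [CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3)]

# Assumed fragment of the beta-globin coding sequence (codons 4-8).
normal = "ACTCCTGAGGAGAAG"
# Sickle allele: the middle base of the third codon here changes A -> T.
sickle = normal[:7] + "T" + normal[8:]

# Compare the two translations position by position.
changes = [
    (pos, a, b)
    for pos, (a, b) in enumerate(zip(translate(normal), translate(sickle)))
    if a != b
]

print(translate(normal))
print(translate(sickle))
print(changes)
```

Only one position in the translated fragment differs, which is the "one residue" the slide refers to.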
Bioinformatics allows the study of gene origins and the evolution of new genes
SCIENCE, 12-22-06
We are now moving into the post-genomic era, as entire genomes are sequenced and made available with individual genome databases
Bioinformatics was born out of databases, constructed to collect protein and DNA sequences
1. Proteins were first: 1960s through 1970s. Margaret Dayhoff, National Biomedical Research Foundation (NBRF), established PIR, the Protein Information Resource, which grew into the PIR-International Protein Sequence Database:
http://www-nbrf.georgetown.edu/pir
2. DNA came second:
1977 - reliable DNA sequencing developed
1980 - European Molecular Biology Laboratory (EMBL) Data Library
1982 - GenBank was established
1984 - DNA Databank of Japan (DDBJ)
Today, GenBank is maintained by the National Center for Biotechnology Information (NCBI). GenBank, EMBL, and DDBJ form the International Nucleotide Sequence Database Collaboration, and data are exchanged among them on a daily basis.
GenBank: http://www.ncbi.nlm.nih.gov
How fast is the GenBank database growing?
http://www.ncbi.nlm.nih.gov/GenBank/genebankstats.html
The Net (no pun intended) Result is an Astounding Growth of Biological Information
Source: Scrabanek op. cit.
Whole-genome structure/organization can be compared between species
This is a human karyogram, or idiogram
This is a human-mouse synteny map
Bioinformatic and genomic approaches have exploded the study of evolution (molecular evolution)
SCIENCE, 12-22-06
Bioinformatic and genomic approaches allow discovery of new and unsuspected species of organisms that cannot be detected using conventional approaches
Archaeal Richmond Mine Acidophilic Nanoorganism (ARMAN)
Bioinformatics can be applied to study interactions among proteins within the cell, i.e., proteomics
For example, take a look at the Saccharomyces genome database (SGD):
http://www.yeastgenome.org/
Protein Structure Data is Growing at a Slower Pace
Protein-protein interaction map for budding yeast
Jeong et al. 2001. Nature 411: 41-2
The color of a node indicates the effect of deleting the corresponding protein.
Red: lethal
Green: non-lethal
Orange: slow growth
Yellow: unknown
Novel insights from the layering of 3-D protein structure with Protein Interaction Networks
Cancer cells Normal cells
DNA chips (DNA microarrays) can be used to unravel the genetic basis of cancer
One form of Non-Hodgkin's Lymphoma, Diffuse Large B-Cell Lymphoma (DLBCL), is actually two types: its two varieties exhibit very different microarray RNA-expression profiles and very different survival outcomes.
Definition II: “A field that involves the building and manipulation of biological databases. In the context of genomics, this means managing massive amounts of sequencing data and providing useful access to and interpretation of the data” -- Weaver, Molecular Biology, 3rd ed.
Definition III: “A field that extracts biological information from large datasets such as sequences, protein interactions, microarrays, etc. This field also includes the area of data visualization”
-- Campbell & Heyer, Genomics, Proteomics, and Bioinformatics
What is bioinformatics?
Definition I: “Bioinformatics is nothing but good, sound, regular biology appropriately dressed so that it can fit into a computer” -- Claverie & Notredame, Bioinformatics for Dummies
Carl’s Definition of Bioinformatics
A study of the algorithms and programs that are used by Molecular Biologists and others in the Biological and Medical Sciences in their quest for understanding protein structure and function in living organisms.
This is just one of many definitions that may be found in textbooks, scientific papers, and on the web. The simplest definition is that it is an interdisciplinary subject drawing on material from Biology, Mathematics, and Computer Science. To me this is like saying that E = mc² has something to do with relativity theory.
Some Implications of this Definition
• An individual studying Bioinformatics needs to have some understanding of the basic ideas of Molecular Biology research.
• They also need to have a familiarity with DNA sequences and how they contribute to 3D Protein Structure as well as gene identification and phylogenetics.
• They need to be familiar with the many “in silico” tools that are used and the parameters that control the output of the programs or algorithmically controlled devices.
• It is important for them to understand the objectives and limitations of both Computer Science and Molecular Biology.
• They need to have some experience with collecting biological data for analysis.
Diagram labels: Computer Science, Biology, Bioinformatics, Computational Biology, Micro Biology & Medical Science. (Note the two-way arrow.)
Diagram series: Computer Science, Micro Biology, and Bioinformatics, shown for three periods: Early Pre History, Late Pre History, Recent History.
As a result, DNA sequencing and Proteomics have had an increasing number of important applications in the life, medical and social sciences.
Pick up any scientific journal that deals with the life or medical sciences, any popular scientific magazine, or, for that matter, any daily newspaper and you will find an article where DNA or related issues play an important role.
Why, it even makes the comic section:
For example: A fungus, Aspergillus nidulans
http://www.broad.mit.edu/annotation/genome/aspergillus_nidulans/Home.html
What can we do with these databases? What is the purpose of bioinformatics?
Answer: to make sense out of this: AN 1.29 nt 240000 - 270000
Preview a few tools -
A.n. genome database: “Browse Regions” - “Feature Map” - “Get DNA sequence”
GenBank: “ORF Finder” - “Blastp search” - “Conserved domain” - “Format Results”
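To give a feel for what a tool like "ORF Finder" does with a raw stretch of DNA, here is a minimal Python sketch of open reading frame detection. It is not NCBI's implementation. It scans only the forward strand, treats ATG as the only start codon, and uses a hypothetical function name (`find_orfs`) and a hypothetical `min_codons` cutoff chosen for illustration. Real tools also scan the reverse complement and offer alternative genetic codes.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(dna, min_codons=3):
    """Return (start, end, frame) tuples for ATG...stop ORFs.

    Forward strand only. `start` and `end` are 0-based, end-exclusive
    positions, and the length check counts the stop codon.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        # Walk this reading frame one codon at a time.
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first ATG opens a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None  # look for the next ATG in this frame
    return orfs

# Toy sequence (made up for the example), with one ORF in frame 2.
toy = "CCATGAAATTTTAGGG"
for start, end, frame in find_orfs(toy):
    print(frame, start, end, toy[start:end])
```

The translated ORF is what would then be passed to a protein search such as Blastp.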