bioinformatics a biologist’s perspective rob rutherford

Post on 19-Dec-2015


Bioinformatics

A Biologist’s perspective

Rob Rutherford

1. The Biologist’s perspective

2. A survey of tools

3. Training students for the future

If the biota, in the course of eons, has built something … who but a fool would discard seemingly useless parts? To keep every cog and wheel is the first precaution of intelligent tinkering.

-Aldo Leopold (1887 - 1948)

Figure 1.18 Careful observation and measurement provide the raw data for science

PubMed had 400,000 new research articles entered in 2002.

NCBI-NLM, 2003

Productive Tinkerers

NIH-NLM 2003


(Cockerill 2003)

“If your experiment needs statistics, you ought to have done a better experiment.”

-Rutherford (the other one)

RA Fisher 1956, University of Adelaide Archives

“To consult the statistician after an experiment is finished is … to ask him to conduct a post mortem examination. He can perhaps say what the experiment died of.”

2 Wings of Bioinformatics

Housekeeping Bioinformatics: representation, storage, and distribution of data

Analytical Bioinformatics: new tools for the discovery of knowledge in data

Part 2

A Survey of Problems/Opportunities

“The Central Dogma”

DNA: Information warehouse (4 nucleotide letters: A, T, G, C)

RNA: Temporary copy of a gene

Protein: Working cellular machine (20 amino acid letters)
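The flow from DNA to RNA to protein can be sketched in a few lines of Python. This is a minimal illustration, not a production tool: it assumes the input is the coding strand read in frame from its first letter, it uses the standard genetic code, and the function names `transcribe` and `translate` are made up for this example.

```python
# Standard genetic code, built from the conventional T, C, A, G codon ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def transcribe(dna):
    """Return the mRNA copy of a coding-strand DNA sequence (T becomes U)."""
    return dna.upper().replace("T", "U")


def translate(dna):
    """Translate coding-strand DNA into one-letter amino acids.

    Assumption: the reading frame starts at the first letter.
    Translation stops at the first stop codon ('*' in the table).
    """
    dna = dna.upper()
    protein = []
    for start in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[start:start + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


if __name__ == "__main__":
    gene = "atggcctaa"            # toy sequence, not taken from the slides
    print(transcribe(gene))       # RNA: temporary copy of the gene
    print(translate(gene))        # protein: the working machine
```

Real gene finding is much harder than this: the sketch ignores introns, the reverse strand, and the other five reading frames.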

RNA polymerase PDB

A Survey of Problems

Finding Genes and Understanding Genes

Protein Structure and Function

Gene Expression

Networks

Other areas

Finding and Understanding Genes

Receptors-GPCR (767)

Receptors-NHR (56)

Integrins (33)

Ion Channels (313)

Kinases (713)

Phosphatases (274)

Phosphodiesterases (58)

Neurotrans. transporters (34)

P450s (59)

Proteases (527)

Secreted (3621)

Other (53076)

Estimated Human Gene Number: ~59,538

Rutherford

                    10        20        30        40
            ....*....|....*....|....*....|....*....|
consen    1 SPKNTPVVLIPKKGPGKYRPISlvDYKILNKATKKrFSpp 40
1MML     83 SPWNTPLLPVKKPGTNDYRPVQ--DLREVNKRVED-IH-- 117
1HNI_B   54 NPYNTPVFAIKKKDSTKWRKLV--DFRELNKRTQD-FWev 90
1MU2_B   49 NPYNTPTFAIKKKDKNKWRMLI--DFRELNKVTQD-FTei 85
1D1U_A   69 SPWNTPLLPVKKPGTNDYRPVQ--DLREVNKRVED-IH-- 103

                    50        60        70        80
            ....*....|....*....|....*....|....*....|
consen   41 qPGFRPGRSLLNKLKGS-KWFLKLDLKKAFDSIPHDPLLR 79
1MML    118 -PTVPNPYNLLSGLPPShQWYTVLDLKDAFFCLRLHPTSQ 156
1HNI_B   91 qLGIPHPAGL-----KKKKSVTVLDVGDAYFSVPLDEDFR 125
1MU2_B   86 qLGIPHPAGL--AKK---RRITVLDVGDAYFSIPLHEDFR 120
1D1U_A  104 -PTVPNPYNLLSGLPPShQWYTVLDLKDAFFCLRLHPTSQ 142

Cn3D HIV

Finding Conserved Regions/Domains

HIV protein

Comparing your sequence against models derived from curated, known protein families
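A rough idea of how a family model is derived from aligned sequences can be sketched as follows. The rows are the first alignment block from the conserved-domain slide (the four structure entries, without the consensus row). The helper names are hypothetical, and real domain databases use weighted, probabilistic profiles rather than the simple majority vote shown here.

```python
from collections import Counter

# First 40 alignment columns of the four structure entries on the slide.
ALIGNMENT = {
    "1MML":   "SPWNTPLLPVKKPGTNDYRPVQ--DLREVNKRVED-IH--",
    "1HNI_B": "NPYNTPVFAIKKKDSTKWRKLV--DFRELNKRTQD-FWev",
    "1MU2_B": "NPYNTPTFAIKKKDKNKWRMLI--DFRELNKVTQD-FTei",
    "1D1U_A": "SPWNTPLLPVKKPGTNDYRPVQ--DLREVNKRVED-IH--",
}


def column_counts(rows):
    """Count residues in every alignment column, ignoring gaps ('-')."""
    length = len(rows[0])
    if any(len(row) != length for row in rows):
        raise ValueError("all aligned rows must have the same length")
    return [
        Counter(row[i].upper() for row in rows if row[i] != "-")
        for i in range(length)
    ]


def consensus(rows):
    """Most common residue per column; '-' where every row has a gap."""
    return "".join(
        counts.most_common(1)[0][0] if counts else "-"
        for counts in column_counts(rows)
    )


def conserved_columns(rows):
    """1-based columns where every row carries the same residue."""
    return [
        i + 1
        for i, counts in enumerate(column_counts(rows))
        if len(counts) == 1 and sum(counts.values()) == len(rows)
    ]


if __name__ == "__main__":
    rows = list(ALIGNMENT.values())
    print(consensus(rows))
    print(conserved_columns(rows))
```

Columns that stay identical across all rows are candidates for functionally important positions, which is why a query sequence is scored against the family model rather than against any single member.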

Thanks to Porterfield

Phylogenetics and Evolution

Protein Structure

Imaging Experimental X-ray diffraction data

Predicting structure in silico from sequence

Experimental structures in the Protein Data Bank

HIV reverse transcriptase

Goodsell, PDB

DNA (human genome)

RNA (HIV virus)

Protein

Structure is Function

Goodsell, PDB

Figure 17.0 Ribosome

Structural Predictions just from raw protein sequence?

1 ggcacgaggc acggctgtgc aggcacgcat gcaggccagc ….

1 atctgcacgt ggttatgctg ccggagtttg ggccgccact….

CASP: Critical Assessment of Techniques for Protein Structure Prediction

A community-wide contest, held every two years, to test protein structure prediction from primary sequence

An example:

Gene Expression

Sequencing RNA (ESTs)

Sequencing bits of ESTs (SAGE)

Automation of in situ

DNA microarray technology

One spot for each gene

MicroArray

Microarray Expression Analysis

Reference Mixture Specific Organ

Figure: microarray expression map. Columns: experimental conditions (H2O2, SDS, Diamide, Iron, NO, NO, SigE, SigH, IdeR, NrpR). Rows: genes (0 to 400). Legend: gene turned on / gene turned off. Labels: Low O2, Dormancy Genes.
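The "turned on / turned off" calls in a two-sample microarray come from comparing each spot's intensity in the specific sample with its intensity in the reference mixture. The sketch below shows that comparison as a log2 ratio. The two-fold cutoff and the function names are assumptions made for illustration; the slides do not state which threshold or normalization was used.

```python
import math


def log2_ratio(sample, reference):
    """Log2 of sample intensity over reference intensity for one spot."""
    if sample <= 0 or reference <= 0:
        raise ValueError("intensities must be positive")
    return math.log2(sample / reference)


def call_gene(sample, reference, fold=2.0):
    """Label a gene 'on', 'off', or 'unchanged'.

    Assumption: a change of at least `fold` in either direction counts
    as a real change. Real analyses also normalize the arrays and use
    replicates with a statistical test.
    """
    ratio = log2_ratio(sample, reference)
    cutoff = math.log2(fold)
    if ratio >= cutoff:
        return "on"
    if ratio <= -cutoff:
        return "off"
    return "unchanged"


if __name__ == "__main__":
    # Toy intensities (sample, reference); not data from the slides.
    spots = {"geneA": (800, 200), "geneB": (100, 400), "geneC": (300, 250)}
    for name, (sample, reference) in spots.items():
        print(name, call_gene(sample, reference))
```

Working in log space makes a two-fold increase (+1) and a two-fold decrease (-1) symmetric, which a raw ratio does not.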

Figure 1.3 Some properties of life

Figure 1.23x1 Biotechnology laboratory

Metabolic Pathway Map

Building Transcriptional Network Map

Networks

Biochemical Pathways

Signaling Networks

Transcriptional Networks

Computational Neuroscience

Scientific American 2001

Microarrays uncover networks of interactions…
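One simple way to go from expression profiles to a network is to link genes whose profiles rise and fall together across conditions. The sketch below uses Pearson correlation with an arbitrary cutoff of 0.9. This is only one of many approaches and is not necessarily the method behind the slides; the names and the toy data are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat profile carries no co-expression signal
    return cov / sqrt(var_x * var_y)


def coexpression_edges(profiles, threshold=0.9):
    """Return gene pairs whose absolute correlation reaches the threshold.

    `profiles` maps a gene name to its expression values, one value
    per experimental condition.
    """
    edges = []
    for gene1, gene2 in combinations(sorted(profiles), 2):
        if abs(pearson(profiles[gene1], profiles[gene2])) >= threshold:
            edges.append((gene1, gene2))
    return edges


if __name__ == "__main__":
    toy_profiles = {
        "a": [1, 2, 3, 4],
        "b": [2, 4, 6, 8],   # tracks a
        "c": [4, 3, 2, 1],   # mirrors a
        "d": [1, 3, 2, 1],   # unrelated
    }
    print(coexpression_edges(toy_profiles))
```

An edge here means only that two genes are correlated. It says nothing about which gene regulates which, so a skeptical reader should treat such a map as a set of hypotheses to test.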

Other Opportunities

Organismal Physiology Populations

Communities Ecosystems

Same issues in “Macro” Biology

Long history of mathematical modeling

Huge datasets from

• GPS/GIS

• Remote sensing

If the biota, in the course of eons, has built something … who but a fool would discard seemingly useless parts? To keep every cog and wheel is the first precaution of intelligent tinkering.

-Aldo Leopold (1887 - 1948)

Where is all this leading?

Part 3

How do we prepare our students for this future?

Dr. Peter Munson

Head of the Mathematical and Statistical Computing Laboratory, Division of Computational Biosciences, National Institutes of Health

Ole’ pre 1976

The Tool Builders

• Excellent mathematical skills (algorithms, linear algebra, data structures)

• Be comfortable in a Linux/Unix environment, and know Perl and C/C++.

• A deep background in 2+ advanced areas of biology with chemistry prerequisites.

• Graduate training

The systems biologist.

Biologist who is an intelligent and skeptical consumer of large data sets

• Probability and Statistics

• SQL and database basics

• Equilibrium and rates of change (Calculus)

• Exposure to system level data

And who knows how and when to collaborate(!)
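As a small taste of the "SQL and database basics" item, the sketch below loads a few of the gene-family counts from the earlier slide into an in-memory SQLite database and asks for the largest families. The table name, column names, and function name are invented for this example.

```python
import sqlite3

# A few (family, gene count) pairs from the gene-family slide.
FAMILY_COUNTS = [
    ("Receptors-GPCR", 767),
    ("Kinases", 713),
    ("Proteases", 527),
    ("Ion Channels", 313),
    ("Phosphatases", 274),
]


def largest_families(rows, limit=3):
    """Store the rows in a temporary database and return the top families."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE gene_family ("
            "name TEXT PRIMARY KEY, gene_count INTEGER NOT NULL)"
        )
        conn.executemany("INSERT INTO gene_family VALUES (?, ?)", rows)
        cursor = conn.execute(
            "SELECT name, gene_count FROM gene_family "
            "ORDER BY gene_count DESC LIMIT ?",
            (limit,),
        )
        return cursor.fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for name, count in largest_families(FAMILY_COUNTS):
        print(f"{name}: {count}")
```

The point for a biology student is less the syntax than the habit: once data sit in a table, questions become short, repeatable queries instead of manual spreadsheet work.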

end
