introduction to bioinformatics

Introduction to Bioinformatics Alexandra M Schnoes Univ. California San Francisco [email protected]
Post on 05-Jan-2016

Page 1: Introduction to Bioinformatics

Introduction to Bioinformatics

Alexandra M Schnoes

Univ. California San Francisco

[email protected]

Page 2: Introduction to Bioinformatics

What is Bioinformatics?

Intersection of Biology and Computers

Broad field

Often means different things to different people

Personal Definition: The utilization of computation for biological investigation and discovery; the process by which you unlock the biological world through the use of computers.

Page 3: Introduction to Bioinformatics

What does one do in Bioinformatics?

(a small sample)

[Figure: chemical structure diagram with a question mark; details not recoverable from the transcript]

Our Lab: Understanding Protein (Enzyme) Function

Page 4: Introduction to Bioinformatics

What does one do in Bioinformatics?

(a small sample)

Discover new drug targets: computational docking

Atreya, C. E. et al. J. Biol. Chem. 2003;278:14092-14100

Shoichet, B. K. Nature. 2004;432:862-865

Page 5: Introduction to Bioinformatics

What does one do in Bioinformatics?

(a small sample)

Systems Biology

sbw.kgi.edu/

www.sbi.uni-rostock.de/research.html

Page 6: Introduction to Bioinformatics

This lab: Nucleotide & Protein Informatics

Sequence analysis

Finding similar sequences

Multiple sequence alignment

Phylogenetic analysis

Page 7: Introduction to Bioinformatics

[Figure: chemical structure diagram]

Sequence → Structure → Function

Page 8: Introduction to Bioinformatics

Process of Evolution

Sequences change due to:

Mutation

Insertion

Deletion

Page 9: Introduction to Bioinformatics

Use Evolutionary Principles to Analyze Sequences

If sequence A and sequence B are similar, then A and B are likely evolutionarily related.

If sequences A, B and C are all similar, but A and B are more similar to each other than either is to C, then A and B are more closely evolutionarily related to each other than to C.

Page 10: Introduction to Bioinformatics

Extremely Powerful Idea

1. Start with unknown sequence

2. Find what the unknown is similar to

3. Use information about the known to make predictions about the unknown
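The three steps above can be sketched in a few lines of Python. This is only an illustration under loud assumptions: the "database" sequences and their annotations are invented, similarity is measured as simple percent identity between pre-aligned sequences of equal length, and the 40% cutoff is an arbitrary choice rather than a recommended threshold.

```python
# Toy "database" of known sequences. Both the sequences and the
# annotations are invented for illustration only.
KNOWN = {
    "TATSFSYLVG": "hypothetical kinase",
    "MKVLAAGIVG": "hypothetical transporter",
    "GGHHPLLKRW": "hypothetical protease",
}


def percent_identity(a, b):
    """Percent of identical positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to the same length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def predict_function(unknown, known=KNOWN, min_identity=40.0):
    """Steps 2 and 3: find the most similar known sequence, then transfer
    its annotation. Returns None if nothing is similar enough."""
    best_hit = max(known, key=lambda seq: percent_identity(unknown, seq))
    identity = percent_identity(unknown, best_hit)
    if identity < min_identity:
        return None
    return best_hit, identity, known[best_hit]


# Step 1: start with an unknown sequence.
result = predict_function("TASSWSYIVE")
print(result)
```

Real searches use scored alignments against large databases rather than position-by-position identity, but the logic of "find the closest known, borrow its annotation" is the same.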

Page 11: Introduction to Bioinformatics

How do you know when sequences are similar?

Align two sequences together and score their similarity

TASSWSYIVE

TATSFSYLVG

Use substitution matrices to score the alignment

Page 12: Introduction to Bioinformatics

Substitution Matrices Give a Score for Each Mutation

Many different matrices available

BLOSUM matrices are standard in the field

BLOSUM62 scoring matrix


http://www.carverlab.org/images/

Page 13: Introduction to Bioinformatics

Scoring: Add up the positional Scores

Score of 30

TASSWSYIVE

TATSFSYLVG

TASSWSYIVE

TATSFSYLVG


Score of 1
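As a worked example of adding up positional scores, the sketch below scores the ungapped alignment of TASSWSYIVE with TATSFSYLVG. To stay self-contained it hard-codes only the BLOSUM62 entries needed for these ten residue pairs; the function and variable names are illustrative, not part of any library.

```python
# Subset of BLOSUM62 covering only the residue pairs in this example.
BLOSUM62_SUBSET = {
    ("T", "T"): 5, ("A", "A"): 4, ("S", "T"): 1, ("S", "S"): 4,
    ("W", "F"): 1, ("Y", "Y"): 7, ("I", "L"): 2, ("V", "V"): 4,
    ("E", "G"): -2,
}


def pair_score(a, b):
    """Look up a substitution score; the matrix is symmetric."""
    key = (a, b) if (a, b) in BLOSUM62_SUBSET else (b, a)
    return BLOSUM62_SUBSET[key]


def alignment_score(seq1, seq2):
    """Sum the substitution scores column by column (no gaps allowed)."""
    if len(seq1) != len(seq2):
        raise ValueError("ungapped scoring needs equal-length sequences")
    return sum(pair_score(a, b) for a, b in zip(seq1, seq2))


score = alignment_score("TASSWSYIVE", "TATSFSYLVG")
print(score)  # 30
```

The per-column scores are 5, 4, 1, 4, 1, 4, 7, 2, 4 and -2, which sum to 30.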

Page 14: Introduction to Bioinformatics

Additional issues…

Gaps (insertions/deletions)

Have scoring penalties for opening and continuing a gap

Without a gap:

TASSWSYIVE

TATSFLVG

With a gap:

TASSWSYIVE

TATSF--LVG
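A minimal sketch of scoring the gapped alignment TASSWSYIVE / TATSF--LVG with separate penalties for opening and continuing a gap. The penalty values (-10 to open, -1 for each additional gap position) are illustrative assumptions, not values from the slides, and the substitution scores are the BLOSUM62 entries needed for this one example.

```python
# BLOSUM62 entries for the aligned (non-gap) columns of this example only.
BLOSUM62_SUBSET = {
    ("T", "T"): 5, ("A", "A"): 4, ("S", "T"): 1, ("S", "S"): 4,
    ("W", "F"): 1, ("I", "L"): 2, ("V", "V"): 4, ("E", "G"): -2,
}


def pair_score(a, b):
    """Look up a substitution score; the matrix is symmetric."""
    key = (a, b) if (a, b) in BLOSUM62_SUBSET else (b, a)
    return BLOSUM62_SUBSET[key]


def gapped_alignment_score(row1, row2, gap_open=-10, gap_extend=-1):
    """Score two aligned rows in which '-' marks a gap position.

    The first position of a gap costs gap_open and every further
    consecutive gap position costs gap_extend (assumed convention).
    Simplification: a gap that switches rows counts as one continued gap.
    """
    if len(row1) != len(row2):
        raise ValueError("aligned rows must have the same length")
    total = 0
    in_gap = False
    for a, b in zip(row1, row2):
        if a == "-" or b == "-":
            total += gap_extend if in_gap else gap_open
            in_gap = True
        else:
            total += pair_score(a, b)
            in_gap = False
    return total


score = gapped_alignment_score("TASSWSYIVE", "TATSF--LVG")
print(score)  # 8
```

The eight aligned columns contribute 19, and the two-residue gap costs -10 + -1 = -11, giving 8. Making the opening penalty much larger than the extension penalty favors one long gap over several scattered short ones.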

Page 15: Introduction to Bioinformatics

How do we find similar sequences?

Start at the National Center for Biotechnology Information (NCBI)

http://www.ncbi.nlm.nih.gov/

Page 16: Introduction to Bioinformatics

How do we find similar sequences?

Nucleotide Sequence Databases

Page 17: Introduction to Bioinformatics

How do we find similar sequences?

Protein Sequence Databases

Page 18: Introduction to Bioinformatics

How do we find similar sequences?

Similarity Search: BLAST (Basic Local Alignment Search Tool)

Page 19: Introduction to Bioinformatics

BLAST is very quick, but:

Only local alignments

Alignments aren't great

Only pairwise alignments
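To illustrate what "local alignment" means, here is a minimal Smith-Waterman scoring sketch. Note that this is not BLAST: BLAST is a fast heuristic, whereas Smith-Waterman is the exhaustive dynamic-programming method for the same kind of problem. The match, mismatch and gap values are arbitrary assumptions chosen to keep the example easy to follow.

```python
def local_alignment_score(s, t, match=2, mismatch=-1, gap=-2):
    """Best local alignment score between s and t (Smith-Waterman).

    Uses a simple match/mismatch scheme and a linear gap penalty.
    Cell values never drop below zero, so an alignment can start and
    end anywhere in either sequence.
    """
    rows, cols = len(s) + 1, len(t) + 1
    table = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            step = match if s[i - 1] == t[j - 1] else mismatch
            table[i][j] = max(
                0,                            # start a new local alignment
                table[i - 1][j - 1] + step,   # align s[i-1] with t[j-1]
                table[i - 1][j] + gap,        # gap in t
                table[i][j - 1] + gap,        # gap in s
            )
            best = max(best, table[i][j])
    return best


# The shared region TASSWSYIVE is found despite unrelated flanking residues.
score = local_alignment_score("GGGGTASSWSYIVE", "TASSWSYIVECCCC")
print(score)  # 20
```

Only the ten shared residues contribute to the score (10 matches at 2 each); the flanking G and C runs are simply left out of the alignment, which is what distinguishes a local from a global alignment.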

Page 20: Introduction to Bioinformatics

Want better alignments …

Multiple alignment

Multiple sequences

Better signal to noise

More sequences = better alignment

More accurate reflection of evolution

ClustalW

Commonly used

Easy to use

Page 21: Introduction to Bioinformatics

Visualize the Multiple Alignment

Page 22: Introduction to Bioinformatics

Use the Alignment to Calculate Evolutionary Distances

See ‘how close’ sequences are to each other

Best way to tell what is ‘most similar’

Can calculate a simple tree with ClustalW

Taubenberger et al., Nature: 437, 889-893, 2005
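A rough sketch of going from an alignment to distances to a tree. It assumes four invented, already-aligned sequences (A to D), uses the p-distance (fraction of differing columns) as the evolutionary distance, and builds the tree with UPGMA, a simple average-linkage clustering. UPGMA is used here as an easy-to-read stand-in; it is not necessarily the method ClustalW applies when it draws a tree.

```python
from itertools import combinations


def p_distance(a, b):
    """Fraction of aligned columns (ignoring gap columns) that differ."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no ungapped columns to compare")
    return sum(x != y for x, y in pairs) / len(pairs)


def upgma(aligned):
    """Build a tree as nested tuples from a dict of name -> aligned sequence."""
    sizes = {name: 1 for name in aligned}  # cluster -> number of leaves
    dist = {
        frozenset((a, b)): p_distance(aligned[a], aligned[b])
        for a, b in combinations(aligned, 2)
    }
    while len(sizes) > 1:
        # Join the two closest clusters.
        a, b = min(combinations(sizes, 2), key=lambda p: dist[frozenset(p)])
        merged = (a, b)
        # Distance to every other cluster is the size-weighted average.
        for other in sizes:
            if other not in (a, b):
                dist[frozenset((merged, other))] = (
                    sizes[a] * dist[frozenset((a, other))]
                    + sizes[b] * dist[frozenset((b, other))]
                ) / (sizes[a] + sizes[b])
        sizes[merged] = sizes.pop(a) + sizes.pop(b)
    return next(iter(sizes))


# Invented aligned sequences for illustration only.
alignment = {
    "A": "TASSWSYIVE",
    "B": "TASSWSYLVE",
    "C": "TATSFSYLVG",
    "D": "MKTSFAYLVG",
}
tree = upgma(alignment)
print(tree)  # (('A', 'B'), ('C', 'D'))
```

A and B differ at one column in ten, so they are joined first; C and D are joined next, and the two pairs are joined last.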

Page 23: Introduction to Bioinformatics

Caveats!

In reality:

Sequences (even parts of sequences) can evolve at different rates

We don't have a good understanding of the relationship between sequence and function

High sequence identity does not always mean the same function

Getting good alignments and good trees can be very hard

Page 24: Introduction to Bioinformatics

Bioinformatics: Sequence Analysis

1. Start with unknown sequence

2. Find similar sequences

3. Create alignment

4. Create phylogenetic tree

5. Use information about knowns to make predictions about unknown

Page 25: Introduction to Bioinformatics

Mini Virus Intro

Often considered 'not alive'

Extremely small (much smaller than a cell)

Cellular parasites

Have a genome but can only reproduce inside a host cell

Page 26: Introduction to Bioinformatics

Different Viruses

RNA & DNA viruses

Both single- and double-stranded

Page 27: Introduction to Bioinformatics

Different Viruses

RNA & DNA viruses

Both single- and double-stranded

Influenza Virus

Page 28: Introduction to Bioinformatics

Influenza Virus (flu)

Small genome: 8 RNA molecules

Evolves quickly: antigenic drift, antigenic shift

Page 29: Introduction to Bioinformatics

Influenza Virus (flu)

Sequencing

Viral RNA → Reverse Transcriptase → DNA → Sequencing → Genomic Nucleotide Sequence
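The reverse-transcription step can be mimicked on paper: the enzyme copies an RNA template into a complementary DNA (cDNA) strand. A minimal sketch, assuming a made-up nine-base RNA fragment and a hypothetical helper name; it models only base pairing, not the enzymology.

```python
# Base pairing from an RNA template to the DNA strand synthesized on it.
RNA_TO_DNA_COMPLEMENT = {"A": "T", "U": "A", "G": "C", "C": "G"}


def reverse_transcribe(rna):
    """Return the complementary DNA (cDNA) strand, written 5' to 3'."""
    rna = rna.upper()
    try:
        # Reverse the template so the new strand reads 5' to 3'.
        return "".join(RNA_TO_DNA_COMPLEMENT[base] for base in reversed(rna))
    except KeyError as err:
        raise ValueError(f"unexpected RNA base: {err.args[0]}") from None


cdna = reverse_transcribe("AUGGCUUAA")  # invented RNA fragment
print(cdna)  # TTAAGCCAT
```

The resulting DNA string is what a sequencer would read, and it is the form in which nucleotide sequences are stored in the databases searched earlier.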

Page 30: Introduction to Bioinformatics

Influenza Pandemics

1918 Flu

Killed an estimated 50-100 million people worldwide

Considered to be one of the deadliest pandemics

Killed many of the young and healthy

Influenza A, Type H1N1

Thought to have derived from Avian Influenza

Recently reconstituted from recovered human samples

Considerable ethical debate

Page 31: Introduction to Bioinformatics

Avian Influenza

Current fear of pandemic

High mortality rate (including young and healthy)

Current concern is Influenza A, Type H5N1

Still only transmitted by contact with birds

Is now in Asia and Eastern and Western Europe

Page 32: Introduction to Bioinformatics

This lab: Nucleotide & Protein Informatics

Sequence analysis

Finding similar sequences

Multiple sequence alignment

Phylogenetic analysis

Page 33: Introduction to Bioinformatics