bioinformatics i sequence analysis - purdue university · consider how humans and mouse have...

35
Bioinformatics I Sequence Analysis

Upload: others

Post on 08-Mar-2020

5 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Bioinformatics I Sequence Analysis - Purdue University · Consider how humans and mouse have diverged... •Mouse and human had a common ancestor about 80 MYA. •Evolution occurs

Bioinformatics I Sequence Analysis

Page 2: Bioinformatics I Sequence Analysis - Purdue University · Consider how humans and mouse have diverged... •Mouse and human had a common ancestor about 80 MYA. •Evolution occurs

Prerequisites•Comp Sci 1 and 2. Can you program?

•Molecular biology. Do you know the central dogma?

•Molecular biochemistry. What molecular functions can you name?

Page 3: Bioinformatics I Sequence Analysis - Purdue University · Consider how humans and mouse have diverged... •Mouse and human had a common ancestor about 80 MYA. •Evolution occurs

You need this:

Z&B

Page 4: Bioinformatics I Sequence Analysis - Purdue University · Consider how humans and mouse have diverged... •Mouse and human had a common ancestor about 80 MYA. •Evolution occurs

...and this.

4

www.geneious.com

Page 5: Bioinformatics I Sequence Analysis - Purdue University · Consider how humans and mouse have diverged... •Mouse and human had a common ancestor about 80 MYA. •Evolution occurs

Chris, don’t forget to...

• Review the syllabus• Set office hours

5

Page 6: Bioinformatics I Sequence Analysis - Purdue University · Consider how humans and mouse have diverged... •Mouse and human had a common ancestor about 80 MYA. •Evolution occurs

Consider how humans and mouse have diverged...

•Mouse and human had a common ancestor about 80 MYA.

•Evolution occurs by point mutations, insertions, deletions and rearrangements.

•Individual mouse genes and human genes are 80 to 95% identical.

•However, gene locations are scrambled! (Maybe gene location matters more than its sequence???)

actual content starts here

Page 7: Bioinformatics I Sequence Analysis - Purdue University · Consider how humans and mouse have diverged... •Mouse and human had a common ancestor about 80 MYA. •Evolution occurs

Inter-chromosomal rearrangement

Page 8: Bioinformatics I Sequence Analysis - Purdue University · Consider how humans and mouse have diverged... •Mouse and human had a common ancestor about 80 MYA. •Evolution occurs

Mouse chromosomes re-constructed from human

8

Page 9: Bioinformatics I Sequence Analysis - Purdue University · Consider how humans and mouse have diverged... •Mouse and human had a common ancestor about 80 MYA. •Evolution occurs

Largescale evolutionary changes in chromosomes

• Duplication

9

“...the most important evolutionary force since the emergence of the universal common ancestor.” -- Susumu Ohno

Page 10: Bioinformatics I Sequence Analysis - Purdue University · Consider how humans and mouse have diverged... •Mouse and human had a common ancestor about 80 MYA. •Evolution occurs

Inversions•A,B,C,D are genes or syntenic groups.

A BC

D

CB

DA

5'

5'3'

3'

Messages in C,B region are now read from the opposite (-) strand.

Page 11: Bioinformatics I Sequence Analysis - Purdue University · Consider how humans and mouse have diverged... •Mouse and human had a common ancestor about 80 MYA. •Evolution occurs

Transposition

A syntenic group appears on the opposite strand, different location.

Page 12: Bioinformatics I Sequence Analysis - Purdue University · Consider how humans and mouse have diverged... •Mouse and human had a common ancestor about 80 MYA. •Evolution occurs

Syntenic group in bacteria: Trp operon

Page 13: Bioinformatics I Sequence Analysis - Purdue University · Consider how humans and mouse have diverged... •Mouse and human had a common ancestor about 80 MYA. •Evolution occurs

One way inversion can happen in Eukaryotes

Remember Meiosis?

Page 14: Bioinformatics I Sequence Analysis - Purdue University · Consider how humans and mouse have diverged... •Mouse and human had a common ancestor about 80 MYA. •Evolution occurs

Normal synapsis does not change the order of genes, just swaps alleles.

3'A B

CD

b c

ad

5'

5' 3'

5'b

cDA

Normal prophase 1

3'

5'B

Cda

3'

non-sister cromatids synapse. Site of homologous recombination

Page 15: Bioinformatics I Sequence Analysis - Purdue University · Consider how humans and mouse have diverged... •Mouse and human had a common ancestor about 80 MYA. •Evolution occurs

Illegitimate synapsis changes the order and direction of genes, and swaps alleles.

3'A B

CD

cb

da5'

5'3'

5'c

bDA

Abnormal prophase 1.

3'

5'C

Bda

3'

illegitimate synapse

INVERSION

Page 16: Bioinformatics I Sequence Analysis - Purdue University · Consider how humans and mouse have diverged... •Mouse and human had a common ancestor about 80 MYA. •Evolution occurs

Edit distance as a series of mutations

ACGGCGTGCTTTAGAACATAG

AAGGCGTGCTTTAGAACATAG

AAGGCGTGCGTTAGAACATAG

ACGGCGTGCGTAAGGACAATAG

• • •

time

dista

nce

=4

homoplasy

Page 17: Bioinformatics I Sequence Analysis - Purdue University · Consider how humans and mouse have diverged... •Mouse and human had a common ancestor about 80 MYA. •Evolution occurs

Edit distance (genomic) as a series of reversals

123456789

432156789

436512789

435612789

435698721

123498765

123789465

123789645

873219645

ancestral mammal

humanmouse

Page 18: Bioinformatics I Sequence Analysis - Purdue University · Consider how humans and mouse have diverged... •Mouse and human had a common ancestor about 80 MYA. •Evolution occurs

rug rat mouse rat

Page 19: Bioinformatics I Sequence Analysis - Purdue University · Consider how humans and mouse have diverged... •Mouse and human had a common ancestor about 80 MYA. •Evolution occurs

The Pancake Flipping ProblemA sloppy cook at a pancake diner makes pancakes of all different sizes and stacks them haphazardly.

The waiter likes the pancakes to be stacked with the largest on the bottom and the smallest on top. On the way to the table, using only one hand with a spatula, he flips the pancakes until they are arranged by size, largest on bottom, smallest on top.

•What is the algorithm for flipping?

•What is the algorithm for finding the fewest flips?

•Same problem, pancakes burned on one side.

Page 20: Bioinformatics I Sequence Analysis - Purdue University · Consider how humans and mouse have diverged... •Mouse and human had a common ancestor about 80 MYA. •Evolution occurs

In class exercise:

Given the arrangement below, flip the pancakes until they are in order. How many flips? (You can order the numbers instead of the pancakes.)

642315 123456

Page 21: Bioinformatics I Sequence Analysis - Purdue University · Consider how humans and mouse have diverged... •Mouse and human had a common ancestor about 80 MYA. •Evolution occurs

In class exercise: work in pairs

•Write detailed instructions on how to stack six pancakes in order by flipping. The instructions should not depend on the starting order.

•Give your instructions to your partner. Follow your partner's written instructions to stack the following six "pancakes" in order: 125436 ---> ... ---> ... ---> 123456

•On the board: Convert these instructions to pseudocode.

Page 22: Bioinformatics I Sequence Analysis - Purdue University · Consider how humans and mouse have diverged... •Mouse and human had a common ancestor about 80 MYA. •Evolution occurs

Minimum inversions algorithm using maximal cycle decomposition

22Vineet Bafna and Pavel A. Pevzner. Molecular Biology and Evolution 12 (2): 239. (1995)

01234567 ==> 03152647

Page 23: Bioinformatics I Sequence Analysis - Purdue University · Consider how humans and mouse have diverged... •Mouse and human had a common ancestor about 80 MYA. •Evolution occurs

Reversals put genes on the opposite strand.123456789

-4-3-2-156789

-4-3-6-512789

-4-35612789

-4-356-9-8-7-2-1

1234-9-8-7-6-5

123789-4-6-5

12378964-5

-8-7-3-2-1964-5

ancestral mammal

humanmouse

“-” indicates that the gene is on the reverse complement strand.

“pancakes burned on one side” problem

Page 24: Bioinformatics I Sequence Analysis - Purdue University · Consider how humans and mouse have diverged... •Mouse and human had a common ancestor about 80 MYA. •Evolution occurs

Plotting rearranged genomes

24

1 2 3 4 5 6 7 8 9-4 -3 5 6 -9 -8 -7 -2-1

1 2 3 4 5 6 7 8 9 -4 -3 5 6 -9 -8 -7 -2-1

Page 25: Bioinformatics I Sequence Analysis - Purdue University · Consider how humans and mouse have diverged... •Mouse and human had a common ancestor about 80 MYA. •Evolution occurs

Alignment matrix

25

A C T G A A C C T1

11

11

11

11

0 0 0 0 0 0 0 00 0 0 0 0 0 0

0 0 0 0 0 00

000 0 0 0 000 0

0 0 0 0 000 00 0 0 0 000 00 0 0 0 000 00 0 0 0 000 00 0 0 0 000 0

A C

T G A

A C

C T

1=aligned (associated)0=not aligned

•Boolean matrix.•Only one “1” per row.•Only one “1” per column.•If A(i,j)==1, then A(m,n)=0 for all (m<i && n>j)•If A(i,j)==1, then A(m,n)=0 for all (m>i && n<j)

Page 26: Bioinformatics I Sequence Analysis - Purdue University · Consider how humans and mouse have diverged... •Mouse and human had a common ancestor about 80 MYA. •Evolution occurs

Alignment matrix

26

A C T G A A C C T

A C

T G A

A C

C T

easier to draw this way

Page 27: Bioinformatics I Sequence Analysis - Purdue University · Consider how humans and mouse have diverged... •Mouse and human had a common ancestor about 80 MYA. •Evolution occurs

Alignment matrix

27

A C T G A A C C T

A C

T G A

A C

C T

easier still

Page 28: Bioinformatics I Sequence Analysis - Purdue University · Consider how humans and mouse have diverged... •Mouse and human had a common ancestor about 80 MYA. •Evolution occurs

Alignment matrix

28

reverse alignment

A C T G A A C C T

TC

CA

AG

TC

A(NOT biologically relevant!)

Page 29: Bioinformatics I Sequence Analysis - Purdue University · Consider how humans and mouse have diverged... •Mouse and human had a common ancestor about 80 MYA. •Evolution occurs

Alignment matrix

29

reverse complement alignment

(yes, biologically relevant, for nucleotides!)

A C T G A A C C T

AG

GT

TC

AG

T

Page 30: Bioinformatics I Sequence Analysis - Purdue University · Consider how humans and mouse have diverged... •Mouse and human had a common ancestor about 80 MYA. •Evolution occurs

Plotting rearranged genomes

30

Z&B p. 158

1 2 3 4 5 6 7 8 9-4 -3 5 6 -9 -8 -7 -2-1

1 2 3 4 5 6 7 8 9 -4 -3 5 6 -9 -8 -7 -2-1

Each number represents a long sequence

Each negative number represents its reverse complement

Page 31: Bioinformatics I Sequence Analysis - Purdue University · Consider how humans and mouse have diverged... •Mouse and human had a common ancestor about 80 MYA. •Evolution occurs

Dot PlotEach position in the matrix D[i,j] is either

dot, if A[i] == B[j]

blank, otherwise.

AAGACGTTTA GACGTACT

Page 32: Bioinformatics I Sequence Analysis - Purdue University · Consider how humans and mouse have diverged... •Mouse and human had a common ancestor about 80 MYA. •Evolution occurs

A more advanced dot plotWith thousands of bases, it is impossible to plot all dots in the matrix. Instead we look for stretches of sequence with few mismatches. If the number of mismatches is less than the cutoff, plot a dot or line.

AAGACGTTTA GACGTACT

Show all diagonals with at least 4 out

of 5 matches.

"window size" is the length of a diagonal, "stringency" is minimum number of matches in the window.

Page 33: Bioinformatics I Sequence Analysis - Purdue University · Consider how humans and mouse have diverged... •Mouse and human had a common ancestor about 80 MYA. •Evolution occurs

Dot matrix with reverse complement

AAGACGTTTA GACGTACT

Base matches its complementReverse diagonal means inverse alignment

Page 34: Bioinformatics I Sequence Analysis - Purdue University · Consider how humans and mouse have diverged... •Mouse and human had a common ancestor about 80 MYA. •Evolution occurs

Install Geneious• www.geneious.com• Download “Free Trial”• Install (follow instructions of installer program)• Start Geneious Pro

– From top menu: Pro->Enter a license key...– Licensee’s name= your name– paste license key

34

Page 35: Bioinformatics I Sequence Analysis - Purdue University · Consider how humans and mouse have diverged... •Mouse and human had a common ancestor about 80 MYA. •Evolution occurs

Explore Geneious• Sample Documents/Nucleotide Documents/DCN

gene• DotPlot (self), high sensitivity. Find a

WindowSize=50 with at least Threshold=80• Select. Got to Sequence View.• Extract.• Select extracted sequences. Align. (use defaults)• What is the Pairwise % Identity?

35