bbsi research simulation news

Post on 15-Jan-2016


BBSI Research Simulation

News

• Project proposals

- Monday, June 16

- Format (see News, Presentations and other dates)

• Renaissance fair and other events

• Party at Greg’s house

BBSI Research Simulation

PSSMs and Search for Repeats in DNA

Application of PSSMs

• Regulatory proteins and their binding sites

• Palindromic DNA and its significance

• How to find protein binding sites: Meme

• PSSMs to find beginning of genes

• Repeated sequences and location of protein binding sites Li et al (2002)

Regulatory Proteins and their Binding Sites

5’-GTGAGTTAGCTCACNNNNNNNNNNTANNNTNNNNNNNNNNNNNNNNNNNNNNNNNNNNATGNNNNNNNNNNNNNNNN
3’-CACTCAATCGAGTGNNNNNNNNNNATNNNANNNNNNNNNNNNNNNNNNNNNNNNNNNNTACNNNNNNNNNNNNNNNN

[Figure labels: GTA ..(8).. TAC; lacZ; Crp; RNA Polymerase; Operator]

Presence of CRP sites → Regulation by carbon source

Presence of X sites → Regulation by Y


Regulatory Proteins and their Binding Sites

Palindromic sequences

5’-TTAATGTGAGTTAGCTCACTCATT-3’
3’-AATTACACTCAATCGAGTGAGTAA-5’

[Figure label: recognizes GTGAGTT]


Palindromes: serve as binding sites for dimeric proteins
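A minimal Python sketch of what "palindromic" means for double-stranded DNA: a site is a palindrome when it reads the same as its own reverse complement. The helper names below are illustrative assumptions, not from the slides; the example site is the 14-bp stretch GTGAGTTAGCTCAC taken from the top strand shown above.

```python
# Illustrative helpers; the function names are assumptions, not from the slides.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def palindrome_mismatches(seq: str) -> int:
    """Count positions where seq differs from its own reverse complement.

    0 means a perfect DNA palindrome; small values mean an imperfect one.
    """
    return sum(a != b for a, b in zip(seq, reverse_complement(seq)))


if __name__ == "__main__":
    site = "GTGAGTTAGCTCAC"  # stretch of the top strand shown on the slide
    # The half-site that would pair with GTGAGTT in a perfect palindrome:
    print(reverse_complement("GTGAGTT"))
    # How far the site on the slide is from a perfect palindrome:
    print(palindrome_mismatches(site))
```

Counting mismatches rather than testing strict equality is a design choice for this sketch: binding sites like the one on the slide are often only approximately palindromic.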

Regulatory Proteins and their Binding Sites

Palindromic sequences

5’-TTAATGTGAGTTAGCTCACTCATT-3’
3’-AATTACACTCAATCGAGTGAGTAA-5’

[Figure: the palindrome extruded from the flanks TTAAT…TCATT / AATTA…AGTAA; stem sequences not recoverable from the transcript. Label: tRNA]

DNA: cruciform

RNA: stem/loop


Function of palindrome

RNA secondary structure?

Binding site for dimeric protein?


How to tell?

Compensatory mutations (a change in one arm of the palindrome matched by a complementary change in the other arm): RNA

Uncorrelated mutations: protein
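The compensatory-mutation test can be illustrated with a small Python sketch. It assumes a simple hairpin in which position i of the 5' arm pairs with the mirrored position of the 3' arm, and it counts only Watson-Crick pairs (G·U wobble pairs are ignored for simplicity). The arm sequences and the function name are hypothetical, chosen only for illustration.

```python
# Watson-Crick pairs only; ignoring G-U wobble is a simplifying assumption.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def paired_positions(arm5: str, arm3: str) -> int:
    """Count base pairs between the two arms of a hairpin stem.

    Both arms are written 5'->3', so the first base of arm5 pairs with
    the last base of arm3.
    """
    return sum((a, b) in PAIRS for a, b in zip(arm5, reversed(arm3)))


# Hypothetical stem arms, for illustration only.
wild_type = ("GUGAG", "CUCAC")
single_mutant = ("GUAAG", "CUCAC")  # G->A in the 5' arm breaks one pair
double_mutant = ("GUAAG", "CUUAC")  # matching C->U in the 3' arm restores it

if __name__ == "__main__":
    for name, arms in [("wild type", wild_type),
                       ("single mutant", single_mutant),
                       ("double mutant", double_mutant)]:
        print(name, paired_positions(*arms))
```

If the paired changes in the double mutant turn up together in related sequences, the stem is being conserved, which points to RNA secondary structure rather than a protein binding site.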

Regulatory Proteins and their Binding Sites

How to find them?

Count all in certain class (Li et al, 2002)

Guess a pattern and improve (Meme, Gibbs sampler)

Human sequences 5’ to transcriptional start

snRNA U1 (pU1-6)  AGGTATATGGAGCTGTGACAGGGCAGAAGTGTGTGAAGTC
histone H1t       GCCCTACCCTATATAAGGCCCCGAGGCCGCCCGGGTGTTT
HMG-14            CGGCCGGCGGGGAGGGGGAGCCCGCGGCCGGGGACGCGGG
TP1               GCCAAGGCCTTAAATACCCAGACTCCTGCCCCCGGGCCTT
protamine P1      CCCTGGCATCTATAACAGGCCGCAGAGCTGGCCCCTGACT
nucleolin         GCAGGCTCAGTCTTTCGCCTCAGTCTCGAGCTCTCGCTGG
snRNP E           TGCCGCCGCGTGACCTTCACACTTCCGCTTCCGGTTCTTT
rp S14            GACACGGAAGTGACCCCCGTCGCTCCGCCCTCTCCCACTC
rp S17            TGGCCTAAGCTTTAACAGGCTTCGCCTGTGCTTCCTGTTT
ribosomal p. S19  ACCCTACGCCCGACTTGTGCGCCCGGGAAACCCCGTCGTT
α-tubulin bα1     GGTCTGGGCGTCCCGGCTGGGCCCCGTGTCTGTGCGCACG
β-tubulin β2      GGGAGGGTATATAAGCGTTGGCGGACGGTCGGTTGTAGCA
α-actin skel-m.   CCGCGGGCTATATAAAACCTGAGCAGAGGGACAAGCGGCC
α-cardiac actin   TCAGCGTTCTATAAAGCGGCCCTCCTGGAGCCAGCCACCC
β-actin           CGCGGCGGCGCCCTATAAAACCCAGCGGCGCGACGCGCCA

Regulatory Proteins and their Binding Sites

How Meme finds them

How do pattern finders work?

Human sequences 5’ to transcriptional start

snRNA U1 (pU1-6)  AGGTATATGGAGCTGTGACAGGGCAGAAGTGTGTGAAGTC
histone H1t       GCCCTACCCTATATAAGGCCCCGAGGCCGCCCGGGTGTTT
HMG-14            CGGCCGGCGGGGAGGGGGAGCCCGCGGCCGGGGACGCGGG
TP1               GCCAAGGCCTTAAATACCCAGACTCCTGCCCCCGGGCCTT
protamine P1      CCCTGGCATCTATAACAGGCCGCAGAGCTGGCCCCTGACT

Step 1. Arbitrarily choose candidate pattern from a sequence

Step 2. Find best matches to pattern in all sequences

Step 3. Construct position-dependent frequency table based on matches

Step 4. Calculate relative probability of matches from frequency table

Step 5. Move around to find local maximum

Step 6. If probability score high, remember pattern and score

Step 7. Repeat Steps 1 - 5

Best matches (one per sequence):

ACAGGGCAGAA
CCCGGGTGTTT
CCGGGGACGCG
CCCCCGGGCCT
CCGCAGAGCTG

Position-dependent frequency table:

     1      2      3      4      5      6      7      8      9      10     11     12
A  0.208  0.292  0.000  0.999  0.000  0.999  0.811  0.905  0.575  0.321  0.151  0.264
T  0.160  0.217  0.867  0.000  0.999  0.000  0.189  0.000  0.208  0.057  0.104  0.113
C  0.283  0.236  0.132  0.000  0.000  0.000  0.000  0.000  0.000  0.151  0.330  0.283
G  0.349  0.255  0.000  0.000  0.000  0.000  0.000  0.095  0.217  0.472  0.415  0.340
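Steps 1 to 4 above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not Meme's actual implementation: "best match" is taken to be the window with the most identities to the pattern, the background is uniform (0.25 per base), and no pseudocounts are used. All function names are hypothetical. Because of these simplifications the windows it picks need not coincide with the matches shown on the slide.

```python
from collections import Counter

BASES = "ACGT"

# The five upstream sequences listed on the slide.
SEQUENCES = [
    "AGGTATATGGAGCTGTGACAGGGCAGAAGTGTGTGAAGTC",  # snRNA U1 (pU1-6)
    "GCCCTACCCTATATAAGGCCCCGAGGCCGCCCGGGTGTTT",  # histone H1t
    "CGGCCGGCGGGGAGGGGGAGCCCGCGGCCGGGGACGCGGG",  # HMG-14
    "GCCAAGGCCTTAAATACCCAGACTCCTGCCCCCGGGCCTT",  # TP1
    "CCCTGGCATCTATAACAGGCCGCAGAGCTGGCCCCTGACT",  # protamine P1
]


def identities(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings agree."""
    return sum(x == y for x, y in zip(a, b))


def best_match(pattern: str, seq: str) -> str:
    """Step 2: the window of seq with the most identities to pattern."""
    w = len(pattern)
    windows = [seq[i:i + w] for i in range(len(seq) - w + 1)]
    return max(windows, key=lambda win: identities(win, pattern))


def frequency_table(matches: list) -> list:
    """Step 3: per-position base frequencies over the matched windows."""
    n = len(matches)
    return [{b: Counter(col)[b] / n for b in BASES} for col in zip(*matches)]


def relative_probability(match: str, table: list, background: float = 0.25) -> float:
    """Step 4: likelihood of match under the table relative to background."""
    ratio = 1.0
    for base, column in zip(match, table):
        ratio *= column[base] / background
    return ratio


seed = "ACAGGGCAGAA"                                        # Step 1 (from snRNA U1)
matches = [best_match(seed, s) for s in SEQUENCES]          # Step 2
table = frequency_table(matches)                            # Step 3
scores = [relative_probability(m, table) for m in matches]  # Step 4

if __name__ == "__main__":
    for m, s in zip(matches, scores):
        print(m, round(s, 2))
```

Steps 5 to 7 would wrap this in a loop: shift or re-derive the pattern, keep it if the scores improve, then restart from a new seed.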

Regulatory Proteins and their Binding Sites

How Meme finds them

• You’ve found a gene related to Purple Tongue Syndrome

• BlastP: Encoded protein related to cAMP-binding proteins

• Are the similarities trivial? Related to cAMP binding?

• Does your protein contain a cAMP-binding site?

• What IS a cAMP-binding site?

Task

1. Determine what a cAMP-binding site is

2. Determine if your protein has one

Strategy

1. Collect sequences of known cAMP-binding proteins

2. Run Meme, a pattern-finding program. Ask it to find any significant motifs

3. Rerun Meme. Demand that every protein has identified motifs

4. Run Pfam over known sequence to check

Do it

PSSMs in action

Identification of beginning of gene

Experimentally proven start sites

aceB  ACTATGGAGCATCTGCACATGAAAACC
atpI  ACCTCGAAGGGAGCAGGAGTGAAAAAC
bioB  ACGTTTTGGAGAAGCCCCATGGCTCAC
glnA  ATCCAGGAGAGTTAAAGTATGTCCGCT
glnH  TAGAAAAAAGGAAATGCTATGAAGTCT
lacZ  TTCACACAGGAAACAGCTATGACCATG
rpsJ  AATTGGAGCTCTGGTCTCATGCAGAAC
serC  GCAACGTGGTGAGGGGAAATGGCTCAA
sucA  GATGCTTAAGGGATCACGATGCAGAAC
trpE  CAAAATTAGAGAATAACAATGCAAACA

[Slide label: unknown]

Gapped alignment (dots are gaps; sequences are right-aligned so that the start codons line up):

aceB  ACCACATAACTATGGAGCATCT.GCACATGAAAACC
atpI      ACCTCGAAGGGAGCAG.....GAGTGAAAAAC
bioB        ACGTTTTGGAGAAGC...CCCATGGCTCAC
glnA          ATCCAGGAGAGTTA.AAGTATGTCCGCT
glnH      TAGAAAAAAGGAAATG.....CTATGAAGTCT
lacZ       TTCACACAGGAAACAG....CTATGACCATG
rpsJ           AATTGGAGCTCTGGTCTCATGCAGAAC
serC        GCAACGTGGTGAGGG...GAAATGGCTCAA
sucA       GATGCTTAAGGGATCA....CGATGCAGAAC
trpE        CAAAATTAGAGAATA...ACAATGCAAACA

[Figure: rows labeled A, C, G, T; values not recoverable from the transcript]
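To make the idea concrete, here is a hedged Python sketch that builds a log-odds PSSM from the ten ungapped start-site sequences above and uses it to score candidate windows. The pseudocount of 1, the uniform background of 0.25, and all function names are assumptions made for this illustration; the slides do not specify how the matrix was computed.

```python
import math

BASES = "ACGT"

# Ungapped 27-nt windows around experimentally proven start sites (from the slide).
START_SITES = {
    "aceB": "ACTATGGAGCATCTGCACATGAAAACC",
    "atpI": "ACCTCGAAGGGAGCAGGAGTGAAAAAC",
    "bioB": "ACGTTTTGGAGAAGCCCCATGGCTCAC",
    "glnA": "ATCCAGGAGAGTTAAAGTATGTCCGCT",
    "glnH": "TAGAAAAAAGGAAATGCTATGAAGTCT",
    "lacZ": "TTCACACAGGAAACAGCTATGACCATG",
    "rpsJ": "AATTGGAGCTCTGGTCTCATGCAGAAC",
    "serC": "GCAACGTGGTGAGGGGAAATGGCTCAA",
    "sucA": "GATGCTTAAGGGATCACGATGCAGAAC",
    "trpE": "CAAAATTAGAGAATAACAATGCAAACA",
}


def build_pssm(seqs, pseudocount=1.0, background=0.25):
    """Return a list of {base: log2-odds} dicts, one per alignment column."""
    total = len(seqs) + pseudocount * len(BASES)
    pssm = []
    for column in zip(*seqs):
        pssm.append({
            b: math.log2(((column.count(b) + pseudocount) / total) / background)
            for b in BASES
        })
    return pssm


def score(window, pssm):
    """Sum of per-position log-odds for a window as long as the PSSM."""
    return sum(col[base] for base, col in zip(window, pssm))


def best_window(seq, pssm):
    """Start index of the highest-scoring window in a longer sequence."""
    w = len(pssm)
    return max(range(len(seq) - w + 1), key=lambda i: score(seq[i:i + w], pssm))


pssm = build_pssm(list(START_SITES.values()))

if __name__ == "__main__":
    lacz = START_SITES["lacZ"]
    no_start = lacz[:18] + "CCC" + lacz[21:]  # start codon replaced by CCC
    print(round(score(lacz, pssm), 2), round(score(no_start, pssm), 2))
```

A gene with an unknown start could then be scanned with `best_window`, treating the top-scoring position as the candidate start; a gapped alignment like the one on the slide would need a scoring scheme that allows variable spacing, which this sketch does not attempt.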

PSSMs in action

Algorithm to find binding sites (Li et al)

Li et al (2002)

Algorithm

Calculation of probability by Poisson equation

Dimer occurred n times. How likely is that?

Frequency of GTGAGTT = f1

Frequency of AACTCAC = f2

How likely is it to find: GTGAGTTAACTCAC

Frequency of joint occurrence = f1 · f2 = f12

Probability of n occurrences of dimer (n factors of f12, N − n factors of 1 − f12, times the number of arrangements):

\[
P(n) = {}_{N}C_{n}\; f_{12}^{\,n}\,(1 - f_{12})^{N-n},
\qquad
{}_{N}C_{n} = \frac{N!}{n!\,(N-n)!}
\]

Expected number = m = f12 · N, so f12 = m / N:

\[
P(n) = \frac{N!}{n!\,(N-n)!}\left(\frac{m}{N}\right)^{n}\left(1 - \frac{m}{N}\right)^{N-n}
     = \frac{N!}{n!\,(N-n)!}\cdot\frac{m^{n}\,(1 - m/N)^{N}}{N^{n}\,(1 - m/N)^{n}}
\]

For large N, (1 − m/N)^N ≈ e^(−m) and (1 − m/N)^n ≈ 1:

\[
P(n) \approx \frac{N!}{n!\,(N-n)!}\cdot\frac{m^{n}\,e^{-m}}{N^{n}}
     = \frac{N\,(N-1)\,(N-2)\cdots(N-n+1)}{N^{n}}\cdot\frac{m^{n}\,e^{-m}}{n!}
\]

For N ≫ n the first factor is ≈ 1, which leaves the Poisson formula:

\[
P(n) \approx \frac{m^{n}\,e^{-m}}{n!}
\]
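The approximation can be checked numerically. The Python sketch below compares the exact binomial probability with the Poisson formula and adds a tail probability, which is one way to ask "how likely is n or more occurrences by chance?". The word frequencies and the value of N are hypothetical numbers chosen only for illustration, and the function names are assumptions, not taken from Li et al.

```python
import math


def binomial_pmf(n: int, N: int, f: float) -> float:
    """Exact probability of n occurrences in N trials with per-trial frequency f."""
    return math.comb(N, n) * f**n * (1 - f) ** (N - n)


def poisson_pmf(n: int, m: float) -> float:
    """Poisson probability of n occurrences when m are expected."""
    return m**n * math.exp(-m) / math.factorial(n)


def poisson_tail(n: int, m: float) -> float:
    """Probability of n or more occurrences when m are expected."""
    return 1.0 - sum(poisson_pmf(k, m) for k in range(n))


if __name__ == "__main__":
    # Hypothetical inputs, for illustration only.
    f1 = 1e-4          # assumed frequency of GTGAGTT
    f2 = 1e-4          # assumed frequency of AACTCAC
    N = 4_600_000      # assumed number of positions examined
    f12 = f1 * f2      # frequency of joint occurrence
    m = f12 * N        # expected number of dimer occurrences

    n = 3
    print("expected m      :", m)
    print("binomial P(n=3) :", binomial_pmf(n, N, f12))
    print("Poisson  P(n=3) :", poisson_pmf(n, m))
    print("Poisson  P(n>=3):", poisson_tail(n, m))
```

When m is small, even a handful of observed occurrences gives a very small tail probability, which is what makes over-represented dimers stand out.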
