Prosite and UCSC Genome Browser Exercise 3. Protein motifs and Prosite


Post on 21-Dec-2015


TRANSCRIPT

Page 1: Prosite and UCSC Genome Browser Exercise 3. Protein motifs and Prosite

Prosite and UCSC Genome Browser

Exercise 3

Page 2: Prosite and UCSC Genome Browser Exercise 3. Protein motifs and Prosite

Protein motifs and Prosite

Page 3: Prosite and UCSC Genome Browser Exercise 3. Protein motifs and Prosite

Turning information into knowledge

The outcome of a sequencing project is masses of raw data

The challenge is to turn this raw data into biological knowledge

A valuable tool for this challenge is an automated diagnostic pipe through which newly determined sequences can be streamlined

Page 4: Prosite and UCSC Genome Browser Exercise 3. Protein motifs and Prosite

From sequence to function

Nature tends to innovate rather than invent

Proteins are composed of functional elements: domains and motifs

Domains are structural units that carry out a certain function

The same domains are shared between different proteins

Motifs are shorter sequences with certain biological activity

Page 5: Prosite and UCSC Genome Browser Exercise 3. Protein motifs and Prosite

What is a motif?

A sequence motif = a certain sequence that is widespread and conjectured to have biological significance

Examples:

KDEL – ER-lumen retention signal

PKKKRKV – an NLS (nuclear localization signal)

Page 6: Prosite and UCSC Genome Browser Exercise 3. Protein motifs and Prosite

More loosely defined motifs

KDEL (usually) + HDEL (rarely) = [HK]-D-E-L: H or K at the first position

This is called a pattern (in biology), or a regular expression (in computer science)
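As a minimal sketch of the regular-expression view, the [HK]-D-E-L pattern can be written with Python's standard `re` module. The three test sequences are hypothetical toy strings invented for illustration, not real proteins, and the function name is an assumption.

```python
import re

# [HK]-D-E-L as a regular expression: H or K, followed by D, E and L.
ER_RETENTION = re.compile(r"[HK]DEL")

def has_er_retention_motif(seq: str) -> bool:
    """Return True if the sequence contains KDEL or HDEL anywhere."""
    return ER_RETENTION.search(seq) is not None

# Hypothetical toy sequences (not real proteins).
for seq in ("MASTVLLAKDEL", "MASTVLLAHDEL", "MASTVLLARDEL"):
    print(seq, has_er_retention_motif(seq))
```

The first two toy sequences end in KDEL and HDEL and are reported as hits; the third ends in RDEL and is not.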

Page 7: Prosite and UCSC Genome Browser Exercise 3. Protein motifs and Prosite

Syntax of a pattern

Example: W-x(9,11)-[FYV]-[FYW]-x(6,7)-[GSTNE]

Page 8: Prosite and UCSC Genome Browser Exercise 3. Protein motifs and Prosite

Patterns

W-x(9,11)-[FYV]-[FYW]-x(6,7)-[GSTNE]

x(9,11): any amino acid, between 9 and 11 times

[FYV]: F or Y or V

WOPLASDFGYVWPPPLAWS

ROPLASDFGYVWPPPLAWS

WOPLASDFGYVWPPPLSQQQ

Page 9: Prosite and UCSC Genome Browser Exercise 3. Protein motifs and Prosite

Patterns - syntax

The standard IUPAC one-letter codes.

‘x’ : any amino acid.

‘[]’ : residues allowed at the position.

‘{}’ : residues forbidden at the position.

‘()’ : repetition of a pattern element is indicated in parentheses: x(n) or x(n,m) gives the number or range of repetitions.

‘-’ : separates each pattern element.

‘<’ : indicates an N-terminal restriction of the pattern.

‘>’ : indicates a C-terminal restriction of the pattern.

‘.’ : the period ends the pattern.
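The syntax rules above map almost one-to-one onto regular-expression syntax. The following is a rough sketch of such a translation, not Prosite's own implementation: the function name `prosite_to_regex` is hypothetical, and only the elements listed on this slide are handled. Reading the example string from the "Patterns" slide as three separate sequences is also an assumption.

```python
import re

def prosite_to_regex(pattern: str) -> str:
    """Translate a Prosite-style pattern into a Python regular expression."""
    pattern = pattern.strip().rstrip(".")       # '.' ends the pattern
    parts = []
    for element in pattern.split("-"):          # '-' separates elements
        prefix = suffix = ""
        if element.startswith("<"):             # N-terminal restriction
            prefix, element = "^", element[1:]
        if element.endswith(">"):               # C-terminal restriction
            suffix, element = "$", element[:-1]
        # Split off an optional repetition such as (3) or (9,11).
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, lo, hi = m.groups()
        if core in ("x", "X"):                  # any amino acid
            core = "."
        elif core.startswith("{") and core.endswith("}"):
            core = "[^" + core[1:-1] + "]"      # forbidden residues
        # '[ABC]' and single letters are already valid regex syntax.
        repeat = ""
        if lo is not None:
            repeat = "{" + lo + ("," + hi if hi else "") + "}"
        parts.append(prefix + core + repeat + suffix)
    return "".join(parts)

regex = prosite_to_regex("W-x(9,11)-[FYV]-[FYW]-x(6,7)-[GSTNE]")
print(regex)

# The example string from the "Patterns" slide, read as three sequences.
for seq in ("WOPLASDFGYVWPPPLAWS",
            "ROPLASDFGYVWPPPLAWS",
            "WOPLASDFGYVWPPPLSQQQ"):
    print(seq, bool(re.search(regex, seq)))
```

Under this reading only the first sequence matches: the second has no W far enough from its end to start a match, and the third has Q where [GSTNE] is required.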

Page 10: Prosite and UCSC Genome Browser Exercise 3. Protein motifs and Prosite

Profile-pattern-consensus

multiple alignment

AACTTG
AAGTCG
CACTTC

consensus

AACTTG

NANTNN

pattern

[AC]-A-[GC]-T-[TC]-[GC]

profile

     1     2     3     4     5
A    0.66  1     0     0     .
T    0     0     0     1     .
C    0.33  0     0.66  0     .
G    0     0     0.33  0     .
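The three representations can all be derived from the same multiple alignment. The sketch below assumes the alignment is the three six-letter sequences AACTTG, AAGTCG and CACTTC (the slide text with its duplicated characters removed); all function names are hypothetical.

```python
from collections import Counter

alignment = ["AACTTG", "AAGTCG", "CACTTC"]

def column_counts(seqs):
    """Count the residues in each alignment column."""
    return [Counter(column) for column in zip(*seqs)]

def profile(seqs, alphabet="ATCG"):
    """Per-column frequency of each residue."""
    n = len(seqs)
    return {res: [c[res] / n for c in column_counts(seqs)] for res in alphabet}

def consensus(seqs):
    """Most frequent residue in each column."""
    return "".join(c.most_common(1)[0][0] for c in column_counts(seqs))

def strict_consensus(seqs, wildcard="N"):
    """Keep a residue only where the column is fully conserved."""
    return "".join(next(iter(c)) if len(c) == 1 else wildcard
                   for c in column_counts(seqs))

def pattern(seqs):
    """List every residue observed in each column."""
    elements = []
    for c in column_counts(seqs):
        residues = "".join(sorted(c))
        elements.append(residues if len(c) == 1 else "[" + residues + "]")
    return "-".join(elements)

print(consensus(alignment))
print(strict_consensus(alignment))
print(pattern(alignment))
for residue, freqs in profile(alignment).items():
    print(residue, [round(f, 2) for f in freqs])
```

Residues inside brackets are sorted alphabetically here, so the third element prints as [CG] rather than [GC]; the order inside brackets carries no meaning. The frequencies print as rounded values (0.67), whereas the slide shows them truncated (0.66).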

Page 11: Prosite and UCSC Genome Browser Exercise 3. Protein motifs and Prosite

http://www.expasy.ch/prosite/

Page 12: Prosite and UCSC Genome Browser Exercise 3. Protein motifs and Prosite

Prosite

A method for determining the function of uncharacterized translated protein sequences

Database of annotated protein families and functional sites as well as associated patterns and profiles to identify them

Page 13: Prosite and UCSC Genome Browser Exercise 3. Protein motifs and Prosite

Prosite

Entries are represented with patterns or profiles

pattern

[AC]-A-[GC]-T-[TC]-[GC]

profile

     1     2     3     4     5
A    0.66  1     0     0     .
T    0     0     0     1     .
C    0.33  0     0.66  0     .
G    0     0     0.33  0     .

Profiles are used in Prosite when the motif is relatively divergent and is difficult to represent as a pattern

Page 14: Prosite and UCSC Genome Browser Exercise 3. Protein motifs and Prosite

Scanning Prosite

Query: sequence

Result: all patterns found in sequence

Query: pattern

Result: all sequences which adhere to this pattern
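The two query directions can be illustrated with a toy in-memory "database". This is only a sketch of the idea, not how Prosite itself is queried: the motif table, the sequence table and both function names are hypothetical, and the sequences are invented.

```python
import re

# Hypothetical mini motif collection: name -> regular expression.
MOTIFS = {
    "ER_retention": r"[HK]DEL",
    "NLS": r"PKKKRKV",
}

# Hypothetical toy sequences (not real proteins).
SEQUENCES = {
    "toy1": "MAPKKKRKVGGS",
    "toy2": "MSSLLAKDEL",
    "toy3": "MGGSAAPL",
}

def scan_sequence(seq):
    """Query: sequence. Result: (motif name, 1-based start, matched text)."""
    hits = []
    for name, regex in MOTIFS.items():
        for m in re.finditer(regex, seq):
            hits.append((name, m.start() + 1, m.group()))
    return hits

def scan_database(regex):
    """Query: pattern. Result: names of all sequences that contain it."""
    return [name for name, seq in SEQUENCES.items() if re.search(regex, seq)]

print(scan_sequence(SEQUENCES["toy1"]))
print(scan_database(r"[HK]DEL"))
```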

Page 15: Prosite and UCSC Genome Browser Exercise 3. Protein motifs and Prosite

Prosite sequence query

Page 16: Prosite and UCSC Genome Browser Exercise 3. Protein motifs and Prosite
Page 17: Prosite and UCSC Genome Browser Exercise 3. Protein motifs and Prosite

Prosite profile

Page 18: Prosite and UCSC Genome Browser Exercise 3. Protein motifs and Prosite

Prosite profile sequence logo

Page 19: Prosite and UCSC Genome Browser Exercise 3. Protein motifs and Prosite

Sequence logo

Page 20: Prosite and UCSC Genome Browser Exercise 3. Protein motifs and Prosite

WebLogo

http://weblogo.berkeley.edu/logo.cgi

Page 21: Prosite and UCSC Genome Browser Exercise 3. Protein motifs and Prosite

Searching Prosite with a sequence

Page 22: Prosite and UCSC Genome Browser Exercise 3. Protein motifs and Prosite

Patterns with a high probability of occurrence

Entries describing commonly found post-translational modifications or compositionally biased regions.

Found in the majority of known protein sequences

High probability of occurrence

Page 23: Prosite and UCSC Genome Browser Exercise 3. Protein motifs and Prosite

Searching Prosite with a pattern

Page 24: Prosite and UCSC Genome Browser Exercise 3. Protein motifs and Prosite

Prosite pattern query

Page 25: Prosite and UCSC Genome Browser Exercise 3. Protein motifs and Prosite
Page 26: Prosite and UCSC Genome Browser Exercise 3. Protein motifs and Prosite
Page 27: Prosite and UCSC Genome Browser Exercise 3. Protein motifs and Prosite

Searching Prosite with a Prosite AC

Page 28: Prosite and UCSC Genome Browser Exercise 3. Protein motifs and Prosite

UCSC Genome Browser

Page 29: Prosite and UCSC Genome Browser Exercise 3. Protein motifs and Prosite

UCSC Genome Browser

Page 30: Prosite and UCSC Genome Browser Exercise 3. Protein motifs and Prosite

UCSC Genome Browser

Page 31: Prosite and UCSC Genome Browser Exercise 3. Protein motifs and Prosite

UCSC Genome Browser - Gateway

Reset all settings of previous user

Page 32: Prosite and UCSC Genome Browser Exercise 3. Protein motifs and Prosite

UCSC Genome Browser - Gateway

Page 33: Prosite and UCSC Genome Browser Exercise 3. Protein motifs and Prosite

UCSC Genome Browser - Gateway

Page 34: Prosite and UCSC Genome Browser Exercise 3. Protein motifs and Prosite

UCSC Genome Browser query results

Page 35: Prosite and UCSC Genome Browser Exercise 3. Protein motifs and Prosite

UCSC Genome Browser Annotation tracks

Vertebrate conservation

mRNA (GenBank)

RefSeq

UCSC Genes

Base position

Single species compared

SNPs

Repeats

Direction of transcription (<)

CDS

Intron

UTR

Page 36: Prosite and UCSC Genome Browser Exercise 3. Protein motifs and Prosite

UCSC Gene

Page 37: Prosite and UCSC Genome Browser Exercise 3. Protein motifs and Prosite

UCSC Genome Browser - movement

Zoom x3 + Center

Page 38: Prosite and UCSC Genome Browser Exercise 3. Protein motifs and Prosite

UCSC Genome Browser – Base view

Page 39: Prosite and UCSC Genome Browser Exercise 3. Protein motifs and Prosite

Annotation track options

dense

squish

full

pack

Page 40: Prosite and UCSC Genome Browser Exercise 3. Protein motifs and Prosite

Annotation track options

Another option to toggle between ‘pack’ and ‘dense’ view is to click on the track title

Sickle-cell anemia distr.

Malaria distr.

Page 41: Prosite and UCSC Genome Browser Exercise 3. Protein motifs and Prosite

BLAT

BLAT = Blast-Like Alignment Tool

BLAT is designed to find similarity of >95% on DNA, >80% for protein

Rapid search by indexing entire genome.

Good for:

1. Finding genomic coordinates of cDNA

2. Determining exons/introns

3. Finding human (or chimp, dog, cow…) homologs of another vertebrate sequence

4. Finding upstream regulatory regions
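The "indexing entire genome" idea can be sketched with a toy k-mer index: the genome is indexed once, and each query is then looked up in the index instead of being compared against the whole genome. This is a simplified illustration and not BLAT's actual implementation; the function names, the tiny genome string and the choice k = 4 are assumptions made for the example.

```python
from collections import defaultdict

def build_index(genome: str, k: int):
    """Map every non-overlapping k-mer of the genome to its start positions."""
    index = defaultdict(list)
    for pos in range(0, len(genome) - k + 1, k):
        index[genome[pos:pos + k]].append(pos)
    return index

def find_seeds(query: str, index, k: int):
    """Look up every overlapping k-mer of the query.

    Returns (query_pos, genome_pos) pairs, 0-based. Each pair is a seed
    that a real aligner would then try to extend into a full alignment.
    """
    seeds = []
    for qpos in range(len(query) - k + 1):
        for gpos in index.get(query[qpos:qpos + k], []):
            seeds.append((qpos, gpos))
    return seeds

# Hypothetical toy genome and query.
genome = "ACGTACGGTTCAGGCATTAC"
query = "GTTCAGG"
index = build_index(genome, k=4)
seeds = find_seeds(query, index, k=4)
print(seeds)
for qpos, gpos in seeds:
    start = gpos - qpos          # implied start of the query in the genome
    print(start, genome[start:start + len(query)])
```

The index is built once per genome, so each new query costs only a handful of dictionary lookups, which is the intuition behind the "rapid search" claim above.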

Page 42: Prosite and UCSC Genome Browser Exercise 3. Protein motifs and Prosite

BLAT on UCSC Genome Browser

Page 43: Prosite and UCSC Genome Browser Exercise 3. Protein motifs and Prosite

BLAT on UCSC Genome Browser

Page 44: Prosite and UCSC Genome Browser Exercise 3. Protein motifs and Prosite

BLAT Results

Page 45: Prosite and UCSC Genome Browser Exercise 3. Protein motifs and Prosite

BLAT Results

Match

Non-match (mismatch/indel)

Indel boundaries

Page 46: Prosite and UCSC Genome Browser Exercise 3. Protein motifs and Prosite

BLAT Results

Page 47: Prosite and UCSC Genome Browser Exercise 3. Protein motifs and Prosite

BLAT Results on the browser

Page 48: Prosite and UCSC Genome Browser Exercise 3. Protein motifs and Prosite

Getting DNA sequence of region

Page 49: Prosite and UCSC Genome Browser Exercise 3. Protein motifs and Prosite

Getting DNA sequence of region