basic terms:

Upload: shona

Post on 19-Mar-2016

TRANSCRIPT

Page 1: Basic terms:

Basic terms:

Similarity - a measurable quantity.

Similarity - applied to proteins using the concept of conservative substitutions.

Identity - a percentage.

Homology - a specific term indicating relationship by evolution.
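As a rough illustration of how identity and similarity differ as measurable quantities, the sketch below computes both percentages for two already-aligned protein strings. The function name, the amino-acid groupings in `CONSERVATIVE_GROUPS`, and the choice to count only ungapped columns are assumptions made for this example; real programs use substitution matrices and their own conventions.

```python
# Hypothetical groupings of amino acids treated as conservative substitutions.
# These sets are an illustrative assumption, not a standard.
CONSERVATIVE_GROUPS = [
    set("ILVM"),   # aliphatic / hydrophobic
    set("FWY"),    # aromatic
    set("KRH"),    # basic
    set("DE"),     # acidic
    set("STNQ"),   # polar, uncharged
    set("AG"),     # small
]


def percent_identity_and_similarity(aligned_a, aligned_b):
    """Return (percent identity, percent similarity) over ungapped columns."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Keep only columns where neither sequence has a gap character.
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if a != "-" and b != "-"]
    if not columns:
        return 0.0, 0.0
    identical = sum(a == b for a, b in columns)
    similar = sum(
        a == b or any(a in group and b in group for group in CONSERVATIVE_GROUPS)
        for a, b in columns
    )
    total = len(columns)
    return 100.0 * identical / total, 100.0 * similar / total


identity, similarity = percent_identity_and_similarity("KLDE", "RLDQ")
print(identity, similarity)
```

Similarity is never lower than identity here, because every identical column also counts as similar.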

Page 3: Basic terms:

Basic terms:

Orthologs: homologous sequences found in two or more species that have the same function (e.g. alpha-hemoglobin).

Paralogs: homologous sequences found in the same species that arose by gene duplication (e.g. alpha and beta hemoglobin).

Page 7: Basic terms:

Pairwise comparison: Dotplot

All-against-all comparison.
• Every position is compared with every other position.
• Nucleic acids and proteins have polarity.
• Typically only one direction makes biological sense: 5’ to 3’, or amino terminus to carboxyl terminus.

Page 8: Basic terms:

Simple plot

Window: size of the sequence block used for comparison. In the previous example: window = 1.

Stringency: number of matches required to score positive. In the previous example: stringency = 1 (exact match required).

Page 9: Basic terms:

[Figure: DotPlot example. Windows of GATC are compared against GATCGTACCATGGAATCGTCCAGATCAGATC; the marked positions score + (4/4), - (0/4) and + (2/4).]

WINDOW = 4; STRINGENCY = 2
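The window and stringency rule can be sketched in a few lines of Python. This is a minimal illustration, not the program used for the slides: the function name `dotplot` and its return format (a list of window start positions) are assumptions. Both sequences are read in a single direction only.

```python
def dotplot(seq_a, seq_b, window=1, stringency=1):
    """Return (i, j) start positions where the window at seq_a[i:] and the
    window at seq_b[j:] share at least `stringency` matching positions."""
    dots = []
    for i in range(len(seq_a) - window + 1):
        block_a = seq_a[i:i + window]
        for j in range(len(seq_b) - window + 1):
            block_b = seq_b[j:j + window]
            # Count position-by-position matches inside the two windows.
            matches = sum(x == y for x, y in zip(block_a, block_b))
            if matches >= stringency:
                dots.append((i, j))
    return dots


# Window = 4, stringency = 2, using the sequences from the example slide.
hits = dotplot("GATC", "GATCGTACCATGGAATCGTCCAGATCAGATC", window=4, stringency=2)
print(hits)
```

With these settings the window at position 0 scores 4/4 and is plotted, the window at position 1 (ATCG) scores 0/4 and is not, and the window at position 4 (GTAC) scores 2/4 and is plotted. Raising the stringency to 3 would drop the 2/4 hit.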

Page 10: Basic terms:

Dot Plot

Compare two sequences in every register.

Vary the size of the window and the stringency depending upon the sequences being compared.

For nucleotide sequences, typically start with window = 21; stringency = 14.

Protein - start with a smaller window: 3, stringency 1 or 2.

Important to test different stringencies.

Page 11: Basic terms:

Intergenic comparison

Nucleotide sequence contains three domains.

50 - 350 - Strong conservation
• Indel places comparison out of register

450 - 1300 - Slightly weaker conservation

1300 - 2400 - Strong conservation

Page 13: Basic terms:

Scoring Alignments

Quality Score:

Score x for match, -y for mismatch;
• Penalty for:
  creating a gap
  extending a gap

Page 21: Basic terms:

Scoring Alignments

Quality Score:

Quality = [10(match)] + [-1(mismatch)] - [(Gap Creation Penalty)(# of Gaps) + (Gap Ext. Pen.)(Total length of Gaps)]

Scoring scheme incorporates an evolutionary model:
• Matches are conserved.
• Mismatches are divergences.
• Gaps are more likely to disrupt function, hence greater penalty than mismatch.
• Introduction of a gap (indel) penalized more than extension of a gap.
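The Quality formula can be applied directly to an alignment that is already written out with "-" for gap positions. The sketch below assumes that input format. The match and mismatch weights (10 and -1) come from the slide, but the default gap creation and gap extension penalties are placeholder values chosen for illustration, and the function name is hypothetical.

```python
def quality_score(aligned_a, aligned_b, match=10, mismatch=-1,
                  gap_creation=50, gap_extension=3):
    """Quality = match*matches + mismatch*mismatches
                 - (gap_creation * number_of_gaps
                    + gap_extension * total_gap_length)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    matches = mismatches = gaps = gap_length = 0
    previous_gap_row = None  # which sequence held the gap in the last column
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            row = "a" if a == "-" else "b"
            if row != previous_gap_row:
                gaps += 1          # a new gap is being created
            gap_length += 1        # every gap column adds to total length
            previous_gap_row = row
        else:
            previous_gap_row = None
            if a == b:
                matches += 1
            else:
                mismatches += 1
    return (match * matches + mismatch * mismatches
            - (gap_creation * gaps + gap_extension * gap_length))


# Five matches and one gap of length two, with small illustrative penalties.
print(quality_score("GATTACA", "GA--ACA", gap_creation=5, gap_extension=1))
```

Because creation is charged once per gap and extension once per gap column, one long gap costs less than several short gaps of the same total length, which matches the last point on the slide.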

Page 22: Basic terms:

Z Score (standardized score)

Z = (Score_alignment - Average Score_random) / Standard Deviation_random

Page 23: Basic terms:

Quality Score: Randomization
• Program takes a sequence and randomizes it X times (user-selected).
• Determines the average quality score and standard deviation with the randomized sequences.
• Compare randomized scores with the Quality score to help determine if the alignment is potentially significant.
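The randomization procedure and the Z score can be sketched together. To keep the example self-contained, the scorer here is a simple ungapped, position-by-position score (10 per match, -1 per mismatch), which is a stand-in assumption for a real alignment program. The names `ungapped_quality` and `z_score`, the default of 100 shuffles, and the fixed seed are also illustrative choices.

```python
import random
import statistics


def ungapped_quality(a, b, match=10, mismatch=-1):
    """Stand-in scorer: compare position by position, no gaps allowed."""
    return sum(match if x == y else mismatch for x, y in zip(a, b))


def z_score(seq_a, seq_b, n_shuffles=100, score=ungapped_quality, seed=0):
    """Z = (score of real pair - mean of randomized scores) / their std dev."""
    rng = random.Random(seed)           # seeded so the run is repeatable
    observed = score(seq_a, seq_b)
    letters = list(seq_b)
    random_scores = []
    for _ in range(n_shuffles):
        rng.shuffle(letters)            # randomize one sequence, keep composition
        random_scores.append(score(seq_a, "".join(letters)))
    mean = statistics.mean(random_scores)
    spread = statistics.pstdev(random_scores)
    if spread == 0:
        raise ValueError("randomized scores do not vary; Z is undefined")
    return (observed - mean) / spread


print(z_score("GATTACAGATTACA", "GATTACAGATTACA"))
```

A Z score near zero means the real score looks like the randomized ones; a large positive value suggests the alignment is potentially significant.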

Page 24: Basic terms:

Randomization

It has become clear that sequences appear to evolve in a “word”-like fashion.
• 26 letters of the alphabet, combined to make words.
• Words actually communicate information.

Randomization should actually occur at the level of strings of nucleotides (2-4).
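One way to randomize at the level of short strings rather than single letters is to cut the sequence into fixed-size words and shuffle the words. This is only a sketch of that idea: the function name, the non-overlapping split, and the default word size of 3 are assumptions made for the example.

```python
import random


def shuffle_words(sequence, word_size=3, rng=None):
    """Shuffle a sequence in blocks of `word_size` letters.

    The letters inside each block keep their order, so short "words"
    survive the randomization. A final shorter block is kept as-is.
    """
    if word_size < 1:
        raise ValueError("word_size must be at least 1")
    rng = rng or random.Random()
    words = [sequence[i:i + word_size]
             for i in range(0, len(sequence), word_size)]
    rng.shuffle(words)
    return "".join(words)


print(shuffle_words("GATCGTACCATG", word_size=3, rng=random.Random(1)))
```

With word_size = 1 this reduces to an ordinary letter-by-letter shuffle.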

Page 28: Basic terms:

Global Alignment

Global - Compares all possible alignments of two sequences and presents the one with the greatest number of matches and the fewest gaps.

Alignment will “run” from one end of the longest sequence to the other end.

Best for closely related sequences.

Can miss short regions of strongly conserved sequence.
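The slides do not name an algorithm, so the sketch below uses the standard Needleman-Wunsch dynamic-programming recurrence as one common way to score a global alignment. The scoring values (match +1, mismatch -1, a single linear gap penalty of -2) and the function name are assumptions chosen to keep the example short; they are not the Quality weights above, and there is no separate gap creation penalty.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Best end-to-end alignment score (Needleman-Wunsch, linear gap cost)."""
    # previous[j] holds the best score for a[:i-1] aligned against b[:j].
    previous = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        current = [i * gap]  # a[:i] aligned against an empty prefix of b
        for j in range(1, len(b) + 1):
            diagonal = previous[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = previous[j] + gap        # gap in b
            left = current[j - 1] + gap   # gap in a
            current.append(max(diagonal, up, left))
        previous = current
    return previous[-1]


print(global_alignment_score("ACGT", "ACT"))
```

Every letter of both sequences must be accounted for, so unmatched ends are paid for with gaps. That is why a short conserved region inside two otherwise unrelated sequences can be lost in the total.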

Page 31: Basic terms:

Local Alignment

Identifies segments of alignment with the highest possible score.

Aligns sequences, then extends aligned regions in both directions until the score falls to zero.

Best for comparing sequences whose relationship is unknown.
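A common way to find the highest-scoring local segment is the Smith-Waterman recurrence, used here as an illustration because the slides do not name a specific method. It differs from the extension procedure described above in how it searches, but it shares the key idea that a segment ends where its running score would drop to zero. The scoring values and function name are assumptions.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Highest score of any local segment pair (Smith-Waterman, linear gaps)."""
    best = 0
    previous = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        current = [0]
        for j in range(1, len(b) + 1):
            diagonal = previous[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Flooring at zero lets a new segment start anywhere.
            cell = max(0, diagonal, previous[j] + gap, current[j - 1] + gap)
            current.append(cell)
            best = max(best, cell)
        previous = current
    return best


# A shared GATC core inside otherwise unrelated flanks.
print(local_alignment_score("TTTTGATCTTTT", "CCCCGATCCCCC"))
```

Only the shared GATC core contributes to the score; the unrelated flanks are ignored rather than penalized, which is why local alignment suits sequences whose relationship is unknown.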

Page 32: Basic terms:

Global Alignment:

Local Alignment:

Page 33: Basic terms:

Blast 2

Basic Local Alignment Search Tool

E (expect) value: number of hits expected by random chance in a database of the same size.

Larger numerical value = lower significance

HIV sequence

Page 38: Basic terms:

Both Global and Local alignment programs will (almost) always give a match.

It is important to determine if the match is biologically relevant.

Not necessarily relevant: low-complexity regions.
• Sequence repeats (glutamine runs)
• Transmembrane regions (high in hydrophobes)

If working with coding regions, you are typically better off comparing protein sequences. Greater information content.