
M.Sc. in Molecular Medicine, Institute of Molecular Medicine, Trinity College Dublin, Ireland.

Introduction to Bioinformatics: February 2005

David Lynn (M.Sc., Ph.D.)

http://www.binf.org/course2005/

Topics for the next 3 days:

Day 1a – Nucleic Acid Sequence Analysis

Day 1b – Protein Sequence Analysis

Day 1c – Accessing Complete Genomes

Day 2a – Alignments & Homology Searching

Day 2b – Phylogenetic Trees

Day 1a

Introduction Interrogating Sequence Databases

– Translating DNA in 6 frames
– Reverse complement & other tools
– Calculating some properties of DNA/RNA sequences
– Primer design
– Gene prediction
– Alternative splicing
– Promoter characterisation
– Other resources

1) Translating DNA in 6 frames

5'3' Frame 1

atcacctggtatagtataa

I T W Y S I

5'3' Frame 2

atcacctggtatagtataa

S P G I V *

5'3' Frame 3

atcacctggtatagtataa

H L V * Y

3'5' Frame 1

ttatactataccaggtgat

L Y Y T R *

3'5' Frame 2

ttatactataccaggtgat

Y T I P G D

3'5' Frame 3

ttatactataccaggtgat

I L Y Q V
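The six translations above can be reproduced with a short script. This is a minimal sketch, assuming the standard genetic code; the function names (`reverse_complement`, `translate`, `six_frames`) are illustrative and are not taken from any tool mentioned in the course.

```python
from itertools import product

# Standard genetic code, listed in TCAG order (first base varies slowest).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(seq):
    """Return the reverse complement of a DNA (or RNA) sequence as DNA."""
    dna = seq.upper().replace("U", "T")
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq, frame=0):
    """Translate complete codons only, starting at offset `frame` (0, 1 or 2)."""
    dna = seq.upper().replace("U", "T")
    codons = (dna[i:i + 3] for i in range(frame, len(dna) - 2, 3))
    # Codons containing unexpected characters are reported as 'X'.
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def six_frames(seq):
    """Translate both strands in all three frames."""
    rc = reverse_complement(seq)
    frames = {}
    for f in range(3):
        frames[f"5'3' Frame {f + 1}"] = translate(seq, f)
        frames[f"3'5' Frame {f + 1}"] = translate(rc, f)
    return frames


if __name__ == "__main__":
    for name, protein in six_frames("atcacctggtatagtataa").items():
        print(name, " ".join(protein))
```

Stop codons are printed as `*`, matching the slide, and a trailing partial codon is ignored.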

Why?

Translating in all 6 frames is commonly done for a range of bioinformatics applications.

One place you may need to do it is to locate ORFs in an mRNA sequence, which will have 5' and 3' untranslated regions (UTRs).

Try to find the protein sequence encoded by the IL-11 mRNA (link on webpage) using the Translate Tool at ExPASy.
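The idea of locating an ORF between the UTRs can be sketched in code. The example below is only an illustration under simple assumptions: it scans the three forward frames, treats ATG as the only start codon, uses the standard genetic code, and reports the longest ORF that ends in a stop codon. The function name `longest_orf` is hypothetical, and this is not how the ExPASy Translate Tool works internally.

```python
from itertools import product

# Standard genetic code, listed in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(mrna):
    """Return the protein of the longest ATG-to-stop ORF in the 3 forward frames."""
    seq = mrna.upper().replace("U", "T")
    best = ""
    for frame in range(3):
        protein = []
        in_orf = False
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if not in_orf:
                if codon == "ATG":          # start of a candidate ORF
                    in_orf = True
                    protein = ["M"]
            elif codon in STOP_CODONS:      # ORF is complete
                if len(protein) > len(best):
                    best = "".join(protein)
                in_orf = False
            else:
                protein.append(CODON_TABLE.get(codon, "X"))
    return best


if __name__ == "__main__":
    # Toy sequence: short 5' UTR, ORF, short 3' UTR.
    print(longest_orf("ggcATGaaatgcTAAttt"))
```

ORFs that run off the end of the sequence without a stop codon are ignored in this sketch.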

2) Search launcher at Baylor College

– Readseq - converts sequences from one format to another.
– RepeatMasker - masks sequences against repeat sequences.
– Primer Selection - PCR primer selection (see primer design later).
– WebCutter - restriction maps using enzymes w/ sites >= 6 bases.
– 6 Frame Translation - translates a nucleic acid sequence in 6 frames.
– Reverse Complement - reverse complements a nucleic acid sequence.
– Reverse Sequence - reverses sequence order.
– Sequence Chopover - cuts a large protein/DNA sequence into smaller ones with certain amounts of overlap.
– HBR - finds E. coli contamination in human sequences.
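Three of the simpler utilities in this list (Reverse Sequence, Reverse Complement and Sequence Chopover) are easy to mimic. The sketch below is an illustration only; the function names and the `size`/`overlap` parameters are assumptions, not the interface of the Baylor tools.

```python
def reverse_sequence(seq):
    """Reverse the order of the residues (no complementing)."""
    return seq[::-1]


def reverse_complement(seq):
    """Reverse the sequence and complement each DNA base."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def chop(seq, size, overlap):
    """Cut a sequence into pieces of length `size` that share `overlap` residues."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be >= 0 and smaller than size")
    step = size - overlap
    # Stop once the remaining residues are already covered by the previous piece.
    return [seq[i:i + size] for i in range(0, max(len(seq) - overlap, 1), step)]


if __name__ == "__main__":
    print(reverse_sequence("ATCACC"))      # order reversed only
    print(reverse_complement("ATCACC"))    # reversed and complemented
    print(chop("ABCDEFGHIJ", size=4, overlap=2))
```

Note the difference between the first two: reversing alone does not give the opposite strand, whereas the reverse complement does.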

3) Oligo Calculator

Calculates the:
– Length
– %GC content
– Melting temperature (Tm), the midpoint of the temperature range at which the nucleic acid strands separate
– Molecular weight
– What an OD = 1 is in picoMolar of your input sequence.

Many of these parameters are useful in primer design.
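Some of these properties can be estimated by hand. The sketch below computes length and %GC, and estimates Tm with the Wallace rule (2 °C per A/T plus 4 °C per G/C), a rough rule of thumb for short oligos. The slides do not say which Tm formula the Oligo Calculator uses, so the Wallace rule is an assumption here and the result may differ from the tool's output.

```python
def oligo_properties(seq):
    """Return length, %GC and a Wallace-rule Tm estimate for a DNA oligo."""
    dna = seq.upper()
    gc = dna.count("G") + dna.count("C")
    at = dna.count("A") + dna.count("T")
    length = len(dna)
    return {
        "length": length,
        "gc_percent": 100.0 * gc / length if length else 0.0,
        # Wallace rule (assumed, not taken from the Oligo Calculator):
        # Tm = 2 * (A + T) + 4 * (G + C), in degrees Celsius.
        "tm_wallace_c": 2 * at + 4 * gc,
    }


if __name__ == "__main__":
    print(oligo_properties("atcacctggtatagtataa"))
```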

Beer-Lambert Law

A = εcl

ε = molar extinction coefficient
c = molar concentration
l = light path = 1 cm

A = O.D.

If O.D. = 1 corresponds to 41 pM, then a reading of O.D. = 0.5 on the spectrometer => concentration = 20.5 pM (absorbance is proportional to concentration for a fixed ε and l).
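The same arithmetic as a small helper, using the slide's example figure of 41 pM per O.D. unit. The function and parameter names are illustrative; the conversion factor depends on the sequence and should come from the Oligo Calculator output for your own oligo.

```python
def concentration_from_od(od_reading, pm_per_od_unit):
    """Scale an O.D. reading linearly, as the Beer-Lambert law implies.

    pm_per_od_unit is the concentration (in pM) that gives O.D. = 1
    for this particular sequence with a 1 cm light path.
    """
    return od_reading * pm_per_od_unit


if __name__ == "__main__":
    # Slide example: O.D. = 1 corresponds to 41 pM, reading is 0.5.
    print(concentration_from_od(0.5, 41.0))
```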

5) Gene Prediction

Gene prediction is an area under intensive research in bioinformatics.

The GENSCAN program is one of the major programs used to predict genes in the human genome.

It should be useful for predicting genes in most vertebrate species, although caution is needed when dealing with other species, especially prokaryotes, where other programs are more suitable.

Other resources:
– The Institute for Genomic Research
– The Deambulum Nucleic Acids Sequence Analysis page at Infobiogen

6) Splice site prediction/Alternative splicing

Proper splicing requires some way to distinguish exons from introns.

This is accomplished using certain base sequences as signals.

These signals allow the spliceosome (the cellular machinery that does the splicing) to identify the 5' and 3' ends of the intron.

Eukaryotes: the base sequence of an intron begins with 5' GU and ends with 3' AG.

Each species has additional bases associated with these splice sites.

Introns also have another important sequence signal called a branch site, containing a tract of pyrimidine bases and a special adenine base, usually approximately 50 bases upstream from the 3' splice site.
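The GU...AG rule on its own can be turned into a naive scan, sketched below. This is an illustration of the rule only, not a splice site predictor: every GU/AG pair is reported, so it overpredicts heavily, and real tools also weigh the additional consensus bases and the branch site. The function name and the `min_len` cut-off are assumptions.

```python
import re


def candidate_introns(pre_mrna, min_len=20):
    """List (start, end) spans that begin with GU and end with AG.

    Coordinates are 0-based, end-exclusive, so seq[start:end] is the span.
    """
    seq = pre_mrna.upper().replace("T", "U")   # accept DNA or RNA input
    # Lookaheads find overlapping matches as well.
    donors = [m.start() for m in re.finditer(r"(?=GU)", seq)]
    acceptors = [m.start() + 2 for m in re.finditer(r"(?=AG)", seq)]
    return [(d, a) for d in donors for a in acceptors if a - d >= min_len]


if __name__ == "__main__":
    toy = "AAGUAAACCCAGCC"
    for start, end in candidate_introns(toy, min_len=4):
        print(start, end, toy[start:end])
```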

Consensus splice site sequences

Alternative splicing

The classical view in molecular biology was that 1 gene = 1 protein.

With alternative splicing, multiple possible mRNA transcripts can be produced from 1 gene and, if translated, these transcripts can code for very different proteins.

4 basic methods of alternative splicing:

1) Splice/Don't Splice
2) Competing 5' or 3' splice sites
3) Exon Skipping
4) Mutually Exclusive Exons
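As a toy illustration of one of these methods, exon skipping, the sketch below enumerates the transcripts obtained when some exons are optional. The exon sequences, the function name and the `skippable` parameter are all hypothetical; the other three methods would need a richer model of splice site choice.

```python
from itertools import product


def exon_skipping_isoforms(exons, skippable):
    """Enumerate spliced transcripts when the exons at the given indices may be skipped."""
    # Each exon is either always kept, or kept/skipped if it is a skippable exon.
    choices = [(True, False) if i in skippable else (True,)
               for i in range(len(exons))]
    return ["".join(exon for exon, keep in zip(exons, mask) if keep)
            for mask in product(*choices)]


if __name__ == "__main__":
    # Three toy exons; the middle one (index 1) can be skipped.
    print(exon_skipping_isoforms(["AAA", "CCC", "GGG"], skippable={1}))
```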

The Human Alternative Splicing Database at UCLA

The project used ESTs to locate alternative splices.

It has resulted in the publication of over six thousand alternatively spliced isoforms of human genes.

Search the database using any of the following identifiers:
– Gene Symbol
– UniGene Sequence Identifier
– UniGene Cluster Identifier
– Gene Title
– GenBank Sequence Identifier

7) Promoter Analysis & Recognition

A promoter is a sequence that is used to initiate and regulate transcription of a gene.

Most protein-coding genes in higher eukaryotes have polymerase II dependent promoters.

Features of pol II promoters:
– Combination of multiple individual regulatory elements.
– Most important elements are transcription factor binding sites.
– CAAT or TATA boxes are neither necessary nor sufficient for promoter function.
– In many cases, order and distances of elements are crucial for their function.
– Sequences between elements within a promoter are usually not conserved and of no known function.

The promoter region in higher eukaryotes

PromoterInspector

PromoterInspector predicts eukaryotic pol II promoter regions with high specificity (~85%) in mammalian genomic sequences.

The sensitivity of PromoterInspector is about 50%, which means that the current version predicts about every second promoter in the genome.

PromoterInspector predicts the approximate location of a promoter region and not the exact location of the Transcription Start Site (TSS).
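To get a feel for what these two figures imply, the sketch below works through a hypothetical region containing 100 true promoters. It assumes that "specificity" here means the fraction of predictions that are correct (as the term is often used in gene and promoter prediction); the slides do not define it, so treat this interpretation, the function name and the parameter names as assumptions.

```python
def expected_predictions(true_promoters, sensitivity=0.50, fraction_correct=0.85):
    """Estimate prediction counts from sensitivity and the fraction of correct predictions."""
    true_positives = true_promoters * sensitivity        # promoters actually found
    total_predictions = true_positives / fraction_correct
    return {
        "true_positives": true_positives,
        "missed_promoters": true_promoters - true_positives,
        "false_positives": total_predictions - true_positives,
        "total_predictions": total_predictions,
    }


if __name__ == "__main__":
    for key, value in expected_predictions(100).items():
        print(f"{key}: {value:.1f}")
```

Under these assumptions, about half of the promoters are found and a minority of the reported regions are false positives.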

MatInspector professional

Individual transcription factor binding sites form the basis of the promoter.

They are relatively short stretches of DNA (10 - 20 nucleotides).

They are sufficiently conserved in sequence to allow specific recognition by the corresponding transcription factor.

MatInspector utilizes a library of matrix descriptions for transcription factor binding sites to locate matches in sequences.
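The general idea of matching a matrix description against a sequence can be sketched as a sliding-window score. This is a generic weight-matrix scan under simple assumptions, not MatInspector's actual scoring scheme; the toy TATA-like matrix `TOY_MATRIX`, the function name and the threshold are invented for illustration.

```python
# Hypothetical 4-position matrix: one dict of base weights per position.
TOY_MATRIX = [
    {"A": 0.1, "C": 0.0, "G": 0.0, "T": 0.9},
    {"A": 0.9, "C": 0.0, "G": 0.0, "T": 0.1},
    {"A": 0.1, "C": 0.0, "G": 0.0, "T": 0.9},
    {"A": 0.9, "C": 0.0, "G": 0.0, "T": 0.1},
]


def scan_matrix(seq, matrix, threshold=0.8):
    """Return (position, site, relative score) for windows scoring >= threshold.

    The relative score is the window score divided by the best possible score.
    """
    dna = seq.upper()
    width = len(matrix)
    max_score = sum(max(column.values()) for column in matrix)
    hits = []
    for i in range(len(dna) - width + 1):
        window = dna[i:i + width]
        score = sum(column.get(base, 0.0) for column, base in zip(matrix, window))
        relative = score / max_score
        if relative >= threshold:
            hits.append((i, window, round(relative, 3)))
    return hits


if __name__ == "__main__":
    print(scan_matrix("GGTATAGG", TOY_MATRIX))
```

A real library holds many such matrices, one per transcription factor binding site, and each sequence is scanned against all of them.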