Thursday, 5 June 2008
• Problems in sequence analysis
• Identification by sequence similarity
Genes Determining Plant-Cyanobacterial Symbioses and Consideration of Blast
This demonstration is best viewed as a slide show, enabling you to simulate a session and making changes in cursor position more obvious.
To do this, click Slide Show on the top tool bar, then View Show.
Click anywhere to go on to the next slide
[Figure panels: 10 mM nitrate; 0.1 mM nitrate]
Gland development is stimulated by N-limitation
What's special about the gland?
Gland suppressed by presence of fixed N
Plant starved for N makes gland to house cyanobacteria
What genes are specifically expressed in glands?
Construction of a cDNA library from Gunnera gland
mRNA ends with polyA tails
Use modified polyT to direct synthesis of DNA copy of mRNA.
Reverse Transcriptase (RT) adds CCC to end.
Add 2nd adapter, using GGG to attach to CCC. Extend cDNA.
Construction of a cDNA library from Gunnera gland
(Same protocol, but with real sequences)
5'-NNNNNNNNNN ... NNNNNNNNNNAAAAAAAAAAAAAAAAA...-3'
3'-TTTTTCTTTTTTCATGGCTGACGCTGAGACGCAACTATGGTGACGAA-5'
Use modified polyT adapter to direct synthesis of DNA copy of mRNA
The adapter can bind to many positions in polyA tail, resulting in variation in number
of T's in cDNA sequence.
Construction of a cDNA library from Gunnera gland
5'-NNNNNNNNNN ... NNNNNNNNNNAAAAAAAAAAAAAAAAA...-3'
3'-CCCNNNNNNNNNN ... NNNNNNNNNNTTTTTCTTTTTTCATGGCTGACGCTGAGACGCAACTATGGTGACGAA-5'
Reverse Transcriptase (RT) extends the adapter to the end of the mRNA and adds CCC to the 3' end.
Construction of a cDNA library from Gunnera gland
5'-NNNNNNNNNN ... NNNNNNNNNNAAAAAAAAAAAAAAAAA...-3'
3'-CCCNNNNNNNNNN ... NNNNNNNNNNTTTTTCTTTTTTCATGGCTGACGCTGAGACGCAACTATGGTGACGAA-5'
5'-AAGCAGTGGTATCAACGCAGAGTGGCCATTACGGCCGGG
A second adapter is added which (with the help of antibodies to) uses three G's to bind to the three C's.
Construction of a cDNA library from Gunnera gland
5'-NNNNNNNNNN ... NNNNNNNNNNAAAAAAAAAAAAAAAAA...-3'
3'-TTCGTCACCATAGTTGCGTCTCACCGGTAATGCCGGCCCNNNNNNNNNN ... NNNNNNNNNNTTTTTCTTTTTTCATGGCTGACGCTGAGACGCAACTATGGTGACGAA-5'
5'-AAGCAGTGGTATCAACGCAGAGTGGCCATTACGGCCGGG
The cDNA sequence is extended to the left, using the second adapter as a template.
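The extension step can be checked by hand or with a few lines of code. The sketch below is only an illustration in Python (not part of the original demonstration): it confirms that the bases added to the cDNA, followed by the CCC that RT added, are the base-by-base complement of the second adapter. The variable and function names are made up for this example.

```python
# Map each base to its Watson-Crick partner.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def complement(seq: str) -> str:
    """Return the base-by-base complement, keeping left-to-right order."""
    return seq.translate(COMPLEMENT)


# Second adapter, written 5' -> 3' as on the slide.
second_adapter = "AAGCAGTGGTATCAACGCAGAGTGGCCATTACGGCCGGG"

# Bases added to the cDNA (written 3' -> 5'), followed by the CCC from RT.
cdna_extension = "TTCGTCACCATAGTTGCGTCTCACCGGTAATGCCGG" + "CCC"

matches = complement(second_adapter) == cdna_extension
print("Extension is complementary to the adapter:", matches)
```

Because the two strands are written in opposite directions on the slide, a plain complement (without reversing) is the right comparison here.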
Construction of a cDNA library from Gunnera gland
5'-NNNNNNNNNN ... NNNNNNNNNNAAAAAAAAAAAAAAAAA...-3'
3'-TTCGTCACCATAGTTGCGTCTCACCGGTAATGCCGGCCCNNNNNNNNNN ... NNNNNNNNNNTTTTTCTTTTTTCATGGCTGACGCTGAGACGCAACTATGGTGACGAA-5'
5'-AAGCAGTGGTATCAACGCAGAGTGGCCATTACGGCCGGGNNNNNNNNNN ...
The cDNA sequence is extended to the left, using the second adapter as a template…
…and then the second cDNA strand is synthesized left-to-right, using the first cDNA strand as the template.
Construction of a cDNA library from Gunnera gland
5'-NNNNNNNNNN ... NNNNNNNNNNAAAAAAAAAAAAAAAAA...-3'
3'-TTCGTCACCATAGTTGCGTCTCACCGGTAATGCCGGCCCNNNNNNNNNN ... NNNNNNNNNNTTTTTCTTTTTTCATGGCTGACGCTGAGACGCAACTATGGTGACGAA-5'
5'-AAGCAGTGGTATCAACGCAGAGTGGCCATTACGGCCGGGNNNNNNNNNN ...
Hundreds to thousands of nucleotides
To give some perspective, the adapters are about 50 nucleotides, while the mRNA itself can be as large as a couple of thousand nucleotides.
Construction of a cDNA library from Gunnera gland
Of course there are thousands of different mRNA's in a cell, leading to thousands of cDNA's in the library, all
in multiple copies.
Sequencing of cDNA library
Limitations:
- Only from ends
- Only ~400 nt
It would be nice to be able to sequence the cDNA's from end to end, but that's
not presently possible.
Sequencing has its limitations.
Solution:
- Break the cDNA
The solution is to break up the cDNA so that there are multiple, overlapping ends from which to sequence. In this way, the full length of the cDNA can be sequenced.
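To make the idea of overlapping fragments concrete, here is a minimal Python sketch of how two reads that share an end can be stitched back together. It is a toy illustration only: the reads are invented, the function names are hypothetical, and real assemblers must also cope with sequencing errors, reads from either strand, and thousands of reads at once.

```python
from typing import Optional


def overlap_length(left: str, right: str, min_overlap: int = 3) -> int:
    """Length of the longest suffix of `left` that is also a prefix of `right`."""
    longest_possible = min(len(left), len(right))
    for size in range(longest_possible, min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def merge_reads(left: str, right: str, min_overlap: int = 3) -> Optional[str]:
    """Join two reads through their overlap, or return None if they do not overlap."""
    size = overlap_length(left, right, min_overlap)
    if size == 0:
        return None
    return left + right[size:]


# Two invented reads taken from the same (invented) cDNA.
read_a = "ATGGCGTAC"
read_b = "GTACGTTAGC"
merged = merge_reads(read_a, read_b)
print(merged)
```

Requiring a minimum overlap is a guard against joining reads that share only a base or two by chance; the value of 3 used here is far too small for real data and is chosen only to keep the example short.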
Sequencing of cDNA library
(1000's of cDNA's)
The broken fragments are read from either end (at random). If there are enough reads, it is possible to use overlaps to
reassemble the original sequence.
Unfortunately, the adapters are also sequenced, and these complicate the assembly process, as they're interpreted as
overlapping sequences, leading to misassembly.
They need to be removed.
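As a rough illustration of what automated adapter removal does, the following Python sketch strips an exact copy of an adapter (or its reverse complement, since fragments are read from either end) from the start or end of a read. This is an assumption-laden toy, not the procedure used to prepare the Gunnera sequences: real trimming tools also handle partial adapters and mismatches. The adapter shown is the second adapter from the protocol slides; the read is invented.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the sequence of the opposite strand, written 5' -> 3'."""
    return seq.translate(_COMPLEMENT)[::-1]


def trim_adapters(read: str, adapters: list[str]) -> str:
    """Remove exact adapter copies (either strand) from both ends of a read."""
    candidates = list(adapters) + [reverse_complement(a) for a in adapters]
    trimmed = read
    for adapter in candidates:
        if not adapter:
            continue  # ignore empty strings so slicing below stays safe
        if trimmed.startswith(adapter):
            trimmed = trimmed[len(adapter):]
        if trimmed.endswith(adapter):
            trimmed = trimmed[:-len(adapter)]
    return trimmed


SECOND_ADAPTER = "AAGCAGTGGTATCAACGCAGAGTGGCCATTACGGCCGGG"
insert = "ATGGCGTACGTT"  # invented stand-in for real cDNA sequence

forward_read = SECOND_ADAPTER + insert
reverse_read = insert + reverse_complement(SECOND_ADAPTER)
print(trim_adapters(forward_read, [SECOND_ADAPTER]))
print(trim_adapters(reverse_read, [SECOND_ADAPTER]))
```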
Sequencing of cDNA library
(1000's of cDNA's)
Given the number of sequences, the removal process obviously must be automated, but automated processes,
while fast, are often stupid.
We need to check to make sure they worked.
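One simple way to check, sketched here in Python under stated assumptions, is to look for leftover pieces of an adapter inside each assembled sequence. The function names, the probe length of 12, and the two example sequences are all invented for illustration; the adapter is the second adapter from the protocol slides.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the sequence of the opposite strand, written 5' -> 3'."""
    return seq.translate(_COMPLEMENT)[::-1]


def adapter_probes(adapter: str, probe_len: int = 12) -> set[str]:
    """All stretches of `probe_len` bases from the adapter, on either strand."""
    probes = set()
    for strand in (adapter, reverse_complement(adapter)):
        for start in range(len(strand) - probe_len + 1):
            probes.add(strand[start:start + probe_len])
    return probes


def flag_residual_adapter(contigs: dict[str, str], adapter: str,
                          probe_len: int = 12) -> list[str]:
    """Names of sequences that still contain a recognizable piece of the adapter."""
    probes = adapter_probes(adapter, probe_len)
    return [name for name, seq in contigs.items()
            if any(probe in seq for probe in probes)]


SECOND_ADAPTER = "AAGCAGTGGTATCAACGCAGAGTGGCCATTACGGCCGGG"
example_contigs = {
    "clean": "ATGGCGTACGTTAGCATCGATCGGA",
    "dirty": "ATGGCG" + SECOND_ADAPTER[-15:] + "TTAGC",  # 15 adapter bases left behind
}
suspects = flag_residual_adapter(example_contigs, SECOND_ADAPTER)
print(suspects)
```

Searching for short pieces rather than the whole adapter catches cases where trimming removed only part of it. Sequences flagged this way still deserve a look by eye, since a short match could also occur by chance.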
Identifying elements of cDNA library
The assembly process should, in theory, also remove duplicate sequences.
In practice, partial duplicates may remain, and it is necessary to keep an eye
out for them.
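A minimal Python sketch of one way to keep an eye out for partial duplicates: report any sequence that is wholly contained in a longer one, on either strand. This is an illustration with invented sequences and hypothetical function names. It only finds exact containment, so near-identical copies with a few mismatches would need a similarity search instead.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the sequence of the opposite strand, written 5' -> 3'."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_contained(contigs: dict[str, str]) -> list[tuple[str, str]]:
    """Return (shorter, longer) name pairs where the shorter lies inside the longer."""
    pairs = []
    for short_name, short_seq in contigs.items():
        for long_name, long_seq in contigs.items():
            if short_name == long_name or len(short_seq) > len(long_seq):
                continue
            # For equal lengths, report each pair once rather than twice.
            if len(short_seq) == len(long_seq) and short_name > long_name:
                continue
            if short_seq in long_seq or reverse_complement(short_seq) in long_seq:
                pairs.append((short_name, long_name))
    return pairs


example_contigs = {
    "c1": "ATGGCGTACGTTAGC",
    "c2": "GCGTACG",        # piece of c1, same strand
    "c3": "TTTTCCCCGGGG",   # unrelated
    "c4": "CTAACGTAC",      # piece of c1, opposite strand
}
partial_duplicates = find_contained(example_contigs)
print(partial_duplicates)
```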
Identifying elements of cDNA library
Predict function directly from sequence
How to go from cDNA sequence to predicted function for the sequences?
You might think that since we can readily predict a protein sequence from a DNA
sequence, it should be possible to predict function as well.
Identifying elements of cDNA library
Predict function directly from sequence
Predict function from sequence similarity
Nope. At present that's impossible.
The best we can do is to compare sequences with sequences from other
organisms where there is experimental evidence as to function.
Identifying elements of cDNA library
Predict function directly from sequence
Predict function from sequence similarity
Blast is a tool to do just that, comparing a given sequence against a database of known sequences.
It is important to understand the mind of Blast. But that is a subject for another time.
Genes Determining Plant-Cyanobacterial Symbioses and Consideration of Blast
We've identified many things that need to be done:
1. Determine if primers have been removed from sequences.
2. Determine if the library contains duplicates.
3. Identify protein sequences similar to those encoded by cDNAs.
4. (plus one extra) Find where in the cDNAs genes begin and end.
Genes Determining Plant-Cyanobacterial Symbioses and Consideration of Blast
These questions are ordinarily answered by high-powered computer types. But you can answer them yourself.
First you need to read in the data.
Go into StaphyloBIKE through the BioBIKE portal. (Gunnera isn't a member of the Staphylococcus, of course, but I put the cDNA sequences in that instance of BioBIKE.)
RUN-FILE "contig-resources.bike" SHARED
(This makes the cDNA sequences available to you as a variable called gunnera-contigs and also provides you with a possibly useful tool, READ-NAMED, to extract specific sequences.)
Genes Determining Plant-Cyanobacterial Symbioses and Consideration of Blast
Possibly useful functions:
SEQUENCE-SIMILAR-TO: Accesses BLAST, using as targets either internal data (e.g. gunnera-contigs) or external data (e.g. *GENBANK*). Also used to look for nearly identical sequences, using the MISMATCHES option.
READING-FRAMES-OF: Translates the sequence in all six possible reading frames.
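To show what translating a sequence in all six reading frames involves, here is a small Python sketch. It is not how READING-FRAMES-OF is implemented in BioBIKE (that is not described here); it is a stand-alone illustration that assumes the standard genetic code, uses an invented nine-base sequence, and marks stop codons with `*`.

```python
from itertools import product

# Standard genetic code, with codons ordered by base T, C, A, G at each position.
_BASES = "TCAG"
_AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(_BASES, repeat=3), _AMINO_ACIDS)
}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the sequence of the opposite strand, written 5' -> 3'."""
    return seq.translate(_COMPLEMENT)[::-1]


def translate(seq: str) -> str:
    """Translate complete codons from the first base; unknown codons become X."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))


def six_frames(seq: str) -> dict[str, str]:
    """Translations for frames +1..+3 (given strand) and -1..-3 (opposite strand)."""
    frames = {}
    for sign, strand in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            frames[f"{sign}{offset + 1}"] = translate(strand[offset:])
    return frames


for frame, protein in six_frames("ATGGCCTAA").items():
    print(frame, protein)
```

Only one of the six frames of a real cDNA is expected to encode the protein, and a long stretch without `*` is a common first hint as to which one, which bears on task 4 (finding where genes begin and end).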