RNA Structure Prediction, Chapter 16: Primary, Secondary and Tertiary Structures
RNA Structure Prediction
Chapter 16
Primary, Secondary and Tertiary Structures
RNA Structures
Ab Initio
Prediction based on a single RNA sequence
• Search for the RNA structure with the lowest free energy
• Free energy is calculated from base-pairing stabilities, increasing in the order G-C < A-U < G-U < unpaired bases
• Stacking between aromatic rings (van der Waals interactions) gives rise to cooperativity
• Neighboring loops or bulges impose an unfavorable entropic change
• Find all possible base-pairing interactions
• Calculate the energy of each and choose the lowest-energy configuration
Dot Matrices
• Plot all interactions in a self-alignment plot
• Find diagonals after applying a sliding window
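The dot-matrix idea can be sketched in a few lines. This is only an illustration under stated assumptions: the function names are made up for this example, and a minimum run length (`min_len`) stands in for the sliding-window filter. A candidate stem shows up as a run of pairs (i, j), (i+1, j-1), ... in the self-comparison plot.

```python
def can_pair(a, b):
    """True for Watson-Crick (A-U, G-C) or wobble (G-U) pairs."""
    return {a, b} in ({"A", "U"}, {"G", "C"}, {"G", "U"})


def self_complement_dots(seq):
    """All positions (i, j), i < j, where seq[i] can pair with seq[j]."""
    n = len(seq)
    return {(i, j) for i in range(n) for j in range(i + 1, n)
            if can_pair(seq[i], seq[j])}


def find_stems(seq, min_len=3, min_loop=3):
    """Return (i, j, length) for runs of at least min_len stacked pairs."""
    dots = self_complement_dots(seq)
    stems = []
    for (i, j) in sorted(dots):
        # Skip dots that are the interior of a longer run.
        if (i - 1, j + 1) in dots:
            continue
        length = 0
        # Walk inward while pairs continue and the loop stays long enough.
        while ((i + length, j - length) in dots
               and (j - length) - (i + length) > min_loop):
            length += 1
        if length >= min_len:
            stems.append((i, j, length))
    return stems
```

For the toy sequence `GGGAAAACCC`, the only run that passes the filter is the three-pair stem closing the `AAAA` loop.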
Dynamic Programming
• Find the single optimal match
• Use Watson-Crick and wobble base-pairing scores
• Conformations with slightly higher energies may exist without optimal base pairing
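A minimal sketch of the dynamic programming idea, using the classic Nussinov base-pair maximization recursion. Note the swap: this scores every allowed pair as +1 rather than using thermodynamic energies, so it illustrates the recursion only and is not what Mfold or RNAfold compute. The function names and the `min_loop` parameter are assumptions for this example.

```python
def can_pair(a, b):
    """True for Watson-Crick (A-U, G-C) or wobble (G-U) pairs."""
    return {a, b} in ({"A", "U"}, {"G", "C"}, {"G", "U"})


def nussinov(seq, min_loop=3):
    """best[i][j] = maximum number of nested pairs in seq[i..j]."""
    n = len(seq)
    best = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            score = best[i][j - 1]  # case 1: j stays unpaired
            for k in range(i, j - min_loop):  # case 2: j pairs with k
                if can_pair(seq[k], seq[j]):
                    left = best[i][k - 1] if k > i else 0
                    score = max(score, left + 1 + best[k + 1][j - 1])
            best[i][j] = score
    return best


def traceback(seq, best, min_loop=3):
    """Recover one optimal set of base pairs from the filled table."""
    pairs, stack = [], [(0, len(seq) - 1)]
    while stack:
        i, j = stack.pop()
        if j - i <= min_loop:
            continue
        if best[i][j] == best[i][j - 1]:
            stack.append((i, j - 1))
            continue
        for k in range(i, j - min_loop):
            left = best[i][k - 1] if k > i else 0
            if can_pair(seq[k], seq[j]) and best[i][j] == left + 1 + best[k + 1][j - 1]:
                pairs.append((k, j))
                if k > i:
                    stack.append((i, k - 1))
                stack.append((k + 1, j - 1))
                break
    return sorted(pairs)


def dot_bracket(n, pairs):
    """Render pairs as a dot-bracket string of length n."""
    s = ["."] * n
    for i, j in pairs:
        s[i], s[j] = "(", ")"
    return "".join(s)
```

Usage: `seq = "GGGAAAACCC"; pairs = traceback(seq, nussinov(seq)); dot_bracket(len(seq), pairs)` gives `(((....)))`. The traceback returns a single optimum, which matches the slide's caveat that near-optimal conformations are missed.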
Partition Function
Use a probability distribution to generate sub-optimal structures within a given energy range
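As a rough illustration of the probability distribution involved: each structure s with free energy E(s) gets a Boltzmann weight exp(-E(s)/RT), and the partition function Z is the sum of those weights. The sketch below assumes the candidate structures have already been enumerated; the energies in the usage note are made-up numbers, and real tools sum over all structures by dynamic programming rather than by listing them.

```python
import math

R_KCAL = 0.0019872  # gas constant in kcal/(mol*K)


def boltzmann_probabilities(energies, temperature=310.15):
    """Map {structure: free energy in kcal/mol} to {structure: probability}."""
    rt = R_KCAL * temperature
    weights = {s: math.exp(-e / rt) for s, e in energies.items()}
    z = sum(weights.values())  # partition function over the listed structures
    return {s: w / z for s, w in weights.items()}
```

Usage: `boltzmann_probabilities({"(((....)))": -3.0, "((......))": -2.2, "..........": 0.0})` assigns the highest probability to the lowest-energy structure while still giving the sub-optimal ones non-zero weight.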
Mfold
• http://mfold.bioinfo.rpi.edu/applications/mfold/
• Dynamic programming and thermodynamic calculation
RNAfold
• http://rna.tbi.univie.ac.at/cgi-bin/RNAfold.cgi
• Extend alignment to more than one diagonal in the dotplot to calculate thermodynamic stability of structures
Comparative Approach
Assumes that homologous RNA sequences fold into the same structure
Covariation
• Covariant regions in homologous sequences are likely to be base-paired
• Predict a consensus structure based on predictions for all aligned sequences
RNAalifold
• http://rna.tbi.univie.ac.at/cgi-bin/RNAalifold.cgi
• Requires prealignment
• Predictions based on covariance and minimum free energy; dynamic programming finds the optimal structure for the entire alignment
Foldalign
• http://foldalign.ku.dk/
• No prealignment
• Clustal alignment and dynamic programming
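One common way to quantify covariation between two alignment columns is mutual information. The sketch below is an illustration only, not how RNAalifold or Foldalign score covariance; the function names and the toy alignment in the usage note are assumptions.

```python
import math
from collections import Counter


def column(alignment, idx):
    """Extract one column from a list of equal-length aligned sequences."""
    return [seq[idx] for seq in alignment]


def mutual_information(col_i, col_j):
    """Mutual information (bits) between two alignment columns."""
    n = len(col_i)
    freq_i = Counter(col_i)
    freq_j = Counter(col_j)
    freq_ij = Counter(zip(col_i, col_j))
    mi = 0.0
    for (a, b), count in freq_ij.items():
        p_ab = count / n
        mi += p_ab * math.log2(p_ab / ((freq_i[a] / n) * (freq_j[b] / n)))
    return mi
```

Usage: in the toy alignment `["GAAAC", "CAAAG", "AAAAU", "UAAAA"]`, columns 0 and 4 change together so that pairing is preserved and score 2 bits, while the invariant columns 1 and 2 score 0.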
Chapter 17
Genome Mapping, Assembly and Comparison
Definitions
Genomics – study of genomes
Structural genomics (genome analysis) – identification of genes, annotation of gene features, comparison of genome structures
Functional genomics – analysis of genome-wide gene expression and gene functions
Genome Mapping
Cytological map
• Banding pattern of metaphase chromosomes
• Low resolution (Dustin units)
Genetic map
• Relative positions of genetic markers
• Marker associated with a specific genetic trait
• The closer the markers, the lower the probability of separation in a cross-over event and of independent inheritance
Physical map
• Order of clone fragments using a library of radio-labeled probes
Genome Sequencing
Shotgun approach
• Sequence a large number of randomly cloned DNA fragments
• The number of fragments sequenced must be large enough for overlaps to reconstruct the entire genome
• Requires no knowledge of the physical map
• Typically the equivalent of 6 genome lengths (“6× coverage”) must be sequenced to ensure correct assembly
• Gaps filled in with PCR “chromosome walking” (successive sequencing from primers designed from the last round of sequencing results)
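The coverage figure can be related to read counts with simple arithmetic. The sketch below assumes the idealized Lander-Waterman (Poisson) model, in which reads land uniformly at random and the expected fraction of bases left uncovered at coverage c is e^(-c). The function names, the 500 bp read length and the 1 Mb genome in the usage note are illustrative assumptions.

```python
import math


def coverage(num_reads, read_length, genome_length):
    """Average number of times each base is sequenced."""
    return num_reads * read_length / genome_length


def reads_needed(target_coverage, read_length, genome_length):
    """Number of reads required to reach a target average coverage."""
    return math.ceil(target_coverage * genome_length / read_length)


def fraction_uncovered(cov):
    """Expected fraction of bases hit by no read, under a Poisson model."""
    return math.exp(-cov)
```

Usage: `reads_needed(6, 500, 1_000_000)` gives 12000 reads, and `fraction_uncovered(6)` is roughly 0.0025, which is why some gaps usually remain to be closed even at 6× coverage.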
Hierarchical approach
• Clone very large fragments (100-300 kb) into Bacterial Artificial Chromosomes (BACs)
• Map BAC inserts by restriction enzyme analysis
• Arrange in order
• Choose the smallest number of BACs that cover the entire genome (“golden tiling path”)
• Sub-clone BAC insert fragments into bacterial vectors and sequence
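Choosing the smallest set of mapped clones that covers a region is an interval-cover problem, which a greedy scan solves exactly when clone positions are known. This is a sketch under that assumption; the clone names and coordinates in the usage note are hypothetical, and real tiling-path selection also has to handle mapping uncertainty.

```python
def minimal_tiling_path(clones, start, end):
    """Pick the fewest clones covering [start, end).

    clones: list of (name, begin, finish) with half-open coordinates.
    """
    ordered = sorted(clones, key=lambda c: c[1])
    path, covered, idx = [], start, 0
    while covered < end:
        best = None
        # Among clones starting at or before the covered point,
        # keep the one that reaches furthest.
        while idx < len(ordered) and ordered[idx][1] <= covered:
            if best is None or ordered[idx][2] > best[2]:
                best = ordered[idx]
            idx += 1
        if best is None or best[2] <= covered:
            raise ValueError(f"gap in clone coverage at position {covered}")
        path.append(best[0])
        covered = best[2]
    return path
```

Usage: with clones `A(0-150)`, `B(100-300)`, `C(120-250)`, `D(280-450)`, `E(200-400)` and `F(390-500)` over a 500-unit region, the function returns `["A", "B", "D", "F"]`.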
Genome Sequence Assembly
Short 500 bp sequence runs → 5-10 kb contigs → 30-50 kb supercontigs (scaffolds)
Major challenges
• Sequence errors
• Vector DNA contamination (filtering programs)
• Repetitive sequence regions (RepeatMasker)
Dealing with repeats (almost…)
•Forward-reverse constraint
Base calling and assembly programs
Base calling: Phred
• http://www.phrap.org/
• Fourier analysis to resolve fluorescent traces
• Assigns each base call a probability score
Sequence assembly: Phrap
• http://www.phrap.org/
• Takes Phred files as input
• Performs Smith-Waterman local alignment
• Progressively merges sequence pairs from highest to lowest similarity score, removing overlaps
• Outputs contigs
→ Nucleotide sequence
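The "merge the best-overlapping pair first" strategy can be sketched as a toy greedy assembler. This is not Phrap's implementation: exact suffix-prefix matches stand in for Smith-Waterman alignments, and quality scores and reverse complements are ignored. The function names and `min_len` are assumptions for this example.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of sequences with the longest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best = (0, None, None)
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best[0]:
                        best = (k, i, j)
        k, i, j = best
        if k == 0:
            return contigs  # no overlaps left: these are the contigs
        merged = contigs[i] + contigs[j][k:]  # overlap kept only once
        contigs = [c for idx, c in enumerate(contigs) if idx not in (i, j)]
        contigs.append(merged)
```

Usage: `greedy_assemble(["ATGCGT", "CGTACC", "ACCTGA"])` returns the single contig `["ATGCGTACCTGA"]`. Repeats longer than the overlaps are where this greedy strategy goes wrong.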
Additional software
VecScreen
• To remove “contaminating” vector DNA sequences from genomes
• http://www.ncbi.nlm.nih.gov/VecScreen/VecScreen.html
• Performs BLAST screen of submitted sequence against the UniVec non-redundant vector database
• Matches are displayed
TIGR Assembler (last updated 2003)
• http://www.jcvi.org/cms/research/software/
• Uses forward-reverse constraints
• Smith-Waterman sequence assembly
ARACHNE
• http://www.broad.mit.edu/wga/
• Gives statistical scores to overlaps
• Corrects errors in multiple overlaps
• Outputs contigs or supercontigs
EULER
• http://nbcr.sdsc.edu/euler/
• Uses an Eulerian path approach on a de Bruijn graph of short subsequences, which avoids the traveling-salesman-like formulation of overlap-based assembly
• Useful for assembly of sequences with repeats
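EULER is usually described as an Eulerian path assembler: reads are broken into k-mers, each k-mer becomes an edge between its (k-1)-mer prefix and suffix, and the genome is spelled out by a path that uses every edge once. The sketch below is an illustration of that idea, not EULER's code; it assumes error-free reads, ignores reverse complements, and counts each distinct k-mer once, so repeat copy numbers are not recovered.

```python
from collections import defaultdict


def de_bruijn(reads, k):
    """Build a graph whose edges are the distinct k-mers in the reads."""
    graph = defaultdict(list)
    seen = set()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if kmer not in seen:
                seen.add(kmer)
                graph[kmer[:-1]].append(kmer[1:])
    return graph


def eulerian_path(graph):
    """Hierholzer's algorithm: a path that uses every edge exactly once."""
    if not graph:
        return []
    in_deg = defaultdict(int)
    for targets in graph.values():
        for v in targets:
            in_deg[v] += 1
    # Start where out-degree exceeds in-degree by one, if such a node exists.
    start = next((u for u in graph if len(graph[u]) - in_deg[u] == 1),
                 next(iter(graph)))
    edges = {u: list(vs) for u, vs in graph.items()}
    stack, path = [start], []
    while stack:
        u = stack[-1]
        if edges.get(u):
            stack.append(edges[u].pop())
        else:
            path.append(stack.pop())
    return path[::-1]


def spell(path):
    """Turn a node path back into a sequence."""
    return path[0] + "".join(node[-1] for node in path[1:]) if path else ""
```

Usage: `spell(eulerian_path(de_bruijn(["ATGCGTAC", "CGTACCTG", "ACCTGA"], 4)))` reconstructs `ATGCGTACCTGA`.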
Genome Annotation
•Sequence
•Gene structures (GenScan, FgenesH)
•Predictions verified by BLAST against sequence database, cDNA and EST (GeneWise, Spidey, SIM4, EST2Genome)
•Manually verified by human curators
•Functional assignment of proteins by BLAST searches of protein database
•Further functional description from Pfam and InterPro and literature
Gene Ontology
Uses limited vocabulary to describe
• Cellular components
• Biological processes
• Molecular functions
Vocabulary arranged in a hierarchical manner from widest to most specific description
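The hierarchical arrangement can be pictured as a child-to-parent map that is walked from a specific term up to the broadest one. The term labels below are GO-style, but the graph is a hand-made toy for illustration, not an extract of the real ontology, and the function name is an assumption.

```python
# Toy child -> parents map (illustrative only, not the real ontology graph).
PARENTS = {
    "cytochrome-c oxidase activity": ["oxidoreductase activity"],
    "oxidoreductase activity": ["catalytic activity"],
    "catalytic activity": ["molecular_function"],
    "molecular_function": [],
}


def ancestors(term, parents):
    """List every broader term reachable from term, nearest first."""
    seen = []
    queue = list(parents.get(term, []))
    while queue:
        current = queue.pop(0)
        if current not in seen:
            seen.append(current)
            queue.extend(parents.get(current, []))
    return seen
```

Usage: `ancestors("cytochrome-c oxidase activity", PARENTS)` walks from the most specific description to the widest. Because a term may list several parents, the same traversal works when the hierarchy is a graph rather than a strict tree.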
GO: “cytochrome c oxidase gene” in Ensembl
Automated Genome Annotation
• Genome data are generated at an exponential rate, which requires automatic genome annotation
• Based on homologies
Genequiz
• http://swift.cmbi.kun.nl/swift/genequiz/
• BLAST and FASTA homology searches of databases
• Domain analysis with PROSITE and Blocks databases
• Analysis of secondary and supersecondary structures (e.g. coiled coils)
• All results compiled to produce a summary with an assigned confidence level
Annotation of hypothetical proteins
• In newly sequenced genomes, as much as 40% of proteins are “hypothetical”
To assign function:
• Homology searches in databases
• Search for similar motifs, domains and secondary structures
• Identify conserved functional sites by HMM
• Predict structure with fold recognition or threading
• Assign broad function to protein
• Test assigned function experimentally
How many genes in a genome?
• Total number of human genes ~25,000
• Equivalent to that in mouse
• About 4× as many as in Saccharomyces cerevisiae
• It is not the number of cells in an organism that counts, but the number of specialized cells (tissues) and response conditions
Genome Economy
• One gene → one protein is not true
• EST data suggest >100,000 proteins in humans (from 25,000 genes?)
Alternative splicing
• Joining different exons from a single transcript to form different proteins
• Drosophila Dscam gene contains 115 exons, 20 of which are constitutively spliced and 95 of which are alternatively spliced
• Expresses 38,016 different mRNAs by virtue of alternative splicing
Exon shuffling
• Joining exons from different genes
Trans-splicing
• Drosophila mdg4 gene
• Joins 4 exons on the sense strand and 2 exons on the antisense strand
• A single transcript encodes dentin phosphoprotein and sialoprotein; the protein is cleaved to form two different proteins
• Human transcript for Prostate Specific Antigen (PSA) also encodes PSA-LM in the 4th intron
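The Dscam figure of 38,016 mRNAs can be checked with simple arithmetic. The cluster sizes below (12, 48, 33 and 2 mutually exclusive alternatives at exons 4, 6, 9 and 17) are not given on the slide; they are taken from the commonly published description of the gene and should be read as an assumption. One alternative is chosen per cluster, so the counts multiply.

```python
from math import prod

# Assumed sizes of the four alternatively spliced exon clusters in Dscam.
DSCAM_CLUSTERS = {"exon 4": 12, "exon 6": 48, "exon 9": 33, "exon 17": 2}


def isoform_count(clusters):
    """Number of mRNAs when one alternative is chosen from each cluster."""
    return prod(clusters.values())


def alternative_exon_total(clusters):
    """Total number of alternatively spliced exons across clusters."""
    return sum(clusters.values())
```

Here `alternative_exon_total(DSCAM_CLUSTERS)` is 95, matching the number of alternatively spliced exons, and `isoform_count(DSCAM_CLUSTERS)` is 12 × 48 × 33 × 2 = 38,016.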
Comparative Genomics
•Compare genomes from different organisms
Whole Genome Alignment
• Extent of genome conservation
• Mechanism of genome evolution
• MUMmer and BLASTZ
• BLASTZ: modified BLAST to align long genome sequences
Finding a minimal genome
• What is the minimum number of genes needed to support a free-living cellular entity?
• Useful to identify genes constituting essential metabolic pathways
Lateral Gene Transfer
• Identify by G-C skew
• GC%
• Codon bias
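GC% and G-C skew are straightforward to compute in sliding windows. The sketch below flags windows whose GC content departs from the whole-sequence average by more than a threshold, as a crude pointer to candidate transferred regions. The function names, window settings and threshold are assumptions for illustration; codon bias is not covered.

```python
def gc_content(seq):
    """Fraction of G and C bases."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def gc_skew(seq):
    """(G - C) / (G + C); zero when the window has no G or C."""
    seq = seq.upper()
    g, c = seq.count("G"), seq.count("C")
    return (g - c) / (g + c) if g + c else 0.0


def sliding_windows(seq, size, step):
    """Yield (start, GC content, GC skew) for each window."""
    for start in range(0, len(seq) - size + 1, step):
        window = seq[start:start + size]
        yield start, gc_content(window), gc_skew(window)


def atypical_windows(seq, size, step, threshold):
    """Start positions of windows whose GC% deviates from the average."""
    average = gc_content(seq)
    return [start for start, gc, _ in sliding_windows(seq, size, step)
            if abs(gc - average) > threshold]
```

Usage: for an AT-rich toy sequence with a GC-rich insert, `atypical_windows("AT" * 10 + "GC" * 5 + "AT" * 10, 10, 10, 0.5)` returns `[20]`, the start of the insert.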
Gene order comparisons
• Where gene order is conserved between genomes, it is called synteny
• Synteny may indicate functional relationships
• Often indicates physical interaction of proteins
• Genes encoding proteins catalyzing consecutive steps of a metabolic pathway are sometimes ordered – co-regulation as an “operon”?
• MAL cluster in yeast: multigene complex that encodes the MAL23 trans-acting MAL-activator, MAL21 maltose permease, and MAL22 maltase in order on chromosomes 2, 3, 7, 9 and 10
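A simple way to look for conserved gene order is to find runs of genes that are neighbours in both genomes. The sketch assumes orthologous genes already share an identifier and treats adjacency as orientation-independent, so an inverted block still counts. The gene names in the usage note are hypothetical.

```python
def adjacent_pairs(order):
    """Unordered pairs of neighbouring genes along one genome."""
    return {frozenset(pair) for pair in zip(order, order[1:])}


def conserved_blocks(genome_a, genome_b, min_len=2):
    """Runs of genes in genome_a whose neighbours are also adjacent in genome_b."""
    shared = adjacent_pairs(genome_b)
    blocks, current = [], list(genome_a[:1])
    for prev, gene in zip(genome_a, genome_a[1:]):
        if frozenset((prev, gene)) in shared:
            current.append(gene)
        else:
            if len(current) >= min_len:
                blocks.append(current)
            current = [gene]
    if len(current) >= min_len:
        blocks.append(current)
    return blocks
```

Usage: comparing `["g1", "g2", "g3", "g4", "g5", "g6"]` with `["g9", "g3", "g2", "g1", "g6", "g5", "g8"]` returns `[["g1", "g2", "g3"], ["g5", "g6"]]`: two blocks whose order is conserved, both inverted in the second genome.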