
Operon Prediction
Cao Fan

Posted on 24-Feb-2016









Operon
A functioning unit of genomic material containing a cluster of genes under the control of a single regulatory signal or promoter

Exists primarily in prokaryotes, also found in eukaryotes


Approaches: wet lab
Demonstrate co-transcription of the candidate gene cluster via RT-PCR of whole-cell RNA:
Reverse-transcribe a specific RNA into cDNA using a gene-specific primer
Amplify the cDNA via PCR using primers designed from genes within the gene cluster
Successful PCR amplification indicates that the genes are members of an operon
Maritza Guacucano, Gloria Levican, David S. Holmes, Eugenia Jedlicki. An RT-PCR artifact in the characterization of bacterial operons. http://www.ejbiotechnology.info/content/vol3/issue3/full/5/index.html

Approaches: dry lab
Features used:
Intergenic distance (IG)
Conserved gene clusters (CG)
Functional relations (FR)
Experimental evidence (EE)
Sequence-based features (SF)
Phylogenetic profiles (PP)

Intergenic distance
IG(contiguous genes, same operon) < IG(contiguous genes, different operons)

The most widely used parameter for operon prediction

Best single predictor
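Because IG is the best single predictor, the simplest dry-lab predictor just thresholds it. The sketch below is a minimal illustration, assuming genes are given as (start, end, strand) records on a linear chromosome; the `Gene` class, the function names and the 50 bp cutoff are hypothetical choices, not values taken from the slides.

```python
from __future__ import annotations
from dataclasses import dataclass

@dataclass
class Gene:
    name: str
    start: int   # leftmost coordinate (1-based, inclusive)
    end: int     # rightmost coordinate (inclusive)
    strand: str  # "+" or "-"

def intergenic_distance(a: Gene, b: Gene) -> int:
    """Bases between gene a and the next gene b (negative if they overlap)."""
    return b.start - a.end - 1

def predict_operons(genes: list[Gene], max_gap: int = 50) -> list[list[str]]:
    """Group adjacent same-strand genes whose intergenic distance is <= max_gap.

    max_gap is a hypothetical cutoff; useful thresholds are genome-specific.
    The chromosome is treated as linear (no wrap-around from last to first gene).
    """
    if not genes:
        return []
    ordered = sorted(genes, key=lambda g: g.start)
    operons: list[list[str]] = [[ordered[0].name]]
    for prev, nxt in zip(ordered, ordered[1:]):
        if prev.strand == nxt.strand and intergenic_distance(prev, nxt) <= max_gap:
            operons[-1].append(nxt.name)  # small gap, same strand: extend operon
        else:
            operons.append([nxt.name])    # strand switch or large gap: new operon
    return operons

genes = [
    Gene("a", 1, 900, "+"),
    Gene("b", 920, 1500, "+"),
    Gene("c", 1800, 2400, "+"),
    Gene("d", 2410, 3000, "-"),
]
print(predict_operons(genes))  # [['a', 'b'], ['c'], ['d']]
```

Here a and b are 19 bp apart and merged, b and c are 299 bp apart and split, and c and d are split because they lie on different strands.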

Conserved gene clusters
Genes in an operon tend to be preserved across phylogenetically related organisms
The order of genes in an operon may not be conserved
Sequence comparison between non-redundant genomes is usually performed to identify conserved clusters

Functional relations
Genes in the same operon tend to encode functionally related proteins

e.g. members of the same protein complex, or enzymes in a single metabolic pathway
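A minimal way to turn "functionally related" into a pair feature is to check whether two adjacent genes share any functional label. In this sketch the gene names (g1, g2, g3), the label sets and both helper functions are hypothetical; in practice the labels could be pathway or protein-complex identifiers from whatever classification is available.

```python
from __future__ import annotations

def functionally_related(a: str, b: str, labels: dict[str, set[str]]) -> bool:
    """True if genes a and b share at least one functional label."""
    return bool(labels.get(a, set()) & labels.get(b, set()))

def flag_adjacent_pairs(gene_order: list[str],
                        labels: dict[str, set[str]]) -> list[tuple[str, str, bool]]:
    """Return (gene, next gene, shares-a-label) for each adjacent pair."""
    return [(a, b, functionally_related(a, b, labels))
            for a, b in zip(gene_order, gene_order[1:])]

# Toy data: gene names and labels are invented for illustration.
labels = {
    "g1": {"pathway_X", "complex_Y"},
    "g2": {"complex_Y"},
    "g3": {"pathway_Z"},
}
print(flag_adjacent_pairs(["g1", "g2", "g3"], labels))
# [('g1', 'g2', True), ('g2', 'g3', False)]
```

A gene with no annotation is treated as sharing nothing, so missing labels count as evidence against a relation; a real predictor might instead treat them as "unknown".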

Functional relations
Functional classifications:
Riley's functional annotation
Metabolic pathways
Clusters of Orthologous Groups of proteins (COG)
Gene Ontology (GO)

Sequence-based features
Overrepresented sequence motifs and other sequence elements, such as promoters and terminators, are used

Gene length ratio is also used; the ratio has been shown to be genome-specific

Phylogenetic profiles
Indicate a general trend for a set of genes to be simultaneously present or absent in related organisms

PP is shown to be genome-specific

Features
[Figure: feature sets compared: IG only; IG, SF, EE; CG only; SF]
Rutger W.W. Brouwer, Oscar P. Kuipers and Sacha A.F.T. van Hijum. The relative value of operon predictions. Briefings in Bioinformatics 2008

Using both genome-specific and general genomic information
Phuongan Dam, Victor Olman, Kyle Harris, Zhengchang Su and Ying Xu
Features used:
Intergenic distance
Neighborhood conservation
Phylogenetic distance
Short DNA motifs
Similarity score between GO terms
Length ratio
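Two of the pairwise features listed above can be sketched as follows, assuming gene lengths in bp and presence(1)/absence(0) phylogenetic profiles over a fixed, ordered list of reference genomes. The absolute log ratio and the Hamming distance are illustrative encodings chosen here; the slide does not say how Dam et al. compute length ratio or phylogenetic distance, and the function names are hypothetical.

```python
from __future__ import annotations
import math

def length_ratio_feature(len_a: int, len_b: int) -> float:
    """|ln(len_a / len_b)|: 0 for equal lengths and symmetric in the two genes."""
    if len_a <= 0 or len_b <= 0:
        raise ValueError("gene lengths must be positive")
    return abs(math.log(len_a / len_b))

def profile_distance(profile_a: list[int], profile_b: list[int]) -> int:
    """Hamming distance between two presence/absence phylogenetic profiles.

    Both profiles must list the same reference genomes in the same order.
    """
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must cover the same genomes")
    return sum(x != y for x, y in zip(profile_a, profile_b))

# Hypothetical gene lengths (bp) and profiles over four reference genomes.
print(length_ratio_feature(900, 300))                # ln(3), about 1.0986
print(profile_distance([1, 1, 0, 1], [1, 0, 0, 1]))  # 1
```

Taking the absolute log makes the feature independent of which gene in the pair comes first; a low profile distance means the two genes tend to be present or absent together.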

Prediction of operons in microbial genomes
Maria D. Ermolaeva, Owen White and Steven L. Salzberg
Features: conserved gene clusters
Scoring method: log-likelihood scores

Gene pair: two adjacent genes separated by less than 200 bp
Conserved gene pair: two adjacent genes (A, B) for which a homologous gene pair (A', B') can be found in another genome, such that
Similarity(A, B) < Similarity(B, B') and Similarity(A, B) < Similarity(A, A')
BLASTP is used to find homologs

S pair: the two genes of the pair lie on the same strand
D pair: the two genes of the pair lie on different strands
SO pair: an S pair whose genes belong to the same operon
SN pair: an S pair whose genes belong to different operons
Directon: a maximal set of adjacent genes located on the same DNA strand

Calculating P(SN|S)
Assumption: the orientation of operons is random, so N(operons) = 2N(directons)
Each operon boundary is an SN pair, an adjacent non-pair or a D pair; the last step uses N(pairs) = N(S pairs) + N(D pairs):
$$
\begin{aligned}
N(\text{SN pairs}) &= N(\text{operons}) - N(\text{adjacent non-pairs}) - N(\text{D pairs}) \\
&= 2N(\text{directons}) - \big(N(\text{genes}) - N(\text{pairs})\big) - N(\text{D pairs}) \\
&= 2N(\text{directons}) + N(\text{S pairs}) - N(\text{genes})
\end{aligned}
$$
$$
P(\text{SN} \mid \text{S}) = \frac{N(\text{SN pairs})}{N(\text{S pairs})}
$$

Calculating P_chance
$$
P_{\text{chance}} = \left(\frac{0.1\,G}{N(\text{conserved S})}\right)^{h}
$$
where G is the number of genomes searched and h is the number of genomes in which homologs of a given gene are found

Result: 7699 gene pairs in 34 bacterial genomes whose genes belong to the same operon with probability ≥ 0.98
Sensitivity: 30% to 50%

OperonDB
Result:
Sensitivity > 60%
Maximum accuracy: 80%

Relation to UROP
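Returning to the Ermolaeva et al. slides, the counting argument for P(SN|S), the P_chance formula and the conserved-pair similarity condition can be sketched as below. This is not code from the paper: the function names and example inputs are hypothetical, the chromosome is assumed circular (so the number of adjacent gene boundaries equals N(genes), as the derivation requires), and similarity scores are assumed to come from an external BLASTP search.

```python
from __future__ import annotations

def count_directons(strands: list[str]) -> int:
    """Count directons (maximal runs of same-strand genes) on a circular chromosome.

    strands lists each gene's strand ("+" or "-") in genomic order. On a circle
    every strand switch begins a new run, so the number of runs equals the number
    of switches, or 1 if all genes share a strand.
    """
    if not strands:
        return 0
    # i = 0 compares the first gene with the last one (circular wrap-around).
    switches = sum(strands[i] != strands[i - 1] for i in range(len(strands)))
    return switches if switches else 1

def p_sn_given_s(n_genes: int, n_directons: int, n_s_pairs: int) -> float:
    """P(SN|S) = (2*N(directons) + N(S pairs) - N(genes)) / N(S pairs).

    Relies on the random-orientation assumption N(operons) = 2*N(directons).
    """
    n_sn_pairs = 2 * n_directons + n_s_pairs - n_genes
    return n_sn_pairs / n_s_pairs

def p_chance(n_genomes: int, n_conserved_s: int, h: int) -> float:
    """P_chance = (0.1 * G / N(conserved S)) ** h, transcribed from the slide."""
    return (0.1 * n_genomes / n_conserved_s) ** h

def is_conserved_pair(sim_ab: float, sim_a_a2: float, sim_b_b2: float) -> bool:
    """Condition for adjacent genes (A, B) with homologs (A', B') in another genome:
    A and B must be less similar to each other than each is to its own homolog.
    Scores are assumed to come from an external BLASTP search (e.g. bit scores).
    """
    return sim_ab < sim_a_a2 and sim_ab < sim_b_b2

# Hypothetical inputs, for illustration only.
print(count_directons(["+", "+", "-", "-", "+"]))                    # 2
print(p_sn_given_s(n_genes=4000, n_directons=1000, n_s_pairs=2500))  # 0.2
print(is_conserved_pair(sim_ab=40.0, sim_a_a2=250.0, sim_b_b2=180.0))  # True
```

With the made-up counts above, N(SN pairs) = 2 × 1000 + 2500 − 4000 = 500, so P(SN|S) = 500 / 2500 = 0.2. The similarity check rejects pairs where A and B resemble each other more than their homologs, which would otherwise let tandem duplications look like conserved pairs.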