Last Update: 12/28/2019
GEP Annotation Report
Student name: Wilson Leung
Student email: [email protected]
Faculty advisor: Sarah C. R. Elgin
College/university: Washington University in St. Louis
Project details
Project name: contig10
Project species: D. biarmipes
Date of submission: 12/28/2019
Size of project in base pairs: 43,013
Number of genes in project: 3
Does this report cover all of the genes or is it a partial report? Partial report
If this is a partial report, please indicate the region of the project covered by this report: From base 25,000 to base 28,000
Instructions for project with no genes
If you believe that the project does not contain any genes, please provide the following evidence to support your conclusion:
1. Perform an NCBI BLASTX search of the entire contig sequence against the “non-redundant protein sequences (nr)” database. For any significant (E-value < 1e-5) hits to known genes in the nr database, explain why they do not correspond to real genes in the project.
2. For each Genscan prediction, perform an NCBI BLASTP search of the predicted amino acid sequence against the nr protein database using the strategy described above.
3. Examine the gene expression tracks (e.g., RNA-Seq) for evidence of transcribed regions that do not correspond to alignments to known D. melanogaster proteins. Perform an NCBI BLASTX search of these genomic regions against the nr protein database to determine whether they show sequence similarity to known or predicted proteins in the nr database.
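Once BLAST results are saved in tabular form, the E-value screen described in these instructions can be scripted. The following is a minimal Python sketch. It assumes the hits were exported in the 12-column BLAST+ tabular layout (`-outfmt 6`). The `Hit` type, the `significant_hits` function, and the example rows are hypothetical and serve only as illustration.

```python
from typing import Iterable, Iterator, NamedTuple

E_VALUE_CUTOFF = 1e-5  # significance threshold used throughout this report


class Hit(NamedTuple):
    query: str
    subject: str
    evalue: float
    bitscore: float


def significant_hits(lines: Iterable[str], cutoff: float = E_VALUE_CUTOFF) -> Iterator[Hit]:
    """Yield hits from BLAST tabular output whose E-value is below `cutoff`.

    Assumed column order (BLAST+ -outfmt 6 default): qseqid sseqid pident length
    mismatch gapopen qstart qend sstart send evalue bitscore.
    """
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines (as written by -outfmt 7)
        fields = line.rstrip("\n").split("\t")
        hit = Hit(fields[0], fields[1], float(fields[10]), float(fields[11]))
        if hit.evalue < cutoff:
            yield hit


# Hypothetical rows; "subject_A" and "subject_B" are placeholders, not real accessions.
example_rows = [
    "contig10\tsubject_A\t45.2\t120\t60\t3\t1001\t1360\t5\t124\t2e-12\t85.1",
    "contig10\tsubject_B\t30.0\t50\t35\t0\t2001\t2150\t10\t59\t0.004\t35.0",
]

for hit in significant_hits(example_rows):
    print(hit.subject, hit.evalue)  # prints: subject_A 2e-12
```

Each hit that passes the filter still needs the manual explanation the instructions ask for. The script only narrows down which hits require one.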
Note: For each gene described in this annotation report, you should also prepare the corresponding GFF, transcript, and peptide sequence files as part of your submission.
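When CDS records are written by hand, the field that is easiest to get wrong is the phase. The sketch below assumes the GFF3 flavor, although the GEP submission may expect a different GFF variant. It computes each segment's phase from the number of coding bases that precede that segment. The function name, IDs, source label, and coordinates are all hypothetical.

```python
def cds_gff3_lines(seqid, transcript_id, cds_segments, strand="+", source="example"):
    """Return GFF3 CDS lines for one coding sequence.

    cds_segments: (start, end) pairs, 1-based and inclusive with start <= end,
    listed in transcription order (descending coordinates on the minus strand).
    """
    lines = []
    bases_so_far = 0  # coding bases contained in the segments already written
    for start, end in cds_segments:
        # GFF3 phase: bases to skip at the 5' end of this segment to reach the next codon start
        phase = (3 - bases_so_far % 3) % 3
        attributes = f"ID={transcript_id}-CDS;Parent={transcript_id}"
        lines.append("\t".join([seqid, source, "CDS", str(start), str(end),
                                ".", strand, str(phase), attributes]))
        bases_so_far += end - start + 1
    return lines


# Hypothetical three-segment CDS (101 + 101 + 101 = 303 bp); phases come out 0, 1, 2.
for line in cds_gff3_lines("contig10", "example-PA", [(100, 200), (300, 400), (500, 600)]):
    print(line)
```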
Complete the following Gene Report Form for each gene in your project. Copy and paste the sections below to create as many copies as needed within this report. Be sure to create enough Isoform Report Forms within your Gene Report Form for all isoforms.
Gene report form
Gene name (e.g., D. biarmipes eyeless): D. biarmipes CG31997
Gene symbol (e.g., dbia_ey): dbia_CG31997
Approximate location in project (from 5' end to 3' end): 25673-27471
Number of isoforms in D. melanogaster: 2
Number of isoforms in this project: 2
Complete the following table for all the isoforms in this project:

| Name(s) of unique isoform(s) based on coding sequence | List of isoforms with identical coding sequences |
| --- | --- |
| CG31997-PB | CG31997-PA |

Names of the isoforms with unique coding sequences in D. melanogaster that are absent in this species: NA
Consensus sequence errors report form
Complete this section if you have identified errors in the project consensus sequence that affect the annotation of the gene described above. All of the coordinates reported in this section should be relative to the coordinates of the original project sequence.
Location(s) in the project sequence with consensus errors: NA
Note: For isoforms with identical coding sequences, you only need to complete the Isoform Report Form for one of these isoforms (i.e., using the name of the isoform listed in the left column of the table above). However, you should generate GFF, transcript, and peptide sequence files for ALL isoforms, whether or not their coding sequences are identical to those of other isoforms.
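Deciding which isoforms share an identical coding sequence comes down to grouping isoforms by their CDS. The following minimal sketch does that grouping. The isoform names and sequences are hypothetical.

```python
from collections import defaultdict


def group_isoforms_by_cds(isoform_cds):
    """Group isoform names that share an identical coding sequence.

    isoform_cds maps isoform name -> CDS nucleotide sequence.
    Returns one sorted list of names per distinct CDS, in first-seen order.
    """
    groups = defaultdict(list)
    for name, cds in isoform_cds.items():
        groups[cds.upper()].append(name)  # compare sequences case-insensitively
    return [sorted(names) for names in groups.values()]


# Hypothetical isoforms, for illustration only.
example = {
    "gene-PA": "ATGAAACCCTGA",
    "gene-PB": "atgaaaccctga",
    "gene-PC": "ATGCCCTGA",
}
print(group_isoforms_by_cds(example))  # [['gene-PA', 'gene-PB'], ['gene-PC']]
```

In the table above, each group corresponds to one row. One name goes in the left column and the rest go in the right column.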
Isoform report form
Complete this report form for each unique isoform listed in the table above. Copy and paste this form to create as many copies of this Isoform Report Form as needed.
Gene-isoform name (e.g., dbia_ey-PA): dbia_CG31997-PB
Names of the isoforms with identical coding sequences as this isoform: dbia_CG31997-PA
Is the 5' end of this isoform missing from the end of the project? No
If so, how many exons are missing from the 5' end:
Is the 3' end of this isoform missing from the end of the project? No
If so, how many exons are missing from the 3' end:
1. Gene Model Checker checklist
Enter the coordinates of your final gene model for this isoform into the Gene Model Checker and paste a screenshot of the checklist results into the box below:
Note: For projects with consensus sequence errors, report the exon coordinates relative to the original project sequence. Include the VCF file you generated above when you submit the gene model to the Gene Model Checker. The Gene Model Checker will use this VCF file to automatically revise the submitted exon coordinates.
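The report does not describe how the Gene Model Checker applies the VCF file. Purely as an illustration, the sketch below shows one simple way that indel corrections shift coordinates downstream of them. It assumes VCF-style (POS, REF, ALT) records with a leading anchor base. The function `to_corrected_coordinate` and the example corrections are hypothetical.

```python
def to_corrected_coordinate(coord, variants):
    """Map a 1-based coordinate on the original sequence to the corrected sequence.

    variants: VCF-style (POS, REF, ALT) records on the original sequence, where
    indels carry a leading anchor base (e.g., REF "A", ALT "AT" inserts a T).
    Only variants whose REF allele ends before `coord` shift it; a coordinate
    inside a deleted span is returned unshifted by this simplified sketch.
    """
    shift = 0
    for pos, ref, alt in variants:
        last_ref_base = pos + len(ref) - 1
        if last_ref_base < coord:
            shift += len(alt) - len(ref)
    return coord + shift


# Hypothetical corrections: insert one base after base 100, delete base 201.
corrections = [(100, "A", "AT"), (200, "GC", "G")]
for original in (50, 150, 250):
    print(original, "->", to_corrected_coordinate(original, corrections))
```

Coordinates between the two corrections move by +1. Coordinates beyond both are unchanged, because the insertion and the deletion cancel out.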
2. View the gene model on the Genome Browser
Use the custom track feature from the Gene Model Checker to capture a screenshot of your gene model shown on the Genome Browser for your project. Zoom in so that only this isoform is in the screenshot. (See page 12 of the Gene Model Checker user guide on how to do this; you can find the guide under “Help” → “Documentations” → “Web Framework” on the GEP website at http://gep.wustl.edu.) Include the following evidence tracks in the screenshot if they are available:
1. A sequence alignment track (D. mel Proteins or Other RefSeq)
2. At least one gene prediction track (e.g., Genscan)
3. At least one RNA-Seq track (e.g., RNA-Seq Alignment Summary)
4. A comparative genomics track (e.g., Conservation, D. mel. Net Alignment)
Paste a screenshot of your gene model as shown on the GEP UCSC Genome Browser into the box below:
Low-frequency RNA-Seq exon junctions not annotated: The evidence from the RNA-Seq TopHat tracks and Multiz alignments suggests that there might be additional isoforms resulting from alternative splicing at the 5' end of this gene (red arrows in the screenshot above). However, most of the TopHat junctions are supported by fewer than 10 reads. There is therefore insufficient evidence to postulate the presence of multiple novel isoforms in D. biarmipes compared to D. melanogaster.
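The read-support argument above can be expressed as a simple filter. This sketch assumes the TopHat junctions have already been reduced to intron coordinates plus a count of supporting reads. The `Junction` type, the function name, and all of the values are hypothetical.

```python
from typing import NamedTuple


class Junction(NamedTuple):
    intron_start: int  # 1-based first intron base
    intron_end: int    # 1-based last intron base
    reads: int         # number of spliced reads supporting the junction


MIN_SUPPORTING_READS = 10  # threshold used in this report


def partition_junctions(junctions, min_reads=MIN_SUPPORTING_READS):
    """Split junctions into (well supported, low frequency) lists by read count."""
    strong = [j for j in junctions if j.reads >= min_reads]
    weak = [j for j in junctions if j.reads < min_reads]
    return strong, weak


# Hypothetical junctions; coordinates and read counts are made up.
example_junctions = [
    Junction(25800, 26300, 152),
    Junction(25810, 26300, 4),
    Junction(25800, 26450, 7),
]
strong, weak = partition_junctions(example_junctions)
print(len(strong), "well supported;", len(weak), "low frequency")  # 1 well supported; 2 low frequency
```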
Extra CDS predicted by the SNAP gene predictor: SNAP predicted a CDS at 26,502-26,584 (blue arrow in the screenshot above) between the first and second CDSs of CG31997. The RNA-Seq Alignment Summary track shows that the region surrounding this prediction has low RNA-Seq read coverage (<20 reads). The region is also adjacent to a hAT DNA transposon fragment (see screenshot below).
An NCBI BLASTX search of the genomic region surrounding the SNAP CDS prediction (contig10:26400-26700) against the nr database did not detect any significant (E-value < 1e-5) sequence similarity to known proteins (see screenshot below).
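The BLASTX query region is given in 1-based, inclusive genome-browser coordinates, so extracting it from a sequence string requires an off-by-one adjustment. The following minimal sketch shows that adjustment. It uses a placeholder sequence of the project's length to stand in for contig10, and the helper names are hypothetical.

```python
def extract_region(sequence, start, end):
    """Return bases start..end of `sequence` (1-based, inclusive), as in contig10:26400-26700."""
    if not 1 <= start <= end <= len(sequence):
        raise ValueError(f"region {start}-{end} is outside 1-{len(sequence)}")
    return sequence[start - 1:end]  # shift start to 0-based; Python's exclusive end handles the rest


def to_fasta(header, sequence, width=60):
    """Format a sequence as FASTA with fixed-width lines."""
    lines = [f">{header}"]
    lines.extend(sequence[i:i + width] for i in range(0, len(sequence), width))
    return "\n".join(lines)


# Placeholder stand-in for contig10 (43,013 bp); not the real sequence.
contig_sequence = ("ACGT" * 10_754)[:43_013]
region = extract_region(contig_sequence, 26400, 26700)
print(len(region))  # 301
print(to_fasta("contig10:26400-26700", region).splitlines()[0])  # >contig10:26400-26700
```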
An NCBI BLASTN search of this region against the nt database detected five significant matches to predicted mRNAs in Drosophila suzukii (see screenshot below).
The E-values for these D. suzukii matches range from 4e-10 to 3e-06, and they correspond to three different predicted genes (LOC108013970, LOC108011950, and LOC108014610). All of these matches are RefSeq predictions that have not been confirmed experimentally. There are no significant matches to RefSeq records supported by experimental evidence, and no significant matches to mRNAs in any species other than D. suzukii.
We cannot rule out the possibility that this region of contig10 contains an untranslated region of a nearby gene. Nevertheless, there is insufficient evidence to postulate a novel isoform of CG31997 in D. biarmipes compared to D. melanogaster. This feature lies close to the hAT DNA transposon and has multiple matches to predicted transcripts in D. suzukii. An alternative explanation is therefore that the feature is part of a transposon found in both D. biarmipes and D. suzukii. Hence, we have omitted this predicted CDS from our annotation of the CG31997 ortholog in D. biarmipes.
3. Alignment between the submitted model and the D. melanogaster ortholog
Show an alignment between the protein sequence for your gene model and the protein sequence from the putative D. melanogaster ortholog. You can either use the protein alignment generated by the Gene Model Checker (available through the “View protein alignment” link under the “Dot Plot” tab) or generate a new alignment using the “Align two or more sequences” feature (bl2seq) at the NCBI BLAST website. Paste a screenshot of the protein alignment into the box below:
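To show what a pairwise protein alignment computes, here is a minimal global (Needleman-Wunsch) alignment. It is a simplified illustration, not the method used by the Gene Model Checker or by bl2seq; BLASTP computes local alignments, scored by default with BLOSUM62. Identity scoring and a linear gap penalty are simplifying assumptions, and the peptides are hypothetical.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment with a linear gap penalty.

    Returns (aligned_a, aligned_b, score).
    """
    n, m = len(a), len(b)

    def pair_score(x, y):
        return match if x == y else mismatch

    # score[i][j] is the best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1]),
                score[i - 1][j] + gap,  # residue of a aligned to a gap
                score[i][j - 1] + gap,  # residue of b aligned to a gap
            )

    # Trace back from the bottom-right cell, preferring diagonal moves on ties.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]


# Hypothetical peptides: the first has two extra residues.
top, bottom, total = global_align("MKAAVL", "MKVL")
print(top)     # MKAAVL
print(bottom)  # MK--VL
print(total)   # 0
```

Extra residues in one model show up as a run of gap characters in the other. The 8-residue expansion described later in this report appears in the real alignment in this way.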
4. Dot plot between the submitted model and the D. melanogaster ortholog
Paste a screenshot of the dot plot of your submitted model against the putative D. melanogaster ortholog (generated by the Gene Model Checker) into the box below. Provide an explanation for any anomalies on the dot plot (e.g., large gaps, regions with no sequence similarity).
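A dot plot marks every pair of positions where the two sequences share an identical short word. The sketch below uses exact word matches on hypothetical peptides. The Gene Model Checker's own dot plot settings are not described in this report, so the word size here is an assumption.

```python
from collections import defaultdict


def dot_plot_points(seq_a, seq_b, word=3):
    """Return 0-based (i, j) pairs where seq_a[i:i+word] == seq_b[j:j+word]."""
    positions_in_b = defaultdict(list)
    for j in range(len(seq_b) - word + 1):
        positions_in_b[seq_b[j:j + word]].append(j)
    points = []
    for i in range(len(seq_a) - word + 1):
        for j in positions_in_b.get(seq_a[i:i + word], []):
            points.append((i, j))
    return points


def render_dot_plot(seq_a, seq_b, word=3):
    """Text dot plot: one row per word of seq_a, one column per word of seq_b."""
    hits = set(dot_plot_points(seq_a, seq_b, word))
    rows = []
    for i in range(len(seq_a) - word + 1):
        rows.append("".join("*" if (i, j) in hits else "." for j in range(len(seq_b) - word + 1)))
    return "\n".join(rows)


# Hypothetical peptides: the first carries two extra residues (AA) after "MK".
print(dot_plot_points("MKAAVL", "MKVL", word=2))  # [(0, 0), (4, 2)]
print(render_dot_plot("MKAAVL", "MKVL", word=2))
```

In the rendered text, the rows without any dots mark the inserted residues. After them, the diagonal resumes at an offset, which is the kind of vertical gap the note below associates with exon-boundary problems.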
The dot plot shows that the last two CDSs of CG31997-PB are highly conserved between the proposed D. biarmipes gene model and the D. melanogaster ortholog. The protein alignment shows that the amino acids at the ends of the second and third CDSs are not identical but have similar chemical properties. In addition, these two CDSs have the same lengths in D. biarmipes and D. melanogaster.
The dot plot also shows that the beginning of the first CDS of CG31997-PB is only weakly conserved between D. biarmipes and D. melanogaster. It also shows that the first CDS of the D. biarmipes gene model is longer than the orthologous CDS in D. melanogaster. The protein alignment shows that the first CDS of the proposed D. biarmipes gene model contains 8 additional amino acids compared to D. melanogaster. Examination of this region in the GEP UCSC Genome Browser shows that there is only one methionine in frame +2 that could serve as the start codon for CG31997-PB (see screenshot below). The expansion of this CDS is consistent with the BLASTX alignment, the N-SCAN gene prediction, and the available RNA-Seq data. Consequently, our annotation expands this CDS (1_10755_0) in order to retain this isoform in D. biarmipes.
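Checking for alternative start codons means scanning one reading frame for ATG codons that are not cut off by an in-frame stop codon. This minimal sketch assumes that frame +N means translation begins at the Nth base of the supplied sequence; the browser may instead number frames relative to the contig. The sequence and function names are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def atg_positions(sequence, frame):
    """1-based positions of every ATG codon read in forward frame 1, 2, or 3."""
    seq = sequence.upper()
    return [i + 1 for i in range(frame - 1, len(seq) - 2, 3) if seq[i:i + 3] == "ATG"]


def candidate_starts(sequence, frame, limit):
    """ATGs in `frame`, within bases 1..limit, with no in-frame stop codon after them up to `limit`."""
    seq = sequence.upper()
    candidates = []
    for i in range(frame - 1, min(limit, len(seq)) - 2, 3):
        codon = seq[i:i + 3]
        if codon in STOP_CODONS:
            candidates = []  # a stop codon rules out every ATG upstream of it
        elif codon == "ATG":
            candidates.append(i + 1)
    return candidates


# Hypothetical sequence: read in frame 2, the codons are ATG TAA ATG GCC ATG AAA.
example = "C" + "ATG" + "TAA" + "ATG" + "GCC" + "ATG" + "AAA"
print(atg_positions(example, 2))                   # [2, 8, 14]
print(candidate_starts(example, 2, len(example)))  # [8, 14]
```

Setting `limit` to the start of the conserved region leaves only the ATGs that could extend the CDS upstream without crossing a stop codon.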
Note: Large vertical and horizontal gaps near exon boundaries in the dot plot often indicate that an incorrect splice site might have been picked. Please re-examine these regions and provide a justification for why you selected this particular set of donor and acceptor sites.
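One quick way to re-examine splice sites is to read the dinucleotides at each intron boundary. The sketch below works on plus-strand models with hypothetical coordinates. It checks only the GT-AG consensus (and the rarer GC-AG variant), not the wider splice-site motif.

```python
def check_splice_sites(sequence, exons):
    """Report donor/acceptor dinucleotides for each intron between consecutive exons.

    exons: (start, end) pairs, 1-based inclusive, plus strand, sorted by position.
    Returns (intron_start, intron_end, donor, acceptor, consensus) tuples, where
    consensus is True for GT-AG introns and for the rarer GC-AG introns.
    """
    seq = sequence.upper()
    report = []
    for (_, left_end), (right_start, _) in zip(exons, exons[1:]):
        intron_start, intron_end = left_end + 1, right_start - 1
        donor = seq[intron_start - 1:intron_start + 1]  # first two intron bases
        acceptor = seq[intron_end - 2:intron_end]        # last two intron bases
        consensus = donor in {"GT", "GC"} and acceptor == "AG"
        report.append((intron_start, intron_end, donor, acceptor, consensus))
    return report


# Hypothetical gene: exon 1 = bases 1-6, intron = 7-16, exon 2 = 17-22.
example = "ATGAAA" + "GTAAGTTTAG" + "CCCTGA"
print(check_splice_sites(example, [(1, 6), (17, 22)]))  # [(7, 16, 'GT', 'AG', True)]
print(check_splice_sites(example, [(1, 5), (17, 22)]))  # [(6, 16, 'AG', 'AG', False)]
```

A donor or acceptor that fails the check does not prove the model wrong. It is, however, a good place to start when justifying the chosen sites.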