2014 10-9-blogging genomesannotation


Upload: yannick-wurm

Post on 02-Jul-2015


Category:

Science



DESCRIPTION

QMUL SBC322 Eco-evo genomics. Peer review concepts & blogging science setup. Difficulties of genomics with emerging model organisms.

TRANSCRIPT

2. Peer review process
- In academia: for funding, for publication!
- Outside academia

3. Elsevier's standard official process

4. Do It Yourself
- See & feel the value of the peer review process
- Improve writing skills
- Improve critical reading skills
- Construct your online profile: credit as author, as reviewer, as editor

5. Plan?
- 8 papers, 1 blog post per paper
- 100-500 words
- 2 authors per blog post
- from other theme (mix up groups)

Role                                    # needed in role   # of reviews to do
Normal person                           everyone else      2
Web person                              2                  1
Editor (responsible for 4 blog posts)   2                  0

6. Some details: http://goo.gl/U0LlIK

7. Todos
- Decide your groups for authoring (must pair with different people than for the presentation!); you cannot present and blog as part of the same "theme": https://etherpad.mozilla.org/obVAlZUq5D
- Editors: determine who is responsible for which papers; alert authors when their stuff is due; determine who receives which review task when
- Web people need to decide on platform (e.g. tumblr & color scheme etc), name, potential guidelines, and set up

8. Some implementation tips/resources

9. Writing a good blog post
- Respect basic writing rules :)
- Similar to a New Scientist article?
- Lots of online resources, e.g.: http://scienceofblogging.com/how-to-write-a-good-research-blog-post/

10. Reviewing
- Mention strengths
- Most important: potential for improvement!! Major themes; constructive suggestions if possible: structure / wording clarification / elimination
- Structuring a review: general statement, then a numbered list of specific points that need to be addressed

11. Responding to a review
- Fix main text
- Specifically reply to each point raised by reviewers!

12. Todos (repeated from slide 7)

13. Genomics on emerging model organisms

14. Genomes of emerging model organisms [email protected] SBC322

15. So you want to sequence a genome
- Sampling? Algorithms prefer low diversity
- Sequencing approach? Paired end? Which sequencer? What is needed for scaffolding?

16. Scaffolding

17. So you want to sequence a genome
- Sampling? Algorithms prefer low diversity
- Sequencing approach? Paired end? Which sequencer? What is needed for scaffolding?
- Input data QA? Sequencer statistics; fastqc; bio-relevant measurements? (e.g. % mapping to known data). Unable to detect all errors!

18. So you want to sequence a genome
- Trimming/deduplicating/filtering: removing excess/redundant data, removing errors
- Which assembler? Used by others? (publications / online list / forum / assemblathon) Something new?!
- Assembly result QA: sequence statistics (e.g., QUAST); bio-relevant measures (e.g., CEGMA)

19. Perfect parameters
Instead: need to test many combinations of:
- trimming
- filtering
- different assembly software

20. 
[Slide shows a full screen of raw, unannotated genome sequence, beginning ACGACACGGCGCGTCGAGCGTGCACAGAGAAGTTGAATTATCGAGGG...]
Sequencing a genome is now cheap & easy. But making it useable is hard.

21. 
Gene prediction
- Dozens of software algorithms: dozens of predictions (Yandell & Ence 2013, NRG)
- Why?
- Evidence, shown against the sequence: TTTTtACCTGTTTTtGAAAAGGTAATTTTCTTTAGATATATACAGTTTGTAATaTTAGGTATTTTATAAACAGTGTGTATATTTCTTACAATATAAAAGACACAATTGCAAACTAGCATGATTGTAAACAATTGCTAAACGGATCAATATAAATTAAAATTGTAATATTAAGTATCAAACCGATAATTTTT

22. Gene prediction
- Dozens of software algorithms: dozens of predictions
- 20% failure rate: missing pieces, extra pieces, incorrect merging, incorrect splitting
- Visual inspection... and manual fixing required. 1 gene = 20 minutes to 3 days (Yandell & Ence 2013, NRG)
- Evidence and consensus, shown against the sequence: TTTTtACCTGTTTTtGAAAAGGTAATTTTCTTTAGATATATACAGTTTGTAATaTTAGGTATTTTATAAACAGTGTGTATATTTCTTACAATATAAAAGACACAATTGCAAACTAGCATGATTGTAAACAATTGCTAAACGGATCAATATAAATTAAAATTGTAATATTAAGTATCAAACCGATAATTTTT

23. Annotation in practice
- Annotation software (in Google Chrome): Afra
- Login: http://afra.sbcs.qmul.ac.uk/#login
- Your task: work through 3 examples, then 5 additional gene annotations (through the Curate button)
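
To get a feel for what the gene prediction figures mean in practice (20% failure rate, 20 minutes to 3 days of manual fixing per gene), here is a rough back-of-the-envelope sketch in Python. The function name `curation_effort` and the 15,000-gene genome size are illustrative assumptions, not something from the slides; only the failure rate and the per-gene time range are taken from them.

```python
from datetime import timedelta

# Figures quoted on the gene prediction slide: about 20% of automated gene
# predictions need fixing, and one manual fix takes 20 minutes to 3 days.
FAILURE_RATE = 0.20
MIN_FIX = timedelta(minutes=20)
MAX_FIX = timedelta(days=3)


def curation_effort(n_genes, failure_rate=FAILURE_RATE):
    """Return (genes_to_fix, best_case, worst_case) for manual curation.

    Hypothetical helper for illustration only; times are plain multiples
    of the per-gene range and ignore parallel work by several curators.
    """
    genes_to_fix = round(n_genes * failure_rate)
    return genes_to_fix, genes_to_fix * MIN_FIX, genes_to_fix * MAX_FIX


# ASSUMPTION: 15,000 predicted genes is a made-up round number.
n_fix, best, worst = curation_effort(15_000)
one_day = timedelta(days=1)
print(f"{n_fix} genes to fix, roughly {best / one_day:.0f} "
      f"to {worst / one_day:.0f} person-days")

# The practical task: 5 gene annotations, each inspected by hand
# (failure_rate=1.0 means every one of the 5 is looked at).
_, task_best, task_worst = curation_effort(5, failure_rate=1.0)
print(f"5 annotations: {task_best} to {task_worst}")
```

Even under these crude assumptions, the spread between best and worst case is the point: automated predictions are cheap, and the manual fixing is where the time goes.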