Mining hidden information from your 454 data using modular and database-oriented methods
Joachim De Schrijver
Overview
Short introduction on 454 sequencing
Variant Identification Pipeline
Possibilities of a DB-oriented pipeline
Examples:
◦ Coverage
◦ Improving PCR
◦ Fast Q assessment
◦ Homopolymers
Introduction (i)
Roche/454 GS-FLX sequencing:
◦ Pyrosequencing
◦ ± 400,000 reads/run
◦ Average read length: 200–250 bp
Applications:
◦ Resequencing: variant identification
◦ De novo (genome) sequencing: assembly of new regions, plasmids or entire genomes
Standard software:
◦ Variants: Amplicon Variant Analyzer (AVA)
◦ Assembly: standard 454 assembler
Introduction (ii)
Standard software
◦ + Easy to use
◦ + Reproducible results on similar datasets
◦ + GUI (graphical user interface)
◦ - No answer for ‘non-standard’ questions: methylation experiments, different types of experiments grouped together, …
◦ - What about ‘hidden’ information? Homopolymer error rates, quality score vs. length of the sequenced read, ‘multirun’ information, …
Variant Identification Pipeline (i)
Modular and database-oriented pipeline
Modular:
◦ Efficient planning
◦ Scalable
Database (DB):
◦ No loss of data
◦ Several runs can be grouped together
Variant Identification Pipeline (ii)
Basic idea: data are processed once and stored in the DB. Results (reports) are calculated ‘on the fly’ from the DB data.
◦ Fast & efficient
◦ Calculations only happen once
◦ Everybody can access the database without risk of data modification
◦ Reporting is independent of the data processing
Paper: De Schrijver et al. 2009. Analysing 454 sequences with a modular and database oriented Variant Identification Pipeline
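The ‘process once, report on the fly’ idea can be sketched with Python's built-in sqlite3 module. This is a minimal illustration under stated assumptions, not the VIP implementation: the `reads` table, its columns and the helpers `build_db` and `mid_report` are hypothetical. Each read is stored once; a report is just a query, so it can be recomputed per run or across grouped runs without reprocessing any data.

```python
import sqlite3

# Hypothetical schema: one row per read, written once by the processing modules.
SCHEMA = """CREATE TABLE reads (
    run TEXT, lane INTEGER, mid INTEGER, amplicon TEXT,
    length INTEGER, mapped INTEGER)"""

def build_db(records):
    """Processing step: store per-read results once."""
    con = sqlite3.connect(":memory:")
    con.execute(SCHEMA)
    con.executemany("INSERT INTO reads VALUES (?, ?, ?, ?, ?, ?)", records)
    con.commit()
    return con

def mid_report(con, per_run=True):
    """Reporting step: read count and mapped fraction per MID, computed on the fly.

    per_run=False groups all stored runs together (a 'multirun' view).
    """
    cols = "run, mid" if per_run else "mid"  # fixed strings only, never user input
    sql = (f"SELECT {cols}, COUNT(*), AVG(mapped) FROM reads "
           f"GROUP BY {cols} ORDER BY {cols}")
    return con.execute(sql).fetchall()

# Hypothetical example data: (run, lane, mid, amplicon, length, mapped)
records = [
    ("run1", 1, 1, "A", 250, 1),
    ("run1", 1, 1, "A", 90, 0),
    ("run1", 2, 2, "B", 240, 1),
    ("run2", 1, 1, "A", 230, 1),
]
con = build_db(records)
print(mid_report(con))                 # per run and MID
print(mid_report(con, per_run=False))  # runs grouped together
```

Because the report never modifies the table, it can be rerun with a different grouping (per lane, per amplicon) by changing only the query.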
Possibilities of a DB-oriented pipeline
VIP was originally developed for variant identification
Now being used in:
◦ Amplicon resequencing
◦ De novo shotgun sequencing
◦ Methylation experiments
◦ ~ Solexa experiments
‘Hidden’ data can be extracted using intelligent querying strategies
Results per lane / MID (multiplex identifier) / run …
Example: Detailed coverage
Coverage can be calculated per:
◦ Lane
◦ MID
◦ Amplicon
◦ Base position
Assessment of errors (PCR dropouts vs. human errors)
[Figure: MID frequency (unmapped); frequency (0–14%) for MIDs 1–12]
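Coverage per base position can be sketched with a difference array, and coverage per lane or MID is a simple count. The input formats (alignments as `(amplicon, start, end)` tuples, reads as dicts) and the helpers `per_base_coverage`, `reads_per` and `uncovered_amplicons` are hypothetical. Flagging amplicons with no reads at all is only one possible starting point for spotting dropouts; the slide does not say how VIP separates PCR dropouts from human errors.

```python
from collections import Counter
from itertools import accumulate

def per_base_coverage(alignments, amplicon_lengths):
    """Depth at every base of every amplicon.

    alignments: iterable of (amplicon, start, end), 0-based and half-open on
    the amplicon's own coordinates (a hypothetical input format).
    """
    diff = {amp: [0] * (n + 1) for amp, n in amplicon_lengths.items()}
    for amp, start, end in alignments:
        diff[amp][start] += 1  # a read starts covering at `start`
        diff[amp][end] -= 1    # and stops covering at `end`
    # Running sums over the difference array give the depth per base.
    return {amp: list(accumulate(d[:-1])) for amp, d in diff.items()}

def reads_per(reads, key):
    """Read counts per lane, MID, amplicon, ... (reads are dicts here)."""
    return Counter(r[key] for r in reads)

def uncovered_amplicons(coverage):
    """Amplicons without a single read: candidates for a closer look."""
    return [amp for amp, depth in coverage.items() if not any(depth)]
```

For example, two reads on amplicon A spanning bases 0–2 and 2–4 give depths `[1, 1, 2, 1, 1]`, while an amplicon with no reads shows up in `uncovered_amplicons`.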
Example: Improving PCR
Amplicon resequencing experiment
Goal: variant identification
Length distributions of:
◦ Mapped reads
◦ Unmapped reads
◦ ‘Short’ mapped reads
Additional length separation + improved PCR
Result: improved efficiency
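The three length distributions can be sketched as binned histograms. The slides do not define ‘short’; here a mapped read counts as short when it is below a hypothetical fraction (`short_fraction`) of its amplicon's length, and the input tuples and `length_distributions` helper are likewise assumptions.

```python
from collections import Counter

def length_distributions(reads, short_fraction=0.5, bin_size=25):
    """Histogram read lengths for mapped, unmapped and 'short' mapped reads.

    reads: iterable of (read_length, mapped, amplicon_length); amplicon_length
    may be None for unmapped reads. short_fraction is a hypothetical cut-off.
    """
    hists = {"mapped": Counter(), "unmapped": Counter(), "short_mapped": Counter()}
    for length, mapped, amp_len in reads:
        if not mapped:
            cls = "unmapped"
        elif length < short_fraction * amp_len:
            cls = "short_mapped"
        else:
            cls = "mapped"
        # Bin by the lower edge of a bin_size-wide length interval.
        hists[cls][(length // bin_size) * bin_size] += 1
    return hists
```

A large ‘short mapped’ or unmapped peak would be the kind of signal that motivates an extra length-separation step before sequencing.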
Example: Homopolymers
Can the length of a homopolymer be assessed using the Q score?
Yes, when the homopolymer length is < 6 bp
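One way to approach the question is to tabulate Q scores by homopolymer length. This sketch assumes reads as hypothetical `(sequence, per-base Q list)` pairs and summarises each homopolymer by the lowest Q within it; that choice is an assumption, since the slide does not say which Q value VIP uses.

```python
from collections import defaultdict
from itertools import groupby
from statistics import mean

def homopolymer_q_profile(reads):
    """Mean of the lowest Q within each homopolymer, per homopolymer length.

    reads: iterable of (sequence, qualities) with one Q value per base.
    """
    by_len = defaultdict(list)
    for seq, quals in reads:
        pos = 0
        for _base, run in groupby(seq):  # consecutive identical bases
            n = len(list(run))
            by_len[n].append(min(quals[pos:pos + n]))
            pos += n
    return {n: mean(qs) for n, qs in sorted(by_len.items())}
```

If the profile keeps dropping with length up to some point and then flattens, Q no longer separates longer homopolymers, which is the kind of pattern the < 6 bp conclusion describes.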
Example: Q assessment
Fast assessment of the quality of a run
[Figure: two panels of Q value ~ position, labelled ‘Lab work OK’ and ‘Errors in lab work’]
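A Q value ~ position profile like the one plotted can be sketched as the mean Q at each position, averaged over the reads that reach it. The input (one list of Q values per read), the helper names and the threshold of 20 in `first_position_below` are illustrative assumptions, not cut-offs from the slides.

```python
def mean_q_by_position(qualities):
    """Mean Q at each position, over reads long enough to reach it."""
    sums, counts = [], []
    for quals in qualities:
        for i, q in enumerate(quals):
            if i == len(sums):  # first read to reach this position
                sums.append(0)
                counts.append(0)
            sums[i] += q
            counts[i] += 1
    return [s / c for s, c in zip(sums, counts)]

def first_position_below(profile, threshold=20):
    """1-based position where the mean Q first drops below threshold, else None."""
    for pos, q in enumerate(profile, start=1):
        if q < threshold:
            return pos
    return None
```

Comparing where the profile collapses across runs gives a quick first check on a run before any variant calling.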
Acknowledgements
Biobix – UGent: Wim Van Criekinge, Tim De Meyer, Geert Trooskens, Tom Vandekerkhove, Leander Van Neste, Gerben Menschaert
CMG – UZ Gent: Jo Vandesompele, Jan Hellemans, Filip Pattyn, Steve Lefever, Kim De Leeneer, Jean-Pierre Renard
NXT-GNT: Paul Coucke, Sofie Bekaert, Filip Van Nieuwerburgh, Dieter Deforce, Wim Van Criekinge, Jo Vandesompele