Alexis Dereeper, François Sabot
Analysis of NGS raw data with Galaxy
Cleaning, data control, alignment, polymorphism
Alexis Dereeper CIBA courses – Brasil 2011
Aim of the Tutorial classes:
1- Galaxy vs Command line
2- Understand FASTQ files
3- Cleaning of Illumina data (FASTQ)
4- Perform an assembly
5- Perform a mapping of Illumina reads on a reference sequence
6- Cleaning of a multiple SAM file
1- Galaxy
Main server: http://main.g2.bx.psu.edu/
CIRAD server: http://gohelle.cirad.fr/galaxy/
TOOLS DATA
WEB APPLICATION
- “Click'n'Play” system
- Transparent for the user
MODULAR
- Numerous default bricks (already integrated)
- Customizable bricks can be added
MULTIPLE
- Based on a web server (Apache...)
- On a single machine, or a cluster...
BUT
- Simple support
- Much less powerful than the terminal
- Only for routine analyses
- Only for limited amounts of data
CONNECTION FOR THE TUTORIAL CLASSES:
http://gohelle.cirad.fr/galaxy/
Connecting...
Add data...
Import data from Galaxy libraries
FASTQ file → TEXT file
STRUCTURE:
@HWUSI-EAS454_0006:1:112:14105:5498#CTTGTA
CGCCAAGAAGTGTAGCAAAACGGCAGAGCTCGTGGATTAAACAAACAGAGGATTTCGGTGAGGATTGAGGGGGAGT
+
cfffcfeffdeefefffcffffffffcffeffffdffffafcfffffdffffdfefeddf^eececfffdfcbffb
@HWUSI-EAS454_0006:1:37:16314:3410#CTTGTA
AGTGTAGCAAAACGGCAGAGCTCGTGGATTAAACAAACAGAGGATTTCGGTGAGGATTGAGGGGGAGTGGTGGCCG
+
`bTbbccccceeeeeceeeecccYeedded`ceec]dddde^a`deeeec\`dddcbaadadYd`]]Jc_^bc^^\
@HWUSI-EAS454_0006:1:112:14105:5498#CTTGTA
CGCCAAGAAGTGTAGCAAAACGGCAGAGCTCGTGGATTAAACAAACAGAGGATTTCGGTGAGGATTGAGGGGGAGT
+
cfffcfeffdeefefffcffffffffcffeffffdffffafcfffffdffffdfefeddf^eececfffdfcbffb
SEQUENCE NAME (line 1, starts with @)
IUPAC SEQUENCE (line 2)
QUALITY IN ASCII (line 4, one character per base)
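The four-line structure described above can be sketched in Python. This is only an illustration: the names `FastqRecord` and `parse_fastq` are hypothetical and are not part of Galaxy or of any tool used in this course. It assumes each record occupies exactly four lines (no wrapped sequences).

```python
from typing import Iterable, Iterator, NamedTuple


class FastqRecord(NamedTuple):
    name: str       # line 1, without the leading "@"
    sequence: str   # line 2, IUPAC sequence
    quality: str    # line 4, one ASCII character per base


def parse_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Yield one record for every group of four non-empty lines."""
    cleaned = [line.rstrip("\r\n") for line in lines]
    cleaned = [line for line in cleaned if line]
    if len(cleaned) % 4 != 0:
        raise ValueError("expected a multiple of four non-empty lines")
    for i in range(0, len(cleaned), 4):
        header, sequence, separator, quality = cleaned[i:i + 4]
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError(f"malformed record starting at non-empty line {i + 1}")
        if len(sequence) != len(quality):
            raise ValueError("sequence and quality must have the same length")
        yield FastqRecord(header[1:], sequence, quality)


# The record shown on the slide.
example = [
    "@HWUSI-EAS454_0006:1:112:14105:5498#CTTGTA",
    "CGCCAAGAAGTGTAGCAAAACGGCAGAGCTCGTGGATTAAACAAACAGAGGATTTCGGTGAGGATTGAGGGGGAGT",
    "+",
    "cfffcfeffdeefefffcffffffffcffeffffdffffafcfffffdffffdfefeddf^eececfffdfcbffb",
]
record = next(parse_fastq(example))
print(record.name, len(record.sequence))
```

The length check reflects the one-character-per-base rule: a record whose quality line is shorter or longer than its sequence is rejected.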
f → Quality = 38 (ASCII code 102 – offset 64)
WHAT IS QUALITY?
Quality value Q is an integer mapping of p (i.e., the probability that the corresponding base call is incorrect).
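As a sketch of this mapping, the snippet below decodes quality characters with the offset of 64 used on the slide, then converts Q to p. The conversion assumes the standard Phred definition, Q = -10 log10(p), which the slide text does not spell out. The function names are hypothetical.

```python
ILLUMINA_OFFSET = 64  # offset from the slide: 'f' is ASCII 102, and 102 - 64 = 38


def quality_score(char: str, offset: int = ILLUMINA_OFFSET) -> int:
    """Decode one quality character into its integer score Q."""
    return ord(char) - offset


def error_probability(q: int) -> float:
    """Convert Q to p, assuming the Phred definition Q = -10 * log10(p)."""
    return 10 ** (-q / 10)


# First ten quality characters of the example read.
quality_line = "cfffcfeffd"
scores = [quality_score(c) for c in quality_line]
print(scores)
print(error_probability(quality_score("f")))
```

Under this assumption, Q = 20 corresponds to one expected error in 100 base calls and Q = 30 to one in 1000.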
FastQC: quality control
http://www.bioinformatics.bbsrc.ac.uk/projects/download.html#fastqc
Why do we need to clean?
To remove remaining adapters/primers and low-quality sequences
→ CutAdapt
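To make the two cleaning operations concrete, here is a deliberately naive sketch. It is not Cutadapt's algorithm: Cutadapt tolerates mismatches and partial adapter matches, whereas this version looks only for an exact occurrence. The adapter sequence, the read, and the threshold value are hypothetical examples, and the quality offset of 64 is the one used earlier in these slides.

```python
def trim_adapter(sequence: str, quality: str, adapter: str) -> tuple[str, str]:
    """Cut the read at the first exact occurrence of the adapter, if any."""
    position = sequence.find(adapter)
    if position == -1:
        return sequence, quality
    return sequence[:position], quality[:position]


def trim_low_quality_tail(sequence: str, quality: str,
                          min_quality: int, offset: int = 64) -> tuple[str, str]:
    """Remove bases from the 3' end while their score is below min_quality."""
    end = len(sequence)
    while end > 0 and ord(quality[end - 1]) - offset < min_quality:
        end -= 1
    return sequence[:end], quality[:end]


# Hypothetical read: 8 genomic bases followed by a 10-base adapter.
read_sequence = "ACGTACGT" + "AGATCGGAAG"
read_quality = "ffffffBB" + "BBBBBBBBBB"   # 'f' decodes to 38, 'B' to 2

without_adapter = trim_adapter(read_sequence, read_quality, "AGATCGGAAG")
cleaned = trim_low_quality_tail(*without_adapter, min_quality=20)
print(without_adapter)
print(cleaned)
```

Running adapter removal first and quality trimming second is a choice made for this example only.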
[Screenshot of the tool settings; values highlighted on the slide: 20, 7, 70]
Your data are now ready to be analyzed...
Concatenate files
Untested Tools → NGS → Assembly → Assemble with MIRA
BLAST of putative contigs against reference
Separate sequences by original individuals RC1, RC2...
Using regular expressions in Galaxy:
→ RC[13456789], then remove the matching reads => keeps RC2 (RC10 is removed too, because it starts with RC1)
→ RC[123456789]_, then remove the matching reads => keeps RC10
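The effect of these two patterns can be checked outside Galaxy with Python's `re` module. The read names below are hypothetical: they only assume that each name contains the individual's label followed by an underscore, as the second pattern implies.

```python
import re


def remove_matching(names: list[str], pattern: str) -> list[str]:
    """Keep only the names in which the pattern is NOT found."""
    regex = re.compile(pattern)
    return [name for name in names if not regex.search(name)]


# Hypothetical read names for individuals RC1 to RC10.
names = [f"RC{i}_read" for i in range(1, 11)]

kept_rc2 = remove_matching(names, r"RC[13456789]")
kept_rc10 = remove_matching(names, r"RC[123456789]_")
print(kept_rc2)
print(kept_rc10)
```

The first pattern has no trailing underscore, so `RC1` also matches the beginning of `RC10`. The second pattern requires the underscore right after a single digit, which is what spares `RC10`.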
Mapping: map paired-end reads onto a reference
1- Compute positions for each read
2- Associate positions of each member of the pair
3- Select the most probable position that satisfies the conditions
4- Write a SAM output file
Reference From History: Shared data/Formation/PreProcess/reference.fasta
Library: Paired-end
FASTQ files: From your history
BWA setting to use: Commonly Used
Unselect “Suppress the header in the output SAM file”
Click
SAM output file (Sequence Alignment/Map)
Sort the SAM file by coordinate
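What coordinate sorting means can be sketched in a few lines of Python. This is a simplification and not the Galaxy tool itself: header lines (starting with @) are kept first, alignments are ordered by reference name (column 3) and then by leftmost position (column 4), and unmapped reads (reference name `*`) go last. Reference names are compared alphabetically here, whereas dedicated tools generally follow the order declared in the header. The function name and the alignment lines are hypothetical.

```python
def sort_sam_by_coordinate(lines: list[str]) -> list[str]:
    """Return header lines first, then alignments sorted by (RNAME, POS)."""
    header = [line for line in lines if line.startswith("@")]
    records = [line for line in lines if line and not line.startswith("@")]

    def coordinate(line: str) -> tuple[bool, str, int]:
        fields = line.split("\t")
        reference, position = fields[2], int(fields[3])
        # The leading boolean pushes unmapped reads ("*") to the end.
        return reference == "*", reference, position

    return header + sorted(records, key=coordinate)


# Hypothetical SAM content: one header line and three alignment lines.
sam_lines = [
    "@SQ\tSN:ref\tLN:1000",
    "read_b\t0\tref\t150\t60\t4M\t*\t0\t0\tACGT\tffff",
    "read_c\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tffff",
    "read_a\t0\tref\t20\t60\t4M\t*\t0\t0\tACGT\tffff",
]
sorted_lines = sort_sam_by_coordinate(sam_lines)
print([line.split("\t")[0] for line in sorted_lines])
```

Keeping the header in the output matters here: it is the reason the mapping step asks to leave “Suppress the header in the output SAM file” unselected.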
Creating a workflow for automated analysis
Workflow: how to avoid running the whole process by hand