Trinity College Dublin, The University of Dublin
Data download: bioinf.gen.tcd.ie/GE3M25/project
Get .fastq.gz file associated with your student ID
http://bioinf.gen.tcd.ie/GE3M25/project
GE3M25: Data Analysis, Class 3
Karsten Hokamp, PhD, Genetics
TCD, 30/11/2015
GE3M25 Data Handling Module Content
Python Programming
Bioinformatics
ChIP-Seq analysis
Class 3: Project Data
• Download project data
• Quality control
• Trimming
• Read mapping
• Visualisation
Next Generation Sequencing - Applications
Xu F, Wang Q, Zhang F, Zhu Y, Gu Q, Wu L, Yang L, Yang X. Impact of Next-Generation Sequencing (NGS) technology on cardiovascular disease research. Cardiovasc Diagn Ther 2012;2(2):138-146
Source: Bio-Rad
ChIP-Seq Basics
ChIP = Chromatin ImmunoPrecipitation
Chromatin = highly ordered packaging of DNA and histones together
Rosa, S.; Shaw, P. Insights into Chromatin Structure and Dynamics in Plants. Biology 2013, 2, 1378-1410.
Immunoprecipitation (IP) is the technique of precipitating a protein antigen out of solution using an antibody that specifically binds to that particular protein.
ChIP-Seq Basics
Steps in this class:
1. Download FastQ data set (ChIP-Seq of TF in yeast)
2. Quality control (FastQC)
3. Storage of FastQC report file
4. Read mapping (Bowtie2)
5. Generate indexed and sorted BAM file
6. Visualisation (IGV)
7. Store BAM and index files
GE3M25 Project
Optional steps in this class:
1. Trimming by quality (UrQt)
2. Trimming for Illumina Universal Adapter (trim_galore)
3. Trimming for other adapters (trim_galore)
4. Other read mapper (BWA)
5. Comparison of results
6. Upload of most suitable BAM and index files
GE3M25 Project
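The optional trimming steps can be illustrated with a minimal Python sketch. It assumes Phred+33 quality encoding and uses AGATCGGAAGAGC, the start of the Illumina Universal Adapter (the sequence trim_galore looks for by default), as the adapter. Both functions only handle exact matches at the 3' end. This is a simplified illustration, not the algorithm used by UrQt or trim_galore, which tolerate mismatches and choose cut points more carefully. The helper names and the thresholds min_q = 20 and min_overlap = 3 are illustrative assumptions.

```python
from __future__ import annotations

# Simplified 3' trimming for illustration only; UrQt and trim_galore
# use their own, more sophisticated algorithms.
PHRED_OFFSET = 33                   # assumes Sanger / Illumina 1.8+ encoding
ILLUMINA_ADAPTER = "AGATCGGAAGAGC"  # start of the Illumina Universal Adapter


def quality_trim_3prime(seq: str, qual: str, min_q: int = 20) -> tuple[str, str]:
    """Drop bases from the 3' end while their Phred score is below min_q."""
    end = len(seq)
    while end > 0 and ord(qual[end - 1]) - PHRED_OFFSET < min_q:
        end -= 1
    return seq[:end], qual[:end]


def adapter_trim_3prime(seq: str, qual: str, adapter: str = ILLUMINA_ADAPTER,
                        min_overlap: int = 3) -> tuple[str, str]:
    """Cut the read where the adapter starts (exact matches only)."""
    # Case 1: the complete adapter occurs somewhere inside the read.
    pos = seq.find(adapter)
    if pos != -1:
        return seq[:pos], qual[:pos]
    # Case 2: the read ends in the first k adapter bases (longest k first).
    for k in range(min(len(adapter) - 1, len(seq)), max(min_overlap, 1) - 1, -1):
        if seq.endswith(adapter[:k]):
            return seq[:-k], qual[:-k]
    return seq, qual


if __name__ == "__main__":
    s, q = quality_trim_3prime("ACGTAC", "IIII##")  # '#' = Q2, 'I' = Q40
    print(s, q)
```

Quality trimming is applied first here only for simplicity; real tools differ in the order and details of these steps.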
Working on the Command Line
Start:
Open 'Terminal' from Spotlight or Dock
GE3M25 Project Step 1
Download data
1. Browse to bioinf.gen.tcd.ie/GE3M25/project
2. Locate the file with your student ID
3. Click to download
4. Check Downloads folder for file
GE3M25 Project Step 2
Quality Control with FastQC
1. Download FastQC
2. Load the (compressed) FastQ file
3. Save report
4. Rename to start with full Student ID
GE3M25 Project Step 2
Info for project report
1. Data details (# sequences, read length, etc.)
2. Comments on quality aspects
3. Highlight of potential issues
4. Discuss ways to clean up data
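The data details for the report (number of sequences, read lengths) are shown in FastQC's Basic Statistics module. As a cross-check, here is a minimal Python sketch that counts them directly from the downloaded file. It assumes the usual 4-line FASTQ layout (header, sequence, '+', quality) without wrapped lines; fastq_stats is a hypothetical helper name.

```python
from __future__ import annotations

import gzip
from collections import Counter


def fastq_stats(path: str) -> tuple[int, Counter]:
    """Count reads and tabulate read lengths in a (gzipped) FASTQ file.

    Assumes the standard 4-line FASTQ layout: header, sequence, '+', quality.
    """
    n_reads = 0
    lengths: Counter = Counter()
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        for i, line in enumerate(handle):
            if i % 4 == 1:  # the 2nd line of every record is the sequence
                n_reads += 1
                lengths[len(line.rstrip("\n"))] += 1
    return n_reads, lengths
```

Usage: n, lengths = fastq_stats("XXX.fastq.gz"), with XXX replaced by your own file name. The result should agree with FastQC's "Total Sequences" and "Sequence length" entries.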
Quality Information
Conversion of quality score:
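The conversion can be written out explicitly. Assuming the Sanger / Illumina 1.8+ encoding (ASCII offset 33; older Illumina pipelines used 64), each quality character encodes a Phred score Q = ASCII code - 33. Q relates to the probability P of a wrong base call by Q = -10 * log10(P). For example, 'I' (ASCII 73) means Q40, i.e. roughly 1 wrong call in 10,000 bases. A minimal sketch follows; the function names are hypothetical.

```python
from __future__ import annotations

PHRED_OFFSET = 33  # Sanger / Illumina 1.8+; older Illumina 1.3-1.7 data used 64


def decode_quality(qual: str, offset: int = PHRED_OFFSET) -> list[int]:
    """Turn a FASTQ quality string into Phred scores."""
    return [ord(c) - offset for c in qual]


def error_probability(q: int) -> float:
    """Phred definition: Q = -10 * log10(P), hence P = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


if __name__ == "__main__":
    for q in decode_quality("I5+#"):
        print(q, error_probability(q))
```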
GE3M25 Project Step 3
Storage of FastQC report
1. Open HTML report in browser
2. Copy and paste information into a Word document
or
Ctrl-click to copy images (or use Grab for screenshots)
3. Mail the document to yourself
or
store it on a USB stick or network drive
or
upload the HTML file through bioinf.gen.tcd.ie/GE3M25/project
GE3M25 Project Step 4
Read mapping
1. Download bowtie2 programs and reference sequence
bioinf.gen.tcd.ie/GE3M25/data_handling
2. Switch to Terminal for command line work
3. Extract bowtie2 programs: tar zxvf bowtie2.tgz
Or: tar xvf bowtie2.tar
4. Build index: ./bowtie2-build S288C_reference_sequence_R64-2-1_20150113.fsa yeast
5. Map reads with default parameters:
./bowtie2 -x yeast -U XXX.fastq.gz -p 4 > bowtie2_def.sam
Replace XXX.fastq.gz with the name of your own FastQ file!
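Because the module also covers Python programming, the same two commands can be scripted with the standard subprocess module. This is a minimal sketch, assuming the bowtie2 binaries were extracted into the current directory as in step 3. bowtie2_commands and run_mapping are hypothetical helper names.

```python
from __future__ import annotations

import subprocess


def bowtie2_commands(reference: str, index: str, reads: str,
                     threads: int = 4) -> tuple[list[str], list[str]]:
    """Argument lists equivalent to the two commands on the slide."""
    build = ["./bowtie2-build", reference, index]
    align = ["./bowtie2", "-x", index, "-U", reads, "-p", str(threads)]
    return build, align


def run_mapping(reference: str, index: str, reads: str, sam_out: str,
                threads: int = 4) -> None:
    """Build the index, then map the reads and write SAM output to sam_out."""
    build, align = bowtie2_commands(reference, index, reads, threads)
    subprocess.run(build, check=True)  # raises CalledProcessError on failure
    # stdout (the SAM records) goes to the file, like '> bowtie2_def.sam';
    # the alignment summary printed on stderr still appears in the terminal.
    with open(sam_out, "w") as out:
        subprocess.run(align, stdout=out, check=True)
```

Usage: run_mapping("S288C_reference_sequence_R64-2-1_20150113.fsa", "yeast", "XXX.fastq.gz", "bowtie2_def.sam"), again replacing XXX. Passing argument lists instead of one command string avoids any shell parsing of the file names.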
Working on the Command Line – the Prompt
The prompt shows user, host and current directory, followed by the prompt symbol.
Spaces are important!
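Spaces matter because the shell splits a command line at spaces into a program name and its arguments. Python's shlex.split applies the same POSIX-style word splitting (without globbing or variable expansion), so it can illustrate what a missing or an extra space does. The file name "my reads.fastq.gz" is made up for the example.

```python
import shlex

# shlex.split mimics POSIX shell word splitting (no globbing or
# variable expansion), showing how a command is cut into arguments.
correct = shlex.split("tar zxvf bowtie2.tgz")
missing_space = shlex.split("tar zxvfbowtie2.tgz")        # option and file fused
extra_space = shlex.split("./bowtie2 -x yeast -U XXX .fastq.gz")  # file name split
quoted = shlex.split('ls -l "my reads.fastq.gz"')          # quotes keep one argument

for words in (correct, missing_space, extra_space, quoted):
    print(len(words), words)
```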
GE3M25 Project Step 5
Generate indexed and sorted BAM file
Sequence Alignment/Map (SAM) format
- Standard format for read mapping results
- Can be compressed to save space: binary SAM = BAM format
- Can be indexed for random access
- samtools allows viewing and processing of SAM/BAM data
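The FLAG in column 2 of each SAM record is a bit field. Bit 0x4 marks an unmapped read, and bit 0x10 marks a read aligned to the reverse strand. The following minimal Python sketch counts mapped and unmapped reads in a SAM file. It assumes one record per read, which is bowtie2's default for single-end data without -k or -a. mapping_summary is a hypothetical helper. samtools can also report the mapped count directly, e.g. ./samtools view -c -F 4 bowtie2_def_sorted.bam.

```python
from collections import Counter

FLAG_UNMAPPED = 0x4   # SAM FLAG bit: read is unmapped
FLAG_REVERSE = 0x10   # SAM FLAG bit: read aligned to the reverse strand


def mapping_summary(sam_lines) -> Counter:
    """Count mapped/unmapped records and strands in SAM text lines."""
    counts: Counter = Counter()
    for line in sam_lines:
        if line.startswith("@"):      # header lines: @HD, @SQ, @PG, ...
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])         # column 2 = FLAG
        if flag & FLAG_UNMAPPED:
            counts["unmapped"] += 1
        else:
            counts["mapped"] += 1
            counts["reverse" if flag & FLAG_REVERSE else "forward"] += 1
    return counts
```

Usage: with open("bowtie2_def.sam") as sam: print(mapping_summary(sam)).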
GE3M25 Project Step 5
samtools
Download from bioinf, make executable with chmod, and run:
ls -l samtools
chmod +x samtools
./samtools
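The chmod +x step can also be done from Python with os.chmod. This is a minimal sketch. Unlike chmod +x, it ignores the umask and always adds execute permission for user, group and others. make_executable is a hypothetical helper. stat.filemode returns the same kind of permission string that ls -l prints.

```python
import os
import stat


def make_executable(path: str) -> str:
    """Add execute permission for user, group and others (cf. 'chmod +x').

    Returns the new permission string in the style of 'ls -l'.
    """
    mode = os.stat(path).st_mode
    os.chmod(path, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    return stat.filemode(os.stat(path).st_mode)
```

Usage: make_executable("samtools") in the download folder.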
GE3M25 Project Step 5
samtools view options
SAM Format
GE3M25 Project Step 5
View SAM file
./samtools view -S bowtie2_def.sam | less
Change into BAM format
./samtools view -bS bowtie2_def.sam > bowtie2_def.bam
Sort BAM file
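./samtools sort bowtie2_def.bam bowtie2_def_sorted
(older samtools syntax, where the output name is a prefix; newer samtools releases expect: ./samtools sort -o bowtie2_def_sorted.bam bowtie2_def.bam)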
Index BAM file
./samtools index bowtie2_def_sorted.bam
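In a sorted file, records are ordered by reference sequence (in @SQ header order) and then by leftmost position. This ordering lets the .bai index send a viewer such as IGV straight to the reads of one region. The following minimal in-memory Python sketch shows that ordering on SAM text lines. It only illustrates the sort order and does not reflect how samtools processes compressed BAM files. coordinate_sort is a hypothetical helper.

```python
from __future__ import annotations


def coordinate_sort(sam_lines: list[str]) -> list[str]:
    """Return header lines followed by records sorted by (reference, position).

    Reference order follows the @SQ header lines; records whose RNAME is
    not listed there (e.g. unmapped reads with RNAME '*') go to the end.
    """
    header = [line for line in sam_lines if line.startswith("@")]
    records = [line for line in sam_lines if not line.startswith("@")]

    ref_rank: dict[str, int] = {}
    for h in header:
        if h.startswith("@SQ"):
            for tag in h.rstrip("\n").split("\t")[1:]:
                if tag.startswith("SN:"):  # SN = reference sequence name
                    ref_rank[tag[3:]] = len(ref_rank)

    def sort_key(record: str) -> tuple[int, int]:
        fields = record.split("\t")
        rname, pos = fields[2], int(fields[3])  # columns 3 and 4: RNAME, POS
        return ref_rank.get(rname, len(ref_rank)), pos

    return header + sorted(records, key=sort_key)
```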
GE3M25 Project Step 6
1. Download IGV (local copy on bioinf)
2. Unpack (on the command line):
unzip IGV_2.3.66.app.zip
3. Start by double-click in Finder
4. Load S. cerevisiae (sacCer3) genome
5. Load BAM file
Visualisation with IGV (Integrative Genomics Viewer)
Exercises
Clean your data via trimming
Run bowtie2 with different parameters
How do these steps affect the number of mapped reads?
How do they affect the peaks that you see in IGV?
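Peaks in IGV are places where many reads pile up. The following minimal Python sketch computes per-base read depth for one chromosome from a SAM file by walking each CIGAR string. Comparing the output for two SAM files (for example, before and after trimming) shows how the pile-ups change. It is not a peak caller. coverage is a hypothetical helper, and the chromosome name must match a reference name in your SAM header.

```python
from __future__ import annotations

import re
from collections import Counter

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def coverage(sam_lines, chrom: str) -> Counter:
    """Per-base read depth on one reference sequence (1-based positions)."""
    depth: Counter = Counter()
    for line in sam_lines:
        if line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        flag, rname, pos, cigar = int(fields[1]), fields[2], int(fields[3]), fields[5]
        if flag & 0x4 or rname != chrom or cigar == "*":
            continue  # skip unmapped reads and other chromosomes
        ref = pos  # POS is the first aligned (non-clipped) reference base
        for n, op in CIGAR_OP.findall(cigar):
            n = int(n)
            if op in "M=X":      # aligned bases: count them
                for p in range(ref, ref + n):
                    depth[p] += 1
                ref += n
            elif op in "DN":     # deletion / skipped region: move along reference
                ref += n
            # I, S, H and P do not consume reference positions
    return depth
```

Usage: with open("bowtie2_def.sam") as sam: print(coverage(sam, name).most_common(5)), where name is one of the @SQ names. This lists the five deepest positions on that chromosome.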
GE3M25 Project Step 7
Storage of BAM file
Upload the sorted BAM file and its .bam.bai index file through
bioinf.gen.tcd.ie/GE3M25/project
Don't forget to log out!