
Page 1

Trinity College Dublin, The University of Dublin

Data download: bioinf.gen.tcd.ie/GE3M25/project

Get .fastq.gz file associated with your student ID

http://bioinf.gen.tcd.ie/GE3M25/project

Page 2

GE3M25: Data Analysis, Class 3

Karsten Hokamp, PhD

Genetics

TCD, 30/11/2015

Page 3

GE3M25 Data Handling: Module Content

Python Programming

Bioinformatics

ChIP-Seq analysis

Page 4

Class 3: Project Data

• Download project data

• Quality control

• Trimming

• Read mapping

• Visualisation

Page 5

Next Generation Sequencing - Applications

Xu F, Wang Q, Zhang F, Zhu Y, Gu Q, Wu L, Yang L, Yang X. Impact of Next-Generation Sequencing (NGS) technology on cardiovascular disease research. Cardiovasc Diagn Ther 2012;2(2):138-146

Page 6

Source: Bio-Rad

ChIP-Seq Basics

ChIP = Chromatin ImmunoPrecipitation

Chromatin = highly ordered packaging of DNA together with histones

Page 7

Chromatin = highly ordered packaging of DNA together with histones

Rosa, S.; Shaw, P. Insights into Chromatin Structure and Dynamics in Plants. Biology 2013, 2, 1378-1410.

Page 8

ChIP-Seq Basics

Immunoprecipitation (IP) is the technique of precipitating a protein antigen out of solution using an antibody that specifically binds to that particular protein.

Page 9

Page 10

Steps in this class:

1. Download FastQ data set (ChIP-Seq of TF in yeast)

2. Quality control (FastQC)

3. Storage of FastQC report file

4. Read mapping (Bowtie2)

5. Generate indexed and sorted BAM file

6. Visualisation (IGV)

7. Store BAM and index files

GE3M25 Project

Page 11

Optional steps in this class:

1. Trimming by quality (UrQt)

2. Trimming for Illumina Universal Adapter (trim_galore)

3. Trimming for other adapters (trim_galore)

4. Other read mapper (BWA)

5. Comparison of results

6. Upload the most suitable BAM and index files

GE3M25 Project
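To picture what trimming by quality does to a read, here is a minimal Python sketch of simple 3' end trimming. It assumes Phred+33 quality characters and a hypothetical cut-off of Q20, and it is not the algorithm used by UrQt or trim_galore; it only removes bases from the end of the read until it reaches a base at or above the cut-off.

```python
from typing import Tuple


def trim_3prime(seq: str, qual: str, threshold: int = 20, offset: int = 33) -> Tuple[str, str]:
    """Remove bases from the 3' end while their Phred quality is below threshold."""
    end = len(seq)
    # Walk backwards from the 3' end until a base meets the threshold
    while end > 0 and ord(qual[end - 1]) - offset < threshold:
        end -= 1
    return seq[:end], qual[:end]


if __name__ == "__main__":
    # Hypothetical read whose last three bases have quality '#' (Q2)
    print(trim_3prime("ACGTACGTAC", "IIIIIII###"))
```

Real trimmers also handle low-quality stretches inside the read and enforce a minimum remaining length, which this sketch leaves out on purpose.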

Page 12

Working on the Command Line

Start:

Open 'Terminal' from Spotlight or Dock

Page 13

GE3M25 Project Step 1

Download data

1. Browse to bioinf.gen.tcd.ie/GE3M25/project

2. Locate the file with your student ID

3. Click to download

4. Check Downloads folder for file
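Before moving on, it may be worth confirming that the download finished. A gzip file that was cut short cannot be decompressed to the end, so streaming through it once detects an incomplete download. This is a minimal Python sketch; the file location used below is an assumption.

```python
import gzip
import zlib


def gzip_is_complete(path: str) -> bool:
    """Stream through a .gz file; a truncated or corrupt file raises an error."""
    try:
        with gzip.open(path, "rb") as handle:
            # Read in 1 MB chunks so a large FASTQ file never has to fit in memory
            while handle.read(1 << 20):
                pass
        return True
    except FileNotFoundError:
        raise  # a wrong path is not the same as a damaged file
    except (EOFError, OSError, zlib.error):
        return False
```

For example, `gzip_is_complete(os.path.expanduser("~/Downloads/XXX.fastq.gz"))`, with XXX replaced by the name of your own file, should return `True` for a complete download.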

Page 14

GE3M25 Project Step 2

Quality Control with FastQC

1. Download FastQC

2. Load the (compressed) FastQ file

3. Save report

4. Rename to start with full Student ID
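If you prefer to rename the report files with a script rather than in Finder, a small Python sketch could prefix them with your student ID. It assumes FastQC's usual output names (`<name>_fastqc.html` and `<name>_fastqc.zip`) in a folder you choose; the ID used below is a hypothetical placeholder.

```python
from pathlib import Path
from typing import List


def prefix_with_student_id(folder: str, student_id: str) -> List[Path]:
    """Rename *_fastqc.* files in folder so their names start with student_id."""
    renamed = []
    # sorted() builds the full list first, so renaming does not disturb the loop
    for report in sorted(Path(folder).glob("*_fastqc.*")):
        if report.name.startswith(student_id):
            continue  # already carries the ID; leave it alone
        target = report.with_name(f"{student_id}_{report.name}")
        report.rename(target)
        renamed.append(target)
    return renamed
```

For example, `prefix_with_student_id(".", "12345678")` in the folder holding the report, using your real ID instead of the placeholder.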

Page 15

GE3M25 Project Step 2

Info for project report

1. Data details (# sequences, read length, etc.)

2. Comments on quality aspects

3. Highlight of potential issues

4. Discuss ways to clean up data
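FastQC reports the number of sequences and the read length, but these data details can also be cross-checked directly. A minimal Python sketch, assuming the standard four-line FASTQ layout (header, sequence, '+', qualities) without wrapped sequence lines:

```python
import gzip
from collections import Counter
from typing import Tuple


def fastq_summary(path: str) -> Tuple[int, Counter]:
    """Return the number of reads and a Counter of read lengths in a FASTQ(.gz) file."""
    opener = gzip.open if path.endswith(".gz") else open
    lengths: Counter = Counter()
    n_reads = 0
    with opener(path, "rt") as handle:
        for i, line in enumerate(handle):
            # Each record spans 4 lines; the sequence is the 2nd line of a record
            if i % 4 == 1:
                lengths[len(line.rstrip("\n"))] += 1
                n_reads += 1
    return n_reads, lengths
```

If all reads have the same length, the Counter holds a single entry; several entries indicate that the reads have been trimmed or are of mixed length.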

Page 16

Quality Information

Conversion of quality score:
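A quality character is converted to a Phred score Q by subtracting an offset from its ASCII code, and Q corresponds to an error probability P = 10^(-Q/10). The sketch below assumes the Phred+33 encoding used by current Illumina data (FastQC lists it as "Sanger / Illumina 1.9" under Basic Statistics); older Illumina pipelines (1.3 to 1.7) used an offset of 64 instead.

```python
def phred_score(char: str, offset: int = 33) -> int:
    """Convert one FASTQ quality character to a Phred score (Phred+33 by default)."""
    return ord(char) - offset


def error_probability(q: int) -> float:
    """Probability that the base call is wrong: P = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


if __name__ == "__main__":
    # '5' is Q20 (1 wrong call in 100), 'I' is Q40 (1 in 10,000)
    for c in "#5?I":
        q = phred_score(c)
        print(c, q, error_probability(q))
```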

Page 17

GE3M25 Project Step 3

Storage of FastQC report

1. Open HTML report in browser

2. Copy and paste information into a Word document

or

Ctrl-click to copy images (or use Grab for screenshots)

3. Mail the document to yourself

or

store it on a USB stick or network drive

or

upload the HTML file through bioinf.gen.tcd.ie/GE3M25/project

Page 18

GE3M25 Project Step 4

Read mapping

1. Download bowtie2 programs and reference sequence

bioinf.gen.tcd.ie/GE3M25/data_handling

2. Switch to Terminal for command line work

3. Extract bowtie2 programs: tar zxvf bowtie2.tgz

Or: tar xvf bowtie2.tar

4. Build index: ./bowtie2-build S288C_reference_sequence_R64-2-1_20150113.fsa yeast

5. Map reads with default parameters:

./bowtie2 -x yeast -U XXX.fastq.gz -p 4 > bowtie2_def.sam
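bowtie2 prints a short alignment summary (number of reads; how many aligned 0 times, exactly once, or more than once; overall alignment rate). Because only the SAM output is redirected with `>`, the summary goes to standard error and can be saved by adding `2> bowtie2_def.log` to the command; the log name is a hypothetical choice. The Python sketch below, assuming the single-end summary layout produced with `-U`, extracts those numbers so that runs with different settings can be compared.

```python
import re
from typing import Dict

# Regular expressions for the lines of a single-end (-U) bowtie2 summary
PATTERNS = {
    "reads": r"^(\d+) reads; of these:",
    "aligned_0": r"(\d+) \([\d.]+%\) aligned 0 times",
    "aligned_1": r"(\d+) \([\d.]+%\) aligned exactly 1 time",
    "aligned_multi": r"(\d+) \([\d.]+%\) aligned >1 times",
    "overall_rate": r"([\d.]+)% overall alignment rate",
}


def parse_bowtie2_summary(text: str) -> Dict[str, float]:
    """Pull read counts and the overall alignment rate (in %) out of a bowtie2 log."""
    result = {}
    for key, pattern in PATTERNS.items():
        match = re.search(pattern, text, re.MULTILINE)
        if match:
            result[key] = float(match.group(1))
    return result
```

For example, `parse_bowtie2_summary(open("bowtie2_def.log").read())["overall_rate"]`.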

Page 19

GE3M25 Project Step 4

Page 20

GE3M25 Project Step 4

Read mapping

1. Download bowtie2 programs and reference sequence

bioinf.gen.tcd.ie/GE3M25/data_handling

2. Switch to Terminal for command line work

3. Extract bowtie2 programs: tar zxvf bowtie2.tgz

Or: tar xvf bowtie2.tar

4. Build index: ./bowtie2-build S288C_reference_sequence_R64-2-1_20150113.fsa yeast

5. Map reads with default parameters:

./bowtie2 -x yeast -U XXX.fastq.gz -p 4 > bowtie2_def.sam

Replace XXX with the name of your own .fastq.gz file!

Page 21

Working on the Command Line – the Prompt

Parts of the prompt: user, host, directory, symbol

Spaces are important!
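The shell splits a command line into separate arguments wherever there is an unquoted space. Python's `shlex.split` follows similar POSIX-style rules, so this small sketch can show how a missing or extra space changes what a program receives (it ignores shell features such as variable expansion and wildcards).

```python
import shlex
from typing import List


def arguments(command: str) -> List[str]:
    """Split a command line into arguments the way a POSIX shell would."""
    return shlex.split(command)


if __name__ == "__main__":
    print(arguments("./samtools index bowtie2_def_sorted.bam"))
    # A missing space merges the subcommand and the file name into one argument
    print(arguments("./samtools indexbowtie2_def_sorted.bam"))
    # A space inside a file name splits it in two unless the name is quoted
    print(arguments("ls -l my reads.fastq.gz"))
    print(arguments("ls -l 'my reads.fastq.gz'"))
```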

Page 22

Steps in this class:

1. Download FastQ data set (ChIP-Seq of TF in yeast)

2. Quality control (FastQC)

3. Storage of FastQC report file

4. Read mapping (Bowtie2)

5. Generate indexed and sorted BAM file

6. Visualisation (IGV)

7. Store BAM and index files

GE3M25 Project

Page 23

GE3M25 Project Step 5

Generate indexed and sorted BAM file

SAM = Sequence Alignment/Map format

- Standard format for read mapping results

- Can be compressed to save space:

binary SAM = BAM format

- Can be indexed for random access

- samtools allows viewing and processing of SAM data

Page 24

GE3M25 Project Step 5

samtools

Download from bioinf, make it executable with chmod, and run:

ls -l samtools

chmod +x samtools

./samtools

Page 25

GE3M25 Project Step 5

samtools view options

Page 26

SAM Format
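Each non-header line of a SAM file is one alignment with 11 tab-separated mandatory fields, optionally followed by tags. A minimal Python sketch of reading one such line and decoding two FLAG bits (0x4 = read unmapped, 0x10 = read maps to the reverse strand); the field names follow the SAM specification, while the function itself is only illustrative.

```python
from typing import Optional

# The 11 mandatory SAM columns, in order
SAM_FIELDS = ["QNAME", "FLAG", "RNAME", "POS", "MAPQ", "CIGAR",
              "RNEXT", "PNEXT", "TLEN", "SEQ", "QUAL"]
INTEGER_FIELDS = ("FLAG", "POS", "MAPQ", "PNEXT", "TLEN")


def parse_sam_line(line: str) -> Optional[dict]:
    """Split one SAM alignment line into named fields; header lines return None."""
    if line.startswith("@"):
        return None  # header lines such as @HD, @SQ and @PG hold no alignment
    columns = line.rstrip("\n").split("\t")
    record: dict = dict(zip(SAM_FIELDS, columns[:11]))
    for key in INTEGER_FIELDS:
        record[key] = int(record[key])
    flag = record["FLAG"]
    record["TAGS"] = columns[11:]           # optional fields, e.g. AS:i:0
    record["unmapped"] = bool(flag & 0x4)   # bit 0x4: read did not map
    record["reverse"] = bool(flag & 0x10)   # bit 0x10: read maps to the reverse strand
    return record
```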

Page 27

GE3M25 Project Step 5

View SAM file

./samtools view -S bowtie2_def.sam | less

Change into BAM format

./samtools view -bS bowtie2_def.sam > bowtie2_def.bam

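Sort BAM file

./samtools sort bowtie2_def.bam bowtie2_def_sorted

(legacy syntax; newer samtools releases expect: ./samtools sort -o bowtie2_def_sorted.bam bowtie2_def.bam)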

Index BAM file

./samtools index bowtie2_def_sorted.bam
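Why sort and index? Once reads are ordered by reference sequence and position, the reads in any region sit next to each other and can be found without scanning the whole file. The Python sketch below illustrates the idea with a binary search over sorted (reference, position) keys. It is a simplification that only looks at read start positions and uses a hypothetical minimal read record; the real .bai index written by `samtools index` uses a binning scheme rather than a plain list.

```python
import bisect
from typing import Dict, List, Tuple

# Hypothetical minimal read record: (reference name, 1-based start, read name)
Read = Tuple[str, int, str]


def coordinate_sort(reads: List[Read], ref_order: Dict[str, int]) -> List[Read]:
    """Order reads by reference (in header order), then by start position."""
    return sorted(reads, key=lambda r: (ref_order[r[0]], r[1]))


def build_index(sorted_reads: List[Read], ref_order: Dict[str, int]) -> List[Tuple[int, int]]:
    """Precompute the sort keys once; they play the role of an index."""
    return [(ref_order[ref], pos) for ref, pos, _ in sorted_reads]


def reads_starting_in(sorted_reads: List[Read], index: List[Tuple[int, int]],
                      ref_order: Dict[str, int], ref: str, start: int, end: int) -> List[Read]:
    """Return reads whose start lies in [start, end] by binary search, not a full scan."""
    lo = bisect.bisect_left(index, (ref_order[ref], start))
    hi = bisect.bisect_right(index, (ref_order[ref], end))
    return sorted_reads[lo:hi]
```

This is also why IGV needs both the sorted BAM file and its index: it only reads the part of the file covering the region on screen.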

Page 28

Steps in this class:

1. Download FastQ data set (ChIP-Seq of TF in yeast)

2. Quality control (FastQC)

3. Storage of FastQC report file

4. Read mapping (Bowtie2)

5. Generate indexed and sorted BAM file

6. Visualisation (IGV)

7. Store BAM and index files

GE3M25 Project

Page 29

GE3M25 Project Step 6

Visualisation with IGV (Integrative Genomics Viewer)

1. Download IGV (local copy on bioinf)

2. Unpack (on the command line):

unzip IGV_2.3.66.app.zip

3. Start it by double-clicking in Finder

4. Load the S. cerevisiae (sacCer3) genome

5. Load the BAM file

Page 30

GE3M25 Project Step 6

Visualisation with IGV (Integrative Genomics Viewer)

Page 31

Exercises

Clean your data via trimming

Run bowtie2 with different parameters

How do these steps affect the number of mapped reads?

How do they affect the peaks that you see in IGV?
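To compare the number of mapped reads between runs, a minimal Python sketch can count them in each SAM file. It counts every read once by skipping secondary (0x100) and supplementary (0x800) lines and then checks the unmapped bit (0x4). As a cross-check, `./samtools view -S -c -F 0x904 bowtie2_def.sam` should report the same mapped count; names for the SAM files of other runs are up to you.

```python
from typing import Dict


def count_mapped(sam_path: str) -> Dict[str, int]:
    """Count primary reads in a SAM file and how many of them mapped."""
    counts = {"total": 0, "mapped": 0, "unmapped": 0}
    with open(sam_path) as handle:
        for line in handle:
            if line.startswith("@"):
                continue  # skip header lines
            flag = int(line.split("\t", 2)[1])
            if flag & 0x900:
                continue  # skip secondary (0x100) and supplementary (0x800) lines
            counts["total"] += 1
            if flag & 0x4:
                counts["unmapped"] += 1
            else:
                counts["mapped"] += 1
    return counts
```

Running it on the SAM output of the default run and of a trimmed or re-parameterised run gives the numbers asked for in the first question.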

Page 32

GE3M25 Project Step 7

Storage of BAM file

Upload the BAM and .bam.bai index files through

bioinf.gen.tcd.ie/GE3M25/project

Page 33

Don't forget to log out!