How to do differential expression analysis from FASTQ format data on HPC

Written by: yiran.zhang

Post on 01-Aug-2020

Pipeline

• ShortRead: Quality assessment, filtering and trimming

• FastQC: Quality control

• Btrim: Filtering and trimming

• HISAT2: Align the reads to the reference genome

• htseq-count: Count the reads mapped to each feature (gene)

• DESeq: Differential gene expression analysis based on the negative binomial distribution

ShortRead

• Introduction:
  • The ShortRead package provides functionality for working with FASTQ files from high-throughput sequence analysis.

• Environment required:
  • R

• Function:
  • Quality assessment
  • Filtering and trimming

ShortRead

• Write your own function to guarantee that the reads do not contain ‘N’

> qaSummary[["baseCalls"]]
                    A        C        G        T    N
gu2_read1.fq 21685857 28412307 28130219 21767568 4049
gu2_read2.fq 21729895 28722063 27816884 21730591  567
gu3_read1.fq 21723444 28346939 28174527 21751407 3683
gu3_read2.fq 21734048 28697947 27824570 21742736  699
ye1_read1.fq 21675483 28443112 28095517 21781660 4228
ye1_read2.fq 21702486 28762839 27785294 21748745  636
ye3_read1.fq 21795076 28360347 27968237 21872354 3986
ye3_read2.fq 21807964 28695030 27618484 21877864  658
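The 'N' filter described on this slide can be sketched outside R as well. The example below is a minimal illustration in plain Python, not the ShortRead function used in the original work; the function names `parse_fastq` and `drop_reads_with_n` are hypothetical, and the sketch assumes standard 4-line FASTQ records.

```python
from typing import Iterable, Iterator, Tuple

# One FASTQ record: header, sequence, separator ("+"), quality string.
FastqRecord = Tuple[str, str, str, str]


def parse_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Yield 4-line FASTQ records from an iterable of text lines."""
    it = iter(lines)
    while True:
        header = next(it, None)
        if header is None:
            return  # end of input
        if not header.strip():
            continue  # tolerate blank lines between records
        seq, sep, qual = next(it, None), next(it, None), next(it, None)
        if qual is None:
            raise ValueError("truncated FASTQ record")
        yield (header.rstrip("\n"), seq.rstrip("\n"),
               sep.rstrip("\n"), qual.rstrip("\n"))


def drop_reads_with_n(records: Iterable[FastqRecord]) -> Iterator[FastqRecord]:
    """Keep only the records whose sequence has no ambiguous 'N' base call."""
    for record in records:
        if "N" not in record[1].upper():
            yield record


if __name__ == "__main__":
    demo = ["@r1\n", "ACGT\n", "+\n", "IIII\n",
            "@r2\n", "ACNT\n", "+\n", "IIII\n"]
    for header, seq, _, _ in drop_reads_with_n(parse_fastq(demo)):
        print(header, seq)
```

The file names in the table suggest paired-end data (read1/read2). If so, a read dropped from one file would also have to be dropped from its mate file to keep the pairs in sync; this sketch does not handle that.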

FastQC: Quality control

• Introduction:
  • FastQC aims to provide a simple way to do some quality control checks on raw sequence data coming from high-throughput sequencing pipelines.

Btrim

• ShortRead drops the reads that contain ‘N’, but low-quality bases still appear to remain, so we filter and trim the ShortRead output with Btrim.

• Introduction:
  • Btrim is a fast and lightweight tool to trim adapters and low quality regions in reads from ultra high-throughput next-generation sequencing machines.

• Note:
  • Run FastQC again on the filtered and trimmed reads to check that the result is reasonable. Just edit the command in your previous fastqc.pbs and resubmit it.
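To make the idea of trimming low-quality regions concrete, here is a minimal sketch of a generic sliding-window quality trimmer in Python. It is not a reimplementation of Btrim: the function names, the window size of 5, the mean-quality cutoff of 15, and the Phred+33 encoding are all assumptions chosen for illustration.

```python
from typing import List, Tuple


def phred_scores(quality: str, offset: int = 33) -> List[int]:
    """Decode a FASTQ quality string into Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality]


def sliding_window_trim(sequence: str, quality: str, window: int = 5,
                        min_mean: float = 15.0,
                        offset: int = 33) -> Tuple[str, str]:
    """Cut the read at the start of the first window whose mean quality
    falls below min_mean. Reads shorter than the window are returned as-is."""
    if len(sequence) != len(quality):
        raise ValueError("sequence and quality must have the same length")
    scores = phred_scores(quality, offset)
    cut = len(scores)
    for start in range(max(len(scores) - window + 1, 0)):
        current = scores[start:start + window]
        if sum(current) / window < min_mean:
            cut = start
            break
    return sequence[:cut], quality[:cut]


if __name__ == "__main__":
    # 'I' encodes Phred 40 and '#' encodes Phred 2 under Phred+33.
    print(sliding_window_trim("ACGTACGTAC", "IIIIII####"))
```

After trimming, very short reads are usually discarded with a minimum-length filter; that step is left out here to keep the sketch short.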

HISAT2

• Introduction:
  • HISAT2 is a fast and sensitive alignment program for mapping next-generation sequencing reads (both DNA and RNA) to a population of human genomes (as well as against a single reference genome).

• Advantage: Highly efficient

• Note:
  • It produces a SAM file that can be used directly as input to htseq-count in the next step.
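The alignment command itself did not survive in this transcript, so the following is only a sketch of how such a call might be assembled in Python. It uses the standard HISAT2 options `-x` (index basename), `-1`/`-2` (paired-end read files), `-S` (SAM output) and `-p` (threads). The helper name `hisat2_command`, the index name, and the file names are hypothetical, and paired-end input is an assumption based on the read1/read2 file naming.

```python
import shlex
from typing import List


def hisat2_command(index_prefix: str, read1: str, read2: str,
                   sam_out: str, threads: int = 4) -> List[str]:
    """Build the argument list for a paired-end HISAT2 run that writes SAM."""
    return [
        "hisat2",
        "-p", str(threads),   # number of alignment threads
        "-x", index_prefix,   # basename of the HISAT2 genome index
        "-1", read1,          # first mates
        "-2", read2,          # second mates
        "-S", sam_out,        # SAM output file
    ]


if __name__ == "__main__":
    # Hypothetical file names; replace with the real index and trimmed reads.
    cmd = hisat2_command("genome_index", "gu2_read1.trimmed.fq",
                         "gu2_read2.trimmed.fq", "gu2.sam")
    # Print the command so it can be pasted into a PBS script.
    print(shlex.join(cmd))
```

Building the command as a list keeps file names with spaces safe and makes it easy to pass the same list to `subprocess.run` once the paths are known to be correct.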

HTSeq

• Introduction:
  • HTSeq is a Python package that provides infrastructure to process data from high-throughput sequencing assays.

• Usage:
  • htseq-count [options] <alignment_file> <gff_file>
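htseq-count writes one tab-separated table per sample (feature ID, then count), followed by special counters such as `__no_feature` and `__ambiguous`. Before differential expression analysis these per-sample tables are typically combined into a single count matrix. The sketch below shows one way to do that in Python; the function names and sample data are hypothetical, and it assumes the special counters carry the `__` prefix.

```python
from typing import Dict, Iterable, List, Tuple


def parse_htseq_counts(lines: Iterable[str]) -> Dict[str, int]:
    """Read htseq-count output lines into {feature_id: count},
    skipping the special counters that start with '__'."""
    counts: Dict[str, int] = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        feature, value = line.split("\t")
        if feature.startswith("__"):
            continue  # e.g. __no_feature, __ambiguous
        counts[feature] = int(value)
    return counts


def build_count_matrix(
    per_sample: Dict[str, Dict[str, int]]
) -> Tuple[List[str], List[str], List[List[int]]]:
    """Return (samples, genes, rows), where rows[g][s] is the count of
    gene g in sample s. Genes missing from a sample are counted as 0."""
    samples = list(per_sample)
    genes = sorted(set().union(*per_sample.values()))
    rows = [[per_sample[s].get(g, 0) for s in samples] for g in genes]
    return samples, genes, rows


if __name__ == "__main__":
    gu2 = parse_htseq_counts(["geneA\t10", "geneB\t0", "__no_feature\t5"])
    ye1 = parse_htseq_counts(["geneA\t7", "geneB\t3", "__ambiguous\t2"])
    print(build_count_matrix({"gu2": gu2, "ye1": ye1}))
```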

DESeq

• Introduction:
  • Differential gene expression analysis based on the negative binomial distribution.
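Before fitting the negative binomial model, DESeq normalizes the samples for sequencing depth with median-of-ratios size factors. In practice this is done in R by the package itself; the plain-Python sketch below only illustrates the calculation, and the function name and toy counts are hypothetical.

```python
import math
from statistics import median
from typing import List


def size_factors(counts: List[List[int]]) -> List[float]:
    """Median-of-ratios size factors.

    counts[g][s] is the raw count of gene g in sample s. Genes with a zero
    count in any sample are skipped, because their geometric mean is zero.
    """
    n_samples = len(counts[0])
    log_ratios: List[List[float]] = [[] for _ in range(n_samples)]
    for row in counts:
        if any(c <= 0 for c in row):
            continue
        # Log of the geometric mean of this gene across all samples.
        log_geo_mean = sum(math.log(c) for c in row) / n_samples
        for s, c in enumerate(row):
            log_ratios[s].append(math.log(c) - log_geo_mean)
    if not log_ratios[0]:
        raise ValueError("no gene has non-zero counts in every sample")
    # The size factor of a sample is the median ratio over the usable genes.
    return [math.exp(median(r)) for r in log_ratios]


if __name__ == "__main__":
    # Toy data: the second sample was sequenced twice as deeply as the first.
    toy = [[10, 20], [100, 200], [50, 100], [0, 3]]
    print(size_factors(toy))
```

Dividing each sample's counts by its size factor puts the samples on a common scale. The negative binomial model is then used because it lets the variance of the counts exceed the mean, which a Poisson model would not allow.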

Further work

• Connect the steps into a single pipeline so that the whole analysis runs with fewer commands and steps.

• Adjust the programs for our system to handle the mapping of ambiguous reads.

Thank you