How to do differential expression analysis from FASTQ format data on HPC

Written by: yiran.zhang

Post on 01-Aug-2020

Pipeline

• ShortRead: Quality assessment, filtering and trimming

• FastQC: Quality control

• Btrim: Filtering and trimming

• HISAT2: Align the reads to the reference genome

• htseq-count: Count the reads that map to each feature

• DESeq: Differential gene expression analysis based on the negative binomial distribution

ShortRead

• Introduction:

  • The ShortRead package provides functionality for working with FASTQ files from high-throughput sequence analysis.

• Environment required:

  • R

• Function:

  • Quality assessment

  • Filtering and trimming

ShortRead

• Write your own function to guarantee that the reads do not contain ‘N’.

> qaSummary[["baseCalls"]]
                    A        C        G        T    N
gu2_read1.fq 21685857 28412307 28130219 21767568 4049
gu2_read2.fq 21729895 28722063 27816884 21730591  567
gu3_read1.fq 21723444 28346939 28174527 21751407 3683
gu3_read2.fq 21734048 28697947 27824570 21742736  699
ye1_read1.fq 21675483 28443112 28095517 21781660 4228
ye1_read2.fq 21702486 28762839 27785294 21748745  636
ye3_read1.fq 21795076 28360347 27968237 21872354 3986
ye3_read2.fq 21807964 28695030 27618484 21877864  658
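The idea of such an ‘N’ filter can be sketched as follows. This is a minimal illustration in plain Python, not the R/ShortRead function used for the slides. It assumes uncompressed 4-line FASTQ records and paired files (read1/read2) that list the mates in the same order; the function names are hypothetical.

```python
from typing import Iterable, Iterator, Tuple

# One FASTQ record: header, sequence, separator ("+") and quality string.
FastqRecord = Tuple[str, str, str, str]


def read_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Group plain-text lines into 4-line FASTQ records."""
    stripped = (line.rstrip("\n") for line in lines)
    for header in stripped:
        if not header:
            continue  # tolerate blank lines between records
        try:
            record = (header, next(stripped), next(stripped), next(stripped))
        except StopIteration:
            raise ValueError(f"truncated FASTQ record: {header}") from None
        yield record


def has_n(record: FastqRecord) -> bool:
    """True if the sequence line contains an ambiguous base call."""
    return "N" in record[1].upper()


def drop_n_pairs(
    read1: Iterable[str], read2: Iterable[str]
) -> Iterator[Tuple[FastqRecord, FastqRecord]]:
    """Yield only the read pairs in which neither mate contains 'N'."""
    # Assumption: both inputs hold the same reads in the same order.
    for mate1, mate2 in zip(read_fastq(read1), read_fastq(read2)):
        if not has_n(mate1) and not has_n(mate2):
            yield mate1, mate2
```

Dropping both mates together keeps the two files in step, which paired-end aligners generally expect. Open file handles can be passed directly as `read1` and `read2`.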

FastQC: Quality control

• Introduction:

  • FastQC aims to provide a simple way to do some quality control checks on raw sequence data coming from high-throughput sequencing pipelines.
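To make one of these checks concrete, the sketch below computes the mean base quality at each read position, which is the quantity behind a per-base quality plot. It is a simplified illustration, not FastQC's code, and it assumes Phred+33 encoded quality strings; the function name is hypothetical.

```python
from typing import Iterable, List


def mean_quality_per_position(
    quality_strings: Iterable[str], offset: int = 33
) -> List[float]:
    """Average Phred score at each read position across all reads."""
    totals: List[int] = []  # sum of scores seen at each position
    counts: List[int] = []  # number of reads long enough to reach it
    for qual in quality_strings:
        for i, ch in enumerate(qual):
            if i == len(totals):
                totals.append(0)
                counts.append(0)
            totals[i] += ord(ch) - offset  # assumption: Phred+33 encoding
            counts[i] += 1
    return [t / c for t, c in zip(totals, counts)]
```

Reads of different lengths are handled by counting, for each position, only the reads that reach it. A falling mean toward the 3' end is the usual sign that trimming is worthwhile.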

Btrim

• ShortRead drops the reads that contain ‘N’, but low-quality bases still remain, so we filter and trim the ShortRead output with Btrim.

• Introduction:

  • Btrim is a fast and lightweight tool to trim adapters and low-quality regions in reads from ultra high-throughput next-generation sequencing machines.

• Note:

  • Run FastQC again to get a new quality control report and check whether the filtered and trimmed reads are reasonable. Just edit the command in your previous fastqc.pbs and submit it.
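Quality trimming of this kind can be illustrated with a generic sliding-window sketch. This is not Btrim's implementation, and the window size and cutoff below are arbitrary example values; it assumes Phred+33 quality strings and trims only the 3' end.

```python
from typing import Tuple


def trim_3prime(seq: str, qual: str, window: int = 5,
                min_avg: float = 20.0, offset: int = 33) -> Tuple[str, str]:
    """Trim the 3' end of a read using a moving-window average quality."""
    scores = [ord(ch) - offset for ch in qual]  # assumption: Phred+33
    end = len(scores)
    # Slide the window in from the 3' end until its average passes the cutoff.
    while end >= window:
        if sum(scores[end - window:end]) / window >= min_avg:
            break
        end -= 1
    else:
        end = 0  # no window passed, or the read is shorter than one window
    # Drop any low-quality bases left at the very end of the accepted window.
    while end > 0 and scores[end - 1] < min_avg:
        end -= 1
    return seq[:end], qual[:end]
```

A read that comes back empty, or shorter than a chosen minimum length, would be discarded; for paired-end data its mate then needs the same treatment.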

HISAT2

• Introduction:

  • HISAT2 is a fast and sensitive alignment program for mapping next-generation sequencing reads (both DNA and RNA) to a population of human genomes (as well as against a single reference genome).

• Advantage: Highly efficient

• Note:

  • It creates a SAM file that htseq-count can use directly in the next step.
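Before counting, it can be useful to check how many reads in the SAM file actually aligned. The sketch below is a minimal illustration that relies only on the SAM format itself: header lines start with `@`, the second column is the bitwise FLAG, and bit 0x4 marks an unmapped read. The function name is hypothetical.

```python
from typing import Iterable, Tuple


def count_mapped(sam_lines: Iterable[str]) -> Tuple[int, int]:
    """Return (mapped, unmapped) counts of primary records in SAM text."""
    mapped = unmapped = 0
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        flag = int(line.split("\t")[1])
        if flag & 0x100 or flag & 0x800:
            continue  # secondary or supplementary alignment, not a new read
        if flag & 0x4:
            unmapped += 1
        else:
            mapped += 1
    return mapped, unmapped
```

For paired-end data each mate has its own record, so the totals count mates rather than pairs.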

HTSeq

• Introduction:

  • HTSeq is a Python package that provides infrastructure to process data from high-throughput sequencing assays.

• Usage:

  • htseq-count [options] <alignment_file> <gff_file>
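When htseq-count is run once per alignment file, the result is one count table per sample, while the differential expression step needs a single gene-by-sample matrix. The sketch below shows one way to merge the tables. It assumes the usual two-column, tab-separated output (feature ID, count) with summary rows whose names start with `__`, such as `__no_feature`; the function names are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def parse_htseq_counts(lines: Iterable[str]) -> Dict[str, int]:
    """Read one htseq-count table into a {feature_id: count} dictionary."""
    counts: Dict[str, int] = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        feature, value = line.split("\t")[:2]
        if feature.startswith("__"):
            continue  # summary rows such as __no_feature or __ambiguous
        counts[feature] = int(value)
    return counts


def build_count_matrix(
    per_sample: Dict[str, Dict[str, int]]
) -> Tuple[List[str], List[str], List[List[int]]]:
    """Combine per-sample counts into (genes, samples, matrix[gene][sample])."""
    samples = list(per_sample)
    genes = sorted({g for counts in per_sample.values() for g in counts})
    # A gene missing from one sample's table is treated as a zero count.
    matrix = [[per_sample[s].get(g, 0) for s in samples] for g in genes]
    return genes, samples, matrix
```

The rows follow the sorted gene IDs and the columns follow the order in which the samples were supplied, so the sample order should match the condition labels given to the differential expression step.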

DESeq

• Introduction:

  • Differential gene expression analysis based on the negative binomial distribution.
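Before the negative binomial model is fitted, DESeq normalizes the samples for sequencing depth with median-of-ratios size factors. The sketch below illustrates that calculation in plain Python. It is a simplified illustration rather than the package's code, and the function name and input layout (one row per gene, one column per sample) are assumptions.

```python
import math
from statistics import median
from typing import List, Sequence


def size_factors(counts: Sequence[Sequence[int]]) -> List[float]:
    """Median-of-ratios size factors for a gene-by-sample count matrix."""
    n_samples = len(counts[0])
    log_ratios: List[List[float]] = [[] for _ in range(n_samples)]
    for row in counts:
        if any(c <= 0 for c in row):
            continue  # genes with a zero count have no usable geometric mean
        # Log of the geometric mean of this gene across all samples.
        log_geo_mean = sum(math.log(c) for c in row) / n_samples
        for j, c in enumerate(row):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    if not log_ratios[0]:
        raise ValueError("no gene has a non-zero count in every sample")
    # Size factor of a sample: median ratio to the per-gene geometric mean.
    return [math.exp(median(ratios)) for ratios in log_ratios]
```

Dividing each sample's counts by its size factor puts the samples on a common scale; a sample sequenced twice as deeply as another gets a size factor about twice as large.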

Further work

• Try to connect the whole pipeline so that the analysis runs with fewer commands and steps.

• Adjust the programs to our system so that they can handle ambiguously mapped reads.
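One possible starting point for connecting the steps is a small script that assembles the command line for each tool per sample. The sketch below only builds and prints the commands for FastQC, HISAT2 and htseq-count; it does not run them, and it leaves out the ShortRead, Btrim and DESeq steps. The index prefix `genome_index`, the annotation file `genes.gff`, the output directory `qc_reports` and the `<sample>_read1.fq` naming pattern are assumptions for illustration.

```python
import shlex
from typing import List


def fastqc_cmd(fastq_files: List[str], out_dir: str) -> List[str]:
    """Quality control report for one or more FASTQ files."""
    return ["fastqc", "-o", out_dir, *fastq_files]


def hisat2_cmd(index_prefix: str, read1: str, read2: str,
               sam_out: str) -> List[str]:
    """Paired-end alignment that writes a SAM file."""
    return ["hisat2", "-x", index_prefix, "-1", read1, "-2", read2,
            "-S", sam_out]


def htseq_count_cmd(sam_file: str, gff_file: str) -> List[str]:
    """Read counting; htseq-count prints the count table to stdout."""
    return ["htseq-count", "-f", "sam", sam_file, gff_file]


def sample_commands(sample: str, index_prefix: str, gff_file: str,
                    qc_dir: str) -> List[List[str]]:
    """Commands for one sample, in the order they would be run."""
    read1, read2 = f"{sample}_read1.fq", f"{sample}_read2.fq"
    sam_out = f"{sample}.sam"
    return [
        fastqc_cmd([read1, read2], qc_dir),
        hisat2_cmd(index_prefix, read1, read2, sam_out),
        htseq_count_cmd(sam_out, gff_file),
    ]


if __name__ == "__main__":
    for name in ["gu2", "gu3", "ye1", "ye3"]:
        for argv in sample_commands(name, "genome_index", "genes.gff",
                                    "qc_reports"):
            print(shlex.join(argv))
```

The printed lines could be pasted into a PBS job script, with the htseq-count output redirected to a per-sample file. Keeping each command as an argument list also makes it straightforward to run them later with `subprocess.run` instead of printing them.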

Thank you
