
1/17/2013


RNA-seq Analysis in Galaxy

Local copy of Galaxy: https://galaxy.wi.mit.edu/
Main site: http://main.g2.bx.psu.edu/

January, 2013

Hot Topics: RNA-seq Analysis in Galaxy.


Outline

• What is the Galaxy Interface?

• What is RNA-seq?

• Data upload in Galaxy

• Data preprocessing in Galaxy

• Tuxedo tools for RNA-seq analysis

• RNA-seq analysis workflows: Hands-on


The Galaxy Interface

A web-based platform for the analysis of large genomic datasets


• No programming experience is needed.

• Integrates many bioinformatics tools within one interface.

• Keeps track of all the steps performed in an analysis. Even if you delete the datasets, the history keeps a record of the tools used.

LOCAL COPY


• Faster

• Customizable

• 250 GB of storage

• Data is private

• Jobs are sent to the cluster

Type “https://galaxy.wi.mit.edu/” into your browser's address bar. You will be prompted for your user name and password (the same ones you use for your email).


Galaxy Interface: Analyze Data

Tools window

Data display and tool’s dialog window

History window:

Data analysis

Processed data

• Green: job is finished

• Yellow: job is running

• Gray: job is in queue

• Red: there is a problem


History window:

• All analysis steps are saved.

• Data is not overwritten.

• You can create a workflow to repeat an analysis.


How to find your previous histories


History menu

RNA-seq Experiment

Wang, Z. et al. RNA-Seq: a revolutionary tool for transcriptomics. Nature Reviews Genetics (2009)


RNA-seq Applications

• Annotation

Identify novel genes, transcripts, exons, splicing events, ncRNAs.

• Detecting RNA editing and SNPs.

• Measurements: RNA quantification and differential gene expression

Abundance of transcripts between different conditions


Getting Data: Uploading Large Files

Step 1: Copy your file to /nfs/galaxy/uploads/[email protected] using an SFTP client, for example Cyberduck (SFTP, port 22).


Step 2: Select and upload the file within Galaxy

In the Upload File tool, select the file, choose the Genome Assembly, and click Execute.

Preprocessing NGS Quality Control: FastQC
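One of the checks FastQC reports is the quality score at each position along the reads. The sketch below is not FastQC itself; it only illustrates that idea in plain Python, assuming a four-line-per-record FASTQ file with Phred+33 quality encoding. The function names and the two-read example are hypothetical.

```python
from collections import defaultdict
from io import StringIO


def read_fastq(handle):
    """Yield (name, sequence, quality_string) from a 4-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return
        sequence = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored
        quality = handle.readline().rstrip("\n")
        yield header[1:], sequence, quality


def mean_quality_per_position(handle, offset=33):
    """Return the mean Phred score at each read position (Phred+33 is an assumption)."""
    totals = defaultdict(int)
    counts = defaultdict(int)
    for _name, _sequence, quality in read_fastq(handle):
        for position, char in enumerate(quality):
            totals[position] += ord(char) - offset
            counts[position] += 1
    return [totals[i] / counts[i] for i in sorted(totals)]


# Hypothetical two-read input: 'I' encodes Q40 and '#' encodes Q2 in Phred+33.
example = StringIO("@read1\nACGT\n+\nIIII\n@read2\nACGT\n+\nII##\n")
print(mean_quality_per_position(example))  # [40.0, 40.0, 21.0, 21.0]
```

A drop in mean quality toward the 3' end of the reads, as in the last two positions here, is the kind of pattern that motivates the filtering step that follows.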


Preprocessing: Remove low-quality reads

FASTX-Toolkit -> Filter by quality
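The Filter by quality tool keeps a read only when a minimum percentage of its bases reach a quality cut-off. The following is a minimal sketch of that rule, not the FASTX-Toolkit code. The function names, the Phred+33 encoding, and the default cut-off (20) and percentage (90) are assumptions chosen for illustration.

```python
def passes_quality_filter(quality, min_quality=20, min_percent=90, offset=33):
    """Return True if at least min_percent of bases have Phred score >= min_quality."""
    if not quality:
        return False
    good_bases = sum(1 for char in quality if ord(char) - offset >= min_quality)
    return 100 * good_bases / len(quality) >= min_percent


def filter_fastq_records(records, min_quality=20, min_percent=90):
    """Keep (name, sequence, quality) records that pass the quality rule."""
    return [
        record
        for record in records
        if passes_quality_filter(record[2], min_quality, min_percent)
    ]


# Hypothetical reads: 'I' encodes Q40 and '#' encodes Q2 in Phred+33.
reads = [
    ("read1", "ACGT", "IIII"),  # 100% of bases at or above Q20: kept
    ("read2", "ACGT", "II##"),  # only 50% of bases at or above Q20: dropped
]
print([name for name, _seq, _qual in filter_fastq_records(reads)])  # ['read1']
```

Lowering `min_percent` makes the filter more permissive; with `min_percent=50` both example reads would be kept.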


RNA-seq Analysis: Tuxedo Tools


Tool               Tool description

Bowtie             Ultrafast short read aligner
TopHat             Aligns RNA-seq reads to the genome using Bowtie. Discovers splice sites
Cufflinks package
  Cufflinks        Assembles transcripts
  Cuffcompare      Compares transcript assemblies to annotation
  Cuffmerge        Merges two or more transcript assemblies
  Cuffdiff         Finds differentially expressed genes and transcripts. Detects differential splicing and promoter use
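Galaxy runs these tools through its web forms, but it can help to see how they chain together on the command line. The sketch below only builds the command lines and does not execute anything. It assumes single-end reads and follows the general order of the TopHat and Cufflinks protocol; the function name, file names, and output directory names are hypothetical.

```python
from pathlib import PurePosixPath


def tuxedo_commands(conditions, bowtie_index, annotation_gtf, genome_fasta):
    """Build TopHat, Cufflinks, Cuffmerge and Cuffdiff command lines (not executed).

    conditions maps a condition label to its list of single-end FASTQ files.
    """
    commands = []
    bams_by_condition = {}
    for label, fastq_files in conditions.items():
        bams = []
        for fastq in fastq_files:
            sample = PurePosixPath(fastq).stem
            tophat_dir = f"{sample}_thout"
            bam = f"{tophat_dir}/accepted_hits.bam"
            # TopHat aligns the reads with Bowtie and discovers splice sites.
            commands.append(
                ["tophat", "-G", annotation_gtf, "-o", tophat_dir, bowtie_index, fastq]
            )
            # Cufflinks assembles transcripts from the alignments.
            commands.append(["cufflinks", "-o", f"{sample}_clout", bam])
            bams.append(bam)
        bams_by_condition[label] = bams
    # Cuffmerge merges the assemblies; assemblies.txt (assumed to exist) lists
    # one transcripts.gtf path per line.
    commands.append(
        ["cuffmerge", "-g", annotation_gtf, "-s", genome_fasta,
         "-o", "merged_asm", "assemblies.txt"]
    )
    # Cuffdiff compares conditions; replicates of one condition are comma-separated.
    commands.append(
        ["cuffdiff", "-o", "diff_out", "-L", ",".join(bams_by_condition),
         "merged_asm/merged.gtf"]
        + [",".join(bams) for bams in bams_by_condition.values()]
    )
    return commands


# Hypothetical experiment: two conditions with one replicate each.
for command in tuxedo_commands(
    {"C1": ["C1_R1.fastq"], "C2": ["C2_R1.fastq"]},
    bowtie_index="genome",
    annotation_gtf="genes.gtf",
    genome_fasta="genome.fa",
):
    print(" ".join(command))
```

Returning the commands as lists rather than running them keeps the sketch safe to try without the tools installed; each list could be passed to `subprocess.run` on a machine where they are available.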


Examples of analysis workflows

• Hands-on 1: Differential expression analysis

• Hands-on 2: Transcript assembly and differential expression analysis

• Hands-on 3: Transcript assembly and transcript comparison


Tutorials and References

• Galaxy tutorials

http://galaxy.psu.edu/screencasts.html

• Previous Hot Topics

http://jura.wi.mit.edu/bio/education/hot_topics

• SOPs (Standard operating procedures)

https://gir.wi.mit.edu/trac/wiki/barc/SOPs

• References

Taylor et al. (2007) Using Galaxy to perform large-scale interactive data analyses. Current Protocols in Bioinformatics, Chapter 10, Unit 10.

Blankenberg et al. (2010) Manipulation of FASTQ data with Galaxy. Bioinformatics 26(14):1783-5.

Trapnell et al. (2012) Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks. Nature Protocols 7, 562–578.

Tophat Manual: http://tophat.cbcb.umd.edu/manual.html

Cufflinks Manual: http://cufflinks.cbcb.umd.edu/manual.html
