Analysis and visualization of microarray experiment data integrating Pipeline Pilot, Spotfire and R Vladimir Morozov, ALS Therapy Development Institute 2009 - Boston, MA


Page 1

Analysis and visualization of microarray experiment data

integrating Pipeline Pilot, Spotfire and R

Vladimir Morozov, ALS Therapy Development Institute

2009 - Boston, MA

Page 2

Abstract

More than 30 public and proprietary microarray experiments have been analyzed using in-house software. Pipeline Pilot workflows are developed to integrate the analysis results into the company gene target Knowledge Sphere platform. The gene expression values are analyzed and plotted via the R connector and custom R scripts. Pipeline Pilot workflows are embedded as Spotfire guides to retrieve gene annotation from NCBI and to produce visualizations of differential expression statistics and biological pathway regulation.

Page 3

R/Bioconductor pipeline

Input: Affymetrix experiment data & annotation (AffyBatch and design)

Quality control: array quality, QC images

Normalization: ExprSet; gene expression values stored on SQL server

Modeling gene expression by biological variables: gene statistics

Pathway analysis of the gene models: pathway statistics

Outputs: R data files, images, tables, RDBMS

Page 4

The data are available via the company portal

Page 5
Page 6

Access to experiment data set

Links call a Pipeline Pilot protocol

Page 7

A PP protocol parses a directory with the experiment files and exposes them through a web page
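
The directory-to-web-page step can be illustrated outside Pipeline Pilot. The following is only a minimal Python sketch of the idea, not the protocol itself: the function name `build_index`, the page layout and the demo file names are all assumptions made for illustration.

```python
import html
import tempfile
from pathlib import Path
from urllib.parse import quote


def build_index(directory: Path) -> str:
    """Return a minimal HTML page that links to every file in `directory`."""
    items = []
    for path in sorted(p for p in directory.iterdir() if p.is_file()):
        href = quote(path.name)         # URL-encode the link target
        label = html.escape(path.name)  # HTML-escape the visible text
        size = path.stat().st_size
        items.append(f'<li><a href="{href}">{label}</a> ({size} bytes)</li>')
    body = "\n".join(items)
    return f"<html><body>\n<ul>\n{body}\n</ul>\n</body></html>"


if __name__ == "__main__":
    # Demo on a throwaway directory with hypothetical experiment files.
    with tempfile.TemporaryDirectory() as tmp:
        exp_dir = Path(tmp)
        (exp_dir / "gene_statistics.txt").write_text("gene\tlogFC\n")
        (exp_dir / "qc_image.png").write_bytes(b"")
        print(build_index(exp_dir))
```

A real deployment would serve the files themselves as well; the sketch only produces the listing page.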

Page 8
Page 9
Page 10
Page 11

Automatically generate volcano plots for all the statistical comparisons from the design file
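
A volcano plot places each gene at (log2 fold change, -log10 p-value), so producing one per comparison amounts to looping over the comparisons listed in the design file. The Python sketch below shows that loop under stated assumptions: the design file layout, the comparison names, the gene statistics and the significance cutoffs are all hypothetical, and the actual plotting (done in Spotfire in the talk) is left out.

```python
import csv
import io
import math

# Hypothetical design file: one row per statistical comparison.
DESIGN_TSV = "comparison\ttreatment\tcontrol\nA_vs_B\tA\tB\nC_vs_D\tC\tD\n"

# Hypothetical gene statistics per comparison: (gene, log2 fold change, p-value).
STATS = {
    "A_vs_B": [("gene1", 2.0, 0.001), ("gene2", -0.2, 0.5)],
    "C_vs_D": [("gene1", -1.5, 0.01), ("gene3", 0.1, 0.9)],
}


def volcano_points(rows, lfc_cutoff=1.0, p_cutoff=0.05):
    """Convert gene statistics into volcano-plot coordinates."""
    points = []
    for gene, lfc, p_value in rows:
        points.append({
            "gene": gene,
            "x": lfc,                     # effect size on the x axis
            "y": -math.log10(p_value),    # significance on the y axis
            "significant": abs(lfc) >= lfc_cutoff and p_value <= p_cutoff,
        })
    return points


def volcano_for_all(design_text, stats):
    """Build one set of volcano points for every comparison in the design."""
    reader = csv.DictReader(io.StringIO(design_text), delimiter="\t")
    return {row["comparison"]: volcano_points(stats[row["comparison"]])
            for row in reader}


if __name__ == "__main__":
    for name, points in volcano_for_all(DESIGN_TSV, STATS).items():
        flagged = [p["gene"] for p in points if p["significant"]]
        print(name, "significant genes:", flagged)
```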

Page 12

Under the hood:

The Guide page links out to start the PP protocol

Page 13

Modified Discngine connector as components

Custom JavaScripts using the Spotfire API

Page 14

Parsing experiment design data

Page 15

JavaScripts with JSON input are exposed on the Spotfire Guide panel
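
To make the JSON hand-off concrete, here is a minimal Python sketch of writing such an input file on the server side. The key names (`experiment`, `geneIds`), the file name and the ID values are assumptions for illustration; the slides do not show the actual schema read by the Guide scripts.

```python
import json
import tempfile
from pathlib import Path


def write_guide_input(path: Path, experiment_id: str, gene_ids) -> dict:
    """Write a JSON payload that a guide script could read."""
    payload = {
        "experiment": experiment_id,
        "geneIds": sorted(set(gene_ids)),  # de-duplicate and order the IDs
    }
    path.write_text(json.dumps(payload, indent=2))
    return payload


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = Path(tmp) / "guide_input.json"
        write_guide_input(target, "experiment_01", [202, 101, 202])
        print(target.read_text())
```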

Page 16

Pathway analysis visualizations are generated via a similar Pipeline Pilot-JavaScript-Spotfire framework

Page 17
Page 18
Page 19

Search for genes inside the Spotfire visualization

Page 20
Page 21
Page 22

Under the hood:

User query → NCBI GET ESearch (couldn’t get the web service to work in PP)

Entrez IDs extracted

Propagated to all mammalian ortholog IDs via the HomoloGene database on SQL Server. The PP “ODBC Select for Each Data” component is slow, so the join is done via a temporary text table and SQL BULK INSERT

The gene IDs are passed to the Spotfire Guide JavaScript via a JSON file
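
The search steps above can be sketched in Python as follows. This is an illustration under assumptions, not the talk's implementation: it builds an ESearch GET URL for the NCBI E-utilities (no request is sent), reads IDs from a JSON-format ESearch response, and replaces SQL Server's BULK INSERT with an in-memory SQLite temporary table to show the same idea of one set-based join instead of one query per ID. The `homologene` table layout is simplified and the gene IDs in the demo are made up.

```python
import sqlite3
from urllib.parse import urlencode

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def esearch_url(term: str, retmax: int = 20) -> str:
    """Build a GET URL that searches the NCBI Gene database."""
    params = {"db": "gene", "term": term, "retmode": "json", "retmax": retmax}
    return ESEARCH + "?" + urlencode(params)


def extract_ids(response: dict) -> list:
    """Pull the gene IDs out of a parsed JSON ESearch response."""
    return [int(i) for i in response["esearchresult"]["idlist"]]


def propagate_orthologs(conn, gene_ids) -> list:
    """Expand gene IDs to every member of their homology groups.

    All IDs are loaded into a temporary table and joined once, instead of
    running one SELECT per ID. IDs absent from `homologene` are dropped.
    """
    conn.execute("CREATE TEMP TABLE query_genes (gene_id INTEGER PRIMARY KEY)")
    conn.executemany("INSERT INTO query_genes VALUES (?)",
                     [(g,) for g in set(gene_ids)])
    rows = conn.execute(
        "SELECT DISTINCT h2.gene_id FROM query_genes q "
        "JOIN homologene h1 ON h1.gene_id = q.gene_id "
        "JOIN homologene h2 ON h2.hid = h1.hid "
        "ORDER BY h2.gene_id"
    ).fetchall()
    conn.execute("DROP TABLE query_genes")
    return [row[0] for row in rows]


if __name__ == "__main__":
    print(esearch_url("SOD1"))
    conn = sqlite3.connect(":memory:")
    # Simplified, hypothetical table: homology group, taxon, gene ID.
    conn.execute("CREATE TABLE homologene (hid INTEGER, tax_id INTEGER, gene_id INTEGER)")
    conn.executemany("INSERT INTO homologene VALUES (?, ?, ?)",
                     [(1, 9606, 101), (1, 10090, 202), (2, 9606, 303)])
    found = extract_ids({"esearchresult": {"idlist": ["101"]}})
    print(propagate_orthologs(conn, found))
```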

Page 23

Analysis of individual gene expression

Call to the PP protocol with gene and experiment IDs as parameters

Page 24

Opens the PP protocol that reads gene expression data from the MS SQL server

Page 25
Page 26
Page 27

Under the hood:

Custom R scripts using the “pairwiseCI”, “Hmisc” and “ggplot2” R packages. The PP “Run R” component is modified to accept command line arguments:

• Expression values from SQL server

• Experiment design file
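
The talk's scripts are written in R; as a language-neutral illustration of the same calling pattern, the Python sketch below takes the two inputs as command-line arguments and summarizes one gene's expression by experimental group. The file layouts (`sample`/`value` and `sample`/`group` columns) and all names are assumptions, and the group mean stands in for the confidence-interval analysis that “pairwiseCI” would perform.

```python
import argparse
import csv
import statistics
import tempfile
from collections import defaultdict
from pathlib import Path


def parse_args(argv):
    parser = argparse.ArgumentParser(description="Summarize expression by group")
    parser.add_argument("expression_file", help="TSV with columns: sample, value")
    parser.add_argument("design_file", help="TSV with columns: sample, group")
    return parser.parse_args(argv)


def read_tsv(path):
    with open(path, newline="") as handle:
        return list(csv.DictReader(handle, delimiter="\t"))


def group_means(expression_rows, design_rows):
    """Join expression values to the design by sample, then average per group."""
    group_of = {row["sample"]: row["group"] for row in design_rows}
    values = defaultdict(list)
    for row in expression_rows:
        values[group_of[row["sample"]]].append(float(row["value"]))
    return {group: statistics.mean(vals) for group, vals in values.items()}


def main(argv):
    args = parse_args(argv)
    return group_means(read_tsv(args.expression_file), read_tsv(args.design_file))


if __name__ == "__main__":
    # Demo with hypothetical files; a real call would pass sys.argv[1:].
    with tempfile.TemporaryDirectory() as tmp:
        expr = Path(tmp) / "expression.tsv"
        design = Path(tmp) / "design.tsv"
        expr.write_text("sample\tvalue\ns1\t1.0\ns2\t3.0\ns3\t5.0\n")
        design.write_text("sample\tgroup\ns1\tcontrol\ns2\tcontrol\ns3\ttreated\n")
        print(main([str(expr), str(design)]))
```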

Page 28

Acknowledgments

• Shawn Sullivan & Bashar Alnakhala (ALS-TDI) for providing SQL server storage and Web front-end

• Eric Le Roux (Discngine) for the Spotfire connector

• All ALS-TDI scientists for feedback