sdcsb advanced tutorial: reproducible data visualization workflow with cytoscape and ipython...
SDCSB Advanced Cytoscape Tutorial 4/17/2015 @Sanford Keiichiro Ono UCSD Trey Ideker Lab Cytoscape Core Team Building Reproducible Network Data Visualization Workflows with Cytoscape and IPython Notebook

Upload: keiichiro-ono

Post on 15-Jul-2015


TRANSCRIPT

Page 1: SDCSB Advanced Tutorial: Reproducible Data Visualization Workflow with Cytoscape and IPython Notebook

SDCSB Advanced Cytoscape Tutorial

4/17/2015 @Sanford

Keiichiro Ono UCSD Trey Ideker Lab Cytoscape Core Team

Building Reproducible Network Data Visualization Workflows with Cytoscape and IPython Notebook

Page 2: SDCSB Advanced Tutorial: Reproducible Data Visualization Workflow with Cytoscape and IPython Notebook

Thanks for Attending!

You are about to learn modern tools that will boost your productivity!


Page 3: SDCSB Advanced Tutorial: Reproducible Data Visualization Workflow with Cytoscape and IPython Notebook

Keiichiro Ono

Background

- Bioinformatics

- Computer Science

Work

- Research: Bioinformatics workflow, Visualization pipeline

- Data: Visualization (Networks, Other Biological Data), Integration (Molecular Interactions, Pathways, Annotations)

- Software Development: Cytoscape, NeXO, Cyberinfrastructure, All kinds of small tools

Like

- Art: Kandinsky, Mondrian

- Music: Electronica, Techno (Minimal, Detroit), Jazz

- Sci-fi: Movie, Novel

Life

- US: San Diego, San Francisco Bay Area, Los Angeles, Orange County

- Japan: Gifu, Tokyo

Page 6: SDCSB Advanced Tutorial: Reproducible Data Visualization Workflow with Cytoscape and IPython Notebook

Computer Science Biology

Page 7: SDCSB Advanced Tutorial: Reproducible Data Visualization Workflow with Cytoscape and IPython Notebook

Cytoscape and IPython Notebook for Reproducible Data Visualization Workflow

Page 8: SDCSB Advanced Tutorial: Reproducible Data Visualization Workflow with Cytoscape and IPython Notebook

Review: Basic Data Visualization Workflow with Cytoscape

Page 9: SDCSB Advanced Tutorial: Reproducible Data Visualization Workflow with Cytoscape and IPython Notebook

Basic Workflow

1. Data Integration (Load Networks and Tables)

2. Data Analysis

3. Visualization

4. Prepare for Publication

Page 10: SDCSB Advanced Tutorial: Reproducible Data Visualization Workflow with Cytoscape and IPython Notebook

Network Data

Annotated Networks

Attributes

Analyzed Data

Page 11: SDCSB Advanced Tutorial: Reproducible Data Visualization Workflow with Cytoscape and IPython Notebook

Cline, Melissa S., et al. "Integration of biological networks and gene expression data using Cytoscape." Nature protocols 2.10 (2007): 2366-2382.

Page 13: SDCSB Advanced Tutorial: Reproducible Data Visualization Workflow with Cytoscape and IPython Notebook

Results

Page 14: SDCSB Advanced Tutorial: Reproducible Data Visualization Workflow with Cytoscape and IPython Notebook

Sharing Results

😐

Page 15: SDCSB Advanced Tutorial: Reproducible Data Visualization Workflow with Cytoscape and IPython Notebook

Sharing Results and Process

😃

Page 16: SDCSB Advanced Tutorial: Reproducible Data Visualization Workflow with Cytoscape and IPython Notebook

Point & Click Operation is Easy, but not Reproducible…

Page 17: SDCSB Advanced Tutorial: Reproducible Data Visualization Workflow with Cytoscape and IPython Notebook

Problems in Bioinformatics

- No more free lunch

- Even if you buy expensive machines, you no longer get performance gains for free. You have to design your code for massively distributed environments (from scale-up to scale-out).

- Complex data analysis pipelines

- You need to build a pipeline by connecting multiple resources or services

- Need for complex, customized data visualization

- Reproducibility

➡ But building, deploying, and maintaining a reproducible pipeline is not straightforward

Page 18: SDCSB Advanced Tutorial: Reproducible Data Visualization Workflow with Cytoscape and IPython Notebook

Goal: Reproducible Science

Page 20: SDCSB Advanced Tutorial: Reproducible Data Visualization Workflow with Cytoscape and IPython Notebook

Tools You Need

Page 21: SDCSB Advanced Tutorial: Reproducible Data Visualization Workflow with Cytoscape and IPython Notebook

- Docker - Data analysis environment in a portable container

- GitHub - For source code sharing

- IPython Notebook - Your electronic lab notebook

- cyREST - RESTful API module for Cytoscape
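As a rough illustration of what "RESTful API module for Cytoscape" means in practice, the sketch below builds a small network as JSON and prepares an HTTP request for it. It is not taken from the tutorial notebooks. The base URL `http://localhost:1234/v1`, the `/networks` endpoint, and the Cytoscape.js-style JSON layout are assumptions about a default cyREST setup, and `to_cytoscape_js` and `build_create_request` are hypothetical helper names. The request is only constructed, not sent, so the code runs without Cytoscape.

```python
import json
import urllib.request

# Assumption: cyREST listens on localhost:1234 with a /v1 API root.
BASE_URL = "http://localhost:1234/v1"


def to_cytoscape_js(name, edges):
    """Convert an edge list into a Cytoscape.js-style JSON dictionary."""
    nodes = sorted({n for edge in edges for n in edge})
    return {
        "data": {"name": name},
        "elements": {
            "nodes": [{"data": {"id": n, "name": n}} for n in nodes],
            "edges": [{"data": {"source": s, "target": t}} for s, t in edges],
        },
    }


def build_create_request(network):
    """Prepare (but do not send) a POST request that would create a network."""
    body = json.dumps(network).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + "/networks",  # assumed endpoint for creating networks
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


network = to_cytoscape_js("demo", [("A", "B"), ("B", "C")])
request = build_create_request(network)
# With Cytoscape and cyREST running, the request could be sent with:
#   urllib.request.urlopen(request)
```

Because the whole exchange is plain HTTP and JSON, the same call could be made from any language, which is the point of exposing Cytoscape through a REST API.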

Page 23: SDCSB Advanced Tutorial: Reproducible Data Visualization Workflow with Cytoscape and IPython Notebook

Why Python?

Page 24: SDCSB Advanced Tutorial: Reproducible Data Visualization Workflow with Cytoscape and IPython Notebook

- Full-stack

- Data preparation to web application

- Easy to learn

- Strong support from data science community

- Tons of high-performance libraries

Page 25: SDCSB Advanced Tutorial: Reproducible Data Visualization Workflow with Cytoscape and IPython Notebook

A community for developers and users of Python data tools

pydata.org

Page 26: SDCSB Advanced Tutorial: Reproducible Data Visualization Workflow with Cytoscape and IPython Notebook

by Peter Wang @PyData 2014

Page 28: SDCSB Advanced Tutorial: Reproducible Data Visualization Workflow with Cytoscape and IPython Notebook

But most of the tools are language-agnostic!

Page 29: SDCSB Advanced Tutorial: Reproducible Data Visualization Workflow with Cytoscape and IPython Notebook

Basic Data Visualization Workflow

Page 30: SDCSB Advanced Tutorial: Reproducible Data Visualization Workflow with Cytoscape and IPython Notebook

Data Preparation Analysis Visualization

Page 31: SDCSB Advanced Tutorial: Reproducible Data Visualization Workflow with Cytoscape and IPython Notebook

Data Preparation

Page 32: SDCSB Advanced Tutorial: Reproducible Data Visualization Workflow with Cytoscape and IPython Notebook

Data Preparation

- Cleansing

- Normalization

- Missing values

- Corrupted values

- Reformat

- Conversion
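The preparation steps listed above can be sketched in a few lines of plain Python. This is a minimal illustration, not the tutorial's own code: the sample rows, the column names (`gene`, `expression`), and the helpers `clean_rows` and `min_max_normalize` are all hypothetical.

```python
import math

# Hypothetical raw table: gene symbols with expression values as strings.
raw_rows = [
    {"gene": " tp53 ", "expression": "2.0"},
    {"gene": "BRCA1", "expression": ""},     # missing value
    {"gene": "egfr", "expression": "n/a"},   # corrupted value
    {"gene": "MYC", "expression": "6.0"},
]


def clean_rows(rows):
    """Reformat names, convert values to float, and drop unusable rows."""
    cleaned = []
    for row in rows:
        name = row["gene"].strip().upper()    # reformat: consistent identifiers
        try:
            value = float(row["expression"])  # conversion: str -> float
        except ValueError:
            continue                          # missing or corrupted value
        if math.isnan(value):
            continue                          # treat NaN as missing
        cleaned.append({"gene": name, "expression": value})
    return cleaned


def min_max_normalize(rows, key="expression"):
    """Rescale a numeric column to the range [0, 1]."""
    values = [r[key] for r in rows]
    low, high = min(values), max(values)
    span = (high - low) or 1.0                # avoid division by zero
    return [dict(r, **{key: (r[key] - low) / span}) for r in rows]


prepared = min_max_normalize(clean_rows(raw_rows))
print(prepared)
```

Dropping unusable rows is only one possible policy. Imputing a default value would be an equally valid choice, depending on the data set.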

Page 33: SDCSB Advanced Tutorial: Reproducible Data Visualization Workflow with Cytoscape and IPython Notebook

Data Preparation Analysis Visualization

Page 34: SDCSB Advanced Tutorial: Reproducible Data Visualization Workflow with Cytoscape and IPython Notebook

Analysis

Page 35: SDCSB Advanced Tutorial: Reproducible Data Visualization Workflow with Cytoscape and IPython Notebook

Analysis

- Filtering

- Standard graph statistics

- Density

- Betweenness

- Centrality

- Clustering

- Community Detection

- GO enrichment analysis
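A few of the statistics named above are simple enough to compute by hand. The sketch below uses a hypothetical four-node edge list and plain Python to show filtering, density, degree centrality, and a local clustering coefficient. Betweenness, community detection, and GO enrichment are left out because they normally rely on a dedicated library.

```python
from itertools import combinations

# Hypothetical undirected network as an edge list.
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")]

# Build an adjacency map: node -> set of neighbors.
adjacency = {}
for source, target in edges:
    adjacency.setdefault(source, set()).add(target)
    adjacency.setdefault(target, set()).add(source)


def density(adj):
    """Fraction of possible undirected edges that are present."""
    n = len(adj)
    m = sum(len(neighbors) for neighbors in adj.values()) / 2
    return 2 * m / (n * (n - 1)) if n > 1 else 0.0


def degree_centrality(adj):
    """Degree of each node divided by the maximum possible degree."""
    n = len(adj)
    return {node: len(nb) / (n - 1) for node, nb in adj.items()}


def clustering_coefficient(adj, node):
    """Share of a node's neighbor pairs that are themselves connected."""
    neighbors = adj[node]
    if len(neighbors) < 2:
        return 0.0
    pairs = list(combinations(neighbors, 2))
    linked = sum(1 for u, v in pairs if v in adj[u])
    return linked / len(pairs)


# Filtering: keep only nodes whose degree is at least 2.
hubs = sorted(node for node, nb in adjacency.items() if len(nb) >= 2)
print(density(adjacency), degree_centrality(adjacency), hubs)
```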

Page 36: SDCSB Advanced Tutorial: Reproducible Data Visualization Workflow with Cytoscape and IPython Notebook

Data Preparation Analysis Visualization

Page 37: SDCSB Advanced Tutorial: Reproducible Data Visualization Workflow with Cytoscape and IPython Notebook

Visualization

Page 38: SDCSB Advanced Tutorial: Reproducible Data Visualization Workflow with Cytoscape and IPython Notebook

Visualization

- Mapping

- Data points to visual variables

- Layout

- For graphs:

- Force-directed

- Tree
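Mapping "data points to visual variables" can be read as a function from a data value to a size or a color. The following sketch shows one way to write a continuous (linear) mapping in plain Python. The node scores, the size range, and the function names are hypothetical and chosen only for illustration.

```python
def continuous_mapping(value, domain, visual_range):
    """Linearly interpolate a data value into a visual range, clamping at the ends."""
    low, high = domain
    out_low, out_high = visual_range
    if high == low:
        return out_low
    t = (value - low) / (high - low)
    t = max(0.0, min(1.0, t))  # clamp values outside the domain
    return out_low + t * (out_high - out_low)


def color_mapping(value, domain, start_rgb, end_rgb):
    """Map a value to a hex color between two RGB endpoints."""
    channels = [
        round(continuous_mapping(value, domain, (s, e)))
        for s, e in zip(start_rgb, end_rgb)
    ]
    return "#{:02x}{:02x}{:02x}".format(*channels)


# Hypothetical node scores (for example, a centrality value per node).
scores = {"A": 0.0, "B": 0.5, "C": 1.0}
styles = {
    node: {
        "size": continuous_mapping(score, (0.0, 1.0), (20, 60)),
        "color": color_mapping(score, (0.0, 1.0), (255, 255, 255), (255, 0, 0)),
    }
    for node, score in scores.items()
}
print(styles)
```

Layout is a separate step: it decides where nodes go, while a mapping like this one decides how each node looks.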

Page 39: SDCSB Advanced Tutorial: Reproducible Data Visualization Workflow with Cytoscape and IPython Notebook

Data Preparation Analysis Visualization

Page 40: SDCSB Advanced Tutorial: Reproducible Data Visualization Workflow with Cytoscape and IPython Notebook

Data Preparation

Analysis Visualization

Page 44: SDCSB Advanced Tutorial: Reproducible Data Visualization Workflow with Cytoscape and IPython Notebook

Git/GitHub For Sharing Code/Notebooks

Page 45: SDCSB Advanced Tutorial: Reproducible Data Visualization Workflow with Cytoscape and IPython Notebook

Git/GitHub For Sharing Code/Notebooks

- Git - Distributed Source Code Management System

- GitHub - (Public) Remote repository + great user interface for working with OSS code

Page 46: SDCSB Advanced Tutorial: Reproducible Data Visualization Workflow with Cytoscape and IPython Notebook

Forking

- Create a new repository from an existing one

- Complete copy of the original + your full access

- Pull Request

Page 47: SDCSB Advanced Tutorial: Reproducible Data Visualization Workflow with Cytoscape and IPython Notebook

Exercise: Fork Repository

Page 48: SDCSB Advanced Tutorial: Reproducible Data Visualization Workflow with Cytoscape and IPython Notebook

Fork My Repo.

bit.ly/1aBiRuf

Page 49: SDCSB Advanced Tutorial: Reproducible Data Visualization Workflow with Cytoscape and IPython Notebook

Prepare Environment to Run Notebooks

Page 50: SDCSB Advanced Tutorial: Reproducible Data Visualization Workflow with Cytoscape and IPython Notebook

Docker as Portable Data Analysis Environment

Page 51: SDCSB Advanced Tutorial: Reproducible Data Visualization Workflow with Cytoscape and IPython Notebook

Bare Metal Machine

OS

Virtual Machine

Frameworks

Your App

Page 52: SDCSB Advanced Tutorial: Reproducible Data Visualization Workflow with Cytoscape and IPython Notebook

Bare Metal Machine

OS (Linux)

Docker

Frameworks + Application (repeated for five containers)

Page 54: SDCSB Advanced Tutorial: Reproducible Data Visualization Workflow with Cytoscape and IPython Notebook

What is Docker?

- Container to run applications in an isolated environment

- Application = Layer of images

- Sharable Environments

- Environments as code

Page 55: SDCSB Advanced Tutorial: Reproducible Data Visualization Workflow with Cytoscape and IPython Notebook

Docker Hub

- Sharing environments as code!

- Dockerfile - Definition of your container

- “GitHub of Images”

Page 56: SDCSB Advanced Tutorial: Reproducible Data Visualization Workflow with Cytoscape and IPython Notebook

Image B

Image C

Image A

Page 57: SDCSB Advanced Tutorial: Reproducible Data Visualization Workflow with Cytoscape and IPython Notebook

Data Analyst’s Toolbox

Basic Python

Graph Analysis

Page 60: SDCSB Advanced Tutorial: Reproducible Data Visualization Workflow with Cytoscape and IPython Notebook

Run a Container

Page 61: SDCSB Advanced Tutorial: Reproducible Data Visualization Workflow with Cytoscape and IPython Notebook

Quick Start

‣git clone git@github.com:idekerlab/sdcsb-advanced-tutorial.git

‣cd sdcsb-advanced-tutorial

‣docker run -d -v $PWD:/notebooks -p 80:8888 -e "PASSWORD=yourpass" -e "USE_HTTP=1" idekerlab/vizbi-2015

Page 62: SDCSB Advanced Tutorial: Reproducible Data Visualization Workflow with Cytoscape and IPython Notebook

docker run -d -v $PWD:/notebooks -p 80:8888 -e "PASSWORD=yourpass" -e "USE_HTTP=1" idekerlab/vizbi-2015

Actual Command to Run the Image (one-line)
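To make the one-line command easier to read, the sketch below rebuilds it piece by piece as a Python list, with one comment per option. All values come from the command on the slide. The notes on what `PASSWORD` and `USE_HTTP` do are assumptions, since their meaning is defined by the image itself.

```python
# Each element is one token of the command shown on the slide.
command = [
    "docker", "run",
    "-d",                       # detached: run the container in the background
    "-v", "$PWD:/notebooks",    # mount the current directory at /notebooks
    "-p", "80:8888",            # publish container port 8888 on host port 80
    "-e", "PASSWORD=yourpass",  # environment variable (presumably the notebook password)
    "-e", "USE_HTTP=1",         # environment variable (presumably plain HTTP instead of HTTPS)
    "idekerlab/vizbi-2015",     # image to pull and run
]

# Joined with spaces, this is the command to paste into a shell.
command_line = " ".join(command)
print(command_line)
```

With this port mapping, the notebook server inside the container would be reachable on port 80 of the host.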

Page 63: SDCSB Advanced Tutorial: Reproducible Data Visualization Workflow with Cytoscape and IPython Notebook

~/g/sdcsb-advanced-tutorial git:master ›❯›❯›❯ docker run -d -v $PWD:/notebooks -p 80:8888 -e "PASSWORD=sdcsb" -e "USE_HTTP=1" idekerlab/vizbi-2015
Unable to find image 'idekerlab/vizbi-2015:latest' locally
Pulling repository idekerlab/vizbi-2015
7dfae1b52000: Pulling dependent layers
511136ea3c5a: Download complete
f3c84ac3a053: Download complete
a1a958a24818: Download complete
9fec74352904: Download complete
d0955f21bf24: Download complete
4f527ba3fd02: Download complete
ac7605e8bbf0: Download complete
8e8747f25e33: Download complete
. . .

This takes a very long time on the first run…

Page 64: SDCSB Advanced Tutorial: Reproducible Data Visualization Workflow with Cytoscape and IPython Notebook

~/g/sdcsb-advanced-tutorial git:master ›❯›❯›❯ docker ps -a
CONTAINER ID   IMAGE                         COMMAND          CREATED         STATUS         PORTS                  NAMES
fa3a9466a261   idekerlab/vizbi-2015:latest   "/notebook.sh"   3 minutes ago   Up 3 minutes   0.0.0.0:80->8888/tcp   sad_wright

Check Status

Page 69: SDCSB Advanced Tutorial: Reproducible Data Visualization Workflow with Cytoscape and IPython Notebook

IPython Notebook as your electronic lab notebook

Page 72: SDCSB Advanced Tutorial: Reproducible Data Visualization Workflow with Cytoscape and IPython Notebook

Jupyter as a Lab Notebook for Dry Experiments

Page 73: SDCSB Advanced Tutorial: Reproducible Data Visualization Workflow with Cytoscape and IPython Notebook

Interactive Command-Line +

Markdown-based Documents
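One way to see how a notebook combines a command line with Markdown documents is to look at the file itself. The sketch below builds a minimal notebook as a Python dictionary with one Markdown cell and one code cell, following the nbformat 4 layout as I understand it. The cell contents are made up for illustration.

```python
import json

# Minimal notebook document following the nbformat 4 layout.
notebook = {
    "nbformat": 4,
    "nbformat_minor": 0,
    "metadata": {},
    "cells": [
        {
            # Documentation lives in Markdown cells.
            "cell_type": "markdown",
            "metadata": {},
            "source": ["# Lab note\n", "Load the network and count its edges."],
        },
        {
            # Executable steps live in code cells.
            "cell_type": "code",
            "execution_count": None,
            "metadata": {},
            "outputs": [],
            "source": ["edges = [('A', 'B'), ('B', 'C')]\n", "len(edges)"],
        },
    ],
}

# A .ipynb file is this dictionary serialized as JSON.
serialized = json.dumps(notebook, indent=1)
cell_types = [cell["cell_type"] for cell in json.loads(serialized)["cells"]]
print(cell_types)
```

Because the file is plain JSON text, it can be versioned with Git and shared on GitHub like any other source file.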

Page 74: SDCSB Advanced Tutorial: Reproducible Data Visualization Workflow with Cytoscape and IPython Notebook

IPython Notebook? Jupyter?

Page 75: SDCSB Advanced Tutorial: Reproducible Data Visualization Workflow with Cytoscape and IPython Notebook

IPython Notebook = Notebook UI + Python Kernel

Jupyter = Notebook UI + Language Kernel (R/Julia/etc.)

Page 76: SDCSB Advanced Tutorial: Reproducible Data Visualization Workflow with Cytoscape and IPython Notebook

Language-Agnostic

- From the next version (4.x), IPython Notebook will be one implementation of Jupyter

- You can switch to other language kernels

Page 78: SDCSB Advanced Tutorial: Reproducible Data Visualization Workflow with Cytoscape and IPython Notebook

Link to Welcome notebook on nbviewer

bit.ly/1HxZIqm

Page 79: SDCSB Advanced Tutorial: Reproducible Data Visualization Workflow with Cytoscape and IPython Notebook

Let’s start: Lesson 0

Page 81: SDCSB Advanced Tutorial: Reproducible Data Visualization Workflow with Cytoscape and IPython Notebook

Photo Credits

• https://flic.kr/p/bFZpyg

• https://flic.kr/p/bmXUz1

Page 82: SDCSB Advanced Tutorial: Reproducible Data Visualization Workflow with Cytoscape and IPython Notebook

• https://www.flickr.com/photos/23629083@N03/15409436041/in/photolist-ptFotK-9uS2gj-hypkSp-hypk9F-hypjha-99c472-9Xkuuc-huNmqB-7NMxMz-rg2Xh2-qYABcA-qjnGoB-rg2WVF-rdQYMf-qjaxy7-rg5Aoo-rg2Wre-qYAAt1-rg2Wev-qYAAaA-rg2W1V-rdQXT1-qjawtS-rg9ePH-rg5zb3-qjnEtV-qYHAvc-qYBA9d-rg2V7F-qYHAeF-qYAySA-rg5ys9-rg9dLF-rg2Utg-rg9drH-qYAyew-rg9dmc-rg5xP5-rg5xDA-qYAxV5-rg2TLe-rg5xp7-rg5xfQ-aq32tC-hba7em-hbafzE-gbeABq-gck7Dv-7PoYg1-fkisQL

• https://www.flickr.com/photos/nebulux/10000066526/in/photolist-geEXo7-58r1VP-6GioJH-9juEda-53HFiR-4sq7n3-4gyg7e-8ag9VV-8uqK43-4E89Gc-iWDeiJ-9G47M4-9G71KC-9waYuP-5FWSrX-87Mhxi-9G71XY-7Ai8hs-48vd2B-7B7o6n-6D9uWd-6hffXv-gYExNx-7defC1-66ygvB-4LsWSN-6D5n5k-6hfg5z-eucXAh-8uyuuG-aAY6cH-76QCEX-7f6mdp-RntfW-eFuVBC-5nY8Vc-7utTA2-brdj8F-92k6n3-5KdCfh-83uVKy-8unxG8-3d3zxi-cdz8S7-4HT5qQ-99SwEn-7Akbcb-8y7ds9-fvo9zH-9zZky3

• https://www.flickr.com/photos/stratman2/8613731520/in/photolist-e8aChq-7LLUoQ-8s8eBL-6uGRmE-77wKJF-dqo6ar-6hffGK-7rykRT-6fG8WV-8unyFa-8AeF8A-93Xpo2-9XLXCj-7GVMym-5Tu3dJ-7v58RC-5K9nBF-2MbvpL-2M77nV-et54Ce-6hfgvr-6hffQa-67wNj5-9FDGTz-49NmoE-eFXB7u-76QB7H-brdbSP-brcYHT-22zYYv-6fFZoM-ckuXNC-a8UZ3D-dzGXYU-6nf4MN-4j7TzA-47fYur-2kutoV-56catX-apUJgr-cSJHkG-88w1ie-6Nbj1a-8MYxve-6xL3SF-6fL87j-4G6x71-dUL16b-7auq8Q-6hwbVB

• https://www.flickr.com/photos/gcwest/281385801/in/photolist-5mFJtX-4o3Ria-hD9E92-qSbck-9abnoA-7hsWoU-ntEmgy-oSAQtv-nx5Chg-iuZJCa-j7eWKk-hD7JTZ-4iECHX-j8M2r7-bSrWHc-prpFcX-db7xd-jLmzoF-75mqRx-pnSzL-6gVcao-9F5bop-j77HEs-73Umq1-5kRyNp-hD9cR2-mTvNB8-gyXWaf-Lkro7-idQBY4-fRYu1-5eR2cn-3EK4k-nnxH8u-9uDMLx-4NY3Yi-kDQagt-ioGRSb-75qid1-82RzYt-5qQuwt-n8hvL6-ifemz5-3iYUQG-aJnNiX-mzirX2-23rDNy-qx3KEd-h5UnGW-hD7Jqz