ISB Cancer Genomics Cloud
NCI CBIIT Speaker Series
December 9th 2015
ISB-CGC Team Members
Ilya Shmulevich
Sheila Reynolds
Michael Miller
Phyliss Lee
Kelly Iverson
Zack Rodebaugh
Kalle Leinonen
Abigail Hahn
Eric Downes
Roger Kramer
David Pot
Ross Casanova
Sandeep Namburi
Yan Zhang
Brian Conn
Jonathan Bingham
Nicole Deflaux
Matt Bookman
Jaclyn Koller
ISB GDAC in TCGA
http://explorer.cancerregulome.org
ISB GDAC in TCGA: Cloud Pilots
http://explorer.cancerregulome.org
“[The Cloud Pilots] aim to bring data and analysis together on a single platform by creating a set of data repositories with co-located computational capacity and an Application Programming Interface (API) that provides secure data access.”
The Challenge of Big Data
Big Data: Astronomical or Genomical? Zachary D. Stephens, Skylar Y. Lee, Faraz Faghri, Roy H. Campbell, Chengxiang Zhai, Miles J. Efron, Ravishankar Iyer, Michael C. Schatz, Saurabh Sinha, Gene E. Robinson
The Challenge of Big Data, TCGA
1 PB
Cloud Paradigm Shift(s)
• Shift #1: Move data and existing pipelines to the cloud
– all researchers access a single copy of the data
– everyone saves time, money, and bandwidth
– compute-power is “near” the data
– pay only for minutes used
• Shift #2: Cloud-aware computing
– rethink/redevelop approaches to fully leverage the power of the cloud
– massively parallel, bursty, opportunistic computing
• e.g., using BigQuery to calculate the association between expression and mutation status takes 7 s for one gene; doing it for all 20k genes takes less than 9 s!
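As a rough sketch of the kind of query behind that example, the snippet below builds one SQL statement that compares mean expression between mutated and non-mutated samples for every gene in a single pass. The table and column names (`my_dataset.expression`, `my_dataset.mutations`, `sample_id`, `gene`, `value`) are placeholders invented for illustration, not the actual ISB-CGC schema. The SQL is written in BigQuery's standard SQL dialect as an assumption, and a simple difference of means stands in for whatever association statistic the speakers used.

```python
def association_query(mutated_gene,
                      expr_table="my_dataset.expression",
                      mut_table="my_dataset.mutations"):
    """Return SQL comparing expression in mutated vs. other samples, per gene.

    All table and column names are hypothetical placeholders.
    A real script should pass the gene as a query parameter instead of
    formatting it into the string.
    """
    return f"""
SELECT
  e.gene,
  AVG(IF(m.sample_id IS NOT NULL, e.value, NULL)) AS mean_mutated,
  AVG(IF(m.sample_id IS NULL, e.value, NULL)) AS mean_wild_type
FROM `{expr_table}` AS e
LEFT JOIN (
  SELECT DISTINCT sample_id
  FROM `{mut_table}`
  WHERE gene = '{mutated_gene}'
) AS m
ON e.sample_id = m.sample_id
GROUP BY e.gene
""".strip()


# EGFR is used only because it appears elsewhere in the talk.
sql = association_query("EGFR")
print(sql)
```

Restricting the outer query to a single expression gene would only add a filter; the all-genes version still reads the expression table once, which may be why the two runtimes quoted on the slide are so close. Submitting the string to BigQuery is left out so the sketch runs offline.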
The ISB Cancer Genomics Cloud
• Goals
• Approach
Primary Goals of the ISB-CGC
to make TCGA data, together with tools and compute power, available and accessible to a broad range of users
using multiple access modes:
• interactive web application
• scripting languages: R, Python, SQL
• direct programmatic access
Platform & Tools Targeted to a Range of Users
[Diagram: ISB Cancer Genomics Cloud (web app, API, tools, etc) alongside Google Cloud Storage, BigQuery, Google Genomics, Compute Engine VMs, and Local Storage]
• PI / Biologist: web access
• Computational Research Scientist: python, R, SQL
• Algorithm Developer: ssh, programmatic access
Web Access for the PI/Biologist
Use Cases
• select a subset of TCGA samples based on clinical or molecular characteristics, then explore all data for a specific gene or pathway
• compare one cohort to another
• upload a small private dataset to analyze in conjunction with TCGA data
• etc…
Python, R, and SQL for the Computational Scientist: Use Cases
Use Cases
• write scripts in R or python to do custom analyses that are not (yet) available interactively
• develop and share/publish new tools (including interactive)
• develop/customize pipelines
• etc…
Programmatic Access for the Algorithm Developer
Use Cases
• test new algorithm on hundreds or thousands of BAM or FASTQ files
• reprocess all TCGA DNAseq and/or RNAseq data
• reprocess all SNP6 CEL files
• etc…
Primary Goals of the ISB-CGC: Users
Goal #1: Data
Goal #2: Compute
Goal #1: Data
Google Cloud Storage: 1 PB
Cloud Shift #1
What is in There?
Total size of TCGA data hosted by ISB-CGC: 1 PB
• 99.8% is low-level sequence data (Level-1)
  – 85% is DNASeq data
    – 52% is whole genome sequence (WGS)
    – 48% is exome sequence (WXS)
  – 15% is RNASeq data (including miRNAseq)
• 0.15% is low-level SNP array data (CEL files)
• 0.05% is all other data (Level-3, clinical, etc)
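To make the size percentages above concrete, here is a small back-of-the-envelope calculation. It assumes 1 PB is taken as 1000 TB and that each nested percentage is a fraction of its parent bullet (DNASeq and RNASeq as shares of the low-level sequence data, WGS and exome as shares of DNASeq); both are readings of the slide rather than figures stated in the talk.

```python
TOTAL_TB = 1000.0  # assumption: 1 PB taken as 1000 TB (decimal units)

# Nested percentages are read as fractions of the parent category.
sequence_tb = TOTAL_TB * 0.998
dnaseq_tb = sequence_tb * 0.85

sizes_tb = {
    "DNASeq WGS": dnaseq_tb * 0.52,
    "DNASeq WXS": dnaseq_tb * 0.48,
    "RNASeq": sequence_tb * 0.15,
    "SNP array (CEL)": TOTAL_TB * 0.0015,
    "Everything else": TOTAL_TB * 0.0005,
}

for name, tb in sizes_tb.items():
    print(f"{name:16s} {tb:8.1f} TB")
```

Under these assumptions, whole genome sequence comes to roughly 441 TB, exome sequence to roughly 407 TB, and RNASeq to roughly 150 TB, while everything that is not low-level data fits in about half a terabyte.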
Total Number of TCGA Files
Total number of TCGA files hosted by ISB-CGC: 340K
• 22% is low-level sequence data (Level-1)
  – 53% is DNASeq data
    – 10% is whole genome sequence (WGS)
    – 90% is exome sequence (WXS)
  – 47% is RNASeq data (including miRNAseq)
• 7% is low-level SNP array data (CEL files)
• 71% is all other data (Level-3, clinical, etc)
All Other Data (chart labels)
• RNASeq (gene, isoform, exon, junction, etc)
• SNP array (genotype calls, allele- and segment-copy-number values)
• clinical & biospecimen
• miRNAseq
• DNA methylation
• Protein (RPPA)
• DNASeq (MAF, VCF)
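Combining the file-count percentages with the size percentages from the earlier slides gives a rough average file size per category. The sketch assumes 1 PB = 10^15 bytes, treats "340K" as exactly 340,000 files, and reads nested percentages as fractions of their parent bullet; the resulting averages are estimates derived from the slides, not numbers given in the talk.

```python
TOTAL_BYTES = 1e15     # assumption: 1 PB = 10**15 bytes
TOTAL_FILES = 340_000  # "340K" from the slide, taken as exact

# (share of total bytes, share of total files) for each category.
shares = {
    "DNASeq WGS": (0.998 * 0.85 * 0.52, 0.22 * 0.53 * 0.10),
    "DNASeq WXS": (0.998 * 0.85 * 0.48, 0.22 * 0.53 * 0.90),
    "RNASeq": (0.998 * 0.15, 0.22 * 0.47),
    "SNP array (CEL)": (0.0015, 0.07),
    "Everything else": (0.0005, 0.71),
}

# Average size in GB = category bytes / category file count.
avg_gb = {
    name: (TOTAL_BYTES * size_share) / (TOTAL_FILES * file_share) / 1e9
    for name, (size_share, file_share) in shares.items()
}

for name, gb in avg_gb.items():
    print(f"{name:16s} {gb:10.4f} GB per file")
```

The contrast is the point: on these estimates a whole genome file averages on the order of 100 GB, while the "everything else" files that make up most of the file count average only a few megabytes each.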
Goal #1: Data
ISB-CGC Phase 1
• Low-level sequence and SNP array data as files in Cloud Storage
• High-level data and annotations as tables in BigQuery
ISB-CGC Phase 2
• Low-level sequence data in Google Genomics (backed by Bigtable)
• Variant calls in Google Genomics and BigQuery
Goal #1: Data, BigQuery and Google Genomics
• BigQuery: massively parallel analytics engine that pushes queries out to thousands of machines and aggregates results in seconds
• Google Genomics: read- and variant-optimized platform that supports the industry-standard GA4GH API and can handle petabytes of data
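The "push the query out, then aggregate" idea in the BigQuery bullet can be imitated on one machine. The toy below splits a list into shards, computes a partial sum and count for each shard in a worker thread, and combines the partial results into a mean. It is only an illustration of the scatter-and-gather pattern under simplified assumptions, not a description of how BigQuery is implemented; the function names are made up for this sketch.

```python
from concurrent.futures import ThreadPoolExecutor


def partial_stats(shard):
    """Work done by one worker: partial sum and count for its shard."""
    return sum(shard), len(shard)


def distributed_mean(values, n_shards=4):
    """Scatter values across shards, aggregate in parallel, then combine.

    Assumes `values` is a non-empty list of numbers.
    """
    shards = [values[i::n_shards] for i in range(n_shards)]
    with ThreadPoolExecutor(max_workers=n_shards) as pool:
        partials = list(pool.map(partial_stats, shards))
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count


print(distributed_mean(list(range(1, 101))))
```

Only the small partial results travel back to the coordinator, which is what lets this style of aggregation scale with the number of workers.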
Table Details: Clinical, Biospecimen, Annotations
Table Details
TCGA Table Details
Bring your data to BigQuery!
• easily integrate with other BigQuery datasets … if other people put their data and annotations into BigQuery tables
  – eg Tute Genomics
• Let’s put out a call to researchers to make data, annotations, etc available for all to use in BigQuery!
  – TCGA Level-3 data (500 GB) -- $10 per month
  – Tute Genomics (649 GB and 8.6 billion rows) -- $13 per month
  – GENCODE (593 MB table with 2.6 million rows) -- only 14 cents per year
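The three storage costs quoted above are consistent with a single flat rate. The short check below infers that rate from the first figure ($10 per month for 500 GB, i.e. about $0.02 per GB per month) and applies it to the other two datasets. The rate is an inference from the slide's own numbers, not a quoted price, and 593 MB is taken as 0.593 GB.

```python
# Rate inferred from the slide: $10 per month for 500 GB.
RATE_PER_GB_MONTH = 10 / 500  # about $0.02; an inference, not a quoted price


def monthly_cost(gigabytes):
    """Storage cost in dollars per month at the inferred flat rate."""
    return gigabytes * RATE_PER_GB_MONTH


tute_monthly = monthly_cost(649)           # Tute Genomics, 649 GB
gencode_yearly = monthly_cost(0.593) * 12  # GENCODE, 593 MB, per year

print(f"Tute Genomics: ${tute_monthly:.2f} per month")
print(f"GENCODE: ${gencode_yearly:.2f} per year")
```

Rounded, these come out at $13 per month and 14 cents per year, matching the figures on the slide.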
Goal #2: Compute
1. PI / Biologist: web-based interaction
2. Computational Research Scientist: R, Python, SQL
3. Algorithm Developer: VMs, Container Engine, Dataproc, Dataflow
web access for the PI / Biologist
Create Cohort Clinical Features
Save As New Cohort
Create Cohort Vital Status
Name New Cohort
Share Cohort
Additional Cohort Operations
Additional Cohort operations include:
• set operations (union, intersection, complement)
• comment
• clone
• delete
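The cohort set operations listed above map directly onto ordinary set arithmetic. In this minimal sketch each cohort is a Python set of made-up sample identifiers, and "complement" is read as the relative complement (samples in one cohort but not the other); neither the identifiers nor that reading comes from the slides.

```python
# Hypothetical cohorts, each a set of made-up sample identifiers.
cohort_a = {"S1", "S2", "S3", "S4"}
cohort_b = {"S3", "S4", "S5"}

union = cohort_a | cohort_b         # samples in either cohort
intersection = cohort_a & cohort_b  # samples in both cohorts
complement = cohort_a - cohort_b    # samples in A that are not in B

print(sorted(union))
print(sorted(intersection))
print(sorted(complement))
```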
Visualization
EGFR Gene Expression vs Copy-Number
[Plot: x-axis: EGFR Copy Number Segment Mean; y-axis: EGFR RNAseq expression (RSEM counts)]
Save Cohort
[Plot: x-axis: EGFR Copy Number Segment Mean; y-axis: EGFR RNAseq expression (RSEM counts)]
Python, R, and SQL for the Computational Scientist
SQL
ISB-CGC Examples
https://github.com/isb-cgc/examples-R
https://github.com/isb-cgc/examples-Python
ISB-CGC examples-Python
ISB-CGC examples-R
BigrQuery
Copy Number Segments (Broad)
Python APIs
Copy Number Segments
Histograms of Average Copy-Number
Programmatic Access for the Algorithm Developer (Google Cloud)
your own Google Cloud Project, with automatic access to:
• Cloud Storage
• BigQuery
• Google Genomics
• all Google Compute technologies, including:
  – Compute Engine: anything you can do on your laptop/desktop you can do on a VM
  – Container Engine: fully managed and hosted container orchestration; create and deploy clusters in seconds
  – Dataflow: successor to MapReduce
Cloud Endpoints API
the ISB-CGC API provides programmatic access to the same functionality as the web-app and more:
• Cloud Endpoints API (backed by App Engine)
• authenticate from the command-line
• make requests to Endpoints API, eg:
  – get list of my cohorts
  – get cohort details
  – save a new cohort
  – get list of data files associated with a cohort
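The four example requests above could look roughly like the following when prepared from a script. The base URL, the paths, the cohort id, the payload fields, and the bearer-token scheme are all hypothetical placeholders, since the slides do not give the actual ISB-CGC endpoint names. The requests are built with the standard library but deliberately not sent, so the sketch runs without network access or credentials.

```python
import json
import urllib.request

# Hypothetical base URL; the real endpoint is not given in the slides.
BASE_URL = "https://api.example.org/v1"


def build_request(path, token, method="GET", payload=None):
    """Prepare (but do not send) an authenticated JSON request."""
    headers = {"Authorization": f"Bearer {token}"}
    data = None
    if payload is not None:
        data = json.dumps(payload).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(
        f"{BASE_URL}/{path}", data=data, headers=headers, method=method
    )


# Placeholder token; a real one would come from the command-line login step.
TOKEN = "TOKEN"

list_cohorts = build_request("cohorts", TOKEN)
cohort_details = build_request("cohorts/42", TOKEN)
save_cohort = build_request(
    "cohorts", TOKEN, method="POST",
    payload={"name": "my cohort", "samples": ["S1", "S2"]},
)
cohort_files = build_request("cohorts/42/files", TOKEN)

for req in (list_cohorts, cohort_details, save_cohort, cohort_files):
    print(req.get_method(), req.full_url)
```

Each prepared request could then be passed to `urllib.request.urlopen` once a real base URL and token are substituted.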
Summary
ISB-CGC Phase 1
• Low-level sequence and SNP array data as files in Cloud Storage
• High-level data and annotations as tables in BigQuery
• Multiple access modes and interfaces:
  – Interactive web-application
  – R, Python, SQL, and JavaScript
  – Endpoint APIs
ISB-CGC Phase 2
• Low-level sequence data in Google Genomics
• Variant calls in Google Genomics and BigQuery
Project Funding
This project has been funded in whole with Federal funds from the National Cancer Institute, National Institutes of Health, Department of Health and Human Services, under Contract No. HHSN261201400007C.
Questions?
ISB Cancer Genomics Cloud
Data Word Cloud