Scalable graph analytics for metagenomics and metaproteomics


Page 1: Scalable graph analytics for metagenomics and metaproteomics

Scalable graph analytics for metagenomics and metaproteomics

Ananth Kalyanaraman @ HPCBio lab ([email protected])

Associate Professor, School of EECS, Washington State University, Pullman, WA

Research Areas: Parallel algorithms, Computational biology/bioinformatics, Graph algorithms, String algorithms, Parallel architectures

Workshop on Future Computing Platforms to Accelerate Next-Gen Sequencing (NGS) Applications, May 19, 2013, held in conjunction with IPDPS’13, Boston, MA

Applications: bioenergy alternatives, human health, environmental monitoring, soil and forest ecology, ocean microbiology, …

Environmental microbial community analytics

DNA, RNA, protein, mass spec/peptide

NGS

Data scale: #studies: >350; #samples: >2,500; #genic/ORF reads: >100M+; …

Funding relevance:

Image courtesy: www.genomesonline.org

Page 2: Scalable graph analytics for metagenomics and metaproteomics

Some graph-theoretic problems in environmental microbial community analytics

Problems: network construction, clustering, community annotation, network comparison, heterogeneity, …

Source data: protein/ORF sequence homology, mass spectral library construction, interaction networks (gene, protein)
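The network construction and clustering problems listed above can be illustrated with a minimal sketch. It assumes pairwise homology scores are already available as (sequence, sequence, score) tuples, keeps an edge when the score meets a threshold, and reports connected components as a crude stand-in for clustering. The function names, the sequence IDs, the scores, and the 0.5 cutoff are all hypothetical and are not taken from the talk; the talk does not state which clustering method is used.

```python
from collections import defaultdict


def build_homology_graph(pairs, min_score):
    """Build an undirected graph from (seq_a, seq_b, score) tuples.

    An edge is kept only when score >= min_score. Every sequence that
    appears in the input becomes a vertex, even if it has no edges.
    """
    graph = defaultdict(set)
    for a, b, score in pairs:
        graph[a]  # touching the key registers the vertex
        graph[b]
        if a != b and score >= min_score:
            graph[a].add(b)
            graph[b].add(a)
    return dict(graph)


def connected_components(graph):
    """Return the connected components of the graph as sorted lists."""
    seen = set()
    clusters = []
    for start in graph:
        if start in seen:
            continue
        seen.add(start)
        stack = [start]
        component = []
        # Iterative depth-first traversal from the unvisited start vertex.
        while stack:
            node = stack.pop()
            component.append(node)
            for neighbor in graph[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        clusters.append(sorted(component))
    return clusters


# Hypothetical similarity scores, for illustration only.
pairs = [
    ("orf1", "orf2", 0.90),
    ("orf2", "orf3", 0.80),
    ("orf4", "orf5", 0.95),
    ("orf3", "orf4", 0.20),  # below the cutoff, so no edge
    ("orf6", "orf6", 1.00),  # self-hit: vertex only, no edge
]
graph = build_homology_graph(pairs, min_score=0.5)
clusters = connected_components(graph)
print(clusters)
```

Connected components are the simplest possible grouping; denser-subgraph or modularity-based methods would be drop-in replacements for the second function under the same graph representation.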

Parallelism: mostly rudimentary/ad hoc in standard workflows

Distributed memory: MPI, MapReduce

Intra-node: multicore, GPUs

Some challenges: inherits graph-related challenges and choice of architectures; availability of networks/inference; data integration; low sampling, species diversity; qualitative metrics; automated workflows; …

Page 3: Scalable graph analytics for metagenomics and metaproteomics

SIAM CSE'13, Boston, MA

Graphs are pervasive in Computational Biology

2/28/2013

[Figure: biological data and analyses (gene motifs, reads, genome, mRNA, protein database search, comparative genomics, phylogenetic tree, protein families, population genomics, …) shown with the graph formulations applied to them (string graphs, clique, probabilistic graph models, comparative network analysis, classical network analysis, trees, DAGs, TSP, ML, pattern matching).]