bridging data analysis and interactive visualization


Upload: nacho-caballero

Post on 09-May-2015


DESCRIPTION

Clickme is an R package that lets you generate interactive visualizations directly from R. I presented the latest iteration at the 2013 IBSB conference in Kyoto.

TRANSCRIPT

Page 1: Bridging data analysis and interactive visualization

Bridging Data Analysis & Interactive Visualization

Nacho Caballero

Boston University

I’m going to talk about data exploration, which is something that most of us do all day.

We explore data to answer questions like: what genes have expression patterns that can discriminate between different types of tumor, or what are the oscillation dynamics of yeast metabolites.

Page 2: Bridging data analysis and interactive visualization

These are the big questions, but before they can be answered we need to tackle the little questions: what format should I use to store my data, or how should I normalize it.

Page 4: Bridging data analysis and interactive visualization

[Slide shows a wall of raw comma-separated values: one row per gene (ISG20, KCTD14, SOCS1, GADD45B, TAP1, ...), each gene name followed by a long run of numeric values.]

There are two types of little questions: those that require manipulating data in bulk, and those that require presenting it visually.

I have found the R programming language an incredibly useful tool to work with data in bulk because it’s fast and it’s flexible. Unfortunately, R can only display static plots, which slow down the process of data exploration, wasting time that could be better spent thinking about the big questions.

Page 6: Bridging data analysis and interactive visualization

Demo rclickme.com

In the year 2013, there is no reason why I shouldn’t be able to simply type the name of a data point and see where it shows up in my data. No technological reason prevents me from zooming in to a specific region and hovering over a point to show additional information on demand.

During the past few years, a thriving community of JS developers has turned your internet browser into a very powerful visualization platform, but the R community is only now starting to adopt these advantages.

I didn’t want to have to choose between R’s ability to work with data in bulk and JS’s ability to display data interactively, so I built an R package to get the best of both worlds. It’s called Clickme, and it’s available at rclickme.com

Page 9: Bridging data analysis and interactive visualization

R:

    data <- data.frame(
        x = c(1, 2, 3),
        y = c("a", "b", "c"))

JS:

    var data = [
        {"x":1, "y":"a"},
        {"x":2, "y":"b"},
        {"x":3, "y":"c"}];

Translator:

    translate(data)

I encountered two major problems while trying to make both platforms talk to each other.

The first was working with different data formats. This is an example of how R and JS store a table containing numbers and strings (a data frame in R, an array of objects in JS). Clickme converts R data to JS-formatted data by using translator functions, which ensure that every data type in R has the right format in JS.
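To make the translator idea concrete, here is a minimal sketch in base R. It is not Clickme's actual code: the function names translate_value, translate_row and translate_sketch are hypothetical, and the sketch assumes there are no missing values and no quotes or backslashes inside strings.

```r
# Hypothetical sketch, not Clickme's code: translate an R data frame into
# a JS array literal with one object per row. Assumes no missing values and
# no quotes or backslashes inside strings.

# Format a single value the way JS expects it.
translate_value <- function(value) {
  if (is.logical(value)) {
    tolower(as.character(value))             # TRUE becomes true
  } else if (is.character(value) || is.factor(value)) {
    paste0('"', as.character(value), '"')    # strings get double quotes
  } else {
    as.character(value)                      # numbers stay bare
  }
}

# Turn a one-row data frame into a JS object literal.
translate_row <- function(row) {
  fields <- vapply(names(row), function(name) {
    paste0('"', name, '":', translate_value(row[[name]]))
  }, character(1))
  paste0("{", paste(fields, collapse = ", "), "}")
}

# Turn the whole data frame into a JS array of row objects.
translate_sketch <- function(data) {
  rows <- vapply(seq_len(nrow(data)), function(i) {
    translate_row(data[i, , drop = FALSE])
  }, character(1))
  paste0("[", paste(rows, collapse = ", "), "]")
}

data <- data.frame(x = c(1, 2, 3), y = c("a", "b", "c"))
cat("var data = ", translate_sketch(data), ";\n", sep = "")
# prints: var data = [{"x":1, "y":"a"}, {"x":2, "y":"b"}, {"x":3, "y":"c"}];
```

The point of dispatching on the type of each value is that R keeps a table column by column, while the JS side wants one self-describing object per row, so each cell has to be formatted on its own.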

Page 11: Bridging data analysis and interactive visualization

R:

    data <- data.frame(
        x = c(1, 2, 3),
        y = c("a", "b", "c"))

Template:

    var data = {{ translate(data) }};

JS:

    var data = [
        {"x":1, "y":"a"},
        {"x":2, "y":"b"},
        {"x":3, "y":"c"}];

The other major problem was data reusability. How do you tell the JS code responsible for generating the visualization what data to use? Clickme does this by using templates: hybrid files that contain both R and JS code. This makes it possible for the same template to be used to visualize different data sets.

A template contains mostly JS code, but at critical points it has R code surrounded by double (or triple) braces. When the user asks to generate a plot, the R code is evaluated and the braces are replaced with the results, generating a visualization customized to your data.
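The substitution step described above can be sketched in a few lines of base R. This is an illustration under stated assumptions, not Clickme's implementation: render_sketch is a hypothetical name, the sketch handles only double braces on a single line, and it evaluates whatever R code it finds, so it should only be given trusted templates.

```r
# Hypothetical sketch, not Clickme's code: fill in a template by evaluating
# the R code found between double braces and pasting the result in its place.
render_sketch <- function(template, env = parent.frame()) {
  force(env)  # fix the evaluation environment before doing anything else
  pattern <- "\\{\\{(.*?)\\}\\}"
  matches <- gregexpr(pattern, template, perl = TRUE)
  placeholders <- regmatches(template, matches)[[1]]
  if (length(placeholders) == 0) {
    return(template)  # nothing to fill in
  }
  results <- vapply(placeholders, function(placeholder) {
    # Strip the braces, then evaluate the R code that was inside them.
    code <- sub(pattern, "\\1", placeholder, perl = TRUE)
    value <- eval(parse(text = code), envir = env)
    paste(as.character(value), collapse = "")
  }, character(1), USE.NAMES = FALSE)
  # Put each result back where its placeholder was.
  regmatches(template, matches) <- list(results)
  template
}

data <- data.frame(x = c(1, 2, 3))
template <- "var n = {{ nrow(data) }}; var total = {{ sum(data$x) }};"
cat(render_sketch(template), "\n", sep = "")
# prints: var n = 3; var total = 6;
```

Because the template only names the data and never contains it, the same template text can be rendered again with a different data frame in scope.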

Page 12: Bridging data analysis and interactive visualization

Clickme plots are easy to create and share

The main reason why you should use Clickme for your daily plotting needs is that dynamic plots are as easy to generate as static plots. You simply call a function and send it a template and some data.

The plots are also easy to share: simply upload them to a server or email them.

Page 13: Bridging data analysis and interactive visualization

rclickme.com

@nachocaballero

You can try Clickme by visiting rclickme.com and following the instructions to install the package in R.

Right now, you can only create scatter plots, but I’m working on adding more types of visualizations (line plots, heatmaps).

If you have a visualization that you would like to be able to use directly from R, let me know and I’ll send you an email when the developer guide is ready.

I hope Clickme helps you solve the little problems more quickly, so you can spend the extra time thinking about the big questions.