Bridging Data Analysis & Interactive Visualization
Nacho Caballero
Boston University
I’m going to talk about data exploration, which is something that most of us do all day.
We explore data to answer questions like: which genes have expression patterns that can discriminate between different types of tumor? What are the oscillation dynamics of yeast metabolites?
These are the big questions, but before they can be answered we need to tackle the little questions: what format should I use to store my data, or how should I normalize it.
ISG20,2.378414,5.61778,14.123248,2.234574,18.635737,26.354913,4.924578,6.727171,20.82147,4.346939,16.223352,4.316913,26.908685,7.110741,6.233317,16.111289,24.251465,0.00242025799883492,0.00250798955660646,0.182037350296986,0.143497332454966
KCTD14,0.036147,0.01323,0.100134,0.01243,0.111105,0.946058,0.005411,0.835088,0.432269,0.010237,1.021012,1.125058,0.052922,1.006956,0.02936,0.140632,2.12874,0.00530809681926075,0.236357348297356,0.47826448842347,0.00721678970151101
SOCS1,2.297397,0.687771,2.531513,0.532185,2.584706,1.624505,1.22264,1.905276,1.433955,0.768438,1.94531,0.97942,4.228072,1.443929,1.494849,3.07375,1.986185,0.00655947338411466,0.191026275159313,0.342274678385418,0.675685362613106
GADD45B,0.687771,0.303549,0.823591,0.496546,1.101905,0.607097,0.721965,0.514057,0.25349,0.287175,0.539614,0.29937,1.905276,0.358489,0.450625,1,0.858565,0.0101354143538965,0.310652726378842,0.493564426650172,0.226954163814597
TAP1,1.729074,8.282119,9.713559,1.94531,13.454343,8.339726,2.887858,6.233317,7.727491,3.24901,11.551434,4.958831,33.128478,12.906268,2.620787,11.080876,13.269113,0.0106104636811607,0.179309470812382,0.316716762315457,0.151556141118556
TNFAIP6,0.036147,0.01323,0.046714,0.01243,0.034435,0.223756,0.005411,0.482968,0.965936,0.010237,0.460094,2.114036,4.594793,0.737135,0.013792,0.05329,0.510506,0.0116669161910557,0.24614243557355,0.438396881725518,0.163040211392942
ARL5B,5.278032,0.742262,1.301342,2.250117,0.267943,0.456916,4.228072,0.952638,0.812252,1.892115,0.277392,0.773782,1.265757,0.692555,2.828427,0.697372,2.297397,0.013242419977536,0.224561120371216,0.032942733244772,0.0167920963496654
CD63,5.775717,9.986644,14.320401,5.81589,17.387758,20.966294,9.986644,10.852835,43.111474,6.062866,23.588307,16.111289,56.492992,14.320401,16,19.835323,38.854236,0.0163856104898972,0.241315868518633,0.30441654324614,0.092573470769844
HSH2D,0.659754,0.965936,2.361985,0.420448,3.052518,1.840375,0.590496,1.22264,2.394957,0.521233,1.494849,1,6.821079,1.613284,0.602904,1.986185,1.905276,0.0241241197764549,1.15122088287513e-05,0.405513349700302,0.0504226552664796
TRIM25,0.493116,3.5801,7.061624,0.366021,5.278032,9.713559,2.056228,3.706352,13.086433,1.086735,7.674113,4.40762,30.909963,3.20428,1.172835,4.112455,11.63178,0.0245693112111099,0.0439687792064969,0.360644719095266,0.00227712751439227
SIGLEC1,0.090246,0.0819,0.702222,0.01243,0.840896,2.42839,0.029157,0.747425,1.257013,0.010453,1.433955,0.243164,1.693491,0.378929,0.050415,0.466516,4,0.026855277631121,0.154317494438736,0.0512677523593489,0.1076810187058
LAP3,1.028114,2.013911,6.916298,1.265757,5.278032,19.698311,2.329467,5.278032,20.112214,0.933033,13.642158,8.75435,27.284317,13.547925,1.328686,4.055838,17.876594,0.0307333179253698,0.022880152675696,0.218109564319035,0.0810388003193872
CORO2A,0.732043,0.175556,0.697372,0.031467,0.532185,0.668964,0.203063,1.356604,2.887858,0.118257,1.057018,1.094294,7.727491,2.013911,0.07911,0.619854,2.531513,0.0375867457085621,0.377055117492505,0.430037425480533,0.0269888435515195
BTG3,1.853176,0.566442,0.283221,0.594604,0.189465,0.096055,0.63728,0.31864,0.438303,0.790041,0.493116,0.420448,0.946058,0.429283,0.876606,0.295248,0.257028,0.0391072414686755,0.0269904051087048,0.523363050526997,0.0718318166772209
IFIT2,0.747425,1.180993,16.449821,0.417544,15.032364,28.246496,0.726986,19.027314,18.000936,0.550953,20.82147,16,41.355291,15.136922,0.632878,8.055644,35.506223,0.040023287856168,0.0735202871192785,0.206872507694354,0.00522500331539952
SIGLEC9,0.036147,0.111105,0.219151,0.065154,0.205898,0.228458,0.20166,0.11908,0.323088,0.021051,0.309927,0.149685,0.607097,0.109576,0.075887,0.135842,0.378929,0.0411674943375035,0.163871976641275,0.22544691557851,0.234743591663763
TP53,0.972655,3.810552,3.630077,0.403321,4.469149,1.591073,1.375542,0.907519,1.248331,1.853176,1.986185,1.394744,4.346939,1.086735,0.888843,2.378414,1.918528,0.0429888041979826,0.0799200444146394,0.320890329219422,0.916358037620054
IFIT3,0.325335,1.670176,16,0.173139,20.677645,65.799285,0.737135,26.354913,32,0.668964,37.014022,17.508699,56.492992,31.559447,0.503478,9.447941,48.840295,0.0445077985097724,0.0942370571745166,0.132034891220241,0.0264982772378789
NR4A2,10.410735,0.578344,1.414214,3.482202,0.858565,0.20166,4.890561,1.189207,0.578344,3.138336,0.15932,0.102949,0.264255,0.61132,4.438278,2.361985,2.234574,0.0458872581522527,0.0685963953625151,0.0186377365252256,0.0229729130997906
STAT1,3.810552,10.410735,24.590003,1.42405,23.425371,22.471118,4.500234,13.547925,28.246496,3.010493,32.446703,15.032364,75.58353,15.136922,2.297397,12.125733,33.128478,0.0480007475674105,0.131668005378222,0.254501530406828,9.25837445396843e-06
TNFRSF21,0.135842,0.094078,0.476319,0.043586,0.594604,1.453973,0.524858,0.411796,1.647182,0.043285,1.172835,0.779165,2.042024,1.021012,0.281265,0.358489,2.297397,0.0546576637223853,0.143999137966455,0.178714635988111,0.073452905262782
IRF7,0.496546,1.248331,5.775717,0.267943,13.832596,7.310652,0.566442,6.147501,5.735821,0.435275,8.282119,4.084049,12.466633,3.530812,0.539614,10.126053,10.483147,0.055607834227974,0.117993851426521,0.132223445393013,0.0347695112322287
CXCR4,41.355291,11.004335,89.884472,35.260964,59.301636,8.75435,48.167896,13.454343,12.38052,51.984153,9.646463,6.680703,42.813682,8.111676,36.504439,89.263595,16.223352,0.0557453841560263,0.011921500962025,0.499518957719701,0.000140456107695074
RSAD2,0.036147,0.435275,4.112455,0.026096,5.314743,14.221483,0.06164,8.224911,5.063026,0.190782,7.727491,5.205367,15.136922,5.696201,0.071794,2.114036,11.551434,0.0560573738055966,0.0659941398254695,0.200755021810249,0.0212693081396108
FCGR1B,0.795536,0.435275,1.443929,0.127627,1.866066,11.080876,0.30566,4.287094,3.530812,0.047696,6.727171,8.75435,6.916298,2.188587,0.408951,0.550953,7.412704,0.121772447420583,0.127705401877893,7.42789066050201e-07,0.133461949864962
There are two types of little questions: those that require manipulating data in bulk, and those that require presenting it visually.
I have found the R programming language an incredibly useful tool to work with data in bulk because it’s fast and it’s flexible. Unfortunately, R can only display static plots, which slow down the process of data exploration, wasting time that could be better spent thinking about the big questions.
Demo rclickme.com
In the year 2013, there is no reason why I shouldn’t be able to simply type the name of a data point and see where it shows up in my data. No technological reason prevents me from zooming in to a specific region and hovering over a point to show additional information on demand.
During the past few years, a thriving community of JS developers has turned the web browser into a very powerful visualization platform, but the R community is only now starting to adopt these advantages.
I didn’t want to have to choose between R’s ability to work with data in bulk and JS’s ability to display data interactively, so I built an R package to get the best of both worlds. It’s called Clickme, and it’s available at rclickme.com.
R: data <- data.frame(x = c(1, 2, 3), y = c("a", "b", "c"))
Translator: translate(data)
JS: var data = [{"x": 1, "y": "a"}, {"x": 2, "y": "b"}, {"x": 3, "y": "c"}];
I encountered two major problems while trying to make both platforms talk to each other.
The first was working with different data formats. This is an example of how R and JS store a data frame, a table whose columns hold numbers and strings. Clickme converts R data to JS-formatted data by using translator functions, which ensure that every data type in R has the right format in JS.
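To make the format gap concrete, here is a minimal sketch in JS of the kind of conversion a translator has to perform. It is not Clickme's actual code: the `dataFrame` object is a hypothetical stand-in for an R data frame (one array per column), and `translateDataFrame` is an illustrative name. The sketch turns column-oriented data into the row-oriented array of objects shown on the slide.

```javascript
// Hypothetical stand-in for an R data frame: one array per column.
const dataFrame = { x: [1, 2, 3], y: ["a", "b", "c"] };

// Convert column-oriented data into an array of row objects.
// (Illustrative helper, not part of Clickme.)
function translateDataFrame(columns) {
  const names = Object.keys(columns);
  const nRows = names.length === 0 ? 0 : columns[names[0]].length;
  const rows = [];
  for (let i = 0; i < nRows; i++) {
    const row = {};
    for (const name of names) {
      row[name] = columns[name][i];
    }
    rows.push(row);
  }
  return rows;
}

// Serialize the rows so they can be pasted into JS source code.
const translated = JSON.stringify(translateDataFrame(dataFrame));
console.log(translated);
// prints [{"x":1,"y":"a"},{"x":2,"y":"b"},{"x":3,"y":"c"}]
```

Numbers stay numbers and strings stay quoted, which is the guarantee the translator functions are meant to provide for each R data type.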
R: data <- data.frame(x = c(1, 2, 3), y = c("a", "b", "c"))
Template: var data = {{ translate(data) }};
JS: var data = [{"x": 1, "y": "a"}, {"x": 2, "y": "b"}, {"x": 3, "y": "c"}];
The other major problem was data reusability: how do you tell the JS code that generates the visualization what data to use? Clickme does this with templates, hybrid files that contain both R and JS code. This makes it possible for the same template to be used to visualize different data sets.
A template contains mostly JS code, but at critical points it has R code surrounded by double (or triple) braces. When the user asks to generate a plot, the R code is evaluated and the braces are replaced with the results, generating a visualization customized to your data.
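The substitution step can be sketched as follows. This is a simplified analogue written in JS, not Clickme's implementation: in Clickme the code between the braces is R and is evaluated by R, whereas here each `{{ fn(arg) }}` placeholder is resolved by looking up `fn` in a hypothetical `helpers` table and `arg` in a `scope` object. All of these names are assumptions made for illustration.

```javascript
// Hypothetical helper table standing in for the R functions a template may call.
const helpers = {
  translate: (value) => JSON.stringify(value),
};

const has = (obj, key) => Object.prototype.hasOwnProperty.call(obj, key);

// Replace every {{ fn(arg) }} placeholder with helpers[fn](scope[arg]).
function renderTemplate(template, scope) {
  const placeholder = /\{\{\s*(\w+)\((\w+)\)\s*\}\}/g;
  return template.replace(placeholder, (match, fn, arg) => {
    if (!has(helpers, fn) || !has(scope, arg)) {
      throw new Error("Cannot evaluate placeholder: " + match);
    }
    return helpers[fn](scope[arg]);
  });
}

const template = "var data = {{ translate(data) }};";

// The same template rendered with two different data sets.
const rendered = renderTemplate(template, {
  data: [{ x: 1, y: "a" }, { x: 2, y: "b" }, { x: 3, y: "c" }],
});
const renderedOther = renderTemplate(template, { data: [{ x: 10, y: "z" }] });

console.log(rendered);
// prints var data = [{"x":1,"y":"a"},{"x":2,"y":"b"},{"x":3,"y":"c"}];
```

Because the template only names the data rather than containing it, swapping in another data set changes the generated JS without touching the template itself.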
Clickme plots are easy to create and share
The main reason why you should use Clickme for your daily plotting needs is that dynamic plots are as easy to generate as static plots. You simply call a function and send it a template and some data.
The plots are also easy to share: simply upload them to a server or email them.
rclickme.com
@nachocaballero
You can try Clickme by visiting rclickme.com and following the instructions to install the package in R.
Right now, you can only create scatter plots, but I’m working on adding more types of visualizations (line plots, heatmaps).
If you have a visualization that you would like to be able to use directly from R, let me know and I’ll send you an email when the developer guide is ready.
I hope Clickme helps you solve the little problems more quickly, so you can spend the extra time thinking about the big questions.