hdp r-google charttools-webinar-3-5-2013 (2)
TRANSCRIPT
![Page 1: Hdp r-google charttools-webinar-3-5-2013 (2)](https://reader033.vdocuments.mx/reader033/viewer/2022052621/558b229bd8b42a98478b458c/html5/thumbnails/1.jpg)
© Hortonworks Inc. 2013
Quick Housekeeping Rules
• The Q&A panel is available if you have any questions during the webinar
• There will be time for Q&A at the end
• We will record the webinar for future viewing
• All attendees will receive a copy of the slides and the recording
![Page 2: Hdp r-google charttools-webinar-3-5-2013 (2)](https://reader033.vdocuments.mx/reader033/viewer/2022052621/558b229bd8b42a98478b458c/html5/thumbnails/2.jpg)
Hadoop, R, and Google Chart Tools
Data Visualization for Application Developers
Jeff Markham
Solution Engineer
![Page 3: Hdp r-google charttools-webinar-3-5-2013 (2)](https://reader033.vdocuments.mx/reader033/viewer/2022052621/558b229bd8b42a98478b458c/html5/thumbnails/3.jpg)
Agenda
• Introductions
• Use Case Description
• Preparation
• Demo
• Review
• Q & A
![Page 4: Hdp r-google charttools-webinar-3-5-2013 (2)](https://reader033.vdocuments.mx/reader033/viewer/2022052621/558b229bd8b42a98478b458c/html5/thumbnails/4.jpg)
Use Case Description
• Visualizing data
• Tools vs. application development
• Choosing the technology
  • Hortonworks Data Platform
  • RHadoop
  • Google Charts
![Page 5: Hdp r-google charttools-webinar-3-5-2013 (2)](https://reader033.vdocuments.mx/reader033/viewer/2022052621/558b229bd8b42a98478b458c/html5/thumbnails/5.jpg)
Preparation: Install HDP
Hortonworks Data Platform (HDP): Enterprise Hadoop
• The ONLY 100% open source and complete distribution
• Enterprise grade, proven and tested at scale
• Ecosystem endorsed to ensure interoperability
[Diagram: Hortonworks Data Platform (HDP)]
• Hadoop Core: distributed storage & processing (HDFS, WebHDFS, MapReduce, YARN in 2.0)
• Platform Services: enterprise readiness (HA, DR, snapshots, security, …)
• Data Services: store, process and access data (HCatalog, Hive, Pig, HBase, Sqoop, Flume)
• Operational Services: manage & operate at scale (Oozie, Ambari)
• Deployment options: OS, Cloud, VM, Appliance
![Page 6: Hdp r-google charttools-webinar-3-5-2013 (2)](https://reader033.vdocuments.mx/reader033/viewer/2022052621/558b229bd8b42a98478b458c/html5/thumbnails/6.jpg)
Preparation: Install R
• Install R language
• Install appropriate packages
  – rhdfs
  – rmr2
  – googleVis
  – shiny
  – Dependencies for all above
![Page 7: Hdp r-google charttools-webinar-3-5-2013 (2)](https://reader033.vdocuments.mx/reader033/viewer/2022052621/558b229bd8b42a98478b458c/html5/thumbnails/7.jpg)
Preparation
• rmr2
  – Functions to allow for MapReduce in R apps
• rhdfs
  – Functions allowing HDFS access in R apps
• googleVis
  – Use of Google Chart Tools in R apps
• shiny
  – Interactive web apps for R developers
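Before running the demo, it can help to confirm that the four packages listed above are installed. The sketch below is an illustration only and is not part of the webinar. It relies on base R's `system.file`, which returns an empty string for a package that is not installed; the helper name `check_packages` is hypothetical.

```r
# Hypothetical helper (not from the slides): report which of the
# packages named in the Preparation slides are installed locally.
check_packages <- function(pkgs = c("rhdfs", "rmr2", "googleVis", "shiny")) {
  # system.file(package = p) returns "" when package p is not installed,
  # so this checks installation without loading the package.
  vapply(pkgs, function(p) nzchar(system.file(package = p)), logical(1))
}

status <- check_packages()
print(status)

# List anything that still needs to be installed.
missing <- names(status)[!status]
if (length(missing) > 0) {
  message("Not installed: ", paste(missing, collapse = ", "))
}
```

googleVis and shiny can be installed from CRAN with `install.packages()`. rmr2 and rhdfs are distributed by the RHadoop project, so follow that project's download instructions for them.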
![Page 8: Hdp r-google charttools-webinar-3-5-2013 (2)](https://reader033.vdocuments.mx/reader033/viewer/2022052621/558b229bd8b42a98478b458c/html5/thumbnails/8.jpg)
Demo Walkthrough
Using Hadoop, R, and Google Chart Tools
![Page 9: Hdp r-google charttools-webinar-3-5-2013 (2)](https://reader033.vdocuments.mx/reader033/viewer/2022052621/558b229bd8b42a98478b458c/html5/thumbnails/9.jpg)
Visualization Use Case
• Data from CDC
  – Vital statistics publicly available data
  – 2010 US birth data file
Sample record:
S 201001 7 2 2 30105 2 011 06 1 123 3405 1 06 01 2 2 0321 1006 314 2000 2 222 2 2 2 2 2 122222 11 3 094 1 M 04 200940 39072 3941 083 22 2 2 2 2 110 110 00 0000000 00 000000000 000000 000 000000000000000000011 101 1 111 1 0 1 1 1 111111 11 1 1 1 1
Source: http://www.cdc.gov/nchs/data_access/vitalstatsonline.htm
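Records like the one above are read by character position rather than by delimiter. As a minimal illustration of pulling a field out by position with base R's `substr`, the sketch below builds a synthetic record (not real CDC data) with the value 27 placed at characters 89-90, the positions the mapper in this walkthrough reads.

```r
# Synthetic 100-character record (NOT real CDC data): blanks everywhere
# except characters 89-90, which hold the two-digit value "27".
record <- paste0(strrep(" ", 88), "27", strrep(" ", 10))

# substr(x, start, stop) uses 1-based, inclusive positions.
field <- substr(record, 89, 90)
age <- as.integer(field)
print(age)

# substr() is vectorised, so a character vector of records works too.
records <- c(record, paste0(strrep(" ", 88), "31", strrep(" ", 10)))
ages <- as.integer(substr(records, 89, 90))
print(ages)
```

Because `substr` is vectorised, the same call can process a whole batch of lines at once, which is how a mapper typically receives its input.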
![Page 10: Hdp r-google charttools-webinar-3-5-2013 (2)](https://reader033.vdocuments.mx/reader033/viewer/2022052621/558b229bd8b42a98478b458c/html5/thumbnails/10.jpg)
Visualization Use Case
• Put data into HDFS
  – Create input directory
  – Put data into input directory
Create HDFS directory:
> hadoop fs -mkdir /user/jeff/natality
Put data into HDFS:
> hadoop fs -put ~/VS2010NATL.DETAILUS.DAT /user/jeff/natality/
![Page 11: Hdp r-google charttools-webinar-3-5-2013 (2)](https://reader033.vdocuments.mx/reader033/viewer/2022052621/558b229bd8b42a98478b458c/html5/thumbnails/11.jpg)
Visualization Use Case
• Write R script
  – Specify use of RHadoop packages
  – Initialize HDFS
  – Specify data input and output location
R script:
#!/usr/bin/env Rscript

require('rmr2')
require('rhdfs')
hdfs.init()

hdfs.data.root = 'natality'
hdfs.data = file.path(hdfs.data.root, 'VS2010NATL.DETAILUS.DAT')
hdfs.out.root = hdfs.data.root
hdfs.out = file.path(hdfs.out.root, 'out')
. . .
![Page 12: Hdp r-google charttools-webinar-3-5-2013 (2)](https://reader033.vdocuments.mx/reader033/viewer/2022052621/558b229bd8b42a98478b458c/html5/thumbnails/12.jpg)
Visualization Use Case
• Write R script
  – Write mapper function
  – Write reducer function
R script (continued):
. . .
mapper = function(k, fields) {
  # key: characters 89-90 of each record (charted later as age of mother); value: 1 per record
  keyval(as.integer(substr(fields, 89, 90)), 1)
}

reducer = function(key, vv) {
  # count values for each key
  keyval(key, sum(as.numeric(vv), na.rm = TRUE))
}
. . .
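To see what the mapper and reducer compute without a Hadoop cluster, the same count-per-key logic can be imitated in plain R. This is only a local sketch under stated assumptions: the records are synthetic, `make_record`, `local_mapper`, and `local_reducer` are hypothetical stand-ins, and `split` with `lapply` takes the place of the grouping step that rmr2 and Hadoop perform between map and reduce.

```r
# Synthetic fixed-width records (NOT real CDC data) with a two-digit
# value at characters 89-90, mirroring the positions used on the slide.
make_record <- function(age) paste0(strrep(" ", 88), sprintf("%02d", age))
records <- vapply(c(25L, 31L, 25L, 19L, 31L, 25L), make_record, character(1))

# Hypothetical stand-in for the mapper: emit (key, 1) for each record.
local_mapper <- function(lines) {
  data.frame(key = as.integer(substr(lines, 89, 90)), val = 1)
}

# Hypothetical stand-in for the reducer: sum the values for one key.
local_reducer <- function(key, vv) {
  data.frame(key = key, val = sum(as.numeric(vv), na.rm = TRUE))
}

mapped <- local_mapper(records)

# Group values by key (the role of the shuffle), then reduce each group.
groups <- split(mapped$val, mapped$key)
reduced <- do.call(rbind, lapply(names(groups), function(k) {
  local_reducer(as.integer(k), groups[[k]])
}))
print(reduced)
```

Each key ends up with the number of records that carried it, which is the per-key count the demo later charts.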
![Page 13: Hdp r-google charttools-webinar-3-5-2013 (2)](https://reader033.vdocuments.mx/reader033/viewer/2022052621/558b229bd8b42a98478b458c/html5/thumbnails/13.jpg)
Visualization Use Case
• Write R script
  – Write job function
R script (continued):
. . .
job = function (input, output) {
  mapreduce(input = input,
            output = output,
            input.format = "text",
            map = mapper,
            reduce = reducer,
            combine = TRUE)
}
. . .
![Page 14: Hdp r-google charttools-webinar-3-5-2013 (2)](https://reader033.vdocuments.mx/reader033/viewer/2022052621/558b229bd8b42a98478b458c/html5/thumbnails/14.jpg)
Visualization Use Case
• Write R script
  – Write result to HDFS output directory
R script (continued):
. . .
out = from.dfs(job(hdfs.data, hdfs.out))
results.df = as.data.frame(out, stringsAsFactors = FALSE)
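For charting, the result is easier to work with when the columns have descriptive names and the rows are ordered by key. The sketch below assumes the job output is a list with `key` and `val` elements, which is how rmr2 key-value pairs are commonly represented. The numbers are made up and stand in for `from.dfs` output so the example runs without Hadoop, and the column names `age` and `births` are illustrative.

```r
# Made-up stand-in for from.dfs() output (assumption: a list with
# `key` and `val` elements); reducer output order is not guaranteed.
out <- list(key = c(31L, 19L, 25L), val = c(2, 1, 3))

results.df <- as.data.frame(out, stringsAsFactors = FALSE)

# Illustrative column names, then sort rows by key for a tidy x-axis.
names(results.df) <- c("age", "births")
results.df <- results.df[order(results.df$age), ]
rownames(results.df) <- NULL
print(results.df)
```

Sorting matters mostly for line charts, where points are usually connected in row order.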
![Page 15: Hdp r-google charttools-webinar-3-5-2013 (2)](https://reader033.vdocuments.mx/reader033/viewer/2022052621/558b229bd8b42a98478b458c/html5/thumbnails/15.jpg)
Visualization Use Case
• Create Shiny application
  – Create directory
  – Create ui.R
  – Create server.R
Shiny app directory:
> mkdir ~/my-shiny-app
![Page 16: Hdp r-google charttools-webinar-3-5-2013 (2)](https://reader033.vdocuments.mx/reader033/viewer/2022052621/558b229bd8b42a98478b458c/html5/thumbnails/16.jpg)
Visualization Use Case
• Create Shiny application
  – Create ui.R
ui.R source:
shinyUI(pageWithSidebar(

  # Application title
  headerPanel("2010 US Births"),

  sidebarPanel(. . .),

  mainPanel(
    tabsetPanel(
      tabPanel("Line Chart", htmlOutput("lineChart")),
      tabPanel("Column Chart", htmlOutput("columnChart"))
    )
  )
))
![Page 17: Hdp r-google charttools-webinar-3-5-2013 (2)](https://reader033.vdocuments.mx/reader033/viewer/2022052621/558b229bd8b42a98478b458c/html5/thumbnails/17.jpg)
Visualization Use Case
• Create Shiny application
  – Create server.R
server.R source:
library(googleVis)
library(shiny)
library(rmr2)
library(rhdfs)

hdfs.init()

hdfs.data.root = 'natality'
hdfs.data = file.path(hdfs.data.root, 'out')
df = as.data.frame(from.dfs(hdfs.data))
. . .
![Page 18: Hdp r-google charttools-webinar-3-5-2013 (2)](https://reader033.vdocuments.mx/reader033/viewer/2022052621/558b229bd8b42a98478b458c/html5/thumbnails/18.jpg)
Visualization Use Case
• Create Shiny application
  – Create server.R
server.R source (continued):
. . .
shinyServer(function(input, output) {

  output$lineChart <- renderGvis({
    gvisLineChart(df, options=list(
      vAxis="{title:'Number of Births'}",
      hAxis="{title:'Age of Mother'}",
      legend="none"
    ))
  })
. . .
![Page 19: Hdp r-google charttools-webinar-3-5-2013 (2)](https://reader033.vdocuments.mx/reader033/viewer/2022052621/558b229bd8b42a98478b458c/html5/thumbnails/19.jpg)
Visualization Use Case
• Run Shiny application
Run Shiny app:
> shiny::runApp('~/my-shiny-app')
Loading required package: shiny
Welcome to googleVis version 0.4.0
. . .
HADOOP_CMD=/usr/bin/hadoop
Be sure to run hdfs.init()
Listening on port 8100
![Page 20: Hdp r-google charttools-webinar-3-5-2013 (2)](https://reader033.vdocuments.mx/reader033/viewer/2022052621/558b229bd8b42a98478b458c/html5/thumbnails/20.jpg)
Visualization Use Case
• View Shiny application
![Page 21: Hdp r-google charttools-webinar-3-5-2013 (2)](https://reader033.vdocuments.mx/reader033/viewer/2022052621/558b229bd8b42a98478b458c/html5/thumbnails/21.jpg)
Demo Live
Using Hadoop, R, and Google Chart Tools
![Page 22: Hdp r-google charttools-webinar-3-5-2013 (2)](https://reader033.vdocuments.mx/reader033/viewer/2022052621/558b229bd8b42a98478b458c/html5/thumbnails/22.jpg)
Visualization Use Case
• Architecture recap
  – Analyze data sets with R on Hadoop
  – Choose RHadoop packages
  – Visualize data with Google Chart Tools via googleVis package
  – Render googleVis output in Shiny applications
• Architecture next steps
  – Integrate Shiny application into existing web apps
  – Create further data models with R
![Page 23: Hdp r-google charttools-webinar-3-5-2013 (2)](https://reader033.vdocuments.mx/reader033/viewer/2022052621/558b229bd8b42a98478b458c/html5/thumbnails/23.jpg)
HDP: Enterprise Hadoop Distribution
Hortonworks Data Platform (HDP): Enterprise Hadoop
• The ONLY 100% open source and complete distribution
• Enterprise grade, proven and tested at scale
• Ecosystem endorsed to ensure interoperability
[Diagram: Hortonworks Data Platform (HDP)]
• Hadoop Core: distributed storage & processing (HDFS, WebHDFS, MapReduce, YARN in 2.0)
• Platform Services: enterprise readiness (HA, DR, snapshots, security, …)
• Data Services: store, process and access data (HCatalog, Hive, Pig, HBase, Sqoop, Flume)
• Operational Services: manage & operate at scale (Oozie, Ambari)
• Deployment options: OS, Cloud, VM, Appliance
![Page 24: Hdp r-google charttools-webinar-3-5-2013 (2)](https://reader033.vdocuments.mx/reader033/viewer/2022052621/558b229bd8b42a98478b458c/html5/thumbnails/24.jpg)
HDP Sandbox