Web Briefing: Unlock the Power of Hadoop to Enable Interactive Analytics
Unlock the power of Hadoop to enable interactive analytics & real-time Business Intelligence
July 10, 2013
• Thank you for joining today’s session!
• The web briefing will start momentarily.
• We will use the WebEx Q & A feature
Today’s Slides are available at www.slideshare.net/kognitio
Follow the conversation on Twitter: @Hortonworks @Kognitio
Teleconference: Use your computer, or call:
US +1 631 267 4890
UK +44-203-478-5289
Passcode: 841 203 797
Web Briefing Agenda
• Modern Data Architectures - John Kreisa
• Hadoop meets Mature BI: Interactive Analytics - Michael Hiskey
• Demonstration: SQL and Hadoop with in-memory MPP Acceleration - Stuart Watt
© Hortonworks Inc. 2013
Modern Data Architectures
Big data drivers and patterns
John Kreisa, VP Strategic Marketing, Hortonworks
@marked_man
Existing Data Architecture
[Diagram, three layers.
Applications: Business Analytics, Custom Applications, Enterprise Applications.
Data systems: traditional repositories (RDBMS, EDW, MPP), alongside operational tools (manage & monitor) and dev & data tools (build & test).
Data sources: traditional sources (RDBMS, OLTP, OLAP); OLTP, POS systems.]
6 Common Types of Hadoop Data
1. Sentiment: Understand how your customers feel about your brand and products – right now
2. Clickstream: Capture and analyze website visitors’ data trails and optimize your website
3. Sensor/Machine: Discover patterns in data streaming automatically from remote sensors and machines
4. Geographic: Analyze location-based data to manage operations where they occur
5. Server Logs: Research logs to diagnose process failures and prevent security breaches
6. Unstructured (text, video, pictures, etc.): Understand patterns in text across millions of web pages, emails, and documents
Next-Generation Data Architecture
[Diagram, three layers.
Applications: Microsoft Applications, BI Tools & OLAP Clients.
Data systems: traditional repositories (RDBMS, EDW, MPP), Hortonworks Data Platform, In-memory MPP Accelerator, alongside operational tools (manage & monitor) and dev & data tools (build & test).
Data sources: traditional sources (RDBMS, OLTP, OLAP) and new sources (web logs, email, sensors, social media).]
Interoperating With Your Data Tools
[Diagram, three layers.
Applications: Microsoft Applications.
Data systems: traditional repositories, Hortonworks Data Platform, In-memory MPP Accelerator, alongside operational tools (Viewpoint) and dev & data tools.
Data sources: traditional sources (RDBMS, OLTP, OLAP) and new sources (web logs, email, sensors, social media).]
Hadoop Common Patterns of Use
[Diagram labels: Big Data (Transactions, Interactions, Observations); Hortonworks Data Platform; patterns: Refine, Explore, Enrich; access modes: Batch, Interactive, Online; Business Cases.]
“Right-time” Access to Data
Hadoop as a Shared Data Lake
Infrastructure - Data Lake: Modern Data Architecture
[Diagram, three layers.
Applications: Custom Analytic App, Packaged Analytic App.
Data systems: traditional repositories (RDBMS, EDW, MPP), Enterprise Hadoop Platform (Hortonworks Data Platform), In-memory MPP Accelerator.
Sources: traditional sources (RDBMS, OLTP, OLAP) and new sources (logs, clicks, social media, sensors).]
• A more mature organization will have this as a goal for Hadoop
• Store all data and build/enable applications on shared “data lake”
• Delivers broad value across the enterprise
• Seamless SQL access with interactive analytics
Hadoop for New Targeted Applications
Business Application Catalyst: Type of Data
[Diagram, three layers.
Applications: Custom Analytic App, Packaged Analytic App.
Data systems: traditional repositories (RDBMS, EDW, MPP), Enterprise Hadoop Platform (Hortonworks Data Platform), In-memory MPP Accelerator.
Sources: traditional sources (RDBMS, OLTP, OLAP) and new sources (logs, clicks, social media, sensors).]
• Many organizations start here & expand usage
• Driven by a type of data that was not capable of analysis before Hadoop
• Delivers explicit value for a business case or an individual LOB
• Complementary to existing applications that use SQL
• Interactive analytics with MPP in-memory execution of R, Python, Perl, etc.
HDP: Enterprise Hadoop Distribution
Hortonworks Data Platform (HDP): Enterprise Hadoop
• The ONLY 100% open source and complete distribution
• Enterprise grade, proven and tested at scale
• Ecosystem endorsed to ensure interoperability
[Diagram: Hortonworks Data Platform (HDP) stack.
Hadoop core: HDFS, MapReduce.
Platform services (Enterprise Readiness): High Availability, Disaster Recovery, Security and Snapshots.
Data services: Hive & HCatalog, Pig, HBase; load & extract: Sqoop, Flume, NFS, WebHDFS.
Operational services: Oozie, Ambari.
Deployment: OS, Cloud, VM, Appliance.]
Hadoop meets Mature BI: Interactive Analytics
Michael Hiskey, VP of Marketing & Business Development
@mphnyc
Mature Business Intelligence and Reporting
Numbers, tables, charts, indicators
…accessed with ease and simplicity
Historical information, latency
BI tools have plateaued
Decision Support
Advanced analytics and data science
More math…a lot more math
Drive for a deeper level of understanding
Dynamic simulation
Statistical analysis
Behavior modelling
Reporting
Fraud detection
create external script LM_PRODUCT_FORECAST environment rsint
receives ( SALEDATE DATE, DOW INTEGER, ROW_ID INTEGER, PRODNO INTEGER, DA…
partition by PRODNO order by PRODNO, ROW_ID
sends ( R_OUTPUT varchar )
isolate partitions
script S'endofr(
# Simple R script to run a linear fit on daily sales
prod1<-read.csv(file=file("stdin"), header…
colnames(prod1)<-c("DOW","ID","PRODNO","DA…
dim1<-dim(prod1)
daily1<-aggregate(prod1$DAILYSALES, list(D…
daily1[,2]<-daily1[,2]/sum(daily1[,2])
basesales<-array(0,c(dim1[1],2))
basesales[,1]<-prod1$ID
basesales[,2]<-(prod1$DAILYSALES/daily1[pr…
colnames(basesales)<-c("ID","BASESALES")
fit1=lm(BASESALES ~ ID,as.data.frame(bases…
forecast<-array(0,c(dim1[1]+28,4))
colnames(forecast)<-c("ID","ACTUAL","PREDI…
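As a rough illustration of the kind of calculation the script above performs, the Python sketch below fits a straight line to daily sales by row ID and extends it 28 rows ahead. It is not the Kognitio script: the sample data and the function names `fit_line` and `forecast` are hypothetical, and the day-of-week adjustment visible in the truncated R code is left out.

```python
# Hypothetical sketch: ordinary least-squares line through (row_id, sales),
# extended 28 rows past the last observation. Not the original R script.


def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def forecast(row_ids, sales, horizon=28):
    """Fit on the observed rows and predict the next `horizon` row IDs."""
    slope, intercept = fit_line(row_ids, sales)
    last = row_ids[-1]
    return [(rid, slope * rid + intercept) for rid in range(last + 1, last + 1 + horizon)]


# Made-up daily sales for one product: 100, 102, 104, ...
row_ids = list(range(1, 15))
sales = [100 + 2 * (rid - 1) for rid in row_ids]
predictions = forecast(row_ids, sales)
print(predictions[0], predictions[-1])
```

In the slide's version the same fit would run once per product, since the script receives its rows partitioned by PRODNO.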
select Trans_Year, Num_Trans,
  count(distinct Account_ID) Num_Accts,
  sum(count( distinct Account_ID)) over (partition by Trans_Year order by Num_Tran…
  cast(sum(total_spend)/1000 as int) Total_Spend,
  cast(sum(total_spend)/1000 as int) / count(distinct Account_ID) Avg_Yearly_Spend…
  rank() over (partition by Trans_Year order by count(distinct Account_ID) desc) R…
  rank() over (partition by Trans_Year order by sum(total_spend) desc) Rank_by_Tot…
from( select Account_ID,
    Extract(Year from Effective_Date) Trans_Year,
    count(Transaction_ID) Num_Trans,
    sum(Transaction_Amount) Total_Spend,
    avg(Transaction_Amount) Avg_Spend
  from Transaction_fact
  where extract(year from Effective_Date)<2009
    and Trans_Type='D' and Account_ID<>9025011
    and actionid in (select actionid from DEMO_FS.V_FIN_actions
      where actionoriginid =1)
  group by Account_ID, Extract(Year from Effective_Date) ) Acc_Summary
group by Trans_Year, Num_Trans
order by Trans_Year desc, Num_Trans;
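The nested query above first summarizes each account per year, then groups accounts by how many transactions they made. A minimal Python sketch of that two-level shape follows, using the standard-library `sqlite3` module. The table layout and rows are made up for illustration, and the window functions and filters of the original query are left out.

```python
import sqlite3

# Hypothetical data: (account_id, trans_year, amount). Not from the briefing.
rows = [
    (1, 2007, 10.0), (1, 2007, 30.0),
    (2, 2007, 5.0), (2, 2007, 15.0),
    (3, 2007, 50.0),
    (1, 2008, 20.0),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table transaction_fact (account_id integer, trans_year integer, amount real)"
)
conn.executemany("insert into transaction_fact values (?, ?, ?)", rows)

# Inner query: one summary row per account and year.
# Outer query: accounts grouped by how many transactions they made that year.
query = """
select trans_year, num_trans,
       count(distinct account_id) as num_accts,
       sum(total_spend) as total_spend
from (select account_id, trans_year,
             count(*) as num_trans,
             sum(amount) as total_spend
      from transaction_fact
      group by account_id, trans_year) acc_summary
group by trans_year, num_trans
order by trans_year desc, num_trans
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
conn.close()
```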
select dept, sum(sales)
from sales_fact
where period between date '01-05-2006' a…
group by dept
having sum(sales) > 50000;

select sum(sales)
from sales_history
where year = 2006 and month = 5 and regi…

select total_sales
from summary
where year = 2006 and month = 5 and regi…
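The last two queries contrast aggregating detail rows at query time with reading a pre-computed summary row. The sketch below shows both routes side by side with `sqlite3`. The schema and data are assumptions, and the predicate that is cut off in the slide is simply omitted rather than guessed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table sales_history (year integer, month integer, sales real)")
conn.executemany(
    "insert into sales_history values (?, ?, ?)",
    [(2006, 5, 100.0), (2006, 5, 250.0), (2006, 6, 75.0)],  # made-up rows
)

# Option 1: aggregate the detail rows every time the question is asked.
on_the_fly = conn.execute(
    "select sum(sales) from sales_history where year = ? and month = ?", (2006, 5)
).fetchone()[0]

# Option 2: build a summary table ahead of time and look the answer up.
conn.execute(
    "create table summary as "
    "select year, month, sum(sales) as total_sales "
    "from sales_history group by year, month"
)
precomputed = conn.execute(
    "select total_sales from summary where year = ? and month = ?", (2006, 5)
).fetchone()[0]

print(on_the_fly, precomputed)
conn.close()
```

The summary lookup reads a single row but only answers questions anticipated when the table was built. The scan can answer any question, at the cost of reading the detail rows each time.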
The Analytical Enterprise
Business Analyst
Systems Admin
Data Scientist
Sexiest job of the 21st Century?
Key: “Graduation”
• Projects will need to easily graduate from the Data Science Lab and become part of Business as Usual
Your goal:
PRESS HERE…and really cool Big Data stuff happens!
Big Data: Bring the Analytics TO the Data
Kognitio Hadoop Integration
• Kognitio Map/Reduce Agent uploads itself to Hadoop nodes
• Query passes selections, relevant predicates
• Data filtering & projection locally on each node
• Data filtered as it is read from file(s)
• Only data of interest is transferred and loaded into memory via parallel load streams
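The filter-at-the-source idea in these bullets can be sketched in plain Python. This is only a toy model under assumed names (`scan_node`, `load_into_memory`, and the sample rows are hypothetical), not Kognitio's agent or any Hadoop API. Each simulated node applies the query's predicate and column selection locally, and only the surviving values are gathered in parallel.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical rows held on three simulated Hadoop nodes.
NODES = [
    [{"dept": "toys", "sales": 120, "note": "a"}, {"dept": "food", "sales": 40, "note": "b"}],
    [{"dept": "toys", "sales": 300, "note": "c"}],
    [{"dept": "food", "sales": 10, "note": "d"}, {"dept": "toys", "sales": 5, "note": "e"}],
]


def scan_node(rows, predicate, columns):
    """Runs on one node: filter rows as they are read, keep only the requested columns."""
    return [tuple(row[c] for c in columns) for row in rows if predicate(row)]


def load_into_memory(nodes, predicate, columns):
    """Pass the predicate and column list to every node and gather results in parallel."""
    with ThreadPoolExecutor(max_workers=len(nodes)) as pool:
        parts = pool.map(lambda rows: scan_node(rows, predicate, columns), nodes)
        return [record for part in parts for record in part]


loaded = load_into_memory(
    NODES,
    lambda r: r["dept"] == "toys" and r["sales"] > 50,
    ["dept", "sales"],
)
print(loaded)
```

Of the five rows and three columns stored across the nodes, only two rows of two columns reach the in-memory side, which is the point of pushing selections and predicates down.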
Demonstration: SQL & Hadoop with in-memory MPP Acceleration
Stuart Watt
Senior Systems Engineer
@Kognitio
Hortonworks Snapshot
• We distribute the only 100% Open Source Enterprise Hadoop Distribution: Hortonworks Data Platform
• We engineer, test & certify HDP for enterprise usage
• We employ the core architects, builders and operators of Apache Hadoop
• We drive innovation within Apache Software Foundation projects
• We are uniquely positioned to deliver the highest quality of Hadoop support
• We enable the ecosystem to work better with Hadoop
Develop Distribute Support
We develop, distribute and support the ONLY 100% open source Enterprise Hadoop distribution
Endorsed by Strategic Partners
Headquarters: Palo Alto, CA
Employees: 200+ and growing
Investors: Benchmark, Index, Yahoo, Tenaya, Dragoneer
Kognitio Snapshot: Mature SQL atop Hadoop
Kognitio is an in-memory analytical platform that is tightly integrated with Hadoop for high-performance advanced analytics that make Big Data more consumable for enterprises, especially those with mature BI environments or engrained tools.
• Privately held
• Invented the in-memory analytical platform
• Labs in the UK - HQ in New York, NY
• Powering advanced analytics at organizations worldwide, such as:
Interactive analytics with Hadoop: Getting Started
• Assess your environment and use case for Hortonworks Data Platform + Kognitio Analytical Platform: www.kognitio.com/hadoop
• Request a Meeting
• ZERO to big data in 15 minutes: Download Hortonworks Sandbox, www.hortonworks.com/sandbox
• Sign up for Training for in-depth learning: hortonworks.com/hadoop-training/
• Download the Kognitio Analytical Platform: www.kognitio.com/free
  – No registration required
  – Perpetual license - No time limits
Question & Answer session will be conducted electronically, using the panel to the right of your screen
Today’s Slides available at: www.slideshare.net/kognitio
Request a Meeting: www.kognitio.com/hadoop
Connect:
www.kognitio.com
twitter.com/kognitio
linkedin.com/companies/kognitio
tinyurl.com/kognitio
youtube.com/kognitio
+1 855 KOGNITIO
Hortonworks Sandbox: Fastest onramp to Apache Hadoop
• What is it?
– Free, virtualized single-node version of Hortonworks Data Platform
– A personal Hadoop environment
– An integrated learning environment with hands-on step-by-step tutorials
• What does it do?
– Dramatically accelerates the process of learning Apache Hadoop
– Accelerates & validates the use of Hadoop within your unique data architecture
– Lets you use your data to explore and investigate your use cases
• ZERO to big data in 15 minutes
• Get Started!
Download Hortonworks Sandbox: www.hortonworks.com/sandbox
Sign up for Training for in-depth learning: hortonworks.com/hadoop-training/