What is Hadoop and why is it important?
Copyright © 2013, SAS Institute Inc. All rights reserved.
HADOOP PRIMER
STEVE HOLDER
HADOOP AGENDA
• Why Big Data, and why it's different
• What is Hadoop?
• The players
BIG DATA
AN ERA OF ABUNDANCE
[Figure: timeline from 2005 to 2013, marked "Where we are now". Labels: Lots of data, Processing Power, Hadoop, Analytics, Intelligence]
BIG DATA
Large Volumes of Unstructured Data
• Mine data
• Detect nuggets of relevant data while disregarding unimportant data
Smaller Structured Data Sets
• Run queries for insight
• Know what you're looking for
BIG DATA DEALING WITH NOISY DATA
Example: What's Important Data in a Web Log?
• Visitor A views 1st product
• Visitor A - irrelevant data [Skip 2 seconds (12,600 record entries) for next meaningful action from this user]
• Visitor A views 2nd product
EDW
User            Product     Product     etc
43.251.164.128  B003ZX8B3W  B00365F6EG  etc
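The web-log example above can be sketched in Python: keep the product views for each visitor and drop everything else. This is a minimal sketch, not how any particular product does it. The record layout, the action names (`view_product`, `load_image`, `heartbeat`) and the function `extract_product_views` are assumptions made for illustration; the slide shows only the resulting user/product row.

```python
from collections import defaultdict

# Hypothetical, simplified click-stream records: (ip, action, detail).
# Real web logs are far noisier; this layout is an assumption.
RAW_LOG = [
    ("43.251.164.128", "view_product", "B003ZX8B3W"),
    ("43.251.164.128", "load_image", "/img/banner.png"),
    ("43.251.164.128", "heartbeat", ""),
    ("43.251.164.128", "view_product", "B00365F6EG"),
]


def extract_product_views(records):
    """Keep only product views, grouped per visitor in the order seen."""
    views = defaultdict(list)
    for ip, action, detail in records:
        if action == "view_product":  # every other action is treated as noise
            views[ip].append(detail)
    return dict(views)


rows = extract_product_views(RAW_LOG)
for user, products in rows.items():
    # One output row per visitor: user followed by the products viewed.
    print(user, *products)
```

On a cluster the same filter would run in parallel over many log files; here it runs in a single process to keep the idea visible.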
WHAT IS APACHE HADOOP?
Apache Hadoop is open source software that enables reliable, scalable, distributed computing on clusters of inexpensive servers
• Reliable - Software is fault tolerant; it expects and handles hardware and software failures
• Scalable - Designed for massive scale of processors, memory, and locally attached storage
• Distributed - Handles replication. Offers a massively parallel programming model, MapReduce
Hadoop framework handles: partitioning, scheduling, dispatch, execution, communication, failure handling, monitoring, reporting and more
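The MapReduce programming model mentioned above can be illustrated with a single-process simulation. This is a sketch of the model only, not the Hadoop API: the function names (`map_phase`, `shuffle`, `reduce_phase`, `run_job`) are hypothetical, and on a real cluster the framework runs the map and reduce steps in parallel across nodes and performs the shuffle itself.

```python
from collections import defaultdict
from itertools import chain


def map_phase(record):
    """Map: emit a (word, 1) pair for each word in one input record."""
    return [(word.lower(), 1) for word in record.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single result."""
    return key, sum(values)


def run_job(records):
    """Run map, shuffle and reduce in sequence over the input records."""
    mapped = chain.from_iterable(map_phase(r) for r in records)
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


counts = run_job(["big data", "Big clusters", "data data"])
print(counts)
```

The programmer writes only the map and reduce functions; partitioning, scheduling and failure handling are what the framework adds on top.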
HADOOP LOGICAL VIEW
Hadoop Distributed File System (HDFS)
• Reliable and cheap data storage
• Uses commodity hardware
YARN
• Resource manager
• Key to Enterprise scalability
• Provides hooks into HDFS
MapReduce
• Programming model
• Create queries
• Manages execution
Hive
• Solution on top of Hadoop
• Direct access to HDFS and HBase
• Provides access to Hadoop
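How HDFS can offer reliable storage on commodity hardware can be illustrated with a toy block-placement model. Everything here is a simplifying assumption: the function names, the 128 MB block size, the replication factor of 3 and the round-robin placement are chosen for illustration, and real HDFS uses a configurable, rack-aware policy that is not shown.

```python
import math


def place_blocks(file_size_mb, nodes, block_size_mb=128, replication=3):
    """Split a file into blocks and assign each block to distinct nodes."""
    if replication > len(nodes):
        raise ValueError("replication factor exceeds number of nodes")
    n_blocks = math.ceil(file_size_mb / block_size_mb)
    placement = {}
    for block in range(n_blocks):
        # Round-robin placement: a toy stand-in for a rack-aware policy.
        placement[block] = [
            nodes[(block + r) % len(nodes)] for r in range(replication)
        ]
    return placement


def surviving_blocks(placement, failed_node):
    """Return the blocks that still have a replica after one node fails."""
    return [
        block
        for block, replicas in placement.items()
        if any(node != failed_node for node in replicas)
    ]


layout = place_blocks(500, ["node1", "node2", "node3", "node4"])
print(layout)
print(surviving_blocks(layout, "node2"))
```

Because every block lives on several nodes, losing one inexpensive server does not lose data, which is the trade the slide describes.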
APACHE HADOOP BOTTOM LINE
Strengths
+ Huge data volumes
+ Unstructured data
+ Reliable
+ Scalable
+ Lowest cost
+ Open source
+ No hardware lock-in
+ Batch processing
Weaknesses
- Limited to no built-in analytics
- Not efficient at small scale
- Requires skilled engineering, operation and analyst resources
- Hiring qualified talent
- Less mature than SQL
- Governance
- Lack of user role support in access model
THE HADOOP ARMS RACE
WHO’S WHO IN THE ZOO
HADOOP VENDORS
CLOUDERA
• First in the market
• Proprietary software to enhance ecosystem
• Single place to store, process and analyze all your data
• In many large accounts as the incumbent
• Partner approach has been more conservative
HADOOP VENDORS
HORTONWORKS
• 100% open source
• Driving the most Apache projects
• Created YARN and leads its development
• Seeing a good deal of traction due to being 100% open source
• Partner approach has been much more open and beneficial
SAS & HADOOP HOW?
SAS & Hadoop intersect in many ways:
• SAS can treat Hadoop just like any other data source, pulling data FROM Hadoop when it is most convenient;
• SAS can work WITH Hadoop, lifting data into a purpose-built, in-memory advanced analytics environment;
• SAS can work directly IN Hadoop, leveraging the distributed processing capabilities of Hadoop.
HADOOP QUICK OVERVIEW
Why is it important?
• Hadoop has moved to the enterprise – it's the go-to for Big Data
• Adoption is faster than most other technologies
• Hadoop cost per TB is cheaper than traditional DBs
Why is Hadoop important to SAS?
• We make them look good - #1 reason for adoption of Hadoop = Analytics
• They are selling around us – Why not partner?
• We have an amazing story – FROM, WITH and IN
• Joint knowledge will mitigate Open Source threat
sas.com
QUESTIONS