Get Started Quickly with IBM's Hadoop as a Service
© 2015 IBM Corporation 2
Disclaimer
IBM’s statements regarding its plans, directions, and intent are subject to change or
withdrawal without notice at IBM’s sole discretion. Information regarding potential future
products is intended to outline our general product direction and it should not be relied on in
making a purchasing decision. The information mentioned regarding potential future products
is not a commitment, promise, or legal obligation to deliver any material, code or functionality.
Information about potential future products may not be incorporated into any contract. The
development, release, and timing of any future features or functionality described for our
products remains at our sole discretion.
Agenda
• Evolution of the Big Data Analytics space
• Open Data Platform and IBM’s BigInsights
• Hadoop as a Service – BigInsights on Cloud Options
• IBM Analytics for Hadoop – Free, 14-day trial
• BigInsights for Apache Hadoop – Bare Metal option for Production
• Demo
• Questions & Answers
• Resources
“At the World Economic Forum last month in Davos, Switzerland, Big Data was a marquee topic. A report by the forum, ‘Big Data, Big Impact,’ declared data a new class of economic asset, like currency or gold.”
“Companies are being inundated with data—from information on customer-buying habits to supply-chain efficiency. But many managers struggle to make sense of the numbers.”
“Increasingly, businesses are applying analytics to social media such as Facebook and Twitter, as well as to product review websites, to try to ‘understand where customers are, what makes them tick and what they want,’ says Deepak Advani, who heads IBM’s predictive analytics group.”
“Big Data has arrived at Seton Health Care Family, fortunately accompanied by an analytics tool that will help deal with the complexity of more than two million patient contacts a year…”
“Data is the new oil.”
Clive Humby
“The Oscar Senti-meter, a tool developed by the L.A. Times, IBM and the USC Annenberg Innovation Lab, analyzes opinions about the Academy Awards race shared in millions of public messages on Twitter.”
Big Data continues to be a hot topic in the market
“…now Watson is being put to work digesting millions of pages of research, incorporating the best clinical practices and monitoring the outcomes to assist physicians in treating cancer patients.”
An automotive company is running a series of experiments to better
understand and adapt to the shifting landscape of urban transportation.
It streams data from sensors on cars using InfoSphere Streams and
analyzes it on Hadoop using BigInsights on Cloud.
An industrial manufacturer in the United States reduces errors and the
time required for engine calibrations by 90 percent, and improves
reliability and new product design, by using sensors to collect
information on its products in the field and analyzing it using
InfoSphere BigInsights.
Big Data implementations are driving real
business value for IBM customers
Rich capabilities in IBM’s Big Data Portfolio mean
lower risk and more successful projects
On premise, Cloud, and “as a Service”
BigInsights
Open Data Platform Initiative
Why is IBM involved?
Strong history of leadership in open source & standards
Supports our commitment to open source currency in all
future releases
Accelerates our innovation within Hadoop &
surrounding applications
Open Data Platform (ODP) vs. Apache Software
Foundation (ASF)
ODP supports the ASF mission
ASF provides a governance model around individual
projects without looking at ecosystem
ODP aims to provide a vendor-led consistent packaging
model for core Apache components as an ecosystem
All Standard Apache Open Source Components
HDFS
YARN
MapReduce
Ambari
HBase
Spark
Flume
Hive
Pig
Sqoop
HCatalog
Solr/Lucene
ODP
SQL on Hadoop
Big SQL – optimized ANSI compliant SQL
Application Tooling
Toolkits and accelerators
Search & Entity Matching
Watson Explorer, Big Match
Data Visualization
BigSheets spreadsheet interface
Predictive Modeling
Big R, Machine Learning
Text Analytics
Advanced text processing with AQL, Text
extraction web interface
Real-time Analytics
Streams
Data Governance and Security
DataClick, LDAP, Secure cluster
Storage Integration
GPFS - POSIX Distributed Filesystem
Enterprise Manageability
Adaptive MapReduce, Multi-tenant
scheduling
BigInsights for Apache Hadoop
IOP + IBM Value Adds = BigInsights
Knox
Ambari
Snappy
Open JDK
Avro
Solr
Oozie
Flume
Slider
Pig
Hadoop
HDFS/MapReduce/YARN*
Zookeeper
Parquet
HBase
IBM Open Platform (IOP)
Spark
Hive
Sqoop
ODP
BigInsights Users & Role-Based Modules
IBM Open Platform
BigInsights for
Apache Hadoop
IBM BigInsights – BigSheets
Spreadsheet-style analysis tool for business users
Easily visualize big data using
rich built-in graphing and
analytic functions
Big SQL in BigInsights
Data Sources
Hive Tables
HBase Tables
BigSQL Engine
BigInsights
Application
SQL Language
JDBC / ODBC Driver
JDBC / ODBC Server
Native Sources: CSV, SEQ, Parquet, RC, Avro, ORC, JSON, Custom
ANSI SQL 2011 Compliant
IBM’s SQL for Hadoop
• Makes Hadoop data accessible
to a wider audience
• Familiar, widely known syntax
• Leverage native Hadoop
data sources
Complements the Data
Warehouse
• Exploratory analytics
• Sandbox, Data Lake
Included in BigInsights
Use familiar SQL tools
• Cognos, SPSS, Tableau,
MicroStrategy
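As a rough illustration of the kind of exploratory, standard-SQL query the slide has in mind, the sketch below runs a simple aggregate over sensor readings. It is not Big SQL's API: Python's built-in sqlite3 module stands in for the engine, and the table name, columns and rows are all invented for the example. A real deployment would connect through the JDBC/ODBC driver instead.

```python
import sqlite3

# Stand-in engine: in-memory SQLite, NOT Big SQL.
# The table, columns and rows are hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sensor_readings (engine_id TEXT, temperature REAL)")
conn.executemany(
    "INSERT INTO sensor_readings VALUES (?, ?)",
    [("E1", 90.0), ("E1", 110.0), ("E2", 70.0), ("E2", 80.0), ("E2", 90.0)],
)

# A plain aggregate query using only widely supported SQL syntax.
query = """
    SELECT engine_id, COUNT(*) AS readings, AVG(temperature) AS avg_temp
    FROM sensor_readings
    GROUP BY engine_id
    ORDER BY engine_id
"""
rows = conn.execute(query).fetchall()
for engine_id, readings, avg_temp in rows:
    print(engine_id, readings, avg_temp)
conn.close()
```

The query itself uses nothing engine-specific, which is the point of the "familiar, widely known syntax" bullet: the same statement could in principle be issued from any SQL tool that speaks JDBC or ODBC.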
Example of text analytic tooling: Graphical
interface to describe structure of various
textual formats – from log file data to natural
language. Users do not need to know AQL
IBM BigInsights – Text Analytics
Information Extraction Framework for Text Analytics
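To make "describing the structure of a textual format" concrete, here is a minimal sketch of pulling fields out of a log line. It uses a Python regular expression rather than AQL or the BigInsights tooling, and the log layout (timestamp, level, message) is an assumed format chosen only for illustration.

```python
import re

# Hypothetical log layout: "<date> <time> <LEVEL> <free-text message>"
LOG_PATTERN = re.compile(
    r"(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"(?P<level>[A-Z]+) "
    r"(?P<message>.*)"
)

def extract_fields(line):
    """Return the named fields of a log line, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None

sample = "2015-03-01 12:00:05 ERROR Disk quota exceeded on node7"
print(extract_fields(sample))
```

A graphical extractor like the one described on the slide would let a user define the same structure by pointing at examples instead of writing the pattern by hand.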
R Clients
Embedded R Execution
R Packages
Explore, visualize, transform, and model big data using familiar R syntax and paradigm
Scale out R
Partitioning of large data (“divide”)
Parallel cluster execution of
pushed down R code (“conquer”)
All of this from within the R
environment (Jaql, Map/Reduce
are hidden from you)
Almost any R package can run in
this environment
Pull data
summaries to R
client
Or, push R
functions right
on the data
Data sources
R Packages
IBM BigInsights – Big R
End-to-end integration of R into BigInsights
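The "divide" and "conquer" idea on this slide can be sketched in a few lines. The example below is written in Python rather than R and does not use Big R itself: local threads stand in for cluster nodes, and the function names (partition, summarize, parallel_mean) are made up for the illustration. Each partition returns only a small summary, which mirrors the "pull data summaries to the client" path.

```python
from concurrent.futures import ThreadPoolExecutor

def partition(values, n_parts):
    """Split values into n_parts roughly equal contiguous chunks ("divide")."""
    size, extra = divmod(len(values), n_parts)
    chunks, start = [], 0
    for i in range(n_parts):
        end = start + size + (1 if i < extra else 0)
        chunks.append(values[start:end])
        start = end
    return chunks

def summarize(chunk):
    """Work pushed down to one partition: return a (count, sum) summary."""
    return len(chunk), sum(chunk)

def parallel_mean(values, n_parts=4):
    """Run summarize on every partition in parallel, then combine ("conquer")."""
    chunks = [c for c in partition(values, n_parts) if c]
    # Threads are only a local stand-in for parallel execution on a cluster.
    with ThreadPoolExecutor(max_workers=n_parts) as pool:
        partials = list(pool.map(summarize, chunks))
    total_count = sum(count for count, _ in partials)
    total_sum = sum(subtotal for _, subtotal in partials)
    return total_sum / total_count

readings = list(range(1, 101))
print(parallel_mean(readings))
```

Only the per-partition counts and sums travel back to the caller, so the full data set never has to fit on the client.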
IBM BigInsights – Cloud deployment options
Manage less, analyze more
IBM Analytics for Hadoop
Prototype, create mash-ups in the cloud for non-production use
Empowers developers to rapidly drive insight from all data
Two-node Docker Instance
Enterprise features – BigSheets, Big SQL, Text, and Big R
Delivered via IBM Bluemix
50 GB – input data space
Extendable, Free 14-day Trial
BigInsights for Apache Hadoop
For Production deployments at scale in the cloud
Delivers flexibility and efficiency with BYOL and PAYG pricing
Scale to meet spikes in demand without on-premise infrastructure
Perform enterprise-class, complex analytics on Big Data
Available via the IBM Cloud Marketplace
Web-based UI for Sizing/Pricing
IBM Analytics for Hadoop Details
Free 14-day trial on www.bluemix.net
BigInsights for Apache Hadoop – Options
Secure, Dedicated Bare-metal
Infrastructure
IBM Open Platform
BigInsights for
Apache Hadoop
IBM BigInsights on Cloud – Security
Dedicated, isolated environment for every client
Administrative control owned by customer at Hadoop
and BigInsights level
Native HDFS encryption; optional Guardium encryption
Firewalls provide perimeter security and private network isolation
Aiming for ISO 27001 compliance in 2015
Example Configuration…
Non-shared physical machines for added security & performance
The IBM Difference
IBM delivers the foundation for Big Data – now and in the future
Embraces open source
Establishes standards
Integrates with familiar interfaces and established systems
Delivers advanced analytic capabilities
IBM is the only vendor providing…
Hadoop as a Managed Service in the Cloud
A single company providing Hadoop-based software, cloud and services
Provides expertise to help you on your journey
6,000 partners
Analytics services and solution centers
IBM BigInsights on Cloud – unique capability
Built-in Twitter Decahose service
Scaled down random sample of Twitter Firehose
Easily land Twitter data into BigInsights HDFS
Manipulate and visualize data using BigSheets
Incorporate sentiment data into analytic models
Easily store and accommodate vast data sets
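The notion of a scaled-down random sample of a stream can be sketched as follows. This is not the Decahose service or any Twitter API: the message list is synthetic, the function name sample_stream is invented, and the 10 percent rate is an assumption based on what the name "Decahose" suggests.

```python
import random

def sample_stream(messages, rate=0.10, seed=None):
    """Yield each message independently with probability `rate`."""
    rng = random.Random(seed)
    for message in messages:
        if rng.random() < rate:
            yield message

# Synthetic stand-in for a full message stream (hypothetical data).
firehose = [f"tweet-{i}" for i in range(10_000)]
decahose = list(sample_stream(firehose, rate=0.10, seed=42))
print(len(decahose))
```

Because each message is kept or dropped independently, the sample size is only approximately one tenth of the input, and the generator never needs to hold the whole stream in memory.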
Check out more data management services at www.bluemix.net
Cloudant
dashDB
BigInsights on Cloud
DB2 on Cloud
Big Data University – Free Training http://bigdatauniversity.com/
Powered by Hadoop http://wiki.apache.org/hadoop/PoweredBy
Free Trial Software (both for on-premise and cloud) http://www-01.ibm.com/software/data/infosphere/hadoop/trials.html
YouTube Videos
Watson
• The Science Behind the Answer (~7 minutes)
• Watson: Final Jeopardy (~11 minute summary)
Big Data Channel
• http://www.youtube.com/user/ibmbigdata
Resources