Big Data: Why now?
• 85% of Top 500 enterprises will fail to exploit Big Data (2)
• >30% of enterprises have no formal concept for data management (5)
• x2: the amount of digital data globally doubles every two years (1)
• 90% of all data is unstructured and cannot be handled with traditional analytics tools (1)
• 10-50% cost reduction in production through Big Data exploitation (4)
• 70% of all IT investment in 2015 will be Big Data driven (2)
Sources: (1) IDC Predictions 2012; (2) Gartner, Predicts 2012;
(4) McKinsey Global Institute 2011, Big data: The next frontier for innovation, competition, and productivity; (5) Economist Intelligence Unit 2011, Big data: Harnessing a game-changing asset
Relational: Relational OLTP, Scale-out relational
NoSQL (nonrelational): Object database, Graph database, Document database, Key-value
Enterprise data warehouse: Traditional EDW, Column-store EDW, MPP EDW
Further database types: Mobile database, In-memory database, Database appliances, Cloud database
Traditional data sources: CRM, ERP, Legacy apps
New data sources: Public data, Sensors, Marketplace, Social media, Geo-location
Source: Forrester Research, Inc.
The BI Ecosystem according to Forrester
Cost of a terabyte of enterprise disk storage
• 1990 – in the region of USD 9 million
• 2013 – in the region of USD 100
Cost of a terabyte of RAM
• 1990 – in the region of USD 106 million
• 2013 – in the region of USD 500
• i.e. between 1990 and 2013 the price ratio of storage to memory narrowed from roughly 1:12 to 1:5
• In absolute terms, the price of a terabyte of RAM dropped by a factor of roughly 200 000
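The ratios above follow directly from the quoted prices. A minimal Python sketch that recomputes them from the slide's figures (the variable and function names are illustrative, not from the source):

```python
# Prices per terabyte as quoted on the slide (USD, approximate).
DISK_1990, DISK_2013 = 9_000_000, 100
RAM_1990, RAM_2013 = 106_000_000, 500

def ram_premium(ram_price: float, disk_price: float) -> float:
    """How many times more a terabyte of RAM costs than a terabyte of disk."""
    return ram_price / disk_price

premium_1990 = ram_premium(RAM_1990, DISK_1990)  # ~11.8, i.e. roughly 1:12
premium_2013 = ram_premium(RAM_2013, DISK_2013)  # 5.0, i.e. 1:5
ram_drop = RAM_1990 / RAM_2013     # 212,000: the "200 000 times" drop for RAM
disk_drop = DISK_1990 / DISK_2013  # 90,000 for disk

print(f"RAM premium over disk: {premium_1990:.1f}x (1990), {premium_2013:.1f}x (2013)")
print(f"Price drop 1990-2013: RAM {ram_drop:,.0f}x, disk {disk_drop:,.0f}x")
```

The "200 000 times" figure refers to RAM; disk prices fell by a smaller factor over the same period, which is why the gap between the two narrowed.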
Performance comparison of memory and disk reads
• Enterprise disk – between 4 and 13 million nanoseconds
• Memory – between 0.4 and 40 nanoseconds
• i.e. reading data that is already in memory is between 150 000 and 1 million times faster
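How large the speed-up is depends on which ends of the two quoted latency ranges are compared. A minimal sketch that computes the extreme pairings from the figures as stated (names are illustrative, not from the source):

```python
# Read latencies as quoted on the slide, in nanoseconds.
DISK_NS = (4_000_000, 13_000_000)  # enterprise disk: 4 to 13 million ns
RAM_NS = (0.4, 40)                 # memory: 0.4 to 40 ns

def speedup_range(disk, ram):
    """Smallest and largest disk/RAM latency ratio obtainable from the two ranges."""
    smallest = min(disk) / max(ram)  # fastest disk vs slowest memory
    largest = max(disk) / min(ram)   # slowest disk vs fastest memory
    return smallest, largest

low, high = speedup_range(DISK_NS, RAM_NS)
print(f"memory reads are {low:,.0f}x to {high:,.0f}x faster")
```

Pairing more typical values from the middle of each range gives multiples between these two extremes.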
The facts behind in-memory
Positioning Big Data technologies (November 2013)
Approaching and beyond mainstream adoption: Hadoop SQL interfaces, Hadoop distributions, in-memory analytics
Big Data tools complement existing BI investments: they do not replace them (yet)
Existing data sources: business applications (ERP, CRM, etc.), transactional OLTP DBMS
Data integration: ETL
Data warehouse, data mart / cube, appliance
Business intelligence tools and analytical applications: reporting, dashboards, OLAP, data and text mining
Extending the same landscape with new data sources:
New data sources: Hadoop, NoSQL, log data; structured and unstructured data; static data and flowing data
In-memory database
Real-time data processing and analysis; complex event processing
Operational intelligence; predictive analytics
The 3 V’s of Big Data
Legacy BI
• Business problem: backward-looking analysis using data out of business applications
• Selected vendors: SAP BusinessObjects, IBM Cognos, MicroStrategy
• Data type/scalability: structured; limited (2 – 3 TB in RAM)
High-performance BI
• Business problem: quasi-real-time, in-memory analysis using data out of business applications; complex event processing
• Selected vendors: SAP HANA
• Data type/scalability: structured; limited (1 PB in RAM)
"Hadoop" ecosystem
• Business problem: batch, forward-looking predictive analysis; questions defined in the moment, using data from many sources
• Selected vendors: Cloudera Hadoop, Hortonworks Hadoop
• Data type/scalability: structured or unstructured; quasi-unlimited (20 – 30 PB)
HADOOP vs In-Memory analytics
How fast do you want your delivery made?
What is being delivered?
How much do you want to spend?
Do you have specialist drivers?
• Hadoop (with Impala) – the MPV: good performance, capacity, easy to drive, affordable
• Hadoop (without Impala) – long-haul trucks: excellent capacity, drives overnight, moderate performance, needs a specialist driver’s license
• IMA (in-memory analytics) – the Ferrari: sexy, very fast, limited luggage space
Some Hadoop improvements
• Cloudera’s Hadoop offerings: when you buy the trucks, they throw in the MPVs for free
• Hadoop is becoming easier and easier to use, thanks to the ecosystem of contributors and distributions, e.g. Cloudera’s Impala, Microsoft’s HDInsight, MapR’s Drill, Hortonworks’ Stinger Initiative
• Hadoop 2.0 brings YARN, graph analysis and stream processing
• Improvements in HDFS/HBase/Hive/YARN are arriving fast: the gap between batch and real-time/low-latency processing is expected to narrow fairly soon, e.g. from Hive 0.10 to 0.11 the new ORC (Optimized Row Columnar) file format brings a performance boost of >10x
Use case segmentation drives solution design and technology selection
Use case → potential tool
• Real-time reporting of SAP OLTP data, including joins and data transformations → SAP HANA
• Summarise unstructured data logs (scheduled) → Hadoop MapReduce
• Real-time reporting of summarised data logs, with joins to other non-OLTP data → Impala
• Near-real-time reporting of social media data → Impala + Hadoop MapReduce (scheduled to collect recent social media data)
• Real-time reporting of recent OLTP data joined with recent social media data → HANA + Hadoop MapReduce (scheduled to collect recent social media data and load it into HANA)
• Image analysis processing (scheduled) → Hadoop MapReduce (scheduled job runs sophisticated analysis of video files and stores the results in a structured file)
• Image analysis reporting → Impala (to report on the results file)
• Predictive analysis reporting (comparing OLTP and non-OLTP data) → HANA + Hadoop MapReduce (scheduled to collect and transfer applicable historic or relevant non-OLTP data to HANA)
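The scheduled log-summarisation use case maps naturally onto the MapReduce model. A minimal sketch that imitates the map, shuffle and reduce phases in plain Python on a single machine; it is not Hadoop's API, and the log format ("<timestamp> <level> <message>") is a hypothetical assumption. On a cluster, Hadoop would run many map and reduce tasks in parallel over files stored in HDFS.

```python
from collections import defaultdict
from typing import Iterable, Iterator

# Hypothetical log format: "<timestamp> <level> <message>".

def map_phase(lines: Iterable[str]) -> Iterator[tuple[str, int]]:
    """Emit (log level, 1) for every well-formed line; skip malformed ones."""
    for line in lines:
        parts = line.split(maxsplit=2)
        if len(parts) >= 2:
            yield parts[1], 1

def shuffle_phase(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Group intermediate values by key (Hadoop also sorts them at this stage)."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups: dict[str, list[int]]) -> dict[str, int]:
    """Sum the counts collected for each key."""
    return {key: sum(values) for key, values in groups.items()}

def summarise(lines: Iterable[str]) -> dict[str, int]:
    return reduce_phase(shuffle_phase(map_phase(lines)))

logs = [
    "2013-11-04T10:00:01 INFO job started",
    "2013-11-04T10:00:02 ERROR disk full",
    "2013-11-04T10:00:03 INFO retrying",
    "garbage",
]
print(summarise(logs))  # {'INFO': 2, 'ERROR': 1}
```

Because each map call sees one line and each reduce call sees one key, both phases can be split across machines without changing the logic, which is what makes the model suit large, unstructured log volumes.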
The NEW real-time analytics with SAP HANA & Hadoop: integrate and federate
Computing engines: SAP HANA (in-memory), Hadoop (MapReduce/batch)
Data sources, SAP and non-SAP: SAP ERP/DW, Sybase ASE & IQ, 3rd-party DBMS, Sybase ESP
Data provisioning: SLT, DXC, ETL, SAP DS, Smart Access
UI/front-end analytics: SAP LIVE & UI Analytics, mobile & embedded applications, non-SAP BI
Learning some of the language of Big Data
Jaspersoft, Karmasphere Studio, Talend, Pentaho, Continuity, NoSQL, MongoDB, Cassandra, CouchDB, Redis, Riak, Neo4j, Platfora, Tableau, Splunk, Shep, Hadoop, MapReduce, ZooKeeper, Avro, Nutch, HDFS, Matlab, R, Python, JRuby, Ruby, Java, C++, Kafka, InfoChimps, Skytree, Greenplum, Aster, GoPivotal, Hive, Pig, HBase, Chukwa, YARN
The other Big Data tools: once you have a data store and a means of accessing the data
• Operational intelligence platform
• Video search, audio search and content analytics
• Text search
• Graph databases
• Complex event processing
• In-memory data grid
• Speech recognition
• Pattern recognition
Some new roles in data/analytics: the coming of age of data in the enterprise
• The Data Scientist
• The Chief Data Officer
• Data Explorer
• Campaign Expert
• Data Security Officer
• Business Solution Architect / Domain Expert
• Data Hygienist / Data Steward
Big Data talent gap expected until 2018: 50%
External online sources: Facebook, Twitter, LinkedIn, Google+, YouTube; TomTom, MarketWatch, Financial Times, Bloomberg
The information-driven transport, logistics & retail provider
• Business areas: Marketing and Sales, Product Management, Operations, New Business
• Data flows: order volume and received service quality; customer sentiment and feedback; location, destination, availability; network flow data; real-time incidents; market and customer intelligence; location, traffic density, directions, delivery sequence; continuous sensor data
Existing customer base: High-Tech / Pharma, Manufacturing / FMCG, Commerce Sector, Households / SME
New customer base: Financial Industry, Public Authorities, Market Research, SME Retail
Commercial data services: Address Verification, Market Intelligence, Supply Chain Monitoring, Environmental Statistics
1. Real-time route optimization: delivery routes are dynamically calculated based on delivery sequence, traffic conditions and recipient status.
2. Consolidated pickup and delivery: carriers of multiple existing fleets are leveraged to pick up or deliver shipments along routes they would take anyway.
3. Strategic network planning: long-term demand forecasts for transport capacity are generated in order to support strategic investments in the network.
4. Operational capacity planning: short- and mid-term capacity planning allows optimal utilization and scaling of manpower and resources.
5. Customer loyalty management: public customer information is mapped against business parameters in order to predict churn and initiate countermeasures.
6. Service improvement and product innovation: a comprehensive view of customer requirements and service quality is used to enhance the product portfolio.
7. Risk evaluation and resilience planning: by tracking and predicting events that lead to supply chain disruptions, the resilience level of transport services is increased.
8. Market intelligence for SMEs: supply chain monitoring data is used to create market intelligence reports for small and medium-sized companies.
9. Financial demand and supply chain analytics: a micro-economic view is created on global supply chain data that helps financial institutions improve their rating and investment decisions.
10. Address verification: fleet personnel verify recipient addresses, which are transmitted to a central address verification service provided to retailers and marketing agencies.
11. Environmental intelligence: sensors attached to delivery vehicles produce fine-meshed statistics on pollution, traffic density, noise, parking spot utilization etc.
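Real-time route optimization relies on recomputing routes as traffic conditions change. A minimal sketch of that idea, assuming a hypothetical road graph whose edge weights are travel times in minutes: Dijkstra's shortest-path algorithm finds the fastest route, and rerunning it after a traffic incident changes the answer. It deliberately ignores delivery sequencing and recipient status, which a real route optimizer would also have to handle.

```python
import heapq

def fastest_route(graph, start, goal):
    """Dijkstra's shortest path over travel times in minutes.

    graph: {node: {neighbour: minutes}}; returns (total_minutes, [nodes]),
    or (inf, []) if the goal is unreachable.
    """
    queue = [(0.0, start, [start])]
    settled = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == goal:
            return cost, path
        if node in settled:
            continue
        settled.add(node)
        for nxt, minutes in graph.get(node, {}).items():
            if nxt not in settled:
                heapq.heappush(queue, (cost + minutes, nxt, path + [nxt]))
    return float("inf"), []

# Hypothetical road network: depot to recipient via two alternative streets.
roads = {
    "depot": {"A": 10, "B": 15},
    "A": {"recipient": 10},
    "B": {"recipient": 10},
}
print(fastest_route(roads, "depot", "recipient"))  # (20.0, ['depot', 'A', 'recipient'])

# A reported traffic incident triples the travel time on street A; recompute.
roads["A"]["recipient"] *= 3
print(fastest_route(roads, "depot", "recipient"))  # (25.0, ['depot', 'B', 'recipient'])
```

In a live system the edge weights would be refreshed from traffic feeds and the search rerun whenever they change; the algorithm itself stays the same.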
Predictive analytics for transport, logistics & retail
smartPORT logistics, developed by T-Systems, Deutsche Telekom Innovation Laboratories, SAP Research and Hamburg Port Authority
• Only location-based information is sent to the driver, thanks to geo-fencing
• Precise communications thanks to real-time data and smart devices
• Stakeholder integration, incl. port authority, forwarding agents, terminal and parking lot operators, plus others as required (sea shipping companies etc.)
• 5-10 minutes saved per tour means one more pick-up per day
• Portal provides transparency for all stakeholders, with role-based access
• Cloud solution collects all relevant real-time information in one place
Greater efficiency for truck and container movements: the right information, in the right place, in time, predictable
• 100 % compliance with legal requirements
• Up to 20 % lower costs 1)
• Full transparency
• Up to 20 % reduction in HR costs thanks to automation
• Seamless data flow
• Rapid reactions
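Geo-fencing, as used to send drivers only location-relevant information, can be illustrated with a minimal Python sketch: each message is tied to a hypothetical circular fence (centre latitude, centre longitude, radius in km), and a driver receives only the messages whose fence contains the truck's current position. Distances use the haversine great-circle formula with a mean Earth radius of 6371 km. The fence format, coordinates and message texts are invented for illustration and say nothing about how smartPORT itself is implemented.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points given in degrees, in km."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def inside_fence(position, fence):
    """True if position (lat, lon) lies within a circular fence (lat, lon, radius_km)."""
    lat, lon = position
    f_lat, f_lon, radius_km = fence
    return haversine_km(lat, lon, f_lat, f_lon) <= radius_km

def messages_for_driver(position, fenced_messages):
    """Return only the messages whose geo-fence contains the truck's position."""
    return [msg for fence, msg in fenced_messages if inside_fence(position, fence)]

# Hypothetical fences and messages, for illustration only.
fenced = [
    ((53.54, 9.93, 1.0), "Terminal gate 3: expect a 20-minute queue"),
    ((53.50, 10.00, 0.5), "Parking lot full"),
]
print(messages_for_driver((53.541, 9.931), fenced))
```

A circular fence keeps the test to a single distance comparison; polygon-shaped fences would need a point-in-polygon check instead.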
Patient-controlled data distribution
VOLUME, VELOCITY, VARIETY, VALUE
Integration, consolidation, optimization
Processing & integrating: smart data management
• Factor of 5.8: potential growth by 2015 2)
• Secured connection for error-free data transfer
• Optimization and automation of processes
• Pinpointing guzzlers
• Intelligent management of medical care
• Management of devices
• Immediate availability of patient and POC (point-of-care) data
Stakeholders: physicians, specialists, family doctors; insurance; hospitals & pharma
Health care & pharma grids got smart: transparency enhanced with predictive analytics
Summary
Data Volumes are here to stay
In-Memory Computing is becoming increasingly “affordable”
Hadoop is not your Big Data answer; it is part of your BI and Big Data ecosystem
Your BI and Big Data ecosystem will likely benefit from other tools as well
An enterprise data strategy and data governance are critical to success
Make sure you have two conversations in your enterprise:
1. A business conversation about the business value of your BI ecosystem
2. An IT conversation to ensure your IT organisation understands the new world of BI: the shortcomings, the strengths and the roles of the component technologies
“What matters is how — and why — vastly more data leads to vastly greater value creation. Designing and determining those links is typically in the province of top management”
but needs to be facilitated by the IT organisation in business terms.
A parting thought: Big Data’s 4 V’s
VALUE: value comes from knowing more than the rest
ANALYTICS creates
HADOOP innovation #1: much cheaper storage
What $1 million gets you:
• SAN storage (software: HDS, bundled with hardware by HDS): 0.5 petabytes, 200,000 IOPS, 8 Gbyte/sec
• NAS file servers (software: NetApp, bundled with hardware by NetApp): 1 petabyte, 200,000 IOPS, 10 Gbyte/sec
• Local storage (software: open source Hadoop ecosystem, hardware self-assembled): 10 petabytes, 400,000 IOPS, 250 Gbyte/sec
Cost per gigabyte: SAN storage $2 - $10, NAS file servers $1 - $5, local storage <$0.50
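The per-gigabyte figures can be cross-checked by dividing the $1 million budget by the capacity it buys. A minimal sketch, assuming decimal units (1 PB = 1,000,000 GB); the option labels and function name are illustrative:

```python
BUDGET_USD = 1_000_000
GB_PER_PB = 1_000_000  # decimal units: 1 PB = 10**6 GB

def cost_per_gb(capacity_pb: float, budget_usd: float = BUDGET_USD) -> float:
    """Price per gigabyte when the whole budget buys capacity_pb petabytes."""
    return budget_usd / (capacity_pb * GB_PER_PB)

# Capacity that $1 million buys, as quoted on the slide.
options = {
    "SAN (HDS)": 0.5,
    "NAS (NetApp)": 1,
    "Local storage (Hadoop, self-assembled)": 10,
}
for name, pb in options.items():
    print(f"{name}: ${cost_per_gb(pb):.2f} per GB")
```

This gives $2.00, $1.00 and $0.10 per GB: the low end of the SAN and NAS bands, and well under $0.50 for self-assembled Hadoop storage.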
Learning the language of Big Data: colour coding key
Core Hadoop kernel/modules, Hadoop DW modules, NoSQL DB platforms, MPP analytics platforms, programming languages, IDEs, data hubs, BI suite, analysis and visualisation, data analysis tool, data integration tool, startup (undefined)