Realising Value from Data - British Computer Society

BCS Open Source SIG | London | 1 May 2013
Open Source Drives Innovation & Adoption in Big Data
Realising Value from Data Together with



Timings

6:00 - 6:30pm, Register / Refreshments
6:30 - 8:00pm, Presentation Session
8:00 - 8:30pm, Networking

Objectives
• What is Big Data?
• Evolution of Open Source Hadoop and its influence on the Big Data phenomenon
• The Open Source ecosystem around Big Data
• What is the importance of Open Source for Hadoop?
• What business challenges does Hadoop address?
• What does a Hadoop architecture look like?
• How have MapR built a unique offering on top of Hadoop?

© 2013 Onepoint IQ Limited | London


Introduction

Big Data & Apache Hadoop

MapR / Drill Demo

Summary & Further Learning


About Onepoint IQ
Onepoint IQ empowers individuals and organisations to discover and deliver real value from big (and small) data.


About MapR
One Platform for Big Data

Batch | Interactive | Real-time
MapReduce | File-Based Applications | SQL | Database | Search | Stream Processing
99.999% HA | Data Protection | Disaster Recovery | Scalability & Performance | Enterprise Integration | Multi-tenancy

Broad range of applications: Recommendation Engines, Fraud Detection, Billing, Logistics, Risk Modeling, Market Segmentation, Inventory Forecasting


Introduction – Onepoint IQ & MapR
Presentation Team Today

Michael Hausenblas | Chief Data Engineer - EMEA
Shashin Shah | Technology Zen Master | Founding Dir


Big Data & Apache Hadoop


Big Data Definitions
Although there is not a universally accepted definition for ‘Big Data’, all acknowledge the technology advances that are now available to handle data (big and small).

“Datasets whose size is beyond the ability of typical database software tools to capture, store, manage, and analyse.”

“Big data technologies describe a new generation of technologies and architectures, designed to economically extract value from very large volumes of a wide variety of data, by enabling high-velocity capture, discovery, and/or analysis.”

“Big data is data that exceeds the processing capacity of conventional database systems. The data is too big, moves too fast, or doesn't fit the strictures of your database architectures. To gain value from this data, you must choose an alternative way to process it.”


Business Needs Spur Innovation
Web search engines were among the first to confront the ‘Big Data’ problem. Today, social networks, mobile phones, sensors and science contribute to petabytes of data created daily.

• advertising optimisation
• mail anti-spam
• video & audio processing
• ad selection
• web search
• user interest prediction
• customer trend analysis
• analysing web logs
• content optimisation
• data analytics
• machine learning
• data mining
• text mining
• social media

Source: Hortonworks, Apache Lucene Eurocon, Barcelona, 2011


Big Data Advances
The technology advances around ‘Big Data’ can be broadly grouped into two categories.

• Data management advances: big, fast data
• Analytics advances: big, better analytics


Traditional vs. Big Data – Data Management Advances
The advances do not replace existing enterprise data warehousing (EDW), business intelligence (BI), or analytics approaches – they instead enhance and extend them.

Traditional decision-making environment | Big data innovations and extensions
• Integrated data sources | • Virtualised and blended data sources
• Structured data | • Unstructured, semi-structured, multi-structured data
• Aggregated and granular data (with limitations) | • Large volumes of granular data (without limits)
• Relational EDW with at-rest data | • Non-relational EDW with at-rest data
• Dimensional cubes / marts with at-rest data | • Streaming systems with in-motion data
• One-size-fits-all data management | • Flexible and optimised data management


Traditional vs. Big Data – Analytics Advances
The advances do not replace existing enterprise data warehousing (EDW), business intelligence (BI), or analytics approaches – they instead enhance and extend them.

Traditional decision-making environment | Big data innovations and extensions
• Reporting and OLAP | • Advanced analytic functions and predictive models
• Dashboards and scorecards | • Sophisticated visualisation of large data sets
• Structured navigation (drill down / up; slice / dice) | • Flexible exploration of large data sets
• Humans interpret results, patterns and trends | • Sophisticated trend and pattern analysis through machine learning
• Manual analyses, decisions, actions | • Model / rules-driven decisions & actions


Our View
‘Big Data’ is data that becomes large enough that it cannot be processed using conventional methods. A fantastic array of advances aim to address this challenge.

“True business value comes from:
• properly choosing and applying
• appropriate ‘Big Data’ technology advances to
• specific data challenges (or opportunities) of an organisation
• whether big or small.”


Use Case Clusters (or Deployment Patterns)
To speed up the discovery of benefits and value from Big (and small) Data, we group the nearly endless permutations of Big Data use cases into a few key clusters.

1. Data integration hub
2. Analytics accelerator
3. Real-time monitoring & analytics
4. Near real-time analytics
5. Investigative computing
6. New markets enabler

These clusters are not mutually exclusive. A given client use case is likely to be a combination of these clusters. For example, a data integration hub (1) may be a prerequisite to investigative computing (5).


Big Data – A Brief History
Big Data technologies have developed over time to meet the increasing need to process large data volumes.

Timeline (axis marks: 2002, 2007, 2012)
• Open source web crawler project created by Doug Cutting
• Publishes MapReduce and GFS Paper
• Open Source MapReduce and HDFS project created by Doug Cutting
• Runs 4,000-node Hadoop cluster
• Hadoop wins Terabyte sort benchmark
• Launches SQL support for Hadoop
• Other Hadoop Distributions
• Cloudera | MapR / Hortonworks


Growth of Hadoop - Examples

Source: http://hadoopblog.blogspot.co.uk/2010/05/facebook-has-worlds-largest-hadoop.html

Source: http://nosql.mypopescu.com/post/42441081929/hadoop-at-yahoo-2013-update

2009: 10 & 28 node clusters
2010: Hundreds node cluster / multi PB
2011: Thousands node cluster(s) / 10s PB

Source: eBay’s Hadoop Stack: Evolution and Revolution, Juhan Lee, eBay


“There’s this wonderful technology at Google. I would love to be able to use it but I can’t because I don’t work at Google. There are probably a lot of other people who feel that same way, and open source is a great way to get technology to everyone.”

I’ve always loved open source because it’s such a tremendous lever.

What I look for is a way to find the smallest thing I can do, with the least amount of work that will have the most impact. Where is the leverage point? Hadoop came out of that. We needed to do some vast computing, but I also saw a lot of other workloads that could benefit from this.

- Doug Cutting, Director Apache Software Foundation (2006)

Source: Forbes http://www.forbes.com/sites/netapp/2013/01/16/big-data-hadoop-doug-cutting/


Big Data Advances
Real value comes from taking advantage of these advances to develop data products and services.

• Data management advances: big, fast data
• Analytics advances: big, better analytics
• Data Services & Apps: real business value


Emerging Big Data Reference Stack

• As the foundational layer in the big data stack, the cloud provides the scalable persistence and compute power needed to manufacture data products.
• At the middle layer of the big data stack is analytics, where features are extracted from data, and fed into classification and prediction algorithms.
• Finally, at the top of the stack are services and applications. This is the level at which consumers experience a data product, whether it be a music recommendation or a traffic route prediction.

Source: O’Reilly Strata

Stack layers: Big, fast data | Big, better analytics | Data Services & Apps


Core Hadoop Architecture
Highly scalable, distributed, fault-tolerant architecture running on off-the-shelf hardware; two core components – HDFS & MapReduce
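
To make the MapReduce half of that pairing concrete, here is a minimal sketch of the programming model in plain Python. It is not Hadoop API code: a list of strings stands in for data blocks that HDFS would hold, everything runs in one process, and the names map_phase, shuffle, reduce_phase and word_count are illustrative assumptions rather than anything from the slides.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(record: str) -> Iterator[Tuple[str, int]]:
    """Map step: emit a (word, 1) pair for every word in one input record."""
    for word in record.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle step: group all emitted values by key before reducing."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce step: collapse the values for one key into a single total."""
    return key, sum(values)


def word_count(records: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle and reduce in sequence over the input records."""
    mapped = (pair for record in records for pair in map_phase(record))
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Hypothetical input: each string stands in for one block of a larger file.
blocks = ["big data big value", "open source data"]
print(word_count(blocks))
```

In a real cluster the map calls would run in parallel on the nodes that hold each block and the shuffle would move data across the network; the sketch only shows the shape of the computation.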


Hadoop within existing Enterprise Architecture
We see Hadoop as complementary to current Enterprise Architecture to extract value from newer data sources, both structured and unstructured.

Data Analytics & Applications
Data Storage & Management


A Simplified Comparison

The ‘Old’ way:
• Store some of the data.
• Process and analyze some of the data.
• Set up specific schemas and queries.
• Huge effort when schemas have to change.

The ‘Big Data’ way:
• Store all the data you want.
• Process and analyze all your data.
• Ask new questions for further analysis.
• Ask more questions
• Get answers faster
• Get clearer insight
• Make better business decisions

Diagram labels: CRM | Billing | Finance | Data ETL | Data Quality | Normalized Data ODS or Traditional Data Warehouse
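
One way to picture the contrast above is the pattern often called schema-on-read: raw records are stored as they arrive, and structure is applied only when a question is asked. The sketch below is an assumption-laden illustration in plain Python, not a description of any product shown in the talk; the RAW_EVENTS records and the function names are hypothetical.

```python
import json
from typing import Dict, Iterable, Iterator, List

# Hypothetical raw records kept exactly as received; fields vary per source.
RAW_EVENTS = [
    '{"source": "crm", "customer": "c1", "region": "UK"}',
    '{"source": "billing", "customer": "c1", "amount": 120.0}',
    '{"source": "billing", "customer": "c2", "amount": 80.5, "currency": "GBP"}',
]


def read_events(lines: Iterable[str]) -> Iterator[dict]:
    """Parse each stored line on demand; no schema is enforced at load time."""
    for line in lines:
        yield json.loads(line)


def total_billed(lines: Iterable[str]) -> Dict[str, float]:
    """First question: how much has each customer been billed?"""
    totals: Dict[str, float] = {}
    for event in read_events(lines):
        if event.get("source") == "billing":
            customer = event["customer"]
            totals[customer] = totals.get(customer, 0.0) + event["amount"]
    return totals


def customers_by_region(lines: Iterable[str]) -> Dict[str, List[str]]:
    """A later question over the same raw data, with no reload or schema change."""
    regions: Dict[str, List[str]] = {}
    for event in read_events(lines):
        if "region" in event:
            regions.setdefault(event["region"], []).append(event["customer"])
    return regions


print(total_billed(RAW_EVENTS))
print(customers_by_region(RAW_EVENTS))
```

The extra "currency" field on the third record does not break either query, which is the point of deferring the schema: new fields and new questions do not require reshaping what is already stored.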


Evolving Big Data Landscape


Source: Bloomberg Ventures (Matt Turck @mattturck & Shivon Zilis @shivonz)


Ecosystem
Our practical experience with an ever-evolving ecosystem and our vendor independence help us to quickly define a tailored solution that is ‘right’ for our clients.

Practical experience with:
• Hadoop infrastructure
• Big data management
• NoSQL & other databases
• Big data search
• Analytic discovery
• Analytic visualisation
• Cloud platforms
• Systems integrators
• Business consultancies
• Sector & point specialists

(This is a highly simplified characterisation of the landscape, as solution stacks often cut across multiple categories.)


MapR / Drill Demo


Summary & Further Learning


Summary / Wrap Up

• There is more to Big Data than the hype
• Many of the advances are powered by Open Source
• The ecosystem around Big Data is evolving rapidly
• Most organisations can experiment with Big (and small) Data
• MapR have built a unique offering on top of Open Source


Further Learning
Learning can take many shapes, depending on organisational and individual needs.

Technical Programme | Business Programme | Executive Programme

• Technical Briefing: 1-2 hours
• Ecosystem Workshops: 1-2 days
• Administrator Workshops: 3-5 days
• Developer Workshops: 3-5 days
• Business Briefing: 45-90 mins
• Ideation Workshop: 2-3 hours
• Discovery Workshop: ½-1 day
• Executive Briefing: 45-90 mins
• ‘CXO’ Workshop: 3-4 hours


Cheatsheet & Mythbuster


Thank you


Shashin Shah | Technology Zen Master | Founder Director | [email protected]
Kulasingam | Chief Stratnologist | Founder Director | [email protected]
(Sasha) Polev | (The Data Professor) Chief Architect | [email protected]
Michael Hausenblas | Chief Data Engineer EMEA at MapR Technologies | [email protected]