big data advance topics - part 2.pptx

© 2016 Ness SES. All Rights Reserved. BIG DATA advanced topics: Cloudera vs Hortonworks. MOLDOVAN Radu Adrian, Timisoara, May 2016

Upload: moldovan-radu-adrian

Post on 16-Apr-2017


Page 1: Big data   advance topics - part 2.pptx


BIG DATA advanced topics

Cloudera vs Hortonworks

MOLDOVAN Radu Adrian, Timisoara, May 2016

Page 2: Big data   advance topics - part 2.pptx


Who am I? :)

❏ passionate about technology

❏ 20 years of programming using open source

❏ last 4 years in Big Data

❏ Big Data Architect @

Page 3: Big data   advance topics - part 2.pptx


Page 4: Big data   advance topics - part 2.pptx


Cloudera and Hortonworks: The Similarities

- built on top of Apache Hadoop

- both are mature and offer security features

- provide paid consulting, training and services

- strong development communities

- master-slave architecture

- support MapReduce

- YARN as resource manager

- reduce deployment time


Page 5: Big data   advance topics - part 2.pptx


Cloudera and Hortonworks: The Differences

Cloudera:

- commercial license (free 60-day trial)

- repositioned as an “enterprise data hub”

- founded in 2008 by alumni of Facebook, Google, Oracle and Yahoo

- 400+ customers

- funding: $1.04B

Hortonworks:

- open source license, completely free

- positioned as a Hadoop distro

- has no proprietary software

- founded in 2011

- Teradata, Yahoo & Microsoft

- funding: $248M

https://www.crunchbase.com

Page 6: Big data   advance topics - part 2.pptx


Security Solutions

http://www.forbes.com/sites/gilpress/2016/03/14/top-10-hot-big-data-technologies/#7cd07887f26a

Hortonworks: Apache Ranger, Apache Knox, Apache Falcon

Cloudera: Project Rhino, Project Sentry

Page 7: Big data   advance topics - part 2.pptx


Cluster ecosystem - VISUALIZE

Stages: COLLECT, PROCESS, STORE, VISUALIZE

(C = Cloudera, H = Hortonworks)

- HADOOP (HDFS) (C+H)

- Res. Manager: Yarn (C+H)

- Data Loader: Sqoop (C+H)

- Data Aggregation: Flume (C+H)

- Msg Brokers + Streams: Kafka (C+H)

- Data Streaming: Storm (H), Spark Streaming (C+H)

- MapReduce: PIG (C+H)

- In Memory: Spark (C+H)

- Tez (H)

- Machine Learning: Spark ML (C+H), Mahout (H)

- Search Engines: SolrCloud (C+H)

- Analytics: Impala (C)

- Warehouse DB: Presto (H), HIVE (C+H)

- Columnar Store: Accumulo (C+H), HBase (C+H)

- Data Governance: Atlas (H)

- Interactive Reporting: Tableau, Logi, Jasper Reports, D3, Pentaho*, Crystal Reports

Page 8: Big data   advance topics - part 2.pptx


Cloudera

Page 9: Big data   advance topics - part 2.pptx


Cloudera Management Service

Page 10: Big data   advance topics - part 2.pptx


Hortonworks

Page 11: Big data   advance topics - part 2.pptx


Trends - Forbes report Q1 2016

http://www.forbes.com/sites/gilpress/2016/03/14/top-10-hot-big-data-technologies/#7cd07887f26a

Page 12: Big data   advance topics - part 2.pptx


Big Data - Buzz words #TAGs

FAULT TOLERANCE

DATA LOCALITY

LAMBDA ARCHITECTURE

CRUD => CRUD

SHARDING

REPLICATION

RESILIENT SYSTEMS

DISRUPTIVE TECHNOLOGIES

Cloud Computing

Internet of Things

Data Analytics

Page 13: Big data   advance topics - part 2.pptx


Thank you!

Skype: r.moldovan