WANdisco Non-Stop Hadoop: PHXDataConference Presentation Oct 2014

Non-Stop Hadoop: Adding R-A-S to your Hadoop clusters using a Globally Consistent HDFS Namespace Presented by Chris Almond @ Phoenix Data Conference October 2014

Upload: chris-almond

Post on 07-Jul-2015

DESCRIPTION

Hadoop has quickly evolved into the system of choice for storing and processing Big Data, and is now widely used to support mission-critical applications that operate within ‘data lake’ style infrastructures. A critical requirement of such applications is continuous operation even in the event of various system failures. This requirement has driven adoption of multi-data center Hadoop architectures, a.k.a. geo-distributed or global Hadoop. In this session we will provide a brief introduction to WANdisco, then dig into how our Non-Stop Hadoop solution addresses real-world use cases, and also show a live demonstration of Non-Stop NameNode operation across two WAN-connected Hadoop clusters.

TRANSCRIPT

Page 1: WANdisco Non-Stop Hadoop: PHXDataConference Presentation Oct 2014

Non-Stop Hadoop: Adding R-A-S to your Hadoop clusters using a Globally Consistent HDFS Namespace Presented by Chris Almond @ Phoenix Data Conference October 2014

Page 2: WANdisco Non-Stop Hadoop: PHXDataConference Presentation Oct 2014

2   WWW.WANDISCO.COMREALIZING THE POSSIBILITIES OF BIG DATA

For Today: Who am I and what is this about?

At Work: [email protected]

On line: www.linkedin.com/in/chrisalmond/

www.twitter.com/calmo

Session Description: Hadoop has quickly evolved into the system of choice for storing and processing Big Data, and is now widely used to support mission-critical applications that operate within ‘data lake’ style infrastructures. A critical requirement of such applications is continuous operation even in the event of various system failures. This requirement has driven adoption of multi-data center Hadoop architectures, a.k.a. geo-distributed or global Hadoop. In this session we will provide a brief introduction to WANdisco, then dig into how our Non-Stop Hadoop solution addresses real-world use cases, and also show a live demonstration of Non-Stop NameNode operation across two WAN-connected Hadoop clusters.

Page 3: WANdisco Non-Stop Hadoop: PHXDataConference Presentation Oct 2014

WANdisco Background

•  WANdisco: Wide Area Network Distributed Computing
   –  Enterprise-ready, high-availability software solutions that enable globally distributed organizations to meet today’s data challenges of secure storage, scalability and availability
•  Leader in tools for software engineers – Subversion
   –  Apache Software Foundation sponsor
•  Highly successful IPO, London Stock Exchange, June 2012 (LSE:WAND)
•  US patent for active-active replication technology granted, November 2012
•  Global locations
   –  San Ramon (CA)
   –  Chengdu (China)
   –  Tokyo (Japan)
   –  Boston (MA)
   –  Sheffield (UK)
   –  Belfast (UK)

Page 4: WANdisco Non-Stop Hadoop: PHXDataConference Presentation Oct 2014

Customers

Page 5: WANdisco Non-Stop Hadoop: PHXDataConference Presentation Oct 2014

Non-Stop Hadoop

Non-Intrusive Plugin

Provides Continuous Availability In the LAN / Across the WAN

Active/Active

Page 6: WANdisco Non-Stop Hadoop: PHXDataConference Presentation Oct 2014

Key Problem For Multi Cluster Hadoop LAN / WAN

Page 7: WANdisco Non-Stop Hadoop: PHXDataConference Presentation Oct 2014

Enterprise Ready Hadoop: Characteristics of Mission Critical Applications

•  Require Continuous Availability
   –  SLAs, Regulatory Compliance
•  Require HDFS to be Deployed Globally
   –  Share Data Between Data Centers
   –  Data is Consistent, Not Eventually Consistent
•  Ease Administrative Burden
   –  Reduce Operational Complexity
   –  Simplify Disaster Recovery
   –  Lower RTO/RPO
•  Allow Maximum Utilization of Resources
   –  Within the Data Center
   –  Across Data Centers

Page 8: WANdisco Non-Stop Hadoop: PHXDataConference Presentation Oct 2014

The difficulty realizing the data lake…

Page 9: WANdisco Non-Stop Hadoop: PHXDataConference Presentation Oct 2014

… is that data spans the entire world

Page 10: WANdisco Non-Stop Hadoop: PHXDataConference Presentation Oct 2014

Breaking Away from Active/Passive: What’s in a NameNode

Single Standby
•  Inefficient utilization of resources
   –  Journal Nodes
   –  ZooKeeper Nodes
   –  Standby Node
•  Performance Bottleneck
•  Still tied to the beeper
•  Limited to LAN scope

Active / Active
•  All resources utilized
   –  Only NameNode configuration
   –  Scale as the cluster grows
   –  All NameNodes active
•  Load balancing
•  Set resiliency (# of active NN)
•  Global Consistency

Page 11: WANdisco Non-Stop Hadoop: PHXDataConference Presentation Oct 2014

Breaking Away from Active/Passive: What’s in a Data Center

Standby Datacenter
•  Idle Resource
   –  Single Data Center Ingest
   –  Disaster Recovery Only
•  One-way synchronization
   –  DistCp
•  Error Prone
   –  Clusters can diverge over time
•  Difficult to scale > 2 Data Centers
   –  Complexity of sharing data increases

Active / Active
•  DR Resource Available
   –  Ingest at all Data Centers
   –  Run Jobs in both Data Centers
•  Replication is Multi-Directional
   –  active/active
•  Absolute Consistency
   –  Single HDFS spans locations
•  ‘N’ Data Center support
   –  Global HDFS allows appropriate data to be shared

Page 12: WANdisco Non-Stop Hadoop: PHXDataConference Presentation Oct 2014

Multiple Clusters

One Cluster Approach
•  Example Applications
   –  HBase
   –  RT Query
   –  MapReduce
•  Poor Resource Management
   –  Data Locality Issues
   –  Network Use
   –  Complex

Page 13: WANdisco Non-Stop Hadoop: PHXDataConference Presentation Oct 2014

Multiple Clusters

Creating Multiple Clusters
•  Example Applications
   –  HBase
   –  RT Query
   –  MapReduce
•  Need to share data between clusters
   –  DistCp / Stale Data
   –  Inefficient use of storage and/or network
   –  Some clusters may not be available

Page 14: WANdisco Non-Stop Hadoop: PHXDataConference Presentation Oct 2014

Cluster Zones: Zoning for Optimal Efficiency

[Diagram labels: 1 HDFS, 100% Consistency]

Page 15: WANdisco Non-Stop Hadoop: PHXDataConference Presentation Oct 2014

Multi Datacenter Hadoop: Disaster Recovery

WAN REPLICATION

•  Absolute Consistency
•  Maximum Resource Use
•  Lower Recovery Time/Point
•  Replicate Only What You Want
•  Better Utilization of Power/Cooling
•  Lower TCO
•  LAN Speed Performance

Page 16: WANdisco Non-Stop Hadoop: PHXDataConference Presentation Oct 2014

Technical Overview Hadoop Powered by WANdisco

Page 17: WANdisco Non-Stop Hadoop: PHXDataConference Presentation Oct 2014

Multi Data Center Hadoop Today: What's wrong with the status quo

•  Periodic Synchronization: DistCp
•  Parallel Data Ingest: Load Balancer, Streaming

Page 18: WANdisco Non-Stop Hadoop: PHXDataConference Presentation Oct 2014

Multi Data Center Hadoop Today: Hacks currently in use

Periodic Synchronization: DistCp
•  Runs as MapReduce
•  DR Data Center is read-only
•  Over time, Hadoop clusters become inconsistent
•  Manual and labor-intensive process to reconcile differences
•  Inefficient use of the network
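To put a number on how stale a periodically synchronized DR cluster can get, here is a rough back-of-the-envelope sketch of the worst-case data-loss window (RPO). It assumes each copy job captures the source as of its start time and that the data is only protected once the job completes. The function name and the sample figures are illustrative and do not come from the presentation.

```python
def worst_case_rpo_minutes(sync_interval_min: float, copy_duration_min: float) -> float:
    """Worst-case window of writes lost if the source cluster fails.

    Assumption: a copy job started at time t captures the source as of t and
    only protects that data once it completes at t + copy_duration. A write
    made just after t therefore waits for the next job, which finishes at
    t + sync_interval + copy_duration.
    """
    if sync_interval_min <= 0 or copy_duration_min < 0:
        raise ValueError("interval must be positive and duration non-negative")
    return sync_interval_min + copy_duration_min


# Illustrative numbers only: a copy every 6 hours that takes 45 minutes to run.
print(worst_case_rpo_minutes(6 * 60, 45))  # 405
```

Under these assumptions, shortening the interval helps only until the copy duration itself dominates the window.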

Page 19: WANdisco Non-Stop Hadoop: PHXDataConference Presentation Oct 2014

Multi Data Center Hadoop Today: Hacks currently in use

Parallel Data Ingest: Load Balancer, Flume
•  Hiccups in either Hadoop cluster cause the two file systems to diverge
•  Potential to run out of buffer when the WAN is down
•  Requires constant attention and sys-admin hours to keep running
•  Data created on the cluster is not replicated
•  Streaming technologies (like Flume) used for data redirection only handle streaming data
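The buffer concern above comes down to simple arithmetic: how long a fixed-size buffer can absorb ingest while the WAN link is down. The sketch below assumes a constant ingest rate and a buffer that starts empty; the function name and the numbers are hypothetical.

```python
def minutes_until_buffer_full(buffer_gb: float, ingest_mb_per_s: float) -> float:
    """Minutes a fixed-size buffer can absorb ingest during a WAN outage.

    Assumptions: the buffer starts empty and data arrives at a constant rate.
    """
    if buffer_gb < 0 or ingest_mb_per_s <= 0:
        raise ValueError("buffer must be non-negative and rate positive")
    buffer_mb = buffer_gb * 1024          # GB -> MB
    seconds = buffer_mb / ingest_mb_per_s
    return seconds / 60


# Hypothetical sizing: 500 GB of buffer, 50 MB/s of incoming data.
print(round(minutes_until_buffer_full(500, 50), 1))  # 170.7
```

Any outage longer than this window drops data or stalls ingest, depending on how the pipeline is configured.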

Page 20: WANdisco Non-Stop Hadoop: PHXDataConference Presentation Oct 2014

DConE: Distributed Coordination Engine

•  WANdisco’s patented, WAN-capable Paxos implementation
   –  Mathematically proven
   –  Provides distributed coordination of file system metadata
      •  Active/Active (All locations)
      •  Create, Modify, Delete
      •  Shared nothing (No Leader)
•  No restrictions on distance between datacenters
   –  US Patent granted for time-independent implementation of Paxos
•  Not based on SAN block device synchronization such as EMC SRDF
   –  SAN block replication has distance limits resulting from the inability of file systems such as NTFS and ext4 to tolerate long RTTs to block storage
   –  Possible distribution of corrupted blocks

Page 21: WANdisco Non-Stop Hadoop: PHXDataConference Presentation Oct 2014

DConE: Distributed Coordination Engine

PAXOS

Paxos is a family of protocols for solving consensus in a network of unreliable processors.

Consensus is the process of agreeing on one result among a group of participants.

This problem becomes difficult when the participants or their communication medium may experience failures.

Page 22: WANdisco Non-Stop Hadoop: PHXDataConference Presentation Oct 2014

DConE: Distributed Coordination Engine

PAXOS

Leslie Lamport: Any node that proposes after a decision has been reached must communicate with a node in the majority. The protocol guarantees that it will learn the previously agreed upon value from that majority.

http://research.microsoft.com/en-us/um/people/lamport/pubs/pubs.html

http://research.microsoft.com/en-us/um/people/lamport/pubs/lamport-paxos.pdf

http://css.csail.mit.edu/6.824/2014/papers/paxos-simple.pdf
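The property in the Lamport quote can be illustrated with a minimal single-decree Paxos sketch. This is generic textbook Paxos run in one process, not WANdisco's DConE implementation; the class and function names, the proposal numbers and the sample operations are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Acceptor:
    promised: int = -1                    # highest proposal number promised so far
    accepted_n: int = -1                  # number of the accepted proposal, if any
    accepted_value: Optional[str] = None
    up: bool = True                       # False simulates a failed node

    def prepare(self, n: int) -> Optional[Tuple[int, Optional[str]]]:
        """Phase 1: promise to ignore proposals numbered below n."""
        if not self.up or n <= self.promised:
            return None
        self.promised = n
        return (self.accepted_n, self.accepted_value)

    def accept(self, n: int, value: str) -> bool:
        """Phase 2: accept unless a higher-numbered promise was already made."""
        if not self.up or n < self.promised:
            return False
        self.promised = n
        self.accepted_n = n
        self.accepted_value = value
        return True


def propose(acceptors: List[Acceptor], n: int, value: str) -> Optional[str]:
    """Run one proposal round; return the chosen value, or None without a majority."""
    majority = len(acceptors) // 2 + 1
    promises = [p for p in (a.prepare(n) for a in acceptors) if p is not None]
    if len(promises) < majority:
        return None
    # If any acceptor in the majority already accepted a value, adopt the one
    # with the highest proposal number instead of our own value.
    prev_n, prev_value = max(promises, key=lambda p: p[0])
    chosen = prev_value if prev_n >= 0 else value
    acks = sum(a.accept(n, chosen) for a in acceptors)
    return chosen if acks >= majority else None


acceptors = [Acceptor() for _ in range(3)]
first = propose(acceptors, 1, "create /data/a")
acceptors[0].up = False                   # one node fails; a majority remains
second = propose(acceptors, 2, "delete /data/a")
print(first, "|", second)                 # create /data/a | create /data/a
```

The second proposer wanted a different value, but because it had to talk to a majority it learned the value already chosen and re-proposed that one.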

Page 23: WANdisco Non-Stop Hadoop: PHXDataConference Presentation Oct 2014

DConE: Distributed Coordination Engine

PAXOS

“Contrary to conventional wisdom, we were able to use Paxos to build a highly available system that provides reasonable latencies for interactive applications while synchronously replicating writes across geographically distributed datacenters.” http://www.cidrdb.org/cidr2011/Papers/CIDR11_Paper32.pdf …

Page 24: WANdisco Non-Stop Hadoop: PHXDataConference Presentation Oct 2014

How DConE Works: WANdisco Active/Active Replication

•  Majority Quorum
   –  A fixed number of participants
   –  The majority must agree for a change
•  Failure
   –  Failed nodes are unavailable
   –  Normal operation continues on nodes with quorum
•  Recovery / Self Healing
   –  Nodes that rejoin stay in safe mode until they are caught up
•  Disaster Recovery
   –  A complete loss can be brought back from another replica

[Diagram: nodes A, B and C each hold an identical ordered transaction log (TX id: 168 to 173); labels show Proposal and Agree messages for transactions 170 to 173]
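The quorum, failure and recovery behavior listed on this slide can be sketched as a toy simulation. It is a simplified single-process model: a shared `agreed` list stands in for the real agreement protocol, and the `Node` and `Cluster` classes are hypothetical names, not part of any WANdisco API.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.log = []            # transaction ids applied locally, in agreed order
        self.up = True
        self.safe_mode = False   # True while a rejoined node is still catching up

    def can_vote(self):
        return self.up and not self.safe_mode


class Cluster:
    def __init__(self, names):
        self.nodes = {n: Node(n) for n in names}
        self.agreed = []         # stand-in for the globally agreed sequence

    def has_quorum(self):
        voters = sum(1 for n in self.nodes.values() if n.can_vote())
        return voters >= len(self.nodes) // 2 + 1

    def propose(self, tx_id):
        """Agree on tx_id if a majority can vote; apply it on every voting node."""
        if not self.has_quorum():
            return False
        self.agreed.append(tx_id)
        for n in self.nodes.values():
            if n.can_vote():
                n.log.append(tx_id)
        return True

    def fail(self, name):
        self.nodes[name].up = False

    def rejoin(self, name):
        node = self.nodes[name]
        node.up = True
        node.safe_mode = True    # back online, but not serving until caught up

    def catch_up(self, name):
        node = self.nodes[name]
        node.log.extend(self.agreed[len(node.log):])
        node.safe_mode = False


cluster = Cluster(["A", "B", "C"])
for tx in (168, 169):
    cluster.propose(tx)
cluster.fail("C")
for tx in (170, 171):
    cluster.propose(tx)          # still succeeds: A and B form a majority
cluster.rejoin("C")              # C is back but in safe mode
cluster.propose(172)             # C does not apply this yet
cluster.catch_up("C")
print(cluster.nodes["C"].log)    # [168, 169, 170, 171, 172]
```

In this model, losing two of the three nodes makes `propose` return `False`, which mirrors the requirement that a majority must agree before any change is applied.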

Page 25: WANdisco Non-Stop Hadoop: PHXDataConference Presentation Oct 2014

Architecture of a Non-Stop Hadoop

Page 26: WANdisco Non-Stop Hadoop: PHXDataConference Presentation Oct 2014

Use Case: Disaster Recovery Use Cases

•  Data is as current as possible (no periodic syncs)
•  Doesn’t require monitoring and consistency checking
•  Virtually zero downtime to recover from regional data center failure
•  Regulatory compliance

Page 27: WANdisco Non-Stop Hadoop: PHXDataConference Presentation Oct 2014

Use Case: Multi Data-Center Ingest and multi-tenant workloads

•  Ingest and analyze anywhere
•  Analyze Everywhere
   –  Fraud Detection
   –  Equity Trading Information
   –  New Business
   –  Etc…
•  Backup Datacenter(s) can be used for work
   –  No idle resource

Page 28: WANdisco Non-Stop Hadoop: PHXDataConference Presentation Oct 2014

Use Case: Zones

•  Maximize Resource Utilization
   –  No idle standby
•  Isolate Dev and Test Clusters
   –  Share data, not resources
•  Carve off hardware for a specific group
   –  Prevents a bad MapReduce job from bringing down the cluster
•  Guarantee consistency and availability of data
   –  Data is instantly available

Page 29: WANdisco Non-Stop Hadoop: PHXDataConference Presentation Oct 2014

Use Case: Heterogeneous Hardware (Zones)
In-memory analytics

•  Mixed Hardware Profiles
   –  Memory, Disk, CPU
   –  Isolate memory-hungry processing (Storm/Spark) from regular jobs
•  Share data, not processing
   –  Isolate lower priority (dev/test) work

Page 30: WANdisco Non-Stop Hadoop: PHXDataConference Presentation Oct 2014

Data Reservoir Use Cases

[Diagram labels: Data Ocean, Feeder Site, Accounting Mart, Banking Mart]

•  Data Marts
   –  Restrict access to relevant data
   –  Create Quick Clusters
•  Feeder Sites (Data Tributaries)
   –  Ingest Only

Page 31: WANdisco Non-Stop Hadoop: PHXDataConference Presentation Oct 2014

Regulatory Compliance

•  Basel III
   –  Consistency of Data
•  Data Privacy Directive
   –  Data Sovereignty
      •  Data doesn’t leave country of origin

[Diagram labels: Compliance, Regulation, Guidelines]
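One way to picture the data sovereignty requirement is as a path-based rule that decides which data centers may hold a given part of the namespace. The sketch below is purely illustrative: the policy format, the paths and the site names are hypothetical, and it does not reflect how Non-Stop Hadoop is actually configured.

```python
from pathlib import PurePosixPath

# Hypothetical policy: HDFS path prefix -> data centers allowed to hold the data.
POLICY = {
    "/data/eu/customers": {"frankfurt"},                 # must stay in country
    "/data/shared": {"frankfurt", "phoenix", "tokyo"},   # replicated everywhere
}


def replication_targets(path, policy, default=frozenset()):
    """Return the allowed data centers for the longest matching path prefix."""
    p = PurePosixPath(path)
    best = None
    for prefix, sites in policy.items():
        pre = PurePosixPath(prefix)
        if p == pre or pre in p.parents:
            if best is None or len(pre.parts) > len(best[0].parts):
                best = (pre, sites)
    return set(best[1]) if best else set(default)


print(sorted(replication_targets("/data/eu/customers/2014/a.csv", POLICY)))  # ['frankfurt']
```

Paths that match no prefix fall back to `default`, which is empty here, so in this sketch nothing is replicated unless a rule says so.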

Page 32: WANdisco Non-Stop Hadoop: PHXDataConference Presentation Oct 2014

5 Reasons your Hadoop Deployment Needs WANdisco

Page 33: WANdisco Non-Stop Hadoop: PHXDataConference Presentation Oct 2014

Non-Stop Hadoop Demonstration