Post on 21-Jan-2018

© Cloudera, Inc. All rights reserved.

Security Implementation on Hadoop

Dr. Wei-Chiu Chuang | Software Engineer

$ whoami

Software Engineer, Cloudera

Apache Hadoop Committer/PMC

Unguarded data stores are the victims

Regulatory Compliance

Organizations can be fined up to 4% of annual global turnover or €20 million, whichever is greater, for breaching GDPR

Security Implementation

Disclaimer

This talk serves as a general guideline for security implementation on Hadoop.

The actual implementation procedures and scope of implementation vary on a case-by-case basis, and should be assessed by Cloudera’s Professional Services team or certified Cloudera SI Partners.

Non-secure #0

Data Free for All

[Diagram components: Firewall, ActiveDirectory/KDC, Hadoop cluster, Cloudera Manager, Gateway node, Cloudera Navigator, Datacenter, Applications]

High Availability made Easy

Identity Management

Simple Authentication

File group ownership

• AD integration

• SSSD or Centrify

Consideration in large enterprises

[Diagram label: via SSSD]

System Diagram #0

[Diagram components: Firewall; ActiveDirectory; Master ×2; Worker ×3; Cloudera Manager; (SSSD/Centrify)]

Simple authentication = no authentication

Minimal Security #1

Reduce Risk Exposure

Kerberos

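Strong Authentication

KDC

• MIT

• ActiveDirectory (more common)

[Diagram: a user authenticates to the KDC of realm EXAMPLE.COM and accesses Hadoop as user@EXAMPLE.COM. In the principal user@EXAMPLE.COM, "user" is the primary and "EXAMPLE.COM" is the realm.]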
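The primary/realm split shown on this slide can be sketched in a few lines of shell. This is only an illustration: the function names are hypothetical helpers rather than part of any Kerberos tool, and the service principal with a host instance is an assumed example.

```shell
#!/bin/sh
# Split a Kerberos principal of the form primary[/instance]@REALM.
# principal_realm and principal_primary are hypothetical helper names.

principal_realm() {
    # Everything after the last "@" is the realm.
    printf '%s\n' "${1##*@}"
}

principal_primary() {
    # Drop the "@REALM" suffix, then drop an optional "/instance" part.
    name="${1%@*}"
    printf '%s\n' "${name%%/*}"
}

principal_primary "user@EXAMPLE.COM"                    # prints: user
principal_realm   "user@EXAMPLE.COM"                    # prints: EXAMPLE.COM
# Assumed example of a service principal with a host instance:
principal_primary "hdfs/nn1.example.com@EXAMPLE.COM"    # prints: hdfs
```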

Kerberos

Considerations in large corporations

Time synchronization

CM Kerberos Wizard

• Configure AD to create a Kerberos principal for the CM server, and delegate to CM the ability to create and manage Kerberos principals
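Time synchronization matters because Kerberos rejects requests when the client and KDC clocks drift too far apart. The sketch below assumes the common default tolerance of 300 seconds. The function name is hypothetical, and the second timestamp is simulated because how you obtain the KDC's time depends on your environment.

```shell
#!/bin/sh
# Succeed when two Unix timestamps differ by at most max_skew seconds.
# 300 seconds is the common Kerberos default; match it to your KDC policy.
# within_clock_skew is a hypothetical helper name.
within_clock_skew() {
    local_ts=$1
    kdc_ts=$2
    max_skew=${3:-300}
    diff=$((local_ts - kdc_ts))
    if [ "$diff" -lt 0 ]; then
        diff=$((0 - diff))
    fi
    [ "$diff" -le "$max_skew" ]
}

now=$(date +%s)
# Simulated KDC time two minutes behind the local clock (an assumption).
if within_clock_skew "$now" "$((now - 120))"; then
    echo "clock skew OK"
else
    echo "clock skew too large; check NTP/chrony"
fi
```

In practice, running NTP or chrony on every cluster node and on the KDC keeps the skew well inside the tolerance.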

LDAP Authentication

* LDAP over SSL

Authorization/Access Control

Access Control Lists (ACLs)

• HDFS file ACLs

• YARN job submission

• HBase ACLs

• Oozie ACLs

Sentry-managed role-based access control (RBAC)

• Hive

• Impala
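As an illustration of the HDFS file ACLs listed above, the sketch below wraps the standard `hdfs dfs -setfacl` command. The wrapper name, the `DRY_RUN` switch, the user `alice`, and the path `/data/sales` are all hypothetical. It also assumes ACL support is enabled on the NameNode (`dfs.namenode.acls.enabled`).

```shell
#!/bin/sh
# Grant one user read/execute access to an HDFS path through an ACL entry.
# grant_read_acl and DRY_RUN are hypothetical; user and path are examples.
# With DRY_RUN=1 the command is printed instead of executed.
grant_read_acl() {
    user=$1
    path=$2
    set -- hdfs dfs -setfacl -m "user:${user}:r-x" "$path"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

DRY_RUN=1 grant_read_acl alice /data/sales
# prints: hdfs dfs -setfacl -m user:alice:r-x /data/sales
```

After a real run, `hdfs dfs -getfacl /data/sales` lists the resulting entries.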

Auditing

Backup/Disaster Recovery

Cloudera Backup/Disaster Recovery (BDR)

• A high-performance data replicator

• Copies incremental changes from the source cluster on a specified schedule

Supports

• Kerberos

• Data encryption

• HDFS replication to cloud

Kerberized BDR Best Practice

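[Diagram: Cloudera BDR replicates from the Production cluster (realm PROD.EXAMPLE.COM, with its own KDC) to the DR cluster (realm DR.EXAMPLE.COM, with its own KDC). A cross-realm trust links the two KDCs.]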

System Diagram #1

[Diagram components: Firewall; ActiveDirectory/KDC; Master ×2; Worker ×3; Cloudera Manager; Kerberos; (SSSD/Centrify); DR]

More Security #2

Managed, Secure, Protected

Data In-Transit Encryption

RPC encryption

Data transport encryption

• Supports AES-CTR with key lengths up to 256 bits

HTTP TLS/SSL encryption

• No self-signed certificates in production

[Diagram: RPC encryption, transport encryption, and TLS/SSL on the connections among the Application, the Master nodes, and the Worker nodes]

Data At-Rest Encryption

Transparent encryption

Supports any Hadoop application

Encryption Zone

$ hadoop key create mykey

$ hadoop fs -mkdir /zone

$ hdfs crypto -createZone -keyName mykey -path /zone

[Diagram: directory tree rooted at / containing /tmp and /zone; the files foo and bar sit inside the encryption zone]

Key Management Server Deployment (non-prod)

[Diagram components: HDFS NameNode; Client; Java Keystore KMS; keystore file]

Separation of duties

• The encryption zone key (EZK) is stored in the KMS server

• The HDFS superuser cannot decrypt files

Key Management Server/Key Trustee Server Deployment

[Diagram components: HDFS NameNode; Client; Key Trustee KMS ×2 (or more); Firewall; Key Trustee Server (Active) and Key Trustee Server (Passive), linked by synchronization]

KMS+KTS+HSM Deployment

[Diagram components: HDFS NameNode; Client; HSM KMS ×2; Firewall; Key Trustee Server (Active) and Key Trustee Server (Passive), linked by synchronization; Key HSM ×2; HSM ×2; label "(or more)"]

Encryption Performance

Troubleshooting: Encryption Performance Anomaly

• Configuration

• AES-NI Hardware acceleration

• OpenSSL library

• Entropy
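The checklist above can be turned into a quick host-level check. The sketch below is an assumption about how one might do it on a Linux node, not a Cloudera procedure. The function names are hypothetical, while the file paths and `hadoop checknative` are standard.

```shell
#!/bin/sh
# Quick checks for common causes of slow HDFS encryption on a Linux host.
# has_aesni and entropy_available are hypothetical helper names.

has_aesni() {
    # The "aes" CPU flag indicates AES-NI hardware acceleration.
    grep -qw aes "${1:-/proc/cpuinfo}"
}

entropy_available() {
    # Current size of the kernel entropy pool, in bits.
    cat "${1:-/proc/sys/kernel/random/entropy_avail}"
}

if [ -r /proc/cpuinfo ]; then
    if has_aesni; then echo "AES-NI: present"; else echo "AES-NI: missing"; fi
fi

if [ -r /proc/sys/kernel/random/entropy_avail ]; then
    echo "entropy_avail: $(entropy_available)"
fi

# Report whether Hadoop can load its native libraries, including OpenSSL.
if command -v hadoop >/dev/null 2>&1; then
    hadoop checknative -a || true
fi
```

Each check is guarded, so the script skips whatever is not available on the host instead of failing.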

Fine-Grained Access Control with Apache Sentry

System Diagram #2

[Diagram components: Firewall; ActiveDirectory/KDC; Master ×2; Worker ×3; Cloudera Manager; Kerberos; KMS ×2; a second Firewall; Key Trustee ×2; (SSSD/Centrify)]

Most Security #3

Secure Data Vault

Data Redaction

Personally Identifiable Information (PII)

• PCI-DSS, HIPAA

Best practice

Passwords

• Store in credential files, not in configuration

Logs and queries

• Cloudera Manager
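Redacting logs and queries usually comes down to pattern-based masking. The sketch below illustrates the idea with `sed`. The card-number pattern, the function name, and the sample query are assumptions for illustration and not Cloudera Manager's actual redaction rules.

```shell
#!/bin/sh
# Mask 16-digit card numbers written as four hyphen-separated groups,
# keeping only the last group. Pattern and function name are illustrative.
redact_card_numbers() {
    sed -E 's/[0-9]{4}-[0-9]{4}-[0-9]{4}-([0-9]{4})/XXXX-XXXX-XXXX-\1/g'
}

echo "SELECT * FROM payments WHERE card = '4111-1111-1111-1111'" | redact_card_numbers
# prints: SELECT * FROM payments WHERE card = 'XXXX-XXXX-XXXX-1111'
```

A production rule set would need to cover other formats (spaces, no separators) and other identifiers.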

Full Encryption

Encrypt Data Spills

• MapReduce

• Impala

• Hive

• Flume

OS-level encryption

• Navigator Encrypt

Security Vulnerabilities

Vulnerability Response and Process

[Diagram: vulnerability reports (upstream, internal, external) lead to a fix, which is then published as a CVE and a Cloudera TSB]

Cloudera Certified Technology

Cloudera Certified Technology Partners

[Diagram: partner categories: Data Sources (Connected Machines/Data Sources, Other Data Sources); Data Ingest; Process, Refine & Prep; Data Discovery; Advanced Analytics]

Certification ensures that a product integrates with a secure cluster

Authentication

• Authenticates via Kerberos or LDAP

Authorization

• Works with Apache Sentry authorization for Hive, Impala, Search, and HDFS

Encryption

• Supports HDFS transport encryption, at-rest encryption, and SSL/TLS connection encryption

Cloudera SDX

Cloudera Enterprise

The modern platform for machine learning and analytics optimized for the cloud

[Diagram labels: Extensible Services; Core Services; Data Engineering; Operational Database; Analytic Database; Data Science; Data Catalog; Ingest & Replication; Security; Governance; Workload Management; Storage Services: S3, ADLS, HDFS, Kudu]

• Unified security – protects sensitive data with consistent controls, even for transient and recurring workloads

• Consistent governance – enables secure self-service access to all relevant data and increases compliance

• Easy workload management – increases user productivity and boosts job predictability

• Flexible ingest and replication – aggregates a single copy of all data, provides disaster recovery, and eases migration

• Shared catalog – defines and preserves structure and business context of data for new applications and partner solutions

Open platform services

Built for multi-function analytics | Optimized for cloud

Successful use cases

Cloudera Overview & Financial Services Focus

2000+ Strong Partner Ecosystem

1600+ Employees Globally

Strong Focus & Momentum in Financial Services

19 of the 30 G-SIBs Run on Cloudera

3 of the Fortune 500 Top 5 Insurers Run on Cloudera

5 of the Top 6 Asset Management Firms Run on Cloudera

200+ Financial Services Customers

Building a Fantastic Customer Experience

• Improved customer experience

• 80 percent reduction in operating costs through a wide range of customer service and operational improvements

• Decreased cost to serve customers while increasing revenue through better service

CUSTOMER 360

FINANCIAL SERVICES » PREDICTIVE ANALYTICS » 360 CUSTOMER VIEW » OPERATIONAL ANALYTICS

Large healthcare provider enables practitioners to recommend at-home actions to prevent hospital visits

• Flexible, automatic data classification for diverse medical ontologies

• Self-service data discovery for real-time, data-driven decisions

Thank you

Wei-Chiu Chuang | weichiu@cloudera.com

More information on Hadoop Security

Books authored by Clouderans
