Don’t Let Security Be the ‘Elephant in the Room’
Enterprise Security for Big Data
04/11/2023
Mitch Ferguson, VP Business Development, Hortonworks
Jeremy Stieglitz, VP Business Development, Voltage Security
© Hortonworks Inc. 2013
Hortonworks: Community-Driven Enterprise Apache Hadoop
June 2013
A Brief History of Apache Hadoop
Timeline, 2004 to 2013:
• Focus on INNOVATION, 2005: Yahoo! creates team under E14 to work on Hadoop
• Focus on OPERATIONS, 2008: Yahoo! team extends focus to operations to support multiple projects & growing clusters
• STABILITY, 2011: Hortonworks created to focus on “Enterprise Hadoop”. Starts with 24 key Hadoop engineers from Yahoo!
Milestones shown on the timeline: Apache Project Established, Yahoo! begins to operate at scale, Enterprise Hadoop, Hortonworks Data Platform
Hortonworks Snapshot
• We distribute the only 100% Open Source Enterprise Hadoop Distribution: Hortonworks Data Platform
• We engineer, test & certify HDP for enterprise usage
• We employ the core architects, builders and operators of Apache Hadoop
• We drive innovation within Apache Software Foundation projects
• We are uniquely positioned to deliver the highest quality of Hadoop support
• We enable the ecosystem to work better with Hadoop
Develop Distribute Support
We develop, distribute and support the ONLY 100% open source Enterprise Hadoop distribution
Endorsed by Strategic Partners
Headquarters: Palo Alto, CA
Employees: 200+ and growing
Investors: Benchmark, Index, Yahoo
Enabling Hadoop as Enterprise Big Data Platform
OPERATIONS, ECOSYSTEM, DEVELOPER
Enterprise Ready & Easy to Use
Data Platform Services & Open APIs
Enable Ecosystem at Each Layer
Hortonworks Data Platform
Applications,
Business Tools,
Development Tools,
Open APIs and access
Data Movement & Integration,
Data Management Systems,
Systems Management
Installation & Configuration,
Administration,
Monitoring,
High Availability,
Replication,
Multi-tenancy, ..
Metadata, Indexing, Search, Security, Management, Data Extract & Load, APIs
Hortonworks Partner Eco-System 140+
Apache Software Foundation Guiding Principles
• Release early & often
• Transparency, respect, meritocracy
Key Roles held by Hortonworkers
• PMC Members
– Managing community projects
– Mentoring new incubator projects
– Over 20 Hortonworkers managing community
• Committers
– Authoring, reviewing & editing code
– Over 50 Hortonworkers across projects
• Release Managers
– Testing & releasing projects
– Hortonworkers across key projects like Hadoop, Hive, Pig, HCatalog, Ambari, HBase
Diagram: a Design & Develop, Test & Patch, Release cycle across Apache Hadoop, Apache Pig, Apache HCatalog, Apache HBase, Apache Hive, Apache Ambari and other Apache projects
“We have noticed more activity over the last year from Hortonworks’ engineers on building out Apache Hadoop’s more innovative features. These include YARN, Ambari and HCatalog..”
- Jeff Kelly: Wikibon
Apache Community Leadership
Leadership that Starts at the Core
• Driving next generation Hadoop
– YARN, MapReduce2, HDFS2, High Availability, Disaster Recovery
• 420k+ lines authored since 2006
– More than twice nearest contributor
• Deeply integrating w/ecosystem
– Enabling new deployment platforms (ex. Windows & Azure, Linux & VMware HA)
– Creating deeply engineered solutions (ex. Teradata big data appliance)
• All Apache, NO holdbacks
– 100% of code contributed to Apache
Hortonworks Process for Enterprise Hadoop
Upstream Community Projects, Downstream Enterprise Product
• Upstream (Apache Hadoop, Apache Pig, Apache Hive, Apache HBase, Apache HCatalog, Apache Ambari, other Apache projects): Design & Develop, Test & Patch, Release
• Downstream (Hortonworks Data Platform): Design & Develop, Integrate & Test, Package & Certify, Distribute
• Between the two: Stable Project Releases, Fixed Issues
Virtuous cycle when development & fixed issues are done upstream & stable project releases flow downstream
No Lock-in: Integrated, tested & certified distribution lowers risk by ensuring close alignment with Apache projects
“We have noticed more activity over the last year from Hortonworks’ engineers on building out Apache Hadoop’s more innovative features. These include YARN, Ambari and HCatalog.” - Jeff Kelly: Wikibon
Enhancing the Core of Apache Hadoop
Deliver high-scale storage & processing with enterprise-ready platform services
Unique Focus Areas:
• Bigger, faster, more flexible: continued focus on speed & scale and enabling near-real-time apps
• Tested & certified at scale: run ~1300 system tests on large Yahoo clusters for every release
• Enterprise-ready services: high availability, disaster recovery, snapshots, security, …
Hortonworkers are the architects, operators, and builders of core Hadoop
HADOOP CORE: Distributed Storage & Processing
PLATFORM SERVICES: Enterprise Readiness
Data Services for Full Data Lifecycle
Provide data services to store, process & access data in many ways
Unique Focus Areas:
• Apache HCatalog: metadata services for consistent table access to Hadoop data
• Apache Hive: explore & process Hadoop data via SQL & ODBC-compliant BI tools
Hortonworks enables Hadoop data to be accessed via existing tools & systems
DATA SERVICES: Store, Process and Access Data
HADOOP CORE: Distributed Storage & Processing
PLATFORM SERVICES: Enterprise Readiness
Operational Services for Ease of Use
Include complete operational services for productive operations & management
Unique Focus Area:
• Apache Ambari: provision, manage & monitor a cluster; complete REST APIs to integrate with existing operational tools; job & task visualizer to diagnose issues
Only Hortonworks provides a complete open source Hadoop management tool
OPERATIONAL SERVICES: Manage & Operate at Scale
DATA SERVICES: Store, Process and Access Data
HADOOP CORE: Distributed Storage & Processing
PLATFORM SERVICES: Enterprise Readiness
Deployable Across a Range of Options
Only Hortonworks allows you to deploy seamlessly across any deployment option
• Linux & Windows
• Azure, Rackspace & other clouds
• Virtual platforms
• Big data appliances
HORTONWORKS DATA PLATFORM (HDP)
OPERATIONAL SERVICES: Manage & Operate at Scale
DATA SERVICES: Store, Process and Access Data
HADOOP CORE: Distributed Storage & Processing
PLATFORM SERVICES: Enterprise Readiness
Deployment options: OS, Cloud, VM, Appliance
HDP: Enterprise Hadoop Distribution
Hortonworks Data Platform (HDP): Enterprise Hadoop
• The ONLY 100% open source and complete distribution
• Enterprise grade, proven and tested at scale
• Ecosystem endorsed to ensure interoperability
Stack: OPERATIONAL SERVICES, DATA SERVICES, HADOOP CORE and PLATFORM SERVICES, deployable on OS, Cloud, VM, Appliance
HDP: Enterprise Hadoop Distribution
PLATFORM SERVICES (Enterprise Readiness): High Availability, Disaster Recovery, Security and Snapshots
Components shown: HIVE & HCATALOG, PIG, HBASE, OOZIE, AMBARI, HDFS, MAP REDUCE
LOAD & EXTRACT: SQOOP, FLUME, NFS, WebHDFS
Deployment options: OS, Cloud, VM, Appliance
HDP: Enterprise Hadoop Distribution 2.0
PLATFORM SERVICES (Enterprise Readiness): High Availability, Disaster Recovery, Rolling Upgrades, Security and Snapshots
Components shown: HIVE & HCATALOG, PIG, HBASE, OOZIE, AMBARI, FALCON*, KNOX*, HDFS, YARN*, MAP REDUCE, TEZ*, OTHER
LOAD & EXTRACT: SQOOP, FLUME, NFS, WebHDFS
Deployment options: OS/VM, Cloud, Appliance
* included in HDP 2.0
Apache Knox Gateway
Diagram labels:
• Clients: Browser, REST Client, JDBC Client
• Perimeter: Firewall, DMZ, Firewall, Knox Gateway Cluster (GW, GW, GW)
• Identity: Enterprise Identity Provider, Enterprise/Cloud SSO Provider
• Secure Hadoop Cluster: Masters, Slaves, JT, NN, WebHCat, Oozie, DN, TT, YARN, HBase, Hive
• Other: Ambari Server, HUE, AA
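As a rough illustration of what a REST client in the diagram sends to the gateway rather than to a cluster node, the sketch below only builds a WebHDFS URL that is routed through Knox. The layout `https://{host}:{port}/{gateway-path}/{topology}/webhdfs/v1` follows the pattern in the Apache Knox documentation; the host `knox.example.com`, the topology name `sandbox`, port 8443 and the path `gateway` are assumptions for the example and do not come from the slides.

```python
from urllib.parse import quote, urlencode


def knox_webhdfs_url(gateway_host: str, topology: str, hdfs_path: str, op: str,
                     port: int = 8443, gateway_path: str = "gateway") -> str:
    """Build a WebHDFS URL that goes through the gateway, not the NameNode.

    Port, gateway path and topology name are assumed values; a real
    deployment defines its own.
    """
    if not hdfs_path.startswith("/"):
        raise ValueError("hdfs_path must be absolute")
    base = f"https://{gateway_host}:{port}/{gateway_path}/{topology}/webhdfs/v1"
    # quote() keeps "/" and escapes characters that are unsafe in a URL path.
    return f"{base}{quote(hdfs_path)}?{urlencode({'op': op})}"


# Hypothetical host and topology, for illustration only.
print(knox_webhdfs_url("knox.example.com", "sandbox", "/tmp", "LISTSTATUS"))
```

The client only ever needs the gateway address; the cluster's internal host names stay behind the firewall.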
Hadoop Common Patterns of Use
Big Data: Transactions, Interactions, Observations
Business Cases
HORTONWORKS DATA PLATFORM
• Refine (Batch)
• Explore (Interactive)
• Enrich (Online)
“Right-time” Access to Data
Operational Data Refinery
Transform & refine ALL sources of data
Also known as Data Reservoir or Catch Basin
1. Capture
2. Process
3. Distribute & Retain
DATA SOURCES: Traditional Sources (RDBMS, OLTP, OLAP); New Sources (web logs, email, sensor data, social media)
DATA SYSTEMS: HORTONWORKS DATA PLATFORM (Refine, Explore, Enrich); TRADITIONAL REPOS (RDBMS, EDW, MPP)
APPLICATIONS: Business Analytics, Custom Applications, Enterprise Applications
Big Data Exploration & Visualization
Leverage “data lake” to perform iterative investigation for value
1. Capture
2. Process
3. Explore & Visualize
DATA SOURCES: Traditional Sources (RDBMS, OLTP, OLAP); New Sources (web logs, email, sensor data, social media)
DATA SYSTEMS: HORTONWORKS DATA PLATFORM (Refine, Explore, Enrich); TRADITIONAL REPOS (RDBMS, EDW, MPP)
APPLICATIONS: Business Analytics, Custom Applications, Enterprise Applications
Application Enrichment
Create intelligent applications
Collect data, create analytical models and deliver to online apps
1. Capture
2. Process & Compute
3. Deliver Model
DATA SOURCES: Traditional Sources (RDBMS, OLTP, OLAP); New Sources (web logs, email, sensor data, social media)
DATA SYSTEMS: HORTONWORKS DATA PLATFORM (Refine, Explore, Enrich); TRADITIONAL REPOS (RDBMS, EDW, MPP); NOSQL
APPLICATIONS: Custom Applications, Enterprise Applications
Don’t Let Security Be the ‘Elephant in the Room’: Enterprise Security for Big Data
Jeremy Stieglitz
Extracting Value from Data: Big Data Now Includes Sensitive Data
• Marketing – analyze purchase patterns
• Social media – find best customer segments
• Financial systems – model trading data
• Banking and insurance – 360° customer view
• Security – identify credit card fraud
• Healthcare – advance disease prevention
Copyright 2013 Voltage Security
How do you liberate the value in data – without increasing risk?
Hidden Risks in Big Data Adoption
Big Data:
• Enables deeper data analysis
• More value from old data
• New risks if data is not protected
Data Concentration Risks
– Financial Positions
– Market Position
– Changes to big picture
– Corporate Compliance risk
Cloud Adoption Risks
– Sensitive data in untrusted systems
– Data in storage, in use, transmitted to cloud
Data Sharing Risks
– Compliance challenges with 3rd party risk
– Data in and out of the enterprise
Breach Risks
– Internal users
– External shares
– Backups, Hadoop stores, data feeds
Data Security Approaches: IT Infrastructure Security
Security coverage by layer, with a Security Gap left between the controls:
• Application
• Access: Authentication and Access Control
• Network: SSL/TLS
• Database: Transparent Database Encryption (TDE)
• OS/Storage: Full disk encryption
Data Security Approaches: IT Infrastructure Security
Moving up or down the same stack of layers (Application, Access, Network, Database, OS/Storage) changes the trade-off:
• Toward the application layer: more keys, more secure, less computation, application aware
• Toward the OS/Storage layer: fewer keys, less secure, more computation, transparent; “check box” encryption, available from cloud providers
Traditional IT Security vs. Data-centric Security
Traditional IT Infrastructure Security covers one layer at a time and leaves a Security Gap (Data in the Clear) between layers:
• OS / Storage: Full disk encryption
• Database: Transparent Database Encryption (TDE), triggers
• Network: SSL/TLS/Firewalls
• Access: Authentication and Access Control
Data-Centric Security works top down from the Data/Application layer:
• Application-layer data protection provides seamless end-to-end data security
• Encrypt once, persistently protect from point of capture: in storage, in transit, in use
• If attacked, data has no value
Requirements for Big Data Security
• Lock data in place
• More keys to manage
• Horizontal support to wherever your data travels
Data – structure, value, and meaning
Take a simple Tax ID. It’s more than just a number.
• It has a format and structure
• It has value in being unique
• Its parts have value – e.g. last 4 digits
Traditional Encryption Practically Eliminates Value in the Data
• Destroys the original value – makes data secure, but incompatible
• Changes format of data – requires schema changes
• Changes size of field – increases storage
• Always requires application and data flow changes: “Ripping up the Roads”
• Destroys any special encoding or checksums (Luhn checksum in credit cards, driver’s license checksums for certain states)
Example: Tax ID 934-72-2356, encrypted with AES-CBC, becomes Encrypted Tax ID uE28W&=209gX32F*52
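To make the checksum point concrete, here is a minimal sketch of the Luhn check that card-processing systems commonly apply. It is a generic textbook implementation, not taken from the slides; the card number used is a widely published test value, and the ciphertext string is only an illustrative opaque blob.

```python
def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum.

    Spaces and dashes are ignored; any other non-digit character makes the
    value invalid, which is what happens to opaque ciphertext.
    """
    stripped = number.replace(" ", "").replace("-", "")
    if not (stripped.isascii() and stripped.isdigit()):
        return False
    total = 0
    # Walk from the rightmost digit, doubling every second digit.
    for position, char in enumerate(reversed(stripped)):
        digit = int(char)
        if position % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


print(luhn_valid("4111 1111 1111 1111"))        # well-known test card number
print(luhn_valid("8juYE%Uks&dDFa2345^WFLERG"))  # opaque ciphertext-like string
```

A downstream system that validates the field this way would reject conventionally encrypted values, which is one reason applications and schemas have to change.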
Voltage Format-Preserving Encryption™ (FPE)
• Standard, proven mode of AES (NIST FFX mode – ask NIST)
• Encrypt at capture. Data stays protected at all times
• Fit into existing systems, protocols, schemas – any data
• Enable operation on encrypted data – retains the value of the original data
• Protect live data in applications & databases, business process or transactions
• Create de-identified data for test, cloud apps, outsourcers
• Can preserve validation checksums
Example:
              Credit Card                  Tax ID
Original      7412 3456 7890 0000          934-72-2356
Regular AES   8juYE%Uks&dDFa2345^WFLERG    Ija&3k24kQarotugDF2390^32
FPE           7412 3423 3526 0000          298-24-2356
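The idea of ciphertext that keeps the shape of the input can be sketched with a toy cipher over decimal digits. This is only an illustration under stated assumptions: it uses an alternating Feistel network with HMAC-SHA256 as the round function, it is not the NIST FFX mode named on the slide, not Voltage's implementation, and not reviewed for security. All function names are hypothetical.

```python
import hashlib
import hmac
from typing import Callable

DIGITS = "0123456789"
ROUNDS = 10  # even, so both halves end up back in their original positions


def _round_value(key: bytes, round_no: int, total_len: int, half: str) -> int:
    """Pseudo-random integer derived from the key, the round and one half."""
    msg = f"{round_no}|{total_len}|{half}".encode()
    return int.from_bytes(hmac.new(key, msg, hashlib.sha256).digest(), "big")


def _encrypt_digits(key: bytes, digits: str) -> str:
    n = len(digits)
    u, v = n // 2, n - n // 2
    a, b = digits[:u], digits[u:]
    for i in range(ROUNDS):
        m = u if i % 2 == 0 else v  # length of the half being replaced
        c = (int(a) + _round_value(key, i, n, b)) % 10 ** m
        a, b = b, str(c).zfill(m)
    return a + b


def _decrypt_digits(key: bytes, digits: str) -> str:
    n = len(digits)
    u, v = n // 2, n - n // 2
    a, b = digits[:u], digits[u:]
    for i in reversed(range(ROUNDS)):
        m = u if i % 2 == 0 else v
        prev_b = a
        prev_a = (int(b) - _round_value(key, i, n, prev_b)) % 10 ** m
        a, b = str(prev_a).zfill(m), prev_b
    return a + b


def _apply(transform: Callable[[bytes, str], str], key: bytes, text: str) -> str:
    """Transform only the digits of `text`; separators stay where they are."""
    digits = "".join(ch for ch in text if ch in DIGITS)
    if len(digits) < 2:
        raise ValueError("need at least two digits")
    out = iter(transform(key, digits))
    return "".join(next(out) if ch in DIGITS else ch for ch in text)


def fpe_encrypt(key: bytes, text: str) -> str:
    return _apply(_encrypt_digits, key, text)


def fpe_decrypt(key: bytes, text: str) -> str:
    return _apply(_decrypt_digits, key, text)


demo_key = b"demo key, not a real secret"
token = fpe_encrypt(demo_key, "934-72-2356")
print(token)                         # still shaped like NNN-NN-NNNN
print(fpe_decrypt(demo_key, token))  # recovers the original value
```

Because the output has the same length, character set and separators as the input, it fits the existing column definition. Unlike the slide's example, this toy does not keep the last four digits or a valid checksum.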
Stateless Key Management
Keys when you need them, not when you don’t.
• Keys derived on the fly
• Simple - lower risk, lower cost
• Scale to millions of users
• Keys don’t stay resident
• Standards Based
• FPE/AES Symmetric keys
• Structured and unstructured data
Identity Based Encryption IEEE 1363.3
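“Keys derived on the fly” can be pictured with a small derivation function: the same inputs always give the same key, so nothing has to be stored per user or per field. The sketch uses a plain HMAC-SHA256 derivation as a stand-in. It is an assumption for illustration, not IEEE 1363.3 identity-based encryption and not Voltage's key server; the identity string format is hypothetical.

```python
import hashlib
import hmac


def derive_key(master_secret: bytes, identity: str) -> bytes:
    """Derive a 32-byte symmetric key for `identity` from one master secret.

    Nothing is stored: calling this again with the same inputs recomputes
    the same key, and a different identity gives an unrelated key.
    """
    return hmac.new(master_secret, identity.encode("utf-8"), hashlib.sha256).digest()


master = b"demo master secret, not for real use"
k1 = derive_key(master, "ssn-field@analytics.example.com")   # hypothetical identity
k2 = derive_key(master, "card-field@analytics.example.com")  # hypothetical identity
print(len(k1), k1 != k2)
```

The design choice this illustrates is that only the master secret needs protecting and backing up, however many derived keys are in use.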
High-performance Data Security
Voltage SecureData™ for Hadoop
Hadoop ecosystem: ETL tools, HIVE, MapReduce jobs, other query and analysis tools
Three Insertion Points into Hortonworks Data Platform (HDP)
#1. Upon ingest: APIs, CL, batch tools for ETL, SQOOP, streaming, etc.
#2. Executed as Map job
#3. UDFs for PIG, Hive, etc.
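As a rough picture of the map-style insertion point, the sketch below rewrites one column of tab-separated records, the way a map-only job would before the data is queried. The slides do not show the SecureData API, so `pseudonymize` is a hypothetical stand-in (a one-way keyed hash rather than reversible encryption), and the record layout is invented for the example.

```python
import hashlib
import hmac
from typing import Callable, Iterable, Iterator


def pseudonymize(value: str, key: bytes = b"demo key") -> str:
    """Stand-in protection call: one-way keyed hash, shortened to 16 hex chars."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()[:16]


def map_lines(lines: Iterable[str], column: int,
              protect: Callable[[str], str] = pseudonymize,
              sep: str = "\t") -> Iterator[str]:
    """Yield each record with the field at index `column` replaced by protect(field)."""
    for line in lines:
        fields = line.rstrip("\n").split(sep)
        if column < len(fields):
            fields[column] = protect(fields[column])
        yield sep.join(fields)


# In a streaming-style job the input would be sys.stdin; a list is used here.
sample = ["alice\t934-72-2356\t2013-06-01\n", "bob\t298-24-2356\t2013-06-02\n"]
for row in map_lines(sample, column=1):
    print(row)
```

Keeping the protection call behind a `protect` parameter means the same mapper shape could wrap a reversible, format-preserving call where analysts need to recover original values.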
Benefits of Voltage SecureData
• Solves complex global compliance issues
• Ensures data stays protected wherever it goes
• Enables accurate analytics on encrypted data
• Optimizes performance
• Flexibly adapts to the fast-growing Hadoop ecosystem
• Delivers maximum return on information – without increased risk
Use Case: Fortune 50 Healthcare Products and Services Company
• Challenge
– Sell new information-based services to medical suppliers & drug companies
– Big Data team tasked with securing 1000 node Hadoop cluster for HIPAA, HITECH
• Solution
– Data de-identified in ETL move before entering Hadoop
– Ability to decrypt analytic results when needed, through multiple tools
• Benefits
– Ability to monetize existing medical data, and fine-tune manufacturing and marketing
Use Case: Banking, Top Worldwide Financial Institution
• Challenge
– Credit risk and consumer fraud groups
– PCI compliance is #1 driver
– ETL offload use case with Hadoop alongside DW
• Solution
– Integrate with Sqoop on ingestion, and Hive and Pig on the applications / query side to protect 20 types of data
– Fraud analysts work with SST tokenized credit card numbers and only de-tokenize as needed
• Benefits
– Enable fraud and risk analytics directly in Hadoop on protected data
– Use Hadoop processing with security and compliance for faster time to insight
Contacts
Hortonworks
- http://hortonworks.com/
- http://hortonworks.com/partners/certified-technology-program/
- USA: (855) 8-HORTON (1 for sales)
- Intl: (408) 916-4121 (1 for sales)
Voltage Security
- http://www.voltage.com/
- http://www.voltage.com/partners/technology-partners/hortonworks/
- Tel: +1 (408) 886-3200
- [email protected]
THANK YOU