five steps to secure big data
DESCRIPTION
database, security, big data, tokenizationTRANSCRIPT
Five Steps to Secure Big Data
Ulf Mattsson, CTO
Protegrity
ulf.mattsson AT protegrity.com
20 years with IBM • Research & Development & Global Services
Inventor • Encryption, Tokenization & Intrusion Prevention
Involvement
Ulf Mattsson, CTO Protegrity
2
• PCI Security Standards Council (PCI SSC)
• American National Standards Institute (ANSI) X9
• Encryption & Tokenization
• International Federation for Information Processing• IFIP WG 11.3 Data and Application Security
• ISACA New York Metro chapter
Big Data
Hadoop
• Designed to handle the emerging “4 V’s”
• Massively Parallel Processing (MPP)
• Elastic scale
• Usually Read-Only
• Allows for data insights on massive, heterogeneous data sets
What is Big Data?
data sets
• Includes an ecosystem of components:
4
Hive
MapReduce
HDFS
Physical Storage
Pig Other
Application Layers
Storage Layers
Has Your Organization Already Invested in Big Data?
5
Source: Gartner
6
http://www.ey.com/Publication/vwLUAssets/EY_-_2013_Global_Information_Security_Survey/$FILE/EY-GISS-Under-cyber-attack.pdf
Holes in Big Data…
7
Source: Gartner
Many Ways to Hack Big Data
MapReduce(Job Scheduling/Execution System)
Pig (Data Flow) Hive (SQL) Sqoop
ETL Tools BI Reporting RDBMS
Avr
o (S
eria
lizat
ion)
Zoo
keep
er (
Coo
rdin
atio
n)
Hackers
UnvettedApplications
OrAd Hoc
Processes
Source: http://nosql.mypopescu.com/post/1473423255/apache-hadoop-and-hbase
8
HDFS(Hadoop Distributed File System)
Hbase (Column DB)
Avr
o (S
eria
lizat
ion)
Zoo
keep
er (
Coo
rdin
atio
n)
PrivilegedUsers
Current Data Security for Big data
Authentication
• Who am I and how do I prove it?
• Ensure the identity of the users, services and hosts that make up and use the system is authoritatively known
Authorization
• What am I allowed to see and do?
• Ensure services and data are accessed only by entitled identities
Data Protection
• How is my Data being Protected?
• Ensure data cannot be usefully stolen or undetectably tampered with
Auditing
• What have I attempted to do or done?
• Ensure a permanent record of who did what, when
DataSecurity
Taking Data Security to the Next Level
10
Security
Achieving Best Data Security for Big Data
Massively Scalable Data Security
Maximum Transparency
Maximum Performance
Easy to Use
Heterogeneous System Compatibility
Enterprise Ready
Many Layers of Defense
Corporate Firewall
Corporate Enterprise
Kerberos Authentication
Authorization through ACLs
Big D
ataEncrypted Communications
Big Data Cluster
Authorization through ACLs
Coarse Grained
Fine Grained
Data Security Policy
Protegrity
8
Protecting the Big Data Ecosystem
Data Access Framework
Pig Hive
Data Processing Framework
BI Applications
Java API enables Field Level data protection with Policy based access
User Defined Functions (UDFs) enable Field Level data protection with Policy based access controls with Monitoring.
BI Applications are authorized to access sensitive data through the policy.
Data Storage Framework(HDFS)
Data Processing Framework(MapReduce)
Volume or File Encryption with Policy based access controls at the OS file system level.
File level data protection with Policy based access controls for existing and new data.
protection with Policy based access controls with Monitoring.
CoarseGrained
Policy Based File and Disk Encryption
14
Grained
File Based Encryption Example
Files with personal identifiable informationStored in Hadoop cluster
Root user logged-in to one of the nodes
Search for sensitive information on disk
FineGrained
Policy BasedField Level Data Protection
16
Grained
Fine Grained Protection: Field Protection
Production SystemsEncryption• Reversible• Policy Control (Authorized / Unauthorized Access)• Lacks Integration Transparency• Complex Key Management• Example
Tokenization / Pseudonymization• Reversible• Policy Control (Authorized / Unauthorized Access)
!@#$%a^.,mhu7///&*B()_+!@
17
Non-Production SystemsMasking• Not reversible• No Policy, Everyone Can Access the Data• Integrates Transparently• No Complex Key Management• Example 0389 3778 3652 0038
• Policy Control (Authorized / Unauthorized Access)• Integrates Transparently• No Complex Key Management• Business Intelligence Credit Card: 0389 3778 3652 0038
Field Level Protection Example
Files with personal identifiable informationLoaded in to a Hive table
Select data from that table
Root user logged-in to one of the nodes
Search for sensitive information on disk
SecurityPolicy
Take Control Of Data Security
19
Policy
Policy Based Access Control
What is the sensitive data that needs to be protected. Data Element.
Who should have access to sensitive data and who should not. Security access control. Roles & Members.
What
Who
Combination of what data needs to be protected and who has access to that data is the key to creating a meaningful policy
20
Members.meaningful policy
Protegrity Data Security Policy
What
Who
How
What is the sensitive data that needs to be protected. Data Element.
How you want to protect and present sensitive data. There are several methods for protecting sensitive data. Encryption, tokenization, monitoring, etc.
Who should have access to sensitive data and who should not. Security access control. Roles & Members.
When
Where
Audit
When should sensitive data access be granted to those who have access. Day of week, time of day.
Where is the sensitive data stored? This will be where the policy is enforced. At the protector.
Audit authorized or un-authorized access to sensitive data. Optional audit of protect/unprotect.
Policy Based Filed Protection Example
Files with personal identifiable informationLoaded in to a Hive table
Create a view on that table
Select data as authorized user
Select data as privileged user
Enterprise
Enterprise Strength
Protection platforms must protect sensitive data end to
23
Enterprise protect sensitive data end to end – at rest, in transit and on any technology platform
End to End Data Security Across the Enterprise
24
Enterprise Heterogeneous Coverage
• File Protectors: AIX, HPUX, Linux, Solaris, Windows
• Database Protectors : DB2, SQL Server, Oracle, Teradata, Informix, Netezza, Greenplum
• Big Data Protectors: BigInsights, Cloudera, Greenplum, mapR, Aster, Apache Hadoop, Hortonworks
• Big Iron Platform: zSeries, HP Non-Stop
Best Practices for Protecting Big Data
Start Early
Fine Grained protection
Select the optimal protection for the future
Enterprise coverage
Protection against insider threatProtection against insider threat
Transparent protection to the analysis process
Policy based protection and audit
25
Five Point Data Protection Methodology
26
1. Classify 2. Discovery 3. Protect 4. Enforce 5. Monitor
Classify
Determine what data is sensitive to your organization.
27
sensitive to your organization.
Financial Services
Healthcare and Pharmaceuticals
Infrastructure and Energy
Federal Government
Select US Regulations for Security and Privacy
Federal Government
28
1. Classify: Examples of Sensitive Data
Sensitive Information Compliance Regulation / Laws
Credit Card Numbers PCI DSS
Names HIPAA, State Privacy Laws
Address HIPAA, State Privacy Laws
Dates HIPAA, State Privacy Laws
Phone Numbers HIPAA, State Privacy Laws
29
Personal ID Numbers HIPAA, State Privacy Laws
Personally owned property numbers HIPAA, State Privacy Laws
Personal Characteristics HIPAA, State Privacy Laws
Asset Information HIPAA, State Privacy Laws
Discovery
Discover where the sensitive data is located and how it flows
30
data is located and how it flows
2. Discovery in a large enterprise with many systems
System
SystemSystem
System
System
System
031
System
Corporate Firewall
System
System
031
System
System System
System
2. Discovery: Determine the context to the Business
System
System System
Retail
Employees
032
System
Corporate Firewall
System
032
System
System
Employees
Corporate IP
Healthcare
2. Discover: Context to the Business and to Security
Stores & Ecommerce
Data Protection Solution
File Server
Databases
Collecting transactions
033
Corporate Firewall
Applications
Research Databases
Solution Requirements
File Server containing IP
Hadoop
Protect
Protect the sensitive data at rest and in transit.
34
rest and in transit.
Balancing Security and Data Insight
Tug of war between security and data insight
Big Data is designed for access
Privacy regulations require de-identification
Granular data-level protectionGranular data-level protection
Traditional security don’t allow for seamless data use
35
Protection Beyond Kerberos
API enabled Field level data protection
API enabled Field level data protection
MapReduce(Job Scheduling/Execution System)
Pig (Data Flow) Hive (SQL) Sqoop
ETL Tools BI Reporting RDBMS
36
Volume Encryption
Field level data protection for existing and new data.HDFS
(Hadoop Distributed File System)
Hbase (Column DB)
Volume Encryption
MapReduceEntire file is in the clear when analyzed
37
HDFS
Protected with Volume Encryption
File Encryption – Authorized User
MapReduceEntire file is in the clear when analyzed
38
HDFS
Protected with File Encryption
File Encryption – Non Authorized User
MapReduceEntire file is in unreadable when analyzed
39
HDFS
Protected with File Encryption
Volume Encryption + Gateway Field Protection
MapReduce
Kerberos
Granular Field Level Protection
40
HDFSKerberos AccessControl
Protected with Volume Encryption
Data Protection File Gateway
Volume Encryption + Internal MapReduce Field Protection
MapReduce
Granular Field Level Protection
Hadoop Staging
Analytics
41
HDFSKerberos Access Control
Protected with Volume EncryptionMapReduce
Enforce
Policies are used to enforce rules about how sensitive data
42
rules about how sensitive data should be treated in the enterprise.
A Data Security Policy
What is the sensitive data that needs to be protected. Data Element.
How you want to protect and present sensitive data. There are several methods for protecting sensitive data. Encryption, tokenization, monitoring, etc.
Who should have access to sensitive data and who should not. Security access control. Roles & Members.
What
Who
How
43
When should sensitive data access be granted to those who have access. Day of week, time of day.
Where is the sensitive data stored? This will be where the policy is enforced. At the protector.
Audit authorized or un-authorized access to sensitive data. Optional audit of protect/unprotect.
When
Where
Audit
Volume Encryption + Field Protection + Policy Enforcement
MapReduce
44
HDFSProtected with
Volume Encryption
Data Protection Policy
Volume Encryption + Field Protection + Policy Enforcement
MapReduce
45
HDFSProtected with
Volume Encryption
Data Protection Policy
4. Authorized User ExamplePresentation to requestorName: Joe SmithAddress: 100 Main Street, Pleasantville, CA
Selected data displayed (least privilege)
Data Scientist, Business Analyst
PolicyEnforcement
Does the requestor have the authority to access the protected data?
Authorized
Request
Response
46
Protection at restName: csu wusojAddress: 476 srta coetse, cysieondusbak, CA
4. Un-Authorized User Example
Request
Response
Presentation to requestorName: csu wusojAddress: 476 srta coetse, cysieondusbak, CA
Privileged Used, DBA, System Administrators, Bad Guy
PolicyEnforcement
Does the requestor have the authority to access the protected data?
Not Authorized
47
Protection at restName: csu wusojAddress: 476 srta coetse, cysieondusbak, CA
Monitor
A critically important part of a security solution is the ongoing
48
security solution is the ongoing monitoring of any activity on sensitive data.
Best Practices for Protecting Big Data
Start early
Granular protection
Select the optimal protection
Enterprise coverage
Protection against insider threat Protection against insider threat
Protect highly sensitive data in a way that is mostly transparent to the analysis process
Policy based protection
Record data access events
49
How Protegrity Can Help
1
2
3
We can help you Classify the sensitive data
We can help you Discover where the sensitive data sits
We can help you Protect your sensitive data in a flexible way
50
3
4
5
We can help you Protect your sensitive data in a flexible way
We can help you Enforce policies that will enable business functions and preventing sensitive data from the wrong hands.
We can help you Monitor sensitive data to gain insights on abnormal behaviors.
Protegrity Summary
Proven enterprise data security software and innovation leader
• Sole focus on the protection of data
• Patented Technology, Continuing to Drive Innovation
Cross-industry applicability• Retail, Hospitality, Travel and
TransportationTransportation
• Financial Services, Insurance, Banking
• Healthcare
• Telecommunications, Media and Entertainment
• Manufacturing and Government
51