Automatic Detection, Classification and Authorization of Sensitive Personal Data Impacted by GDPR
1 © Hortonworks Inc. 2011 – 2017 All Rights Reserved Hortonworks Confidential. For Internal Use Only.
AUTOMATIC DETECTION, CLASSIFICATION, AND AUTHORIZATION OF SENSITIVE PERSONAL DATA IMPACTED BY GDPR
Srikanth Venkat – Senior Director, Product Management, Hortonworks
Subra Ramesh – VP, Products & Engineering, Dataguise
© Hortonworks Inc. and Dataguise Inc. 2011 – 2017. All Rights Reserved
Agenda
GDPR Overview
GDPR Personal Data – what it requires
GDPR – Controller vs. Processor Requirements
Addressing GDPR requirements
– DgSecure: Detection, Element-level Protection, Monitoring
– Hortonworks HDP: Apache Ranger (Security & Privacy) and Apache Atlas (Data Inventory/Classification)
Integration of DgSecure Detection with Atlas-Ranger for Automatic Authorization Control over GDPR Personal Data
Demo
General Data Protection Regulation
Framework for the digital transformation economy
– Data = business asset, new currency, innovation accelerator
– Personal data leveraged throughout connected ecosystems
GDPR harmonizes and extends EU Data Protection Directive 95/46/EC
Expands the definition of protected data
Expands data subject rights
Overview of GDPR Framework
[Diagram: the GDPR framework. Actors: Data Protection Authority (supervising authority), Data Controller (organisations), Data Subject (individuals), Data Processor, Third Countries, Third Parties, European Data Protection Board (consistency mechanism), EU Courts, and National Courts. Relationship labels: Duties, Rights, Inform?, Disclosure?, Is Data Handling Secure?, Guarantees?, Advisory and Enforcement, Complaint/Resolution.]
GDPR Data Privacy
Sources:
1. ec.europa.eu/justice/data-protection/reform/files/regulation_oj_en.pdf
2. http://www.consilium.europa.eu/en/infographics/data-protection-regulation-infographics/
Rights & Obligations under GDPR
Controller Obligations
– Clear Consent
– Clear Detailed Privacy notices
– Breach Notification (72 hours)
– Appointment of Data Protection Officer (250+, or high risk processing)
– Privacy by Design & Other considerations
    – Lawful basis, Fair processing, & Specify Purposes
    – Adequate, relevant, not excessive
    – Data Accuracy, Retention, and Appropriate Security
– International Transfer adequacy
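The 72-hour breach notification window listed among the controller obligations is simple date arithmetic once the moment of becoming aware of the breach is recorded. A minimal sketch, assuming timezone-aware timestamps; the helper names `notification_deadline` and `hours_remaining` are hypothetical and not part of any product named in this deck:

```python
from datetime import datetime, timedelta, timezone

# Breach notification window from the controller obligations list.
NOTIFICATION_WINDOW = timedelta(hours=72)


def notification_deadline(aware_at: datetime) -> datetime:
    """Return the latest time to notify, 72 hours after becoming aware."""
    if aware_at.tzinfo is None:
        raise ValueError("aware_at must be timezone-aware")
    return aware_at + NOTIFICATION_WINDOW


def hours_remaining(aware_at: datetime, now: datetime) -> float:
    """Hours left before the deadline (negative once it has passed)."""
    return (notification_deadline(aware_at) - now).total_seconds() / 3600


# Illustrative timestamp only.
aware = datetime(2017, 4, 5, 9, 0, tzinfo=timezone.utc)
print(notification_deadline(aware).isoformat())
```

Keeping the timestamps in UTC avoids ambiguity when the teams handling an incident sit in different time zones.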
Rights & Obligations under GDPR
Individual Rights
– Access to data
– Remedy from supervisory body/court
    – Compensation for Damage
    – Compensation for Distress
    – Rectification
– Objection (for direct marketing)
– Erasure (right to be forgotten)
– Data Portability
– Restrict data processing (put on hold)
– Automated decisions and profiling
Broad Scope of GDPR
NOT ONLY data controllers or processors that are within the European Union
BUT ALSO
– ANY processing of ANY personal data belonging to EU citizens when the processing relates to the offering of goods or services, or monitoring behavior that takes place within the EU
Source: ec.europa.eu/justice/data-protection/reform/files/regulation_oj_en.pdf
Apache Ranger: Comprehensive Extensible Authorization
⬢ Comprehensive coverage across Hadoop ecosystem components
⬢ Plugins run resident within each component
⬢ Extensible plugin model: plugins for authorizing other sources can be built
Apache Ranger - Intuitive and Granular Policy Management
⬢ Simple Intuitive UI for Policy Editing and Setup
⬢ Fine-grained specificity by resource type, user context, tags, and operation
⬢ Supports Access, Tag Based, Dynamic Data Masking, and Row Filtering Policy Types
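To make the difference between an access policy and a masking policy on a tag concrete, here is a toy decision function. It is only a sketch: `TagPolicy` and `decide` are hypothetical names, the group names are made up, and this is not Ranger's policy engine or its policy model.

```python
from dataclasses import dataclass, field


@dataclass
class TagPolicy:
    """Hypothetical, simplified stand-in for a tag-based policy."""
    tag: str
    allowed_groups: set = field(default_factory=set)  # full access
    masked_groups: set = field(default_factory=set)   # access to masked values


def decide(policy: TagPolicy, column_tags: set, user_groups: set) -> str:
    """Return 'not_applicable', 'allow', 'mask', or 'deny' for one column."""
    if policy.tag not in column_tags:
        # The tag policy says nothing about columns that lack its tag.
        return "not_applicable"
    if user_groups & policy.allowed_groups:
        return "allow"
    if user_groups & policy.masked_groups:
        return "mask"
    return "deny"


policy = TagPolicy("PII", allowed_groups={"privacy_officers"}, masked_groups={"analysts"})
print(decide(policy, {"PII"}, {"analysts"}))
```

In this toy model an explicit allow wins over masking, and a tagged column with no matching group is denied; a real deployment would follow whatever precedence rules the authorization service defines.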
Apache Ranger Audits - Data Access
⬢ Comprehensive scalable audit logging
⬢ Audits for:
– Resource Access Events with user context
– Policy Edits/Creation/Deletion
– User session information
– Component plugin policy sync operations
Atlas: Metadata Truth in Hadoop
[Diagram: Atlas metadata at the center, connected to structured sources (traditional RDBMS, MPP appliances), Kafka, Storm, Sqoop, Hive, Falcon, Ranger, and custom/partner integrations.]
Metadata-driven governance services for Hadoop and enterprise big data ecosystems
Data Lineage/Provenance
– Along the entire data lifecycle, with integrated cross-component lineage
Data Classification
– Supports classification of data assets using tags (e.g. PII, PHI, PCI etc.) and attributes
Metadata Catalog Search
– Free text search on metadata
– Advanced search using DSL
Integrations across the Hadoop ecosystem, through a common metadata store
– OOtB real-time metadata and lineage ingestion with Hive, Sqoop, Storm/Kafka
– APIs for custom metadata ingestion
– Apache Ranger integration for classification-based security
Key Benefits:
Modern Data Lakes need new ways to govern because:
• Cost – Traditional staff ratio to data size not possible
• Diversity – Only way to manage velocity of new datasets
• Agility – Quick change based on tags / taxonomy
HDP – Security & Governance
[Diagram: Atlas tracks metadata and lineage for entities in the data lake (Hive tables, HDFS files, HBase tables, streams, pipelines, feeds) and holds tags, assets, and entities in its metastore. Ranger manages access policies and audit logs; its Atlas client subscribes to a topic and gets metadata updates. Labels on the Ranger side: PDP, Resource Cache. Policy types shown: Classification, Prohibition, Time, Location.]
Industry First: Dynamic Tag-based Security Policies
Dataguise: Company Background
[Timeline, 2007 to 2017:]
2007-2010: "Breakthrough" Masking Technology
2011-2013: Pioneers of Hadoop Data Protection
2014: The "Essential" Solution for Data Protection in Hadoop
2015: Magic Quadrant "Visionary" for Data Masking
2015: Recommended for Data-Centric Security
2015: Recommended for Protecting Big Data in Hadoop
2016: Cloud Platform Coverage
2017: Gartner Market Guide for Data Masking
DGSECURE PRODUCT
DgSecure Operation Sequence
Define the Policy → Discover the Sensitive Data → Secure Data → Monitor and Reporting
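The four steps of the operation sequence can be illustrated with a toy pipeline. This is a sketch only: the regular expressions, the `POLICY` dictionary, and the function names are hypothetical, and pattern matching stands in for whatever detection techniques DgSecure actually uses.

```python
import re

# Step 1 - define the policy: which patterns count as sensitive (illustrative only).
POLICY = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "PHONE": re.compile(r"\+?\d[\d\s-]{7,}\d"),
}


def discover(records):
    """Step 2 - discover: return (row_index, element_type) findings."""
    findings = []
    for i, text in enumerate(records):
        for name, pattern in POLICY.items():
            if pattern.search(text):
                findings.append((i, name))
    return findings


def secure(text):
    """Step 3 - secure: replace every matched element with a placeholder."""
    for name, pattern in POLICY.items():
        text = pattern.sub(f"<{name}>", text)
    return text


def report(findings):
    """Step 4 - monitor and report: count findings per element type."""
    counts = {}
    for _, name in findings:
        counts[name] = counts.get(name, 0) + 1
    return counts


records = ["contact: jane@example.com", "no personal data here", "call +44 20 7946 0958"]
findings = discover(records)
print(report(findings))
print([secure(r) for r in records])
```

The policy is defined once and reused by both discovery and protection, which mirrors the sequence on the slide: the same definition of "sensitive" drives every later step.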
Visualization: Enterprise-wide Data Security Posture
Enable Access Control based on Sensitivity Classification
Set up DgSecure to run on a periodic basis to scan for sensitive data and generate classification information
– DgSecure will continuously update Atlas with tags as and when it finds sensitive information
Set up Ranger policies based on the sensitivity tags
Ranger policies will kick in at the time any user tries to access the data, for example in a Hive query
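The step of updating Atlas with tags can be pictured as building one request per tagged entity. The sketch below only constructs the URL and JSON body and sends nothing. It assumes the Atlas v2 REST path `/api/atlas/v2/entity/guid/{guid}/classifications` accepting a JSON array of `{"typeName": ...}` objects, which should be checked against the Atlas version in use; the host, GUID, tag names, and the function name `build_classification_request` are hypothetical, and this is not how DgSecure itself is implemented.

```python
import json


def build_classification_request(atlas_base_url, entity_guid, tags):
    """Build (url, json_body) for attaching tags to one Atlas entity.

    Assumption: the Atlas v2 REST path /entity/guid/{guid}/classifications
    accepts a JSON array of {"typeName": ...} objects. Nothing is sent here.
    """
    base = atlas_base_url.rstrip("/")
    url = f"{base}/api/atlas/v2/entity/guid/{entity_guid}/classifications"
    # De-duplicate and sort so repeated scans produce the same body.
    body = json.dumps([{"typeName": tag} for tag in sorted(set(tags))])
    return url, body


url, body = build_classification_request(
    "http://atlas.example.com:21000",  # hypothetical host
    "hypothetical-guid-1234",          # hypothetical entity GUID
    ["PII", "EMAIL", "PII"],           # hypothetical tag names
)
print(url)
print(body)
```

Because the scan runs periodically, producing an identical body for an unchanged set of findings makes it easier to skip updates that would change nothing.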
DgSecure – Atlas/Ranger Integration Flow
DgSecure Detection → Atlas Populated with Sensitivity Tags → Ranger Policies based on tags → Access Control based on Sensitivity
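The four stages of this flow can be traced end to end with plain dictionaries. This is a toy model under stated assumptions: detection is faked by column name, the tag store and policy table are in-memory stand-ins for Atlas and Ranger, and every name below is hypothetical.

```python
# Toy model of the four-stage flow; every name below is hypothetical.

def detect(columns):
    """Stage 1: pretend detection, flagging columns by name."""
    sensitive_names = {"email": "EMAIL", "ssn": "NATIONAL_ID"}
    return {col: sensitive_names[col] for col in columns if col in sensitive_names}


def populate_tags(tag_store, table, detections):
    """Stage 2: record a sensitivity tag per (table, column)."""
    for column, tag in detections.items():
        tag_store[(table, column)] = tag


def is_allowed(tag_store, policies, table, column, group):
    """Stages 3 and 4: look up the tag-based policy, then decide access."""
    tag = tag_store.get((table, column))
    if tag is None:
        return True  # untagged column: no tag policy applies in this toy model
    return group in policies.get(tag, set())


tag_store = {}
populate_tags(tag_store, "customers", detect(["id", "email", "ssn"]))

# Stage 3: policies keyed by tag, not by table or column.
policies = {"EMAIL": {"marketing", "privacy"}, "NATIONAL_ID": {"privacy"}}

for column in ["id", "email", "ssn"]:
    print(column, is_allowed(tag_store, policies, "customers", column, "marketing"))
```

The point of keying policies by tag is that a newly detected sensitive column is covered as soon as it is tagged, without anyone editing a policy.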
DgSecure Integration with Atlas/Ranger
[Diagram: DgSecure runs detection against the data store (Hadoop, Hive, S3, Blob Storage) and keeps results in the DgSecure repository; detection results populate Atlas tags; Ranger uses those tags for ACL enforcement on the data store.]
Demo – DgSecure + Atlas/Ranger
Key Takeaways: DgSecure + HDP can help with GDPR
Detection of Sensitive Data
– Structured, Unstructured Data, Context Information used, Machine Learning capabilities
Protection of Sensitive Data at Element Level
– Masking or Encryption options in Hadoop
– At Rest Protection (Masking or Encryption)
Monitoring – Raise Alerts on (Attempted) Access to Sensitive Data
– Breach Notification Requirement
Access Control Integration
– Via Atlas/Ranger integration, Ranger Tag-Based Policies
Reporting – Visualization of Enterprise-Level Data Exposure
Thank You
DgSecure Policy
DgSecure Hive Task
DgSecure Detection Results (Hive)
Sensitive Data Tags in Atlas
Ranger Tag-Based Policies
For more information, check out other relevant sessions:
Apache Atlas: Governance for your data, 4:10p, Wednesday April 5th 2017
Bridle Your Flying Islands And Castles In The Sky: Built-in Governance And Security For The Cloud, 11:30am, Thursday April 6, 2017
BoF sessions – Security and Governance, 5:50p, Thursday, April 6th 2017
Hortonworks
www.hortonworks.com
Dataguise
www.dataguise.com