Is Your Enterprise Data Lake Metadata Driven AND Secure?
Apache Atlas + Ranger
2 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Disclaimer
This document may contain product features and technology directions that are under development, may be under development in the future or may ultimately not be developed.
Project capabilities are based on information that is publicly available within the Apache Software Foundation project websites ("Apache"). Progress of the project capabilities can be tracked from inception to release through Apache; however, technical feasibility, market demand, user feedback and the overarching Apache Software Foundation community development process can all affect timing and final delivery.
This document’s description of these features and technology directions does not represent a contractual commitment, promise or obligation from Hortonworks to deliver these features in any generally available product.
Product features and technology directions are subject to change, and must not be included in contracts, purchase orders, or sales agreements of any kind.
Since this document contains an outline of general product development plans, customers should not rely upon it when making purchasing decisions.
Agenda
• Introduction
• Overview of Apache Atlas & Ranger
• Technical Preview: Dynamic, Tag based Policies
• Q & A
Speakers
Andrew Ahn, Director, Governance Product Management
Madhan Neethiraj, Director, Enterprise Security Engineering
Apache Atlas + Ranger Overview
Apache Atlas is Metadata Services
Metadata Services Foundation: HDP 2.3
• Business Catalog: Taxonomy based classification
• Technical Data: e.g. Model for Hive: DB, Tables, Views and Columns
• Centralized location for all metadata inside HDP and a single interface point for metadata exchange with platforms outside of HDP
Metadata that enriches every component
Available Now with HDP 2.3
• Hive: Complete lineage, every SQL statement tracked
• Ambari: Setup & monitoring
[Diagram: Apache Atlas with Hive, Ranger, Falcon, Sqoop, Storm, Kafka, Spark, NiFi]
1Q2016: Technical Preview
• Sqoop: Supplement Hive lineage based on Sqoop import/export
• Storm & Kafka: Lineage for topologies and participating queues/topics
• Ranger: Dynamic Security Policies, leveraging metadata tags
• Falcon: Process entities lineage
Roadmap
• HDFS: Correlated with other components
• Spark: Support for SparkSQL
• NiFi: Integrate fine-grained data provenance with Atlas
Big Data Management Through Metadata
Management Scalability
Many traditional tools and patterns do not scale when applied to multi-tenant data lakes. Many enterprises have siloed data and metadata stores that collide in the data lake. This is compounded by the ability to have very large windows (years). Can traditional EDW tools manage 100 million entities effectively with room to grow?
Metadata Tools
Scalable, decoupled, decentralized management driven through metadata is the only viable solution. This allows quick integration with automation and other metamodels.
Tags for Management, Discovery and Security
Proper metadata is the foundation for business taxonomy, stewardship, attribute based security and self-service.
Dynamic Access Policy Requirements
• Basic Tag policy – PII example. Access and entitlements must be tag based ABAC and scalable in implementation.
• Geo-based policy – Policy based on IP address; proxy IP substitution may be required. Rule enforcement must be geo-aware.
• Time-based Tag policy – Timer for data access, de-coupled from deletion of data.
• Prohibitions – Prevent combinations of Hive tables/columns that may pose a risk together.
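The geo-based and prohibition requirements above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how Ranger implements them: the network ranges, country codes, column names, and the `geo_allowed` and `violates_prohibition` helpers are all hypothetical.

```python
import ipaddress

# Hypothetical mapping of network ranges to country codes
# (these are reserved documentation ranges, not real assignments).
GEO_RANGES = {
    ipaddress.ip_network("192.0.2.0/24"): "US",
    ipaddress.ip_network("198.51.100.0/24"): "DE",
}

def country_of(ip, forwarded_for=None):
    """Resolve a country code; substitute the forwarded client IP when a proxy supplies one."""
    addr = ipaddress.ip_address(forwarded_for or ip)
    for network, country in GEO_RANGES.items():
        if addr in network:
            return country
    return None  # unknown location

def geo_allowed(ip, allowed_countries, forwarded_for=None):
    """Geo-aware check: allow only when the resolved country is in the allowed set."""
    return country_of(ip, forwarded_for) in allowed_countries

# Hypothetical prohibition: these columns must not be read together in one request.
PROHIBITED_COMBINATIONS = [frozenset({"hr.employee.name", "hr.employee.ssn"})]

def violates_prohibition(requested_columns):
    """True when the request contains every column of some prohibited combination."""
    requested = set(requested_columns)
    return any(combo <= requested for combo in PROHIBITED_COMBINATIONS)
```

The `forwarded_for` argument stands in for proxy IP substitution: when a request arrives through a proxy, the check is made against the original client address rather than the proxy's.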
How does Atlas work with Ranger at scale?
Atlas provides: Metadata
• Business Classification (taxonomy): Company > HR > Driver
• Hierarchy with inheritance of attributes to child objects: the sensitive “PII” tag of department HR is inherited by group HR > Driver
• Atlas notifies Ranger of changes via a Kafka topic
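Tag inheritance along the Company > HR > Driver taxonomy can be illustrated with a small tree. This is a minimal sketch; the `TaxonomyNode` class is hypothetical and is not part of the Atlas API.

```python
class TaxonomyNode:
    """Hypothetical taxonomy node; child nodes inherit the tags of their ancestors."""

    def __init__(self, name, parent=None, tags=()):
        self.name = name
        self.parent = parent
        self.tags = set(tags)

    def effective_tags(self):
        """Own tags plus every tag inherited from ancestors."""
        inherited = self.parent.effective_tags() if self.parent else set()
        return inherited | self.tags

    def path(self):
        """Render the node's position in the taxonomy, e.g. 'Company > HR'."""
        prefix = self.parent.path() + " > " if self.parent else ""
        return prefix + self.name

company = TaxonomyNode("Company")
hr = TaxonomyNode("HR", parent=company, tags={"PII"})
driver = TaxonomyNode("Driver", parent=hr)  # carries no tag of its own
```

Here `driver.effective_tags()` contains "PII" even though only HR was tagged, while `company.effective_tags()` stays empty because inheritance flows downward only.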
[Diagram: Apache Atlas with Hive, Ranger, Falcon, Kafka, Storm]
Atlas provides the metadata tag to create policies
Ranger provides: Access & Entitlements
• Ranger will cache tags and asset mapping for performance
• Ranger will have policies based on tags instead of roles.
• Example: PII = <group>. This can work for many assets.
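The "PII = <group>" idea, where one tag policy covers many assets, can be sketched as a lookup against a cached asset-to-tag mapping. The asset names, the "hr-admins" group, and the `is_allowed` function are hypothetical and only illustrate the shape of the check.

```python
# Hypothetical cached mapping of (component, resource) -> tags,
# standing in for the tag and asset mapping that Ranger caches.
TAG_CACHE = {
    ("hive", "hr.employee.ssn"): {"PII"},
    ("hdfs", "/data/hr/employee"): {"PII"},
    ("kafka", "hr-events"): {"PII"},
    ("hive", "finance.tax_2010"): set(),
}

# One policy per tag: tag -> groups allowed to access assets carrying that tag.
TAG_POLICIES = {"PII": {"hr-admins"}}

def is_allowed(component, resource, user_groups):
    """Deny when the asset carries a tag whose policy excludes all of the user's groups.

    Untagged assets are allowed here for simplicity; a real deployment would
    still apply its resource based policies.
    """
    groups = set(user_groups)
    for tag in TAG_CACHE.get((component, resource), set()):
        allowed_groups = TAG_POLICIES.get(tag)
        if allowed_groups is not None and not (allowed_groups & groups):
            return False
    return True
```

The single "PII" entry in `TAG_POLICIES` governs the Hive column, the HDFS path, and the Kafka topic alike, so tagging a new asset requires no new policy.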
Apache Ranger: Dynamic classification based Security
Apache Ranger: Introduction
Centralized authorization and auditing across Hadoop components
• HDFS, Hive, HBase, Knox, Storm, YARN, Kafka, Solr, ..
• Audit logs to: Solr, HDFS, RDBMS, Log4j, ..
Resource based security
• Policies for a specific set of resources
• Requires revision of policies as resources get added/moved
Classification based security
• Policies for classifications and not for specific resources
• A single policy protects resources in multiple components
• As the classification of a resource changes, the appropriate policies are automatically applied
• Enables separation of duties: resource-classification and security policies
Apache Ranger: Authorization and Auditing
[Diagram: Enterprise Users access Hadoop Components (HDFS, Hive Server2, HBase, Knox, Storm, YARN, Kafka, Solr), each with a Ranger Plugin. Also shown: Ranger Administration Portal, Ranger Policy Store, and Ranger Audit Store (Log4j, HDFS, Solr, RDBMS).]
Apache Atlas + Ranger integration
[Diagram: Apache Atlas + Ranger integration]
• Atlas: Metastore (Tags, Assets, Entities)
• Notification Framework: Kafka Topics; notification of metadata updates
• Ranger: Atlas Client (subscribes to topic, gets metadata updates); PDP; Resource Cache
• Annotations: Message durability, Optimized for speed, Event driven updates
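The event driven flow in the diagram (Atlas publishes tag changes, a client on the Ranger side applies them to a resource cache) can be sketched as follows. An in-process `queue.Queue` stands in for the Kafka topic, and the message fields ("op", "resource", "tag") are assumptions made for illustration, not the actual Atlas notification format.

```python
import json
import queue

class ResourceTagCache:
    """Hypothetical in-memory cache of resource -> tags, updated from notifications."""

    def __init__(self):
        self._tags = {}

    def apply(self, event):
        """Apply one tag change event to the cache."""
        resource = event["resource"]
        if event["op"] == "TAG_ADD":
            self._tags.setdefault(resource, set()).add(event["tag"])
        elif event["op"] == "TAG_DELETE":
            self._tags.get(resource, set()).discard(event["tag"])

    def tags_for(self, resource):
        """Return a copy of the tags currently cached for a resource."""
        return set(self._tags.get(resource, ()))

def drain(topic, cache):
    """Consume every pending message from the topic stand-in; return the count applied."""
    applied = 0
    while True:
        try:
            raw = topic.get_nowait()
        except queue.Empty:
            return applied
        cache.apply(json.loads(raw))
        applied += 1

# Simulate Atlas publishing a tag change and the subscriber picking it up.
topic = queue.Queue()
topic.put(json.dumps({"op": "TAG_ADD", "resource": "hr.employee.ssn", "tag": "PII"}))
cache = ResourceTagCache()
drain(topic, cache)
```

Because authorization decisions read only from the local cache, lookups stay fast, and the cache changes only when a notification arrives.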
DEMO
Setup for the demo
Database   Table      Columns
finance    tax_2010   Table access expires on 12/31/2015
hr         employee   SSN tagged as PII

Users:
• analyst: No access to PII, no access to expired data
• admin: Access to PII, access to expired data
Apache Atlas: tag a column as PII
1. Search for the column
2. Select the column
3. Select ‘Tags’ tab
4. Click on ‘Add Tag’
5. Select PII tag & click ‘Save’
Apache Atlas: tag a table for expiry_date
Select EXPIRES_ON tag and enter value for expiry_date
Apache Ranger: authorization policy for PII
Pick the tag
Deny access to PII data for all users, with the exception of the ‘admin’ user
Apache Ranger: authorization policy for expiry_date
Pick the tag
Deny access to data after the expiry date, with the exception of the ‘admin’ user
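The two demo policies (deny on PII and deny after expiry, each with an exception for ‘admin’) can be sketched against the demo data. The data structures and the `can_access` function are hypothetical, and the sketch assumes that access is denied starting the day after the expiry date.

```python
from datetime import date

# Tags from the demo setup; tag and attribute names follow the slides.
RESOURCE_TAGS = {
    "finance.tax_2010": {"EXPIRES_ON": {"expiry_date": date(2015, 12, 31)}},
    "hr.employee.ssn": {"PII": {}},
}

# Users exempted from each deny policy.
DENY_EXCEPTIONS = {"PII": {"admin"}, "EXPIRES_ON": {"admin"}}

def tag_applies(tag, attrs, today):
    """EXPIRES_ON triggers only after its expiry date (assumption); other tags always apply."""
    if tag == "EXPIRES_ON":
        return today > attrs["expiry_date"]
    return True

def can_access(user, resource, today):
    """Deny when any applicable tag's policy does not list the user as an exception."""
    for tag, attrs in RESOURCE_TAGS.get(resource, {}).items():
        if tag_applies(tag, attrs, today) and user not in DENY_EXCEPTIONS.get(tag, set()):
            return False
    return True
```

With these definitions, ‘analyst’ is denied the SSN column on any date and is denied tax_2010 from 1 January 2016 onward, while ‘admin’ is allowed in both cases.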
Apache Ranger: access audit logs
• Tags associated with resources
• Resources accessed
• Policy that allowed/denied access
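An audit entry carrying these three pieces of information might look like the record below. The field names are hypothetical and do not reflect Ranger's actual audit schema.

```python
import json
from dataclasses import dataclass, asdict, field

@dataclass
class AccessAuditEvent:
    """Hypothetical audit record: who accessed what, which policy decided, which tags applied."""
    user: str
    resource: str
    access_type: str
    allowed: bool
    policy_id: int
    tags: list = field(default_factory=list)

    def to_json(self):
        """Serialize with sorted keys so log lines are stable and easy to compare."""
        return json.dumps(asdict(self), sort_keys=True)

# Example: the analyst's denied read of the PII-tagged column.
event = AccessAuditEvent(
    user="analyst",
    resource="hr.employee.ssn",
    access_type="select",
    allowed=False,
    policy_id=7,  # hypothetical policy identifier
    tags=["PII"],
)
```

Recording the deciding policy and the resource's tags in the same entry makes it possible to trace a denial back to the classification that caused it.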
Questions
References
• Apache Atlas
  • http://atlas.apache.org
  • http://hortonworks.com/apache/atlas
• Apache Ranger
  • http://ranger.apache.org
  • http://hortonworks.com/apache/ranger
• Apache Ranger wiki
  • https://cwiki.apache.org/confluence/display/RANGER
• Tag based policies
  • https://cwiki.apache.org/confluence/display/RANGER/Tag+Based+Policies
• Geo-location based policies
  • https://cwiki.apache.org/confluence/display/RANGER/Geo-location+based+policies