Hadoop Operations: How to Secure and Control Cluster Access
Eric Sammer
Engineering Manager, Cloudera – Author, Hadoop Operations
We’re here to talk about…
• How common security constructs map onto services
• How these constructs work in Hadoop
• Security model and options for a few critical components
• A few DOs and DON’Ts
Warning
• Security in distributed systems is complicated
• This is just a whirlwind tour – Do your homework
• Assumptions
  • You’re familiar with Hadoop’s architecture and functionality
  • You have a basic understanding of Kerberos
The Three Questions
• Identity: Who are you?
• Authentication: Can you prove it?
• Authorization: Are you allowed to do that?
Hadoop’s “Simple” Mode
• Identity: Usually the OS user of the client application
• Authentication: Trust
• Easy to impersonate other users
• Goal: stop good users from doing silly things, not stop malicious ones
• The default
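To make the "trust" point concrete, here is a minimal sketch of how a simple-mode client might decide which identity to claim. It assumes a Hadoop version that honors the HADOOP_USER_NAME environment variable; the function name `resolve_simple_identity` and its parameters are hypothetical and are not part of Hadoop.

```python
import getpass
import os


def resolve_simple_identity(env=None, os_user=None):
    """Return the user name a simple-mode client would claim (illustrative only)."""
    env = os.environ if env is None else env
    os_user = getpass.getuser() if os_user is None else os_user
    # Assumption: an override variable wins over the real OS user.
    # Nothing on the cluster side verifies the claimed name.
    return env.get("HADOOP_USER_NAME") or os_user


# Any client can claim to be the HDFS superuser simply by setting a variable.
print(resolve_simple_identity({"HADOOP_USER_NAME": "hdfs"}, os_user="alice"))
print(resolve_simple_identity({}, os_user="alice"))
```

The cluster accepts whatever name the client reports, which is why simple mode protects only against accidents.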
Hadoop’s “Simple” Mode
• Use simple mode when:
  • No regulatory or compliance concerns
  • All users are trusted
  • Single purpose cluster (single-tenancy)
Hadoop’s “Secure” Mode
• Identity: Local part of the Kerberos principal
• Authentication: Kerberos
• User impersonation not possible except in specific (admin-configured) situations
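As a rough illustration of "local part of the Kerberos principal", the sketch below strips the instance and realm from a principal of the form primary/instance@REALM. In a real cluster this mapping is governed by the hadoop.security.auth_to_local rules; the function `short_name` and the realm EXAMPLE.COM are assumptions made for this example and cover only the simplest default-realm case.

```python
def short_name(principal, default_realm):
    """Map 'primary[/instance]@REALM' to a short user name (simplified)."""
    name, sep, realm = principal.rpartition("@")
    if not sep:
        # No realm given: treat the whole string as the name part.
        name, realm = principal, default_realm
    if realm != default_realm:
        # Assumption: principals from other realms need an explicit rule.
        raise ValueError(f"no mapping rule for realm {realm!r}")
    primary = name.split("/", 1)[0]
    if not primary:
        raise ValueError("principal has an empty primary component")
    return primary


print(short_name("alice@EXAMPLE.COM", "EXAMPLE.COM"))
print(short_name("hdfs/nn1.example.com@EXAMPLE.COM", "EXAMPLE.COM"))
```

Both a user principal and a service principal reduce to a plain name (here `alice` and `hdfs`), and that name is what authorization checks operate on.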
Hadoop’s “Secure” Mode
• Use secure mode when:
  • Real regulatory concerns
  • Untrusted users
  • Running on untrusted infrastructure or in an untrusted environment
  • Multi-purpose cluster (multi-tenancy)
Identity Management
• Always
  • Use a central user database/directory service for OS users
  • Wire up the Kerberos KDC to use the central directory
• Never
  • Use service users (e.g. hdfs, mapred) for anything other than running services
  • Share accounts, even for admin purposes
Authentication
• Simple mode: Trust what the client provides
• Secure mode: Kerberos
  • Keytabs for services
  • Many options: Passphrase, M/TFA, X.509 for users
  • Depends on Kerberos implementation
Authorization
• Inherently service specific
• Granularity of control varies by platform component
• Examples
  • Filesystem object-level, POSIX-style
  • Role-based access control (RBAC)
  • Access control lists (ACLs)
  • Deferral to underlying components
HDFS Security Model
• POSIX-style users and groups
• Traditional Unix-style octal permissions
  • Files: no execute, sticky, setuid, setgid
  • Directories: no setuid, always behave as if setgid is set
• Authorization checks performed by NameNode
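The POSIX-style check described above can be sketched as follows. This is a simplified model of an owner/group/other evaluation against an octal mode, not the NameNode's actual code; the function `is_permitted` and the sample users and groups are hypothetical.

```python
READ, WRITE, EXECUTE = 4, 2, 1


def is_permitted(user, user_groups, owner, group, mode, wanted):
    """Check `wanted` bits against an octal mode, POSIX style (simplified)."""
    if user == owner:
        bits = (mode >> 6) & 7   # owner class
    elif group in user_groups:
        bits = (mode >> 3) & 7   # group class
    else:
        bits = mode & 7          # everyone else
    # Exactly one class applies; there is no fall-through to a wider class.
    return (bits & wanted) == wanted


# A file owned by alice:analysts with mode 640.
print(is_permitted("alice", ["analysts"], "alice", "analysts", 0o640, READ | WRITE))
print(is_permitted("bob", ["analysts"], "alice", "analysts", 0o640, WRITE))
```

Only the first matching class is consulted, so an owner with fewer bits than the group is still limited to the owner bits.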
HDFS User Levels
| User Level | Privileges | Description and Notes |
|---|---|---|
| Cluster super user | All | User who started the daemons. Default: hdfs |
| Administrators | All | Configuration property dfs.permissions.supergroup specifies the name of the group of admins. Default: supergroup |
| Normal user | Object-level | All other users are beholden to the file and directory permissions, as specified. |
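The three levels in the table can be expressed as a small classification step that runs before any object-level check. The sketch below is an illustration only: `hdfs_user_level` is a hypothetical helper, and its defaults simply mirror the defaults listed in the table.

```python
def hdfs_user_level(user, user_groups, superuser="hdfs", supergroup="supergroup"):
    """Classify a user into one of the three HDFS levels (illustrative)."""
    if user == superuser:
        return "cluster super user"
    if supergroup in user_groups:
        # Members of the configured admin group also bypass object checks.
        return "administrator"
    return "normal user"


def bypasses_permission_checks(user, user_groups):
    """Only normal users are subject to file and directory permissions."""
    return hdfs_user_level(user, user_groups) != "normal user"


print(hdfs_user_level("hdfs", []))
print(hdfs_user_level("dana", ["supergroup"]))
print(hdfs_user_level("alice", ["analysts"]))
```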
MapReduce Security Model
• Configurable job queues
• Queues have associated ACLs
• ACLs control job submission and administrative ops
• Authorization checks performed by JobTracker
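A queue ACL check might look like the sketch below. It assumes the common Hadoop ACL string format: comma-separated users, a space, then comma-separated groups, with "*" meaning everyone and a single space meaning no one. The helper names `parse_acl` and `acl_allows` are hypothetical, and this is not the JobTracker's implementation.

```python
def parse_acl(spec):
    """Parse 'user1,user2 group1,group2'; return None when everyone is allowed."""
    if spec.strip() == "*":
        return None
    users_part, _, groups_part = spec.partition(" ")
    users = {u for u in users_part.split(",") if u}
    groups = {g for g in groups_part.strip().split(",") if g}
    return users, groups


def acl_allows(spec, user, user_groups):
    """Return True if the user, or any of the user's groups, is listed."""
    parsed = parse_acl(spec)
    if parsed is None:
        return True
    users, groups = parsed
    return user in users or bool(groups & set(user_groups))


submit_acl = "alice,bob analysts"  # hypothetical value of a queue's submit ACL
print(acl_allows(submit_acl, "carol", ["analysts"]))
print(acl_allows(" ", "alice", ["analysts"]))
```

A value of a single space parses to empty user and group sets, so it denies everyone except the users who bypass ACLs entirely.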
MapReduce User Levels
| User Level | Privileges | Queue | Description and Notes |
|---|---|---|---|
| Cluster super user | All | All | User who started the daemons. Default: mapred |
| Cluster admins | All | All | Configuration property mapred.cluster.administrators specifies the admin ACL. |
| Queue admins | All | Single | Configuration property mapred.queue.queue-name.acl-administer-jobs specifies the admin ACL. |
| Job owner | Submit, admin on own jobs | Queue containing job | Configuration property mapred.queue.queue-name.acl-submit-job specifies the submission ACL. |
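The levels in the table can be read as an ordered series of checks. The sketch below makes that order explicit under simplifying assumptions: ACLs are modeled as plain sets of user names, and the helper names, the queue name `etl`, and the sample users are all hypothetical. Only the property name pattern comes from the table.

```python
def queue_acl_property(queue_name, operation):
    """Build a per-queue ACL property name, e.g. for 'submit-job'."""
    return f"mapred.queue.{queue_name}.acl-{operation}"


def can_administer_job(user, job_owner, job_queue,
                       cluster_admins, queue_admins, superuser="mapred"):
    """Walk the user levels from broadest to narrowest (simplified)."""
    if user == superuser:
        return True                       # cluster super user: all queues
    if user in cluster_admins:
        return True                       # cluster admins: all queues
    if user in queue_admins.get(job_queue, set()):
        return True                       # queue admins: this queue only
    return user == job_owner              # job owner: own jobs only


queue_admins = {"etl": {"erin"}}
print(queue_acl_property("etl", "administer-jobs"))
print(can_administer_job("erin", "alice", "etl", {"dana"}, queue_admins))
print(can_administer_job("erin", "alice", "adhoc", {"dana"}, queue_admins))
```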
Systems on top of MapReduce
• Hive/Impala are the most featureful today
  • Without Sentry: Defers to HDFS object permissions
  • With Sentry, fine-grained RBAC on logical constructs (New!)
    • Scope: Server, database, table, view
    • Privileges: ALL, SELECT, INSERT, TRANSFORM
    • Removes direct access to files
    • Supports traditional techniques for controlling column-level access (i.e. views without sensitive columns)
• Everything else: HDFS object permissions
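To show what RBAC on logical constructs means in practice, here is a minimal sketch in which groups map to roles and roles carry privileges on a scope (server, database, table or view). It is a toy model rather than Sentry's policy engine: the policy data, the names `GROUP_ROLES`, `ROLE_GRANTS`, and `is_authorized`, and the view `orders_public` are all invented for illustration.

```python
# Hypothetical policy: groups map to roles, roles map to (scope, privilege).
GROUP_ROLES = {
    "analysts": {"analyst_role"},
    "etl": {"etl_role"},
}
ROLE_GRANTS = {
    # SELECT on one view that omits the sensitive columns.
    "analyst_role": {(("server1", "sales", "orders_public"), "SELECT")},
    # ALL on the whole database, which covers every table and view in it.
    "etl_role": {(("server1", "sales"), "ALL")},
}


def is_authorized(user_groups, target, privilege):
    """Return True if any role of any group grants `privilege` on `target`."""
    for group in user_groups:
        for role in GROUP_ROLES.get(group, ()):
            for scope, granted in ROLE_GRANTS.get(role, ()):
                # A grant covers its own scope and everything beneath it.
                covers = target[:len(scope)] == scope
                if covers and granted in ("ALL", privilege):
                    return True
    return False


orders = ("server1", "sales", "orders")
orders_public = ("server1", "sales", "orders_public")
print(is_authorized(["analysts"], orders_public, "SELECT"))
print(is_authorized(["analysts"], orders, "SELECT"))
```

Granting analysts access only to the view, and not to the underlying table, is the column-level technique mentioned in the last sub-bullet.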
A note on auditing...
• Winds up being service-specific
• Cloudera Navigator handles this (and more)
What we didn’t talk about
• Configuration and deployment
  • Lots of options, lots of moving parts
  • Integration with existing infrastructure
  • Cloudera Manager turns days or weeks of work into minutes or hours; built to handle exactly these challenges
• The other 80%: YARN applications, ZooKeeper, Flume, Sqoop, Oozie, Hue, Cloudera Search (Solr), multi-tenant gateway services, all of the administrative web interfaces, encryption of data at rest and on the wire, network footprint and exposure, ...
Further reading and references
• Hadoop Operations, Chapter 6: Identity, Authentication, and Authorization (E. Sammer, O’Reilly)
• Kerberos: The Definitive Guide (J. Garman, O’Reilly)
• CDH4 Security Guide
• CDH4 Sentry Guide
• Cloudera Manager
• Cloudera Navigator
Submit questions in the Q&A panel
Watch on-demand video of this webinar and many more at http://cloudera.com
Follow Eric @esammer
Follow Cloudera @ClouderaU
Learn more at Strata + Hadoop World:
http://tinyurl.com/hadoopworld
Thank you for attending!