
Developing Enterprise Policy Over Big Data

Jim McGann

Index Engines

Developing Enterprise Policy Over big Data © 2013 Storage Networking Industry Association. All Rights Reserved.

SNIA Legal Notice

The material contained in this tutorial is copyrighted by the SNIA unless otherwise noted. Member companies and individual members may use this material in presentations and literature under the following conditions:

• Any slide or slides used must be reproduced in their entirety without modification.
• The SNIA must be acknowledged as the source of any material used in the body of any document containing material from these presentations.

This presentation is a project of the SNIA Education Committee. Neither the author nor the presenter is an attorney and nothing in this presentation is intended to be, or should be construed as legal advice or an opinion of counsel. If you need legal advice or a legal opinion please contact your attorney. The information presented herein represents the author's personal opinion and current understanding of the relevant issues involved. The author, the presenter, and the SNIA do not assume any responsibility or liability for damages arising out of any reliance on or use of this information. NO WARRANTIES, EXPRESS OR IMPLIED. USE AT YOUR OWN RISK.


Total Enterprise Data Growth

IDC estimates that the volume of digital data is growing at 40% to 50% per year. By 2020, IDC predicts the total will have reached 40 zettabytes (ZB). According to Gartner, 42% of respondents state they will have invested in big data by 2014.
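To put a 40% to 50% annual growth rate in perspective, the compound-growth arithmetic below computes the doubling time, ln 2 / ln(1 + r), and the growth multiple after five years. This is plain arithmetic on the quoted rates; the function names are illustrative only.

```python
import math

def doubling_time_years(annual_growth: float) -> float:
    """Years for a quantity growing at `annual_growth` (0.40 means 40%) to double."""
    return math.log(2) / math.log(1 + annual_growth)

def project(volume: float, annual_growth: float, years: int) -> float:
    """Volume after `years` of compound growth at `annual_growth` per year."""
    return volume * (1 + annual_growth) ** years

for rate in (0.40, 0.50):
    print(f"{rate:.0%}/year: doubles every {doubling_time_years(rate):.2f} years, "
          f"x{project(1.0, rate, 5):.1f} after 5 years")
```

At these rates, stored data roughly doubles every two years, which is why capacity planning alone rarely keeps up.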


Big Data Impact

Figure: big data impact on IT budget, IT legal & compliance support, and IT satisfaction


Reporting and Analytics

IT tools are not designed to provide file-level insight and analytics against big data.

• Access logs: security information, but no detailed file information
• Block-level/capacity planning: no insight into file-level data
• File system metadata: light metadata
• File metadata: full metadata, duplicate information (MD5), and optionally full content

Business units need file-level knowledge to determine disposition.
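The file-metadata row above mentions duplicate information via MD5. A minimal sketch of how MD5-based duplicate detection typically works, assuming a locally mounted share: files are first grouped by size (only same-size files can be identical), and only those candidates are hashed. `find_duplicates` and `md5_of_file` are hypothetical helpers built on the Python standard library, not the product's implementation. MD5 is adequate for spotting redundant copies but is not collision-resistant against deliberate tampering.

```python
import hashlib
import os
from collections import defaultdict

def md5_of_file(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so large files do not need to fit in memory."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def find_duplicates(root):
    """Return {md5_hex: [paths]} for every group of 2+ identical files under `root`."""
    by_size = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path):
                by_size[os.path.getsize(path)].append(path)
    by_digest = defaultdict(list)
    for paths in by_size.values():
        if len(paths) < 2:
            continue  # a file with a unique size cannot have a duplicate
        for path in paths:
            by_digest[md5_of_file(path)].append(path)
    return {d: sorted(p) for d, p in by_digest.items() if len(p) > 1}
```

Summing the sizes of all but one file in each group gives the capacity a "Redundant" clean-up could reclaim.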


Big Unstructured Data

Traditional big data includes machine-generated content such as transactional data and logs. Adding user-generated data such as email and documents brings value and knowledge to big data. Data is an asset, not a liability! Mining value from unstructured user data adds knowledge and valuable intellectual property to big data analytics.


Big Data vs. Better Data

• Profile big data using reports and analysis
• Classify data into groups: business value, no business value, redundant, risk
• Work with business units to determine disposition
• Manage big data more intelligently
• Purge what is no longer required
• Turn Big Data into Better Data


Data Profiling Technology

• High-speed indexing platform: integrates into the storage environment and maintains an index of unstructured content
• Access to all sources: primary and secondary storage, including legacy backup tape data
• Enterprise ready: cost effective and easy to deploy incrementally


What is Metadata?

• File properties (user files): dates (modified, accessed, created), size, file name/path, author/owner, signature
• Email properties: dates (sent, received, deleted), to/from/cc/bcc, mailboxes/folders, signature
• Backup properties: backup time, server, volume/backup set
• Content (optional): full text, PII (SSN/CCN)
• Location: servers, tapes, desktops
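Optional content profiling can flag PII such as Social Security numbers (SSN) and credit card numbers (CCN). A minimal sketch of a common approach, assuming plain extracted text: a regular expression finds candidates, and the Luhn checksum discards most random digit strings that merely look like card numbers. The patterns and `find_pii` are hypothetical illustrations, not the product's detection rules; production scanners use broader patterns and surrounding context.

```python
import re

# Illustrative patterns only.
# SSN: NNN-NN-NNNN, excluding area numbers 000, 666 and 9xx and all-zero groups.
SSN_RE = re.compile(r"\b(?!000|666|9\d\d)\d{3}-(?!00)\d{2}-(?!0000)\d{4}\b")
# CCN candidate: 13 to 19 digits, optionally separated by single spaces or dashes.
CCN_RE = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")

def luhn_valid(number: str) -> bool:
    """Luhn checksum: double every second digit from the right, sum, test mod 10."""
    digits = [int(c) for c in number if c.isdigit()]
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

def find_pii(text: str) -> dict:
    """Return SSN matches and Luhn-valid card-number candidates found in `text`."""
    ssns = SSN_RE.findall(text)
    ccns = [m.group(0) for m in CCN_RE.finditer(text) if luhn_valid(m.group(0))]
    return {"ssn": ssns, "ccn": ccns}
```

Files with any hit would then be tagged for the "Risk" policy group rather than acted on automatically.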


Unstructured Data Profiling

• Scan data sources (user shares, departmental servers, email, backup tapes) using high-speed indexing technology
• Extract file and email metadata from unstructured user data: dates (modified, accessed, created), user (author, To/CC/BCC), location (path), file (name, size), and backup (B/U set, B/U date, B/U host)
• Generate a rich metadata index that is incrementally updated
• Combine queries/filters and reports on the metadata index
• Make decisions about data (disposition)
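An incrementally updated index only needs to redo expensive work (hashing, full-text extraction) for files that are new or have changed since the last scan. A minimal sketch, assuming a locally mounted share and an in-memory dict that covers a single root: `FileMeta`, `scan`, and `update_index` are hypothetical names, and "changed" is approximated by a different size or modification time.

```python
import os
from dataclasses import dataclass

@dataclass(frozen=True)
class FileMeta:
    """Light (metadata-only) profile of one file; fields are illustrative."""
    path: str
    size: int         # bytes
    modified: float   # st_mtime, seconds since the epoch
    accessed: float   # st_atime
    owner_uid: int    # numeric owner; mapping to a user name needs a directory lookup

def scan(root):
    """Yield FileMeta for every file under `root` that can be stat'ed."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path)
            except OSError:
                continue  # vanished, unreadable, or a broken link: skip it
            yield FileMeta(path, st.st_size, st.st_mtime, st.st_atime, st.st_uid)

def update_index(index, root):
    """Refresh `index` (path -> FileMeta) in place; return (added, changed, removed) paths.

    Only `added` and `changed` need expensive follow-up work downstream.
    """
    seen = set()
    added, changed = [], []
    for meta in scan(root):
        seen.add(meta.path)
        old = index.get(meta.path)
        if old is None:
            added.append(meta.path)
        elif (old.size, old.modified) != (meta.size, meta.modified):
            changed.append(meta.path)
        index[meta.path] = meta
    removed = [path for path in index if path not in seen]
    for path in removed:
        del index[path]
    return added, changed, removed
```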


Performance/Formats

• NFS/CIFS crawling or NDMP
• Bandwidth can be throttled
• Incremental updates
• Schedule can be defined

Formats supported
• Unstructured user data
• Email (Exchange, Notes)

Profiling options
• Light: metadata only
• Full: full content/text
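The slide notes that crawl bandwidth can be throttled but does not say how. One common technique is a token bucket, sketched below as a generic illustration rather than the product's mechanism. `TokenBucket` and its parameters are hypothetical; the clock and sleep functions are injectable so the behavior can be checked without real waiting.

```python
import time

class TokenBucket:
    """Allow about `rate` bytes/second on average, with bursts up to `capacity` bytes."""

    def __init__(self, rate, capacity, clock=time.monotonic, sleep=time.sleep):
        self.rate = float(rate)
        self.capacity = float(capacity)
        self.tokens = float(capacity)   # start full so an initial burst is allowed
        self.clock = clock
        self.sleep = sleep
        self.last = clock()

    def _refill(self):
        """Add tokens for the time elapsed since the last refill, capped at capacity."""
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now

    def consume(self, nbytes):
        """Block until `nbytes` may be transferred, then spend that many tokens."""
        if nbytes > self.capacity:
            raise ValueError("request larger than bucket capacity")
        self._refill()
        while self.tokens < nbytes:
            self.sleep((nbytes - self.tokens) / self.rate)
            self._refill()
        self.tokens -= nbytes
```

A crawler would call `bucket.consume(len(chunk))` before processing each chunk it reads; the `capacity` setting controls how bursty the traffic may be.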


Active Directory/LDAP Integration

Integrates with Active Directory (AD) and Lightweight Directory Access Protocol (LDAP) directories to take advantage of user and group information:
• Active vs. inactive users
• Departmental groups
• Reports summarized by group

Supports chargebacks by department

Security audits
• Query user ACLs to determine read/write/browse permissions


Data Classification Policies


• Abandoned: owned by ex-employees and not accessed in years
• Aged: not accessed in more than three years
• Redundant: duplicate content
• Personal: multimedia files such as iTunes and movies
• Risk: sensitive content such as PII and legal hold
• Archive: data with long-term business value (value-based archives)
• Active: managed in place to determine future disposition
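The policy groups above can be expressed as ordered rules over a metadata record. A minimal sketch under stated assumptions: `FileRecord` and `classify` are hypothetical, the rule order (Risk first so no clean-up rule overrides a hold) is an assumed precedence, the two-year "Abandoned" threshold and the multimedia extension list are assumptions, and only the three-year "Aged" rule comes from the slide.

```python
import os
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class FileRecord:
    """Hypothetical per-file record drawn from a metadata index."""
    path: str
    owner: str
    last_accessed: datetime
    is_duplicate: bool = False    # e.g. MD5 matches another file
    has_pii: bool = False         # e.g. SSN/CCN found by content profiling
    on_legal_hold: bool = False
    business_value: bool = False  # flagged by a business unit for value-based archiving

AGED_AFTER = timedelta(days=3 * 365)       # from the slide: "more than three years"
ABANDONED_AFTER = timedelta(days=2 * 365)  # assumed reading of "no access in years"
MULTIMEDIA_EXTS = {".mp3", ".m4a", ".aac", ".wav", ".mp4", ".m4v", ".mov", ".avi"}  # assumed

def classify(rec: FileRecord, ex_employees: set, now: datetime) -> str:
    """Return one policy label; rules are checked in an assumed order of precedence."""
    idle = now - rec.last_accessed
    if rec.has_pii or rec.on_legal_hold:
        return "Risk"            # never let a clean-up rule override a hold
    if rec.business_value:
        return "Archive"
    if rec.owner in ex_employees and idle > ABANDONED_AFTER:
        return "Abandoned"
    if rec.is_duplicate:
        return "Redundant"
    if os.path.splitext(rec.path)[1].lower() in MULTIMEDIA_EXTS:
        return "Personal"
    if idle > AGED_AFTER:
        return "Aged"
    return "Active"              # manage in place; revisit later
```

In practice the set of ex-employees would come from the AD/LDAP "inactive users" view.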


Data Profiling in Action

Figure: profile of 100TB of unstructured user data; owners shown include administrator, john.doe, sally.smith, bugs.bunny, and clark.kent


Cleaning Up Admin Owned Files

• Reassign owner based on metadata properties, location, content, file name, etc.

• Reassign owner based on location path, extract path info into file name.

• Tag content based on metadata properties

Owner = Admin
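Reassigning admin-owned files by location path can be sketched as follows, assuming per-user home folders under a single root such as `/shares/users/<username>/`. The root, the admin account names, and `infer_owner` are all hypothetical; files outside the per-user tree keep their current owner for manual review.

```python
from pathlib import PurePosixPath

ADMIN_OWNERS = {"administrator", "admin"}  # assumed names of admin accounts

def infer_owner(path, current_owner, users_root="/shares/users"):
    """If an admin owns a file inside a per-user folder, return that folder's user name."""
    if current_owner.lower() not in ADMIN_OWNERS:
        return current_owner  # already owned by a real user
    try:
        rel = PurePosixPath(path).relative_to(PurePosixPath(users_root))
    except ValueError:
        return current_owner  # outside the per-user tree; leave for manual review
    # rel.parts[0] is the user folder; require a file below it, not the folder itself
    return rel.parts[0] if len(rel.parts) > 1 else current_owner
```

For example, `infer_owner("/shares/users/john.doe/report.doc", "Administrator")` yields `"john.doe"`.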


Cleaned Owners Report

Figure: owner report after clean-up; owners shown include daffy.duck, john.doe, sally.smith, bugs.bunny, and clark.kent


Departmental Report/Chargebacks

Figure: capacity by department (Manufacturing, R&D, Sales, Marketing, Legal, HR), with departments resolved through Active Directory/LDAP
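A departmental chargeback report reduces to grouping file sizes by the owner's department and applying a rate. A minimal sketch, assuming `(owner, size_bytes)` pairs from a metadata index: the owner-to-department dict stands in for AD/LDAP group lookups, and `chargeback` and the per-GiB rate are hypothetical.

```python
from collections import defaultdict

GIB = 1024 ** 3

def chargeback(files, owner_to_dept, rate_per_gib_month):
    """Return {department: (GiB used, monthly charge)}.

    files: iterable of (owner, size_bytes) pairs from a metadata index.
    owner_to_dept: owner -> department; in practice this would come from
    AD/LDAP group membership (a plain dict stands in for that lookup here).
    """
    usage = defaultdict(int)
    for owner, size in files:
        # owners missing from the directory (e.g. ex-employees) are surfaced, not hidden
        usage[owner_to_dept.get(owner, "Unassigned")] += size
    return {
        dept: (total / GIB, round(total / GIB * rate_per_gib_month, 2))
        for dept, total in sorted(usage.items())
    }
```

A large "Unassigned" total is itself a useful signal: it usually points at abandoned data.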


Data Policy and Disposition

Abandoned – Defensibly Delete

Aged – Migrate to Lower Cost Storage/Cloud

Redundant – Purge and Consolidate

Personal – Notify and Enforce Policy

Risk – Secure in Legal Hold Archive

Intellectual Property – Preserve in Archive

Active – Monitor


Reclaim and Manage Capacity

Defensibly Deleted

Migrated to New Platform/Cloud

Consolidation of Redundant Data

Personal Files Removed

Sensitive Content Archived

Active Content Managed


Workflow: Aged Data

Filter on locations

Report on last modified age

Analyze capacity


Workflow: Aged Data

Filter on 5+ years

Report on owners
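The two aged-data slides describe a filter-then-report workflow: filter on location, report capacity by last-modified age, then narrow to files five or more years old and report by owner. A minimal sketch over hypothetical index rows (`Entry`, `capacity_by_age`, and `aged_owners` are illustrative names). The location filter is a plain string-prefix match, so pass a prefix ending in a path separator.

```python
from collections import defaultdict
from datetime import datetime
from typing import NamedTuple

class Entry(NamedTuple):
    """Hypothetical row from a metadata index."""
    path: str
    owner: str
    size: int          # bytes
    modified: datetime

def years_since(ts, now):
    return (now - ts).days / 365.25

def capacity_by_age(entries, location_prefix, now):
    """Filter on location, bucket by whole years since last modified, sum bytes per bucket."""
    buckets = defaultdict(int)
    for e in entries:
        if e.path.startswith(location_prefix):
            buckets[int(years_since(e.modified, now))] += e.size
    return dict(sorted(buckets.items()))

def aged_owners(entries, location_prefix, now, min_years=5):
    """Keep files untouched for `min_years` or more; total their bytes per owner, largest first."""
    owners = defaultdict(int)
    for e in entries:
        if e.path.startswith(location_prefix) and years_since(e.modified, now) >= min_years:
            owners[e.owner] += e.size
    return dict(sorted(owners.items(), key=lambda kv: kv[1], reverse=True))
```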


Disposition Options

Copy, delete, and archive are available in the GUI, along with CSV text file output.

Use the detailed file listing to determine disposition of data:

Purge, move, copy, encrypt, etc.
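Working from an exported file listing, a disposition plan can be generated and reviewed before anything is touched. A minimal dry-run sketch: the CSV columns (`path`, `size`, `category`) are an assumed layout rather than a documented export format, `plan_disposition` is hypothetical, and the category-to-action mapping follows the Data Policy and Disposition slide.

```python
import csv
import io

# Assumed mapping, following the Data Policy and Disposition slide.
ACTIONS = {
    "Abandoned": "delete",   # defensibly delete
    "Redundant": "delete",   # purge and consolidate
    "Aged": "move",          # migrate to lower-cost storage/cloud
    "Risk": "copy",          # copy into a legal hold archive
    "Archive": "copy",       # preserve in archive
}

def plan_disposition(csv_text):
    """Dry run: turn a CSV file listing into (action, path) pairs plus bytes per action.

    Assumed columns: path, size (bytes), category. No file is touched here.
    """
    plan, totals = [], {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        action = ACTIONS.get(row["category"])
        if action is None:
            continue  # e.g. Active or Personal: monitor or notify, no file action
        plan.append((action, row["path"]))
        totals[action] = totals.get(action, 0) + int(row["size"])
    return plan, totals
```

Keeping planning separate from execution lets legal and business owners sign off on the listing and the reclaimable totals before any purge or move runs.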


Case Studies

• Manufacturing
  Overview: Legal issues related to unmanaged PSTs across corporate networks.
  Solution: Audit and clean up 14,000 PSTs across 500TB.

• Business Services
  Overview: 550TB of legal hold data on Data Domain; upgrade required.
  Solution: Extract the 1TB of actual legal hold data and reclaim capacity.

• Financial Services
  Overview: Clean up user shares according to corporate policies.
  Solution: Execute a chargeback plan on a 40TB server with a map of usage.

• Top 5 Financial Services
  Overview: Prepare for mortgage lawsuits and archive 175,000 users' email.
  Solution: Profile 220,000 legacy tapes (17PB) and extract relevant data to an archive.

• Oil and Gas
  Overview: Migrate research data to a cloud archive for long-term preservation.
  Solution: Find files by type and department across 2PB of storage and migrate them to the cloud.


Data Profiling Advantages

Better insight into all corporate data assets

Streamline storage capacity by cleaning up unnecessary data

Legacy tape remediation

Improve support for legal and compliance

Find and manage data more effectively


Sample Workflows

• Aged Data Cleanup
• Data Tiering (Cloud On-Ramping)
• Archiving On-Ramping
• Managing Large Files (Multimedia)
• PII/Security Audit
• Email (PST) Management
• Storage Capacity Allocations/Chargebacks
• Data Center Migrations and Consolidations
• Technology Refresh/Audits


Best Practices

Start small
• User shares: a common area of unmanaged data growth
• Servers requiring capacity upgrades
• Monitor data growth by user

Engage the legal/compliance/records management team
• Communicate how data can be profiled
• Help refine/define data policies based on risk
• Work to implement and audit policies

Implement chargebacks
• Profile data by department and deliver a view into content
• Provide disposition options that allow departments to control expenses
• Get support from legal/compliance to enforce clean-up


Attribution & Feedback


Please send any questions or comments regarding this SNIA Tutorial to [email protected]

The SNIA Education Committee thanks the following individuals for their contributions to this Tutorial.

Authorship History
Original Author: Jim McGann
