Post on 14-Dec-2015

Sensitive Information Sweep

Using Cornell’s Spider

Wyman Miles, Cornell University

Kerry Havens, University of Colorado at Boulder

Steve Lovaas, Colorado State University

Overview

• Quick Background

• The Technical Problem (Kerry)

• The Organizational Problem (Steve)

• Spider (Wyman)

• Summary & Questions

What is “Sensitive Information”?

• A Growing Concern

• A Moving Target

• SSN, Credit Card, Driver’s License, Medical Records, Student Information, Proprietary Research,…

• Data in Context – Aggregation

Why Are We All Here?

• The Front Page!

• CDW-G 2006 Survey – more than 3 million college students may have lost personal information in the last year.

• Identity theft is the fastest growing crime in the U.S.

• By far the biggest culprit? Lost or stolen computers.

Regulations, Standards, & Laws

• Federal – HIPAA, FERPA, SarbOx, GLB,… Identity Theft Protection Act?

• State – Many states passing identity theft protection laws; New York & Colorado have state CISO

• Industry – PCI DSS

The Technical Problem: Finding sensitive information in a haystack

Kerry Havens

University of Colorado at Boulder

SSN Remediation

• At CU-Boulder, SSNs were used as a student identifier before 2004

• House Bill 03-1175 was approved in 2003 requiring institutions to change this method to ensure the privacy of a student’s social security number

• CU-Boulder started issuing student IDs to new students in July 2004 and converting SSNs to SIDs in 2005

Where the data is not stored

• File type exclusions – fine tuning

– Binary files where the data cannot be read

– Received input from community for fine tuning

• False positives

– International telephone numbers

– Examples for web form validation

• Why is the department webpage asking for SSNs?
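The file type exclusions above can be sketched as a simple extension skip list. This is a minimal illustration, not Spider's actual implementation: the names SKIP_EXTENSIONS, should_scan, and candidate_files are hypothetical, and the extensions listed are placeholders to be tuned for your own environment.

```python
import os

# Hypothetical skip list (an assumption, not Spider's defaults).
# These are binary formats where a text search is unlikely to read the data.
SKIP_EXTENSIONS = {".exe", ".dll", ".jpg", ".png", ".zip", ".iso"}


def should_scan(path, skip_extensions=SKIP_EXTENSIONS):
    """Return True unless the file's extension is on the skip list."""
    ext = os.path.splitext(path)[1].lower()
    return ext not in skip_extensions


def candidate_files(root, skip_extensions=SKIP_EXTENSIONS):
    """Yield paths under root whose extensions are not excluded."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if should_scan(path, skip_extensions):
                yield path
```

Keeping the skip list as data rather than code makes it easy to adjust as the community reports noisy or unreadable file types.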

OS and File Encoding Problems

• HTML encoding problems

• Representations (pictures) of sensitive data are not found

– Examples include PDF

• Searching a UNIX filesystem

– Preparing the file before searching for private data

– For example, using strings to extract text from text/binary hybrids like .doc or .xls
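The preparation step can be approximated in a few lines. The sketch below imitates what the UNIX strings utility does (pull runs of printable ASCII out of binary data); it is not the utility itself. The function names and the four-character minimum are assumptions for illustration. The second function covers text stored as UTF-16LE, which a plain ASCII pass would miss and which is common in Office documents.

```python
import re


def extract_strings(data: bytes, min_len: int = 4):
    """Return runs of printable ASCII at least min_len bytes long."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [m.decode("ascii") for m in pattern.findall(data)]


def extract_wide_strings(data: bytes, min_len: int = 4):
    """Return runs of printable characters stored as UTF-16LE (char + NUL)."""
    pattern = re.compile(rb"(?:[\x20-\x7e]\x00){%d,}" % min_len)
    return [m.decode("utf-16-le") for m in pattern.findall(data)]


# Usage: the extracted text is what the regular expressions are then run against.
blob = b"\x00\x01SSN 123-45-6789\xff\xfeab\x00"
print(extract_strings(blob))
```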

Where the data is stored

• Typical file types of discovered data

– Gradebooks

– Course web pages

– Homework assignments

– Travel authorization forms

– Personal financial documents

– Email

Regular Expressions

• Returns too much data: /\d{3}-\d{2}-\d{4}/

• Searching for environment specific data in the hope that common data will lead us to more data:

/\b([0-7]\d{2}[-|\s]\d{2}[-|\s]\d{4}|(52[1-4]|65[0-3])\d{6})\b/

• State specific information can be found at

http://www.ssa.gov/employer/stateweb.htm
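To see why the first pattern returns too much data, both expressions can be run over a few lines of text. This is a hedged sketch in Python's re module, which treats \b, \d, and \s the same way for this purpose. The sample values are made up, and the space before the alternation in the slide is assumed to be a line-wrap artifact and is omitted. Note that inside a character class the | is a literal, so [-|\s] accepts a pipe character as well as a dash or whitespace.

```python
import re

# The broad pattern from the slide: any 3-2-4 digit group with dashes.
BROAD = re.compile(r"\d{3}-\d{2}-\d{4}")

# The tuned pattern from the slide, joined onto one line.
TUNED = re.compile(
    r"\b([0-7]\d{2}[-|\s]\d{2}[-|\s]\d{4}|(52[1-4]|65[0-3])\d{6})\b"
)

# Made-up sample lines (assumptions for illustration only).
samples = [
    "SSN: 521-12-3456",        # delimited value
    "ID 650123456 on file",    # undelimited, Colorado-range prefix
    "order 9123-45-67890",     # digits embedded in a longer number
    "call 987-65-4321",        # first digit outside 0-7
]


def hits(pattern, lines):
    """Return the lines in which the pattern matches anywhere."""
    return [line for line in lines if pattern.search(line)]


print(hits(BROAD, samples))
print(hits(TUNED, samples))
```

The broad pattern flags the order number and the phone-style number but misses the undelimited ID; the tuned pattern does the reverse.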

Regular Expressions

• Let’s dissect this…

/\b([0-7]\d{2}[-|\s]\d{2}[-|\s]\d{4}|(52[1-4]|65[0-3])\d{6})\b/

• \b: Boundary

• [0-7]: First acceptable digit

• \d{2}, \d{4}, \d{6}: 2, 4, or 6 digits in a row

• [-|\s]: Delimited by dash or space

• (52[1-4]|65[0-3]): Colorado specific prefix, not delimited
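The dissection can also be written down as a commented pattern. The sketch below uses Python's re.VERBOSE flag so each piece carries its slide annotation; the names SSN_PATTERN and find_candidates are hypothetical, and the test values are made up.

```python
import re

SSN_PATTERN = re.compile(
    r"""
    \b                      # boundary
    (
        [0-7]               # first acceptable digit
        \d{2}               # 2 more digits
        [-|\s]              # delimited by dash or whitespace (also a literal pipe)
        \d{2}               # 2 digits in a row
        [-|\s]              # second delimiter
        \d{4}               # 4 digits in a row
      |
        (52[1-4]|65[0-3])   # Colorado specific prefix, not delimited
        \d{6}               # 6 digits in a row
    )
    \b                      # boundary
    """,
    re.VERBOSE,
)


def find_candidates(text):
    """Return every substring of text that the pattern matches."""
    return [m.group(0) for m in SSN_PATTERN.finditer(text)]


print(find_candidates("a 521-12-3456 b 650123456 c 987-65-4321"))
```

Writing the pattern this way makes it easier to swap in a different state's prefixes when tuning for another environment.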

CU Experiences

• Pitfalls

– Users’ interpretations of the log file

– Fine tuning file extension exceptions and regular expressions

• Recommendations

– Keep current environment in mind

The Organizational Problem: a really big haystack

Steve Lovaas

Network Security Manager

Colorado State University

Organizational Vision

• Support from the top

– Cabinet-level committee driving the project

– Spurred by headlines and state mandates

– VP for IT who really gets security

• Campus PR campaign

– Web site

– Public meetings

• Tied SSN purge to the rollout of a new CSUID in Fall 2006

Using Resources

• Project Constraints

– Tight timeline

– No budget

– Not a trivial programming project

• Buy / Build / Leverage tools?

• Goal: 100% coverage vs. Best Effort

• Spider chosen for Windows, Linux, Mac

• Manual searching on AIX, mainframe

Ultimate Responsibility

• Original thought: deans / dept. heads

• Revised edition: individual employees

• Developed a personal attestation form for every employee to sign, submitted in bulk by colleges

• More work for central IT

• Senior VP: Doing the scan and signing the form is a CONDITION OF EMPLOYMENT

Individual Attestation Form

• Every employee

• 2 choices:

– I don’t interact with SSNs in the course of my job

– SSNs in all electronic files under my control have been removed or encrypted

• VP for IT must approve exceptions

CSU Experiences

• Pitfalls

– Beta tool for a live project requires quick response and careful management of user expectations & acceptance

– Careful of deadlines, it’s a lot of work!

• Recommendations

– Don’t do this kind of project without active support from the very top

– Anticipate the need for analysis/parsing tools

– Have a supported encryption solution for exceptions

Cornell Spider

Wyman Miles

Sr. Security Engineer

Cornell University

A Brief History of Spider

• Early 2005, scan Web for SSNs

• Later, scan disk images for SSNs/CCNs

• March 2006, debut at BU Security Camp

• April 2006, Educause, demand for a Windows version

• Version 1.0 in May, 2.0 in June

A Brief History, II

• June 2006, major feedback from Steve: bug reports, tests, feature requests

• Engine developed that same month: internal incident response

• OSX Spider Sept 2006

• Windows Spider rewrite

• April 2007, GPL release of all Spiders

Current Spider

• SSN, SIN, CCN, NINO discovery in many file types

• Various data type validators

• Web scanning, back to its roots

• Scan for data in unallocated space

• Faster. More readable source
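As an illustration of what a data type validator might look like, the sketch below applies the Luhn checksum, a standard check for credit card numbers, to a candidate string. Whether Spider's CCN validator works exactly this way is an assumption; the function name luhn_valid and the 13 to 19 digit length range are illustrative choices.

```python
def luhn_valid(number: str) -> bool:
    """Return True if the digits in number pass the Luhn checksum."""
    digits = [int(c) for c in number if c.isdigit()]
    # Assumed plausible card number lengths; adjust for your environment.
    if not 13 <= len(digits) <= 19:
        return False
    total = 0
    # Walk from the rightmost digit, doubling every second digit.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


# Usage: filter regex hits so random 16-digit strings are not reported.
print(luhn_valid("4111 1111 1111 1111"))  # a widely used test number
```

A checksum like this cuts false positives sharply, since only about one in ten random digit strings of the right length will pass.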

Various Spiders

• Windows Spider, aka Spider3

• OSX Spider

• Engine, general UNIX spider

• LinSpider, our oldest version

• Spider Simple: Windows Spider preconfigured to skip noisy files

Future Spider

• Feature set convergence between Engine, OSX, Windows

• Community Development

• Possible I2 hosting of distribution and documentation

• More documentation!

• Client-Server model revisited

Spider Log

Spider at Cornell

• Incident response: a compromise has happened, what was at risk?

• Pre-emptive

– Dan Elswit, CALS Security Officer

Spider in CIT

• CIT abandoned SSNs a few years ago, but they remain

• Tech support uses Spider Simple to discover lurking SSNs

• Manual process

Athletics

• Spider Simple

• Unique log names to network share

• Centralized analysis

Spider Downloads

• http://www.cit.cornell.edu/security/tools

Summary

• Purging sensitive information is something we’re going to have to get good at

• Get support from the highest levels

• Tune regular expressions and file/ext skip lists for your environment

• Anticipate parsing needs, exceptions

• New Spider features, more users, broader OS support

• Spider also for ongoing support, forensics

Questions?

• Wyman Miles: wm63@cornell.edu

• Kerry Havens: Kerry.Havens@Colorado.EDU

• Steve Lovaas: Steven.Lovaas@ColoState.EDU

• The Spider users’ list: cuspider-L@cornell.edu
