UNDERSTANDING THE VALUE OF DATABASE DISCOVERY BEYOND UNSTRUCTURED DATA

Upload: robbie-hilson

Post on 19-Jul-2015



PRESENTERS

Stephanie L. Giammarco sits on BDO’s Board of Directors and leads

BDO’s Forensic Technology Services practice with more than 20

years of experience and a background in accounting, information

technology and criminology. Having worked on some of the largest

financial frauds to date, she has led teams creating databases of

millions of records, performed advanced data analytics and provided

testimony pertaining to damages and electronically stored

information.

Stephanie provides litigation and consulting services to organizations

and their counsel, including data analytics, computer forensics and

e-discovery services related to domestic and international matters

involving product liability, financial statement fraud, class action

lawsuits, internal investigations, securities fraud, employee and

vendor schemes, and breach of contract. She is skilled in the

collection, preservation and analysis of electronic evidence, as well

as the implementation of various e-discovery tools.

She has been deposed as a Rule 30(b)(6) e-discovery witness and

testified before the Judicial Arbitration Services on the calculation

of damages in contract disputes. Stephanie has published and

presented on a range of computer forensics and e-discovery topics,

including before the Securities and Exchange Commission, Security

Industry Authority and National Futures Association.

Chris J. Lopata is of counsel at Jones Day in New York. His practice

focuses on complex and general civil litigation, including product

liability, toxic torts, credit reporting, and a wide range of business

litigation.

Chris is a member of the firm's e-Discovery Committee and serves as the

New York office coordinator for e-discovery issues. Chris has led

discovery teams in numerous joint defense groups. He has extensive

experience coordinating affirmative and defensive e-discovery efforts on

behalf of clients.

Chris' practice extends beyond pretrial e-discovery. He has served as lead

trial counsel in a variety of commercial disputes in New York State

courts. He also has counseled clients who have sought and obtained

favorable settlements in non-trial bound business disputes.

The views set forth herein are the personal views of the author and do

not necessarily reflect those of the law firm with which he is associated.

Stephanie L. Giammarco, CPA/CITP, CFE, CEDS

Partner, BDO Consulting

[email protected]

Direct: 212-885-7439

Christopher J. Lopata

Of Counsel, Jones Day

[email protected]

Direct: 212-326-3602


OUR AGENDA

1. A quick poll of the audience…

2. Structured v. unstructured data

3. Some necessary definitions

4. Examples of database-driven applications

5. The database schema and data dictionary

6. Theories of database discovery

7. Database discovery: methods for “pulling” data for review and

production

8. Practice pointers


A Quick Poll…

… on Database Discovery


A QUICK POLL…

Who knows what a database is?

A fancy Excel spreadsheet. A collection of rows and columns, each populated with a value.

Who has used a database as part of their personal or work activities?

All of you have… Google and Lexis for research. Your timekeeping system, Concordance, Summation,

and Relativity are all databases. Your company’s email system is effectively a database.

Who has had to conduct discovery from a database (or database-driven application)?

Sales and Marketing (CRM), Human Resources (HRIS), and GL/Inventory (ERP). SAP and Hyperion

are perfect examples.

Bonus Question: Who can tell me what a relational database is?

A bunch of Excel spreadsheets (tables) linked together by a common key…


DEFINITIONS

Unstructured v. Structured Data

The Table

The Relational Database


DEFINITIONS| UNSTRUCTURED V. STRUCTURED DATA

Unstructured Data

Wikipedia definition: Unstructured Data (or unstructured information) refers to

information that does not have a pre-defined data model. Unstructured information is

typically text-heavy, but may contain data such as dates, numbers, and facts as well.

Translation: MS office files, loose files, most of the information that you can see via

Windows Explorer.

Structured Data

Definition: Structured Data is information that resides in fixed fields within a record

or file, or is information that is organized into rows and columns, with pre-set

characteristics.

Translation: Multiple tables, containing rows and columns, which relate to each other

via a common key.


DEFINITIONS| THE TABLE (THE CORE OF THE DATABASE)

Records, not files…

Rows v. columns

Tables maintain the relationship

between columns

A field is another way of saying column

Data values, in the context of rows,

columns and tables, are the substance

Real-time, constantly changing

information

Data dictionary

Schema


DEFINITIONS| THE RELATIONAL DATABASE

Some databases have only one table

(flat-file systems) and are no different

from a Microsoft Excel spreadsheet. These

are very rare.

Relational databases, which are much

more common, have multiple tables,

each with a key that “links” them

together.

How can relational databases be more

challenging to handle than flat file

systems in the context of discovery?

Why do we use databases?


DATABASES

Database-Driven Applications

The Schema

The Data Dictionary


DATABASES| DATABASE-DRIVEN APPLICATIONS

A database, when combined with a user interface, is often called a database-driven

application.

Enterprise Resource Planning (ERP)

Data Warehouses & Business Intelligence Systems

Human Resource Information System (HRIS)

Customer Relationship Management (CRM)

Adverse Effects Systems

SharePoint

Email Archiving Systems

kCura Relativity

DATABASES ARE ALL AROUND US AND WE WORK WITH THEM EVERY DAY.


DATABASES| THE SCHEMA

The database schema is the key to understanding:

What tables of data exist within the relational database.

The name assigned to each column within each table.

How the columns are grouped together in each table.

How the tables relate to each other.
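
As a rough illustration of what a schema tells you, the sketch below lists the tables, the column names in each table, and the declared links between tables. It assumes a SQLite database purely for convenience; the `customers` and `orders` tables and the `describe_schema` helper are hypothetical, and other database platforms expose the same information through their own catalog views.

```python
import sqlite3

def describe_schema(conn):
    """Return {table: {"columns": [...], "links": [(column, other_table, other_column)]}}."""
    schema = {}
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk)
        columns = [row[1] for row in conn.execute(f'PRAGMA table_info("{table}")')]
        # PRAGMA foreign_key_list rows: (id, seq, table, from, to, on_update, on_delete, match)
        links = [
            (row[3], row[2], row[4])
            for row in conn.execute(f'PRAGMA foreign_key_list("{table}")')
        ]
        schema[table] = {"columns": columns, "links": links}
    return schema

# Hypothetical two-table database, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(customer_id),
        amount REAL
    );
""")

schema = describe_schema(conn)
for table, info in schema.items():
    print(table, info["columns"], info["links"])
```

In practice the links are not always declared in the database itself, which is one reason a "super user" or written documentation is often still needed.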


DATABASES| THE DATA DICTIONARY

Within the STUDENTS table, there are two columns of information.

– The STUDENT column contains the name of the student enrolled in the university

– The ID column is the unique identification number assigned to each student

Within the ACTIVITIES table, there are five columns of

information.

– The ID column is the unique identification number assigned to each

student

– The ACTIVITY1 column contains the name of the activity they are

registered for

– The COST1 column contains the fee paid to the school for the activity

– The ACTIVITY2 column represents the secondary (if any) activity that

the student is registered for

– The COST2 column contains the fee paid to the school for the

secondary activity

The ID field is the key that links the STUDENTS and ACTIVITIES tables.
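
The two tables described above can be sketched in a few lines. This is a minimal example that assumes SQLite and uses invented student names, activities, and fees. It shows how the shared ID column lets a query combine a student's name with that student's activities and costs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE STUDENTS (ID INTEGER PRIMARY KEY, STUDENT TEXT);
    CREATE TABLE ACTIVITIES (
        ID INTEGER REFERENCES STUDENTS(ID),
        ACTIVITY1 TEXT, COST1 REAL,
        ACTIVITY2 TEXT, COST2 REAL
    );
    -- Invented sample rows, for illustration only.
    INSERT INTO STUDENTS VALUES (1, 'Student A'), (2, 'Student B');
    INSERT INTO ACTIVITIES VALUES
        (1, 'Tennis', 36.0, 'Golf', 47.0),
        (2, 'Swimming', 15.0, NULL, NULL);
""")

# Join the two tables on ID and total each student's fees.
rows = conn.execute("""
    SELECT s.STUDENT, a.ACTIVITY1, a.COST1, a.ACTIVITY2, a.COST2,
           a.COST1 + COALESCE(a.COST2, 0) AS TOTAL_COST
    FROM STUDENTS AS s
    JOIN ACTIVITIES AS a ON a.ID = s.ID
    ORDER BY s.ID
""").fetchall()

for row in rows:
    print(row)
```

Neither table answers "who paid what" on its own: the names live in STUDENTS and the fees live in ACTIVITIES, so producing only one of them would leave the other side of the story out.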


DATABASE DISCOVERY

Reports, Data, Trends


DATABASE DISCOVERY

Theory #1 – Reports are all that matter...


DATABASE DISCOVERY

Theory #2 – Data is all that matters...

Databases are huge, historical repositories of “activity”

– Information inserted into a CRM system by a salesperson, recording customer wins and losses,

potential new business opportunities, or even other uses for a medication he or she is selling

(Pharmaceutical Sales).

– The price point for a specific medication inserted into a POS system, and the entity that is

paying for it (Medicare Fraud).

– A history of consistent payments to a “false” or “suspicious” entity in the general ledger

(within the ERP system) (FCPA).

The best way to identify trends is to pull large amounts of data into a usable format, then

sort, filter, and investigate.
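
To make the "sort, filter, and investigate" idea concrete, here is a small sketch modeled on the general ledger example. It assumes a SQLite copy of the data; the `gl_payments` and `approved_vendors` tables, the vendor IDs, and the amounts are all invented. The query totals payments per vendor and keeps only vendors that do not appear on the approved list.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE approved_vendors (vendor_id TEXT PRIMARY KEY, name TEXT);
    CREATE TABLE gl_payments (payment_date TEXT, vendor_id TEXT, amount REAL);
    -- Invented sample rows, for illustration only.
    INSERT INTO approved_vendors VALUES ('V001', 'Acme Supply');
    INSERT INTO gl_payments VALUES
        ('2014-01-15', 'V001', 1200.0),
        ('2014-01-31', 'V999', 9500.0),
        ('2014-02-28', 'V999', 9500.0),
        ('2014-03-31', 'V999', 9500.0);
""")

# Filter: payees with no match in the approved vendor list.
# Sort: largest total first, so the most significant payees surface at the top.
flagged = conn.execute("""
    SELECT p.vendor_id, COUNT(*) AS payments, SUM(p.amount) AS total
    FROM gl_payments AS p
    LEFT JOIN approved_vendors AS v ON v.vendor_id = p.vendor_id
    WHERE v.vendor_id IS NULL
    GROUP BY p.vendor_id
    ORDER BY total DESC
""").fetchall()

for vendor_id, payments, total in flagged:
    print(vendor_id, payments, total)
```

A result like this is only a lead. Someone still has to investigate why the payee is missing from the approved list.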


DATABASE DISCOVERY| THE “BRUTE FORCE” METHOD

Just get the data out. Common in DOJ and FTC requests for data. Also used to provide

raw data to experts for analysis.

Sample DOJ Database Request

1. Identify each electronic or other database or data set used or maintained by the company at any

time after January 1, 2009, without regard to custodian, that contains information concerning the

company’s (a) products and product codes; (b) facilities; (c) production; (d) shipments; (e) sales;

(f) prices; (g) margins; (h) costs, including but not limited to production costs, distribution costs,

research and development costs, storage costs, standard costs, expected costs, and opportunity

costs; (i) patents or other intellectual property; (j) research or development projects; or (k)

customers, to the extent such customer information is not provided in response to specifications 9

and 10. For each such database, identify (i) the database type, i.e., flat, relational, or

enterprise; (ii) the size in both number of records and bytes of information; (iii) the fields,

query forms, and reports available or maintained; and (iv) any software product or platform

required to access the database.


DATABASE DISCOVERY| THE “BRUTE FORCE” METHOD

2. Submit a useable copy of each database or data set identified in response to specification 1), any

accompanying data dictionary, and any software product or platform required to access the

database or data set. For each database or data set identified in response to specification 1) that

contains cost or margin information, submit one copy of each regularly produced (no more

frequently than in four week periods) report generated using that database since January 1, 2009,

and any documentation that defines, describes or explains the calculation in any terms, measures,

or aggregations appearing on the materials provided.

3. For all databases or data sets produced in response to the specifications 1) and 2), describe in

detail the relationship of the different tables in the database (e.g., an entity relationship diagram

and all foreign keys) and submit documents sufficient to show the tables that are populated by the

company, and the following items for each table: (a) the size of the table in both number of

records and bytes of information; (b) the table name; (c) a general description of the

information contained in the table; (d) a list of field names; (e) a definition for each field as it

is used by the company, including the meanings of all codes that can appear as field values; (f)

the format, including variable type and length, of each field; and (g) the primary key in a

given table that defines a unique observation.
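
Several of the per-table items in a request like this (record counts, table names, field names, field formats, primary keys) can often be gathered mechanically. The sketch below assumes a SQLite database and a hypothetical `products` table; the `table_inventory` helper is illustrative only. It does not cover size in bytes, table descriptions, or the meaning of codes, which usually have to come from documentation or from the people who use the system.

```python
import sqlite3

def table_inventory(conn):
    """Collect record count, field names/types, and primary key for each table."""
    inventory = []
    names = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        )
    ]
    for name in names:
        record_count = conn.execute(f'SELECT COUNT(*) FROM "{name}"').fetchone()[0]
        # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk)
        info = conn.execute(f'PRAGMA table_info("{name}")').fetchall()
        inventory.append({
            "table": name,
            "records": record_count,
            "fields": [(col[1], col[2]) for col in info],
            "primary_key": [col[1] for col in info if col[5] > 0],
        })
    return inventory

# Hypothetical table and rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (
        product_code TEXT PRIMARY KEY,
        description TEXT,
        standard_cost REAL
    );
    INSERT INTO products VALUES ('P1', 'Widget', 2.5), ('P2', 'Gadget', 4.0);
""")

inventory = table_inventory(conn)
for entry in inventory:
    print(entry)
```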


DATABASE DISCOVERY| THE “BRUTE FORCE” METHOD

Why is this request so difficult, and what is a potential way to approach it?

Work with data dictionary and schema to determine what information exists in the

system.

With the limited information you have (table and column names, as well as limited

descriptions), attempt to ascertain what information is relevant within the database.

Find a “super user.”

Try to understand how the columns and tables that you have identified relate to each

other.

Develop a “custom” query to extract that information into a “usable” format

(Microsoft Excel, delimited text file).

Review & Produce…
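
The "custom query to a usable format" step might look something like the following. This is a sketch only: it assumes SQLite, a hypothetical `sales` table, and a pipe-delimited output, and it writes to an in-memory buffer where a real export would write to a file.

```python
import csv
import io
import sqlite3

def export_query(conn, sql, params, out):
    """Run a query and write the results, with a header row, as pipe-delimited text."""
    cursor = conn.execute(sql, params)
    writer = csv.writer(out, delimiter="|", lineterminator="\n")
    # cursor.description holds one tuple per result column; item 0 is the column name.
    writer.writerow([col[0] for col in cursor.description])
    count = 0
    for row in cursor:
        writer.writerow(row)
        count += 1
    return count

# Hypothetical table and rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (sale_date TEXT, product_code TEXT, units INTEGER);
    INSERT INTO sales VALUES
        ('2008-12-31', 'P1', 5),
        ('2009-03-01', 'P1', 10),
        ('2009-04-15', 'P2', 7);
""")

buffer = io.StringIO()
exported = export_query(
    conn,
    "SELECT sale_date, product_code, units FROM sales "
    "WHERE sale_date >= ? ORDER BY sale_date",
    ("2009-01-01",),
    buffer,
)
print(buffer.getvalue())
```

Keeping the exact query text and the row count alongside the export helps when the other side asks how the fields and date range were chosen.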


DATABASE DISCOVERY| THE “BRUTE FORCE” METHOD

Some Potential Problems:

Unfortunately, the data dictionary and schema often do not exist, especially in the

case of a proprietary or legacy system.

If one or the other doesn’t exist, this method becomes much more complex.

Many fields in a typical database are not used, which adds complexity.

This method can be very time consuming.

Often it can result in a heated negotiation between the parties (How did you choose those

fields? What other fields exist? How do we, the opposing party, know you gave us everything?).

You can leverage in-house resources, but then they may have to testify.


DATABASE DISCOVERY| THE “REPORT” METHOD

Commonly used to extract data to evaluate potential damages.

Sample Request

Documents sufficient to show: (a) the number of units sold by month, year and purchaser from January

1, 2001 to the present including product numbers; (b) the revenue attributable to each food product

by month and year from January 1, 2001 to the present; (c) the gross profit attributable to each food

product by month and year from January 1, 2001 to the present; (d) the net profit attributable to each

food product by month and year from January 1, 2001 to the present; and (e) any discounts, rebates

not reflected in price per unit.

For each food product identified in your answer to above, produce documents sufficient to show your

revenue, costs, including but not limited to both fixed and variable costs for each component, and

profit margin, from January 1, 2001 to the present.


DATABASE DISCOVERY| THE “REPORT” METHOD

Investigate the existing reporting functionality:

Virtually every database-driven application has a built-in, somewhat user-friendly

reporting function.

Generate a list of all the “standard” reports that are typically “run” from the system.

Narrow the lengthy list to a select few and pull samples (repeat as necessary).

Review the reports to determine whether they address the relevant activity

(potentially even meet and confer on the topic).

Agree on the reports that will be produced and the timeframe applicable.
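
Where no standard report fits and the parties agree on a custom one, the underlying query is often a simple roll-up. The sketch below assumes SQLite and an invented `sales` table, and summarizes units sold by month, purchaser, and product number, roughly in the shape of item (a) of the sample request. A real system's built-in reporting function would normally be the first choice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (
        sale_date TEXT, purchaser TEXT, product_number TEXT, units INTEGER
    );
    -- Invented sample rows, for illustration only.
    INSERT INTO sales VALUES
        ('2001-01-05', 'Buyer A', 'FP-100', 10),
        ('2001-01-20', 'Buyer B', 'FP-100', 5),
        ('2001-01-25', 'Buyer A', 'FP-100', 4),
        ('2001-02-02', 'Buyer A', 'FP-200', 3);
""")

# One row per month, purchaser, and product number, with total units sold.
report = conn.execute("""
    SELECT strftime('%Y-%m', sale_date) AS month,
           purchaser,
           product_number,
           SUM(units) AS units_sold
    FROM sales
    GROUP BY month, purchaser, product_number
    ORDER BY month, purchaser, product_number
""").fetchall()

for line in report:
    print(line)
```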


Practice Pointers

Databases v. Reports

Balancing the Pros and Cons

“Unstructured” Data in the

“Structured” Database

The Truth is (Not) Always in the

Numbers

Meet-and-Confer Considerations


PRACTICE POINTERS

Assess the value of producing/seeking databases

versus reports

How to prove or defend the case?

What do your experts need?

How substantial are costs and burdens -- and is fee

shifting a possibility?

Is specialized software or hardware required for

native databases?

Is the database structure (not just the data) a trade

secret?


PRACTICE POINTERS

Balancing some of the pros and cons of databases

and reports

Reports are often easier to review, more limited

in scope, and generally less costly

Databases are often incomprehensible to mere

mortals, open to any kind of search, and

generally more expensive -- for producing and

requesting parties


PRACTICE POINTERS

Beware the "unstructured" data hiding in the

"structured" database

Open-text or free form fields

Redacting databases with 10+ billion entries

Anticipate privacy issues if personally identifiable

information exists
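
One way to get a first look at what is hiding in a free-form field is a pattern scan. The sketch below is illustrative only: the records, the `notes` field, and the `redact_free_text` helper are hypothetical, and the pattern catches only Social Security numbers written as 3-2-4 digits with dashes. Real personally identifiable information takes many more forms, so a scan like this supplements review and does not replace it.

```python
import re

# Matches values shaped like 123-45-6789; other formats are NOT detected.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

def redact_free_text(records, field):
    """Return (copies of records with matches replaced, number of matches found)."""
    redacted = []
    hits = 0
    for record in records:
        text = record.get(field)
        if text is None:
            # Leave empty fields untouched.
            redacted.append(dict(record))
            continue
        new_text, found = SSN_PATTERN.subn("[REDACTED]", text)
        hits += found
        redacted.append({**record, field: new_text})
    return redacted, hits

# Invented records, for illustration only.
records = [
    {"id": 1, "notes": "Customer SSN 123-45-6789 on file."},
    {"id": 2, "notes": "Called about refund."},
    {"id": 3, "notes": None},
]

clean, hit_count = redact_free_text(records, "notes")
for record in clean:
    print(record)
```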


PRACTICE POINTERS

The truth is (not) always in the numbers

Missing data or errors in the data

Data dictionaries -- explaining the codes

Figuring out how the database is "really" used

Legacy system migrations and migraines
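
A quick profile of a coded field can surface both problems at once: values that are missing, and codes that the data dictionary does not explain. The sketch below uses invented rows and a hypothetical `status` field; the `profile_field` helper and the two-code data dictionary are assumptions for illustration.

```python
def profile_field(rows, field, known_codes):
    """Count missing values and tally codes that are absent from the data dictionary."""
    missing = 0
    unknown = {}
    for row in rows:
        value = row.get(field)
        if value is None or str(value).strip() == "":
            missing += 1
        elif value not in known_codes:
            unknown[value] = unknown.get(value, 0) + 1
    return {"missing": missing, "unknown_codes": unknown}

# Hypothetical data dictionary entry: code -> meaning.
status_codes = {"A": "Active", "C": "Closed"}

# Invented rows, for illustration only.
rows = [
    {"account": 1, "status": "A"},
    {"account": 2, "status": "C"},
    {"account": 3, "status": None},
    {"account": 4, "status": ""},
    {"account": 5, "status": "X"},
    {"account": 6, "status": "X"},
]

profile = profile_field(rows, "status", status_codes)
print(profile)
```

Unexplained codes are often a sign of how the database is "really" used, or of a legacy migration, and are worth raising with a super user.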


PRACTICE POINTERS

Meet-and-Confer Considerations

Scope of relevant information

Understand the systems before making/demanding

commitments

Limitations on time period, fields, geography,

business units, etc.

Availability of preexisting reports and creating

custom reports

Listings of tables, columns, rows

Data dictionaries and the schema


Q & A

Stephanie L. Giammarco

Partner, BDO Consulting

[email protected]

Direct: 212-885-7439

Christopher J. Lopata

Of Counsel, Jones Day

[email protected]

Direct: 212-326-3602