© 2008 Clearwell Systems, Inc. Confidential Role of Enterprise Search in E-Discovery June 18, 2008

Upload: darrell-robertson

Post on 17-Dec-2015

Page 1:

© 2008 Clearwell Systems, Inc. Confidential

Role of Enterprise Search

in E-Discovery

June 18, 2008

Page 2:

Enterprise E-Discovery is a business process

Search is central to E-Discovery

[Figure: Electronic Discovery Reference Model (www.edrm.net). Stages shown: Information Management, Identification, Preservation, Collection, Processing, Review, Analysis, Production, Presentation. Scale labels: VOLUME, RELEVANCE.]

Identification Search

• Custodians
• Meta-Data
• Date Range
• Media Type
• Data Type

Collection Search

• By Custodian
• By Operator
• By Data Type
• By keyword, phrase, concept
• By Project

Analysis/Review Search

• Responsiveness
• Privilege Determination
• Review Grouping
• Near-duplicates
• Quality Control

Page 3:

FRCP Rules governing E-Discovery

Rule            Summary Reading
Rule 16(b)      Outline plans for e-discovery and document production
Rule 26(f)      Procedures and Protocols to govern e-discovery
Rule 16(b)(5)   Courts to include scheduling orders
Rule 26(a)      Expansion on definition of ESI
Rule 26(b)(2)   E-Discovery Scope; Cost-Shifting arguments – Burden of reasonableness moving to Requesting Party
Rule 26(b)(5)   Inadvertently disclosed ESI and Privilege Claw-back agreements
Rule 34(b)      Specify forms of production (Native, Image etc.)
Rule 37(f)      Disallow sanctions when ESI lost due to retention policy and good faith efforts

Page 4:

FRCP Rules and Their Impact on E-Discovery

• Emphasis on co-operation during E-Discovery
• Sedona Principles as a Guide for E-Discovery
• Early Discovery Planning Conferences
• No “Gaming” of E-Discovery

• Prepare for Meet and Confer
• Organizational Structure
• Information Assets and Data Map
• ILM Policies and Procedures
• Backup and Disaster Recovery Practices
• Preservation Hold/Legal Hold Policies and Actions

• Establish E-Discovery Scope
• Estimate Review Size from automated Search Results
• Raw Volume, Processed Volume, Review Volume
• Substantiate “Not Reasonably Accessible” Claims
• Move burden of “cost provability” to the Requesting Party

Page 5:

Enabling E-Discovery within an Enterprise

[Diagram. Data sources shown: File Shares, Messaging servers, CMS, Enterprise Intranet, Digital Asset Database, Organizational Data. Indexes and stores shown: Meta-Data Index, Keyword Index, Data Map, Preservation Hold, Case Data. Policies shown: ECM/ILM Policies. Roles shown: IT Personnel, Legal IT Personnel, Legal Search/Analysts. Activities shown: Analysis, Culling, Review.]

Page 6:

E-Discovery Search Characteristics

Theme

• Produce Entire Results – not sufficient to only produce Top N

• No Estimates of Counts – Must provide accurate, actual counts

• Stability of Results

• Very large Result Sets

• Fast Query Response Time

• Provide Complete Hit Context

Relevance

• Activity Based Relevance – Responsiveness Search vs. Privilege Search

• Meta-Data based Relevance – Timeliness, People, Connection to other data

• Review-directed Relevance

• Traditional TF/IDF based Relevance
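The last bullet refers to term frequency / inverse document frequency scoring. A minimal sketch of one common variant (raw term count times log(N/df)), assuming a toy in-memory corpus; the function names, tokenizer and sample documents are illustrative and do not come from the deck.

```python
import math
from collections import Counter

def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()

def tf_idf_scores(query, docs):
    """Score each document as the sum over query terms of tf * log(N / df)."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = sum(
            tf[t] * math.log(n_docs / df[t])
            for t in tokenize(query)
            if df[t]  # skip query terms that appear in no document
        )
        scores.append(score)
    return scores

# Hypothetical sample documents, for illustration only.
docs = [
    "merger agreement draft attached",
    "lunch on friday",
    "revised merger terms and merger schedule",
]
scores = tf_idf_scores("merger", docs)
ranked = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
print(ranked)
```

A ranking like this orders results but does not shorten them; the Theme bullets above still require the entire result set to be produced.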

Results Management

• Complete Auditing of all Searches

• Document Hit Count Reports

• Tie back to original Document Meta-Data

• EDRM XML-2 Export to downstream processes

• Group Near-Duplicates, Concept Clusters for Review Efficiency
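Near-duplicate grouping can be approximated in many ways; the deck does not say which one is used. The sketch below assumes word shingles compared by Jaccard similarity with a greedy grouping pass. The shingle size, the 0.5 threshold and all names are hypothetical choices made for illustration.

```python
def shingles(text, k=3):
    """Return the set of k-word shingles of a text."""
    words = text.lower().split()
    return {tuple(words[i:i + k]) for i in range(max(len(words) - k + 1, 1))}

def jaccard(a, b):
    """Jaccard similarity of two sets (1.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def group_near_duplicates(docs, threshold=0.5):
    """Greedy grouping: a document joins the first group whose
    representative (its first member) is at least `threshold` similar."""
    groups = []  # list of (representative shingle set, member indices)
    for i, doc in enumerate(docs):
        s = shingles(doc)
        for rep, members in groups:
            if jaccard(s, rep) >= threshold:
                members.append(i)
                break
        else:
            groups.append((s, [i]))
    return [members for _, members in groups]

# Hypothetical sample documents, for illustration only.
sample_docs = [
    "please review the attached contract before friday",
    "please review the attached contract before monday",
    "quarterly sales numbers are in the shared folder",
]
print(group_near_duplicates(sample_docs))
```

Reviewers can then tag one member of a group and propagate the decision, which is the review-efficiency gain the bullet describes.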

Data Types

• Many data formats – 10,000 formats

• New communication formats – Wiki, Blogs, SMS, IM, Unified Messaging

• ESI from old, legacy applications

• Incomplete and Corrupt data (Deleted Files, raw disk blocks)

• Handle Multi-language ESI

• Handle Low-fidelity documents – OCR-scanned images

Flexibility

• Advanced Search/Query Language

• Iterative Search and Search Refinement

• Guided Navigation, one-click Filtering

• Saving and Sharing Searches

• Remove impediments to search – ACLs, Encryption, Container Extraction

• Real-time updates for Tagging, Classifying Results

Workflow

• Incremental ESI Collections (Batches)

• Multi-level Review

• Multi-person Review

• Rolling Productions

• Activity Reports

• Outside Counsel, Opposing Counsel interactions

• Project Management

Page 7:

Search Effectiveness

Techniques to improve Precision and Recall

[Figure: precision-recall plot; both axes run from 0.0 to 1.0]

Precision

• Pre-filtering wildcard expansions

• Boolean Queries

• Proximity Specification

• Keyword Scope (Sentence, Paragraph)

• Meta-Data Context

• Entity based Search
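Proximity specification narrows a Boolean AND by requiring the two terms to appear close together. A minimal sketch, assuming whitespace tokenization and distance measured in words; the function name and the sample sentence are illustrative, not taken from any product.

```python
def within_proximity(text, term_a, term_b, max_distance):
    """Return True if term_a and term_b occur within max_distance words of each other."""
    # Naive tokenization: split on whitespace and trim common punctuation.
    words = [w.strip(".,;:!?\"'()").lower() for w in text.split()]
    positions_a = [i for i, w in enumerate(words) if w == term_a.lower()]
    positions_b = [i for i, w in enumerate(words) if w == term_b.lower()]
    return any(abs(i - j) <= max_distance for i in positions_a for j in positions_b)

# Hypothetical sample sentence: "pricing" and "competitor" are 6 words apart.
sentence = "The pricing memo was sent to the competitor yesterday."
print(within_proximity(sentence, "pricing", "competitor", 5))
print(within_proximity(sentence, "pricing", "competitor", 6))
```

Tightening the window drops documents where the terms co-occur only by chance, which raises precision at some cost to recall.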

Recall

• Misspellings/Fuzzy Search

• Wildcard Specifications

• Synonyms

• Related Terms

• Concept Search

• Bayesian Search
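Misspelling-tolerant search widens a query to indexed terms that are close to the typed term. A minimal sketch using the standard library's difflib; the vocabulary, the 0.8 cutoff and the function name are assumptions chosen for illustration, and a production engine would typically use an index-side technique instead.

```python
import difflib

def expand_fuzzy(term, vocabulary, cutoff=0.8, limit=5):
    """Return indexed terms whose similarity ratio to `term` is at least `cutoff`."""
    return difflib.get_close_matches(term.lower(), vocabulary, n=limit, cutoff=cutoff)

# Hypothetical index vocabulary, for illustration only.
vocabulary = ["receive", "receipt", "invoice", "revenue"]
expansions = expand_fuzzy("recieve", vocabulary)
print(expansions)
```

Lowering the cutoff pulls in more variants and raises recall, but it also admits unrelated terms, which is the precision cost shown on the plot.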


Page 8:

E-Discovery Search: Typical measures and outcomes

Search Method        Retrieved Documents   Sample Size   Responsive in Sample   Estimate in Retrieved   Precision
Keyword Search       14846                 1537          940                    9080                    0.61
Discussion Threads   16515                 1537          1069                   11486                   0.70
Concept Search       18554                 1537          1128                   13617                   0.73

Search Method        Unretrieved Documents   Sample Size   Responsive in Sample   Estimate in Unretrieved   Recall
Keyword Search       1076258                 1537          29                     20307                     0.31
Discussion Threads   1074589                 1537          28                     19576                     0.37
Concept Search       1072550                 1537          26                     18143                     0.43

Number of truly responsive in Retrieved Collection:

Number of truly responsive documents in Un-Retrieved Collection
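The figures in the two tables are consistent with simple sample-based estimation: scale the responsive fraction of each sample up to its collection, take precision as the responsive fraction of the retrieved sample, and take recall as the retrieved estimate over the sum of both estimates. The sketch below reproduces the table values under that reading; the function names are illustrative and the formulas are inferred from the numbers, not quoted from the slide.

```python
def estimate_responsive(population, sample_size, responsive_in_sample):
    """Scale the responsive fraction seen in a random sample up to the population."""
    return round(population * responsive_in_sample / sample_size)

def precision_recall(retrieved, unretrieved, sample_size, hits_retrieved, hits_unretrieved):
    """Return (estimate in retrieved, estimate in unretrieved, precision, recall)."""
    est_retrieved = estimate_responsive(retrieved, sample_size, hits_retrieved)
    est_unretrieved = estimate_responsive(unretrieved, sample_size, hits_unretrieved)
    precision = hits_retrieved / sample_size
    recall = est_retrieved / (est_retrieved + est_unretrieved)
    return est_retrieved, est_unretrieved, round(precision, 2), round(recall, 2)

# Inputs copied from the tables above.
keyword = precision_recall(14846, 1076258, 1537, 940, 29)
threads = precision_recall(16515, 1074589, 1537, 1069, 28)
concept = precision_recall(18554, 1072550, 1537, 1128, 26)
print(keyword, threads, concept)
```

Note how strongly recall depends on the unretrieved sample: fewer than 30 responsive documents in a sample of 1537 scale up to roughly 20,000 missed documents.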

Page 9:

Interactive Search: Key to Search Efficiency

Interactive wildcard, stemming expansion selection

• Removes precision-recall tradeoff by enabling interactive review and removal of false positive expansions
• Save thousands of dollars per search

Search Report

• Detailed, interactive keyword search report results for iterative large query execution
• Full transparency and auditing
• Significant time savings
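The expansion-review step can be sketched as follows: expand a wildcard against the index vocabulary, let the reviewer reject false positive expansions, and run the query with the remaining terms only. The vocabulary, the rejected terms and the function names are hypothetical; the deck does not describe how the product implements this.

```python
import fnmatch

def expand_wildcard(pattern, vocabulary):
    """Return the indexed terms that match a wildcard pattern such as 'trad*'."""
    return sorted(t for t in vocabulary if fnmatch.fnmatchcase(t, pattern))

def build_query(pattern, vocabulary, rejected=()):
    """Expand the pattern, drop expansions the reviewer rejected, and OR the rest."""
    rejected_terms = set(rejected)
    kept = [t for t in expand_wildcard(pattern, vocabulary) if t not in rejected_terms]
    return " OR ".join(kept)

# Hypothetical index vocabulary and reviewer decisions, for illustration only.
vocabulary = {"trade", "trader", "trading", "tradition", "trademark", "travel"}
all_expansions = expand_wildcard("trad*", vocabulary)
query = build_query("trad*", vocabulary, rejected={"tradition", "trademark"})
print(all_expansions)
print(query)
```

The wildcard keeps its recall benefit while the rejected expansions no longer pull unrelated documents into review.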

Page 10:

E-Discovery is about extracting Relevant Content

[Figure: data volume narrowing across stages. Stage labels: Archive and Store, Collect and Preserve, Preservation Store, Analyze and Review. Volume labels: 100-1000 TB, 50 TB, 500 GB, 1-2 GB.]

Page 11:

Enterprise Case Study – Global Media Conglomerate

[Figure: Case Data chart with document counts 456,448; 208,628; 74,713]

Data culling based on query permutations reduced data set by 99% to 417

Time = 2.5 days

Eliminating the need to process and review 456,000 documents saved $175,000

Page 12:

E-Discovery - Workflow

[Diagram. Components shown: SOURCES, Source SCAN, Full Text Indexer, Copy Engine, Processing, Case Mgmt, Meta-Data (Shallow Index), Deep Index Full-Text, Processing Manifest, Case ESI Store, SQL Full Text.]

Rate of Ingestion
• 1M files/hour
• 10K directory scans
• 1 TB/hour

Size of Index
• 1 TB
• 10 billion objects

Size of Index
• 0.2 TB
• 1 billion rows
• 10K/s Bulk-Load

Size of Index
• 1 TB (each partition)
• Up to 100 index partitions
• 10 billion objects
• 200-400 file types
• Includes meta-data

Rate of Indexing
• 100 K files/hour
• 10-20 GB/hour

Rate of Extraction
• 20 K files/hour
• 2-4 GB/hour

Rate of Processing
• 100 custodians
• 10K files/hour
• 1 GB PST/custodian

Size of Store
• 32 TB FC/SCSI
• 4 TB NTFS
• 300 GB/custodian
• 100 custodians

Size of Manifest
• 10 million items

Page 13:

E-Discovery Search: Collection Workflow

[Diagram. Nodes shown: SOURCES, Source SCAN, Meta-Data (Shallow Index), Search, Case Document Collection.]

Search Scope
• Owners/SID
• Last Modification Date
• Creation Date
• Author/Title
• Department

Search Technology
• Keyword Search
• Parameterized Date Range

Copy of Original
• Maintain Original Locations
• Hash with Meta-Data for content and location integrity
• Hash without Meta-Data for content integrity
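The two hashes serve different checks: a content-only hash shows that a collected copy matches the original bytes wherever it is stored, while a hash that also covers metadata ties the copy to its original location. A minimal sketch, assuming SHA-256 and a small metadata dictionary; the deck names neither the hash algorithm nor the metadata fields, so both are illustrative choices.

```python
import hashlib

def content_hash(data: bytes) -> str:
    """Hash of the file content only; identical copies match regardless of location."""
    return hashlib.sha256(data).hexdigest()

def content_and_location_hash(data: bytes, metadata: dict) -> str:
    """Hash of the content plus selected metadata (for example original path and owner)."""
    h = hashlib.sha256()
    h.update(data)
    # Sort keys so the result does not depend on dictionary insertion order.
    for key in sorted(metadata):
        h.update(f"\x00{key}={metadata[key]}".encode("utf-8"))
    return h.hexdigest()

# Hypothetical file content and locations, for illustration only.
payload = b"Q3 forecast attached."
original = {"path": r"\\fileserver\finance\q3.doc", "owner": "S-1-5-21-1001"}
other_copy = {"path": r"\\fileserver\legal\q3.doc", "owner": "S-1-5-21-1001"}

same_content = content_hash(payload) == content_hash(bytes(payload))
same_location = (
    content_and_location_hash(payload, original)
    == content_and_location_hash(payload, other_copy)
)
print(same_content, same_location)
```

Keeping both values lets a reviewer de-duplicate by content while still proving where each copy was collected from.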

Page 14:

E-Discovery Search: Analysis Workflow

[Diagram. Nodes shown: Case Document Collection, Search, Sampling Engine, Sample Non-Responsive Documents, Document Review, Responsive Documents, Non-Responsive Documents, Responsive Misses (“Recall”), Potentially Responsive Documents, Potentially Privileged Documents, Privilege Review, Privilege “Misses” Review, Privileged Documents, Production Documents.]

Search Scope
• Documents
• Emails

Search Technology
• Keywords
• Boolean Search
• Proximity Search
• Fuzzy Search
• Concept Search
• Tagged Search

Search Refinement
• Additional Keywords
• Additional Search Methods

Privilege Search
• Documents
• Emails

Quality Control
• Documents
• Emails
• Tags

Confidence Level (%)   Sample Size
95                     1537
99                     66358

Reports
• Search Reports
• Activity Reports
• QC Reports
• Project Review Reports
• Privilege Log
• Exceptions Reports
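The sample sizes in the Confidence Level table match the standard formula for estimating a proportion, n = z² · p(1 − p) / e², with p = 0.5. The slide does not state the margins of error; the values used below (±2.5% at 95% and ±0.5% at 99%) are assumptions that happen to reproduce 1537 and 66358, and the z-scores are the usual rounded values.

```python
import math

# Conventional rounded z-scores for two-sided confidence levels.
Z_SCORES = {95: 1.96, 99: 2.576}

def sample_size(confidence_pct, margin_of_error, p=0.5):
    """Sample size for estimating a proportion; p = 0.5 is the worst case."""
    z = Z_SCORES[confidence_pct]
    return math.ceil(z * z * p * (1 - p) / margin_of_error ** 2)

# Margins of error are assumed, not taken from the slide.
n_95 = sample_size(95, 0.025)
n_99 = sample_size(99, 0.005)
print(n_95, n_99)
```

Under these assumptions the sample size does not depend on the size of the collection, which is why the same 1537-document sample is used for both the retrieved and the unretrieved sets.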