TAR 2.0 and CAL: Smart People. Smart Machines. Smart Discovery
TRANSCRIPT
877.557.4273
catalystsecure.com
WEBINAR
TAR 2.0 & CAL
Presenters: John Tredennick, Esq. and Mark Noel, Esq.
Moderator: Robert Ambrogi, Esq.
Panelists
Bob is a lawyer, blogger and veteran legal journalist who has served as editor-in-chief of a number of legal publications, including the National Law Journal. Bob serves as Catalyst’s director of communications.
Robert Ambrogi, Esq. Moderator
Mark consults with corporate law departments to deliver effective knowledge management and workflow solutions driving productivity and accountability.
Mark Noel, Esq. Managing Director, Professional Services
John Tredennick, Esq. Founder and CEO
A nationally known trial lawyer and longtime litigation partner, John is Catalyst's founder and CEO. John was recently honored as one of the top six e-discovery trailblazers by The American Lawyer.
Informal Poll Results: Use of Technology-Assisted Review
The benefits of TAR are widely recognized. The vast majority of respondents said they believed TAR would reduce review costs, speed up review, and make prioritized review more efficient.
Even so, close to half the respondents had never used TAR. Why is this?
Poll results show common perceptions of TAR.
What Size Cases Are Most Appropriate for TAR?
▪ Many legal professionals perceive TAR as suitable only for very specific types of cases.
▪ One respondent said that TAR is best for "large cases, matters with a clearly defined issue, 'impossible timing' cases."
▪ Others also cited the fact that they hadn't had large-enough cases to warrant the use of TAR.
“We would like to use it, but the right case has not come up.”
Poll: Concerns About TAR
“We lack documented processes and procedures about how to use it effectively within our discovery model. TAR changes our model significantly.”
“My previous company used it and it missed one of the biggest hot documents in the case. I wasn't involved in setting it up so I don't know where the issue was.”
What is Technology Assisted Review?
1. A process through which humans work with a computer to teach it to identify relevant documents.
2. Ordering documents by relevance for more efficient review.
3. [Optional] Stopping the review after you have reviewed a high percentage of documents.
What is the Process?
1. Collect and process your files
2. Train the system (seeds)
3. Rank the documents
4. Continue your review/training
5. Test as you go
6. Stop when finished
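The six steps above amount to a feedback loop: rank, review a batch, feed the decisions back, re-rank, and stop when new batches stop producing relevant documents. A minimal sketch in Python; the keyword-overlap scoring model, batch size, and stopping rule are illustrative assumptions, not Catalyst's actual implementation:

```python
def score(text, learned_terms):
    """Toy relevance model: how many learned terms appear in the text."""
    return sum(term in text for term in learned_terms)

def cal_review(docs, labels, seed_terms, batch_size=2):
    """Minimal continuous-active-learning loop.
    docs: id -> text. labels: id -> the reviewer's relevance call,
    revealed only when that document is actually reviewed."""
    learned = set(seed_terms)              # 2. train the system with seeds
    remaining = set(docs)
    reviewed, found = [], []
    while remaining:
        # 3. rank the unreviewed documents with the current model
        batch = sorted(remaining,
                       key=lambda d: (-score(docs[d], learned), d))[:batch_size]
        hits = 0
        for d in batch:                    # 4. continue review/training
            remaining.discard(d)
            reviewed.append(d)
            if labels[d]:                  # attorney marks it relevant
                found.append(d)
                learned.update(docs[d].split())  # feed the decision back
                hits += 1
        if hits == 0:                      # 6. stop when a batch comes up empty
            break
    return reviewed, found
```

On a toy collection seeded with a single term, the loop surfaces the relevant documents first and stops once a batch yields nothing new.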
CAL: Solving Real-World Problems
1. One-time training
2. Rolling uploads
3. Subject matter experts
4. Low richness collections
5. Only for big cases
Simple and Flexible CAL Training
1. Train with anything you want
2. Use as many or as few as you want
3. Keep searching and feeding throughout the process
4. Makes use of every attorney decision on documents
5. Contextual diversity will help find what you don’t know
6. QC helps ensure consistent training
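Point 5 deserves a sketch: contextual diversity means deliberately surfacing documents unlike anything the reviewers have seen, so training covers the whole collection rather than only the neighborhoods the seeds came from. One simple way to express that idea (this is a generic illustration using Jaccard word overlap, not Catalyst's algorithm):

```python
def jaccard(a, b):
    """Jaccard similarity between two word sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def contextual_diversity_pick(reviewed_texts, candidate_texts):
    """Return the unreviewed document least similar to everything
    reviewed so far, so training sees regions the model hasn't covered."""
    reviewed_sets = [set(t.split()) for t in reviewed_texts]
    def nearest_similarity(text):
        words = set(text.split())
        return max((jaccard(words, r) for r in reviewed_sets), default=0.0)
    return min(candidate_texts, key=nearest_similarity)
```

Given a training set full of pricing documents, this picks the candidate about something else entirely, which is exactly the document that could "help find what you don't know."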
“Non-random training methods [keyword search and active learning] require substantially and significantly less human review effort to achieve any given level of recall, than passive learning [random seed selection].”
What Are the Savings?
[Yield curve: percentage of relevant documents found (recall) plotted against percentage of documents reviewed. Successive slides annotate the number of documents in the review, the percentage of relevant documents found, and the diagonal line representing linear review.]
▪ Support Vector Machines
▪ Naïve Bayes
▪ K-Nearest Neighbor
▪ Geospatial Predictive Modeling
▪ Latent Semantic
“I may be less interested in the science behind the ‘black box’ than in whether it produced responsive documents with reasonably high recall and high precision.”
– Peck, M.J. (SDNY)
It’s About Mathematics
ALGORITHM
1. Classify: Which bucket does each document go into?
Recall? Reasonably high
Precision? Reasonably high
2. Protect: Make sure no sensitive or privileged info gets out
Recall? 100% – nothing escapes
Precision? Usually high, especially if you have to log it all
3. Discover: What can we learn from the documents’ contents?
Recall? Don’t really care
Precision? Really good – don’t waste our time with junk
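Recall and precision, the two numbers each task above trades off, are simple ratios over review counts. A quick sketch with hypothetical numbers (1,000 relevant documents in the collection, 1,200 retrieved, 800 of them relevant):

```python
def recall(relevant_found, relevant_total):
    """Share of all relevant documents the review actually found."""
    return relevant_found / relevant_total

def precision(relevant_found, retrieved_total):
    """Share of the retrieved documents that are actually relevant."""
    return relevant_found / retrieved_total

# Hypothetical review: 1,000 relevant docs exist; the process
# retrieved 1,200 documents, 800 of them relevant.
print(recall(800, 1_000))     # 0.8 -> "reasonably high" recall
print(precision(800, 1_200))  # about 0.667 -> "reasonably high" precision
```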
[Yield curve: percentage of relevant documents found plotted against percentage of documents reviewed.]
What Are the Savings?
[Yield curve annotations: Review 12% and get 80% recall. Review 24% and get 95% recall.]
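Translated into review volume, those percentages are the savings. The slides give only percentages, so the 1,000,000-document collection below is a hypothetical size chosen to make the arithmetic concrete:

```python
collection = 1_000_000            # hypothetical size; the slides give only %
linear_review = collection        # linear review reads every document

cal_80 = collection * 12 // 100   # 80% recall after reviewing 12%
cal_95 = collection * 24 // 100   # 95% recall after reviewing 24%

print(linear_review - cal_80)     # 880000 documents never need eyes-on review
print(linear_review - cal_95)     # 760000 documents saved even at 95% recall
```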
What Are the Savings?
Case Study: Large Production Review
▪ Collection: 2.1 million documents
▪ Initial richness: 1%
▪ Review team used CAL
▪ Review richness: 25 to 35%
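The case-study numbers imply the scale of the savings. At 1% richness, 2.1 million documents contain roughly 21,000 relevant ones; if the reviewed batches ran at 25 to 35% richness, hitting a recall target takes far fewer reviews than a linear pass. A back-of-the-envelope sketch; the 80% recall target is my assumption, not stated on the slide:

```python
collection = 2_100_000                # from the slide
relevant = collection // 100          # 1% initial richness -> ~21,000 relevant

target = relevant * 80 // 100         # assumed 80% recall target (not on slide)
# Documents that must be reviewed at each review-batch richness level
results = {r: round(target / r) for r in (0.25, 0.35)}
print(results)                        # tens of thousands of reviews vs. 2.1 million
```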
Robert Ambrogi
Mark Noel
John Tredennick
Thank You!