TAR 2.0 and CAL: Smart People. Smart Machines. Smart Discovery
TRANSCRIPT
877.557.4273
catalystsecure.com
WEBINAR
TAR 2.0 & CAL
Presenters: John Tredennick, Esq. and Mark Noel, Esq.
Moderator: Robert Ambrogi, Esq.
Panelists
Bob is a lawyer, blogger and veteran legal journalist who has served as editor-in-chief of a number of legal publications, including the National Law Journal. Bob serves as Catalyst’s director of communications.
Robert Ambrogi, Esq. Moderator
Mark consults with corporate law departments to deliver effective knowledge management and workflow solutions driving productivity and accountability.
Mark Noel, Esq. Managing Director, Professional Services
John Tredennick, Esq. Founder and CEO
A nationally known trial lawyer and longtime litigation partner, John is Catalyst's founder and CEO. John was recently honored as one of the top six e-discovery trailblazers by The American Lawyer.
Informal Poll Results: Use of Technology-Assisted Review
The benefits of TAR are widely recognized. The vast majority of respondents said they believed TAR would reduce review costs, speed up review, and make prioritized review more efficient.
Even so, close to half the respondents had never used TAR. Why is this?
Poll results show common perceptions of TAR.
What Size Cases Are Most Appropriate for TAR?
▪ Many legal professionals perceive TAR as suitable only for very specific types of cases.
▪ One respondent said that TAR is best for "large cases, matters with a clearly defined issue, 'impossible timing' cases."
▪ Others also cited the fact that they hadn't had large-enough cases to warrant the use of TAR.
“We would like to use it, but the right case has not come up.”
Poll: Concerns About TAR
“We lack documented processes and procedures about how to use it effectively within our discovery model. TAR changes our model significantly.”
“My previous company used it and it missed one of the biggest hot documents in the case. I wasn't involved in setting it up so I don't know where the issue was.”
What is Technology Assisted Review?
1. A process through which humans work with a computer to teach it to identify relevant documents.
2. Ordering documents by relevance for more efficient review.
3. [Optional] Stopping the review after you have reviewed a high percentage of documents.
What is the Process?
1. Collect and process your files
2. Train the system (seeds)
3. Rank the documents
4. Continue your review/training
5. Test as you go
6. Stop when finished
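The six steps above amount to a feedback loop: rank, review a batch, feed the decisions back, re-rank, and stop when new batches stop producing relevant documents. A minimal sketch in Python; the keyword-overlap scoring model, batch size, and stopping rule are illustrative assumptions, not Catalyst's actual implementation:

```python
def score(text, learned_terms):
    """Toy relevance model: how many learned terms appear in the text."""
    return sum(term in text for term in learned_terms)

def cal_review(docs, labels, seed_terms, batch_size=2):
    """Minimal continuous-active-learning loop.
    docs: id -> text. labels: id -> the reviewer's relevance call,
    revealed only when that document is actually reviewed."""
    learned = set(seed_terms)              # 2. train the system with seeds
    remaining = set(docs)
    reviewed, found = [], []
    while remaining:
        # 3. rank the unreviewed documents with the current model
        batch = sorted(remaining,
                       key=lambda d: (-score(docs[d], learned), d))[:batch_size]
        hits = 0
        for d in batch:                    # 4. continue review/training
            remaining.discard(d)
            reviewed.append(d)
            if labels[d]:                  # attorney marks it relevant
                found.append(d)
                learned.update(docs[d].split())  # feed the decision back
                hits += 1
        if hits == 0:                      # 6. stop when a batch comes up empty
            break
    return reviewed, found
```

On a toy collection seeded with a single term, the loop surfaces the relevant documents first and stops once a batch yields nothing new.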
CAL: Solving Real-World Problems
1. One-time training
2. Rolling uploads
3. Subject matter experts
4. Low richness collections
5. Only for big cases
Simple and Flexible CAL Training
1. Train with anything you want
2. Use as many or as few as you want
3. Keep searching and feeding throughout the process
4. Makes use of every attorney decision on documents
5. Contextual diversity will help find what you don’t know
6. QC helps ensure consistent training
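Point 5 deserves a sketch: contextual diversity means deliberately surfacing documents unlike anything the reviewers have seen, so training covers the whole collection rather than only the neighborhoods the seeds came from. One simple way to express that idea (this is a generic illustration using Jaccard word overlap, not Catalyst's algorithm):

```python
def jaccard(a, b):
    """Jaccard similarity between two word sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def contextual_diversity_pick(reviewed_texts, candidate_texts):
    """Return the unreviewed document least similar to everything
    reviewed so far, so training sees regions the model hasn't covered."""
    reviewed_sets = [set(t.split()) for t in reviewed_texts]
    def nearest_similarity(text):
        words = set(text.split())
        return max((jaccard(words, r) for r in reviewed_sets), default=0.0)
    return min(candidate_texts, key=nearest_similarity)
```

Given a training set full of pricing documents, this picks the candidate about something else entirely, which is exactly the document that could "help find what you don't know."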
“Non-random training methods [keyword search and active learning] require substantially and significantly less human review effort to achieve any given level of recall, than passive learning [random seed selection].”
What Are the Savings?
[Yield curve: percentage of relevant documents found (recall) plotted against percentage of documents reviewed. Successive slides annotate the number of documents in the review, the percentage of relevant documents found, and the diagonal line representing linear review.]
▪ Support Vector Machines
▪ Naïve Bayes
▪ K-Nearest Neighbor
▪ Geospatial Predictive Modeling
▪ Latent Semantic
“I may be less interested in the science behind the ‘black box’ than in whether it produced responsive documents with reasonably high recall and high precision.”
– Peck, M.J. (SDNY)
It’s About Mathematics
ALGORITHM
1. Classify: Which bucket does each document go into?
Recall? Reasonably high
Precision? Reasonably high
2. Protect: Make sure no sensitive or privileged info gets out
Recall? 100% – nothing escapes
Precision? Usually high, especially if you have to log it all
3. Discover: What can we learn from the documents’ contents?
Recall? Don’t really care
Precision? Really good – don’t waste our time with junk
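Recall and precision, the two numbers each task above trades off, are simple ratios over review counts. A quick sketch with hypothetical numbers (1,000 relevant documents in the collection, 1,200 retrieved, 800 of them relevant):

```python
def recall(relevant_found, relevant_total):
    """Share of all relevant documents the review actually found."""
    return relevant_found / relevant_total

def precision(relevant_found, retrieved_total):
    """Share of the retrieved documents that are actually relevant."""
    return relevant_found / retrieved_total

# Hypothetical review: 1,000 relevant docs exist; the process
# retrieved 1,200 documents, 800 of them relevant.
print(recall(800, 1_000))     # 0.8 -> "reasonably high" recall
print(precision(800, 1_200))  # about 0.667 -> "reasonably high" precision
```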
[Yield curve: percentage of relevant documents found plotted against percentage of documents reviewed.]
What Are the Savings?
[Yield curve annotations: Review 12% and get 80% recall. Review 24% and get 95% recall.]
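Translated into review volume, those percentages are the savings. The slides give only percentages, so the 1,000,000-document collection below is a hypothetical size chosen to make the arithmetic concrete:

```python
collection = 1_000_000            # hypothetical size; the slides give only %
linear_review = collection        # linear review reads every document

cal_80 = collection * 12 // 100   # 80% recall after reviewing 12%
cal_95 = collection * 24 // 100   # 95% recall after reviewing 24%

print(linear_review - cal_80)     # 880000 documents never need eyes-on review
print(linear_review - cal_95)     # 760000 documents saved even at 95% recall
```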
What Are the Savings?
Case Study: Large Production Review
▪ Collection: 2.1 million documents
▪ Initial richness: 1%
▪ Review team used CAL
▪ Review richness: 25 to 35%
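The case-study numbers imply the scale of the savings. At 1% richness, 2.1 million documents contain roughly 21,000 relevant ones; if the reviewed batches ran at 25 to 35% richness, hitting a recall target takes far fewer reviews than a linear pass. A back-of-the-envelope sketch; the 80% recall target is my assumption, not stated on the slide:

```python
collection = 2_100_000                # from the slide
relevant = collection // 100          # 1% initial richness -> ~21,000 relevant

target = relevant * 80 // 100         # assumed 80% recall target (not on slide)
# Documents that must be reviewed at each review-batch richness level
results = {r: round(target / r) for r in (0.25, 0.35)}
print(results)                        # tens of thousands of reviews vs. 2.1 million
```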
Robert Ambrogi
Mark Noel
John Tredennick
Thank You!