![Page 1: Demystifying Predictive Coding Technology](https://reader035.vdocuments.mx/reader035/viewer/2022081502/557e0181d8b42a16408b4732/html5/thumbnails/1.jpg)
Demystifying Predictive Coding Technology
Date: Wednesday, August 13, 2014
Time: 1 p.m. ET / Noon CT / 11 a.m. MT / 10 a.m. PT
Anita Engles, VP Products and Marketing, Daegis
Doug Stewart, VP Sales Support, Daegis
![Page 2: Demystifying Predictive Coding Technology](https://reader035.vdocuments.mx/reader035/viewer/2022081502/557e0181d8b42a16408b4732/html5/thumbnails/2.jpg)
TAR Defined
A process for prioritizing or coding a collection of
electronic documents using a computerized
system that harnesses human judgments of one
or more Subject Matter Expert(s) on a smaller set
of documents and then extrapolates those
judgments to the remaining Document Population.*
* Grossman & Cormack 2012
![Page 3: Demystifying Predictive Coding Technology](https://reader035.vdocuments.mx/reader035/viewer/2022081502/557e0181d8b42a16408b4732/html5/thumbnails/3.jpg)
The TAR Frontlines
• Evaluation of Machine-Learning Protocols for Technology-Assisted Review in Electronic Discovery (2014)
• Maura R. Grossman and Gordon V. Cormack
• http://cormack.uwaterloo.ca/cormack/calstudy/
![Page 4: Demystifying Predictive Coding Technology](https://reader035.vdocuments.mx/reader035/viewer/2022081502/557e0181d8b42a16408b4732/html5/thumbnails/4.jpg)
Key Findings
• Non-Random Selection Methods Work Best for Seed Set
• Active Learning Better than Passive Learning
• Senior Level Subject Matter Experts are NOT Required to Train System
![Page 5: Demystifying Predictive Coding Technology](https://reader035.vdocuments.mx/reader035/viewer/2022081502/557e0181d8b42a16408b4732/html5/thumbnails/5.jpg)
TAR Steps
Process Overview
1. Identifying the Population
2. Relatedness Scoring
3. Keyword Searching
4. Creating the Seed Set
5. Assessing Results
6. Training
7. Producing
![Page 6: Demystifying Predictive Coding Technology](https://reader035.vdocuments.mx/reader035/viewer/2022081502/557e0181d8b42a16408b4732/html5/thumbnails/6.jpg)
Relatedness Scoring
Building the Map
• Step: Build the Map
• Purpose: Measure Relationships
• Variations: Algorithms
• Why It Matters: Core to Predictive Functionality
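The slide does not say which algorithm measures the relationships, and it notes that algorithms vary. As one illustration only, the sketch below builds a small relatedness map using cosine similarity over word counts. The function names and sample documents are hypothetical, and the choice of cosine similarity is an assumption, not the presenters' method.

```python
import math
import re
from collections import Counter


def term_counts(text: str) -> Counter:
    """Lowercase the text and count word occurrences (a simple bag of words)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two term-count vectors (0.0 to 1.0)."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def relatedness_map(docs: dict[str, str]) -> dict[tuple[str, str], float]:
    """Score every unordered pair of documents once."""
    vectors = {doc_id: term_counts(text) for doc_id, text in docs.items()}
    ids = sorted(vectors)
    return {
        (x, y): cosine_similarity(vectors[x], vectors[y])
        for i, x in enumerate(ids)
        for y in ids[i + 1:]
    }


# Hypothetical sample documents, for illustration only.
docs = {
    "d1": "throttle stuck on the motorcycle",
    "d2": "motorcycle throttle cable recall",
    "d3": "quarterly sales figures",
}
scores = relatedness_map(docs)
```

Documents that share vocabulary (d1 and d2) score higher than documents that share none (d1 and d3), which is the kind of relationship a map like this is meant to capture.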
![Page 7: Demystifying Predictive Coding Technology](https://reader035.vdocuments.mx/reader035/viewer/2022081502/557e0181d8b42a16408b4732/html5/thumbnails/7.jpg)
Keyword Searching
Tried and True
• Step: Validated & Iterative Keyword Searching
• Purpose: Inexpensive Training
• Variations: Not used in All Approaches
• Why It Matters: Drives Efficiency
motorcycle or bike AND ((throttle or accel*) w/10 stick)
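To make the example query concrete, here is a rough sketch of how it might be evaluated against a single document. It assumes the query is read as (motorcycle OR bike) AND ((throttle OR accel*) within 10 words of stick), that `*` matches any word suffix, and that `w/10` is symmetric. Operator precedence and proximity semantics differ between search tools, so these readings are assumptions. All function names are hypothetical.

```python
import re


def tokens(text: str) -> list[str]:
    """Split text into lowercase word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def positions(words: list[str], pattern: str) -> list[int]:
    """Indexes of words matching a term; a trailing * matches any suffix."""
    if pattern.endswith("*"):
        prefix = pattern[:-1]
        return [i for i, w in enumerate(words) if w.startswith(prefix)]
    return [i for i, w in enumerate(words) if w == pattern]


def within(words: list[str], left_terms: list[str],
           right_terms: list[str], distance: int) -> bool:
    """True if any left term occurs within `distance` words of any right term."""
    left = [i for t in left_terms for i in positions(words, t)]
    right = [i for t in right_terms for i in positions(words, t)]
    return any(abs(i - j) <= distance for i in left for j in right)


def matches(text: str) -> bool:
    """Assumed reading: (motorcycle OR bike) AND ((throttle OR accel*) w/10 stick)."""
    words = tokens(text)
    has_vehicle = bool(positions(words, "motorcycle") or positions(words, "bike"))
    return has_vehicle and within(words, ["throttle", "accel*"], ["stick"], 10)
```

Under this reading, "The motorcycle throttle would stick at high speed" matches, while a document that mentions a sticking throttle but no motorcycle or bike does not. Note that the literal term `stick` would miss variants such as "stuck" or "sticking", which is one reason keyword lists are validated and refined iteratively.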
![Page 8: Demystifying Predictive Coding Technology](https://reader035.vdocuments.mx/reader035/viewer/2022081502/557e0181d8b42a16408b4732/html5/thumbnails/8.jpg)
Seed Set
Building the Seed Set
• Step: Review Strategically Sampled Docs
• Purpose: Generates High-level Relevancy “Heat Map”
• Variations: Random, Strategic, Judgmental Samples
• Why It Matters: Drives Efficiency
![Page 9: Demystifying Predictive Coding Technology](https://reader035.vdocuments.mx/reader035/viewer/2022081502/557e0181d8b42a16408b4732/html5/thumbnails/9.jpg)
Predicting Responsiveness
The Prediction Engine
[Diagram: the Relatedness Map, Seed Set / Search, and Training feed the Prediction Engine, which produces Predictive Calls on a scale from "Definitely Responsive" to "Definitely Not".]
The three categories of known information (the relatedness map, the seed set and search results, and the training decisions) are fed into the system’s algorithm, which evaluates the data to score the likelihood that each document is responsive.
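The slides do not describe the prediction algorithm itself. As a toy illustration of turning human coding decisions into a likelihood score, the sketch below uses a similarity-weighted vote over already-coded documents. The Jaccard similarity, the 0.5 fallback, and all names are assumptions made for this example, not a description of any vendor's engine.

```python
import re


def words(text: str) -> set[str]:
    """The set of lowercase word tokens in a document."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: set[str], b: set[str]) -> float:
    """Shared words divided by all words across both documents."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def responsiveness_score(text: str, coded: list[tuple[str, bool]]) -> float:
    """Score in [0, 1]; `coded` holds (text, is_responsive) reviewer decisions.

    Each coded document votes with a weight equal to its similarity to `text`.
    A document unrelated to everything coded so far gets 0.5 (unknown).
    """
    target = words(text)
    weighted = 0.0
    total = 0.0
    for coded_text, is_responsive in coded:
        sim = jaccard(target, words(coded_text))
        weighted += sim * (1.0 if is_responsive else 0.0)
        total += sim
    return weighted / total if total else 0.5


# Hypothetical reviewer decisions, for illustration only.
coded = [
    ("motorcycle throttle stuck", True),
    ("quarterly sales figures", False),
]
```

Scores near 1.0 correspond to the "Definitely Responsive" end of the scale, scores near 0.0 to "Definitely Not", and scores near 0.5 to the uncertain middle.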
![Page 10: Demystifying Predictive Coding Technology](https://reader035.vdocuments.mx/reader035/viewer/2022081502/557e0181d8b42a16408b4732/html5/thumbnails/10.jpg)
Assessing the Results
Building the Answer Key
• Step: Assess Accuracy Based on Industry Standard Metrics
• Purpose: Informs Decision to Stop TAR
• Variations: Simple and Stratified Sampling; Sample Once or Multiple Times
• Why It Matters: Defensibility
[Scale: Predictive Calls, from "Definitely Responsive" to "Definitely Not"]
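The difference between simple and stratified sampling can be sketched as follows. The three score bands and their cut-offs (one third and two thirds) are arbitrary assumptions chosen for illustration, as are the function names; the slides do not specify how strata are defined.

```python
import random


def simple_sample(doc_ids, size: int, seed: int = 0) -> list:
    """Draw documents uniformly at random from the whole population."""
    rng = random.Random(seed)
    population = list(doc_ids)
    return rng.sample(population, min(size, len(population)))


def stratified_sample(scores: dict[str, float], per_stratum: int,
                      seed: int = 0) -> dict[str, list[str]]:
    """Draw the same number of documents from each predicted-score band.

    `scores` maps doc id -> predicted responsiveness in [0, 1].
    The band boundaries below are illustrative assumptions.
    """
    rng = random.Random(seed)
    strata: dict[str, list[str]] = {"low": [], "middle": [], "high": []}
    for doc_id, score in sorted(scores.items()):
        if score < 1 / 3:
            strata["low"].append(doc_id)
        elif score < 2 / 3:
            strata["middle"].append(doc_id)
        else:
            strata["high"].append(doc_id)
    return {
        name: rng.sample(ids, min(per_stratum, len(ids)))
        for name, ids in strata.items()
    }
```

Reviewers would then code the sampled documents by hand, and those decisions form the answer key against which the predictive calls are measured. Fixing the seed keeps the draw reproducible, which helps when the process has to be documented.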
![Page 11: Demystifying Predictive Coding Technology](https://reader035.vdocuments.mx/reader035/viewer/2022081502/557e0181d8b42a16408b4732/html5/thumbnails/11.jpg)
Training / Learning
Continual Refinement
[Scale: Predictive Calls, from "Definitely Responsive" to "Definitely Not"]
Refining keyword searches and manually reviewing the documents with the highest levels of uncertainty moves documents from the middle of the scale toward its endpoints.
• Step: Reviewers Train and System Learns
• Purpose: Transfer Subject Matter Expertise to TAR System
• Variations: Active Learning; Passive Learning
• Why It Matters: Dramatic Cost Savings
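One common way to pick "documents with the highest levels of uncertainty" is to choose those whose predicted score is closest to the midpoint of the scale. The sketch below assumes scores in [0, 1] with 0.5 as the point of maximum uncertainty; the function name and sample scores are hypothetical, and real systems may define uncertainty differently.

```python
def most_uncertain(scores: dict[str, float], batch_size: int) -> list[str]:
    """Return doc ids whose scores lie closest to 0.5, most uncertain first.

    Ties are broken by doc id so the result is deterministic.
    """
    ranked = sorted(scores, key=lambda doc_id: (abs(scores[doc_id] - 0.5), doc_id))
    return ranked[:batch_size]


# Hypothetical predicted scores, for illustration only.
scores = {"a": 0.95, "b": 0.52, "c": 0.10, "d": 0.45, "e": 0.70}
next_batch = most_uncertain(scores, 2)
```

In an active-learning round, reviewers would code `next_batch`, the system would retrain on the new decisions, and the scores would be recomputed. In passive learning, by contrast, the next documents are chosen without consulting the system's current scores, for example at random.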
![Page 12: Demystifying Predictive Coding Technology](https://reader035.vdocuments.mx/reader035/viewer/2022081502/557e0181d8b42a16408b4732/html5/thumbnails/12.jpg)
Post-TAR
Producing the Responsive Documents
• Terminate TAR Review
• Decision based on Accuracy and Cost Metrics
• “Stabilization”
• Harvest Predicted Calls
• Review Responsive Docs
• Sample Non-Responsive Docs
• Document Entire Process
![Page 13: Demystifying Predictive Coding Technology](https://reader035.vdocuments.mx/reader035/viewer/2022081502/557e0181d8b42a16408b4732/html5/thumbnails/13.jpg)
Accuracy Metrics
How Accuracy is Measured
TAR improves the F1 score by moving documents from false (incorrect) bins to the true bins where they belong.
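For reference, F1 is the harmonic mean of precision and recall, both of which are computed from the true and false bins. The sketch below uses the standard definitions; the document counts in the example are invented purely to show the arithmetic and do not come from the presentation.

```python
def accuracy_metrics(true_pos: int, false_pos: int, false_neg: int) -> tuple[float, float, float]:
    """Return (precision, recall, f1) from confusion-matrix counts.

    true_pos:  responsive documents predicted responsive
    false_pos: non-responsive documents predicted responsive
    false_neg: responsive documents predicted non-responsive
    """
    predicted = true_pos + false_pos
    actual = true_pos + false_neg
    precision = true_pos / predicted if predicted else 0.0
    recall = true_pos / actual if actual else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Illustrative counts only: an early round versus a later round in which
# 20 documents have moved from each false bin into the true-positive bin.
early = accuracy_metrics(true_pos=60, false_pos=40, false_neg=40)
later = accuracy_metrics(true_pos=80, false_pos=20, false_neg=20)
```

With these made-up counts, precision, recall, and F1 all rise from about 0.6 to about 0.8, which is the effect the slide describes.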
![Page 14: Demystifying Predictive Coding Technology](https://reader035.vdocuments.mx/reader035/viewer/2022081502/557e0181d8b42a16408b4732/html5/thumbnails/14.jpg)
Selected TAR Bibliography
TAR Resources
1. Search, Forward: Will Manual Document Review and Keyword Searches be Replaced by Computer-assisted Coding? (2011)
• Judge Andrew Peck
• http://www.law.com/jsp/lawtechnologynews/PubArticleLTN.jsp?id=1202516530534
2. Technology-Assisted Review in E-Discovery can be More Effective and More Efficient than Exhaustive Manual Review (2011)
• Maura R. Grossman and Gordon V. Cormack
• http://jolt.richmond.edu/v17i3/article11.pdf
3. Where the Money Goes: Understanding Litigant Expenditures for Producing Electronic Discovery (2012)
• RAND Institute for Civil Justice: Nicholas M. Pace, Laura Zakaras
• http://www.rand.org/pubs/monographs/MG1208.html#abstract
![Page 15: Demystifying Predictive Coding Technology](https://reader035.vdocuments.mx/reader035/viewer/2022081502/557e0181d8b42a16408b4732/html5/thumbnails/15.jpg)
Thank You!
Q&A