877.557.4273 | catalystsecure.com
APRIL WEBINAR
Data Driven Discovery: Seven Metrics for Smarter Decisions & Better Results

Speakers
Jeremy Pickens, Ph.D.
Mark Noel, Esq.
Robert Ambrogi, Esq. | Host & Moderator
Today’s Agenda
§ Being Empirical 101
  § Kaizen, the scientific method, and getting good data
  § How much do you really need to do or know to be effective?
§ A data-driven look at seven phases of discovery
  § What variables might be in play?
  § Any variables that seem plausible but don’t have much effect?
  § When it might be a good idea to measure
  § What/how to measure
  § Other experiments or research
§ Questions / open discussion
Speakers
Jeremy Pickens, Ph.D. | Chief Scientist, Catalyst
Mark Noel, Esq. | Managing Director, Professional Services, Catalyst
Robert Ambrogi, Esq. | Director of Communications, Catalyst
Jeremy is one of the world's leading search scientists and a pioneer in the field of collaborative exploratory search. He has six patents pending in the field of search and information retrieval, including two for collaborative exploratory search systems. At Catalyst, Jeremy researches and develops methods of using collaborative search and other techniques to enhance search and review within the Catalyst system and help clients achieve more intelligent and precise results in e-discovery search and review.
Mark specializes in helping clients use technology-assisted review, advanced analytics, and custom workflows to handle complex and large-scale litigations. He also works with Catalyst’s research and development group on new litigation technology tools. Before joining Catalyst, Mark was an intellectual property litigator with Latham & Watkins LLP, co-founder of an e-discovery software startup, and a research scientist at Dartmouth College’s Interactive Media Laboratory.
Bob is a practicing lawyer in Massachusetts and is the former editor-in-chief of The National Law Journal, Lawyers USA and Massachusetts Lawyers Weekly. A fellow of the College of Law Practice Management, he writes the award-winning blog LawSites and co-hosts the legal-affairs podcast Lawyer2Lawyer. He is a regular contributor to the ABA Journal and is vice chair of the editorial board of the ABA’s Law Practice magazine.
Abraham Flexner | Evidence-based medicine
Fundamentals
§ Spot the testable question – don’t guess
§ Good experimental design – don’t make it up as you go along
§ Control variables
§ Assume data will be noisy – you may need several matters
§ Measure early and often
“The first principle is that you must not fool yourself, and you are the easiest person to fool.”
– Richard Feynman
Cranfield Model (1966)
1. Assemble a test collection:
   § Document Corpus
   § Expression of User Information Need
   § Relevance Judgments, aka ground truth
2. Choose an Effectiveness Metric
3. Vary the TAR system
Training Protocol

                          Condition 1           Condition 2           Condition 3
Document Corpus           Corpus Z              Corpus Z              Corpus Z
Starting Condition        [docid:7643 = true]   [docid:7643 = true]   [docid:7643 = true]
(e.g. seed documents,     [docid:225 = true]    [docid:225 = true]    [docid:225 = true]
ad hoc query, etc.)
Feature (Signal)          Character n-grams     Character n-grams     Character n-grams
Extraction
Ranking Engine            Logistic Regression   Logistic Regression   Logistic Regression
Training/Review Protocol  SPL                   SAL                   CAL
Ground Truth              [docid:7643 = true]   [docid:7643 = true]   [docid:7643 = true]
                          [docid:225 = true]    [docid:225 = true]    [docid:225 = true]
                          [docid:42 = false]    [docid:42 = false]    [docid:42 = false]
Evaluation Metric         Precision@75% recall  Precision@75% recall  Precision@75% recall
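The evaluation metric used in the protocol above, precision at 75% recall, can be sketched as below. The function name and toy ranking are hypothetical illustrations, not part of any product.

```python
# Hypothetical sketch: precision at a target recall level, computed by
# walking down a ranked list until enough relevant documents are found.
# "ranking" is a list of docids ordered by the engine's relevance score;
# "relevant" is the ground-truth set of relevant docids.

def precision_at_recall(ranking, relevant, target_recall=0.75):
    """Walk down the ranking until target_recall of the relevant
    documents have been seen, then report precision at that depth."""
    needed = target_recall * len(relevant)
    found = 0
    for depth, docid in enumerate(ranking, start=1):
        if docid in relevant:
            found += 1
        if found >= needed:
            return found / depth  # precision at the cutoff depth
    return 0.0  # target recall never reached

# Toy example: 4 relevant docs; 75% recall means finding 3 of them.
ranking = [7643, 101, 225, 42, 88, 350, 17]
relevant = {7643, 225, 88, 350}
print(precision_at_recall(ranking, relevant))  # 3rd relevant doc at depth 5 -> 0.6
```

Holding the corpus, seeds, features, and engine fixed, this single number lets the three training protocols be compared directly.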
Okay, but what about all of us non-scientists?
1. Targeted Collections

Factors to consider:
§ Number of custodians
§ Number or type of collection sources
§ Sophistication, reasonableness, or tenacity of the opposing side
§ Likelihood of unrelated but sensitive material being scooped up
§ Capabilities of targeted collection tools
1. Targeted Collections

Example: Generating and validating boolean terms
§ Process and review random samples from initial custodians
§ Use TAR to sort into “likely positive” and “likely negative” populations
§ Use TAR to also generate or supplement a list of potential search terms
§ Run a report to compare each term’s hit count or hit density in each of the populations

(Charts compare hit density for the terms “Raptor” and “Payments”)
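The hit-density comparison in that last step can be sketched as follows. The documents and the two example terms are invented for illustration; the point is that a term like “Raptor” concentrates in the likely-positive population while a generic term like “Payments” hits both populations about equally.

```python
# Hypothetical sketch of the term-validation report: for each candidate
# boolean term, compare its hit density in the TAR "likely positive" vs
# "likely negative" populations. Document contents are invented.

def hit_density(term, docs):
    """Fraction of documents in `docs` whose text contains `term`."""
    hits = sum(1 for text in docs if term.lower() in text.lower())
    return hits / len(docs)

likely_positive = ["raptor payments ledger", "raptor special entity", "quarterly payments"]
likely_negative = ["lunch payments", "payments reminder", "office party"]

for term in ["raptor", "payments"]:
    pos = hit_density(term, likely_positive)
    neg = hit_density(term, likely_negative)
    print(f"{term}: {pos:.0%} of likely-positive vs {neg:.0%} of likely-negative")
```

A term whose density is high in the likely-positive pile and near zero in the likely-negative pile is a good candidate for a negotiated search-term list; one with similar density in both piles adds review volume without adding signal.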
2. Culling

Factors to consider:
§ Black list vs. white list search terms
§ File type, date range, etc.
§ Can we get a stip?
§ Do we need to validate in order to avoid a Biomet problem?
§ Is it even worth doing in light of what we’re doing next?
Problem with Keyword Search
§ Attorneys worked with experienced paralegals to develop search terms. Upon finishing, they estimated that they had retrieved at least three quarters of all relevant documents.
§ What they actually retrieved was closer to 20% (see chart)

Generally Poor Results
Blair & Maron, An Evaluation of Retrieval Effectiveness for a Full-Text Document-Retrieval System (1985).
Problem with Keyword Search
§ (((master settlement agreement OR msa) AND NOT (medical savings account OR metropolitan standard area)) OR s. 1415 OR (ets AND NOT educational testing service) OR (liggett AND NOT sharon a. liggett) OR atco OR lorillard OR (pmi AND NOT presidential management intern) OR pm usa OR rjr OR (b&w AND NOT photo*) OR phillip morris OR batco OR ftc test method OR star scientific OR vector group OR joe camel OR (marlboro AND NOT upper marlboro)) AND NOT (tobacco* OR cigarette* OR smoking OR tar OR nicotine OR smokeless OR synar amendment OR philip morris OR r.j. reynolds OR ("brown and williamson") OR ("brown & williamson") OR bat industries OR liggett group)
Jason R. Baron, Through A Lawyer’s Lens: Measuring Performance in Conducting Large Scale Searches Against Heterogeneous Data Sets in Satisfaction of Litigation Requirements, University of Pennsylvania Workshop (October 26, 2006).
It can become overly complex
2. Culling

Potential Metrics:
§ Sample review to validate recall or elusion
§ Total cost of culling effort vs. total cost to promote additional documents to review
§ Using TAR, additional non-relevant docs might not get reviewed anyway
§ Developing and validating extensive culling terms requires a lot of human effort
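An elusion check of a culling decision can be sketched as below. The sample counts and discard-pile size are invented; the arithmetic is just the sample's relevance rate projected onto the culled-out population.

```python
# Hypothetical sketch: estimating elusion (relevant documents missed by
# culling) from a random sample drawn from the discard pile. All counts
# here are invented for illustration.

def elusion_estimate(sample_size, relevant_in_sample, discard_pile_size):
    """Point estimate of the elusion rate, and the projected number of
    relevant documents left behind in the discard pile."""
    rate = relevant_in_sample / sample_size
    return rate, rate * discard_pile_size

rate, missed = elusion_estimate(sample_size=1500, relevant_in_sample=6,
                                discard_pile_size=700_000)
print(f"elusion rate {rate:.2%}, roughly {missed:,.0f} relevant docs culled out")
```

A point estimate alone is optimistic; in practice you would also report a confidence interval around the sample rate before defending the cull.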
2. Culling
Example: Manual search and culling vs. TAR / CAL

                             Time                                   Cost
Manual development and       Two weeks (with two associates)        160 associate hours
validation of search terms   to cull 700,000 documents
Letting TAR do the work      One day (with 12-person review team)   100 review hours + technology
                             to review 6,000 more docs              cost of 700,000 × $0.01
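Putting rough dollar figures on the comparison above makes the trade-off concrete. The hourly rates below are invented assumptions; only the hour counts and the per-document technology fee come from the example.

```python
# Hypothetical back-of-the-envelope for the culling comparison. The two
# hourly rates are assumed values, not figures from the example.

ASSOC_RATE = 300.0   # assumed associate billing rate, $/hour
REVIEW_RATE = 60.0   # assumed contract-reviewer rate, $/hour

manual = 160 * ASSOC_RATE                 # search-term development and validation
tar = 100 * REVIEW_RATE + 700_000 * 0.01  # extra review hours + per-doc tech fee

print(f"manual culling: ${manual:,.0f}")  # $48,000
print(f"TAR route:      ${tar:,.0f}")     # $13,000
```

Under these assumed rates the TAR route wins comfortably, and it also finishes roughly nine working days sooner; with different rates the dollar gap shifts, but the time gap does not.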
3. ECA & Investigation

Factors to consider:
§ Know what we’re looking for?
§ Possible intent to evade search?
§ Time/resource constraints?
§ Blair & Maron – almost 50 topics; 75% recall on one, but many with less than 3% recall.
§ TREC 2016 will have a Total Recall track.
3. ECA & Investigation

Potential Metrics:
§ Number of different search techniques
§ Total time, total docs, or percentage of docs required to reach a defensible outcome
4. Review

Factors to consider:
§ Richness estimates
  § Population overall
  § Batch richness / review precision
§ Family or doc level review
  § Factors for and against each
  § Overall richness
  § Average family size
  § Workflow and tool capabilities
§ Dependence on context to make relevance judgments
4. Review

More factors to consider:
§ Review rate
  § Relevant vs. non-relevant
  § Threading
  § Clustering / TAR
  § Heterogeneity
§ Complexity
  § Number of coding fields
  § Separating variables in coding fields
  § Bifurcated workflows (e.g., special file types)
4. Review

Potential Metrics:
§ The usual suspects
  § Population richness
  § Batch richness
  § Docs per hour
  § Relevant docs per review hour
§ Review precision by day (“yield”)
§ Review precision by reviewer
§ A/B testing
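The per-reviewer versions of these metrics fall straight out of an ordinary coding log. The log entries below are invented; the computations are the standard definitions.

```python
# Hypothetical sketch of "the usual suspects": per-reviewer throughput and
# review precision (yield) from a day's coding log. All numbers invented.

# (reviewer, docs coded, docs marked relevant, hours worked)
log = [("alice", 400, 120, 8), ("bob", 320, 40, 8)]

for reviewer, docs, relevant, hours in log:
    print(f"{reviewer}: {docs / hours:.0f} docs/hr, "
          f"review precision {relevant / docs:.0%}, "
          f"{relevant / hours:.1f} relevant docs per review hour")
```

Tracked by day, the same numbers expose CAL at work: in a well-running CAL review, batch richness and yield start high and decay as the relevant documents are exhausted, which is itself a useful signal of when to stop.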
5. TAR and Analytics

Factors to consider:
§ Re-using seeds
§ Which/how many judgmental seeds
§ Artificial seeds
§ Variations on weighting
§ Richness constraints?
§ Training protocol
§ Frequency of re-ranking
5. TAR and Analytics
Frequency of updates

                          Condition 1            Condition 2
Document Corpus           Corpus Z               Corpus Z
Starting Condition        [docid:7643 = true]    [docid:7643 = true]
(e.g. seed documents,     [docid:225 = true]     [docid:225 = true]
ad hoc query, etc.)
Feature (Signal)          1-grams                1-grams
Extraction
Ranking Engine            Logistic Regression    Logistic Regression
Training/Review Protocol  CAL, reranked          CAL, reranked
                          once a day             every 10 min.
Ground Truth              [docid:7643 = true]    [docid:7643 = true]
                          [docid:225 = true]     [docid:225 = true]
                          [docid:42 = false]     [docid:42 = false]
Evaluation Metric         Precision@75% recall   Precision@75% recall
5. TAR and Analytics
Example: Frequency of Re-ranking (charts for Cases 1–4)
5. TAR and Analytics

                          Condition 1            Condition 2
Document Corpus           English Corpus         Mixed Japanese+English Corpus
Starting Condition        [docid:7643 = true]    [docid:9356 = false]
(e.g. seed documents,     [docid:225 = true]     [docid:89 = true]
ad hoc query, etc.)
Feature (Signal)          1-grams, no cross-     1-grams, no cross-
Extraction                language features      language features
Ranking Engine            Logistic Regression    Support Vector Machine
Training/Review Protocol  CAL                    CAL
Ground Truth              [docid:7643 = true]    [docid:9356 = false]
                          [docid:225 = true]     [docid:89 = true]
                          [docid:42 = false]     [docid:42 = false]
Evaluation Metric         Precision@75% recall   Precision@75% recall
6. Quality Control

Factors to consider:
§ How much is enough?
§ Random or systematic?
§ Validate?
§ Is disagreement just the base level inherent in human review, or significant?
§ Disagreement due to expertise differences between the first-pass and QC teams (less expert reviewers tend to over-mark and err on the side of relevance)
§ Relevance drift
Disagreement Among Reviewers
Maura R. Grossman & Gordon V. Cormack, Technology-Assisted Review in E-Discovery Can Be More Effective and More Efficient Than Exhaustive Manual Review, XVII Rich. J.L. & Tech. 11 (2011), http://jolt.richmond.edu/v17i3/article11.pdf; Ellen M. Voorhees, Variations in Relevance Judgments and the Measurement of Retrieval Effectiveness, 36 Info. Processing & Mgmt. 697 (2000).
6. Quality Control

Potential metrics:
§ Overturn rate in a validation sample
§ Overturn yield
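Both QC metrics are one-line computations once the sample has been re-reviewed. The sample counts below are invented; the definitions assumed here are overturn rate as overturns per sampled document, and overturn yield as overturns found per hour of QC effort.

```python
# Hypothetical sketch of the two QC metrics. All counts are invented.

def overturn_rate(sample_size, overturns):
    """Fraction of QC-sampled coding calls that were reversed."""
    return overturns / sample_size

def overturn_yield(overturns, qc_hours):
    """Overturns found per hour of QC effort - when this drops low,
    further QC is buying little."""
    return overturns / qc_hours

rate = overturn_rate(sample_size=500, overturns=15)
yld = overturn_yield(overturns=15, qc_hours=10)
print(f"overturn rate {rate:.1%}, yield {yld:.1f} overturns per QC hour")
```

A raw overturn rate should be read against the base level of reviewer disagreement documented by Grossman & Cormack and Voorhees; an overturn rate below that baseline may reflect ordinary human variability rather than a review problem.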
Expert vs Non-Expert Training

                          Condition 1            Condition 2
Document Corpus           Corpus Z               Corpus Z
Starting Condition        [docid:7643 = true]    [docid:7643 = true]
(e.g. seed documents,     [docid:225 = false]    [docid:225 = true]
ad hoc query, etc.)
Feature (Signal)          1-grams                1-grams
Extraction
Ranking Engine            Logistic Regression    Logistic Regression
Training/Review Protocol  CAL, using non-expert  CAL, using expert
                          judgments              judgments
Ground Truth              [docid:7643 = true]    [docid:7643 = true]
                          [docid:225 = true]     [docid:225 = true]
                          [docid:42 = false]     [docid:42 = false]
Evaluation Metric         Precision@75% recall   Precision@75% recall
7. Validation

Factors to consider:
§ How confident do you need to be?
§ What are the boundaries of the process we need to validate?
§ Will one total recall number be sufficient, or do you also need some guarantee of topical completeness?
§ Who are you defending the process to, and what metrics do they care about?
§ Recall – people trying to find stuff
§ Precision – people paying for stuff
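The two headline validation metrics reduce to simple ratios over a validation sample's contingency counts. The numbers below are invented for illustration.

```python
# Hypothetical sketch: recall and precision from validation-sample counts.
# true_pos  = produced documents that are truly relevant
# false_neg = relevant documents that were not produced
# false_pos = produced documents that are not relevant

def recall(true_pos, false_neg):
    """Of everything truly relevant, what fraction did we produce?
    This is what the requesting party cares about."""
    return true_pos / (true_pos + false_neg)

def precision(true_pos, false_pos):
    """Of everything we produced, what fraction is truly relevant?
    This is what the party paying for review cares about."""
    return true_pos / (true_pos + false_pos)

print(f"recall    {recall(true_pos=80, false_neg=20):.0%}")    # 80%
print(f"precision {precision(true_pos=80, false_pos=40):.0%}") # 67%
```

Because the two audiences optimize for different ratios, a defensible validation report usually states both, along with the sample sizes behind them.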
7. Validation
Example: Review Metrics for Outside Counsel (chart)
Another Example: Review Precision (chart)
Questions & Answers
Jeremy Pickens, Ph.D.
Mark Noel, Esq.
Robert Ambrogi, Esq.
You may use the chat feature at any time to ask questions
Lowering Your Total Cost of Review Using Predictive Analytics
Thursday, May 12, 2016 | 2 p.m. Eastern
John Tredennick Michael Arkfeld David Stanton