Data-Driven Discovery: Seven Metrics for Smarter Decisions and Better Results

Posted on 16-Apr-2017




  • Data-Driven Discovery: Seven Metrics for Smarter Decisions & Better Results

    Jeremy Pickens, Ph.D. | Mark Noel, Esq.

    Robert Ambrogi, Esq. | Host & Moderator


  • Today's Agenda

    Being Empirical 101: Kaizen, the scientific method, and getting good data. How much do you really need to do or know to be effective?

    A data-driven look at seven phases of discovery: What variables might be in play? Are there variables that seem plausible but don't have much effect? When might it be a good idea to measure, what and how should we measure, and what other experiments or research exist?

    Questions / open discussion

  • Speakers

    Jeremy Pickens, Ph.D. | Chief Scientist, Catalyst

    Mark Noel, Esq. | Managing Director, Professional Services, Catalyst

    Robert Ambrogi, Esq. | Director of Communications, Catalyst

    Jeremy is one of the world's leading search scientists and a pioneer in the field of collaborative exploratory search. He has six patents pending in the field of search and information retrieval, including two for collaborative exploratory search systems. At Catalyst, Jeremy researches and develops methods of using collaborative search and other techniques to enhance search and review within the Catalyst system and help clients achieve more intelligent and precise results in e-discovery search and review.

    Mark specializes in helping clients use technology-assisted review, advanced analytics, and custom workflows to handle complex and large-scale litigation. He also works with Catalyst's research and development group on new litigation technology tools. Before joining Catalyst, Mark was an intellectual property litigator with Latham & Watkins LLP, co-founder of an e-discovery software startup, and a research scientist at Dartmouth College's Interactive Media Laboratory.

    Bob is a practicing lawyer in Massachusetts and is the former editor-in-chief of The National Law Journal, Lawyers USA and Massachusetts Lawyers Weekly. A fellow of the College of Law Practice Management, he writes the award-winning blog LawSites and co-hosts the legal-affairs podcast Lawyer2Lawyer. He is a regular contributor to the ABA Journal and is vice chair of the editorial board of the ABA's Law Practice magazine.

  • Abraham Flexner: Evidence-based medicine

  • Fundamentals

    Spot the testable question (don't guess). Use good experimental design (don't make it up as you go along). Control variables. Assume data will be noisy (you may need several matters). Measure early and often.

  • "The first principle is that you must not fool yourself, and you are the easiest person to fool."

    Richard Feynman

  • Cranfield Model (1966)

    1. Assemble a test collection: a document corpus, an expression of the user's information need, and relevance judgments (aka ground truth)

    2. Choose an effectiveness metric

    3. Vary the TAR system

  • Training Protocol: Condition 1 vs. Condition 2 vs. Condition 3

    Document Corpus: Corpus Z (all three conditions)

    Starting Condition (e.g. seed documents, ad hoc query, etc.): [docid:7643 = true], [docid:225 = true] (all three conditions)

    Feature (Signal) Extraction: character n-grams (all three conditions)

    Ranking Engine: logistic regression (all three conditions)

    Training/Review Protocol: SPL / SAL / CAL (the variable under test)

    Ground Truth: [docid:7643 = true], [docid:225 = true], [docid:42 = false] (all three conditions)

    Evaluation Metric: precision @ 75% recall (all three conditions)
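The evaluation metric in this protocol, precision at 75% recall, can be computed directly from a ranked list and its ground-truth labels. A minimal sketch, where the function name and the toy labels are illustrative rather than anything from the presentation:

```python
def precision_at_recall(ranked_labels, target_recall=0.75):
    """Precision at the first rank where the ranking reaches target recall.

    ranked_labels: ground-truth relevance (True/False) in ranked order.
    """
    total_relevant = sum(ranked_labels)
    if total_relevant == 0:
        return 0.0
    needed = target_recall * total_relevant
    found = 0
    for rank, is_relevant in enumerate(ranked_labels, start=1):
        if is_relevant:
            found += 1
            if found >= needed:
                return found / rank  # precision at this cutoff
    return 0.0

# Toy ranking with 4 relevant docs: 75% recall means finding 3 of them,
# which here happens at rank 4, so precision is 3/4.
print(precision_at_recall([True, True, False, True, False, False, True, False]))  # 0.75
```

Because everything else in the table is held constant, differences in this number across the three conditions can be attributed to the SPL/SAL/CAL protocol choice.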

  • Okay, but what about all us non-scientists?

  • 1. Targeted Collections

    Factors to consider: number of custodians; number or type of collection sources; sophistication, reasonableness, or tenacity of the opposing side; likelihood of unrelated but sensitive material being scooped up; capabilities of targeted collection tools.

  • 1. Targeted Collections

    Example: Generating and validating Boolean terms. Process and review random samples from initial custodians. Use TAR to sort the collection into likely-positive and likely-negative populations. Use TAR also to generate or supplement a list of potential search terms. Run a report to compare each term's hit count or hit density in the two populations.
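The hit-density comparison in that example can be sketched as a simple report over the two TAR-sorted populations. Everything below (the function name, naive substring matching, the toy documents) is an illustrative assumption; a real report would run against the review platform's indexed search:

```python
def term_hit_report(terms, likely_positive, likely_negative):
    """Compare each candidate term's hit density in the TAR-sorted
    likely-positive vs. likely-negative populations (illustrative only)."""
    report = []
    for term in terms:
        t = term.lower()
        pos = sum(1 for doc in likely_positive if t in doc.lower())
        neg = sum(1 for doc in likely_negative if t in doc.lower())
        report.append((term, pos / len(likely_positive), neg / len(likely_negative)))
    # Terms that hit mostly in the positive population discriminate well;
    # terms that hit both populations about equally are poor candidates.
    return sorted(report, key=lambda row: row[1] - row[2], reverse=True)

positive = ["raptor payment schedule", "wire the raptor payments today"]
negative = ["lunch on friday?", "payment for the office party"]
for term, p, n in term_hit_report(["raptor", "payment"], positive, negative):
    print(f"{term}: hits {p:.0%} of positives, {n:.0%} of negatives")
```

In this toy data, "raptor" hits only the positive population while "payment" hits both, so "raptor" is the stronger search-term candidate.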

  • Raptor

  • Payments

  • 2. Culling

    Factors to consider: black-list vs. white-list search terms; file type, date range, etc. Can we get a stip? Do we need to validate in order to avoid a Biomet problem? Is it even worth doing in light of what we're doing next?

  • Problem with Keyword Search: Generally Poor Results

    Attorneys worked with experienced paralegals to develop search terms. Upon finishing, they estimated that they had retrieved at least three quarters of all relevant documents. What they actually retrieved was far less.


  • Problem with Keyword Search: It can become overly complex

    (((master settlement agreement OR msa) AND NOT (medical savings account OR metropolitan standard area)) OR s. 1415 OR (ets AND NOT educational testing service) OR (liggett AND NOT sharon a. liggett) OR atco OR lorillard OR (pmi AND NOT presidential management intern) OR pm usa OR rjr OR (b&w AND NOT photo*) OR phillip morris OR batco OR ftc test method OR star scientific OR vector group OR joe camel OR (marlboro AND NOT upper marlboro)) AND NOT (tobacco* OR cigarette* OR smoking OR tar OR nicotine OR smokeless OR synar amendment OR philip morris OR r.j. reynolds OR ("brown and williamson") OR ("brown & williamson") OR bat industries OR liggett group)

    Jason R. Baron, "Through a Lawyer's Lens: Measuring Performance in Conducting Large Scale Searches Against Heterogeneous Data Sets in Satisfaction of Litigation Requirements," University of Pennsylvania Workshop (October 26, 2006).

  • 2. Culling

    Potential metrics: a sample review to validate recall or elusion; total cost of the culling effort vs. total cost to promote additional documents to review. Note that with TAR, additional non-relevant docs might never come up for review anyway, while developing and validating extensive culling terms requires a lot of human effort.
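An elusion sample can be summarized with a point estimate and a rough confidence interval. This sketch uses a normal approximation to the binomial, which is a simplification, and the sample numbers in the usage line are made up for illustration:

```python
import math

def elusion_estimate(sample_size, relevant_in_sample, discard_pile_size):
    """Point estimate and ~95% interval (normal approximation) for elusion:
    the relevance rate of the culled/discarded population."""
    p = relevant_in_sample / sample_size
    margin = 1.96 * math.sqrt(p * (1 - p) / sample_size)
    return {
        "elusion": p,
        "ci_95": (max(0.0, p - margin), min(1.0, p + margin)),
        "est_relevant_left_behind": round(p * discard_pile_size),
    }

# Made-up numbers: 3 relevant docs found in a 1,500-doc random sample
# drawn from a 700,000-doc discard pile.
result = elusion_estimate(1500, 3, 700_000)
print(result)
```

With very low hit counts the normal approximation is loose; an exact binomial interval is the more defensible choice when the result will be argued over.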

  • 2. Culling

    Example: Manual search and culling vs. TAR / CAL

    Manual development and validation of search terms
    Time: two weeks (with two associates) to cull 700,000 documents
    Cost: 160 associate hours

    Letting TAR do the work
    Time: one day (with a 12-person review team) to review 6,000 more docs
    Cost: 100 review hours + technology cost of 700,000 * $0.01
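Turning the time figures above into comparable dollar totals requires assuming hourly rates, which the presentation does not give. A hedged back-of-the-envelope version, with the rates clearly marked as assumptions:

```python
# Hourly rates below are assumptions for illustration; only the hours,
# the doc count, and the per-doc technology cost come from the slide.
ASSOCIATE_RATE = 400        # $/hr, assumed
REVIEWER_RATE = 60          # $/hr, assumed
TECH_COST_PER_DOC = 0.01    # $/doc, from the slide

manual_cost = 160 * ASSOCIATE_RATE                             # search-term culling
tar_cost = 100 * REVIEWER_RATE + 700_000 * TECH_COST_PER_DOC   # CAL instead
print(f"Manual culling: ${manual_cost:,.0f}")
print(f"TAR/CAL:        ${tar_cost:,.0f}")
```

Under these assumed rates the TAR approach is cheaper as well as faster, but the conclusion depends heavily on the rates you plug in.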

  • 3. ECA & Investigation

    Factors to consider: Do we know what we're looking for? Is there possible intent to evade search? Time or resource constraints? Blair & Maron studied almost 50 topics: 75% recall on one, but many with less than 3% recall. TREC 2016 will have a topical recall track.

  • 3. ECA & Investigation

    Potential metrics: number of different search techniques; total time, total docs, or percentage of docs required to reach a defensible outcome.

  • 4. Review

    Factors to consider: richness estimates (the population overall; batch richness / review precision); family-level vs. document-level review, with factors for and against each: overall richness, average family size, workflow and tool capabilities, and dependence on context to make relevance judgments.

  • 4. Review

    More factors to consider: review rate (relevant vs. non-relevant; threading; clustering / TAR; heterogeneity); complexity (number of coding fields; separating variables in a coding field; bifurcated workflows, e.g., for special file types).

  • 4. Review

    Potential metrics: the usual suspects (population richness, batch richness, docs per hour, relevant docs per review hour); review precision by day (yield); review precision by reviewer; A/B testing.
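The "usual suspects" reduce to a few ratios that are easy to compute per batch, per day, or per reviewer. A minimal sketch; the function name and sample numbers are illustrative:

```python
def review_metrics(docs_reviewed, relevant_found, review_hours):
    """The 'usual suspects': throughput, yield, and batch precision.
    Run per day or per reviewer to spot trends, e.g. batch precision
    falling off as a CAL ranking exhausts the relevant population."""
    return {
        "docs_per_hour": docs_reviewed / review_hours,
        "relevant_per_hour": relevant_found / review_hours,
        "review_precision": relevant_found / docs_reviewed,
    }

# Illustrative numbers for one reviewer-day.
print(review_metrics(docs_reviewed=480, relevant_found=120, review_hours=8))
```

Comparing these ratios across reviewers or across A/B workflow conditions is what turns routine review reporting into the kind of experiment described earlier.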

  • 5. TAR and Analytics

    Factors to consider: re-using seeds; which and how many judgmental seeds; artificial seeds; variations on weighting; richness constraints; training protocol; frequency of re-ranking.

  • 5. TAR and Analytics

    Frequency of updates: Condition 1 vs. Condition 2

    Document Corpus: Corpus Z (both conditions)

    Starting Condition (e.g. seed documents, ad hoc query, etc.): [docid:7643 = true], [docid:225 = true] (both conditions)

    Feature (Signal) Extraction: 1-grams (both conditions)

    Ranking Engine: logistic regression (both conditions)

    Training/Review Protocol: Condition 1: CAL, re-ranked once a day; Condition 2: CAL, re-ranked every 10 min. (the variable under test)

    Ground Truth: [docid:7643 = true], [docid:225 = true], [docid:42 = false] (both conditions)

    Evaluation Metric: precision @ 75% recall (both conditions)

  • 5. TAR and Analytics

    Example: Frequency of Re-ranking (results charts for Cases 1 through 4)

  • 5. TAR and Analytics

    Condition 1 vs. Condition 2

    Document Corpus: Condition 1: English corpus; Condition 2: mixed Japanese + English corpus

    Starting Condition (e.g. seed documents, ad hoc query, etc.): Condition 1: [docid:7643 = true], [docid:225 = true]; Condition 2: [docid:9356 = false], [docid:89 = true]

    Feature (Signal) Extraction: 1-grams, no cross-language features (both conditions)

    Ranking Engine: Condition 1: logistic regression; Condition 2: support vector machine

    Training/Review Protocol: CAL (both conditions)

    Ground Truth: [docid:
