Scientists See Promise in Deep-Learning Programs


Upload: raanan

Post on 05-Jan-2016



TRANSCRIPT

Page 1: Scientists See Promise in Deep-Learning Programs

Scientists See Promise in Deep-Learning Programs

Microsoft Seeks an Edge in Analyzing Big Data

Jeff Hawkins Develops a Brainy Big Data Company

Google Offers Big-Data Analytics

The Age of Big Data

How Big Data Became So Big

Why Hire a Lawyer? Computers Are Cheaper

Armies of Expensive Lawyers, Replaced by Cheaper Software

Page 2: Scientists See Promise in Deep-Learning Programs

The total amount of digital data in the world is estimated to exceed 1.8 zettabytes (1.8 trillion gigabytes)

The digital universe is doubling every 2 years

85% of that data is owned or controlled by corporations at some point in its lifecycle

Source: International Data Corporation (IDC) Study, 2012
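As a rough illustration of what "doubling every 2 years" implies, the sketch below projects the 1.8-zettabyte estimate forward. It assumes smooth exponential growth from that starting figure; the function name and the chosen time horizons are illustrative assumptions, not part of the IDC study.

```python
def projected_zettabytes(start_zb, years, doubling_period_years=2.0):
    """Project data volume, assuming it doubles every `doubling_period_years`."""
    return start_zb * 2 ** (years / doubling_period_years)


# Start from the 1.8 ZB estimate quoted on this slide.
for years in (0, 2, 4, 10):
    print(f"after {years:>2} years: {projected_zettabytes(1.8, years):.1f} ZB")
```

Under this assumption the volume grows 32-fold in ten years, since ten years contains five doubling periods.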

Page 3: Scientists See Promise in Deep-Learning Programs

Big Data is Here

And it’s coming soon to a litigation near you…

What’s changed?

Page 4: Scientists See Promise in Deep-Learning Programs

The Great Commingling

Page 5: Scientists See Promise in Deep-Learning Programs

Redefining scalability in eDiscovery.

1

1000

1 x 10^12

Page 6: Scientists See Promise in Deep-Learning Programs

Predictive Coding is a Form of Machine Learning

What is Machine Learning?

Page 7: Scientists See Promise in Deep-Learning Programs

voice recognition software, e.g., calling your bank or credit card company

handwriting, facial or fingerprint recognition

analyzing market trends and guiding investment decisions

making decisions on applications for credit or loans

modeling and predicting severe weather patterns

filtering spam in your email inbox

targeted marketing on the internet

robotics

It’s already a part of our lives. . .

Page 8: Scientists See Promise in Deep-Learning Programs

KEY POINT: Predictive coding is just a part of a continuum of technology assisted review (TAR) methods that we are already very familiar with in searching and analyzing data.

Key Words

Concept Clustering

Concept Search

Predictive Coding

Three supporting propositions:

1. Each successive approach incorporates the preceding approaches.
2. Each successive approach contains more supporting criteria.
3. All are ultimately based on the concept of pattern matching.

Page 9: Scientists See Promise in Deep-Learning Programs

Key Words = Simple pattern matching

External input: “wild,” “wolf,” “pet”

[Diagram of terms: dog, cat, rhino, ferret, goldfish, cow, wolf, domestic, wild, pet]
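Keyword matching as described on this slide can be sketched as follows. This is a minimal illustration, not any vendor's search engine: the three sample documents and the function name `keyword_hits` are hypothetical, and matching is done on whole lowercase words only.

```python
import re


def keyword_hits(documents, keywords):
    """Return, for each matching document id, the keywords found as whole words."""
    hits = {}
    for doc_id, text in documents.items():
        words = set(re.findall(r"[a-z]+", text.lower()))
        matched = [kw for kw in keywords if kw.lower() in words]
        if matched:
            hits[doc_id] = matched
    return hits


# Hypothetical mini-collection built from the slide's example terms.
documents = {
    "doc1": "The dog is a domestic pet.",
    "doc2": "A rhino is a wild animal.",
    "doc3": "The cow lives on a farm.",
}
print(keyword_hits(documents, ["wild", "wolf", "pet"]))
```

Because the match is literal, a document that only says "wolves" would not be returned for the keyword "wolf" in this sketch.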

Page 10: Scientists See Promise in Deep-Learning Programs

Concept Clustering = Organization based on internal relationships

[Diagram of terms, shown twice: dog, cat, domesticated, wild, pet, rhino, ferret, goldfish, cow, wolf, tiger]

01110111011010010110110001100100 (wild)

011001000110111101100111 (dog)

011100000110010101110100 (pet)
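The bit strings on this slide match the 8-bit ASCII codes of the labeled words. The sketch below shows that encoding and its inverse; the helper names `to_bits` and `from_bits` are introduced here for illustration only.

```python
def to_bits(word):
    """Encode a word as a string of 8-bit ASCII codes."""
    return "".join(format(byte, "08b") for byte in word.encode("ascii"))


def from_bits(bits):
    """Decode a string of 8-bit ASCII codes back into text."""
    codes = [int(bits[i:i + 8], 2) for i in range(0, len(bits), 8)]
    return bytes(codes).decode("ascii")


for word in ("wild", "dog", "pet"):
    print(to_bits(word), f"({word})")
```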

Page 11: Scientists See Promise in Deep-Learning Programs

Concept Searching = Key words + Concept organization

External input: “zoo,” “wild,” “domesticated”

[Diagram of terms, shown twice: dog, cat, rhino, ferret, goldfish, cow, wolf, domesticated, wild, pet, tiger, farm, zoo]

011110100110111101101111 (zoo)

01110111011010010110110001100100 (wild)

011001000110111101101101011001010111001101110100011010010110001101100001011101000110010101100100 (domesticated)
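One way to picture "key words + concept organization" is a query that is first expanded through concept groups and then matched against the documents. The sketch below assumes the groups already exist; the `CONCEPTS` mapping, the assignment of terms to groups, and the sample documents are all hypothetical, and a real tool would derive its groupings from the document set itself.

```python
import re

# Hypothetical concept groups, assumed for illustration only.
CONCEPTS = {
    "domesticated": {"dog", "cat", "ferret", "goldfish", "cow", "pet", "farm"},
    "wild": {"wolf", "tiger", "rhino", "zoo"},
}


def expand_query(terms, concepts):
    """Add every term that shares a concept group with a query term."""
    expanded = set(terms)
    for term in terms:
        for label, members in concepts.items():
            if term == label or term in members:
                expanded.add(label)
                expanded |= members
    return expanded


def concept_search(documents, terms, concepts):
    """Return ids of documents containing any term in the expanded query."""
    query = expand_query(terms, concepts)
    results = []
    for doc_id, text in documents.items():
        words = set(re.findall(r"[a-z]+", text.lower()))
        if words & query:
            results.append(doc_id)
    return results


documents = {
    "doc1": "The tiger paced in its enclosure.",
    "doc2": "Quarterly sales figures attached.",
}
print(concept_search(documents, ["wild"], CONCEPTS))
```

Here "doc1" is returned for the query "wild" even though the word never appears in it, because "tiger" sits in the same assumed concept group.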

Page 12: Scientists See Promise in Deep-Learning Programs

Predictive Coding = document-level input + probabilistic modeling

External input: human-coded documents

Output: doc-level probability rankings

[Diagram of terms, shown twice: dog, cat, rhino, ferret, goldfish, cow, wolf, domesticated, wild, pet, tiger, farm, zoo]

011110100110111101101111 (zoo)

01110111011010010110110001100100 (wild)

011001000110111101101101011001010111001101110100011010010110001101100001011101000110010101100100 (domesticated)

Page 13: Scientists See Promise in Deep-Learning Programs

Step 1: Sample documents from entire set.
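The sampling step can be sketched as a simple random draw without replacement. The slide does not say how the sample is drawn or how large it is, so the sample size of 500, the fixed seed, and the document id format below are assumptions made for illustration.

```python
import random


def sample_documents(doc_ids, sample_size, seed=None):
    """Draw a simple random sample (without replacement) of document ids."""
    pool = list(doc_ids)
    rng = random.Random(seed)
    return rng.sample(pool, min(sample_size, len(pool)))


# Hypothetical collection of 10,000 document ids.
all_doc_ids = [f"DOC-{i:05d}" for i in range(10_000)]
review_sample = sample_documents(all_doc_ids, 500, seed=42)
print(len(review_sample), review_sample[:3])
```

Fixing the seed makes the draw repeatable, which may help if the sampling procedure later has to be explained to the other side or the court.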

Page 14: Scientists See Promise in Deep-Learning Programs

Step 2: attorney review of sample documents to create training and control set.

In the European mind, wolves long stood as a symbol of baneful, uncontrollable nature. As far back as the time of Aesop in 500 BCE (Before the Common Era), wolves in literature are portrayed as wicked villains and long-fanged, terrible beasts. Before the Middle Ages, wolves were nearly always the greedy thief, criminal trickster, or cruel remorseless murderer. The wolf does not fare well in the European imagination.

Can the wolf be domesticated?

The domesticated dog is descended from the wolf found in the wild.

While some people have occasionally attempted to raise wolves as pets, their 2 ½ inch fangs and tendency to eat nearby small animals such as cats can create socially awkward situations with neighbors.

Responsive

Not Responsive
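Once attorneys have coded the sample, it has to be divided into a training set and a control set. The sketch below shows one plausible way to do that; the 80/20 split, the seed, and the shape of the coded records are assumptions, since the slide does not specify them.

```python
import random


def split_training_control(coded_docs, control_fraction=0.2, seed=None):
    """Shuffle attorney-coded documents and hold out a fraction as the control set."""
    docs = list(coded_docs)
    random.Random(seed).shuffle(docs)
    n_control = int(round(len(docs) * control_fraction))
    return docs[n_control:], docs[:n_control]  # (training, control)


# Hypothetical coded sample: (document id, attorney's call).
coded = [
    (f"DOC-{i:03d}", "Responsive" if i % 4 == 0 else "Not Responsive")
    for i in range(100)
]
training_set, control_set = split_training_control(coded, control_fraction=0.2, seed=7)
print(len(training_set), len(control_set))
```

Keeping the control set out of training is what allows the model to be tested later against documents it has not seen.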

Page 15: Scientists See Promise in Deep-Learning Programs

Step 3: create model from human coded training set (responsive and not responsive).

In the European mind, wolves long stood as a symbol of baneful, uncontrollable nature. As far back as the time of Aesop in 500 BCE (Before the Common Era), wolves in literature are portrayed as wicked villains and long-fanged, terrible beasts. Before the Middle Ages, wolves were nearly always the greedy thief, criminal trickster, or cruel remorseless murderer. The wolf does not fare well in the European imagination.

Can the wolf be domesticated?

The domesticated dog is descended from the wolf found in the wild.

While some people have occasionally attempted to raise wolves as pets, their 2 ½ inch fangs and tendency to eat nearby small animals such as cats can create socially awkward situations with neighbors.


[Diagram labels: wolves, wolf, pet, costner, dances, raise, werewolf]

Word   Pos.  Neg.
wolf   .98   .08
dog    .56   .43
pet    .42   .28
raise  .61   .09

Word   Assoc  %
wolf   pet    .73
dog    wolf   .43
pet    raise  .88
raise  wolf   .61

011001000110111101100111 (repeated seven times on the slide)
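To make the idea of building a model from human-coded documents concrete, here is a small naive-Bayes-style sketch. It is not the method any particular predictive coding product uses; the smoothing, the scoring rule, the function names, and the four training sentences are all assumptions chosen for illustration. For each word it estimates how often the word appears in responsive versus not-responsive documents, loosely like the Pos./Neg. table on this slide.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Return the set of lowercase words in a text."""
    return set(re.findall(r"[a-z]+", text.lower()))


def train(coded_docs):
    """coded_docs: list of (text, is_responsive). Returns per-word rates and class counts."""
    pos_counts, neg_counts = Counter(), Counter()
    n_pos = n_neg = 0
    for text, is_responsive in coded_docs:
        if is_responsive:
            n_pos += 1
            pos_counts.update(tokenize(text))
        else:
            n_neg += 1
            neg_counts.update(tokenize(text))
    vocab = set(pos_counts) | set(neg_counts)
    # Add-one smoothing keeps every rate strictly between 0 and 1.
    rates = {
        w: ((pos_counts[w] + 1) / (n_pos + 2), (neg_counts[w] + 1) / (n_neg + 2))
        for w in vocab
    }
    return {"rates": rates, "n_pos": n_pos, "n_neg": n_neg}


def probability_responsive(model, text):
    """Estimate P(responsive) from the known words present in the text."""
    log_odds = math.log((model["n_pos"] + 1) / (model["n_neg"] + 1))
    for word in tokenize(text):
        if word in model["rates"]:
            pos_rate, neg_rate = model["rates"][word]
            log_odds += math.log(pos_rate / neg_rate)
    return 1 / (1 + math.exp(-log_odds))


# Hypothetical training set loosely based on the slide's wolf examples.
training = [
    ("Can the wolf be domesticated?", True),
    ("Some people raise wolves as pets.", True),
    ("Dances With Wolves stars Kevin Costner.", False),
    ("The werewolf film opens Friday.", False),
]
model = train(training)
print(probability_responsive(model, "Is it legal to raise a wolf as a pet?"))
print(probability_responsive(model, "Costner dances in the new film"))
```

Words never seen in training are simply ignored by this sketch, so its scores are only as good as the coded examples it was given.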

Page 16: Scientists See Promise in Deep-Learning Programs

Step 4: test model against sample (human coded) set.

"Dances With Wolves" has the makings of a great work, one that recalls a variety of literary antecedents, everything from "Robinson Crusoe" and "Walden" to "Tarzan of the Apes." Michael Blake's screenplay touches both on man alone in nature and on the 19th-century white man's assuming his burden among the less privileged.

Wolves are sometimes kept as exotic pets, and in some rarer occasions, as working animals. Although closely related to dogs (which are believed to have split from wolves between 10,000 and 100,000 years ago), wolves do not show the same tractability as dogs in living alongside humans. Wolves also need much more space than dogs, about 10-15 sq. miles.
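Testing the model against human-coded documents usually comes down to comparing its calls with the attorneys' calls. The sketch below computes precision and recall, two common measures for that comparison; the slide does not name a metric, so this choice, the function name, and the five example documents are assumptions.

```python
def precision_recall(predicted, actual):
    """predicted/actual: dicts mapping document id -> True if responsive."""
    tp = sum(1 for d in actual if predicted[d] and actual[d])
    fp = sum(1 for d in actual if predicted[d] and not actual[d])
    fn = sum(1 for d in actual if not predicted[d] and actual[d])
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical control set: attorney coding vs. model predictions.
actual = {"A": True, "B": True, "C": False, "D": False, "E": True}
predicted = {"A": True, "B": False, "C": True, "D": False, "E": True}

precision, recall = precision_recall(predicted, actual)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Precision is the share of documents the model called responsive that the attorneys also coded responsive; recall is the share of attorney-coded responsive documents that the model found.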

Page 17: Scientists See Promise in Deep-Learning Programs

Yes

No

Apply model to remainder of documents that have not been reviewed

Responsive

Non-responsive

Page 18: Scientists See Promise in Deep-Learning Programs

Step 5: Apply model to entire set and rank documents.

[Ranking scale: 100% down to 0% in 10% steps]
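The ranking step can be sketched as sorting documents by their model score and placing each in a 10%-wide band like the scale on this slide. The scores, document ids, and function names below are hypothetical.

```python
def rank_documents(scores):
    """Sort (document id, probability) pairs from most to least likely responsive."""
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


def decile_band(probability):
    """Map a probability to the lower edge of its 10%-wide band, e.g. 0.87 -> '80%'."""
    band = min(int(probability * 10), 10) * 10
    return f"{band}%"


# Hypothetical model scores for three unreviewed documents.
scores = {"DOC-1": 0.12, "DOC-2": 0.93, "DOC-3": 0.58}
for doc_id, probability in rank_documents(scores):
    print(doc_id, probability, decile_band(probability))
```

Reviewers can then work from the top of the ranking downward, or set a cutoff band below which documents are only sampled.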

Page 19: Scientists See Promise in Deep-Learning Programs

PREDICTIVE CODING AND BIG DATA

NYLJ/Pangea3 Webinar

April 15, 2013

Page 20: Scientists See Promise in Deep-Learning Programs

OUTLINE

1. Mitigating Big Data in E-Discovery
2. Stakeholder Analysis
3. The New Reality of Predictive Coding
4. Long-Term Trends

Page 21: Scientists See Promise in Deep-Learning Programs

MITIGATING BIG DATA IN E-DISCOVERY

Predictive Coding and Big Data

Page 22: Scientists See Promise in Deep-Learning Programs

BIG DATA IN E-DISCOVERY

• Bigger haystack: more documents in general

• Corporate data culture: more relevant documents

• More sources: poses collection/preservation challenges

Page 23: Scientists See Promise in Deep-Learning Programs

MITIGATING BIG DATA IN E-DISCOVERY

• Some mitigating factors:

• Principles of proportionality and cooperation

• Information governance tools and document management

• Technology-assisted review and predictive coding

Page 24: Scientists See Promise in Deep-Learning Programs

STAKEHOLDER ANALYSIS

Predictive Coding and Big Data

Page 25: Scientists See Promise in Deep-Learning Programs

PREDICTIVE CODING STAKEHOLDER ANALYSIS

• Judges: generally receptive

• Clients: cost efficiencies vs. risk management

• Lawyers: new model, building expertise

Page 26: Scientists See Promise in Deep-Learning Programs

THE NEW REALITY OF PREDICTIVE CODING

Predictive Coding and Big Data

Page 27: Scientists See Promise in Deep-Learning Programs

NEW REALITY OF PREDICTIVE CODING

Reduced Data Volumes

Increased Complexity and Density

Focused, High-Stakes Human Review

Battle of Expertise

Predictive Coding

Page 28: Scientists See Promise in Deep-Learning Programs

LONG-TERM TRENDS

Predictive Coding and Big Data

Page 29: Scientists See Promise in Deep-Learning Programs

LONG-TERM TRENDS

• Over time, Big Data growth > predictive coding benefits

• Some document-by-document human review necessary

• Strategic nuances in a new discovery battleground

Page 30: Scientists See Promise in Deep-Learning Programs

CONTACT PANGEA3

NEW YORK

Pangea3 LLC
530 5th Avenue, 7th FL
New York, NY 10036

Tel. (US Main): +1-212-689-3819
Fax: +1-212-820-9784

MUMBAI

Pangea3 Legal Database Systems Pvt. Ltd.
102-B, Ground Floor, Leela Business Park
Andheri-Kurla Road
Andheri East, Mumbai 400 059, India

U.S. Line: +1-877-311-8528
Tel.: +91-22-6191-7500
Fax: +91-22-6191-7600

DALLAS

Pangea3 LLC
2395 Midway Road
Carrollton, TX 75006

Tel. (US Main): +1-212-689-3819
Fax: +1-212-820-9784

DELHI

Pangea3 Legal Database Systems Pvt. Ltd.
B-23, Sector 58
Noida UP 20 301, India

U.S. Line: +1-877-311-8528
Tel: +91-120-425-5210/14/16
Fax: +212-820-9783

Page 31: Scientists See Promise in Deep-Learning Programs

SEARCH (1)

How do we search for discoverable ESI?
• Manually?
• With automated assistance?
• Which is “better” and why?

– M.R. Grossman & G.V. Cormack, “The Grossman-Cormack Glossary of Technology-Assisted Review,” 7 Fed. Cts. Law R. 1 (2013)

– Maura R. Grossman & Gordon V. Cormack, “Technologically-Assisted Review in E-Discovery Can Be More Effective and More Efficient than Exhaustive Manual Review,” XVII Rich. J.L. & Tech. 11 (2011) (available at http://jolt.richmond.edu/v17i3/article11.pdf)

– For a “shorter” discussion, see Efficient E-Discovery, ABA Journal 31 (Apr. 2012)

Page 32: Scientists See Promise in Deep-Learning Programs


SEARCH (2)

• Using search terms? How accurate are these? See In re National Ass’n of Music Merchants, Musical Instruments and Equipment Antitrust Litig., 2011 WL 6372826 (S.D. Cal. Dec. 19, 2011)

Page 33: Scientists See Promise in Deep-Learning Programs


SEARCH (3)

Automated review or “predictive coding” as an alternative to the use of search terms. For decisions which address automated review, see:

• EORHB, Inc. v. HOA Holdings LLC, C.A. No. 7409 (Del. Ct. Ch. Oct. 15, 2012)

• In re Actos (Pioglitazone) Prod. Liability Litig., MDL No. 6:11-md-2299 (W.D. La. July 27, 2012)

• Da Silva Moore v. Publicis Groupe SA, 2012 U.S. Dist. LEXIS 23350 (S.D.N.Y. Feb. 24, 2012), aff’d, 11 Civ. 1279 (ALC) (AJP) (S.D.N.Y. Apr. 26, 2012)

• Global Aerospace Inc. v. Landow Aviation, L.P., Consol. Case No. CL 61040 (VA Cir. Ct. Apr. 23, 2012)

Page 34: Scientists See Promise in Deep-Learning Programs

SEARCH (4)

WHAT LESSONS CAN BE DRAWN FROM THE DECISIONS?

• Judge approved automated search at a “threshold” level. “Results” may be subject to challenge and later rulings.
• Threshold superiority of automated vs. manual review recognized given volume of ESI and attorney review costs.
• Large volumes of ESI in issue.
• Party seeking to do automated review must offer “transparency of process” or something close to it.
• “Reasonableness” of methodology is key.
• Speculation by the opposing party is insufficient to defeat threshold approval.

Page 35: Scientists See Promise in Deep-Learning Programs

SEARCH (5)

LET’S TAKE A DEEP BREATH AND RECAP WHERE WE ARE TODAY, VENDOR HYPE NOTWITHSTANDING:

• We have yet to see a judicial analysis of process and results in a contested matter.
• Safe to assume that the proponent of a process will bear the burden of proof (whatever that burden might be).
• Safe to assume at least some transparency of process may/will be expected.
• If “reasonableness” is standard, how reasonable must the results be? Is “precision” of 80% enough? 90%? Remember, there are no agreed-on standards.

Page 36: Scientists See Promise in Deep-Learning Programs


INTERLUDE

Assume a party makes production of ESI based on search terms proposed by an adversary. Assume further that the adversary suspects “something” is missing.

Is suspicion enough to warrant direct access to the party’s databases by a consultant retained by the adversary?

If not, what proofs should be required?
• Will an attorney’s certification or affidavit suffice?
• Will/should the attorney become a witness?
• Will experts be needed?

Note, with regard to proofs, S2 Automation LLC v. Micron Technology, Inc., No. 11-0884 (D.N.M. Aug. 9, 2012), where the court, relying on Rule 26(g)(1), required a party to disclose its search methodology.

Page 37: Scientists See Promise in Deep-Learning Programs

INTERLUDE

A collision between search and ethics?

• Assume a party’s attorney knows that search terms proposed by adversary counsel, if applied to the party’s ESI, will not lead to the production of relevant (perhaps highly relevant) ESI.

• Absent a lack of candor to adversary counsel or the court under RPC 3.4 (which implies, if not requires, some affirmative statement), does not RPC 1.6 require the party’s attorney to remain silent?

• What if the “nonproduction” is learned later? If nothing else, will the party’s attorney suffer bad “PR”?

• If the party’s attorney wants to advise the adversary, should the attorney secure her client’s informed consent? What if the client says, “no?”

(with thanks to the Hon. John M. Facciola)

Page 38: Scientists See Promise in Deep-Learning Programs


INTERLUDE

AS WE THINK ABOUT SEARCH, THINK ABOUT THE ETHICS ISSUES THAT USE OF A NONPARTY VENDOR MAY LEAD TO!