
Pattern Recognition Research Lab D. Lopresti & H. S. Baird

Henry S. Baird

Terry Riopka

Jon Bentley (Avaya Labs)

Michael A. Moll

Sui-Yu Wang

Protecting eCommerce from Robots Impersonating Human Users


A Pitfall of the World Wide Web

© Peter Steiner, The New Yorker, July 5, 1993, p. 61 (Vol.69, No. 20)


Straws in the wind…

Mid 90’s: spammers trolling for email addresses

• in defense, people start disguising them, e.g.

“baird AT cse DOT lehigh DOT edu”

1997: abuse of ‘Add-URL’ feature at AltaVista

• some write programs to add their URL many times

• to skew search rankings in their favor

Andrei Broder et al (then at DEC SRC)

• a user action which is legitimate when performed once

becomes abusive when repeated many times

• no effective legal recourse

• how to block or slow down these programs …


The first known instance…

AltaVista’s Add-URL filter

1999: “ransom note filter”

• randomly pick letters, fonts, rotations – render as an image

• every user is required to read and type it in correctly

• reduced “spam add_URL” by “over 95%”
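The bullets above describe the recipe only in outline. A minimal sketch of the selection step might look like the following; the font names and the rotation range are assumptions made for illustration (the slide does not list the real ones), and rendering the glyphs to an image is left out.

```python
import random
import string

# Hypothetical font list and rotation range, chosen only for illustration.
FONTS = ["serif", "sans", "mono", "script"]
MAX_ROTATION_DEG = 30

def make_challenge(length=6, rng=None):
    """Pick random letters, each with its own random font and rotation."""
    rng = rng or random.Random()
    glyphs = []
    for _ in range(length):
        glyphs.append({
            "char": rng.choice(string.ascii_lowercase),
            "font": rng.choice(FONTS),
            "rotation": rng.uniform(-MAX_ROTATION_DEG, MAX_ROTATION_DEG),
        })
    # The expected answer is the letters in order; a renderer would draw
    # the glyphs into an image so that no ASCII text is ever sent.
    answer = "".join(g["char"] for g in glyphs)
    return glyphs, answer

def grade(answer, typed):
    """Accept the response if it matches, ignoring case and outer spaces."""
    return typed.strip().lower() == answer
```

Passing a seeded `random.Random` makes a challenge reproducible for testing; a deployed filter would presumably use an unpredictable source instead.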

Weaknesses: isolated chars, filterable noise, affine deformations

M. D. Lillibridge, M. Abadi, K. Bharat, & A. Z. Broder, “Method for Selectively Restricting Access to Computer Systems,” U.S. Patent No. 6,195,698, Filed April 13, 1998, Issued February 27, 2001.

An image of text, not ASCII


Alan Turing (1912-1954)

1936 a universal model of computation

1940s helped break Enigma (U-boat) cipher

1949 first serious uses of a working computer

including plans to read printed text

(he expected it would be easy)

1950 proposed a test for machine intelligence


Turing’s Test for AI

How to judge that a machine can ‘think’:

• play an ‘imitation game’ conducted via teletypes

• a human judge & two invisible interlocutors:

– a human

– a machine ‘pretending’ to be human

• after asking any questions (challenges) he/she

wishes, the judge decides which is human

• failure to decide correctly would be convincing

evidence of machine intelligence

Modern GUIs invite richer challenges than teletypes….

A. Turing, “Computing Machinery & Intelligence,” Mind, Vol. 59(236), 1950.


Completely Automated Public Turing Tests to Tell Computers & Humans Apart

• challenges can be generated & graded automatically (i.e. the judge is a machine)

• accepts virtually all humans, quickly & easily

• rejects virtually all machines

• resists automatic attack for many years (even assuming that its algorithms are known?)

NOTE: machines administer, but cannot pass the test!
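To make "generated & graded automatically" concrete, here is a small sketch of the administering side. The class name `ChallengeStore` and the single-use rule are assumptions for illustration, not something the slides specify.

```python
import secrets

class ChallengeStore:
    """Hypothetical server-side judge: remembers answers, grades each once."""

    def __init__(self):
        self._pending = {}

    def issue(self, answer):
        """Register the expected answer and return an opaque challenge id."""
        challenge_id = secrets.token_hex(8)
        self._pending[challenge_id] = answer
        return challenge_id

    def grade(self, challenge_id, response):
        """Grade a response; the challenge is consumed whether or not it passes."""
        expected = self._pending.pop(challenge_id, None)
        if expected is None:
            return False  # unknown id, or an attempt to replay a used one
        return secrets.compare_digest(expected.encode(), response.encode())
```

Consuming the challenge on the first attempt is one plausible way to keep a program from guessing repeatedly against the same image.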

L. von Ahn, M. Blum, N.J. Hopper, J. Langford, “CAPTCHA: Using Hard AI Problems For Security,” Proc., EuroCrypt 2003, Warsaw, Poland, May 4-8, 2003.

“CAPTCHAs”


Some Typical CAPTCHAs

Microsoft

eBay/PayPal

Yahoo!

PARC’s PessimalPrint


Cropping up everywhere…

Used to defend against:

• skewing search-engine rankings (AltaVista, 1999)

• infesting chat rooms, etc. (Yahoo!, 2000)

• gaming financial accounts (PayPal, 2001)

• robot spamming (MailBlocks, SpamArrest 2002)

• In the last two years: Overture, Chinese website, HotMail, CD-rebate, TicketMaster, MailFrontier, Qurb, Madonnarama, Gay.com, …

… how many have you seen?

On the horizon:

• ballot stuffing, password guessing, denial-of-service attacks

• ‘blunt force’ attacks (e.g. UT Austin break-in, Mar ’03)

• …many others

D. P. Baron, “eBay and Database Protection,” Case No. P-33, Case Writing Office, Stanford Graduate School of Business, Stanford Univ., 2001.


The Limitations of Image Understanding Technology

There remains a large gap in ability

between human and machine vision systems,

even when reading printed text

Performance of OCR machines has been systematically studied:

7 year olds can consistently do better!

This ability gap has been mapped quantitatively

S. Rice, G. Nagy, T. Nartker, OCR: An Illustrated Guide to the Frontier, Kluwer Academic Publishers: 1999.


Image Degradation Modeling

Effects of printing & imaging:

We can generate challenging

images pseudorandomly

H. Baird, “Document Image Defect Models,” in H. Baird, H. Bunke, & K. Yamamoto (Eds.), Structured Document Image Analysis, Springer-Verlag: New York, 1992.

[Figure: sample degraded text images; panels labeled blur, thrs, sens, and thrs x blur]
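Two of the effects named on this slide, blur and thresholding, can be illustrated on a tiny grayscale bitmap. This is only a sketch: the box filter below is a stand-in assumption for whatever point-spread function the cited defect model uses, and the parameter names simply echo the labels `blur` and `thrs`.

```python
def box_blur(img, radius):
    """Mean filter over a (2*radius+1)-wide window, clipped at the borders."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total, count = 0.0, 0
            for yy in range(max(0, y - radius), min(h, y + radius + 1)):
                for xx in range(max(0, x - radius), min(w, x + radius + 1)):
                    total += img[yy][xx]
                    count += 1
            out[y][x] = total / count
    return out

def threshold(img, thrs):
    """Binarize: a pixel becomes ink (1) when its value is at least thrs."""
    return [[1 if v >= thrs else 0 for v in row] for row in img]

def degrade(img, blur_radius, thrs):
    """Blur first, then threshold back to a bilevel image."""
    return threshold(box_blur(img, blur_radius), thrs)
```

With a one-pixel-wide vertical stroke, a low threshold after blurring thickens the stroke and a high threshold erases it, which hints at how the blur and threshold settings interact.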


Machine Accuracy is Often a Nearly Monotonic Function of Parameters

T. K. Ho & H. S. Baird, “Large Scale Simulation Studies in Image Pattern Recognition,” IEEE Trans. on PAMI, Vol. 19, No. 10, p. 1067-1079, October 1997.


Can You Read These Degraded Images?

Of course you can …. but OCR machines cannot!


The PessimalPrint CAPTCHA

Three OCR machines fail when:

– blur = 0.0 & threshold 0.02 - 0.08

– threshold = 0.02 & any value of blur

OCR outputs: ~~~.I~~~, ~~i1~~, N/A, N/A, N/A, ~~I~~

A. Coates, H. Baird, R. Fateman, “Pessimal Print: A Reverse Turing Test,” Proc. 6th IAPR Int’l Conf. On Doc. Anal. & Recogn. (ICDAR’01), Seattle, WA, Sep 10-13, 2001.

… but people find all these easy to read
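The two failure conditions reported on this slide can be written as a simple predicate. This is a sketch that reads "threshold 0.02 - 0.08" as an inclusive range, which is an assumption; the function name is hypothetical.

```python
import math

def in_reported_failure_region(blur, thrs):
    """True when (blur, thrs) falls in either region where the OCR machines failed."""
    # Condition 1: no blur, threshold anywhere in 0.02 to 0.08 (assumed inclusive).
    no_blur = math.isclose(blur, 0.0, abs_tol=1e-9) and 0.02 <= thrs <= 0.08
    # Condition 2: threshold fixed at 0.02, any value of blur.
    low_threshold = math.isclose(thrs, 0.02, abs_tol=1e-9)
    return no_blur or low_threshold
```

A generator could draw its degradation parameters only from settings where this predicate holds, subject to the separate requirement that people can still read the result.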


1st Int’l Workshop on Human Interactive Proofs

PARC, Palo Alto, CA, January 9-11, 2002


2nd Int’l Workshop on Human Interactive Proofs

Lehigh University, Bethlehem, PA – May 19-20, 2005


Variations & Generalizations

CAPTCHA

Completely Automated Public Turing test to tell Computers and Humans Apart

HUMANOID

Text-based dialogue which an individual can use to authenticate that he/she is himself/herself (‘naked in a glass bubble’)

PHONOID

Individual authentication using spoken language

Human Interactive Proof (HIP)

An automatically administered challenge/response protocol allowing a person to authenticate him/herself as belonging to a certain group over a network without the burden of passwords, biometrics, mechanical aids, or special training.


Weaknesses of Existing CAPTCHAs

English lexicon is too predictable:

• dictionaries are too small

• only 1.2 bits of entropy per character (cf. Shannon)
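A rough worked example shows why the 1.2 bits per character figure matters. It treats that figure as if it applied uniformly to each letter of an 8-letter word, which is a simplifying assumption made only for illustration, and compares it with letters drawn uniformly at random.

```python
import math

BITS_PER_CHAR_ENGLISH = 1.2            # figure quoted on the slide
BITS_PER_CHAR_UNIFORM = math.log2(26)  # random lowercase letters, about 4.7 bits

def guess_space_bits(length, bits_per_char):
    """Total bits of uncertainty for a string of the given length."""
    return length * bits_per_char

english_bits = guess_space_bits(8, BITS_PER_CHAR_ENGLISH)   # 9.6 bits
uniform_bits = guess_space_bits(8, BITS_PER_CHAR_UNIFORM)   # about 37.6 bits

# Equivalent number of equally likely guesses an attacker must consider.
english_guesses = 2 ** english_bits   # under a thousand
uniform_guesses = 2 ** uniform_bits   # 26**8, roughly 2 x 10^11
```

Under these assumptions an English word offers a guess space many orders of magnitude smaller than a random string of the same length.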

Physics-based image degradations vulnerable

to well-studied image restoration attacks, e.g.

Complex images irritate people

• even when they can read them

• need user-tolerance experiments


Human Readers

Literature on the psychophysics of reading is helpful:

many kinds of familiarity help, not just English words

optimal word-image size is known:

0.3-2 degrees subtended angle

optimal contrast conditions known

other factors measured for the best performance:

to achieve and sustain “critical reading speed”
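The subtended-angle range quoted above can be turned into a physical size once a viewing distance is chosen. The sketch below uses the standard relation size = 2 · d · tan(θ/2); the 50 cm viewing distance is an assumed example value, not a figure from the slides.

```python
import math

def size_for_angle(angle_deg, viewing_distance):
    """Physical extent that subtends angle_deg at the given viewing distance.

    The result is in the same unit as viewing_distance.
    """
    return 2 * viewing_distance * math.tan(math.radians(angle_deg) / 2)

VIEWING_DISTANCE_CM = 50  # assumed typical distance from eye to screen

smallest_cm = size_for_angle(0.3, VIEWING_DISTANCE_CM)  # about 0.26 cm
largest_cm = size_for_angle(2.0, VIEWING_DISTANCE_CM)   # about 1.75 cm
```

Dividing these sizes by the display's pixel pitch would give the corresponding range in pixels for a particular screen.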

BUT gives no answer to:

where’s the optimal comfort zone?

G. E. Legge, D. G. Pelli, G. S. Rubin, & M. M. Schleske, “Psychophysics of Reading: I. normal vision,” Vision Research 25(2), 1985.

J. Grainger & J. Segui, “Neighborhood Frequency Effects in Visual Word Recognition,” Perception & Psychophysics 47, 1990.


The BaffleText CAPTCHA

Nonsense words

• generate ‘pronounceable’ – not ‘spellable’ – words using a variable-length character n-gram Markov model

• they look familiar, but aren’t in any lexicon, e.g.

ablithan wouquire quasis

Gestalt perception

• force inference of a whole word-image from fragmentary or occluded characters, e.g.

• using a single familiar typeface also helps

M. Chew & H. S. Baird, “BaffleText: A Human Interactive Proof,”

Proc., SPIE/IS&T Conf. on Document Recognition & Retrieval X, Santa Clara, CA, January 23-24, 2003.
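The nonsense-word idea can be sketched with a character n-gram Markov model. Note the simplification: the slide mentions a variable-length model, while this sketch uses a fixed order of 2, and the training word list in any usage is a made-up stand-in for a real lexicon.

```python
import random
from collections import defaultdict

def train(words, order=2):
    """Map each run of `order` characters to the characters seen after it."""
    model = defaultdict(list)
    for word in words:
        padded = "^" * order + word + "$"   # ^ marks the start, $ the end
        for i in range(len(padded) - order):
            model[padded[i:i + order]].append(padded[i + order])
    return model

def generate(model, order=2, rng=None, max_len=12):
    """Random walk through the model until the end marker or max_len."""
    rng = rng or random.Random()
    state, letters = "^" * order, []
    while len(letters) < max_len:
        nxt = rng.choice(model[state])
        if nxt == "$":
            break
        letters.append(nxt)
        state = state[1:] + nxt
    return "".join(letters)

def nonsense_word(model, lexicon, order=2, rng=None, min_len=5, tries=1000):
    """Return a generated word that is not in the lexicon, or None on failure."""
    rng = rng or random.Random()
    for _ in range(tries):
        word = generate(model, order, rng)
        if len(word) >= min_len and word not in lexicon:
            return word
    return None
```

Because every transition was observed in a real word, the output tends to look pronounceable, while the lexicon check rejects anything an attacker could find in a dictionary.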


Mask Degradations

Parameters of pseudorandom mask generator:

• shape type: square, circle, ellipse, mixed

• density: black-area / whole-area

• range of radii of shapes
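The three parameters listed above suggest a generator along the following lines. This is a sketch under stated assumptions: ellipses are omitted for brevity, "mixed" chooses between squares and circles, and shapes are stamped until the requested density is reached; none of these details are confirmed by the slides.

```python
import random

def make_mask(width, height, shape="circle", density=0.2, radii=(1, 3),
              rng=None, max_shapes=10000):
    """Return a height x width grid of 0/1 with roughly `density` black area."""
    rng = rng or random.Random()
    mask = [[0] * width for _ in range(height)]
    black, target = 0, density * width * height
    for _ in range(max_shapes):
        if black >= target:
            break
        kind = rng.choice(["square", "circle"]) if shape == "mixed" else shape
        r = rng.randint(radii[0], radii[1])
        cx, cy = rng.randrange(width), rng.randrange(height)
        # Stamp the shape, clipped to the grid, counting newly blackened pixels.
        for y in range(max(0, cy - r), min(height, cy + r + 1)):
            for x in range(max(0, cx - r), min(width, cx + r + 1)):
                inside = kind == "square" or (x - cx) ** 2 + (y - cy) ** 2 <= r * r
                if inside and not mask[y][x]:
                    mask[y][x] = 1
                    black += 1
    return mask
```

The resulting mask could then be combined with a rendered word image to occlude or fragment its characters.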


User Acceptance

% Subjects willing to solve a BaffleText…

17% every time they send email

39% … if it cut spam by 10x

89% every time they register for an e-commerce site

94% … if it led to more trustworthy recommendations

100% every time they register for an email account

Out of 18 responses to the exit survey.
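Since the percentages are reported out of 18 responses, the underlying head counts can be recovered by arithmetic. The sketch below is an inference from the rounded figures, not data taken from the survey itself.

```python
RESPONSES = 18
REPORTED_PERCENTAGES = [17, 39, 89, 94, 100]

def implied_counts(percent, n):
    """All head counts k (0..n) whose share k/n rounds to the given percent."""
    return [k for k in range(n + 1) if round(100 * k / n) == percent]

counts = [implied_counts(p, RESPONSES) for p in REPORTED_PERCENTAGES]
# Each reported percentage matches exactly one count: 3, 7, 16, 17, and 18.
```

So, for example, the 17% figure would correspond to 3 of the 18 respondents.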


Many Are Vulnerable to Character-Segmentation Attack

Effective strategy of attack:

• Segment image into characters

• Apply aggressive OCR to isolated chars

• If it’s known (or guessed) that the word is ‘spellable’

(e.g. legal English), use the lexicon to constrain

interpretations
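The last step of the attack, using a lexicon to constrain interpretations, can be sketched as follows. It assumes the per-character OCR stage yields a score for each candidate letter; the function name and the toy scores in any usage are hypothetical.

```python
def best_lexicon_match(char_scores, lexicon):
    """Pick the lexicon word whose letters have the highest combined score.

    char_scores holds one dict per segmented character, mapping a candidate
    letter to the classifier's score for it.
    """
    best_word, best_score = None, 0.0
    for word in lexicon:
        if len(word) != len(char_scores):
            continue
        score = 1.0
        for letter, scores in zip(word, char_scores):
            score *= scores.get(letter, 0.0)  # unseen letter rules the word out
        if score > best_score:
            best_word, best_score = word, score
    return best_word, best_score
```

Even when the top-scoring letter at some position is wrong, the lexicon can pull the reading back to a legal word, which is one reason nonsense words make this attack harder.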

Patrice Simard (MS Research) reports that this

breaks many widely used CAPTCHAs


So, try to generate word-images that will be hard to segment into characters

Slice characters up:

– vertical cuts; then

– horizontal cuts

Set size of cuts to constant within a word

Choose positions of cuts randomly

Force pieces to drift apart: ‘scatter’ horiz. & vert.

Change intercharacter space
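A stripped-down sketch of the slicing and scattering steps is given below. It handles vertical cuts and vertical drift only; horizontal cuts, horizontal scatter, and intercharacter spacing are left out, and the text-bitmap glyph format is an assumption chosen for illustration.

```python
import random

def scatter_vertical_strips(glyph, cut_width=2, max_shift=1, rng=None):
    """Cut a glyph into vertical strips and let each strip drift up or down.

    glyph is a list of equal-length strings in which '#' marks ink.
    """
    rng = rng or random.Random()
    h, w = len(glyph), len(glyph[0])
    # Constant strip width, but a random position for the first cut.
    first_cut = rng.randint(1, cut_width)
    edges = [0] + list(range(first_cut, w, cut_width)) + [w]
    # Extra rows above and below leave room for the displaced strips.
    canvas = [[" "] * w for _ in range(h + 2 * max_shift)]
    for left, right in zip(edges, edges[1:]):
        dy = rng.randint(-max_shift, max_shift)
        for y in range(h):
            for x in range(left, right):
                canvas[y + max_shift + dy][x] = glyph[y][x]
    return ["".join(row) for row in canvas]
```

Because strips only move vertically here, no ink is lost or overlapped; allowing horizontal drift as well would let fragments of neighboring characters interpenetrate.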


Character fragments can interpenetrate

Not only is it hard to segment the word into characters, ….

… it can be hard to recombine characters’ fragments into characters


How Well Can People Read These?

We carried out a human legibility trial with the help of ~60 volunteers: students, faculty, & staff at Lehigh Univ. plus colleagues at Avaya Labs Research


Subjects were told they got it right/wrong – after they rated its ‘difficulty’


Subjective difficulty ratings are correlated with illegibility

[Figure: trial results marked Right or Wrong, by difficulty rating from 1 (Easy) to 5 (Impossible)]


People Rated These “Easy” (1/5)

aferatic

memmari

heiwho

nampaign


Rated “Medium Hard” (3/5)

overch / ovorch

wouwould

atlager / adager

weland / wejund


Rated “Impossible” (5/5)

acchown / echaeva

gualing / gealthas

bothere / beadave

caquired / engaberse


Why is ScatterType legible at all?

Should it surprise you that this is legible…?

We speculate that we can read it because:

• human readers exploit typeface consistency cues … evidence remains in small details of local shape

• this ability seems largely unconscious


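Mean Horizontal Scatter vs Mean Vertical Scatter

Mirage: data analysis tool, Tin Kam Ho, Bell Labs.

[Figure: trial results plotted by mean horizontal and mean vertical scatter, marked Right or Wrong, by difficulty rating from 1 (Easy) to 5 (Impossible)]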


The Arms Race

When will serious technical attacks be launched?

• ‘spam kings’ make $$ millions

• two spam-blocking firms rely on CAPTCHAs

How long can a CAPTCHA withstand attack?

• especially if its algorithms are published or guessed

Strategy: keep a pipeline of defenses in reserve:

• continuing partnership between R&D & users


The 2nd HIP Workshop

May 2005 -- Lehigh University, Bethlehem, PA

Advisory Board:

Manuel Blum, CMU

Doug Tygar, UCB CS/SIMS

Patrice Simard, Microsoft Research

Gordon Legge, Univ. Minnesota

Organizers:

Henry Baird, Dan Lopresti


Lots of Open Research Questions

What are the most intractable obstacles to machine vision?

segmentation, occlusion, degradations, …?

Under what conditions is human reading most robust?

linguistic & semantic context, Gestalt, style consistency…?

Where are ‘ability gaps’ located?

quantitatively, not just qualitatively

How to generate challenges strictly within ability gaps?

fully automatically

an indefinitely long sequence of distinct challenges


Disguised CAPTCHAs

Note that many normal navigation aids are CAPTCHAs (though not designed for that purpose)

                


Implicit CAPTCHAs

We are investigating design principles for “implicit CAPTCHAs” that relieve these drawbacks:

• Challenges disguised as necessary browsing links

• Challenges that can be answered with a single click while still providing several bits of confidence

• Challenges that can be answered only through experience of the context of the particular website

– weave CAPTCHAs into a multi-page “story”

– can’t be extracted and “farmed-out” to people

• Challenges that are so easy that failure indicates a failed robot attack
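The phrase "several bits of confidence" from a single click can be made concrete with a small calculation. The sketch assumes a robot that clicks uniformly at random over the clickable area, which is a modeling assumption and not a claim from the slides.

```python
import math

def bits_of_confidence(target_area, total_area):
    """Bits gained when a click lands in the target, if a robot clicks at random."""
    p_random_hit = target_area / total_area
    return -math.log2(p_random_hit)

# A target covering 1/16 of the image: a random click succeeds 1 time in 16.
one_click = bits_of_confidence(1, 16)   # 4.0 bits
# Independent challenges add: two such clicks give 8 bits (1 in 256).
two_clicks = 2 * one_click
```

Under this model, the smaller the correct region relative to the whole image, the more confidence each correct click provides.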


Alan Turing might have enjoyed the irony …

A technical problem – machine reading –

which he thought would be easy,

has resisted attack for 50 years, and

now allows the first widespread

practical use of variants of

his test for artificial intelligence.


Contact

Henry S. Baird [email protected]