Antispam Image Filtering Technologies

Posted by michael-lamont on 15-Jan-2015

DESCRIPTION

Slides from my wildly popular presentation at HP World 2005. Who knew? Grossly oversimplified signal-processing methodology plus sample photos of models in bikinis was a winning combo, even in San Francisco.

TRANSCRIPT

Page 1: Antispam Image Filtering Technologies

Image-Filtering Technologies

Michael Lamont

Senior Software Engineer

Process Software

Overview

• Role of image filtering in anti-spam filtering

• Two popular image filtering methods:

– Shape recognition

– Skin detection

• Image filtering examples

• Image filtering issues

• Tools you can play with on your own

What Isn’t Covered

• Anything requiring advanced math

• Optical character recognition (OCR)

Spam Images

• A picture is worth 1000 words…

• …and it’s a lot harder to filter than 1000 words.

• When spammers advertise pornography, photos are essential marketing tools.

• Right now, a spam filter can be very effective without looking at images.

• This is going to change once most sites install more accurate filters and spammers are forced to adapt.

90-Second Image Review

• To understand how image filtering technologies work, you need a basic understanding of how computers represent images.

• Images are broken into square dots, which correspond to pixels on a monitor.

• Example image: [image]

• Each dot’s color is represented by 3 components: red, green, and blue.

• Each of the three color components has a value from 0 to 255 (8 bits per component).

• If all three are 0, the pixel is black. If all three are 255, the pixel is white.

• The higher the number, the more intense the color component.

• Example: increasing the red value from 0 to 255 while leaving the other components at 0: [image]
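A minimal Python sketch of the representation described above, assuming the common 24-bit RGB format (8 bits per component). It models a pixel as an (r, g, b) tuple, builds the red ramp from the last example, and counts the possible colors, which is where the "~16m colors" figure used later comes from. `Pixel`, `red_ramp`, and `total_colors` are illustrative names only.

```python
# A pixel is an (r, g, b) tuple; each component runs from 0 to 255 (8 bits).
Pixel = tuple[int, int, int]

BLACK: Pixel = (0, 0, 0)        # all three components at 0
WHITE: Pixel = (255, 255, 255)  # all three components at 255


def red_ramp(steps: int = 256) -> list[Pixel]:
    """Pixels whose red component rises from 0 to 255 while green and blue stay at 0."""
    if steps < 2:
        raise ValueError("need at least two steps")
    return [(round(i * 255 / (steps - 1)), 0, 0) for i in range(steps)]


def total_colors(bits_per_component: int = 8) -> int:
    """Number of distinct colors when each of the three components has the given bit depth."""
    return (2 ** bits_per_component) ** 3


if __name__ == "__main__":
    print(red_ramp(5))     # [(0, 0, 0), (64, 0, 0), (128, 0, 0), (191, 0, 0), (255, 0, 0)]
    print(total_colors())  # 16777216
```

With 256 levels for each of three components there are 256 × 256 × 256 = 16,777,216 possible colors.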

Shape Recognition

• Identifies objects in an image using posterization and edge finding.

• Extracts interesting objects and searches for similar objects in a database of “bad” objects.

• For our application, the objects are human body parts.

Posterization

• Dramatically reduces the number of colors in an image.

• Has the side effect of lumping most of an object’s pixels into the same color.

• It’s called “posterization” because the same kind of color reduction used to be done for images printed on posters.

[Images: Posterization - Example]

Posterization - Method

• A number of color bins are created.

• The number of bins is far smaller than the ~16.7 million (256³) colors that are possible.

• Each bin holds several hundred closely related colors.

• Every color in a bin is represented by the bin’s average color.

• Example: if a bin contained every shade of red from light pink to dark blood red, every color in the bin would be represented by plain old red.

• The posterization process itself consists of replacing the color of every pixel in the image with its bin’s representative color.
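The slides don't say how the bins are laid out. The Python sketch below assumes the simplest layout, a uniform grid that splits each component into `levels` equal ranges; the default of 32 levels gives 8 × 8 × 8 = 512 colors per bin, in line with "several hundred" above. Each bin's representative is its average color, taken here as the average of the image's pixels that land in the bin. `bin_key` and `posterize` are illustrative names, not from any library.

```python
from collections import defaultdict

Pixel = tuple[int, int, int]


def bin_key(color: Pixel, levels: int) -> Pixel:
    """Bin index for a color: each 0-255 component is split into `levels` equal ranges."""
    width = 256 / levels
    return tuple(min(int(c / width), levels - 1) for c in color)


def posterize(pixels: list[Pixel], levels: int = 32) -> list[Pixel]:
    """Replace each pixel with the average color of all pixels that fall in its bin."""
    # Pass 1: accumulate per-bin component sums and pixel counts.
    sums: dict[Pixel, list[int]] = defaultdict(lambda: [0, 0, 0, 0])
    for p in pixels:
        acc = sums[bin_key(p, levels)]
        for i in range(3):
            acc[i] += p[i]
        acc[3] += 1
    # The representative color of a bin is the rounded average of its pixels.
    reps = {k: tuple(round(a[i] / a[3]) for i in range(3)) for k, a in sums.items()}
    # Pass 2: swap every pixel for its bin's representative color.
    return [reps[bin_key(p, levels)] for p in pixels]


if __name__ == "__main__":
    image = [(250, 10, 10), (200, 40, 30), (20, 20, 200)]
    print(posterize(image, levels=4))  # [(225, 25, 20), (225, 25, 20), (20, 20, 200)]
```

With `levels=4` there are only 64 bins, which produces the dramatic poster-like look: the two reddish pixels in the usage example collapse to one shared color while the blue one keeps its own.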

[Images: Posterization - Example 2; Posterization - Example 3]

Edge Finding

• After posterizing the image, edge finding is used to identify individual objects.

• Edge finding determines the boundaries between different patches of color and contrast.

[Images: Edge Finding - Example]

Edge Finding - Method

• The edge finding program scans the image looking for pixels that are very different from their neighbors.

• When it finds a radically different pixel, it marks it as part of an edge.

• Good edge finding algorithms look at lots of neighboring pixels to help reduce noise.
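The slides don't name a specific edge detector. As a stand-in, this Python sketch uses the Sobel operator, a standard technique that compares each pixel with all eight of its neighbors (weighting the nearest ones more) so a single noisy pixel has less influence. It works on a grayscale copy of the image; the BT.601 grayscale weights and the threshold of 100 are illustrative choices, not values from the presentation.

```python
Gray = list[list[int]]  # grayscale image: rows of 0-255 intensities

# Sobel kernels: weighted differences across the 8 neighbors, horizontally and vertically.
KX = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
KY = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def to_gray(r: int, g: int, b: int) -> int:
    """Collapse an RGB pixel to one intensity using the ITU-R BT.601 luma weights."""
    return round(0.299 * r + 0.587 * g + 0.114 * b)


def find_edges(img: Gray, threshold: float = 100.0) -> list[list[bool]]:
    """Mark pixels whose Sobel gradient magnitude exceeds `threshold`.

    Border pixels lack a full set of neighbors and are left unmarked.
    """
    h, w = len(img), len(img[0])
    edges = [[False] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = gy = 0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    v = img[y + dy][x + dx]
                    gx += KX[dy + 1][dx + 1] * v
                    gy += KY[dy + 1][dx + 1] * v
            edges[y][x] = (gx * gx + gy * gy) ** 0.5 > threshold
    return edges


if __name__ == "__main__":
    # Dark left half, bright right half: the edge is the column pair at the boundary.
    image = [[0, 0, 0, 255, 255, 255] for _ in range(5)]
    for row in find_edges(image):
        print("".join("#" if e else "." for e in row))
```

For the two-tone test image, the three interior rows print `..##..`: only the pixels on either side of the dark/bright boundary differ enough from their neighbors to count as edge pixels.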

[Images: Edge Finding - Demonstration; Edge Finding - Example 2; Edge Finding - Example 3]

Object Extraction

• Once objects have been identified with posterization and edge finding, they’re easy to extract.

• For photos of people wearing swimsuits, the leg, midriff, and upper-torso objects are the ones searched for.

• A database of known objects is searched for matches to the extracted objects.

• Both object shape and color are used in the search.

• Comparisons are done with a fuzzy logic algorithm, since it’s unlikely that two objects will be exactly alike.
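The slides give no details of the extraction step or the fuzzy comparison, so the Python sketch below is only one plausible reading. It assumes the image has already been posterized, so the boundaries edge finding would report coincide with changes in color; objects are then the connected patches of a single color, collected with a flood fill. Matching uses two textbook fuzzy-logic pieces: linear (triangular) membership functions that score how close two features are, combined with a minimum as the fuzzy AND. The `Shape` features, tolerances, 0.6 cutoff, and the one-entry "leg" database are all hypothetical.

```python
from collections import deque
from dataclasses import dataclass

Pixel = tuple[int, int, int]


@dataclass
class Shape:
    area: int       # pixel count
    aspect: float   # bounding-box width / height
    fill: float     # fraction of the bounding box the object covers
    color: Pixel    # the object's (posterized) color


def extract_objects(img: list[list[Pixel]]) -> list[Shape]:
    """Flood-fill 4-connected runs of identical posterized color into objects."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    objects = []
    for sy in range(h):
        for sx in range(w):
            if seen[sy][sx]:
                continue
            color, members = img[sy][sx], []
            seen[sy][sx] = True
            queue = deque([(sy, sx)])
            while queue:  # breadth-first flood fill
                y, x = queue.popleft()
                members.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < h and 0 <= nx < w and not seen[ny][nx] and img[ny][nx] == color:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            ys = [y for y, _ in members]
            xs = [x for _, x in members]
            bw, bh = max(xs) - min(xs) + 1, max(ys) - min(ys) + 1
            objects.append(Shape(len(members), bw / bh, len(members) / (bw * bh), color))
    return objects


def closeness(a: float, b: float, tolerance: float) -> float:
    """Fuzzy membership: 1.0 when a == b, falling linearly to 0.0 at `tolerance` apart."""
    return max(0.0, 1.0 - abs(a - b) / tolerance)


def similarity(obj: Shape, ref: Shape) -> float:
    """Fuzzy AND (minimum) of shape and color closeness; area is skipped since it depends on scale."""
    color_dist = sum((p - q) ** 2 for p, q in zip(obj.color, ref.color)) ** 0.5
    return min(
        closeness(obj.aspect, ref.aspect, 1.0),
        closeness(obj.fill, ref.fill, 0.3),
        closeness(color_dist, 0.0, 120.0),
    )


def best_match(obj: Shape, database: dict[str, Shape], cutoff: float = 0.6):
    """Return (name, score) for the most similar known object, or None if nothing is close enough."""
    if not database:
        return None
    name, ref = max(database.items(), key=lambda item: similarity(obj, item[1]))
    score = similarity(obj, ref)
    return (name, score) if score >= cutoff else None


if __name__ == "__main__":
    SKIN, BLUE = (200, 150, 120), (0, 0, 255)
    image = [[SKIN, BLUE, BLUE] for _ in range(4)]  # tall skin strip beside a blue block
    known = {"leg": Shape(area=400, aspect=0.3, fill=0.9, color=(205, 145, 115))}
    for obj in extract_objects(image):
        print(obj, best_match(obj, known))
```

Using the minimum means a candidate must be reasonably close on every feature at once: the tall skin-colored strip matches the hypothetical "leg" entry, while the blue block fails on color no matter how its shape compares.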

Skin Detection

• A special case of an image classification method called color histogram matching.

• Finds patches of skin tone in an image.

• Calculates the overall percentage of the image that is skin.

• If more than a specified percentage of the image is skin, the image is filtered.

Skin Tones

• Almost all human skin has the same hue; differences in saturation produce the different skin colors.

• Human skin tones don’t often appear in other photographed objects, so color alone can be used to identify skin.

• Skin tones are primarily red, with less green and even less blue.
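To make "primarily red" concrete, here is a Python version of a widely cited fixed-threshold RGB skin rule from the skin-detection literature. The thresholds belong to that published heuristic, not to this presentation, and the next slide explains why a hardcoded rule like this tends to be less accurate than a trained model. `looks_like_skin` is an illustrative name.

```python
def looks_like_skin(r: int, g: int, b: int) -> bool:
    """Fixed-threshold RGB skin test: bright enough, colorful enough, and red-dominant."""
    return (
        r > 95 and g > 40 and b > 20            # not too dark
        and max(r, g, b) - min(r, g, b) > 15    # not a shade of gray
        and abs(r - g) > 15                     # red clearly separated from green
        and r > g and r > b                     # red is the strongest component
    )


if __name__ == "__main__":
    for color in [(224, 172, 105), (141, 85, 36), (128, 128, 128), (30, 60, 200)]:
        print(color, looks_like_skin(*color))
```

Both a light and a dark skin tone pass because red dominates in each, while any gray pixel (equal components) fails the spread test.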

Skin Color Model

• To identify skin tones in an image, a filter needs to know which colors are skin tones.

• You could hardcode every skin color, but there are tens of thousands of them.

• It’s much more accurate to identify skin patches in an image and “train” the filter.

Skin Color Training

• Works almost like Bayesian filter training, but with image colors instead of message tokens.

• The filter maintains one database of skin colors and another database of non-skin colors.

• If a color appears more often in the skin color database than in the non-skin database, it’s treated as a skin color.

• This system has the nice side effect of dropping out most skin colors that also appear in non-skin areas of photos.
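A minimal Python sketch of this training scheme. Two assumptions go beyond the slides: colors are first coarsened into 8 × 8 × 8 blocks so that a few training images also cover neighboring shades, and "appears more often" is read as a higher relative frequency (count divided by database size), much as a Bayesian spam filter normalizes token counts, so a larger non-skin database doesn't swamp the comparison. `SkinColorModel` and `quantize` are hypothetical names.

```python
from collections import Counter

Pixel = tuple[int, int, int]


def quantize(color: Pixel, step: int = 8) -> Pixel:
    """Coarsen a color so nearby shades share one histogram entry."""
    return tuple(c // step for c in color)


class SkinColorModel:
    """Skin and non-skin color histograms, trained like a Bayesian filter's token counts."""

    def __init__(self, step: int = 8) -> None:
        self.step = step
        self.skin: Counter = Counter()
        self.non_skin: Counter = Counter()

    def train(self, pixels: list[Pixel], skin: bool) -> None:
        """Add the pixels of a hand-labeled patch to the skin or non-skin database."""
        target = self.skin if skin else self.non_skin
        target.update(quantize(p, self.step) for p in pixels)

    def is_skin(self, color: Pixel) -> bool:
        """Skin if the color is relatively more common in the skin database than in the non-skin one."""
        key = quantize(color, self.step)
        skin_total = sum(self.skin.values()) or 1  # avoid dividing by zero before training
        non_skin_total = sum(self.non_skin.values()) or 1
        return self.skin[key] / skin_total > self.non_skin[key] / non_skin_total


if __name__ == "__main__":
    model = SkinColorModel()
    model.train([(220, 170, 130)] * 8 + [(210, 180, 140)] * 2, skin=True)
    model.train([(210, 180, 140)] * 2 + [(30, 90, 200)] * 8, skin=False)
    print(model.is_skin((221, 171, 131)))  # True
    print(model.is_skin((210, 180, 140)))  # False: equally common in both databases
```

The second check shows the side effect described above: a "sandy" color that shows up as often in non-skin areas as in skin patches is not treated as skin.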

[Image: Training Sample]

Skin Identification

• To analyze an image, the filter examines the color of each pixel.

• If the color is a skin tone, the filter marks the pixel as skin.

• When every pixel has been examined, the percentage of the image that is skin is calculated.

• If the percentage is over a specified threshold, the image is filtered.
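The per-pixel scan and threshold test map directly onto a few lines of Python. This sketch takes the skin test as a parameter, so it works with any per-color predicate (for example a trained model's `is_skin` method); the 0.3 default threshold and the toy red-dominance predicate in the usage example are placeholders, not values from the presentation.

```python
from typing import Callable

Pixel = tuple[int, int, int]


def skin_fraction(pixels: list[Pixel], is_skin: Callable[[Pixel], bool]) -> float:
    """Share of pixels (0.0 to 1.0) that the color test marks as skin."""
    if not pixels:
        return 0.0
    return sum(1 for p in pixels if is_skin(p)) / len(pixels)


def should_filter(pixels: list[Pixel], is_skin: Callable[[Pixel], bool],
                  threshold: float = 0.3) -> bool:
    """Filter the image when its skin fraction is over the threshold."""
    return skin_fraction(pixels, is_skin) > threshold


if __name__ == "__main__":
    def red_dominant(p: Pixel) -> bool:  # toy stand-in for a trained skin model
        r, g, b = p
        return r > g > b

    image = [(220, 170, 130)] * 4 + [(30, 90, 200)] * 6
    print(skin_fraction(image, red_dominant))  # 0.4
    print(should_filter(image, red_dominant))  # True
```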

[Images: Skin Detection Example]

[Images: Correctly Filtered Images - Shape; Correctly Filtered Images - Skin]

Shape Recognition Problems

• The following are examples of images that shape recognition doesn’t handle correctly.

• Skin detection handles them correctly, but only because it’s biased toward filtering images with a lot of skin.

[Images, captioned as follows:]

• Unusual angle obscures shapes. Skin detection works.

• Shapes are too broken up for the filter to work. Skin detection works.

• Not enough “swimsuit” objects (two examples). Skin detection works.

• Image is so noisy that edge detection goes crazy. Amazingly, skin detection still works.

Skin Detection Problems

• The following are examples of images that skin detection incorrectly filters.

• Shape recognition handles most of these correctly, mainly because it can’t extract any useful shapes.

[Images, captioned as follows:]

• Baby photos tend to show lots of skin. Shape recognition doesn’t filter the image.

• Portraits have the same problem as babies. Shape recognition ignores the image.

• In the right light, sand can be the same color as skin. That’s fairly rare; skin color models usually exclude sand colors.

• Black-and-white images can’t be filtered by skin detection, because they contain no skin color to find. They also make life rough on shape recognition filters.

Wedding Photos

• Wedding photos are guaranteed to make a mess of image filters.

• Skin fades into the background because of soft lighting, soft filters, and retouching.

• It turns out that brides get upset if the image is crystal clear with good contrast, because that shows off skin flaws.

• Skin detection filters start identifying everything as skin (false positives).

• Shape recognition filters give up and don’t filter the message (accurate, but not for the right reasons).

• Porn tends not to be shot with soft lighting; good contrast makes skin “pop” in photos.

[Images: Example Wedding Photo - Shape; Example Wedding Photo - Skin]

“Art Porn”

• Usually shot with the same lighting effects as wedding photos.

• Rarely seen in email.

• In this case, skin detection is accurate for the wrong reasons, while shape recognition lets the image pass.

[Images: “Art Porn” Example - Shape; “Artistic” Example - Shape; “Artistic” Example - Skin]

Things I Can’t Show You

• S & M

– Skin tends to be covered with “clothing”

– Shapes are broken up by all of the paraphernalia

• Simpson’s shocker

• Still images from “interesting” videos

– Images are badly pixelated

– Colors are muddy and smudged

Image Filtering Issues

• Accuracy:

– Shape recognition misses lots of images it should catch (false negatives)

– Skin detection filters lots of images it shouldn’t (false positives)

– The best skin detection systems are about 80% accurate

– The best shape recognition systems are about 40% accurate
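The slide doesn't define "accurate". One common reading is the fraction of images classified correctly; that single number can hide which way a filter fails, so this Python sketch also reports the two error rates distinguished above. `filter_rates` is a hypothetical helper, and the sample labels in the usage example are made up for illustration.

```python
def filter_rates(predicted: list[bool], actual: list[bool]) -> dict[str, float]:
    """Accuracy and error rates for an image filter; True means 'should be / was filtered'."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty lists of equal length")
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)          # bad images caught
    tn = sum(1 for p, a in pairs if not p and not a)  # clean images passed
    fp = sum(1 for p, a in pairs if p and not a)      # clean images filtered
    fn = sum(1 for p, a in pairs if not p and a)      # bad images missed
    return {
        "accuracy": (tp + tn) / len(pairs),
        "false_positive_rate": fp / (fp + tn) if fp + tn else 0.0,
        "false_negative_rate": fn / (fn + tp) if fn + tp else 0.0,
    }


if __name__ == "__main__":
    # Hypothetical results for five test images, not data from the presentation.
    predicted = [True, True, True, False, False]
    actual = [True, False, True, True, False]
    print(filter_rates(predicted, actual))
```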

• Performance:

– Image filtering requires huge amounts of memory, CPU time, and disk bandwidth.

– It unacceptably slows down most sites’ email servers and filtering systems.

– HP ProLiant DL380 benchmark:

• ~1.2 million messages/hour with no filtering

• ~195,000 messages/hour with skin detection

• ~69,000 messages/hour with shape recognition
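The benchmark is easier to compare in per-second terms. This short Python calculation uses only the three figures on the slide, converting them to messages per second and to the factor by which each filter reduces throughput; the names are illustrative.

```python
# Throughput figures from the DL380 benchmark above (approximate, messages per hour).
BENCHMARK = {
    "no filtering": 1_200_000,
    "skin detection": 195_000,
    "shape recognition": 69_000,
}


def per_second(per_hour: float) -> float:
    """Convert messages/hour to messages/second."""
    return per_hour / 3600


def slowdown(baseline: float, filtered: float) -> float:
    """How many times lower the filtered throughput is than the unfiltered baseline."""
    return baseline / filtered


if __name__ == "__main__":
    base = BENCHMARK["no filtering"]
    for name, rate in BENCHMARK.items():
        print(f"{name}: {per_second(rate):.1f} msg/s, {slowdown(base, rate):.1f}x slower")
```

That works out to roughly 333, 54, and 19 messages per second: skin detection cuts throughput by a factor of about 6, and shape recognition by a factor of about 17.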

• Diminishing returns on accuracy: most spam filters won’t see a noticeable accuracy gain from adding image filtering.

• That’s likely to change as spammers discover that images are one of the better options for getting around current solutions.

I Wanna Play!

• Shape recognition:

– UC Berkeley’s Blobworld (open source): http://elib.cs.berkeley.edu/

• Skin detection:

– No good open-source examples

– Trivial to write your own using ImageMagick: http://www.imagemagick.org/

Quick Review

• We covered:

– How and why images appear in spam

– Why the use of images in spam is likely to increase

– Two methods for filtering images

– Examples of how the two methods work and don’t work

– Why image filtering isn’t widely used at this point
