image analysis for malicious advertisement detection

Image analysis for fraudulent advertisements

Jithendranath J V

2 04/10/2023

Advertisement: Awesome / Awful

Image Analyzer for Creative Tester

User Issue / Yahoo! Challenge

Roadmap Theme & Goal

• Creative Tester receives approximately a million creatives per day to test for malicious content. Of these, 2%–5% fall into the "windows mimic" category. These need to be detected and banned as early as possible, with minimal human intervention.

• Need to validate brand safety and ensure quality impressions for advertisers.

• The Trust and Safety team, in collaboration with Sciences, built an Image Analyzer module that detects malicious advertisements, such as windows mimic or fake brands with phony downloads, and tags them so that they can be banned.

Value Proposition/Positioning – To reduce the manual effort in recognizing and banning malicious advertisements that can be visually identified as fraudulent

IONIX / CT Ecosystem

[Diagram: components of the IONIX / CT ecosystem]

• Cqueuer (RMX Apps)
• IONIX Creative Tester (CT)
• TRF_PROD DB
• Primary/Secondary Creative/Click_URL Review
• Media Guard Manual Audit Queue
• Domain Lookup Service
• Media Trust (3rd Party)
• Virus Checker (ClamAv / Trend Micro)
• Image Analyzer
• Min-bar/Technical Tags
• Creatives/LineItems get banned with Min-Bar Classifiers
• Creative Feed based on Advertisers profile
• Downloaders (Chrome, Firefox, IE)
• Flash Checker
• Creatives Banned

IA Internals - Modeler

Feature extraction

SIFT

SURF

CBOW

K Means Computation

Histogram Generation

Model Generation (SVM)
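
The modeler steps listed above can be sketched roughly as follows. This is a minimal, standard-library-only illustration and not the production implementation: the 2-D tuples stand in for real SIFT/SURF descriptors, the farthest-point initialisation is a simplification chosen to keep the run deterministic, and the hinge-loss trainer is a stand-in for whatever SVM library the module actually uses. All function names and the toy data are hypothetical.

```python
import math

def nearest(centroids, v):
    """Index of the centroid closest to descriptor v (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(centroids[i], v))

def build_codebook(descriptors, k, iters=20):
    """K-means over all training descriptors; returns k centroids ("visual words")."""
    # Farthest-point initialisation (assumption: keeps this sketch deterministic).
    centroids = [descriptors[0]]
    while len(centroids) < k:
        centroids.append(max(
            descriptors, key=lambda d: min(math.dist(d, c) for c in centroids)))
    for _ in range(iters):
        buckets = [[] for _ in centroids]
        for d in descriptors:
            buckets[nearest(centroids, d)].append(d)
        # New centroid = mean of its bucket; an empty bucket keeps its old centroid.
        updated = [tuple(sum(col) / len(b) for col in zip(*b)) if b else c
                   for c, b in zip(centroids, buckets)]
        if updated == centroids:
            break
        centroids = updated
    return centroids

def histogram(centroids, descriptors):
    """Normalised count of an image's descriptors per visual word."""
    counts = [0] * len(centroids)
    for d in descriptors:
        counts[nearest(centroids, d)] += 1
    total = sum(counts) or 1
    return [c / total for c in counts]

def train_linear_svm(samples, labels, epochs=200, lr=0.1, lam=0.01):
    """Linear SVM via hinge-loss sub-gradient descent; labels are +1 or -1."""
    w, b = [0.0] * len(samples[0]), 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            w = [wi * (1 - lr * lam) for wi in w]      # L2 regularisation step
            if margin < 1:                             # hinge loss is active
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b

# Toy training set: each image is a list of 2-D stand-in descriptors.
benign = [[(0.0, 0.1), (0.2, 0.0), (0.1, 0.2), (0.3, 0.1)],
          [(0.1, 0.1), (0.0, 0.3), (0.2, 0.2), (0.1, 0.0)]]
mimic = [[(5.0, 5.1), (5.2, 4.9), (4.9, 5.0), (0.1, 0.1)],
         [(5.1, 5.2), (4.8, 5.1), (5.0, 4.8), (0.2, 0.0)]]
images = benign + mimic
labels = [-1, -1, 1, 1]                                # +1 = malicious

codebook = build_codebook([d for img in images for d in img], k=2)
histograms = [histogram(codebook, img) for img in images]
weights, bias = train_linear_svm(histograms, labels)
```

The resulting `codebook`, `weights`, and `bias` together play the role of the generated model: the classifier side needs the same codebook to build comparable histograms.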

IA Internals - Classifier

Feature extraction

SIFT

SURF

CBOW

Histogram Generation

Classification (SVM)
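
The classifier steps above reuse the codebook from the modeler and skip the K-means computation. A minimal sketch, assuming a linear SVM whose signed decision score is compared against a cut-off: `CODEBOOK`, `WEIGHTS`, and `BIAS` are hypothetical pre-trained values, and the 2-D tuples again stand in for real SIFT/SURF descriptors.

```python
import math

# Hypothetical model produced by the modeler: visual words plus linear SVM parameters.
CODEBOOK = [(0.1, 0.1), (5.0, 5.0)]
WEIGHTS = [-1.3, 1.5]
BIAS = 0.2

def histogram(centroids, descriptors):
    """Normalised count of an image's descriptors per visual word."""
    counts = [0] * len(centroids)
    for d in descriptors:
        idx = min(range(len(centroids)), key=lambda i: math.dist(centroids[i], d))
        counts[idx] += 1
    total = sum(counts) or 1
    return [c / total for c in counts]

def classify(descriptors, threshold=0.0):
    """Return the decision for one image and the raw SVM score behind it."""
    hist = histogram(CODEBOOK, descriptors)
    score = sum(w * h for w, h in zip(WEIGHTS, hist)) + BIAS
    return {"result": score >= threshold, "conf": score}

suspicious = [(5.1, 4.9), (4.8, 5.2), (0.0, 0.2), (5.0, 5.0)]
harmless = [(0.1, 0.0), (0.2, 0.1)]

print(classify(suspicious))                 # flagged at the default cut-off
print(classify(suspicious, threshold=1.5))  # a stricter cut-off rejects it
print(classify(harmless))                   # not flagged
```

Raising `threshold` trades recall for precision, which is the trade-off the precision/recall figures describe.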

Performance – Precision and Recall

 #   Threshold   Precision   Recall
 1    3.64068    0.81818     0.00402
 2    2.45085    0.81818     0.06827
 3    1.85615    0.81818     0.13253
 4    1.24538    0.8125      0.19679
 5    0.91759    0.8125      0.26104
 6    0.29167    0.76522     0.3253
 7    0.06556    0.74638     0.38956
 8   -0.18092    0.72327     0.45382
 9   -0.52049    0.67895     0.51807
10   -0.78885    0.65198     0.58233
11   -1.18095    0.55102     0.64659
12   -1.75574    0.41163     0.71084
13   -2.07244    0.34281     0.7751
14   -2.52684    0.27941     0.83936
15   -3.13143    0.22636     0.90361
16   -4.45694    0.1649      1
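
One way to read these figures is to pick an operating threshold from them. The sketch below uses the measured values from this slide; the helper names and the two selection rules (highest recall at a minimum precision, and best F1) are illustrative assumptions, not a statement of how the team chose its cut-off.

```python
# (threshold, precision, recall) triples copied from the figures above.
POINTS = [
    (3.64068, 0.81818, 0.00402), (2.45085, 0.81818, 0.06827),
    (1.85615, 0.81818, 0.13253), (1.24538, 0.8125, 0.19679),
    (0.91759, 0.8125, 0.26104), (0.29167, 0.76522, 0.3253),
    (0.06556, 0.74638, 0.38956), (-0.18092, 0.72327, 0.45382),
    (-0.52049, 0.67895, 0.51807), (-0.78885, 0.65198, 0.58233),
    (-1.18095, 0.55102, 0.64659), (-1.75574, 0.41163, 0.71084),
    (-2.07244, 0.34281, 0.7751), (-2.52684, 0.27941, 0.83936),
    (-3.13143, 0.22636, 0.90361), (-4.45694, 0.1649, 1.0),
]

def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0

def highest_recall_at_precision(points, min_precision):
    """Operating point with the most recall among those meeting min_precision."""
    eligible = [p for p in points if p[1] >= min_precision]
    return max(eligible, key=lambda p: p[2]) if eligible else None

def best_f1(points):
    """Operating point that maximises F1."""
    return max(points, key=lambda p: f1(p[1], p[2]))

print(highest_recall_at_precision(POINTS, 0.80))  # (0.91759, 0.8125, 0.26104)
print(best_f1(POINTS))                            # (-0.78885, 0.65198, 0.58233)
```

With a precision floor of 0.80, only about a quarter of the malicious creatives are caught; the F1-optimal point catches about 58% at roughly 65% precision.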

[Chart: Precision vs Recall]

[Chart: Precision, Recall and Threshold across the 16 operating points]

IA Integration with CT

Feature Extractor

K Means

Histogram

Classifier

Servlet

Creative Tester

Image Analyzer

HTTP

IA - API Example

Request:

{ "requests":[ { "imgid":"1", "imgurl":"http://ionix.zenfs.com/ct/dev2/screenshots/5d079b5de50f6b30602e4a00b84a6e49e9443af7.jpg", "run_wnddlg":true } ] }

Response:

{ "responses":[ { "imgurl":"http://ionix.zenfs.com/ct/dev2/screenshots/5d079b5de50f6b30602e4a00b84a6e49e9443af7.jpg", "imgid":"1", "classifiers":[ { "classifier":"wnddlg", "status":true, "result":true, "conf":0.40216639639794 } ] } ] }
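
A caller such as Creative Tester could build and read these payloads roughly as sketched below. The field names come from the example; the helper names are hypothetical, the reading of `status` as "the classifier ran successfully" is an assumption, and no endpoint is called because the slide does not give one.

```python
import json

def build_request(images, run_wnddlg=True):
    """Serialise (imgid, imgurl) pairs into the request shape shown in the example."""
    return json.dumps({
        "requests": [
            {"imgid": str(imgid), "imgurl": imgurl, "run_wnddlg": run_wnddlg}
            for imgid, imgurl in images
        ]
    })

def flagged_images(response_text, classifier="wnddlg", min_conf=0.0):
    """Return imgids for which the named classifier fired with conf >= min_conf."""
    flagged = []
    for resp in json.loads(response_text)["responses"]:
        for c in resp.get("classifiers", []):
            # Assumption: "status" reports that the classifier ran successfully.
            if (c["classifier"] == classifier and c["status"]
                    and c["result"] and c["conf"] >= min_conf):
                flagged.append(resp["imgid"])
    return flagged

IMG = ("http://ionix.zenfs.com/ct/dev2/screenshots/"
       "5d079b5de50f6b30602e4a00b84a6e49e9443af7.jpg")

# Response copied from the example, with the imgurl substituted in.
SAMPLE_RESPONSE = json.dumps({
    "responses": [{
        "imgurl": IMG, "imgid": "1",
        "classifiers": [{"classifier": "wnddlg", "status": True,
                         "result": True, "conf": 0.40216639639794}],
    }]
})

print(build_request([("1", IMG)]))
print(flagged_images(SAMPLE_RESPONSE))                # ['1']
print(flagged_images(SAMPLE_RESPONSE, min_conf=0.5))  # []
```

The `min_conf` parameter lets the caller apply a stricter cut-off than the service's own `result` flag, for example to auto-ban only high-confidence hits and route the rest to manual review.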

Yahoo! Confidential & Proprietary.

Sample – Classified images

What Does Success Look Like

• Who are the customers?
– RMX and APT creative serving systems.
– Moneyball (going forward).

• Success metrics
– Reduction in the manual effort needed to identify win mimic based advertisements.
– This would be measured by the confidence score generated by the system, which would eventually let us automate the whole process.
– Reduction in customer complaints.

• Key business stakeholders who have validated or will validate success
– Serving systems
– Business teams
– Manual review teams

Competitive Landscape

• 3rd party ad verification companies, etc.

• What differentiates our product/Solution?

– Avoiding the need to expose and send out demand inventory.
– Flexibility to keep improving the algorithms for higher precision/recall.
– Quick turnaround time for validation.
– Building highly targeted models (for example, fake Facebook or fake Adobe).
