image analysis for malicious advertisement detection
Image analysis for fraudulent advertisements
Jithendranath J V
2 04/10/2023
Advertisement: Awesome / Awful
Image Analyzer for Creative Tester
User Issue / Yahoo! Challenge
Roadmap Theme & Goal
• Creative Tester receives approximately a million creatives per day to be tested for malicious content. Of these, 2%–5% are in the Windows mimic category. These need to be detected and banned as early as possible, with minimal human intervention.
• Need to validate brand safety and ensure quality impressions for advertisers.
• The Trust and Safety team, in collaboration with Sciences, came up with an Image Analyzer module that can detect malicious advertisements, such as Windows mimics or fake brands with phony downloads, and tag them appropriately to be banned.
Value Proposition/Positioning – To reduce the manual effort in recognizing and banning malicious advertisements that can be visually identified as fraudulent.
IONIX / CT Ecosystem
[Architecture diagram; labels recovered from the slide:]
• Cqueuer (RMX Apps)
• IONIX Creative Tester (CT)
• TRF_PROD DB
• Primary/Secondary Creative/Click_URL Review
• Media Guard Manual Audit Queue
• Domain Lookup Service
• Media Trust (3rd Party)
• Virus Checker (ClamAv / Trend Micro)
• Image Analyzer
• Downloaders (Chrome, Firefox, IE)
• Flash Checker
• Min-bar/Technical Tags
• Creative Feed based on advertiser's profile
• Creatives/LineItems get banned with Min-Bar Classifiers
• Creatives Banned
IA Internals - Modeler
Feature extraction
SIFT
SURF
CBOW
K Means Computation
Histogram Generation
Model Generation (SVM)
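The K Means and Histogram Generation steps of the modeler can be sketched roughly as below. This is a minimal pure-Python illustration, not the Image Analyzer's actual code: it assumes local descriptors (for example SIFT or SURF vectors) have already been extracted, uses toy 2-D descriptors, and all function names and the choice of k are hypothetical. The final SVM training step is left out, since in practice it would come from a library.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two descriptors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(centroids, descriptor):
    """Index of the centroid (visual word) closest to a descriptor."""
    return min(range(len(centroids)), key=lambda i: sq_dist(centroids[i], descriptor))

def kmeans(descriptors, k, iterations=20, seed=0):
    """Plain Lloyd's k-means; returns k centroids (the visual vocabulary)."""
    rng = random.Random(seed)
    centroids = [list(c) for c in rng.sample(descriptors, k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for d in descriptors:
            clusters[nearest(centroids, d)].append(d)
        for i, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centroid
                centroids[i] = [sum(col) / len(members) for col in zip(*members)]
    return centroids

def histogram(descriptors, centroids):
    """Normalized count of descriptors assigned to each visual word."""
    counts = [0] * len(centroids)
    for d in descriptors:
        counts[nearest(centroids, d)] += 1
    total = sum(counts) or 1
    return [c / total for c in counts]

# Toy descriptors standing in for the features of two training images (assumed data).
image_a = [(0.0, 0.1), (0.1, 0.0), (5.0, 5.1)]
image_b = [(5.1, 5.0), (4.9, 5.0), (0.0, 0.0)]

vocabulary = kmeans(image_a + image_b, k=2)
hist_a = histogram(image_a, vocabulary)
hist_b = histogram(image_b, vocabulary)
```

Each image ends up as one fixed-length histogram regardless of how many descriptors it produced, which is what makes it usable as an input vector for the SVM model generation step.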
IA Internals - Classifier
Feature extraction
SIFT
SURF
CBOW
Histogram Generation
Classification (SVM)
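The final classification step can be illustrated with a linear decision function applied to an image's histogram. The slides do not say which SVM kernel is used, so the linear form here is an assumption, and the weights, bias and histogram values are made-up numbers for illustration only.

```python
def decision_score(hist, weights, bias):
    """Signed score of a linear model: w . h + b (assumed linear SVM)."""
    return sum(w * h for w, h in zip(weights, hist)) + bias

def classify(hist, weights, bias, threshold=0.0):
    """Flag the image when its score reaches the chosen threshold."""
    score = decision_score(hist, weights, bias)
    return score >= threshold, score

# Hypothetical trained parameters and a 3-word histogram for one creative.
weights = [2.0, -1.0, 0.5]
bias = -0.25
hist = [0.5, 0.25, 0.25]

flagged, score = classify(hist, weights, bias)
```

Raising the threshold makes the classifier flag fewer creatives, trading recall for precision.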
Performance – Precision and Recall
Threshold   Precision   Recall
 3.64068    0.81818     0.00402
 2.45085    0.81818     0.06827
 1.85615    0.81818     0.13253
 1.24538    0.8125      0.19679
 0.91759    0.8125      0.26104
 0.29167    0.76522     0.3253
 0.06556    0.74638     0.38956
-0.18092    0.72327     0.45382
-0.52049    0.67895     0.51807
-0.78885    0.65198     0.58233
-1.18095    0.55102     0.64659
-1.75574    0.41163     0.71084
-2.07244    0.34281     0.7751
-2.52684    0.27941     0.83936
-3.13143    0.22636     0.90361
-4.45694    0.1649      1
[Chart: Precision vs. Recall]
[Chart: Precision, Recall and Threshold for operating points 1–16]
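One way to use the precision, recall and threshold figures from this slide is to pick the operating point with the highest recall that still meets a minimum precision. The sketch below copies the 16 data points from the slide; the selection rule and the function name are illustrative assumptions, not something the deck prescribes.

```python
# (threshold, precision, recall) triples as listed on the slide.
OPERATING_POINTS = [
    (3.64068, 0.81818, 0.00402),
    (2.45085, 0.81818, 0.06827),
    (1.85615, 0.81818, 0.13253),
    (1.24538, 0.8125, 0.19679),
    (0.91759, 0.8125, 0.26104),
    (0.29167, 0.76522, 0.3253),
    (0.06556, 0.74638, 0.38956),
    (-0.18092, 0.72327, 0.45382),
    (-0.52049, 0.67895, 0.51807),
    (-0.78885, 0.65198, 0.58233),
    (-1.18095, 0.55102, 0.64659),
    (-1.75574, 0.41163, 0.71084),
    (-2.07244, 0.34281, 0.7751),
    (-2.52684, 0.27941, 0.83936),
    (-3.13143, 0.22636, 0.90361),
    (-4.45694, 0.1649, 1.0),
]

def best_operating_point(points, min_precision):
    """Highest-recall point whose precision is at least min_precision, or None."""
    eligible = [p for p in points if p[1] >= min_precision]
    if not eligible:
        return None
    return max(eligible, key=lambda p: p[2])

# Example: require at least 75% precision before banning automatically.
choice = best_operating_point(OPERATING_POINTS, min_precision=0.75)
```

With a 0.75 precision floor this selects the point at threshold 0.29167, which catches roughly a third of the positives; a lower floor moves the threshold down and recall up.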
IA Integration with CT
[Integration diagram; labels recovered from the slide:]
• Creative Tester
• HTTP
• Servlet
• Image Analyzer: Feature Extractor, K Means, Histogram, Classifier
IA - API Example
Request:
{
  "requests": [
    {
      "imgid": "1",
      "imgurl": "http://ionix.zenfs.com/ct/dev2/screenshots/5d079b5de50f6b30602e4a00b84a6e49e9443af7.jpg",
      "run_wnddlg": true
    }
  ]
}
Response:
{
  "responses": [
    {
      "imgurl": "http://ionix.zenfs.com/ct/dev2/screenshots/5d079b5de50f6b30602e4a00b84a6e49e9443af7.jpg",
      "imgid": "1",
      "classifiers": [
        {
          "classifier": "wnddlg",
          "status": true,
          "result": true,
          "conf": 0.40216639639794
        }
      ]
    }
  ]
}
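A client could build the request and read the response along these lines. This is a sketch only: the field names come from the example on this slide, but their meaning is assumed here ("status" as "the classifier ran", "result" as "the image was flagged", "conf" as a confidence score). The helper names are hypothetical, and no HTTP call is made because the endpoint is not given in the deck.

```python
import json

def build_request(images, run_wnddlg=True):
    """Serialize a batch of (imgid, imgurl) pairs into the request format."""
    return json.dumps({
        "requests": [
            {"imgid": imgid, "imgurl": imgurl, "run_wnddlg": run_wnddlg}
            for imgid, imgurl in images
        ]
    })

def flagged_images(response_text, classifier="wnddlg", min_conf=0.0):
    """Return (imgid, conf) for images the named classifier flagged.

    Assumes status=true means the classifier ran and result=true means a positive.
    """
    flagged = []
    for item in json.loads(response_text).get("responses", []):
        for c in item.get("classifiers", []):
            if (c.get("classifier") == classifier and c.get("status")
                    and c.get("result") and c.get("conf", 0.0) >= min_conf):
                flagged.append((item["imgid"], c["conf"]))
    return flagged

# The response from the slide, used as sample input.
SAMPLE_RESPONSE = """
{ "responses":[ { "imgurl":"http://ionix.zenfs.com/ct/dev2/screenshots/5d079b5de50f6b30602e4a00b84a6e49e9443af7.jpg",
  "imgid":"1",
  "classifiers":[ { "classifier":"wnddlg", "status":true, "result":true, "conf":0.40216639639794 } ] }]}
"""

hits = flagged_images(SAMPLE_RESPONSE)
```

The optional min_conf argument lets a caller act only on high-confidence results and leave the remainder for manual review.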
Yahoo! Confidential & Proprietary.
Sample – Classified images
What Does Success Look Like
• Who are the customers?
– RMX and APT creative serving systems.
– Moneyball (going forward).
• Success metrics
– Reducing the manual effort needed to identify Windows mimic based advertisements.
– This would be measured by the confidence score generated by the system, which would eventually help us automate the entire process.
– Reduction in customer complaints.
• Key business stakeholders who have validated or will validate success
– Serving systems
– Business teams
– Manual review teams
Competitive Landscape
• 3rd party ad verification companies, etc.
• What differentiates our product/solution?
– Avoiding the need to expose and send out demand inventory.
– Flexibility to keep improving the algorithms for higher precision/recall.
– Quick turnaround time for validation.
– Building highly targeted models (for example, fake Facebook or fake Adobe).