
Searching the Searchers with SearchAudit
John P., Fang Yu, Yinglian Xie, Martin Abadi, Arvind Krishnamurthy
University of California, Santa Cruz
USENIX Security Symposium, August 2010
A presentation at Advanced Defense Lab

Upload: edward-rice

Posted on 27-Dec-2015


  • Slide 1
  • Searching the Searchers with SearchAudit. John P., Fang Yu, Yinglian Xie, Martin Abadi, Arvind Krishnamurthy, University of California, Santa Cruz. USENIX Security Symposium, August 2010. A presentation at Advanced Defense Lab
  • Slide 2
  • Outline: Introduction; Related Work; Architecture; Implementation Stage 1; Implementation Stage 2; Attack 1: Identifying Vulnerable Web Sites; Attack 2: Forum Spamming; Attack 3: Windows Live Messenger Phishing; Conclusion
  • Slide 3
  • Introduction: A framework that identifies malicious queries in massive search engine logs and uncovers their relationship with potential attacks. It uses a small set of malicious queries as seeds and generates regular expressions for detecting new malicious queries.
  • Slide 4
  • Introduction: Two stages. Identification: SearchAudit identifies malicious queries. Investigation: analyze those queries and the attacks of which they are part.
  • Slide 5
  • Introduction: Enhanced detection capability: 400 becomes 4 million. Low false-positive rate: 2%. Ability to detect new attacks: forum spamming. Facilitation of attack analysis: analyzing a series of phishing attacks that lasted for more than one year.
  • Slide 6
  • Outline: Introduction; Related Work; Architecture; Implementation Stage 1; Implementation Stage 2; Attack 1: Identifying Vulnerable Web Sites; Attack 2: Forum Spamming; Attack 3: Windows Live Messenger Phishing; Conclusion
  • Slide 7
  • Related Work: There is a significant amount of automated Web traffic on the Internet. Another study showed that more than 3% of all search traffic may be generated by stealthy search bots. What motivates these search bots? Search engine competitors; studying search quality; click fraud for monetary gain; spreading infection (MyDoom, Santy); identifying victims.
  • Slide 8
  • Related Work: Using regular-expression patterns: Honeycomb, Polygraph, Hamsa, and AutoRE (a method for generating regular expressions, from earlier work).
  • Slide 9
  • Outline: Introduction; Related Work; Architecture; Implementation Stage 1; Implementation Stage 2; Attack 1: Identifying Vulnerable Web Sites; Attack 2: Forum Spamming; Attack 3: Windows Live Messenger Phishing; Conclusion
  • Slide 10
  • Architecture: Let attackers be our guides. Follow their activities and predict their future attacks.
  • Slide 11
  • Architecture: Platform: Dryad/DryadLINQ. Query expansion: take a small set of seed queries and expand them; extract the IPs that issued them and search the logs again. Regular expression generation: signature generation with AutoRE. Eliminating redundancies. Eliminating proxies.
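The query-expansion step described above (take the seeds, extract the IPs that issued them, search again) fits in a few lines of Python. This is a minimal in-memory sketch, not the Dryad/DryadLINQ implementation. It assumes the log is a list of hypothetical `(ip, query)` pairs, and the name `expand_queries` is illustrative.

```python
from collections import defaultdict

def expand_queries(log, seed_queries):
    """One round of query expansion (illustrative sketch).

    log: iterable of (ip, query) pairs -- a hypothetical, simplified log format.
    seed_queries: set of known malicious query strings.
    Returns (suspect_ips, expanded_queries).
    """
    queries_by_ip = defaultdict(set)
    for ip, query in log:
        queries_by_ip[ip].add(query)

    # Step 1: IPs that issued at least one seed query.
    suspect_ips = {ip for ip, qs in queries_by_ip.items() if qs & seed_queries}

    # Step 2: "search again" -- collect every query those IPs issued.
    expanded = set()
    for ip in suspect_ips:
        expanded |= queries_by_ip[ip]
    return suspect_ips, expanded
```

On real logs this would be a distributed join over billions of records. The set-based version only shows the logic.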
  • Slide 12
  • Architecture: Eliminating Redundancies. Algorithm REGEX_CONSOLIDATE
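The slide names the algorithm REGEX_CONSOLIDATE, but the transcript does not show its steps. The sketch below is a hypothetical illustration only, and `consolidate_regexes` is an invented name. It uses one simple notion of redundancy: a regex is redundant when the queries it matches are a subset of another kept regex's matches. It should not be read as the paper's algorithm.

```python
import re

def consolidate_regexes(patterns, queries):
    """Drop regexes whose matched-query set is contained in a kept regex's set.

    Hypothetical redundancy criterion, chosen for illustration.
    """
    matched = {p: {q for q in queries if re.fullmatch(p, q)} for p in patterns}
    # Visit broader patterns first so they absorb narrower ones
    # (sorted() stays stable for ties even with reverse=True).
    ordered = sorted(patterns, key=lambda p: len(matched[p]), reverse=True)
    kept = []
    for p in ordered:
        if not any(matched[p] <= matched[k] for k in kept):
            kept.append(p)
    return kept
```

Fewer, broader regexes make the later manual analysis of attacks easier, at the cost of possibly keeping a pattern that is looser than necessary.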
  • Slide 13
  • Architecture: Eliminating Proxies. Most users in a geographical region have similar query patterns, so an IP that mostly carries legitimate users' queries (such as a proxy) will have a large overlap with the popular queries from the same /16 IP prefix. We label an IP as a proxy if the K most popular queries from that IP and the K most popular queries from that prefix overlap in at least m queries, with K = 100 and m = 5.
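The proxy rule above can be sketched directly with Python's `ipaddress` module. The sketch rests on a few assumptions:

- The log is a list of hypothetical `(ip, query)` pairs.
- The prefix's popular queries are computed over the *other* IPs in the /16. The slide does not say whether the IP's own queries are included.
- "Overlap in m queries" is read as "at least m".

K and m default to the slide's values.

```python
import ipaddress
from collections import Counter

def top_k(queries, k):
    """The k most frequent distinct queries."""
    return {q for q, _ in Counter(queries).most_common(k)}

def is_proxy(ip, log, k=100, m=5):
    """Label `ip` a proxy if its top-k queries and the top-k queries of the
    other IPs in its /16 prefix share at least m queries.

    log: list of (ip, query) pairs (hypothetical format).
    """
    prefix = ipaddress.ip_network(f"{ip}/16", strict=False)
    own = [q for src, q in log if src == ip]
    # Assumption: prefix popularity is computed from the *other* IPs in the /16.
    neighbors = [q for src, q in log
                 if src != ip and ipaddress.ip_address(src) in prefix]
    return len(top_k(own, k) & top_k(neighbors, k)) >= m
```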
  • Slide 14
  • Outline: Introduction; Related Work; Architecture; Implementation Stage 1; Implementation Stage 2; Attack 1: Identifying Vulnerable Web Sites; Attack 2: Forum Spamming; Attack 3: Windows Live Messenger Phishing; Conclusion
  • Slide 15
  • Data Description and System Setup: We use 3 months of sampled search logs from the Bing search engine: February 2009 (when it was known as Live Search), December 2009, and January 2010. Each month of sampled data contains around 2 billion pageviews. The 500 seed malicious queries are obtained from the hacker Web site milw0rm.com. Processing the 1.2 TB of sampled data takes about 7 hours.
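A rough back-of-the-envelope check of the processing rate follows. It assumes decimal units (1 TB = 10^12 bytes) and that the 7 hours cover the whole 1.2 TB.

```latex
\frac{1.2\ \text{TB}}{7\ \text{h}} \approx 0.17\ \text{TB/h},
\qquad
\frac{1.2 \times 10^{12}\ \text{B}}{7 \times 3600\ \text{s}} \approx 4.8 \times 10^{7}\ \text{B/s} \approx 48\ \text{MB/s}
```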
  • Slide 16
  • Selection of REs: Cookies are used to identify malicious queries. Benign proxies are eliminated. Regular expressions are picked by applying a threshold to their scores.
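The slide does not define the regex score. Purely for illustration, the sketch below takes a regex's score to be the fraction of its matching queries that were sent without a cookie (the cookie signal used on these slides). It computes the score after dropping queries from known proxy IPs. `LogRecord`, `regex_score`, `select_regexes`, and the 0.9 threshold are all hypothetical.

```python
import re
from dataclasses import dataclass

@dataclass
class LogRecord:
    ip: str
    query: str
    cookie: str  # empty string when the request carried no cookie

def regex_score(pattern, records):
    """Hypothetical score: fraction of matching queries sent without cookies."""
    hits = [r for r in records if re.fullmatch(pattern, r.query)]
    if not hits:
        return 0.0
    return sum(1 for r in hits if not r.cookie) / len(hits)

def select_regexes(patterns, records, proxy_ips, threshold=0.9):
    """Keep regexes whose score clears `threshold`, ignoring proxy IPs."""
    non_proxy = [r for r in records if r.ip not in proxy_ips]
    return [p for p in patterns if regex_score(p, non_proxy) >= threshold]
```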
  • Slide 17
  • Detection Results: Effect of Query Expansion and Regular Expression Matching. Feeding the 500 malicious seed queries into SearchAudit, we find that 122 of them appear in the February 2009 dataset, issued by 174 IPs. Feeding this result back into the system yields 800 unique queries from 264 IPs.
  • Slide 18
  • Detection Results
  • Slide 19
  • Effect of Incomplete Seeds: Split the 122 seed queries into two sets: 100 queries that were first posted on milw0rm.com before 2009, and 22 queries that were posted in 2009.
  • Slide 20
  • Looping Back Seed Queries: Use the derived REs as new seeds and feed them back into SearchAudit as input.
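Looping back turns expansion into an iteration that runs until no new queries appear. The sketch makes these assumptions:

- `derive_patterns` is a hypothetical stand-in for AutoRE-style regex generation: any function from a set of queries to regex strings.
- The log is again a list of `(ip, query)` pairs.
- `max_rounds` is an arbitrary safety cap.

```python
import re

def loop_back(log, seed_patterns, derive_patterns, max_rounds=5):
    """Iterate: match patterns -> collect IPs -> collect their queries ->
    derive new patterns, until the matched-query set stops growing."""
    patterns = set(seed_patterns)
    matched = set()
    for _ in range(max_rounds):
        hits = {(ip, q) for ip, q in log
                if any(re.fullmatch(p, q) for p in patterns)}
        ips = {ip for ip, _ in hits}
        expanded = {q for ip, q in log if ip in ips}
        if expanded <= matched:
            break  # fixed point: no new queries found
        matched |= expanded
        patterns |= set(derive_patterns(matched))
    return patterns, matched
```

With a trivial `derive_patterns` that just escapes each query, the loop still spreads from seeds to IPs that share queries with earlier suspects. A real generator generalizes further and would also pull in queries that IPs never shared.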
  • Slide 21
  • Overall Matching Statistics
  • Slide 22
  • Verification of Malicious Queries: Since we lack ground-truth information about whether a query is malicious, we check (1) whether the query is reported on any hacker Web sites, and (2) whether the query's behavior matches individual-bot or botnet features. For each query q returned by SearchAudit, we issue the query q AND (dork OR vulnerability) to the search engine and save the results.
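The first verification check has two parts: build the `q AND (dork OR vulnerability)` query, then scan the returned result URLs for known hacker sites. The sketch omits the search call itself. `reported_on_hacker_sites` and the domain list passed to it are hypothetical; milw0rm.com is used only because it is the seed source named on these slides.

```python
from urllib.parse import urlparse

def verification_query(q: str) -> str:
    """Build the verification query described on the slide."""
    return f"{q} AND (dork OR vulnerability)"

def reported_on_hacker_sites(result_urls, hacker_domains):
    """True if any result URL is hosted on (a subdomain of) a listed domain."""
    for url in result_urls:
        host = urlparse(url).hostname or ""  # .hostname is already lower-cased
        if any(host == d or host.endswith("." + d) for d in hacker_domains):
            return True
    return False
```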
  • Slide 23
  • Verification of Queries Generated by Individual Bots: Two features help us distinguish bot queries from human queries. Cookie: most bot queries do not enable cookies, resulting in an empty cookie field, whereas for normal users who do not clear their cookies, all queries carry the old cookies. Link clicked: many bots do not click any link on the result page; instead, they scrape the results off the page.
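The two individual-bot features can be combined into a simple per-query test. `QueryEvent`, its field names, and the 90% threshold are assumptions for illustration. The slides only state that bot queries tend to have an empty cookie field and no clicked links.

```python
from dataclasses import dataclass

@dataclass
class QueryEvent:
    cookie: str         # empty string when the request carried no cookie
    clicked_links: int  # result links the client clicked for this search

def looks_like_individual_bot(events, min_fraction=0.9):
    """Flag a query as bot-like if at least `min_fraction` of its events
    have an empty cookie field AND no clicked result links."""
    if not events:
        return False
    botlike = sum(1 for e in events if not e.cookie and e.clicked_links == 0)
    return botlike / len(events) >= min_fraction
```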
  • Slide 24
  • Verification of Queries Generated by Individual Bots
  • Slide 25
  • Verification of Queries Generated by Botnets: If most of the IPs that issued malicious queries exhibit similar behavior, then it is likely that all these IPs were running the same script. Features: User agent: information about the browser and version used. Metadata: certain metadata that comes with the request. Pages per query: the number of search result pages retrieved per query. Inter-query interval: the time between queries issued by the same IP.
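A sketch of the botnet check: build one profile per IP from the four listed features, then ask whether most IPs share the same value for each feature. Several choices here are hypothetical, not values from the slides:

- the record layout (`user_agent`, `metadata`, `pages`, and `ts` keys);
- bucketing the median inter-query interval to 10 seconds;
- the 0.8 threshold.

```python
from collections import Counter
from statistics import median

def inter_query_intervals(timestamps):
    """Gaps (in seconds) between consecutive queries from one IP."""
    ts = sorted(timestamps)
    return [b - a for a, b in zip(ts, ts[1:])]

def most_common(values):
    return Counter(values).most_common(1)[0][0]

def ip_profile(requests):
    """Summarize one IP's requests (list of dicts, hypothetical layout)."""
    gaps = inter_query_intervals([r["ts"] for r in requests])
    return {
        "user_agent": most_common(r["user_agent"] for r in requests),
        "metadata": most_common(r["metadata"] for r in requests),
        "pages_per_query": round(sum(r["pages"] for r in requests) / len(requests)),
        # Bucket the median gap to 10 s so near-identical script timers compare equal.
        "interval": round(median(gaps) / 10) if gaps else None,
    }

def dominant_share(values):
    """Fraction of values equal to the most common one (1.0 = all identical)."""
    return Counter(values).most_common(1)[0][1] / len(values) if values else 0.0

def looks_like_botnet(profiles, threshold=0.8):
    """True if, for every feature, at least `threshold` of the IPs share one value."""
    features = ("user_agent", "metadata", "pages_per_query", "interval")
    return all(
        dominant_share([p[f] for p in profiles.values()]) >= threshold
        for f in features
    )
```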
  • Slide 26
  • Verification of Queries Generated by Botnets
  • Slide 27
  • Verification of Queries Generated by Botnets
  • Slide 28
  • Outline: Introduction; Related Work; Architecture; Implementation Stage 1; Implementation Stage 2; Attack 1: Identifying Vulnerable Web Sites; Attack 2: Forum Spamming; Attack 3: Windows Live Messenger Phishing; Conclusion
  • Slide 29
  • Analysis of Detection Results: Large countries such as the USA, Russia, and China are responsible for almost half of the IPs issuing malicious queries. Vulnerable Web sites: attackers try to exploit these Web sites by SQL injection (e.g., index.php?content=[?=#+;&:]{1,10}) and try to find particular software with known vulnerabilities (e.g., "Powered by" queries). Forum spamming (e.g., /includes/joomla.php site:.[a-zA-Z]{2,3}). Windows Live Messenger phishing.
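The two example patterns on this slide can be tried with Python's `re` module. One assumption: the literal parts (such as `.` and `?` in `index.php?content=`) are meant as literal characters, so they are escaped with `re.escape`. Read as raw regex syntax, `php?` would instead mean "ph" followed by an optional "p". The `classify` function and its labels are illustrative.

```python
import re

# Literal prefixes are escaped on the assumption that "." and "?" are meant literally.
SQLI_PROBE = re.compile(re.escape("index.php?content=") + r"[?=#+;&:]{1,10}")
JOOMLA_PROBE = re.compile(re.escape("/includes/joomla.php site:.") + r"[a-zA-Z]{2,3}")

def classify(query):
    """Return which example pattern a query fully matches, if any."""
    if SQLI_PROBE.fullmatch(query):
        return "sqli"
    if JOOMLA_PROBE.fullmatch(query):
        return "joomla"
    return None
```

Using `fullmatch` keeps the `{1,10}` and `{2,3}` bounds meaningful. With `search`, a longer suffix would still match its first few characters.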
  • Slide 30
  • Analysis of Detection Results
  • Slide 31
  • Outline: Introduction; Related Work; Architecture; Implementation Stage 1; Implementation Stage 2; Attack 1: Identifying Vulnerable Web Sites; Attack 2: Forum Spamming; Attack 3: Windows Live Messenger Phishing; Conclusion
  • Slide 32
  • Identifying Vulnerable Web Sites: Applications of Vulnerability Searches. Sample 5,000 queries returned by SearchAudit. For every query q, issue the query q dork vulnerability. This yields 80,490 URLs from 39,475 unique Web sites. Compare this list of random Web sites against lists of known phishing or malware sites (PhishTank, Microsoft). Test and show that many of these sites indeed have SQL injection vulnerabilities.
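The steps above reduce result URLs to unique sites and then check those sites against a blacklist; a sketch follows. It assumes the blacklist is a set of hostnames; the actual PhishTank and Microsoft feed formats are not described here. The function names are hypothetical.

```python
from urllib.parse import urlparse

def unique_sites(urls):
    """Reduce result URLs to their distinct hostnames (.hostname is lower-cased)."""
    return {h for h in (urlparse(u).hostname for u in urls) if h}

def blacklisted_fraction(urls, blacklist):
    """Fraction of distinct sites that appear in a phishing/malware blacklist."""
    sites = unique_sites(urls)
    if not sites:
        return 0.0
    return len(sites & blacklist) / len(sites)
```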
  • Slide 33
  • Identifying Vulnerable Web Sites
  • Slide 34
  • SQL Injection Vulnerabilities: For the malicious queries, we look at the search results and crawl each link twice: the first time as is, and the second time with a single quote (') appended. If the two pages are identical, there is no obvious SQL injection vulnerability. If the second page contains any kind of SQL error, an SQL injection vulnerability may exist. Of 14,500 URLs, 1,760 (12%) may have an SQL injection vulnerability.
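The crawl-twice test can be sketched without network code by passing in a `fetch` function. The SQL error strings below are a small, hypothetical sample of common database error messages. The three result labels are illustrative; the slides do not say how SQL errors were detected.

```python
import re

# Hypothetical sample of database error strings; a real crawler would need more.
SQL_ERROR_PATTERNS = re.compile(
    r"SQL syntax|mysql_fetch|ODBC|ORA-\d{5}|SQLSTATE|unclosed quotation mark",
    re.IGNORECASE,
)

def sqli_check(url, fetch):
    """Crawl `url` twice, the second time with a single quote appended.

    fetch: caller-supplied function url -> page text (no network code here).
    """
    original = fetch(url)
    quoted = fetch(url + "'")
    if original == quoted:
        return "no-obvious-vuln"
    if SQL_ERROR_PATTERNS.search(quoted):
        return "possible-sqli"
    return "inconclusive"
```

For the reported numbers, 1,760 / 14,500 ≈ 0.121, i.e. about 12%.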
  • Slide 35
  • Outline: Introduction; Related Work; Architecture; Implementation Stage 1; Implementation Stage 2; Attack 1: Identifying Vulnerable Web Sites; Attack 2: Forum Spamming; Attack 3: Windows Live Messenger Phishing; Conclusion
  • Slide 36
  • Forum-Spamming Attacks: We manually identified 46 REs that are associated with forum spamming.
  • Slide 38
  • Forum-Spamming Attacks
  • Slide 39
  • Applications of Forum-Searching Queries: Using Project Honey Pot to identify Web spamming.
  • Slide 40
  • Outline: Introduction; Related Work; Architecture; Implementation Stage 1; Implementation Stage 2; Attack 1: Identifying Vulnerable Web Sites; Attack 2: Forum Spamming; Attack 3: Windows Live Messenger Phishing; Conclusion
  • Slide 41
  • Windows Live MSN Phishing: What is MSN phishing? http://[a-zA-Z0-9._]*. / http:// ?user=[a-zA-Z0-9._]*
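The slide's pattern is partly garbled in this transcript, but it ends in `?user=[a-zA-Z0-9._]*`. That suggests phishing URLs that carry an account name in a `user` parameter. The sketch below extracts such a value with the standard `urllib.parse` module and checks it against the slide's character class. `phishing_target` and the `example.test` URLs are hypothetical.

```python
import re
from urllib.parse import urlparse, parse_qs

USER_VALUE = re.compile(r"[a-zA-Z0-9._]*")  # character class taken from the slide

def phishing_target(url):
    """Return the `user` query parameter if it fits the slide's character
    class, else None."""
    values = parse_qs(urlparse(url).query).get("user")
    if values and USER_VALUE.fullmatch(values[0]):
        return values[0]
    return None
```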
  • Slide 42
  • Windows Live MSN Phishing
  • Slide 43
  • Characteristics of Compromised Accounts
  • Slide 44
  • Outline: Introduction; Related Work; Architecture; Implementation Stage 1; Implementation Stage 2; Attack 1: Identifying Vulnerable Web Sites; Attack 2: Forum Spamming; Attack 3: Windows Live Messenger Phishing; Conclusion
  • Slide 45
  • Conclusion