
Master Project Report

PhishLurk:

A Mechanism for Classifying and

Preventing Phishing Websites

By: Mohammed Alqahtani

1. Committee Members and Signatures: Approved by Date

__________________________________ _____________

Advisor: Dr. Edward Chow

__________________________________ _____________

Committee member: Dr. Albert Glock

__________________________________ _____________

Committee member: Dr. Chuan Yue


Abstract

Phishing attackers keep refining their attempts, using a variety of methods to target users. At the same time, users access the internet in a variety of ways, on different platforms with different computational capabilities and varying levels of protection support, which expands the attack surface for phishing and complicates the provisioning of security protection.

I propose PhishLurk, an anti-phishing search website that classifies and prevents phishing attacks. PhishLurk provides the protection on the server side and uses a coloring scheme and warnings for classification in order to consume as little computation and screen space as possible on the client side. It can work efficiently with a variety of devices having different capabilities. PhishLurk uses PhishTank as the blacklist provider and checks the list in real time to achieve the maximum possible accuracy. The idea behind PhishLurk could be a useful enhancement if adopted by major search engines, e.g., Google and Yahoo. In addition, the mechanism can be optimized to work efficiently on smartphones.


1. Introduction

Phishing is a cybercrime in which an attacker tries to gather personal and financial information, such as usernames, passwords, and credit card numbers, from recipients by pretending to be a legitimate website. Most phishing attacks come in two types: emails and webpages that spoof or lure the user into entering sensitive information. In other words, phishing directs users to fraudulent websites in order to obtain sensitive information, which can be confidential information or financial data [22]. Figure 1 shows a sample phishing website. Phishers used to rely on emails to lure targets into giving away information. Lately, phishers have started to use different methods to lure and steal targeted users’ information, such as fake websites, trojans, key-loggers, and screen captures [23].

Figure 1: Sample of a phishing website (source: www.phishtank.com)

1.1 Impact of phishing

Phishing has been a major concern in IT security. In the U.S., companies lose more than $2 billion every year as a result of phishing attacks [6]. About 1.2 million users in the U.S. were phished between May 2004 and May 2005, at an approximate cost of $929 million [6]. AOL-UK announced that one out of twenty users has lost money to phishing attacks [25]. A 2010 survey indicates that cybercrime losses from stolen confidential banking information or corporate data range from half a billion dollars to $1 trillion every year [25].

2. Background

Recently, users have gained a wider variety of ways to access the internet, for example notebooks, PCs, game consoles, handhelds, and smartphones. However, using a wider variety of devices with different capabilities and features makes it complicated to provide full protection, especially from phishing attacks. Currently there is no perfect protection. Smartphones are among the most used devices. According to a comScore, Inc. survey, the number of smartphone subscribers increased 60 percent in 2010 compared to 2009 [4]. Another report by the Nielsen Company indicates that by 2011 half of cell phone users would be using smartphones [5]. Users prefer these types of access for their activities and tasks because of the advantages they provide; smartphones in particular are preferred for their ease of use, flexibility, and mobility. Some activities, such as online banking, paying bills, online shopping, emailing, and social networking [5], require users to enter sensitive information, such as credit card numbers, passwords, and usernames, to complete the authentication and authorization process. In fact, having a variety of ways to access the internet expands the attack surface for phishing and complicates protection.

2.1 History of Phishing

The idea of luring people into giving away their sensitive information started with phone calls. Phishers combined two techniques: making phone calls (“phreaking”) and luring the target (“fishing”). In the mid-1990s, the main target of phishing attackers was America Online (AOL). Phishers kept sending instant messages to users, using social engineering and similar domain names such as www.ao1.com, to lure users into revealing their passwords, and then used the victims’ accounts for free. Later, attackers started seeking more details, such as credit card numbers and social security numbers. During the past ten years, phishing attackers have started attacking at a higher level, directly targeting users of financial services and online payment services such as e-buyers, PayPal, eBay, and banks. In addition to the previous techniques, attackers have used more advanced techniques such as key-logging, browser vulnerabilities, and link obfuscation [27].

2.2 Most Targeted Industries

As a result of their dense confidential content and financial use, financial services and online payment are the industries most targeted by phishing attackers [22]. Figure 2 shows the distribution of phishing activity by targeted area.

Figure 2. Phishing Activity Trends Report - 2nd Half 2010 - Anti-Phishing Working Group (APWG)


2.3 Why Phishing Works

Phishing works for many reasons. One of the most common is users’ carelessness and lack of knowledge about how to tell whether a website is legitimate or phishing [1]. Moreover, phishing attackers work hard, sending millions of messages and attempts, looking for vulnerabilities, and seeking sensitive information.

2.4 Existing Anti-Phishing Work

Many anti-phishing techniques have been proposed, using different methods of filtering and detection, such as blacklists, plug-ins, extensions, and toolbars for browsers [2]. Browser developers try hard to provide solid protection, such as warning the user with a message box if the website is a potential phishing website or presents an invalid or expired SSL certificate. Often a third party and blacklists are involved in identifying and displaying phishing websites [3].

3. Related Work

PhishTank is a nonprofit project that aims to build a dependable database of phishing URLs [7]. The project collects, verifies, tracks, and shares phishing data. To report a phishing link, a user has to be registered as a member, so that the administrators can learn and judge each member's contribution. Phishing websites can be reported and submitted via email or via PhishTank’s website. The data are verified by a committee after they are submitted by the members. PhishTank’s database can be shared via an API. The links in the original database are only classified as “phishing” and “unknown”. We propose to classify phishing links based on the PhishTank database with a more precise modification. PhishTank has been working effectively to fight phishing attacks: thousands of phishing links are detected and verified as valid phishing sites monthly [9]. It uses the public’s effort and contribution to build a trustworthy and dependable database that is open for everyone to use and share. As a result, several well-known organizations and browsers have started using the PhishTank database, such as Yahoo Mail, Opera, McAfee, and Mozilla Firefox [10]. In my prototype, I use PhishTank as the phishing blacklist provider.

In the paper titled “Large-Scale Automatic Classification of Phishing Pages” [2], Colin Whittaker, Brian Ryner, and Marria Nazif proposed an automatic classifier to detect phishing websites. The classifier maintains Google’s phishing blacklist automatically and analyzes millions of pages a day, examining the URL and the contents to verify whether a page is phishing or not. The paper’s classifier works automatically within a large-scale system, maintaining a false positive rate below 0.1% and reducing the lifetime of phishing pages. The authors used machine learning techniques to analyze web page content. In my project, the determination is based on PhishTank’s blacklist. My goal is not to determine whether a page is phishing, but to provide a new method for classifying phishing links that takes two factors into account: consuming as little memory and as little screen space as possible, which eventually improves the overall classification efficiency.

In the paper titled “PhishGuard: A Browser Plug-in for Protection from Phishing” [8], Joshi, Y. Saklikar, S. Das, D. Saha proposed a mechanism that detects a forged website by submitting fake credentials before the actual credentials during the login process; the server side then analyzes the responses to all of those submissions to determine whether the website is phishing or not. The mechanism was implemented on the browser side (the user side) as a Mozilla Firefox plug-in. However, the mechanism only detects phishing during a user’s login process. If another user logs in to the same phishing website, that user goes through the same detection process. In my project, once a website is reported as a phishing site, the reported link is blocked and no other user can access the reported website.

In the paper titled “BogusBiter: A Transparent Protection Against Phishing Attacks” [17], Chuan Yue and Haining Wang proposed a client-side tool called BogusBiter that sends a large number of bogus credentials to suspected phishing sites and hides the real credentials from phishers. BogusBiter is unique in that it helps legitimate websites detect stolen credentials in a timely manner by forcing the phisher to verify the collected credentials at the legitimate website. BogusBiter was implemented as a Firefox 2 extension. My project is different since it uses the server side to provide the protection.

In the paper titled “The Battle Against Phishing: Dynamic Security Skins” [18], Rachna Dhamija and J. D. Tygar proposed an anti-phishing tool that helps users distinguish whether they are interacting with a trusted site or not [1]. This approach uses a shared cryptographic image that remote web servers use to prove their identities to users, in a way that is easy for humans to verify and hard for attackers to spoof. It cannot provide protection for users on public access machines because the approach requires support from both the client side and the server side. In my project there is no dependency on the client side.

3.1 Blacklisting

Blacklisting is the idea of denying access to resources based on a list. A blacklist is built either automatically by a mechanism, e.g., Google’s blacklist [2], or from users’ feedback, as is the case with PhishTank [7], where users submit and report suspicious websites. The object of a blacklist can be a user, an IP address, a website, or software.

We can classify the varieties of blacklists as follows:

Content filter: A proxy server that filters content. The proxy server not only blocks banned URLs using a blacklist but also uses keywords, metadata, and pictures to filter the content. Examples of content filters include DansGuardian [28] and SquidGuard [Refs]. In SquidGuard, the proxy uses advanced web filtering policies to block content that is inappropriate for the organization or company. The filter blocks URLs using a blacklist and controls content through keyword blocking inferred from the metadata and the page content. SquidGuard is used mostly in educational environments and for child protection. The main goal of a content filter is to make access control management faster and more efficient. In DansGuardian, the client requests URLs, and DansGuardian collects them and compares them against the blacklist and the whitelist. If the request is clean, DansGuardian passes the URL request along. If the URL is not clean, DansGuardian blocks it [28].

E-mail spam filter: It monitors, prevents, and blocks spam and phishing emails using a blacklist of spam email sources, preventing them from reaching the client side. There are many anti-spam email blacklists, e.g., GFI MailEssentials’s list, ATL Abuse Block List, Blacklist Master, Composite Blocking List (CBL), and SpamCop.

Many web browsers and companies use their own blacklists against spam and phishing, e.g., IE, Google, and Norton.

3.2 Current Browser’s Phishing Protection

Most popular browsers provide a phishing filter that warns users about malicious websites, including phishing websites. Filters mainly depend on certain lists to detect malicious websites. IE 7 used the “Phishing Filter”, which was improved into the SmartScreen Filter in later versions of IE because of the weak protection the Phishing Filter provided [15]. In IE 8 and IE 9, the SmartScreen Filter verifies visited websites against a list of malicious websites that Microsoft created and updates continuously [11] [12]. Similar to IE, the Safari browser has filters that check websites against a list of phishing sites while the user is browsing. After PayPal warned its members that Safari was not safe for its service [13], Safari started to use Extended Validation certificates to support analyzing websites [14]. Earlier versions of Firefox took advantage of anti-phishing providers such as GeoTrust or PhishTank, using their lists to help identify malicious websites. The current version of Firefox has adopted Google's anti-phishing program to support its phishing protection.

Many research projects have proposed mechanisms implemented as browser plug-ins or toolbars against phishing attacks. The main problem with plug-ins and toolbars is the need for users’ cooperation. Users may not cooperate and install the tool. Some users occasionally prefer to turn their filter off to browse faster [16]. On some devices, plug-ins and toolbars may not be as effective as in a desktop browser because of limits on performance and screen space, as is the case with smartphones.

3.3 Classification of Phishing Defense

The different phishing defense approaches can be further classified based on where the alerts are generated:

• Browsers themselves: IE 9, Firefox 5.

• Browser extensions or plug-ins: BogusBiter, PhishGuard.

• Anti-phishing search site: PhishLurk (my project).

• Proxy server: DansGuardian [20].

• Anti-phishing server: OpenDNS [19], GFI MailEssentials [21]; some browser extensions, such as Skins [18], partially use the server side.

According to the official website [20], DansGuardian is an active web content filter that filters websites based on a number of criteria, including the website URL, words and phrases included in the page, file type, MIME type, and more. DansGuardian is configured as a proxy server that controls, filters, and monitors all content. It therefore does more than anti-phishing. No existing project uses a proxy server specifically for anti-phishing, but it could be an effective technique to classify and prevent phishing websites.

4. The Proposed Project

In this project we propose to create a software tool, called PhishLurk, that aims to classify and block phishing links. PhishLurk uses PhishTank as the provider of the blacklist. PhishLurk indicates the risk to users while consuming as few computation and screen resources as possible, using a coloring scheme and warning annotations. The process is done entirely on the search server side, which delivers classified and protected links to the users. Even if phishing protection is disabled or uninstalled on the client side, PhishLurk still provides protected and classified links to the user. Figure 3 explains PhishLurk’s scenario against phishing sites. In addition, PhishLurk has a database that records how many times each website has been visited.


Figure 3: Diagram of PhishLurk’s scenario against phishing sites

5. Design of PhishLurk

5.1 PhishLurk Components

Classifier: assesses and classifies the links based on PhishTank’s blacklist.

Logger: records how many times each link has been visited.

Blacklist: a periodically updated blacklist plus live checking using the API.

Database: stores every visited link, the number of visits for each link, and the link’s class.

Figure 4 shows the design of PhishLurk.


Figure 4: Diagram of the design of PhishLurk

Classifier: PhishLurk’s mechanism assesses and classifies the links based on PhishTank’s blacklist, as follows:

Phishing link (Red): A verified phishing link. The user is strongly warned not to access the link. So even if the user is unaware or surfing carelessly, as we saw in the survey [1], the user goes through several warning indicators.

Unknown link (Orange): A suspicious link that might be a phishing link. It could be a link that uses the name, or part of the name, of a real company and asks the user to provide sensitive information. The link has been submitted as a phishing link but has not been verified yet. If the user clicks through to this type of site, it is at their own risk. The user is warned before accessing the link.

Safe link (Blue): A safe link that is not listed as phishing. The user can access the link without triggering warning messages.

Figure 5 shows the categories of links that PhishLurk classifies.


Type | Description | Color | Treatment
Phishing link | A valid phishing link, high risk. | Red | Users are strongly warned not to access the link.
Unknown link | Suspicious link, potentially phishing but not verified yet. | Orange | Users are warned about the potential impact.
Safe link | Link that is not blacklisted. | Blue | Users can access the link without triggering warning messages.

Figure 5: Table showing the categories of links PhishLurk classifies
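The categories in Figure 5 can be held in a small lookup table on the server. The following is a minimal sketch, not the prototype's code: the class numbers (0 safe, 1 phishing, 2 unknown) follow Section 6.3, while the constant name, the array keys, and the `describeClass` function are hypothetical.

```php
<?php
// Hypothetical lookup table mirroring Figure 5; keys are illustrative only.
const LINK_CLASSES = [
    0 => ['type' => 'Safe link',     'color' => 'blue',   'warn' => false],
    1 => ['type' => 'Phishing link', 'color' => 'red',    'warn' => true],
    2 => ['type' => 'Unknown link',  'color' => 'orange', 'warn' => true],
];

function describeClass(int $class): array
{
    // Unlisted class numbers are rejected instead of being treated as safe.
    if (!array_key_exists($class, LINK_CLASSES)) {
        throw new InvalidArgumentException("Unknown class number: $class");
    }
    return LINK_CLASSES[$class];
}
```

Keeping the mapping in one place means the color and the warning decision cannot drift apart between the results page and the log page.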

Blacklist: PhishLurk utilizes PhishTank’s blacklist. In order to achieve the maximum possible accuracy, PhishLurk updates the blacklist using two different methods:

- Updating the blacklist periodically: downloading it every 24 hours.

- Live checking using the API.

Here, live checking refers to checking each individual URL with PhishTank. If a web page contains 10 URLs, 10 queries are issued to PhishTank. There is therefore a trade-off between the two approaches: the downloaded list avoids per-link queries but can be up to a day old, while live checking is current but costs one query per link.

Logger: PhishLurk has a logger that records the number of visits for every link visited within the web application and stores the log data in PhishLurk’s database, including the URL, the number of visits, and the current class of the URL.

Database: A database that stores a record for every visited link, including the number of visits for each link and the link’s class. Users can access the database to view the table of all links that have been visited by PhishLurk’s users; the links are also colored based on their class on the displayed page.

6. Implementation

PhishLurk is programmed in PHP. PHP is widely used in server-side web programming and is deployed on many web servers. PHP is currently supported by most web servers, including Apache and Microsoft Internet Information Server. PHP works easily with HTML and provides the ability to interact with the user dynamically.

Given that PhishLurk’s mechanism aims to use as little space and computation as possible on the client side, PhishLurk uses CSS for classifying and indicating the risk level of the links, because CSS requires little computation. I created a database using MySQL to store the log records. I used two methods to read and update the blacklist from PhishTank: live checking and a periodically downloaded blacklist.

6.1 The Information flow

The information flow in PhishLurk starts with receiving the keyword query from the user. Next, the keywords are passed to the search engine to execute the query. Then, the PhishLurk classifier receives the query results and classifies them based on PhishTank’s blacklist. After the classification, PhishLurk creates log records for all the visited URLs and registers each visit. Finally, the requested URLs are delivered to the user’s browser. Figure 6 explains the information flow in PhishLurk.

Figure 6: Flowchart showing the information flow in PhishLurk.


PhishLurk needs a search engine to process the search queries. I used Google.com to process the queries; I explain why in Section 8. To send queries to Google, I used the following statements:

$gg_url = 'http://www.google.com/search?hl=en&q=' . urlencode($query) . '&start=';

$ch = curl_init($gg_url . $page . '0');   // page number followed by '0' gives the result offset
curl_setopt_array($ch, $options);
$scraped = "";
$scraped .= curl_exec($ch);               // raw HTML of the results page
curl_close($ch);
$results = array();
preg_match_all('/a href="([^"]+)" class=l.+?>.+?<\/a>/', $scraped, $results);

To receive the results back from Google and to show them, PhishLurk uses the following

statements:

$ch = curl_init($gg_url . $page . '0');
curl_setopt_array($ch, $options);
$scraped = "";
$scraped .= curl_exec($ch);
curl_close($ch);
$results = array();
// $results[1] holds the captured href values of the result links
preg_match_all('/a href="([^"]+)" class=l.+?>.+?<\/a>/', $scraped, $results);

For each link in the results page, metadata functions are used to show the website’s title and the description related to the URL.

$content = file_get_contents($url);
$title = getMetaTitle($content);
$description = getMetaDescription($content);
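The report does not show the bodies of `getMetaTitle` and `getMetaDescription`. The following is one possible implementation, offered as an assumption rather than as the prototype's code; it uses PHP's built-in DOMDocument parser, and the helper `loadHtml` is a hypothetical name.

```php
<?php
// Parse an HTML string leniently; loadHtml() is a hypothetical helper.
function loadHtml(string $content): DOMDocument
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // real-world HTML is rarely well-formed
    if ($content !== '') {
        $doc->loadHTML($content);
    }
    libxml_clear_errors();
    return $doc;
}

// Text of the first <title> element, or '' when there is none.
function getMetaTitle(string $content): string
{
    $titles = loadHtml($content)->getElementsByTagName('title');
    return $titles->length > 0 ? trim($titles->item(0)->textContent) : '';
}

// Content of <meta name="description">, or '' when there is none.
function getMetaDescription(string $content): string
{
    foreach (loadHtml($content)->getElementsByTagName('meta') as $meta) {
        if (strtolower($meta->getAttribute('name')) === 'description') {
            return trim($meta->getAttribute('content'));
        }
    }
    return '';
}
```

Returning an empty string for missing elements lets the results page fall back to showing only the URL.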

6.2 Blacklist

PhishLurk needs the blacklist to classify a link. To check against the blacklist I used two methods: an updated blacklist and live checking.

6.2.1 Updated Blacklist

PhishTank provides a downloadable database (the “blacklist”), updated hourly, to facilitate using PhishTank’s blacklist and phishing detection in an application. The PHP format of the blacklist is available at http://data.phishtank.com/data/online-valid.php_serialized. The blacklist file is big: its average size is between 13 and 17 MB, which takes time to process and slows performance during the update.

To improve the performance, I minimized the size of the blacklist by first changing its format from

Phish-id Phish_detail_url URL Submission_time Verified Verification_time Online Target

to

Phish-id URL Class

I removed the fields that I don’t use in my prototype.
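The format reduction can be sketched as a mapping over the blacklist records. This is a minimal sketch under stated assumptions: the array keys (`phish_id`, `url`, `verified`) are guessed from the field list above, the `'yes'` value for a verified entry is an assumption, and both function names are hypothetical.

```php
<?php
// Reduce one full blacklist record to the three fields the prototype keeps.
// Assumed input keys: phish_id, url, verified (taken from the field list).
function reduceRecord(array $record): array
{
    // Class 1 = verified phishing, class 2 = reported but not yet verified.
    $class = ($record['verified'] === 'yes') ? 1 : 2;
    return [$record['phish_id'], $record['url'], $class];
}

// Apply the reduction to every record of the downloaded list.
function reduceBlacklist(array $records): array
{
    return array_map('reduceRecord', $records);
}
```

Dropping the unused fields before any lookup keeps each record small.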

I created a function that reads the list from the file blist.txt. If the link is blacklisted, it is classified as “phishing”. If a link is reported as a potential phishing link but not yet verified, it is classified as “unknown”.

$class = 0;                                    // default: not blacklisted (safe)
$file_handle = fopen("blist.txt", "rb");
while (($line_of_text = fgets($file_handle)) !== false) {
    // Each line is assumed to be "Phish-id,URL,Class", the reduced format above
    $parts = explode(',', trim($line_of_text));
    if (count($parts) >= 3 && $url == $parts[1]) {
        $class = $parts[2];                    // 1 = phishing, 2 = unknown
        break;
    }
}
fclose($file_handle);

Because of parsing errors in processing the blacklist, I resorted to using Excel’s functions. The drawback is that the process becomes partially manual. The problem is solved by using live checking.
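One way to look links up in a reduced list without rescanning the file for every URL is to load it once into an associative array. The sketch below assumes comma-separated lines in Phish-id, URL, Class order; the function names are hypothetical, and splitting on the first and last comma is a choice made here so that URLs containing commas survive intact. It is not the prototype's code.

```php
<?php
// Build a URL => class map from lines in the reduced "Phish-id,URL,Class" format.
function buildBlacklistIndex(array $lines): array
{
    $index = [];
    foreach ($lines as $line) {
        $line = trim($line);
        $first = strpos($line, ',');   // end of the Phish-id field
        $last = strrpos($line, ',');   // start of the Class field
        if ($first === false || $first === $last) {
            continue;                  // skip malformed lines
        }
        $url = substr($line, $first + 1, $last - $first - 1);
        $index[$url] = (int) substr($line, $last + 1);
    }
    return $index;
}

// Class 0 (safe) is returned for any URL that is not in the index.
function lookupClass(array $index, string $url): int
{
    return $index[$url] ?? 0;
}
```

With the index built once per update, each link in a results page costs a single array lookup.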

6.2.2 Checking the URLs Live

I used the API to check URLs live against the blacklist. This method also works with HTTP POST requests, the same as PhishLurk uses, and responds with the URL's status in the database. I created a variable called $phishtank that holds the response PhishLurk gets from the PhishTank URL-checking API:

$phishtank = file_get_contents("http://checkurl.phishtank.com/checkurl/index.php?url=" . urlencode($url));

For example, the link “www.uccs.edu” has been received from the search results and is sent to PhishTank for a live check:

$phishtank=file_get_contents("http://checkurl.phishtank.com/checkurl/index.php?url=http://www.uccs.edu/");

The response is in XML format, as follows:

<response>
  <meta>
    <timestamp>2011-08-18T04:09:22+00:00</timestamp>
    <serverid>2d5c2cb</serverid>
    <requestid>192.168.0.109.4e4c90729dea26.99932296</requestid>
  </meta>
  <results>
    <url0>
      <url><![CDATA[ http://www.uccs.edu/ ]]></url>
      <in_database>false</in_database>
    </url0>
  </results>
</response>
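A response like the one above can be mapped to PhishLurk's class numbers with PHP's SimpleXML extension. This is a sketch under stated assumptions: only the `<in_database>` element appears in the sample response, so the `<verified>` element read for listed URLs is an assumption about the API, and treating an unparseable response as "unknown" is a choice made here, not the prototype's documented behavior. The function name is hypothetical.

```php
<?php
// Map a check-URL XML response to a class: 0 safe, 1 phishing, 2 unknown.
function classifyResponse(string $xml): int
{
    libxml_use_internal_errors(true);   // suppress warnings on malformed XML
    $doc = simplexml_load_string($xml);
    libxml_clear_errors();
    if ($doc === false) {
        return 2;                       // assumption: unreadable reply => unknown
    }
    $entry = $doc->results->url0;
    if ((string) $entry->in_database !== 'true') {
        return 0;                       // not in the blacklist
    }
    // <verified> is an assumed element name for listed URLs.
    return ((string) $entry->verified === 'true') ? 1 : 2;
}
```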

6.3 Classification

After the check, the links go through the classification function. The process is explained below.

Phishing links:
if ($class == 1) {
    Show the note: "This web page has been reported as a phishing webpage based on our security preferences"
    Redirect the user to warning.php with class 1 and the URL.
    The color scheme colors the link red and prints a small tag next to the title (Phishing Link).
}

Unknown links:
elseif ($class == 2) {
    Show the note: "This web page might potentially be a phishing page"
    Redirect the user to warning.php with class 2 and the URL.
    The color scheme colors the link orange and prints a small tag next to the title (Unknown Link).
}

Safe links:
else {  // $class == 0
    Transfer the user directly to the logger “log.php” with class 0 and the URL, then through to the targeted URL.
}
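The routing described above can be sketched as a function that builds the anchor tag for one search result. This is an illustrative sketch, not the prototype's code: the function name is hypothetical, the `style0` to `style2` CSS class names follow the naming used in log.php, and safe links are routed to go.php as described in Section 6.5.1.

```php
<?php
// Build the link for one result: classes 1 and 2 pass through warning.php,
// class 0 goes straight to the visit recorder.
function buildResultLink(string $url, string $title, int $class): string
{
    $tags = [1 => ' (Phishing Link)', 2 => ' (Unknown Link)'];
    $target = ($class === 0) ? 'go.php' : 'warning.php';
    // urlencode keeps characters such as & and ? in the URL from breaking the query string.
    $href = $target . '?url=' . urlencode($url) . '&class=' . $class;
    return '<a class="style' . $class . '" href="' . htmlspecialchars($href) . '">'
        . htmlspecialchars($title) . '</a>' . ($tags[$class] ?? '');
}
```

Because the color comes from a CSS class and the tag is plain text, no image is added to the results page.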

6.4 Warning

I created a dynamic page, “warning.php”, for generating the warnings. Having one dynamic warning page for all classes is useful for controlling the writing of the log records. First, the warning page recognizes the link’s class using ($_GET['class'] == "class #"). Then it shows the warning for that class. The process is as follows:

if the class == 1 {  // phishing link
    Print “Phishing Site!”
    Display a warning note: “This web page is reported as a phishing website. We recommend you exit; otherwise, click on “Proceed”.”
    This URL has been visited: “visits number” by PhishLurk's users
} elseif the class == 2 {  // unknown link
    Print “Unknown page!”
    Display a warning note: “This web page might potentially be a phishing page. If you trust this page click “Proceed”; otherwise, exit.”
    This URL has been visited: “visits number” by PhishLurk's users
} else {
    die();
}

If the class number is not listed, or the user tries to use an unlisted class number, PhishLurk kills the request by using the PHP function die().
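The check on the class number can be isolated in a small function that accepts only the two values the warning page handles. This is a hedged sketch; the function name `parseWarningClass` is hypothetical.

```php
<?php
// Return the class number if it is one that the warning page handles (1 or 2),
// otherwise null. $query is expected to be an array such as $_GET.
function parseWarningClass(array $query): ?int
{
    $raw = $query['class'] ?? null;
    // Strict comparison rejects arrays, missing values, and strings like "1abc".
    return in_array($raw, ['1', '2'], true) ? (int) $raw : null;
}
```

In warning.php this could be used as `if (parseWarningClass($_GET) === null) { die(); }` before any output is produced.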


To show the user how many times the link has been visited by PhishLurk’s users, I created visited.php, which connects to the database and queries the link’s visits. The function is as follows:

<?php
$link = mysql_connect('server-name', 'root', 'password');
mysql_select_db('visits', $link);
// Escape the URL before placing it in the query
$safe_url = mysql_real_escape_string($url, $link);
$sql = "SELECT * FROM `visits` WHERE `link` = '$safe_url'"; // looking for the link
$result = mysql_query($sql);
if (mysql_num_rows($result) == 1) { // if the link exists, it will have one record
    $line = mysql_fetch_array($result);
    // leave a message for the user
    echo "<br>This URL has been visited: $line[2] <td>by PhishLurk's users";
} else {
    // zero visits if there is no record
    echo "<br>This URL has been visited: 0 <td>by PhishLurk's users";
}
?>

6.5 Logger

The logger’s function is to count the visits to each URL and display the log records from the database.

6.5.1 Creating and updating the records

After the user decides to access a website, the browser is directed to go.php, either via the warning page “warning.php” by clicking on “Proceed”, or directly from the results page if the link is safe.

href="go.php?url=<?php echo urlencode($url); ?>&class=<?php echo $class; ?>">

Next, go.php receives the URL and its class and records the new visit: it connects to the database and looks for the URL. If the URL has a record, go.php increases the number of visits; otherwise, the URL is newly visited and go.php creates a new record.


<?php
$url = $_GET['url'];
$class = (int) $_GET['class'];
// Connecting to the DB
$link = mysql_connect('server-name', 'root', 'password');
mysql_select_db('visits', $link);
// Performing SQL query about the visited URL in the DB
$safe_url = mysql_real_escape_string($url, $link);
$sql = "SELECT * FROM `visits` WHERE `link` = '$safe_url'";
$result = mysql_query($sql);
if (mysql_num_rows($result) == 1) { // if there is a record
    $line = mysql_fetch_array($result);
    $id = $line[0];
    $old_visits = $line[2];
    $new_visits = $old_visits + 1; // count one more visit
    $sql = "UPDATE `visits` SET `visits` = '$new_visits', `class` = '$class' WHERE `id` = '$id'";
    mysql_query($sql); // update the DB
} else {
    // if there is nothing, add a new record for the new URL
    $sql = "INSERT INTO `visits` VALUES (0, '$safe_url', 1, $class)";
    mysql_query($sql);
}
?>

Finally the browser is immediately redirected to the requested URL.

<META http-equiv="refresh" content="0;URL=<?php echo htmlspecialchars($url); ?>">
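The visit lookup in visited.php and the update in go.php can also be written with prepared statements, which keep the URL out of the SQL text. The sketch below is not the prototype's code: it swaps in PDO with an in-memory SQLite database so that it runs without a MySQL server, and the function names are hypothetical. The table mirrors the `visits` table of Section 6.6.

```php
<?php
// Demo database: in-memory SQLite stands in for the MySQL "visits" database.
function openDemoDb(): PDO
{
    $db = new PDO('sqlite::memory:');
    $db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
    $db->exec('CREATE TABLE visits (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        link TEXT NOT NULL UNIQUE,
        visits INTEGER NOT NULL DEFAULT 0,
        class INTEGER NOT NULL DEFAULT 0)');
    return $db;
}

// Increase the counter for a known URL, or create its first record.
function recordVisit(PDO $db, string $url, int $class): void
{
    $update = $db->prepare('UPDATE visits SET visits = visits + 1, class = ? WHERE link = ?');
    $update->execute([$class, $url]);
    if ($update->rowCount() === 0) {
        $insert = $db->prepare('INSERT INTO visits (link, visits, class) VALUES (?, 1, ?)');
        $insert->execute([$url, $class]);
    }
}

// Number of recorded visits, or 0 when the URL has no record.
function countVisits(PDO $db, string $url): int
{
    $select = $db->prepare('SELECT visits FROM visits WHERE link = ?');
    $select->execute([$url]);
    $visits = $select->fetchColumn();
    return ($visits === false) ? 0 : (int) $visits;
}
```

Incrementing inside the UPDATE statement also avoids the read-then-write gap between fetching the old count and storing the new one.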

6.5.2 View the logs

The logger uses log.php to display all of the log records.

<?php
// reading the logs
if (mysql_num_rows($result) >= 1) {
    while ($line = mysql_fetch_array($result, MYSQL_NUM)) {
?>
<tr>
  <td class="style<?php echo $line[3]; ?>"><?php echo $line[1]; ?></td> <!-- 1st col = URL -->
  <td class="style<?php echo $line[3]; ?>"><?php echo $line[2]; ?></td> <!-- 2nd col = visits # -->
  <td class="style<?php echo $line[3]; ?>"><?php echo $line[3]; ?></td> <!-- 3rd col = class # -->
</tr>
<?php
    }
}
?>

Figure 7 shows how the logger presents the records to the user on the log.php page.


Link | Visits | Class
http://www.phishing-link.com/ | 5 | 1
http://www.unknown-link.com/ | 5 | 2
http://www.safe-link.com/ | 1 | 0

Figure 7: A sample of the log’s table at the database

6.6 Database

Since PhishLurk needs to update and write the log records, I created a MySQL database to make it easier to update the records. I called the database “visits”; it has one table, also called “visits”, which stores the URLs, the number of visits, and the class.

CREATE TABLE `visits` (
  `id` INT( 2 ) NOT NULL AUTO_INCREMENT PRIMARY KEY , -- URL-ID
  `link` VARCHAR( 300 ) NOT NULL , -- the URL
  `visits` INT( 2 ) NOT NULL DEFAULT '0' , -- number of visits
  `class` INT( 2 ) NOT NULL DEFAULT '0' -- class number
) ENGINE = MYISAM ;

7. Performance Evaluation

7.1 Challenges

Correctness: How correct are the results PhishLurk sends, and from how many varieties of access can a user benefit from PhishLurk?

Timeliness: How can the blacklist be kept up to date?

Overhead: How long does it take to get the classified results back? How big is the difference in execution time when using PhishLurk?

7.2 Test bed experiment

In the test bed experiments, I used a local Apache v2.2.1 server in a Windows environment. Apache supports both PHP and MySQL. The machine was an HP Pavilion dv6700 notebook PC with an Intel Core2 Duo 2.00 GHz CPU and 3 GB of RAM. I tested PhishLurk using five different browsers (IE 9, Chrome 5, Firefox 5, Opera, Safari 5.1).


7.3 Experiment

I created a test bed to examine the functionality of PhishLurk, including correctness and timeliness. To find out how accurate and up to date PhishLurk is, I tested it by sending queries that search for websites assumed to be blacklisted, using the most common keywords in phishing websites. In my searches, 20 blacklisted phishing URLs and 13 unknown websites appeared in the search results, and PhishLurk was able to detect and classify all of them. Figure 8 is a chart showing how many of the assumed-blacklisted links PhishLurk was able to detect and classify.

[Bar chart: blacklisted URLs versus detected links for the phishing and unknown categories; vertical axis from 0 to 25]

Figure 8: PhishLurk was able to detect and classify all of them.

After I changed the updating process to the live check, there was a slight increase in execution time. On average, the difference between PhishLurk with the downloaded blacklist and PhishLurk with live checking is 0.1238 seconds per link. Figure 9 shows the average execution time for each link. Figure 10 shows the difference in execution time across 20 queries. Because of the comprehensive protection of Google’s blacklisting, we did not find sites from the PhishTank blacklist in the search results. As a result, a simulated experiment was conducted by changing the blacklist in PhishLurk to verify the performance of PhishLurk. Section 8 explains the reason for using Google as the search engine.

Figure 9: The average execution time for each link

Figure 10: The difference in execution time for 20 queries
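A per-link overhead of this kind can be measured with a small timing harness. The Python sketch below shows one way to do it, under the assumption that each checking method is available as a function that takes a single link. The names average_seconds_per_link and live_check_overhead are hypothetical, and the sketch is not the measurement code used for the figures above.

```python
import time

def average_seconds_per_link(check, links):
    """Return the mean wall-clock time that check(link) takes over links."""
    if not links:
        raise ValueError("links must not be empty")
    start = time.perf_counter()
    for link in links:
        check(link)
    return (time.perf_counter() - start) / len(links)

def live_check_overhead(local_check, live_check, links):
    """Average extra seconds per link that live checking costs
    compared with checking against the local blacklist."""
    live = average_seconds_per_link(live_check, links)
    local = average_seconds_per_link(local_check, links)
    return live - local
```

Running both checks over the same list of links keeps the comparison fair. Repeating the measurement several times and averaging would reduce the effect of network jitter on the live check.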

7.3.1 Impact of PhishLurk: Delays caused by PhishLurk

We sent 10 queries to PhishLurk twice: the first time with PhishLurk enabled, and the second time with PhishLurk disabled. There was a slight increase in execution time; on average it is 2.1901 seconds for each query. However, this can be improved in the future. Figure 11 shows the slight increase caused by PhishLurk across the 10 queries.

Figure 11: A slight increase in execution time caused by PhishLurk across 10 queries.

7.3.2 Impact of the page size with different alerting schemes.

In PhishLurk, the total size of the result page is 1.01 KB (1,038 bytes). Norton and McAfee, on the other hand, use an image scheme to rate and warn about the results in the search page. In Norton’s scheme, a single image is 3 KB; in McAfee’s, a single image is 1 KB. Figure 12 shows the size and location of the image alerting scheme in Norton. Figure 13 shows the size and location of the image alerting scheme in McAfee.

Figure 12. The size of the image used in Norton

Figure 13. The size of the image used in McAfee

PhishLurk has a smaller web page size because of the coloring scheme and the text-based warning.

7.3.3 Impact of the time performance on different browsers

To test the impact of the browser on time performance, I tested PhishLurk on five different browsers (Chrome, IE, Firefox, Opera, and Safari) by sending the same 10 queries to each of them. There were slight differences across all the queries. Each query takes between 0.0014 and 0.0015 seconds to execute. Chrome was the fastest browser. Figure 14 shows the average execution time for the 10 queries on the five browsers.

Figure 14: Average execution time of 10 queries using 5 different browsers (Chrome, IE, Firefox, Opera, Safari); the vertical axis runs from 0.001370 to 0.001460 seconds.

7.4 Analysis of PhishTank Blacklist

We observed and analyzed the PhishTank blacklist daily for two weeks, from 08-08-2011 to 21-08-2011. The maximum number of links verified in a day was 96 and the minimum was 35; on average, 72.3 links were verified each day. Figure 15 shows the number of links verified each day.

Figure 15: The number of links verified each day.

We also observed and analyzed the PhishTank blacklist hourly for 24 hours on 22-08-2011. The total number of verified links was high: 362 links. The maximum number of links verified in an hour was 42 and the minimum was zero. Figure 16 shows the number of links verified in each hour on 22-08-2011.

Figure 16: The number of links verified in each hour on 22-08-2011.

8. Discussion

8.1 Expanding the categories

Lately, some official websites have been found hosting phishing websites on their servers [29, 30, 31]. However, it is rare for the official website of a government, hospital, or university to host a phishing website. When an official website hosts phishing sites or produces attacks, it is typically the work of an insider who has the privileges to access and control the system, or the site may itself have been compromised by a cross-site scripting or SQL injection attack [30]. Another possibility is that someone reported the unlikely site in an attempt to damage the reputation of the organization. In my opinion, links to these kinds of websites should have their own class, which could be called unlikely link. An unlikely link is the same as an unknown link; the difference is that the blacklist has received a report about a link that is unlikely to be a phishing link. For example, websites whose top-level domain (TLD) is .edu or .gov fall into this category.

A link keeps the unlikely status until it is verified. This is fair, because the status holds only until the link is verified and changed to a Safe link.
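The proposed class can be sketched as a small decision function. The Python example below is only an illustration of the rule described above: the function name classify, the class labels, and the two sets standing in for the blacklist lookups are assumptions of this sketch, and PhishLurk itself is written in PHP.

```python
from urllib.parse import urlsplit

# TLDs that the text names as unlikely to host phishing pages.
UNLIKELY_TLDS = (".edu", ".gov")

def classify(url, verified_phishing, reported_unverified):
    """Return 'phishing', 'unlikely', 'unknown', or 'safe' for url.

    verified_phishing and reported_unverified are sets of URLs that
    stand in for the blacklist lookups (an assumption of this sketch).
    """
    if url in verified_phishing:
        return "phishing"
    if url in reported_unverified:
        host = (urlsplit(url).hostname or "").lower()
        # A reported but unverified link on an official TLD is "unlikely".
        if host.endswith(UNLIKELY_TLDS):
            return "unlikely"
        return "unknown"
    return "safe"
```

A reported .edu or .gov link therefore stays in the unlikely class only while it is unverified; once the blacklist verifies it as phishing, the first branch takes over.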

Figure 15. Global Phishing Survey: Trends and Domain Name Use - April 2011

As we see in Figure 15, 60% of phishing attacks were launched from servers in these TLDs: .COM, .NET, .TK, and .CC.

8.2 Lessons Learned

8.2.1 Search Engine

Using your own search engine is both beneficial and hard. One advantage is that you narrow down the range of the search to what you are looking for. In our case, however, we need a widely used search engine. I tried to create a PHP search engine, but it needs a huge database and an implementation of crawling functions. As a result, I used Google.com as the search engine. Google already has its own phishing protection, so combining Google’s protection with PhishLurk’s doubles the protection.

During the evaluation I had difficulty finding phishing links. Having Google as the search engine at the first line makes it really hard to evaluate the prototype. I started with a PHP search engine, but it did not work efficiently because it has to be supplied with a very large database of URLs. In fact, creating a complicated search engine is much more difficult than I expected.

8.2.2 Disadvantages of Ajax

Ajax provides dynamic interaction between the browser and the server, and generates preferred results or provides suggestions to the user. Ajax relies on JavaScript, which can be difficult to run consistently across browsers because JavaScript is not implemented the same way in every browser.

Processing on the client side causes more interaction between JavaScript and the browser, which goes against one of the ultimate goals of PhishLurk: to provide protection from the server side without requiring client-side cooperation. It also requires loading or referencing an additional Ajax library, which increases the page size.

8.2.3 Blacklist and Live Checking

Manual process: During the implementation I had a lot of parsing errors when processing the blacklist. I resorted to using Excel’s functions; the drawback is that the process was partially manual. However, we solved the problem with the other method, live checking, which is fully automatic. The process thus went from partially manual to fully automatic and live.
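For the downloaded-blacklist approach, a library CSV parser can take over the quoting and delimiter handling that a hand-written parser has to get right. The Python sketch below assumes that the downloaded blacklist is a CSV file with a header row that includes a url column; that layout, the column name, and the function names are assumptions here and should be checked against the actual file. The report does not say what caused the parsing errors, so this is one possible approach rather than a diagnosis.

```python
import csv
import io

def load_blacklist(csv_text, url_column="url"):
    """Return the set of URLs found in a CSV blacklist with a header row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    if reader.fieldnames is None or url_column not in reader.fieldnames:
        raise ValueError(f"no {url_column!r} column in blacklist")
    # Skip rows where the URL field is missing or empty.
    return {row[url_column].strip() for row in reader if row[url_column]}

def is_blacklisted(url, blacklist):
    """Exact-match lookup of url in the loaded blacklist."""
    return url.strip() in blacklist
```

Loading the URLs into a set makes each lookup a constant-time membership test, which matters when every link in a result page is checked.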

Unknown links: Another problem is that unverified links do not appear in the blacklist; PhishTank does not provide the unknown links in the downloadable database. The API method can provide them, but we must have the ID number of the link. We cannot know the ID number of the link because the downloaded database provides only the verified links.

Figure 16: PhishTank’s response to my email.

Figure 16 shows PhishTank’s response, which indicates that we can determine the unknown links using the phish ID. However, PhishTank provides only the verified phishing links, so there is no way to know the ID of an unknown link. Unknown links are therefore available only through the website, and we can only view them by browsing the website. In the response, PhishTank promises to solve this as soon as possible.

9. Future work

The current version of PhishLurk seems to be working efficiently. Its functions can be further improved and enhanced, and it can be extended for spam protection. PhishLurk can be enhanced with the following features and implemented on other devices and systems:

Tune up the code to optimize its performance.

Improve the features to increase flexibility.

Client-side protection: create a plug-in for a smartphone browser, such as the BlackBerry WebKit browser.

Follow-up reporting: create a module that sends email to users who decided to visit potential phishing sites, asking them to provide feedback.

Conduct a survey on how useful PhishLurk is after making it available on the internet.

10. Conclusion

I designed and developed PhishLurk, an anti-phishing search website that classifies and prevents phishing attacks. PhishLurk provides the protection from the server side and uses the coloring scheme for classification in order to consume as little computation and screen resource as possible on the client side. It can be ported to work efficiently on a variety of devices with different capabilities. PhishLurk uses PhishTank as the blacklist provider and checks the list live to achieve the maximum possible accuracy.

The efficiency of PhishLurk is affected by several factors, including the accuracy of blacklists and search engines.

I believe the idea of PhishLurk can be a good enhancement to include in a major search engine such as Google or Yahoo. Moreover, the mechanism can be optimized to work efficiently on smartphones.

11. Acknowledgment:

I would like to thank my advisor, Dr. Edward Chow, for his support and continual encouragement during my research.

I thank Dr. Albert Glock and Dr. Chuan Yue for their willingness to serve as committee members for my project.

12. References

1. Rachna Dhamija, J. D. Tygar, and Marti Hearst. 2006. Why phishing works. In Proceedings of the SIGCHI conference on Human Factors in computing systems (CHI '06), Rebecca Grinter, Thomas Rodden, Paul Aoki, Ed Cutrell, Robin Jeffries, and Gary Olson (Eds.). ACM, New York, NY, USA, 581-590. DOI=10.1145/1124772.1124861 http://doi.acm.org/10.1145/1124772.1124861.

2. Colin Whittaker, Brian Ryner, Marria Nazif, “Large-Scale Automatic Classification of Phishing Pages”, NDSS '10, 2010. <http://research.google.com/pubs/pub35580.html>

3. Gross, Ben. "Smartphone Anti-Phishing Protection Leaves Much to Be Desired | Messaging News." Messaging News | The Technology of Email and Instant Messaging. 26 Feb. 2010. Web. <http://www.messagingnews.com/story/smartphone-anti-phishing-protection-leaves-much-be-desired>.

4. ComScore, Inc. "Smartphone Subscribers Now Comprise Majority of Mobile Browser and Application Users in U.S." ComScore, Inc. - Measuring the Digital World. ComScore, Inc, 1 Oct. 2010. <http://www.comscore.com/Press_Events/Press_Releases/2010/10/Smartphone_Subscribers_Now_Comprise_Majority_of_Mobile_Browser_and_Application_Users_in_U.S>.

5. Entner, Roger. "Smartphones to Overtake Feature Phones in U.S. by 2011." Http://www.nielsen.com. Nielsen Wire, 26 Mar. 2010. Web. <http://blog.nielsen.com/nielsenwire/consumer/smartphones-to-overtake-feature-phones-in-u-s-by-2011/>.

6. Kerstein, Paul L. "How Can We Stop Phishing and Pharming Scams?" CSO Online - Security and Risk. CSO Magazine - Security and Risk, 19 July 2005. Web. <http://www.csoonline.com/article/220491/how-can-we-stop-phishing-and-pharming-scams->.

7. OpenDNS, LLC. PhishTank: an Anti-phishing Site. [Online]. http://www.phishtank.com.

8. Joshi, Y.; Saklikar, S.; Das, D.; Saha, S., "PhishGuard: A browser plug-in for protection from phishing," Internet Multimedia Services Architecture and Applications, 2008. IMSAA 2008. 2nd International Conference on, pp. 1-6, 10-12 Dec. 2008, doi: 10.1109/IMSAA.2008.4753929, URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=4753929&isnumber=4753904

9. PhishTank - Statistics about phishing activity and PhishTank usage, http://www.phishtank.com/stats.php

10. PhishTank, Friends of PhishTank, http://www.phishtank.com/friends.php

11. "SmartScreen Filter: Frequently Asked Questions." Windows Home - Microsoft Windows. <http://windows.microsoft.com/en-US/windows7/SmartScreen-Filter-frequently-asked-questions-IE9>.

12. "SmartScreen Filter - Microsoft Windows." Windows Home - Microsoft Windows. Web. <http://windows.microsoft.com/en-US/internet-explorer/products/ie-9/features/smartscreen-filter>.

13. "Apple - Safari - Learn about the Features Available in Safari." Apple. <http://www.apple.com/ca/safari/features.html>.

14. TECH.BLORGE - Top Technology news, Paypal warns buyers to avoid Safari browser from Apple - <http://tech.blorge.com/Structure:%20/2008/02/28/paypal-warns-buyers-to-avoid-safari-browser-from-apple/>

15. "Firefox 2 Phishing Protection Effectiveness Testing." Home of the Mozilla Project. <http://www.mozilla.org/security/phishing-test.html>.

16. "AVIRA News - Anti-Virus Users Are Restless, Avira Survey Finds." Antivirus Software Solutions for Home and for Business. <http://www.avira.com/en/press-details/nid/482/>.

17. Chuan Yue and Haining Wang. 2010. BogusBiter: A transparent protection against phishing attacks. ACM Trans. Internet Technol. 10, 2, Article 6 (June 2010), 31 pages. DOI=10.1145/1754393.1754395 http://doi.acm.org/10.1145/1754393.1754395

18. Rachna Dhamija and J. D. Tygar. 2005. The battle against phishing: Dynamic Security Skins. In Proceedings of the 2005 symposium on Usable privacy and security (SOUPS '05). ACM, New York, NY, USA, 77-88. DOI=10.1145/1073001.1073009 http://doi.acm.org/10.1145/1073001.1073009

19. OpenDNS | DNS-Based Web Security. <http://www.opendns.com/>.

20. DansGuardian - True Web Content Filtering for All. <http://dansguardian.org/>.

21. GFI - Web, Email and Network Security Solutions for SMBs on Premise and Hosted. http://www.gfi.com/

22. Anti-Phishing Working Group, "Phishing Activity Trends Report - 2nd Half 2010." Anti-Phishing Working Group (APWG). http://www.antiphishing.org/reports/apwg_report_h2_2010.pdf, Dec. 2010. Web.

23. G. Ollman, The Phishing Guide: Understanding and Preventing Phishing Attacks, 22 September 2004, http://www.technicalinfo.net/papers/Phishing.html

24. Persson, Anders. "Exploring Phishing Attacks and Countermeasures." Blekinge Institute of Technology, Dec. 2007. <http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.111.2030>.

25. Tim Richardson. "Brits Fall Prey to Phishing." The Register - Sci/Tech News for the World. May 2005. Web. 07 Aug. 2011. <http://www.theregister.co.uk/2005/05/03/aol_phishing/>.

26. Red Condor – Revolutionizing Spam Fighting. "Phishing for Disaster: The Cost of Corporate Ignorance." Red Condor, July 2010. Web. <http://www.edgewave.com/docs/whitepaper/RedCondor_phishing-white-paper.pdf>.

27. Zulfikar Ramzan. "A Brief History of Phishing: Part I | Symantec Connect Community." Symantec - Official Blog. 29 June 2009. Web. <http://www.symantec.com/connect/blogs/brief-history-phishing-part-i>.

28. Ying-Dar Lin; Chih-Wei Jan; Po-Ching Lin; Yuan-Cheng Lai, "Designing an Integrated Architecture for Network Content Security Gateways," Computer, vol. 39, no. 11, pp. 66-72, Nov. 2006, doi: 10.1109/MC.2006.379, URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=4014769&isnumber=4014747

29. Jeremy Kirk. "Sony Server Said to Have Been Hacked to Host Credit-card Phishing Site - Latimes.com." May 2008. Web. http://latimesblogs.latimes.com/technology/2011/05/sony-servers-hacked-host-credit-card-phishing-site.html.

30. Kirk, Jeremy. "Hacked Bank Server Hosts Phishing Sites" Computerworld - IT News, Features, Blogs, Tech Reviews, Career Advice. Mar. 2006. Web. 18 Aug. 2011. <http://www.computerworld.com/s/article/109500/Hacked_bank_server_hosts_phishing_sites>.

31. Dennis Fisher. "Researchers Find Government Site Hosting Phishing Data | Threatpost." Threatpost | The First Stop for Security News. Apr. 2008. Web. 18 Aug. 2011. <http://threatpost.com/en_us/blogs/researchers-find-government-site-hosting-phshing-data-061610>.

32. Zepeda, J.S.; Chapa, S.V., "From Desktop Applications Towards Ajax Web Applications," Electrical and Electronics Engineering, 2007. ICEEE 2007. 4th International Conference on, pp. 193-196, 5-7 Sept. 2007, doi: 10.1109/ICEEE.2007.4345005, URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=4345005&isnumber=4344971.

33. Wang Jing; Xu Feng, "The Research of Ajax Technique Application Based on the J2EE," Database Technology and Applications (DBTA), 2010 2nd International Workshop on, pp. 1-3, 27-28 Nov. 2010, doi: 10.1109/DBTA.2010.5659073, URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5659073&isnumber=5658597.

13. Index

1. Introduction
1.1 Impact of phishing
2. Background
2.1 History of Phishing
2.2 Most Targeted Industries
2.3 Why Phishing Works
2.4 Existing Work Anti-Phishing
3. Related Work
3.1 Blacklisting
3.2 Current Browser’s Phishing Protection
3.3 Classification of Phishing Defense
4. The Proposed Project
5. Design of PhishLurk
5.1 PhishLurk Components
6. Implementation
6.1 The Information flow
6.2 Blacklist
6.2.1 Updated Blacklist
6.2.2 Checking the URLs Live
6.3 Classification
6.4 Warning
6.5 Logger
6.5.1 Creating and updating the records
6.6.1 View the logs
6.6 Database
7. Performance Evaluation
7.1 Challenges
7.2 Test bed experiment
7.3 Experiment
7.3.1 Impact of PhishLurk: Delays caused by PhishLurk
7.3.2 Impact of the page size with different alerting schemes
7.3.3 Impact of the time performance on different browsers
7.4 Analysis of PhishTank Blacklist
8. Discussion
8.1 Expanding the categories
8.2 Lessons Learned
8.2.1 Search Engine
8.2.2 Disadvantages of Ajax
8.2.3 Blacklist and Live Checking
9. Future work
10. Conclusion
11. Acknowledgment
12. References
13. Index
Appendix A. User Guide
Appendix B. Installation and Configuration of PhishLurk


Appendix A. User Guide

PhishLurk is simple to use. The main page includes the text box where the user enters the keywords.

Figure: The main page of PhishLurk

After the user enters the keywords, the search results are shown as links with their titles and descriptions.

How to tell the classification of a link: there are three classes:

1. Phishing Link: A Phishing Link is a risky link. PhishLurk displays the Phishing Link in red and adds text next to the title indicating that the link is a phishing link.

Figure: A Phishing Link as it appears in the search results

If the user clicks to access the Phishing Link, PhishLurk makes sure the user knows that the link is risky by transferring them to the warning page. The warning page alerts the user to the risk and shows how many times the website has been visited.

Figure: The warning page for phishing links

2. Unknown Link: An Unknown Link is a suspicious link. PhishLurk displays the Unknown Link in orange and adds text next to the title indicating that the link is unknown.

Figure: Unknown Links as they appear in the search results

If the user clicks to access the Unknown Link, PhishLurk makes sure the user knows that the link is suspicious by transferring them to the warning page. The warning page alerts the user that the link is potentially risky and shows how many times the website has been visited.

Figure: The warning page for unknown links

3. Safe Links: The link is safe and not blacklisted. The user can access it safely.

Phishing Link appeared in search results
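The three classes above can be turned into markup on the server with a few lines of code. The Python sketch below illustrates the idea of the coloring scheme with a text label; the function name render_result, the label wording, and the exact HTML are assumptions for this example and are not PhishLurk's actual PHP output. For simplicity the sketch links straight to the target URL, whereas PhishLurk sends flagged links through its warning page.

```python
from html import escape

# Colors follow the user guide: red for phishing, orange for unknown.
STYLES = {
    "phishing": ("red", "Phishing link"),
    "unknown": ("orange", "Unknown link"),
}

def render_result(title, url, link_class):
    """Return an HTML snippet for one search result, marked by class."""
    safe_title = escape(title)
    safe_url = escape(url, quote=True)
    if link_class not in STYLES:
        # Safe links get no extra markup.
        return f'<a href="{safe_url}">{safe_title}</a>'
    color, label = STYLES[link_class]
    return (
        f'<a href="{safe_url}" style="color:{color}">{safe_title}</a> '
        f'<span style="color:{color}">[{label}]</span>'
    )
```

Because the marking is plain text and an inline color, the browser needs no script or image to display it, which is in line with the goal of keeping client-side cost low.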


Appendix B. Installation and Configuration of PhishLurk

1. Install the Apache web server. You can download a version of Apache from <http://httpd.apache.org/download.cgi>. If you have difficulty installing Apache on Windows, you can read the directions at http://httpd.apache.org/docs/2.0/platform/windows.html.

2. Download the source code, which is available at <http://cs.uccs.edu/~gsc/pub/master/malqahta/src/PhishLurk.rar>.

3. Create the folder “phishlurk”, then extract the contents of the file into “C:\AppServ\www\phishlurk\”.

4. There are two versions of PhishLurk:

Updated Blacklist, which can be reached at “http://localhost/phishlurk/localchecksearch.php”

Live checking PhishLurk, which can be reached at “http://localhost/phishlurk/livechecksearch.php”
