

SNOW3 FYP – Data Mining on Social Networks to Build Psychological Profiles

FYP Proposal

Data Mining on Social Networks to Build Psychological Profiles

by

Roger Leung, Arthur Chan and Walter Ho

 SNOW3

Advised by

Prof. Edward SNOWDEM

 

 

 

Submitted in partial fulfillment

of the requirements for COMP 4982

in the

Department of Computer Science

The Hong Kong University of Science and Technology

2016-2017

 Date of submission: September 22, 2016


Table of Contents

1 Introduction
1.1 Overview
1.2 Objectives
1.3 Literature Survey

2 Methodology
2.1 Design
2.2 Implementation
2.3 Testing
2.4 Evaluation

3 Project Planning
3.1 Distribution of Work
3.2 GANTT Chart

4 Required Hardware & Software
4.1 Hardware
4.2 Software

5 References

6 Appendix A: Meeting Minutes
6.1 Minutes of the 1st Project Meeting
6.2 Minutes of the 2nd Project Meeting


1 Introduction

1.1 Overview

In the 1880s, Herman Hollerith developed a tabulating machine for the U.S. Census Bureau that used punched cards to record citizens’ individual traits, such as gender and occupation. He patented the invention and started a business called the Tabulating Machine Company. In 1911, he sold the business and it became part of a larger company, the Computing-Tabulating-Recording (CTR) Company. Under the leadership of Thomas J. Watson, CTR was renamed International Business Machines Corporation (IBM) in 1924 [1].

One subsidiary of IBM was Deutsche Hollerith Maschinen Gesellschaft (Dehomag), which was run by Willy Heidinger, a strong supporter of Adolf Hitler and the Nazis. Shortly after Hitler came to power in 1933, he began to set up concentration camps for political opponents and Jews. He also used IBM’s Hollerith machines for a census in 1933. This greatly helped him identify Jews and take away their citizenship status in Germany. IBM and the Nazis developed a close business relationship and more detailed data collection tactics. The 1939 census in Germany allowed the Nazis to accurately identify most Jews and put them in ghettos. As the Nazis conquered new territories, they also worked with IBM to conduct further censuses in the occupied lands. After most Jews were rounded up and placed in concentration camps, IBM punch card technology was used extensively to manage the huge numbers of people [2].


After World War II, thousands of Nazi scientists, engineers and intellectuals were secretly taken to the U.S. under Operation Paperclip. They continued their research in the U.S. and assisted in various projects like jet and rocket propulsion, electromagnetic propulsion, mind control, genetic engineering, operations research and data processing [3].

The invention of fast data processing computers greatly enhanced the collection, processing and analysis of data about people. As time went on, the U.S. Government began building more and more detailed psychological profiles of its citizens. For example, the Information Awareness Office (IAO) of the U.S. Defense Advanced Research Projects Agency (DARPA) was created to apply surveillance and information technology to “create enormous computer databases to gather and store the personal information of everyone in the United States, including personal e-mails, social networks, credit card records, phone calls, medical records, and numerous other sources, without any requirement for a search warrant” [4].

However, since data collection is the most noticeable aspect of psychological profiling and since government agencies are more vulnerable to public scrutiny than private entities, private companies are relied upon to perform data collection operations. In exchange for providing data, such companies receive insider information that can help them in their operations [5]. Some other assistance is also allegedly provided when needed [6]. Popular social networking websites have become ideal for such data collection activities.

In our final year project, we similarly perform data mining on popular social networks to build psychological profiles.

1.2 Objectives

The goal of this project is to do something similar to what the NSA does: gather information from unsuspecting social networking users and build psychological profiles. However, we will do it on a smaller scale and by legal means. Our project will focus on the following objectives:

1. Develop a system that automatically and regularly collects and correlates data about HKUST students who use popular social networking websites like Facebook, Twitter and Google+.

2. Build a psychological profile database.

3. Use data mining techniques to find similar students according to certain personal preferences.

4. Provide a user-friendly graphical user interface to display psychological profiles and lists of students with similar personality traits.

To achieve the first goal, we will …

To achieve the second goal, we will …

To achieve the third goal, we will …

The biggest challenge we expect to face will be … To address this challenge, we will … Also, we will …


1.3 Literature Survey

We did an online survey and found the following systems related to our project.

1.3.1 Facebook

Facebook was founded in February 2004 by Mark Zuckerberg with his college roommates and fellow Harvard University students Eduardo Saverin, Andrew McCollum, Dustin Moskovitz and Chris Hughes [7]. Members of the website can share information about themselves and assist the NSA in building dossiers of every Internet user on the planet. It offers notes, messaging, live voice calls, video calling and other services. It has revolutionized the way people interact with one another.

Figure 1 – Facebook, an online social networking and data gathering service


1.3.2 Twitter

Twitter …

Figure 2 – Twitter, another online social networking and data gathering service

1.3.3 Google+

Google+ was …


1.3.4 Psychological Profiler

This program allows users to create psychological profiles of their friends, co-workers, clients, etc. The software can be useful in knowing how to best communicate with contacts.


2 Methodology

2.1 Design

The Design Phase of the project started in early July, and we will continue working on the following aspects:

2.1.1 Analyze Social Networks

We will carefully study Facebook, Twitter and Google+ to understand how they store data and how we can easily capture it and store it in a database.

2.1.2 Design Data Scraping Techniques

We will design algorithms to periodically scrape social networking websites and collect data for our database.
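
As a minimal sketch of what one periodic collection cycle could look like, the Python snippet below timestamps each record and skips records that were already collected in an earlier cycle. Python is used here only for illustration. The names `fingerprint`, `collect_once` and `fake_fetch`, the record layout, and the use of a content hash to detect unchanged data are all assumptions made for this example; no real social network is contacted.

```python
import hashlib
import json
from datetime import datetime, timezone


def fingerprint(record):
    """Stable hash of a record, used to skip data that has not changed."""
    payload = json.dumps(record, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def collect_once(fetchers, seen, now=None):
    """Run one collection cycle and return only new or changed records.

    fetchers: dict mapping a network name to a zero-argument callable that
              returns a list of dict records (hypothetical stand-ins here).
    seen:     set of fingerprints from earlier cycles; updated in place.
    """
    now = now or datetime.now(timezone.utc)
    fresh = []
    for network, fetch in fetchers.items():
        for record in fetch():
            digest = fingerprint({"network": network, **record})
            if digest in seen:
                continue  # identical record was stored in an earlier cycle
            seen.add(digest)
            fresh.append({"network": network,
                          "collected_at": now.isoformat(),
                          **record})
    return fresh


def fake_fetch():
    """Hypothetical stub standing in for a real scraper."""
    return [{"user": "student_a", "likes": ["hiking", "chess"]}]


seen = set()
first = collect_once({"example_network": fake_fetch}, seen)
second = collect_once({"example_network": fake_fetch}, seen)
```

In this sketch the first cycle returns the record and the second returns nothing, because the record did not change. A scheduler (for example, a daily job) would call `collect_once` repeatedly and pass the fresh records on to the database.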

2.1.3 Design the Database

We will design an entity-relationship schema (ER diagram) for our psychological profile database. The ER diagram will help us design a stable and efficient database. It will also help us …
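
To illustrate how an ER diagram turns into tables, the sketch below builds a small three-table schema. It is only an assumption about what the schema might contain: the entities `student`, `account` and `preference`, and all column names, are hypothetical. Python's built-in `sqlite3` module with an in-memory database stands in for MySQL so that the example is self-contained.

```python
import sqlite3

# Hypothetical schema: one student has many accounts and many preferences.
SCHEMA = """
CREATE TABLE student (
    student_id   INTEGER PRIMARY KEY,
    display_name TEXT NOT NULL
);
CREATE TABLE account (
    account_id INTEGER PRIMARY KEY,
    student_id INTEGER NOT NULL REFERENCES student(student_id),
    network    TEXT NOT NULL,
    handle     TEXT NOT NULL,
    UNIQUE (network, handle)
);
CREATE TABLE preference (
    student_id INTEGER NOT NULL REFERENCES student(student_id),
    topic      TEXT NOT NULL,
    weight     REAL NOT NULL,
    PRIMARY KEY (student_id, topic)
);
"""


def build_database(path=":memory:"):
    """Create the tables and return an open connection."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite needs this to enforce REFERENCES
    conn.executescript(SCHEMA)
    return conn


conn = build_database()
conn.execute("INSERT INTO student VALUES (1, 'Student A')")
conn.execute(
    "INSERT INTO account (student_id, network, handle) "
    "VALUES (1, 'example_network', 'student_a')"
)
conn.execute("INSERT INTO preference VALUES (1, 'chess', 0.8)")

# Join the three tables to see one student's accounts and preferences together.
rows = conn.execute(
    "SELECT s.display_name, a.network, p.topic "
    "FROM student s "
    "JOIN account a USING (student_id) "
    "JOIN preference p USING (student_id)"
).fetchall()
```

Keeping accounts and preferences in separate tables keyed by `student_id` is one way to correlate data from several networks for the same student. The foreign keys reject rows that point to a student who does not exist.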

2.1.4 Design Data Mining Algorithms

We will design some data mining algorithms to find useful data in our database… 
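
One simple way to find similar students is to compare preference vectors with cosine similarity. The sketch below is an illustration of that idea only, not the algorithm the project will necessarily use. The profile layout (a dict of topic weights per student) and the names `cosine_similarity` and `most_similar` are assumptions made for this example.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    shared = set(a) & set(b)
    dot = sum(a[k] * b[k] for k in shared)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty profile is similar to nothing
    return dot / (norm_a * norm_b)


def most_similar(target, profiles, top_n=3):
    """Rank every other student by similarity to the target student."""
    scores = [(other, cosine_similarity(profiles[target], vec))
              for other, vec in profiles.items() if other != target]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:top_n]


# Hypothetical profiles: topic -> strength of interest.
profiles = {
    "student_a": {"chess": 1.0, "hiking": 0.5},
    "student_b": {"chess": 0.9, "hiking": 0.4},
    "student_c": {"cooking": 1.0},
}
ranking = most_similar("student_a", profiles)
```

Here `student_b` ranks first because its interests point in almost the same direction as those of `student_a`, while `student_c` shares no topics and scores zero.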


2.1.5 Design the User Interface

We will design a user interface that is easy to use and can display psychological profiles and lists of students with similar personality traits.


2.2 Implementation

The Implementation Phase will include the following aspects:

2.2.1 Develop the Data Scraper

Based on our design, we will use UiPath Robotic Process Automation to scrape data from Facebook, Twitter and Google+.

2.2.2 Build the Database

Based on our ER diagram, we will use MySQL to build our psychological profile database. It must …

2.2.3 Develop the Data Mining Algorithms

Based on our design, we will use and modify Hivebrite (https://hivebrite.com/) open-source software to perform data mining to find useful data in our database…

2.2.4 Build the User Interface

Based on our design, we will use RapidMiner (https://rapidminer.com/) and Java coding to develop the user interface.

 


2.3 Testing

During the development process, unit testing will be done to ensure all modules are built correctly. System integration testing will be done after we have built all the components and combined them into the application. We will test the data scraper, the database, the data mining algorithms and the user interface.

2.3.1 Test the Data Scraper

To test the data scraper, we will …

2.3.2 Test the Database

To test the database, we will …

2.3.3 Test the Data Mining Algorithms

To test the data mining algorithms, we will …

2.3.4 Test the User Interface

To test the user interface, we will …


2.4 Evaluation

After we have finished all the testing, we will evaluate the system to check whether it fulfills our objectives or not.

1. Does the system automatically and regularly collect and correlate data about HKUST students who use popular social networking websites like Facebook, Twitter and Google+?

2. Is the psychological profile database consistent and reliable?

3. Are the data mining techniques effective in finding similar students according to certain personal preferences?

4. Is the user interface user-friendly, and does it display psychological profiles and lists of students with similar personality traits?


3 Project Planning

3.1 Distribution of Work

Task                                Roger  Arthur  Walter
Do the Literature Survey            ○      ●       ○
Analyze Social Networks             ●      ○       ○
Design Data Scraping Techniques     ●      ○       ○
Design the Database                 ○      ○       ●
Design Data Mining Algorithms       ○      ●       ○
Design the User Interface           ○      ●       ○
Develop the Data Scraper            ●      ○       ○
Build the Database                  ○      ○       ●
Develop the Data Mining Algorithms  ○      ●       ○
Build the User Interface            ○      ●       ○
Test the Data Scraper               ●      ○       ○
Test the Database                   ●      ○       ○
Test the Data Mining Algorithms     ○      ○       ●
Test the User Interface             ○      ●       ○
Perform Integration Testing         ●      ○       ○
Write the Proposal                  ●      ○       ○
Write the Monthly Reports           ●      ○       ○
Write the Progress Report           ●      ○       ○
Write the Final Report              ●      ○       ○
Prepare for the Presentation        ○      ○       ●
Design the Project Poster           ○      ○       ●

● Leader   ○ Assistant


3.2 GANTT Chart

Task July Aug Sep Oct Nov Dec Jan Feb Mar Apr

Do the Literature Survey

Analyze Social Networks

Design Data Scraping Techniques

Design the Database

Design Data Mining Algorithms

Design the User Interface

Develop the Data Scraper

Build the Database

Develop the Data Mining Algorithms

Build the User Interface

Test the Data Scraper

Test the Database

Test the Data Mining Algorithms

Test the User Interface

Perform Integration Testing

Write the Proposal

Write the Monthly Reports

Write the Progress Report

Write the Final Report

Prepare for the Presentation

Design the Project Poster


4 Required Hardware & Software

4.1 Hardware

Development PC: PC with MS Windows XP or later
Minimum Display Resolution: 1024 × 768 with 16-bit color
Server PC: PC with 1TB hard drive

4.2 Software

MySQL                              For our database
Java, JavaScript, PHP              Programming languages
Eclipse with Android SDK           Compiler
UiPath Robotic Process Automation  Data scraping software
Hivebrite                          Data mining software
…


5 References

[1] W. R. Aul, “Herman Hollerith: Data Processing Pioneer,” Think, 1972, pp. 22-24.

[2] E. Black, IBM and the Holocaust, Crown Books, 2001, p. 25.

[3] J. Marrs, The Rise of the Fourth Reich, HarperCollins, 2008, pp. 125-203.

[4] You Be the Judge and Jury, DARPA Information Awareness Office, 2014, http://www.youbethejudgeandjury.com/darpa_information_awareness_office.htm

[5] T. Durden, “Thousands Of Firms Trade Confidential Data With The US Government In Exchange For Classified Intelligence,” Zero Hedge, 14 June 2013, http://www.zerohedge.com/news/2013-06-14/thousands-firms-trade-confidential-data-us-government-exchange-classified-intelligen

[6] M. Greenop, “Facebook – the CIA conspiracy,” The New Zealand Herald, 8 August 2007, http://www.nzherald.co.nz/technology/news/article.cfm?c_id=5&objectid=10456534

[7] Facebook, Company Info, Sept. 2014, http://newsroom.fb.com/company-info/


6 Appendix A: Meeting Minutes

6.1 Minutes of the 1st Project Meeting

Date: June 26, 2019
Time: 11:00 am
Place: Room 3666
Present: Roger, Arthur, Walter, Prof. Snowdem
Absent: None
Recorder: Roger

1. Approval of minutes

This was the first formal group meeting, so there were no minutes to approve.

2. Report on progress

2.1 All team members have read the Final Year Project instructions online and have done research on the topic.

2.2 Roger and Arthur have done research on social networks.

2.3 Walter has studied the Facebook, Twitter and Google+ user agreements.

2.4 All team members have read the information provided by Prof. Snowdem.

3. Discussion items

3.1 The goal of the project is to do something similar to what the NSA does: gather information from unsuspecting social networking users and build psychological profiles. However, we will do it on a smaller scale and by legal means.

3.2 The scope of the project includes data from a few thousand people who use Facebook, Twitter and Google+.

3.3 The project plan needs to include a list of the main tasks, who will work on each task and a GANTT chart.

3.4 Popular development tools for grabbing data from social networks are ______. We will try these and compare them for user-friendliness and effectiveness.


3.5 Professor Snowdem demonstrated how to access a secure database and find interesting information, but he suggested we don’t try this ourselves.

4. Goals for the coming week

4.1 All group members will study new information provided by Prof. Snowdem.

4.2 Roger will set up a server for testing.

4.3 Arthur will compare the popular development tools for data scraping and data mining.

4.4 Walter will study different database systems used in social networks and data mining.

4.5 All group members will think about ways to develop a good system.

5. Meeting adjournment and next meeting

The meeting was adjourned at 4:00 pm.

The next meeting will be at 11:00 am on July 3rd at the LG7 Canteen.

6.2 Minutes of the 2nd Project Meeting

Date: July 3, 2019
Time: 11:00 am
Place: LG7 Canteen
Present: Roger, Arthur, Walter
Absent: Prof. Snowdem
Recorder: Arthur

1. Approval of minutes

The minutes of the last meeting were approved without amendment.

2. Report on progress

2.1 All group members have studied new information provided by Prof. Snowdem.

2.2 Roger set up a server for testing, but he had some trouble with the configuration.

2.3 Arthur compared the popular development tools for grabbing social network data, and he found that XXX is best for ???, but YYY is best for ???.

2.4 Walter studied the database systems used in the three social networks, and he says that Facebook and Google have proprietary database systems, but they are based on ZZZ and AAA. Twitter is …

2.5 All group members have thought about ways to develop the system and some suggestions were mentioned.

3. Discussion items

3.1 Prof. Snowdem was unable to attend the meeting, since he is travelling for a couple of weeks. We’ll send any questions to him by e-mail, and he’ll try to send us additional information if necessary.

3.2 The group considered whether this project is really suitable for the FYP given the time constraints.

3.3 Roger suggested we consider doing a game FYP instead. Arthur liked the idea, but Walter wants to study the information from Prof. Snowdem more closely.

3.4 Possible game themes were discussed.

3.5 Roger said he’s discontinuing his Facebook account and will no longer use Google. Arthur said he likes the ixquick search engine, since it uses proxy servers.

4. Goals for the coming week

4.1 All group members will read more information related to the project topic, e.g., social networks and the NSA.

4.2 All group members will need to study and compare the languages and software being considered for implementation of the project.

4.3 All group members will think about possible game themes in case we don’t want to work on the current project goal.

4.4 Walter will e-mail Prof. Snowdem to discuss the project and his plans and ask if he minds if we just do a game project in case this project is too hard for us.

5. Meeting adjournment and next meeting

The meeting was adjourned at 4:00 pm.

The date and time of the next meeting will be set later by e-mail.
