ocean: open-source collation of egovernment data and networks: understanding privacy leaks in open...

OCEAN: Open-source Collation of eGovernment data And Networks Understanding Privacy Leaks in Open Government Data Srishti Gupta Advisor: Dr. Ponnurangam Kumaraguru M.Tech Thesis Defense 20-November-2013

Upload: precog

Post on 17-Dec-2014


DESCRIPTION

Awareness of privacy has grown over the past few years. Earlier, people shared personal information freely; now they are more cautious about sharing it with strangers, whether in person or online. Given these expectations and attitudes, it is difficult to accept that a large amount of personally identifiable information (PII) is publicly available on the web. Information portals in the form of e-governance websites run by the Delhi Government in India provide access to such PII without any anonymization. Several databases, e.g., voter rolls, driving licence numbers, the MTNL phone directory and PAN card records, serve as repositories of personal information about Delhi residents. This large amount of available personal information can be exploited because India lacks a proper written law on privacy.

PII can also be collected from social networking sites such as Facebook, Twitter and GooglePlus, where users share information about themselves. Since users post this information themselves, it may not be considered a privacy breach, but aggregating it can reveal much more and pose a bigger threat. For example, data from social networks and open government databases can be combined to connect an online identity to a real-world identity. Even though awareness of privacy has increased, the threats made possible by the availability of this much personal data are still unknown.

To bring such issues to public notice, we developed Open-source Collation of eGovernment data And Networks (OCEAN), a system in which the user enters a little information (e.g., a name) about a person and gets a large amount of personal information about him / her, such as name, age, address, date of birth, mother's name, father's name, voter ID, driving licence number and PAN. By aggregating information within the Voter ID database, OCEAN creates a family tree of the user, giving out the details of his / her family members as well. We also calculated a privacy score, which measures the risk associated with an individual in terms of how much of that person's PII is revealed by open government data sources. 1,693 users had the highest privacy score, making them the most vulnerable to risks.

Using OCEAN, we could collect 8,195,053 voter roll; 224,982 driving licence; 53,419 PAN card; 1,557,715 Twitter; 3,377,102 Facebook; 29,393 Foursquare; 186,798 LinkedIn and 28,900 GooglePlus records. We received 661 total hits (657 unique visitors) from the day we released the system, January 21, 2013, until October 10, 2013. To the best of our knowledge, this is the first real-world deployed tool that provides personal information about residents of Delhi to everyone free of cost.

Full Report: http://arxiv.org/abs/1312.2784

TRANSCRIPT

Page 1: OCEAN: Open-source Collation of eGovernment data And Networks: Understanding Privacy Leaks in Open Government Data

OCEAN: Open-source Collation of eGovernment data And Networks

Understanding Privacy Leaks in Open Government Data

Srishti Gupta

Advisor: Dr. Ponnurangam Kumaraguru

M.Tech Thesis Defense

20-November-2013

Page 2: OCEAN: Open-source Collation of eGovernment data And Networks: Understanding Privacy Leaks in Open Government Data

Thesis Committee

Dr. Muttukrishnan Rajarajan, City University, London

Dr. Vinayak Naik, IIIT-Delhi

Dr. PK (Chair), IIIT-Delhi

2

Page 3: OCEAN: Open-source Collation of eGovernment data And Networks: Understanding Privacy Leaks in Open Government Data

Demo

3

Page 4: OCEAN: Open-source Collation of eGovernment data And Networks: Understanding Privacy Leaks in Open Government Data

Academic Honors

Gupta, S., Gupta, M., and Kumaraguru, P. OCEAN: Open-source Collation of eGovernment data And Networks. Poster at Security and Privacy Symposium (SPS), IIT-K, 2013.

Gupta, S., Gupta, M., and Kumaraguru, P. Is Government a Friend or Foe? Privacy in Open Government Data. Poster at IBM-ICARE, IISc Bangalore, 2012.

4

BEST Poster

Page 5: OCEAN: Open-source Collation of eGovernment data And Networks: Understanding Privacy Leaks in Open Government Data

Recognition

5

IIITD Homepage [ Aug ’13 ]

Hindustan [ April ’13 ]

550 Unique Visitors

(as on Nov 17, 2013)

Page 6: OCEAN: Open-source Collation of eGovernment data And Networks: Understanding Privacy Leaks in Open Government Data

Presentation Outline


Research Motivation and Aim

Related Work

Research Contribution

Methodology

Experiments and Analysis

Conclusion

Future Work

Questions

Page 7: OCEAN: Open-source Collation of eGovernment data And Networks: Understanding Privacy Leaks in Open Government Data

Identity Theft: On the Rise!

7

Research Motivation and Aim

Page 8: OCEAN: Open-source Collation of eGovernment data And Networks: Understanding Privacy Leaks in Open Government Data

Ways to get PII

8

Mail Thefts, Pharming

OSN

Social Engineering (e.g., Fake accounts)

Not credible Limited Info.

Open Government Data Source

E-mail, Docs, Spreadsheet

Shoulder Surfing, Dumpster Diving

Page 9: OCEAN: Open-source Collation of eGovernment data And Networks: Understanding Privacy Leaks in Open Government Data

Open Government Data Sources

‘Open’: publicly available.

eGovernment initiatives by different state governments in the form of databases / services.

Objective: improve the information-gathering procedure and reduce the burden on citizens to access their data.

Pros: Improved data availability, easy verification.

Cons: Databases are publicly available, leading to information disclosure and privacy breaches.

9

Research Motivation and Aim

Page 10: OCEAN: Open-source Collation of eGovernment data And Networks: Understanding Privacy Leaks in Open Government Data

10

Information Leakage in Open Government Data Sources?

Page 11: OCEAN: Open-source Collation of eGovernment data And Networks: Understanding Privacy Leaks in Open Government Data

PII Leakage

11

Personally Identifiable Information (PII)

Voter ID, Name, Father’s name, Age, Gender, Date Of Birth, DL number, PAN, Phone number

Research Motivation and Aim

Page 12: OCEAN: Open-source Collation of eGovernment data And Networks: Understanding Privacy Leaks in Open Government Data

The Other Side! “People’s View”

12

Research Motivation and Aim

CONSCIOUS DECISION !

(Kumaraguru, 2012)

Page 13: OCEAN: Open-source Collation of eGovernment data And Networks: Understanding Privacy Leaks in Open Government Data

13

Citizens do not want their PII to be leaked!

Page 14: OCEAN: Open-source Collation of eGovernment data And Networks: Understanding Privacy Leaks in Open Government Data

Research Aim

To develop a technology to showcase publicly available personal information online

To highlight the privacy issues arising from aggregation of available personal information

14

Research Motivation and Aim

Page 15: OCEAN: Open-source Collation of eGovernment data And Networks: Understanding Privacy Leaks in Open Government Data

System Outline

Identification of data sources

Threat Modelling

Data Extraction

Information Aggregation

Evaluation (Privacy Score, Recall, SUS)

Page 16: OCEAN: Open-source Collation of eGovernment data And Networks: Understanding Privacy Leaks in Open Government Data

Presentation Outline


Research Motivation and Aim

Related Work

Research Contribution

Methodology

Experiments and Analysis

Conclusion

Future Work

Questions

Page 17: OCEAN: Open-source Collation of eGovernment data And Networks: Understanding Privacy Leaks in Open Government Data

Related Work

17

Related Work and Research Contribution

Yasni (www.yasni.com)

Page 18: OCEAN: Open-source Collation of eGovernment data And Networks: Understanding Privacy Leaks in Open Government Data

Related Work

18

Related Work and Research Contribution

Pipl (www.pipl.com)

Page 19: OCEAN: Open-source Collation of eGovernment data And Networks: Understanding Privacy Leaks in Open Government Data

Related Work

19

Related Work and Research Contribution

Various country-specific systems built with Open Government Data

Name | Country | Description
IndianKanoon (http://www.indiankanoon.org/) | India | Legal search engine. Indexes judgements of the Supreme Court and several High Courts.
OpenCivic.in (http://www.opencivic.in/) | India | Application Programming Interface. Gives data about state assembly elections and profiles of MPs in Maharashtra.
ABQ Ride (http://www.cabq.gov/abq-apps/city-apps-listing/abq-ride) | USA | Real-time locations of city buses. Fares for other public transportation.
Illustreets (http://data.gov.uk/apps/illustreets) | UK | Comparing locations. Gives crime, education, transport and census data for a location.

Page 20: OCEAN: Open-source Collation of eGovernment data And Networks: Understanding Privacy Leaks in Open Government Data

Research Gap

20

[Figure: Research gap diagram. Labels: OCEAN; Yasni / Pipl; Open Source Data Aggregation; Indian Kanoon; Open Government Data; PII Leakage]

Related Work and Research Contribution

Page 21: OCEAN: Open-source Collation of eGovernment data And Networks: Understanding Privacy Leaks in Open Government Data

Presentation Outline


Research Motivation and Aim

Related Work

Research Contribution

Methodology

Experiments and Analysis

Conclusion

Future Work

Questions

Page 22: OCEAN: Open-source Collation of eGovernment data And Networks: Understanding Privacy Leaks in Open Government Data

Research Contribution

First deployed system that shows aggregated personal information about the residents of Delhi.

Threat modelling on the various open government databases.

Privacy Score: risk associated with a person based on the PII leaked.

Empirical understanding of the privacy perceptions, awareness and expectations of users regarding open government data.

22

Related Work and Research Contribution

Page 23: OCEAN: Open-source Collation of eGovernment data And Networks: Understanding Privacy Leaks in Open Government Data

Presentation Outline


Research Motivation and Aim

Related Work

Research Contribution

Methodology

Identification of open government data sources

Threat Modelling

Data Extraction

Information Aggregation

Experiments and Analysis

Conclusion

Future Work

Questions

Page 24: OCEAN: Open-source Collation of eGovernment data And Networks: Understanding Privacy Leaks in Open Government Data

System Architecture

24

Page 25: OCEAN: Open-source Collation of eGovernment data And Networks: Understanding Privacy Leaks in Open Government Data

System Outline

Identification of data sources

Threat Modelling

Data Extraction

Information Aggregation

Evaluation (Privacy Score, Recall, SUS)

Page 26: OCEAN: Open-source Collation of eGovernment data And Networks: Understanding Privacy Leaks in Open Government Data

Driving Licence

DL-XXYYYYAAAAAAA where

DL: state (Delhi), XX: location in Delhi, YYYY: year of issue of the licence, AAAAAAA: unique number

26

Methodology

Page 27: OCEAN: Open-source Collation of eGovernment data And Networks: Understanding Privacy Leaks in Open Government Data

Voter ID

XXX12345678 where

X: ‘A’ – ‘Z’; the last 8 characters are numerals

27

Methodology

Page 28: OCEAN: Open-source Collation of eGovernment data And Networks: Understanding Privacy Leaks in Open Government Data

PAN

XXXTL1234X where

XXX: ‘A’ – ‘Z’, T: type of holder, L: first character of last name, 1234: sequential number, X: check digit
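The three identifier templates on these slides (DL-XXYYYYAAAAAAA, XXX12345678 and XXXTL1234X) can be expressed as regular expressions. The following is a minimal sketch, not the validation used in OCEAN. It assumes that the location code and unique part of a driving licence number are digits, that the Voter ID has exactly the three letters and eight digits shown in the template, and that the PAN check character is a letter, as the X in the template suggests. The function name and the sample values are hypothetical.

```python
import re

# Patterns transcribed from the slide templates. The exact character
# classes (digits vs. letters) are assumptions, not confirmed by the slides.
DL_RE = re.compile(r"^DL-(?P<location>\d{2})(?P<year>\d{4})(?P<unique>\d{7})$")
VOTER_ID_RE = re.compile(r"^[A-Z]{3}\d{8}$")
PAN_RE = re.compile(
    r"^(?P<prefix>[A-Z]{3})(?P<holder_type>[A-Z])"
    r"(?P<surname_initial>[A-Z])(?P<sequence>\d{4})(?P<check>[A-Z])$"
)


def classify_identifier(value: str) -> str:
    """Return which template a string matches, or 'unknown'."""
    text = value.strip().upper()
    if DL_RE.match(text):
        return "driving_licence"
    if VOTER_ID_RE.match(text):
        return "voter_id"
    if PAN_RE.match(text):
        return "pan"
    return "unknown"


if __name__ == "__main__":
    # Synthetic values that only follow the templates; not real identifiers.
    for sample in ("DL-0120110000001", "ABC12345678", "ABCPK1234F", "12345"):
        print(sample, "->", classify_identifier(sample))
```

Named groups make it easy to read off individual fields, for example `DL_RE.match("DL-0120110000001").group("year")` for the year of issue.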

28

Methodology

Page 29: OCEAN: Open-source Collation of eGovernment data And Networks: Understanding Privacy Leaks in Open Government Data

Online Social Networks

29

Methodology

Name, Gender, Profile image, Profile url

Name, Followers / Following count, Location, Profile image, Profile url

Name, Gender, Facebook / Twitter contact, Friend / Follower count, Badge / Mayorship / Check-in count, Location, Profile image, Profile url

Name, Location, Profile image, Profile url

Name, Gender, Relationship status, Location, Organization, Birthday, E-mail, Language, Profile image, Profile url

Page 30: OCEAN: Open-source Collation of eGovernment data And Networks: Understanding Privacy Leaks in Open Government Data

System Outline

Identification of data sources

Threat Modelling

Data Extraction

Information Aggregation

Evaluation (Privacy Score, Recall, SUS)

Page 31: OCEAN: Open-source Collation of eGovernment data And Networks: Understanding Privacy Leaks in Open Government Data

II. Threat Modelling

31

Methodology

[Figure: data flow between USER and OPEN GOVERNMENT DATA across a TRUST BOUNDARY]

Driving Licence: input Driving Licence number; output Name, Address, Father’s name, Driving Licence no., DOB

Voter Rolls: input Name, Constituency; output Name, Address, Relation name, Age, Gender, Voter ID

PAN: input Name, DOB; output Name, PAN

Page 32: OCEAN: Open-source Collation of eGovernment data And Networks: Understanding Privacy Leaks in Open Government Data

Attack Scenario (I)

Online Voter ID card: multiple fake voter ID cards can be created from the available PII

32

Research Motivation and Aim

Page 33: OCEAN: Open-source Collation of eGovernment data And Networks: Understanding Privacy Leaks in Open Government Data

Attack Scenario (II)

View tax statements (Income tax e-filing): fake accounts can be created to view TDS statements.

33

Research Motivation and Aim

Page 34: OCEAN: Open-source Collation of eGovernment data And Networks: Understanding Privacy Leaks in Open Government Data

Attack Scenario (III)

Procure a SIM card / phone connection

Fake documents can be created

Credit / debit cards can be applied for in the victim’s name

Networking accounts can be created

34

Research Motivation and Aim

Page 35: OCEAN: Open-source Collation of eGovernment data And Networks: Understanding Privacy Leaks in Open Government Data

II. Threat Modelling

DREAD Model: Microsoft’s Risk Assessment Model

Term | Remarks
Damage | How big would the damage be if the attack succeeded?
Reproducibility | How easy is it to reproduce the attack?
Exploitability | How much time, effort, and expertise is needed to exploit the threat?
Affected Users | If a threat were exploited, what percentage of users would be affected?
Discoverability | How easy is it for an attacker to discover this threat?

Page 36: OCEAN: Open-source Collation of eGovernment data And Networks: Understanding Privacy Leaks in Open Government Data

II. Threat Modelling

Scheme: High (3), Medium (2), Low (1)

Threat: A malicious user can identify PII of Delhi residents

36

Methodology

[Threat modelling: http://msdn.microsoft.com/en-us/library/ff648644.aspx]

Page 37: OCEAN: Open-source Collation of eGovernment data And Networks: Understanding Privacy Leaks in Open Government Data

II. Threat Modelling

According to Microsoft’s DREAD model, the overall rating maps to a level of risk:

Range | Level of risk
5 – 7 | Low
8 – 11 | Medium
12 – 15 | High

In our case,

Overall rating = 2 + 3 + 2 + 3 + 3 = 13 (High)

This means that the threat poses a significant risk to the various information portal websites of the Delhi government and needs to be addressed as soon as possible!
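The DREAD rating on this slide is a plain sum of five category ratings followed by a range lookup. The sketch below reproduces that arithmetic. The assignment of the individual values 2, 3, 2, 3, 3 to the five DREAD categories, in D-R-E-A-D order, is an assumption; the slides give only the sum. The function names are hypothetical.

```python
from typing import Dict

CATEGORIES = (
    "damage",
    "reproducibility",
    "exploitability",
    "affected_users",
    "discoverability",
)


def dread_total(ratings: Dict[str, int]) -> int:
    """Sum the five category ratings, each on the 1 (Low) to 3 (High) scale."""
    missing = [c for c in CATEGORIES if c not in ratings]
    if missing:
        raise ValueError(f"missing categories: {missing}")
    for category in CATEGORIES:
        if ratings[category] not in (1, 2, 3):
            raise ValueError(f"{category} must be rated 1, 2 or 3")
    return sum(ratings[c] for c in CATEGORIES)


def risk_level(total: int) -> str:
    """Map an overall rating to the risk level ranges from the slide."""
    if 5 <= total <= 7:
        return "Low"
    if 8 <= total <= 11:
        return "Medium"
    if 12 <= total <= 15:
        return "High"
    raise ValueError("total must be between 5 and 15")


# Assumed mapping of the slide's 2 + 3 + 2 + 3 + 3 onto the categories.
ocean_threat = {
    "damage": 2,
    "reproducibility": 3,
    "exploitability": 2,
    "affected_users": 3,
    "discoverability": 3,
}

if __name__ == "__main__":
    total = dread_total(ocean_threat)
    print(total, risk_level(total))  # 13 High
```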

Page 38: OCEAN: Open-source Collation of eGovernment data And Networks: Understanding Privacy Leaks in Open Government Data

System Outline

Identification of data sources

Threat Modelling

Data Extraction

Information Aggregation

Evaluation (Privacy Score, Recall, SUS)

Page 39: OCEAN: Open-source Collation of eGovernment data And Networks: Understanding Privacy Leaks in Open Government Data

III. Data Extraction

Data was collected from various open government data sources using PHP scripts and stored in MySQL databases.

Open government websites, with the query strategy and the number of records collected:

Voter rolls [81,95,053]: alphabets a-z for name, across 70 constituencies

Driving licence [2,24,982]: random 5 seeds, ‘incremental attack’

PAN [53,419]: name and DOB from DL

Page 40: OCEAN: Open-source Collation of eGovernment data And Networks: Understanding Privacy Leaks in Open Government Data

III. Data Extraction

Public data from various online social networking sites was collected using public API calls.

OAuth tokens were used for authentication and authorization.

[Figure: unique names are passed to API calls for each network; records collected are shown in brackets]

GooglePlus [28,900]

LinkedIn [1,86,798]

Foursquare [29,393]

Twitter [15,57,715]

Facebook [33,77,102]

Page 41: OCEAN: Open-source Collation of eGovernment data And Networks: Understanding Privacy Leaks in Open Government Data

System Outline

Identification of data sources

Threat Modelling

Data Extraction

Information Aggregation

Evaluation (Privacy Score, Recall, SUS)

Page 42: OCEAN: Open-source Collation of eGovernment data And Networks: Understanding Privacy Leaks in Open Government Data

IV. Information Aggregation

Family Tree

Information within the Voter ID database is aggregated to find relationships among records.

OCEAN has 3,90,353 such users.

42

Methodology

Page 43: OCEAN: Open-source Collation of eGovernment data And Networks: Understanding Privacy Leaks in Open Government Data

IV. Information Aggregation

Mapping of users across the Voter ID and Driving licence databases.

Table schema:

Database | Attributes
Voter ID | Voter ID, Name, Address, Father's / Mother's / Husband's name, Age, Gender
Driving Licence | Name, Address, Father's name, DOB, Validity period, Vehicle category

Mapping is done on the basis of similarity between the name, relation name and address of users across the databases.

OCEAN has 6,384 such users.
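The slide says only that records are mapped by similarity of name, relation name and address. The sketch below shows one way such a comparison could look, using Python's standard `difflib.SequenceMatcher`. The similarity measure, the normalization step, the 0.85 threshold and all field names are assumptions made for illustration, not the method used in OCEAN. The records are synthetic.

```python
import re
from difflib import SequenceMatcher


def normalize(text: str) -> str:
    """Lowercase, replace punctuation with spaces and collapse whitespace."""
    cleaned = re.sub(r"[^a-z0-9 ]+", " ", text.lower())
    return " ".join(cleaned.split())


def similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1] between two normalized strings."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def is_same_person(voter: dict, licence: dict, threshold: float = 0.85) -> bool:
    """Treat two records as a match only if every compared field is similar."""
    field_pairs = (
        ("name", "name"),
        ("relation_name", "father_name"),
        ("address", "address"),
    )
    scores = [similarity(voter[v], licence[l]) for v, l in field_pairs]
    return min(scores) >= threshold


# Synthetic records for illustration only.
voter_record = {
    "name": "Asha Example",
    "relation_name": "Ravi Example",
    "address": "H.No. 12, Block A, Sample Nagar",
}
licence_record = {
    "name": "ASHA EXAMPLE",
    "father_name": "Ravi Example",
    "address": "H No 12 Block-A Sample Nagar",
}
other_record = {
    "name": "Vikram Other",
    "father_name": "Ravi Example",
    "address": "H No 12 Block-A Sample Nagar",
}

if __name__ == "__main__":
    print(is_same_person(voter_record, licence_record))  # True
    print(is_same_person(voter_record, other_record))    # False
```

Requiring the minimum score across all fields to clear the threshold is a conservative choice: a shared address alone, as in a household, does not produce a match.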

Page 44: OCEAN: Open-source Collation of eGovernment data And Networks: Understanding Privacy Leaks in Open Government Data

IV. Information Aggregation

Challenge: The address formats of the various sources are different

Page 45: OCEAN: Open-source Collation of eGovernment data And Networks: Understanding Privacy Leaks in Open Government Data

IV. Information Aggregation

Mapping of users across the Voter ID, Driving licence and PAN databases.

The subset of DL records having a PAN was chosen.

OCEAN has 1,693 such users.

45

Methodology

Page 46: OCEAN: Open-source Collation of eGovernment data And Networks: Understanding Privacy Leaks in Open Government Data

IV. Information Aggregation

Mapping users across Foursquare, Facebook and Twitter.

Some users specify their other OSN contacts on Foursquare. The information available from such users is aggregated together.

OCEAN has 11 such users.

46

Methodology

Page 47: OCEAN: Open-source Collation of eGovernment data And Networks: Understanding Privacy Leaks in Open Government Data

IV. Information Aggregation

Challenge: Not many users link their OSN accounts

47

Methodology

Page 48: OCEAN: Open-source Collation of eGovernment data And Networks: Understanding Privacy Leaks in Open Government Data

Presentation Outline


Research Motivation and Aim

Related Work and Research Contribution

Methodology

System User Interface

Experiments and Analysis

Conclusion

Future Work

Questions

Page 49: OCEAN: Open-source Collation of eGovernment data And Networks: Understanding Privacy Leaks in Open Government Data

System Outline

Identification of data sources

Threat Modelling

Data Extraction

Information Aggregation

Evaluation (Privacy Score, Recall, SUS)

Page 50: OCEAN: Open-source Collation of eGovernment data And Networks: Understanding Privacy Leaks in Open Government Data

Survey Dataset

62 complete responses.

51% males, 49% females.

77% in the age group 20 – 25.

23% had experienced online identity theft themselves or knew friends who had.

50

Experiments and Analysis

Page 51: OCEAN: Open-source Collation of eGovernment data And Networks: Understanding Privacy Leaks in Open Government Data

Evaluation Metric I - Privacy Score

The privacy score measures the risk associated with a person on the basis of how much PII about that person is revealed from open government data sources.

Privacy score (user) = Σ Sensitivity score (attributes)

Sensitivity score -> {1, 2, 3, 4, 5}

Range (% of users unwilling to share) | Level
<20 % | 1
21 – 30 % | 2
31 – 50 % | 3
51 – 60 % | 4
>61 % | 5

Page 52: OCEAN: Open-source Collation of eGovernment data And Networks: Understanding Privacy Leaks in Open Government Data

Privacy Score

[Scale: Level 5 to Level 1, in order of increasing willingness to share]

Attribute | Percentage of users unwilling to share personal information with anyone | Privacy Level
Voter ID | 56.4% | 4
Driving licence no. | 58% | 4
PAN | 67.7% | 5
Full name | 14.5% | 1
Home address | 82.25% | 5
Age | 29% | 2
DOB | 50% | 3
Father’s name | 38.7% | 3
Gender | 14.5% | 1
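The privacy levels in this table follow from the percentage ranges given for the sensitivity score (<20% is level 1, 21–30% is 2, 31–50% is 3, 51–60% is 4, >61% is 5). The sketch below applies those ranges to the survey percentages. The slide ranges leave small gaps (for example, exactly 20% or between 60% and 61%); treating each upper bound as inclusive is an assumption made here so that every percentage maps to a level.

```python
def sensitivity_level(percent_unwilling: float) -> int:
    """Map the share of respondents unwilling to share an attribute to a level.

    Upper bounds are treated as inclusive; this boundary handling is an
    assumption, since the slide's ranges do not cover every value.
    """
    if not 0 <= percent_unwilling <= 100:
        raise ValueError("percentage must be between 0 and 100")
    if percent_unwilling <= 20:
        return 1
    if percent_unwilling <= 30:
        return 2
    if percent_unwilling <= 50:
        return 3
    if percent_unwilling <= 60:
        return 4
    return 5


# Percentages taken from the survey table on the slide.
SURVEY_PERCENT_UNWILLING = {
    "voter_id": 56.4,
    "driving_licence_no": 58.0,
    "pan": 67.7,
    "full_name": 14.5,
    "home_address": 82.25,
    "age": 29.0,
    "dob": 50.0,
    "fathers_name": 38.7,
    "gender": 14.5,
}

levels = {
    attribute: sensitivity_level(pct)
    for attribute, pct in SURVEY_PERCENT_UNWILLING.items()
}

if __name__ == "__main__":
    for attribute, level in levels.items():
        print(f"{attribute}: {level}")
```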

Page 53: OCEAN: Open-source Collation of eGovernment data And Networks: Understanding Privacy Leaks in Open Government Data

Privacy Score

53

Experiments and Analysis

Privacy score for 84,22,459 users:

Case 1: Users having only Voter ID (97.3%)

PS = Σ(Voter ID, name, father’s name, age, gender, address) = 16

Case 2: Users having only Driving licence number (2%)

PS = Σ(DL number, name, relative’s name, DOB, address) = 17

Case 3: Users having only PAN (1%)

PS = Σ(PAN, DL number, name, relative’s name, DOB, address) = 25

Page 54: OCEAN: Open-source Collation of eGovernment data And Networks: Understanding Privacy Leaks in Open Government Data

Privacy Score

54

Experiments and Analysis

Case 4: Users having Voter ID and DL number (0.07%)

PS = Σ(Voter ID, DL number, name, father’s name, age, gender, DOB, address) = 24

Case 5: Users having Voter ID, DL number and PAN (0.02%)

PS = Σ(Voter ID, DL number, PAN, name, father’s name, age, gender, DOB, address) = 29

1,693 people: Highest Risk!
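The privacy score is defined as the sum of the sensitivity scores of the attributes revealed for a user. The sketch below implements that sum using the privacy levels from the survey table. With these table values, Case 1 (Voter ID only) reproduces the reported total of 16. The totals reported for Cases 2 to 5 are not exactly reproduced by the table values (for example, the table values give 28 rather than 29 for Case 5), so the exact levels used for those cases cannot be recovered from the transcript and the mapping here should be read as an assumption. The attribute keys are hypothetical names.

```python
from typing import Iterable

# Privacy levels as listed in the survey table on the slides.
SENSITIVITY = {
    "voter_id": 4,
    "dl_number": 4,
    "pan": 5,
    "name": 1,
    "address": 5,
    "age": 2,
    "dob": 3,
    "fathers_name": 3,
    "gender": 1,
}


def privacy_score(attributes: Iterable[str]) -> int:
    """Sum the sensitivity scores of the attributes revealed for a user."""
    return sum(SENSITIVITY[attribute] for attribute in attributes)


# Case 1 from the slides: users present only in the Voter ID database.
CASE_1 = ["voter_id", "name", "fathers_name", "age", "gender", "address"]

if __name__ == "__main__":
    print(privacy_score(CASE_1))  # 16
```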

Page 55: OCEAN: Open-source Collation of eGovernment data And Networks: Understanding Privacy Leaks in Open Government Data

Evaluation Metric II

Recall (based on user study)

\[
\text{Recall} = \frac{\text{Number of people who could be identified in the system}}{\text{Total number of search operations done on the system}}
\]

Thus, Recall = 179 / 389 = 46%

Reasons for low recall:

Data collection is not 100% complete (out of 12 million voter records, we have ~8 million records).

Respondents might be unclear about their constituency.

Page 56: OCEAN: Open-source Collation of eGovernment data And Networks: Understanding Privacy Leaks in Open Government Data

Evaluation Metric III

56

Evaluation Metrics

System Usability Scale (SUS)

Measured using the standard method defined by Brooke (1996).

For OCEAN, the value was 74.5 / 100, which means that people found the system usable and convenient to use.
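For reference, the standard SUS calculation takes ten responses on a 1 to 5 scale: each odd-numbered item contributes its response minus 1, each even-numbered item contributes 5 minus its response, and the sum is multiplied by 2.5 to give a score out of 100. The sketch below implements that scoring for a single respondent; a study-level value such as the 74.5 reported here would be an average over respondents. The individual responses behind that figure are not in the slides, so the sample inputs are hypothetical.

```python
from statistics import mean
from typing import Sequence


def sus_score(responses: Sequence[int]) -> float:
    """Compute one respondent's SUS score (0 to 100) from ten 1-5 ratings."""
    if len(responses) != 10:
        raise ValueError("SUS needs exactly 10 responses")
    total = 0
    for item_number, response in enumerate(responses, start=1):
        if not 1 <= response <= 5:
            raise ValueError("each response must be between 1 and 5")
        if item_number % 2 == 1:
            total += response - 1   # odd items are positively worded
        else:
            total += 5 - response   # even items are negatively worded
    return total * 2.5


def mean_sus(all_responses: Sequence[Sequence[int]]) -> float:
    """Average the per-respondent SUS scores."""
    return mean(sus_score(r) for r in all_responses)


if __name__ == "__main__":
    # Hypothetical respondents, not data from the study.
    neutral = [3] * 10
    best = [5, 1, 5, 1, 5, 1, 5, 1, 5, 1]
    print(sus_score(neutral), sus_score(best), mean_sus([neutral, best]))
```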

Page 57: OCEAN: Open-source Collation of eGovernment data And Networks: Understanding Privacy Leaks in Open Government Data

User Awareness

The government started various open initiatives to increase the level of transparency with citizens.

However, only 19% of survey respondents were aware of them.

Around 76% had been using them for less than 2 years.

Proper schemes are required to publicize their existence.

57

Experiments and Analysis

Page 58: OCEAN: Open-source Collation of eGovernment data And Networks: Understanding Privacy Leaks in Open Government Data

User Experience

A majority (62%) were shocked to see the extent to which personal information is available.

People felt that the information can be used maliciously against them.

People now feel scared to share their information with various government departments.

58

Experiments and Analysis

Page 59: OCEAN: Open-source Collation of eGovernment data And Networks: Understanding Privacy Leaks in Open Government Data

User Expectations

59

Experiments and Analysis

Page 60: OCEAN: Open-source Collation of eGovernment data And Networks: Understanding Privacy Leaks in Open Government Data

Feedback

60

Feedback

“It was an eye-opener to a common man.”

“I am really shocked that the exact ID numbers are available online without much security against data mining at this scale.”

“Waiting for an upgraded version which will work for other states also.”

“Good system. Great work! Didn't know such a system existed.”

“A great shortcoming and security flaw has been pointed out by OCEAN. Great work.”

Page 61: OCEAN: Open-source Collation of eGovernment data And Networks: Understanding Privacy Leaks in Open Government Data

Presentation Outline


Research Motivation and Aim

Related Work

Research Contribution

Methodology

Experiments and Analysis

Conclusion

Future Work

Questions

Page 62: OCEAN: Open-source Collation of eGovernment data And Networks: Understanding Privacy Leaks in Open Government Data

Conclusion

A large amount of personal information is available on government servers.

Information aggregation yields more information about a person.

Threat modelling on open government data sources shows the risk associated with PII leakage and the need for preventive measures.

1,693 users are most vulnerable to identity theft risks.

People felt the need for access control on the data and for proper privacy laws against the misuse of information.

62

Conclusion

Page 63: OCEAN: Open-source Collation of eGovernment data And Networks: Understanding Privacy Leaks in Open Government Data

Presentation Outline


Research Motivation and Aim

Related Work

Research Contribution

Methodology

Experiments and Analysis

Conclusion

Future Work

Questions

Page 64: OCEAN: Open-source Collation of eGovernment data And Networks: Understanding Privacy Leaks in Open Government Data

Future Work

Datasets can be extended to other states in India.

Mapping users across the offline (govt. databases) and online (social networking sites) worlds.

Data collection can be expanded to improve the recall.

64

Future Work

Page 65: OCEAN: Open-source Collation of eGovernment data And Networks: Understanding Privacy Leaks in Open Government Data

Acknowledgments

Mayank Gupta, B.Tech, DCE

Niharika Sachdeva, PhD, IIIT-Delhi

Precog members, friends and family

65

Future Work

Page 66: OCEAN: Open-source Collation of eGovernment data And Networks: Understanding Privacy Leaks in Open Government Data

References

Kumaraguru, P., and Sachdeva, N. Privacy in India: Attitudes and Awareness V 2.0. Tech. rep., PreCog-TR-12-001, PreCog@IIIT-Delhi, 2012. http://precog.iiitd.edu.in/research/privacyindia/

McCallister, Erika, Tim Grance, and Karen Scarfone. "Guide to protecting the confidentiality of personally identifiable information (PII) (draft), January 2009." NIST Special Publication 800-122.

Schwartz, Paul M., and Daniel J. Solove. "The PII Problem: Privacy and a New Concept of Personally Identifiable Information." NYUL Rev. 86 (2011): 1814.

Mont, Marco Casassa, Siani Pearson, and Pete Bramhall. "Towards accountable management of identity and privacy: Sticky policies and enforceable tracing services." Database and Expert Systems Applications, 2003. Proceedings. 14th International Workshop on. IEEE, 2003.

Jones, Rosie, et al. "I know what you did last summer: query logs and user privacy." Proceedings of the sixteenth ACM Conference on Information and Knowledge Management. ACM, 2007.

66

Page 67: OCEAN: Open-source Collation of eGovernment data And Networks: Understanding Privacy Leaks in Open Government Data

References (I)

Nashash, Hyam. "Education as a Building Block in Opening Up Government Data." European Scientific Journal 9.13 (2013).

Barber, Grayson. "Personal Information in Government Records: Protecting the Public Interest in Privacy." St. Louis U. Pub. L. Rev. 25 (2006): 63.

Krishnamurthy, Balachander, and Craig E. Wills. "On the leakage of personally identifiable information via online social networks." Proceedings of the 2nd ACM workshop on Online social networks. ACM, 2009.

Jurgens, David. "That’s What Friends Are For: Inferring Location in Online Social Media Platforms Based on Social Relationships." Seventh International AAAI Conference on Weblogs and Social Media. 2013.

Zheleva, Elena, and Lise Getoor. "To join or not to join: the illusion of privacy in social networks with mixed public and private user profiles." Proceedings of the 18th international conference on World wide web. ACM, 2009.

67

Page 68: OCEAN: Open-source Collation of eGovernment data And Networks: Understanding Privacy Leaks in Open Government Data

References (II)

Mislove, Alan, et al. "You are who you know: inferring user profiles in online social networks." Proceedings of the third ACM international conference on Web search and data mining. ACM, 2010.

Harel, Amir, et al. "M-score: estimating the potential damage of data leakage incident by assigning misuseability weight." Proceedings of the 2010 ACM workshop on Insider threats. ACM, 2010.

Wright, Glover, Pranesh Prakash, Sunil Abraham, and Nishant Shah. "Open government data study: India." Study commissioned by the Transparency and Accountability Initiative (2010).

Godse, Mr Vinayak, and Director–Data Protection. "RISE PROJECT." (2010).

Brooke, John. "SUS-A quick and dirty usability scale." Usability evaluation in industry 189 (1996): 194.

Social media report 2012: Social media comes of age. http://www.nielsen.com/us/en/reports/2012/state-of-the-media-the-social-media-report-2012.html

68

Page 69: OCEAN: Open-source Collation of eGovernment data And Networks: Understanding Privacy Leaks in Open Government Data

69

Thank You!

Page 70: OCEAN: Open-source Collation of eGovernment data And Networks: Understanding Privacy Leaks in Open Government Data

70

Questions?

Page 71: OCEAN: Open-source Collation of eGovernment data And Networks: Understanding Privacy Leaks in Open Government Data

71

For any further information, please write to

[email protected]

precog.iiitd.edu.in