Privacy Framework for RDF Data Mining
Master's Thesis Project Proposal
By: Yotam Aron
Overview
• Motivation and Goal
• Background
• Proposed Solution and Design
• Example
• Conclusion
Motivation
• Data mining continues to become more widespread.
  ◦ Useful for research, public policy, etc.
• We want to maintain the privacy of participants in the database.
• Little work has been done on privacy for semantic web data.
Previous Work
• Anonymization: k-anonymity [1].
• Differential privacy systems: PINQ [2], AIRAVAT [3].
• Drawbacks:
  ◦ They do not apply to semantic web data.
  ◦ They do not support SPARQL.
Goal
• Develop a system that protects dataset participants' personal data in SPARQL.
• Integrate well with existing SPARQL endpoints.
• Be relatively easy for both the user and the administrator to use.
Background
• Rule-based Privacy Policies in AIR
• Differential Privacy
Rule-based Privacy Policies in AIR [4]
• Rules define patterns in a SPARQL query.
• If a pattern is matched, the rule infers compliance or non-compliance of the incoming SPARQL query.
AIR Example [5]
AIR policy (extract):

    air:if {
        :W s:TriplePattern :T .
        :T log:includes { :X type:F :V } .
    } ;
    air:then [
        air:description ("type:F was selected in " q:QUERY) ;
        air:assert { q:QUERY air:non-compliant-with q:Policy4 . }
    ] .

Query:

    SELECT ?s WHERE { ?s type:F ?p }

AIR will show that the query is non-compliant with Policy4.
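To make the rule's logic concrete, here is a minimal Python analogue. It is not AIR, which reasons over N3 rules; it assumes the query's triple patterns have already been extracted as (subject, predicate, object) tuples, and the names `TriplePattern`, `FORBIDDEN_PREDICATES`, and `check_compliance` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TriplePattern:
    """One triple pattern from a SPARQL WHERE clause (variables keep their '?')."""
    subject: str
    predicate: str
    obj: str


# Hypothetical policy table: a predicate whose selection makes a query
# non-compliant, mapped to the policy it violates (mirrors the Policy4 rule).
FORBIDDEN_PREDICATES = {"type:F": "q:Policy4"}


def check_compliance(patterns: list[TriplePattern]) -> list[str]:
    """Return one justification per matched rule; an empty list means compliant."""
    violations = []
    for pattern in patterns:
        policy = FORBIDDEN_PREDICATES.get(pattern.predicate)
        if policy is not None:
            violations.append(
                f"{pattern.predicate} was selected in query: non-compliant with {policy}"
            )
    return violations


# SELECT ?s WHERE { ?s type:F ?p }
query_patterns = [TriplePattern("?s", "type:F", "?p")]
print(check_compliance(query_patterns))
# prints ['type:F was selected in query: non-compliant with q:Policy4']
```

Like the AIR rule, the check only looks at which patterns appear in the query, not at the data the query would return.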
Differential Privacy Overview
• Minimize the probability of a privacy breach.
• Maximize statistical accuracy.
• The definition requires that, given two similar datasets (differing in a single record), a randomized query returns each possible result with nearly the same probability on both.
• Makes no assumptions about the underlying dataset.
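Differential Privacy
Definition: We say a randomized computation M provides ɛ-differential privacy if for any two data sets A and B, and any set of possible outputs S ⊆ Range(M),

$$\Pr[M(A) \in S] \le \Pr[M(B) \in S] \times \exp(\varepsilon \times |A \oplus B|),$$

where |A ⊕ B| is the number of records in which A and B differ (the size of their symmetric difference).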
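As a worked check of this definition, consider the Laplace mechanism (a standard argument, not taken from this proposal). Assume a numeric query f whose sensitivity Δf bounds how much one record can change it, so |f(A) − f(B)| ≤ Δf × |A ⊕ B|, and let M(X) = f(X) + Lap(Δf/ɛ), with density p_X(z) = (ɛ/2Δf) exp(−ɛ|z − f(X)|/Δf). The triangle inequality gives a pointwise bound on the density ratio; integrating both sides over any S yields the inequality above:

$$
\frac{p_A(z)}{p_B(z)}
  = \exp\!\left(\frac{\varepsilon\,\bigl(|z - f(B)| - |z - f(A)|\bigr)}{\Delta f}\right)
  \le \exp\!\left(\frac{\varepsilon\,|f(A) - f(B)|}{\Delta f}\right)
  \le \exp\!\bigl(\varepsilon \times |A \oplus B|\bigr)
$$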
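Differential Privacy in Practice
• Each user is given an ɛ value that cannot be exceeded.
• Each query q_i consumes some amount ɛ_i of that budget. In total, the user's queries must satisfy

$$\sum_i \varepsilon_i \le \varepsilon.$$

• Noise (usually Laplace) is added to each answer. Its scale depends on the sensitivity Δf of the aggregate function, so the noise for q_i has variance

$$\operatorname{Var} = 2\left(\frac{\Delta f}{\varepsilon_i}\right)^2.$$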
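The two rules above (a per-user budget under sequential composition, and Laplace noise of scale Δf/ɛ_i per query) can be sketched in a few lines of Python. This is an illustrative sketch, not the PINQ or SPIM implementation; `EpsilonBudget`, `laplace_noise`, and `noisy_answer` are hypothetical names, and the noise uses the identity Lap(b) = b × (E1 − E2) for independent Exp(1) draws E1, E2.

```python
import random


class EpsilonBudget:
    """One user's privacy budget under sequential composition: sum of eps_i <= total."""

    def __init__(self, total: float) -> None:
        self.total = total
        self.spent = 0.0

    def can_spend(self, eps_i: float) -> bool:
        # Small tolerance guards against floating-point rounding when many eps_i add up.
        return eps_i > 0 and self.spent + eps_i <= self.total + 1e-12

    def spend(self, eps_i: float) -> None:
        if not self.can_spend(eps_i):
            raise ValueError(f"query needs eps={eps_i}, only {self.total - self.spent} left")
        self.spent += eps_i


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Lap(0, scale) as a difference of two Exp(1) draws; variance is 2 * scale**2."""
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))


def noisy_answer(true_value, sensitivity, eps_i, budget, rng):
    """Charge eps_i to the budget, then release true_value + Lap(sensitivity / eps_i)."""
    budget.spend(eps_i)
    return true_value + laplace_noise(sensitivity / eps_i, rng)


rng = random.Random(0)
budget = EpsilonBudget(total=2.0)
answer = noisy_answer(2, sensitivity=1.0, eps_i=1.0, budget=budget, rng=rng)
print(budget.spent)  # 1.0
```

Charging the budget before computing the answer means a refused query never touches the data.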
Limitations of Differential Privacy
• Only statistical data is protected.
• High variance in the data yields poor query results.
• Theory does not always hold up in practice:
  ◦ It assumes no collusion among users.
  ◦ Covert-channel attacks [6].
  ◦ What value of ɛ should be chosen?
Example, No DP

| Name | Salary |
|---|---|
| Alice | 31,000 |
| Bob | 47,000 |
| Charlie | 20,000 |
| David | 21,000 |

SELECT COUNT(Name) WHERE (Salary < 25,000)
Result: 2

Example, No DP (David's record removed)

| Name | Salary |
|---|---|
| Alice | 31,000 |
| Bob | 47,000 |
| Charlie | 20,000 |

SELECT COUNT(Name) WHERE (Salary < 25,000)
Result: 1. Removing one record causes a big difference in the answer, so comparing the two answers reveals that David's salary is under 25,000.

Example, With DP

| Name | Salary |
|---|---|
| Alice | 31,000 |
| Bob | 47,000 |
| Charlie | 20,000 |
| David | 21,000 |

SELECT COUNT(Name) WHERE (Salary < 25,000)
Result: 2 + noise ≈ 2 (with high probability)

Example, With DP (David's record removed)

| Name | Salary |
|---|---|
| Alice | 31,000 |
| Bob | 47,000 |
| Charlie | 20,000 |

SELECT COUNT(Name) WHERE (Salary < 25,000)
Result: 1 + noise, which can also come out ≈ 2.
The two noisy answers have nearly the same distribution, so an observer cannot reliably tell whether David's record is in the dataset.
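To quantify the example, this Python sketch computes in closed form how often the noisy count exceeds 1.5 (i.e. reads as "about 2") with and without David's record. It assumes Laplace noise for a single COUNT query with sensitivity 1 and ɛ = 1; that ɛ is an illustrative choice, not a value from the proposal.

```python
import math


def laplace_sf(x: float, loc: float, scale: float) -> float:
    """P[loc + Lap(scale) > x], the survival function of the Laplace distribution."""
    if x < loc:
        return 1.0 - 0.5 * math.exp((x - loc) / scale)
    return 0.5 * math.exp(-(x - loc) / scale)


EPSILON = 1.0      # assumed per-query budget for this illustration
SENSITIVITY = 1.0  # COUNT changes by at most 1 when one row is added or removed
scale = SENSITIVITY / EPSILON

# Probability that the released count exceeds 1.5 in each world.
p_with_david = laplace_sf(1.5, loc=2, scale=scale)     # true count 2
p_without_david = laplace_sf(1.5, loc=1, scale=scale)  # true count 1

print(round(p_with_david, 3), round(p_without_david, 3))  # 0.697 0.303
# The ratio stays within the exp(epsilon) factor allowed by the definition.
print(p_with_david / p_without_david <= math.exp(EPSILON))  # True
```

Even without David, roughly 30% of releases look like "2", and the likelihood ratio between the two worlds is bounded by e^ɛ, which is what makes the records hard to distinguish.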
Practical Consequences of DP
• An individual's inclusion in the dataset is unlikely to pose a privacy risk.
• The answers to queries can still be useful.
Achieving Differential Privacy in RDF
• Current differential privacy techniques were developed for relational databases.
• As a first approximation, reduce the triple store to a relational database.
• Improve the mechanism as the project progresses.
Example of RDF-to-RDBMS Reduction

    :Person1 foaf:name "Alice" ;
        foaf:member :DIG ;
        foaf:age "21" ;
        foaf:knows :Person2, :Person3 .
    :Person2 foaf:name "Bob" ;
        foaf:member :DIG ;
        foaf:knows :Person3 .
    :Person3 foaf:name "Charlie" ;
        foaf:age "22" .

| ID | foaf:name | foaf:member | foaf:knows | foaf:age |
|---|---|---|---|---|
| Person1 | "Alice" | DIG | [Person2, Person3] | "21" |
| Person2 | "Bob" | DIG | [Person3] | None |
| Person3 | "Charlie" | None | None | "22" |
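The reduction above can be sketched in Python: group triples by subject and turn each predicate into a column. This is a hypothetical helper (`triples_to_rows`), not the proposal's code; it assumes the caller says which predicates are multi-valued (here only foaf:knows, matching the table) and that every other predicate has at most one object per subject.

```python
from collections import defaultdict

# The example graph above as (subject, predicate, object) triples.
TRIPLES = [
    (":Person1", "foaf:name", "Alice"),
    (":Person1", "foaf:member", ":DIG"),
    (":Person1", "foaf:age", "21"),
    (":Person1", "foaf:knows", ":Person2"),
    (":Person1", "foaf:knows", ":Person3"),
    (":Person2", "foaf:name", "Bob"),
    (":Person2", "foaf:member", ":DIG"),
    (":Person2", "foaf:knows", ":Person3"),
    (":Person3", "foaf:name", "Charlie"),
    (":Person3", "foaf:age", "22"),
]
COLUMNS = ["foaf:name", "foaf:member", "foaf:knows", "foaf:age"]


def triples_to_rows(triples, columns, list_columns):
    """One row per subject, one column per predicate; missing values become None.

    Predicates in list_columns keep every object as a list; the others are
    assumed single-valued and keep their first object.
    """
    grouped = defaultdict(lambda: defaultdict(list))
    for subject, predicate, obj in triples:
        grouped[subject][predicate].append(obj)

    rows = {}
    for subject, by_predicate in grouped.items():
        row = {}
        for column in columns:
            objects = by_predicate.get(column, [])
            if not objects:
                row[column] = None
            elif column in list_columns:
                row[column] = objects
            else:
                row[column] = objects[0]
        rows[subject] = row
    return rows


for subject, row in triples_to_rows(TRIPLES, COLUMNS, {"foaf:knows"}).items():
    print(subject, row)
```

Multi-valued predicates are the main friction in this reduction: a list-valued cell does not fit the flat rows that relational differential privacy tools expect, which is one reason the slide calls this only a first approximation.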
Proposed Solution: SPARQL Privacy Insurance Module (SPIM)
• Build a layer between the user and the endpoint.
• Integrate both AIR and differential privacy.
• Integrate a credential-checking system.
• Modify an existing differential privacy framework for use with triple stores.
Contributions
• Complete privacy protection for triple stores.
• Differential privacy sensitivity for the SPARQL 1.1 aggregate functions, including count, sum, avg, min, and max.
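The proposal does not spell these sensitivities out. As a hedged sketch, the standard global sensitivities under add/remove-one-row neighbors are shown below, assuming every aggregated value has been clamped to a known range [lower, upper] and, for MIN/MAX, that the result set is never empty. `aggregate_sensitivity` is a hypothetical name. AVG is left out because its sensitivity depends on the number of rows; one common workaround is to divide a noisy SUM by a noisy COUNT.

```python
def aggregate_sensitivity(aggregate: str, lower: float, upper: float) -> float:
    """Global sensitivity of a SPARQL aggregate when one row is added or removed.

    Assumes all aggregated values are clamped to [lower, upper]; MIN/MAX also
    assume a non-empty result set.
    """
    aggregate = aggregate.upper()
    if aggregate == "COUNT":
        return 1.0  # one row changes the count by at most 1
    if aggregate == "SUM":
        return max(abs(lower), abs(upper))  # the added/removed value itself
    if aggregate in ("MIN", "MAX"):
        return upper - lower  # the extreme can move anywhere inside the range
    raise ValueError(f"no simple global sensitivity for {aggregate}")


print(aggregate_sensitivity("count", 0, 100))  # 1.0
print(aggregate_sensitivity("SUM", -5, 100))   # 100
```

Note that MIN/MAX get noise on the scale of the whole value range, so their noisy answers are usually much less useful than COUNT or SUM.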
System Overview
Components (system diagram):
• SPIM Privacy Module
• TAAC Credential Checking
• AIR Rule Based Privacy
• Differential Privacy Module
• SPARQL Endpoint
• User Interface
• Policy Files
• User Data
• Service Description
• TAAC will:
  ◦ Verify that the user has permission to access the data.
  ◦ Send the central module data about the user.
• SPIM:
  ◦ Controls the order of privacy operations.
  ◦ Interfaces with the SPARQL endpoint.
• AIR:
  ◦ A reasoner that uses rule-based policies to check queries for privacy hazards.
  ◦ Extracts information for differential privacy.
• Policy Files:
  ◦ Contain the rules for AIR.
• Differential Privacy Module:
  ◦ Checks query limits (based on ɛ use).
  ◦ Applies noise to statistical data.
• User Data:
  ◦ Contains each user's ɛ data.
• Service Description:
  ◦ Contains information used when adding noise.
• Miscellaneous:
  ◦ Interface to SPARQL Endpoint
  ◦ Transaction File
  ◦ Improved Differential Privacy Output
  ◦ Service Description Generator
• Potential Extensions:
  ◦ Robustness against attacks
  ◦ Concurrency
  ◦ Optimization for large systems
  ◦ Customizable UI
  ◦ Accountability
Sample Scenario
• Triple-store data mining in biotechnological applications.
• Biofirm provides data about hospitals in the US.
• Alice is a PhD student at MIT.
• Alice would like to query Biofirm's database for research purposes. She got permission yesterday and is logging in for the first time.
Preprocessing
• Biofirm installs SPIM and runs the service description generation code.
  ◦ It may need to create the correct interface.
• Biofirm makes sure the UI is accessible online.
Sample Compliant Query
• Alice would like to know the total number of visits that Boston hospitals received.

    SELECT (SUM(?s) AS ?people)
    WHERE {
      ?h a biofirm:Hospital .
      ?h biofirm:visits ?s .
      ?h biofirm:location geo:Boston .
    }

Epsilon value: 1.0
• Alice enters the query into the provided user interface.
• TAAC ensures that Biofirm has given Alice access to its triple store.
• The query request arrives at the SPIM central module.
• Policyrunner is called to check the query for triple patterns that violate a policy.
• No violations are found.
• Since this is Alice's first time, AIR extracts what type of permissions Alice has.
• SPIM creates a profile for Alice.
  ◦ Gives her an ɛ value (suppose it is 2.0).
  ◦ Stores it in the triple store.
• SPIM extracts which variables will yield statistical results and will have differential privacy applied.
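How SPIM finds these variables is not specified here. As a rough, hypothetical sketch, a regular expression can pull `(AGG(?var) AS ?alias)` projections out of a SELECT clause; a real implementation would use a proper SPARQL parser, and `statistical_variables` is an invented name.

```python
import re

# Matches "(AGG(?var) AS ?alias)" projections; a heuristic, not a SPARQL parser.
AGGREGATE_PATTERN = re.compile(
    r"\(\s*(COUNT|SUM|AVG|MIN|MAX)\s*\(\s*(?:DISTINCT\s+)?(\?\w+|\*)\s*\)\s+AS\s+(\?\w+)\s*\)",
    re.IGNORECASE,
)


def statistical_variables(query: str) -> list[tuple[str, str, str]]:
    """Return (aggregate, input variable, output variable) for each aggregate projection."""
    return [(agg.upper(), var, alias) for agg, var, alias in AGGREGATE_PATTERN.findall(query)]


alice_query = (
    "SELECT (SUM(?s) as ?people) WHERE{?h a biofirm:Hospital.?h biofirm:visits ?s."
    "?h biofirm:location geo:Boston.}"
)
print(statistical_variables(alice_query))  # [('SUM', '?s', '?people')]
```

The output variable (?people) is the one that receives noise; the aggregate name selects which sensitivity applies.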
• The Differential Privacy Module checks that running the query will not exceed Alice's remaining ɛ value.
• This is Alice's first time: her ɛ value is 2.0 and this query costs 1.0, so the query is allowed.
• Query is sent to the endpoint.
• Results are received.
• The Differential Privacy Module adds noise to the appropriate fields and updates Alice's ɛ value (2.0 − 1.0 = 1.0 remaining).
• SPIM is ready to return the results.
• Alice receives results.
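The walkthrough's order of operations can be condensed into a single hypothetical Python class. Every component is a stand-in: a set of user names for TAAC, a callable for AIR/Policyrunner, a callable for the SPARQL endpoint, and a dict for User Data. `SpimSketch` and its parameters are invented for illustration, and the endpoint result and sensitivity in the usage are placeholders, not Biofirm data.

```python
import random
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class SpimSketch:
    """Hypothetical stand-in for SPIM's central module, following the walkthrough order."""

    authorized_users: set[str]                # stands in for TAAC credential checking
    policy_check: Callable[[str], list[str]]  # stands in for AIR/Policyrunner; returns violations
    run_query: Callable[[str], float]         # stands in for the SPARQL endpoint (one aggregate)
    default_epsilon: float = 2.0              # budget given to a first-time user
    budgets: dict[str, float] = field(default_factory=dict)  # stands in for User Data
    rng: random.Random = field(default_factory=random.Random)

    def handle(self, user: str, query: str, eps_i: float, sensitivity: float) -> float:
        # 1. TAAC: is the user allowed to access the triple store at all?
        if user not in self.authorized_users:
            raise PermissionError(f"{user} has no access to this endpoint")
        # 2. AIR: reject queries whose triple patterns violate a policy.
        violations = self.policy_check(query)
        if violations:
            raise PermissionError("; ".join(violations))
        # 3. First visit: create a profile holding the default epsilon budget.
        remaining = self.budgets.setdefault(user, self.default_epsilon)
        # 4. Differential Privacy Module: refuse queries that would exceed the budget.
        if eps_i <= 0 or eps_i > remaining:
            raise PermissionError(f"query needs eps={eps_i}, only {remaining} left")
        # 5. Send the query to the endpoint and receive the exact result.
        true_value = self.run_query(query)
        # 6. Add Laplace(sensitivity / eps_i) noise (difference of two Exp(1) draws)
        #    and charge eps_i to the user's budget.
        noise = (sensitivity / eps_i) * (self.rng.expovariate(1.0) - self.rng.expovariate(1.0))
        self.budgets[user] = remaining - eps_i
        # 7. Return the noisy result to the user.
        return true_value + noise


alice_query = (
    "SELECT (SUM(?s) AS ?people) WHERE { ?h a biofirm:Hospital . "
    "?h biofirm:visits ?s . ?h biofirm:location geo:Boston . }"
)
spim = SpimSketch(
    authorized_users={"alice"},
    policy_check=lambda q: [],  # AIR finds no violations
    run_query=lambda q: 0.0,    # placeholder exact SUM, not real data
    rng=random.Random(0),
)
spim.handle("alice", alice_query, eps_i=1.0, sensitivity=1.0)  # placeholder sensitivity
print(spim.budgets["alice"])  # 1.0
```

Keeping each check as a separate early exit mirrors the slides' point that SPIM controls the order of privacy operations: nothing reaches the endpoint until credentials, policy, and budget have all passed.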
Summary
• The system will combine rule-based privacy with differential privacy.
• Develop differential privacy techniques for semantic web data.
• Make the privacy module client- and administrator-friendly.
References
1. k-Anonymity: http://spdp.dti.unimi.it/papers/k-Anonymity.pdf
2. PINQ: http://research.microsoft.com/pubs/80218/sigmod115-mcsherry.pdf
3. AIRAVAT: http://www.cs.utexas.edu/~shmat/shmat_nsdi10.pdf
4. AIR: http://dig.csail.mit.edu/TAMI/2008/12/AIR/
5. AIR Policy Example: http://dig.csail.mit.edu/2009/IARPA-PIR/usecase1/generic-policies.n3
6. Differential Privacy Under Fire: http://www.usenix.org/events/sec11/tech/full_papers/Haeberlen.pdf