l8: introduction to privacy-preserving computations

L8: Introduction to privacy-preserving computationsPrivacy-preserving Technologies / LTAT.04.007

Dan Bogdanovdan.bogdanov@cyber.ee

Motivation – why privacy-preserving computing?

Example: Linking Tax And Education Data

Regulatory Barriers On Data Linking

Using privacy technologies to solve it

Source data:10 million tax records,600 000 education records.

Each record upload using secret sharing(think: “encryption”)Records linked and processed using secure multi-party computation (think: “data not decrypted for processing”) Data never existed outside the source in an unencrypted state.Solution based on Sharemind MPC. Cybernetica

Educationrecords

Employmenttax records

Estonian Information

System's Authority

Ministry of Finance

IT Center

Ministry of Education and Research

Estonian Tax and Customs Board

Tax and Customs

Employmenttax payments

Ministry of Education

and Science

Higher studyevents

Monthly income

University career of a person

Aggregate by person

Average yearly income

Aggregate by year

Employment record of a person

Complete record of a person

Merge by person's ID

Analysis table

Compute additional attributes and

align tax payments Extract data

Extract data

Higher studyevents

Secret shareand upload

Employmenttax payments

Expand by years and aggregate by person

Aggregate by month

Data stored with secret sharing andprocessed with secure multi-party computation

Analysis results

Recoverresults from

shares

Statisticalanalyst

Sharemind-powered Analytics

Data scientists used analytics tools based on secure multi-party computation.The MPC system also prevented queries outside the study plan.Reports were given to industry, universities and the government.Result: no clear relation between working during studies and not graduating.

DataAnalyst

UniversitiesCompanies

Policymakers

Cybernetica

Estonian Information

System's Authority

Ministry of Finance

IT Center

A privacy-preserving statistics tool inspired by R

2. Tulemused Tulemused kinnitavad, et nominaalajaga lõpetajate osakaal on madal tudengite hulgas üldiselt ja IKT-tudengite hulgas eriti. IKT-tudengite hulgas varieerub nominaalajaga lõpetajate osakaal bakalaureuse-õppes 20 protsendi piirimail, mis on madalam kui muude õppekavade tudengite vastav number (vt Joonis 1). Samasugune tendents ilmneb rakenduskõrgharidusõppe puhul. Magistriõppes on nominaalajaga lõpetajate osakaal veidi kõrgem, varieerudes sõltuvalt aastast 30% ja 40% vahel, kuid siingi on IKT õppurite hulgas see madalam kui teistel.

Joonis 1. Nominaalajaga lõpetajate osakaal immatrikuleerimisaastate lõikes, IKT- ja mitte-IKT õppekavad, bakalaureuseõpe

Meessoost tudengid lõpetavad nominaalaja jooksul väiksema tõenäosusega kui naistudengid (vt Joonis 2, Joonis 3). IKT tudengite madalam nominaalajaga lõpetamise tõenäosus ilmneb mõlema soo puhul.

Non-IT graduation rate is around 40%

IT graduation rate is around 20%

Kooliti on nominaalajaga lõpetavate IKT-tudengite osakaal bakalaureuseõppes kõrgeim Tartu Ülikoolis, järgnevad TTÜ ja TLÜ. Rakenduskõrghariduses on kõrgeim nominaalajaga lõpetajate osakaal TLÜ-s, järgnevad Infotehnoloogia Kolledž ja TTÜ. Magistriõppes on kõrgeim nominaalajaga lõpetajate osakaal TÜ-s, järgnevad TTÜ ja TLÜ (järjestus varieerub aastati). ˇ

Nominaalaja jooksul töötamist vaadates selgub üllatuslikult, et IKT-tudengid ei tööta õpingute ajal rohkem kui mitte-IKT õppekavadel õppivad tudengid. Bakalaureuseõppes on kõigi õppeaastate lõikes enamikul aastatel mitte-IKT õppekavade tudengite hulgas tööhõive määr kõrgem kui IKT-tudengitel (vt Joonis 4). Sama on järeldus rakenduskõrghariduse õppurite osas. Magistriõppes, kus tööhõive määrad ületavad 80%, on aga tulemus vastupidine: IKT-tudengite hulgas on tööhõive määr kõrgem kui mitte-IKT õppekavade õppuritel.

Joonis 4. Nominaalaja jooksul töötanud tudengite osakaal kõigist tudengitest aastati, IKT- ja mitte-IKT õppekavad, bakalaureuseõpe

Naissoost tudengite tööhõive määr on mõnevõrra kõrgem kui meessoost õppuritel, seda nii IKT- kui mitte-IKT tudengite hulgas (Joonis 5, Joonis 6). Soolised erinevused hõivemäärades varieeruvad aastati – on aastaid, kus erinevus on märkimisväärne, ning aastaid, kus olulist erinevust pole.

Non-IT and IT students have similar employment ratios, but IT students lost more in the financial crisis

Regulatory status of the project

In an official response, after a study of the system, the Estonian DPA suggested that

neither the hosts of the servers running the statisticsnor the analysts making the queriescould feasibly re-identify individuals in the source database (this was pre-GDPR).

The Internal Supervision Department of the Tax and Customs Board agreed to provide unmodified tax records after a code and process review.Follow-up legal review in the FP7 PRACTICE by a research from the University of Göttingen suggested that the same precedent could hold under GDPR as well.

DATA-DRIVEN SERVICES ON CONFIDENTIAL DATA

A general model for privacy-preserving computing

Concept of secure computing

encrypteddatabase

standard tools

secure computing

When a standard computerencrypts data, it must bedecrypted before analysis

Secure computing systemscan analyze data without removing the encryption.

Extended definition of secure multi-party computation

Input parties

Computing parties

Result parties

Step 1: upload and storage of inputs

Step 3: publishingof results

Step 2: secure

computing

Technique: property-preserving cryptography

Analogy: symmetric crypto that preserves a relation on inputs (e.g., order, equality).Pros:

Low performance overhead.Fits well into existing systems.

Cons: Only allows a few operations (e.g., only equality comparison or ordering).Multi-user systems are a challenge, but can be done with proxy re-encryption.

Technique: homomorphic encryption

Analogy: asymmetric crypto that allows addition and multiplication of ciphertexts.Pros:

Fits well into existing systems.Cons:

High performance overhead.Multi-user systems are a challenge, but can be done with proxy re-encryption.

Analogy: cryptographic versions of electrical circuits.Pros:

Flexible programming model. Cons:

Medium performance overhead.Fixed number of parties (can be solved by combining with other techniques).

Technique: garbled circuits

Example: millionaire’s problem

Analogy: give a number of people a random piece of each secret value and let them collaborate to compute results.Pros:

Low-to-medium performance overhead.Flexible programming model.

Cons: Distributed deployments do not fit into all existing systems.

Technique: secret sharing

Analogy: think of a computer process that hides the data from its ownerPros:

Minimal performance overhead.Relatively easy to convert applications to work on trusted execution environments

Cons: Side-channel attack mitigations are complicated to implement.

Technique: trusted execution environments

Lecture exercise: modelling parties for a COVID-19 social distancing tracking application

Think of an application that would support social distancing and limit infection rates. Write down very clearly, what is the expected benefit of the system.

Write down the list of input parties and the data they would provide.Write down the list of computing parties and describe the kind of processing they would perform. Write down the list of result parties and describe the outputs they would receive.

Bonus tasks, time permitting:Think of minimizing personal data processing using process redesign.See if any of the secure computing paradigms described above could support your application.

Prepare in 12 minutes and then we’ll have 1-2 students present their ideas.

Lecture task

Programmable privacy-preserving computations

A protection domain kind (PDK) is a set of data representations, algorithms and protocols for storing and computing on protected data.Examples:

SMC based on secret sharing,SMC based on garbled circuits,(fully) homomorphic encryption,trusted hardware (e.g., Intel SGX).

PDK as an abstraction of a secure computing paradigm

A protection domain (PD) is a set of data that is protected with the same resources and for which there is a well-defined set of algorithms and protocols for computing on that data while keeping the protection.Examples:

data held by a fixed group of servers performing secure multi-party computation,data encrypted under a fixed key of a homomorphic encryption scheme.

Protection domain as an instance of a PDK

Secureprimitiveoperations

Application

• private outputs from private inputs,

• have privacy proofs,• remain private under

sequential or parallel composition,

• optimized to have a low resource footprint.

• publish selected results tomake system useful,

• do not leak private inputs or show leakage as acceptable,

• compositions of secureprimitive operations,

• optimize for runningtime.

Privacy-preservingalgorithms

Application logic

Application model for privacy-preserving computing

We pick frequent itemset mining as a problem of choice.Frequent itemset mining is a data mining problem that helps with shopping basket analysis and the simplest kinds of recommender systems.

What kind of things do people buy from stores together most often?If the service provider knows this, they can recommend one to a customer who is planning to buy the other.

The simpler algorithms include Apriori (breadth-first search) and Eclat (depth-first search).We will know look at the basic primitive of frequent itemset mining and then build a privacy-preserving approach.

Converting an algorithm to a privacy-preserving one

t1 rendang nasi lemak chicken satay

t2 nasi lemak lontong

t3 chicken satay

t4 rendang nasi lemak

t5 nasi lemak chicken satay

t6 nasi lemak chicken satay

t7 lontong

rendang nasi lemak lontong chicken

t1 1 1 0 1t2 0 1 1 0t3 0 0 0 1t4 1 1 0 0t5 0 1 0 1t6 0 1 0 1t7 0 0 1 0

Private data representations are the key towarddesaigning privacy-preserving algorithms.

Privacy-preserving data representations

The data representation allows for very efficientcalculation of item supports.

t1 1 1 0 1t2 0 1 1 0t3 0 0 0 1t4 1 1 0 0t5 0 1 0 1t6 0 1 0 1t7 0 0 1 0

nasi lemak

1101110

∑ = 5

Calculating the support of an item

Checking the joint support of a pair of itemssimply requires a multiplication

chicken satay

1010110

t1 1 1 0 1t2 0 1 1 0t3 0 0 0 1t4 1 1 0 0t5 0 1 0 1t6 0 1 0 1t7 0 0 1 0

nasi lemak

1101110

∑ = 3

xxxxxxx

=======

nasi lemak & chicken satay

1000110

Calculating support for a set of items

Depth-first search would be intuitive for pruning.

{ rendang } { nasi lemak } { lontong } { chicken satay }

rendang, nasi lemak{ } rendang,

lontong{ } rendang, chicken satay{ } nasi lemak,

lontong{ } nasi lemak, chicken satay{ } lontong,

chicken satay{ }rendang, nasi lemak,lontong{ } rendang,

nasi lemak,chicken satay{ } nasi lemak,

lontongchicken satay{ }rendang,

lontong,chicken satay{ }

rendang, nasi lemak,lontongchicken satay{ }

Evaluating itemsets with a depth-first strategy

However, breadth-first search can be done in parallel.

{ rendang } { nasi lemak } { lontong } { chicken satay }

rendang, nasi lemak,lontong{ } rendang,

nasi lemak,chicken satay{ } nasi lemak,

lontongchicken satay{ }rendang,

lontong,chicken satay{ }

rendang, nasi lemak,lontongchicken satay{ }

rendang, nasi lemak{ } rendang,

lontong{ } rendang, chicken satay{ } nasi lemak,

lontong{ } nasi lemak, chicken satay{ } lontong,

chicken satay{ }

Evaluating itemsets with a breadth-first strategy

Challenge: exploring all possible itemsets leads is slow due to combinatorial explosion.Pruning the search tree requires us to declassify itemset supports during computation (leak?).Solution: consider that the algorithm will publish all frequent itemsets, as that is its intended goal.We will compare support to the threshold privately, only declassifying the result bit.We will prune the search tree based on that bit.Not a leak - if the itemset is frequent, we would have learned it from the outputs anyway.

Balancing optimizations with privacy preservation

l8: introduction to privacy-preserving computations

Documents

l8 grammar 4

kym363 mÜhendİslİk ekonomİsİ l8 -...

collisions (l8)

privacy-preserving access control and computations of...

l8 design1

l8 homefront

l8 entropy

dancing l8

2019 11...

l8 optimize

legionella l8 updates

03. roc l8 (25) & roc l8 (30)

l8 agrafia

spelling words l8

l8 classes

l8 womens rights

l8 - fqcms.com

)1 l8 ! sqn )1 l8 ! sqn profundidade máxima de escavação

l8 vendajes

atlas copco roc l8 prospect -...