analytics with data privacy and control · n1 analytics 23 | release your data without losing...
TRANSCRIPT
Confidential ComputingAnalytics with data privacy and control
Brian Thorne
Privacy Preserving LinkageMotivation
Computation
Result
Org A Org B
Mutually Trusted
Multi-Organisation Analytics Today
Confidential Computing | [email protected] |
Computation
Result
NOT Trusted!
Confidential Confidential
But many Opportunities are Blocked
Confidential Computing | [email protected] |
Entity Matching
Confidential Computing | [email protected]
Overview of a typical
data integration
project within GOV
6 |
Privacy-preserving entity resolution
● Goal: match corresponding rows in two distinct databases
● Constraint: can’t share Personally Identifiable Information (PII)
● Solution: fuzzy & private matching
8
Privacy-preserving entity resolution
9
How?
For every record we process the PII into a
Cryptographic Longterm Key or (CLK)
Briefly, we hash the bi-grams for each PII feature into a
bloom filter.
https://github.com/n1analytics/clkhash/
10
Cryptographic Longterm Key
11 |
Private Record Linkage
12 |
Fuzzy Matcher
Shared Secret Salt
Hasher
Personally Identifiable Information
AnonymousHashes
Hasher
Personally Identifiable Information
AnonymousHashes
Linkage Table
Semi-Trusted Party
Company A Company B
Confidential Computing | Brian Thorne
PII cannot be recovered from the hashes
anonlink
Confidential Computing | [email protected]
Semi-trusted Third Party
• Only hashed data is uploaded to the entity resolution service
• Hash security relies on a shared secret between parties
• Implemented the service with a simple JSON + REST API
• All communication is secured with HTTPS
• Authentication tokens created for each job
• Result type and agreed schema is set at beginning
Confidential Computing | [email protected]
Client side: Command Line Utility
• Locally hashes PII data
• Creates new mapping jobs
on the server
• Uploads hash data
• Retrieves results
Confidential Computing | [email protected] |
Performance & Case Study
Confidential Computing | [email protected]
Speed and Scale
• 1.3B hash comparisons/s• Handle uploads of 35M hashes• 1M x 1M match takes around 5 hours
Running on four r4.4xlarge instances on AWS
Confidential Computing | [email protected]
Computing similarity between CLKs is a very parallel problem. Our
implementation utilizes multiple workers to carry out comparisons
using a kubernetes cluster
19 |
Data61 Privacy Projects
Protari, SENDA, Risk Identification, N1, Private Linkage
Confidential Computing
Confidential Computing21 |
Organisation 1
Cloud / data center
Sensitive
data
Health; sensor; finance;
government; location;
etc.
Organisation 2
Cloud / data center
Sensitive
data
How can we learn insights from data from multiple sources
and protect its value?
Insights
Joint
Analysi
s Health; sensor; finance;
government; location;
etc.
Protari
22 |
N1 Analytics
23 |
Release your data without losing control Access data that is currently too sensitive
Fully, Somewhat, Partially
Homomorphic
encryption
Secure Multiparty Compute Learning from Aggregates
Learn and deploy models Secure aggregation of dataClustering/Anomaly
Detection
Goals
:
Technologies
:
Capabilities
:
Confidential Computing
Confidential ComputingAnalytics with data privacy and control
www.n1analytics.com