the genomics revolution: the good, the bad, and the ugly · the good revolution in medicine and...

34
The Genomics Revolution: The Good, The Bad, and The Ugly (A Privacy Researcher's Perspective) Emiliano De Cristofaro University College London https://emilianodc.com

Upload: others

Post on 23-May-2020

9 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: The Genomics Revolution: The Good, The Bad, and The Ugly · The Good Revolution in medicine and healthcare Genetic testing for the masses The Bad Collection of highly sensitive data

The Genomics Revolution: The Good, The Bad, and The Ugly

(A Privacy Researcher's Perspective)

Emiliano De CristofaroUniversity College London

https://emilianodc.com

Page 2: The Genomics Revolution: The Good, The Bad, and The Ugly · The Good Revolution in medicine and healthcare Genetic testing for the masses The Bad Collection of highly sensitive data

This Talk In a…

The GoodRevolution in medicine and healthcareGenetic testing for the masses

The BadCollection of highly sensitive dataVery hard to anonymize / de-identify

The UglyGreater good vs privacyEncryption might not be the answer

2

Page 3: The Genomics Revolution: The Good, The Bad, and The Ugly · The Good Revolution in medicine and healthcare Genetic testing for the masses The Bad Collection of highly sensitive data

3

Page 4: The Genomics Revolution: The Good, The Bad, and The Ugly · The Good Revolution in medicine and healthcare Genetic testing for the masses The Bad Collection of highly sensitive data

History1970s: DNA sequencing starts1990: The “Human Genome Project” starts2003: First human genome fully sequenced2012: UK announces sequencing of 100K genomes2015: USA announces sequencing of 1M genomes

$$$$3B: Human Genome Project $250K: Illumina (2008)$5K: Complete Genomics (2009), Illumina (2011)$1K: Illumina (2014)

4

Page 5: The Genomics Revolution: The Good, The Bad, and The Ugly · The Good Revolution in medicine and healthcare Genetic testing for the masses The Bad Collection of highly sensitive data

How to read the genome?

GenotypingTesting for genetic

differences using a set of markers

5

SequencingDetermining the full

nucleotide order of an organism’s genome

Page 6: The Genomics Revolution: The Good, The Bad, and The Ugly · The Good Revolution in medicine and healthcare Genetic testing for the masses The Bad Collection of highly sensitive data

6

Page 7: The Genomics Revolution: The Good, The Bad, and The Ugly · The Good Revolution in medicine and healthcare Genetic testing for the masses The Bad Collection of highly sensitive data

7

Page 8: The Genomics Revolution: The Good, The Bad, and The Ugly · The Good Revolution in medicine and healthcare Genetic testing for the masses The Bad Collection of highly sensitive data

8

Page 9: The Genomics Revolution: The Good, The Bad, and The Ugly · The Good Revolution in medicine and healthcare Genetic testing for the masses The Bad Collection of highly sensitive data

9

Page 10: The Genomics Revolution: The Good, The Bad, and The Ugly · The Good Revolution in medicine and healthcare Genetic testing for the masses The Bad Collection of highly sensitive data
Page 11: The Genomics Revolution: The Good, The Bad, and The Ugly · The Good Revolution in medicine and healthcare Genetic testing for the masses The Bad Collection of highly sensitive data

But… not all data are created equal!

11

Page 12: The Genomics Revolution: The Good, The Bad, and The Ugly · The Good Revolution in medicine and healthcare Genetic testing for the masses The Bad Collection of highly sensitive data

Privacy Researcher’s Perspective

Treasure trove of sensitive informationEthnic heritage, predisposition to diseases

Genome = the ultimate identifierHard to anonymize / de-identify

Sensitivity is perpetualCannot be “revoked”Leaking one’s genome ≈ leaking relatives’ genome

12

Page 13: The Genomics Revolution: The Good, The Bad, and The Ugly · The Good Revolution in medicine and healthcare Genetic testing for the masses The Bad Collection of highly sensitive data

The Greater Goodvs

Privacy?

13

Page 14: The Genomics Revolution: The Good, The Bad, and The Ugly · The Good Revolution in medicine and healthcare Genetic testing for the masses The Bad Collection of highly sensitive data

The rise of a new research community

Studying privacy issues

Exploring techniques to protect privacy

14

Page 15: The Genomics Revolution: The Good, The Bad, and The Ugly · The Good Revolution in medicine and healthcare Genetic testing for the masses The Bad Collection of highly sensitive data

De-Anonymization

Melissa Gymrek et al. “Identifying Personal Genomes bySurname Inference.” Science Vol. 339, No. 6117, 2013 15

Page 16: The Genomics Revolution: The Good, The Bad, and The Ugly · The Good Revolution in medicine and healthcare Genetic testing for the masses The Bad Collection of highly sensitive data

Aggregation

Re-identification of aggregated dataStatistics from allele frequencies can be used to identify genetic trial participants [1]Presence of an individual in a group can be determined by using allele frequencies and his DNA profile [2]

[1] R. Wang et al. “Learning Your Identity and Disease from Research Papers: Information Leaks in Genome Wide Association Study.” CCS, 2009

[2] N. Homer et al. Resolving individuals contributing trace amounts of DNA to highly complex mixtures using high-density SNP genotyping microarrays.

PLoS Genetics,200816

Page 17: The Genomics Revolution: The Good, The Bad, and The Ugly · The Good Revolution in medicine and healthcare Genetic testing for the masses The Bad Collection of highly sensitive data

Kin Privacy

Quantifying how much privacy do relatives lose when one’s genome is leaked?

M. Humbert et al., “Addressing the Concerns of the Lacks Family: Quantification of Kin Genomic Privacy.” Proceedings of ACM CCS, 2013

Also read: “Routes for breaching genetic privacy”

Y. Erlich and A. Narayanan, Nature Review Genetics

Vol. 15, No. 6, 2014

Page 18: The Genomics Revolution: The Good, The Bad, and The Ugly · The Good Revolution in medicine and healthcare Genetic testing for the masses The Bad Collection of highly sensitive data

18

Page 19: The Genomics Revolution: The Good, The Bad, and The Ugly · The Good Revolution in medicine and healthcare Genetic testing for the masses The Bad Collection of highly sensitive data

Human Aspects of Genome Privacy

Dynamic Consent:Patients electronically control consent through time and receive information about the uses of their dataJane Kaye’s work at Oxford

Understanding “fears” and “reactions”, including:Survivor’s guiltFreedom to withdraw is crucial but poorly understoodInsurance carriers and big corporations most distrusted

19

Page 20: The Genomics Revolution: The Good, The Bad, and The Ugly · The Good Revolution in medicine and healthcare Genetic testing for the masses The Bad Collection of highly sensitive data

Ethnographic Studies in WGS

Semi-structured interviews with 16 participantsAssessing perception of genetic tests, attitude toward WGS programs, as well as perception of privacy/ethical issues

(Some) Preliminary results1.  Preferred method is through doctors not companies

(trust)2.  Labor/healthcare discrimination top concerns3.  Differences in correlation with income and education

E. De Cristofaro. “Users' Attitudes, Perception, and Concerns in the Era of

Whole Genome Sequencing.” (USEC 2014)20

Page 21: The Genomics Revolution: The Good, The Bad, and The Ugly · The Good Revolution in medicine and healthcare Genetic testing for the masses The Bad Collection of highly sensitive data

The rise of a new research community

Studying privacy issues

Exploring techniques to protect privacy

21

Page 22: The Genomics Revolution: The Good, The Bad, and The Ugly · The Good Revolution in medicine and healthcare Genetic testing for the masses The Bad Collection of highly sensitive data

Differential Privacy

Computing number/location of SNPs associated to diseaseSignificance/correlation between a SNP and a disease

A. Johnson and V. Shmatikov. “Privacy-Preserving Data Exploration in Genome-Wide Association Studies.” Proceedings of KDD, 2013 22

Genome Wide Association Studies (GWAS)

Page 23: The Genomics Revolution: The Good, The Bad, and The Ugly · The Good Revolution in medicine and healthcare Genetic testing for the masses The Bad Collection of highly sensitive data

Computing on Encrypted Genomes

Genomic datasets often used for association studies

Encrypt data & outsource to the cloudPerform private computation over encrypted dataUsing partial & fully homomorphic encryption

Examples:Pearson Goodness-of-Fit test, linkage disequilibriumEstimation Maximization, Cochran-Armitage TT, etc.

23K. Lauter, A. Lopez-Alt, M. Naehrig.

Private Computation on Encrypted Genomic Data

Page 24: The Genomics Revolution: The Good, The Bad, and The Ugly · The Good Revolution in medicine and healthcare Genetic testing for the masses The Bad Collection of highly sensitive data

Computing on Encrypted Genomes

24

L. Kamm, D. Bogdanov, S. Laur, J. Vilo. A new way to protect privacy in large- scale genome-wide association studies. Bioinformatics 29 (7): 886-893, 2013.

Page 25: The Genomics Revolution: The Good, The Bad, and The Ugly · The Good Revolution in medicine and healthcare Genetic testing for the masses The Bad Collection of highly sensitive data

Private Personal Genomic Tests

Individuals retain control of their sequenced genome

Allow doctors/labs to run genetics tests, but:1. Genome never disclosed, only test output is2. Pharmas can keep test specifics confidential

… two main approaches …25

Page 26: The Genomics Revolution: The Good, The Bad, and The Ugly · The Good Revolution in medicine and healthcare Genetic testing for the masses The Bad Collection of highly sensitive data

(i) D

NA

sam

ple

(i) Clinical and Environmental

data

(ii) Encrypted SNPs

(iii)

Dis

ease

R

isk

Com

puta

tion

CERTIFIED INSTITUTION (CI)

MEDICAL UNIT (MU)

STORAGE AND PROCESSING UNIT (SPU)

PATIENT (P)

1. Using Semi-Trusted Parties

26

Page 27: The Genomics Revolution: The Good, The Bad, and The Ugly · The Good Revolution in medicine and healthcare Genetic testing for the masses The Bad Collection of highly sensitive data

1. Using Semi-Trusted Parties

Ayday et al. (WPES’13)Data is encrypted and stored at a “Storage Process Unit”Disease susceptibility testing

Ayday et al. (DPM’13)Encrypting raw genomic data (short reads)Allowing medical unit to privately retrieve them

Danezis and De Cristofaro (WPES’14)Regression for disease susceptibility

27

Page 28: The Genomics Revolution: The Good, The Bad, and The Ugly · The Good Revolution in medicine and healthcare Genetic testing for the masses The Bad Collection of highly sensitive data

doctoror lab

genome

individual

test specifics

Secure Function

Evaluation

test result test result

• Private Set Intersection (PSI)• Authorized PSI• Private Pattern Matching• Homomorphic Encryption• Garbled Circtuis• […]

Output reveals nothing beyond test result

• Paternity/Ancestry Testing• Testing of SNPs/Markers • Compatibility Testing• […]

2. Users keep sequenced genomes

28

Page 29: The Genomics Revolution: The Good, The Bad, and The Ugly · The Good Revolution in medicine and healthcare Genetic testing for the masses The Bad Collection of highly sensitive data

2. Users keep sequenced genomes

Baldi et al. (CCS’11)Privacy-preserving version of a few genetic tests, based on private set operationsPaternity test, Personalized Medicine, Compatibility Tests(First work to consider fully sequenced genomes)

De Cristofaro et al. (WPES’12), extends the aboveFramework and prototype deployment on AndroidAdds Ancestry/Genealogy Testing

29

Page 30: The Genomics Revolution: The Good, The Bad, and The Ugly · The Good Revolution in medicine and healthcare Genetic testing for the masses The Bad Collection of highly sensitive data

Open ProblemsWhere do we store genomes?

Encryption can’t guarantee security past 30-50 yrsReliability and availability issues?

CryptographyEfficiency overhead Data representation assumptionsHow much understanding required from users?

30

Page 31: The Genomics Revolution: The Good, The Bad, and The Ugly · The Good Revolution in medicine and healthcare Genetic testing for the masses The Bad Collection of highly sensitive data

We all leave biological cells behind…Hair, saliva, etc., can be collected and sequenced?

Compare this “attack” to re-identifying millions of DNA donors or hacking into 23andme…

The former: expensive, prone to mistakes, only works against a handful of targeted victimsThe latter: very “scalable”

Why do we even careabout genome privacy?

31

Page 32: The Genomics Revolution: The Good, The Bad, and The Ugly · The Good Revolution in medicine and healthcare Genetic testing for the masses The Bad Collection of highly sensitive data

Epilogue

Whole Genome SequencingA revolution in healthcareRaises worrisome privacy/ethical concerns

The Genomic Privacy research communityUnderstanding the privacy issuesPrivacy-preserving testing on whole genomesPossible using efficient crypto protocols & cross-discipline collaboration

A number of open research issues…32

Page 33: The Genomics Revolution: The Good, The Bad, and The Ugly · The Good Revolution in medicine and healthcare Genetic testing for the masses The Bad Collection of highly sensitive data

For more info:

http://genomeprivacy.org

Also:E. Ayday, E. De Cristofaro, J.P. Hubaux, G. Tsudik.

“Whole Genome Sequencing: Revolutionary Medicine or Privacy Nightmare?”

IEEE Computer Magazine

Page 34: The Genomics Revolution: The Good, The Bad, and The Ugly · The Good Revolution in medicine and healthcare Genetic testing for the masses The Bad Collection of highly sensitive data

Thank you!

Special thanks toE. Ayday, P. Baldi, R. Baronio, G. Danezis, S. Faber,

P. Gasti, J-P. Hubaux, B. Malin, G. Tsudik