1, roberta baronio emiliano de cristofaro · 2012-08-23 · children genetic diseases with...

26
Pierre Baldi 1 , Roberta Baronio 1 , Emiliano De Cristofaro 2 , Paolo Gasti 1 , and Gene Tsudik 1 1 UC Irvine 2 PARC – work done while at UC Irvine * See: http://www.imdb.com/title/tt0119177/

Upload: others

Post on 07-Aug-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: 1, Roberta Baronio Emiliano De Cristofaro · 2012-08-23 · children genetic diseases with Mendelian inheritance [1] 8 Imagefrom:&dnares.in& [1] V. McKusick and S. Antonarakis. Mendelian

Pierre Baldi1, Roberta Baronio1, Emiliano De Cristofaro2, Paolo Gasti1, and Gene Tsudik1

1 UC Irvine 2 PARC – work done while at UC Irvine

* See: http://www.imdb.com/title/tt0119177/

Page 2: 1, Roberta Baronio Emiliano De Cristofaro · 2012-08-23 · children genetic diseases with Mendelian inheritance [1] 8 Imagefrom:&dnares.in& [1] V. McKusick and S. Antonarakis. Mendelian

Outline

• Genomics Background

• Privacy Concerns

• Related Work and Challenges

• Privacy-Preserving Testing of Full Human Genomes •  Paternity Test •  Personalized Medicine •  Compatibility Tests

• Conclusion

2

Page 3: 1, Roberta Baronio Emiliano De Cristofaro · 2012-08-23 · children genetic diseases with Mendelian inheritance [1] 8 Imagefrom:&dnares.in& [1] V. McKusick and S. Antonarakis. Mendelian

Genomics 101

• Genome: •  Contains all of the biological information needed to build

and maintain a “living example” of an organism •  Encoded in DNA, one polymer of nucleotides

•  A,G,C,T

•  Human Genome: •  Approximately 3 billion nucleotides •  Stored in 23 chromosome pairs (plus mtDNA)

• DNA Sequencing: •  Determining precise sequence of nucleotides in a

strand of DNA •  Since the 70’s, a major driving force in life-science •  Rise of High-Throughput Sequencing (HTS)

3

Image  from:  bio.unt.edu  

Image  from:  scilogs.be  

Page 4: 1, Roberta Baronio Emiliano De Cristofaro · 2012-08-23 · children genetic diseases with Mendelian inheritance [1] 8 Imagefrom:&dnares.in& [1] V. McKusick and S. Antonarakis. Mendelian

Full Genome Sequencing (FGS)

• Full Sequencing of Human Genomes: •  The “Human Genome Project”: first full genome in 2003 •  In UK, 1000 genomes are already available •  The “Race for $1000 genome” by 2012

•  $100 by 2017

4

Image  from:  eyeondna.com  

Page 5: 1, Roberta Baronio Emiliano De Cristofaro · 2012-08-23 · children genetic diseases with Mendelian inheritance [1] 8 Imagefrom:&dnares.in& [1] V. McKusick and S. Antonarakis. Mendelian

Full Genome Sequencing (FGS)

•  Advances in FGS: •  The “Human Genome Project”: first full genome in 2003 •  In UK, 1000 genomes are already available •  The “Race for $1000 genome” by 2012

•  $100 by 2017

Ubiquitous availability of FGS is in sight!

•  New Frontiers: •  Better understanding of human genome •  Most individuals will have access to their (full) genomes •  Personalized Medicine •  Testing not only in-vitro but also in-silico

•  Cheaper and more accurate genetic testing

5

Image  from:  eyeondna.com  

Image  from:  blog.bufferapp.com  

Page 6: 1, Roberta Baronio Emiliano De Cristofaro · 2012-08-23 · children genetic diseases with Mendelian inheritance [1] 8 Imagefrom:&dnares.in& [1] V. McKusick and S. Antonarakis. Mendelian

What about privacy?

• Sensitivity of human genome: •  Uniquely identifies an individual (and discloses

ethnicity, disease predispositions, phenotypic traits, …) •  Once leaked, it cannot be “revoked” •  De-identification and obfuscation are not effective •  Legislation, e.g., Genetic Information Nondiscrimination Act (GINA)

• Privacy challenges: •  Available legislation often not technical enough

•  Need for a better understanding of genomics applications

•  Ubiquitous availability of low-cost FGS will amplify privacy concerns…

…It is not too early to investigate them!

6

Image  from:  scienceprogress.org  

Page 7: 1, Roberta Baronio Emiliano De Cristofaro · 2012-08-23 · children genetic diseases with Mendelian inheritance [1] 8 Imagefrom:&dnares.in& [1] V. McKusick and S. Antonarakis. Mendelian

Testing on Full Human Genomes Availability of affordable FGS allows to query/test genomic information not only in vitro but also in silico, e.g.,: • Paternity Tests

•  Commercial in-vitro testing widespread (starting at $79) •  With the availability of full genomes, we can design

algorithms (w/o the need for external companies)

• Personalized Medicine •  Treatment/medication tailored to patient’s genetic makeup •  E.g., testing of tpmt gene advised before prescribing

drugs for childhood leukemia and autoimmune diseases

7

Image  from:  frogsmoke.com  

Image  from:  8ieldofscience.com  

Page 8: 1, Roberta Baronio Emiliano De Cristofaro · 2012-08-23 · children genetic diseases with Mendelian inheritance [1] 8 Imagefrom:&dnares.in& [1] V. McKusick and S. Antonarakis. Mendelian

Testing on Full Human Genomes (2)

• Genetic Tests •  Newborn/fetal screening •  Confirmational diagnostics •  Pre-symptomatic testing

•  E.g., Huntington’s disease

•  Compatibility tests •  Dating web sites finding “good matches” •  Partners assessing possibility of transmitting on to their

children genetic diseases with Mendelian inheritance [1]

8

Image  from:  dnares.in  

[1] V. McKusick and S. Antonarakis. Mendelian inheritance in man: a catalog of human genes and genetic disorders. John Hopkins University Press, 1994.

Page 9: 1, Roberta Baronio Emiliano De Cristofaro · 2012-08-23 · children genetic diseases with Mendelian inheritance [1] 8 Imagefrom:&dnares.in& [1] V. McKusick and S. Antonarakis. Mendelian

Related Work

• Crypto techniques with applications to DNA testing: •  [TKC07], [BA10]: privacy-preserving error-resilient string searching •  [GHS10], [HT10]: secure pattern matching •  [KM10]: secure text processing and CODIS test

• Similarity of DNA Sequences •  [JKS08]: secure edit distance and Smith-Waterman scores

• Other techniques •  [WWL+09]: secure computation on genomic data at a provider •  [BKKT08]: identity test, paternity test, and more

9

Image  from  jonloomer.com  

Page 10: 1, Roberta Baronio Emiliano De Cristofaro · 2012-08-23 · children genetic diseases with Mendelian inheritance [1] 8 Imagefrom:&dnares.in& [1] V. McKusick and S. Antonarakis. Mendelian

Challenges

• Efficiency •  Do available cryptographic protocols scale to full genomes?

•  Short sequences vs 3-billion protocol input

•  Need domain knowledge to minimize computation

• Error Resilience •  Can we use techniques resilient to sequencing errors?

• More in the paper…

• Our Goal: •  Explore techniques viable today •  Combine efficient cryptographic techniques with genomics domain knowledge

10

Image  from  zedge.net  

Page 11: 1, Roberta Baronio Emiliano De Cristofaro · 2012-08-23 · children genetic diseases with Mendelian inheritance [1] 8 Imagefrom:&dnares.in& [1] V. McKusick and S. Antonarakis. Mendelian

Outline

• Genomics Background

• Privacy Concerns

• Related Work and Challenges

• Privacy-Preserving Testing of Full Human Genomes •  Paternity Test •  Personalized Medicine •  Compatibility Tests

• Conclusion

11

Page 12: 1, Roberta Baronio Emiliano De Cristofaro · 2012-08-23 · children genetic diseases with Mendelian inheritance [1] 8 Imagefrom:&dnares.in& [1] V. McKusick and S. Antonarakis. Mendelian

Privacy-Preserving Genetic Paternity Test

• A Strawman Approach for Paternity Test: •  On average, ~99.5% of any two human genomes are identical •  Parents and children have even more similar genomes •  Compare candidate’s genome with that of the alleged child:

•  Test positive if percentage of matching nucleotides is > 99.5 + τ

• First-Attempt Privacy-Preserving Protocol: •  Use an appropriate secure two-party protocol for the comparison •  PROs: High-accuracy and error resilience •  CONs: Performance not promising (3 billion symbols in input)

•  In our experiments, computation takes a few days

12

Page 13: 1, Roberta Baronio Emiliano De Cristofaro · 2012-08-23 · children genetic diseases with Mendelian inheritance [1] 8 Imagefrom:&dnares.in& [1] V. McKusick and S. Antonarakis. Mendelian

Privacy-Preserving Genetic Paternity Test (2)

•  Improved Protocol •  ~99.5% of any two human genomes are identical •  Why don’t we compare only the remaining 0.5%?

But… We don’t know (yet) where exactly this 0.5% occur! Using Private Set Intersection Cardinality for privacy-preserving comparison, it would take about 1 hour

13

Image  from  dna-­‐testing-­‐for-­‐paternity.com  

Page 14: 1, Roberta Baronio Emiliano De Cristofaro · 2012-08-23 · children genetic diseases with Mendelian inheritance [1] 8 Imagefrom:&dnares.in& [1] V. McKusick and S. Antonarakis. Mendelian

Private Set Intersection Cardinality (PSI-CA)

Server Client

S = {s1,, sw}

Private Set Intersection Cardinality (PSI-CA)

14

C = {c1,,cv}

S∩C⊥

Page 15: 1, Roberta Baronio Emiliano De Cristofaro · 2012-08-23 · children genetic diseases with Mendelian inheritance [1] 8 Imagefrom:&dnares.in& [1] V. McKusick and S. Antonarakis. Mendelian

Privacy-Preserving Genetic Paternity Test (3)

•  In-vitro emulation – RFLP-based paternity test •  Restriction Fragment Length Polymorphism (RFLP) analysis:

a difference between samples of homologous DNA molecules from differing locations of restriction enzyme sites

•  DNA sample is cut into fragments by enzymes •  Fragments separated according to their lengths by gel electrophoresis •  Paternity test is positive if enough fragments have the same length

• RFLP-based PPGPT – Reduction to PSI-CA •  Participants: “client” (receives the result), “server” (remains oblivious) •  Public input: , enzymes , markers •  Private input: digitized genomes

15

E = {e1,...,ej} M = {mk1,...,mkl}τ

Page 16: 1, Roberta Baronio Emiliano De Cristofaro · 2012-08-23 · children genetic diseases with Mendelian inheritance [1] 8 Imagefrom:&dnares.in& [1] V. McKusick and S. Antonarakis. Mendelian

Privacy-Preserving RFLP-based Paternity Test 16

Private Set Intersection Cardinality

Test Result (#fragments with same length)

Page 17: 1, Roberta Baronio Emiliano De Cristofaro · 2012-08-23 · children genetic diseases with Mendelian inheritance [1] 8 Imagefrom:&dnares.in& [1] V. McKusick and S. Antonarakis. Mendelian

Remarks 17

• Why compare fragment lengths? •  Isn’t it more accurate to compare actual contents? •  In reality, RFLP yields “false positives” with very low probability •  This approach increases resilience to sequencing errors

• Performance Evaluation •  About 1min pre-processing to emulate enzyme digestion process •  About 10ms computation time on Intel Core i5 with 25 fragments •  Less than 1s on a smartphone (Nokia N900, 600MHz CPU) •  Extending to 50 fragments doubles computation time and increases

accuracy by orders of magnitudes •  Communication overhead: only a few KBs

Page 18: 1, Roberta Baronio Emiliano De Cristofaro · 2012-08-23 · children genetic diseases with Mendelian inheritance [1] 8 Imagefrom:&dnares.in& [1] V. McKusick and S. Antonarakis. Mendelian

Personalized Medicine (PM)

• Drugs designed for patients’ genetic features •  Associating drugs with a unique genetic fingerprint •  Max effectiveness for patients with matching genome •  Test drug’s “genetic fingerprint” against patient’s genome

• Examples: •  tmpt gene – relevant to leukemia

•  (1) G->C mutation in pos. 238 of gene’s c-DNA, or (2) G->A mutation in pos. 460 and one A->G is pos. 419 cause the tpmt disorder (relevant for leukemia patients)

•  hla-B gene – relevant to HIV treatment •  One G->T mutation (known as hla-B*5701 allelic variant) is associated with

extreme sensitivity to abacavir (HIV drug)

18

Image  from:  8ieldofscience.com  

Page 19: 1, Roberta Baronio Emiliano De Cristofaro · 2012-08-23 · children genetic diseases with Mendelian inheritance [1] 8 Imagefrom:&dnares.in& [1] V. McKusick and S. Antonarakis. Mendelian

Privacy-preserving PM Testing (P3MT) 19

• Challenges: •  Patients may refuse to unconditionally release their genomes

•  Or may be sued by their relatives…

•  DNA fingerprint corresponding to a drug may be proprietary: ü  We need privacy-protecting fingerprint matching

•  But we also need to enable FDA approval on the drug/fingerprint ü  We reduce P3MT to Authorized Private Set Intersection (APSI)

Page 20: 1, Roberta Baronio Emiliano De Cristofaro · 2012-08-23 · children genetic diseases with Mendelian inheritance [1] 8 Imagefrom:&dnares.in& [1] V. McKusick and S. Antonarakis. Mendelian

S∩C =def

s j ∈ S ∃ci ∈C : ci = sj ∧auth(ci ) is valid{ }

Authorized Private Set Intersection (APSI)

Server Client

S = {s1,, sw} C = {(c1,auth(c1)),, (cv,auth(cv ))}

Authorized Private Set Intersection

20

CA

C = {c1,,cv}

S∩C =def

sj ∈ S ∃ci ∈C : ci = sj{ }

Page 21: 1, Roberta Baronio Emiliano De Cristofaro · 2012-08-23 · children genetic diseases with Mendelian inheritance [1] 8 Imagefrom:&dnares.in& [1] V. McKusick and S. Antonarakis. Mendelian

Reducing P3MT to APSI

•  Intuition: •  FDA acts as CA, Pharmaceutical company as Client, Patient as Server •  Patient’s private input set: •  Pharmaceutical company’s input set:

•  Each item in needs to be authorized by FDA

21

fp(D) = bj* || j( ){ }

G = (bi || i) bi ∈ {A,C,G,T}{ }i=13⋅109

fp(D)

Patient

APSI

CA

Company

G = (bi || i){ } fp(D) = bj* || j( ){ }fp(D) = bj

* || j( ),auth bj* || j( )( ){ }

Test Result

Page 22: 1, Roberta Baronio Emiliano De Cristofaro · 2012-08-23 · children genetic diseases with Mendelian inheritance [1] 8 Imagefrom:&dnares.in& [1] V. McKusick and S. Antonarakis. Mendelian

Epilogue •  In conclusion:

•  Explored three privacy-sensitive genomic applications •  Paternity tests, personalized medicine, genetic compatibility testing

•  Unlike prior work, we focused on full genomes (1)  Efficient constructions based on: privacy-preserving operations on private sets (2)  Domain knowledge in genomics (and simulation of in-vitro techniques)

•  Lesson learned: Good when two communities connect and attempt to solve real-world problems

• Future Work •  More on private testing of fully-sequenced human genomes

•  Paternity test based on other methods (e.g., STR or SNP) •  Ancestry testing, certified forensic identification, … •  Reducing communication overhead

•  Non-human genomes (e.g., crops, animals, …)

22

Page 23: 1, Roberta Baronio Emiliano De Cristofaro · 2012-08-23 · children genetic diseases with Mendelian inheritance [1] 8 Imagefrom:&dnares.in& [1] V. McKusick and S. Antonarakis. Mendelian

Thank you!

Emiliano De Cristofaro [email protected] www.emilianodc.com

23

Page 24: 1, Roberta Baronio Emiliano De Cristofaro · 2012-08-23 · children genetic diseases with Mendelian inheritance [1] 8 Imagefrom:&dnares.in& [1] V. McKusick and S. Antonarakis. Mendelian

Bonus Slides 24

Page 25: 1, Roberta Baronio Emiliano De Cristofaro · 2012-08-23 · children genetic diseases with Mendelian inheritance [1] 8 Imagefrom:&dnares.in& [1] V. McKusick and S. Antonarakis. Mendelian

P3MT – Performance Evaluation

• Pre-Computation •  Done once, for all possible tests •  Patient’s pre-processing of the genome: a few days •  Optimization:

•  Patient applies reference-based compression techniques •  Rather than full genome, input all differences with “reference” genome (0.5%) •  In our experiments, about 3.5 hours

• Online Computation •  Depend (linearly) on fingerprint size – typically a few nucleotides

•  0.82ms for hla-b*5701, 2.46ms for tpmt (in our Intel i5-560M setting)

• Communication •  Depends on the size of encrypted genome (about 4GB)

25

Page 26: 1, Roberta Baronio Emiliano De Cristofaro · 2012-08-23 · children genetic diseases with Mendelian inheritance [1] 8 Imagefrom:&dnares.in& [1] V. McKusick and S. Antonarakis. Mendelian

Genetic Compatibility Testing (GCT)

• The importance of GCT: •  Predicting whether potential partners are at risk of conceiving a child

with a recessive genetic disease •  That is, both partners carry at least one gene affected by mutation

• Examples: •  Beta-Thalassemia (red cells smaller than average)

•  Due to a mutation in hbb gene •  “Minor” when mutation occurs only in one chromosome (no severe impact) •  “Major” results in likely premature death (both chromosomes have the mutation) •  If both partners have the minor form, their child carry the major variant with 25% prob.

•  Lynch Syndrome (high risk of colon cancer) •  Parents have a 50% of passing it on to their child

• Construction, open problems, and performance: •  In the paper

26