The Human Variant Database
TRANSCRIPT
![Page 1: The Human Variant Database - GitHub Pages · • Two parallel goals: – Personalized Oncogenomics Program Use pa'ent genomics to diagnose and iden'fy therapies for each pa'ent’s](https://reader033.vdocuments.mx/reader033/viewer/2022041909/5e66b7cd1cbefe105e6d5f3d/html5/thumbnails/1.jpg)
The Human Variant Database
Mya Warren
Michael Smith Genome Sciences Centre
Vancouver, BC
![Page 2: The Human Variant Database - GitHub Pages · • Two parallel goals: – Personalized Oncogenomics Program Use pa'ent genomics to diagnose and iden'fy therapies for each pa'ent’s](https://reader033.vdocuments.mx/reader033/viewer/2022041909/5e66b7cd1cbefe105e6d5f3d/html5/thumbnails/2.jpg)
Bioinformatics is Big Data
• Human genome has
  – 3 billion nucleotide bases
  – 60 thousand genes
  – 10-20 thousand proteins
• Bioinformatics takes advantage of
  – High performance computing
  – Sophisticated algorithms
  – Math/Statistics
  – Machine learning
![Page 4: The Human Variant Database - GitHub Pages · • Two parallel goals: – Personalized Oncogenomics Program Use pa'ent genomics to diagnose and iden'fy therapies for each pa'ent’s](https://reader033.vdocuments.mx/reader033/viewer/2022041909/5e66b7cd1cbefe105e6d5f3d/html5/thumbnails/4.jpg)
Our mission
• Two parallel goals:
  – Personalized Oncogenomics Program: use patient genomics to diagnose and identify therapies for each patient's unique disease
  – Cancer research: find new patterns in the genomics data to identify novel targets for therapy, learn fundamental truths about cancer
• The database supports these goals through:
  – Fast querying and exploration of patient genomics, clinical covariates
  – Data mining and analysis of patient cohorts
![Page 7: The Human Variant Database - GitHub Pages · • Two parallel goals: – Personalized Oncogenomics Program Use pa'ent genomics to diagnose and iden'fy therapies for each pa'ent’s](https://reader033.vdocuments.mx/reader033/viewer/2022041909/5e66b7cd1cbefe105e6d5f3d/html5/thumbnails/7.jpg)
HAWQ (HAdoop With Queries)
A massively parallel processing (MPP) SQL engine in Hadoop
• Interface with the data using PostgreSQL
• Parallel, fault tolerant architecture for storing and processing big data
![Page 8: The Human Variant Database - GitHub Pages · • Two parallel goals: – Personalized Oncogenomics Program Use pa'ent genomics to diagnose and iden'fy therapies for each pa'ent’s](https://reader033.vdocuments.mx/reader033/viewer/2022041909/5e66b7cd1cbefe105e6d5f3d/html5/thumbnails/8.jpg)
Our system
• 13 slave nodes
• 32-thread CPUs
• Total memory: 1.5 TB
• Total storage: 250 TB
• Current disk usage: 1.5 TB
• Largest table: ~10 billion rows
![Page 9: The Human Variant Database - GitHub Pages · • Two parallel goals: – Personalized Oncogenomics Program Use pa'ent genomics to diagnose and iden'fy therapies for each pa'ent’s](https://reader033.vdocuments.mx/reader033/viewer/2022041909/5e66b7cd1cbefe105e6d5f3d/html5/thumbnails/9.jpg)
HAWQ Architecture
• Hadoop distributed file system (HDFS)
  – Data is chunked, replicated, distributed
• Data locality
  – Move the computation to the data
  – Data is not shared
  – HAWQ is very fast, linear scalability
• Can interface with the rest of the Hadoop ecosystem
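The "chunked, replicated, distributed" idea can be illustrated with a toy placement routine. This is only a sketch: the round-robin policy, the function name `place_blocks`, and the node names are assumptions made for illustration (real HDFS placement is rack-aware and more involved). The 128 MB block size and replication factor of 3 are chosen to mirror common HDFS defaults, not taken from the slides.

```python
from collections import Counter

def place_blocks(file_size_mb, nodes, block_size_mb=128, replication=3):
    """Split a file into fixed-size blocks and give each block `replication` distinct nodes."""
    if replication > len(nodes):
        raise ValueError("replication factor exceeds number of nodes")
    n_blocks = -(-file_size_mb // block_size_mb)  # ceiling division
    placement = {}
    for block in range(n_blocks):
        # Toy round-robin policy: consecutive nodes, shifted by one for each block.
        placement[block] = [nodes[(block + r) % len(nodes)] for r in range(replication)]
    return placement

# 13 worker nodes, matching the node count on the "Our system" slide (names are made up).
nodes = [f"node{i:02d}" for i in range(1, 14)]
placement = place_blocks(file_size_mb=1000, nodes=nodes)

# Count how many block replicas each node holds.
load = Counter(node for replicas in placement.values() for node in replicas)
print(len(placement), "blocks,", sum(load.values()), "replicas in total")
```

Because every block lives on several nodes, a query can be scheduled on whichever node already holds the data, which is the "move the computation to the data" point above.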
![Page 10: The Human Variant Database - GitHub Pages · • Two parallel goals: – Personalized Oncogenomics Program Use pa'ent genomics to diagnose and iden'fy therapies for each pa'ent’s](https://reader033.vdocuments.mx/reader033/viewer/2022041909/5e66b7cd1cbefe105e6d5f3d/html5/thumbnails/10.jpg)
HAWQ vs. Relational Databases
• Append-only tables
• No primary keys
• No foreign keys
• Joins are more expensive
• Extract-transform-load (ETL) optimized for large data files
  – Import raw data
  – Transform data in database
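The "import raw data, then transform in database" pattern might look roughly like the sketch below. It uses Python's built-in `sqlite3` as a stand-in so the example runs anywhere; it is not HAWQ. The table names, columns, quality threshold, and rows are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for a HAWQ connection (assumption)
conn.executescript("""
    CREATE TABLE raw_snv (library_id TEXT, chrom TEXT, pos INTEGER, ref TEXT, alt TEXT, qual REAL);
    CREATE TABLE filtered_snv (library_id TEXT, chrom TEXT, pos INTEGER, ref TEXT, alt TEXT);
""")

# Step 1: import raw data as-is. Rows are only ever appended, never updated in place.
raw_rows = [  # made-up illustrative rows
    ("LIB001", "chr1", 1000, "C", "T", 88.0),
    ("LIB001", "chr2", 2000, "C", "A", 12.5),
    ("LIB002", "chr3", 3000, "A", "T", 95.1),
]
conn.executemany("INSERT INTO raw_snv VALUES (?, ?, ?, ?, ?, ?)", raw_rows)

# Step 2: transform inside the database. The result is appended to a second
# table with INSERT ... SELECT, so the raw table is left untouched.
conn.execute("""
    INSERT INTO filtered_snv
    SELECT library_id, chrom, pos, ref, alt
    FROM raw_snv
    WHERE qual >= 30
""")

filtered = conn.execute(
    "SELECT library_id, chrom, pos FROM filtered_snv ORDER BY library_id"
).fetchall()
print(filtered)
```

Writing derived rows to a new table rather than updating existing ones fits the append-only constraint listed above.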
![Page 11: The Human Variant Database - GitHub Pages · • Two parallel goals: – Personalized Oncogenomics Program Use pa'ent genomics to diagnose and iden'fy therapies for each pa'ent’s](https://reader033.vdocuments.mx/reader033/viewer/2022041909/5e66b7cd1cbefe105e6d5f3d/html5/thumbnails/11.jpg)
The Data
• Internally generated data + public cancer datasets (TCGA)
• 11,519 patients
• 21,591 libraries
• 31,067 analyses
• >10 billion rows
![Page 12: The Human Variant Database - GitHub Pages · • Two parallel goals: – Personalized Oncogenomics Program Use pa'ent genomics to diagnose and iden'fy therapies for each pa'ent’s](https://reader033.vdocuments.mx/reader033/viewer/2022041909/5e66b7cd1cbefe105e6d5f3d/html5/thumbnails/12.jpg)
Variants
• Raw data for
  – Unpaired/somatic SNVs and indels
  – Germline/somatic CNVs
  – Somatic loss of heterozygosity
  – Gene expression
  – Homozygous deletions
• Post-processed and filtered variant data
![Page 13: The Human Variant Database - GitHub Pages · • Two parallel goals: – Personalized Oncogenomics Program Use pa'ent genomics to diagnose and iden'fy therapies for each pa'ent’s](https://reader033.vdocuments.mx/reader033/viewer/2022041909/5e66b7cd1cbefe105e6d5f3d/html5/thumbnails/13.jpg)
Metadata
• Library construction and sequencing
• Analysis pipeline
• Patient data
  – Demographics
  – Biopsy diagnoses
  – Drug treatment
  – Radiation treatment
![Page 14: The Human Variant Database - GitHub Pages · • Two parallel goals: – Personalized Oncogenomics Program Use pa'ent genomics to diagnose and iden'fy therapies for each pa'ent’s](https://reader033.vdocuments.mx/reader033/viewer/2022041909/5e66b7cd1cbefe105e6d5f3d/html5/thumbnails/14.jpg)
Annotations
• dbSNP
• COSMIC
• ClinVar
• SnpEff
• Gene models
![Page 15: The Human Variant Database - GitHub Pages · • Two parallel goals: – Personalized Oncogenomics Program Use pa'ent genomics to diagnose and iden'fy therapies for each pa'ent’s](https://reader033.vdocuments.mx/reader033/viewer/2022041909/5e66b7cd1cbefe105e6d5f3d/html5/thumbnails/15.jpg)
Coming soon
• Other internal projects
• More external datasets!
• Structural variants, miRNA...
• Disease/Drug ontologies
• Knowledge base
• More data = better analysis!
![Page 17: The Human Variant Database - GitHub Pages · • Two parallel goals: – Personalized Oncogenomics Program Use pa'ent genomics to diagnose and iden'fy therapies for each pa'ent’s](https://reader033.vdocuments.mx/reader033/viewer/2022041909/5e66b7cd1cbefe105e6d5f3d/html5/thumbnails/17.jpg)
Accessing the data
• Custom queries and pipelines
• General purpose REST APIs
  – Python
  – SQLAlchemy Object Relational Model
  – Pyramid REST framework
• Web interface
  – Query
  – Filter
  – Analyze
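A general-purpose query endpoint typically has to turn request filters into a safe SQL query. The sketch below shows one way to do that with a column whitelist and bound parameters. It deliberately uses only the standard library (`sqlite3`) rather than the SQLAlchemy and Pyramid stack named above, and the `variant` table, the filter names, the `/variants` route, and the rows are all hypothetical.

```python
import sqlite3

ALLOWED_FILTERS = {"gene", "chrom", "effect"}  # hypothetical filterable columns

def build_query(params):
    """Turn REST-style query parameters into a parameterized SELECT statement."""
    clauses, values = [], []
    for key in sorted(params):
        if key not in ALLOWED_FILTERS:
            raise ValueError(f"unsupported filter: {key}")
        # Column names come from the whitelist; values are bound, never interpolated.
        clauses.append(f"{key} = ?")
        values.append(params[key])
    sql = "SELECT patient_id, gene, chrom, effect FROM variant"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql + " ORDER BY patient_id", values

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE variant (patient_id TEXT, gene TEXT, chrom TEXT, effect TEXT)")
conn.executemany("INSERT INTO variant VALUES (?, ?, ?, ?)", [  # made-up rows
    ("P1", "GENE_A", "chr1", "missense"),
    ("P2", "GENE_A", "chr1", "synonymous"),
    ("P3", "GENE_B", "chr2", "missense"),
])

# Roughly what a request like GET /variants?gene=GENE_A&effect=missense would carry.
sql, values = build_query({"gene": "GENE_A", "effect": "missense"})
rows = conn.execute(sql, values).fetchall()
print(sql)
print(rows)
```

In a real service an ORM would usually build the query and a web framework would supply the parameters; the whitelist-plus-bound-values idea carries over either way.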
![Page 18: The Human Variant Database - GitHub Pages · • Two parallel goals: – Personalized Oncogenomics Program Use pa'ent genomics to diagnose and iden'fy therapies for each pa'ent’s](https://reader033.vdocuments.mx/reader033/viewer/2022041909/5e66b7cd1cbefe105e6d5f3d/html5/thumbnails/18.jpg)
Query selector
![Page 19: The Human Variant Database - GitHub Pages · • Two parallel goals: – Personalized Oncogenomics Program Use pa'ent genomics to diagnose and iden'fy therapies for each pa'ent’s](https://reader033.vdocuments.mx/reader033/viewer/2022041909/5e66b7cd1cbefe105e6d5f3d/html5/thumbnails/19.jpg)
Results
![Page 21: The Human Variant Database - GitHub Pages · • Two parallel goals: – Personalized Oncogenomics Program Use pa'ent genomics to diagnose and iden'fy therapies for each pa'ent’s](https://reader033.vdocuments.mx/reader033/viewer/2022041909/5e66b7cd1cbefe105e6d5f3d/html5/thumbnails/21.jpg)
The Future
Let the database do the work!
• Why give up your pipeline?
  – Speed
  – Flexibility
![Page 22: The Human Variant Database - GitHub Pages · • Two parallel goals: – Personalized Oncogenomics Program Use pa'ent genomics to diagnose and iden'fy therapies for each pa'ent’s](https://reader033.vdocuments.mx/reader033/viewer/2022041909/5e66b7cd1cbefe105e6d5f3d/html5/thumbnails/22.jpg)
Tasks that could be done on the variant database
• Annotations
• Filtering
• Statistical analysis and analytics
• Correlations
• Machine learning
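As a rough illustration of doing annotation, filtering, and a simple statistic inside the database, the sketch below joins variants to an annotation table and counts affected patients per gene in a single SQL statement. It again uses `sqlite3` as a runnable stand-in for the real engine; the schemas, gene names, and rows are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in database (assumption)
conn.executescript("""
    CREATE TABLE variant (patient_id TEXT, chrom TEXT, pos INTEGER, alt TEXT);
    CREATE TABLE annotation (chrom TEXT, pos INTEGER, alt TEXT, gene TEXT, effect TEXT);
""")
conn.executemany("INSERT INTO variant VALUES (?, ?, ?, ?)", [  # made-up rows
    ("P1", "chr1", 100, "T"), ("P2", "chr1", 100, "T"),
    ("P2", "chr2", 200, "G"), ("P3", "chr2", 200, "G"),
    ("P3", "chr1", 300, "A"),
])
conn.executemany("INSERT INTO annotation VALUES (?, ?, ?, ?, ?)", [
    ("chr1", 100, "T", "GENE_A", "missense"),
    ("chr2", 200, "G", "GENE_B", "synonymous"),
    ("chr1", 300, "A", "GENE_A", "missense"),
])

# Annotate (JOIN), filter (WHERE), and summarize (GROUP BY) in one query,
# so only the small result set leaves the database.
rows = conn.execute("""
    SELECT a.gene, COUNT(DISTINCT v.patient_id) AS n_patients
    FROM variant v
    JOIN annotation a
      ON a.chrom = v.chrom AND a.pos = v.pos AND a.alt = v.alt
    WHERE a.effect = 'missense'
    GROUP BY a.gene
    ORDER BY n_patients DESC
""").fetchall()
print(rows)
```

The same shape of query extends to cohort-level statistics: the heavy lifting happens next to the data, and the client receives only the summary.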
![Page 23: The Human Variant Database - GitHub Pages · • Two parallel goals: – Personalized Oncogenomics Program Use pa'ent genomics to diagnose and iden'fy therapies for each pa'ent’s](https://reader033.vdocuments.mx/reader033/viewer/2022041909/5e66b7cd1cbefe105e6d5f3d/html5/thumbnails/23.jpg)
Scalable, in-database analytics
![Page 24: The Human Variant Database - GitHub Pages · • Two parallel goals: – Personalized Oncogenomics Program Use pa'ent genomics to diagnose and iden'fy therapies for each pa'ent’s](https://reader033.vdocuments.mx/reader033/viewer/2022041909/5e66b7cd1cbefe105e6d5f3d/html5/thumbnails/24.jpg)
Thanks!
VariantDB Developers: Marcel Bernard, Joshua Davies, Darryl D'Souza, Navjashan Singh, James Zhou, Simon Chan
PIPE/BioApps/LIMS: Morgan Bye, Karen Eddy, Patrick Plettner
Systems: Hansen Wong, Rudy Zhou, Lance Bailey
Brandon Pierce, Richard Corbett, Eric Chuah, Yussanne Ma