genome big data

Adrián Báez, 16/06/2014

DNA

Genes

Proteins

Genome

Genomics

Biomedicine

Sequencing

Assembly

Fragments

Reads

FASTQ file
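Sequencers write their reads to FASTQ files, where each read takes four lines: an `@` identifier line, the bases, a `+` separator, and one quality character per base. The minimal reader below is a sketch that assumes the common unwrapped four-line layout and Phred+33 quality encoding. `parse_fastq` and `phred_scores` are illustrative names, not a library API.

```python
import io
from typing import Iterator, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    """One sequencing read: identifier, bases and per-base quality string."""
    read_id: str
    sequence: str
    quality: str


def parse_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from a FASTQ stream of unwrapped four-line entries."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of input (or a trailing blank line)
        sequence = handle.readline().rstrip("\n")
        separator = handle.readline().rstrip("\n")
        quality = handle.readline().rstrip("\n")
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        if len(sequence) != len(quality):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        yield FastqRecord(header[1:], sequence, quality)


def phred_scores(quality: str, offset: int = 33) -> list[int]:
    """Decode Phred+33 ASCII quality characters into integer scores."""
    return [ord(c) - offset for c in quality]


example = io.StringIO("@read1\nACGTAC\n+\nIIIII#\n@read2\nTTGA\n+\n!!!!\n")
for record in parse_fastq(example):
    print(record.read_id, record.sequence, phred_scores(record.quality))
```

A generator keeps memory flat regardless of file size. That matters when a single run produces gigabytes of reads.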

Genome sequencing

2003

2014

Human Genome Project completed (1990-2003)

2.7 billion dollars

Illumina launches the HiSeq X Ten

1,000 dollars per genome

“Forty such machines would be able to sequence more genomes in one year than had been produced by all other sequencers to date.”
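A rough comparison of the two cost figures: dividing the Human Genome Project's total budget by the quoted per-genome price gives a reduction factor of about 2.7 million. The comparison is only indicative. The project total paid for far more than sequencing a single genome, and the 1,000-dollar figure is the price quoted for the new system.

```latex
\frac{2.7 \times 10^{9}\ \text{dollars}}{1 \times 10^{3}\ \text{dollars/genome}} = 2.7 \times 10^{6}
```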

Genome sequencing

÷400x20

Reads: MB to GB, stored on HDD

Assembly (intermediate data structures): GB to TB, held in RAM

Original sequence: MB to GB, stored on HDD
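The slides do not say which assembly algorithm is used. One common family, de Bruijn graph assemblers, illustrates why the intermediate structures can dwarf the reads. The sketch assumes that approach. It counts every k-mer (length-k substring), then links each k-mer's (k-1)-base prefix to its (k-1)-base suffix. A read of length L yields L - k + 1 overlapping k-mers of k bases each, so the raw k-mer set can be many times larger than the input. In real data, sequencing errors also add many spurious unique k-mers. All function names here are illustrative.

```python
from collections import Counter
from typing import Iterable


def count_kmers(reads: Iterable[str], k: int) -> Counter[str]:
    """Count every length-k substring (k-mer) across all reads."""
    counts: Counter[str] = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def de_bruijn_edges(kmers: Iterable[str]) -> dict[str, set[str]]:
    """Map each (k-1)-mer prefix to the (k-1)-mer suffixes it links to."""
    graph: dict[str, set[str]] = {}
    for kmer in kmers:
        graph.setdefault(kmer[:-1], set()).add(kmer[1:])
    return graph


reads = ["ACGTACGA", "GTACGATT", "CGATTGCA"]
kmers = count_kmers(reads, k=4)
graph = de_bruijn_edges(kmers)
input_bases = sum(len(r) for r in reads)
kmer_chars = sum(len(kmer) for kmer in kmers)  # characters for distinct k-mers alone
print(f"{input_bases} input bases, {len(kmers)} distinct 4-mers "
      f"({kmer_chars} chars), {len(graph)} nodes with outgoing edges")
```

With toy reads the numbers are tiny. What carries over to real genomes is the growth pattern, together with the per-entry overhead of hash tables and graph nodes in RAM.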

| Organism | Reads | Assembly (RAM) | Result |
|---|---|---|---|
| Escherichia coli | 82.4 MB | 1.64 GB | 3.8 MB |
| Trypanosoma cruzi | 1 GB | 13.75 GB | 38.6 MB |
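One way to read the table is to compare assembly memory with the input and output sizes. The sketch makes three assumptions: decimal units (1 GB = 1000 MB), "Reads" as input size, and "Assembly (RAM)" as the memory used during assembly. `size_ratios` is a hypothetical helper.

```python
# Figures copied from the table (decimal units assumed: 1 GB = 1000 MB).
MB_PER_GB = 1000

datasets = {
    "Escherichia coli": {"reads_mb": 82.4,
                         "assembly_ram_mb": 1.64 * MB_PER_GB,
                         "result_mb": 3.8},
    "Trypanosoma cruzi": {"reads_mb": 1.0 * MB_PER_GB,
                          "assembly_ram_mb": 13.75 * MB_PER_GB,
                          "result_mb": 38.6},
}


def size_ratios(reads_mb: float, assembly_ram_mb: float, result_mb: float) -> tuple[float, float]:
    """Return (RAM / reads, RAM / result) for one dataset."""
    return assembly_ram_mb / reads_mb, assembly_ram_mb / result_mb


for organism, sizes in datasets.items():
    ram_vs_reads, ram_vs_result = size_ratios(**sizes)
    print(f"{organism}: RAM = {ram_vs_reads:.1f}x reads, {ram_vs_result:.0f}x result")
```

For E. coli this gives roughly 20x the reads and 432x the result. For T. cruzi it gives 13.75x and about 356x. With binary units (1 GB = 1024 MB) the ratios shift by only about 2.4%.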

Genome assembly

Instituto Universitario de Enfermedades Tropicales y Salud Pública de Canarias

Current system: web-based assembly and analysis

Future work: Big Data solutions

Data transfer

BioTorrents
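BioTorrents is a file-sharing service for scientific data built on the BitTorrent protocol. In BitTorrent v1, shared content is split into fixed-size pieces, and each piece's SHA-1 digest is published in the torrent metadata, so any downloaded piece can be checked independently. The sketch shows only that integrity-check idea, not the peer protocol or BioTorrents' own software. `PIECE_SIZE` and the helper names are hypothetical.

```python
import hashlib

PIECE_SIZE = 256 * 1024  # hypothetical piece size; real torrents choose their own


def piece_hashes(data: bytes, piece_size: int = PIECE_SIZE) -> list[bytes]:
    """Split data into fixed-size pieces and return the SHA-1 digest of each."""
    return [hashlib.sha1(data[i:i + piece_size]).digest()
            for i in range(0, len(data), piece_size)]


def verify_piece(index: int, piece: bytes, expected: list[bytes]) -> bool:
    """Check one downloaded piece against the published digest for that index."""
    return hashlib.sha1(piece).digest() == expected[index]


reads_file = b"@read1\nACGT\n+\nIIII\n" * 50_000  # stand-in for a FASTQ file
published = piece_hashes(reads_file)
second_piece = reads_file[PIECE_SIZE:2 * PIECE_SIZE]
print(len(published), verify_piece(1, second_piece, published),
      verify_piece(1, b"corrupt", published))
```

Per-piece verification lets a large read archive arrive from many peers in any order, with only damaged pieces re-fetched.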

Implementing Big Data

Security and privacy

Advanced encryption algorithms

Custom hardware solutions instead of cloud computing

Consent forms for sharing personal genome data

Data storage

Lack of a comprehensive, affordable and secure solution
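Storage cost can be cut with compact encodings. As one illustrative technique (not something the slides propose), a sequence made only of A, C, G and T needs just 2 bits per base instead of one ASCII byte, a 4x saving. The sketch assumes no ambiguous bases such as N, which would raise a `KeyError` here. Real formats handle those and quality scores separately. `pack` and `unpack` are hypothetical names.

```python
BASE_TO_BITS = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}
BITS_TO_BASE = "ACGT"


def pack(sequence: str) -> bytes:
    """Pack an A/C/G/T string at 2 bits per base (4 bases per byte, low bits first)."""
    packed = bytearray((len(sequence) + 3) // 4)
    for i, base in enumerate(sequence):
        packed[i // 4] |= BASE_TO_BITS[base] << (2 * (i % 4))
    return bytes(packed)


def unpack(packed: bytes, length: int) -> str:
    """Recover the first `length` bases from packed bytes."""
    return "".join(BITS_TO_BASE[(packed[i // 4] >> (2 * (i % 4))) & 0b11]
                   for i in range(length))


seq = "ACGTTGCAAC"
packed = pack(seq)
print(len(seq), "ASCII bytes ->", len(packed), "packed bytes")
```

The original length must be stored alongside the packed bytes, because the last byte may be only partly filled.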

Sequencing/assembly projects

Google Scholar: papers that mention genome sequencing or assembly

Human Genome Project

Cancer Genome Project

Pine Genome Project

Dog Genome Project

Pediatric Cancer Genome Project

Bovine Genome Project

Mammoth Genome Project

Pear Genome Project

Fugu Genome Project

Thanks for your attention
