genome big data
DESCRIPTION
Presentation from the "Demystifying Big Data" Technical Conference (Universidad de La Laguna, Spain, June 2014). Biomedical sciences rely on massive data sets. With machines that generate large amounts of data at low cost, science has entered the 'Big Data' era, making computational infrastructure essential to maintain, transfer and analyze all this information.
TRANSCRIPT
genome big data
Adrián Báez
16/06/2014
DNA Genes
Proteins
Genome
Genomics
Biomedicine
Sequencing
Assembly
Fragments
Reads
FASTQ file
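A FASTQ file stores each read as a record of four lines: an identifier line starting with `@`, the base sequence, a separator line starting with `+`, and a quality string with one character per base. The following is a minimal sketch of reading such records in Python. It assumes the common unwrapped layout (exactly four lines per record); the function name `parse_fastq` and the two sample reads are illustrative and do not come from the talk.

```python
import io
from typing import Iterator, TextIO, Tuple


def parse_fastq(handle: TextIO) -> Iterator[Tuple[str, str, str]]:
    """Yield (read_id, sequence, quality) from a four-line-per-record FASTQ stream."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of file
        header = header.rstrip("\n")
        if not header:
            continue  # tolerate blank lines between records
        sequence = handle.readline().rstrip("\n")
        separator = handle.readline().rstrip("\n")
        quality = handle.readline().rstrip("\n")
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        if len(sequence) != len(quality):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        yield header[1:], sequence, quality


# Made-up two-read example, only to show the record layout.
EXAMPLE = "@read1\nACGTAC\n+\nIIIIII\n@read2\nTTGACA\n+\nFFFFFF\n"
records = list(parse_fastq(io.StringIO(EXAMPLE)))
for read_id, sequence, quality in records:
    print(read_id, sequence, quality)
```

Because the parser is a generator, it holds one record in memory at a time, which matters when read files reach the gigabyte range.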
Genome sequencing
2003
2014
End of the Human Genome Project (1990-2003)
2.7 billion dollars
Illumina launches the HiSeq X Ten
1000 dollars/genome
“Forty such machines would be able to sequence more genomes in one year than had
been produced by all other sequencers to date.”
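To put the two cost figures on this slide side by side, here is a short worked calculation in Python. It is only a rough sketch: it compares the total cost of a whole project with a per-genome price, and the yearly factor assumes a smooth exponential decline between 2003 and 2014, which is a simplification made here and not a claim from the talk.

```python
HGP_COST_USD = 2.7e9        # Human Genome Project total, from the slide
GENOME_COST_2014_USD = 1_000  # per-genome price in 2014, from the slide

cost_ratio = HGP_COST_USD / GENOME_COST_2014_USD
years = 2014 - 2003

# Average yearly reduction factor under the assumed exponential decline.
yearly_factor = cost_ratio ** (1 / years)

print(f"cost ratio: {cost_ratio:,.0f}x")
print(f"average reduction: about {yearly_factor:.1f}x per year over {years} years")
```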
Genome sequencing
÷400
x20
Reads
MB ~ GB
HDD
Assembly
Intermediate data structures
GB ~ TB
RAM
Original sequence
MB ~ GB
HDD
Organism             Reads      Assembly (RAM)   Result
Escherichia coli     82.4 MB    1.64 GB          3.8 MB
Trypanosoma cruzi    1 GB       13.75 GB         38.6 MB
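The figures in the table above show how much larger the in-memory assembly is than either its input or its output. The sketch below computes those ratios in Python. It assumes decimal units (1 GB = 1000 MB), since the slide does not state which convention it uses; the variable names are illustrative.

```python
MB_PER_GB = 1000  # assumption: decimal units; the slide does not specify

# (reads, peak assembly RAM, assembled result) in MB, copied from the table
datasets = {
    "Escherichia coli": (82.4, 1.64 * MB_PER_GB, 3.8),
    "Trypanosoma cruzi": (1 * MB_PER_GB, 13.75 * MB_PER_GB, 38.6),
}

ratios = {}
for organism, (reads_mb, ram_mb, result_mb) in datasets.items():
    ram_vs_reads = ram_mb / reads_mb
    ram_vs_result = ram_mb / result_mb
    ratios[organism] = (ram_vs_reads, ram_vs_result)
    print(f"{organism}: RAM is {ram_vs_reads:.1f}x the reads "
          f"and {ram_vs_result:.0f}x the result")
```

For both organisms the assembly step needs over ten times the size of the reads in RAM, and several hundred times the size of the final sequence, which is why memory is the limiting resource for assembly while disk is the limiting resource for reads and results.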
Genome assembly
Instituto Universitario de Enfermedades Tropicales y Salud Pública de Canarias
Current system: Web assembly and analysis
Future work: Big Data solutions
Data transfer
Biotorrents
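BioTorrents shares scientific data sets over the BitTorrent protocol, in which a file is split into fixed-size pieces and each piece is verified against a SHA-1 hash, so a damaged piece can be fetched again without restarting the whole transfer. The following Python sketch illustrates only that per-piece integrity check; it is not a BitTorrent client. The helper names `piece_hashes` and `corrupted_pieces`, the piece size, and the test data are all assumptions made for the example.

```python
import hashlib
from itertools import zip_longest
from typing import List

PIECE_SIZE = 256 * 1024  # illustrative piece length; real torrents choose their own


def piece_hashes(data: bytes, piece_size: int = PIECE_SIZE) -> List[str]:
    """Return the SHA-1 hex digest of each fixed-size piece of data."""
    return [
        hashlib.sha1(data[start:start + piece_size]).hexdigest()
        for start in range(0, len(data), piece_size)
    ]


def corrupted_pieces(received: bytes, expected: List[str],
                     piece_size: int = PIECE_SIZE) -> List[int]:
    """Return indexes of pieces whose hash differs from the expected list."""
    actual = piece_hashes(received, piece_size)
    # zip_longest also flags missing or extra pieces as mismatches.
    return [
        index
        for index, (got, want) in enumerate(zip_longest(actual, expected))
        if got != want
    ]


# Stand-in for a 1 MiB data file (four pieces); not real sequencing data.
original = bytes(range(256)) * 4096
expected = piece_hashes(original)

# Flip one byte inside the third piece to simulate a transfer error.
damaged = bytearray(original)
damaged[2 * PIECE_SIZE + 10] ^= 0xFF

bad = corrupted_pieces(bytes(damaged), expected)
print("pieces to download again:", bad)
```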
Implementing Big Data
Security and privacy
Advanced encryption algorithms
Custom hardware solutions instead of cloud computing
Consent forms to share personal genome data
Data storage
Lack of a comprehensive, affordable and secure solution
Sequencing/assembly projects
Google Scholar: papers that mention genome sequencing or assembly
Human Genome Project
Cancer Genome Project
Pine Genome Project
Dog Genome Project
Pediatric Cancer Genome Project
Bovine Genome Project
Mammoth Genome Project
Pear Genome Project
Fugu Genome Project
Thanks for your attention