Speed, Accurate and Efficient way to identify the DNA

Uploaded by bryant-myers on 30-Mar-2015.

TRANSCRIPT

Page 1

Speed, Accurate and Efficient way to identify the DNA

Page 2

• DNA Overview
• Sequence Alignment
• Problem & Previous Solutions
• GPU & CUDA
• Implemented Solution
• GUI (Ribbon)
• Results

Page 3

DNA Overview

Page 4

• DNA carries the genetic information that directs cell growth, division and function.
• DNA can be used to diagnose the condition of an organism or a person, for example:
  - to check whether a person has a certain disease, such as cancer.
• DNA influences features of the human body, such as height, eye color, nose shape, hair, skin color, gender, etc.

Page 5

• Chromosomes
• Genes
• Nucleotides
• Bases:
  - Adenine (A)
  - Guanine (G)
  - Cytosine (C)
  - Thymine (T)

Page 6

Gene structure

Page 7

• FASTA is a text-based format for representing biological sequences of any type, such as DNA.
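A minimal sketch of reading one FASTA record, written as plain host code (valid in a CUDA C source file). It assumes the usual layout, a header line starting with '>' followed by sequence lines that may be wrapped; the function name `parse_fasta` and its buffer parameters are hypothetical, not taken from the project.

```cuda
#include <ctype.h>
#include <string.h>

// Hypothetical helper: parse one FASTA record held in `text`.
// Copies the header (without '>') into `header` and the sequence
// (line breaks removed, bases upper-cased) into `seq`.
// Returns the sequence length, or -1 if `text` does not start with '>'
// or a buffer is too small.
int parse_fasta(const char *text, char *header, size_t header_cap,
                char *seq, size_t seq_cap)
{
    if (text == NULL || text[0] != '>') return -1;

    // 1. Header: everything after '>' up to the end of the first line.
    const char *p = text + 1;
    size_t h = 0;
    while (*p != '\0' && *p != '\n' && *p != '\r') {
        if (h + 1 >= header_cap) return -1;   // keep room for '\0'
        header[h++] = *p++;
    }
    header[h] = '\0';

    // 2. Sequence: all non-whitespace characters until the end of the
    //    text or the start of the next record ('>').
    size_t n = 0;
    for (; *p != '\0' && *p != '>'; ++p) {
        if (isspace((unsigned char)*p)) continue;
        if (n + 1 >= seq_cap) return -1;
        seq[n++] = (char)toupper((unsigned char)*p);
    }
    seq[n] = '\0';
    return (int)n;
}
```

For example, the record `>seq1 example` followed by the lines `ACGTAC` and `gtta` yields the header `seq1 example` and the 10-base sequence `ACGTACGTTA`.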

Page 8

Page 9

Sequence Alignment

Page 10

• Biological sequences evolve from preexisting sequences rather than being invented by nature from scratch.
• Three types of change can occur at any given position within a sequence:
  – Point mutations
  – Insertions
  – Deletions
• In an alignment, two identical characters produce a match, two different non-blank characters produce a mismatch, and a blank is called an indel (insertion/deletion) or gap.
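To make the three outcomes concrete, the sketch below scores two rows that are already aligned, with '-' marking a gap. The values match = +1, mismatch = -1 and gap = -2 are assumptions for illustration only (the slides do not state the project's scoring scheme), and `column_score`/`alignment_score` are hypothetical names.

```cuda
#include <limits.h>
#include <string.h>

#define MATCH     1     // illustrative score for two identical characters
#define MISMATCH (-1)   // illustrative score for two different non-blank characters
#define GAP      (-2)   // illustrative score for an indel ('-' on either side)

// Score of one aligned column.
int column_score(char a, char b)
{
    if (a == '-' || b == '-') return GAP;
    return a == b ? MATCH : MISMATCH;
}

// Sum of the column scores of two aligned rows; INT_MIN if their lengths differ.
int alignment_score(const char *row1, const char *row2)
{
    size_t len = strlen(row1);
    if (strlen(row2) != len) return INT_MIN;
    int total = 0;
    for (size_t k = 0; k < len; ++k)
        total += column_score(row1[k], row2[k]);
    return total;
}
```

With these values, the alignment ACG-T over A-GCT scores 1 - 2 + 1 - 2 + 1 = -1.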

Page 11

• Global sequence alignment
  – Needleman-Wunsch algorithm
• Local sequence alignment
  – Smith-Waterman algorithm
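A sequential sketch of both algorithms, computing only the optimal score (no traceback) with a linear gap penalty. The scores (match +1, mismatch -1, gap -2) and the function names are illustrative assumptions. The two variants differ in three places: Needleman-Wunsch charges gaps along the first row and column and reads the bottom-right cell, while Smith-Waterman starts the borders at 0, clamps every cell at 0 and takes the best cell anywhere in the matrix.

```cuda
#include <limits.h>
#include <stdlib.h>
#include <string.h>

#define MATCH     1     // illustrative scores (assumptions)
#define MISMATCH (-1)
#define GAP      (-2)

static int max2(int x, int y) { return x > y ? x : y; }
static int sub_score(char x, char y) { return x == y ? MATCH : MISMATCH; }

// Fills an (n+1) x (m+1) matrix H row by row.
// local == 0: Needleman-Wunsch (global); local == 1: Smith-Waterman (local).
static int align_score(const char *a, const char *b, int local)
{
    int n = (int)strlen(a), m = (int)strlen(b);
    int *H = (int *)malloc((size_t)(n + 1) * (m + 1) * sizeof(int));
    if (H == NULL) return INT_MIN;
#define AT(i, j) H[(i) * (m + 1) + (j)]

    // Borders: global alignment pays one gap per skipped character; local starts at 0.
    for (int i = 0; i <= n; ++i) AT(i, 0) = local ? 0 : i * GAP;
    for (int j = 0; j <= m; ++j) AT(0, j) = local ? 0 : j * GAP;

    int best = 0;
    for (int i = 1; i <= n; ++i) {
        for (int j = 1; j <= m; ++j) {
            int h = max2(AT(i - 1, j - 1) + sub_score(a[i - 1], b[j - 1]),  // match/mismatch
                         max2(AT(i - 1, j) + GAP,                          // gap in b
                              AT(i, j - 1) + GAP));                        // gap in a
            if (local) { h = max2(h, 0); best = max2(best, h); }
            AT(i, j) = h;
        }
    }
    int result = local ? best : AT(n, m);
#undef AT
    free(H);
    return result;
}

int needleman_wunsch(const char *a, const char *b) { return align_score(a, b, 0); }
int smith_waterman(const char *a, const char *b)   { return align_score(a, b, 1); }
```

For example, `needleman_wunsch("ACGT", "AGT")` returns 1 (aligning ACGT with A-GT), while `smith_waterman` on the same pair returns 2 (the shared suffix GT).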

Page 12

Problem & Previous Solutions

Page 13

• The computational cost is very high: the number of operations is proportional to the product of the lengths of the two sequences, so the algorithm has a complexity of O(N × M).
• Previous solutions:
  – FPGA:
    • High cost.
    • Not suitable for all users.
  – Approximate algorithms:
    • Less accurate.
• Current solution: parallelization on graphics cards (GPUs).
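The O(N × M) figure follows from the recurrence itself. Written for Smith-Waterman with a linear gap penalty g and a substitution score s(a_i, b_j) (our notation, not the slides'), every one of the N × M inner cells is computed exactly once in constant time:

```latex
\begin{aligned}
H_{i,0} &= H_{0,j} = 0,\\
H_{i,j} &= \max\bigl(0,\; H_{i-1,j-1} + s(a_i, b_j),\; H_{i-1,j} - g,\; H_{i,j-1} - g\bigr),
  \qquad 1 \le i \le N,\ 1 \le j \le M,\\
\text{work} &= N \times M \text{ cell updates} = O(N \times M).
\end{aligned}
```

For instance, a hypothetical 1,000-base query compared against a database totalling 10^6 bases requires 10^3 × 10^6 = 10^9 cell updates.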

Page 14

GPU & CUDA

Page 15

GPU (Graphics Processing Unit)

• The GPU is viewed as a compute device that operates as a coprocessor to the main CPU (the host).
• The CPU and GPU are separate devices with separate memories.

Page 16

CUDA (Compute Unified Device Architecture)

• CUDA is NVIDIA's scalable parallel programming model and software environment for parallel computing.
• Language: CUDA C, a minor extension of C/C++.
• A heterogeneous serial-parallel programming model.

Page 17

CUDA

• A CUDA program = serial code + parallel kernels (all in CUDA C).
  - Serial C code executes in a host thread (CPU thread).
  - Parallel kernel code executes in many device threads (GPU threads).
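A small illustration of that split, assuming a DNA string already loaded on the host: the serial host function allocates GPU memory, copies the data and launches the kernel, and the parallel kernel complements one base per GPU thread (A↔T, C↔G). The names `complement_kernel` and `complement_on_gpu` are hypothetical, and error checking is omitted for brevity.

```cuda
#include <cuda_runtime.h>
#include <string.h>

// Parallel part: runs on the GPU, one thread per base.
__global__ void complement_kernel(const char *in, char *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;                       // the last block may have spare threads
    switch (in[i]) {
        case 'A': out[i] = 'T'; break;
        case 'T': out[i] = 'A'; break;
        case 'C': out[i] = 'G'; break;
        case 'G': out[i] = 'C'; break;
        default:  out[i] = 'N'; break;        // unknown or ambiguous base
    }
}

// Serial part: runs in a host (CPU) thread and drives the GPU.
// `result` must have room for strlen(seq) + 1 characters.
void complement_on_gpu(const char *seq, char *result)
{
    int n = (int)strlen(seq);
    result[n] = '\0';
    if (n == 0) return;                       // a launch with zero blocks is invalid

    char *d_in, *d_out;
    cudaMalloc((void **)&d_in, n);
    cudaMalloc((void **)&d_out, n);
    cudaMemcpy(d_in, seq, n, cudaMemcpyHostToDevice);

    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    complement_kernel<<<blocks, threads>>>(d_in, d_out, n);

    cudaMemcpy(result, d_out, n, cudaMemcpyDeviceToHost);   // waits for the kernel
    cudaFree(d_in);
    cudaFree(d_out);
}
```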

Page 18

• Blocks and grids may be 1D, 2D or 3D.
• Built-in variables: gridDim, blockIdx, blockDim, threadIdx.
• Threads and blocks have unique IDs.
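As an illustration of 2D indexing, the sketch below launches a 2D grid of 16 × 16 blocks over a rows × cols matrix; each thread combines blockIdx, blockDim and threadIdx into a unique (row, column) position and writes its linear ID there. The block size and the names `write_ids`/`fill_ids_on_gpu` are arbitrary choices for the example, and error checking is omitted.

```cuda
#include <cuda_runtime.h>

// Each thread derives its global (row, col) from the built-in variables
// and writes its unique linear ID into a row-major rows x cols matrix.
__global__ void write_ids(int *out, int rows, int cols)
{
    int col = blockIdx.x * blockDim.x + threadIdx.x;   // x dimension -> columns
    int row = blockIdx.y * blockDim.y + threadIdx.y;   // y dimension -> rows
    if (row < rows && col < cols)
        out[row * cols + col] = row * cols + col;
}

// Host wrapper: enough 16 x 16 blocks to cover the whole matrix.
void fill_ids_on_gpu(int *host_out, int rows, int cols)
{
    size_t bytes = (size_t)rows * cols * sizeof(int);
    int *d_out;
    cudaMalloc((void **)&d_out, bytes);

    dim3 block(16, 16);                                 // 256 threads per block
    dim3 grid((cols + block.x - 1) / block.x,           // blocks along x
              (rows + block.y - 1) / block.y);          // blocks along y
    write_ids<<<grid, block>>>(d_out, rows, cols);

    cudaMemcpy(host_out, d_out, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_out);
}
```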

Page 19

CUDA Kernels

• A kernel is a function executed on the CUDA device.
• Threads are grouped into warps of 32 threads.
  - Warps are grouped into thread blocks.
  - Thread blocks are grouped into grids.
• Each thread executing a kernel has access to built-in variables that define its position:
  - threadIdx.x
  - blockIdx.x
  - gridDim.x, blockDim.x
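A minimal sketch of how a thread uses these variables: the usual 1D global index is blockIdx.x * blockDim.x + threadIdx.x, and dividing threadIdx.x by the built-in warpSize (32 on NVIDIA GPUs so far) gives the warp the thread belongs to within its block. The names are hypothetical and error checking is omitted.

```cuda
#include <cuda_runtime.h>

// Records, for every thread, its global index and its warp number inside its block.
__global__ void where_am_i(int *global_idx, int *warp_in_block, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;      // unique 1D global ID
    if (i < n) {
        global_idx[i]    = i;
        warp_in_block[i] = threadIdx.x / warpSize;      // 0 for threads 0..31, 1 for 32..63, ...
    }
}

void run_where_am_i(int *idx, int *warp, int n, int threads_per_block)
{
    int *d_idx, *d_warp;
    cudaMalloc((void **)&d_idx, n * sizeof(int));
    cudaMalloc((void **)&d_warp, n * sizeof(int));
    int blocks = (n + threads_per_block - 1) / threads_per_block;
    where_am_i<<<blocks, threads_per_block>>>(d_idx, d_warp, n);
    cudaMemcpy(idx, d_idx, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaMemcpy(warp, d_warp, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_idx);
    cudaFree(d_warp);
}
```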

Page 20

Kernel Call Syntax

• Kernels are launched with the <<< >>> syntax:
• functionName<<<Dg, Db>>>(arg1, arg2, …);
  where Dg = dimensions of the grid, i.e. the number of blocks (type dim3), and Db = dimensions of the block, i.e. the number of threads per block (type dim3).
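In practice Dg is derived from the problem size by rounding up, so that Dg × Db threads cover every element and the kernel guards against the surplus. The execution configuration also accepts two optional arguments, <<<Dg, Db, Ns, S>>>: Ns is the number of bytes of dynamically allocated shared memory per block and S is the stream. The sketch below uses all four to count G and C bases; the names, the block size of 256 and the power-of-two reduction are assumptions of the example.

```cuda
#include <cuda_runtime.h>
#include <stdlib.h>
#include <string.h>

// Number of blocks so that blocks * threads >= n (ceiling division).
int blocks_for(int n, int threads) { return (n + threads - 1) / threads; }

// Each block counts the G/C bases in its slice, using shared memory whose
// size is supplied at launch time (the Ns parameter).
__global__ void gc_count_per_block(const char *seq, int n, int *block_counts)
{
    extern __shared__ int partial[];                   // blockDim.x ints
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    partial[threadIdx.x] = (i < n && (seq[i] == 'G' || seq[i] == 'C')) ? 1 : 0;
    __syncthreads();
    // Tree reduction in shared memory; assumes blockDim.x is a power of two.
    for (unsigned int stride = blockDim.x / 2; stride > 0; stride >>= 1) {
        if (threadIdx.x < stride)
            partial[threadIdx.x] += partial[threadIdx.x + stride];
        __syncthreads();
    }
    if (threadIdx.x == 0) block_counts[blockIdx.x] = partial[0];
}

int gc_count_on_gpu(const char *seq)
{
    int n = (int)strlen(seq);
    if (n == 0) return 0;
    const int threads = 256;
    int blocks = blocks_for(n, threads);

    char *d_seq; int *d_counts;
    cudaMalloc((void **)&d_seq, n);
    cudaMalloc((void **)&d_counts, blocks * sizeof(int));
    cudaMemcpy(d_seq, seq, n, cudaMemcpyHostToDevice);

    dim3 Dg(blocks), Db(threads);
    size_t Ns = threads * sizeof(int);                 // dynamic shared memory per block
    gc_count_per_block<<<Dg, Db, Ns, 0>>>(d_seq, n, d_counts);   // S = 0: default stream

    int *counts = (int *)malloc(blocks * sizeof(int));
    cudaMemcpy(counts, d_counts, blocks * sizeof(int), cudaMemcpyDeviceToHost);
    int total = 0;
    for (int b = 0; b < blocks; ++b) total += counts[b];   // final sum on the host
    free(counts);
    cudaFree(d_seq);
    cudaFree(d_counts);
    return total;
}
```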

Page 21

Function Type Qualifiers

• A kernel is defined with the __global__ qualifier.
• This specifies that the function runs on the device and is called from the host (launching it from device code requires dynamic parallelism).
• __device__ and __host__ are the other available qualifiers:
  - __device__: executed on the device, callable only from the device.
  - __host__: the default if no qualifier is specified; executed on the host, callable only from the host.
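A small sketch putting the qualifiers side by side: a __host__ __device__ function compiled for both processors, a __device__ helper callable only from GPU code, a __global__ kernel, and an unqualified (therefore __host__) function that launches it. The per-column scores (match +1, mismatch -1, gap -2) and all names are illustrative assumptions.

```cuda
#include <cuda_runtime.h>
#include <string.h>

// __host__ __device__: compiled for both CPU and GPU, callable from either.
__host__ __device__ int substitution(char x, char y)
{
    return x == y ? 1 : -1;                   // illustrative match / mismatch scores
}

// __device__: runs on the GPU, callable only from GPU code.
__device__ int device_column_score(char x, char y)
{
    if (x == '-' || y == '-') return -2;      // illustrative gap (indel) score
    return substitution(x, y);                // device-side call of a __host__ __device__ function
}

// __global__: a kernel, runs on the GPU and is launched from the host.
__global__ void column_scores(const char *a, const char *b, int *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = device_column_score(a[i], b[i]);
}

// No qualifier means __host__: an ordinary CPU function that launches the kernel.
// a and b are assumed to be aligned rows of equal length.
void column_scores_on_gpu(const char *a, const char *b, int *out)
{
    int n = (int)strlen(a);
    if (n == 0) return;
    char *d_a, *d_b; int *d_out;
    cudaMalloc((void **)&d_a, n);
    cudaMalloc((void **)&d_b, n);
    cudaMalloc((void **)&d_out, n * sizeof(int));
    cudaMemcpy(d_a, a, n, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, b, n, cudaMemcpyHostToDevice);
    column_scores<<<(n + 127) / 128, 128>>>(d_a, d_b, d_out, n);
    cudaMemcpy(out, d_out, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_a);
    cudaFree(d_b);
    cudaFree(d_out);
}
```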

Page 22

CUDA Programming

Basic steps
• Transfer data from the CPU to the GPU.
• Explicitly launch the GPU kernel; CUDA implicitly assigns thread blocks to the multiprocessors and allocates resources for the computation.
• Transfer the results back from the GPU to the CPU.
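The three steps map onto a few CUDA runtime calls: cudaMalloc and cudaMemcpy for the transfer in, the kernel launch, and cudaMemcpy plus cudaFree for the transfer out. The sketch below counts the positions at which two equal-length sequences carry the same base and checks every call's return code; `CUDA_CHECK`, `match_flags` and `count_matches_on_gpu` are hypothetical names.

```cuda
#include <cuda_runtime.h>
#include <stdio.h>
#include <stdlib.h>

// Abort with a readable message if a CUDA runtime call fails.
#define CUDA_CHECK(call)                                               \
    do {                                                               \
        cudaError_t err_ = (call);                                     \
        if (err_ != cudaSuccess) {                                     \
            fprintf(stderr, "CUDA error %s at %s:%d\n",                \
                    cudaGetErrorString(err_), __FILE__, __LINE__);     \
            exit(EXIT_FAILURE);                                        \
        }                                                              \
    } while (0)

// One thread per position: 1 if the bases match, 0 otherwise.
__global__ void match_flags(const char *a, const char *b, int *flags, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) flags[i] = (a[i] == b[i]);
}

int count_matches_on_gpu(const char *a, const char *b, int n)
{
    if (n <= 0) return 0;
    char *d_a, *d_b; int *d_flags;
    int *flags = (int *)malloc(n * sizeof(int));

    // Step 1: transfer data from CPU to GPU.
    CUDA_CHECK(cudaMalloc((void **)&d_a, n));
    CUDA_CHECK(cudaMalloc((void **)&d_b, n));
    CUDA_CHECK(cudaMalloc((void **)&d_flags, n * sizeof(int)));
    CUDA_CHECK(cudaMemcpy(d_a, a, n, cudaMemcpyHostToDevice));
    CUDA_CHECK(cudaMemcpy(d_b, b, n, cudaMemcpyHostToDevice));

    // Step 2: launch the kernel; CUDA schedules the blocks onto the multiprocessors.
    int threads = 256;
    match_flags<<<(n + threads - 1) / threads, threads>>>(d_a, d_b, d_flags, n);
    CUDA_CHECK(cudaGetLastError());           // catches launch-configuration errors

    // Step 3: transfer results back from GPU to CPU (this copy waits for the kernel).
    CUDA_CHECK(cudaMemcpy(flags, d_flags, n * sizeof(int), cudaMemcpyDeviceToHost));

    int matches = 0;
    for (int i = 0; i < n; ++i) matches += flags[i];
    free(flags);
    CUDA_CHECK(cudaFree(d_a));
    CUDA_CHECK(cudaFree(d_b));
    CUDA_CHECK(cudaFree(d_flags));
    return matches;
}
```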

Page 30

Implemented Solution

Page 31

PARALLELIZATION

• The sequence alignment algorithm consumes a large amount of processing time.
• GPUs offer parallelization capabilities that can be exploited.
• Parallelization = performance.

Two levels of parallelization:
Level 1: Parallelizing the database comparison
  - Assume 14 sequences in the database.
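One way to realize level 1, sketched under our own assumptions because the slides do not say how the work is mapped onto threads: pack all database sequences back to back with an offset table, and let each GPU thread compute the Smith-Waterman score of the query against one database sequence, keeping a single row of the DP matrix in per-thread local memory. `MAX_QUERY`, the scores (match +1, mismatch -1, gap -2) and the function names are hypothetical.

```cuda
#include <cuda_runtime.h>
#include <stdlib.h>
#include <string.h>

#define MAX_QUERY 128     // assumed upper bound on query length (per-thread row size)
#define MATCH      1      // illustrative scores
#define MISMATCH (-1)
#define GAP      (-2)

// One thread = one database sequence; sequence k is db[offsets[k] .. offsets[k+1]).
__global__ void sw_per_sequence(const char *query, int qlen, const char *db,
                                const int *offsets, int num_seqs, int *scores)
{
    int k = blockIdx.x * blockDim.x + threadIdx.x;
    if (k >= num_seqs) return;

    int row[MAX_QUERY + 1];                    // H[i-1][*], the previous row
    for (int j = 0; j <= qlen; ++j) row[j] = 0;
    int best = 0;
    for (int p = offsets[k]; p < offsets[k + 1]; ++p) {   // row i = database base db[p]
        int diag = 0, left = 0;                // H[i-1][0] = H[i][0] = 0
        for (int j = 1; j <= qlen; ++j) {
            int up = row[j];                   // H[i-1][j]
            int h = diag + (db[p] == query[j - 1] ? MATCH : MISMATCH);
            if (up + GAP > h)   h = up + GAP;
            if (left + GAP > h) h = left + GAP;
            if (h < 0)          h = 0;         // local alignment: clamp at 0
            diag = up;                         // H[i-1][j] is the diagonal for column j+1
            row[j] = left = h;
            if (h > best) best = h;
        }
    }
    scores[k] = best;
}

// Host side: pack the database, copy it to the GPU, one thread per sequence.
// Returns 0 on success, -1 on invalid input.
int sw_database_on_gpu(const char *query, const char *const *seqs, int num_seqs, int *scores)
{
    int qlen = (int)strlen(query);
    if (qlen > MAX_QUERY || num_seqs <= 0) return -1;

    int *offsets = (int *)malloc((num_seqs + 1) * sizeof(int));
    offsets[0] = 0;
    for (int k = 0; k < num_seqs; ++k) offsets[k + 1] = offsets[k] + (int)strlen(seqs[k]);
    int total = offsets[num_seqs];
    char *packed = (char *)malloc(total > 0 ? total : 1);
    for (int k = 0; k < num_seqs; ++k)
        memcpy(packed + offsets[k], seqs[k], offsets[k + 1] - offsets[k]);

    char *d_query, *d_db; int *d_offsets, *d_scores;
    cudaMalloc((void **)&d_query, qlen > 0 ? qlen : 1);
    cudaMalloc((void **)&d_db, total > 0 ? total : 1);
    cudaMalloc((void **)&d_offsets, (num_seqs + 1) * sizeof(int));
    cudaMalloc((void **)&d_scores, num_seqs * sizeof(int));
    cudaMemcpy(d_query, query, qlen, cudaMemcpyHostToDevice);
    cudaMemcpy(d_db, packed, total, cudaMemcpyHostToDevice);
    cudaMemcpy(d_offsets, offsets, (num_seqs + 1) * sizeof(int), cudaMemcpyHostToDevice);

    int threads = 64;
    sw_per_sequence<<<(num_seqs + threads - 1) / threads, threads>>>(
        d_query, qlen, d_db, d_offsets, num_seqs, d_scores);
    cudaMemcpy(scores, d_scores, num_seqs * sizeof(int), cudaMemcpyDeviceToHost);

    cudaFree(d_query); cudaFree(d_db); cudaFree(d_offsets); cudaFree(d_scores);
    free(offsets);
    free(packed);
    return 0;
}
```

With a database of 14 sequences, as on the slide, this launches 14 active threads, one per sequence. Because sequences of very different lengths make threads in the same warp wait for the longest one, sorting the database by length is one possible refinement.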

Page 32

PARALLELIZATION

Level 2: Parallelization inside a single sequence comparison

1. Initializing the data matrix and pointers
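A minimal sketch of step 1, assuming the score matrix H and the traceback-pointer matrix are stored row-major in GPU global memory with (N + 1) × (M + 1) cells: one thread per cell sets the score to 0 (the Smith-Waterman boundary value) and the pointer to a stop code. The pointer encoding and all names are hypothetical.

```cuda
#include <cuda_runtime.h>

// Hypothetical traceback-pointer codes.
enum { PTR_STOP = 0, PTR_DIAG = 1, PTR_UP = 2, PTR_LEFT = 3 };

// One thread per cell: clear the score and mark the pointer as "stop".
__global__ void init_matrices(int *H, unsigned char *ptr, int cells)
{
    int c = blockIdx.x * blockDim.x + threadIdx.x;
    if (c < cells) {
        H[c]   = 0;
        ptr[c] = PTR_STOP;
    }
}

// Allocates and initializes both matrices on the device for sequences of
// lengths n and m. Returns 0 on success; the caller frees *d_H and *d_ptr.
int alloc_and_init(int n, int m, int **d_H, unsigned char **d_ptr)
{
    int cells = (n + 1) * (m + 1);
    if (cudaMalloc((void **)d_H, cells * sizeof(int)) != cudaSuccess) return -1;
    if (cudaMalloc((void **)d_ptr, cells) != cudaSuccess) { cudaFree(*d_H); return -1; }
    int threads = 256;
    init_matrices<<<(cells + threads - 1) / threads, threads>>>(*d_H, *d_ptr, cells);
    return cudaGetLastError() == cudaSuccess ? 0 : -1;
}
```

For an all-zero fill, cudaMemset would also work; a kernel like this one is useful when the initial values differ per cell, for example gap penalties along the borders in Needleman-Wunsch.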

Page 33

PARALLELIZATION

Page 34

PARALLELIZATION

• Data dependency in the calculation steps
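The dependency is that cell (i, j) needs (i-1, j-1), (i-1, j) and (i, j-1), so a cell cannot be computed before its upper and left neighbours. Cells on the same anti-diagonal (i + j = d) are independent of each other, however. The sketch below uses that observation in a widely used wavefront scheme, launching one kernel per anti-diagonal with one thread per cell; it is an assumption, not the project's confirmed method, and the scores (match +1, mismatch -1, gap -2) and names are illustrative.

```cuda
#include <cuda_runtime.h>
#include <stdlib.h>
#include <string.h>

#define MATCH     1       // illustrative scores
#define MISMATCH (-1)
#define GAP      (-2)

// Computes every cell (i, j) with i + j == d; these cells are mutually independent.
// H is (n + 1) x (m + 1), row-major, with row 0 and column 0 already zero.
__global__ void sw_antidiagonal(const char *a, const char *b, int n, int m,
                                int d, int i_min, int *H)
{
    int i = i_min + blockIdx.x * blockDim.x + threadIdx.x;
    int j = d - i;
    if (i > n || j < 1) return;                          // outside this diagonal
    int w = m + 1;                                       // row width
    int h = H[(i - 1) * w + (j - 1)] + (a[i - 1] == b[j - 1] ? MATCH : MISMATCH);
    int up = H[(i - 1) * w + j] + GAP;
    int left = H[i * w + (j - 1)] + GAP;
    if (up > h) h = up;
    if (left > h) h = left;
    H[i * w + j] = h > 0 ? h : 0;                        // local alignment: clamp at 0
}

int sw_wavefront_on_gpu(const char *a, const char *b)
{
    int n = (int)strlen(a), m = (int)strlen(b);
    if (n == 0 || m == 0) return 0;
    size_t cells = (size_t)(n + 1) * (m + 1);
    char *d_a, *d_b; int *d_H;
    cudaMalloc((void **)&d_a, n);
    cudaMalloc((void **)&d_b, m);
    cudaMalloc((void **)&d_H, cells * sizeof(int));
    cudaMemcpy(d_a, a, n, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, b, m, cudaMemcpyHostToDevice);
    cudaMemset(d_H, 0, cells * sizeof(int));             // borders start at 0

    const int threads = 128;
    for (int d = 2; d <= n + m; ++d) {                   // one launch per anti-diagonal
        int i_min = d - m > 1 ? d - m : 1;
        int i_max = d - 1 < n ? d - 1 : n;
        int count = i_max - i_min + 1;
        sw_antidiagonal<<<(count + threads - 1) / threads, threads>>>(
            d_a, d_b, n, m, d, i_min, d_H);
    }

    int *H = (int *)malloc(cells * sizeof(int));
    cudaMemcpy(H, d_H, cells * sizeof(int), cudaMemcpyDeviceToHost);
    int best = 0;                                        // best local score, found on the host
    for (size_t c = 0; c < cells; ++c) if (H[c] > best) best = H[c];
    free(H);
    cudaFree(d_a);
    cudaFree(d_b);
    cudaFree(d_H);
    return best;
}
```

Kernels issued to the same stream run in order, which guarantees that diagonal d - 1 is complete before diagonal d starts. One launch per diagonal keeps the sketch short, but for long sequences the launch overhead adds up, so tiled variants in which each block processes a sub-matrix are a common alternative.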

Page 35

PARALLELIZATION

Page 36

Implementation of this parallel part

Page 37

GUI (Ribbon)

Page 38

Page 39

Results

Page 40

Performance

Page 41

Performance

Page 42

Speedup

Page 43

Speedup

Page 44

Any questions?