Sequence Alignment in DNA
Under the Guidance of: Prof. Kolin Paul
Presented By: Lalchand
Gaurav Jain
Agenda
• Application Domain & Objective
• General Alignment Procedure
• Scope of parallelism in BWT
• Selection sort and quick sort implementation
• BWT implementation on GPU
• Comparative study
Application Domain & Objective
To present an efficient implementation (especially a parallel one) that aids in searching for short sequences in DNA.
• Analyzing gene expression
• Mapping variations between individuals
• Mapping homologous proteins
• Assembling the genome of an organism
Basic Alignment Procedure
[Diagram: Genome → Suffix Array → BWT → Indexing → Mapper (BWT Algorithm) → {Location, Occurrence}; Reads are the input to the Mapper]
• Suffix Array: 15 GB for the human genome {3 billion * 4 B + 3 GB genome}
• Intermediate size: 10^18
• BWT: Bwt[i] = Ref[SA[i] - 1] {3 GB}
• Searching: O(log G)
• Stage labels: Parallelized; To be parallelized
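The O(log G) search over a suffix array can be sketched as a binary search, as below. This is a minimal illustration, not the presenters' code: the names `build_suffix_array` and `find_occurrences` are hypothetical, the suffix array is built naively, and each of the O(log G) comparisons still costs up to the read length in characters.

```python
def build_suffix_array(ref):
    """Naive suffix array: sort suffix start positions lexicographically."""
    return sorted(range(len(ref)), key=lambda i: ref[i:])


def find_occurrences(ref, sa, read):
    """Return sorted start positions of `read` in `ref` via binary search on `sa`."""
    w = len(read)

    # Lower bound: first suffix whose w-character prefix is >= read.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if ref[sa[mid]:sa[mid] + w] < read:
            lo = mid + 1
        else:
            hi = mid
    start = lo

    # Upper bound: first suffix whose w-character prefix is > read.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if ref[sa[mid]:sa[mid] + w] <= read:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[start:lo])


ref = "ACGTA$"                      # toy reference with '$' terminator
sa = build_suffix_array(ref)
locations = find_occurrences(ref, sa, "A")
```

The length of the returned list gives the occurrence count, and its entries give the locations, which corresponds to the {Location, Occurrence} output of the mapper.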
Scope of Parallelism in BWT
• With the BWT, a string of length w can be found in O(w) time.
• The BWT is closely related to the suffix array.
• The suffix array is the lexicographically sorted list of all suffixes of the genome.
• Bwt[i] = ref[SA[i] - 1] {Bwt[i] = $ when SA[i] = 1}
BWT
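The O(w) lookup is usually done with backward search over the BWT, processing the pattern one character at a time from its end. The sketch below is a minimal illustration under stated assumptions: `count_occurrences` is a hypothetical name, the occurrence table is stored in full for clarity (real indexes sample it), and the BWT string used is the transform of the toy text ACGTA$.

```python
from collections import Counter


def count_occurrences(bwt, pattern):
    """Count matches of `pattern` in the original text using backward search."""
    # first[c]: number of characters in the text that sort before c.
    counts = Counter(bwt)
    first, total = {}, 0
    for c in sorted(counts):
        first[c] = total
        total += counts[c]

    # occ[c][i]: occurrences of c in bwt[:i] (naive full table for clarity).
    occ = {c: [0] * (len(bwt) + 1) for c in counts}
    for i, ch in enumerate(bwt):
        for c in counts:
            occ[c][i + 1] = occ[c][i] + (ch == c)

    lo, hi = 0, len(bwt)            # current suffix-array interval [lo, hi)
    for c in reversed(pattern):     # one O(1) step per pattern character
        if c not in first:
            return 0
        lo = first[c] + occ[c][lo]
        hi = first[c] + occ[c][hi]
        if lo >= hi:
            return 0
    return hi - lo


bwt = "AT$ACG"                      # BWT of the toy text "ACGTA$"
hits = count_occurrences(bwt, "GTA")
```

Once the tables are built, the loop touches each pattern character exactly once, which is where the O(w) bound comes from.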
Initial Step - 1
● Implementation of BWT using Selection Sort – OpenMP
Selection Sort - OpenMP
[Figure: Bwt Creation using Selection sort. x-axis: File Size in KB (0 to 1000); y-axis: Time in Seconds (0 to 7000); series: Proc 1, Proc 2, Proc 4, Proc 8]
CPU
| Parameter | Value |
| --- | --- |
| Cores | 8 |
| Data cache | L1: 32K, L2: 6M |
| DRAM | 12 GB |
| Proc. clock | 2.9 GHz |
Initial Step - 2
● Implementation of BWT using Selection Sort – OpenMP
● Implementation of BWT using Quick Sort – OpenMP
Quick Sort - OpenMP
CPU Statistics
| Parameter | Value |
| --- | --- |
| Cores | 8 |
| Data cache | L1: 32K, L2: 6M |
| DRAM | 12 GB |
| Proc. clock | 2.9 GHz |
Initial Step - 3
● Implementation of BWT using Selection Sort – OpenMP
● Implementation of BWT using Quick Sort – OpenMP
● Implementing BWT on GPU – Bitonic sort
Why Bitonic?
• A bitonic sequence is the concatenation of two sub-sequences sorted in opposite directions
  – Or a cyclic shift of such a sequence
• Implemented by comparator networks
  – Work in place
  – No communication
• Naturally suitable for SIMD architectures
  – Each thread executes the same code on different data
• O(log^2 n) time and O(n log^2 n) work
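To make the comparator-network structure concrete, here is a sequential sketch of bitonic sort that also counts the parallel steps. It is an illustration only, with the hypothetical name `bitonic_sort`; it assumes the input length is a power of two, and the inner `for i` loop stands in for what would be one thread per element on a SIMD device.

```python
def bitonic_sort(values):
    """Sort a list whose length is a power of two; return (sorted_list, steps)."""
    a = list(values)
    n = len(a)
    assert n > 0 and n & (n - 1) == 0, "length must be a power of two"

    steps = 0
    k = 2
    while k <= n:                # size of the bitonic sequences being merged
        j = k // 2
        while j >= 1:            # distance between compared elements
            # All comparisons in one step are independent of each other,
            # so they can run in parallel and in place.
            for i in range(n):
                partner = i ^ j
                if partner > i:
                    ascending = (i & k) == 0
                    if (a[i] > a[partner]) == ascending:
                        a[i], a[partner] = a[partner], a[i]
            steps += 1
            j //= 2
        k *= 2
    return a, steps


result, steps = bitonic_sort([7, 3, 6, 1, 8, 2, 5, 4])
```

For n = 8 the network has log n * (log n + 1) / 2 = 6 steps, each doing n/2 compare-and-swap operations, which matches the O(log^2 n) time and O(n log^2 n) work stated above.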
Burrows-Wheeler Transform
Input: A C G T A $
Basic String Sorting Algorithm
indices: 0 1 2 3 4 5
0 A C G T A $
1 C G T A $ A
2 G T A $ A C
3 T A $ A C G
4 A $ A C G T
5 $ A C G T A
After sorting:
indices: 5 4 0 1 2 3
5 $ A C G T A
4 A $ A C G T
0 A C G T A $
1 C G T A $ A
2 G T A $ A C
3 T A $ A C G
Output: A T $ A C G
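The worked example can be reproduced with a few lines of Python. This is a minimal sketch of the basic rotation-sorting approach, not an efficient construction; `bwt_by_rotations` is a hypothetical name, and the code relies on '$' sorting before A, C, G and T in ASCII.

```python
def bwt_by_rotations(text):
    """Return (sorted rotation start indices, BWT) for a '$'-terminated text."""
    n = len(text)
    # Pair every cyclic rotation with the index where it starts.
    rotations = [(text[i:] + text[:i], i) for i in range(n)]
    rotations.sort()             # lexicographic order of the rotations
    indices = [i for _, i in rotations]
    # The BWT is the last column of the sorted rotation table.
    last_column = "".join(rot[-1] for rot, _ in rotations)
    return indices, last_column


indices, bwt = bwt_by_rotations("ACGTA$")
```

Here `indices` is the sorted index row of the example and `bwt` is its output string.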
Steps Performed
• Copy the genome from host to device memory
• Create an indices array pointing into the reference string
• Compare suffixes based on the indices array
  – Swap indices accordingly
• Sorts n elements in O(log^2 n) kernel calls
  – Each of O(1) time and O(n) work
• One more step derives the BWT from the suffix array
  – Bwt[i] = ref[SA[i] - 1] {Bwt[i] = $ when SA[i] = 1}
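The steps above can be emulated on the CPU as follows. This is a hedged sketch rather than the presenters' CUDA code: `suffix_less`, `bitonic_sort_step` and `bwt_via_bitonic` are hypothetical names, each call to `bitonic_sort_step` stands in for one kernel launch, and padding the indices array up to a power of two is an assumption made so that the bitonic network applies.

```python
def suffix_less(ref, i, j):
    """True if the suffix at i sorts before the suffix at j."""
    n = len(ref)
    if i >= n or j >= n:          # padding entries sort after every real suffix
        return i < j
    return ref[i:] < ref[j:]


def bitonic_sort_step(ref, idx, j, k):
    """One emulated kernel call: every position does one compare-and-swap."""
    for i in range(len(idx)):     # on the GPU, one thread per i
        partner = i ^ j
        if partner > i:
            ascending = (i & k) == 0
            if suffix_less(ref, idx[partner], idx[i]) == ascending:
                idx[i], idx[partner] = idx[partner], idx[i]


def bwt_via_bitonic(ref):
    """Return (suffix array, BWT, number of emulated kernel calls)."""
    n = len(ref)
    size = 1
    while size < n:               # round up to a power of two
        size *= 2
    idx = list(range(size))       # indices array; entries >= n are padding
    calls = 0
    k = 2
    while k <= size:
        j = k // 2
        while j >= 1:
            bitonic_sort_step(ref, idx, j, k)
            calls += 1
            j //= 2
        k *= 2
    sa = idx[:n]                  # padding ends up at the tail
    # Bwt[i] = ref[SA[i] - 1]; with 0-based positions, SA[i] == 0 gives
    # ref[-1], which is the final '$'.
    bwt = "".join(ref[p - 1] for p in sa)
    return sa, bwt, calls


sa, bwt, calls = bwt_via_bitonic("ACGTA$")
```

Only the indices are swapped; the reference string itself is never moved, which is what keeps the per-step work at one comparison and at most one swap per element pair.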
CPU – GPU Interaction (BWT)
[Diagram labels: Cpu; Genome; Initialise_indices_array; Bitonic; Cuda_Memcpy & kernel call; Bitonic_sort_step; Suffix_compare; Suffix Array; Suffix -> BWT; O(log2G) Searching]
Evaluation
GPU Statistics
| Parameter | Value |
| --- | --- |
| SM | 30 |
| Cores/SM | 8 |
| Cores | 240 |
| Data cache (per SM) | 16 K |
| DRAM | 536 M |
| Proc. freq | 1.2 GHz |
[Figure: Bwt with Bitonic Sort]
Comparison between Expected (GPU) and Exact result
| Parameter | CPU | GPU |
| --- | --- | --- |
| Cores | 2 | 240 |
| Data cache | L1: 32K, L2: 6M | 16K (per SM) |
| DRAM | 12 GB | 536 M |
| Proc. clock | 2.9 GHz | 1.2 GHz |
Expected GPU time = (Quick_Sort_time * 2) / 240
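Written as a formula, the expected GPU time scales the measured quick-sort time by the ratio of CPU cores (2) to GPU cores (240). The symbols below are labels introduced here for illustration. Reading the estimate as ideal linear scaling with core count, ignoring clock and memory differences, is an interpretation and is not stated on the slide.

```latex
T_{\text{GPU, expected}} = \frac{T_{\text{quick sort}} \times 2}{240}
```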
Thanks
Future Work
• Run in limited-memory environments
  – Compute in parts
• Use the memory hierarchy of the GPU
  – Sort keys are cached in registers or shared memory
  – Long runs of a repeated character
    • Position indicating the end of the run
• Can only sort sequences whose length is a power of 2
  – 2^k + 1 → 2^(k+1)
  – Padding with the largest symbol
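The padding idea can be sketched as follows. This is an illustration under stated assumptions: `next_power_of_two` and `pad_sequence` are hypothetical names, and '~' is an arbitrary choice of pad symbol that sorts after $, A, C, G and T in ASCII, so the padded entries collect at the end after sorting.

```python
def next_power_of_two(n):
    """Smallest power of two >= n (for n >= 1)."""
    p = 1
    while p < n:
        p *= 2
    return p


def pad_sequence(seq, pad_symbol="~"):
    """Pad `seq` on the right so that its length is a power of two."""
    target = next_power_of_two(len(seq))
    return seq + pad_symbol * (target - len(seq))


padded = pad_sequence("ACGTA$")   # length 6 is padded up to 8
```

In the worst case a sequence of length 2^k + 1 is padded to 2^(k+1), so the padded input is almost twice as long as the original.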