MODELING INTER-MOTIF DEPENDENCE WITHOUT INCREASING THE COMPLEXITY
Zhizhuo Zhang
PWM MODEL
1 2 3 4 5 6
A 0.1 0 0 1 0.4 0.4
C 0 0.1 0 0 0.4 0
G 0.1 0 1 0 0.2 0.2
T 0.8 0.9 0 0 0 0.4
Position Weight Matrix (PWM)
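$$\log P(x \mid \Theta) = \sum_{i=1}^{|x|} \sum_{j=1}^{4} \mathbb{I}\big(x(i), j\big) \log \theta_{ij}$$
where $\mathbb{I}(\cdot,\cdot)$ is the indicator function, $x(i)$ is the base at position $i$ of $x$, and $\theta_{ij}$ is the PWM entry for base $j$ at position $i$.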
Example binding sites (length 6): TTGACT TCGACT TTGACT TTGAAA ATGAGG TTGAAA GTGAAA TTGACT TTGAGG TTGAAA
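A minimal C++ sketch of the log-likelihood formula above, applied to the example matrix. The names PwmColumn, base_index, pwm_log_likelihood and example_pwm are illustrative, not from the slides. Because the indicator is 1 only for the observed base, the inner sum over j reduces to one lookup per position.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// One PWM column: probabilities of A, C, G, T at a single motif position.
using PwmColumn = std::array<double, 4>;

// Row index of a base in the PWM (A=0, C=1, G=2, T=3).
int base_index(char b) {
    switch (b) {
        case 'A': return 0;
        case 'C': return 1;
        case 'G': return 2;
        case 'T': return 3;
        default: throw std::invalid_argument("unexpected base");
    }
}

// log P(x | Theta) = sum_i sum_j I(x(i), j) log(theta_ij).
// The indicator keeps only the observed base at each position, so each position
// contributes a single log-probability. A base with probability 0 gives -infinity.
double pwm_log_likelihood(const std::vector<PwmColumn>& theta, const std::string& x) {
    assert(theta.size() == x.size());
    double ll = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        ll += std::log(theta[i][base_index(x[i])]);
    }
    return ll;
}

// The example matrix from the slide; element [i][j] is theta_ij (position i+1, base j).
std::vector<PwmColumn> example_pwm() {
    return {
        PwmColumn{0.1, 0.0, 0.1, 0.8},
        PwmColumn{0.0, 0.1, 0.0, 0.9},
        PwmColumn{0.0, 0.0, 1.0, 0.0},
        PwmColumn{1.0, 0.0, 0.0, 0.0},
        PwmColumn{0.4, 0.4, 0.2, 0.0},
        PwmColumn{0.4, 0.0, 0.2, 0.4},
    };
}
```

For TTGACT this gives log(0.8 · 0.9 · 0.4 · 0.4), about -2.161; any site with C at position 1 gets -infinity, because that base never occurs there.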
HIGH-ORDER DEPENDENCY
1st-order dependency: joint 2-mer probabilities at positions 5-6
2-mer  P(positions 5-6)
CT  0.4
AA  0.4
GG  0.2
CC  0
AC  0
…   0
TT  0
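HIGH-ORDER DEPENDENCY
Assume only one dependency group $I$ (a set of positions):
$$\log P(x \mid \Theta_1, \Theta_2) = \sum_{i \in \{1,\dots,|x|\} \setminus I} \sum_{j=1}^{4} \mathbb{I}\big(x(i), j\big) \log \theta_{ij} + \log \Theta_2\big(x(I)\big)$$
where $\mathbb{I}(\cdot,\cdot)$ is the indicator function, $\Theta_1 = (\theta_{ij})$ holds the PWM entries of the positions outside $I$, $x(I)$ is the word formed by the bases of $x$ at the positions in $I$, and $\Theta_2$ is a distribution over the $4^{|I|}$ possible words.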
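A self-contained C++ sketch of the one-group formula, assuming 0-based positions and a word table Theta2 stored as a map from the group's bases to a probability (group_log_likelihood and the other names are hypothetical). With the group at the last two positions and the 2-mer table from the previous slide, TTGACT scores log(0.8 · 0.9 · 0.4) instead of the PWM's log(0.8 · 0.9 · 0.4 · 0.4), and TTGACA, whose 2-mer CA never occurs, gets -infinity although the independent PWM gives it probability 0.8 · 0.9 · 0.4 · 0.4.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <map>
#include <set>
#include <stdexcept>
#include <string>
#include <vector>

using PwmColumn = std::array<double, 4>;  // P(A), P(C), P(G), P(T) at one position

int base_index(char b) {
    switch (b) {
        case 'A': return 0;
        case 'C': return 1;
        case 'G': return 2;
        case 'T': return 3;
        default: throw std::invalid_argument("unexpected base");
    }
}

// log P(x | Theta1, Theta2) for a single dependence group (0-based positions):
// positions outside the group use their PWM column in theta1, and the bases at the
// group positions, read in increasing position order, are looked up in theta2.
double group_log_likelihood(const std::vector<PwmColumn>& theta1,
                            const std::set<std::size_t>& group,
                            const std::map<std::string, double>& theta2,
                            const std::string& x) {
    assert(theta1.size() == x.size());
    double ll = 0.0;
    std::string word;  // x(I)
    for (std::size_t i = 0; i < x.size(); ++i) {
        if (group.count(i) > 0) {
            word += x[i];
        } else {
            ll += std::log(theta1[i][base_index(x[i])]);
        }
    }
    auto it = theta2.find(word);
    // A word missing from the table has probability 0, i.e. log-likelihood -infinity.
    return ll + std::log(it == theta2.end() ? 0.0 : it->second);
}
```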
TWO MODELING PRINCIPLES
Interdependent bases exist only at divergent (non-conserved) positions.
No dependence relationship crosses a conserved base.
PRINCIPLE ONE
KL divergence is commonly used to measure the dissimilarity between two probability distributions.
Goal: show that the KL divergence between the (k+1)-order distribution and the product of the k-order distribution and the 0-order distribution is small when the base at position k+1 is highly conserved.
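PRINCIPLE ONE
The KL divergence between the (k+1)-order distribution and the product of the k-order distribution and the 0-order distribution is:
$$
\begin{aligned}
D_{KL}\big(P_{1:k+1} \,\|\, P_{1:k} P_{k+1}\big)
&= \sum_{A \in \{a,c,g,t\}^k,\; x \in \{a,c,g,t\}} P_{1:k+1}(A,x) \log \frac{P_{1:k+1}(A,x)}{P_{1:k}(A)\,P_{k+1}(x)} \\
&= \sum_{A,x} P_{1:k+1}(A,x) \log \frac{P_{1:k+1}(A,x)}{P_{1:k}(A)} - \sum_{A,x} P_{1:k+1}(A,x) \log P_{k+1}(x) \\
&= \Big(\sum_{A,x} P_{1:k+1}(A,x) \log \frac{P_{1:k+1}(A,x)}{P_{1:k}(A)}\Big) + H_{k+1}(X) \\
&= H_{1:k}(A) - H_{1:k+1}(A,X) + H_{k+1}(X) \\
&= -H(X \mid A) + H_{k+1}(X) \;\le\; H_{k+1}(X)
\end{aligned}
$$
Since $H(X \mid A) \ge 0$, the divergence is bounded by $H_{k+1}(X)$, the entropy of position k+1, which is close to zero when that base is highly conserved.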
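A small numeric check of the bound, assuming the ten example sites from the PWM slide and empirical frequencies as the distributions (kl_to_product and KlResult are hypothetical names). With k = 1, positions 5 and 6 of the sites are fully dependent but position 6 is not conserved, so the divergence equals its entropy (about 1.055 nats); the conserved pair at positions 3 and 4 gives zero.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

struct KlResult {
    double kl;            // D_KL(P_{1:k+1} || P_{1:k} P_{k+1}), in nats
    double entropy_next;  // H_{k+1}(X), the upper bound from the derivation
};

// Estimate both sides of the bound from aligned sites: the first k characters of
// each site form A, and character k (0-based) is x. Probabilities are empirical
// frequencies, so every (A, x) that occurs has nonzero marginals.
KlResult kl_to_product(const std::vector<std::string>& sites, std::size_t k) {
    assert(!sites.empty());
    const double w = 1.0 / static_cast<double>(sites.size());
    std::map<std::string, double> joint, prefix;  // P_{1:k+1}(A, x) and P_{1:k}(A)
    std::map<char, double> next;                  // P_{k+1}(x)
    for (const std::string& s : sites) {
        assert(s.size() > k);
        joint[s.substr(0, k + 1)] += w;
        prefix[s.substr(0, k)] += w;
        next[s[k]] += w;
    }
    KlResult r{0.0, 0.0};
    for (const auto& entry : joint) {
        const double p = entry.second;
        const double pa = prefix.at(entry.first.substr(0, k));
        const double px = next.at(entry.first[k]);
        r.kl += p * std::log(p / (pa * px));
    }
    for (const auto& entry : next) {
        r.entropy_next -= entry.second * std::log(entry.second);
    }
    return r;
}
```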
PRINCIPLE TWO
Cys2His2 zinc finger DNA-binding family: the largest known DNA-binding family in multicellular organisms.
[Figure label: Independent]
CONTROL THE COMPLEXITY
The larger the dependence group, the more parameters the model has and the more easily it overfits.
We want to model a k-order dependence (a group of k+1 positions) with the same number of parameters as a PWM over k+1 independent positions, i.e., 4k+4 parameters.
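For a sense of scale, the parameter counts for one group of k+1 positions, worked for a whole 6-bp site (k = 5):

```latex
\underbrace{4^{k+1}}_{\text{full joint table}}
\quad \text{vs.} \quad
\underbrace{4(k+1) = 4k+4}_{\text{same budget as } k+1 \text{ PWM columns}}
\qquad\Longrightarrow\qquad
k = 5:\; 4^{6} = 4096 \;\text{ vs. }\; 24
```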
CONTROL THE COMPLEXITY
Position  1    2    3    4    5-6
A         0.1  0    0    1    CT=0.4  AC=0
C         0    0.1  0    0    AA=0.4  CA=0
G         0.1  0    1    0    GG=0.2  TT=0
T         0.8  0.9  0    0    CC=0    Other=0
Dependence Position Weight Matrix (DPWM)
CONTROL THE COMPLEXITY
Problem formulation: given a set X of binding-site sequences, each of length k, find the DPWM Ω with 4k parameters that maximizes the likelihood P(X | Ω) (equivalently, minimizes the KL divergence).
We can prove that the best solution takes the probabilities of the top 4k-1 k-mers as the first 4k-1 parameter values:
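Index the k-mers $A \in \{a,c,g,t\}^k$ as $1, 2, 3, \dots, 4^k$ in decreasing order of the target probability $P_t$. The k-order dependency model is built according to the following rules:
$$
P_k(i) = \begin{cases} P_t(i), & i \in [1,\, 4k-1] \\ m_k, & i \in [4k,\, 4^k] \end{cases}
\qquad
m_k = \frac{\sum_{i=4k}^{4^k} P_t(i)}{4^k - 4k + 1}
$$
$$
\text{find } P_k \text{ maximizing } \sum_{A \in \{a,c,g,t\}^k} P_t(A) \log P_k(A)
$$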
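A minimal C++ sketch of the construction rule above, assuming P_t is given as a vector of 4^k probabilities indexed by a base-4 k-mer code (encode_kmer, compress_kmer_distribution and expected_log_likelihood are hypothetical names). It ranks the k-mers itself, so the input need not be sorted, and it implements only the construction and the objective, not the optimality proof.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: base-4 code of a k-mer (A=0, C=1, G=2, T=3), in [0, 4^k).
std::size_t encode_kmer(const std::string& kmer) {
    std::size_t code = 0;
    for (char c : kmer) {
        std::size_t b = 0;
        switch (c) {
            case 'A': b = 0; break;
            case 'C': b = 1; break;
            case 'G': b = 2; break;
            case 'T': b = 3; break;
            default: throw std::invalid_argument("unexpected base");
        }
        code = code * 4 + b;
    }
    return code;
}

// Build the 4k-parameter model P_k from a full k-mer distribution P_t (size 4^k):
// the 4k-1 most probable k-mers keep P_t exactly; the remaining 4^k - 4k + 1
// k-mers all get m_k, their average probability.
std::vector<double> compress_kmer_distribution(const std::vector<double>& pt, int k) {
    assert(k >= 1);
    std::size_t n = 1;
    for (int i = 0; i < k; ++i) n *= 4;  // n = 4^k
    assert(pt.size() == n);
    const std::size_t kept = 4 * static_cast<std::size_t>(k) - 1;  // 4k - 1

    // Rank k-mers by decreasing P_t (ties keep their index order).
    std::vector<std::size_t> order(n);
    std::iota(order.begin(), order.end(), std::size_t{0});
    std::stable_sort(order.begin(), order.end(),
                     [&pt](std::size_t a, std::size_t b) { return pt[a] > pt[b]; });

    std::vector<double> pk(n, 0.0);
    double tail_mass = 0.0;
    for (std::size_t r = 0; r < n; ++r) {
        if (r < kept) pk[order[r]] = pt[order[r]];
        else tail_mass += pt[order[r]];
    }
    const double m_k = tail_mass / static_cast<double>(n - kept);  // n - kept = 4^k - 4k + 1
    for (std::size_t r = kept; r < n; ++r) pk[order[r]] = m_k;
    return pk;
}

// Objective: sum over A of P_t(A) log P_k(A); terms with P_t(A) = 0 contribute nothing.
double expected_log_likelihood(const std::vector<double>& pt, const std::vector<double>& pk) {
    assert(pt.size() == pk.size());
    double s = 0.0;
    for (std::size_t i = 0; i < pt.size(); ++i) {
        if (pt[i] > 0.0) s += pt[i] * std::log(pk[i]);
    }
    return s;
}
```

For k = 2 this keeps 7 explicit 2-mers plus one shared value for the other 9, which is the layout of the DPWM table (seven named 2-mers and "Other").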
EXHAUSTIVE SEARCH DEPENDENCE
Naive method: enumerate all combinations of dependence groups and pick the combination with the maximum likelihood. Example for length 5:
1,2,3,4,5   (1,2,3),4,5   (1,2),3,(4,5)   (1,2,4,5),3   (1,2,3,4,5)   …
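The slide does not state the size of the naive search space. Assuming "all the combinations" means every way of partitioning the positions into groups (the examples include non-contiguous groups such as (1,2,4,5),3), the count is the Bell number B(n): 52 for length 5 and 1,382,958,545 for length 15. A minimal C++ sketch; bell_number is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Number of ways to partition n motif positions into groups (Bell number B(n)),
// via the Bell triangle: each row starts with the last entry of the previous row,
// and each further entry is its left neighbour plus the entry above that neighbour.
// B(n) is the first entry of row n. Limited to n <= 24 so all entries fit in 64 bits.
std::uint64_t bell_number(int n) {
    assert(n >= 0 && n <= 24);
    std::vector<std::uint64_t> row{1};  // row 0
    for (int i = 1; i <= n; ++i) {
        std::vector<std::uint64_t> next{row.back()};
        for (std::uint64_t v : row) next.push_back(next.back() + v);
        row = std::move(next);
    }
    return row.front();
}
```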
EXHAUSTIVE SEARCH DEPENDENCE
Improved method: enumerate only models with a single dependence group.
If D1 and D2 are two disjoint (independent) groups, the results for D1 alone and D2 alone can be used to score the combined model (D1, D2). In effect, this is a greedy search.
Example, candidates sorted by log likelihood:
(1,2),3,4,5: -32
(1,2,3),4,5: -44
1,2,3,(4,5): -50
…
1,2,3,4,5: -100
The best: (1,2),3,(4,5)
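A hedged sketch of the greedy step, assuming each candidate is a single dependence group scored by the log likelihood of "that group plus all other positions independent", and assuming log likelihoods add across disjoint groups so that non-overlapping groups can be combined. Candidate and greedy_dependence_groups are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <set>
#include <vector>

// A candidate model with exactly one dependence group (1-based positions),
// scored by the log likelihood of "this group + all other positions independent".
struct Candidate {
    std::vector<int> group;
    double log_likelihood;
};

// Greedy combination: visit candidates from best to worst score and keep each
// group that beats the all-independent model and shares no position with a
// group already kept.
std::vector<std::vector<int>> greedy_dependence_groups(std::vector<Candidate> candidates,
                                                       double independent_log_likelihood) {
    std::stable_sort(candidates.begin(), candidates.end(),
                     [](const Candidate& a, const Candidate& b) {
                         return a.log_likelihood > b.log_likelihood;
                     });
    std::set<int> used;  // positions already in a kept group
    std::vector<std::vector<int>> chosen;
    for (const Candidate& c : candidates) {
        assert(!c.group.empty());
        if (c.log_likelihood <= independent_log_likelihood) break;  // no further gain
        bool overlaps = std::any_of(c.group.begin(), c.group.end(),
                                    [&used](int p) { return used.count(p) > 0; });
        if (overlaps) continue;
        chosen.push_back(c.group);
        used.insert(c.group.begin(), c.group.end());
    }
    return chosen;
}
```

With the example scores and a baseline of -100, it keeps (1,2), skips (1,2,3) because it overlaps (1,2), and keeps (4,5), giving (1,2),3,(4,5).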
RESULT
We ran MEME, CisFinder, Amadeus, ChIPMunk, HMS, Trawler, Weeder and JPomoda on 15 ES cell ChIP-seq datasets.
For each dataset, one half of the ChIP-seq peaks was used to learn de novo PWMs and the other half to validate their performance.
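The slides report validation AUCs but do not say what the negative set is or how sequences are scored. As a hedged sketch, assuming each held-out peak (positive) and each background sequence (negative) receives one motif score, the AUC can be computed as the Mann-Whitney statistic below; roc_auc is a hypothetical name, and the quadratic loop is chosen for clarity, not speed.

```cpp
#include <cassert>
#include <vector>

// ROC AUC as the Mann-Whitney statistic: the fraction of (positive, negative)
// pairs in which the positive sequence has the higher score; ties count 1/2.
double roc_auc(const std::vector<double>& positive_scores,
               const std::vector<double>& negative_scores) {
    assert(!positive_scores.empty() && !negative_scores.empty());
    double wins = 0.0;
    for (double p : positive_scores) {
        for (double n : negative_scores) {
            if (p > n) wins += 1.0;
            else if (p == n) wins += 0.5;
        }
    }
    return wins / (static_cast<double>(positive_scores.size()) *
                   static_cast<double>(negative_scores.size()));
}
```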
RESULT
AUC on the held-out peaks (DP_<tool>: the same tool's motif with dependence groups added; NA: not available)
Dataset MEME DP_MEME Weeder DP_Weeder CisFinder DP_CisFinder Amadeus DP_Amadeus HMS DP_HMS Trawler DP_Trawler ChIPMunk DP_ChIPMunk JPomoda DP_JPomoda
tcfcp2l1 0.9212 0.9375 0.8911 0.9544 0.9328 0.9644 0.8615 0.8752 0.9707 0.9673 NA NA 0.9703 0.9702 0.9710 NA
klf4 0.8625 0.8596 0.8445 0.8569 0.8487 0.8601 0.8240 0.8389 0.8612 0.8561 0.6310 0.6538 0.8637 0.8592 0.8360 0.8369
suz12 0.6434 0.6438 0.5852 0.5695 0.5760 0.5838 0.5912 0.5919 0.5920 0.5959 NA NA NA NA 0.5963 0.6005
zfx 0.7586 0.7548 0.7717 0.7432 0.7406 0.7433 0.6974 0.7089 0.6166 0.6096 0.7606 0.7624 0.7562 0.7672 0.7522 0.7531
stat3 0.7137 0.7229 0.6989 0.7200 0.7216 0.7323 0.7159 0.7041 0.7035 0.7116 0.6898 0.7090 0.7243 0.7332 0.7455 0.7424
nmyc 0.7785 0.7803 0.7425 0.7406 0.7494 0.7520 0.7425 0.7455 0.7145 0.7358 NA NA 0.7547 0.7728 0.7640 0.7602
esrrbredo 0.9099 0.9052 0.9076 0.9144 0.8994 0.9051 0.8874 0.8820 0.8807 0.8967 0.8769 0.8876 NA NA 0.8729 0.8713
cmyc 0.7681 0.7668 0.7550 0.7617 0.7746 0.7728 0.7631 0.7594 0.6855 0.6984 0.7472 0.7675 NA NA 0.7801 0.7807
e2f1 0.6110 0.6185 0.5714 0.5803 0.5729 0.6039 0.5818 0.5875 0.5884 0.5900 0.5629 0.5767 0.6420 0.6464 0.6208 0.6204
nanog 0.6690 0.6813 0.6649 0.6850 0.6635 0.6835 0.6074 0.6171 0.6722 0.6795 NA NA 0.5635 0.5554 0.6964 0.6997
oct4 0.6673 0.6827 0.6646 0.6816 0.6460 0.6784 0.6293 0.6470 0.4790 0.4780 NA NA 0.7194 0.7136 0.6880 0.6891
sox2 0.8449 0.8837 0.8151 0.8514 0.8145 0.8615 0.7369 0.7434 0.5758 0.5823 0.7881 0.8506 0.8323 0.8558 0.8185 0.8427
smad1 0.5848 0.5847 0.5847 0.6042 0.5765 0.5765 0.5767 0.5767 0.5781 0.5718 0.5484 0.5504 0.6048 0.5957 0.6328 0.6328
ctcf 0.9809 0.9854 0.9708 0.9846 0.9648 0.9855 0.9474 0.9680 0.9790 0.9862 NA NA 0.9819 0.9818 0.9804 0.9835
p300 0.6198 0.6062 0.5355 0.5224 NA NA NA NA 0.5749 0.5883 NA NA 0.5831 0.5898 0.5709 NA
ADJACENT DEPENDENCY
MEME CTCF motif, dependence groups 1-2-3 and 10-11
AUC: MEME 0.9809, with dependence 0.9854
LARGE DEPENDENCY GROUP
MEME SOX2 motif, dependence groups 1-2-3-4-5-7 and 14-15
AUC: MEME 0.845, with dependence 0.884
LONG DEPENDENCY
MEME NMYC motif, dependence groups 10-21 and 11-12
AUC: MEME 0.7785, with dependence 0.7803
NEW SERVERS CONFIGURATION
MODEL & PRICE
Hostname: genome
3U server; 2x Intel Xeon X5680 processors (6 cores each); 144 GB RAM; 16x 2 TB SAS disks; 2x 1G network interfaces. Price: 20k SGD
Hostname: biogpu
1U server; 2x Intel Xeon X5680 processors (6 cores each); 2x M2050 GPUs; 48 GB RAM; 3x 2 TB SATA2 disks; 2x 1G network interfaces. Price: 18k SGD
FILE SYSTEM
genome: RAID-6, 28 TB, CentOS 5.5; home: 23 TB
biogpu: RAID-5, 4 TB, Yellow Dog Linux (CentOS 5.4); home: 3 TB
SERVER SOFTWARE
NIS: the same user accounts on both servers
NFS:
  Home directories: genome server
  Public_html: biogpu server
  Shared software: /cluster/biogpu/programs/bin/
Apache: biogpu server
MySQL: genome server
CURRENT PROBLEMS
Filesystem                       Size  Used  Avail  Use%  Mounted on
/dev/mapper/VolGroup00-LogVol02  393G  4.7G  368G   2%    /
/dev/mapper/VolGroup00-LogVol00  2.0T  199M  1.9T   1%    /tmp
/dev/sdb2                        23T   23T   439G   99%   /home
/dev/sda1                        920M  47M   826M   6%    /boot
tmpfs                            71G   0     71G    0%    /dev/shm
I/O killer
TO DO
Swap backup
Connect to Tembusu
Install SGE (Sun Grid Engine)
GPU COMPUTING
Fermi M2050
Peak double-precision floating-point performance: 515 GFLOPS
Peak single-precision floating-point performance: 1030 GFLOPS
CUDA cores: 448
Memory size (GDDR5): 3 GB
Memory bandwidth (ECC off): 144 GB/s
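CODE EXAMPLE TO ADD TWO ARRAYS (CUDA C PROGRAM)

// Device code: a CUDA kernel
__global__ void addMatrixG( float *a, float *b, float *c, int N )
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;   // column
    int j = blockIdx.y * blockDim.y + threadIdx.y;   // row
    int index = i + j * N;
    if ( i < N && j < N )   // threads past the edge of the N x N matrix do nothing
        c[index] = a[index] + b[index];
}

// Host code (a, b, c must point to device memory)
int main()
{
    ......
    dim3 dimBlk( 16, 16 );
    // Round up so that N does not have to be a multiple of the block size
    dim3 dimGrd( (N + dimBlk.x - 1) / dimBlk.x, (N + dimBlk.y - 1) / dimBlk.y );
    addMatrixG<<<dimGrd, dimBlk>>>( a, b, c, N );
}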
CUDA MEMORY MODEL
Each thread can:
• R/W per-thread registers
• R/W per-thread local memory
• R/W per-block shared memory
• R/W per-grid global memory
• Read only per-grid constant memory
• Read only per-grid texture memory
The host can R/W global, constant and texture memory.
CUDA MEMORY HIERARCHY
The CUDA platform has three primary memory types:
Local memory: per-thread memory for automatic variables and register spilling.
Shared memory: per-block, low-latency memory for intra-block data sharing and synchronization. Threads in a block can safely share data through this memory and can perform barrier synchronization with __syncthreads().
Global memory: device-level memory that can be shared between blocks and grids.
MOVING DATA…
CUDA allows us to copy data from one memory type to another.
This includes copying to and from the host's memory (main system RAM).
To facilitate this data movement, CUDA provides cudaMemcpy().
CUDA EXAMPLE 1 – VECTOR ADDITION (1)
// Device code
__global__ void VecAdd( float *A, float *B, float *C, int N )
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if ( i < N )   // the last block may contain threads past the end of the vectors
        C[i] = A[i] + B[i];
}

// Host code
// N, h_A, h_B and h_C are defined and allocated on the host (not shown on the slides)
int main()
{
    // Allocate vectors in device memory
    size_t size = N * sizeof(float);
    float *d_A; cudaMalloc( (void**)&d_A, size );
    float *d_B; cudaMalloc( (void**)&d_B, size );
    float *d_C; cudaMalloc( (void**)&d_C, size );

CUDA EXAMPLE 1 – VECTOR ADDITION (2)
    // Copy vectors from host memory to device memory
    // h_A and h_B are input vectors stored in host memory
    cudaMemcpy( d_A, h_A, size, cudaMemcpyHostToDevice );
    cudaMemcpy( d_B, h_B, size, cudaMemcpyHostToDevice );

    // Invoke kernel: round the grid size up so every element gets a thread
    int threadsPerBlock = 256;
    int blocksPerGrid = (N + threadsPerBlock - 1) / threadsPerBlock;
    VecAdd<<<blocksPerGrid, threadsPerBlock>>>( d_A, d_B, d_C, N );

    // Copy result from device memory to host memory
    // h_C contains the result in host memory
    cudaMemcpy( h_C, d_C, size, cudaMemcpyDeviceToHost );

    // Free device memory
    cudaFree(d_A); cudaFree(d_B); cudaFree(d_C);
}
OPTIMIZATION
Minimize divergent paths (if, else, …) within a warp
Coalesce global memory accesses
Convert scattering to gathering
Use shared memory as much as possible
COMPILING CODE
Linux
Command line: CUDA provides nvcc (an NVIDIA "compiler driver"); use it instead of gcc.
nvcc -O3 -o <exe> <input> -I/usr/local/cuda/include -L/usr/local/cuda/lib -lcudart
nvcc separates the code for the CPU from the code for the GPU and compiles both; a regular C compiler must be installed for the CPU part. Makefiles are also provided.
Windows
NVIDIA suggests using Microsoft Visual Studio
CUDA TOO HARD?
Use other software that already has CUDA acceleration
Use a wrapper library
CUDA ACCELERATED SOFTWARE
cuBLAS, cudaLAPACK, CudaR, CudaPy
CUDA bioinformatics software:
Molecular dynamics & quantum chemistry: • ACEMD • AMBER • BigDFT (ABINIT) • GROMACS • HOOMD • LAMMPS • NAMD • TeraChem (quantum chemistry) • VMD
Bioinformatics: • CUDA-BLASTP • CUDA-EC • CUDA-MEME • CUDASW++ (Smith-Waterman) • DNADist • GPU Blast • GPU-HMMER • HEX Protein Docking • Jacket (MATLAB plugin) • MUMmerGPU • MUMmerGPU++
THRUST
• Searching: Binary Search, Vectorized Searches
• Copying: Gathering, Scattering
• Reductions: Counting, Comparisons, Extrema, Transformed Reductions, Logical Predicates
• Reordering: Partitioning, Stream Compaction
• Prefix Sums: Segmented Prefix Sums, Transformed Prefix Sums
• Set Operations
• Sorting
• Transformations: Filling, Modifying, Replacing
EXAMPLE 1
#include <thrust/count.h>
#include <thrust/device_vector.h>
...
// put three 1s in a device_vector
thrust::device_vector<int> vec(5, 0);
vec[1] = 1;
vec[3] = 1;
vec[4] = 1;

// count the 1s
int result = thrust::count(vec.begin(), vec.end(), 1);
// result is three
EXAMPLE 2
#include <thrust/transform_reduce.h>
#include <thrust/functional.h>
#include <thrust/device_vector.h>
#include <thrust/host_vector.h>
#include <cmath>
#include <iostream>

// square<T> computes the square of a number f(x) -> x*x
template <typename T>
struct square
{
    __host__ __device__
    T operator()(const T& x) const
    {
        return x * x;
    }
};

int main(void)
{
    // initialize host array
    float x[4] = {1.0, 2.0, 3.0, 4.0};

    // transfer to device
    thrust::device_vector<float> d_x(x, x + 4);

    // setup arguments
    square<float> unary_op;
    thrust::plus<float> binary_op;
    float init = 0;

    // compute norm: square each element, sum the squares, take the square root
    float norm = std::sqrt( thrust::transform_reduce(d_x.begin(), d_x.end(), unary_op, init, binary_op) );

    std::cout << norm << std::endl;   // sqrt(30), about 5.477

    return 0;
}