Improving Bayesian Computational Time and Scalability with GPGPU
TRANSCRIPT
-
8/6/2019 Improving Bayesian Computational Time and Scalability With GPGPU
Improving Bayesian Computational Time and Scalability with GPGPU
Thanakij Pechprasarn, Noppadon Khiripet
Knowledge Elicitation Archiving Laboratory (KEA)
National Electronics and Computer Technology Center
(NECTEC), ANSCSE 15, 1st April 2011
-
Bayesian applications
Styles of problems include: inference problems, causal problems
For example, the problem may be that, given that the grass is wet (evidence), what is the probability of each influential cause (rain, sprinkler)?
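The grass-wet question can be sketched numerically. The slides give no actual probabilities, so every number below is a made-up illustration, and the causes are treated as mutually exclusive for simplicity:

```python
# Hypothetical numbers for illustration only; causes assumed mutually exclusive.
priors = {"rain": 0.2, "sprinkler": 0.3, "neither": 0.5}        # assumed P(cause)
likelihoods = {"rain": 0.9, "sprinkler": 0.8, "neither": 0.05}  # assumed P(wet | cause)

# Bayes' theorem: P(cause | wet) = P(wet | cause) * P(cause) / P(wet)
p_wet = sum(likelihoods[c] * priors[c] for c in priors)         # normalizing constant
posterior = {c: likelihoods[c] * priors[c] / p_wet for c in priors}
```

With these assumed numbers the sprinkler comes out as the most probable cause once the wet grass is observed; the point is only the mechanics of conditioning.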
-
Bayesian probability
Probability as a degree of belief
Conditional probability: given information (evidence), your belief changes
Posterior as inverse probability: Bayes' theorem
$$P(\theta \mid D) = \frac{P(D \mid \theta)\,P(\theta)}{P(D)}$$

where
$P(\theta)$ = prior of $\theta$
$P(D \mid \theta)$ = likelihood
$P(\theta \mid D)$ = posterior
$P(D)$ = prior of $D$ (acts as a normalizing constant of value $\int P(D \mid \theta)\,P(\theta)\,d\theta$)
-
Our selected application
To do hypothesis testing given observed data
The expected value of the posterior has to fall under the 95% region (credible interval) of the prior distribution
If true, then the hypothesis is accepted; otherwise it is rejected
-
Posterior expectation
An expected value of the posterior, $E_{P(\theta \mid D)}[\theta]$
It requires one to sample from the posterior, but a sampling method for the posterior may not be known, especially when the posterior has a complex form
We can work out the math to make it simpler
Remark: a powerful method such as Markov chain Monte Carlo could be used instead
-
Posterior expectation (2)
The definition of an expected value:

$$E_{P(X)}[X] = \int x\,P(x)\,dx$$

So,

$$E_{P(\theta \mid D)}[\theta] = \int \theta\,P(\theta \mid D)\,d\theta$$

Using Bayes' rule,

$$= \int \theta \cdot \frac{P(D \mid \theta)\,P(\theta)}{P(D)}\,d\theta$$

From the definition of an expectation,

$$= \frac{E_{P(\theta)}[\theta \cdot P(D \mid \theta)]}{P(D)}$$

Now we have changed the distribution from the posterior, $E_{P(\theta \mid D)}[\ldots]$, to the prior, $E_{P(\theta)}[\ldots]$
We assume that a known sampling method for the prior distribution exists
-
Hypothesis testing
We do the testing to see if the calculated expected value of the posterior falls under the 95% region of the prior distribution
That is, to see if $P(\theta \le E_{P(\theta \mid D)}[\theta]) < 0.95$ under the prior
-
Problems
However, we still have to solve the integrals appearing in the denominator, P(D), and in the hypothesis testing
Analytical methods may not work because a closed-form solution may not be found
Notice that we can convert back and forth between the integrals and the expectations
However, how can we really solve either an integral or an expected value?
-
Solutions
Monte Carlo integration (MCI) can be used to approximate an expectation/integral involving a random process

Thus, to find an expectation with MCI:
1. Sample $x_1, \ldots, x_N$ according to the underlying distribution
2. Calculate the sample mean:

$$E[f(x)] \approx \frac{1}{N}\sum_{i=1}^{N} f(x_i)$$
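As a minimal sketch of these two steps (not code from the talk), the following estimates $E[f(X)]$ for a standard normal $X$ with $f(x) = x^2$, whose true value is 1:

```python
import random

def mci_expectation(f, sampler, n=100_000, seed=0):
    """MCI: sample x_1..x_N from the distribution, then take the sample mean of f."""
    rng = random.Random(seed)
    return sum(f(sampler(rng)) for _ in range(n)) / n

# E[X^2] for X ~ N(0, 1) is the variance, i.e. 1
estimate = mci_expectation(lambda x: x * x, lambda rng: rng.gauss(0.0, 1.0))
```

The estimator is unbiased; its error shrinks like $1/\sqrt{N}$, which is exactly the drawback discussed on the next slide.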
-
Solutions (2)
Unfortunately, MCI also has its drawback
In general, the more samples, the more accurate the final answer
However, with many more samples, the computation becomes much slower!
-
GPUs and CUDA
GPU computing: leveraging graphics cards as an accelerator of the computation
Nvidia CUDA is a major framework for programming GPUs
CUDA allows developers to exploit parallelism in the form of blocks and threads
-
Current work
Make use of our previous work, the parallel reduction module on GPUs
Speed up the computation in a real-world Bayesian application with GPU computing
-
Current work (2)
Calculate the posterior expectation:

$$E_{P(\theta \mid D)}[\theta] = \frac{E_{P(\theta)}[\theta \cdot P(D \mid \theta)]}{P(D)} = \frac{E_{P(\theta)}[\theta \cdot P(D \mid \theta)]}{\int P(D \mid \theta)\,P(\theta)\,d\theta} = \frac{E_{P(\theta)}[\theta \cdot P(D \mid \theta)]}{E_{P(\theta)}[P(D \mid \theta)]}$$

With this form, we can calculate the expectations with MCI for both the numerator and denominator
-
Current work (3)
Given the computed value of the posterior expectation, one can test the hypothesis via Monte Carlo methods as follows:
1. $X_{1..N}$ = sample from the prior
2. count = the number of samples whose value is less than the expected value
3. If count/N < 0.95 then accept, else reject
-
Structure of the parallel program

1. Sample $\theta_1, \ldots, \theta_N$ from the prior, $P(\theta)$ (CPU)
2.
3. Calculate the posterior expectation (GPU)
   The numerator part, $\sum_{i=1}^{N} \theta_i \cdot P(D \mid \theta_i)/N$
   The denominator part, $\sum_{i=1}^{N} P(D \mid \theta_i)/N$
4.
5. Do hypothesis testing, checking $P(\theta \le E[\theta \mid D]) < 0.95$ (CPU)
-
Extra issues
In addition to the parallelized Bayesian application, we also handle 2 issues found in our previous work in the parallel reduction step:

1. Further optimization
Although results from the previous work show that the computational time is substantially reduced, we find that it can be further improved
Techniques: loop unrolling, enhancing the compacting code

2. Scalability
The problem is that a certain block size can handle a problem size only up to a certain point, so small blocks cannot afford larger problems
-
What about the likelihood and prior?

Prior ~ $N(5, 0.5)$ (broad prior)
Each observation ~ $N(\theta, 0.04)$ (normal model)
Likelihood = $\prod_{i=1}^{23} N(D_i; \theta, 0.04)$ (observations are independent)
The 23 observations we've used are from Cavendish's data: 5.36, 5.29, 5.58, 5.65, 5.57, 5.53, 5.62, 5.29, 5.44, 5.34, 5.79, 5.10, 5.27, 5.39, 5.42, 5.47, 5.63, 5.34, 5.46, 5.30, 5.78, 5.68, 5.85
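Putting the pieces together, here is a pure-Python sketch of the whole estimator on this model: sample θ from the prior, weight by the likelihood, and take the ratio of the two MCI estimates. It assumes 0.5 and 0.04 are variances, which the slides do not state explicitly:

```python
import math
import random

# Cavendish's 23 observations, as listed on the slide
data = [5.36, 5.29, 5.58, 5.65, 5.57, 5.53, 5.62, 5.29, 5.44, 5.34, 5.79, 5.10,
        5.27, 5.39, 5.42, 5.47, 5.63, 5.34, 5.46, 5.30, 5.78, 5.68, 5.85]

def log_likelihood(theta, var=0.04):
    """log of prod_i N(D_i; theta, var) -- observations are independent."""
    norm = -0.5 * len(data) * math.log(2 * math.pi * var)
    return norm - sum((d - theta) ** 2 for d in data) / (2 * var)

def posterior_expectation(n=50_000, seed=0):
    """E[theta | D] ~= E_prior[theta * L(theta)] / E_prior[L(theta)] via MCI."""
    rng = random.Random(seed)
    num = den = 0.0
    for _ in range(n):
        theta = rng.gauss(5.0, math.sqrt(0.5))  # prior N(5, 0.5), 0.5 as variance
        w = math.exp(log_likelihood(theta))     # likelihood weight
        num += theta * w
        den += w
    return num / den
```

With enough samples this lands near the 5.483 reported in the results; the same value also comes out analytically from the conjugate normal-normal posterior mean, which makes this model a good correctness check.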
-
Platforms
CUDA 3.2
A workstation with the following specification:

Description               CPU             GPU
Model                     Intel Core i7   Nvidia GeForce GTX 580
Clock frequency (GHz)     2.8             1.56
# processors              2               16
# cores per processor     4               32
# total cores             8               512
-
Results
-
Results (2)
The calculated expected value is about 5.483
It falls under the 95% region, so the hypothesis is accepted
-
Results (3)
Running time: Sequential (CPU) vs Parallel (GPU)
Our maximum speed-up achieved is 53.49x
-
Results (4)
However, we know that the parallel implementation also contains a sequential part
Currently only the portion of finding a posterior expectation is parallelized
If we compare the running time of this specific portion between the CPU and GPU versions, we would see a greater difference in performance
And the maximum speed-up we …
-
Summary
We've implemented a Bayesian application to do the hypothesis testing given a posterior expectation
We develop parallel programs running on GPUs to help accelerate the computation
Our maximum speed-up obtained is 53.49x
In addition, we cope with the extra issues from our previous work: further optimization and scalability
-
Thank You
Q&A
-
Solving the scalability issues
We now use 2D blocks instead of 1D blocks
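The reason a 2D grid helps: CUDA caps the grid's x-dimension at 65,535 blocks on GTX 580-class hardware, so with a 1D grid a given block size can only cover block_size × 65,535 elements; a 2D grid multiplies the available block budget. A small Python sketch of the flat index a thread would compute (names are illustrative, not from the talk):

```python
# Linearize a 2D grid of 1D thread blocks into one flat element index, as a
# CUDA kernel would: blockIdx.y * gridDim.x + blockIdx.x gives the block id,
# then multiply by the block size and add the thread id within the block.
def global_index(block_idx_x, block_idx_y, thread_idx, grid_dim_x, block_size):
    block_id = block_idx_y * grid_dim_x + block_idx_x
    return block_id * block_size + thread_idx

# e.g. block (3, 2) in a grid 1000 blocks wide, thread 5, 128 threads per block
idx = global_index(3, 2, 5, 1000, 128)   # (2*1000 + 3)*128 + 5 = 256389
```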
-
Results (scalability issue)
We show that the smallest block size can also be used with the largest problem size (this would not be possible in our previous work)

Problem Size      Running Time (second), Block Size = 128
65,535            0.011
131,070           0.021
262,140           0.041
524,280           0.080
1,048,560         0.159
2,097,120         0.317
4,194,240         0.631
8,388,480         1.261
16,776,960        2.523
33,553,920        5.076
67,107,840        10.368
134,215,680       20.516
268,431,360       40.332
-
Further optimization: Loop unrolling

(* parallel reduction in the reduce kernel *)
FOR s FROM num_samples/2 DOWN TO 64, halving s each iteration
    Sync threads  (* make sure that all threads are working on the same level of the tree *)
    IF threadId < s THEN
        Add s_data[threadId + s] to s_data[threadId]
    END IF
END FOR
(* loop unrolling: the last six levels run inside one warp, so no sync is needed *)
IF threadId < 32 THEN  (* CUDA warp size is 32 *)
    Add s_data[threadId + 32] to s_data[threadId]
    Add s_data[threadId + 16] to s_data[threadId]
    Add s_data[threadId + 8] to s_data[threadId]
    Add s_data[threadId + 4] to s_data[threadId]
    Add s_data[threadId + 2] to s_data[threadId]
    Add s_data[threadId + 1] to s_data[threadId]
END IF
-
Further optimization: Enhance compacting kernel

Original version: kernel_reduce()
Modified version: kernel_reduce()
-
Effect of further optimization

Unfortunately, each introduced optimization on parallel reduction seems to have only a little gain
We find that this is due to the other hot spot in the program that dominates the computation (that is, the time spent on …
-
Monte Carlo integration (MCI)

We want to integrate $f$ in $[a, b]$:

$$I = \int_a^b f(x)\,dx$$

Divide by $P(x)$, a distribution that we know how to sample from:

$$I = \int_a^b \frac{f(x)}{P(x)}\,P(x)\,dx$$

Change into a form of expectation:

$$I = E_{P(X)}\left[\frac{f(x)}{P(x)}\right]$$

Estimate the integral by sampling from $P(x)$ and calculating the sample mean:

$$I \approx \frac{1}{N}\sum_{i=1}^{N}\frac{f(x_i)}{P(x_i)}$$
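These four steps can be sketched in a few lines, using the simplest sampleable choice of $P(x)$, the uniform density $1/(b-a)$ on $[a, b]$ (this concrete choice is an illustration, not from the talk). Here it estimates $\int_0^1 x^2\,dx = 1/3$:

```python
import random

def mci_integral(f, a, b, n=100_000, seed=0):
    """I = E_P[f(x)/P(x)] with P uniform on [a, b], estimated by a sample mean."""
    rng = random.Random(seed)
    p = 1.0 / (b - a)                       # uniform density: easy to sample from
    return sum(f(rng.uniform(a, b)) / p for _ in range(n)) / n

estimate = mci_integral(lambda x: x * x, 0.0, 1.0)   # true value: 1/3
```

Choosing a P(x) shaped more like f reduces the estimator's variance, which is why the main slides sample from the prior rather than uniformly.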