Gaussian Elimination
By Yequn Zhang, Yu Zhang
Contents
- Introduction
- Problem Analysis
- Proposed Algorithm
- Evaluation
Gaussian Elimination
- Forward Elimination
- Back Substitution
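As a point of reference for the two phases named above, here is a minimal sequential sketch in Python. It is not the authors' GPU code: the function names (`forward_eliminate`, `back_substitute`, `solve`) are hypothetical, and the row-pivoting step is assumed to be the usual "largest absolute value in the column" rule.

```python
def forward_eliminate(a, b):
    """Reduce the system a*x = b to upper-triangular form, in place."""
    n = len(a)
    for k in range(n):
        # Pick the row with the largest absolute value in column k.
        p = max(range(k, n), key=lambda i: abs(a[i][k]))
        if a[p][k] == 0.0:
            raise ValueError("matrix is singular")
        # Swap that row into the pivot position.
        if p != k:
            a[k], a[p] = a[p], a[k]
            b[k], b[p] = b[p], b[k]
        # Compute one coefficient per row below the pivot, then eliminate.
        for i in range(k + 1, n):
            m = a[i][k] / a[k][k]
            a[i][k] = 0.0
            for j in range(k + 1, n):
                a[i][j] -= m * a[k][j]
            b[i] -= m * b[k]


def back_substitute(a, b):
    """Solve an upper-triangular system from the last row upwards."""
    n = len(a)
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(a[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (b[i] - s) / a[i][i]
    return x


def solve(a, b):
    """Solve a*x = b without modifying the caller's data."""
    a = [row[:] for row in a]
    b = b[:]
    forward_eliminate(a, b)
    return back_substitute(a, b)
```

Calling `solve` on a small 3x3 system is a quick way to check a parallel version against this baseline.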
Problem Analysis
- The data size used by the kernels changes continuously
- It is difficult to find an appropriate block size that avoids divergence
- Block-based approach
  - Assign part of the computation to the CPU, leaving the irregularity to the CPU
  - Manually make the data size change in steps of the block size
  - The number of blocks per grid is then easy to set
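The block-based idea above can be illustrated with a small Python sketch. It assumes one plausible reading of the slide: the irregular remainder goes to the CPU and the GPU only ever sees a whole number of blocks. The function name `split_work` is hypothetical, and the default block size of 64 is taken from the back-substitution slide.

```python
def split_work(remaining, block_size=64):
    """Split `remaining` elements into a CPU remainder and a GPU part.

    The GPU part is always a multiple of `block_size`, so every thread
    block is full and the grid size is a plain integer division.
    """
    cpu_part = remaining % block_size        # irregular leftover, kept on the CPU
    gpu_part = remaining - cpu_part          # whole number of blocks
    blocks_per_grid = gpu_part // block_size
    return cpu_part, gpu_part, blocks_per_grid
```

With 1000 remaining elements and a block size of 64, the CPU would take 40 elements and the GPU 960 elements in 15 blocks.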
Forward Elimination
- A block-based approach
- Try to avoid divergence
- Try to use the GPU
- Try to be fine-grained
K1
Find Max Row
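The slides do not say how the max-row search is implemented. A common pattern for this kind of search on a GPU is a pairwise (tree) reduction, sketched below in plain Python as an assumption. The function name `argmax_abs_reduction` is hypothetical, and each iteration of the inner loop stands in for one thread.

```python
def argmax_abs_reduction(column):
    """Return the index of the largest |value| using a pairwise reduction."""
    if not column:
        raise ValueError("column is empty")
    # Each slot holds (absolute value, original row index).
    slots = [(abs(v), i) for i, v in enumerate(column)]
    n = len(slots)
    while n > 1:
        half = (n + 1) // 2
        # On a GPU, each t would be handled by one thread in parallel.
        for t in range(n // 2):
            other = slots[t + half]
            if other[0] > slots[t][0]:
                slots[t] = other
        n = half
    return slots[0][1]
```

The number of active slots halves on every pass, so a column of length n needs about log2(n) passes.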
Swap (CPU)
Now start to eliminate the block of data on the CPU
Calculate coefficients
Elimination on CPU
K1
Calculate Coefficients
K2
Elimination on CPU
Swap on GPU
K3
K4
Elimination on GPU
K5
Elimination on GPU
Intra-block loop
Inter-block loop
Last inter-block loop processed on CPU
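One way to read the intra-block and inter-block loops is as a two-level loop nest over the pivots: the outer loop steps through the matrix one block at a time, the inner loop visits each pivot inside the current block, and the final outer iteration stays on the CPU. The Python sketch below only builds that schedule. The function names and the block size of 64 are assumptions for illustration, not details confirmed by the slides.

```python
def plan_inter_block_loop(n, block_size=64):
    """List (first_pivot, end_pivot_exclusive, device) per outer iteration."""
    plan = []
    starts = range(0, n, block_size)
    for index, start in enumerate(starts):
        stop = min(start + block_size, n)
        # Assumption: only the last outer iteration is handled on the CPU.
        device = "cpu" if index == len(starts) - 1 else "gpu"
        plan.append((start, stop, device))
    return plan


def pivots_in_block(start, stop):
    """Pivot indices visited by the intra-block loop of one block."""
    return list(range(start, stop))
```

For a 200x200 system this yields three full blocks marked for the GPU and one short final block of 8 pivots marked for the CPU.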
Back Substitution
- Launch a kernel when the number of coefficients per row exceeds four block sizes (64*4=256)
- Fine-grained, in a similar way to forward elimination: part on the CPU and part on the GPU
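The launch rule above amounts to a threshold test per row. The Python sketch below assumes that "coefficients per row" means the number of already-solved unknowns a row depends on, which is `n - 1 - i` for row `i` of an `n`-row upper-triangular system. The function and constant names are hypothetical.

```python
BLOCK_SIZE = 64
GPU_THRESHOLD = 4 * BLOCK_SIZE  # 256, as stated on the slide


def back_substitution_device(n, i):
    """Choose where row i of an n-row upper-triangular system is processed."""
    coefficients = n - 1 - i  # entries to the right of the diagonal
    # "Exceeds" is read as strictly greater than the threshold.
    return "gpu" if coefficients > GPU_THRESHOLD else "cpu"
```

Under this reading, the bottom rows of the system (few coefficients) stay on the CPU, and the kernel takes over once a row depends on more than 256 solved unknowns.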
Block size effect
The contribution of swap and find max row
- Is it necessary to implement every part on the GPU?
Performance breakdown
- Contribution of each part to the total performance, including the kernels as well as the CPU part
Speedup
Questions?