Optimizing Matrix Multiplication with a Classifier Learning System
Xiaoming Li (presenter), María Jesús Garzarán
University of Illinois at Urbana-Champaign
Tuning library for recursive matrix multiplication
• Use cache-aware algorithms that take into account architectural features
  – Memory hierarchy
  – Register file, …
• Take into account input characteristics
  – Matrix sizes
• The tuning process is automatic.
Recursive Matrix Partitioning
• Previous approaches
  – Multiple recursive steps
  – Only divide by half
[Figure: matrices A and B, partitioned over two steps (Step 1, Step 2)]
Recursive Matrix Partitioning
• Our approach is more general
  – No need to divide by half
  – May use a single step to reach the same partition
  – Faster and more general
[Figure: matrices A and B, partitioned in a single step (Step 1)]
Our approach
• A general framework to describe a family of recursive matrix multiplication algorithms, where, given the input dimensions of the matrices, we determine:
  – Number of partition levels
  – How to partition at each level
• An intelligent search method based on a classifier learning system
  – Search for the best partitioning strategy in a huge search space
Outline
• Background
• Partition Methods
• Classifier Learning System
• Experimental Results
Recursive layout framework
• Multiple levels of recursion
  – Takes into account the cache hierarchy
1 2 3 4 5 6 7 8
9 10 11 12 13 14 15 16
17 18 19 20 21 22 23 24
25 26 27 28 29 30 31 32
33 34 35 36 37 38 39 40
41 42 43 44 45 46 47 48
49 50 51 52 53 54 55 56
57 58 59 60 61 62 63 64
Recursive layout framework
1 2 5 6 17 18 21 22
3 4 7 8 19 20 23 24
9 10 13 14 25 26 29 30
11 12 15 16 27 28 31 32
33 34 37 38 49 50 53 54
35 36 39 40 51 52 55 56
41 42 45 46 57 58 61 62
43 44 47 48 59 60 63 64
• Multiple levels of recursion
  – Takes into account the cache hierarchy
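The two 8 x 8 grids above can be reproduced with a short sketch. It assumes that tiles within a level are stored in row-major order and that every partition factor divides the dimension evenly (no padding). The names `layout_index` and `layout_table` are hypothetical and not taken from the talk. With three levels of (2, 2) factors, the computed storage positions match the reordered grid on the slide.

```python
def layout_index(r, c, rows, cols, factors):
    """0-based storage position of element (r, c) in a recursive tiled layout.

    `factors` is a list of (row_factor, col_factor) pairs, one per level.
    Assumes each factor divides the current tile shape evenly.
    """
    index = 0
    for pr, pc in factors:
        tile_rows, tile_cols = rows // pr, cols // pc
        tr, r = divmod(r, tile_rows)   # which tile row, and offset inside it
        tc, c = divmod(c, tile_cols)   # which tile column, and offset inside it
        index += (tr * pc + tc) * tile_rows * tile_cols
        rows, cols = tile_rows, tile_cols
    # Elements inside the innermost tile are stored row-major.
    return index + r * cols + c


def layout_table(rows, cols, factors):
    """Grid whose entry (r, c) is the 1-based storage position of that element."""
    return [[layout_index(r, c, rows, cols, factors) + 1 for c in range(cols)]
            for r in range(rows)]


row_major = layout_table(8, 8, [])            # no recursion: plain row-major
recursive = layout_table(8, 8, [(2, 2)] * 3)  # halve both dimensions three times
for row in recursive:
    print(" ".join(f"{v:2d}" for v in row))
```

Passing other factor lists, for example `[(4, 2)]`, gives a single-step, non-square partition of the same matrix.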
Padding
• Necessary when the partition factor is not a divisor of the matrix dimension.
  – Dimension 2000, divide by 3: padded to 2001, tile size 667
  – Then divide by 4: padded to 2004, tile size 668
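The numbers on the padding slides are consistent with rounding the dimension up to the next multiple of the product of the partition factors. The sketch below works under that assumption; the authors' actual padding mechanism is not shown in the slides, and the function names are hypothetical.

```python
from math import prod


def padded_dimension(n, factors):
    """Smallest size >= n that divides evenly through every partition level."""
    p = prod(factors)
    return -(-n // p) * p  # ceiling division, then scale back up


def tile_sizes(n, factors):
    """Tile size along this dimension after each level of partitioning."""
    size = padded_dimension(n, factors)
    sizes = []
    for f in factors:
        size //= f
        sizes.append(size)
    return sizes


print(padded_dimension(2000, [3]), tile_sizes(2000, [3]))
print(padded_dimension(2000, [3, 4]), tile_sizes(2000, [3, 4]))
```

Under this reading, a second-level factor of 4 forces the first-level tile to grow from 667 to 668, which in turn pads the whole dimension to 2004.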
Recursive layout in our framework
• Multiple-level recursion
  – Supports the cache hierarchy
• Square tile → rectangular tile
  – Fits non-square matrices
[Figure: example with dimension labels 9 and 8, then 10 and 8 with padding, then 3 and 4]
Outline
• Background
• Partition Methods
• Classifier Learning System
• Experimental Results
Two methods to partition matrices
• Partition by Block (PB)
  – Specify the size of each tile
  – Example:
    • Dimensions (M, N, K) = (100, 100, 40)
    • Tile size (bm, bn, bk) = (50, 50, 20)
    • Partition factors (pm, pn, pk) = (2, 2, 2)
  – Partition factors: $p_m = \frac{M}{b_m},\ p_n = \frac{N}{b_n},\ p_k = \frac{K}{b_k}$
  – Tiles need not be square
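A minimal sketch of the Partition by Block arithmetic follows. The function name `partition_by_block` is hypothetical. Rounding up when the tile size does not divide the dimension is an assumption, chosen to agree with the padding slides; the slide itself only shows an evenly dividing example.

```python
def partition_by_block(dims, tile):
    """Partition factors (pm, pn, pk) for dimensions (M, N, K) and tile (bm, bn, bk).

    Uses ceiling division so a partial tile still counts as one tile
    (assumption: the remainder is handled by padding).
    """
    return tuple(-(-d // b) for d, b in zip(dims, tile))


print(partition_by_block((100, 100, 40), (50, 50, 20)))
```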
Two methods to partition matrices
• Partition by Size (PS)
  – Specify the maximum size of the three tiles.
  – Keep the ratios between dimensions constant
  – Example:
    • (M, N, K) = (100, 100, 50)
    • Maximum tile size for M, N = 1250
    • (pm, pn, pk) = (2, 2, 1)
  – Generalization of the “divide-by-half” approach.
    • Tile size = 1/4 * matrix size
Outline
• Background
• Partition Methods
• Classifier Learning System
• Experimental Results
Classifier Learning System
• Use the two partition primitives to determine how the input matrices are partitioned
  – Determine the partition factors at each level
$f: (M, N, K) \rightarrow (pm_i, pn_i, pk_i),\ i = 0, 1, 2$ (only 3 levels are considered)
• The partition factors depend on the matrix size
  – E.g., the partition factors of a (1000 x 1000) matrix should be different from those of a (50 x 1000) matrix.
• The partition factors also depend on architectural characteristics, such as cache size.
Determine the best partition factors
• The search space is huge, so exhaustive search is impossible
• Our proposal: use a multi-step classifier learning system
  – Creates a table that, given the matrix dimensions, determines the partition factors
Classifier Learning System
• The result of the classifier learning system is a table with two columns
• Column 1 (Pattern): A string of ‘0’, ‘1’, and ‘*’ that encodes the dimensions of the matrices
• Column 2 (Action): Partition method for one step
  – Built using the “partition-by-block” and “partition-by-size” primitives with different parameters.
Learn with Classifier System
Pattern Action
(10***,11***) PS 100
… …
(010**,011**) PB (4,4)
• 5 bits per dimension
[Figure: worked example with dimension labels 16 and 24, then 8 and 12, then 4 and 4]
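The table lookup in this example can be sketched as ternary pattern matching on the 5-bit binary encoding of each dimension. The sketch uses only the two rules shown on the slide and treats the actions as opaque labels. The names `encode`, `matches`, and `lookup` are hypothetical, and how larger dimensions are mapped onto 5 bits is not shown in the slides, so the code simply rejects values that do not fit.

```python
def encode(value, bits=5):
    """Binary string of `value`, zero-padded to `bits` characters."""
    return format(value, f"0{bits}b")


def matches(pattern, value):
    """True if each pattern symbol is '*' or equals the corresponding bit."""
    encoded = encode(value, len(pattern))
    if len(encoded) != len(pattern):  # value needs more bits than the pattern has
        return False
    return all(p == "*" or p == b for p, b in zip(pattern, encoded))


# Only the two rows visible on the slide; the elided rows are not reproduced.
RULES = [
    (("10***", "11***"), "PS 100"),
    (("010**", "011**"), "PB (4,4)"),
]


def lookup(dims, rules=RULES):
    """Action of the first rule whose patterns match every dimension, else None."""
    for patterns, action in rules:
        if all(matches(p, d) for p, d in zip(patterns, dims)):
            return action
    return None


for dims in [(16, 24), (8, 12), (4, 4)]:
    print(dims, "->", lookup(dims))
```

In this sketch the first two sizes from the figure each select one of the rules, and the last size matches neither, which is one plausible reading of where the multi-step process stops.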
How does the classifier learning algorithm work?
• Change the table based on the feedback of performance and accuracy from previous runs.
• Mutate the condition part of the table to adjust the range of matching matrix dimensions.
• Mutate the action part to find the best partition method for the matching matrices.
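The slides do not spell out the mutation operators. As an illustration only, the sketch below assumes a common choice from learning classifier systems: change one randomly chosen symbol of a pattern to another symbol from {0, 1, *}. Replacing a bit with '*' widens the range of matching dimensions, and replacing '*' with a bit narrows it. `mutate_pattern` and `coverage` are hypothetical names.

```python
import random

ALPHABET = "01*"


def mutate_pattern(pattern, rng):
    """Copy of `pattern` with exactly one randomly chosen symbol changed."""
    pos = rng.randrange(len(pattern))
    choices = [s for s in ALPHABET if s != pattern[pos]]
    return pattern[:pos] + rng.choice(choices) + pattern[pos + 1:]


def coverage(pattern):
    """Number of distinct values a pattern matches: each '*' doubles it."""
    return 2 ** pattern.count("*")


rng = random.Random(0)  # fixed seed so the run is repeatable
original = "10***"
mutated = mutate_pattern(original, rng)
print(original, coverage(original), "->", mutated, coverage(mutated))
```

A full system would also score each rule using the measured performance feedback mentioned above; that part is omitted here because the slides give no details.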
Outline
• Background
• Partition Methods
• Classifier Learning System
• Experimental Results
Experimental Results
• Experiments on three platforms
  – Sun UltraSparcIII
  – P4 Intel Xeon
  – Intel Itanium2
• Matrices of sizes from 1000 x 1000 to 5000 x 5000
Algorithms
• Classifier MMM: our approach
  – Includes the overhead of copying in and out of the recursive layout
• ATLAS: library generated by ATLAS using its search procedure, without hand-written code
  – Has some type of blocking for L2
• L1: one level of tiling
  – Tile size: the same as ATLAS uses for L1
• L2: two levels of tiling
  – L1tile and L2tile: the same as ATLAS uses for L1
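For readers unfamiliar with the "one level of tiling" baseline, here is a minimal pure-Python sketch of a tiled matrix multiplication checked against the naive triple loop. It is not the code used in the experiments, and the tile size of 2 is purely illustrative; the tile sizes ATLAS selected are not given in the slides.

```python
def matmul_naive(a, b):
    """Reference triple-loop product of an m x k and a k x n matrix."""
    m, k, n = len(a), len(b), len(b[0])
    return [[sum(a[i][p] * b[p][j] for p in range(k)) for j in range(n)]
            for i in range(m)]


def matmul_tiled(a, b, tile):
    """Same product, iterating over tile x tile blocks (one level of tiling)."""
    m, k, n = len(a), len(b), len(b[0])
    c = [[0] * n for _ in range(m)]
    for i0 in range(0, m, tile):
        for j0 in range(0, n, tile):
            for p0 in range(0, k, tile):
                # Multiply one block of A by one block of B into a block of C.
                for i in range(i0, min(i0 + tile, m)):
                    for j in range(j0, min(j0 + tile, n)):
                        acc = c[i][j]
                        for p in range(p0, min(p0 + tile, k)):
                            acc += a[i][p] * b[p][j]
                        c[i][j] = acc
    return c


a = [[i + j for j in range(5)] for i in range(4)]      # 4 x 5
b = [[i * j + 1 for j in range(3)] for i in range(5)]  # 5 x 3
print(matmul_tiled(a, b, 2) == matmul_naive(a, b))
```

The `min(...)` bounds handle edge tiles without padding; a second level of tiling would wrap the three block loops in another set of larger blocks.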
Conclusion and Future Work
• Preliminary results indicate that our approach is effective
  – Sun UltraSparcIII and Xeon: 18% and 5% improvement, respectively.
  – Itanium: -14%
• Need to improve the padding mechanism
  – Reduce the amount of padding
  – Avoid unnecessary computation on padding
Thank you!