![Page 1: What is the WHT anyway, and why are there so many ways to compute it?](https://reader035.vdocuments.mx/reader035/viewer/2022081519/56813d81550346895da7601e/html5/thumbnails/1.jpg)
What is the WHT anyway, and why are there so many ways to
compute it?
Jeremy Johnson
1, 2, 6, 24, 112, 568, 3032, 16768,…
![Page 2: What is the WHT anyway, and why are there so many ways to compute it?](https://reader035.vdocuments.mx/reader035/viewer/2022081519/56813d81550346895da7601e/html5/thumbnails/2.jpg)
Walsh-Hadamard Transform
• y = WHTN x, N = 2n
n
N WHTWHTWHT 22...
11
11WHT2
1111
1111
1111
1111
11
11
11
11
WHTWHTWHT 224
![Page 3: What is the WHT anyway, and why are there so many ways to compute it?](https://reader035.vdocuments.mx/reader035/viewer/2022081519/56813d81550346895da7601e/html5/thumbnails/3.jpg)
WHT Algorithms
• Factor WHTN into a product of sparse structured matrices
• Compute: y = (M1 M2 … Mt)xyt = Mtx…
y2 = M2 y3
y = M1 y2
![Page 4: What is the WHT anyway, and why are there so many ways to compute it?](https://reader035.vdocuments.mx/reader035/viewer/2022081519/56813d81550346895da7601e/html5/thumbnails/4.jpg)
Factoring the WHT Matrix
• AC DCD• A • A C) = (A C
• Im nmn
1100
1100
0011
0011
1010
0101
1010
0101
1111
1111
1111
1111
4WHT
WHT2 WHT2WHT2WHT2
![Page 5: What is the WHT anyway, and why are there so many ways to compute it?](https://reader035.vdocuments.mx/reader035/viewer/2022081519/56813d81550346895da7601e/html5/thumbnails/5.jpg)
Recursive and Iterative Factorization
WHT8 WHT2WHT4
WHT2WHT2 I2WHT
WHT2WHT2 I2I2WHT
WHT2WHT2 I2I2WHT
WHT2WHT2 I2I2WHT
WHT2WHT2 I4WHT
![Page 6: What is the WHT anyway, and why are there so many ways to compute it?](https://reader035.vdocuments.mx/reader035/viewer/2022081519/56813d81550346895da7601e/html5/thumbnails/6.jpg)
Recursive Algorithm
11
11
11
11
11
11
11
1111
11
11
11
11
11
11
11
11
11
11
11
11
11
11
11
11111111
11111111
11111111
11111111
11111111
11111111
11111111
11111111
WHT8 = (WHT2 I4)(I2 (WHT2 I2)(I2 WHT2))
![Page 7: What is the WHT anyway, and why are there so many ways to compute it?](https://reader035.vdocuments.mx/reader035/viewer/2022081519/56813d81550346895da7601e/html5/thumbnails/7.jpg)
Iterative Algorithm
11
11
11
11
11
11
11
11
11
11
1
11
11
11
11
11
11
11
11
11
11
11
11
11
11111111
11111111
11111111
11111111
11111111
11111111
11111111
11111111
WHT8 = (WHT2 I4)(I2 WHT2 I2)(I4 WHT2)
![Page 8: What is the WHT anyway, and why are there so many ways to compute it?](https://reader035.vdocuments.mx/reader035/viewer/2022081519/56813d81550346895da7601e/html5/thumbnails/8.jpg)
WHT Algorithms
• Recursive
• Iterative
• General
n
i
ini
N1
IWHTIWHT 2221
WHTIIWHTWHT 2 2/2/2 NNN
nnn t
t
i
nnnnnn tiii
1
1
where
,2222 IWHTIWHT 111
![Page 9: What is the WHT anyway, and why are there so many ways to compute it?](https://reader035.vdocuments.mx/reader035/viewer/2022081519/56813d81550346895da7601e/html5/thumbnails/9.jpg)
WHT Implementation
• Definition/formula– N=N1* N2Nt Ni=2ni – x=WHTN*x x =(x(b),x(b+s),…x(b+(M-1)s))
• Implementation(nested loop) R=N; S=1; for i=t,…,1 R=R/Ni
for j=0,…,R-1 for k=0,…,S-1 S=S* Ni;
Mb,s
t
i 1
)
nn WHTWHT 222 II( n1+ ··· + ni-1 2ni+1+ ··· + nt
i
ii
i
i
NSkSjNN
NSkSjN xWHTx ,,
i
![Page 10: What is the WHT anyway, and why are there so many ways to compute it?](https://reader035.vdocuments.mx/reader035/viewer/2022081519/56813d81550346895da7601e/html5/thumbnails/10.jpg)
Partition Trees
4
1 3
1 2
1 1
4
1 1 11
Right Recursive
Iterative
9
3 4 2
1 2 1
1 1
4
13
12
11
Left Recursive
4
2 2
1 1 1 1
Balanced
![Page 11: What is the WHT anyway, and why are there so many ways to compute it?](https://reader035.vdocuments.mx/reader035/viewer/2022081519/56813d81550346895da7601e/html5/thumbnails/11.jpg)
Ordered Partitions
• There is a 1-1 mapping from ordered partitions of n onto (n-1)-bit binary numbers.
There are 2n-1 ordered partitions of n.
1|1 1|1 1 1 1|1 1 1+2+4+2 = 9
162 = 1 0 1 0 0 0 1 0
![Page 12: What is the WHT anyway, and why are there so many ways to compute it?](https://reader035.vdocuments.mx/reader035/viewer/2022081519/56813d81550346895da7601e/html5/thumbnails/12.jpg)
Enumerating Partition Trees
3 3
12
00 01
3
12
11
01
3
21
10
3
1 2
1 1
10
3
11
11
1
![Page 13: What is the WHT anyway, and why are there so many ways to compute it?](https://reader035.vdocuments.mx/reader035/viewer/2022081519/56813d81550346895da7601e/html5/thumbnails/13.jpg)
Search Space
• Optimization of the WHT becomes a search, over the space of partition trees, for the fastest algorithm.
• The number of trees:
TTT...
1
1
1n
nnnn t
t
n
n 1 2 3 4 5 6 7 8T_n 1 2 6 24 112 568 3672 16768
![Page 14: What is the WHT anyway, and why are there so many ways to compute it?](https://reader035.vdocuments.mx/reader035/viewer/2022081519/56813d81550346895da7601e/html5/thumbnails/14.jpg)
Size of Search Space
• Let T(z) be the generating function for Tn
Tn = (n/n3/2), where =4+8 6.8284
• Restricting to binary trees
Tn = (5n/n3/2)
))(1/()()1/()( 2 zTzTzzzT
2)()1/()( zBzzzB
![Page 15: What is the WHT anyway, and why are there so many ways to compute it?](https://reader035.vdocuments.mx/reader035/viewer/2022081519/56813d81550346895da7601e/html5/thumbnails/15.jpg)
WHT PackagePüschel & Johnson (ICASSP ’00)• Allows easy implementation of any of the possible WHT algorithms• Partition tree representation W(n)=small[n] | split[W(n1),…W(nt)]• Tools
– Measure runtime of any algorithm
– Measure hardware events
– Search for good implementation
• Dynamic programming
• Evolutionary algorithm
![Page 16: What is the WHT anyway, and why are there so many ways to compute it?](https://reader035.vdocuments.mx/reader035/viewer/2022081519/56813d81550346895da7601e/html5/thumbnails/16.jpg)
Histogram (n = 16, 10,000 samples)
•Wide range in performance despite equal number of arithmetic operations (n2n flops)
•Pentium III consumes more run time (more pipeline stages)
•Ultra SPARC II spans a larger range
![Page 17: What is the WHT anyway, and why are there so many ways to compute it?](https://reader035.vdocuments.mx/reader035/viewer/2022081519/56813d81550346895da7601e/html5/thumbnails/17.jpg)
Operation Count
Theorem. Let WN be a WHT algorithm of size N. Then the number of floating point operations (flops) used by WN is Nlg(N).
Proof. By induction.
2222
)(2)(
11
1
nninnn
Nflopsnn
Nflops
nnn
WWt
ii
it
i
i
i
t
i
i
![Page 18: What is the WHT anyway, and why are there so many ways to compute it?](https://reader035.vdocuments.mx/reader035/viewer/2022081519/56813d81550346895da7601e/html5/thumbnails/18.jpg)
Instruction Count Model
)()()A()IC(3
1
3
1AL nlninn
ll
ii
A(n) = number of calls to WHT procedure= number of instructions outside loopsAl(n) = Number of calls to base case of size l l = number of instructions in base case of size l
Li = number of iterations of outer (i=1), middle (i=2), and outer (i=3) loop i = number of instructions in outer (i=1), middle (i=2), and outer (i=3) loop body
![Page 19: What is the WHT anyway, and why are there so many ways to compute it?](https://reader035.vdocuments.mx/reader035/viewer/2022081519/56813d81550346895da7601e/html5/thumbnails/19.jpg)
Small[1].file "s_1.c"
.version "01.01"
gcc2_compiled.:
.text
.align 4
.globl apply_small1
.type apply_small1,@function
apply_small1:
movl 8(%esp),%edx //load stride S to EDX
movl 12(%esp),%eax //load x array's base address to EAX
fldl (%eax) // st(0)=R7=x[0]
fldl (%eax,%edx,8) //st(0)=R6=x[S]
fld %st(1) //st(0)=R5=x[0]
fadd %st(1),%st // R5=x[0]+x[S]
fxch %st(2) //st(0)=R5=x[0],s(2)=R7=x[0]+x[S]
fsubp %st,%st(1) //st(0)=R6=x[S]-x[0] ?????
fxch %st(1) //st(0)=R6=x[0]+x[S],st(1)=R7=x[S]-x[0]
fstpl (%eax) //store x[0]=x[0]+x[S]
fstpl (%eax,%edx,8) //store x[0]=x[0]-x[S]
ret
![Page 20: What is the WHT anyway, and why are there so many ways to compute it?](https://reader035.vdocuments.mx/reader035/viewer/2022081519/56813d81550346895da7601e/html5/thumbnails/20.jpg)
Recurrences
leaf a ,0)A(
... ),A(1)A(1
12
n
nnnn
n
nnnti
t
i
i
lnl
ln
llA leaves ofnumber where,)( 2
![Page 21: What is the WHT anyway, and why are there so many ways to compute it?](https://reader035.vdocuments.mx/reader035/viewer/2022081519/56813d81550346895da7601e/html5/thumbnails/21.jpg)
Recurrences
leaf a ,0)(
... ,)()(
... ,)()(
... ),()(
L
2L2L
2L2L
L2L
11
23
1
...
122
11
11
11
n
nnnn
nnnn
nnnn
n
nnnnn
nnnnn
nntn
i
ti
t
i
ti
t
i
ti
t
i
ii
ii
i
![Page 22: What is the WHT anyway, and why are there so many ways to compute it?](https://reader035.vdocuments.mx/reader035/viewer/2022081519/56813d81550346895da7601e/html5/thumbnails/22.jpg)
Histogram using Instruction Model (P3)
l = 12, l = 34, and l = 106 = 271 = 18, 2 = 18, and 1 = 20
![Page 23: What is the WHT anyway, and why are there so many ways to compute it?](https://reader035.vdocuments.mx/reader035/viewer/2022081519/56813d81550346895da7601e/html5/thumbnails/23.jpg)
Algorithm Comparison
Recursive/Iterative Runtime
0.00E+002.00E-014.00E-016.00E-018.00E-011.00E+00
1.20E+001.40E+001.60E+001.80E+002.00E+00
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22
WHT size(2̂ n)
ratio r1/i1
Rec &Bal/It Instruction Count
0
0.5
1
1.5
2
2.5
1 3 5 7 9 11 13 15 17 19
rr1/i1
lr1/i1
bal1/i1
Rec&It/Best Runtime
0.00E+00
2.00E+00
4.00E+00
6.00E+00
8.00E+00
1.00E+01
1.20E+01
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22
WHT size(2̂ n)
ratio
r1/b
r3/b
i1/b
i3/b
b/b
Small/It Runtime
0.00E+002.00E+00
4.00E+006.00E+00
8.00E+001.00E+01
1.20E+01
1 2 3 4 5 6 7 8
WHT size(2̂ n)
ratio I_1/rt
r_1/rt
![Page 24: What is the WHT anyway, and why are there so many ways to compute it?](https://reader035.vdocuments.mx/reader035/viewer/2022081519/56813d81550346895da7601e/html5/thumbnails/24.jpg)
Dynamic Programming
minn1+… + nt= n
Cost(T T…
n1 nt
n),
where Tn is the optimial tree of size n.
This depends on the assumption that Cost only depends onthe size of a tree and not where it is located.(true for IC, but false for runtime).
For IC, the optimal tree is iterative with appropriate leaves.For runtime, DP is a good heuristic (used with binary trees).
![Page 25: What is the WHT anyway, and why are there so many ways to compute it?](https://reader035.vdocuments.mx/reader035/viewer/2022081519/56813d81550346895da7601e/html5/thumbnails/25.jpg)
Optimal FormulasPentium
[1], [2], [3], [4], [5], [6]
[7]
[[4],[4]]
[[5],[4]]
[[5],[5]]
[[5],[6]]
[[2],[[5],[5]]]
[[2],[[5],[6]]]
[[2],[[2],[[5],[5]]]]
[[2],[[2],[[5],[6]]]]
[[2],[[2],[[2],[[5],[5]]]]]
[[2],[[2],[[2],[[5],[6]]]]]
[[2],[[2],[[2],[[2],[[5],[5]]]]]]
[[2],[[2],[[2],[[2],[[5],[6]]]]]]
[[2],[[2],[[2],[[2],[[2],[[5],[5]]]]]]]
UltraSPARC[1], [2], [3], [4], [5],
[6]
[[3],[4]]
[[4],[4]]
[[4],[5]]
[[5],[5]]
[[5],[6]]
[[4],[[4],[4]]]
[[[4],[5]],[4]]
[[4],[[5],[5]]]
[[[5],[5]],[5]]
[[[5],[5]],[6]]
[[4],[[[4],[5]],[4]]]
[[4],[[4],[[5],[5]]]]
[[4],[[[5],[5]],[5]]]
[[5],[[[5],[5]],[5]]]
![Page 26: What is the WHT anyway, and why are there so many ways to compute it?](https://reader035.vdocuments.mx/reader035/viewer/2022081519/56813d81550346895da7601e/html5/thumbnails/26.jpg)
Different Strides
• Dynamic programming assumption is not true. Execution time depends on stride.
0 2 4 6 8 10 12 14 16 18 200.5
1
1.5
2
2.5
3
3.5
4
4.5
time/
time
for
strid
e 1
stride
W(1)W(2)W(3)W(4)W(5)