These are notes on the generation of large random permutations. Included is a description of the RPGLab package, written in MATLAB, that can be used to generate and test random permutations.
NOTES ON
RANDOM PERMUTATION GENERATION
AND THE
MATLAB package RPGLab
Derek O’Connor
January 31, 2011*
1 Introduction
Permutations and combinations have been studied seriously for at least 500 years. Some of the best mathematicians have contributed to this research: Bernoulli, Newton, Stirling, Euler, etc. This area of study has expanded greatly in the last 100 years and is now called Combinatorics, with permutations and combinations forming its foundation. That is, you can’t do combinatorics unless you know (and mind) your Ps and Cs.
Permutations and combinations are considered to be so important that even schoolchildren are required to study the subject. Figure 1 shows the opening page of the P&C chapter in Hall’s Algebra, which I bought as a secondary school pupil in 1960, price 10s/6d. Note that they use the good-old-fashioned notation $^{n}C_{r}$ rather than the ambiguous $\binom{n}{r}$. They also use $^{n}P_{r}$, but I don’t know the modern equivalent.¹
This is an outline of these notes:
1. Introduction
2. Definitions
3. Random Permutations
4. Algorithms for Random Permutations
5. MATLAB Implementations and Testing
6. Generating Special Permutations
7. Testing Permutation Generators
8. RPGLab
*Started: 3rd Jan 2010. Web: http://www.derekroconnor.net  Email: derekroconnor@eircom.net
¹ There is no need for all this notational fuss: C(n, k) for combinations and P(n, k) for permutations will do nicely.
Derek O’Connor Random Permutation Generation
Figure 1. H. S. HALL, An Algebra for Schools, 1ST EDITION 1912, REPRINTED 1956, MACMILLAN, LONDON
© DEREK O’CONNOR, JANUARY 31, 2011 2
2 Definitions
Definition 1. (Permutation.) A permutation is a rearrangement of the elements of an ordered list $A = \{a_1, a_2, \dots, a_n\}$ into a one-to-one correspondence with A itself.
In what follows we assume that A is the set {1, 2, ..., n}. We will represent any permutation as a MATLAB vector p(1 : n), where p(i) is the element placed in position i of the permutation. Thus we have
$$\{a_1, a_2, \dots, a_n\} \xrightarrow{\;p\;} \{a_{p(1)}, a_{p(2)}, \dots, a_{p(n)}\} \qquad (2.1)$$
Table 1. A PERMUTATION VECTOR
i 1 2 3 4 5 6 7 8 9 10
p(i) 3 4 9 2 10 8 6 1 5 7
The number of permutations on a set of n elements is n! = n · (n − 1) · · · 2 · 1. There is no exact closed form for n!, but Stirling’s approximation is excellent even for modest n:

$$n! \approx \sqrt{2\pi n}\left(\frac{n}{e}\right)^{n}\left(1 + \frac{1}{12n} + \frac{1}{288n^{2}} - \frac{139}{51840n^{3}} + O\!\left(\frac{1}{n^{4}}\right)\right) \qquad (2.2)$$
Table 2. APPROXIMATE NUMBER OF PERMUTATIONS
n          N = n!
10^1       10^7
10^2       10^158
10^3       10^(2,568)
10^4       10^(35,659)
10^5       10^(456,573)
10^6       10^(5,565,709)
10^7       10^(65,657,059)
10^8       10^(756,570,556)
10^9       10^(8,565,705,523)
10^10      10^(95,657,055,186)
A random permutation of length $n = 10^9$ can be generated by a 2.3 GHz, 16 GB machine in a few minutes. Table 2 is there to remind us of the truly gigantic size of the permutation space from which these are generated.
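The exponents in Table 2 can be checked without ever forming n!, because log10(n!) = ln Γ(n + 1)/ln 10. A minimal Python sketch (Python is used only for illustration; log10_factorial is a hypothetical helper, not part of RPGLab):

```python
import math


def log10_factorial(n: int) -> float:
    """Return log10(n!) via the log-gamma function, since n! = Gamma(n + 1)."""
    return math.lgamma(n + 1) / math.log(10)


if __name__ == "__main__":
    # Reproduce the exponents of Table 2: n! is roughly 10 ** round(log10(n!)).
    for k in range(1, 11):
        n = 10 ** k
        print(f"n = 10^{k:<2d}  n! ~ 10^{round(log10_factorial(n)):,}")
```

Double precision is ample here because only the integer part of the exponent is wanted.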
Definition 2. (Identity.) The identity permutation is p = [1, 2, ..., n], that is, p(i) = i, i = 1, 2, ..., n. In MATLAB the statement p = 1:n; generates the identity permutation.
Definition 3. (Transposition.) A transposition of a permutation is an exchange of two of its elements i, j with all others staying the same. Any transposition of p gives a new permutation q, if i ≠ j.
Definition 4. (Fixed Point.) A fixed point of a permutation p is any i such that p(i) = i. It is called a fixed point because the element $a_i$ does not move under the permutation. The identity permutation p = (1, 2, ..., n) has n fixed points.
Definition 5. (Derangement.) A derangement is a permutation with no fixed points, i.e., p(i) ≠ i, i = 1, 2, ..., n.
Definition 6. (Cycle.) A cycle is a sequence i → p(i) → p(p(i)) → · · · → j → i, where p(j) = i. The length of a cycle is the number of elements in the cycle. A fixed point is a cycle of length 1. The identity permutation has n cycles of length 1.
Definition 7. (Cyclic Permutation.) A cyclic permutation has 1 cycle. That is, starting at any point i we have a sequence i → p(i) → · · · → j → i, with p(j) = i, which includes every element of {1, 2, ..., n}. This cycle has length n.
It should be obvious that, for n ≥ 2, a cyclic permutation has no fixed points and is, therefore, a derangement. Equally obvious is that not all derangements are cyclic permutations: p = (2, 1, 4, 3), for example, is a derangement with two cycles.
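Definitions 4 to 7 translate directly into code. A minimal Python sketch, assuming a permutation is stored as a list p with p[i-1] = p(i) (the 1-based convention of the text); the function names are illustrative, not RPGLab’s:

```python
def fixed_points(p):
    """Definition 4: all i with p(i) = i (1-based)."""
    return [i for i, pi in enumerate(p, start=1) if pi == i]


def cycles(p):
    """Definition 6: the cycles of p, each listed from its smallest element."""
    n = len(p)
    seen = [False] * (n + 1)
    result = []
    for start in range(1, n + 1):
        if not seen[start]:
            cyc, j = [], start
            while not seen[j]:          # follow i -> p(i) -> ... until we return
                seen[j] = True
                cyc.append(j)
                j = p[j - 1]
            result.append(cyc)
    return result


def is_derangement(p):
    """Definition 5: no fixed points."""
    return not fixed_points(p)


def is_cyclic(p):
    """Definition 7: exactly one cycle, of length n."""
    return len(cycles(p)) == 1


if __name__ == "__main__":
    p = [5, 3, 6, 8, 1, 4, 7, 2, 10, 9]     # the permutation of Figure 2
    print(cycles(p))                         # [[1, 5], [2, 3, 6, 4, 8], [7], [9, 10]]
    print(fixed_points(p))                   # [7]
```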
3 Random Permutations
A random permutation of the numbers 1, 2, ..., n is a permutation drawn uniformly from the set of all n! permutations of n numbers. That is, a permutation p is drawn from the set of all permutations in such a way that the probability of drawing a given permutation, p, is 1/n!.
The reason for our interest in random permutations is simple: we are interested in permutation vectors where $n \geq 10^6$. For example, if we wish to test a new super-duper sorting algorithm on inputs of size $n \geq 10^6$, then we must resort to testing with a random sample from the vast ($N \geq (10^6)!$) space of permutations.
3.1 PROPERTIES OF RANDOM PERMUTATIONS
It is useful to think of a permutation as a directed graph with n nodes labelled 1, 2, ..., n. A directed arc i → j exists if p[i] = j. Of necessity there can be only one arc leaving any node i. Also, of necessity, each node has exactly one incoming arc. [Why?] Hence a permutation is a directed graph with n nodes and n directed arcs. An example of a permutation graph of 10 nodes is shown in Figure 2. The permutation shown in Figure 2 has 10 nodes, 10 arcs, and 4 cycles starting at 1, 2, 7, 9, with lengths 2, 5, 1, and 2, respectively.²

i      1  2  3  4  5  6  7  8  9  10
p(i)   5  3  6  8  1  4  7  2  10 9

Figure 2. A PERMUTATION AND ITS GRAPH (graph drawing not reproduced)
The Number of Fixed Points. As n → ∞, the probability that a permutation of length n has k fixed points is

$$\Pr(p \text{ has } k \text{ fixed points}) \sim \frac{1}{e\,k!}, \qquad (3.1)$$

and the average number of cycles of length k in a permutation of length n is 1/k, with variance 1/k. Hence the average number of fixed points in a permutation of length n is 1, with variance 1. This is a surprising result because the average is independent of the length of the permutation.
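The claim that the number of fixed points has mean 1 and variance 1, whatever n is, is easy to check empirically. A Python sketch, assuming Python’s random.shuffle (which implements the Fisher-Yates shuffle of Section 4) as the generator; the function name and sample sizes are arbitrary choices:

```python
import random
from math import e, factorial


def fixed_point_counts(n, trials, seed=0):
    """Number of fixed points in each of `trials` random permutations of 1..n."""
    rng = random.Random(seed)
    p = list(range(1, n + 1))
    counts = []
    for _ in range(trials):
        rng.shuffle(p)     # a uniform random permutation, whatever p was before
        counts.append(sum(1 for i, pi in enumerate(p, start=1) if pi == i))
    return counts


if __name__ == "__main__":
    c = fixed_point_counts(n=50, trials=20000)
    mean = sum(c) / len(c)
    var = sum((x - mean) ** 2 for x in c) / (len(c) - 1)
    print(f"mean {mean:.3f}, variance {var:.3f}")
    for k in range(4):     # compare with (3.1): Pr(k fixed points) ~ 1/(e k!)
        print(k, c.count(k) / len(c), 1 / (e * factorial(k)))
```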
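The Number of Derangements. Let D(n) be the number of derangements in the set of n! permutations of length n. Then, for n ≥ 1,

$$D(n) = n!\sum_{k=0}^{n}\frac{(-1)^k}{k!} = \left\lfloor \frac{n!+1}{e} \right\rfloor \sim \frac{n!}{e}, \quad \text{as } n \to \infty. \qquad (3.2)$$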
Hence

$$\Pr(p \text{ is a derangement}) = \frac{D(n)}{n!} \to \frac{1}{e} \approx 0.36788. \qquad (3.3)$$
This means that in a large sample of random permutations about 37% of them will be derangements, independent of n, the length of the permutation.
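Equation (3.2) can be checked with exact integer arithmetic. A Python sketch that compares the alternating sum, the floor formula, and the standard recurrence D(n) = (n − 1)(D(n − 1) + D(n − 2)), the latter being a well-known identity used here only as an independent check:

```python
from math import e, factorial


def derangements_sum(n):
    """D(n) from the alternating sum in (3.2); every term n!/k! is an integer."""
    return sum((-1) ** k * (factorial(n) // factorial(k)) for k in range(n + 1))


def derangements_rec(n):
    """D(n) from D(0) = 1, D(1) = 0, D(m) = (m - 1)(D(m - 1) + D(m - 2))."""
    if n == 0:
        return 1
    prev, cur = 1, 0                       # D(0), D(1)
    for m in range(2, n + 1):
        prev, cur = cur, (m - 1) * (prev + cur)
    return cur


if __name__ == "__main__":
    for n in range(1, 11):
        d = derangements_sum(n)
        # floor((n! + 1)/e) in floating point is safe here because n! < 2**53
        print(n, d, int((factorial(n) + 1) / e), d / factorial(n))
```

The last column approaches 1/e ≈ 0.36788 very quickly, which is why the 37% figure holds for any practical n.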
The Number of Cycles. The number of cycles in a random permutation ranges from 1 (a cyclic permutation) to n (the identity permutation). Let $C_k(n)$ be the number of cycles of length k in a permutation p(1 : n).
The expected number of cycles of length less than or equal to m (m ≤ n) is $H_m = 1 + 1/2 + \cdots + 1/m$, the m-th harmonic number.
The expected number of cycles of any length is $H_n$, or about $\ln n$. The average length of a cycle is thus about $n/\ln n$.
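For small n the claim that the expected number of cycles is $H_n$ can be verified exactly by enumerating all n! permutations. A Python sketch using exact rational arithmetic (practical only up to n of about 9):

```python
from fractions import Fraction
from itertools import permutations
from math import factorial


def num_cycles(p):
    """Number of cycles of a permutation given as a tuple of 0-based images."""
    seen = [False] * len(p)
    count = 0
    for start in range(len(p)):
        if not seen[start]:
            count += 1
            j = start
            while not seen[j]:
                seen[j] = True
                j = p[j]
    return count


def mean_cycles(n):
    """Exact average number of cycles over all n! permutations of length n."""
    total = sum(num_cycles(p) for p in permutations(range(n)))
    return Fraction(total, factorial(n))


def harmonic(n):
    """H_n = 1 + 1/2 + ... + 1/n."""
    return sum((Fraction(1, k) for k in range(1, n + 1)), Fraction(0))


if __name__ == "__main__":
    for n in range(1, 8):
        print(n, mean_cycles(n), harmonic(n))
```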
The Number of Cyclic Permutations. There are n! permutations of n distinct elements and (n − 1)! of these are cyclic:

$$C_n(n) = (n-1)! \quad\text{and}\quad \Pr(p(1:n) \text{ is cyclic}) = \frac{C_n(n)}{n!} = \frac{1}{n}. \qquad (3.4)$$
This means that in a large sample of random permutations a decreasing fraction of them will be cyclic (e.g., 1% for n = 100, 0.1% for n = 1000), while the fraction of derangements remains constant at 37%.
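The contrast between the two fractions shows up clearly in a simulation. A Python sketch, again using random.shuffle as the generator; n = 10 and the sample size are arbitrary choices:

```python
import random


def is_cyclic(p):
    """True if the cycle through 1 has length n, i.e. p is a single n-cycle."""
    j, length = p[0], 1          # j = p(1)
    while j != 1:
        j = p[j - 1]
        length += 1
    return length == len(p)


def is_derangement(p):
    return all(pi != i for i, pi in enumerate(p, start=1))


def sample_fractions(n, trials, seed=1):
    """Fractions of cyclic permutations and of derangements in a random sample."""
    rng = random.Random(seed)
    p = list(range(1, n + 1))
    cyclic = deranged = 0
    for _ in range(trials):
        rng.shuffle(p)
        cyclic += is_cyclic(p)
        deranged += is_derangement(p)
    return cyclic / trials, deranged / trials


if __name__ == "__main__":
    print(sample_fractions(10, 20000))   # roughly (1/10, 1/e)
```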
² Most of what follows is from Sedgewick & Flajolet, Analysis of Algorithms, Addison-Wesley, 1996, Chapter 6.
Table 3. CYCLE DISTRIBUTION IN RANDOM PERMUTATIONS
        n = 10^6, 11 cycles    n = 10^7, 9 cycles     n = 10^8, 11 cycles    n = 10^9, 14 cycles
C(i)    Start      Length      Start      Length      Start      Length      Start       Length
1       1          419486      3          765,295     3          7,946,786   1           715510740
2       2          563485      1          183,715     2          1,805,558   4           143677250
3       44         15656       38         41,873      74         168,213     20          135801148
4       509        432         97         8,007       586        53,765      148         3267451
5       1010       119         1,114      775         1          13,346      1303        1160044
6       5793       542         4,374      216         1,409      11,991      2774        461529
7       7003       255         3,307      100         14,927     329         9415        119643
8       16043      27          465        15          934,615    7           459730      1876
9       220629     2           253,498    4           958,846    3           7116764     135
10      578575     5           -          -           1,805,235  1           4362180     76
11      815788     2           -          -           3,807,650  1           9884753     37
12      -          -           -          -           -          -           30479687    29
13      -          -           -          -           -          -           112643112   23
14      -          -           -          -           -          -           111244636   19
In the next section we will discuss the two main algorithms for generating random permutations. However, before we do this we should have methods for checking the properties of random permutations. This will allow us to check the output of any generator so that we do not waste time developing bad algorithms and programs that implement them. These methods have been collected in a package called RPGLab, which is discussed in Section 8.
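Two of the simplest such checks are that the output really is a permutation of 1, ..., n, and a cycle listing in the style of Table 3, here assuming that a cycle’s start is the first unvisited index found when scanning i = 1, 2, ..., n. The cycle lengths must add up to n. A Python sketch with hypothetical names (these are not the RPGLab functions):

```python
import random


def is_permutation(p):
    """True if the list p contains each of 1..n exactly once (n = len(p))."""
    n = len(p)
    seen = [False] * (n + 1)
    for v in p:
        if not isinstance(v, int) or not 1 <= v <= n or seen[v]:
            return False
        seen[v] = True
    return True


def cycle_table(p):
    """(start, length) of every cycle of p, scanning starts i = 1, 2, ..., n."""
    n = len(p)
    seen = [False] * (n + 1)
    table = []
    for start in range(1, n + 1):
        if seen[start]:
            continue
        length, j = 0, start
        while not seen[j]:
            seen[j] = True
            length += 1
            j = p[j - 1]
        table.append((start, length))
    return table


if __name__ == "__main__":
    p = list(range(1, 10**5 + 1))
    random.shuffle(p)
    table = cycle_table(p)
    assert is_permutation(p)
    assert sum(length for _, length in table) == len(p)   # lengths add up to n
    print(len(table), "cycles; longest:", max(table, key=lambda c: c[1]))
```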
Example 1. (Random Permutation of length 99.)³
p =
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
34 60 68 78 50 4 7 97 16 74 98 76 8 65 99 75 10 30 91 95
21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40
42 15 80 6 90 70 28 47 32 1 63 51 49 57 59 72 66 85 53 94
41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
9 83 87 81 43 84 12 23 54 22 79 92 45 62 64 31 46 82 52 93
61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80
5 58 35 55 20 37 18 19 67 44 24 26 11 41 27 88 73 89 13 36
81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99
56 40 38 3 29 21 69 48 71 17 39 96 25 33 2 14 61 86 77
Figure 3. RANDOM PERMUTATION OF LENGTH 99 WITH 8 CYCLES AND 1 FIXED POINT
³ Figure 3 was produced by the MATLAB function GrVizp.m and GraphViz with the ‘neato’ layout.
Example 2. (Random Permutation of length 150.)
Figure 4. RANDOM PERMUTATION (AND DERANGEMENT) OF LENGTH 150
Example 3. (Random Permutation of length 200.)
Figure 5. RANDOM PERMUTATION OF LENGTH 200
4 Algorithms for Random Permutations
Algorithms and programs for generating random permutations have been around since the start of the computer age. In fact Knuth has tracked down the first known algorithm to R. A. Fisher and F. Yates, Statistical Tables (London, 1938), Example 12.⁴ There appear to be just two types of algorithm, based on: (1) random transpositions, and (2) sorting a random vector.
The production of random permutations may be viewed in different ways:
1. Building a permutation randomly step-by-step.
2. Sampling with replacement from the set of all permutations of length n.
3. Sampling without replacement from the set of integers {1, 2, . . . , n}.
All three views are useful and should be kept in mind.
Conceptually, building a random permutation step-by-step is simple: take a random sample of size n without replacement from a set of n elements S = {s1, s2, ..., sn}.
Algorithm RandPerm(S) → p
   Generates a random permutation of the elements of the set S = {s1, s2, ..., sn}
   for k := 1 to n do
      Choose an element $s^k_r$ at random from S
      S := S − {$s^k_r$}
      p[k] := $s^k_r$
   endfor k
   return p
endalg RandPerm
Notice that all the elements of S will be chosen and put into the array p. The array is needed to preserve the random order in which the elements were drawn. Thus p is returned containing a random permutation of S.
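Algorithm RandPerm can be transcribed almost literally. A Python sketch, assuming the set S is held in a list (rand_perm is a hypothetical name). Here list.pop(r) removes the chosen element but costs O(n), so this literal version takes O(n²) time rather than the O(n) that constant-time set operations would give:

```python
import random


def rand_perm(S, rng=random):
    """Algorithm RandPerm: draw the elements of S one at a time, uniformly and
    without replacement, and record them in the order drawn."""
    S = list(S)                    # working copy; the caller's S is not changed
    p = []
    for _ in range(len(S)):
        r = rng.randrange(len(S))  # uniform over the elements still in S
        p.append(S.pop(r))         # p[k] := s_r;  S := S - {s_r}
    return p


if __name__ == "__main__":
    print(rand_perm(["a", "b", "c", "d", "e"]))
```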
Analysis of Algorithm RandPerm. By construction, RandPerm returns a permutation of the set S. We need to show that each permutation is equally likely, with probability 1/n! of occurring. We prove this by induction on k.

The size of the set |S| decreases by 1 at each stage and so $\Pr(s^k_r) = 1/(n-k+1)$ at stage k, where $s^k_r$ is the element chosen at random at stage k. This is obviously true for k = 1, with $\Pr(s^1_r) = 1/n$. Hence, because each $s^k_r$ is chosen uniformly from the remaining elements, independently of the earlier choices, we have

$$\Pr(s^1_r, s^2_r, \dots, s^n_r) = \Pr(s^1_r)\Pr(s^2_r)\cdots\Pr(s^n_r) = \frac{1}{n}\cdot\frac{1}{n-1}\cdots\frac{1}{1} = \frac{1}{n!}. \qquad (4.1)$$

Thus we have proved that Algorithm RandPerm generates all possible permutations of the set S with equal probability.
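The counting argument behind (4.1) can also be checked mechanically for small n: fix the random choices in advance, run RandPerm for every one of the n · (n − 1) · · · 1 = n! possible choice sequences, and confirm that each permutation appears exactly once. A Python sketch (the helper names are hypothetical):

```python
from itertools import product
from math import factorial


def rand_perm_with_choices(S, choices):
    """RandPerm with its random choices supplied: at stage k (0-based),
    choices[k] indexes the n - k elements still in S."""
    S = list(S)
    return tuple(S.pop(r) for r in choices)


def all_outcomes(n):
    """Run RandPerm on {1, ..., n} for every possible choice sequence."""
    stages = [range(n - k) for k in range(n)]       # n, n-1, ..., 1 options
    return [rand_perm_with_choices(range(1, n + 1), c) for c in product(*stages)]


if __name__ == "__main__":
    for n in range(1, 7):
        outs = all_outcomes(n)
        # n! equally likely choice sequences, n! distinct permutations
        print(n, len(outs), len(set(outs)), factorial(n))
```

Since every choice sequence has probability 1/n! and the map from sequences to permutations is one-to-one, every permutation has probability exactly 1/n!.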
⁴ Available here: http://digital.library.adelaide.edu.au/coll/special/fisher/
If we assume that each set operation can be performed in O(1) time, then this algorithm runs in O(n) time, which is optimal up to a multiplicative constant. There are many ways to implement the set operations, either explicitly or implicitly.
The Fisher-Yates Shuffle Algorithm
This is a random transposition algorithm. The first computer implementation of it was by Durstenfeld.⁵ The most succinct statement of the Fisher-Yates Shuffle Algorithm is given by Reingold, et al.⁶:

$$\textbf{for } i := n \textbf{ downto } 2 \textbf{ do } \pi_i \leftrightarrow \pi_{\mathrm{rand}(1,i)}, \qquad (4.2)$$

where π is a permutation of length n, and rand(i, j) returns a uniformly random integer from the set {i, i + 1, ..., j − 1, j}. This is a truly elegant algorithm: it is tiny, uses no extra space, and is optimal. Also, it can shuffle any array in place, not just permutations.⁷
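Statement (4.2) is a one-line loop in most languages. A Python sketch with 0-based indices, so the text’s position i corresponds to index i − 1; it shuffles any list in place. Swapping the chosen element into position i is also a constant-time way of doing RandPerm’s S := S − {s_r}: the elements not yet chosen always occupy the front of the array.

```python
import random


def fisher_yates(a, rng=random):
    """Shuffle list a in place: for i := n downto 2, swap a_i with a_rand(1, i)."""
    for i in range(len(a) - 1, 0, -1):   # 0-based index i is position i + 1
        j = rng.randint(0, i)            # uniform over positions 1..i+1 (1-based)
        a[i], a[j] = a[j], a[i]
    return a


if __name__ == "__main__":
    print(fisher_yates(list("ABCDEFGH")))
```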
The Sort Permutation Algorithm
Sorting a set of distinct elements $(x_1, x_2, \dots, x_n)$ may be viewed as

$$\text{Find a permutation } p = (p_1, p_2, \dots, p_n) \text{ such that } x_{p_1} < x_{p_2} < \cdots < x_{p_n}. \qquad (4.3)$$
Hence any sorting algorithm will generate, implicitly or explicitly, a permutation and its associated sorted vector:

$$[s, p] := \mathrm{Sort}(x), \quad \text{where } s[i] = x[p[i]]. \qquad (4.4)$$
The following line of MATLAB code demonstrates this:
x = rand(1,8); [s,p] = sort(x); table = [x;s;p;x(p)]
Table 4. SORTING WITH A PERMUTATION VECTOR
i 1 2 3 4 5 6 7 8
x 0.146529 0.118308 0.315581 0.683211 0.914784 0.912839 0.753141 0.437558
s 0.118308 0.146529 0.315581 0.437558 0.683211 0.753141 0.912839 0.914784
p 2 1 3 8 4 7 6 5
x[p] 0.118308 0.146529 0.315581 0.437558 0.683211 0.753141 0.912839 0.914784
We can see that the permutation p has one fixed point, p[3] = 3, and three cycles: (1, 2, 1), (4, 8, 5, 4), and (6, 7, 6).
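The permutation in Table 4 is just the “argsort” of x. A Python sketch reproducing it from the eight x values of the table; sort_permutation is a hypothetical helper mirroring the second output of MATLAB’s [s,p] = sort(x):

```python
def sort_permutation(x):
    """1-based p with x[p(1)] <= x[p(2)] <= ... <= x[p(n)]; ties keep their order."""
    return [i + 1 for i in sorted(range(len(x)), key=lambda i: x[i])]


x = [0.146529, 0.118308, 0.315581, 0.683211,
     0.914784, 0.912839, 0.753141, 0.437558]
p = sort_permutation(x)
s = [x[i - 1] for i in p]        # s[i] = x[p[i]], as in (4.4)
print(p)                         # [2, 1, 3, 8, 4, 7, 6, 5]
```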
A random permutation is obtained from the sorting algorithm if it is given a randomly-ordered vector to sort. This is done by filling a vector r[1 : n] with random numbers distributed uniformly on (0, 1), r[1 : n] := RandReal(0, 1), and then sorting this random vector: [s, p] := Sort(r[1 : n]), where s = r[p]. Thus a random permutation operating on a random vector gives an ordered vector.

⁵ Richard Durstenfeld, Algorithm 235, Random Permutation, Communications of the ACM, Vol. 7, July 1964, page 420.
⁶ Edward M. Reingold, Jurg Nievergelt, and Narsingh Deo, Combinatorial Algorithms: Theory and Practice, Prentice-Hall, 1977, page 177.
⁷ Here is an interesting page on card shuffling by Richard J. Wagner: http://www-personal.umich.edu/~wagnerr/shuffle/. Elsewhere on his site is his C++ implementation of the Mersenne Twister random number generator.
Here are the two algorithms, side-by-side for easy comparison:
function GRPfys(n) → p
   Generates a random permutation of the integers 1, 2, ..., n
   p := [1, 2, ..., n]             Identity perm.
   for k := 2 to n do
      r := RandInt(1, k)
      p[k] :=: p[r]                Swap
   endfor k
   return p
endfunc GRPfys

function GRPsort(n) → p
   Generates a random permutation of the integers 1, 2, ..., n
   for k := 1 to n do
      r[k] := RandReal(0, 1)
   endfor k
   [s, p] := Sort(r)
   return p
endfunc GRPsort
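The two functions side-by-side in Python, as a sketch with 1-based values (grp_fys and grp_sort are hypothetical names for this illustration). Random keys from random.random() can in principle tie, but with 53-bit doubles the chance is negligible for moderate n:

```python
import random


def grp_fys(n, rng=random):
    """GRPfys: forward Fisher-Yates on the identity permutation."""
    p = list(range(1, n + 1))                     # identity perm.
    for k in range(2, n + 1):                     # k := 2 to n
        r = rng.randint(1, k)                     # r := RandInt(1, k)
        p[k - 1], p[r - 1] = p[r - 1], p[k - 1]   # p[k] :=: p[r]
    return p


def grp_sort(n, rng=random):
    """GRPsort: sort n uniform random keys; the sort permutation is the result."""
    r = [rng.random() for _ in range(n)]          # r[k] := RandReal(0, 1)
    return [k + 1 for k in sorted(range(n), key=r.__getitem__)]


if __name__ == "__main__":
    print(grp_fys(10), grp_sort(10))
```

Both use one random number per element; the difference is the O(n log n) sort and the extra n-vector of keys in grp_sort.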
Analysis of the Shuffle and Sort Algorithms
GRPfys has one for loop (2 : n), and so it requires $T_{gr}(n) = (n-1)(T_r + T_s)$ time, where $T_r$ and $T_s$ are the times to perform the random number generation and the swap, respectively. We say the time (or step) complexity of this algorithm is O(n). This is asymptotically optimal: just to read a permutation requires O(n) time.
GRPsort has one for loop (1 : n), which requires $nT_r$ time, followed by a sort which requires O(n log n) time. The total time is $T_{gs}(n) = nT_r + O(n\log n)$. The time (or step) complexity of this algorithm is O(n log n).
An important difference between the algorithms is that GRPsort requires twice as much storage as the Shuffle algorithm (the vectors r and p instead of just p), assuming that the vector r[1 : n] is overwritten by s[1 : n].
5 MATLAB Implementations of the Shuffle Algorithm
The MATLAB functions here are all minor variations of the Fisher-Yates Shuffle algorithm,except MATLAB’s own randperm which uses sorting.
In the MATLAB functions that follow, all loops are incremented rather than decremented. In Knuth’s original Algorithm P the loop is decremented because in those days this gave faster loops. This is no longer true.⁸
Fisher-Yates Implementations
function p = GRPfys(n)
p = 1:n;                 % Identity permutation
for k = 2:n
    r = ceil(k*rand);    % Random integer between 1 and k
    t = p(r);
    p(r) = p(k);         % Swap(p(r),p(k))
    p(k) = t;
end;
return; % GRPfys
Knuth, in the 3rd edition of Volume 2 of TAOCP, points out in the second-last paragraph on page 145 that there is no need for the swap if we don’t want to shuffle a given vector but just want a random permutation. This is implemented below as GRPns.
function p = GRPns(n)
p = 1:n;                 % Identity permutation
for k = 2:n
    r = ceil(k*rand);    % Random integer between 1 and k
    p(k) = p(r);         % No swap
    p(r) = k;
end;
return; % GRPns
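The no-swap version works because position k still holds k when step k begins: earlier steps only write to positions 1, ..., k − 1. A Python transcription of GRPns, offered as a sketch (grp_ns is a hypothetical name):

```python
import random


def grp_ns(n, rng=random):
    """GRPns: Knuth's no-swap variant; returns a random permutation of 1..n."""
    p = list(range(1, n + 1))      # identity: p(k) = k until step k
    for k in range(2, n + 1):
        r = rng.randint(1, k)      # random integer between 1 and k
        p[k - 1] = p[r - 1]        # p(k) = p(r)
        p[r - 1] = k               # p(r) = k  (no temporary needed)
    return p


if __name__ == "__main__":
    print(grp_ns(10))
```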
The final variation of GRPfys is what all right-thinking MATLAB-ers consider ‘good practice’, viz., vectorization. In this variation the generation of the random integer r is moved outside the loop and a random integer vector r is generated before entering the loop. Immediately we see that this has doubled the memory required: two vectors p and r instead of just p. But if it speeds up the function, it may be worth it.

⁸ The decremented (or backward) loop, and its close cousin, the zero-based array, have been constant sources of error in the student programs that I have seen over 25 years of teaching algorithms and data structures. People from childhood have been taught to count from 1 upwards and mistakes are inevitable when this is reversed. The rot set in, I believe, with Sputnik and the American Space Program: all those Houston count-downs. And remember, Knuth started programming at that time.
Likewise with zero-based arrays: people do not naturally start counting from 0. Except, of course, in Ireland and the UK. Here, in these sceptre’d isles, we count the floors of our buildings from 0 upwards. But we don’t call the first floor Floor Zero. Oh no! We have a special name for it: the Ground Floor. I have lost count (pardon the pun) of the number of Americans wandering lost around our university buildings saying, “But Prof Jones states in his letter that his office is on the second floor of the Gerry Adams Memorial Building”. MATLAB, very sensibly, does not allow zero-based arrays, but I’m sure they get many requests for them.
function p = GRPvec(n)
p = 1:n;                          % Identity permutation
r = ceil(rand(n,1).*(1:n)');      % Vector of random integers, r(k) in 1..k
for k = 2:n
    t = p(r(k));
    p(r(k)) = p(k);               % Swap(p(r(k)),p(k))
    p(k) = t;
end;
return; % GRPvec
We can see in Table 5 that GRPvec takes 10% longer for n = 10^7, and 6% longer for n = 10^8, than the loop version. In this simple function vectorization has given us the worst of both worlds: increased time and increased space. Vectorization, like the optimizing compiler, seems to be a mixed blessing.
I decided to replace the 3-statement swap in the original Shuffle method with a slick one-statement swap.
function p = GRPvswap(n)
p = 1:n;
for k = n:-1:2
    r = ceil(k*rand);
    p([k r]) = p([r k]);   % one-statement swap, replacing the three statements
                           % t = p(r); p(r) = p(k); p(k) = t;
end;
return; % GRPvswap
This one-statement swap had a disastrous effect on the execution time of this simple function: it took 14 to 40 times longer to execute than the original. The slow-down is caused by the need to construct two vectors, [k r] and [r k], for each swap. This is expensive, according to Jan Simon.⁹
⁹ http://www.mathworks.com/matlabcentral/newsreader/view_thread/295414.
Jan Simon
function X = Shuffle(N,’index’)
% < snip >
% INPUT:
% X: Array with any size. Types: DOUBLE, SINGLE, CHAR, LOGICAL,
% (U)INT64/32/16/8. This works for CELL and STRUCT arrays also, but
% shuffling their indices is faster.
% Author: Jan Simon, Heidelberg, (C) 2010
% Simple Knuth shuffle in forward direction:
for i = 1:length(X)
w = ceil(rand * i);
t = X(w);
X(w) = X(i);
X(i) = t;
end;
return; % Shuffle
This is Knuth’s Algorithm P with the loop direction reversed. Note that this function can shuffle any vector, not just permutations. Also, it is very general and well-written, being able to handle various integer arrays as well as the usual double precision array.
This MATLAB function is not actually used. Instead, Simon has written this in C and put a MEX wrapper around it so that, when compiled, it can be called from MATLAB. This is because the C compiler can generate much faster code than MATLAB’s interpreter. Thus this is a code-generation improvement and not an algorithmic improvement.
MATLAB
function p = randperm(n)
%RANDPERM Random permutation.
% RANDPERM(n) is a random permutation of the integers from 1 to n.
% For example, RANDPERM(6) might be [2 4 5 6 1 3].
%
% Note that RANDPERM calls RAND and therefore changes RAND’s state.
%
% See also PERMUTE.
% Copyright 1984-2004 The MathWorks, Inc.
% Revision: 5.10.4.1  Date: 2004/03/02 21:48:27
[ignore,p] = sort(rand(1,n));
If you have a good sorting method then this is a quick ’n slick method for generating random permutations. Others might call it quick ’n dirty. Yet others would call it just dirty. But let’s not quibble: it works fine and it is fast because MATLAB has carefully implemented (in C or assembly?) a good sorting algorithm, Hoare’s Quicksort, I believe.
Although all the MATLAB functions listed above use rand, which returns a standard IEEE 8-byte double precision floating point number, only MATLAB’s randperm uses a whole n-vector of them. But the methods above generate n rands, don’t they? Yes, but one at a time, so the total memory requirement is 1 integer n-vector, whereas MATLAB’s method requires 1 integer n-vector and 1 double precision n-vector. This can make a big difference if you work with truly large permutations.
5.1 Timing Tests
The results here show that all variants of GRPfys are about the same and so the choice of function is reduced to three: GRPfys, GRPmex, or randperm. Jan Simon’s GRPmex is the ‘mexed’ version of GRPfys, and runs 2 to 3 times faster. Both require 1 array of size n. If you do not have the appropriate C compiler then you are forced to use GRPfys or something similar.

Table 5. PERMUTATION GENERATION USING MATLAB R2008A (SECS.)
Function   Coder            Date       Mem   n = 10^7   n = 10^8   n = 10^9
GRPfys     Derek O’Connor   Dec 2010   1     2.0        24.0       315
GRPns      ”                Dec 2010   1     2.0        24.0       310
GRPvec     ”                Dec 2010   1     2.2        25.5       350
GRPmex     Jan Simon        2010       1     0.6        7.6        156
randperm   Matlab           2005       2     2.3        26.0       -
Dell Precision 690, Intel 2xQuad-Core E5345 @ 2.33GHz, 16GB RAM
Windows7 64-bit Prof., MATLAB R2008a, 7.6.0.324
MATLAB’s randperm is the very slick one-liner [s,p] = sort(rand(n,1)). This method’s complexity is O(n log n), which means that an O(n) method such as Shuffle will eventually beat it when n is large enough. If $T_{shuf} = c_{shuf}\,n$ and $T_{sort} = c_{sort}\,n\log n$, then the average times per array element are $c_{shuf}$ and $c_{sort}\log n$. Thus the average time per element of MATLAB’s sort method grows with n, while that of the Shuffle remains constant.
Table 5 shows that there is very little difference between GRPfys and randperm for n = 10^7 and n = 10^8. But the crucial difference is that MATLAB’s method uses twice as much working memory as GRPfys. This is why there is a blank in the table for randperm at n = 10^9: 16 GB of memory was not enough to allow randperm to generate a random permutation of size 8 bytes × 10^9 = 8 GB.
The general conclusion from these tests is that the simplest implementations of the Fisher-Yates algorithm are the best. Fancy vectorizations and array index manipulations not only obscure the algorithm, but are also error-prone and slower than the simpler methods.
5.2 Profiling GRPfys(n)
The profile for p = GRPfys(10^7) is shown below. It is important on multi-core systems to switch to single-processor mode before running the profiler, otherwise the results will be erratic. See the MATLAB profiler help for details.
time Percent calls
1 function p = GRPfys(n);
0.08 1.0 1 p = 1:n;
1 for k = 2:n
1.87 22.6 9999999 r = ceil(rand*k);
1.29 15.7 9999999 t = p(r);
1.89 22.9 9999999 p(r) = p(k);
1.80 21.8 9999999 p(k) = t;
1.32 16.0 9999999 end;
1 return;
We can see that there are no ‘hot spots’ in this code and that time is fairly evenly spread across each statement. This suggests that there is little we can do in MATLAB to speed it up. But what about vectorization?, I hear you cry. Let us try profiling GRPvec above. Here are the results for n = 10^8:
time calls | time calls
function p = GRPfys(n); | function p = GRPvec(n);
0.66 1 p = 1:n; | 0.74 1 p = 1:n;
| 6.22 1 r = ceil(rand(n,1).*(1:n)’);
1 for k = 2:n | 1 for k = 2:n
18.05 99999999 r = ceil(rand*k); |
12.69 99999999 t = p(r); | 21.55 99999999 t = p(r(k));
20.50 99999999 p(r) = p(k); | 22.97 99999999 p(r(k)) = p(k);
21.70 99999999 p(k) = t; | 12.93 99999999 p(k) = t;
12.57 99999999 end; | 12.54 99999999 end;
|
86.16 total | 77.10 total
We can see that the unvectorized code GRPfys on the left requires 86.16/77.10 − 1 = 0.1175, i.e., 12% more time than GRPvec. Random number generation takes 18.05 secs in GRPfys but only 6.22 secs in GRPvec. The times taken by the other statements in the loop are, in total, about the same for both versions. Hence the 12% difference is accounted for by the vectorization of the random number generator. Or so it seems.
Because profiling adds a lot of overhead to the run times, it is always a good idea to get the actual times with the profiler off. Still with the computer in single-CPU mode, GRPfys and GRPvec were timed, again with n = 10^8. Here are the command-line results:
» tic;p=GRPvec(1e8);toc
Elapsed time is 27.283832 seconds.
» tic;p=GRPvec(1e8);toc
Elapsed time is 27.352696 seconds.
» tic;p=GRPfys(1e8);toc
Elapsed time is 23.726308 seconds.
» tic;p=GRPfys(1e8);toc
Elapsed time is 23.714992 seconds.
This shows that GRPvec takes 27.3/23.7 − 1 = 15% more time than the unvectorized GRPfys, a complete reversal of the profiler results. In this case the MATLAB profiler has been worse than useless: it has been grossly misleading. We can see that the profiler added about (86.16 − 23.7)/23.7 = 263% overhead time to the actual running time of GRPfys. This overhead time is noise (we don’t want it) which is swamping the actual time signal. The only reliable information given out by the profiler is the statement count. But apart from the erroneous profiler results, this example shows that vectorization is not (always) a good thing: not only has it added 15% to the run time, it has doubled the working memory required.
“It was once true that vectorization would improve the speed of your MATLAB code. However, that is largely no longer true with the JIT-accelerator” (Matlab Doug, Mar. 2010).
6 Generating Special Permutations.
The Fisher-Yates Shuffle Algorithm is provably correct and, when implemented with a good random number generator, produces all possible permutations with equal probability. A special permutation is one which has some special property or restriction: it is cyclic, it is a derangement, it must have 2 fixed points, etc.
The simplest way to generate special permutations is by the Rejection Method, as shown in the function GRPSpec: repeatedly generate a permutation until it has the special property or restriction. The main problem with this general method is that many permutations may need to be generated before a desirable one is found.
function GRPSpec(n) → p
   Generates a special random permutation of the integers 1, 2, ..., n
   p := GRPfys(n)                  First random perm.
   while p is not special do
      p := GRPfys(n)
   endwhile
   return p
endfunc GRPSpec
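A Python sketch of GRPSpec that takes the special property as a predicate and also reports $n_w$, the number of draws (grp_spec and its arguments are illustrative, not RPGLab’s interface):

```python
import random


def grp_spec(n, is_special, rng=random):
    """Las Vegas rejection: draw uniform random permutations of 1..n until one
    satisfies is_special; return it together with the number of draws n_w."""
    p = list(range(1, n + 1))
    draws = 0
    while True:
        rng.shuffle(p)              # a fresh uniform permutation (Fisher-Yates)
        draws += 1
        if is_special(p):
            return list(p), draws


def no_fixed_point(p):
    return all(pi != i for i, pi in enumerate(p, start=1))


if __name__ == "__main__":
    p, n_w = grp_spec(10, no_fixed_point)
    print(p, n_w)
```

Reshuffling the previous (rejected) p is fine: a uniform shuffle of any fixed arrangement is again uniform and independent of it.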
This type of algorithm is called a Las Vegas Algorithm:¹⁰ an algorithm that is guaranteed to give the correct output, but may take a very long (random) time to do so. It is important to understand that GRPfys(n) is sampling uniformly with replacement from the set of all permutations of length n, and knows nothing about the special property. We have seen that the time complexity of GRPfys(n) is $T_s(n) = c_s n \in O(n)$. This is not a random time, but constant for a given n.
The random uniform output of GRPfys(n) means that we do not know how many times the while loop is performed: it is a random number $n_w$. Hence the time complexity of GRPSpec(n) is a random function

$$T_{spec}(n) = \sum_{k=1}^{n_w}\left(T_{test}(n) + T_s(n)\right). \qquad (6.1)$$
Using Wald’s Equation, $E\left[\sum_{i=1}^{N} X_i\right] = E[N]\,E[X]$, equation (6.1) becomes

$$E[T_{spec}(n)] = E[n_w]\,E[T_{test}(n) + T_s(n)] = E[n_w]\left(E[T_{test}(n)] + E[T_s(n)]\right). \qquad (6.2)$$
¹⁰ See Brassard & Bratley, Fundamentals of Algorithmics, Prentice-Hall, 1996, Chapter 10.
We have $E[T_s(n)] = T_s(n)$, a constant for a given n, but we do not know much about $T_{test}(n)$. Is it a random number or a constant? Usually these tests can be performed in O(n) time, e.g., IsCyc(p) and IsDer(p) are O(n). Assuming that the test takes, at worst, time $c_t n$, we have $E[T_{spec}(n)] = E[n_w](c_t n + c_s n) = c\,n\,E[n_w]$, where $c = c_t + c_s$.
What can be said about $n_w$ and its expected value? Obviously $n_w > 0$ and $n_w < \infty$ (with probability 1), assuming that the desired permutation type is not an impossibility, e.g., p(1 : 10) having at least one p(i) = 59, say. Consider asking for p = (n, n − 1, ..., 2, 1), the reverse identity. The probability of this permutation occurring is 1/n!, and so GRPfys may spend a very long time until it ‘hits’ it. The expected value of $n_w$ is then n!, and there is no upper bound on $n_w$ itself. This is, of course, the essence of a Las Vegas algorithm: it will find the correct answer, but it may take forever.
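Because GRPfys draws independently and uniformly each time, every pass of the while loop succeeds with the same probability $q = \Pr(p \text{ is special})$, so $n_w$ is geometrically distributed and its expectation is explicit:

$$\Pr(n_w = k) = (1-q)^{k-1}q,\quad k = 1, 2, \dots, \qquad E[n_w] = \sum_{k\ge 1} k(1-q)^{k-1}q = \frac{1}{q},$$

$$E[T_{spec}(n)] = \frac{(c_t + c_s)\,n}{q}.$$

For derangements $q = D(n)/n! \to 1/e$, so $E[n_w] \to e$.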
6.1 Generating Random Derangements.
A derangement is a permutation with no fixed points, i.e., $p_i \neq i$, i = 1, 2, ..., n. Obviously, a permutation can be checked in O(n) time to determine if it is a derangement. In this case the rejection method works in O(n) expected time because Pr(p is a derangement) = D(n)/n! → 1/e ≈ 0.36788. Hence the expected time complexity is $E[n_w]\,n \approx e\,n \approx 2.718n$, i.e., on average, the while loop of GRDrej is performed about 2.718 times per derangement generated.
The Rejection Derangement Generator
function p = GRDrej(n);
% Generate a random permutation p(1:n) using GRPfys
% and reject it if it is not a derangement.
% Requires an expected e = 2.7183... passes through the while-loop.
% n must be at least 2: there is no derangement of length 1.
NotDer = true;
while NotDer
    p = GRPfys(n);
    NotDer = HasFP(p); % Derangement check
end;
return % GRDrej
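The rejection method is easy to mirror outside MATLAB. Below is a minimal Python sketch that is not part of RPGLab. The names grp_fys, has_fp, grd_rej and mean_passes are hypothetical stand-ins for GRPfys, HasFP and GRDrej, and Python's random.Random plays the role of rand. It also returns the number of passes $n_w$, so that the average of about e ≈ 2.718 can be checked empirically.

```python
import random


def grp_fys(n, rng):
    """Increasing-loop Fisher-Yates shuffle of 1..n, mirroring GRPfys (1-based values)."""
    p = list(range(1, n + 1))
    for k in range(2, n + 1):
        r = rng.randint(1, k)                    # uniform in {1, ..., k}
        p[k - 1], p[r - 1] = p[r - 1], p[k - 1]  # swap p(k) and p(r)
    return p


def has_fp(p):
    """True if p(k) == k for some 1-based position k."""
    return any(v == k for k, v in enumerate(p, start=1))


def grd_rej(n, rng):
    """Rejection derangement generator (n >= 2); returns (p, number of passes n_w)."""
    passes = 0
    while True:
        passes += 1
        p = grp_fys(n, rng)
        if not has_fp(p):
            return p, passes


def mean_passes(n, trials, seed=0):
    """Average n_w over many calls; should be close to n!/D(n), i.e. about e."""
    rng = random.Random(seed)
    return sum(grd_rej(n, rng)[1] for _ in range(trials)) / trials
```

For example, mean_passes(30, 10000) should come out near 2.72. The exact value depends on the seed.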
The Martinez-Panholzer-Prodinger Derangement Generator.
This is a new derangement algorithm by Martinez et al.¹¹ that is not easy to understand but seems to work well. Shown below are the original algorithm and its MATLAB implementation.
¹¹ http://www.siam.org/proceedings/analco/2008/anl08_022martinezc.pdf
function GRDmpp(n) → p
    Martinez et al.'s original algorithm, which generates a random
    derangement of the integers 1, 2, ..., n.
    D(k) is the number of derangements of k objects.
    p := [1, 2, ..., n]                  Identity permutation
    Mark[1 : n] := false
    i := n; u := n
    while u ≥ 2 do
        if ¬Mark[i] then
            repeat
                j := Random(1, i − 1)
            until ¬Mark[j]
            p[i] :=: p[j]                Swap
            r := Uniform(0, 1)
            if r < (u − 1)·D(u − 2)/D(u) then
                Mark[j] := true
                u := u − 1
            endif
            u := u − 1
        endif ¬Mark[i]
        i := i − 1
    endwhile
    return p
endfunc GRDmpp
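function p = GRDmpp(n);
p = I(n);                   % identity permutation
mark = p < 0;               % mark(1:n) = false
Dn = [1 0 zeros(1,19)];     % Dn(k+1) = D(k), no. of derangements of k, k = 0..20
for k = 2:20
    Dn(k+1) = (k-1)*(Dn(k) + Dn(k-1));
end;
i = n; u = n;
while u > 1
    if ~mark(i)
        j = ceil(rand*(i-1));     % random j in [1,i-1]
        while mark(j)
            j = ceil(rand*(i-1)); % random j in [1,i-1]
        end;
        t = p(i);
        p(i) = p(j);              % Swap p(i) and p(j)
        p(j) = t;
        if u <= 20
            q = (u-1)*Dn(u-1)/Dn(u+1); % exact (u-1)D(u-2)/D(u) for small u
        else
            q = 1/u;                   % equals (u-1)D(u-2)/D(u) to double precision
        end;
        if rand < q
            mark(j) = true;
            u = u-1;
        end;
        u = u-1;
    end; % if ~mark(i)
    i = i-1;
end; % while u > 1
return % GRDmpp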
Tests on GRDmpp show that, on average, it does about twice the work of a single Fisher-Yates shuffle, as predicted by the analysis. Hence it is faster than the Rejection algorithm, which requires e ≈ 2.718 shuffles, on average.
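The following Python sketch mirrors the Martinez-Panholzer-Prodinger pseudocode above. It computes the marking probability (u − 1)D(u − 2)/D(u) exactly from integer derangement numbers. It is an illustration for small and moderate n only, because the table of exact D(k) values grows quickly. The names derangement_numbers and grd_mpp are hypothetical.

```python
import random


def derangement_numbers(m):
    """D(0..m) as exact integers, using D(k) = (k - 1)(D(k - 1) + D(k - 2))."""
    d = [1, 0]
    for k in range(2, m + 1):
        d.append((k - 1) * (d[k - 1] + d[k - 2]))
    return d


def grd_mpp(n, rng):
    """Random derangement of 1..n (n >= 2) following the Martinez et al. pseudocode."""
    D = derangement_numbers(n)
    p = list(range(1, n + 1))
    mark = [False] * (n + 1)        # 1-based; mark[0] is unused
    i = u = n                       # u = number of unmarked indices in 1..i
    while u >= 2:
        if not mark[i]:
            j = rng.randint(1, i - 1)
            while mark[j]:
                j = rng.randint(1, i - 1)
            p[i - 1], p[j - 1] = p[j - 1], p[i - 1]   # swap p[i] and p[j]
            if rng.random() < (u - 1) * D[u - 2] / D[u]:
                mark[j] = True      # index j will not be visited again
                u -= 1
            u -= 1
        i -= 1
    return p
```

For n = 4 the nine derangements should each appear about one ninth of the time, which is a quick empirical check of uniformity.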
Derangement Generator Timing Tests
Timing tests were run on the two derangement functions, along with GRDmex, which is GRDrej using GRPmex, a wrapper for Jan Simon's fast ShuffleMex function.
Table 6. RANDOM DERANGEMENT GENERATOR TIMES (SECS)
               GRDrej            GRDmex            GRDmpp
n           Time    Norm.     Time    Norm.     Time    Norm.
10^5        0.04    8.91      0.01    1         0.02    4.13
10^6        0.26    3.48      0.08    1         0.24    3.20
10^7       18.06    5.70      4.83    1.52      3.17    1
10^8      144.11   14.38     10.02    1        38.03    3.80
Times and normalized times (relative to the fastest generator for each n), averaged over a sample of 10 runs for each n.
Expected Running Times of the Derangement Generators
The Rejection algorithm (GRDrej) and the Martinez algorithm (GRDmpp) are examples of Las Vegas Algorithms, i.e., they are guaranteed to give the correct result but they may run for a long (random) time. The Fisher-Yates Shuffle algorithm has a constant running time for a given n, which I will call $T_s(n)$. The running time of the Rejection algorithm is $T_r(n) = n_w \times (T_t(n) + T_s(n))$, where $n_w$ is a random variable that counts the number of times the while loop is executed. The expected value of $n_w$ is $n!/D(n) \approx e = 2.7183\ldots$, because the probability of the Shuffle algorithm generating a derangement is $D(n)/n! \approx 1/e$. Hence $E[T_r(n)] \approx e\,(T_t(n) + T_s(n)) \approx 2.7183\,T_s(n)$, if we assume that $T_t(n)$ is negligible compared to $T_s(n)$.
Martinez et al. prove that the expected running time of their algorithm is $T_m(n) = 2T_s(n)$, and so it is faster, on average, than the Rejection algorithm, but not by much. My advice would be: if you have a good, fast Fisher-Yates Shuffle program, use it in the Rejection algorithm, especially if you are risk-averse, because the Martinez algorithm is harder to implement correctly. We can see in the timing table above that the Rejection algorithm using Jan Simon's GRPmex was fastest for $n = 10^5$, $n = 10^6$ and $n = 10^8$. GRDmpp was fastest for $n = 10^7$. An important added advantage of the Rejection algorithm is that it uses just 1 array of length n, whereas the Martinez algorithm uses 2 arrays of length n.
The interesting question remains: is random derangement generation inherently more difficult than random permutation generation, or is there a derangement algorithm with $T_d(n) = T_s(n)$?
6.2 Generating Random Cyclic Permutations.
A cyclic permutation has a single cycle of length n. A permutation can be tested in O(n) time to see if it is cyclic. Hence GRPSpec will perform an O(n) cyclic test followed by an O(n) permutation generation for each iteration of the while loop. If $n_w$ is the number of times the while loop is performed, then the complexity of this method is $O(n_w n)$. From (3.4) we have Pr(p is cyclic) = 1/n. Hence the expected value of $n_w$ is n, and so the expected complexity of GRPSpec is $O(n^2)$ for cyclic permutations.
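The probability Pr(p is cyclic) = 1/n can be confirmed by brute force for small n, because exactly (n − 1)! of the n! permutations are single n-cycles. Below is a minimal Python sketch with hypothetical names. Its is_cyc follows the same walk-from-element-1 idea as IsCyc.

```python
from itertools import permutations
from math import factorial


def is_cyc(p):
    """True if the 1-based permutation p (len >= 1) is a single n-cycle."""
    n = len(p)
    length, nxt = 1, p[0]
    while nxt != 1:                 # walk the cycle that contains element 1
        length += 1
        nxt = p[nxt - 1]
    return length == n


def count_cyclic(n):
    """Number of cyclic permutations among all n! permutations of 1..n."""
    return sum(is_cyc(p) for p in permutations(range(1, n + 1)))


def fraction_cyclic(n):
    """Fraction of all permutations of 1..n that are cyclic (should be 1/n)."""
    return count_cyclic(n) / factorial(n)
```

With a success probability of 1/n per pass, the rejection method needs about n passes on average, which gives the $O(n^2)$ expected cost.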
Sattolo's Cyclic Permutation Generator.
S. Sattolo, in her Master's thesis, gave a very simple modification of the Fisher-Yates algorithm that generates random cyclic permutations.¹²
function GRCsat(n) → p
    Generates a random cyclic permutation of the integers 1, 2, ..., n
    p := [1, 2, ..., n]              Identity perm.
    for k := 2 to n do
        r := RandInt(1, k − 1)       Sattolo's modification
        p[k] :=: p[r]                Swap
    endfor k
    return p
endfunc GRCsat
We can see that the only change to the Fisher-Yates algorithm is this: r := RandInt(1, k) becomes r := RandInt(1, k − 1). This change ensures that it always swaps two different elements of p, i.e., p[k] :=: p[r] with k ≠ r. It is remarkable that such a small change in the Fisher-Yates algorithm can make such a big difference in its output.
The correctness of this algorithm has been proved, and it has been thoroughly analysed by Prodinger.¹³ It obviously has the same O(n) complexity as the Fisher-Yates algorithm, and so it is faster than the $O(n^2)$ rejection method by a factor of about n.
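Each run of Sattolo's algorithm is fully determined by its n − 1 choices of r, so it can be checked exhaustively for small n. The (n − 1)! choice sequences should give (n − 1)! distinct cyclic permutations, i.e., every cyclic permutation exactly once, which is uniformity. The Python sketch below does this. The names grc_sat_choices and all_sattolo_outputs are hypothetical, not RPGLab functions.

```python
from itertools import product
from math import factorial


def grc_sat_choices(n, choices):
    """Sattolo's shuffle of 1..n with choices[k - 2] = r in {1, ..., k - 1} for step k."""
    p = list(range(1, n + 1))
    for k, r in zip(range(2, n + 1), choices):
        p[k - 1], p[r - 1] = p[r - 1], p[k - 1]   # swap p[k] and p[r], with r != k
    return tuple(p)


def all_sattolo_outputs(n):
    """Outputs for every possible choice sequence: 1 * 2 * ... * (n - 1) = (n - 1)! runs."""
    ranges = [range(1, k) for k in range(2, n + 1)]
    return [grc_sat_choices(n, c) for c in product(*ranges)]


def is_cyc(p):
    """True if p is a single n-cycle (walk the cycle through element 1)."""
    length, nxt = 1, p[0]
    while nxt != 1:
        length += 1
        nxt = p[nxt - 1]
    return length == len(p)
```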
¹² S. Sattolo, "An algorithm to generate a random cyclic permutation", Information Processing Letters, Vol. 22, pages 315–317, 1986.
¹³ Prodinger, Helmut, "On the analysis of an algorithm to generate a random cyclic permutation", Ars Combinatoria, Vol. 65, 2002.
7 Testing Combinatorial Generators
Combinatorial generation algorithms and programs are often small, deceptively simple, subtle, and, above all, error-prone. These small programs are often buried deep as subprograms (functions) in large simulations whose operation depends crucially on their correct and efficient working. It should go without saying that these generators must be rigorously tested.
7.1 Random Generator Tests
These tests are for any n, but usually with n in the range $[10^6, 10^{10}]$, where exhaustive testing is out of the question.
1. Existential. Does the generator output the correct form of combinatorial object? For example, (1) does a permutation generator produce (all possible) permutations? (2) does a random derangement generator produce (all possible) derangements? and (3) does a random cyclic permutation generator produce (all possible) cyclic permutations?
2. Uniformity. Are the random permutations distributed uniformly over the population of n! permutations? Uniformity implies that each number in {1, 2, ..., n} occurs in each position with relative frequency 1/n.
3. Frequency. Counting Classes. Derangements, Cycle Structure, etc.
4. Others
— TO BE COMPLETED —
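One simple way to make the Uniformity test concrete for large n is a position-value frequency count. Over many samples, value v should appear in position k with relative frequency 1/n, for every k and v. The Python sketch below computes a chi-square statistic over that n × n table. The names fisher_yates and position_value_chisq are hypothetical. The test checks only one-dimensional marginals, so passing it is necessary but not sufficient for uniformity over all n! permutations.

```python
import random


def fisher_yates(n, rng):
    """Increasing-loop Fisher-Yates shuffle of 1..n (a stand-in for GRPfys)."""
    p = list(range(1, n + 1))
    for k in range(2, n + 1):
        r = rng.randint(1, k)
        p[k - 1], p[r - 1] = p[r - 1], p[k - 1]
    return p


def position_value_chisq(gen, n, nsamp):
    """Chi-square statistic of the n x n table counts[position][value].

    gen(n) must return a permutation of 1..n; under uniformity every cell
    expects nsamp / n hits.
    """
    counts = [[0] * n for _ in range(n)]
    for _ in range(nsamp):
        for pos, val in enumerate(gen(n)):
            counts[pos][val - 1] += 1
    expected = nsamp / n
    return sum((c - expected) ** 2 / expected for row in counts for c in row)
```

For n = 5 and 5000 samples, a correct generator gives a statistic that averages n(n − 1) = 20. A "generator" that always returns the identity gives 100000.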
Here is an example of an existential test of a derangement generator.
function [p,k,t] = FindPerm(target);
% Check if RPG can hit (find) the permutation 'target'
% USE: [p,k,t] = FindPerm([5 4 6 7 1 8 10 2 3 9]);
% The example target is a derangement but not cyclic
% Warning: use small permutations. O(n!) time.
% Derek O'Connor 28 Dec 2010.
n = length(target);
p = randpermfull2(n);
limit = 5*factorial(n);
k = 0;
tic;
while any(p-target) && k < limit
    p = randpermfull2(n); % Ver 2, Jos van der Geest's 'derangement' gen.
    k = k+1;
end;
t = toc;
Frac = k/factorial(n);
dispa('');
if any(p-target)
    dispa('Target NOT FOUND after k =', k,'iterations. Time =',t,'secs.');
else
    dispa('Target FOUND after k =', k,'iterations. Time =',t,'secs.');
end;
dispa('Fraction of Space searched (k/n!) =', Frac ,'Rate =', ceil(k/t), 'per second');
[target;p]
return
The target is [5 4 6 7 1 8 10 2 3 9]. HasFP(target) returns 0 (false), so we know that target is a derangement. Let's see if it is a cyclic permutation: IsCyc(target) returns 0 (false), so now we know that target is a derangement that is not cyclic (it contains the 2-cycle 1 → 5 → 1). If randpermfull is a true derangement generator, then it should generate this derangement after a sufficiently large number of iterations. Here is what happens:
target = [5 4 6 7 1 8 10 2 3 9];
[p,k,t] = FindPerm(target);
Target NOT FOUND after k = 18144000 iterations. Time = 202.6715 secs.
Fraction of Space searched (k/n!) = 5 Rate = 89525 per second
ans =
5 4 6 7 1 8 10 2 3 9
6 5 4 1 10 8 3 2 7 9
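How surprising is this result? Suppose randpermfull2 produced every derangement of length 10 with equal probability 1/D(10). Then the chance that k = 5·10! independent draws all miss one particular derangement would be $(1 - 1/D(10))^k$. The short Python calculation below evaluates this. Its function names are hypothetical. It assumes independent, uniformly distributed draws, which is exactly the property being tested.

```python
import math


def derangements(n):
    """D(n), the number of derangements of n items, as an exact integer."""
    if n == 0:
        return 1
    d_prev, d = 1, 0                  # D(0), D(1)
    for k in range(2, n + 1):
        d_prev, d = d, (k - 1) * (d + d_prev)
    return d


def prob_miss(n, k):
    """P(k independent uniform derangements of length n >= 2 all differ from one given derangement)."""
    return math.exp(k * math.log1p(-1.0 / derangements(n)))
```

With D(10) = 1,334,961, the expected number of hits in 18,144,000 draws is about 13.6, and the probability of no hit at all is about $1.25 \times 10^{-6}$.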
Here is an example of an important frequency test of a permutation generator.
function nfp = FreqFixedPts(n,nsamp);
% Frequency count of fixed points in a
% sample of random permutations p(1:n).
% USE: nfp = FreqFixedPts(20,10^4);
% Derek O'Connor 8th Jan 2011.
nfp = zeros(nsamp,1);
for s = 1:nsamp
    p = GRPfys(n);
    nfp(s) = CountFixedPts(p);
end;
OutFreqFixedPts(nfp);
return; % FreqFixedPts
Figure 6 confirms that the expected value and the variance of the number of fixed points are both 1, for any n ≥ 2, and that the relative frequency of 0 fixed points is about 1/e ≈ 0.3679, which is the relative frequency of derangements. A bad RPG would be unlikely to give these results. Indeed, it was this test that showed that there was something wrong with GRDrej. The fault was identified in the 'simple' IsDer interrogation function: a compound while test was wrong. The function was re-written as IsDer(p) = ~HasFP(p).
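The same frequency test is easy to reproduce in other languages. Below is a minimal Python sketch. The names fisher_yates and count_fixed_pts are hypothetical stand-ins for GRPfys and the author's CountFixedPts, whose code is not shown here. Instead of plotting, the sketch returns the sample mean, the sample variance and the relative frequency of zero fixed points, which should be close to 1, 1 and 1/e ≈ 0.3679.

```python
import random


def fisher_yates(n, rng):
    """Increasing-loop Fisher-Yates shuffle of 1..n (a stand-in for GRPfys)."""
    p = list(range(1, n + 1))
    for k in range(2, n + 1):
        r = rng.randint(1, k)
        p[k - 1], p[r - 1] = p[r - 1], p[k - 1]
    return p


def count_fixed_pts(p):
    """Number of 1-based positions k with p(k) == k."""
    return sum(1 for k, v in enumerate(p, start=1) if v == k)


def freq_fixed_pts(n, nsamp, seed=0):
    """Sample mean, sample variance and fraction with zero fixed points."""
    rng = random.Random(seed)
    nfp = [count_fixed_pts(fisher_yates(n, rng)) for _ in range(nsamp)]
    mean = sum(nfp) / nsamp
    var = sum((x - mean) ** 2 for x in nfp) / (nsamp - 1)
    frac0 = nfp.count(0) / nsamp
    return mean, var, frac0
```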
Figure 6. FIXED POINT FREQUENCIES
— TO BE COMPLETED —
8 RPGLab
In this section¹⁴ we describe a set of simple tools, MATLAB functions, that help us manipulate and test permutations. This is done in the spirit of Kernighan and Plauger's Software Tools in Pascal, Addison-Wesley, 1981, one of the great books in computer science. Sadly, the lessons and insights of this book are either unknown to or ignored by many of today's MATLAB programmers. In the same spirit as Kernighan & Plauger, the books by Jon Bentley are devoted to small, simple, but powerful programs.
I hope that these simple functions will prove useful to those who wish to experiment with random permutations. Hence the suffix LAB.
Simplicity and Correctness. The functions in RPGLAB have been designed to be as simple as possible and as simple to use as possible. The inputs are: an integer n, or a row vector p(1 : n), or an integer and a vector. Outputs are: a logical ans, or a vector p.
Naming Conventions. Mathematics derives its power from the judicious choice of symbols it uses to name objects and processes. For example, $\sum_{i=1}^{n} x_i$ conveys an immense amount of information in a very compact form. Expressing this in a programming language (except APL) would take many more symbols, while expressing it in ordinary English would take a paragraph or more. With this in mind we have used the shortest possible names that are compatible with conveying information that is essential to understanding a piece of code. The names we use are not explanations but symbolic reminders of what the name stands for. Thus p = GRPfys(n) reminds us that it stands for the process of Generating a Random Permutation of length n using the Fisher-Yates shuffle (fys) algorithm, and stores it in the row vector p. This name and others have to be studied, understood, and remembered before they can be used with facility, just as in Mathematics.
Error Checking. What may shock many MATLABers is the lack of input error checking. This is in the spirit of Kernighan & Plauger, who assume that the users are reasonably competent and not lazy. After all, the makers of 36" pipe wrenches don't check to see if you are using one to fix your bicycle.
If you generate and manipulate permutations with these functions only, then only valid permutations will be generated. Otherwise you must do your own error-checking.
Comments. Another shock is that very few comments are used, except for a succinct statement of purpose at the head of each function. It has become fashionable in some quarters to write mini-theses at the start of each function, along with enough biographical data to satisfy the Library of Congress. I do include my name in the header of each function.¹⁵ The header comments in the functions that follow have been stripped out to save space, but they are in the m-files.
¹⁴ I have written this section to be independent of the others. As a result, some code and discussions are repeated.
¹⁵ This reminds me of a comment by a famous art historian: "No matter how abstract the painting, the signature is always very clear".
'MATLAB-isms'. Perhaps the biggest shock to MATLABers will be that there is virtually no use of MATLAB's vectorizations and fancy array manipulation functions. Most code is written in loop or component form, as the Numerical Linear Algebra people call it. I believe that MATLAB's matrix-vector notation and manipulation functions are very useful when used in the proper place, but many MATLABers have made a fetish of vectorization and index manipulation, to the detriment of code clarity and, quite often, of speed.
A benefit of not using what I call 'MATLAB-isms' is that these functions are easily translated into other languages, such as Fortran and C, making them good Mex candidates.
MATLAB is a very powerful and useful system for numerical computing. It allows the knowledgeable user to do in a few lines what would take hundreds of lines of tedious Fortran or C code. A superb example of the power of good MATLAB programming is Trefethen's Chebfun system.¹⁶ Trefethen, by the way, claims to be the first license holder of MATLAB.
8.1 The Primitives
These are the low-level functions that are used everywhere and are so simple that they are, we hope, obviously correct. Writing such simple functions is not a trivial task.
These primitives fall into two classes: (1) permutation constructors, and (2) permutationinterrogators.
Table 7. PERMUTATION PRIMITIVES.
Class          Name     Description                               Use
Constructors   I        Generate the identity permutation         p = I(n)
               Trans    Transpose elements i and j of p           p = Trans(p,i,j)
               Rev      Reverse the elements p(i), ..., p(j)      p = Rev(p,i,j)
               Rot      Left circular shift p by k positions      p = Rot(p,k)
               GRPfys   Generate a random permutation p(1 : n)    p = GRPfys(n)
               GRPmex   Generate a random permutation p(1 : n)    p = GRPmex(n)
Interrogators  IsPer    Is p a permutation?                       ans = IsPer(p)
               HasFP    Does p have a fixed point?                ans = HasFP(p)
               IsDer    Is p a derangement?                       ans = IsDer(p)
               IsCyc    Is p cyclic?                              ans = IsCyc(p)
Inputs and Outputs: i, j, k, n are integers, p is a permutation, ans is logical.
¹⁶ http://www2.maths.ox.ac.uk/chebfun/
8.2 Constructors
Identity. p = I(n)
function p = I(n);
p = 1:n; % a row vector
return; % I(n)
Transpose. p = Trans(p,i,j)
function p = Trans(p,i,j);
t = p(i);
p(i) = p(j);
p(j) = t;
return; % Trans
Reverse. p = Rev(p,i,j)
function p = Rev(p,i,j);
while i < j
    p = Trans(p,i,j);
    i = i+1;
    j = j-1;
end;
return; % Rev
This code reverses elements p(i), ..., p(j) of p(1 : n).¹⁷ Although there is no error checking, this code is quite robust: if i ≥ j then nothing happens. In practice, the call to the transposition function is replaced by the 3-statement swap operation. The reversal operation is now used in the rotation operation in a very clever way.

Figure 7. REVERSE OPERATION: Rev(p,3,7) on p(1 : 10)
index   1  2  3  4  5  6  7  8  9  10
p       5  3  6  8  1  4  7  2  10  9     (i = 3, j = 7)
p       5  3  7  8  1  4  6  2  10  9     (i = 4, j = 6)
p       5  3  7  4  1  8  6  2  10  9     (i = j = 5: stop)
¹⁷ See Kernighan & Plauger, Software Tools in Pascal, Addison-Wesley, 1981, pages 194–195.
Rotate. p = Rot(p,k)
function p = Rot(p,k);
n = length(p);
p = Rev(p,1,k);
p = Rev(p,k+1,n);
p = Rev(p,1,n);
return; % Rot
This does a left circular shift of all elements of p by k positions. That is,
$$[p(1), \ldots, p(k), p(k+1), \ldots, p(n)] \xrightarrow{\ \mathrm{Rot}(p,k)\ } [p(k+1), \ldots, p(n), p(1), \ldots, p(k)]$$
Here is how the three reversals work:
$$\begin{aligned}
[p(1), \ldots, p(k), p(k+1), \ldots, p(n)] &\xrightarrow{\ \mathrm{Rev}(p,1,k)\ } [p(k), \ldots, p(1), p(k+1), \ldots, p(n)] \\
[p(k), \ldots, p(1), p(k+1), \ldots, p(n)] &\xrightarrow{\ \mathrm{Rev}(p,k+1,n)\ } [p(k), \ldots, p(1), p(n), \ldots, p(k+1)] \\
[p(k), \ldots, p(1), p(n), \ldots, p(k+1)] &\xrightarrow{\ \mathrm{Rev}(p,1,n)\ } [p(k+1), \ldots, p(n), p(1), \ldots, p(k)]
\end{aligned}$$
At first glance, this code may seem inefficient because it does three reversals. First, note that the Reverse function is efficient: it performs about (j − i)/2 transpositions, or a total of about 3(j − i)/2 element moves. Hence, the Rotate operation performs
$$\frac{3(k-1)}{2} + \frac{3(n-k-1)}{2} + \frac{3(n-1)}{2} = \frac{6n-9}{2} \sim 3n \text{ element moves.}$$
Although this is not optimal (each element is moved twice), it is very efficient nonetheless. There is an extensive body of research literature on this and related topics. The rotation operation is an important low-level operation in text editors.
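A direct Python transcription of Rev and Rot makes it easy to check the three-reversal trick against ordinary list slicing for every n and k. It uses 1-based, inclusive indices, and rev and rot are hypothetical names. It also replays a reversal of p(3..7).

```python
def rev(p, i, j):
    """Reverse p(i..j) in place (1-based, inclusive) by transpositions, like Rev."""
    while i < j:
        p[i - 1], p[j - 1] = p[j - 1], p[i - 1]
        i += 1
        j -= 1
    return p


def rot(p, k):
    """Left circular shift of p by k positions (0 <= k <= len(p)) via three reversals, like Rot."""
    n = len(p)
    rev(p, 1, k)
    rev(p, k + 1, n)
    rev(p, 1, n)
    return p
```

For example, rot([1, 2, ..., 10], 3) gives [4, 5, 6, 7, 8, 9, 10, 1, 2, 3].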
Generate a Random Permutation. p = GRPfys(n)
function p = GRPfys(n);
p = I(n);
for k = 2:n
    r = RandInt(1,k);   % random r in [1,k]
    p = Trans(p,k,r);   % swap p(k) and p(r)
end;
return; % GRPfys
This is the increasing-loop version of the Durstenfeld version of the Fisher-Yates Shuffle algorithm, shown below, along with Pike's modification for a partial shuffle.
For those who have never seen, let alone written, an Algol program: the function entier(x) is the largest integer not greater than x.¹⁸
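GRPfys makes the choices r ∈ {1, 2}, {1, 2, 3}, ..., {1, ..., n}, so there are exactly 2·3···n = n! equally likely choice sequences. A quick exhaustive Python check confirms for small n that these sequences produce n! distinct permutations. Assuming RandInt is uniform, each permutation therefore occurs with probability exactly 1/n!. The names grp_fys_choices and all_fys_outputs are hypothetical.

```python
from itertools import product
from math import factorial


def grp_fys_choices(n, choices):
    """GRPfys with its random draws replaced by an explicit list: choices[k - 2] = r, 1 <= r <= k."""
    p = list(range(1, n + 1))
    for k, r in zip(range(2, n + 1), choices):
        p[k - 1], p[r - 1] = p[r - 1], p[k - 1]   # swap p(k) and p(r)
    return tuple(p)


def all_fys_outputs(n):
    """Outputs for every choice sequence: 2 * 3 * ... * n = n! runs in total."""
    ranges = [range(1, k + 1) for k in range(2, n + 1)]
    return [grp_fys_choices(n, c) for c in product(*ranges)]
```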
¹⁸ Looking at Durstenfeld's nicely-typeset Shuffle procedure, it is a shock to realize that it is a valid Algol procedure, comments and all. Now, fifty years later, it is a poor reflection on today's language designers, compiler-interpreter writers, and program-editor makers that they cannot handle or present us with nicely typeset programs, despite the huge strides made in mathematical typesetting, a much more difficult task than program typesetting. But we do have 19 different types of assignment statements in Java. Programmers of the World, Protest!
Generate a Random Permutation. p = GRPmex(n)
function p = GRPmex(n);
p = ShuffleMex(n,'index'); % Jan Simon's mexed version of F-Y Shuffle
return; % GRPmex
This is 2 to 3 times faster than the pure MATLAB version GRPfys(n).
8.3 Interrogators
Valid Permutation Check. ans = IsPer(p)
function ans = IsPer(p)
n = length(p);
count = zeros(n,1);
ans = true;
for k = 1:n
    v = p(k);
    if v >= 1 && v <= n && v == round(v) && count(v) == 0
        count(v) = count(v)+1;
    else
        ans = false; % Stop after first bad p(k): out of range, non-integer, or repeated
        return
    end;
end; % for k
return; % IsPer
A permutation p is valid if and only if each k ∈ {1, 2, ..., n} appears once and only once. This is an expensive check because it uses an extra array.¹⁹ Notice that the function returns as soon as a bad value is found. No further time is wasted checking for other bad points. We will use this stop-as-soon-as-possible principle throughout the other interrogation functions.
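Here is a Python sketch of the same stop-at-the-first-bad-entry check. The name is_per is hypothetical. The sketch also rejects non-integers and values outside 1..n, which would otherwise be used as invalid indices into the extra array.

```python
def is_per(p):
    """True if p is a permutation of 1..len(p); returns at the first bad entry."""
    n = len(p)
    seen = [False] * (n + 1)           # the extra array; seen[0] is unused
    for v in p:
        if not isinstance(v, int) or not 1 <= v <= n or seen[v]:
            return False               # non-integer, out of range, or repeated
        seen[v] = True
    return True
```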
Fixed Point Check. ans = HasFP(p)
function ans = HasFP(p)
n = length(p);
ans = false;
for k = 1:n
    if p(k) == k
        ans = true;
        return; % stops after first fixed point
    end;
end; % for k
return % HasFP
A permutation p has a fixed point if p(k) = k for some k = 1, 2, ..., n. Note that HasFP(p) returns false if and only if p is a derangement. This function could be written more succinctly with a compound while statement, but such statements are (for me at least) error-prone. It is well to remember that succinctness and simplicity are often at odds.
Derangement Check. ans = IsDer(p)
function ans = IsDer(p);
ans = ~HasFP(p);
return % IsDer
If you are sure that HasFP(p) is correct, then IsDer(p) is obviously correct. Replace the function call with inline code if necessary, but check it carefully.²⁰
¹⁹ I'm sure there are better ways of doing this check. Later, maybe.
²⁰ My first attempt at IsDer(p) had an error in a compound while check. This error did not show up until much later, when I did frequency tests on the permutation generators.
Cyclic Permutation Check. ans = IsCyc(p)
function ans = IsCyc(p);
n = length(p);
start = 1;
next = p(start);
L = 1;
while next ~= start % stops at the end of the first cycle
    L = L+1;
    next = p(next);
end;
ans = L == n;
return; % IsCyc
A permutation p is cyclic if it consists of a single cycle of length n. This function starts arbitrarily at element 1. If element 1 is part of a cycle whose length is less than n, then p cannot be cyclic, and false is returned.
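The early exit in IsCyc is safe because a permutation is cyclic exactly when it has one cycle. The Python sketch below compares the walk-from-element-1 test with a brute-force count of all cycles, over every permutation of small length. The names is_cyc, num_cycles and check_is_cyc are hypothetical.

```python
from itertools import permutations


def is_cyc(p):
    """IsCyc-style test: walk the cycle through element 1 and compare its length with n."""
    length, nxt = 1, p[0]
    while nxt != 1:
        length += 1
        nxt = p[nxt - 1]
    return length == len(p)


def num_cycles(p):
    """Brute-force count of all cycles of the 1-based permutation p."""
    seen = [False] * (len(p) + 1)
    cycles = 0
    for start in range(1, len(p) + 1):
        if not seen[start]:
            cycles += 1
            k = start
            while not seen[k]:
                seen[k] = True
                k = p[k - 1]
    return cycles


def check_is_cyc(max_n):
    """True if is_cyc agrees with 'exactly one cycle' for every permutation of length 1..max_n."""
    return all(is_cyc(p) == (num_cycles(p) == 1)
               for n in range(1, max_n + 1)
               for p in permutations(range(1, n + 1)))
```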
8.4 Special Permutation Generators
There are many special permutations. Here we give two which seem to be the most useful: Derangement and Cyclic.
Table 8. SPECIAL GENERATORS.
Name     Description                                        Use
GRDrej   Generate a random derangement p(1 : n)             p = GRDrej(n)
GRDmex   Generate a random derangement p(1 : n)             p = GRDmex(n)
GRDmpp   Generate a random derangement p(1 : n)             p = GRDmpp(n)
GRCsat   Generate a random cyclic permutation p(1 : n)      p = GRCsat(n)
Generate a Derangement. p = GRDrej(n)
function p = GRDrej(n);
NotDer = true;
while NotDer
    p = GRPfys(n);
    NotDer = HasFP(p); % Derangement check
end;
return % GRDrej
Generate a Derangement. p = GRDmex(n)
function p = GRDmex(n);
NotDer = true;
while NotDer
    p = GRPmex(n);
    NotDer = HasFP(p); % Derangement check
end;
return % GRDmex
This is just GRDrej using GRPmex, which uses Jan Simon's ShuffleMex. This gives a speedup by a factor of 2 to 3.
Generate a Cyclic Permutation. p = GRCsat(n)
function p = GRCsat(n)
p = I(n);
for k = 2:n
    r = RandInt(1,k-1); % Changing k to k-1 in GRPfys
    p = Trans(p,k,r);
end;
return; % GRCsat
This is Sattolo's modification of GRPfys. We can see that the only change in GRCsat is to use k − 1 instead of k. This causes a dramatic change in the output of this function: it generates cyclic permutations only. The correctness of this algorithm has been proved, and it has been thoroughly analysed by Prodinger.²¹ It obviously has the same O(n) complexity as the Fisher-Yates algorithm.
²¹ Prodinger, Helmut, "On the analysis of an algorithm to generate a random cyclic permutation", Ars Combinatoria, Vol. 65, 2002.
Generate a Random Integer. r = RandInt(L,U)
This function is included because it is easy to get wrong. It can be replaced by the single line of code below. The purpose of this function is to pick a random integer from the set {L, L + 1, ..., U − 1, U}. This is done with the single statement
    r = L + floor(rand*(U-L+1))        (8.1)
We wish to prove that this statement works correctly, given that rand is MATLAB's implementation of the Mersenne Twister generator. MATLAB's documentation on rand states that
The rand function now supports a method of random number generation called the Mersenne Twister. The algorithm used by this method, developed by Nishimura and Matsumoto, generates double precision values in the closed interval $[2^{-53}, 1 - 2^{-53}]$, with a period of $(2^{19937} - 1)/2$.
From this information and (8.1) we have
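$$\begin{aligned}
r &\in L + \mathrm{floor}\bigl([2^{-53}, 1 - 2^{-53}] \times (U - L + 1)\bigr) \\
&= L + \mathrm{floor}\,[2^{-53}(U - L + 1),\ (U - L + 1) - 2^{-53}(U - L + 1)] \\
&= L + \mathrm{floor}\,[\epsilon,\ U - L + 1 - \epsilon], \quad \text{where } \epsilon = 2^{-53}(U - L + 1), \\
&= L + [\lfloor \epsilon \rfloor,\ \lfloor U - L + 1 - \epsilon \rfloor] = L + [0,\ U - L], \quad \text{if } \epsilon < 1, \\
&= [L, U].
\end{aligned}$$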
Now, assuming L ≤ U, $\epsilon < 1 \iff 2^{-53}(U - L + 1) < 1 \iff U - L + 1 < 2^{53}$. If L and U are 32-bit signed integers, then $1 \le U - L + 1 \le 2^{32}$ (note that U − L + 1 itself may not fit in a 32-bit signed integer), so the condition ε < 1, or $U - L + 1 < 2^{53}$, obviously holds.
If L and U are 64-bit signed integers, then U − L + 1 can be as large as $2^{64}$, and the condition ε < 1, or $U - L + 1 < 2^{53}$, may not hold.
Warning: Do not use 64-bit integers with r = L + floor(rand*(U-L+1)).
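The endpoint argument can be checked numerically by feeding the smallest and largest values of rand quoted above, $2^{-53}$ and $1 - 2^{-53}$, into formula (8.1). The Python sketch below assumes those endpoint values. Its names are hypothetical, and Python floats are IEEE doubles, like MATLAB's. A 32-bit range reaches both L and U. A range of $2^{60}$ values, however, can never produce its lowest 128 or its highest 127 integers.

```python
import math

RAND_MIN = 2.0 ** -53        # smallest value rand returns, per the quoted documentation
RAND_MAX = 1.0 - 2.0 ** -53  # largest value rand returns


def rand_int(L, U, u):
    """Formula (8.1), r = L + floor(u*(U - L + 1)), with the uniform draw u passed in."""
    return L + math.floor(u * (U - L + 1))


def reachable_range(L, U):
    """Smallest and largest results of (8.1); floor is monotone, so the extremes of u give them."""
    return rand_int(L, U, RAND_MIN), rand_int(L, U, RAND_MAX)
```

For example, reachable_range(0, 2**60 - 1) returns (128, 2**60 - 128) rather than (0, 2**60 - 1).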
The period of rand is $(2^{19937} - 1)/2 \approx 20 \times 10^{6000}$, which is a gigantic number, at least to ordinary mortals. However, the sets of permutations that GRPfys(n) samples from are gigantically larger than rand's gigantic period (see Table 2 above). On a 2.3 GHz, 16 GB machine, random permutations of length $n = 10^9$ can be generated in a few minutes. Thus we are sampling from a space of $N = n! = (10^9)!$ objects. Using Stirling's approximation we find $\log_e (10^9)! \approx 2 \times 10^{10}$, which means that $N = n!$ is an integer with about $10^{10}$ digits. This means that despite rand's gigantic period (an integer with a mere 6000 digits), GRPfys(n) will never visit more than an infinitesimally small fraction of the permutation space. Yet we can say, with a certain amount of confidence, that GRPfys will produce any particular permutation p(1 : $10^9$) with probability 1/(a $10^{10}$-digit number).
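The digit counts quoted above are easy to reproduce with logarithms, without forming n! itself. Here is a Python sketch with hypothetical names. It uses math.lgamma(n + 1) = ln n! and approximates the period by $2^{19936}$.

```python
import math


def digits_of_factorial(n):
    """Number of decimal digits of n!, from ln(n!) = lgamma(n + 1)."""
    return math.floor(math.lgamma(n + 1) / math.log(10)) + 1


def digits_of_mt_period():
    """Approximate decimal digits of (2**19937 - 1)/2, treating it as 2**19936."""
    return math.floor(19936 * math.log10(2)) + 1
```

So n! for $n = 10^9$ has about $8.6 \times 10^9$ digits, roughly 1.4 million times as many digits as rand's period.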
— TO BE COMPLETED —
Table 9. OTHER FUNCTIONS.
Name        Description                                   Use
InvP        The inverse of p: p(q) = q(p) = I             q = InvP(p)
CanForm     Arrange p in canonical form                   q = CanForm(p)
CycStruct   Determine cycle structure of p                [ncyc,S,L] = CycStruct(p)
FindPerm    Can an RPG 'hit' the target pt?               [p,k,time] = FindPerm(pt)