random numbers in hep


Upload: desirae-lancaster

Post on 04-Jan-2016

TRANSCRIPT

Page 1: Random Numbers in HEP

Random Numbers in HEP

Tests of Random Engines in CLHEP

Heinrich’s note about RandEngine

Page 2: Random Numbers in HEP

Essential Concepts

• Instances
– Use of a random in one module need not affect the sequence of randoms seen by another
• Engines vs Distributions
– An Engine is a source of uniform (pseudo-)randomness
– A Distribution uses an engine to fire off variates
  • Flat()
  • Gaussian()
• Validity of Randomness
– How {unbiased, ergodic, unpredictable} is the engine?
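The split between engines and distributions, and the independence of instances, can be sketched as follows. This is a minimal illustration in Python and not the CLHEP API: the class names `Engine`, `Flat`, and `Gaussian`, the `fire()` method, and the seeds are all hypothetical, and Python's built-in generator stands in for a real engine.

```python
import math
import random


class Engine:
    """Hypothetical engine: a source of uniform floats in [0, 1)."""

    def __init__(self, seed):
        # Each instance owns its own state, so engines never share a sequence.
        self._rng = random.Random(seed)

    def flat(self):
        return self._rng.random()


class Flat:
    """Distribution that maps engine output onto [a, b)."""

    def __init__(self, engine, a=0.0, b=1.0):
        self.engine, self.a, self.b = engine, a, b

    def fire(self):
        return self.a + (self.b - self.a) * self.engine.flat()


class Gaussian:
    """Distribution built from two uniform variates (Box-Muller transform)."""

    def __init__(self, engine, mean=0.0, sigma=1.0):
        self.engine, self.mean, self.sigma = engine, mean, sigma

    def fire(self):
        u1 = 1.0 - self.engine.flat()  # shift to (0, 1] so log() is defined
        u2 = self.engine.flat()
        z = math.sqrt(-2.0 * math.log(u1)) * math.cos(2.0 * math.pi * u2)
        return self.mean + self.sigma * z


def module_a_sequence(other_module_active):
    """Draw five variates in 'module A' while 'module B' may also draw."""
    flat_a = Flat(Engine(seed=1))
    gauss_b = Gaussian(Engine(seed=2))
    out = []
    for _ in range(5):
        if other_module_active:
            gauss_b.fire()  # module B consumes randoms from its own engine
        out.append(flat_a.fire())
    return out


# Module A sees the same sequence whether or not module B is drawing.
unaffected = module_a_sequence(False) == module_a_sequence(True)
```

Because each distribution holds a reference to its own engine, `unaffected` is true: activity in one module does not shift the sequence seen by the other.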

Page 3: Random Numbers in HEP

How Engines Can be Bad

• Wrong single-number statistics
– Wrong mean (cf. the buggy Linux RandEngine)
– Wrong sigma (or higher moment)
– Very easy to detect such a major deficiency
– Non-uniform distribution with right moments
  • Possible! (but mathematically pathological)
  • And certainly possible to an excellent approximation
• Sequence-order dependent flaws
– Can be harder to detect
– Can still lead to incorrect physics!
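A check of the single-number statistics might look like the sketch below. It relies on the fact that a uniform variate on (0,1) has mean 1/2 and variance 1/12; the function name `moment_check`, the sample size, and the 5-sigma tolerance are illustrative choices, and the "biased" engine is an artificial stand-in rather than the actual RandEngine code.

```python
import math
import random


def moment_check(draw, n=100_000, n_sigma=5.0):
    """Compare the sample mean and variance with the uniform(0,1) values."""
    xs = [draw() for _ in range(n)]
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / (n - 1)
    # Standard error of the mean is sqrt(variance / n) with variance 1/12.
    mean_err = math.sqrt(1.0 / 12.0 / n)
    # Var(sample variance) ~ (mu4 - sigma^4) / n = (1/80 - 1/144) / n = 1/(180 n).
    var_err = math.sqrt(1.0 / 180.0 / n)
    ok = (abs(mean - 0.5) < n_sigma * mean_err
          and abs(var - 1.0 / 12.0) < n_sigma * var_err)
    return ok, mean, var


good_rng = random.Random(12345)
good = moment_check(good_rng.random)

# Artificial biased engine: u**0.6 has mean 1/1.6 = 0.625 instead of 0.5.
bad_rng = random.Random(12345)
bad = moment_check(lambda: bad_rng.random() ** 0.6)
```

A test like this catches a wrong mean immediately, but it says nothing about order-dependent flaws, which is why sequence-sensitive tests are also needed.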

Page 4: Random Numbers in HEP

An engine with good moments but flawed distribution

• Every engine on a real computer!
– (But if the granularity is small enough this “flaw” becomes moot)

Page 5: Random Numbers in HEP

Typical Sequence Flaws

• Tom Nash, doing a thesis in the 70’s, found that pairs of numbers from the “random” engine in use lay in stripes.

• He was far from the first to notice this!

• But it was affecting the physics
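A well-known generator with this kind of lattice structure is RANDU, the multiplicative congruential generator x → 65539·x mod 2^31. The slides do not say which generator Nash was using, so RANDU is chosen here only as an illustration, and it shows the effect for triples (planes in 3-D) rather than pairs. Since 65539² ≡ 6·65539 − 9 (mod 2^31), every three consecutive outputs satisfy a fixed linear relation:

```python
M = 2 ** 31


def randu(seed, count):
    """RANDU: x_{k+1} = 65539 * x_k mod 2^31 (seed should be odd)."""
    x = seed
    out = []
    for _ in range(count):
        x = (65539 * x) % M
        out.append(x)
    return out


xs = randu(seed=1, count=10_000)
triples = list(zip(xs, xs[1:], xs[2:]))

# Every triple satisfies x_{k+2} - 6*x_{k+1} + 9*x_k = 0 (mod 2^31).
violations = sum(1 for a, b, c in triples if (c - 6 * b + 9 * a) % M != 0)

# The integer multiple of 2^31 labels the plane on which each triple lies.
planes = {(c - 6 * b + 9 * a) // M for a, b, c in triples}
```

Here `violations` is zero, and `planes` contains at most 15 distinct values, so all points built from consecutive triples fall on a handful of parallel planes in the unit cube.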

Page 6: Random Numbers in HEP

Testing an Engine

• The general test:
– Start with a mathematical observation: “If S_n is an ergodic sequence, uniformly distributed on (0,1), then for all subsequences T, some property P should hold.”
– Property P should hold to accuracy ε(n) and with probability p(n), where ε(n) → 0 and p(n) → 1 for large n.
– Apply the “Does P hold” test to a bunch of sequences generated by the engine.
• An Engine that passes every possible test based on every possible property to perfect precision would be a perfect random engine
– But mere mortals have to pick some set of tests
– And have to settle for some finite precision on each test

Page 7: Random Numbers in HEP

The DIEHARD Suite

• Invented (more like collected) by Marsaglia.
• 13 tests (for 13 properties of the sequence).
• Each test uses the sequence to generate a group of numbers, with mathematically known distribution for random sequences.
• The distribution obtained is compared via a Kolmogorov-Smirnov test with the proper distribution.
• If any P-value is extremely close to 1, this is proof of non-randomness of the sequence.
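The Kolmogorov-Smirnov comparison can be sketched in a few lines. This is not the DIEHARD code: it is a minimal version that uses the asymptotic Kolmogorov series for the tail probability, and the function names are hypothetical. To match the convention on the slide, the reported value is 1 minus the tail probability, so values extremely close to 1 signal a mismatch.

```python
import math
import random


def ks_statistic(sample, cdf):
    """Largest gap between the empirical CDF and the reference CDF."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


def ks_p_value(d, n, terms=100):
    """1 - Q(sqrt(n) * d), with Q the asymptotic Kolmogorov tail series."""
    lam = math.sqrt(n) * d
    if lam < 0.2:
        return 0.0  # the tail probability is 1 to many digits here
    tail = 2.0 * sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
                     for k in range(1, terms + 1))
    return 1.0 - min(1.0, max(0.0, tail))


def uniform_cdf(x):
    return min(1.0, max(0.0, x))


rng = random.Random(2024)
n = 1000
good_sample = [rng.random() for _ in range(n)]
skewed_sample = [rng.random() ** 2 for _ in range(n)]  # not uniform

p_good = ks_p_value(ks_statistic(good_sample, uniform_cdf), n)
p_skewed = ks_p_value(ks_statistic(skewed_sample, uniform_cdf), n)
```

The skewed sample gives a value indistinguishable from 1, while a healthy sample gives an unremarkable value somewhere inside (0,1).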

Page 8: Random Numbers in HEP

Interpreting the Results of a test suite

• The longer the sequences used, the more discrimination power.

• If no test fails, that is not proof that the sequence is random:
– It is proof that either the tester was not clever enough to find and expose non-random properties, or that the sequence used was not long enough.
– But for practical uses, some suite of tests at some level of N is “good enough.”

Page 9: Random Numbers in HEP

The DIEHARD Tests

• Birthday Spacings
– Choose m birthdays in an n-day year and list the spacings between the sorted birthdays. The number j of spacing values that occur more than once is asymptotically Poisson distributed with mean m³/(4n).
– Not order-sensitive.
• Overlapping 5-permutation Test
– Each 5-number subsequence forms one of 120 possible ordering numbers. The count for each should be equal and uncorrelated.
• Binary Rank Test (32x32 and 6x8)
– The distribution of ranks of 32x32 bit matrices is known theoretically. This detects many linear relations involving up to 1000 bits of the sequence.
• Bitstream Test
– For a random stream of 2^21 bits, the number of “missing” 20-bit “words” should be normally distributed with mean 142,000 and sigma 428.
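As an illustration of how one of these tests works, here is a minimal sketch of the Birthday Spacings count. The values m = 512 and n = 2^24 are chosen only so that the expected mean m³/(4n) comes out to exactly 2; the function name and the number of trials are likewise illustrative, and a real test would compare the full distribution of j against the Poisson law rather than just its average.

```python
import random
from collections import Counter


def repeated_spacing_count(birthdays):
    """Number of spacing values that occur more than once."""
    days = sorted(birthdays)
    spacings = [b - a for a, b in zip(days, days[1:])]
    counts = Counter(spacings)
    return sum(1 for c in counts.values() if c > 1)


m = 512            # birthdays per trial (illustrative)
n_days = 2 ** 24   # days in the "year" (illustrative)
expected_mean = m ** 3 / (4 * n_days)  # Poisson mean from the formula above

rng = random.Random(7)
trials = 300
total = 0
for _ in range(trials):
    birthdays = [rng.randrange(n_days) for _ in range(m)]
    total += repeated_spacing_count(birthdays)
average = total / trials
```

With a sound engine, `average` should land close to `expected_mean`; a generator with too regular a lattice of values tends to produce far too many repeated spacings.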

Page 10: Random Numbers in HEP

and…

• Overlapping Pairs and Quadruples, Sparse Occupancy
– Missing 2- and 4-“letter” words in a long sequence of “letters” from a 1024-letter alphabet
– Not very order sensitive past N=20 or 40
• DNA Test
– Missing “10-genome” sequences in the 4-letter genome alphabet
• Count-the-1’s test on a stream of bytes or on specific bytes
– Uses the number of 1’s in 8-bit sequences to form pseudo-letters, which are then grouped into 5-letter words.
• Runs Test
– The theoretical distributions and covariances for up and down runs of N numbers are known.

Page 12: Random Numbers in HEP

and…

• Craps test
– A long sequence of games of craps is simulated. The distribution of wins and losses, and of the number of dice throws for each game, is known.
• Parking Lot Test
– Make a number of attempts to park a circle in a random “lot” of a square of side 100. Re-try if there are collisions. After 120,000 tries, the number of successes should match some (experimentally determined) distribution.
• Overlapping Sums Test
– The sums of 100 variables should be virtually normally distributed.
– Sensitive to medium-scale imperfections.
• Squeeze Test
– Number of iterations needed to reduce 2^31 down to 1, by doing k = ceiling(k*random()).
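The Squeeze iteration is simple enough to write out directly. The sketch below counts the steps for one run; the function name is hypothetical, and the draw is shifted from [0,1) to (0,1] as a guard so that k can never collapse to 0, which is a detail assumed here rather than taken from the slides.

```python
import math
import random


def squeeze_iterations(draw, start=2 ** 31):
    """Count applications of k = ceiling(k * u) needed to reach k == 1."""
    k = start
    steps = 0
    while k > 1:
        u = 1.0 - draw()  # draw() is in [0, 1), so u is in (0, 1]
        k = math.ceil(k * u)
        steps += 1
    return steps


rng = random.Random(99)
counts = [squeeze_iterations(rng.random) for _ in range(1000)]

# Sanity check with a constant "engine": halving 2^31 takes exactly 31 steps.
halving_steps = squeeze_iterations(lambda: 0.5)
```

A full test would histogram `counts` over many runs and compare the histogram with the known distribution of the iteration count.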

Page 14: Random Numbers in HEP

The Minimum Distance Test

• The last two tests are the MD test in 2 and 3 dimensions:
– Use N pairs (triplets) of randoms as coordinates of N points in the unit volume.
– The closest pair of points has some distance d.
– d has a “known” theoretical distribution
• As applied by Marsaglia, most generators semi-fail the 2-D Minimum distance test!
– P-values of .99 are typical.
– They did not semi-fail when the FORTRAN versions of the tests were applied.
– The f2c version showed this behavior.
– ?

Page 15: Random Numbers in HEP

The Minimum Distance Test

• Marsaglia speculated this was due to some funny business (error) with converting array code using f2c.

• Turns out it was a function of being able to do more iterations on later processors.

• The theoretical distribution had been calculated only to lowest order. By the time ZOOM did these tests, we were exposing the fact that the actual distribution differed.

• See my Technical memo TM2170, where I do the next few orders. This resolves the discrepancy.
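The lowest-order distribution can be motivated with a short argument: among N points in an area A there are N(N−1)/2 pairs, and each pair lies within distance d with probability about πd²/A, so d² is approximately exponential with mean 2A/(πN(N−1)). The sketch below implements the 2-D test in the unit square against that lowest-order mean only. It does not include the higher-order corrections from the memo, and the function names, point count, and trial count are illustrative.

```python
import math
import random


def min_distance_squared(draw, n_points):
    """Smallest squared distance among n_points random points in the unit square."""
    pts = [(draw(), draw()) for _ in range(n_points)]
    best = float("inf")
    for i in range(n_points):
        xi, yi = pts[i]
        for j in range(i + 1, n_points):
            dx = xi - pts[j][0]
            dy = yi - pts[j][1]
            d2 = dx * dx + dy * dy
            if d2 < best:
                best = d2
    return best


def lowest_order_mean_d2(n_points, area=1.0):
    """Lowest-order mean of d^2: 2A / (pi * N * (N - 1))."""
    return 2.0 * area / (math.pi * n_points * (n_points - 1))


rng = random.Random(1998)
n_points = 100
trials = 200
samples = [min_distance_squared(rng.random, n_points) for _ in range(trials)]
ratio = (sum(samples) / trials) / lowest_order_mean_d2(n_points)
```

With modest statistics the `ratio` is consistent with 1. The point of the slides is that with enough points and iterations the small deviation from the lowest-order formula becomes visible, and then looks like a failure of the engine.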

Page 16: Random Numbers in HEP

The CLHEP Engines

• JamesRandom
– Fred James’ adaptation of Marsaglia’s add-with-carry generator. Large state.
• Ranecu
– Multiplicative Congruential generator using formula constants of L'Ecuyer. Large state.
• Ranlux and Ranlux64
– Luescher’s “folding” generator. For luxury levels of 2+ it is excellent; for luxury 4 is “provably” ergodic. Moderate state.
• TripleRand
– Used in the Canopy Lattice gauge computations, this combines a Hurd primitive polynomial generator with two other unrelated generators to form a family of trustworthy yet distinct sequences. Moderate state.

Page 18: Random Numbers in HEP

More CLHEP Engines

• Hurd288 and Hurd160
– Based on primitive polynomials over Z(2). Provably ergodic (but I don’t know how well to trust the proof). Used as basis for TripleRand and DualRand. Small state.
• RanshiEngine
– Modeled on spinning balls. Very large state, but fast and excellent.
• MTwistEngine
– Based on number-theory considerations involving large Mersenne primes. Very large state. Many randomness properties proven, though the math is difficult. Fast and excellent (can be made even faster…)
• All the above engines have excellent randomness properties.

Page 20: Random Numbers in HEP

And some Low-quality CLHEP Engines

• DRand48Engine
– Based on a common Unix system random number generator. Very small state and fast, but fails several randomness tests.
• RandEngine
– Directly calls the system function rand(). Not portable; state not savable. Its claim to fame is that the usual implementation of rand() is dissected in Knuth and shown to be a terrible generator!!
– RandEngine, when coded properly, fails half of the DIEHARD tests
• NonRandomEngine
– Lets people test program behavior on a set forced sequence of randoms. Requires user to specify a sequence. Useful for testing. If you use this to try to generate true random numbers, you get what you deserve.
• Why was RandEngine left in CLHEP?
– According to a web page circa 1997, because Geant4 insisted on it.
– Fortunately, they then proceeded not to use it.
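The idea behind a forced-sequence engine can be sketched as below. This is a hypothetical Python stand-in and not the CLHEP NonRandomEngine interface: the class name, the `flat()` method, and the choice to cycle when the sequence runs out are all assumptions made for the illustration.

```python
class ForcedSequenceEngine:
    """Hypothetical engine that replays a user-supplied list of 'randoms'."""

    def __init__(self, values):
        if not values:
            raise ValueError("a forced sequence needs at least one value")
        self._values = list(values)
        self._index = 0

    def flat(self):
        # Assumed behavior: wrap around once the sequence is exhausted.
        value = self._values[self._index % len(self._values)]
        self._index += 1
        return value


def accept_reject(engine, threshold):
    """Toy consumer: accept an event when the next random is below threshold."""
    return engine.flat() < threshold


# With a forced sequence, the consumer's decisions are fully predictable.
engine = ForcedSequenceEngine([0.1, 0.9, 0.5])
decisions = [accept_reject(engine, 0.6) for _ in range(6)]
```

Here `decisions` is `[True, False, True, True, False, True]`, which makes it easy to write a unit test for code whose behavior normally depends on random input.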

Page 21: Random Numbers in HEP

How the CLHEP Engines do on DIEHARD

• DRand48Engine failed the three “alphabet sequences” tests
– indicating bad correlations at the range of 5-number subsequences
• Ranlux at luxury level 0 (which was never recommended) failed the Birthday and Craps tests.
– At luxury level above 1 it is fine

Page 22: Random Numbers in HEP

How the CLHEP Engines do on DIEHARD

• Several engines semi-failed the Minimum Distance test
– The more numbers we could afford to generate, the more likely the failure
– We now understand the flaw
– A summer student in 1998 coded the test in C++ from scratch, with my correction terms in the distribution, and all the “good” engines passed.

• All engines other than RandEngine and DRand48 pass the entire DIEHARD Suite.

Page 23: Random Numbers in HEP

RandEngine

• On systems until Linux, rand() used a standardized 15-bit linear congruence
– Lame, see Knuth
– But kind of OK.
• At some point, an attempt was made in CLHEP to combine 2 calls to rand() by a shift and OR – to get 3 bits of randomness.
– Still just as lame

Page 24: Random Numbers in HEP

RandEngine

• This coding was blown, in that when Linux came along with a 32-bit result for rand(), the OR caused the average value returned to be 5/8!!!!

• Results from programs run on Linux which used RandEngine must be discarded.

• Fortunately, it appears nobody trusted RandEngine anyway
– CDF uses Ranecu
– Geant4 does not have RandEngine anywhere in its source code.
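The 5/8 average can be reproduced with a small simulation. The slides only say that two rand() calls were combined with a shift and an OR, so the details here are assumptions chosen because they reproduce the quoted figure: a 16-bit shift, truncation to 32 bits, and a rand() that returns 31 random bits (as with glibc, where RAND_MAX is 2^31 − 1). Under those assumptions the two values overlap in bits 16 to 30, where the OR sets each bit with probability 3/4 instead of 1/2.

```python
import random


def buggy_flat(rand31):
    """Assumed reconstruction: combine two rand() results by shift and OR."""
    a = rand31()
    b = rand31()
    combined = ((a << 16) | b) & 0xFFFFFFFF  # keep 32 bits, as an unsigned int
    return combined / 2 ** 32


def expected_mean():
    """Exact mean from the per-bit probabilities of the combined word."""
    total = 0.0
    for bit in range(32):
        if bit == 31:
            p = 0.5    # only bit 15 of the first call lands here
        elif bit >= 16:
            p = 0.75   # OR of two independent random bits
        else:
            p = 0.5    # only the second call contributes
        total += p * 2.0 ** (bit - 32)
    return total


rng = random.Random(2016)
n = 200_000
sample_mean = sum(buggy_flat(lambda: rng.getrandbits(31)) for _ in range(n)) / n
```

Both `sample_mean` and `expected_mean()` come out very close to 0.625 = 5/8, rather than the 0.5 a correct flat distribution would give.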

Page 25: Random Numbers in HEP

Recommendations

• Ranecu is perfectly fine, no need to change.
• Ranlux on level 2 is about as fast but “provably” reliable.
• MTwist (if you have only a few instances so can afford a very large state), TripleRand, and JamesRandom are about twice as fast and very probably just as good or better.
• Ranshi is faster still but has a huge state, and I’m not 100% convinced it can’t have flaws.