prime numbers and their application

Upload: tanmayan-pande

Post on 13-Jul-2015

138 views

Category:

Documents


1 download

TRANSCRIPT

Prime Numbers and Their Application

You may be reading through all of this and wondering what exactly is the point of finding out prime numbers? Up until the 19th century, there was little use for prime numbers, and mathematicians calculated them hoping to make some form of a breakthrough. Calculation of prime numbers was a purely mathematical endeavor. But, in the 19th century, there was a need for secrecy, especially during times of war. Messages and files needed to encoded, so that the enemy couldn t read them. Encryption was used, and computers were used to make more complex, harder to crack codes. It was found that using two prime numbers multiplied together makes a much better key than any old number, because it has only four factors. One, itself, and the two primes that it is a product of. This makes the code much harder to crack. Finally, after thousands of years, man had a real use for primes, and what better use than keeping documents and files safe. Prime numbers can also be used in pseudorandom number generators, and in computer hash tables. Primes only use is in the computer world, and we still don t have any use for primes in the physical world, however some say that some insects do.

Some insects will live in the ground for a number of years, and come out after 13 or 17 years. Both 13 and 17 are prime numbers, and by emerging at these times, it makes it harder for predators to adapt and kill the insects, and therefore more of them survive. One bug that does this is the cicada.

The only even prime number is 2. Zero and 1 are not considered prime numbers.

Except for 0 and 1, a number is either a prime number or a composite number. A composite number is defined as any number, greater than 1, that is not prime.

Note: The definition of a prime number doesn't allow 1 to be a prime number: 1 only has one factor, namely 1. Prime numbers have exactly two factors, not "at most two" or anything like that. When a number has more than two factors it is called a composite number.0 is neither. It is not prime - lots of other numbers divide into it. (They all do, except for zero itself). But it is not composite - the only way you could get zero by multiplying primes together is if one of the primes were zero, which isn't the case.

Often when defining prime and composite numbers, the fact that the number must be greater than one is part of the definition. Recent Significant Primes On 19 Nov 2011, 14:03:58 UTC, PrimeGrid's PRPNet found the largest known generalized Fermat mega prime: 75898^524288+1 The prime is 2,558,647 digits long and enters Chris Caldwell's The Largest Known Primes Database ranked 1st for generalized Fermat primes and 13th overall.

The discovery was made by Michael Goetz (Michael Goetz) of the United States using an NVIDIA GeForce GTX 460 in an Intel Core 2 Quad Q6600 @ 2.4GHz with 8GB RAM, running 64 bit Windows 7. The GPU took just under 5 hours to

probable prime (PRP) test with GeneferCUDA. The CPU completed the primality test using pfgw64 in a little over 8 days!

Preface: What is This?

The RSA cipher is a fascinating example of how some of the most abstract mathematical subjects find applications in the real world. Few are the mathematicians who study creatures like the prime numbers with the hope or even desire for their discoveries to be useful outside of their own domain. But every now and then that is exactly what happens.

This text explains the mathematics behind RSA -- how and why it works. The intended audience is just about anyone who is interested in the topic and who can remember a few basic facts from algebra: what a variable is, the difference between a prime number and a composite number, and the like.

The most important mathematical facts necessary for understanding RSA's foundations are reviewed near the beginning. Even if you are familiar with everything covered in these sections, I would recommend that you at least skim through them.

In one or two places, I have specifically targeted an explanation to what I consider to be the average computer programmer, leveraging analogous concepts in programming and general mathematics.

Before getting started, I should make some observations on the mathematical notation used here.

For the most part, where notations for the same idea differ between standard mathematics and the common practices among computer programmers, I have stuck with the mathematicians. This is, after all, a mathematical subject. However, I have deviated in a few places where there was too much opportunity for confusion. I have used * to denote multiplication, and have completely avoided "implied" multiplication (i.e., using PQ as shorthand for P * Q). Since MS Word cannot display superscripts, I have used ^ to denote exponentiation. (This necessitates more parenthesizing than would normally be used.) The mathematician's three-bar congruency symbol is not available, so I have made do with = instead. Variables are always named with a single capital letter.

Finally, please note that throughout the text I use the word number to refer specifically to a positive integer -- what are sometimes referred to as the natural numbers, or counting numbers.

Introduction: The Idea of a Trapdoor Function

What a mathematician refers to as a function is very similar to a function in computer programming. It is, in essence, an abbreviation. For example:

F(X) = 7 * X + 43.

If X happens to be 3, then F(X) will be 64. So, "F(3)" is shorthand for "7 * 3 + 43".

Of course, in a computer program, functions are used to encapsulate all kinds of algorithms, and frequently make use of external variables and the like. In mathematics, however, a function is used solely for the number it returns. And, given a certain number as input, they will always return the same output. (Thus, rand() would not qualify as a mathematical function, unless it were written so that the seed value was passed in as an input parameter.)

Mathematicians often consider how to construct a function's inverse -- taking a function and making a new one that "goes in the other direction", so to speak:

G(X) = (X - 43) / 7.

G(64) is equal to 3, and in general, G(F(X)) is equal to X. Therefore, G is F's inverse. Not all functions are invertible, of course. Clearly, the function:

F(X) = X * 0

cannot be inverted. (Because how could G(F(X)) return X when F(X) is always zero?)

Usually, when you have a mathematical function for which an inverse does exist, constructing it is not too difficult. In fact, it is often transparent. Typically, you can just run through the steps backwards, subtracting where the original function adds, and so on. But can it be done for every invertible function?

To put the question in terms of programming, imagine that there are two functions:

int foo(int x); int bar(int x);

foo() and bar() work like mathematical functions -- they do nothing but compute a return value, and given the same number for input, they will always produce the same output. (And pretend for the moment that this is on a machine where integers can be arbitrarily large.) Suppose you are told that bar() is the inverse of foo(). The statement:

x == bar(foo(x))

is always true, as long as x meets foo()'s requirements for a valid argument.

Now, imagine that you have the source code for foo(), but not for bar(). Can you write your own replacement for bar(), just by examining foo()?

It seems that you ought to be able to. There are no secrets as to what foo() does, after all. You can run foo() with different inputs as many times as you like. You already know that bar() exists, somewhere, so you know that it is possible to write. Is it guaranteed that you can reconstruct it?

Theoretically speaking, the answer is yes. Given such an function, it is always possible to construct its inverse. However, if we also throw in the tiny constraint that you have to finish before the heat-death of the universe, the answer subtly changes.

There are some special functions that, though what they do is simple enough, and how they do what they do is utterly transparent, figuring out how to undo what they do is a diabolical task. Such a creature is a trapdoor function. Anyone can fall through a trapdoor, but only those who know where the hidden lever is can climb back out again.

In 1975, Whitfield Diffie, Martin E. Hellman, and Ralph Merkle realized that a trapdoor function could be the basis for an entirely new kind of cipher -- one in which the decoding method could remain secret even when the encoding method was public knowledge. Diffie and Hellman published a paper in 1976 that described this idea, and offered some examples of weak trapdoor functions. And in 1977, Ronald L. Rivest, Adi Shamir, and Leonard Adleman outlined, in an MIT technical memo, an excellent candidate that became the basis for the RSA cipher.

What follows is a description of that function.

Background, Part I: How to Calculate with Exponents

Here's a quick refresher on how to combine exponents. Recall that:

N^2 = N * N,

N^3 = N * N * N, N^4 = N * N * N * N,

and so on. For example:

2^7 = 2 * 2 * 2 * 2 * 2 * 2 * 2 = 128.

If we fiddle with exponents for a bit, we will quickly realize that:

N^E * N = N^(E + 1).

So, for example:

2^7 * 2 = 128 * 2 = 256 = 2^8.

Building upon this, we can also see that:

N^E * N * N = N^(E + 2).

But N * N can also be written as N^2:

N^E * N^2 = N^(E + 2).

We can extrapolate from this, and derive a more general rule -- namely:

N^A * N^B = N^(A + B).

And, if we repeated this process on the next level up, we would find that:

(N^A)^B = N^(A * B).

These two simple facts are very useful when handling exponent-laden formulas.

How to Make RSA Uncrackable

Of course, in our case the factors of R can be found by consulting a times table. So it's not much of a challenge. (For that matter, since we're encrypting one character at a time, our coded messages would also be vulnerable to good oldfashioned cryptanalysis).

To make it less easy to find R's factors, we need to pick larger prime numbers for U and V to begin with. If, instead of 5 and 11, we had chosen 673 and 24971, we would have a value of 16805483 for R, and phi(R) would be 16779840. (This would also give us enough room to encrypt more than one byte at a time, which seriously reduces the vulnerability to cryptanalysis.) Looking for a P and Q pair is no longer something you want to do with pencil and paper, of course, but it took

me less than three minutes to find the usable pair 397 and 211333 -- including the time it took to write and debug a Perl script.

On the other hand, it also took me less than three seconds to run "factor" on 16805483 to obtain 673 and 24971. Armed with those numbers, it wouldn't take much longer to derive 211333 from 397. So even these numbers aren't close to being large enough. We need really big numbers.

Well, we can certainly find huge values for R that are difficult to factor. But how far can we go before it becomes too difficult for us to use the number in the first place?

Huge Exponents in Modulus Arithmetic

The problem is this: The bigger R gets, the bigger P and Q will be, and P and Q are to be used as exponents! Even the relatively tame-looking

9^(9^9)

produces a number over 350 million decimal digits long. How are we going to be able to encrypt anything without needing terabytes of storage?

The trick is that we only need to calculate these exponential values modulo R. As always, modulus arithmetic simplifies things a great deal.

Let's revisit our example, and look at how we could decrypt the number 28, remembering that R = 55 and Q = 23:

28^23 (mod 55) = ?

To start with, we look at Q's binary representation. 23 in binary is 10111, which means that:

23 = 16 + 4 + 2 + 1, or 23 = 2^4 + 2^2 + 2^1 + 2^0.

We can now break the exponential calculation apart into several smaller ones:

28^23

= 28^(2^4 + 2^2 + 2^1 + 2^0) = 28^(2^4) * 28^(2^2) * 28^(2^1) * 28^(2^0) = 28^(2 * 2 * 2 * 2) * 28^(2 * 2) * 28^2 * 28 = (((28^2)^2)^2)^2 * (28^2)^2 * 28^2 * 28.

This may look like anything but an improvement, at first. But on a closer examination, you'll see that we actually have many repeated subterms. This simplifies matters, particularly when we take advantage of the fact that we are calculating in modulo 55.

We compute the first square in modulus arithmetic:

28^2 = 784 = 14 (mod 55).

By substituting this value into our equation:

28^23 = (((28^2)^2)^2)^2 * (28^2)^2 * 28^2 * 28 (mod 55),

we get:

28^23 = ((14^2)^2)^2 * 14^2 * 14 * 28 (mod 55).

Now by computing that square:

14^2 = 196 = 31 (mod 55),

we will have:

28^23 = (31^2)^2 * 31 * 14 * 28 (mod 55).

And, finally:

31^2 = 961 = 26 (mod 55), and 26^2 = 676 = 16 (mod 55);

and so:

28^23 = 16 * 31 * 14 * 28 (mod 55).

We can continue to take advantage of the modulus when we do the final multiplications:

28^23

= 16 * 31 * 14 * 28 (mod 55) = 16 * 31 * 392 (mod 55) = 16 * 31 * 7 (mod 55) = 16 * 217 (mod 55) = 16 * 52 (mod 55) = 832 (mod 55) = 7 (mod 55)

Lo and behold: 7, the same result as when we did it the hard way.

This binary technique is really no different than how computers normally compute integer powers. However, the fact that we can break the process down to successive multiplications allows us to apply the modulus at every step of the

way. This assures us that at no point will our algorithm have to handle a number larger than (R - 1)^2.

Huge Factors in Modulus Arithmetic

The magic of modulus arithmetic will also ensure that it's possible to find our P and Q pair. Remember that, after we've selected some humongous value for R, we need to find values to fit:

P * Q = 1 (mod phi(R)),

or, without the modulus arithmetic:

P * Q = K * phi(R) + 1, where K is any number.

After picking a likely value for P -- which probably will not be a conveniently small number like 7 -- we will need to find a matching Q. By rewriting the above as:

P * Q - phi(R) * K = 1,

with known values for P and phi(R), we have what is called a Diophantine equation. This really just means that we have more unknowns than equations. However, it also means that algorithms exist for solving it, the most well-known one being Euler's. (One thing you quickly discover when you dabble in number

theory is that a lot of things are named after Euler.) While it's still not something you'd want to do with pencil and paper, it doesn't involve anything more advanced than a whole lot of long division. In short, it's something that a computer can do in a relatively brief amount of time.

Safety in Numbers

Okay. So we know that the whole process is still practical, even if R is immense. But all of this is still moot unless we can select an R in the first place. R has to be the product of two prime numbers, don't forget. If we want R to be so big that it can't be factored easily, how are we going to find those factors to begin with?

It turns out that there is an interesting little asymmetry here. Determining whether or not a number is prime happens to be a relatively cheap process.

One of the most famous methods for testing a number for primality uses Fermat's Little Theorem. Here is the version of this Theorem that we're interested in:

If P is prime, then N^(P - 1) = 1 (mod P) is true for every number N < P.

Does this seems suspiciously reminiscent of Euler's Totient Theorem? It should. Euler was the first person to publish a proof of Fermat's Little Theorem, and his Totient Theorem is a generalization of Fermat's. You can see this for yourself by remembering that phi(P) = P - 1 when P is prime.

Of course, as far as proofs go, this theorem is only useful for proving that a given number is composite. For example, it just so happens that:

4^14 (mod 15) = 268435456 (mod 15) = 1,

even though 15 is no prime. Nonetheless, it is also true that:

3^14 (mod 15) = 4782969 (mod 15) = 9, and 5^14 (mod 15) = 6103515625 (mod 15) = 10.

On the other hand, 17, which is prime, results in 1 every time:

3^16 (mod 17) = 43046721 (mod 17) = 1, 4^16 (mod 17) = 4294967296 (mod 17) = 1, 5^16 (mod 17) = 152587890625 (mod 17) = 1, and so on.

So, if we want to know if a number is prime, we can run it through this test, using (say) 2 as the base. If anything besides 1 results, we know with certainty that the number is composite. If the answer is 1, however, we try the test again with 3, and then 4, and so on. If we keep getting back 1 as the result, it soon becomes rather unlikely that the number is anything but prime.

Unlikely, but not impossible. There are a handful of numbers which pass this test for every base, but which are not prime. Called Carmichael numbers, they are far

more rare than the prime numbers -- but, like the primes numbers, there are still an infinite number of them. So we wouldn't want to rely on this test alone.

Fortunately, there are other tests for primality which are more reliable. But they all have at least one thing in common with this test: When they reject a number, they tell us only that the number can be factored. The test results give us no information at all as to what the factors might be.

Summing Up

The basic truth is that, in order to find the factors of a composite number, we're pretty much stuck with using brute force: Divide the number by all the primes you can get your hands on until one of them goes in evenly. There are plenty of ways to improve on this approach (the Number Field Sieve currently being the best), but they are complicated, and all they do is allow you to narrow the scope of the search. They don't reduce the search enough to make this problem tractable in general.

Nor is it likely that new approaches will, either! The real issue is that the encrypting and decrypting algorithms have a running time that is linear with respect to the length of R. That is to say, doubling the number of digits in R doubles the amount of time (roughly) needed to encrypt, decrypt, and to select the two primes to make a key with. But the algorithms for factoring R have a running time that is exponential with respect to the length of R. That is to say, the time (roughly) doubles with every few digits! (Because every digit added to R makes it ten times larger, and thus multiplies the number of potential candidates for its measly two factors.)

So if a new technique is suddenly found that makes it a trillion times faster to factor a number, all we have to do is increase the size of R we use by enough digits, and the situation will be right back where it started -- and all it means to us is that it takes a little bit longer to send and receive our messages. Already some people are using keys that, in order to factor with the Number Field Sieve, would require more energy than exists in the known universe.

An illustration: At the time of my writing, one of the largest general numbers that has been independently factored was the number used as the modulus for the RSA-140 challenge. (By "general numbers", I'm excluding creatures like Mersenne numbers and Fermat numbers, which have specialized factoring techniques that are inapplicable elsewhere.) It was completed on February 2, 1999. Now, the record previous to this was the RSA-130 number, and the process of factoring it was estimated as taking a total of 1000 MIPS-years of computer time. RSA-140, a number only 10 decimal digits longer, required twice that amount.

This, finally, is the heart of what makes RSA a trapdoor function: the gap between obtaining a number with two prime factors, and rediscovering the factors from the number itself. And the gap just keeps expanding as the numbers get larger.

The breakthrough that would completely destroy RSA's security would be an algorithm that actually produced a number's factors directly, instead of merely narrowing the search's scope. Such a thing has not been proven impossible, and it may well be that such a proof will never be found. But considering that prime numbers have been studied for thousands of years, and given the renewed attention that has been focused on this problem in the last few decades, the likelihood of the existence of such an algorithm appears very remote. Discovering one would change the face of number theory as much as RSA has changed the face of cryptography.

However -- if this were to happen, there are other trapdoor functions out there, waiting to be found. Whatever the future of RSA may be, the trapdoor cipher has certainly changed the face of cryptography forever.

References

In Numbers 2 is considered to be the first prime number. 29 is the 10th prime number. 541 the 100th. 7919 is the 1000th, and 1,299,709 is the 100,000th.

1287821 is the biggest palindrome in the first 100,000 prime numbers. 170: That's how many palindromes there are in the first 105 primes. 929 is the biggest prime palindrome under 1000. 13331 is the smallest prime palindrome to have 3 consecutive digits. 0: There are no palindromes with 4 consecutive digits in the first 100,000 primes.

1222229 is the biggest number to have 5 consecutive digits in the first 105 primes.

4260 prime numbers have 3 (or more) consecutive digits. 309 prime numbers have 4 (or more) consecutive digits. 26 of the first 100,000 prime numbers have 5 consecutive digits.

There are 10,250 twin primes in the first 100,000 prime numbers. A twin prime is a prime number that differs from another prime by two. {3,5} is the smallest twin prime. 20.5% of the first 100,000 prime numbers are in a prime twin.

129 is the sum of the first 10 primes sumed up. 24,133 is the sum of the first 100 primes. 3,682,913 of the first 1000 and 62,260,698,721 of the first 100,000.

25,007 of the first 100,000 prime numbers end with the digit 7. That's about 25%. 1 is the most appearing last digit in the first 10'000 and 50'000 primes.

The different thing about primes is that there is no real pattern to their occurrence. They seem to appear randomly, however, we do know that all but

two of them end in either 1, 3, 7, or 9. Other than that though, we have found no distinct pattern. This makes finding primes incredibly hard. If we knew they occurred every 24 number, they would be very easy to find. But, with their random appearances, each number has to be tested individually in order to determine if it is prime or not. Even with computers, this is a very grueling task. Recently, some progress has been made by graphing the prime numbers, but the patterns they form are very difficult to follow, and generating large primes with them would be difficult. Other, smaller patterns can be found in prime numbers, but eventually they terminate.

Some neat prime numbers that have patterns are 12345678901234567891. This is an ascending prime. Another neat pattern in primes is palindromic primes, and these read the same from front to back, such as 111191111, 919191919, and 123494321.

There are several types of primes, however there are two main ones. These are Fermat primes, and Mersenne primes. Each is described as two to the power of a number, and then plus or minus one. The format for Fermat primes is Fn= 2*2^n + 1. One example of this would beFn =2(2^0)+1 Fn =2*1+1 Fn =3

Three is a prime number. Some other Fermat primes are 5 and 17. There is only a limited number of Fermat primes, and not every number put into this equation turns out to be prime.

Mersenne primes can be expressed as Mn = 2n - 1. Here s an example

Mn =2*2-1 Mn =4 -1 Mn =3

Three is a prime number, and it is unique because it is both a Mersenne prime and a Fermat prime. There are many more Mersenne primes, such as 7 and 31.

Many very large prime numbers have been found using these formulas. In fact, the largest known prime (as of May 1st, 2007) is 232582657 -1. Many of the largest primes found are mersenne primes, because they are easy to check for primality with a computer.

As I mentioned, there are many other types of primes, such as twin primes, Wieferich primes, Wall-Sun-Sun primes, Wilson primes, and Wolstenholme primes.

Every time you buy something on the internet with your credit card, you use prime numbers to keep your personal information secure. How? Using RSA cryptography. Read on to learn more. Or, if you are brave, go directly to the challenge problem of decoding the secret message 243689518774052214930089506033... displayed below.

The task of cryptography (Greek kryptos hidden, graphein to write) is to turn messages into gibberish so that they cannot be read by persons other than the intended recipient. If I send my credit card number and PIN over the internet to an online book store, the book store should be able to read it, but no one else should. A cryptographer starts with plain text such as "ATTACKATDAWN" and produces cipher text such as "OTMMGKLHDTIR." This is encryption. Decryption is the reverse: it produces plain text from cipher text.

In the old days, to send a secret message, you first had to send a key secretly to the recipient. This you did by meeting in person, or through a trusted courier. In the example above, the key was "OATMEAL" and the encryption method is due to Vigenre.

But the internet is not a trusted medium, so how do you get started? You can't use the internet to send the key, then your financial information. A bad person could capture your internet traffic with the bookstore. He would use the key to read your credit card number and PIN, and would then charge expensive travel to exotic places to your account.

This dilemma was solved in 1978 by Rivest, Shamir, and Adelman. Their method, now known as RSA, depends on some marvelous properties of prime numbers. One of these is that it is rather easy to generate large prime numbers, but much harder to factor large numbers into primes. Another is Fermat's little theorem.

As an example, to be explained in more detail below, consider the message

243689518774052214930089506033

998596335782879839107051625360 7140448055114932771201027350325, 323915666133187777174633743307 665741495158513587387621667442 84515065903121845841724822236676

It was encrypted using the key

e = 5, N = 519208104502047440191322024032 461128846299254256408973265508 51544998255968235697331455544257

The number N is the product of two primes, just as 39 is the product of the primes 3 and 13. If you could discover the primes, you could decrypt the message. But this is not so easy to do, and this is one of the reasons why RSA works. Primes, factorization, and a a secret message to decode

1234567 = 127*9721

1020030004000050000060000007 = ? * ? * ? ...

public key = 5, 519208104502047440191322024032 461128846299254256408973265508 51544998255968235697331455544257

secret message = 243689518774052214930089506033 998596335782879839107051625360 7140448055114932771201027350325, 323915666133187777174633743307 665741495158513587387621667442 84515065903121845841724822236676 = ???

Challenge problem. What is the secret message? (The reward is the satisfaction of knowing the answer).

How do you know if your answer is correct? Use the algorithm and the table for converting text to numbers and vice versa. If the result makes any sense at all, you almost certainly have the correct answer.

To decode the secret message, you must first factor the number 5192.........4257 of the public key into a product of primes, just as 39 factors as 3 x 13. Then you have to do some standard number theory computations.

The fact that finding primes is "easy," while factoring into primes is "hard" is what makes RSA work.

The government uses prime numbers to hand out stipends, that way there is always a remainder which they can collect back. :)

Prime numbers are also good for encryptions because to hack it a common method is to use addition subtraction, and other elementary operators which can be difficult with prime number codes. Prime numbers are essentially a novelty. What is the use of 6? or 12? or 18? They are just numbers that fall into a certain category. Someone recognized their similarity and it applies to only mathematical models.

Finding Primes

Sieve of Eratosthenes 1 2 14 3 15 19 31 35 47 51 63 67 79 83 95 4 16 20 32 36 48 52 64 68 80 84 96 5 6 7 8 9 10 11 12 13

17 18 30 33 34 46 49 50 62 65 66 78 81 82 94

21

22

23

24

25

26

27 28

29

37

38

39

40

41

42

43 44

45

53

54

55

56

57

58

59 60

61

69

70

71

72

73

74

75 76

77

85

86

87

88

89

90

91 92

93

97 98 110 113 114 126 129 130 142 145 146 158 161 162 174 177 178 190 193 194 206 209 210 222 225 226 238 241 242 254

99 111 115 127 131 143 147 159 163 175 179 191 195 207 211 223 227 239 243 255

100 112 116 128 132 144 148 160 164 176 180 192 196 208 212 224 228 240 244 256

101

102

103

104

105

106

107 108

109

117

118

119

120

121

122

123 124

125

133

134

135

136

137

138

139 140

141

149

150

151

152

153

154

155 156

157

165

166

167

168

169

170

171 172

173

181

182

183

184

185

186

187 188

189

197

198

199

200

201

202

203 204

205

213

214

215

216

217

218

219 220

221

229

230

231

232

233

234

235 236

237

245

246

247

248

249

250

251 252

253

Making a list of prime numbers is surprisingly easy. One of the simplest methods was created by Eratosthenes, a classical Greek mathematician and what we now would call a polymath. Among many accomplishments, Eratosthenes

invented what we now call the "Sieve of Eratosthenes", a way to quickly distinguish prime numbers from composites.

The Sieve works like this:

Create a list of sequential integers, {1 ... n}, n being the largest number of interest. Initially set a position p at the first prime number in the list (the number 2). Strike off all numbers that are multiples of p: {p*2, p*3, ...}. Move p forward in the list to the the nearest number not stricken off in step (3) (this will be a prime number). Until p is greater than or equal to the square root of n, repeat from step (3) above. Done.

The Sieve is trivial to implement in computer code and is useful for creating lists of primes within reasonable values of n. It's important to note that the Sieve is normally represented in a computer as an array of binary bits, consequently increasing the value of n requires more computer memory. Finally the point is reached where the bit array is too large to be practical and other methods must be used.

I want to emphasize that the Sieve example on this page is represented as a square array only for display convenience. The Sieve is in some ways better visualized as a long one-dimensional row of numbers. As it turns out, because of how I've designed the example, the reader can enter particular width and height

numbers to form a long row (or column) and see how this looks. Unfortunately, most of us don't have really, really large computer displays, so most of a long row or column will be invisible at any particular time.

How Many Primes?

Infinity

It is known that there are an infinity of primes, and this has been known since antiquity. In Euclid's Elements, Book IX, proposition 20, Euclid offers a proof similar to this one:

Assume there is a largest prime number. Call this number p. Create new number q, equal to the product of all primes between 2 and p, plus 1. Our new number q has no factors in the original set of primes (between 2 and p), because dividing by any of them would produce a remainder of 1. From point (3) we conclude that q is either itself prime, or is composed of prime factors, all of which are larger than p. Point (4) falsifies point (1).

Even though a proof like the above needs no specific numerical evidence, here is an example with real numbers:

Assume the largest prime is 17: p = 17. Our new number q equals the product of all primes in the set {2 ... p} plus 1: q = 2 * 3 * 5 * 7 * 11 * 13 * 17 + 1 = 510511 According to the factoring calculator at the top of this page, q has these prime factors: 19, 97, 277. All the prime factors of q are larger than p. Point (4) falsifies point (1).

There are a large number of proofs of the infinity of primes, this one happens to be easy to understand.

Prime Number Theorem

While running the Sieve of Eratosthenes example above, the reader may have noticed that, as ordinary numbers get larger, there are fewer prime numbers scattered among them. This observation has been studied a great deal, and has led to the following theorem:

Let (x) be a prime-counting function that gives the number of prime numbers at or below a particular real number x. Let ln(x) denote the natural logarithm of x. The Prime Number Theorem asserts this limit:

(2)

The limit in equation (2), known as the asymptotic law of distribution of prime numbers, can be restated as an approximation:

(3)

In everyday language, the Prime Number Theorem says that, as x approaches infinity, becomes a better and better estimate of the number of primes at or below x.

The Prime Number Theorem is one component in an effort to understand primes and their distribution. The fact is prime numbers are still rather poorly understood, with many important questions yet to be answered. Because of our reliance on primes as shown by the widespread acceptance of RSA cryptography, we can't afford to ignore the inner workings of primes.

Conclusion

Number theory is one of many specialized fields of mathematics that until recently had no obvious practical purpose. As we come to rely more and more on technology, and as our study of nature becomes more sophisticated, more dependent on evidence and less dependent on belief, we are becoming aware that mathematics is an articulate way for us to describe nature, and for nature to describe herself to us. Consider what is revealed by the locust example presented in this article it means that natural processes, and natural selection, can generate prime numbers

as easily as the Sieve of Eratosthenes (and by way of a similar mechanism). Until now, scientists have sometimes been caught off guard by the degree to which nature is described by mathematics, but I think we're coming to realize that mathematics is nature's native tongue, and we all should learn the language nature speaks in.

If we plan to visit France, it can't hurt to learn a few simple tourist phrases, just to get along, n'est-ce pas? The more French we learn, the more we will understand while traveling in France. The same can be said of nature. In my view, to try to understand nature without learning her native tongue seems rude.