
Stronger Forms of Zipf's Law BRUCE M. HILL and MICHAEL WOODROOFE*

It is shown that convergence in probability to Zipf's Law follows from a many-family modification of a Bose-Einstein form of the classical occupancy model with a random number of cells.

1. INTRODUCTION AND SUMMARY

In this article it is shown that an extension of the basic Bose-Einstein model proposed by Hill [2] yields convergence in probability to the generic-specific form of Zipf's Law.

By the generic-specific form of Zipf's Law is here meant any system of classification of units such that the proportion of classes with exactly s units is in some specified sense approximately proportional to s^{-(1+α)}, for some constant α > 0. It is familiar in a variety of empirical areas, including linguistics, personal income distributions, the distribution of biological genera and species, and a great many others, that such a relationship holds to a surprisingly good approximation [4, 5, 6, 7]. For concreteness and vividness, the situation will here be described in terms of the distribution of biological genera and species. Lest there be any misunderstanding, we wish to make it clear that we do not mean to imply that our assumptions hold literally for this situation. Rather, they should be viewed as being tentatively entertained, and of interest because they do yield, under a simple probabilistic model, Zipf's Law, a law for which there is substantial evidence in regard to the distribution of biological genera and species [6, 7].

The model proposed in [2] is as follows. Suppose that N species are to be distributed to M nonempty genera. Let Li be the number of species allocated to genus i, and let G(s) be the resulting number of genera with exactly s species. Suppose that the allocation of species to genera is of the Bose-Einstein form

    Pr{L = l | M, N} = [C(N − 1, M − 1)]^{-1}

for all l = (l_1, …, l_M) such that l_i ≥ 1 and Σ_{i=1}^M l_i = N, where C(N − 1, M − 1) denotes the binomial coefficient, i.e., the number of such vectors l. Suppose further that, given N, M has a conditional distribution such that Pr{M/N ≤ x | N} converges properly to a distribution function F(x) with F(0) = 0. Then it was shown in [2] that G(s)/M, the proportion of genera with s species, is, in the limit as N → ∞, distributed like Θ(1 − Θ)^{s−1}, where Θ denotes a random variable having distribution F. If Θ has the beta distribution B(a, b),

* Bruce M. Hill and Michael Woodroofe are professors, both with the Department of Statistics, University of Michigan, Ann Arbor, Mich. 48104.
Journal of the American Statistical Association, March 1975, Volume 70, Number 349, Theory and Methods Section.

i.e., if F has density function

    F′(x) = Γ(a + b)[Γ(a)Γ(b)]^{-1} x^{a−1}(1 − x)^{b−1} ,

where Γ is the gamma function, 0 < x < 1, and a > 0, b > 0, then

    E{Θ(1 − Θ)^{s−1}} ∼ aΓ(a + b)[Γ(b)]^{-1} s^{-(1+a)}

as s → ∞, where the symbol ∼ indicates that the ratio of the two sides tends to unity. In fact, the approximation is generally good even for small s. For example, if Θ has the uniform distribution on the unit interval, then

    E{Θ(1 − Θ)^{s−1}} = [s(s + 1)]^{-1} ,

which is a simple and important form of Zipf's Law fitting approximately a great variety of data [7].
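As a quick numerical check (ours, not part of the original article), both displays above can be verified by direct integration; the sketch below assumes SciPy is available and uses B(a, b) = B(0.5, 1.5) purely for illustration.

```python
# Illustrative check: E{Theta(1-Theta)^(s-1)} equals 1/(s(s+1)) for uniform
# Theta, and behaves like a*Gamma(a+b)/Gamma(b) * s^(-(1+a)) for Theta ~ B(a, b).
from scipy.integrate import quad
from scipy.special import gamma
from scipy.stats import beta

for s in (1, 2, 5, 10):
    exact, _ = quad(lambda t: t * (1 - t) ** (s - 1), 0, 1)   # uniform Theta
    print(s, exact, 1.0 / (s * (s + 1)))                      # the two columns agree

a, b = 0.5, 1.5
for s in (5, 20, 100):
    val, _ = quad(lambda t: t * (1 - t) ** (s - 1) * beta.pdf(t, a, b), 0, 1)
    print(s, val, a * gamma(a + b) / gamma(b) * s ** (-(1 + a)))  # ratio tends to 1
```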

However, under the above model, a realization of G(s)/M would be of very nearly geometric form θ_0(1 − θ_0)^{s−1} for sufficiently large N, where θ_0 could be interpreted as a realization of the random variable Θ. Thus, this model yields a very weak form of Zipf's Law, in which only the expectations are of the appropriate form. Since some data, e.g., the lizards analyzed in [2] and reconsidered here, seem to call for a stronger form of Zipf's Law, namely one in which G(s)/M itself tends to be very nearly proportional to s^{-(1+α)}, it is natural to consider ways of achieving the result that G(s)/M converges in probability to p(s), where p(s) > 0, Σ_{s=1}^∞ p(s) = 1, and p(s) ∼ Cs^{-(1+α)} for some constant C. In this article it is shown that a simple and natural modification of the original model does indeed lead to such a result under quite general conditions. The modification involves a twofold classification of species, first into families, and then into genera within families. Such a modification, proposed in [3], was there shown also to yield the rank-frequency form of Zipf's Law, i.e., if L(r) is the number of species in the rth largest genus, or if L(r) is the population of the rth largest city in a country, then L(r) is approximately proportional to r^{-(1+α)} for some α > 0.

In Section 2 the model is formulated and the convergence of G(s)/M to p(s) is proven. In Section 3 some examples are given which are relevant to the analysis of the reptile data discussed in Section 4.

2. MODEL AND CONVERGENCE

We consider N species which are divided into k (nonempty) families, with N_i species in the ith family.


The N_i species in the ith family are divided into M_i (nonempty) genera according to Bose-Einstein statistics, as in the Introduction. Let G_i(s) be the number of genera in the ith family with exactly s species, let G(s) = G_1(s) + ⋯ + G_k(s), and let M = M_1 + ⋯ + M_k. The proportion of genera with exactly s species is G(s)/M, which is the weighted average

    G(s)/M = Σ_{i=1}^k (M_i/M) [G_i(s)/M_i] .

In effect, Hill [2] considers only a single family, say the first, and shows that G_1(s)/M_1 = Θ_1(1 − Θ_1)^{s−1} + δ_{N_1} as N_1 → ∞, where Θ_1 = M_1/N_1 and δ_{N_1} converges in probability to 0. In the present model G_i(s)/M_i = Θ_i(1 − Θ_i)^{s−1} + δ_{N_i} for each i, and it is reasonable to hope that some form of the law of large numbers will imply that the weighted average G(s)/M converges to a constant p(s). This is, in fact, the case under modest assumptions to be listed shortly. We shall show in Theorem 1 that this holds with

    p(s) = ∫_0^1 t(1 − t)^{s−1} dH(t)

for s ≥ 1, where H is an appropriate distribution function. We shall show in Theorem 2 that p(s) ∼ Cs^{-(1+α)} as s → ∞ for a large class of possible H. Thus, under the present model, individual realizations of G(s)/M may be expected to follow a Zipf's Law, approximately. The lizard data analyzed earlier comprise 21 families, so the present model is more appropriate.
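The convergence asserted here can be illustrated by a small simulation; the sketch below is ours, not part of the paper. It gives every family the same number of genera (so that the weights M_i/M are constant and H coincides with F, as in Example 1 of Section 3), draws Θ_i from an assumed F = B(2, 3), allocates species within each family by a uniform random composition (the Bose-Einstein form), and compares the pooled G(s)/M with p(s). All sizes are illustrative assumptions.

```python
# Illustrative simulation (not the authors' code): k families, each with M_FAM
# genera; Theta_i ~ F = B(2, 3); species within a family allocated by a uniform
# composition of N_i into M_FAM positive parts.  G(s)/M should be close to
# p(s) = integral of t (1-t)^(s-1) dF(t).
import numpy as np
from scipy.integrate import quad
from scipy.stats import beta

rng = np.random.default_rng(0)
k, M_FAM = 2000, 40                     # assumed sizes, illustration only
F = beta(2, 3)                          # assumed limiting distribution of Theta

def be_composition(N, M, rng):
    """A uniformly distributed composition of N into M positive parts."""
    cuts = np.sort(rng.choice(np.arange(1, N), size=M - 1, replace=False))
    return np.diff(np.concatenate(([0], cuts, [N])))

counts, M_total = {}, 0
for theta in F.rvs(size=k, random_state=rng):
    N_i = max(M_FAM, int(round(M_FAM / theta)))   # so that M_FAM/N_i is near theta
    for s in be_composition(N_i, M_FAM, rng):
        counts[s] = counts.get(s, 0) + 1
    M_total += M_FAM

for s in (1, 2, 3, 5, 10):
    p_s, _ = quad(lambda t: t * (1 - t) ** (s - 1) * F.pdf(t), 0, 1)
    print(s, counts.get(s, 0) / M_total, p_s)     # empirical G(s)/M vs p(s)
```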

The proofs of the theorems will be given in terms of triangular arrays of random variables. Those readers uninterested in the mathematics may wish to skip the remainder of this section, and go on to Section 3 where examples are discussed.

We now introduce notation, state assumptions, and prove a series of lemmas leading to the first theorem. We suppose throughout that for each k = 1, 2, …, the bivariate vectors (M_{k1}, N_{k1}), …, (M_{kk}, N_{kk}) form a triangular array, where M_{ki} and N_{ki} are the numbers of genera and species, respectively, in the ith family when the total number of families is k. We shall let k → ∞, and view a particular set of data, e.g., the lizards, as forming a row of the array. If k is sufficiently large for such data, then the limiting behavior as k → ∞ will be useful as a guide in analyzing that data.

Let

    α_{ki} = M_{ki}/E(M_{ki}),   β_{ki} = N_{ki}/E(N_{ki}),   Θ_{ki} = M_{ki}/N_{ki},
    G_k(s) = Σ_{i=1}^k G_{ki}(s),   M_k = Σ_{i=1}^k M_{ki},   μ_k = E(M_{k1}),

where G_{ki}(s) is the number of genera with s species in the ith family when there are k families.

If x_k and y_k are jointly distributed random vectors, for each k = 1, 2, …, then we shall say that x_k and y_k are asymptotically independent as k → ∞ if the following is true: x_k and y_k are stochastically bounded as k → ∞; moreover, if k_i, i ≥ 1, is any subsequence of the integers for which (x_{k_i}, y_{k_i}) converges in distribution to a random vector (x, y), then x and y are independent.

It is an easy consequence of the Helly-Bray Lemma that if x_k and y_k are stochastically bounded, then x_k and y_k are asymptotically independent if and only if

    lim_{k→∞} Cov{g(x_k), h(y_k)} = 0

for all bounded continuous functions g and h. We shall assume:

A1: For each k, given (M_{k1}, N_{k1}), …, (M_{kk}, N_{kk}), the allocation of species to genera within each of the k families is of the Bose-Einstein form, with the allocations within the families mutually independent.

A2: For each k, (M_{k1}, N_{k1}), …, (M_{kk}, N_{kk}) are exchangeable.

A3: The vectors (α_{k1}, Θ_{k1}) and (α_{k2}, Θ_{k2}) are asymptotically independent as k → ∞.

A3′: Cov(α_{k1}, α_{k2}) → 0 as k → ∞. Moreover,

    Cov{α_{k1}Θ_{k1}(1 − Θ_{k1})^{s−1}, α_{k2}Θ_{k2}(1 − Θ_{k2})^{s−1}} → 0

as k → ∞ for every s = 1, 2, …

A4: α_{k1}, k ≥ 1, are uniformly integrable.

A4′: E(α_{k1}²) = o(k) as k → ∞.

A5: μ_k < ∞ for all k, and 1/μ_k → 0 as k → ∞.

We shall say that assumptions A are satisfied if either A1, A2, A3, A4, and A5 are satisfied or A1, A2, A3′, A4′, and A5 are satisfied.

Theorem 1. Let assumptions A be satisfied. Define H_k(t) = E{α_{k1}I_k(t)}, where I_k(t) is the indicator of the event that Θ_{k1} ≤ t, 0 ≤ t ≤ 1. If H_k converges weakly to a distribution function H, then G_k(s)/M_k converges in probability to

    p(s) = ∫_0^1 t(1 − t)^{s−1} dH(t)                                  (2.1)

for s = 1, 2, …. We divide the proof into several lemmas.

Lemma 1. Let x_{k1}, …, x_{kk} be a triangular array of nonnegative random variables which are exchangeable in each row, and define μ_k = E(x_{k1}). Suppose that x_{k1}, k ≥ 1, are uniformly integrable, and that μ = lim μ_k exists as k → ∞. Suppose also that x_{k1} and x_{k2} are asymptotically independent as k → ∞. Then

    x̄_k = (x_{k1} + ⋯ + x_{kk})/k → μ

in probability as k → ∞.

Proof. For each n, let φ_n be a continuous function on [0, ∞) for which φ_n(x) = x for 0 ≤ x ≤ n, φ_n(x) = 0 for x ≥ n + 1, and 0 ≤ φ_n(x) ≤ x for all x. Let y_{ki}^{(n)} = φ_n(x_{ki}) and observe that lim E{x_{k1} − y_{k1}^{(n)}} = 0 uniformly in k as n → ∞, by uniform integrability.

Let ε > 0 and δ > 0 be given. Then there are integers m and k_0 for which E{x_{k1} − y_{k1}^{(m)}} < εδ/8 for all k and


|μ_k − μ| < ε/4 for k ≥ k_0. Thus, for k ≥ k_0,

    Pr{|x̄_k − μ| > ε} ≤ Pr{|x̄_k − ȳ_k^{(m)}| > ε/4} + Pr{|ȳ_k^{(m)} − μ_{mk}| ≥ ε/4} ,

where μ_{mk} = E(y_{k1}^{(m)}) and

    ȳ_k^{(m)} = (y_{k1}^{(m)} + ⋯ + y_{kk}^{(m)})/k .

By Markov's inequality and the choice of m, Pr{|x̄_k − ȳ_k^{(m)}| > ε/4} ≤ 4ε^{-1}E{x̄_k − ȳ_k^{(m)}} < δ/2 for all k. By Chebyshev's inequality,

    Pr{|ȳ_k^{(m)} − μ_{mk}| ≥ ε/4} ≤ (16/kε²) Var(y_{k1}^{(m)}) + (16(k − 1)/kε²) Cov(y_{k1}^{(m)}, y_{k2}^{(m)}) .        (2.2)

The second term on the right side of (2.2) tends to zero as k → ∞ by asymptotic independence. The first is less than or equal to 16(m + 1)μ_k/kε², which tends to zero as k → ∞. Thus, we may make

    Pr{|ȳ_k^{(m)} − μ_{mk}| ≥ ε/4} < δ/2

by taking k sufficiently large.

Lemma 1′. Let x_{k1}, …, x_{kk} be a triangular array of random variables which are exchangeable in each row. If μ = lim E(x_{k1}) exists as k → ∞ and if

    (1/k)E(x_{k1}²) → 0   and   Cov(x_{k1}, x_{k2}) → 0

as k → ∞, then x̄_k → μ in probability as k → ∞.

Proof. We have

    Var(x̄_k) = (1/k) Var(x_{k1}) + ((k − 1)/k) Cov(x_{k1}, x_{k2}) ≤ (1/k)E(x_{k1}²) + |((k − 1)/k) Cov(x_{k1}, x_{k2})| ,

which tends to zero as k → ∞. Lemma 1′ now follows from Chebyshev's inequality.

Let H_k be as in the statement of Theorem 1. If H_k converges weakly to a distribution function H, and if ψ is any continuous function on the unit interval, then

    E{α_{k1}ψ(Θ_{k1})} = ∫_0^1 ψ(t) dH_k(t) → ∫_0^1 ψ(t) dH(t)

as k → ∞, by the Helly-Bray Lemma. (The first equality is immediate if ψ is the indicator function of an interval (a, b], 0 ≤ a < b ≤ 1; and any continuous ψ may be approximated by a linear combination of such indicator functions.) It follows (from either Lemma 1 or 1′) that if Assumptions A are satisfied, then

    ᾱ_k = (α_{k1} + ⋯ + α_{kk})/k → 1

in probability, and

    (1/k) Σ_{i=1}^k α_{ki}Θ_{ki}(1 − Θ_{ki})^{s−1} → p(s)                       (2.3)

in probability as k → ∞ for every s = 1, 2, …

Lemma 2. Let A1 be satisfied. Define

    μ_k(m, n) = E{G_{k1}(s) | M_{k1} = m, N_{k1} = n}

and

    σ_k²(m, n) = Var{G_{k1}(s) | M_{k1} = m, N_{k1} = n} .

There exists a constant C such that

    |μ_k(m, n) − mθ(1 − θ)^{s−1}| ≤ C

and

    σ_k²(m, n) ≤ Cm

for all m, n, where θ = m/n.

Proof. It is straightforward but tedious to employ Equations (3.1) and (3.2) of [2] for μ_k(m, n) and σ_k²(m, n), and perform an induction on s.

We now prove

Lemma 3. Let assumptions A be satisfied. Then

    (1/M_k) [ Σ_{i=1}^k G_{ki}(s) − Σ_{i=1}^k M_{ki}Θ_{ki}(1 − Θ_{ki})^{s−1} ]

converges in probability to 0 as k → ∞.

Proof. We have

    | Σ_{i=1}^k G_{ki}(s) − Σ_{i=1}^k M_{ki}Θ_{ki}(1 − Θ_{ki})^{s−1} |
        ≤ | Σ_{i=1}^k G_{ki}(s) − Σ_{i=1}^k μ_k(M_{ki}, N_{ki}) | + | Σ_{i=1}^k μ_k(M_{ki}, N_{ki}) − Σ_{i=1}^k M_{ki}Θ_{ki}(1 − Θ_{ki})^{s−1} |
        ≤ | Σ_{i=1}^k (G_{ki}(s) − μ_k(M_{ki}, N_{ki})) | + kC

by Lemma 2. Let

    R_k = Σ_{i=1}^k [G_{ki}(s) − μ_k(M_{ki}, N_{ki})] .

Then

    E{R_k²} = E{E(R_k² | M_{k1}, N_{k1}, …, M_{kk}, N_{kk})} = kE{σ_k²(M_{k1}, N_{k1})} ≤ kCE(M_{k1}) = kCμ_k

by Lemma 2 again. By Lemma 1 or 1′, M_k/kμ_k converges in probability to 1 as k → ∞, while by A5, μ_k → ∞. Since

    | Σ_{i=1}^k G_{ki}(s) − Σ_{i=1}^k M_{ki}Θ_{ki}(1 − Θ_{ki})^{s−1} | / M_k ≤ (|R_k| + kC)/M_k ,

it follows that the left side converges in probability to 0 as k → ∞, which completes the proof of Lemma 3.


Since ᾱ_k converges in probability to 1 if Assumptions A are satisfied, Lemma 3 and (2.3) combine to yield that

    G_k(s)/M_k = Σ_{i=1}^k (M_{ki}/M_k)[G_{ki}(s)/M_{ki}] = (1/k) Σ_{i=1}^k α_{ki}Θ_{ki}(1 − Θ_{ki})^{s−1} + o_p(1)

converges in probability to p(s) as k → ∞. This completes the proof of Theorem 1.

In fact, the argument just given yields a slightly stronger conclusion than was claimed in Theorem 1. If Assumptions A are satisfied, then G_k(s)/M_k converges in probability to p(s) for s = 1, 2, …, with p(s) > 0 and p(1) + p(2) + ⋯ = 1, if and only if H_k converges weakly to a distribution function H, in which case (2.1) holds. To establish the "only if" half of this assertion, let H and H′ be any two limit points of the sequence H_k, k ≥ 1, and let k_1, k_2, … and k_1′, k_2′, … be subsequences for which H_{k_i} → H and H_{k_i′} → H′ weakly as i → ∞. Then Theorem 1 applies along both subsequences, so if G_k(s)/M_k → p(s) in probability, s = 1, 2, …, then we must have

    ∫_0^1 t(1 − t)^{s−1} dH(t) = ∫_0^1 t(1 − t)^{s−1} dH′(t)                    (2.4)

for s = 1, 2, …, by (2.1). But (2.4) implies that H = H′, since the moments of a distribution on the unit interval determine it uniquely. Thus, the convergence of G_k(s)/M_k for all s = 1, 2, … implies the weak convergence of H_k to a limit H, as asserted.

The conditions A3 and A4 are not comparable with A3′ and A4′, even if A1, A2, and A5 are satisfied. However, it can be shown that A3 implies A3′ if α_{k1}², k ≥ 1, are uniformly integrable. We believe that both sets of conditions are of interest. A3 and A4 are of a qualitative nature and appear quite reasonable, while A3′ and A4′ are, to some extent, subject to empirical verification.

For the data analysis of Section 4, we find it more natural and convenient to employ the β_{ki} rather than the α_{ki}. As indicated in Example 2 of Section 3, Theorem 1 could just as well have been proved with the assumptions concerning the α_{ki} replaced by the same assumptions for the β_{ki}. In this case, A4′ would become E(β_{k1}²) = o(k) as k → ∞. This assumption can be put into perspective by noting that exchangeability implies

    E(N_{k1}/N) = k^{-1}E(β_{k1}) = k^{-1} .

Consider now the important special case where N is a nonrandom parameter which tends to ∞ as k → ∞. Then for any exchangeable distribution of the vector N_k = (N_{k1}, …, N_{kk}),

    k^{-1}E(β_{k1}²) = kE[(N_{k1}/N)²] ≤ kE[N_{k1}/N] = 1 .

If, for example, N_k has the Bose-Einstein distribution, with all possible realizations of N_k equally likely, then k^{-1}E(β_{k1}²) is approximately 2k^{-1} for large N. If, on the other hand, N_k has a Maxwell-Boltzmann distribution (i.e., the N_{ki} − 1 are multinomial with equal cell probabilities), then k^{-1}E(β_{k1}²) is approximately k^{-1} + N^{-1}. For both of these distributions the assumption

    k^{-1}E(β_{k1}²) → 0

holds. Although the upper bound of 1 for k^{-1}E(β_{k1}²) can be attained, this case seems of little interest. For essentially this upper bound occurs when all N_{ki} are equal to 1, except for one, which then has the value N − k + 1. For the N_{ki} to be exchangeable, each N_{ki} must be equally likely to be the large one. Then k^{-1}E(β_{k1}²) → 1. However, such degenerate realizations of N_k seem highly atypical of our data.
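These two approximations are easy to check by simulation; the sketch below is ours, with illustrative values of k and N, and it estimates k^{-1}E(β_{k1}²) = kE[(N_1/N)²] under both allocation schemes.

```python
# Illustrative Monte Carlo: k^{-1} E(beta_{k1}^2) = k E[(N_1/N)^2] under the
# Bose-Einstein (uniform composition) and Maxwell-Boltzmann (multinomial)
# distributions of (N_1, ..., N_k).  k, N, reps are illustrative choices.
import numpy as np

rng = np.random.default_rng(0)
k, N, reps = 50, 5000, 20000

def first_part_bose_einstein(N, k, rng):
    """N_1 from a uniformly chosen composition of N into k positive parts."""
    cuts = np.sort(rng.choice(np.arange(1, N), size=k - 1, replace=False))
    return cuts[0]                      # first part equals the first cut point

be = np.array([first_part_bose_einstein(N, k, rng) for _ in range(reps)])
mb = 1 + rng.binomial(N - k, 1.0 / k, size=reps)   # N_1 - 1 ~ Binomial(N - k, 1/k)

print("Bose-Einstein:    ", k * np.mean((be / N) ** 2), "vs approx", 2.0 / k)
print("Maxwell-Boltzmann:", k * np.mean((mb / N) ** 2), "vs approx", 1.0 / k + 1.0 / N)
```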

3. EXAMPLES

We have seen that the limiting behavior of G_k(s)/M_k is governed by the distribution H(t) = lim_{k→∞} E{α_{k1}I_k(t)}, where I_k(t) is the indicator function for the event Θ_{k1} ≤ t. Let us also suppose, as in [2], that the sequence of distribution functions F_k(t) = Pr{Θ_{k1} ≤ t} converges weakly to a distribution function F(t). We now consider some examples of H_k(t) and F_k(t) which are of special interest in regard to Zipf's Law.

Example 1. Let α_{k1} and Θ_{k1} be asymptotically independent. Then H_k(t) ≈ E(α_{k1})E{I_k(t)} = F_k(t), so that G_k(s)/M_k converges in probability to ∫_0^1 t(1 − t)^{s−1} dF(t). This example thus leads to convergence to the same limit as in [2], except that now G_k(s)/M_k itself converges to this limit, not merely E{G_k(s)/M_k}.

Example 2. Let β_{ki} = N_{ki}/E(N_{ki}). It is easily seen that the proof of the main theorem could have been obtained with β_{ki} replacing α_{ki} everywhere. In this case

    G_k(s)/M_k = [ Σ_i β_{ki}Θ_{ki}(G_{ki}(s)/M_{ki}) ] / Σ_i β_{ki}Θ_{ki}

converges in probability to

    [ ∫_0^1 t²(1 − t)^{s−1} dH*(t) ] / ∫_0^1 t dH*(t) ,

where H*(t) = lim_k E{β_{k1}I_k(t)}. In particular, if β_{k1} is asymptotically independent of Θ_{k1}, then G_k(s)/M_k converges in probability to

    [ ∫_0^1 t²(1 − t)^{s−1} dF(t) ] / ∫_0^1 t dF(t) .

Note that the difference in the limits for these two examples stems from the fact that in the first example M_{k1}/E(M_{k1}) is asymptotically independent of Θ_{k1}, while in the second example N_{k1}/E(N_{k1}) is. This situation is clarified by Example 3, which generalizes the previous examples, and is the basis for the analysis of the reptiles in Section 4.

Example 3. Let M_{ki} = λ_{ki}Θ_{ki}^γ, where λ_{ki}′ = λ_{ki}/E(λ_{ki}) is asymptotically independent of Θ_{ki}. Then

    H_k(t) = E{M_{k1}I_k(t)}/E{M_{k1}} = E{λ_{k1}′Θ_{k1}^γ I_k(t)}/E{λ_{k1}′Θ_{k1}^γ} ≈ E{Θ_{k1}^γ I_k(t)}/E{Θ_{k1}^γ} → [ ∫_0^t y^γ dF(y) ] / ∫_0^1 y^γ dF(y)

if t is a continuity point of F. In this case, G_k(s)/M_k converges in probability to

    [ ∫_0^1 t^{1+γ}(1 − t)^{s−1} dF(t) ] / ∫_0^1 t^γ dF(t) .

Note that γ = 0 is equivalent to Example 1, while γ = 1 is equivalent to Example 2.

This example can be put into perspective by taking logarithms to yield ln M_{ki} = γ ln Θ_{ki} + ln λ_{ki}. If we further suppose that ln λ_{ki} = constant + ε_{ki}, with ε_{ki} an error term having expectation zero, and that the ε_{ki} are uncorrelated, then Example 3 can be interpreted in terms of a linear regression between ln M_{ki} and ln Θ_{ki}. As will be seen in Section 4, there is substantial evidence in favor of such a regression for the reptiles.
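To make the regression concrete, the least squares estimate of γ takes only a few lines; the sketch below is ours, and its input arrays are a handful of lizard families from Table 1, used only for illustration.

```python
# Sketch: least squares fit of ln M_i = const + gamma * ln Theta_i + error,
# with Theta_i = M_i / N_i.  The arrays are a few lizard families from Table 1.
import numpy as np

M = np.array([49, 2, 2, 6, 30, 48, 4, 7])        # genera per family
N = np.array([270, 6, 4, 8, 193, 308, 14, 44])   # species per family

theta = M / N
X = np.column_stack([np.ones_like(theta), np.log(theta)])
(const, gamma_hat), *_ = np.linalg.lstsq(X, np.log(M), rcond=None)
print("gamma_hat =", gamma_hat)
```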

For later reference we now consider the case where F is a beta distribution, say B(a, b). Then, if a + γ > 0, G_k(s)/M_k converges in probability to

    [ ∫_0^1 t^{1+γ}(1 − t)^{s−1} dF(t) ] / ∫_0^1 t^γ dF(t)
        = Γ(a + γ + 1)Γ(a + b + γ)Γ(s − 1 + b) / [Γ(a + γ)Γ(b)Γ(s + a + b + γ)]
        = (a + γ)(s − 2 + b)(s − 3 + b) × ⋯ × b / [(s + a + b + γ − 1) × ⋯ × (a + b + γ)] .

By Stirling's formula, for large s, G_k(s)/M_k is approximately proportional to s^{-(1+a+γ)}, so that the usual Zipf Laws arise if α = a + γ is such that 0 < α < 1. For the beta distribution it is also worth noting that G_k(1)/M_k converges in probability to (a + γ)/(a + γ + b).
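As a worked special case (our check, not part of the original text), putting s = 1 in the last display with F = B(a, b) gives

    [ ∫_0^1 t^{1+γ} dF(t) ] / ∫_0^1 t^γ dF(t) = B(a + γ + 1, b)/B(a + γ, b) = Γ(a + γ + 1)Γ(a + b + γ) / [Γ(a + γ)Γ(a + b + γ + 1)] = (a + γ)/(a + γ + b),

which is exactly the stated limit of the proportion of singleton genera.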

We close this section with a theorem showing that Zipfian behavior for p(s) follows when H(t) ∼ ct^α as t → 0.

Theorem 2. Let H be a distribution function on [0, 1] and let

    p(s) = ∫_0^1 t(1 − t)^{s−1} dH(t)

for s = 1, 2, …. If there are α > 0 and c > 0 for which H(t) ∼ ct^α as t → 0, then

    p(s) ∼ cα²Γ(α)/s^{1+α}

as s → ∞.

Proof. Integrating by parts, we may write p(s) = p_1(s) − p_2(s), where

    p_1(s) = (s − 1) ∫_0^1 t(1 − t)^{s−2} H(t) dt

and

    p_2(s) = ∫_0^1 (1 − t)^{s−1} H(t) dt .

Thus, it will suffice to show that p_1(s) ∼ cΓ(α + 2)/s^{α+1} and p_2(s) ∼ cΓ(α + 1)/s^{α+1} as s → ∞. We shall prove only the second of these, since the proof of the first is similar.

Now, given c′ > c, there is (by assumption) an ε > 0 for which H(t) ≤ c′t^α for 0 ≤ t ≤ ε. Thus,

    p_2(s) = ∫_0^ε (1 − t)^{s−1} H(t) dt + ∫_ε^1 (1 − t)^{s−1} H(t) dt
           ≤ c′ ∫_0^ε t^α(1 − t)^{s−1} dt + s^{-1}(1 − ε)^s
           ≤ c′ ∫_0^1 t^α(1 − t)^{s−1} dt + O((1 − ε)^s)
           = c′Γ(α + 1)Γ(s)/Γ(α + s + 1) + O((1 − ε)^s)
           ∼ c′Γ(α + 1)/s^{α+1} , as s → ∞,

where we have used Stirling's approximation in the final step. Since c′ > c was arbitrary, it now follows that lim sup_{s→∞} s^{α+1}p_2(s) ≤ cΓ(α + 1); and a similar argument will show that lim inf_{s→∞} s^{α+1}p_2(s) ≥ cΓ(α + 1), to complete the proof.
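Theorem 2 is also easy to check numerically; the sketch below is ours and takes H(t) = t^α on [0, 1] (so c = 1), for which s^{1+α}p(s) should approach α²Γ(α).

```python
# Illustrative check of Theorem 2 with H(t) = t^alpha, i.e. dH(t) = alpha t^(alpha-1) dt.
from scipy.integrate import quad
from scipy.special import gamma

alpha = 0.6
for s in (10, 100, 1000):
    p_s, _ = quad(lambda t: t * (1 - t) ** (s - 1) * alpha * t ** (alpha - 1), 0, 1)
    print(s, s ** (1 + alpha) * p_s, alpha ** 2 * gamma(alpha))   # left column -> right
```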

4. DISCUSSION OF REPTILES

The preceding theory is now applied to families of lizards, snakes, and turtles, using data from Ditmars [1]. The lizards comprise 21 families, the snakes 17, and the turtles 11. Some of these families are in fact subfamilies, and some genera and families are considered more firmly established than others, but for simplicity we shall ignore such distinctions, as well as recent changes in the classification of dubious cases. The numbers of genera, M_i, species, N_i, and proportions Θ_i = M_i/N_i, are given in Table 1, while Table 2 gives G(s) for the 7 largest lizard families, the 3 largest snake families, the largest turtle family, all lizard families, all snake families, all turtle families, and the union of the 11 large families drawn from lizards, snakes, and turtles, where for any such collection of families G(s) is the total number of genera in that collection with s species. These form 7 such collections, and they will be designated as Cases 1 through 7, in the order just given. Table 3 gives certain statistics for each of these 7 cases, namely:

K = number of families in the case,

γ̂ = the least squares estimate of γ under the linear regression model for ln M_i versus ln Θ_i,

(Var γ̂)^{1/2} = the estimated standard error of γ̂,

α̂ = the least squares estimate of α under the linear regression model ln [G(s)/M] = constant − (1 + α) ln s + error, for G(s) ≥ 1,

(Var α̂)^{1/2} = the estimated standard error of α̂,

α̂_1 = α̂ − γ̂,

α̂_2 = −[K^{-1} Σ_{i=1}^K ln Θ_i]^{-1},

Θ̄ = K^{-1} Σ_{i=1}^K Θ_i,

G(1)/M = the proportion of singletons,

[G(1)/M]^ = Σ_{i=1}^K Θ_i^{1+γ̂} / Σ_{i=1}^K Θ_i^{γ̂},

[G(1)/M]~ = (α̂_2 + γ̂)/(α̂_2 + γ̂ + 1), and

K^{-2} Σ_{i=1}^K β̂_i² = Σ_{i=1}^K (N_i/N)².

These statistics are presented primarily for descriptive purposes, since inference based upon such data would be somewhat tenuous.
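For concreteness, the sketch below (ours) computes the main Table 3 entries for one collection from its (M_i, N_i) pairs and its observed G(s). The inputs approximate the seven large lizard families of Case 1, but G(s) is truncated at s ≤ 7, so the regression estimate of α will differ from the full-data value in the table.

```python
# Sketch: Table 3 style statistics for one case, from (M_i, N_i) and G(s).
# Inputs roughly follow Case 1 (seven large lizard families), G(s) truncated.
import numpy as np

M = np.array([49, 30, 48, 35, 14, 17, 26])        # genera per family
N = np.array([270, 193, 308, 102, 67, 96, 371])   # species per family
G = {1: 85, 2: 35, 3: 20, 4: 12, 5: 11, 6: 7, 7: 6}   # G(s), small s only

theta = M / N
K, M_tot = len(M), M.sum()

def ls_slope(x, y):
    """Ordinary least squares slope of y on x (with an intercept)."""
    X = np.column_stack([np.ones_like(x), x])
    return np.linalg.lstsq(X, y, rcond=None)[0][1]

gamma_hat = ls_slope(np.log(theta), np.log(M))
s_vals = np.array(sorted(G))
alpha_hat = -1.0 - ls_slope(np.log(s_vals),
                            np.log([G[s] / M_tot for s in s_vals]))
alpha2_hat = -1.0 / np.mean(np.log(theta))
g1_hat = (theta ** (1 + gamma_hat)).sum() / (theta ** gamma_hat).sum()
g1_tilde = (alpha2_hat + gamma_hat) / (alpha2_hat + gamma_hat + 1)

print(gamma_hat, alpha_hat, alpha2_hat, theta.mean(), G[1] / M_tot,
      g1_hat, g1_tilde, ((N / N.sum()) ** 2).sum())
```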

1. M_i, N_i, and Θ_i = M_i/N_i for the Families

(i = family number; each cell gives M_i, N_i, Θ_i)

 i    Lizards             Snakes              Turtles
 1    49, 270, .18a        3, 103, .03         1,   1, 1
 2     2,   6, .33         2,  28, .07         2,   3, .67
 3     2,   4, .50         7,  21, .33         3,   4, .75
 4     6,   8, .75        13,  47, .28         2,  14, .14
 5    30, 193, .16a        3,   5, .60         1,   1, 1
 6    48, 308, .16a        7,  41, .17        22, 123, .18a
 7     1,   1, 1           1,   1, 1           2,   4, .50
 8     4,  14, .29         5,   5, 1           3,  14, .21
 9     7,  44, .16       129, 698, .18a        7,  24, .29
10     1,   1, 1           9,  23, .39         1,   1, 1
11     2,   3, .67        69, 264, .26a        6,  24, .25
12     1,  27, .04         1,   1, 1
13     3,   7, .43        10,  55, .18
14    35, 102, .34a       29, 138, .21a
15    14,  67, .21a        5,  34, .15
16    17,  96, .18a        9,  43, .21
17     5,  15, .33         4,  68, .06
18    26, 371, .07a
19     3,   6, .50
20     2,   3, .67
21     3,  49, .06

M, N, K    260, 1595, 21    306, 1312, 17    50, 224, 11
M/N             .16              .23              .22

a Indicates families used in the large-family analysis.

2. G(s) for the Seven Cases

(Columns give G(s) for Cases 1 through 7.)

  s    Case 1  Case 2  Case 3  Case 4  Case 5  Case 6  Case 7
  1      85      90       5     103     127      16     180
  2      35      31       4      43      40      12      70
  3      20      25       2      26      30       4      47
  4      12      14       2      13      19       3      28
  5      11      14       2      15      16       2      27
  6       7       7       4       8       9       5      18
  7       6       6       1       6       9       4      13

For s ≥ 8 only the nonzero counts are listed, by s (zero cells omitted):

  8: 4 6 4 8 10;  9: 5 7 5 9 12;  10: 5 2 5 4 7;  11: 3 3 3 4 1 6;  12: 1 1 1 1 2;
  13: 2 3 2 3 5;  14: 1 1 1 1 2;  15: 2 1 1 4 2 3;  16: 3 3 3;  17: 1 2 1 2 3;
  18: 2 1 2 2 3;  19: 2 3 2;  21: 2 2 2 3 4;  22: 1 3 1 5 4;  23: 1 1 1;  24: 1 1 1;
  25: 3 3 3;  26: 1 1 1;  27: 1 1;  28: 2 2 2;  31: 1 1 1;  32: 1 1 1;  33: 1 1 1;
  37: 1 1 1;  40: 1;  41: 1 1 1;  42: 1 1 1;  44: 1;  50: 1 1 1;  64: 1 1 1;  66: 1 1 1;
  97: 1;  106: 1 1 1;  160: 1 1 1

  M     219     227      22     261     297      50     468

On inspection of Table 3 it is seen that for all cases except 6 there is no real evidence that γ̂ differs from 0, and even in Case 6 such evidence is not particularly strong. Thus, the data seem consistent with the hypothesis that ln M_i is uncorrelated with ln Θ_i, i.e., γ = 0, but not with the hypothesis that ln N_i is uncorrelated with ln Θ_i, i.e., γ = 1. If, in fact, γ = 0, then G(s)/M should be approximately equal to E{Θ(1 − Θ)^{s−1}}, where Θ has the limiting distribution F, as in Hill [2]. It may be remarked that the turtles (Cases 3 and 6) are generally anomalous, and that these cases are based on the scantiest data.

The estimate α̂_2 is the maximum likelihood estimate of a using only the Θ_i as data, and assuming that the limiting distribution F is a beta distribution B(a, b) with the parameter b = 1. This distribution for Θ implies that ln(Θ^{-1}) has an exponential distribution with expectation a^{-1}. Although there is no special reason to anticipate such a beta distribution, α̂_2 is presented for its value as a descriptive statistic and for comparison with α̂_1. It would, of course, be interesting to estimate a assuming only that F(x) ∼ Cx^a as x → 0, i.e., without assuming any parametric form for F. However, it appears that the data are too scanty to allow any reliable inference of this type.

3. Various Statistics for the Seven Cases

 Case   K    γ̂    (Var γ̂)^½   α̂    (Var α̂)^½   α̂_1   α̂_2    Θ̄   G(1)/M  [G(1)/M]^  [G(1)/M]~  K^{-2}Σβ̂_i²
  1     7   .03      .46      .19      .08      .16    .56   .18    .39       .19        .37        .19
  2     3                     .22      .09                   .22    .40                              .48
  3     1                    -.63      .27                   .18    .23
  4    21  -.54      .31      .26      .09      .80    .78   .38    .40       .26        .19        .15
  5    17  -.21      .34      .24      .09      .45    .70   .36    .42       .31        .33        .36
  6    11  -.90      .31     -.23      .21      .67   1.19   .54    .32       .35        .22        .33
  7    11   .26      .54      .43      .10      .17    .58   .19    .38       .20        .46        .14

Now consider G(1)/M, the observed proportion of genera with exactly one species. For F of the B(a, 1) form, G(1)/M converges to (a + γ)/(a + γ + 1), so that [G(1)/M]~ = (α̂_2 + γ̂)/(α̂_2 + γ̂ + 1) would be a plausible estimate. Another estimate, which does not require any assumption about the form of F, is [G(1)/M]^, since this estimate employs only γ̂ and the empirical distribution of the Θ_i. We remark that K^{-2} Σ_{i=1}^K β̂_i², where β̂_i = KN_i/N, is an estimate of k^{-1}E(β_{k1}²), and that the values of this estimate are substantially smaller than one, as desired.

There is an interesting discrepancy between the data and predictions based on our theory. For each case, except 6, we have G(1)/M > Θ̄, with Θ̄ in the vicinity of .20 for the large families, and G(1)/M generally between .30 and .40. But under our theory G(1)/M converges to E(Θ^{1+γ})/E(Θ^γ), where Θ has distribution F, and so

    E(Θ^{1+γ})/E(Θ^γ) ≥ E(Θ)

if and only if Cov(Θ, Θ^γ) ≥ 0, i.e., γ ≥ 0. However, three of the γ̂ are negative, and there is no evidence that any γ are positive. To illustrate, if F is B(a, b) then E(Θ) = a/(a + b), while

    lim G(1)/M = (a + γ)/(a + γ + b) .

Thus, if b = 4a, so that E(Θ) = .20, then for γ = −.5 and a = .8, which are not atypical values, we get (a + γ)/(a + γ + b) = .09, which is too small. Of course F need not be any beta distribution, and the discrepancy can be substantially reduced by appropriate choice of F, but we will still have

    E(Θ) ≥ lim G(1)/M

for any F, if γ < 0.

One possible explanation for this discrepancy can be given by slightly modifying the underlying Bose-Einstein allocation model. Thus, consider a large genus, one with say 150 species, that has been formed under the original Bose-Einstein model. This genus can then always be split to form more genera. In particular, with so many species, it seems highly likely that one or more of the species will appear quite extreme in some respect from the majority of the species, and therefore such a species might be split off to form a new genus, which would therefore have exactly one species. In fact several singleton genera might be formed in this way from the same original large genus, since the most extreme species are likely to appear extreme in different respects, and thus not necessarily lumped together, although this too could occur. In this way there will be a tendency to form more genera with exactly one species, and also perhaps with small numbers of species, than the simple Bose-Einstein model would predict, and this is precisely the discrepancy we have noted, i.e., G(1)/M is too large. Furthermore, the reduction of a large genus, say from 150 to 145 species, would be virtually undetectable statistically, since the number of species in a large genus has a huge variance. This argument would suggest that the discrepancy would be greatest in the collections having many large genera, which seems to be the case. This constitutes one, but of course not the only, possible explanation for the discrepancy noted.

We close by remarking that we have presented a conceptually simple model for an exceedingly complex phenomenon, and we believe that even with such discrepancies as have been noted, the fit between data and predictions is remarkably good, given the complexity of the phenomenon in question. The assumptions are rather modest, a priori not implausible as an approximation, and supported by the data. The most fundamental assumption, that underlying all of the theory, is the approximate Bose-Einstein allocation of species to genera within a family. There seems to be no obvious way to test this assumption other than by comparing the predictions that stem from it with the data. If true, it presumably reflects both the nature of the material being classified, and psychological tendencies of the classifiers. Because of its simplicity, it would seem to deserve the support of Occam's Razor, and to stand until something clearly better comes along in the way of a hypothesis. We do not wish to imply that taxonomy consists of purely random allocation of species to genera. But it seems not unlikely a priori that certain aspects of the classification procedure, perhaps due to the fundamental difficulties of the subject, have elements of randomness, and that these may influence the results more than the relatively clear cases.

[Received February 1974.]

REFERENCES

[1] Ditmars, R.L., Reptiles of the World, New York: Sturgis and Walton Company, 1910.

[2] Hill, Bruce M., "Zipf's Law and Prior Distributions for the Composition of a Population," Journal of the American Statistical Association, 65 (September 1970), 1220-32.

[3] Hill, Bruce M., "The Rank-Frequency Form of Zipf's Law," Journal of the American Statistical Association, 69 (December 1974), 1017-26.

[4] Mandelbrot, B., "On the Language of Taxonomy: An Outline of a 'Thermostatistical' Theory of Systems of Categories with Willis (Natural) Structure," in Colin Cherry, ed., Information Theory: Third London Symposium, London: Butterworths, 1956, 135-48.

[5] Simon, H.A., "On a Class of Skew Distribution Functions," Biometrika, 42 (December 1955), 425-40.

[6] Yule, G.U., "A Mathematical Theory of Evolution Based on the Conclusions of Dr. J.C. Willis, F.R.S.," Philosophical Transactions B, 213 (1924), 21-87.

[7] Zipf, G.K., Human Behavior and the Principle of Least Effort, Cambridge, Massachusetts: Addison-Wesley Publishing Co., 1949.
