
Power Law Size Distributions

Definition

Examples

Wild vs. Mild

CCDFs

Zipf’s law

Zipf ⇔ CCDF

References

Power Law Size Distributions
Principles of Complex Systems

CSYS/MATH 300, Fall, 2011

Prof. Peter Dodds

Department of Mathematics & Statistics
Center for Complex Systems
Vermont Advanced Computing Center
University of Vermont

Licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 License.


Outline

Definition

Examples

Wild vs. Mild

CCDFs

Zipf’s law

Zipf ⇔ CCDF

References


Size distributions

The sizes of many systems’ elements appear to obey an inverse power-law size distribution:

$$P(\text{size} = x) \sim c\, x^{-\gamma}$$

where $0 < x_{\min} < x < x_{\max}$ and $\gamma > 1$.

- Exciting class exercise: sketch this function.
- $x_{\min}$ = lower cutoff
- $x_{\max}$ = upper cutoff
- Negative linear relationship in log-log space:

$$\log_{10} P(x) = \log_{10} c - \gamma \log_{10} x$$

- We use base 10 because we are good people.
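The log-log relationship above can be checked numerically. The following is a minimal sketch, assuming illustrative values for $c$ and $\gamma$ (they are not from the slides) and hypothetical helper names; it measures the slope of $\log_{10} P(x)$ against $\log_{10} x$ between two sizes.

```python
import math

def power_law_density(x, c, gamma):
    """Unnormalized inverse power law P(x) = c * x**(-gamma)."""
    return c * x ** (-gamma)

def loglog_slope(x_a, x_b, c, gamma):
    """Slope of log10 P(x) against log10 x between the sizes x_a and x_b."""
    rise = (math.log10(power_law_density(x_b, c, gamma))
            - math.log10(power_law_density(x_a, c, gamma)))
    run = math.log10(x_b) - math.log10(x_a)
    return rise / run

# Illustrative values only: c = 2 and gamma = 2.5 are assumptions.
slope = loglog_slope(1.0, 1000.0, c=2.0, gamma=2.5)
print(slope)  # should be very close to -gamma
```

The constant $c$ only shifts the line vertically, so the measured slope depends on $\gamma$ alone.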


Size distributions

Usually, only the tail of the distribution obeys a power law:

$$P(x) \sim c\, x^{-\gamma} \quad \text{for } x \text{ large.}$$

- Still use term ‘power law distribution.’
- Other terms:
  - Fat-tailed distributions.
  - Heavy-tailed distributions.

Beware:
- Inverse power laws aren’t the only ones: lognormals, Weibull distributions, ...


Size distributions

Many systems have discrete sizes $k$:
- Word frequency
- Node degree in networks: # friends, # hyperlinks, etc.
- # citations for articles, court decisions, etc.

$$P(k) \sim c\, k^{-\gamma}$$

where $k_{\min} \le k \le k_{\max}$.

- Obvious fail for $k = 0$.
- Again, typically a description of distribution’s tail.


The statistics of surprise: words

Brown Corpus ($\sim 10^6$ words):

rank   word   % q
1.     the    6.8872
2.     of     3.5839
3.     and    2.8401
4.     to     2.5744
5.     a      2.2996
6.     in     2.1010
7.     that   1.0428
8.     is     0.9943
9.     was    0.9661
10.    he     0.9392
11.    for    0.9340
12.    it     0.8623
13.    with   0.7176
14.    as     0.7137
15.    his    0.6886

rank   word        % q
1945.  apply       0.0055
1946.  vital       0.0055
1947.  September   0.0055
1948.  review      0.0055
1949.  wage        0.0055
1950.  motor       0.0055
1951.  fifteen     0.0055
1952.  regarded    0.0055
1953.  draw        0.0055
1954.  wheel       0.0055
1955.  organized   0.0055
1956.  vision      0.0055
1957.  wild        0.0055
1958.  Palmer      0.0055
1959.  intensity   0.0055
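As a rough illustration of how steeply these frequencies fall with rank, the sketch below takes two rows from the table and estimates the exponent $\alpha$ in the rank-size form $q_r \propto r^{-\alpha}$ that these slides introduce later. It is a crude two-point estimate, not a fit: the function name is hypothetical, and the deep row is one of many words tied at 0.0055%, so the choice of rank is somewhat arbitrary.

```python
import math

# Two rows read off the Brown Corpus table: (rank, percentage frequency q).
top = (1, 6.8872)       # "the"
deep = (1945, 0.0055)   # "apply", first of the tied rows shown

def two_point_zipf_exponent(point_a, point_b):
    """Estimate alpha in q_r ∝ r**(-alpha) from two (rank, size) pairs."""
    (r_a, q_a), (r_b, q_b) = point_a, point_b
    return -(math.log10(q_b) - math.log10(q_a)) / (math.log10(r_b) - math.log10(r_a))

alpha = two_point_zipf_exponent(top, deep)
print(round(alpha, 2))
```

The estimate comes out slightly below 1, which is at least in the neighbourhood of the exponent Zipf reported for word frequencies.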


The statistics of surprise—words:

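First, a Gaussian example:

$$P(x)\,\mathrm{d}x = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-(x-\mu)^2 / 2\sigma^2}\,\mathrm{d}x$$

[Figure: two panels showing this Gaussian, one on linear axes ($x$ vs. $P(x)$) and one on log-log axes ($\log_{10} x$ vs. $\log_{10} P(x)$), with mean $\mu = 10$ and variance $\sigma^2 = 1$.]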


The statistics of surprise—words:

Raw ‘probability’ (binned):

[Figure: two panels showing the binned counts $N_q$ against word frequency $q$, one on linear axes ($q$ vs. $N_q$) and one on log-log axes ($\log_{10} q$ vs. $\log_{10} N_q$).]


The statistics of surprise—words:

‘Exceedance probability’:

[Figure: two panels showing $N_{>q}$ against word frequency $q$, one on linear axes ($q$ vs. $N_{>q}$) and one on log-log axes ($\log_{10} q$ vs. $\log_{10} N_{>q}$).]


My, what big words you have...

- Test capitalizes on word frequency following a heavily skewed frequency distribution with a decaying power law tail.
- Let’s do it collectively...


The statistics of surprise:

Gutenberg-Richter law
- Log-log plot
- Base 10
- Slope = -1

$$N(M > m) \propto m^{-1}$$

- From both the very awkwardly similar Christensen et al. and Bak et al.: “Unified scaling law for earthquakes” [3, 1]


The statistics of surprise:

From “Quake Moves Japan Closer to U.S. and Alters Earth’s Spin” by Kenneth Chang, March 13, 2011, NYT:

What is perhaps most surprising about the Japan earthquake is how misleading history can be. In the past 300 years, no earthquake nearly that large—nothing larger than magnitude eight—had struck in the Japan subduction zone. That, in turn, led to assumptions about how large a tsunami might strike the coast.

“It did them a giant disservice,” said Dr. Stein of the geological survey. That is not the first time that the earthquake potential of a fault has been underestimated. Most geophysicists did not think the Sumatra fault could generate a magnitude 9.1 earthquake, . . .


Great:

Two things we have poor cognitive understanding of:
1. Probability
   - Ex. The Monty Hall Problem
   - Ex. Son born on Tuesday.
2. Logarithmic scales.

On counting and logarithms:
- Listen to Radiolab’s “Numbers.”
- Later: Benford’s Law.


[Figure: twelve panels, (a) through (l), with horizontal axes: (a) word frequency, (b) citations, (c) web hits, (d) books sold, (e) telephone calls received, (f) earthquake magnitude, (g) crater diameter in km, (h) peak intensity, (i) intensity, (j) net worth in US dollars, (k) name frequency, (l) population of city.]

FIG. 4 Cumulative distributions or “rank/frequency plots” of twelve quantities reputed to follow power laws. The distributions were computed as described in Appendix A. Data in the shaded regions were excluded from the calculations of the exponents in Table I. Source references for the data are given in the text. (a) Numbers of occurrences of words in the novel Moby Dick by Herman Melville. (b) Numbers of citations to scientific papers published in 1981, from time of publication until June 1997. (c) Numbers of hits on web sites by 60 000 users of the America Online Internet service for the day of 1 December 1997. (d) Numbers of copies of bestselling books sold in the US between 1895 and 1965. (e) Number of calls received by AT&T telephone customers in the US for a single day. (f) Magnitude of earthquakes in California between January 1910 and May 1992. Magnitude is proportional to the logarithm of the maximum amplitude of the earthquake, and hence the distribution obeys a power law even though the horizontal axis is linear. (g) Diameter of craters on the moon. Vertical axis is measured per square kilometre. (h) Peak gamma-ray intensity of solar flares in counts per second, measured from Earth orbit between February 1980 and November 1989. (i) Intensity of wars from 1816 to 1980, measured as battle deaths per 10 000 of the population of the participating countries. (j) Aggregate net worth in dollars of the richest individuals in the US in October 2003. (k) Frequency of occurrence of family names in the US in the year 1990. (l) Populations of US cities in the year 2000.


Size distributions

Examples:
- Earthquake magnitude (Gutenberg-Richter law): [1] $P(M) \propto M^{-2}$
- Number of war deaths: [9] $P(d) \propto d^{-1.8}$
- Sizes of forest fires [4]
- Sizes of cities: [10] $P(n) \propto n^{-2.1}$
- Number of links to and from websites [2]
- See in part Simon [10] and M.E.J. Newman [6] “Power laws, Pareto distributions and Zipf’s law” for more.
- Note: Exponents range in error


Size distributions

Examples:
- Number of citations to papers: [7, 8] $P(k) \propto k^{-3}$.
- Individual wealth (maybe): $P(W) \propto W^{-2}$.
- Distributions of tree trunk diameters: $P(d) \propto d^{-2}$.
- The gravitational force at a random point in the universe: [5] $P(F) \propto F^{-5/2}$.
- Diameter of moon craters: [6] $P(d) \propto d^{-3}$.
- Word frequency: [10] e.g., $P(k) \propto k^{-2.2}$ (variable)


Power law distributions

Gaussians versus power-law distributions:
- Mediocristan versus Extremistan
- Mild versus Wild (Mandelbrot)
- Example: Height versus wealth.
- See “The Black Swan” by Nassim Taleb. [11]


Turkeys...

From “The Black Swan” [11]


Taleb’s table [11]

Mediocristan/Extremistan
- Most typical member is mediocre/Most typical is either giant or tiny
- Winners get a small segment/Winner take almost all effects
- When you observe for a while, you know what’s going on/It takes a very long time to figure out what’s going on
- Prediction is easy/Prediction is hard
- History crawls/History makes jumps
- Tyranny of the collective/Tyranny of the rare and accidental


Size distributions

Power law size distributions are sometimes called Pareto distributions after Italian scholar Vilfredo Pareto.

- Pareto noted wealth in Italy was distributed unevenly (80–20 rule; misleading).
- Term used especially by practitioners of the Dismal Science.


Devilish power law distribution details:

Exhibit A:
- Given $P(x) = c\, x^{-\gamma}$ with $0 < x_{\min} < x < x_{\max}$, the mean is ($\gamma \neq 2$):

$$\langle x \rangle = \frac{c}{2-\gamma}\left( x_{\max}^{2-\gamma} - x_{\min}^{2-\gamma} \right).$$

- Mean ‘blows up’ with upper cutoff if $\gamma < 2$.
- Mean depends on lower cutoff if $\gamma > 2$.
- $\gamma < 2$: Typical sample is large.
- $\gamma > 2$: Typical sample is small.

Insert question from assignment 1
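The closed-form mean in Exhibit A can be compared against a direct numerical integral. The following is a sketch under stated assumptions: the function names are hypothetical, $x_{\min} = 1$ and the two values of $\gamma$ are chosen only for illustration, and $c$ is fixed by requiring $P(x)$ to integrate to 1 between the cutoffs.

```python
import math

def normalization(gamma, x_min, x_max):
    """Constant c making c * x**(-gamma) integrate to 1 on [x_min, x_max] (gamma != 1)."""
    return (1.0 - gamma) / (x_max ** (1.0 - gamma) - x_min ** (1.0 - gamma))

def mean_closed_form(c, gamma, x_min, x_max):
    """Closed-form mean from Exhibit A (gamma != 2)."""
    return c / (2.0 - gamma) * (x_max ** (2.0 - gamma) - x_min ** (2.0 - gamma))

def mean_numeric(c, gamma, x_min, x_max, steps=100_000):
    """Midpoint-rule estimate of the integral of x * c * x**(-gamma).

    Substituting x = exp(u) turns the integrand into c * exp((2 - gamma) * u),
    which stays well behaved across many decades of x.
    """
    u_lo, u_hi = math.log(x_min), math.log(x_max)
    h = (u_hi - u_lo) / steps
    total = 0.0
    for i in range(steps):
        u = u_lo + (i + 0.5) * h
        total += math.exp((2.0 - gamma) * u)
    return c * total * h

# Illustrative cutoffs and exponents (assumptions, not from the slides).
for gamma in (1.5, 2.5):
    for x_max in (1e3, 1e6):
        c = normalization(gamma, 1.0, x_max)
        print(gamma, x_max,
              mean_closed_form(c, gamma, 1.0, x_max),
              mean_numeric(c, gamma, 1.0, x_max))
```

With $\gamma = 1.5$ the mean keeps growing as the upper cutoff is raised, while with $\gamma = 2.5$ it settles near a value set by the lower cutoff, which matches the two regimes listed above.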


And in general...

Moments:
- All moments depend only on cutoffs.
- No internal scale that dominates/matters.
- Compare to a Gaussian, exponential, etc.

For many real size distributions: $2 < \gamma < 3$
- mean is finite (depends on lower cutoff)
- $\sigma^2$ = variance is ‘infinite’ (depends on upper cutoff)
- Width of distribution is ‘infinite’
- If $\gamma > 3$, distribution is less terrifying and may be easily confused with other kinds of distributions.

Insert question from assignment 1
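The dependence on cutoffs can be made explicit by repeating the mean calculation for a general moment. This is a short worked derivation, assuming the same pure power law $P(x) = c\,x^{-\gamma}$ on $x_{\min} < x < x_{\max}$ used for the mean, with $n$ denoting the order of the moment:

```latex
\langle x^n \rangle
  = \int_{x_{\min}}^{x_{\max}} x^n \, c\, x^{-\gamma} \, \mathrm{d}x
  = \frac{c}{n+1-\gamma}
    \left( x_{\max}^{\,n+1-\gamma} - x_{\min}^{\,n+1-\gamma} \right),
  \qquad n + 1 \neq \gamma .
```

Setting $n = 1$ recovers the expression for the mean. For $2 < \gamma < 3$, the exponent $n + 1 - \gamma$ is negative when $n = 1$, so the lower cutoff dominates, and positive when $n = 2$, so the second moment grows with the upper cutoff.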


Moments

Standard deviation is a mathematical convenience:
- Variance is nice analytically...
- Another measure of distribution width:

  Mean average deviation (MAD) = $\langle |x - \langle x \rangle| \rangle$

- For a pure power law with $2 < \gamma < 3$: $\langle |x - \langle x \rangle| \rangle$ is finite.
- But MAD is mildly unpleasant analytically...
- We still speak of infinite ‘width’ if $\gamma < 3$.

Insert question from assignment 2


How sample sizes grow...

Given $P(x) \sim c\, x^{-\gamma}$:
- We can show that after $n$ samples, we expect the largest sample to be

$$x_1 \gtrsim c'\, n^{1/(\gamma-1)}$$

- Sampling from a finite-variance distribution gives a much slower growth with $n$.
- e.g., for $P(x) = \lambda e^{-\lambda x}$, we find

$$x_1 \gtrsim \frac{1}{\lambda} \ln n.$$

Insert question from assignment 2
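One common heuristic for these scalings, which may or may not be the argument intended for the assignment, is to ask for the size $x_1$ at which about one of the $n$ samples is expected to be at least that large, that is $n\,P_{\ge}(x_1) \approx 1$. The sketch below applies this, assuming a pure power law with lower cutoff `x_min` and no upper cutoff, so that $P_{\ge}(x) = (x/x_{\min})^{-(\gamma-1)}$; the function names and parameter values are hypothetical.

```python
import math

def largest_power_law(n, gamma, x_min=1.0):
    """Solve n * (x / x_min)**(-(gamma - 1)) = 1 for x (requires gamma > 1)."""
    return x_min * n ** (1.0 / (gamma - 1.0))

def largest_exponential(n, lam):
    """Solve n * exp(-lam * x) = 1 for x."""
    return math.log(n) / lam

# Illustrative parameters: gamma = 2.5 and lambda = 1 are assumptions.
for n in (10**2, 10**4, 10**6):
    print(n, largest_power_law(n, gamma=2.5), largest_exponential(n, lam=1.0))
```

The power-law column grows as $n^{2/3}$ for this choice of $\gamma$, while the exponential column grows only logarithmically.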


Complementary Cumulative Distribution Function:

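CCDF:

$$
\begin{aligned}
P_{\ge}(x) &= P(x' \ge x) = 1 - P(x' < x) \\
&= \int_{x'=x}^{\infty} P(x')\,\mathrm{d}x' \\
&\propto \int_{x'=x}^{\infty} (x')^{-\gamma}\,\mathrm{d}x' \\
&= \left. \frac{1}{-\gamma+1}\,(x')^{-\gamma+1} \right|_{x'=x}^{\infty} \\
&\propto x^{-\gamma+1}
\end{aligned}
$$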


Complementary Cumulative Distribution Function:

CCDF:

$$P_{\ge}(x) \propto x^{-\gamma+1}$$

- Use when tail of $P$ follows a power law.
- Increases exponent by one.
- Useful in cleaning up data.

[Figure: two log-log panels for the word-frequency data. PDF: $\log_{10} N_q$ against $\log_{10} q$. CCDF: $\log_{10} N_{>q}$ against $\log_{10} q$.]
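Part of why the CCDF helps in cleaning up data is that it can be computed from a sample without choosing bins. The following is a minimal sketch of an empirical CCDF, assuming the sample is a plain list of sizes; the function name and the toy data are hypothetical.

```python
def empirical_ccdf(sizes):
    """Return the distinct sizes in increasing order and, for each one,
    the fraction of samples whose size is at least that large."""
    ordered = sorted(sizes)
    n = len(ordered)
    xs, ccdf = [], []
    for i, x in enumerate(ordered):
        # Record each distinct size at its first position in the sorted list;
        # n - i samples sit at or beyond that position.
        if not xs or x != xs[-1]:
            xs.append(x)
            ccdf.append((n - i) / n)
    return xs, ccdf

# Toy sample, for illustration only.
xs, ccdf = empirical_ccdf([1, 1, 2, 3, 3, 3, 10])
print(list(zip(xs, ccdf)))
```

Plotting `ccdf` against `xs` on log-log axes gives the kind of CCDF panel described above; for a power-law tail the points should fall near a line of slope $-\gamma + 1$.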


Complementary Cumulative Distribution Function:

- Discrete variables:

$$
\begin{aligned}
P_{\ge}(k) &= P(k' \ge k) \\
&= \sum_{k'=k}^{\infty} P(k') \\
&\propto k^{-\gamma+1}
\end{aligned}
$$

- Use integrals to approximate sums.
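How good the integral approximation is can be checked directly. The sketch below compares a truncated tail sum of $k'^{-\gamma}$ with the corresponding integral $\int_k^\infty x^{-\gamma}\,\mathrm{d}x = k^{1-\gamma}/(\gamma-1)$. The function names, the value $\gamma = 2.5$, and the truncation point `k_max` are assumptions made for illustration.

```python
def tail_sum(k, gamma, k_max=200_000):
    """Truncated sum of k'**(-gamma) for k' = k .. k_max."""
    return sum(kp ** (-gamma) for kp in range(k, k_max + 1))

def tail_integral(k, gamma):
    """Integral of x**(-gamma) from k to infinity (requires gamma > 1)."""
    return k ** (1.0 - gamma) / (gamma - 1.0)

gamma = 2.5  # illustrative exponent
for k in (10, 100, 1000):
    print(k, tail_sum(k, gamma) / tail_integral(k, gamma))
```

The ratio moves toward 1 as $k$ increases, so the sum and the integral share the same $k^{-\gamma+1}$ scaling in the tail.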


Zipfian rank-frequency plots

George Kingsley Zipf:
- Noted various rank distributions followed power laws, often with exponent -1 (word frequency, city sizes...)
- Zipf’s 1949 Magnum Opus: “Human Behaviour and the Principle of Least-Effort” [12]
- We’ll study Zipf’s law in depth...


Zipfian rank-frequency plots

Zipf’s way:
- Given a collection of entities, rank them by size, largest to smallest.
- $x_r$ = the size of the $r$th ranked entity.
- $r = 1$ corresponds to the largest size.
- Example: $x_1$ could be the frequency of occurrence of the most common word in a text.
- Zipf’s observation:

$$x_r \propto r^{-\alpha}$$
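The ranking procedure above is simple to sketch for word counts. The example below assumes a tiny made-up text and a hypothetical function name; it counts words, sorts the counts from largest to smallest, and pairs each with its rank $r$.

```python
from collections import Counter

def rank_sizes(items):
    """Return (rank, size) pairs, with rank 1 assigned to the largest count."""
    counts = Counter(items)
    sizes = sorted(counts.values(), reverse=True)
    return list(enumerate(sizes, start=1))

# Made-up text, for illustration only.
text = "the cat and the dog and the bird"
ranked = rank_sizes(text.split())
print(ranked)
```

Tied counts receive consecutive ranks in arbitrary order here, which is one of several reasonable conventions for rank-frequency plots.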


Size distributions

Brown Corpus (1,015,945 words):

[Figure: two log-log panels. CCDF: $\log_{10} N_{>q}$ against $\log_{10} q$. Zipf: $\log_{10} q_i$ against $\log_{10}$ rank $i$.]

- The, of, and, to, a, ... = ‘objects’
- ‘Size’ = word frequency
- Beep: CCDF and Zipf plots are related...


Size distributions

Brown Corpus (1,015,945 words):

[Figure: two log-log panels. CCDF: $\log_{10} N_{>q}$ against $\log_{10} q$. Zipf (axes flipped): $\log_{10}$ rank $i$ against $\log_{10} q_i$.]

- The, of, and, to, a, ... = ‘objects’
- ‘Size’ = word frequency
- Beep: CCDF and Zipf plots are related...


Observe:
- $N P_{\ge}(x)$ = the number of objects with size at least $x$, where $N$ = total number of objects.
- If an object has size $x_r$, then $N P_{\ge}(x_r)$ is its rank $r$.
- So

$$x_r \propto r^{-\alpha} = \left( N P_{\ge}(x_r) \right)^{-\alpha} \propto x_r^{(-\gamma+1)(-\alpha)} \quad \text{since } P_{\ge}(x) \sim x^{-\gamma+1}.$$

We therefore have $1 = (-\gamma+1)(-\alpha)$ or:

$$\alpha = \frac{1}{\gamma - 1}$$

- A rank distribution exponent of $\alpha = 1$ corresponds to a size distribution exponent $\gamma = 2$.
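The relation $\alpha = 1/(\gamma - 1)$ can be checked on idealized data. The sketch below builds noise-free sizes $x_r = r^{-\alpha}$, uses the fact that $N P_{\ge}(x_r)$ equals the rank $r$, and reads $\gamma$ off the log-log slope of rank against size. The function names, the value $\alpha = 0.8$, and the two ranks used for the slope are assumptions chosen for illustration.

```python
import math

def alpha_from_gamma(gamma):
    """Rank exponent implied by a size-distribution exponent gamma > 1."""
    return 1.0 / (gamma - 1.0)

def gamma_from_alpha(alpha):
    """Size-distribution exponent implied by a rank exponent alpha > 0."""
    return 1.0 + 1.0 / alpha

# Idealized Zipf data: x_r = r**(-alpha) for ranks 1..1000 (hypothetical, no noise).
alpha = 0.8
sizes = [r ** (-alpha) for r in range(1, 1001)]

# N * P>=(x_r) is the rank r, so the log-log slope of rank against size
# should equal -(gamma - 1). Measure it between ranks 10 and 1000.
slope = ((math.log10(1000) - math.log10(10))
         / (math.log10(sizes[999]) - math.log10(sizes[9])))
gamma_estimate = 1.0 - slope
print(gamma_estimate, gamma_from_alpha(alpha))
```

On real data the two estimates would differ because of noise and finite-size effects; here they agree because the sizes follow the rank law exactly.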


The Don.

Extreme deviations in test cricket:

[Figure: batting averages shown on an axis running from 0 to 100.]

- Don Bradman’s batting average = 166% of the next best.


References I

[1] P. Bak, K. Christensen, L. Danon, and T. Scanlon. Unified scaling law for earthquakes. Phys. Rev. Lett., 88:178501, 2002.

[2] A.-L. Barabási and R. Albert. Emergence of scaling in random networks. Science, 286:509–511, 1999.

[3] K. Christensen, L. Danon, T. Scanlon, and P. Bak. Unified scaling law for earthquakes. Proc. Natl. Acad. Sci., 99:2509–2513, 2002.

[4] P. Grassberger. Critical behaviour of the Drossel-Schwabl forest fire model. New Journal of Physics, 4:17.1–17.15, 2002.


References II

[5] J. Holtsmark. ? Ann. Phys., 58:577–, 1917.

[6] M. E. J. Newman. Power laws, Pareto distributions and Zipf’s law. Contemporary Physics, pages 323–351, 2005.

[7] D. J. d. S. Price. Networks of scientific papers. Science, 149:510–515, 1965.

[8] D. J. d. S. Price. A general theory of bibliometric and other cumulative advantage processes. J. Amer. Soc. Inform. Sci., 27:292–306, 1976.


References III

[9] L. F. Richardson. Variation of the frequency of fatal quarrels with magnitude. J. Amer. Stat. Assoc., 43:523–546, 1949.

[10] H. A. Simon. On a class of skew distribution functions. Biometrika, 42:425–440, 1955.

[11] N. N. Taleb. The Black Swan. Random House, New York, 2007.

[12] G. K. Zipf. Human Behaviour and the Principle of Least-Effort. Addison-Wesley, Cambridge, MA, 1949.