wikipedia: english, selected articles - magic · pdf fileshape of a hanging cable (the...


Upload: duongdiep

Post on 27-Mar-2018






Page 1: Wikipedia: English, Selected Articles - magic · PDF fileshape of a hanging cable (the so-called catenary). ... and Laplace's equation in Cartesian coordinates. ... is the catenary,

From/en.wikipedia.org 16 January 2011

Selected Articles

These are entries printed using the PDF feature of Wikipedia. The title of each article is listed in bookmarks.

Excerpts

Wikipedia (pronounced /ˌwɪkɨˈpiːdi.ə/ WIK-i-PEE-dee-ə) is a multilingual, web-based, free-content encyclopedia project based on an openly editable model. The name "Wikipedia" is a portmanteau of the words wiki (a technology for creating collaborative websites, from the Hawaiian word wiki, meaning "quick") and encyclopedia. Wikipedia's articles provide links to guide the user to related pages with additional information.

Wikipedia is written collaboratively by largely anonymous Internet volunteers who write without pay. Anyone with Internet access can write and make changes to Wikipedia articles (except in certain cases where editing is restricted to prevent disruption or vandalism). Users can contribute anonymously, under a pseudonym, or with their real identity, if they choose.

People of all ages, cultures and backgrounds can add or edit article prose, references, images and other media here. What is contributed is more important than the expertise or qualifications of the contributor. Whether a contribution remains depends on whether it fits within Wikipedia's policies: it must be verifiable against a published reliable source (which excludes editors' opinions, beliefs and unreviewed research) and free of copyright restrictions and contentious material about living people. Contributions cannot damage Wikipedia because the software allows easy reversal of mistakes and many experienced editors are watching to help and ensure that edits are cumulative improvements.

Wikipedia is a live collaboration differing from paper-based reference sources in important ways. Unlike printed encyclopedias, Wikipedia is continually created and updated, with articles on historic events appearing within minutes, rather than months or years. Older articles tend to grow more comprehensive and balanced; newer articles may contain misinformation, unencyclopedic content, or vandalism. Awareness of this helps readers obtain valid information and avoid recently added misinformation (see Researching with Wikipedia).


Wikipedia: Researching with Wikipedia

Wikipedia can be a great tool for learning and researching information. However, as with all reference works, not everything in Wikipedia is accurate, comprehensive, or unbiased. Many of the general rules of thumb for conducting research apply to Wikipedia, including:

• Always be wary of any one single source (in any medium — web, print, television or radio), or of multiple works that derive from a single source.

• Where articles have references to external sources (whether online or not), read the references and check whether they really do support what the article says.

• In most academic institutions, major references to Wikipedia, along with most encyclopedias, are unacceptable for a research paper. Other encyclopedias, such as Encyclopædia Britannica, have notable authors working for them and may be cited as a secondary source in most cases. For example, Cornell University has a guide on how to cite encyclopedias.

However, because of Wikipedia's unique nature, there are also some rules for conducting research that are special to Wikipedia, and some general rules that do not apply to Wikipedia.

From http://en.wikipedia.org/wiki/Wikipedia:Researching_with_Wikipedia


From mathworld.wolfram.com/HyperbolicFunctions.html 16 January 2011

The hyperbolic functions arise in many problems of mathematics and mathematical physics in which integrals involving $\sqrt{1+x^2}$ arise (whereas the circular functions involve $\sqrt{1-x^2}$). For instance, the hyperbolic sine arises in the gravitational potential of a cylinder and the calculation of the Roche limit. The hyperbolic cosine function describes the shape of a hanging cable (the so-called catenary). The hyperbolic tangent arises in the calculation of and rapidity of special relativity. All three appear in the Schwarzschild metric using external isotropic Kruskal coordinates in general relativity. The hyperbolic secant arises in the profile of a laminar jet. The hyperbolic cotangent arises in the Langevin function for magnetic polarization.

The hyperbolic functions are defined by

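$$\sinh z = \tfrac{1}{2}\left(e^{z} - e^{-z}\right), \qquad \cosh z = \tfrac{1}{2}\left(e^{z} + e^{-z}\right),$$

$$\tanh z = \frac{e^{z} - e^{-z}}{e^{z} + e^{-z}}, \qquad \coth z = \frac{e^{z} + e^{-z}}{e^{z} - e^{-z}},$$

$$\operatorname{sech} z = \frac{2}{e^{z} + e^{-z}}, \qquad \operatorname{csch} z = \frac{2}{e^{z} - e^{-z}}. \qquad (7)\text{-}(16)$$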

For arguments multiplied by $i$,

$$\sinh(iz) = i \sin z \qquad (17)$$

$$\cosh(iz) = \cos z \qquad (18)$$

The hyperbolic functions satisfy many identities analogous to the trigonometric identities (which can be inferred using Osborn's rule) such as

(19)


(20)

(21)

See also Beyer (1987, p. 168).

Some half-angle formulas are

(22)

(23)

where .

Some double-angle formulas are

(24) (25) (26)

Identities for complex arguments include

(27)

(28)

The absolute squares for complex arguments are

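$$|\sinh(x + iy)|^2 = \sinh^2 x + \sin^2 y \qquad (29)$$

$$|\cosh(x + iy)|^2 = \sinh^2 x + \cos^2 y \qquad (30)$$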
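The complex-argument identities and absolute squares in this part of the entry can be spot-checked numerically. This is a minimal sketch under one assumption: that equations (27)-(30) take their standard forms, $\sinh(x+iy) = \sinh x\cos y + i\cosh x\sin y$, $\cosh(x+iy) = \cosh x\cos y + i\sinh x\sin y$, $|\sinh(x+iy)|^2 = \sinh^2 x + \sin^2 y$ and $|\cosh(x+iy)|^2 = \sinh^2 x + \cos^2 y$. The helper name `complex_identity_gaps` is hypothetical. Only Python's standard `cmath` and `math` modules are used.

```python
import cmath
import math


def complex_identity_gaps(x: float, y: float) -> dict:
    """Differences between cmath results and the assumed closed forms.

    Every value should be close to zero (floating-point rounding only).
    """
    z = complex(x, y)
    sinh_closed = complex(math.sinh(x) * math.cos(y), math.cosh(x) * math.sin(y))
    cosh_closed = complex(math.cosh(x) * math.cos(y), math.sinh(x) * math.sin(y))
    return {
        "sinh(x+iy)": abs(cmath.sinh(z) - sinh_closed),
        "cosh(x+iy)": abs(cmath.cosh(z) - cosh_closed),
        "|sinh|^2": abs(abs(cmath.sinh(z)) ** 2 - (math.sinh(x) ** 2 + math.sin(y) ** 2)),
        "|cosh|^2": abs(abs(cmath.cosh(z)) ** 2 - (math.sinh(x) ** 2 + math.cos(y) ** 2)),
    }


if __name__ == "__main__":
    for x, y in [(0.5, 1.2), (-1.3, 2.7), (2.0, -0.4)]:
        print((x, y), max(complex_identity_gaps(x, y).values()))
```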

SEE ALSO: Double-Angle Formulas, Fibonacci Hyperbolic Functions, Half-Angle Formulas, Hyperbolic Cosecant, Hyperbolic Cosine, Hyperbolic Cotangent, Generalized Hyperbolic Functions, Hyperbolic Secant, Hyperbolic Sine, Hyperbolic Tangent, Inverse Hyperbolic Functions, Osborn's Rule

REFERENCES:

Abramowitz, M. and Stegun, I. A. (Eds.). "Hyperbolic Functions." §4.5 in Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables, 9th printing. New York: Dover, pp. 83-86, 1972.


Anderson, J. W. "Trigonometry in the Hyperbolic Plane." §5.7 in Hyperbolic Geometry. New York: Springer-Verlag, pp. 146-151, 1999.

Beyer, W. H. "Hyperbolic Function." CRC Standard Mathematical Tables, 28th ed. Boca Raton, FL: CRC Press, pp. 168-186 and 219, 1987.

Coxeter, H. S. M. and Greitzer, S. L. Geometry Revisited. Washington, DC: Math. Assoc. Amer., pp. 126-131, 1967.

Harris, J. W. and Stocker, H. "Hyperbolic Functions." Handbook of Mathematics and Computational Science. New York: Springer-Verlag, pp. 245-262, 1998.

Jeffrey, A. "Hyperbolic Identities." §2.5 in Handbook of Mathematical Formulas and Integrals, 2nd ed. Orlando, FL: Academic Press, pp. 117-122, 2000.

Yates, R. C. "Hyperbolic Functions." A Handbook on Curves and Their Properties. Ann Arbor, MI: J. W. Edwards, pp. 113-118, 1952.

Zwillinger, D. (Ed.). "Hyperbolic Functions." §6.7 in CRC Standard Mathematical Tables and Formulae. Boca Raton, FL: CRC Press, pp. 476-481, 1995.

CITE THIS AS:

Weisstein, Eric W. "Hyperbolic Functions." From MathWorld--A Wolfram Web Resource. http://mathworld.wolfram.com/HyperbolicFunctions.html


Hyperbolic function 1

Hyperbolic function

A ray through the origin intercepts the hyperbola $x^2 - y^2 = 1$ in the point $(\cosh a, \sinh a)$, where $a$ is twice the area between the ray, the hyperbola, and the $x$-axis. For points on the hyperbola below the $x$-axis, the area is considered negative (see the animated version with comparison with the trigonometric (circular) functions).

In mathematics, hyperbolic functions are analogs of the ordinary trigonometric, or circular, functions. The basic hyperbolic functions are the hyperbolic sine "sinh" (typically pronounced /ˈsɪntʃ/ or English pronunciation: /ˈʃaɪn/) and the hyperbolic cosine "cosh" (typically pronounced /ˈkɒʃ/), from which are derived the hyperbolic tangent "tanh" (typically pronounced /ˈtæntʃ/ or English pronunciation: /ˈθæn/), and so on, corresponding to the derived trigonometric functions. The inverse hyperbolic functions are the area hyperbolic sine "arsinh" (also called "asinh", or sometimes by the misnomer of "arcsinh"[1]) and so on.

Just as the points (cos t, sin t) form a circle with a unit radius, the points (cosh t, sinh t) form the right half of the equilateral hyperbola. Hyperbolic functions occur in the solutions of some important linear differential equations, for example the equation defining a catenary, and Laplace's equation in Cartesian coordinates. The latter is important in many areas of physics, including electromagnetic theory, heat transfer, fluid dynamics, and special relativity.

The hyperbolic functions take real values for a real argument called a hyperbolic angle. In complex analysis, they are simply rational functions of exponentials, and so are meromorphic.

Hyperbolic functions were introduced in the 1760s independently by Vincenzo Riccati and Johann Heinrich Lambert.[2] Riccati used Sc. and Cc. ([co]sinus circulare) to refer to circular functions and Sh. and Ch. ([co]sinus hyperbolico) to refer to hyperbolic functions. Lambert adopted the names but altered the abbreviations to what they are today.[3]


Standard algebraic expressions

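Figures: graphs of sinh, cosh and tanh; graphs of csch, sech and coth.

The hyperbolic functions are:

• Hyperbolic sine: $\sinh x = \dfrac{e^{x} - e^{-x}}{2}$

• Hyperbolic cosine: $\cosh x = \dfrac{e^{x} + e^{-x}}{2}$

• Hyperbolic tangent: $\tanh x = \dfrac{\sinh x}{\cosh x} = \dfrac{e^{x} - e^{-x}}{e^{x} + e^{-x}} = \dfrac{e^{2x} - 1}{e^{2x} + 1}$

• Hyperbolic cotangent: $\coth x = \dfrac{\cosh x}{\sinh x} = \dfrac{e^{x} + e^{-x}}{e^{x} - e^{-x}} = \dfrac{e^{2x} + 1}{e^{2x} - 1}$

• Hyperbolic secant: $\operatorname{sech} x = \dfrac{1}{\cosh x} = \dfrac{2}{e^{x} + e^{-x}}$

• Hyperbolic cosecant: $\operatorname{csch} x = \dfrac{1}{\sinh x} = \dfrac{2}{e^{x} - e^{-x}}$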
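The six exponential definitions translate directly into code. The sketch below is in Python, which is our choice since the source has no code. It evaluates each definition from `math.exp` and compares the results with the standard library's `math.sinh`, `math.cosh` and `math.tanh`. It illustrates the formulas and is not a production implementation. The naive exponential forms overflow for large |x| and lose relative accuracy in `sinh` near 0; library routines avoid both problems.

```python
import math


# Hyperbolic functions written directly from their exponential definitions.
def sinh(x: float) -> float:
    return (math.exp(x) - math.exp(-x)) / 2


def cosh(x: float) -> float:
    return (math.exp(x) + math.exp(-x)) / 2


def tanh(x: float) -> float:
    return sinh(x) / cosh(x)


def coth(x: float) -> float:
    return cosh(x) / sinh(x)  # undefined at x = 0


def sech(x: float) -> float:
    return 1 / cosh(x)


def csch(x: float) -> float:
    return 1 / sinh(x)  # undefined at x = 0


if __name__ == "__main__":
    for x in (-2.0, -0.5, 0.5, 2.0):
        # Differences from the library versions should be at rounding level.
        print(x, sinh(x) - math.sinh(x), cosh(x) - math.cosh(x), tanh(x) - math.tanh(x))
```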


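Hyperbolic functions can be introduced via imaginary circular angles:

• Hyperbolic sine: $\sinh x = -i \sin(ix)$

• Hyperbolic cosine: $\cosh x = \cos(ix)$

• Hyperbolic tangent: $\tanh x = -i \tan(ix)$

• Hyperbolic cotangent: $\coth x = i \cot(ix)$

• Hyperbolic secant: $\operatorname{sech} x = \sec(ix)$

• Hyperbolic cosecant: $\operatorname{csch} x = i \csc(ix)$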

where $i$ is the imaginary unit defined as $i^2 = -1$. The complex forms in the definitions above derive from Euler's formula. Note that, by convention, $\sinh^2 x$ means $(\sinh x)^2$, not $\sinh(\sinh x)$; similarly for the other hyperbolic functions when used with positive exponents. Another notation for the hyperbolic cotangent function is ctnh x, though coth x is far more common.
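The imaginary-angle forms can be checked with Python's `cmath` module. The sketch assumes the standard relations sinh x = −i sin(ix), cosh x = cos(ix), tanh x = −i tan(ix), coth x = i cot(ix), sech x = sec(ix) and csch x = i csc(ix). The function name `imaginary_angle_gaps` is hypothetical.

```python
import cmath


def imaginary_angle_gaps(x: float) -> dict:
    """Absolute gap between each hyperbolic function of real x and its
    imaginary-circular-angle form; every entry should be ~0."""
    ix = 1j * x
    return {
        "sinh": abs(cmath.sinh(x) - (-1j) * cmath.sin(ix)),
        "cosh": abs(cmath.cosh(x) - cmath.cos(ix)),
        "tanh": abs(cmath.tanh(x) - (-1j) * cmath.tan(ix)),
        "coth": abs(1 / cmath.tanh(x) - 1j / cmath.tan(ix)),  # coth x = i cot(ix)
        "sech": abs(1 / cmath.cosh(x) - 1 / cmath.cos(ix)),   # sech x = sec(ix)
        "csch": abs(1 / cmath.sinh(x) - 1j / cmath.sin(ix)),  # csch x = i csc(ix)
    }


if __name__ == "__main__":
    for x in (0.3, 1.7, -2.2):
        print(x, max(imaginary_angle_gaps(x).values()))
```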

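Useful relations

$\sinh(-x) = -\sinh x$

$\cosh(-x) = \cosh x$

Hence:

$\tanh(-x) = -\tanh x$, $\coth(-x) = -\coth x$, $\operatorname{sech}(-x) = \operatorname{sech} x$, $\operatorname{csch}(-x) = -\operatorname{csch} x$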

It can be seen that cosh x and sech x are even functions; the others are odd functions.

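Hyperbolic sine and cosine satisfy the identity

$\cosh^2 x - \sinh^2 x = 1$,

which is similar to the Pythagorean trigonometric identity. One also has

$\tanh^2 x = 1 - \operatorname{sech}^2 x$ and $\coth^2 x = 1 + \operatorname{csch}^2 x$

for the other functions. The hyperbolic tangent is the solution to the nonlinear boundary value problem[4]:

$\tfrac{1}{2} f'' = f^3 - f, \qquad f(0) = f'(\infty) = 0.$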
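The identity cosh² x − sinh² x = 1 and the boundary value problem can both be checked numerically. The sketch assumes the boundary value problem is ½ f″ = f³ − f with f(0) = f′(∞) = 0. This form is consistent with f = tanh, since f″ = −2 tanh x sech² x = 2f³ − 2f. The code approximates f″ with a central second difference and uses the large finite value x = 20 in place of ∞. The helper names are hypothetical.

```python
import math


def identity_residuals(x: float) -> tuple:
    """Residuals of cosh^2 - sinh^2 = 1, tanh^2 = 1 - sech^2 and
    coth^2 = 1 + csch^2 (the last one needs x != 0)."""
    c, s, t = math.cosh(x), math.sinh(x), math.tanh(x)
    return (
        c * c - s * s - 1,
        t * t - (1 - (1 / c) ** 2),
        (c / s) ** 2 - (1 + (1 / s) ** 2),
    )


def bvp_residual(f, x: float, h: float = 1e-4) -> float:
    """Residual of (1/2) f'' - (f^3 - f), with f'' from a central second difference."""
    f2 = (f(x + h) - 2 * f(x) + f(x - h)) / (h * h)
    return 0.5 * f2 - (f(x) ** 3 - f(x))


if __name__ == "__main__":
    print(identity_residuals(1.3))
    print([bvp_residual(math.tanh, x) for x in (-1.0, 0.2, 2.5)])
    # Boundary conditions: f(0) = 0 and f'(x) -> 0 for large x (here x = 20).
    print(math.tanh(0.0), (math.tanh(20.0 + 1e-4) - math.tanh(20.0 - 1e-4)) / 2e-4)
```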


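It can be shown that the area under the curve of cosh x over any interval equals the arc length of the curve over that interval:[5]

$\displaystyle\int_a^b \cosh x \, dx = \int_a^b \sqrt{1 + \sinh^2 x} \, dx = \sinh b - \sinh a.$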
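Here is a quick numerical check of this equality. It uses a plain composite trapezoidal rule, which is our choice; any quadrature would do. The integrand for the arc length of y = cosh x is √(1 + (dy/dx)²) = √(1 + sinh² x). Over [a, b], both integrals equal sinh b − sinh a. The helper names are hypothetical.

```python
import math


def trapezoid(g, a: float, b: float, n: int = 10_000) -> float:
    """Composite trapezoidal rule for the integral of g over [a, b]."""
    h = (b - a) / n
    total = 0.5 * (g(a) + g(b))
    for k in range(1, n):
        total += g(a + k * h)
    return total * h


def area_under_cosh(a: float, b: float) -> float:
    return trapezoid(math.cosh, a, b)


def arc_length_of_cosh(a: float, b: float) -> float:
    # Arc length integrand: sqrt(1 + (d/dx cosh x)^2) = sqrt(1 + sinh^2 x).
    return trapezoid(lambda x: math.sqrt(1 + math.sinh(x) ** 2), a, b)


if __name__ == "__main__":
    a, b = -1.0, 2.0
    print(area_under_cosh(a, b), arc_length_of_cosh(a, b), math.sinh(b) - math.sinh(a))
```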

Inverse functions as logarithms
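The sketch below assumes the standard logarithmic expressions:

- arsinh x = ln(x + √(x² + 1))
- arcosh x = ln(x + √(x² − 1)) for x ≥ 1
- artanh x = ½ ln((1 + x)/(1 − x)) for |x| < 1
- arcoth x = ½ ln((x + 1)/(x − 1)) for |x| > 1

It implements them and compares the results with Python's `math.asinh`, `math.acosh` and `math.atanh`. The direct formulas can lose accuracy; for example, arsinh of a large negative x suffers cancellation in x + √(x² + 1). Library functions are therefore preferable in practice.

```python
import math


def arsinh(x: float) -> float:
    return math.log(x + math.sqrt(x * x + 1))


def arcosh(x: float) -> float:
    if x < 1:
        raise ValueError("arcosh is real only for x >= 1")
    return math.log(x + math.sqrt(x * x - 1))


def artanh(x: float) -> float:
    if not -1 < x < 1:
        raise ValueError("artanh is real only for |x| < 1")
    return 0.5 * math.log((1 + x) / (1 - x))


def arcoth(x: float) -> float:
    if not abs(x) > 1:
        raise ValueError("arcoth is real only for |x| > 1")
    return 0.5 * math.log((x + 1) / (x - 1))


if __name__ == "__main__":
    print(arsinh(0.5) - math.asinh(0.5))
    print(arcosh(2.0) - math.acosh(2.0))
    print(artanh(0.3) - math.atanh(0.3))
    print(arcoth(3.0) - math.atanh(1 / 3.0))  # arcoth x = artanh(1/x)
```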

Derivatives
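The standard derivative rules can be spot-checked with a finite-difference approximation. The rules checked are:

- d/dx sinh x = cosh x
- d/dx cosh x = sinh x
- d/dx tanh x = sech² x
- d/dx sech x = −tanh x sech x
- d/dx arsinh x = 1/√(x² + 1)
- d/dx artanh x = 1/(1 − x²)

The `RULES` table is our own selection of textbook results and is not taken from the source.

```python
import math


def central_diff(f, x: float, h: float = 1e-6) -> float:
    """Symmetric difference quotient, accurate to O(h^2)."""
    return (f(x + h) - f(x - h)) / (2 * h)


# (function, claimed derivative) pairs: standard textbook results.
RULES = {
    "sinh": (math.sinh, math.cosh),
    "cosh": (math.cosh, math.sinh),
    "tanh": (math.tanh, lambda x: 1 / math.cosh(x) ** 2),
    "sech": (lambda x: 1 / math.cosh(x), lambda x: -math.tanh(x) / math.cosh(x)),
    "asinh": (math.asinh, lambda x: 1 / math.sqrt(x * x + 1)),
    "atanh": (math.atanh, lambda x: 1 / (1 - x * x)),  # needs |x| < 1
}


def worst_rule_error(x: float) -> float:
    """Largest gap between a finite-difference derivative and its claimed formula."""
    return max(abs(central_diff(f, x) - df(x)) for f, df in RULES.values())


if __name__ == "__main__":
    for x in (-0.7, 0.4):
        print(x, worst_rule_error(x))
```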


Standard integrals

For a full list of integrals of hyperbolic functions, see list of integrals of hyperbolic functions.

In the above expressions, C is called the constant of integration.

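Taylor series expressions

It is possible to express the above functions as Taylor series:

$\sinh x = x + \dfrac{x^3}{3!} + \dfrac{x^5}{5!} + \cdots = \displaystyle\sum_{n=0}^{\infty} \frac{x^{2n+1}}{(2n+1)!}$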

The function sinh x has a Taylor series expression with only odd exponents for x. Thus it is an odd function, that is, −sinh x = sinh(−x), and sinh 0 = 0.

$\cosh x = 1 + \dfrac{x^2}{2!} + \dfrac{x^4}{4!} + \cdots = \displaystyle\sum_{n=0}^{\infty} \frac{x^{2n}}{(2n)!}$

The function cosh x has a Taylor series expression with only even exponents for x. Thus it is an even function, that is, symmetric with respect to the y-axis. The sum of the sinh and cosh series is the infinite series expression of the exponential function.

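$\tanh x = x - \dfrac{x^3}{3} + \dfrac{2x^5}{15} - \dfrac{17x^7}{315} + \cdots = \displaystyle\sum_{n=1}^{\infty} \frac{2^{2n}(2^{2n}-1)B_{2n}\,x^{2n-1}}{(2n)!}, \quad |x| < \frac{\pi}{2}$

$\coth x = \dfrac{1}{x} + \dfrac{x}{3} - \dfrac{x^3}{45} + \dfrac{2x^5}{945} + \cdots = \dfrac{1}{x} + \displaystyle\sum_{n=1}^{\infty} \frac{2^{2n}B_{2n}\,x^{2n-1}}{(2n)!}, \quad 0 < |x| < \pi$ (Laurent series)

$\operatorname{sech} x = 1 - \dfrac{x^2}{2} + \dfrac{5x^4}{24} - \dfrac{61x^6}{720} + \cdots = \displaystyle\sum_{n=0}^{\infty} \frac{E_{2n}\,x^{2n}}{(2n)!}, \quad |x| < \frac{\pi}{2}$

$\operatorname{csch} x = \dfrac{1}{x} - \dfrac{x}{6} + \dfrac{7x^3}{360} - \dfrac{31x^5}{15120} + \cdots = \dfrac{1}{x} + \displaystyle\sum_{n=1}^{\infty} \frac{2(1-2^{2n-1})B_{2n}\,x^{2n-1}}{(2n)!}, \quad 0 < |x| < \pi$ (Laurent series)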


where $B_n$ is the $n$th Bernoulli number and $E_n$ is the $n$th Euler number.
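The sinh and cosh series converge for every real x, so truncated sums give a simple illustration. The sketch below evaluates partial sums and checks three claims from the text: sinh is odd, cosh is even, and the two series add up to the exponential series. The term count of 20 is an arbitrary choice, ample for moderate |x|.

```python
import math


def sinh_taylor(x: float, terms: int = 20) -> float:
    """Partial sum of sum_{n>=0} x^(2n+1) / (2n+1)!  (odd powers only)."""
    return sum(x ** (2 * n + 1) / math.factorial(2 * n + 1) for n in range(terms))


def cosh_taylor(x: float, terms: int = 20) -> float:
    """Partial sum of sum_{n>=0} x^(2n) / (2n)!  (even powers only)."""
    return sum(x ** (2 * n) / math.factorial(2 * n) for n in range(terms))


if __name__ == "__main__":
    x = 1.5
    print(sinh_taylor(x) - math.sinh(x))
    print(cosh_taylor(x) - math.cosh(x))
    print(sinh_taylor(x) + cosh_taylor(x) - math.exp(x))  # sinh + cosh = exp
```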

Comparison with circular trigonometric functions

Consider these two subsets of the Cartesian plane:

$A = \{(\cosh t, \sinh t) : t \in \mathbb{R}\}, \qquad B = \{(\cos t, \sin t) : t \in \mathbb{R}\}.$

Then A forms the right branch of the hyperbola $\{(x, y) : x^2 - y^2 = 1\}$, while B is the unit circle. Evidently $A \cap B = \{(1, 0)\}$. The primary difference is that the map $t \mapsto (\cos t, \sin t)$ onto B is periodic, while the map $t \mapsto (\cosh t, \sinh t)$ onto A is not.

There is a close analogy of A with B through split-complex numbers in comparison with ordinary complex numbers and their circle group. In particular, the maps $t \mapsto (\cosh t, \sinh t)$ and $t \mapsto (\cos t, \sin t)$ are the exponential map in each case. They are both instances of one-parameter groups in Lie theory, where all groups evolve out of the identity.

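For contrast, in the terminology of topological groups, B forms a compact group while A is non-compact since it is unbounded.

The hyperbolic functions satisfy many identities, all of them similar in form to the trigonometric identities. In fact, Osborn's rule[6] states that one can convert any trigonometric identity into a hyperbolic identity by expanding it completely in terms of integral powers of sines and cosines, changing sine to sinh and cosine to cosh, and switching the sign of every term which contains a product of 2, 6, 10, 14, ... sinhs. This yields, for example, the addition theorems

$\sinh(x + y) = \sinh x \cosh y + \cosh x \sinh y$

$\cosh(x + y) = \cosh x \cosh y + \sinh x \sinh y$

$\tanh(x + y) = \dfrac{\tanh x + \tanh y}{1 + \tanh x \tanh y}$

the "double angle formulas"

$\sinh 2x = 2 \sinh x \cosh x$

$\cosh 2x = \cosh^2 x + \sinh^2 x = 2\sinh^2 x + 1 = 2\cosh^2 x - 1$

$\tanh 2x = \dfrac{2 \tanh x}{1 + \tanh^2 x}$

and the "half-angle formulas"[7]

$\sinh^2 \dfrac{x}{2} = \dfrac{\cosh x - 1}{2}$. Note: This is equivalent to its circular counterpart multiplied by −1.

$\cosh^2 \dfrac{x}{2} = \dfrac{\cosh x + 1}{2}$. Note: This corresponds to its circular counterpart.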
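Osborn's rule can be illustrated on the cosine addition formula. In cos(x + y) = cos x cos y − sin x sin y, the last term contains a product of two sines, so its sign flips and the hyperbolic version is cosh(x + y) = cosh x cosh y + sinh x sinh y. The sketch below checks this identity and the other standard addition, double-angle and half-angle identities numerically at arbitrary sample points. It only tests the resulting identities; it does not implement the rule symbolically.

```python
import math


def osborn_identity_gaps(x: float, y: float) -> dict:
    """Left side minus right side for standard hyperbolic identities; all ~0."""
    sh, ch, th = math.sinh, math.cosh, math.tanh
    return {
        "sinh(x+y)": sh(x + y) - (sh(x) * ch(y) + ch(x) * sh(y)),
        "cosh(x+y)": ch(x + y) - (ch(x) * ch(y) + sh(x) * sh(y)),
        "tanh(x+y)": th(x + y) - (th(x) + th(y)) / (1 + th(x) * th(y)),
        "sinh(2x)": sh(2 * x) - 2 * sh(x) * ch(x),
        "cosh(2x)": ch(2 * x) - (ch(x) ** 2 + sh(x) ** 2),
        "tanh(2x)": th(2 * x) - 2 * th(x) / (1 + th(x) ** 2),
        "sinh^2(x/2)": sh(x / 2) ** 2 - (ch(x) - 1) / 2,
        "cosh^2(x/2)": ch(x / 2) ** 2 - (ch(x) + 1) / 2,
    }


if __name__ == "__main__":
    for name, gap in osborn_identity_gaps(0.7, -1.3).items():
        print(f"{name:12s} {gap:+.2e}")
```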

The derivative of sinh x is cosh x and the derivative of cosh x is sinh x; this is similar to trigonometric functions, albeit the sign is different (i.e., the derivative of cos x is −sin x).

The Gudermannian function gives a direct relationship between the circular functions and the hyperbolic ones that does not involve complex numbers.

The graph of the function a cosh(x/a) is the catenary, the curve formed by a uniform flexible chain hanging freely under gravity.
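Two facts from this paragraph lend themselves to a short sketch. The Gudermannian function is not defined here, so the code assumes the standard closed form gd x = arctan(sinh x). Under that form, sin(gd x) = tanh x, cos(gd x) = sech x and tan(gd x) = sinh x, all real-valued relations with no complex numbers involved. The catenary helper simply evaluates y = a cosh(x/a). The names `catenary` and `catenary_sag` are hypothetical.

```python
import math


def gudermannian(x: float) -> float:
    """gd(x) = arctan(sinh x); equivalently 2 * arctan(tanh(x / 2))."""
    return math.atan(math.sinh(x))


def catenary(x: float, a: float) -> float:
    """Height of the catenary y = a * cosh(x / a) (lowest point at x = 0, y = a)."""
    return a * math.cosh(x / a)


def catenary_sag(half_span: float, a: float) -> float:
    """Rise of the chain at x = +/- half_span above its lowest point."""
    return catenary(half_span, a) - a


if __name__ == "__main__":
    x = 1.2
    g = gudermannian(x)
    print(math.sin(g) - math.tanh(x), math.cos(g) - 1 / math.cosh(x), math.tan(g) - math.sinh(x))
    print(catenary_sag(5.0, 10.0))
```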


Relationship to the exponential function

From the definitions of the hyperbolic sine and cosine, we can derive the following identities:

$e^{x} = \cosh x + \sinh x$

and

$e^{-x} = \cosh x - \sinh x$

These expressions are analogous to the expressions for sine and cosine, based on Euler's formula, as sums of complex exponentials.

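Hyperbolic functions for complex numbers

Since the exponential function can be defined for any complex argument, we can extend the definitions of the hyperbolic functions also to complex arguments. The functions sinh z and cosh z are then holomorphic.

Relationships to ordinary trigonometric functions are given by Euler's formula for complex numbers:

$e^{ix} = \cos x + i \sin x, \qquad e^{-ix} = \cos x - i \sin x$

so:

$\cosh(ix) = \tfrac{1}{2}\left(e^{ix} + e^{-ix}\right) = \cos x, \qquad \sinh(ix) = \tfrac{1}{2}\left(e^{ix} - e^{-ix}\right) = i \sin x, \qquad \tanh(ix) = i \tan x$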

Thus, hyperbolic functions are periodic with respect to the imaginary component, with period $2\pi i$ ($\pi i$ for hyperbolic tangent and cotangent).
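The periodicity claim can be checked with Python's `cmath` module. Shifting the argument by 2πi leaves sinh and cosh unchanged, and a shift of only πi already leaves tanh unchanged. A shift of πi flips the sign of sinh and cosh, which is why their period is 2πi rather than πi. The sample points and the helper name `period_gaps` are arbitrary.

```python
import cmath


def period_gaps(z: complex) -> dict:
    """Absolute differences that should vanish if the stated periods hold."""
    pi_i = 1j * cmath.pi
    return {
        "sinh, 2*pi*i": abs(cmath.sinh(z + 2 * pi_i) - cmath.sinh(z)),
        "cosh, 2*pi*i": abs(cmath.cosh(z + 2 * pi_i) - cmath.cosh(z)),
        "tanh, pi*i": abs(cmath.tanh(z + pi_i) - cmath.tanh(z)),
        # Half a period changes the sign of sinh and cosh.
        "sinh, pi*i (sign flip)": abs(cmath.sinh(z + pi_i) + cmath.sinh(z)),
        "cosh, pi*i (sign flip)": abs(cmath.cosh(z + pi_i) + cmath.cosh(z)),
    }


if __name__ == "__main__":
    print(period_gaps(0.8 + 0.3j))
```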

Hyperbolic functions in the complex plane


References

[1] Some examples of using arcsinh (http://www.google.com/books?q=arcsinh+-library) found in Google Books.

[2] Robert E. Bradley, Lawrence A. D'Antonio, Charles Edward Sandifer. Euler at 300: an appreciation. Mathematical Association of America, 2007. Page 100.

[3] Georg F. Becker. Hyperbolic functions. Read Books, 1931. Page xlviii.

[4] Eric W. Weisstein. "Hyperbolic Tangent" (http://mathworld.wolfram.com/HyperbolicTangent.html). MathWorld. Retrieved 2008-10-20.

[5] Bali, N. P. (2005). Golden Integral Calculus (http://books.google.com/books?id=hfi2bn2Ly4cC). Firewall Media. p. 472. ISBN 8-170-08169-6. Extract of page 472 (http://books.google.com/books?id=hfi2bn2Ly4cC&pg=PA472).

[6] G. Osborn. Mnemonic for hyperbolic formulae (http://links.jstor.org/sici?sici=0025-5572(190207)2:2:34<189:1MFHF>2.0.CO;2-Z). The Mathematical Gazette, p. 189, volume 2, issue 34, July 1902.

[7] Peterson, John Charles (2003). Technical mathematics with calculus (http://books.google.com/books?id=PGuSDjHvircC) (3rd ed.). Cengage Learning. p. 1155. ISBN 0-766-86189-9. Chapter 26, page 1155 (http://books.google.com/books?id=PGuSDjHvircC&pg=PA1155).

External links

• Hyperbolic functions (http://planetmath.org/encyclopedia/HyperbolicFunctions.html) on PlanetMath

• Hyperbolic functions (http://mathworld.wolfram.com/HyperbolicFunctions.html) entry at MathWorld

• GonioLab (http://glab.trixon.se/): Visualization of the unit circle, trigonometric and hyperbolic functions (Java Web Start)

• Web-based calculator of hyperbolic functions (http://www.calctool.org/CALC/math/trigonometry/hyperbolic)


Article Sources and Contributors

Hyperbolic function Source: http://en.wikipedia.org/w/index.php?oldid=408009678 Contributors: Aaron Rotenberg, Access Denied, Agro r, AlanUS, Albmont, Alle, Allispaul, Andy FisheraKa Shine, Aushulz, AxelBoldt, Azo bob, Barak Sh, Bcherkas, Ben pcc, BenFrantzDale, Bh3u4m, Bhadani, Brighterorange, Calvinballing, Can't sleep, clown will eat me, Canderson7, Carlodn6, Charles Matthews, Chris-gore, Chzz, CiaPan, Coasterlover1994, Cojoco, Compotatoj, Cottonmouse, Courcelles, Ctifumdope, Culero101, Cyp, DVdm, Dcqec111, Derepi, Deskana, Doctormatt, Dr Greg, Dragnmn, Drilnoth, Eettttt, Egmontaz, Email4mobile, Eric119, Error792, Evil saltine, Fanix1128, Faramir1138, Footyfanatic3000, Fredrik, Gabbe, Gauge, Gco, GeiwTeol, Gene Ward Smith, Georgia guy, Gesslein, Giftlite, Googl, Guardian of Light, Hashar, Hdt83, Heliomance, HowiAuckland, Husond, Icairns, Jackfork, Jeandavid54, Jleedev, JoeHillen, Josh Grosse, Justin W Smith, KarlJorgensen, KennethJ, Kieff, Kiensvay, KnowledgeOfSelf, Kritikool99, Ktims, Kwamikagami, LOL, Lambiam, Lee, Linas, Lzur, Macrakis, Maksim-e, MarSch, MarkSweep, Meand, Mebden, Metacomet, Michael Hardy, Michael.YX.Wu, Michiganfan9000, Mleconte, Montparnasse, Moverington, Moxfyre, Nascar1996, Nbarth, Ninly, Noerdosnum, Nonagonal Spider, Oleg Alexandrov, PAR, Patchouli, Pikalax, Pne, Prophile, Rgdboer, Richnfg, Roberdin, Rocky2009, SKvalen, Salih, Sam Derbyshire, SebastianHelm, Seraph85, Sergiacid, Sharonlees, Shen, Sillyrabbit, Smack, Snevets, Societelibre, Sohale, Suisui, Sverdrup, Swetrix, TV4Fun, TakuyaMurata, The Anome, TheBendster, Tim bates, Tob, Tobias Bergemann, Tsemii, Urhixidur, V1adis1av, Vinograd19, Wereon, Whiner01, Wiccan Quagga, WojciechSwiderski, Wwoods, Xanthoxyl, Zvika, 295 anonymous edits

Image Sources, Licenses and Contributors

Image:Hyperbolic functions-2.svg Source: http://en.wikipedia.org/w/index.php?title=File:Hyperbolic_functions-2.svg License: Public Domain Contributors: User:Jeandavid54

image:sinh cosh tanh.svg Source: http://en.wikipedia.org/w/index.php?title=File:Sinh_cosh_tanh.svg License: Public Domain Contributors: Original uploader was Ktims at en.wikipedia

image:csch sech coth.svg Source: http://en.wikipedia.org/w/index.php?title=File:Csch_sech_coth.svg License: Public Domain Contributors: Original uploader was Ktims at en.wikipedia

Image:Complex Sinh.jpg Source: http://en.wikipedia.org/w/index.php?title=File:Complex_Sinh.jpg License: Public Domain Contributors: User:Jan Homann

Image:Complex Cosh.jpg Source: http://en.wikipedia.org/w/index.php?title=File:Complex_Cosh.jpg License: Public Domain Contributors: User:Jan Homann

Image:Complex Tanh.jpg Source: http://en.wikipedia.org/w/index.php?title=File:Complex_Tanh.jpg License: Public Domain Contributors: User:Jan Homann

Image:Complex Coth.jpg Source: http://en.wikipedia.org/w/index.php?title=File:Complex_Coth.jpg License: Public Domain Contributors: User:Jan Homann

Image:Complex Sech.jpg Source: http://en.wikipedia.org/w/index.php?title=File:Complex_Sech.jpg License: Public Domain Contributors: User:Jan Homann

Image:Complex Csch.jpg Source: http://en.wikipedia.org/w/index.php?title=File:Complex_Csch.jpg License: Public Domain Contributors: User:Jan Homann

License

Creative Commons Attribution-Share Alike 3.0 Unported
http://creativecommons.org/licenses/by-sa/3.0/


Hyperbolic Functions

The hyperbolic functions $\sinh z$, $\cosh z$, $\tanh z$, $\operatorname{csch} z$, $\operatorname{sech} z$, $\coth z$ (hyperbolic sine, hyperbolic cosine, hyperbolic tangent, hyperbolic cosecant, hyperbolic secant, and hyperbolic cotangent) are analogs of the circular functions, defined by removing the $i$s appearing in the complex exponentials. For example,

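$$\cos z = \tfrac{1}{2}\left(e^{iz} + e^{-iz}\right), \qquad (1)$$

so

$$\cosh z = \tfrac{1}{2}\left(e^{z} + e^{-z}\right). \qquad (2)$$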

Note that alternate notations are sometimes used, as summarized in the following table.

alternate notations

(Gradshteyn and Ryzhik 2000, p. xxvii)

(Gradshteyn and Ryzhik 2000, p. xxvii)

(Gradshteyn and Ryzhik 2000, p. xxvii)

(Gradshteyn and Ryzhik 2000, p. xxvii)

The hyperbolic functions share many properties with the corresponding circular functions. In fact, just as the circle can be represented parametrically by

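$$x = \cos t \qquad (3)$$

$$y = \sin t, \qquad (4)$$

a rectangular hyperbola (or, more specifically, its right branch) can be analogously represented by

$$x = \cosh t \qquad (5)$$

$$y = \sinh t, \qquad (6)$$

where $\cosh t$ is the hyperbolic cosine and $\sinh t$ is the hyperbolic sine.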



Logistic distribution 1

Logistic distribution

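Logistic

Figures: probability density function; cumulative distribution function.

parameters: $\mu$ location (real); $s > 0$ scale (real)

support: $x \in (-\infty, \infty)$

pdf: $\dfrac{e^{-(x-\mu)/s}}{s\left(1 + e^{-(x-\mu)/s}\right)^2}$

cdf: $\dfrac{1}{1 + e^{-(x-\mu)/s}}$

mean: $\mu$

median: $\mu$

mode: $\mu$

variance: $\dfrac{\pi^2 s^2}{3}$

skewness: $0$

ex.kurtosis: $6/5$

entropy: $\ln s + 2$

mgf: $e^{\mu t}\,\mathrm{B}(1 - st,\ 1 + st)$ for $|st| < 1$, where B is the Beta function

cf: $e^{i\mu t}\,\dfrac{\pi s t}{\sinh(\pi s t)}$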

In probability theory and statistics, the logistic distribution is a continuous probability distribution. Its cumulative distribution function is the logistic function, which appears in logistic regression and feedforward neural networks. It resembles the normal distribution in shape but has heavier tails (higher kurtosis).


Specification

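Cumulative distribution function

The logistic distribution receives its name from its cumulative distribution function (cdf), which is an instance of the family of logistic functions:

$F(x; \mu, s) = \dfrac{1}{1 + e^{-(x-\mu)/s}} = \dfrac{1}{2} + \dfrac{1}{2}\tanh\!\left(\dfrac{x-\mu}{2s}\right)$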

In this equation, x is the random variable, μ is the mean, and s is a scale parameter proportional to the standard deviation (σ = πs/√3).

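Probability density function

The probability density function (pdf) of the logistic distribution is given by:

$f(x; \mu, s) = \dfrac{e^{-(x-\mu)/s}}{s\left(1 + e^{-(x-\mu)/s}\right)^2} = \dfrac{1}{4s}\operatorname{sech}^2\!\left(\dfrac{x-\mu}{2s}\right)$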

Because the pdf can be expressed in terms of the square of the hyperbolic secant function "sech", it is sometimes referred to as the sech-square(d) distribution.

See also: hyperbolic secant distribution
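The cdf and the two equivalent forms of the pdf take only a few lines of code. This Python sketch is ours, not the source's; the function names and the default parameters (μ = 0, s = 1) are assumptions. Because the density is symmetric about μ, the exponential form of the pdf is evaluated with |z|, which avoids overflow for large |x − μ|. For the same reason, the cdf switches between two algebraically equal forms.

```python
import math


def logistic_cdf(x: float, mu: float = 0.0, s: float = 1.0) -> float:
    """F(x; mu, s) = 1 / (1 + exp(-(x - mu) / s))."""
    z = (x - mu) / s
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # same value, rewritten so exp never overflows
    return e / (1.0 + e)


def logistic_pdf(x: float, mu: float = 0.0, s: float = 1.0) -> float:
    """f = exp(-z) / (s * (1 + exp(-z))^2) with z = (x - mu) / s."""
    z = abs((x - mu) / s)  # the density is symmetric in z
    e = math.exp(-z)
    return e / (s * (1.0 + e) ** 2)


def logistic_pdf_sech2(x: float, mu: float = 0.0, s: float = 1.0) -> float:
    """Equivalent form f = sech^2((x - mu) / (2 s)) / (4 s); cosh overflows for huge |x - mu| / s."""
    return 1.0 / (4.0 * s * math.cosh((x - mu) / (2.0 * s)) ** 2)


if __name__ == "__main__":
    for x in (-3.0, 0.0, 1.0, 4.0):
        print(x, logistic_cdf(x, 1.0, 0.5), logistic_pdf(x, 1.0, 0.5), logistic_pdf_sech2(x, 1.0, 0.5))
```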

Quantile function

The inverse cumulative distribution function of the logistic distribution is $F^{-1}$, a generalization of the logit function, defined as follows:

$F^{-1}(p; \mu, s) = \mu + s \ln\!\left(\dfrac{p}{1-p}\right)$
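Because the quantile function F⁻¹(p; μ, s) = μ + s ln(p/(1 − p)) has a closed form, logistic variates can be drawn by inverse-transform sampling: uniform draws are passed through F⁻¹. The sketch below uses Python's `random.Random` as the uniform source. The function names and the seed handling are our assumptions, as is the rejection of the endpoint u = 0, where the logarithm diverges.

```python
import math
import random


def logistic_quantile(p: float, mu: float = 0.0, s: float = 1.0) -> float:
    """F^{-1}(p; mu, s) = mu + s * ln(p / (1 - p)), for 0 < p < 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return mu + s * math.log(p / (1.0 - p))


def sample_logistic(n: int, mu: float = 0.0, s: float = 1.0, seed=None) -> list:
    """Inverse-transform sampling: push uniform draws through the quantile function."""
    rng = random.Random(seed)
    out = []
    while len(out) < n:
        u = rng.random()  # uniform on [0, 1)
        if u > 0.0:       # reject u = 0, where the quantile diverges
            out.append(logistic_quantile(u, mu, s))
    return out


if __name__ == "__main__":
    draws = sample_logistic(20000, mu=2.0, s=1.0, seed=0)
    print(sum(draws) / len(draws))  # should be close to mu = 2.0
```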

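Alternative parameterization

An alternative parameterization of the logistic distribution can be derived using the substitution $\sigma^2 = \pi^2 s^2 / 3$. This yields the following density function:

$g(x; \mu, \sigma) = f\!\left(x; \mu, \dfrac{\sigma\sqrt{3}}{\pi}\right) = \dfrac{\pi}{4\sigma\sqrt{3}}\operatorname{sech}^2\!\left(\dfrac{\pi}{2\sqrt{3}} \cdot \dfrac{x-\mu}{\sigma}\right)$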


Applications

The logistic distribution and the S-shaped pattern that results from it have been extensively used in many different areas, including:

Cumulative logistic distribution fitted to October rainfalls using CumFreq[1]

• Biology – to describe how species populations grow in competition[2]

• Epidemiology – to describe the spreading of epidemics[3]

• Psychology – to describe learning[4]

• Technology – to describe how new technologies diffuse and substitute for each other[5]

• Market – the diffusion of new-product sales[6]

• Energy – the diffusion and substitution of primary energy sources[7], as in the Hubbert curve

• Hydrology – In hydrology, the distribution of long-duration river discharge and rainfall (e.g. monthly and yearly totals, consisting of the sum of 30 or 360 daily values, respectively) is often thought to be almost normal according to the central limit theorem.[8] The normal distribution, however, has no closed-form cumulative distribution function and needs a numeric approximation. Because the logistic distribution, whose cdf can be written analytically, is similar to the normal distribution, it can be used instead. The figure illustrates an example of fitting the logistic distribution to ranked October rainfalls, which are almost normally distributed, and shows the 90% confidence belt based on the binomial distribution. The rainfall data are represented by plotting positions as part of the cumulative frequency analysis.

• Physics – The cdf of this distribution (more precisely its complement, 1 − F, taken as a function of energy) describes a Fermi gas, and more specifically the number of electrons within a metal that can be expected to occupy a given quantum state. Its range is between 0 and 1, reflecting the Pauli exclusion principle. The value is given as a function of the kinetic energy corresponding to that state and is parametrized by the Fermi energy and the temperature (and Boltzmann constant). By changing the sign in front of the "1" in the denominator, one goes from Fermi–Dirac statistics to Bose–Einstein statistics. In this case, the expected number of particles (bosons) in a given state can exceed unity, which is indeed the case for systems such as lasers.
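The fitting step in the hydrology item can be sketched as follows. CumFreq's own procedure is not described here, so this Python sketch swaps in a simple technique: rank the data, assign plotting positions i/(n + 1) (one common convention, assumed here), and fit x = μ + s·logit(p) by ordinary least squares, since the logistic quantile function is a straight line in logit(p). The function names and the synthetic data are hypothetical, and the 90% confidence belt is not reproduced.

```python
import math


def plotting_positions(n):
    """Plotting positions i / (n + 1) for ranks i = 1..n (Weibull convention, assumed)."""
    return [i / (n + 1) for i in range(1, n + 1)]


def fit_logistic(values):
    """Estimate (mu, s) by least squares of the sorted values on the logit of their plotting positions."""
    xs = sorted(values)
    n = len(xs)
    ys = [math.log(p / (1.0 - p)) for p in plotting_positions(n)]  # linearized quantiles
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((y - y_mean) * (x - x_mean) for x, y in zip(xs, ys))
    syy = sum((y - y_mean) ** 2 for y in ys)
    s = sxy / syy                 # slope of x against logit(p)
    mu = x_mean - s * y_mean      # intercept
    return mu, s


# Synthetic totals placed exactly on a logistic quantile curve (mu = 80, s = 15),
# so the fit should recover those parameters; real data would scatter around the line.
ps = plotting_positions(30)
synthetic = [80.0 + 15.0 * math.log(p / (1.0 - p)) for p in reversed(ps)]
mu_hat, s_hat = fit_logistic(synthetic)
print(round(mu_hat, 6), round(s_hat, 6))
```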

Both the United States Chess Federation and FIDE have switched their formulas for calculating chess ratings from the normal distribution to the logistic distribution; see Elo rating system.
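The link between Elo ratings and the logistic distribution can be made concrete. This Python sketch uses the commonly quoted Elo expected-score formula with base 10 and a 400-point scale; those constants are the usual convention and are an assumption here, since the text does not give the exact formulas used by the USCF or FIDE. It shows that the expected score equals the logistic CDF of the rating difference with scale s = 400/ln 10.

```python
import math


def elo_expected_score(rating_a, rating_b):
    """Expected score of player A against player B under the logistic Elo model."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


def logistic_cdf(x, mu=0.0, s=1.0):
    return 1.0 / (1.0 + math.exp(-(x - mu) / s))


ELO_SCALE = 400.0 / math.log(10.0)   # 10**(d / 400) == exp(d / ELO_SCALE)

print(elo_expected_score(1800, 1600))                  # A is rated 200 points higher
print(logistic_cdf(1800 - 1600, mu=0.0, s=ELO_SCALE))  # same value via the logistic CDF
```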

Related distributions

If log(X) has a logistic distribution, then X has a log-logistic distribution and X − a has a shifted log-logistic distribution.

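Derivations

Expected Value

Writing the density as $f(x;\mu,s) = \dfrac{e^{-(x-\mu)/s}}{s\left(1+e^{-(x-\mu)/s}\right)^2}$, the expected value is

$$\operatorname{E}[X] = \int_{-\infty}^{\infty} x\,\frac{e^{-(x-\mu)/s}}{s\left(1+e^{-(x-\mu)/s}\right)^2}\,dx.$$

Substitute $u = (x-\mu)/s$, so that $x = \mu + s\,u$ and $dx = s\,du$:

$$\operatorname{E}[X] = \mu\int_{-\infty}^{\infty}\frac{e^{-u}}{(1+e^{-u})^2}\,du + s\int_{-\infty}^{\infty}\frac{u\,e^{-u}}{(1+e^{-u})^2}\,du.$$

The first integral equals 1, because its integrand is the standard logistic density. Note the odd function: since $e^{-u}/(1+e^{-u})^2 = e^{u}/(1+e^{u})^2$ is even, the integrand $u\,e^{-u}/(1+e^{-u})^2$ of the second integral is odd, so that integral vanishes and

$$\operatorname{E}[X] = \mu.$$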

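Higher order moments

The n-th order central moment can be expressed in terms of the quantile function:

$$\operatorname{E}\!\left[(X-\mu)^n\right] = \int_0^1 \left(Q(p)-\mu\right)^n dp = s^n \int_0^1 \left[\ln\!\left(\frac{p}{1-p}\right)\right]^n dp.$$

This integral is well-known[9] and can be expressed in terms of Bernoulli numbers:

$$\operatorname{E}\!\left[(X-\mu)^n\right] = \pi^n s^n \left(2^n - 2\right)\left|B_n\right|.$$

For n = 2 this gives the variance π²s²/3.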
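Both the expected value and the Bernoulli-number formula for the central moments can be checked numerically. In this Python sketch (helper names hypothetical), Bernoulli numbers are generated exactly with the standard recurrence Σ_{k=0}^{m} C(m+1, k) B_k = 0, and each moment integral is evaluated by the trapezoid rule after the substitution p = 1/(1 + e^{−u}), which turns ∫₀¹ [ln(p/(1 − p))]ⁿ dp into ∫ uⁿ e^{−u}/(1 + e^{−u})² du, an integrand that decays exponentially. For n = 1 both sides are 0, which is the statement E[X] = μ.

```python
import math
from fractions import Fraction


def bernoulli_numbers(n_max):
    """Exact B_0..B_n_max from sum_{k=0}^{m} C(m+1, k) B_k = 0 (convention B_1 = -1/2)."""
    b = [Fraction(1)]
    for m in range(1, n_max + 1):
        acc = sum(math.comb(m + 1, k) * b[k] for k in range(m))
        b.append(-acc / (m + 1))
    return b


def central_moment_formula(n, s=1.0):
    """E[(X - mu)^n] = pi^n s^n (2^n - 2) |B_n|."""
    b_n = bernoulli_numbers(n)[n]
    return (math.pi * s) ** n * (2 ** n - 2) * abs(float(b_n))


def central_moment_numeric(n, s=1.0, half_width=80.0, steps=40_000):
    """Trapezoid rule for s^n times the integral of u^n against the standard logistic density."""
    h = 2.0 * half_width / steps
    total = 0.0
    for i in range(steps + 1):
        u = -half_width + i * h
        density = 1.0 / (4.0 * math.cosh(u / 2.0) ** 2)   # equals e^{-u} / (1 + e^{-u})^2
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * u ** n * density
    return s ** n * total * h


for n in range(1, 7):
    print(n, central_moment_formula(n, s=2.0), central_moment_numeric(n, s=2.0))
```

Exact fractions are used for the Bernoulli numbers so that the odd ones beyond B₁ come out as exactly zero rather than as rounding noise.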

See also

• Generalized logistic distribution
• Logistic regression
• Sigmoid function

Notes

[1] "Cumfreq, a free computer program for cumulative frequency analysis" (http://www.waterlog.info/cumfreq.htm).
[2] P. F. Verhulst, "Recherches mathématiques sur la loi d'accroissement de la population", Nouveaux Mémoires de l'Académie Royale des Sciences et des Belles-Lettres de Bruxelles, vol. 18 (1845); Alfred J. Lotka, Elements of Physical Biology (Baltimore, MD: Williams & Wilkins Co., 1925).
[3] Theodore Modis, Predictions: Society's Telltale Signature Reveals the Past and Forecasts the Future, Simon & Schuster, New York, 1992, pp. 97–105.
[4] Theodore Modis, Predictions: Society's Telltale Signature Reveals the Past and Forecasts the Future, Simon & Schuster, New York, 1992, Chapter 2.
[5] J. C. Fisher and R. H. Pry, "A Simple Substitution Model of Technological Change", Technological Forecasting & Social Change, vol. 3, no. 1 (1971).
[6] Theodore Modis, Conquering Uncertainty, McGraw-Hill, New York, 1998, Chapter 1.
[7] Cesare Marchetti, "Primary Energy Substitution Models: On the Interaction between Energy and Society", Technological Forecasting & Social Change, vol. 10 (1977).
[8] Ritzema, H.P. (ed.) (1994). Frequency and Regression Analysis (http://www.waterlog.info/pdf/freqtxt.pdf). Chapter 6 in: Drainage Principles and Applications, Publication 16, International Institute for Land Reclamation and Improvement (ILRI), Wageningen, The Netherlands. pp. 175–224. ISBN 90-70754-33-9.
[9] On-Line Encyclopedia of Integer Sequences, sequence A001896 (http://www.research.att.com/~njas/sequences/A001896).


References

• Balakrishnan, N. (1992). Handbook of the Logistic Distribution. Marcel Dekker, New York. ISBN 0-8247-8587-8.
• Johnson, N. L., Kotz, S., Balakrishnan, N. (1995). Continuous Univariate Distributions. Vol. 2 (2nd ed.). ISBN 0-471-58494-0.


Article Sources and Contributors

Logistic distribution Source: http://en.wikipedia.org/w/index.php?oldid=402718993 Contributors: Abdel Hameed Nawar, Ahoerstemeier, Army1987, Asitgoes, Betacommand, Br77rino, Bubba73, Carbonate, Casey56, Cazort, ChrisCork, Coachaxis, Dicklyon, Draco flavus, Eric Kvaalen, Fnielsen, Giftlite, Home Row Keysplurge, Hongooi, Iwaterpolo, Leonard G., LokiClock, MarkSweep, Melcombe, Michael Hardy, Mwbaxter, PAR, Quicksilvre, Qwfp, Radon210, Rlendog, Shoessss, SimonP, Stpasha, Tmh, Tomi, Trevor.tombe, 44 anonymous edits

Image Sources, Licenses and Contributors

Image:Logisticpdfunction.png Source: http://en.wikipedia.org/w/index.php?title=File:Logisticpdfunction.png License: GNU Free Documentation License Contributors: Anarkman, Pfctdayelise, RandomP, WikipediaMaster
Image:Logistic cdf.png Source: http://en.wikipedia.org/w/index.php?title=File:Logistic_cdf.png License: Creative Commons Attribution-Sharealike 2.5 Contributors: Anarkman
File:FitLogisticdistr.tif Source: http://en.wikipedia.org/w/index.php?title=File:FitLogisticdistr.tif License: Creative Commons Attribution-Sharealike 3.0 Contributors: User:Buenas días

License

Creative Commons Attribution-Share Alike 3.0 Unported http://creativecommons.org/licenses/by-sa/3.0/


Logistic function

For the recurrence relation, see logistic map.

Standard logistic sigmoid function

A logistic function or logistic curve is a common sigmoid curve, given its name in 1844 or 1845 by Pierre-François Verhulst, who studied it in relation to population growth. It can model the "S-shaped" curve (abbreviated S-curve) of growth of some population P. The initial stage of growth is approximately exponential; then, as saturation begins, the growth slows, and at maturity, growth stops.

A simple logistic function may be defined by the formula

$$P(t) = \frac{1}{1+e^{-t}}$$

where the variable P might be considered to denote a population and the variable t might be thought of as time.[1] For values of t in the range of real numbers from −∞ to +∞, the S-curve shown is obtained. In practice, due to the nature of the exponential function $e^{-t}$, it is sufficient to compute t over a small range of real numbers such as [−6, +6]; outside this interval P(t) is within 0.0025 of 0 or 1.

The logistic function finds applications in a range of fields, including artificial neural networks, biology, biomathematics, demography, economics, chemistry, mathematical psychology, probability, sociology, political science, and statistics. It has an easily calculated derivative:

$$\frac{dP}{dt} = P(t)\,\bigl(1 - P(t)\bigr).$$

It also has the property that

$$1 - P(t) = P(-t).$$

In other words, the function P − 1/2 is odd.

Logistic differential equation

The logistic function is the solution of the simple first-order non-linear differential equation

$$\frac{dP}{dt} = P\,(1 - P)$$

where P is a function of time t, with boundary condition P(0) = 1/2. This equation is the continuous version of the logistic map.

The qualitative behavior is easily understood in terms of the phase line: the derivative is 0 at P = 0 and P = 1, positive for P between 0 and 1, and negative for P above 1 or less than 0 (though negative populations do not generally accord with a physical model). This yields an unstable equilibrium at 0 and a stable equilibrium at 1, and thus for any value of P greater than 0 and less than 1, P grows to 1.

One may readily find the (symbolic) solution to be

$$P(t) = \frac{e^{t}}{e^{t} + e^{c}}.$$


Choosing the constant of integration $e^{c} = 1$ gives the other well-known form of the definition of the logistic curve

$$P(t) = \frac{e^{t}}{e^{t}+1} = \frac{1}{1+e^{-t}}.$$
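The claim that the logistic curve solves dP/dt = P(1 − P) with P(0) = 1/2 can be checked by integrating the equation numerically. This Python sketch uses the classical fourth-order Runge–Kutta method (a standard choice, not one the text prescribes) and compares the result with the closed form 1/(1 + e^{−t}); it also starts one run above the stable equilibrium to illustrate the phase-line argument. Function names are hypothetical.

```python
import math


def logistic(t):
    return 1.0 / (1.0 + math.exp(-t))


def rhs(p):
    """Right-hand side of the logistic differential equation dP/dt = P(1 - P)."""
    return p * (1.0 - p)


def rk4(p0, t_end, h=0.01):
    """Integrate dP/dt = P(1 - P) from t = 0 to t_end with classical RK4 steps of size h."""
    p = p0
    for _ in range(round(t_end / h)):
        k1 = rhs(p)
        k2 = rhs(p + 0.5 * h * k1)
        k3 = rhs(p + 0.5 * h * k2)
        k4 = rhs(p + h * k3)
        p += h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return p


print(rk4(0.5, 6.0), logistic(6.0))   # numerical vs closed-form value at t = 6
print(rk4(1.5, 20.0))                 # starting above 1, P decays toward the stable equilibrium 1
```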

More quantitatively, as can be seen from the analytical solution, the logistic curve shows early exponential growth for negative t, which slows to linear growth of slope 1/4 near t = 0, then approaches P = 1 with an exponentially decaying gap.

The logistic function is the inverse of the natural logit function and so can be used to convert the logarithm of odds into a probability; the conversion from the log-likelihood ratio of two alternatives also takes the form of a logistic curve.

The logistic sigmoid function is related to the hyperbolic tangent by

$$2P(t) = 1 + \tanh\!\left(\frac{t}{2}\right).$$
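The identities of the logistic function are easy to spot-check in floating point. This Python sketch (names hypothetical) evaluates the logistic function in a numerically stable way, branching on the sign of t so that math.exp never receives a large positive argument, and then checks the derivative P(1 − P) by a central difference, the symmetry 1 − P(t) = P(−t), the inverse relationship with the logit, and the hyperbolic tangent identity.

```python
import math


def logistic(t):
    """Standard logistic function, written to avoid overflow for large |t|."""
    if t >= 0.0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)            # t < 0, so e lies in (0, 1)
    return e / (1.0 + e)


def logit(p):
    """Log-odds; the inverse of the logistic function for 0 < p < 1."""
    return math.log(p / (1.0 - p))


t, h = 0.7, 1e-5
p = logistic(t)
numeric_derivative = (logistic(t + h) - logistic(t - h)) / (2.0 * h)

print(numeric_derivative, p * (1.0 - p))     # dP/dt = P(1 - P)
print(1.0 - p, logistic(-t))                 # 1 - P(t) = P(-t)
print(logit(p))                              # recovers t = 0.7
print(2.0 * p, 1.0 + math.tanh(t / 2.0))     # 2P(t) = 1 + tanh(t/2)
print(logistic(-1000.0), logistic(1000.0))   # no OverflowError at the extremes
```

The naive form 1/(1 + exp(−t)) raises OverflowError in Python once −t exceeds roughly 709, which is why the sketch branches on the sign of t.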

In ecology: modeling population growth

Pierre-François Verhulst (1804–1849)

A typical application of the logistic equation is a common model of population growth, originally due to Pierre-François Verhulst in 1838, where the rate of reproduction is proportional to both the existing population and the amount of available resources, all else being equal. The Verhulst equation was published after Verhulst had read Thomas Malthus' An Essay on the Principle of Population. Verhulst derived his logistic equation to describe the self-limiting growth of a biological population. The equation is also sometimes called the Verhulst-Pearl equation following its rediscovery in 1920. Alfred J. Lotka derived the equation again in 1925, calling it the law of population growth.

Letting P represent population size (N is often used in ecology instead) and t represent time, this model is formalized by the differential equation:

$$\frac{dP}{dt} = rP\left(1 - \frac{P}{K}\right)$$

where the constant r defines the growth rate and K is the carrying capacity.

In the equation, the early, unimpeded growth rate is modeled by the first term +rP. The value of the rate r represents the proportional increase of the population P in one unit of time. Later, as the population grows, the second term, which multiplied out is −rP²/K, grows in magnitude toward the first as some members of the population P interfere with each other by competing for some critical resource, such as food or living space. This antagonistic effect is called the bottleneck, and is modeled by the value of the parameter K. The competition diminishes the combined growth rate, until the value of P ceases to grow (this is called maturity of the population).

Dividing both sides of the equation by K gives

$$\frac{d}{dt}\left(\frac{P}{K}\right) = r\,\frac{P}{K}\left(1 - \frac{P}{K}\right).$$


Now setting $x = P/K$ gives the differential equation

$$\frac{dx}{dt} = r\,x\,(1 - x).$$

For $r = 1$ we have the particular case with which we started.

In ecology, species are sometimes referred to as r-strategist or K-strategist depending upon the selective processes that have shaped their life history strategies. The solution to the equation (with $P_0$ being the initial population) is

$$P(t) = \frac{K P_0 e^{rt}}{K + P_0\left(e^{rt} - 1\right)}$$

so that

$$\lim_{t\to\infty} P(t) = K.$$

That is, K is the limiting value of P: the highest value that the population can reach given infinite time (or come close to reaching in finite time). It is important to stress that the carrying capacity is asymptotically reached independently of the initial value P(0) > 0, even when P(0) > K.
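The closed-form Verhulst solution can be evaluated directly. This Python sketch (function name and parameter values hypothetical) checks by a central difference that the formula satisfies dP/dt = rP(1 − P/K), and shows the population approaching the carrying capacity K both from below and from above, as stated for P(0) > K.

```python
import math


def verhulst(t, r, K, P0):
    """Closed-form solution of dP/dt = r P (1 - P/K) with P(0) = P0."""
    growth = math.exp(r * t)
    return K * P0 * growth / (K + P0 * (growth - 1.0))


r, K = 0.5, 1000.0

# Check the differential equation at t = 3 for P0 = 10 using a central difference.
t, h = 3.0, 1e-5
lhs = (verhulst(t + h, r, K, 10.0) - verhulst(t - h, r, K, 10.0)) / (2.0 * h)
p = verhulst(t, r, K, 10.0)
rhs = r * p * (1.0 - p / K)
print(lhs, rhs)

# Long-run behaviour from below (P0 = 10) and from above (P0 = 2500).
for P0 in (10.0, 2500.0):
    print(P0, [round(verhulst(tt, r, K, P0), 2) for tt in (0, 5, 10, 20, 40)])
```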

Time-varying carrying capacity

Since environmental conditions influence the carrying capacity, it can be time-varying, K(t) > 0, leading to the following mathematical model:

$$\frac{dP}{dt} = rP\left(1 - \frac{P}{K(t)}\right)$$

A particularly important case is that of carrying capacity that varies periodically with period T:

$$K(t + T) = K(t).$$

It can be shown that in such a case, independently of the initial value P(0) > 0, P(t) will tend to a unique periodic solution P*(t), whose period is T. A typical value of T is one year: in such a case K(t) reflects periodic variations of weather conditions.

Another interesting generalization is to consider that the carrying capacity K(t) is a function of the population at an earlier time, capturing a delay in the way the population modifies its environment. This leads to a logistic delay equation,[2] which has a very rich behavior, with bistability in some parameter range, as well as a monotonic decay to zero, smooth exponential growth, punctuated unlimited growth (i.e., multiple S-shapes), punctuated growth or alternation to a stationary level, oscillatory approach to a stationary level, sustainable oscillations, finite-time singularities as well as finite-time death.
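The convergence to a unique periodic solution can be illustrated numerically. In this Python sketch, the carrying capacity K(t) = 1 + 0.5 sin(2πt/T), the growth rate r = 1, the period T = 1 and the two starting values are all hypothetical choices; the equation is integrated with classical Runge–Kutta steps, and the population is sampled once per period. Trajectories started far below and far above K end up on the same T-periodic orbit.

```python
import math


def carrying_capacity(t, T=1.0):
    """Hypothetical periodic carrying capacity with period T."""
    return 1.0 + 0.5 * math.sin(2.0 * math.pi * t / T)


def rhs(t, p, r=1.0):
    """dP/dt = r P (1 - P / K(t))."""
    return r * p * (1.0 - p / carrying_capacity(t))


def samples_per_period(p0, periods=40, steps_per_period=100, T=1.0):
    """RK4 integration; returns P at t = 0, T, 2T, ..."""
    h = T / steps_per_period
    t, p = 0.0, p0
    samples = [p]
    for _ in range(periods):
        for _ in range(steps_per_period):
            k1 = rhs(t, p)
            k2 = rhs(t + 0.5 * h, p + 0.5 * h * k1)
            k3 = rhs(t + 0.5 * h, p + 0.5 * h * k2)
            k4 = rhs(t + h, p + h * k3)
            p += h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
            t += h
        samples.append(p)
    return samples


low = samples_per_period(0.05)
high = samples_per_period(3.0)
print(low[:4], low[-1])
print(high[:4], high[-1])
```

Sampling once per period turns the question into whether the sequence of samples settles to a fixed value; if it does, the underlying trajectory is T-periodic.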

In neural networks

Logistic functions are often used in neural networks to introduce nonlinearity in the model and/or to clamp signals to within a specified range. A popular neural net element computes a linear combination of its input signals, and applies a bounded logistic function to the result; this model can be seen as a "smoothed" variant of the classical threshold neuron.

A common choice for the activation or "squashing" functions, used to clip for large magnitudes to keep the response of the neural network bounded,[3] is

$$g(h) = \frac{1}{1 + e^{-2\beta h}}$$

which we recognize to be of the form of the logistic function; with β = 1/2 it is exactly the standard logistic function. These relationships result in simplified implementations of artificial neural networks with artificial neurons. Practitioners caution that sigmoidal functions which are symmetric about the origin (e.g. the hyperbolic tangent) lead to faster convergence when training networks with backpropagation.[4]
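A single neuron of the kind described above can be sketched in a few lines. In this Python sketch the weights, bias and gain β are hypothetical values; the activation is g(h) = 1/(1 + e^{−2βh}), which equals the standard logistic function when β = 1/2. The sketch also checks the relation 2g(h) − 1 = tanh(βh), which connects it to the origin-symmetric hyperbolic tangent, and shows that a large β makes the unit behave like a threshold neuron.

```python
import math


def squash(h, beta=0.5):
    """Logistic activation g(h) = 1 / (1 + exp(-2 * beta * h))."""
    return 1.0 / (1.0 + math.exp(-2.0 * beta * h))


def neuron(inputs, weights, bias, beta=0.5):
    """Linear combination of the inputs followed by the bounded logistic activation."""
    h = sum(w * x for w, x in zip(weights, inputs)) + bias
    return squash(h, beta)


# Hypothetical weights and bias for a two-input unit.
weights, bias = [1.5, -2.0], 0.25
print(neuron([1.0, 0.5], weights, bias))                 # h = 0.75, output about 0.68

# tanh is a rescaled logistic: 2 g(h) - 1 == tanh(beta * h).
print(2.0 * squash(0.75, beta=2.0) - 1.0, math.tanh(2.0 * 0.75))

# With a large gain the output approaches a step function.
print(squash(0.1, beta=100.0), squash(-0.1, beta=100.0))
```

For very large negative values of 2βh, math.exp raises OverflowError; a production version would branch on the sign of h before exponentiating.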

In statistics

Logistic functions are used in several roles in statistics. Firstly, they are the cumulative distribution function of the logistic family of distributions. Secondly, they are used in logistic regression to model how the probability p of an event may be affected by one or more explanatory variables: an example would be to have the model

$$p = P(a + bx) = \frac{1}{1 + e^{-(a + bx)}}$$

where x is the explanatory variable and a and b are model parameters to be fitted.

An important application of the logistic function is in the Rasch model, used in item response theory. In particular, the Rasch model forms a basis for maximum likelihood estimation of the locations of objects or persons on a continuum, based on collections of categorical data, for example the abilities of persons on a continuum based on responses that have been categorized as correct and incorrect.

In medicine: modeling of growth of tumors

Another application of the logistic curve is in medicine, where the logistic differential equation is used to model the growth of tumors. This application can be considered an extension of the above-mentioned use in the framework of ecology. Denoting with X(t) the size of the tumor at time t, its dynamics are governed by:

$$X' = r\left(1 - \frac{X}{K}\right)X$$

which is of the type:

$$X' = F(X)\,X, \qquad F'(X) \le 0$$

where F(X) is the proliferation rate of the tumor.

If chemotherapy is started with a log-kill effect, the equation may be revised to be

$$X' = r\left(1 - \frac{X}{K}\right)X - c(t)\,X,$$

where c(t) is the therapy-induced death rate. In the idealized case of very long therapy, c(t) can be modeled as a periodic function (of period T) or (in the case of continuous infusion therapy) as a constant function, and one has that

$$\text{if}\quad \frac{1}{T}\int_0^T c(t)\,dt > r \quad\text{then}\quad \lim_{t\to+\infty} X(t) = 0,$$

i.e. if the average therapy-induced death rate is greater than the baseline proliferation rate, then the disease is eradicated. Of course, this is an over-simplified model of both the growth and the therapy (e.g. it does not take into account the phenomenon of clonal resistance).
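For the constant-infusion case, the eradication condition can be illustrated numerically. This Python sketch uses hypothetical values r = 0.5, K = 100 and X(0) = 1, and integrates X′ = r(1 − X/K)X − cX with classical Runge–Kutta steps for one death rate below r and one above it. For constant c < r the tumor settles at K(1 − c/r), the value obtained by setting the right-hand side to zero; for c > r it decays to zero. No imports are needed.

```python
def tumor_rhs(x, r, K, c):
    """Logistic tumor growth with a constant log-kill term: X' = r(1 - X/K)X - cX."""
    return r * (1.0 - x / K) * x - c * x


def simulate(x0, r, K, c, t_end=400.0, h=0.05):
    """RK4 integration of the tumor model from t = 0 to t_end."""
    x = x0
    for _ in range(round(t_end / h)):
        k1 = tumor_rhs(x, r, K, c)
        k2 = tumor_rhs(x + 0.5 * h * k1, r, K, c)
        k3 = tumor_rhs(x + 0.5 * h * k2, r, K, c)
        k4 = tumor_rhs(x + h * k3, r, K, c)
        x += h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return x


r, K = 0.5, 100.0
for c in (0.2, 0.6):
    print(c, simulate(1.0, r, K, c), max(0.0, K * (1.0 - c / r)))
```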


In chemistry: reaction models

The concentration of reactants and products in autocatalytic reactions follows the logistic function.

In physics: Fermi distribution

The logistic function determines the statistical distribution of fermions over the energy states of a system in thermal equilibrium. In particular, it is the distribution of the probabilities that each possible energy level is occupied by a fermion, according to Fermi–Dirac statistics.
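The occupation probability can be written directly in terms of the logistic function. In this Python sketch the energies, chemical potential μ and thermal energy kT (all in eV) are hypothetical illustrative values. It shows that the Fermi–Dirac occupancy 1/(e^{(E − μ)/kT} + 1) equals 1 minus the logistic CDF with location μ and scale kT, that it lies between 0 and 1, and that flipping the sign of the 1 (Bose–Einstein) allows occupancies above 1.

```python
import math


def fermi_dirac(energy, mu, kT):
    """Mean occupancy of a fermion state with the given energy."""
    return 1.0 / (math.exp((energy - mu) / kT) + 1.0)


def bose_einstein(energy, mu, kT):
    """Mean occupancy of a boson state; requires energy > mu."""
    return 1.0 / (math.exp((energy - mu) / kT) - 1.0)


def logistic_cdf(x, mu, s):
    return 1.0 / (1.0 + math.exp(-(x - mu) / s))


mu, kT = 5.0, 0.025            # hypothetical Fermi level and thermal energy, in eV
for energy in (4.9, 5.0, 5.05):
    print(energy, fermi_dirac(energy, mu, kT), 1.0 - logistic_cdf(energy, mu, kT))

print(bose_einstein(5.0025, mu, kT))   # (E - mu)/kT = 0.1 gives an occupancy of about 9.5
```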

In linguistics: language change

In linguistics, the logistic function can be used to model language change:[5] an innovation that is at first marginal begins to spread more quickly with time, and then more slowly as it becomes more universally adopted.

In economics

The logistic function can be used to illustrate the progress of the diffusion of an innovation through its life cycle. This method was used in papers by several researchers at the International Institute for Applied Systems Analysis (IIASA). These papers deal with the diffusion of various innovations, infrastructures and energy source substitutions and the role of work in the economy as well as with the long economic cycle. Long economic cycles were investigated by Robert Ayres (1989).[6] Cesare Marchetti published on long economic cycles and on diffusion of innovations.[7][8] Arnulf Grübler's book (1990) gives a detailed account of the diffusion of infrastructures including canals, railroads, highways and airlines, showing that their diffusion followed logistic shaped curves.[9]

Carlota Perez used a logistic type curve to illustrate the long business cycle with the following labels: beginning of a technological era as irruption, the ascent as frenzy, the rapid build out as synergy and the completion as maturity.[10]

Double logistic function

Double logistic sigmoid curve

The double logistic is a function similar to the logistic function with numerous applications. Its general formula is:

$$y = \operatorname{sgn}(x - d)\left[1 - \exp\!\left(-\left(\frac{x - d}{s}\right)^{2}\right)\right]$$

where d is its centre and s is the steepness factor. Here "sgn" represents the sign function.

It is based on the Gaussian curve and graphically it is similar to two identical logistic sigmoids bonded together at the point x = d. One of its applications is non-linear normalization of a sample, as it has the property of eliminating outliers.
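The double logistic can be used to normalize a sample non-linearly. This Python sketch implements the formula y = sgn(x − d)[1 − exp(−((x − d)/s)²)], with math.copysign standing in for sgn (at x = d the factor 1 − exp(0) is already 0, so the result is 0 either way). The sample values, centre d and steepness s are hypothetical. Each value is mapped into [−1, 1], so an extreme outlier is pinned near ±1 instead of stretching the scale.

```python
import math


def double_logistic(x, d, s):
    """y = sgn(x - d) * (1 - exp(-((x - d) / s) ** 2))."""
    magnitude = 1.0 - math.exp(-(((x - d) / s) ** 2))
    return math.copysign(magnitude, x - d)


# Hypothetical sample with one extreme outlier; centre 10, steepness 5.
sample = [4.0, 8.0, 10.0, 12.0, 15.0, 250.0]
normalized = [double_logistic(x, d=10.0, s=5.0) for x in sample]
print([round(y, 3) for y in normalized])
```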


See also

• Generalised logistic curve
• Gompertz curve
• Hubbert curve
• Logistic distribution
• Logistic map
• Logistic regression
• Logit
• Log-likelihood ratio
• Malthusian growth model
• r/K selection theory
• Logistic Smooth-Transmission Model

Notes

[1] Verhulst, Pierre-François (1838). "Notice sur la loi que la population poursuit dans son accroissement" (http://books.google.com/?id=8GsEAAAAYAAJ&q=) (PDF). Correspondance mathématique et physique 10: 113–121. Retrieved 09/08/2009.
[2] V.I. Yukalov, E.P. Yukalova and D. Sornette, Punctuated Evolution due to Delayed Carrying Capacity, Physica D 238, 1752–1767 (2009).
[3] Gershenfeld 1999, p. 150.
[4] LeCun, Y.; Bottou, L.; Orr, G.; Muller, K. (1998). Efficient BackProp (http://yann.lecun.com/exdb/publis/pdf/lecun-98b.pdf). Springer. ISBN 3540653112.
[5] Bod, Hay, Jannedy (eds.) 2003, pp. 147–156.
[6] Ayres, Robert (1989). Technological Transformations and Long Waves (http://www.iiasa.ac.at/Admin/PUB/Documents/RR-89-001.pdf).
[7] Marchetti, Cesare (http://cesaremarchetti.org/) (1996). Pervasive Long Waves: Is Society Cyclotymic (http://www.agci.org/dB/PDFs/03S2_CMarchetti_Cyclotymic.pdf).
[8] Marchetti, Cesare (http://cesaremarchetti.org/) (1988). Kondratiev Revisited-After One Cycle (http://www.cesaremarchetti.org/archive/scan/MARCHETTI-037.pdf).
[9] Grübler, Arnulf (1990). The Rise and Fall of Infrastructures: Dynamics of Evolution and Technological Change in Transport (http://www.iiasa.ac.at/Admin/PUB/Documents/XB-90-704.pdf). Heidelberg and New York: Physica-Verlag.
[10] Perez, Carlota (http://www.carlotaperez.org) (2002). Technological Revolutions and Financial Capital: The Dynamics of Bubbles and Golden Ages. UK: Edward Elgar Publishing Limited. ISBN 1843763311.

References

1. Jannedy, Stefanie; Bod, Rens; Hay, Jennifer (2003). Probabilistic Linguistics. Cambridge, Massachusetts: MIT Press. ISBN 0-262-52338-8.
2. Gershenfeld, Neil A. (1999). The Nature of Mathematical Modeling. Cambridge, UK: Cambridge University Press. ISBN 978-0521-570954.
3. Kingsland, Sharon E. (1995). Modeling nature: episodes in the history of population ecology. Chicago: University of Chicago Press. ISBN 0-226-43728-0.
4. Weisstein, Eric W., "Logistic Equation (http://mathworld.wolfram.com/LogisticEquation.html)" from MathWorld.


External links

• L.J. Linacre, Why logistic ogive and not autocatalytic curve? (http://rasch.org/rmt/rmt64k.htm), accessed 2009-09-12.
• http://luna.cas.usf.edu/~mbrannic/files/regression/Logistic.html
• Modeling Market Adoption in Excel with a simplified s-curve (http://8020world.com/jcmendez/2007/04/business/modeling-market-adoption-in-excel-with-a-simplified-s-curve)
• Weisstein, Eric W., "Sigmoid Function (http://mathworld.wolfram.com/SigmoidFunction.html)" from MathWorld.
• Online experiments with JSXGraph (http://jsxgraph.uni-bayreuth.de/wiki/index.php/Logistic_process)


Article Sources and Contributors

Logistic function Source: http://en.wikipedia.org/w/index.php?oldid=407555192 Contributors: 2004-12-29T22:45Z, A bit iffy, A. di M., A3 nm, AJRobbins, Ahoerstemeier, Albertodonofrio, Alvestrand, Ancheta Wis, Arnejohs, Bcasterline, BenFrantzDale, Bobo192, Booyabazooka, Calimo, Charles Matthews, Cryptic C62, Cutler, Cyrius, DFRussia, DaveWF, David Gerard, Doctormatt, Dominus, Dsornette, Duoduoduo, Dzhim, Echo R314, Edward Z. Yang, Eequor, Eric Kvaalen, Eyvin, Farrwill, Fisenko, Foobarhoge, Fplay, Fredrik, Fyedernoggersnodden, Gbedia, Giftlite, Gramscis cousin, Henrygb, Holon, Ioannes Pragensis, Jclaer, Jcmendez, Jinxs, JohnOwens, Jorge Stolfi, Just Another Dan, Karada, Knights who say ni, LOL, Linas, Mack2, Mandarax, MarkSweep, Mbhiii, Mcld, Melcombe, Menelik3, Metamagician3000, Michael Hardy, Mihal Orela, MihalOrela, MisterSheik, MonteChristof, MrOllie, Mrwojo, Nbarth, Neelix, New Image Uploader 929, Numsgil, O18, Oanjao, Pargeter1, PhDP, Phipperdee, Phmoreno, Plumbago, Populus, Qef, Rhythmiccycle, Richard001, Rjwilmsi, Robth, Runia, RyanEberhart, Sbarthelme, Scythe33, Shadowjams, Shanemcd, SimonMayer, Solarapex, Soobrickay, Swestrup, Tabletop, Tamfatkh, The Anome, The Thing That Should Not Be, TigerShark, TittoAssini, Trinary M01, UnitedStatesian, Vectornaut, Voretus, Wolfman, Woohookitty, Zeev Grin, 141 anonymous edits

Image Sources, Licenses and Contributors

Image:Logistic-curve.svg Source: http://en.wikipedia.org/w/index.php?title=File:Logistic-curve.svg License: Public Domain Contributors: User:Qef
Image:Pierre Francois Verhulst.jpg Source: http://en.wikipedia.org/w/index.php?title=File:Pierre_Francois_Verhulst.jpg License: Public Domain Contributors: Bluemoose, High on a tree
Image:dsigmoid.png Source: http://en.wikipedia.org/w/index.php?title=File:Dsigmoid.png License: Public Domain Contributors: Brighterorange, Loluengo, Sakurambo

License

Creative Commons Attribution-Share Alike 3.0 Unported http://creativecommons.org/licenses/by-sa/3.0/


16 January 2011 From 8020world.com/blog/2007/04/business/modeling-market-adoption-in-excel-with-a-simplified-s-curve

Modeling market adoption in Excel with a simplified s-curve

24th of April, 2007 23:04

Editor’s Note – The screen cast referenced in the text is no longer available. The URL was http://8020.babygames4.us/blog/wp-content/uploads/2007/10/scurvedemo.html, but no such server now exists.

Often business analysts need to model the adoption of a new product or service for financial planning. There are several approaches, but a common one is the s-curve (see Wikipedia article). Here is a simple implementation in Excel that can easily be added to your spreadsheets. It reduces all the math (see http://8020world.com/blog/2007/07/excel/math-on-the-simplified-market-adoption-s-curve-for-excel/) to just three parameters:

• saturation – What is the maximum expected penetration after the product becomes mainstream? i.e. what is the value that the top of the s-curve will reach?

• start of fast growth – By this year, the penetration will be 10% of the saturation value, and it will start to grow rapidly. 10% was an arbitrary choice to simplify the model, and by doing some math you could change the formula to any value. It is a reasonable choice in most cases. We’ll call this parameter hypergrowth.

• takeover time – How long will it take for the product to “catch on”? The operational assumption in the formula is that this number of years after the start of fast growth, the product will have reached 90% of the saturation value and will start to slow down. Again, 90% is an arbitrary value I chose.

The s-curve model focuses on the early phases of the product lifecycle, until maturity is reached. Penetration decay is NOT covered by this model.

The formula for each year’s penetration would simply be: =saturation/(1+81^((hypergrowth+takeover/2-year)/takeover))

See it in action:


In the sample spreadsheet above, look at cell B8, where you can see the formula in use. It is the same for all of row 8.

saturation, hypergrowth and takeover are names defined for the parameters on rows 2 to 5 (you use names in your models instead of plain cell references, don’t you?)

Very simple, easy to maintain, light on calculation times… happy market adoption modeling!
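The spreadsheet formula translates directly into code. This Python sketch mirrors the Excel expression with the post's three parameter names (the example values are hypothetical) and checks the two design points: penetration is 10% of saturation in the hypergrowth year and 90% of saturation takeover years later. It also shows why the constant is 81: writing 81 = 9², the formula is the logistic function centred at hypergrowth + takeover/2 with rate ln(81)/takeover, and the ends of the takeover window sit at logistic(±ln 9), that is at 1/10 and 9/10.

```python
import math


def s_curve(year, saturation, hypergrowth, takeover):
    """Python version of =saturation/(1+81^((hypergrowth+takeover/2-year)/takeover))."""
    return saturation / (1.0 + 81.0 ** ((hypergrowth + takeover / 2.0 - year) / takeover))


def logistic(t):
    return 1.0 / (1.0 + math.exp(-t))


# Hypothetical plan: 60% saturation, fast growth starting in 2010, 6-year takeover.
saturation, hypergrowth, takeover = 0.60, 2010, 6
for year in range(2006, 2021, 2):
    print(year, round(s_curve(year, saturation, hypergrowth, takeover), 4))

# The same curve written as a logistic function centred at hypergrowth + takeover/2.
midpoint = hypergrowth + takeover / 2.0
rate = math.log(81.0) / takeover
print(s_curve(2012, saturation, hypergrowth, takeover),
      saturation * logistic(rate * (2012 - midpoint)))
```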

PS: The chart shown is from NeoOffice, an open source alternative to Excel for Macintosh users, based on OpenOffice.


Logistic regression

In statistics, logistic regression (sometimes called the logistic model or logit model) is used for prediction of the probability of occurrence of an event by fitting data to a logistic curve. It is a generalized linear model used for binomial regression. Like many forms of regression analysis, it makes use of several predictor variables that may be either numerical or categorical. For example, the probability that a person has a heart attack within a specified time period might be predicted from knowledge of the person's age, sex and body mass index. Logistic regression is used extensively in the medical and social sciences, as well as in marketing applications such as prediction of a customer's propensity to purchase a product or cease a subscription.

Definition

Figure 1. The logistic function, with z on the horizontal axis and ƒ(z) on the vertical axis

An explanation of logistic regression begins with an explanation of the logistic function:

$$f(z) = \frac{e^{z}}{e^{z} + 1} = \frac{1}{1 + e^{-z}}$$

A graph of the function is shown in figure 1. The input is z and the output is ƒ(z). The logistic function is useful because it can take as an input any value from negative infinity to positive infinity, whereas the output is confined to values between 0 and 1. The variable z represents the exposure to some set of independent variables, while ƒ(z) represents the probability of a particular outcome, given that set of explanatory variables. The variable z is a measure of the total contribution of all the independent variables used in the model and is known as the logit.

The variable z is usually defined as

$$z = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \cdots + \beta_k x_k,$$

where $\beta_0$ is called the "intercept" and $\beta_1$, $\beta_2$, $\beta_3$, and so on, are called the "regression coefficients" of $x_1$, $x_2$, $x_3$, respectively. The intercept is the value of z when the value of all independent variables is zero (e.g. the value of z in someone with no risk factors). Each of the regression coefficients describes the size of the contribution of that risk factor. A positive regression coefficient means that the explanatory variable increases the probability of the outcome, while a negative regression coefficient means that the variable decreases the probability of that outcome; a large regression coefficient means that the risk factor strongly influences the probability of that outcome, while a near-zero regression coefficient means that that risk factor has little influence on the probability of that outcome.

Logistic regression is a useful way of describing the relationship between one or more independent variables (e.g., age, sex, etc.) and a binary response variable, expressed as a probability, that has only two possible values, such as death ("dead" or "not dead").

Sample size-dependent efficiency

Logistic regression tends to systematically overestimate odds ratios or beta coefficients in small and moderate samples (approximately, samples of fewer than 500). With increasing sample size the magnitude of overestimation diminishes and the estimated odds ratio asymptotically approaches the true population value. However, this overestimation may not be relevant to the interpretation of a single study, since it is much smaller than the standard error of the estimate. But if a number of small studies with systematically overestimated effect sizes are pooled together without consideration of this effect, we may misinterpret the literature as providing evidence for an effect when in reality no such effect exists.[1]

A minimum of ten events per independent variable has been recommended.[2][3] For example, in a study where death is the outcome of interest and there were 50 deaths out of 100 patients, the number of independent variables the model can support is 50/10 = 5.

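Example

The application of a logistic regression may be illustrated using a fictitious example of death from heart disease. This simplified model uses only three risk factors (age, sex, and blood cholesterol level) to predict the 10-year risk of death from heart disease. This is the model that we fit:

$\beta_0 = -5.0$ (the intercept)
$\beta_1 = +2.0$
$\beta_2 = -1.0$
$\beta_3 = +1.2$
$x_1$ = age in years, less 50
$x_2$ = sex, where 0 is male and 1 is female
$x_3$ = cholesterol level, in mmol/L above 5.0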

which means the model is

$$\text{risk of death} = \frac{1}{1 + e^{-z}}, \qquad z = -5.0 + 2.0\,x_1 - 1.0\,x_2 + 1.2\,x_3.$$

In this model, increasing age is associated with an increasing risk of death from heart disease (z goes up by 2.0 for every year over the age of 50), female sex is associated with a decreased risk of death from heart disease (z goes down by 1.0 if the patient is female), and increasing cholesterol is associated with an increasing risk of death (z goes up by 1.2 for each 1 mmol/L increase in cholesterol above 5 mmol/L).

We wish to use this model to predict Nathan Petrelli's risk of death from heart disease: he is 50 years old and his cholesterol level is 7.0 mmol/L. Nathan Petrelli's risk of death is therefore

$$z = -5.0 + 2.0\,(50 - 50) - 1.0\,(0) + 1.2\,(7.0 - 5.0) = -2.6, \qquad \text{risk of death} = \frac{1}{1 + e^{2.6}} \approx 0.07.$$

This means that by this model, Nathan Petrelli's risk of dying from heart disease in the next 10 years is 0.07 (or 7%).
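The risk calculation can be reproduced in a few lines. This Python sketch assumes the fictitious model has an intercept of −5.0 (a value consistent with the 7% result, assumed here) together with the slopes quoted in the text: +2.0 per year over 50, −1.0 for female sex and +1.2 per mmol/L above 5 mmol/L. Nathan Petrelli is treated as male (x₂ = 0); the function and constant names are hypothetical.

```python
import math

# Coefficients of the fictitious heart-disease model (intercept assumed, slopes from the text).
INTERCEPT = -5.0
PER_YEAR_OVER_50 = 2.0
FEMALE = -1.0
PER_MMOL_OVER_5 = 1.2


def ten_year_risk(age, female, cholesterol_mmol):
    """Logistic-regression risk f(z) = 1 / (1 + e^-z), with z linear in the risk factors."""
    z = (INTERCEPT
         + PER_YEAR_OVER_50 * (age - 50)
         + FEMALE * (1 if female else 0)
         + PER_MMOL_OVER_5 * (cholesterol_mmol - 5.0))
    return 1.0 / (1.0 + math.exp(-z))


risk = ten_year_risk(age=50, female=False, cholesterol_mmol=7.0)
print(round(risk, 3))   # z = -2.6, risk about 0.069
```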


Formal mathematical specification

Logistic regression analyzes binomially distributed data of the form

$$Y_i \sim B(n_i, p_i), \quad \text{for } i = 1, \dots, m,$$

where the numbers of Bernoulli trials $n_i$ are known and the probabilities of success $p_i$ are unknown. An example of this distribution is the fraction of seeds ($p_i$) that germinate after $n_i$ are planted.

The model proposes that for each trial i there is a set of explanatory variables that might inform the final probability. These explanatory variables can be thought of as being in a k-dimensional vector $X_i$, and the model then takes the form

$$\operatorname{logit}(p_i) = \ln\!\left(\frac{p_i}{1 - p_i}\right) = \beta_1 x_{1,i} + \cdots + \beta_k x_{k,i}.$$

The logits, natural logs of the odds, of the unknown binomial probabilities are modeled as a linear function of the Xi.

Note that a particular element of $X_i$ can be set to 1 for all i to yield an intercept in the model. The unknown parameters $\beta_j$ are usually estimated by maximum likelihood using a method common to all generalized linear models. The maximum likelihood estimates can be computed numerically by using iteratively reweighted least squares.

The interpretation of the $\beta_j$ parameter estimates is as the additive effect on the log of the odds for a unit change in the jth explanatory variable. In the case of a dichotomous explanatory variable, for instance gender, $e^{\beta}$ is the estimate of the odds ratio of having the outcome for, say, males compared with females.

The model has an equivalent formulation

$$p_i = \frac{1}{1 + e^{-(\beta_1 x_{1,i} + \cdots + \beta_k x_{k,i})}}.$$

This functional form is commonly called a single-layer perceptron or single-layer artificial neural network. A single-layer neural network computes a continuous output instead of a step function. The derivative of p_i with respect to X = (x_1, ..., x_k) is computed from the general form

$$y = \frac{1}{1 + e^{-f(X)}},$$

where f(X) is an analytic function in X. When f(X) is chosen to be a linear function of X, the single-layer neural network is identical to the logistic regression model. This function has a continuous derivative, which allows it to be used in backpropagation. It is also preferred because its derivative is easily calculated:

$$\frac{dy}{dX} = y(1 - y)\frac{df}{dX}.$$
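The derivative identity dy/dX = y(1 − y)·df/dX can be checked numerically. The sketch below assumes a linear f(X) = w·X + b, which is the choice under which the single-layer network coincides with logistic regression. The weights are arbitrary illustrative values, and the analytic gradient is compared with a central finite difference.

```python
import math

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

# Illustrative (hypothetical) weights for a linear f(X) = w . X + b.
w = [0.5, -1.2, 2.0]
b = 0.3

def output(x):
    """y = 1 / (1 + exp(-f(X))) with linear f."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def analytic_gradient(x):
    """dy/dx_j = y * (1 - y) * df/dx_j, and df/dx_j = w_j for linear f."""
    y = output(x)
    return [y * (1.0 - y) * wj for wj in w]

def numeric_gradient(x, h=1e-6):
    """Central finite differences, used only to check the identity."""
    grad = []
    for j in range(len(x)):
        up = list(x)
        up[j] += h
        down = list(x)
        down[j] -= h
        grad.append((output(up) - output(down)) / (2 * h))
    return grad

x = [0.1, 0.4, -0.3]
print(analytic_gradient(x), numeric_gradient(x))
```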

Extensions

Extensions of the model cope with multi-category dependent variables and ordinal dependent variables, such as polytomous regression. Multi-class classification by logistic regression is known as multinomial logit modeling. An extension of the logistic model to sets of interdependent variables is the conditional random field.

Model Accuracy

One way to test for errors in models created by stepwise regression is not to rely on the model's F-statistic, significance, or multiple R. Instead, the model is assessed against a set of data that was not used to create it [4]. This is often done by building a model on a sample of the available dataset (e.g. 30%) and using the remaining 70% to assess the model's accuracy.


Accuracy is measured as the share of correctly classified records in the holdout sample [5]. There are four possible classifications: 1) a predicted 0 when the holdout sample has a 0; 2) a predicted 0 when the holdout sample has a 1 (error); 3) a predicted 1 when the holdout sample has a 0 (error); 4) a predicted 1 when the holdout sample has a 1. The percentage of correctly classified observations in the holdout sample is referred to as the assessed model accuracy. Accuracy can also be expressed as the model's ability to correctly classify 0s, or its ability to correctly classify 1s, in the holdout dataset. The holdout assessment method is particularly valuable when data are collected in different settings (e.g. at different times or in different social settings) or when models are assumed to be generalizable.
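The holdout procedure can be sketched as follows. The sketch assumes that predicted class labels are already available for the holdout records, so the model-fitting step is omitted. It uses the 30% build / 70% holdout split mentioned above and reports overall accuracy together with the share of observed 0s and observed 1s classified correctly. Function names and data are illustrative.

```python
import random

def holdout_split(n_records, build_fraction=0.3, seed=0):
    """Randomly partition record indices into a build set and a holdout set."""
    indices = list(range(n_records))
    random.Random(seed).shuffle(indices)
    cut = int(round(build_fraction * n_records))
    return indices[:cut], indices[cut:]

def holdout_accuracy(observed, predicted):
    """Overall accuracy, accuracy on observed 0s, and accuracy on observed 1s."""
    counts = {(o, p): 0 for o in (0, 1) for p in (0, 1)}
    for o, p in zip(observed, predicted):
        counts[(o, p)] += 1
    n0 = counts[(0, 0)] + counts[(0, 1)]
    n1 = counts[(1, 0)] + counts[(1, 1)]
    overall = (counts[(0, 0)] + counts[(1, 1)]) / (n0 + n1)
    return overall, counts[(0, 0)] / n0, counts[(1, 1)] / n1

build, holdout = holdout_split(100)
# Hypothetical observed values and model predictions for eight holdout records.
observed = [0, 0, 0, 1, 1, 1, 1, 0]
predicted = [0, 0, 0, 1, 1, 0, 1, 0]
print(holdout_accuracy(observed, predicted))  # (0.875, 1.0, 0.75)
```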

See also
• Logistic function
• Sigmoid function
• Artificial neural network
• Data mining
• Jarrow–Turnbull model
• Limited dependent variable
• Linear discriminant analysis
• Multinomial logit model
• Ordered logit
• Perceptron
• Principle of maximum entropy
• Probit model
• Variable rules analysis
• Hosmer–Lemeshow test
• Separation (statistics)

References
[1] Nemes S, Jonasson JM, Genell A, Steineck G (2009). "Bias in odds ratios by logistic regression modelling and sample size". BMC Medical Research Methodology 9:56. BioMedCentral (http://www.biomedcentral.com/1471-2288/9/56)
[2] Peduzzi P, Concato J, Kemper E, Holford TR, Feinstein AR (1996). "A simulation study of the number of events per variable in logistic regression analysis". J Clin Epidemiol 49 (12): 1373–9. PMID 8970487.
[3] Agresti A (2007). "Building and applying logistic regression models". An Introduction to Categorical Data Analysis. Hoboken, New Jersey: Wiley. p. 138. ISBN 978-0-471-22618-5.
[4] Jonathan Mark and Michael A. Goldberg (2001). "Multiple Regression Analysis and Mass Assessment: A Review of the Issues". The Appraisal Journal, Jan., pp. 89–109.
[5] Mayers, J.H. and Forgy, E.W. (1963). "The Development of numerical credit evaluation systems". Journal of the American Statistical Association, Vol. 58, Issue 303 (Sept.), pp. 799–806.

• Agresti, Alan (2002). Categorical Data Analysis. New York: Wiley-Interscience. ISBN 0-471-36093-7.
• Amemiya, T. (1985). Advanced Econometrics. Harvard University Press. ISBN 0-674-00560-0.
• Balakrishnan, N. (1991). Handbook of the Logistic Distribution. Marcel Dekker, Inc. ISBN 978-0824785871.
• Greene, William H. (2003). Econometric Analysis, fifth edition. Prentice Hall. ISBN 0-13-066189-9.
• Hilbe, Joseph M. (2009). Logistic Regression Models. Chapman & Hall/CRC Press. ISBN 978-1-4200-7575-5.
• Hosmer, David W.; Stanley Lemeshow (2000). Applied Logistic Regression, 2nd ed. New York; Chichester: Wiley. ISBN 0-471-35632-8.


External links
• Logistic Regression Interpretation (http://www.appricon.com/index.php/logistic-regression-analysis.html)
• Logistic Regression tutorial (http://www.omidrouhani.com/research/logisticregression/html/logisticregression.htm)


Article Sources and Contributors

Logistic regression. Source: http://en.wikipedia.org/w/index.php?oldid=405808995. Contributors: Antaltamas, Aprock, BAnstedt, Baccyak4H, BlaiseFEgan, BrendanH, Cancan101, Cbuckley, Ciphers, D nath1, Den fjättrade ankan, Duoduoduo, Dvandeventer, Dvdpwiki, F0rbidik, Future Perfect at Sunrise, G716, Gak, Giftlite, Grumpfel, Jamelan, Jjoseph, Jtneill, Junling, Kallerdis, Kenkleinman, Kierano, Kpmiyapuram, Ktalon, LOL, Lassefolkersen, Lilac Soul, Mack2, Mani1, Markjoseph125, Materialscientist, Mbhiii, Mdf, Melcombe, Michael Hardy, Mudx77, Neoforma, New Image Uploader 929, Nszilard, Nutcracker, O18, Olberd, Oleg Alexandrov, Owenozier, Pgan002, Qef, Qwfp, Requestion, Rich Farmbrough, Sanchom, Schwnj, Sderose, Secondsminutes, Statone, Tomi, Tomixdf, Twanvl, X7q, Ypetrachenko, 111 anonymous edits

Image Sources, Licenses and Contributors

Image:Logistic-curve.svg. Source: http://en.wikipedia.org/w/index.php?title=File:Logistic-curve.svg. License: Public Domain. Contributors: User:Qef

License

Creative Commons Attribution-Share Alike 3.0 Unported
http://creativecommons.org/licenses/by-sa/3.0/


16 January 2011. From www.appricon.com/index.php/logistic-regression-analysis.html

Logistic Regression Analysis and Interpretation

Logistic regression analysis and interpretation is a complex task that involves different methods and approaches. This document describes some of the common methods for binomial logistic regression analysis and is intended to simplify the logistic regression analysis process.

Note – This document does not cover all aspects of logistic regression analysis and does not guarantee a complete and reliable logistic regression model. Models should be validated by statisticians or data mining professionals.

Glossary and Terms

• Explained variable (dependent variable) – The variable in question, i.e. the variable being explored by the model. This variable is binary and can only take the values 0 or 1.

• Explanatory variables (independent variables) – The variables used to model the explained variable (dependent variable).

• Null model – The empty model: a model that includes only the explained variable and no explanatory variables.

• Fitted model – The actual model that contains the explained variable and the explanatory variables.

Logistic Regression – General Modeling Guidelines

• The data set should contain at least 30 rows of data.

• The logistic regression model should include no more than 1 variable per 30–50 data rows. For example, a logistic regression model based on a data set containing 300 data rows should have no more than 6–10 variables (300/50 – 300/30).

• A logistic regression model should have preselected variables that form the model core and are defined by a professional in the field of application being modeled. The preselected variables are those considered to affect the decision being modeled before the modeling process begins. In general, a logistic regression model has to be grounded in the field of application and cannot be defined solely on the basis of statistical tests.

• Logistic regression models should have a minimal set of variables – This rule cannot be quantified, but variables that add little to model performance should not be included.


• The desired parameter values in the analysis are not absolute and depend on the field being modeled. For example, models involving human behavior might have larger p-values and lower accuracy than models involving physical phenomena.

• Classification tables (the tables that show how the model's classifications split into hits and misses) should be assessed with caution. To achieve good and long-lasting results, statistical testing should be the main tool of analysis. Classification tables should be treated as an independent test conducted after model quality assessments are completed. Classification tables play an important role once a model is deployed and used, since only reality shows the true quality of the model after deployment. At that stage, classification tables computed over the model results are the main tool for analyzing logistic regression model performance.

• Biased data – Many data sets include variables that imply future results; such variables bias the model and should be avoided. Notable cases of data corruption and biasing:

Variables from the future – Variables that include future data or that were used to derive the target variable.

Null values – Most applications for logistic regression modeling ignore null data and exclude it from the model. Null values can enter a model implicitly through calculated variables, especially when using dummy variables. In many cases null values imply that the target variable is 0. This may happen in commercial data sets when a field such as gender or address was filled in only after a customer joined the company.

Large coefficient values – In many cases, large coefficient values in a logistic regression model imply one of two things. Either the variable is extremely effective (and should probably be removed from the model and used as an external rule or filter), or it contains biased data and is highly correlated with the explained variable. In either case, high-coefficient variables in a logistic regression model should be treated carefully.

Logistic Regression - Model Parameter Analysis

Different statistical packages and applications have different outputs for logistic regression models. Many logistic regression model parameters are relative and change dramatically between models. Test parameter values should only be compared between models built on the same data set for the same explained (dependent) variable. The following suggestions for logistic regression model parameter analysis cover some of the common tests.

• Null Deviance – Without going into how this parameter is calculated, null deviance describes the performance of the “empty” (null) model.


• Model Deviance – Describes the performance of the current model. The lower the model deviance is compared to the null deviance, the better the model. You can use the improvement parameter (below) instead of this parameter to compare model performance.

• Improvement – Determines how much the model improves the classification of the variable in question (the explained or dependent variable). Improvement is the difference between Null Deviance and Model Deviance. This parameter should be as high as possible, and it can only be compared between logistic regression models for the same explained variable.

• p-value – A parameter of great importance, as it gives a good indication of the significance of the model. When a model has a high p-value (p-value > 0.2), there is little evidence that the model is significant, and it should not be used.

• Cox and Snell R Square – This R^2 test, like other logistic regression R^2 tests, tries to measure the strength of association of the model. Its values lie between 0 and a maximum that is below 1. Today Nagelkerke R^2 is more common and is considered a better indication of strength of association.

• Nagelkerke R Square – A modification of the Cox and Snell R^2 that rescales it so that its maximum value is 1.

• Area Under ROC Curve (AUC) – Gives a good indication of model performance (values are typically between 0.5 and 1). This value should be as high as possible, with some restrictions. Typical values indicate the following:

0.5 – No discriminating ability (the model has no meaning).

0.51 – 0.7 – Low discriminating ability (not a very good model, yet the model can be used).

0.71 – 0.9 – Very good discriminating ability.

0.91 – 1 – Excellent discriminating ability. In some fields, logistic regression models can have excellent discriminating ability; however, this might indicate that the model is “too good to be true”. Double- and triple-check your model, making sure that no variables from the future are present and that the model has no other odd parameter values.


• Hosmer–Lemeshow Probability – The Hosmer–Lemeshow probability comes from a chi-square test computed over the Hosmer–Lemeshow table (below). This important parameter tests how well the model's predicted frequencies match the observed outcomes. The null hypothesis is that the model fits the data (observed and expected frequencies agree), so a high value means there is no evidence of poor fit. Values for this test should be higher than 0.5 – 0.6.

• Hosmer–Lemeshow table – A model classification table that describes both expected and actual model classifications. The Hosmer–Lemeshow table divides the data into 10 groups (deciles, one per row), each showing the expected and observed frequency of both 1 and 0 values. The expected frequency in each decile should match the actual frequency, and each decile should contain data.

HL table output


• Classification tables – In binomial logistic regression, the classification table is a 2 x 2 table that contains the observed and predicted model results (shown in the figure below). The table is computed over a data set, usually either training data (the data the model was built on) or test data (a data set that was not used to compute model coefficients and is used for model quality evaluation). The model classifies each data record using two inputs: the probability it computes for the record (a value between 0 and 1) and the cut value, which is the minimal probability that should be classified as 1. The default cut value is 0.5, meaning that a data record with a value larger than 0.5 is classified as 1.

The classification table has 4 data cells:

1. Observed 0 Predicted 0 – The number of cases that were both predicted and observed as 0. The model classification was correct for these records.

2. Observed 0 Predicted 1 – The number of cases that were predicted as 1 yet observed as 0. The records in this cell are referred to as false positives. The model classification was incorrect for these records.

3. Observed 1 Predicted 1 – The number of cases that were both predicted and observed as 1. The model classification was correct for these records.

4. Observed 1 Predicted 0 – The number of cases that were predicted as 0 yet observed as 1. The records in this cell are referred to as false negatives. The model classification was incorrect for these records.

Different fields of application require different rates of false positives and false negatives, since in some applications false positives cannot be tolerated, while in others false negatives cannot be tolerated.
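The parameters discussed above can be computed directly from observed outcomes and fitted probabilities. The sketch below is a from-scratch illustration, not any particular statistical package's output, and it rests on these assumptions:

- Fitted probabilities lie strictly between 0 and 1.
- Deviance is −2 × log-likelihood.
- The null model uses a constant probability equal to the share of 1s.
- AUC is the probability that a random 1 receives a higher score than a random 0, with ties counted as one half.
- The 2 × 2 classification table uses the cut value described above.

The data are hypothetical.

```python
import math

def deviance(observed, probs):
    """-2 * log-likelihood of binary outcomes under the given probabilities."""
    return -2.0 * sum(math.log(p) if y == 1 else math.log(1.0 - p)
                      for y, p in zip(observed, probs))

def model_summary(observed, probs, cut=0.5):
    n = len(observed)
    base_rate = sum(observed) / n
    null_dev = deviance(observed, [base_rate] * n)   # the "empty" model
    model_dev = deviance(observed, probs)
    improvement = null_dev - model_dev

    # Cox & Snell R^2 = 1 - (L0 / L1)^(2/n); its maximum, 1 - L0^(2/n), is below 1.
    cox_snell = 1.0 - math.exp(-improvement / n)
    max_cox_snell = 1.0 - math.exp(-null_dev / n)
    nagelkerke = cox_snell / max_cox_snell           # rescaled so the maximum is 1

    # AUC: share of (1, 0) pairs ranked correctly, ties counted as one half.
    ones = [p for y, p in zip(observed, probs) if y == 1]
    zeros = [p for y, p in zip(observed, probs) if y == 0]
    wins = sum(1.0 if p1 > p0 else 0.5 if p1 == p0 else 0.0
               for p1 in ones for p0 in zeros)
    auc = wins / (len(ones) * len(zeros))

    # 2 x 2 classification table keyed by (observed, predicted).
    table = {(o, p): 0 for o in (0, 1) for p in (0, 1)}
    for y, p in zip(observed, probs):
        table[(y, 1 if p > cut else 0)] += 1

    return {"null_deviance": null_dev, "model_deviance": model_dev,
            "improvement": improvement, "cox_snell_r2": cox_snell,
            "nagelkerke_r2": nagelkerke, "auc": auc, "table": table}

observed = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]
probs = [0.1, 0.2, 0.25, 0.3, 0.4, 0.55, 0.6, 0.7, 0.8, 0.9]
summary = model_summary(observed, probs)
print(summary)
```

In this example, one observed 0 falls above the cut value (a false positive) and one observed 1 falls below it (a false negative).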

Cases plot (histogram of the predicted data)

This plot shows how many data records were assigned to each probability interval. An example of a cases plot is shown below, where 30 data records were assigned a 0.2–0.3 probability of having a value of 1, yet these records are classified by the model as 0. When a model contains high-coefficient variables, the cases plot tends to have most values at either end of the chart: records are classified as having either a very low or a very high probability of 1.
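A cases plot is a histogram of predicted probabilities. A minimal text version is sketched below, assuming ten equal-width intervals of 0.1 and illustrative probabilities.

```python
def cases_plot(probs, bins=10):
    """Count how many records fall into each equal-width probability interval."""
    counts = [0] * bins
    for p in probs:
        index = min(int(p * bins), bins - 1)  # p == 1.0 goes into the last interval
        counts[index] += 1
    return counts

probs = [0.05, 0.12, 0.22, 0.25, 0.28, 0.5, 0.73, 0.91, 0.97, 1.0]
for i, count in enumerate(cases_plot(probs)):
    low, high = i / 10, (i + 1) / 10
    print(f"{low:.1f}-{high:.1f} {'#' * count}")
```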

Hits ratio

The hits ratio shows the model's performance in each decile.

During logistic regression model analysis, it is important to examine the hit/miss ratio. In many models (e.g. models that involve classification of behavior), the hits ratio chart is expected to have lower values in the middle that rise towards each end (as shown in the image below). This happens because the model is expected to miss more cases classified with 0.3–0.7 probability than cases classified with 0–0.2 or 0.8–1 probability.
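The decile-based summaries described above, the Hosmer–Lemeshow table and the hits ratio, can be sketched together. The sketch makes the following assumptions:

- Records are sorted by predicted probability and split into ten groups of nearly equal size (a common convention).
- A hit is counted when the record's class at cut value 0.5 matches the observed value.
- The Hosmer–Lemeshow statistic is referred to a chi-square distribution with 10 − 2 = 8 degrees of freedom.
- The tail probability uses the closed form that holds for even degrees of freedom; in practice a library routine such as SciPy's chi-square survival function would normally be used instead.

```python
import math

def decile_table(observed, probs, groups=10, cut=0.5):
    """Split records sorted by predicted probability into equal-size groups."""
    order = sorted(range(len(probs)), key=lambda i: probs[i])
    rows = []
    for g in range(groups):
        members = order[g * len(order) // groups:(g + 1) * len(order) // groups]
        obs1 = sum(observed[i] for i in members)
        exp1 = sum(probs[i] for i in members)
        hits = sum(1 for i in members if (probs[i] > cut) == (observed[i] == 1))
        rows.append({"n": len(members), "obs1": obs1, "exp1": exp1,
                     "obs0": len(members) - obs1, "exp0": len(members) - exp1,
                     "hits_ratio": hits / len(members)})
    return rows

def chi2_sf_even_df(x, df):
    """P(X > x) for a chi-square variable with an even number of degrees of freedom."""
    half = x / 2.0
    return math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(df // 2))

def hosmer_lemeshow(observed, probs, groups=10):
    """Hosmer-Lemeshow statistic and its p-value (groups - 2 degrees of freedom)."""
    rows = decile_table(observed, probs, groups)
    stat = sum((r["obs1"] - r["exp1"]) ** 2 / r["exp1"]
               + (r["obs0"] - r["exp0"]) ** 2 / r["exp0"] for r in rows)
    return stat, chi2_sf_even_df(stat, groups - 2)

# Illustrative data: 20 records with increasing predicted probabilities.
probs = [i / 21 for i in range(1, 21)]
observed = [1 if i > 10 else 0 for i in range(1, 21)]
rows = decile_table(observed, probs)
stat, p_value = hosmer_lemeshow(observed, probs)
print([r["hits_ratio"] for r in rows], stat, p_value)
```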


16 January 2011. From www.omidrouhani.com/research/logisticregression/html/logisticregression.htm

Omid’s Logistic Regression tutorial

The main focus of this Logistic Regression tutorial is the usage of Logistic Regression in the field of Machine Learning and Data Mining. We will introduce the mathematical theory behind Logistic Regression and show how it can be applied to the field of Machine Learning when we try to extract information from very large data sets.

1 Theory
1.1 Machine Learning
1.2 Regression Analysis
1.2.1 Ordinary Linear Regression
1.2.2 General Linear Regression
1.2.3 Logistic Regression
1.2.4 Obtaining the Model Parameters
1.2.5 Ridge Regression
1.2.6 Weighted Logistic Regression
1.3 Solving Linear Equation Systems
1.3.1 Solving a Simple Linear Equation System
1.3.2 Conjugate Gradient Method
1.3.3 Solvers for Building Large Logistic Regression Classifiers
1.3.4 How to Calculate beta
1.4 Classification and Ranking
1.4.1 Do Classification Using Logistic Regression
1.4.2 Do Ranking Using Logistic Regression
LIST OF ABBREVIATIONS
REFERENCES
ABOUT THIS DOCUMENT


1 Theory

1.1 Machine Learning

Machine learning is a subfield of Artificial Intelligence and deals with the development of techniques that allow computers to “learn” from previously seen datasets. Machine learning overlaps extensively with mathematical statistics, but differs in that it deals with the computational complexities of the algorithms.

Machine learning itself can be divided into many subfields. The one we will work with is supervised learning, where we start with a data set of labeled data points. Each data point is a vector $x = (x_1, \ldots, x_d)$, and each data point has a label $y \in \{0, 1\}$.

Given a set of data points and the corresponding labels, we want to be able to train a computer program to classify new (so far unseen) data points by assigning a correct class label to each data point. The ratio of correctly classified data points is called the accuracy of the system.

1.2 Regression Analysis

Regression analysis is a field of mathematical statistics that is well explored and has been used for many years. Given a set of observations, one can use regression analysis to find a model that best fits the observation data.

1.2.1 Ordinary Linear Regression

The most common form of regression model is ordinary linear regression, which fits a straight line through a set of data points. It is assumed that each data point's value comes from a normally distributed random variable. Its mean can be written as a linear function of the predictors, and its variance is unknown but constant.

We can write this equation as

$$y = a + \beta^T x + \varepsilon,$$

where a is a constant, sometimes also denoted as b0, and beta is a vector of the same size as our input variable x. The error term $\varepsilon \sim N(0, \sigma^2)$ is normally distributed with mean 0 and constant variance $\sigma^2$. Figure 2 shows a response with mean $y = -2.45 + 0.35x$ that follows a normal distribution with constant variance 1.

Figure 2. Ordinary Linear Regression for y = -2.45 + 0.35 * x. The error term has mean 0 and a constant variance.
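Data like that in Figure 2 can be simulated and refitted in a few lines. The sketch below assumes NumPy and draws x uniformly from [0, 10]; the figure's x range is not stated, so that range is an assumption. It adds standard-normal noise and recovers a and beta with a least-squares solve. The estimates should land close to −2.45 and 0.35.

```python
import numpy as np

rng = np.random.default_rng(seed=1)
n = 2000
x = rng.uniform(0.0, 10.0, size=n)
y = -2.45 + 0.35 * x + rng.normal(0.0, 1.0, size=n)  # errors with mean 0, variance 1

# Design matrix with a column of ones so that the constant a is estimated too.
X = np.column_stack([np.ones(n), x])
(a_hat, beta_hat), *_ = np.linalg.lstsq(X, y, rcond=None)
print(a_hat, beta_hat)
```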

1.2.2 General Linear Regression

The general form of regression, called generalized linear regression, assumes that the data points come from a distribution whose mean is a monotonic nonlinear transformation of a linear function of the predictors. If we call this transformation g, the equation can be written as

$$y = g(a + \beta^T x) + \varepsilon,$$

where a is a constant, sometimes also denoted as b0, beta is a vector of the same size as our input variable x, and epsilon is the error term.

The inverse of g is called the link function. With generalized linear regression we no longer require the data points to follow a normal distribution. Instead, the response can follow another distribution from the exponential family, such as the binomial or the Poisson distribution.

Figure 3 shows a response with mean $y = \exp(-2.45 + 0.35x)$ that follows a Poisson distribution.

Figure 3. Generalized Linear Regression for a signal coming from a Poisson distribution with mean y = exp(-2.45 + 0.35 * x).
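Fitting the generalized model of Figure 3 needs an iterative method. A common choice is iteratively reweighted least squares (IRLS). The sketch below makes these assumptions:

- NumPy is available.
- The link is the log, so g = exp.
- x values are simulated from [0, 10].
- A fixed number of iterations replaces a convergence test.

It is an illustration of the technique, not the tutorial's own code.

```python
import numpy as np

rng = np.random.default_rng(seed=2)
n = 5000
x = rng.uniform(0.0, 10.0, size=n)
y = rng.poisson(np.exp(-2.45 + 0.35 * x))   # Poisson counts with mean exp(a + beta*x)

X = np.column_stack([np.ones(n), x])
mu = y + 0.5                                # crude positive starting means
eta = np.log(mu)
for _ in range(25):
    w = mu                                  # log link: IRLS weight equals the mean
    z = eta + (y - mu) / mu                 # working response
    beta = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * z))
    eta = X @ beta
    mu = np.exp(eta)
print(beta)                                 # should be close to [-2.45, 0.35]
```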

1.2.3 Logistic Regression

Although the linear regression model is simple and frequently used, it is not adequate for some purposes. For example, imagine the response variable y to be a probability that takes on values between 0 and 1. A linear model has no bounds on what values the response variable can take, so y can take on arbitrarily large or small values. However, it is desirable to bound the response to values between 0 and 1. For this we need something more powerful than linear regression.

Another problem with the linear regression model is the assumption that the response y has a constant variance. This cannot be the case if y follows, for example, a binomial distribution (y ~ Bin(n, p)). Suppose y is also normalized so that it takes values between 0 and 1, i.e. y = Bin(n, p)/n. Then the variance is Var(y) = p(1-p)/n, which depends on p and is largest at p = 0.5. For a single trial (n = 1) it takes values between 0 and 0.25. Assuming that y has a constant variance is therefore not feasible.
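The variance of the normalized binomial response can be checked exactly from the probability mass function. The helper below uses only the standard library; its name is illustrative. It shows that the variance matches p(1 − p)/n and therefore changes with p.

```python
from math import comb

def variance_of_proportion(n, p):
    """Exact Var(Y / n) for Y ~ Bin(n, p), computed from the probability mass function."""
    pmf = [comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(n + 1)]
    mean = sum(k / n * weight for k, weight in enumerate(pmf))
    return sum((k / n - mean) ** 2 * weight for k, weight in enumerate(pmf))

for p in (0.1, 0.3, 0.5, 0.7, 0.9):
    print(p, variance_of_proportion(10, p), p * (1 - p) / 10)
```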

In situations like this, when our response variable follows a binomial distribution, we need to use generalized linear regression. A special case of generalized linear regression is logistic regression, which assumes that the logit of the response probability is a linear function of the predictors. The logit function is shown in Figure 4.


Figure 4. The logit function. Note that it is only defined for values between 0 and 1.

The logit function goes from minus infinity to plus infinity. It has the nice property that logit(p) = -logit(1-p). Its inverse is defined for values from minus infinity to plus infinity and only takes on values between 0 and 1. To get a better understanding of what the logit function is, we will now introduce the notion of odds. The odds of an event that occurs with probability P are defined as odds = P / (1-P).

Figure 5 shows what the odds function looks like. As we can see, the odds of an event are not bounded and go from 0 to infinity as the probability of that event goes from 0 to 1.

However, it is not always very intuitive to think about odds. Even worse, odds are quite unappealing to work with due to their asymmetry. When we work with probabilities, if the probability of yes is p, then the probability of no is 1-p. For odds, however, no such simple relationship exists.

To take an example: if a Boolean variable is true with probability 0.9 and false with probability 0.1, then the odds of the variable being true are 0.9/0.1 = 9, while the odds of it being false are 0.1/0.9 = 1/9 = 0.1111... This is quite an unappealing relationship. However, if we take the logarithm of the odds, we get log(9) for true and log(1/9) = -log(9) for false.


Hence, we have a very nice symmetry for log(odds(p)). This function is called the logit-function.

logit(p) = log(odds(p)) = log(p/(1-p))

As we can see, it is true in general that logit(1-p)=-logit(p).

logit(1-p) = log((1-p)/p) = - log(p/(1-p)) = -logit(p)
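The odds and logit functions, and the symmetry logit(1 − p) = −logit(p), can be checked numerically with the 0.9 / 0.1 example above. The inverse function, which maps a linear expression back to a probability in (0, 1), is included for completeness. The sketch uses only the standard library, and the names are illustrative.

```python
import math

def odds(p):
    return p / (1.0 - p)

def logit(p):
    """log(p / (1 - p)); defined for 0 < p < 1."""
    return math.log(odds(p))

def inverse_logit(t):
    """Maps any real number back to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-t))

print(odds(0.9), odds(0.1))      # about 9 and about 1/9: asymmetric
print(logit(0.9), logit(0.1))    # about +2.197 and -2.197: symmetric
```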

Figure 5. The odds function. The odds function maps probabilities (between 0 and 1) to values between 0 and infinity.

The logit function has all the properties we wanted but lacked when we previously tried to use linear regression for a response variable that follows a binomial distribution. If we instead use the logit function, p is bounded to values between 0 and 1, and we still have a linear expression for our input variable x:

$$\operatorname{logit}(p) = a + \beta x.$$


If we would like to rewrite this expression to get a function for the probability p, it would look like

$$p = \frac{e^{a + \beta x}}{1 + e^{a + \beta x}} = \frac{1}{1 + e^{-(a + \beta x)}}.$$

1.2.4 Obtaining the Model Parameters

In practice, one usually simplifies notation somewhat by having only one parameter beta instead of both a and beta.

If our original problem is formulated as

$$\operatorname{logit}(p) = a + \beta x,$$

we rewrite this as

$$\operatorname{logit}(p) = \begin{bmatrix} 1 & x \end{bmatrix} \begin{bmatrix} a \\ \beta \end{bmatrix}.$$

If we now call $\beta' = [a\ \ \beta]^T$ and $x' = [1\ \ x]$, then we can formulate the exact same problem with only “one” model parameter beta':

$$\operatorname{logit}(p) = x' \beta'.$$

Note that this is nothing but a change of notation. We still have two parameters to determine, but the simplified notation means we only need to estimate beta'. From now on, we denote beta' as beta and x' as x. Our problem statement is hence to obtain the model parameter beta when

$$\operatorname{logit}(p) = x \beta.$$

If we have made n observations with responses y_i and predictors x_i, we can define

$$Y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}, \qquad X = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}.$$

The system we want to solve to find the parameter beta is then written as

$$X \beta = Y.$$

The minimum square error solution to this system is found as follows:

$$X \beta = Y \;\Rightarrow\; X^T X \beta = X^T Y \;\Rightarrow\; \beta = (X^T X)^{-1} X^T Y.$$

Evaluating the expression $(X^T X)^{-1} X^T Y$ gives the beta that minimizes the sum of squared residuals. In practice, however, evaluating this expression can cause computational difficulties, as we will see further on.
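In code, the change of notation amounts to prepending a column of ones to the inputs. The sketch below assumes NumPy and small made-up data. It solves the normal equations with np.linalg.solve rather than forming the inverse explicitly, which is usually the numerically safer choice, and cross-checks the result against np.linalg.lstsq.

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
Y = np.array([1.1, 2.9, 5.2, 7.1, 8.8])       # made-up responses

X = np.column_stack([np.ones_like(x), x])     # rows are x'_i = [1, x_i]
beta = np.linalg.solve(X.T @ X, X.T @ Y)      # solves (X^T X) beta = X^T Y
beta_lstsq, *_ = np.linalg.lstsq(X, Y, rcond=None)
print(beta, beta_lstsq)                       # both approximately [1.1, 1.96]
```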

1.2.5 Ridge Regression

As we have seen, we can obtain beta by simply evaluating

$$\beta = (X^T X)^{-1} X^T Y.$$

However, if some prediction variables are (almost) linearly dependent, then $X^T X$ is (almost) singular, and hence the variance of beta is very large. To avoid a singular $X^T X$, we add a small constant value to the diagonal of the matrix:

$$\beta = (X^T X + \lambda I)^{-1} X^T Y,$$

where I is the identity matrix and λ is a small positive constant.

This avoids the numerical problems of inverting an (almost) singular matrix. The price is that the prediction is now biased, so we are solving a slightly different problem. As long as the error due to the bias is smaller than the error from a (nearly) singular $X^T X$, we end up with a smaller mean square error, and hence ridge regression is desirable.

We can also see ridge regression as a minimization problem where we try to find a beta according to

$$\beta = \arg\min_{\beta} \lVert Y - X\beta \rVert^2 \quad \text{subject to} \quad \lVert \beta \rVert^2 \le s,$$

which we can rewrite (through a Lagrange multiplier) as an unconstrained minimization problem

$$\beta = \arg\min_{\beta} \lVert Y - X\beta \rVert^2 + \lambda \lVert \beta \rVert^2,$$

where a larger λ corresponds to a smaller bound s.

This can be compared to classic regression, where we are minimizing

$$\lVert Y - X\beta \rVert^2.$$

Now the problem is just to find a good λ (or s): small enough variance, without letting the bias error get too big. To find a good λ (or s) we can use heuristics, graphics or cross-validation. However, this can be computationally expensive, so in practice one might prefer to choose a small constant λ and normalize the input data so that, for every input variable j,

$$\frac{1}{n}\sum_{i=1}^{n} x_{ij} = 0 \quad \text{and} \quad \frac{1}{n}\sum_{i=1}^{n} x_{ij}^2 = 1.$$

In other words, we make sure x is centered and normalized.

Ridge regression has the advantage of preferring smaller coefficient values for beta, so we end up with a less complex model. This is desirable due to Occam’s razor: of two models that are equally good, it is preferable to pick the simpler one, since the simpler model is more likely to be correct and also to hold for new, unseen data.

Another way to get an intuition for why we prefer small coefficient values is the case of correlated attributes. Imagine two strongly correlated attributes: when either one takes the value 1, the other very likely does the same, and vice versa. It would then be possible for the coefficients of these two attributes to take extremely large values of equal magnitude but opposite sign, since they “cancel out” each other. This is of course undesirable whenever the attributes take different values, because X * beta then takes on ridiculously large values.
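The effect of correlated attributes is easy to reproduce. The sketch below (NumPy assumed, data simulated) builds two almost identical input columns and centers and normalizes them. Ordinary least squares tends to give such columns large coefficients of opposite sign. Adding λI to XᵀX, with an arbitrarily chosen λ = 1.0, pulls both coefficients towards similar, moderate values. No intercept column is used because both the inputs and the response are centered.

```python
import numpy as np

rng = np.random.default_rng(seed=3)
n = 100
x1 = rng.normal(size=n)
x2 = x1 + 1e-3 * rng.normal(size=n)          # nearly a copy of x1
y = x1 + x2 + rng.normal(size=n)

X = np.column_stack([x1, x2])
X = (X - X.mean(axis=0)) / X.std(axis=0)     # center and normalize each column
y = y - y.mean()

lam = 1.0                                    # illustrative choice of lambda
gram = X.T @ X
beta_ols = np.linalg.solve(gram, X.T @ y)
beta_ridge = np.linalg.solve(gram + lam * np.eye(2), X.T @ y)
print("OLS:  ", beta_ols)    # typically large values of opposite sign
print("ridge:", beta_ridge)  # two similar, moderate values
```

The ridge solution always has a norm no larger than the least-squares solution, which is the shrinkage described above.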

Ridge regression has proved superior to many alternative methods for avoiding numerical difficulties when solving the linear equation systems that arise in building logistic regression classifiers ([1], [2], [13]).


Ridge regression was first used in the context of least-squares regression in [15] and was later used in the context of logistic regression in [16].

1.2.6 Weighted Logistic Regression

As we have seen, in classic logistic regression we need to evaluate the expression

$$\beta = (X^T X)^{-1} X^T y.$$

This expression came from the linear equation system

$$X \beta = y.$$

Indirectly, we assumed that all observations were equally important and hence had the same weight, since we tried to minimize the (unweighted) sum of squared residuals.

However, when we do weighted logistic regression, we weight the importance of our observations so that different observations have different weights associated with them. We will have a weight matrix W that is a diagonal matrix with the weight of observation i at location Wii.

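Now, instead of evaluating

$$\beta = (X^T X)^{-1} X^T y$$

we will evaluate

$$\beta = (X^T W X)^{-1} X^T W U,$$

where U is computed from the current estimate of β,

$$U_i = x_i \beta + \frac{y_i - \mu_i}{W_{ii}},$$

and µi is our estimate for p, which we previously saw could be written as

$$\mu_i = \frac{e^{x_i \beta}}{1 + e^{x_i \beta}}.$$

The weights Wii are nothing but the variance of our own prediction. In general, if Y ~ Bin(n, p), then Var(Y) = np(1 − p), and since we have a Bernoulli trial we have n = 1, so the variance becomes

$$W_{ii} = \mu_i (1 - \mu_i).$$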


The term yi − µi is our prediction error, and dividing it by the variance Wii “scales” it so that a data point with a low variance has a larger impact on U than a data point with a high variance. In other words, the importance of correctly classifying data points with a low variance increases, while the importance of correctly classifying data points with a high variance decreases.
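Here is a minimal NumPy sketch of the quantities µi, Wii and Ui defined above. It is followed by one weighted least-squares step that solves (XᵀWX)β = XᵀWU. The function names and the tiny data set are hypothetical. Note that Ui divides by Wii, which becomes numerically unstable when µi is very close to 0 or 1.

```python
import numpy as np


def working_quantities(X, y, beta):
    """Return mu, the diagonal of W, and U for the current beta:

    mu_i = exp(x_i beta) / (1 + exp(x_i beta))   estimated probability
    W_ii = mu_i * (1 - mu_i)                     Bernoulli variance
    U_i  = x_i beta + (y_i - mu_i) / W_ii        prediction error scaled by 1 / W_ii
    """
    eta = X @ beta
    mu = 1.0 / (1.0 + np.exp(-eta))  # equal to exp(eta) / (1 + exp(eta))
    w = mu * (1.0 - mu)
    U = eta + (y - mu) / w
    return mu, w, U


def weighted_step(X, y, beta):
    """One weighted least-squares step: solve (X^T W X) beta_new = X^T W U."""
    _, w, U = working_quantities(X, y, beta)
    XtW = X.T * w  # X^T W, without building the n x n diagonal matrix
    return np.linalg.solve(XtW @ X, XtW @ U)


# Tiny hypothetical data set: a constant column plus one attribute.
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([0.0, 1.0, 0.0, 1.0])
mu, w, U = working_quantities(X, y, np.zeros(2))
beta_1 = weighted_step(X, y, np.zeros(2))
```

Starting from β = 0, every µi is 0.5 and every Wii is 0.25, so each Ui is +2 or -2 depending on yi.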

1.3 Solving Linear Equation Systems

We have now seen the theory behind the equation that we need to solve. With notation as before, we now want to compute

$$\beta = (X^T W X)^{-1} X^T W U.$$

However, so far we have not discussed the computational difficulties of doing this. One of the major differences between classical statistics and machine learning is that the latter deals with the computational difficulties one faces when trying to solve the equations obtained from classical statistics.

When one needs to evaluate an expression such as the one we have for beta, it is very common to write down the problem as a linear equation system that needs to be solved, to avoid having to calculate the inverse of a large matrix.

Hence, the problem can be rewritten as

$$A \beta = b,$$

where

$$A = X^T W X, \qquad b = X^T W U.$$

Let us now take a closer look at the problem we are facing.

What we want to achieve is to build a classifier that will classify very large data sets. Our input data is X and (indirectly) U. To get an idea of the sizes of our matrices, imagine our application having 100,000 classes and 100,000 attributes, and on average 10 training data points per class, that is, 1,000,000 training data points in total. The A matrix would then be of size 100,000 x 100,000 and the b vector of size 100,000 x 1, while the X matrix would be of size 1,000,000 x 100,000. Luckily for us, our data will be sparse, so only a small fraction of the elements will have non-zero values. Using this knowledge, we can choose an equation solver that is efficient under this assumption.
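To make these sizes concrete, the following back-of-the-envelope estimate compares storing X densely with storing it in a compressed sparse row (CSR) layout. The figure of 100 non-zero attributes per data point is a hypothetical assumption, not a number from the text. The byte counts assume 8-byte floating-point values and 4-byte integer indices.

```python
# Storage estimate for the example sizes in the text.
n_points = 100_000 * 10      # 100,000 classes x 10 training points per class
n_attributes = 100_000

dense_bytes = n_points * n_attributes * 8          # one float64 per entry
print(f"dense X: {dense_bytes / 1e9:,.0f} GB")

# Hypothetical sparsity: assume each data point has 100 non-zero attributes.
nnz_per_row = 100
nnz = n_points * nnz_per_row
# CSR layout: a value (8 B) and a column index (4 B) per non-zero,
# plus one row pointer (4 B) per row and one extra.
csr_bytes = nnz * (8 + 4) + (n_points + 1) * 4
print(f"sparse X (CSR): {csr_bytes / 1e9:.2f} GB")
```

Under these assumptions the dense matrix needs about 800 GB, while the sparse representation needs about 1.2 GB.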

1.3.1 Solving a Simple Linear Equation System


In general, when one needs to solve a linear equation system such as

$$A x = b,$$

one needs to choose a solver method appropriate to the properties of the problem. This basically means investigating which properties A satisfies (for example symmetry, positive definiteness or sparsity) and, based on that, choosing one of the many available solver methods. If one needs an iterative solver, which does not give an exact solution but is computationally efficient and in many cases the only practical alternative, [3] offers an extensive list of solvers that can be used.

1.3.2 Conjugate Gradient Method

For our application we are going to use the conjugate gradient (CG) method. It is a very efficient method for solving linear equation systems when A is a symmetric positive definite matrix, since we only need to store a limited number of vectors in memory.

When we solve a linear system with iterative CG, we use the fact that, for a symmetric positive definite matrix A, the solution of A x = b is identical to the solution of the minimization problem

$$\min_x \; f(x) = \tfrac{1}{2} x^T A x - b^T x.$$
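The equivalence follows from a short derivation that uses only the symmetry and positive definiteness of A:

$$\nabla f(x) = \tfrac{1}{2}\left(A + A^T\right)x - b = A x - b \quad (\text{since } A = A^T), \qquad \nabla^2 f(x) = A \succ 0.$$

$$\text{Hence } f \text{ is strictly convex, and its unique minimizer } x^\ast \text{ satisfies } \nabla f(x^\ast) = 0 \iff A x^\ast = b.$$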

The complete algorithm for solving using the CG algorithm is found in Figure 6.


Figure 6. The conjugate gradient method.

The conjugate gradient method can efficiently solve the equation system A * beta = b for symmetric positive definite sparse matrices A. With exact arithmetic, the CG algorithm in Figure 6 finds the correct solution x in at most m steps if A is of size m x m. However, we solve the system A * beta = b repeatedly, and A and b change after each outer iteration. For that reason, we do not run the CG algorithm until we have an exact solution for beta. We stop when beta is “close enough” to the correct solution, recalculate A and b using the new beta value, and run the CG algorithm again to obtain an even better beta value. So although the CG algorithm may require m steps to find the exact solution, terminating it early gives a significant speed-up.
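Figure 6 itself is not reproduced here. In its place is a minimal NumPy sketch of the standard, unpreconditioned CG iteration, which may differ in detail from the figure. The parameters `tol` and `max_iter` correspond to two of the termination criteria discussed in this section. The parameter `x0` allows warm-starting from the previous beta when A and b are recalculated. All names and default values are illustrative.

```python
import numpy as np


def conjugate_gradient(A, b, x0=None, tol=1e-8, max_iter=None):
    """Approximately solve A x = b for a symmetric positive definite A.

    Stops after max_iter iterations or once ||b - A x|| <= tol * ||b||.
    """
    n = b.shape[0]
    x = np.zeros(n) if x0 is None else np.asarray(x0, dtype=float).copy()
    if max_iter is None:
        max_iter = n                # at most n steps are needed in exact arithmetic
    r = b - A @ x                   # residual
    p = r.copy()                    # first search direction
    rs_old = r @ r
    b_norm = np.linalg.norm(b)
    for _ in range(max_iter):
        if np.sqrt(rs_old) <= tol * b_norm:
            break                   # residual small enough
        Ap = A @ p
        alpha = rs_old / (p @ Ap)   # exact line search along p
        x = x + alpha * p
        r = r - alpha * Ap
        rs_new = r @ r
        p = r + (rs_new / rs_old) * p  # new direction, A-conjugate to the previous ones
        rs_old = rs_new
    return x


rng = np.random.default_rng(1)
M = rng.normal(size=(30, 30))
A = M.T @ M + 30.0 * np.eye(30)     # symmetric positive definite test matrix
b = rng.normal(size=30)

x_full = conjugate_gradient(A, b)               # run to convergence
x_early = conjugate_gradient(A, b, max_iter=3)  # terminated early
```

Each iteration needs only one matrix-vector product and a handful of vectors, which is why the method suits large sparse A.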


How fast we reach a good solution depends on the eigenvalues of the matrix A. We stated earlier that m iterations are required to find the exact solution; more precisely, the number of iterations required is also bounded by the number of distinct eigenvalues of A. In most practical situations with large matrices, however, we simply iterate until the residual is small enough. Usually we get a reasonably good solution within 20-40 iterations.

Note that one might choose many different termination criteria for when we want to stop the CG algorithm. For example:

* Termination when we have reached a maximum number of iterations.

* Termination when the residual b − A * beta is small enough.

* Termination when the relative difference of the deviance between two iterations is small enough.

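The deviance for our logistic regression system is

$$\mathrm{DEV} = -2 \sum_{i} \left[ y_i \ln \mu_i + (1 - y_i) \ln (1 - \mu_i) \right],$$

where, as previously,

$$\mu_i = \frac{e^{x_i \beta}}{1 + e^{x_i \beta}}.$$

For an extensive investigation of how different termination criteria affect the resulting classifier accuracy, see [9].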
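Here is a small NumPy sketch of the deviance and of a relative-difference stopping test. The threshold `rel_tol = 1e-6` and the clipping constant are illustrative assumptions, not values from the text. Clipping µi away from 0 and 1 avoids taking the logarithm of zero.

```python
import numpy as np


def deviance(y, mu, eps=1e-12):
    """DEV = -2 * sum_i [ y_i ln(mu_i) + (1 - y_i) ln(1 - mu_i) ]."""
    mu = np.clip(mu, eps, 1.0 - eps)  # avoid log(0) when a prediction saturates
    return -2.0 * np.sum(y * np.log(mu) + (1.0 - y) * np.log(1.0 - mu))


def deviance_converged(dev_old, dev_new, rel_tol=1e-6):
    """Relative-difference criterion: |DEV_old - DEV_new| / DEV_new < rel_tol."""
    return abs(dev_old - dev_new) / dev_new < rel_tol


y = np.array([0.0, 1.0, 1.0, 0.0])
dev_uninformative = deviance(y, np.full(4, 0.5))   # every mu_i = 0.5
dev_better = deviance(y, np.array([0.2, 0.9, 0.7, 0.1]))
```

With every µi equal to 0.5, each observation contributes 2 ln 2, so four observations give 8 ln 2. Predictions closer to the labels give a smaller deviance.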

For the reader interested in more details about the conjugate gradient method and possible extensions to it, the authors would like to recommend [3], [7] and [8].

1.3.3 Solvers for Building Large Logistic Regression Classifiers

Many papers have investigated how one can build large-scale logistic regression classifiers with different linear equation solvers ([1], [4], [5], [6]). We will be using the conjugate gradient method for this task. It has previously been reported to be a successful method for building logistic regression classifiers that are large both in the number of attributes and in the number of data points ([1]). However, the computational complexity of calculating beta is high. Without an algorithm that builds the classifier in a distributed environment using the power of a large number of machines, it is therefore infeasible to build very large logistic regression classifiers.

The main contribution of this research will be to develop an efficient algorithm for building a very large scale logistic regression classifier using a distributed system.

1.3.4 How to Calculate beta

We have now gone through all the theory we need to build a large-scale logistic regression classifier. To obtain beta, we will use the iteratively reweighted least-squares method, also known as the IRLS method ([10]).

A complete algorithm for getting beta is shown in Figure 7.

Figure 7. Algorithm for obtaining beta (IRLS): the iteratively reweighted least-squares method.
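Figure 7 is not reproduced here. The following is a minimal NumPy sketch of IRLS for one class under stated assumptions. First, A includes a ridge term λI, as suggested by the ridge-regression discussion; the exact form used in the original may differ. Second, each system Aβ = b is solved by a bounded number of CG steps, warm-started from the previous beta. Third, the outer loop stops on the relative change of the deviance. All names and default values are illustrative. The sketch also computes b = XᵀWU as XᵀWη + Xᵀ(y − µ), with η = Xβ; this is algebraically identical but avoids dividing by Wii.

```python
import numpy as np


def cg_steps(A, b, x, n_steps):
    """Run at most n_steps conjugate gradient iterations on A x = b, starting at x."""
    r = b - A @ x
    p = r.copy()
    rs = r @ r
    b_norm = np.linalg.norm(b)
    for _ in range(n_steps):
        if np.sqrt(rs) <= 1e-12 * b_norm:  # residual small enough
            break
        Ap = A @ p
        alpha = rs / (p @ Ap)
        x = x + alpha * p
        r = r - alpha * Ap
        rs_new = r @ r
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x


def sigmoid(eta):
    return 1.0 / (1.0 + np.exp(-eta))


def deviance(y, mu):
    mu = np.clip(mu, 1e-12, 1.0 - 1e-12)
    return -2.0 * np.sum(y * np.log(mu) + (1.0 - y) * np.log(1.0 - mu))


def irls(X, y, lam=1.0, max_outer=50, cg_iters=40, rel_tol=1e-12):
    """Fit beta for one class with IRLS, solving each A beta = b by truncated CG.

    Assumption for this sketch: A = X^T W X + lam * I (a ridge term), b = X^T W U.
    """
    p = X.shape[1]
    beta = np.zeros(p)
    dev_old = deviance(y, sigmoid(X @ beta))
    for _ in range(max_outer):
        eta = X @ beta
        mu = sigmoid(eta)
        XtW = X.T * (mu * (1.0 - mu))          # X^T W, with W_ii = mu_i (1 - mu_i)
        A = XtW @ X + lam * np.eye(p)
        # X^T W U with U = eta + (y - mu) / W_ii, rewritten to avoid dividing by W_ii
        b = XtW @ eta + X.T @ (y - mu)
        beta = cg_steps(A, b, beta, cg_iters)  # warm start from the previous beta
        dev = deviance(y, sigmoid(X @ beta))
        if abs(dev_old - dev) <= rel_tol * dev:  # relative change of the deviance
            break
        dev_old = dev
    return beta


# Synthetic data for one class: a constant column plus two attributes.
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(200), rng.normal(size=(200, 2))])
true_beta = np.array([-0.5, 1.0, -2.0])
y = (rng.random(200) < sigmoid(X @ true_beta)).astype(float)
beta = irls(X, y, lam=1.0)
```

At convergence, the penalized likelihood is stationary, so Xᵀ(y − µ) = λβ. Running this once per class, with that class's yi values, gives one beta per class.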


Note that this gives us beta for one class only. We need to run this algorithm once for each class, so that we have one beta per class. If we, for example, have 100,000 classes, the algorithm would need to run 100,000 times, with different yi values each time. Running this code on a data set with around one million data points and on the order of 100,000 attributes could take on the order of 1 minute. For 100,000 classes, that adds up to roughly 100,000 minutes, or about 70 days, on a single machine. To be able to scale up our classifier, we therefore need an algorithm that can efficiently run this piece of code on distributed client machines.

1.4 Classification and Ranking

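1.4.1 Do Classification Using Logistic Regression

Once we have all beta values, we do classification by creating a matrix that we call the “weight matrix” (denoted W). Each row of this matrix is the beta vector of one class. This W is not to be confused with the diagonal weight matrix of Section 1.2.6.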

Say we have a data point x and we want to know which of the n classes it should belong to.

We have previously seen that the probability that data point x belongs to the class corresponding to beta is

$$P = \frac{e^{\beta x}}{1 + e^{\beta x}}.$$

Since this function increases monotonically with beta * x, the larger the value of beta * x, the stronger our belief that the data point x belongs to the class corresponding to beta. So to do classification, we only need to see which beta gives the highest value and choose that class as our best guess.

1.4.2 Do Ranking Using Logistic Regression

To do ranking, we do basically the same thing as for classification and compute the vector of scores

$$s = W x.$$

Hence, the score for class i will be betai * x, and we rank the classes so that the class with the highest score is ranked highest.
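Classification and ranking can be sketched in a few lines of NumPy. The weight matrix below is a hypothetical example with 4 classes and 3 attributes, where row i holds betai. The per-class probabilities come from separate one-class models and need not sum to 1. Because the logistic function is increasing, ordering the classes by probability gives the same result as ordering them by score.

```python
import numpy as np

# Hypothetical weight matrix: row i is beta_i for class i (4 classes, 3 attributes).
W = np.array([
    [0.5, -1.0, 2.0],
    [1.5, 0.0, -0.5],
    [-0.2, 0.8, 0.3],
    [0.0, 2.0, 1.5],
])
x = np.array([1.0, 0.5, 1.0])

scores = W @ x                                 # score_i = beta_i * x
probabilities = 1.0 / (1.0 + np.exp(-scores))  # logistic function of each score

best_class = int(np.argmax(scores))  # classification: the highest score wins
ranking = np.argsort(-scores)        # ranking: class indices, best first
```

Here the scores are 2.0, 1.0, 0.5 and 2.5, so class 3 is chosen and the ranking is 3, 0, 1, 2.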


LIST OF ABBREVIATIONS

LR Logistic Regression

NB Naïve Bayesian

CG Conjugate Gradient

IRLS Iteratively Reweighted Least-Squares

References [1] P. Komarek and A. Moore, Making Logistic Regression A Core Data Mining Tool: A Practical Investigation of Accuracy, Speed, and Simplicity. ICDM 2005.

[2] P. Komarek and A. Moore. Fast Robust Logistic Regression for Large Sparse Datasets with Binary Outputs. In Artificial Intelligence and Statistics, 2003.

[3] R. Barrett, M. Berry, T. F. Chan, J. Demmel, J. M. Donato, J. Dongarra, V. Eijkhout, R. Pozo, C. Romine and H. Van der Vorst. Templates for the Solution of Linear Systems: Building Blocks for Iterative Methods. Netlib Repository.

[4] Christopher J. Paciorek and Louise Ryan, Computational techniques for spatial logistic regression with large datasets. October 2005.

[5] P. Komarek and A. Moore, Fast Robust Logistic Regression for Large Sparse Datasets with Binary Outputs. 2003.

[6] P. Komarek and A. Moore, Fast Logistic Regression for Data Mining, Text Classification and Link Detection. 2003.

[7] Jonathan Richard Shewchuk, An Introduction to the Conjugate Gradient Method Without the Agonizing Pain. 1994.

[8] Loyce Adams and J. L. Nazareth (eds.), Linear and nonlinear conjugate gradient-related methods. 1996.

[9] Paul Komarek, Logistic Regression for Data Mining and High-Dimensional Classification.

[10] Holland, P. W., and R. E. Welsch, Robust Regression Using Iteratively Reweighted Least-Squares. Communications in Statistics: Theory and Methods, A6, 1977.

[11] J Zhang, R Jin, Y Yang, A. G. Hauptmann, Modified Logistic Regression: An Approximation to SVM and Its Applications in Large-Scale Text Categorization. ICML-2003.

[12] Information courtesy of The Internet Movie Database (http://www.imdb.com). Used with permission.


[13] Claudia Perlich, Foster Provost, Jeffrey S. Simonoff. Tree Induction vs. Logistic Regression: A Learning-Curve Analysis. Journal of Machine Learning Research 4 (2003) 211-255.

[14] T. Lim, W. Loh, Y. Shih. A Comparison of Prediction Accuracy, Complexity, and Training Time of Thirty-three Old and New Classification Algorithms. Machine Learning, 40, 203-229 (2000).

[15] A.E. Hoerl and R.W. Kennard. Ridge regression: biased estimates for nonorthogonal problems. Technometrics, 12:55–67, 1970.

[16] A. E. Hoerl, R. W. Kennard, and K. F. Baldwin. Ridge regression: some simulations. Communications in Statistics, 4:105–124, 1975.

[17] M. Ennis, G. Hinton, D. Naylor, M. Revow, and R. Tibshirani. A comparison of statistical learning methods on the GUSTO database. Statist. Med. 17:2501–2508, 1998.

[18] Tom M. Mitchell. Generative and discriminative classifiers: Naive Bayes and logistic regression. 2005.

[19] Gray A., Komarek P., Liu T. and Moore A. High-Dimensional Probabilistic Classification for Drug Discovery. 2004.

[20] Kubica J., Goldenberg A., Komarek P., Moore A., and Schneider J. A Comparison of Statistical and Machine Learning Algorithms on the Task of Link Completion. In KDD Workshop on Link Analysis for Detecting Complex Behavior. 2003.

About this document

The content of this HTML document comes from my Master’s thesis.

If you intend to cite this document, please cite it as “Rouhani-Kalleh, O. Analysis, Theory and Design of Logistic Regression Classifiers Used For Very Large Scale Data Mining. Master Thesis. 2006.”

For contact information, please see ( http://www.OmidRouhani.com/contact.html ).


1 Introduction

Figure 1. The logistic curve P(Z) (horizontal axis Z from −4 to 4, vertical axis P(Z) from 0 to 1).

The sigmoid curve of Figure 1 is traced by the logistic function

$$P(Z) = \frac{\exp Z}{1 + \exp Z}. \qquad (1)$$

P behaves like the distribution function of a symmetrical density, with mid-point zero; as Z moves through the real number axis, P rises monotonically between the bounds of zero and 1. The meaning of this function varies according to the definition of the variables. In the logit version of bio-assay, P is the probability of a binary outcome (the survival or death of an organism), and Z = α + βX, with X a continuous stimulus or exposure variable (like the dosage of an insecticide); α determines the location of the curve on the X-axis, and β its slope. In logistic regression there are several determinants of P, and Z = xᵀβ, with x a vector of covariates (including a unit constant) and β their coefficients. But the logistic function was originally designed to describe the course of a proportion P over time t, with Z = α + βt; it is a growth curve, since P(t) rises monotonically with t.

Over a fairly wide central range, for values of P from .3 to .7, the shape of the logistic curve closely resembles the normal probability distribution function. The two functions

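$$P_l(x) = \frac{\exp(\beta x)}{1 + \exp(\beta x)} \qquad (2)$$

and

$$P_n(x) = \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{x} \exp\left[-\tfrac{1}{2}(u/\sigma)^2\right] du \qquad (3)$$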

both pass through the point (0, .5), and they can be made almost to coincide upon a suitable adjustment of β and σ. This is a sheer algebraic coincidence, for there appears to be no intrinsic relation between the two forms.
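To see how closely (2) and (3) can be made to coincide, the following Python sketch compares them numerically using only the standard library. The slope β ≈ 1.702 for σ = 1 is a well-known matching constant from the later literature, not a value given in the text. With that slope, the largest difference on the grid stays below 0.01. With no adjustment (β = 1), the gap exceeds 0.05.

```python
import math


def logistic_cdf(x, beta):
    """P_l(x) of equation (2), written as 1 / (1 + exp(-beta x))."""
    return 1.0 / (1.0 + math.exp(-beta * x))


def normal_cdf(x, sigma):
    """P_n(x) of equation (3), via the error function."""
    return 0.5 * (1.0 + math.erf(x / (sigma * math.sqrt(2.0))))


grid = [i / 100.0 for i in range(-600, 601)]   # x from -6 to 6


def max_gap(beta, sigma):
    """Largest absolute difference between (2) and (3) on the grid."""
    return max(abs(logistic_cdf(x, beta) - normal_cdf(x, sigma)) for x in grid)


gap_matched = max_gap(1.702, 1.0)   # slope adjusted to sigma = 1
gap_unadjusted = max_gap(1.0, 1.0)  # no adjustment
print(f"beta = 1.702: {gap_matched:.4f}   beta = 1: {gap_unadjusted:.4f}")
```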

2 The origins of the logistic function

The logistic function was invented in the 19th century for the description of the growth of populations and the course of autocatalytic chemical reactions, or chain reactions. In either case we consider the time path of a quantity W(t) and its growth rate

$$\dot W(t) = dW(t)/dt. \qquad (4)$$

The simplest assumption is that Ẇ(t) is proportional to W(t),

$$\dot W(t) = \beta W(t), \qquad \beta = \dot W(t)/W(t), \qquad (5)$$

with β the constant rate of growth. This leads of course to exponential growth

$$W(t) = A \exp \beta t,$$

where A is sometimes replaced by the initial value W(0). With W(t) the human population of a country, this is a model of unopposed growth; as Malthus (1798) put it, a human population, left to itself, will increase in geometric progression. It is a reasonable model for a young and empty country like the United States in its early years.¹ Like many others, Alphonse Quetelet (1795–1874), the Belgian astronomer turned statistician, was well aware that the indiscriminate extrapolation of exponential growth must lead to impossible values. He experimented with several adjustments of (5) and also asked his pupil, the mathematician Pierre-François Verhulst (1804–1849), to look into the problem.

¹ Two hundred years later exponential growth played a major part in the Report to the Club of Rome of Meadows, Meadows, Randers, and Behrens (1972), and it is still implicit in many economic analyses.

Like Quetelet, Verhulst approached the problem by adding an extra term to (5) to represent the increasing resistance to further growth, as in

$$\dot W(t) = \beta W(t) - \phi(W(t)), \qquad (6)$$

and then experimenting with various forms of φ. The logistic appears when this is a simple quadratic, for in that case we may rewrite (6) as

$$\dot W(t) = \beta W(t)\,(\Omega - W(t)), \qquad (7)$$

where Ω denotes the upper limit or saturation level of W, its asymptote as t → ∞. Growth is now proportional both to the population already attained W(t) and to the remaining room for further expansion Ω − W(t). If we express W(t) as a proportion P(t) = W(t)/Ω, this gives (with the constant factor Ω absorbed into β)

$$\dot P(t) = \beta P(t)\,(1 - P(t)), \qquad (8)$$

and the solution of this differential equation is

$$P(t) = \frac{\exp(\alpha + \beta t)}{1 + \exp(\alpha + \beta t)}, \qquad (9)$$

which Verhulst named the logistic function. The population W(t) then follows

$$W(t) = \Omega \, \frac{\exp(\alpha + \beta t)}{1 + \exp(\alpha + \beta t)}. \qquad (10)$$
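A quick numerical check, in Python, that (9) solves (8). The parameter values α = −2 and β = 0.3 are illustrative and do not come from the text. A central finite-difference derivative of P(t) is compared with βP(t)(1 − P(t)) at several times.

```python
import math

alpha, beta = -2.0, 0.3   # illustrative values, not taken from the text


def P(t):
    """Equation (9)."""
    z = alpha + beta * t
    return math.exp(z) / (1.0 + math.exp(z))


h = 1e-5
for t in (-10.0, 0.0, 5.0, 20.0):
    dP_dt = (P(t + h) - P(t - h)) / (2.0 * h)   # central difference
    rhs = beta * P(t) * (1.0 - P(t))            # right-hand side of (8)
    print(f"t = {t:6.1f}   dP/dt = {dP_dt:.8f}   beta P (1 - P) = {rhs:.8f}")
```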

Verhulst published his suggestions between 1838 and 1847 in three papers. The first is a brief note in the Correspondance Mathématique et Physique, edited by Quetelet, in 1838. It contains the essence of the argument in four small pages, followed by a demonstration that the curve agrees very well with the actual course of the population of France, Belgium, Essex and Russia for periods up to 1833; Verhulst explains that he did his research a couple of years before, that he did not have the time for an update and that he publishes this note only at the insistence of Quetelet. He does not say how he fitted the curves. The second paper, in the Proceedings of the Belgian Royal Academy of 1845, is a much fuller account of the function and its properties. Here Verhulst names it the logistic, without further explanation: in a neat diagram, the courbe logistique is drawn alongside the courbe logarithmique, which we would nowadays call the exponential. Verhulst also determines the three parameters Ω, α and β of (10) by making the curve pass through three observed points. With data for some twenty or thirty years only this is a hazardous method, as is borne out by the resulting estimates of the limiting population Ω. Employing the known values of the Belgian population in 1815, 1830 and 1845, Verhulst finds a limiting population of 6.6 million for that country, and in a similar exercise 40 million for France: at present these populations number 10.2 and 58.7 million. In 1847 there followed a second paper in the Proceedings, which is chiefly notable for an adjustment of the correction term that leads to a much better estimate of 9.5 millions for the Belgian Ω.
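The three-point method can be sketched in Python for the case of equally spaced observation times. Verhulst's Belgian years 1815, 1830 and 1845 are 15 years apart. The sketch uses synthetic observations from a known curve, not Verhulst's data, and the function name is hypothetical. Since 1/W(t) = (1 + exp(−(a + bt)))/Ω, the quantities 1/yk − 1/Ω form a geometric sequence, and that fixes Ω in closed form. The formula breaks down when y1² = y0y2, that is, when the three points look exactly exponential. This helps explain why a short data span makes the method hazardous: a 1% error in the last observation shifts the estimated Ω by roughly 6% here.

```python
import math


def logistic_through_three_points(y0, y1, y2, h, t0=0.0):
    """Return (Omega, a, b) such that W(t) = Omega * exp(a + b t) / (1 + exp(a + b t))
    passes exactly through (t0, y0), (t0 + h, y1) and (t0 + 2h, y2).

    Uses 1/W(t) = (1 + exp(-(a + b t))) / Omega: for equally spaced times the
    terms 1/y_k - 1/Omega form a geometric sequence, which fixes Omega.
    """
    omega = (y1 ** 2 * (y0 + y2) - 2.0 * y0 * y1 * y2) / (y1 ** 2 - y0 * y2)
    r = (1.0 / y1 - 1.0 / omega) / (1.0 / y0 - 1.0 / omega)   # r = exp(-b h)
    b = -math.log(r) / h
    a = -math.log(omega / y0 - 1.0) - b * t0                  # from y0 = W(t0)
    return omega, a, b


def W(t, omega, a, b):
    """Equation (10) with parameters (Omega, alpha, beta) = (omega, a, b)."""
    return omega * math.exp(a + b * t) / (1.0 + math.exp(a + b * t))


# Synthetic "observations" from a known curve, 15 time units apart.
true_params = (10.0, -1.0, 0.05)
obs = [W(t, *true_params) for t in (0.0, 15.0, 30.0)]
fitted = logistic_through_three_points(*obs, h=15.0)

# A 1% error in the last observation moves the estimated limit noticeably.
perturbed = logistic_through_three_points(obs[0], obs[1], 1.01 * obs[2], h=15.0)
print(fitted, perturbed[0])
```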

Verhulst was in poor health and died in 1849. He was primarily a mathematician (professor of mathematics at the Belgian Military Academy) but sensitive to social and political issues. In his obituary of Verhulst, Quetelet (1850) attributes his early death to overwork and, rather curiously, to his great stature, as Verhulst was 1.89 meters (over six feet) tall. His discovery of the logistic curve was not taken up with much enthusiasm by Quetelet; as Vanpaemel (1987) has shown, the two men did not see eye to eye on the question of population growth. This may in part account for some curious elements in Quetelet’s obituary; while ostensibly praising his lamented pupil, Quetelet stresses his impulsive nature and depicts him as a somewhat silly man. Quetelet recounts at length Verhulst’s adventures in Rome. Verhulst was staying in that city in the summer of 1830, when the news broke of the revolution in Paris and of the Belgian secession from the Netherlands. These events moved him strongly and set him drafting a democratic constitution for the Papal State. He submitted this document to some cardinals he had met, who expressed great interest; still the police were called in, and Verhulst banished from Rome. He left under somewhat dramatic circumstances, having at first barricaded his apartment with the intention of withstanding a siege by the forces of law and order. But then he was only 26 years old at the time.

Quetelet did not pay much attention to the logistic curve in his writings; it is barely mentioned, in an aside, in Quetelet (1848). But Verhulst’s work was quoted with approval by Liagre (1852), his colleague at the Military Academy, and in the second edition of Liagre’s textbook Camille Peney repeats the estimation of Ω for Belgium on the basis of more recent population figures, arriving at a value of 13.7 millions.


As a model of population growth the logistic function was discovered anew in 1920 by Pearl and Reed. They were unaware of Verhulst’s work (though not of the curves for autocatalytic reactions discussed presently), and they arrived independently at the logistic curve of (10). When this was fitted to Census figures of the U.S., again by making the curve pass through three points, it gave a good fit for the period from 1790 to 1910. But the estimate of Ω of 197 millions once more compares badly with the present value of about 270 millions. Along with the pursuit of many other interests, Pearl and his collaborators in the next twenty years went on to apply the logistic growth curve to almost any living population from fruit flies to the human population of the French colonies in North Africa as well as to the growth of cantaloupes.

In 1920, Raymond Pearl (1879–1940) had just been appointed Director of the Department of Biometry and Vital Statistics at Johns Hopkins University, and Lowell J. Reed (1886–1966) was his deputy (and his successor when a few years later Pearl was promoted to Professor of Biology). Pearl was trained as a biologist and acquired his statistics as a young man by spending the year 1905–1906 in London with Karl Pearson (and later quarrelling with him). He became a prodigious investigator and a prolific writer on a wide variety of phenomena like longevity, fertility, contraception, and the effects of alcohol and tobacco consumption on health, all subsumed under the heading of human biology. During World War I Pearl worked in the U.S. Food Administration, and this may account for his preoccupation with the food needs of a growing population in the 1920 paper. Reed, who was trained as a mathematician, made a quiet career in biostatistics; he excelled as a teacher and as an administrator, and was brought back in 1953 from retirement to serve as President of Johns Hopkins. Among his publications in the aftermath of the 1920 paper with Pearl is an application of the logistic curve to autocatalytic reactions, Reed and Berkson (1929). We shall hear more about this co-author in the next section.

Verhulst’s work was rediscovered soon after Pearl and Reed’s first paper of 1920. The immediate sequel, Pearl and Reed (1922), does not mention it; Verhulst’s priority is first acknowledged in a footnote in Pearl (1922), and, at greater length, in Pearl and Reed (1923). In this paper, Pearl and Reed call Verhulst’s papers “long since forgotten”, except for a single article by Du Pasquier (1918), and they then go out of their way to criticize that author for an “entirely unjustified and in practice usually incorrect modification” of Verhulst’s formula, without substantiating this harsh judgment. In fact Du Pasquier’s paper is a harmless reflection on four mathematical theories of population, of a very formal and abstract character to the point of inanity. The four theories are ascribed to Halley, de Moivre, Euler and Verhulst, and these authors are briefly introduced; Halley, for example, as “the famous astronomer”, and Verhulst, rather oddly, as “a Belgian who died in 1847”. No references are given.

Louis-Gustave Du Pasquier (1876–1957), Professor of Mathematics at the University of Neuchâtel, took his degrees in mathematics in Zurich, but followed courses in the social sciences as well and spent the year 1900–1901 in Paris, taking courses at a variety of academic institutions. He may well have read about Verhulst in Liagre or elsewhere in the French literature, but I have been unable to find a useful reference to this effect in his textbook of probability (1926). It is also not clear how Pearl learned about Verhulst, or, for that matter, about Du Pasquier.²

The next important publication is Yule’s Presidential Address to the Royal Statistical Society of 1925. Yule, who says he owes the reference to Pearl (1922), treats Verhulst much more handsomely than Pearl and Reed did, devoting an appendix to his work. Yule is also the first author to revive the name logistic, which is not used by Liagre or Du Pasquier nor by Pearl and Reed in their earlier references. By 1924, however, “logistic” is used as a commonplace term in the correspondence between Pearl and Yule, who were lifelong friends. It would take until 1933 for Miner (a collaborator of Pearl) to pay tribute to Verhulst, if in an oblique way: instead of reproducing at least one of Verhulst’s papers, Miner translates Quetelet’s obituary, and emphasises Verhulst’s Roman imbroglio by adding an extract from the memoirs of Queen Hortense de Beauharnais recording this episode.

As we have already hinted, there is another early root of the logistic function in chemistry, where it was employed (again with some variations) to describe the course of autocatalytic or chain reactions, where the product itself acts as a catalyst for the process while the supply of raw material is fixed. This leads naturally to a differential equation like (8) and hence to the logistic function for the time path of the amount of the reaction product. The review of the application of logistic curves to a number of such processes by Reed and Berkson (1929) quotes work of the German professor of chemistry Wilhelm Ostwald of 1883. Authors like Yule (1925) and Wilson (1925) were well aware of this strand of the literature.

² The Pearl archives at the American Philosophical Society in Philadelphia contain Pearl’s correspondence with several hundred individuals, but Du Pasquier is not among them.

The basic idea of logistic growth is simple and effective, and it is used to this day to model population growth and market penetration of new products and technologies. The introduction of mobile telephones is an autocatalytic process, and so is the spread of many new products and techniques in industry.

3 The invention of the probit and the advent of the logit in bio-assay

The invention of the probit model is usually credited to Gaddum (1933) and Bliss (1934a, 1934b), but one look at the historical section of Finney (1971) or indeed at Gaddum’s paper and his references will show that this is too simple. The roots of the method and in particular the transformation of frequencies to equivalent normal deviates can be traced to the German scholar Fechner (1801–1887). Stigler (1986) recounts how Fechner was drawn to study human responses to external stimuli by experimental test of the ability to distinguish differences in weight. The issue of the variability of human responses had been raised by astronomers, who relied on human observers of celestial phenomena and found that their readings showed much unaccountable variation. Fechner recognized that human response to an identical stimulus is not uniform, and he was the first to transform observed differences to equivalent normal deviates. The historical sketches of Finney (1971), Ch. 3.6, and of Aitchison and Brown (1957), Ch. 1.2, record a long line of largely independent rediscoveries of this approach that spans the seventy years from Fechner (1860) to the early 1930’s when Gaddum and Bliss published their contributions. Both authors regard the assumption of a normal distribution as commonplace, and attach more importance to the logarithmic transformation of the stimulus. Their papers contain no major innovations, but they mark the emergence of a standard paradigm of bio-assay and of a new terminology. Gaddum wrote a comprehensive and authoritative report with the emphasis on practical aspects of the experiments and on the statistical interpretation of bio-assay, giving several worked examples from the medical and pharmaceutical literature. Bliss published two brief notes in Science, introducing the term probit; he followed this up with a series of articles setting out the maximum likelihood estimation of the probit curve, in one instance with assistance from R.A. Fisher, Bliss (1935). Both Gaddum and Bliss set standards of estimation; until the 1930’s this was largely a matter of ad hoc numerical and graphical adjustment of curves to categorical data.

John Henry Gaddum (1900–1965) studied medicine at Cambridge but failed his final examinations. He turned to pharmacology and worked under Trevan at the Wellcome Laboratories, then transferred to the National Institute for Medical Research (where he wrote the 1933 report) before embarking on an academic career of professorships in pharmacology in Cairo, London and Edinburgh. He was elected to the Royal Society in 1945 and knighted in 1964. To this day the British Pharmacological Society awards an annual Gaddum Memorial Prize for pharmaceutical research. Chester Ittner Bliss (1899–1979) trained as an entomologist at Ohio State University and was a field worker with the U.S. Department of Agriculture until this employment was terminated in 1933. He then spent two years in London studying statistics with R.A. Fisher, and Fisher found him a job as a statistician in Leningrad, where he lived from 1936 to 1938. The political conditions were not propitious for serious research. Bliss returned to the United States and joined the Connecticut Agricultural Experiment Station, combining his work as a practising statistician with a lectureship at Yale from 1942 until his retirement. He played an important role in the founding of the Biometric Society.

In their early writings on bio-assay both authors adhere firmly to the classical model of bio-assay, where the stimulus is determinate and responses are random because of the variability of individual tolerance levels. Bliss introduced the term probit (short for ’probability unit’) originally as a convenient scale for normal deviates, but abandoned this within a year in favour of a different definition which has since been generally accepted. For any (relative) frequency f there is an equivalent normal deviate Z such that the cumulative normal distribution at Z equals f; Z is the solution of

$$ f = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{Z} \exp\left(-\tfrac{1}{2}u^{2}\right)\,du, \qquad (11) $$

and this can be read off from a table of the normal distribution. The probit of f is this equivalent normal deviate Z, or initially Z increased by 5; this ensures that the probit is almost always positive, which facilitates calculation. In the 1930s such additive constants were a common device. In the probit method, probits of relative frequencies or of probabilities f are linearly related to (the logarithm of) the stimulus.
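A modern stand-in for reading the equivalent normal deviate off a printed table is an inverse normal distribution function. The sketch below assumes Python 3.8 or later for `statistics.NormalDist`; the function name `probit` and its `shift` parameter are illustrative, with `shift=5.0` mimicking the early convention of adding 5 and `shift=0.0` returning Z itself.

```python
from statistics import NormalDist

_STANDARD_NORMAL = NormalDist(mu=0.0, sigma=1.0)


def probit(f: float, shift: float = 5.0) -> float:
    """Equivalent normal deviate Z of a relative frequency f, plus `shift`.

    Z solves f = Phi(Z), with Phi the standard normal distribution function
    of equation (11). shift=5.0 mimics the historical 'Z increased by 5'.
    """
    if not 0.0 < f < 1.0:
        raise ValueError("f must lie strictly between 0 and 1")
    z = _STANDARD_NORMAL.inv_cdf(f)  # plays the role of the printed normal table
    return z + shift


# Hypothetical mortality frequencies at increasing doses.
for f in (0.1, 0.5, 0.9):
    print(f, round(probit(f), 3))
```

With the shift of 5 the probit stays positive unless f falls below roughly 3 in 10 million, which is why the constant made hand calculation easier.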

The acceptance of the probit method was aided by the articles of Bliss, who published regularly in this field until the 1950s, and by Finney and others (Gaddum returned to pharmacology). The full flowering of this school probably coincides with the first edition of Finney’s monograph in 1947. Stripped of the underlying theory of bio-assay, probit analysis was soon used for any relation of a discrete binary outcome to one or more determinants. In economics and market research, for example, the first applications appear as early as the 1950s: Farrell (1954) uses a probit model for the ownership of cars of different vintage as a function of household income, and Adam (1958) fits lognormal demand curves to survey data on the willingness to buy cigarette lighters and the like at various prices. The classic monograph on the lognormal distribution of Aitchison and Brown (1957) brought probit analysis to the notice of a wider audience of economists.

As far as I can see, the introduction of the logistic as an alternative to the normal probability function is the work of a single person, namely Joseph Berkson (1899–1982), Reed’s co-author of the 1929 paper on autocatalytic functions. Berkson read physics at Columbia, then went to Johns Hopkins for his M.D. and a doctorate in statistics in 1928. He stayed on as an assistant for three years, and this is when he collaborated with Reed on autocatalytic functions. Berkson then moved to the Mayo Clinic, where he remained for the rest of his working life as chief statistician. In the 1930s he published numerous papers on medical and public health matters, but in 1944 he turned his attention to the statistical methodology of bio-assay and proposed the use of the logistic instead of the normal probability function of (11), coining the term ’logit’ by analogy to the ’probit’ of Bliss (for which he was initially much derided). As we have indicated earlier, the two functions are almost indistinguishable. By the inverse of the logistic function (1) we have

$$ \operatorname{logit}(P) = \log\frac{P}{1-P} = Z, \qquad (12) $$

which is of course much simpler than the definition of the probit in (11). The issue of logit versus probit was tangled by Berkson’s simultaneous attacks on the method of maximum likelihood and his advocacy of minimum chi-squared estimation instead. Between 1944 and 1980 he wrote a large number of papers on both issues; examples are Berkson (1951) and Berkson (1980). He often adopted a somewhat provocative style, and much controversy ensued.
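To make the contrast concrete: the logit of (12) and its inverse, the logistic function (1), need only a logarithm or an exponential, whereas the probit requires inverting the normal integral of (11). A minimal sketch in Python; the function names are illustrative and not taken from any particular package.

```python
import math


def logit(p: float) -> float:
    """Equation (12): the log-odds log(P / (1 - P))."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return math.log(p / (1.0 - p))


def logistic(z: float) -> float:
    """Equation (1): P(Z) = exp(Z) / (1 + exp(Z)), the inverse of the logit."""
    # Rewritten as 1 / (1 + exp(-z)) for z >= 0 so that large z cannot overflow.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


for p in (0.1, 0.5, 0.9):
    z = logit(p)
    print(p, round(z, 4), round(logistic(z), 10))
```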

The close resemblance of the logistic to the normal distribution function must have been common knowledge among those who were familiar with the logistic; it had been demonstrated by Wilson (1925) and written up by Winsor (1932) (another collaborator of Pearl). Wilson was probably the first to publish an application of the logistic curve in bio-assay, in Wilson and Worcester (1943), just before Berkson (1944). But it was Berkson who persisted and fought a long and spirited campaign which lasted for several decades.

Berkson’s suggestion was not well received by the biometric establishment. In the first place, the logit was regarded as somewhat inferior and disreputable because, unlike the probit, it cannot be related to an underlying (normal) distribution of tolerance levels. Aitchison and Brown (1957) dismiss the logit in a single sentence, because it “lacks a well-recognized and manageable frequency distribution of tolerances which the probit curve does possess in a natural way” (p. 72). Berkson was aware of this defect and tried to remedy it by adapting the autocatalytic argument in Berkson (1951), but this did not convince, as this argument essentially deals with a process over time. In retrospect it is surprising that so much importance was attached to these somewhat ideological points of interpretation. At the time no one (not even Berkson) seems to have recognized the formidable power of the logistic’s analytical properties. In the second place, Berkson’s case for the logit was not helped by his simultaneous attacks on the established wisdom of maximum likelihood estimation and his advocacy of minimum chi-squared. The unpleasant atmosphere in which this discussion was conducted can be gauged from the acrimonious exchanges between R.A. Fisher and Berkson in Fisher (1954).

In the practical aspect of ease of computation the logit had a clear advantage over the probit, even with maximum likelihood estimation. To quote Cochran (from his comments on Fisher (1954), p. 147): “... the speed with which a new technique becomes widely used is considerably influenced by the simplicity or otherwise of the calculations that it requires. Next door to the lecture room in which the probit method is expounded one may still find the laboratory in which the workers compute their LD 50s by the [much less sophisticated] Behrens (Reed–Muench) method ...”. On this count the logit spread much more quickly in workfloor practice than in academic discourse. Until the advent of the computer and the pocket calculator, some twenty years later, all numerical work was done by hand, that is with pencil and paper, sometimes aided by graphical inspection of ’freehand curves’, ’fitted by eye’. For probit and logit analyses of grouped data or class frequencies there was graph paper with a special grid on which a probit or logit curve would appear as a straight line. Wilson (1925) had introduced the logistic (or ’autocatalytic’) grid, and examples of lognormal paper can be found in Aitchison and Brown (1957) and Adam (1958);[3] Berkson himself had designed logistic graph paper as well as several nomograms.[4] Numerical work was supported rather feebly by the slide rule and by mechanical calculating machines, driven by hand or powered by a small electric motor, which were capable of addition and multiplication; punched card equipment was helpful if numerous data had to be analysed. Values of the normal distribution (and of exponentials and logarithms) were obtained from printed tables like Pearson’s Biometrika Tables or the Statistical Tables of Fisher and Yates (1938). From the first edition the latter carried specially designed tables for probit analysis (with auxiliary tables contributed by Bliss and by Finney), and from the fifth edition of 1957 onwards they also included special tables for logit analysis.

In time, the ideological conflict over bio-assay abated. Finney, who had ignored the logit in the second edition of his textbook of 1952, made amends in the third edition of 1971, recognizing (somewhat belatedly) that “what matters is the dependence of P on dose and the unknown parameters, and the tolerance distribution is merely a substructure leading to this” (p. 47). In fact the narrow conflict between probit and logit in bio-assay had long been overtaken by independent developments in statistics and biometrics.

4 The ascent of the logit

When the ideological debate about logit and probit in bio-assay had abated, around 1960, the logit terminology and the logit transformation of (12) were soon much more widely adopted, and their origins forgotten. An accurate history of the adoption and further development of the logit would require an intimate knowledge of several quite distinct disciplines, for many new generalizations were introduced independently and in almost complete isolation in completely unrelated applied work. We shall here only briefly touch upon some major movements in statistics, in epidemiology, and in the social sciences and econometrics, without attempting a systematic treatment.

The earliest developments took place in the late 1950s and the 1960s in statistics and epidemiology. In statistics, the analytical advantages of the logit transformation as a means of dealing with discrete binary outcomes were soon recognized. Cox was among the first to explore (and exploit) these possibilities; he wrote a series of papers around 1960 and followed these up with an influential textbook in 1969. The logit model of bio-assay is easily generalized to logistic regression, where binary outcomes are related to a number of determinants without a specific theoretical background, and this statistical model proved as fertile as linear regression had been in an earlier era. Later, the link of the logistic model with discriminant analysis was recognized, as was its ready association with loglinear models in general. In epidemiology, case-control studies began even earlier, and since these are directly concerned with odds and odds ratios, the log-odds or logit transformation arises naturally. The practice had called for a theoretical justification, especially of the sampling aspects, from an early date; see, for example, the work of Cornfield in the early 1950s.

[3] Finney (1971) traces the invention of the probability grid to a French artilleryman of the late 1890s.

[4] A nomogram is a graph from which one can read off a transformation, as from a table; sophisticated nomograms may permit the quick solution of more complicated equations.

Table 1. Number of articles in statistical journals containing the word ’probit’ or ’logit’.

Period     probit   logit
1935–39        6       -
1940–44        3       1
1945–49       22       6
1950–54       50      15
1955–59       53      23
1960–64       41      27
1965–69       43      41
1970–74       48      61
1975–79       45      72
1980–84       93     147
1985–89       98     215
1990–94      127     311

The ascent of the logit in the statistical literature is illustrated in Table 1, which is drawn from the JSTOR electronic archive of major statistical journals in the English language.[5] The table shows the number of articles which contain the word “probit” or “logit”. It must be borne in mind that the overall number of articles in these journals increases substantially over time; from 1935 to 1985 it increased about eightfold. Up to around 1970 the relative numbers show the predominance of probit in bio-assay; then logit soars ahead, not because of the after-effects of a victory over probit in bio-assay, but because of its much wider use in statistical theory and applications generally.

[5] These are all the journals of the Royal Statistical Society and of the American Statistical Association; the Annals of Applied Probability, Annals of (Mathematical) Statistics, Annals of Probability, Biometrics, Biometrika and Statistical Science.

Until about 1980 computational effort was still an important issue in the discussion of statistical techniques, but by then the computer revolution had put an end to this. On the specific issue of estimating logit and probit analyses, maximum likelihood estimation became the norm when routines for this method, applicable to individual data, were included in commercial statistical program packages. This facility was probably first offered by the BMDP (biomedical data processing) program of 1977. By the time Hosmer and Lemeshow (1989) published the first comprehensive textbook with medical applications, the use of such routines was taken for granted. Of the two causes Berkson advocated, minimum chi-squared estimation was effectively overtaken by the computer revolution, while the logit transformation of (12) was triumphant.
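As an illustration of what such a routine does for individual (ungrouped) data, here is a minimal Newton–Raphson sketch of maximum likelihood for a single-regressor logit, P(y = 1) = logistic(α + βx). It is a textbook algorithm written from scratch, not the BMDP routine or any package’s actual implementation; the data are hypothetical and must not be perfectly separable, or the estimates diverge.

```python
import math


def fit_logit_ml(x, y, iterations=25, tol=1e-10):
    """Maximum likelihood (alpha, beta) for P(y=1) = 1 / (1 + exp(-(alpha + beta*x))).

    Each Newton-Raphson step solves I * step = score, where I is the
    2x2 information matrix and score the gradient of the log-likelihood.
    """
    a = b = 0.0
    for _ in range(iterations):
        ga = gb = 0.0            # score components
        iaa = iab = ibb = 0.0    # information matrix entries
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(a + b * xi)))
            w = p * (1.0 - p)
            ga += yi - p
            gb += (yi - p) * xi
            iaa += w
            iab += w * xi
            ibb += w * xi * xi
        det = iaa * ibb - iab * iab
        da = (ibb * ga - iab * gb) / det
        db = (iaa * gb - iab * ga) / det
        a += da
        b += db
        if abs(da) + abs(db) < tol:
            break
    return a, b


# Hypothetical individual outcomes (1 = response) at ten exposure levels.
x = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0]
y = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]
alpha_hat, beta_hat = fit_logit_ml(x, y)
print(round(alpha_hat, 3), round(beta_hat, 3))
```

At the solution both score components vanish, which is the defining property of the maximum likelihood estimates.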

We conclude with some remarks on contributions from econometrics and the social sciences. We have earlier indicated that the probit model of bio-assay was readily adopted in these disciplines. The theoretical justification of bio-assay in terms of a determinate stimulus and random thresholds was first jettisoned in the change to logistic regression, and then retrieved in the form of the latent regression equation model that is still dear to the behavioural sciences. This is probably due to McKelvey and Zavoina (1975), who introduce it in an ordered probit analysis of the voting behaviour of US Congressmen. An example of simultaneous independent discoveries is the generalization of logistic regression to the multinomial or polychotomous case. This was first set out, at some length, by Gurland, Lee, and Dahm (1960). Several years later it was put forward quite independently by the statistician Cox (1966) and by the biometric statistician Mantel (1966). And some years later again it was once more rediscovered independently by the econometrician Theil (1969), who arrived at it from the general perspective of modelling shares.

For a long time, logistic regression, whether in the binary or the multinomial context, was principally used as a technique, a simple tool without a specific underlying process and therefore without a characteristic interpretation. But in 1973 McFadden, working as a consultant for a Californian public transportation project, linked the multinomial logit to the theory of discrete choice from mathematical psychology. This provided a theoretical foundation of the logit model that is much more profound than any theory put forward for the use of the probit in bio-assay. It earned its author the Nobel Prize in economics in 2000.

References

Adam, D. (1958). Les Réactions du Consommateur Devant les Prix. Number 15 in Observation Économique. Paris: Sedes.

Aitchison, J. and J. A. C. Brown (1957). The Lognormal Distribution. Number 5 in University of Cambridge, Department of Applied Economics Monographs. Cambridge: Cambridge University Press.

Berkson, J. (1944). Application of the logistic function to bio-assay. Journal of the American Statistical Association 39, 357–365.

Berkson, J. (1951). Why I prefer logits to probits. Biometrics 7, 327–339.

Berkson, J. (1980). Minimum chi-square, not maximum likelihood! Annals of Statistics 8, 457–487.

Bliss, C. I. (1934a). The method of probits. Science 79, 38–39.

Bliss, C. I. (1934b). The method of probits. Science 79, 409–410.

Bliss, C. I. (1935). The calculation of the dosage-mortality curve. Annals of Applied Biology 22, 134–167. With an appendix by R.A. Fisher.

Cornfield, J. (1951). A method of estimating comparative rates from clinical data. Journal of the National Cancer Institute 11, 1269–1275.

Cornfield, J. (1956). A statistical problem arising from retrospective studies. In J. Neyman (Ed.), Proceedings of the Third Berkeley Symposium on Mathematical Statistics and Probability, Berkeley, Calif., pp. 135–148. University of California Press.

Cox, D. R. (1958). The regression analysis of binary sequences. Journal of the Royal Statistical Society, Series B 20, 215–242.

Cox, D. R. (1966). Some procedures connected with the logistic qualitative response curve. In F. David (Ed.), Research Papers in Statistics: Festschrift for J. Neyman, pp. 55–71. London: Wiley.

Cox, D. R. (1969). Analysis of Binary Data. London: Chapman and Hall.


de Beauharnais, H. (1927). Mémoires de la Reine Hortense, publiés par le prince Napoléon. Paris: Plon. Three volumes.

Du Pasquier, L.-G. (1918). Esquisse d’une nouvelle théorie de la population. Vierteljahrsschrift der Naturforschenden Gesellschaft in Zürich 63, 236–249.

Du Pasquier, L.-G. (1926). Le calcul des probabilités. Paris: Hermann.

Farrell, M. J. (1954). The demand for motorcars in the United States. Journal of the Royal Statistical Society, Series A 117, 171–200.

Fechner, G. T. (1860). Elemente der Psychophysik. Leipzig: Breitkopf und Härtel.

Finney, D. (1971). Probit Analysis (Third ed.). Cambridge: Cambridge University Press. First edition in 1947.

Fisher, R. A. (1954). The analysis of variance with various binomial transformations. Biometrics 10, 130–151. With comments by M.S. Bartlett, F.J. Anscombe, W.G. Cochran and J. Berkson.

Fisher, R. A. and F. Yates (1938). Statistical Tables for Biological, Agricultural and Medical Research. Edinburgh: Oliver and Boyd.

Gaddum, J. H. (1933). Reports on Biological Standards III. Methods of Biological Assay Depending on a Quantal Response. London: Medical Research Council. Special Report Series of the Medical Research Council, no. 183.

Gurland, J., I. Lee, and P. A. Dahm (1960). Polychotomous quantal response in biological assay. Biometrics 16, 382–398.

Hosmer, D. W. and S. Lemeshow (2000). Applied Logistic Regression (Second ed.). New York: Wiley. First edition in 1989.

Liagre, J. B. J. (1852). Calcul des probabilités et théorie des erreurs. Bruxelles: Société pour l’émancipation intellectuelle (A. Jamard).

Liagre, J. B. J. (1879). Calcul des probabilités et théorie des erreurs. Bruxelles, Paris: Muquardt, Gauthier-Villars. 2ème édition, revue par Camille Peney.

Malthus, T. R. (1798). An Essay on the Principle of Population. London: anon.


Mantel, N. (1966). Models for complex contingency tables and polychotomous dosage response curves. Biometrics 22, 83–95.

McFadden, D. (2001). Economic choices. American Economic Review 91, 352–370. Nobel prize acceptance speech.

McKelvey, R. D. and W. Zavoina (1975). A statistical model for the analysis of ordinal level dependent variables. Journal of Mathematical Sociology 4, 103–120.

Meadows, D. H., D. L. Meadows, J. Randers, and W. W. Behrens (1972). The Limits to Growth. New York: Universe Books.

Miner, J. R. (1933). Pierre-François Verhulst, the discoverer of the logistic curve. Human Biology 5, 673–689.

Pearl, R. (1922). The Biology of Death. Philadelphia: Lippincott.

Pearl, R. (1927). The indigenous population of Algeria in 1926. Science 66, 593–594.

Pearl, R. and L. J. Reed (1920). On the rate of growth of the population of the United States since 1790 and its mathematical representation. Proceedings of the National Academy of Sciences 6, 275–288.

Pearl, R. and L. J. Reed (1922). A further note on the mathematical theory of population growth. Proceedings of the National Academy of Sciences 8, 365–368.

Pearl, R. and L. J. Reed (1923). On the mathematical theory of population growth. Metron 5, 6–19.

Pearl, R., L. J. Reed, and J. F. Kish (1940). The logistic curve and the census count of 1940. Science 92, 486–488.

Pearl, R., C. P. Winsor, and F. B. White (1928). The form of the growth curve of the cantaloupe (Cucumis melo) under field conditions. Proceedings of the National Academy of Sciences 14, 895–901.

Pearson, K. (1914). Tables for Statisticians and Biometricians. Cambridge: Cambridge University Press.

Quetelet, A. (1848). Du système social et des lois qui le régissent. Paris: Guillaume.

Quetelet, A. (1850). Notice sur Pierre-François Verhulst. Annuaire de l’Académie Royale des Sciences, Lettres et des Beaux-Arts 16, 97–124.


Reed, L. J. and J. Berkson (1929). The application of the logistic function to experimental data. Journal of Physical Chemistry 33, 760–779.

Stigler, S. M. (1986). The History of Statistics. Cambridge, Mass.: Harvard University Press.

Theil, H. (1969). A multinomial extension of the linear logit model. International Economic Review 10, 251–259.

Vanpaemel, G. (1987). Quetelet en Verhulst over de mathematische wetten van de bevolkingsgroei. Academiae Analecta, Mededelingen van de Koninklijke Academie voor Wetenschappen, Letteren en Schoone Kunsten van België 49, 99–114.

Verhulst, P.-F. (1838). Notice sur la loi que la population suit dans son accroissement. Correspondance Mathématique et Physique, publiée par A. Quetelet 10, 113.

Verhulst, P.-F. (1845). Recherches mathématiques sur la loi d’accroissement de la population. Nouveaux Mémoires de l’Académie Royale des Sciences, des Lettres et des Beaux-Arts de Belgique 18, 1–32.

Verhulst, P.-F. (1847). Deuxième mémoire sur la loi d’accroissement de la population. Nouveaux Mémoires de l’Académie Royale des Sciences, des Lettres et des Beaux-Arts de Belgique 20, 1–32.

Wilson, E. B. (1925). The logistic or autocatalytic grid. Proceedings of the National Academy of Sciences 11, 451–456.

Wilson, E. B. and J. Worcester (1943). The determination of L.D. 50 and its sampling error in bio-assay. Proceedings of the National Academy of Sciences 29, 79. First of a series of three articles.

Winsor, C. P. (1932). A comparison of certain symmetrical growth curves. Journal of the Washington Academy of Sciences 22, 73–84.

Yule, G. U. (1925). The growth of population and the factors which control it. Journal of the Royal Statistical Society 138, 1–59.

Biographical sources

I have consulted a number of obituaries and other sources about Pearl, Reed, Bliss, Gaddum and Berkson, namely

Jennings, H.S. (1941) Raymond Pearl, 1879–1940. Biographical Memoirs of the National Academy of Sciences of the United States 22, nr. 14, 295–347.


Miner, John R. and Joseph Berkson (1940) Raymond Pearl, 1879–1940. The Scientific Monthly 52, 1092–194.

Cochran, W.G. (1967) Lowell Jacob Reed. Journal of the Royal Statistical Society, Series A 130, 279–281.

Feldberg, W. (1967) John Henry Gaddum, 1900–1965. Biographical Memoirs of Fellows of the Royal Society 13, 57–77.

Cochran, W.G. (1979) Chester Ittner Bliss, 1899–1979. Biometrics 35, 715–717.

(about Bliss) Salsburg, D. (2001) The Lady Tasting Tea. New York: Holt.

Armitage, P., and T. Colton (eds) (1998) Joseph Berkson 1899–1982. Encyclopedia of Biostatistics, volume I, 290–300. New York: Wiley.

Taylor, W.F. (1983) Joseph Berkson, 1899–1982. Journal of the Royal Statistical Society, Series A 146, 413–419.


Logit

The logit function is the inverse of the "sigmoid", or "logistic" function used in mathematics, especially in statistics. Logit is pronounced /ˈloʊdʒɪt/ (LOH-jit).

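Definition

The logit of a number p between 0 and 1 is given by the formula:

$$ \operatorname{logit}(p) = \log\left(\frac{p}{1-p}\right) = \log(p) - \log(1-p) = -\log\left(\frac{1}{p} - 1\right). $$

The base of the logarithm function used is of little importance in the present article, as long as it is greater than 1, but the natural logarithm with base e is the one most often used. The "logistic" function of any number α is given by the inverse-logit:

$$ \operatorname{logit}^{-1}(\alpha) = \frac{1}{1+\exp(-\alpha)} = \frac{\exp(\alpha)}{\exp(\alpha)+1}. $$

If p is a probability, then p/(1 − p) is the corresponding odds, and the logit of the probability is the logarithm of the odds; similarly, the difference between the logits of two probabilities is the logarithm of the odds ratio (R), thus providing a shorthand for writing the correct combination of odds ratios only by adding and subtracting:

$$ \log(R) = \log\left(\frac{p_1/(1-p_1)}{p_2/(1-p_2)}\right) = \log\left(\frac{p_1}{1-p_1}\right) - \log\left(\frac{p_2}{1-p_2}\right) = \operatorname{logit}(p_1) - \operatorname{logit}(p_2). $$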

Plot of logit(p) in the domain of 0 to 1, where the base of logarithm is e
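A quick numerical check of the odds-ratio shorthand: the exponential of the difference of two natural-log logits equals the ratio of the two odds. The probabilities below are hypothetical.

```python
import math


def logit(p: float) -> float:
    """Natural-log logit, log(p / (1 - p))."""
    return math.log(p / (1.0 - p))


def odds(p: float) -> float:
    """Odds corresponding to a probability p."""
    return p / (1.0 - p)


p1, p2 = 0.3, 0.1                # hypothetical probabilities in two groups
log_r = logit(p1) - logit(p2)    # log odds ratio from a simple subtraction
r_direct = odds(p1) / odds(p2)   # odds ratio computed directly
print(round(math.exp(log_r), 6), round(r_direct, 6))
```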

History

The logit model was introduced in 1944 by Joseph Berkson, who coined the term. The term was borrowed by analogy from the very similar probit model developed by Chester Ittner Bliss in 1934.[1] G. A. Barnard coined the commonly used term log-odds in 1949; the log-odds of an event is the logit of the probability of the event.

Uses and properties

• The logit in logistic regression is a special case of a link function in a generalized linear model: it is the canonical link function for the binomial distribution.
• The logit function is the negative of the derivative of the binary entropy function.
• The logit is also central to the probabilistic Rasch model for measurement, which has applications in psychological and educational assessment, among other areas.
• The inverse-logit function is also sometimes referred to as the expit function.
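The entropy property in the list can be checked by hand: with natural logarithms, H(p) = −p log p − (1 − p) log(1 − p), so H′(p) = −log p − 1 + log(1 − p) + 1 = −logit(p). The sketch below confirms this numerically with a central difference; the step size h is an arbitrary choice.

```python
import math


def binary_entropy(p: float) -> float:
    """H(p) = -p ln p - (1 - p) ln(1 - p), measured in nats."""
    return -p * math.log(p) - (1.0 - p) * math.log(1.0 - p)


def logit(p: float) -> float:
    """Natural-log logit, log(p / (1 - p))."""
    return math.log(p / (1.0 - p))


h = 1e-6  # step for the central-difference derivative
for p in (0.1, 0.25, 0.5, 0.8):
    derivative = (binary_entropy(p + h) - binary_entropy(p - h)) / (2 * h)
    print(p, round(derivative, 6), round(-logit(p), 6))
```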


See also
• Discrete choice on binary logit, multinomial logit, conditional logit, nested logit, mixed logit, exploded logit, and ordered logit
• Limited dependent variable
• Daniel McFadden, a Nobel Prize winner for development of a particular logit model used in economics
• Logit analysis in marketing
• Perceptron
• Probit, another function with the same domain and range as the logit

References
[1] J. S. Cramer (2003). "The origins and development of the logit model" (http://www.cambridge.org/resources/0521815886/1208_default.pdf). Cambridge UP.

Further reading
• Ashton, Winifred D. (1972). The Logit Transformation: with special reference to its uses in Bioassay. Griffin's Statistical Monographs & Courses. 32. Charles Griffin. ISBN 0852642121.


Article Sources and Contributors

Logit Source: http://en.wikipedia.org/w/index.php?oldid=402753376 Contributors: AllenDowney, AxelBoldt, BKfi, Baccyak4H, Belkovich, BenFrantzDale, BrendanH, Calimo, Cazort, Cbuckley, David Haslam, DavidLevinson, Davwillev, Doradus, Duoduoduo, Feinstein, Giftlite, HenningThielemann, Henrygb, Hike395, Juffi, Kwamikagami, Liftarn, LokiClock, Melcombe, Michael Hardy, Michael Slone, MrOllie, Nshuks7, O18, Owenozier, Pgan002, Piotrus, Populus, Predictor, Qwfp, Ross Burgess, Rstatx, Scentoni, Solarapex, StephanWehner, Stephenpratt, TheAnome, Tkinias, Tomi, Topbanana, Wile E. Heresiarch, Wmahan, Zven, 52 anonymous edits

Image Sources, Licenses and Contributors

Image:Logit.png Source: http://en.wikipedia.org/w/index.php?title=File:Logit.png License: GNU Free Documentation License Contributors: Darapti, Maksim

License

Creative Commons Attribution-Share Alike 3.0 Unported
http://creativecommons.org/licenses/by-sa/3.0/


The origins and development of the logit model

J.S. Cramer ∗

August 2003

Abstract

This is an updated and somewhat extended version of Chapter 9 of Logit Models from Economics and Other Fields (Cambridge University Press, 2003), which includes additional material obtained since the completion of that book. The text has been adapted so that this paper can be read independently.

The paper describes the origins of the logistic function and its history up to the adoption of the logit in bio-assay and the beginning of its wider acceptance in statistics. Its roots spread back to the 19th century, when the function was invented to describe population growth and given its name by the Belgian mathematician Verhulst. Subsequent events have been determined decisively by the individual actions and personal histories of a few scholars: the rediscovery of the growth function is due to Pearl and Reed, the survival of the term logistic to Yule, and the introduction of the function in bio-assay (and hence in statistics in general) to Berkson.

∗University of Amsterdam and Tinbergen Institute, Amsterdam; postal address Baambrugse Zuwe 194, 3645 AM Vinkeveen, the Netherlands; e-mail [email protected]. For comments on earlier drafts and ready help in obtaining valuable information about Verhulst, Pearl and Du Pasquier I thank John Glaus (Maine), Jan Sandee (Boskoop), Ida Stamhuis (Amsterdam), Professor G. Vanpaemel (Gent) and Professor J. Aghion (Liège). I am much indebted to the American Philosophical Society, Philadelphia, for permission to consult the Pearl Archives, which contain the correspondence and pocket diaries of Raymond Pearl, and to Robert Cox for his help; to Professor Rémy Scheurer (Neuchâtel), who provided invaluable material about Du Pasquier from the archives of the University of Neuchâtel; and to Michel Guillot (Paris) and Anton Barten (Leuven), who kindly assisted me in consulting Du Pasquier’s books at the Bibliothèque Nationale and the University Library, Leuven.


1 Introduction

[Figure 1. The logistic curve P(Z), plotted for Z from -4 to 4; vertical axis P(Z) from 0 to 1.]

The sigmoid curve of Figure 1 is traced by the logistic function

$$ P(Z) = \frac{\exp Z}{1+\exp Z}. \qquad (1) $$

P behaves like the distribution function of a symmetrical density, with midpoint zero; as Z moves through the real number axis, P rises monotonically between the bounds of zero and 1. The meaning of this function varies according to the definition of the variables. In the logit version of bio-assay, P is the probability of a binary outcome (the survival or death of an organism), and Z = α + βX, with X a continuous stimulus or exposure variable (like the dosage of an insecticide); α determines the location of the curve on the X-axis, and β its slope. In logistic regression there are several determinants of P, and Z = xᵀβ, with x a vector of covariates (including a unit constant) and β their coefficients. But the logistic function was originally designed to describe the course of a proportion P over time t, with Z = α + βt; it is a growth curve, since for β > 0 P(t) rises monotonically with t.
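A minimal sketch of (1) with the linear forms just described, assuming the covariate vector x carries the unit constant in its first position so that Z = xᵀβ includes the intercept; the coefficient values are hypothetical. In the bio-assay case x = (1, X) and the coefficients are (α, β), so P = 0.5 exactly where α + βX = 0, that is at X = −α/β.

```python
import math
from typing import Sequence


def logistic(z: float) -> float:
    """Equation (1), P(Z) = exp(Z) / (1 + exp(Z)), evaluated without overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def probability(x: Sequence[float], coefs: Sequence[float]) -> float:
    """P = logistic(x^T coefs); x[0] is expected to be the unit constant."""
    if len(x) != len(coefs):
        raise ValueError("x and coefs must have the same length")
    z = sum(xi * ci for xi, ci in zip(x, coefs))
    return logistic(z)


alpha, slope = -3.0, 1.5     # hypothetical bio-assay coefficients
midpoint = -alpha / slope    # stimulus level at which P = 0.5
print(midpoint, probability([1.0, midpoint], [alpha, slope]))

# Growth-curve reading: with Z = alpha + slope * t and slope > 0, P rises with t.
path = [probability([1.0, t], [alpha, slope]) for t in range(7)]
print([round(p, 3) for p in path])
```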

Over a fairly wide central range, for values of P from .3 to .7, the shape of the logistic curve closely resembles the normal probability distribution function. The two functions

$$P_l(x) = \frac{\exp(\beta x)}{1 + \exp(\beta x)} \qquad (2)$$

and

$$P_n(x) = \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{x} \exp\left\{-\tfrac{1}{2}(u/\sigma)^2\right\} du \qquad (3)$$

both pass through the point (0, .5), and they can be made almost to coincide upon a suitable adjustment of β and σ. This is a sheer algebraic coincidence, for there appears to be no intrinsic relation between the two forms.
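To see how close the coincidence is, the Python sketch below evaluates (2) and (3) on a grid and reports the largest gap between them. The scaling β = 1.702/σ is an assumption borrowed from the later literature on this approximation, not a value given in the text; any other β can be tried in its place.

```python
import math


def normal_cdf(x: float, sigma: float = 1.0) -> float:
    """P_n(x) of equation (3): normal distribution function with mean 0 and s.d. sigma."""
    return 0.5 * (1.0 + math.erf(x / (sigma * math.sqrt(2.0))))


def logistic_cdf(x: float, beta: float) -> float:
    """P_l(x) of equation (2), written as 1 / (1 + exp(-beta x))."""
    return 1.0 / (1.0 + math.exp(-beta * x))


def max_gap(beta: float, sigma: float = 1.0) -> float:
    """Largest |P_l(x) - P_n(x)| over a grid of x from -6 to 6 in steps of 0.001."""
    grid = [-6.0 + 0.001 * i for i in range(12001)]
    return max(abs(logistic_cdf(x, beta) - normal_cdf(x, sigma)) for x in grid)


print(logistic_cdf(0.0, 1.702), normal_cdf(0.0))  # both curves pass through (0, .5)
print(max_gap(1.702))  # assumed scaling beta = 1.702 / sigma: gap below 0.01
print(max_gap(1.0))    # without the adjustment the gap is more than ten times larger
```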

2 The origins of the logistic function

The logistic function was invented in the 19th century for the description of the growth of populations and the course of autocatalytic chemical reactions, or chain reactions. In either case we consider the time path of a quantity W(t) and its growth rate

$$\dot W(t) = \frac{dW(t)}{dt}. \qquad (4)$$

The simplest assumption is that $\dot W(t)$ is proportional to $W(t)$:

$$\dot W(t) = \beta W(t), \qquad \beta = \dot W(t)/W(t), \qquad (5)$$

with β the constant rate of growth. This leads of course to exponential growth

$$W(t) = A \exp(\beta t),$$

where A is sometimes replaced by the initial value W(0). With W(t) the human population of a country, this is a model of unopposed growth; as Malthus (1789) put it, a human population, left to itself, will increase in geometric progression. It is a reasonable model for a young and empty country like the United States in its early years.¹ Like many others, Alphonse Quetelet (1795–1874), the Belgian astronomer turned statistician, was well aware that the indiscriminate extrapolation of exponential growth must lead to impossible values. He experimented with several adjustments of (5) and also asked his pupil, the mathematician Pierre-Francois Verhulst (1804–1849), to look into the problem.

¹ Two hundred years later exponential growth played a major part in the Report to the Club of Rome of Meadows, Meadows, Randers, and Behrens (1972), and it is still implicit in many economic analyses.

Like Quetelet, Verhulst approached the problem by adding an extra term to (5) to represent the increasing resistance to further growth, as in

$$\dot W(t) = \beta W(t) - \phi(W(t)), \qquad (6)$$

and then experimenting with various forms of φ. The logistic appears when this is a simple quadratic, for in that case we may rewrite (6) as

$$\dot W(t) = \beta W(t)\,\{\Omega - W(t)\}, \qquad (7)$$

where Ω denotes the upper limit or saturation level of W, its asymptote as t → ∞. Growth is now proportional both to the population already attained, W(t), and to the remaining room for further expansion, Ω − W(t). If we express W(t) as a proportion P(t) = W(t)/Ω, and absorb the constant Ω into β, this gives

$$\dot P(t) = \beta P(t)\,\{1 - P(t)\}, \qquad (8)$$

and the solution of this differential equation is

$$P(t) = \frac{\exp(\alpha + \beta t)}{1 + \exp(\alpha + \beta t)}, \qquad (9)$$

which Verhulst named the logistic function. The population W(t) then follows

$$W(t) = \Omega\,\frac{\exp(\alpha + \beta t)}{1 + \exp(\alpha + \beta t)}. \qquad (10)$$
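As a quick check that (9) does solve (8), the Python sketch below compares the closed form with a crude Euler integration of the differential equation and with a numerical derivative. The parameter values are arbitrary illustrations; multiplying P(t) by Ω gives the population path (10).

```python
import math


def logistic_path(t: float, alpha: float, beta: float) -> float:
    """P(t) of equation (9)."""
    return 1.0 / (1.0 + math.exp(-(alpha + beta * t)))


def euler_path(p0: float, beta: float, t_end: float, steps: int) -> float:
    """Approximate P(t_end) from dP/dt = beta * P * (1 - P), equation (8), by explicit Euler steps."""
    h = t_end / steps
    p = p0
    for _ in range(steps):
        p += h * beta * p * (1.0 - p)
    return p


def population_path(t: float, omega: float, alpha: float, beta: float) -> float:
    """W(t) of equation (10): the saturation level Omega times P(t)."""
    return omega * logistic_path(t, alpha, beta)


# Illustrative (assumed) parameters.
alpha, beta, omega = -2.0, 0.5, 100.0
p0 = logistic_path(0.0, alpha, beta)
for t in (2.0, 4.0, 8.0):
    print(t, logistic_path(t, alpha, beta), euler_path(p0, beta, t, steps=100_000))
print(population_path(40.0, omega, alpha, beta))  # close to the ceiling Omega
```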

Verhulst published his suggestions between 1838 and 1847 in three papers. The first is a brief note in the Correspondance Mathematique et Physique, edited by Quetelet, in 1838. It contains the essence of the argument in four small pages, followed by a demonstration that the curve agrees very well with the actual course of the population of France, Belgium, Essex and Russia for periods up to 1833; Verhulst explains that he did his research a couple of years before, that he did not have the time for an update and that he publishes this note only at the insistence of Quetelet. He does not say how he fitted the curves. The second paper, in the Proceedings of the Belgian Royal Academy of 1845, is a much fuller account of the function and its properties. Here Verhulst names it the logistic, without further explanation: in a neat diagram, the courbe logistique is drawn alongside the courbe logarithmique, which we would nowadays call the exponential. Verhulst also determines the three parameters Ω, α and β of (10) by making the curve pass through three observed points. With data for some twenty or thirty years only, this is a hazardous method, as is borne out by the resulting estimates of the limiting population Ω. Employing the known values of the Belgian population in 1815, 1830 and 1845, Verhulst finds a limiting population of 6.6 million for that country, and in a similar exercise 40 million for France: at present these populations number 10.2 and 58.7 million. In 1847 there followed a second paper in the Proceedings, which is chiefly notable for an adjustment of the correction term that leads to a much better estimate of 9.5 million for the Belgian Ω.
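When the three observations are equally spaced in time, as in 1815, 1830 and 1845, a three-point fit of (10) has a closed form. The Python sketch below uses the fact that 1/W(t) = {1 + exp(−(α + βt))}/Ω, so that 1/W − 1/Ω falls geometrically over equal intervals. This algebraic route is our own reconstruction, since the text does not say how Verhulst carried out the computation, and the data are generated from a hypothetical curve rather than taken from his papers. The second call illustrates why the method is hazardous: a 1% error in the middle observation moves the estimate of Ω by far more than 1%.

```python
import math
from typing import Tuple


def logistic_population(t: float, omega: float, alpha: float, beta: float) -> float:
    """W(t) of equation (10)."""
    return omega / (1.0 + math.exp(-(alpha + beta * t)))


def three_point_fit(t0: float, h: float, w0: float, w1: float, w2: float) -> Tuple[float, float, float]:
    """Omega, alpha, beta of (10) from observations at t0, t0 + h and t0 + 2h.

    With c = 1/Omega, 1/W(t) - c = c * exp(-(alpha + beta t)), so the three
    deviations form a geometric sequence with ratio exp(-beta h).
    """
    y0, y1, y2 = 1.0 / w0, 1.0 / w1, 1.0 / w2
    denom = 2.0 * y1 - y0 - y2
    if denom == 0.0:
        raise ValueError("no curvature in 1/W: Omega is not identified")
    c = (y1 * y1 - y0 * y2) / denom
    if c <= 0.0 or y0 <= c or y1 <= c:
        raise ValueError("observations do not fit a rising logistic curve")
    ratio = (y1 - c) / (y0 - c)
    beta = -math.log(ratio) / h
    alpha = -math.log((y0 - c) / c) - beta * t0
    return 1.0 / c, alpha, beta


# Hypothetical curve: Omega = 10, alpha = -1, beta = 0.03, observed at a 15-year spacing.
obs = [logistic_population(t, 10.0, -1.0, 0.03) for t in (0.0, 15.0, 30.0)]
print(three_point_fit(0.0, 15.0, *obs))  # approximately (10.0, -1.0, 0.03)
# Raising the middle observation by 1% pulls the estimated Omega down to roughly 8.2.
print(three_point_fit(0.0, 15.0, obs[0], 1.01 * obs[1], obs[2])[0])
```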

Verhulst was in poor health and died in 1849. He was primarily a mathematician (professor of mathematics at the Belgian Military Academy) but sensitive to social and political issues. In his obituary of Verhulst, Quetelet (1850) attributes his early death to overwork and, rather curiously, to his great stature, as Verhulst was 1.89 meters (over six feet) tall. His discovery of the logistic curve was not taken up with much enthusiasm by Quetelet; as Vanpaemel (1987) has shown, the two men did not see eye to eye on the question of population growth. This may in part account for some curious elements in Quetelet’s obituary: while ostensibly praising his lamented pupil, Quetelet stresses his impulsive nature and depicts him as a somewhat silly man. Quetelet recounts at length Verhulst’s adventures in Rome. Verhulst was staying in that city in the summer of 1830, when the news broke of the revolution in Paris and of the Belgian secession from the Netherlands. These events moved him strongly and set him drafting a democratic constitution for the Papal States. He submitted this document to some cardinals he had met, who expressed great interest; still, the police were called in, and Verhulst was banished from Rome. He left under somewhat dramatic circumstances, having at first barricaded his apartment with the intention of withstanding a siege by the forces of law and order. But then he was only 26 years old at the time.

Quetelet did not pay much attention to the logistic curve in his writings; it is barely mentioned, in an aside, in Quetelet (1848). But Verhulst’s work was quoted with approval by Liagre (1852), his colleague at the Military Academy, and in the second edition of Liagre’s textbook Camille Peney repeats the estimation of Ω for Belgium on the basis of more recent population figures, arriving at a value of 13.7 million.


As a model of population growth the logistic function was discovered anew in 1920 by Pearl and Reed. They were unaware of Verhulst’s work (though not of the curves for autocatalytic reactions discussed presently), and they arrived independently at the logistic curve of (10). When this was fitted to Census figures of the U.S., again by making the curve pass through three points, it gave a good fit for the period from 1790 to 1910. But the estimate of Ω of 197 million once more compares badly with the present value of about 270 million. Along with the pursuit of many other interests, Pearl and his collaborators in the next twenty years went on to apply the logistic growth curve to almost any living population, from fruit flies to the human population of the French colonies in North Africa, as well as to the growth of cantaloupes.

In 1920, Raymond Pearl (1879–1940) had just been appointed Director of the Department of Biometry and Vital Statistics at Johns Hopkins University, and Lowell J. Reed (1886–1966) was his deputy (and his successor when a few years later Pearl was promoted to Professor of Biology). Pearl was trained as a biologist and acquired his statistics as a young man by spending the year 1905–1906 in London with Karl Pearson (and later quarrelling with him). He became a prodigious investigator and a prolific writer on a wide variety of phenomena like longevity, fertility, contraception, and the effects of alcohol and tobacco consumption on health, all subsumed under the heading of human biology. During World War I Pearl worked in the U.S. Food Administration, and this may account for his preoccupation with the food needs of a growing population in the 1920 paper. Reed, who was trained as a mathematician, made a quiet career in biostatistics; he excelled as a teacher and as an administrator, and was brought back in 1953 from retirement to serve as President of Johns Hopkins. Among his publications in the aftermath of the 1920 paper with Pearl is an application of the logistic curve to autocatalytic reactions, Reed and Berkson (1929). We shall hear more about this co-author in the next section.

Verhulst’s work was rediscovered soon after Pearl and Reed’s first paper of 1920. The immediate sequel, Pearl and Reed (1922), does not mention it; Verhulst’s priority is first acknowledged in a footnote in Pearl (1922), and, at greater length, in Pearl and Reed (1923). In this paper, Pearl and Reed call Verhulst’s papers “long since forgotten”, except for a single article by Du Pasquier (1918), and they then go out of their way to criticize that author for an “entirely unjustified and in practice usually incorrect modification” of Verhulst’s formula, without substantiating this harsh judgment. In fact Du Pasquier’s paper is a harmless reflection on four mathematical theories of population, of a very formal and abstract character to the point of inanity. The four theories are ascribed to Halley, de Moivre, Euler and Verhulst, and these authors are briefly introduced; Halley, for example, as “the famous astronomer”, and Verhulst, rather oddly, as “a Belgian who died in 1847”. No references are given.

Louis-Gustave Du Pasquier (1876–1957), Professor of Mathematics at the University of Neuchatel, took his degrees in mathematics in Zurich, but followed courses in the social sciences as well and spent the year 1900–1901 in Paris, taking courses at a variety of academic institutions. He may well have read about Verhulst in Liagre or elsewhere in the French literature, but I have been unable to find a useful reference to this effect in his textbook of probability (1926). It is also not clear how Pearl learned about Verhulst, or, for that matter, about Du Pasquier.²

The next important publication is Yule’s Presidential Address to the Royal Statistical Society of 1925. Yule, who says he owes the reference to Pearl (1922), treats Verhulst much more handsomely than Pearl and Reed did, devoting an appendix to his work. Yule is also the first author to revive the name logistic, which is used neither by Liagre or Du Pasquier nor by Pearl and Reed in their earlier references. By 1924, however, “logistic” is used as a commonplace term in the correspondence between Pearl and Yule, who were lifelong friends. It would take until 1933 for Miner (a collaborator of Pearl) to pay tribute to Verhulst, if in an oblique way: instead of reproducing at least one of Verhulst’s papers, Miner translates Quetelet’s obituary, and emphasises Verhulst’s Roman imbroglio by adding an extract from the memoirs of Queen Hortense de Beauharnais recording this episode.

As we have already hinted, there is another early root of the logistic function in chemistry, where it was employed (again with some variations) to describe the course of autocatalytic or chain reactions, in which the product itself acts as a catalyst for the process while the supply of raw material is fixed. This leads naturally to a differential equation like (8) and hence to the logistic function for the time path of the amount of the reaction product. The review of the application of logistic curves to a number of such processes by Reed and Berkson (1929) quotes work of the German professor of chemistry Wilhelm Ostwald of 1883. Authors like Yule (1925) and Wilson (1925) were well aware of this strand of the literature.

² The Pearl archives at the American Philosophical Society in Philadelphia contain Pearl’s correspondence with several hundred individuals, but Du Pasquier is not among them.

The basic idea of logistic growth is simple and effective, and it is used to this day to model population growth and market penetration of new products and technologies. The introduction of mobile telephones is an autocatalytic process, and so is the spread of many new products and techniques in industry.

3 The invention of the probit and the advent of the logit in bio-assay

The invention of the probit model is usually credited to Gaddum (1933) and Bliss (1934a, 1934b), but one look at the historical section of Finney (1971) or indeed at Gaddum’s paper and his references will show that this is too simple. The roots of the method, and in particular the transformation of frequencies to equivalent normal deviates, can be traced to the German scholar Fechner (1801–1887). Stigler (1986) recounts how Fechner was drawn to study human responses to external stimuli by experimental tests of the ability to distinguish differences in weight. The issue of the variability of human responses had been raised by astronomers, who relied on human observers of celestial phenomena and found that their readings showed much unaccountable variation. Fechner recognized that human response to an identical stimulus is not uniform, and he was the first to transform observed differences to equivalent normal deviates. The historical sketches of Finney (1971), Ch. 3.6, and of Aitchison and Brown (1957), Ch. 1.2, record a long line of largely independent rediscoveries of this approach that spans the seventy years from Fechner (1860) to the early 1930’s, when Gaddum and Bliss published their contributions. Both authors regard the assumption of a normal distribution as commonplace, and attach more importance to the logarithmic transformation of the stimulus. Their papers contain no major innovations, but they mark the emergence of a standard paradigm of bio-assay and of a new terminology. Gaddum wrote a comprehensive and authoritative report with the emphasis on practical aspects of the experiments and on the statistical interpretation of bio-assay, giving several worked examples from the medical and pharmaceutical literature. Bliss published two brief notes in Science, introducing the term probit; he followed this up with a series of articles setting out the maximum likelihood estimation of the probit curve, in one instance with assistance from R.A. Fisher, Bliss (1935). Both Gaddum and Bliss set standards of estimation; until the 1930’s this was largely a matter of ad hoc numerical and graphical adjustment of curves to categorical data.

John Henry Gaddum (1900–1965) studied medicine at Cambridge but failed in his final examinations. He turned to pharmacology and worked under Trevan at the Wellcome Laboratories, then transferred to the National Institute for Medical Research (where he wrote the 1933 report) before embarking on an academic career of professorships in pharmacology in Cairo, London and Edinburgh. He was elected to the Royal Society in 1945 and knighted in 1964. To this day the British Pharmacological Society awards an annual Gaddum Memorial Prize for pharmaceutical research. Charles Ittner Bliss (1899–1979) studied as an entomologist at Ohio State University and was a field worker with the U.S. Department of Agriculture until this employment was terminated in 1933. He then spent two years in London studying statistics with R.A. Fisher, and Fisher found him a job as a statistician in Leningrad, where he lived from 1936 to 1938. The political conditions were not propitious for serious research. Bliss returned to the Connecticut Agricultural Experiment Station, combining his work as a practising statistician with a Lectureship at Yale from 1942 until his retirement. He played an important role in the founding of the Biometric Society.

In their early writings on bio-assay both authors adhere firmly to the classical model of bio-assay, where the stimulus is determinate and responses are random because of the variability of individual tolerance levels. Bliss introduced the term probit (short for ’probability unit’) originally as a convenient scale for normal deviates, but abandoned this within a year in favour of a different definition which has since been generally accepted. For any (relative) frequency f there is an equivalent normal deviate Z such that the cumulative normal distribution at Z equals f; Z is the solution of

$$f = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{Z} \exp\left(-\tfrac{1}{2}u^2\right) du, \qquad (11)$$

and this can be read off from a table of the normal distribution. The probit of f is this equivalent normal deviate Z, or initially Z increased by 5; this ensures that the probit is almost always positive, which facilitates calculation. In the 1930’s such additive constants were a common device. In the probit method, probits of relative frequencies or of probabilities f are linearly related to (the logarithm of the) stimulus.
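In place of a printed table, equation (11) can be solved with an inverse normal distribution function. The sketch below uses Python’s standard `statistics.NormalDist.inv_cdf` (Python 3.8 or later); the function names are our own, and the default offset of 5 reproduces the convention described above.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist(mu=0.0, sigma=1.0)


def equivalent_normal_deviate(f: float) -> float:
    """Solve equation (11) for Z, i.e. the standard normal quantile of the frequency f."""
    if not 0.0 < f < 1.0:
        raise ValueError("f must lie strictly between 0 and 1")
    return STANDARD_NORMAL.inv_cdf(f)


def probit(f: float, offset: float = 5.0) -> float:
    """Probit of f: the equivalent normal deviate increased by 5 (set offset=0 for Z itself).

    The result is negative only when f is below about 3e-7 (Z < -5).
    """
    return equivalent_normal_deviate(f) + offset


for f in (0.025, 0.16, 0.5, 0.84, 0.975):
    print(f, round(equivalent_normal_deviate(f), 3), round(probit(f), 3))
```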

The acceptance of the probit method was aided by the articles of Bliss, who published regularly in this field until the 1950’s, and by Finney and others (Gaddum returned to pharmacology). The full flowering of this school probably coincides with the first edition of Finney’s monograph in 1947. Stripped of the underlying theory of bio-assay, probit analysis was quickly used for any relation of a discrete binary outcome to one or more determinants. In economics and market research, for example, the first applications appear already in the 1950’s: Farrell (1954) uses a probit model for the ownership of cars of different vintage as a function of household income, and Adam (1958) fits lognormal demand curves to survey data of the willingness to buy cigarette lighters and the like at various prices. The classic monograph on the lognormal distribution of Aitchison and Brown (1957) brought probit analysis to the notice of a wider audience of economists.

As far as I can see, the introduction of the logistic as an alternative to the normal probability function is the work of a single person, namely Joseph Berkson (1899–1982), Reed’s co-author of the paper on autocatalytic functions of 1929. Berkson read physics at Columbia, then went to Johns Hopkins for his M.D. and a doctorate in statistics in 1928. He stayed on as an assistant for three years, and this is when he collaborated with Reed on autocatalytic functions. Berkson then moved to the Mayo Clinic, where he remained for the rest of his working life as chief statistician. In the 1930’s he published numerous papers on medical and public health matters, but in 1944 he turned his attention to the statistical methodology of bio-assay and proposed the use of the logistic instead of the normal probability function of (11), coining the term ’logit’ by analogy to the ’probit’ of Bliss (for which he was initially much derided). As we have indicated earlier, the two functions are almost indistinguishable. By the inverse of the logistic function (1) we have

$$\operatorname{logit}(P) = \log\frac{P}{1 - P} = Z, \qquad (12)$$

which is of course much simpler than the definition of the probit in (11). The issue of logit versus probit was complicated by Berkson’s simultaneous attacks on the method of maximum likelihood and his advocacy of minimum chi-squared estimation instead. Between 1944 and 1980 he wrote a large number of papers on both issues; examples are Berkson (1951) and Berkson (1980). He often adopted a somewhat provocative style, and much controversy ensued.
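By contrast with (11), the logit of (12) needs only a logarithm. The Python sketch below, with illustrative function names, checks that (12) inverts (1) and compares logits with equivalent normal deviates over the central range of P. The ratio stays close to 1.6, which is one way of seeing why the two functions can be made almost indistinguishable by rescaling.

```python
import math
from statistics import NormalDist


def logistic(z: float) -> float:
    """Equation (1), in the equivalent form 1 / (1 + exp(-Z))."""
    return 1.0 / (1.0 + math.exp(-z))


def logit(p: float) -> float:
    """Equation (12): log(P / (1 - P)), the inverse of the logistic function."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return math.log(p / (1.0 - p))


print(logit(logistic(1.3)))  # recovers 1.3 up to rounding
for p in (0.3, 0.4, 0.6, 0.7):
    z_logit = logit(p)
    z_probit = NormalDist().inv_cdf(p)
    print(p, round(z_logit, 4), round(z_probit, 4), round(z_logit / z_probit, 3))
```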

The close resemblance of the logistic to the normal distribution function must have been common knowledge among those who were familiar with the logistic; it had been demonstrated by Wilson (1925) and written up by Winsor (1932) (another collaborator of Pearl). Wilson was probably the first to publish an application of the logistic curve in bio-assay, in Wilson and Worcester (1943), just before Berkson (1944). But it was Berkson who persisted and fought a long and spirited campaign which lasted for several decades.

Berkson’s suggestion was not well received by the biometric establishment. In the first place, the logit was regarded as somewhat inferior and disreputable because, unlike the probit, it cannot be related to an underlying (normal) distribution of tolerance levels. Aitchison and Brown (1957) dismiss the logit in a single sentence, because it “lacks a well-recognized and manageable frequency distribution of tolerances which the probit curve does possess in a natural way” (p. 72). Berkson was aware of this defect and tried to remedy it by adapting the autocatalytic argument, in Berkson (1951), but this did not convince, as this argument essentially deals with a process over time. In retrospect it is surprising that so much importance was attached to these somewhat ideological points of interpretation. At the time no one (not even Berkson) seems to have recognized the formidable power of the logistic’s analytical properties. In the second place, Berkson’s case for the logit was not helped by his simultaneous attacks on the established wisdom of maximum likelihood estimation and his advocacy of minimum chi-squared. The unpleasant atmosphere in which this discussion was conducted can be gauged from the acrimonious exchanges between R.A. Fisher and Berkson in Fisher (1954).

In the practical aspect of ease of computation the logit had a clear advantage over the probit, even with maximum likelihood estimation. To quote Cochran (from his comments on Fisher (1954), p. 147): “.. the speed with which a new technique becomes widely used is considerably influenced by the simplicity or otherwise of the calculations that it requires. Next door to the lecture room in which the probit method is expounded one may still find the laboratory in which the workers compute their LD50s by the [much less sophisticated] Behrens (Reed–Muench) method ..”. On this count the logit spread much more quickly in workfloor practice than in the academic discourse. Until the advent of the computer and the pocket calculator, some twenty years later, all numerical work was done by hand, that is with pencil and paper, sometimes aided by graphical inspection of ’freehand curves’, ’fitted by eye’. For probit and logit analyses of grouped data or class frequencies there was graph paper with a special grid on which a probit or logit curve would appear as a straight line. Wilson (1925) had introduced the logistic (or ’autocatalytic’) grid, and examples of lognormal paper can be found in Aitchison and Brown (1957) and Adam (1958);³ Berkson himself had designed logistic graph paper as well as several nomograms.⁴ Numerical work was supported rather feebly by the slide rule and by mechanical calculating machines, driven by hand or powered by a small electric motor, which were capable of addition and multiplication; punched card equipment was helpful if numerous data had to be analysed. Values of the normal distribution (and of exponentials and logarithms) were obtained from printed tables like Pearson’s Biometrika Tables or the Statistical Tables of Fisher and Yates (1938). From the first edition the latter carried specially designed tables for probit analysis (with auxiliary tables contributed by Bliss and by Finney), and from the fifth edition of 1957 onwards they also included special tables for logit analysis.
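Logit paper works because, under (12), the logits of observed response frequencies plotted against (log) dose should fall on a straight line. The Python sketch below replaces the ruler with least squares; the doses and counts are hypothetical. With `weighted=True` it uses weights n f(1 − f), which is the usual textbook statement of Berkson’s minimum logit chi-square estimator; treat that identification as our gloss rather than something stated in the text.

```python
import math
from typing import List, Sequence, Tuple


def empirical_logits(responders: Sequence[int], group_sizes: Sequence[int]) -> List[float]:
    """log(f / (1 - f)) for each group's observed response frequency f = r / n."""
    out = []
    for r, n in zip(responders, group_sizes):
        if not 0 < r < n:
            raise ValueError("frequencies of 0 or 1 have no finite logit")
        out.append(math.log(r / (n - r)))
    return out


def fit_logit_line(doses: Sequence[float], responders: Sequence[int],
                   group_sizes: Sequence[int], weighted: bool = False) -> Tuple[float, float]:
    """Fit logit(f) = alpha + beta * dose by (optionally weighted) least squares."""
    ys = empirical_logits(responders, group_sizes)
    if weighted:
        ws = [n * (r / n) * (1 - r / n) for r, n in zip(responders, group_sizes)]
    else:
        ws = [1.0] * len(ys)
    sw = sum(ws)
    xbar = sum(w * x for w, x in zip(ws, doses)) / sw
    ybar = sum(w * y for w, y in zip(ws, ys)) / sw
    sxy = sum(w * (x - xbar) * (y - ybar) for w, x, y in zip(ws, doses, ys))
    sxx = sum(w * (x - xbar) ** 2 for w, x in zip(ws, doses))
    beta = sxy / sxx
    alpha = ybar - beta * xbar
    return alpha, beta


# Hypothetical grouped assay: log-doses, responders, and 40 subjects per group.
log_dose = [1.0, 1.5, 2.0, 2.5, 3.0]
dead = [4, 9, 20, 31, 36]
n = [40] * 5
for w in (False, True):
    a, b = fit_logit_line(log_dose, dead, n, weighted=w)
    print(w, round(a, 3), round(b, 3), "LD50 at log-dose", round(-a / b, 3))
```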

In time, the ideological conflict over bio-assay abated. Finney, who had ignored the logit in the second edition of his textbook of 1952, made amends in the third edition of 1970, recognizing (somewhat belatedly) that “what matters is the dependence of P on dose and the unknown parameters, and the tolerance distribution is merely a substructure leading to this” (p. 47). In fact the narrow conflict between probit and logit in bio-assay had long been overtaken by independent developments in statistics and biometrics.

4 The ascent of the logit

When the ideological debate about logit and probit in bio-assay had abated, around 1960, the logit terminology and the logit transformation of (12) were soon much more widely adopted, and their origins forgotten. An accurate history of the adoption and further development of the logit would require an intimate knowledge of several quite distinct disciplines, for many new generalizations were introduced independently, in almost complete isolation, in unrelated fields of applied work. We shall here only briefly touch upon some major movements in statistics, in epidemiology, and in the social sciences and econometrics, without attempting a systematic treatment.

The earliest developments took place in the late 1950’s and the 1960’s in statistics and epidemiology. In statistics, the analytical advantages of the logit transformation as a means of dealing with discrete binary outcomes were soon recognized. Cox was among the first to explore (and exploit) these possibilities; he wrote a series of papers around 1960, and followed these up with an influential textbook in 1969. The logit model of bio-assay is easily generalized to logistic regression, where binary outcomes are related to a number of determinants without a specific theoretical background, and this statistical model proved as fertile as linear regression in an earlier era. Later, the link of the logistic model with discriminant analysis was recognized, and its ready association with loglinear models in general. In epidemiology, case-control studies began even earlier, and since these are directly concerned with odds, and odds ratios, the log-odds or logit transformation arises naturally. The practice had already called for a theoretical justification, especially of the sampling aspects, from an early date; see, for example, the work of Cornfield in the early 1950’s.

³ Finney (1971) traces the invention of the probability grid to a French artilleryman of the late 1890’s.

⁴ A nomogram is a graph from which one can read off a transformation, as from a table; sophisticated nomograms may permit the quick solution of more complicated equations.
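In a 2×2 table the log odds ratio is exactly a difference of two logits, and it comes out the same whether the odds are taken across exposure groups or across cases and controls; this symmetry is what allows a case-control (retrospective) sample to estimate it. The Python sketch below checks both statements on a hypothetical table; the cell layout is an assumption of the sketch.

```python
import math


def log_odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Log odds ratio log(ad / bc) of a 2x2 table laid out (in this sketch) as

                    cases   controls
        exposed       a        b
        unexposed     c        d
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive")
    return math.log((a * d) / (b * c))


def logit(p: float) -> float:
    """Log-odds of a proportion p."""
    return math.log(p / (1.0 - p))


# Hypothetical counts.
a, b, c, d = 30, 70, 10, 90
# Logit of being a case, exposed minus unexposed ...
by_exposure = logit(a / (a + b)) - logit(c / (c + d))
# ... and logit of being exposed, cases minus controls.
by_case_status = logit(a / (a + c)) - logit(b / (b + d))
print(log_odds_ratio(a, b, c, d), by_exposure, by_case_status)
```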

Table 1. Number of articles in statistical journals containing the word ’probit’ or ’logit’.

Period     probit   logit
1935–39       6       -
1940–44       3       1
1945–49      22       6
1950–54      50      15
1955–59      53      23
1960–64      41      27
1965–69      43      41
1970–74      48      61
1975–79      45      72
1980–84      93     147
1985–89      98     215
1990–94     127     311

The ascent of the logit in the statistical literature is illustrated in Table 1, which is drawn from the JSTOR electronic repertory of major statistical journals in the English language.⁵ The table shows the number of articles which contain the word “probit” or “logit”. It must be borne in mind that the overall number of articles in these journals increases substantially over time; from 1935 to 1985 it increased about eightfold. Up to around 1970 the relative numbers show the predominance of probit in bio-assay; then logit soars ahead, not because of the after-effects of a victory over probit in bio-assay, but because of its much wider use in statistical theory and applications generally.

⁵ These are all the journals of the Royal Statistical Society and of the American Statistical Association; the Annals of Applied Probability, Annals of (Mathematical) Statistics, Annals of Probability, Biometrics, Biometrika and Statistical Science.

Until about 1980 computational effort was still an important issue in the discussion of statistical techniques, but by then the computer revolution put an end to this. On the specific issue of estimating logit and probit models, maximum likelihood estimation became the norm when routines for this method, applicable to individual data, were included in commercial statistical program packages. This facility was probably first offered by the BMDP (or biomedical data processing) program of 1977. By the time the first comprehensive textbook with medical applications, Hosmer and Lemeshow (1989), was published, the use of such routines was taken for granted. Of the two causes Berkson advocated, minimum chi-squared estimation was effectively overtaken by the computer revolution, while the logit transformation of (12) was triumphant.
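What such a routine does can be sketched in a few lines: maximum likelihood for the binary logit with individual 0/1 data, solved by Newton–Raphson. This is a minimal single-covariate illustration written for this note, not the BMDP algorithm, and it does no checking for perfectly separated data, where the estimates do not exist. With a single 0/1 covariate the slope estimate equals the sample log odds ratio, which gives a convenient check; the data are hypothetical.

```python
import math
from typing import Sequence, Tuple


def fit_logit_ml(x: Sequence[float], y: Sequence[int],
                 max_iter: int = 50, tol: float = 1e-12) -> Tuple[float, float]:
    """ML estimates (alpha, beta) of P(y = 1 | x) = 1 / (1 + exp(-(alpha + beta x))).

    Each Newton-Raphson step solves I(theta) * step = score(theta), with I the
    information matrix (minus the Hessian of the log-likelihood).
    """
    alpha = beta = 0.0
    for _ in range(max_iter):
        s0 = s1 = i00 = i01 = i11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(alpha + beta * xi)))
            w = p * (1.0 - p)
            s0 += yi - p          # score for alpha
            s1 += (yi - p) * xi   # score for beta
            i00 += w
            i01 += w * xi
            i11 += w * xi * xi
        det = i00 * i11 - i01 * i01
        d_alpha = (i11 * s0 - i01 * s1) / det
        d_beta = (i00 * s1 - i01 * s0) / det
        alpha += d_alpha
        beta += d_beta
        if abs(d_alpha) + abs(d_beta) < tol:
            break
    return alpha, beta


# Hypothetical data: a 0/1 exposure with 3 of 10 and 7 of 10 responses.
x = [0] * 10 + [1] * 10
y = [1] * 3 + [0] * 7 + [1] * 7 + [0] * 3
alpha, beta = fit_logit_ml(x, y)
print(alpha, math.log(3 / 7))      # intercept = logit of 0.3
print(beta, 2 * math.log(7 / 3))   # slope = log odds ratio (7/3) / (3/7)
```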

We conclude with some remarks on contributions from econometrics and the social sciences. We have earlier indicated that the probit model of bio-assay was readily adopted in these disciplines. The theoretical justification of bio-assay in terms of determinate stimulus and random thresholds was first jettisoned in the change to logistic regression, and then retrieved in the form of the latent regression equation model that is still dear to the behavioural sciences. This model probably goes back to McKelvey and Zavoina (1975), who introduce it in an ordered probit analysis of the voting behaviour of US Congressmen. An example of simultaneous independent discoveries is the generalization of logistic regression to the multinomial or polychotomous case. This was first set out, at some length, by Gurland, Lee, and Dahm (1960). Several years later it was put forward quite independently by the statistician Cox (1966) and by the biometric statistician Mantel (1966). And some years later again it was once more rediscovered independently by the econometrician Theil (1969), who arrived at it from the general perspective of modelling shares.
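The multinomial generalization replaces the single Z of (1) by one score per category and makes each probability proportional to the exponential of its score; read as shares, these sum to one. The Python sketch below is a generic illustration with hypothetical scores. Fixing one category’s score at zero, a common identification convention assumed here, and taking two categories recovers the binary logistic (1).

```python
import math
from typing import List, Sequence


def multinomial_logit(scores: Sequence[float]) -> List[float]:
    """Shares exp(z_j) / sum_k exp(z_k) for categories j = 0, ..., J-1.

    Subtracting the largest score first leaves the shares unchanged but prevents overflow.
    """
    m = max(scores)
    exps = [math.exp(z - m) for z in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Two categories with the first score fixed at zero give back the binary logistic (1).
z = 1.3
print(multinomial_logit([0.0, z])[1], 1.0 / (1.0 + math.exp(-z)))
# Three hypothetical categories: the shares are positive and sum to one.
shares = multinomial_logit([0.2, -0.5, 1.0])
print(shares, sum(shares))
```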

For a long time, logistic regression, whether in the binary or the multinomial context, was principally used as a technique, a simple tool without a specific underlying process and therefore without a characteristic interpretation. But in 1973 McFadden, working as a consultant for a Californian public transportation project, linked the multinomial logit to the theory of discrete choice from mathematical psychology. This provided a theoretical foundation of the logit model that is much more profound than any theory put forward for the use of the probit in bio-assay. It earned its author the Nobel prize in economics in 2000.
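The link to discrete choice is usually presented as a random utility model: each alternative j has utility v_j + e_j, the chooser picks the alternative with the highest utility, and if the e_j are independent standard Gumbel (type I extreme value) variables the choice probabilities are exactly the multinomial logit shares. The Python sketch below checks this by simulation; the utility values and the "travel mode" labels are hypothetical, and this is a generic illustration of the random utility argument rather than McFadden’s own derivation.

```python
import math
import random
from typing import List, Sequence


def gumbel_draw(rng: random.Random) -> float:
    """Standard Gumbel (type I extreme value) variate by inversion of its distribution function."""
    u = rng.random()
    while u == 0.0:  # random() can return 0.0, whose log is undefined
        u = rng.random()
    return -math.log(-math.log(u))


def simulated_choice_shares(v: Sequence[float], draws: int, seed: int = 1) -> List[float]:
    """Share of draws in which each alternative has the highest utility v_j + e_j."""
    rng = random.Random(seed)
    counts = [0] * len(v)
    for _ in range(draws):
        utilities = [vj + gumbel_draw(rng) for vj in v]
        best = max(range(len(v)), key=lambda j: utilities[j])
        counts[best] += 1
    return [c / draws for c in counts]


def logit_shares(v: Sequence[float]) -> List[float]:
    """Multinomial logit probabilities exp(v_j) / sum_k exp(v_k)."""
    m = max(v)
    e = [math.exp(x - m) for x in v]
    total = sum(e)
    return [x / total for x in e]


# Hypothetical systematic utilities of three travel modes.
v = [0.0, 0.5, -0.3]
print(simulated_choice_shares(v, draws=100_000))
print(logit_shares(v))  # the two lines should agree to about two decimals
```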

References

Adam, D. (1958). Les Reactions du Consommateur Devant les Prix. Number 15 in Observation Economique. Paris: Sedes.

Aitchison, J. and J. A. C. Brown (1957). The Lognormal Distribution. Number 5 in University of Cambridge, Department of Applied Economics Monographs. Cambridge: Cambridge University Press.

Berkson, J. (1944). Application of the logistic function to bio-assay. Jour-nal of the American Statistical Association 39, 357–365.

Berkson, J. (1951). Why I prefer logits to probits. Biometrics 7, 327–339.

Berkson, J. (1980). Minimum chi-square, not maximum likelihood! Annals of Statistics 8, 457–487.

Bliss, C. I. (1934a). The method of probits. Science 79, 38–39.

Bliss, C. I. (1934b). The method of probits. Science 79, 409–410.

Bliss, C. I. (1935). The calculation of the dosage-mortality curve. Annals of Applied Biology 22, 134–167. With an appendix by R. A. Fisher.

Cornfield, J. (1951). A method of estimating comparative rates from clinical data. Journal of the National Cancer Institute 11, 1269–1275.

Cornfield, J. (1956). A statistical problem arising from retrospective studies. In J. Neyman (Ed.), Proceedings of the Third Berkeley Symposium on Mathematical Statistics and Probability, Berkeley, Calif., pp. 135–148. University of California Press.

Cox, D. R. (1958). The regression analysis of binary sequences. Journal of the Royal Statistical Society, Series B 20, 215–242.

Cox, D. R. (1966). Some procedures connected with the logistic qualitative response curve. In F. David (Ed.), Research Papers in Statistics: Festschrift for J. Neyman, pp. 55–71. London: Wiley.

Cox, D. R. (1969). Analysis of Binary Data. London: Chapman and Hall.


de Beauharnais, H. (1927). Mémoires de la Reine Hortense, publiés par le prince Napoléon. Paris: Plon. Three volumes.

Du Pasquier, L.-G. (1918). Esquisse d'une nouvelle théorie de la population. Vierteljahrsschrift der Naturforschenden Gesellschaft in Zürich 63, 236–249.

Du Pasquier, L.-G. (1926). Le calcul des probabilités. Paris: Hermann.

Farrell, M. J. (1954). The demand for motorcars in the United States. Journal of the Royal Statistical Society, Series A 117, 171–200.

Fechner, G. T. (1860). Elemente der Psychophysik. Leipzig: Breitkopf und Härtel.

Finney, D. (1971). Probit Analysis (Third ed.). Cambridge: Cambridge University Press. First edition in 1947.

Fisher, R. A. (1954). The analysis of variance with various binomial transformations. Biometrics 10, 130–151. With comments by M. S. Bartlett, F. J. Anscombe, W. G. Cochran and J. Berkson.

Fisher, R. A. and F. Yates (1938). Statistical Tables for Biological, Agricultural and Medical Research. Edinburgh: Oliver and Boyd.

Gaddum, J. H. (1933). Reports on Biological Standard III. Methods of Biological Assay Depending on a Quantal Response. London: Medical Research Council. Special Report Series of the Medical Research Council, no. 183.

Gurland, J., I. Lee, and P. A. Dahm (1960). Polychotomous quantal response in biological assay. Biometrics 16, 382–398.

Hosmer, D. W. and S. Lemeshow (2000). Applied Logistic Regression (Second ed.). New York: Wiley. First edition in 1989.

Liagre, J. B. J. (1852). Calcul des probabilités et théorie des erreurs. Bruxelles: Société pour l'émancipation intellectuelle (A. Jamard).

Liagre, J. B. J. (1879). Calcul des probabilités et théorie des erreurs. Bruxelles, Paris: Muquardt, Gauthier-Villars. 2ème édition, revue par Camille Peney.

Malthus, T. R. (1798). An Essay on the Principle of Population. London: anon.


Mantel, N. (1966). Models for complex contingency tables and polychotomous dosage response curves. Biometrics 22, 83–95.

McFadden, D. (2001). Economic choices. American Economic Review 91, 352–370. Nobel prize acceptance speech.

McKelvey, R. D. and W. Zavoina (1975). A statistical model for the analysis of ordinal level dependent variables. Journal of Mathematical Sociology 4, 103–120.

Meadows, D. H., D. L. Meadows, J. Randers, and W. W. Behrens (1972). The Limits to Growth. New York: Universe Books.

Miner, J. R. (1933). Pierre-François Verhulst, the discoverer of the logistic curve. Human Biology 5, 673–689.

Pearl, R. (1922). The Biology of Death. Philadelphia: Lippincott.

Pearl, R. (1927). The indigenous population of Algeria in 1926. Science 66, 593–594.

Pearl, R. and L. J. Reed (1920). On the rate of growth of the population of the United States since 1790 and its mathematical representation. Proceedings of the National Academy of Sciences 6, 275–288.

Pearl, R. and L. J. Reed (1922). A further note on the mathematical theory of population growth. Proceedings of the National Academy of Sciences 8, 365–368.

Pearl, R. and L. J. Reed (1923). On the mathematical theory of population growth. Metron 5, 6–19.

Pearl, R., L. J. Reed, and J. F. Kish (1940). The logistic curve and the census count of 1940. Science 92, 486–488.

Pearl, R., C. P. Winsor, and F. B. White (1928). The form of the growth curve of the cantaloupe (Cucumis melo) under field conditions. Proceedings of the National Academy of Sciences 14, 895–901.

Pearson, K. (1914). Tables for Statisticians and Biometricians. Cambridge: Cambridge University Press.

Quetelet, A. (1848). Du système social et des lois qui le régissent. Paris: Guillaume.

Quetelet, A. (1850). Notice sur Pierre-François Verhulst. Annuaire de l'Académie Royale des Sciences, Lettres et des Beaux-Arts 16, 97–124.


Reed, L. J. and J. Berkson (1929). The application of the logistic function to experimental data. Journal of Physical Chemistry 33, 760–779.

Stigler, S. M. (1986). The History of Statistics. Cambridge, Mass.: Harvard University Press.

Theil, H. (1969). A multinomial extension of the linear logit model. International Economic Review 10, 251–259.

Vanpaemel, G. (1987). Quetelet en Verhulst over de mathematische wetten van de bevolkingsgroei. Academiae Analecta, Mededelingen van de Koninklijke Academie voor Wetenschappen, Letteren en Schoone Kunsten van België 49, 99–114.

Verhulst, P.-F. (1838). Notice sur la loi que la population suit dans son accroissement. Correspondance Mathématique et Physique, publiée par A. Quetelet 10, 113.

Verhulst, P.-F. (1845). Recherches mathématiques sur la loi d'accroissement de la population. Nouveaux Mémoires de l'Académie Royale des Sciences, des Lettres et des Beaux-Arts de Belgique 18, 1–32.

Verhulst, P.-F. (1847). Deuxième mémoire sur la loi d'accroissement de la population. Nouveaux Mémoires de l'Académie Royale des Sciences, des Lettres et des Beaux-Arts de Belgique 20, 1–32.

Wilson, E. B. (1925). The logistic or autocatalytic grid. Proceedings of the National Academy of Sciences 11, 451–456.

Wilson, E. B. and J. Worcester (1943). The determination of L.D.50 and its sampling error in bio-assay. Proceedings of the National Academy of Sciences 29, 79. First of a series of three articles.

Winsor, C. P. (1932). A comparison of certain symmetrical growth curves. Journal of the Washington Academy of Sciences 22, 73–84.

Yule, G. U. (1925). The growth of population and the factors which control it. Journal of the Royal Statistical Society 138, 1–59.

Biographical sources

I have consulted a number of obituaries and other sources about Pearl, Reed, Bliss, Gaddum and Berkson, namely:

Jennings, H. S. (1941) Raymond Pearl, 1879–1940. Biographical Memoirs of the National Academy of Sciences of the United States 22, nr. 14, 295–347.


Miner, John R. and Joseph Berkson (1940) Raymond Pearl, 1879–1940. The Scientific Monthly 52, 1092–194.

Cochran, W. G. (1967) Lowell Jacob Reed. Journal of the Royal Statistical Society, Series A 130, 279–281.

Feldberg, W. (1967) John Henry Gaddum, 1900–1965. Biographical Memoirs of Fellows of the Royal Society 13, 57–77.

Cochran, W. G. (1979) Chester Ittner Bliss, 1899–1979. Biometrics 35, 715–717.

(about Bliss) Salsburg, D. (2001) The Lady Tasting Tea. New York: Holt.

Armitage, P., and T. Colton (eds) (1998) Joseph Berkson 1899–1982. Encyclopedia of Biostatistics, volume I, 290–300. New York: Wiley.

Taylor, W. F. (1983) Joseph Berkson, 1899–1982. Journal of the Royal Statistical Society, Series A 146, 413–419.



Sigmoid function

The logistic curve

Plot of the error function

Many natural processes, including those of complex system learning curves, exhibit a progression from small beginnings that accelerates and approaches a climax over time. When a detailed description is lacking, a sigmoid function is often used. A sigmoid curve is produced by a mathematical function having an "S" shape. Often, sigmoid function refers to the special case of the logistic function (plotted above) defined by the formula

$$P(t) = \frac{1}{1 + e^{-t}}.$$

Another example is the Gompertz curve. It is used in modeling systems that saturate at large values of t. Another example is the ogee curve as used in the spillway of some dams. A wide variety of sigmoid functions have been used as the activation function of artificial neurons, including the logistic function and tanh(x).


Properties

In general, a sigmoid function is real-valued and differentiable, having either a non-negative or non-positive first derivative which is bell shaped. There is also a pair of horizontal asymptotes as $t \to \pm\infty$. The logistic functions are sigmoidal and are characterized as the solutions of the differential equation[1]

$$\frac{dP}{dt} = P\,(1 - P).$$
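As a quick numerical check of this characterization, the sketch below integrates dP/dt = P(1 − P) from P(0) = 1/2 with the explicit Euler method and compares the result with 1/(1 + e^{−t}). The end time and step count are arbitrary choices, and explicit Euler is used only because it is the simplest integrator, not because any cited source uses it.

```python
import math

def logistic(t: float) -> float:
    """Standard logistic function 1 / (1 + e^(-t))."""
    return 1.0 / (1.0 + math.exp(-t))

def euler_logistic(p0: float, t_end: float, steps: int) -> float:
    """Integrate dP/dt = P (1 - P) from t = 0 to t_end with explicit Euler steps."""
    h = t_end / steps
    p = p0
    for _ in range(steps):
        p += h * p * (1.0 - p)
    return p

# Starting from P(0) = 1/2, the numerical solution should track the logistic curve.
approx = euler_logistic(0.5, 5.0, 10_000)
exact = logistic(5.0)
print(f"Euler: {approx:.6f}  exact: {exact:.6f}")
```

Halving the step size roughly halves the discrepancy, as expected for a first-order method.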

Examples

Some sigmoid functions compared. In the drawing all functions are normalized in such a way that their slope at 0 is 1.

Besides the logistic function, sigmoid functions include the ordinary arctangent, the hyperbolic tangent, and the error function, but also the generalised logistic function and algebraic functions like $f(x) = \frac{x}{\sqrt{1 + x^2}}$.
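To make the normalization in the figure caption concrete, here is a sketch that rescales several of these functions to the range (−1, 1) with slope 1 at the origin. The particular rescalings are an assumption chosen to match that caption, not taken from the figure's source. Note that the logistic function rescaled this way, 2/(1 + e^{−2x}) − 1, is exactly tanh(x).

```python
import math

# Each function maps the real line into (-1, 1) and is scaled so that f'(0) = 1.
SIGMOIDS = {
    "tanh": math.tanh,
    "arctan": lambda x: (2.0 / math.pi) * math.atan((math.pi / 2.0) * x),
    "erf": lambda x: math.erf((math.sqrt(math.pi) / 2.0) * x),
    "algebraic": lambda x: x / math.sqrt(1.0 + x * x),
    "logistic": lambda x: 2.0 / (1.0 + math.exp(-2.0 * x)) - 1.0,  # identical to tanh(x)
}

def slope_at_zero(f, h: float = 1e-6) -> float:
    """Central finite-difference estimate of f'(0)."""
    return (f(h) - f(-h)) / (2.0 * h)

for name, f in SIGMOIDS.items():
    print(f"{name:10s} f'(0) ~ {slope_at_zero(f):.6f}   f(5) = {f(5.0):.4f}")
```

All five share the same slope at the origin but approach their asymptotes at very different rates: the arctangent and algebraic forms approach ±1 only polynomially, while tanh and erf approach them exponentially fast.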

The integral of any smooth, positive, "bump-shaped" function will be sigmoidal, thus the cumulative distribution functions for many common probability distributions are sigmoidal. The most famous such example is the error function, which, after shifting and scaling, is the cumulative distribution function of the normal distribution.
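A numerical illustration of this statement uses the standard normal density as the "bump". Integrating it with the trapezoidal rule reproduces the normal CDF, whose closed form is Φ(x) = ½(1 + erf(x/√2)). The lower limit −10 and the number of panels are assumptions, chosen so that truncation and discretisation errors stay well below 10⁻⁶.

```python
import math

def gaussian_bump(x: float) -> float:
    """Standard normal density: smooth, positive and bump-shaped."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def integrate_bump(x: float, lower: float = -10.0, n: int = 20_000) -> float:
    """Trapezoidal approximation of the integral of the bump from `lower` to x."""
    h = (x - lower) / n
    total = 0.5 * (gaussian_bump(lower) + gaussian_bump(x))
    for i in range(1, n):
        total += gaussian_bump(lower + i * h)
    return total * h

def normal_cdf(x: float) -> float:
    """Closed form of the same integral, written with the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

for x in (-2.0, 0.0, 1.0, 3.0):
    print(f"x = {x:+.1f}  numeric = {integrate_bump(x):.6f}  erf form = {normal_cdf(x):.6f}")
```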

References

[1] http://www.ai.mit.edu/courses/6.892/lecture8-html/sld015.htm

• Tom M. Mitchell, Machine Learning, WCB-McGraw-Hill, 1997, ISBN 0-07-042807-7. In particular see "Chapter 4: Artificial Neural Networks" (especially pp. 96–97), where Mitchell uses the terms "logistic function" and "sigmoid function" synonymously (he also calls this function the "squashing function"), and the sigmoid (aka logistic) function is used to compress the outputs of the "neurons" in multi-layer neural nets.

• http://www.computing.dcu.ie/~humphrys/Notes/Neural/sigmoid.html Properties of the sigmoid, including how it can shift along axes and how its domain may be transformed.


Article Sources and Contributors

Sigmoid function Source: http://en.wikipedia.org/w/index.php?oldid=407086458 Contributors: Alfie66, AnthonyQBachler, Barraki, BenFrantzDale, Booyabazooka, Cacycle, Chinasaur, Chinju, Cleared as filed, Closedmouth, Doubleddoubleu, Dr. Sunglasses, Eequor, Georg-Johann, Giftlite, GregorB, Hagman, HappyCamper, Hyperwiz, JHunterJ, Jacobolus, Jfmiller28, Jim.belk, Jorge Stolfi, KennethJ, Kku, Knights who say ni, Kri, Light current, Linas, Loluengo, MarkSweep, Mboverload, Melesse, Michael Hardy, Mike2vil, Mkch, MrOllie, Nbarth, New Image Uploader929, Paul Haymon, Pebkac, Pfhenshaw, Pmcray, Schroding79, Sławomir Biały, The Anome, Thelostchild, Tktktk, Tobias Bergemann, TravisTX, Trialsanderrors, Vjost, Waldir, Wolfman, Wvbailey, ZackV, ZeroOne, 45 anonymous edits

Image Sources, Licenses and Contributors

Image:Logistic-curve.svg Source: http://en.wikipedia.org/w/index.php?title=File:Logistic-curve.svg License: Public Domain Contributors: User:Qef

Image:Error Function.svg Source: http://en.wikipedia.org/w/index.php?title=File:Error_Function.svg License: Public Domain Contributors: User:Inductiveload

File:Gjl-t(x).svg Source: http://en.wikipedia.org/w/index.php?title=File:Gjl-t(x).svg License: GNU Free Documentation License Contributors: User:Georg-Johann

License

Creative Commons Attribution-Share Alike 3.0 Unported
http://creativecommons.org/licenses/by-sa/3.0/


1 16 January 2011 From www.computing.dcu.ie/~humphrys/Notes/Neural/sigmoid.html

Continuous Output - The sigmoid function

Given summed input:

$$x = \sum_{i=1}^{n} w_i I_i$$

Instead of a threshold, and fire/not fire, we could have continuous output y according to the sigmoid function:

$$y = \varsigma(x) = \frac{1}{1 + e^{-x}}$$

Note e and its properties. As x goes to minus infinity, y goes to 0 (tends not to fire). As x goes to infinity, y goes to 1 (tends to fire). At x = 0, y = 1/2.
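Here is a minimal sketch of ς in Python; the function name `sigmoid` is just a label chosen here. It evaluates 1/(1 + e^{−x}) but switches to the algebraically equal form e^{x}/(1 + e^{x}) for negative x, so `math.exp` never overflows for large |x|. With that in place, the limits 0 and 1 and the value 1/2 at x = 0 can be checked directly.

```python
import math

def sigmoid(x: float) -> float:
    """Logistic sigmoid y = 1 / (1 + e^(-x))."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))   # e^(-x) <= 1 here, so no overflow
    z = math.exp(x)                         # x < 0, so 0 < z < 1
    return z / (1.0 + z)                    # same value, rewritten to avoid e^(-x) overflow

for x in (-50.0, -5.0, 0.0, 5.0, 50.0):
    print(f"x = {x:+6.1f}  y = {sigmoid(x):.12f}")
```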


More threshold-like

We can make this more and more threshold-like, or step-like, by increasing the weights on the links, and so increasing the summed input:
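The effect of larger weights can be checked numerically. The sketch below compares ς(wx) against a hard 0/1 threshold at a few inputs away from x = 0; the sample inputs and weights are arbitrary, hypothetical choices. The gap shrinks as w grows, although at x = 0 itself ς(wx) stays at 1/2 for every w.

```python
import math

def sigmoid(x: float) -> float:
    """Logistic sigmoid 1 / (1 + e^(-x)), written so that exp() cannot overflow."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def step(x: float) -> float:
    """Hard threshold: fire (1) for positive summed input, otherwise do not fire (0)."""
    return 1.0 if x > 0 else 0.0

SAMPLE_INPUTS = [-1.0, -0.1, -0.01, 0.01, 0.1, 1.0]

def max_gap(w: float) -> float:
    """Largest |sigmoid(w x) - step(x)| over the sample inputs (x = 0 is excluded)."""
    return max(abs(sigmoid(w * x) - step(x)) for x in SAMPLE_INPUTS)

for w in (1, 10, 100, 1000):
    print(f"w = {w:4d}: largest gap to the step function = {max_gap(w):.5f}")
```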

More linear

Q. How do we make it less step-like (more linear)?

For any non-zero w, no matter how close to 0, ς(wx) will eventually be asymptotic to the lines y=0 and y=1.


Is this linear? Let's change the scale:

This is exactly the same function.


So it's not actually linear, but note that within the range -6 to 6 we can approximate a linear function with slope.

If x will always be within that range then for all practical purposes we have linear output with slope.
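To put a number on "approximately linear", the sketch below measures how far ς(wx) strays from its tangent line at the origin, 1/2 + wx/4, over −6 ≤ x ≤ 6. The tangent line has slope w/4 because ς'(0) = 1/4. The weight w = 0.1 is a hypothetical choice for illustration; the notes' own plots are not reproduced here.

```python
import math

def sigmoid(x: float) -> float:
    """Logistic sigmoid 1 / (1 + e^(-x)), written so that exp() cannot overflow."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def max_linear_error(w: float, lo: float, hi: float, samples: int = 1201) -> float:
    """Largest gap between sigmoid(w x) and its tangent line 1/2 + w x / 4 on [lo, hi]."""
    worst = 0.0
    for i in range(samples):
        x = lo + (hi - lo) * i / (samples - 1)   # evenly spaced, endpoints included
        worst = max(worst, abs(sigmoid(w * x) - (0.5 + w * x / 4.0)))
    return worst

w = 0.1  # hypothetical small weight
print(f"slope of the approximating line: {w / 4:.4f}")
print(f"max error on [-6, 6]: {max_linear_error(w, -6.0, 6.0):.5f}")
```

With w = 1 the same check fails badly, because over [−6, 6] the curve has already flattened out near 0 and 1.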

Or try this:

Is this linear? Let's change the scale:


This is exactly the same function.

Approximation of linear with slope

In practice, x will always be within some range. So we can always get, within that range, an approximation of many different linear functions with slope.

e.g. Given x will be from -30 to 30:
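Turning this around: for inputs known to lie in [−30, 30], one can ask how large the weight w may be while ς(wx) stays within a tolerance of the line 1/2 + wx/4. The sketch below answers that question by bisection. The tolerance of 0.01 and the choice of the tangent line as the target (rather than, say, a least-squares line) are assumptions made for illustration. Any smaller positive w gives an even closer fit, with a correspondingly smaller slope w/4.

```python
import math

def sigmoid(x: float) -> float:
    """Logistic sigmoid 1 / (1 + e^(-x)), written so that exp() cannot overflow."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def tangent_gap(z: float) -> float:
    """|sigmoid(z) - (1/2 + z/4)|; for z >= 0 this grows monotonically with z."""
    return abs(sigmoid(z) - (0.5 + z / 4.0))

def largest_weight(x_max: float, tol: float) -> float:
    """Largest w > 0 with |sigmoid(w x) - (1/2 + w x / 4)| <= tol for all |x| <= x_max.

    The gap is odd in z = w x and grows with |z|, so it is enough to find the z*
    where it reaches tol (by bisection) and set w = z* / x_max.
    """
    lo, hi = 0.0, 4.0   # tangent_gap(4.0) is about 0.52, above any small tolerance
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if tangent_gap(mid) <= tol:
            lo = mid
        else:
            hi = mid
    return lo / x_max

w = largest_weight(30.0, 0.01)
print(f"largest w = {w:.5f}, slope of the approximated line = {w / 4:.5f}")
```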


Approximation of any linear function so long as y stays in [0,1]

And centred on zero. To centre other than zero see below.

Linear y=1/2

The only way we can make ς(wx) exactly linear is to set w=0, then y = constant 1/2 for all x.

Change sign

We can also, by changing the sign of the weights, make large positive actual input lead to large negative summed input and hence no fire, and large negative actual input lead to fire.


Not centred on zero

This is of course a threshold-like function still centred on zero. To centre it on any threshold we use:

y = ς(x-t) where t is the threshold for this node. This threshold value is something that is learnt, along with the weights.


The "threshold" is now the centre point of the curve, rather than an all-or-nothing value.

ς(ax+b)

General case: use ς(ax+b)
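Here is a small sketch of the general form, assuming the same logistic ς. The output of ς(ax + b) is 1/2 where ax + b = 0, i.e. at x = −b/a, and a negative a reverses the direction of the curve. A node with weight w and threshold t, written as ς(w(x − t)) (a hypothetical weighted version of ς(x − t)), is the special case a = w, b = −wt. The function names are illustrative.

```python
import math

def sigmoid(x: float) -> float:
    """Logistic sigmoid 1 / (1 + e^(-x)), written so that exp() cannot overflow."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def general_sigmoid(x: float, a: float, b: float) -> float:
    """y = sigmoid(a x + b): a sets steepness and direction, b shifts the curve."""
    return sigmoid(a * x + b)

def centre(a: float, b: float) -> float:
    """Input at which a x + b = 0, i.e. where the output is exactly 1/2 (requires a != 0)."""
    return -b / a

# Hypothetical node with weight w and threshold t: sigmoid(w (x - t)) has a = w, b = -w t.
w, t = 2.0, 3.0
a, b = w, -w * t
print(centre(a, b), general_sigmoid(centre(a, b), a, b))   # centre is t, output 1/2
print(general_sigmoid(10.0, a, b), general_sigmoid(10.0, -a, -b))  # sign flip reverses
```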

Can we have linear output? Can y be linear? Not if it has a non-zero slope, because y must stay between 0 and 1.

It can be a linear constant, y = c, with c between 0 and 1. We already saw y = 1/2. Can we have other y = c?

By setting a = 0, y = ς(b) is constant for all x. By varying b, we can have constant output y = c for any c strictly between 0 and 1: the required value is b = ln(c / (1 - c)).
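The required bias is the inverse of the sigmoid, the logit. Solving ς(b) = 1/(1 + e^{−b}) = c gives b = ln(c/(1 − c)). Here is a minimal sketch; `bias_for_constant` is a name chosen here for illustration.

```python
import math

def sigmoid(x: float) -> float:
    """Logistic sigmoid 1 / (1 + e^(-x)), written so that exp() cannot overflow."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def bias_for_constant(c: float) -> float:
    """Return b with sigmoid(b) = c, i.e. the logit ln(c / (1 - c)); needs 0 < c < 1."""
    if not 0.0 < c < 1.0:
        raise ValueError("c must lie strictly between 0 and 1")
    return math.log(c / (1.0 - c))

for c in (0.1, 0.5, 0.9):
    b = bias_for_constant(c)
    # With a = 0 the input x no longer matters: the output is sigmoid(0 * x + b) = c.
    outputs = [sigmoid(0.0 * x + b) for x in (-5.0, 0.0, 5.0)]
    print(f"c = {c}: b = {b:+.4f}, outputs = {[round(y, 6) for y in outputs]}")
```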


Reminder - differentiation rules

Product Rule:

d/dx (fg) = f (dg/dx) + g (df/dx)

Quotient Rule:

d/dx (f/g) = ( g (df/dx) - f (dg/dx) ) / g^2

Properties of the sigmoid function
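The slope y(1 − y) used below can be derived with the quotient rule above, taking f = 1 and g = 1 + e^{−x}. A worked version of that step:

```latex
\begin{aligned}
y &= \frac{1}{1 + e^{-x}} \\
\frac{dy}{dx} &= \frac{(1 + e^{-x}) \cdot 0 - 1 \cdot (-e^{-x})}{(1 + e^{-x})^{2}}
  = \frac{e^{-x}}{(1 + e^{-x})^{2}} \\
&= \frac{1}{1 + e^{-x}} \cdot \frac{e^{-x}}{1 + e^{-x}}
  = y\,(1 - y)
\end{aligned}
```

The last step uses 1 − y = e^{−x}/(1 + e^{−x}).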


Max/min value of slope

Slope = y (1-y). The slope is greatest where? And least where?

To prove this, take the next derivative and look for where it equals 0:

d/dy ( y (1-y) ) = y (-1) + (1-y) 1 = -y + 1 - y = 1 - 2y = 0 for y = 1/2. This is a maximum. There is no minimum: the slope approaches 0 as y approaches 0 or 1, but never reaches it.
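Both claims can be checked numerically. The sketch below compares a central finite-difference estimate of dy/dx with y(1 − y) at a few inputs. It then scans y over [0, 1] to confirm that y(1 − y) peaks at y = 1/2 with value 1/4. The step size h and the grid are arbitrary choices.

```python
import math

def sigmoid(x: float) -> float:
    """Logistic sigmoid 1 / (1 + e^(-x)), written so that exp() cannot overflow."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def numerical_slope(x: float, h: float = 1e-5) -> float:
    """Central finite-difference estimate of d sigmoid / dx at x."""
    return (sigmoid(x + h) - sigmoid(x - h)) / (2.0 * h)

for x in (-4.0, -1.0, 0.0, 1.0, 4.0):
    y = sigmoid(x)
    print(f"x = {x:+.1f}: finite difference {numerical_slope(x):.8f}, y(1 - y) = {y * (1 - y):.8f}")

# Scan y over [0, 1] (the sigmoid itself never reaches the end points 0 and 1).
ys = [i / 1000 for i in range(1001)]
y_best = max(ys, key=lambda y: y * (1 - y))
print(f"maximum slope {y_best * (1 - y_best)} at y = {y_best}")
```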

Slope of ς(ax+b)

For the general case:

y = ς(ax+b)


a positive or negative, fraction or multiple; b positive or negative.

y = ς(z) where z = ax+b

dy/dx = dy/dz dz/dx = y(1-y) a

If a is positive, all slopes are positive, and the steepest slope (highest positive slope) is at y = 1/2.

If a is negative, all slopes are negative, and the steepest slope (lowest negative slope) is at y = 1/2.

i.e. the slope takes a different value, but is still steepest at y = 1/2.
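Here is a sketch of the general case, assuming the same logistic ς. By the chain rule the slope is a·y(1 − y). It is therefore steepest where ax + b = 0, i.e. at x = −b/a, where y = 1/2 and the slope equals a/4. The example values a = 2, b = −6 are hypothetical.

```python
import math

def sigmoid(x: float) -> float:
    """Logistic sigmoid 1 / (1 + e^(-x)), written so that exp() cannot overflow."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def slope(x: float, a: float, b: float) -> float:
    """dy/dx for y = sigmoid(a x + b), using the chain rule: a * y * (1 - y)."""
    y = sigmoid(a * x + b)
    return a * y * (1.0 - y)

def steepest_point(a: float, b: float) -> tuple[float, float]:
    """Input where y = 1/2 (so a x + b = 0) and the slope there, a / 4."""
    return -b / a, a / 4.0

a, b = 2.0, -6.0
xs = [i / 100 for i in range(601)]                        # grid on [0, 6]
x_best = max(xs, key=lambda x: abs(slope(x, a, b)))
print(x_best, steepest_point(a, b))
```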


Weibull distribution

In probability theory and statistics, the Weibull distribution is a continuous probability distribution. It is named after Waloddi Weibull who described it in detail in 1951, although it was first identified by Fréchet (1927) and first applied by Rosin & Rammler (1933) to describe the size distribution of particles.

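Definition

The probability density function of a Weibull random variable X is[1]:

$$f(x; k, \lambda) = \begin{cases} \dfrac{k}{\lambda}\left(\dfrac{x}{\lambda}\right)^{k-1} e^{-(x/\lambda)^{k}} & x \ge 0, \\ 0 & x < 0, \end{cases}$$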

where k > 0 is the shape parameter and λ > 0 is the scale parameter of the distribution. Its complementary cumulative distribution function is a stretched exponential function. The Weibull distribution is related to a number of other probability distributions; in particular, it interpolates between the exponential distribution (k = 1) and the Rayleigh distribution (k = 2).

If the quantity X is a "time-to-failure", the Weibull distribution gives a distribution for which the failure rate is proportional to a power of time. The shape parameter, k, is that power plus one, and so this parameter can be interpreted directly as follows:

• A value of k < 1 indicates that the failure rate decreases over time. This happens if there is significant "infant mortality", or defective items failing early and the failure rate decreasing over time as the defective items are weeded out of the population.

• A value of k = 1 indicates that the failure rate is constant over time. This might suggest random external events are causing mortality, or failure.

• A value of k > 1 indicates that the failure rate increases with time. This happens if there is an "aging" process, or parts that are more likely to fail as time goes on.

In the field of materials science, the shape parameter k of a distribution of strengths is known as the Weibull modulus.

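Properties

The cumulative distribution function for the Weibull distribution is

$$F(x; k, \lambda) = 1 - e^{-(x/\lambda)^{k}}$$

for x ≥ 0, and F(x; k, λ) = 0 for x < 0.

The failure rate h (or hazard rate) is given by

$$h(x; k, \lambda) = \frac{k}{\lambda}\left(\frac{x}{\lambda}\right)^{k-1}.$$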
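The interpretation of k in terms of failure rates can be illustrated with a short sketch; the function names are illustrative, not from any library. It implements the density f(x; k, λ) = (k/λ)(x/λ)^{k−1}e^{−(x/λ)^k}, the CDF F(x; k, λ) = 1 − e^{−(x/λ)^k} and the hazard h = f/(1 − F) = (k/λ)(x/λ)^{k−1}. It then prints the hazard at a few times for k = 0.5, 1 and 2, with λ = 1.

```python
import math

def weibull_pdf(x: float, k: float, lam: float) -> float:
    """Weibull density (k/lam) (x/lam)^(k-1) exp(-(x/lam)^k) for x >= 0, and 0 for x < 0."""
    if x < 0:
        return 0.0
    if x == 0:
        # Limit at the origin: infinite for k < 1, k/lam for k = 1, zero for k > 1.
        return math.inf if k < 1 else (k / lam if k == 1 else 0.0)
    return (k / lam) * (x / lam) ** (k - 1) * math.exp(-((x / lam) ** k))

def weibull_cdf(x: float, k: float, lam: float) -> float:
    """F(x; k, lam) = 1 - exp(-(x/lam)^k) for x >= 0, and 0 for x < 0."""
    return 1.0 - math.exp(-((x / lam) ** k)) if x >= 0 else 0.0

def weibull_hazard(x: float, k: float, lam: float) -> float:
    """Failure rate h(x) = f(x) / (1 - F(x)) = (k/lam) (x/lam)^(k-1), for x > 0."""
    return (k / lam) * (x / lam) ** (k - 1)

# Decreasing (k < 1), constant (k = 1) and increasing (k > 1) failure rates.
for k in (0.5, 1.0, 2.0):
    rates = [weibull_hazard(t, k, 1.0) for t in (0.5, 1.0, 2.0)]
    print(f"k = {k}: h(0.5), h(1), h(2) = {[round(r, 4) for r in rates]}")
```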

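Moments

The moment generating function of the logarithm of a Weibull distributed random variable is given by[2]

$$\operatorname{E}\left[e^{t \log X}\right] = \lambda^{t}\,\Gamma\!\left(\frac{t}{k} + 1\right)$$

where Γ is the gamma function. Similarly, the characteristic function of log X is given by

$$\operatorname{E}\left[e^{it \log X}\right] = \lambda^{it}\,\Gamma\!\left(\frac{it}{k} + 1\right).$$

In particular, the nth raw moment of X is given by:

$$m_n = \lambda^{n}\,\Gamma\!\left(1 + \frac{n}{k}\right).$$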


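The mean and variance of a Weibull random variable can be expressed as:

$$\operatorname{E}(X) = \lambda\,\Gamma\!\left(1 + \frac{1}{k}\right)$$

and

$$\operatorname{var}(X) = \lambda^{2}\left[\Gamma\!\left(1 + \frac{2}{k}\right) - \left(\Gamma\!\left(1 + \frac{1}{k}\right)\right)^{2}\right].$$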
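Here is a sketch that evaluates these formulas with `math.gamma` and cross-checks them against simulated data. It relies on Python's `random.weibullvariate(alpha, beta)`, whose `alpha` is the scale (λ here) and `beta` the shape (k here). The parameter values, seed and sample size are arbitrary.

```python
import math
import random

def weibull_raw_moment(n: int, k: float, lam: float) -> float:
    """n-th raw moment E[X^n] = lam^n * Gamma(1 + n/k)."""
    return lam ** n * math.gamma(1.0 + n / k)

def weibull_mean(k: float, lam: float) -> float:
    return weibull_raw_moment(1, k, lam)

def weibull_variance(k: float, lam: float) -> float:
    """Var(X) = E[X^2] - E[X]^2 = lam^2 [Gamma(1 + 2/k) - Gamma(1 + 1/k)^2]."""
    return weibull_raw_moment(2, k, lam) - weibull_mean(k, lam) ** 2

# Monte Carlo cross-check: random.weibullvariate(alpha, beta) has alpha = scale, beta = shape.
k, lam = 1.5, 2.0
rng = random.Random(12345)
sample = [rng.weibullvariate(lam, k) for _ in range(100_000)]
m = sum(sample) / len(sample)
v = sum((s - m) ** 2 for s in sample) / (len(sample) - 1)
print(f"mean: formula {weibull_mean(k, lam):.4f}, sample {m:.4f}")
print(f"variance: formula {weibull_variance(k, lam):.4f}, sample {v:.4f}")
```

For k = 1 the formulas reduce to the exponential mean λ and variance λ², and for k = 2 the mean is λ√π/2, which gives two easy sanity checks.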

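The skewness is given by:

$$\gamma_1 = \frac{\Gamma(1 + 3/k)\,\lambda^{3} - 3\mu\sigma^{2} - \mu^{3}}{\sigma^{3}},$$

where μ and σ are the mean and standard deviation of X. The excess kurtosis is given by:

$$\gamma_2 = \frac{-6\Gamma_1^{4} + 12\Gamma_1^{2}\Gamma_2 - 3\Gamma_2^{2} - 4\Gamma_1\Gamma_3 + \Gamma_4}{\left[\Gamma_2 - \Gamma_1^{2}\right]^{2}},$$

where $\Gamma_i = \Gamma(1 + i/k)$. The kurtosis excess may also be written as:

$$\gamma_2 = \frac{\lambda^{4}\,\Gamma(1 + 4/k) - 4\gamma_1\sigma^{3}\mu - 6\mu^{2}\sigma^{2} - \mu^{4}}{\sigma^{4}} - 3.$$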
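Since the two expressions for the excess kurtosis look quite different, a short numerical sketch can confirm that they agree. It uses Γ_i = Γ(1 + i/k), μ = λΓ_1 and σ² = λ²(Γ_2 − Γ_1²). For k = 1 (the exponential distribution) the skewness should be 2 and the excess kurtosis 6. The helper names are illustrative.

```python
import math

def _g(i: int, k: float) -> float:
    """Shorthand Gamma_i = Gamma(1 + i/k)."""
    return math.gamma(1.0 + i / k)

def weibull_skewness(k: float, lam: float = 1.0) -> float:
    mu = lam * _g(1, k)
    var = lam ** 2 * _g(2, k) - mu ** 2
    sigma = math.sqrt(var)
    return (lam ** 3 * _g(3, k) - 3 * mu * var - mu ** 3) / sigma ** 3

def weibull_excess_kurtosis(k: float) -> float:
    """First form, written with Gamma_1 ... Gamma_4 (it does not depend on lam)."""
    g1, g2, g3, g4 = (_g(i, k) for i in range(1, 5))
    return (-6 * g1**4 + 12 * g1**2 * g2 - 3 * g2**2 - 4 * g1 * g3 + g4) / (g2 - g1**2) ** 2

def weibull_excess_kurtosis_alt(k: float, lam: float = 1.0) -> float:
    """Second form, written with mu, sigma and the skewness gamma_1."""
    mu = lam * _g(1, k)
    sigma = math.sqrt(lam ** 2 * _g(2, k) - mu ** 2)
    g1 = weibull_skewness(k, lam)
    return (lam**4 * _g(4, k) - 4 * g1 * sigma**3 * mu - 6 * mu**2 * sigma**2 - mu**4) / sigma**4 - 3

for k in (1.0, 1.5, 3.0):
    print(k, weibull_skewness(k), weibull_excess_kurtosis(k), weibull_excess_kurtosis_alt(k, lam=2.0))
```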

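Moment generating function

A variety of expressions are available for the moment generating function of X itself. As a power series, since the raw moments are already known, one has

$$\operatorname{E}\left[e^{tX}\right] = \sum_{n=0}^{\infty} \frac{t^{n}\lambda^{n}}{n!}\,\Gamma\!\left(1 + \frac{n}{k}\right).$$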
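The power series can be compared with a direct numerical evaluation of E[e^{tX}] = ∫₀^∞ e^{tx} f(x) dx. This sketch assumes k > 1, where the series converges for every t. For k = 1 it converges only for |t| < 1/λ, and for k < 1 the expectation is infinite for every t > 0. The values k = 2, λ = 1 and t = 0.5, the truncation points and the midpoint rule are all illustrative choices.

```python
import math

def weibull_pdf(x: float, k: float, lam: float) -> float:
    """Weibull density for x > 0."""
    return (k / lam) * (x / lam) ** (k - 1) * math.exp(-((x / lam) ** k))

def mgf_series(t: float, k: float, lam: float, terms: int = 80) -> float:
    """Truncated power series sum_n (t lam)^n / n! * Gamma(1 + n/k)."""
    return sum((t * lam) ** n / math.factorial(n) * math.gamma(1.0 + n / k) for n in range(terms))

def mgf_integral(t: float, k: float, lam: float, upper: float, n: int = 100_000) -> float:
    """Midpoint-rule approximation of the integral of e^(t x) f(x) over [0, upper]."""
    h = upper / n
    return h * sum(
        math.exp(t * (i + 0.5) * h) * weibull_pdf((i + 0.5) * h, k, lam) for i in range(n)
    )

k, lam, t = 2.0, 1.0, 0.5
print(mgf_series(t, k, lam), mgf_integral(t, k, lam, upper=12.0))
```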

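Alternatively, one can attempt to deal directly with the integral

$$\operatorname{E}\left[e^{tX}\right] = \int_{0}^{\infty} e^{tx}\,\frac{k}{\lambda}\left(\frac{x}{\lambda}\right)^{k-1} e^{-(x/\lambda)^{k}}\,dx.$$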

If the parameter k is assumed to be a rational number, expressed as k = p/q where p and q are integers, then this integral can be evaluated analytically.[3] With t replaced by −t, one finds

where G is the Meijer G-function.

The characteristic function has also been obtained by Muraleedharan et al. (2007).

Information entropy

The information entropy is given by

$$H = \gamma\left(1 - \frac{1}{k}\right) + \ln\left(\frac{\lambda}{k}\right) + 1,$$

where γ is the Euler–Mascheroni constant.
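Here is a sketch that evaluates the closed-form entropy and checks it against a midpoint-rule approximation of −∫ f ln f dx. Python's `math` module has no Euler–Mascheroni constant, so it is written out as a literal. The integration limits and grid size are assumptions, chosen so that the neglected tail is negligible. For k = 1 the formula reduces to 1 + ln λ, the entropy of an exponential distribution with mean λ.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler–Mascheroni constant (not provided by math)

def weibull_entropy(k: float, lam: float) -> float:
    """Closed form H = gamma (1 - 1/k) + ln(lam / k) + 1 (differential entropy, in nats)."""
    return EULER_GAMMA * (1.0 - 1.0 / k) + math.log(lam / k) + 1.0

def entropy_numeric(k: float, lam: float, upper: float, n: int = 100_000) -> float:
    """Midpoint-rule approximation of -integral of f ln f over [0, upper]."""
    h = upper / n
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * h
        f = (k / lam) * (x / lam) ** (k - 1) * math.exp(-((x / lam) ** k))
        if f > 0.0:                      # f ln f -> 0 as f -> 0
            total -= f * math.log(f)
    return total * h

print(weibull_entropy(2.0, 1.0), entropy_numeric(2.0, 1.0, upper=10.0))
```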


Related distributions

The translated Weibull distribution contains an additional parameter, and is also often found in the literature.[2] It has the probability density function

$$f(x; k, \lambda, \theta) = \frac{k}{\lambda}\left(\frac{x - \theta}{\lambda}\right)^{k-1} e^{-\left((x - \theta)/\lambda\right)^{k}}$$

for x ≥ θ and f(x; k, λ, θ) = 0 for x < θ, where k > 0 is the shape parameter, λ > 0 is the scale parameter and θ is the location parameter of the distribution. When θ = 0, this reduces to the 2-parameter distribution.

The Weibull distribution can be characterized as the distribution of a random variable X such that the random variable

$$Y = \left(\frac{X}{\lambda}\right)^{k}$$

is the standard exponential distribution with intensity 1.[2] The Weibull distribution interpolates between the exponential distribution with intensity 1/λ when k = 1 and a Rayleigh distribution of mode $\sigma = \lambda/\sqrt{2}$ when k = 2.

The density function of the Weibull distribution changes character radically as k varies between 0 and 3, particularly in terms of its behaviour near x = 0. For k < 1 the density approaches ∞ as x nears zero and the density is J-shaped. For k = 1 the density has a finite positive value at x = 0. For 1 < k < 2 the density is zero at x = 0, has an infinite slope at x = 0 and is unimodal. For k = 2 the density has a finite positive slope at x = 0. For k > 2 the density is zero and has a zero slope at x = 0, and the density is unimodal. As k goes to infinity, the Weibull distribution converges to a Dirac delta distribution centred at x = λ.

The Weibull distribution can also be characterized in terms of a uniform distribution: if X is uniformly distributed on (0,1), then the random variable $\lambda(-\ln X)^{1/k}$ is Weibull distributed with parameters k and λ. This leads to an easily implemented numerical scheme for simulating a Weibull distribution.

The Weibull distribution (usually sufficient in reliability engineering) is a special case of the three parameter exponentiated Weibull distribution where the additional exponent equals 1. The exponentiated Weibull distribution accommodates unimodal, bathtub shaped[4] and monotone failure rates.

The Weibull distribution is a special case of the generalized extreme value distribution. It was in this connection that the distribution was first identified by Maurice Fréchet in 1927. The closely related Fréchet distribution, named for this work, has the probability density function

$$f_{\mathrm{Frechet}}(x; k, \lambda) = \frac{k}{\lambda}\left(\frac{x}{\lambda}\right)^{-1-k} e^{-(x/\lambda)^{-k}}.$$
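The uniform characterization gives inverse-transform sampling directly. In the minimal sketch below (seed, parameters and sample size are arbitrary), each draw takes a uniform U and returns λ(−ln U)^{1/k}. It then checks two consequences stated above: (X/λ)^k behaves like a standard exponential variable with mean 1, and P(X ≤ λ) = 1 − 1/e.

```python
import math
import random

def weibull_sample(k: float, lam: float, rng: random.Random) -> float:
    """Draw one Weibull(k, lam) variate as lam * (-ln U)^(1/k) with U uniform on (0, 1]."""
    u = 1.0 - rng.random()   # rng.random() is in [0, 1), so u is in (0, 1] and log(u) is finite
    return lam * (-math.log(u)) ** (1.0 / k)

rng = random.Random(2011)
k, lam = 2.5, 4.0
draws = [weibull_sample(k, lam, rng) for _ in range(100_000)]

# (X / lam)^k should be standard exponential (mean 1), and P(X <= lam) = 1 - 1/e.
y_mean = sum((x / lam) ** k for x in draws) / len(draws)
frac_below_lam = sum(x <= lam for x in draws) / len(draws)
print(f"mean of (X/lam)^k: {y_mean:.4f}   P(X <= lam): {frac_below_lam:.4f} "
      f"(1 - 1/e = {1 - math.exp(-1):.4f})")
```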

The Weibull distribution can also be generalized to the 3 parameter exponentiated Weibull distribution. This models the situation when the failure rate of a system is due to a combination of factors, and may increase for some times and decrease for other times (see bathtub curve).

Weibull plot

The goodness of fit of data to a Weibull distribution can be visually assessed using a Weibull plot[5]. The Weibull plot is a plot of the empirical cumulative distribution function $\hat F(x)$ of data on special axes in a type of Q-Q plot. The axes are $\ln(-\ln(1 - \hat F(x)))$ versus $\ln(x)$. The reason for this change of variables is that the cumulative distribution function can be linearised:

$$\ln(-\ln(1 - F(x))) = k \ln x - k \ln \lambda,$$


which can be seen to be in the standard form of a straight line. Therefore if the data came from a Weibull distribution then a straight line is expected on a Weibull plot.

There are various approaches to obtaining the empirical distribution function from data: one method is to obtain the vertical coordinate for each point using $\hat F = \frac{i - 0.3}{n + 0.4}$, where i is the rank of the data point and n is the number of data points[6].

Linear regression can also be used to numerically assess goodness of fit and estimate the parameters of the Weibull distribution. The gradient informs one directly about the shape parameter k, and the scale parameter λ can also be inferred from the intercept, which equals −k ln λ.
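Here is a sketch of the regression approach. It uses the plotting positions (i − 0.3)/(n + 0.4) and ordinary least squares of ln(−ln(1 − F̂)) on ln x. Because the linearised CDF is ln(−ln(1 − F)) = k ln x − k ln λ, the fitted slope estimates k and the intercept gives λ = exp(−intercept/k). The simulated data (shape 2, scale 3, generated with `random.weibullvariate`) are purely illustrative, and this is not the specific procedure of any cited reference.

```python
import math
import random

def median_ranks(n: int) -> list[float]:
    """Plotting positions F_i = (i - 0.3) / (n + 0.4) for ranks i = 1..n."""
    return [(i - 0.3) / (n + 0.4) for i in range(1, n + 1)]

def weibull_plot_fit(data: list[float]) -> tuple[float, float]:
    """Least-squares line through (ln x, ln(-ln(1 - F))) points; returns (k, lam)."""
    xs = sorted(x for x in data if x > 0)
    u = [math.log(x) for x in xs]
    v = [math.log(-math.log(1.0 - f)) for f in median_ranks(len(xs))]
    u_bar = sum(u) / len(u)
    v_bar = sum(v) / len(v)
    slope = sum((a - u_bar) * (b - v_bar) for a, b in zip(u, v)) / sum((a - u_bar) ** 2 for a in u)
    intercept = v_bar - slope * u_bar
    # The linearised CDF is v = k u - k ln(lam): slope = k, intercept = -k ln(lam).
    return slope, math.exp(-intercept / slope)

rng = random.Random(7)
data = [rng.weibullvariate(3.0, 2.0) for _ in range(5_000)]   # scale 3, shape 2
k_hat, lam_hat = weibull_plot_fit(data)
print(f"estimated k = {k_hat:.3f}, lambda = {lam_hat:.3f}")
```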

Uses

The Weibull distribution is used:

• In survival analysis[7]

• In reliability engineering and failure analysis

• In industrial engineering to represent manufacturing and delivery times

• In extreme value theory

• In weather forecasting

• To describe wind speed distributions, as the natural distribution often matches the Weibull shape[8]

• In communications systems engineering

• In radar systems to model the dispersion of the received signal level produced by some types of clutter

• To model fading channels in wireless communications, as the Weibull fading model seems to exhibit a good fit to experimental fading channel measurements

• In general insurance to model the size of reinsurance claims, and the cumulative development of asbestosis losses

• In forecasting technological change (also known as the Sharif-Islam model)

• In hydrology the Weibull distribution is applied to extreme events such as annually maximum one-day rainfalls and river discharges. The accompanying figure illustrates an example of fitting the Weibull distribution to ranked annually maximum one-day rainfalls, showing also the 90% confidence belt based on the binomial distribution. The rainfall data are represented by plotting positions as part of the cumulative frequency analysis.

Fitted cumulative Weibull distribution to maximum one-day rainfalls using CumFreq [9]


Rosin-Rammler distribution

The 2-parameter Weibull distribution is used to describe the particle size distribution (PSD) of particles generated by grinding, milling and crushing operations. The Rosin-Rammler distribution predicts fewer fine particles than the log-normal distribution. It is generally most accurate for narrow PSDs. Using the cumulative distribution function $F(x; k, \lambda) = 1 - e^{-(x/\lambda)^{k}}$:

• F(x; k, λ) is the mass fraction of particles with diameter < x

• λ is the characteristic particle size (often loosely called the mean particle size), at which F = 1 − 1/e ≈ 0.632

• k is a measure of particle size spread
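In grinding practice this CDF is often used in both directions: to get the mass fraction passing a given size, and to get the size passed by a given fraction (for example d50 or d80). Here is a minimal sketch, with hypothetical parameter values k = 1.2 and λ = 75 in arbitrary size units:

```python
import math

def passing_fraction(x: float, k: float, lam: float) -> float:
    """Rosin-Rammler mass fraction finer than size x: F = 1 - exp(-(x/lam)^k)."""
    return 1.0 - math.exp(-((x / lam) ** k)) if x > 0 else 0.0

def size_at_fraction(p: float, k: float, lam: float) -> float:
    """Inverse of passing_fraction: the size below which a mass fraction p lies (0 < p < 1)."""
    return lam * (-math.log(1.0 - p)) ** (1.0 / k)

# Hypothetical mill product: characteristic size 75 (e.g. micrometres), spread k = 1.2.
k, lam = 1.2, 75.0
for p in (0.5, 0.8):
    print(f"d{int(p * 100)} = {size_at_fraction(p, k, lam):.1f}")
print(f"fraction finer than lam: {passing_fraction(lam, k, lam):.4f}")  # 1 - 1/e for any k
```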

References

[1] Papoulis, Pillai, "Probability, Random Variables, and Stochastic Processes", 4th Edition
[2] Johnson, Kotz & Balakrishnan 1994
[3] See (Cheng, Tellambura & Beaulieu 2004) for the case when k is an integer, and (Sagias & Karagiannidis 2005) for the rational case.
[4] "System evolution and reliability of systems" (http://www.sys-ev.com/reliability01.htm). Sysev (Belgium). 2010-01-01.
[5] The Weibull plot (http://www.itl.nist.gov/div898/handbook/eda/section3/weibplot.htm)
[6] Wayne Nelson (2004) Applied Life Data Analysis. Wiley-Blackwell ISBN 0471644625
[7] Survival/Failure Time Analysis (http://www.statsoft.com/textbook/survival-failure-time-analysis/#distribution)
[8] Wind Speed Distribution Weibull (http://www.reuk.co.uk/Wind-Speed-Distribution-Weibull.htm)
[9] "Cumfreq, a free computer program for cumulative frequency analysis" (http://www.waterlog.info/cumfreq.htm).

Bibliography

• Fréchet, Maurice (1927), "Sur la loi de probabilité de l'écart maximum", Annales de la Société Polonaise de Mathématique, Cracovie 6: 93–116.

• Johnson, Norman L.; Kotz, Samuel; Balakrishnan, N. (1994), Continuous univariate distributions. Vol. 1, Wiley Series in Probability and Mathematical Statistics: Applied Probability and Statistics (2nd ed.), New York: John Wiley & Sons, MR1299979, ISBN 978-0-471-58495-7

• Muraleedharan, G.; Rao, A.G.; Kurup, P.G.; Nair, N. Unnikrishnan; Sinha, Mourani (2007), "Coastal Engineering", Coastal Engineering 54 (8): 630–638, doi:10.1016/j.coastaleng.2007.05.001

• Rosin, P.; Rammler, E. (1933), "The Laws Governing the Fineness of Powdered Coal", Journal of the Institute of Fuel 7: 29–36.

• Sagias, Nikos C.; Karagiannidis, George K. (2005), "Gaussian class multivariate Weibull distributions: theory and applications in fading channels", Institute of Electrical and Electronics Engineers. Transactions on Information Theory 51 (10): 3608–3619, doi:10.1109/TIT.2005.855598, MR2237527, ISSN 0018-9448

• Weibull, W. (1951), "A statistical distribution function of wide applicability", J. Appl. Mech.-Trans. ASME 18 (3): 293–297.

• "Engineering statistics handbook" (http://www.itl.nist.gov/div898/handbook/eda/section3/eda3668.htm). National Institute of Standards and Technology. 2008.

• Nelson, Jr, Ralph (2008-02-05). "Dispersing Powders in Liquids, Part 1, Chap 6: Particle Volume Distribution" (http://www.erpt.org/014Q/nelsa-06.htm). Retrieved 2008-02-05.

External links

• Mathpages - Weibull Analysis (http://www.mathpages.com/home/kmath122/kmath122.htm)


Article Sources and Contributors
Weibull distribution Source: http://en.wikipedia.org/w/index.php?oldid=396533396 Contributors: Agriculture, Alfpooh, Argyriou, Asitgoes, Avraham, AxelBoldt, Bender235, Bryan Derksen, Btyner, Calimo, Cburnett, Corecode, Corfuman, Craigy144, Darrel francis, David Haslam, Dhatfield, Diegotorquemada, Dmh, Doradus, Edratzer, Eliezg, Emilpohl, Felipehsantos, Gausseliminering, Gcm, Giftlite, Gobeirne, GuidoGer, Iwaterpolo, J6w5, JJ Harrison, Janlo, Jason A Johnson, Jfcorbett, Joanmg, KenT, Kghose, Lachambre, LachlanA, LilHelpa, MH, Mack2, Mebden, Melcombe, Michael Hardy, MisterSheik, O18, Olaf, Oznickr, PAR, Pleitch, Policron, Prof. Frink, Qwfp, R.J.Oosterbaan, RekishiEJ, Robertmbaldwin, Saad31, Sam Blacketer, Samikrc, Sandeep4tech, Slawekb, Smalljim, Stern, Stpasha, Strypd, Sławomir Biały, TDogg310, Tassedethe, Tom harrison, Tomi, Uppland, WalNi, Wiki5d, Yanyanjun, Zundark, 126 anonymous edits

Image Sources, Licenses and Contributors
Image:Weibull PDF.svg Source: http://en.wikipedia.org/w/index.php?title=File:Weibull_PDF.svg License: GNU Free Documentation License Contributors: User:Calimo
Image:Weibull CDF.svg Source: http://en.wikipedia.org/w/index.php?title=File:Weibull_CDF.svg License: GNU Free Documentation License Contributors: User:Calimo
File:FitWeibullDistr.tif Source: http://en.wikipedia.org/w/index.php?title=File:FitWeibullDistr.tif License: Creative Commons Attribution-Sharealike 3.0 Contributors: User:Buenas días

License
Creative Commons Attribution-Share Alike 3.0 Unported
http://creativecommons.org/licenses/by-sa/3.0/

From www.mathpages.com/home/kmath122/kmath122.htm 16 January 2011

Weibull Analysis

Given any function g(x) with g(0) = 0 and increasing monotonically to infinity as x goes to infinity, we can define a cumulative probability function by

$$F(t) = 1 - e^{-g(t)}$$

Obviously this probability is 0 at t = 0 and increases monotonically to 1.0 as t goes to infinity. The corresponding density distribution f(t) is the derivative of this, i.e.,

$$f(t) = \frac{dF(t)}{dt} = g'(t)\,e^{-g(t)}$$

As discussed in "Failure Rates, MTBFs, and All That", the "rate" of occurrence for a given density distribution is

$$R(t) = \frac{f(t)}{1 - F(t)}$$

so the rate for the preceding density function f(t) is

$$R(t) = \frac{g'(t)\,e^{-g(t)}}{e^{-g(t)}} = g'(t)$$

This enables us to define a probability density distribution with any specified rate function R(t). One useful two-parameter family of rate functions is given by

$$R(t) = \frac{\beta}{\alpha}\left(\frac{t}{\alpha}\right)^{\beta-1} \tag{1}$$

where α and β are constants. This is the so-called "Weibull" family of distributions, named after the Swedish engineer Waloddi Weibull (1887-1979) who popularized its use for reliability analysis, especially for metallurgical failure modes. Weibull's first paper on the subject was published in 1939, but the method didn't attract much attention until the 1950's. Interestingly, the "Weibull distribution" had already been studied in the 1920's by the statistician Emil Gumbel (1891-1966), who is best remembered today for his confrontation with the Nazis in 1931 when they organized a campaign to force him out of his professorship at Heidelberg University for his outspoken pacifist and anti-Nazi views.
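The general construction above is easy to check numerically. The following Python sketch is illustrative only. It assumes a hypothetical g(t) = t² + t; any g with g(0) = 0 that increases monotonically to infinity would serve. The sketch builds F(t) = 1 − e^(−g(t)), approximates the density by a central finite difference, and checks that the rate f(t)/(1 − F(t)) reproduces g′(t).

```python
import math


def g(t: float) -> float:
    """Hypothetical example: g(0) = 0 and g increases monotonically to infinity."""
    return t * t + t


def g_prime(t: float) -> float:
    """Analytic derivative of the example g(t)."""
    return 2.0 * t + 1.0


def cdf(t: float) -> float:
    """Cumulative probability F(t) = 1 - exp(-g(t))."""
    return 1.0 - math.exp(-g(t))


def density(t: float, h: float = 1e-6) -> float:
    """Density f(t) = dF/dt, approximated by a central finite difference."""
    return (cdf(t + h) - cdf(t - h)) / (2.0 * h)


def rate(t: float) -> float:
    """Rate of occurrence R(t) = f(t) / (1 - F(t))."""
    return density(t) / (1.0 - cdf(t))


for t in (0.25, 0.5, 1.0, 1.5):
    print(f"t = {t:4}:  f/(1-F) = {rate(t):.6f}   g'(t) = {g_prime(t):.6f}")
```

Swapping in a different g changes nothing else, which is the point of the construction: the rate function alone determines the distribution.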

The constant α is called the scale parameter, because it scales the t variable, and the constant β is called the shape parameter, because it determines the shape of the rate function. (Occasionally the variable t in the above definition is replaced with t − γ, where γ is a third parameter used to define a suitable zero point.) If β is greater than 1 the rate increases with t, whereas if β is less than 1 the rate decreases with t. If β = 1 the rate is the constant 1/α, in which case the Weibull distribution reduces to the exponential distribution. The shapes of the rate functions for the Weibull family of distributions are illustrated in the figure below.
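The effect of the shape parameter can be checked directly. The sketch below evaluates the Weibull rate function (β/α)(t/α)^(β−1) at a few increasing times. It uses an arbitrary scale α = 100, a hypothetical value chosen only for illustration, and reports whether the rate rises, falls, or stays constant.

```python
from typing import Sequence


def weibull_rate(t: float, alpha: float, beta: float) -> float:
    """Weibull rate function R(t) = (beta/alpha) * (t/alpha)**(beta - 1)."""
    return (beta / alpha) * (t / alpha) ** (beta - 1.0)


def trend(alpha: float, beta: float, times: Sequence[float]) -> str:
    """Describe how R(t) behaves over a sequence of increasing times."""
    rates = [weibull_rate(t, alpha, beta) for t in times]
    steps = list(zip(rates, rates[1:]))
    if all(a == b for a, b in steps):
        return "constant"
    if all(a < b for a, b in steps):
        return "increasing"
    if all(a > b for a, b in steps):
        return "decreasing"
    return "mixed"


alpha = 100.0  # hypothetical scale parameter, chosen only for illustration
times = [10.0, 50.0, 100.0, 200.0, 400.0]
for beta in (0.5, 1.0, 3.0):
    print(f"beta = {beta}: rate is {trend(alpha, beta, times)}")
print(f"beta = 1 gives the constant rate 1/alpha = {weibull_rate(37.0, alpha, 1.0)}")
```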

Since R(t) equals g'(t), we integrate this function (with g(0) = 0) to give

$$g(t) = \int_{0}^{t} R(u)\,du = \left(\frac{t}{\alpha}\right)^{\beta}$$

Clearly, for any positive α and β the function g(t) increases monotonically from 0 to infinity as t increases from 0 to infinity, so it yields a valid cumulative distribution function, namely

$$F(t) = 1 - e^{-(t/\alpha)^{\beta}} \tag{2}$$

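and the corresponding density distribution is

$$f(t) = \frac{\beta}{\alpha}\left(\frac{t}{\alpha}\right)^{\beta-1} e^{-(t/\alpha)^{\beta}} \tag{3}$$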

Suppose we have a population consisting of n widgets (where n is a large number), all of which began operating continuously at the same time t = 0. (Note that this time variable t represents "calendar time", which is the hours of operation of each individual widget, not the sum total of the operational hours of the entire population; the latter would be given by nt. We can use a single t as the time variable because we've assumed a coherent population consisting of widgets that began operating at the same instant and accumulated hours continuously. We defer the discussion of non-coherent populations until later.) If each widget has a Weibull cumulative failure distribution given by equation (2) for some fixed parameters α and β, then the expected number N(t) of failures by the time t is

$$N(t) = n\,F(t) = n\left[1 - e^{-(t/\alpha)^{\beta}}\right] \tag{4}$$

Dividing both sides by n and rearranging terms, this can be written in the form

$$1 - \frac{N(t)}{n} = e^{-(t/\alpha)^{\beta}}$$

Taking the natural log of both sides and negating both sides, we have

$$\ln\left(\frac{1}{1 - N(t)/n}\right) = \left(\frac{t}{\alpha}\right)^{\beta}$$

Taking the natural log again, we arrive at

$$\ln\left[\ln\left(\frac{1}{1 - N(t)/n}\right)\right] = \beta\ln t - \beta\ln\alpha \tag{5}$$

If the number N(t) of failures is very small compared with the total number n of widgets, the inner natural log on the left-hand side can be approximated by the first-order term of its power series expansion, which is simply N(t)/n, so we have

$$\ln\left(\frac{N(t)}{n}\right) \approx \beta\ln t - \beta\ln\alpha \tag{6}$$

However, it is usually just as easy to use the exact equation (5) rather than the approximate equation (6). Given an initial population of n = 100 widgets beginning at time t = 0 and accumulating hours continuously thereafter, suppose the first failure occurs at time t = t_1. Roughly speaking, we could say the expected number of failures at the time of the first failure is about 1, so we could put F(t_1) = N(t_1)/n = 1/100. However, this isn't quite optimal, because statistically the first failure is most likely to occur slightly before the expected number of failures reaches 1. To understand why, consider a population consisting of just a single widget. In that case the expected number of failures at any given time t is simply F(t), which approaches 1 only in the limit as t goes to infinity. Yet the median time of failure is the value t = t_median such that F(t_median) = 0.5. In other words, the probability is 0.5 that the failure will occur before t_median, and 0.5 that it will occur later. Hence in a population of size n = 1 the expected number of failures at the median time of the first failure is just 0.5. In general, given a population of n widgets, each with the same failure density f(t), the probability that any individual widget has failed by time t_m is F(t_m) = N(t_m)/n. Denoting this value by φ, the probability that exactly j widgets have failed and n − j have not failed at time t_m is

$$P[j;n] = \binom{n}{j}\,\varphi^{j}(1-\varphi)^{n-j}$$

It follows that the probability of j or more having failed by the time t_m is

$$P[\geq j;n] = \sum_{i=j}^{n}\binom{n}{i}\,\varphi^{i}(1-\varphi)^{n-i}$$

This represents the probability that the jth (of n) failure has occurred by the time t_m, and of course the complement is the probability that the jth failure has not yet occurred by the time t_m. Therefore, given that the jth failure occurs at t_m, the "median" value of F(t_m) = φ is given by putting P[≥j;n] = 0.5 in the above equation and solving for φ. This value is called the median rank, and can be computed numerically. An alternative approach is to use the remarkably good approximate formula

$$\varphi \approx \frac{j - 0.3}{n + 0.4}$$

This is the value (rather than j/n) that should be assigned to N(t_j)/n for the jth failure.
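The numerical route to the median rank can be sketched in a few lines. Because P[≥j;n] increases monotonically with φ, bisection on P[≥j;n] = 0.5 finds the root. Bisection is simply a convenient choice here; the text does not prescribe a root finder. The sketch compares the result with the approximate formula (j − 0.3)/(n + 0.4), often called Benard's approximation, and with the naive j/n.

```python
from math import comb


def prob_at_least(j: int, n: int, phi: float) -> float:
    """P[>= j; n]: probability that j or more of n widgets have failed when
    each has independently failed with probability phi (binomial tail)."""
    return sum(comb(n, i) * phi**i * (1.0 - phi) ** (n - i) for i in range(j, n + 1))


def median_rank(j: int, n: int, tol: float = 1e-12) -> float:
    """Solve P[>= j; n] = 0.5 for phi by bisection; the tail probability
    increases monotonically with phi, so the root is unique."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if prob_at_least(j, n, mid) < 0.5:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def approx_median_rank(j: int, n: int) -> float:
    """Approximate median rank (j - 0.3)/(n + 0.4)."""
    return (j - 0.3) / (n + 0.4)


n = 100
for j in range(1, 6):
    print(f"j = {j}: exact {median_rank(j, n):.5f}  "
          f"approx {approx_median_rank(j, n):.5f}  j/n {j / n:.5f}")
```

For the single-widget case discussed above, median_rank(1, 1) returns 0.5, and the approximate formula gives 0.7/1.4 = 0.5 as well.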

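To illustrate, consider again an initial population of n = 100 widgets beginning at time t = 0 and accumulating hours continuously thereafter, and suppose the first five widget failures occurred at the times t_1 = 1216 hours, t_2 = 5029 hours, t_3 = 13125 hours, t_4 = 15987 hours, and t_5 = 29301 hours. With the approximate median ranks F_j = (j − 0.3)/(n + 0.4), this gives us five data points (t_j, F_j): (1216, 0.00697), (5029, 0.01693), (13125, 0.02689), (15987, 0.03685), and (29301, 0.04681).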

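By simple linear regression we can perform a least-squares fit of this sequence of k = 5 data points to a line. In terms of the variables x_j = ln(t_j) and v_j = ln(ln(1/(1 − (j − 0.3)/(n + 0.4)))), equation (5) says that v = βx − β ln α, so the estimated Weibull parameters are given by

$$\beta = \frac{k\sum_{j} x_j v_j - \sum_{j} x_j \sum_{j} v_j}{k\sum_{j} x_j^{2} - \left(\sum_{j} x_j\right)^{2}}, \qquad \alpha = \exp\left(\bar{x} - \frac{\bar{v}}{\beta}\right)$$

where x̄ and v̄ are the means of the x_j and v_j.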

For our example with k = 5 data points, we get β = 0.609 and α = 4.14 × 10^6 hours. Using equation (4), we can then forecast the expected number of failures into the future, as shown in the figure below.
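The fit for this example can be reproduced with ordinary least squares. The minimal sketch below regresses v_j on x_j using the approximate median ranks and then applies equation (4) to forecast failures. It should recover values close to the β = 0.609 and α ≈ 4.14 × 10^6 hours quoted above. The function names are illustrative, not part of any library.

```python
import math


def fit_weibull(times, n):
    """Least-squares Weibull fit to the first k failure times of a coherent
    population of n widgets.

    Uses x_j = ln(t_j) and v_j = ln(ln(1/(1 - F_j))) with the approximate
    median rank F_j = (j - 0.3)/(n + 0.4), then fits the line
    v = beta*x - beta*ln(alpha) from equation (5) by ordinary least squares.
    """
    k = len(times)
    xs = [math.log(t) for t in times]
    vs = [math.log(math.log(1.0 / (1.0 - (j - 0.3) / (n + 0.4)))) for j in range(1, k + 1)]
    sx, sv = sum(xs), sum(vs)
    sxx = sum(x * x for x in xs)
    sxv = sum(x * v for x, v in zip(xs, vs))
    beta = (k * sxv - sx * sv) / (k * sxx - sx * sx)  # slope of the fitted line
    alpha = math.exp(sx / k - (sv / k) / beta)         # from intercept = -beta*ln(alpha)
    return alpha, beta


def expected_failures(t, n, alpha, beta):
    """Equation (4): N(t) = n * (1 - exp(-(t/alpha)**beta))."""
    return n * (1.0 - math.exp(-((t / alpha) ** beta)))


times = [1216, 5029, 13125, 15987, 29301]  # first five failure times, hours
alpha, beta = fit_weibull(times, n=100)
print(f"beta = {beta:.3f}, alpha = {alpha:.3g} hours")
for t in (1e5, 1e6, 1e7):
    print(f"t = {t:.0e} h: expected failures = {expected_failures(t, 100, alpha, beta):.1f}")
```

Regressing v on x (rather than x on v) is the choice that matches the quoted parameters; the two regressions give slightly different estimates when the points are not exactly collinear.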

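Notice that at the time t = α we expect 63.21% of the population to have failed, regardless of the value of β, since F(α) = 1 − 1/e ≈ 0.6321. The estimated failure rate for each individual widget as a function of time is

$$R(t) = \frac{0.609}{4.14\times10^{6}}\left(\frac{t}{4.14\times10^{6}}\right)^{-0.391}$$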

This function is shown in the figure below.
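The two observations above can be checked numerically. The sketch below plugs in the rounded fitted values β = 0.609 and α = 4.14 × 10^6 hours. It shows that F(α) equals 1 − 1/e whatever β is, and that the per-widget rate falls steadily with time.

```python
import math

ALPHA = 4.14e6  # fitted scale parameter from the example, in hours
BETA = 0.609    # fitted shape parameter from the example


def rate(t: float) -> float:
    """Per-widget failure rate R(t) = (BETA/ALPHA) * (t/ALPHA)**(BETA - 1)."""
    return (BETA / ALPHA) * (t / ALPHA) ** (BETA - 1.0)


def cdf(t: float) -> float:
    """Fraction of the population expected to have failed by time t."""
    return 1.0 - math.exp(-((t / ALPHA) ** BETA))


print(f"F(alpha) = {cdf(ALPHA):.4f}")  # equals 1 - 1/e for any BETA
for t in (1e2, 1e3, 1e4, 1e5, 1e6):
    print(f"t = {t:>9.0f} h   R(t) = {rate(t):.3e} per hour")
```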

A distribution of this kind is said to exhibit an "infant mortality" characteristic, because the failure rate is initially very high and then drops off. For an alternative derivation of equation (6) for small N(t)/n, notice that the overall failure rate for the whole population of n widgets is essentially nR(t), because the number of functioning widgets does not change much as long as N(t)/n is small. Therefore, the expected number of failures by the time T for a coherent population of n widgets is

$$N(T) \approx n\int_{0}^{T} R(t)\,dt = n\left(\frac{T}{\alpha}\right)^{\beta}$$

Dividing both sides by n and taking the natural log of both sides gives, again, equation (6). However, this applies only for times t such that N(t) is small compared with n. This is because we are analyzing the failure rate without replacement, so the size of the unfailed population, n − N(t), actually decreases with time. By limiting our considerations to small values of N(t) (compared with n) we make this effect negligible. Now, it might seem that we could just as well assume replacement of the failed units, so that the size of the overall population remains constant. Indeed, this is typically how analyses are conducted for devices with exponential failure distributions, for which the failure rate is constant. However, for the Weibull distribution it is not so easy, because the failure rate of each widget is a function of the age of that particular widget. If we replace a unit that has failed at 10000 hours with a new unit, the overall failure rate of the total population changes abruptly: it goes up if β is less than 1 and down if β is greater than 1.
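The cost of using the approximation can be quantified. Equations (5) and (6) share the same right-hand side, so the gap between their left-hand sides, ln(ln(1/(1 − p))) − ln(p) with p = N(t)/n, measures the error. Since ln(1/(1 − p)) = p + p²/2 + …, the gap is roughly p/2 for small p. The sketch below tabulates it for a few illustrative fractions.

```python
import math


def exact_lhs(p: float) -> float:
    """Left side of the exact relation (5): ln(ln(1/(1 - p))), p = N(t)/n."""
    return math.log(math.log(1.0 / (1.0 - p)))


def approx_lhs(p: float) -> float:
    """Left side of the small-N(t)/n approximation (6): ln(p)."""
    return math.log(p)


fractions = [0.001, 0.01, 0.05, 0.2, 0.5]
for p in fractions:
    gap = exact_lhs(p) - approx_lhs(p)
    print(f"N/n = {p:<6} exact = {exact_lhs(p):8.4f}  "
          f"approx = {approx_lhs(p):8.4f}  gap = {gap:.4f}")
```

The gap is negligible at a fraction of a percent failed but reaches about 0.33 when half the population has failed. This is why the exact form is preferable once N(t)/n is no longer small.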

The dependence of the failure rate R(t) on the age of each individual widget is also the reason we considered a coherent population of widgets whose ages are all synchronized. This greatly simplifies the analysis. In more realistic situations the population of widgets will be changing, and the "age" of each widget in the population will be different, as will the rate at which it accumulates operational hours as a function of calendar time. More generally, we could consider a population of widgets such that each widget has its own "proper time" τ_j given by τ_j = μ_j(t − γ_j) for all t greater than γ_j, where t is calendar time, γ_j is the birth date of the jth widget, and μ_j is its operational usage factor. This proper time is then the time variable for the Weibull density function for the jth widget, and the overall failure rate for the whole population at any given calendar time is composed of all the individual failure rates. In this non-coherent population, each widget has its own distinct failure distribution. At a given calendar time, the experience basis of a particular population might be as illustrated below:

So far there have been three failures, widgets 2, 4, and 5. The other seven widgets are continuing to accumulate operational hours at their respective rates. These are sometimes called "right censored" data points, or "suspensions", because we imagine that testing has been suspended on these seven units prior to failure. We don't know when they will fail, so we can't directly use them as data points to fit the distribution, but we would still like to make some use of the fact that they accumulated their respective hours without failure. The usual approach in reliability analysis is to first rank all the data points according to their accumulated hours, as shown below.

Next we assign adjusted ranks to the widgets that have actually failed. Letting k_j denote the overall rank of the jth failure, and letting r(j) denote the adjusted rank of the jth failure (with r(0) defined as 0), the adjusted rank of the jth failure is given by the formula

$$r(j) = r(j-1) + \frac{n + 1 - r(j-1)}{n + 2 - k_j}$$

So, for the example above, we have

Using these adjusted ranks, we have the three data points

Fitting these three points using linear regression (as discussed above), we get the Weibull parameters α = 1169 and β = 4.995. The expected number of failures (which is just n times the cumulative distribution function) is shown below.
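The ranking and adjusted-rank steps, followed by the regression, can be sketched as below. The accumulated hours are hypothetical: the example's actual data appear only in a figure, so this sketch does not reproduce α = 1169 and β = 4.995. Converting each adjusted rank r to a median rank with (r − 0.3)/(n + 0.4) is also an assumption, carried over from the coherent-population case.

```python
import math

# Hypothetical accumulated hours for a fleet of 10 widgets (NOT the data
# behind the example above, which is only shown in a figure).
# Each record is (accumulated_hours, failed); failed=False marks a suspension.
records = [
    (850, False), (620, True), (1100, False), (910, True), (1040, True),
    (400, False), (300, False), (1200, False), (700, False), (980, False),
]


def adjusted_ranks(records):
    """Return (hours, adjusted_rank) for each failure, in order of hours.

    Uses r(j) = r(j-1) + (n + 1 - r(j-1)) / (n + 2 - k_j) with r(0) = 0,
    where k_j is the overall 1-based rank of the jth failure among all units.
    """
    n = len(records)
    ranked = sorted(records, key=lambda rec: rec[0])  # rank by accumulated hours
    points, r_prev = [], 0.0
    for k, (hours, failed) in enumerate(ranked, start=1):
        if failed:
            r_prev += (n + 1 - r_prev) / (n + 2 - k)
            points.append((hours, r_prev))
    return points


def fit_weibull(points, n):
    """Least-squares fit of v = beta*x - beta*ln(alpha), with x = ln(t) and
    v = ln(ln(1/(1 - F))), where F = (r - 0.3)/(n + 0.4) for adjusted rank r."""
    xs = [math.log(t) for t, _ in points]
    vs = [math.log(math.log(1.0 / (1.0 - (r - 0.3) / (n + 0.4)))) for _, r in points]
    k = len(points)
    sx, sv = sum(xs), sum(vs)
    sxx = sum(x * x for x in xs)
    sxv = sum(x * v for x, v in zip(xs, vs))
    beta = (k * sxv - sx * sv) / (k * sxx - sx * sx)
    alpha = math.exp(sx / k - (sv / k) / beta)
    return alpha, beta


points = adjusted_ranks(records)
for hours, r in points:
    print(f"failure at {hours} h: adjusted rank {r:.4f}")
alpha, beta = fit_weibull(points, n=len(records))
print(f"alpha = {alpha:.1f} h, beta = {beta:.3f}")
```

Note that each suspension ranked before a failure pushes that failure's adjusted rank up, so the suspended units do inform the fit even though they are not used as data points themselves.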

This shows a clear "wear out" characteristic, consistent with the observed failures (and survivals). The failure rate is quite low until the unit reaches about 500 hours, at which point the rate begins to increase, as shown in the figure below.
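The wear-out behaviour can be tabulated from the quoted fit. The sketch below takes α = 1169 and β = 4.995, treating α as hours as the text's "about 500 hours" implies. It assumes n = 10, from the three failed and seven running widgets in the example, and prints the per-unit rate and the expected number of failures n F(t).

```python
import math

ALPHA = 1169.0   # fitted scale quoted above (hours)
BETA = 4.995     # fitted shape quoted above
N_UNITS = 10     # widgets in the example: three failed plus seven still running


def rate(t: float) -> float:
    """Weibull failure rate R(t) = (BETA/ALPHA) * (t/ALPHA)**(BETA - 1)."""
    return (BETA / ALPHA) * (t / ALPHA) ** (BETA - 1.0)


def expected_failures(t: float) -> float:
    """Expected number of failures: n times the Weibull CDF."""
    return N_UNITS * (1.0 - math.exp(-((t / ALPHA) ** BETA)))


for t in (250, 500, 750, 1000, 1500):
    print(f"t = {t:5} h   R(t) = {rate(t):.2e} per hour   N(t) = {expected_failures(t):.2f}")
```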

The examples discussed above are classical applications of the Weibull distribution, but the Weibull distribution is also sometimes used more loosely to model the "maturing system" effect of a high-level system being introduced into service. In such a context the variation in the failure rate is attributed to a gradual increase in the operators' familiarity with the system, improvements in maintenance, incorporation of retrofit modifications to the design or manufacturing processes to fix unforeseen problems, and so on. Whether the Weibull distribution is strictly suitable to model such effects is questionable, but it seems to have become an accepted practice. In such applications it is common to lump all the accumulated hours of the entire population together, as if every operating unit had the same failure rate at any given time. This makes the analysis fairly easy, since it avoids the need to consider "censored" data, but, again, the validity of this assumption is questionable.