[s.c. gupta, v.k. kapoor] fundamentals of mathemat(bookos.org)

1303

Click here to load reader

Upload: sushil-kumar-gautam

Post on 30-Nov-2015

17.052 views

Category:

Documents


56 download

TRANSCRIPT

  • FUNDAMENTALS OF MATHEMATICAL STATISTICS

    (A Modern Approach) A Textbook written completely on modern lines for Degree, Honours,

    Post-graduate Students of al/ Indian Universities and ~ndian Civil Services, Indian Statistical Service Examinations.

    (Contains, besides complete theory, more than 650 fully solved examples and more than 1,500 thought-provoking Problems

    with Answers, and Objective Type Questions)

    '.

    S.C. GUPTA Reader in Statistics

    Hindu College, University of Delhi

    Delhi

    V.K. KAPOOR Reader in Mathematics Shri Ram Coffege of Commerce University of Delhi Delhi

    Tenth Revised Edition (Greatly Improved)

    I> \0 .be s~ .1i 0\ '1-1-,,-

    J''''et;;' + -..' If ~ .. .. -. ~ ~ 6 lr '

  • * First Edition: Sept. 1970 Tenth Revised Ec;lition : August 2000 Reprint.: 2002

    * Price: Rs. 210.00

    ISBN 81-7014-791-3

    * Exclusive publication, distribution and promotion rights reserved with the Publishers.

    * Published by : Sultan Chand & Sons 23, Darya Ganj, New Delhi-11 0002 Phones: V77843, 3266105, 3281876

    * Laser typeset by : T.P.

    Printed at: New A.S. Offset Press Laxmi Nagar Delhi92

  • DEDICATED TO OUR TEACHER

    PROFESSOR H. C. GUPTA WHO INITIATED

    THE TEACHING OF MATHEMATICAL STATISTICS

    AT THE UNIVERSITY OF DELHI

  • PREFACE TO THE TENrH EDITION

    The book has been revised keeping in mind the comments and suggestions received from the readers. An attempt is made to eliminate the misprints/errors in the last edition. Further suggestions and criticism for the improvement of ~he book will be' most welcome and thankfully acknowledged. \ August 2000 S.c. GUPTA

    V.K. KAPOOR TO THE NINTH EDITION

    The book originally written twenty-four years ago has, during the intervening period, been revised and reprinted seve'ral times. The authors have, however, been thinking, for the last f({w years that the book needed not only a thorough revision but rather a complete rewriting. They now take great pleilsure in presenting to the readers the ninth completely revised and enlarged edition of the book. The subject-matter in the whole book has been rewritten in the light of numerous criticisms and suggestions received from the users of the previous editions in-lndia and abroad.

    Some salient features of the new edition are: The entire text, especially Chapter 5 (Random Variables), Chapter

    6 (Mathematical Expectation), Chapters 7 and 8 (Theoretical Discrete and Continuous Distributions), Chapter 10 (Correlation and Regression), Chapter 15 (Theory of Estimation), has been restructured, rewritten and updated to cater to the revised syllabi of Indian universities, Indian Civil Services and various other competitive examinations.

    During the course of rewriting, it has been specially borne in mind to retain all the basic features of the previous editions especially the simplicity of presentation, lucidity of style and analytical approach which have been appreciated by teachers and students all over India and abroad.

    A number of typical problems have been added as solved examples in each chapter. These will enable 'the reader to have a better and thoughtful understanding of the basic. concepts of the theor.y and its various applications.

    Several new topics have been added at appropriate places to make the treatment more comprehensive and complete. Some of the obvious ADDITIONS are:

    81.5 Triangular Distribution p. 8 i 0 to 812 88.3 Logistic Distribution p. 892 to 895 810 Remks 2, Convergence in Distributipn of Law p. 8106 810.3. Remark 3, Relation between Central ~imit Theorem al?d

    Weak Law of Large Numbers p. 8110 810.4 C;ramer's Theorem p . 8111-8.112, 8114-8115 -

    Example 8.46

  • 8 74 to J Order Statistics - Theory, Illustrations and 8 746, Exercise Set p. 8 736 to 8 751 8 75 'Truncated Distributions-with Illustrations

    p. 8757 to 8756 ~ 706 7 Derivation of Rank Correlation Formula for Tied Ranks

    p. 7040-7047 . 70-7 7 Lines of Rt:;gression-Derivation (Aliter)

    p. 7050-7057. Example 7027 p. 7055 7 O 702 Remark to 10 7 02 - Marginal Distributions of

    Bivariate Normal Distribution p. 7088-7090 Tlieorem 705, p. 7086. and Theorem 706, p. 70."(37 on Bivariate Normal Distribution. Solved Examples 7037, 7032, PClges 70967097 on BVN Distribution. Theorem 735 Alternative Proof of Distribution of (X, S2) using m.g.f. p. 73 79 to 7327

    73 77 X2- Test for pooling of Probabilities (PJ. Test) p. 7369 754 7 Invariance property of Consistent Estimators-TheQrem

    757, pp 753 7542 Sufficient Conditions for C;on~istency-Theorem 752,

    p. 753 1555 MVUE: Theorem 754, p. 7572-7373 757 Remark 7. Minimum Variance Bound (MVB), Estimator,

    p.7524 757 7 Conditions for the equality sign in CramerRao (CR)

    Inequality, p. 7525 to 7527 758 CQmplete family of Distributions (with illustrations),

    p. 7537 to 7534 Theorem 7510 (Blackwellisation), p .. 1536. Theorems 7576 and 7577 on MLE, p. 7555.

    765 7 Unbiased Test and Unbiased Critical Region. Theorem 762pages 769-76 70

    7652 Optimum Regions and Sufficient Statistics, p.7670-7677 Remark to Example 766, p. 76 7 7 7 6 78 and Remarks 7, 2 ,to Example 767, p. 7620 to 7622; GrqphicaI Representation of Critical Regions.

    Exercise sets at the end of each chapter are substantially reorganised. Many new problems are included in the exercise sets. Repetition of questions of the same type (more than what is necessary) has been avoided. Further in the set of exercises, the problems have been carefully arranged and properly graded. More difficult problems are put in the miscellaneous exercise at the end of each chapter.

    Solved examples and unsolved problems in the exercise sets 11Cfve been drav.:n from the latest examination papers of various Indian Universities, Indian' Civil Services, etc.

  • (I'U)

    An attempt has been made to rectify the errors in the previous editions .

    The present edition Incorporates modern viewpoints. In fact with the addition of new topics, rewriting and revision of many others and restructuring of exercise sets, altogether a new book, covering the revised syllabi of almost all the Indian urilversities, is being presfJnted to the reader. It Is earnestly hoped that, In -the new form, the book will prove of much greater utility to the students as well as teachers of the subject.

    We express our deep sense of gratitude to our Publishers Mis sultan Chand & Sons and printers DRO Phototypesetter for their untiring efforts, unfailing courtesy, and co-operation in bringing out the book, in suchan elegant form. We are also thankful to ou; several colleagues, friends and students for their suggestions and encouragement during the preparing of this revised edition;

    Suggestions and criticism for further improvement of the' book as weJl ~s intimation of errors and misprints will be most gratefully received and duly acknowledged.

    ,

    S C. GUPTA & V.K. KAPOOR TO THE FIRST EDITION

    Although there are a iarge number of books available covering various aspects in the field of Mathematical Statistics, there is no comprehensive book dealing with the various topics on Mathematical Statistics for the students. The present book is a modest though detarmined bid to meet the requirf3ments of the students of Mathematical Statistics at Degree, Honours and Post-graduate levels. The book will also be found' of use DY the students preparing for various competitive examinations. While writing this book our goal has been to present a clear, interesting, systematic and thoroughly teachable treatment of Mathematical Stalistics and to provide a textbook which should not only serve as an introduction to the study of Mathematical StatIstics but also carry the student on to 'such a level that he can read' with profit the numercus special monographs which are available on the subject. In any branch of Mathematics, it is certainly the teacher who holds the key to successful learning, Our aim in writing this book has been simply to assisf the teacher in conveying to th~ stude,nts .more effectively a thorough understanding of Mathematical Stat;st(cs.

    The book contains sixteen chapters (equally divided between two volumes). the first chapter is devoted to a concise and logical development of the subject. i'he second and third chapters deal with the frequency distributions, and measures of average, ~nd dispersion. Mathematical treatment has been given to .the proofs of various articles included in these chapters in a very logi9aland simple manner. The theory of probability which has been developed by the application of the set theory has been discussed quite in detail. A ,large number of theorems have been deduced using the simple tools of set theory. The

  • (viii)

    simple applications of probability are also given. The chapters on mathematical expectation and theoretical distributions (discrete as well as continuous) have been written keeping the'latest ideas in mind. A new treatment has been given to the chapters on correlation, regress~on and bivariate normal distribution using the concepts of mathematical expectation. The thirteenth and fourteenth chapters deal mainly with the various sampling distributions and the various tests of significance which can be derived from them. In chapter 15, we have discussed concisely statistical inference (estimation and testing of hypothesis). Abundant material is given in the chapter on finite differences and numerical integration. The whole of the relevant theory is arranged in the form of serialised articles which are concise and to the. point without being insufficient. The more difficult sections will, in general, ,be found towards the end of each chapter. We have tried our best to present the subject so as to be within the easy grasp of students with

    vary~ng degrees of intellectual attainment. Due care has been taken of the examination r.eeds of the students

    and, wherever possible, indication of the year, when the' articles and problems were S!3t in the examination as been given. While writing this text, we have gone through the syllabi and examination papfJrs O,f almC'st all Inc;lian universities where the subject is taught sQ as to make it as comprehensive as possible. Each chapte( contains a large number of carefully graded worked problems mostly drawn from university papers with a view to acquainting the student with the typical questions pertaining to each topiC. Furthermore, to assist the student to gain proficiency iii the subject, a large number of properly graded problems maif)ly drawn from examination papers of various. universities are given at the end of each chapter. The questions and pro.blems given at the end of each chapter usually require for (heir solution a thoughtful use of concepts. During the preparation of the text we have gone through a vast body of liter9ture available on the subject, a list of which is given at the end of the book. It is expected that the bibliography given at the end of the book ,will considerably help those who want to make a detailed study of the subject

    The lucidity of style and simplicity of expression have been our twin objects to remove the awe which is usually associated with most mathematical and statistical textbooks.

    While every effort has been made to avoid printing and other mistakes, we crave for the indulgence of the readers fot the errors that might have inadvertently crept in. We shall consider our efforts amplY rewarded if those for whom the book is intended are benefited by it. Suggestions for the improvement of the book will be hIghly appreciated and will be duly incorporated.

    SEPTEMBER 10, 1970 S.C. GUPTA & V.K. KAPOOR

  • contents

    ~rt Chapter 1 Introduction -- Meaning and Scope

    1 '1 Origin and Development of Statistics 1-1 1'2 Definition of Statistics 1'2 1'3 Importance and Scope of Statistics 1-4 1'4 Limitations of Statistics 1-5 15 Distrust of Statistics 1-6

    Chapter 2 Frequency Distributions and Measures of central Tendency

    2'1 Frequency Distribution$ 21 21'1 Continuous Frequency ,Distribution 2-4

    Pages

    1-1 - 1'8

    .2-1 - 244

    2-2 Graphic Representation of a Frequency Distribution 2-4 2-2-1 Histogram 2-4 2'2'2 Frequency Polygon 25

    2'3 Averages or Measures of Central Tendency or Measures of Location 2'6

    2'4 Requisites for an Ideal Measure of Central Tendency 2-6 25 Arithmetic Mean 2-6

    25'1 Properties of Arithmetic Mean 28 25'2 Merits and Demerits of Arithmetic Mean 2'10 2-5'3 Weighted Mean 2-11

    2'6 Median 2'13 26'1 Derivation of Median Formula 2'19 2-6'2 Merits and Demerits of Median 216

    2-7 Mode 2'17 2-7-1 Derivation of Mode Formula 2'19 27'2 Merits and Demerits of Mode 2-22

    2'8 Geometric Mean 2'22 2'8-1 Merits and Demerits of Geometric Mean 2'23

    2'9 Harmonic Mean 2-25 2'9'1 Merits and Demerits of Harmonic Mean 2'25

    2~1 0 Selection of an Average 2'26 2'11 Partition Values 2'26 /

    2'11'1 Graphicai Location of the Partition Values 2'27

  • (x)

    Chapter-3 Measures of Dispersion, Skewness and Kurtosis 3'1 - 340

    31 3-2 33 34 3:~ 3'6 37

    371 372 37'3

    3-8 3'81

    3-9 3'9'1

    3'9-2 3'93' 3-9-4 3'10' 3'11 3'12 3'13 3'14

    Dispersion 3'1 Characteristics for an Ideal Measure of DisperSion 3-1 Measures of Dispersion 3-1 Range 3-1 Quartile Deviation 3-1 Mean Deviation 3-2 Standard Deviation (0) and,Root Mean Square Deviation (5) 3'2 Relation between 0 and s' 3'3 Different Formulae for Calculating Variance. 3'3 Variance of the Combined Series 3'10 Coefficient of Dispersion 3'12 Coefficient of Variation 3'12 Moments 3'21 Relation Between Moments About Mean in Terms of Moments About Any Point and Vice Versa 3'22 Effect of Cllange of Origin and Scale on Moments 3-23 Sheppard's Correction for Moments 3-23 Charlier's Checks 3'24 Pearson's ~ and y Coefficients 3'24 Factorial Moments 3-24 Absolute Moments 3-25 Skewness 3-32 Kurtosis 3-35

    Chapter- 4 Theory of Probability 4-1 - 4116

    4'1 Introduction 4-1 42 S~ort History 4-1 43 Definitions of Various Terms 4-2

    4'3-1 Mathematical or Classical Probability 4-3 4-3-2 Statistical or Empirical Probability 4-4

    4-4 Mathematicallools : Preliminary Notions of Sets 4-14 4-4-1 Sets and Elements of Sets 4'14

    .4-4-2 Operations on Sets 4-15 4;4-3 Albebra of Sets 4-15 4-4-4 Umn of. Sequence of Sets 4-16 4'45 Classes of S~ts 4-17

    4-5 Axiomatic ApP,roach to Probability 4'17

  • (x~

    4-5-1 Random Experiment (Sample space) 4-18 4-5-2 Event 4-19 4-5-3 Some Illustrations 4-19 4-5-4 Algebra of Events 4-21

    4-6 Probability - Mathematical Notion 4-25 4-6-1 Probability Function 4-25 4-6-2 Law of Addition of Probabilities 4-30-4-6-3 Extension of General Law of Addition of

    Probabilities 4-31 4-7 Multiplication Law of Probability and Conditiooal

    Probability 4-35 4-7-1 Extension of Multiplication Law of Probability 4-36 4-7-2 Probability of Occurrence of At Least One of .the n

    Independent Events 4-37 4-7-3 Independent Events 4-39 4-7-4 Pairwise Independent Events 4-39 4-7-5 Conditions for Mutual Independence of n Events 4-40

    4-8 Bayes Theorem 4-69 4-9 Geometric Probability 4-80

    Chapter 5 Random Variables and Distribution -Functions 5-1 - 5-82

    5 -1 Random Variable 5-1 5-2 Distribution Function 5-4

    5-2-1 Properties of Distribution Function 5-6 5-3 Discrete Random Variable 5-6

    5-3-1 Probability Mass Function 5-~ 5-3-2 Discrete Distribution Function 5 -7

    5-4 Continuous RandomVariable 5-13 5-4-1 Probability Density Function 5-13-5-4-2 Various Measures of Central Tend~ncy, Dispersion,

    Skewness and Kurtosis for Continuous Distribution 5-4-3 Continuous Distribution Function 5-32

    5-5 Joint Probability Law 5-41 5-5-1 JOint Probability Mass Function 5-41 5-5-2 Joint Probability Distribl,ltion Function 5-42 5-5-3 Marginal Distribution Function 5-43 5-5-4 Joint Density Function 5-44 5-5-5 The Conditional Distribution Function 5-46 5-5-6 Stochastic Independence 5-47

    5-6 Transformation of One-dimensional Random-Variable 5-7 Tran&formation of Two-dimensional Random Variable

    5-15

    5-70 5-73

  • Chapter-6 Mathematical Expectation and Generating Functions 6-1 - 6-138

    6-1 Mathematical Expectation 6-1 6-2 Expectation of a Function of a Random Variable 6-3 6-3 Addition Theorem of Expectation 6-4 6-4 Multiplication Theorem of Expectation 6-6 6-5 Expectation 1f a Linear Combination of Random

    Variables 6-8 4

    6-6 Covariance 6-17 6-7 Variance of a Linear Combination of Random

    Variables 6-11 6-8 Moments of Bivariate Probability Distributions 6-54 6-9 Conditional Expectation and Conditional Variance 6-54

    6'10 Moment Generating Function 6'67 6-10'1 Some Limitations of Moment Generating

    Functions 6'68 6'10'2 Theorems 011 Moment Generating Functions 611 6-10-3 Uniqueness Theorem of Moment Generating

    Function 6- 72 6-11 Cumulants 6-72

    6-11-1 Additive Property of Cumulants 6-73 6-11-2 Effect of Change of Origin and Scale on Cumulants 6-73

    6-12 Characteristic Function 617 6-12-1 Properties of Char~cteristic Function 6-78 6-12-2 Theorems on Characteristic Functions 619 6-12-3 Necessary and Sufficient Conditions for a Function

    cIl(t) to be a Characteristic Function 6-83 6-12-4 Multivariate Characteristic Function 6-84

    6-13 Chebychev's Inequality 6-97 6-13-1 Generalised Form of Bienayme-Chebychev Inequality 6-98

    61 4 Convergence in- Probability 6 -100 615 Weak Law of Large Numbers 6-101

    6-15-1 Bernoulli's Law of Large Numbers 6-103 6'15-2 Markoff's Theorem 6-104 6-15-3 Khintchin's Theorem 6'104

    6-16 BorelCantelliLemma 6115 6'17 Probability Generating Function. 6'123

    6-17'1 Convolutions I 6126 I

  • Chapter- 7 Theoretical Discrete Distributions 7'1 - 7'114

    7'0 71

    7'1'1 7'2

    7'2'1 7'2'2

    7'2'3 724 7'25 72'6

    7'27 728 7'29

    7210

    7211

    7'212 7'3

    731 73'2 733 73'4

    735 736 7'37 738

    739

    73'10 74

    741

    742 743

    Introduction 71 Bernoulli Distribution 71 Moments of Bernoulli Distribution 71 Binomial Distribution 71 Moments 76 Recurrence Relation for the Moments of Binomial Distribution 7'9 Factorial Moments of Binomial Distribution 7'11 Mean Deviation about Mean of Binomial Distribution 7'11 Mode of Binomial Distnbution 712 Moment Gen~rating Function of Binom.ial Distribution 714 Additive Property of Binomial Distributio.n 715 Characteristic Function of Binomial Distribution 716 Cumulants of Binomial Distribution 716 Recurrence Relation for Cumulants of Binomial Distribution 717 Probability Generating Function of Binomial

    Distribution 718 Fitting of Binomial Distribution 719 Poisson Distribu~ion 740 The Poisson Process 742 Moments of Poisson Distribution Mode of Pois~n Distribution

    744 745

    Recurrence Relation for the Moments of Poisson Distribution 746 Moment Generaiing Function of Poisson Distribution 7-47 Characteristic Function of POissQn Distribution 747 Cumulants of POisson Distribution 747 Additive or Reproductive Property of Independent Poisson Variates 747 Probability Generating Function of Poisson Distribution 7;49 Fitting of Poisson Distribution 761 Negative Binomial Distribution 7 72 Moment Generating Function of Negative Binomial Distribution 714 Cumulants of Negative Birfo'mial Distribution 774 POisson Distribution as limiting Case of Negative Binomial Di.stribution 7 75

  • 7-4-4 Probability Generating Function of Negative Binomial Distribution 7- 76

    7-4-5 Deduction of Moments of Negative Binomial Distribution From Binomial Distribution 7-79

    7-5 Geometric Distribution 7-83 7-5-1 Lack of Memory 7-84 7-5-2 Moments of Geometric Distribution 7-84 7-5-3 Moment Generating Function of Geometric

    Distribution 7-85 7-6. 'Hypergeomeiric Distribution 7-88

    7-6-1 Mean and Variance of Hypergeometric Distribution 7-89 7-6-2 Factorial Moments of Hypergeometric Distribution 7-90 7-6-3 Approximation to the Binomial Distribution 7-91 7-6-4 Recurrence Relation for Hypergeometric Distribution 7-91

    7-7 Multinomial Distribution 7-95 7-7-1 Moments of Multinomial Distribution 7-96

    7-8 Discrete Uniform Distribution 7-101 7-9 Power Series Distribution 7-101

    7-9-1 Moment Generating Function of p-s-d 7-102 7-9-2 Recurrence Relation for Cumulants of p-s-d 7-102 7-9-3 Particular Cases of gop-sod 7'103

    Chapter-8 Theoretical Continuous Distributions 8-1 8-166

    8 -1 Rectangular or Uniform Distribution 8-1 8-1-1 Moments of Rectangular DiS\ribution 8-2 8-1-2 M-G'F- of Rectangular Distribution 8-2 8-1-3 Characteristic Function 8-2 8-1-4 Mean Deviation about Me~n 8-2 8-1-5 Triangular Distribution 8-10

    8-2 Normal Distribution 8-17 8 -2 -1 Normal Distribution as a Limiting form of Binomial

    Distribution 8-18 8-2-2 Chief Characteristics of the Normal Distribution.and

    Normal Probability Curve 8-20 8-2-3 Mode of Normal distribution 8-22 8-2-4 Median of Normal Distribution 8-23 8-2-5 M-G-F- of Normal'Distribution 8-23 8-2-6 Cumulant Generating Functio.n (c-g-f-) of Normal

    Distribution 8-24 8-2-7 Moments of Normal Distribution 8-24

  • 8'2'8 A Linear Combination of Independent NQnn,1 Variates i~ also a Nonnal'Variate 8'26

    8'2'9 Points of Inflexion of Normal, Curve 8'28 8'2'10 Mean Deviation from the Mean for Normal-Distribu.tion 8'28 8.2.11 Area Property: Normal Probability Integral' -8'29 8212 Error Function 8'30 8'213 Importance of Normal Distribution 831 8,2:14 Fitting of Normal Distribution 8'32 ' 82f5 Log-Normal Distribution 8'65

    83 Gamma Distribution 8'68 8~3'1 MGF of Gamma Distribution 8'68 8'3'2 Cumulant Generating F.unction of Gamma Distribution 8'68 8'3'3 Additive Property of Gamma Distribution 8'70

    8'4 B~ta Distribution of First Kind 870 8'4'1

    8'5 8'5'1

    86 8'6'1

    87 88

    8'8'1 8'8'2 8'8'3

    8'9 8'9'1 8'9'2 8'10

    8'10'1 8'10'2 8'10'3 8'10'4

    8'11 8'11'1 8'11'2 'lH2

    8'12'1

    13'12'2 8'12'3 8'12'4 8'12'5 8'12'6

    Constants of Beta Distribution of First Kind 8' 71 Beta Distribution of Second Kind 8.72 Constants of Beta Distribution of Second Kind 8-72 The Exponential Distribution 8lt5 MGF of Exponential Distribution Laplace Double Exponential Distribution

    8'86 8'89

    Weibul Distribution 8'90 Moments of Standard Weibul Distribution Characterisation of Weibul Distribution

    8'91 8'91

    Logistic Distribution 8'92 Cauchy Distribution 8'98 Characteristic Function of Cauchy Distribution 8'99 Moments of Cauchy Distribution 8-100 Central Limit Theorem 8-105 Lindeberg-Levy Theorem 8-107 Applications of Central Limit Theorem 8-'108 Liapounoff's Central' Limit Theorem 8-109

    Cram~r's Theorem 8-11'1 Compound Distributions 8-11 Q Compound Binomial Distribution 8-116 Compound Poisson distribution 8-117 Pearson\s Distributions 8 '120 Detennination of the ConStants of the Equation in Terms of Moments 8 '121 Pearson Measure of Skewness Criterion 'K' 8 '121

    8'121

    Pearson's Main Type I 8-12Z Pearson Type IV 8-124 Pearson Type VI 8-125

  • 8'12'7- lype III 8'125 8'1"2'8 Type V 8'126 8'12'9 Typell 8'126'

    _8'12'10 T...vpeVIII 8'127 '8'12'11 Zero Type (Nonnal Curve) 8'127 8" 2'12 Type VIII to XII 8'127

    8'1"3 Variate Transformations 8'132 8 '13'1 Uses of Variate Transfonnations 8 132 8'13'2 Square Root Transformation 8'133 8'13-3 Sine Inverse

  • 10-3 10-3-1 10-3-2

    10-4

    10-5 10-6

    10-6-1 10-6-2 1063

    10Q 1071 1072

    '1073 1074 1075 1076 1077

    108 1081

    109 1010

    10101

    10102 10103

    1011 10111

    1012 10121

    1013 10131

    1014 10141

    1015 10151

    1016

    1017

    1018

    Karl Pearson Coefficient of Correlation 107 Limits for Correlation Coefficient 10 -2 Assumptions Unqerlying Karl Pearson's Correlation Coefficient 105 Calculation of the Correlation Coefficient for a Bivariate Frequency Distribution 1032 Probable Error of Correlation Coefficient Rank Correlation 10-39 Tied Ranks 1.040 Repeated Ranks (Continued) 10-43

    10-38

    Limits for Rank Corcelation Coefficient Regression 1049

    1044

    Lines of Regression 1049 Regression Curves 10-52 Regression Coefficients 10-58 Properties of Regression Coefficients Angle Between Two Lines of Regression Standard Error of ~stimate 10-60

    1058 .10-59

    Correlation Coefficient Between Observed and Estimated Value 1061

    Correlation Ratio 10 76 Measures of Correlation Ratio 10-76 Intra-class Correlation 1081 Bivariate Normal Distribution 10-84 Moment Generating Function of Bivariate Normal Distribution 10-86 Marginal Distributions of Biv~riate Normal Distribution 1088 Conditional Distributions 1090 Multiple and Partial Correlation 10-103 Yule's Notation 10 104 Plane of Regression 10105 Generalisation 10106 Properties of Residuals, 10109 Variance of the Residuals 10-11,0 Coefficient of Multiple Correlation 10111 Properties of Multiple Correlation 'Coefficient 70-113 Coefficient of Partial Correlation 10-11'4 Generalisation 10-116 MlAltiple Correlation in Terms of Total and Partial Correlations :10-116 Expression for Regression Coefficient in Terms pf Regression Coefficients of Lower Order 10118 Expression for Partial Correl::ltion Coefficient in Terms of Correlation Coefficients of Lower Order 10118

  • (xviii)

    Chapter - 11 :r Theory of Attributes 11-1 - 11-22

    11-1 Introduction 11-1 11-2 Notations 11-1 11:3 Dichotomy 11-1 11-4 Classes and Class Frequencies. 11-1

    1'1"-4-1 Order of Classes and Class Frequencies 11-1 11-4-2 Relation between Class Frequencies 11-2

    11-5 Class Symbols as Operators 11-3 11-6 Consistency of Data 11-8

    11-6-' Condjtions for Consistency of Data 11-8 11-7 Independence of Attributes 11-12

    11-7-1 Criterion of Independence 11-12 11-7-2 Symbols (AB)o and 0 11-14

    11 -8 Association of Attributes 11 -15 11-8 -1 Yule's Coefficient of Association 11-16 11 -S -2 Coefficient of Colligation 11 -16

    Chapter-12 Sampling and Large Sample Tests 12-1 - 12-50

    12-1 Sampling -Introduction 12-1 12-2 Types of Sampling 12-1

    12-2-1 Purposive Sampling 12-2 12-2-2 RandomSampling 12~ 12-2-3 Simple Sampling 12-2 12-2-4 Stratified Sampling 12-3

    12-3 Parameter and Statistic 12-3 12-3-1 Sampling Distribution 12-3

    12

  • (xix)

    12'9'2 Test of Significance for Difference of Proportions 12-15 12-10 Sampling of Variables 12'28 12 '11 Unbiased Estimates for Population Mean (~) and

    Variance (~) 12-29 12'12 Standard Err9rof Sample Mean 12'31 12'13 Test of Si91"!ificance for Single Mean 12-31 12'14 Test of Significance for Difference of Means 12-37 12'15' Test of Significance for Difference of Standard

    Deviations 12'42

    Chapter-13 Exact Sampling Distributions (Chi-Square Distribution)

    13-1 13'2

    13'3 13'3'1 13'3'2 13'3'3 13-3'4 13'3'5

    134 13'5 13'6 137

    137'1 137'2 137'3

    138 13'9

    139'1

    13'10

    13'11 13'12

    13'12'1

    13'1 - 13'72 Chi-square Variate 13-1 Derivation of the Chi-square Distribution-First Method - Metho.d of MGF- 13-1 SecoRd Method - Method of Induction 13'2 M-G'F' of X2-Distribution 13-5 Cumulant Generating Function of X2 Distribution 135 Limiting Form of X2 DistributiOn 136 Characteristic Function of X2 Distribution Mode and Skewness of X2 Distribution

    13'7 137

    Additive Property of Chi-square Variates 13-7 Chi-Square Probability Curve 13-9 Conditions for the Validity of X2 test 13'15 Linear Transformation 13 -1.6 Applications of Chi-Square Distribution 13'37 Chi-square Test for Population Variance 13'38' Chi-square Test of Goodness of Fit 1339 Independence of Attributes 13 '49 Yates Correction 1357' Brandt and Snedecor Formula for 2 x k Contingency Table 13-57 Chi-square Test of Homogeneity of Correlation, Coefficients 13'66 Bartlett's Test fOF Homogeneity of Severallndependent Estimates of the same Population Variance 13'68 X2 Test for Pooling the Probabilities (P4 Test.) 1~'69 Non-central X2 Distribution 13'69 Non-central X2 Distribution with Nqn-Cel'!trality

    Parameter). 13' 70 13'12'2 Moment Generating Function of Non-central

    X2 Distribution 13'70

  • (xx)

    13123 Additive Property of Non-central Chi-square Distribution 13: 72

    13 124 Cumul:mts of Non~entral Chi-square Distribution 1372

    Chapter-14 Exact Sampling Distributions (Continued) (t, F an~ z distributions) , 141 - 1474

    141 Introduction 141 142 Studenfs"t" 141

    1421 Derivation of Student's t-distribution 1422 Fisher's "t" 143

    144

    142

    1423 Distribution of Fisher's "t" 1424 Constants of t-distribution 1425 Limiting ferm of t-distribution

    145 14014

    1426 Graph of t-distribution 1415 1427 Critical Values of 't' 14'15

    Applications of t-distribution 1416 't-Test for Single Mean 1416 Hest for Difference of Means 1424

    1428 1429

    14210 14211 t-te~t for Testing Significance of an Observed Sample

    14212

    14213

    143

    144 145

    145'1 1452 1453 1454 1455 1456 1457 1458

    1459

    14510

    Correlation Coefficient 1437 t-test for Testing Significance of an Observed Regression Coefficient 1439 t-testlor Testing Significance of an Observed Partial Correlation Coefficient 1439 Distribution of Sample Correlation Coefficient when Population Correlation Coefficient p = 0 1439 Non-central t-distribution 1443 F-statistic (Definition) 1444 Derivation of Snedecor's F-Distribution 1445 Constants of F-c:listributlon 1446 Mode and POints of Inflexion of F-distribution Applications of F-c:listribution 1457

    1448

    F-test for Equality, of Population Variances Relation Between t and F-di$lributions

    1457 1464

    Relation ~~tween F and X2 1465 F-test for Testing the Significance of an Observed Multiple Correlation Coefficient 1466 F-test for Testing the Significance of an Observed Sample Correlation Ratio 1466 F-test ior Te$ting the linearity of Regression 1466

  • (xxi)

    145'11 F-test for Equality of Several Means 14'67 146 Non-Central'P-Distribution 14'67 147 Fisher's Z - Distribution 1469

    14'7-1 M-G-F- of Z- Distribution 14-70 14-8 Fisher's Z- Transformation 14-71

    Chapter-15 Statistical Inference - I (Theory of Estimation) 15-1 - 15-92

    15'1 Introduction 15-1 15'2 Characteristics of Estimators 15'1 15'3 Consistency 15'2 15'4 Unbiasedness 15'2

    15'4'1 Invariance Property of Consistent Estimators 15-3 15'4'2 Sufficient Condjtions for Consistency 15-3

    15'5 Efficient Estimators 15'7 155'1 Most Efficient Estimator 15-8

    15'6 Sufficiency 15'18 157 Cfamer-Rao Inequality -15'22

    15'7'1 Conditions for the Equality Sign in Cramer-Rao (C'R') Inequality 15:25

    15-8 Complete Family of Distributions 15-31 15'9 MVUE and Btackwellisation 15'34

    15'10 Methods of Estimation 1552 15'11 Method of Maximum Likelihood Estimation 15-52 15'12 Method of Minimum Variance 15-69 15'13 Method of Moments 15-69 15'14 Method of Least Squares 15-73 15'15 Confidence Intervals and Confidence Limits 15'82

    15'15'1 Confidence Intervals for Large Samples 15'87

    Chapter-16 Statistical Inference - \I Testing of Hypothesis, Non-parametric Methods and Sequential Analysis 16'1 - 1 6-' 80

    16'1 Introduction 16'1 16'2 Statistical Hypothesis (Simple and-Composite) 16'1

    16'2,'1 Test of a Statistical Hypothesis 16-2 16'2'2 Null Hypothesis 16-2 16-2-3 Alternative Hypothesis 16-2 16-2-4 Critical Region 16-3

  • (xxii)

    162'5 Two Types of Errors 164 16:2'6 level of Significance 165 162'7 Power of the Test 165,

    16'3 Steps in Solving Testing of Hypothesis Problem 16'6 16'4 Optimum Tests Under Different Situations 166

    16'4'1 Most Powerful Test (MP Test.) 16'6 164'2 Uniformly Most Powerful Test 167

    16'5 Neyman-Pearson lemma 16' 7 165'1 Unbiased Test and Unbiased Critical Region 165'2 Optimum Regions and Sufficient Statistics

    16'6 likelihood Ratio Test 16'34

    16'9 1610

    16'6',1 Prope'rties of Likelihood Ratio Test 16-37 16-37 16-7-1 . Test for the Mean of a Normal Population

    16'7-2 Test for the Equality of Means of Two Normal Populations 1642

    16' 7'3 Test for the Equality of -Means of Several Normal Populations 16'47

    16'7-4 Test for the Variance of a Normal Population 16'7-5 Test for Equality of Variances of two Normal

    Populations 1653 16-7'6 Test forthe Equality of Variances of several

    Normal Populations 1655 16'8 Non-parametric Methods 16-59

    16-50

    16-8'1 Advantages and Disadvantages of N'P' Methods over

    16'8'2 16'8'3 16'8'4 16'85 16-8'6 16'8'7

    Parametric Methods 1659 Basic Distribution 16'60 Wald-Wolfowitz Run Test 16,61 Test for Randomness 16'63 Median Test 1664 Sign Test 16'65 Mann-Whitney-Wilcoxon U-test 16'66 Sequential Analysis 16'69 16'9

    16'9'1 16'9'2

    Sequential Probability Ratio Test (SPRT) 16'69 Operating Characteristic (O.C.) Function of S.P.R.T 16-71

    16'9'3 Average Sample Number (A.S.N.)

    APPENDIX

    Numerical Tables (I to VIII) Index

    ,1671

    1'1 - 1'11 1-5

  • Fundamentals of Mathematical Statistics s.c. GUPTA V.K. KAPOOR Hindu College, Shri Ram College of Commerce

    University of Delhi, Delhi Un!~e~!ty of Delhi, Delhi Tenth Edition 206~1 Pages xx+ 1284 22 x 14 cm ISBN 81-7014-791-3 Rs 210.00 Special Features comprehensive and analytical treatment is given of all the topics. Difficult mathematical deductions have been treated logically and in a very simple manner. It conforms to the latest syllabi of the Degree and post ,graduate examinations in

    Mathematics, Statistics and Economics. Contents IntrOduction Frequency Dislribution and Measures 01 Central Tendency Measures of Dispersion, Skewness and KurtoSIs Theory 01 Probabilily Random Vanab/es-Dislribution Function Malhemalical Expectalion, Generating Func:lons and Law of Large Numbers Theorelical Discrele Dislributions Theoretical Continuous Distnbulion~ CUM HUng and prinCiple 01' Leasl Squares Correlation, Regression. Bivariale Normal Distribution and Partial & Multiple Correfation TheOry 01 Attribules ' Sampling and Large Sample TeslS 01 Mean and Proportion Sampling Dislribution Exact (Ch~sQuare Distribution) focI Sampling Dislribulions (I, F and Z Distribullons) Theort of Estimation Tesll'lg 01 HypoIhesis.

    ~~ and Non-parametric Melhods.

    Elements of Mathematical Statistics s.c. GUPTA V.K. KAPOOR

    Third Edition'2001 Pages xiv + 489 ISBN 81-7014-29'" Rs70.00

    Ii Prepared specially for B.Sc. students, studying Statistics as subsidiary or ancillary subject. Contents Introduction-Meaning and Scope Frequency Dislributions and Measures 01 Cenlral Tendency Measures 01 Dispersion, SkewnessandKurtosis TheoryolProbabitity RandomVariable~istributionFunctions MalhemalicalExpec\lIlon. Generalon Functions and Law 01 Large Numbers Theoretic;al Discrete Dislributions Theorelic;al Continuous Dislrllutions Curve Filling and Principle 01 Least Squares Correlation a!1d Regression Theory 01 Annbules Sampling and Large Sample Tests Chlsquare Dislnbulion Exact. Sampling Dislribulion Theory 01 Estimalion :resting of Hypothesis Analysis 01 Variance Design 01 Experiments Design 01 Sample Surveys Tables.

    Fundamentals of Applied 'Statistics s.c. GUPTA V.K. KAPOOR

    Third Edition 2001 Pages xvi + 628 ISBN 817014-1516 Rs 110.00

    Special Foatures The book provides comprehensive and exhaustive theoretical discussion. All basic concepts have been explained in an easy and understandable manner. 125 stimulating problems selected from various university examinations have been solved. It conforms to the latest syllabi of B.Sc. (Hons.) and post'graduate examination in Statistics,

    Agriculture and Economics. Contents Stalisli

  • Operation~ Research for Managerial Decision-making

    V. K. KAPOOR Co-author of Fundamentals of Mathematical Statistics

    Sixth Revised Edition 2~ Pages xviii + 904 ISBN 81-7014-130-3 Rs225.00 This well-organised and profusely illustrated book presents updated account of the Operations Research Techniques. Special Features It is lucid and practical in approach. Wide variety of carefully selected. adapted and specially designed problems with complete

    solutions and detailed workings. 221 Worked example~ l:Ire expertly woven into the text. Useful sets of 740 problems as exercises are given. , The book completely covers the syllabi of M.B.A.. M.M.S. and M.Com. courses of all Indian

    Universities. Contents Introduction to Operations Research Linear Programming : Graphic Method Linear 'Programming : SimPlex Method Linear Programming. DUality Transportation Problems Assignment Problems Sequencing Problems Replacement Dacisions Queuing Theory Decision Theory Game Theory Inventory Management Statistical Quality Control Investment Analysis PERT & CPM Simulation Work Study Value Analysis Markov Analysis Goal. Integer and Dynamic Programming

    Problems and Solutions in Operations Research

    V.K. KAPOOR Fourth Rev. Edition .200~ Pages xii + 835 ISBN 81-7014-605-4

    Salient Features

    Rs235.00

    The book fully meets the course requirements of management and commerce students. It would also be extremely useful for students ot:professional courses like ICA. ICWA.

    Working rules. aid to memory. short-cuts. altemative methods are special attractions of the book.

    Ideal book for the students involved in independent study. Contents Meaning & Scope Linear Programming: Graphic Method Linear Programming : Simplex Method Linear Programming: Duality Transportation Problems Assignment Problems Replacement Decisions Queuing Theory Decision Theory Inventory Management Sequencing Problems Pert & CPM Cost Consideration in Pert Game Theory Statistical Quality Control Investment DeCision Analysis Simulation.

    Sultan Chand & Sons Providing books - the never failing friends

    23. Oaryaganj. New Oelhi-110 002 Phones: 3266105. 3277843. 3281876. 3286788; Fax: 011-326-6357

  • CHAPTER ONE

    Introduction - Meaning and Scope 11. Origin and Development of Statistics', Statistics, in a sense, is as old

    as the human society itself. .Its origin can be traced to the old days when it 'was regarded as the 'science of State-craft' and,was the by-product of the administrative activity of the State. The word 'Statistics' seems to have been'derived from the Latin word' status' or the Italian word' statista' orthe German word' statistik' each of which means a 'political state'. In ancient times, the government used to collect .the information regarding the population and 'property or wealth' of the country-the fo~er enabling the government to have an idea of the manpower of the country (to safeguard itself against external aggression, if any), and the latter providing it a basis for introducing news taxes and levies'.

    In India, an efficient system of collecting official and administrative statistics existed even more than 2,000 years ago, in particular, during the reign of Chandra Gupta Maurya ( 324 -300 B.C.). From Kautilya's Arthshastra it is known that even before 300 B.C. a very good system of collecting 'Vital Statistics' and registration of births and deaths was in vogue. During Akbar's reign ( 1556 - 1605 A.D.), Raja Todarmal, the then. land and revenue ministeI, maintair.ed good records of land and agricultural statistics. In Aina,e-Akbari written by Abul Fazl (in 1596 - 97 ), one of the nine gems of Akbar, we find detailed accounts of the administrative and statistical surveys conducted during Akbar's reign.

    In Germany, the systematic c(\llection of official statistics originated towards the end of the 18th'century when, in order to' have an idea of the relative strength of different Gennan States: information regarding population and- output - in-dustrial and agricultural - was collected. In England, statistics were the outcome of Napoleonic Wars. The Wars necessilated the systematic collection of numerical data to enable the government to assess the revenues and expenditure with greater precision and then to levy new taxes in order to 1)1CCt the cost ~f war.

    Seventeenth century saw the.,origin of the 'Vital Statistics.' Captain John Grant of London (1620 - 1674) ,known as the 'father' of Vital Statistics, was the first man to study the statistics of births and deaths. Computauon of mortality tables and the calculation of expectation of life at different ages by a number of persons, viz., Casper Newman,.Sir WiJliallt Petty (1623 " 1687 ), James Dodson: Dr. Price, to mention only a few, led to the idea of 'life insurance' and the first life insurance institution was founded in London in 1698.

    The theoretical deveiopment of the so-called modem statistics came during the mid~sevemecnth century with the introduction of 'Theory of Probability' and 'Theory of Games and Chance', the chief contributors being mathematiCians and gamblers of France, Germany and England. The French mathematician Pascal (1623 - 1662 ), after lengthy correspondence with another French mathematician

  • 12 Fundamentals of Mathematical Statistics

    P. Fermat (1601 - 1665 ) solved the famous 'Problem of Points' posed by the gambler Chevalier de - Mere. His study of the problem laid the foundation of the theory of probability which is the backbone of the modern theory of statistics. Pascal also investigated the properties of the co-effipients of binomml expansions and also invented mechanical computation machine. Other notable contributors in this field are : James Bemouli ( 1654 - 1705 ), who wrote the fIrst treatise on the 'Theory of Probability'; De-Moivre (1667 - 1754) who also worked on prob-abilities an,d annuities arid published his important work "The Doctrine of Chances" in 1718, Laplac~ (1749 -, 1827) who published in l782 his monumental work on the theory of'Rrobability, and Gauss (1777 - 1855), perhaps the most original!Qf all ~riters po statistical subjects, who gave .the principle.of deast squares and the normal law of errors. Later on, most of the prominent mathematicians of 18th, 19th and 20th centuries, viz., Euler, Lagrange, Bayes, A. Markoff, Khintchin, Kol

    J

    mogoroff, to mention only a few, added to. the contributions in. the field of probability. . Modem veterans in the developlJlent of the subject are Englishmen. Francis

  • Introduction

    Statistics as 'Statistical Data' Webster defines Statistics ali "classified facts represt;nting the conditions of

    the people in a State ... especially those facts which can be stated in numbers or in any other tabular or classified arrangement." This definition, since it confines Statistics only to the data pertaining to State; is inadequate as the domain of .Statistics is much wider.

    Bowley defines Statistics as " numerical statements of facts in any department of enquiry placed in relation to each other."

    A more exhaustive definition is given by Prof. Horace Secrist as follows: " By Statistics we mean aggregates of facts affected to a marked extent by

    multiplicity of causes numerically expressed, enumerated or estimated according to reasonable standards of accuracy, collected in a systematic manner for a pre"lletermined purpose and placed in relation tv each other." Statistics as Statistical,Methods

    Bowley himselr d.efines Statistics in. the rollQwing three different ways: (i) Statis4c~ may be called the ~i~ce of cou~ting.

    (ii) Statistics may rightly be called the science of.averages. (iii) Statistics is the science of the meac;urement of social organism, .reg~ded

    ali a whole in all its manifestations. But none of the above definitions is adequate. The first because1tatisticsJis

    not merely confined to the collection of data as other aspects like presentation, analysis and interpretation, etc., are also covered by it. The second, because averages are onl y a part ofthe statistical tools used in the analysis of the data, others' being dispersion, skewness, kurtosi'S, correlation, regression, etc. J'he third, be=-cause it restricts the application of StatistiCS'fO sociology alone while in modem days Statistics is used in almost all sciences - social as well as physical.

    According to Boddington, " Statistits is the.. science of estimates and prob-abilities." This also is an inadequate definition smce probabilities'and estimates] constitute only a part of the statistical methods. .

    Some other definitions are : "The science of Statistics is the method of judging colleotive, natural or soc.,idl

    phenomenon from the results obtained from the analysis or enUin.eration or collectio~ of estimates. "- King.

    " Statistics is the science which deals with collection, classification and tabulation of nume.rical facts as the basis for explanation, 'description and com-' parison of phenomenon." ...: Lovitt.

    Perha~s the best definition-seems to be one given by' Croxton and Cowden, according to whom.Statistics may be defined as " the science which deals w(th.the collection, analysis and interpretation of numerical data."

  • 14 Fundamentals of Mathematical Statistics

    13. Importance and Scope of Statistics. In modern times, Statistics is viewed not as a mere device for collecting numerical data but as a means of developing sound techniques for their handl!ng and analysis and drawing valid inferences from them. As such it is not confined to the affairs of the State but is intruding constantly into various diversified spheres of life - social, economic and political. It is now finding wide applications in almost all sciences - social as well as physical- such as biology, psychology, education, econom ics, business manage-ment, etc. It is hardly possible to enumerate even a single department of human activity where statistics does not creep in. It has rather become indispensable in all phases of human endeavour.

    Statistics a d Planning. Statistics is indispensable to planning. In the modem age which is termed as 'the age of planning', almost allover the world, goemments, particularly of the budding economies, are resorting to planning for the economic development. In order that planning is successful, it must be based soundly on the correct analysis of complex statistical data.

    Statistics and Economi~s. Statistical data and technique of sta~istical analysis have' proved immensely usefulin solving a variety of economic problems, such as wages, prices, analysis of time series and demand analysis. It has also facilitated the development of economic theory. Wide applications of mathematics a~d statistics in the study of economics have led to the development of new disciplines called Economic Statistics and Econometrics.

    Statistics and Bl!.siness. Statistics is an indispensable tool of production control also. Business executives are relying more and more on statistical techni-ques for studying the needs and the desires of the consumers and for many other purposes. The success of a businessman more or less depends upon the accuracy and precision of his statistical forecasting. Wrong expectations, which may be the resUlt ,of faulty and inaccurate analysis of. various causes affecting a particular phenomenon, might lead to his, disaster. Suppose a businessman wants to manu fac-ture readymade gannents. Before starting with the production process he must have an, overall idea as to 'how man y ,garments are to be manufactured', 'how much raw material and labour is needed for that' ,-and 'what is the quality, shape, coloQl', size, etc., of the garments to be manufactured'. Thus the fonnulation of a produc-tion plan in advance is a must which cannot be done without having q4alltitative facts about the details mentioned above. As such most of the large industrial and commercial enterprises are employing trailled and efficient statisticians.

    Statistics and Industry. In indu~try,Statistics is very widely used in 'Quality Control'. in production engineering, to find whether the product is confonning to specifications or not, statistical tools, vi~" inspection plans, control charts, etc., are of e~treme importance. In inspection p~ns we have to resort to some kind of sampling - a very impOrtant aspect of Statistics.

  • Introduction 15 Statistics and Mathematics. Statistics and mathematics arc very intimately

    related. Recent advancements in statistical techniques arc the outcome of \yide. applications of advanced mathematic.s. Main contributors to statistics, namely,-Bemouli, Pascal, Laplace, De-Moivre, Gauss, R. A. Fisher, to mention only a few, were primarily talented and skilled mathematicians. Statistics may be regarded as that branch of mathematics which provided us with systematic methods of analys-ing a large I)umber of related numerical facts. According to Connor, " Statistics is a branch of Applied Mathematics which specialises in data." Increac;ing role of mathematics in statistical ~alysis has resulted in a new branch of Statistics called Mathematical Statistics.

    Statistics and Biology, Astronomy and Medical Science. The association between statistical methods and biological theories was first studied by Francis Galton in his work in 'Regressior:t'. According to Prof. Karl Pearson, the whole 'theory of heredity' rests on statistical basis. He says, " The whole problem of evolution is a problem of vital statistics, a problerrz of longevity, of fertility, of health, of disease and it is impossible for the Registrar General to discuss the national mortality without an enumeration of the popUlation, a classification of deaths and knowledge of statistical theory."

    In astronomy, the theory of Gaussian 'Normal Law of Errors' for the study. of the movement of stars and planets is developed by using the 'Principle of Least Squares'.

    In medical science also, the statistical tools for the collection, presen,tation and analysis of observed facts relating to the causes and incidence bf diseases and the results obtained from the use of various drugs and medicines, are of great importance. Moreover, thtf efficacy of a manufacutured drug or injection or medicine is tested by l!sing the 'tests of sigJ'lificance' - (t-test).

    Statistics and Psychology and Education. In education and psychology, too, Statistics has found wide applications, e.g., to determine the reliability and validity of a test, 'Factor Analysis', etc., so much so that a new subject called 'Psychometry' has come into existence.

    Statistics and War. In war, the theory of 'Decision Functions' can beof great assistance to military and technical personnel to plan 'maximum destruction with minimum effort'.

    Thus, we see that the science of Siatistics is. associated with almost a1.1 the sciences - social as well as physical. Bowley has rightJy said, " A knowledge of Statistics is like a knowledge o/foreign language or 0/ algebra; it may prove o/use at any time under any circumstance:"

    14. Limitations of StatistiCs. Statistics, with its wide applications in almost every sphere of hUman ac!iv,ity; is not without limitations. The following are some of its important limitations:

  • ie; Fundamentals of Mathematical Statistics (i) Statistics is not suitedto the study of qualitative phenomenon. Statistics,

    being a science dealing with a set of numerical data, is applicable to the study of only those subjects of enquiry which are capable of quantitative measurement. As such; qualitative phenomena like honesty, poverty, culture, etc., which cannot be expressed numerically, are not capable of direct statistical analysis. However, statistical techniques may be applied indirectly by first reducing the qualitative expressions to precise quantitative tenns. For example, the intelligenc.e.of.a group of candidates can be studied on the basis of their scores in a certain test.

    (ii) Statistics does not study individuals. Statistics deals with an aggregate of objects and does not give any specific recognition to the individual items of a series. Individual items, taken separately, do:not constitute statistical data and are mean-ingless for any statistical enquiry. For example, the individual figures of agricul-tural production, industrial output or national income of ~y. country for a particular year are meaningless unless, to facilitate comparison, similar figures of other countries or of the same country for different years are given. Hence, statistical analysis is suitod to only those problems where group characteristics are to be studied

    (iii) Statistical laws are not eXilct. Unlike the laws of physical and natural sciences, statistica1laws are only approximations and not exact. On the basis of statistical analysis we can tallc only in tenns of probability and chance and not in terms of certainty. Statistical conclusicns are not universally true - they are true only on an average. For example, let us consider the statement:" It has been found that 20 % of-a certain surgical operations by a particular doctor are successful."

    '!' The statement does not imply that if the doctor is to operate on 5 persons on any day and four of the operations have proved fatal, the fifth must bea success. It may happen that fifth man also dies of the operation or it may also happen that of the .fi~e operations 9n any day, 2 or 3 or even more may be successful. By the statement 'lje mean that as number of operations becomes larger and larger we should expect, on the average, 20 % operations to be successful.

    (iv) Statistics is liable to be misused. Perhaps the most important limitation of Statistics is that it must be uSed by experts. As the saying goes, " Statistical methods are the most dangerous tools in the hands of the inexperts. Statistics is one of those sciences whose adepts must exercise the self-restraint of an artist." The use of statistical tools by inexperienced and mtttained persons might lead to very fallacious conclusions. One of the greatest shoncomings of Statistics is that they do not bear on their face the'label of their quality and as such carl be moulded and manipulated in any manner to suppon one's way of argument and reasoning. As King says, " Statistics are like clay of which one can niake a god or devil as one pleases." The requirement of experience and skin for judicious use of statistical methods restricts their use to experts only and limits the chances of the mass popularity of this useful ~d important science.

  • Introduction I '1 15. Distrust of Statistics. WlYoften hear the following interesting comments

    on Statistics: (i) 'AI) ounce of truth will produce tons of Statistics', (ii) 'Statistics can prove anything', (iii) 'Figures do not lie. Liars figure', (iv) 'If figures say so it can't be otherwise', (v) 'There are three type of lies - lies, demand lies, and Statistics - wicked in

    the order ofotheirnaming, and so on. Some of the reasons for the existence of such divergent views regarding the

    nature and function of Statistics are as follows: (i) Figures are innocent, easily believable and more convincing. The facts

    supported by figures are psychologically more appealing. (ii) Figures put forward for arguments may be inaccurate or incomplete and

    thus mighUead to wrong inferences. (iii) Figures. though accurate. might be moulded and manipulated by selfish

    persons to conceal the truth and present a distorted picture of facts to the public to meet their selfish motives. When the skilled talkers. writers or politicians through their forceful..writings &rid speeches or the business and commercial enterprises through aavertisements in the press mislead the public or belie their expectations by quoting wrong statistical statements or manipulating statistsical data for per-sonal motives. the public loses its faith ,and belief in the science of Statistics and starts condemning it. We cannot blame t~e layman for his distrust of Statistics. as he. unlike statistician. is not in a position to distinguish between valid and invalid conclusions from statistical statements and analysis.

    It may be pointed out that Statistics neither proves anything nor disproves anything. It is only a tool which if rightly used may prove extremely useful and if misused. might be disastrous. Accord'ing to Bowley. "Statistics only furnishes a tool. necessary though imperfect, which is dangerous in the hands of those who do not know its use and its deficiencies." It is not the subject of Statistics that is to be blamed but those people who twist the numerical data and misuse them either due to ignorance or deliberately for personal selfish motives. As King points out. "Science of Statistics is the most useful servant but only of great value to those who understand its proper ~e."

    We discuss below a few interesting examples of misrepresentation of statistical data.

    (i) A statistical report: "The number of accidents taking place in the middle of the road is much less than the number of accidents taking place on 'its side. Hence it is safer to walk in the middle of the road." This conclusion is obviously wrong since we are not given the proportion of the number of accidents to the number of persons walking in the two cases. .

  • 18 Fundamentals of MathematiCal StatisticS

    (ii) "The number ohtudents laking up Mathematics Honours in a University has increased 5 times during the last 3 years. Thus, Mathematics is gaining

    'popularity among the students of the university." Again, the conclusion is faulty since we are not given any such details about the other subjects and hence comparative study is not possible.

    (iii) "99% of the JJe9ple who drink alcohol die before atlainjng the age of 100 years. Hence drinking)s harmful for longevity of life." This statement, too, is incorrect since nothing is mentioned about the number of per:sons who do not

  • FREQUENCY DISTRlBUfIONS AND MEASURES OF CENTRAL TENDENCY

    2.1. Frequency Distributions. When observations, discrete or contin~ous, are available on a single characteristic of a large number of individuals, often it beComes necessary to condense the data as far as possible without losing any information of interest Lei ~ consider the mw:ks in Statistics obtained by 250 candidates selected at random from among those appearing in a certain examina-tion.

    TABLE 1: MARKS IN STATISTICS OF 250 CANDIDATES , ' ,

    32 47 41 51 41 30 39 18 48 53 54 32 31 46 .5 37 32 56 42 48 38 26 50 40 38 42 35 22 62 51 44 21 45 31 37 41 44 18 37 47 68 41 30 52 52 60 42 38 38 34 41 53 48 21 28 49 42 36 41 29 30 33 31 35 29 37 38 40 32 49 43 32 24 38 38 22 41 50 17 46 46 50 26 15 23 42 25 52 38 46 41 38 40 37 40 48 45 30 28 31 40 33 42 36 51 42 56 44 35 38 31 51 45 41 50 53 50 32 45 48 40 43 40 34 '34 44 38 58 49 28 40 45 19 24 34 47 37 33 37 36 36 32 61 30 44 43 50 31 38 45 46 40 32 34 44 54 35 39 31 48 48 50 43 55 43 39 41 48 53 34 32 31 42 34 34 32 33 24 43 39 40 50 27 47 34 44 34 33 47 42 17 42 57 35 38 17 33 46 36 23 48 50 31 58 33 .cf4 26 29 3t 37 47 55 57 37 41 54 42 45 47 43 37 52 47 46 44 50 44 38 42 19 52 45 23 41 47 33 42 24 48 39 48 44 60 38 38 44 38 43 40 48

    This representation of, the data dQes not furnish any useful information and is rather confusing to mind. A better way may be to express the figures in an ascending or descending order of magnitude, commonly termed as array. But this does not reduce the bulk of the data. A much better representation is given on the next page.

    A bar ( I ) called tally mark is put against the'number when it occurs. Having occurred four times. the fifth occurrence is represented by puttfug a cross tally (j) on the first four. tallies. This technique faciliiates the counting of the tally marks at the end.

  • 22 Fundamentals Of Mathematical Statistics

    The representation of the data as above is known as frequency distribution. Marks are called the variable (x) and the 'number of students' against the marks is known as the frequency (f) of the variable. The word 'frequency' is derived from 'how frequently' a variable occurs. For example, in the above case the frequency of 31 is 10 as there are ten students getting 31 marks. This repre-sentation, though beuer than an array' ,does not condense the data much and it is quite cumbersome to go through this huge mass of iIata.

    TABLE 2 Marks

    15 17 18 19 21 22 23 24 25 26 27 28 29 30 31 32 " 33 34 35 36 37 J8 39

    No.o Students Total Tal yMarks Frequency

    II =2 III =3 II =2 II =2 /I =2 II =2 III =3 1111 =4 I =1 III =3 I =1 III =3 /I =2 un =5 un un = 10 un un =10 un III =8 un l1li1 =11 l1li =5 l1li =5 l1li l1li11 = 12 un l1li un II = 17 l1li1 =6

    Marks NO.ofStudelits Tota Tally Marks Frequency

    40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 60 61 62 68

    l1lil1li1 l1lil1li un l1li Iii un III un l1li II un II un II l1li111 I un l1li I III ' IIII .. Uft 1111 un 1111

    =11 =10 =13 =8 =12 =7 =7 =8 = 12 =3 = 10 =4 =5 =4 =3 =2 =2 =2 =2 =3 =1 =1 =1

    If the identity of the ihdividuals about whom a particular i.QfonnatiQn is taken is not relevant, nor the order in which the ooservations arise, then the first real step of condensation is to divide the observed range of variable into a suitable number of class-intervals and to recooI the number of o~rvations in each class. FQr example, in the above case, the data may be expressed as shown in Table 3.

    Such a table showing the distribu- TABLE 3 ': FREQUENCY TABLE tion of ~e' frequend~ in the different.;' Marks No. of students. classes IS called a frequency table and ( x ) (f)

    ~ manner in which the class frequen- 15,-,19 9 des are distributed over the class batel- 20 - 24 11 vals is called the grouped jreq.uency ~ =~: ~ distribution of the variable. 35 - 39 45

    ~emark. The classes of che type :g = ~ ~~ 15-19,20-24,25-29 etc., in which 50 -54 26 both the upper and lower limits are 55 - 59 8 included are called 'inclusive classes' . ~ = ~ 1 For example the class 20-24, includes Total 250

  • Frequency Distributions Arid Measures Of'Central Tendenq 13

    all the values from:20 10 24, both inclusive 'and tlie classification is termed as inclusive type classification.

    In spite of great importance of classification in statistical analysis, no hard and fast rules can be laid down for it The following points may be kep~ in mind for classification' :

    (i) Th~ classes should be clearl5' defmedand should not lead 10 aliy ambiguity. (ii) The classes should be exhaustive, i.e., each of the given values should be

    included in one of the classes. (iii) The classes shoul

  • 24 Fundamentals or Mathematical Statistics

    211. Continuous Frequency Distribution. If we deal with a continuous variable, it is not possible to arrange the data in the class intervals of above type. Let us consider the distribution of age in years. If class intervals are 15-19, 20-24 then the persons with ages between 19 and 20 years are not taken into consideration. In such a case we form the class intervals as shown below.

    Age in years Below 5

    5 or more but less than 10 10 or more but less than 15 15 or more but less than 20 20 or more but less than 25 and soon.

    Here all the persons with ~y fraction of age are included in one group or the other. For practical purpose we re-writethe above clasSes as

    0-5 5-10

    10-15 15-20 20-25

    This form of frequency distribution is known as continuousj:-equency distribu-tion.

    It should be clearly understood that in. the above classes, the upper limits of each class are excluded from the respective classes. Such classes in which the upper limits are excluded from the respective classes and are included in Ihe immediate next class are known as 'exclusive classes' and Ihe classification is termed as 'exclusive type classi/icadon'.

    22. Graphic RepreseDtatioD of a FrequeDty DistributioD. It is often useful to represent a frequency distribution by means of a diagram which makes the unwieldy data intelligible and conveys to the eye the general run of the observa-tions. Diagrammatic representation also facilitates the cOmparison of two or more frequency distributions. We consider below some important types of graphic representation.

    221. H~iogram. In drawing the histogram of a given continuous frequency distribution we flCSt mark off along the x-axis all the class interval~ on a suitable scale. On each class interval erect rectangles with ~eights proponi~ to the frequency of the corresponding class interval so that the area of the rectangle is prq>ortional to th~ frequency of the class. If. however, the classes are of Wlequal width then die height of the rectangle wili be proponional to the ratio of the frequencies to the width of the classes. 7be diagram of continuous rectaDgleS so obtained is called histrogram.

  • Frequency Distributions And Measures or Central Tendency 25

    Remarks. 1. To draw the histogram for an ungrouped frequency distribution of a variable we shall have to assume that the frequency correspon~ing to ~e variate value x is spread. over the interval x - hl2 to x + hl2, where h IS the Jump from one value to the next.

    2. If the grouped' frequency distribution is not continuous, fIrst it is to be converted into continuous disiribution and then the histrogam is drawn.

    3. Although the height of each rectangle is proportional to the frequency of the corresponding class, the height of a fraction of the rectangle is not proportional to the frequency of the correspOnding fraction of the class, so that histogram cannot be directly used to read frequency over a fraction of a class interval.

    4. The histogram of the distribution of marks of250 students in Table 3 (page 22) is obtained as follows.

    Since the grouped frequency distribution is not continuous, we frrst convert it into a continuous distribution as follows: HISTOGRAM FOR FREQ. DISTRIBUTION

    Maries 14~195 195-245 245-295

    29~345 345-395 395r445 445-495 495-545 545-595

    59~5 64~95

    No.ofStudmls 9

    n 10 44 45 54 37 26 8 5 1

    56

    III 40 . LLI

    ~ 32 LLI :;:, 24 ~ 1&.1 e: 16

    8:'

    O~L-~~--~~~~~~~_ .nw:'I11,!,":''':'w:'

    ;!~~~~~:J MARKS

    Remark. The upper and lower class limits of the new aclusive type classes are known as class boundaries.

    If d is the gap between the upper limit of any class and the lower limit of the Succeeding class, the class bOundaries for any class are then given by :

    Upper class boundary = Upper class limit + ~ Lower class boundary = Lower class limit - ~

    222. Frequency Polygon. For an ungrouped distribution, the frequency polygon is obtai~ by,plotting points with abscissa as.the variate,valus and the ordinate as the correspon~ng frequencies and joining the plotted points by means of straight lines. For a gtVuped frequency distribution, the abscissa'of points are mid-values of the class intervals. For eqUal class intervals th~ f~uel)cy polygon can be obtained by joining the middle Points of the upper sides of tJle adjacent rectangles of the histogram by m~s of straight lines. If the class intervals are of small width the polygon can be approximated by a smooth curve. The frequency curve can be obtained by drawing a smooth freehand curve through the vertices of the fre

  • Fundamentals Of Mllthem.atical.sta~lstics 23. Averages or Measures of Central Tendency or Measures of Location.

    According to Professor Bowley. averages are "statistical constants which enable us to comprehend in a single effort the significance of the whole." They give.us an idea about the concentration of the values in the central part of the distribution. Plainly speaking. ap ,;:lver:age of a statistical series is the r.-wue of the variable Which is representative of the entire distribution. The following are the five measures of central tendency that are in common use:

    (i) Arithmetic Mean.or $.imp/y Mean, (ii) Media,!, (iii) Mode, (iv) Geometric Mean, and (v) Harmonic Mean.

    24. Requisites lor'an Ideal Measure ol'Central Tendency. According to Professor Yule . the following are the characteristics to be satisfied by-an ideal measure of central tendency :

    .(i) It should be rigidly dermed. (ii) It should be readily comprel)ensible and easy to calculate. (iii) It shoul~ be based on all the observations. (iv) It should be suitable for further mathematical treatment. By this we mean

    that if we are given the averages and sizes of a nwnber of series. we should be.able to calculate the average of the composite series obtained on combining the given series.

    (v) It should be affected as little as possible by fluctuations of sampling. In addition to t,he above criteria. we m~y add the following (which is not due

    to Prof.Yule) : (vi) It should not be affected much by extreme values. 2S. Arithmetic Mean. Arithmetic mean of a set of observastions is their sum

    divided by the number of observations. e.g the arithmetic mean x of n observa-tions Xl Xl X. is given by

    1 1 II X = - ( Xl + Xl + '" + x,. ) = - 1:' Xi

    n n i= 1 In case offrequencydistribution.x; If;. i = 1.2 ... n. where/; is the freq~encyof the variable Xi ;

    x = h Xl + h Xl + .. + f,. x,. h+ h+ ... /.

    II 1:f;.x;

    ; = 1 ---

    II 1:f;

    ;= 1

    1 II [" ] N 1:f;Xi. 1:f;=N {= l' i = 1 ... (21)

    --

    In case of grouped or continuous frequency distributioo. X is taken as the mid. value of the corresponding class.

    Remark. The symbol 1: is the leUer capital sigma of the Greek alphabet and is used in mathematics to denote the sum of values.

  • Frequency Distributions And Measures Of Central Tendency 2 7

    Example 21. (a) Find the aritlunetic mean of th~ following frequency dis-tribution:

    x: 1 2 3 4 5 6 7 f: 5 9 12 17 14 10 6

    (b) Calculate the arithmetic mean of the ",arks.from thefollowing table: Marks : 0-10 10-20 20-30 30-40 40-50 50-60

    No. of students : 12 18 27 20 17 6 Solution .. (a)

    x f fx 1 5 5 2 9 18 3 12 36 4 17 68 5 14 70 6 10 60 7 6 42

    73 299 1 299

    .. x= - r. [X=f 73 =; 409 N (b)

    Marks No. of students Mid - paint fx (f) (x) ~1O 12 5 .60 1~20 18 15 270 2~30 27 25 675

    3~0 20 35 700 4~50 17 45 765 50-60 6 55 330 Total 100 2,800

    Arithmetic mean or x= ! r. fx= _1_ x 2800= 28 N 100' It may be noted that if the values of x or (and) f are large, the calculation of

    mean by formula (21) is q9ite timtKonsumiJig and tediOus. The arithmetic is reduced to a great extent, by taking the ~eviations of the given values from any arbitrary point 'A', as explained below.

    Let dj = Xj - A , then .j;,di = j; (Xi - A) =. j; X,4'- At ) Summing both sides over i from 1 to n, we get

    " " " " r. J'.d= r. ~x- A r. ~. = r. J'.XI- A .N Jj I )1 , ~,Ji Ji i=1 ~=1 ;=1 -'j=1

  • 28 Fundamentals Of Mathematical Statistics

    1 n 1''11 ~ N r.fidi= N r.fiXi- .4= x- A,

    i= 1 i= 1 where x is the arithmetic mean of the distribution.

    ~= A+ 1. i ftdi N i= 1

    ... (22) This fonnuIa is much more convenient to apply than formula (21).

    Any number can serve the purpose of arbitrary point 'A' but. usually, the value of x corresponding to the middJe part of the distribution will be much more convenient.

    In case of grouped- or continuous frequency distribution, the arithmetic is reduced to a still greater extent by taking

    Xi - A di= -h-'

    where A is an arbitrary point and h is the common magnitude of dass interval. In this case, we have

    hdi=xi-A, and proceeding exactly similarly as above, we get

    h n x= A+ N r.. fidi ... (23)

    i= 1 Example 22. Calculate the meanfor thehllowingfrequency distribution. Class-interval: 0-8 8-16 1~24 24-32 32~O 40-48 Frequency 8 7 16 24 15 7 Solution.

    Class-i1Jlerval mid-value Frequency d=(x-A)lh fd (x) (f)

    0-8 4 8 -3 -24 8-16 12 7 ..:2 -14

    16-24 20 16 -I, -16 24-32 28 24 0 0 32-40 36 1'5 1 15 40-48 44 7 2 14

    77 -2)

    tIere we take A = 28 and h ='8. .. x = A + h ~f d = 28 + 8 x \~. 25) = 28 - ~~ = 25.404

    251. Properties or Arithmetic Mean Property 1. Algebraic suw of the deviations of a set of values from

    their arithmetic mean is zero. If Xi It, i = 1, 2, ... , n is the frequency distribu-tion, then

  • Frec:\Ieocy Distributions And Measures or Centr.1 Tendency

    n

    r f; (Xi - X) = 0, X being the mean of distribution. i-I

    Proof. r f; (Xi - X) = r f; Xi - X r fi = r fi Xi - X N

    Also

    Hence

    r f,Xi X= _,_- ~ r f;Xi= N X'

    N n

    ~ Ii (Xi -'- X) = N . X - X . N = 0 i- I

    29

    I'ropel"ty 2. rhe sum of the squares of the deviations of a set of values is minimum when taken about mean.

    I'roof. For the frequency distribution Xi I Ii, i = '1,2, ... , n, let "

    z = r Ii (Xi - A )~ , i-I

    be the sum of the squares orthe deviations of given values from any arbitrary point 'A '. We have to prove that Z is miitimuiil when A = X.

    Applying the principle. of maxima and minima from 'differential calculus, Z will be minimum for variations in A if

    Now

    ~=o aA

    iiz and --, > 0

    aA-az aA = - 2 ~ f; ( Xi - A ) = 0

    1

    ~ r f; ( Xi - A ) = 0 1

    r Ii Xi - A r f; = 0 or A = r Ii Xi = X N

    . ~Z . Agalll -, = - 2 r f, (- 1) = 2 r f; = 2N> 0

    aA- i i . . Hence Z is minimum at the point A = x. This establishes the result. I'roperty 3. (Mean of the composite series). If Xi, (i = 1, 2, ... , k) ar(!

    the means of k-component series of sizes ni. ( i = 1, 2, ... , k) respectively, then the mean X of thi? composite series opt(lin~d qn combining the c~mponent series '.5 given. by the formula:

    .'roof. Let XI I , XI~ , _\:I~I be nl luembers of the first serits- ; X:i, X~~ X2n2 be ~2 members of the second series, Xli , Xt~ , , x.~. be nk members of the kth series. Then, by def.,

  • 2-10 Fuudameotals Of Mathematical Statistics

    ._ 1 XI - - (xu + Xu + . + Xlftl )

    nl

    ~- ..!..(XlI+ X'22+ ... + X2n2) n2

    Xk Y ..!.. (Xkl + Xk2 + ... + Xbt.) nk

    The mea n x of composite series of sjze nl + n2 + ... + nk is given -by _ (xu + Xl2 + '" + Xlftl) + (X21 + X22 + ... + X2ft2) + ... + (X;I + Xk2 + ... + Xbtt) X=

    = nl XI + "2X2 + ... + nkxk

    "I + n2 + ." nk Thus, x = l: n; X; / (l:n;)

    (From (*)}

    Example 23. Tile average salary of male employees in a fum was Rs.520 qnd tllal off~males was Rs.420. The mean salary of a/l tile employees was Rs.500. Find tile percentage of male ,and femdle employees.

    Solution. Let nl and n2 ~enote ~spe~tively the nu~ber o~ plale and female employees in the concern and XI and X2 denote respectively their average salary (in rupees). Let X denote the avarage salary of all the workers in the firm.

    We are giv,en that: XI = 520, X2 = 420 and x = 500

    Also we know

    nl + n'2 ~ 500 (nl + n2) = 520 nl + 420'n2 ~ (520 - 5(0) nt = (500 - 420)"2 ::;. 20 nl = 80 n2

    nl 4 ~ -=-

    ,n2 .1 Hence the percentage of male employees in the~fiml

    = _4_ )( 100.. 80 4+1

    and percentage of female employees in the firm = _1_ )( 100 = 20

    4+1 252. Merits and Demerits of Arithmetic. Mean

    Merits. (i) It is rigidly defined. (ii) It is easy to understand and easy to' calculate.

    (iii) It is based upon all the observations.

  • Frequency Distributions And Measures or Central Tendency HI

    (iv) It is amenable to algebraic treatment. The mean of the composite series in terms of the means and sizes of the component series is given by

    k k x= 1: niX;! (1: nil

    i= I (v) Of all the averages, arithmetic mean is affected least by fluctuations of samp~ng. This property is sometiwes described by saying that ari\hmetic mean is, a stable averdge.

    Thus, we see that arithmetic mean satisfies all the properties laid down by Prof. Yule for an ideal average.

    Demerits. (i) IL cannot be determined by inspection nor it can be located graphically.

    (ii) Arithmetic mean cannot be used if we are dealing with qualitative charac-teristics which cannot be measured quantitively; such as, intelligence, honesty, beauty, etc. In such cases median (discussed later) is the only average to be used.

    (iii) Arithmetic mean cannot be obtained if a single observation is missing or lost or is illegible unless we drop it out and compute the arithmetic mean of the remaining values. ,

    (iv) Arithmetic mean is affected very much by extreme values. In case of extreme items, arithmetic mean gives a distorted picture of the distribution and no longer remains representative of the distribution.

    (v) Arithmetic mean may lead to wrong conclusions if the details of the ~ql from which it is computed are not given. ~t us consider the following J:Ilarks obtained by two students A and B in three tests, viz., terminal test, half-yearly examination and annual examination respectively.

    Marks in : ~ I Tes( 1/ Test 11/ Test A 50% 60% 70% B 70% 60% 50%

    Average marks 60% 60%

    Thus average marks obtained by each of the two students at the end of the year are 60%. If we are given the average marks alone we conclude that the level of intelligence of both the students at the end of the year is same. This is a fallacious conclusion since we find from the data that student A has improved consistently while student B has deteriorated consistently.

    (vi) Arithmetic mean cannot 'be calculated if the extreme class is open, e.g., below IOor above90.Morever, even if a single observation is missing mean cannot be calculated.

    (vii) In extremely asymmertrical (skewed) distribution, usually arithmetic mean is not a suitable measure of location.

    253. Weighted Mean. In calculating arithmetic' mean we suppose that all the items in the distribution have equal importance. But in practice this may not be so. If some items in a distribution are more important than others, then this

  • 2.12 Fundamentals of Mathematical Statistics

    point must be borne in mind, in order that average computed is representative of the distribution. In such cases, proper weightage is to be given to various items - the weights attached to each item being proportional to the importance of the item in the distribution. For example, if we want to have an idea of the change in cost of living of a certain group of people, then the simple mean of the prices of the commodities consumed by them will not do, since all the commodities are not equally important, e.g., wheat, rice and pulses are more important than cigarettes, tea, confectionery, etc.

    Let IV; be the weight attached to the item X;. i = I, 2, ... , 11. Then we define: Weighted arithmetic mean or weighted mean = L W; X; I L W; ... (25)

    ; ;

    It may be observed that the formula for weighted mean is the same as tlie I'ormula for simple mean with f;, (i = I, 2, ... , II), tile freqllellcies replaced by II';, (i = I, 2, ... , II), the weights.

    Weighted mean gives the result equal to the simple mean if the weights assigned to each of the variate values arc equal. It results in higher value than the simple mean if snuiller weights arc given to smaller items and larger weights to larger items. If the weights attached to larger items are smaller and those attached to smaller items arc larger, then the weighted mean results in smaller value than the simple mean.

    Example 24. Filld the simple alld weighted arithmetic meall of the first II l/atllrailllll11bers, the weights being the correspollding Illimbers.

    Solution. The first natural numbers arc I, 2 ,3, ... ,11.

    We know that

    1 +2+3+ ... +Il

    Simple A.M. is

    X=LX=I+2+3+ 11 11

    Weighted A.M. is

    11 (11 + I) 2

    II (II + 1) (211 + I) 6

    +11 11+1 =-2-

    X . = L w X = 12 + 22 + ... + n2 1\ LW 1+2+ ... +11

    = " (II + I) (2" + I) 2 6 . Il (11+ 1)

    x I 2 3

    II

    w

    I 2 3

    II

    wX

  • Frequency Distributions And Measures or fentra' Tendency 2)3 2.6. Median. Median of Ii 4fstribution is the value of the variable which

    divides it into two equal pans. Itlis the v'a1ue w~ich exceeds and is exceeded by the same number of observations, i.e., it ,is the value such that the nUl1lber of observations above it is equal to the nomber of observations below it The median is thus a positional average.

    In case of ungrouped data, if the rumber of observations is odd then median is the middle value after the values haye been arranged in ascending or descending order of magnitude. In case of even number of observations, there are two middle terms and median is obtained by taki~g the arithmetic mean of the middle tenns. For example, the median of the valueh5, 20,15,35,18, i.e., 15, 18, 20, 25:~5 is 20 and the median of 8, 20, 50, 25, 15, 30, i.e., of 8, 15, 20, 25, 30, 50' is ! ( 20 -t ~5 ) = 225 . 2

    Remark. In case of even number of observations, in fact any value lying between the two middle values can be taken as m~ian but onventionally we take it to be the mean of the middle tenns.

    In case of discrete frequency distribution median is obtained by considering the cumulative freqoencies. The steps for calculating median are ~iven below:

    (i) Find NI2, wheteN = t fi. (ii) See the (less than) cumulative frequency (cf.) just' greater thail N12. (iii) The corresponding value of x is median. Example 25'. Obtain the median/or the/ollowing/requency distribution: x: 1 2 3 4 5 6 7 8 9 / : 8 10 11 16 20 -25 15 9 6 Solution.

    x / c.f i ' 8 8 2 10 18 3 11 29 4 16 45 5 20 65/ 6 25 90' 7 15 lOS S 9 114 9 6 120

    120

    Hence N = 120 => NI2 = 60 Cumulative frequency (cf.) just greater than Nn, is 65 and the value of x

    corresponding to 65 is 5. Therefore, median is 5. ' In the case of continuous frequency distribution, the class corresponding to the

    cf. just greater than NI2 is called the median class and.the value of median is obtained by the following fonnula : .

  • 214 Fundamentals or Mathell1lltical Statistics

    Median::: 1+ 7 (~ - c ) ... (26) where I -is the lower. limit of the median class,

    / is the frequency of the median class, h is the magnitude of the median class,

    'c' is the cf. of the class preceding the median class, andN= 1:./.

    Derivation or the Median Formula (26). Let us consider the following continuous frequency distribution, ( XI < Xl < ... < x. + I ) :

    Class interval: XI - Xl, Xl - X3, Xi - Xl + I, x.. - x.. + I Frequency : fi /1 fi /. The cumulaltive frequency distribution is givC?" by : Class interval: XI - Xl, Xl - X3, x.. - Xi + I, x.. - x.. + I Frequency: .FI Fl........ Fi F.

    where Fi = fi + fz + ...... + f;. The class x.. - Xl + I is the median class if and only if F1- I < NI2 S Fl.

    Now, if we assume that the variate values are unifonnly distributed over the median-class which implies that the ogive is a straight line in the median-class, tl!en we get from the Fig. 1,

    i.e.

    ar

    RS AC tancl=-=-BS BC

    RT-TS _ AQ-CQ BS - BC

    RT-BP _ AQ-BP BS - PQ

    NI2-Fl_1 Fl-F1 - I or BS = 'PQ

    B ~~-H---C...t

    Fk-1

    _h 0 P Q -h ~h---~~

    T

    where fi is the frequency and h the magnitude of the median class. . BS = 7. (~ - Fl- I ) Hence

    Median= OT = OP + PT = OP + BS

    = 1+ 7. ( ~ - F1- I ) which is the required fonnula.

    Remark. 100 median formula (26) can be used only for continuous classes without any gaps, i.e., for' exclusive type' classification. If we are given a frequency

  • Frequency Distributions And Measures or Central Tendency 215

    distribution in which classes are of 'in.elusive type' with gaps, then it must be converted into a continuous 'exclusive type' frequency distribution without any gaps before applying (26). This will affect the value of I in (26). As an illustration see Example 27.

    Example 26. Find the median wage of the following distribution: Wages (in Rs.) : 20-30 30-40 40-50 50--60 60-70

    No. of labour~rs . : 3 5 20 10 5 [Gorakhpur Univ. B. Sc. 1989]

    Solution. Wages (in Rs.)

    20-30 30-40 40-50 50--60 60-70

    No. of labourers 3 5

    20 10 -5

    cf. 3 8

    28 38 43

    HereN/2= 43/2= 215 . Cumulative frequency just greater than 215 is 28 and the corresponding class is 40-50. Thus median class is 40-50 .. Hence using (26), we get

    Median = 40+ ~~(21.5- 8)= 40+ 675=,4675 Thus median wage is Rs. 4675. Example 27. In afactory employing 3,000 persons, 5 per cent earn less than

    Rs. 3 per hour, 580 earnfro"-, Rs. 301 to Rs. 450 per hour, 30 percent earn from Rs.451 toRs. 600per hour, 500 earnfromRs. 601 toRs. 750per hour, 20 percent earnfrom Rs. 751 to Rs. 900 per hour, and the rest earn Rs. 901 or more per hour. What is the median wage? [Utkal Univ. B.Sc.1992]

    Solution. The' given infonnation can be expressed in tabular fonn as follows. CALCULATIONS FoR MEDIAN. WAGE

    Earnings Percentage No. of Less than Class (in Rs.) of workers -workers (f) cf boundaries

    leIS than 3 5% 5 - x 3000 = ISO 150 Below 3005 -100

    301-150 580 73() 3995-1-505 451~00 30~ 1Q. x 3000 = 900 100 1630 4,505~005 6-01-750 500 2130 6-005-7505 751--9-00 20% 1Q.. x 3000 = 600 100 2730. 7505--9005

    9-01 and above 3000 - 2730 = 270 3000=N 9005 and above

  • 2-16 Fundamentals Of Mathematical Statistics

    N 12 = 1500. 1 ne c f. just greater than 1500 is 1630. The corresponding class 4 51-600, whOse class boundaries are 4505-6005, is the median class. Using the median formula, we get:

    Median = 1+ 7 ( ~ - C }= 4505 + ~~ (1500,.., 730) = 4505 + 1283"" 579

    Hence median wage is Rs. 579. Example 28. An incomplete/requency distribution is given as/ollows'

    Variable Frequency Variable Frequency 1~20 12 50-60 ? 2~30 30 ~70 25 30-40 ? 70-80 18

    4~50 Total 229 Given that the median v{flue is 46, determine the missingfrequencies using the

    median/ormula. [Delhi Univ. B. Sc., Oct. 1992] SolutiOn. Let the frequency of the class 30---40 be /1 and that of 50-60

    bef,.. 1Mn fi-+ b= 229 - (12 + 30 + 65 + 25 + 18)= 79. Since median is given to be 46, the c~ 40-50 is the median class. Hence using median formula (26), y!e get

    46=40+ 1145- (~;+aO+fi) x 10 46 40 - 72.5-fi 10 6- 125 -fi

    - - ~ x or - 6:5 II =125- 39= 335 .. 34

    11==,79-34=45 261. Merits,and Demerits of MediaD Merits. (i) It is rigidly defmed.

    [Since frequency is never fractional] [Since!1 + b = 79 ~

    (ii) It is easily understood and is easy to calculate. In some cases it can be located merely by inspection.

    (iii) It is not at all affected by extreme Yalues. (iv) It can be calculated for distributions with open-end classes. Demerits. (i) In case of even number of observations median cannot be

    determined exactly. We ,nerely esu,JIlate it by taking the mean of two middle terms. (ii) It is not based on all the observations. For example, the median of 10, 25,

    50,60 and' 65 is 50. We can rq>tace the observations 10 and 2S by any two values which are smaller than 50 and lheobservations 60 and 65 by any two valueS gre8ter than 50 without affecting the value of median. This property is sometimes described

  • Frequency Distributions And Measures or Central Tendency 217

    by saying that median is insensitive. (iii) It is not amenable to algebraic treatment. (iv) As compared with mean, it is affected much by fluctuations of sampling. Uses. (i) Median is ~ only average to be used while dealing with qualitative

    data which cannot be measured quantitatively but still can be arranged in ascending or descending order of magnitude, e.g., to fmd the average iI}telligence or average honesty among a group of people.

    (ii) It is 10 beused for determining the typical value in problems concerning wages, distribution of wealth, etc.

    27. Mode. Let us cosider the followi~g statements: (i) The average height of an Indian (male) is 5'-6". (ii) The average size of tile shoes sold in a shop is 7. (iii) An average student in a hostel spends Rs.l50 p.m.

    In all the above cases, the average referred to is mode. Mode is the value which occurs most frequently in a se