discrete multivariate analysis

30
Discrete Multivariate Analysis

Upload: jcombari

Post on 02-May-2017

238 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Discrete Multivariate Analysis

Discrete Multivariate Analysis

Page 2: Discrete Multivariate Analysis

Yvonne M. M. Bishop, Stephen E. Fienberg, and Paul W. Holland with the collaboration of

Richard J. Light and Frederick Mosteller

Discrete Multivariate AnalysisTheory and Practice

Page 3: Discrete Multivariate Analysis

Yvonne M. BishopWashington, DC [email protected]

Paul W. HollandEducational Testing ServicePrinceton, NJ [email protected]

Stephen E. FienbergDepartment of statisticsCarnegie-Mellon UniversityPittsburgh, PA [email protected]

ISBN 978-0-387-72805-6

Library of Congress Control Number: 2007928365

© 2007 Springer Science+Business Media, LLC. This Springer edition is a reprint of the 1975 edition published by MIT Press.All rights reserved. This work may not be translated or copied in whole or in part without the written permission of the publisher (Springer Science+Business Media, LLC, 233 Spring Street, New York, NY 10013, USA), except for brief excerpts in connection with reviews or scholarly analysis. Use in connection with any form of information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed is forbidden.The use in this publication of trade names, trademarks, service marks, and similar terms, even if they are not identifi ed as such, is not to be taken as an expression of opinion as to whether or not they are subject to proprietary rights.

Printed on acid-free paper.

Printed in the United States of America.

9 8 7 6 5 4 3 2 1

springer.com

Page 4: Discrete Multivariate Analysis
Page 5: Discrete Multivariate Analysis
Page 6: Discrete Multivariate Analysis

CONTENTS

PREFACE ..................................................................................................... v

1 INTRODUCTION ................................................................................ 11.1 The Need ........................................................................................ 11.2 Why a Book? .................................................................................. 11.3 Different Users ............................................................................... 21.4 Sketch of the Chapters ................................................................... 21.5 Computer Programs ....................................................................... 71.6 How to Proceed from Here ............................................................. 7

2 STRUCTURAL MODELS FOR COUNTED DATA .......................... 92.1 Introduction ................................................................................... 92.2 Two Dimensions—The Fourfold Table ........................................... 112.3 Two Dimensions—The Rectangular Table...................................... 242.4 Models for Three-Dimensional Arrays ........................................... 312.5 Models for Four or More Dimensions ............................................ 422.6 Exercises ......................................................................................... 482.7 Appendix: The Geometry of a 2 x 2 Table ...................................... 49

3 MAXIMUM LIKELIHOOD ESTIMATES FOR COMPLETE TABLES .......................................................................... 573.1 Introduction ................................................................................... 573.2 Sampling Distributions .................................................................. 623.3 Suffi cient Statistics ......................................................................... 643.4 Methods of Obtaining Maximum Likelihood Estimates ................ 733.5 Iterative Proportional Fitting of Log-Linear Models ..................... 833.6 Classical Uses of Iterative Proportional Fitting.............................. 973.7 Rearranging Data for Model Fitting .............................................. 1023.8 Degrees of Freedom ....................................................................... 114

4 FORMAL GOODNESS OF FIT: SUMMARY STATISTICS AND MODEL SELECTION ................................................................ 1234.1 Introduction ................................................................................... 1234.2 Summary Measures of Goodness of Fit ......................................... 1244.3 Standardized Rates ......................................................................... 1314.4 Internal Goodness of Fit ................................................................ 1364.5 Choosing a Model .......................................................................... 1554.6 Appendix: Goodman’s Partitioning Calculus ................................. 169

5 MAXIMUM LIKELIHOOD ESTIMATION FOR INCOMPLETE TABLES ...................................................................... 1775.1 Introduction ................................................................................... 1775.2 Incomplete Two-Way Tables ........................................................... 1785.3 Incomplete Two-Way Tables for Subsets of Complete Arrays ........ 2065.4 Incomplete Multiway Tables ........................................................... 2105.5 Representation of Two-Way Tables as Incomplete Multiway Arrays ...................................................... 225

Page 7: Discrete Multivariate Analysis

6 ESTIMATING THE SIZE OF A CLOSED POPULATION ................ 2296.1 Introduction ................................................................................... 2296.2 The Two-Sample Capture-Recapture Problem ................................ 2316.3 Conditional Maximum Likelihood Estimation of N ...................... 2366.4 The Three-Sample Census .............................................................. 2376.5 The General Multiple Recapture Problem ...................................... 2466.6 Discussion ...................................................................................... 254

7 MODELS FOR MEASURING CHANGE .......................................... 2577.1 Introduction ................................................................................... 2577.2 First-Order Markov Models ........................................................... 2617.3 Higher-Order Markov Models ........................................................ 2677.4 Markov Models with a Single Sequence of Transitions .................. 2707.5 Other Models ................................................................................. 273

8 ANALYSIS OF SQUARE TABLES: SYMMETRY AND MARGINAL HOMOGENEITY .......................................................... 2818.1 Introduction ................................................................................... 2818.2 Two-Dimensional Tables ................................................................ 2828.3 Three-Dimensional Tables .............................................................. 2998.4 Summary ........................................................................................ 309

9 MODEL SELECTION AND ASSESSING CLOSENESS OF FIT: PRACTICAL ASPECTS ......................................................... 3119.1 Introduction ................................................................................... 3119.2 Simplicity in Model Building .......................................................... 3129.3 Searching for Sampling Models ...................................................... 3159.4 Fitting and Testing Using the Same Data ....................................... 3179.5 Too Good a Fit .............................................................................. 3249.6 Large Sample Sizes and Chi Square When the Null Model is False ................................................................................ 3299.7 Data Anomalies and Suppressing Parameters ................................ 3329.8 Frequency of Frequencies Distribution .......................................... 337

10 OTHER METHODS FOR ESTIMATION AND TESTING IN CROSS-CLASSIFICATIONS .......................................................... 34310.1 Introduction ................................................................................... 34310.2 The Information-Theoretic Approach ............................................ 34410.3 Minimizing Chi Square, Modifi ed Chi Square, and Logit Chi Square ..................................................................... 34810.4 The Logistic Model and How to Use It .......................................... 35710.5 Testing via Partitioning of Chi Square ........................................... 36110.6 Exact Theory for Tests Based on Conditional Distributions .............................................................. 36410.7 Analyses Based on Transformed Proportions ................................. 36610.8 Necessary Developments ................................................................ 371

11 MEASURES OF ASSOCIATION AND AGREEMENT ..................... 37311.1 Introduction ................................................................................... 37311.2 Measures of Association for 2 x 2 Tables ........................................ 37611.3 Measures of Association for I x J Tables ........................................ 38511.4 Agreement as a Special Case of Association................................... 393

Page 8: Discrete Multivariate Analysis

12 PSEUDO-BAYES ESTIMATES OF CELL PROBABILITIES ....................................................................... 40112.1 Introduction ................................................................................... 40112.2 Bayes and Pseudo-Bayes Estimators ............................................... 40412.3 Asymptotic Results for Pseudo-Bayes Estimators........................... 41012.4 Small-Sample Results ..................................................................... 41612.5 Data-Dependent λ’s ........................................................................ 41912.6 Another Example: Two Social Mobility Tables .............................. 42612.7 Recent Results and Some Advice .................................................... 429

13 SAMPLING MODELS FOR DISCRETE DATA ................................ 43513.1 Introduction ................................................................................... 43513.2 The Binomial Distribution ............................................................. 43513.3 The Poisson Distribution ................................................................ 43813.4 The Multinomial Distribution ........................................................ 44113.5 The Hypergeometric Distribution .................................................. 44813.6 The Multivariate Hypergeometric Distribution .............................. 45013.7 The Negative Binomial Distribution............................................... 45213.8 The Negative Multinomial Distribution ......................................... 454

14 ASYMPTOTIC METHODS ................................................................. 45714.1 Introduction ................................................................................... 45714.2 The O, o Notation .......................................................................... 45814.3 Convergence of Stochastic Sequences............................................. 46314.4 The Op, op Notation for Stochastic Sequences ................................. 47514.5 Convergence of Moments............................................................... 48414.6 The d Method for Calculating Asymptotic Distributions ............................................................... 48614.7 General Framework for Multinomial Estimation and Testing ................................................................... 50214.8 Asymptotic Behavior of Multinomial Maximum Likelihood Estimators ................................................... 50914.9 Asymptotic Distribution of Multinomial Goodness-of-Fit Tests .................................................................... 513

REFERENCES ............................................................................................. 531

INDEX TO DATA SETS .............................................................................. 543

AUTHOR INDEX ........................................................................................ 547

SUBJECT INDEX ........................................................................................ 551

Page 9: Discrete Multivariate Analysis
Page 10: Discrete Multivariate Analysis
Page 11: Discrete Multivariate Analysis
Page 12: Discrete Multivariate Analysis
Page 13: Discrete Multivariate Analysis
Page 14: Discrete Multivariate Analysis
Page 15: Discrete Multivariate Analysis
Page 16: Discrete Multivariate Analysis
Page 17: Discrete Multivariate Analysis
Page 18: Discrete Multivariate Analysis
Page 19: Discrete Multivariate Analysis
Page 20: Discrete Multivariate Analysis
Page 21: Discrete Multivariate Analysis
Page 22: Discrete Multivariate Analysis
Page 23: Discrete Multivariate Analysis
Page 24: Discrete Multivariate Analysis
Page 25: Discrete Multivariate Analysis
Page 26: Discrete Multivariate Analysis
Page 27: Discrete Multivariate Analysis
Page 28: Discrete Multivariate Analysis
Page 29: Discrete Multivariate Analysis
Page 30: Discrete Multivariate Analysis