SAS - Regression Using JMP









<p>Praise for Regression Using JMP</p> <p>"The examples in the text are very applicable to various disciplines and very easy to understand and follow. Overall, I would highly recommend this text to anyone in need of a manual on regression methods in JMP." Luis Rene Mateo, Research Statistician</p> <p>"I found Regression Using JMP by Freund, Littell, and Creighton to be well written and helpful. The chapter on collinearity is particularly useful, as it explained the use of leverage plots to explore collinearity in a clear, straightforward manner." Rob Gier, Medication Delivery, Baxter Healthcare Corporation</p> <p>"Although a non-statistician, I enjoyed the topics discussed and the almost one-on-one feel in the verbiage. The book expanded my knowledge of JMP software as well as statistical concepts. Longstanding questions regarding the Stepwise Platform were resolved." James Thacker, TRIZ/DOE Engineering Mentor</p> <p>"Regression Using JMP is a well-written book for the intermediate-level statistician who wants to get more from the JMP program. This book offers a better blend of theory and mechanics than most software-oriented texts." Eric Hamann, Principal Biostatistician, Wyeth</p> <p>"I've been using JMP for about one year now. A couple of months ago, I started to use JMP for regression analysis. I wished I had this book when I started. The explanation of variable selection was terrific, and I know I will continue to use this book as a resource when I'm doing future analyses." Sharon Field, new JMP user</p> <p>Regression Using JMP</p> <p>Rudolf Freund, Ramon Littell, Lee Creighton</p> <p>The correct bibliographic citation for this manual is as follows: Freund, Rudolf J., Ramon C. Littell, and Lee Creighton. 2003. Regression Using JMP. Cary, NC: SAS Institute Inc.</p> <p>Regression Using JMP</p> <p>Copyright © 2003 by SAS Institute Inc., Cary, NC, USA. Jointly co-published by SAS Institute and Wiley 2003. SAS Institute Inc. ISBN 1-59047-160-1. John Wiley &amp; Sons, Inc. ISBN 0-471-48307-9. All rights reserved.
Printed in the United States of America. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, or otherwise, without the prior written permission of the publisher, SAS Institute Inc.</p> <p>U.S. Government Restricted Rights Notice: Use, duplication, or disclosure of this software and related documentation by the U.S. government is subject to the Agreement with SAS Institute and the restrictions set forth in FAR 52.227-19, Commercial Computer Software-Restricted Rights (June 1987). SAS Institute Inc., SAS Campus Drive, Cary, North Carolina 27513.</p> <p>1st printing, June 2003</p> <p>SAS Publishing provides a complete selection of books and electronic products to help customers use SAS software to its fullest potential. For more information about our e-books, e-learning products, CDs, and hardcopy books, visit the SAS Publishing Web site or call 1-800-727-3228.</p> <p>SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. IBM and all other International Business Machines Corporation product or service names are registered trademarks or trademarks of International Business Machines Corporation in the USA and other countries. Oracle and all other Oracle Corporation product or service names are registered trademarks of Oracle Corporation in the USA and other countries. Other brand and product names are trademarks of their respective companies.</p> <p>Contents</p> <p>Acknowledgments v
Using This Book vii</p> <p>1 Regression Concepts 1
1.1 What Is Regression? 1
1.2 Statistical Background 13
1.3 Terminology and Notation 15
1.4 Regression with JMP 23</p> <p>2 Regressions in JMP 25
2.1 Introduction 25
2.2 A Model with One Independent Variable 27
2.3 A Model with Several Independent Variables 32
2.4 Additional Results from Fit Model 36
2.5 Further Examination of Model Parameters 50
2.6 Plotting Observations 54
2.7 Predicting to a Different Set of Data 63
2.8 Exact Collinearity: Linear Dependency 67
2.9 Summary 70</p> <p>3 Observations 73
3.1 Introduction 73
3.2 Outlier Detection 74
3.3 Specification Errors 91
3.4 Heterogeneous Variances 96
3.5 Summary 104</p> <p>4 Collinearity: Detection and Remedial Measures 105
4.1 Introduction 105
4.2 Detecting Collinearity 107
4.3 Model Restructuring 115
4.4 Variable Selection 125
4.5 Summary 136</p> <p>5 Polynomial and Smoothing Models 139
5.1 Introduction 139
5.2 Polynomial Models with One Independent Variable 140
5.3 Polynomial Models with Several Variables 151
5.4 Response Surface Plots 156
5.5 A Three-Factor Response Surface Experiment 158
5.6 Smoothing Data 167
5.7 Summary 173</p> <p>6 Special Applications of Linear Models 175
6.1 Introduction 175
6.2 Errors in Both Variables 176
6.3 Multiplicative Models 181
6.4 Spline Models 191
6.5 Indicator Variables 195
6.6 Binary Response Variable: Logistic Regression 203
6.7 Summary 214</p> <p>7 Nonlinear Models 215
7.1 Introduction 215
7.2 Estimating the Exponential Decay Model 216
7.3 Fitting a Growth Curve with the Nonlinear Platform 229
7.4 Summary 236</p> <p>8 Regression with JMP Scripting Language 237
8.1 Introduction 237
8.2 Performing a Simple Regression 237
8.3 Regression Matrices 241
8.4 Collinearity Diagnostics 242
8.5 Summary 246</p> <p>References 247
Index 249</p> <p>Acknowledgments</p> <p>We would like to acknowledge several people at SAS Institute whose efforts have contributed to the completion of this book.
First of all, we are grateful to Jim Goodnight, who originally encouraged us to write the book. John Sall provided insight into the inner workings of JMP that proved invaluable. Special thanks go to Dr. Mark Bailey, who meticulously read each chapter and made many suggestions that greatly enhanced the quality of the finished product. Duane Hayes, from SAS Technical Support, also reviewed the entire book through several stages and provided useful comments, as well as offering ideas for the JSL section of the text. Jim Ashton, Jenny Kendall, Charles Lin, Eddie Routten, Warren Sarle, Mike Stockstill, Tonya Baker, John Sall, Bradley Jones, and Chuck Boiler also reviewed the text in various stages. The work of several persons has influenced our writing. In particular, we acknowledge Walt Harvey of Ohio State University, Ron Hocking of Texas A&amp;M University, Bill Saunders of SAS Institute, Shayle Searle of Cornell University, and Ann Lehman of SAS Institute. Also important in the completion of this book was the SAS production team, including Julie Platt and Donna Faircloth. Finally, we thank the students at Texas A&amp;M University and the University of Florida whose research projects provided the ideas and data for many of the examples.</p> <p>Using This Book</p> <p>Purpose</p> <p>Most statistical analyses are based on linear models, and most analyses of linear models can be performed by two JMP platforms: Fit Y by X and Fit Model. Unlike statistical packages that need different programs for each type of analysis, these procedures provide the power and flexibility for almost all linear model analyses. To use these platforms properly, you should understand the statistics you need for the analysis and know how to instruct JMP to carry out the computations. Regression Using JMP was written to make it easier for you to use JMP in your data analysis problems.
In this book, a wide variety of data is used to illustrate the basic kinds of regression models that can be analyzed with JMP.</p> <p>Audience</p> <p>Regression Using JMP is intended to assist data analysts who use JMP software to perform regression analysis. This book assumes you are familiar with basic JMP concepts, such as entering data into a JMP table, and with standard operating system procedures like manipulating the mouse and accessing menus.</p> <p>Prerequisites</p> <p>Although this book contains some explanation of statistical methods, it is not intended as a text for these methods. The reader should have a working knowledge of the statistical concepts discussed in this book.</p> <p>How to Use This Book</p> <p>The following sections provide an overview of the information contained in this book and how it is organized.</p> <p>Organization</p> <p>Regression Using JMP represents an introduction to regression analysis as performed by JMP. In addition to information about the customary Fit Y by X and Fit Model platforms found in all versions of JMP, this volume contains information about new features and capabilities of JMP Version 5. Here is a summary of the information contained in each chapter.</p> <p>Chapter 1, Regression Concepts</p> <p>Chapter 1 presents an interactive introduction to regression, the terminology and notation used in regression analysis, and an overview of matrix notation.</p> <p>Chapter 2, Regressions in JMP</p> <p>Chapter 2 introduces regression analysis, using both the Fit Y by X and Fit Model platforms with a single independent variable. The statistics in the output are discussed in detail. Next, a regression is performed using the same data but with several independent variables. Confidence limits are discussed in detail. The No Intercept option of the Fit Model dialog, which is controversial, is used and discussed.
This option is also misused in an example to point out the dangers of forcing the regression response to pass through the origin when doing so is unreasonable or unlikely.</p> <p>Chapter 3, Observations</p> <p>Chapter 3 discusses the assumptions behind regression analysis and illustrates how the data analyst assesses violations of assumptions. This chapter discusses outliers, or those observations that do not appear to fit the model; outliers can bias parameter estimates and make your regression analysis less useful. This chapter also discusses ways to identify the influence (or leverage) of specific observations. Studentized residuals are used to identify large residuals. Examples are shown using residuals that are plotted with the Overlay Plot platform. In addition, this chapter discusses ways to detect specification errors and assess the fit of the model, ways to check the distribution of the errors for nonnormality, ways to check for heteroscedasticity (nonconstant variances of errors), and ways to detect correlation between errors.</p> <p>Chapter 4, Collinearity: Detection and Remedial Measures</p> <p>Chapter 4 discusses the existence of collinearity (correlation among several independent variables). It contains measures you can use to detect collinearity and ways to alleviate its effects. Variance inflation factors are used to determine the variables involved. Multivariate techniques are used to study the structure of collinearity. Also presented in this chapter are discussions of principal components regression and variable selection, including stepwise regression techniques.</p> <p>Chapter 5, Polynomial and Smoothing Models</p> <p>Chapter 5 gives examples of linear regression methods used to estimate parameters of models that cannot be described by straight lines. The discussion of polynomial models, in which the dependent variable is related to functions of the powers of one or more independent variables, begins by using one independent variable.
Then examples are given using several variables. Response surface plots are used to illustrate the nature of the estimated response curve.</p> <p>Chapter 6, Special Applications of Linear Models</p> <p>Chapter 6 covers some special applications of the linear model, including orthogonal regression, log-linear (multiplicative) models, spline functions with known knots, and the use of indicator variables.</p> <p>Chapter 7, Nonlinear Models</p> <p>Chapter 7 discusses special relationships that cannot be addressed by linear models or adaptations of linear models. Topics include the Nonlinear platform and fitting nonlinear growth curves.</p> <p>Chapter 8, Regression with JMP Scripting Language</p> <p>Chapter 8 provides examples of how JMP can be customized to extend its built-in regression methods. Sample code is provided for doing simple regressions, displaying regression matrices, and performing collinearity diagnostics.</p> <p>Conventions</p> <p>This book uses a number of conventions, including typographical conventions, syntax conventions, and conventions for presenting output.</p> <p>Typographical Conventions</p> <p>This book uses several type styles. The following list summarizes style conventions:</p> <p>bold roman is used in headings, in text to indicate very important points, and in formulas to indicate matrices and vectors.</p> <p>bold Helvetica is used to refer to menu items and other control elements of JMP.</p> <p>monospace is used to show examples of programming code.</p> <p>In addition, steps that are intended to be carried out on the computer are preceded by a mouse symbol, like the one shown here.</p> <p>Conventions for Output</p> <p>All JMP analyses in the book were run on JMP Version 5.0 under the Macintosh OS X operating system. In most cases, generic output was produced (without operating system-specific title bars and menus).
Because of differences in software versions, size options, and graphics devices, your graphics or numerical output may not be identical to what appears in this book, especially for the screen shots involving dialog boxes.</p> <p>Chapter 1</p> <p>Regression Concepts</p> <p>1.1 What Is Regression? 1
1.1.1 Seeing Regression 1
1.1.2 A Physical Model of Regression 7
1.1.3 Multiple Linear Regression 9
1.1.4 Correlation 9
1.1.5 Seeing Correlation 11
1.2 Statistical Background 13
1.3 Terminology and Notation 15
1.3.1 Partitioning the Sums of Squares 16
1.3.2 Hypothesis Testing 18
1.3.3 Using the Generalized Inverse 21
1.4 Regression with JMP 23</p> <p>1.1 What Is Regression?</p> <p>Multiple linear regression is a means to express the idea that a response variable, y, varies with a set of independent variables, x1, x2, ..., xm. The variability that y exhibits has two components: a systematic part and a random part. The systematic variation of y is modeled as a function of the x variables. This model relating y to x1, x2, ..., xm is called the regression equation. The random part takes into account the fact that the model does not exactly describe the behavior of the response.</p> <p>1.1.1 Seeing Regression</p> <p>It is easiest to see the process of regression by looking at a single x variable. In this example, the independent (x) variable is a person's height and the dependent (y) variable is a person's weight. The data are shown in Figure 1.1 and stored in the Small data file. First, we explore the variables to get a sense of their distribution. Then, we illustrate regression by fitting a model of height to weight.</p> <p>Figure 1.1 Small Data Set</p> <p>An excellent first step in any data analysis (one that is not expressly stated in each analysis of this book) is to examine the distribution of the x and y variables. Note that this step is not required in a regression analysis, but nevertheless is good practice.
To duplicate the di...</p>
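The height-to-weight fit described above is carried out in JMP, but the underlying least-squares arithmetic is simple enough to sketch directly. The following Python sketch uses hypothetical height and weight values (the book's Small data set is not reproduced here) to mirror both recommended steps: examining each variable's distribution, then fitting the simple linear model weight = b0 + b1*height + error.

```python
# A minimal sketch of the regression described above, using made-up data.
# In JMP itself these steps correspond to the Distribution platform
# followed by Fit Y by X; this code only mirrors the arithmetic.
from statistics import mean, stdev

height = [63, 65, 66, 68, 69, 70, 72, 74]   # inches (hypothetical)
weight = [120, 135, 140, 156, 160, 167, 185, 196]  # pounds (hypothetical)

# First step, as the text recommends: examine each variable's distribution.
for name, v in (("height", height), ("weight", weight)):
    print(f"{name}: mean={mean(v):.1f}, sd={stdev(v):.1f}")

# Least-squares estimates for the model  weight = b0 + b1*height + error.
hbar, wbar = mean(height), mean(weight)
sxy = sum((h - hbar) * (w - wbar) for h, w in zip(height, weight))
sxx = sum((h - hbar) ** 2 for h in height)
b1 = sxy / sxx            # slope: estimated change in weight per inch
b0 = wbar - b1 * hbar     # intercept
print(f"fitted line: weight = {b0:.1f} + {b1:.2f} * height")
```

The positive slope captures the systematic part of the model (taller people tend to weigh more); the scatter of individual points around the fitted line is the random part.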

