# A Handbook of Statistical Analyses using SAS, Second Edition

Post on 24-Oct-2014


A Handbook of Statistical Analyses using SAS
SECOND EDITION

Geoff Der
Statistician, MRC Social and Public Health Sciences Unit, University of Glasgow, Glasgow, Scotland

and

Brian S. Everitt
Professor of Statistics in Behavioural Science, Institute of Psychiatry, University of London, London, U.K.

CHAPMAN & HALL/CRC
Boca Raton London New York Washington, D.C.

Library of Congress Cataloging-in-Publication Data

Catalog record is available from the Library of Congress

This book contains information obtained from authentic and highly regarded sources. Reprinted material is quoted with permission, and sources are indicated. A wide variety of references are listed. Reasonable efforts have been made to publish reliable data and information, but the author and the publisher cannot assume responsibility for the validity of all materials or for the consequences of their use.

Neither this book nor any part may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, microfilming, and recording, or by any information storage or retrieval system, without prior permission in writing from the publisher.

The consent of CRC Press LLC does not extend to copying for general distribution, for promotion, for creating new works, or for resale. Specific permission must be obtained in writing from CRC Press LLC for such copying. Direct all inquiries to CRC Press LLC, 2000 N.W. Corporate Blvd., Boca Raton, Florida 33431.

Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used only for identification and explanation, without intent to infringe.

Visit the CRC Press Web site at www.crcpress.com

© 2002 by Chapman & Hall/CRC
No claim to original U.S. Government works
International Standard Book Number 1-5848-8245-X
Printed in the United States of America 1 2 3 4 5 6 7 8 9 0
Printed on acid-free paper

Preface

SAS, standing for Statistical Analysis System, is a powerful software package for the manipulation and statistical analysis of data. The system is extensively documented in a series of manuals. In the first edition of this book we estimated that the relevant manuals ran to some 10,000 pages, but one reviewer described this as a considerable underestimate. Despite the quality of the manuals, their very bulk can be intimidating for potential users, especially those relatively new to SAS. For readers of this edition, there is some good news: the entire documentation for SAS has been condensed into one slim volume (a Web browseable CD-ROM). The bad news, of course, is that you need a reasonable degree of acquaintance with SAS before this becomes very useful.

Here our aim has been to give a brief and straightforward description of how to conduct a range of statistical analyses using the latest version of SAS, version 8.1. We hope the book will provide students and researchers with a self-contained means of using SAS to analyse their data, and that it will also serve as a stepping stone to using the printed manuals and online documentation.

Many of the data sets used in the text are taken from A Handbook of Small Data Sets (referred to in the text as SDS) by Hand et al., also published by Chapman and Hall/CRC. The examples and datasets are available online at: http://www.sas.com/service/library/onlinedoc/code.samples.html.

We are extremely grateful to Ms. Harriet Meteyard for her usual excellent word processing and overall support during the preparation and writing of this book.

Geoff Der
Brian S. Everitt

2002 CRC Press LLC

Contents

1 A Brief Introduction to SAS
1.1 Introduction
1.2 The Microsoft Windows User Interface
1.2.1 The Editor Window
1.2.2 The Log and Output Windows
1.2.3 Other Menus
1.3 The SAS Language
1.3.1 All SAS Statements Must End with a Semicolon
1.3.2 Program Steps
1.3.3 Variable Names and Data Set Names
1.3.4 Variable Lists
1.4 The Data Step
1.4.1 Creating SAS Data Sets from Raw Data
1.4.2 The Data Statement
1.4.3 The Infile Statement
1.4.4 The Input Statement
1.4.5 Reading Data from an Existing SAS Data Set
1.4.6 Storing SAS Data Sets on Disk
1.5 Modifying SAS Data
1.5.1 Creating and Modifying Variables
1.5.2 Deleting Variables
1.5.3 Deleting Observations
1.5.4 Subsetting Data Sets
1.5.5 Concatenating and Merging Data Sets
1.5.6 Merging Data Sets: Adding Variables
1.5.7 The Operation of the Data Step
1.6 The proc Step
1.6.1 The proc Statement
1.6.2 The var Statement
1.6.3 The where Statement
1.6.4 The by Statement
1.6.5 The class Statement
1.7 Global Statements
1.8 ODS: The Output Delivery System
1.9 SAS Graphics
1.9.1 Proc gplot
1.9.2 Overlaid Graphs
1.9.3 Viewing and Printing Graphics
1.10 Some Tips for Preventing and Correcting Errors

2 Data Description and Simple Inference: Mortality and Water Hardness in the U.K.
2.1 Description of Data
2.2 Methods of Analysis
2.3 Analysis Using SAS
Exercises

3 Simple Inference for Categorical Data: From Sandflies to Organic Particulates in the Air
3.1 Description of Data
3.2 Methods of Analysis
3.3 Analysis Using SAS
3.3.1 Cross-Classifying Raw Data
3.3.2 Sandflies
3.3.3 Acacia Ants
3.3.4 Piston Rings
3.3.5 Oral Contraceptives
3.3.6 Oral Cancers
3.3.7 Particulates and Bronchitis
Exercises

4 Multiple Regression: Determinants of Crime Rate in the United States
4.1 Description of Data
4.2 The Multiple Regression Model
4.3 Analysis Using SAS
Exercises

5 Analysis of Variance I: Treating Hypertension
5.1 Description of Data
5.2 Analysis of Variance Model
5.3 Analysis Using SAS
Exercises

6 Analysis of Variance II: School Attendance Amongst Australian Children
6.1 Description of Data
6.2 Analysis of Variance Model
6.2.1 Type I Sums of Squares
6.2.2 Type III Sums of Squares
6.3 Analysis Using SAS
Exercises

7 Analysis of Variance of Repeated Measures: Visual Acuity
7.1 Description of Data
7.2 Repeated Measures Data
7.3 Analysis of Variance for Repeated Measures Designs
7.4 Analysis Using SAS
Exercises

8 Logistic Regression: Psychiatric Screening, Plasma Proteins, and Danish Do-It-Yourself
8.1 Description of Data
8.2 The Logistic Regression Model
8.3 Analysis Using SAS
8.3.1 GHQ Data
8.3.2 ESR and Plasma Levels
8.3.3 Danish Do-It-Yourself
Exercises

9 Generalised Linear Models: School Attendance Amongst Australian School Children
9.1 Description of Data
9.2 Generalised Linear Models
9.2.1 Model Selection and Measure of Fit
9.3 Analysis Using SAS
Exercises

10 Longitudinal Data I: The Treatment of Postnatal Depression
10.1 Description of Data
10.2 The Analyses of Longitudinal Data
10.3 Analysis Using SAS
10.3.1 Graphical Displays
10.3.2 Response Feature Analysis
Exercises

11 Longitudinal Data II: The Treatment of Alzheimer's Disease
11.1 Description of Data
11.2 Random Effects Models
11.3 Analysis Using SAS
Exercises

12 Survival Analysis: Gastric Cancer and Methadone Treatment of Heroin Addicts
12.1 Description of Data
12.2 Describing Survival and Cox's Regression Model
12.2.1 Survival Function
12.2.2 Hazard Function
12.2.3 Cox's Regression
12.3 Analysis Using SAS
12.3.1 Gastric Cancer
12.3.2 Methadone Treatment of Heroin Addicts
Exercises

13 Principal Components Analysis and Factor Analysis: The Olympic Decathlon and Statements about Pain
13.1 Description of Data
13.2 Principal Components and Factor Analyses
13.2.1 Principal Components Analysis
13.2.2 Factor Analysis
13.2.3 Factor Analysis and Principal Components Compared
13.3 Analysis Using SAS
13.3.1 Olympic Decathlon
13.3.2 Statements about Pain
Exercises

14 Cluster Analysis: Air Pollution in the U.S.A.
14.1 Description of Data
14.2 Cluster Analysis
14.3 Analysis Using SAS
Exercises

15 Discriminant Function Analysis: Classifying Tibetan Skulls
15.1 Description of Data
15.2 Discriminant Function Analysis
15.3 Analysis Using SAS
Exercises

16 Correspondence Analysis: Smoking and Motherhood, Sex and the Single Girl, and European Stereotypes
16.1 Description of Data
16.2 Displaying Contingency Table Data Graphically Using Correspondence Analysis
16.3 Analysis Using SAS
16.3.1 Boyfriends
16.3.2 Smoking and Motherhood
16.3.3 Are the Germans Really Arrogant?
Exercises

Appendix A: SAS Macro to Produce Scatterplot Matrices
Appendix B: Answers to Selected Chapter Exercises
References


Chapter 1

A Brief Introduction to SAS

1.1 Introduction

The SAS system is an integrated set of modules for manipulating, analysing, and presenting data. There is a large range of modules that can be added to the basic system, known as BASE SAS. Here we concentrate on the STAT and GRAPH modules in addition to the main features of the base SAS system.

At the heart of SAS is a programming language composed of statements that specify how data are to be processed and analysed. The statements correspond to operations to be performed on the data or instructions about the analysis. A SAS program consists of a sequence of SAS statements grouped together into blocks, referred to as steps. These fall into two types: data steps and procedure (proc) steps. A data step is used to prepare data for analysis. It creates a SAS data set and may reorganise the data and modify it in the process. A proc step is used to perform a particular type of analysis, or statistical test, on the data in a SAS data set. A typical program might comprise a data step to read in some raw data followed by a series of proc steps analysing that data. If, in the course of the analysis, the data need to be modified, a second data step would be used to do this.

The SAS system is available for a wide range of different computers and operating systems and the way in which SAS programs are entered and run differs somewhat according to the computing environment. We describe the
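
The two kinds of step described in the introduction, a data step followed by a proc step, can be sketched as follows. This is a minimal, hypothetical illustration and not an example taken from the book: the data set name `heights`, the variable names, and the values are all invented, and `proc means` is used simply as one familiar procedure.

```sas
/* Hypothetical example: the data set name, variables, and values
   are invented for illustration only. */

/* Data step: creates the SAS data set "heights" from raw data
   and modifies it by deriving a new variable. */
data heights;
    input name $ height;        /* one character and one numeric variable */
    height_m = height / 100;    /* derived variable: height in metres */
    datalines;
Ann 162
Bob 178
Cal 171
;
run;

/* Proc step: performs an analysis (summary statistics) on the data set. */
proc means data=heights;
    var height height_m;
run;
```

Each statement ends with a semicolon, and each step is closed with a `run;` statement. Further proc steps analysing the same data set could follow the first one.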
