Introduction to SAS

BIO 226 – Spring 2011

Slide 2

Outline
• Windows and common rules (Slides 3-7)
• Getting the data (Slides 8-10)
  – The PRINT and CONTENTS Procedures (Slide 9)
• Basic SAS procedures
  – The SORT Procedure (Slide 13)
  – The MEANS Procedure (Slides 14-15)
  – The UNIVARIATE Procedure (Slide 15)
  – The FREQ Procedure (Slide 16)
  – The CORR Procedure (Slide 16)
  – The PLOT Procedure (Slide 17)
• Manipulating the data, e.g., creating new variables (Slides 11-12)
• Libraries (Slide 18)
• Output in Word document (Slide 19)
• References (Slide 20)
• Practice (Slides 21-22)

Slide 3

The different SAS windows

• Explorer: contains SAS files and libraries

• Editor: where you can open or type SAS programs

• Log: stores details about your SAS session (code submitted, data sets created, notes, warnings and errors)

• Results: table of contents for output of programs

• Output: printed results of SAS programs

Slide 4

Basic SAS rules (1)

• Variable names must:
  – be 1 to 32 characters long
  – begin with a letter (A-Z) or an underscore (_)
  – continue with any combination of numbers, letters or underscores

• A variable’s type is either character or numeric

• Missing values:
  – missing character data is left blank
  – missing numeric data is denoted by a period (.)
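A minimal sketch of these rules, using a made-up data set named demo with hypothetical values: _id and weight_lbs are valid names, a period in the data gives a missing numeric value, and (with list input) a single period for a character variable is read as a blank.

```sas
/* Hypothetical data set: variable names follow the rules above */
DATA demo;
  INPUT _id weight_lbs name $;   /* $ marks name as a character variable */
  DATALINES;
1 150 Ann
2 . Bob
3 172 .
;
RUN;

/* weight_lbs prints as . for _id 2; name prints as blank for _id 3 */
PROC PRINT DATA=demo;
RUN;
```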

Slide 5

Basic SAS rules (2)

• Two ways to make comments:
  – * write comment here;
  – /* write comment here */

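• SAS is not case-sensitive for keywords and names: DATA, data and Data are the same, and so are Weight and WEIGHT. Character values inside quotes are case-sensitive, however ('Ann' is not 'ann').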
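The sketch below shows both comment styles and what case-insensitivity does and does not cover. The data set demo_case and its values are hypothetical.

```sas
* A statement comment runs up to the next semicolon;
/* A block comment; it can contain semicolons and span lines */
DATA demo_case;
  Weight = 150;                 /* created as Weight ...                      */
  weight_kg = WEIGHT * 0.4536;  /* ... and read back as WEIGHT: same variable */
  name = 'Ann';
  same = (name = 'ann');        /* 0: text inside quotes is case-sensitive    */
RUN;
```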

Slide 6

Basic programming rules (1)

• SAS programs are composed of statements, organized in DATA steps and PROC steps:
  – DATA step: gives the data set a name and creates or manipulates it
  – PROC step: the procedure or analysis you want SAS to carry out

• Every SAS statement ends with a semicolon (;). SAS uses the semicolon, not the end of the line, to find where a statement stops.

• End each DATA step and PROC step with RUN;

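• Quotes can be single or double, but the opening and closing quotes must match and must be straight quotes (' or "), not curly ones.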

Slide 7

Basic programming rules (2)

• SAS statements are free-format:
  – they can begin and end in any column
  – one statement can continue over several lines
  – several statements can be on one line

• To submit a program, highlight the code to run and click the Submit button (the running-figure icon).
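Because SAS looks for semicolons rather than line ends, the two hypothetical DATA steps below produce the same result; the second layout is legal but harder to read.

```sas
/* Readable layout: one statement per line */
DATA layout1;
  x = 1;
  y = x + 2;
RUN;

/* Same statements: two on one line, and one split over two lines */
DATA layout2; x = 1; y = x
  + 2; RUN;
```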

Slide 8

Loading data

• If you have the SAS data set (sasintro.sas7bdat), you can double-click on it and it will load itself.

• If you don't have a SAS data set but a text file (sasintro.txt) whose first row contains the variable names, you can import it using File > Import Data… and specify the directory.

• Or you can use the following code:

    DATA mydata;
      INFILE 'g:\shared\bio226\sasintro.txt';
      INPUT weight bmi id age activity education smoking;
    RUN;

• Setting your current directory: the bottom line of the main SAS window shows the current directory, which you should see set to C:\WINDOWS\system32. Double-click on it to change it.
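If sasintro.txt does begin with a row of variable names, the INFILE code on this slide would try to read that row as data. The FIRSTOBS=2 option of the INFILE statement skips it. The sketch below tries the option on a small in-stream data set with hypothetical values; for the course file, the assumption is that adding FIRSTOBS=2 after the file name in the INFILE statement is all that is needed.

```sas
/* Hypothetical in-stream data whose first line is a header row */
DATA demo_header;
  INFILE DATALINES FIRSTOBS=2;   /* start reading at line 2: skip the header */
  INPUT weight bmi id;
  DATALINES;
weight bmi id
150 22.1 1
172 24.9 2
;
RUN;
```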

Slide 9

How to view the loaded data?

• Go to the Explorer window and double-click on Libraries, then Work, then the data set (sasintro.sas7bdat, or mydata if you loaded it with the code on slide 8)

• To view general information about the data set, such as variable names and types:

    PROC CONTENTS DATA=mydata;
    RUN;

• Use the PRINT procedure to view the first 25 records:

    PROC PRINT DATA=mydata (OBS=25);
    RUN;
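PROC PRINT can also be restricted to a range of rows and a few columns. A sketch assuming mydata was created as on slide 8: the FIRSTOBS= and OBS= data set options pick rows 10 to 15, and the VAR statement picks the columns.

```sas
/* Rows 10 through 15 of mydata, showing only id, age and weight */
PROC PRINT DATA=mydata (FIRSTOBS=10 OBS=15);
  VAR id age weight;
RUN;
```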

Slide 10

Variables from sasintro.txt

#   Variable    Type   Unit
5   activity    Num    kcal/week
4   age         Num    years
2   bmi         Num    kg/m2
6   education   Num    years
3   id          Num
7   smoking     Num    1: current smoker, 0: non-smoker
1   weight      Num    lbs

Slide 11

Manipulating data (1)

• selecting a subset of rows

    DATA mydata_s;
      SET mydata;
      IF smoking=1;   /* keep only current smokers */
    RUN;

• deleting a column (or columns)

    DATA mydata2;
      SET mydata;
      DROP weight education;
    RUN;
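Two alternatives, sketched under the same assumption that mydata exists: KEEP lists the columns to retain instead of those to drop, and a WHERE statement selects rows as they are read in (unlike the subsetting IF, it can only test variables already in the input data set). The names mydata_k and mydata_w are hypothetical.

```sas
/* Keep only three columns instead of dropping the others */
DATA mydata_k;
  SET mydata;
  KEEP id smoking activity;
RUN;

/* Same rows as mydata_s, selected with WHERE instead of IF */
DATA mydata_w;
  SET mydata;
  WHERE smoking = 1;
RUN;
```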

Slide 12

Manipulating data (2)

• adding a column (or columns)

    DATA mydata3;
      SET mydata;
      weight_kg = weight*0.4536;   /* 1 lb = 0.4536 kg */
      IF age <= 60 THEN agegroup=1;
      ELSE IF age <= 70 THEN agegroup=2;
      ELSE agegroup=3;
      /*drop age;*/
    RUN;
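One pitfall with this kind of recoding: SAS treats a missing numeric value as smaller than any number, so a missing age satisfies age <= 60 and lands in agegroup 1. The sketch below, on hypothetical ages, shows the effect and one way to guard against it with the MISSING function.

```sas
DATA demo_age;
  INPUT age;
  /* Naive recode: a missing age (.) counts as <= 60 */
  IF age <= 60 THEN group_naive = 1;
  ELSE IF age <= 70 THEN group_naive = 2;
  ELSE group_naive = 3;
  /* Guarded recode: leave the group missing when age is missing */
  IF MISSING(age) THEN group_safe = .;
  ELSE IF age <= 60 THEN group_safe = 1;
  ELSE IF age <= 70 THEN group_safe = 2;
  ELSE group_safe = 3;
  DATALINES;
55
.
72
;
RUN;
```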

Slide 13

Sorting data

    PROC SORT DATA=mydata OUT=mydata4;
      BY ID age weight;
    RUN;

    PROC PRINT DATA=mydata (OBS=5);    /* first 5 rows before sorting */
    RUN;
    PROC PRINT DATA=mydata4 (OBS=5);   /* first 5 rows after sorting */
    RUN;
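Each BY variable is sorted in ascending order unless it is preceded by the DESCENDING keyword. A sketch assuming mydata exists; mydata_desc is a hypothetical name.

```sas
/* Oldest first; within the same age, lightest first */
PROC SORT DATA=mydata OUT=mydata_desc;
  BY DESCENDING age weight;
RUN;

PROC PRINT DATA=mydata_desc (OBS=5);
RUN;
```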

Slide 14

Summarizing data (1)

• Summarizing weight:

    PROC MEANS DATA=mydata;
      VAR weight;
    RUN;

• Summarizing weight in the youngest age group (agegroup=1, created in mydata3):

    PROC MEANS DATA=mydata3;
      VAR weight;
      WHERE agegroup=1;
    RUN;
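By default PROC MEANS reports N, mean, standard deviation, minimum and maximum. Statistic keywords on the PROC statement choose other statistics, and MAXDEC= sets the number of decimals shown. A sketch assuming mydata exists:

```sas
/* Chosen statistics for weight and bmi, shown with 2 decimals */
PROC MEANS DATA=mydata N NMISS MEAN STD MEDIAN Q1 Q3 MAXDEC=2;
  VAR weight bmi;
RUN;
```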

Slide 15

Summarizing data (2)

• Summarizing weight by smoking status (two equivalent approaches):

    PROC SORT DATA=mydata OUT=mydata5;
      BY smoking;
    RUN;
    PROC MEANS DATA=mydata5;   /* BY requires data sorted by smoking */
      VAR weight;
      BY smoking;
    RUN;

    PROC MEANS DATA=mydata;    /* CLASS needs no prior sorting */
      CLASS smoking;
      VAR weight;
    RUN;

• PROC UNIVARIATE also gives all these summary measures, along with more detailed ones such as quantiles and extreme observations.
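For example, a sketch of the PROC UNIVARIATE counterpart of the CLASS version above, assuming mydata exists: the CLASS statement gives separate statistics for smokers and non-smokers without sorting first.

```sas
PROC UNIVARIATE DATA=mydata;
  CLASS smoking;   /* one set of results per smoking status */
  VAR weight;
RUN;
```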

Slide 16

Categorical data and correlation

• Summarizing categorical data

    PROC FREQ DATA=mydata3;
      TABLES smoking*agegroup / chisq exact;   /* chi-square and Fisher's exact tests */
    RUN;

• Examining correlation

    PROC CORR DATA=mydata;
      VAR weight;
      WITH bmi age;
    RUN;
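Two common variations, sketched assuming mydata and mydata3 exist: listing variables in TABLES without an asterisk gives a separate one-way frequency table for each, and the SPEARMAN option asks PROC CORR for rank-based correlations instead of the default Pearson ones.

```sas
/* One-way tables: counts and percentages for each variable separately */
PROC FREQ DATA=mydata3;
  TABLES smoking agegroup;
RUN;

/* Spearman rank correlations of weight with bmi and age */
PROC CORR DATA=mydata SPEARMAN;
  VAR weight;
  WITH bmi age;
RUN;
```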

Slide 17

Basic procedures: plots

• Bar charts

    PROC CHART DATA=mydata3;
      VBAR agegroup / DISCRETE;
    RUN;

• Scatter plot

    PROC PLOT DATA=mydata3;
      PLOT bmi*weight='*';   /* bmi on the vertical axis, weight on the horizontal */
    RUN;

• Stem-and-leaf plot (or horizontal bar chart), box plot and normal probability plot

    PROC UNIVARIATE DATA=mydata3 PLOT;
      VAR weight;
    RUN;
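The PLOT option gives line-printer style graphics. For a true histogram, PROC UNIVARIATE also has a HISTOGRAM statement, and its NORMAL option overlays a fitted normal curve. A sketch assuming mydata3 exists; depending on the SAS version and installed products, the graph may appear in the Graph window or in the ODS output.

```sas
PROC UNIVARIATE DATA=mydata3;
  VAR weight;
  HISTOGRAM weight / NORMAL;   /* histogram with a fitted normal curve */
RUN;
```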

Slide 18

/* Libraries */

• A library is the directory where your SAS data sets are stored; SAS refers to it by a short name (the libref).

• The default library is named Work and stores your SAS data sets temporarily: they are deleted when you end your SAS session.

• If you want to save your SAS datasets and use them again later, create your own library:

    LIBNAME SAS_Lab 'p:\BIO226\SAS';
    DATA SAS_Lab.mydata;
      INFILE 'g:\shared\bio226\sasintro.txt';
      INPUT weight bmi id age activity education smoking;
    RUN;
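In a later session, the saved data set can be used again by re-running only the LIBNAME statement (assuming the same p:\BIO226\SAS directory) and referring to the data set by its two-level name, library.dataset:

```sas
LIBNAME SAS_Lab 'p:\BIO226\SAS';    /* re-assign the libref in each new session */

PROC CONTENTS DATA=SAS_Lab.mydata;  /* data set saved earlier: no INFILE needed */
RUN;
```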

Slide 19

SAS output and Word

• To send your SAS output to a Word document:

    ODS RTF FILE='p:output.RTF' style=minimal;
    PROC CORR DATA=mydata;
      VAR weight;
      WITH bmi age;
    RUN;
    ODS RTF CLOSE;

• Other styles: Journal, Analysis, Statistical
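To see which styles a given SAS installation offers, PROC TEMPLATE can list them. The second part of the sketch repeats the RTF example with STYLE=Journal instead of minimal, keeping the file name from the example above and assuming mydata exists.

```sas
/* List the ODS styles available in this installation */
PROC TEMPLATE;
  LIST STYLES;
RUN;

/* Same output as above, in the Journal style */
ODS RTF FILE='p:output.RTF' STYLE=Journal;
PROC CORR DATA=mydata;
  VAR weight;
  WITH bmi age;
RUN;
ODS RTF CLOSE;
```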

Slide 20

For further references

• SAS 9 documentation on the Web: http://support.sas.com/onlinedoc/913/docMainpage.jsp

• Cody, R. P. and Smith, J. K. Applied Statistics and the SAS Programming Language, 5th edition.

• Delwiche, L. D. and Slaughter, S. J. The Little SAS Book.

• See SAS_help.doc on the course website

Slide 21

Try your own

• Find the summary statistics (mean, mode, standard deviation,…) for education with PROC UNIVARIATE, as well as a histogram for years of education.

• Create a new variable educ_group that breaks years of education into four groups (0-10, 10-15, 15-18, 18-25; decide which group each boundary value belongs to). Put this new variable in a new data set and drop the education variable, as well as weight, bmi and age.

• Find the number of smokers per education group.

• Find the mean physical activity in each education group.

Slide 22

Recap of different datasets created

Data name   Description
mydata      original imported data
mydata_s    only smokers
mydata2     dropped weight, education
mydata3     added weight_kg, agegroup (age is kept unless /*drop age;*/ is uncommented)
mydata4     original data sorted by id, age and weight
mydata5     original data sorted by smoking status