Chapter 1: Introduction to SAS (2/2)

Junshu Bao

University of Pittsburgh


Table of contents

1.5 Modifying SAS Data

1.6 Proc Step

1.7 Global Statements

* More about Modifying and Combining Data Sets

1.8 SAS Graphics


1.5 Modifying SAS Data

Creating and Modifying Variables

The assignment statement can be used both to create new variables and to modify existing ones. The basic form is

variable = expression;

For example,

weightloss=startweight-weightnow;

startweight=startweight*0.4536;

SAS has the usual set of arithmetic operators: + (add), - (subtract), / (divide), * (multiply), and ** (exponentiation). It also provides various arithmetic, mathematical, and statistical functions.
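Here is a minimal runnable sketch of these operators. It assumes a small hypothetical data set: the variables idno, startweight and weightnow and their values are invented for illustration. The data are entered with DATALINES rather than read from a file.

```sas
/* Hypothetical data: id, starting weight and current weight (lb) */
data weights;
  input idno startweight weightnow;
  weightloss = startweight - weightnow;   /* subtraction               */
  startkg    = startweight * 0.4536;      /* multiplication: lb to kg  */
  ratio      = weightnow / startweight;   /* division                  */
  lossSq     = weightloss ** 2;           /* exponentiation            */
  datalines;
1 200 190
2 150 155
;
run;

proc print data=weights;
run;
```

For idno 2 the current weight exceeds the starting weight, so weightloss is negative (-5). The operators themselves do not guard against such values.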


Assignment Statements

Here are examples of basic types of assignment statements:

Type of expression          Assignment statement
numeric constant            NewVar = 10;
character constant          NewVar = 'ten';
a variable                  NewVar = OldVar;
a function of variable(s)   NewVar = function(OldVariable);

Whether the variable NewVar is numeric or character depends on the expression that defines it.


Example: Survey of Home Gardeners

Gardeners were asked to estimate the number of pounds they harvested for four crops: tomatoes, zucchini, peas, and grapes.

Gregor  10  2  40  0
Molly   15  5  10  1000
Luther  50  10 15  50
Susan   20  0  .   20

The following program reads the data and then modifies it.

DATA homegarden;

INFILE 'c:\MyRawData\Garden.dat';

INPUT Name $ 1-7 Tomato Zucchini Peas Grapes;

Zone = 14;

Type = 'home';

Zucchini = Zucchini * 10;

Total = Tomato + Zucchini + Peas + Grapes;

PerTom = (Tomato / Total) * 100;

RUN;

See SAS program and output.
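The example can be run without the external file c:\MyRawData\Garden.dat by embedding the four data lines with a DATALINES statement, as in this sketch. Name is read with column input (columns 1-7), so the names are padded so that every number starts after column 7. That alignment is an assumption about how the original file was laid out.

```sas
/* Same program, with the data embedded instead of read via INFILE.
   Names occupy columns 1-7; the numbers are read with list input. */
DATA homegarden;
  INPUT Name $ 1-7 Tomato Zucchini Peas Grapes;
  Zone = 14;
  Type = 'home';
  Zucchini = Zucchini * 10;
  Total = Tomato + Zucchini + Peas + Grapes;
  PerTom = (Tomato / Total) * 100;
  DATALINES;
Gregor  10  2  40  0
Molly   15  5  10  1000
Luther  50  10 15  50
Susan   20  0  .   20
;
RUN;

PROC PRINT DATA=homegarden;
RUN;
```

Susan's Peas value is missing, so her Total and PerTom are missing as well. This is an instance of the missing-value rules described under Missing Values.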


Missing Values

- The result of an arithmetic operation performed on a missing value is itself a missing value.

- Missing values for numeric variables are represented by a period.

- A numeric variable can be set to a missing value by an assignment statement such as:

age = .;

- A missing value may be assigned to a character variable as follows:

team = ' ';
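Here is a small sketch of these rules, using invented values. It also uses the MISSING function, which is not covered in these slides. MISSING returns 1 when its argument is missing and 0 otherwise, and it works for both numeric and character arguments.

```sas
/* Invented values illustrating missing-value behaviour */
data missdemo;
  length team $ 8;
  x = .;                      /* numeric missing value                */
  y = x + 1;                  /* arithmetic on a missing value: y = . */
  team = ' ';                 /* character missing value (blank)      */
  xmiss    = missing(x);      /* 1 because x is missing               */
  teammiss = missing(team);   /* 1 because team is blank              */
run;

proc print data=missdemo;
run;
```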


Using SAS Functions

SAS has hundreds of functions in general areas including:

Character           Character String Matching   Date and Time
Distance            Financial                   Descriptive Statistics
Macro               Mathematical                Probability
Random Number       State and Zip Code          Variable Information

For example,

AvgScore = MEAN(Scr1, Scr2, Scr3, Scr4, Scr5);

DayEntered = DAY(Date);

Type = UPCASE(Type);

- The MEAN function returns the mean of its non-missing arguments.

- The DAY function returns the day of the month from a SAS date value.

- The UPCASE function converts character values to uppercase.

* SAS is case sensitive when it comes to variable values; 'd' is not the same as 'D'.
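Here is a runnable sketch of the three functions. The scores, dates and type codes are invented. The informat mmddyy10. is an assumption about the date layout; it reads Date as a SAS date value.

```sas
/* Invented data: five scores (some missing), a date, and a type code */
data scores;
  input Scr1-Scr5 Date :mmddyy10. Type $;
  AvgScore   = MEAN(Scr1, Scr2, Scr3, Scr4, Scr5); /* ignores missing scores */
  DayEntered = DAY(Date);                          /* day of the month       */
  Type       = UPCASE(Type);                       /* 'pass' becomes 'PASS'  */
  format Date mmddyy10.;
  datalines;
90 85 . 70 95 07/22/2020 pass
60 75 80 . . 01/05/2020 fail
;
run;

proc print data=scores;
run;
```

In the first observation the missing third score is ignored, so AvgScore is (90 + 85 + 70 + 95) / 4 = 85.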


Using IF-THEN Statements

Frequently, you want an assignment statement to apply to some observations, but not all. This is called conditional logic, and you do it with IF-THEN statements:

IF condition THEN action;

Example: IF Model='Mustang' THEN Make='Ford';

Here are the basic comparison operators:

Symbolic   Mnemonic   Meaning
=          EQ         equals
^= or ~=   NE         not equal
>          GT         greater than
<          LT         less than
>=         GE         greater than or equal
<=         LE         less than or equal

The IN operator also makes comparisons. Here is an example:

IF Model IN ('Corvette', 'Camaro') THEN Make='Chevrolet';


DO-END Keywords

A single IF-THEN statement can have only one action. If you add the keywords DO and END, you can execute more than one action. The basic form is as follows:

IF condition THEN DO;

action1;

action2;

END;

For example,

IF Model='Mustang' THEN DO;

Make='Ford';

Size='compact';

END;


Specifying Multiple Conditions

You can also specify multiple conditions with the keywords AND and OR:

IF condition1 AND condition2 THEN action;

For example,

IF Model='Mustang' AND Year<1975 THEN Status='classic';

Like the comparison operators, AND and OR may be symbolic or mnemonic:

Symbolic   Mnemonic   Meaning
&          AND        all comparisons must be true
| or !     OR         at least one comparison must be true


Example

The following data about used cars contain values for model, year, make, number of seats, and color:

Corvette 1955 . 2 black

XJ6 1995 Jaguar 2 teal

Mustang 1966 Ford 4 red

Miata 2002 . . silver

CRX 2001 Honda 2 black

Camaro 2000 . 4 red

We will fill in missing data and create a new variable, Status.

DATA sportscars;

INFILE 'c:\MyRawData\UsedCars.dat';

INPUT Model $ Year Make $ Seats Color $;

IF Year < 1975 THEN Status = 'classic';

IF Model = 'Corvette' OR Model = 'Camaro' THEN Make = 'Chevy';

IF Model = 'Miata' THEN DO;

Make = 'Mazda';

Seats = 2;

END;

RUN;
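The used-car program can be tried without UsedCars.dat by embedding the six data lines shown above with DATALINES. In list input, a lone period is read as a missing value for both numeric and character variables. A sketch:

```sas
/* Used-car program with the data embedded instead of read via INFILE */
DATA sportscars;
  INPUT Model $ Year Make $ Seats Color $;
  IF Year < 1975 THEN Status = 'classic';
  IF Model = 'Corvette' OR Model = 'Camaro' THEN Make = 'Chevy';
  IF Model = 'Miata' THEN DO;   /* two actions for one condition */
    Make = 'Mazda';
    Seats = 2;
  END;
  DATALINES;
Corvette 1955 . 2 black
XJ6 1995 Jaguar 2 teal
Mustang 1966 Ford 4 red
Miata 2002 . . silver
CRX 2001 Honda 2 black
Camaro 2000 . 4 red
;
RUN;

PROC PRINT DATA=sportscars;
RUN;
```

After the step:

- The Corvette and Mustang are marked 'classic'.
- The Corvette and Camaro have Make = 'Chevy'.
- The Miata has Make = 'Mazda' and Seats = 2.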


Grouping Observations with IF-THEN/ELSE

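One common use of IF-THEN statements is grouping observations. By adding the keyword ELSE to your IF statements, you tell SAS that these statements are related: each ELSE IF condition is checked only when all of the preceding conditions were false, so every observation is assigned to at most one group.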

IF-THEN/ELSE logic takes this basic form:

IF condition1 THEN action1;

ELSE IF condition2 THEN action2;

ELSE IF condition3 THEN action3;

... ...

ELSE action;

The last ELSE statement contains just an action. An ELSE of this kind becomes a default, which is executed automatically for all observations that fail to satisfy any of the previous IF statements. For example,

IF Cost = . THEN CostGroup = 'missing';

ELSE IF Cost < 2000 THEN CostGroup = 'low';

ELSE IF Cost < 10000 THEN CostGroup = 'medium';

ELSE CostGroup = 'high';

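* SAS considers missing values to be smaller than non-missing values. That is why the test for a missing Cost comes first: otherwise a missing Cost would satisfy Cost < 2000 and be placed in the 'low' group.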
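Here is a runnable sketch of the CostGroup example. It assumes a data set with a single variable, Cost, holding four invented values, one of them missing.

```sas
/* Invented costs, including one missing value (.) */
data projects;
  input Cost;
  IF Cost = . THEN CostGroup = 'missing';
  ELSE IF Cost < 2000 THEN CostGroup = 'low';
  ELSE IF Cost < 10000 THEN CostGroup = 'medium';
  ELSE CostGroup = 'high';            /* default for everything else */
  datalines;
.
1500
5000
25000
;
run;

proc print data=projects;
run;
```

CostGroup takes its length (7) from the first assignment statement in the step, 'missing', so the later, shorter values fit without truncation.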


Simplifying Programs with Arrays

When the same operation is to be carried out on several variables, it is often convenient to use an array and an iterative DO loop in combination.

For example, suppose you have 20 variables, q1 to q20, for which "not applicable" has been coded -1, and you wish to set those values to missing. You might do it as follows:

array qall{20} q1-q20;

do i = 1 to 20;

if qall{i} = -1 then qall{i} = . ;

end;

The array statement defines an array by specifying the name of the array ('qall' here), the number of variables to be included in it (in braces), and the list of variables to be included.

* All the variables in the array must be of the same type, that is, all numeric or all character.
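Here is a runnable version of the array example, scaled down to four hypothetical items, q1-q4, instead of twenty. It uses the DIM function, which returns the number of elements in an array, so the array size is not typed twice. The DROP statement keeps the loop index i out of the output.

```sas
/* Invented survey data: an id and four items where -1 = "not applicable" */
data survey;
  input id q1-q4;
  array qall{4} q1-q4;            /* qall{1} is q1, ..., qall{4} is q4 */
  do i = 1 to dim(qall);          /* DIM(qall) returns 4               */
    if qall{i} = -1 then qall{i} = .;
  end;
  drop i;                         /* loop index not needed in output   */
  datalines;
1 3 -1 2 5
2 -1 -1 4 1
;
run;

proc print data=survey;
run;
```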


Deleting Variables

Variables may be removed from the data set being created by using the drop and keep statements.

- The drop statement names a list of variables that are to be excluded from the data set. For example:

data gradebook_final;

set gradebook;

drop quiz5;

run;

- The keep statement names a list of variables that are to be the only ones retained in the data set. For example:

data gradebook_final;

set gradebook;

keep quiz1 quiz2 quiz3 quiz4;

run;


Deleting Observations

It may be necessary to delete observations from the data set, either because they contain errors or because the analysis is to be carried out on a subset of the data.

- Deleting erroneous observations is best done by using an IF-THEN statement with the DELETE statement. For example,

if weightloss>startweight then delete;

- In the case above, it would also be useful to write out a message giving more information about the observation that contains the error:

if weightloss>startweight then do;

put 'Error in weight data' idno= startweight= weightloss=;

delete;

end;

The put statement writes text (in quotes) and the values of variables to the log.
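Here is a runnable sketch of the error check. It assumes invented data in which idno 3 has a mistyped (negative) current weight, so weightloss exceeds startweight.

```sas
/* Invented data; idno 3 has a mistyped negative current weight */
data slimming;
  input idno startweight weightnow;
  weightloss = startweight - weightnow;
  if weightloss > startweight then do;
    put 'Error in weight data ' idno= startweight= weightloss=;  /* to log */
    delete;                                   /* drop this observation */
  end;
  datalines;
1 200 190
2 150 155
3 180 -12
;
run;

proc print data=slimming;
run;
```

The log should show a line similar to `Error in weight data idno=3 startweight=180 weightloss=192`. The data set then keeps only idno 1 and 2.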


Subsetting Data Sets

When the analysis is to be carried out on a subset of the data, it is often simpler to say which observations to keep than which to delete. This can be achieved with the subsetting if statement in a data step, which keeps only the observations for which its condition is true.

For example,

data women;

set bodyfat;

if sex = 'F';

run;
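Here is a sketch with a small invented bodyfat data set; the variables id, sex and pctfat are assumptions. It also shows that the subsetting IF is equivalent to deleting the observations that fail the condition.

```sas
/* Invented body-fat data */
data bodyfat;
  input id sex $ pctfat;
  datalines;
1 F 28.5
2 M 18.2
3 F 31.0
4 M 22.4
;
run;

/* Subsetting IF: only observations with sex = 'F' are written */
data women;
  set bodyfat;
  if sex = 'F';
run;

/* Equivalent result using IF-THEN with DELETE */
data women2;
  set bodyfat;
  if sex ne 'F' then delete;
run;
```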


1.6 Proc Step

Proc Statement

- Once data have been read into a SAS data set, SAS procedures can be used to analyze the data.

- The proc step is a block of statements that specify the data set to be analyzed, the procedure to be used, and any further details of the analysis.

- The proc statement names the procedure to be used and may also specify options for the analysis.

The most important option is the data= option, which names the data set to be analyzed. If the option is omitted, the procedure uses the most recently created data set.


Var Statement

The var statement specifies the variables that are to be processed by the proc step. For example,

proc print data = SlimmingClub;

var name team weightloss;

run;

restricts the printout to the three variables mentioned, whereas the default would be to print all variables.


Where Statement

The where statement selects the observations to be processed. The keyword where is followed by a logical condition, and only those observations for which the condition is true are included in the analysis. For example,

proc print data = SlimmingClub;

where weightloss>0;

run;

prints only the observations with positive weight loss.


By Statement

The by statement is used to process the data in groups.

- The observations are grouped according to the values of the variable named in the by statement, and a separate analysis is conducted for each group.

- The data set must first be sorted on the by variable.

proc sort data=SlimmingClub;

by team;

run;

proc means data=SlimmingClub;

var weightloss;

by team;

run;
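Here is a runnable sketch of BY-group processing, assuming a small invented SlimmingClub data set. It uses the NOPRINT option and OUTPUT statement of PROC MEANS, which these slides do not cover, to save the per-team means in a data set called teammeans.

```sas
/* Invented slimming-club data */
data SlimmingClub;
  input name $ team $ weightloss;
  datalines;
Anna red 5
Ben blue 3
Cara red 7
Dev blue -1
;
run;

proc sort data=SlimmingClub;
  by team;                          /* BY processing needs sorted data */
run;

/* One analysis per team; OUTPUT OUT= stores each team's mean */
proc means data=SlimmingClub noprint;
  var weightloss;
  by team;
  output out=teammeans mean=avgloss;
run;

proc print data=teammeans;
run;
```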


Class Statement

The class statement is used with many procedures to name variables that are to be used as classification variables, or factors.

The variables named may be character or numeric and will typically take a relatively small number of distinct values. For example,

proc logistic data=ghq;

class sex;

model cases/total=sex ghq;

run;
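The ghq data needed for the PROC LOGISTIC example are not shown here, so this sketch swaps in PROC MEANS to illustrate a class variable. Unlike a BY statement, a CLASS statement does not require the data to be sorted first. The data set club2 and its values are invented. NWAY keeps only the per-team summary rows in the output data set.

```sas
/* Invented data, deliberately NOT sorted by team */
data club2;
  input name $ team $ weightloss;
  datalines;
Anna red 5
Ben blue 3
Cara red 7
Dev blue -1
;
run;

/* CLASS groups the analysis without a prior PROC SORT */
proc means data=club2 nway noprint;
  class team;
  var weightloss;
  output out=classmeans mean=avgloss;
run;

proc print data=classmeans;
run;
```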


1.7 Global Statements

Global Statements (1) Title

Global statements may occur at any point in a SAS program and remain in effect until reset. The title statement is a global statement. It provides a title that will appear on each page of printed output and on each graph until it is reset. An example would be

title 'Analysis of Slimming Club Data';

- The text of the title must be enclosed in quotes.

- Multiple lines of titles can be specified with the title2 statement for the second line, title3 for the third line, and so on up to title10.

- The title statement is synonymous with title1.


Global Statements (2) Comments

Comment statements are global statements in the sense that they can occur anywhere. There are two forms of comment statement.

- The first form begins with an asterisk and ends with a semicolon, for example,

* this is a comment;

- The second form begins with /* and ends with */:

/* this is also a

comment

*/

Comments may appear on the same line as a SAS statement, for example

bmi=weight/height**2; /* Body Mass Index */


Global Statements (3) Options

The options statement is used to set SAS system options. Most of these can safely be left at their default values. Some useful options are:

- Nocenter aligns the output at the left, rather than centering it on the page.

- Nodate suppresses printing of the date and time on the output.

- Pageno=n sets the page number for the next page of output. Alternatively, nonumber turns page numbering off.

For example,

options nodate nocenter nonumber;


* More about Modifying and Combining Data Sets

Concatenating Data Sets - Adding Observations

The set statement can be used to concatenate, or stack, data sets one on top of the other.

This is useful when you want to combine data sets with all or most of the same variables but different observations. The basic form is:

data new-dataset;

set dataset1 dataset2;

run;

- The number of observations in the new data set will equal the sum of the numbers of observations in the old data sets.

- If one of the data sets has a variable not contained in the other data sets, then the observations from the other data sets will have missing values for that variable.


Example: Stacking Data Sets

The Fun Times Amusement Park has two entrances where they collect data about their customers.

South Entrance Data:

Entrance  Pass Number  Size of Party  Age
S         43           3              27
S         44           3              24
S         45           3              2

North Entrance Data:

Entrance  Pass Number  Size of Party  Age  Parking Lot
N         21           5              41   1
N         87           4              33   3
N         65           2              67   1
N         66           2              7    1

Note that the north entrance data set has one more variable, parking lot number. The south entrance has only one parking lot.


Example: Stacking Data Sets (cont.)

Suppose we would like to combine the data from the two entrances and create a new variable, AmountPaid, which tells how much each customer paid based on their age.

DATA both;

SET southentrance northentrance;

IF Age = . THEN AmountPaid = .;

ELSE IF Age < 3 THEN AmountPaid = 0;

ELSE IF Age < 65 THEN AmountPaid = 35;

ELSE AmountPaid = 27;

RUN;

See SAS program and output.
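Here is a runnable sketch of the stacking example, with the two tables above entered via DATALINES. The variable names PassNumber, PartySize and Lot are assumptions, since the slides show only column headings.

```sas
/* South entrance: no parking-lot variable */
data southentrance;
  input Entrance $ PassNumber PartySize Age;
  datalines;
S 43 3 27
S 44 3 24
S 45 3 2
;
run;

/* North entrance: one extra variable, Lot */
data northentrance;
  input Entrance $ PassNumber PartySize Age Lot;
  datalines;
N 21 5 41 1
N 87 4 33 3
N 65 2 67 1
N 66 2 7 1
;
run;

/* Stack the two data sets and compute the admission price */
DATA both;
  SET southentrance northentrance;
  IF Age = . THEN AmountPaid = .;
  ELSE IF Age < 3 THEN AmountPaid = 0;
  ELSE IF Age < 65 THEN AmountPaid = 35;
  ELSE AmountPaid = 27;
RUN;

PROC PRINT DATA=both;
RUN;
```

The combined data set has 3 + 4 = 7 observations. Lot is missing for the three south-entrance customers.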


Merging Data Sets - Adding Variables (1)

Data for a study may arise from more than one source, or at different times, and need to be combined.

- For matching purposes, you will want a common variable, or several variables that taken together uniquely identify each observation. If the data are not already sorted, use the sort procedure to sort all data sets by the common variables.

- The basic form is as follows:

proc sort data=dataset1;

by variable-list;

run;

proc sort data=dataset2;

by variable-list;

run;

data new-dataset;

merge dataset1 dataset2;

by variable-list;

run;

* If the two data sets have variables with the same names (other than the by variables), then the values from the second data set will overwrite the values of any variables having the same name in the first data set.


Example: Belgian Chocolatier

A Belgian chocolatier keeps track of the number of each type of chocolate sold each day.

- The code number for each chocolate and the number of pieces sold that day are kept in a file.

- In a separate file she keeps the names and descriptions of each chocolate as well as the code number.

In order to print the day's sales along with the descriptions of the chocolates, the two files must be merged together using the code number as the common variable.

See SAS program and output.
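The chocolatier's files are not shown, so this sketch uses invented code numbers, chocolate names and sales counts. The data sets sales and descriptions and the variables CodeNum, PiecesSold and Name are hypothetical. Both data sets are sorted by the common variable before the one-to-one match merge.

```sas
/* Invented descriptions file: code number and chocolate name */
data descriptions;
  input CodeNum $ Name $;
  datalines;
A206 Ganache
C865 Vanilla
K086 Hazelnut
S163 Praline
;
run;

/* Invented daily sales, not in code order */
data sales;
  input CodeNum $ PiecesSold;
  datalines;
C865 15
A206 12
S163 34
K086 9
;
run;

proc sort data=sales;
  by CodeNum;
run;

proc sort data=descriptions;
  by CodeNum;
run;

/* One-to-one match merge on the common variable */
data chocolates;
  merge sales descriptions;
  by CodeNum;
run;

proc print data=chocolates;
run;
```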


One-to-Many Match Merge

Sometimes you need to combine two data sets by matching one observation from one data set with more than one observation in another.

Suppose you had data for every state in the U.S. and wanted to combine it with data for every county. This would be a one-to-many match merge.

The statements for a one-to-many match merge are identical to those for a one-to-one match merge:

data new-dataset;

merge dataset1 dataset2;

by variable-list;

- The order of the data sets in the merge statement does not affect the matching. In other words, a one-to-many merge will match the same observations as a many-to-one merge.

- Before you merge two data sets, they must be sorted by one or more common variables.

- You cannot do a one-to-many merge without a by statement.


Example: One-to-Many Match Merge

A distributor of athletic shoes is putting all its shoes on sale at 20 to 30% off the regular price. The distributor has two data sets:

- Data set 1: information about each type of shoe. It contains one record for each shoe, with values for style, type of exercise (running, walking, or cross-training), and regular price.

- Data set 2: discount factors. It contains one record for each type of exercise and its discount.

To find the sale price, we need to merge the two data sets and calculate a new price after the discount.

See SAS program and output.
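Here is a sketch of the shoe-sale merge. The styles, prices and discount factors, and the variable names Style, ExerciseType, RegPrice, Adjustment and NewPrice, are all hypothetical. Each discount record is matched with every shoe of the same exercise type.

```sas
/* Invented shoe data: style, type of exercise, regular price */
data shoes;
  input Style :$10. ExerciseType :$14. RegPrice;
  datalines;
Zephyr running 120
Strider walking 80
Cascade cross-training 100
Blitz running 90
;
run;

/* Invented discount factors: one record per exercise type,
   already entered in ExerciseType order */
data discount;
  input ExerciseType :$14. Adjustment;
  datalines;
cross-training 0.25
running 0.30
walking 0.20
;
run;

proc sort data=shoes;
  by ExerciseType;
run;

/* One-to-many match merge, then the discounted price */
data prices;
  merge shoes discount;
  by ExerciseType;
  NewPrice = ROUND(RegPrice - (RegPrice * Adjustment), .01);
run;

proc print data=prices;
run;
```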


Tracking and Selecting Observations

When you combine two data sets, you can use in= options to track which of the original data sets contributed to each observation in the new data set.

For example, the data step below creates a data set named both by merging two data sets, state and county. The in= options create two variables named InState and InCounty.

data both;

merge state (in=InState) county (in=InCounty);

by StateName;

run;

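SAS gives the in= variables a value of 0 or 1. A value of 1 means that the data set did contribute to the current observation, and a value of 0 means that it did not.

* You can use these in= variables to subset data sets. They are temporary: they are available during the data step but are not written to the new data set.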


Example: The IN= Option

A sporting goods manufacturer wants to send a sales rep to contact all customers who did not place any orders during the third quarter of the year. The company has two data files:

- Data file 1: customer information

- Data file 2: orders placed during the third quarter

To compile a list of customers without orders, you merge the two data sets using the IN= option and then select customers who had no observations in the orders data set.

See SAS program and output.
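Here is a sketch of the IN= selection with invented customer and order data; the data set names, variable names and values are hypothetical. The subsetting IF keeps customers who appear in the customer file but not in the orders file.

```sas
/* Invented customer file, already in CustomerNumber order */
data customer;
  input CustomerNumber Name $;
  datalines;
101 Acme
102 Brisk
103 Cobalt
104 Delta
;
run;

/* Invented third-quarter orders; customer 102 ordered twice */
data orders;
  input CustomerNumber Total;
  datalines;
102 562
101 187
102 843
104 98
;
run;

proc sort data=orders;
  by CustomerNumber;
run;

data noorders;
  merge customer (in=InCust) orders (in=HasOrder);
  by CustomerNumber;
  if InCust = 1 and HasOrder = 0;   /* customers with no Q3 orders */
run;

proc print data=noorders;
run;
```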


Selecting Observations with the WHERE= Option

The where= data set option is the most flexible of all ways to subset data. You can use it in data steps or proc steps. The basic form of a where= option is:

where = (condition)

- If used in a set or merge statement, the where= option is applied to the data set that is being read. For example,

data gone;

set animals (where = (Status = 'Extinct'));

- If used in a data statement, the where= option is applied to the data set that is being written. For example,

data uncommon (where = (Status IN ('Endangered', 'Threatened')));

set animals;


Example: WHERE= Option

The following data contain information about the Seven Summits, the highest mountains on each continent. Each line of data includes the name of a mountain, its continent, and its height in meters.

Kilimanjaro Africa 5895

Vinson Massif Antarctica 4897

Everest Asia 8848

Elbrus Europe 5642

McKinley North America 6194

Aconcagua South America 6962

Kosciuszko Australia 2228

We will create two data sets: tallpeaks (mountains above 6000 meters) and american (mountains in North or South America).

See SAS program and output.
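Here is a runnable sketch of the WHERE= example with the seven data lines embedded. Some names and continents contain a blank, so the fields are separated by two blanks and read with the & modifier. That modifier reads a character value until it meets two consecutive blanks. This data layout is an assumption, since the original data file is not shown.

```sas
/* Both output data sets receive every observation read,
   filtered by their own WHERE= option */
data tallpeaks (where = (Height > 6000))
     american  (where = (Continent contains 'America'));
  input Name & $14. Continent & $13. Height;
  datalines;
Kilimanjaro  Africa  5895
Vinson Massif  Antarctica  4897
Everest  Asia  8848
Elbrus  Europe  5642
McKinley  North America  6194
Aconcagua  South America  6962
Kosciuszko  Australia  2228
;
run;

proc print data=tallpeaks;
run;

proc print data=american;
run;
```

With these values, tallpeaks contains three mountains (Everest, McKinley, Aconcagua) and american contains two (McKinley, Aconcagua).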


1.8 SAS Graphics

SAS Graphics

When the SAS/GRAPH module has been licensed, there are a number of ways of producing high-quality graphical output. There are three main approaches:

- Graphical options within a statistical procedure

- Traditional graphics procedures (gplot, gchart, etc.)

These are the graphics procedures that existed in versions of SAS prior to 9.2.

- Statistical graphics procedures (sgplot, sgpanel, sgscatter and sgrender)

These are newer graphics procedures, which can produce a wide range of attractive graphics.

We will focus on the statistical graphics procedures for now. The specific graphical options that are available within statistical procedures will be dealt with in later chapters.


xy Plots - Proc sgplot

An xy plot is one in which the data are represented in two dimensions defined by the values of two variables. For example, to create a scatterplot,

proc sgplot data=bodyfat;

scatter y=pctfat x=age;

run;

The syntax is straightforward:

- A scatter statement tells SAS to create a scatterplot.

- In the scatter statement, both the x and y variables are specified explicitly.

For different types of plot, a statement other than scatter is used. The main types are listed in the table of xy plot types that follows.


Types of xy Plots

Type of Plot                                                  Plotting Statement
Scatter plot - data values are plotted                        scatter
Line plot - data values are joined with lines                 series
Step plot - data values are joined with stepped lines         step
Needle plot - a vertical line joins each value to the x axis  needle
Regression plot - a scatter plot with a regression line       reg
Locally weighted regression                                   loess
Penalized B-splines                                           pbspline

* For line plots and step plots, the points are plotted in the order in which they occur in the data set, so sort the data by the x-axis variable first.

* A common variant of the xy plot distinguishes groups in the data by using different symbols or lines. This is done with the group=var option. For example:

scatter y=pctfat x=age / group=sex;
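The bodyfat data are not shown here, so this sketch uses SASHELP.CLASS instead. That sample data set ships with SAS and has the variables Name, Sex, Age, Height and Weight. The sketch shows the group= option and sorting before a line (series) plot. The output data set name class_sorted is hypothetical.

```sas
/* Scatter plot with one marker style per value of Sex */
proc sgplot data=sashelp.class;
  scatter y=weight x=height / group=sex;
run;

/* A series plot joins points in data order, so sort by x first */
proc sort data=sashelp.class out=class_sorted;
  by height;
run;

proc sgplot data=class_sorted;
  series y=weight x=height;
run;
```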


Overlaying Plots

It is often useful to combine the information from two or more plots by overlaying them. Sgplot does this automatically: all plot statements within one proc sgplot step are drawn on the same axes. For example, a plot to compare the fits from linear regression and locally weighted regression could be produced as follows:

proc sgplot data=bodyfat;

reg y=pctfat x=age;

loess y=pctfat x=age/nomarkers;

run;

The nomarkers option is specified to prevent the data points from being plotted twice, since sgplot uses different plotting symbols for each plot statement.
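The same overlay can be tried on SASHELP.CLASS, which ships with SAS, in place of the bodyfat data. This is a sketch with weight plotted against height.

```sas
/* Overlay a linear fit and a loess fit on the same axes */
proc sgplot data=sashelp.class;
  reg   y=weight x=height;               /* markers plus regression line */
  loess y=weight x=height / nomarkers;   /* loess curve only             */
run;
```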
