SAS Lecture 6 – SAS/GRAPH
Aidan McDermott, May 3, 2005
SAS/GRAPH
There are a small number of graphic types commonly used in public health presentations and publications.
These basic types are either used alone or mixed together to form a composite graphic.
Here we will look at how to build some of these basic types of graph.
Golden Rule: Everybody is a graph critic.
Two types of graph maker
If you are using SAS for statistics and data management, then it seems natural to use it to produce your graphs as well. Sometimes a statistical procedure will produce the graph you are looking for anyway.
Do you need a one-off graph for a presentation, or production-line graphs?
• To produce “quick and dirty” graphs you can use Graph-n-go. Very easy to use; not bad for putting multiple graphs on one page; each data viewer is a graph type; only a small number of graph types available; not all options available; labor intensive, so not suitable for production-line graphs.
• Use SAS/GRAPH procedures. Very flexible; complete control over graphic elements; less labor intensive in the long run; harder to learn; the same controls can be used for SAS/STAT graphics output.
Some common types of graph
• Charts
• Histograms
• Stem-and-leaf plots
• Boxplots
• Plots
• Contour plots / 3-dimensional plots
• Maps
• Gantt charts
• Trellis plots
• Trees / pedigrees / dendrograms
• (Mathematical) graphs / networks
• Flow charts / entity-relationship diagrams
Graph-n-go
Solutions > Reporting > Graph-n-go
The top two icons represent data models; the rest are data viewers.
Graph-n-go
• Choose and configure a data model.
• Choose a dataset.
• Right-click on the data model and choose Properties.
• Set which columns to use, WHERE clauses, etc.
Graph-n-go
• Choose a viewer (e.g. a bar chart) and position it on the viewer area.
• Drag and drop the data model onto the viewer to associate the data with the viewer.
• Right-click on the viewer and choose Properties.
• Configure it (choose variables to plot, etc.).
Graph-n-go
When you are finished, the graph can be exported to HTML, etc.: choose File, then Export, and write it to a file.
You’ll see more in the lab.
Graphic output within SAS
• You have already seen some graphic output from within SAS.
• proc means, proc univariate, proc genmod, proc lifetest etc. all produce graphs
• Other procedures in SAS specifically produce graphs, including some that are not part of SAS/GRAPH (PROC BOXPLOT is an example).
Here our aim is to produce publication- and presentation-quality graphs.
Graph basics
SAS stores graphs in catalogs (an entity similar to a folder in Windows).
Graphs are stored in a SAS proprietary format. By default, graphs are stored in a catalog called GSEG in the WORK library (WORK.GSEG).
Graphs can be translated to PostScript, GIF, JPEG, and a number of other commonly used formats for printing or including in other documents (Word, HTML, etc.).
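A quick way to see what has accumulated in the default catalog is PROC CATALOG. The sketch below lists the entries of WORK.GSEG; it assumes at least one SAS/GRAPH procedure has already run in the session, since the catalog does not exist until a graph has been stored.

```sas
/* List the graph entries stored in the default catalog WORK.GSEG. */
/* Assumes a SAS/GRAPH procedure has already run in this session.  */
proc catalog catalog=work.gseg;
  contents;
run;
quit;
```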
Graphic control
There are three ways to control the look of a SAS/GRAPH graph:
1. Use options within the procedure
2. Use global commands
3. Use goptions
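To show the three mechanisms side by side, here is a minimal sketch that uses the SASHELP.CLASS sample data set (an assumption; the lecture's own data appear later). Each comment marks which kind of control the statement represents.

```sas
/* 3. GOPTIONS: session-wide defaults (text height in percent units). */
goptions reset=all gunit=pct htext=3;

/* 2. Global statements: stay in effect until cancelled or replaced. */
title1 'Students by sex';
pattern1 color=gray value=solid;

/* 1. Procedure options: apply only to this chart. */
proc gchart data=sashelp.class;
  vbar sex / type=freq ctext=black;
run;
quit;
```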
GOPTIONS
• set the environment in which a graphics program runs and sends its output
• are independent of the program (not tied to any one procedure)
• remain in effect for the entire SAS session unless changed or reset
• control the appearance of graphic elements by specifying default fonts, colors, text heights, etc.
• are useful when you want the same options in multiple procs
PROC GOPTIONS
• used to review the current GOPTIONS settings
• lists all of the current GOPTIONS alphabetically in the LOG window
proc goptions; run;
You can also type goptions at the command line.
GOPTIONS
GOPTIONS options-list;
ROTATE=PORTRAIT | LANDSCAPE (overrides the setting in the print dialog box)
RESET=ALL resets all options to their defaults, including all global statements
RESET=GOPTIONS resets only GOPTIONS settings
COLORS= device-dependent default color list for the device driver
GUNIT= unit of measurement for height in global statements such as TITLE and FOOTNOTE:
CELLS - character cells; PCT - percent of the graphics area; IN - inches
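A sketch of these options in use, assuming percent units and landscape output are what you want: set them, review them with PROC GOPTIONS, and finally restore the defaults.

```sas
/* Text heights in percent of the graphics area; landscape orientation. */
goptions gunit=pct htext=4 rotate=landscape;

/* Write the current settings to the LOG window. */
proc goptions;
run;

/* Return every graphics option and global statement to its default. */
goptions reset=all;
```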
Data
• From the SAS samples folder.
• Three Californian pollutant monitoring stations (AZU, LIV, SFO).
• One monthly measurement (taken on the 15th of the month) of CO, O3, SO4, temperature, etc. for each station; 36 observations in all.
• Month is a numeric variable taking the value 1 for January, 2 for February, etc.
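The examples that follow apply month formats named mth. and mthc., whose definitions the slides do not show. A minimal sketch, assuming mth. prints full month names and mthc. a shorter three-letter label (both definitions are hypothetical), might be:

```sas
/* Hypothetical definitions: the lecture uses mth. and mthc. but does */
/* not show them. Month is coded 1 = January, 2 = February, ...       */
proc format;
  value mth   1='January'    2='February'  3='March'     4='April'
              5='May'        6='June'      7='July'      8='August'
              9='September' 10='October'  11='November' 12='December';
  value mthc  1='Jan'  2='Feb'  3='Mar'  4='Apr'  5='May'  6='Jun'
              7='Jul'  8='Aug'  9='Sep' 10='Oct' 11='Nov' 12='Dec';
run;
```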
Californian Air pollutant Data – ca88air
Charts
• Examples
Look for graphic elements in each chart
Look for common data types
Look for similarities among the examples
Charts
• All the examples used a small number of graphic elements.
• The main difference between plots is the polygon/area type.
• Most involved a categorical/discrete variable and a numeric variable. A histogram uses a continuous variable to create categories. The counts of a categorical variable can be used to create the numeric variable.
Proc GCHART
produces charts based on the values of one or more chart variables.
produces vertical and horizontal bar charts, block charts, pie charts etc.
graphs based on statistics - counts, percentages, sums, or means
run-group processing
numeric and character variables
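Run-group processing means one PROC GCHART statement can produce several charts. A minimal sketch, using SASHELP.CLASS as stand-in data:

```sas
/* Each RUN draws a chart; the procedure stays active until QUIT. */
proc gchart data=sashelp.class;
  vbar sex;              /* frequency of each sex, vertical bars   */
run;
  hbar age / discrete;   /* frequency of each age, horizontal bars */
run;
quit;
```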
Proc GCHART example
proc format;
  value seas 1 = 'Win' 2 = 'Spr'
             3 = 'Sum' 4 = 'Fal';
run;
data ca88air;
  set vol1.ca88air(where=(station="SFO"));
  if ( month in (12,1,2) ) then season = 1;
  else if ( month in (3,4,5) ) then season = 2;
  else if ( month in (6,7,8) ) then season = 3;
  else if ( month in (9,10,11) ) then season = 4;
  format season seas.;
  format month mth.;   /* mth. is a month format defined elsewhere (not shown here) */
run;
Proc GCHART example
title1 h=4 'Mean seasonal carbon monoxide for station SFO';
footnote j=l h=4 f=simplex 'Bar Chart - vertical';
proc gchart data=ca88air;
  vbar season / sumvar=co type=mean discrete ctext=black clm=95;
run;
quit;
Proc GCHART syntax
PROC GCHART data=data-set-name;
One of the following:
VBAR variables / options;
HBAR variables / options;
STAR variables / options;
PIE variables / options;
BLOCK variables / options;
run;
VBAR
• a separate bar chart for each chart variable
• each bar represents the statistic selected for a value of the chart variable
• the response axis (vertical) provides a scale for the statistic graphed
• the midpoint axis is the horizontal axis
VBAR SYNTAX
VBAR chart-variable(s) / options;
chart-variable(s) specifies one or more variables that define the categories of data to chart.
options specifies appearance, statistics, axes, and midpoint options.
VBAR
Midpoints are the values of the chart variable that identify categories of data. By default, midpoints are selected or calculated by the procedure. The way the procedure handles the midpoints depends on whether the values of the chart variable are character, discrete numeric, or continuous numeric.
Character chart variables: a separate bar is drawn for each value.
VBAR
Numeric chart variables: by default, each bar represents a range of values.
- The DISCRETE option generates a midpoint for each unique value of the chart variable.
- Without DISCRETE, the procedure generates midpoints that represent ranges of values. By default, it determines the ranges, calculates the median value of each range, and displays the median value at each midpoint on the chart. A value that falls exactly halfway between two midpoints is placed in the higher range.
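To see the difference between the two numeric behaviors, a sketch on SASHELP.CLASS (whose ages run from 11 to 16) might draw the same variable twice:

```sas
proc gchart data=sashelp.class;
  /* Default: ages are grouped into ranges, one bar per range. */
  vbar age;
run;
  /* DISCRETE: one bar for each distinct age value. */
  vbar age / discrete;
run;
quit;
```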
VBAR OPTIONS
For character or discrete numeric values, you can use the MIDPOINTS= option to rearrange the midpoints or to exclude midpoints from the chart.
For character data, list the MIDPOINTS= values in quotes: MIDPOINTS='Sydney' 'Atlanta' 'Paris'
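For a character variable, the same idea looks like this; the sketch assumes SASHELP.CLASS, whose SEX variable takes the values F and M, and puts the M bar first.

```sas
/* Quoted MIDPOINTS= values set the order of the bars (males first). */
proc gchart data=sashelp.class;
  vbar sex / midpoints='M' 'F';
run;
quit;
```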
VBAR OPTIONS
For continuous numeric variables, use the MIDPOINTS= option to change the number of midpoints, to control the range of values each midpoint represents, or to change the order of the midpoints. To control the range of values each midpoint represents, use the MIDPOINTS= option to specify the median value of each range. For example, to select the ranges 20-29, 30-39, and 40-49, specify
MIDPOINTS=25 35 45
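A continuous-variable version of the same option, sketched on the HEIGHT variable of SASHELP.CLASS; the list 50 to 75 is an illustrative choice so that the ranges span the heights in that data set.

```sas
/* Each midpoint is the center of a range: bars at 50, 55, ..., 75. */
proc gchart data=sashelp.class;
  vbar height / midpoints=50 to 75 by 5;
run;
quit;
```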
VBAR OPTIONS
Other options:
DISCRETE separate bar for each value of a numeric variable
TYPE=statistic specifies the chart statistic:
FREQ frequency
PCT percentage
SUM sum (the default when SUMVAR= is used; otherwise FREQ is the default)
MEAN mean
CLM=confidence-level draws confidence intervals (error bars) on the chart
VBAR SYNTAX
SUMVAR=variable specifies the variable to use for sum or mean calculations for each midpoint. The resulting statistics are represented by the length of the bars along the response axis, and they are displayed at major tick marks. REQUIRED if you specify TYPE=MEAN or TYPE=SUM.
RAXIS=AXISn response axis
MAXIS=AXISn midpoint axis
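SUMVAR=, TYPE=MEAN, RAXIS=, and MAXIS= can be combined as in the sketch below, which again borrows SASHELP.CLASS; the two AXIS definitions are illustrative only.

```sas
/* Illustrative axis definitions for the response and midpoint axes. */
axis1 label=(angle=90 'Mean weight');
axis2 label=('Sex');

/* SUMVAR= names the variable whose mean is charted for each bar. */
proc gchart data=sashelp.class;
  vbar sex / sumvar=weight type=mean raxis=axis1 maxis=axis2;
run;
quit;
```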
GLOBAL STATEMENTS
define titles and footnotes
used to control axes, symbols, patterns, and legends
can be defined anywhere inside a proc or before a proc
in effect until canceled, replaced, or the end of the SAS session
cancel by repeating the statement with no options, or by using goptions RESET=ALL;
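Both cancellation routes, sketched with illustrative titles and a SYMBOL definition:

```sas
title1 'First title';
title2 'Second title';
symbol1 v=dot i=join c=blue;

/* Repeat a statement with no options to cancel it.  */
title2;    /* cancels TITLE2 (and any higher-numbered titles) */
symbol1;   /* cancels the SYMBOL1 definition                  */

/* Or reset all graphics options and global statements at once. */
goptions reset=all;
```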
GLOBAL STATEMENTS
TITLE defines titles
AXIS defines the appearance of axes
FOOTNOTE defines footnotes
PATTERN defines patterns used in graphs (histograms)
LEGEND defines legends
SYMBOL defines symbols (plotting)
NOTE adds text to a graph
TITLE STATEMENT
creates, changes, or cancels a title for all subsequent graphics output in a SAS session
up to 10 titles are allowed
the keyword TITLE can be followed by an unlimited number of text strings and options
text strings are enclosed in single or double quotes
the most recently created TITLE with a given number replaces the previous TITLE with the same number
Title syntax
TITLE<1...10> <options | 'text'> ... <options-n | 'text-n'>;
Options:
FONT=font specifies the font for the subsequent text.
HEIGHT= specifies the height of the text: H=n<units>, where n is the height in the given units.
JUSTIFY= specifies the alignment: J=L | C | R (left, center, right). By default, JUSTIFY=C (center).
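Putting the options together, a sketch of a two-line heading with left- and right-justified pieces on the same line; the SWISS font and percent units are assumptions, not requirements.

```sas
goptions gunit=pct;   /* heights below are percent of the graphics area */

title1 f=swiss h=5 j=c 'Air quality, 1988';
title2 f=swiss h=3 j=l 'Station SFO' j=r 'Monthly means';
footnote1 j=l h=2 'Source: SAS sample data';
```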
PATTERN STATEMENT
defines the characteristics of patterns used in charts:
the type of fill pattern (solid, empty, lined) and its color
An example of a global statement
PATTERN STATEMENT
PATTERN<1...99> options;
Options:
COLOR= pattern color
VALUE= fill type:
E empty
S solid
Ln left-slanting lines
Rn right-slanting lines
Xn crosshatched lines
where n is 1 to 5, with 1 the lightest
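Assuming the bars of a chart without subgroups would otherwise share a single pattern, the sketch below (again on SASHELP.CLASS) uses the GCHART option PATTERNID=MIDPOINT so that each bar picks up its own PATTERN statement.

```sas
pattern1 color=gray  value=s;    /* solid fill             */
pattern2 color=black value=x3;   /* medium crosshatching   */

/* PATTERNID=MIDPOINT: first bar uses PATTERN1, second PATTERN2. */
proc gchart data=sashelp.class;
  vbar sex / patternid=midpoint;
run;
quit;
```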
Proc GCHART example
pattern1 color=blue value=fill;
pattern2 color=red value=fill;
proc gchart data=ca88air;
  star month / sumvar=co type=mean discrete ctext=black noheading;
run;
quit;
Exporting graphs
Make sure the graphics window has focus by clicking on it.
Choose File > Export as Image and select the type of image (GIF, …), then open the other software program (e.g. PowerPoint) and insert the picture.
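Instead of exporting by hand, a graph can be written straight to a file by choosing a device driver. This sketch assumes the GIF driver and a hypothetical output path c:\Temp\chart.gif.

```sas
/* Hypothetical path: change it to a folder you can write to. */
filename grafout 'c:\Temp\chart.gif';
goptions reset=all device=gif gsfname=grafout gsfmode=replace;

proc gchart data=sashelp.class;
  vbar sex;
run;
quit;
```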
Saving graphs
Graphs can also be saved in a SAS catalog. They are stored in a SAS proprietary format and can be viewed with PROC GREPLAY.
goptions replace;
libname mylib 'c:\Temp\sasclass\myfiles';
proc gchart data=mydat gout=mylib.mygraphs; …
PROC GREPLAY also allows multiple plots on one page.
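A sketch of the save-and-replay cycle, reusing the library path from the slide. The entry names bysex and byage are hypothetical (set with NAME=), and the layout assumes the two-panel template V2 from the SASHELP.TEMPLT catalog supplied with SAS/GRAPH.

```sas
libname mylib 'c:\Temp\sasclass\myfiles';

/* Store two charts in MYLIB.MYGRAPHS under chosen entry names. */
proc gchart data=sashelp.class gout=mylib.mygraphs;
  vbar sex / name='bysex';
run;
  vbar age / discrete name='byage';
run;
quit;

/* Replay both entries on one page using a two-panel template. */
proc greplay igout=mylib.mygraphs tc=sashelp.templt template=v2 nofs;
  treplay 1:bysex 2:byage;
run;
quit;
```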
PROC GPLOT
• graphs one variable against another, producing presentation-quality plots
• the coordinates of each point correspond to the values in one or more observations of the input data set
• run-group processing: the procedure does not end with a RUN statement, so you can submit new statements and produce more graphs without another PROC statement; it ends with QUIT or the next PROC or DATA step
Proc GPLOT
produces two-dimensional graphs that plot one variable against another within a set of coordinate axes
graphs are automatically scaled to the values of your data, although scaling can be controlled with options or with AXIS statements
scatterplots, bubble plots, and plots with interpolated lines (SYMBOL statement)
[Figure: the parts of a plot: the vertical axis (Y variable), the horizontal axis (X variable), tick marks, and tick mark values.]
GPLOT SYNTAX
PROC GPLOT data=data-set-name <options>;
PLOT request list </options list>;
request list is of the form:
vertical*horizontal e.g. PLOT y*x;
vertical*horizontal=variable e.g. PLOT y*x=z;
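The third form, vertical*horizontal=variable, draws one series per value of the third variable. A minimal sketch on SASHELP.CLASS:

```sas
/* One set of points per value of SEX, with a legend entry for each. */
proc gplot data=sashelp.class;
  plot height*weight=sex;
run;
quit;
```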
GPLOT SYNTAX
Graphics options on the PLOT statement:
CTEXT=color
LEGEND=LEGENDn (uses the nth global LEGEND statement)
HAXIS=AXISn (uses the nth global AXIS statement)
VAXIS=AXISn (uses the nth global AXIS statement)
Proc GPLOT example
• Suppose we are asked to draw a plot of ozone by month for the three stations SFO, LIV, AZU. After consulting the help we might try:
proc gplot data=ca88air;
  plot o3 * month;
run;
quit;
which produces:
Proc GPLOT example
• Increase the size of the text.
• Use a format to print out month names.
• Clear the unwanted footnote.
goptions gunit=pct htext=4;
footnote1;
proc gplot data=ca88air;
  plot o3 * month;
  format month mth.;
  title1 '1988 Air Quality Data - Ozone';
run;
Proc GPLOT example
• Back to the help: you can make a stratified plot by station.
• The x axis is too crowded, so use a different format.
proc gplot data=ca88air;
  plot o3 * month = station;
  format month mthc.;
  title1 '1988 Air Quality Data - Ozone';
run;
Proc GPLOT example
• The symbols in the plot are too small.
• Use SYMBOL global statements!
symbol1 v=dot i=join c=blue h=1.3;
symbol2 v=dot i=join c=green h=1.3;
symbol3 v=dot i=join c=brown h=1.3;
proc gplot data=ca88air;
  plot o3 * month = station;
  format month mthc.;
  title1 '1988 Air Quality Data - Ozone';
run;
Proc GPLOT example
The x-axis is not right, so use an AXIS global statement.
axis1 minor = none
      label = (f=simplex j=c 'Ozone levels at three locations')
      major = (h=1.1)
      order = (0 to 13 by 1)
      value = (f=simplex h=3.0);
proc gplot data=ca88air;
  plot o3 * month = station / haxis=axis1;
  format month mthc.;
  title1 '1988 Air Quality Data - Ozone';
run;
Proc GPLOT example
• The x-axis has extra characters: use a new format or an AXIS global statement.
• The y-axis label needs to be rotated and placed in the center of the axis.
• The legend needs moving: use a LEGEND global statement.
axis1 minor = none
      label = (f=centb j=c 'Ozone levels at three locations')
      major = (h=1.0)
      order = (0 to 13 by 1)
      value = (f=simplex h=3.0
               " " "J" "F" "M" "A" "M" "J" "J" "A" "S" "O" "N" "D" " ");
Proc GPLOT example
axis2 label = (f=centb rotate=0 angle=90 j=c 'Ozone')
      value = (f=simplex h=3.0);
legend1 across=3 position=(bottom center inside) label=none;
proc gplot data=ca88air;
  plot o3 * month = station / haxis=axis1 vaxis=axis2 legend=legend1;
  format month mthc.;
  title1 '1988 Air Quality Data - Ozone';
run;
PROC G3D and PROC GCONTOUR produce 3-dimensional analogs of GPLOT.
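Both procedures expect a third variable measured over an x-y grid, and both use a PLOT y*x=z request similar to GPLOT's. A sketch with made-up grid data (the function z = sin(sqrt(x*x + y*y)) is purely illustrative):

```sas
/* Illustrative data: z evaluated on a regular x-y grid. */
data surf;
  do x = -5 to 5 by 0.25;
    do y = -5 to 5 by 0.25;
      z = sin(sqrt(x*x + y*y));
      output;
    end;
  end;
run;

proc g3d data=surf;        /* 3-D surface of z over x and y      */
  plot y*x=z;
run;
quit;

proc gcontour data=surf;   /* contour lines of z in the x-y plane */
  plot y*x=z;
run;
quit;
```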
Maps
• You can use PROC GMAP to make simple presentation maps.
• There is another product by SAS called SAS/GIS, i.e. SAS/Geographic Information System.
Data
• taken from the CDC web page
• AIDS prevalence during 1997-1998
• rate is given for each state per 100,000 of population
• state is given by name and two-letter code
• map data is provided by SAS in the library MAPS; the map we will use is maps.us
• if you look in the MAPS library you will see map data for most countries, as well as world maps
Data
• this data uses FIPS coding to match geographic boundaries, e.g. the FIPS code for Alaska is 02 and for Maryland is 24
• we need to join the AIDS data and the FIPS codes in order to map the data
proc sort data=aids; by name; run;
proc sort data=state; by name; run;
data join;
  merge aids(in=inaids) state(in=instate);
  by name;
  if inaids and instate then output join;
run;
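Because the AIDS data also carries each state's two-letter code, an alternative to merging with a separate FIPS dataset is the SAS function STFIPS, which converts a postal code to its numeric FIPS code. A sketch, assuming the two-letter code is stored in a variable with the hypothetical name stcode and that aids does not already contain a character variable named state:

```sas
/* Alternative to the merge: derive the numeric FIPS code directly. */
/* stcode is a hypothetical name for the two-letter state code.     */
data join;
  set aids;
  state = stfips(stcode);   /* e.g. 'MD' gives 24, 'AK' gives 2 */
run;
```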
Proc GMAP
• proc gmap is used to create a number of different types of map
• the map we will be interested in is a choropleth map, in which the rates are color-coded by state
• such a map shares many of the properties of a chart, particularly a pie or star chart: both use areas to represent information, but in the choropleth map the color/shading carries the display information
Proc GMAP
• First we set up some global title and footnote statements:
title1 color=blue font=centb "Acquired immunodeficiency syndrome (AIDS) by state";
title2 font=cent "(per 100,000 of population)";
title3 font=cent "12 months ending June, 1998";
footnote1 color=green justify=left " Choropleth Map";
Proc GMAP
• the syntax of proc gmap is like that of other graphics procedures we have met, but it specifically requires:
– a map dataset (maps.us in this case)
– an id variable which is present in both the map dataset and the dataset we wish to map (in this case the variable state is in both datasets and contains the FIPS code)
– the syntax is: proc gmap map=map data=data; id idvar; choro rate / options; run;
Proc GMAP
title1 color=blue font=centb "Acquired immunodeficiency syndrome (AIDS) by state";
title2 font=cent "(per 100,000 of population)";
title3 font=cent "12 months ending June, 1998";
footnote1 color=green justify=left " Choropleth Map";
proc gmap map=maps.us data=join;
  id state;
  choro rate / coutline=black midpoints=5.0 10.0 15.0 20.0 25.0 35.0;
run;
quit;
Proc GMAP
Instead of a choropleth map, you could also make a surface map. For example:
proc gmap map=maps.us data=join;
  id state;
  surface rate / constant=20 cbody=red nlines=100;
  footnote1 color=green justify=left " Surface Map";
run;
Axis statement
defines the appearance and location of axes and tick marks
defines the text and appearance of the axis label
defines the order of data values on the axis
up to 99 AXIS statements can be active in a SAS session
Syntax: AXIS<1...99> <option(s)>;
Axis statement options
ORDER=(value-list) specifies the data values in the order they are to appear on the axis. The values specified by ORDER= are the major tick mark values. These values are displayed at the major tick marks unless they are modified by the VALUE= option.
Examples:
ORDER=(10 to 50 by 10)
ORDER=(10,20,30,40,50)
Axis statement options
LABEL=(text-description 'text-string'); By default, the text of the axis label is either the variable name or a previously assigned variable label. Enclose each string in quotation marks.
COLOR=text-color ANGLE=degrees FONT=font | NONE HEIGHT=text-height <units> JUSTIFY=LEFT | CENTER | RIGHT
Example: label=(font=swissb color=blue j=l a=90 'Systolic BP mmHg');
Axis statement options
VALUE=(text-description-1 'text-1' ... text-description-n 'text-n');
modifies the major tick mark values, that is, the text that labels the major tick marks on the axis. Text-description defines the appearance and 'text' is the text of a major tick mark value.
COLOR=text-color ANGLE=degrees FONT=font | NONE HEIGHT=text-height <units> JUSTIFY=LEFT | CENTER | RIGHT
Symbol statement
specifies symbols in GPLOT
defines the appearance of symbols and plot lines, including bars, boxes, confidence limits, and area fills
defines interpolation methods
SYMBOL<1...99> options;
COLOR= symbol color
FONT= font
HEIGHT=n <units>
INTERPOL= R<type> (regression) | STEP (for KM plots) | BOX
VALUE= symbol
WIDTH=n
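As an example of INTERPOL=R<type>, the sketch below asks for a linear regression line with 95% confidence limits for the mean (I=RLCLM95), using SASHELP.CLASS as stand-in data.

```sas
/* V= plotting symbol, I=RLCLM95 linear fit with 95% CLM, */
/* C= color, H= symbol height.                             */
symbol1 v=circle i=rlclm95 c=blue h=1.5;

proc gplot data=sashelp.class;
  plot weight*height;
run;
quit;
```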