summarizing your data - kocwcontents.kocw.net/kocw/document/2014/gacheon/kimnamh... · 2016. 9....
Summarizing Your Data
Statistical Data Analysis 1
Namhyoung Kim
Dept. of Applied Statistics
Gachon University
Contents
3 Using SAS Procedures
3.1 Summarizing Data Using PROC MEANS
3.2 Examining the Distribution of Data with PROC UNIVARIATE
3.3 Counting Data with PROC FREQ
3.4 Creating Graphics with PROC PLOT and PROC CHART
3.5 Other Statements
3.6 Interactive Data Analysis: SAS/INSIGHT
Procedure step
PROC procedure-name DATA=SAS-data-set options;
PROC: the keyword that begins a procedure step
procedure-name: the name of the procedure
DATA=SAS-data-set: the data set to use as input for that procedure
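The following minimal sketch shows this form in use. It assumes the SASHELP.CLASS sample data set that normally ships with Base SAS; any other data set name could be used instead. If DATA= is omitted, SAS uses the most recently created data set.

```sas
/* PROC keyword + procedure name (PRINT) + input data set (SASHELP.CLASS).
   The OBS=5 data set option limits the input to the first 5 observations. */
proc print data=sashelp.class (obs=5);
run;
```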
3.1 Summarizing Data Using PROC MEANS
The MEANS procedure provides simple statistics on numeric variables.
PROC MEANS DATA=SAS-data-set options statistic-keywords;
   BY variables;
   CLASS variables / options;
   FREQ variable;
   ID variables;
   OUTPUT OUT=SAS-data-set statistic-specifications / options;
   TYPES requests;
   VAR variables / WEIGHT=weight-variable;
   WAYS list;
   WEIGHT variable;
RUN;
Default statistics:
N (number of non-missing values)
MEAN (the mean)
STD (the standard deviation)
MIN (the minimum value)
MAX (the maximum value)
Other statistics you can request:
MEDIAN (the median)
MODE (the mode)
NMISS (number of missing values)
SUM (the sum)
CV (coefficient of variation)
RANGE (the range)
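This sketch contrasts the default output with a list of requested statistics. It assumes the SASHELP.CLASS sample data set. Statistic keywords go on the PROC MEANS statement. Listing any keyword replaces the default set. MAXDEC= controls how many decimal places are printed.

```sas
/* No keywords: prints the defaults N, MEAN, STD, MIN, MAX */
proc means data=sashelp.class;
   var height weight;
run;

/* Explicit keywords replace the defaults; MAXDEC=3 prints at most 3 decimals */
proc means data=sashelp.class n nmiss mean median std cv range sum maxdec=3;
   var height weight;
run;
```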
CLASS variable-list; The CLASS statement, like the BY statement, performs separate analyses for each level of the variables in the list. Unlike BY, it does not require the data to be sorted first, and it presents the results in a single table.
VAR variable-list; The VAR statement specifies which numeric variables to use in the analysis. If it is absent, SAS uses all numeric variables.
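The following sketch combines CLASS, VAR, and the OUTPUT statement from the syntax above. It assumes SASHELP.CLASS. The output names such as mean_height are illustrative choices, not required names.

```sas
/* One row of statistics per level of SEX; no PROC SORT is needed with CLASS */
proc means data=sashelp.class mean std maxdec=2;
   class sex;
   var height weight;
   /* Names after MEAN= and STD= are assigned in the order of the VAR list */
   output out=work.class_stats mean=mean_height mean_weight
                               std=sd_height sd_weight;
run;

proc print data=work.class_stats;
run;
```

The saved data set also contains the automatic variables _TYPE_ and _FREQ_. The row with _TYPE_=0 holds the statistics for all observations combined. Adding the NWAY option to the PROC MEANS statement keeps only the per-class rows.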
3.2 Examining the Distribution of Data with PROC UNIVARIATE
PROC UNIVARIATE produces statistics and graphs describing the distribution of a single variable.
The statistics include the mean, median, mode, standard deviation, skewness, and kurtosis.
The UNIVARIATE procedure can produce several graphs that are useful for data exploration:
CDFPLOT: a cumulative distribution function plot
HISTOGRAM: a histogram
PPPLOT: a probability-probability plot
PROBPLOT: a probability plot
QQPLOT: a quantile-quantile plot
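This sketch requests each of the listed graphs for one variable. It assumes SASHELP.CLASS and a SAS release with ODS Graphics (9.2 or later). The NORMAL options compare the data against a normal distribution. MU=EST and SIGMA=EST draw a reference line using parameters estimated from the data.

```sas
ods graphics on;

proc univariate data=sashelp.class;
   var height;                                    /* moments, quantiles, extremes */
   histogram height / normal;                     /* histogram with fitted normal curve */
   probplot  height / normal(mu=est sigma=est);   /* normal probability plot */
   qqplot    height / normal(mu=est sigma=est);   /* normal Q-Q plot */
   cdfplot   height;                              /* empirical CDF */
   ppplot    height / normal;                     /* P-P plot against a normal */
run;
```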
PROC UNIVARIATE DATA=SAS-data-set options;
   BY variables;
   CLASS variables / KEYLEVEL='value1' | ('value1' 'value2');
   FREQ variable;
   HISTOGRAM variables / options;
   ID variables;
   OUTPUT OUT=SAS-data-set statistic-keyword=names;
   PROBPLOT variables / options;
   QQPLOT variables / options;
   VAR variables;
   WEIGHT variable;
RUN;
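This sketch shows how CLASS and OUTPUT work together in PROC UNIVARIATE. It assumes SASHELP.CLASS, and the output variable names are illustrative. PCTLPTS= lists the requested percentiles. PCTLPRE= supplies the prefix for their variable names, which here produces P10 and P90.

```sas
/* NOPRINT suppresses the printed tables; only the OUT= data set is created */
proc univariate data=sashelp.class noprint;
   class sex;
   var height;
   output out=work.height_stats
          mean=mean_height median=median_height std=sd_height
          pctlpts=10 90 pctlpre=p;
run;

proc print data=work.height_stats;
run;
```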
PROC UNIVARIATE vs. PROC BOXPLOT
[Figure: side-by-side box plots titled ">>>>> Basic Statistics by Sex <<<<<"; x-axis: sex (F, M); y-axis: height (150 to 180)]
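A box plot like the one in this comparison can be sketched with PROC BOXPLOT, assuming SAS/STAT is licensed and using SASHELP.CLASS. PROC BOXPLOT expects the input to be sorted by the group variable, so the sketch sorts into a work copy first.

```sas
/* Sort into a work copy so the group variable SEX is in order */
proc sort data=sashelp.class out=work.class_sorted;
   by sex;
run;

proc boxplot data=work.class_sorted;
   plot height*sex;   /* analysis variable * group variable: one box per sex */
run;
```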
3.3 Counting Data with PROC FREQ
The most obvious reason for using PROC FREQ is to create tables showing the distribution of categorical data values.
PROC FREQ DATA=SAS-data-set options;
   BY variables;
   EXACT statistic-keywords / options;
   OUTPUT statistic-keywords OUT=SAS-data-set;
   TABLES requests / options;
   TEST statistic-keywords;
   WEIGHT variable;
RUN;
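TABLES statement
TABLES age;             one-way frequency table of age
TABLES age*gender;      two-way frequency table (rows: age, columns: gender)
TABLES a*b*c*d;         a separate c*d table for each combination of levels of a and b
TABLES a*(b c);         a*b and a*c frequency tables
TABLES (a b)*(c d);     a*c, a*d, b*c, and b*d frequency tables
TABLES (a b c)*d;       a*d, b*d, and c*d frequency tables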
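The following sketch applies several of these TABLES request forms to the categorical variables ORIGIN, TYPE, and DRIVETRAIN. It assumes the SASHELP.CARS sample data set. The MISSING option counts missing values as a category instead of excluding them.

```sas
proc freq data=sashelp.cars;
   tables origin type / missing;          /* two one-way tables */
   tables origin*type;                    /* rows = origin, columns = type */
   tables origin*(type drivetrain);       /* origin*type and origin*drivetrain */
   tables (origin drivetrain)*type / nopercent nocol;  /* fewer cell statistics */
run;
```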
Data with frequency variable
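When the data are already summarized (one row per cell plus a count), the WEIGHT statement tells PROC FREQ how many observations each row represents. This sketch assumes SASHELP.CARS. It builds such a summarized data set with the TABLES OUT= option and then rebuilds the table from it. The data set name work.cell_counts is illustrative.

```sas
/* Step 1: one row per (origin, type) cell; OUT= writes ORIGIN, TYPE, COUNT, PERCENT */
proc freq data=sashelp.cars noprint;
   tables origin*type / out=work.cell_counts;
run;

proc print data=work.cell_counts;
run;

/* Step 2: each row stands for COUNT observations, so this table matches the
   crosstabulation computed from the raw data */
proc freq data=work.cell_counts;
   tables origin*type;
   weight count;
run;
```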
3.4 Creating Graphics with PROC PLOT and PROC CHART
PROC PLOT and PROC CHART produce text-based (line-printer) graphics.
PROC PLOT DATA=SAS-data-set options;
   BY variables;
   PLOT plot-requests / options;
RUN;
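This sketch shows two plot requests against SASHELP.CLASS. In a request such as height*weight, the first variable goes on the vertical axis and the second on the horizontal axis. Adding =sex uses the first character of each SEX value (F or M) as the plotting symbol. PROC PLOT stays active after RUN, so QUIT ends the step.

```sas
proc plot data=sashelp.class;
   plot height*weight;          /* vertical * horizontal */
   plot height*weight = sex;    /* plotting symbol taken from SEX */
run;
quit;
```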
PROC CHART DATA=SAS-data-set options;
   BLOCK variables / options;
   BY variables;
   HBAR variables / options;
   PIE variables / options;
   STAR variables / options;
   VBAR variables / options;
RUN;
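This sketch shows several chart statements, assuming SASHELP.CLASS. DISCRETE gives one bar per distinct value of a numeric variable instead of grouping values into ranges. SUMVAR= with TYPE=MEAN makes each bar show the mean of another variable instead of a frequency.

```sas
proc chart data=sashelp.class;
   vbar sex;                              /* frequency of each sex */
   hbar age / discrete;                   /* one bar per distinct age */
   vbar sex / sumvar=height type=mean;    /* mean HEIGHT for each sex */
   pie sex;
run;
```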
3.5.1 OPTIONS statement
The OPTIONS statement is part of a SAS program and affects all steps that follow it.
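Common options
CENTER | NOCENTER
   Controls whether output is centered or left-justified. Default: CENTER.
DATE | NODATE
   Controls whether the current date is printed at the top of each page. Default: DATE.
LINESIZE = n
   Sets the maximum number of characters per line of output (64 to 256).
NUMBER | NONUMBER
   Controls whether page numbers are printed on each page. Default: NUMBER.
ORIENTATION = PORTRAIT | ORIENTATION = LANDSCAPE
   Specifies the orientation for printing output. Default: PORTRAIT.
PAGENO = n
   Starts numbering output pages with n. Default: 1.
PAGESIZE = n
   Sets the maximum number of lines per page of output (15 to 32767).
RIGHTMARGIN = n | LEFTMARGIN = n | TOPMARGIN = n | BOTTOMMARGIN = n
   Specifies the size of the margin (such as 0.75in or 2cm) to be used for printing output. Default: 0.00in.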
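This sketch shows that OPTIONS settings carry over to later steps, assuming SASHELP.CLASS. The values chosen are illustrative. The orientation and margin options mainly affect printer-style output such as PDF or RTF, not the text listing.

```sas
/* These settings apply to every step that follows */
options nocenter nodate nonumber linesize=80 pagesize=60;

proc print data=sashelp.class (obs=5);
run;

/* Back to centered, dated, numbered output; restart at page 1 in landscape */
options center date number pageno=1 orientation=landscape
        leftmargin=1in rightmargin=1in;
```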
3.5.2 TITLE, FOOTNOTE statement
TITLEn 'text'; or FOOTNOTEn 'text'; where n ranges from 1 to 10 (TITLE and FOOTNOTE without a number are the same as TITLE1 and FOOTNOTE1). You can put them anywhere in your program, but since they apply to the procedure output, it generally makes sense to put them with the procedure.
Titles and footnotes stay in effect until you replace them with new ones or cancel them with a null statement (TITLE; or FOOTNOTE;).
When you specify a new title or footnote, it replaces the old title or footnote with the same number and cancels those with a higher number.
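This sketch illustrates the replacement and cancellation rules, assuming SASHELP.CLASS. The title and footnote texts are arbitrary.

```sas
title1 'Class Data';
title2 'Height and Weight Summary';
footnote1 'Source: SASHELP.CLASS';

proc means data=sashelp.class;     /* printed with TITLE1, TITLE2, FOOTNOTE1 */
   var height weight;
run;

title2 'Height by Sex';            /* replaces TITLE2; TITLE1 is kept */
proc means data=sashelp.class;
   class sex;
   var height;
run;

title1 'New Main Title';           /* replaces TITLE1 and cancels TITLE2 */
proc print data=sashelp.class (obs=3);
run;

title;                             /* null statements cancel all titles */
footnote;                          /* and all footnotes */
```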
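Practice
Car SAS data set: 3.2 MEANS procedure
Print the mean, standard deviation, and sum of the variables mileage and reliable to three decimal places.
Save the mean, standard deviation, and sum of mileage and reliable to a SAS data set and examine its contents.
For each level of the variable manufact, print the number of observations, number of missing values, minimum, maximum, range, mean, standard deviation, and coefficient of variation of mileage and reliable.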
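Car SAS data set: 3.3 UNIVARIATE procedure
Print univariate descriptive statistics for the variables mileage and reliable.
Use the HISTOGRAM, PROBPLOT, and QQPLOT statements to draw a histogram, a probability plot, and a quantile-quantile plot of mileage and reliable.
For each level of the variable size, print univariate descriptive statistics for mileage and reliable.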
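Car SAS data set: 3.4 FREQ procedure
Print one-way frequency tables of size, manufact, reliable, and index, counting missing values as a category of each table (hint: use the MISSING option in the TABLES statement).
Print two-way contingency tables of size*manufact and size*index.
Save the frequency of each combination of levels of size*index to a SAS data set and examine its contents (hint: use the OUT= option in the TABLES statement).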