Summarizing Your Data

Statistical Data Analysis 1

Namhyoung Kim

Dept. of Applied Statistics

Gachon University

nhkim@gachon.ac.kr

Contents

3 Using SAS Procedures

3.1 Summarizing Data Using PROC MEANS

3.2 Examining the Distribution of Data with PROC UNIVARIATE

3.3 Counting Data with PROC FREQ

3.4 Creating Graphics with PROC PLOT and PROC CHART

3.5 Other statements

3.6 Interactive Data Analysis: SAS/INSIGHT

Procedure step

PROC procedure-name DATA=SAS-data-set options;

PROC: keyword that starts the procedure step

procedure-name: name of the procedure

DATA=SAS-data-set: data set to use as input for that procedure

3.1 Summarizing Data Using PROC MEANS

The MEANS procedure provides simple statistics on numeric variables.

PROC MEANS DATA=SAS-data-set options;
    BY variables;
    CLASS variables / options;
    FREQ variables;
    ID variables;
    OUTPUT OUT=SAS-data-set statistic-specifications / options;
    TYPES requests;
    VAR variables / WEIGHT=weight-variable;
    WAYS list;
    WEIGHT variable;
RUN;


Default statistics:

N (number of non-missing values)
MEAN (the mean)
STD (the standard deviation)
MIN (the minimum value)
MAX (the maximum value)

Additional statistics that can be requested:

MEDIAN (the median)
MODE (the mode)
NMISS (number of missing values)
SUM (the sum)
CV (coefficient of variation)
RANGE (the range)
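As a rough illustration of the statistic keywords listed above, the sketch below runs PROC MEANS twice: once with no keywords, which prints the default statistics, and once with an explicit keyword list. The data set SASHELP.CLASS (shipped with SAS) and the MAXDEC= setting are choices made for this example, not part of the slides.

```sas
/* SASHELP.CLASS ships with SAS and stands in here for your own data set. */

/* No keywords: prints the default statistics N, MEAN, STD, MIN, MAX. */
proc means data=sashelp.class;
run;

/* Explicit keywords: only the statistics named here are printed.      */
/* MAXDEC=2 limits the printed values to two decimal places.           */
proc means data=sashelp.class n nmiss mean median std cv range maxdec=2;
run;
```

Listing any keyword replaces the default set, so a default statistic such as MEAN has to be named again if it should still appear.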


CLASS variable-list;

Like the BY statement, the CLASS statement performs separate analyses for each level of the variables in the list, but the data do not have to be sorted first.

VAR variable-list;

The VAR statement specifies which numeric variables to use in the analysis. If it is absent, SAS uses all numeric variables.
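The CLASS and VAR statements described above, together with the OUTPUT statement from the syntax listing, might be combined as in the following sketch. SASHELP.CLASS and the names class_stats, mean_height, mean_weight, sd_height, and sd_weight are assumptions chosen for illustration.

```sas
/* Separate summaries of height and weight for each level of sex. */
proc means data=sashelp.class n mean std min max;
    class sex;            /* grouping variable; no prior sort needed */
    var height weight;    /* numeric analysis variables              */
run;

/* Same analysis, but written to a data set instead of printed.   */
/* class_stats and the mean_/sd_ variable names are hypothetical. */
proc means data=sashelp.class noprint;
    class sex;
    var height weight;
    output out=class_stats
           mean=mean_height mean_weight
           std=sd_height sd_weight;
run;

/* Inspect the saved statistics. */
proc print data=class_stats;
run;
```

The output data set also contains the automatic variables _TYPE_ and _FREQ_, which identify the level of grouping and the number of observations behind each row.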


3.2 Examining the Distribution of Data with PROC UNIVARIATE

PROC UNIVARIATE produces statistics and graphs describing the distribution of a single variable.

The statistics include the mean, median, mode, standard deviation, skewness, and kurtosis.

The UNIVARIATE procedure can produce several graphs that are useful for data exploration:

CDFPLOT: a cumulative distribution function plot
HISTOGRAM: a histogram
PPPLOT: a probability-probability plot
PROBPLOT: a probability plot
QQPLOT: a quantile-quantile plot


PROC UNIVARIATE DATA=SAS-data-set options;
    BY variables;
    CLASS variables </ KEYLEVEL='value1' | ('value1' 'value2')>;
    FREQ variables;
    HISTOGRAM variables / options;
    ID variables;
    OUTPUT OUT=SAS-data-set statistic-keywords=names;
    PROBPLOT variables / options;
    QQPLOT variables / options;
    VAR variables;
    WEIGHT variable;
RUN;
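A minimal sketch of the syntax above, assuming the SASHELP.CLASS data set that ships with SAS; the choice of the variable height and of the NORMAL options is for illustration only.

```sas
/* Descriptive statistics and distribution plots for one variable. */
proc univariate data=sashelp.class;
    var height;
    /* Histogram with a fitted normal density overlaid. */
    histogram height / normal;
    /* Probability and Q-Q plots against a normal distribution whose */
    /* mean and standard deviation are estimated from the data.      */
    probplot height / normal(mu=est sigma=est);
    qqplot   height / normal(mu=est sigma=est);
run;
```

Adding a CLASS statement (for example, class sex;) would repeat the statistics and plots for each level of the classification variable.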


PROC UNIVARIATE vs. PROC BOXPLOT

[Figure: box plots titled "Summary Statistics by Gender"; horizontal axis: gender (F, M); vertical axis ticks from 150 to 180]
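One way to compare the two procedures is to ask each for box plots of the same variable by group. The sketch below is not the code behind the slide's figure; it assumes SASHELP.CLASS, and class_sorted is a hypothetical name for the sorted copy.

```sas
/* Both procedures below need the data grouped by the group variable. */
proc sort data=sashelp.class out=class_sorted;
    by sex;
run;

/* PROC UNIVARIATE: the PLOTS option adds a box plot to the full */
/* statistical output, repeated for each BY group.               */
proc univariate data=class_sorted plots;
    by sex;
    var height;
run;

/* PROC BOXPLOT: side-by-side box plots only, one box per group. */
proc boxplot data=class_sorted;
    plot height*sex;
run;
```

PROC UNIVARIATE is the better fit when the statistics matter as much as the picture; PROC BOXPLOT is more direct when only the group comparison plot is wanted.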

3.3 Counting Data with PROC FREQ

The most obvious reason for using PROC FREQ is to create tables showing the distribution of categorical data values.

PROC FREQ DATA=SAS-data-set options;
    BY variables;
    EXACT statistic-keywords / options;
    OUTPUT statistic-keywords OUT=SAS-data-set;
    TABLES requests / options;
    TEST statistic-keywords;
    WEIGHT variable;
RUN;


TABLES statement

TABLES age;
TABLES age*gender;
TABLES a*b*c*d;
TABLES a*(b c);        produces the a*b and a*c frequency tables
TABLES (a b)*(c d);    produces the a*c, a*d, b*c, and b*d frequency tables
TABLES (a b c)*d;      produces the a*d, b*d, and c*d frequency tables
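The table requests above can be tried on any data set with a few categorical variables. The following sketch assumes SASHELP.CARS (shipped with SAS) and its variables origin, type, and drivetrain; these are stand-ins, not the variables used on the slides.

```sas
proc freq data=sashelp.cars;
    /* One-way frequency table. */
    tables origin;
    /* Two-way crosstabulation: rows = origin, columns = type. */
    tables origin*type;
    /* Grouping with parentheses: same as origin*type origin*drivetrain. */
    tables origin*(type drivetrain);
run;
```

In a request such as a*b, the first variable forms the rows and the last forms the columns; with more than two variables, the leading variables define the strata for which separate two-way tables are printed.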


Data with frequency variable
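When each row of the data already holds a count rather than a single observation, the WEIGHT statement from the syntax listing tells PROC FREQ to use that count as the cell frequency. The data set survey, its variables, and its values below are invented purely for illustration.

```sas
/* Hypothetical pre-aggregated data: one row per cell, with its count. */
data survey;
    input gender $ answer $ count;
    datalines;
F yes 12
F no 8
M yes 9
M no 11
;
run;

/* WEIGHT makes each row contribute 'count' observations to the table. */
proc freq data=survey;
    weight count;
    tables gender*answer;
run;
```

Without the WEIGHT statement, every cell of the table would show a frequency of 1, because each row would be counted as a single observation.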


3.4 Creating Graphics with PROC PLOT and PROC CHART

PROC PLOT and PROC CHART provide text-based graphics.

PROC PLOT DATA=SAS-data-set options;
    BY variables;
    PLOT plot-requests / options;
RUN;
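A minimal sketch of a plot request, assuming SASHELP.CLASS; the plotting symbol '*' is an arbitrary choice.

```sas
/* Text scatter plot: weight on the vertical axis, height on the */
/* horizontal axis, each point drawn with the character '*'.     */
proc plot data=sashelp.class;
    plot weight*height='*';
run;
quit;   /* PROC PLOT stays active after RUN; QUIT ends it. */
```

In a request of the form vertical*horizontal, the first variable goes on the vertical axis.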


PROC CHART DATA=SAS-data-set options;
    BLOCK variables / options;
    BY variables;
    HBAR variables / options;
    PIE variables / options;
    STAR variables / options;
    VBAR variables / options;
RUN;
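The chart statements above might be used as in this sketch, which again assumes SASHELP.CLASS; the variables and options are chosen only to show the pattern.

```sas
proc chart data=sashelp.class;
    /* Vertical bar chart of counts for each value of sex. */
    vbar sex;
    /* Horizontal bar chart; DISCRETE gives one bar per distinct age */
    /* instead of grouping the numeric values into intervals.        */
    hbar age / discrete;
    /* Bar height = mean of height within each level of sex. */
    vbar sex / sumvar=height type=mean;
run;
```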


3.5.1 OPTIONS statement


The OPTIONS statement is part of a SAS program and affects all steps that follow it.

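Common options

CENTER | NOCENTER
Controls whether output is centered or left-justified. Default: CENTER.

DATE | NODATE
Controls whether today's date appears at the top of each page of output. Default: DATE.

LINESIZE = n
Specifies the maximum length of output lines.

NUMBER | NONUMBER
Controls whether page numbers appear on each page of output. Default: NUMBER.

ORIENTATION = PORTRAIT | ORIENTATION = LANDSCAPE
Specifies the orientation for printing output. Default: PORTRAIT.

PAGENO = n
Specifies the starting page number for output. Default: 1.

PAGESIZE = n
Specifies the maximum number of lines per page of output.

RIGHTMARGIN = n, LEFTMARGIN = n, TOPMARGIN = n, BOTTOMMARGIN = n
Specifies the size of the margin (such as 0.75in or 2cm) to be used for printing output. Default: 0.00in.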
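As a small illustration, an OPTIONS statement placed at the top of a program could look like the sketch below. The particular values (80 columns, 60 lines) are arbitrary examples, and SASHELP.CLASS is assumed as the input data.

```sas
/* Left-justify output, suppress the date and page numbers, and */
/* set the line and page sizes for all steps that follow.       */
options nocenter nodate nonumber linesize=80 pagesize=60;

proc means data=sashelp.class;
run;

/* Later in the same program: turn page numbering back on, */
/* restarting the count at 1.                              */
options number pageno=1;
```

Because OPTIONS is a global statement, it sits outside DATA and PROC steps and stays in effect until another OPTIONS statement changes the same option or the session ends.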

3.5.2 TITLE, FOOTNOTE statement

TITLEn 'text'; or FOOTNOTEn 'text';

You can put them anywhere in your program, but since they apply to the procedure output it generally makes sense to put them with the procedure.

Titles and footnotes stay in effect until you replace them with new ones or cancel them with a null statement (TITLE; or FOOTNOTE;).

When you specify a new title or footnote, it replaces the old title or footnote with the same number and cancels those with a higher number.
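The replacement and cancellation rules can be seen in a short sketch like the following, which assumes SASHELP.CLASS; the title and footnote texts are invented for the example.

```sas
title1 'Class Data';
title2 'Summary Statistics';
footnote1 'Source: SASHELP.CLASS';

/* Output carries both title lines and the footnote. */
proc means data=sashelp.class;
    var height weight;
run;

/* Replaces only the second title line; TITLE1 and FOOTNOTE1 remain. */
title2 'Frequency of Sex';

proc freq data=sashelp.class;
    tables sex;
run;

/* Redefining TITLE1 replaces it and cancels TITLE2. */
title1 'Listing Without a Subtitle';

/* Null statements cancel all titles and all footnotes. */
title;
footnote;
```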


Practice

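Car SAS data set

3.2 MEANS procedure

Print the mean, standard deviation, and sum of the variables mileage and reliable to three decimal places.

Save the mean, standard deviation, and sum of the variables mileage and reliable to a SAS data set and examine its contents.

For each level of the variable manufact, print the number of observations, number of missing values, minimum, maximum, range, mean, standard deviation, and coefficient of variation of the variables mileage and reliable.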


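Car SAS data set

3.3 UNIVARIATE procedure

Print univariate descriptive statistics for the variables mileage and reliable.

Using the HISTOGRAM, PROBPLOT, and QQPLOT statements, produce a histogram, a probability plot, and a quantile-quantile plot for the variables mileage and reliable.

For each level of the variable size, print univariate descriptive statistics for the variables mileage and reliable.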


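Car SAS data set

3.4 FREQ procedure

Print one-way frequency tables for size, manufact, reliable, and index, including missing values as a category in the tables (hint: use the MISSING option in the TABLES statement).

Print two-way contingency tables for size*manufact and size*index.

Save the frequencies for each combination of levels of size*index to a SAS data set and examine its contents (hint: use the OUT= option in the TABLES statement).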
