
Summarizing Your Data

Statistical Data Analysis 1

Namhyoung Kim

Dept. of Applied Statistics

Gachon University

[email protected]


Contents

3 Using SAS Procedures
  3.1 Summarizing Data Using PROC MEANS
  3.2 Examining the Distribution of Data with PROC UNIVARIATE
  3.3 Counting Data with PROC FREQ
  3.4 Creating Graphics with PROC PLOT and PROC CHART
  3.5 Other statements
  3.6 Interactive Data Analysis: SAS/INSIGHT

Procedure step

PROC procedure-name DATA=SAS-data-set options;

  PROC: keyword
  procedure-name: name of the procedure
  DATA=SAS-data-set: data set to use as input for that procedure
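
As a minimal sketch of a procedure step, the following runs PROC PRINT on SASHELP.CLASS. The choice of procedure and data set is an assumption for illustration: SASHELP.CLASS is a sample data set shipped with SAS, not one used in these slides.

```sas
/* PROC is the keyword, PRINT is the procedure name,     */
/* and DATA= names the input data set for the procedure. */
/* SASHELP.CLASS is an assumed example data set.         */
proc print data=sashelp.class;
run;
```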


3.1 Summarizing Data Using PROC MEANS

The MEANS procedure provides simple statistics on numeric variables.

PROC MEANS DATA=SAS-data-set options;
  BY variables;
  CLASS variables / options;
  FREQ variable;
  ID variables;
  OUTPUT OUT=SAS-data-set statistic-specifications / options;
  TYPES requests;
  VAR variables / WEIGHT=weight-variable;
  WAYS list;
  WEIGHT variable;
RUN;


List of default statistics
  N (number of non-missing values)
  MEAN (the mean)
  STD (the standard deviation)
  MIN (the minimum value)
  MAX (the maximum value)

More options
  MEDIAN (the median)
  MODE (the mode)
  NMISS (number of missing values)
  SUM (the sum)
  CV (coefficient of variation)
  RANGE (the range)
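
Statistic keywords are listed as options on the PROC MEANS statement. Once any keyword is given, only the requested statistics are printed. A minimal sketch, assuming the SASHELP.CLASS sample data set and its variables height and weight (not the data used in these slides):

```sas
/* Request specific statistics instead of the defaults.  */
/* MAXDEC=2 limits the printed output to two decimals.   */
/* SASHELP.CLASS is an assumed example data set.         */
proc means data=sashelp.class n nmiss mean std median range cv maxdec=2;
  var height weight;
run;
```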


CLASS variable-list;
  The CLASS statement performs a separate analysis for each level of the variables in the list.

VAR variable-list;
  The VAR statement specifies which numeric variables to use in the analysis. If it is absent, SAS uses all numeric variables.
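
The two statements can be combined as in the sketch below. It assumes the SASHELP.CLASS sample data set, with sex as the grouping variable; these names are illustrative and do not come from the slides.

```sas
/* One row of statistics per level of sex (F, M),        */
/* computed only for the variables named in VAR.         */
/* SASHELP.CLASS is an assumed example data set.         */
proc means data=sashelp.class n mean std min max;
  class sex;
  var height weight;
run;
```

Unlike a BY statement, CLASS does not require the input data to be sorted by the grouping variable.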


3.2 Examining the Distribution of Data with PROC UNIVARIATE

PROC UNIVARIATE produces statistics and graphs describing the distribution of a single variable.

The statistics include the mean, median, mode, standard deviation, skewness, and kurtosis.

The UNIVARIATE procedure can produce several graphs that are useful for data exploration.
  CDFPLOT: a cumulative distribution function plot
  HISTOGRAM: a histogram
  PPPLOT: a probability-probability plot
  PROBPLOT: a probability plot
  QQPLOT: a quantile-quantile plot
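
A minimal sketch of requesting some of these graphs, assuming the SASHELP.CLASS sample data set and its variable height (illustrative names, not from the slides). The NORMAL option, which compares the data with a fitted normal distribution, is likewise a choice made for this example.

```sas
/* Descriptive statistics for height plus two graphs.    */
/* SASHELP.CLASS is an assumed example data set.         */
proc univariate data=sashelp.class;
  var height;
  histogram height / normal;                    /* histogram with fitted normal curve */
  qqplot height / normal(mu=est sigma=est);     /* Q-Q plot against a normal reference line */
run;
```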


PROC UNIVARIATE DATA=SAS-data-set options;
  BY variables;
  CLASS variables </ KEYLEVEL='value1' | ('value1' 'value2')>;
  FREQ variable;
  HISTOGRAM variables / options;
  ID variables;
  OUTPUT OUT=SAS-data-set statistic-keywords=names;
  PROBPLOT variables / options;
  QQPLOT variables / options;
  VAR variables;
  WEIGHT variable;
RUN;
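
The OUTPUT statement pairs each statistic keyword with a name for the new variable. A minimal sketch, assuming the SASHELP.CLASS sample data set; the output data set name height_stats and the variable names on the right of each equals sign are hypothetical.

```sas
/* Save selected statistics of height, one row per level of sex. */
/* NOPRINT suppresses the usual printed report.                   */
/* SASHELP.CLASS and height_stats are assumed example names.      */
proc univariate data=sashelp.class noprint;
  class sex;
  var height;
  output out=height_stats mean=mean_height median=median_height std=std_height;
run;

/* Look at the saved statistics. */
proc print data=height_stats;
run;
```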


PROC UNIVARIATE vs. PROC BOXPLOT

[Figure: plot titled "Basic Statistics by Gender"; horizontal axis: Gender (F, M); vertical axis ticks from 150 to 180]
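
A side-by-side box plot by group can be sketched with PROC BOXPLOT as follows. The code behind the slide's figure is not shown, so this is only an assumed illustration using the SASHELP.CLASS sample data set; class_sorted is a hypothetical data set name.

```sas
/* PROC BOXPLOT expects the data sorted by the group variable. */
/* SASHELP.CLASS and class_sorted are assumed example names.   */
proc sort data=sashelp.class out=class_sorted;
  by sex;
run;

/* analysis-variable * group-variable */
proc boxplot data=class_sorted;
  plot height*sex;
run;
```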


3.3 Counting Data with PROC FREQ

The most obvious reason for using PROC FREQ is to create tables showing the distribution of categorical data values.

PROC FREQ DATA=SAS-data-set options;
  BY variables;
  EXACT statistic-keywords / options;
  OUTPUT statistic-keywords OUT=SAS-data-set;
  TABLES requests / options;
  TEST statistic-keywords;
  WEIGHT variable;
RUN;


TABLES statement
  TABLES age;
  TABLES age*gender;
  TABLES a*b*c*d;
  TABLES a*(b c);        a*b, a*c frequency tables
  TABLES (a b)*(c d);    a*c, a*d, b*c, b*d frequency tables
  TABLES (a b c)*d;      a*d, b*d, c*d frequency tables
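
The request forms above can be tried on any data set with categorical variables. A minimal sketch, assuming the SASHELP.CARS sample data set and its variables origin, type, and drivetrain (illustrative, not the data used in these slides):

```sas
/* SASHELP.CARS is an assumed example data set. */
proc freq data=sashelp.cars;
  tables origin;                      /* one-way frequency table */
  tables origin*type;                 /* two-way table: rows = origin, columns = type */
  tables origin*(type drivetrain);    /* same as origin*type and origin*drivetrain */
run;
```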


Data with frequency variable
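
When each row of the data already carries a count, the WEIGHT statement tells PROC FREQ to use that count instead of counting rows. A minimal sketch, assuming a hypothetical data set survey_counts whose variable names and values are made up for illustration:

```sas
/* Each row is one cell of the table; count holds its frequency. */
/* All names and numbers here are hypothetical.                   */
data survey_counts;
  input gender $ answer $ count;
  datalines;
F yes 12
F no 8
M yes 9
M no 11
;
run;

/* Without WEIGHT, every cell would show a frequency of 1. */
proc freq data=survey_counts;
  tables gender*answer;
  weight count;
run;
```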


3.4 Creating Graphics with PROC PLOT and PROC CHART

PROC PLOT and PROC CHART produce text-based graphics, drawn with ordinary printable characters.

PROC PLOT DATA=SAS-data-set options;
  BY variables;
  PLOT plot-requests / options;
RUN;
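
A plot request has the form vertical*horizontal, optionally followed by a plotting symbol. A minimal sketch, assuming the SASHELP.CLASS sample data set (illustrative, not the data used in these slides):

```sas
/* SASHELP.CLASS is an assumed example data set. */
proc plot data=sashelp.class;
  plot weight*height;         /* weight on the vertical axis, height on the horizontal */
  plot weight*height='*';     /* use * as the plotting symbol */
  plot weight*height=sex;     /* use the first character of sex (F or M) as the symbol */
run;
quit;
```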


PROC CHART DATA=SAS-data-set options;
  BLOCK variables / options;
  BY variables;
  HBAR variables / options;
  PIE variables / options;
  STAR variables / options;
  VBAR variables / options;
RUN;
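
A minimal sketch of two chart requests, assuming the SASHELP.CLASS sample data set (illustrative, not the data used in these slides). The options DISCRETE, SUMVAR=, and TYPE= are choices made for this example.

```sas
/* SASHELP.CLASS is an assumed example data set. */
proc chart data=sashelp.class;
  vbar age / discrete;                    /* one vertical bar per distinct age value */
  hbar sex / sumvar=height type=mean;     /* horizontal bars showing mean height by sex */
run;
```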


3.5.1 OPTIONS statement

The OPTIONS statement is part of a SAS program and affects all steps that follow it.

Common options

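CENTER | NOCENTER
  Controls whether output is centered or left-justified.

DATE | NODATE
  Controls whether the date appears at the top of each page of output.

LINESIZE = n
  Specifies the maximum length of output lines.

NUMBER | NONUMBER
  Controls whether page numbers appear on each page of output.

ORIENTATION = PORTRAIT | LANDSCAPE
  Specifies the orientation for printing output. Default: PORTRAIT.

PAGENO = n
  Specifies the starting page number for output.

PAGESIZE = n
  Specifies the number of lines per page of output.

RIGHTMARGIN = n, LEFTMARGIN = n, TOPMARGIN = n, BOTTOMMARGIN = n
  Specifies the size of the margin (such as 0.75in or 2cm) to be used for printing output. Default: 0.00in.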
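
Several options can be set in one OPTIONS statement. A minimal sketch with values chosen only for illustration; the SASHELP.CLASS sample data set is an assumption and is not used in these slides.

```sas
/* Applies to every step that follows in the program. */
options nodate nonumber nocenter linesize=80 pagesize=60;

/* This listing is printed without date or page number, left-justified. */
/* SASHELP.CLASS is an assumed example data set.                        */
proc print data=sashelp.class;
run;
```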


3.5.2 TITLE, FOOTNOTE statement

TITLEn 'text'; or FOOTNOTEn 'text';

You can put them anywhere in your program, but since they apply to procedure output, it generally makes sense to put them with the procedure.

Titles and footnotes stay in effect until you replace them with new ones or cancel them with a null statement (TITLE; or FOOTNOTE;).

When you specify a new title or footnote, it replaces the old title or footnote with the same number and cancels those with a higher number.
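
The replacement and cancellation rules can be seen in a short sketch. The title and footnote texts are made up, and the SASHELP.CLASS sample data set is an assumption, not the data used in these slides.

```sas
/* Two title lines and one footnote for the first procedure. */
title1 'Class data';
title2 'Summary of height and weight';
footnote 'Source: SASHELP.CLASS';

proc means data=sashelp.class;
  var height weight;
run;

/* Replaces TITLE2 only; TITLE1 and the footnote stay in effect. */
title2 'Listing of all observations';

proc print data=sashelp.class;
run;

/* Null statements cancel all titles and all footnotes. */
title;
footnote;
```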


Practice

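Car SAS data set

3.2 MEANS procedure
  Print the mean, standard deviation, and sum of the variables mileage and reliable to three decimal places.
  Save the mean, standard deviation, and sum of the variables mileage and reliable in a SAS data set and examine its contents.
  For each level of the variable manufact, print the number of observations, number of missing values, minimum, maximum, range, mean, standard deviation, and coefficient of variation of the variables mileage and reliable.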


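Car SAS data set

3.3 UNIVARIATE procedure
  Print univariate descriptive statistics for the variables mileage and reliable.
  Use the HISTOGRAM, PROBPLOT, and QQPLOT statements to produce histograms, probability plots, and quantile-quantile plots for the variables mileage and reliable.
  For each level of the variable size, print univariate descriptive statistics for the variables mileage and reliable.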


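Car SAS data set

3.4 FREQ procedure
  Print one-way frequency tables for size, manufact, reliable, and index, including missing values as a category in each table (hint: use the MISSING option in the TABLES statement).
  Print two-way contingency tables for size*manufact and size*index.
  Save the frequencies for each combination of levels of size*index in a SAS data set and examine its contents (hint: use the OUT= option in the TABLES statement).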
