time series data processes

24
Time Series Data Processes by Tai Yu April 15, 2013

Upload: danyl

Post on 04-Feb-2016

47 views

Category:

Documents


0 download

DESCRIPTION

Time Series Data Processes. by Tai Yu April 15, 2013. WHAT ARE YOU DOING???. 61 %macro get_data(mn_yr0 ,mn_yr1 ,mn_yr2 ,mn_yr3 ,mn_yr4 ,mn_yr5, mn_yr6); 62 63 data asof_&mn_yr6. asof_&mn_yr5. asof_&mn_yr4. - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Time Series Data Processes

Time Series Data Processes

by

Tai Yu

April 15, 2013

Page 2: Time Series Data Processes

• 61 %macro get_data(mn_yr0 ,mn_yr1 ,mn_yr2 ,mn_yr3 ,mn_yr4 ,mn_yr5, mn_yr6);

• 62• 63 data asof_&mn_yr6. asof_&mn_yr5. asof_&mn_yr4. • asof_&mn_yr3. asof_&mn_yr2. asof_&mn_yr1.;• • 64 set wfpf_delinq_data;• 65• 66 if daysdelq<1 THEN DPD=0; • else if daysdelq>180 THEN DPD=180; else DPD=daysdelq;• 67• 68 if "28&mn_yr1.:00:00:00"dt<=ASOF_DT<"01&mn_yr0.:00:00:00"dt • then do;dpd_1=dpd; output asof_&mn_yr1.; end;• 69 else if "28&mn_yr2.:00:00:00"dt<=ASOF_DT<"01&mn_yr1.:00:00:00"dt • then do; dpd_2=dpd; output asof_&mn_yr2.; end;• 70 else if "28&mn_yr3.:00:00:00"dt<=ASOF_DT<"01&mn_yr2.:00:00:00"dt • then do; dpd_3=dpd; output asof_&mn_yr3.; end;• 71 else if "28&mn_yr4.:00:00:00"dt<=ASOF_DT<"01&mn_yr3.:00:00:00"dt • then do; dpd_4=dpd; output asof_&mn_yr4.; end;• 72 else if "28&mn_yr5.:00:00:00"dt<=ASOF_DT<"01&mn_yr4:00:00:00"dt • then do; dpd_5=dpd; output asof_&mn_yr5.; end;• 73 else if "28&mn_yr6.:00:00:00"dt<=ASOF_DT<"01&mn_yr5.:00:00:00"dt • then do; dpd_6=dpd; output asof_&mn_yr6.; end;• 74• 75 run;

WHAT ARE YOU DOING???

2

Page 3: Time Series Data Processes

• 84 data _asof_&mn_yr1.; set asof_&mn_yr1.; drop dpd_2--dpd_6; run;• 85 data _asof_&mn_yr2.; set asof_&mn_yr2.; drop dpd_1 dpd_3--dpd_6; run;• 86 data _asof_&mn_yr3.; set asof_&mn_yr3.; drop dpd_1--dpd_2 dpd_4--dpd_6; run;• 87 data _asof_&mn_yr4.; set asof_&mn_yr4.; drop dpd_1--dpd_3 dpd_5--dpd_6; run;• 88 data _asof_&mn_yr5.; set asof_&mn_yr5.; drop dpd_1--dpd_4 dpd_6 ; run;• 89 data _asof_&mn_yr6.; set asof_&mn_yr6.; drop dpd_1--dpd_5 ; run;• 90• 91 data Cohort_12mn_dpd_&mn_yr1.;• 92 merge _asof_&mn_yr6. _asof_&mn_yr5. _asof_&mn_yr4. _asof_&mn_yr3. • _asof_&mn_yr2. _asof_&mn_yr1. (in=a);• • 93 by acct_id;• 94 if a;• 95 run;• 96• 97 %mend; • 98• 99 get_data(Sep2006,• Aug2006,Jul2006,Jun2006,May2006,Apr2006, Mar2006,);

WHAT ARE YOU DOING???

3

Page 4: Time Series Data Processes

• 61 %macro get_data(mn_yr0 ,mn_yr1 ,mn_yr2 ,mn_yr3 ,mn_yr4 ,mn_yr5 , mn_yr6 , • 62 mn_yr6, mn_yr7 ,mn_yr8 ,mn_yr9 ,mn_yr10,mn_yr11 , mn_yr12);

• 63 data asof_&mn_yr12. asof_&mn_yr11. asof_&mn_yr10. asof_&mn_yr9. asof_&mn_yr8. asof_&mn_yr7. • 64 asof_&mn_yr6. asof_&mn_yr5. asof_&mn_yr4. asof_&mn_yr3. asof_&mn_yr2. asof_&mn_yr1. ;• 65 set wfpf_delinq_data;• 66• 67 if daysdelq<1 THEN DPD=0; else if daysdelq>180 THEN DPD=180; else DPD=daysdelq;• 68• 69 if "28&mn_yr1.:00:00:00"dt<=ASOF_DT<"01& asof_&mn_yr0. :00:00:00"dt • then do;dpd_1=dpd; output asof_&mn_yr1.; end;• 70 else if "28&mn_yr2.:00:00:00"dt<=ASOF_DT<"01&mn_yr1.:00:00:00"dt • then do; dpd_2=dpd; output asof_&mn_yr2.; end;• 71 else if "28&mn_yr3.:00:00:00"dt<=ASOF_DT<"01&mn_yr2.:00:00:00"dt • then do; dpd_3=dpd; output asof_&mn_yr3.; end;• 72 else if "28&mn_yr4.:00:00:00"dt<=ASOF_DT<"01&mn_yr3.:00:00:00"dt • then do; dpd_4=dpd; output asof_&mn_yr4.; end;• 73 else if "28&mn_yr5.:00:00:00"dt<=ASOF_DT<"01&mn_yr4:00:00:00"dt • then do; dpd_5=dpd; output asof_&mn_yr5.; end;• 74 else if "28&mn_yr6.:00:00:00"dt<=ASOF_DT<"01&mn_yr5.:00:00:00"dt • then do; dpd_6=dpd; output asof_&mn_yr6.; end;• 75 else if "28&mn_yr7.:00:00:00"dt<=ASOF_DT<"01&mn_yr6.:00:00:00"dt • then do; dpd_7=dpd; output asof_&mn_yr7.; end;• 76 else if "28&mn_yr8.:00:00:00"dt<=ASOF_DT<"01&mn_yr7.:00:00:00"dt • then do; dpd_8=dpd; output asof_&mn_yr8.; end;• 77 else if "28&mn_yr9.:00:00:00"dt<=ASOF_DT<"01&mn_yr8.:00:00:00"dt • then do; dpd_9=dpd; output asof_&mn_yr9.; end;• 78 else if "28&mn_yr10.:00:00:00"dt<=ASOF_DT<"01&mn_yr9.:00:00:00"dt • then do; dpd_10=dpd; output asof_&mn_yr10.; end;• 79 else if "28&mn_yr11.:00:00:00"dt<=ASOF_DT<"01&mn_yr10.:00:00:00"dt • then do;dpd_11=dpd; output asof_&mn_yr11.;end;• 80 else if "28&mn_yr12.:00:00:00"dt<=ASOF_DT<"01&mn_yr11.:00:00:00"dt • then do;dpd_12=dpd; output asof_&mn_yr12.; end;• 81 run;

WHAT ARE YOU DOING???

4

Page 5: Time Series Data Processes

• 82• 83 data _asof_&mn_yr1.; set asof_&mn_yr1.; drop dpd_2--dpd_12; run;• 84 data _asof_&mn_yr2.; set asof_&mn_yr2.; drop dpd_1 dpd_3--dpd_12; run;• 85 data _asof_&mn_yr3.; set asof_&mn_yr3.; drop dpd_1--dpd_2 dpd_4--dpd_12; run;• 86 data _asof_&mn_yr4.; set asof_&mn_yr4.; drop dpd_1--dpd_3 dpd_5--dpd_12; run;• 87 data _asof_&mn_yr5.; set asof_&mn_yr5.; drop dpd_1--dpd_4 dpd_6--dpd_12; run;• 88 data _asof_&mn_yr6.; set asof_&mn_yr6.; drop dpd_1--dpd_5 dpd_7--dpd_12; run;• 89 data _asof_&mn_yr7.; set asof_&mn_yr7.; drop dpd_1--dpd_6 dpd_8--dpd_12; run;• 90 data _asof_&mn_yr8.; set asof_&mn_yr8.; drop dpd_1--dpd_7 dpd_9--dpd_12; run;• 91 data _asof_&mn_yr9.; set asof_&mn_yr9.; drop dpd_1--dpd_8 dpd_10--dpd_12; run;• 92 data _asof_&mn_yr10.; set asof_&mn_yr10.; drop dpd_1--dpd_9 dpd_11--dpd_12; run;• 93 data _asof_&mn_yr11.; set asof_&mn_yr11.; drop dpd_1--dpd_10 dpd_12; run;• 94 data _asof_&mn_yr12.; set asof_&mn_yr12.; drop dpd_1--dpd_11 ; run;• 95• 96 data Cohort_12mn_dpd_&mn_yr1.;• 97 merge _asof_&mn_yr12. _asof_&mn_yr11. _asof_&mn_yr10. _asof_&mn_yr9. _asof_&mn_yr8. • 98 _asof_&mn_yr7. _asof_&mn_yr6. _asof_&mn_yr5. _asof_&mn_yr4. _asof_&mn_yr3.

_asof_&mn_yr2. • 99 _asof_&mn_yr1.(in=a);• 100 by acct_id;• 101 if a;• 102 run;• 103• 104 %mend;• 105• 106 get_data(Sep2006,• Aug2006,Jul2006,Jun2006,May2006,Apr2006, Mar2006, • Feb2006,Jan2006,Dec2005,Nov2005,Oct2005,Sep2005);

WHAT ARE YOU DOING???

5

Page 6: Time Series Data Processes

What is Time Series Data?• Definition of Time Series:

A time series is a collection of observations of well-defined data items obtained through repeated measurements over time.

(by Australian Bureau of Statistics)

An ordered sequence of values of a variable at equally spaced time intervals. (by Engineering Statistics Handbook)

6

Page 7: Time Series Data Processes

What is Time Series Data?• For example,

the monthly delinquent status of a customer over 12-month period

Obs id asof_dt dpd

1 106794 08/31/06 108

2 106794 07/31/06 77

3 106794 06/30/06 46

4 106794 05/31/06 16

5 106794 04/30/06 15

6 106794 03/31/06 16

7 106794 02/28/06 13

8 106794 01/31/06 16

9 106794 12/31/05 16

10 106794 11/30/05 15

11 106794 10/31/05 0

12 106794 09/30/05 07

Page 8: Time Series Data Processes

Stages of Time Series Analysis

1. Analyze data to obtain an understanding of the underlying drivers that produced the collected data.

2. Develop model(s) to forecast possible outcomes through the collected data.

3. Compare monitoring results with predicted outcomes to make appropriate control process modification(s).

8

Page 9: Time Series Data Processes

Applications of Time Series Data

Stock Market

Inventory

Workload Sales 9

Page 10: Time Series Data Processes

A Typical Time Series Data Process

Transpose 12 monthly delinquent status observations of account

to a single account observation with 12-month delinquent status

Obs acct_id asof_dt dpd

1 106794 08/31/06 108

2 106794 07/31/06 77

3 106794 06/30/06 46

4 106794 05/31/06 16

5 106794 04/30/06 15

6 106794 03/31/06 16

7 106794 02/28/06 13

8 106794 01/31/06 16

9 106794 12/31/05 16

10 106794 11/30/05 15

11 106794 10/31/05 0

12 106794 09/30/05 0

Obs acct_id perf_obs_dt dpd_1 dpd_2 dpd_3 dpd_4 dpd_5 dpd_6 dpd_7 dpd_8 dpd_9 dpd_10 dpd_11 dpd_12

1 106794 08/31/06 108 77 46 16 15 16 13 16 16 15 0 0

10

Page 11: Time Series Data Processes

Time Series Data by SAS Procedure • 001 %macro DLQ_12_Month(perf_obs_date);• 002 • 003 data perf_12_month;• 004 set acct_basel_dpd;• 005• 006 format perf_obs_dt date9.;• 007• 008 perf_obs_dt = intnx("MONTH","&perf_obs_date."d,0,'END');• 009• 010 dlq_status_month = intck("MONTH", datepart(asof_dt)), perf_obs_dt ) + 1;• 011• 012 if 1 <= dlq_status_month <= 12; • 013 run;• 014• 015 proc sort data = perf_12_month;• 016 by acct_id perf_obs_dt dlq_status_month;• 017 run; • 018

11

Page 12: Time Series Data Processes

Time Series Data by SAS Procedure After executing line 003 to line 018, the data set is now withtwo additional variables:

Performance Observed Date (perf_obs_dt) Delinquent Status Month (dlq_status_month) obs acct_id asof_dt basel_dpd perf_obs_dt dlq_status_month

1 106794 08/31/2006 108 08/31/2006 1

2 106794 07/31/2006 77 08/31/2006 2

3 106794 06/30/2006 46 08/31/2006 3

4 106794 05/31/2006 16 08/31/2006 4

5 106794 04/30/2006 15 08/31/2006 5

6 106794 03/31/2006 16 08/31/2006 6

7 106794 02/28/2006 13 08/31/2006 7

8 106794 01/31/2006 16 08/31/2006 8

9 106794 12/31/2005 16 08/31/2006 9

10 106794 11/30/2005 15 08/31/2006 10

11 106794 10/31/2005 0 08/31/2006 11

12 106794 09/30/2005 0 08/31/2006 12

12

Page 13: Time Series Data Processes

Time Series Data by SAS Procedure • 018• 019 proc transpose data = perf_12_month• 020 out = Cohort_12mn_dpd_&perf_obs_date• 021 (drop = _name_ where=(dpd1 ^=.))• 022 prefix = dpd• 023 ;• 024 by acct_id perf_obs_dt;• 025• 026 id dlq_status_month;• 027• 028 var basel_dpd;• 029 run;• 030• 031 %mend;• 032• 033 %DLQ_12_Month(01AUG2006);

PROC TRANSPOSE: 1. Transposes the variable basel_dpd by acct_id perf_obs_dt 2. Creates new variables dpd_1 to dpd_12 by PREFIX and ID options

13

Page 14: Time Series Data Processes

Time Series Data by SAS Procedure After executing line 019 to line 029, the new data set is a single account observation with 12-month delinquent status

Obs acct_id perf_obs_dt dpd_1 dpd_2 dpd_3 dpd_4 dpd_5 dpd_6 dpd_7 dpd_8 dpd_9 dpd_10 dpd_11 dpd_12

1 106794 08/31/06 108 77 46 16 15 16 13 16 16 15 0 0

14

Page 15: Time Series Data Processes

Time Series Data by SAS Data Step• 019 data Cohort_12mn_dpd_&perf_obs_date (keep = acct_id perf_obs_dt dpd1 - dpd12);• 020 set perf_12_month; • 021 by acct_id perf_obs_dt dlq_status_month;• 022 array dpd[12] dpd1 - dpd12;• 023 retain dpd1 - dpd12 j 0;• 024 • 025 if first.acct_id and first.perf_obs_dt • 026 then do;• 027 do i = 1 to 12;• 028 dpd[i] = 0;• 029 end;• 030 j = 0;• 031 end;• 032 • 033 j = j + 1;• 034• 035 dpd[j] = basel_dpd;• 036• 037 if last.acct_id and last.perf_obs_dt ; • 038 run;• 039 %mend;• 040• 041 %DLQ_12_Month(01AUG2006);

15

Page 16: Time Series Data Processes

Time Series Data by SAS Data StepSAS Data Step:

1. Declares ARRAY DPD to create new variable DPD_1 to DPD_12

2. Declares RETAIN to pass the values of variable DPD_1 to DPD_12 from one observation to the next observation

3. Initiates the values of variable DPD_1 to DPD_12 to 0s when the first account id and the first observation date are detected and neutralizes index J

4. Assigns the values of variable DPD_1 to DPD_12 by index J

5. Outputs the values of variable DPD_1 to DPD_12 to new data set only when the last account id and the last observation date is detected

16

Page 17: Time Series Data Processes

Time Series Data by SAS Function • 019 data Cohort_12mn_dpd_&perf_obs_date • 020 (keep = acct_id perf_obs_dt dpd1 - dpd12);• 021 set perf_12_month; • 022 by acct_id perf_obs_dt dlq_status_month;• 023 array dpd[12] dpd1 - dpd12;;• 024 • 025 %do i = 1 %to 12;• 026 %let j = %eval(12 - &i);• 027 dpd[&i] = lag&j(basel_dpd);• 028 %end;• 029• 030 if dlq_status_month = 12; • 031 run;• 032• 033 %mend;• 034• 035 %DLQ_12_Month(01AUG2006);

17

Page 18: Time Series Data Processes

Time Series Data by SAS Function • SAS LAG Function: 1. Stores a value in a queue and returns a value stored previously in that queue.

2. Each occurrence of a LAGn function in a program generates its own queue of values.

3. When an occurrence of LAGn is executed, the value at the top of its queue is removed and returned, the remaining values are shifted upwards, and the new value of the argument is placed at the bottom of the queue. observation of the prior execution.

18

obs acct_id asof_dt basel_dpd perf_obs_dt dlq_status_month DPD LAG

1 106794 08/31/2006 108 08/31/2006 1 DPD_12 106794 07/31/2006 77 08/31/2006 2 DPD_23 106794 06/30/2006 46 08/31/2006 3 DPD_34 106794 05/31/2006 16 08/31/2006 4 DPD_45 106794 04/30/2006 15 08/31/2006 5 DPD_56 106794 03/31/2006 16 08/31/2006 6 DPD_67 106794 02/28/2006 13 08/31/2006 7 DPD_78 106794 01/31/2006 16 08/31/2006 8 DPD_89 106794 12/31/2005 16 08/31/2006 9 DPD_9

10 106794 11/30/2005 15 08/31/2006 10 DPD_1011 106794 10/31/2005 0 08/31/2006 11 DPD_1112 106794 09/30/2005 0 08/31/2006 12 DPD_12 LAG0

LAG1

LAG2LAG3LAG4

LAG5LAG6

LAG7LAG8

LAG9LAG10

LAG11

Page 19: Time Series Data Processes

Time Series Data by SAS Function • 019 data Cohort_12mn_dpd_&perf_obs_date • 020 (keep = acct_id perf_obs_dt dpd1 - dpd12);• 021 set perf_12_month; • 022 by acct_id perf_obs_dt dlq_status_month;• 023 array dpd[12] dpd1 - dpd12;;• 024 • 025 if dlq_status_month = 12; • 026• 027 %do i = 1 %to 12;• 028 %let j = %eval(12 - &i);• 029 dpd[&i] = lag&j(basel_dpd);• 030 %end;• 031• 032 * if dlq_status_month = 12; • 033 run;• 034• 035 %mend;• 036• 037 %DLQ_12_Month(01AUG2006);

19

Page 20: Time Series Data Processes

Time Series Data by SAS Function SAS LAG Function: 1. The queue for each occurrence of LAGn is initialized with n missing values.

2. Missing values are returned for the first n executions of each occurrence of LAGn, after which the lagged values of the argument begin to appear.

3. Stores values at the bottom of the queue and returns values from the top of the queue occurs only when the function is executed. An occurrence of the LAGn function that is executed conditionally will store and return values only from the observations for which the condition is satisfied.

Obs acct_id perf_obs_dt dpd1 dpd2 dpd3 dpd4 dpd5 dpd6 dpd7 dpd8 dpd9 dpd10 dpd11 dpd12

1 106794 08/31/06 . . . . . . . . . . . 0

20

Page 21: Time Series Data Processes

Time Series Data by SAS Function Special Case: When not all time series is populated. The sub-setting IF statement (if dlq_status_month = 12;) will never be satisfied. SAS returns no observation tothe output dataset.

21

obs acct_id asof_dt basel_dpd dlq_status_month

1 10679 08/31/2006 10 1

2 10679 07/31/2006 7 2

3 10679 06/30/2006 4 3

4 10679 05/31/2006 6 4

5 10679 04/30/2006 5 5

6 10679 03/31/2006 6 6

7 10679 02/28/2006 3 7

8 10679 01/31/2006 6 8

9 10679 12/31/2005 6 9

Page 22: Time Series Data Processes

Time Series Data by SAS Function • 019 proc sort data = perf_12_month;• 020 by acct_id load_dt descending dlq_status_month;• 021 run; • 022• 023 data Cohort_12mn_dpd_&perf_obs_date (keep = acct_id load_dt dpd1 - dpd12);• 024 set perf_12_month; • 025 by acct_id load_dt descending dlq_status_month;• 026 • 027 array dpd[12] dpd1 - dpd12;;• 028 • 029 %do i = 1 %to 12;• 030 %let j = %eval(&i - 1);• 031 dpd[&i] = lag&j(basel_dpd);• 032 %end;• 033• 034 if dlq_status_month = 1; • 035 run;• 036 %mend;• 037• 038 %DLQ_12_Month(01AUG2006);

22

Page 23: Time Series Data Processes

Time Series Data by SAS Function Special Case: When not all time series is populated.

By 1. Sorting the variable dlq_status_month in descending order 2. Conditioning the sub-setting IF statement is “True” when the beginning of

the time series date period (if dlq_status_month = 1) is reached.

23

Obs acct_id load_dt dpd1 dpd2 dpd3 dpd4 dpd5 dpd6 dpd7 dpd8 dpd9 dpd10 dpd11 dpd12

1 10679 31-Aug-06 10 7 4 6 5 6 3 6 6 . . .

Page 24: Time Series Data Processes

Weakness and Strength of Each Approach

Approach Strength Weakness==================================================

=

PROC TRANSPOSE

DATA STEP

LAG FUNCTION

24

Easy Coding Limited Variable

Flexible Manipulating Initialization

Self Explanatory Conditional Execution