Enter the Matrix: Proc IML for Neophytes
September 20, 2019
Toronto Area SAS Society
Erik Johnson
Senior Analyst
Credit Risk Analytics
Motivation
This is your last chance. After this, there is no turning back. You take the blue pill - the story ends, you wake up in your bed and believe whatever you want to believe. You take the red pill - you stay in Wonderland and I show you how deep the rabbit-hole goes.
Morpheus, The Matrix (1999)
TASS | Proc IML for beginners 2
Welcome to Wonderland
• SAS user mantra: “there’s a PROC for that”
• Until there isn’t …
• Enter Proc IML
– Solving a set of equations
– Statistical analysis
– Custom algorithms
– Matrix algebra makes the above much easier
– Incorporate R code into SAS ☺
Trigger Warning
There will be math …
But maybe you took the blue pill?
• It’s a Base-SAS world, we’re just living in it: https://www.lexjansen.com/pharmasug/2010/CC/CC15.pdf
• You can still do matrix algebra in Base SAS …
You’ll be fine ..
Oh crap ...
Shut the front door
What is the matrix?
• “The matrix is a system …”
• Or a collection of numbers structured into rows and columns (which can impart meaning)
$$A = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}$$
Back to Wonderland
• Importing/Exporting data to/from IML and SAS
• Matrix Algebra
• Statistics in IML
• Simulations
• If you want me back for “advanced” IML
– (Custom) Model Validation
– Optimization Methods in IML
– Advanced Statistics in IML
Basic IML syntax
proc iml <symsize=n1> <worksize=n2>;
<iml/sas code>;
<iml/sas code>;
<iml/sas code>;
quit;
A word on worksize
• If you don’t specify WORKSIZE, SAS uses a host-dependent default; the value is in kilobytes
• SYMSIZE allocates memory to PROC IML’s symbol space
– This is where the names of matrices are stored
– There are two kinds of symbols: local and global
– Local symbols are defined for each module that has arguments
Reading datasets into IML from SAS I
/* Importing data into SAS/IML */
proc iml;
use work.my_data;
read all var _ALL_ into matrix[colname=varNames];
close work.my_data;
print matrix;
quit;
Reading datasets into IML from SAS II
matrix
A B C
1 5 3
5 1 5
3 3 1
matrix_2
D E F
3 7 0
7 3 7
0 0 3
Exporting datasets from IML to SAS code
/* Exporting data from IML to SAS */
proc iml;
matrix = {1 5 3, 5 1 5, 3 3 1};   /* the matrix read in earlier */
varNames = {A B C};
create my_data_is_back from matrix [colname=varNames];
append from matrix;
close my_data_is_back;
quit;
Matrix Transposition Code
/* Transposition of a matrix */
proc iml;
/* matrix and matrix_2 as read in earlier */
matrix   = {1 5 3, 5 1 5, 3 3 1};
matrix_2 = {3 7 0, 7 3 7, 0 0 3};
transposed=T(matrix);       /* T() function */
transposed_2=matrix_2`;     /* backquote operator */
print transposed transposed_2;
quit;
Matrix Transposition Results
transposed transposed_2
1 5 3 3 7 0
5 1 3 7 3 0
3 5 1 0 7 3
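As a cross-check outside SAS, the transposition above can be reproduced in plain Python (standard library only). This is a sketch, not SAS/IML code: the matrices are stored as lists of rows, and `transpose` is a hypothetical helper that plays the role of `T()` or the backquote operator.

```python
# The two 3x3 matrices from the slides, stored as lists of rows
matrix = [[1, 5, 3], [5, 1, 5], [3, 3, 1]]
matrix_2 = [[3, 7, 0], [7, 3, 7], [0, 0, 3]]


def transpose(m):
    """Swap rows and columns (the analogue of T(m) or m` in SAS/IML)."""
    return [list(column) for column in zip(*m)]


transposed = transpose(matrix)
transposed_2 = transpose(matrix_2)
print(transposed)    # [[1, 5, 3], [5, 1, 3], [3, 5, 1]]
print(transposed_2)  # [[3, 7, 0], [7, 3, 0], [0, 7, 3]]
```

Transposing twice returns the original matrix, which makes a quick sanity check.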
Matrix Addition Code
/* Matrix Addition */
proc iml;
matrix   = {1 5 3, 5 1 5, 3 3 1};   /* matrix and matrix_2 as read in earlier */
matrix_2 = {3 7 0, 7 3 7, 0 0 3};
matrix_add=matrix+matrix_2;   /* element by element; dimensions must match */
print matrix_add;
quit;
Matrix Addition Results
matrix_add
4 12 3
12 4 12
3 3 4
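Matrix addition works element by element, so both operands must have the same dimensions. A minimal Python sketch (standard library only, not SAS/IML) that reproduces `matrix_add`; `elementwise` is a hypothetical helper that rejects mismatched shapes instead of broadcasting.

```python
import operator

matrix = [[1, 5, 3], [5, 1, 5], [3, 3, 1]]
matrix_2 = [[3, 7, 0], [7, 3, 7], [0, 0, 3]]


def elementwise(op, a, b):
    """Apply op to matching elements of a and b, which must have the same shape."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("matrices must have the same dimensions")
    return [[op(x, y) for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


matrix_add = elementwise(operator.add, matrix, matrix_2)
print(matrix_add)  # [[4, 12, 3], [12, 4, 12], [3, 3, 4]]
```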
Matrix Multiplication Code
/* Matrix Multiplication */
proc iml;
matrix   = {1 5 3, 5 1 5, 3 3 1};   /* matrix and matrix_2 as read in earlier */
matrix_2 = {3 7 0, 7 3 7, 0 0 3};
matrix_mult=matrix*matrix_2;     /* row-by-column product */
matrix_mult_2=matrix_2*matrix;   /* order matters: AB is not BA */
print matrix_mult[rowname={row1,row2,row3}
                  colname={A B C}]
      matrix_mult_2[colname={col1 col2 col3}];
quit;
Matrix Multiplication Results
matrix_mult                 matrix_mult_2
         A     B     C       COL1  COL2  COL3
ROW1    38    22    44         38    22    44
ROW2    22    38    22         43    59    43
ROW3    30    30    24          9     9     3
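Matrix multiplication pairs each row of the left operand with each column of the right one, so swapping the operands changes the result. A plain-Python sketch (standard library only, not SAS/IML) that reproduces both products; `matmul` is a hypothetical helper standing in for IML's `*`.

```python
matrix = [[1, 5, 3], [5, 1, 5], [3, 3, 1]]
matrix_2 = [[3, 7, 0], [7, 3, 7], [0, 0, 3]]


def matmul(a, b):
    """Row-by-column product (the analogue of a*b in SAS/IML)."""
    if len(a[0]) != len(b):
        raise ValueError("ncol(a) must equal nrow(b)")
    columns = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in columns] for row in a]


matrix_mult = matmul(matrix, matrix_2)
matrix_mult_2 = matmul(matrix_2, matrix)
print(matrix_mult)                   # [[38, 22, 44], [22, 38, 22], [30, 30, 24]]
print(matrix_mult_2)                 # [[38, 22, 44], [43, 59, 43], [9, 9, 3]]
print(matrix_mult == matrix_mult_2)  # False: the order of the operands matters
```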
Matrix Element-Wise Powers Code
/* Matrix Element-wise powers */
proc iml;
matrix   = {1 5 3, 5 1 5, 3 3 1};   /* matrix and matrix_2 as read in earlier */
matrix_2 = {3 7 0, 7 3 7, 0 0 3};
matrix_e_power=matrix##3;       /* cube each element (not matrix**3) */
matrix_e_power_2=matrix_2##3;
print matrix_e_power matrix_e_power_2;
quit;
Matrix Element-Wise Operation Results
matrix_e_power matrix_e_power_2
1 125 27 27 343 0
125 1 125 343 27 343
27 27 1 0 0 27
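The element-wise power `##` and the matrix power `**` give different answers. A plain-Python sketch (standard library only, not SAS/IML) contrasting the two; `elementwise_power` and `matrix_power` are hypothetical helpers, and `matrix_power` only handles integer exponents of at least 1.

```python
matrix = [[1, 5, 3], [5, 1, 5], [3, 3, 1]]


def elementwise_power(m, p):
    """Raise every element to the power p (the analogue of m##p)."""
    return [[x ** p for x in row] for row in m]


def matmul(a, b):
    """Row-by-column product (the analogue of a*b)."""
    columns = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in columns] for row in a]


def matrix_power(m, p):
    """Multiply a square matrix by itself (the analogue of m**p, integer p >= 1)."""
    if p < 1:
        raise ValueError("this sketch only handles integer powers >= 1")
    result = m
    for _ in range(p - 1):
        result = matmul(result, m)
    return result


print(elementwise_power(matrix, 3))  # [[1, 125, 27], [125, 1, 125], [27, 27, 1]]
print(elementwise_power(matrix, 2))  # [[1, 25, 9], [25, 1, 25], [9, 9, 1]]
print(matrix_power(matrix, 2))       # [[35, 19, 31], [25, 41, 25], [21, 21, 25]]
```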
Other Matrix Operators Code
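/* Other Matrix Operators */
proc iml;
matrix   = {1 5 3, 5 1 5, 3 3 1};   /* matrix and matrix_2 as read in earlier */
matrix_2 = {3 7 0, 7 3 7, 0 0 3};
matrix_inv=inv(matrix);         /* inverse (matrix must be nonsingular) */
matrix_trace=trace(matrix);     /* sum of the diagonal elements */
matrix_det=det(matrix);         /* determinant */
matrix_logic=matrix>=matrix_2;  /* 1 where the comparison is true, else 0 */
print matrix_inv matrix_logic;
print matrix_trace matrix_det;
quit;
Other Matrix Operators Results I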
matrix_inv matrix_logic
-0.194444 0.0555556 0.3055556 0 0 1
0.1388889 -0.111111 0.1388889 0 0 0
0.1666667 0.1666667 -0.333333 1 1 0
Other Matrix Operators Results II
matrix_trace matrix_det
3 72
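The determinant, trace, inverse and comparison results above can be checked exactly with rational arithmetic. This plain-Python sketch (standard library only, not SAS/IML) uses the adjugate formula for a 3x3 inverse; `det3` and `inv3` are hypothetical helpers. Because it works with exact fractions, A times its inverse comes out as the identity with no rounding.

```python
from fractions import Fraction

matrix = [[1, 5, 3], [5, 1, 5], [3, 3, 1]]
matrix_2 = [[3, 7, 0], [7, 3, 7], [0, 0, 3]]


def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def inv3(m):
    """Exact inverse of a 3x3 matrix: the adjugate divided by the determinant."""
    det = det3(m)
    if det == 0:
        raise ValueError("matrix is singular")
    (a, b, c), (d, e, f), (g, h, i) = m
    adjugate = [
        [e * i - f * h, c * h - b * i, b * f - c * e],
        [f * g - d * i, a * i - c * g, c * d - a * f],
        [d * h - e * g, b * g - a * h, a * e - b * d],
    ]
    return [[Fraction(x, det) for x in row] for row in adjugate]


matrix_det = det3(matrix)
matrix_trace = sum(matrix[k][k] for k in range(3))
matrix_inv = inv3(matrix)
matrix_logic = [[int(x >= y) for x, y in zip(r, s)] for r, s in zip(matrix, matrix_2)]

# matrix * inv(matrix) should be the identity matrix I(3)
check = [[sum(matrix[r][k] * matrix_inv[k][c] for k in range(3)) for c in range(3)]
         for r in range(3)]

print(matrix_det, matrix_trace)  # 72 3
print([[round(float(x), 6) for x in row] for row in matrix_inv])
print(matrix_logic)              # [[0, 0, 1], [0, 0, 0], [1, 1, 0]]
```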
Matrix Reduction Operations Code
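/* Matrix reduction operations */
proc iml;
matrix   = {1 5 3, 5 1 5, 3 3 1};   /* matrix and matrix_2 as read in earlier */
matrix_2 = {3 7 0, 7 3 7, 0 0 3};
matrix_e_power=matrix##3;
matrix_e_power_2=matrix_2##3;
matrix_row_red=matrix_e_power[+,];               /* sum down each column */
matrix_row_red_2=(matrix_e_power_2[+,])[,<>];    /* column sums, then their maximum */
print matrix_row_red matrix_row_red_2;
quit;
Matrix Reduction Operations Results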
matrix_row_red matrix_row_red_2
153 153 153 370
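Where the reduction operator sits in the subscript decides the direction: `[+,]` collapses the rows and leaves one total per column, while `[,+]` collapses the columns and leaves one total per row. A plain-Python sketch (standard library only, not SAS/IML) of both directions; `column_sums` and `row_sums` are hypothetical helpers.

```python
matrix = [[1, 5, 3], [5, 1, 5], [3, 3, 1]]
matrix_2 = [[3, 7, 0], [7, 3, 7], [0, 0, 3]]

matrix_e_power = [[x ** 3 for x in row] for row in matrix]      # matrix##3
matrix_e_power_2 = [[x ** 3 for x in row] for row in matrix_2]  # matrix_2##3


def column_sums(m):
    """Sum down each column: the analogue of m[+,] (one value per column)."""
    return [sum(col) for col in zip(*m)]


def row_sums(m):
    """Sum across each row: the analogue of m[,+] (one value per row)."""
    return [sum(row) for row in m]


matrix_row_red = column_sums(matrix_e_power)           # like matrix_e_power[+,]
matrix_row_red_2 = max(column_sums(matrix_e_power_2))  # like (matrix_e_power_2[+,])[,<>]
print(matrix_row_red, matrix_row_red_2)  # [153, 153, 153] 370
print(row_sums(matrix_e_power))          # [153, 251, 55]
```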
Matrix Algebra in IML Review
Matrix Operation IML Shortcut
Addition +
Subtraction -
Division, element wise /
Multiplication, element wise #
Multiplication, matrix *
Power, element wise ##
Power, Matrix **
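The element-wise operators in this table (`-`, `#`, `/`) all combine matching elements, unlike the matrix product `*`. A plain-Python sketch (standard library only, not SAS/IML) applied to the slides' matrices; `elementwise` is a hypothetical helper. The division uses `matrix_2 / matrix` because `matrix` contains no zeros.

```python
import operator

matrix = [[1, 5, 3], [5, 1, 5], [3, 3, 1]]
matrix_2 = [[3, 7, 0], [7, 3, 7], [0, 0, 3]]


def elementwise(op, a, b):
    """Apply op to matching elements of two same-shaped matrices."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("matrices must have the same dimensions")
    return [[op(x, y) for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


matrix_sub = elementwise(operator.sub, matrix, matrix_2)        # matrix - matrix_2
matrix_elem_mult = elementwise(operator.mul, matrix, matrix_2)  # matrix # matrix_2
matrix_elem_div = elementwise(operator.truediv, matrix_2, matrix)  # matrix_2 / matrix
print(matrix_sub)        # [[-2, -2, 3], [-2, -2, -2], [3, 3, -2]]
print(matrix_elem_mult)  # [[3, 35, 0], [35, 3, 35], [0, 0, 3]]
print(matrix_elem_div)   # [[3.0, 1.4, 0.0], [1.4, 3.0, 1.4], [0.0, 0.0, 3.0]]
```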
Matrix Operators Review
Matrix Function IML Alias
Transpose ` or T(matrix)
Determinant Det(matrix)
Inverse Inv(matrix)
Trace Trace(matrix)
Comparison (logical) >, >=, =, <, <=, ^=
Identity I(n)
Constant matrix (every element = a) J(nrow,ncol,a)
Flatten a matrix into a row vector in row-major order rowvec(matrix)
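The constructor and reshaping functions in this table are easy to mimic. A plain-Python sketch (standard library only, not SAS/IML); `identity`, `constant` and `rowvec` are hypothetical helpers, and unlike IML's `rowvec`, which returns a 1 x n matrix, the Python version returns a flat list.

```python
def identity(n):
    """n x n identity matrix, like I(n)."""
    return [[1 if r == c else 0 for c in range(n)] for r in range(n)]


def constant(nrow, ncol, value):
    """nrow x ncol matrix with every element equal to value, like J(nrow, ncol, value)."""
    # Build each row separately so the rows are independent lists, not aliases
    return [[value] * ncol for _ in range(nrow)]


def rowvec(m):
    """Flatten a matrix in row-major order, like rowvec(m)."""
    return [x for row in m for x in row]


matrix = [[1, 5, 3], [5, 1, 5], [3, 3, 1]]
print(identity(3))        # [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
print(constant(2, 3, 0))  # [[0, 0, 0], [0, 0, 0]]
print(rowvec(matrix))     # [1, 5, 3, 5, 1, 5, 3, 3, 1]
```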
Matrix Reduction Operators Review
Operation IML Alias
Addition +
Multiplication #
Minimum ><
Maximum <>
Index of minimum >:<
Index of maximum <:>
Mean :
Sum of squares ##
Concatenate || (horizontal) // (vertical)
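Placed in the row subscript (for example `matrix[<:>,]`), each reduction operator returns one value per column. A plain-Python sketch (standard library only, not SAS/IML) of several operators from the table, applied column by column to `matrix`; the index results are 1-based to match IML.

```python
matrix = [[1, 5, 3], [5, 1, 5], [3, 3, 1]]
matrix_2 = [[3, 7, 0], [7, 3, 7], [0, 0, 3]]
columns = list(zip(*matrix))  # the columns of matrix, as tuples

col_min = [min(c) for c in columns]                  # like matrix[><,]
col_max = [max(c) for c in columns]                  # like matrix[<>,]
col_argmin = [c.index(min(c)) + 1 for c in columns]  # like matrix[>:<,] (1-based)
col_argmax = [c.index(max(c)) + 1 for c in columns]  # like matrix[<:>,] (1-based)
col_mean = [sum(c) / len(c) for c in columns]        # like matrix[:,]
col_ssq = [sum(x * x for x in c) for c in columns]   # like matrix[##,]
hconcat = [r + s for r, s in zip(matrix, matrix_2)]  # like matrix || matrix_2 (3 x 6)
vconcat = matrix + matrix_2                          # like matrix // matrix_2 (6 x 3)

print(col_min, col_max)        # [1, 1, 1] [5, 5, 5]
print(col_argmin, col_argmax)  # [1, 2, 3] [2, 1, 2]
print(col_mean, col_ssq)       # [3.0, 3.0, 3.0] [35, 35, 35]
```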
Statistics in SAS/IML
• The standard method for evaluating the relationship between a variable of interest (dependent variable) and potential explanatory variables (independent variables) is Ordinary Least Squares (OLS) regression
• For the linear model below, OLS chooses the coefficients that minimize the sum of squared residuals, which gives the closed-form estimator
$$y = X\beta + u, \qquad \hat{\beta} = (X'X)^{-1}X'y$$
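To make the formula concrete, here is a plain-Python sketch (standard library only, not SAS/IML) that evaluates (X'X)^{-1}X'y exactly on a small hypothetical dataset with an intercept and one regressor. `transpose`, `matmul` and `inv2` are hypothetical helpers. The residuals are computed as y - X*beta_hat and sum to zero, as they must when the model includes an intercept.

```python
from fractions import Fraction

# Hypothetical toy data: one regressor x and a response y
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]


def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    cols = list(zip(*b))
    return [[sum(p * q for p, q in zip(row, col)) for col in cols] for row in a]


def inv2(m):
    """Exact inverse of a 2x2 matrix."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("X'X is singular")
    return [[Fraction(d, det), Fraction(-b, det)],
            [Fraction(-c, det), Fraction(a, det)]]


X = [[1, xi] for xi in x]  # intercept column first, like J(n,1,1) || X
Y = [[yi] for yi in y]     # y as an n x 1 column vector
Xt = transpose(X)

beta_hat = matmul(matmul(inv2(matmul(Xt, X)), Xt), Y)  # (X'X)^{-1} X'y
fitted = matmul(X, beta_hat)                           # X * beta_hat
u_hat = [yi - f[0] for yi, f in zip(y, fitted)]        # residuals y - X*beta_hat

print(beta_hat)    # [[Fraction(11, 5)], [Fraction(3, 5)]]
print(sum(u_hat))  # 0
```

Exact fractions keep the check free of rounding; with real data you would use floating point, and a linear solver is usually preferred over forming an explicit inverse.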
OLS IML Code I
/* Statistics in IML (Ordinary Least Squares
Regression) */
/* Read SASHELP.CARS into work and create an
interaction with foreign and MPG */
data cars;
set sashelp.cars;
if origin eq "USA" then foreign = 0 ;
else foreign = 1;
mpg_x_foreign=foreign*mpg_highway;
run;
OLS IML Code II
/* Building model in IML */
proc iml;
/* Reading in data from SAS */
use work.cars;
read all var {'MPG_Highway' 'Weight' 'Foreign'
'mpg_x_foreign'} into X;
read all var {'MSRP'} into Y;
close cars;
OLS IML Code III
/* Transforming the Data for a regression */
timer=J(2,1,0);
n=nrow(X);
X=J(n,1,1) || X;
k=ncol(X);
/* Estimating Beta_hat */
t0=time();
beta_hat=(inv(X`*X))*X`*y;
timer[1,1]=time()-t0;
u_hat=y-X*beta_hat;    /* residuals: observed minus fitted values */
OLS IML Code IV
/* Use Matrix Algebra to Calculate OLS statistics */
SSE=y`*y-beta_hat`*X`*y;
MSE=sse/(n-k);
Y_bar=sum(Y)/n;
ESS=beta_hat`*X`*y-n*y_bar**2;
MSR=ESS/(k-1);
F=MSR/MSE;
SST=ESS+SSE;
R_2=ESS/SST;
/* The usual adjusted R-square (as documented for PROC REG) is 1-(n-1)/(n-k)*(1-R_2); the line below divides by n-k+1 instead */
Adj_R_2=1-(n-1)/(n-k+1)*(1-R_2);
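The sums of squares above follow directly from beta_hat, X'y and y'y. A plain-Python sketch (standard library only, not SAS/IML) that applies the same formulas to a small hypothetical dataset, using exact fractions. Here k counts the intercept, so the model has k-1 degrees of freedom, and the sketch uses the textbook adjusted R-square with n-k in the denominator. With a single regressor, beta_hat is computed from the closed-form slope and intercept, which equals (X'X)^{-1}X'y.

```python
from fractions import Fraction

# Hypothetical toy data: one regressor plus an intercept
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n, k = len(y), 2  # k counts the intercept and the slope

x_bar = Fraction(sum(x), n)
y_bar = Fraction(sum(y), n)
slope = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
         / sum((xi - x_bar) ** 2 for xi in x))
beta_hat = [y_bar - slope * x_bar, slope]  # [intercept, slope]

# X'y and y'y, as used in the IML code
Xty = [sum(y), sum(xi * yi for xi, yi in zip(x, y))]
yty = sum(yi * yi for yi in y)
bXty = beta_hat[0] * Xty[0] + beta_hat[1] * Xty[1]  # beta_hat` * X` * y

SSE = yty - bXty
MSE = SSE / (n - k)
ESS = bXty - n * y_bar ** 2
MSR = ESS / (k - 1)
F = MSR / MSE
SST = ESS + SSE
R_2 = ESS / SST
Adj_R_2 = 1 - Fraction(n - 1, n - k) * (1 - R_2)  # textbook adjusted R-square

print(SSE, ESS, R_2, F, Adj_R_2)  # 12/5 18/5 3/5 9/2 7/15
```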
OLS IML Code V
/* Calculate Hypothesis Testing Components */
SE=sqrt(vecdiag(inv(X`*X))#MSE);
T=beta_hat/se;
p_stats=2*(1-CDF('T',ABS(T),n-k));
timer[2,1]=time()-t0;
/* Model DF is k-1: the intercept is not counted */
reg_stats=((k-1)||ESS||MSR||F) // ((n-k)||SSE||MSE||{.});
coefs=beta_hat || SE || T || p_stats;
OLS IML Code VI
/* Clean up the results to print */
print 'OLS Statistics for regression of Car Prices';
print reg_stats (|Colname={DF SS MS F} rowname={Model
Residuals} format=8.4|);
print 'Parameter estimates';
print coefs (|Colname={Coef SE T p_stat} rowname={INT
MPG Weight Foreign MPG_x_Foreign} format=8.4|);
print " ";
print 'The Adjusted R-Square is ' Adj_R_2;
print 'The time to invert X`X was' (timer[1,1]);
print 'The time to calculate all statistics was'
(timer[2,1]);
quit;
OLS Regression of Car Prices in IML part I
Regression Statistics
                 DF          SS          MS         F
MODEL        4.0000    4.684E10    1.171E10   43.2981
RESIDUALS  423.0000    1.144E11    2.7044E8         .
Adj_R_2
The Adjusted R-Square is 0.2854775
OLS Regression of Car Prices in IML part II
Parameter Estimates
COEF SE T P_STAT
INT -9863.91 14813.12 -0.6659 0.5058
MPG 37.0433 352.3656 0.1051 0.9163
WEIGHT 9.8881 1.8064 5.4739 0.0000
FOREIGN 32324.29 8557.125 3.7775 0.0002
MPG_X_FOREIGN -835.176 314.5948 -2.6548 0.0082
The proof is in the pudding
/* Check the IML results with PROC REG */
%let timer_start = %sysfunc(datetime());
proc reg data=work.cars;
model msrp = mpg_highway weight foreign mpg_x_foreign;
run;
data _null_;
dur=datetime()-&timer_start;
put 30*'-' / ' Total Duration' dur MMSS13.6 / 30*'-';
run;
PROC REG Output for Car Prices
SAS/IML Results one more time
Does “vectorizing” help? Yes.
Does “vectorizing” help 2.0
Compare to Proc Reg again
• In these runs, IML appears to be at least 5 times faster than PROC REG
• Caveat: IML holds every matrix in memory, so the size of the problem is limited by available RAM
Why bother? That was a lot of code
• “Vectorizing” your code: replace loops with matrix operations
• Customization is key here
• Extensions to the standard statistical models can be implemented to your delight with IML
– Newey-West variance-covariance matrix (SC and HC)
– Spatial correlation (models with geographic components)
– Clustering (one-way and two-way)
• There is a PROC for that (sometimes)
– PROC SURVEYREG, PROC MODEL, PROC REG (with the ACOV option)
• For the thrills!
Simulation with SAS/IML
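• An empirical, largely distribution-free method for estimating the sampling distribution of a statistic (and hence its standard error, confidence intervals and tests) is known as the “bootstrap”
– You can be agnostic about the underlying data-generating process (DGP)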
• The idea behind the bootstrap is to:
– Calculate the statistic of interest from your sample
– Resample the data B times to create B bootstrap samples
– Re-calculate the statistics for each sample
– Use the bootstrap distribution to obtain parameters of interest (confidence intervals, standard errors, etc.)
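The four steps above fit in a few lines of code. This is a plain-Python sketch (standard library only, not SAS/IML) of a percentile bootstrap; `bootstrap` and the sample data are hypothetical, the statistic is the mean for simplicity, and the percentiles are taken as simple order statistics, which need not match the default quantile definition of IML's QNTL. The seed mirrors `call randseed(153)`, but Python's generator is different, so the draws will not match IML's.

```python
import random
import statistics


def bootstrap(data, stat, B=10_000, alpha=0.05, seed=153):
    """Percentile bootstrap: resample with replacement B times and recompute stat."""
    rng = random.Random(seed)  # seeded generator for reproducible resamples
    n = len(data)
    # Steps 2 and 3: B resamples of size n, and the statistic for each one
    boot = sorted(stat(rng.choices(data, k=n)) for _ in range(B))
    # Step 4: simple order-statistic percentiles for a (1 - alpha) interval
    lower = boot[int(alpha / 2 * B)]
    upper = boot[int((1 - alpha / 2) * B) - 1]
    # Step 1 (the statistic on the original sample), bootstrap mean, SE, CI
    return stat(data), statistics.mean(boot), statistics.stdev(boot), lower, upper


# Hypothetical sample; any list of numbers works
data = [2.1, 3.4, 1.9, 5.6, 4.2, 3.3, 2.8, 6.1, 3.9, 4.4]
est, boot_est, se, lower, upper = bootstrap(data, statistics.mean)
print(f"Obs={est:.4f} BootEst={boot_est:.4f} StdErr={se:.4f} CI=({lower:.4f}, {upper:.4f})")
```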
Sample Kurtosis
• A measure of how “heavy” the tails of a distribution are, relative to a normal distribution (whose excess kurtosis is 0)
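$$\text{kurtosis} = \frac{\mu_4}{\mu_2^2} - 3$$
$$\text{sample kurtosis} = \frac{n(n+1)}{(n-1)(n-2)(n-3)} \sum_{i=1}^{n} w_i^2 \left(\frac{x_i - \bar{x}_w}{s_w}\right)^4 - \frac{3(n-1)^2}{(n-2)(n-3)}$$
where $\mu_k$ is the $k$-th central moment, and $w_i$, $\bar{x}_w$ and $s_w$ are the weights, the weighted mean and the weighted standard deviation ($w_i = 1$ when no weights are used)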
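The sample formula above can be evaluated directly. This plain-Python sketch (standard library only, not SAS/IML) implements it with every weight w_i = 1, so the weighted mean and standard deviation reduce to the ordinary sample mean and the standard deviation with an n-1 divisor. `sample_kurtosis` is a hypothetical helper; it uses exact fractions, needs at least four observations, and fails on constant data because the standard deviation is then zero.

```python
from fractions import Fraction


def sample_kurtosis(xs):
    """Bias-adjusted sample (excess) kurtosis with all weights w_i = 1."""
    n = len(xs)
    if n < 4:
        raise ValueError("need at least 4 observations")
    xs = [Fraction(x) for x in xs]
    mean = sum(xs) / n
    s2 = sum((x - mean) ** 2 for x in xs) / (n - 1)  # sample variance s^2
    z4 = sum((x - mean) ** 4 for x in xs) / s2 ** 2  # sum of ((x - mean)/s)^4
    return (Fraction(n * (n + 1), (n - 1) * (n - 2) * (n - 3)) * z4
            - Fraction(3 * (n - 1) ** 2, (n - 2) * (n - 3)))


print(sample_kurtosis([1, 2, 3, 4, 5]))  # -6/5
```

A flat, light-tailed sample such as 1..5 gives a negative value, as expected for tails lighter than the normal.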
Estimating the Kurtosis of car weights
/* The bootstrap in IML */
ods listing gpath="<output_directory>";
ods graphics on / imagename="iml_results_" ;
proc univariate data=cars;
var weight;
histogram weight;
inset N Kurtosis (8.4) / position=NE;
run;
Sample Histogram of car weights
SAS/IML Code for Bootstrap of Kurtosis I
proc iml;
/* Create module to estimate kurtosis */
start BootStat(A);
return kurtosis(A);
finish;
/* Set significance level and number of bootstrap samples */
alpha=0.05;
B=10000;
/* Read in cars data */
use work.cars;
read all var "Weight";
close;
SAS/IML Code for Bootstrap of Kurtosis II
/* Resample Weight data and recalculate kurtosis */
call randseed(153);
est=BootStat(weight);
s=sample(weight, B // nrow(weight));
bStat=T(BootStat(s));
bootEst=mean(bStat);
SE=std(bStat);
call qntl(CI, bStat, alpha/2 || 1-alpha/2);
SAS/IML Code for Bootstrap of Kurtosis III
/* Summarize results of Bootstrap procedure */
R=Est || BootEst || SE || CI` ;
print R[format=8.4 L="95% Bootstrap CI" c={"Obs"
"BootEst" "StdErr" "Lower" "Upper"}];
/* Output the results as a graph */
call symputx('BootEst', round(BootEst, 1e-4));
call symputx('Lower', round(CI[1], 1e-4));
call symputx('Upper', round(CI[2], 1e-4));
SAS/IML Code for Bootstrap of Kurtosis IV
refStmt = 'refline &BootEst / axis=x
lineattrs=(color=red) name="BootEst"
legendlabel="Bootstrap Statistic = &BootEst";'
+ 'refline &Lower &Upper / axis=x
lineattrs=(color=blue) name="CI" legendlabel="95% Pctl
CI";'
+ 'keylegend "BootEst" "CI";';
title "Bootstrap Distribution";
call histogram(bStat) label="Kurtosis" other=refStmt;
quit;
ods graphics off;
ods _all_ close;
SAS/IML Output for Bootstrap of Kurtosis
Bootstrap Distribution of Car Weights Kurtosis, 95% Bootstrap CI
Obs BootEst StdErr Lower Upper
1.6888 1.5977 0.7653 0.3489 3.2050
SAS/IML Histogram of Bootstrap Kurtosis
Useful IML commands for Statistics
IML Function Purpose
CALL RANDSEED(n) Set the seed for the random number generator
CALL RANDGEN(A, 'distname' <, parm1> …) Generate pseudo-random numbers from the 'distname' distribution
VECDIAG(A) Create a column vector of the elements on the main diagonal of a matrix
CDF('dist', q <, parm1, … parmk>) Return a value of the cumulative distribution function of 'dist'
CALL QNTL(q, A <, probs> <, method>) Compute sample quantiles of A and store them in q
CALL HISTOGRAM(x <, options>) Call SGPLOT to plot the histogram of vector x
SHAPE(A, nrow <, ncol> <, pad-val>) Create a new nrow by ncol matrix from the data in A
SAMPLE(A <, n> <, method> <, prob>) Create a random sample of the elements of A
TIME() Return the current time; useful for timing code
IML Examples Review
• Importing/Exporting data to/from IML and SAS
• Matrix Algebra
• Statistics in IML
• Simulations in IML
• Next time in IML
– Custom Model Validation
– Advanced Statistics
– Optimization
Conclusion
• SAS/IML offers a lot of flexibility and vectorization missing from Base SAS
• Caution: SAS/IML code isn’t validated the way PROCs are (have to say it)
• Feel free to reach out with questions
• Thank you!
References
• Ajmani, V. B. (2009). Applied Econometrics Using the SAS System. Hoboken, NJ: Wiley.
• Baesens, B., Roesch, D., & Scheule, H. (2016). Credit Risk Analytics: Measurement Techniques, Applications and Examples in SAS. Hoboken, NJ: Wiley. http://www.creditriskanalytics.net/
• Wicklin, R. (2010). Statistical Programming with SAS/IML Software. Cary, NC: SAS Institute Inc.
• Wicklin, R. (2013). Simulating Data with SAS. Cary, NC: SAS Institute Inc.
– https://support.sas.com/content/dam/SAS/support/en/books/simulating-data-with-sas/65378_Appendix_A_A_SAS_IML_Primer.pdf
– https://blogs.sas.com/content/iml/2013/11/25/twelve-advantages-to-calling-r-from-the-sasiml-language.html
• Wooldridge, J. (2019). Introductory Econometrics: A Modern Approach (7th ed.). Mason, OH: South-Western.