TRANSCRIPT
Programmability in SPSS 14, SPSS 15 and SPSS 16
The Revolution Continues
Jon Peck
Technical Advisor
SPSS
Copyright (c) SPSS Inc, 2007
Recap of SPSS 14 Python programmability
Developer Central
New features in SPSS 15 programmability
  Writing first-class procedures
  Updating the data
New features in SPSS 16 programmability
Interacting with the user
Q & A
Conclusion
Agenda
"Because of programmability, SPSS 14 is the most important
release since I started using SPSS fifteen years ago."
"I think I am going to like using Python."
"Python and SPSS 14 and later are, IMHO, GREAT!"
"By the way, Python is a great addition to SPSS."
From InfoWorld (April 19, 2007) "Of all the tools fueling the dynamic-language trend in the enterprise,
general-purpose dynamic languages such as Python and Ruby present
the greatest upside for enhancing developer productivity."
Quotations from SPSS Users
SPSS provides a powerful engine for statistical
and graphical methods and for data
management.
Python® provides a powerful, elegant, and
easy-to-learn language for controlling and
responding to this engine.
Together they provide a comprehensive system
for serious applications of analytical methods to
data.
The Combination of SPSS and Python
SPSS 14.0 provided
  Programmability
  Multiple datasets
  Variable and File Attributes
  Programmability read-access to case data
  Ability to control SPSS from a Python program
SPSS 15 adds
  Read and write case data
  Create new variables directly rather than generating syntax
  Create pivot tables and text blocks via backend API's
  Easier setup
SPSS 16 will add
  EXTENSION command for user procedures with SPSS syntax
  Dataset features for complex data management
  Ability to use R procedures within SPSS through R Plug-In
Programmability Features in SPSS 14, 15, and 16
Makes possible easy jobs that respond to datasets, output, environment
Allows greater generality, more automation
Makes jobs more robust
Allows extending the capabilities of SPSS
Enables better organized and more maintainable code
Facilitates staff specialization
Increases productivity
More fun
Programmability Advantages
Python extends SPSS via
  General programming language
  Access to variable dictionary, case data, and output
  Access to standard and third-party modules
  SPSS Developer Central modules
  Module structure for building libraries of code
Runs in "back-end" syntax context (like macro)
  SaxBasic scripting runs in "front-end" context
Two modes
  Traditional SPSS syntax window
  Drive SPSS from Python (external mode)
Optional install (licensed with SPSS Base)
Programmability Overview
SPSS is not the owner or licensor of the Python
software. Any user of Python must agree to the
terms of the Python license agreement located
on the Python web site. SPSS is not making any
statement about the quality of the Python
program. SPSS fully disclaims all liability
associated with your use of the Python program.
Legal Notice
Supports implementing various programming languages
  Requires a programmer to implement a new language
VB.NET Plug-In available on Developer Central
  Works only in external mode
The SPSS Programmability Software Development Kit
Python interpreter embedded within SPSS
SPSS runs in traditional way until BEGIN PROGRAM command is found
Python collects commands until END PROGRAM command is found; then runs the program
Python can communicate with SPSS through API's (calls to functions)
  Includes running SPSS syntax inside Python program
  Includes creating macro values for later use in syntax
Python can access SPSS output and data
OMS is a key tool
How Programmability Works
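The collect-then-run behavior described on this slide can be illustrated with a small stand-alone simulation. This is only a sketch in present-day Python, not SPSS code: the function `run_syntax_stream`, the `submit` callback, and the sample job are hypothetical, and keeping one namespace across program blocks is an assumption of the sketch.

```python
def run_syntax_stream(lines, submit):
    """Pass ordinary lines to `submit`; collect and run BEGIN/END PROGRAM blocks."""
    namespace = {}      # assumption: Python state is shared across program blocks
    program = None      # None while we are outside a program block
    for line in lines:
        keyword = line.strip().rstrip(".").upper()
        if program is None:
            if keyword == "BEGIN PROGRAM":
                program = []            # start collecting Python source lines
            else:
                submit(line)            # ordinary command: hand it to the engine
        elif keyword == "END PROGRAM":
            exec("\n".join(program), namespace)   # run the whole collected block
            program = None
        else:
            program.append(line)
    return namespace


# Hypothetical job: two ordinary commands around one program block.
submitted = []
job = [
    "GET FILE='demo.sav'.",
    "BEGIN PROGRAM.",
    "total = sum(range(5))",
    "END PROGRAM.",
    "DESCRIPTIVES ALL.",
]
state = run_syntax_stream(job, submitted.append)
print(submitted)
print(state["total"])
```

Nothing inside the block runs until END PROGRAM is seen, which is why a program can span many lines and still behave as a single step in the job.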
BEGIN PROGRAM.
import spss, spssaux
spssaux.GetSPSSInstallDir("SPSSDIR")
spssaux.OpenDataFile("SPSSDIR/employee data.sav")

# find categorical variables
catVars = spssaux.VariableDict(variableLevel=['nominal', 'ordinal'])
if catVars:
    spss.Submit("FREQ " + " ".join(catVars.variables))
    # create a macro listing categorical variables
    spss.SetMacroValue("!catVars", " ".join(catVars.variables))
END PROGRAM.

DESC !catVars.
Example: Summarize Categorical Variables
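The logic of this example, selecting variables by measurement level and turning the selection into syntax, can be tried outside SPSS with a plain dictionary. In the sketch below `variable_levels` is a hypothetical stand-in for the real variable dictionary, and `names_with_level` is an illustrative helper, not part of spssaux.

```python
# Hypothetical stand-in for a variable dictionary: name -> measurement level.
variable_levels = {
    "gender": "nominal",
    "jobcat": "ordinal",
    "salary": "scale",
    "minority": "nominal",
}


def names_with_level(levels, wanted):
    """Return the variable names whose measurement level is in `wanted`, sorted."""
    return sorted(name for name, level in levels.items() if level in wanted)


cat_vars = names_with_level(variable_levels, {"nominal", "ordinal"})

commands = []
if cat_vars:   # generate nothing when there are no categorical variables
    commands.append("FREQ " + " ".join(cat_vars) + ".")
    # The text that a macro such as !catVars would expand to in later syntax.
    commands.append("DESC " + " ".join(cat_vars) + ".")

for command in commands:
    print(command)
```

The point is that the job adapts to whatever dataset is open: the variable list is computed at run time rather than typed into the syntax.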
Two modes of operation
SPSS Drives mode (inside): traditional syntax context
  BEGIN PROGRAM …program… END PROGRAM
  Program in 14, 15, or 16 is in Python or, new in 16, in R
X Drives mode (outside): eXternal program drives SPSS
  Python interpreter (or VB.NET)
  No SPSS Viewer, Data Editor, or SPSS user interface
  Output sent as text to the application – can be suppressed
  Has performance advantages
  Build programs with an IDE
    Even if to be run in traditional mode
Programmability Inside or Outside SPSS
PythonWin IDE Controlling SPSS (eXternal Mode)
Be productive quickly
Get more return as you learn more
Python.org
Python Tutorial
Cheeseshop: over 2200 packages as of April 11, 2007
SPSS Developer Central
SPSS Programming and Data Management, 4th ed, 2006.
Python Resources
Dive Into Python book or PDF
Practical Python by Magnus Lie Hetland
  Extensive examples and discussion of Python
Python Cookbook, 2nd ed by Martelli, Ravenscroft, & Ascher
Python in a Nutshell, 2nd ed by Martelli, O'Reilly
  Very clear, comprehensive reference material
wxPython in Action by Rappin and Dunn
  Explains user interface building with wxPython
Python Books
scipy 0.5.2 Scientific Algorithms Library for Python Scipy.org
scipy is an open source library of scientific tools for Python. scipy gathers a variety of high level science and engineering modules together as a single package. scipy provides modules for statistics, optimization, integration, linear algebra, Fourier transforms, signal and image processing, genetic algorithms, ODE solvers, special functions, and more. scipy requires and supplements NumPy, which provides a multidimensional array object and other basic functionality.
Python is becoming a major language for scientific computing
Cheeseshop: scipy
SPSS Developer Central is the web home for developing SPSS applications
Python, .NET, R Integration Plug-Ins
Supplementary modules by SPSS and others
Articles on programmability and graphics
Forums for asking questions and exchanging information
Programmability Extension SDK
Get Python itself from Python.org or CD
  SPSS 14, 15 use 2.4. (2.4.3)
  SPSS 16 will use 2.5
Not limited to programmability
  GPL graphics
  User-contributed code
Key Supplementary Modules
  spssaux
  spssdata
New for SPSS 15
  trans
  extendedTransforms
  rake
  pls
  enhanced tables.py
SPSS Developer Central
tables.py module on Developer Central can merge two tables into one.
  E.g., Ctables significance tests into main tables
  Merge or replace cells with cells from a different table
  Flexibly define the join
tables.py can also censor cells, e.g., blank statistics based on small counts.
Merge example: data on importance of education qualifications for immigration by region of Europe
CTABLES /TABLE qfimeduBin BY Region
  /TITLES TITLE='Qualifications for Immigration'
  /COMPARETEST TYPE=PROP.
Example: Manipulating Output: Merging Tables
Ctables Output
BEGIN PROGRAM.
import spss, tables
cmd=r"""CTABLES /TABLE qfimeduBin BY Region
  /TITLES TITLE='Qualifications for Immigration'
  /COMPARETEST TYPE=PROP"""
tables.mergeLatest(cmd, autofit=False)
END PROGRAM.
Runs Ctables and merges test table into main table
  Using default merge behavior
"If it really is this simple this will generate a lot of excitement for us."
"This is really fantastic."
Program to Merge
[Merged pivot table: Qualifications for Immigration, Comparisons of Column Proportions. Rows: Qualification for immigration: good educational qualifications (categories 0 to 5). Columns: Region of Europe: Western (A), Eastern (B), Northern (C), Southern (D). Cells show the count together with the significance-test column keys.]
Results are based on two-sided tests with significance level 0.05. For each significant pair, the key of the category with the smaller column proportion appears under the category with the larger column proportion.
Merged Output
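The merge and censor ideas can be pictured with tables held as dictionaries keyed by (row label, column label). The sketch below is a conceptual illustration only, not the tables.py implementation or its API; `merge_cells`, `censor_small`, and the numbers are all hypothetical.

```python
def merge_cells(main, other, combine):
    """Return a copy of `main` with matching cells of `other` merged in.

    Both tables are dicts keyed by (row label, column label); `combine`
    decides how a main cell and its matching cell are joined.
    """
    merged = dict(main)
    for key, value in other.items():
        if key in merged:
            merged[key] = combine(merged[key], value)
    return merged


def censor_small(table, minimum, blank=""):
    """Return a copy of `table` with counts below `minimum` blanked out."""
    return {key: (blank if value < minimum else value)
            for key, value in table.items()}


# Made-up counts and significance keys, for illustration only.
counts = {("0", "Western"): 120, ("0", "Eastern"): 4, ("1", "Western"): 310}
tests = {("0", "Western"): "B", ("1", "Western"): "B D"}

merged = merge_cells(counts, tests, lambda count, keys: "%d %s" % (count, keys))
safe = censor_small(counts, minimum=5)
print(merged)
print(safe)
```

Passing the join as a function mirrors the "flexibly define the join" point: replacing a cell instead of appending to it only requires a different `combine`.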
You can extend SPSS capabilities by building new procedures
  Or use ones that others have built
Combine SPSS procedures and transformations with Python logic
  Poisson regression (SPSS 14) example using iterated CNLR
  New raking procedure built over GENLOG
  GENLIN in SPSS 15
Calculate data aggregates in SPSS and pass to algorithm coded in Python
  Raking procedure starts with AGGREGATE; uses GENLOG
Acquire case data and compute in Python
  Use Python standard modules and third-party additions
  Partial Least Squares Regression (pls module)
Approaches to Creating New Procedures
Common to adapt existing libraries or code for use as Python extension modules
  C, C++, VB, Fortran,...
Python tools and API's to assist
  Chap 25 in Python in a Nutshell
  Tutorial on extending and embedding the Python interpreter
Call R programs with SPSS 16
Adapt Existing Code Libraries
Regression with large number of predictors (even k > N)
Similar to Principal Components but considers dependent variable simultaneously
Calculates principal components of (y, X), then uses regression on the scores instead of original data
Equivalent to ordinary regression when number of factors equals number of predictors and there is one y variable
For more information see An Optimization Perspective on Kernel Partial Least Squares Regression.pdf.
Partial Least Squares Regression
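For a single y variable, the idea can be sketched with the textbook NIPALS form of PLS1 written in plain Python lists. This is not the pls module's code; the function name `pls1_fitted` and the data are made up for illustration. The data are exactly linear in two predictors, so ordinary regression fits them perfectly, and the two-factor PLS fit should do the same if the equivalence stated above holds.

```python
import math


def pls1_fitted(X, y, n_factors):
    """Return fitted y values from a PLS1 (single y) NIPALS fit."""
    n, k = len(X), len(X[0])
    # Center the predictors and the response.
    means = [sum(row[j] for row in X) / n for j in range(k)]
    E = [[row[j] - means[j] for j in range(k)] for row in X]
    ybar = sum(y) / n
    f = [value - ybar for value in y]
    fitted = [ybar] * n
    for _ in range(n_factors):
        # Weight vector: covariance of each predictor with the y residual.
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(k)]
        norm = math.sqrt(sum(v * v for v in w))
        if norm < 1e-12:
            break                       # nothing left to explain
        w = [v / norm for v in w]
        # Scores, then loadings for X (p) and for y (q).
        t = [sum(E[i][j] * w[j] for j in range(k)) for i in range(n)]
        tt = sum(v * v for v in t)
        p = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(k)]
        q = sum(f[i] * t[i] for i in range(n)) / tt
        # Deflate X and y, and accumulate the fit on the scores.
        E = [[E[i][j] - t[i] * p[j] for j in range(k)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        fitted = [fitted[i] + q * t[i] for i in range(n)]
    return fitted


# Made-up data with y = 1 + 2*x1 + 3*x2 exactly.
X = [[1, 2], [2, 1], [3, 5], [4, 3], [5, 4]]
y = [1 + 2 * a + 3 * b for a, b in X]

one_factor = pls1_fitted(X, y, 1)
two_factors = pls1_fitted(X, y, 2)
print(max(abs(a - b) for a, b in zip(one_factor, y)))
print(max(abs(a - b) for a, b in zip(two_factors, y)))
```

With one factor the fit is approximate; with as many factors as predictors the residual drops to rounding error.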
Strategy
  Fetches data from SPSS
  Uses scipy matrix operations to compute results
    Third-party module from Cheeseshop
  Writes pivot tables to SPSS Viewer
    Subject to OMS
    SPSS 14 viewer module created pivot table using OLE automation
    SPSS 15 has direct pivot table API's
  Saves predicted values to active dataset
The pls Module for SPSS 15
GET FILE="c:/spss15/tutorial/sample_files/car_sales.sav".
REGRESSION /STATISTICS COEFF R
  /DEPENDENT sales
  /METHOD=ENTER curb_wgt engine_s fuel_cap horsepow length mpg price resale type wheelbas width .

begin program.
import spss, pls
pls.plsproc("sales", """curb_wgt engine_s fuel_cap horsepow
length mpg price resale type wheelbas width""", yhat="predsales")
end program.

plsproc defaults to five factors
pls Example: REGRESSION vs PLS
PLS with 5 factors almost equals regression with 11 variables
Results
User procedures can be written in Python but specified using SPSS traditional syntax
User never writes or sees Python code
Used as if a built-in SPSS command
EXTENSION command defines command to SPSS via simple XML file
Python module called with syntax already checked and processed by SPSS
More general PLS module
  PLS y1 y2 y3 BY fac1 fac2 WITH z1 z2 z3
    /CRITERIA LATENTFACTORS=2.
Dialog box interface tools in SPSS 17
  In the meantime, use wxPython or something similar
SPSS 16 User Procedures
"Raking" adjusts sample weights to control totals in n dimensions
Example: data classified by age and sex with known population totals or proportions
Calculated by fitting a main effects loglinear model
  Various adjustments required
Not a complete solution to reweighting
Not directly available in SPSS
Raking Sample Weights
Strategy: combine SPSS procedures with Python logic
rake.py (from SPSS Developer Central)
  Aggregates data via AGGREGATE to new dataset
  Creates new variable with control totals
  Applies GENLOG, saving predicted counts
  Adjusts predicted counts
  Matches back into original dataset
    Does not use MATCH FILES or require a SORT command
Written in one (long) day
rake.rake("age sex", [{0: 1140, 1:1140}, {0: 104.6, 1:2175.4}], finalweight="finalwt")
Raking Module
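To see what raking does to a table of cell counts, here is a stand-alone sketch that uses iterative proportional fitting, a classical way to rake. It is a swapped-in technique for illustration, not the AGGREGATE plus GENLOG route that rake.py takes. The control totals are the ones in the rake.rake call above; the sample counts and this `rake` function are hypothetical.

```python
def rake(cells, controls, tolerance=1e-9, max_iterations=1000):
    """Scale cell counts until every margin matches its control totals.

    cells:    {(level of dim 0, level of dim 1, ...): count}
    controls: one dict per dimension mapping level -> control total
    """
    weights = dict(cells)
    for _ in range(max_iterations):
        worst = 0.0
        for dim, totals in enumerate(controls):
            # Current margin for this dimension.
            margin = {}
            for key, value in weights.items():
                margin[key[dim]] = margin.get(key[dim], 0.0) + value
            for level, target in totals.items():
                worst = max(worst, abs(margin[level] - target))
            # Rescale every cell so this margin hits its targets exactly.
            for key in weights:
                weights[key] *= totals[key[dim]] / margin[key[dim]]
        if worst < tolerance:
            break
    return weights


# Hypothetical sample counts keyed by (age, sex).
sample = {(0, 0): 30.0, (0, 1): 70.0, (1, 0): 20.0, (1, 1): 80.0}
# Control totals from the slide: age first, then sex.
controls = [{0: 1140, 1: 1140}, {0: 104.6, 1: 2175.4}]

raked = rake(sample, controls)
for key in sorted(raked):
    print(key, round(raked[key], 3))
```

Dividing each raked cell by its original sample count would give a per-cell weight adjustment, comparable in spirit to the finalweight variable named in the call above.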
SPSS 14 programmability can wrap SPSS syntax in Python logic, e.g., generate COMPUTE commands on the fly
  Useful when definitions can be expressed in SPSS syntax
SPSS 15 programmability can
  Generate new variables directly
  Add new cases directly
  Create new datasets from scratch
SPSS 16 has additional dataset capabilities
Extending SPSS Transformations
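Generating COMPUTE commands on the fly amounts to building syntax strings in a loop. A minimal sketch follows; the helper `compute_commands` and the variable names are hypothetical, and inside SPSS each resulting string could be handed to spss.Submit as in the earlier example.

```python
def compute_commands(variables, template):
    """Build one COMPUTE command per variable from a %-style template."""
    return ["COMPUTE " + template % {"var": name} + "." for name in variables]


# Hypothetical variable names; the template defines a log transform.
commands = compute_commands(["salary", "salbegin"], "ln_%(var)s = LN(%(var)s)")
commands.append("EXECUTE.")

for command in commands:
    print(command)
```

The same template works for two variables or two hundred, which is where wrapping syntax in Python logic pays off.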
trans module facilitates plugging in Python code to iterate over cases
  Runs as an SPSS procedure
  Passes the data
  Adds variables to the SPSS variable dictionary
  Can apply any calculation casewise
Use with
  Standard Python functions (e.g., math module)
  Any user-written functions or appropriate classes
  Functions in extendedTransforms module
trans and extendedTransforms Modules
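The casewise idea, passing each case through a Python function and appending the result as a new variable, can be mimicked on an in-memory list of cases. This sketch does not use the trans module or its API; `add_variable` and the data are hypothetical stand-ins.

```python
import math


def add_variable(cases, names, new_name, func, *arg_names):
    """Return (names, cases) extended with one variable computed per case."""
    positions = [names.index(arg) for arg in arg_names]
    new_cases = [
        tuple(case) + (func(*[case[p] for p in positions]),)
        for case in cases
    ]
    return names + [new_name], new_cases


# Made-up cases: an id and two coordinates.
names = ["id", "x", "y"]
cases = [(1, 3.0, 4.0), (2, 6.0, 8.0)]

# Any Python callable can serve as the transformation, here math.hypot.
new_names, new_cases = add_variable(cases, names, "dist", math.hypot, "x", "y")
print(new_names)
print(new_cases)
```

Because the transformation is an ordinary callable, standard-library functions and user-written functions plug in the same way.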
trans strategy
  Pass case data through Python code writing result back to SPSS in new variables
extendedTransforms collection of 12 functions to apply to SPSS variables, including
  Regular expression search/replace
  soundex and nysiis functions for phonetic equivalence
  Date/time conversions based on patterns
trans and extendedTransforms Modules
Pattern matching in text strings
If you use SPSS index or replace, you need these
Standardize string data (Mr, Mr., Herr, Senor,...)
Extract data from loosely structured text
  "simvastatin-- PO 80mg TAB" -> "simvastatin", "80"
Patterns can be simple strings (as with SPSS index) or complex patterns
Pick out variable names with common parts
Can greatly simplify code
Python Regular Expressions
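The three uses listed on this slide can be tried with Python's standard re module. The patterns below are illustrative guesses at how such tasks might be written, not the patterns used in extendedTransforms; the helper names and the sample variable names are hypothetical.

```python
import re

# Drug name at the start, then the first number that is followed by "mg".
DOSE = re.compile(r"^\s*([A-Za-z]+).*?(\d+)\s*mg", re.IGNORECASE)

# A few spellings of the same title, at the start of the string.
TITLE = re.compile(r"^(Mr\.?|Herr|Senor)\s+")


def drug_and_dose(text):
    """Return (drug, dose) from loosely structured text, or None."""
    match = DOSE.search(text)
    return match.groups() if match else None


def standardize_title(name):
    """Rewrite the recognized title variants as 'Mr. '."""
    return TITLE.sub("Mr. ", name)


print(drug_and_dose("simvastatin-- PO 80mg TAB"))
print([standardize_title(n) for n in ["Mr Smith", "Herr Schmidt", "Mrs Brown"]])

# Pick out variable names with common parts.
variable_names = ["q1_score", "q2_score", "age", "q10_score"]
scores = [n for n in variable_names if re.match(r"q\d+_score$", n)]
print(scores)
```

A single pattern replaces what would otherwise be a chain of index and substring calls, which is the sense in which regular expressions can greatly simplify code.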
Write to Me!