Programmability in SPSS 15
The Revolution Continues
Jon Peck, Technical Advisor, SPSS
Copyright (c) SPSS Inc, 2006
Recap of SPSS 14 Python programmability
Developer Central
New features in SPSS 15 programmability
  Writing first-class procedures
  Updating the data
The Bonus Pack modules
Interacting with the user
Q & A
Conclusion
Agenda
"Because of programmability, SPSS 14 is the most important release since I started using SPSS fifteen years ago."
"I think I am going to like using Python."
"Python, here I come!"
"I now think Python is an amazing language."
"Python and SPSS 14 and later are, IMHO, GREAT!"
"By the way, Python is a great addition to SPSS."
Quotations from SPSS Users
SPSS provides a powerful engine for statistical and graphical methods and for data management.
Python® provides a powerful, elegant, and easy-to-learn language for controlling and responding to this engine.
Together they provide a comprehensive system for serious applications of analytical methods to data.
The Combination of SPSS and Python
SPSS 14.0 provided
  Programmability
  Multiple datasets
  Variable and file attributes
  Programmability read access to case data
  Ability to control SPSS from a Python program
SPSS 15 adds
  Read and write case data
  Create new variables directly rather than generating syntax
  Create pivot tables and text blocks via backend APIs
  Easier setup
Programmability Features in SPSS 14 and 15
Makes possible jobs that respond to datasets, output, environment
Allows greater generality, more automation
Makes jobs more robust
Allows extending the capabilities of SPSS
Enables better organized and more maintainable code
Facilitates staff specialization
Increases productivity
More fun
Programmability Advantages
Python extends SPSS via
  General programming language
  Access to variable dictionary, case data, and output
  Access to standard and third-party modules
  SPSS Developer Central modules
  Module structure for building libraries of code
Runs in "back-end" syntax context (like macro)
  SaxBasic scripting runs in "front-end" context
Two modes
  Traditional SPSS syntax window
  Drive SPSS from Python (external mode)
Optional install
Programmability Overview
SPSS is not the owner or licensor of the Python
software. Any user of Python must agree to the
terms of the Python license agreement located on
the Python web site. SPSS is not making any
statement about the quality of the Python program.
SPSS fully disclaims all liability associated with
your use of the Python program.
Legal Notice
Supports implementing various programming languages
  Requires a programmer to implement a new language
VB.NET Plug-In available on Developer Central
  Works only in external mode
The SPSS Programmability SDK
Python interpreter embedded within SPSS
SPSS runs in traditional way until BEGIN PROGRAM command is found
Python collects commands until END PROGRAM command is found; then runs the program
Python can communicate with SPSS through APIs (calls to functions)
  Includes running SPSS syntax inside a Python program
  Includes creating macro values for later use in syntax
Python can access SPSS output and data
OMS is a key tool
How Programmability Works
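The collect-then-run behavior described on this slide can be pictured with a toy model. This is only a sketch in current Python 3, not how SPSS is implemented: `run_job` and the `run_syntax` callback are hypothetical names, the file name in the job is made up, and keeping one shared namespace for all program blocks is an assumption of this sketch.

```python
# Toy model only: ordinary syntax lines go to a handler; lines between
# BEGIN PROGRAM and END PROGRAM are collected and then executed as one
# Python program when END PROGRAM is reached.

def run_job(lines, run_syntax):
    """Process a job line by line.

    run_syntax is a hypothetical callback standing in for the SPSS engine.
    Returns the namespace the Python programs ran in.
    """
    program = None      # None means "not inside a program block"
    namespace = {}      # assumption: shared by every program block in the job
    for line in lines:
        keyword = line.strip().rstrip(".").upper()
        if keyword == "BEGIN PROGRAM":
            program = []
        elif keyword == "END PROGRAM" and program is not None:
            exec("\n".join(program), namespace)
            program = None
        elif program is not None:
            program.append(line)
        else:
            run_syntax(line)
    return namespace


job = [
    "GET FILE='demo.sav'.",      # hypothetical file name
    "BEGIN PROGRAM.",
    "total = sum(range(5))",
    "END PROGRAM.",
    "DESCRIPTIVES ALL.",
]
seen = []
result = run_job(job, seen.append)
print(seen)
print(result["total"])
```

The point of the sketch is the ordering: nothing inside the block runs until END PROGRAM is seen, and ordinary syntax before and after the block is handled in the traditional way.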
BEGIN PROGRAM.
import spss, spssaux
spssaux.GetSPSSInstallDir("SPSSDIR")
spssaux.OpenDataFile("SPSSDIR/employee data.sav")

# find categorical variables
catVars = spssaux.VariableDict(variableLevel=['nominal', 'ordinal'])
if catVars:
    spss.Submit("FREQ " + " ".join(catVars.variables))
    # create a macro listing categorical variables
    spss.SetMacroValue("!catVars", " ".join(catVars.variables))
END PROGRAM.

DESC !catVars.
Example: Summarize Categorical Variables
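The heart of this example is plain string handling: select variable names by measurement level, then join them into a command. The sketch below shows that step on its own in current Python 3, with an ordinary dict standing in for the variable dictionary. The variable names, their levels, and the helper `names_with_level` are made up for illustration and are not part of spssaux.

```python
# Hypothetical stand-in for the variable dictionary: name -> measurement level.
variables = {
    "gender": "nominal",
    "educ": "ordinal",
    "salary": "scale",
    "jobcat": "ordinal",
}


def names_with_level(var_levels, levels):
    """Return the variable names whose measurement level is in `levels`,
    in dictionary order."""
    return [name for name, level in var_levels.items() if level in levels]


cat_vars = names_with_level(variables, {"nominal", "ordinal"})
command = ""
if cat_vars:
    # Build the syntax text only when there is something to summarize.
    command = "FREQUENCIES " + " ".join(cat_vars) + "."
    print(command)
```

Guarding on an empty list matters because a command with no variable names would not be valid syntax.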
Two modes of operation
SPSS Drives mode (inside): traditional syntax context
  BEGIN PROGRAM
  …program…
  END PROGRAM
X Drives mode (outside): eXternal program drives SPSS
  Python interpreter (or VB.NET)
  import spss
  No SPSS Viewer, Data Editor, or SPSS user interface
  Output sent as text to the application – can be suppressed
  Has performance advantages
  Build programs with an IDE
    Even if they are to be run in traditional mode
Programmability Inside or Outside SPSS
PythonWin IDE Controlling SPSS
Python.org
Python Tutorial
Global (standard) Module Index
Python help system and help command
Cheeseshop
  1627 packages as of Sept 21, 2006
SPSS Developer Central
SPSS Programming and Data Management, 3rd ed, 2006.
Many books
  Look for books at the Python 2.4 level
Python Resources
Dive Into Python book or PDF
Practical Python by Magnus Lie Hetland
  Extensive examples and discussion of Python
Python Cookbook, 2nd ed by Martelli, Ravenscroft, & Ascher
Second edition (July, 2006) of Martelli, Python in a Nutshell, O'Reilly
  Very clear, comprehensive reference material
wxPython in Action by Rappin and Dunn
  Explains user interface building with wxPython
Python Books
scipy 0.5.0 Scientific Algorithms Library for Python
scipy is an open source library of scientific tools for Python. scipy gathers a variety of high level science and engineering modules together as a single package. scipy provides modules for statistics, optimization, integration, linear algebra, Fourier transforms, signal and image processing, genetic algorithms, ODE solvers, special functions, and more. scipy requires and supplements NumPy, which provides a multidimensional array object and other basic functionality.
scipy rework currently beta
Visit Scipy.org
Cheeseshop: scipy
Went live 21-May-2006
New Web home for developing SPSS applications
SPSS Developer Central
  Old URL: forums.spss.com/code_center
Python Integration Plug-Ins
Useful supplementary modules by SPSS and others
  Updated for SPSS 15
Articles on programmability and graphics
Place to ask questions and exchange information
Programmability Extension SDK
Get Python itself from Python.org
  SPSS uses 2.4 (2.4.3)
Not limited to programmability
  GPL graphics
  User-contributed code
Key supplementary modules
  spssaux
  spssdata
New for SPSS 15
  trans
  extendedTransforms
  rake
  pls
SPSS Developer Central
You can extend SPSS capabilities by building new procedures
  Or use ones that others have built
Combine SPSS procedures and transformations with Python logic
  Poisson regression (SPSS 14) example using iterated CNLR
  New raking procedure built over GENLOG
Calculate data aggregates in SPSS and pass to algorithm coded in Python
  Raking procedure starts with AGGREGATE
Acquire case data and compute in Python
  Use Python standard modules and third-party additions
  Partial Least Squares Regression (pls module)
Approaches to Creating New Procedures
Common to adapt existing libraries or code for use as Python extension modules
  C, C++, VB, Fortran,...
Extension modules are normal Python modules
  Python itself written in C
  Many standard modules are C code
Python tools and APIs to assist
  Chap 25 in Python in a Nutshell
  Tutorial on extending and embedding the Python interpreter
Adapt Existing Code Libraries
Regression with large number of predictors (even k > N)
Similar to Principal Components but considers dependent variable simultaneously
Calculates principal components of (y, X), then uses regression on the scores instead of the original data
User chooses number of factors
Equivalent to ordinary regression when the number of factors equals the number of predictors and there is one y variable
For more information, see An Optimization Perspective on Kernel Partial Least Squares Regression.pdf.
Partial Least Squares Regression
Strategy
  Fetches data from SPSS
  Uses scipy matrix operations to compute results
    Third-party module from Cheeseshop
  Writes pivot tables to SPSS Viewer
    Subject to OMS
    SPSS 14 viewer module created pivot table using OLE automation
  Saves predicted values to active dataset
The pls Module
GET FILE="c:/spss15/tutorial/sample_files/car_sales.sav".
REGRESSION /STATISTICS COEFF R
  /DEPENDENT sales
  /METHOD=ENTER curb_wgt engine_s fuel_cap horsepow length mpg price resale type wheelbas width .

begin program.
import spss, pls

pls.plsproc("sales", """curb_wgt engine_s fuel_cap horsepow
length mpg price resale type wheelbas width""", yhat="predsales")
end program.

plsproc defaults to five factors
pls Example: REGRESSION vs PLS
PLS with 5 factors almost equals regression with 11 variables
Results
"Raking" adjusts sample weights to control totals in n dimensions
Example: data classified by age and sex with known population totals or proportions
Calculated by fitting a main effects loglinear model
  Various adjustments required
  Not a complete solution to reweighting
Not directly available in SPSS
Raking Sample Weights
Strategy: combine SPSS procedures with Python logic
rake.py (part of SPSS 15 Bonus Pack)
  Aggregates data via AGGREGATE to new dataset
  Creates new variable with control totals
  Applies GENLOG, saving predicted counts
  Adjusts predicted counts
  Matches back into original dataset
    Does not use MATCH FILES or require a SORT command
  Written in one (long) day
rake.rake("age sex", [{0: 1140, 1:1140}, {0: 104.6, 1:2175.4}], finalweight="finalwt")
Raking Module
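To make the idea of raking concrete, here is a minimal standalone sketch in current Python 3. It uses iterative proportional fitting, the classical hand algorithm for adjusting a table to known margins, and not the AGGREGATE and GENLOG route that the slides describe for rake.py. The sample counts are invented for illustration; the control totals are the ones shown in the rake.rake call above, and the function name `rake_table` is hypothetical.

```python
def rake_table(counts, row_totals, col_totals, tol=1e-8, max_iter=1000):
    """Adjust a two-way table of positive sample counts to control totals
    by iterative proportional fitting.

    counts: dict mapping (row, col) -> sample count
    row_totals, col_totals: dicts of control totals with the same grand total
    Returns a dict of adjusted cell counts.
    """
    fitted = dict(counts)
    for _ in range(max_iter):
        # Scale each row so that it sums to its control total.
        for r, target in row_totals.items():
            current = sum(v for (i, _j), v in fitted.items() if i == r)
            for key in fitted:
                if key[0] == r:
                    fitted[key] *= target / current
        # Scale each column so that it sums to its control total.
        for c, target in col_totals.items():
            current = sum(v for (_i, j), v in fitted.items() if j == c)
            for key in fitted:
                if key[1] == c:
                    fitted[key] *= target / current
        # Columns now match exactly; stop once the rows match as well.
        worst = max(
            abs(sum(v for (i, _j), v in fitted.items() if i == r) - target)
            for r, target in row_totals.items()
        )
        if worst < tol:
            break
    return fitted


# Invented sample counts by (age, sex); control totals from the slide.
sample = {(0, 0): 10, (0, 1): 40, (1, 0): 5, (1, 1): 45}
age_totals = {0: 1140, 1: 1140}
sex_totals = {0: 104.6, 1: 2175.4}

fitted = rake_table(sample, age_totals, sex_totals)
# The final weight for a case is its cell's adjusted count over the sample count.
weights = {cell: fitted[cell] / n for cell, n in sample.items()}
print(weights)
```

Every case in the same age-by-sex cell receives the same weight, which is why working on the aggregated table is enough.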
SPSS 14 programmability can wrap SPSS syntax in Python logic
  Useful when definitions can be expressed in SPSS syntax
SPSS 15 programmability can generate new variables directly
  Cursor can have accessType='w'
SPSS 15 programmability can add cases directly
  Cursor can have accessType='a'
SPSS 15 programmability can create new datasets from scratch
  Cursor can have accessType='n'
spssdata module on Developer Central updated to support these modes
Extending SPSS Transformations
trans module facilitates plugging in Python code to iterate over cases
Runs as an SPSS procedure
  Passes the data
  Adds variables to the SPSS variable dictionary
  Can apply any calculation casewise
Use with
  Standard Python functions (e.g., math module)
  Any user-written functions or appropriate classes
  Functions in extendedTransforms module
trans and extendedTransforms Modules
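The casewise plug-in idea can be illustrated without SPSS. The class below is a hypothetical, much-simplified stand-in written in current Python 3. It is not the trans module: it works on a list of dicts instead of the active dataset, it ignores variable formats, and the name `MiniTransform` is invented for this sketch.

```python
import math


class MiniTransform:
    """Hypothetical stand-in for the idea of registering functions and
    then applying them to every case."""

    def __init__(self):
        self.steps = []

    def append(self, func, outputs, inputs):
        """Register func. `inputs` lists existing variable names;
        `outputs` is a name or a tuple of names for the new variables."""
        if isinstance(outputs, str):
            outputs = (outputs,)
        self.steps.append((func, tuple(outputs), tuple(inputs)))

    def execute(self, cases):
        """Pass every case (a dict) through each registered function and
        store the results under the output names."""
        for case in cases:
            for func, outputs, inputs in self.steps:
                result = func(*(case[name] for name in inputs))
                if len(outputs) == 1:
                    result = (result,)
                case.update(zip(outputs, result))
        return cases


def whole_and_fraction(x):
    """User-written function returning two values for two new variables."""
    fraction, whole = math.modf(x)
    return whole, fraction


cases = [{"income": 100.0}, {"income": 1234.5}]
t = MiniTransform()
t.append(math.log10, "logincome", ["income"])                      # standard function
t.append(whole_and_fraction, ("whole", "fraction"), ["income"])    # user function
t.execute(cases)
print(cases[1])
```

Registering all functions first and making one pass over the cases mirrors the "runs as a procedure, passes the data" description above.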
trans strategy
  Pass case data through Python code, writing the result back to SPSS in new variables
extendedTransforms collection of ten functions to apply to SPSS variables
  Regular expression search/replace
  Template-based substitution
  soundex and nysiis functions for phonetic equivalence
  Levenshtein distance function for string similarity
  Date/time conversions based on patterns
trans and extendedTransforms Modules
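For readers unfamiliar with Levenshtein distance, the following is a textbook dynamic-programming sketch in current Python 3. It is an illustration of the measure itself and makes no claim about how the extendedTransforms function is written or what arguments it takes.

```python
def levenshtein(a, b):
    """Return the minimum number of single-character insertions,
    deletions, and substitutions needed to turn string a into string b."""
    if len(a) < len(b):
        a, b = b, a              # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,          # delete a character of a
                current[j - 1] + 1,       # insert a character of b
                previous[j - 1] + cost,   # substitute (or match)
            ))
        previous = current
    return previous[-1]


for other in ("Pech", "Pek", "Beck", "Peckham"):
    print("Peck", other, levenshtein("Peck", other))
```

Small distances flag likely spelling variants of the same name, which is the string-similarity use mentioned in the list above.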
Pattern matching in text strings
If you use SPSS index or replace, you need these
Standardize string data (Mr, Mr., Herr, Senor,...)
Patterns can be simple strings (as with SPSS index) or complex patterns
Pick out variable names with common parts
Python Regular Expressions
"age" – string containing the letters age
"\bage\b" – string containing the word age
"abc|xyz|pqrst" – string containing any of abc, xyz, or pqrst
"\d+" – a string of one or more digits
"x.*y" – a string starting with x and ending with y
Can be case sensitive or not
Can greatly simplify code currently using SPSS index and replace functions
Regular Expressions: A Few Examples
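The kinds of patterns discussed on the regular-expression slides can be tried directly with the standard re module. This is a small sketch in current Python 3. The sample text, the list of names, and the helper `standardize_title` are invented for illustration, and the word-boundary form `\bage\b` is this sketch's choice for matching a whole word.

```python
import re

text = "The average wage at age 42 was 1350 dollars"

has_age = bool(re.search("age", text))            # the letters "age" anywhere
whole_word = re.findall(r"\bage\b", text)         # "age" as a separate word
alternatives = re.findall("abc|xyz|wage", text)   # any one of several strings
digits = re.findall(r"\d+", text)                 # runs of one or more digits
span = re.search("x.*y", "axbyc").group()         # from an x through the last y
any_case = bool(re.search("AGE", text, re.IGNORECASE))

# Standardizing string data: map several title spellings onto one form.
TITLE = re.compile(r"^(mr\.?|herr|senor)\s+", re.IGNORECASE)


def standardize_title(name):
    """Replace a leading courtesy title with the single form 'Mr'."""
    return TITLE.sub("Mr ", name.strip())


names = ["Mr. Smith", "mr Jones", "Herr Schmidt", "Senor Garcia", "Dr Who"]
cleaned = [standardize_title(n) for n in names]

print(has_age, whole_word, alternatives, digits, span, any_case)
print(cleaned)
```

Raw strings (the `r"..."` form) keep backslashes such as `\b` and `\d` from being interpreted by Python before the pattern reaches the re module.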
import spss, trans, spssaux, extendedTransforms

spssaux.OpenDataFile("c:/data/names.sav")
tproc = trans.Tfunction(listwiseDeletion=True)

tproc.append(extendedTransforms.search, 'match', 'a8', ['names', trans.const('Peck|Pech|Pek')])
tproc.append(extendedTransforms.search, 'matchignorecase', 'a8', ['names', trans.const('peck'), trans.const(True)])
tproc.append(extendedTransforms.search, ('match2', 'startpos', 'length'), ('a12', 'f4.0', 'f4.0'), ['names', trans.const('Peck')])
tproc.execute()
spss.Submit("SELECT IF length > 0")

spssaux.SaveDataFile("c:/temp/namesplus.sav")
Using trans and extendedTransforms search Function
begin program.
import trans, re

def splitAndExtract(s):
    """split a string on "--" and return the left part and the number in the right part.
    Ex: "simvastatin-- PO 80mg TAB" -> "simvastatin", 80"""
    parts = s.split("--")
    try:
        # first run of digits in the part after "--" (returned as a string)
        number = re.search(r"\d+", parts[1]).group()
    except (IndexError, AttributeError):
        # no "--" in the string, or no digits after it
        number = None
    return parts[0], number

tproc = trans.Tfunction()
tproc.append(splitAndExtract, ("name", "number"), ("a30", "f5.0"), ["medicine"])
tproc.execute()
end program.
Using trans:Writing Your Own Function
Algorithms for approximating phonetic equivalence of names
soundexallwords can be used on unstructured text
Applied to database of 20,000+ surnames
import spss, trans, spssaux, extendedTransforms

spssaux.OpenDataFile("c:/data/names.sav")
tproc = trans.Tfunction()

tproc.append(extendedTransforms.soundex, 'soundex', 'a5', ['names'])
tproc.append(extendedTransforms.nysiis, 'nysiis', 'a20', ['names'])
tproc.execute()

spssaux.SaveDataFile("c:/temp/namesplusplus.sav")
extendedTransforms soundex and nysiis
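To show what a phonetic code looks like, here is a standalone sketch of the standard American Soundex rules in current Python 3. It is written for illustration only. The extendedTransforms implementation may differ in details such as code length or the handling of non-letters, and nysiis is a different, more elaborate algorithm that is not sketched here.

```python
# Letter groups that share a Soundex digit; vowels, H, W, and Y have no digit.
_CODES = {}
for _letters, _digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                         ("L", "4"), ("MN", "5"), ("R", "6")):
    for _ch in _letters:
        _CODES[_ch] = _digit


def soundex(name):
    """Return the four-character American Soundex code of a name,
    or an empty string if the name contains no letters A-Z."""
    letters = [c for c in name.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    result = [letters[0]]                 # the first letter is kept as is
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue                      # H and W do not separate equal codes
        code = _CODES.get(ch, "")
        if code and code != prev:
            result.append(code)
        prev = code                       # a vowel resets prev, so a repeat is kept
    return ("".join(result) + "000")[:4]  # pad or truncate to four characters


for surname in ("Peck", "Pech", "Pek", "Robert", "Rupert"):
    print(surname, soundex(surname))
```

Names that sound alike collapse to the same code, so the three spellings searched for in the earlier regular-expression example all share one code here.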
Results
(Overly) simple processing of unstructured text
Use soundex word by word to abstract spelling
No stemming, linguistic analysis, etc.
  Use STAFS for serious work
Very simple to use
begin program.
import spss, trans, extendedTransforms
t = trans.Tfunction()
t.append(extendedTransforms.soundexallwords, 'allsoundexn66', 'a108', ['n_66'])
t.execute()
end program.
soundex on Unstructured Text
soundex on Unstructured Text
Python comes with Tkinter, a GUI toolkit
There are better ones freely downloadable
  E.g., wxPython
  Visit wxpython.org
Very easy to do small user interactions
Examples
  Message box
  File chooser
  Variable picker
Creating a Graphical User Interface
Simple Message Box Using wxPython
Simple File Chooser Using wxPython
Variable Picker Using wxPython
User-missing values
  GetVarMissingValues
  GetSPSSLowHigh
Pivot table APIs
  BasePivotTable
  CellText
  Dimension
Output
  Text block support
    Good for writing comments to the Viewer
Miscellaneous
  GetWeightVar
  HasCursor
  SplitChange
Other New spss Module APIs
SPSS 14 introduced major programmability features
SPSS 15 adds
  Reading and writing case data: new variables; new cases
  Creating pivot tables and text blocks
  Writing first-class SPSS procedures
Bonus Pack and Partial Least Squares modules illustrate these features
Developer Central improves ability to provide modules and information
  Will soon have four new SPSS 15 modules
Recap
Questions
SPSS 15 programmability makes it easy to add capabilities beyond what is already built into SPSS
SPSS 15 makes it easier to build complete applications on top of SPSS
SPSS 15 programmability makes you more productive
SPSS 15 has lots of other great features, too
Try it out
SPSS 15: The Revolution Continues
Write to Me!