Programmability in SPSS 16 and 17
Jon K Peck
Technical Advisor and Principal Software Engineer
Athens, May 2008
Agenda
Review of programmability
The Extension mechanism and the PROPOR procedure
User-defined dialog boxes
The Dataset class and comparing datasets
Examples: custom sorting, pattern matching
Building applications that embed SPSS
Integrating R into SPSS
Q and A
Wrap up
Programmability extends the standard SPSS capabilities
Makes it easy to build jobs that respond to data, output, and the environment
Allows greater generality, more automation
Makes jobs more flexible and robust
Allows extending the capabilities of SPSS
Allows the use of existing or new statistical modules written in R or Python
Enables simpler and more maintainable code
Increases your productivity
Puts you in control
More fun
SPSS syntax
BEGIN PROGRAM PYTHON or R.
Python or R code
END PROGRAM.
SPSS syntax
A program in the SPSS input stream can communicate with SPSS and control it and use the language's facilities and modules
A Python or .NET application can embed SPSS inside itself
Resources and forums are at SPSS Developer Central: www.spss.com/devcentral
Programmability plugins are an optional install
Programmability embeds Python or R inside SPSS
BEGIN PROGRAM.
import spss, spssaux, spssdata

def findUnlabelledValues(name):
    d = spssaux.VariableDict()
    labels = set(d[name].ValueLabelsTyped)
    data = spssdata.Spssdata(indexes=[name])
    values = set()
    for case in data:
        values.add(case[0])
    data.close()
    values.discard(None)
    print "\nUnlabeled Values:\n",sorted(values.difference(labels))

findUnlabelledValues("origin")
END PROGRAM.
Example: Automate the job of finding unlabelled values of a variable
No label may indicate an error
Unlabeled Values:
[4.0, 7.0, 11.0]
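The core of this example is a set difference between the values seen in the data and the values that carry labels. A minimal sketch of that step in plain Python, runnable without SPSS, follows. The function name, the sample values, and the sample labels are hypothetical stand-ins for what spssaux and spssdata return.

```python
def find_unlabelled_values(values, value_labels):
    """Return the sorted data values that have no value label.

    values: iterable of observed data values (None marks a missing value).
    value_labels: dict mapping labelled values to their label text.
    """
    observed = set(values)
    observed.discard(None)  # ignore missing cases, as the slide code does
    return sorted(observed - set(value_labels))


# Hypothetical data standing in for the "origin" variable.
origin_values = [1.0, 2.0, 3.0, 4.0, 7.0, None, 11.0, 1.0]
origin_labels = {1.0: "American", 2.0: "European", 3.0: "Japanese"}

print("Unlabeled Values:")
print(find_unlabelled_values(origin_values, origin_labels))
```

With these sample inputs the values 4.0, 7.0, and 11.0 are reported because they appear in the data but not among the label keys.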
Python and R are open source software
SPSS is not the owner or licensor of the Python or R software. Any user of Python or R must agree to the terms of the license agreement located on the Python or R web site. SPSS is not making any statement about the quality of the Python or R programs. SPSS fully disclaims all liability associated with your use of the Python or R programs.
SPSS is divided into two parts
The SPSS Processor: invisible
– Syntax processing
– Computation
– Data handling
– Procedures
– May be remote with SPSS Server
The SPSS Front End: what you see
– Menus and dialog boxes
– Output Viewer
– Data Editor
– Syntax window
SPSS 16 added new programmability and scripting features
SPSS 15
SPSS Processor
– SPSS syntax
– Python programs
– .NET programs
SPSS Front End
– SaxBasic scripts
– COM support

SPSS 16
SPSS Processor
– SPSS syntax
– Python programs
– .NET programs
– R programs
– Extensions
SPSS Front End
– Basic scripts (Windows)
– COM support (Windows)
– Python scripts
Scripting is useful for working with Viewer contents
Scripts can be written in Python or, on Windows, in Basic
Python APIs have a structure similar to familiar SaxBasic scripting
– Import the spssClient module
IDEs are provided for Python and Basic
SPSS 17 will allow programs to use the spssClient module
Autoscripts are triggered by specified types of output events
– E.g., creating a table of regression coefficients
Autoscripts have been generalized in SPSS 16
Python and R add great functionality to SPSS
Many users know only SPSS syntax
MEANS TABLES = accel BY origin
  /CELLS MEAN COUNT STDDEV MEDIAN
  /STATISTICS LINEARITY.
Extensions define SPSS syntax for programs via XML
Definitions are loaded automatically on SPSS startup
Parsed syntax is passed to Python or R module
User never needs to know about the programs
Author never needs to parse SPSS syntax
PLS module in SPSS 16 is an extension
The EXTENSION mechanism turns Python or R programs into user-defined SPSS syntax
Extensions simplify the author's job
[Diagram with boxes labeled: User's SPSS Syntax, SPSS Parser, Extension XML, Author Code Module, Run Extension Module, Template parsecmd, Output]
The author supplies only the gold parts
The user just enters the command syntax
PROPOR is a new extension procedure
Calculates confidence intervals for proportions
Produces pivot table output
PROPOR /HELP.
Confidence Intervals for Proportions and Differences in Proportions.
PROPOR /HELP displays this help and does nothing else.
Syntax:
PROPOR NUM=list DENOM=list [ID=varname]
  [/DATASET NAME=dsname]
  [/LEVEL ALPHA=value]
  [/HELP]
Example:
PROPOR NUM= 55 DENOM=100.
Developer Central
PROPOR produces a pivot table of confidence intervals
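The slides do not say which interval formula PROPOR uses. As an illustration of the kind of calculation involved, here is a minimal sketch of a Wilson score interval for a single proportion, using the NUM=55 DENOM=100 example and an alpha of 0.05. The choice of the Wilson method and the function name are assumptions, not a description of PROPOR itself.

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(num, denom, alpha=0.05):
    """Wilson score confidence interval for the proportion num/denom.

    This is one common interval; PROPOR may use a different formula.
    """
    if denom <= 0 or not 0 <= num <= denom:
        raise ValueError("need denom > 0 and 0 <= num <= denom")
    z = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    p = num / denom
    scale = 1 + z * z / denom
    center = (p + z * z / (2 * denom)) / scale
    half = z * sqrt(p * (1 - p) / denom + z * z / (4 * denom ** 2)) / scale
    return center - half, center + half


lower, upper = wilson_interval(55, 100)
print("95%% interval for 55/100: %.4f to %.4f" % (lower, upper))
```

A smaller alpha widens the interval, which mirrors the role of the /LEVEL ALPHA=value subcommand in the syntax chart.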
What about user interfaces?
SPSS 17
User-defined dialog boxes look like SPSS-defined dialogs
Which is the real one?
SPSS 17
Programmability can enhance procedures: A program to customize sorting in CTABLES
CTABLES /TABLE occupation[COUNT]
  /CATEGORIES VARIABLES=occupation ORDER=D KEY=COUNT TOTAL=YES.
This table is sorted in descending order, but category Other should be at the bottom.
A Program To Customize Sorting in CTABLES
import spss, spssaux2
spssaux2.genCategoryList("occupation", specialvalues=[4], macroname="other")
spss.Submit("""CTABLES /TABLE occupation[COUNT]
  /CATEGORIES VARIABLES=occupation [!other] TOTAL=YES.""")
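The idea behind the program is to replace a built-in sort with an explicit category list in which the special value comes last. The sketch below shows that ordering step in plain Python. It is not the genCategoryList implementation: the function name and the counts are hypothetical, and sorting the remaining categories by descending count is an assumption chosen to match the original table.

```python
def category_order(counts, special_values=()):
    """Order category values by descending count, special values last."""
    special = [v for v in special_values if v in counts]
    regular = [v for v in counts if v not in special]
    regular.sort(key=lambda v: counts[v], reverse=True)
    return regular + special


# Hypothetical counts per occupation code; code 4 stands for "Other".
counts = {1: 120, 2: 45, 3: 80, 4: 150}
order = category_order(counts, special_values=[4])
print(order)  # [1, 3, 2, 4]

# The explicit list that would go inside the brackets of /CATEGORIES.
category_list = " ".join(str(v) for v in order)
print(category_list)
```

Even though code 4 has the largest count here, it is placed at the end because it was named as a special value.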
Python regular expressions greatly simplify tasks involving patterns in strings
A regular expression defines a pattern that can be searched for or used in a replace
Example: a dataset contains three variables, firstname, lastname, and narrative. The names need to be replaced in the narratives so that they are anonymous
Sample data:
Using regular expressions to work with patterns: Making a narrative anonymous
begin program.
import spss, spssaux, spssdata, re

vard = spssaux.VariableDict()
# Open a write cursor that reads the three variables of interest.
curs = spssdata.Spssdata(indexes='firstname lastname narrative', accessType='w')
# New string variable, 100 bytes wider than narrative, to hold the result.
curs.append(spssdata.vdef("anonnarrative",
    vtype=vard['narrative'].VariableType + 100))
curs.commitdict()
wbound = r"\b"    # word boundary, so only whole words match
for case in curs:
    fnregex = re.compile(wbound + case.firstname.strip() + wbound, flags=re.IGNORECASE)
    lnregex = re.compile(wbound + case.lastname.strip() + wbound, flags=re.IGNORECASE)
    newnarr = fnregex.sub("-firstname-", case.narrative)
    newnarr = lnregex.sub("-lastname-", newnarr)
    curs.casevalues([newnarr])
curs.CClose()
end program.
E.g. \bSmith\b
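The word-boundary pattern can be tried outside SPSS. The following is a minimal sketch of the same substitution on a single hypothetical case. The function name and the sample text are made up for illustration, and re.escape is an addition not on the slide: it guards against names that contain regular expression metacharacters.

```python
import re


def anonymize(narrative, firstname, lastname):
    """Replace whole-word, case-insensitive occurrences of the names."""
    def name_regex(name):
        # \b on both sides restricts matches to complete words.
        return re.compile(r"\b" + re.escape(name.strip()) + r"\b",
                          flags=re.IGNORECASE)

    text = name_regex(firstname).sub("-firstname-", narrative)
    return name_regex(lastname).sub("-lastname-", text)


# Hypothetical case; the trailing blank mimics a padded SPSS string value.
result = anonymize("Mary Smith met a blacksmith. SMITH left.", "Mary ", "Smith")
print(result)
```

The word "blacksmith" is left alone because there is no word boundary before "smith" inside it, while "SMITH" is replaced because matching ignores case.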
Before and After
The Dataset class delivers new functionality for data management
Available for Python and .NET
Retrieve, add, delete and change variables, properties, and values
Process multiple datasets at the same time
Access any case by case number
Included in the spss module in the plug-in
SPSS 16
ds = spss.Dataset()
ds.varlist['accel'].label = "acceleration"  # change label
print len(ds.cases)
ds.cases[10,2] = [100]  # change a value
comparedatasets uses the Dataset class to compare cases and variables in two datasets
BEGIN PROGRAM.
import spss, comparedatasets
c = comparedatasets.CompareDatasets("first", "second",
    idvar="id", diffcount="differences", reportroot="compare")
c.cases()
c.dictionaries()
c.close()
END PROGRAM.
As an extension:
COMPDS DS1=first, DS2=second
  /DATA ID=id DIFFCOUNT=differences ROOTNAME=compare.
Developer Central
comparedatasets: The output dataset reports case differences
comparedatasets: A summary is written to the SPSS Viewer
You can do selection, summary statistics, and charts on the outcome variables for further information.
SPSS 17 will have a built-in procedure
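To make the idea of a case comparison concrete, here is a minimal sketch in plain Python that matches cases on an ID variable and lists the variables whose values differ. It is not how the comparedatasets module is implemented: the function name, the data layout (lists of dictionaries), and the sample values are all hypothetical.

```python
def compare_cases(first, second, idvar="id"):
    """Return {id: [variables that differ]} for ids present in both inputs."""
    second_by_id = {row[idvar]: row for row in second}
    differences = {}
    for row in first:
        other = second_by_id.get(row[idvar])
        if other is None:
            continue  # this id exists only in the first dataset
        differences[row[idvar]] = sorted(
            name for name in row
            if name != idvar and name in other and row[name] != other[name]
        )
    return differences


# Hypothetical datasets keyed by "id".
first = [{"id": 1, "mpg": 18.0, "accel": 12.0},
         {"id": 2, "mpg": 15.0, "accel": 11.5}]
second = [{"id": 1, "mpg": 18.0, "accel": 12.5},
          {"id": 2, "mpg": 15.0, "accel": 11.5}]

diffs = compare_cases(first, second)
diffcount = {case_id: len(names) for case_id, names in diffs.items()}
print(diffs)
print(diffcount)
```

The per-case count plays the role of the DIFFCOUNT variable in the syntax shown earlier; a count of zero means the case matched on every shared variable.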
The Dataset class makes it easy to use the functions in the extendedTransforms module
data list fixed /dt(a21).
begin data.
2/22/2008 11:47:45 AM
2/22/2008 11:47:45 PM
end data.

begin program.
import spss, extendedTransforms
spss.StartDataStep()
ds = spss.Dataset()
ds.varlist.append("newdt", 0)
ds.varlist[-1].format = (22,22,0)  # DATETIME22.0 format
for i, case in enumerate(ds.cases):
    ds.cases[i, -1] = extendedTransforms.strtodatetime(case[0],
        "%m/%d/%Y %I:%M:%S %p")
spss.EndDataStep()
end program.
strtodatetime and datetimetostr allow patterns to be used for dates and times
14 functions in extendedTransforms
Developer Central
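The pattern passed to strtodatetime uses the same directives as Python's own datetime.strptime, so it can be tested without SPSS. The sketch below parses the two sample strings and converts them to seconds since October 14, 1582, which is how SPSS stores date and time values. The function name is hypothetical, and the assumption is that strtodatetime returns a value on that scale; the sketch also assumes an English locale for the AM/PM marker.

```python
from datetime import datetime

# SPSS date/time values count seconds from midnight, October 14, 1582.
SPSS_EPOCH = datetime(1582, 10, 14)


def str_to_spss_datetime(text, pattern):
    """Parse text with a strptime pattern and return SPSS-style seconds."""
    parsed = datetime.strptime(text.strip(), pattern)
    return (parsed - SPSS_EPOCH).total_seconds()


pattern = "%m/%d/%Y %I:%M:%S %p"  # %I is the 12-hour clock, %p is AM/PM
am = str_to_spss_datetime("2/22/2008 11:47:45 AM", pattern)
pm = str_to_spss_datetime("2/22/2008 11:47:45 PM", pattern)
print(am, pm, pm - am)
```

The two results differ by exactly twelve hours in seconds, which confirms that the %I and %p directives are handling the 12-hour clock.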
You can write applications where SPSS is hidden using external drives mode
Application built by SPSS Services
A Reporting Application
Real names have been scrambled
Written entirely in Python
Uses SPSS invisibly for calculation and charting
Output is captured with the Output Management System (OMS)
Uses free packages to supplement SPSS
– wxPython for user interface
– ReportLab for PDF production
Similar things could be done with .NET
The application was built with Python, SPSS, and standard Python packages
R programs can be run inside SPSS
SPSS datasets and output can be processed by R
New SPSS datasets can be created from R
R can communicate with SPSS via 30 APIs
BEGIN PROGRAM R.
cases <- spssdata.GetDataFromSPSS(c("mpg", "accel"), 5)
spsspivottable.Display(cases, collabels=c("mpg", "accel"))
END PROGRAM.
• Output appears in the SPSS Viewer
• spsspivottable.Display produces pivot tables
• print() produces plain text
• SPSS 17 will include graphical output
R brings many statistical methods into SPSS
52 packages starting with "a"
Example: Estimate Rents Using the R Package kknn: K Nearest Neighbors
BEGIN PROGRAM R.
dict <- spssdictionary.GetDictionaryFromSPSS()
data <- spssdata.GetDataFromSPSS()
library(kknn)
kl <- c("rectangular","triangular","epanechnikov", "gaussian","rank")
t.con <- train.kknn(nmqm ~ wfl + bjkat + zh, data=data, kmax=25, kernel=kl)
print(t.con)
newv <- spssdictionary.CreateSPSSDictionary(c("predictedRent",
    "Predicted Rent", 0, "F8.2", "scale"))
spssdictionary.SetDictionaryToSPSS("newrents", data.frame(dict, newv))
# Index of the best kernel/k combination among the fitted values (25 per kernel)
best <- (charmatch(t.con$best.parameters$kernel, kl)-1) * 25 +
    t.con$best.parameters$k
spssdata.SetDataToSPSS("newrents",
    data.frame(c(t.con$fitted.values[[best]]), data))
spssdictionary.EndDataStep()
END PROGRAM.
(Adapted from an Example in the kknn Package)
R output appears in the Viewer. The output data appear in the Data Editor
Where We Have Been Today
Programmability adds flexibility and power to SPSS
The extension mechanism integrates programs better into SPSS syntax
The new Dataset class adds data management power
The new scripting capabilities provide more ways to work with output
R integration opens a large collection of statistical techniques to SPSS users
Questions and Answers
In Conclusion
Programmability capabilities continue to grow
Opening up SPSS puts you in control through plugging in your own code
More tasks can be automated
You can easily tap large R and Python libraries
New capabilities extend data management
The Extension mechanism integrates capabilities with a consistent syntax
Tell us about your programmability experiences
Jon Peck, Ph.D.
SPSS Inc
233 S Wacker Drive
Chicago, IL
[email protected]