2017 predictive analytics symposium...sneath, p. h. and r. r. sokal (1973). numerical taxonomy. the...
TRANSCRIPT
![Page 1: 2017 Predictive Analytics Symposium...Sneath, P. H. and R. R. Sokal (1973). Numerical taxonomy. The principles and practice of numerical classification. Rosenberg et al. (UW-Madison)](https://reader030.vdocuments.mx/reader030/viewer/2022040321/5e54311f647080478f288280/html5/thumbnails/1.jpg)
2017 Predictive Analytics Symposium
Session 10, Clustering Techniques
Moderator: Geoffrey R. Hileman, FSA, MAAA
Presenters:
Matthias Kullowatz Marjorie A. Rosenberg, FSA
SOA Antitrust Compliance Guidelines SOA Presentation Disclaimer
![Page 2: 2017 Predictive Analytics Symposium...Sneath, P. H. and R. R. Sokal (1973). Numerical taxonomy. The principles and practice of numerical classification. Rosenberg et al. (UW-Madison)](https://reader030.vdocuments.mx/reader030/viewer/2022040321/5e54311f647080478f288280/html5/thumbnails/2.jpg)
An application of K-means cluster analysis with missing values
Matthias Kullowatz, MSSession 10: Clustering TechniquesSeptember 14, 2017
![Page 3: 2017 Predictive Analytics Symposium...Sneath, P. H. and R. R. Sokal (1973). Numerical taxonomy. The principles and practice of numerical classification. Rosenberg et al. (UW-Madison)](https://reader030.vdocuments.mx/reader030/viewer/2022040321/5e54311f647080478f288280/html5/thumbnails/3.jpg)
Motivation
2
![Page 4: 2017 Predictive Analytics Symposium...Sneath, P. H. and R. R. Sokal (1973). Numerical taxonomy. The principles and practice of numerical classification. Rosenberg et al. (UW-Madison)](https://reader030.vdocuments.mx/reader030/viewer/2022040321/5e54311f647080478f288280/html5/thumbnails/4.jpg)
Clustering examples
Taxonomy (e.g. organisms)Optimal geographic positioning (e.g. ambulances)Market segmentation
3
![Page 5: 2017 Predictive Analytics Symposium...Sneath, P. H. and R. R. Sokal (1973). Numerical taxonomy. The principles and practice of numerical classification. Rosenberg et al. (UW-Madison)](https://reader030.vdocuments.mx/reader030/viewer/2022040321/5e54311f647080478f288280/html5/thumbnails/5.jpg)
BackgroundUse enriched dataset to predict variable annuity
policyholder behavior (with GLWB rider)DataCredit info Lifestyle infoMortgage infoCensus Bureau info
Behaviors LapseGLWB election timingGLWB utilization efficiency
9/18/2017 4
![Page 6: 2017 Predictive Analytics Symposium...Sneath, P. H. and R. R. Sokal (1973). Numerical taxonomy. The principles and practice of numerical classification. Rosenberg et al. (UW-Madison)](https://reader030.vdocuments.mx/reader030/viewer/2022040321/5e54311f647080478f288280/html5/thumbnails/6.jpg)
Benefits of clustering
Produce a more implementable modelMore stable representation of complex data for
long-term projectionsCreate recognizable clusters to tell intuitive stories
9/18/2017 5
![Page 7: 2017 Predictive Analytics Symposium...Sneath, P. H. and R. R. Sokal (1973). Numerical taxonomy. The principles and practice of numerical classification. Rosenberg et al. (UW-Madison)](https://reader030.vdocuments.mx/reader030/viewer/2022040321/5e54311f647080478f288280/html5/thumbnails/7.jpg)
What is (and isn’t) k-means clustering?ExclusiveUnsupervisedNot hierarchical
6
https://support.sas.com/documentation/cdl/en/statug/63962/HTML/default/images/tree1a.png
![Page 8: 2017 Predictive Analytics Symposium...Sneath, P. H. and R. R. Sokal (1973). Numerical taxonomy. The principles and practice of numerical classification. Rosenberg et al. (UW-Madison)](https://reader030.vdocuments.mx/reader030/viewer/2022040321/5e54311f647080478f288280/html5/thumbnails/8.jpg)
Basic k-means algorithm
• Select k cluster “centroids” in the data space• Assign each of n observations to nearest cluster by
shortest Euclidean distance to centroid• Standardize variables first• No categorical variables allowed
• Re-calculate new centroids as means• Repeat until convergence
7
![Page 9: 2017 Predictive Analytics Symposium...Sneath, P. H. and R. R. Sokal (1973). Numerical taxonomy. The principles and practice of numerical classification. Rosenberg et al. (UW-Madison)](https://reader030.vdocuments.mx/reader030/viewer/2022040321/5e54311f647080478f288280/html5/thumbnails/9.jpg)
Visualizing K-means
8
Documentation can be found at: https://github.com/milliman/SOA_PAS_KmeansClustering
![Page 10: 2017 Predictive Analytics Symposium...Sneath, P. H. and R. R. Sokal (1973). Numerical taxonomy. The principles and practice of numerical classification. Rosenberg et al. (UW-Madison)](https://reader030.vdocuments.mx/reader030/viewer/2022040321/5e54311f647080478f288280/html5/thumbnails/10.jpg)
K-means with missing values
9
![Page 11: 2017 Predictive Analytics Symposium...Sneath, P. H. and R. R. Sokal (1973). Numerical taxonomy. The principles and practice of numerical classification. Rosenberg et al. (UW-Madison)](https://reader030.vdocuments.mx/reader030/viewer/2022040321/5e54311f647080478f288280/html5/thumbnails/11.jpg)
What to do with missing values
• Calculate centroid means: • Ignore missing values for each dimension’s mean• Weight observations using proportion of known values
• Observation assignment:• Calculate distance in a reduced dimension
(ignore missing values)
10
![Page 12: 2017 Predictive Analytics Symposium...Sneath, P. H. and R. R. Sokal (1973). Numerical taxonomy. The principles and practice of numerical classification. Rosenberg et al. (UW-Madison)](https://reader030.vdocuments.mx/reader030/viewer/2022040321/5e54311f647080478f288280/html5/thumbnails/12.jpg)
Example: Calculating cluster centroid
* Our algorithm uses the weighted centroid means
11
Cluster A Creditscore
Delinquencies Populationdensity
Home value
Appreciation
Obs A1 1.0 1 -0.5 NA NA
Obs A2 2.0 3 -0.4 -0.5 -0.7
Obs A3 NA NA -0.6 -0.5 -0.3
Unweighted 1.5 2 -0.5 -0.5 -0.5
Weighted* 1.625 2.25 -0.48 -0.5 -0.55
![Page 13: 2017 Predictive Analytics Symposium...Sneath, P. H. and R. R. Sokal (1973). Numerical taxonomy. The principles and practice of numerical classification. Rosenberg et al. (UW-Madison)](https://reader030.vdocuments.mx/reader030/viewer/2022040321/5e54311f647080478f288280/html5/thumbnails/13.jpg)
Example: Calculating distance
𝐷𝐷𝐷𝐷𝐷𝐷𝐷𝐷𝐷𝐷𝐷𝐷𝐷𝐷𝐷𝐷 𝐷𝐷𝑡𝑡 𝐴𝐴: 0.6252 + 1.2502 + 0.0202 = 1.40𝐷𝐷𝐷𝐷𝐷𝐷𝐷𝐷𝐷𝐷𝐷𝐷𝐷𝐷𝐷𝐷 𝐷𝐷𝑡𝑡 𝐵𝐵: 0.5002 + 0.5002 + 0.5002 = 0.87
12
Metric Creditscore
Delinquencies Populationdensity
Home value
Appreciation
Centroid A 1.625 2.25 -0.48 -0.5 -0.55
Centroid B 0.5 0.5 0.0 0.25 0.25
Obs A1 1.0 1 -0.5 NA NA
![Page 14: 2017 Predictive Analytics Symposium...Sneath, P. H. and R. R. Sokal (1973). Numerical taxonomy. The principles and practice of numerical classification. Rosenberg et al. (UW-Madison)](https://reader030.vdocuments.mx/reader030/viewer/2022040321/5e54311f647080478f288280/html5/thumbnails/14.jpg)
Considerations
• How do you choose initial cluster centroids?• Why Euclidean distance?• Binary variables: could we handle them?• Could weight some fields differently
13
![Page 15: 2017 Predictive Analytics Symposium...Sneath, P. H. and R. R. Sokal (1973). Numerical taxonomy. The principles and practice of numerical classification. Rosenberg et al. (UW-Madison)](https://reader030.vdocuments.mx/reader030/viewer/2022040321/5e54311f647080478f288280/html5/thumbnails/15.jpg)
Variable Annuity Example
14
![Page 16: 2017 Predictive Analytics Symposium...Sneath, P. H. and R. R. Sokal (1973). Numerical taxonomy. The principles and practice of numerical classification. Rosenberg et al. (UW-Madison)](https://reader030.vdocuments.mx/reader030/viewer/2022040321/5e54311f647080478f288280/html5/thumbnails/16.jpg)
K-means data summary
220,000 policyholders20 third-party data fields+Attained age, account value
15
Observations Missing Proportion0 11.2%1 2.8%2 13.3%3 7.1%4 20.4%5 19.2%
6+ 26.0%
![Page 17: 2017 Predictive Analytics Symposium...Sneath, P. H. and R. R. Sokal (1973). Numerical taxonomy. The principles and practice of numerical classification. Rosenberg et al. (UW-Madison)](https://reader030.vdocuments.mx/reader030/viewer/2022040321/5e54311f647080478f288280/html5/thumbnails/17.jpg)
How to define clusters
UnsupervisedNo preconceived notion of group labels
How different is each cluster across each dimension?Calculate average Z-scores of each dimension
Identify dimensions that make each cluster unique
16
![Page 18: 2017 Predictive Analytics Symposium...Sneath, P. H. and R. R. Sokal (1973). Numerical taxonomy. The principles and practice of numerical classification. Rosenberg et al. (UW-Madison)](https://reader030.vdocuments.mx/reader030/viewer/2022040321/5e54311f647080478f288280/html5/thumbnails/18.jpg)
The Debt cluster
9/18/2017 17
![Page 19: 2017 Predictive Analytics Symposium...Sneath, P. H. and R. R. Sokal (1973). Numerical taxonomy. The principles and practice of numerical classification. Rosenberg et al. (UW-Madison)](https://reader030.vdocuments.mx/reader030/viewer/2022040321/5e54311f647080478f288280/html5/thumbnails/19.jpg)
The Urban Renters cluster
9/18/2017 18
![Page 20: 2017 Predictive Analytics Symposium...Sneath, P. H. and R. R. Sokal (1973). Numerical taxonomy. The principles and practice of numerical classification. Rosenberg et al. (UW-Madison)](https://reader030.vdocuments.mx/reader030/viewer/2022040321/5e54311f647080478f288280/html5/thumbnails/20.jpg)
Customer Segments
19
In Debt:Low credit scores, high counts of credit delinquencies in the last five years
Lower Income:Lower than average education levels, home values, and income levels
Middle Income:Slightly higher than average education levels, home values, and income levels
High Income:Highest education levels, home values, and income levels
Urban Renters: Live in high population density areas, with low proportion of homeowners
Families:More likely to have children living at home, younger on average
Retired:Likely to be older, and live in areas with high proportions of individuals over the age of 65
![Page 21: 2017 Predictive Analytics Symposium...Sneath, P. H. and R. R. Sokal (1973). Numerical taxonomy. The principles and practice of numerical classification. Rosenberg et al. (UW-Madison)](https://reader030.vdocuments.mx/reader030/viewer/2022040321/5e54311f647080478f288280/html5/thumbnails/21.jpg)
9/18/2017 20
Modeling process
Cash flow projection
Customer level profitability (CLP)
Cash flow projection
Predictive modeling
Clustering
![Page 22: 2017 Predictive Analytics Symposium...Sneath, P. H. and R. R. Sokal (1973). Numerical taxonomy. The principles and practice of numerical classification. Rosenberg et al. (UW-Madison)](https://reader030.vdocuments.mx/reader030/viewer/2022040321/5e54311f647080478f288280/html5/thumbnails/22.jpg)
Lapse sensitivity to in-the-moneyness (ITM)
21
![Page 23: 2017 Predictive Analytics Symposium...Sneath, P. H. and R. R. Sokal (1973). Numerical taxonomy. The principles and practice of numerical classification. Rosenberg et al. (UW-Madison)](https://reader030.vdocuments.mx/reader030/viewer/2022040321/5e54311f647080478f288280/html5/thumbnails/23.jpg)
Benefit utilization deferral curves
22
![Page 24: 2017 Predictive Analytics Symposium...Sneath, P. H. and R. R. Sokal (1973). Numerical taxonomy. The principles and practice of numerical classification. Rosenberg et al. (UW-Madison)](https://reader030.vdocuments.mx/reader030/viewer/2022040321/5e54311f647080478f288280/html5/thumbnails/24.jpg)
CLP by cluster
9/18/2017 23
This boxplot shows us that the policies sold to the urban, debt, and retired clusters are the most profitable on average. Also the retired cluster shows the widest range of CLP.
![Page 25: 2017 Predictive Analytics Symposium...Sneath, P. H. and R. R. Sokal (1973). Numerical taxonomy. The principles and practice of numerical classification. Rosenberg et al. (UW-Madison)](https://reader030.vdocuments.mx/reader030/viewer/2022040321/5e54311f647080478f288280/html5/thumbnails/25.jpg)
Thank you
24
![Page 26: 2017 Predictive Analytics Symposium...Sneath, P. H. and R. R. Sokal (1973). Numerical taxonomy. The principles and practice of numerical classification. Rosenberg et al. (UW-Madison)](https://reader030.vdocuments.mx/reader030/viewer/2022040321/5e54311f647080478f288280/html5/thumbnails/26.jpg)
A Numerical Taxonomy Application to ClusterIndividuals for Predicting Healthcare Expenditures
Work in Progress
Josh Agterberg, Fanghao Zhong, Richard Crabb, MargieRosenberg
University of Wisconsin – Madison
We acknowledge the Society of Actuaries CAE Research Grant fortheir partial support in this work.
Rosenberg et al. (UW-Madison) SOA Predictive Analytics Meeting September 2017 1 / 21
![Page 27: 2017 Predictive Analytics Symposium...Sneath, P. H. and R. R. Sokal (1973). Numerical taxonomy. The principles and practice of numerical classification. Rosenberg et al. (UW-Madison)](https://reader030.vdocuments.mx/reader030/viewer/2022040321/5e54311f647080478f288280/html5/thumbnails/27.jpg)
Purpose
To present a novel way of clustering individuals to group similarindividuals for prediction of health care expenditures where covariatesare categorical
Secondarily: To define what is a numerical taxonomy system and itsrelationship to unsupervised clustering
Rosenberg et al. (UW-Madison) SOA Predictive Analytics Meeting September 2017 2 / 21
![Page 28: 2017 Predictive Analytics Symposium...Sneath, P. H. and R. R. Sokal (1973). Numerical taxonomy. The principles and practice of numerical classification. Rosenberg et al. (UW-Madison)](https://reader030.vdocuments.mx/reader030/viewer/2022040321/5e54311f647080478f288280/html5/thumbnails/28.jpg)
Outline
1 Background
2 Numerical Taxonomy
3 Methods
4 Our Approach
5 Data and Design
6 Results
7 Conclusions
Rosenberg et al. (UW-Madison) SOA Predictive Analytics Meeting September 2017 3 / 21
![Page 29: 2017 Predictive Analytics Symposium...Sneath, P. H. and R. R. Sokal (1973). Numerical taxonomy. The principles and practice of numerical classification. Rosenberg et al. (UW-Madison)](https://reader030.vdocuments.mx/reader030/viewer/2022040321/5e54311f647080478f288280/html5/thumbnails/29.jpg)
Background
Area of Application
• Focus on health care data• Only categorical predictors• That are not necessarily numbered or ordered• With no prior clusters defined• And with no labels as to assignment to cluster
Want to group individuals who are similar
Rosenberg et al. (UW-Madison) SOA Predictive Analytics Meeting September 2017 4 / 21
![Page 30: 2017 Predictive Analytics Symposium...Sneath, P. H. and R. R. Sokal (1973). Numerical taxonomy. The principles and practice of numerical classification. Rosenberg et al. (UW-Madison)](https://reader030.vdocuments.mx/reader030/viewer/2022040321/5e54311f647080478f288280/html5/thumbnails/30.jpg)
Background
Approach: Want to group individuals who are similar
• Data need to be definable• Data need to be measurable• Need to determine how two individuals are related
Sneath and Sokal (1973)
Rosenberg et al. (UW-Madison) SOA Predictive Analytics Meeting September 2017 5 / 21
![Page 31: 2017 Predictive Analytics Symposium...Sneath, P. H. and R. R. Sokal (1973). Numerical taxonomy. The principles and practice of numerical classification. Rosenberg et al. (UW-Madison)](https://reader030.vdocuments.mx/reader030/viewer/2022040321/5e54311f647080478f288280/html5/thumbnails/31.jpg)
Background
Examples of Health Care
• ICD classification• DRG• HCC
Rosenberg et al. (UW-Madison) SOA Predictive Analytics Meeting September 2017 6 / 21
![Page 32: 2017 Predictive Analytics Symposium...Sneath, P. H. and R. R. Sokal (1973). Numerical taxonomy. The principles and practice of numerical classification. Rosenberg et al. (UW-Madison)](https://reader030.vdocuments.mx/reader030/viewer/2022040321/5e54311f647080478f288280/html5/thumbnails/32.jpg)
Numerical Taxonomy
Some Definitions
Taxonomy (Simpson (1961)) : Theoretical study of classification includingits bases, principles, and rules
Taxon (Sneath and Sokal (1973)) : Abbreviation for taxonomic group.Plural is taxa
Classification (Simpson (1961)) : Ordering of entities into groups or setson the basis of relationships
Classification could be seen as method, but is sometimesused as outcome of process (i.e. result of classification isclassification). Taxonomy could be viewed as theory (butsometimes used interchangeably)
Numerical Taxonomy (Sneath and Sokal (1973)) : Grouping by numericalmethods of taxonomic units based on their characterstates, using methods that are objective, explicit, andrepeatable
Rosenberg et al. (UW-Madison) SOA Predictive Analytics Meeting September 2017 7 / 21
![Page 33: 2017 Predictive Analytics Symposium...Sneath, P. H. and R. R. Sokal (1973). Numerical taxonomy. The principles and practice of numerical classification. Rosenberg et al. (UW-Madison)](https://reader030.vdocuments.mx/reader030/viewer/2022040321/5e54311f647080478f288280/html5/thumbnails/33.jpg)
Numerical Taxonomy
Numerical Taxonomic Principles
• More information available for classification, the better theclassification
• Every variable has equal weight (or not?)• Overall similarity between 2 individuals is function of
individual-level similarity in each of the variables• Distinct groups recognized due to correlation of variables within
each group• Classification based on similarity• Inferences valid from developed clusters (originally based on
biology)
Note: of course, data need to be definable and measurable
Sneath and Sokal (1973) (Note: Traced back to 1700s)
Rosenberg et al. (UW-Madison) SOA Predictive Analytics Meeting September 2017 8 / 21
![Page 34: 2017 Predictive Analytics Symposium...Sneath, P. H. and R. R. Sokal (1973). Numerical taxonomy. The principles and practice of numerical classification. Rosenberg et al. (UW-Madison)](https://reader030.vdocuments.mx/reader030/viewer/2022040321/5e54311f647080478f288280/html5/thumbnails/34.jpg)
Numerical Taxonomy
Taxonomic Principles Applied (ICD, DRG, HCC, andbiology)
• Approach Defensible• Reproducible process• Individuals classified into reliable groups by different coders• Set of operational protocol with standardized names• Interpretable clusters with characteristics generally constant• Manageable number of clusters• Predictive power of cluster
Evans, Pope, Kautter, Ingber, Freeman, Sekar, and Newhart (2011); Feinstein (1967, 1988);
Fetter, Shin, Freeman, Averill, and Thompson (1980); Sneath and Sokal (1973)
Rosenberg et al. (UW-Madison) SOA Predictive Analytics Meeting September 2017 9 / 21
![Page 35: 2017 Predictive Analytics Symposium...Sneath, P. H. and R. R. Sokal (1973). Numerical taxonomy. The principles and practice of numerical classification. Rosenberg et al. (UW-Madison)](https://reader030.vdocuments.mx/reader030/viewer/2022040321/5e54311f647080478f288280/html5/thumbnails/35.jpg)
Methods Principles
Questions when Clustering with Only CategoricalVariables
• How many variables to include?• Which variables to include and how include?• If ordered categorical variable, how measure differences?• If multi-level,
• Convert into multiple indicator variables?• Weight rarer levels more?• How incorporate (or not) missing values
• How define the center?
Goodall (1964, 1966)
Rosenberg et al. (UW-Madison) SOA Predictive Analytics Meeting September 2017 10 / 21
![Page 36: 2017 Predictive Analytics Symposium...Sneath, P. H. and R. R. Sokal (1973). Numerical taxonomy. The principles and practice of numerical classification. Rosenberg et al. (UW-Madison)](https://reader030.vdocuments.mx/reader030/viewer/2022040321/5e54311f647080478f288280/html5/thumbnails/36.jpg)
Methods Principles
Clustering Operational Protocol with CategoricalVariables
To create clusters (labeled as cluster method), we need:1 Definition of cluster center2 General algorithm of how to create clusters: like k-means,
k-medoids3 Particular function in software to calculate4 Need a definition of similarity5 We need a matrix of dis-similarity (per numerical taxonomy
literature, like distance, correlation or probability). If distance, thenwhich kind L1, L2 or other?
Note: Above items are inter-relatedEnd result sensitive to choice of algorithm
Rosenberg et al. (UW-Madison) SOA Predictive Analytics Meeting September 2017 11 / 21
![Page 37: 2017 Predictive Analytics Symposium...Sneath, P. H. and R. R. Sokal (1973). Numerical taxonomy. The principles and practice of numerical classification. Rosenberg et al. (UW-Madison)](https://reader030.vdocuments.mx/reader030/viewer/2022040321/5e54311f647080478f288280/html5/thumbnails/37.jpg)
Methods Principles
Issues with Clustering Methods
1 Appropriateness of cluster method when all variables arecategorical
2 Random choice of starting point produces clusters that could differ(i.e. greedy algorithm may converge to local minimum but may notbe global minimum)
3 Choice of similarity measures could change results4 How justify clusters using taxonomic principles?
Rosenberg et al. (UW-Madison) SOA Predictive Analytics Meeting September 2017 12 / 21
![Page 38: 2017 Predictive Analytics Symposium...Sneath, P. H. and R. R. Sokal (1973). Numerical taxonomy. The principles and practice of numerical classification. Rosenberg et al. (UW-Madison)](https://reader030.vdocuments.mx/reader030/viewer/2022040321/5e54311f647080478f288280/html5/thumbnails/38.jpg)
Our Approach
Our Cluster Protocols
1 Define center as medoid (center represents actual data point)2 General clustering algorithm: PAM (Partitioning Around Medoids)3 Cluster function: Use pam() function in cluster package in R4 Compare two similarity measures: Gower’s distance and
Goodall’s similarity• Gower: Define similar if attributes are equal; compute simple
average• Goodall: Define two entities to be more similar if attributes are
rarer; compute similarity index
Rosenberg et al. (UW-Madison) SOA Predictive Analytics Meeting September 2017 13 / 21
![Page 39: 2017 Predictive Analytics Symposium...Sneath, P. H. and R. R. Sokal (1973). Numerical taxonomy. The principles and practice of numerical classification. Rosenberg et al. (UW-Madison)](https://reader030.vdocuments.mx/reader030/viewer/2022040321/5e54311f647080478f288280/html5/thumbnails/39.jpg)
Data and Design
NHIS/MEPS Data
• National Health Interview Survey (NHIS) linked to MedicalExpenditure Panel Survey (MEPS)
• NHIS Sample Adult Questionnaire for adult health behavior data• Complex survey design allowing estimates of US civilian
non-institutionalized population
Note: Complex design provides estimates of population, althoughsample (possibly allow for more reproducible results)
Rosenberg et al. (UW-Madison) SOA Predictive Analytics Meeting September 2017 14 / 21
![Page 40: 2017 Predictive Analytics Symposium...Sneath, P. H. and R. R. Sokal (1973). Numerical taxonomy. The principles and practice of numerical classification. Rosenberg et al. (UW-Madison)](https://reader030.vdocuments.mx/reader030/viewer/2022040321/5e54311f647080478f288280/html5/thumbnails/40.jpg)
Data and Design
Study Design
• NHIS baseline year 2010: All individual-level characteristics todefine initial clusters
• MEPS panel 16 (representing calendar years 2011 to 2012) forvalidation of clusters using expenditures
Rosenberg et al. (UW-Madison) SOA Predictive Analytics Meeting September 2017 15 / 21
![Page 41: 2017 Predictive Analytics Symposium...Sneath, P. H. and R. R. Sokal (1973). Numerical taxonomy. The principles and practice of numerical classification. Rosenberg et al. (UW-Madison)](https://reader030.vdocuments.mx/reader030/viewer/2022040321/5e54311f647080478f288280/html5/thumbnails/41.jpg)
Data and Design
MEPS Longitudinal Design
Rosenberg et al. (UW-Madison) SOA Predictive Analytics Meeting September 2017 16 / 21
![Page 42: 2017 Predictive Analytics Symposium...Sneath, P. H. and R. R. Sokal (1973). Numerical taxonomy. The principles and practice of numerical classification. Rosenberg et al. (UW-Madison)](https://reader030.vdocuments.mx/reader030/viewer/2022040321/5e54311f647080478f288280/html5/thumbnails/42.jpg)
Data and Design
NHIS Variables for Clusters
• Personal demographic information• Health status information• Living style or habits• Working environment and status
Rosenberg et al. (UW-Madison) SOA Predictive Analytics Meeting September 2017 17 / 21
![Page 43: 2017 Predictive Analytics Symposium...Sneath, P. H. and R. R. Sokal (1973). Numerical taxonomy. The principles and practice of numerical classification. Rosenberg et al. (UW-Madison)](https://reader030.vdocuments.mx/reader030/viewer/2022040321/5e54311f647080478f288280/html5/thumbnails/43.jpg)
Results
Results
• To be filled in
Rosenberg et al. (UW-Madison) SOA Predictive Analytics Meeting September 2017 18 / 21
![Page 44: 2017 Predictive Analytics Symposium...Sneath, P. H. and R. R. Sokal (1973). Numerical taxonomy. The principles and practice of numerical classification. Rosenberg et al. (UW-Madison)](https://reader030.vdocuments.mx/reader030/viewer/2022040321/5e54311f647080478f288280/html5/thumbnails/44.jpg)
Conclusions
Conclusions
• Numerical taxonomy literature provides history• Taxonomy principles provide guidance to justify approach• Operational protocol provide guidance to communicate methods• Presence of categorical variables can complicate analysis
Rosenberg et al. (UW-Madison) SOA Predictive Analytics Meeting September 2017 19 / 21
![Page 45: 2017 Predictive Analytics Symposium...Sneath, P. H. and R. R. Sokal (1973). Numerical taxonomy. The principles and practice of numerical classification. Rosenberg et al. (UW-Madison)](https://reader030.vdocuments.mx/reader030/viewer/2022040321/5e54311f647080478f288280/html5/thumbnails/45.jpg)
Conclusions
Bibliography I
Evans, M. A., G. C. Pope, J. Kautter, M. J. Ingber, S. Freeman,R. Sekar, and C. Newhart (2011). Evaluation of the CMS-HCC riskadjustment model. CfMM Services, Editor .
Feinstein, A. R. (1967). Clinical Judgment. Williams and Wilkens Co.Feinstein, A. R. (1988). ICD, POR, and DRG: Unsolved scientific
problems in the nosology of clinical medicine. Archives of internalmedicine 148(10), 2269–2274.
Fetter, R. B., Y. Shin, J. L. Freeman, R. F. Averill, and J. D. Thompson(1980). Case mix definition by diagnosis-related groups. Medicalcare 18(2), i–53.
Goodall, D. W. (1964). A probabilistic similarity index.Nature 203(4949), 1098–1098.
Rosenberg et al. (UW-Madison) SOA Predictive Analytics Meeting September 2017 20 / 21
![Page 46: 2017 Predictive Analytics Symposium...Sneath, P. H. and R. R. Sokal (1973). Numerical taxonomy. The principles and practice of numerical classification. Rosenberg et al. (UW-Madison)](https://reader030.vdocuments.mx/reader030/viewer/2022040321/5e54311f647080478f288280/html5/thumbnails/46.jpg)
Conclusions
Bibliography II
Goodall, D. W. (1966). A new similarity index based on probability.Biometrics, 882–907.
Simpson, G. G. (1961). Principles of animal taxonomy.
Sneath, P. H. and R. R. Sokal (1973). Numerical taxonomy. Theprinciples and practice of numerical classification.
Rosenberg et al. (UW-Madison) SOA Predictive Analytics Meeting September 2017 21 / 21