
APSTA-GE.2011 Advanced Topics in Quantitative Methods: Classification and Clustering
Marc Scott
Spring 2015

Lecture: Tues & Weds 9:30AM-12:15PM (Runs 1/6-21/2015)
Location: TBD
Office: 801W Kimball Hall
Phone: 212-992-9407
Office Hours: By appointment
email: [email protected]

Text: There is no required text. Selected chapters from several sources will be made available.
Software: STATA, R, Python (optional). This course will use NYU Classes.

COURSE DESCRIPTION: Classification and clustering are important statistical techniques commonly applied in many social and behavioral science research problems. Both seek to understand social phenomena through the identification of naturally occurring homogeneous groupings within a population. Classification techniques are used to sort new observations into pre-existing or known groupings, while clustering techniques sort the population under study into groupings based on their observed characteristics. Both help to reveal hidden structure that may be used in further analyses. This course will compare and contrast these techniques, including many of their variations, with an emphasis on applications.

COURSE REQUIREMENTS:
Participation: 10%. You are expected to attend class and participate in class discussions.
Homework problems: 20%. There will be several assigned problems intended to give you practical experience with the methods discussed.
Data Analysis Projects: 70%. There will be two data analysis projects (worth 35% each).

COURSE READINGS: Handouts will be available on Blackboard by the Thursday preceding class. It is the student’s responsibility to print out and review the notes before coming to class.

Late assignment policy: Assignments are to be handed in on time.

2015 SCHEDULE (tentative)

Date      Topic
Jan. 6    Introduction to classification and clustering; what is a cluster; visualization techniques, including principal components. The classification technique you already know (logistic regression). Intro to hierarchical clustering.
Jan. 7    Hierarchical clustering: linkage choices; distance measures; the dendrogram. Optimization techniques (k-means); choosing the number of groups; evaluating clusters.
Jan. 9    HW 1 DUE (this is a Friday)
Jan. 13   Model-based clustering (including model selection); Nagin clusters (intro); HW 2 DUE
Jan. 14   Nagin clusters (model selection; group selection model); multinomial logit
Jan. 16   PROJECT 1 DUE (this is a Friday)
Jan. 20   Classification; (Linear) Discriminant function analysis; Logistic Classifier
Jan. 21   Tree-based methods; Naïve Bayes Classifier; HW 3 DUE
Jan. 27   PROJECT 2 DUE (this is the Tuesday after last class)

Readings

Classes 1 & 2:
Everitt et al., Cluster Analysis (4th Ed.), chapters 1 & 2.
Brian F. Manly, Multivariate Statistical Methods, A Primer (2nd Ed.), chapter 11 (skip 11.11).
Everitt & Dunn, Applied Multivariate Data Analysis, chapter 6, sections 6.1, 6.2.
Brian F. Manly, Multivariate Statistical Methods, A Primer (2nd Ed.), chapter 9.

Class 2:
Everitt & Dunn, Applied Multivariate Data Analysis, chapter 6, sections 6.3, 6.4 (this section is helpful for Feb. 18).
Peter J. Rousseeuw (1987). Silhouettes: a graphical aid to the interpretation and validation of cluster analysis. Journal of Computational and Applied Mathematics, 20, 53-65.

Class 3:
Banfield & Raftery (1993). Model-based Gaussian and non-Gaussian clustering. Biometrics, Vol. 49, no. 3, 803-821.
Fraley and Raftery (1998). How Many Clusters? Which Clustering Method? Answers Via Model-Based Cluster Analysis. The Computer Journal, 41(8):578-588.
Bobby L. Jones, Daniel S. Nagin, and Kathryn Roeder (2001). A SAS Procedure Based on Mixture Models for Estimating Developmental Trajectories. Sociological Methods & Research (29): 374-393.

Class 4:
Shaun McDermott and Daniel S. Nagin (2001). Same or Different?: Comparing Offender Groups and Covariates Over Time. Sociological Methods & Research (29): 282-318.

Class 5:
Everitt & Dunn, Applied Multivariate Data Analysis, chapter 11.
Brian F. Manly, Multivariate Statistical Methods, A Primer (2nd Ed.), chapter 8.
Tabachnick & Fidell, Using Multivariate Statistics (4th Ed.), chapter 11.

Class 6: Handouts