TRANSCRIPT
Peter Fox
Data Analytics – ITWS-4600/ITWS-6600/MATP-4450
Group 4 Module 14, April 16, 2018
Some review, then Hierarchical Linear Models,
Optimizing, Iterating ctd.
Some review(s) first
• ctrees – group2/lab1_ctree2.R
• kknn for iris
• SVM for iris
• Factor Analysis v. Principal Components
• Remember open lab this Thursday
• Assignment 7 due Friday 20th
Swiss ctree…
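The Swiss ctree slide is a figure in the original deck. As a stand-in sketch, here is a regression tree on the same built-in swiss data using rpart (a recommended package shipped with R) in place of party::ctree, which may not be installed; the response Fertility and the minsplit setting are illustrative choices, not the lab's exact call.

```r
# Stand-in for the ctree lab: CART via rpart on the built-in 'swiss' data.
library(rpart)

fit <- rpart(Fertility ~ ., data = swiss, method = "anova",
             control = rpart.control(minsplit = 10))
printcp(fit)                 # complexity table, useful for pruning
pred <- predict(fit, swiss)  # fitted values, one per province
```

Note this is CART, not a conditional inference tree; the split-selection rules differ even though the resulting plots look similar.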
kknn lab - iris
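The kknn lab output is a figure in the original deck. This hedged sketch uses class::knn (a recommended package shipped with R) in place of the kknn package; the 70/30 split and k = 3 are illustrative choices, not the lab's exact settings.

```r
# Stand-in for the kknn lab: k-nearest neighbours on iris with class::knn.
library(class)

set.seed(123)
idx   <- sample(nrow(iris), 0.7 * nrow(iris))   # 70/30 train/test split
train <- iris[idx, 1:4]
test  <- iris[-idx, 1:4]
pred  <- knn(train, test, cl = iris$Species[idx], k = 3)
acc   <- mean(pred == iris$Species[-idx])       # holdout accuracy
```

On iris this typically classifies well over 90% of the holdout set correctly, since the species are nearly separable in the four measurements.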
SVM lab - iris
randomForest (iris)
Factor Analysis – 2 (Athletics)
Factor Analysis – 3 (Athletics)
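The athletics factor-analysis slides are figures in the original deck. As a stand-in, this sketch runs base R's factanal on simulated data with a known two-factor structure; all data here are illustrative, not the athletics dataset.

```r
# Stand-in for the factor-analysis lab: factanal (stats, base R) on
# simulated data where variables 1-3 load on one factor and 4-6 on another.
set.seed(1)
n  <- 500
f1 <- rnorm(n)   # latent factor 1
f2 <- rnorm(n)   # latent factor 2
x  <- cbind(f1 + rnorm(n, sd = 0.5), f1 + rnorm(n, sd = 0.5),
            f1 + rnorm(n, sd = 0.5), f2 + rnorm(n, sd = 0.5),
            f2 + rnorm(n, sd = 0.5), f2 + rnorm(n, sd = 0.5))
fa <- factanal(x, factors = 2, rotation = "varimax")
fa$loadings   # vars 1-3 load heavily on one factor, 4-6 on the other
```

The varimax rotation makes the recovered loadings line up with the generating structure, which is the behaviour the athletics slides illustrate.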
Principal component (Athletics)
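The principal-components slide is likewise a figure. A minimal base-R sketch of the same technique, using the built-in USArrests data in place of the athletics data:

```r
# Stand-in for the PCA lab: prcomp (base R) with scaled variables.
pca <- prcomp(USArrests, scale. = TRUE)
summary(pca)                            # importance of components
pve <- pca$sdev^2 / sum(pca$sdev^2)     # proportion of variance explained
```

Scaling matters here for the same reason it does with the athletics events: variables on different measurement scales would otherwise dominate the components.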
Iterating: structure to regression
• Hierarchical … simpler form of mixed model?
Remember: random effects. In the initial exploration, class is nested within school ("class is 'under' school"). The random-effects term is specified inside parentheses and can encode repeated measures, interaction terms, or nesting:
lmm.1 <- lmer(extro ~ open + social + class + (1|school/class), data = lmm.data)
summary(lmm.1)
summary(lmm.1)
Linear mixed model fit by REML ['lmerMod']
Formula: extro ~ open + social + class + (1 | school/class)
   Data: lmm.data

REML criterion at convergence: 3521.5

Scaled residuals:
     Min       1Q   Median       3Q      Max
-10.0144  -0.3373   0.0164   0.3378  10.5788

Random effects:
 Groups       Name        Variance Std.Dev.
 class:school (Intercept)  2.8822  1.6977
 school       (Intercept) 95.1725  9.7556
 Residual                  0.9691  0.9844
Number of obs: 1200, groups: class:school, 24; school, 6
Fixed effects:
             Estimate Std. Error t value
(Intercept) 5.712e+01  4.052e+00  14.098
open        6.053e-03  4.965e-03   1.219
social      5.085e-04  1.853e-03   0.274
classb      2.047e+00  9.835e-01   2.082
classc      3.698e+00  9.835e-01   3.760
classd      5.656e+00  9.835e-01   5.751

Correlation of Fixed Effects:
       (Intr) open   social classb classc
open   -0.049
social -0.046 -0.006
classb -0.121 -0.002  0.005
classc -0.121 -0.001  0.000  0.500
classd -0.121  0.000  0.002  0.500  0.500
Now: Intra-Class Correlation
# First, run the 'null' model, which includes just the intercept and the random
# effect for the highest level of the nesting variables (in this example, 'school').
lmm.null <- lmer(extro ~ 1 + (1|school), data = lmm.data)
summary(lmm.null)
summary(lmm.null)
Linear mixed model fit by REML ['lmerMod']
Formula: extro ~ 1 + (1 | school)
   Data: lmm.data
REML criterion at convergence: 5806.1
Scaled residuals:
    Min      1Q  Median      3Q     Max
-5.9773 -0.5315  0.0059  0.5298  6.2109
Random effects:
 Groups   Name        Variance Std.Dev.
 school   (Intercept) 95.87    9.791
 Residual              7.14    2.672
Number of obs: 1200, groups: school, 6

Fixed effects:
            Estimate Std. Error t value
(Intercept)   60.267      3.998   15.07
Intra-Class Correlation (ICC)
# Notice the variance component estimates for the random effect. Add them
# together, then divide the 'school' variance estimate by that total to get the ICC.
95.8720 + 7.1399    # total variance = 103.0119
95.8720 / 103.0119  # ICC = 0.930689
# This indicates that 93.07% of the variance in 'extro' can be "explained" by
# school group membership (verified below using Bliese's multilevel package).
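The slide's by-hand arithmetic can be wrapped in a tiny base-R helper; the variance estimates 95.8720 and 7.1399 are copied from the lmm.null summary above, and the helper name icc is illustrative, not from the slides.

```r
# Illustrative helper: ICC = group-level variance / total variance.
icc <- function(var_group, var_resid) var_group / (var_group + var_resid)

icc(95.8720, 7.1399)  # ~0.930689, i.e. ~93.07% of variance at the school level
```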
# ICC1 and ICC2 as described by Bliese.
library(multilevel)
aov.1 <- aov(extro ~ school, lmm.data)
summary(aov.1)
              Df Sum Sq Mean Sq F value Pr(>F)
school         5  95908   19182    2687 <2e-16 ***
Residuals   1194   8525       7
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
ICC1 / ICC2 (Bliese)
# ICC1 (below) indicates that 93.07% of the variance in 'extro' can be
# "explained" by school group membership.
> ICC1(aov.1)
[1] 0.930689
# The ICC2 value (below) of 0.9996 indicates that school groups can be very
# reliably differentiated in terms of 'extro' scores.
> ICC2(aov.1)
[1] 0.9996278
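ICC2 (reliability of the group means) can be recovered from ICC1 via the Spearman-Brown step-up formula, which is a useful sanity check on the multilevel output. This sketch is not on the slides; the group size k = 200 is inferred from 1200 observations across 6 schools, assuming equal group sizes.

```r
# Spearman-Brown step-up: reliability of a mean of k ratings with reliability icc1.
icc1 <- 0.930689          # ICC1 from the aov model above
k    <- 1200 / 6          # assumed pupils per school (equal group sizes)
icc2 <- (k * icc1) / (1 + (k - 1) * icc1)
icc2                      # ~0.99963, matching ICC2(aov.1)
```

With 200 pupils per school, even a per-pupil ICC well below 1 yields near-perfect reliability for the school means, which is why ICC2 is so close to 1 here.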
Simulating the Posterior Distribution of Predicted Values
# Use the 'sim' function from the 'arm' package. Note: n = 100 is the default for 'sim'.
library(arm)
sim.100 <- sim(lmm.2, n = 100)
# Show the structure of objects in the 'sim' object.
str(sim.100)
<not displayed>
# Fixed-effect parameter estimates resulting from the 'sim' function applied
# to the fitted model (lmm.2).
fe.sim <- fixef(sim.100)
fe.sim
      (Intercept)         open         agree        social     classb    classc   classd
 [1,]    55.24643 0.0113879890 -7.370662e-03  4.115703e-03 1.99092257 2.9418821 3.162604
 [2,]    56.69630 0.0051451242 -1.373704e-02 -1.799054e-03 1.73041539 3.8671053 6.160748
 [3,]    63.18570 0.0003935109  2.607783e-03  1.435752e-03 1.80586410 3.2203590 5.802364
 [4,]    56.00007 0.0042571840 -6.076147e-03 -5.324692e-03 2.71728164 5.6066533 6.852651
 [5,]    59.94718 0.0026340937 -2.584516e-03  3.295548e-07 1.45650055 3.3174045 5.871667
 [6,]    65.26589 0.0100470520 -1.324052e-02 -3.480780e-04 1.79030239 3.3253023 4.050358
 [7,]    56.80116 0.0082074105 -8.175804e-03  1.182413e-03 2.35693946 3.0119753 5.937348
 [8,]    61.32350 0.0047934705 -1.484498e-02 -2.710392e-03 2.11558934 4.2048688 6.552194
 [9,]    53.87001 0.0054213155 -7.160089e-03  8.668833e-04 1.86080451 2.8613245 4.761669
[10,]    57.47641 0.0055136083 -6.293459e-03 -5.253847e-05 3.17600677 6.4525022 6.438270
# Random-effect parameter estimates resulting from the 'sim' function applied
# to the fitted model (lmm.2).
re.sim <- ranef(sim.100)
re.sim[[1]] # For "class:school" random effect.
re.sim[[2]] # For "school" random effect.
re.sim[[1]] # For "class:school" random effect.
, , (Intercept)
            a:I         a:II        a:III       a:IV       a:V         a:VI        b:I          b:II
[1,] -1.8138575  1.009722294 0.502308352 0.574242632 1.62249792  0.34486828  0.41734749 -0.516721008
[2,] -4.5023927  0.325461572 1.105711427 0.555938715 1.49927806 -1.05082790  0.72720272  1.065476210
[3,] -2.9011592  1.699112086 1.924096930 1.588047483 0.08551292 -1.71333314  0.47475579  0.095562455
[4,] -4.7454517 -1.024665550 0.449287566 1.066899463 1.56470696 -1.34450134 -0.47980863  0.964331898
[5,] -4.6413961  0.092845610 0.878011579 0.328065852 0.94227622 -2.48685750  0.13250051  0.336973705
Much more!
re.sim[[2]] # For "school" random effect.

, , (Intercept)
              I          II         III         IV           V         VI
[1,] -10.889610  11.9319979   6.4468727 7.52046579  9.407021912 14.8484638
[2,] -11.811196 -10.1548630  -2.3812528 4.24907315  6.038850618 15.1022442
[3,] -17.642004  -6.5881409   2.6734584 5.09687885  7.313420709  7.6798984
[4,] -12.201235  -6.5415744  -6.2550322 4.62112286 13.050521302 14.7147714
[5,] -16.604904 -10.9215257  -3.2698478 2.47299902  2.276550540 11.8441601
Get predicted values
# To get predicted values from the posterior distribution, use the 'fitted' function.
yhat.lmm.2 <- fitted(sim.100, lmm.2)
head(yhat.lmm.2)
< see output >
tail(yhat.lmm.2)
< see output >
# The above object (yhat.lmm.2) is a matrix of 1200 participants by 100 simulations.
# In this matrix, each row represents a participant and each column represents a
# simulated predicted value for the outcome variable of our lmm.2 model.
# Therefore, the yhat.lmm.2 object can be used to create credible intervals for
# each participant (i.e. at the individual level).
> quantile(yhat.lmm.2[1,], probs = c(.025, .985)) # For the first participant (i.e. row 1).
    2.5%    98.5%
39.93096 81.29584
# We can also create a data frame with the quantiles for every participant.
quant.mat <- data.frame(matrix(rep(NA, 1200*2), ncol = 2))
names(quant.mat) <- c("2.5%", "98.5%")
quant.mat[,1] <- apply(yhat.lmm.2, 1, quantile, probs = .025)
quant.mat[,2] <- apply(yhat.lmm.2, 1, quantile, probs = .985)
head(quant.mat, 25)
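The per-participant quantile pattern above can be run self-contained: a simulated 1200 x 100 matrix stands in for yhat.lmm.2 (fake data, so no 'arm' model fit is needed; the sizes mirror the slides' 1200 participants and 100 simulations).

```r
# Self-contained sketch of row-wise credible intervals with apply().
set.seed(42)
yhat.sim <- matrix(rnorm(1200 * 100, mean = 60, sd = 10), nrow = 1200)

quant.mat <- data.frame(
  lo = apply(yhat.sim, 1, quantile, probs = 0.025),  # 2.5% quantile per participant
  hi = apply(yhat.sim, 1, quantile, probs = 0.985)   # 98.5% quantile per participant
)
head(quant.mat)
```

apply() over MARGIN = 1 walks the rows, so each participant's 100 simulated predictions collapse to one interval, exactly as in the slide's quant.mat construction.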
Head of data frame
      2.5%    98.5%
1 47.99122 80.07736
2 66.11761 72.79333
3 76.65614 83.60897
4 46.50965 79.56451
5 48.01904 80.07742
6 47.20663 54.45487
7 49.31807 75.21708
8 48.06083 80.11512
In R - lcmm
• Estimation of latent class mixed-effect models for different types of outcomes (continuous Gaussian, continuous non-Gaussian, or ordinal).
• This function fits mixed models and latent class mixed models for different types of outcomes:
 – continuous longitudinal outcomes (Gaussian or non-Gaussian), as well as bounded quantitative, discrete, and ordinal longitudinal outcomes.
What does it do?
• The different types of outcomes are taken into account using parameterized nonlinear link functions between the observed outcome and the underlying latent process of interest.
• At the latent process level, the model estimates a standard linear mixed model, or a latent class mixed model when heterogeneity in the population is investigated (in the same way as in the function hlme, described below). Note that the program also works when no random effect is included!
• Parameters of the nonlinear link function and of the latent process mixed model are estimated simultaneously using a maximum likelihood method.

lcmm(fixed, mixture, random, subject, classmb, ng = 1, idiag = FALSE,
     nwg = FALSE, link = "linear", intnodes = NULL, epsY = 0.5, data, B,
     convB = 1e-04, convL = 1e-04, convG = 1e-04, maxiter = 100, nsim = 100,
     prior, range = NULL, na.action = 1) ### that's a lot of parameters
Turning to lcmm
# Beta link function
m11 <- lcmm(Ydep2~Time+I(Time^2), random=~Time, subject='ID', ng=1,
            data=data_Jointlcmm, link="beta")
summary(m11)
plot.linkfunction(m11, bty="l")
# I-splines with 3 equidistant nodes
m12 <- lcmm(Ydep2~Time+I(Time^2), random=~Time, subject='ID', ng=1,
            data=data_Jointlcmm, link="3-equi-splines")
summary(m12)
# I-splines with 5 nodes at quantiles
m13 <- lcmm(Ydep2~Time+I(Time^2), random=~Time, subject='ID', ng=1,
            data=data_Jointlcmm, link="5-quant-splines")
summary(m13)
# I-splines with 5 nodes, and interior nodes entered manually
m14 <- lcmm(Ydep2~Time+I(Time^2), random=~Time, subject='ID', ng=1,
            data=data_Jointlcmm, link="5-manual-splines", intnodes=c(10,20,25))
summary(m14)
plot.linkfunction(m14, bty="l")
# Thresholds
# Especially for the threshold link function, we recommend estimating models
# with increasing complexity and using estimates of previous ones to specify
# plausible initial values (we remind that estimation of models with a threshold
# link function involves a computationally demanding numerical integration,
# here of size 3).
m15 <- lcmm(Ydep2~Time+I(Time^2), random=~Time, subject='ID', ng=1,
            data=data_Jointlcmm, link="thresholds", maxiter=100,
            B=c(-0.8379, -0.1103, 0.3832, 0.3788, 0.4524, -7.3180, 0.5917,
                0.7364, 0.6530, 0.4038, 0.4290, 0.6099, 0.6014, 0.5354,
                0.5029, 0.5463, 0.5310, 0.5352, 0.6498, 0.6653, 0.5851,
                0.6525, 0.6701, 0.6670, 0.6767, 0.7394, 0.7426, 0.7153,
                0.7702, 0.6421))
summary(m15)
plot.linkfunction(m15, bty="l")
#### Plot of the estimated link functions
#### (applicable for models that only differ in the link function used;
#### otherwise the latent process scale is different and a rescaling is necessary)
transfo <- data.frame(marker=m10$estimlink[,1], linear=m10$estimlink[,2],
                      beta=m11$estimlink[,2], spl_3e=m12$estimlink[,2],
                      spl_5q=m13$estimlink[,2], spl_5m=m14$estimlink[,2])
dev.new()
plot(transfo[,1]~transfo[,2], xlim=c(-10,5), col=1, type='l',
     xlab="latent process", ylab="marker", bty="l")
par(new=TRUE)
plot(transfo[,1]~transfo[,3], xlim=c(-10,5), col=2, type='l', xlab="", ylab="", bty="l")
par(new=TRUE)
plot(transfo[,1]~transfo[,4], xlim=c(-10,5), col=3, type='l', xlab="", ylab="", bty="l")
par(new=TRUE)
plot(transfo[,1]~transfo[,5], xlim=c(-10,5), col=4, type='l', xlab="", ylab="", bty="l")
par(new=TRUE)
plot(m15$estimlink[,1]~m15$estimlink[,2], xlim=c(-10,5), col=5, type='l',
     xlab="", ylab="", bty="l")
legend(x="bottomright", legend=c(colnames(transfo[,2:5]), "thresholds"),
       col=1:5, lty=1, inset=.02, bty="n")
#### Estimation of 2-latent-class mixed models with different assumed link
#### functions, with individual- and class-specific linear trends.
#### For illustration, only default initial values were used, but other
#### sets of initial values should also be tried to ensure convergence
#### towards the global maximum.
# Linear link function
m20 <- lcmm(Ydep2~Time, random=~Time, subject='ID', mixture=~Time, ng=2,
            idiag=TRUE, data=data_Jointlcmm, link="linear")
summary(m20)
postprob(m20)
# Beta link function
m21 <- lcmm(Ydep2~Time, random=~Time, subject='ID', mixture=~Time, ng=2,
            idiag=TRUE, data=data_Jointlcmm, link="beta")
summary(m21)
postprob(m21)
# I-splines link function (with 5 nodes at quantiles)
m22 <- lcmm(Ydep2~Time, random=~Time, subject='ID', mixture=~Time, ng=2,
            idiag=TRUE, data=data_Jointlcmm, link="5-quant-splines")
summary(m22)
postprob(m22)

data <- data_Jointlcmm[data_Jointlcmm$ID==193,]
plot.predict(m22, var.time="Time", newdata=data, bty="l")
Turning to multlcmm
library(lcmm)
data(data_Jointlcmm)
# Latent process mixed model for two curvilinear outcomes. Link functions are
# approximated by I-splines: the first has 4 nodes (i.e. 2 internal nodes, 8
# and 12), the second has 3 nodes (i.e. 1 internal node, 25).
m1 <- multlcmm(Ydep1+Ydep2~1+Time*X2+contrast(X2), random=~1+Time,
               subject="ID", randomY=TRUE,
               link=c("4-manual-splines","3-manual-splines"),
               intnodes=c(8,12,25), data=data_Jointlcmm)

Be patient, multlcmm is running ...
The program took 56.14 seconds
Quicker multlcmm
# To reduce the computation time, the same model is estimated using
# a vector of initial values.
m1 <- multlcmm(Ydep1+Ydep2~1+Time*X2+contrast(X2), random=~1+Time,
               subject="ID", randomY=TRUE,
               link=c("4-manual-splines","3-manual-splines"),
               intnodes=c(8,12,25), data=data_Jointlcmm,
               B=c(-1.071, -0.192, 0.106, -0.005, -0.193, 1.012, 0.870, 0.881,
                   0.000, 0.000, -7.520, 1.401, 1.607, 1.908, 1.431, 1.082,
                   -7.528, 1.135, 1.454, 2.328, 1.052))

Be patient, multlcmm is running ...
The program took 7.78 seconds
summary(m1)
General latent class mixed model fitted by maximum likelihood method
multlcmm(fixed = Ydep1 + Ydep2 ~ 1 + Time * X2 + contrast(X2),
    random = ~1 + Time, subject = "ID", randomY = TRUE,
    link = c("4-manual-splines", "3-manual-splines"),
    intnodes = c(8, 12, 25), data = data_Jointlcmm)
Statistical Model:
    Dataset: data_Jointlcmm
    Number of subjects: 300
    Number of observations: 3356
    Number of latent classes: 1
    Number of parameters: 21
    Link functions: Quadratic I-splines with nodes 0 8 12 17.581 for Ydep1
                    Quadratic I-splines with nodes 0 25 30 for Ydep2
Iteration process:
    Convergence criteria satisfied
    Number of iterations: 4
    Convergence criteria: parameters= 5.2e-11 : likelihood= 2.1e-08 : second derivatives= 1.2e-09
Goodness-of-fit statistics:
    maximum log-likelihood: -6977.48
    AIC: 13996.95
    BIC: 14074.73
Maximum Likelihood Estimates:
Fixed effects in the longitudinal model:
                              coef      Se     Wald p-value
intercept (not estimated)  0.00000
Time                      -1.07056 0.12293 -8.70900 0.00000
X2                        -0.19225 0.16697 -1.15100 0.24957
Time:X2                    0.10627 0.18634  0.57000 0.56847
Contrasts on X2 (p=0.88696)
Ydep1                     -0.00483 0.03399 -0.14215 0.88696
Ydep2*                     0.00483 0.03399  0.14215 0.88696
*coefficient not estimated but obtained from the others as minus the sum of them
Variance-covariance matrix of the random-effects:
(the variance of the first random effect is not estimated)
          intercept    Time
intercept   1.00000
Time       -0.19338 1.01251
summary(m1) – last bit!
                                       Ydep1   Ydep2
Residual standard error:             0.86955 0.88053
Standard error of the random effect: 0.00000 0.00000
Parameters of the link functions:
                     coef      Se    Wald p-value
Ydep1-I-splines1 -7.51985 0.64412 -11.675   0e+00
Ydep1-I-splines2  1.40067 0.18058   7.756   0e+00
Ydep1-I-splines3  1.60739 0.10324  15.569   0e+00
Ydep1-I-splines4  1.90822 0.07873  24.238   0e+00
Ydep1-I-splines5  1.43117 0.09075  15.770   0e+00
Ydep1-I-splines6  1.08205 0.21198   5.105   0e+00
Ydep2-I-splines1 -7.52861 0.67080 -11.223   0e+00
Ydep2-I-splines2  1.13505 0.25553   4.442   1e-05
Ydep2-I-splines3  1.45345 0.14629   9.935   0e+00
Ydep2-I-splines4  2.32793 0.08636  26.956   0e+00
Ydep2-I-splines5  1.05187 0.05908  17.803   0e+00
plot(m1, which="linkfunction")
# variation percentages explained by the linear mixed regression
> VarExpl(m1, data.frame(Time=0))
              class1
%Var-Ydep1 56.94364
%Var-Ydep2 56.32753
summary(m2)
< … >
# posterior classification
postprob(m2)
Posterior classification:
      class1 class2
N     143.00 157.00
%      47.67  52.33
Posterior classification table:
  --> mean of posterior probabilities in each class
        prob1  prob2
class1 1.0000 0.0000
class2 0.0589 0.9411
Posterior probabilities above a threshold (%):
         class1 class2
prob>0.7    100  98.09
prob>0.8    100  96.18
prob>0.9    100  85.99
# longitudinal predictions on the outcome scales for a given profile of covariates
newdata <- data.frame(Time=seq(0,5,length=100), X1=rep(0,100),
                      X2=rep(0,100), X3=rep(0,100))
predGH <- predictY(m2, newdata, var.time="Time", methInteg=0, nsim=20)
head(predGH)
Etc.
In lcmm - hlme
• Fits a latent class linear mixed model (LCLMM), also known as a growth mixture model or heterogeneous linear mixed model.
• The LCLMM assumes that the population is divided into a finite number of latent classes; each latent class is characterized by a specific mean trajectory, described by a class-specific linear mixed model.
• Both the latent class membership and the trajectory can be explained by covariates.
• This model is limited to a Gaussian outcome.
In R
hlme(fixed, mixture, random, subject, classmb, ng = 1, idiag = FALSE,
     nwg = FALSE, cor=NULL, data, B, convB=0.0001, convL=0.0001,
     convG=0.0001, prior, maxiter=500, subset=NULL, na.action=1)
Example
data(data_hlme)
m1 <- hlme(Y~Time*X1, random=~Time, subject='ID', ng=1, idiag=TRUE,
           data=data_hlme)
summary(m1)
Summary hlme
Heterogenous linear mixed model fitted by maximum likelihood method
hlme(fixed = Y ~ Time * X1, random = ~Time, subject = "ID", ng = 1,
    idiag = TRUE, data = data_hlme)
Statistical Model:
    Dataset: data_hlme
    Number of subjects: 100
    Number of observations: 326
    Number of latent classes: 1
    Number of parameters: 7
![Page 49: Testtw.rpi.edu/media/latest/DataAnalytics2018_group4_modul… · PPT file · Web viewPeter Fox . Data Analytics – ITWS-4600/ITWS-6600/MATP-4450. Group 4 Module 14, April 16, 2018](https://reader036.vdocuments.mx/reader036/viewer/2022070611/5b24f0137f8b9a137a8b4b5c/html5/thumbnails/49.jpg)
Summary hlmeIteration process: Convergence criteria satisfied Number of iterations: 9 Convergence criteria: parameters= 1.2e-07 : likelihood= 1.6e-05 : second derivatives= 6.2e-13 Goodness-of-fit statistics: maximum log-likelihood: -804.98 AIC: 1623.95 BIC: 1642.19 Maximum Likelihood Estimates: Fixed effects in the longitudinal model: coef Se Wald p-valueintercept 25.86515 0.79448 32.556 0.00000Time -0.33282 0.17547 -1.897 0.05787X1 1.69698 1.03466 1.640 0.10098Time:X1 -0.39364 0.22848 -1.723 0.08491
Summary hlme

Variance-covariance matrix of the random-effects:
          intercept     Time
intercept  24.63032
Time        0.00000 1.168762

                              coef         se
Residual standard error: 0.9501876 0.05765784
plot(m1)
Example

m2 <- hlme(Y~Time*X1, mixture=~Time, random=~Time, classmb=~X2+X3,
           subject='ID', ng=2, data=data_hlme,
           B=c(0.11, -0.74, -0.07, 20.71, 29.39, -1, 0.13, 2.45,
               -0.29, 4.5, 0.36, 0.79, 0.97))
m2

Heterogenous linear mixed model fitted by maximum likelihood method
hlme(fixed = Y ~ Time * X1, mixture = ~Time, random = ~Time,
     subject = "ID", classmb = ~X2 + X3, ng = 2, data = data_hlme)

Statistical Model:
     Dataset: data_hlme
     Number of subjects: 100
     Number of observations: 326
     Number of latent classes: 2
     Number of parameters: 13
Iteration process:
     Convergence criteria satisfied
     Number of iterations: 2
     Convergence criteria: parameters= 1.3e-07 : likelihood= 4.4e-07 : second derivatives= 2.5e-12

Goodness-of-fit statistics:
     maximum log-likelihood: -773.82
     AIC: 1573.64
     BIC: 1607.51
Example

summary(m2)
postprob(m2)

Posterior classification:
  class1 class2
N     46     54
%     46     54

Posterior classification table:
     --> mean of posterior probabilities in each class
        prob1  prob2
class1 0.9588 0.0412
class2 0.0325 0.9675

Posterior probabilities above a threshold (%):
         class1 class2
prob>0.7  93.48 100.00
prob>0.8  93.48  92.59
prob>0.9  86.96  83.33
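A postprob()-style summary can be reproduced from any matrix of posterior class probabilities. A minimal sketch in Python (the four-subject probability matrix below is made up for illustration, not taken from data_hlme; it assumes each class receives at least one subject):

```python
# Hypothetical posterior-probability matrix: one row per subject,
# one column per latent class (values are illustrative only).
post = [
    [0.95, 0.05],
    [0.20, 0.80],
    [0.70, 0.30],
    [0.10, 0.90],
]

# Modal assignment: each subject goes to its highest-probability class.
assign = [row.index(max(row)) for row in post]

# Class sizes (N and %), as in the first block of postprob() output.
n = [assign.count(k) for k in range(2)]
pct = [100.0 * c / len(post) for c in n]

# Mean posterior probability of each class among subjects assigned to it.
mean_prob = [
    sum(post[i][k] for i in range(len(post)) if assign[i] == k) / n[k]
    for k in range(2)
]

# Share of each class's subjects whose winning probability exceeds a threshold.
def above(threshold):
    return [
        100.0 * sum(1 for i in range(len(post))
                    if assign[i] == k and post[i][k] > threshold) / n[k]
        for k in range(2)
    ]

print(n, pct, mean_prob, above(0.7))
```

High mean posterior probabilities and high above-threshold percentages (as in m2's table) indicate a clean separation between the latent classes.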
Example

### same model as m2 but initial values specified
m3 <- hlme(Y~Time*X1, mixture=~Time, random=~Time, classmb=~X2+X3,
           subject='ID', ng=2, data=data_hlme,
           B=c(0, 0, 0, 30, 25, 0, -1, 0, 0, 5, 0, 1, 1))
m3
Predicting…

summary(m3)

Etc.

## plot of predicted trajectories using some new data
newdata <- data.frame(Time=seq(0, 5, length=100),
                      X1=rep(0, 100), X2=rep(0, 100), X3=rep(0, 100))
plot.predict(m3, newdata, "Time", "right", bty="l")
plot m3
Beyond PCA!

• Kernel PCA
• ICA
  – PCA is not particularly helpful for finding independent clusters
  – ICA idea:
    – Assume non-Gaussian data
    – Find multiple sets of components
    – Minimize statistical dependence between components
  – Blind source separation example:
    – Given: an audio recording with two overlapping voices
    – Goal: separate the voices into separate tracks
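The blind-source-separation idea can be sketched end to end in pure Python (a toy illustration, not part of the course labs): mix two independent uniform "voices", whiten the mixtures, then search for the rotation whose components are maximally non-Gaussian, a kurtosis-based stand-in for ICA:

```python
import math, random

random.seed(1)
n = 4000

# Two independent, non-Gaussian (uniform) sources -- toy stand-ins for voices.
S = [[random.uniform(-1, 1) for _ in range(n)] for _ in range(2)]

# Each "microphone" records a different linear mixture of both sources.
X = [[0.6 * S[0][i] + 0.4 * S[1][i] for i in range(n)],
     [0.3 * S[0][i] + 0.7 * S[1][i] for i in range(n)]]

def center(v):
    m = sum(v) / len(v)
    return [u - m for u in v]

def dot(a, b):  # average cross-product (covariance for centered data)
    return sum(u * v for u, v in zip(a, b)) / len(a)

X = [center(x) for x in X]

# Whiten: rotate into the covariance eigenbasis, then scale to unit variance.
phi = 0.5 * math.atan2(2 * dot(X[0], X[1]), dot(X[0], X[0]) - dot(X[1], X[1]))
c, s = math.cos(phi), math.sin(phi)
Z = [[c * a + s * b for a, b in zip(X[0], X[1])],
     [-s * a + c * b for a, b in zip(X[0], X[1])]]
Z = [[u / math.sqrt(dot(z, z)) for u in z] for z in Z]

def kurt(v):
    # Excess kurtosis of a unit-variance signal; 0 for Gaussian data.
    return sum(u ** 4 for u in v) / len(v) - 3.0

def rotate(t):
    ct, st = math.cos(t), math.sin(t)
    return [[ct * a + st * b for a, b in zip(Z[0], Z[1])],
            [-st * a + ct * b for a, b in zip(Z[0], Z[1])]]

# ICA step: every rotation of whitened data is uncorrelated, so pick the
# rotation whose components are the most non-Gaussian.
best = max((k * math.pi / 180 for k in range(90)),
           key=lambda t: sum(abs(kurt(v)) for v in rotate(t)))
Y = rotate(best)

def corr(a, b):
    a, b = center(a), center(b)
    return dot(a, b) / math.sqrt(dot(a, a) * dot(b, b))

# Each recovered component should line up with one of the true sources.
match = [max(abs(corr(y, s)) for s in S) for y in Y]
print([round(m, 2) for m in match])
```

In practice one would use a library implementation such as fastICA in R; the point here is that PCA alone would stop at the whitening step, which leaves the sources mixed by an arbitrary rotation.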
Beyond PCA!

• Probabilistic PCA
• Bayesian source separation
• Continuous latent variables
Reading, etc.

• http://data-informed.com/focus-predictive-analytics/
• Final week – your project presentations run ~ Monday and Thursday in two sections (Carnegie 113 and Lally 102) – we cannot run over the class time to complete these, so plan accordingly and arrive on time (instructions and initial schedule sent via the LMS) – attendance is essential (no excuses)
• 5 MINUTES – you do not need more