rohan's ms project ucsd - kaggle.com
TRANSCRIPT
![Page 1: Rohan's MS Project UCSD - Kaggle.com](https://reader034.vdocuments.mx/reader034/viewer/2022051322/5450c4f8af79590b098b4e15/html5/thumbnails/1.jpg)
"What do you know?" Latent feature approach for Kaggle's GrockIt challenge
Rohan Anil
Advised by Prof. Charles Elkan, in collaboration with Aditya Menon
UC San Diego
March 19, 2012
![Page 2: Rohan's MS Project UCSD - Kaggle.com](https://reader034.vdocuments.mx/reader034/viewer/2022051322/5450c4f8af79590b098b4e15/html5/thumbnails/2.jpg)
Outline
● Introduction
  ● Kaggle.com
  ● GrockIt
  ● "What do you know?" Challenge
● Latent Feature Log-Linear (LFL)
● Ensemble Learning
● Our Results
● Q/A
![Page 3: Rohan's MS Project UCSD - Kaggle.com](https://reader034.vdocuments.mx/reader034/viewer/2022051322/5450c4f8af79590b098b4e15/html5/thumbnails/3.jpg)
Kaggle.com
![Page 4: Rohan's MS Project UCSD - Kaggle.com](https://reader034.vdocuments.mx/reader034/viewer/2022051322/5450c4f8af79590b098b4e15/html5/thumbnails/4.jpg)
"What do you know?" Competition
1st Prize: $3,000
2nd Prize: $1,500
3rd Prize: $500
![Page 5: Rohan's MS Project UCSD - Kaggle.com](https://reader034.vdocuments.mx/reader034/viewer/2022051322/5450c4f8af79590b098b4e15/html5/thumbnails/5.jpg)
GrockIt.com
![Page 6: Rohan's MS Project UCSD - Kaggle.com](https://reader034.vdocuments.mx/reader034/viewer/2022051322/5450c4f8af79590b098b4e15/html5/thumbnails/6.jpg)
Dataset
Training Set
4,851,476 outcomes of students answering various questions
Outcomes
Four types:-
i) correct ii) incorrect iii) skipped iv) timed-out.
Students practicing for competitive exams
i) GMAT, ii) ACT and iii) SAT
![Page 7: Rohan's MS Project UCSD - Kaggle.com](https://reader034.vdocuments.mx/reader034/viewer/2022051322/5450c4f8af79590b098b4e15/html5/thumbnails/7.jpg)
Dataset
![Page 8: Rohan's MS Project UCSD - Kaggle.com](https://reader034.vdocuments.mx/reader034/viewer/2022051322/5450c4f8af79590b098b4e15/html5/thumbnails/8.jpg)
Dataset
Differences between the training set and the test set:
Bias: The test set is biased towards users who have answered more questions.
#Responses: Only one response per student.
Temporal: Test outcomes are later in time than that student's training and validation responses.
Outcomes: The test set distribution differs from the training set; it does not include timed-out or skipped outcomes.
![Page 9: Rohan's MS Project UCSD - Kaggle.com](https://reader034.vdocuments.mx/reader034/viewer/2022051322/5450c4f8af79590b098b4e15/html5/thumbnails/9.jpg)
Baseline
Rasch Baseline
A baseline was provided by Kaggle for the dataset.
![Page 10: Rohan's MS Project UCSD - Kaggle.com](https://reader034.vdocuments.mx/reader034/viewer/2022051322/5450c4f8af79590b098b4e15/html5/thumbnails/10.jpg)
Bs - ability of the student 's'
δq - difficulty of question 'q'
For a given student 's' (fixed Bs):
– The probability of answering a question correctly depends only on the difficulty of the question q.
– As a consequence, the ranking of questions in terms of the probability of a correct answer is the same for every student.
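The Rasch equation itself did not survive in this transcript. The standard form of the model is P(correct) = sigmoid(Bs − δq), and the short sketch below uses that form to illustrate the consequence stated above. The ability and difficulty values are made up for illustration.

```python
import math

def rasch_probability(ability: float, difficulty: float) -> float:
    """P(correct) under the standard Rasch model: sigmoid(ability - difficulty)."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))

# Hypothetical question difficulties and student abilities (not from the dataset).
difficulties = {"q1": -1.0, "q2": 0.5, "q3": 2.0}
abilities = {"weak_student": -0.5, "strong_student": 1.5}

def ranking(ability: float) -> list:
    """Questions sorted from most to least likely to be answered correctly."""
    return sorted(difficulties,
                  key=lambda q: -rasch_probability(ability, difficulties[q]))

for name, ability in abilities.items():
    # Every student gets the same ordering; only the probabilities differ.
    print(name, ranking(ability))
```

Because the ability only shifts the argument of a monotone function, it cannot reorder the questions, which is the limitation a latent feature model is meant to address.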
![Page 11: Rohan's MS Project UCSD - Kaggle.com](https://reader034.vdocuments.mx/reader034/viewer/2022051322/5450c4f8af79590b098b4e15/html5/thumbnails/11.jpg)
Dataset
Validation set
Grockit created a validation set that contains responses of 80,075 students on different questions.
Test set
The test set was used for ranking the teams; it contains responses of 93,100 users on different questions.
![Page 12: Rohan's MS Project UCSD - Kaggle.com](https://reader034.vdocuments.mx/reader034/viewer/2022051322/5450c4f8af79590b098b4e15/html5/thumbnails/12.jpg)
Dyadic Prediction
A dyadic prediction task is a learning task that involves predicting a class label for a pair of items (Hofmann et al., 1999).
![Page 13: Rohan's MS Project UCSD - Kaggle.com](https://reader034.vdocuments.mx/reader034/viewer/2022051322/5450c4f8af79590b098b4e15/html5/thumbnails/13.jpg)
Side-Information
Sometimes the dataset contains more information about a dyad (u,i):
1. side-information associated with u
2. side-information associated with i
3. interaction side-information for (u,i)
![Page 14: Rohan's MS Project UCSD - Kaggle.com](https://reader034.vdocuments.mx/reader034/viewer/2022051322/5450c4f8af79590b098b4e15/html5/thumbnails/14.jpg)
Interpreting the task as a collaborative filtering problem
The dataset contains student responses for various questions.
179,107 students and 6,046 questions
![Page 15: Rohan's MS Project UCSD - Kaggle.com](https://reader034.vdocuments.mx/reader034/viewer/2022051322/5450c4f8af79590b098b4e15/html5/thumbnails/15.jpg)
[Figure labels: Skipped, Timed out]
![Page 16: Rohan's MS Project UCSD - Kaggle.com](https://reader034.vdocuments.mx/reader034/viewer/2022051322/5450c4f8af79590b098b4e15/html5/thumbnails/16.jpg)
Nominal Outcomes
● Correct
● Incorrect
● Timed-Out
● Skipped
![Page 17: Rohan's MS Project UCSD - Kaggle.com](https://reader034.vdocuments.mx/reader034/viewer/2022051322/5450c4f8af79590b098b4e15/html5/thumbnails/17.jpg)
Dyadic Prediction
[Figure: pairs with known outcomes]
Training Set
![Page 18: Rohan's MS Project UCSD - Kaggle.com](https://reader034.vdocuments.mx/reader034/viewer/2022051322/5450c4f8af79590b098b4e15/html5/thumbnails/18.jpg)
Dyadic Prediction
[Figure: a pair with unknown outcome]
Query in Test
![Page 19: Rohan's MS Project UCSD - Kaggle.com](https://reader034.vdocuments.mx/reader034/viewer/2022051322/5450c4f8af79590b098b4e15/html5/thumbnails/19.jpg)
Side Information in the dataset
Associated with a student
Not Available
Associated with a question
Question Type, Group, Track, Subtrack, Tags
Associated with (student,question) dyad
Game, Number of Players, Started at, Answered at, Deactivated at, Question set
![Page 20: Rohan's MS Project UCSD - Kaggle.com](https://reader034.vdocuments.mx/reader034/viewer/2022051322/5450c4f8af79590b098b4e15/html5/thumbnails/20.jpg)
Side Information
Question Type: Multiple Choice, Free Response
Group: ACT, GMAT, SAT
Subtrack: Critical Reasoning, Data Sufficiency, English, Identifying Sentence Errors, Improving Paragraphs, Improving Sentences, Math, Multiple Choice, Passage Based Reading, Problem Solving, Reading, Reading Comprehension, Science, Sentence Completion, Sentence Correction, Student Produced Response
Tags: Describe the skills needed to solve the question.
![Page 21: Rohan's MS Project UCSD - Kaggle.com](https://reader034.vdocuments.mx/reader034/viewer/2022051322/5450c4f8af79590b098b4e15/html5/thumbnails/21.jpg)
Dataset
![Page 22: Rohan's MS Project UCSD - Kaggle.com](https://reader034.vdocuments.mx/reader034/viewer/2022051322/5450c4f8af79590b098b4e15/html5/thumbnails/22.jpg)
Dataset
![Page 23: Rohan's MS Project UCSD - Kaggle.com](https://reader034.vdocuments.mx/reader034/viewer/2022051322/5450c4f8af79590b098b4e15/html5/thumbnails/23.jpg)
Dataset
![Page 24: Rohan's MS Project UCSD - Kaggle.com](https://reader034.vdocuments.mx/reader034/viewer/2022051322/5450c4f8af79590b098b4e15/html5/thumbnails/24.jpg)
The dataset is similar to a typical dyadic dataset, with a couple of key differences:
● Duplicate Dyads
There can be duplicate dyads in the training set with different outcomes, since a student can answer a question many times.
● Collaborative or Competitive Answering
In some game types, students can answer questions collaboratively.
![Page 25: Rohan's MS Project UCSD - Kaggle.com](https://reader034.vdocuments.mx/reader034/viewer/2022051322/5450c4f8af79590b098b4e15/html5/thumbnails/25.jpg)
Motivation for Latent feature approach
Latent feature models were highly successful in the $1M Netflix Prize challenge (Töscher et al., 2009), where the problem was to predict ratings for movies.
![Page 26: Rohan's MS Project UCSD - Kaggle.com](https://reader034.vdocuments.mx/reader034/viewer/2022051322/5450c4f8af79590b098b4e15/html5/thumbnails/26.jpg)
Metric used to rank the teams
Binomial Capped Deviance (BCD), similar to log-likelihood
Estimated probability of a correct response, capped to [0.01, 0.99]
True label of the dyad
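The metric's formula is not in this transcript, so the following is a minimal sketch of a capped binomial deviance built from the two ingredients named above. The log base is an assumption (base 10 is used as the default here, and it is exposed as a parameter), as are the function and argument names.

```python
import math

def capped_binomial_deviance(y_true, p_pred, lo=0.01, hi=0.99, log=math.log10):
    """Mean negative log-likelihood of binary labels, with predictions capped.

    y_true: true labels (0 or 1) of the dyads.
    p_pred: estimated probabilities of a correct response.
    log:    logarithm to use (assumption: base 10; pass math.log for natural log).
    """
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, lo), hi)  # cap the prediction to [lo, hi]
        total += y * log(p) + (1 - y) * log(1.0 - p)
    return -total / len(y_true)

# Capping means an over-confident wrong prediction has a bounded penalty.
print(capped_binomial_deviance([1, 0, 1], [0.9, 0.2, 0.0]))
```

Lower values are better. The cap keeps a single prediction of exactly 0 or 1 from producing an infinite penalty.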
![Page 27: Rohan's MS Project UCSD - Kaggle.com](https://reader034.vdocuments.mx/reader034/viewer/2022051322/5450c4f8af79590b098b4e15/html5/thumbnails/27.jpg)
Leaderboard
![Page 28: Rohan's MS Project UCSD - Kaggle.com](https://reader034.vdocuments.mx/reader034/viewer/2022051322/5450c4f8af79590b098b4e15/html5/thumbnails/28.jpg)
Latent feature log-linear
Motivations for Latent Feature Log-Linear (LFL) (Menon & Elkan, 2010)
Well-Calibrated Probabilities
We need to predict the probability of a correct outcome for the dyads in the test set.
Leverage Side-Information
Most collaborative filtering algorithms do not have a principled way of including side-information.
Scale Well
To be used in industry, the method has to scale well to large datasets.
![Page 29: Rohan's MS Project UCSD - Kaggle.com](https://reader034.vdocuments.mx/reader034/viewer/2022051322/5450c4f8af79590b098b4e15/html5/thumbnails/29.jpg)
Multiclass LFL model
![Page 30: Rohan's MS Project UCSD - Kaggle.com](https://reader034.vdocuments.mx/reader034/viewer/2022051322/5450c4f8af79590b098b4e15/html5/thumbnails/30.jpg)
Multiclass LFL model
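Case |Y| = 3

[Figure: one pair of latent matrices per class: U1, I1; U2, I2; U3, I3]

$$p(y = 3 \mid (\text{user}, \text{item})) = \frac{\exp(U^{3}_{\text{user}} \cdot I^{3}_{\text{item}})}{Z}$$

$$Z = \exp(U^{1}_{\text{user}} \cdot I^{1}_{\text{item}}) + \exp(U^{2}_{\text{user}} \cdot I^{2}_{\text{item}}) + \exp(U^{3}_{\text{user}} \cdot I^{3}_{\text{item}})$$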
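As a small illustration of the |Y| = 3 case, the sketch below computes the class probabilities as a softmax over one user-item dot product per class. The function name and the example vectors are hypothetical; subtracting the maximum score is only a numerical-stability detail and does not change the result.

```python
import math

def lfl_probabilities(user_vecs, item_vecs):
    """Multiclass LFL prediction for one (user, item) dyad.

    user_vecs[y] and item_vecs[y] are the latent vectors of the user and the
    item for class y. Returns p(y | (user, item)) for every class y.
    """
    scores = [sum(u * i for u, i in zip(u_y, i_y))
              for u_y, i_y in zip(user_vecs, item_vecs)]
    shift = max(scores)                      # for numerical stability only
    exps = [math.exp(s - shift) for s in scores]
    z = sum(exps)                            # normalizer Z
    return [e / z for e in exps]

# Hypothetical 2-dimensional latent vectors for three classes.
user = [[0.2, -0.1], [0.0, 0.3], [1.0, 0.5]]
item = [[0.4, 0.4], [-0.2, 0.1], [0.9, 0.2]]
print(lfl_probabilities(user, item))
```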
![Page 31: Rohan's MS Project UCSD - Kaggle.com](https://reader034.vdocuments.mx/reader034/viewer/2022051322/5450c4f8af79590b098b4e15/html5/thumbnails/31.jpg)
Binary LFL on the dataset
Test Set contains only two types of outcomes i) correct ii) incorrect
y = 1 ( Correct Response)
y = 0 ( Incorrect Response)
The binary-LFL model has appeared in the literature before (Schein et al., 2003; Agarwal & Chen, 2009)
![Page 32: Rohan's MS Project UCSD - Kaggle.com](https://reader034.vdocuments.mx/reader034/viewer/2022051322/5450c4f8af79590b098b4e15/html5/thumbnails/32.jpg)
Training
We minimize the negative log-likelihood.
We can optimize this objective function using stochastic gradient descent.
Regularization Terms
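The objective on this slide is not in the transcript. As a rough sketch, the code below trains a binary latent feature model by stochastic gradient descent on the negative log-likelihood plus an L2 penalty. The form of the penalty (one shared strength `reg`), the learning rate, the latent dimension, and the toy data are all assumptions made for illustration.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def predict(U, I, s, q):
    """Estimated probability that student s answers question q correctly."""
    return sigmoid(sum(a * b for a, b in zip(U[s], I[q])))

def train_binary_lfl(triples, n_students, n_questions,
                     k=2, lr=0.1, reg=0.01, epochs=200, seed=0):
    """SGD on the negative log-likelihood with L2 regularization (assumed form)."""
    rng = random.Random(seed)
    U = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_students)]
    I = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_questions)]
    data = list(triples)
    for _ in range(epochs):
        rng.shuffle(data)                      # visit dyads in random order
        for s, q, y in data:
            err = predict(U, I, s, q) - y      # d(NLL)/d(score) for one dyad
            for f in range(k):
                u_sf, i_qf = U[s][f], I[q][f]
                U[s][f] -= lr * (err * i_qf + reg * u_sf)
                I[q][f] -= lr * (err * u_sf + reg * i_qf)
    return U, I

# Toy data: (student, question, label), with label 1 for a correct response.
toy = [(0, 0, 1), (0, 1, 0), (1, 0, 0), (1, 1, 1)]
U, I = train_binary_lfl(toy, n_students=2, n_questions=2)
print(predict(U, I, 0, 0), predict(U, I, 0, 1))
```

Unlike the Rasch baseline, the two toy students end up with opposite rankings of the two questions, which a single ability and difficulty per dyad cannot express.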
![Page 33: Rohan's MS Project UCSD - Kaggle.com](https://reader034.vdocuments.mx/reader034/viewer/2022051322/5450c4f8af79590b098b4e15/html5/thumbnails/33.jpg)
Stochastic Gradient Descent
![Page 34: Rohan's MS Project UCSD - Kaggle.com](https://reader034.vdocuments.mx/reader034/viewer/2022051322/5450c4f8af79590b098b4e15/html5/thumbnails/34.jpg)
LFL on GrockIt
![Page 35: Rohan's MS Project UCSD - Kaggle.com](https://reader034.vdocuments.mx/reader034/viewer/2022051322/5450c4f8af79590b098b4e15/html5/thumbnails/35.jpg)
Stochastic Gradient Descent
![Page 36: Rohan's MS Project UCSD - Kaggle.com](https://reader034.vdocuments.mx/reader034/viewer/2022051322/5450c4f8af79590b098b4e15/html5/thumbnails/36.jpg)
Grid Search
parameters
![Page 37: Rohan's MS Project UCSD - Kaggle.com](https://reader034.vdocuments.mx/reader034/viewer/2022051322/5450c4f8af79590b098b4e15/html5/thumbnails/37.jpg)
Parallel SGD Training
This approach was formulated independently by Gemulla et al. (2011).
![Page 38: Rohan's MS Project UCSD - Kaggle.com](https://reader034.vdocuments.mx/reader034/viewer/2022051322/5450c4f8af79590b098b4e15/html5/thumbnails/38.jpg)
KDD CUP, Spring, 2011
This is us!!! =)
![Page 39: Rohan's MS Project UCSD - Kaggle.com](https://reader034.vdocuments.mx/reader034/viewer/2022051322/5450c4f8af79590b098b4e15/html5/thumbnails/39.jpg)
Parallelism
![Page 40: Rohan's MS Project UCSD - Kaggle.com](https://reader034.vdocuments.mx/reader034/viewer/2022051322/5450c4f8af79590b098b4e15/html5/thumbnails/40.jpg)
Side-Information
For a question q, let g = group(q). We can add a latent vector for each group, i.e., ACT, GMAT, SAT.
The prediction equation after adding side information is:
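The equation from this slide is not in the transcript. Purely as an illustration, and not necessarily the form used in the project, one plausible way to use a per-group latent vector is to add it to the question's latent vector before taking the dot product with the student's vector. All names below are hypothetical.

```python
import math

def predict_with_group(student_vec, question_vec, group_vec):
    """Binary prediction with a group latent vector (assumed additive form).

    score = student_vec . (question_vec + group_vec), passed through a sigmoid.
    """
    combined = [q + g for q, g in zip(question_vec, group_vec)]
    score = sum(u * c for u, c in zip(student_vec, combined))
    return 1.0 / (1.0 + math.exp(-score))

# Hypothetical vectors: the same question vector with two different groups.
student = [0.5, -0.2]
question = [0.3, 0.1]
print(predict_with_group(student, question, [0.4, 0.0]))   # e.g. group = GMAT
print(predict_with_group(student, question, [-0.4, 0.0]))  # e.g. group = SAT
```

Under this assumed form, questions that share a group also share part of their representation, which helps questions with few observed responses.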
![Page 41: Rohan's MS Project UCSD - Kaggle.com](https://reader034.vdocuments.mx/reader034/viewer/2022051322/5450c4f8af79590b098b4e15/html5/thumbnails/41.jpg)
Categorical Features
Group – G
Track – T
Subtrack – ST
Game Type – GT
Question Type – QT
![Page 42: Rohan's MS Project UCSD - Kaggle.com](https://reader034.vdocuments.mx/reader034/viewer/2022051322/5450c4f8af79590b098b4e15/html5/thumbnails/42.jpg)
LFL Models
![Page 43: Rohan's MS Project UCSD - Kaggle.com](https://reader034.vdocuments.mx/reader034/viewer/2022051322/5450c4f8af79590b098b4e15/html5/thumbnails/43.jpg)
Training Set
The training set contains four types of outcomes:
i) correct, ii) incorrect, iii) skipped and iv) timed-out.
The test set contains two types of outcomes:
i) correct, ii) incorrect
We create two training sets:
a) Training set with skipped and timed-out responses excluded
b) Training set with skipped and timed-out responses treated as incorrect outcomes
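The two training sets can be derived from the raw outcomes along the following lines. This is a small sketch; the row layout `(student, question, outcome)` and the outcome strings are assumptions about how the data might be held in memory.

```python
def make_training_sets(rows):
    """Build training sets (a) and (b) from (student, question, outcome) rows.

    (a) drops skipped and timed-out responses.
    (b) keeps them and labels them as incorrect (0).
    """
    set_a = [(s, q, 1 if outcome == "correct" else 0)
             for s, q, outcome in rows
             if outcome in ("correct", "incorrect")]
    set_b = [(s, q, 1 if outcome == "correct" else 0)
             for s, q, outcome in rows]
    return set_a, set_b

# Hypothetical raw rows.
raw = [(1, 10, "correct"), (1, 11, "skipped"),
       (2, 10, "incorrect"), (2, 12, "timed-out")]
set_a, set_b = make_training_sets(raw)
print(len(set_a), len(set_b))
```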
![Page 44: Rohan's MS Project UCSD - Kaggle.com](https://reader034.vdocuments.mx/reader034/viewer/2022051322/5450c4f8af79590b098b4e15/html5/thumbnails/44.jpg)
Results from LFL Models (a)
![Page 45: Rohan's MS Project UCSD - Kaggle.com](https://reader034.vdocuments.mx/reader034/viewer/2022051322/5450c4f8af79590b098b4e15/html5/thumbnails/45.jpg)
Results from LFL Models (b)
![Page 46: Rohan's MS Project UCSD - Kaggle.com](https://reader034.vdocuments.mx/reader034/viewer/2022051322/5450c4f8af79590b098b4e15/html5/thumbnails/46.jpg)
Observation
Throwing away data helps!
Removing skipped and timed-out responses from the training set improved the BCD (binomial capped deviance).
This motivates adapting the model to the test-set distribution to win the competition.
![Page 47: Rohan's MS Project UCSD - Kaggle.com](https://reader034.vdocuments.mx/reader034/viewer/2022051322/5450c4f8af79590b098b4e15/html5/thumbnails/47.jpg)
Ensemble Learning
No single model works well on every dyad.
Combining predictions from multiple models can outperform each of the individual models (Takács et al., 2009).
The $1M Netflix Prize was won by a blend of multiple models.
![Page 48: Rohan's MS Project UCSD - Kaggle.com](https://reader034.vdocuments.mx/reader034/viewer/2022051322/5450c4f8af79590b098b4e15/html5/thumbnails/48.jpg)
Intuition for Ensemble Learning
True labels for four samples: (1,1,0,0)
Predictions from four different models:
(0,1,0,0) – accuracy 75%
(1,0,0,0) – accuracy 75%
(1,1,1,0) – accuracy 75%
(1,1,0,1) – accuracy 75%
Average of the different models: (.75,.75,.25,.25)
Threshold the average at 0.5: (1,1,0,0) – accuracy = 100%
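The arithmetic in this example can be checked with a few lines of code. The helper names are chosen here for illustration.

```python
labels = [1, 1, 0, 0]
model_predictions = [
    [0, 1, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 1, 0],
    [1, 1, 0, 1],
]

def accuracy(pred, truth):
    """Fraction of samples on which the prediction matches the true label."""
    return sum(p == t for p, t in zip(pred, truth)) / len(truth)

def blend_by_average(predictions, threshold=0.5):
    """Average the models per sample, then threshold the average."""
    n_models = len(predictions)
    averages = [sum(column) / n_models for column in zip(*predictions)]
    return averages, [1 if a > threshold else 0 for a in averages]

averages, blended = blend_by_average(model_predictions)
print([accuracy(p, labels) for p in model_predictions])  # [0.75, 0.75, 0.75, 0.75]
print(averages)                                          # [0.75, 0.75, 0.25, 0.25]
print(blended, accuracy(blended, labels))                # [1, 1, 0, 0] 1.0
```

The blend is perfect here because each model errs on a different sample, so every error is outvoted by the other three models.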
![Page 49: Rohan's MS Project UCSD - Kaggle.com](https://reader034.vdocuments.mx/reader034/viewer/2022051322/5450c4f8af79590b098b4e15/html5/thumbnails/49.jpg)
Using Linear Regression for combining predictions
For a set with known labels,
{ (s,q) → y(s,q) }, where y can take the value 0 or 1,
$p_i = p_i(y = 1 \mid (s,q))$ is the estimated probability of a correct response from the i-th model.
Define a matrix P and a column vector Y,
where each row of P contains the predictions from the n models, $(p_1, \ldots, p_i, \ldots, p_n)$, for one dyad,
and the corresponding entry of Y contains the target value y(s,q).
Repeating this for every dyad in the set, we fill P with predictions and Y with target values.
We solve,
Pw = Y
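Since P usually has far more rows (dyads) than columns (models), this system generally has no exact solution. Linear regression instead takes the least-squares solution, which, assuming $P^{\top} P$ is invertible, can be written in closed form:

```latex
w^{*} = \arg\min_{w} \lVert P w - Y \rVert_2^2 = (P^{\top} P)^{-1} P^{\top} Y
```

The weights are not constrained to be non-negative or to sum to one, so a blended prediction can fall slightly outside [0, 1].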
![Page 50: Rohan's MS Project UCSD - Kaggle.com](https://reader034.vdocuments.mx/reader034/viewer/2022051322/5450c4f8af79590b098b4e15/html5/thumbnails/50.jpg)
To predict the probability of a correct response for an example in the test set,
we combine the predictions from the n models using the weight vector w:

$$p_{\text{estimated}} = \sum_{j=1}^{n} w_j \, p_j$$
![Page 51: Rohan's MS Project UCSD - Kaggle.com](https://reader034.vdocuments.mx/reader034/viewer/2022051322/5450c4f8af79590b098b4e15/html5/thumbnails/51.jpg)
Which set to use?
Step 1:
for each of the n models
  Train on the training set
  Predict on the validation set
  Save parameters
Step 2:
Estimate w using linear regression on the validation set predictions
Step 3:
for each of the n models
  Train on the training set + validation set
  Predict on the test set
Step 4:
Combine predictions of the test set using w
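The four steps can be sketched end to end as follows. The two "models" here are deliberately trivial stand-ins (smoothed per-student and per-question mean correctness), not the LFL models from the project, and the helper names and toy data are hypothetical. The least-squares step solves the normal equations directly, which is adequate for a handful of models.

```python
def solve_least_squares(P, Y):
    """Solve (P^T P) w = P^T Y by Gaussian elimination with partial pivoting."""
    n = len(P[0])
    A = [[sum(r[i] * r[j] for r in P) for j in range(n)] for i in range(n)]
    b = [sum(r[i] * y for r, y in zip(P, Y)) for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, n):
            factor = A[r][col] / A[col][col]
            for c in range(col, n):
                A[r][c] -= factor * A[col][c]
            b[r] -= factor * b[col]
    w = [0.0] * n
    for i in reversed(range(n)):
        w[i] = (b[i] - sum(A[i][j] * w[j] for j in range(i + 1, n))) / A[i][i]
    return w

def mean_model(key_index):
    """Stand-in model: smoothed mean correctness per student (0) or question (1)."""
    def fit(data):
        sums, counts = {}, {}
        for row in data:
            key = row[key_index]
            sums[key] = sums.get(key, 0) + row[2]
            counts[key] = counts.get(key, 0) + 1
        overall = sum(row[2] for row in data) / len(data)
        def predict(s, q):
            key = (s, q)[key_index]
            return (sums.get(key, 0) + overall) / (counts.get(key, 0) + 1)
        return predict
    return fit

def blend_pipeline(model_fits, train, valid, test_dyads):
    # Step 1: train on the training set, predict on the validation set.
    fitted = [fit(train) for fit in model_fits]
    P = [[m(s, q) for m in fitted] for s, q, _ in valid]
    Y = [y for _, _, y in valid]
    # Step 2: estimate w by linear regression on the validation predictions.
    w = solve_least_squares(P, Y)
    # Step 3: retrain on training + validation, predict on the test set.
    refitted = [fit(train + valid) for fit in model_fits]
    # Step 4: combine the test predictions using w.
    return [sum(wj * m(s, q) for wj, m in zip(w, refitted)) for s, q in test_dyads]

train = [(0, 0, 1), (0, 1, 1), (1, 0, 1), (1, 1, 0), (2, 0, 0), (2, 1, 0)]
valid = [(0, 0, 1), (1, 1, 0), (2, 0, 0), (1, 0, 1)]
print(blend_pipeline([mean_model(0), mean_model(1)], train, valid, [(0, 1), (2, 1)]))
```

Estimating w on the validation set rather than on the training set keeps the blend from rewarding models that merely overfit the data they were trained on.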
![Page 52: Rohan's MS Project UCSD - Kaggle.com](https://reader034.vdocuments.mx/reader034/viewer/2022051322/5450c4f8af79590b098b4e15/html5/thumbnails/52.jpg)
Results
![Page 53: Rohan's MS Project UCSD - Kaggle.com](https://reader034.vdocuments.mx/reader034/viewer/2022051322/5450c4f8af79590b098b4e15/html5/thumbnails/53.jpg)
After combining predictions using linear regression
![Page 54: Rohan's MS Project UCSD - Kaggle.com](https://reader034.vdocuments.mx/reader034/viewer/2022051322/5450c4f8af79590b098b4e15/html5/thumbnails/54.jpg)
2 weeks later
![Page 55: Rohan's MS Project UCSD - Kaggle.com](https://reader034.vdocuments.mx/reader034/viewer/2022051322/5450c4f8af79590b098b4e15/html5/thumbnails/55.jpg)
some weeks later..
![Page 56: Rohan's MS Project UCSD - Kaggle.com](https://reader034.vdocuments.mx/reader034/viewer/2022051322/5450c4f8af79590b098b4e15/html5/thumbnails/56.jpg)
Gradient Boosted Decision Trees
Leverage Side-Information in Ensemble learning
Gradient Boosted Decision Trees (GBDT) (Friedman, 1999) algorithm can be used to combine predictions and side information together.
Popular algorithm
GBDT is a powerful learning algorithm that is widely used (see Li & Xu, 2009, chap. 6)
The core of the algorithm is a decision tree learner
![Page 57: Rohan's MS Project UCSD - Kaggle.com](https://reader034.vdocuments.mx/reader034/viewer/2022051322/5450c4f8af79590b098b4e15/html5/thumbnails/57.jpg)
Decision Tree
Decision trees can handle both i) numeric and ii) categorical variables.
They can also handle missing information.
![Page 58: Rohan's MS Project UCSD - Kaggle.com](https://reader034.vdocuments.mx/reader034/viewer/2022051322/5450c4f8af79590b098b4e15/html5/thumbnails/58.jpg)
Decision Tree
[Figure: decision tree. Each internal node applies a decision function; each leaf predicts the mean target of its examples, e.g. ( Y6 + Y7 + Y9 ) / 3 and ( Y1 + Y3 ) / 2]
![Page 59: Rohan's MS Project UCSD - Kaggle.com](https://reader034.vdocuments.mx/reader034/viewer/2022051322/5450c4f8af79590b098b4e15/html5/thumbnails/59.jpg)
Gradient Boosting
Select the base learner and the loss function.
● Decision tree as the base learner, and squared loss as the loss function
Gradient boosting is an iterative procedure.
● Iteratively fit a base learner to the negative gradient of the loss at the previous iteration
![Page 60: Rohan's MS Project UCSD - Kaggle.com](https://reader034.vdocuments.mx/reader034/viewer/2022051322/5450c4f8af79590b098b4e15/html5/thumbnails/60.jpg)
Gradient Boosting
We can add a regularization parameter as follows
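The update rule shown on this slide is not in the transcript. A common way to regularize gradient boosting is shrinkage, where each new base learner is scaled by a small factor before being added to the ensemble. The sketch below assumes that form. It uses one-split trees (stumps) on a single numeric feature as the base learner, and the names and default values are illustrative only.

```python
def fit_stump(xs, residuals):
    """Best one-split regression tree on a single numeric feature.

    Requires at least two distinct values in xs.
    """
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    _, t, left_mean, right_mean = best
    return lambda x: left_mean if x <= t else right_mean

def gradient_boost(xs, ys, n_rounds=50, shrinkage=0.1):
    """Gradient boosting with squared loss and shrinkage (assumed regularizer)."""
    base = sum(ys) / len(ys)              # initial constant prediction
    preds = [base] * len(ys)
    stumps = []
    for _ in range(n_rounds):
        # For squared loss, the negative gradient is simply the residual.
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + shrinkage * stump(x) for p, x in zip(preds, xs)]
    return lambda x: base + shrinkage * sum(s(x) for s in stumps)

model = gradient_boost([1, 2, 3, 4], [0, 0, 1, 1])
print(model(1), model(4))
```

A smaller shrinkage value needs more rounds to fit the data but usually generalizes better, which is why it acts as a regularization parameter.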
![Page 61: Rohan's MS Project UCSD - Kaggle.com](https://reader034.vdocuments.mx/reader034/viewer/2022051322/5450c4f8af79590b098b4e15/html5/thumbnails/61.jpg)
Side-Information for GBDT
![Page 62: Rohan's MS Project UCSD - Kaggle.com](https://reader034.vdocuments.mx/reader034/viewer/2022051322/5450c4f8af79590b098b4e15/html5/thumbnails/62.jpg)
Meta-Features
![Page 63: Rohan's MS Project UCSD - Kaggle.com](https://reader034.vdocuments.mx/reader034/viewer/2022051322/5450c4f8af79590b098b4e15/html5/thumbnails/63.jpg)
Preprocessing Tags
Each question has a set of tags that is associated with it. Some are listed below
Statistics (incl. mean median mode),259
Strengthen Hypothesis,260
Student Produced Response,261
System of Linear Equations,262
Systems of Linear Equations,263
Systems of linear equations and inequalities,264
We manually merge the tags that we judge to be very similar.
We cluster the tags into 40 clusters using spectral clustering (Ng et al., 2001), with the normalized co-occurrence of tags as the similarity measure used to generate the affinity matrix A.
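The slide does not say how the co-occurrence counts are normalized. The sketch below assumes a cosine-style normalization, A[i][j] = c_ij / sqrt(c_i · c_j), where c_ij is the number of questions carrying both tags and c_i the number carrying tag i. It sets the diagonal to zero, as in the affinity matrix of Ng et al. Only the affinity matrix is built here; the clustering step itself is left out.

```python
import math
from itertools import combinations

def tag_affinity(question_tags):
    """Affinity matrix from tag co-occurrence (assumed cosine normalization).

    question_tags: one list of tag names per question.
    Returns (tags, A) where A[i][j] is the affinity between tags[i] and tags[j].
    """
    tags = sorted({t for ts in question_tags for t in ts})
    index = {t: i for i, t in enumerate(tags)}
    n = len(tags)
    count = [0] * n
    co = [[0] * n for _ in range(n)]
    for ts in question_tags:
        unique = sorted(set(ts))
        for t in unique:
            count[index[t]] += 1
        for a, b in combinations(unique, 2):
            i, j = index[a], index[b]
            co[i][j] += 1
            co[j][i] += 1
    A = [[co[i][j] / math.sqrt(count[i] * count[j]) if i != j else 0.0
          for j in range(n)] for i in range(n)]
    return tags, A

# Hypothetical tag sets for three questions.
tags, A = tag_affinity([["Math", "Algebra"], ["Math", "Geometry"], ["Algebra", "Math"]])
print(tags)
for row in A:
    print(row)
```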
![Page 64: Rohan's MS Project UCSD - Kaggle.com](https://reader034.vdocuments.mx/reader034/viewer/2022051322/5450c4f8af79590b098b4e15/html5/thumbnails/64.jpg)
Results from GBDT
● GBDT improved the BCD only marginally.
![Page 65: Rohan's MS Project UCSD - Kaggle.com](https://reader034.vdocuments.mx/reader034/viewer/2022051322/5450c4f8af79590b098b4e15/html5/thumbnails/65.jpg)
Including Temporal Features
![Page 66: Rohan's MS Project UCSD - Kaggle.com](https://reader034.vdocuments.mx/reader034/viewer/2022051322/5450c4f8af79590b098b4e15/html5/thumbnails/66.jpg)
...
![Page 67: Rohan's MS Project UCSD - Kaggle.com](https://reader034.vdocuments.mx/reader034/viewer/2022051322/5450c4f8af79590b098b4e15/html5/thumbnails/67.jpg)
GBDT Results after including temporal features
![Page 68: Rohan's MS Project UCSD - Kaggle.com](https://reader034.vdocuments.mx/reader034/viewer/2022051322/5450c4f8af79590b098b4e15/html5/thumbnails/68.jpg)
Feb 23, Week, competition end
![Page 69: Rohan's MS Project UCSD - Kaggle.com](https://reader034.vdocuments.mx/reader034/viewer/2022051322/5450c4f8af79590b098b4e15/html5/thumbnails/69.jpg)
Last day
Combining the predictions from the GBDT models using linear regression improved the score slightly.
![Page 70: Rohan's MS Project UCSD - Kaggle.com](https://reader034.vdocuments.mx/reader034/viewer/2022051322/5450c4f8af79590b098b4e15/html5/thumbnails/70.jpg)
Last day of competition
![Page 71: Rohan's MS Project UCSD - Kaggle.com](https://reader034.vdocuments.mx/reader034/viewer/2022051322/5450c4f8af79590b098b4e15/html5/thumbnails/71.jpg)
Final Private set ranks
![Page 72: Rohan's MS Project UCSD - Kaggle.com](https://reader034.vdocuments.mx/reader034/viewer/2022051322/5450c4f8af79590b098b4e15/html5/thumbnails/72.jpg)
Post competition analysis
Latent feature approach is a good approach for this dataset.
LFL performs really well on the dataset
Code will be available soon @ http://code.google.com/p/latent-feature-log-linear/
![Page 73: Rohan's MS Project UCSD - Kaggle.com](https://reader034.vdocuments.mx/reader034/viewer/2022051322/5450c4f8af79590b098b4e15/html5/thumbnails/73.jpg)
Questions
![Page 74: Rohan's MS Project UCSD - Kaggle.com](https://reader034.vdocuments.mx/reader034/viewer/2022051322/5450c4f8af79590b098b4e15/html5/thumbnails/74.jpg)
References
Agarwal, Deepak and Chen, Bee-Chung. Regression based latent factor models. In Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining, KDD ’09, pp. 19–28, New York, NY, USA, 2009. ACM. ISBN 978-1-60558-495-9.
Friedman, Jerome H. Stochastic gradient boosting. Computational Statistics and Data Analysis, 38:367–378, 1999.
Gemulla, Rainer, Nijkamp, Erik, Haas, Peter J., and Sismanis, Yannis. Large-scale matrix factorization with distributed stochastic gradient descent. In Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining, KDD ’11, New York, NY, USA, 2011. ACM. ISBN 978-1-4503-0813-7.
Hofmann, Thomas, Puzicha, Jan, and Jordan, Michael I. Learning from dyadic data. In Proceedings of the 1998 conference on Advances in neural information processing systems II, pp. 466–472, Cambridge, MA, USA, 1999. MIT Press. ISBN 0-262-11245-0.
Li, Xiaochun and Xu, Ronghui (eds.). High dimensional data analysis in cancer research. Springer, CA, U.S.A, 2009.
Menon, Aditya Krishna and Elkan, Charles. A log linear model with latent features for dyadic prediction. In ICDM’10, pp. 364–373, 2010.
Ng, Andrew Y., Jordan, Michael I., and Weiss, Yair. On spectral clustering: Analysis and an algorithm. In Advances in Neural Information Processing Systems, pp. 849–856. MIT Press, 2001.
![Page 75: Rohan's MS Project UCSD - Kaggle.com](https://reader034.vdocuments.mx/reader034/viewer/2022051322/5450c4f8af79590b098b4e15/html5/thumbnails/75.jpg)
References
Rasch, Georg. Estimation of parameters and control of the model for two response categories, 1960.
Schein, Andrew I., Saul, Lawrence K., and Ungar, Lyle H. A generalized linear model for principal component analysis of binary data, 2003.
Takács, Gábor, Pilászy, István, Németh, Bottyán, and Tikk, Domonkos. Scalable collaborative filtering approaches for large recommender systems. J. Mach. Learn. Res., 10:623–656, June 2009. ISSN 1532-4435.
Töscher, Andreas, Jahrer, Michael, and Bell, Robert M. The BigChaos solution to the Netflix grand prize, 2009.