
Page 1: Latent Factor Models

Latent Factor Models

Geoff Gordon
Joint work w/ Ajit Singh, Byron Boots, Sajid Siddiqi, Nick Roy

Page 2: Latent Factor Models

Motivation

A key component of a cognitive tutor: student cognitive model

Tracks what skills student currently knows—latent factors

[Figure: skill nodes circle-area, rectangle-area, decompose-area feeding into the observed node right-answer]

Page 3: Latent Factor Models

Motivation

Student models are a key bottleneck in cognitive tutor authoring and performance

rough estimate: 20-80 hrs to hand-code model for 1 hr of content

result may be too simple, not rigorously verified

But, demonstrated improvements in learning from better models

E.g., Cen et al. [2007]: 12% less time to learn 6 geometry units (same retention) using a tutor w/ a more accurate model

This talk: automatic discovery of new models and data-driven revision of existing models via (latent) factor analysis

Page 4: Latent Factor Models

Simple case: snapshot, no side information

Xij = score of student i on item j (rows: STUDENTS; columns: ITEMS):

        1  2  3  4  5  6  …
A       1  1  0  0  1  0  …
B       0  1  1  0  0  0  …
C       1  1  0  1  1  0  …
D       1  0  0  1  1  0  …
…       …  …  …  …  …  …

Page 5: Latent Factor Models

Missing data

(rows: STUDENTS; columns: ITEMS)

        1  2  3  4  5  6  …
A       1  ?  ?  ?  1  0  …
B       0  ?  1  0  ?  ?  …
C       1  1  ?  ?  ?  0  …
D       1  0  0  1  ?  ?  …
…       …  …  …  …  …  …

Page 6: Latent Factor Models

Data matrix X

[Figure: data matrix X with rows x1, x2, x3, …, xn (STUDENTS) and columns (ITEMS)]

Page 7: Latent Factor Models

Simple case: model

[Graphical model: unobserved U (n students × k latent factors) and unobserved V (m items × k latent factors) generate observed X (n students × m items)]

U: student latent factors
V: item latent factors
X: observed performance

Page 8: Latent Factor Models

Linear-Gaussian version

[Same graphical model: student factors U (n × k) and item factors V (m × k) generate X (n × m)]

U: Gaussian (0 mean, fixed variance)
V: Gaussian (0 mean, fixed variance)
X: Gaussian (fixed variance, mean Ui ⋅ Vj)
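To make the generative story concrete, here is a minimal NumPy sketch of this linear-Gaussian model; the sizes, seed, and variances are illustrative assumptions, not values from the talk.

```python
import numpy as np

# Minimal sketch of the linear-Gaussian latent factor model.
# Sizes and variances below are illustrative assumptions.
rng = np.random.default_rng(0)
n, m, k = 100, 50, 5                        # students, items, latent factors

U = rng.normal(0.0, 1.0, size=(n, k))       # student factors ~ N(0, I)
V = rng.normal(0.0, 1.0, size=(m, k))       # item factors    ~ N(0, I)
noise = rng.normal(0.0, 0.1, size=(n, m))   # observation noise, fixed variance
X = U @ V.T + noise                         # X_ij has mean U_i . V_j
```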

Page 9: Latent Factor Models

Matrix form: Principal Components Analysis

[Figure: DATA MATRIX X (rows x1 … xn) ≈ COMPRESSED MATRIX U (rows u1 … un) × BASIS MATRIX Vᵀ (rows v1 … vk)]

Page 10: Latent Factor Models

PCA: the picture

Page 11: Latent Factor Models

PCA: matrix form

[Figure: DATA MATRIX X (rows x1 … xn) ≈ COMPRESSED MATRIX U (rows u1 … un) × BASIS MATRIX Vᵀ (rows v1 … vk)]

Columns of V span the low-rank space.
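As a concrete illustration of the factorization, a minimal NumPy sketch that computes rank-k factors via truncated SVD; the function name is ours, and we factor X directly rather than centering it first, matching the X ≈ U Vᵀ picture above.

```python
import numpy as np

def pca_factor(X, k):
    """Best rank-k factorization X ~= U @ Vt in squared error (via SVD).
    Rows of Vt are the basis vectors; columns of V span the low-rank space."""
    Uf, s, Vt = np.linalg.svd(X, full_matrices=False)
    U = Uf[:, :k] * s[:k]     # compressed matrix (basis weights)
    return U, Vt[:k, :]       # basis matrix V^T

# Usage: U, Vt = pca_factor(X, k);  X_hat = U @ Vt
```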

Page 12: Latent Factor Models

Interpretation of factors

[Figure: U (STUDENTS × BASIS WEIGHTS, rows u1 … un) × Vᵀ (BASIS VECTORS v1 … vk × ITEMS)]

BASIS VECTORS ARE CANDIDATE “SKILLS” OR “KNOWLEDGE COMPONENTS”

WEIGHTS ARE STUDENTS’ KNOWLEDGE LEVELS

Page 13: Latent Factor Models

PCA is a widely successful model

FACE IMAGES FROM Groundhog Day, EXTRACTED BY CAMBRIDGE FACE DB PROJECT

Page 14: Latent Factor Models

Data matrix: face images

[Figure: data matrix X with rows x1, x2, x3, …, xn (IMAGES) and columns (PIXELS)]

Page 15: Latent Factor Models

Result of factoring

[Figure: U (IMAGES × BASIS WEIGHTS, rows u1 … un) × Vᵀ (BASIS VECTORS v1 … vk × PIXELS)]

BASIS VECTORS ARE OFTEN CALLED “EIGENFACES”

Page 16: Latent Factor Models

Eigenfaces

IMAGE CREDIT: AT&T LABS CAMBRIDGE

Page 17: Latent Factor Models

PCA: the good

Unsupervised: need no human labels of latent state!

No worry about “expert blind spot”

Of course, labels helpful if available

Post-hoc human interpretation of latents is nice too—e.g., intervention design

Page 18: Latent Factor Models

PCA: the bad

Linear, Gaussian

PCA assumes E(X) is linear in UV

PCA assumes (X–E(X)) is i.i.d. Gaussian

Page 19: Latent Factor Models

Nonlinearity: conjunctive skills

[Surface plot: P(correct) as a function of skill 1 and skill 2]

Page 20: Latent Factor Models

Nonlinearity: disjunctive skills

[Surface plot: P(correct) as a function of skill 1 and skill 2]

Page 21: Latent Factor Models

Nonlinearity: “other”

[Surface plot: P(correct) as a function of skill 1 and skill 2]

Page 22: Latent Factor Models

Non-Gaussianity

Typical hand-developed skill-by-item matrix

(rows: SKILLS; columns: ITEMS)

           1  2  3  4  5  6  …
skill 1    1  1  0  0  1  1  …
skill 2    0  0  1  1  0  1  …

Page 23: Latent Factor Models

Result of Gaussian assumption

[Figure: rows of the true and recovered V matrices, side by side]

Page 24: Latent Factor Models

Result of Gaussian assumption

[Figure: rows of the true and recovered V matrices, side by side]

Page 25: Latent Factor Models

The ugly: MLE only

PCA yields maximum-likelihood estimate

Good, right?

sadly, the usual reasons to want the MLE don’t apply here

e.g., consistency: variance and bias of estimates of U and V do not approach 0 (unless #items/student and #students/item both go to infinity)

Result: MLE is typically far too confident of itself

Page 26: Latent Factor Models

Too certain: example

[Figure: learned coefficients (e.g., a row of U) and the resulting predictions]

Page 27: Latent Factor Models

Result: “fold-in problem”

Nonsensical results when trying to apply learned model to a new student or item

Similar to overfitting problem in supervised learning: confident-but-wrong parameters do not generalize to new examples

Unlike overfitting, fold-in problem doesn’t necessarily go away with more data

Page 28: Latent Factor Models

Summary: 3 problems w/ PCA

Can’t handle nonlinearity

Can’t handle non-Gaussian distributions

Uses MLE only (⇒ fold-in problem)

Let’s look at each problem in turn

Page 29: Latent Factor Models

Nonlinearity

In PCA, had Xij ≈ Ui ⋅ Vj

What if

Xij ≈ exp(Ui ⋅ Vj)

Xij ≈ logistic(Ui ⋅ Vj)

Page 30: Latent Factor Models

Non-Gaussianity

In PCA, had Xij ∼ Normal(μ), μ = Ui ⋅ Vj

What if

Xij ∼ Poisson(μ)

Xij ∼ Binomial(p)

Page 31: Latent Factor Models

Exponential family review

Exponential family of distributions:

P(X | θ) = P0(X) exp(X⋅θ – G(θ))

G(θ) is always strictly convex, differentiable on interior of domain

• means G’ is strictly monotone (strictly generalized monotone in 2D or higher)

Page 32: Latent Factor Models

Exponential family review

Exponential family PDF:

P(X | θ) = P0(X) exp(X⋅θ – G(θ))

• Surprising result: G’(θ) = g(θ) = E(X | θ)

• g & g–1 = “link function”

• θ = “natural parameter”

• E(X | θ) = “expectation parameter”

Page 33: Latent Factor Models

Examples

Normal(mean)

g = identity

Poisson(log rate)

g = exp

Binomial(log odds)

g = sigmoid
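The Poisson row can be checked directly from the definitions on the previous slides. With rate λ and natural parameter θ = log λ:

\[
P(X = x \mid \theta) = \frac{1}{x!}\,\exp\!\left(x\theta - e^{\theta}\right),
\qquad G(\theta) = e^{\theta},
\qquad g(\theta) = G'(\theta) = e^{\theta} = \lambda = E(X \mid \theta).
\]

So g = exp, as claimed; the Normal and Binomial rows follow the same pattern with G(θ) = θ²/2 and G(θ) = log(1 + eᶿ) respectively.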

Page 34: Latent Factor Models

Nonlinear & non-Gaussian

Let P(X | θ) be an exponential family with natural parameter θ

Predict Xij ∼ P(X | θij), where θij = Ui ⋅ Vj

e.g., in Poisson, E(Xij) = exp(θij)

e.g., in Binomial, E(Xij) = sigmoid(θij)

Page 35: Latent Factor Models

Optimization problem

max over U, V:  ∑ij log P(Xij | θij) + log P(U) + log P(V)

s.t. θij = Ui ⋅ Vj

• “Generalized linear” or “exponential family” PCA

• all P(…) terms are exponential families

• analogy to GLMs

[Collins et al., 2001] [Gordon, 2002] [Roy & Gordon, 2005]
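As a sketch of this objective for the Bernoulli/logistic case (dropping the log P(U) and log P(V) prior terms, and handling missing data with a mask; function and variable names are our illustrative assumptions):

```python
import numpy as np

def logistic_pca_loglik(X, U, V, observed):
    """sum_ij log P(X_ij | theta_ij) with theta = U V^T for Bernoulli X,
    where G(theta) = log(1 + e^theta); `observed` is a boolean mask."""
    theta = U @ V.T
    ll = X * theta - np.logaddexp(0.0, theta)   # x*theta - G(theta)
    return np.sum(ll[observed])                 # sum over observed cells only
```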

Page 36: Latent Factor Models

Special cases

PCA, probabilistic PCA

Poisson PCA

k-means clustering

Max-margin matrix factorization (MMMF)

Almost: pLSI, pHITS, NMF

Page 37: Latent Factor Models

Comparison to AFM

p = probability correct

θ = student overall performance

β = skill difficulty

Q = item x skill matrix

γ = skill practice slope

T = number of practice opportunities

logit(pij) = θi + ∑k Qjk (βk + γk Tik)
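A small NumPy sketch of this prediction rule, following the reconstructed equation above (names are illustrative):

```python
import numpy as np

def afm_prob_correct(theta_i, beta, gamma, Q_j, T_i):
    """AFM: logit(p_ij) = theta_i + sum_k Q_jk * (beta_k + gamma_k * T_ik).
    theta_i: student ability; Q_j: skill row for item j;
    T_i: student i's practice-opportunity counts per skill."""
    logit = theta_i + Q_j @ (beta + gamma * T_i)
    return 1.0 / (1.0 + np.exp(-logit))          # p = sigmoid(logit)
```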

Page 38: Latent Factor Models

Theorem

• In GL PCA, finding U which maximizes likelihood (holding V fixed) is a convex optimization problem

• And, finding best V (holding U fixed) is a convex problem

• Further, Hessian is block diagonal

So, an efficient and effective optimization algorithm: alternately improve U and V
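A minimal sketch of that alternating scheme for the Bernoulli case (fully observed X, no priors). For brevity each convex half-problem gets a single gradient step here; a per-row Newton step would exploit the block-diagonal Hessian the theorem mentions.

```python
import numpy as np

def alternating_logistic_pca(X, k, iters=200, lr=0.05, seed=0):
    """Alternately improve U (V fixed) and V (U fixed); each half-step
    addresses a convex subproblem, taken here as one gradient step."""
    rng = np.random.default_rng(seed)
    n, m = X.shape
    U = 0.01 * rng.normal(size=(n, k))
    V = 0.01 * rng.normal(size=(m, k))
    for _ in range(iters):
        mu = 1.0 / (1.0 + np.exp(-(U @ V.T)))   # E(X) = g(theta) = sigmoid
        U += lr * (X - mu) @ V                  # d(log-lik)/dU = (X - mu) V
        mu = 1.0 / (1.0 + np.exp(-(U @ V.T)))
        V += lr * (X - mu).T @ U                # d(log-lik)/dV = (X - mu)^T U
    return U, V
```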

Page 39: Latent Factor Models

Example: compressing histograms w/ Poisson PCA

Points: observed frequencies in ℝ3

Hidden manifold: a 1-parameter family of multinomials

[Figure: probability simplex with vertices A, B, C]

Page 40: Latent Factor Models

Example

[Figure sequence, Pages 40–45: Poisson PCA fit at iterations 1, 2, 3, 4, 5, and 9]

Page 46: Latent Factor Models

Remaining problem: MLE

Well-known rule of thumb: if MLE gets you in trouble due to overfitting, move to fully-Bayesian inference

Typical problem: computation

In our case, the computation is just fine if we’re a little clever

Additional wrinkle: switch to hierarchical model

Page 47: Latent Factor Models

Bayesian hierarchical exponential-family PCA

[Graphical model: shared prior R → student factors U (n students × k latent factors); shared prior S → item factors V (m items × k latent factors); U and V generate observed X]

U: student latent factors
V: item latent factors
X: observed performance
R: shared prior for student latents
S: shared prior for item latents

Page 48: Latent Factor Models

A little clever: MCMC

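The equation on this slide did not survive extraction. As one plausible concrete instance of "a little clever" sampling, here is a random-walk Metropolis sketch for the posterior over a single student's factor row u (logistic likelihood, N(0, I) prior; the names, step size, and sampler choice are our illustrative assumptions, not necessarily the talk's actual method):

```python
import numpy as np

def sample_student_factors(x_row, V, n_samples=2000, step=0.1, seed=0):
    """Random-walk Metropolis over one student's factor row u:
    prior u ~ N(0, I); likelihood X_ij ~ Bernoulli(sigmoid(u . V_j))."""
    rng = np.random.default_rng(seed)
    k = V.shape[1]

    def log_post(u):
        theta = V @ u
        return np.sum(x_row * theta - np.logaddexp(0.0, theta)) - 0.5 * u @ u

    u = np.zeros(k)
    lp = log_post(u)
    samples = []
    for _ in range(n_samples):
        prop = u + step * rng.normal(size=k)
        lp_prop = log_post(prop)
        if np.log(rng.uniform()) < lp_prop - lp:   # Metropolis accept/reject
            u, lp = prop, lp_prop
        samples.append(u.copy())
    return np.array(samples)   # a posterior, not a single over-confident MLE
```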

Page 49: Latent Factor Models

Experimental comparison: Geometry Area 1996–1997 data

Geometry tutor: 139 items presented to 59 students

On average, each student tested on 60 items

Page 50: Latent Factor Models

Results: hold-out error

Embedding dimension for *EPCA is K = 15

credit: Ajit Singh

Page 51: Latent Factor Models

Extensions

Relational models

Temporal models

Page 52: Latent Factor Models

Relational models

STUDENTS × ITEMS:

         1  2  3  4  5  6
john     1  1  0  0  1  0
sue      0  1  1  0  0  0
tom      1  1  0  1  1  0

TAGS × ITEMS:

         1  2  3  4  5  6
trig     1  1  0  0  1  0
story    0  1  1  0  0  0
hard     1  1  0  1  1  0

Page 53: Latent Factor Models

Relational hierarchical Bayesian exponential-family PCA

[Graphical model: shared priors R → U (n students × k latent factors), S → V (m items × k latent factors), T → Z (p tags × k latent factors); U and V generate observed X; V and Z generate observed Y]

X, Y: observed data
U: student latent factors
V: item latent factors
Z: tag latent factors
R, S, T: shared priors

X ≈ f(UVᵀ)   Y ≈ g(VZᵀ)
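A sketch of the joint objective this picture suggests, sharing the item factors V across both relations (the Bernoulli links and the weighting term are our illustrative assumptions):

```python
import numpy as np

def relational_loglik(X, Y, U, V, Z, alpha=1.0):
    """Joint log-likelihood for X ~= f(U V^T) and Y ~= g(V Z^T),
    with the item factors V shared; Bernoulli links assumed for both."""
    theta_x = U @ V.T                                        # students x items
    theta_y = V @ Z.T                                        # items x tags
    ll_x = np.sum(X * theta_x - np.logaddexp(0.0, theta_x))
    ll_y = np.sum(Y * theta_y - np.logaddexp(0.0, theta_y))
    return ll_x + alpha * ll_y   # alpha trades off the two relations
```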

Page 54: Latent Factor Models

Example: brain imaging

2000 dictionary words

60 stimulus words

500 brain voxels

X = co-occurrence of (dictionary word, stimulus word) on web

Y = activation of voxel when presented with stimulus

Task: predict X

[Bar chart: mean squared error for EPCA, H-EPCA, HB-EPCA, and their relational versions]

credit: Ajit Singh

Page 55: Latent Factor Models

Temporal models

So far: latent factors of students and content

e.g., knowledge components

for student: skill at KC

for problem: need for KC

e.g., student affect

But limited idea of evolution through time

e.g., fixed-structure models: proficiency = a + b·x, where x = # practice opportunities, a = initial skill level, b = skill learning rate

Page 56: Latent Factor Models

Temporal models

For evolving factors, we expect far better results if we learn about time explicitly

learning curves, gaming state, affective state, motivational state, self-efficacy, …

[Graphical model, unrolled over TRANS. 1–3: latent states X1, X2, X3; observed properties of transaction Y1, Y2, Y3; instructional decisions U1, U2, U3]
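The pictured model is a standard input-driven state-space model. As one concrete linear-Gaussian instance (all matrices illustrative, not fit to tutor data), here is a single Kalman predict/update step for tracking the latent state:

```python
import numpy as np

def kalman_step(x_mean, x_cov, u, y, A, B, C, Q, R):
    """One step of filtering in the linear-Gaussian state-space model
    x' = A x + B u + w, w ~ N(0, Q);   y = C x' + v, v ~ N(0, R)."""
    # Predict the latent state given the instructional decision u
    x_pred = A @ x_mean + B @ u
    P_pred = A @ x_cov @ A.T + Q
    # Update with the observed transaction properties y
    S = C @ P_pred @ C.T + R                 # innovation covariance
    K = P_pred @ C.T @ np.linalg.inv(S)      # Kalman gain
    x_new = x_pred + K @ (y - C @ x_pred)
    P_new = (np.eye(len(x_mean)) - K @ C) @ P_pred
    return x_new, P_new
```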

Page 57: Latent Factor Models

Example: Bayesian Evaluation & Assessment

[BECK ET AL., 2008]

[Graphical model as above: instructional decisions and latent state generate properties of transactions]

Page 58: Latent Factor Models

The hope

Fit a temporal model

Examine learned parameters and latent states

Discover important evolving factors which affect performance

learning curve, affective state, gaming state, …

Discover how they evolve

Page 59: Latent Factor Models

The hope

Reduce assumptions about what the factors are

Explore a wider variety of models

Model search guided by data

⇒ discover factors we might otherwise have missed

Page 60: Latent Factor Models

Walking: original data

[Video: walking data]

THANKS: BYRON BOOTS, SAJID SIDDIQI

Page 61: Latent Factor Models

Walking: original data

THANKS: BYRON BOOTS, SAJID SIDDIQI

[Graphical model, unrolled over TRANS. 1–3: latent states X1, X2, X3; observations Y1, Y2, Y3 = joint angles; inputs U1, U2, U3 = desired direction]

Page 62: Latent Factor Models

Walking: learned model

[Video: learned model]

Page 63: Latent Factor Models

Steam: original data

[Video: steam data]

Page 64: Latent Factor Models

Steam: original data

[Graphical model, unrolled over TRANS. 1–3: latent states X1, X2, X3; observations Y1, Y2, Y3 = pixels; inputs U1, U2, U3 = (empty)]

Page 65: Latent Factor Models

Steam: learned model

[Video: learned model]