Human Memory Search as Initial-Visit Emitting Random Walk



§ Random spanning tree algorithm [Broder89]

§ Self-avoiding Random Walk [Flory53]
§ Cascade Model [Gomez-Rodriguez10] for disease / information spread

§ Why hard? A naïve way requires an infinite sum.

- x = (x_1, x_2, …): uncensored, hidden
- a = (a_1, …, a_M): censored, observed

- E.g., a = (1, 2, 3, 4)
§ Key Quantity: Pr(a_{t+1} | a_1, …, a_t, P)
§ Solution: turn P into an absorbing random walk!

- Specifically, turn unvisited states into absorbing states
- Q: from visited state to visited state
- R: from visited state to unvisited state

§ Thm 3. [Doyle84] Let B = (I − Q)^{-1} R. B_{ik} is the probability of a chain starting from i being absorbed by k.
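Thm 3 is what collapses the infinite sum: walks may wander among the visited states for any number of steps before first exiting, and that geometric series is exactly the matrix inverse. A sketch of the identity, in the notation above:

```latex
\mathbf{B}
\;=\; \sum_{k=0}^{\infty} \mathbf{Q}^{k}\mathbf{R}
\;=\; (\mathbf{I}-\mathbf{Q})^{-1}\mathbf{R},
\qquad
\mathbf{B}_{ik} \;=\; \Pr(\text{walk from visited state } i \text{ is absorbed by unvisited state } k).
```

The k-th term Q^k R accounts for walks that take exactly k steps inside the visited set before stepping to an unvisited (absorbing) state.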

§ Cor 1. Assume P is arranged as above. Then, Pr(a_{t+1} | a_1, …, a_t, P) = ((I − Q)^{-1} R)_{a_t, a_{t+1}}.

* E.g., a = (1, 2, 3, 4, 5). Compute Pr(a_4 = 4 | a_1 = 1, a_2 = 2, a_3 = 3, P) = ((I − Q)^{-1} R)_{34}.
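As a concrete sketch of Cor 1 (not the authors' code; `invite_step_prob` is a hypothetical name, numpy assumed), the step probability is computed by slicing P into Q and R and solving one linear system:

```python
import numpy as np

def invite_step_prob(P, visited, target):
    """Pr(next first-visit state = target | visited so far, P).

    `visited` is the censored prefix (a_1, ..., a_t) as 0-indexed states,
    in emission order; the walk currently sits at visited[-1].
    Unvisited states are treated as absorbing, per [Doyle84].
    """
    n = P.shape[0]
    unvisited = [s for s in range(n) if s not in visited]
    Q = P[np.ix_(visited, visited)]    # visited -> visited
    R = P[np.ix_(visited, unvisited)]  # visited -> unvisited (absorbing)
    B = np.linalg.solve(np.eye(len(visited)) - Q, R)  # (I - Q)^{-1} R
    return B[len(visited) - 1, unvisited.index(target)]
```

For the example a = (1, 2, 3, 4, 5) with 0-indexed states, Pr(a_4 = 4 | a_1, a_2, a_3, P) is `invite_step_prob(P, [0, 1, 2], 3)`.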

§ n states, π : initial distribution, P: Markov chain (row stochastic)

§ The random walk runs indefinitely.
§ A censored list is not Markovian anymore.
§ GOOD NEWS: captures important human behavior in a cognitive task (see below) [Abbott12]
§ BAD NEWS: Parameter Estimation is HARD!

Main Contribution
1. First tractable method for the maximum likelihood estimate (MLE) of INVITE
2. Consistency of the INVITE MLE

INitial-VIsiT Emitting (INVITE) Random Walk

§ Data

§ Result: negative log likelihood on holdout set (smaller is better)

[Abbott12] J. T. Abbott, J. L. Austerweil, and T. L. Griffiths, “Human memory search as a random walk in a semantic network,” in NIPS, 2012, pp. 3050–3058.

[Abrahao13] B. D. Abrahao, F. Chierichetti, R. Kleinberg, and A. Panconesi, “Trace complexity of network inference.” CoRR, vol. abs/1308.2954, 2013.

[Broder89] A. Z. Broder, “Generating random spanning trees,” in FOCS. IEEE Computer Society, 1989, pp. 442–447.

[Doyle84] P. G. Doyle and J. L. Snell, Random Walks and Electric Networks. Washington, DC: Mathematical Association of America, 1984.

[Durrett12] R. Durrett, Essentials of stochastic processes, 2nd ed., ser. Springer texts in statistics. New York: Springer, 2012.

[Flory53] P. Flory, Principles of Polymer Chemistry. Cornell University Press, 1953.

[Gomez-Rodriguez10] M. Gomez Rodriguez, J. Leskovec, and A. Krause, "Inferring networks of diffusion and influence," Max-Planck-Gesellschaft. New York, NY, USA: ACM Press, July 2010, pp. 1019–1028.

Kwang-Sung Jun∗ ([email protected]), Xiaojin Zhu†, Timothy Rogers‡, Zhuoran Yangξ, Ming Yuanζ
∗Wisconsin Institute for Discovery, Department of †Computer Sciences, ‡Psychology, ζStatistics, University of Wisconsin-Madison; ξDepartment of Mathematical Sciences, Tsinghua University

Human Memory Search as Initial-Visit Emitting Random Walk

Censored Lists Generated by INVITE

Consistency of INVITE MLE

Order  Item
1      cow
2      horse
3      chicken
4      bear
5      lion
6      tiger
7      porcupine
8      rat
9      mouse
10     duck
11     goose
12     …

[Figure: semantic network over animals — cow, horse, chicken, elephant, bear, lion, tiger, porcupine, squirrel, rat, mouse, rabbit]

x_1 ~ π, x_{t+1} ~ P(· | x_t)
Walk: x_1 = cow, x_2 = horse, x_3 = chicken, x_4 = cow, x_5 = bear, x_6 = lion, x_7 = bear, x_8 = tiger
"Censored List": a_1 = cow, a_2 = horse, a_3 = chicken, a_4 = bear, a_5 = lion, a_6 = tiger

§ A censored list is a permutation of n items or a prefix of it.

§ Does it produce every permutation? Or every prefix?

Transient state: a state that has nonzero probability of not coming back to itself in finite time
Recurrent state: a state that is not transient
A is closed if i ∈ A and j ∉ A implies a walk from i cannot reach j.
B is irreducible if for all i, j ∈ B, a walk from i can reach j.

Computing INVITE Likelihood

Pr(a_4 = 4 | a_1 = 1, a_2 = 2, a_3 = 3, P) = ((I − Q)^{-1} R)_{34}

[Figure: a 5-state chain P with visited states {1, 2, 3} and unvisited states {4, 5}; P′ turns the unvisited states into absorbing states [Doyle84]]

Parameter Estimation: Regularized MLE

§ Data: censored lists
§ Relaxation: assume that the underlying random walk terminates after a finite number of steps.
§ Initial distribution π: MAP estimation (easy)
§ Transition matrix P is constrained (nonnegative, rows sum to 1)
- Easier: unconstrained parameterization
- β_ii := −∞ to disallow self-transitions.

§ Optimization problem:

§ We run LBFGS.
§ For larger datasets, we run averaged stochastic gradient descent.
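A minimal sketch of the unconstrained parameterization (assuming a row-wise softmax, which the β_ii := −∞ trick suggests; `beta_to_P` is a hypothetical helper, not the authors' code):

```python
import numpy as np

def beta_to_P(beta):
    """Map an unconstrained matrix beta to a row-stochastic P with
    zero diagonal: P_ij proportional to exp(beta_ij), beta_ii = -inf."""
    B = np.array(beta, dtype=float)
    np.fill_diagonal(B, -np.inf)             # disallow self-transitions
    B -= B.max(axis=1, keepdims=True)        # row-wise shift for stable exp
    E = np.exp(B)                            # exp(-inf) = 0 on the diagonal
    return E / E.sum(axis=1, keepdims=True)
```

Optimizing over beta (e.g., with LBFGS) then automatically respects the nonnegativity and sum-to-one constraints on P.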

§ Thm 1. [Durrett12] A finite set of states S can be uniquely decomposed as S = T ∪ W_1 ∪ … ∪ W_K, where T is the set of transient states (possibly empty) and each W_k is a nonempty closed irreducible set of recurrent states.

§ Thm 2. Given P, let S = T ∪ W_1 ∪ … ∪ W_K be the decomposition by Thm 1. A censored list a generated by INVITE on P has zero or more transient states (in T), followed by all states in one and only one closed irreducible set (W_k for some k).
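The decomposition in Thm 1 depends only on the support of P, so it can be computed by reachability alone; a sketch (`chain_decomposition` is an illustrative name, not from the poster):

```python
import numpy as np

def chain_decomposition(P):
    """Split states into the transient set T and the closed irreducible
    recurrent classes W_1, ..., W_K (cf. Thm 1)."""
    n = P.shape[0]
    reach = P > 0
    np.fill_diagonal(reach, True)            # every state reaches itself
    for k in range(n):                       # Floyd-Warshall transitive closure
        reach = reach | np.outer(reach[:, k], reach[k, :])
    # i is recurrent iff everything it can reach can reach it back
    recurrent = [i for i in range(n)
                 if all(reach[j, i] for j in range(n) if reach[i, j])]
    transient = [i for i in range(n) if i not in recurrent]
    classes, seen = [], set()
    for i in recurrent:                      # group into communicating classes
        if i not in seen:
            cls = [j for j in recurrent if reach[i, j] and reach[j, i]]
            classes.append(cls)
            seen.update(cls)
    return transient, classes
```

Per Thm 2, a censored list wanders through some of `transient` and then enumerates exactly one of the returned `classes`.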

§ Consistency rate: how fast does it converge to the true parameter?

§ Structured estimation of P: Given a cluster structure, form a stochastic block matrix P. Or, learn the cluster structure and the parameters at the same time. ⟹ reduces the number of parameters
§ Allow repeats: create a "dongle twin" that is an absorbing state. ⟹ interpolates between "no repeat" and "repeats as in the standard random walk"
§ Incorporate timing information

[Figure: dongle-twin construction — each state gets an absorbing twin reached with probability ε]

KEY: Output the state only when visiting it for the first time

Verbal Fluency: A Human Memory Search Task

TASK: List examples of animals in 60 seconds without repetition

Related Work

§ Different categories possible: e.g., vehicles
§ A "generative" task where participants must remember past productions, inhibit these, and focus on the task.

§ Importance
1. Clinical application: different neurological syndromes have different patterns in lists (e.g., more repeats, fewer or irrelevant items) ⟹ important diagnostic information
2. Study of human memory search: responses are runs of semantically related items ⟹ reveal structure in semantic representation

§ Our focus is on the second application, so repeats are ignored, but can be allowed by a reduction (see future work).

§ a^(1), a^(2), … ~ INVITE(π*, P*)
§ (π^(m), P^(m)) = arg max_(π,P) Σ_{i≤m} log Pr(a^(i); π, P)
§ Question: Does { (π^(m), P^(m)) } converge to (π*, P*)?
§ A necessary condition: identifiability

§ Thm 4. INVITE is not identifiable (adjusting self-transitions does not change the model).

§ Thm 5. INVITE with π* > 0 (element-wise) and without self-transitions in P* is identifiable.

§ Challenge: a common strategy is to show uniform convergence of the log likelihood, which is not true in INVITE MLE.

§ Solution: show local uniform convergence
§ uniformly convergent in an intersection of a max-norm ball and a subspace of "equivalent chain decomposition" around the true parameter (π*, P*).

§ Thm 6. If π* > 0 (element-wise), INVITE MLE is consistent.

Experiment: Toy

Future Work

References

§ Goal: (1) confirm the consistency result (2) compare with baselines
§ Baselines

§ Naïve Random Walk (RW)

§ FirstEdge (FE) [Abrahao13]

§ Define P* as the uniform transition matrix on toy undirected graphs (# of nodes = 25)

§ Use INVITE(P*) to generate censored lists
§ Evaluation: [plots omitted] on three toy graphs: Ring, Star, Grid
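To reproduce the toy data-generation step in spirit (a sketch, not the authors' code; `sample_censored_list` is a hypothetical name), one can simulate the walk and emit first visits only:

```python
import numpy as np

def sample_censored_list(P, pi, seed=None, max_steps=100_000):
    """Draw one INVITE censored list: run the walk x_1 ~ pi,
    x_{t+1} ~ P(. | x_t), and output a state only on its first visit."""
    rng = np.random.default_rng(seed)
    n = P.shape[0]
    x = rng.choice(n, p=pi)
    out = [x]
    seen = {x}
    for _ in range(max_steps):               # practical cutoff
        if len(seen) == n:                   # all states emitted
            break
        x = rng.choice(n, p=P[x])
        if x not in seen:
            seen.add(x)
            out.append(x)
    return out
```

On an irreducible chain such as the ring, the censored list is a full permutation with probability approaching 1 as `max_steps` grows, consistent with Thm 2.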

Experiment: Verbal Fluency