Human Memory Search as Initial-Visit Emitting Random Walk



§ Random spanning tree algorithm [Broder89]

§ Self-avoiding Random Walk [Flory53]
§ Cascade Model [Gomez-Rodriguez10] for disease / information spread

§ Why hard? A naïve way requires an infinite sum.

- x = (x_1, x_2, …): uncensored, hidden
- a = (a_1, …, a_M): censored, observed

- E.g., a = (1, 2, 3, 4)
§ Key Quantity: Pr(a_{t+1} | a_1, …, a_t, P)
§ Solution: turn P into an absorbing random walk!

- Specifically, turn unvisited states into absorbing states
- Q: from visited state to visited state
- R: from visited state to unvisited state

§ Thm 3. [Doyle84] Let B = (I − Q)^{-1} R. B_{ik} is the probability of a chain starting from i being absorbed by k.
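Thm 3 is what collapses the infinite sum: walks may wander among the visited states for any number of steps before first exiting, and that geometric series is exactly the matrix inverse. A sketch of the identity, in the notation above:

```latex
\mathbf{B}
\;=\; \sum_{k=0}^{\infty} \mathbf{Q}^{k}\mathbf{R}
\;=\; (\mathbf{I}-\mathbf{Q})^{-1}\mathbf{R},
\qquad
\mathbf{B}_{ik} \;=\; \Pr(\text{walk from visited state } i \text{ is absorbed by unvisited state } k).
```

The k-th term Q^k R accounts for walks that take exactly k steps inside the visited set before stepping to an unvisited (absorbing) state.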

§ Cor 1. Assume P is arranged as above. Then, Pr(a_{t+1} | a_1, …, a_t, P) = ((I − Q)^{-1} R)_{a_t, a_{t+1}}.

* E.g., a = (1, 2, 3, 4, 5). Compute Pr(a_4 = 4 | a_1 = 1, a_2 = 2, a_3 = 3, P) = ((I − Q)^{-1} R)_{34}.
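As a concrete sketch of Cor 1 (not the authors' code; `invite_step_prob` is a hypothetical name, numpy assumed), the step probability is computed by slicing P into Q and R and solving one linear system:

```python
import numpy as np

def invite_step_prob(P, visited, target):
    """Pr(next first-visit state = target | visited so far, P).

    `visited` is the censored prefix (a_1, ..., a_t) as 0-indexed states,
    in emission order; the walk currently sits at visited[-1].
    Unvisited states are treated as absorbing, per [Doyle84].
    """
    n = P.shape[0]
    unvisited = [s for s in range(n) if s not in visited]
    Q = P[np.ix_(visited, visited)]    # visited -> visited
    R = P[np.ix_(visited, unvisited)]  # visited -> unvisited (absorbing)
    B = np.linalg.solve(np.eye(len(visited)) - Q, R)  # (I - Q)^{-1} R
    return B[len(visited) - 1, unvisited.index(target)]
```

For the example a = (1, 2, 3, 4, 5) with 0-indexed states, Pr(a_4 = 4 | a_1, a_2, a_3, P) is `invite_step_prob(P, [0, 1, 2], 3)`.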

§ n states, π : initial distribution, P: Markov chain (row stochastic)

§ The random walk runs indefinitely.
§ A censored list is not Markovian anymore.
§ GOOD NEWS: captures important human behavior in a cognitive task (see below) [Abbott12]
§ BAD NEWS: Parameter Estimation is HARD!

Main Contribution
1. First tractable method for the maximum likelihood estimate (MLE) of INVITE
2. Consistency of the INVITE MLE

INitial-VIsiT Emitting (INVITE) Random Walk

§ Data

§ Result: negative log likelihood on holdout set (smaller is better)

[Abbott12] J. T. Abbott, J. L. Austerweil, and T. L. Griffiths, “Human memory search as a random walk in a semantic network,” in NIPS, 2012, pp. 3050–3058.

[Abrahao13] B. D. Abrahao, F. Chierichetti, R. Kleinberg, and A. Panconesi, “Trace complexity of network inference.” CoRR, vol. abs/1308.2954, 2013.

[Broder89] A. Z. Broder, “Generating random spanning trees,” in FOCS. IEEE Computer Society, 1989, pp. 442–447.

[Doyle84] P. G. Doyle and J. L. Snell, Random Walks and Electric Networks. Washington, DC: Mathematical Association of America, 1984.

[Durrett12] R. Durrett, Essentials of stochastic processes, 2nd ed., ser. Springer texts in statistics. New York: Springer, 2012.

[Flory53] P. Flory, Principles of Polymer Chemistry. Cornell University Press, 1953.

[Gomez-Rodriguez10] M. Gomez Rodriguez, J. Leskovec, and A. Krause, "Inferring networks of diffusion and influence," Max-Planck-Gesellschaft. New York, NY, USA: ACM Press, July 2010, pp. 1019–1028.

Kwang-Sung Jun∗ ([email protected]), Xiaojin Zhu†, Timothy Rogers‡, Zhuoran Yangξ, Ming Yuanζ
∗Wisconsin Institute for Discovery, Department of †Computer Sciences, ‡Psychology, ζStatistics, University of Wisconsin-Madison; ξDepartment of Mathematical Sciences, Tsinghua University

Human Memory Search as Initial-Visit Emitting Random Walk

Censored Lists Generated by INVITE

Consistency of INVITE MLE

Order  Item
1      cow
2      horse
3      chicken
4      bear
5      lion
6      tiger
7      porcupine
8      rat
9      mouse
10     duck
11     goose
12     …

[Figure: semantic network over animals — cow, horse, chicken, elephant, bear, lion, tiger, porcupine, squirrel, rat, mouse, rabbit]

x_1 ~ π, x_{t+1} ~ P(· | x_t)
Walk: x_1 = cow, x_2 = horse, x_3 = chicken, x_4 = cow, x_5 = bear, x_6 = lion, x_7 = bear, x_8 = tiger
"Censored List": a_1 = cow, a_2 = horse, a_3 = chicken, a_4 = bear, a_5 = lion, a_6 = tiger

§ A censored list is a permutation of n items or a prefix of it.

§ Does it produce every permutation? Or every prefix?

Transient state: a state that has nonzero probability of not coming back to itself in finite time
Recurrent state: a state that is not transient
A is closed if i ∈ A and j ∉ A implies a walk from i cannot reach j.
B is irreducible if for all i, j ∈ B, a walk from i can reach j.

Computing INVITE Likelihood

Pr(a_4 = 4 | a_1 = 1, a_2 = 2, a_3 = 3, P) = ((I − Q)^{-1} R)_{34}

[Figure: a 5-state chain P with visited states {1, 2, 3} and unvisited states {4, 5}; P′ turns the unvisited states into absorbing states [Doyle84]]

Parameter Estimation: Regularized MLE

§ Data: censored lists
§ Relaxation: assume that the underlying random walk terminates after a finite number of steps.
§ Initial distribution π: MAP estimation (easy)
§ Transition matrix P is constrained (nonnegative, rows sum to 1)
- Easier: unconstrained parameterization
- β_ii := −∞ to disallow self-transitions.

§ Optimization problem:

§ We run LBFGS.
§ For larger datasets, we run averaged stochastic gradient descent.
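A minimal sketch of the unconstrained parameterization (assuming a row-wise softmax, which the β_ii := −∞ trick suggests; `beta_to_P` is a hypothetical helper, not the authors' code):

```python
import numpy as np

def beta_to_P(beta):
    """Map an unconstrained matrix beta to a row-stochastic P with
    zero diagonal: P_ij proportional to exp(beta_ij), beta_ii = -inf."""
    B = np.array(beta, dtype=float)
    np.fill_diagonal(B, -np.inf)             # disallow self-transitions
    B -= B.max(axis=1, keepdims=True)        # row-wise shift for stable exp
    E = np.exp(B)                            # exp(-inf) = 0 on the diagonal
    return E / E.sum(axis=1, keepdims=True)
```

Optimizing over beta (e.g., with LBFGS) then automatically respects the nonnegativity and sum-to-one constraints on P.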

§ Thm 1. [Durrett12] A finite set of states S can be uniquely decomposed as S = T ∪ W_1 ∪ … ∪ W_K, where T is the set of transient states (possibly empty) and each W_k is a nonempty closed irreducible set of recurrent states.

§ Thm 2. Given P, let S = T ∪ W_1 ∪ … ∪ W_K be the decomposition by Thm 1. A censored list a generated by INVITE on P has zero or more transient states (in T), followed by all states in one and only one closed irreducible set (W_k for some k).
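The decomposition in Thm 1 depends only on the support of P, so it can be computed by reachability alone; a sketch (`chain_decomposition` is an illustrative name, not from the poster):

```python
import numpy as np

def chain_decomposition(P):
    """Split states into the transient set T and the closed irreducible
    recurrent classes W_1, ..., W_K (cf. Thm 1)."""
    n = P.shape[0]
    reach = P > 0
    np.fill_diagonal(reach, True)            # every state reaches itself
    for k in range(n):                       # Floyd-Warshall transitive closure
        reach = reach | np.outer(reach[:, k], reach[k, :])
    # i is recurrent iff everything it can reach can reach it back
    recurrent = [i for i in range(n)
                 if all(reach[j, i] for j in range(n) if reach[i, j])]
    transient = [i for i in range(n) if i not in recurrent]
    classes, seen = [], set()
    for i in recurrent:                      # group into communicating classes
        if i not in seen:
            cls = [j for j in recurrent if reach[i, j] and reach[j, i]]
            classes.append(cls)
            seen.update(cls)
    return transient, classes
```

Per Thm 2, a censored list wanders through some of `transient` and then enumerates exactly one of the returned `classes`.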

§ Consistency rate: how fast does it converge to the true parameter?

§ Structured estimation of P: Given a cluster structure, form a stochastic block matrix P. Or, learn the cluster structure and the parameters at the same time. ⟹ reduces the number of parameters
§ Allow repeats: create a "dongle twin" that is an absorbing state. ⟹ interpolates between "no repeat" and "repeats as in the standard random walk"
§ Incorporate timing information

[Figure: dongle-twin construction — each state gets an absorbing twin reached with probability ε]

KEY: Output the state only when visiting it for the first time

Verbal Fluency: A Human Memory Search Task

TASK: List examples of animals in 60 seconds without repetition

Related Work

§ Different categories possible: e.g., vehicles
§ A "generative" task where participants must remember past productions, inhibit these, and focus on the task.

§ Importance
1. Clinical application: different neurological syndromes have different patterns in lists (e.g., more repeats, fewer or irrelevant items) ⟹ important diagnostic information
2. Study of human memory search: responses are runs of semantically related items ⟹ reveal structure in semantic representation

§ Our focus is on the second application, so repeats are ignored, but can be allowed by a reduction (see future work).

§ a^(1), a^(2), … ~ INVITE(π*, P*)
§ (π^(m), P^(m)) = arg max_(π,P) Σ_{i≤m} log Pr(a^(i); π, P)
§ Question: Does { (π^(m), P^(m)) } converge to (π*, P*)?
§ A necessary condition: identifiability

§ Thm 4. INVITE is not identifiable (adjusting self-transitions does not change the model).

§ Thm 5. INVITE with π* > 0 (element-wise) and without self-transitions in P* is identifiable.

§ Challenge: a common strategy is to show uniform convergence of the log likelihood, which is not true in INVITE MLE.

§ Solution: show local uniform convergence
§ uniformly convergent in an intersection of a max-norm ball and a subspace of "equivalent chain decomposition" around the true parameter (π*, P*).

§ Thm 6. If π* > 0 (element-wise), INVITE MLE is consistent.

Experiment: Toy

Future Work

References

§ Goal: (1) confirm the consistency result (2) compare with baselines
§ Baselines

§ Naïve Random Walk (RW)

§ FirstEdge (FE) [Abrahao13]

§ Define P* as the uniform transition matrix on toy undirected graphs (# of nodes = 25)

§ Use INVITE(P*) to generate censored lists
§ Evaluation: [plots omitted] on three toy graphs: Ring, Star, Grid
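To reproduce the toy data-generation step in spirit (a sketch, not the authors' code; `sample_censored_list` is a hypothetical name), one can simulate the walk and emit first visits only:

```python
import numpy as np

def sample_censored_list(P, pi, seed=None, max_steps=100_000):
    """Draw one INVITE censored list: run the walk x_1 ~ pi,
    x_{t+1} ~ P(. | x_t), and output a state only on its first visit."""
    rng = np.random.default_rng(seed)
    n = P.shape[0]
    x = rng.choice(n, p=pi)
    out = [x]
    seen = {x}
    for _ in range(max_steps):               # practical cutoff
        if len(seen) == n:                   # all states emitted
            break
        x = rng.choice(n, p=P[x])
        if x not in seen:
            seen.add(x)
            out.append(x)
    return out
```

On an irreducible chain such as the ring, the censored list is a full permutation with probability approaching 1 as `max_steps` grows, consistent with Thm 2.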

Experiment: Verbal Fluency