human memory search as initial-visit emitting...
TRANSCRIPT
RESEARCH POSTER PRESENTATION DESIGN © 2015
www.PosterPresentations.com
§ Random spanning tree algorithm [Broder89]
§ Self-avoiding Random Walk [Flory53]§ Cascade Model [Gomez-Rodriguez10] for
disease / information spread§ Why hard? A naïve way requires an infinite sum.
- 𝒙 = (𝑥1, 𝑥2, … ): uncensored, hidden- 𝒂 = (𝑎1, … , 𝑎𝑀): censored, observed
- E.g., 𝒂 = (1, 2, 3, 4)§ Key Quantity: § Solution: turn P into an absorbing random walk!
- Specifically, turn unvisited states into absorbing states- 𝐐: from visited state to visited state- 𝐑: from visited state to unvisited state
§ Thm 3. [Doyle84] Let 𝐁 = 𝐈– 𝐐 3𝟏𝐑. 𝐁56 is the probability of a chain starting from 𝑖 being absorbed by 𝑘.
§ Cor 1. Assume 𝐏 is arranged as above. Then,
* E.g., 𝒂 = (1,2,3,4,5). Compute Pr(𝑎4 = 4|𝑎1 = 1, 𝑎2 = 2, 𝑎3 = 3,𝐏).
§ n states, π : initial distribution, P: Markov chain (row stochastic)
§ The random walk runs indefinitely.§ A censored list is not Markovian anymore.§ GOOD NEWS: captures important human behavior in a cognitive task
(see below) [Abbott12]§ BAD NEWS: Parameter Estimation is HARD!
Main Contribution1. First tractable method for the maximum likelihood
estimate (MLE) of INVITE2. Consistency of the INVITE MLE
INitial-VIsiT Emitting (INVITE) Random Walk§ Data
§ Result: negative log likelihood on holdout set (smaller is better)
[Abbott12] J. T. Abbott, J. L. Austerweil, and T. L. Griffiths, “Human memory search as a random walk in a semantic network,” in NIPS, 2012, pp. 3050–3058.
[Abrahao13] B. D. Abrahao, F. Chierichetti, R. Kleinberg, and A. Panconesi, “Trace complexity of network inference.” CoRR, vol. abs/1308.2954, 2013.
[Broder89] A. Z. Broder, “Generating random spanning trees,” in FOCS. IEEE Computer Society, 1989, pp. 442–447.
[Doyle84] P. G. Doyle and J. L. Snell, Random Walks and Electric Networks. Washington, DC: Mathe- matical Association of America, 1984.
[Durrett12] R. Durrett, Essentials of stochastic processes, 2nd ed., ser. Springer texts in statistics. New York: Springer, 2012.
[Flory53] P. Flory, Principles of polymer chemistry. Cornell University Press, 1953. [Gomez-Rodriguez10] M. Gomez Rodriguez, J. Leskovec, and A. Krause, “Inferring networks of
diffusion and influ- ence,” Max-Planck-Gesellschaft. New York, NY, USA: ACM Press, July 2010, pp. 1019– 1028.
Kwang-Sung Jun∗ ([email protected]), Xiaojin Zhu†, Timothy Rogers‡, Zhuoran Yangξ, Ming Yuanζ∗Wisconsin Institute for Discovery, Department of †Computer Sciences, ‡Psychology, ζStatistics, University of Wisconsin-Madison, ξDepartment of Mathematical Sciences, Tsinghua University
Human Memory Search as Initial-Visit Emitting Random Walk
Censored Lists Generated by INVITE ConsistencyofINIVITEMLE
Order Item1 cow2 horse3 chicken4 bear5 lion6 tiger7 porcupine8 rat9 mouse10 duck11 goose12 …
cow
horsechicken
elephant bear
liontiger
porcupine
squirrel
ratmouse
rabbit
x1 ~πxt+1 ~P(.|xt)
a outputa1 cowa2 horsea3 chickena4 beara5 liona6 tiger
…
x1=cowx2=horsex3 =chickenx4 =cowx5 =bearx6 =lionx7 =bearx8 =tiger
“Censored List”
§ A censored list is a permutation of 𝑛 items or a prefixof it.
§ Does it produce every permutation? Or every prefix?
Transient state: a state that has nonzero probability of not coming back to itself in finite time
Recurrent state: a state that is not transientA is closed if 𝑖 ∈ 𝐴 and 𝑗 ∉ 𝐴 implies a walk from 𝑖 cannot
reach j.B is irreducible if for all 𝑖, 𝑗 ∈ 𝐵, a walk from 𝑖 can reach 𝑗.
ComputingINVITELikelihood
Pr(a4 =4|a1=1,a2=2,a3=3,P)
2 3
1
4 5
P P’2 3
1
4 5
= ((I- Q)-1R)34
unvisitedstatesè absorbing states
[Doyle84]
4,51,2,31,2,3
4,5
ParameterEstimation:RegularizedMLE§ Data: § Relaxation: assume that the underlying random walk terminates after
finite number of steps.§ Initial distribution π: MAP estimation (easy)§ Transition Matrix P is constrained (nonnegative, sum to 1)
- Easier: unconstrained parameterization- 𝛽55 ∶= −∞ to disallow self-transitions.
§ Optimization problem:
§ We run LBFGS.§ For larger dataset, we run averaged stochastic gradient descent.
§ Thm 1. [Durrett12] A finite set of states 𝑆 can be uniquely decomposed as 𝑆 = 𝑇 ∪ 𝑊M ∪…∪𝑊N , where 𝑇 is the set of transient states (possibly empty) 𝑊6 is a nonempty closed irreducible set of recurrent states.
§ Thm 2. Given P, let 𝑆 = 𝑇 ∪𝑊M ∪…∪𝑊N be the decomposition by Thm 1. A censored list a generated by INVITE on 𝐏 has zero or more transient states (in 𝑇), followed by all states in one and only one closed irreducible set (𝑊6 for some 𝑘)
§ Consistency rate: how fast does it converge to the true parameter?
§ Structured estimation of P: Given a cluster structure, form a stochastic block matrix P. Or, learn the cluster structure and the parameters at the same time.⟹ reduces the number of parameters
§ Allow repeats: create “dongle twin” that is an absorbing state.⟹ interpolates between “no repeat” and
“repeats as in the standard random walk”§ Incorporate timing information
2 3
1
4 5
1
2 3’
’
’
εε
ε
KEY: Output the state only when visiting it for the first time
Verbal Fluency: A Human Memory Search Task
TASK: List examples of animals in 60 seconds without repetition
Related Work
§ different categories possible: e.g., vehicles§ A “generative” task where participants must
remember past productions, inhibit these,and focus on the task.
§ Importance1. Clinical application: different neurological
syndromes have different patterns in lists (e.g. repeats more, less/irrelevant items)⟹ important diagnostic information
2. Study of human memory search: responses are runs of semantically related items.⟹ reveal structure in semantic representation
§ Our focus is on the second application, so repeats are ignored, but can be allowed by a reduction (see future work).
§ a(1), a(2), … ~ INVITE(π*,P*)§ (π(m), P(m)) = arg max(π,P) Σ i ≤m log Pr(a(i); π,P)§ Question: Does { (π(m), P(m)) } converges to (π*,P*) ?§ A necessary condition: identifiability
§ Thm 4. INVITE is not identifiable (adjusting self-transitions does not change the model).
§ Thm 5. INVITE with π * > 0 (element-wise) and without self-transitions in P * is identifiable.
§ Challenge: a common strategy is to show uniform convergence of the log likelihood, which is not true in INVITE MLE.
§ Solution: show the local uniform convergence§ uniformly convergent in an intersection of a max-norm ball and a
subspace of “equivalent chain decomposition” around the true parameter (π*,P*).
§ Thm 6. If π* > 0 (element-wise), INVITE MLE is consistent.
Experiment:Toy
FutureWork
References
§ Goal: (1) confirm the consistency result (2) compare with baselines§ Baselines
§ Naïve Random Walk (RW)
§ FirstEdge (FE) [Abrahao13]
§ Define P* as the uniform transition matrix on toy undirected graphs (# of nodes = 25)
§ Use INVITE(P*) to generate censored lists§ Evaluation: Ring Star Grid
Experiment:VerbalFluencyINitial-VIsiT Emitting (INVITE) Random Walk