Mini-course on Artificial Neural Networks and Bayesian Networks
Michal Rosen-Zvi
Mini-course on ANN and BN, The Multidisciplinary Brain Research center, Bar-Ilan University, May 2004
Section 1: Introduction
Networks (1)
Networks serve as a visual way of displaying relationships:
Social networks are examples of "flat" networks, where the only information is the relation between entities
Example: collaboration network
1. Itay Gat, Naftali Tishby, and Moshe Abeles, "Analyzing Cortical Activity using Hidden Markov Models", Network: Computation in Neural Systems, August 1997.
2. Moshe Abeles, Hagai Bergman, Itay Gat, Isaac Meilijson, Eyal Seidemann, Naftali Tishby, and Eilon Vaadia, "Cortical Activity Flips Among Quasi-Stationary States", PNAS, 1995.
3. David Haussler, Michael Kearns, H. Sebastian Seung, and Naftali Tishby, "Rigorous Learning Curve Bounds from Statistical Mechanics", Machine Learning, 1997.
4. H. S. Seung, Haim Sompolinsky, and Naftali Tishby, "Learning Curves in Large Neural Networks", COLT 1991: 112-127.
5. Yann LeCun, Ido Kanter, and Sara A. Solla, "Second Order Properties of Error Surfaces", NIPS 1990: 918-924.
6. Esther Levin, Naftali Tishby, and Sara A. Solla, "A Statistical Approach to Learning and Generalization in Layered Neural Networks", COLT 1989: 245-260.
7. V. Litvak, H. Sompolinsky, I. Segev, and M. Abeles, "On the Transmission of Rate Code in Long Feedforward Networks with Excitatory-Inhibitory Balance", Journal of Neuroscience, 23(7):3006-3015, 2003.
8. W. Senn, I. Segev, and M. Tsodyks, "Reading neural synchrony with depressing synapses", Neural Computation 10: 815-819, 1998.
9. M. Tsodyks, I. Mit'kov, and H. Sompolinsky, "Pattern of synchrony in inhomogeneous networks of oscillators with pulse interactions", Phys. Rev. Lett., 1993.
10. Yuval Aviel, David Horn, and Moshe Abeles, "Memory Capacity of Balanced Networks".
11. Ofer Hendin, David Horn, and Misha Tsodyks, "The Role of Inhibition in an Associative Memory Model of the Olfactory Bulb".
12. Gal Chechik, Amir Globerson, Naftali Tishby, and Yair Weiss, "Information Bottleneck for Gaussian Variables", submitted to NIPS 2003.
[matlab]
Networks (2)
Artificial Neural Networks represent rules, i.e. deterministic relations, between input and output
Networks (3)
Bayesian Networks represent probabilistic relations - conditional independencies and dependencies between variables
Outline
Introduction/Motivation
Artificial Neural Networks
- The Perceptron, multilayered feed-forward NNs and recurrent NNs
- On-line (supervised) learning
- Unsupervised learning and PCA
- Classification
- Capacity of networks
Bayesian networks (BN)
- Bayes rule and the BN semantics
- Classification using generative models
Applications: Vision, Text
Motivation
The research of ANNs is inspired by neurons in the brain and (partially) driven by the need for models of reasoning in the brain.
Scientists are challenged to use machines more effectively for tasks traditionally solved by humans (examples: driving a car, assigning scientific referees to papers, and many others)
Questions
- How can a network learn? What will be the learning rate?
- What are the limitations on the network capacity?
- How can networks be used to classify results with no labels (unsupervised learning)?
- What are the relations and differences between learning in ANNs and learning in BNs?
- How can network models explain high-level reasoning?
History of (modern) ANNs and BNs
Timeline, 1940-2000:
- 1943: McCulloch and Pitts model
- 1949: Hebbian learning rule
- 1958: Perceptron
- 1969: Minsky and Papert's book
- 1982: Hopfield network
- 1980s: statistical physics of learning
- 1988: Pearl's book
Section 2: On-line Learning
Based on slides from Michael Biehl's summer course
Section 2.1: The Perceptron
The Perceptron
Input: ξ ∈ R^N
Adaptive weights: J ∈ R^N
Output: S
Perceptron: binary output
Implements a linearly separable classification of inputs
Milestones:
- Perceptron convergence theorem, Rosenblatt (1958)
- Capacity: Winder (1963), Cover (1965)
- Statistical physics of perceptron weights, Gardner (1988)
How does this device learn?
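Before turning to learning, the device itself is a one-liner. A minimal sketch (Python/NumPy stands in for the MATLAB demos referenced elsewhere in the course; the weight and input values are made up for illustration):

```python
import numpy as np

def perceptron_output(J, xi):
    # Binary perceptron: S = sign(J . xi)
    return np.sign(J @ xi)

# Toy 3-dimensional example with arbitrary weights
J = np.array([0.5, -1.0, 0.25])
xi = np.array([1.0, 1.0, 1.0])
S = perceptron_output(J, xi)
print(S)  # sign(0.5 - 1.0 + 0.25) = -1.0
```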
Learning a linearly separable rule from reliable examples
Unknown rule: S_T(ξ) = sign(B·ξ) = ±1
Defines the correct classification.
Parameterized through a teacher perceptron with weights B ∈ R^N (B·B = 1)
Only available information: the example data
D = { ξ^μ, S_T(ξ^μ) = sign(B·ξ^μ) }, μ = 1…P
Learning a linearlyโฆ (Cont.)
Training: finding the student weights J
- J parameterizes a hypothesis S_S(ξ) = sign(J·ξ)
- Supervised learning is based on the student's performance with respect to the training data D
- Binary error measure:
  ε_T(J) = 1 if S_S(ξ) ≠ S_T(ξ)
  ε_T(J) = 0 if S_S(ξ) = S_T(ξ)
Off-line learning
Guided by the minimization of a cost function H(J), e.g., the training error

H(J) = Σ_{μ=1}^P ε_T^μ(J)

Equilibrium statistical mechanics treatment:
- Energy H of N degrees of freedom
- An ensemble of systems in thermal equilibrium at a formal temperature
- Disorder average over random examples (replicas) assumes a distribution over the inputs
- Macroscopic description in terms of order parameters
- Typical properties of large systems, P = αN
On-line training
Single presentation of an uncorrelated (new) example {ξ^μ, S_T(ξ^μ)}
Update of student weights:
Learning dynamics in discrete time
On-line training - Statistical Physics approach
- Consider a sequence of independent, random inputs ξ^μ
- Thermodynamic limit N → ∞
- Disorder average over the latest example; self-averaging properties
- Continuous time limit
Generalization
Performance of the student (after training) with respect to an arbitrary, new input
In practice: the empirical mean of the error measure over a set of test inputs
In the theoretical analysis: average over the (assumed) probability density of inputs
Generalization error:
Generalization (cont.)
The simplest model distribution:
Isotropic density P(ξ), uncorrelated with B and J
Consider vectors ξ of independent, identically distributed (iid) components ξ_j with ⟨ξ_j⟩ = 0 and ⟨ξ_j²⟩ = 1
Geometric argument
Projection of the data onto the (B, J)-plane yields an isotropic density of inputs.

Figure: B and J span an angle θ; inputs on which teacher and student disagree, S_T(ξ) ≠ S_S(ξ), fall in the wedge between the two decision boundaries, so that for |B| = 1

ε_g = θ/π
Overlap Parameters
Sufficient to quantify the success of learning
R = B·J,  Q = J·J

Random guessing: R = 0, ε_g = 1/2
Perfect generalization: R = √Q, ε_g = 0
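The two limiting cases follow from the geometric formula ε_g = (1/π) arccos(R/√Q). A minimal Python sketch (in place of the MATLAB demos used in the course):

```python
import numpy as np

def generalization_error(R, Q):
    # eps_g = (1/pi) * arccos(R / sqrt(Q)), for a normalized teacher |B| = 1
    return np.arccos(R / np.sqrt(Q)) / np.pi

print(generalization_error(0.0, 1.0))  # random guessing: 0.5
print(generalization_error(1.0, 1.0))  # perfect generalization: 0.0
```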
Derivation for large N
Given B, J, and uncorrelated random inputs with ⟨ξ_i⟩ = 0, ⟨ξ_i ξ_j⟩ = δ_ij, consider the student and teacher fields, each a sum of (many) independent random quantities:

x = J·ξ = Σ_i J_i ξ_i
y = B·ξ = Σ_i B_i ξ_i
Central Limit Theorem
Joint density of (x, y) is, for N → ∞, a two-dimensional Gaussian, fully specified by the first and second moments:

⟨x⟩ = Σ_i J_i ⟨ξ_i⟩ = 0,  ⟨y⟩ = Σ_i B_i ⟨ξ_i⟩ = 0
⟨x²⟩ = Σ_ij J_i J_j ⟨ξ_i ξ_j⟩ = Σ_i J_i² = Q
⟨y²⟩ = Σ_ij B_i B_j ⟨ξ_i ξ_j⟩ = Σ_i B_i² = 1
⟨xy⟩ = Σ_ij J_i B_j ⟨ξ_i ξ_j⟩ = Σ_i J_i B_i = R
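These moments can be checked empirically. A sketch (Python/NumPy; teacher and student vectors are generated at random, with the student deliberately given a sizeable overlap so that R is well away from zero):

```python
import numpy as np

rng = np.random.default_rng(0)
N, M = 500, 20000

B = rng.normal(size=N)
B /= np.linalg.norm(B)             # teacher with |B| = 1
J = 5.0 * B + rng.normal(size=N)   # student with a sizeable overlap R = B.J
Q, R = J @ J, B @ J

# Binary inputs xi_i = +/-1: zero mean, unit variance, <xi_i xi_j> = delta_ij
xis = rng.choice([-1.0, 1.0], size=(M, N))
x = xis @ J                        # student field
y = xis @ B                        # teacher field

print(np.var(x) / Q)               # approx 1: <x^2> = Q
print(np.var(y))                   # approx 1: <y^2> = 1
print(np.mean(x * y) - R)          # approx 0: <xy> = R
```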
Central Limit Theorem (Cont.)
Details of the input distribution are irrelevant.
Some possible examples: binary, ξ_i = ±1 with equal probability; uniform; Gaussian.
Generalization Error
The isotropic distribution is also assumed to describe the statistics of the example data inputs
Exercise: Derive the generalization error as a function of R and Q [use Mathematical notes]
Assumptions about the data
- No spatial correlations
- No distinguished directions in the input space
- No temporal correlations
- No correlations with the rule
- Single presentation without repetitions
Consequences:
- The average over the data can be performed step by step
- The actual choice of B is irrelevant; it is not necessary to average over the teacher
Hebbian learning (revisited) Hebb 1949
Off-line interpretation Vallet 1989
Choice of student weights given D = {ξ^μ, S_T^μ}, μ = 1…P:

J(P) = (1/N) Σ_{μ=1}^P ξ^μ S_T^μ

Equivalent on-line interpretation
Dynamics upon single presentation of examples:

J(μ) = J(μ−1) + ξ^μ S_T^μ / N
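The on-line rule is easy to simulate in a teacher-student scenario. A sketch (Python/NumPy in place of the course's MATLAB code; the values quoted in the final comment are what the flow equations for R and Q predict, dR/dα = √(2/π) and dQ/dα = 2R√(2/π) + 1):

```python
import numpy as np

rng = np.random.default_rng(1)
N = 1000
alpha = 5.0                          # alpha = P / N

B = rng.normal(size=N)
B /= np.linalg.norm(B)               # teacher, |B| = 1
J = np.zeros(N)                      # tabula rasa: R(0) = Q(0) = 0

for _ in range(int(alpha * N)):
    xi = rng.choice([-1.0, 1.0], size=N)
    J += xi * np.sign(B @ xi) / N    # on-line Hebb update

R, Q = B @ J, J @ J
eps_g = np.arccos(R / np.sqrt(Q)) / np.pi
print(R, Q, eps_g)
# theory at alpha = 5: R = alpha*sqrt(2/pi) ~ 3.99, Q = alpha + alpha^2*(2/pi) ~ 20.9
```

Because R and Q are self-averaging, a single run at large N already lands close to the theoretical values.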
Hebb: on-line
From microscopic to macroscopic: recursions for overlaps
Exercise: Derive the update equations of R,Q
Hebb: on-line (Cont.)
Average over the latest example:
- The random input ξ^μ enters only through the fields x and y
- The random input and J(μ−1), B are statistically independent
- The Central Limit Theorem applies and gives the joint density of the fields
Hebb: on-line (Cont.)
Exercise: Derive the update equations of R and Q as a function of α [use Mathematical notes; off-line]
Hebb: on-line (Cont.)
Continuous time limit: N → ∞, α = μ/N, dα = 1/N
Initial conditions (tabula rasa): R(0) = Q(0) = 0
What are the mean values after training with αN examples?
[See matlab code]
Hebb: on-line mean values
The order parameters Q and R are self-averaging for infinite N
Self-averaging property of a quantity A(J):
- The observation of a value of A different from its mean occurs with vanishing probability
Learning curve: ε_g in terms of the order parameters
Exercise: Solve the differential equations for R and Q
Exercise: Find the function ε_g(α)
Learning curve: ε_g in terms of the order parameters
The normalized overlap between the two vectors B and J provides the angle θ between them:

cos θ = R/√Q,  −1 ≤ cos θ ≤ 1,  ε_g = θ/π
Learning curve: ε_g in terms of the order parameters
Exercise: Find the asymptotic behavior of ε_g(α)
Asymptotic expansion [draw with matlab]
Questions:
What are other learning algorithms that can be used for efficient learning?
What training algorithm will provide the best learning/ the fastest asymptotic decrease?
Modified Hebbian learning
The training algorithm is defined by a modulation function f:

J(μ) = J(μ−1) + f(…) ξ^μ S_T^μ / N

Restriction: f may depend only on available quantities: f(J(μ−1), ξ^μ, S_T^μ)
Perceptron Rosenblatt 1959
If the classification is correct, don't change the weights.
If the classification is incorrect:
- if the right class for the example is +1, J(μ)·ξ^μ increases
- if the right class for the example is −1, J(μ)·ξ^μ decreases
Perceptron
- Only informative points are used (mistake driven)
- The solution is a linear combination of the training points
- Converges only for linearly separable data
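The mistake-driven character of the rule is visible in a teacher-student simulation. A sketch (Python/NumPy in place of MATLAB; the step size 1/N and the parameter values are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(2)
N = 500
B = rng.normal(size=N)
B /= np.linalg.norm(B)            # teacher
J = np.zeros(N)                   # student

mistakes = 0
for _ in range(20 * N):           # alpha = 20
    xi = rng.choice([-1.0, 1.0], size=N)
    S_T = np.sign(B @ xi)
    if np.sign(J @ xi) != S_T:    # mistake driven: update only on errors
        J += xi * S_T / N
        mistakes += 1

eps_g = np.arccos(B @ J / np.linalg.norm(J)) / np.pi
print(mistakes, eps_g)            # eps_g decreases with alpha
```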
Exercise: Derive the update equations of R and Q as a function of ξ, J and B
On-line dynamics Biehl and Riegler 1994
Questions:
Find the asymptotic behavior (by simulations and/or analytically) of the generalization error for the perceptron algorithm and the Hebb algorithm; which one is better?
What training algorithm will provide the best learning/ the fastest asymptotic decrease?
Learning Curve - Hebb and Perceptron
Section 2.2: On-line by gradient descent
Introduction
Commonly used in practical applications:
Multilayered neural networks with continuous activation functions, where the output is a differentiable function of the adaptive parameters
Can be used for fitting a function to data
Linear perceptron and linear regression (1D)
x = J·ξ
Using a quadratic loss function and gradient descent for finding the best curve to fit a data set [off-line]
Simple case: โLinear perceptronโ
Teacher: S_T(ξ) = y = B·ξ
Student: S_S(ξ) = x = J·ξ
Training and performance evaluation are based on the quadratic error
Consider the training dynamics
Exercise: Derive the update equations of R and Q as a function of α
โLinear perceptronโ (cont.)
Some exercises:
- Write a matlab code for the linear perceptron, teacher-student scenario
- Show that …
- Investigate the role of the learning rate
- Find the asymptotic decrease to zero errors
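A sketch of the first exercise (Python/NumPy in place of MATLAB; the learning rate and dimensions are illustrative choices). Gradient descent on the quadratic error ε = (x − y)²/2 gives the update ΔJ = (η/N)(y − x) ξ:

```python
import numpy as np

rng = np.random.default_rng(3)
N = 200
eta = 0.5                          # learning rate (illustrative choice)

B = rng.normal(size=N)
B /= np.linalg.norm(B)             # teacher
J = np.zeros(N)                    # student

for _ in range(50 * N):            # alpha = 50
    xi = rng.normal(size=N)        # Gaussian inputs
    y = B @ xi                     # teacher: linear output
    x = J @ xi                     # student: linear output
    J += eta * (y - x) * xi / N    # gradient step on (x - y)^2 / 2

print(np.linalg.norm(J - B))       # approaches 0: the rule is noiseless and learnable
```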
Adatron: binary output

J(μ) = J(μ−1) + f(…) ξ^μ S_T^μ / N

Some exercises:
- Write a matlab code for the Adatron, teacher-student scenario
- Find the asymptotic decrease to zero errors
- Compare with the performance of the Perceptron and Hebb rules
Multilayered feed-forward NN
Example architecture: the soft-committee machine
Multilayered ff NN (cont.)
Transfer function: sigmoidal g(x), e.g., g(x) = tanh(x) or g(x) = erf(x/√2)

The total output:

σ(ξ) = Σ_{i=1}^K v_i g(J_i·ξ)

The error function is defined as the quadratic deviation between the total outputs of student and teacher.
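The forward pass of the soft-committee machine is a one-liner. A sketch (Python/NumPy; the weight values are made up for illustration):

```python
import numpy as np

def soft_committee(Js, v, xi, g=np.tanh):
    # Total output: sigma = sum_i v_i * g(J_i . xi)
    return v @ g(Js @ xi)

K, N = 3, 5
Js = 0.1 * np.ones((K, N))   # hidden-unit weight vectors (illustrative values)
v = np.ones(K)               # hidden-to-output weights, here fixed to 1
xi = np.ones(N)
sigma = soft_committee(Js, v, xi)
print(sigma)  # 3 * tanh(0.5)
```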
Teacher-Student scenario
If teacher and student have the same architecture, but the student has K hidden units and the teacher has M hidden units,
can the student learn the rule?
K < M: unlearnable rule
K = M: learnable rule
K > M: over-learnable rule
In the following we will discuss matching architectures (K = M).
The error measure
One (obvious) choice for continuous outputs: the quadratic deviation ε = ½[σ(ξ) − τ(ξ)]², where σ and τ are the student and teacher outputs
On-line gradient descent
Assuming the same learning rate η over the whole network, the update equations for fixed, known {v_i} are:
Assumptions and definitions
- Isotropic, uncorrelated input data
- The number of input components N is huge
- The rule is specified by the norms of the teacher vectors, say all equal to 1
Order parameters role
The set of order parameters and weights is sufficient for describing the learning; this is the macroscopic set of parameters
Microscopic: KN+K degrees of freedom
Macroscopic: K(K-1)/2+KM+K different order parameters
Generalization error: an erf function of the order parameters (Saad & Solla 1995)
It reflects the symmetries of the soft committee machine
Permutation Symmetry
The generalization error is invariant under permutations of the hidden branches
How do you think this feature affects the learning performance?
A simple case
Hidden-to-output weights are fixed and known, v_i = 1
The update rule is
Update of the order parameters
Differential Equations
Learning curves
Section 3: Unsupervised learning
Based on slides from Michael Biehl's summer course
Introduction
Learning without a teacher!?
Real-world data is, in general, not isotropic and structureless in input space.
Unsupervised learning = extraction of information from unlabelled inputs
Potential aims
- Correlation analysis
- Clustering of data: grouping according to some similarity criterion
- Identification of prototypes: represent a large amount of data by a few examples
- Dimension reduction: represent high-dimensional data by a few relevant features
A simple example
Prototypes for high-dimensional data: directions in the space
Assume data points are distributed as
A simple example (cont)
The student's task is to find the directions B1 and B2. The data looks different in different planes!
Student scenario
Search for the two vectors using two student vectors:
- Define a set of possible learning rules
- Analyze learning abilities
- Compare and choose the best learning rule
This would provide the two principal components of the data.
PCA: General setting [matlab]
Given a set of data points X:
1. Compute the covariance matrix

   Σ = (1/N) Σ_{n=1}^N (x_n − x̄)(x_n − x̄)^T

2. Compute the eigenvalues and eigenvectors of the covariance matrix.
3. Arrange the eigenvalues from the biggest to the smallest; take the first d eigenvectors as principal components if the input dimensionality is to be reduced to d.
4. Project the input data onto the principal components; this forms the representation of the input data.
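The four steps above can be sketched directly (Python/NumPy in place of the MATLAB demo referenced on the previous slide; the toy data set is invented for illustration):

```python
import numpy as np

def pca(X, d):
    # Steps 1-4: covariance, eigendecomposition, sort, project
    Xc = X - X.mean(axis=0)
    C = Xc.T @ Xc / len(X)                   # covariance matrix
    vals, vecs = np.linalg.eigh(C)           # eigenvalues in ascending order
    W = vecs[:, np.argsort(vals)[::-1][:d]]  # top-d principal components
    return Xc @ W

# Toy data stretched along the first axis
rng = np.random.default_rng(4)
X = rng.normal(size=(500, 2)) * np.array([5.0, 0.5])
Z = pca(X, 1)
print(Z.shape, np.var(Z))  # (500, 1); variance close to 25, the long axis
```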
Principal Component Analysis
Algebraic viewpoint: given the data, find a linear transformation such that the sum of squared distances is minimized over all linear transformations.
Statistical viewpoint: given the data, assume each point is a random variable sampled from a Gaussian with unit covariance and some mean. Find the ML estimator of the means under the constraint that there are K different means that are linearly related to the data.
Example: vision
Example: vision (cont)
Average results for each of the 6400 pixels
First nine eigenfaces
Dimensionality Reduction
The goal is to compress information with minimal loss.
Methods:
- Unsupervised learning: Principal Component Analysis
- Nonnegative Matrix Factorization
- Bayesian models (the matrices are probabilities)
Section 4: Bayesian Networks
Some slides are from Baldi's course on Neural Networks
Bayesian Statistics
Bayesian framework for induction: we start with a hypothesis space and wish to express relative preferences in terms of background information (the Cox-Jaynes axioms).

Axiom 0: Transitivity of preferences.
Theorem 1: Preferences can be represented by a real number π(A).
Axiom 1: There exists a function f such that π(not A) = f(π(A)).
Axiom 2: There exists a function F such that π(A, B) = F(π(A), π(B|A)).
Theorem 2: There is always a rescaling w such that p(A) = w(π(A)) is in [0,1] and satisfies the sum and product rules.
Probability as Degree of Belief
Sum rule: P(not A) = 1 − P(A)
Product rule: P(A and B) = P(A) P(B|A)
Bayes theorem: P(B|A) = P(A|B) P(B) / P(A)
Induction form: P(M|D) = P(D|M) P(M) / P(D)
Equivalently: log P(M|D) = log P(D|M) + log P(M) − log P(D)
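The induction form can be checked numerically. A sketch with made-up probabilities (the diagnostic scenario and all numbers are invented for illustration):

```python
# Bayes theorem on a made-up diagnostic example:
# prior P(M) = 0.01, likelihoods P(D|M) = 0.9 and P(D|not M) = 0.05
p_m = 0.01
p_d_given_m = 0.9
p_d_given_not_m = 0.05

p_d = p_d_given_m * p_m + p_d_given_not_m * (1 - p_m)  # marginal, via sum and product rules
p_m_given_d = p_d_given_m * p_m / p_d                  # Bayes theorem
print(p_m_given_d)  # ~0.154: a weak prior keeps the posterior far below P(D|M) = 0.9
```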
The Asia problem
"Shortness-of-breath (dyspnoea) may be due to Tuberculosis, Lung cancer or Bronchitis, or none of them. A recent visit to Asia increases the chances of tuberculosis, while Smoking is known to be a risk factor for both lung cancer and Bronchitis. The results of a single chest X-ray do not discriminate between lung cancer and tuberculosis, as neither does the presence or absence of Dyspnoea."
Lauritzen & Spiegelhalter 1988
Graphical models
"Successful marriage between Probability Theory and Graph Theory"
M. I. Jordan
Undirected example: three nodes x1, x2, x3 with edges (x1, x3) and (x2, x3), so P(x1, x2, x3) factorizes into potentials over (x1, x3) and (x2, x3).
Applications: vision, speech recognition, error-correcting codes, bioinformatics
Directed acyclic Graphs
Involves conditional dependencies
Example: nodes x1 and x2 are parents of x3, so
P(x1, x2, x3) = P(x1) P(x2) P(x3|x1, x2)
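The factorization is easy to verify numerically; a small sketch with made-up conditional tables:

```python
# Check that the factorization P(x1) P(x2) P(x3 | x1, x2) defines a
# normalized joint over three binary variables. Table values are illustrative.
from itertools import product

p_x1 = {0: 0.6, 1: 0.4}
p_x2 = {0: 0.7, 1: 0.3}
# P(x3 = 1 | x1, x2); its complement gives P(x3 = 0 | x1, x2).
p_x3_is_1 = {(0, 0): 0.1, (0, 1): 0.5, (1, 0): 0.4, (1, 1): 0.9}

def joint(x1, x2, x3):
    p3 = p_x3_is_1[(x1, x2)] if x3 == 1 else 1.0 - p_x3_is_1[(x1, x2)]
    return p_x1[x1] * p_x2[x2] * p3

# Summing over all 8 assignments must give 1, whatever the table values are.
total = sum(joint(*xs) for xs in product([0, 1], repeat=3))
print(total)  # 1.0 up to floating-point rounding
```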
Directed Graphical Models (2)
Each node is associated with a random variable
Each arrow encodes a conditional dependency (parents → child)
A shaded node indicates an observed variable
Plates stand for repetitions of i.i.d. drawings of the random variables
Classification problem
This problem is "unsupervised": one searches for the labels that best fit the data, without any examples that contain labels.
Perceptrons and support vector machines are widely used for classification; these are discriminative methods.
Classification: assigning labels to data
Discrete classifier: models the boundaries between the different classes of the data; prediction of categorical output (e.g., SVM)
Density estimator: models the distribution of the data points themselves; generative models (e.g., Naïve Bayes)
Density estimator
The simplest model for density estimation is the Naรฏve Bayes classifier
Assumes that, given the class, the features of a data point are distributed independently: P(x|c) = P(x1|c) P(x2|c) ... P(xn|c)
Results in a trivial learning algorithm
Usually does not suffer from overfitting
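A minimal Naïve Bayes classifier over binary features can be sketched as follows; the toy data set is illustrative only:

```python
# Naive Bayes over binary features: given the class c, features are treated
# as independent, so P(x | c) = prod_j P(x_j | c). Toy data, illustrative only.

def train(X, y, alpha=1.0):
    """Estimate class priors and per-feature Bernoulli parameters,
    with Laplace smoothing alpha."""
    classes = sorted(set(y))
    n_features = len(X[0])
    prior = {c: y.count(c) / len(y) for c in classes}
    theta = {}
    for c in classes:
        rows = [x for x, label in zip(X, y) if label == c]
        theta[c] = [(sum(r[j] for r in rows) + alpha) / (len(rows) + 2 * alpha)
                    for j in range(n_features)]
    return prior, theta

def predict(x, prior, theta):
    """Return argmax_c P(c) * prod_j P(x_j | c)."""
    def score(c):
        p = prior[c]
        for j, xj in enumerate(x):
            p *= theta[c][j] if xj else 1.0 - theta[c][j]
        return p
    return max(prior, key=score)

# Four training points, two classes; "learning" is just counting.
X = [[1, 1, 0], [1, 0, 0], [0, 1, 1], [0, 0, 1]]
y = [0, 0, 1, 1]
prior, theta = train(X, y)
print(predict([1, 1, 0], prior, theta))  # 0
```

Training reduces to counting feature occurrences per class, which is why the slide calls the learning algorithm trivial.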
Directed graph: โreal worldโ example
The author-topic model
[Plate diagram: author assignment x, topic z, and observed word w, with the observed author list a_d; plates over N_d words, D documents, A authors, and T topics.]
Statistical modeling of data mining:
Huge corpus; authors and words are observed, topics and relations are learned.
Goal
Automatically extract topical content of documents and learn association of topics to authors of documents
Expand existing probabilistic topic models to include author information
Some queries that the model should be able to answer:
- What topics does author X work on?
- Which authors work on topic X?
- What are interesting temporal patterns in topics?
Previous topic-based models
Hofmann (1999): Probabilistic Latent Semantic Indexing (pLSI)
- EM implementation
- Problem of overfitting
Latent Dirichlet Allocation (LDA), Blei, Ng, & Jordan (2003):
- Clarified the pLSI model
- Variational EM; scalability?
Griffiths & Steyvers (PNAS 2004):
- Gibbs sampling technique for inference
- Computationally simple, efficient (linear in the size of the data), can easily be applied to >100K documents
Classification
Topics Model for Semantic Representation
Based on slides by Professor Mark Steyvers; joint work of Mark Steyvers (UCI) and Tom Griffiths (Stanford)
The DRM Paradigm
The Deese (1959) / Roediger and McDermott (1995) paradigm:
Subjects hear a series of word lists during the study phase, each comprising semantically related items strongly associated with another, non-presented word (the "false target").
Subjects later receive recognition tests for all studied words plus other distractor words, including the false target.
DRM experiments routinely demonstrate that subjects claim to recognize false targets.
Example: test of false memory effects in the DRM Paradigm
STUDY: Bed, Rest, Awake, Tired, Dream, Wake, Snooze, Blanket, Doze, Slumber, Snore, Nap, Peace, Yawn, Drowsy
FALSE RECALL: "Sleep" 61%
A Rational Analysis of Semantic Memory
Our associative/semantic memory system might arise from the need to efficiently predict word usage with just a few basis functions (i.e., "concepts" or "topics")
The topics model provides such a rational analysis
A Spatial Representation: Latent Semantic Analysis (Landauer & Dumais, 1997)
[Figure: a document/term count matrix, with rows for words such as SCIENCE, RESEARCH, SOUL, and LOVE and columns Doc1, Doc2, Doc3, … holding counts. SVD maps the matrix into a high-dimensional space in which each word is a single point in a semantic space.]
Triangle Inequality constraint on words with multiple meanings
Euclidean distance obeys the triangle inequality: AC ≤ AB + BC
[Figure: FIELD placed between MAGNETIC and SOCCER; if FIELD is close to MAGNETIC (distance AB) and close to SOCCER (distance BC), the triangle inequality forces MAGNETIC and SOCCER to be close to each other (distance AC), even though their meanings are unrelated.]
A generative model for topics
Each document (i.e., context) is a mixture of topics.
Each topic is a distribution over words.
Each word is chosen from a single topic.
[Plate diagram: topic z → word w, with plates over N words, D documents, and T topics.]
A toy example
A document's topic mixture assigns probabilities P(z = 1) and P(z = 2) to two topics, each a distribution P(w | z) over words:
P(w | z): SCIENTIFIC 0.4, KNOWLEDGE 0.2, WORK 0.1, RESEARCH 0.1, MATHEMATICS 0.1, MYSTERY 0.1
P(w | z): HEART 0.3, LOVE 0.2, SOUL 0.2, TEARS 0.1, MYSTERY 0.1, JOY 0.1
Words (e.g., MYSTERY) can occur in multiple topics.
First, put all probability on topic 1...
P(z = 1) = 1, P(z = 2) = 0: one topic generates every word, and a sampled document reads: HEART, LOVE, JOY, SOUL, HEART, ...
Now put all probability on topic 2...
P(z = 1) = 0, P(z = 2) = 1: a sampled document reads: SCIENTIFIC, KNOWLEDGE, SCIENTIFIC, RESEARCH, ...
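The toy generative process can be run directly; a sketch that samples documents from the two word distributions given on the slides (the topic indexing, 1 = the HEART topic, follows the generated examples shown there):

```python
# Sample documents from the toy two-topic model: for each word, pick a topic
# z from the document's topic mixture, then a word from P(w | z).
# Distributions are the ones on the slides; topic 1 = HEART topic is an
# assumption based on the generated example documents.
import random

topics = {
    1: {"HEART": 0.3, "LOVE": 0.2, "SOUL": 0.2,
        "TEARS": 0.1, "MYSTERY": 0.1, "JOY": 0.1},
    2: {"SCIENTIFIC": 0.4, "KNOWLEDGE": 0.2, "WORK": 0.1,
        "RESEARCH": 0.1, "MATHEMATICS": 0.1, "MYSTERY": 0.1},
}

def sample_document(topic_mixture, n_words, rng=random):
    """topic_mixture maps topic id -> P(z); returns a list of sampled words."""
    zs = list(topic_mixture)
    words = []
    for _ in range(n_words):
        z = rng.choices(zs, weights=[topic_mixture[k] for k in zs])[0]
        ws = list(topics[z])
        words.append(rng.choices(ws, weights=[topics[z][w] for w in ws])[0])
    return words

random.seed(0)
print(sample_document({1: 1.0, 2: 0.0}, 5))  # only HEART-topic words
print(sample_document({1: 0.5, 2: 0.5}, 5))  # a mixture of both topics
```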
Application to corpus data
TASA corpus: text from first grade to college; a representative sample of text
26,000+ word types (stop words removed); 37,000+ documents; 6,000,000+ word tokens
Fitting the model
Learning is unsupervised; learning means inverting the generative model
- We estimate P(z | w): assign each word in the corpus to one of T topics
- With T = 500 topics and 6×10^6 word tokens, the size of the discrete state space is 500^6,000,000. Help!
- Efficient sampling approach: Markov chain Monte Carlo (MCMC)
- Time and memory requirements are linear in T and N
Gibbs Sampling & MCMC (see Griffiths & Steyvers, 2003, for details)
Assign every word in the corpus to one of T topics
Sampling distribution for z:

P(z_i = j | z_-i, w) ∝ (n(w_i, j) + β) / (Σ_w n(w, j) + Wβ) × (n(d_i, j) + α) / (Σ_k n(d_i, k) + Tα)

where n(w, j) is the number of times word w is assigned to topic j and n(d, j) is the number of times topic j is used in document d (both counts excluding the current token i); W is the vocabulary size and α, β are smoothing hyperparameters.
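This sampling rule can be implemented as a compact collapsed Gibbs sampler; a sketch on a tiny synthetic corpus (the corpus and the hyperparameter values are illustrative):

```python
# Collapsed Gibbs sampling for the topics model: repeatedly resample each
# token's topic z_i with probability proportional to
# (n_wj + beta) / (n_j + W*beta) * (n_dj + alpha), counts excluding token i.
# Tiny synthetic corpus and hyperparameters are illustrative.
import random

def gibbs_lda(docs, T, iters=200, alpha=0.1, beta=0.01, seed=0):
    rng = random.Random(seed)
    vocab = sorted({w for d in docs for w in d})
    W = len(vocab)
    wid = {w: i for i, w in enumerate(vocab)}
    # Random initial topic assignment for every token.
    z = [[rng.randrange(T) for _ in d] for d in docs]
    n_wj = [[0] * T for _ in range(W)]   # word-topic counts
    n_j = [0] * T                        # topic totals
    n_dj = [[0] * T for _ in docs]       # document-topic counts
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            j = z[d][i]
            n_wj[wid[w]][j] += 1; n_j[j] += 1; n_dj[d][j] += 1
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                j, wi = z[d][i], wid[w]
                # Remove this token from the counts ("z minus i").
                n_wj[wi][j] -= 1; n_j[j] -= 1; n_dj[d][j] -= 1
                weights = [(n_wj[wi][k] + beta) / (n_j[k] + W * beta)
                           * (n_dj[d][k] + alpha) for k in range(T)]
                j = rng.choices(range(T), weights=weights)[0]
                z[d][i] = j
                n_wj[wi][j] += 1; n_j[j] += 1; n_dj[d][j] += 1
    return z, vocab, n_wj

docs = [["heart", "love", "soul", "love"],
        ["science", "research", "science", "work"],
        ["love", "heart", "soul", "tears"],
        ["research", "work", "science", "knowledge"]]
z, vocab, n_wj = gibbs_lda(docs, T=2)
print(z)
```

Each sweep touches every token once and keeps only count arrays, which is why time and memory scale linearly with T and the number of tokens, as the previous slide notes.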
A selection from 500 topics [P(w|z = j)]
THEORY SCIENTISTS EXPERIMENT OBSERVATIONS SCIENTIFIC EXPERIMENTS HYPOTHESIS EXPLAIN SCIENTIST OBSERVED EXPLANATION BASED OBSERVATION IDEA EVIDENCE THEORIES BELIEVED DISCOVERED
SPACE EARTH MOON PLANET ROCKET MARS ORBIT ASTRONAUTS FIRST SPACECRAFT JUPITER SATELLITE SATELLITES ATMOSPHERE SPACESHIP SURFACE SCIENTISTS ASTRONAUT
ART PAINT ARTIST PAINTING PAINTED ARTISTS MUSEUM WORK PAINTINGS STYLE PICTURES WORKS OWN SCULPTURE PAINTER ARTS BEAUTIFUL DESIGNS
BRAIN NERVE SENSE SENSES ARE NERVOUS NERVES BODY SMELL TASTE TOUCH MESSAGES IMPULSES CORD ORGANS SPINAL FIBERS SENSORY
Polysemy: words with multiple meanings represented in different topics
FIELD MAGNETIC MAGNET WIRE NEEDLE CURRENT COIL POLES IRON COMPASS LINES CORE ELECTRIC DIRECTION FORCE MAGNETS BE MAGNETISM
SCIENCE STUDY SCIENTISTS SCIENTIFIC KNOWLEDGE WORK RESEARCH CHEMISTRY TECHNOLOGY MANY MATHEMATICS BIOLOGY FIELD PHYSICS LABORATORY STUDIES WORLD SCIENTIST
BALL GAME TEAM FOOTBALL BASEBALL PLAYERS PLAY FIELD PLAYER BASKETBALL COACH PLAYED PLAYING HIT TENNIS TEAMS GAMES SPORTS
JOB WORK JOBS CAREER EXPERIENCE EMPLOYMENT OPPORTUNITIES WORKING TRAINING SKILLS CAREERS POSITIONS FIND POSITION FIELD OCCUPATIONS REQUIRE OPPORTUNITY
Predicting word association
LSA: finds the closest word.
Topics model: does inference; given that one word was observed, which word has the highest probability of appearing next?
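This inference, P(w2 | w1) = Σ_z P(w2 | z) P(z | w1), can be sketched with the toy distributions from the earlier slides; a uniform prior over topics is assumed here for illustration:

```python
# Predict the next word given an observed cue word:
# P(w2 | w1) = sum_z P(w2 | z) P(z | w1), with P(z | w1) ∝ P(w1 | z) P(z).
# Toy distributions from the earlier slides; uniform P(z) is an assumption.
topics = [
    {"SCIENTIFIC": 0.4, "KNOWLEDGE": 0.2, "WORK": 0.1,
     "RESEARCH": 0.1, "MATHEMATICS": 0.1, "MYSTERY": 0.1},
    {"HEART": 0.3, "LOVE": 0.2, "SOUL": 0.2,
     "TEARS": 0.1, "MYSTERY": 0.1, "JOY": 0.1},
]

def associate(cue):
    """Return words ranked by P(w | cue), excluding the cue itself."""
    # Posterior over topics given the cue (uniform prior over topics).
    post = [t.get(cue, 0.0) for t in topics]
    s = sum(post)
    post = [p / s for p in post]
    vocab = {w for t in topics for w in t}
    scores = {w: sum(p * t.get(w, 0.0) for p, t in zip(post, topics))
              for w in vocab if w != cue}
    return sorted(scores, key=scores.get, reverse=True)

print(associate("HEART")[0])     # LOVE or SOUL (tied at 0.2)
print(associate("MYSTERY")[:3])  # mixes both topics; SCIENTIFIC ranks first
```

An ambiguous cue such as MYSTERY spreads its posterior over both topics, so its top associates mix the two, which is exactly the behavior a purely spatial model cannot capture.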
Word Association (norms from Nelson et al. 1998)
CUE: PLANET
Human associates (norm order): 1 EARTH, 2 STARS, 3 SPACE, 4 SUN, 5 MARS
Model: STARS, SUN, EARTH, SPACE, SKY
First associate "EARTH" is in the set of 5 associates from the model
P(set contains first associate)
[Figure: P(set contains first associate), from 0 to 1, as a function of set size (10^0 to 10^3), comparing LSA and the topics model.]
Explaining variability in false recall
One factor: mean associative strength of list items to critical item (Deese 1959; Roediger et al. 2001).
[Figure: study words BED, REST, AWAKE, TIRED, ..., DROWSY each linked to the non-presented critical item SLEEP with associative strengths such as .638, .475, .618, .493, .551; mean strength = .431.]
For 55 DRM lists, R = .69 (with the given lexicon)
One recall component: inference
Encoding: the study words lead to a stored topics distribution (the "gist")
Retrieval: infer words from the stored topics distribution
Predictions for the "Sleep" list
[Figure: predicted probabilities P(w | study list), from 0 to 0.25, of recalling each word. Study-list words: REST, TIRED, BED, WAKE, AWAKE, NAP, DREAM, YAWN, DROWSY, BLANKET, SNORE, SLUMBER, DOZE, PEACE. Top 8 extra-list words: SLEEP, NIGHT, HOURS, MORNING, ASLEEP, SLEEPY, AWAKENED, SPENT; the critical item SLEEP receives the highest extra-list probability.]
Correlation between intrusion rates and predictions
[Figure: correlation with the word-association data as a function of model size, ranging from about 0.2 to 0.8: LSA over 0 to 800 dimensions, and the topics model over 0 to 2000 topics.]
Other recall components? One possibility: two routes add strength.