On the Slowness Principle and Learning in Hierarchical Temporal Memory


    On the Slowness Principle and Learning in Hierarchical Temporal Memory

    Erik M. Rehn

    A thesis submitted in partial fulfillment of the requirements for the degree of

    Master of Science in Computational Neuroscience

    Bernstein Center for Computational Neuroscience Berlin, Germany

    February 01, 2013


    Abstract

    The slowness principle is believed to be one clue to how the brain solves the problem of invariant object recognition. It states that the external causes of sensory activation, i.e., distal stimuli, often vary on a much slower time scale than the sensory activation itself. Slowness is thus a plausible objective when the brain learns invariant representations of its environment. Here we review two approaches to slowness learning, Slow Feature Analysis (SFA) and Hierarchical Temporal Memory (HTM), and show how Generalized SFA (GSFA) links the two. The connection between SFA, Linear Discriminant Analysis (LDA), and Locality Preserving Projections (LPP) is also investigated. Experimental work is presented which demonstrates how the local neighborhood structure implicit in the original SFA formulation, introduced through the temporal derivative of the input, renders SFA more efficient than LDA for supervised pattern recognition when the data has a low-dimensional manifold structure.
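    For orientation, the slowness objective that SFA optimizes can be stated compactly. In the standard formulation (a reference sketch in the usual SFA notation, where x(t) is the input signal, y_j(t) = g_j(x(t)) are the extracted features, \dot{y}_j is the temporal derivative, and \langle \cdot \rangle_t denotes temporal averaging), each feature minimizes

        \Delta(y_j) = \langle \dot{y}_j^{\,2} \rangle_t \;\rightarrow\; \min

    subject to \langle y_j \rangle_t = 0 (zero mean), \langle y_j^2 \rangle_t = 1 (unit variance), and \langle y_i y_j \rangle_t = 0 for i < j (decorrelation). The constraints exclude the trivial constant solution and force successive features to carry distinct information.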

    Furthermore, a novel object recognition model, called Hierarchical Generalized Slow Feature Analysis (HGSFA), is proposed. Through the use of GSFA, the model allows any manifold structure present in the training data to be exploited during training, and the experimental evaluation shows how this leads to greatly increased classification accuracy on the NORB object recognition dataset compared to previously published results.
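    One way to see how GSFA makes such structure usable: the temporal derivative in the SFA objective is replaced by differences taken over an arbitrary adjacency graph on the training samples. Roughly (a sketch of the generalized objective with normalization factors omitted; w_{nn'} denotes the edge weight connecting samples x_n and x_{n'}),

        \Delta(y_j) \;\propto\; \sum_{n,n'} w_{nn'} \, \big( y_j(x_{n'}) - y_j(x_n) \big)^2 \;\rightarrow\; \min

    subject to weighted analogues of the zero-mean, unit-variance, and decorrelation constraints. Different choices of w_{nn'} (class membership, k-nearest neighbors, known transformations, temporal order) yield the adjacency variants examined in Chapter 4.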

    Lastly, a novel gradient-based fine-tuning algorithm for HTM is proposed and evaluated. This error backpropagation can be naturally and elegantly implemented through native HTM belief propagation, and experimental results show that a two-stage training process composed of temporal unsupervised pre-training and supervised refinement is very effective. This is in line with recent findings on other deep architectures, where generative pre-training is complemented by discriminative fine-tuning.
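    The two-stage regime itself can be illustrated in a few lines. The sketch below is schematic only and is not the HTM-specific algorithm developed in Chapter 5: the unsupervised stage is a stand-in PCA-style projection and the supervised stage a plain softmax read-out trained by gradient descent, both hypothetical choices made just to show the training pipeline.

        import numpy as np

        # Toy data: 1000 random 64-dimensional patterns with 5 arbitrary class labels.
        rng = np.random.default_rng(0)
        X = rng.normal(size=(1000, 64))
        y = rng.integers(0, 5, size=1000)

        # Stage 1: unsupervised pre-training. A PCA-like projection stands in for
        # the temporal clustering / slow feature extraction used in the thesis.
        X_centered = X - X.mean(axis=0)
        _, _, Vt = np.linalg.svd(X_centered, full_matrices=False)
        W_pre = Vt[:16].T                    # project 64-dim inputs to 16 features
        H = X_centered @ W_pre               # pre-trained feature representation

        # Stage 2: supervised refinement. Gradient descent on the cross-entropy
        # of a softmax read-out refines the mapping from features to classes.
        W_out = np.zeros((16, 5))
        T = np.eye(5)[y]                     # one-hot targets
        for _ in range(200):
            logits = H @ W_out
            logits -= logits.max(axis=1, keepdims=True)   # numerical stability
            P = np.exp(logits)
            P /= P.sum(axis=1, keepdims=True)             # softmax probabilities
            grad = H.T @ (P - T) / len(y)                 # cross-entropy gradient
            W_out -= 0.5 * grad                           # fixed learning rate

        # With random labels the accuracy stays near chance; the point is only
        # to make the pre-train-then-refine structure explicit.
        accuracy = np.mean((H @ W_out).argmax(axis=1) == y)
        print(f"training accuracy after refinement: {accuracy:.3f}")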


    Eidesstattliche Versicherung / Statutory Declaration

    I declare in lieu of oath that I have written this thesis myself, independently and in my own hand, and that I have not used any sources or resources other than those stated for its preparation.

    ............................ (Datum/Date)   ............................ (Ort/Place)   ............................ (Unterschrift/Signature)


    Contents

    1 Introduction
        1.1 The Slowness Principle
        1.2 Outline
        1.3 Mathematical notation

    2 Slowness learning
        2.1 Slowness learning as feature extraction
        2.2 Slowness learning as graph partitioning
            2.2.1 Normalized spectral clustering
        2.3 Unifying perspective: Generalized Adjacency
        2.4 Hierarchical SFA
        2.5 Repeated SFA

    3 SFA as a Locality Preserving Projection
        3.1 Locality Preserving Projections
        3.2 Relation to PCA and LDA
        3.3 Relation to SFA
        3.4 Manifold learning as regularization
        3.5 Semi-supervised learning with manifolds

    4 Hierarchical Generalized Slow Feature Analysis for Object Recognition
        4.1 Related work
        4.2 Model
            4.2.1 Output normalization
            4.2.2 K-means feature extraction
        4.3 Adjacency
            4.3.1 Class adjacency
            4.3.2 K-random adjacency
            4.3.3 K-nearest neighborhood adjacency
            4.3.4 Transformation adjacency
            4.3.5 Temporal adjacency
        4.4 Experiments on SDIGIT
            4.4.1 Pattern generation
            4.4.2 Architecture
            4.4.3 Effect of neighborhood relations
        4.5 Experiments on NORB
            4.5.1 Architecture
            4.5.2 Effect of supervised neighborhood relations
            4.5.3 Performance of k-nearest neighborhood adjacency
            4.5.4 Comparison to previously published results
        4.6 Implementation
        4.7 Discussion & Conclusion

    5 Incremental learning in Hierarchical Temporal Memory
        5.1 Network structure
        5.2 Information flow
        5.3 Internal node structure and pre-training
            5.3.1 Spatial feature selection
            5.3.2 Temporal clustering
            5.3.3 Output node training
        5.4 Feed-forward message passing
        5.5 Feedback message passing
        5.6 HTM Supervised Refinement
            5.6.1 Output node update
            5.6.2 Intermediate nodes update
            5.6.3 HSR pseudocode
        5.7 Experimental evaluation
            1.1 Training configurations
            1.2 HTM scalability
            5.7.1 Group change …
