
A More Biologically Plausible Neural Network for Image Recognition

Chengyuan Zhuang

University of North Texas, Denton, TX
[email protected]

On my honor as a UNT student, I have neither given nor received unauthorized assistance on this work.

Abstract

Artificial Neural Networks (ANNs) have become a very powerful tool as researchers gain a deeper and deeper understanding of them. Today the Deep Neural Network (DNN) has become the most promising technique for making significant improvements in high-level Artificial Intelligence applications. However, Deep Neural Networks still have limitations: they lack biological plausibility and require complex high-level mathematical computation. We present a more biologically plausible Neural Network for image recognition that deviates from the traditional error-based model. By following more biological principles, we aim to fix these limitations while still giving good results.

Introduction

The Artificial Neural Network (ANN), which imitates biological neural networks to gain advanced intelligence, is gradually showing its power as researchers gain a deeper understanding of it. The recent Deep Neural Network (DNN), with multiple layers similar to biological neural networks, is the first technique able to learn deep (complex) features instead of the shallow (simple) features learned by previous techniques. It has shown amazing performance and has therefore become the most promising tool in high-level Artificial Intelligence applications.

However, Deep Neural Networks still have a limitation: they are not very biologically plausible. Researchers argue that there is no evidence of any process like error back-propagation or gradient descent in the brain, and weight setting and parameter tuning are tricky. The purely mathematical solution for DNNs is very computationally expensive, and it easily falls into a local optimum instead of the global optimum. Running a Deep Neural Network is very expensive: given large training data, it usually requires many machines or clusters running for days or weeks (e.g., Automatic Speech Recognition, ASR, at Google). In industry, due to the high amount of computation and time limits, high-end servers equipped with top GPUs are required to address this problem. Both industry and academia are eagerly expecting new techniques to reduce cost and time.

There does not seem to be much work on biologically plausible Artificial Neural Networks (ANNs), and it is very difficult to give a practically good solution. The typical error-based model is one of the very few that can "get things done" and provide acceptable results, so it later became a standard for almost all Neural Network designs. Only very few people explore different ways, and it is extremely difficult to produce a new network design for high-level Artificial Intelligence tasks, rather than simple tasks.

Related Work

Previous research has already made many discoveries about biological neural networks. Each neuron is a cell with a cell body containing a nucleus. There are many root-like extensions from the cell body, called neurites. A neuron sends signals to other neurons through its axon and synapses, and receives signals through its many dendrites. The human brain has a huge number of synapses: each of its roughly 10^11 (one hundred billion) neurons has on average 7,000 synaptic connections to other neurons (Drachman, 2005). There are two different types of synapse: some of the signals reaching a neuron's dendrites promote firing and others inhibit it. These are called excitatory and inhibitory synapses, respectively.

All neurons are electrically excitable, maintaining voltage gradients across their membranes by means of metabolically driven ion pumps. If the voltage changes by a large enough amount, an all-or-none electrochemical pulse called an action potential is generated, which travels rapidly along the cell's axon and activates synaptic connections with other cells when it arrives (Wikipedia: Neuron).

Researchers have discovered Hebbian learning as a very important rule by which neurons learn from experience: "When an axon of cell A is near enough to excite cell B and repeatedly or persistently takes part in firing it, some growth process or metabolic change takes place in one or both cells such that A's efficiency, as one of the cells firing B, is increased" (Hebb, 1949). This is an unsupervised (and possibly also supervised) learning process -- concise, elegant, and powerful -- compared to the awkward, cumbersome error back-propagation and gradient descent computation.
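As an illustrative sketch (not code from any cited source), the simplest form of the Hebbian rule is Δw = η · pre · post: a connection is strengthened only when its presynaptic input and the postsynaptic neuron are active together. All names and the learning rate below are assumptions for illustration:

```python
import numpy as np

def hebbian_update(w, pre, post, lr=0.1):
    """Hebbian rule: strengthen w[i] only when pre[i] and post are both active."""
    return w + lr * pre * post

w = np.zeros(3)
pre = np.array([1.0, 0.0, 1.0])   # presynaptic activity
post = 1.0                        # postsynaptic neuron fired
w = hebbian_update(w, pre, post)
# only the active inputs (indices 0 and 2) are strengthened
```

Note that the inactive synapse (index 1) is left untouched, which is exactly the "take part in firing" condition Hebb emphasized.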

The theory is often summarized as "cells that fire together, wire together" (Doidge, 2007). However, this summary should not be taken literally. Hebb emphasized that cell A needs to "take part in firing" cell B, and such causality can only occur if cell A fires just before, not at the same time as, cell B. From this theory, researchers have also developed other related theories: long-term potentiation and long-term depression. Both can serve as plausible weight-updating rules in an ANN.

In neuroscience, long-term potentiation (LTP) is a persistent strengthening of synapses based on recent patterns of activity. These are patterns of synaptic activity that produce a long-lasting increase in signal transmission between two neurons (Cooke and Bliss, 2006). It is one of several phenomena underlying synaptic plasticity, the ability of chemical synapses to change their strength. As memories are thought to be encoded by modification of synaptic strength, LTP is widely considered one of the major cellular mechanisms underlying learning and memory (Cooke and Bliss, 2006).

Long-term depression (LTD), in neurophysiology, is an activity-dependent reduction in the efficacy of neuronal synapses lasting hours or longer, following a long patterned stimulus. LTD occurs in many areas of the CNS, with varying mechanisms depending on brain region and developmental progress (Massey and Bashir, 2007). LTD is one of several processes that serve to selectively weaken specific synapses in order to make constructive use of the synaptic strengthening caused by LTP. This is necessary because, if allowed to continue increasing in strength, synapses would ultimately reach a ceiling level of efficiency, which would inhibit the encoding of new information (Purves, 2008). LTP and LTD make excellent weight-updating rules.
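A minimal sketch of how LTP and LTD could serve as complementary weight-updating rules -- strengthening co-active synapses toward a ceiling, and weakening synapses whose input arrives without postsynaptic firing. The function, rates, and ceiling value are illustrative assumptions, not a biophysical model:

```python
import numpy as np

def ltp_ltd_update(w, pre, post, lr=0.05, w_max=1.0):
    """LTP: co-active synapses strengthened toward a ceiling w_max.
    LTD: synapses active without postsynaptic firing are weakened."""
    if post > 0:                              # postsynaptic neuron fired -> LTP
        return w + lr * pre * (w_max - w)     # soft ceiling prevents runaway growth
    else:                                     # no postsynaptic spike -> LTD
        return np.clip(w - lr * pre, 0.0, w_max)

w = np.full(2, 0.5)
w = ltp_ltd_update(w, np.array([1.0, 0.0]), post=1.0)  # LTP on synapse 0
w = ltp_ltd_update(w, np.array([0.0, 1.0]), post=0.0)  # LTD on synapse 1
```

The `(w_max - w)` factor implements the ceiling effect mentioned above: the closer a synapse is to maximum efficiency, the smaller each further increase.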

Since there are inhibitory connections, researchers have also developed competitive learning: a form of unsupervised learning in artificial neural networks in which nodes compete for the right to respond to a subset of the input data (Rumelhart, 1986). A variant of Hebbian learning, competitive learning works by increasing the specialization of each node in the network. It is well suited to finding clusters within data. Competitive learning can assign labels without the flaws of the error-based approach.

The key to making this work is that there are inhibitory connections between the units in a single layer. The point of these inhibitory connections is that they allow the output units to compete with each other, so the unit that fires the most wins the competition. Only the winning unit is "rewarded" (by having its weights increased). This increase in weights makes it more likely to win the competition when the input is similar. The end result is that each output ends up firing in response to a set of similar inputs (Bermúdez, 2014). Models and algorithms based on this principle include Vector Quantization and Self-Organizing Maps (Kohonen maps).
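A minimal sketch of this winner-take-all scheme, with lateral inhibition reduced to simply updating only the winning unit (names and the learning rate are illustrative assumptions):

```python
import numpy as np

def competitive_step(W, x, lr=0.2):
    """Winner-take-all step: the unit with the strongest response wins,
    and only its weights move toward the input."""
    winner = int(np.argmax(W @ x))      # lateral inhibition: one winner
    W[winner] += lr * (x - W[winner])   # only the winner is "rewarded"
    return winner

rng = np.random.default_rng(0)
W = rng.random((3, 4))                  # 3 output units, 4 inputs
x = np.array([1.0, 0.0, 0.0, 1.0])
first = competitive_step(W, x)
second = competitive_step(W, x)         # same input -> the same unit keeps winning
```

Because the winner's weights move toward the input, its response to that input grows, which is the specialization effect described above: each unit ends up claiming a cluster of similar inputs.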

A Self-Organizing Map (SOM), or Self-Organizing Feature Map (SOFM), is a type of Artificial Neural Network (ANN) that is trained using unsupervised learning to produce a low-dimensional (typically two-dimensional), discretized representation of the input space of the training samples, called a map. Self-organizing maps are different from other artificial neural networks in the sense that they use a neighborhood function to preserve the topological properties of the input space (Wikipedia: Self-Organizing Map).

The goal of learning in the Self-Organizing Map (SOM) is to cause different parts of the network to respond similarly to certain input patterns. This is partly motivated by how visual, auditory, or other sensory information is handled in separate parts of the cerebral cortex in the human brain (Haykin, 1999).

A Self-Organizing Map (SOM) consists of components called nodes or neurons. Associated with each node are a weight vector of the same dimension as the input data vectors, and a position in the map space. The usual arrangement of nodes is a two-dimensional regular spacing in a hexagonal or rectangular grid. The self-organizing map describes a mapping from a higher-dimensional input space to a lower-dimensional map space. The procedure for placing a vector from data space onto the map is to find the node whose weight vector is closest (by the smallest distance metric) to the data-space vector (Wikipedia: Self-Organizing Map). The neighborhood function is shaped like a Mexican hat, and the grid cells can be square or honeycomb-shaped.
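A single SOM training step, as described above, can be sketched as follows. The grid size, learning rate, and the Gaussian stand-in for the Mexican-hat neighborhood are illustrative choices, not part of any cited implementation:

```python
import numpy as np

def som_step(W, positions, x, lr=0.5, sigma=1.0):
    """One SOM step: find the best-matching unit (smallest weight-vector
    distance), then pull every node toward x, scaled by a Gaussian
    neighborhood around the winner's grid position."""
    bmu = int(np.argmin(np.linalg.norm(W - x, axis=1)))
    grid_dist = np.linalg.norm(positions - positions[bmu], axis=1)
    h = np.exp(-grid_dist**2 / (2 * sigma**2))   # neighborhood function
    W += lr * h[:, None] * (x - W)               # neighbors learn less than the BMU
    return bmu

# 2x2 grid of nodes with 3-dimensional weight vectors
positions = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
W = np.zeros((4, 3))
W[0] = [0.9, 0.1, 0.0]
bmu = som_step(W, positions, np.array([1.0, 0.0, 0.0]))
```

The neighborhood weighting `h` is what preserves the topology: nodes close to the winner on the grid are dragged toward the same inputs, so nearby map positions come to represent similar patterns.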

However, the Self-Organizing Map (SOM) requires normalizing the input matrix to 1, and also normalizing each internal weight matrix to 1 to decide matching before each learning step. This process makes it not very biologically plausible, because neurons cannot be expected to do it.

Also powerful is Adaptive Resonance Theory (ART). Principles derived from an analysis of experimental literature in vision, speech, cortical development, and reinforcement learning, including attentional blocking and cognitive-emotional interactions, led to the introduction of adaptive resonance as a theory of human cognitive information processing (Grossberg, 1976). The theory has evolved into a series of real-time neural network models that perform unsupervised and supervised learning, pattern recognition, and prediction (Duda, Hart, and Stork, 2001; Levine, 2000).

The primary intuition behind the ART model is that object identification and recognition generally occur as a result of the interaction of 'top-down' observer expectations with 'bottom-up' sensory information. The model postulates that 'top-down' expectations take the form of a memory template or prototype that is then compared with the actual features of an object as detected by the senses. This comparison gives rise to a measure of category belongingness. As long as this difference between sensation and expectation does not exceed a set threshold called the 'vigilance parameter', the sensed object will be considered a member of the expected class. The system thus offers a solution to the 'plasticity/stability' problem, i.e. the problem of acquiring new knowledge without disrupting existing knowledge (Wikipedia: Adaptive resonance theory).

The basic ART system is an unsupervised learning model. It typically consists of a comparison field and a recognition field composed of neurons, a vigilance parameter (threshold of recognition), and a reset module. The comparison field takes an input vector (a one-dimensional array of values) and transfers it to its best match in the recognition field. Its best match is the single neuron whose set of weights (weight vector) most closely matches the input vector. Each recognition field neuron outputs a negative signal (proportional to that neuron's quality of match to the input vector) to each of the other recognition field neurons and thus inhibits their output. In this way the recognition field exhibits lateral inhibition, allowing each neuron in it to represent a category to which input vectors are classified. After the input vector is classified, the reset module compares the strength of the recognition match to the vigilance parameter. If the vigilance parameter is overcome, training commences: the weights of the winning recognition neuron are adjusted towards the features of the input vector. Otherwise, if the match level is below the vigilance parameter, the winning recognition neuron is inhibited and a search procedure is carried out. In this search procedure, recognition neurons are disabled one by one by the reset function until the vigilance parameter is overcome by a recognition match. In particular, at each cycle of the search procedure the most active recognition neuron is selected and then switched off if its activation is below the vigilance parameter (note that it thus releases the remaining recognition neurons from its inhibition). If no committed recognition neuron's match overcomes the vigilance parameter, then an uncommitted neuron is committed and its weights are adjusted towards matching the input vector. The vigilance parameter has considerable influence on the system: higher vigilance produces highly detailed memories (many, fine-grained categories), while lower vigilance results in more general memories (fewer, more general categories) (Wikipedia: Adaptive resonance theory). The ART model has answered many fundamental questions about the mechanisms of the learning process.
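The search procedure above can be sketched as follows, using a simple overlap-based match score. The match function, vigilance value, and learning rate are illustrative assumptions -- this is not a full ART implementation with comparison/reset fields:

```python
import numpy as np

def art_classify(prototypes, x, vigilance=0.8, lr=0.5):
    """ART-style search sketch: try categories from best match downward;
    the first whose match exceeds the vigilance threshold learns x.
    If no committed category passes, commit a new one initialized to x."""
    match = [np.minimum(p, x).sum() / x.sum() for p in prototypes]
    for j in np.argsort(match)[::-1]:          # most active category first
        if match[j] >= vigilance:              # resonance: this category learns
            prototypes[j] += lr * (x - prototypes[j])
            return int(j)
    prototypes.append(x.copy())                # commit an uncommitted neuron
    return len(prototypes) - 1

protos = [np.array([1.0, 1.0, 0.0, 0.0])]
a = art_classify(protos, np.array([1.0, 1.0, 0.0, 0.0]))  # resonates with category 0
b = art_classify(protos, np.array([0.0, 0.0, 1.0, 1.0]))  # fails vigilance -> new category
```

Raising `vigilance` makes the overlap test harder to pass, so more inputs spawn new categories (fine-grained memories); lowering it merges more inputs into existing categories (general memories), matching the trade-off described above.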

Also inspired by biology is the genetic algorithm: a search heuristic that mimics the process of natural evolution, using operations such as inheritance, mutation, selection, and crossover. This heuristic is routinely used to generate useful solutions to optimization and search problems (Mitchell, 1996). In each generation (iteration), the fitness of every individual is evaluated and the fitter individuals are stochastically selected. As a result, individuals evolve toward better solutions over the iterations.
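As a toy sketch of these mechanics (a generic illustration, not the method of any cited work), the following evolves bit strings toward higher fitness using tournament selection, one-point crossover, and bit-flip mutation; all names and parameter values are assumptions:

```python
import random

def genetic_search(fitness, n_bits=8, pop_size=20, generations=40, seed=1):
    """Minimal genetic algorithm over fixed-length bit strings."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    for _ in range(generations):
        def pick():                                 # tournament selection of size 2
            a, b = rng.sample(pop, 2)
            return a if fitness(a) >= fitness(b) else b
        nxt = []
        while len(nxt) < pop_size:
            p1, p2 = pick(), pick()
            cut = rng.randrange(1, n_bits)          # one-point crossover
            child = p1[:cut] + p2[cut:]
            if rng.random() < 0.1:                  # bit-flip mutation
                i = rng.randrange(n_bits)
                child[i] ^= 1
            nxt.append(child)
        pop = nxt
    return max(pop, key=fitness)

best = genetic_search(sum)    # fitness = number of ones ("OneMax")
```

Selection pressure keeps good gene segments in the population, crossover recombines them, and mutation maintains diversity -- the same roles these operators play in the feature-construction work cited below.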

Lillywhite et al. (2013) present evolution-based feature construction for general object recognition. They use a genetic algorithm to decide an ordering of basic image transforms, their parameters, and which regions to operate on, with excellent results.

Our Approach

Our approach is inspired by all the previous related work, and we aim to capture the true mechanism of the learning process in lifeforms. After all this research, and serious, deep thinking on all the information -- mainly from a biological perspective, and partly from a modeling and computational perspective -- we have found the weakness, the not-so-biologically-plausible issue, in the traditional design of Artificial Neural Networks (ANNs): although neurons do have a threshold for firing, traditional designs overuse it. In the traditional design, each category is one neuron in the last layer; to classify correctly, only the one neuron representing the true label may fire, and all other neurons in the last layer should not fire -- anything that violates this is called an "error" -- and these errors are propagated back to rescue the correctness of the final output. This awkward, cumbersome process is so hard -- errors quickly vanish through the layers -- that before 2006, when Deep Learning emerged, the method was not practical.

What we think is quite different: from relevant reading and serious thinking, we realize that this is not actually the case -- in fact, a huge number of neurons fire at the same time inside the human brain. So allowing only one neuron to fire is not sound, not to mention that doing so still requires changing thresholds (raising or lowering them freely to fit the model, another biologically implausible issue). We believe the threshold for a particular neuron is somewhat fixed -- it can vary, but not much. Many neurons can fire at the same time into the same layer of one neural network; through inhibitory connections, only the neuron that gets the strongest input will win (in a Self-Organizing Map, its nearby neurons may also win, but gain much less), and only the winner fires, while the losing neurons are inhibited. By Hebbian learning theory, only the firing neurons in the previous layer that contribute input to this winner will strengthen their connections to the winner -- this is a very elegant learning process. This answers how weights are updated inside an Artificial Neural Network (ANN), and also how to obtain the label -- by competition.

The most difficult part is how patterns are learned and organized. The Self-Organizing Map (SOM) provides a good perspective, but its weight-updating rule is not Hebbian learning -- it just adjusts weights to mimic the input vector, normalized to 1, which makes it not very biologically plausible. The most difficult thing is to locate a pattern without using their mathematical method. Inspired by Adaptive Resonance Theory (ART), and with our original idea of using Hebbian learning as the weight-updating rule, we give a biologically plausible design.

The application we target is image recognition. The only features we use are raw pixels -- a two-dimensional pixel matrix, without any transformed high-level features. The dataset for training and testing is handwritten digits. Each pixel has two kinds of values: zero represents white, non-zero represents black.

In our Neural Network design, we create two layers: a Short-Term Memory Layer and a Long-Term Memory Layer. They are different: Short-Term Memory is responsible for checking patterns and learns fast; Long-Term Memory is responsible for storing patterns -- it learns slowly but lasts long -- and is also responsible for the recognition (classification) decision.

For any input pattern, the Short-Term Memory Layer checks whether it already contains this pattern: if any stored pattern matches the input pattern above a certain high threshold value, the pattern is already inside; otherwise it is new. For a new pattern, the Short-Term Memory Layer activates a free neuron (one that does not yet contain any pattern) to store the new input pattern, and, through its connections to the Long-Term Memory Layer, also activates a free neuron there to store this new input pattern for the long term. In this process, the connection from the new Short-Term Memory neuron to the new Long-Term Memory neuron is built. With this connection, the Long-Term Memory neuron is able to learn: each time, a small portion (the learning rate) of the input pattern matrix is added (learned) to the Long-Term Memory neuron's weight matrix. This means that through this learning, it becomes slightly more similar to the input pattern. If this input pattern occurs enough times, this Long-Term Memory neuron's weight matrix will grow closer and closer to the input pattern, and it will give a high response output signal when it receives this input pattern.
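The process above can be sketched roughly as follows. This is only an illustrative reconstruction from the description -- the class name, similarity measure, and learning rate are assumptions, not the actual implementation; the 0.9 match threshold is the value stated later in the text:

```python
import numpy as np

def similarity(pattern, x):
    """Fraction of the input's active (black) pixels covered by the pattern."""
    return np.minimum(pattern, x).sum() / max(x.sum(), 1e-9)

class ShortLongTermNet:
    """Illustrative sketch of the two-layer design described in the text."""
    def __init__(self, match_threshold=0.9, lr=0.05):
        self.stm = []                 # short-term patterns: checked fast
        self.ltm = []                 # long-term weight matrices: learned slowly
        self.match_threshold = match_threshold
        self.lr = lr

    def train(self, x):
        for i, p in enumerate(self.stm):
            if similarity(p, x) >= self.match_threshold:
                # known pattern: the connected LTM neuron learns slowly,
                # adding a small portion of the input to its weights
                self.ltm[i] += self.lr * x
                return i
        # new pattern: recruit a free STM neuron and a connected LTM neuron
        self.stm.append(x.copy())
        self.ltm.append(self.lr * x)
        return len(self.stm) - 1

net = ShortLongTermNet()
x = np.array([1.0, 1.0, 0.0, 0.0])
first = net.train(x)                          # new pattern stored
again = net.train(x)                          # recognized; LTM weights grow
other = net.train(np.array([0.0, 0.0, 1.0, 1.0]))  # fails the match -> new neuron
```

Each repeated presentation adds another `lr`-sized portion of the pattern to the LTM weights, so frequently seen patterns accumulate the strong response the text describes.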

Inspired by the Self-Organizing Map (SOM), we do not want only one neuron to be selected and trained -- we want some similar neurons to learn as well. So we set a similarity criterion; all neurons above that criterion are seen as "matching", and the corresponding Long-Term Memory neurons get to learn.

Another important design element is the category. Whether by unsupervised or supervised learning, we get categories, and when the Short-Term Memory Layer searches for a pattern, it actually checks by categories (regions). Different categories occupy different regions. If a category fails to match, the search switches to another category (region). Since our target dataset -- the MNIST database of handwritten digits -- has labels, we incorporate supervised learning into our network: for each training sample, get its label (class label), locate that category (class), and search for all patterns matching above a certain high criterion (we set 0.9 for black pixels); all the corresponding Long-Term Memory neurons are activated to learn by adding a small portion of this input pattern to their weight matrices -- which means that through this learning, connections to some pixels are strengthened and some are weakened. The neurons become more similar to the input pattern. Given enough learning on this input pattern, these neurons will output a strong signal when they receive it; that is "recognition".

The whole architecture and learning process can be illustrated as below:

Dataset and Results

The dataset we use is the MNIST database of handwritten digits, which contains 60,000 training images and 10,000 test images. It is a subset of a larger set available from NIST. The digits have been size-normalized and centered in a fixed-size image (LeCun, 1998). It has become a very popular standard for comparing the performance of machine learning methods.

(Y LeCun, 1998)

Regarding the class distribution, there are 10 digits from 0 to 9, meaning 10 classes total. Each class is equally distributed, both in training and testing, and each image has its label. Since the images are handwritten digits, they are quite irregular and more difficult than printed forms.

These are the comparative results from when LeCun published his famous paper in 1998.

The current performance on the MNIST database (from Wikipedia):

Due to several rounds of design and redesign, we have currently only trained on 10,000 equally distributed samples and tested on 1,000 equally distributed samples; the accuracy is 93.2%. Since we use just a portion of the total data, we believe that with more training data the performance still has room for improvement. We give only a basic architecture; our approach is still fairly rough, some good ideas are not yet implemented, and there is still room for improvement. We will keep working on it.

There is also the ImageNet LSVRC-2010 dataset:

ImageNet is a dataset of over 15 million labeled high-resolution images belonging to roughly 22,000 categories. The images were collected from the web and labeled by human labelers using Amazon's Mechanical Turk crowd-sourcing tool. Starting in 2010, as part of the Pascal Visual Object Challenge, an annual competition called the ImageNet Large-Scale Visual Recognition Challenge (ILSVRC) has been held. ILSVRC uses a subset of ImageNet with roughly 1,000 images in each of 1,000 categories. In all, there are roughly 1.2 million training images, 50,000 validation images, and 150,000 testing images (Krizhevsky, 2012).

However, it is far more complex to deal with colored, high-resolution images, so due to time limits we have to leave this as future work.

Conclusion

We have presented a more biologically plausible Neural Network design for image recognition. It follows more biological principles and aims to be closer to the true learning process in lifeforms. Due to several changes of mind, the current design was only just finished; we trained on 10,000 and tested on 1,000 equally distributed images, and the accuracy we obtained is 93.2%. We believe that more training data will further increase the accuracy.

The good part of our design is that it follows more biological principles, and the weights update in a more biologically plausible way. It is simple compared to traditional error-based approaches, and it is also able to do online learning: it is easy to add new classes, while for traditional approaches this is impossible without starting over. We have explored an untraditional way, and the result is good, which makes our approach practical and meaningful.

Future Work

Future work will incorporate a high-level group clustering algorithm and explore how to improve performance while reducing computation cost. The Self-Organizing Map (SOM) will also be incorporated further to gain higher-level abilities. We will also extend this approach to colored, high-resolution images as in ImageNet, to explore more complex, more demanding tasks such as object detection or automatic driving.

Acknowledgement

I would like to thank Dr. Nielsen for his valuable instruction and feedback on this work. I would also like to thank my peer group, Milad Pejmanrad and Frank Paiva, for their precious suggestions and sharing on this project. Finally, I would also like to thank everyone in the class for their discussion and participation, which inspired me during this semester.

References

Drachman D (2005). "Do we have brain to spare?". Neurology 64 (12): 2004–5. doi:10.1212/01.WNL.0000166914.38327.BB. PMID 15985565.

Hebb, D.O. (1949). "The Organization of Behavior". New York: Wiley & Sons.

The mnemonic phrase is usually attributed to Carla Shatz at Stanford University, referenced for example in Doidge, Norman (2007). The Brain That Changes Itself. United States: Viking Press. p. 427. ISBN 067003830X.

Cooke SF, Bliss TV (2006). "Plasticity in the human central nervous system". Brain 129 (Pt 7): 1659–73. doi:10.1093/brain/awl082. PMID 16672292.

Massey PV, Bashir ZI (April 2007). "Long-term depression: multiple forms and implications for brain function". Trends Neurosci. 30 (4): 176–84. doi:10.1016/j.tins.2007.02.005. PMID 17335914.

Purves D (2008). Neuroscience (4th ed.). Sunderland, Mass: Sinauer. pp. 197–200. ISBN 0-87893-697-1.

Rumelhart, David; David Zipser; James L. McClelland; et al. (1986). Parallel Distributed Processing, Vol. 1. MIT Press. pp. 151–193.

José Luis Bermúdez (2014). Cognitive Science: An Introduction to the Science of the Mind. Cambridge University Press. p. 231.

Kohonen, Teuvo (1982). "Self-Organized Formation of Topologically Correct Feature Maps". Biological Cybernetics 43 (1): 59–69. doi:10.1007/bf00337288.

Haykin, Simon (1999). "9. Self-organizing maps". Neural Networks - A Comprehensive Foundation (2nd ed.). Prentice-Hall. ISBN 0-13-908385-5.

Duda, R.O., Hart, P.E., and Stork, D.G. (2001). Pattern Classification, Second Edition. New York: John Wiley. Section 10.11.2.

Levine, D.S. (2000). Introduction to Neural and Cognitive Modeling. Mahwah, New Jersey: Lawrence Erlbaum Associates. Chapter 6.

Mitchell, Melanie (1996). An Introduction to Genetic Algorithms. MIT Press. p. 2.

Kirt Lillywhite, Dah-Jye Lee, Beau Tippetts, James Archibald (2013). "A feature construction method for general object recognition". Pattern Recognition 46: 3300–3314.

LeCun, Y., Bottou, L., Bengio, Y., and Haffner, P. (1998). "Gradient-based learning applied to document recognition". Proceedings of the IEEE 86 (11): 2278–2324.

Krizhevsky A, Sutskever I, Hinton GE (2012). "ImageNet classification with deep convolutional neural networks". Advances in Neural Information Processing Systems 2012: 1097–1105.