

INTERNATIONAL JOURNAL OF COMPUTATIONAL COGNITION (HTTP://WWW.YANGSKY.COM/YANGIJCC.HTM), VOL. 3, NO. 1, MARCH 2005 91

The Attitude to Expected Reward as the Basis for Fuzzy Logic of the Brain:

We Feel Drives Because We are Mortal (Invited Paper)

U. Sandler and L. Tsitolovsky

Abstract: Many theories treat the brain as a complex network of neurons, which are approximated as simple elements that sum their excitations and generate an output reaction in accordance with simple activation functions. Such an idealization, however, is far from the properties of a real neuron.

In this paper we present a comprehensive review of the literature and the results of original experiments, which supply trustworthy evidence that a real neuron predicts the consequences of its input signals and generates an output signal in conformity with this prediction. Based on the experimental data, we point out that the phenomenon of motivation appears already at the neuronal level, as the avoidance of neuron damage. We also argue that avoidance of injury and the aspiration to live are among the main components of the physiological regulation of neuronal learning and motivational processes.

We also propose a theoretical description of the learning process based on Fuzzy Dynamics, a contemporary theory that operates with “perceptions” (in Zadeh’s terminology) as mathematical objects. We point out that the logic of a neuron’s decision-making may be close to fuzzy logic.

In the conclusion, we discuss the possibility of designing a “feeling robot” that would use a trial-and-error self-learning process based on artificial motivations and effectively adapt its behavior (like an animal) to suddenly changing environmental conditions. Copyright © 2003-2005 Yang’s Scientific Research Institute, LLC. All rights reserved.

Index Terms: Learning, motivation, neuronal damage, homeostasis, fuzzy logic, fuzzy dynamics, “feeling robot”.

I. PRELIMINARY REMARKS

THE brain’s behavior is not deterministic, either for the observer studying the brain or for the brain itself. The brain is a unique object. We usually examine brain behavior like that of any other object, but the brain itself also studies the environment. A brain does not possess complete information about the environment, and its decision-making depends not only on information about a given event but also on the brain’s initial state, in other words, on a particular drive or motivation, i.e., forces acting either on or within a person to initiate behavior (in this paper we use the words drive and motivation as synonyms, because at the neuronal level they are equivalent).

Manuscript received December 1, 2003; revised December 1, 2003.

U. Sandler, Math. Dept., Jerusalem College of Technology, Jerusalem 91160, Israel.

L. Tsitolovsky, Life Sci. Dept., Bar-Ilan University, Ramat Gan 52900, Israel.

Publisher Item Identifier S 1542-5908(05)10110-9/$20.00. Copyright © 2003-2005 Yang’s Scientific Research Institute, LLC. All rights reserved. The online version was posted on December 1, 2003 at http://www.YangSky.com/ijcc31.htm

Which “logic” does the brain use to describe its environment? The choice of the means of description is determined by the primary features of the brain, and one such feature is the capability of perceiving. It therefore seems reasonable to employ a formal notion of perception that the brain itself may use to describe an environment. In this work we want to show that the subjective attitude of the brain to an expected event can be the reason for the emergence of the brain’s logic of perception (fuzzy logic [58]).

An animal’s voluntary actions are aimed at avoiding punishment or winning a reward from the environment. In this work we analyze the neuronal mechanisms of decision-making during sudden changes in the environment and investigate two main elementary forms of animal behavior, classical and instrumental conditioning. Classical conditioning consists of pairing an initially neutral conditioned stimulus (tactile, light, etc.) with a meaningful unconditioned stimulus (pain, food, etc.). An animal learns that the conditioned stimulus predicts the occurrence of an unconditioned stimulus. During instrumental conditioning there is a specific association between the animal’s action and the unconditioned stimulus, which is absent in classical conditioning. In particular, the animal must determine which action will lead to reinforcement. At the early stage of acquisition the animal usually makes mistakes, and therefore instrumental conditioning is also called trial-and-error learning. At the beginning of an experiment the animal has no information about which experimental procedure is being presented, and this leads to extremely unstable and protracted dynamics in the formation of conditioned reactions [22], [53]. Studying the neuronal processes involved in the two forms of conditioning may help to understand whether an animal chooses a presumed form of behavior on the basis of a statistical evaluation of its experience or whether, for a time, it continues to believe in the outcome of its previous trials.

In the single-neuron analog of instrumental conditioning, the animal receives a painful stimulus if the trained neuron fails to generate an action potential in response to a tactile stimulus, while in the neuronal analog of classical conditioning the properties of the environment do not depend on the animal’s actions. In this paper we demonstrate that the emergence of instrumental reactions results from the recovery of the neuron after excitotoxic damage during motivational excitation, and that an action potential in a single neuron can serve as an elementary instrumental reaction at the cellular level. Thus, motivation-like behavior can be observed in a single cell. Although neuronal reactions are not repeated from trial to trial, a neuron is taught to evaluate the most preferable consequence of its action and to change its excitability [49], [52].

Based on the original experimental results and data from the literature, we have formulated several simple rules that characterize neuronal behavior in a learning process. These rules conform naturally to a fuzzy description, so the apparatus of Fuzzy Dynamics [16], [17] can be applied to the theoretical description of learning in a single neuron. It is shown that such a theory is in acceptable agreement with the experimental data, so the logic of a neuron’s decision-making may be close to fuzzy logic.

It should be emphasized that such a description of neuronal behavior is of practical as well as academic interest. Since fuzzy logic is easily computerized, it creates the possibility of developing a new type of robotic system: a “feeling robot” that behaves in accordance with “artificial drives” rather than a deterministic algorithm. Such a robot may have an advantage when working in suddenly changing and poorly predictable environmental conditions, because in such cases a trial-and-error learning process based on artificial motivations may be significantly more effective than conventional reinforcement learning approaches [47], [50].

In order not to overburden the reader with mathematical details, we restrict ourselves to the “physical level of rigor”, i.e., all limits are assumed to be well defined, functions are assumed to be continuous, etc. We have also omitted most of the mathematical tricks that make up the theoretical physics “kitchen”. Most of the omitted material can be found in [15], [16], [17], [41], [42]. The reader who is interested only in applications can skip Section II-B, but Sections II-A, II-C and II-D should be read. A reader who is not interested in the mathematical background can omit Sections III-A and III-B. The Appendixes are intended for readers interested in a more detailed consideration of the neurophysiological mechanisms of neurons’ motivational behavior and in the details of the mathematical description of neuron adaptation.

II. EXPERIMENT: METHODS AND RESULTS

A. Neuro-physiological background

When choosing an action, an animal evidently uses its previously accumulated experience. However, action generation also depends on the animal’s attitude to the intended result. A force acting within an animal to initiate behavior is called drive or motivation. Motivation is a phenomenon consisting of the generation of actions that lead (through interaction with the environment) to the primary goal, which is the attainment of a certain optimal state.

There is a limited number of simple biological drives: feeding, drinking, respiration, temperature regulation, sexual motivation, avoidance of danger and drug dependence. Note that these drives are not independent and can influence each other. For a detailed description of the general properties of motivations, we refer the reader to the reviews [36], [11], [28], [57], [13], [25], [33].

Motivation is satisfied by reward or by the avoidance of punishment (negative reinforcement). Importantly, the stability of the internal environment and homeostasis are associated with motivation [6]. A deviation from the optimum induces action generation directed toward correcting this deviation. For example, respiratory motivation is connected with a decrease in pH, thirst accompanies cell dehydration, strong adverse stimuli have an excitotoxic effect, and the metabolic signal for feeding motivation is energy depletion [50]. One can therefore say that motivations arise as a result of deviations of vitally important constants from their optimal values and are related to transient damage, or to the threat of damage, to the organism. Note that strong excitation of neurons in many cases leads to damage and death [32], [35], [12], so protection from excitation usually protects neurons from damage. Moreover, simple biological motivations are connected with excitotoxicity and neuronal injury [50]. The substances that decrease motivation, such as cholecystokinin, opiates, insulin, gamma-aminobutyric acid and dopamine, in most cases protect neurons from damage.

Since neurons of the higher perceptive, emotional, homeostatic and motor centers (neocortex, hippocampus, hypothalamus and cerebellum) are especially vulnerable to damage, it is tempting to assume that the sensitivity of higher nervous centers to injurious factors is not merely an unfortunate circumstance but plays an essential role in the fulfillment of higher neural functions. If this is indeed the case, motivation and reward are related to the transient damage to and recovery of specific brain neurons.

Although the protective role of reward means that neuronal damage during motivation is transient, neurons are sometimes irreversibly injured, such as during drug dependence, self-stimulation, stress and excessive mating [50]. When an animal encounters harmful influences, it tries to avoid them by learning and by mobilizing inner homeostatic mechanisms to compensate for the injury. It has been shown that regulation of ion homeostasis itself leads to habituation, sensitization and conditioning [13], [61], which means that compensation processes may participate in the organization of learning.

A comparison of the neuronal processes that are responsible for classical and instrumental conditioning may throw light upon the decision-making process. In particular, it is possible to judge whether an animal’s choices are based only on a statistical evaluation of its experience, or whether the animal believes in its previous experience until this belief is destroyed by unsuccessful attempts. An animal’s attitude to the expected results of its actions may be estimated by evaluating the motivational excitation and the subsequent disturbance of a neuron’s condition evoked by this excitation.

B. Experiments on a single neuron

1) Methods: Neuronal analogs of instrumental and classical conditioning were elaborated under identical conditions, with negative reinforcement. The experiments were carried out in a semi-intact preparation (Fig. 1) of non-anesthetized snails (Helix lucorum L.), as previously described in [51], [54].

Fig. 1. A schematic representation of the semi-intact preparation of Helix. The preparation was used to record intracellularly from central neurons, with the neural system intact and connected to the periphery (magnified in the figure). The mollusk received whole-body tactile stimuli (conditioned stimulus, CS, and discriminated stimulus, DS) from electrical stimulators ES1 and ES3 by means of inductance, and a whole-body painful stimulus (unconditioned stimulus, US) from electrical stimulator ES2 in those cases in which the trained neuron did not generate an instrumental action potential (shown at the left). LPaG and RPaG: left and right parietal ganglia. Identified neurons are shown as points and labeled with numbers.

The snails were fixed on a tray with paraffin and their shells were removed. Ganglionic connections were left intact. The electrical activity of the snail neurons responsible for defensive closure of the pneumostome was recorded intracellularly. Tactile stimulation of the foot served as the conditioned stimulus. A similar tactile stimulus directed to another point of the body served as the discriminated stimulus (a stimulus that does not lead to an environmental reaction), and cutaneous electrical stimulation of the mollusk’s body served as the unconditioned stimulus.

The basic schedule of instrumental conditioning was applied to only a single target neuron [54], [55] in order to ensure that the instrumental reaction occurred within the recorded neuron (Fig. 1). The snail received negative reinforcement when the trained neuron did not generate an action potential (AP) in response to a conditioned stimulus within 1.5-3 seconds. The appearance of an unconditioned stimulus did not depend on the generation or failure of a spike in the control neuron or on the presence of a discriminated stimulus.
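The reinforcement contingency described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the protocol labels and the default 3-second response window (the upper end of the 1.5-3 s range) are hypothetical and do not come from the experimental software. The "classical" branch models the acquisition phase only, in which every CS is paired with a US.

```python
from typing import Sequence


def spiked_in_window(spike_times_s: Sequence[float],
                     cs_time_s: float,
                     window_s: float = 3.0) -> bool:
    """True if at least one spike falls within the response window after the CS.

    window_s is an assumed value; the text gives a range of 1.5-3 seconds.
    """
    return any(cs_time_s <= t <= cs_time_s + window_s for t in spike_times_s)


def us_delivered(protocol: str, trained_neuron_spiked: bool) -> bool:
    """Decide whether the unconditioned stimulus (US) follows the CS in a trial.

    The control neuron and the discriminated stimulus are deliberately not
    arguments: according to the text, the US never depends on them.
    """
    if protocol == "classical":
        # Acquisition phase of the classical analog: the CS is always paired
        # with the US, whatever the neuron does.
        return True
    if protocol == "instrumental":
        # Instrumental analog: punishment only when the trained neuron is silent.
        return not trained_neuron_spiked
    raise ValueError(f"unknown protocol: {protocol!r}")


# Hypothetical trial: CS at t = 10 s, trained neuron fires at t = 12.4 s.
spiked = spiked_in_window([12.4], cs_time_s=10.0)
print(us_delivered("instrumental", spiked))  # the neuron fired, so no US
```

Leaving the control neuron out of the signature makes the asymmetry between the trained and control neurons explicit: only the trained neuron's spike can cancel the punishment.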

In different experiments, various identified neurons were used as the trained and control neurons. The neural system had to determine which neuron was responsible for avoidance of punishment. The interval between consecutive conditioned stimuli was 2-5 min, with at least 40 learning trials. The training procedure in experiments on elaboration of a neuronal analog of the classical conditioned reflex consisted of acquisition (25-35 combinations of the conditioned and unconditioned stimuli), an extinction series (15-20 presentations of the isolated conditioned stimulus after a 5-10 min break), and a second acquisition stage consisting of repeated development of the conditioned reflex following a 20-minute break. The combinations were presented at intervals of 2-5 min. The interval between the conditioned and unconditioned stimuli was 1 sec. The discriminated stimulus was presented in pseudo-random order instead of the regular combination, in order to assess the specificity of learning. The development of the associative connection was judged by the change in the electrical activity of the defensive behavior command neurons [3]. Training was carried out under conditions in which the defensive conditioned reflex of pneumostome closing in response to a tactile stimulus was well developed.

Fig. 2. Representative intracellular recordings of the responses of neuron RPa3 to the combination of the conditioned stimulus and the US during elaboration of the neuronal analog of a classical conditioned reflex. Solid circle: CS; black triangle: US; black square: DS. A CS 1, A CS 5 and A CS 30: responses to the 1st, 5th and 30th combinations of the CS-US during acquisition. A DS 1, A DS 3 and A DS 29: responses to the DS after the 1st, 3rd and 29th combinations of the CS-US during acquisition. R CS 1: response to the first combination of CS-US during reacquisition. R DS 1: first response to the DS during reacquisition. The mean membrane potential of the neuron during recording was -62.3 mV. The dotted line indicates the -65 mV potential level. Calibrations are shown on the plot. Tops of spikes are not shown.

Figure 2 demonstrates an example of intracellular recording during classical conditioning. Throughout pairing, the response to the conditioned stimulus increased from 1 to 3 APs, while the response to the discriminated stimulus decreased from 2 to 1 AP. During extinction, the neuron failed to generate an AP (not shown in the figure), but after a 20-minute break in stimulation the response partially recovered, to 2 APs at the beginning of reacquisition.

A typical example of intracellular recording during elaboration of a neuronal analog of instrumental conditioning is presented in Figure 3. The response of the trained neuron to the conditioned stimulus increased during training, while the response of the control neuron to the conditioned stimulus decreased. Responses of both neurons to the discriminated stimulus declined during training. Figure 3 demonstrates both correct and incorrect responses, depending on the trained neuron’s reaction to the conditioned stimulus. Unconditioned stimuli were presented only when the trained neuron (top curve in each frame) failed to generate an AP. The appearance of the unconditioned stimulus did not depend on the activity of the control neuron (bottom curve in each frame). As a result of acquisition, responses of the trained neuron to the conditioned stimulus increased from spike failure (trial 1) to spike generation (trial 41). Responses of the control neuron to the conditioned stimulus decreased from spike presence (trial 1) to spike failure (trials 30, 38, 41). Both the trained and the control neuron reduced their responses to the discriminated stimulus from 2 spikes (right, trial 1) to 1 spike (right, trial 40). Averaged data have confirmed these observations.

Fig. 3. Representative intracellular recordings of neuronal responses during elaboration of a local instrumental conditioned reflex with the trained neuron RPa3 (top in each frame) and the control neuron LPa3 (bottom). For each exposure, the number of the conditioned stimulus is indicated. For the responses to the discriminated stimulus, the number of preceding conditioned stimuli is indicated. Spike failure in the trained neuron in response to conditioned stimuli 1, 6, 11, 22, 30 and 38 (circles) was followed by a US (triangles). Spike generation in the control neuron did not prevent the appearance of a US (trials 1, 6, 22), and spike failure in the control neuron was not followed by a US (trials 16, 41). Responses to the discriminated stimulus are indicated by a shorter exposure (black square). The mean membrane potential during recording was -58.8 mV in the trained neuron and -60.2 mV in the control neuron. The dotted lines indicate the -65 mV potential level.

The dynamics of the responses satisfy the basic properties of behavioral experiments. The conditioned response increased during acquisition of classical conditioning, decreased during extinction sessions and rapidly recovered during reacquisition (see Fig. 4, top). This was not the case for responses to the discriminated stimulus. Figure 5 (top) shows averaged data for the change in the responses of the trained and control neurons to the conditioned stimulus during instrumental conditioning. The dynamics of neuronal activity were not monotonic. Responses of a trained neuron to an originally neutral conditioned stimulus decreased at the beginning of a session. This decreased response resulted in an increase in the frequency of punishing unconditioned stimuli. At the end of the learning session, the instrumental reaction was selectively produced by the AP of the trained neuron. During learning, responses of the control neuron to the conditioned stimulus decreased only after passing through a local maximum (Fig. 5, top). This local maximum (trials 17-31) is the result of a substitution of the control neuron’s action for the trained neuron’s action.

Fig. 4. Behavior of neuronal activity (means over n=24 neurons) during acquisition, extinction and reacquisition of classical conditioning. Top: number of APs in the responses to the CS and DS vs. the trial number. Medians and the significance of the difference between responses to the conditioned stimulus and the discriminated stimulus are shown (Mann-Whitney U test, * : P < 0.05; ** : P < 0.01; *** : P < 0.001). Solid symbols represent the data for the conditioned stimuli and open symbols the data for the discriminated stimuli. Bottom: mean membrane potential and confidence intervals (P < 0.05). Arrows between the last values during acquisition and the first values during extinction (and between the last value during extinction and the first value during reacquisition) indicate the significance of the difference between these values.

Instrumental conditioning was selective in relation to both the input (responses to the conditioned stimulus vs. responses to the discriminated stimulus) and the output (responses of a trained neuron vs. responses of a control neuron). Therefore, the neuronal activity observed in our experiments may be considered as the instrumental action of the entire mollusk.

Fig. 5. Behavior of neuronal activity (means over n=16 neurons) during acquisition of instrumental conditioning. From top to bottom: the number of APs in the responses to the conditioned stimulus (medians, Mann-Whitney U test), mean membrane potential, and mean absolute value of the difference between two successive responses vs. the trial number. Confidence intervals are shown, p < 0.05. Solid black squares represent the data for the control neurons and open squares the data for the trained neurons.

2) Ways to adaptation: An animal must decide whether or not something important has happened in the environment (if not, we have habituation); whether there is a correlation between a signal and a punishment or reward (if not, we have pseudo-conditioning); whether something depends on the animal’s own actions (if not, we have classical conditioning); and which action influences the appearance of punishment or reward. At each stage the subject has only a fuzzy guide to forthcoming events, but gathers knowledge during learning.
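The sequence of questions above can be read as a simple decision cascade. The Python sketch below is a deliberately crisp simplification: each question is answered with a plain yes or no, whereas the text stresses that the subject has only a fuzzy guide at each stage. The function and argument names are hypothetical and introduced only for illustration.

```python
def classify_situation(important_event: bool,
                       signal_predicts_outcome: bool,
                       outcome_depends_on_action: bool) -> str:
    """Map the answers to the three questions onto the forms of behavior.

    Crisp booleans stand in for what the text describes as fuzzy,
    gradually accumulated knowledge.
    """
    if not important_event:
        # Nothing important happened in the environment.
        return "habituation"
    if not signal_predicts_outcome:
        # No correlation between the signal and punishment or reward.
        return "pseudo-conditioning"
    if not outcome_depends_on_action:
        # The outcome follows the signal regardless of the animal's actions.
        return "classical conditioning"
    # The outcome depends on the animal's own action; which action matters
    # still has to be found by trial and error.
    return "instrumental conditioning"


print(classify_situation(True, True, False))  # classical conditioning
```

The order of the checks mirrors the order of the questions in the text: each later question is only meaningful if the earlier ones were answered positively.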

During classical conditioning, a conditioned stimulus is always followed by the unconditioned stimulus, independently of the generation or failure of an AP in response to the conditioned stimulus. Accordingly, in our experiments the conclusion that neurons responding to the conditioned stimulus generated or failed to generate an AP with different probabilities was not significant for the first half of the training (test for binary sequences, Fig. 6(A), squares). This result corresponded to the properties of classical conditioning and contradicted the properties of an instrumental procedure. At the end of classical conditioning, the neurons mostly did generate APs, but by that point the experimental procedure had already been recognized as classical.
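The question asked above, whether a neuron generates or fails to generate an AP with different probabilities, can be illustrated with an exact two-sided binomial test against equal probabilities. This is a stand-in chosen for illustration: the paper names a "test for binary sequences" without specifying it, so the test below is an assumption, not the authors' procedure. The function name and the spike counts in the usage lines are hypothetical.

```python
from math import comb


def binomial_two_sided_p(successes: int, trials: int) -> float:
    """Exact two-sided binomial p-value for the null hypothesis p = 0.5.

    Sums the probabilities of all outcomes that are no more likely than the
    observed one (the usual "minimum likelihood" two-sided rule).
    """
    if not 0 <= successes <= trials:
        raise ValueError("successes must lie between 0 and trials")
    probs = [comb(trials, k) / 2 ** trials for k in range(trials + 1)]
    observed = probs[successes]
    # The small tolerance keeps symmetric outcomes from being dropped
    # because of floating-point rounding.
    p_value = sum(p for p in probs if p <= observed * (1 + 1e-9))
    return min(1.0, p_value)


# Hypothetical counts: APs in 8 of the first 10 trials, then in 10 of 10.
print(binomial_two_sided_p(8, 10))
print(binomial_two_sided_p(10, 10))
```

With few trials even a strongly one-sided record is not yet significant, which is consistent with the observation that conclusions only become reliable after data have accumulated over several trials.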

Evidently, an animal does not re-evaluate the expectation of a reward each time but follows its decision for several trials in succession. During conditioning, an animal also acquired knowledge about which tactile stimulus would more frequently be followed by the unconditioned stimulus (Fig. 6(A), rhombi). Although one of the two tactile stimuli had been chosen in advance as the conditioned stimulus, this conclusion became significant only after 5-7 combinations of the conditioned and unconditioned stimuli, once data had accumulated during training. An increase in the response to the conditioned stimulus also began only after about 7 combinations (Fig. 4, top). At this moment, the appearance of the unconditioned stimulus after the conditioned stimulus still did not depend on the generation or failure of an AP in response to the conditioned stimulus (Fig. 6(A), squares), in correspondence with a classical procedure.

Fig. 6. Statistical significance of the information collected by neurons during classical (A) and instrumental (B, C) conditioning. The difference in the number of events corresponding to the presence or absence of a given sign was evaluated by the test for binary sequences. The significance of the data accumulated during the given number of trials (abscissa) was calculated for each neuron; mean values across neurons and confidence intervals (p < 0.05) are shown. A: black squares show the significance of the conclusion that neurons, in response to the CS (always completed by the US), generated or failed to generate an AP with different probabilities; rhombi show the significance of the difference in the number of US appearances following the CS and the DS. B: rhombi show the significance of trained neuron participation in generation of the instrumental reaction; this was determined by the product of two probabilities: the significance of the positive difference between the frequencies of absence and presence of a US after AP generation, and the significance of the positive difference between the presence and absence of a US after AP failure. Either regularity alone is not sufficient: absence of a US after AP generation might mean habituation, while presence of a US after AP failure might mean classical conditioning. Black squares in B correspond to the significance of control neuron participation in generation of the instrumental reaction. A control neuron participated in the instrumental reaction when it generated or failed to generate an AP in the same trials as the trained neuron. C: rhombi as in A, but acquisition is slower than in A, since not every CS was completed by the US; this depended on the presence of the instrumental reaction generated by the trained neuron. Black squares in C plot the significance of the conclusion that, in the course of instrumental conditioning, during trials completed by the US, the control neuron generated or failed to generate an AP in response to the CS with different probabilities. This means that the control neuron did not accept the instrumental procedure as classical conditioning.

In this way, the system acquired data corresponding to classical conditioning. Instrumental conditioning proceeded in a much more complex manner. An animal evidently hopes to receive a certain result if it generates an instrumental reaction. Therefore, comparison of the cases in which the unconditioned stimulus appeared either after generation or after failure of the AP in the given neuron is the criterion of its AP participation in the instrumental reaction. Absence of a difference between the appearance of these two events means that a given neuron


Fig. 7. Time delay between the factors of neuronal activity. At the top: cross-correlation between two simultaneously recorded neurons during acquisition of classical conditioning. At the bottom: cross-correlation between the trained neuron membrane potential (input) and control neuron membrane potential (output). Cross-correlation (bars) was calculated for each neuron and significance of the mean value (across neurons; *: p < 0.05) for each lag was evaluated (Mann-Whitney U test). Curves: corresponding cross-correlation calculated for the mean, preliminarily averaged across neurons.

does not participate in the instrumental paradigm. An absence of difference for the whole brain means that the paradigm is not instrumental at all. In our experiments, trained neuron participation in the instrumental reaction was predetermined by the conditions of the experiment, while the significance of its participation in an instrumental reaction increased with the data accumulation (Fig. 6(B), rhombi). After trials 7-10, the trained neuron had acquired sufficient knowledge for a conclusion about its AP participation in the instrumental reaction. Nevertheless, by this point in training it still had not formed a correct reaction (Fig. 5, top).

In order to be effective, the animal must determine which action will lead to an avoidance of punishment. It was unknown a priori which reaction, AP generation or AP failure, would play a role in the instrumental action. AP failure in the trained and control neuron did not appear at the same time. However, the control neuron sometimes produced an AP during the same trials in which the trained neuron produced an AP and received information as if its reaction was essential for the instrumental reaction. Up to trials 15-17, both the trained and control neurons decreased their reaction to the conditioned stimulus (Fig. 5).

It is natural to suppose that a coincidence in time of the output reaction and the unconditioned stimulus would be accepted more easily by a neural system as a possible logical connection than a coincidence in time of an absence of the output reaction and a presence of the unconditioned stimulus. This is explained by the notion that an absence of a reaction

fails to determine which reaction must be absent in order to prevent an unconditioned stimulus appearance.

When a conditioned stimulus in our experiments failed to generate an AP in the trained neuron and caused the unconditioned stimulus appearance, it still induced APs in many other neurons. These were a control neuron and relative neurons, as well as the presynaptic neurons that produced a postsynaptic potential during AP failure in the trained neuron (Fig. 4, trials CS 1, CS 6, etc.). Therefore, AP failure in the response to the conditioned stimulus may be an attempt of the neural system to counteract an unconditioned stimulus appearance after an AP generation within numerous non-trained neurons. Until this moment, it was undetermined which neuron was responsible for the instrumental reaction and therefore the response to the conditioned stimulus decreased in both the trained and the control neurons.

If a coincidence in time of an AP in the conditioned response and the unconditioned stimulus was effective for formation of the negative reinforcement, the neural system had to decrease its reaction to the conditioned stimulus until data had accumulated showing that absence of an AP in the response entails the unconditioned stimulus. Consequently, the process of trial-and-error may continue after acquisition of consistent data about the instrumental behavior. For example, even if an animal knows the most preferable way to obtain a result, it sometimes searches for new ones.

Deciding which neuron is responsible for an instrumental reaction is also a problem that the neural system ought to resolve. Although our experimental procedure was directed to the trained neuron, the reactions of the trained and the control neurons were correlated (coefficient of correlation r_627 = 0.26, p < 0.0001). Therefore, the control and trained neurons sometimes generated similar reactions and control neurons acquired erroneous information as if participating in an instrumental reaction. In order to evaluate to what extent the control neuron received information that its participation would be important for the generation of an instrumental reaction, we compared the number of events during which the control and trained neurons generated similar reactions (both generated or both failed to generate an AP in responses to the conditioned stimulus at the same trials) and the number of events when reactions of the control and trained neurons did not coincide (Fig. 6(B), squares). At the beginning of training and up to 15 trials, it was uncertain whether a control neuron participates in the instrumental reaction (probability of its participation was around 0.5). In the middle of training, the control neuron generated an erroneous instrumental reaction and the trained neuron failed to participate in the reaction (Fig. 5, top), although the significance of control neuron participation in the instrumental reaction was lower than the corresponding value for the trained neuron (Fig. 6(B)).

This fact can be considered an example showing that statistical significance itself is not necessarily the basis for decision-making. During instrumental conditioning, the conditioned stimulus was not always completed by the unconditioned stimulus, because the appearance of the latter depended on the generation or failure of an AP in the trained


neuron’s response to the conditioned stimulus. Therefore, the differences between the responses to the conditioned and discriminated stimuli (Fig. 6(C), rhombi) were acquired later than in the case of classical conditioning (Fig. 6(A), rhombi).
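The comparison of coinciding and non-coinciding reactions of the trained and control neurons can be sketched in Python. The text does not specify which "test for binary sequences" was used, so the exact two-sided binomial (sign) test below is an assumption, and the two trial sequences are hypothetical illustration data rather than recorded values.

```python
from math import comb


def binomial_two_sided_p(k, n, p=0.5):
    """Exact two-sided binomial p-value: total probability of all
    outcomes that are no more likely than the observed count k."""
    pmf = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    threshold = pmf[k] * (1 + 1e-9)  # small tolerance for float ties
    return min(1.0, sum(q for q in pmf if q <= threshold))


def coincidence_counts(trained, control):
    """Count trials where both neurons reacted alike (both AP or both
    failure) and trials where their reactions differed."""
    same = sum(1 for t, c in zip(trained, control) if t == c)
    return same, len(trained) - same


# Hypothetical data: 1 = AP generated in response to the CS, 0 = AP failure.
trained = [1, 0, 0, 1, 0, 1, 1, 0, 1, 0, 0, 1]
control = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1]

same, different = coincidence_counts(trained, control)
p_value = binomial_two_sided_p(same, same + different)
print(same, different, p_value)
```

Under the null hypothesis that the control neuron coincides with the trained neuron by chance, the expected share of coinciding trials is 0.5, which matches the "probability of its participation was around 0.5" reading in the text.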

As we have pointed out earlier, the control neuron during instrumental conditioning did not receive reliable information about its participation in the instrumental reaction (Fig. 6(B), squares). For that reason, at the beginning of training the experimental procedure for the control neuron may look like classical conditioning with partial reinforcement. The control neuron acquired this information during trials completed by the unconditioned stimulus. Figure 6(C) (squares) demonstrates the significance of the conclusion that the control neuron in response to the conditioned stimulus generated or failed to generate an AP with different probabilities (in the trials completed by the unconditioned stimulus). Although this conclusion was not significant up to 10 trials and, therefore, corresponded to a classical paradigm, the control neuron could not elaborate classical conditioning, since up to the 10th trial the contingency between a conditioned stimulus and an unconditioned stimulus was still absent (Fig. 6(C), squares). In the middle of training, when the control neuron effectively generated an erroneous instrumental reaction, it collected almost significant data against its participation in classical conditioning (Fig. 6, squares). We may conclude that the system acquired information relative to instrumental conditioning through several complex stages with varying intermediate conclusions.

3) Change of a resting membrane potential during training: An important feature of conditioning is the change in the membrane potential during a training session. A decrease of the membrane potential by a motivational excitation can induce neuronal damage, alter ion homeostasis and disturb excitability. Although the average shifts of the membrane potential during acquisition of classical conditioning are insignificant [51], [53], this does not mean that classical conditioning failed to affect membrane potential at all. Unconditioned stimuli depolarized neurons (see Fig. 2), but this shift of membrane potential, evidently, can be compensated by homeostatic processes that manifest themselves after a break in conditioning for 5-10 minutes before the beginning of extinction.

After a break in the pairing, a compensational mechanism continues to hyperpolarize neurons and, at the beginning of extinction, the increase in the membrane potential becomes significant (Fig. 4, bottom). Hyperpolarization, according to a classical neuron's property, could not be the reason for response augmentation after a break (Fig. 4, top). Cancellation of the unconditioned stimulus during a session of extinction gradually decreased the influence of the compensation, and membrane potential recovered (Fig. 4, bottom), while response to the conditioned stimulus decreased (Fig. 4, top). It should be emphasized that an increase in neuronal responses did not result from a decrease in membrane potentials. On the contrary, compensational hyperpolarization led to augmentation of the neuronal responses, as in protected neurons after any damage evoked by the excitation.

During instrumental conditioning, membrane potentials of the neurons changed differently in the trained and control

Fig. 8. Phase portraits of the dependencies of the number of APs in the response to the US on the membrane potential in the trained (top) and control (bottom) neurons. Ordinate: response to the US; abscissa: membrane potential, relative units. •: beginning of acquisition. 1 and 40: responses to the first and the last presentations of the US. ¯: the trials in which the instrumental reaction in the trained neuron and the erroneous instrumental reaction in the control neuron arise.

neurons (Fig. 5, middle). The AP in the trained neuron arose when this abundant depolarization began to decrease. During trials 17-31, when the trained neuron response was substituted by the control neuron response, the erroneous instrumental reaction of the control neuron also arose during this neuron's hyperpolarization (Fig. 5, middle; Fig. 8).

Depolarization of the membrane potential did not correspond to an augmentation in the value of the neuronal responses either in the trained or in the control neuron. Therefore, depolarization is not the immediate cause of AP generation. For example, Figure 3 demonstrates that the control neuron generated a maximal response to the unconditioned stimulus when it was hyperpolarized (response CS 22) and a minimal response to the unconditioned stimulus when its membrane potential decreased (response CS 38). The trained neuron generated a minimal response to the unconditioned stimulus when it was the most depolarized (response CS 30). Analogously, the trained neuron failed to generate an AP in response to the conditioned stimulus when its membrane potential decreased (response CS 30), but it did generate an AP in response to the conditioned stimulus when it was hyperpolarized (response CS 41). On average, after 25 trials, when the instrumental reaction began to arise, the membrane potential of the trained neuron shifted toward depolarization, reached a minimum at trials 30-32, and then rapidly increased (Fig. 5, middle).

Evidently, during instrumental conditioning the regularities of unconditioned stimulus presentation were much more complex, and a shift of the membrane potential after an unconditioned stimulus presentation could not be compensated. We may suppose that membrane potential in the control neuron was overcompensated, while compensation in the trained neuron did not overcome the depolarization evoked by the unconditioned stimulus.

These regularities are also present for responses to the unconditioned stimulus. Phase portraits of the dependencies between the value of the responses and the membrane potential reveal that when the neuron generated an instrumental reaction (whether correct or erroneous), the response to the unconditioned stimulus was larger during neuronal hyperpolarization (Fig. 8). This follows from the positive correlation between membrane potential and responses to the unconditioned stimulus (circles in Fig. 8).

This phenomenon corresponds to recovery of the responses during classical conditioning, when compensatory hyperpolarization arises. Therefore, generation of instrumental reactions can be considered as a consequence of the recovery of neuronal excitability after excitotoxic damage.

4) Instability of neuronal reactions during learning: The general outlines of the neuronal analogs of instrumental and classical conditioning, which are demonstrated in Figures 2 and 4, do not display the fine dynamics of elaboration, in particular the high level of trial-to-trial instability of responses. In fact, during instrumental learning, the firing pattern was especially irregular. The number of APs in the response was not stationary from trial to trial, but the corresponding differenced process was stationary and had a mean of zero.

In order to estimate the trial-to-trial variability of the response, we calculated the difference between each two neighboring responses in the time series data. Figure 5 (bottom) shows the current trial-to-trial variability of the neuronal activity. Although the mean difference during training did not significantly deviate from the zero level (not shown in Fig. 5), the mean absolute difference of the responses changed significantly during training.

This value manifests instability of the responses. The initial variability of the trained neuron responses exceeded the variability of the control neuron responses. The trained neuron response instability was maximal during the origination of the instrumental reaction in this neuron (trials 26-33), whereas the instability of the control neuron response was maximal during the transient substitution of the instrumental reaction by the AP of the control neuron (trials 17-31). During this period of substitution, the instability of the trained neuron reaction decreased. Evidently, the instability of responses increases when the neural system tries to use a given response as an instrumental reaction.
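The two quantities discussed above, the mean difference between neighboring responses and the mean absolute difference used as the measure of instability, can be sketched as follows. The response counts are hypothetical illustration data, and the function names are ours, not the authors'.

```python
def first_differences(responses):
    """Difference between each response and the preceding one."""
    return [b - a for a, b in zip(responses, responses[1:])]


def trial_to_trial_variability(responses):
    """Return (mean difference, mean absolute difference).

    The first value stays near zero when there is no overall trend;
    the second one measures the instability of the responses.
    """
    diffs = first_differences(responses)
    mean_diff = sum(diffs) / len(diffs)
    mean_abs_diff = sum(abs(d) for d in diffs) / len(diffs)
    return mean_diff, mean_abs_diff


# Hypothetical number of APs in the response to the CS on successive trials.
responses = [2, 0, 3, 1, 2]
mean_diff, instability = trial_to_trial_variability(responses)
print(mean_diff, instability)
```

The example shows how a series can have a zero mean difference while the mean absolute difference is clearly non-zero, which is the situation described for the differenced responses.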

Instability of neuronal reactions is the basis of trial-and-error behavior, but trial-and-error behavior is not stochastic. We found rigorous regularities in the choice of the output reaction, such as alternate generation of an AP in the responses. Figure 9 demonstrates that an increase in the current response led to a significant decrease in the following response. This was observed for both classical and instrumental conditioning. Neurons during classical conditioning and control neurons during instrumental conditioning replaced the periods of instability with periods of stability (Fig. 9, top and bottom). In

Fig. 9. Autocorrelation of differencing between two consecutive responses to the CS during training. From top to bottom: classical conditioning; instrumental conditioning, trained neuron; instrumental conditioning, control neuron. Abscissa: value of autocorrelation; ordinate: lags. Curves: confidence interval for autocorrelation (p < 0.05). Open bars: autocorrelation of the differencing; closed bars: autocorrelation of the absolute value of differencing (instability of responses).

the trained neuron, a short period of instability (AP generation - AP failure or AP failure - AP generation) was replaced by a short period of stability following a period of irregular activity (Fig. 9, middle, lags 4-7).
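The regularity that an increase in the current response is followed by a decrease in the next one corresponds to a negative lag-1 autocorrelation of the differenced series. A minimal sketch is given below; the strictly alternating response series is hypothetical, and the plain sample autocorrelation used here is an assumption, since the text does not state which estimator or confidence bounds were applied.

```python
def first_differences(responses):
    """Difference between each response and the preceding one."""
    return [b - a for a, b in zip(responses, responses[1:])]


def autocorrelation(series, lag):
    """Sample autocorrelation of a series at a non-negative lag."""
    n = len(series)
    mean = sum(series) / n
    variance = sum((x - mean) ** 2 for x in series)
    if variance == 0:
        return 0.0  # a constant series carries no correlation structure
    covariance = sum(
        (series[i] - mean) * (series[i + lag] - mean) for i in range(n - lag)
    )
    return covariance / variance


# Hypothetical alternating responses (AP generation, AP failure, ...).
responses = [3, 0, 3, 0, 3, 0, 3, 0]
diffs = first_differences(responses)

# Open bars in Fig. 9: autocorrelation of the differences themselves.
signed_lag1 = autocorrelation(diffs, 1)
# Closed bars in Fig. 9: autocorrelation of the absolute differences.
absolute_lag1 = autocorrelation([abs(d) for d in diffs], 1)
print(signed_lag1, absolute_lag1)
```

For this idealized series the signed differences alternate in sign, so their lag-1 autocorrelation is negative, while the absolute differences are constant and therefore show no structure at all.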

Evidently, training is not only a gradual memorizing of the correct instrumental reaction but, probably, a series of consistent attempts to achieve a healthy result. The membrane potentials in the trained and control neurons may display coordinated changes. Figure 7 presents the cross-correlation between membrane potentials of the simultaneously recorded neurons. Classical conditioning (Fig. 7, top) revealed synchronization of the membrane potential changes, as the central peaks of the cross-correlation were of high significance. The correlation between the neuronal responses cannot be explained by their similar changes during training, since cross-correlation between the membrane potentials, preliminarily averaged across the neurons, was almost absent (Fig. 7, top, curve). Instrumental learning revealed non-symmetric significant peaks (Fig. 7, bottom). Mean membrane potentials in the trained and control neurons changed oppositely, since cross-correlation of the preliminarily averaged data was negative (Fig. 7, bottom, curve). Within an experiment, changes of membrane potential in the control neuron were 3-5 trials, i.e. approximately 10-15 min, ahead of the corresponding changes in the trained neuron (the non-symmetric peak of cross-correlation for both averaged and non-averaged data was positive). This time is much longer than the time needed for an electrical signal to spread through neural tissue, but it is comparable with the time for cardinal reorganization of biochemical processes


in the tissue.
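How a non-symmetric cross-correlation peak reveals that one neuron leads the other can be illustrated with a short sketch. The membrane potential series are hypothetical: the "trained" series is constructed as a copy of the "control" series delayed by 3 trials, and the Pearson-type lagged correlation is our assumption about the estimator.

```python
def cross_correlation(x, y, lag):
    """Pearson correlation between x[t] and y[t + lag]."""
    if lag >= 0:
        a, b = x[: len(x) - lag], y[lag:]
    else:
        a, b = x[-lag:], y[: len(y) + lag]
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    norm_a = sum((p - mean_a) ** 2 for p in a) ** 0.5
    norm_b = sum((q - mean_b) ** 2 for q in b) ** 0.5
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return cov / (norm_a * norm_b)


# Hypothetical membrane potentials (relative units), one value per trial.
base = [0, 1, 3, 2, 5, 4, 6, 3, 1, 0, 2, 4, 7, 5, 2]
control = base[3:]   # the control neuron changes first ...
trained = base[:-3]  # ... the trained neuron repeats it 3 trials later

lags = range(-4, 5)
best_lag = max(lags, key=lambda k: cross_correlation(control, trained, k))
print(best_lag, cross_correlation(control, trained, best_lag))
```

A peak at a positive lag means that changes in the first argument (here the control neuron) precede the corresponding changes in the second one, as reported for the 3-5 trial lead in the text.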

C. Discussion of experimental results

Our data revealed that the conventional properties of a neuron, such as membrane potential maintenance and spike generation, were disturbed during instrumental learning. The trained neuron depolarized during the development of a motivation. Formation of the local instrumental reactions (spike generation) at the end of elaboration was accompanied by a recovery of the membrane potential. The origination of both true and erroneous instrumental reactions was observed during a transient increase in the membrane potential and the rise of neuronal instability in the corresponding neuron. Therefore, depolarization is not the immediate cause of AP generation, while hyperpolarization is not the immediate cause of AP failure. It is reasonable to suppose that the motivational excitation is induced by excitotoxic damage in the trained neuron. Compensational hyperpolarization decreased the damage and recovered spike generation. A similar phenomenon was found in the motor neuron of the mollusk Aplysia after the delivery of strong sensitizing stimuli [8].

Note that the neuron itself may be the functional unit of reward [26], [46] and can produce self-administration of psycho-stimulants, opiates and DA-agonists [46]. Both physical and psychological stressors, which evoke damage in neurons, facilitate the acquisition of drug self-administration [37].

It has been shown that neurons of morphine-dependent animals are depolarized relative to the control animals [29] and there is evidence for a loss of neurons in many regions of the brain in chronic alcoholics [14], [4]. Homeostatic mechanisms of cells, evidently, counteract damage by means of a decrease in excitability. Similarly, overprotection leads to a homeostatic increase in excitability and the state of the cell shifts to damage (see Appendix A1 for further discussion).

It has been demonstrated above that in a simple task (classical conditioning), neuronal homeostasis compensates for harmful changes in the neuronal membrane properties, whereas during complex instrumental behavior, the spike generation capability is injured in the specific trained neuron, although the intensity of harmful influences was smaller. During instrumental conditioning, the trained neuron accumulated depolarization from trial to trial. This depolarization was evoked by the unconditioned stimulus, while the control neuron overcompensated depolarization and its membrane potential increased until origination of the instrumental reaction in the trained neuron. Thus, predictability of environmental alterations is essential for the development of compensation.

We may suppose that compensation is directed to the causes of alterations and, when these causes are not determined (as during elaboration of an instrumental reflex), compensation is not able to cope with the cellular alterations. Excitation is the most important reason for cell damage, and the forestalling influence of compensation may be directed to countering a change in excitability. It has been demonstrated [51], [53] that during learning excitability is changed selectively and transiently within the response evoked by the conditioned stimulus and does not concern responses evoked by the discriminated stimulus, which is never paired with the punishment. The forestalling influence of compensation can be directed to these selective changes in excitability (see Appendix A2 for related discussion).

The reward consists of stimuli which stabilize the firing pattern of neurons, while punishment de-stabilizes neurons, so that a given stimulus elicits variable responses from trial to trial. The disturbance of the equilibrium in the damage-protection mechanism leads to a switching of the compensation mechanism into delayed (in time) damage after a correct reaction or into delayed protection after an incorrect reaction. A similar phenomenon is well known as a consequence of opiate consumption: opiates induce a transient neuronal protection and delayed damage. Therefore, one sees that under ambiguous conditions, a neuron tends to vary its behavior.

Sporadic processes of compensation restore the excitable membrane function, and origination of the instrumental reaction prevents the appearance of the unconditioned stimulus and induces reduction of over-depolarization and additional protection. When the regularities for punishment occurrence have been established, the punishment becomes predictable and the correct reaction does not induce a positive reinforcement, while a negative reinforcement continues to produce transient damage and a delayed protection. The excitable membrane recovers its function and the trained neuron emits an instrumental AP. Voluntary actions arise as the result of counteraction of endogenous compensational mechanisms and excitotoxic damage of specific neurons, and thus anticipate the exogenous compensation evoked by a reward. Thus, voluntary actions, which are targeted to compensate metabolic alterations, are goal directed (see Appendix A3 for an additional consideration).

D. Rules of the adaptation

Now we can formulate two simple rules, which explain part of the above-mentioned data (see also [41]):

Rule 1 Activity of a neuron increases with neuronal damage and decreases with neuronal protection.

Rule 2 The damage increases with a punishment or with a high activity of the surrounding neurons and decreases with a reward and non-violent activity of the surrounding neurons.

These rules, however, include neither the compensational mechanism nor the neuron’s estimation of an environmental reaction: “expectation of punishment” in the considered case or “expectation of reward” in situations like drug consumption.

In order to take into account the compensational processes in a neural tissue, let us consider variants of the neuron’s estimation of correlation between its response and the environmental reaction:

a. expectation of (AP → punishment) is high;
b. expectation of (absence of AP → punishment) is high;
c. expectation of (AP → punishment) is low;
d. expectation of (absence of AP → punishment) is low.

Cooperative properties of a or b correspond to classical conditioning; c and d are like habituation; a and d are instrumental


conditioning for learning to react; b and c are instrumental conditioning for learning to refrain from reaction. The neuron must reveal which task (habituation, classical conditioning or instrumental conditioning) is presented. (Our case is b and c for the trained neuron and absence of regularities for the control neuron.) Punishment leads to damage, while reward leads to protection. The degree of these alterations decreases with a decrease in the current level of “expectation of punishment”. Each value of the expectation of punishment is determined by the corresponding variant a-d of correlation between behavior of the neural system and reaction of the environment.

The rules which describe the phenomenon of a neuron’s “expectation” and the compensational processes in a neural tissue can be formulated as follows:

Rule 3 Expectation of punishment increases if the corresponding correlation between AP generation and punishment is high and decreases otherwise.

Rule 4 Compensation is turned on either after an AP generation or after AP absence, depending upon for which one of the cases (a or b) the expectation of punishment is higher. If these expectations are close, then compensation is turned on for both types of the neuronal response.

and the following “sub-rules” can be added to Rule 1 and Rule 2:

Sub-Rule 1 Even if punishment follows after a signal, but expectation of this punishment is low, then a response to this signal can decrease during the nearest time.

Sub-Rule 2 Reward is high if expectation of the punishment is high but the punishment is absent. Otherwise, reward is small.

III. THEORY: FUZZY DYNAMICS OF NEURONAL ADAPTATION

Now we can try to develop a mathematical theory of the adaptation of a single neuron by representing the neuron’s behavior as a “trajectory” in a 7D state space S = {x, Θ}:

x1(t) ← relative value of AP activity,
±x2(t) ← relative value of the protection/damage,
±x3(t) ← relative value of the protection/damage compensation,
Θν(t) ← relative value of the expectation of the environment reactions (ν = a, b, c, d),

where the environment is described by the variable r, which is determined by environmental reactions: punishment, reward or absence of the reaction. In our case, r can be represented as a function of AP activity and expectation of a punishment:

$$ r = r(x_1, \Theta_\nu), \tag{1} $$

where negative r corresponds to the punishment and positive r corresponds to the reward.

It should be emphasized that these variables have considerably different relations with the experimental data. Variable x1 is directly measurable. Variable x3 is not measured directly, but its behavior is qualitatively reflected in the behavior of the membrane potential (which is directly measured). By contrast, the variables x2 and Θν are purely phenomenological and cannot be obtained from the experimental data (in physics such quantities are often called “hidden” parameters). These quantities, of course, are excluded from the final results, but their analysis helps to understand implicit motives of the system behavior.

The rules formulated above (Rules 1-4 and the sub-rules) are expressed as natural language sentences (as “perceptions” in Zadeh’s terminology [60]), rather than as precise “mathematically looking” laws. One of the main causes of such a form of representation of inferences from experimental data is quite general. In a real experiment only a few parameters of a system are observable and controllable, while a number of the other ones are “hidden” or remain out of control. For a complex phenomenon this leads to instability in experimental results from trial to trial and makes it difficult to estimate the accuracy of the obtained values. In fact, for such a phenomenon a fair description of the system behavior is based on our “perception” of observed tendencies rather than on precise numerical values of the experimental data. The second cause is specific to the considered system. Since a neuron is an elementary “unit” of processing of perceptions, such a structure of the rules directly reflects the biological meaning of neuronal functioning.

So, if we want to develop an adequate mathematical theory of neuron adaptation, we are, in some sense, compelled to search for an apparatus which could directly operate with “perceptions” as with mathematical objects.

A. Introduction to Fuzzy Dynamics

Two decades ago L. Zadeh proposed the Theory of Possibility [59], which is intended to deal with qualitative and imprecise knowledge. Recently he has discussed a general way of representing a problem expressed in terms of perceptions in a form suited to mathematical and computerized solutions [60]. Briefly, this consists of the following steps:

Z1 Expression of a problem in terms of a common human language: Natural Language (NL) representation.

Z2 Reformulation of the problem by using formalized definitions of fuzzy logic (e.g. “linguistic variables” [58]): Precise Natural Language (PNL) representation.

Z3 Rewriting of the PNL description by using an appropriate symbolic representation of the fuzzy logic notations: Protoform representation.

Z4 Mathematical or algorithmic representation of the Protoform corresponding to a mathematical or computerized solution of the problem.

Below we show how this general concept can be used for developing Fuzzy Dynamics, a theory of system evolution which directly operates with qualitative terms of common human language.

The first step is formulation of the dynamic laws in common human language terms (NL level):

A. “A system can be at a given point (of a state space) if transition between this point and the point where the system was earlier is possible”.


Since we want to develop a theory of system evolution, i.e. “moving” of a system in the corresponding state space, we have to consider the local topology of this space. In the most important cases we can restrict ourselves to spaces with a simple local topology, like:

B. “The system state is in some small domain of a state space if it is in at least one of the parts of this domain”.

It is somewhat surprising, but these very general (and almost trivial) claims lead to unique forms of the dynamical equations, which are nothing other than somewhat generalized versions of the famous dynamical equations of theoretical physics (the Hamiltonian equations, the equations of Liouville, of Schrödinger, of Dirac, etc.).

The next step is the PNL-expressed Dynamic Laws, where the system states are described by the “possibility” [59] that at a time t the system state is in the neighborhood of some point x of the system state space.

A. The possibility that the state of the system is in the neighborhood of the state x at the time t + δ can be considered as:
the possibility that the system was in the neighborhood of the state x1 at the time t AND its transfer to the neighborhood of x during the time interval δ was possible,
OR
the possibility that it was in the neighborhood of a state x2 at the time t AND its transfer to the neighborhood of x during the time interval δ was possible,
OR
... and so on, for all possible states xi of the system;

B. If a system “physical state space” has a simple local topology and if a small neighborhood U consists of two non-intersecting domains U1 and U2, then the possibility that the system state is in U is equal to the possibility that the system state is in U1 OR it is in U2.

Let us now designate:

m(Ux, t): the possibility that the system state x is in the neighborhood Ux at the time t;

Pδ(Uy, Ux, t): the possibility that the system state passes from Uy to Ux during the time interval δ;

(mA ∨ mB) and (mA ∧ mB): the logical connectives {mA OR mB} and {mA AND mB}.

Then the expressions A and B can be rewritten as a Protoform:

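$$ m(U_x, t + \delta) = \underbrace{\Big\{ \dots \vee \big( P_\delta(U_x, U_{x_i}; t) \wedge m(U_{x_i}, t) \big) \vee \dots \Big\}}_{\text{for all } x_i}, \tag{2} $$

$$ m(U_1 \cup U_2, t) = \big( m(U_1, t) \vee m(U_2, t) \big). \tag{3} $$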

In order to obtain mathematical equations corresponding to the protoform (2)-(3), we should find a plausible mathematical representation for the possibilities m(Ux, t) and Pδ(Ux, Uxi; t).
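To make the structure of protoform (2) concrete, the following sketch propagates a possibility distribution over a small discrete state space. It assumes, purely for illustration, that OR is represented by max and AND by min (the classical Zadeh connectives); at this point the text has not yet fixed either representation, and the transition possibilities in `P` are hypothetical numbers.

```python
def fuzzy_step(m, P):
    """One time step of protoform (2) on a finite state space.

    m[xi]    : possibility that the system is in state xi at time t
    P[x][xi] : possibility of the transfer xi -> x during the interval
    OR is taken as max and AND as min (an illustrative assumption).
    """
    states = range(len(m))
    return [max(min(P[x][xi], m[xi]) for xi in states) for x in states]


# Hypothetical 3-state system that certainly starts in state 0.
m0 = [1.0, 0.0, 0.0]
P = [
    [1.0, 0.2, 0.0],  # transfers into state 0
    [0.7, 1.0, 0.2],  # transfers into state 1
    [0.0, 0.7, 1.0],  # transfers into state 2
]

m1 = fuzzy_step(m0, P)
m2 = fuzzy_step(m1, P)
print(m1, m2)
```

In this sketch the possibility of a state never exceeds the weakest link of the most favorable path leading to it, which is the qualitative content of statement A.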

B. Fuzzy Dynamics equations

Since we consider small neighborhoods and small time intervals, it is reasonable to discuss the limit Ux → x, when the neighborhood collapses to the point. Generally speaking, there are two cases:

limUx→x

m (Ux, t) = µ(x, t)

limUx→x

m (Ux, t) ≡ 0.

We will consider here only the first case (see [17], [42] for the second one). In this case the possibilities m(Ux, t) and Pδ(Uy, Ux, t) are well defined at all points of the physical state space, so it is reasonable to represent these possibilities as ordinary functions, with the correspondence:

$$m(U_x, t) \to \mu(x, t), \tag{4}$$

$$P_\delta(U_x, U_y, t) \to P(x, x + v\delta; t) = \Gamma(v, x, t), \tag{5}$$

where v = (y − x)/δ. These functions have the following meaning:

µ(x, t) is the possibility that the system is at the “point” x at the time t,

Γ(v, x, t) is the possibility that the system has the “velocity” v at the point x at the time t.

It is important that Γ(v, x, t) can be obtained straightforwardly from dynamics rules expressed in the “linguistic” form, as on page 99 (see Section III-C.2).

In order to avoid misunderstanding, it should be emphasized that there is a considerable difference between the concepts of possibility and probability [59], and it is reflected in their mathematical properties. For example, probability density functions must be integrable but may be unbounded, while possibility functions have the opposite features: they are bounded but need not be integrable. It is convenient to choose:

$$\sup_x \mu(x, t) \equiv 1.$$
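The difference between the two normalizations can be illustrated numerically. The following is a minimal sketch, not taken from the paper: the grid, the Gaussian profile and its width are assumptions chosen only to make the contrast visible.

```python
import numpy as np

# Hypothetical grid and profile width, chosen only for illustration.
x = np.linspace(-5.0, 5.0, 1001)
dx = x[1] - x[0]
g = np.exp(-0.5 * (x / 0.1) ** 2)   # narrow, unnormalized profile

# Possibility distribution: bounded, normalized so that sup_x mu = 1.
mu = g / g.max()

# Probability density: integrable, normalized so that its integral is 1.
rho = g / (g.sum() * dx)

print("sup mu       =", mu.max())
print("integral rho =", rho.sum() * dx)
print("sup rho      =", rho.max())       # exceeds 1 for a narrow profile
print("integral mu  =", mu.sum() * dx)   # not equal to 1 in general
```

The same profile thus yields a density whose peak is larger than 1 and a possibility function whose integral differs from 1, which is the contrast described above.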

In order to find a permitted representation of the connective OR, we consider a small domain U consisting of two nearest-neighbor domains:

$$U = U_{x_1} \cup U_{x_2},$$

and take the limit:

$$U_{x_1} \to x_1, \quad U_{x_2} \to x_2, \quad x_1 \to x_2 \to x.$$

Then, in accordance with (3), one obtains:

$$\big( \mu(x, t) \vee \mu(x, t) \big) = \mu(x, t), \tag{6}$$

(i.e., in the considered case the correct representation of the connective OR should be idempotent).

Together with the common requirements of monotonicity, commutativity and boundary conditions [5], [23]:

$$\mu_1 \le \mu_2 \;\Rightarrow\; (\mu_0 \vee \mu_1) \le (\mu_0 \vee \mu_2),$$

$$(\mu_1 \vee \mu_2) = (\mu_2 \vee \mu_1),$$

$$(0 \vee \mu_1) = \mu_1,$$


expression (6) leads to a unique permissible representation:

$$(\mu_1 \vee \mu_2) = \max(\mu_1; \mu_2). \tag{7}$$
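The role of idempotency in singling out MAX can be checked on a grid. The sketch below is illustrative only: the helper names are hypothetical, and the probabilistic sum is included as a familiar OR-like operator that meets the other three requirements but fails idempotency.

```python
from itertools import product

def s_max(a, b):
    return max(a, b)

def s_prob(a, b):
    # Probabilistic sum: commutative, monotone, 0 is neutral, NOT idempotent.
    return a + b - a * b

GRID = [i / 10 for i in range(11)]
EPS = 1e-12

def is_idempotent(s):
    return all(abs(s(a, a) - a) < EPS for a in GRID)

def is_commutative(s):
    return all(abs(s(a, b) - s(b, a)) < EPS for a, b in product(GRID, repeat=2))

def has_zero_neutral(s):
    return all(abs(s(0.0, a) - a) < EPS for a in GRID)

def is_monotone(s):
    return all(s(c, a) <= s(c, b) + EPS
               for a, b, c in product(GRID, repeat=3) if a <= b)

for name, s in [("max", s_max), ("probabilistic sum", s_prob)]:
    print(name, is_idempotent(s), is_commutative(s),
          has_zero_neutral(s), is_monotone(s))
```

The uniqueness itself follows from a short standard argument: monotonicity with the boundary condition gives (µ1 ∨ µ2) ≥ (µ1 ∨ 0) = µ1 and likewise ≥ µ2, while monotonicity with idempotency gives (µ1 ∨ µ2) ≤ (m ∨ m) = m for m = max(µ1; µ2).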

The connective AND, however, remains unfixed and must be represented as an aggregation operator [23]:

$$(\mu_1 \wedge \mu_2) = T\{\mu_1; \mu_2\}. \tag{8}$$

If the aggregation operator is symmetric, associative and has 1 as its neutral element (i.e. T{µ; 1} = µ), it is called a t-norm (a good exposition of t-norms is given in [23]).

Representations (4), (7) and (6) convert the protoform (2) into the equation:

$$\mu(x, t+\delta) = \sup_v T\{\Gamma(v, x, t);\ \mu(x - v\delta, t)\}. \tag{9}$$
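A single step of Eq. (9) can be sketched on a uniform grid. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `fuzzy_step`, the restriction of velocities to integer grid shifts, the zero possibility outside the grid, and the sample values of Γ are all hypothetical.

```python
import numpy as np

def fuzzy_step(mu, gamma, shifts, tnorm=np.minimum):
    """One step mu(x, t) -> mu(x, t + delta) of Eq. (9) on a uniform grid.

    mu     : array (n,), possibility at time t
    gamma  : array (len(shifts), n), gamma[k, j] = Gamma(v_k, x_j, t)
    shifts : integers s_k = v_k * delta / dx (assumed to be whole grid steps)
    """
    n = mu.shape[0]
    new = np.zeros(n)
    for k, s in enumerate(shifts):
        shifted = np.zeros(n)          # mu(x - v*delta, t), zero off-grid
        if s >= 0:
            shifted[s:] = mu[:n - s]
        else:
            shifted[:s] = mu[-s:]
        # sup over v is a max over the finite velocity set
        new = np.maximum(new, tnorm(gamma[k], shifted))
    return new

# Example: the state is certainly at grid node 5; moving right is fully
# possible, staying is half possible, moving left is barely possible.
mu0 = np.zeros(21)
mu0[5] = 1.0
shifts = [-1, 0, 1]
gamma = np.array([[0.2] * 21, [0.5] * 21, [1.0] * 21])
mu1 = fuzzy_step(mu0, gamma, shifts)
print(mu1[4], mu1[5], mu1[6])
```

With the MIN t-norm the possibility at each reachable node is simply the possibility of the velocity that leads there, and the supremum over nodes stays equal to 1 because one velocity is fully possible.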

In spite of the arbitrariness of the t-norm in Eq. (9), it should be noted that the MIN t-norm plays a specific role in the representation of the aggregation operator. Due to the inequality [23]:

$$T\{X; Y\} \le \min(X; Y),$$

such a representation gives an “optimistic” solution of the dynamics problem, because in accordance with Eq. (9) one has:

$$\mu_{\min\ t\text{-norm}}(x, t) \ge \mu_{\text{arbitrary}\ t\text{-norm}}(x, t). \tag{10}$$
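The inequality behind Eq. (10) can be checked numerically for a few textbook t-norms. The sketch below is illustrative; the product and Łukasiewicz t-norms are standard examples chosen here, not operators singled out by the paper.

```python
import numpy as np

def t_min(a, b):
    return np.minimum(a, b)

def t_prod(a, b):
    return a * b

def t_lukasiewicz(a, b):
    return np.maximum(0.0, a + b - 1.0)

# All pairs (a, b) on a 101 x 101 grid of the unit square.
a, b = np.meshgrid(np.linspace(0.0, 1.0, 101), np.linspace(0.0, 1.0, 101))

for name, t in [("product", t_prod), ("Lukasiewicz", t_lukasiewicz)]:
    below_min = bool(np.all(t(a, b) <= t_min(a, b) + 1e-12))
    neutral_one = bool(np.allclose(t(a, 1.0), a))
    print(name, "T <= min:", below_min, "| T{a; 1} = a:", neutral_one)
```

Because the supremum in Eq. (9) is monotone in its argument, the pointwise bound T ≤ min carries over from one time step to the next, which is what Eq. (10) states.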

Taking the limit δ → 0, we can obtain the continuous-time dynamics corresponding to Eq. (9). Note that in this limit the function Γ and the aggregation operator are related to each other: the supremum ΓS = sup_v Γ(v, x, t) must be the neutral element of the aggregation operator. In particular, if T is a t-norm, it should be:

$$\Gamma_S(x, t) \equiv 1. \tag{11}$$

It can be shown (see [16] for details) that in the limit δ → 0 Eq. (9) leads to the Liouville-like equation:

$$\frac{\partial \mu}{\partial t} = -\sum_n V_n(x, p, t, \mu)\, \frac{\partial \mu}{\partial x_n}, \tag{12}$$

where the fuzzy velocity V is obtained from the equation:

$$p_n = \lambda\, \frac{\partial \Gamma(V, x, t)}{\partial V_n}, \tag{13}$$

where λ is a Lagrange multiplier, which is found from:

$$\Gamma(V, x, t) = f(\mu), \tag{14}$$

where [f; 1] is the neutral interval of the aggregation operator, i.e. f(µ) ≤ Γ ≤ 1 is the solution of the equation T{Γ; µ} = µ.

Eqs. (9)-(14) are the basic equations of Fuzzy Dynamics. “Trajectories” ξ(t) with a constant possibility µ(ξ(t), t) ≡ µ(ξ(0), 0) = µ0(x) obey the system of equations:

$$\frac{d\xi_n(t)}{dt} = V_n(\xi, p, t, \mu_0), \tag{15}$$

$$\frac{dp_n(t)}{dt} = -\sum_k p_k\, \frac{\partial V_k(\xi, p, t, \mu_0)}{\partial \xi_n} - \lambda\, \frac{\partial f}{\partial \mu_0}\, p_n, \tag{16}$$

with the initial conditions:

$$\xi(0) = x, \quad p(0) = \nabla \mu_0(x).$$

Note that the bundle of trajectories ξ*(t) that satisfy µ(ξ*(t), t) ≡ 1 can be considered as a “tube” of the most preferable ways of evolution of a system.

In fact, the trajectories of constant possibility are directly related to the “dynamics of perceptions”. For example, the truth value of the expression “|x| is Large” changes in time as:

$$L_{|\xi|}(t) = \sup_x T\{L(|\xi(t)|);\ \mu_0(x)\},$$

where L_{|ξ|}(t) is the truth value of the perception that |ξ| is large at the time t.

Equations (15)-(16) describe the evolution of a system in an extended state space, which consists of two components:

the “physical” component: x,
the “informational” component: p.

Since Eq. (16) is similar to the equation for the physical momentum, p could be called the “informational momentum”, which shows the direction from a given state to the most preferable nearest one.

Actually, (15)-(16) are more general than common dynamical equations, because they include both differential equations and differential inclusions [2].

C. Dynamics of Neuronal Adaptation

In the fuzzy dynamics approach, the function Γ(v, x, t), which determines the “informational forces”:

$$-\sum_k p_k\, \frac{\partial V_k}{\partial x_n} - \lambda\, \frac{\partial f}{\partial \mu_0}\, p_n,$$

and the fuzzy velocities:

$$V_n(p, x, t, \mu_0),$$

can be derived immediately from the rules 1-4 (page 99).

1) Linguistic variables and protoform of the dynamics law: In order to do this, let us introduce the “linguistic variables” Positive, Negative, etc. and symbolize them as:

P - Positive,
N - Negative,
PL - Positive Large,
S - Small,
NL - Negative Large.

Then the protoforms of the rules 1-4 can be represented as in Table I, where x, Θ correspond to the trained neuron and y, Θ′ to the surrounding ones (control neurons in our experiments). The quantities vi and vα designate the rates of change of the variables xi and Θα.

According to the Sub-Rules 1, 2 (see page 99), the environmental response r for the trained neuron is defined as

r is NL ⇔ x1 is S
r is PL ⇔ x1 is PL ∧ Θa is PL
r is S ⇔ x1 is PL ∧ Θa is S

while for the surrounding (control) neurons:

r is NL ⇔ x1 is S
r is PL ⇔ x1 is PL ∧ [(Θa is PL) ∨ (Θb is PL)]
r is S ⇔ x1 is PL ∧ [Θa is S ∨ Θb is S]


TABLE I

PROTOFORMS OF THE DYNAMICS RULES

1.1  v1 is P ← x2 is NL
1.2  v1 is N ← x2 is PL ∨ [r is NL ∧ Θb is S]
2.1  v2 is P ← (r is PL ∧ y1 is S) ∨ (x2 is N ∧ x3 is N)
2.2  v2 is N ← [x2 is P ∧ x3 is P] ∨ r is NL ∨ y1 is PL
3.1  v3 is P ← x2 is PL ∨ [x3 is N ∧ x2 is S]
3.2  v3 is N ← (x3 is P ∧ x2 is S) ∨ ([x2 is NL] ∧ {[x1 is PL ∧ Θa − Θb is PL] ∨ [Θa − Θb is S] ∨ [x1 is S ∧ Θa − Θb is NL]})
a.1  va is P ← x1 is PL ∧ r is NL
a.2  va is N ← [x1 is PL ∧ (r is S ∨ r is PL)]
b.1  vb is P ← x1 is S ∧ r is NL
b.2  vb is N ← [x1 is S ∧ (r is S ∨ r is PL)]
c.1  vc is P ← x1 is PL ∧ r is S
c.2  vc is N ← [x1 is PL ∧ (r is NL ∨ r is PL)]
d.1  vd is P ← x1 is S ∧ r is S
d.2  vd is N ← [x1 is S ∧ (r is NL ∨ r is PL)]

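Because the sub-rules n.1 and n.2 are related by the connective OR, while the rules are related by the connective AND, the protoform of Γ is:

$$\Gamma_{\text{proto}} = (1.1 \vee 1.2) \wedge (2.1 \vee 2.2) \wedge (3.1 \vee 3.2) \wedge (a.1 \vee a.2) \wedge (b.1 \vee b.2) \tag{17}$$

$$\qquad\qquad \wedge\ (c.1 \vee c.2) \wedge (d.1 \vee d.2). \tag{18}$$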

The next steps are very common [58]: the linguistic variables Positive, Negative, etc. are represented as “membership” functions P(z), N(z), etc. (see Fig. 10), which are the truth values of the expressions “z is positive”, “z is negative”, etc. Note that these functions are bounded:

$$0 \le P(z),\ N(z),\ \text{etc.} \le 1.$$

All these membership functions can be expressed in terms of a single function, say, P(z):

$$N(z) = P(-z), \quad PL(z) = P(z - a), \quad NL(z) = P(-z - a),$$

$$S(z) \simeq \min\{1 - PL(z);\ 1 - NL(z)\} = P(a - |z|). \tag{19}$$
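The relations (19) can be sketched in code once a concrete P(z) is chosen. The shape of P is not recoverable from this text (it is shown only in Fig. 10), so the symmetric linear ramp below, its half-width `c` and the offset `a` are assumptions made purely for illustration.

```python
import numpy as np

def P(z, c=1.0):
    """Hypothetical 'Positive' membership: linear ramp, 0 at z=-c, 1 at z=c."""
    return np.clip(0.5 + z / (2.0 * c), 0.0, 1.0)

def N(z, c=1.0):
    return P(-z, c)               # Negative

def PL(z, a, c=1.0):
    return P(z - a, c)            # Positive Large

def NL(z, a, c=1.0):
    return P(-z - a, c)           # Negative Large

def S(z, a, c=1.0):
    return P(a - np.abs(z), c)    # Small

z = np.linspace(-4.0, 4.0, 801)
a = 2.0

# For a ramp with 1 - P(u) = P(-u), the approximate relation in (19)
# becomes an exact identity.
lhs = np.minimum(1.0 - PL(z, a), 1.0 - NL(z, a))
print(np.allclose(lhs, S(z, a)))
```

For membership functions without the symmetry 1 − P(u) = P(−u), the two sides of (19) generally agree only approximately, which is consistent with the “≃” sign in the text.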

2) Informational forces, fuzzy velocities and most preferable ways of a neuron adaptation: By using (19) and the representation of the logical connectives, we can find the truth value of each rule in Table I. For example, the truth value of the first rule is:

$$T_{1.1} = T\{P(\gamma_1 v_1);\ P(-x_2 - a_2)\},$$

where the coefficient γn reflects the time scale of the dynamics of the corresponding variable. (The truth values Tn.i of the other rules are presented in Appendix A4.) Note that the implication {←} is represented here as a t-norm, which is common for this kind of problem. It is reasonable that the truth value of the rule “v is Positive if x is Negative” is close to the truth value of the expression “v is Positive and x is Negative”.

Fig. 10. Membership functions of the linguistic variables.

Finally, the function Γ(v, x, Θ) is obtained as:

$$\Gamma(v, x, \Theta) = \frac{\chi(v, x, \Theta)}{\sup_v \chi(v, x, \Theta)}, \tag{20}$$

where

$$\chi(v, x, \Theta) = T\{\max[T_{1.1}; T_{1.2}];\ \max[T_{2.1}; T_{2.2}];\ \max[T_{3.1}; T_{3.2}];\ \max[T_{a.1}; T_{a.2}];\ \max[T_{b.1}; T_{b.2}];\ \max[T_{c.1}; T_{c.2}];\ \max[T_{d.1}; T_{d.2}]\}. \tag{21}$$

In Eq. (20) the condition (11) has been taken into account, and the notation $T\{\mu_1; \mu_2; \mu_3\} \overset{\text{def}}{=} T\{\mu_1; T\{\mu_2; \mu_3\}\}$ has been used.
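The construction (20)-(21) can be sketched for a reduced problem with only two rule pairs. Everything numerical below is an assumption for illustration: the ramp-shaped P, the MIN t-norm, the velocity grid, the coefficients γ and the antecedent truth values `I11`, `I12`, `I21`, `I22` (which in the paper come from Appendix A4).

```python
import numpy as np
from functools import reduce

def P(z, c=1.0):
    # Hypothetical "Positive" membership: linear ramp, 0 at z=-c, 1 at z=c.
    return np.clip(0.5 + z / (2.0 * c), 0.0, 1.0)

def t_nary(tnorm, *mus):
    # Nested notation from the text: T{m1; m2; m3} = T{m1; T{m2; m3}}.
    return reduce(lambda acc, m: tnorm(m, acc), reversed(mus))

v = np.linspace(-2.0, 2.0, 81)
V1, V2 = np.meshgrid(v, v, indexing="ij")   # grid over the velocity pair
g1, g2 = 1.0, 1.0                           # assumed time-scale coefficients
I11, I12 = 0.8, 0.3                         # assumed antecedents of rules 1.1, 1.2
I21, I22 = 0.2, 0.9                         # assumed antecedents of rules 2.1, 2.2

T = np.minimum
pair1 = np.maximum(T(P(g1 * V1), I11), T(P(-g1 * V1), I12))   # (1.1 OR 1.2)
pair2 = np.maximum(T(P(g2 * V2), I21), T(P(-g2 * V2), I22))   # (2.1 OR 2.2)

chi = t_nary(T, pair1, pair2)    # Eq. (21), restricted to two rule pairs
Gamma = chi / chi.max()          # Eq. (20): enforces sup_v Gamma = 1

i, j = np.unravel_index(np.argmax(Gamma), Gamma.shape)
print("sup Gamma =", Gamma.max(), "at v1 =", V1[i, j], ", v2 =", V2[i, j])
```

In this toy setting the most possible velocities have v1 > 0 and v2 < 0, because the antecedent of the "v1 is P" rule and that of the "v2 is N" rule carry the larger truth values.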

Expression (20) still does not explicitly define Γ(v, x, Θ), because the aggregation operator T{µ1; µ2} and the function P(x) are not determined uniquely and rely on expert knowledge and intuition.

This is not a specific disadvantage of the considered approach. Actually, the mathematical treatment of any real problem is ambiguous to some degree. In our case, however, this ambiguity is reduced to the choice of two monotonic functions, while the conventional dynamical theory of a complex system like a neuron requires a priori specification of much more complicated mathematical expressions.

In [41] we chose the MIN t-norm for the aggregation operator. This choice gives an “optimistic” description (see page 102) of the neuronal dynamics. Here we will use another t-norm, which is more realistic from the neurophysiological point of view and covers a wide class of aggregation operators:

$$T\{\mu_1; \mu_2\} = f^{-1}\big(f(\mu_1) + f(\mu_2)\big), \tag{22}$$


where f(µ) is a monotonically decreasing function with f(1) = 0, f(0) = ∞. In addition, such a t-norm simplifies the analysis of the most preferable way of neuronal adaptation (the tube of the most preferable trajectories). Indeed, in this case the right-hand side of Eq. (15) takes the form:

$$V_n = \begin{cases} \gamma_n^{-1}\big(\alpha_n C_l + (1 - \alpha_n) c_m\big), & I_{n.1}(x, \Theta) > I_{n.2}(x, \Theta), \\ -\gamma_n^{-1}\big((1 - \alpha_n) C_s + \alpha_n c_m\big), & I_{n.1}(x, \Theta) < I_{n.2}(x, \Theta), \end{cases} \tag{23}$$

where 0 ≤ αn ≤ 1 indicates a trajectory in the tube and we have used a crisp cutoff for the velocities: −Cs ≤ γnVn ≤ Cl. The constant cm is defined by P(z ≥ cm) = 1.
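The additive-generator form of Eq. (22) can be sketched with two textbook generators. These particular choices of f are assumptions for illustration and are not the generator used in the paper: f(µ) = −ln µ reproduces the product t-norm, and f(µ) = 1/µ − 1 gives the Hamacher product.

```python
import math

def make_tnorm(f, f_inv):
    """Build T{a; b} = f_inv(f(a) + f(b)) from a decreasing generator f
    with f(1) = 0 and f(0) = infinity, as required for Eq. (22)."""
    def T(a, b):
        return f_inv(f(a) + f(b))
    return T

def f_log(mu):
    return math.inf if mu == 0.0 else -math.log(mu)

def f_log_inv(y):
    return math.exp(-y)            # exp(-inf) evaluates to 0.0

def f_ham(mu):
    return math.inf if mu == 0.0 else 1.0 / mu - 1.0

def f_ham_inv(y):
    return 1.0 / (1.0 + y)         # 1 / (1 + inf) evaluates to 0.0

T_prod = make_tnorm(f_log, f_log_inv)   # equals a * b
T_ham = make_tnorm(f_ham, f_ham_inv)    # equals a*b / (a + b - a*b)

print(T_prod(0.5, 0.4), T_ham(0.5, 0.5))
print(T_prod(1.0, 0.3), T_ham(1.0, 0.3))   # 1 acts as the neutral element
print(T_prod(0.0, 0.7), T_ham(0.0, 0.7))   # 0 annihilates
```

Since f(1) = 0, the value 1 is automatically the neutral element, and since f is decreasing with f ≥ 0, the result never exceeds min(a, b).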

The solution of Eq. (15) with Vn from (23) is:

$$x_n(t) = x_n(t_i) + V_n\,(t - t_i), \quad t_i \le t \le t_{i+1}, \tag{24}$$

where xn(ti) are solutions of the equations:

$$I_{n.1}(x(t_i), \Theta(t_i)) = I_{n.2}(x(t_i), \Theta(t_i)). \tag{25}$$

For example, for n = 1 and n = b this equation leads to:

$$f(P(-x_2 - a_2)) = \min\{f(P(a_1 - x_1)) + f(P(0.5 - \Theta_b));\ f(P(x_2 - a_2))\}, \tag{26}$$

and

$$f(P(a_1 - x_1)) = f(P(x_1 - a_1)) + f(P(|0.5 - \Theta_a|)), \tag{27}$$

respectively. Since for the other n Eq. (25) leads to similar forms of equations, we can obtain the tube of the most preferable ways of adaptation by defining only the composition f ∗ P (instead of defining each function separately). For the simple choice:

$$f(P(z)) = \begin{cases} 0, & z > c_m, \\ -\dfrac{0.5}{c_m}\, z + 0.5, & -c_m \le z \le c_m, \\ \infty, & \text{otherwise}, \end{cases} \tag{28}$$

the results for the AP and for the trained neuron’s expectation of environmental reactions are shown in Fig. 11. The straightness of the lines in this figure is a result of the choice (28); for another T, more complicated shapes of the curves are possible (see [41]). On the other hand, the “sudden” changes in neuron behavior, which are seen in the experimental data and are predicted by the fuzzy dynamics, are fundamental. Theoretically, these changes arise from the “majorant” type of the fuzzy master equation (9) and its kernel (20), and this is a consequence of the MAX representation of the logical connective OR. Note that such a representation is not the result of our arbitrary choice, but is unambiguously dictated by the local topology of the neuron’s state space (see page 101).

IV. CONCLUDING DISCUSSION

It is known that small elements such as a single cell can demonstrate “reasonable” behavior. For example, the salient feature of chemotaxis is similar to goal-directed behavior: “Spermatozoa migrating to a pheromone exhibit phobic reactions when perceiving decreasing concentrations of the pheromone gradient” [31]. Motivationally relevant neuropeptides influence motivation and single cell chemotaxis [31], [20] in similar concentrations, depending on experience and with a similar U-shaped dependence upon concentration. Cells sometimes utilize attractant molecules in their metabolism, and the same intracellular mechanisms that are involved in chemotaxis are involved in motivational behavior, especially intracellular Ca++ augmentation [9], [39] and participation of G-proteins [18], [24], [30], which realize a connection between the extra- and intracellular environment and can reorganize the ion homeostasis of neurons.

Fig. 11. Tubes of the most preferable way of evolution of the trained neuron. The first picture shows an action potential (AP) and the second one shows expectations of the environmental reactions: AP → punishment (Θa), thin lines; absence of AP → punishment (Θb), thick lines. The parameters an, γn were adjusted in order to achieve the best agreement with the experimental data for APs (circles). (The fine-notched bars are an artefact resulting from the crisp cutoff: |xn|, |Θα| ≤ 1.)

Motivational behavior is present at the neuronal level and, furthermore, homeostasis of a neuron is evidently a unit of motivational behavior. A neuron is a specialized cell for behavior control. Therefore, why must a neuron be deprived of the capability of feeling primitive sensations that non-differentiated cells possess? If a neuron does feel sensation, then what is the demand that the neuron aspires to satisfy? Although a neuron needs to support many important characteristics, only one integral characteristic may affect the sole output of the neuron. This characteristic is the level of cellular excitation, which is intimately connected with cellular damage/protection. We may suppose that excitation and inhibition are perceived by a neuron as negative and


positive sensations. In fact, treatments that protect neurons usually inhibit neurons and exert psychotropic actions related to relief, while treatments that induce damage excite neurons and intensify motivations [50].

In the above-mentioned experiments it has been demonstrated that an AP in a single neuron can serve as an elemental instrumental reaction at the cellular level. Therefore, motivational behavior can be studied at the neuronal level, just as motivational-like behavior is observed in a single cell. Motivational excitation induces transient neuronal damage, which is conveyed by a shift of some important inner constants: a decrease in membrane potential, growth of the inner calcium concentration, etc. Neuronal homeostatic mechanisms compensate for this shift. The compensation is steady if the properties of the environment are simple enough, as during habituation and classical conditioning. Based on the above consideration, we may conclude that the very moment of physiological regulation of damage/protection is also a moment of psychological sensation. Therefore, we may also conclude that damage/protection processes in the neurons can be considered as the physiological basis of motivations. We feel drives because we are mortal and carry out our major needs, such as the aspiration to live and the avoidance of injury. Consciousness consists of being between satisfaction and depression, that is, between life and death, and the ability to feel rests on the fact of mortality.

The appearance of sudden changes in a neuron’s behavior (see Fig. 11) is a very fundamental result of fuzzy dynamics, and it is a consequence of the representation of the logical connective OR as (A ∨ B) → max{µA; µB}. It should be emphasized that such a representation is not a result of our free choice, but is dictated by the neuron’s physical features (see Section III-C.2).

Sudden alterations in the neuron’s behavior have an effect on the macro-behavior of an animal. It is well known that even when it knows a good solution to a given problem, an animal from time to time tries to find a new solution, and if the new solution is worse the animal returns to the previous behavior. Such a “researcher’s instinct” is very beneficial, since it enables the animal to effectively optimize its behavior in continuously changing environmental conditions. Note that in the approach presented above such behavior is neither the consequence of random inner influences of the neural system nor only the result of sudden changes in the environment, but rather a fundamental feature of neurons. It seems very likely that the inner logic of the neuron’s behavior is close to fuzzy logic.

The idea of using some of the features of physiological processes in cybernetics is very old [1], but it is still attractive. For example, concepts of self-consciousness and emotion for robotic systems have been discussed recently in [27].

Our approach makes it feasible to design new kinds of artificial neurons, motivational artificial neurons, because the Fuzzy Rules in Table I (see Section III-C) are easily computerized.

Motivational artificial neurons seem to be very promising as elements of the “brain of the feeling robot”. The behavior of such robots will be initiated by a general “defensive motivation”, which is expressed as a few artificial drives like energy recuperation, avoidance of injury and aspiration to survive, while the robot’s main task can be implemented as an analog of an animal’s sexual motivation. Like most fuzzy systems, motivational artificial neurons are easily hybridized with almost any sensors, action controllers and long-term memory systems. So, in a “feeling robot”, performance of the main task, aspiration to survive, trial-and-error learning and the “instinct of a researcher” will be naturally combined.

If a robot has to act autonomously in poorly predictable and harsh environmental conditions, a motivational paradigm may be significantly more effective than conventional reinforcement learning approaches [47].

APPENDIX A1: DISTURBANCE OF HOMEOSTASIS, DAMAGE AND COMPENSATION

Although overall intensity of the punishment during thesession of instrumental conditioning was almost twice assmall as that during classical conditioning, in comparison toclassical conditioning, in instrumental learning the neurons didnot compensate for the changes of the membrane potentialinduced by negative reinforcement. Evidently, the predictableproperties of the environment during classical conditioningallow the possibility to employ self-contained mechanismsof membrane potential maintenance. At the beginning ofthe instrumental conditioning, there was overcompensationof the unconditioned excitation and both trained and controlneurons were hyperpolarized. Further overcompensation of thecontrol neuron continued, but excitation in the trained neuroncontinued to increase and passed the normal limit.

There are multiple pathways leading to cell death and numerous mechanisms protecting a cell from death, and none of them is obligatory. This is, evidently, the consequence of the ion homeostasis of neurons, which exhibits non-linear properties. The Hodgkin-Huxley equation [19] gives birth to a neural impulse only under a very precise regulation of the equation coefficients. A neuron may be injured if the homeostatic regulation of any ion current participating in neuronal function is disturbed. However, each of the parameters may deviate from the homeostatic equilibrium without symptoms of neuronal damage under compensatory changes of other parameters. For example, an excitotoxic mechanism cannot be the sole reason for cell death [7]. Nevertheless, an important reason for neuronal damage and death is a strong excitation. An absence of coordination between membrane potential and excitability is observed during damage-recovery of cells.

An injury or excitation evokes depolarization and transientgrowth of excitability in neurons, so a phase of high excitabil-ity in dying neurons inevitably changes to an unexcitable state.An injurious agent usually evokes a short-term depolarizationand excitation, which after a few minutes is replaced by aslow compensatory hyperpolarization followed by a persistentdepolarization when membrane potential eventually reacheszero even if the injurious agent is removed [32], [21]. A changein excitability after damage is not the direct consequenceof a shift in the membrane potential [61], [10]. In fact, thethreshold may increase during persistent depolarization evokedby injury. In such a case, hyperpolarization paradoxicallyrecovers excitability. A shift in cellular status to damage orprotection is accompanied by a homeostatic compensation.


These compensation processes can easily be demonstrated with the example of the emergence of drug dependence. Acute administration of opiates lowers neuronal excitation and reduces cell damage, and therefore it is pleasant for the subject [45], [40]. However, cellular homeostasis compensates for this over-protection and aggravates neuronal damage, which compels the subject to take new doses of opiates, and drug dependence develops [38], [45].

APPENDIX A2: DOES COMPENSATION COUNTERACT THE CURRENT DISTORTION OF CELL STATUS OR DOES IT DETERMINE THE CAUSES FOR ALTERATIONS?

Instability that is put at the basis of trial-and-error learning isnot related to data scattering. We have revealed that instabilityincreases when the neural system tries to use its currentresponse as an instrumental reaction. The process of trial-and-error may continue after acquisition of reliable data about thetype of current behavior. An animal tends to generate severalof the same kind of actions in succession, and failure of actionsis alternated with their generation.

The status of ambient neurons exerts an effect on the state ofthe trained neuron. It has been demonstrated that co-activationof neural cells is not permanent but rather is spontaneouslychangeable, and is changeable during reception of a visualsignal, during a motor response, and while performing be-havioral tasks [56]. Mesolimbic dopamine neurons, whichare responsible for reinforcement, are activated by rewardingevents that are better than predicted and are depressed byevents that are worse than predicted [43]. The activationsof such neurons report preferentially a predicted error inenvironmental events. However, it is still unclear how a neuralsystem elaborates the predictive signal. A negative reinforce-ment by itself may not result immediately in neuronal damage,as revealed by the properties of the membrane potentialduring classical conditioning. Unavoidable pain may mobilizemembrane mechanisms for damage compensation when theexpectation of pain is great.

APPENDIX A3: EMISSION OF PURPOSEFUL ACTIONS

Let us discuss the process of emission of purposeful actions. In the cases where the appearance of the harmful influences turns out to be unpredictable, the neuron overcompensates the harmful influences. This may be a hyperpolarization of the cellular membrane, as in the control neuron during instrumental conditioning or as in the trained neuron at the beginning of the learning. At the beginning of learning, the trained neuron receives neither damage nor protection when it generates an instrumental AP. At the same time, the trained neuron receives a harmful stimulus, is transiently injured and increases delayed overcompensation when it fails to generate an AP in response to the conditioned stimulus. Furthermore, a neuron does not receive a harmful stimulus when it generates an AP. However, now the absence of the unconditioned stimulus is the positive reinforcement, and this induces transient protection. As a result, the trained neuron cancels the compensation and damage increases. Nevertheless, while the logical precondition for the appearance of the harmful stimulus is still not elucidated by the neural system, the neuronal damage continues to increase and the excitable membrane properties are disturbed. This in turn prevents AP generation and provokes the appearance of the next unconditioned stimulus, supplementary damage and instability.

The activity of a majority of the neurons in the medial preoptic nucleus (sexual center) increases sharply during sexual arousal and abruptly decreases during intromission, and this decrease becomes strong and protracted after ejaculation [44]. There is little doubt that an organism supports the stability of its inner medium. Evidently, the organism can support only those values of the inner constants that do not damage its neurons. Just as motivation and the very life of an organism depend upon homeostasis, so neuronal behavior and life depend upon neuronal homeostasis, which may serve as a basis for actions directed to avoidance of damage. In particular, ion homeostasis regulates the input-output characteristics of neurons. Shifts in ionic homeostasis result in a change in the informational capability of the neuron [48] and can concomitantly increase a neuron’s vulnerability to excitotoxicity [32], [34].

APPENDIX A4: TRUTH VALUES OF THE ADAPTATION RULES

Truth values of the rules n.1, n.2 are:

$$T_{n.1} = T\{P(\gamma_n v_n);\ I_{n.1}(x, \Theta)\}, \qquad T_{n.2} = T\{P(-\gamma_n v_n);\ I_{n.2}(x, \Theta)\},$$

where for the trained neuron I_{n.k}(x, Θ) are equal to:

$$\begin{aligned}
I_{1.1} &= P(-x_2 - a_2), \\
I_{1.2} &= \max[P(x_2 - a_2);\ T\{nl(r);\ S(\Theta_b)\}], \\
I_{2.1} &= \max[T\{P(-x_2);\ P(-x_3)\};\ T\{pl(r);\ S(y_1)\}], \\
I_{2.2} &= \max[T\{P(x_2);\ P(x_3)\};\ nl(r);\ P(y_1 - a_1)], \\
I_{3.1} &= \max[P(x_2 - a_2);\ T\{P(-x_3);\ S(x_2)\}], \\
I_{3.2} &= \max[T\{P(x_3);\ S(x_2)\};\ T\{P(-x_2 - a_2);\ \max[S(\Theta_a - \Theta_b);\ T\{P(x_1 - a_1);\ P(\Theta_a - \Theta_b - 0.5)\};\ T\{S(x_1);\ P(\Theta_b - \Theta_a - 0.5)\}]\}], \\
I_{a.1} &= T\{P(x_1 - a_1);\ nl(r)\}, \\
I_{a.2} &= T\{P(x_1 - a_1);\ \max[pl(r);\ s(r)]\}, \\
I_{b.1} &= T\{S(x_1);\ nl(r)\}, \\
I_{b.2} &= T\{S(x_1);\ \max[pl(r);\ s(r)]\}, \\
I_{c.1} &= T\{P(x_1 - a_1);\ s(r)\}, \\
I_{c.2} &= T\{P(x_1 - a_1);\ \max[nl(r);\ pl(r)]\}, \\
I_{d.1} &= T\{S(x_1);\ s(r)\}, \\
I_{d.2} &= T\{S(x_1);\ \max[nl(r);\ pl(r)]\},
\end{aligned}$$

where S(xk) = P(ak − |xk|). In order to obtain I_{n.k}(y, Θ′) for the surrounding neurons, we should replace x → y and Θ → Θ′ in the above expressions.

For the experimental situation which was discussed in Section II-B, the membership functions pl(r), nl(r), s(r) for


the trained neuron are equal to (see page 102):

$$\begin{aligned}
nl(r) &= S(x_1) = P(a_1 - x_1), \\
pl(r) &= T\{P(x_1 - a_1);\ P(\Theta_a - 0.5)\}, \\
s(r) &= T\{P(x_1 - a_1);\ S(\Theta_a)\} = T\{P(x_1 - a_1);\ P(0.5 - \Theta_a)\},
\end{aligned}$$

and for the surrounding neurons:

$$\begin{aligned}
nl(r) &= S(x_1) = P(a_1 - x_1), \\
pl(r) &= T\{P(x_1 - a_1);\ \max[T\{P(y_1 - a_1);\ P(\Theta'_a - 0.5)\};\ T\{P(a_1 - y_1);\ P(\Theta'_b - 0.5)\}]\}, \\
s(r) &= T\{P(x_1 - a_1);\ \max[T\{P(y_1 - a_1);\ S(\Theta'_a)\};\ T\{P(a_1 - y_1);\ S(\Theta'_b)\}]\}.
\end{aligned}$$

The function Γ(v, x, Θ) is found by combining these expressions with (20) and (21).
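The response memberships of the trained neuron can be evaluated directly once P and T are fixed. The sketch below is an illustration under assumptions: the ramp-shaped P with half-width `c`, the MIN t-norm, the function name `response_trained` and the sample values of x1, Θa and a1 are all hypothetical.

```python
import numpy as np

def P(z, c=0.5):
    # Hypothetical "Positive" membership: linear ramp, 0 at z=-c, 1 at z=c.
    return float(np.clip(0.5 + z / (2.0 * c), 0.0, 1.0))

def response_trained(x1, theta_a, a1, tnorm=min):
    """Truth values nl(r), pl(r), s(r) for the trained neuron,
    following the expressions given above."""
    nl = P(a1 - x1)                                # r is NL <=> x1 is S
    pl = tnorm(P(x1 - a1), P(theta_a - 0.5))       # x1 is PL AND Theta_a is PL
    s = tnorm(P(x1 - a1), P(0.5 - theta_a))        # x1 is PL AND Theta_a is S
    return nl, pl, s

# Large x1 together with a high expectation Theta_a.
nl, pl, s = response_trained(x1=1.0, theta_a=0.9, a1=0.3)
print(nl, pl, s)
```

With these sample values the "Positive Large" response dominates, the "Small" response is weak and the "Negative Large" response is excluded.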

REFERENCES

[1] Ashby W.R. (1956). An Introduction to Cybernetics, Chapman & Hall, London. (Internet, 1999, http://pcp.vub.ac.be/books/IntroCyb.pdf).

[2] Aubin J.P. (1990). Fuzzy Differential Inclusions, Problems of Control and Information Theory, 19(1), 55-67.

[3] Balaban, P.M. (2002). Cellular mechanisms of behavioral plasticity in terrestrial snail. Neurosci. & Biobehav. Rev., 26, 597-630.

[4] Brooks, P.J. (2000). Brain atrophy and neuronal loss in alcoholism: a role for DNA damage? Neurochemistry International, 37, 403-412.

[5] Butnariu D., Klement E.P., “On Triangular Norm-Based Measures and Games with Fuzzy Coalitions”, Kluwer Ac. Publ., Dordrecht-Boston-London, 1993.

[6] Cannon, W.B. (1929). Organization for physiological homeostasis. Physiological Reviews, 9, 399-431.

[7] Centonze, D., Marfia, et al. (2001). Ionic mechanisms underlying differential vulnerability to ischemia in striatal neurons. Progress in Neurobiology, 63, 687-696.

[8] Cleary, L.J., et al. (1998). Cellular correlates of long-term sensitization in Aplysia. Journal of Neuroscience, 18, 5988-5998.

[9] Colmers, W.E., Bleakman, D. (1994). Effects of neuropeptide Y on the electrical properties of neurons. Trends in Neurosciences, 17, 373-379.

[10] Corronc, H.L., et al. (1999). Ionic mechanisms underlying depolarizing responses of an identified insect motor neuron to short period of hypoxia. Journal of Neurophysiology, 81, 307-318.

[11] Davis, M., et al. (1994). Neurotransmission in the rat amygdala related to fear and anxiety. Trends in Neurosciences, 17, 208-214.

[12] Droge, W. (2002). Free radicals in the physiological control of cell function. Physiological Reviews, 82, 47-95.

[13] Everitt, B.J., et al. (2001). The neuropsychological basis of addictive behavior. Brain Research Reviews, 36, 129-138.

[14] Fadda, F., Rossetti, Z.L. (1998). Chronic ethanol consumption: from neuro-adaptation to neuro-degeneration. Progress in Neurobiology, 56, 385-431.

[15] Friedman Y., Sandler U. (1996). Evolution of Systems under Fuzzy Dynamic Law, Fuzzy Sets and Systems, 84, 61-74.

[16] Friedman Y., Sandler U. (1997). Dynamics of Fuzzy Systems, Inter. J. of Chaos Theory and Application, 2, 5-21.

[17] Friedman Y., Sandler U. (1999). Fuzzy Dynamics as Alternative to Statistical Mechanics, Fuzzy Sets and Systems, 106, 61-74.

[18] Gerard, C., Gerard, N.P. (1994). The pro-inflammatory seven-transmembrane segment receptors of the leukocyte. Current Opinion in Immunology, 6, 140-145.

[19] Hodgkin, A.L., Huxley, A.F. (1952). Currents carried by sodium and potassium ions through the giant axon of Loligo. Journal of Physiology, 116, 449-472.

[20] Houten, J.V. (1994). Chemosensory transduction in eukaryotic microorganisms: trends for neuroscience? Trends in Neurosciences, 17, 62-75.

[21] Hyllienmark, L., Brismar, T. (1999). Effect of hypoxia on membrane potential and resting conductance in rat hippocampal neurons. Neuroscience, 91, 511-517.

[22] Keifer, J., et al. (1995). In vitro classical conditioning of abducens nerve discharge in turtles. J. Neuroscience, 15, 5036-5048.

[23] Klement E.P., et al., “Triangular Norms”, Vol. 8 of Trends in Logic, Kluwer Ac. Publ., Dordrecht-Boston-London, (2000).

[24] Klotz, K.N., Jesaitis, A.J. (1994). Neutrophil chemoattractant receptors and the membrane skeleton. BioEssays, 16, 193-198.

[25] Koob, G.F., Moal, M.L. (2001). Drug addiction, dysregulation of reward, and allostasis. Neuropsychopharmacology, 24, 97-129.

[26] Kotlyar B.I., Yeroshenko T.M. (1971). Hypothalamus glucoreceptors: the phenomenon of plasticity. Physiology & Behavior, 4, 609-615.

[27] Kubota N. et al. (2001). Self-Consciousness and Emotion for a Pet Robot with Structured Intelligence, In Proc. of the Joint 9th IFSA World Congress and 20th NAFIPS International Conference, 2786-2791.

[28] Kupfermann, I. (1994). Neural control of feeding. Current Opinion in Neurobiology, 6, 869-876.

[29] Leedham, J.A., et al. (1992). Membrane potential in Myenteric neurons associated with tolerance and dependence to morphine. The Journal of Pharmacology and Experimental Therapeutics, 263, 15-19.

[30] Maghazachi, A.A. (2000). Intracellular signaling events at the leading edge of migrating cells. The International Journal of Biochemistry & Cell Biology, 32, 931-943.

[31] Maier I., (1993), Gamete orientation and induction of gametogenesis bypheromones in algae and plant.Plant Cell & Environment, 16, 891-907.

[32] Martin, R.L., at al. (1994). The early events of oxygen and glucose de-privation: setting the scene for neuronal death?Trends in Neurosciences,17, 251-257.

[33] Martin-Soelch, C., at al. (2001). Reward mechanisms in the brainand their role in dependence: evidence from neurophysiological andneuroimaging studies.Brain Research Review, 36, 139-149.

[34] Mattson, M.P. (1998) Modification of ion homeostasis by lipid peroxi-dation: roles in neuronal degeneration and adaptive plasticity.Trends inNeurosciences, 20, 53-57.

[35] Meldrum, B.S. (2000). Glutamate as a neurotransmitter in the brain:review of physiology and pathology.J. Nutr. 130, 1007-1015.

[36] Nicolaidis, S., Even, P. (1992). The metabolic signal of hunger andsatiety, and its pharmacological manipulation.International Journal ofObesity, 16, (Suppl.3), 31-41.

[37] Piazza, P.V., Moal, M.L. (1998). The role of stress in drug self-administration.Trends in Pharmacological Sciences, 19, 67-74.

[38] Pulvirenti, L.(1993). Functional neurotoxicity of drugs of abuse.Funct.Neurology, 8, 433-440.

[39] Rakic, P., et al. (1994). Recognition, adhesion, transmembrane signaling and cell motility in guided neuronal migration. Current Opinion in Neurobiology, 4, 63-69.

[40] Rauca, C., et al. (1999). Effect of somatostatin, octreotide and cortistatin on ischemic neuronal damage following permanent middle cerebral artery occlusion in the rat. Naunyn-Schmiedeberg's Archives of Pharmacology, 360, 633-638.

[41] Sandler U., Tsitolovsky L., (2001). Fuzzy Dynamics of Brain: Reflex Elaboration, Fuzzy Sets and Systems, 121/2, 237-245.

[42] Sandler U., (2002), “Dynamics of Fuzzy Systems: Theory and Applications”, Proc. 23 Annual Linz Seminar on Fuzzy Sets Theory and Fuzzy Analysis, Linz 2002, Linz, Austria, Feb. 5-9, pp. 60-66; “Evolution of Fuzzy Systems and Dynamics Theory”, Proc. 2002 World Congress on Computational Intelligence, FUZZ-IEEE 2002, Honolulu, Hawaii, May 12-17, (2002).

[43] Schultz, W. (2002). Getting formal with dopamine and reward. Neuron, 36, 241-263.

[44] Shimura, T., et al. (1994). The medial preoptic area is involved in both sexual arousal and performance in male rats: re-evaluation of neuron activity in freely moving animals. Brain Research, 640, 215-222.

[45] Sklair-Tavron, L., et al. (1996). Chronic morphine induces visible changes in the morphology of mesolimbic dopamine neurons. Proc. Natl. Acad. Sci. USA, 93, 11202-11207.

[46] Stein L., et al. (1994). In vitro reinforcement of hippocampal bursting: a search for Skinner’s atoms of behavior. Journal of Experimental Analysis of Behavior, 61, 155-168.

[47] Sutton R.S., Barto A.G. (1998), Reinforcement Learning, MIT Press, Cambridge, MA.

[48] Turrigiano G.G., Nelson S.B. (2000). Hebb and homeostasis in neuronal plasticity. Current Opinion in Neurobiology, 10, 358-364.

[49] Tsitolovsky L.E., (1986). Integrative activity of nerve cells during recording of a memory. Usp. Fiziol. Nauk., 17, 83-103.

[50] Tsitolovsky, L. (1997), A model of motivation with chaotic neuron dynamics. J. Biol. System, 5, 197, 301-323.

[51] Tsitolovsky L.E., Babkina N.V. (1992), Selective form of an excitable membrane plasticity. Brain Res., 595, 67-73.

[52] Tsitolovsky L., Guselnukov, (1974), Nonclassical state of neuron, Nauch. Docl. Vyssh. Shkol. Biol. Nauki., 10, 36-47.

[53] Tsitolovsky, L., Babkina, N. (2002). Neurons evaluate both the amplitude and the meaning of signals. Brain Res., 946, 104-118.

[54] Tsitolovsky L.E., Shvedov A. (1997), Instrumental conditioning of the activity of putative command neurons in the mollusk Helix. Brain Res., 745, 271-282.

[55] Tsitolovsky, L., Schvedov, A. (1998). Collaboration of trained and foreign neurons during formations of a local instrumental reflex. NeuroReport, 9, 2297-2303.

[56] Vaadia, E., et al. (1995). Dynamics of neuronal interactions in monkey cortex in relation to behavioral events. Nature, 373, 515-518.

[57] Woods, S.C., Ramsay, D.S. (2000). Pavlovian influences over food and drug intake. Behavioral Brain Research, 110, 175-182.

[58] Zadeh L., (1975), The concept of a linguistic variable and its application to approximate reasoning, Inform. Sci., I-8, 199-250; II-8, 301-357; III-9, 43-80.

[59] Zadeh L., (1978), Fuzzy sets as a basis for a theory of possibility, Fuzzy Sets and Systems, 1, 3-28.

[60] Zadeh L., (2002), Invited Lectures on: 2002 World Congress on Computational Intelligence, FUZZ-IEEE 2002, Honolulu, Hawaii, May 12-17, and 9th Inter. Conference on Information Processing and Management of Uncertainty, IPMU 2002, Annecy, France, July 5-9.

[61] Zhan, R.Z., et al. (1998). Intracellular acidification induced by membrane depolarization in rat hippocampal slices: roles of intracellular Ca2+ and glycolysis. Brain Research, 780, 86-94.