
Page 1:

University Studies 15A: Consciousness I
Neural Network Modeling

Page 2:

The Neuron

[Figure: a biological neuron, with dendrites (input), a cell nucleus, and an axon (output), beside the schematic model: inputs i(a), i(b), i(c) with weights $w_a$, $w_b$, $w_c$, a threshold $t$, and an output $o$.]

The Schematic Model of a Neuron

$$\text{output} = \begin{cases} 1 & \text{if } \sum_i \text{input}(i)\, w_i > t \\ 0 & \text{otherwise} \end{cases}$$
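To make the schematic concrete, here is a minimal Python sketch of this threshold unit (the function name and the example values are illustrative, not from the slides):

```python
# A minimal sketch of the schematic neuron above.
def neuron_output(inputs, weights, t):
    """Return 1 if the weighted sum of inputs exceeds the threshold t, else 0."""
    activation = sum(i * w for i, w in zip(inputs, weights))
    return 1 if activation > t else 0

# Example: three inputs i(a), i(b), i(c) with weights w_a, w_b, w_c.
print(neuron_output([1, 0, 1], [0.5, 0.9, 0.3], t=0.6))  # 0.8 > 0.6 -> 1
```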

Page 3:

[Figure: a perceptron with inputs i(1), i(2), i(3), ..., i(i), ..., i(n), connection weights $w_1, w_2, w_3, \ldots, w_i, \ldots, w_n$, and a single output unit with threshold $t$.]

The Perceptron: A Single-layered Neural Network

$$\text{output} = \begin{cases} 1 & \text{if } \sum_{i=1}^{n} i(i)\, w_i \ge t \\ 0 & \text{otherwise} \end{cases}$$

t = threshold activation level

Page 4:

[Figure: the same perceptron diagram as on Page 3.]

We can think of the input into the perceptron as a vector:

$$\begin{bmatrix} i(1) \\ i(2) \\ \vdots \\ i(i) \\ \vdots \\ i(n) \end{bmatrix}$$

Page 5:

[Figure: the same perceptron diagram as on Page 3.]

We can think of the weights of the connections between the units in the perceptron as a vector:

$$\begin{bmatrix} w(1) \\ w(2) \\ \vdots \\ w(i) \\ \vdots \\ w(n) \end{bmatrix}$$

Page 6:

Linear Algebra: the Mathematics of Many Dimensions (Quick Version)

Quantities in one dimension: the scalar $n$

Quantities in two dimensions: the two-dimensional vector $(a, b)$

[Figure: the vector (a, b) drawn in the plane with components a and b.]

The vector (a, b) has both an amount and a direction.

Page 7:

More generally, we can think of vectors in an n-dimensional space (n can be arbitrarily large):

$$(a_1, a_2, a_3, \ldots, a_i, \ldots, a_n)$$

Or, in simpler notation: $\vec{a}$

The vector arithmetic we need for neural networks is simple.

Vector addition:

$$\vec{a} + \vec{b} = (a_1 + b_1,\ a_2 + b_2,\ a_3 + b_3,\ \ldots,\ a_i + b_i,\ \ldots,\ a_n + b_n)$$

Page 8:

Multiplying a vector by a scalar: if $\vec{a} = (a_1, a_2, a_3, \ldots, a_i, \ldots, a_n)$, then:

$$\vec{b} = 2\vec{a} = (2a_1,\ 2a_2,\ 2a_3,\ \ldots,\ 2a_i,\ \ldots,\ 2a_n)$$

Or, more generally:

$$n\vec{a} = (na_1,\ na_2,\ na_3,\ \ldots,\ na_i,\ \ldots,\ na_n)$$
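If you want to experiment, these operations are one-liners in Python with NumPy (our choice of library; the slides do not specify one):

```python
# Vector addition and scalar multiplication from Pages 7-8.
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 5.0, 6.0])

print(a + b)   # vector addition: [5. 7. 9.]
print(2 * a)   # scalar multiplication: [2. 4. 6.]
```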

Page 9:

Some general properties of addition should be clear:

$$\vec{a} + \vec{b} = \vec{b} + \vec{a}$$

$$(\vec{a} + \vec{b}) + \vec{c} = \vec{a} + (\vec{b} + \vec{c})$$

Now that we have these facts, we can introduce two important features of vectors: linear combination and linear independence.

$\vec{c}$ is a linear combination of $\vec{a}$ and $\vec{b}$ if there are scalars m and n such that $m\vec{a} + n\vec{b} = \vec{c}$.

Page 10:

Otherwise, $\vec{c}$ is linearly independent of $\vec{a}$ and $\vec{b}$.

In an n-dimensional space, any set of n vectors that are linearly independent of one another can be used (in linear combination) to describe all the other vectors in the space.

That set of vectors spans the space.
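As a numerical sketch of these ideas (NumPy assumed; the example vectors are ours): two linearly independent vectors span a 2-dimensional space, so any other vector in it is a linear combination of them.

```python
# Linear independence and span in 2-D (illustrative values).
import numpy as np

basis = np.array([[1.0, 0.0],
                  [1.0, 1.0]])            # two vectors in a 2-D space
print(np.linalg.matrix_rank(basis))       # 2 -> the vectors are linearly independent

c = np.array([3.0, 5.0])
m, n = np.linalg.solve(basis.T, c)        # find scalars with m*(1,0) + n*(1,1) = c
print(m, n)                               # -2.0 5.0: c is a linear combination
```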

Page 11:

More Vector Math: Multiplication (Inner Product)

Another important mathematical operation performed on vectors is the inner product (which is a scalar quantity).

$$\vec{a} \cdot \vec{b} = \sum_{i=1}^{n} a_i b_i$$

Given this, the following equalities should be clear:

$$\vec{a} \cdot \vec{b} = \vec{b} \cdot \vec{a}$$

$$\vec{a} \cdot (\vec{b} + \vec{c}) = \vec{a} \cdot \vec{b} + \vec{a} \cdot \vec{c}$$
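These identities are easy to verify numerically; a quick sketch (NumPy assumed, values illustrative):

```python
# Checking the inner-product identities from Page 11.
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 5.0, 6.0])
c = np.array([7.0, 8.0, 9.0])

print(np.dot(a, b))                       # sum of a_i * b_i = 32.0
print(np.dot(a, b) == np.dot(b, a))       # commutativity: True
print(np.dot(a, b + c) == np.dot(a, b) + np.dot(a, c))  # distributivity: True
```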

Page 12:

[Figure: the same perceptron diagram as on Page 3.]

For the Perceptron: activation is the inner product of the input vector and the weighting vector.

$$\sum_{i=1}^{n} i(i)\, w_i = \vec{i} \cdot \vec{w}$$
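In code, the whole perceptron collapses to a dot product and a comparison; a minimal sketch (NumPy assumed; the threshold and values are illustrative):

```python
# The perceptron of Page 3, written with the inner product of Page 12.
import numpy as np

def perceptron(i_vec, w_vec, t):
    """Fire (1) if the inner product of input and weights reaches threshold t."""
    return 1 if np.dot(i_vec, w_vec) >= t else 0

i_vec = np.array([1.0, 0.0, 1.0, 1.0])
w_vec = np.array([0.2, 0.8, 0.4, 0.1])
print(perceptron(i_vec, w_vec, t=0.5))    # 0.7 >= 0.5 -> output 1
```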

Page 13:

[Figure: two connected layers. Layer A units a(1), a(2), a(3), ..., a(i), ..., a(n) feed Layer B units b(1), ..., b(j), ..., b(m), which produce outputs O1, ..., Oj, ..., Om.]

Now let's complicate the model: two layers of neurons are connected.

Page 14:

[Figure: the same two-layer diagram as on Page 13.]

For the link between the ith unit in Layer A and the jth unit in Layer B, there is a connection weight $w_{i,j}$.

Page 15:

This set of weights defines a weighting matrix of dimension (m, n) (columns for Layer A, rows for Layer B):

$$W_{n,m} = \begin{bmatrix} w_{1,1} & w_{2,1} & \cdots & w_{n,1} \\ w_{1,2} & w_{2,2} & \cdots & w_{n,2} \\ \vdots & \vdots & \ddots & \vdots \\ w_{1,m} & w_{2,m} & \cdots & w_{n,m} \end{bmatrix}$$

For our purposes, it is perhaps best to think of matrices as entities that transform vectors from a space of one dimensionality into a space of a different dimensionality, in a way determined by the values in the rows and columns of the matrix.

Page 16:

[Figure: the same two-layer diagram as on Page 13.]

One can describe the output from Layer A as a vector: $\vec{O}_A$

The activation values of Layer B are also a vector: $\vec{Net}_B$

Page 17:

Putting everything together, we have the equation:

$$\vec{Net}_B = W \cdot \vec{O}_A$$

Finally, because the output from Layer B depends on the threshold $t_j$ for each unit:

$$\vec{O}_B = f(\vec{Net}_B) = f(W \cdot \vec{O}_A)$$

So... what does all of this get us?
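Here is what that equation looks like as a two-layer forward pass, with a hard threshold as f (a sketch; the sizes, weights, and thresholds are illustrative, not from the slides):

```python
# Net_B = W . O_A, then O_B = f(Net_B) with a per-unit threshold t_j.
import numpy as np

rng = np.random.default_rng(0)
n, m = 4, 3                               # size of Layer A, size of Layer B
W = rng.random((m, n))                    # weighting matrix: rows for B, columns for A
o_A = np.array([1.0, 0.0, 1.0, 1.0])      # output vector from Layer A
t = np.full(m, 1.0)                       # threshold t_j for each Layer B unit

net_B = W @ o_A                           # each entry is an inner product
o_B = (net_B >= t).astype(int)            # f: fire where activation reaches threshold
print(net_B, o_B)
```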

Page 18:

We now can describe Hebb's learning rule:

$$\Delta w_{ij} = \varepsilon\, a_i b_j$$

where $a_i$ and $b_j$ are the output values of the ith unit in Layer A and the jth unit in Layer B, $w_{ij}$ is the connection weight between the units, and $\varepsilon$ is a learning parameter.

Note that if either $a_i$ or $b_j$ is 0, the weight does not change. Note also that the vectors and matrix can be arbitrarily large, since we now are tracking relations between individual units.
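One Hebbian update over the whole weighting matrix is just an outer product of the two layers' output vectors; a sketch (NumPy assumed, the value of ε is illustrative):

```python
# Hebb's rule from Page 18: Delta w_ij = epsilon * a_i * b_j, for all i, j at once.
import numpy as np

epsilon = 0.1                             # learning parameter
a = np.array([1, 0, 1, 1])                # Layer A outputs a_i
b = np.array([0, 1, 1])                   # Layer B outputs b_j

W = np.zeros((len(b), len(a)))            # rows for Layer B, columns for Layer A
W += epsilon * np.outer(b, a)             # co-active pairs strengthen
print(W)                                  # wherever a_i or b_j was 0, no change
```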

Page 19:

So... where does this simple learning rule get us? Let's begin with a simple system:

1. Sixteen input units are connected to two output units.

2. Only two input units are active at a time.

3. The active units must be horizontal or vertical neighbors.

4. Only one output unit can be active at a time (inhibition is marked by the black dots).

Page 20:

If one trains the network via Hebbian learning using a series of activations that follow the neighbor rule, the system settles into a stable set of weights.

[Figure: the network's weights after Trial 1, Trial 2, and Trial 3. Filled circle: Output Unit 1 gave the input from that unit a higher weight. Empty circle: Output Unit 2 gave the input from that unit a higher weight. Heavy line: when the two input units were active, Output Unit 1 won the competition. Thin line: Output Unit 2 won the competition.]

Since the output units organize their responses through mutual inhibition, they must find some feature to divide the input domain, and so they discover a topographic map.
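A rough simulation of this setup is short enough to sketch (not from the slides; the learning rate, trial count, and weight normalization are assumptions needed to keep competitive learning stable):

```python
# 16 inputs on a 4x4 grid, two winner-take-all output units, Hebbian updates
# driven by pairs of neighboring inputs, as on Page 19.
import numpy as np

rng = np.random.default_rng(0)
W = rng.random((2, 16)) * 0.1             # weights from 16 inputs to 2 outputs

def neighbor_pair():
    """Pick two horizontally or vertically adjacent cells of the 4x4 grid."""
    r, c = rng.integers(4), rng.integers(3)
    if rng.random() < 0.5:
        return r * 4 + c, r * 4 + c + 1   # horizontal neighbors in row r
    return c * 4 + r, (c + 1) * 4 + r     # vertical neighbors in column r

for _ in range(2000):
    x = np.zeros(16)
    x[list(neighbor_pair())] = 1.0        # only two neighboring inputs active
    winner = np.argmax(W @ x)             # mutual inhibition: one output wins
    W[winner] += 0.05 * x                 # Hebb: strengthen the winner's links
    W[winner] /= np.linalg.norm(W[winner])  # keep the weights bounded

print(np.argmax(W, axis=0).reshape(4, 4)) # which unit "owns" each input cell
```

When the training activations follow the neighbor rule, the two output units typically end up owning two contiguous regions of the grid, the topographic division described above.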

Page 21:

This simple example resembles more complex cases:

In the retina, there are "on-center, off-surround" and "off-center, on-surround" cells that send their activation to the thalamus. The thalamus processes the information and passes it along to the primary visual cortex.

[Figure: on-center off-surround cells (active and inactive) and off-center on-surround cells (active and inactive).]

Page 22:

The so-called "simple cells" of the primary visual cortex organize themselves through mutual inhibition as they divide the inputs from the thalamus (LGN).

They divide the input space by learning to respond to line segments at a specified angle. Some respond to 45°, some to 72°, and so on. V1 uses "coarse coding": not all angles are represented. Instead, angles in between can be represented by linear combinations of activation vectors.
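A toy illustration of coarse coding (the two preferred angles follow the slide; the rest is an illustrative construction of ours): an angle between two units' preferred angles can be written as a linear combination of their activation vectors.

```python
# Representing 60 degrees with units tuned to 45 and 72 degrees.
import numpy as np

def unit_vector(deg):
    rad = np.deg2rad(deg)
    return np.array([np.cos(rad), np.sin(rad)])

basis = np.column_stack([unit_vector(45), unit_vector(72)])  # preferred angles
target = unit_vector(60)                                     # an angle in between

m, n = np.linalg.solve(basis, target)     # m * v45 + n * v72 = v60
print(m, n)
```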

Page 23:

In the simple example of the artificial network, the network had input that followed certain regularities, and it divided the input space in half to match the binary output dimension. In the visual system as well, there are regularities that the system captures, constrained by its input and output design. In V1, the simple cells start with angled line segments. There are regularities among the patterns of line segments taken from the natural world that the complex cells then capture.

Page 24:

Neural networks extract patterns and divide an input space.

This can lead to odd results with implications for biological neural networks.

James McClelland tested the ability of a neural network to build a classification tree based on closeness of attributes.

He built a network that could handle simple property statements like:

Robin can grow, move, fly.
Oak can grow.
Salmon has scales, gills, skin.
Robin has wings, feathers, skin.
Oak has bark, branches, leaves, roots.

Page 25:

Baars and Gage discuss this and give the design:

Page 26:

What Baars and Gage do not discuss is the next step.

McClelland fed the system facts about penguins:

Penguin can swim, move, grow.
Penguin has wings, feathers, skin.

The result was a tree that did a good job:

Page 27:

The results were profoundly different depending on whether the system was given the facts about penguins interleaved with facts about the other objects, or it was all penguins all the time (we'll come back to this result when we discuss memory):

Page 28:

People explored the properties of networks as pattern extractors from many different angles.

For example, trying to teach a system how to handle relative clauses proved very hard.

Then people tried modeling the system on a feature of the child’s brain: that not all of the memory resources are there from the beginning.

They built a system that initially had limited short-term memory to handle sentence structure and simply jettisoned all complexities. Then they slowly expanded the size of short-term memory, and the system mimicked children’s behavior in acquiring the ability to handle relative clauses.

That is, their models taught them to respect issues of timing and resources.

Page 29:

Another aspect of neural networks in the brain that people explored through artificial networks is recurrency, when nodes in networks loop back on themselves.

One absolutely crucial feature of recurrent networks is the ability to complete partial patterns:

The image of the Dalmatian is very incomplete, but the brain feeds back knowledge of Dalmatians to the visual system, which then produces a yet more complete view and cycles in loops until perception settles into "Dalmatian."
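A classic way to see pattern completion in miniature is a Hopfield-style recurrent network (our example; the slides do not name a specific architecture):

```python
# Store two +1/-1 patterns in a recurrent weight matrix, then complete a
# corrupted cue by looping the state through the weights until it settles.
import numpy as np

patterns = np.array([[1, -1, 1, -1, 1, -1],
                     [1, 1, 1, -1, -1, -1]])
W = sum(np.outer(p, p) for p in patterns).astype(float)
np.fill_diagonal(W, 0)                    # no unit feeds back directly on itself

state = np.array([1, -1, 1, -1, 1, 1])    # pattern 0 with its last unit corrupted
for _ in range(5):                        # recurrent cycles
    state = np.where(W @ state >= 0, 1, -1)
print(state)                              # settles into the stored pattern
```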

Page 30:

However, if a recurrent system insists on settling into its best guess, it will never be able to learn anything new.

Artificial neural network modelers borrowed from the brain again to provide a way to selectively shut down recurrent connections:

Whenever a new pattern produces high activity because the mutually inhibiting neurons in the "IT cortex" cannot quickly settle into a pattern, that activity drives the "Basal Forebrain" to release acetylcholine, which shuts down the recurrent connections in the IT cortex and allows the new pattern to be assimilated into the organization of the input space.

Page 31:

The use of recurrent connections is central to an important model for learning in neural networks that does not rely on the "back-propagation of error" initially used to train the hidden units that overcame the limitations of perceptrons.

The first layer of memory suggests its best guess, F1, and passes it to the second layer. If that layer also finds a pattern like F1, the recurrent connection quickly leads to a settled state. If it finds an F2 that is too different from F1, it passes activation back to the first layer, which removes the block on resetting the second layer, sends it the input, and retrains the system.

It really does work.
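The description above suggests a simple match-and-reset loop; a very rough sketch (ours; the similarity measure, threshold, and "retraining" step are assumptions, not the model's actual mechanics):

```python
# Settle when a stored pattern matches the incoming best guess closely enough;
# otherwise reset and learn the input as a new pattern.
import numpy as np

memory = []                               # stored prototypes (the second layer)
MATCH = 0.8                               # how close F1 and F2 must be to settle

def present(pattern):
    for proto in memory:
        similarity = np.dot(pattern, proto) / (
            np.linalg.norm(pattern) * np.linalg.norm(proto))
        if similarity >= MATCH:           # patterns agree: a settled state
            return proto
    memory.append(pattern)                # too different: reset and retrain
    return pattern

present(np.array([1.0, 0.0, 1.0]))        # stored as new
present(np.array([0.9, 0.1, 1.0]))        # close to the first: settles
present(np.array([0.0, 1.0, 0.0]))        # too different: stored as new
```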

Page 32:

These sorts of pattern-completing, self-modifying networks appear throughout the brain.

Baars and Gage stress that 90% of the connections between the thalamus and V1 go from V1 to the thalamus as re-entrant connections rather than feed-forward input.

Many neural net modelers have developed systems based on re-entrant brain connectivity:

Page 33:

What needs to be stressed is that the neural network modelers can test the behavior of networks (for example, the effect of fear arousal from the amygdala on visual object recognition in the IT (inferotemporal cortex)).

Artificial neural networks have given us a model for memory in the changing of synaptic connection strengths (the weighting matrix).

The models present us with "objects" at all levels: the ways the network divides the input space into a space with an implicit dimensionality. For V1, for example, that space is largely composed of angled line segments, and then of more complex combinations of line segments. As one goes higher into the visual cortex, ever more complex, mutually differentiated patterns define the "base vectors" that describe the object space.

Artificial neural networks give us testable ways to think about how the brain operates that simply were not available before these models were developed.

Page 34:

The success of the neural network models and the growing sophistication of our understanding of them allow us to approach a system like visual consciousness with tools that can help us explore the dynamics.