Noise analysis of polarization-based optoelectronic connectionist machines

Michael G. Robinson and Kristina M. Johnson

Adaptive optical systems such as the bipolar, polarization-based optical connectionist machine are capable of operating in the presence of substantial noise generated by optoelectronic devices such as spatial light modulators, sources, and detectors. We present results on two optoelectronic connectionist machines that implement the single-layer delta rule and backward error propagation neural network algorithms and analyze the influence of noise on their performance. Results show that an optoelectronic neural network with 200 input units can easily classify 30 random patterns with a spatial light modulator contrast ratio of 10:1 and output cross talk of 10%.

I. Introduction

Artificial neural networks contain large numbers of simple, massively interconnected, nonlinear processing units. The power of this approach to computation is that the weighted interconnections between the processing units are trained by example.1 Furthermore, the parallel distributed nature of these networks provides a certain degree of fault tolerance.

The inherent parallelism of optics is often mentioned as the motivation for using optoelectronic technology to represent both the neurons and synapses in neural network hardware. However, optical processing systems suffer from noise generated by imperfect devices, optical aberrations such as pincushion and barrel distortion, detector noise, and optical cross talk.2 Digital electronic systems are more immune to noise because of transistor logic level restoration. As parallel optoelectronic neural network hardware is scaled to sizes that compete with digital electronics, the question arises as to whether the noise inherent in optical systems severely limits their overall performance. We present results on the operation of two optoelectronic connectionist machines (OCM's): The first uses pixels in twisted nematic liquid crystal televisions (LCTV's) to store analog values of the desired interconnection weights. The second uses spatially multiplexed, binary valued pixels in ferroelectric liquid crystal (FLC) spatial light modulators (SLM's) to represent analog weights. We refer to the former as the analog OCM and the latter as the quantized OCM. In Section II the origin of system noise and its influence on the OCM's operating as two-layer networks are analyzed. Computer models parametrized by real device characteristics demonstrate the scalability of this optoelectronic hardware. Section III discusses the influence of noise on the convergence rate of three-layer OCM's. Experimental results on the optoelectronic implementation of parity checking and THEONet, a solar flare predictor, are also presented. Section IV describes a novel optoelectronic hidden layer device, which consists of pairs of amorphous silicon photosensors connected in series with liquid crystal modulators. This device allows the three-layer connectionist machine to operate in an all-optical, feedforward processing mode. Finally, Section V summarizes the key points of this paper.

The authors are with the Department of Electrical and Computing Engineering, Optoelectronic Computing Systems Center, University of Colorado, Boulder, Colorado 80309.

Received 5 April 1990. 0003-6935/92/020263-10$05.00/0. © 1992 Optical Society of America.

II. Influence of Noise on the Two-Layer Optoelectronic Connectionist Machines

A. Analog Optoelectronic Connectionist Machine

1. Operation

The analog OCM, which uses polarization encoding to represent bipolar weights, has been extensively described elsewhere.3-6 Briefly, collimated, vertically polarized laser light illuminates a 1 × 16 columnar input vector programmed on a 220- × 240-pixel, twisted nematic LCTV (Model LVD-012) SLM (see Fig. 1). Each processing unit or artificial neuron consists of a column of 10 × 160 pixels. The input units are encoded as either 0° or 90° polarization states. A polarizer positioned at the output of LCTV1 converts the input polarization states to intensity encoded 1's or 0's. The columnar input vector is then imaged onto a second twisted nematic LCTV, which stores the weighted interconnections between the input and output units. This two-dimensional SLM (LCTV2) is sectioned into a 16 × 16 grid. Each synaptic weight is a maxipixel that contains 80 individual LCTV pixels. These analog matrix elements perform the desired weighted interconnection by modifying the input unit state of optical polarization. Bipolar operation is obtained by passing the output beam transmitted by the second LCTV through a polarizing beam splitter, which separates the vertical and horizontal polarization components.

Fig. 1. Schematic of the nematic liquid-crystal-based OCM architecture. The input vector is encoded onto LCTV1, and the weight matrix is encoded onto LCTV2. The polarization-based logic is decoded by using the polarizing beam splitter and the matched detector arrays.

10 January 1992 / Vol. 31, No. 2 / APPLIED OPTICS 263

The summation required for the matrix-vector multiplication is carried out by cylindrical lenses positioned at each output port of the polarizing beam splitter. These lenses focus the different polarizations onto two matched 1 × 16 Hamamatsu S994-18 detector arrays. The detector readings are subtracted in pairs by using differential amplifiers, digitized, and input to an IBM XT personal computer (PC) through a Data Translation DTX311EX board. The difference between the vertical and horizontal polarization components yields the correct matrix-vector multiplication. For example, multiplication of an input unit by -1 is accomplished by rotating the incident polarization by 90° from vertically polarized light to horizontally polarized light. Hence subtraction of the horizontal output polarization from the vertical component ideally yields -1. Multiplication by +1 is achieved by zero rotation of the input unit's vertical polarization state. Likewise, all multiplications between -1 and +1 are accessed by partial rotation of the polarization state between 0° and 90°.

A sigmoid nonlinearity is applied to the resulting output vector whose elements are proportional to the subtracted detector readings. Hence the ith output unit $O_i^p$ of the pth pattern presented to the network is given by

$$O_i^p = f\Big(\sum_j w_{ij} I_j^p\Big), \qquad (1)$$

where f represents the sigmoid nonlinearity $f(x) = [\exp(-ax) + 1]^{-1}$, a is the activation coefficient, $I_j^p$ is the jth element of the input pattern p, and $w_{ij}$ is the weighted interconnection between $I_j^p$ and $O_i^p$. The difference between $O_i^p$ and the target output unit $T_i^p$ generates an error term, which is used to modify the weights according to the least-mean-square algorithm7:

$$\Delta w_{ij} = \eta \sum_p \left(T_i^p - O_i^p\right) I_j^p, \qquad (2)$$

where $\eta$ is the learning rate parameter. After many pattern presentations the weights are optimized or trained to perform the desired pattern mapping between a set of input and output units.
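The forward pass and weight update described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' control software: it assumes the sigmoid f(x) = 1/(exp(-ax) + 1) and the least-mean-square update dw_ij = eta * sum_p (T_i - O_i) * I_j, and the function names (`forward`, `train_epoch`, `tss_per_pattern`) and the default values of `eta` and `a` are hypothetical.

```python
import math

def sigmoid(x, a=1.0):
    """Sigmoid nonlinearity f(x) = 1 / (exp(-a*x) + 1); a is the activation coefficient."""
    return 1.0 / (math.exp(-a * x) + 1.0)

def forward(weights, pattern, a=1.0):
    """Output units O_i = f(sum_j w_ij * I_j) for one input pattern."""
    return [sigmoid(sum(w * x for w, x in zip(row, pattern)), a) for row in weights]

def train_epoch(weights, inputs, targets, eta=0.5, a=1.0):
    """Accumulate dw_ij = eta * sum_p (T_i - O_i) * I_j over all patterns, then apply it."""
    n_out, n_in = len(weights), len(weights[0])
    delta = [[0.0] * n_in for _ in range(n_out)]
    for pattern, target in zip(inputs, targets):
        output = forward(weights, pattern, a)
        for i in range(n_out):
            err = target[i] - output[i]
            for j in range(n_in):
                delta[i][j] += eta * err * pattern[j]
    for i in range(n_out):
        for j in range(n_in):
            weights[i][j] += delta[i][j]
    return weights

def tss_per_pattern(weights, inputs, targets, a=1.0):
    """Total sum squared error averaged over the patterns in the set."""
    total = 0.0
    for pattern, target in zip(inputs, targets):
        output = forward(weights, pattern, a)
        total += sum((t - o) ** 2 for t, o in zip(target, output))
    return total / len(inputs)
```

In the hardware the sum inside `forward` is performed optically by the cylindrical lenses and differential detection; only the nonlinearity and the weight update run on the computer.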

The two-layer analog OCM successfully performs associative recall both between linearly independent input patterns and orthogonal output patterns and between randomly generated input and target patterns. The linearly independent patterns consist of 16 units with equal numbers of 1's and 0's and with an average Hamming distance of 8. The number of linearly independent pattern associations that can be achieved depends on how the particular associations are chosen. For example, the analog OCM was successfully trained with over 20 input/output pattern pairs. Learning more associations decreases the size of the decision region in the input space, which is related to the inner product between the input and output vector pairs. This can be shown by presenting a linear combination of two stored vectors a and b (not necessarily orthogonal) to the trained network. The combination is parameterized in terms of an angle $\theta$. Thus the input vector can be represented by $I = a\cos(\theta) + b\sin(\theta)$. The inner product between the orthogonal target vectors of the two stored patterns and the actual output should be 1.0 for correct association and 0.0 for an incorrect association. The results of this experiment show a decrease in the decision region when the number of stored patterns is increased from two to eight [see Figs. 2(a) and 2(b)]. The random input patterns are associated with random targets. Typically only (N/2) patterns, where N is the input dimensionality, can be stored because of the limitations of two-layer networks.8
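The admixture test can be illustrated with a short Python sketch. It only shows how the probe vector I = a cos(θ) + b sin(θ) and a normalized inner product might be computed; the helper names `admixture` and `normalized_inner_product` are hypothetical, and the normalization by vector lengths is an assumption based on the "normalized inner products" mentioned in the Fig. 2 caption.

```python
import math

def admixture(a, b, theta_deg):
    """Probe vector I = a*cos(theta) + b*sin(theta), with theta given in degrees."""
    t = math.radians(theta_deg)
    return [x * math.cos(t) + y * math.sin(t) for x, y in zip(a, b)]

def normalized_inner_product(u, v):
    """Inner product of u and v divided by the product of their lengths."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)
```

Sweeping `theta_deg` from 0 to 90, passing each probe through the trained network, and comparing the output with each target vector would trace out decision-region curves of the kind discussed above.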

For the test sets used in these experiments Fig. 3 shows that the OCM performs the desired associations with a total sum squared (tss) error, given by

$$\text{tss error/pattern} = \frac{1}{m}\sum_{p=1}^{m}\sum_{i=1}^{n}\left(T_i^p - O_i^p\right)^2, \qquad (3)$$

of < 0.1 per pattern after 20 epochs, where m is the number of patterns in the test set and n is the number of elements in the input vector.

In Subsection II.A.1 we have shown an analog nematic liquid crystal OCM where 16 fully interconnected processing units can associate pairs of linearly independent and random patterns with a <0.1 tss error per pattern. Noise caused by imperfect optical devices used in the implementation, however, degrades the system performance. The origin of these noise terms and their influence on the performance of the analog OCM are addressed in Subsections II.A.2 and II.A.3.

Fig. 2. Effect on the decision regions in the input vector space of storing two and eight patterns, respectively. Plotted are the normalized inner products between two target vectors O(a) and O(b) and the output of the nematic liquid crystal OCM O(I) as a function of the vector admixture. The angle $\theta$ in degrees is used to generate a linear combination of stored vectors and is defined as $I = a\cos(\theta) + b\sin(\theta)$, where I is the input vector and a and b are stored vectors associated with the targets a and b, respectively.

2. Analog OCM Noise

The noise terms contributing to the nonideal performance of the analog OCM include the following:

(1) Laser noise.

(2) Gaussian beam profile.

(3) Finite SLM contrast ratio.

(4) Cross talk caused by optical aberrations.

(5) Nonlinear polarization rotation of the interconnection SLM.

(6) Nonlinearity of the polarization logic.

(7) Detector amplifier mismatch.

(8) Electronic noise.

(9) Temporal and spatial fluctuations in the nematic liquid crystal SLM's.

Each of these noise terms is measured independently. The laser noise was found to be <0.1%. The Gaussian laser beam profile, shown in Fig. 4(a), causes a 40% intensity variation over the spatially modulated input vector. Hence a unit's activation value has a spatial dependence. The contrast ratio of the LCTV SLM's used in the analog OCM is typically 15:1, as shown in Fig. 4(b). This results in zero valued input units with an effective magnitude of 0.07. The cross talk between input units is measured to be <0.005, and hence it is considered negligible compared with the other noise terms.
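The effective zero level quoted above follows from the contrast ratio, assuming that the light leaking through an "off" unit is simply the reciprocal of the contrast ratio relative to an "on" unit (an assumption; only the two numbers are given in the text):

```latex
\frac{I_{\mathrm{off}}}{I_{\mathrm{on}}} = \frac{1}{\mathrm{CR}} = \frac{1}{15} \approx 0.067 \approx 0.07
```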

A noise component specific to the analog OCM is the nonlinear mapping between weights represented as voltages in the IBM PC and the subsequent voltage-controlled unwinding of the nematic liquid crystals that store the optoelectronic weights in the LCTV. To quantify this nonlinearity an input pattern of all 1's is displayed on LCTV1. This input vector is then multiplied by the weight matrix with identical values on all the pixels. Detector pair 4 is used to measure the output for different uniform weight values. By normalizing the detected output the actual polarization rotation performed by the pixels in LCTV2 is determined from the two polarization components. The results are plotted in Fig. 5 and clearly show that the result of this nonlinearity is to increase the effect of the positive, excitatory interconnections. Removing this nonlinearity with the controlling software is difficult since the LCTV nonlinearities vary spatially and temporally.

Fig. 3. The tss error during training on the one-layer nematic liquid crystal OCM. The solid and dotted curves are obtained from a computer model that is trained with identical pattern/target sets. [Horizontal axis: epoch. Legend: 8 random patterns (experiment, simulation); 16 linearly independent patterns (experiment, simulation).]

Fig. 4. (a) Experimentally measured Gaussian profile of the input collimated laser beam. (b) Experimentally measured contrast ratio of the input nematic liquid crystal SLM. This was measured by having alternate input vector pairs on and off and measuring the output optical intensity of the vertical stripes representing the vector components. [Horizontal axes: (a) x (mm); (b) input vector bit pairs, before and after spatial filtering.]

There is also an inherent nonlinearity associated with the bipolar matrix-vector multiplication operation. Let us assume that the weight value $w_{ij}$ stored on the LCTV is a linear function of the applied voltage. Then each output vector element $\mathrm{net}_i$ is the vector-matrix multiplication of the input vector components $I_j$ and the weight matrix elements $w_{ij}$. This is just the intensity of the vertically polarized components minus the horizontally polarized components:

$$\mathrm{net}_i = \sum_j \cos^2(w_{ij} I_j) - \sum_j \sin^2(w_{ij} I_j) = \sum_j \cos(2 w_{ij} I_j), \qquad (4)$$

which is not a linear function of $w_{ij}$.

Other sources of noise include output optical cross talk. This is measured by placing alternating 1's and -1's on the weight matrix and measuring the change in the output vector with all 1's on the input vector. The noise observed in the output detectors averaged over 50 readings is random with a standard deviation $\sigma$ = 0.05 V. (Without averaging the nematic liquid crystal SLM's are responsible for up to 1.0 V of noise out of a ±5-V output range.) The detector amplifier mismatch is < 3%, which is determined by the accuracy of the feedback resistor in the transimpedance amplifier. In Subsection II.A.3 we discuss the influence of these noise terms on the training rate or the rate of decrease in the tss error of the analog OCM.
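The nonlinearity of the polarization logic can be made concrete with a small Python sketch. The mapping from weight to rotation angle used here, θ = (π/4)(1 − w), is an assumption chosen only so that w = +1 gives 0° rotation and w = −1 gives 90°, as described for the bipolar multiplication; the text does not state the mapping, and `rotation_angle` and `differential_output` are hypothetical names.

```python
import math

def rotation_angle(w):
    """Hypothetical linear map from weight to rotation angle: +1 -> 0, -1 -> pi/2."""
    return (math.pi / 4.0) * (1.0 - w)

def differential_output(w):
    """Vertical minus horizontal intensity for unit input: cos^2(theta) - sin^2(theta)."""
    theta = rotation_angle(w)
    return math.cos(theta) ** 2 - math.sin(theta) ** 2
```

Under this assumed mapping the detected difference equals cos(2θ) = sin(πw/2), so a desired weight of 0.5 is read out as roughly 0.71. The end points ±1 and the zero are reproduced exactly, while intermediate weights are distorted, which is the sense in which the logic itself is nonlinear.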

3. Influence of Noise Terms on Analog OCM Performance

To analyze the influence of these noise terms on the training rate of the OCM, a computer model is used with parameters taken from actual noise measurements on the OCM hardware. The finite contrast ratio was accounted for by replacing all the zero input vector components with 0.07. The inputs are also spatially weighted by a Gaussian envelope function. The optical cross talk is implemented by adding to the input units a factor proportional to the amount of light overlapping adjacent detector elements measured at the output. A third-order polynomial expression is used to model the nonlinear mapping between the desired weight values and the actual LCTV weight values. Detector mismatch is taken into account by an offset applied to only one of the output detector arrays. Finally, the electronic noise is modeled by using a pseudorandom number generator in the computer.
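The noise model described above might be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the authors' program: the function names, the nearest-neighbour form of the cross talk, the envelope shape parameter, the polynomial coefficients, and the default offset and noise levels are all hypothetical placeholders; only the 0.07 zero level comes directly from the text.

```python
import random

def apply_contrast(pattern, zero_level=0.07):
    """Finite contrast ratio: replace ideal zeros by the measured leakage level."""
    return [x if x != 0 else zero_level for x in pattern]

def gaussian_envelope(n, edge_level=0.6):
    """Gaussian weighting equal to 1 at the array centre and edge_level at both ends."""
    if n == 1:
        return [1.0]
    positions = [-1.0 + 2.0 * k / (n - 1) for k in range(n)]
    return [edge_level ** (x * x) for x in positions]

def add_cross_talk(values, fraction):
    """Add a fixed fraction of each nearest neighbour's light to every channel."""
    n = len(values)
    out = []
    for k in range(n):
        left = values[k - 1] if k > 0 else 0.0
        right = values[k + 1] if k < n - 1 else 0.0
        out.append(values[k] + fraction * (left + right))
    return out

def stored_weight(w, coeffs=(0.0, 1.0, 0.0, 0.0)):
    """Third-order polynomial from desired to stored weight (coefficients hypothetical)."""
    c0, c1, c2, c3 = coeffs
    return c0 + c1 * w + c2 * w ** 2 + c3 * w ** 3

def noisy_net_input(weights_row, pattern, rng, cross_talk=0.005, offset=0.03,
                    sigma=0.02, coeffs=(0.0, 1.0, 0.0, 0.0)):
    """Net input to one output unit with all of the above noise terms applied."""
    seen = apply_contrast(pattern)
    seen = [x * g for x, g in zip(seen, gaussian_envelope(len(seen)))]
    seen = add_cross_talk(seen, cross_talk)
    net = sum(stored_weight(w, coeffs) * x for w, x in zip(weights_row, seen))
    # The offset stands in for detector mismatch, rng.gauss for electronic noise.
    return net + offset + rng.gauss(0.0, sigma)
```

Each term can be switched off independently (for example `sigma=0.0` or `cross_talk=0.0`), which mirrors the way the noise terms are studied one at a time.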

A close match was obtained between the computer model and the experimental measurements of the OCM tss error (see Fig. 3). This verification of the model allows us to use it in making predictions about the operation of the OCM in the presence of noise. Figure 6 shows the tss error per pattern versus the number of epochs for an ideal OCM and the influence of each noise term on the training rate. From this figure it can be seen that most of the error terms do not significantly affect the OCM performance. This result is anticipated since the algorithm given in Eq. (2) modifies the weights to compensate for system error, which is generated in part by noise in the optoelectronic hardware. The finite contrast ratio, for example, causes an increased error in the output vector component $O_i$. This produces a larger change in the weight values connecting output unit $O_i$ to input vector I, which compensates for the imperfect SLM contrast ratio. A similar argument can be made for the other linear errors. An analytic proof for convergence requires knowing the statistics of the coupled system, which are heavily pattern dependent.

The results shown in Fig. 6 indicate that the critical noise term that affects the OCM training efficiency is the nonlinear mapping between the weights calculated in the computer and those stored on the LCTV pixels. This nonlinearity essentially increases the magnitude of the positive weights while reducing the magnitude of the negative weights, as compared with an ideal OCM with a linear weight mapping. When large activation coefficients (> 3.0) are used, this positive bias is sufficient to cause severe error in the output vector components. This result is significant for optical neural network system design as many optical devices (e.g., SLM's or photorefractives) that are used to encode the interconnection weights are nonlinear. For machines based on these devices to perform learning using the delta rule or backward error propagation (Backprop) algorithms, these nonlinearities should be small or corrected for in the learning algorithm. Another view is that new algorithms are needed that take advantage of the naturally occurring nonlinearities in the physics of these materials.

Fig. 5. Nonlinear polarization rotation of the nematic liquid crystal SLM. [Horizontal axis: weight matrix value.]

Fig. 6. Simulated training error for the analog nematic liquid crystal OCM. The ideal OCM included the nonlinear polarization-based logic and the clipping of the weights. Each system error term was added to this ideal machine to investigate individually the effect of such terms on the machine's performance.

By using this computer model, it is possible to predict the effect of the noise terms on scaling to higher dimension machines. An analog OCM with 200 input and output units associating ten random input patterns with ten random target patterns was computer simulated. Figure 7 illustrates the rms error per unit versus pattern presentation for the 200-dimension OCM, which is simulated with the same noise parameters used in the 16-dimension OCM. This result indicates the noise compensation ability of adaptive optical processors. Larger systems can operate without necessarily improving individual device performance. It is important to note the sparse data set that is used to train the 200-dimension OCM. However, scaling a nonadaptive optical processor, such as the optical matrix-vector multiplier, requires improving the device contrast ratio, even for multiplying a single input vector by the desired matrix.9

In this subsection we have described the operation of the analog OCM implementing the delta-rule learning algorithm. The influence of noise on the analog OCM was also analyzed. In Subsection II.B we describe the operation of the quantized OCM utilizing pixels in FLC SLM's. These devices provide a linear mapping between the weights calculated in the computer and those stored on the SLM. The origin of the noise and its influence on this machine's training rate are presented. Furthermore it is shown that the quantized OCM converges faster to a minimum tss error than the analog OCM does.

Fig. 7. Simulation results of a 200-bit machine with system errors that are the same as those measured for the 16-bit nematic liquid crystal OCM. A comparison is made with the performance of the 16-bit machine. In each case the number of patterns as a percentage of input dimensionality N was the same (12%). For comparison the rms error per bit is plotted, where the rms error/bit = [(tss error/pattern)/N]^{1/2}.

Fig. 8. Schematic diagram of the quantized, 32-dimension OCM.

B. Quantized FLC OCM

1. Operation and System Noise

The architecture and operation of the quantized OCM, shown in Fig. 8, have several important differences compared with the analog OCM:

(1) The quantized FLC OCM associates vectors with 32 units, which is twice the dimensionality of the analog OCM.

(2) It is physically more compact. There are no imaging optics between the input SLM, the weight SLM, and the detectors. This reduces the length of the OCM from 200 to 25 cm. Photographs of the analog and quantized OCM's, which are shown in Figs. 9 and 10, respectively, illustrate this point.

(3) It uses faster switching, binary surface stabilized FLC's10 instead of nematic liquid crystals in the SLM's. The input SLM in this machine is one-dimensional, which allows addressing in a time governed solely by the switching speed of the liquid crystal rather than the time it takes to matrix address a two-dimensional SLM. The importance of this feature arises when operating the machine in the processing mode rather than the training mode.

(4) Finally, it represents the analog matrix values by spatially multiplexed, binary FLC elements. Each weight consists of 4 × 4 binary pixels. This provides 17 grey levels for each of the quantized matrix elements. Therefore, a 128 × 128 STC Technology FLC SLM11 is used to store the connection weights. Quantizing the weight values also removes the nematic liquid crystal nonlinearities that degrade the analog OCM performance.

Fig. 9. Two-layer analog OCM.

Fig. 10. More compact, quantized OCM.
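The grey-level count in item (4) can be checked with a short Python sketch. It assumes that a weight in [-1, +1] is stored by switching k of the n × n binary pixels of a maxipixel to one state, so that k can take n² + 1 values; the function names and the rounding rule are hypothetical, since the text does not say how the pixel pattern is chosen.

```python
def grey_levels(n):
    """Number of distinct weight values available from an n x n block of binary pixels."""
    return n * n + 1

def quantize_weight(w, n=4):
    """Nearest representable weight in [-1, +1] for an n x n binary maxipixel."""
    w = max(-1.0, min(1.0, w))          # clip to the bipolar range
    pixels = n * n
    on = round((w + 1.0) / 2.0 * pixels)  # number of pixels in the "+1" state
    return 2.0 * on / pixels - 1.0
```

With `n = 4` this gives the 17 levels quoted above, spaced 0.125 apart, so a desired weight of 0.3 would be stored as 0.25 under this rounding rule.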

A direct comparison of the analog and quantized OCM's is made by analyzing the experimental results of associating identical sets of random input and output patterns. (Only 16 units of the quantized OCM were accessed.) In each case the OCM weight matrix started with the same random values. The tss error per pattern during training of the two machines is shown in Fig. 11. The final tss error per pattern for the quantized OCM is 20 times smaller than that of the analog OCM. The difference between the machines results from the individual system components' performances, which are summarized in Table I.

The main noise differences between the two systems are the output cross talk and the liquid crystal nonlinearities. The increased output cross talk for the quantized OCM is due to optical distortions at the output plane, which is a consequence of removing the image correcting optics from the original architecture to make the system compact. The spatially encoded, quantized weights clearly form a more linear mapping with the connection weights stored in the computer, as shown in Fig. 12. The degree of linearity depends on the FLC uniformity over a given maxipixel representing a weight value.

Table I. Comparison of System Noise in the Analog and Quantized OCM's

System Error                                          Nematic LC Machine    FLC Machine
Contrast ratio of input SLM                           15:1                  25:1
Gaussian beam profile (intensity of input
  bit 0/maximum intensity)                            0.6                   0.55
Detector mismatch                                     ≈3%                   ≈3%
Cross talk between adjacent channels (before
  weight SLM/after weight SLM)                        (0.5%/2%)             (1%/10%)
Random noise (electronic)                             ≈2%                   ≈3%
Nonlinearities (a)                                    0.49                  0.07 (b)

(a) f''(0.0), where actual weight value = f(desired weight value).
(b) The quantized nature of the function f only makes this valid for large desired weight values.

The effect of the noise terms on the performance of this machine is analyzed with a computer model parameterized by the characteristics of the quantized OCM. Results indicate that the noise term that affects this machine's performance most is the 10% optical cross talk at the output, which is caused by light imaged onto adjacent output photodetectors. This noise increases the training error by 20% after ten epochs for the random association test set. The final convergence tss error per pattern (after 25 epochs) is 0.4 compared with 0.002 for the ideal case with no output cross talk.

Fig. 11. Direct comparison between the learning error using the nematic liquid crystal OCM and that using the FLC OCM. Both were trained on identical pattern/target sets with identical initial conditions.

Fig. 12. Experimental training error of the analog, nematic liquid crystal OCM and the quantized, FLC OCM operating as two-layer networks. The networks are trained to perform parity checking on input vectors.

The effect of the number of quantization levels on the tss error is also studied with the computer model. The computer simulation consists of training associations between ten random patterns in the presence of typical OCM system noise. Figure 13 shows that the number of quantized weight levels does not influence training when more than five levels are used in a single-layer, 32-dimension OCM. Note that the number of levels corresponds to 1 × 1, 2 × 2, 4 × 4, and 32 × 32 minipixels per weight.

2. Influence of Noise on Scaling the Quantized OCM

The improved performance of the quantized OCM over that of the analog OCM implies that this technology is more suitable for scaling to larger dimensions. Simulations of a 200-dimension machine associating 30 random patterns by using the delta-rule training algorithm were performed to test this hypothesis. The computer model parameters include a contrast ratio of 10:1, output cross talk of 10% measured at the detectors, a 3% detector mismatch, and 3% random noise. Figure 14 illustrates the influence of these terms on the training rate and absolute tss error of the FLC-based hardware. It is interesting to note that reducing the number of grey levels in the weights to five has little effect on the OCM training rate.

From the analysis carried out on these two-layer OCM's we conclude that the two-layer network is extremely tolerant of errors resulting from the optical vector-matrix multiplications. This is significant as previous researchers have suggested that the low accuracy of these optical processors limits their use in performing analog matrix multiplication.12 We have documented the most serious OCM system error, which is the nonlinearity associated with the twisted nematic LCTV SLM's. By using quantized, binary switching FLC devices to encode the weight values, this error becomes insignificant. Using FLC's also increases the processing speed of the machine by potentially 3 orders of magnitude over its nematic liquid crystal counterpart. Furthermore, spatial encoding of the weighted interconnections with five grey levels appears to be adequate for implementing the delta-rule neural network algorithm with 200 inputs. Finally, scaling the quantized OCM to 200 dimensions by using typical noise parameters obtained from the actual hardware does not severely affect its convergence rate or absolute tss error.

Fig. 14. Schematic of THEONet. The 17 input units are experimentally measured parameters describing various physical parameters of sunspot activity. The output units are predictions of low, medium, and high solar flare activity.

III. Multilayer OCM's

A. Operation

By using the OCM's previously described, it is possible to implement multilayer neural networks by temporal multiplexing. From Fig. 1 we can see that LCTV1 encodes the input vector and LCTV2 the interconnections $[w_{ij}^{(1)}]$ between the input and a middle or hidden layer of processing units. The IBM PC calculates the values of the hidden units by applying a sigmoid nonlinearity to the result of the differential detection. These values are then displayed on LCTV1, and the second layer of interconnections $[w_{ij}^{(2)}]$ is programmed onto LCTV2. The output of this second pass through the OCM yields the output units. The modifications to the interconnection weights are carried out by using the Backprop algorithm. For the interconnections $w_{ij}^{(2)}$ between the hidden and output units, the weight update is given by

$$\Delta_p w_{ij}^{(2)} = \eta\,\delta_i^p\,O_j^p, \qquad (5)$$

where

$$\delta_i^p = \left(T_i^p - O_i^p\right) O_i^p \left(1 - O_i^p\right). \qquad (6)$$

For the interconnection weights $w_{ij}^{(1)}$ between the input and hidden layers, the update is given by

$$\Delta_p w_{ij}^{(1)} = \eta\,\delta_i^{*p}\,O_j^p, \qquad (7)$$

with the error $\delta_i^{*p}$ backpropagated from the output layer by the following equation:

$$\delta_i^{*p} = O_i^p \left(1 - O_i^p\right) \sum_k \delta_k^p\, w_{ki}^{(2)}. \qquad (8)$$

In Eqs. (5)-(8), $O_j^p$ is the output of the unit that feeds a connection and $O_i^p$ is the output of the unit that receives it. The weights are updated after each pattern presentation and not after all patterns have been presented.

Fig. 13. Effect of noise terms on convergence of training error when the OCM is operated as a two-layer network. Thirty random associations are used to train the network.

These networks overcome problems inherent in two-layer neural network architectures such as the association of a target set with linearly dependent input patterns. The most common example of a problem unsolvable with a two-layer network is the exclusive-OR association. Here the following pattern/target associations are made:

$$\begin{pmatrix}0\\0\end{pmatrix}\to(0),\quad \begin{pmatrix}0\\1\end{pmatrix}\to(1),\quad \begin{pmatrix}1\\0\end{pmatrix}\to(1),\quad \begin{pmatrix}1\\1\end{pmatrix}\to(0). \qquad (9)$$

In higher-input dimension machines, the problem is analogous to parity checking and can be encoded as such on the two OCM's presented here. As this is a simple and somewhat standard problem, we use it to test the training of the OCM's when operating as three-layer feedforward networks.
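A per-pattern Backprop update for a three-layer network of this kind can be sketched in Python as follows. This is a generic textbook form with a unit activation coefficient, not the authors' implementation: the output-layer error is taken as (T − O)·O·(1 − O), the hidden-layer error is backpropagated through the second weight matrix, and the function names, the learning rate, and the constant bias input appended to the exclusive-OR patterns are all assumptions made for illustration.

```python
import math

def sigmoid(x):
    """Sigmoid nonlinearity with unit activation coefficient."""
    return 1.0 / (1.0 + math.exp(-x))

def layer(weights, inputs):
    """Outputs of one layer: sigmoid of the weighted sum for every receiving unit."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs))) for row in weights]

def forward(w1, w2, x):
    """Two passes, as in the temporally multiplexed machine: input->hidden, hidden->output."""
    hidden = layer(w1, x)
    output = layer(w2, hidden)
    return hidden, output

def backprop_updates(w1, w2, x, target, eta=0.5):
    """Weight changes for one pattern presentation."""
    hidden, output = forward(w1, w2, x)
    # Output-layer error: (T - O) * O * (1 - O).
    delta_out = [(t - o) * o * (1.0 - o) for t, o in zip(target, output)]
    # Hidden-layer error, backpropagated through the second-layer weights.
    delta_hid = [h * (1.0 - h) * sum(delta_out[k] * w2[k][i] for k in range(len(w2)))
                 for i, h in enumerate(hidden)]
    dw2 = [[eta * d * h for h in hidden] for d in delta_out]
    dw1 = [[eta * d * xi for xi in x] for d in delta_hid]
    return dw1, dw2

def train_pattern(w1, w2, x, target, eta=0.5):
    """Apply the update immediately after the pattern, i.e. not batched over the set."""
    dw1, dw2 = backprop_updates(w1, w2, x, target, eta)
    for row, drow in zip(w1, dw1):
        for j, d in enumerate(drow):
            row[j] += d
    for row, drow in zip(w2, dw2):
        for j, d in enumerate(drow):
            row[j] += d

# Exclusive-OR associations; the trailing 1.0 is a hypothetical constant bias input.
XOR_PATTERNS = [([0.0, 0.0, 1.0], [0.0]), ([0.0, 1.0, 1.0], [1.0]),
                ([1.0, 0.0, 1.0], [1.0]), ([1.0, 1.0, 1.0], [0.0])]
```

Note that the hidden-layer error uses the stored second-layer weights `w2`; if the weights actually realized in hardware differ from the stored ones, the backpropagated error is computed from the wrong values.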

Both the analog and quantized OCM's were trained using Backprop to perform the parity checking. The results illustrated in Fig. 15 show that both the analog and quantized OCM's successfully learn this association. The tss error per pattern falls below that achievable by a two-layer network for this particular training set (tss error = 0.25). However, the relatively large number of epochs required for convergence and the obvious oscillation in the error indicate that the Backprop algorithm does not appear to compensate for system errors to the same degree as the two-layer OCM. To investigate the effect of noise terms on the training rate of the multilayer OCM, the computer model for the two-layer OCM is extended to a three-layer simulation using Backprop to train the weights. A training set of 30 random associations is used.

Computer simulations comparing the performance of an ideal, three-layer OCM with the OCM operating in the presence of noise generated by an SLM contrast ratio of 20:1 and parameters modeling the nematic liquid crystal nonlinearity are shown in Fig. 16. Errors such as the Gaussian beam profile have been neglected, as their influence on system operation is comparable to or less than that of the contrast ratio.

The nematic liquid crystal nonlinearity causes severe problems in training the three-layer OCM. This is due in part to inaccuracies in the optical feedforward propagation and in part to the incorrect electronic backpropagation of the error. The matrix multiplication associated with Backprop [Expression (8)] is carried out on the IBM XT PC and uses the stored weight values $w_{ij}$. As these are not the actual weight values, the backpropagated error differs significantly from the correct error and leads to erroneous updates of the weight values. This problem cannot be solved by optical backpropagation, as the optical matrix-vector multiplication is not operated in reverse. These results do indicate, however, that three-layer networks implemented on the analog OCM can be trained to compensate for most of the system errors.

Fig. 15. Experimental training error for the 32-bit FLC OCM learning the 200-pattern THEONet data set.

Fig. 16. Schematic of an all-optical hidden layer using a striped optically addressed SLM.

B. THEONet: A Solar Flare Prediction Network

A further test of the ability to train a three-layer OCM is to present it with a real problem. The problem chosen here is that of predicting solar flare activity from physical measurements of sunspot activity. The data are presented as a series of pattern/target vectors trained with the Backprop algorithm. The same data have been used to develop the expert system THEO and more recently to train a three-layer neural network (THEONet) implemented on a DEC 3100 work station.13 The results from THEONet indicate that the neural network successfully predicts solar flares with 90% accuracy. As the nonlinearities and slow speed of the analog OCM make training difficult, only the quantized FLC OCM implemented THEONet.

The OCM architecture used to demonstrate THEONet consists of 17 binary input units, which classify the sunspot location, size, and long-term and short-term history of activity; ten binary hidden units; and three output units indicating the presence of a low-, medium-, or high-intensity solar flare (see Fig. 17). Bias units were also introduced at the input and hidden layers.

The result of the OCM training is shown in Fig. 18. The OCM was trained on 200 input patterns. After training was completed, a second set of 200 patterns was presented to the machine and the tss error per pattern was measured. The error was found to be comparable with that obtained for the training set, indicating good network generalization.
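The generalization check described above amounts to comparing the tss error per pattern on the training and test sets. A minimal sketch follows, assuming tss denotes the total sum of squared output errors with no factor of 1/2; the function name and the toy vectors are illustrative and are not THEONet data.

```python
def tss_per_pattern(targets, outputs):
    """Total sum of squared errors over all patterns and output units,
    divided by the number of patterns (assumed definition)."""
    total = sum((t - o) ** 2
                for target_vec, output_vec in zip(targets, outputs)
                for t, o in zip(target_vec, output_vec))
    return total / len(targets)

# Toy example with three output units (low, medium, high flare class).
targets = [[1, 0, 0], [0, 1, 0]]
outputs = [[1, 0, 0], [0, 0, 1]]   # second pattern is misclassified
error = tss_per_pattern(targets, outputs)
```

Calling the same function on the held-out set and on the training set gives the two numbers whose similarity is taken here as evidence of generalization.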

Fig. 17. Schematic of a fully interconnected second-order network using SLM's and polarization logic for bipolar weight encoding. (Labels: output units, hidden layer interconnections, input units.)

In this section we have presented results on the analog OCM performing parity checking and the quantized OCM predicting solar flare activity from sunspot data. The operation of these two OCM's was achieved by temporal multiplexing of the two-layer optical hardware. In Section IV we present the design of an optoelectronic hidden unit device that will allow the three-layer OCM to operate faster and with better error tolerance than the temporally multiplexed machines.

IV. Optoelectronic Hidden Units

A method of implementing a three-layer network without downloading the trained matrix values involves fabricating optoelectronic devices, which can be used as hidden units. Such a device needs to carry out ideally a sigmoid nonlinearity on the net input to the layer. To be compatible with the quantized OCM architecture, the device should have a one-dimensional, striped output for illuminating the subsequent w,, weight matrix.

Fig. 18. Experimental training error for the 32-bit FLC OCM learning the 200-pattern THEONet data set. (Plot: error E versus epoch.)

Fig. 19. Schematic of an all-optical hidden layer. Serially connected striped amorphous Si photodiodes are used to address a nonlinear FLC modulator. (Labels: ITO stripes, a-Si:H, glass, FLC, Al stripes, ITO, Al, +V, -V.)

We briefly describe the design and operation of an optoelectronic hidden layer based on the amorphous silicon (a-Si:H) FLC optically addressed SLM.14 This device consists of conducting stripes fabricated between the FLC and the a-Si:H photosensor, as shown in Fig. 19. Preliminary results indicate that the FLC electro-optic modulator switches as a function of the difference between the optical power illuminating alternate amorphous silicon stripes. Spatially separating the different polarization components by using a birefringent material such as calcite allows this differential device to operate correctly with the bipolar polarization weight encoding. It also potentially allows for optical gain, which is particularly useful, as this device can be operated in an all-optical recurrent network such as a Hopfield network or a Boltzmann machine.16,17 The device is currently being fabricated in a 1 × 10 array for insertion into the OCM application of THEONet.
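The differential switching behavior described above can be sketched as follows. This is an idealized model only: it assumes that positive and negative weight components are routed to two separate stripes, and it replaces the desired sigmoid with a hard threshold. The function names and the threshold value are hypothetical.

```python
def differential_hidden_unit(p_plus, p_minus, threshold=0.0):
    """Binary switch driven by the difference between the optical powers
    on two adjacent stripes (idealized hard threshold, an assumption)."""
    if p_plus < 0 or p_minus < 0:
        raise ValueError("optical power cannot be negative")
    return 1 if (p_plus - p_minus) > threshold else 0

def hidden_layer(weights, inputs):
    """Apply one differential unit per row of a bipolar weight matrix."""
    outputs = []
    for row in weights:
        # Positive weights feed one polarization channel, negative the other.
        p_plus = sum(max(w, 0.0) * x for w, x in zip(row, inputs))
        p_minus = sum(max(-w, 0.0) * x for w, x in zip(row, inputs))
        outputs.append(differential_hidden_unit(p_plus, p_minus))
    return outputs

states = hidden_layer([[0.5, -0.25], [-1.0, 0.5]], [1, 1])
```

Because only the power difference matters, a background common to both stripes cancels, which is the property that makes such a device compatible with the bipolar polarization encoding.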

The optoelectronic hidden layer allows the OCM to operate in an all-optical feedforward mode. However, this three-layer OCM still suffers from the low error tolerance and the slow convergence speed associated with the backward error propagation algorithm. An alternative to using multilayer networks trained with Backprop for solving complex problems is to consider two-layer networks with higher-order connectivity between the input and output layers. These higher-order networks offer complexities that are similar to those of the multilayer networks but can be trained in a feedforward, single-layer architecture.18 For example, the exclusive-OR problem can be solved exactly by using only second-order interconnections; i.e., the output unit is the sum of the products of pairs of weighted input units. Optical implementations of second-order networks have been proposed and demonstrated.19-23
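The exclusive-OR case can be checked directly. The sketch below assumes bipolar inputs (-1 for false, +1 for true) and a single second-order weight of -1; the encoding, the weight value, and the function names are illustrative choices rather than details of the optical implementations cited.

```python
from itertools import product

def second_order_output(x, weights):
    """Sum over input pairs (i, j) of w_ij * x_i * x_j."""
    return sum(w * x[i] * x[j] for (i, j), w in weights.items())

# One second-order interconnection between inputs 0 and 1.
weights = {(0, 1): -1.0}

def xor_second_order(x):
    """Threshold the second-order sum to a bipolar decision."""
    return 1 if second_order_output(x, weights) > 0 else -1

truth_table = {x: xor_second_order(x) for x in product((-1, 1), repeat=2)}
```

The output is +1 exactly when the two inputs differ, so no hidden layer is needed for this problem.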

V. Conclusions

The influence of system noise on the performance of liquid crystal-based bipolar optoelectronic connectionist or neural network hardware has been analyzed. In particular, noise caused by the nonlinear mapping between weights stored as voltages in the controlling computer and the subsequent analog optical modulation by the nematic liquid crystals is found to be the major contributor to the system error. By replacing the analog nematic liquid crystal weights with spatially multiplexed, binary switching ferroelectric liquid crystals, we show that this quantized OCM converges faster to a minimum tss error than the analog OCM.

By using a computer model with noise parameters obtained from real devices, both the analog and quantized OCM's are shown to scale easily to 200 dimensions. The noise terms include contrast ratios of 15:1, Gaussian laser beam variations at the input of 40%, output cross talk of 10%, detector mismatch of 5%, and 3% random noise. This demonstrates that adaptive processing architectures can compensate for the noise inherent in parallel optical hardware. It is important to note that the hardware must be included in the training cycle to obtain this noise immunity.
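A toy version of such a hardware-in-the-loop simulation is sketched below. It uses the noise magnitudes quoted above, but the way each term enters the model (leakage from finite contrast, a Gaussian illumination profile, nearest-neighbor cross talk, fixed detector gains, additive Gaussian noise) is an assumption made for illustration, as are the network size, patterns, and learning rate.

```python
import math
import random

def make_hardware(n_in, n_out, rng, contrast=15.0, beam_drop=0.40,
                  cross_talk=0.10, mismatch=0.05, noise=0.03):
    """Return forward(weights, x): noisy detector readings (assumed model)."""
    leak = 1.0 / contrast
    k = -math.log(1.0 - beam_drop)  # edge intensity is (1 - beam_drop)
    beam = [math.exp(-k * (2.0 * i / (n_in - 1) - 1.0) ** 2)
            for i in range(n_in)]
    gain = [1.0 + rng.uniform(-mismatch, mismatch) for _ in range(n_out)]

    def transmit(a):
        # Finite contrast: an "off" pixel still passes a fraction `leak`.
        return leak + (1.0 - leak) * a

    def forward(weights, x):
        ideal = []
        for row in weights:
            plus = sum(transmit(max(w, 0.0)) * b * xi
                       for w, b, xi in zip(row, beam, x))
            minus = sum(transmit(max(-w, 0.0)) * b * xi
                        for w, b, xi in zip(row, beam, x))
            ideal.append(plus - minus)
        out = []
        for j in range(n_out):
            near = [ideal[m] for m in (j - 1, j + 1) if 0 <= m < n_out]
            y = ideal[j] + 0.5 * cross_talk * sum(near)
            out.append(gain[j] * y + rng.gauss(0.0, noise))
        return out

    return forward

def tss(forward, weights, patterns, targets):
    return sum((t - y) ** 2
               for x, tv in zip(patterns, targets)
               for t, y in zip(tv, forward(weights, x)))

def train(forward, patterns, targets, n_in, n_out, epochs=200, lr=0.02):
    """Delta rule with the output measured through the noisy hardware."""
    weights = [[0.0] * n_in for _ in range(n_out)]
    for _ in range(epochs):
        for x, tv in zip(patterns, targets):
            y = forward(weights, x)
            for j in range(n_out):
                for i in range(n_in):
                    w = weights[j][i] + lr * (tv[j] - y[j]) * x[i]
                    weights[j][i] = max(-1.0, min(1.0, w))  # modulator range
    return weights

rng = random.Random(0)
n_in, n_out = 16, 2
# Deterministic binary patterns: the four bit planes of the input index.
patterns = [[(i >> p) & 1 for i in range(n_in)] for p in range(4)]
targets = [[1.0, -1.0], [-1.0, 1.0], [1.0, 1.0], [-1.0, -1.0]]
forward = make_hardware(n_in, n_out, rng)
zero = [[0.0] * n_in for _ in range(n_out)]
error_before = tss(forward, zero, patterns, targets)
trained = train(forward, patterns, targets, n_in, n_out)
error_after = tss(forward, trained, patterns, targets)
```

Because the weight updates use outputs measured through `forward`, the learned weights absorb the fixed distortions (beam profile, cross talk, detector gains); training the same weights on an ideal model and then loading them into the hardware would not provide this compensation.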

Results have also been presented on performing the three-layer backward error propagation algorithm by temporal multiplexing of the two-layer analog and quantized OCM's. In particular, parity checking on 16 and 32 inputs has been demonstrated. The quantized OCM also succeeded in predicting solar flare activity from sunspot data. Finally, a novel optoelectronic hidden unit device was described, which is designed to allow the OCM to operate in an all-optical processing mode.

We gratefully acknowledge the support of the National Science Foundation Engineering Research Center Program CDR862228 and equipment grant EET-8607833. We thank Jack Bigner and Lin Zhang for experimental assistance and Ted Weverka, Mike Fellows, and Garret Moddel for stimulating discussions. We are grateful to David Doroski for fabricating the input FLC SLM in the 32-dimension machine and Skip Wichart for depositing the amorphous silicon films for the hidden unit device.

References

1. D. E. Rumelhart, J. L. McClelland, and the PDP Research Group, Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Vol. 1, Foundations (MIT Press/Bradford Books, Cambridge, Mass., 1986).

2. A. R. Dias, "Incoherent optical matrix-vector multiplier for high speed data processing," Ph.D. Dissertation, Stanford University, Stanford, Calif. 94305 (1980).

3. M. Kranzdorf, K. M. Johnson, L. Cotter, L. Zhang, and B. J. Bigner, "A polarization-based optoelectronic connectionist machine," in Optical Computing '88, P. Chavel, J. W. Goodman, and G. Rohl, eds., Proc. Soc. Photo-Opt. Instrum. Eng. 963, 512-521 (1988).

4. M. Kranzdorf, K. M. Johnson, B. J. Bigner, and L. Zhang, "An optical connectionist machine with polarization-based bipolar weights," Opt. Eng. 28, 844-848 (1989).

5. M. G. Robinson, K. M. Johnson, L. Zhang, and B. J. Bigner, "Optical neural networks using smectic liquid crystals," in Digital Optical Computing II, R. Arrathoon, ed., Proc. Soc. Photo-Opt. Instrum. Eng. 1215, 389-399 (1990).

6. K. M. Johnson and G. Moddel, "Motivations for using ferroelectric liquid crystal spatial light modulators in neurocomputing," Appl. Opt. 28, 4888-4899 (1989).

7. B. Widrow and M. E. Hoff, "Adaptive switching elements," IRE WESCON Conv. Rec. Part 4, 96-104 (1960).

8. M. Minsky and S. Papert, Perceptrons (MIT Press, Cambridge, Mass., 1969).

9. A. R. Dias, R. F. Kalman, J. W. Goodman, and A. A. Sawchuk, "Fiber-optic crossbar switch with broadcast capability," Opt. Eng. 27, 955-960 (1988).

10. N. A. Clark, M. A. Handschy, and S. T. Lagerwall, "Ferroelectric liquid crystal electro-optics using the surface stabilized structure," Mol. Cryst. Liq. Cryst. 94, 213-234 (1983).

11. STC Technology, Harlow, England, available through Displaytech, Inc., 2200 Central Ave., Boulder, Colo. 80301.

12. D. Psaltis and R. A. Athale, "High accuracy computation with linear analog optical systems: a critical study," Appl. Opt. 25, 3071-3077 (1986).

13. R. Fozzard, G. Bradshaw, and L. Ceci, "A connectionist expert system that actually works," Adv. Neural Inf. Process. Syst. 1, 248-251 (1988).

14. G. Moddel, K. M. Johnson, W. Li, R. A. Rice, L. A. Pagano-Stauffer, and M. A. Handschy, "High-speed, binary optically addressed spatial light modulator," Appl. Phys. Lett. 55, 537-539 (1989).

15. M. Robinson, K. M. Johnson, D. Jared, D. Doroski, S. Wichart, and G. Moddel, "Custom designed component for optically integrated multi-layer neural networks," in Optical Computing, Vol. 6 of OSA 1991 Technical Digest Series (Optical Society of America, Washington, D.C., 1990), pp. 84-87.

16. J. J. Hopfield, "Neural networks and physical systems with emergent collective computational abilities," Proc. Natl. Acad. Sci. USA 79, 2554-2558 (1982).

17. G. E. Hinton and T. J. Sejnowski, "Optimal perceptual inference," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (Institute of Electrical and Electronics Engineers, New York, 1983), pp. 448-453.

18. C. L. Giles and T. Maxwell, "Learning, invariance, and generalization in high-order neural networks," Appl. Opt. 26, 4972-4978 (1987).

19. D. Psaltis, C. H. Park, and J. Hong, "Higher order associative memories and their optical implementations," Neural Networks 1, 149-163 (1988).

20. A. Von Lehmen, E. G. Paek, P. F. Liao, A. Marrakchi, and J. S. Patel, "Influence of interconnection weight discretization and noise in an optoelectronic neural network," Opt. Lett. 14, 928-930 (1989).

21. A. Von Lehmen, E. G. Paek, L. C. Carrion, J. S. Patel, and A. Marrakchi, "Optoelectronic chip implementation of a quadratic associative memory," Opt. Lett. 15, 279-281 (1990).

22. L. Zhang, M. G. Robinson, and K. M. Johnson, "Optical implementation of a second order neural network," Opt. Lett. 16, 45-47 (1991).

23. P. Horan and A. Arimoto, "Optical implementation of a second-order neural network classifier with translation invariance," in Proceedings of the 1990 International Topical Meeting on Optical Computing (Japan Society of Applied Physics, Kobe, Japan, 1990), p. 102.

