

January 1, 1991 / Vol. 16, No. 1 / OPTICS LETTERS 45

Optical implementation of a second-order neural network

Lin Zhang, Michael G. Robinson, and Kristina M. Johnson

Center for Optoelectronic Computing Systems, Department of Electrical and Computer Engineering, University of Colorado at Boulder, Boulder, Colorado 80309

Received June 26, 1990; accepted November 2, 1990

An optical implementation of a single-layer, second-order neural network is presented. The quadratic products are obtained by passing the optical beam twice through the input ferroelectric liquid-crystal (FLC) spatial light modulator (SLM), with the interconnection weights being implemented by a further two-dimensional 128 × 128 FLC SLM. The machine successfully associates eight randomly chosen pattern-target pairs (dimensions 16 and 4, respectively) and can learn the parity association. Translation invariance is also demonstrated. Results from a computer model indicate that input SLM contrast ratios of 4:1 and electronic noise of 10% of the maximum output can be tolerated.

We have shown previously that optoelectronic systems can perform pattern association with single-layer perceptron-type neural networks, despite large errors due to the optical hardware used in the implementation.1 Furthermore, a computer model of the hardware shows that the errors do not severely affect the system performance, even when scaled to accommodate large input vector dimensions of 200. Single-layer machines are limited, however, in their ability to classify input vectors. For example, linearly inseparable input patterns cannot in general be associated with arbitrary output vectors. For many types of application a higher complexity is necessary to overcome these limitations. We have therefore modified the original single-layer design to permit quadratic interconnections between neurons, thus increasing the association capability of the network without decreasing the processing speed of the machine and while maintaining the error tolerance of the original design.

For quadratic interconnections W_ijk and an input vector I_j, the output vector components O_i are given by

O_i = f{ Σ_jk W_ijk I_j I_k },  (1)

where f{x} is the sigmoidal nonlinearity, and the weight update during training is

ΔW_ijk = η(T_i − O_i) I_j I_k,  (2)

where η is the learning parameter and T_i are the target vector components.
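The forward pass of Eq. (1) and the update of Eq. (2) can be sketched in a few lines of NumPy. This is a minimal model, not the authors' control software; the seed, weight scale, and iteration count are illustrative, while the dimensions (16 and 4), learning parameter (0.3), and activation coefficient (3.0) are values quoted in the text.

```python
import numpy as np

rng = np.random.default_rng(0)

N_IN, N_OUT = 16, 4        # vector dimensions used in the paper
ETA, BETA = 0.3, 3.0       # learning parameter and activation coefficient

def sigmoid(x, beta=BETA):
    """Sigmoidal nonlinearity f{x} with activation coefficient beta."""
    return 1.0 / (1.0 + np.exp(-beta * x))

def forward(W, I):
    """Eq. (1): O_i = f{ sum_jk W_ijk I_j I_k }."""
    outer = np.outer(I, I)                    # quadratic products I_j I_k
    return sigmoid(np.einsum('ijk,jk->i', W, outer))

def update(W, I, T):
    """Eq. (2): delta W_ijk = eta (T_i - O_i) I_j I_k."""
    err = T - forward(W, I)
    return W + ETA * np.einsum('i,j,k->ijk', err, I, I)

# train a single randomly chosen pattern-target pair (illustrative)
W = rng.normal(scale=0.1, size=(N_OUT, N_IN, N_IN))
I = rng.integers(0, 2, N_IN).astype(float)
T = rng.integers(0, 2, N_OUT).astype(float)
for _ in range(200):
    W = update(W, I, T)
```

After a few hundred updates the hard-thresholded outputs match the target, mirroring the machine's definition of a successful association.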

Second-order networks in particular permit translation and scale invariance with regard to the input components.2,3 That is, g(I_i) = g(I_{i+m}) and g(I_i) = g(I_{i×m}), where g represents the output of the network. For the second-order network the translation invariance can be rigorously enforced by setting

f{ Σ_jk W_ijk I_j I_k } = f{ Σ_jk W_ijk I_{j+m} I_{k+m} },  (3)

with

W_ijk = W_{i,j+m,k+m},  (4)

where 0 ≤ j − m ≤ N, 0 ≤ k − m ≤ N, and N is the dimension of the input vector. Scale invariance cannot be rigorously enforced in the same way but in practice can be learned by these networks. To implement the translation invariance, either the training algorithm in Eq. (2) can be altered to enforce the added restriction on the weights, such that

ΔW_ijk = Σ_m η(T_i − O_i) I_{j−m} I_{k−m},  (5)

or the training set presented to the unaltered network can include all translational variations of a particular input vector. The condition given in Eq. (4) effectively reduces the number of independent interconnections, which in turn severely limits the network capacity to learn pattern associations. For this reason no alteration of the training rule is made for the optically implemented machine.
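The invariance condition of Eqs. (3) and (4) can be verified numerically. The sketch below assumes a circular (wrap-around) translation, so that weights satisfying W_ijk = W_{i,j+m,k+m} depend only on (j − k) mod N; the paper's finite-aperture boundary condition is simplified away here.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 16

# Eq. (4): W_ijk = W_{i,j+m,k+m} for every shift m forces the weight to
# depend only on (j - k); here indices wrap modulo N (circular translation).
V = rng.normal(size=(4, N))                   # one weight profile per output unit
j = np.arange(N)
W = V[:, (j[:, None] - j[None, :]) % N]       # W[i, j, k] = V[i, (j - k) mod N]

def net_input(W, I):
    """The argument of f{.} in Eq. (3): sum_jk W_ijk I_j I_k."""
    return np.einsum('ijk,j,k->i', W, I, I)

I = rng.integers(0, 2, N).astype(float)
for m in range(N):
    I_shifted = np.roll(I, m)                 # translated input
    assert np.allclose(net_input(W, I), net_input(W, I_shifted))
```

Every circular translation of the input yields exactly the same net input to each output unit, which is the sense in which Eq. (4) "rigorously enforces" the invariance.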

Previous optical implementations of second-ordermachines have used either two input spatial lightmodulators4 (SLM's) or intermediate electronic pro-cessors5 to perform the quadratic products. Here weuse a single input SLM with no intermediate electron-ic processing. The architecture of the optical imple-mentation is shown in Fig. 1. The input vector com-ponents are amplitude encoded on a collimated, 5-mW, He-Ne laser beam using a ferroelectric liquid-crystal (FLC) SLM. This device was fabricated bysurface stabilizing Chisso CS1014 FLC material be-tween two optical flats. A positive or negative (±15V) d.c. electric field is applied across the material,through vertically striped transparent electrodes, toswitch the material between two bistable states, theoptic axes of which are separated by 450. By makingthe thickness of the material equal to X/2dAn ( half-wave) and placing the device between crossed polariz-ers (P1, P2), vertically striped amplitude modulationcorresponding to the binary input vector componentsis achieved. In this particular implementation theaxes of these polarizers were set at +450 to the vertical,with the FLC optic axes designed to be 0° and +45° forvector components of 1 and 0, respectively.

0146-9592/91/010045-03$5.00/0 © 1991 Optical Society of America
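The amplitude modulation described above (a switchable half-wave plate between crossed polarizers) can be checked with Jones calculus. This is a sketch of the stated geometry, with the polarizer axes at ±45° to the FLC reference axis and a unit-amplitude input beam; the function names are illustrative.

```python
import numpy as np

deg = np.pi / 180

def polarizer(theta):
    """Jones matrix of an ideal linear polarizer with its axis at angle theta."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c * c, c * s], [c * s, s * s]])

def half_wave(theta):
    """Jones matrix of a half-wave plate with its optic axis at angle theta."""
    c2, s2 = np.cos(2 * theta), np.sin(2 * theta)
    return np.array([[c2, s2], [s2, -c2]])

def transmitted_intensity(flc_axis):
    """Intensity after polarizer P1 (+45 deg), the FLC pixel, and crossed P2 (-45 deg)."""
    E_in = np.array([1.0, 0.0])               # unit amplitude along the reference axis
    E = polarizer(-45 * deg) @ half_wave(flc_axis) @ polarizer(45 * deg) @ E_in
    return float(E @ E)

# FLC optic axis at 0 deg  -> transmission (input bit 1)
# FLC optic axis at +45 deg -> extinction  (input bit 0)
```

With the optic axis at 0° the half-wave plate rotates the +45° polarization to −45°, passing the crossed analyzer; at +45° the polarization is unchanged and blocked.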




Fig. 1. Schematic of the second-order neural network optical architecture.

This vertically striped encoded beam is then reflected in a 45°-rotated right-angle prism, which creates a counterpropagating beam with horizontally modulated stripes. A further traversal of the beam through the input SLM and polarizer P2 forms the spatially encoded outer-product matrix, I_i I_j. The interconnection weights are introduced by passing this beam through a subsequent 128 × 128 binary FLC SLM that is spatially multiplexed to permit quantized weight values. Each interconnection weight is represented by a maxipixel, which is in turn made up of a 4 × 4 binary minipixel array that allows 17 gray levels to be implemented. A minipixel switches the optical polarization between vertical and horizontal, corresponding to + and − output channels.6 The + and − polarizations are separated with a polarizing beam splitter, and the output is obtained by subtracting between detector pairs.
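The 17 gray levels follow from the 4 × 4 minipixel arithmetic: with n of the 16 minipixels routed to the + channel, the differential signal n − (16 − n) takes the values −16, −14, …, +16. A sketch, assuming nearest-level quantization (the hardware's exact encoding rule is not specified in the text):

```python
import numpy as np

# Differential levels n - (16 - n) for n = 0..16 plus-channel minipixels.
LEVELS = np.arange(-16, 17, 2)                # 17 gray levels

def quantize(w, w_max):
    """Map real weights in [-w_max, w_max] to the nearest of the 17 levels."""
    scaled = np.clip(np.asarray(w) / w_max, -1.0, 1.0) * 16
    return LEVELS[np.abs(LEVELS - scaled[..., None]).argmin(axis=-1)]

def detect(weights, outer):
    """Differential readout: '+' channel minus '-' channel for each output unit."""
    plus = np.where(weights > 0, weights, 0)     # light routed to + polarization
    minus = np.where(weights < 0, -weights, 0)   # light routed to - polarization
    return (np.einsum('ijk,jk->i', plus, outer)
            - np.einsum('ijk,jk->i', minus, outer))
```

Subtracting the two detector channels recovers the signed weighted sum even though each optical channel carries only non-negative intensities.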

To have more than one output unit, the input vector is replicated N times, which results in N² spatially separated outer-product matrices, each corresponding to an output unit. In this machine an input vector of dimension 16 is associated with an output vector of dimension 4. The nonlinearity, the calculation of the error, the presentation of the input vector, and the updating of the weight matrix on the SLM are all carried out by the controlling IBM AT personal computer.

Ten pattern files containing random associations were trained on the machine with different random initial weight matrices. The files were created by using a random number generator and had on average an equal number of ones and zeros in both the target and pattern vectors. The average Hamming distances for the inputs and targets were therefore 8 and 2, respectively; i.e., half the bits were different between patterns. Results show that the machine successfully learns 10 associations for 6 such training sets and obtained a total sum-squared (tss) error per pattern of less than 0.1 in fewer than 15 complete presentations of the input-target vector pairs (epochs). A successful association is one in which all output bits are predicted correctly when the activation coefficient is made infinite, i.e., equivalent to a hard threshold. The remaining four training sets converged to a tss error per pattern of 0.3 in 20 epochs, which corresponds to a 12% error in the output vector components. The activation coefficient is 3.0, and the learning parameter η is 0.3. These results show that the machine has a capacity comparable with that of the previous single-layer implementation,1 despite the larger errors of the more complex optical system.

This machine is capable of learning higher-order associations, whereas single-layer machines cannot. An example of a higher-order association is the parity


Fig. 2. Comparison between the training error as a function of pattern presentation for the computer model and the optically implemented neural network. Ten random binary input vectors (of dimension 16) are mapped onto output vectors dependent on their parity. Both the computer model and the optical machine are initialized to the same random weight matrix.

Table 1. Magnitudes of the System Errors

  System Error                                        Experimental Measurements
  Contrast ratio of input SLM                         20:1
  Gaussian beam profile
    (intensity of input bit 0/maximum intensity)      0.55
  Detector mismatch                                   3%
  Cross talk between adjacent channels
    (before weight SLM/after weight SLM)              1%/1%
  Random noise (electronic)                           5%
  Weight value gray levels                            17


Fig. 3. Effect of altering the contrast ratio on training 12 random pattern-target associations on the computer model. All other system noise errors are included, with magnitudes typical of the actual optical implementation.


Fig. 4. Effect of altering the temporal random noise on training 12 random associations on the computer model. Each curve represents a noise level as a percentage of the maximum output achievable.

association. Here binary random input vectors with an even number of ones are associated with one output vector (1 1 0 0), and those inputs having an odd number of ones are associated with another output vector (0 0 1 1). Ten files containing seven randomly chosen input patterns were successfully associated with respect to their parity, with typical results shown in Fig. 2.
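The parity experiment can be reproduced in simulation with the training rule of Eqs. (1) and (2). A sketch with illustrative seeds and the parameter values quoted earlier (η = 0.3, activation coefficient 3.0, 20 epochs):

```python
import numpy as np

rng = np.random.default_rng(2)
N_IN, N_OUT, ETA, BETA = 16, 4, 0.3, 3.0

# seven random 16-bit patterns labeled by parity, as in the experiment
patterns = rng.integers(0, 2, (7, N_IN)).astype(float)
targets = np.array([[1, 1, 0, 0] if p.sum() % 2 == 0 else [0, 0, 1, 1]
                    for p in patterns], dtype=float)

def f(x):
    return 1.0 / (1.0 + np.exp(-BETA * x))    # sigmoidal nonlinearity

def output(W, p):
    return f(np.einsum('ijk,j,k->i', W, p, p))

def tss(W):
    """Total sum-squared error over all pattern-target pairs."""
    return sum(np.sum((t - output(W, p)) ** 2) for p, t in zip(patterns, targets))

W = rng.normal(scale=0.01, size=(N_OUT, N_IN, N_IN))
tss_start = tss(W)
for epoch in range(20):                        # epoch count as quoted in the text
    for p, t in zip(patterns, targets):
        err = t - output(W, p)
        W += ETA * np.einsum('i,j,k->ijk', err, p, p)
```

Because the seven outer-product feature vectors are almost surely linearly independent in the 256-dimensional quadratic feature space, the network can memorize the parity labels even though full 16-bit parity is beyond a second-order machine.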

Finally, the translation invariance inherent in quadratic networks of this type can also be demonstrated on this machine. Results show that the machine successfully learns to associate two sets of translationally related, 16-component input vectors with two separate output vectors. Training was completed in 20 epochs, with >95% accuracy in distinguishing between the two vectors presented at random positions following the training.

The performances of the individual optical devices that make up the system and the optical alignment contribute to the overall system error. In particular, the contrast ratio of the input SLM, the Gaussian profile of the laser beam, the cross talk between spatially separate optical channels, the quantized weight values, and the temporal electronic noise contribute to the system noise. Typical magnitudes of these system error terms for this machine are given in Table 1. Incorporation of these errors into a computer model of the machine allowed us to simulate the influence of component noise on the system performance. The agreement between the simulation results and those obtained by the optical machine is shown in Fig. 2. With noise levels close to those measured for the machine, the training error is most affected by altering the contrast ratio and the random noise terms. Figure 3 shows the typical effect of different input SLM contrast ratios on the training error in the presence of all other noise terms. From the figure it can be seen that contrast ratios as low as 4:1 can be tolerated by this architecture. Figure 4 shows the equivalent results when the temporal noise associated with the electronic hardware is altered as a percentage of the maximum output signal. Again, the machine is tolerant of noise levels below 10%.
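The dominant error terms can be folded into a simulation along these lines. The leakage and additive-noise forms below are assumptions consistent with Table 1, not the authors' exact noise model:

```python
import numpy as np

rng = np.random.default_rng(3)

def apply_contrast(I, contrast=20.0):
    """Finite input-SLM contrast: a '0' pixel leaks 1/contrast of the '1' level."""
    return np.where(I > 0.5, 1.0, 1.0 / contrast)

def detected_output(W, I, contrast=20.0, noise_frac=0.05, out_max=1.0):
    """Quadratic weighted sum with SLM leakage plus additive electronic noise."""
    I_opt = apply_contrast(I, contrast)
    signal = np.einsum('ijk,j,k->i', W, I_opt, I_opt)
    noise = noise_frac * out_max * rng.standard_normal(signal.shape)
    return signal + noise
```

Sweeping `contrast` (e.g., 20:1 down to 2:1) and `noise_frac` in such a model is how curves like those of Figs. 3 and 4 can be generated.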

In conclusion, we have implemented a second-order, single-layer neural network using only one input SLM and no intermediate electronic processing. The increased complexity of such higher-order networks allows for successful learning of the random associations and the parity mapping. The translation invariance of a second-order network was demonstrated with this optical architecture by training with translationally interrelated patterns. Forcing this invariance by altering the training algorithm was not implemented, as doing so severely reduces the network capacity. By using a computer model in which the system error terms were included, the effect of such noise was investigated. It was found that the contrast ratio of the input SLM and the electronic noise most affected the system performance. However, contrast ratios as low as 4:1 and noise levels of 10% of the maximum possible output have small effects on the system learning rate and the final tss error per pattern.

We gratefully acknowledge the support of the National Science Foundation Center for Optoelectronic Computing Systems (grant CDR862226) and interesting discussions with Shelly Goggin, a GTE Graduate Research Fellow.

References

1. M. G. Robinson, K. M. Johnson, L. Zhang, and B. J. Bigner, Proc. Soc. Photo-Opt. Instrum. Eng. 1215, 389 (1990).

2. C. L. Giles and T. Maxwell, Appl. Opt. 26, 4972 (1987).

3. M. B. Reid, L. Spirkovska, and E. Ochoa, "Rapid training of higher-order neural networks for invariant pattern recognition," in Proceedings of the International Joint Conference on Neural Networks (Institute of Electrical and Electronics Engineers, New York, 1989).

4. P. Horan and A. Arimoto, in Proceedings of the 1990 International Topical Meeting on Optical Computing (Japan Society of Applied Physics, Kobe, Japan, 1990), p. 102.

5. A. Von Lehmen, E. G. Paek, L. C. Carrion, J. S. Patel, and A. Marrakchi, Opt. Lett. 5, 279 (1980).

6. M. Kranzdorf, K. M. Johnson, J. Bigner, and L. Zhang, Opt. Eng. 28, 844 (1989).