[ieee conference proceedings., ieee international conference on systems, man and cybernetics -...



FAST LEARNING IN SYMBOLIC/NEURAL MODELS USING EXTERNAL CONSTRAINTS AND AUTOMATIC RE-STRUCTURING

Alistair D.C. Holden

Electrical Engineering Dept., University of Washington

Seattle, Washington

Introduction

Powerful, but fragile, systems can be built using production rules as a uniform way of encoding human knowledge. We have shown that a combination of rule-sets and neural nets can be powerful and less fragile. Neural nets are robust and can recognize the "closest" patterns. Production systems are fragile and fail if problems are tackled which are not exactly covered by their "left-side" predicates. Their power is derived from the painstaking addition of human knowledge by the designer to cover situations as they arise during testing, which were not envisaged during the original design process.

Neural nets are able to "learn" during operation, but learning is very slow and thus feasible only in narrow situations. Their power is derived from the skill of the designer in creating suitable a priori architectures.

There is no reason why an "expert system" which is encoded in rule-sets cannot be directly transformed into a neural net system, provided that the left-sides of the rules do not contain constructs such as "pattern variables," etc. Many rule-sets use propositional logic to specify their left-side patterns and can be directly implemented in neural nets.
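As an illustration of how a propositional left-side might map onto a neural unit, the following is a minimal sketch, not the construction used in the paper. It assumes a single threshold unit per rule, with weight +1 for each required proposition and -1 for each negated one; the function names and the proposition names in the usage note are hypothetical.

```python
def rule_to_unit(positive, negated):
    """Return (weights, bias) for a threshold unit that fires exactly when
    every proposition in `positive` is true and every one in `negated` is false.
    This encoding is an assumption made for illustration."""
    weights = {name: 1.0 for name in positive}
    weights.update({name: -1.0 for name in negated})
    # With all positives true and no negated input true, the sum is
    # len(positive); the bias leaves a margin of 0.5 above zero.
    bias = 0.5 - len(positive)
    return weights, bias


def fires(unit, facts):
    """Evaluate the unit on a dict of proposition name -> bool."""
    weights, bias = unit
    total = bias + sum(w * (1.0 if facts.get(name, False) else 0.0)
                       for name, w in weights.items())
    return total > 0
```

For example, `fires(rule_to_unit(["airspeed_low", "nose_high"], ["on_ground"]), {"airspeed_low": True, "nose_high": True})` evaluates to `True`, while setting `on_ground` to `True` suppresses the unit. A rule whose left-side contains pattern variables has no such fixed-weight encoding, which is the restriction noted above.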

We have developed an effective method for modelling very complex systems, called N-Port Symbolic-Neural (NPSN) modelling [1,2]. Both production-rule sets and back-propagation multilayer neural nets are used as building blocks. This method is effective for the development of "trained" models of real physical systems for both simulation of the system "as built" (with possible faults), and for the modelling of human control behavior. New methods have been devised to speed up the notoriously slow learning process in back-propagation networks.

We have shown that if large networks are broken down into architectures of smaller networks to make the input-output data of each sub-network "strictly monotonic," this speeds up the training process. Also, the concept of "rule-injection hints" was developed to speed up learning in networks with non-monotonic input-output data.

An analog equivalent of Denker's rule-extraction and functional entropy [3] was used to formally evaluate the effects of "hints." From this, an improvement ratio was derived which gives a lower bound for the improvement in the probability of rule extraction when a properly formulated hint is used. It was also shown that the functional entropy of the network decreases faster when such a hint is used, indicating a speed-up in training time in this case.

A series of experiments confirmed that a substantial improvement in training time resulted from the use of monotonic training data and rule-injection hints. Both a series of simple non-monotonic functions and a model of human control behavior with a planetary lander were modelled. The latter was modelled at varying stages of detail and, while each model functioned to some degree, the more detailed models performed more accurately and were trained in less time than the less detailed models [1].

In our more recent experiments, an NPSN model is being created to model the pilot in controlling an airplane in "instrument-flying" situations. It is clear that it is futile to attempt to implement such complex systems with a monolithic neural net. In the above case, the context of the situation could be "normal-flight" with sub-contexts Takeoff, Landing and Cruise. Takeoff could again have sub-contexts such as Visual Flight Takeoff, Short Field Takeoff or Soft Field Takeoff. Cruise could have sub-contexts such as straight and level, climb and descent [4]. All complex modelling situations can be decomposed into such hierarchies, and it is efficient to design NPSN systems with a module for each of the "lowest" (and simplest) sub-contexts and with the same hierarchy. When the appropriate situation is recognized, simple AND-gate switching can select the appropriate module, and back-propagation learning will then be applied only to that module.
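The context hierarchy and AND-gate switching described above can be sketched as follows. This is only an illustrative sketch: the dictionary layout, the function names, and the treatment of Landing as a leaf with no sub-context are assumptions, and the neural modules themselves are left out.

```python
# Contexts and sub-contexts named in the text; Landing is given no
# sub-contexts here (an assumption, since the text lists none).
HIERARCHY = {
    "Takeoff": ["Visual Flight Takeoff", "Short Field Takeoff", "Soft Field Takeoff"],
    "Landing": [],
    "Cruise": ["Straight and Level", "Climb", "Descent"],
}


def leaf_contexts(hierarchy):
    """List every lowest-level (context, sub_context) pair; each would own one module."""
    leaves = []
    for context, subs in hierarchy.items():
        if subs:
            leaves.extend((context, sub) for sub in subs)
        else:
            leaves.append((context, None))
    return leaves


def select_module(hierarchy, context, sub_context):
    """AND-gate switching: a leaf's enable line is high only when both its
    context and its sub-context have been recognized."""
    return {
        leaf: (leaf[0] == context) and (leaf[1] == sub_context)
        for leaf in leaf_contexts(hierarchy)
    }
```

Calling `select_module(HIERARCHY, "Cruise", "Climb")` raises exactly one enable line, so only the module for that leaf would receive back-propagation updates.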

Goldstein and Grimson [4] have designed an interesting system in LISP using "annotated production systems," for modelling pilot control, etc., and discuss such problems as choosing the best action in a given context when there are several candidates and exceptions to a rule. Neural Net systems can readily cope with these problems. They also are very able to cope with the real-time response problem.

CH2809-2/89/0000-0002 $1.00 © 1989 IEEE

While we have a complete dynamic model of a commercial airplane, we have started with a simple dynamic model where only two controls are necessary (Thrust and Pitching Moment). This will later be extended gradually to increasingly complex models. The existence of a prototype rule-based expert system for this problem greatly facilitates the development of an NPSN model.

Automatic Restructuring

In building the NPSN model, the "best" architecture is first created in terms of "strictly monotonic" modules as far as possible (to give learning efficiency). Learning efficiency is then also enhanced by adding properly chosen hints (dummy outputs of an NN module which constrain the weight-adjustment process). Automatic restructuring to improve both the continuous learning process and the performance then ranges from the simple process of weight adjustment (which by itself can remove parts of a network which do not contribute to a decision) to the automatic generation of suitable hints and other methods for adding or deleting sub-networks.

Library of Primitive NN's for "Features"

We are currently investigating the use of a large "library" of primitive neural nets, and combinations of these primitives which have previously been found to be useful, to synthesize optimal models automatically. The basic problem here is to index the primitives and combinations in terms of their functions and goals and to recognize when they should be used. This approach is similar to the "case-based reasoning" research which is currently of great interest to the AI community. The combinations of primitives are "features" of the problem. The best features are those which frequently occur in specific situations but occur infrequently in general (e.g., the F-ratio can be used as a measure).
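The text does not define the F-ratio, so the sketch below assumes the form common in pattern recognition: the variance of the per-situation means divided by the average within-situation variance. The function name and the sample data in the usage note are hypothetical.

```python
from statistics import mean, pvariance


def f_ratio(groups):
    """Between-group variance of the means over mean within-group variance.

    `groups` holds one list of feature values per situation. This definition
    is an assumption; the paper only names the F-ratio as a possible measure.
    Raises ZeroDivisionError if every group has zero internal variance.
    """
    group_means = [mean(g) for g in groups]
    between = pvariance(group_means)
    within = mean([pvariance(g) for g in groups])
    return between / within
```

A feature that is active mostly in one situation, such as `[[1, 1, 1, 0], [0, 0, 0, 1]]`, scores higher than one that is equally common everywhere, such as `[[0, 1, 0, 1], [1, 0, 1, 0]]`, which scores zero.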

NPSN Models

N-Port Symbolic-Neural (NPSN) models are built of n-port devices with arbitrary, but unidirectional, inputs and outputs. These models are built using many small neural networks, which are coupled together through conventional symbolic logic. Here, the neural network used is a multi-layer network trained using the back-propagation training algorithm defined by Rumelhart, Hinton and Williams [5]. With this modelling approach, explicit human knowledge is defined by the external structure of the networks, as well as rules and algorithms. Knowledge is then provided in the form of training for the individual neural networks, which are capable of learning from the observation of data.

It is often possible to partition an n-port model in such a way that the training data for the smaller component neural networks becomes strictly monotonic. This speeds up the learning process since the back-propagation training algorithm first attempts to model functions which are strictly monotonic [2].

Another method for increasing the probability of successful rule extraction for modelling non-strictly monotonic relationships is the addition of a "rule-injection hint" to the network. A hinted network is trained simultaneously for two response vector parts, the original response vector of interest Ru and the hint vector Rc. The hint constrains the training algorithm, forcing the hidden layer to be implemented in a way that can be linearly combined to generate both the output of interest Ru and the hint Rc, thus improving rule extraction and learning time.

As an example, a 2-hidden-unit network converged for strictly monotonic functions such as X+Y and X-Y in about 170 epochs. However, XOR(X,Y), a non-monotonic function, took 3001 epochs. Using X+Y and similar hints reduced the training time for XOR functions to 841 epochs.
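A hinted network of this kind can be sketched in plain Python as below. This is a minimal illustration under stated assumptions, not the authors' implementation: sigmoid units, squared error, per-pattern updates, the learning rate, the weight range, and the scaling of the X+Y hint into [0, 1] are all choices made here, and the sketch makes no claim to reproduce the epoch counts quoted above.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def make_targets(inputs, use_hint):
    """Target rows: [XOR] alone, or [XOR, hint] with the hint (X+Y)/2."""
    rows = []
    for x, y in inputs:
        row = [float(x != y)]
        if use_hint:
            row.append((x + y) / 2.0)  # scaled to fit a sigmoid output (assumption)
        rows.append(row)
    return rows


def train(inputs, targets, hidden=2, epochs=2000, lr=0.5, seed=0):
    """Per-pattern back-propagation; returns (forward function, loss per epoch)."""
    rng = random.Random(seed)
    n_in, n_out = len(inputs[0]), len(targets[0])
    # Each weight row ends with a bias weight.
    w_h = [[rng.uniform(-1, 1) for _ in range(n_in + 1)] for _ in range(hidden)]
    w_o = [[rng.uniform(-1, 1) for _ in range(hidden + 1)] for _ in range(n_out)]

    def forward(x):
        h = [sigmoid(sum(w * v for w, v in zip(row, list(x) + [1.0]))) for row in w_h]
        o = [sigmoid(sum(w * v for w, v in zip(row, h + [1.0]))) for row in w_o]
        return h, o

    history = []
    for _ in range(epochs):
        loss = 0.0
        for x, t in zip(inputs, targets):
            h, o = forward(x)
            d_o = [(ok - tk) * ok * (1 - ok) for ok, tk in zip(o, t)]
            # Hidden deltas sum the error from every output, the hint included,
            # which is how the hint constrains the shared hidden layer.
            d_h = [hj * (1 - hj) * sum(d_o[k] * w_o[k][j] for k in range(n_out))
                   for j, hj in enumerate(h)]
            for k in range(n_out):
                for j, v in enumerate(h + [1.0]):
                    w_o[k][j] -= lr * d_o[k] * v
            for j in range(hidden):
                for i, v in enumerate(list(x) + [1.0]):
                    w_h[j][i] -= lr * d_h[j] * v
            loss += sum((ok - tk) ** 2 for ok, tk in zip(o, t))
        history.append(loss)
    return forward, history
```

Training once with `make_targets(inputs, use_hint=False)` and once with `use_hint=True` on the four XOR patterns, then comparing the XOR component of the error per epoch, gives a rough way to explore the effect; results will vary with the seed and learning rate.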

Briefly, the effectiveness of the hints is explained in two ways. First, if the hints are strongly monotonic (i.e., the gradient vector is large over the portion of the range for which the original output of interest is non-monotonic), then they propagate strong error signals of a monotonic function to the hidden layer. These strong error signals override the "weak" signals of the non-monotonic output and therefore make the hidden units train more as though they were training to model strictly monotonic functions. The other reason is that hints serve as constraints, reducing the number of possible solutions that the network will find as it trains. If the hint and the output of interest are derived from the same concepts or rules (such as shared intermediate terms if both results were to be computed by a physics equation), then there is a high probability that the network will extract the proper rule. Furthermore, the entropy of the training mechanism decreases faster for a hinted network, thus indicating a faster training cycle [2].

The following tests will indicate if the "hint" rc is likely to be effective:

1. The hint rc must be strictly monotonic with respect to the inputs, S.

2. The output of interest, ru, should be a monotonic combination of a subset of S and the hint rc.

3. rc should not be derivable from a linear (or nearly linear) combination of the other outputs.
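The first of these tests can be checked numerically on sampled inputs. The sketch below is one possible reading, assuming "strictly monotonic with respect to the inputs" means strictly increasing or strictly decreasing along each input when the others are held fixed, with a consistent direction per input; the function name is hypothetical and `grid` is expected to be sorted with at least two values.

```python
from itertools import product


def strictly_monotonic_in_each_input(f, grid, n_inputs):
    """Check f on a sorted grid: along every input axis it must strictly rise,
    or strictly fall, in the same direction for all settings of the other inputs."""
    for axis in range(n_inputs):
        directions = set()
        for others in product(grid, repeat=n_inputs - 1):
            values = []
            for g in grid:
                args = list(others)
                args.insert(axis, g)  # place the swept value at this axis
                values.append(f(*args))
            diffs = [b - a for a, b in zip(values, values[1:])]
            if all(d > 0 for d in diffs):
                directions.add(+1)
            elif all(d < 0 for d in diffs):
                directions.add(-1)
            else:
                return False
        if len(directions) != 1:
            return False
    return True
```

On the grid `[0, 1]`, the candidate hints X+Y and X-Y pass this check, while XOR fails because its direction along one input flips with the value of the other.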


In Suddarth, Sutton and Holden [1] an example is given where a symbolic-neural model of human control behavior was used to control a simulated planetary lander. The first attempt used a symbolic-neural model consisting of a single back-propagation network which used three inputs (fuel, present velocity, and altitude) and produced one output: thrust. Training took 323,083 epochs (11 hours and 35 minutes on a Compaq/286).

The system did land somewhat successfully (rate of descent less than 3 units) in two of the three above cases, but it only did so because it ran out of fuel close to touchdown. Thus, the model had not performed an entirely successful rule extraction. Further definition of the model was required.

Further definition was added by defining a "desired velocity" profile and training it into a sub-module.

This architecture produced a more "sure" landing than the single-network system. Also, the system controlled velocity nearly as well as the human did. Another significant benefit of this approach was the difference in learning times: network No. 1 trained in 721 epochs and network No. 2 trained in 13,041 epochs, for a total training time of 27 minutes and 43 seconds.

A rule was then added which told the model to "dither" thrust (an integer value) if necessary to hold velocity. The final architecture is shown in Figure 1. This approach made the craft touch down at 3 ft/s, 1 ft/s slower than the model without rules and only 1 ft/s faster than the human.

The NNL experiment was also implemented with a "hinted" model as shown in Figure 2.

This system converged in 42,017 epochs (1 hour and 58 minutes on the Compaq/80286). This was only 17% of the training time required for the same architecture without the hint. The performance also indicated successful rule extraction, although with touchdown velocities around 4 ft/s, a little rougher than the more structured models already discussed.
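The reported training times can be compared with a few lines of arithmetic. The sketch below assumes that the unhinted baseline for the 17% figure is the single-network run of 11 hours and 35 minutes reported earlier; the text does not state this explicitly, although the numbers are consistent with it. The variable names are illustrative.

```python
def minutes(h=0, m=0, s=0):
    """Convert hours, minutes, seconds to minutes."""
    return h * 60 + m + s / 60


single_network = minutes(h=11, m=35)   # single back-propagation network
two_network = minutes(m=27, s=43)      # desired-velocity sub-module architecture
hinted = minutes(h=1, m=58)            # hinted NNL model

# Training time of each alternative as a percentage of the single-network run.
hinted_percent = 100 * hinted / single_network
two_network_percent = 100 * two_network / single_network
```

Under that assumption `hinted_percent` rounds to 17, matching the figure quoted above, and `two_network_percent` rounds to 4.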

References:

1. Suddarth, S.C., Sutton, S.A. and Holden, A.D.C., "A Symbolic-Neural Method for Solving Control Problems," Proceedings International Conference on Neural Nets, San Diego, 1988.

2. Suddarth, S.C. and Holden, A.D.C., "An N-Port Symbolic-Neural Method for Modelling and Controlling Systems," to be published, Journal of Man-Machine Studies.

3. Denker, J., Schwartz, D., Wittner, E., Solla, S., Hopfield, J., Howard, R., and Jackel, L., "Automatic Learning, Rule-Extraction and Generalization," Complex Systems, Vol. 1, pp. 877-922, 1987.

4. Goldstein, I.P. and Grimson, E., "Annotated Production Systems: A Model for Skill Acquisition," AI Memo 407, AI Laboratory, M.I.T., Feb. 1977.

5. Rumelhart, D.E., Hinton, G.E., Williams, R.J., "Learning Internal Representations by Error Propagation," ch. 7 in Parallel Distributed Processing, Volume 1: Foundations, Rumelhart & McClelland (ed), MIT Press, Cambridge, pp. 318-362, 1986.


FIGURE 1: 2-Network Architecture with Rules

FIGURE 2: Hinted NNL Model Architecture
