[ieee conference proceedings., ieee international conference on systems, man and cybernetics -...
FAST LEARNING IN SYMBOLIC/NEURAL MODELS USING EXTERNAL CONSTRAINTS AND AUTOMATIC RE-STRUCTURING
Alistair D.C. Holden
Electrical Engineering Dept., University of Washington
Seattle, Washington
Introduction
Powerful, but fragile, systems can be built using production rules as a uniform way of encoding human knowledge. We have shown that a combination of rule-sets and neural nets can be powerful and less fragile. Neural nets are robust and can recognize the "closest" patterns. Production systems are fragile and fail if problems are tackled which are not exactly covered by their "left-side" predicates. Their power is derived from the painstaking addition of human knowledge by the designer to cover situations as they arise during testing, which were not envisaged during the original design process.
Neural nets are able to "learn" during operation, but learning is very slow and thus feasible only in narrow situations. Their power is derived from the skill of the designer in creating suitable a priori architectures.
There is no reason why an "expert system" which is encoded in rule-sets cannot be directly transformed into a neural net system, provided that the left-sides of the rules do not contain constructs such as "pattern variables," etc. Many rule-sets use propositional logic to specify their left-side patterns and can be directly implemented in neural nets.
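As a minimal sketch of how a propositional left-side could map onto a single neural unit, the following Python encodes a conjunction of literals as a threshold unit. The rule, the literal names (`low_fuel`, `landed`) and the helper names are hypothetical and are not taken from the paper; a hard threshold is used in place of a trained sigmoid unit.

```python
from itertools import product

def rule_to_unit(positive, negated):
    """Return (weights, bias) of one threshold unit that fires only when
    every literal in `positive` is 1 and every literal in `negated` is 0."""
    weights = {name: 1.0 for name in positive}
    weights.update({name: -1.0 for name in negated})
    # The weighted sum reaches len(positive) only when all literals match;
    # any mismatch lowers it by at least 1, so 0.5 is a safe margin.
    bias = 0.5 - len(positive)
    return weights, bias

def fires(unit, facts):
    """Evaluate the unit on a dict of 0/1 facts."""
    weights, bias = unit
    total = sum(w * facts[name] for name, w in weights.items()) + bias
    return total > 0

# Hypothetical rule: IF low_fuel AND NOT landed THEN warn
unit = rule_to_unit(positive=["low_fuel"], negated=["landed"])
for low_fuel, landed in product([0, 1], repeat=2):
    print(low_fuel, landed, fires(unit, {"low_fuel": low_fuel, "landed": landed}))
```

Only the input combination that satisfies the rule makes the unit fire. A disjunction of several such rules could be formed by a second-layer unit in the same way.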
We have developed an effective method for modelling very complex systems, called N-Port Symbolic-Neural (NPSN) modelling [1,2]. Both production-rule sets and back-propagation multilayer neural nets are used as building blocks. This method is effective for the development of "trained" models of real physical systems for both simulation of the system "as built" (with possible faults), and for the modelling of human control behavior. New methods have been devised to speed up the notoriously slow learning process in back-propagation networks.
We have shown that if large networks are broken down into architectures of smaller networks to make the input-output data of each sub-network "strictly monotonic," this speeds up the training process. Also, the concept of "rule-injection hints" was developed to speed up learning in networks with non-monotonic input-output data.
An analog equivalent of Denker's rule-extraction and functional entropy [3] was used to formally evaluate the effects of "hints." From this, an improvement ratio was derived which gives a lower bound for the improvement in the probability of rule extraction when a properly formulated hint is used. It was also shown that the functional entropy of the network decreases faster when such a hint is used, indicating a speed-up in training time in this case.
A series of experiments confirmed that a substantial improvement in training time resulted from the use of monotonic training data and rule-injection hints. Both a series of simple non-monotonic functions and a model of human control behavior with a planetary lander were modelled. The latter was modelled at varying stages of detail and, while each model functioned to some degree, the more detailed models performed more accurately and were trained in less time than the less detailed models [1].
In our more recent experiments, an NPSN model is being created to model the pilot in controlling an airplane in "instrument-flying" situations. It is clear that it is futile to attempt to implement such complex systems with a monolithic neural net. In the above case, the context of the situation could be "normal-flight" with sub-contexts Takeoff, Landing and Cruise. Takeoff could again have sub-contexts such as Visual Flight Takeoff, Short Field Takeoff or Soft Field Takeoff. Cruise could have sub-contexts such as straight and level, climb and descent [4]. All complex modelling situations can be decomposed into such hierarchies, and it is efficient to design NPSN systems with a module for each of the "lowest" (and simplest) sub-contexts and with the same hierarchy. When the appropriate situation is recognized, simple AND-gate switching can select the appropriate module, and back-propagation learning will then be applied only to that module.
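The context hierarchy and AND-gate selection described above can be sketched as follows. This is an illustration under stated assumptions, not the NPSN implementation: the dictionary layout and function names are hypothetical, Landing is treated as a leaf because the text lists no sub-contexts for it, and each "module" is represented only by its path in the hierarchy.

```python
# Hypothetical context tree, using the sub-context names from the text.
HIERARCHY = {
    "Takeoff": ["Visual Flight Takeoff", "Short Field Takeoff", "Soft Field Takeoff"],
    "Landing": [],  # assumption: no sub-contexts are listed for Landing
    "Cruise": ["Straight and Level", "Climb", "Descent"],
}

def leaf_modules(hierarchy):
    """One module per lowest-level sub-context, keyed by its path."""
    leaves = []
    for context, subs in hierarchy.items():
        if subs:
            leaves.extend((context, sub) for sub in subs)
        else:
            leaves.append((context,))
    return leaves

def select_module(leaves, active):
    """AND-gate switching: a leaf is enabled only if every context
    on its path has been recognized as active."""
    enabled = [leaf for leaf in leaves if all(name in active for name in leaf)]
    if len(enabled) != 1:
        raise ValueError(f"expected exactly one active module, got {enabled}")
    return enabled[0]

leaves = leaf_modules(HIERARCHY)
print(select_module(leaves, {"Cruise", "Climb"}))
```

In a trainable system, the selected path would identify the one sub-network whose weights are updated, leaving the other modules untouched.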
Goldstein and Grimson [4] have designed an interesting system in LISP using "annotated production systems," for modelling pilot control, etc., and discuss such problems as choosing the best action in a given context when there are several candidates and exceptions to a rule. Neural net systems can readily cope with these problems. They also are very able to cope with the real-time response problem.
CH2809-2/89/0000-0002 $1.00 © 1989 IEEE
While we have a complete dynamic model of a commercial airplane, we have started with a simple dynamic model where only two controls are necessary (Thrust and Pitching Moment). This will later be extended gradually to increasingly complex models. The existence of a prototype rule-based expert system for this problem greatly facilitates the development of an NPSN model.
Automatic Restructuring
In building the NPSN model, the "best" architecture is first created in terms of "strictly monotonic" modules as far as possible (to give learning efficiency). Learning efficiency is then also enhanced by adding properly chosen hints (dummy outputs of an NN module which constrain the weight-adjustment process). Automatic restructuring, to improve both the continuous learning process and the performance, then ranges from the simple process of weight adjustment (which by itself can remove parts of a network which do not contribute to a decision) to the automatic generation of suitable hints and other methods for adding or deleting sub-networks.
Library of Primitive NN's for "Features"
We are currently investigating the use of a large "library" of primitive neural nets, and combinations of these primitives which have previously been found to be useful, to synthesize optimal models automatically. The basic problem here is to index the primitives and combinations in terms of their functions and goals and to recognize when they should be used. This approach is similar to the "case-based reasoning" research which is currently of great interest to the AI community. The combinations of primitives are "features" of the problem. The best features are those which frequently occur in specific situations but occur infrequently in general (e.g., the F-ratio can be used as a measure).
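The text does not say which form of F-ratio is meant. As one possible reading, the sketch below uses the one-way analysis-of-variance F-ratio (between-group variance over within-group variance) to score how well a feature's activation separates situations. The function name and the sample activations are hypothetical.

```python
def f_ratio(groups):
    """One-way F-ratio: between-group mean square over within-group mean square.
    `groups` is a list of lists, one list of feature activations per situation."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    between = sum(len(g) * (m - grand_mean) ** 2
                  for g, m in zip(groups, means)) / (k - 1)
    within = sum((v - m) ** 2
                 for g, m in zip(groups, means) for v in g) / (n - k)
    return between / within

# Hypothetical activations of one feature in two situations.
in_situation = [0.9, 1.0, 0.8]   # feature occurs often here
elsewhere = [0.1, 0.0, 0.2]      # and rarely in general
print(f_ratio([in_situation, elsewhere]))
```

A feature that fires consistently in one situation and rarely elsewhere yields a large ratio, which matches the selection criterion stated above.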
NPSN Models
N-Port Symbolic-Neural (NPSN) models are built of n-port devices with arbitrary, but unidirectional, inputs and outputs. These models are built using many small neural networks, which are coupled together through conventional symbolic logic. Here, the neural network used is a multi-layer network trained using the back-propagation training algorithm defined by Rumelhart, Hinton and Williams [5]. With this modelling approach, explicit human knowledge is defined by the external structure of the networks, as well as rules and algorithms. Knowledge is then provided in the form of training for the individual neural networks which are capable of learning from the observation of data.
It is often possible to partition an n-port model in such a way that the training data for the smaller component neural networks becomes strictly monotonic. This speeds up the learning process since the back-propagation training algorithm first attempts to model functions which are strictly monotonic [2].
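The paper does not give a formal definition of "strictly monotonic" here. The sketch below assumes one plausible reading: for each input, the output moves strictly in one consistent direction as that input increases while the others are held fixed. The function name and the grid-based check are illustrative choices.

```python
from itertools import product

def strictly_monotonic(f, grid, n_inputs):
    """True if, for every input, f strictly moves in one consistent direction
    as that input increases while the other inputs are held fixed."""
    for i in range(n_inputs):
        directions = set()
        for point in product(grid, repeat=n_inputs):
            for step in grid:
                if step > point[i]:
                    moved = point[:i] + (step,) + point[i + 1:]
                    diff = f(*moved) - f(*point)
                    # Record the sign of the change: -1, 0 or +1.
                    directions.add((diff > 0) - (diff < 0))
        if directions not in ({1}, {-1}):
            return False
    return True

grid = [0, 1]
print("X+Y:", strictly_monotonic(lambda x, y: x + y, grid, 2))
print("X-Y:", strictly_monotonic(lambda x, y: x - y, grid, 2))
print("XOR:", strictly_monotonic(lambda x, y: x ^ y, grid, 2))
```

Under this reading X+Y and X-Y pass, while XOR fails because the effect of X reverses sign depending on Y. A check of this kind could be applied to the training data of each candidate sub-network when choosing a partition.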
Another method for increasing the probability of successful rule extraction for modelling non-strictly monotonic relationships is the addition of a "rule-injection hint" to the network. A hinted network is trained simultaneously for two response vector parts, the original response vector of interest Ru and the hint vector Rc. The hint constrains the training algorithm, forcing the hidden layer to be implemented in a way that can be linearly combined to generate both the output of interest Ru and the hint Rc, thus improving rule extraction and learning time.
As an example, a 2-hidden-unit network converged for strictly monotonic functions such as X+Y and X-Y in about 170 epochs. However, XOR(X,Y), a non-monotonic function, took 3001 epochs. Using X+Y and similar hints reduced the training time for XOR functions to 841 epochs.
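With the epoch counts above, the hinted XOR run needed about 28% of the unhinted epochs (841/3001). The sketch below illustrates how a hinted training set for XOR with the hint X+Y might be assembled, and how standard back-propagation lets the hint output contribute to a hidden unit's error. The function names, the unscaled hint targets and the numeric deltas are assumptions for illustration, not values from the paper.

```python
def hinted_targets(inputs, interest, hint):
    """Pair each input with a two-part response: (output of interest Ru, hint Rc)."""
    return [(x, (interest(*x), hint(*x))) for x in inputs]

def hidden_error(output_deltas, outgoing_weights):
    """Back-propagated error for one hidden unit: the weighted sum of the
    error terms of every output it feeds, hint output included."""
    return sum(w * d for w, d in zip(outgoing_weights, output_deltas))

inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
# Assumption: the hint target X+Y is left unscaled; a sigmoid output
# unit would need it rescaled into the unit's output range.
data = hinted_targets(inputs, interest=lambda x, y: x ^ y, hint=lambda x, y: x + y)
for x, (ru, rc) in data:
    print(x, "Ru =", ru, "Rc =", rc)

# Illustrative numbers only: a small error term on the output of interest
# and a larger one on the hint output, both seen through weights of 0.5.
print(hidden_error(output_deltas=[0.1, 0.4], outgoing_weights=[0.5, 0.5]))
```

Because the hidden error is a sum over all outputs, a hint with large error terms dominates the weight updates of the shared hidden layer.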
Briefly, the effectiveness of the hints is explained in two ways. First, if the hints are strongly monotonic (i.e., the gradient vector is large over the portion of the range for which the original output of interest is non-monotonic), then they propagate strong error signals of a monotonic function to the hidden layer. These strong error signals override the "weak" signals of the non-monotonic output and therefore make the hidden units train more as though they were training to model strictly monotonic functions. The other reason is that hints serve as constraints, reducing the number of possible solutions that the network will find as it trains. If the hint and the output of interest are derived from the same concepts or rules (such as shared intermediate terms if both results were to be computed by physics equations), then there is a high probability that the network will extract the proper rule. Furthermore, the entropy of the training mechanism decreases faster for a hinted network, thus indicating a faster training cycle [2].
The following tests will indicate if the "hint" rc is likely to be effective:
1. The hint rc must be strictly monotonic with respect to the inputs, S.
2. The output of interest, ru, should be a monotonic combination of a subset of S and the hint rc.
3. rc should not be derivable from a linear (or nearly linear) combination of the other outputs.
In Suddarth, Sutton and Holden [1] an example is given where a symbolic-neural model of human control behavior was used to control a simulated planetary lander. The first attempt used a symbolic-neural model consisting of a single back-propagation network which used three inputs: fuel, present velocity, and altitude, and which produced one output: thrust. Training took 323,083 epochs (11 hours and 35 minutes on a Compaq/286).
The system did land somewhat successfully (rate of descent less than 3 units) in two of the three above cases, but it only did so because it ran out of fuel close to touchdown. Thus, the model had not performed an entirely successful rule extraction. Further definition of the model was required.
Further definition was added by defining a "desired velocity" profile and training it into a sub-module.
This architecture produced a more "sure" landing than the single-network system. Also, the system controlled velocity nearly as well as the human did. Another significant benefit of this approach was the difference in learning times: network No. 1 trained in 721 epochs and network No. 2 trained in 13,041 epochs, for a total training time of 27 minutes and 43 seconds.
A rule was then added which told the model to "dither" thrust (an integer value) if necessary to hold velocity. The final architecture is shown in figure 1. This approach made the craft touch down at 3 ft/s, 1 ft/s slower than the model without rules and only 1 ft/s faster than the human.
The NNL experiment was also implemented with a "hinted" model as shown in figure 2.
This system converged in 42,017 epochs (1 hour and 58 minutes on the Compaq/80286). This was only 17% of the training time required for the same architecture without the hint. The performance also indicated successful rule extraction, although with touchdown velocities around 4 ft/s, a little rougher than the more structured models already discussed.
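The reported training times can be compared with a short calculation. One assumption is made and labeled in the code: the unhinted baseline for the 17% figure is taken to be the single-network run of 11 hours and 35 minutes, which is consistent with the arithmetic but is not stated explicitly in the text. The helper name `minutes` is illustrative.

```python
def minutes(hours=0, mins=0, secs=0):
    """Convert a duration to minutes."""
    return hours * 60 + mins + secs / 60

single_net = minutes(hours=11, mins=35)   # 323,083 epochs, single network
two_module = minutes(mins=27, secs=43)    # 721 + 13,041 epochs, two networks
hinted_net = minutes(hours=1, mins=58)    # 42,017 epochs, hinted network

# Assumption: the single-network run is the "same architecture without the hint".
print(f"hinted / single-network:     {hinted_net / single_net:.1%}")
print(f"two-module / single-network: {two_module / single_net:.1%}")
```

On these figures the hinted network needs roughly 17% of the single-network training time, and the two-module architecture roughly 4%.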
References:
1. Suddarth, S.C., Sutton, S.A. and Holden, A.D.C., "A Symbolic-Neural Method for Solving Control Problems," Proceedings International Conference on Neural Nets, San Diego, 1988.
2. Suddarth, S.C. and Holden, A.D.C., "An N-Port Symbolic-Neural Method for Modelling and Controlling Systems," to be published, Journal of Man-Machine Studies.
3. Denker, J., Schwartz, D., Wittner, E., Solla, S., Hopfield, J., Howard, R., and Jackel, L., "Automatic Learning, Rule-Extraction and Generalization," Complex Systems, Vol. 1, pp. 877-922, 1987.
4. Goldstein, I.P. and Grimson, E., "Annotated Production Systems: A Model for Skill Acquisition," AI Memo 407, AI Laboratory, M.I.T., Feb. 1977.
5. Rumelhart, D.E., Hinton, G.E., Williams, R.J., "Learning Internal Representations by Error Propagation," ch. 7 in Parallel Distributed Processing, Volume 1: Foundations, Rumelhart & McClelland (ed), MIT Press, Cambridge, pp. 318-362, 1986.