WCCI 2008 Tutorial on Computational Intelligence and Games, part 2 of 3


Post on 21-Jan-2015






WCCI 2008 Tutorial on Computational Intelligence and Games by Simon Lucas, Julian Togelius and Thomas Runarsson, part 2 of 3


<ul><li> 1. CIG case study: car racing. A prolonged example of applying CI to a game: car racing. Sensor representation and input selection. Incremental evolution. Competitive coevolution. Player modelling. Content creation. </li></ul> <p> 2. Racing games. On the charts for the last three decades. Can be technically simple (computationally cheap) or very sophisticated. Easy to pick up and play, but possess almost unlimited depth (a lifetime to master). Can be played on your own or with others.
3. CI in racing games. Learning to race on your own, against specific opponents, against opponents in general, on one or several tracks, using simple or complex cars/physics models, etc. Modelling driving styles. Creating entertaining game content: tracks and opponent drivers.
4. A simple car game. Optimised for speed, not for prettiness. 2D dynamics (momentum, understeer, etc.). Intended to qualitatively replicate a standard toy R/C car driven on a table. Bang-bang control (9 possible commands).
5. Walls are solid. Waypoints must be passed in order. Fitness: continuous approximation of waypoints passed in 700 time steps.
6. Inputs: six range-finder sensors (evolvable positions), waypoint sensor, speed, bias. Networks: standard MLP, 9:6:2. Outputs interpreted as thrust/steering.
7. Paper excerpts shown on the slide (tables, figure captions and text fragments).
Table I. The fitness of the best controller of various generations on the different tracks, and number of runs producing proficient controllers (Pr.). Fitness averaged over 10 separate evolutionary runs; standard deviation between parentheses.
Track; 10; 50; 100; 200; Pr.
1; 0.32 (0.07); 0.54 (0.2); 0.7 (0.38); 0.81 (0.5); 2
2; 0.38 (0.24); 0.49 (0.38); 0.56 (0.36); 0.71 (0.5); 2
3; 0.32 (0.09); 0.97 (0.5); 1.47 (0.63); 1.98 (0.66); 7
4; 0.53 (0.17); 1.3 (0.48); 1.5 (0.54); 2.33 (0.59); 9
5; 0.45 (0.08); 0.95 (0.6); 0.95 (0.58); 1.65 (0.45); 8
6; 0.4 (0.08); 0.68 (0.27); 1.02 (0.74); 1.29 (0.76); 5
7; 0.3 (0.07); 0.35 (0.05); 0.39 (0.09); 0.46 (0.13); 0
8; 0.16 (0.02); 0.19 (0.03); 0.2 (0.01); 0.2 (0.01); 0
Table V. Fitness of a further evolved general controller with evolvable sensor parameters on the different tracks. Compound fitness 2.22 (0.09).
Track; Fitness (sd)
1; 1.66 (0.12)
2; 1.86 (0.02)
3; 2.27 (0.45)
4; 2.66 (0.3)
5; 2.19 (0.23)
6; 2.47 (0.18)
7; 0.22 (0.15)
8; 0.15 (0.01)
Table VI. Fitness of best controllers, evolving controllers specialised for each track, starting from a further evolved general controller with evolved sensor parameters.
Track; 10; 50; 100; 200; Pr.
1; 1.9 (0.1); 1.99 (0.06); 2.02 (0.01); 2.04 (0.02); 10
2; 2.06 (0.1); 2.12 (0.04); 2.14 (0); 2.15 (0.01); 10
3; 3.25 (0.08); 3.4 (0.1); 3.45 (0.12); 3.57 (0.1); 10
4; 3.35 (0.11); 3.58 (0.11); 3.61 (0.1); 3.67 (0.1); 10
5; 2.66 (0.13); 2.84 (0.02); 2.88 (0.06); 2.88 (0.06); 10
6; 2.64 (0); 2.71 (0.08); 2.72 (0.08); 2.82 (0.1); 10
7; 1.53 (0.29); 1.84 (0.13); 1.88 (0.12); 1.9 (0.09); 10
8; 0.59 (0.15); 0.73 (0.22); 0.85 (0.21); 0.93 (0.25); 0
Fig. 2. The initial sensor setup, which is kept throughout the evolutionary run for those runs where sensor parameters are not evolvable. Here, the car is seen in close-up moving upward-leftward. At this particular position, the front-right sensor returns a positive number very close to 0, as it detects a wall near the limit of its range; the front-left sensor returns a number close to 0.5, and the back sensor a slightly larger number. The front, left and right sensors do not detect any walls at all and thus return 0.
Fig. 5. Sensor setup of controller specialized for track 5. While more or less retaining the two longest-range sensors from the further evolved general controller it is based on, it has added medium-range sensors in the front and back, and a very short-range sensor to the left.
Fig. 6. Sensor setup of a controller specialized for, and able to consistently reach good fitness on, track 7. Presumably the use of all but one sensor and their angular spread reflects the large variety of different situations the car has to handle in order to navigate this more difficult track.
Fig. 7. Sensor setup of another con[...]
Text fragments: [...] passed, divided by the number of waypoints in the track, plus an intermediate term representing how far it is on its way to the next waypoint, calculated from the relative distances between the car and the previous and next waypoint. A fitness of 1.0 thus means having completed one full track within the allotted time. Waypoints can only be passed in the correct order, and a waypoint is counted as passed when the centre of the car is within 30 pixels from the waypoint. [...] range 200 pixels, as has three sensors pointing forward-left, forward-right and backward respectively. The two other sensors, which point left and right, have reach 100; this is illustrated in figure 2. [...] For each track, 10 evolutionary runs were made, where the initial population was seeded with the general controller and evolution was allowed to continue for 200 [...]
8. Example video. Evolved with 50+50 ES, 100 generations.
9. Choose your inputs (+ their representation). Using third-person inputs (cartesian inputs) seems not to work. Either range-finders or waypoint sensor can be taken away, but some fitness is lost. A little bit of noise is not a problem; actually it's desirable. Adding extra inputs (while keeping core inputs) can reduce evolvability drastically!
10. If you don't know your inputs...
Memetic techniques (e.g. memetic ES) can sort out useful from useless inputs. Principle: evolve neural network weights together with a mask that says whether each connection is on or off. Masks and weights are evolved at different time scales: after every mask mutation, weight space is searched, and if there is no fitness increase, the mask is reverted.
11. Learning controllers with irrelevant inputs present. Togelius, Gomez and Schmidhuber (2008).
12. Generalization and specialization. A controller evolved for one track does not necessarily perform well on other tracks. How do we achieve more general game-playing skills? Is there a tradeoff between generality and performance?
13. Paper excerpt, truncated in the source: the car dynamics model, the definition of a track (walls, starting positions and directions), and the differences from the real R/C car racing problem. The tracks [...] vary in difficulty, from easy to hard. Fig. 1. The eight tracks. Notice how tracks 1 and 2 (at the top), 3 and 4, 5 and 6 differ in the clockwise/anti-clockwise layout of waypoints and associated starting points. Tracks 7 and 8 have no relation to each other.
14. Incremental evolution. Introduced by Gomez &amp; Miikkulainen (1997). Change the fitness function f (to make it more demanding) as soon as a certain fitness is achieved. In this case, add new tracks to f as soon as the controller can drive 1.5 rounds on all tracks currently in f.
15. Incremental evolution
16. Controllers evolved for specific tracks perform poorly on other tracks. General controllers, which can drive almost any track, can be incrementally evolved. Starting from a general controller, a controller can be further evolved for specialization on a particular track. Such controllers drive faster than the general controller. This works even when evolution from scratch did not work!
17. Two cars on a track. Two cars with solo-evolved controllers on one track: disaster; they don't even see each other! How do we train controllers that take other drivers into account (avoiding collisions or using them to their advantage)? Solution: car sensors (range-finders, like the wall sensors) and competitive coevolution.
18. Video: navigating a complex track
19. Competitive coevolution. The fitness function evaluates at least two individuals. One individual's success is adversely affected by the other's (directly or indirectly). Very potent, but seldom straightforward; e.g. Hillis (1991), Rosin and Belew (1996).
20. Competitive coevolution. Standard 15+15 ES; each individual is evaluated through testing against the current best individual in the population. Fitness function a mix of absolute fitness (progress in n time steps) and relative fitness (distance ahead of or behind the other car after n time steps).
21. Video: absolute fitness
22. Video: 50/50 fitness
23. Video: relative fitness
24. Problems with coevolution. Over-specialization and cycling: can be battled with e.g. archives. Loss of gradient: can be battled with careful fitness function design, e.g. combining absolute and relative fitness. Much more research needed here!
25.
Multi-population coevolution. Typically, competitive coevolution uses one or two populations. Many more populations can be used! Can help against cycling and overspecialization. The phenotypical diversity between populations can be useful in itself.
26. Example: 1 versus 9 populations. Togelius, Burrow, Lucas (2007).
27. Player modelling. Can we create players that drive just like specific human players? The models need to be similar in terms of performance, similar in terms of playing (driving) style, and robust.
28. Direct modelling. Let a player drive a number of tracks. Use supervised learning to associate inputs (sensors) with outputs (driving commands), e.g. MLP/backpropagation or k-nearest neighbour. Suffers from generalization problems, and any approximation is likely to lead to worse playing performance.
29. Indirect modelling. Let a human drive a test track; record performance, speed and orthogonal deviation at the various waypoints of the track. Start from a good, general evolved neural network controller, and evolve it further. Fitness: negative difference between controller and player for the three measures above.
30. The test track supposedly requires a varied repertoire of driving skills. Fig. 2. The test track and the car. Fig. 3. Evolving a [...] (plot; axis label: Fitness (progress, speed)). First of all, we design a test track, featuring a number of different types of racing challenges. The track, as pictured in (fig), has two long straight sections where the player can [...]
31. Content creation. Creating interesting, enjoyable levels, worlds, tracks, opponents etc. Not the same as well-playing opponents. Probably the area where commercial game developers need most help. What makes game content fun? Many theories, e.g. Thomas Malone, Raph Koster, Mihály Csíkszentmihályi.
32. Track evolution. Using the controllers we evolved to model human players, we evolve tracks that are fun to drive for the modelled player. Fitness function: right amount of progress, variation in progress, high maximum speed.
33.
Fig. 5. Track evolved using the random walk initialisation and mutation. Fig. 6. A track evolved (using the radial method) to be fun for the first author, who plays too many racing games anyway. It is not easy to drive, which is just as it should be. (Surrounding paper text on track representation, control points and mutation is truncated in the source.)
34. Fig. 7. A track evolved (using the radial method) to be fun for the second author, who is a bit more careful in his driving. Note the absence of sharp turns. (Surrounding paper text is truncated in the source.)
35. Fig. 5. Track evolved using the random walk initialisation and mutation. (Surrounding paper text on b-spline track construction is truncated in the source.)
36. [...] but only sometimes causes the car to collide. Those elements are believed to be the main source of final progress variability. These features are also notably absent from track c, on which the good player model has very low variability. The progress of the controller is instead limited by many broad curves. Fig. 3. Three evolved tracks: (a) evolved for a bad player with target progress 1.1, (b) evolved for a good player with target fitness 1.5, (c) evolved for a good player with target progress 1.5 using only progress fitness.
37.
Video: evolved TORCS drivers
38. Video: real car control
39. More on these topics. http://julian.togelius.com e.g. Togelius, Lucas and De Nardi: Computational Intelligence in Racing Games. Togelius, Gomez and Schmidhuber: Learning what to ignore, on Friday, 11.10, room 606. Car Racing Competition on Tuesday 15.00, room 402. </p>
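
The fitness definitions on slides 5, 20 and 29 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation. All function and field names are hypothetical. The exact form of the fractional waypoint term, the use of a linear blend weight, the use of progress difference as a stand-in for distance between cars, and the equal weighting of the three modelling measures are assumptions that the slides do not confirm.

```python
from dataclasses import dataclass
from typing import Sequence


def progress_fitness(waypoints_passed: int, dist_to_prev: float,
                     dist_to_next: float, waypoints_per_lap: int) -> float:
    """Continuous approximation of waypoints passed, measured in laps.

    Assumed form: the fractional term is the share of the gap between the
    previous and next waypoint that the car has already covered.
    """
    gap = dist_to_prev + dist_to_next
    fraction = dist_to_prev / gap if gap > 0 else 0.0
    return (waypoints_passed + fraction) / waypoints_per_lap


def blended_fitness(own_progress: float, opponent_progress: float,
                    relative_weight: float) -> float:
    """Mix of absolute and relative fitness for competitive coevolution.

    relative_weight 0.0 is purely absolute, 1.0 purely relative, and 0.5
    the 50/50 mix. The linear blend itself is an assumption.
    """
    absolute = own_progress
    relative = own_progress - opponent_progress  # positive when ahead
    return (1.0 - relative_weight) * absolute + relative_weight * relative


@dataclass
class DrivingTrace:
    """Measurements recorded on the test track (hypothetical container)."""
    progress: float              # overall performance on the test track
    speeds: Sequence[float]      # speed at each waypoint
    deviations: Sequence[float]  # orthogonal deviation at each waypoint


def mean_abs_diff(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean absolute difference of two equal-length, non-empty sequences."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def modelling_fitness(player: DrivingTrace, controller: DrivingTrace) -> float:
    """Negative difference between controller and player on three measures.

    Equal weighting of the three terms is an assumption.
    """
    return -(abs(player.progress - controller.progress)
             + mean_abs_diff(player.speeds, controller.speeds)
             + mean_abs_diff(player.deviations, controller.deviations))
```

With this sketch, a controller that reproduces the player's recorded trace exactly scores 0 on `modelling_fitness`, and any deviation makes the score negative, so an evolution strategy that maximises fitness is pulled towards the player's driving style.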