WCCI 2008 tutorial on computational intelligence and games, part 2 of 3

CIG case study: car racing

• A prolonged example of applying CI to a game: car racing

• Sensor representation and input selection

• Incremental evolution

• Competitive coevolution

• Player modelling

• Content creation

Racing games

• On the charts for the last three decades

• Can be technically simple (computationally cheap) or very sophisticated

• Easy to pick up and play, but possess almost unlimited “depth” (a lifetime to master)

• Can be played on your own or with others

CI in racing games

• Learning to race

• on your own, against specific opponents, against opponents in general, on one or several tracks, using simple or complex cars/physics models, etc.

• Modelling driving styles

• Creating entertaining game content: tracks and opponent drivers

A simple car game

• Optimised for speed, not for prettiness

• 2D dynamics (momentum, understeer, etc.)

• Intended to qualitatively replicate a standard toy R/C car driven on a table

• Bang-bang control (9 possible commands)

• Walls are solid

• Waypoints must be passed in order

• Fitness: continuous approximation of waypoints passed in 700 time steps

• Inputs

• Six range-finder sensors (evolvable pos.)

• Waypoint sensor, Speed, Bias

• Networks

• Standard MLP, 9:6:2

• Outputs interpreted as thrust/steering
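As a rough illustration of how two continuous network outputs could be turned into one of the nine bang-bang commands, here is a minimal sketch. The dead-zone threshold of 0.3 and the function names are assumptions made for the example; the slides do not state how the outputs are discretised.

```python
# Hypothetical mapping from two network outputs to one of nine commands
# (3 drive modes x 3 steering modes). The 0.3 threshold is an assumption.

def to_mode(value, threshold=0.3):
    """Map a continuous output in [-1, 1] to -1, 0 or +1."""
    if value > threshold:
        return 1
    if value < -threshold:
        return -1
    return 0


def to_command(outputs, threshold=0.3):
    """Return (drive, steer), each in {-1, 0, 1}."""
    thrust, steering = outputs
    return to_mode(thrust, threshold), to_mode(steering, threshold)


# Enumerating every combination gives the nine possible commands.
ALL_COMMANDS = {(d, s) for d in (-1, 0, 1) for s in (-1, 0, 1)}
```

Whether +1 on the steering axis means left or right is a convention the simulator would have to fix; the sketch leaves it open.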

Fig. 2. The initial sensor setup, which is kept throughout the evolutionary run for those runs where sensor parameters are not evolvable. Here, the car is seen in close-up moving upward-leftward. At this particular position, the front-right sensor returns a positive number very close to 0, as it detects a wall near the limit of its range; the front-left sensor returns a number close to 0.5, and the back sensor a slightly larger number. The front, left and right sensors do not detect any walls at all and thus return 0.

range 200 pixels, as do three sensors pointing forward-left, forward-right and backward respectively. The two other sensors, which point left and right, have reach 100; this is illustrated in figure 2.

B. Neural networks

The controllers in the experiments below are based on

neural networks. More precisely, we are using multilayer

perceptrons with three neuronal layers (two adaptive layers)

and tanh activation functions. A network has at least three

inputs: one fixed input with the value 1, one speed input

in the approximate range [0..3], and one input from the

waypoint sensor, in the range [-π..π]. In addition to this, it might have any number of inputs from wall sensors, in

the range [0..1]. All networks have two outputs, which are

interpreted as driving commands for the car.
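The network described above can be sketched in a few lines of plain Python. This is an illustrative forward pass only: the weight layout, the helper names and the absence of a separate bias on the hidden layer are assumptions, and the 9:6:2 sizes are taken from the slide earlier in this tutorial.

```python
import math


def mlp_forward(inputs, w_hidden, w_output):
    """Forward pass through two adaptive layers with tanh activations.

    w_hidden[j][i] is the weight from input i to hidden neuron j;
    w_output[k][j] is the weight from hidden neuron j to output k.
    """
    hidden = [math.tanh(sum(w * x for w, x in zip(row, inputs)))
              for row in w_hidden]
    return [math.tanh(sum(w * h for w, h in zip(row, hidden)))
            for row in w_output]


def build_inputs(speed, waypoint_angle, wall_readings):
    """Fixed input 1, speed, waypoint angle, then any wall sensor readings."""
    return [1.0, speed, waypoint_angle] + list(wall_readings)


# 9 inputs (fixed 1, speed, waypoint, six wall sensors), 6 hidden, 2 outputs.
n_in, n_hidden, n_out = 9, 6, 2
zero_hidden = [[0.0] * n_in for _ in range(n_hidden)]
zero_output = [[0.0] * n_hidden for _ in range(n_out)]
inputs = build_inputs(1.5, 0.2, [0.0, 0.5, 0.0, 0.1, 0.0, 0.6])
outputs = mlp_forward(inputs, zero_hidden, zero_output)
```

With all weights at zero both outputs are 0, so such a controller issues no thrust or steering until mutation changes some weights.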

C. Evolutionary algorithm

The genome is an array of floating point numbers, of

variable or fixed length depending on the experimental setup.

Apart from information on the number of wall sensors and

hidden neurons, it encodes the orientation and range of the

wall sensors, and weights of the connections in the neural

network.

The evolutionary algorithm used is a kind of evolutionary

strategy, with µ = 50 and λ = 50. In other words, 50 genomes (the elite) are created at the start of evolution. At

each generation, one copy is made of each genome in the

elite, and all copies are mutated. After that, a fitness value is

calculated for each genome, and the 50 best individuals of

all 100 form the new elite.

There are two mutation operators: Gaussian mutation

of all weight values, and Gaussian mutation of all sensor

parameters (angles and lengths), which might be turned on

or off. In both cases, the standard deviation of the Gaussian

distribution was set to 0.3.
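The selection scheme just described can be sketched as follows. This is a minimal illustration, not the authors' code: the toy fitness function stands in for the racing simulation, the all-zero starting genomes and the function names are assumptions, and the fitness is treated as deterministic.

```python
import random


def evolve(fitness, genome_length, generations, mu=50, sigma=0.3, rng=None):
    """Elitist ES: each elite genome yields one mutated copy,
    then the best mu of the 2 * mu candidates form the new elite."""
    rng = rng or random.Random()
    elite = [[0.0] * genome_length for _ in range(mu)]
    for _ in range(generations):
        # Gaussian mutation of every gene in every copy.
        offspring = [[g + rng.gauss(0.0, sigma) for g in genome]
                     for genome in elite]
        pool = elite + offspring
        pool.sort(key=fitness, reverse=True)
        elite = pool[:mu]
    return elite


def toy_fitness(genome):
    """Stand-in fitness: highest when every gene equals 1."""
    return -sum((g - 1.0) ** 2 for g in genome)


final_elite = evolve(toy_fitness, genome_length=4, generations=30,
                     rng=random.Random(0))
best = final_elite[0]
```

Because parents compete with their own offspring, the best fitness in the elite never decreases under a deterministic fitness function.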

Last but not least: the fitness function. The fitness of a

controller is calculated as the number of waypoints it has

Track   Gen. 10       Gen. 50       Gen. 100      Gen. 200      Pr.
1       0.32 (0.07)   0.54 (0.2)    0.7 (0.38)    0.81 (0.5)    2
2       0.38 (0.24)   0.49 (0.38)   0.56 (0.36)   0.71 (0.5)    2
3       0.32 (0.09)   0.97 (0.5)    1.47 (0.63)   1.98 (0.66)   7
4       0.53 (0.17)   1.3 (0.48)    1.5 (0.54)    2.33 (0.59)   9
5       0.45 (0.08)   0.95 (0.6)    0.95 (0.58)   1.65 (0.45)   8
6       0.4 (0.08)    0.68 (0.27)   1.02 (0.74)   1.29 (0.76)   5
7       0.3 (0.07)    0.35 (0.05)   0.39 (0.09)   0.46 (0.13)   0
8       0.16 (0.02)   0.19 (0.03)   0.2 (0.01)    0.2 (0.01)    0

TABLE I

THE FITNESS OF THE BEST CONTROLLER OF VARIOUS GENERATIONS ON THE DIFFERENT TRACKS, AND NUMBER OF RUNS PRODUCING PROFICIENT CONTROLLERS. FITNESS AVERAGED OVER 10 SEPARATE EVOLUTIONARY RUNS; STANDARD DEVIATION BETWEEN PARENTHESES.

passed, divided by the number of waypoints in the track,

plus an intermediate term representing how far it is on its way

to the next waypoint, calculated from the relative distances

between the car and the previous and next waypoint. A

fitness of 1.0 thus means having completed one full track

within the allotted time. Waypoints can only be passed in the

correct order, and a waypoint is counted as passed when the

centre of the car is within 30 pixels from the waypoint. In

the evolutionary experiments reported below, each car was

allowed 700 timesteps (enough to do two to three laps on

most tracks in the test set) and fitness was averaged over

three trials.
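A small sketch may make the fitness calculation concrete. The text does not give the exact form of the intermediate term, so the ratio used below, like the function names, is an assumption chosen only to match the description (relative distances to the previous and next waypoint).

```python
import math


def progress_fitness(passed, n_waypoints, car, prev_wp, next_wp):
    """Laps completed plus a fractional term toward the next waypoint.

    Assumed intermediate term: d_prev / (d_prev + d_next), scaled to one
    waypoint's share of a lap.
    """
    d_prev = math.dist(car, prev_wp)
    d_next = math.dist(car, next_wp)
    total = d_prev + d_next
    fraction = d_prev / total if total > 0 else 0.0
    return (passed + fraction) / n_waypoints


def waypoint_reached(car, waypoint, radius=30.0):
    """A waypoint counts as passed within 30 pixels of the car's centre."""
    return math.dist(car, waypoint) <= radius
```

For a track with 10 waypoints, a car that has passed 5 of them and sits halfway to the sixth would score 0.55 under this sketch; passing all 10 gives 1.0, one full lap.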

IV. EVOLVING TRACK-SPECIFIC CONTROLLERS

The first experiments consisted in evolving controllers for

the eight tracks separately, in order to test the software

in general and to rank the difficulty of the tracks.

For each of the tracks, the evolutionary algorithm was run

10 times, each time starting from a population of “clean”

controllers, with all connection weights set to zero and sensor

parameters as explained above. Only weight mutation was

allowed. The evolutionary runs were for 200 generations

each.

A. Fixed sensor parameters

1) Evolving from scratch: The results are listed in table I,

which is read as follows: each row represents the results for

one particular track. The first column gives the mean of the

fitnesses of the best controller of each of the evolutionary

runs at generation 10, and the standard deviation of the

fitnesses of the same controllers. The next three columns

present the results of the same calculations at generations 50,

100 and 200, respectively. The “Pr” column gives the number

of proficient best controllers for each track. An evolutionary

run is deemed to have produced a proficient controller if

its best controller at generation 200 has a fitness (averaged,

as always, over three trials) of at least 1.5, meaning that it

completes at least one and a half lap within the allowed time.

For the first two tracks, proficient controllers were pro-

duced by the evolutionary process within 200 generations,

but only in two out of ten runs. This means that while it is

possible to evolve neural networks that can be relied on to

Track          1             2             3             4             5             6             7             8
Fitness (sd)   1.66 (0.12)   1.86 (0.02)   2.27 (0.45)   2.66 (0.3)    2.19 (0.23)   2.47 (0.18)   0.22 (0.15)   0.15 (0.01)

TABLE V

FITNESS OF A FURTHER EVOLVED GENERAL CONTROLLER WITH EVOLVABLE SENSOR PARAMETERS ON THE DIFFERENT TRACKS. COMPOUND FITNESS 2.22 (0.09).

Track   Gen. 10       Gen. 50       Gen. 100      Gen. 200      Pr.
1       1.9 (0.1)     1.99 (0.06)   2.02 (0.01)   2.04 (0.02)   10
2       2.06 (0.1)    2.12 (0.04)   2.14 (0)      2.15 (0.01)   10
3       3.25 (0.08)   3.4 (0.1)     3.45 (0.12)   3.57 (0.1)    10
4       3.35 (0.11)   3.58 (0.11)   3.61 (0.1)    3.67 (0.1)    10
5       2.66 (0.13)   2.84 (0.02)   2.88 (0.06)   2.88 (0.06)   10
6       2.64 (0)      2.71 (0.08)   2.72 (0.08)   2.82 (0.1)    10
7       1.53 (0.29)   1.84 (0.13)   1.88 (0.12)   1.9 (0.09)    10
8       0.59 (0.15)   0.73 (0.22)   0.85 (0.21)   0.93 (0.25)   0

TABLE VI

FITNESS OF BEST CONTROLLERS, EVOLVING CONTROLLERS SPECIALISED FOR EACH TRACK, STARTING FROM A FURTHER EVOLVED GENERAL CONTROLLER WITH EVOLVED SENSOR PARAMETERS.

Fig. 5. Sensor setup of controller specialized for track 5. While more or less retaining the two longest-range sensors from the further evolved general controller it is based on, it has added medium-range sensors in the front and back, and a very short-range sensor to the left.

controllers. For each track, 10 evolutionary runs were made,

where the initial population was seeded with the general

controller and evolution was allowed to continue for 200

generations. Results are shown in table VI. The mean fitness

improved significantly on the first six tracks, and much of

the fitness increase occurred early in the evolutionary run,

as can be seen from a comparison with table V. Further,

the variability in mean fitness of the specialized controllers

from different evolutionary runs is very low, meaning that the

reliability of the evolutionary process is very high. Perhaps

most surprising, however, is that all 10 evolutionary runs

produced proficient controllers for track 7, on which the

general controller had not been trained (and indeed had very

low fitness) and for which it had previously been found to

be impossible to evolve a proficient controller from scratch.

Analysis of the evolved sensor parameters of the specialized

controllers shows a remarkable diversity, even among

controllers specialized for the same track, as evident in

figures 5, 6 and 7. Sometimes, no similarity can be found

between the evolved configuration and either the original

sensor parameters or those of the further evolved general

controller the specialization was based on.

Fig. 6. Sensor setup of a controller specialized for, and able to consistently reach good fitness on, track 7. Presumably the use of all but one sensor and their angular spread reflects the large variety of different situations the car has to handle in order to navigate this more difficult track.

Fig. 7. Sensor setup of another controller specialized for track 7, like the one in figure 6 seemingly using all its sensors, but in a quite different way.

VII. OBSERVATIONS ON EVOLVED DRIVING BEHAVIOUR

It has previously been found that the evolutionary approach

used in this paper can produce controllers that outperform

human drivers[4]. To corroborate this result, one of the

authors measured his own performance on the various tracks,

driving the car using keyboard inputs and a suitable delay

of 50 ms between timesteps. Averaged over 10 attempts,

the author’s fitness on track 2 was 1.89, it was 2.65 on

track 5, and 1.83 on track 7, numbers which compare rather

unfavourably with those found in table VI. The responsible

author would like to believe that this says more about the

capabilities of the evolved controllers than those of the

author.

Traces of steering and driving commands from the evolved

controllers show that they often use a PWM-like technique,

in that they frequently - sometimes almost every timestep -

change what commands they issue. For example, the general

controller used as the base for the specializations above

employs the tactic of constantly alternating between steering

left and right when driving parallel to a wall, giving the

appearance that the car is shaking. Frequently alternating

Example video

Evolved with 50+50 ES, 100 Generations

Choose your inputs (+ their representation)

• Using third-person inputs (cartesian inputs) seems not to work

• Either range-finders or waypoint sensor can be taken away, but some fitness lost

• A little bit of noise is not a problem, actually it’s desirable

• Adding extra inputs (while keeping core inputs) can reduce evolvability drastically!

If you don’t know your inputs...

• Memetic techniques (e.g. memetic ES) can sort out useful from useless inputs

• Principle: evolve neural network weights together with a mask: whether connections are on or off

• Masks and weights are evolved at different time scales; after every mask mutation, weight space is searched - if no fitness increase, the mask is reverted
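The two-time-scale idea in the bullets above can be sketched as follows. This is a simplified illustration under stated assumptions: a plain hill-climber stands in for the weight search, a single connection is flipped per mask mutation, and all names are hypothetical rather than taken from Togelius, Gomez and Schmidhuber (2008).

```python
import random


def memetic_search(fitness, n_weights, mask_steps, weight_steps,
                   sigma=0.3, rng=None):
    """Evolve a connection mask (slow) together with weights (fast)."""
    rng = rng or random.Random()
    mask = [True] * n_weights
    weights = [0.0] * n_weights

    def score(m, w):
        # Masked-off connections contribute a weight of zero.
        return fitness([wi if mi else 0.0 for mi, wi in zip(m, w)])

    best = score(mask, weights)
    for _ in range(mask_steps):
        # Slow time scale: switch one connection on or off.
        trial_mask = list(mask)
        i = rng.randrange(n_weights)
        trial_mask[i] = not trial_mask[i]
        # Fast time scale: local search in weight space under the new mask.
        trial_weights = list(weights)
        trial_best = score(trial_mask, trial_weights)
        for _ in range(weight_steps):
            candidate = [w + rng.gauss(0.0, sigma) for w in trial_weights]
            s = score(trial_mask, candidate)
            if s > trial_best:
                trial_weights, trial_best = candidate, s
        # Keep the mask change only if fitness increased; otherwise revert.
        if trial_best > best:
            mask, weights, best = trial_mask, trial_weights, trial_best
    return mask, weights, best
```

A connection to an irrelevant input tends to end up switched off, because any mask that removes it makes the subsequent weight search easier.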

Learning controllers with irrelevant inputs present

Togelius, Gomez and Schmidhuber (2008)

Generalization and specialization

• A controller evolved for one track does not necessarily perform well on other tracks

• How do we achieve more general game-playing skills?

• Is there a tradeoff between generality and performance?

Fig. 1. The eight tracks. Notice how tracks 1 and 2 (at the top), 3 and 4, 5 and 6 differ in the clockwise/anti-clockwise layout of waypoints and associated starting points. Tracks 7 and 8 have no relation to each other apart from both being difficult.

how to evolve controllers that provide robust performance over several tracks. These controllers are then validated on tracks for which they have not been evolved. Finally, these controllers are further evolved to provide better fitness on specific tracks, conclusions are drawn, and further research is suggested.

II. THE CAR RACING MODEL

The experiments in this article were performed in a 2-dimensional simulator, intended to model, qualitatively if not quantitatively, a standard radio-controlled (R/C) toy car (approximately 17 centimeters long) in an arena with dimensions approximately 3*2 meters, where the track is delimited by solid walls. The simulation has the dimensions 400*300 pixels, and the car measures 20*10 pixels.

R/C toy car racing differs from racing full-sized cars in several ways. One is the simplified controls; many R/C cars have only three possible drive modes (forward, backward, and neutral) and three possible steering modes (left, right and center). Other differences are that many toy cars have bad grip on many surfaces, leading to easy skidding, and that damaging such cars in collisions is harder due to their low weight.

The dynamics of the car are based on a reasonably detailed mechanical model, taking into account the small size of the car and bad grip on the surface, but are not based on any actual measurement [13][14]. The model is similar to that used in [4], and differs mainly in its improved collision handling; after more experience with the physical R/C cars the collision response system was reimplemented to make collisions more realistic (and, as an effect, more undesirable). Now, a collision may cause the car to get stuck if the wall is struck at an unfortunate angle, something often seen in experiments with physical cars.

A track consists of a set of walls, a chain of waypoints, and a set of starting positions and directions. A car is added to a track in one of the starting positions, with the corresponding starting direction, both the position and angle being subject to random alterations. The waypoints are used for fitness calculations.

For the experiments we have designed eight different tracks, presented in figure 1. The tracks are designed to vary in difficulty, from easy to hard. Three of the tracks are versions of three other tracks with all the waypoints in reverse order, and the directions of the starting positions reversed.

The main differences between our simulation and the real R/C car racing problem have to do with sensing. As reported in Tanev et al. as well as [4], there is a small but not unimportant lag in the communication between camera, computer and car, leading to the controller acting on outdated perceptions. Apart from that, there is often some error in estimations of the car’s position and velocity from an overhead camera. In contrast, the simulation allows instant and accurate information to be fed to the controller.

III. EVOLVABLE INTELLIGENCE

A. Sensors

The car experiences its environment through two types of sensors: the waypoint sensor, and the wall sensors. The waypoint sensor gives the difference between the car’s current orientation and the angle to the next waypoint (but not the distance to the waypoint). When pointing straight to a waypoint, this sensor thus outputs 0, when the waypoint is to the left of the car it outputs a positive value, and vice versa. As for the wall sensors, each sensor has an angle (relative to the orientation of the car) and a range, between 0 and 200 pixels. The output of the wall sensor is zero if no wall is encountered along a line with the specified angle and range from the centre of the car, otherwise it is a fraction of one, depending on how close to the car the sensed wall is. A small amount of noise is applied to all sensor readings, as it is to starting positions and orientations.
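The two sensor types can be sketched as below. Several details are assumptions made for the illustration: a coordinate system with the y axis pointing up (so that counter-clockwise, i.e. left, is positive), a linear fall-off for the wall sensor, and no sensor noise. The ray casting that finds the distance to the nearest wall is left out.

```python
import math


def waypoint_sensor(car_pos, car_heading, waypoint):
    """Signed angle in radians, in [-pi, pi), from heading to waypoint."""
    bearing = math.atan2(waypoint[1] - car_pos[1],
                         waypoint[0] - car_pos[0])
    diff = bearing - car_heading
    # Wrap the difference so that left is positive and right is negative.
    return (diff + math.pi) % (2 * math.pi) - math.pi


def wall_sensor(distance_to_wall, sensor_range):
    """0 if no wall lies within range; closer walls give larger values."""
    if distance_to_wall is None or distance_to_wall >= sensor_range:
        return 0.0
    return 1.0 - distance_to_wall / sensor_range
```

Under the assumed linear fall-off, a wall at half the sensor's range reads 0.5 and a wall near the limit of the range reads just above 0.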

In some of the experiments the sensor parameters are mutated by the evolutionary algorithm, but in all experiments they start from the following setup: one sensor points straight forward (0 radians) in the direction of the car and has

Incremental evolution

• Introduced by Gomez & Miikkulainen (1997)

• Change the fitness function f (to make it more demanding) as soon as a certain fitness is achieved

• In this case, add new tracks to f as soon as the controller can drive 1.5 rounds on all tracks currently in f
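The schedule on this slide can be sketched as a loop around any evolutionary step. The callbacks `evolve_one_generation` and `evaluate` are hypothetical placeholders, the population is assumed to be sorted best-first, and adding tracks in list order is an assumption; the slide does not say which track is added next.

```python
def incremental_evolution(evolve_one_generation, evaluate, population,
                          all_tracks, threshold=1.5, max_generations=1000):
    """Grow the set of tracks in the fitness function as skill improves."""
    active = [all_tracks[0]]
    for generation in range(max_generations):
        population = evolve_one_generation(population, active)
        best = population[0]  # assumed sorted, best individual first
        if all(evaluate(best, track) >= threshold for track in active):
            if len(active) == len(all_tracks):
                return population, active, generation + 1
            # Make the task harder: add the next track to the fitness.
            active.append(all_tracks[len(active)])
    return population, active, max_generations
```

The loop stops once the best controller reaches the threshold (1.5 laps in the slides) on every track, or when the generation budget runs out.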

Incremental evolution

• Controllers evolved for specific tracks perform poorly on other tracks

• General controllers, that can drive almost any track, can be incrementally evolved

• Starting from a general controller, a controller can be further evolved for specialization on a particular track

• drive faster than the general controller

• works even when evolution from scratch did not work!

Two cars on a track

• Two cars with solo-evolved controllers on one track: disaster

• they don’t even see each other!

• How do we train controllers that take other drivers into account? (avoiding collisions or using them to their advantage)

• Solution: car sensors (rangefinders, like the wall sensors) and competitive coevolution

Video: navigating a complex track

Competitive coevolution

• The fitness function evaluates at least two individuals

• One individual’s success is adversely affected by the other’s (directly or indirectly)

• Very potent, but seldom straightforward; e.g. Hillis (1991), Rosin and Belew (1996)

Competitive coevolution

• Standard 15+15 ES; each individual is evaluated through testing against the current best individual in the population

• Fitness function a mix of...

• Absolute fitness: progress in n time steps

• Relative fitness: distance ahead of or behind the other car after n time steps
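One simple way to mix the two fitness components is a weighted sum, sketched below. The linear blend, the use of progress difference as the relative term and the parameter name `relative_weight` are assumptions made for illustration; the slide only says the fitness is a mix.

```python
def coevolution_fitness(own_progress, other_progress, relative_weight):
    """Blend absolute and relative fitness.

    relative_weight = 0.0 gives pure absolute fitness, 1.0 gives pure
    relative fitness, and 0.5 gives a 50/50 mix.
    """
    if not 0.0 <= relative_weight <= 1.0:
        raise ValueError("relative_weight must lie in [0, 1]")
    absolute = own_progress
    relative = own_progress - other_progress
    return (1.0 - relative_weight) * absolute + relative_weight * relative
```

With pure relative fitness a controller gains as much from slowing the opponent as from driving faster itself, which is one way to read the difference between the three videos that follow.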

Video: absolute fitness

Video: 50/50 fitness

Video: relative fitness

Problems with coevolution

• Over-specialization and cycling

• Can be battled with e.g. archives

• Loss of gradient

• Can be battled with careful fitness function design, e.g. combining absolute and relative fitness

• Much more research needed here!

Multi-population coevolution

• Typically, competitive coevolution uses one or two populations

• Many more populations can be used!

• Can help against cycling and overspecialization

• The phenotypical diversity between populations can be useful in itself

Example: 1 versus 9 populations

Togelius, Burrow, Lucas (2007)

Player modelling

• Can we create players that drive just like specific human players?

• The models need to be...

• Similar in terms of performance

• Similar in terms of playing (driving) style

• Robust

Direct modelling

• Let a player drive a number of tracks

• Use supervised learning to associate inputs (sensors) with outputs (driving commands)

• e.g. MLP/Backpropagation or k-nearest neighbour

• Suffers from generalization problems, and any approximation is likely to lead to worse playing performance
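Direct modelling with k-nearest neighbour can be sketched in a few lines. The data layout (pairs of sensor vector and command), the string command labels and the sample log are made up for the example.

```python
import math
from collections import Counter


def knn_command(sensor_reading, recorded, k=3):
    """Return the most common command among the k nearest recorded states.

    `recorded` is a list of (sensor_vector, command) pairs logged while a
    human drives.
    """
    nearest = sorted(recorded,
                     key=lambda pair: math.dist(pair[0], sensor_reading))[:k]
    votes = Counter(command for _, command in nearest)
    return votes.most_common(1)[0][0]


# Tiny made-up log: two sensor values per state, one command label each.
log = [((0.0, 0.1), "forward"), ((0.1, 0.0), "forward"),
       ((0.9, 0.8), "left"), ((1.0, 0.9), "left"), ((0.8, 1.0), "left")]
```

The generalization problem mentioned above shows up as soon as the car reaches a state far from anything in the log: the nearest neighbours are then poor guides, and errors compound from one time step to the next.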

Indirect modelling

• Let a human drive a test track; record performance, speed and orthogonal deviation at the various waypoints of the track

• Start from a good, general evolved neural network controller, and evolve it further

• Fitness: negative difference between controller and player for the three measures above

Fig. 2. The test track and the car.

First of all, we design a test track, featuring a number of different types of racing challenges. The track, as pictured in (fig), has two long straight sections where the player can drive really fast (or choose not to), a long smooth curve, and a sequence of nasty sharp turns. Along the track are 30 waypoints, and when a human player drives the track, the way he passes each waypoint is recorded. What is recorded is the speed of the car when the waypoint is passed, and the orthogonal deviation from the path between the waypoints, i.e. how far to the left or right of the waypoint the car passed. This matrix of two times 30 values constitutes the raw data for the player model.

The actual player model is constructed using the Cascading Elitism algorithm, starting from a general controller and evolving it on the test track. Three fitness functions are used, based on minimising the following differences between the real player and the controller: f1: total progress (number of waypoints passed within 1500 timesteps), f2: speed at which each waypoint was passed, and f3: orthogonal deviation with which each waypoint was passed. The first and most important fitness measure is thus total progress difference, followed by speed and deviation difference respectively.
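The three difference measures can be sketched as below. Using negated sums of absolute differences is an assumption, as are the dictionary layout and names; the Cascading Elitism selection that consumes these values is not shown.

```python
def model_fitness(player, controller):
    """Return (f1, f2, f3); values closer to 0 mean a closer match.

    Each argument is a dict with 'progress' (waypoints passed) and the
    per-waypoint lists 'speed' and 'deviation'.
    """
    f1 = -abs(player["progress"] - controller["progress"])
    f2 = -sum(abs(p - c)
              for p, c in zip(player["speed"], controller["speed"]))
    f3 = -sum(abs(p - c)
              for p, c in zip(player["deviation"], controller["deviation"]))
    return f1, f2, f3
```

For instance, a controller that passes 38 waypoints where the player passed 40 gets f1 = -2, regardless of how well speed and deviation match.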

D. Results

In our experiments, five different players’ driving was sampled on the test track, and after 100 generations of the Cascading Elitism algorithm with a population of 100, controllers whose driving bore an acceptable degree of resemblance to the modelled humans had emerged. The total progress varied considerably between the five players - between 1.31 and 2.59 laps in 1500 timesteps - and this difference was faithfully replicated in the evolved controllers, which is to say that some controllers drove much faster than others. Progress was made on the two other fitness measures as well, and though there was still some numerical difference between the real and modelled speed and orthogonal deviation at most waypoint passings, the evolved controllers do reproduce qualitative aspects of the modelled players’ driving. For example, the controller modelled on the first

[Plot: fitness (progress, speed) and fitness (orthogonal deviation) against generations (0 to 50); series: speed, progress, orthogonal deviation.]

Fig. 3. Evolving a controller to model a slow, careful driver.

[Plot: fitness (progress, speed) and fitness (orthogonal deviation) against generations (0 to 50); series: speed, progress, orthogonal deviation.]

Fig. 4. Evolving a controller to model a good driver. The lack of progress on minimising the progress difference is because the progress of the modelled driver is very close to that of the generic controller used to initialise the evolution.

author drives very close to the wall in the long smooth curve, very fast on the straight paths, and smashes into the wall at the beginning of the first sharp turn. Conversely, the controller modelled on the anonymous and very careful driver who scored the lowest total progress crept along at a steady speed, always keeping to the center of the track.

V. TRACK EVOLUTION

Once a good model of the human player has been acquired, we will use this model to evolve new, fun racing tracks for the human player. In order to do this, we must know what it is for a racing track to be fun, how we can measure this property, and how the racing track should be represented in order for good track designs to be in easy reach of the evolutionary algorithm. We have not been able to find any previous research on evolving tracks, or for that matter any sort of computer game levels or environments. However, Ashlock

The test track supposedly requires a varied repertoire of driving skills

Content creation

• Creating interesting, enjoyable levels, worlds, tracks, opponents etc.

• Not the same as well-playing opponents

• Probably the area where commercial game developers need most help

• What makes game content fun? Many theories, e.g. Thomas Malone, Raph Koster, Mihály Csíkszentmihályi

Track evolution

• Using the controllers we evolved to model human players, we evolve tracks that are fun to drive for the modelled player

• Fitness function:

• Right amount of progress

• Variation in progress

• High maximum speed

The collision detection in the car game works by sampling pixels on a canvas, and this mechanism is taken advantage of when the b-spline is transformed into a track. First thick walls are drawn at some distance on each side of the b-spline, this distance being either set to 30 pixels or subject to evolution depending on how the experiment is set up. But when a turn is too sharp for the current width of the track, this will result in walls intruding on the track and sometimes blocking the way. The next step in the construction of the track is therefore “steamrolling” it, or traversing the b-spline and painting a thick stroke of white in the middle of the track. Finally, waypoints are added at approximately regular distances along the length of the b-spline. The resulting track can look very smooth, as evidenced by the test track which was constructed simply by manually setting the control points of a spline.

D. Initialisation and mutation

In order to investigate how best to leverage the representational power of the b-splines, we experimented with several different ways of initialising the tracks at the beginning of the evolutionary runs, and different implementations of the mutation operator. Three of these configurations are described here.

1) Straightforward: The straightforward initial track shape forms a rectangle with rounded corners. Each mutation operation then perturbs one of the control points by adding numbers drawn from a gaussian distribution with standard deviation 20 pixels to both x and y axes.

2) Random walk: In the random walk experiments, mutation proceeds like in the straightforward configuration, but the initialisation is different. A rounded rectangle track is first subject to random walk, whereby hundreds of mutations are carried out on a single track, and only those mutations that result in a track on which a generic controller is not able to complete a full lap are retracted. The result of such a random walk is a severely deformed but still drivable track. A population is then initialised with this track and evolution proceeds as usual from there.

3) Radial: The radial method of mutation starts from an equally spaced radial disposition of the control points around the center of the image; the distance of each point from the center is generated randomly. Similarly, at each mutation operation the position of the selected control point is simply changed randomly along the respective radial line from the center. Constraining the control points in a radial disposition is a simple method to exclude the possibility of producing a b-spline containing loops, therefore producing tracks that are always fully drivable.
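The straightforward and radial mutation operators described above can be sketched on bare control points, leaving out the b-spline itself. The radius bounds, the uniform distribution for radial distances and the function names are assumptions; only the 20-pixel standard deviation comes from the text.

```python
import math
import random


def mutate_straightforward(points, rng, sigma=20.0):
    """Perturb one control point with Gaussian noise on both x and y."""
    mutated = list(points)
    i = rng.randrange(len(points))
    x, y = mutated[i]
    mutated[i] = (x + rng.gauss(0.0, sigma), y + rng.gauss(0.0, sigma))
    return mutated


def radial_init(n_points, min_r, max_r, rng):
    """One random distance from the centre per equally spaced radial line."""
    return [rng.uniform(min_r, max_r) for _ in range(n_points)]


def mutate_radial(radii, min_r, max_r, rng):
    """Move one selected control point along its own radial line."""
    mutated = list(radii)
    mutated[rng.randrange(len(radii))] = rng.uniform(min_r, max_r)
    return mutated


def to_cartesian(radii, centre):
    """Place each radius on its radial line and return (x, y) points."""
    step = 2 * math.pi / len(radii)
    return [(centre[0] + r * math.cos(i * step),
             centre[1] + r * math.sin(i * step))
            for i, r in enumerate(radii)]
```

Because the radial operator only changes distances and never the angular order of the points, the control polygon cannot cross itself, which is the property the text relies on to rule out loops.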

E. Results

We evolved a number of tracks using the b-spline representation, different initialisation and mutation methods, and different controllers derived using the indirect player modelling approach.

Fig. 5. Track evolved using the random walk initialisation and mutation.

Fig. 6. A track evolved (using the radial method) to be fun for the first author, who plays too many racing games anyway. It is not easy to drive, which is just as it should be.

1) Straightforward: Overall, the tracks evolved with the straightforward method looked smooth, and were just as easy or hard to drive as they should be: the controller for which the track was evolved typically made a total progress very close to the target progress. However, the evolved tracks didn’t differ from each other as much as we would have wanted. The basic shape of a rounded rectangle shines through rather more than it should.

2) Random walk: Tracks evolved with random walk initialisation look weird and differ from each other in an interesting way, and so fulfil at least one of our objectives. However, their evolvability is a bit lacking, with the actual progress of the controller often quite a bit from the target progress and maximum speed low.

Fig. 7. A track evolved (using the radial method) to be fun for the secondauthor, who is a bit more careful in his driving. Note the absence of sharpturns.

3) Radial: With the radial method, the tracks evolve ratherquickly and look decidedly different depending on whatcontroller was used to evolve them, and can thus be saidto be personalised. However, there is some lack of variety inthe end results in that they all look slightly like flowers.

4) Comparison with segment-based tracks: It is interesting to compare these tracks with some tracks evolved using the segment-based representation from our previous paper. Those tracks do show both the creativity evolution is capable of and a good ability to optimise the fitness values we define. But they don’t look like anything you would want to get out and drive on.

VI. DISCUSSION

We believe the method described in this paper holds great promise, and that our player modelling method is good enough to be usable, but that there is much that needs to be done in order for track evolution to be incorporated in an actual game. To start with, the track representation and mutation methods need to be developed further, until we arrive at something which is as evolvable and variable as the segment-based representation but looks as good as (and is closed like) the b-spline-based representation.

Further, the racing game we have used for this investigation is too simple in several ways, not least graphically but also in its physics model being two-dimensional. A natural next step would be to repeat the experiments performed here in a graphically advanced simulation based on a suitable physics engine, such as Ageia’s PhysX technology [19]. In such a simulation, it would be possible to evolve not only the track in itself, but also other aspects of the environment, such as buildings in a city in which a race takes place. This could be done by combining the idea of procedural content creation [20][21] with evolutionary computation. Another exciting prospect is evolving personalised competitors, building on the results of our earlier investigations into co-evolution in car racing [10].

In the section above on what makes racing fun, we describe a number of potential measures of entertainment value, most of which are not implemented in the experiments described here. Defining quantitative versions of these measures would definitely be interesting, but we believe it is more urgent to study the matter empirically. Malone’s and Koster’s oft-cited hypotheses are just hypotheses, and as far as we know there are no psychological studies that tell us what entertainment metric would be most suitable for particular games and types of player. Real research on real players is needed.

Finally, we note that although we distinguished between different approaches to computational intelligence and games at the beginning of this paper, many experiments can be viewed from several perspectives. The focus in this paper on using evolutionary computation for practical purposes in games is not at all incompatible with using games for studying under what conditions intelligence can evolve, a perspective we have taken in some of our previous papers. On the contrary.

VII. ACKNOWLEDGEMENTS

Thanks to Owen Holland, Georgios Yannakakis, Richard Newcombe and Hugo Marques for insightful discussions.

REFERENCES

[1] G. Kendall and S. M. Lucas, Proceedings of the IEEE Symposium on Computational Intelligence and Games. IEEE Press, 2005.

[2] P. Spronck, “Adaptive game ai,” Ph.D. dissertation, University of Maastricht, 2005.

[3] I. Tanev, M. Joachimczak, H. Hemmi, and K. Shimohara, “Evolution of the driving styles of anticipatory agent remotely operating a scaled model of racing car,” in Proceedings of the 2005 IEEE Congress on Evolutionary Computation (CEC-2005), 2005, pp. 1891–1898.

[4] B. Chaperot and C. Fyfe, “Improving artificial intelligence in a motocross game,” in IEEE Symposium on Computational Intelligence and Games, 2006.

[5] J. Togelius and S. M. Lucas, “Evolving controllers for simulated car racing,” in Proceedings of the Congress on Evolutionary Computation, 2005.

[6] J. Togelius and S. M. Lucas, “Evolving robust and specialized car racing skills,” in Proceedings of the IEEE Congress on Evolutionary Computation, 2006.

[7] K. Wloch and P. J. Bentley, “Optimising the performance of a formula one car using a genetic algorithm,” in Proceedings of Eighth International Conference on Parallel Problem Solving From Nature, 2004, pp. 702–711.

[8] D. Cliff, “Computational neuroethology: a provisional manifesto,” in Proceedings of the first international conference on simulation of adaptive behavior on From animals to animats, 1991, pp. 29–39.

[9] D. Floreano, T. Kato, D. Marocco, and E. Sauser, “Coevolution of active vision and feature selection,” Biological Cybernetics, vol. 90, pp. 218–228, 2004.

[10] J. Togelius and S. M. Lucas, “Arms races and car races,” in Proceedings of Parallel Problem Solving from Nature. Springer, 2006.

[11] D. A. Pomerleau, “Neural network vision for robot driving,” in The Handbook of Brain Theory and Neural Networks, 1995.

[12] J. Togelius, R. D. Nardi, and S. M. Lucas, “Making racing fun through player modeling and track evolution,” in Proceedings of the SAB’06 Workshop on Adaptive Approaches for Optimizing Player Satisfaction in Computer and Physical Games, 2006.

[13] D.-A. Jirenhed, G. Hesslow, and T. Ziemke, “Exploring internal simulation of perception in mobile robots,” in Proceedings of the Fourth European Workshop on Advanced Mobile Robots, 2001, pp. 107–113.

The collision detection in the car game works by sampling pixels on a canvas, and this mechanism is taken advantage of when the b-spline is transformed into a track. First thick walls are drawn at some distance on each side of the b-spline, this distance being either set to 30 pixels or subject to evolution depending on how the experiment is set up. But when a turn is too sharp for the current width of the track, this will result in walls intruding on the track and sometimes blocking the way. The next step in the construction of the track is therefore “steamrolling” it, or traversing the b-spline and painting a thick stroke of white in the middle of the track. Finally, waypoints are added at approximately regular distances along the length of the b-spline. The resulting track can look very smooth, as evidenced by the test track which was constructed simply by manually setting the control points of a spline.
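The last construction step, placing waypoints at approximately regular distances, might be sketched as follows. This is an illustrative Python sketch only: it assumes the b-spline has already been sampled into a closed polyline of (x, y) points, and the function name place_waypoints and the spacing parameter are hypothetical. Wall drawing and steamrolling are not covered.

```python
import math


def place_waypoints(samples, spacing):
    """Place waypoints at equal arc-length intervals along a closed polyline.

    samples: list of (x, y) points sampled along the centre line, in order.
    spacing: desired distance between waypoints; the actual step is adjusted
             slightly so that the waypoints divide the lap evenly.
    """
    n = len(samples)
    seg_len = [math.dist(samples[i], samples[(i + 1) % n]) for i in range(n)]
    total = sum(seg_len)
    count = max(1, round(total / spacing))
    step = total / count

    waypoints = []
    i, start = 0, 0.0  # current segment index and arc length at its start
    for j in range(count):
        target = j * step
        # Advance to the segment that contains the target arc length.
        while i < n - 1 and start + seg_len[i] <= target:
            start += seg_len[i]
            i += 1
        t = (target - start) / seg_len[i] if seg_len[i] > 0 else 0.0
        (x0, y0), (x1, y1) = samples[i], samples[(i + 1) % n]
        waypoints.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    return waypoints
```

Rounding the waypoint count to a whole number keeps the final waypoint from landing on top of the first one when the lap closes.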

D. Initialisation and mutation

In order to investigate how best to leverage the representational power of the b-splines, we experimented with several different ways of initialising the tracks at the beginning of the evolutionary runs, and different implementations of the mutation operator. Three of these configurations are described here.

1) Straightforward: The straightforward initial track shape is a rectangle with rounded corners. Each mutation operation then perturbs one of the control points by adding numbers drawn from a Gaussian distribution with standard deviation 20 pixels to both the x and y coordinates.
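As a minimal sketch of this mutation operator in Python, assuming the track is stored as a list of (x, y) control points and that the perturbed point is chosen uniformly at random (the function name and data layout are illustrative, not taken from the original implementation):

```python
import random


def mutate_straightforward(control_points, sigma=20.0, rng=random):
    """Return a copy of the track with one control point perturbed.

    Independent Gaussian noise with standard deviation `sigma` (in pixels)
    is added to the x and y coordinates of one randomly chosen point.
    """
    child = list(control_points)
    i = rng.randrange(len(child))
    x, y = child[i]
    child[i] = (x + rng.gauss(0.0, sigma), y + rng.gauss(0.0, sigma))
    return child
```

The parent list is left untouched, so the same parent can be mutated several times when producing offspring.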

2) Random walk: In the random walk experiments, mutation proceeds as in the straightforward configuration, but the initialisation is different. A rounded rectangle track is first subjected to a random walk, whereby hundreds of mutations are carried out on a single track, and only those mutations that result in a track on which a generic controller is not able to complete a full lap are retracted. The result of such a random walk is a severely deformed but still drivable track. A population is then initialised with this track and evolution proceeds as usual from there.
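The random walk initialisation can be sketched in Python as a loop that keeps a mutation only when the track stays drivable. The predicate is_drivable is a hypothetical stand-in for running the generic controller and checking that it completes a full lap; the step count and helper names are likewise assumptions.

```python
import random


def gaussian_mutate(points, sigma=20.0, rng=random):
    """Perturb one randomly chosen control point with Gaussian noise."""
    child = list(points)
    i = rng.randrange(len(child))
    x, y = child[i]
    child[i] = (x + rng.gauss(0.0, sigma), y + rng.gauss(0.0, sigma))
    return child


def random_walk_init(track, is_drivable, steps=300, mutate=gaussian_mutate):
    """Deform `track` by repeated mutation, retracting undrivable steps.

    is_drivable(track) -> bool is assumed to simulate a generic controller
    and report whether it completes a full lap on the candidate track.
    """
    current = track
    for _ in range(steps):
        candidate = mutate(current)
        if is_drivable(candidate):
            current = candidate  # keep the mutation
        # otherwise the mutation is retracted and `current` is unchanged
    return current
```

The returned track would then be copied to seed the whole population before evolution starts.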

evolutionary selection can be seen to guarantee the top speed not to be dropped in favour of the other fitnesses.

Figure 3 displays three tracks that evolution tailored to the player models of two of the authors. Track (a) was evolved for a final progress of 1.1 (since the respective human player was not very skilled); tracks (b) and (c) were instead evolved on a model of a much more skilled player, for a final progress of 1.5 (see footnote 1). For tracks (a) and (b) all three fitness measures were used, while for track (c) only progress fitness was used.

The main difference between tracks (a) and (b) is that track (a) is broader and has fewer tricky passages, which makes sense as the player model used to evolve (a) drives slower. Both contain straight paths that allow the controller to achieve high speeds. In track (b) we can definitely notice the presence of narrow passages and sharp turns, elements that force the controller to reduce speed but only sometimes cause the car to collide. Those elements are believed to be the main source of final progress variability. These features are also notably absent from track (c), on which the good player model has very low variability. The progress of the controller is instead limited by many broad curves.

Fig. 3. Three evolved tracks: (a) evolved for a bad player with target progress 1.1, (b) evolved for a good player with target progress 1.5, (c) evolved for a good player with target progress 1.5 using only progress fitness.

7 Conclusions

We have shown that we can evolve tracks that, for a given controller, will yield a predefined progress for the car in a given time, while maximizing the maximum

1 The target progress is set between 50 and 75 percent of the progress achievable by the specific controller in a straight path. As a comparison, in Formula 1 races this ratio (calculated as ratio between average speed and top speed) is about 70 percent, and for the latest Need for Speed game it is between 50 and 60 percent.

Video: evolved TORCS drivers

Video: real car control

More on these topics

• http://julian.togelius.com

• e.g. Togelius, Lucas and De Nardi: “Computational Intelligence in Racing Games”

• Togelius, Gomez and Schmidhuber: “Learning what to ignore” on Friday, 11.10, room 606

• Car Racing Competition on Tuesday 15.00, room 402
