Artificial Intelligence in Game Design
Off-Line and Neural Network Learning in Games
Types of Learning in Games
• On-line learning
– Takes place during the game
– Changes game parameters for this particular play of the game
– Must be fast and efficient
– Examples: simple hill climbing, N-grams, ID3 decision tree learning
• Off-line learning
– Done during the development stage of the game
– Used to set game parameters for the final release of the game
– Can use complex forms of learning
– Examples: neural networks, reinforcement learning, Bayesian learning, etc.
Game Parameters
• Most complex games use continuous-valued parameters of some sort in decision making
– Probabilities/fuzzy measures
– Coefficients in MinMax heuristics
– How do we know what the best values are?

               Confident   Angry   Frightened
Attack Left       40%       60%       30%
Attack Right      40%       35%       20%
Defend            20%        5%       50%

[Figure: example heuristic coefficients 9, 5, 3, 3, 1]
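As a rough illustration of how a probability table like the one on this slide could drive decisions, the sketch below picks an action at random according to the character's mood. The dictionary layout and the names `ACTION_PROBABILITIES` and `choose_action` are assumptions made for this example, not something the slides specify.

```python
import random

# Percentages copied from the slide's table; the structure is hypothetical.
ACTION_PROBABILITIES = {
    "confident":  {"attack_left": 40, "attack_right": 40, "defend": 20},
    "angry":      {"attack_left": 60, "attack_right": 35, "defend": 5},
    "frightened": {"attack_left": 30, "attack_right": 20, "defend": 50},
}

def choose_action(mood, rng=random):
    """Pick one action at random, weighted by the percentages for this mood."""
    table = ACTION_PROBABILITIES[mood]
    actions = list(table)
    weights = [table[a] for a in actions]
    return rng.choices(actions, weights=weights, k=1)[0]

# Example: a frightened character defends about half the time.
print(choose_action("frightened"))
```

Learning, in this setting, would mean adjusting the numbers in the table rather than the selection code.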
Learning Game Parameters
• Learn game parameters from examples
– Database of environmental inputs and desired character actions
[Figure: learning loop]
– Example database: each example pairs inputs with a desired action
– Current parameters: determine the actions indicated by the rules
– Degree of error between actual and desired action
– Learning element: determines how to change parameters in order to decrease error
Learning Game Parameters
• Main difficulty: acquiring examples of desired behavior to learn from
– Most learning algorithms require thousands of examples
– Must know the desired action for each one
• One solution: let customers provide examples
– Neural network learning in “twenty questions”
– http://www.20q.net/
[Figure: learning from users]
– Users interact with the system on line
– Server contains a database of examples submitted by users
– Learning element adjusts knowledge to reduce error between its actions and user input
– “Console version” marketed with the final rules learned
Learning Game Parameters
• Can learn by repeatedly playing the game against itself
• Samuel’s checkers program
– Early application of MinMax
– Used up to 15 different weight values for heuristic evaluation of the board
• Basic idea:
– Multiple versions of weights generated
– All versions played against one another
– Weights with most wins considered best
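The basic idea above can be sketched as a round-robin tournament between candidate weight vectors. This is only an outline of the slide's description, not Samuel's actual procedure. The function `play_match` is a hypothetical stand-in for a real game engine, and `toy_match` exists only so that the example runs.

```python
import itertools

def round_robin_best(candidates, play_match):
    """Play every candidate against every other; return the one with most wins.

    play_match(a, b) is assumed to return 0 if a wins, 1 if b wins, None for a draw.
    """
    wins = [0] * len(candidates)
    for a, b in itertools.combinations(range(len(candidates)), 2):
        winner = play_match(candidates[a], candidates[b])
        if winner == 0:
            wins[a] += 1
        elif winner == 1:
            wins[b] += 1
    best = max(range(len(candidates)), key=lambda i: wins[i])
    return candidates[best], wins

def toy_match(weights_a, weights_b):
    """Placeholder game: the larger weight sum wins. Purely illustrative."""
    sum_a, sum_b = sum(weights_a), sum(weights_b)
    if sum_a == sum_b:
        return None
    return 0 if sum_a > sum_b else 1

best, wins = round_robin_best([[1, 0], [3, 1], [0, 2]], toy_match)
print(best, wins)
```

In a real system `play_match` would run full games using MinMax with each weight vector as the board evaluation, and new candidates would typically be generated from the winners.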
Artificial Neural Networks
• Based on the “structure” of the brain
– Only known working model of intelligence
– Note that an ANN is a very rough approximation of the little we know about the brain
• Main components:
– Neural units
  • Approximation of “neurons”
  • Roughly binary states: “on” or “off”, “1” or “0”, “active” or “inactive”
– Connections
  • Approximation of “synapses”
  • Connect 2 neural units
[Figure: Unit → Connection → Unit]
Artificial Neural Networks
• Connections have weights
– Positive or negative numbers
– Positive weight causes unit A to activate unit B
– Negative weight causes unit A to deactivate unit B
[Figure: Unit A → Connection (weight W) → Unit B]
• Example: “mouse behavior”
– “Smell cheese” → “Run towards smell”: positive weight
– “Smell cat” → “Run towards smell”: negative weight
Network Layers
• Input units
– Correspond to sensory input (such as “smell cheese”)
• Intermediate units
– Used for complex interactions
• Output units
– Correspond to actions taken (such as “run towards smell”)
• Human brain contains billions of neurons
– Each connected to between 1,000 and 10,000 other neurons via synapses
Perceptron Networks
• Early model (1960s)
• Very simple representation and learning
– Fast and easy to implement
– Too simple for some domains
• Single layer of inputs, outputs
– All units binary (value 0 or 1)
– No intermediate units
– Every input unit connected to every output unit
• Example: simple orc
– Inputs: hit points, have weapon, player strength
– Outputs: run, attack
Net Input Value
• Step 1: Compute net input to output Sj from inputs S1 … Sn
– Based on activation of input units S1 … Sn (0 or 1)
– Based on weights Wij from inputs Si to Sj
• $net_j = \sum_{i=1}^{n} S_i W_{ij}$
[Figure: input units S1 … Si … Sn connected to output unit Sj by weights W1j … Wij … Wnj]
Ideas:
Inactive units (with value 0) should have no effect on output
Effect of active units should be proportional to the weight between that input and output
Activation of Outputs
• Threshold activation
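– netj > threshold for unit j → Sj = 1
– netj ≤ threshold for unit j → Sj = 0
• Often represent threshold as another weight Wbias
– Negative weight from a unit Sbias which is always active (always 1)
– Makes it easier to learn the threshold
[Figure: inputs S1 … Sn and Sbias connected to Sj by weights W1j … Wnj and Wbias j]
[Figure: step function, Sj plotted against netj, switching from 0 to 1 at the threshold]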
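The equivalence between a threshold and a bias weight can be written out in one line. The symbol θ_j for the threshold of unit j is introduced here only for illustration; the slides do not name it.

```latex
S_j = 1
\iff \sum_{i=1}^{n} S_i W_{ij} > \theta_j
\iff \sum_{i=1}^{n} S_i W_{ij} + S_{bias} W_{bias\,j} > 0,
\qquad S_{bias} = 1,\quad W_{bias\,j} = -\theta_j
```

With this substitution every unit compares its net input against 0, and the threshold is learned like any other weight.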
Problem Format
• Neural networks expect input activations in range 0 to 1
• May need to normalize data to get this
• Example: orc
– Hit points between 1 and 100 → divide hit points by 100
– Have weapon: 1 if yes, 0 if no
– Player strength between 10 and 20 → subtract 10 and divide by 10
Hit points = 50, Have weapon = yes, Player strength = 17.5 → (0.5, 1.0, 0.75)
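A minimal sketch of this normalization, assuming the ranges stated on the slide. The function name `normalize_orc_inputs` is made up for the example.

```python
def normalize_orc_inputs(hit_points, has_weapon, player_strength):
    """Map raw game values into the 0-to-1 range expected by the network.

    Assumes hit points lie in 1..100 and player strength in 10..20.
    """
    return (
        hit_points / 100.0,
        1.0 if has_weapon else 0.0,
        (player_strength - 10.0) / 10.0,
    )

print(normalize_orc_inputs(50, True, 17.5))
```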
Orc Example
• netrun = (-1.5) + (-2) + 3 + (-2) = -2.5 → Srun = 0
• netattack = 2 + 5 + (-2.25) + (-3) = 1.75 → Sattack = 1
Inputs: Hit points = 0.5, Have weapon = 1, Player strength = 0.75, Bias = 1

Weights      Hit points   Have weapon   Player strength   Bias
to Run          -3            -2               4           -2
to Attack        4             5              -3           -3
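The arithmetic on this slide can be reproduced with a few lines of code. The weight values below are the ones implied by the slide's sums (for example, -1.5 is 0.5 × -3), and the names `WEIGHTS`, `net_input`, and `activate` are chosen for this sketch only.

```python
# Weights per output, ordered as: hit points, have weapon, player strength, bias.
WEIGHTS = {
    "run":    (-3.0, -2.0, 4.0, -2.0),
    "attack": (4.0, 5.0, -3.0, -3.0),
}

def net_input(inputs, weights):
    """Sum of each input activation times the weight on its connection."""
    return sum(s * w for s, w in zip(inputs, weights))

def activate(net):
    """Threshold activation with the threshold folded into the bias weight."""
    return 1 if net > 0 else 0

inputs = (0.5, 1.0, 0.75, 1.0)  # last entry is the always-active bias unit
nets = {action: net_input(inputs, w) for action, w in WEIGHTS.items()}
states = {action: activate(net) for action, net in nets.items()}
print(nets, states)
```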
Perceptron Learning
• Key question: Where do weights come from?
– Too complex to hand code
– Must learn from examples
• Sample training set
– Input values normalized between 0 and 1

          |              Inputs                     | Desired Outputs
Example   | Hit Points   Has Weapon   Player Strength | Attack   Run
   1      |    0.6           1             0.8        |   1       0
   2      |    0.7           0             0.4        |   1       0
   3      |    0.4           0             0.3        |   0       1
   4      |    0.8           0             0.7        |   0       1
   5      |    0.3           1             0.4        |   1       0
   6      |    0.2           1             0.6        |   0       1
Perceptron Learning
• Given the following:
– Sip: State of input i for example p
– Sjp: Actual state of output j for example p
– tjp: Desired state of output j for example p
• How should the weight Wij be changed to decrease the error between Sjp and tjp?
[Figure: input unit Sip connected to output unit Sjp by weight Wij]
Perceptron Learning
Key ideas:
• If Sjp is correct (Sjp == tjp), then no change to Wij
– Not broken, so don’t fix!
• If Sjp == 0 and tjp == 1, then netjp is too low
– Increase Wij
• If Sjp == 1 and tjp == 0, then netjp is too high
– Decrease Wij
• If input value Sip == 0, then Wij had no effect on netjp
– Change Wij in proportion to the magnitude of Sip
Perceptron Learning
• Perceptron learning rule:
$W_{ij} \leftarrow W_{ij} + \eta \, S_{ip} \, (t_{jp} - S_{jp})$
• η = step size
– Should be small (usually around 0.1)
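To see the rule act on concrete numbers, the sketch below applies one update to the weights feeding a single output. The starting weights and the input vector are illustrative assumptions (an "attack" unit that fires when it should not), and `perceptron_update` is a name invented for this example.

```python
def perceptron_update(weights, inputs, target, actual, eta=0.1):
    """Apply W_ij <- W_ij + eta * S_ip * (t_jp - S_jp) to each weight into one output."""
    return [w + eta * s * (target - actual) for w, s in zip(weights, inputs)]

# Assumed order: hit points, have weapon, player strength, bias (always 1).
weights = [4.0, 5.0, -3.0, -3.0]
inputs = [0.2, 1.0, 0.6, 1.0]

net = sum(w * s for w, s in zip(weights, inputs))
actual = 1 if net > 0 else 0          # the unit fires
new_weights = perceptron_update(weights, inputs, target=0, actual=actual)
print(new_weights)                    # roughly [3.98, 4.9, -3.06, -3.1]
```

Every weight moves down, and the weights on the larger inputs move the most, which matches the key ideas listed on the previous slide.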
Perceptron Learning
• Learning involves cycling through the training set many times
– Often thousands of cycles
• Start with a randomly determined set of weights
– Usually between -1 and 1
• Algorithm:
While error left for some example p
    For example p = 1 to number of examples
        For all inputs i
            Assign input i value Sip
        For all outputs j
            Compute actual output value Sjp
        For all weights Wij
            Apply learning rule to update weight
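The loop above might be written roughly as follows. This is a sketch under a few assumptions: the bias is handled as an extra input fixed at 1, a `max_epochs` cap is added so the loop always terminates, and the function names are invented for the example. The training set is the sample table from the earlier slide, with desired outputs ordered (attack, run).

```python
import random

def output(weights_j, inputs):
    """Threshold unit: active when the net input is above 0."""
    net = sum(w * s for w, s in zip(weights_j, inputs))
    return 1 if net > 0 else 0

def train_perceptron(examples, eta=0.1, max_epochs=10000, seed=0):
    """Cycle through the examples until a full pass makes no mistakes."""
    rng = random.Random(seed)
    n_in = len(examples[0][0]) + 1          # +1 for the always-active bias unit
    n_out = len(examples[0][1])
    weights = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]
    for _ in range(max_epochs):
        mistakes = 0
        for raw_inputs, targets in examples:
            inputs = list(raw_inputs) + [1.0]
            for j in range(n_out):
                actual = output(weights[j], inputs)
                if actual != targets[j]:
                    mistakes += 1
                for i in range(n_in):
                    # Learning rule; a no-op when actual == target.
                    weights[j][i] += eta * inputs[i] * (targets[j] - actual)
        if mistakes == 0:
            return weights, True
    return weights, False

TRAINING_SET = [
    ((0.6, 1, 0.8), (1, 0)), ((0.7, 0, 0.4), (1, 0)), ((0.4, 0, 0.3), (0, 1)),
    ((0.8, 0, 0.7), (0, 1)), ((0.3, 1, 0.4), (1, 0)), ((0.2, 1, 0.6), (0, 1)),
]
weights, converged = train_perceptron(TRAINING_SET)
print(converged, weights)
```

The returned flag reports whether a mistake-free pass was reached before the cap, which is worth checking because the loop cannot finish on its own if the examples are not representable with a single layer of weights.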
Limits of Perceptron Learning
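• Perceptron learning proven to correctly learn any behavior that can be represented with a single layer of weights
– Problem: Not all behaviors can be represented this way
– Example: Exclusive-or (A ⊕ B)

A   B   bias   A ⊕ B
1   1    1       0
1   0    1       1
0   1    1       1
0   0    1       0

[Figure: inputs A, B, and bias connected to one output by weights WA, WB, Wbias]
A single output unit (active when its net input is > 0) would need:
– WA + WB + Wbias ≤ 0
– WA + Wbias > 0
– WB + Wbias > 0
– Wbias ≤ 0
Contradiction: the middle two give WA + WB + 2 Wbias > 0, but the first and last give WA + WB + 2 Wbias ≤ 0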
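The contradiction can also be checked by brute force. The sketch below searches a small grid of weight values for a single threshold unit that reproduces a given truth table. The grid range and the names `fires` and `find_weights` are arbitrary choices for illustration. A search like this finds weights for AND but none for XOR, and the argument on the slide shows that no weights outside the grid would work either.

```python
import itertools

def fires(w_a, w_b, w_bias, a, b):
    """Single threshold unit with a bias weight; active when net input > 0."""
    return 1 if w_a * a + w_b * b + w_bias > 0 else 0

def find_weights(truth_table, values):
    """Return the first (WA, WB, Wbias) on the grid matching the table, else None."""
    for w_a, w_b, w_bias in itertools.product(values, repeat=3):
        if all(fires(w_a, w_b, w_bias, a, b) == out
               for (a, b), out in truth_table.items()):
            return (w_a, w_b, w_bias)
    return None

GRID = [x / 2 for x in range(-10, 11)]   # -5.0 to 5.0 in steps of 0.5
AND = {(1, 1): 1, (1, 0): 0, (0, 1): 0, (0, 0): 0}
XOR = {(1, 1): 0, (1, 0): 1, (0, 1): 1, (0, 0): 0}

print("AND:", find_weights(AND, GRID))
print("XOR:", find_weights(XOR, GRID))
```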
Limits of Perceptron Learning
• Networks with at least two layers of weights proven to represent any behavior defined by some logical proposition
• Problem: perceptron learning does not work
– No “desired value” for intermediate units
– Learning rule undefined without known values for units connected by weight
Back Propagation Learning
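• Most successful algorithm for multilayered neural networks
• Based on gradient descent
– Derivative of error with respect to change in weight
• $E_p = \sum_{k} (S_{kp} - t_{kp})^2$, summed over all outputs k
• $\Delta W = -\eta \, \partial E_p / \partial W$ (step against the gradient, scaled by step size η)
• Requires activation function with continuous derivative
– Threshold function is not differentiable at the threshold (and its derivative is zero everywhere else)
– Often use sigmoidal activation function $S_j = 1/(1 + e^{-net_j})$
  • More biologically plausible anyway
[Figure: sigmoid curve, Sj plotted against netj]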
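One reason the sigmoid is convenient is that its derivative can be written in terms of the unit's own output, which is where factors of the form S(1 - S) in the back propagation equations come from. A short derivation, using the slide's symbols:

```latex
S_j = \frac{1}{1 + e^{-net_j}}
\qquad\Longrightarrow\qquad
\frac{dS_j}{d\,net_j}
  = \frac{e^{-net_j}}{\left(1 + e^{-net_j}\right)^2}
  = S_j \,(1 - S_j)
```

The derivative is largest at S_j = 0.5 and approaches 0 as the unit saturates toward 0 or 1.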
Back Propagation Learning
Equations:
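• Weights Wjk between intermediate and output units:
$\Delta W_{jk} = \eta \, (t_{kp} - S_{kp}) \, S_{kp} (1 - S_{kp}) \, S_{jp}$
• Weights Wij between input and intermediate units:
$\Delta W_{ij} = \eta \left[ \sum_{k} (t_{kp} - S_{kp}) \, S_{kp} (1 - S_{kp}) \, W_{jk} \right] S_{jp} (1 - S_{jp}) \, S_{ip}$
– Summed over k because Wij affects all outputs through intermediate unit j
– Signs chosen so that each step decreases Ep; the constant factor 2 from differentiating the squared error is absorbed into η
[Figure: input unit Sip → (Wij) → intermediate unit Sjp → (Wjk) → output unit Skp]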
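A compact sketch of one back propagation step for a network with a single intermediate layer is given below. It assumes sigmoid units throughout, stores weights as nested lists (`w_ih[i][j]` from input i to intermediate unit j, `w_ho[j][k]` from intermediate unit j to output k), and uses an error with a factor of 1/2 so that no stray constant appears in the updates. All function names and the example weights are invented for illustration.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(w_ih, w_ho, inputs):
    """Return (intermediate activations, output activations)."""
    hidden = [sigmoid(sum(inputs[i] * w_ih[i][j] for i in range(len(inputs))))
              for j in range(len(w_ho))]
    outputs = [sigmoid(sum(hidden[j] * w_ho[j][k] for j in range(len(hidden))))
               for k in range(len(w_ho[0]))]
    return hidden, outputs

def error(w_ih, w_ho, inputs, targets):
    """Half the summed squared error over all outputs."""
    _, outputs = forward(w_ih, w_ho, inputs)
    return 0.5 * sum((s - t) ** 2 for s, t in zip(outputs, targets))

def backprop_deltas(w_ih, w_ho, inputs, targets, eta=0.5):
    """Weight changes for one example, following the two update equations."""
    hidden, outputs = forward(w_ih, w_ho, inputs)
    # Output error terms: (t_k - S_k) * S_k * (1 - S_k)
    out_err = [(t - s) * s * (1 - s) for s, t in zip(outputs, targets)]
    d_ho = [[eta * out_err[k] * hidden[j] for k in range(len(outputs))]
            for j in range(len(hidden))]
    # Intermediate error terms: output errors sent back through W_jk
    hid_err = [sum(out_err[k] * w_ho[j][k] for k in range(len(outputs)))
               * hidden[j] * (1 - hidden[j]) for j in range(len(hidden))]
    d_ih = [[eta * hid_err[j] * inputs[i] for j in range(len(hidden))]
            for i in range(len(inputs))]
    return d_ih, d_ho

# Illustrative network: inputs A, B, bias -> 2 intermediate units -> 1 output.
w_ih = [[0.5, -0.3], [0.2, 0.8], [-0.1, 0.4]]
w_ho = [[0.7], [-0.6]]
inputs, targets = [1.0, 0.0, 1.0], [1.0]
d_ih, d_ho = backprop_deltas(w_ih, w_ho, inputs, targets)
print(d_ih, d_ho)
```

Adding each delta to its weight performs one gradient descent step for this example; training repeats that over the whole training set many times.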
Neural Networks and Games
• Perceptron learning often sufficient for game development
– Only fails for behaviors that are not linearly separable, like XOR
– Types of problem where multiple “good” things together are not good
  • High NPC hit points → attack
  • Low player strength → attack
  • High NPC hit points and low player strength → run?
– Rare in practice
• Works well even if results not perfect
– Nonlinear aspects to behavior
– Occasional bad data in training set
– Still creates weights that work most of the time
  • Advantage of most neural network algorithms
Neural Networks and Games
What is being learned?
• Relative importance of different inputs on player actions
– “Are my hit points or the player’s strength more important in deciding when to attack?”
• Thresholds at which different values should affect behavior
– “How low should I let my hit points get before running away?”
• Extremely hard to create a good set of rules by hand if there is a large number of inputs and possible actions
[Figure: orc network with inputs hit points, have weapon, player strength]
Neural Networks and Games
• Can also use learning to create a variety of characters
– Provides a variety of experiences for players
– Makes characters more realistic
• Train each with a slightly different set of examples
– Much like how we learn from different experiences

          |              Inputs                     | Desired Outputs
Example   | Hit Points   Has Weapon   Player Strength | Attack   Run
   1      |    0.4           1             0.9        |   1       0
   2      |    0.6           0             0.5        |   1       0
   3      |    0.2           0             0.5        |   0       1
   4      |    0.4           0             0.9        |   0       1
   5      |    0.3           1             0.4        |   1       0
   6      |    0.1           1             0.8        |   0       1

[Figure caption: “I don’t know the meaning of fear (or many other words)!”]
Example: Dirt Track Racing
• Neural network used to train cars to take corners
– Inputs: Track conditions (degree of mud, gravel, snow, etc.)
– Outputs: Location at which to begin steering, angle, speed, etc.
– Supervision: Positive reinforcement if the car made it through the turn quickly without crashing