intelligent agents - princeton university · 2018. 5. 7. · paradigms for intelligent agents...

Robert Stengel, Princeton University

School of Engineering and Applied Science

• Cognitive and Biological Paradigms for Intelligent Agents
• Intelligent Vehicle/Highway Systems
• Advanced Vehicle Control Systems
  – Control of a Fuel-Cell Preferential Oxidizer
  – Adaptive Critic Neural Control of an Aircraft
• Autonomous Vehicles
  – Intelligent Guidance for Headway and Lane Control

Neural-Adaptive Control of Dynamic Systems

presented at George Washington University, February 23, 2007

Intelligent Agents

• Perform useful functions driven by objectives and current knowledge

– Emulate biological and cognitive processes

– Process information to achieve goals

– Learn by example or from experience

– Adapt functions to a changing environment

Cognitive and Biological Paradigms for Intelligent Agents

Thinking
- Syntax (form) and Semantics (meaning)
- Algorithmic vs. Non-Algorithmic Behavior
- Consistency, Emotion, "The Collective Subconscious"
- Generating Alternatives
- Randomized Search

Consciousness
- Self-Awareness and Perception
- Creativity, Wisdom, and Imagination
- Common Sense, Understanding, and Judgment of Truth
- Learning by Example

• Recognizing Aliases and Objects
• Handling Emergencies
• Focusing on Things that are "Out of Focus"
• Richness of Sensory Information
• Hierarchical and Redundant Structures
• Genetic Reproduction of Elements

Learning Requires Error or Incompleteness

Biological Adaptation is a Slow Process

REM (Rapid Eye Movement) Sleep is a Time of Learning, Consolidating, and Pruning Knowledge

Cells Undergo Birth-Life-Death Cycle

Short-Term Memory Recedes into Long-Term Memory or is Forgotten

Humans Form Chords of Actions

Knowledge Acquisition, Behavior, and Control

Conscious Thought
- Awareness
- Focus
- Reflection
- Rehearsal
- Declarative Processing of Knowledge or Beliefs

Unconscious Thought
- Subconscious Thought
  > Procedural Processing
  > Communication
  > Learned Skills
  > Subliminal Knowledge Acquisition
- Preconscious Thought
  > Pre-attentive Declarative Processing
  > Subject Selection for Conscious Thought
  > Concept Development
  > Information Pathway to Memory
  > Intuition

Hierarchy of Declarative, Procedural, and Reflexive Actions

Reflexive Behavior
- Instantaneous Response to Stimuli
- Elementary, Forceful Actions
- Stabilizing Influence
- Simple Goals

Intelligent Agent Possessing Declarative, Procedural, and Reflexive Traits

• Declarative Functions: Expert Systems, Decision Trees
• Procedural Functions: modeled by Estimation and Control "Circuits"
• Reflexive Functions: Neural Networks

Natural Neurons

• Neurons are biological cells with significant electrochemical activity
• ~10 billion neurons in the brain
• Neuron activity is complex, but output is scalar
• Single neuron
  – receives many inputs
  – produces a single output

Computational Neural Networks

• Functional structure
  – Algebraic processing of inputs to produce continuous outputs
    • Flow-through network
    • Convergence to desired solution as output sequence evolves
  – Search of input space to identify discrete outputs
    • Recursive network
    • Convergence to distinct solution before output occurs
• Training categories
  – Supervised learning
    • Define input-output relationship from examples, e.g., backpropagation
  – Unsupervised learning
    • Identify inputs that are similar or close
  – Non-adaptive Network
    • Training before functional application
  – Adaptive Network
    • Training during functional application (adaptive)

Computational (Artificial) Neuron Complex

• Synapse effects represented by weights, gains, or multipliers
• Neuron firing frequency is modeled by linear gain or nonlinear element

[Figure: artificial neuron complex with synaptic weights w11, w12, w13, w21, w22, w23]
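The neuron model on this slide (inputs scaled by synaptic weights, then passed through a linear gain or a nonlinear element) can be sketched as follows. This is a minimal illustration, assuming a logistic sigmoid as the nonlinear element; the function names and the numeric weights are hypothetical, chosen only to mirror the w11 to w23 labels in the figure.

```python
import math


def neuron(inputs, weights, bias=0.0, nonlinear=True):
    """Weighted sum of inputs, passed through a sigmoid or left linear."""
    r = sum(w * x for w, x in zip(weights, inputs)) + bias
    if nonlinear:
        return 1.0 / (1.0 + math.exp(-r))  # assumed nonlinear element
    return r  # linear gain of one


def layer(inputs, weight_rows, biases):
    """Two or more neurons sharing the same inputs, one weight row each."""
    return [neuron(inputs, row, b) for row, b in zip(weight_rows, biases)]


# Two neurons, three inputs: rows hold (w11, w12, w13) and (w21, w22, w23).
example_weights = [[0.2, -0.4, 0.1], [0.7, 0.3, -0.5]]
example_outputs = layer([1.0, 0.5, -1.0], example_weights, [0.0, 0.0])
```

Each neuron receives many inputs and produces one scalar output, which is the property the Natural Neurons slide emphasizes.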

Natural Neural Networks

• Dendrites receive signals from other neurons
• Axons transmit signals to other neurons and end effectors
• Synapses reflect connection strength
  – Excite or inhibit neuron activity
  – Are the learning parameters of the nervous system

Layout of an Algebraic Neural Network

Layered, parallel structure for computation

Intelligent Vehicle/Highway Systems (IVHS)

• Rationale
  – Congestion
  – Highway Throughput and Trip Time
  – Traveler Safety
  – Industrial and Social Productivity
  – Convenience
  – Handicapped and Elderly
• Functional Areas
  – Vehicle Control Systems
  – Traffic Management Systems
  – Traveler Information Systems
  – Public Transportation Systems
  – Cargo Transportation Systems
  – Rural Transportation Systems
• Issues
  – Smart Cars and Smart Highways
  – Autonomy and System Architecture
  – Cost and Resource Allocation
  – Benefits and Privacy
  – Rights and Responsibilities
  – Regulation and Liability

Predicted Market Penetration Dates for IVHS Control Technologies (1991)

Feature                                   5% penetration   50% penetration
Vehicle Probes Producing Traffic Data     2000             2016
Real-Time Optimal Route Guidance          2000             2020
Frontal Collision Warning                 2002             2013
Backup and Blind-Spot Detection           2002             2015
Roadway Imaging                           2010             Never
GPS Navigation                            2000             2012
Map-Matching/Dead-Reckoning Navigation    2000             2020
Adaptive Cruise Control                   2004             2015
Automatic Backup Braking                  2008             2020
Autonomous Lane-Keeping                   2012             2032
Platooning                                2035             Never
Automatic Chauffeuring                    2040             Never

compiled by Steven Underwood, “Delphi Forecast and Analysis of Intelligent Vehicle-Highway Systems,” U. Mich, 1992.

Elements of an Intelligent Vehicle/Highway System

[Figure labels: Regional Traffic Management Organization; Cellular Traffic Management Centers; Road/Highway/Communication Infrastructure]

Roadway Systems
• Traffic Lights
• Changeable Message Signs
• Ramp Metering
• Tolling Systems
• Law Enforcement
• Maintenance
• Radio Links
• Wire/Optic Lines
• Loops, Video Detectors
• Weather Information

Client Vehicles
• Regular Traffic
  - Passenger Cars
  - Buses
  - Trucks & Vans
  - Motorcycles
  - Bicycles
  - Pedestrians
• Transient Traffic
  - ......
  - ......

Traffic Obstructions
• Accidents
• Road Maintenance
• Roadside Construction
• Sports/Social Events
• Natural Calamities

A Network of Intelligent Agents

[Figure labels: Regional Traffic Management Intelligent Agent; Cellular Traffic Management Intelligent Agents; Police/Emergency Intelligent Agent; Client Vehicle Intelligent Agent; Roadside Intelligent Agent; Controlling Devices; Sensing Devices]

Functions of Two Agent Types in an IVHS

• Cellular Traffic Management Agent
  – Declarative Functions
    • Area traffic monitoring
    • Nominal traffic routing plan
    • Area emergency planning
    • Accident strategic response
  – Procedural Functions
    • Traffic flow assessment
    • Driver information services
    • Accident detection and tactical response
  – Reflexive Functions
    • Normal traffic signaling
    • Traffic volume and speed logging
    • Communication with vehicles, adjacent cells, and regional center

• Driver/Automobile Agent
  – Declarative Functions
    • Destination and route selection
    • Choice and timing of waypoints
    • Strategy selection
  – Procedural Functions
    • Neighboring traffic assessment
    • Roadway, obstacle, and hazard assessment
    • Obeying traffic rules and regulations
  – Reflexive Functions
    • Steering and accelerating
    • Normal and emergency braking
    • Internal systems control
    • Communication with adjacent vehicles and traffic management system

I. Control of a Fuel-Cell Preferential Oxidizer

[Figure labels: Fuel Storage; Fuel Processor (Reformer or Partial Oxidation Reactor, Shift, PrOx, with H2O and Air inputs); Fuel Cell Stack; Batteries; Power Conditioning and Motor Control; Gear; Motor/Gen.]

• Control logic mimics functions of the brain’s cerebellum

Preferential Oxidizer

• Proton-Exchange Membrane Fuel Cell converts hydrogen and oxygen to water and electrical power
• Steam Reformer/Partial Oxidizer-Shift Reactor converts fuel (e.g., alcohol or gasoline) to H2, CO2, H2O, and CO. Fuel flow rate ∝ power demand
• CO “poisons” the fuel cell and must be removed from the reformate
• Catalyst promotes oxidation of CO to CO2 over oxidation of H2 in a Preferential Oxidizer (PrOx)
• PrOx reactions are nonlinear functions of catalyst, reformate composition, temperature, and air flow

[Figure labels: Fuel Processor (Reformer or Partial Oxidation Reactor, Shift, PrOx, with H2O and Air inputs)]

The Cerebellum

• Cerebellum integrates sensory input and motor output

Cerebellar Model Articulation Controller (CMAC)

• CMAC: Two-stage mapping of a vector input to a scalar output
• First mapping: Input space to association space
  – f is fixed
  – a is binary
• Second mapping: Association space to output space
  – g contains learned weights

$$ f : x \rightarrow a \qquad \text{(Input} \rightarrow \text{Selector vector)} $$

$$ g : a \rightarrow y \qquad \text{(Selector} \rightarrow \text{Output)} $$

[Figure: input space, n = 2 (input 1, input 2, quantization width of input 2), mapped onto association memory, c = 3 (Layer 1, Layer 2, Layer 3)]

Single-Input CMAC Example

• x is in (xmin, xmax)
• Selector vector is binary and has NA elements
• Receptive regions of association space map x to a
• NA = Number of receptive regions = N + C – 1 = dim(a)
• C = Generalization parameter = # of overlapping regions
• Input quantization = (xmax – xmin) / N

$$ f : x \rightarrow a \qquad \text{(Input} \rightarrow \text{Selector vector)} $$

$$ g : a \rightarrow y \qquad \text{(Selector} \rightarrow \text{Output)} $$

$$ a = \begin{bmatrix} 0 & 0 & 0 & 1 & 1 & 1 & 0 & 0 \end{bmatrix}^T $$

CMAC Output and Training

• CMAC output from activated cells of c Associative Memory layers:

$$ y_{CMAC} = w^T a = \sum_{i=j}^{j+C-1} w_{i,\text{activated}}, \qquad j = \text{index of first activated region} $$

• Least-squares training of CMAC weights:

$$ w_{j,\text{new}} = w_{j,\text{old}} + \frac{\beta}{c} \left( y_{\text{desired}} - \sum_{i=1}^{c} w_{i,\text{old}} \right) $$

where β is the learning rate and wj is an activated cell weight

• Localized generalization and training
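The single-input mapping, output sum, and weight update described on these CMAC slides can be sketched as below. This is a minimal illustration under stated assumptions, not the controller used in the study: the class and method names are hypothetical, the learning rate is called `beta`, and the input is quantized by simple truncation. With N = 6 and C = 3 the selector vector has N + C - 1 = 8 elements, as in the slide's example.

```python
class SingleInputCMAC:
    """Hypothetical single-input CMAC with N quantization cells and C overlaps."""

    def __init__(self, x_min, x_max, n_quant, c, beta):
        self.x_min, self.x_max = x_min, x_max
        self.n = n_quant                            # N quantization intervals
        self.c = c                                  # generalization parameter C
        self.beta = beta                            # learning rate
        self.weights = [0.0] * (n_quant + c - 1)    # N_A = N + C - 1 weights

    def first_active(self, x):
        """Index j of the first activated receptive region (fixed mapping f)."""
        q = int((x - self.x_min) / (self.x_max - self.x_min) * self.n)
        return min(max(q, 0), self.n - 1)

    def selector(self, x):
        """Binary selector vector a with C consecutive ones."""
        j = self.first_active(x)
        return [1 if j <= i < j + self.c else 0 for i in range(len(self.weights))]

    def output(self, x):
        """y = w^T a, the sum of the C activated weights (mapping g)."""
        j = self.first_active(x)
        return sum(self.weights[j:j + self.c])

    def train(self, x, y_desired):
        """Spread beta/C times the output error over the activated weights."""
        j = self.first_active(x)
        correction = self.beta / self.c * (y_desired - self.output(x))
        for i in range(j, j + self.c):
            self.weights[i] += correction


cmac = SingleInputCMAC(x_min=0.0, x_max=6.0, n_quant=6, c=3, beta=1.0)
a = cmac.selector(3.5)   # [0, 0, 0, 1, 1, 1, 0, 0]
cmac.train(3.5, 1.5)     # one update with beta = 1 matches the target exactly
```

Because neighboring inputs share activated cells, the single update at x = 3.5 also changes the output at x = 4.5, which is the localized generalization the slide refers to. A learning rate of 1.0 is used here only to make the arithmetic easy to follow.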

CMAC Output and Training

• In higher dimensions, the association space has dimension dim(x): a plane, cube, or hypercube
• Potentially large memory requirements
• Granularity (quantization) of output
• Variable generalization and granularity


CMAC/PID* Control System for Preferential Oxidizer

* Proportional-Integral-Derivative

[Figure: hybrid control system block diagram. Desired H2 conversion minus actual H2 conversion gives the H2 conversion error, which feeds the PID (conventional) controller, with gains = f(flow rate). The CMAC (ANN) inputs are PrOx reformate flow rate, PrOx inlet [CO], and inlet coolant temperature. airCMAC and airPID sum to airTOTAL, the air input to the PrOx. An H2 conversion calculation block computes H2 conv. = f(airTOTAL, [H2]in, [H2]out, flow rate, sensor dynamics) from the inlet and outlet reformate. A training path to the CMAC is also shown.]

Feedback and Adaptation

• Feedback
  – Learn the current dynamic state of the system and adjust the control commands
• Adaptation
  – Learn the current status of the system’s dynamic model and adjust the control law

[Figure: CMAC/PID hybrid control system block diagram, repeated from the previous slide]

Summary of CMAC Characteristics

• Inputs and Number of Divisions:
  – PrOx inlet reformate flow rate (95)
  – PrOx inlet cooling temperature (80)
  – PrOx inlet CO concentration (100)
• Output: PrOx air injection rate
• Associative Layers, C: 24
• Number of Associative Memory Cells/Weights and Layer Offsets: 1,276 and [1,5,7]
• Learning Rate, β: ~0.01
• Sampling Interval: 100 ms


Flow Rate and Hydrogen Conversion of CMAC/PID Controller

• H2 conversion command (across PrOx only): 1.5%

• Novel data, with (—-) and without pre-training (––)

• Federal Urban Driving Cycle (= FUDS)

Comparison of PrOx Controllers on FUDS

Controller       Mean H2 error (%)   Max. H2 error (%)   Mean CO out (ppm)   Max. CO out (ppm)   Net H2 output (%)
Fixed-Air        0.68                0.87                6.3                 28                  57.2
Table Look-up    0.13                1.43                6.5                 26                  57.8
PID              0.05                0.51                7.7                 30                  58.1
CMAC/PID         0.02                0.16                7.3                 26                  58.1

[Figure: flow rate (g/s) versus time (seconds), 0 to 1400 s]

II. Adaptive Critic Neural Control of an Aircraft

• Adaptive critic controller
  – Estimates cost function
  – “Criticizes” non-optimal performance
  – Adapts control gains to improve performance
  – Adapts cost model to improve estimate

Design Philosophy for Adaptive Critic Neural Control

• Define an acceptable linear control structure
• Design linear controllers that satisfy requirements at n operating points
• Train neural networks
  – Off-line to replicate control response at n operating points (~ “Gain Scheduling”)
  – On-line to optimize performance

Linear-Quadratic Proportional-Integral (LQ-PI) Control System

• LQ-PI regulator provides:
  – Multi-input/multi-output control
  – Damping and stabilization
  – Command response
  – Disturbance rejection
  – Implicitly accounts for system modeling errors
• Gains chosen to minimize a cost (or value) function

$$ \min_{\Delta u(t_k)} V[\Delta x(t_k)] = \min_{\Delta u(t_k)} \left\{ L[\Delta x(t_k), \Delta u(t_k)] + V[\Delta x(t_{k+1})] \right\} $$

$$ L[\Delta x(t_k), \Delta u(t_k)] = \frac{1}{2} \begin{bmatrix} \Delta x^T(t_k) & \Delta u^T(t_k) \end{bmatrix} \begin{bmatrix} Q & M \\ M^T & R \end{bmatrix} \begin{bmatrix} \Delta x(t_k) \\ \Delta u(t_k) \end{bmatrix} $$
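The recursive minimization of the quadratic cost can be illustrated with a scalar example. This is a sketch under stated assumptions: it is a plain linear-quadratic regulator without the integral (PI) augmentation, and the scalar dynamics Δx(k+1) = a·Δx(k) + b·Δu(k), the function names, and the numeric values are all hypothetical. Assuming a value function V = ½·p·Δx², minimizing L + V at the next step gives the gain and the recursion for p used below.

```python
def stage_cost(dx, du, q, r, m=0.0):
    """L = 1/2 [dx du] [[q, m], [m, r]] [dx du]^T for scalar dx and du."""
    return 0.5 * (q * dx * dx + 2.0 * m * dx * du + r * du * du)


def lq_gain(a, b, q, r, m=0.0, iterations=500):
    """Iterate the scalar Riccati recursion; return (gain k, value weight p)."""
    p = q
    for _ in range(iterations):
        s = m + a * b * p
        p = q + a * a * p - s * s / (r + b * b * p)
    k = (m + a * b * p) / (r + b * b * p)   # control law: du = -k * dx
    return k, p


def cost_to_go(dx0, a, b, q, r, m, k, steps=200):
    """Sum the stage costs along the closed-loop trajectory from dx0."""
    total, dx = 0.0, dx0
    for _ in range(steps):
        du = -k * dx
        total += stage_cost(dx, du, q, r, m)
        dx = a * dx + b * du
    return total


k, p = lq_gain(a=1.0, b=1.0, q=1.0, r=1.0)
v0 = cost_to_go(1.0, 1.0, 1.0, 1.0, 1.0, 0.0, k)   # should agree with 0.5 * p
```

Comparing the simulated cost-to-go with ½·p·Δx² is a quick check that the gain satisfies the recursive cost relation.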

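Structure of Equivalent Proportional-Integral Neural Controller

$$ u_k = c\left[ x_k, y_{c_k}, \int \Delta y_k \, dt, a_k \right] = NN_F\left[ y_{c_k}, a_k \right] + NN_B\left[ x_k, a_k \right] + NN_I\left[ \int \Delta y_{c_k} \, dt, a_k \right] $$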

Off-Line Initialization of Neural Networks

$$ u_k = c\left[ x_k, y_{c_k}, \int \Delta y_k \, dt, a_k \right] $$

$$ \Delta u_k = \Delta c[\bullet] = \frac{\partial c}{\partial y_c} \Delta y_{c_k} + \frac{\partial c}{\partial \left( \int y_c \, dt \right)} \int \Delta y_{c_k} \, dt + \frac{\partial c}{\partial x} \Delta x_k = C_F \Delta y_{c_k} + C_I \int \Delta y_{c_k} \, dt + C_B \Delta x_k $$

[Figure: control hypersurface; labels Δx, a, Δu, A, B]

• Pre-training paradigm
  – Nonlinear optimal control hypersurfaces (unknown)
  – Optimal linear control gain matrices and trim settings computed at operating points (known)
  – Gain matrices define slopes of nonlinear control hypersurfaces
  – Algebraic training of neural networks fits control hypersurfaces and gradients exactly at operating points
• Interpolation and gain scheduling via neural networks
• One node/operating point in each neural network

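On-Line Training

• Dual Heuristic Programming Adaptive Critic for infinite-horizon optimization problem (tf -> ∞)
• Critic and Action (i.e., Control) networks adapted concurrently
• LQ-PI cost function
• Modified resilient backpropagation for neural network training

$$ V[x(t_k)] = L[x(t_k), u(t_k)] + V[x(t_{k+1})] $$

$$ \frac{\partial V}{\partial u} = \frac{\partial L}{\partial u} + \frac{\partial V}{\partial x} \frac{\partial x}{\partial u} = 0 $$

$$ \frac{\partial V}{\partial x_k} = NN_C\left[ y_{c_k}, x_k, a_k \right] $$

$$ u_k = c\left[ x_k, y_{c_k}, \int \Delta y_k \, dt, a_k \right] = NN_F\left[ y_{c_k}, a_k \right] + NN_B\left[ x_k, a_k \right] + NN_I\left[ \int \Delta y_{c_k} \, dt, a_k \right] $$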

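Backpropagation Training of a Single Sigmoid Neuron

• Training error and cost function

$$ \varepsilon = \hat{y} - y_T $$

$$ J = \frac{1}{2} \varepsilon^2 = \frac{1}{2} \left( \hat{y} - y_T \right)^2 = \frac{1}{2} \left( \hat{y}^2 - 2 \hat{y} y_T + y_T^2 \right) $$

• Neuron parameters

$$ p = \begin{bmatrix} p_1 \\ p_2 \\ \vdots \\ p_{n+1} \end{bmatrix} = \begin{bmatrix} w \\ b \end{bmatrix} = \begin{bmatrix} \text{Input Weights} \\ \text{Bias} \end{bmatrix} $$

• Cost function gradient

$$ \frac{\partial J}{\partial p} = \left( \hat{y} - y_T \right) \frac{\partial \hat{y}}{\partial p} = \left( \hat{y} - y_T \right) \frac{\partial \hat{y}}{\partial r} \frac{\partial r}{\partial p} $$

where

$$ r = w^T x + b, \qquad \frac{d\hat{y}}{dr} = \left( 1 - \hat{y} \right) \hat{y}, \qquad \frac{\partial r}{\partial p} = \begin{bmatrix} x^T & 1 \end{bmatrix} $$

• Backpropagation algorithm

$$ p_{k+1} = p_k - \eta \left( \frac{\partial J}{\partial p} \right)_k^T = p_k - \eta \, \delta_k \begin{bmatrix} x_k \\ 1 \end{bmatrix} $$

or

$$ \begin{bmatrix} w \\ b \end{bmatrix}_{k+1} = \begin{bmatrix} w \\ b \end{bmatrix}_k - \eta \left( \hat{y}_k - y_T \right) \left( 1 - \hat{y}_k \right) \hat{y}_k \begin{bmatrix} x_k \\ 1 \end{bmatrix} $$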
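The single-neuron update on this slide can be written out directly. The sketch below is a minimal illustration, assuming a logistic sigmoid output and naming the learning rate `eta`; the function names and the logical-OR training set are hypothetical and are not taken from the slides.

```python
import math


def neuron_output(w, b, x):
    """y_hat = sigmoid(r), with r = w^T x + b."""
    r = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-r))


def cost(w, b, x, y_target):
    """J = 1/2 (y_hat - y_T)^2."""
    return 0.5 * (neuron_output(w, b, x) - y_target) ** 2


def backprop_step(w, b, x, y_target, eta):
    """One gradient step on the weights and bias for one training sample."""
    y_hat = neuron_output(w, b, x)
    # dJ/dr = (y_hat - y_T) * dy_hat/dr, with dy_hat/dr = (1 - y_hat) * y_hat
    delta = (y_hat - y_target) * (1.0 - y_hat) * y_hat
    w_new = [wi - eta * delta * xi for wi, xi in zip(w, x)]   # dr/dw = x
    b_new = b - eta * delta                                   # dr/db = 1
    return w_new, b_new


def train(samples, n_inputs, eta=0.5, epochs=2000):
    """Repeated single-sample updates, starting from zero parameters."""
    w, b = [0.0] * n_inputs, 0.0
    for _ in range(epochs):
        for x, y_target in samples:
            w, b = backprop_step(w, b, x, y_target, eta)
    return w, b


# Hypothetical training set: logical OR of two inputs.
or_samples = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0),
              ([1.0, 0.0], 1.0), ([1.0, 1.0], 1.0)]
w_or, b_or = train(or_samples, n_inputs=2)
```

Starting from zero parameters, a single step on the sample x = [1, 2], y_T = 1 with eta = 1 moves the weights to [0.125, 0.25] and the bias to 0.125, which can be checked by hand against the update equation.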

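Backpropagation Training of a Sigmoid Network

• Training error and cost function

$$ \varepsilon = \hat{y} - y_T $$

$$ J = \frac{1}{2} \varepsilon^T \varepsilon = \frac{1}{2} \left( \hat{y} - y_T \right)^T \left( \hat{y} - y_T \right) = \frac{1}{2} \left( \hat{y}^T \hat{y} - 2 \hat{y}^T y_T + y_T^T y_T \right) $$

• Neuron parameters

$$ p_{1,2} = \begin{bmatrix} \mathrm{Vec}(W) \\ b \end{bmatrix}_{1,2} = \begin{bmatrix} p_1 \\ p_2 \\ \vdots \\ p_{n+1} \end{bmatrix}_{1,2} $$

• Backpropagation algorithm

$$ p_{1,2_{k+1}} = p_{1,2_k} - \eta \left( \frac{\partial J}{\partial p_{1,2}} \right)_k^T $$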

Adaptation of Action (Control) and Critic (Optimizing) Networks

Train action network, at time t, holding the critic parameters fixed

Train critic network, at time t, holding the action parameters fixed

Effect of Adaptive Critic in Steep Turn

• 70-deg banking turn
  – Outside normal flight envelope of jet transport
  – Pre-trained neural network ignores longitudinal-lateral-directional coupling

[Figure: response comparison, uncoupled control vs. adaptive critic control]

Effect of Adaptive Critic with Control Failures

• 50% thrust reduction
• 15-deg rudder jam

Effect of Adaptive Critic with System Parameter Variations

• 50% reduction in control effectiveness
• 20% reduction in longitudinal stability
• 30% reduction in directional stability

Movie from California PATH, 1997

III. Intelligent Guidance for Headway and Lane Control (IGHLC)

Typical Equipage for IGHLC (or Automatic Chauffeuring)

Illustration from Ohio State University, 1997

Functions of an Automatic Chauffeur

• Control logic mimics functions of the brain’s cerebrum

The Cerebrum

• Language and communication
• Movement
• Olfaction
• Memory
• Emotion
• Cerebrum integrates declarative thought and action

An Expert System for Guidance and Control

• Automated inference or reasoning
• Subject-specific rules and data
• Knowledge representation and acquisition
• Higher-order control of side effects
• Explanation
• User interface

Expert System for Highway Driving

Functions of the IGHLC Expert System

• Top Level Executive
  – Guide other functions to determine controller parameters
• Situation Assessment
  – Determine if current situation is safe or unsafe
  – Invoke normal or emergency expert
• Normal Expert
  – Select option and issue command that is safe and satisfies driver’s goal
• Emergency Expert
  – Select option and issue command that is safe
• Projected Action
  – Assess outcome of guidance command
• Lane-Change Indications
  – Identify desirable lane-change option
• Default Strategies
  – Backup driver-selected values

Normal Expert System Flow

• Identify Own Vehicle’s
  – Speed goal
  – Lane goal
  – Aggressiveness factor
  – Security factor
• Worst-Plausible-Case Decision-Making (WPCDM)
  – Probabilistic evaluation of current state and uncertainty of Own Vehicle
    • Known characteristics of Own Vehicle
  – Probabilistic evaluation of current state and uncertainty of all neighboring vehicles
    • Distinct plausible strategies and corresponding control actions of neighboring vehicles
    • Worst plausible strategy and hazard function identified for each vehicle

IGHLC Rules, Parameters, and Structure

• Elements of a Rule
  – Type, Name, and Status
  – Parameters tested by rule
  – Parameters set by rule
  – Premise: Logical statement of proposition or predicates
  – Action: Logical consequence of premise being true
  – Description of premise and action (for explanation)
• Elements of a Parameter
  – Type, Name, and Current value
  – Rules that test the parameter
  – Rules that set the parameter
  – Allowable values of the parameter
  – Description of parameter (for explanation)
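The rule and parameter elements listed on this slide map naturally onto two small record types. The sketch below is one possible layout, not the IGHLC implementation: the class names, field names, status strings, and the example rule about headway are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class Parameter:
    kind: str                      # "Type" element of a parameter
    name: str
    value: Any                     # current value
    allowed_values: List[Any]      # allowable values
    description: str = ""          # text used for explanation
    tested_by: List[str] = field(default_factory=list)  # rules that test it
    set_by: List[str] = field(default_factory=list)     # rules that set it

    def set(self, new_value: Any) -> None:
        """Assign a value, rejecting anything outside the allowable set."""
        if new_value not in self.allowed_values:
            raise ValueError(f"{new_value!r} is not allowed for {self.name}")
        self.value = new_value


@dataclass
class Rule:
    kind: str                      # "Type" element of a rule
    name: str
    tests: List[str]               # names of parameters tested by the rule
    sets: List[str]                # names of parameters set by the rule
    premise: Callable[[Dict[str, Parameter]], bool]
    action: Callable[[Dict[str, Parameter]], None]
    description: str = ""
    status: str = "untested"

    def fire(self, params: Dict[str, Parameter]) -> bool:
        """Evaluate the premise; run the action only if the premise is true."""
        holds = self.premise(params)
        if holds:
            self.action(params)
        self.status = "fired" if holds else "failed"
        return holds


# Hypothetical example: a short headway marks the situation as unsafe.
params = {
    "headway": Parameter("measurement", "headway", "short", ["short", "adequate"]),
    "situation": Parameter("assessment", "situation", "safe", ["safe", "unsafe"]),
}
short_headway = Rule(
    kind="situation-assessment",
    name="short-headway",
    tests=["headway"],
    sets=["situation"],
    premise=lambda p: p["headway"].value == "short",
    action=lambda p: p["situation"].set("unsafe"),
    description="If headway is short, the situation is unsafe.",
)
short_headway.fire(params)
```

Keeping the tested and set parameter names on each rule, and the reverse links on each parameter, is what lets an inference engine trace which rules to revisit when a value changes and produce the explanations the slide mentions.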

Size of the IGHLC Knowledge Base

Lane Change Plausibility Scores

• Plausibility function reflects likely lateral action for each vehicle
• Large negative value effectively rules out the option

$$ \pi_{ik} = \sum_{j=1}^{\text{Observation Number}} \left( \text{Score Increment}_{jk} \right)_i, \qquad i = 1, \ldots, \text{Number of Vehicles}, \quad k = \text{Left, Same, Right} $$
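The plausibility score is a running sum of score increments over observations, kept separately for each vehicle and each lateral option. The sketch below shows that accumulation for one vehicle; the function names and the numeric increments are hypothetical, and the rule for choosing the most plausible option (largest total) is an assumption for illustration.

```python
OPTIONS = ("Left", "Same", "Right")


def plausibility_scores(increments):
    """Sum per-observation score increments for each lateral option.

    `increments` is a list with one dict per observation, mapping an
    option name to its score increment for a single vehicle.
    """
    totals = {k: 0.0 for k in OPTIONS}
    for observation in increments:
        for k in OPTIONS:
            totals[k] += observation.get(k, 0.0)
    return totals


def most_plausible(totals):
    """Pick the option with the largest accumulated score (assumed rule)."""
    return max(OPTIONS, key=lambda k: totals[k])


# Hypothetical increments for one vehicle over two observations.
observations = [
    {"Left": -1.0, "Same": 2.0, "Right": 0.0},
    {"Left": -1000.0, "Same": 1.0, "Right": 1.0},  # large negative rules out Left
]
totals = plausibility_scores(observations)
choice = most_plausible(totals)
```

A single large negative increment dominates the sum, which is how one observation can effectively rule out a lane-change option.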

IGHLC (All Cars) – Low Uncertainty

[Figure: lane positions of vehicles Own, A, B, and C at t = 0 s, t = 4 s, and t = 8.4 s]

Vehicle                        Own     A       B       C
Initial Lane                   2       3       1       1
Distance, ft                   0       10      30      130
Velocity, ft/s                 90      100     70      65
Maximum Acceleration, ft/s2    10      10      10      10
Maximum Deceleration, ft/s2    10      10      10      10
Aggressiveness Factor          0.5     0.5     0.5     0.5
Security Factor                0.5     0.5     0.5     0.5
Desired Separation Time, s     2       2       2       2
Desired Velocity, ft/s         100     100     100     65
Vehicle Length, ft             13.44   13.44   13.44   13.44

Vehicle                        A       B       C
Distance Std. Dev, ft          0.5     1       3
Velocity Std. Dev, ft/s        1       1       3

Couple        Sep. Time / Desired Sep. Time   Req. Decel. / Max. Decel.   Safety
Worst B/C     0.56                            0.09                        Safe
Best Own/B    0.1                             0.92                        Emergency
Best Own/A    -0.01                           0                           Collision

IGHLC (All Cars) - High Uncertainty

[Figure: lane positions of vehicles Own, A, B, and C at t = 0 s, t = 0.2 s, and t = 2.6 s]

Vehicle                        A       B       C
Distance Std. Dev, ft          0.5     1       3
Velocity Std. Dev, ft/s        1       1       15

Couple        Sep. Time / Desired Sep. Time   Req. Decel. / Max. Decel.   Cost    Safety
Worst B/C     0.56                            0.63                        1.07    Emergency
Best Own/B    0.1                             0.92                        1.82    Emergency
Best Own/A    -0.01                           0                           0.32    Safe

Conclusions and Future Work

• Adaptation
• Vehicle characteristics
• Driver preferences
• Data transfer and communication
• Panel displays
• Cellular traffic management
• Neighboring vehicles
• Current state
• Intent
• Biological and Cognitive Models
• Cerebellar Model Articulation Controller
• Simple, adaptive control structure
• Adaptive Critic Neural Control
• Algebraic pre-training for initialization
• On-line optimization
• Failure tolerance
• Rule-Based Expert System for Control
• Declarative decision making
• Significance of probabilistic approach

Acknowledgments

• Intelligent Vehicle/Highway Systems
  – Timothy Chao, ’94
  – Alexander Maravas, ’94
• Control of a Fuel-Cell Preferential Oxidizer
  – Laura Iwan, *97
• Adaptive Critic Neural Control of an Aircraft
  – Silvia Ferrari, *02
• Intelligent Guidance for Headway and Lane Control
  – Axel Niehaus, *95

Addendum: DARPA Grand Challenge, Oct 8, 2005

• 132-mile course through the desert

• Winner: Stanford racing team

• Winning time: 6 hr, 54 min

Princeton’s 2005 Entry: Prospect Eleven

• Alternate semi-finalist
• 10th seed in National Qualifying Event
• 10/8/2005: Disabled at 9.6 miles, farther than any entrant in the 2004 Grand Challenge
• Bug in one line of code
• Post-Grand Challenge
  – Bug in code fixed
  – Successfully navigated 2005 course, with manual diversions for mud and new ruts
  – Successfully navigated 2004 course, with manual diversions for mud and new ruts
• Team of undergraduates advised by Prof. Alain Kornhauser

Princeton Autonomous Vehicle Engineering Program

The Princeton Team:
• 40 undergraduates
• 8 faculty advisors

http://pave.princeton.edu/main/team/

• DARPA Urban Challenge, 2007, stresses the difficulties of making an autonomous vehicle drive within a complex urban network, including loss of GPS coverage, intersections, lane changing, merging, and parking, while obeying traffic laws.
