Deep-Q network for truss topology optimization with stress constraints
Kazuki Hayashi (Kyoto University), Makoto Ohsaki (Kyoto University)
Ingredients of RL
• Agent
1. state 2. action 3. reward 4. environment
• The objective is to maximize the total reward
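[Figure: the agent is a neural network that maps the state (s_1, s_2, ..., s_m) to the action values Q(s, a_1), Q(s, a_2), ..., Q(s, a_n). The environment is a Python program F(s, a) = {s', r} that returns the next state s' and the reward r, where r = +1 if the constraints are met and -1 otherwise.]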
Problem is NOT always pixel-wise
CNN: available for continuous structures, NOT available for trusses and frames
→ How to apply RL to discrete structures?
Structure2Vec (Hanjun et al., 2016)
• Reinforcement learning + graph embedding
• Train for MVC and TSP with 50-100 nodes
→ Apply the trained model to 1000-1200 nodes
→ Solution comparable to that of the CPLEX solver
[Figure: two example graphs. Minimum vertex cover (MVC): minimize the number of nodes selected. Traveling salesman problem (TSP): minimize the total path length. Computation times shown: 1 hour and 12 seconds.]
Graph embedding
• Convolutional NN: Image → Feature Vector
• Graph embedding: Graph → Feature Vector
Whole graph
Node embedding, e.g., Structure2Vec (Hanjun et al., 2016)
Edge embedding: we formulate a new method for truss topology optimization
Edge embedding
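• \(\mu_e\): feature vector of edge e
\[
\mu_e^{(t+1)} \leftarrow \mathrm{relu}\left( \theta_1 w_e + \theta_2 \sum_{i=1}^{2} \mathrm{relu}\left( \sum_{u \in \aleph(v_{e,i})} \mu_u^{(t)} - \mu_e^{(t)} \right) + \theta_3 \sum_{i=1}^{2} \mathrm{relu}\left( \theta_4 x_{e,i} \right) \right)
\]
[Figure: edge e with edge feature \(w_e\), end-node features \(x_{e,1}\) and \(x_{e,2}\), and the embeddings \(\mu_u^{(t)}\) of the neighboring edges, which are combined into \(\mu_e^{(t+1)}\).]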
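The embedding update can be sketched in a few lines of plain Python. This is only a minimal sketch under an assumed reading of the update: each edge combines its own feature w_e, the embeddings of the other edges attached to each of its two end nodes, and the features x of those end nodes, with relu nonlinearities and parameters θ1 to θ4. The function name `embed_edges`, the helper names, and the zero initialization of μ are hypothetical choices for illustration, not taken from the slides.

```python
def relu(v):
    return [max(0.0, a) for a in v]

def matvec(M, v):
    return [sum(m * a for m, a in zip(row, v)) for row in M]

def vadd(*vs):
    return [sum(t) for t in zip(*vs)]

def embed_edges(edges, w, x, theta, p, n_iter):
    """edges: list of (node_a, node_b); w[e]: edge features; x[v]: node features.
    theta = (th1, th2, th3, th4) with shapes p x len(w[e]), p x p, p x p, p x len(x[v])."""
    th1, th2, th3, th4 = theta
    # Edges attached to each node (assumed meaning of the neighbor set).
    incident = {}
    for e, (a, b) in enumerate(edges):
        incident.setdefault(a, []).append(e)
        incident.setdefault(b, []).append(e)
    mu = [[0.0] * p for _ in edges]  # assumed initial embedding: all zeros
    for _ in range(n_iter):
        new_mu = []
        for e, ends in enumerate(edges):
            nbr = [0.0] * p
            node = [0.0] * p
            for v in ends:  # i = 1, 2: the two end nodes of edge e
                s = vadd(*[mu[u] for u in incident[v]])  # sum includes e itself
                s = [a - b for a, b in zip(s, mu[e])]    # so subtract mu_e
                nbr = vadd(nbr, relu(s))
                node = vadd(node, relu(matvec(th4, x[v])))
            new_mu.append(relu(vadd(matvec(th1, w[e]),
                                    matvec(th2, nbr),
                                    matvec(th3, node))))
        mu = new_mu  # all edges are updated simultaneously
    return mu

# Tiny example: two members sharing node 1, embedding size p = 1.
edges = [(0, 1), (1, 2)]
w = [[1.0, 0.0, 0.0, 0.0, 0.0], [2.0, 0.0, 0.0, 0.0, 0.0]]
x = [[0.0, 0.0, 0.0] for _ in range(3)]
theta = ([[1.0, 0.0, 0.0, 0.0, 0.0]], [[1.0]], [[0.0]], [[0.0, 0.0, 0.0]])
mu = embed_edges(edges, w, x, theta, p=1, n_iter=2)
```

After the first iteration each member only sees its own feature; after the second, information from the neighboring member has propagated through the shared node.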
Input features x and w
• Node features \(x_{e,i}\) (3 components):
1) Binary feature if node is pin-supported
2) Load in x direction
3) Load in y direction
• Edge features \(w_e\) (5 components):
1) Absolute value of cosine direction
2) Absolute value of sine direction
3) Member length
4) Binary feature if
5) Axial stress over admissible stress
• Embedded edge features \(\mu_e\) (20 components)
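The input features listed above can be assembled as plain lists. The following is a minimal sketch for a 2D truss; the function names, argument names, and the 2D coordinate convention are assumptions for illustration. The condition behind the fourth edge feature is not stated in the slide, so it is passed in as an unspecified boolean `flag`.

```python
import math

def node_features(pin_supported, load_x, load_y):
    """3 node features: support flag, load in x, load in y."""
    return [1.0 if pin_supported else 0.0, float(load_x), float(load_y)]

def edge_features(end_a, end_b, flag, axial_stress, admissible_stress):
    """5 edge features for a member between 2D points end_a and end_b.
    `flag` is the binary feature whose condition is not given in the slide."""
    dx = end_b[0] - end_a[0]
    dy = end_b[1] - end_a[1]
    length = math.hypot(dx, dy)
    return [
        abs(dx) / length,                  # |cos| of the member direction
        abs(dy) / length,                  # |sin| of the member direction
        length,                            # member length
        1.0 if flag else 0.0,              # binary feature (condition unspecified)
        axial_stress / admissible_stress,  # stress ratio
    ]

x = node_features(True, 0.0, -1.0)
w = edge_features((0.0, 0.0), (3.0, 4.0), True, 1.0, 2.0)
```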
Q-value using embedded value
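• \(Q(s, e)\): value of removing edge e in the current state s
\[
Q(s(\mu), e) = \theta_5^{\top} \, \mathrm{relu}\left[ \theta_6 \sum_{u \in V} \mu_u^{(T)},\ \theta_7 \mu_e^{(T)} \right]
\]
[Figure: the embedded features \(\mu_e^{(T)}\) of edge e and \(\mu_u^{(T)}\) of the other edges are combined into \(Q(s, e)\).]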
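As an illustration of how a Q-value could be read off the final embeddings, here is a minimal sketch in plain Python. It assumes the Q-value concatenates a pooled term (θ6 applied to the sum of all edge embeddings) with an edge-specific term (θ7 applied to the embedding of edge e), applies relu, and takes the inner product with θ5. The function name `q_values` and the toy parameter values are hypothetical.

```python
def relu(v):
    return [max(0.0, a) for a in v]

def matvec(M, v):
    return [sum(m * a for m, a in zip(row, v)) for row in M]

def q_values(mu, theta5, theta6, theta7):
    """mu: list of final edge embeddings (each of length p).
    theta5: vector of length 2p; theta6, theta7: p x p matrices."""
    p = len(mu[0])
    total = [sum(m[k] for m in mu) for k in range(p)]  # sum over all edges
    pooled = matvec(theta6, total)
    q = []
    for mu_e in mu:
        z = relu(pooled + matvec(theta7, mu_e))  # list concatenation, length 2p
        q.append(sum(t * a for t, a in zip(theta5, z)))
    return q

# Toy values: identity matrices and a vector of ones as parameters.
identity = [[1.0, 0.0], [0.0, 1.0]]
mu = [[1.0, 2.0], [3.0, 4.0]]
q = q_values(mu, [1.0, 1.0, 1.0, 1.0], identity, identity)  # [13.0, 17.0]
best_edge = max(range(len(q)), key=q.__getitem__)           # 1
```

The edge with the largest Q-value is the candidate for removal in the current state.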
Q-learning
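• Q-learning (Watkins, 1989)
\[
Q(\mathbf{s}, a) \leftarrow Q(\mathbf{s}, a) + \alpha \left( r + \gamma \max_{a'} Q(\mathbf{s}', a') - Q(\mathbf{s}, a) \right)
\]
\(Q(\mathbf{s}, a)\): estimated total reward at the current state
\(r + \gamma \max_{a'} Q(\mathbf{s}', a')\): observed reward + estimated total reward at the next state
→
\[
\text{minimize} \quad F(\theta_1, \dots, \theta_7) = \left( r + \gamma \max_{a'} Q(\mathbf{s}', a') - Q(\mathbf{s}, a) \right)^2
\]
s(μ): state, expressed by edge embedding
a: remove the edge e with \(\max_e Q(\mathbf{s}, a = e)\)
r: +1 (if the constraints are satisfied) or -1 (otherwise)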
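The update rule and the squared temporal-difference loss can be checked on a small tabular example. This is a minimal sketch, not the network training used in the talk: Q is stored in a dictionary instead of being computed from embeddings, and the names `td_target`, `q_update`, and `td_loss` are hypothetical.

```python
def td_target(Q, r, s_next, next_actions, gamma):
    """r + gamma * max_a' Q(s', a'); the max is 0 when no action remains."""
    best_next = max((Q.get((s_next, a), 0.0) for a in next_actions), default=0.0)
    return r + gamma * best_next

def q_update(Q, s, a, r, s_next, next_actions, alpha, gamma):
    """Tabular Q-learning step: move Q(s, a) toward the TD target."""
    old = Q.get((s, a), 0.0)
    Q[(s, a)] = old + alpha * (td_target(Q, r, s_next, next_actions, gamma) - old)
    return Q[(s, a)]

def td_loss(Q, s, a, r, s_next, next_actions, gamma):
    """Squared TD error, the quantity minimized over the parameters."""
    return (td_target(Q, r, s_next, next_actions, gamma) - Q.get((s, a), 0.0)) ** 2

Q = {}
first = q_update(Q, "s0", "e1", 1, "s1", [], alpha=0.5, gamma=0.9)
Q[("s1", "e2")] = 1.0
second = q_update(Q, "s0", "e1", 1, "s1", ["e2"], alpha=0.5, gamma=0.9)
loss = td_loss(Q, "s0", "e1", 1, "s1", ["e2"], gamma=0.9)
```

In the deep-Q setting the table lookup is replaced by the parameterized Q-function, and the parameters are adjusted to reduce the same squared error.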
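Training model
\[
\begin{aligned}
&\text{maximize} && \text{number of edges eliminated} \\
&\text{subject to} && \sigma_i \le \bar{\sigma} = 2.0 \quad (i = 1, \dots, m) \\
& && u_j \le \bar{u} = 16.0 \quad (j = 1, \dots, n_p)
\end{aligned}
\]
• Continue removing edges until the constraints are violated
• Assign a very small cross-section
• Choose one load randomly
[Figure: truss model used for training.]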
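One episode of the removal process can be sketched as a loop that keeps removing the highest-valued edge until the constraints are violated. This is a minimal sketch under stated assumptions: `q_value` and `satisfies_constraints` are hypothetical callbacks standing in for the trained Q-function and the structural analysis with the stress and displacement checks, and the purely greedy choice ignores any exploration used during training.

```python
def run_episode(n_edges, q_value, satisfies_constraints):
    """Remove edges greedily by Q-value until a removal violates the constraints.
    Returns the list of feasibly removed edges and the total reward."""
    removed = []
    remaining = list(range(n_edges))
    total_reward = 0
    while remaining:
        # Pick the edge with the largest Q-value in the current state.
        e = max(remaining, key=lambda c: q_value(removed, c))
        trial = removed + [e]
        if satisfies_constraints(trial):
            removed = trial
            remaining.remove(e)
            total_reward += 1   # reward +1: constraints still satisfied
        else:
            total_reward -= 1   # reward -1: constraints violated, episode ends
            break
    return removed, total_reward

# Toy stand-ins: prefer high edge indices; removing edge 0 is infeasible.
removed, total = run_episode(
    4,
    q_value=lambda removed_so_far, e: float(e),
    satisfies_constraints=lambda trial: 0 not in trial,
)
```

In this toy run the edges 3, 2, and 1 are removed with reward +1 each, and the attempt to remove edge 0 ends the episode with reward -1.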
Training result
Agent learned how to remove unnecessary members
[Figure: validation model and best-scored topology; plot of the validation score (= total rewards) against the number of training episodes.]
Applicability to another model
s.t. \(\bar{\sigma} = 2.0,\ \bar{u} = 16.0\)
Trained agent can be used without re-training
Applicability to another model
s.t. \(\bar{\sigma} = 4.0,\ \bar{u} = 50.0\)
Conclusion
• A graph embedding method is introduced to express the features of truss members
• Q-learning is conducted based on the embedded features
• The trained agent is applicable to any topology and geometry of trusses