
Soft Computing (2020) 24:12227–12239
https://doi.org/10.1007/s00500-019-04663-3

METHODOLOGIES AND APPLICATION

An artificial neural network model for the unary description of pure substances and its application on the thermodynamic modelling of pure iron

Maximilian Länge1

Published online: 22 January 2020
© The Author(s) 2020

Abstract
The aim of this work is to introduce a novel approach for the universal description of the thermodynamic functions of pure substances on the basis of artificial neural networks. The proposed approximation method is able to describe the thermodynamic functions (Cp(T), S(T), H(T) − H(Tref), G(T)) of the different phases of unary material systems in a wide temperature range (between 0 and 6000 K). Phase transition temperatures and the respective enthalpies of transformation, which are computationally determined by the minimization of the Gibbs free energy, are also approximated. This is achieved by using artificial neural networks as models for the thermodynamic functions of the individual phases and by expressing the thermodynamic quantities in terms of the free network parameters. The resulting expressions are then optimized with machine learning algorithms on the basis of measurement data. A physical basis for the resulting approximation is given by the use of, among others, Planck–Einstein functions as activation function of the neurons of the network. This article provides a description of the method and, as an example of a specific application, the approximation of the thermodynamic functions of the different phases of pure iron. The article focuses on the problem of the representation of thermodynamic data and their practical application.

Keywords Neural networks · Function approximation · Derivative approximation · Computational thermodynamics

1 Introduction

The approximation of thermodynamic functions is an important task in material science and engineering. A consistent description of thermodynamic data helps to understand, improve and develop materials. With methods and formalisms from the field of computational thermodynamics, e.g. the CALPHAD method (Lukas et al. 2007; Saunders and Miodownik 1998), the numerical calculation of stable phases, phase equilibria, phase transitions, whole phase diagrams and phase diagram-related data is possible. The Gibbs energy G is the central quantity of such calculations, and an exact description of G in terms of the temperature T and

Communicated by V. Loia.

Maximilian Länge
[email protected]

1 Institute for Applied Thermo- and Fluiddynamics, Mannheim University of Applied Sciences, Mannheim, Germany

the system's composition is the key for a reliable description of material systems and their properties. G and other thermodynamic quantities are related to one another, and the different quantities of interest can be expressed as sets of partial derivatives of G. Expressions for G are usually derived by approximating the temperature-dependent isobaric heat capacity Cp based on suitable models (Dinsdale 1991; Chase et al. 1995; Chen and Sundman 2001; Roslyakova et al. 2016), where the free model parameters are optimized to fit measurement data. G is then calculated by a subsequent integration of the optimized model for Cp.

Artificial neural networks (ANN) can be used for function regression. With the property of being universal function approximators, as described in Hornik (1991), ANNs have the ability to map a set of independent input variables to a set of dependent output variables and can thus detect and model any linear or nonlinear relationship between these. In recent years ANNs have been used to solve tasks in science and engineering in many fields, among them material science. ANNs have been used extensively to model physical properties of materials (Hemmat 2017; Hemmat Esfe


et al. 2016). The latest work of Avrutskiy (2017) delivers the framework for approximating functions and their derivatives simultaneously, and it can therefore be used to solve this specific task. In the present work the question is therefore investigated whether thermodynamic functions can be modelled on the basis of ANNs. To answer this question a neural network model for the approximation of thermodynamic functions is introduced. The thermodynamic functions of iron between 0 and 6000 K are approximated and validated as a challenging example.

2 Methods

2.1 Modelling of thermodynamic functions

Unary systems consist of only one compound. The thermodynamic state variables therefore depend only on the pressure p and the temperature T. The Gibbs energy G(T, p) is the central quantity when calculating phase diagrams or phase diagram-related data. It is given in Eq. (2.1) by

G = H − T S. (2.1)

A relationship between G and the entropy S can be established by differentiation of G w.r.t. T at constant pressure, as given in Eq. (2.2) by

S = −(∂G/∂T)|p. (2.2)

A relationship between G and the enthalpy H can also be derived and is known as the Gibbs–Helmholtz equation, as given in Eq. (2.3) by

H = −T² · ∂(G/T)/∂T |p. (2.3)

The isobaric heat capacity Cp, in the following denoted as C, can also be derived from the Gibbs energy and is given in Eq. (2.4) by

C = −T · (∂²G/∂T²)|p. (2.4)

Equations (2.1)–(2.4) can be used to characterize a material system. In the present work this resulting set of partial derivatives is modelled and solved on the basis of artificial neural networks.
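The relations between G, S, H and C in Eqs. (2.1)–(2.4) can be verified symbolically. The following sketch uses a toy Gibbs energy expression (the coefficients a, b, c are illustrative placeholders, not part of the paper's model) and checks that Eqs. (2.2)–(2.4) reproduce Eq. (2.1) and the identity C = dH/dT:

```python
import sympy as sp

T = sp.symbols("T", positive=True)
a, b, c = sp.symbols("a b c")      # toy coefficients, purely illustrative
G = a + b * T + c * T * sp.log(T)  # a hypothetical Gibbs energy G(T)

S = -sp.diff(G, T)                 # Eq. (2.2): S = -(dG/dT)|p
H = -T**2 * sp.diff(G / T, T)      # Eq. (2.3): Gibbs-Helmholtz
C = -T * sp.diff(G, T, 2)          # Eq. (2.4): C = -T (d2G/dT2)|p

# Consistency checks: G = H - T*S (Eq. 2.1) and C = dH/dT
assert sp.simplify(G - (H - T * S)) == 0
assert sp.simplify(C - sp.diff(H, T)) == 0
```

Any differentiable G(T) passes these checks, which is exactly why expressing only G via a network fixes all derived quantities at once.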

G(T, p) as a thermodynamic potential can be used to calculate the stable phase for a chosen state, phase equilibria and

whole phase diagrams. A detailed description of the underlying physical principles is beyond the scope of this work.

2.2 Neural networks for the approximation of functions and their derivatives

Artificial neural networks (ANNs) consist of a large number of elementary processing units, the so-called neurons. The different neurons are interconnected, and the connections are weighted individually. Neural networks can be trained to a desired behaviour by trial-and-error procedures based on training data. The purpose of the training is to adjust the individual weights of the network so that the network exhibits the desired behaviour. The neural network model presented in this work is a feedforward network and is trained with the resilient propagation (rProp) learning algorithm. Expressions for the derivatives of the network's output w.r.t. its inputs, the calculation of the different thermodynamic quantities, as well as the training procedure itself will be discussed in the following sections.

2.2.1 Notation

The proposed model can be used with an arbitrary number of layers but will be fixed to three, consisting of one input layer (1), one hidden layer (2) and one output layer (3). Neurons of the different layers of a network will be denoted by the Greek letters α for layer 1 and β for layer 2. The general structure of an arbitrary neuron β is shown in Fig. 1. The number of neurons in the output layer will be fixed to 1. Network parameters are numbered according to their multiple use. The weight matrices connecting the neurons of different layers will in this sense be referred to as W21,βα, which connects a neuron α from layer 1 with a neuron β from layer 2, and W32,1β, respectively. Vector-valued network parameters apply only to one layer and are addressed by the number of the layer and one Greek letter for the neuron, like b2,β. The value a neuron receives before applying its activation function will be denoted as net input s. The net input of each layer is organized as a vector st2,β, where the superscript t runs through the number of training examples. The absolute number of iterable objects will be denoted by bars, like |γ| or |t|. The activation function of the input and output layer is linear. The activation function of the hidden layer will be denoted as f for the general description and will be specified later. The activation functions are applied to every neuron of a layer. Expressions for the neural network representations of the different thermodynamic quantities are in the following denoted by the subscript N. The overall input pattern is denoted as xtα and the overall output pattern as yt.


2.2.2 Representation of thermodynamic functions with neural network variables

Feedforward networks are suitable networks for solving regression tasks (Hornik 1991). Every layer is fully connected with its adjacent layer, and the information flows monodirectionally from the input through the hidden layer to the output of the network. In the following section the net input of an arbitrary layer, e.g. st2,β, is defined as the weighted sum of all input values and an additional threshold value b2,β, as given in Eq. (2.5) by

st2,β = ∑α=1…|α| W21,βα · xtα + b2,β (2.5)

Using this definition, the overall output of the network yt as a function of its input xt for any input pattern t is calculated as given in Eq. (2.6) by

yt(x) = ∑β=1…|β| W32,1β · f(∑α=1…|α| W21,βα · xtα + b2,β) + b3,1 = ∑β=1…|β| W32,1β · f(st2,β) + b3,1. (2.6)

Expressions for the first and the second derivatives of yt w.r.t. xt are derived by application of the chain rule, as given in Eqs. (2.7) and (2.8) by

∂yt/∂xα = ∑β=1…|β| W32,1β · f′(st2,β) · W21,βα (2.7)

and, respectively,

∂²yt/(∂xα ∂xα̃) = ∑β=1…|β| W32,1β · f″(st2,β) · W21,βα · W21,βα̃. (2.8)

Using Eqs. (2.6)–(2.8) one can express the Gibbs energy and the related quantities given by Eqs. (2.1)–(2.4) solely by neural network variables. In the proposed model, the input to the network is directly given by the temperature T, and the output of the network yt is directly connected to the Gibbs energy G. Therefore, the neural network representation GtN of the Gibbs energy Gt at a given temperature T is calculated as given in Eq. (2.9) by

GtN = yt = ∑β=1…|β| W32,1β · f(st2,β) + b3,1 (2.9)

The entropy as defined in Eq. (2.2) is calculated as the first derivative of G w.r.t. T. The neural network representation of the entropy SN is therefore given as in Eq. (2.10) by

StN = −dyt/dxt = −∑β=1…|β| W32,1β · f′(st2,β) · W21,β1 (2.10)

The neural network representation of the enthalpy from Eq. (2.3) is given as in Eq. (2.11) by

HtN = GtN + xt · StN = ∑β=1…|β| W32,1β · f(st2,β) + b3,1 − xt · ∑β=1…|β| W32,1β · f′(st2,β) · W21,β1 (2.11)

And finally, the neural network representation of the isobaric heat capacity from Eq. (2.4) is given as in Eq. (2.12) by

CtN = −xt · d²yt/d(xt)² = −xt · ∑β=1…|β| W32,1β · f″(st2,β) · (W21,β1)² (2.12)
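Equations (2.9)–(2.12) can be sketched numerically for a one-input network. The sketch below uses tanh as a stand-in activation (the paper uses Eqs. (2.13)/(2.14)) and random illustrative weights; the closed-form expressions for S and C are checked against finite differences of G:

```python
import numpy as np

rng = np.random.default_rng(0)
n_hidden = 4                          # illustrative hidden-layer size
W21 = rng.normal(size=n_hidden)       # weights W21,β1 (input -> hidden)
b2 = rng.normal(size=n_hidden)        # thresholds b2,β
W32 = rng.normal(size=n_hidden)       # weights W32,1β (hidden -> output)
b3 = rng.normal()                     # threshold b3,1

f = np.tanh                                                   # stand-in activation
fp = lambda s: 1.0 - np.tanh(s) ** 2                          # f'
fpp = lambda s: -2.0 * np.tanh(s) * (1.0 - np.tanh(s) ** 2)   # f''

def thermo(x):
    """Eqs. (2.9)-(2.12), written solely in network parameters."""
    s = W21 * x + b2                        # net input, Eq. (2.5)
    G = W32 @ f(s) + b3                     # Eq. (2.9)
    S = -(W32 * fp(s)) @ W21                # Eq. (2.10)
    H = G + x * S                           # Eq. (2.11)
    C = -x * ((W32 * fpp(s)) @ W21 ** 2)    # Eq. (2.12)
    return G, S, H, C

# Finite-difference consistency checks at an arbitrary input
x, h = 1.5, 1e-4
G0, S0, H0, C0 = thermo(x)
Gp, Gm = thermo(x + h)[0], thermo(x - h)[0]
assert abs(S0 + (Gp - Gm) / (2 * h)) < 1e-6               # S = -dG/dx
assert abs(C0 + x * (Gp - 2 * G0 + Gm) / h ** 2) < 1e-4   # C = -x d2G/dx2
```

Because all four expressions share the same parameters, training any one of them (e.g. only C data) constrains the others, which is the key point of the following paragraph.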

Equations (2.9)–(2.12) now consist solely of neural network parameters and at the same time reflect the physical relations of the different thermodynamic quantities. As described in the work of Avrutskiy (2017), the expressions for the derivatives of a neural network's outputs w.r.t. its inputs can be considered as standalone neural networks and could be trained individually. Due to the fact that each of the expressions above is formed from the same network parameters, the calculation of GN, SN and HN is also possible when the training data only contain values for C. This results in two major advantages: firstly, all available measurement data can be used for the modelling process and, secondly, the


Fig. 1 Structure of an arbitrary artificial neuron β


self-consistency of the resulting network for the different thermodynamic quantities is always fulfilled.

Activation function

Activation functions within neural network regression predefine the shape of the function approximation. In general, any shape can be approximated by a neural network using the standard sigmoid activation function and a sufficient number of layers and neurons, as proven in Hornik (1991). A major disadvantage is the extrapolation ability of a neural network. The only information about the function to be approximated is provided by the training data. Without further assumptions and constraints, a reliable prediction of function values outside the borders of the training data is not possible. When approximating thermodynamic functions, and especially when calculating stable phases, phase transitions or phase equilibria, a model must deliver reasonable values for the thermodynamic functions of the different phases even outside their stable temperature regimes. The extrapolation abilities of the proposed model are provided by the use of two different activation functions fa and fb, as also shown in Fig. 2:

fa(s) = E0 + (3/2) · R · θE + 3 · R · s · log(1 − e^(−θE/s)) − (1/2) · a · s² − (1/6) · b · s³ (2.13)

fb(s) = log(e^s + 1) (2.14)

The expression in Eq. (2.13) is derived by integration of the Chen and Sundman model for the isobaric heat capacity, where each term describes a different physical contribution. For a detailed description of this model the author refers to Chen and Sundman (2001). By using Eq. (2.13) as an activation function, a physical basis for the approximation of thermodynamic functions is provided which describes the general shape of the approximated thermodynamic functions. In Eq. (2.13) R stands for the universal gas constant. The parameters E0, θE, a and b are treated as additional network parameters and are optimized during the learning process. In many material systems, such as iron, which will be approximated in this work, effects such as magnetic ordering can occur locally and affect the thermodynamic functions. These local deviations are approximated by a second network with its activation function fb as given in Eq. (2.14). fb is based on the so-called softplus activation function (Nair and Hinton 2010). The reason to use Eq. (2.14) lies in the bell shape of the second derivative of fb w.r.t. s, which allows local and peak-shaped effects in the heat capacity curve to be modelled.
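The two activation functions can be sketched directly from Eqs. (2.13) and (2.14). The parameter values passed to f_a below are invented placeholders (the real ones are learned); the sketch also verifies numerically that the second derivative of the softplus is the bell-shaped sigmoid'(s):

```python
import numpy as np

R = 8.314462618  # universal gas constant in J/(mol K)

def f_a(s, E0, thetaE, a, b):
    """Eq. (2.13): integrated Chen-Sundman heat-capacity model.
    E0, thetaE, a and b are the additional trainable parameters."""
    return (E0 + 1.5 * R * thetaE
            + 3.0 * R * s * np.log(1.0 - np.exp(-thetaE / s))
            - 0.5 * a * s ** 2 - b * s ** 3 / 6.0)

def f_b(s):
    """Eq. (2.14): the softplus function."""
    return np.log(np.exp(s) + 1.0)

def f_b_pp(s):
    """Second derivative of softplus: sigmoid'(s), a bell peaking at s = 0."""
    sig = 1.0 / (1.0 + np.exp(-s))
    return sig * (1.0 - sig)

# Check f_b'' against a second-order central difference
s = 0.3
fd = (f_b(s + 1e-5) - 2 * f_b(s) + f_b(s - 1e-5)) / 1e-10
assert abs(fd - f_b_pp(s)) < 1e-4
```

Since C is built from f″ (Eq. 2.12), each hidden neuron of the second subnetwork contributes one shifted, scaled bell to the heat capacity curve, which is what captures the magnetic peak discussed later.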

Training neural networks and derivatives with resilient propagation

The training data consist of sets of pairs {(xt, Gt)} defining a desired network output Gt at given xt. When dealing with partial derivatives, additional data need to be provided for every unknown quantity of the system. The sets {(xu, Su)}, {(xv, Hv)} and {(xw, Cw)} provide these additional training data for each of the derived thermodynamic quantities approximated in this work. The different indices t, u, v and w reflect the fact that the absolute numbers of training examples and the locations xt, xu, xv and xw do not necessarily need to be the same. To measure the error between the resulting output from the network and the desired output defined by the training data, a cost function E = e(pα) with pα = [Wγβ, Wδγ, bγ, …]T that depends on all free network parameters is defined. In the present work the least squares sum is chosen as cost function, as given in Eq. (2.15) by

Fig. 2 Normalized prototype functions for the different thermodynamic quantities for the ANN approximation: a ga(x) = fa(sa(x)) with sa(x) = x and b gb(x) = fb(sb(x)) with sb(x) = x − 5. Each panel shows g(x), −∂g/∂x, g(x) − x ∂g/∂x and −x ∂²g/∂x²



Fig. 3 Structure of the ANN model for the approximation of thermodynamic functions

E = (qG/|t|) ∑t EtG + (qS/|u|) ∑u EuS + (qH/|v|) ∑v EvH + (qC/|w|) ∑w EwC
= (qG/|t|) ∑t (GtN(xt) − Gt)² + (qS/|u|) ∑u (SuN(xu) − Su)² + (qH/|v|) ∑v (HvN(xv) − Hv)² + (qC/|w|) ∑w (CwN(xw) − Cw)² (2.15)

The main goal of the training process is now to find an optimal set of parameters so that the error defined in Eq. (2.15) reaches a minimum. By repeated application of the chain rule, ∂E/∂pα can be calculated. A step in the opposite gradient direction is then performed to move towards the minimum of the error surface. The factors qG, qS, qH and qC are introduced to equalize the contributions of the different thermodynamic quantities, which differ in orders of magnitude, to the error expression.
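The structure of Eq. (2.15) can be captured in a few lines. The sketch below is a generic weighted least-squares sum over per-quantity residuals; the dictionary keys and weight values are illustrative, not the paper's actual factors:

```python
import numpy as np

def system_error(preds, targets, q):
    """Eq. (2.15): weighted least-squares sum over the quantities G, S, H, C.

    preds and targets map a quantity name to equal-length arrays of network
    outputs and measurements; q holds the equalizing factors qG, qS, qH, qC.
    """
    E = 0.0
    for key, p in preds.items():
        r = np.asarray(p) - np.asarray(targets[key])
        E += q[key] / r.size * np.sum(r ** 2)   # q / |t| * sum of squared residuals
    return E

# Toy example with invented numbers: two C measurements, weight qC = 2
E = system_error({"C": [1.0, 2.0]}, {"C": [0.0, 0.0]}, {"C": 2.0})
assert abs(E - 5.0) < 1e-12   # 2/2 * (1**2 + 2**2)
```

Missing quantities simply contribute no term, which mirrors the point made earlier that training can proceed from C data alone.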

Usually the magnitude and the sign of the different partial derivatives are used for the optimization of the network parameters. This can be a problem when the error surface described by E is too rugged and nonlinear. Under such conditions the magnitude of ∂E/∂pα can vary by orders of magnitude from one iteration to the next, resulting in an unstable learning process and even in the failing


Fig. 4 a The neural network representation of the isobaric heat capacity of FeBCC and the underlying training data, b relative differences ΔC/C = (CN − C)/CN of the training data from the neural network approximation for FeBCC

of the optimization routine. The resilient propagation algorithm (rProp) (Riedmiller and Braun 1993) was developed to become independent of the magnitude of ∂E/∂pα. rProp adapts local step sizes based only on the sign of ∂E/∂pα in the current iteration k and the previous iteration k − 1.

The rProp learning algorithm introduces an individual update value ηα for each of the pα network parameters. The individual ηα are changed during the learning process, and their



Fig. 5 a The neural network representation of the enthalpy of FeBCC and FeFCC together with their underlying training data for 300 K < T < 1600 K, b relative differences ΔH/H = (HN − H)/HN of the training data from the neural network approximation for FeBCC and FeFCC

evolution is influenced only by the sign of ∂E/∂pα. According to Riedmiller and Braun (1993), the change of the different ηα is given by


Fig. 6 a The neural network representation of the isobaric heat capacity of FeFCC and the underlying training data, b relative differences ΔC/C = (CN − C)/CN of the training data from the neural network approximation for FeFCC

ηαk = η+ · ηαk−1, if (∂E/∂pα)k−1 · (∂E/∂pα)k > 0
ηαk = η− · ηαk−1, if (∂E/∂pα)k−1 · (∂E/∂pα)k < 0
ηαk = ηαk−1, else. (2.16)



Fig. 7 a The neural network representation of the enthalpy of FeBCC and FeLIQ together with their underlying training data for 1600 K < T < 2200 K, b relative differences ΔH/H = (HN − H)/HN of the training data from the neural network approximation for FeBCC and FeLIQ

After updating the step sizes, the individual pα are updated as given in Eq. (2.17) by

Δpαk = −ηαk, if (∂E/∂pα)k > 0
Δpαk = +ηαk, if (∂E/∂pα)k < 0
Δpαk = 0, else. (2.17)

As in regular gradient descent optimization procedures, the update is made in the opposite gradient direction. The updates are applied as given in Eq. (2.18) by

pαk = pαk−1 + Δpαk (2.18)

Learning algorithms can be used in online or batch mode. During online learning only one training example per iteration is used at a time for updating the free network parameters, while batch learning uses a mean error over more than one training example per iteration. One cycle through all the available training data is called an epoch. It is worth mentioning that the rProp algorithm only works in batch mode and with a large batch size.
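Equations (2.16)–(2.18) can be sketched as a single array update. The sketch below implements the variant without weight backtracking, as the update rules above describe; the values η+ = 1.2 and η− = 0.5 are the standard choices from Riedmiller and Braun (1993):

```python
import numpy as np

def rprop_step(p, grad, prev_grad, eta,
               eta_plus=1.2, eta_minus=0.5, eta_min=1e-6, eta_max=50.0):
    """One rProp iteration per Eqs. (2.16)-(2.18).

    p, grad, prev_grad and eta are arrays of equal shape; eta holds the
    per-parameter update values eta_alpha."""
    sign_change = grad * prev_grad
    # Eq. (2.16): grow eta on consistent gradient signs, shrink on a flip
    eta = np.where(sign_change > 0, np.minimum(eta * eta_plus, eta_max),
          np.where(sign_change < 0, np.maximum(eta * eta_minus, eta_min), eta))
    # Eq. (2.17): step against the gradient sign, magnitude given by eta only
    dp = np.where(grad > 0, -eta, np.where(grad < 0, eta, 0.0))
    return p + dp, eta  # Eq. (2.18)

# Minimizing E(p) = p^2 (gradient 2p) from p = 3
p, eta, prev = np.array([3.0]), np.array([0.5]), np.array([0.0])
for _ in range(60):
    g = 2.0 * p
    p, eta = rprop_step(p, g, prev, eta)
    prev = g
assert abs(p[0]) < 0.05
```

Note that only the sign of the gradient enters the step, which is what makes the method robust on the rugged error surfaces discussed above.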

3 Results

3.1 A neural network for the approximation of thermodynamic functions

The proposed model for the approximation of thermodynamic functions consists of two interconnected subnetworks, as shown in Fig. 3. The first subnetwork consists of a 1−1−1 ANN and has fa (Eq. 2.13) as activation function for the hidden neurons. As described in the previous section, this first subnetwork provides a base level for the approximation. The second subnetwork has a 1−N−1 structure and uses fb (Eq. 2.14) as activation function for the hidden neurons. The number of hidden neurons depends on the case.

The proposed method is implemented in Python and uses Theano (Theano Development Team 2016) as tensor library, as well as Theano's built-in automatic differentiation for the gradients needed for the calculation of the derived thermodynamic quantities and during the learning procedure. rProp is used in batch mode with a batch size equal to the number of available measurement data. The different phases of a chemical element are each approximated with a separate neural network. The optimization of the network parameters is carried out at the same time for all phases of the system. This is achieved by formulating the error for every phase separately and adding up the different phase-wise error expressions to an overall system error. The reason for this approach lies in the calculation of thermodynamic quantities like the Gibbs energy Ga→b(Ttr) or the enthalpy change ΔHa→b(Ttr) at the transition from an arbitrary phase a to a phase b at the transition temperature Ttr. ΔHa→b(Ttr) = Hb(Ttr) − Ha(Ttr), for example, depends on the values of Ha(Ttr) and Hb(Ttr). The optimization of the neural network representation is based on measurements for Ha(Ttr), Hb(Ttr) and ΔHa→b(Ttr), and it is more than likely that measurements from different sources are not fully consistent. The optimization has the goal to minimize the error



Fig. 8 The thermodynamic functions of FeGAS between 3000 K < T < 6000 K together with the training data from Chase (1998): a G(T), b H(T), c S(T), d C(T)

between the network output and its underlying training data, leading to an error E > 0 even for a fully optimized network. Optimizing Ha(Ttr) in a first step and Hb(Ttr) and ΔHa→b(Ttr) in a second step would distribute the error E > 0 due to inconsistencies of the measurement data only onto the second phase and would lead to a reduced quality of its approximation. By optimizing all the different phases of a system simultaneously on the basis of an overall system error, the error due to inconsistencies of the measurement data is distributed evenly among the different phases of the system.

3.2 Approximation of the thermodynamic functions of pure iron

The thermodynamic functions of Fe were considered to evaluate the performance of the proposed neural network model. The thermodynamic functions of each phase of Fe are approximated for the temperature range 0 < T < 6000 K in its stable and metastable regime. Between 0 and 1184 K Fe has a BCC crystal structure and is denoted α-Fe in the literature. At 1184 K there is a phase transition FeBCC → FeFCC. The FCC phase is denoted γ-Fe and is

123

Page 9: Anartificialneuralnetworkmodelfortheunarydescriptionofpure ...The aim of this work is to introduce a novel approach for the universal description of the thermodynamic functions of

An artificial neural network model for the unary description of pure substances and its… 12235

stable between 1184 and 1665K. At 1665K a second phasetransition occurs in the solid state FeFCC → FeBCC. The sec-ond BCC phase is referred to as δ-Fe, but since the crystalstructure is the same as for α − Fe, the two BCC phases aremodelled by a single ANN. Fe has its melting point at 1809Kand the transition to the gaseous phase at 3134K.The approx-imation of the thermodynamic functions of Fe is challengingdue to the strong magnetic peak in the Cp curve of BCC ironat 1042K and to the four phase transitions between 0 and6000K. There exist reviews of the available measurementsfor Fe, among which Desai (1986) and Chen and Sundman(2001) are used in this work to gather the training data for theneural network model. The used training data, if not men-tioned explicitly, consist solely of raw measurement data.Published values from other already optimized models arenot taken as training data but are used to compare the obtainedresults to. The calculation of phase transitions is based on abisection method which is implemented in python.
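The article states that the phase transitions are computed with a bisection method implemented in Python; a minimal sketch of that idea, assuming toy linear Gibbs energy functions in place of the ANN output:

```python
def transition_temperature(G_a, G_b, T_low, T_high, tol=1e-3):
    """Find T where G_a(T) = G_b(T) by bisection.

    The phase with the lower Gibbs free energy is stable, so the transition
    temperature is the root of dG(T) = G_a(T) - G_b(T). Assumes dG changes
    sign exactly once in [T_low, T_high].
    """
    dG = lambda T: G_a(T) - G_b(T)
    lo, hi = T_low, T_high
    if dG(lo) * dG(hi) > 0:
        raise ValueError("bracket does not contain a sign change")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if dG(lo) * dG(mid) <= 0:   # root lies in the lower half
            hi = mid
        else:                       # root lies in the upper half
            lo = mid
    return 0.5 * (lo + hi)

# Toy linear Gibbs energies that cross at exactly 1184 K (illustrative only)
G_bcc = lambda T: -50.0 - 0.040 * T
G_fcc = lambda T: -45.264 - 0.044 * T
print(round(transition_temperature(G_bcc, G_fcc, 1000.0, 1400.0), 1))  # → 1184.0
```

In the paper the two Gibbs energy curves come from the trained networks of the respective phases, but the bracketing logic is the same.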

3.2.1 BCC

Figure 4a shows the ANN representation of the specific heat CN of FeBCC iron together with the training data used for the approximation and the results from the FactSage 7.0 FactPS database (Bale et al. 2016). Figure 4b shows the respective relative differences between CN and the training data. The calculated thermodynamic function is in good agreement with the measurement data over wide ranges. The overall standard deviation of the neural network approximation from the training data is σBCC = 2.9 J/(mol K). Between 500 and 700 K the calculated curve tends to predict slightly lower heat capacity values than the pure training data would suggest. This behaviour can be explained by the nature of the neural network regression itself: the error expression combines the errors of different quantities of the different phases. The overall system error has a minimum, but this minimum is the best compromise between the individual errors the system error consists of.

The calculated heat capacity function from this work is in good agreement with the curve from the FactSage 7.0 FactPS database (Bale et al. 2016). Both approximations show deviations from the measurement data between 500 and 700 K. The values near the magnetic peak at 1042 K are better represented by the neural network than by the FactSage 7.0 FactPS database (Bale et al. 2016). Furthermore, the FactPS database representation has a jump at the transition between FeBCC and FeLIQ at 1809 K. This jump is avoided in the results obtained in this work, and the transition into the metastable regime is smooth and continuous. In addition, Fig. 5a shows the results for the ANN representation of the enthalpy HN together with the training data and the representation from the FactSage 7.0 FactPS database (Bale et al. 2016). Figure 5b shows the relative difference between HN and the training data. The calculated values of HN are in good agreement with the measurement data and with the results from Bale et al. (2016). Another interesting aspect is the model's ability to approximate the different thermodynamic quantities, especially CN, between 0 and 298.15 K, which is a problem for the polynomial-based models as reported in Roslyakova et al. (2016).

3.2.2 FCC

Figure 6a shows the ANN representation of the specific heat CN of FeFCC iron together with the underlying training data and the results obtained from the FactSage 7.0 FactPS database (Bale et al. 2016). The relative differences between CN and the training data for FeFCC iron are shown in Fig. 6b. The available measurement data are scattered; for example, the difference between Cp(1600 K) from Bendick and Pepperhoff (1982) and Cp(1600 K) from Rogez and Le Coze (1980) is ΔCp = 11.44 J/(mol K). The neural network approximation is in good agreement with the measurement data in view of this strong scattering. The overall standard deviation of the neural network approximation from the training data is σFCC = 2.1 J/(mol K). The slope of the neural network approximation of the heat capacity in its stable regime between 1184 and 1665 K is lower than in the representation from the FactSage 7.0 FactPS database (Bale et al. 2016). As in the FeBCC phase, the FactPS database approximation has a jump at the melting temperature. This

Table 1 Comparison between calculated and experimental temperatures of phase transformations for iron

Transition | Ttr in K | Source

FeBCC→FCC 1184 Bendick and Pepperhoff (1982)

1184 Tsuchiya et al. (1971)

1184 Anderson and Hultgren (1962)

1184 Vollmer et al. (1966)

1184.08 This work

FeFCC→BCC 1665 Bendick and Pepperhoff (1982)

1665 Tsuchiya et al. (1971)

1665 Anderson and Hultgren (1962)

1665 Vollmer et al. (1966)

1665.09 This work

FeBCC→LIQ 1811 Desai (1986)

1809 Anderson and Hultgren (1962)

1809 Vollmer et al. (1966)

1808.99 This work

FeLIQ→GAS 3134 Haynes (2015)

3134.0 This work


Table 2 Comparison between calculated and experimental phase transition enthalpies for iron

Source | ΔHBCC→FCC in J/mol (T = 1183.95 K) | ΔHFCC→BCC in J/mol (T = 1664.95 K) | ΔHBCC→LIQ in J/mol (T = 1808.99 K)

Smith (1946) 900 – –

Wallace et al. (1960) 911(±80) – –

Anderson and Hultgren (1962) 941(±80) 1088(±210) –

Dench and Kubaschewski (1963) 900 837 –

Braun and Kohlhaas (1965) 910(±20) 850(±30) 14,400(±300)

Morris et al. (1966) – – 13,799(±418)

Vollmer et al. (1966) – – 14,400(±200)

Treverton and Margrave (1971) – – 13,836(±293)

Normanton et al. (2013) 923(±10) 765(±22) –

Cezairliyan and McClure (1974) – 890 –

Rogez and Le Coze (1980) 900(±20) – –

Schürmann and Kaiser (1981) 954 770 14,390

Bendick and Pepperhoff (1982) 820 850 –

Johansson (1937) 1339 444 –

Darken and Smith (1951) 900 690 15,355

Kaufman et al. (1963) 879 937 –

Braun and Kohlhaas (1965) 900 950 –

Guillermet et al. (1985) 1013 826 13,807

Desai (1986) 900 850 13,810

This work 928 871 14,145

jump does not occur in the neural network approximation of this work.

Figure 5a shows the enthalpy of FeBCC and FeFCC near the transition FeBCC → FeFCC at 1184.08 K together with the underlying training data and with the SGTE approximation as a comparison. The relative differences between the training data and HN are shown in Fig. 5b. The neural network approximations obtained in this work are in good agreement with the measurement data. For the FeBCC phase the neural network approximation and the SGTE approximation are almost identical. For the FeFCC phase, the results from this work predict lower values in the metastable regime below 1184.08 K than the SGTE approximation from Bale et al. (2016), and the difference increases with decreasing temperature.

3.2.3 Liquid

The neural network approximation for liquid iron FeLIQ is based on the work of Desai (1986), who suggests a value for the isobaric heat capacity of liquid iron of 46.632 J/(mol K), and additionally on measurement values for the enthalpy. The optimization procedure for liquid iron differs slightly from that of the solid phases. The isobaric heat capacity of liquids is constant. For the learning algorithm the stable temperature regime of iron between 1809 K and 3134 K is represented by 100 points in the interval [1809, 3134]. The target value for each of the 100 points is the 46.632 J/(mol K) from Desai (1986). Calculations have shown that this condition alone is not sufficient for a slope of 0 J/(mol K²) of C of the liquid phase. The error expression for liquid iron is therefore extended by the condition given in Eq. (3.1):

dC/dT = 0.   (3.1)

Figure 7a shows the results for the enthalpy of FeBCC and FeLIQ near the transition FeBCC → FeLIQ at 1808.99 K together with the results from the FactSage 7.0 FactPS database (Bale et al. 2016). The relative differences between HN and the measurement data for FeBCC and FeLIQ are shown in Fig. 7b. The approximation for FeLIQ above the melting temperature is in good agreement with the available measurement values and is almost identical to the approximation from the FactPS database (Bale et al. 2016). In the metastable temperature regimes, for FeLIQ below and for FeBCC above the melting temperature, the neural network approximation predicts slightly lower values for C than the approximation from Bale et al. (2016).


Fig. 9 Overview of the thermodynamic functions from this work of FeBCC, FeFCC, FeLIQ and FeGAS between 0 K < T < 6000 K: a G(T) in kJ/mol, b H(T) in kJ/mol, c S(T) in J/(mol K), d C(T) in J/(mol K). The solid lines stand for the stable, the dashed lines for the metastable regime (color figure online)

3.2.4 Gaseous phase

The approximation of the gaseous phase is based on the NIST-JANAF thermochemical tables (Chase 1998). This means the basis for the approximation of the thermodynamic functions does not consist of measurement values, as for the other phases of iron, but of already optimized values. Nevertheless, and for the sake of completeness, the gaseous phase is incorporated in the proposed results. An important aspect of the approximation of the gaseous phase is that the NIST-JANAF thermochemical tables provide values for G, H, S and C. Figure 8a–d shows the approximated thermodynamic functions of gaseous iron. The results for the gaseous phase show that every available thermodynamic quantity can be used for the approximation of the thermodynamic functions.
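That any of G, H, S and C can serve as training data follows from the standard relations S = −dG/dT, H = G + TS and C = dH/dT; a small numerical sketch of this consistency, assuming an arbitrary smooth stand-in for the network's G(T):

```python
import numpy as np

# Toy smooth "model" G(T) for demonstration; in the paper G comes from the
# trained ANN, here an arbitrary analytic stand-in is assumed (kJ/mol).
T = np.linspace(3000.0, 6000.0, 3001)
G = -100.0 - 0.25 * T - 1e-5 * T**2

S = -np.gradient(G, T)   # S = -dG/dT
H = G + T * S            # H = G + T*S
C = np.gradient(H, T)    # C = dH/dT

# Consistency check: C must also equal -T * d2G/dT2.
# Endpoints of np.gradient use lower-order one-sided differences,
# so only interior points are compared.
C_alt = -T * np.gradient(np.gradient(G, T), T)
print(np.allclose(C[2:-2], C_alt[2:-2]))  # → True
```

Because all four quantities are tied to the same underlying function, a residual on any one of them constrains the model, which is what allows tabulated G, H, S and C values to be mixed in the training set.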

3.2.5 Further remarks

For the optimization of the neural network model, values for H, C and S were used. Values for G were used indirectly to approximate transition temperatures. This demonstrates the model's ability to use any of these quantities for the approximation of thermodynamic functions, which is a clear advantage over the polynomial-based models. Figure 9a–d shows the curves of G, H, S and C for the whole considered temperature range and for all of the different phases of iron. The dashed lines indicate the metastable regimes of the respective phases. Table 1 lists the calculated transition temperatures of iron for the different phase transitions together with available literature data. The calculated values from this work are in good agreement with the literature data. Additionally, Table 2 lists the enthalpies of transformation for the different phase transformations of iron. The calculated values from this work are in good agreement with the data from the literature. It is worth mentioning that these values are part of the training data and are learned by the network during the optimization phase. Nevertheless, these values show clearly that the proposed method can be used for the approximation of the thermodynamic functions of unary systems and that the obtained results are self-consistent.

4 Summary and outlook

This work presents a new model for the approximation of thermodynamic functions of unary systems based on artificial neural networks. For iron, as a complex example, the different thermodynamic functions were successfully approximated. The comparison with literature data and already existing approximations of the thermodynamic functions of pure iron shows the suitability of the proposed method. The proposed method solves the underlying optimization problem with minimal user input and almost automatically. One major disadvantage of the proposed method is the black-box character of ANNs. As a consequence, the ability of the network to deliver correct values can only be verified through trial and error and not by investigating the optimized network itself. The extension of the proposed method to material systems with more than one constituent and the investigation of how far its beneficial properties can be extended are the main questions for further investigations (Fig. 9).

Acknowledgements Open Access funding provided by Projekt DEAL.

Compliance with ethical standards

Conflict of interest The author Maximilian Länge declares that he has no conflict of interest. This article does not contain any studies with human participants performed by any of the authors.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

References

Anderson PD, Hultgren R (1962) The thermodynamics of solid iron at elevated temperatures. Trans Metall Soc AIME 224:842–845
Avrutskiy VI (2017) Enhancing approximation abilities of neural networks by training derivatives. arXiv:1712.04473v2
Awbery JH, Griffiths E (1940) The thermal capacity of pure iron. Proc R Soc Lond Ser A 174:1–15
Bale CW, Bélisle E, Chartrand P et al (2016) FactSage thermochemical software and databases, 2010–2016. Calphad 54:35–53
Bendick W, Pepperhoff W (1982) On the α/γ phase stability of iron. Acta Metall 30:679–684
Braun M, Kohlhaas R (1965) Die spezifische Wärme von Eisen, Kobalt und Nickel im Bereich hoher Temperaturen. Phys Status Solidi B 12:429–444
Cezairliyan A, McClure JL (1974) Thermophysical measurements on iron above 1500 K using a transient (subsecond) technique. J Res Natl Bur Stand Sect A 78A:1
Chase MW Jr (1998) NIST-JANAF thermochemical tables, 4th edn. American Chemical Society, New York; American Institute of Physics for the National Institute of Standards and Technology, Washington
Chase MW, Ansara I, Dinsdale AT et al (1995) Thermodynamic models and data for pure elements and other end members of solutions. Calphad 19:437–447
Chen Q, Sundman B (2001) Modeling of thermodynamic properties for Bcc, Fcc, liquid, and amorphous iron. J Phase Equilib 22:631–644
Darken LS, Smith RP (1951) Thermodynamic functions of iron. Ind Eng Chem 43:1815–1820
Dench WA, Kubaschewski O (1963) Heat capacity of iron at 800 °C to 1420 °C. J Iron Steel Inst 201:140–143
Desai PD (1986) Thermodynamic properties of iron and silicon. J Phys Chem Ref Data 15:967–983
Dinsdale AT (1991) SGTE data for pure elements. Calphad 15:317–425
Esser H, Baerlecken E-F (1941) Die wahre spezifische Wärme von reinem Eisen und Eisen–Kohlenstoff–Legierungen von 20 bis 1100°. Arch Eisenhuttenw 14:617–624
Eucken A, Werth H (1930) Die spezifische Wärme einiger Metalle und Metallegierungen bei tiefen Temperaturen. Z Anorg Allg Chem 188:152–172
Griffiths EH, Griffiths E (1914) The capacity for heat of metals at low temperatures. Proc R Soc Lond 90:557–560
Guillermet AF, Gustafson P, Hillert M (1985) The representation of thermodynamic properties at high pressures. J Phys Chem Solids 46:1427–1429
Günther P (1917) Untersuchungen über die spezifische Wärme bei tiefen Temperaturen. Ann Phys (Leipzig) 356:828–846
Haynes WM (2015) CRC handbook of chemistry and physics, 95th edn. CRC Press, Hoboken
Hemmat Esfe M (2017) Designing an artificial neural network using radial basis function (RBF-ANN) to model thermal conductivity of ethylene glycol–water-based TiO2 nanofluids. J Therm Anal Calorim 127:2125–2131
Hemmat Esfe M, Yan W-M, Afrand M, Sarraf M, Toghraie D, Dahari M (2016) Estimation of thermal conductivity of Al2O3/water (40%)–ethylene glycol (60%) by artificial neural network and correlation using experimental data. Int Commun Heat Mass 74:125–128
Hornik K (1991) Approximation capabilities of multilayer feedforward networks. Neural Netw 4:251–257


Johansson CH (1937) Thermodynamisch begründete Deutung der Vorgänge bei der Austenit-Martensit-Umwandlung. Arch Eisenhuettenwes 11:241–251
Kaufman L, Clougherty E, Weiss R (1963) The lattice stability of metals—III. Iron. Acta Metall 11:323–335
Keesom WH, Kurrelmeyer B (1939) The atomic heat of iron from 1.1 to 20.4 °K. Physica 6:633–647
Kelley KK (1943) The specific heat of pure iron at low temperatures. J Chem Phys 11:16–18
Kollie TG, McElroy DL, Barisoni M, Brooks CR (1969) Pulse calorimetry using a digital voltmeter for transient data acquisition. ORNL Report No. ORNL-4380
Kraftmakher YA, Romashina TY (1965) Specific heat of iron near the Curie point. Fiz Tverd Tela 7:2532
Lederman FL, Salamon MB, Shacklette LW (1974) Experimental verification of scaling and test of the universality hypothesis from specific-heat data. Phys Rev B 9:2981–2988
Lukas HL, Sundman B, Fries SG (2007) Computational thermodynamics: the CALPHAD method. Cambridge University Press, Cambridge
Morris JP, Foerster EF, Schulz CW, Zellar GR (1966) Use of a diphenyl ether calorimeter in determining the heat of fusion of iron. U.S. Bureau of Mines Rep Invest 6723
Nair V, Hinton GE (2010) Rectified linear units improve restricted Boltzmann machines. In: 27th international conference on machine learning
Normanton AS, Bloomfield PE, Sale FR, Argent BB (2013) A calorimetric study of iron–cobalt alloys. Met Sci J 9:510–517
Orehotsky JL, Schröder K (1970) A high temperature heat exchange calorimeter. J Phys E 3:889–891
Pallister PR (1949) The specific heat and resistivity of high-purity iron up to 1250 °C. J Iron Steel Inst 161:87–90
Reddy BPN, Reddy PJ (1974) Low temperature heat capacities of some ferrites. Phys Status Solidi A 22:219–223
Riedmiller M, Braun H (1993) A direct adaptive method for faster backpropagation learning: the RPROP algorithm. In: IEEE international conference on neural networks, pp 586–591
Rodebush WH, Michalek JC (1925) The atomic heat capacities of iron and nickel at low temperatures. J Am Chem Soc 47:2117–2121
Rogez J, Le Coze J (1980) Description et étalonnage d'un calorimètre adiabatique à balayage (800 K–1800 K). Rev Phys Appl 15:341–351
Roslyakova I, Sundman B, Dette H, Zhang L, Steinbach I (2016) Modeling of Gibbs energies of pure elements down to 0 K using segmented regression. Calphad 55:165–180
Saunders N, Miodownik AP (1998) CALPHAD (calculation of phase diagrams): a comprehensive guide, vol 1. Pergamon, Oxford
Schröder K, MacInnes WM (1969) Simple pulse heating method for specific heat measurements. J Phys E 2:959–962
Schürmann E, Kaiser H-P (1981) Entwicklung und Überprüfung eines Hochtemperaturkalorimeters. Arch Eisenhuettenwes 52:99–101
Shanks HR, Klein AH, Danielson GC (1967) Thermal properties of Armco iron. J Appl Phys 38:2885–2892
Simon F, Swain RC (1922) Untersuchungen über die spezifische Wärme bei tiefen Temperaturen. Ann Phys (Leipzig) 373:241–280
Smith RP (1946) Equilibrium of iron–carbon alloys with mixtures of CO–CO2 and CH4–H2. J Am Chem Soc 68:1163–1175
Stepakoff GL, Kaufman L (1968) Thermodynamic properties of h.c.p. iron and iron–ruthenium alloys. Acta Metall 16:13–22
Theano Development Team (2016) Theano: a Python framework for fast computation of mathematical expressions. CoRR
Treverton JA, Margrave JL (1971) Thermodynamic properties by levitation calorimetry III. The enthalpies of fusion and heat capacities for the liquid phases of iron, titanium, and vanadium. J Chem Thermodyn 3:473–481
Tsuchiya M, Izumiyama M, Imai Y (1971) Free energy of the solid phase of pure iron. J Jpn Inst Met 35:839–845
Vollmer O, Kohlhaas R, Braun M (1966) Notizen: Die Schmelzwärme und die Atomwärme im schmelzflüssigen Bereich von Eisen, Kobalt und Nickel. Z Naturforsch Teil A 21
Wallace DC, Sidles PH, Danielson GC (1960) Specific heat of high purity iron by a pulse heating method. J Appl Phys 31:168–176

Publisher’s Note Springer Nature remains neutral with regard to juris-dictional claims in published maps and institutional affiliations.
