comparison of the estimation capabilities of response surface
TRANSCRIPT
-
7/30/2019 Comparison of the Estimation Capabilities of Response Surface
1/12
O R I G I N A L P A P E R
Comparison of the estimation capabilities of response surfacemethodology and artificial neural network for the optimization
of recombinant lipase production by E. coli BL21
Rubina Nelofer Ramakrishnan Nagasundara Ramanan
Raja Noor Zaliha Raja Abd Rahman
Mahiran Basri Arbakariya B. Ariff
Received: 14 February 2011 / Accepted: 9 July 2011 / Published online: 11 August 2011
Society for Industrial Microbiology 2011
Abstract Response surface methodology (RSM) and
artificial neural network (ANN) were used to optimize theeffect of four independent variables, viz. glucose, sodium
chloride (NaCl), temperature and induction time, on lipase
production by a recombinant Escherichia coli BL21. The
optimization and prediction capabilities of RSM and ANN
were then compared. RSM predicted the dependent vari-
able with a good coefficient of correlation determination
(R2) and adjusted R2 values for the model. Although the R2
value showed a good fit, absolute average deviation (AAD)
and root mean square error (RMSE) values did not support
the accuracy of the model and this was due to the inferi-
ority in predicting the values towards the edges of the
design points. On the other hand, ANN-predicted values
were closer to the observed values with better R2, adjusted
R2, AAD and RMSE values and this was due to the
capability of predicting the values throughout the selected
range of the design points. Similar to RSM, ANN could
also be used to rank the effect of variables. However, ANN
could not predict the interactive effect between the vari-ables as performed by RSM. The optimum levels for glu-
cose, NaCl, temperature and induction time predicted by
RSM are 32 g/L, 5 g/L, 32C and 2.12 h, and those by
ANN are 25 g/L, 3 g/L, 30C and 2 h, respectively. The
ANN-predicted optimal levels gave higher lipase activity
(55.8 IU/mL) as compared to RSM-predicted levels
(50.2 IU/mL) and the predicted lipase activity was also
closer to the observed data at these levels, suggesting that
ANN is a better optimization method than RSM for lipase
production by the recombinant strain.
Keywords Process optimization Response surfacemethodology Artificial neural network Lipase production Recombinant Escherichia coli
Introduction
Optimization of medium components and environmental
factors is an important step in a high-performance fermen-
tation process [11]. Various techniques have been proposed
and used for the optimization of industrial processes. The
classical technique of one parameter at a time is time con-
suming and not effective for the identification of interac-
tions and also for the predictions of parameters involved in
the process. Nowadays, mathematical methods such as
response surface methodology (RSM) [1, 11, 20, 23] and
artificial neural network (ANN) [9, 13, 27] are commonly
used for modelling and optimization of processes.
RSM is based on a collection of statistical and mathe-
matical techniques. This technique is useful in developing,
improving and optimizing processes in which a response of
interest is influenced by several independent variables and
R. Nelofer A. B. Ariff (&)Department of Bioprocess Technology,
Faculty of Biotechnology and Biomolecular Sciences,
University Putra Malaysia, 43400 Serdang, Selangor, Malaysia
e-mail: [email protected]
R. N. RamananChemical and Sustainable Process Engineering Research Group,
School of Engineering, Monash University,
46150 Bandar Sunway, Selangor, Malaysia
R. N. Z. R. A. Rahman
Department of Microbiology, Faculty of Biotechnology
and Biomolecular Sciences, University Putra Malaysia,
43400 Serdang, Selangor, Malaysia
M. Basri
Department of Chemistry, Faculty of Science,
University Putra Malaysia, 43400 Serdang, Selangor, Malaysia
123
J Ind Microbiol Biotechnol (2012) 39:243254
DOI 10.1007/s10295-011-1019-3
-
7/30/2019 Comparison of the Estimation Capabilities of Response Surface
2/12
the objective is to optimize this response. Besides analys-
ing the effects of the independent variables, this experi-
mental methodology generates a mathematical model,
which can be used to describe the process for better
understanding. Before applying the RSM, some pre-
liminary studies are needed such as selection of the
experimental design [5]. If the number of variables is too
large, screening of the significant variables shall be carriedout first, prior to selection of appropriate optimization
design. Various experimental designs may be used for
RSM optimization [7, 8, 25]. RSM has many advantages
over the conventional one parameter at a time technique,
but it is not applicable to all optimization and modelling
studies. The major drawback of RSM is the need for a
second-order polynomial to fit the data [3]. All systems
containing curvature such as symmetrical or non-symmet-
rical bell-shaped curves may not be well explained by the
second-order polynomial [3, 6].
ANNs are computational models formed from hundreds
of single units, artificial neurons, inspired by biologicalneurons and connected with coefficients (weights) which
constitute the neural structure [7]. These neurons are
sometimes called processing elements (PE) as they process
information. These weights are just like the synaptic
activity in a biological neuron. The weights of the inputs
are summed, and the threshold subtracted, to determine the
activation of the neuron [22]. The other important capa-
bility of neural networks is that they can learn the input/
output relationship through training. ANN analysis is quite
flexible as regards to the amount and form of the training
(experimental) data, which makes it possible to use more
informal experimental designs than with statistical
approaches [22].
A neural network does not need any model or screening
before the development of a network. Neural networks may
be applied on designed data or on the data that is not sta-
tistically designed. Sufficient data with all possible oper-
ating conditions of input variables are needed to develop a
neural network. A network model is then constructed
according to the systems behaviour. The constructed model
may be used for predictions and other applications within
the assessed operating conditions. Since the regression
analysis is dependent on predetermined statistically sig-
nificant levels, the less significant factors are not included
in the model. ANN uses all the data making the model
more accurate [7]. A neural network can perform tasks that
cannot be performed by linear programming. If an element
of the neural network fails, the network can still continue to
perform the task owing to its parallel nature. Neural net-
works learn and there is no need for reprogramming [15].
The main disadvantage of ANN is the requirement of
training in order to operate. A neural network needs to be
emulated because the architecture is different from that of
microprocessors [15], where high processing time is
required for a large network. Different architectures may
also be involved in ANN which requires different types of
algorithms.
The development of accurate models for a biological
reaction on chemical and physical bases is still a critical
challenge, mainly due to the non-linear nature of the bio-
chemical network interactions. The use of advanced non-linear data analysis techniques such as ANN has been
applied in various areas such as food science [26], bio-
technology [12], chemical processes [4], equipment
development [22] and biochemical engineering [13].
Comparative studies of ANN and RSM for fermentation
processes employing wild strains have been reported [79].
To our knowledge, comparison of ANN and RSM for
optimization of cultural conditions for enzyme production
by a recombinant strain has not been reported in the
literature.
The objective of the present study was to compare the
efficiency of ANN in modelling and optimization oflipase production by a recombinant E. coli BL21 with
optimization using RSM from data of the previous study
[21]. The optimization and prediction capabilities of RSM
and ANN were compared by the coefficient of correlation
determination (R2), adjusted R2, absolute average devia-
tion (AAD) and root mean square error (RMSE) values
for the models.
Materials and methods
Microorganism and inoculum preparation
The microorganism used in this study was E. coli BL21
(DE3) pLysS [9, 14] harbouring the organic solvent toler-
ant and thermostable lipase gene of Bacillus sp. 42 [10].
Inoculum was prepared by adding a single colony grown
from an LB agar plate in 50 mL LB broth in 250-mL
screw-cap Schott Duran bottles and incubating in a rotary
shaker with a shaking speed of 200 rpm at 37C for
1618 h.
Lipase production
All fermentations were conducted under aerobic conditions
in 250-mL screw cap Schott Duran bottles with 50 mL
production medium (5 g/L yeast extract, 10 g/L tryptone,
19 g/L NaCl and 1090 g/L glucose). The pH of all
media was adjusted to 7 using either 0.1 M HCl or NaOH
prior to sterilization. Ampicillin (50 lg/mL) and chlor-
amphenicol (35 lg/mL) were added to all media to inhibit
growth of bacteria without lipase gene. Fermentation was
carried out in a rotary shaker agitated at 200 rpm for 24 h.
244 J Ind Microbiol Biotechnol (2012) 39:243254
123
-
7/30/2019 Comparison of the Estimation Capabilities of Response Surface
3/12
The culture pH was not controlled throughout the culti-
vations but the pH was measured at time intervals. In all
fermentations, isopropyl b-D-1-thiogalactopyranoside
(IPTG) at a concentration of 0.5 mM was used as an
inducer. Induction time was varied from 1 to 5 h and the
fermentation was carried out for 24 h. The variations in
medium components and other fermentation variables are
given in Table 1.
Experimental design
The optimization of fermentation conditions was con-
ducted using the PlackettBurman (PB) design as described
earlier [21]. In this experimental design, four significant
variables (glucose, NaCl, temperature and induction time)
were selected. A total of 32 experiments were conducted
according to Box-Wilson (BW) 24 full factorial central
Table 1 Box-Wilson 24 factorial central composite design for optimization of lipase production by recombinant E. coli used for RSM [21] and
ANN
Exp. no. Glucose
(X1) (g/L)
NaCl (X2)
(g/L)
Temperature
(X3) (C)
Induction
(X4) time (h)
Lipase activity (Y) (IU/mL) Final
culture
pHObserved Predicted by
RSM (% differencea)
Predicted by
ANN (% differencea)
1 70 7 43 4 04.2 07.2 (69) 04.9 (15) 4.93
2 70 7 43 2 12.9 14.4 (10) 12.9 (0.5) 4.91
3 70 7 31 4 18.9 18.2 (4) 18.8 (0.6) 4.52
4 70 7 31 2 29.7 30.0 (1) 29.2 (1) 4.53
5 70 3 43 4 06.0 08.0 (33) 06.6 (9) 4.95
6 70 3 43 2 15.7 16.9 (8) 15.9 (2) 4.94
7 70 3 31 4 22.0 20.6 (6) 21.5 (2) 4.67
8 70 3 31 2 35.3 34.2 (3) 35.6 (0.8) 4.69
9 30 7 43 4 0.02 1.9 (9076) 0.01 (33) 5.29
10 30 7 43 2 12.9 14.7 (13) 12.4 (4) 5.29
11 30 7 31 4 22.6 21.7 (4) 22.3 (2) 5.28
12 30 7 31 2 40.3 39.0 (3) 42.3 (5) 5.29
13 30 3 43 4 0.04 -0.02 (134) 0.03 (31) 5.30
14 30 3 43 2 12.9 14.4 (11) 12.6 (3) 5.30
15 30 3 31 4 21.9 21.3 (3) 21.6(2) 5.29
16 30 3 31 2 43.1 40.4 (6) 44.0 (2) 5.30
17 90 5 37 3 23.4 21.6 (8) 23.7 (2) 4.21
18 10 5 37 3 21.8 22.5 (3) 21.9 (0.3) 7.27
19 50 9 37 3 26.9 24.7 (8) 26.6 (1) 4.28
20 50 1 37 3 25.9 26.9 (4) 25.6 (1) 4.45
21 50 5 49 3 0.05 -5.7 (11132) 0.04 (15) 5.74
22 50 5 25 3 26.6 31.3 (17) 26.9 (1) 4.78
23 50 5 37 5 02.9 01.9 (33) 02.7 (10) 4.28
24 50 5 37 1 28.4 28.3 (0.4) 28.6 (0.4) 4.29
25 50 5 37 3 34.2 35.3 (3) 34.7 (2) 4.30
26 50 5 37 3 35.8 35.3 (1) 34.7 (3) 4.3127 50 5 37 3 34.9 35.3 (1) 34.7 (0.6) 4.31
28 50 5 37 3 35.3 35.3 (0.2) 34.7 (1) 4.30
29 50 5 37 3 34.9 35.3 (1) 34.7 (0.6) 4.31
30 50 5 37 3 35.5 35.3 (0.6) 34.7 (2) 4.29
31 50 5 37 3 36.0 35.3 (2) 34.7 (3) 4.30
32 50 5 37 3 35.9 35.3 (2) 34.7 (3) 4.29
The italic, bold and normal values represent the experiments used for selection, training and testing, respectively, by the selected ANNa % difference was calculated as the % difference between the observed value and corresponding predicted value over the observed value
J Ind Microbiol Biotechnol (2012) 39:243254 245
123
-
7/30/2019 Comparison of the Estimation Capabilities of Response Surface
4/12
composite design (CCD). Each variable was set at five
different levels of variations (Table 1). The first 16
experiments (24 = 16, factorial CCD) were at factorial
points, eight at axial points (a = 2) and eight replications
for the central points.
Response surface methodology
The optimization results of lipase fermentation conditions
by recombinant E. coli using RSM as reported in the pre-
vious study [21] with some extensions in experiments and
statistical analysis were used in this study. In the RSM
method, a second-order model (Eq. 1) was used to calcu-
late the predicted response and optimal levels:
Y b0 b1X1 b2X2 b3X3 b4X4 b11X21
b22X22 b33X
23 b44X
24 b12X1 X2 b13X1
X3 b14X1 X4 b23X2 X3 b24X2
X4 b34X3 X4 1
where Y represents the response variable and b0 is the
interception coefficient. b1, b2, b3 and b4 are coefficients of
the linear effects, b11, b22, b33 and b44 are coefficients of
quadratic effects and b12, b13, b14, b23, b24 and b34 are
coefficients of interaction effects for the four independent
variables (X1 = glucose, X2 = NaCl, X3 = temperature
and X4 = induction time).
Artificial neural network
For comparison, the same data of lipase fermentation by
recombinant E. coli applied for optimization using RSM, asreported in the previous study [21], were applied to ANN.
The intelligent problem solver in STATISTICA software
version 7 was used to construct the regression-based net-
works from the data. A total of 60 different trained net-
works were observed for selection on the bases of the
highest coefficient of correlation determination (R2) and
the lowest selection error. A multilayer perception network
(MPN) was selected from them. Back propagation (BP)
and conjugate gradient descent (CG) algorithms were used
in the training of the neural network on the basis of varying
input/output pair data sets. The experiments used for
selection (8), training (16) and testing (8) are indicated inTable 1. The topology of the network consists of three
layers with one hidden layer.
Comparison of optimization capability of ANN
and RSM
Adjusted R2, AAD and RMSE were calculated in addition
to the R2 for the comparison of estimation capabilities of
RSM and ANN. The R2 was calculated using Eq. 2:
R2
Pi1n Xi Yi
2Pi1n
"Yi Yi2
2
where X is the predicted lipase activity (by either RSM or
ANN), Yis the observed lipase activity and "Y is the average
observed lipase activity.
The adjusted R2 was calculated using Eq. 3:
AdjustedR2 1 1 R2 N 1
N K 1
!3
where N is the total number of observations and K is the
number of input variables.
The AAD was calculated using Eq. 4:
AAD Xp
i1jyi;exp yi;calj=yi;exp
h i.P
n o 100 4
where yi,exp and yi,cal are the experimental and calculated
responses, respectively, and P is the number of
experiments.
The RMSE was calculated using Eq. 5:
RMSE
ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiPyi;exp yi;cal
2
n
s5
where yi,exp is the experimental response, yi,cal is the cal-
culated response and n is the number of experiments.
Statistical analysis
The statistical analysis of data and plots were constructed
using STATISTICA software version 7. The analysis of
variance (ANOVA) was employed to determine the sig-
nificance of model parameters in RSM. R2 and adjusted R2
values were calculated to evaluate the performance of the
regression model. The optimum levels of the selected
variables were obtained from the desirability charts (data
not shown). The intelligent problem solver in STATISTICA
neural network was used to construct various neural
networks, the best of which was selected and used for
prediction and optimization. Sensitivity analysis was con-
ducted by the selected ANN to rank the input variables.
Analytical procedures
Culture samples were centrifuged at 10,7009g (TA-14-50,
Allegra250R, Beckman Coulter, USA) for 10 min to
obtain the cell pellets. The collected cell pellets were
washed and resuspended in 20 mM phosphate buffer at pH
7 and then lysed by sonication for 2 min on ice. Superna-
tant was used for lipase determination after removal of the
cell debris by centrifugation. Lipase activity was deter-
mined according to the method proposed by Hamid et al.
[14], which is a modified form of Kwon and Rhees [17]
method. In this method, 1 mL of diluted sample was mixed
246 J Ind Microbiol Biotechnol (2012) 39:243254
123
-
7/30/2019 Comparison of the Estimation Capabilities of Response Surface
5/12
with 2.5 mL olive oil emulsion in phosphate buffer with a
ratio of 1:1. Subsequently, 20 lL of 20 mM CaCl2 was
added to the mixture. The reaction was carried out at 60C
in a water bath, agitated at 200 rpm for 30 min. The
reaction was stopped by the addition of 6 N HCl (1 mL).
The free fatty acids liberated by the action of lipase were
extracted in 5 mL isooctane. Pyridine cupric acetate
reagent (1 mL) was then added to the extracted free fattyacids in isooctane (4 mL) and vortexed. The absorbance of
the upper layer was read at 715 nm. A standard curve
constructed from different concentrations of oleic acid was
used to determine the concentration of free fatty acids in
each sample. One unit of lipase activity is defined as 1 lM
free fatty acids released per minute. The cell concentration
was determined as dry cell weight by drying the cell pellet
at 80C until a constant weight was achieved, normally for
at least 24 h.
Results
Optimization using RSM
The full quadratic second-order model obtained by multi-
ple regression analysis of the experimental data by apply-
ing RSM was expressed in Eq. 6. This model was used for
the prediction of lipase activity.
Y 110:933 0:024X1 4:657X2 8:401X3
11:996X4 0:008X21 0:593X
22 0:157X
23
5:046X24 0:017X1 X2 0:018X1 X3
0:033X1 X4 0:069X2 X3 0:221X2 X4
0:193X3 X4 6
where Y is the lipase activity and X1, X2, X3 and X4are glucose, NaCl, temperature and induction time,
respectively.
The model is a highly significant model according to the
statistical analysis (Table 2). The calculated F value was
16.4 with a very small P value (0.0008). The high R2 value
(0.97) supported the models accuracy. Only 3% of the
total variations were not explained by the model, as is
obvious from the R2 value. The significance of the model is
also represented by the value of the adjusted R2 (0.96).
The regression coefficient values and P values (Table 2)
indicated that the main effects of temperature and induction
time were significant. Quadratic effects of all four variables
glucose, NaCl, temperature and induction time were sig-
nificant. The maximum effect was due to the temperature
followed by induction time, NaCl and glucose sequentially.
The temperatureglucose and induction timeglucose
interaction effects were significant. The interaction effects
are represented by 3D surface plot (Figs. 1, 2).
Optimization using ANN
The selected network (MPN) has a better R2 (0.999) than
RSM. Predicted values of this model were also closer to the
observed values than the RSM-predicted values (Table 1).
The topology of the network consisted of three layers
(4:9:1), an input layer consisting of four fermentation
variables, a middle hidden layer of nine neurons and oneoutput layer for lipase activity. The activation level of the
neurons for ANN processing is represented by different
colours (Fig. 3). The optimum levels of glucose (25 g/L),
NaCl (3 g/L), induction time (2 h) and temperature (30C)
predicted by ANN were different from those predicted by
RSM. The highest effect was from the temperature, fol-
lowed by induction time as calculated by the sensitivity
analysis (Table 3). These results are in agreement with
those obtained by RSM. Glucose and NaCl represented the
third and fourth in terms of significant effect, respectively.
The interaction effects are represented by surface plots
(Figs. 4, 5).
Comparison of optimization using RSM and ANN
The predicted values using the selected model of RSM and
selected network of ANN for the experimental runs with
their percentage differences from the observed lipase val-
ues are provided in Table 2. In most of the cases ANN-
predicted values were closer to the observed values. This
difference between the RSM and ANN was more promi-
nent in the experiments at the edge points such as in
experiments 1, 2, 5, 9, 10, 13, 14 and 21, whereas at the
centre points the predicted levels by RSM and ANN were
more similar.
Verification experiments including RSM- and ANN-
predicted optimum levels for the tested variables are rep-
resented in Table 4. The RSM-predicted levels were
32.4 g/L, 5 g/L, 31.7C and 2.121 h for glucose, NaCl,
temperature and induction time, respectively. The pre-
dicted lipase activity at these optimum levels was 48.9 IU/
mL. The ANN-predicted levels were 25 g/L, 3 g/L, 30C
and 2 h for glucose, NaCl, temperature and induction time,
respectively, with 56.3 IU/mL predicted activity. Experi-
ments were conducted in triplicate at these optimum levels
to calculate the observed response. The observed lipase
activity at the predicted optimum levels of tested variables
was 50.2 IU/mL at RSM optimum levels and 55.8 IU/mL
at ANN predicted levels (Table 4). The lipase activity
predicted by ANN was closer to the observed lipase
activity at mean levels of four variables, at the levels used
before optimization and in some other verification experi-
ments near ANN-predicted optimum levels as compared to
RSM (Table 4). In these verification experiments only
glucose concentration was varied because other parameters
J Ind Microbiol Biotechnol (2012) 39:243254 247
123
-
7/30/2019 Comparison of the Estimation Capabilities of Response Surface
6/12
Table 2 Analysis of variance for optimization of lipase production by recombinant E. coli using Box-Wilson design (data shown in Table 1),
calculated by RSM regression [21]
Variables Analysis of variance Parameter estimates
SS Deg. of
freedom
MS F Estimates t values P values Confidence limits
-95% ?95%
Intercept 103.5437 1 103.5437 16.4015 2110.933 24.0499 0.000832 2168.724 253.1414
Glucose (X1) 0.0560 1 0.0560 0.0089 0.024 0.0942 0.926090 -0.520 0.5686
Glucose2 (X12) 325.1179 1 325.1179 51.4991 20.008 27.1763 0.000002 20.011 20.0059
NaCl (X2) 20.5663 1 20.5663 3.2577 4.657 1.8049 0.088831 -0.787 10.1004
NaCl2 (X22) 166.1063 1 166.1063 26.3114 20.593 25.1295 0.000084 20.837 20.3490
Temperature (X3) 388.8201 1 388.8201 61.5896 8.401 7.8479 0.000000 6.143 10.6599
Temperature2 (X32) 939.7712 1 939.7712 148.8609 20.157 212.2009 0.000000 20.184 20.1296
Induction time (X4) 32.5716 1 32.5716 5.1594 11.996 2.2714 0.036401 0.854 23.1393
Induction time2 (X42) 752.0353 1 752.0353 119.1233 25.046 210.9144 0.000000 26.021 24.0704
Glucose (X1) * NaCl (X2) 7.8338 1 7.8338 1.2409 -0.017 -1.1140 0.280802 -0.051 0.0156
Glucose (X1) * temperature
(X3)
76.2758 1 76.2758 12.0822 0.018 3.4759 0.002891 0.007 0.0292
Glucose (X1
) * induction time
(X4)
2.4380 1 2.4380 0.3862 0.033 0.6214 0.542560 -0.078 0.1430
NaCl (X2) * temperature (X3) 30.4991 1 30.4991 4.8311 0.069 2.1980 0.042092 0.003 0.1353
NaCl (X2) * temperature (X3) 3.1237 1 3.1237 0.4948 0.221 0.7034 0.491317 -0.442 0.8836
Temperature (X3) * induction
time (X4)
21.4360 1 21.4360 3.3955 0.193 1.8427 0.082885 -0.028 0.4138
Error 107.3224 17 6.3131
Bold letters represent the significant variables and their calculated values. SS and MSare sum of squares and mean sum of squares, respectively
The asterisk between two variables represents interaction effects of the two variables
Fig. 1 Surface plot obtainedfrom optimization using RSM
for the combination effect of
temperature and glucose on
lipase production by
recombinant E. coli by keeping
other parameters constant [21]
248 J Ind Microbiol Biotechnol (2012) 39:243254
123
-
7/30/2019 Comparison of the Estimation Capabilities of Response Surface
7/12
are not much different in ANN- and RSM-predicted
optima. There is only a small difference between the
observed and predicted response. Four evaluation param-
eters, namely R2, adjusted R2, AAD and RMSE, were used
to compare RSM and ANN (Table 5). The selected ANN
has higher values of R2 (0.999) and adjusted R2 (0.988)
than those obtained by RSM (0.97 and 0.965, respectively).
On the other hand, the values of AAD (5.09) and RMSE
(0.63) for ANN were lower than those obtained by RSM
(26.38 and 1.48, respectively). These results indicate that
ANN was superior to RSM for the optimization of lipase
production by recombinant E. coli.
Discussion
Optimization is used to increase the performance of a
system. Traditionally, the one parameter at a time tech-
nique is used to optimize bioprocesses. RSM has many
advantages compared to the classical one variable at a time
optimization technique. ANN has received increasing
interest over the last few years, and has been successfully
applied across an extraordinary range of problem domains,
in areas as diverse as business, medicine, engineering,
geology, biotechnology, bioprocessing and physics.
The second-order model obtained from RSM for opti-
mization of recombinant lipase production was verified by
ANOVA and R2. In the ANOVA test, calculated F and
P values were used to determine the significance of an
input variable. Larger Fvalues and smaller P values are an
indication of the significance of the model. From RSM
analysis, F and P values for the main effects, quadratic
effects and interaction effects of variables were obtained.
The effects of temperatureglucose and induction time
Fig. 2 Surface plot obtained
from optimization using RSM
for the combination effect of
induction time and glucose on
lipase production by
recombinant E. coli by keeping
other parameters constant [21]
Fig. 3 The topology of neural network for the estimation of lipase
production. Triangles represent the inputs (neurons added for ANN
processing); glucose, NaCl, temperature and induction time. Squaresrepresent the hidden and output layer (neurons generated during ANN
processing). Small open circles indicate the input and output layers
(the neurons that can be observed in the form of numerical values)
J Ind Microbiol Biotechnol (2012) 39:243254 249
123
-
7/30/2019 Comparison of the Estimation Capabilities of Response Surface
8/12
glucose interactions were significant according to the cal-
culated Fand P values. These values also gave information
about the ranking of variables. For example, a variable with
a large value has a large effect on the response and vice
versa for a variable with a small value.
According to RSM analysis, the maximum effect on
lipase yield was due to temperature. Lipase activity gradu-
ally increased with temperature ranging from 25 to 30C and
then decreased sharply with an increase in temperature from
32 to 45C. The probable reason for the low lipase activity
with respect to high temperature is due to the aggregation of
inclusion bodies. High temperature favours the aggregation
of inclusion bodies whereas lower temperature enhances the
secretion of recombinant protein in soluble form [24]. At
lower temperature, reduced growth of E. coli [2] might be
the reason for decreased lipase production.
The induction time was in second place according to the
significance ranking by RSM. Induction time is one of the
important parameters that affect the expression of recom-
binant proteins. The optimal time of induction (2 h) for
lipase production by recombinant E. coli obtained in this
study corresponded to the early log phase, with cell con-
centrations ranging from 0.3 to 0.5 g/L. Induction was
performed at mid log phase as a base case prior to opti-
mization. The selection of appropriate induction time for
enhancement of recombinant protein production is strain
dependent. The strains, which are sensitive to IPTG con-
centration, need to be induced at late loge phase for
enhancement of recombinant protein production. On the
other hand, strains which are not sensitive to IPTG can be
induced at early log phase [2]. The recombinant E. coli
used in this study for lipase production most probably
belongs to the strain that is not sensitive to IPTG. The
effects of sodium chloride and glucose on lipase production
were not significant and were ranked in third and fourth
place, respectively. RSM analysis also generates informa-
tion on the positive and negative effect that could be
determined by the respective coefficient of the parameters
or variables investigated. The effect of the individual var-
iable or parameter on the process performance could be
evaluated in a more obvious way using RSM as compared
to ANN [8].
Table 3 Sensitivity analysis by ANN for optimization of lipase production by recombinant E. coli
Parameter Glucose (X1) (g/L) NaCl (X2) (g/L) Temperature (X3) (C) Induction time (X4) (h)
Ratio 1.517316 0.960510 3.990739 2.839100
Rank 3.000000 4.000000 1.000000 2.000000
Ratios are values given by ANN as a result of sensitivity analysis to input variables
Rank is the order according to the ratios
Fig. 4 Surface plot obtained
from optimization using ANN
for the combination effect of
temperature and glucose on
lipase production by
recombinant E. coli by keeping
other parameters constant
250 J Ind Microbiol Biotechnol (2012) 39:243254
123
-
7/30/2019 Comparison of the Estimation Capabilities of Response Surface
9/12
RSM gives quantitative interaction effects of paired
variables, which cannot be estimated by ANN. From RSM,
it was observed that the glucosetemperature and glucoseinduction time interaction effects have P values smaller
than 0.05, indicating that these interaction effects were
significant. The glucosetemperature interaction effect
(F value is 12.08) was stronger than the temperature
induction time interaction effect (F value is 4.83). Other
interaction effects were not significant as they have P val-
ues greater than 0.5.
ANN is widely used in various processes to solve
problems related to prediction, classification and control
[7, 9, 12, 19, 26]. ANN is a modern modelling technique
capable of modelling extremely complex systems, in which
any non-linear systems with large numbers of variables canbe handled. The ANN architecture is represented by its
topology rather than by a model as used in RSM. Topology
in ANN mostly consists of three layers: one input layer,
one hidden layer and one output layer. In most of the
estimation cases one hidden layer is sufficient. Two or
more layers are used for the systems with discontinuities
[4]. The number of hidden neurons is determined according
to the size of the input vector and the number of input
output space classifications. Under-fitting may occur when
Fig. 5 Surface plot obtained
from optimization using ANN
for the combination effect of
induction time and glucose on
lipase production by
recombinant E. coli by keeping
other parameters constant
Table 4 Verification experiments including the predicted optimal levels and lipase activity obtained from optimization of lipase production by
recombinant E. coli using RSM and ANN
Serial
no.
Method Glucose
(X1) (g/L)
NaCl (X2)
(g/L)
Temperature
(X3) (C)
Induction
time (X4) (h)
Lipase activity (Y) (IU/mL)
Predicted by
RSM (% diff.a)
Predicted by
ANN (% diff.a)
Experimental
data
1 Base case 50 5 37 3 35.3 (19) 34.7 (17) 29.5 0.799
2 RSM predicted optima 32 5 32 2.12 48.9 (3) 51.3 (2) 50.2 0.809
3 ANN predicted optima 25 3 30 2 39.6 (29) 56.2 (0.7) 55.8 0.934
4 Optima before optimization 10 5 37 2.5 25.2 (13) 30.6 (6) 28.9 0.869
5 Verification experiment 20 3 30 2 38.2 (25) 52.3 (3) 50.6 1.267
6 Verification experiment 15 3 30 2 36.3 (22) 47.8 (3) 46.5 0.949
7 Verification experiment 35 3 30 2 41.2 (4) 43.4 (1) 42.8 0.965
Values followed by are standard errors of mean valuesa % difference from the observed values
J Ind Microbiol Biotechnol (2012) 39:243254 251
123
-
7/30/2019 Comparison of the Estimation Capabilities of Response Surface
10/12
the number of neurons used in the hidden layer is too small
and vice versa [16]. The developed network in this study
contains three layers: (1) an input layer with four neurons,
(2) one hidden layer with nine neurons and (3) an output
layer with one neuron. The number of hidden neurons is
not too large or too small and represents the best network
system from the observed 60 networks. An increase in the
number of hidden layers and number of neurons therein didnot increase the prediction accuracy.
ANN sensitivity analysis could be carried out to calcu-
late the ratio and thereby the ranking of the variables. Ratio
is the basic measure of sensitivity, which is calculated as
the ratio of the error with missing value substitution to the
original error. Ratio shows how much the network is sen-
sitive for a particular input value. The variable which has a
ratio equal to or less than 1 is not significant, whereas the
variable with a ratio larger than 1 has a significant effect on
output [18]. From the sensitivity analysis conducted in this
study, temperature (ratio = 3.99) gave the highest effect
on lipase production by recombinant E. coli and the lowesteffect was from sodium chloride (ratio = 0.96) (Table 3).
According to significance criteria, the effect of tempera-
ture, induction time and glucose were significant whereas
the effect of NaCl was insignificant. Results from RSM and
ANN indicate that the ranking for temperature and induc-
tion time was similar, whereas the ranking for glucose and
NaCl was different. It is important to note that the qua-
dratic and interaction effects could not be evaluated using
ANN.
The optimum level for glucose (25 g/L) as predicted by
ANN was smaller than that predicted by RSM (32 g/L).
The levels of NaCl (3 g/L), temperature (30C) and
induction time (2 h) predicted by ANN were also lower
than that those predicted by RSM. The optimum induction
time predicted by ANN was similar to that predicted by
RSM. Higher lipase activity was observed at optimum
levels predicted by ANN as compared to RSM. The lipase
activity predicted by ANN also fitted the observed data at
the optimum levels better than the case for RSM (Table 4).
Results from this study showed that the estimation capa-
bility of ANN for optimum levels of variables for lipase
production by recombinant E. coli was more accurate than
RSM. Similar observations have also been reported [8, 28,
29]. Both methods (RSM and ANN) estimated low glucose
concentration as optimal levels. The production of acetate
and other organic acids, as indicated by a drastic drop in
the final culture pH at higher glucose levels, may reduce
the growth and lipase production (Table 1).
Several methods could be used to evaluate the goodness
and accuracy of a given model and to compare two or more
models. The overall predictive capability of the model is
normally determined by R2, but the efficiency of a model
may not be explained by R2 alone [9]. Besides R2, adjusted
R2 may be used to support the models accuracy. In a
multiple linear regression model, adjusted R2 measures the
proportion of the variation in the dependent variable
accounted for by the explanatory variables. Adjusted R2 is
generally considered to be a more accurate goodness-of-fit
measure than R2. Results from this study show that the
adjusted R2 values for both RSM and ANN support the
models accuracy as their values are not much differentfrom R2 values. For a good model, R2 must be closed to 1
whereas the difference between the values of adjusted R2
and R2 must be very small.
Large values of R2 and adjusted R2 do not always mean
that the regression model is an efficient model. Other
values such as AAD and RMSE are also used to validate
and compare more than one model. For a good model, the
AAD value must be as small as possible whereas the
RMSE value must be close to zero. The AAD value (26.38)
for RSM was almost five times higher than the AAD value
(5.09) for ANN (Table 5). In addition, the RMSE value
(1.48) for RSM was two times higher than the RMSE
value (0.63) for ANN. Larger values of RMSE and AAD
mean higher chances of errors in prediction. Therefore,
ANN predictions with lower RMSE and ADD values are
more reliable and accurate than RSM predictions.
The ANN optima were closer to the true optima as
observed from the verification experiments (Table 4). It
was noted that the optimum levels of most of the variables
predicted by RSM and ANN were closer to the edges of
topology. In some other cases as reported in the literature,
the optima predicted by ANN and RSM were located at the
edges [79] and in one case the optima were even located
outside of the topology [8]. It was also observed from the
present studies that the RSM-predicted optima was closer
to the centre points as compared to the ANN-predicted
optima.
The present study indicated that the values predicted by
RSM and ANN were similar to the observed values
towards the centre points. However, the values predicted
by RSM were different from the observed values towards
the edges. At the edges, the effect was observed mainly
from glucose concentration. When glucose concentration
Table 5 Comparison of optimization and prediction capability by
ANN and RSM for lipase production by recombinant E. coli
Serial no. Statistic RSM ANN
1 R2 0.97 0.99
2 Adjusted R2 0.965 0.988
3 AAD 26.38 5.09
4 RMSE 1.48 0.63
AAD absolute average deviation, RMSE root mean square error,
R2 coefficient of correlation determination, adjusted R2 adjusted
coefficient of correlation determination
252 J Ind Microbiol Biotechnol (2012) 39:243254
123
-
7/30/2019 Comparison of the Estimation Capabilities of Response Surface
11/12
was decreased from the RSM-predicted optimum glucose
level, the lipase activity predicted by RSM decreased
sharply. Therefore, it can be concluded that RSM-predicted
optima were not as close to the real optima as the ANN-
predicted optima of the lipase production by recombinant
E. coli. For many other bioprocesses, ANN was a better
optima predictor than RSM [79].
Conclusion
Results from this study have demonstrated that ANN gave
better estimation capabilities throughout the range of
variables as compared to RSM in the optimization of lipase
production by a recombinant E. coli. Lipase production
predicted by ANN at optimal variables fitted well to the
experimental data. The constructed ANN has larger R2 and
adjusted R2 values, whereas AAD and RMSE values are
smaller as compared to those observed from the application
of RSM. Similar to the application of RSM, ANN couldalso rank the independent variables. In addition, ANN does
not need a model for prediction, a specified design or
preliminary knowledge about the system. However, the
application of RSM enables the estimation of the quantity
of effect of independent variables as well as the interaction
effects.
Acknowledgments The work presented here is part of Ms Rubina
Nelofers Ph.D. studies, funded by the Pakistan Council of Scientific
and Industrial Research. The research work is funded by Universiti
Putra Malaysia.
References
1. Azaman SN, Ramanan RN, Tan JS, Rahim RA, Abdulla MP,
Ariff AB (2010) Optimization of an induction strategy for
improving interferon-alpha2b production in the periplasm of
Escherichia coli using response surface methodology. Biotechnol
Appl Biochem 56:141150
2. Azaman SN, Ramanan RN, Tan JS, Rahim RA, Abdullah MP,
Ariff AB (2010) Screening for the optimal induction parameters
for periplasmic producing interferon-2b in Escherichia coli. Afr J
Biotechnol 9:63456354
3. Bas D, Boyaci IH (2007) Modeling and optimization I: usability
of response surface methodology. J Food Eng 78:8368454. Basheer IA, Hajmeer M (2000) Artificial neural networks: fun-
damentals, computing, design, and application. J Microbiol
Method 43:331
5. Bezerra MA, Santelli RE, Oliveira EP, Villar SL, Escaleira LA
(2008) Response surface methodology (RSM) as a tool for opti-
mization in analytical chemistry. Talanta 76:965977
6. Cornish-Bowden A (2001) Detection of errors of interpretation in
experiments in enzyme kinetics. Methods 24:181190
7. Dasari VRRK, Donthireddy SRR, Nikku MY, Garapati HR
(2009) Optimization of medium constituents for cephalosporin C
production using response surface methodology and artificial
neural networks. J Biochem Technol 1:6974
8. Desai KM, Survase SA, Saudagar PS, Lele SS, Singhal RS (2008)
Comparison of artificial neural network (ANN) and response
surface methodology (RSM) in fermentation media optimization:
case study of fermentative production of scleroglucan. Biochem
Eng J 41:266273
9. Ebrahimpour A, Rahman RNZRA, Chng DH, Basri M, Salleh
AB (2008) A modelling study by response surface methodology
and artificial neural network on culture parameters optimization
for thermostable lipase production from a newly isolated ther-
mophilic Geobacillus sp. strain ARM. BMC Biotechnol 8:96
10. Eltaweel MA, Rahman RNZRA, Salleh AB, Basri M (2005) An
organic solvent-stable lipase from Bacillus sp. strain 42. Ann
Microbiol 55:187192
11. Farliahati MR, Ramanan NR, Mohamad R, Puspaningsih NNT,
Ariff AB (2010) Enhanced production of xylanase by recombi-
nant Escherichia coli DH5 through optimization of medium
composition using response surface methodology. Ann Microbiol
60:279285
12. Ghaffari A, Abdollahi H, Khoshayand MR, Bozchalooi IS,
Dadgar A, Rafiee-Tehrani M (2006) Performance comparison of
neural network training algorithms in modeling of bimodal drug
delivery. Int J Pharm 327:126138
13. Haider MA, Pakshirajan K, Singh A, Chaudhry S (2008) Artifi-
cial neural network-genetic algorithm approach to optimize
media constituents for enhancing lipase production by a soil
microorganism. Appl Biochem Biotechnol 144:225235
14. Hamid THTA, Eltaweel MA, Rahman RNZRA, Basri M, Salleh
AB (2009) Characterization and solvent stable features of strep-
tagged purified recombinant lipase from thermostable and solvent
tolerant Bacillus sp. strain 42. Ann Microbiol 59:111118
15. Hill T, Lewicki P (2007) Statistics: methods and applications.
StatSoft, Tulsa
16. Karnik SR, Gaitonde VN, Davim JP (2008) A comparative study
of the ANN and RSM modeling approaches for predicting burr
size in drilling. Int J Adv Manuf Technol 38:868883
17. Kwon DY, Rhee JS (1986) A simple and rapid colorimetric
method for determination of free fatty acids for lipase assay.
J Am Oil Chem Soc 63:8992
18. Lou W, Nakai S (2001) Application of artificial neural networks
for predicting the thermal inactivation of bacteria: a combined
effect of temperature, pH and water activity. Food Res Int
34:573579
19. Low CT, Mohamed R, Tan CP, Long K, Ismail R, Lo SK, Lai OM
(2007) Lipase-catalyzed production of medium-chain triacylgly-
cerols from palm kernel oil distillate: optimization using response
surface methodology. Eur J Lipid Sci Technol 109:107119
20. Maldonado LMTP, Hernandez VEB, Rivero EM, Rosa AB,
Flores JLF, Acevedo LGO, Rodrguez ADL (2007) Optimization
of culture conditions for a synthetic gene expression in Esche-
richia coli using response surface methodology: the case of
human interferon beta. Biomolecul Eng 24:217222
21. Nelofer R, Ramanan NR, Rahman RNZRA, Basri M, Ariff AB
(2010) Sequential optimization of production of a thermostable
and organic solvent tolerant lipase by recombinant Escherichiacoli. Ann Microbiol. doi:10.1007/s13213-010-0170-9
22. Noorossana R, Davanloo TS, Saghaei A (2009) An artificial
neural network approach to multiple-response optimization. Int J
Adv Manuf Technol 40:12271238
23. Pan H, Xie Z, Bao W, Zhang J (2008) Optimization of culture
conditions to enhance cis-epoxysuccinate hydrolase production in
Escherichia coli by response surface methodology. Biochem Eng
J 42:133138
24. Qiao CL, Shen BC, Xing JM, Huang J, Zhang JL, Zhao DH,
Yang B (2006) Culture and characteristics of recombinant protein
production of an Escherichia coli strain expressing carboxylest-
erase B1. Int Biodeterior Biodegrad 58:7781
J Ind Microbiol Biotechnol (2012) 39:243254 253
123
http://dx.doi.org/10.1007/s13213-010-0170-9http://dx.doi.org/10.1007/s13213-010-0170-9 -
7/30/2019 Comparison of the Estimation Capabilities of Response Surface
12/12
25. Rajendran A, Palanisamy A, Thangavelu V (2008) Evaluation of
medium components by Plackett-Burman statistical design for
lipase production by Candida rugosa and kinetic modelling. Chin
J Biotechnol 24:436444
26. Razmi-Rad E, Ghanbarzadeh B, Rashmekarim J (2008) An arti-
ficial neural network for prediction of Zeleny sedimentation
volume of wheat flour. Int J Agric Biol 10:422426
27. Singh V, Khan M, Khan S, Tripathi CKM (2009) Optimization of
actinomycin V production by Streptomyces triostinicus using
artificial neural network and genetic algorithm. Appl Microbiol
Biotechnol 82:379385
28. Tsao CC (2008) Comparison between response surface method-
ology and radial basis function network for core-center drill in
drilling composite materials. Int J Adv Manuf Technol 37:1061
1068
29. Youssefi SH, Emam-Djomeh Z, Mousavi SM (2009) Comparison
of artificial neural network (ANN) and response surface meth-
odology (RSM) in the prediction of quality parameters of spray-
dried pomegranate juice. Dry Technol 27:910917
254 J Ind Microbiol Biotechnol (2012) 39:243254
123