

Chemical Engineering Journal 207–208 (2012) 822–831


Support vector regression models for trickle bed reactors

Shubh Bansal a, Shantanu Roy a,*, Faical Larachi b

a Department of Chemical Engineering, Indian Institute of Technology – Delhi, Hauz Khas, New Delhi 110 016, India
b Department of Chemical Engineering, Laval University, Quebec City, Quebec, Canada G1V 0A6

Highlights

" Regression method based on Support Vector Machines." Data mined from over 22,000 experimental conditions from various authors." SVR method shows remarkably good predictability over this wide range of data." Improves on earlier neural network-based heuristic learning proposed by Larachi and co-workers [1]." Scalable and extendable to other multiphase reactor systems.

Article info

Article history: Available online 31 July 2012

Keywords: Support Vector Machines (SVMs); Support Vector Regression (SVR); Correlations; Trickle bed reactors; Machine learning


* Corresponding author. Tel.: +91 11 2659 6021; fax: +91 11 2658 1120. E-mail address: [email protected] (S. Roy).

Abstract

Transport phenomena in multiphase reactors are poorly understood, and first-principles modeling approaches have hitherto met with limited success. Industry thus continues to depend heavily on engineering correlations for variables like pressure drop, transport coefficients and wetting efficiencies. While immensely useful, engineering correlations typically show wide variations in their predictive capability when venturing outside the domain in which they were developed, and hence universally applicable correlations are rare. In this contribution, we present a machine learning approach for modeling such multiphase systems, specifically using the Support Vector Regression (SVR) algorithm. An application to trickle bed reactors is considered, wherein key design variables for which numerous correlations exist in the literature (with a large variation in their predictions) are all correlated using the SVR approach, with remarkable accuracy of prediction across all the different literature data sets in wide-ranging databanks.

© 2012 Elsevier B.V. All rights reserved.

1. Introduction

Estimation of various design variables in multiphase reactors has been a major challenge in the Chemical Engineering discipline since its early years. For design and scale-up, it is crucial to estimate properties such as heat and mass transport coefficients, wetting efficiencies and friction factors, as well as secondary functions like the discrimination of hydrodynamic flow regimes (e.g. [1,2]). Prior to the early 1990s, detailed modeling methodologies like computational fluid dynamics (CFD) and multi-scale and multi-physics approaches were not available, hence almost all industrial design and scale-up was based on correlations and, at best, phenomenological one-dimensional models. Two decades hence, we have made significant progress in multi-physics and multi-scale modeling approaches, and even commercially available packages like Comsol® Multiphysics, Ansys® Multiphysics and Fluent® profess the relative ease with which it is now possible to solve flow and transport problems in complex geometries. Be that as it may, these detailed modeling methodologies most often depend on closure models for various phenomena occurring at different scales. And since the progress of our understanding of the multi-scale physics is limited, these powerful numerical platforms are also limited in their predictions of key design variables. Thus, for all practical purposes, the status quo persists: empirical correlations remain the workhorse of industrial design. Effective mining of the experimental data collected over large ranges, cast into simple, easy-to-use correlations, is still a desirable objective in all designs.

The perennial question is: how can we make the correlations themselves more predictive, in terms of their accuracy and versatility, if it were possible to grasp via heuristic (or "soft") modeling the physical features concealed in wide-ranging databanks, while bypassing the need to construct ad hoc first-principles closures? Thus, a "parallel" approach to better physical models and better numerical algorithms for solving the first-principles transport equations (progress in which is welcome, and which may in the future become the workhorse of the design process) would be to develop rigorous methods for making correlation predictions more accurate for the data sets on which they have been developed, and also, if possible, to allow extrapolating them to situations in which data could not be or cannot be collected. This forms the motivation for this contribution.


Nomenclature

AARE  average absolute relative error, $\mathrm{AARE} = \frac{1}{N}\sum_{i=1}^{N}\left|\frac{y_{\mathrm{calc},i}-y_{\mathrm{exp},i}}{y_{\mathrm{exp},i}}\right|$ (–)
aLG  specific gas–liquid interfacial area (m²/m³)
C  SVM parameter (trade-off cost) (–)
dc  column diameter (m)
dp  equivalent diameter (m)
DG  diffusivity in gas phase (m²/s)
DL  diffusivity in liquid phase (m²/s)
FrL  liquid Froude number $= v_{SL}^2/(g\,d_p)$ (–)
g  acceleration due to gravity (m/s²)
kLa  liquid phase side volumetric mass transfer coefficient (1/s)
kGa  gas phase side volumetric mass transfer coefficient (1/s)
ScL  liquid phase Schmidt number $= \mu_L/(D_L\,\rho_L)$ (–)
ScG  gas phase Schmidt number $= \mu_G/(D_G\,\rho_G)$ (–)
ShL  liquid phase Sherwood number $= k_L a\, d_h^2/D_L$ (–)
ShG  gas phase Sherwood number $= k_G a\, d_h^2/D_G$ (–)
Sb  bed correction factor $= a_s d_h/(1-\varepsilon)$ (–)
vSL or UL  liquid superficial velocity (m/s)
vSG or UG  gas superficial velocity (m/s)
$\vec{x}^{(i)}$  set of inputs merged into a vector, for the ith training sample (–)
$x_j^{(i)}$  jth feature/attribute/input of the ith training sample (–)
$y^{(i)}$  output value/label corresponding to the ith training sample

Greek letters
ε  bed porosity (–)
σ  standard deviation of the relative (percentage) error of prediction, $\left[\frac{\sum_{i=1}^{N}\left(\left|\frac{y_{\mathrm{calc},i}-y_{\mathrm{exp},i}}{y_{\mathrm{exp},i}}\right| - \mathrm{AARE}\right)^2}{N-1}\right]^{1/2}$ (–)
φ  sphericity of packing particles (–)
σL  surface tension of liquid (N/m)
μ  dynamic viscosity (kg/m s)

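The two error measures used throughout the paper, AARE and σ as defined above, translate directly into code. The following is a minimal illustration (not part of the original article; the array names are hypothetical):

```python
import numpy as np

def aare(y_calc: np.ndarray, y_exp: np.ndarray) -> float:
    """AARE = (1/N) * sum(|(y_calc - y_exp) / y_exp|), as in the Nomenclature."""
    return float(np.mean(np.abs((y_calc - y_exp) / y_exp)))

def rel_error_std(y_calc: np.ndarray, y_exp: np.ndarray) -> float:
    """Sigma: sample standard deviation of |relative error| about the AARE."""
    rel = np.abs((y_calc - y_exp) / y_exp)
    return float(np.sqrt(np.sum((rel - rel.mean()) ** 2) / (len(rel) - 1)))
```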


It is generally well known that there is a wide variation in the predictions for the same dependent variable by correlations from different sources [1]. For example, a comparison of two of the best-known correlations for the gas-phase Sherwood number in trickle beds, from Refs. [3,4], is reported in [1], where the correlation predictions are compared with experimental values. The comparison with Ref. [3] reports an AARE (average absolute relative error) of 54.9% (standard deviation of 141%), and the comparison with Ref. [4] reports AAREs of 89.4% (standard deviation of 266%) and 37.2% (standard deviation of 167%), respectively, for the low interaction regime and the high interaction regime that prevail in trickle beds (see Fig. 5 in [1], reproduced as Fig. 1 here).

Fig. 1. Parity plot of predicted value of gas phase Sherwood number vs. experimental values obtained from all correlations in the database (adapted from [1]).

In other words, even the best and most popular correlations for various design parameters (the Sherwood number being only one illustrative example) have rather poor predictive capability when used outside the domains over which they were originally developed.

In recent years, some researchers (e.g. [1]) have attempted to consolidate a large body of published experimental data for hydrodynamics and transport in multiphase reactors using artificial neural network (ANN) models. What these researchers did was to develop neural-network-based models that were "trained" on a large body of data (from different, independently obtained datasets reported in the open literature), resulting in a heuristic learning tool that was able to predict the various design variables of choice with remarkable success. Fig. 1 clearly illustrates the improvement in predictability that could be achieved with the ANN approach [1]. Our current paper is largely inspired by this philosophy: an effective machine learning tool should be able to "wade through" a large body of empirical data and "learn" the trends, to the extent that its own predictability would far outweigh that of correlations developed from the individual data sets that constitute the data set in toto. We demonstrate this through a Support Vector Machines (SVMs) based approach in the following sections.

It is worth noting that the power and versatility of the SVM technique have already been demonstrated in a limited number of Chemical Engineering applications, particularly in the context of process control and optimization [5–8], in the development of sensors [9], and in correlating data for gas holdup in bubble columns [10]. The last reference is in some ways similar to our current contribution, though in a different context, the current work being dedicated to multi-variable correlations in trickle bed reactors.

2. Relevant theory of Support Vector Machines (SVMs) and Regression (SVR)

Support Vector Machines, or SVMs [11], are a machine learning method that was conceived and gained popularity in the late 1990s, and they have many attractive features which seem to put them in a better position than artificial neural networks (ANNs) as learning tools for large sets of empirical data. In particular, they directly address the issue that commonly plagues ANN models, i.e., overfitting. SVR theory has been rigorously developed from computational theory, as opposed to ANNs, whose development has taken a more heuristic path [12].



Rather than minimizing empirical risk, or training error (as is done in ANNs), SVMs minimize structural risk [12]. In SVR, the objective function (for instance, the error function which may need to be minimized) is convex, meaning that the global optimum is always reached. This is in sharp contrast to ANNs, where, for instance, the classical back-propagation learning algorithm is prone to convergence to "bad" local minima [12–14]. In practice, SVMs have greatly outperformed ANNs in a wide breadth of applications [12].

While a rigorous derivation of SVR data regression is outside the scope of this contribution and will be reported elsewhere (the derivation is along the same lines as the theory described in [12]), an illustration of the basic concept is in order. Suppose we want to model a dimensionless quantity, such as the specific gas–liquid interfacial area aLG, as a function of some other dimensionless groups, say the vector $\vec{x} = [Re_L, Re_G, We_L, We_G, Fr_L]$. That is, suppose we had the following modeling problem, and we wanted to find a function f that is optimal in some sense:

$a_{LG} = f(Re_L, Re_G, We_L, We_G, Fr_L)$    (1)

Conventionally, many correlations of relevance to Chemical Engineering are formulated by assuming a form of the function f that is a product of various powers of the dimensionless groups, pre-multiplied by a constant. Functions of this kind are inspired by the use of the Buckingham pi theorem for developing correlations. Notionally, such a form of f may be written as:

$a_{LG} = f(Re_L, Re_G, We_L, We_G, Fr_L) = k\,(Re_L)^{w_1}(Re_G)^{w_2}(We_L)^{w_3}(We_G)^{w_4}(Fr_L)^{w_5}$    (2)

Next, the parameter vector $[w_1, w_2, \ldots, w_5]$ is estimated by varying its corresponding input (dimensionless group) while holding all other inputs fixed, and computing the slope of the best-fit line of $\log(a_{LG})$ vs. $\log(\mathrm{input}_i)$. For instance, $w_1$ can be found by varying only $Re_L$ while holding all of $[Re_G, We_L, We_G, Fr_L]$ fixed, using the following expression:



$\log(a_{LG}) = w_1 \log(Re_L) + \log\!\left[k\,(Re_G)^{w_2}(We_L)^{w_3}(We_G)^{w_4}(Fr_L)^{w_5}\right]$    (3)

Note that when $Re_L$ is varied, $w_1$ is estimated from the slope of the $\log(a_{LG})$ vs. $\log(Re_L)$ graph, and the second term in Eq. (3) is merely the intercept (even though it itself consists of dimensionless groups whose powers also have to be estimated). In numerical implementations, various approximations are used for defining the "best" estimate of the "slope", in order to estimate the exponents of the various dimensionless groups.
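To make this concrete, the sketch below fits the pre-factor k and the exponents of Eq. (2) by a single least-squares solve in log space, a joint-fit variant of the one-input-at-a-time slope procedure just described. It is a minimal illustration, not from the original paper; the data are synthetic:

```python
import numpy as np

# Hypothetical data: rows are experiments, columns are the dimensionless
# groups [Re_L, Re_G, We_L, We_G, Fr_L]; y holds "measured" a_LG values.
rng = np.random.default_rng(0)
X = rng.uniform(1.0, 100.0, size=(50, 5))
y = 0.3 * X[:, 0] ** 0.4 * X[:, 1] ** 0.2 * np.exp(rng.normal(0.0, 0.05, 50))

# Eq. (2) is linear in log space: log y = log k + sum_j w_j * log x_j,
# so k and all exponents w_j follow from one least-squares solve (cf. Eq. (3)).
A = np.column_stack([np.ones(len(y)), np.log(X)])
coef, *_ = np.linalg.lstsq(A, np.log(y), rcond=None)
k, w = np.exp(coef[0]), coef[1:]
print(k, w)  # recovered pre-factor and exponents (close to 0.3, [0.4, 0.2, 0, 0, 0])
```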

Fig. 2. Parity plot of predicted value of specific gas–liquid interfacial area for the same database. (a) Using ANN based correlation [1]. (b) Using SVM: Model 1 (Table 1) with dimensional value of aLG. (c) Using SVM: Model 2 (Table 1) with dimensionless value of aLG.

Even though this has been regarded as a reasonable approach for decades, it clearly has some limitations. The first is that this is a very restricted functional space to explore, and the "true" relationship in our dataset may not really be best described by Eq. (2). Even if the functional form were optimal, this form of regression (as in Eq. (3)) will not in general give us the best set of parameters. Here, with respect to each input (on a logarithmic scale), the objective is to find a line that best represents the output's variation with respect to only that input. Note that even in multivariable regression, the various algorithms essentially follow this process (Eq. (3)), except that different variables may be optimized one at a time in different steps following some rules (depending on the algorithm used). In either case, when tradeoffs exist (which they do, because different dimensionless groups may be affected by the same physical quantity, like viscosity or density), optimizing each $w_i$ independently may not result in the best estimate of the parameter vector $\vec{w} = [w_1, w_2, \ldots, w_5]$.

In fact, the conventional method of regression building highlighted above does not, in general, minimize even the error it is trying to minimize, i.e., the so-called training error: the error over the data that is being used to build the correlation (which essentially "trains" the mathematical function f to give it predictive ability).




A more fundamental limitation of this approach is that it at best considers training error and pays no heed to generalization error, i.e., it provides no probabilistic guarantee regarding performance on new, unseen data. Thus, such correlations are almost never reliable for extrapolation into domains beyond the range in which the correlation was developed. In physical terms, this limitation applies not only to domains outside the operating conditions, but also to domains involving different materials and fluids. For instance, suppose our variable of interest enters some correlation through the Reynolds number. The reliability of that correlation is limited not only to flow velocities within the range of values available in the data from which it was developed, but also in its use with different kinds of fluids. As an example, if the correlation was developed using data from tests done with air and water as the fluids, its extrapolation to a situation involving liquid hydrocarbons and hydrogen gas is rather questionable. The latter situation, as researchers will readily recognize, is a common limitation faced when "academic" correlations developed on laboratory cold-flow units running on air–water are intended for use on industrial reactors running on hydrocarbons.

Table 1
SVR models for specific gas–liquid interfacial area (Fig. 2).

Model parameters and errors:

| Model | Inputs | Output | Kernel and parameters | Support vectors | Total samples | Training error | Validation error |
| Model 1 | log [uL, uG, Dc, ε, dv, φ, ρL, μL, σL, ρG, μG] | log aLG | Gaussian, γ = 1, C = 64, ε = 0.03125 | 758 | 1461 | 8.05% (1061 points) | 12.77% (400 points) |
| Model 2 | log [ReL, ReG, WeL, WeG, ScL, StL, XG, MoL, FrL, EoM, Sb] | log [aLG dh/(1-ε)] | Gaussian, γ = 1, C = 64, ε = 0.00391 | 1369 | 1461 | 10.9% (1061 points) | 17.2% (400 points) |

Profile by dimensional quantities:

| Percentile | aLG (m²/m³ reactor) | uL (m/s) | uG (m/s) | Dc (m) | ε (–) | dv (m) | φ (–) | ρL (kg/m³) | μL (Pa s) | σL (N/m) | DL (m²/s) | ρG (kg/m³) | μG (Pa s) |
| Min | 2.34E+01 | 5.81E-04 | 8.39E-03 | 2.30E-02 | 2.63E-01 | 1.16E-03 | 1.26E-01 | 8.05E+02 | 6.32E-04 | 1.06E-02 | 1.23E-10 | 1.15E+00 | 1.50E-05 |
| 5% | 7.69E+01 | 1.20E-03 | 2.77E-02 | 2.30E-02 | 3.37E-01 | 1.16E-03 | 1.39E-01 | 8.39E+02 | 6.90E-04 | 2.56E-02 | 2.40E-10 | 1.17E+00 | 1.71E-05 |
| 25% | 1.63E+02 | 3.60E-03 | 8.67E-02 | 5.00E-02 | 3.74E-01 | 2.40E-03 | 4.55E-01 | 1.00E+03 | 1.12E-03 | 4.00E-02 | 1.17E-09 | 1.19E+00 | 1.74E-05 |
| 50% | 3.30E+02 | 7.23E-03 | 2.50E-01 | 5.00E-02 | 4.08E-01 | 3.37E-03 | 9.11E-01 | 1.04E+03 | 1.40E-03 | 6.28E-02 | 1.60E-09 | 1.24E+00 | 1.78E-05 |
| 75% | 7.58E+02 | 2.12E-02 | 8.83E-01 | 1.00E-01 | 6.70E-01 | 8.81E-03 | 1.00E+00 | 1.08E+03 | 1.55E-03 | 6.44E-02 | 2.00E-09 | 1.70E+00 | 1.80E-05 |
| 95% | 2.45E+03 | 8.34E-02 | 2.00E+00 | 2.00E-01 | 9.30E-01 | 2.04E-02 | 1.00E+00 | 1.10E+03 | 2.32E-02 | 7.40E-02 | 3.97E-09 | 1.26E+01 | 1.80E-05 |
| Max | 1.07E+04 | 1.26E-01 | 4.50E+00 | 3.80E-01 | 9.40E-01 | 3.47E-02 | 1.00E+00 | 1.12E+03 | 6.60E-02 | 7.70E-02 | 4.06E-09 | 5.75E+01 | 2.02E-05 |

Profile by dimensionless numbers:

| Percentile | aLG (dimensionless) | ReL | ReG | WeL | WeG | ScL | StL | XG | MoL | FrL | EoM | Sb |
| Min | 6.86E-09 | 8.79E-02 | 1.77E+00 | 1.20E-05 | 3.55E-06 | 1.37E+08 | 1.95E-07 | 1.07E-02 | 1.81E-11 | 9.68E-06 | 2.69E-02 | 1.14E-06 |
| 5% | 2.52E-07 | 1.18E+00 | 5.47E+00 | 9.42E-05 | 7.78E-05 | 1.50E+08 | 2.12E-06 | 7.09E-02 | 3.59E-11 | 3.65E-05 | 6.64E-02 | 2.16E-06 |
| 25% | 3.52E-06 | 6.17E+00 | 3.12E+01 | 7.68E-04 | 1.22E-03 | 6.62E+08 | 1.39E-05 | 5.76E-01 | 9.10E-11 | 2.79E-04 | 5.25E-01 | 1.20E-05 |
| 50% | 1.89E-05 | 2.31E+01 | 1.28E+02 | 4.57E-03 | 8.36E-03 | 8.39E+08 | 6.25E-05 | 1.46E+00 | 1.36E-10 | 9.89E-04 | 7.72E-01 | 2.35E-05 |
| 75% | 4.26E-04 | 9.43E+01 | 4.34E+02 | 6.30E-02 | 1.05E-01 | 1.32E+09 | 2.18E-04 | 3.38E+00 | 7.97E-10 | 5.63E-03 | 9.62E+00 | 8.21E-04 |
| 95% | 5.96E-02 | 3.08E+02 | 1.89E+03 | 4.09E-01 | 1.18E+00 | 1.05E+11 | 7.15E-03 | 1.11E+01 | 2.16E-05 | 3.68E-01 | 1.07E+02 | 9.34E-02 |
| Max | 2.08E-01 | 2.90E+03 | 6.24E+03 | 4.38E+00 | 6.86E+00 | 4.23E+11 | 7.03E-02 | 4.83E+01 | 1.01E-02 | 9.21E-01 | 1.59E+02 | 1.79E-01 |

Trends by dimensional quantities (average aLG, m²/m³, per percentile bin of each input):

| Percentile bin | uL | uG | Dc | ε | dv | φ | ρL | μL | σL | DL | ρG | μG |
| 5% | 173.89 | 226.57 | 536.98 | 306.65 | 299.52 | 222.93 | 763.76 | 211.36 | 998.36 | 1649.18 | 191.32 | 330.73 |
| 25% | 314.73 | 354.93 | 838.08 | 495.55 | 954.32 | 268.16 | 792.01 | 388.76 | 519.43 | 507.68 | 223.57 | 458.51 |
| 50% | 428.55 | 528.21 | 1013.72 | 851.04 | 805.02 | 896.54 | 418.38 | 622.12 | 914.98 | 626.64 | 430.07 | 497.48 |
| 75% | 501.16 | 986.32 | 815.21 | 958.57 | 578.72 | 688.16 | 515.57 | 566.61 | 392.71 | 684.62 | 783.56 | 770.73 |
| 95% | 1183.67 | 832.79 | 401.67 | 257.26 | 248.69 | 706.61 | 514.06 | 990.77 | 572.09 | 370.81 | 1320.00 | 1002.19 |
| Max | 2197.52 | 443.83 | 284.05 | 234.79 | 192.37 | 706.61 | 1563.43 | 1075.10 | 275.08 | 258.98 | 384.77 | 931.35 |

Trends by dimensionless numbers (average dimensionless aLG per percentile bin of each input):

| Percentile bin | ReL | ReG | WeL | WeG | ScL | StL | XG | MoL | FrL | EoM | Sb |
| 5% | 2.72E-05 | 3.35E-06 | 2.55E-05 | 2.34E-05 | 1.33E-06 | 2.22E-02 | 4.35E-03 | 3.86E-02 | 6.88E-03 | 9.92E-08 | 9.92E-08 |
| 25% | 1.81E-05 | 1.56E-03 | 1.08E-03 | 2.99E-05 | 2.85E-02 | 3.00E-02 | 8.42E-04 | 1.89E-02 | 7.54E-03 | 7.66E-06 | 5.85E-06 |
| 50% | 8.15E-04 | 6.42E-05 | 2.26E-03 | 4.12E-05 | 7.04E-03 | 4.59E-03 | 3.56E-03 | 1.05E-02 | 4.66E-03 | 1.09E-05 | 1.19E-05 |
| 75% | 5.71E-03 | 1.35E-03 | 1.21E-02 | 3.85E-03 | 5.59E-03 | 1.57E-04 | 1.14E-02 | 1.31E-04 | 1.87E-02 | 1.54E-04 | 1.47E-04 |
| 95% | 1.67E-02 | 2.11E-02 | 1.94E-02 | 2.01E-02 | 1.31E-04 | 2.22E-05 | 1.47E-02 | 1.02E-04 | 3.90E-03 | 2.15E-02 | 2.19E-02 |
| Max | 6.52E-02 | 7.16E-02 | 9.33E-03 | 5.90E-02 | 9.27E-06 | 1.17E-05 | 2.04E-02 | 3.51E-05 | 1.59E-05 | 6.37E-02 | 8.65E-02 |

SVR addresses this fundamental limitation of conventional correlations in the following manner.

SVR only explores functions that are linear in the input space, i.e., functions of the following form, and by design it nearly (almost always) guarantees the set of parameters $\vec{w}$ that will minimize the generalization error for a given data set [12–14]:

$a_{LG} = f(Re_L, Re_G, We_L, We_G, Fr_L) = w_1 Re_L + w_2 Re_G + w_3 We_L + w_4 We_G + w_5 Fr_L + b$    (4)

Compare this with the functional form in Eq. (2). In order to consider functions of the form that we had considered earlier (Eq. (2)), one needs to transform the inputs. Thus, if in our original input space the input vectors were of the form $\vec{x} = [Re_L, Re_G, We_L, We_G, Fr_L]$, then in our transformed set of inputs, or mapped feature space, the inputs we will consider are of the form:

$\varphi(\vec{x}) = [\log Re_L, \log Re_G, \log We_L, \log We_G, \log Fr_L]$    (5)

Now, instead of applying Support Vector Regression (SVR) on $(\vec{x}, a_{LG})$, we apply it on $(\varphi(\vec{x}), \log a_{LG})$, and the form of functions that we will be exploring will be:

$\log f = w_1 \log Re_L + w_2 \log Re_G + w_3 \log We_L + w_4 \log We_G + w_5 \log Fr_L + b$    (6)




which is equivalent to exploring the following form:

$f = 10^{b}\,(Re_L)^{w_1}(Re_G)^{w_2}(We_L)^{w_3}(We_G)^{w_4}(Fr_L)^{w_5}$    (7)

Note that for k > 0 (Eq. (2)), Eq. (7) represents precisely the same family of functions as described by Eq. (2). Hence, we can essentially make the SVR learn precisely the same kind of correlations considered in the conventional approach; however, use of a form like Eq. (4) ensures that we are optimizing the generalization error, and optimizing it globally (not locally, which might happen even with advanced regression and pattern recognition techniques such as artificial neural networks (ANNs)).
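As an illustration of this equivalence, the sketch below trains a linear-kernel SVR on log-mapped inputs and outputs and reads the learned exponents back out, in the spirit of Eqs. (5)–(7). It is a sketch under assumptions, not the authors' code: it uses scikit-learn (whose SVR wraps LIBSVM), natural logarithms (so the pre-factor is exp(b) rather than 10^b), and synthetic stand-in data:

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(1)
X = rng.uniform(1.0, 100.0, size=(200, 5))    # columns: Re_L, Re_G, We_L, We_G, Fr_L
a_lg = 0.3 * X[:, 0] ** 0.4 * X[:, 1] ** 0.2  # synthetic power-law "truth", k = 0.3

# Feature map of Eq. (5): logarithms of the inputs, and log of the output.
phi = np.log(X)

# A *linear* SVR on (phi(x), log a_LG) explores exactly the family of Eq. (7).
model = SVR(kernel="linear", C=10.0, epsilon=0.01).fit(phi, np.log(a_lg))

w = model.coef_.ravel()                 # estimated exponents w_1 .. w_5
k = float(np.exp(model.intercept_[0]))  # pre-factor (exp(b) with natural logs)
print(k, w)                             # close to 0.3 and [0.4, 0.2, 0, 0, 0]
```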

We have so far considered only one kind of input transformation (which, in machine learning parlance, is referred to as feature mapping), i.e., one that maps the inputs logarithmically (Eq. (5)). There are many other transformations that work for different kinds of problems, such as polynomial and sigmoidal maps. These maps can be realized efficiently using different kernels. In particular, there exists the so-called Gaussian kernel [12–14], which maps each input vector to a vector with infinitely many terms. A detailed discussion of this kernel is beyond the scope of this short paper, but it suffices to say that it is extremely powerful and can adapt to almost any kind of non-linear relationship in data. Normally, the Gaussian-kernel SVR is characterized by three parameters: C, γ and ε. It is arguably the Gaussian kernel that lends SVR its true power, and it is of great relevance here, since we do not know the structure of the parameter space a priori in the context of the Chemical Engineering correlation-building exercise.
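A minimal sketch of Gaussian-kernel SVR follows (again via scikit-learn's LIBSVM wrapper); the parameter values merely echo the magnitudes reported later in Tables 1–3, and the target function is an arbitrary synthetic stand-in:

```python
import numpy as np
from sklearn.svm import SVR  # scikit-learn's SVR wraps LIBSVM [15]

rng = np.random.default_rng(2)
X = rng.uniform(-1.0, 1.0, size=(300, 5))
y = np.sin(3.0 * X[:, 0]) * np.exp(X[:, 1])  # nonlinear relation, unknown a priori

# Gaussian (RBF) kernel: K(x, x') = exp(-gamma * ||x - x'||^2).
# C trades off model flatness against training error; epsilon sets the
# width of the insensitive "tube" around the regression function.
model = SVR(kernel="rbf", C=64.0, gamma=1.0, epsilon=0.03125).fit(X, y)
print("number of support vectors:", len(model.support_vectors_))
```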

3. Implementation

LibSVM [15] (software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm) is a popular tool for SVM training that implements most of the functions required by the work contained within this paper. Thus, it was the platform on which the current work was implemented.

Fig. 3. Parity plot of predicted value of mass transfer coefficient for the same database. (a) Using ANN based correlation [1]. (b) Using SVM: Model 1 (Table 2) with dimensional value of kLa. (c) Using SVM: Model 2 (Table 2) with dimensionless value of ShL.

The essential steps in the implementation for the current problem are described as follows.

Step 1: Assembling. Assemble a training data set. Ideally, the data instances should be randomly and independently sampled, and their number should be at least (about) ten times the number of input attributes. More practically, make sure that the training set reasonably samples the entire range of inputs for which the model or correlation is to be used.

Step 2: Take logarithms of every value in the dataset, both inputs as well as outputs. This serves many purposes. First, it roughly imposes a percentage-insensitive loss on the output; second, correlations of forms such as Eq. (2) often become simpler in terms of logarithms and also provide good best-fit estimates. In fact, the latter is one of the reasons why conventional correlations of the type in Eq. (2) are popular in the first place. It also imposes non-negativity constraints on the output and input variables, something that is required in most physical situations.

Step 3: Scaling. Normalize all inputs to some common range (a typical range could be [-1, 1]). This prevents larger-magnitude inputs from dominating the optimization. Another variant of normalization, called standardization, which normalizes each feature to zero mean and unit variance, often works better in the presence of outliers [12].

Step 4: Select features. It is well known that identifying and removing noisy inputs from the dataset can lead to significantly superior models. In this context, we propose that a logical method to choose a subset from the set of possible inputs (as has been used in this work) is to choose the subset that leads to a model with the lowest cross-validation error (CVE).





If the set of inputs is small enough (say, at most about 10 input dimensionless groups, which is most often the case in building chemical engineering correlations), then we propose that the most robust way to identify the subset with the least CVE is to do a brute-force search over all possible subsets. Alternatively, instead of optimizing the CVE alone, if it is required to impose some constraints on the correlation or model, one can also add to the objective function a penalty for deviation of the model or correlation prediction from physical expectations, e.g., in the form of a priori topological–phenomenological rules. This is one of the ways in which physical constraints can be used to guide learning algorithms [16].

Step 5: Select kernel and model parameters. If the form of the function is known a priori (for instance, Eq. (2)), and only the function parameters are to be regressed, it is possible to "hand tune" a kernel that only allows the relevant functions. For instance, if we know that the (logarithm of the) friction factor depends linearly on the (logarithm of the) Reynolds number in the laminar region for flow through ducts, then it is possible to choose a kernel function accordingly. In case the form of the functional relationship in the model or correlation is not known a priori (unlike the situation of Eq. (2)), then popular kernels may be implemented, such as the linear, polynomial and Gaussian kernels. In this work, a Gaussian kernel has been implemented [12–14]. A popular way of optimizing model parameters and kernel parameters is to use so-called grid-search CVE minimization [17].


In this search method too, it is possible to modify the objective function so as to preferentially choose physically consistent models.

Step 6: Use cross-validation, as described above, to find the best estimates of the parameters (such as the unknowns in Eq. (6)).

Step 7: Test the parameters against a validation data set (which should be distinct from the training data set). A code sketch of this workflow is given after Table 2 below.

Table 2
SVR models for liquid side mass transfer coefficient (Fig. 3).

Model parameters and errors:

| Model | Inputs | Output | Kernel and parameters | Support vectors | Total samples | Training error | Validation error |
| Model 1 | log [uL, uG, Dc, ε, dv, φ, ρL, μL, σL, ρG, μG] | kLa | Gaussian, γ = 0.5, C = 32, ε = 0.00781 | 802 | 902 | 12.5% (600 points) | 18.3% (302 points) |
| Model 2 | log [ReL, ReG, WeL, WeG, ScL, StL, XG, MoL, FrL, EoM, Sb] | ShL | Gaussian, γ = 1.0, C = 64, ε = 0.03125 | 552 | 902 | 11.3% (600 points) | 17.04% (302 points) |

Profile by dimensional quantities:

| Percentile | kLa (1/s) | uL (m/s) | uG (m/s) | Dc (m) | ε (–) | dv (m) | φ (–) | ρL (kg/m³) | μL (Pa s) | σL (N/m) | DL (m²/s) | ρG (kg/m³) | μG (Pa s) |
| Min | 4.25E-04 | 6.67E-07 | 1.50E-03 | 1.58E-02 | 3.56E-01 | 5.41E-04 | 1.33E-01 | 6.91E+02 | 6.30E-04 | 1.06E-02 | 4.76E-11 | 1.19E-01 | 8.21E-06 |
| 5% | 5.01E-03 | 1.08E-03 | 2.45E-03 | 2.50E-02 | 3.56E-01 | 1.77E-03 | 1.45E-01 | 8.06E+02 | 8.00E-04 | 2.53E-02 | 2.27E-10 | 1.12E+00 | 1.45E-05 |
| 25% | 1.70E-02 | 4.65E-03 | 5.91E-02 | 5.00E-02 | 3.85E-01 | 2.40E-03 | 5.62E-01 | 9.98E+02 | 1.00E-03 | 4.77E-02 | 1.60E-09 | 1.15E+00 | 1.71E-05 |
| 50% | 4.12E-02 | 9.57E-03 | 1.48E-01 | 5.10E-02 | 4.00E-01 | 3.63E-03 | 9.11E-01 | 1.00E+03 | 1.20E-03 | 7.20E-02 | 1.82E-09 | 1.24E+00 | 1.77E-05 |
| 75% | 1.79E-01 | 2.29E-02 | 4.90E-01 | 1.14E-01 | 5.90E-01 | 6.00E-03 | 1.00E+00 | 1.08E+03 | 1.54E-03 | 7.20E-02 | 2.29E-09 | 1.27E+00 | 1.08E-05 |
| 95% | 1.17E+00 | 8.57E-02 | 2.00E+00 | 1.72E-01 | 8.90E-01 | 1.77E-02 | 1.00E+00 | 1.17E+03 | 2.15E-02 | 7.56E-02 | 2.89E-09 | 2.30E+00 | 1.08E-05 |
| Max | 7.04E+00 | 1.49E-01 | 4.50E+00 | 1.72E-01 | 8.90E-01 | 2.04E-02 | 2.33E+02 | 1.17E+03 | 2.48E-02 | 7.56E-02 | 9.79E-09 | 3.68E+01 | 1.82E-05 |

Profile by dimensionless numbers:

| Percentile | ShL | ReL | ReG | WeL | WeG | ScL | StL | XG | MoL | FrL | EoM | Sb |
| Min | 4.73E-03 | 1.00E-03 | 1.24E-02 | 2.90E-11 | 5.62E-09 | 1.03E+02 | 7.60E-09 | 2.14E-03 | 1.08E-11 | 1.51E-11 | 2.73E-02 | 6.17E-01 |
| 5% | 6.69E+00 | 4.46E-01 | 9.29E-01 | 5.03E-05 | 5.58E-07 | 3.51E+02 | 1.97E-06 | 1.78E-02 | 1.12E-11 | 2.43E-05 | 1.50E-01 | 2.53E+00 |
| 25% | 3.08E+01 | 7.16E+00 | 1.31E+01 | 1.15E-03 | 2.76E-04 | 5.24E+02 | 2.06E-05 | 1.99E-01 | 2.63E-11 | 4.68E-04 | 4.96E-01 | 2.74E+00 |
| 50% | 2.59E+02 | 3.09E+01 | 5.22E+01 | 6.57E-03 | 2.11E-03 | 6.13E+02 | 8.62E-05 | 6.65E-01 | 5.44E-11 | 2.17E-03 | 7.72E-01 | 3.28E+00 |
| 75% | 2.61E+03 | 9.09E+01 | 1.70E+02 | 5.85E-02 | 3.22E-02 | 7.31E+02 | 2.62E-04 | 1.96E+00 | 3.31E-09 | 9.68E-03 | 4.09E+00 | 1.02E+01 |
| 95% | 3.10E+04 | 6.40E+02 | 1.39E+03 | 6.24E-01 | 8.79E-01 | 7.66E+04 | 9.06E-03 | 9.31E+00 | 1.78E-05 | 2.27E-01 | 5.94E+01 | 2.15E+02 |
| Max | 9.55E+04 | 1.58E+03 | 5.88E+03 | 3.67E+00 | 6.51E+00 | 4.89E+05 | 6.83E-02 | 2.16E+02 | 3.18E-05 | 9.64E-01 | 1.33E+05 | 2.40E+02 |

Trends by dimensional quantities (average kLa, 1/s, per percentile bin of each input):

| Percentile bin | uL | uG | Dc | ε | dv | φ | ρL | μL | σL | DL | ρG | μG |
| Min | 0.00 | 0.00 | 0.04 | 0.03 | 0.01 | 0.03 | 0.00 | 0.02 | 0.03 | 0.03 | 0.00 | 0.00 |
| 5% | 0.01 | 0.01 | 0.05 | 0.03 | 0.88 | 0.03 | 0.03 | 0.05 | 0.62 | 0.66 | 0.04 | 0.04 |
| 25% | 0.03 | 0.04 | 0.48 | 0.06 | 0.58 | 0.21 | 0.31 | 0.04 | 0.49 | 0.30 | 0.08 | 0.18 |
| 50% | 0.05 | 0.09 | 0.50 | 0.20 | 0.25 | 0.51 | 0.12 | 0.13 | 0.17 | 0.31 | 0.20 | 0.13 |
| 75% | 0.14 | 0.25 | 0.17 | 0.49 | 0.10 | 0.17 | 0.13 | 0.14 | 0.05 | 0.22 | 0.17 | 0.37 |
| 95% | 0.55 | 0.80 | 0.09 | 0.19 | 0.13 | 0.18 | 0.40 | 0.56 | 0.11 | 0.20 | 0.46 | 0.55 |
| Max | 1.80 | 0.10 | 0.04 | 0.03 | 0.03 | 0.18 | 0.33 | 0.72 | 0.33 | 0.03 | 1.14 | 0.50 |

Trends by dimensionless numbers (average ShL per percentile bin of each input):

| Percentile bin | ReL | ReG | WeL | WeG | ScL | StL | XG | MoL | FrL | EoM | Sb |
| Min | 0.00 | 0.01 | 0.01 | 0.01 | 0.06 | 0.09 | 100.28 | 24.80 | 0.09 | 0.51 | 34.53 |
| 5% | 63.94 | 23.03 | 330.29 | 22.80 | 24251.88 | 7382.60 | 276.55 | 30071.75 | 713.80 | 64.53 | 206.17 |
| 25% | 637.78 | 93.71 | 206.85 | 88.63 | 10323.43 | 13139.75 | 3335.67 | 11490.97 | 1478.39 | 1048.73 | 127.98 |
| 50% | 961.09 | 709.83 | 997.42 | 1395.35 | 3549.78 | 3191.02 | 2216.22 | 3093.68 | 4204.82 | 710.76 | 483.81 |
| 75% | 1353.76 | 3100.60 | 3507.32 | 2547.03 | 1540.89 | 2509.96 | 3501.01 | 1316.76 | 7391.72 | 1262.97 | 1914.05 |
| 95% | 13377.54 | 8522.82 | 14399.75 | 8688.75 | 1203.84 | 740.72 | 10514.69 | 343.48 | 6277.79 | 11660.12 | 10750.30 |
| Max | 30272.60 | 40159.92 | 12848.76 | 39906.04 | 3603.12 | 4341.14 | 10006.43 | 2494.86 | 3942.60 | 44212.62 | 46973.75 |
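Steps 1–7 map naturally onto a short script. The sketch below is one possible rendering, not the authors' implementation: it uses scikit-learn (whose SVR wraps LIBSVM), with hypothetical arrays X and y standing in for the log-transformed databank of Steps 1 and 2, and illustrative grid ranges for Step 5:

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Hypothetical stand-ins: X holds already log-transformed inputs (Step 2),
# y the log-transformed output; replace with the databank of Step 1.
rng = np.random.default_rng(3)
X = rng.normal(size=(500, 6))
y = X @ rng.normal(size=6) + 0.1 * rng.normal(size=500)

# Step 7 needs a validation set distinct from the training set.
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

# Steps 3 and 5: standardization plus grid-search CVE minimization over
# C, gamma and epsilon (feature-subset search of Step 4 would wrap this loop).
pipe = make_pipeline(StandardScaler(), SVR(kernel="rbf"))
grid = GridSearchCV(
    pipe,
    param_grid={
        "svr__C": 2.0 ** np.arange(-2, 8, 2),
        "svr__gamma": 2.0 ** np.arange(-4, 4, 2),
        "svr__epsilon": 2.0 ** np.arange(-8, -2, 2),
    },
    scoring="neg_mean_absolute_error",
    cv=5,
).fit(X_tr, y_tr)

print("best parameters:", grid.best_params_)
print("validation MAE:", np.abs(grid.predict(X_val) - y_val).mean())
```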

4. Results and discussion

The methodology of using SVMs for correlation building, with the necessary background as described in Section 2, is sufficient to develop correlations for any chemical engineering process. Indeed, an instance of this has already been reported [10], wherein gas holdup in bubble columns was correlated using SVMs. In this work, we present an implementation of SVMs for developing generalized correlations for trickle bed reactors (TBRs) (fluid dynamic and mass transfer characteristics), using historical databases (22,000 experiments in all, across different parameters) on TBR performance reported in the open literature. This is philosophically along the same lines as [1], except that we find a remarkable improvement in overall prediction with the use of SVMs.



Fig. 4. Parity plot of predicted value of mass transfer coefficient for the same database. (a) Using ANN based correlation [1]. (b) Using SVM: Model 1 (Table 3) with dimensional value of kGa. (c) Using SVM: Model 2 (Table 3) with dimensionless value of ShG.


In [1], the researchers presented an extensive trickle bed-flow database built by consolidating experimental information from over 150 literature references spanning half a century. In doing so, the authors consolidated information over many gas–liquid systems (more than 30), over 30 column diameters, over 40 different packings, and an equally large number of packing bed heights. In all, this represented over 22,000 experiments. Operating conditions ranged up to high pressure and temperature (10 MPa and 350 °C), and the liquids spanned Newtonian, non-Newtonian, coalescing, non-coalescing, organic, pure, mixed and aqueous systems. The data also spanned various flow regimes, viz. low interaction regimes (LIR) and high interaction regimes (HIR). Arguably, data mining this large database was an extensive and commendable task. The central idea of that paper was that in multiphase reactors such as trickle bed reactors, the available range of correlations actually leads to a wide diversity of results. Thus, it is almost impossible to use any one correlation (or even a combination of a few) to make predictions of the system design parameters with reliable accuracy. For instance, Fig. 1 clearly shows (as an example) that for the gas-phase Sherwood number (dimensionless mass transfer coefficient), the predictions from the correlations in [3,4] lead to unacceptable (several hundred percent) standard deviations from experimental values. Iliuta et al. [1] reported an ANN model which successfully reduced this variability to around 10%, which was a big improvement on what individual correlations could predict, and which also had wide applicability through the entire database of 22,000 experimental conditions. It is, however, notable that not all parameters are populated in every observation. For our aLG model, for instance, when we extract the relevant data points (individual data collected from different research works) from the entire database reported in [1], the number comes to 1461 observations (which is still a very large number).

The same experimental database as in [1] was used in the current work, and SVMs were implemented to achieve the same task as in [1], using the logic outlined in Section 3.

Fig. 2 shows the first of the comparisons, attempting to predict the specific gas–liquid interfacial area aLG in TBRs, a variable that is not only very important to estimate for design, but is also one which conventional correlations, including those using ANNs [1], have found particularly difficult to model universally. A parity plot that summarizes the performance of popular "general" correlations, and of an ANN trained by Iliuta et al. [1], is presented in Fig. 2a. The important point to note from Fig. 2a is that the AARE of popular correlations was found to be as large as 148% when applied to the entire dataset, with an even higher standard deviation of as much as 356% in the so-called "low interaction regime". ANNs were able to improve prediction performance impressively, reducing the AARE to 28.1% with a standard deviation of 37%. Table 1 presents a summary of the SVR models that we have trained, and Fig. 2b and c present parity plots demonstrating their performance. We note a significant improvement in predictability compared to previous models: for instance, the prediction error reduced from 28.1% (for the ANN model trained on precisely the same dataset) to 12.8%.

Note that in Table 1 and Fig. 2b and c, two different SVM regression models (or simply, SVR models) have been presented. Model 1, which is designed to predict the dimensional gas–liquid interfacial area aLG (m²/m³), performs significantly better than Model 2, which is designed to predict the dimensionless gas–liquid interfacial area $a_{LG} d_h/(1-\varepsilon)$. The reader should also note that the number of support vectors (which is an indicator of model complexity) is much larger for Model 2 than for Model 1, indicating that Model 2 is the worse model on both counts.

Our next variable of interest was the liquid side mass transfer coefficient kLa, which in dimensionless terms is expressed as the liquid-side Sherwood number ShL. Fig. 3a shows the performance of previous models for the liquid phase Sherwood number, as reported in [1]. Similar to Fig. 2, two kinds of SVR models were trained, and the results are summarized in Table 2. Their performance is demonstrated through the parity plots in Fig. 3b and c, respectively. In this case, we find that Model 2 (which is based on the dimensionless number ShL being the output variable of the SVR) seems to perform marginally better. One notes from Table 2 that even though the predictability performance of Models 1 and 2 is similar, the number of support vectors required to reach the optimal predictability in the case of Model 2 is almost two-thirds of that for Model 1. Some explanation for this is provided below.

Finally, we present Fig. 4, which shows the prediction of the gas side mass transfer coefficient kGa, which in dimensionless terms is expressed as the gas-side Sherwood number ShG. This should be viewed in the context of Fig. 1, which presents the best-case predictability from the ANN model presented in [1], and also shows a comparison with all the models available in the open literature. The parameters of the SVR models are presented in Table 3. Again, one notes in Fig. 4 significant improvements over anything that has previously been presented in the open literature; notably, both the dimensional and the dimensionless descriptions for seeking the optimal SVR model lead to comparable results. From Table 3 and Fig. 4, it is clear that not only is the "goodness of fit" remarkable, but the required number of support vectors is also small (of the order of 300 or so).

With the idea of providing some backup information regarding our SVR models, in Tables 1–3, in addition to the model details described above, there are two tables of profiles by dimensionless and dimensional numbers in each case, as well as trends with dimensionless and dimensional numbers in each case. Essentially, in each case we show the "range" of parameters over which the SVR models were created (considering dimensionless-number regression or dimensional, physical-quantity regression). This corresponds to Table 1 in [1]. We highlight here the profiles of pertinent parameters in the data, indicating the ranges over which model predictions would have the most accuracy. The tables report the percentile distribution of each relevant parameter in the SVR regression model.




Further, in each of Tables 1–3, we show the "trends" in the output data with the various input variables. These entries highlight, for each input variable, the approximate trend of the output as a function of that input: each cell of the table reports the average value of the output over one percentile bin of the input.

In summary, the point to note is that over a reasonably wide range of parameters, SVR regression works almost equally well, and the regression does not show any noticeable bias in predicting the trends.

Table 3
SVR models for gas side mass transfer coefficient (Fig. 4).

Model parameters and errors:

| Model | Inputs | Output | Kernel and parameters | Support vectors | Total samples | Training error | Validation error |
| Model 1 | log [uL, uG, Dc, ε, dv, φ, ρL, μL, σL, ρG, μG] | kGa | Gaussian, γ = 0.25, C = 64, ε = 0.01563 | 431 | 498 | 3.06% (350 points) | 4.99% (148 points) |
| Model 2 | log [ReL, ReG, WeL, WeG, ScL, StL, XG, MoL, FrL, EoM, Sb] | ShG | Gaussian, γ = 0.5, C = 64, ε = 0.0039 | 270 | 498 | 3.69% (350 points) | 7.44% (148 points) |

Profile by dimensional quantities:

| Percentile | kGa (1/s) | uL (m/s) | uG (m/s) | Dc (m) | ε (–) | dv (m) | φ (–) | ρL (kg/m³) | μL (Pa s) | σL (N/m) | DL (m²/s) | ρG (kg/m³) | μG (Pa s) |
| Min | 8.84E-03 | 4.56E-04 | 3.83E-03 | 2.58E-02 | 2.73E-01 | 5.41E-04 | 1.39E-01 | 9.00E+02 | 1.00E-03 | 2.67E-02 | 6.11E-06 | 1.16E+00 | 1.71E-05 |
| 5% | 1.72E-01 | 6.64E-04 | 2.40E-02 | 5.00E-02 | 2.73E-01 | 1.35E-03 | 3.80E-01 | 9.00E+02 | 1.05E-03 | 2.67E-02 | 1.40E-05 | 1.17E+00 | 1.74E-05 |
| 25% | 5.81E-01 | 1.52E-03 | 5.48E-02 | 5.00E-02 | 3.65E-01 | 1.77E-03 | 8.13E-01 | 9.56E+02 | 1.50E-03 | 3.70E-02 | 1.40E-05 | 1.17E+00 | 1.74E-05 |
| 50% | 1.16E+00 | 4.14E-03 | 1.48E-01 | 5.00E-02 | 3.89E-01 | 2.40E-03 | 1.00E+00 | 1.01E+03 | 1.66E-03 | 4.00E-02 | 1.40E-05 | 1.19E+00 | 1.76E-05 |
| 75% | 2.24E+00 | 6.67E-03 | 2.72E-01 | 5.10E-02 | 4.53E-01 | 3.00E-03 | 1.00E+00 | 1.09E+03 | 3.20E-03 | 7.28E-02 | 1.60E-05 | 1.19E+00 | 1.80E-05 |
| 95% | 4.11E+00 | 1.50E-02 | 8.30E-01 | 1.52E-01 | 7.40E-01 | 2.21E-02 | 1.00E+00 | 1.39E+03 | 9.00E-03 | 7.77E-02 | 1.60E-05 | 1.24E+00 | 1.80E-05 |
| Max | 6.94E+00 | 1.62E-02 | 2.01E+00 | 1.52E-01 | 9.30E-01 | 2.21E-02 | 1.00E+00 | 1.39E+03 | 9.00E-03 | 7.77E-02 | 1.60E-05 | 1.56E+00 | 1.80E-05 |

Profile by dimensionless numbers:

| Percentile | ShG | ReL | ReG | WeL | WeG | ScL | StL | XG | MoL | FrL | EoM | Sb |
| Min | 1.49E-04 | 2.11E-01 | 1.44E-01 | 1.69E-06 | 1.29E-07 | 8.07E-01 | 6.20E-07 | 2.55E-02 | 2.54E-11 | 9.62E-07 | 2.70E-02 | 1.72E+00 |
| 5% | 3.52E-03 | 6.02E-01 | 2.22E+00 | 2.51E-05 | 1.35E-05 | 9.14E-01 | 5.81E-06 | 1.38E-01 | 2.99E-11 | 8.03E-06 | 3.52E-02 | 1.72E+00 |
| 25% | 2.54E-02 | 1.56E+00 | 7.90E+00 | 1.54E-04 | 1.69E-04 | 9.14E-01 | 4.11E-05 | 4.72E-01 | 1.61E-10 | 1.35E-04 | 2.36E-01 | 2.58E+00 |
| 50% | 8.87E-02 | 4.70E+00 | 1.80E+01 | 6.24E-04 | 1.26E-03 | 1.07E+00 | 1.05E-04 | 1.26E+00 | 1.02E-09 | 6.16E-04 | 4.50E-01 | 2.85E+00 |
| 75% | 4.05E-01 | 1.13E+01 | 5.44E+01 | 2.80E-03 | 7.06E-03 | 1.08E+00 | 2.50E-04 | 2.86E+00 | 2.61E-08 | 1.35E-03 | 9.37E-01 | 4.13E+00 |
| 95% | 7.17E+01 | 3.38E+01 | 7.84E+02 | 3.43E-02 | 1.44E-01 | 1.10E+00 | 4.23E-04 | 1.61E+01 | 1.35E-07 | 3.00E-03 | 1.03E+02 | 2.56E+01 |
| Max | 8.03E+02 | 1.01E+02 | 1.43E+03 | 1.15E-01 | 5.74E-01 | 2.35E+00 | 1.80E-03 | 5.33E+01 | 1.35E-07 | 5.03E-03 | 1.03E+02 | 2.18E+02 |

Trends by dimensional quantities (average kGa, 1/s, per percentile bin of each input):

| Percentile bin | uL | uG | Dc | ε | dv | φ | ρL | μL | σL | DL | ρG | μG |
| Min | 1.07 | 0.08 | 1.33 | 1.62 | 1.42 | 2.49 | 1.45 | 1.26 | 1.45 | 1.42 | 0.30 | 1.43 |
| 5% | 1.33 | 0.45 | 1.40 | 1.52 | 1.41 | 1.57 | 1.34 | 1.53 | 1.27 | 1.50 | 0.70 | 1.69 |
| 25% | 1.39 | 1.14 | 1.40 | 1.54 | 1.41 | 1.39 | 1.23 | 1.33 | 1.26 | 1.50 | 1.70 | 1.59 |
| 50% | 1.48 | 2.10 | 1.47 | 1.01 | 1.46 | 1.43 | 1.97 | 1.58 | 1.43 | 1.59 | 1.68 | 1.50 |
| 75% | 2.05 | 2.20 | 2.02 | 1.55 | 1.88 | 1.43 | 1.71 | 1.63 | 1.82 | 1.82 | 1.96 | 1.34 |
| 95% | 2.24 | 3.99 | 1.69 | 2.49 | 0.88 | 1.43 | 1.69 | 1.69 | 1.73 | 1.82 | 2.91 | 1.34 |
| Max | 1.67 | 6.94 | 1.69 | 3.88 | 0.88 | 1.43 | 1.69 | 1.69 | 1.73 | 1.82 | 4.25 | 1.34 |

Trends by dimensionless numbers (average ShG per percentile bin of each input):

| Percentile bin | ReL | ReG | WeL | WeG | ScL | StL | XG | MoL | FrL | EoM | Sb |
| Min | 0.45 | 0.00 | 0.02 | 0.00 | 25.91 | 44.80 | 0.02 | 0.12 | 36.20 | 0.01 | 0.02 |
| 5% | 3.91 | 0.02 | 3.75 | 0.02 | 31.10 | 80.31 | 0.48 | 53.36 | 5.07 | 0.03 | 0.07 |
| 25% | 6.13 | 0.06 | 6.30 | 0.08 | 14.98 | 3.61 | 5.49 | 0.09 | 21.06 | 0.11 | 0.13 |
| 50% | 2.16 | 0.16 | 3.31 | 0.45 | 50.75 | 0.46 | 19.15 | 0.08 | 20.36 | 0.11 | 0.09 |
| 75% | 30.07 | 50.33 | 69.37 | 34.64 | 0.08 | 0.33 | 57.06 | 21.75 | 28.26 | 65.39 | 25.31 |
| 95% | 252.26 | 177.31 | 36.18 | 246.95 | 0.01 | 0.05 | 33.65 | 31.10 | 1.10 | 55.75 | 163.69 |
| Max | 539.90 | 110.84 | 51.88 | 803.03 | 0.00 | 0.00 | 6.09 | 31.10 | 0.00 | 55.75 | 448.95 |

In the above analysis, it is clear that making variables dimensionless does not necessarily lead to better mathematical descriptions in terms of correlation. Conventional correlations benefitted from making variables dimensionless because it was the most convenient and intuitive route for data mining in the times before machine learning. Also, physical meaning could be ascribed to the dimensionless groups, which broadly provided some insight into the physical phenomena and helped guide design and scale-up. However, if the latter requirement were not an overriding factor, the "structure" of the space of physical variables (not dimensionless variables) does not necessarily make the dimensionless correlation the optimal way of describing their interrelationships. Fig. 2 makes this amply clear. On the other hand, Figs. 3 and 4 seem to indicate that the predictions are similar no matter whether one chooses dimensional (physical) or dimensionless (compact) variables. In the latter cases, the space of variables would have had to be rather flat to permit this result, which we derive mathematically through the SVM regression method.

One important issue of concern to engineers is the "scale" issue: is the proposed regression method better suited to some scales and worse at others? The answer lies in the parity plots, which indicate that the same SVR models perform to similar levels of accuracy across the large variation in scale that our data span. To illustrate this point further, we have added Table 4 as an example of the fact that the error distribution does not show any noticeable scale dependence.



Table 4
Percentile distribution of AARE for the illustrative case of the liquid side Sherwood number (Fig. 3(c)).

| Percentile bin (%) | ReL (%) | ReG (%) | WeL (%) | WeG (%) | ScL (%) | StL (%) | XG (%) | MoL (%) | FrL (%) | EoM (%) | Sb (%) |
| 0 | – | – | – | – | – | – | – | – | – | – | – |
| 1 | 24 | 24 | 24 | 24 | 24 | 46 | 31 | – | 24 | 8 | 24 |
| 25 | 24 | 19 | 23 | 22 | 10 | 15 | 17 | 14 | 24 | 14 | 13 |
| 50 | 15 | 21 | 17 | 15 | 18 | 17 | 17 | 17 | 16 | 17 | 24 |
| 75 | 18 | 17 | 17 | 20 | 15 | 17 | 17 | 17 | 15 | 24 | 15 |
| 99 | 12 | 11 | 12 | 11 | 21 | 18 | 16 | 21 | 13 | 15 | 12 |
| 100 | 10 | 9 | 6 | 9 | 12 | 1 | 32 | – | 12 | 6 | – |


Table 4, which is for the illustrative case of the liquid side Sherwood number, shows the average AARE over percentile bins of the different parameter ranges. Note that the overall average AARE is 17.05 ± 18.97% (Fig. 3c). Similar results were obtained in the other cases as well. Thus, "scale" dependence is well incorporated in the designed SVR correlations over the documented range of the databanks.
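The binned-error analysis behind Table 4 is straightforward to reproduce. A minimal sketch, assuming arrays of one input variable and of absolute relative errors (hypothetical names, not the authors' code):

```python
import numpy as np

def aare_per_bin(x, rel_err, edges_pct=(0, 1, 25, 50, 75, 99, 100)):
    """Mean |relative error| within percentile bins of one input x (cf. Table 4)."""
    edges = np.percentile(x, edges_pct)
    means = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (x >= lo) & (x <= hi)
        means.append(rel_err[mask].mean() if mask.any() else np.nan)
    return dict(zip([f"up to {p}th percentile" for p in edges_pct[1:]], means))
```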

Finally, we present the generalizing ability of SVR in the present context. One merit of conventional correlations is that, since the number of "choosable" parameters is small, we intuitively expect good regression results on a large sample set to be statistically significant, and hence expect the results to generalize well to unseen data. In this experiment, we demonstrate the generalizing ability of SVR, again using the TBR dataset, specifically modeling aLG. The SVR model was developed in this case using 400 randomly sampled points, and the remaining 1061 points were treated as unseen test data. The results are shown in Fig. 5, which summarizes the performance of this model on the unseen data. We see that despite the model having "seen" only a small fraction of the dataset, it is able to successfully predict the large unseen fraction with a reasonable AARE of 21.6%. In other words, even if one attempts to "extrapolate" the predictions to more conditions than were originally used in developing the SVM correlations, the results are still acceptable.

Fig. 5. Parity plot demonstrating the generalization ability of SVR models. The model was trained on only 400 randomly sampled points and was tested on the remaining 1061 points.
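The experiment amounts to a random 400/1061 split followed by an AARE evaluation on the held-out points. A sketch with synthetic stand-in data (the real databank is described above; parameter values are illustrative):

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(7)
X = rng.normal(size=(1461, 11))                       # stand-in inputs
y = np.exp(0.5 * X[:, 0] - 0.2 * X[:, 1]              # positive, power-law-like output
           + 0.05 * rng.normal(size=1461))

idx = rng.permutation(len(y))                         # random 400-point training sample
train, test = idx[:400], idx[400:]

model = SVR(kernel="rbf", C=64.0, gamma=1.0, epsilon=0.03)
model.fit(X[train], np.log(y[train]))                 # regress in log space (Step 2)

y_pred = np.exp(model.predict(X[test]))
aare = np.mean(np.abs((y_pred - y[test]) / y[test]))
print(f"AARE on {len(test)} unseen points: {aare:.1%}")
```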

5. Summary and conclusions

In this work, we have presented a machine learning method of empirical modeling, and its adaptation towards correlation building in Chemical Engineering problems, specifically trickle bed reactors. We have demonstrated its ability to adapt to highly non-linear relationships in data which otherwise permit only poor predictions, with a lot of scatter, from conventional correlations. Our method, using SVMs, avoids over-fitting while retaining the generalization power of simple, flat models. In that sense, the SVR approach improves significantly over the earlier "best option" available, based on correlation development using artificial neural networks [1]. We also show that making variables dimensionless is not necessarily the best option in terms of accuracy of correlations; depending on the structure of the parameter space, dimensional SVR correlations may actually be better. Of course, the relative merit of dimensionless groups in correlations lies in their physical interpretation.

Until rigorous theory catches up with engineering need, empirical models combined with diversified and broad enough databanks will probably keep dominating the realm of design in industry. SVR is a very powerful learning paradigm that can lead to vastly superior predictive models compared to those that are popular today. The proposed algorithm allows us to extract physical insights from complex data by highlighting their interrelationships; this can not only help us model the data better statistically, but can also help in deriving rigorous theory.

Endnote

The programs developed as part of this work will be made available in the public domain and will be shared with anyone who wishes to obtain a copy. Interested persons are advised to write an email requesting the same to the corresponding author. The models are also being made available as Supplementary Material with the online version of this manuscript.

Appendix A. Supplementary material

Models created in this work (as Excel sheets) can be found, in the online version, at http://dx.doi.org/10.1016/j.cej.2012.07.081.

References

[1] I. Iliuta, A. Ortiz-Arroyo, F. Larachi, B.P.A. Grandjean, G. Wild, Hydrodynamics and mass transfer in trickle-bed reactors: an overview, Chem. Eng. Sci. 54 (1999) 5329–5337.

[2] C. Vial, S. Poncin, G. Wild, N. Midoux, A simple method for regime identification and flow characterisation in bubble columns and airlift reactors, Chem. Eng. Process. 40 (2001) 135–151.

[3] W. Yaïci, A. Laurent, N. Midoux, J.C. Charpentier, Détermination des coefficients de transfert de matière en phase gazeuse dans un réacteur catalytique à lit fixe arrosé en présence de phases liquides aqueuses et organiques, Bull. Soc. Chim. France 6 (1985) 1032.

[4] G. Wild, F. Larachi, J.C. Charpentier, Heat and mass transfer in gas–liquid–solid fixed bed reactors, in: M. Quintard, M. Todorovic (Eds.), Heat and Mass Transfer in Porous Media, Elsevier, Amsterdam, The Netherlands, 1992, p. 616.

[5] A. Kulkarni, V.K. Jayaraman, B.D. Kulkarni, Knowledge incorporated support vector machines to detect faults in Tennessee Eastman Process, Comput. Chem. Eng. 29 (10) (2003) 2128–2133.



[6] A. Kulkarni, V.K. Jayaraman, B.D. Kulkarni, Control of chaotic dynamical systems using support vector machines, Phys. Lett. A 317 (5–6) (2003) 429–435.

[7] M. Agrawal, A.M. Jade, V.K. Jayaraman, B.D. Kulkarni, Support vector machines: a useful tool for process engineering applications, Chem. Eng. Prog. 98 (1) (2003) 57–62.

[8] S. Nandi, Y. Badhe, J. Lonari, U. Sridevi, B.S. Rao, S.S. Tambe, B.D. Kulkarni, Hybrid process modeling and optimization strategies integrating neural networks/support vector regression and genetic algorithms: study of benzene isopropylation on H-beta catalyst, Chem. Eng. J. 97 (2004) 115–129.

[9] K. Desai, Y. Badhe, S.S. Tambe, B.D. Kulkarni, Soft-sensor development for fed-batch bioreactors using support vector regression, Biochem. Eng. J. 27 (3) (2006) 225–239.

[10] A.B. Gandhi, J.B. Joshi, V.K. Jayaraman, B.D. Kulkarni, Development of support vector regression (SVR)-based correlation for prediction of overall gas hold-up in bubble column reactors for various gas–liquid systems, Chem. Eng. Sci. 62 (24) (2007) 7078–7089.

[11] C. Cortes, V. Vapnik, Support-vector networks, Mach. Learn. 20 (1995) 273–297.

[12] L. Wang, Support Vector Machines: Theory and Applications, Springer, New York, 2005.

[13] C. Burges, A tutorial on support vector machines for pattern recognition, Data Min. Knowl. Disc. 2 (1998) 121–167.

[14] D. Basak, S. Pal, D.C. Patranabis, Support vector regression, Neural Inf. Process. Lett. Rev. 11 (10) (2007) 203–224.

[15] C.-C. Chang, C.-J. Lin, LIBSVM: a library for support vector machines, ACM Trans. Intell. Syst. Technol. 2 (2011) 27:1–27:27.

[16] L. Tarca, B. Grandjean, F. Larachi, Reinforcing the phenomenological consistency in artificial neural network modeling of multiphase reactors, Chem. Eng. Process. 42 (8–9) (2003) 653–662.

[17] C.W. Hsu, C.C. Chang, C.J. Lin, A Practical Guide to Support Vector Classification, Department of Computer Science, National Taiwan University, 2010.