predicting saturates of sour vacuum gas oil using artificial neural networks and genetic algorithms

Expert Systems with Applications 37 (2010) 4768–4771

Contents lists available at ScienceDirect

Expert Systems with Applications

journal homepage: www.elsevier.com/locate/eswa

Predicting saturates of sour vacuum gas oil using artificial neural networks and genetic algorithms

Shouchun Wang *, Xiucheng Dong, Renjin Sun

School of Business Administration, China University of Petroleum, Beijing 102249, PR China

Article info

Keywords: Saturates; Sour; Vacuum gas oil; Prediction; Artificial neural networks; Genetic algorithm

0957-4174/$ - see front matter © 2009 Elsevier Ltd. All rights reserved.

doi:10.1016/j.eswa.2009.11.073

* Corresponding author. Tel.: +86 010 89734397. E-mail addresses: [email protected] (S. Wang), dongxiucheng@yahoo.com.cn (X. Dong), [email protected] (R. Sun).

Abstract

Accurate prediction of the chemical composition of sour vacuum gas oil (VGO) fractions from their physical properties is important for the refinery. In this paper, a feed-forward neural network optimized by a genetic algorithm (GA) was developed and used for predicting the saturates of sour vacuum gas oil. The number of neurons in the hidden layer, the momentum and the learning rates were determined by the genetic algorithm. Five physical properties of sour VGO, namely average boiling point, density at 20 °C, molecular weight, kinematic viscosity at 100 °C and refractive index at 70 °C, were used as input variables of the ANN, and the saturates of sour VGO was used as the output variable. The study shows that the genetic algorithm can find the optimal network architecture and parameters of the back-propagation algorithm. Further, the artificial neural network model based on the genetic algorithm was tested, and the results indicate that the adopted model is well suited to forecasting the saturates of sour VGO. Compared with other forecasting models, this model improves prediction accuracy.

© 2009 Elsevier Ltd. All rights reserved.

1. Introduction

Vacuum gas oil (VGO, 370–560 °C) is an important fraction in crude processing and the main feedstock for catalytic cracking and hydrotreatment. The hydrocarbon group-type composition of VGO is fundamental data for engineering design, process optimization, catalyst selection and assessment of end-product quality, and it plays an essential role in analytical procedures, secondary processing, engineering calculation and process modeling. Refineries increasingly process sour crude, so studies of the relationship between the hydrocarbon group-type composition and the physical properties of sour crude VGO are becoming more and more important.

The hydrocarbon group-type composition of petroleum fractions with different boiling ranges is obtained by different methods and specific equipment. At present, the hydrocarbon type composition of VGO is obtained by liquid chromatography and thin-layer chromatography with flame ionization detection (TLC–FID). However, these techniques have proven time-consuming and costly. Alternatively, correlations can be built between the physical properties and the hydrocarbon group-type composition of a petroleum fraction.


A number of related studies have been performed in the past few decades. In the early studies, empirical methods were widely used. For example, van Nes and van Westen (1951) developed the n-d-M method. Waterman, Boehower, and Cornelissen (1958) used simple graphic statistical methods to correlate physical properties with the chemical structure of mineral oils. Riazi and Daubert (1980) developed correlation equations based on the factors VGF, VGC and RI. The correlations of Riazi and Daubert (1986) are based on these factors and some new ones, such as the carbon to hydrogen weight ratio and molecular weight. The correlations of Nwadinigwe and Okoroji (1990) are based on the refractivity intercept. In more recent studies, the correlations of El-Hadi and Bezzina (2005) are based on normal boiling point, density, refractive index and carbon to hydrogen weight ratio. The correlations of Xing and Chen (2006) are based on average boiling point, density, hydrogen content, molecular weight, refractive index and viscosity. Liu, Tian, Liu, and Wang (2007) used a stepwise regression method to correlate the hydrocarbon type of VGO with five of its physical properties (density, refractive index, molecular weight, and the contents of sulfur and hydrogen). However, the correlation between the hydrocarbon group-type composition of sour crude VGO and its physical properties is complex and nonlinear, and these methods were found to give less accurate results. Moreover, these methods are applied mainly to light fractions. Sun, Wang, and Zhao (2007) used artificial neural networks to predict the saturates of sour vacuum gas oil. However, designing such a network is tedious and difficult, and it depends mainly on the designer's experience.


S. Wang et al. / Expert Systems with Applications 37 (2010) 4768–4771 4769

In this study, a new method is proposed to improve the prediction of the saturates of sour VGO by using a genetic algorithm to optimize a multilayer feed-forward network. The genetic algorithm was used to search for the optimal architecture and training parameters of the ANN. In other words, a novel artificial neural network model based on a genetic algorithm was developed to relate saturates to the average boiling point, density, kinematic viscosity, molecular weight and refractive index of sour VGO.

2. Artificial neural networks and genetic algorithm

2.1. Artificial neural networks

An artificial neural network (ANN) is a flexible mathematical model that imitates the working principles of the human brain. Three elements are particularly important in any model of artificial neural networks: the structure of the nodes, the topology of the network and the learning algorithm used to find the weights of the network (Rojas, 1996). An ANN is composed of many nodes that operate in parallel and communicate with each other through connecting weights. In terms of topology, neural networks can be divided into two types: feed-forward networks and recurrent networks. Two types of learning are used in neural networks: supervised and unsupervised. Owing to its distributed structure, an ANN can realize complex nonlinear mappings by using a learning algorithm to adjust the connection weights. ANNs can discover patterns adaptively from data and are capable of learning complex relationships from many individual examples (or experiences). In addition, they tolerate faults and noise and offer high robustness and generalization capability. They have been used with success for prediction.

2.2. Genetic algorithm

The genetic algorithm (GA) is a global, heuristic, stochastic optimization technique based on evolution theory and genetic principles, developed by Holland (Holland, 1975). Goldberg and Michalewicz discussed the mechanism and robustness of GA in solving nonlinear optimization problems (Goldberg, 1989; Michalewicz, 1992). The algorithm begins with a randomly generated population of chromosomes and applies three kinds of genetic operators, namely selection, crossover and mutation, to find the optimal solutions. The selection operator chooses chromosomes from the current population based on the fitness values of the individuals. The crossover operator combines the features of two parent chromosomes to form two similar offspring by swapping corresponding segments of the parents (Goldberg, 1989). The mutation operator creates new chromosomes by randomly changing the genes of existing chromosomes. Because these genetic manipulations let GA explore the entire design space, it does not easily become trapped in a local minimum or maximum. GA is therefore an aggressive search technique that can converge quickly towards the optimal solution in a large solution domain.
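The three operators described above can be sketched on a toy problem. The following is a minimal illustration, not the implementation used in this study: it assumes a binary chromosome, a hypothetical fitness function (the number of ones in the chromosome, to be maximized), roulette-wheel selection and one-point crossover. All function names and default values are illustrative.

```python
import random

def roulette_select(population, fitness, rng):
    """Pick one chromosome with probability proportional to its fitness."""
    pick = rng.uniform(0, sum(fitness))
    running = 0.0
    for chrom, fit in zip(population, fitness):
        running += fit
        if running >= pick:
            return chrom
    return population[-1]

def one_point_crossover(a, b, rng):
    """Swap the tails of two parents at a random cut point."""
    cut = rng.randint(1, len(a) - 1)
    return a[:cut] + b[cut:], b[:cut] + a[cut:]

def mutate(chrom, rate, rng):
    """Flip each gene independently with probability `rate`."""
    return [1 - g if rng.random() < rate else g for g in chrom]

def run_ga(n_bits=16, pop_size=20, generations=30, p_cross=0.9, p_mut=0.01, seed=0):
    """Evolve bit strings towards the all-ones string (toy objective, an assumption)."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    best = max(population, key=sum)
    for _ in range(generations):
        # Small offset keeps the roulette wheel valid if every chromosome is all zeros.
        fitness = [sum(c) + 1e-9 for c in population]
        children = []
        while len(children) < pop_size:
            p1 = roulette_select(population, fitness, rng)
            p2 = roulette_select(population, fitness, rng)
            if rng.random() < p_cross:
                c1, c2 = one_point_crossover(p1, p2, rng)
            else:
                c1, c2 = p1[:], p2[:]
            children.extend([mutate(c1, p_mut, rng), mutate(c2, p_mut, rng)])
        population = children[:pop_size]
        best = max(population + [best], key=sum)  # remember the best chromosome seen so far
    return best

print(run_ga())
```

In an application such as the one in this paper, the toy fitness would be replaced by a measure of network error, and the chromosome would encode network settings rather than arbitrary bits.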

Table 1
Summary of input variables.

Factors                  Symbol   Unit
Average boiling point    Tb       °C
Density                  D        gram per cubic centimetre (g/cm3)
Molecular weight         M        u
Kinematic viscosity      ν        mm2/s
Refractive index         n
Saturates                S        %

3. The ANN–GA model development and application

3.1. Data definitions and sources

The previous literature shows that the most important factors influencing the saturates of sour vacuum gas oil are average boiling point, density, molecular weight, viscosity, refractive index and carbon residue. Based on the available experimental data, five basic physical properties were selected as input variables to predict saturates. All influencing factors are summarized in Table 1.

Sample data were collected from 122 kinds of sour vacuum gas oil at home and abroad, and analysis data for 51 kinds of sour crude VGO were selected. Of these data, 90% were used for training and cross-validation, and the remainder were used as testing data. The test dataset is used to check the network for over-training (over-fitting): in the ANN testing phase, the objective is to determine whether the weight coefficients obtained during training have over-fitted the training data.
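As an illustration of the 90%/10% partition, the sketch below splits 51 sample indices at random. The paper does not state how the samples were assigned or the exact counts, so the random shuffle, the seed and the rounding rule are assumptions. Rounding 0.9 × 51 gives 46 training samples and 5 testing samples, which agrees with the five rows reported in Table 6.

```python
import random

def split_indices(n_samples, train_fraction=0.9, seed=0):
    """Randomly partition sample indices into training and testing subsets."""
    rng = random.Random(seed)
    indices = list(range(n_samples))
    rng.shuffle(indices)
    n_train = round(train_fraction * n_samples)
    return sorted(indices[:n_train]), sorted(indices[n_train:])

train_idx, test_idx = split_indices(51)
print(len(train_idx), len(test_idx))  # 46 5
```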

3.2. ANN–GA model scheme

Although many types of neural network models have been developed, the most popular in applications is the back-propagation (BP) neural network, a multilayer feed-forward neural network trained by the error back-propagation algorithm. In this work, a multi-layered feed-forward neural network with a back-propagation algorithm was adopted. Although the algorithm is successful, it has some disadvantages. It is not guaranteed to find the global minimum of the error space, and its convergence tends to be extremely slow. In addition, the convergence of the BP neural network is affected by the choice of the learning factor (learning rate) and inertial factor (momentum), which are usually determined by experience. In the present work, the number of neurons in the hidden layer, the momentum and the learning rates were determined using the genetic algorithm.

The number of hidden layers and the number of nodes in each layer depend on the complexity of the patterns and the nature of the problem to be solved. A single hidden layer is sufficient to approximate any continuous function as closely as required (Funahashi, 1989; Hornik, Stinchcombe, & White, 1990), and it has also been shown that having more than three layers may not result in significant performance improvements (Patuwo, Hu, & Hung, 1993). Thus, in our study, a three-layer ANN was used (see Fig. 1).

These five influencing factors were used as inputs to the ANN–GA network and the saturates of sour VGO as the output. The numbers of neurons in the input and output layers are given by the numbers of input and output variables of the network. The number of neurons in the hidden layer was obtained by GA. The results of the study showed that
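The influence of the learning factor and the inertial factor can be illustrated with the commonly used momentum form of the weight update, in which each step is the negative gradient scaled by the learning rate plus a fraction of the previous step. The paper does not give its update equation, so this form, the one-dimensional quadratic error surface and the parameter values below are assumptions chosen only for illustration.

```python
def momentum_step(weight, grad, prev_delta, learning_rate, momentum):
    """One gradient-descent update with a momentum (inertial) term."""
    delta = -learning_rate * grad + momentum * prev_delta
    return weight + delta, delta

def minimise(learning_rate=0.1, momentum=0.7, steps=200):
    """Minimise the toy error E(w) = 0.5 * (w - 3)^2, whose minimum is at w = 3."""
    w, delta = 0.0, 0.0
    for _ in range(steps):
        grad = w - 3.0  # dE/dw for the toy error surface
        w, delta = momentum_step(w, grad, delta, learning_rate, momentum)
    return w

print(minimise())
```

On this convex toy surface any reasonable pair of values converges. On the non-convex error surface of a real network the two values interact with the architecture, which is why the study leaves their selection to the GA.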

Fig. 1. The structure of a three-layer feed-forward artificial neural network.

Fig. 2. The optimization process flow chart of GA.

Fig. 3. Best fitness (MSE) vs. generation for ANN–GA.

Table 3
Results of the trial-and-error method and their comparison to the GA method.

Model     Number of neurons   R       NMSE
ANN–GA    8                   0.996   0.011
ANN       10                  0.991   0.019

Table 4
GA–ANN performance.

Performance measure   Value
MSE                   1.9149
NMSE                  0.0111
MAE                   1.2353
Min abs error         0.4036
Max abs error         1.8008
R                     0.9957

Fig. 4. Average fitness (MSE) vs. generation for ANN–GA.


the optimal number of neurons in the hidden layer is 8. Therefore, the three-layered ANN–GA network used in this work has a 5–8–1 structure, consisting of one input layer, one hidden layer and one output layer. The design of the ANN–GA is shown in Fig. 2. The tanh function was used as the transfer function in each node of the hidden and output layers of the ANN–GA model.
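A forward pass through a 5–8–1 network with tanh transfer functions in the hidden and output layers can be sketched as follows. The weights here are random placeholders rather than the trained values, and the input vector is hypothetical. Because tanh is bounded in (-1, 1), the sketch also assumes that the inputs and the saturates target have been scaled beforehand; the paper does not describe its scaling.

```python
import math
import random

def init_network(n_in=5, n_hidden=8, n_out=1, seed=0):
    """Create random placeholder weights; each neuron row holds its weights plus a bias."""
    rng = random.Random(seed)
    def layer(n_neurons, n_inputs):
        return [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs + 1)]
                for _ in range(n_neurons)]
    return {"hidden": layer(n_hidden, n_in), "output": layer(n_out, n_hidden)}

def layer_forward(weights, inputs):
    """Apply tanh to the weighted sum (plus bias) of every neuron in a layer."""
    return [math.tanh(sum(w * x for w, x in zip(row[:-1], inputs)) + row[-1])
            for row in weights]

def forward(network, inputs):
    """Propagate one input vector through the hidden layer and then the output layer."""
    hidden = layer_forward(network["hidden"], inputs)
    return layer_forward(network["output"], hidden)

net = init_network()
x = [0.1, -0.3, 0.2, 0.0, 0.5]  # hypothetical scaled values of Tb, D, M, viscosity and n
y = forward(net, x)
n_params = sum(len(row) for layer in net.values() for row in layer)
print(y, n_params)
```

With biases included, such a network has 8 × (5 + 1) + 1 × (8 + 1) = 57 adjustable parameters.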

In this study, roulette-wheel and tournament selection, one-point and uniform crossover, and uniform mutation operators were applied in training the networks. Several tests were performed with different combinations of crossover and selection operators, population size and number of generations. The parameter values used to search for the optimal conditions in this study are summarized in Table 2. After adjusting

Table 2
Parameter values for GA.

Parameter               Value
Population size         20
Crossover probability   0.9
Mutation probability    0.01

the parameters mentioned above in the training phase, it was found that the optimum number of neurons in the hidden layer of the network was 8. The flow chart of this operation is illustrated in Fig. 2.
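The search over the number of hidden neurons, the learning rate and the momentum can be sketched with the Table 2 settings, tournament selection, uniform crossover and uniform mutation. This is only an outline under stated assumptions: the gene ranges in `BOUNDS` are hypothetical, and `stand_in_mse` is an invented surrogate for the real fitness, which would train a network with the decoded settings and return its cross-validation MSE.

```python
import random

# Assumed search ranges; the paper does not report the bounds used by the GA.
BOUNDS = {"hidden": (2, 20), "learning_rate": (0.01, 1.0), "momentum": (0.0, 0.99)}

def random_chromosome(rng):
    """Draw one candidate setting uniformly from the assumed ranges."""
    return {
        "hidden": rng.randint(*BOUNDS["hidden"]),
        "learning_rate": rng.uniform(*BOUNDS["learning_rate"]),
        "momentum": rng.uniform(*BOUNDS["momentum"]),
    }

def tournament(population, scores, rng, k=2):
    """Return the lowest-error chromosome among k randomly drawn contenders."""
    contenders = rng.sample(range(len(population)), k)
    return population[min(contenders, key=lambda i: scores[i])]

def uniform_crossover(a, b, rng):
    """Take each gene from either parent with equal probability."""
    return {gene: (a if rng.random() < 0.5 else b)[gene] for gene in a}

def uniform_mutation(chrom, rate, rng):
    """Replace each gene by a fresh uniform draw with probability `rate`."""
    fresh = random_chromosome(rng)
    return {gene: fresh[gene] if rng.random() < rate else value
            for gene, value in chrom.items()}

def search(fitness, pop_size=20, generations=30, p_cross=0.9, p_mut=0.01, seed=0):
    """Minimise `fitness` over chromosomes; the best chromosome seen is always kept."""
    rng = random.Random(seed)
    population = [random_chromosome(rng) for _ in range(pop_size)]
    best = min(population, key=fitness)
    for _ in range(generations):
        scores = [fitness(c) for c in population]
        offspring = []
        for _ in range(pop_size):
            p1 = tournament(population, scores, rng)
            p2 = tournament(population, scores, rng)
            child = uniform_crossover(p1, p2, rng) if rng.random() < p_cross else dict(p1)
            offspring.append(uniform_mutation(child, p_mut, rng))
        population = offspring
        best = min(population + [best], key=fitness)
    return best

def stand_in_mse(chrom):
    """Hypothetical surrogate; a real run would train the ANN and return its validation MSE."""
    return ((chrom["hidden"] - 8) ** 2 / 100
            + (chrom["learning_rate"] - 0.1) ** 2
            + (chrom["momentum"] - 0.7) ** 2)

print(search(stand_in_mse))
```

Each fitness evaluation in a real run requires training a network, so the population size and the number of generations directly set the computational cost of the search.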

Table 5
Summary report for ANN–GA.

Optimization summary   Best fitness   Average fitness
Generation #           30             21
Minimum MSE            0.0093         0.00104
Final MSE              0.0093         0.00196

Table 6
The forecasting results of the model.

No.   M (u)   D (g/cm3)   ν (mm2/s)   n        Tb (°C)   Ideal outputs (%)   Actual outputs (%)   Relative error (%)
1     378     0.8595      5.39        1.4617   463       81.4                79.6                 -2.2
2     405     0.8681      6.69        1.4651   475       76.8                76.3                 -0.7
3     414     0.9251      9.63        1.5005   463       46.1                47.9                 3.9
4     367     0.8770      6.47        1.4680   425       80.4                82.0                 2.0
5     407     0.8978      7.88        1.4830   455       67.0                66.6                 -0.6


ANN models were also trained by the trial-and-error method, varying the number of neurons in the hidden layer from 2 to 20, with the BP algorithm with momentum and adaptive learning rate as the learning algorithm. Table 3 shows the results of the trial-and-error method and their comparison with the GA method. The GA approach gave a lower error (NMSE of 0.011 against 0.019) and a higher linear correlation coefficient (0.996 against 0.991). It is therefore concluded that the GA method outperforms the trial-and-error method.

4. Results and discussions

Table 4 reports the performance of ANN–GA in terms of mean squared error (MSE), normalized mean squared error (NMSE), mean absolute error (MAE), minimum absolute error, maximum absolute error and the linear correlation coefficient (R) between the experimental data and the neural network outputs. As shown in Table 4, the NMSE of the GA–ANN model is small (0.0111) and R is close to 1 (0.9957), which indicates that the training performance of the GA–ANN model is good. In the present study, MSE alone is used to estimate network training performance, whereas R, MSE and NMSE are used to measure the prediction performance of GA–ANN on the validation data set.

Table 5 summarizes the best fitness and the average fitness values for the network. It lists the minimum MSE (across all generations), the generation at which this minimum occurred and the final MSE. The corresponding plots are shown in Figs. 3 and 4. Fig. 3 shows the best fitness value against the number of generations. The best fitness is the minimum MSE taken across all of the networks within the corresponding generation. Fig. 4 shows the average fitness achieved during each generation of the optimization. The average fitness is the average of the minimum MSE taken across all of the networks within the corresponding generation. Both Figs. 3 and 4 show that the GA gradually converges towards the global optimum as the generations evolve. In Table 6, the forecasting results of ANN–GA are compared with the experimental data for the saturates of sour VGO. As shown in Table 6, the average absolute relative error of the forecasts of the GA–ANN model is 1.88%, which indicates that the model has good forecasting ability.
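The performance measures of Table 4 can be computed as in the sketch below. The paper does not define NMSE, so the definition used here (MSE divided by the population variance of the desired outputs) is an assumption. Applied to the five rounded test samples of Table 6, these formulas give an MSE of about 1.89, an MAE of about 1.22, an NMSE of about 0.011 and R of about 0.996, close to the values in Table 4.

```python
import math

def mse(desired, predicted):
    """Mean squared error."""
    return sum((d - p) ** 2 for d, p in zip(desired, predicted)) / len(desired)

def nmse(desired, predicted):
    """MSE normalised by the population variance of the desired outputs (assumed definition)."""
    mean_d = sum(desired) / len(desired)
    variance = sum((d - mean_d) ** 2 for d in desired) / len(desired)
    return mse(desired, predicted) / variance

def mae(desired, predicted):
    """Mean absolute error."""
    return sum(abs(d - p) for d, p in zip(desired, predicted)) / len(desired)

def pearson_r(desired, predicted):
    """Linear correlation coefficient between desired and predicted outputs."""
    n = len(desired)
    mean_d = sum(desired) / n
    mean_p = sum(predicted) / n
    cov = sum((d - mean_d) * (p - mean_p) for d, p in zip(desired, predicted))
    var_d = sum((d - mean_d) ** 2 for d in desired)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_d * var_p)

# Ideal and actual outputs (%) as listed in Table 6.
ideal = [81.4, 76.8, 46.1, 80.4, 67.0]
actual = [79.6, 76.3, 47.9, 82.0, 66.6]

abs_errors = [abs(i - a) for i, a in zip(ideal, actual)]
print(mse(ideal, actual), nmse(ideal, actual), mae(ideal, actual))
print(min(abs_errors), max(abs_errors), pearson_r(ideal, actual))
```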
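The relative errors in Table 6 and their average can be checked directly. The sketch assumes that the relative error is (actual - ideal) / ideal × 100 and that the reported average is the mean of the absolute values of the rounded table entries. Under these assumptions the rounded entries give 1.88%, while the unrounded values give about 1.87%.

```python
# Ideal and actual outputs (%) as listed in Table 6.
ideal = [81.4, 76.8, 46.1, 80.4, 67.0]
actual = [79.6, 76.3, 47.9, 82.0, 66.6]

# Signed relative error of each forecast, in percent.
relative = [100.0 * (a - i) / i for i, a in zip(ideal, actual)]
rounded = [round(r, 1) for r in relative]

mean_abs_rounded = sum(abs(r) for r in rounded) / len(rounded)
mean_abs_exact = sum(abs(r) for r in relative) / len(relative)

print(rounded)
print(round(mean_abs_rounded, 2), round(mean_abs_exact, 2))
```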

5. Conclusions

In this work, artificial neural networks coupled with a genetic algorithm were developed and used for predicting the saturates of sour VGO. The number of neurons in the hidden layer, the momentum and the learning rates of the back-propagation algorithm were determined using the GA. The ANN was used to build the relationship between the saturates and the physical properties of sour VGO. Five physical properties of sour VGO, namely average boiling point, density at 20 °C, molecular weight, viscosity at 100 °C and refractive index at 70 °C, were used as input variables, and the saturates of sour VGO was used as the output variable. The study showed that the genetic algorithm could find the optimal architecture of the neural network and the parameters of the back-propagation algorithm. Further, the artificial neural network model based on the genetic algorithm was tested, and the results indicated excellent agreement between the predictions and the experimental data. The model can therefore improve prediction accuracy.

Acknowledgement

This work was supported by China's Program for New Century Excellent Talents in University.

References

El-Hadi, D., & Bezzina, M. (2005). Improved empirical correlation for petroleum fraction composition quantitative prediction. Fuel, 84(5), 611–617.

Funahashi, K. (1989). On the approximate realization of continuous mappings by neural networks. Neural Networks, 2(3), 183–192.

Goldberg, D. E. (1989). Genetic algorithms in search, optimization, and machine learning. Reading, MA: Addison-Wesley.

Holland, J. (1975). Adaptation in natural and artificial systems. Berlin: Springer.

Hornik, K., Stinchcombe, M., & White, H. (1990). Universal approximation of an unknown mapping and its derivatives using multilayer feedforward networks. Neural Networks, 3(5), 551–560.

Liu, S. B., Tian, S. B., Liu, Y. R., & Wang, J. (2007). Study on the prediction of hydrocarbon type compositions of VGO based on its conventional characteristics. Petroleum Processing and Petrochemicals, 38(9), 18–22.

Michalewicz, Z. (1992). Genetic algorithms + data structures = evolution programs (3rd ed.). Springer-Verlag.

Nwadinigwe, C. A., & Okoroji, K. A. (1990). Novel equations for quantitative hydrocarbon-type analysis of petroleum fractions. Fuel, 69, 340–343.

Patuwo, E., Hu, M. Y., & Hung, M. S. (1993). Two group classification problem using neural networks. Decision Sciences, 24(4), 825–846.

Riazi, M. R., & Daubert, T. E. (1980). Prediction of the composition of petroleum fractions. Industrial Engineering of Chemical Process and Design Development.

Riazi, M. R., & Daubert, T. E. (1986). Prediction of molecular-type analysis of petroleum fraction and coal liquids. Industrial Engineering of Chemical Process and Design Development, 25, 1009–1015.

Rojas, R. (1996). Neural networks: a systematic introduction. Berlin, New York: Springer-Verlag.

Sun, R. J., Wang, S. C., & Zhao, S. Q. (2008). Study on prediction of sour vacuum gas oil (VGO) saturates by using artificial neural networks. Computers and Applied Chemistry, 35(11).

van Nes, K., & van Westen, H. A. (1951). Aspects of the constitution of mineral oils. New York: Elsevier.

Waterman, H. I., Boehower, C., & Cornelissen, J. (1958). Correlation between physical constants and chemical structure. New York: Elsevier.

Xing, B., & Cheng, Z. (2006). Studies on the prediction of hydrocarbon series composition of VGO. Qilu Petrochemical Technology, 34(2), 173–177.
