

Contents lists available at ScienceDirect

Journal of Aerosol Science

Journal of Aerosol Science 70 (2014) 59–66

journal homepage: www.elsevier.com/locate/jaerosci

Data-driven ESP modelling and optimisation

Daniel Toimil, Alberto Gómez*, Sara M. Andrés

Department of Business Management, University of Oviedo, 33203 Gijón, Asturias, Spain

* Corresponding author. E-mail address: [email protected] (A. Gómez).

Article info

Article history: Received 21 December 2013; Accepted 30 December 2013; Available online 15 January 2014

Keywords: Electrostatic precipitators; Data mining; Optimisation; ESP modelling

0021-8502/$ - see front matter © 2014 Elsevier Ltd. All rights reserved.
http://dx.doi.org/10.1016/j.jaerosci.2013.12.013

Abstract

The process that takes place in an electrostatic precipitator (ESP) is complex and is influenced by several phenomena. The numerical models present in the literature are continuously growing, but the complexity inherent to the process and the limits of current computers make it impossible to carry out a complete modelling process, so important simplifications of the models are necessary. Taking into account these limitations of numerical models, the use of data mining techniques is proposed for the analysis and modelling of the ESP's performance. The advantages of these techniques are the possibility of including a large number of complex phenomena and factors and their applicability to any ESP, regardless of its specific configuration or shape. This approach is especially interesting for the analysis of ESPs that have already been installed, from which data can be recovered and whose environment can be taken into account. Furthermore, the resulting models can be used in optimisation processes in which the best ESP configuration is sought at all times, in order to maximise its yield.

© 2014 Elsevier Ltd. All rights reserved.

1. Introduction

Electrostatic precipitators (ESPs) are widespread in industry because of their high performance in collecting suspended particles, with collection efficiencies that usually exceed 90%.

At present, evaluating the influence of each part of the process on the resulting emissions is a difficult task. Such an evaluation supports the design of process strategies that reduce emissions without affecting the final results of the industrial process. As environmental laws become more restrictive, these strategies represent an important added value for industrial installations.

The efficiency of an ESP depends on many factors, and modelling and studying those factors is an important step towards designing and configuring the ESP so that it reaches an appropriate efficiency. The three main phenomena that all models must take into account are the gas flow, the electric field and the transport of particles. These phenomena are known to interact with each other, which complicates the modelling of the system. Other phenomena, such as the corona discharge, particle charging, electrohydrodynamic flows and the re-entrainment of particles, also play an important role.

Numerical modelling has evolved considerably since the classic models proposed by Deutsch (1925) and Cooperman (1971). However, more complex models such as those proposed by Farnoosh et al. (2011) or Neimarlija et al. (2011) are computationally costly and still do not fully model all the phenomena that contribute to the efficiency of the process.


Given these limitations of numerical models, this paper proposes the use of data mining techniques for the analysis and modelling of the process that takes place in ESPs. The advantage of this approach lies in its capacity to find patterns in empirical data that provide knowledge about the process. Data mining models can be used not only to predict the performance of the ESP, but also to identify the most influential factors of the process. They can take into account phenomena that currently cannot be modelled with numerical techniques, and the geometry and configuration of the ESP do not become an obstacle that complicates the modelling. This knowledge may help to propose changes in these factors in order to reduce emissions.

Finally, the application of data mining models to ESP optimisation tasks is proposed. These tasks consist in finding the best ESP configuration for the operating conditions present at each moment. The aim is to control the configurable parameters of the ESP by means of metaheuristic algorithms that rely on the generated models (Glover & Kochenberger, 2003; Talbi, 2009). With this optimisation, the final performance of the ESP can be maximised.

The rest of the paper is organised as follows. Section 2 analyses the most important phenomena and some of the most widely used techniques in numerical modelling. Section 3 describes different data mining techniques that are useful for the analysis of ESPs, while Section 4 discusses the advantages and applications of those techniques in the ESP field. Section 5 analyses the application of the models to the optimisation of ESP operation by means of metaheuristic algorithms. Finally, Section 6 contains the conclusions.

2. Theoretical modelling of ESPs

During the filtering process, the ESP receives a particle-laden gas flow at a certain speed. The electric field inside the ESP produces an ionized space charge that charges the particles and causes them to be collected on the collection plate. According to Talaie et al. (2001), the three main phenomena of this process are the electric field, the gas flow and the speed of the particles. Each of these phenomena affects the others, so their behaviours are coupled, which makes the modelling process even more complicated.

Classic models such as those proposed by Deutsch (1925) and Cooperman (1971) are incomplete: they disregard a large part of the phenomena that take place in the ESP and, thanks to these simplifications, consume few computational resources. The increase in computing power has since led to new mathematical models that are more complex and costly, take into account a larger number of phenomena and explain the behaviour of ESPs more accurately. The following subsections describe the most widely used techniques for modelling several of the most influential phenomena in the operation of the ESP.
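To illustrate how little computation a classic model requires, the sketch below evaluates the Deutsch model in the form in which it is commonly quoted (often called the Deutsch-Anderson equation), η = 1 − exp(−wA/Q). The paper does not reproduce this formula, so the notation (effective migration velocity w, collection area A, volumetric gas flow rate Q), the function name and the numerical values are assumptions made only for this example.

```python
import math

def deutsch_efficiency(migration_velocity, plate_area, gas_flow_rate):
    """Fractional collection efficiency, eta = 1 - exp(-w * A / Q).

    migration_velocity: effective migration velocity w in m/s
    plate_area: total collection area A in m^2
    gas_flow_rate: volumetric gas flow rate Q in m^3/s
    """
    if gas_flow_rate <= 0:
        raise ValueError("gas_flow_rate must be positive")
    return 1.0 - math.exp(-migration_velocity * plate_area / gas_flow_rate)

# Hypothetical operating point, not taken from the paper.
eta = deutsch_efficiency(migration_velocity=0.1, plate_area=500.0, gas_flow_rate=20.0)
print(f"collection efficiency: {eta:.1%}")
```

The single exponential hides all the coupled phenomena discussed in this section inside one lumped parameter w, which is why such models are cheap but incomplete.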

2.1. Main phenomena

Most of the models proposed in the literature cover the three main phenomena mentioned above. First, the electric field inside the ESP is usually calculated using Poisson's equation, which can be solved with great precision for almost any geometry (Adamiak, 2013). These models often include simplifications that reduce memory use and execution time, such as disregarding the space charge formed by the precipitated particles (Böttner, 2003; Khare & Sinha, 1996), replacing the wires with point electrodes (Yamamoto & Sparks, 1986) or using the method of images for partial differential equations (Soldati et al., 1993).
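For reference, a standard textbook statement of the electrostatic problem mentioned above is sketched here. The notation is assumed for illustration and is not taken from the paper: φ is the electric potential, E the electric field, ρ the space charge density, ε₀ the vacuum permittivity, J the ionic current density and b the ion mobility. The second line is the steady-state charge conservation equation with which Poisson's equation is typically coupled when ion diffusion and convection by the gas are neglected.

```latex
\begin{aligned}
\nabla^{2}\varphi &= -\frac{\rho}{\varepsilon_{0}}, \qquad \mathbf{E} = -\nabla\varphi,\\
\nabla\cdot\mathbf{J} &= 0, \qquad \mathbf{J} = \rho\, b\, \mathbf{E}.
\end{aligned}
```

Because ρ appears in both equations, the field and the charge distribution have to be solved together, which is one reason why the simplifications listed above are attractive.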

Secondly, the gas flow is made up of a primary and a secondary flow. The primary flow is the gas flow as it enters the ESP, which drives the gas from the inlet. The so-called secondary flow appears as a result of the ionization of the gas molecules, which modifies the initial flow and complicates the modelling. Furthermore, the particles present in the gas are charged by the ionized molecules and, because of changes in the electric field, also undergo their own secondary flow, which usually differs from the secondary gas flow (Mizeraczyk et al., 2013). This reciprocal effect between the electric field and the gas flow complicates the modelling of the phenomenon, which is usually treated using electrohydrodynamics (Mizeraczyk et al., 2013).

The prediction of the gas flow inside the ESP has been approached in different ways, the most common being the following:

Simplified flow patterns: these do not solve the flow equations, but rely only on the basic features of the phenomenon (Khare & Sinha, 1996; Kim et al., 2001; Lu & Huang, 1998) and are therefore less accurate.

2D (Blanchard et al., 2001; Dumitran et al., 2004; Liang & Lin, 1994) and 3D (Yamamoto et al., 2003) laminar models: these offer a more complete analysis at a much greater computational cost.

Flow turbulence, intensified by the interactions between the gas, the ions and the particles, has often been simulated by means of the k–ε model (Farnoosh et al., 2010; Lei et al., 2008; Soldati et al., 1993). Adamiak (2013) states in his review that the best turbulence model used so far is the Navier–Stokes solution proposed by Soldati (2003).

Finally, the transport of particles inside the ESP and their deposition on the collecting plate have been calculated mainly with two different approaches: Eulerian simulation (Khare & Sinha, 1996; Kim et al., 2001; Lu & Huang, 1998) and Lagrangian simulation (Adamiak & Atten, 2009; Neimarlija et al., 2011; Soldati, 2000). In the Eulerian approach, the problem is posed as the solution of a single equation, which is advantageous in terms of computation time but comes at the expense of disregarding other phenomena. The Lagrangian approach takes into account all the essential factors, but the computational cost increases considerably.

2.2. Other phenomena

In addition to the phenomena mentioned above, there are others that also affect the behaviour of the ESP and that certain numerical models take into account. One of them is the corona discharge, which takes place at the discharge electrode and creates the ionized space charge that is essential for the operation of the ESP. The corona discharge occurs when the voltage applied to the discharge electrode is increased sufficiently, and it is visible as a glowing halo. This phenomenon releases electrons from the electrode at high speed, which create the ionized field that charges the particles negatively.

The behaviour of the corona discharge depends to a great extent on the configuration of the electrode and is a complex problem even for simple electrode configurations. Most published papers apply simplified models of this phenomenon, in which the ionization layer is disregarded and a single stationary flow is simulated (Adamiak, 2013).

The electrons released by the corona discharge generate molecules of ionized gas. These molecules move away from the discharge electrode, towards the collection plate. The particles carried by the gas flow meet the ionized molecules, which adhere to the particles and give them a negative charge. Particles can be charged by ionized molecules through field charging or diffusion charging. Field charging prevails in larger particles (larger than 1 μm) and is based on the distortion that the particle causes in the existing electric field, which makes the ionized gas molecules travelling through the field collide with the particle and transfer their charge to it. Diffusion charging prevails in smaller particles (smaller than 1 μm) and consists in collisions between gas ions and the particle caused by random thermal motion.

The prevailing charging mechanism therefore depends on the size of the particle, which suggests different behaviours depending on this variable. Because in practice the particles are neutral when they enter the ESP, some works model their charging by the electric field (Lei et al., 2008), while other papers simplify the process by assuming that the particles reach the ESP pre-charged (Kim et al., 2001; Soldati, 2000, 2003).
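The size dependence of field charging described above can be made explicit with the saturation charge usually attributed to Pauthenier, a textbook expression that the paper itself does not state. The symbols are assumptions introduced here: d_p is the particle diameter (spherical particle), E the local electric field strength, ε_r the relative permittivity of the particle and ε₀ the vacuum permittivity.

```latex
q_{s} = 3\pi\,\varepsilon_{0}\,\frac{\varepsilon_{r}}{\varepsilon_{r}+2}\,d_{p}^{2}\,E
```

Since q_s scales with the square of the diameter, the charge that field charging can deliver falls quickly for small particles, which is consistent with diffusion charging taking over below roughly 1 μm.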

2.3. Validity of the models

The numerical modelling of ESPs is a very useful technique for studying and designing them. In some cases, laboratory-scale ESPs are used to design and validate the models (Jędrusik & Świerczok, 2013). This process can be interesting and generate useful knowledge; however, there are significant differences between a laboratory-scale ESP and an ESP located on an industrial site. The environment of a laboratory-scale ESP is much more controlled, and fewer internal and external factors, such as vibrations, temperature changes or ambient humidity, can affect its performance than in the case of an industrial ESP. Other aspects, like the physical similarity of the ESPs (in terms of materials, for instance), are very difficult to achieve according to Ortiz et al. (2010). Current numerical models are limited in that they cannot take all these additional variables into account.

Another problematic aspect of numerical modelling is the size of the particles. Relatively large particles (larger than 1 μm) are rather simple to capture in the ESP, provided that their resistivity is not too high, because their transport is controlled by electric forces and the details of the flow pattern and the ionic wind are not critical (Adamiak, 2013). However, collecting smaller particles (smaller than 1 μm) is much more difficult and those details become more important. In these cases, the models must take into account a larger number of phenomena that affect the collection of particles. As well as the size, the shape of the particles is another factor that affects the efficiency of the ESP (Zhuang et al., 2000). Both parameters vary, as the flow contains particles of many different sizes and shapes, which makes correct modelling more difficult. This often forces a simplification of the problem, such as assuming that the particles are spherical or working with mean particle sizes.

The shape of the ESP also plays an important role in its behaviour and, therefore, theoretical models must take it into account. Mizeraczyk et al. (2013) show that the geometry affects the air flow and the performance of the ESP. This limits the application of theoretical models to specific configurations and makes it more difficult to design a general model.

With regard to the transport of particles, Böttner (2003) verified that the majority of the models were capable of correctly predicting the average speed of the particles, but that their predictions failed when it came to modelling the speed fluctuations caused by electrostatic fields.

It is commonly assumed that once the particles have made contact with the collector, they are deposited permanently. However, it has been found that this deposit is not always final and that the particles can re-enter the air flow because of its turbulence, the back corona discharge (Adamiak, 2013) or the hammering process. This phenomenon affects the performance of the ESP and, according to Adamiak (2013), it is a problem that is hard to solve and that is not usually taken into account in the models.

Finally, owing to the great complexity of the problem, it is currently impossible to obtain a complete numerical model, and many simplifications are necessary to keep the computational cost of the model at a reasonable level (Adamiak, 2013). The data mining techniques proposed in this paper are intended to provide a model that complements numerical modelling, addresses the problems raised above and offers new ways of obtaining knowledge about ESPs and analysing them, opening new research channels in this field.

3. Data mining theory

Data mining is a field within computer science that can be defined as the non-trivial extraction of previously unknown information or patterns from large amounts of data. It is an interdisciplinary field that draws on machine learning and statistics, among others.

The application of data mining can be considered a stage of the knowledge discovery process, which consists in applying discovery algorithms and analyses to the data to produce a particular list of patterns (or models) (Fayyad et al., 1996).

These techniques make it possible to obtain underlying information from the data, analyse the behaviour of the reality under study and discover the importance of each variable in the process studied.

Depending on the purpose of the data analysis, mining techniques can be classified as supervised or unsupervised. Some of the most common techniques, which can be very useful in the field of ESPs, are described below according to their type.

3.1. Supervised methods

Supervised methods address problems made up of a set of input variables whose values are known and that affect one or more output variables (target variables). The problem consists in predicting the value of the output variables based on the input variables. For this reason, these methods are also called predictive methods, and they can be used to predict the performance of ESPs, their emissions and any other measured variable of interest.

Variables can be qualitative, when they take categorical or discrete values, or quantitative, when they take continuous values. This distinction is essential when choosing the appropriate data mining technique. A technique may be able to work with qualitative data, quantitative data or a combination of both, depending on the nature of the method.

The difference between the types of data has led to a distinction between methods that predict qualitative outputs and those that predict quantitative outputs. Methods that work with qualitative outputs are called classification methods, and their outputs are called classes. Methods that work with quantitative outputs are called regression methods. Both kinds of method have much in common and both can be seen as a function approximation task (Hastie et al., 2001).

It is worth mentioning that classification models can be applied to originally quantitative outputs by means of a discretization, usually carried out by splitting the value space into intervals. In some cases this practice involves a loss of precision and less reliable results, while in others positive results may be achieved (Fayyad & Irani, 1993).
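The interval-based discretization described above can be sketched in a few lines. This is only an illustration: the cut points, the class labels and the efficiency values are hypothetical, and fixed thresholds are used here instead of a data-driven method such as the one by Fayyad & Irani (1993).

```python
from bisect import bisect_right

def discretize(value, cut_points, labels):
    """Map a continuous value to the label of the interval it falls in.

    cut_points must be sorted in ascending order; a value equal to a cut
    point is assigned to the interval above it.
    """
    if len(labels) != len(cut_points) + 1:
        raise ValueError("need exactly one more label than cut points")
    return labels[bisect_right(cut_points, value)]

# Hypothetical thresholds on collection efficiency, in percent.
cut_points = [90.0, 97.0]
labels = ["low", "medium", "high"]

efficiencies = [85.2, 93.4, 98.9, 90.0]
classes = [discretize(e, cut_points, labels) for e in efficiencies]
print(classes)
```

Once the output has been turned into classes in this way, any classification method can be trained on it, at the cost of losing the differences between values that fall in the same interval.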

Supervised methods generally go beyond simple prediction, as the generated models contain valuable information about the behaviour of the phenomenon under study that can be very significant for understanding the problem. For example, many algorithms can generate models that include information about the most important variables, that is, those that affect the output variable most decisively. This can be useful, for instance, for making decisions about the ESP during its configuration or in future designs.
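One model-agnostic way of estimating variable importance is to permute the values of one input variable and measure how much the prediction error grows. The paper does not prescribe this method; the sketch below is an assumption-laden illustration in which the "fitted model", the variable names and the data are all hypothetical. To keep the example deterministic it uses a fixed cyclic shift of the column, whereas the usual practice is to shuffle randomly and average over several repetitions.

```python
def mse(model, rows, targets):
    """Mean squared error of the model on the given rows."""
    return sum((model(r) - t) ** 2 for r, t in zip(rows, targets)) / len(rows)

def permutation_importance(model, rows, targets, column):
    """Increase in MSE after the values of one column are moved to other rows."""
    baseline = mse(model, rows, targets)
    values = [r[column] for r in rows]
    shifted = values[1:] + values[:1]  # cyclic shift: a deterministic permutation
    permuted = [r[:column] + [v] + r[column + 1:] for r, v in zip(rows, shifted)]
    return mse(model, permuted, targets) - baseline

# Hypothetical fitted model: emissions depend on voltage and gas temperature,
# but not on the third input.
def model(row):
    voltage_kv, gas_temp_c, _unused = row
    return 100.0 - 1.5 * voltage_kv + 0.2 * gas_temp_c

rows = [
    [40.0, 120.0, 1.0],
    [45.0, 130.0, 2.0],
    [50.0, 125.0, 3.0],
    [55.0, 140.0, 4.0],
]
targets = [model(r) for r in rows]  # idealised, noise-free measurements

names = ["voltage_kv", "gas_temp_c", "unused"]
scores = {n: permutation_importance(model, rows, targets, i) for i, n in enumerate(names)}
for name, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {score:.2f}")
```

A variable that the model ignores receives a score of zero, while the variables that drive the output receive scores that reflect how strongly the prediction depends on them.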

One of the main characteristics of supervised methods is that they generate the model through a training process based on the values of the input and output variables measured in the process to be modelled. In other words, they need a large data set that includes the values of both the input variables and the output variables obtained from the ESP under study.

Some of the algorithms that can obtain good results in predictive tasks regarding ESPs, depending on the specific problem, are the following: decision trees (Breiman et al., 1984; Cestnik et al., 1987; Quinlan, 1986, 1993), support vector machines (Cortes & Vapnik, 1995) and Bayesian networks (Pearl, 1988) for classification, and linear regression, MARS (Friedman, 1991) and RuleFit (Friedman & Popescu, 2008) for regression.
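As a minimal sketch of the training process for the simplest of the regression methods listed above, the following code fits a one-variable linear regression by ordinary least squares. The choice of input (secondary voltage) and output (outlet dust concentration), the units and the measurements are hypothetical and serve only to show the mechanics; a real application would use many more records and variables.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical training data measured on a running ESP.
voltage_kv = [40.0, 45.0, 50.0, 55.0, 60.0]
emissions_mg_m3 = [52.0, 44.5, 36.0, 29.5, 21.0]

slope, intercept = fit_line(voltage_kv, emissions_mg_m3)

def predict(voltage):
    """Predicted emissions for a voltage that was not in the training data."""
    return intercept + slope * voltage

print(f"slope = {slope:.2f}, intercept = {intercept:.1f}")
print(f"predicted emissions at 48 kV: {predict(48.0):.1f}")
```

The fitted slope is itself a piece of knowledge about the process, since its sign and size indicate how the output responds to the input within the range of the measured data.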

3.2. Unsupervised methods

Unsupervised methods have no target variable to be predicted or explained from the input variables. Instead, the problem is posed as a set of measurements of the variables under study, from which the aim is to discover patterns inherent to the data. These are therefore explanatory methods (in contrast with predictive methods) and their main aim is to discover knowledge. For instance, association rules make it possible to discover relationships between the variables of the ESP process, while clustering techniques can find unusual operating patterns and provide information about their causes. These techniques are described below.

3.2.1. Association analysis

The aim of association analysis is to find relationships between the variables under study. These relationships are often entailment rules between values of the variables and, therefore, the variables involved are usually categorical, or quantitative variables that have been previously discretized. The two most usual types of association study are association rules and the discovery of sequential patterns.


In the first case, the result is presented as rules of the form X⟶Y, where X and Y are statements about the variables of the data set under study. These entailments are generally accompanied by measures that allow the usefulness of the rule to be assessed, such as the confidence and the support, which measure how often the rule holds and how frequently it appears, respectively.

The discovery of sequential patterns, on the other hand, focuses on searching for events that are related over time. For example, certain variables may take certain values after a set of variables has previously taken other values. These variables are related by a sequential pattern that, as in the previous case, can be associated with quality measures such as confidence and support.

One of the most widely used algorithms for association analysis is Apriori (Agrawal et al., 1994).
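The two quality measures of a rule X⟶Y can be computed directly from their usual definitions: the support is the fraction of records in which X and Y hold together, and the confidence is that support divided by the support of X alone. The sketch below only evaluates given rules and does not implement the Apriori search itself; the discretized records and variable names are hypothetical.

```python
def support(records, items):
    """Fraction of records that contain every item in `items`."""
    items = set(items)
    return sum(items <= record for record in records) / len(records)

def confidence(records, antecedent, consequent):
    """Fraction of the records containing the antecedent that also contain
    the consequent. Undefined (division by zero) if the antecedent never occurs."""
    both = set(antecedent) | set(consequent)
    return support(records, both) / support(records, antecedent)

# Hypothetical discretized measurements of an ESP, one set per record.
records = [
    {"humidity=high", "voltage=low", "emissions=high"},
    {"humidity=high", "voltage=low", "emissions=high"},
    {"humidity=high", "voltage=high", "emissions=low"},
    {"humidity=low", "voltage=high", "emissions=low"},
    {"humidity=low", "voltage=low", "emissions=high"},
]

rules = [
    ({"voltage=low"}, {"emissions=high"}),
    ({"humidity=high"}, {"emissions=high"}),
]
for x, y in rules:
    s = support(records, x | y)
    c = confidence(records, x, y)
    print(f"{sorted(x)} -> {sorted(y)}: support={s:.2f}, confidence={c:.2f}")
```

In this toy data set the first rule holds every time its antecedent occurs, while the second holds in only two of three cases, so it would be ranked as less reliable.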

3.2.2. Cluster analysis

The main idea behind clustering is to group the records of a data set so that the records belonging to the same cluster are more similar to each other than to records belonging to other clusters. It is therefore necessary to define a way of measuring the similarity between records in order to compare them during the cluster formation process, and this choice can affect the resulting groups significantly. In the case of ESPs, the clusters will tend to group those measurements that record similar behaviours of the ESP, and thus capture behaviour patterns of the ESP process such as periods of low or high performance.

One of the main advantages of this sort of analysis is its capacity to work with high-dimensional problems, which makes it possible to obtain information about very complex processes in which many variables take part.

Besides clustering as a method for grouping records, there are other techniques, like hierarchical clustering, that not only group records but also provide measures of similarity between clusters. This makes it possible to go further and obtain information about the distance between the members of two different groups, which can be useful for generating knowledge and subsequently making decisions. Both types of clustering can detect anomalous behaviour patterns in the ESP, as well as other potentially useful information, helping to identify the causes of such phenomena.
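A compact way to see how records are grouped by similarity is the k-means procedure, sketched below with squared Euclidean distance as the similarity measure. The paper does not name a specific clustering algorithm, so k-means is an assumption made for illustration, as are the measurements (voltage in kV, efficiency in percent) and the choice of initial centroids.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))

def kmeans(points, centroids, iterations=10):
    """Alternate between assigning points to the nearest centroid and
    moving each centroid to the mean of its assigned points."""
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: sq_dist(p, centroids[i]))
            clusters[nearest].append(p)
        # An empty cluster keeps its previous centroid.
        centroids = [mean(c) if c else centroids[i] for i, c in enumerate(clusters)]
    return centroids, clusters

# Hypothetical records: (secondary voltage in kV, collection efficiency in %).
measurements = [
    (38.0, 88.0), (40.0, 90.0), (42.0, 89.0),
    (55.0, 98.0), (57.0, 99.0), (56.0, 97.0),
]
initial = [measurements[0], measurements[3]]
centroids, clusters = kmeans(measurements, initial)
for centre, members in zip(centroids, clusters):
    print(centre, len(members))
```

In practice the variables would normally be rescaled before clustering, because a distance computed on raw values lets the variable with the largest numerical range dominate the grouping, which is one instance of the sensitivity to the similarity measure noted above.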

4. ESP modelling and analysis by means of data mining

Despite its usefulness, numerical modelling has certain limitations, such as the impossibility of using the same model for different ESP configurations, the difficulty of modelling the complex phenomena that take place in the filtering process and the variable size and shape of the suspended particles (see Section 2.3). This paper sets out the possibility of using some of the data mining techniques described in Section 3 to make up for several of the shortcomings of current theoretical models. Sections 4.1 and 4.2 describe the advantages of data mining techniques, while Section 4.3 analyses their drawbacks.

4.1. Data-driven generalisation

One of the main principles of data mining techniques is that they can generalise the behaviours that take place inside the ESP from the data obtained. It is therefore not necessary to create a physical model that describes the behaviour of the ESP, which in some cases involves processes so complex that their complete physical modelling is impossible. Furthermore, data mining methods can handle relatively large numbers of variables, which allows the inclusion of many variables that a numerical model would have trouble representing. In cases where these variables play an important role in the performance of the ESP, models obtained by means of data mining can considerably improve on the predictive capacity of current numerical models. Moreover, it is relatively simple to include factors that do not stem directly from the ESP but could affect its performance in industrial environments.

One example of an external factor is the temperature of the different parts of the process. White (1974) shows that temperature plays an important role in the resistivity of particles and thus affects the efficiency of the ESP. Industrial environments can have variable ambient temperatures that can affect the filtering process.

Another factor to be taken into account is vibration, whether from the surroundings or from the different components of the ESP. Vibrations can affect the re-entrainment of dust into the gas flow and may therefore be relevant when assessing the performance of the ESP.

Ambient and particle humidity are other related external variables that, like temperature, affect the resistivity of the particles and are therefore likely to influence the performance of the ESP.

Including external factors like these in numerical models is not always possible and, when it is, the resulting models are usually complex and computationally costly. The use of data mining techniques is therefore an interesting alternative.

4.2. Application to different ESP systems

ESP modelling by means of data mining is more robust in the sense that the same technique can model ESPs with very different shapes, configurations, environmental conditions, etc., whereas a numerical model is appropriate only for a very specific type of ESP. This is because all time-invariant factors, whether internal or external to the ESP, such as its shape and size, the configuration and shape of its electrodes or the number of stages, are included implicitly in the data mining model during the modelling process. This includes the industrial conditions of the ESP and applies to both supervised (Section 3.1) and unsupervised methods (Section 3.2).

Supervised models are capable of generalising the behaviour of output variables based on the data measured andgenerating predictive models regarding those variables that must be studied, such as the percentage of emissions, thesparking ration, the amount of particles gathered, etc. Besides, they are able to provide information about the phenomenathat affect the variables under study, such as humidity, resistivity or other previously unknown causes.

However, applying data mining to the ESP modelling is not only limited to obtaining predictive models. The capacity toanalyse patterns and extract information from the data mining techniques can be a very powerful supplementary tool togenerate new knowledge. For example, by means of a variable importance analysis we can discover new variables that playan important role in the ESP's behaviour and that knowledge can be applied to the design of numerical models, improvingtheir efficiency.

On the other hand, unsupervised methods can identify relationships between variables, helping to find cause-and-effect relationships that had not been noticed previously. Furthermore, behaviour patterns that may indicate a malfunction of the ESP can be identified by means of clustering techniques, helping to identify its causes.
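The use of clustering to isolate unusual behaviour can be sketched with a small k-means implementation. The example assumes two already normalised variables (corona current and emission level), two clusters and fixed initial centroids; all of these, together with the data, are illustrative choices rather than a prescription.

```python
import math

# Hypothetical operating records, already scaled to [0, 1]:
# (normalised corona current, normalised emission level).
points = [
    (0.80, 0.20), (0.82, 0.22), (0.78, 0.18),
    (0.85, 0.25), (0.79, 0.21), (0.81, 0.19),
    (0.30, 0.75), (0.28, 0.80),
]

def nearest_centroid(point, centroids):
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))

def kmeans(data, centroids, iterations=20):
    """Plain k-means: assign points to the nearest centroid, then recompute centroids."""
    centroids = list(centroids)
    for _ in range(iterations):
        groups = [[] for _ in centroids]
        for p in data:
            groups[nearest_centroid(p, centroids)].append(p)
        updated = [
            tuple(sum(axis) / len(group) for axis in zip(*group)) if group else old
            for group, old in zip(groups, centroids)
        ]
        if updated == centroids:  # assignments have stabilised
            break
        centroids = updated
    labels = [nearest_centroid(p, centroids) for p in data]
    return centroids, labels

# Deterministic initialisation chosen for the example: first and last record.
centroids, labels = kmeans(points, [points[0], points[-1]])

# The least populated cluster is treated as the candidate abnormal behaviour.
sizes = [labels.count(i) for i in range(len(centroids))]
rare_cluster = sizes.index(min(sizes))
flagged = [index for index, label in enumerate(labels) if label == rare_cluster]
print("Records to inspect:", flagged)
```

Treating the smallest cluster as suspicious is only a heuristic; in practice the clusters would need to be interpreted by someone who knows the installation before any cause is attributed to them.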

4.3. Drawbacks of data mining

Data are the foundation of all data mining techniques. This fact has two important implications that condition the application of these methods to the analysis of ESP systems.

In the first place, the ESP under analysis must be installed and running so that all the variables to be taken into account when applying data mining techniques can be measured. These techniques require data obtained from the process that we intend to model, covering as many as possible of the different cases and phenomena that can take place in the system, so that those phenomena can be generalised and incorporated into the resulting model.

In the second place, the quality of the data plays an important role in the results of applying data mining. As Yan et al. (2003) pointed out, preparing the data can take more time than the data mining process itself and can bring up as many problems as that process, if not more. The accuracy of the data used also affects the results obtained (O'Leary, 1993), so it is necessary to make sure that the data are gathered and pre-processed correctly. In general, low-quality data generate useless results. In addition, a sufficient quantity of data is needed, since these techniques learn from patterns present in the data and generally perform better when a large amount of data is available.
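A very simple form of the pre-processing mentioned above is to discard records with missing or physically implausible readings before any model is trained. The following sketch assumes hypothetical variable names and plausibility ranges; real limits would depend on the sensors and on the installation.

```python
# Hypothetical plausibility ranges for each logged variable (assumed values).
VALID_RANGES = {
    "voltage_kv": (0.0, 120.0),
    "humidity_pct": (0.0, 100.0),
    "emission_mg_nm3": (0.0, 1000.0),
}

def is_valid(record):
    """A record is kept only if every variable is a number inside its range."""
    return all(
        isinstance(record.get(name), (int, float)) and low <= record[name] <= high
        for name, (low, high) in VALID_RANGES.items()
    )

def clean(records):
    """Split the records into those kept for training and those rejected."""
    kept, rejected = [], []
    for record in records:
        (kept if is_valid(record) else rejected).append(record)
    return kept, rejected

raw = [
    {"voltage_kv": 55.0, "humidity_pct": 9.0, "emission_mg_nm3": 35.0},
    {"voltage_kv": None, "humidity_pct": 9.1, "emission_mg_nm3": 34.0},          # missing value
    {"voltage_kv": 58.0, "humidity_pct": float("nan"), "emission_mg_nm3": 31.0},  # sensor fault
    {"voltage_kv": 57.0, "humidity_pct": 9.5, "emission_mg_nm3": -4.0},           # impossible reading
    {"voltage_kv": 60.0, "humidity_pct": 10.0, "emission_mg_nm3": 27.0},
]

kept, rejected = clean(raw)
print(f"{len(kept)} records kept, {len(rejected)} rejected")
```

The NaN reading is rejected because any comparison with NaN is false. Keeping the rejected records, rather than silently dropping them, makes it possible to check whether the faults themselves follow a pattern.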

Due to these characteristics, numerical modelling is more appropriate during the design stages of the ESP, when the final ESP has not yet been installed or when it is necessary to work with an ESP on a smaller scale, while data mining can offer better results when the final ESP has already been installed and is running.

5. Optimisation of the ESP's performance

As mentioned in Section 1, the data mining models obtained can be used to control those ESP parameters that can be modified, with the intention of optimising the ESP's performance. The use of metaheuristic algorithms is one possible approach to this problem.

Metaheuristic algorithms are capable of finding “good enough” solutions to a complex optimisation problem in a reasonable amount of time. These algorithms are very popular for optimisation problems for which no exact algorithm is known or for which the known exact algorithm has an unacceptable computational cost. An in-depth description of how these methods work is beyond the scope of this paper, but a large amount of literature is available on the subject (Glover & Kochenberger, 2003; Talbi, 2009).

These techniques are based on carrying out an intelligent search within the space of possible solutions, which avoids having to evaluate every existing solution. Thus, the number of calculations to be carried out is reduced, allowing us to deal with problems that are computationally costly. The algorithms rely on a fitness function that assesses the goodness of a given solution. Guided by this function, the algorithm searches for the solution with the best possible fitness value and retains the best solution found among all those evaluated.
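To make the roles of the fitness function and of the retained best solution concrete, the following sketch implements simulated annealing, one of many metaheuristics that could be used. The toy fitness function, the search interval, the cooling schedule and the step size are all assumptions chosen for the example.

```python
import math
import random

def simulated_annealing(fitness, start, neighbour, steps=2000,
                        t_start=1.0, t_end=0.01, seed=0):
    """Maximise `fitness`, occasionally accepting worse candidates to escape local optima."""
    rng = random.Random(seed)
    current, current_fit = start, fitness(start)
    best, best_fit = current, current_fit
    for step in range(steps):
        # Geometric cooling from t_start down to t_end.
        temperature = t_start * (t_end / t_start) ** (step / max(steps - 1, 1))
        candidate = neighbour(current, rng)
        candidate_fit = fitness(candidate)
        delta = candidate_fit - current_fit
        if delta >= 0 or rng.random() < math.exp(delta / temperature):
            current, current_fit = candidate, candidate_fit
            if current_fit > best_fit:
                # Retain the best solution found among all those evaluated.
                best, best_fit = current, current_fit
    return best, best_fit

def toy_fitness(x):
    """Invented multimodal function standing in for the goodness of a solution."""
    return math.sin(3.0 * x) - 0.1 * (x - 6.0) ** 2

def toy_neighbour(x, rng):
    """Propose a nearby solution, kept inside the interval [0, 10]."""
    return min(10.0, max(0.0, x + rng.uniform(-0.5, 0.5)))

best_x, best_fit = simulated_annealing(toy_fitness, 1.0, toy_neighbour)
print(f"Best solution found: x = {best_x:.3f}, fitness = {best_fit:.3f}")
```

Only a few thousand of the possible solutions are evaluated, and the result is not guaranteed to be the global optimum, which matches the notion of a solution that is good enough and obtained in a reasonable time.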

In this type of application, the solutions tested by the algorithm consist of combinations of values for the parameters to be controlled, and the aim is to find the best possible configuration. The data mining models are used to evaluate the fitness of each configuration and are the basis of the optimisation process. For this reason, the training data must be of high quality. This allows the models to be trained correctly and to respond accurately to all the parameter combinations that the optimisation algorithm evaluates.

Therefore, with the data mining models generated from the ESP's data, optimisation methods can be used to select the best values for the ESP's configurable parameters, maximising its performance.
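The coupling between a data mining model and the search can be sketched as follows. The function `predicted_emission` is a hypothetical stand-in for a trained model, the two configurable parameters (voltage set point and rapping interval) and their admissible values are invented, and a plain hill climb is used in place of a full metaheuristic to keep the example short.

```python
# Admissible settings, assumed to lie within the ranges covered by the training data.
VOLTAGES_KV = list(range(40, 71, 2))
RAPPING_MIN = list(range(4, 31, 2))

def predicted_emission(voltage_kv, rapping_min):
    """Hypothetical stand-in for a trained data mining model (mg/Nm3)."""
    return 20.0 + 0.05 * (voltage_kv - 58) ** 2 + 0.02 * (rapping_min - 12) ** 2

def fitness(state):
    """Higher is better: the negative of the emission predicted for a configuration."""
    i, j = state
    return -predicted_emission(VOLTAGES_KV[i], RAPPING_MIN[j])

def neighbours(state):
    """Configurations that differ by one step in exactly one parameter."""
    i, j = state
    for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        ni, nj = i + di, j + dj
        if 0 <= ni < len(VOLTAGES_KV) and 0 <= nj < len(RAPPING_MIN):
            yield (ni, nj)

def hill_climb(start):
    """Move to the best neighbouring configuration until none improves the fitness."""
    current = start
    while True:
        best_neighbour = max(neighbours(current), key=fitness)
        if fitness(best_neighbour) <= fitness(current):
            return current
        current = best_neighbour

best_state = hill_climb((0, len(RAPPING_MIN) - 1))
best_config = (VOLTAGES_KV[best_state[0]], RAPPING_MIN[best_state[1]])
print("Suggested configuration (kV, min):", best_config)
```

Because the stand-in model has a single minimum, the hill climb is sufficient here; with a model learned from real data the fitness landscape may have several local optima, which is where a metaheuristic becomes useful. Restricting the candidate values to the region covered by the training data reflects the requirement that the model respond accurately to every configuration the optimiser evaluates.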

6. Conclusions

The complexity of the processes that take place inside an ESP imposes several limitations on numerical modelling techniques. When it is possible to measure the variables that take part in the ESP process, data mining emerges as a technique that allows us to overcome several of those limitations.

One of the strengths of data mining techniques is the capacity to include in the modelling process all the variables for which data are available, whether internal or external to the ESP. Furthermore, these techniques can be applied regardless of the shape and configuration of the ESP.

Due to the characteristics of each technique, numerical modelling is more appropriate for analysis and modelling during the design of the ESP or when working with smaller-scale laboratory systems, for which on-site data of the ESP are not available. In contrast, data mining is more useful when the system has already been established in the industrial process, where there is less control over the environment and there are many factors that numerical modelling cannot take into account. Data mining techniques are able to handle the data and extract the underlying information, implicitly taking into account all the phenomena that take place in the filtering process.

All in all, the application of data mining to the analysis of the process that takes place inside the ESP offers a new approach to the modelling of these systems that can generate new knowledge about their functioning and provide new ways of forecasting performance. As a future line of research, the authors suggest applying these techniques to data from ESPs installed in industrial environments, in order to analyse the results and information that they may provide, as well as to optimise existing systems by means of the dynamic configuration of the ESP's parameters.

References

Adamiak, K. (2013). Numerical models in simulating wire-plate electrostatic precipitators: a review. Journal of Electrostatics, 71(4), 673–680, http://dx.doi.org/10.1016/j.elstat.2013.03.001.

Adamiak, K., & Atten, P. (2009). Numerical simulation of the 2-D gas flow modified by the action of charged fine particles in a single-wire ESP. IEEE Transactions on Dielectrics and Electrical Insulation, 16(3), 608–614, http://dx.doi.org/10.1109/TDEI.2009.5128495.

Agrawal, R., & Srikant, R. (1994). Fast algorithms for mining association rules. In Proceedings of the 20th International Conference on Very Large Data Bases, VLDB (Vol. 1215, pp. 487–499).

Blanchard, D., Dumitran, L., & Atten, P. (2001). Effect of electro-aero-dynamically induced secondary flow on transport of fine particles in an electrostatic precipitator. Journal of Electrostatics, 51–52(1–4), 212–217.

Böttner, C.-U. (2003). The role of the space charge density in particulate processes in the example of the electrostatic precipitator. Powder Technology, 135–136, 285–294, http://dx.doi.org/10.1016/j.powtec.2003.08.020.

Breiman, L., Friedman, J.H., Olshen, R.A., & Stone, C.J. (1984). Classification and Regression Trees. Wadsworth & Brooks: Monterey, CA.

Cestnik, G., Kononenko, I., & Bratko, I. (1987). Assistant 86: a knowledge acquisition tool for sophisticated users. Progress in Machine Learning, 1, 987.

Cooperman, P. (1971). A new theory of precipitator efficiency. Atmospheric Environment, 5(7), 541–551, http://dx.doi.org/10.1016/0004-6981(71)90064-3.

Cortes, C., & Vapnik, V. (1995). Support-vector networks. Machine Learning, 20(3), 273–297, http://dx.doi.org/10.1007/BF00994018.

Deutsch, W. (1925). Spitzenentladung und elektrischer Wind. Annalen der Physik, 381(7), 729–736, http://dx.doi.org/10.1002/andp.19253810705.

Dumitran, L., Atten, P., & Blanchard, D. (2004). Numerical Simulation of Fine Particles Charging and Collection in an Electrostatic Precipitator with Regular Barbed Electrodes (Vol. 178, 199–205).

Farnoosh, N., Adamiak, K., & Castle, G. (2010). 3-D numerical analysis of EHD turbulent flow and mono-disperse charged particle transport and collection in a wire-plate ESP. Journal of Electrostatics, 68(6), 513–522.

Farnoosh, N., Adamiak, K., & Castle, G.S.P. (2011). Numerical calculations of submicron particle removal in a spike-plate electrostatic precipitator. IEEE Transactions on Dielectrics and Electrical Insulation, 18(5), 1439–1452, http://dx.doi.org/10.1109/TDEI.2011.6032814.

Fayyad, U.M., & Irani, K.B. (1993). Multi-interval discretization of continuous-valued attributes for classification learning. In IJCAI (pp. 1022–1029).

Fayyad, U., Piatetsky-Shapiro, G., & Smyth, P. (1996). From data mining to knowledge discovery in databases. AI Magazine, 17(3), 37.

Friedman, J.H. (1991). Multivariate adaptive regression splines. The Annals of Statistics, 19(1), 1–67.

Friedman, J.H., & Popescu, B.E. (2008). Predictive learning via rule ensembles. The Annals of Applied Statistics, 2(3), 916–954.

Glover, F., & Kochenberger, G.A. (Eds.) (2003). Handbook of Metaheuristics. International Series in Operations Research & Management Science (Vol. 57). Boston: Kluwer Academic Publishers.

Hastie, T., Tibshirani, R., & Friedman, J.H. (2001). The Elements of Statistical Learning. Springer Series in Statistics, Vol. 1. Springer: New York.

Jędrusik, M., & Świerczok, A. (2013). The correlation between corona current distribution and collection of fine particles in a laboratory-scale electrostatic precipitator. Journal of Electrostatics, 71(3), 199–203, http://dx.doi.org/10.1016/j.elstat.2013.01.002.

Khare, M., & Sinha, M. (1996). Computer-aided simulation of efficiency of an electrostatic precipitator. Environment International, 22(4), 451–462, http://dx.doi.org/10.1016/0160-4120(96)00033-5.

Kim, S., Park, H., & Lee, K. (2001). Theoretical model of electrostatic precipitator performance for collecting polydisperse particles. Journal of Electrostatics, 50(3), 177–190.

Lei, H., Wang, L.-Z., & Wu, Z.-N. (2008). EHD turbulent flow and Monte-Carlo simulation for particle charging and tracing in a wire-plate electrostatic precipitator. Journal of Electrostatics, 66(3–4), 130–141, http://dx.doi.org/10.1016/j.elstat.2007.11.001.

Liang, W.-J., & Lin, T. (1994). The characteristics of ionic wind and its effect on electrostatic precipitators. Aerosol Science and Technology, 20(4), 330–344.

Lu, C., & Huang, H. (1998). A sectional model to predict performance of a plate-wire electrostatic precipitator for collecting polydisperse particles. Journal of Aerosol Science, 29(3), 295–308.

Mizeraczyk, J., Podlinski, J., Niewulis, A., & Berendt, A. (2013). Recent progress in experimental studies of electro-hydrodynamic flow in electrostatic precipitators. Journal of Physics: Conference Series, 418(1), http://dx.doi.org/10.1088/1742-6596/418/1/012068.

Neimarlija, N., Demirdžić, I., & Muzaferija, S. (2011). Numerical method for calculation of two-phase electrohydrodynamic flows in electrostatic precipitators. Numerical Heat Transfer, Part A: Applications, 59(5), 321–348, http://dx.doi.org/10.1080/10407782.2011.549080.

O'Leary, D.E. (1993). The impact of data accuracy on system learning. Journal of Management Information Systems, 9(4), 83–98.

Ortiz, F.G., Navarrete, B., & Cañadas, L. (2010). Dimensional analysis for assessing the performance of electrostatic precipitators. Fuel Processing Technology, 91(12), 1783–1793, http://dx.doi.org/10.1016/j.fuproc.2010.07.013.

Pearl, J. (1988). Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann.

Quinlan, J.R. (1986). Induction of decision trees. Machine Learning, 1(1), 81–106.

Quinlan, J.R. (1993). C4.5: Programs for Machine Learning, Vol. 1. Morgan Kaufmann.

Soldati, A. (2000). On the effects of electrohydrodynamic flows and turbulence on aerosol transport and collection in wire-plate electrostatic precipitators. Journal of Aerosol Science, 31(3), 293–305.

Soldati, A. (2003). Cost-efficiency analysis of a model wire-plate electrostatic precipitator via DNS based Eulerian particle transport approach. Aerosol Science & Technology, 37(2), 171–182, http://dx.doi.org/10.1080/02786820300957.

Soldati, A., Andreussi, P., & Banerjee, S. (1993). Direct simulation of turbulent particle transport in electrostatic precipitators. AIChE Journal, 39(12), 1910–1919, http://dx.doi.org/10.1002/aic.690391203.

Talaie, M., Taheri, M., & Fathikaljahi, J. (2001). A new method to evaluate the voltage–current characteristics applicable for a single-stage electrostatic precipitator. Journal of Electrostatics, 53(3), 221–233, http://dx.doi.org/10.1016/S0304-3886(01)00142-5.

Talbi, E.-G. (2009). Metaheuristics: From Design to Implementation, Vol. 74. John Wiley & Sons, http://dx.doi.org/10.1002/9780470496916.

White, H.J. (1974). Resistivity problems in electrostatic precipitation. Journal of the Air Pollution Control Association, 24(4), 313–338, http://dx.doi.org/10.1080/00022470.1974.10469923.

Yamamoto, T., & Sparks, L.E. (1986). Numerical simulation of three-dimensional tuft corona and electrohydrodynamics. IEEE Transactions on Industry Applications, IA-22(5), 880–885, http://dx.doi.org/10.1109/TIA.1986.4504808.

Yamamoto, T., Okuda, M., & Okubo, M. (2003). Three-dimensional ionic wind and electrohydrodynamics of tuft/point corona electrostatic precipitator. IEEE Transactions on Industry Applications, 39(6), 1602–1607, http://dx.doi.org/10.1109/TIA.2003.818983.

Yan, X., Zhang, C., & Zhang, S. (2003). Toward databases mining: pre-processing collected data. Applied Artificial Intelligence, 17(5–6), 545–561, http://dx.doi.org/10.1080/713827171.

Zhuang, Y., Kim, Y.J., Lee, T.G., & Biswas, P. (2000). Experimental and theoretical studies of ultra-fine particle behavior in electrostatic precipitators. Journal of Electrostatics, 48(3–4), 245–260, http://dx.doi.org/10.1016/S0304-3886(99)00072-8.