Theses of the doctoral (PhD) dissertation
Data mining and machine learning algorithms for soft sensor development
University of Pannonia
Doctoral School in Chemical Engineering and Material Sciences
Supervisor: János Abonyi, DSc
Department of Process Engineering
Veszprém
2016
1. Introduction and the aim of the work
Nowadays, industry, and in particular the process industry, increasingly relies on new solutions related to information theory and artificial intelligence that can help to improve the technology and reduce the cost of instrumentation, automation and maintenance. Software sensors are capable of extending or even replacing the classical instrumentation of a technology. I developed techniques and tools to support the identification and maintenance of the data-driven models of soft sensors used for estimating product quality and energy usage.
The dissertation describes three different aspects of soft sensor development. The first two chapters focus on parametric and non-parametric modelling, while the third chapter deals with the selection and preprocessing of the data.
Regarding non-parametric models, I proposed a genetic programming based algorithm to generate dimension reduction mappings. I showed the applicability of these mappings in spectroscopic modelling and in solving the classical Wine classification benchmark problem. Finally, I presented tools for the selection of the input variables of these models and of the data segments. I demonstrated the capability of the methods in spectroscopic modelling and in the energy monitoring of chemical processes.
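To make the idea of a genetic programming based dimension reduction mapping more concrete, the sketch below shows how one candidate mapping could be represented and scored. It assumes a multi-chromosome individual with one expression tree per output coordinate, trees encoded as nested tuples, and Sammon's stress (how well pairwise distances survive the projection) as the fitness. The operator set and the names `evaluate`, `project`, `sammon_stress` and `fitness` are hypothetical, and the selection, crossover and mutation steps of a full GP run are omitted.

```python
import numpy as np

# Hypothetical operator set for the expression trees (protected division
# and other functions are omitted for brevity).
OPS = {"+": np.add, "-": np.subtract, "*": np.multiply}


def evaluate(tree, X):
    """Evaluate an expression tree on every row of X.

    A tree is ("x", j) for input variable j, ("c", value) for a constant,
    or (op, left, right) for a binary operator listed in OPS.
    """
    kind = tree[0]
    if kind == "x":
        return X[:, tree[1]]
    if kind == "c":
        return np.full(X.shape[0], float(tree[1]))
    return OPS[kind](evaluate(tree[1], X), evaluate(tree[2], X))


def project(individual, X):
    """Multi-chromosome individual: one tree per output (projected) dimension."""
    return np.column_stack([evaluate(tree, X) for tree in individual])


def pairwise_distances(Z):
    diff = Z[:, None, :] - Z[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def sammon_stress(X, Y, eps=1e-12):
    """Sammon's stress between the original data X and its projection Y."""
    iu = np.triu_indices(X.shape[0], k=1)
    d_in = np.maximum(pairwise_distances(X)[iu], eps)  # guard duplicate rows
    d_out = pairwise_distances(Y)[iu]
    return float(((d_in - d_out) ** 2 / d_in).sum() / d_in.sum())


def fitness(individual, X):
    """Lower is better: a GP loop would favour individuals with small stress."""
    return sammon_stress(X, project(individual, X))


# Usage: data that really lives in the first two of four coordinates.
rng = np.random.default_rng(0)
X = np.zeros((30, 4))
X[:, :2] = rng.normal(size=(30, 2))
print(fitness([("x", 0), ("x", 1)], X))   # exact projection: (numerically) zero
print(fitness([("x", 0), ("c", 0.0)], X))  # discards x1, so the stress is larger
```

Because every chromosome is an explicit formula of the input variables, the resulting projection can be written down and reused as a mapping equation, unlike implicit methods such as Sammon mapping, which only return coordinates for the training points.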
2. Experimental tools and technologies
We developed the dimension reduction mapping algorithms to extract useful information from data originating from the Diesel Blending Unit and the product development laboratory of MOL Duna Refinery. The datasets contained spectra recorded by ABB and Bruker spectrometers. For the processing of the spectra, I used the software of ABB and Bruker. We used MATLAB to implement our algorithms. For the development of the algorithms related to parametric modelling, the time series of process variables were extracted from the OSIsoft PI central data collection system of MOL Duna Refinery.
1. I have developed a genetic programming based solution to visualise high-dimensional datasets. The method can explore the operating regimes of online NIR analysers and can support the identification of classifier models.
(Related publications: [3, 5, 15, 18, 19, 14, 13])
(a) I developed a genetic algorithm based on a multi-chromosome representation to find explicit multi-dimensional projections of high-dimensional input spaces into lower dimensions. I applied the method to visualise the spectral databases of soft sensors while preserving the neighbourhood and distance relations of the NIR spectra. [3, 18, 19]
(b) I modified the cost function of the genetic programming to support the visualisation of classification problems. The results confirm that the performance of traditional classifiers improves when they are applied to goal-oriented projected data. Additionally, I defined a new classifier that uses convex polygons and also gives the user an informative view of the data. I have shown that the algorithm can separate the operational regimes of a technology and hence helps to define local models of soft sensors. [3, 5, 14, 13]
2. I developed parametric models for spectrometric applications and for target calculation in energy monitoring systems. I worked out methods that accelerate the modelling process and generate informative visualisations showing the hidden structure and the validity range of the models.
(Related publications: [7, 1, 4, 6, 16, 8, 5, 12, 9])
(a) I developed a validation process that can qualify the modelling performance and determine the validity range of spectrometric models. I presented models that can visualise and explore the hidden structures in the training dataset. I compared data from a flow-through cell and a fibre optic spectrometer and demonstrated that the more cost-efficient fibre optic system performs similarly to the flow-through cell system. [7, 6, 12, 9]
(b) I proved that Self-Organizing Maps (SOM) can separate the operating modes in energy monitoring (EM) systems, and I worked out a SOM-based feature ranking and selection tool. [1, 4, 6]
(c) I implemented a feature selection algorithm based on Random Forest (RF) regression as an extension of the developed framework. I used the RF to select relevant input variables for EM targeting models.
3. I developed goal-oriented multivariate time series analysis methods to support the identification of parametric and non-parametric models used in NIR analyser based soft sensors and energy monitoring systems.
(Related publications: [1, 4, 2, 12, 9, 14])
(a) I demonstrated that Principal Component Analysis based time series segmentation can be used to find consistent operating periods of production and events affecting the dynamic behaviour of the process. [1, 6]
(b) I developed a novel regression-based time series segmentation algorithm to detect homogeneous periods of operation based on the prediction accuracy of energy targeting models. The method can identify events when energy efficiency differs significantly. [1, 2]
(c) I demonstrated that Self-Organizing Maps can be used not only to isolate operating modes and to define local models for the individual operating regimes, but also for feature selection. The results illustrate that all of these functionalities support the building of compact models used in energy monitoring.
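Thesis 1(b) mentions a classifier built from convex polygons in the projected, two-dimensional space. Its details are not given here, so the following is only a hypothetical sketch of one natural reading: each class is represented by the convex hull of its projected training points (computed with Andrew's monotone chain algorithm), and a new point is assigned to the class whose polygon contains it. The tie-breaking rule for overlapping polygons or points outside every polygon (nearest mean of the hull vertices) and the name `ConvexPolygonClassifier` are assumptions.

```python
import numpy as np


def cross(o, a, b):
    """z-component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Andrew's monotone chain: hull vertices in counter-clockwise order."""
    pts = sorted(map(tuple, points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def inside_convex(polygon, q):
    """True if q lies inside or on the border of a counter-clockwise polygon."""
    n = len(polygon)
    if n < 3:
        return False  # hulls with fewer than three vertices contain nothing
    return all(cross(polygon[i], polygon[(i + 1) % n], q) >= 0 for i in range(n))


class ConvexPolygonClassifier:
    """Hypothetical sketch: one convex polygon per class in a 2-D projection."""

    def fit(self, Z, labels):
        labels = np.asarray(labels)
        self.polygons = {c: convex_hull(Z[labels == c]) for c in np.unique(labels)}
        self.centres = {c: np.mean(poly, axis=0) for c, poly in self.polygons.items()}
        return self

    def predict_one(self, q):
        q = np.asarray(q, dtype=float)
        hits = [c for c, poly in self.polygons.items() if inside_convex(poly, q)]
        # Overlaps and points outside every polygon are resolved by the
        # nearest mean of hull vertices (an assumption, not from the source).
        candidates = hits or list(self.polygons)
        return min(candidates, key=lambda c: np.linalg.norm(self.centres[c] - q))
```

A practical attraction of this kind of classifier is that the polygons can be drawn directly on the projected scatter plot, so the user sees the decision regions together with the data.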
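Theses 2(b) and 3(c) rely on self-organizing maps (SOM). A minimal numpy sketch of the underlying idea follows: train a small rectangular map with the classical online update rule, map each sample to its best-matching unit, and read operating modes off as groups of map units. The map size, the learning-rate and neighbourhood schedules and, in particular, the feature-ranking rule (spread of each codebook component across the map) are illustrative assumptions; the SOM-based ranking tool of the dissertation may use a different criterion.

```python
import numpy as np


def train_som(data, rows=4, cols=4, n_iter=2000, lr0=0.5, sigma0=None, seed=0):
    """Train a rectangular self-organizing map with the online SOM update rule."""
    rng = np.random.default_rng(seed)
    n = data.shape[0]
    if sigma0 is None:
        sigma0 = max(rows, cols) / 2.0
    # Grid coordinates of every map node, used by the neighbourhood function.
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    # Initialise the codebook vectors with randomly chosen samples (copied).
    codebook = data[rng.integers(0, n, size=rows * cols)].astype(float)
    for t in range(n_iter):
        frac = t / n_iter
        lr = lr0 * (1.0 - frac)              # linearly decaying learning rate
        sigma = sigma0 * (1.0 - frac) + 0.5  # shrinking neighbourhood radius
        x = data[rng.integers(0, n)]
        bmu = np.argmin(((codebook - x) ** 2).sum(axis=1))
        grid_d2 = ((grid - grid[bmu]) ** 2).sum(axis=1)
        h = np.exp(-grid_d2 / (2.0 * sigma ** 2))  # Gaussian neighbourhood
        codebook += lr * h[:, None] * (x - codebook)
    return codebook


def best_matching_units(data, codebook):
    """Index of the closest codebook vector (map node) for every sample."""
    d2 = ((data[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    return d2.argmin(axis=1)


def rank_features_by_map_spread(codebook):
    """Hypothetical ranking heuristic: features whose codebook values vary
    most across the map are treated as most relevant for separating modes."""
    spread = codebook.max(axis=0) - codebook.min(axis=0)
    return np.argsort(-spread)
```

With two well-separated operating modes, the samples of each mode should fall on disjoint sets of map units, and a separate local model could then be fitted to the samples of each group.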
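Theses 3(a) and 3(b) both segment a multivariate time series into homogeneous periods; they differ mainly in how the homogeneity of a segment is scored. The sketch below shows one common way to do this, a greedy bottom-up merge, with two interchangeable cost functions: the PCA reconstruction (Q) error of a segment, and the residual of a local linear regression model standing in for an energy targeting model. The merge strategy, the function names, `init_len` and the assumption that the last column holds the modelled output are illustrative choices, not the exact algorithms of the dissertation.

```python
import numpy as np


def pca_cost(segment, n_components=1):
    """PCA reconstruction (Q) error of a segment: squared residuals left after
    projecting the centred data onto its first n_components principal axes."""
    centred = segment - segment.mean(axis=0)
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    P = vt[:n_components].T
    residual = centred - centred @ P @ P.T
    return float((residual ** 2).sum())


def regression_cost(segment):
    """Squared residuals of a local linear model fitted within the segment.

    Assumes the last column is the modelled output (e.g. energy use) and the
    remaining columns are the model inputs.
    """
    A = np.column_stack([np.ones(len(segment)), segment[:, :-1]])
    y = segment[:, -1]
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return float(((y - A @ coef) ** 2).sum())


def bottom_up_segmentation(data, n_segments, cost, init_len=5):
    """Greedy bottom-up segmentation of a multivariate time series.

    Starts from short segments of init_len samples and repeatedly merges the
    pair of neighbours whose merge increases the total cost the least.
    Returns (start, end) index pairs with end exclusive.
    """
    n = len(data)
    bounds = [(s, min(s + init_len, n)) for s in range(0, n, init_len)]
    while len(bounds) > n_segments:
        increases = []
        for (a, b), (c, d) in zip(bounds[:-1], bounds[1:]):
            merged = cost(data[a:d])
            increases.append(merged - cost(data[a:b]) - cost(data[c:d]))
        i = int(np.argmin(increases))
        bounds[i:i + 2] = [(bounds[i][0], bounds[i + 1][1])]
    return bounds
```

For example, on an 80-sample series whose correlation structure (or input-output relation) changes at sample 40, asking for two segments with either cost function should return `[(0, 40), (40, 80)]`, since only the merge across the change point increases the cost.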
4. Utilization of results
Part of the results presented in the dissertation has already been utilised. A new practice has been introduced for maintaining and upgrading the model running in the ABB spectrometer of the Diesel Blending Unit in MOL Duna Refinery. This practice includes the generation of new explicit mapping equations (aggregates). A fibre optic spectrometer was installed in one of the experimental reactors of the product development department, where it is used to trace the experiments. In this system, partial least squares models provide the estimations of the material properties of the product. The feature selection methods are being used to build new targeting models for the energy monitoring systems.
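As a hedged illustration of the partial least squares (PLS) model class mentioned above, and not of the deployed spectrometer or MATLAB implementation, here is a minimal single-response PLS (PLS1) fit using the textbook NIPALS algorithm in numpy. The function names and the choice of NIPALS are assumptions; each row of `X` plays the role of a spectrum and `y` the measured product property.

```python
import numpy as np


def pls1_fit(X, y, n_components):
    """Fit a single-response PLS (PLS1) model with the NIPALS algorithm.

    Returns (x_mean, y_mean, coef) so that y_hat = (X - x_mean) @ coef + y_mean.
    """
    x_mean, y_mean = X.mean(axis=0), y.mean()
    E = X - x_mean  # X residual, deflated after every latent variable
    f = y - y_mean  # y residual
    W, P, q = [], [], []
    for _ in range(n_components):
        w = E.T @ f
        w /= np.linalg.norm(w)   # weight vector of unit length
        t = E @ w                # score vector
        tt = t @ t
        p = E.T @ t / tt         # X loading
        q_a = (f @ t) / tt       # y loading
        E = E - np.outer(t, p)   # deflate X
        f = f - q_a * t          # deflate y
        W.append(w)
        P.append(p)
        q.append(q_a)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    # Regression vector expressed in the original (centred) X variables.
    coef = W @ np.linalg.solve(P.T @ W, q)
    return x_mean, y_mean, coef


def pls1_predict(model, X):
    x_mean, y_mean, coef = model
    return (X - x_mean) @ coef + y_mean
```

The number of latent variables controls the trade-off between fit and robustness; in practice it would be chosen by cross-validation rather than fixed in advance.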
The dissertation presented the application of the genetic algorithm to generate dimension reduction mappings of spectral datasets. However, besides chemical applications, these tools could be used effectively in other data mining problems.
The presented time series evaluation methods have already been utilised in telecommunications, more specifically in the platforms supplied by I-New Unified Mobile Solutions. There, these methods can detect several fault and incident patterns before they cause a complete service outage. The time series analysis has been included in the monitoring system of the Mobile Virtual Network Operators (MVNOs) Virgin Mobile Colombia, Virgin Mobile Chile, Compass and other providers.
5. Publications related to the theses
Articles in international journals
Janos Abonyi, Tibor Kulcsar, Miklos Balaton, and Laszlo Nagy. Historical process data based energy monitoring - model based time-series segmentation to determine target values. Chemical Engineering Transactions, 35:931–936, 2013.
Janos Abonyi, Tibor Kulcsar, Miklos Balaton, and Laszlo Nagy. Energy monitoring of process systems: time-series segmentation-based targeting models. Clean Technologies and Environmental Policy, 16(7):1245–1253, 2014.
Tibor Kulcsar and Janos Abonyi. Development of a modelling framework for NIR spectroscopy based on-line analyzers using dimensional reduction techniques and genetic programming. Chemical Engineering Transactions, 32, 2013.
Tibor Kulcsar, Miklos Balaton, Laszlo Nagy, and Janos Abonyi. Feature selection based root cause analysis for energy monitoring and targeting. Chemical Engineering Transactions, 39:709–714, 2014.
Tibor Kulcsar, Barbara Farsang, Sandor Nemeth, and Janos Abonyi. Multivariate statistical and computational intelligence techniques for quality monitoring of production systems. In Cengiz Kahraman and Seda Yanık, editors, Intelligent Decision Making in Quality Management, volume 97 of Intelligent Systems Reference Library, pages 237–263. Springer International Publishing, 2016.
Articles in Hungarian journals
Tibor Kulcsar and Janos Abonyi. Statistical process control based performance evaluation of on-line analysers. Hungarian Journal of Industry and Chemistry, 41(1):77–82, 2013.
Tibor Kulcsar, Gabor Sarossy, Gabor Bereznai, Robert Auer, and Janos Abonyi. Partial least squares model based process monitoring using near infrared spectroscopy. Periodica Polytechnica Chemical Engineering, 57(1-2):15–20, 2013.
J. Abonyi, B. Farsang, and T. Kulcsar. Data-driven development and maintenance of soft-sensors. In Applied Machine Intelligence and Informatics (SAMI), 2014 IEEE 12th International Symposium on, pages 239–244. IEEE, Jan 2014.
J. Abonyi, B. Farsang, and T. Kulcsar. Data-driven development and maintenance of soft-sensors. In 2014 IEEE 12th International Symposium on Applied Machine Intelligence and Informatics (SAMI), Herľany, Slovakia, 2014.
Janos Abonyi, Tibor Kulcsar, Miklos Balaton, and Laszlo Nagy. Historical process data based energy monitoring - model based time-series segmentation to determine target values. In PRES'13 - Conference Process Integration, Modelling and Optimisation for Energy Saving and Pollution Reduction, Rhodes, Greece, 2013.
Janos Abonyi, Tibor Kulcsar, Miklos Balaton, and Laszlo Nagy. Feature selection based root cause analysis for energy monitoring and targeting. In PRES 2014 - Conference Process Integration, Modelling and Optimisation for Energy Saving and Pollution Reduction, Prague, Czech Republic, 2014.
Tibor Kulcsar and Janos Abonyi. Partial least squares model based process monitoring using near infrared spectroscopy. In Chemical Engineering Days '12, Veszprem, Hungary, 2012.
Tibor Kulcsar and Janos Abonyi. Development of a modeling framework for NIR spectroscopy based on-line analysers using dimensional reduction techniques and genetic programming. In ICheaP12 - International Conference on Chemical and Process Engineering, 2014.
Tibor Kulcsar, Gabor Bereznai, Gabor Sarossy, Robert Auer, and Janos Abonyi. Data-driven development and maintenance of soft-sensors. In Visualisation of High Dimensional Data by Use of Genetic Programming: Application to On-line Infrared Spectroscopy Based Process Monitoring, 2012.
Tibor Kulcsar, Gabor Bereznai, Gabor Sarossy, Robert Auer, and Janos Abonyi. Visualisation of high dimensional data by use of genetic programming: Application to on-line infrared spectroscopy based process monitoring. In Soft Computing in Industrial Applications, volume 223 of Advances in Intelligent Systems and Computing, pages 223–231. Springer International Publishing, 2014.
Tibor Kulcsar, Peter Koncz, Miklos Balaton, Laszlo Nagy, and Janos Abonyi. Statistical process control based energy monitoring of chemical processes. In Petar Sabev Varbanov, Jiří Jaromír Klemeš, and Peng Yen Liew, editors, 24th European Symposium on Computer Aided Process Engineering, volume 33 of Computer Aided Chemical Engineering, pages 397–402. Elsevier, 2014.
Tibor Kulcsar, Peter Koncz, Miklos Balaton, Laszlo Nagy, and Janos Abonyi. Statistical process control based energy monitoring of chemical processes. In ESCAPE 24 - 24th European Symposium on Computer Aided Process Engineering, Budapest, Hungary, 2014.
Tibor Kulcsar, Gabor Sarossy, Gabor Bereznai, Robert Auer, and Janos Abonyi. Visualization and indexing of spectral databases. In Proceedings of International Conference on Computational and Statistical Sciences, volume 6, pages 860–865. World Academy of Science, Engineering and Technology, 2012.
Tibor Kulcsar, Gabor Sarossy, Gabor Bereznai, Robert Auer, and Janos Abonyi. Visualization and indexing of spectral databases. In International Conference on Computational and Statistical Sciences, Zurich, Switzerland, 2012.