Wastewater quality monitoring system using sensor fusion and machine learning techniques

Xusong Qin, Furong Gao, Guohua Chen*

Department of Chemical and Biomolecular Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong, China

* Corresponding author. E-mail address: kechengh@ust.hk (G. Chen).

Water Research 46 (2012) 1133-1144. doi:10.1016/j.watres.2011.12.005. Available online at www.sciencedirect.com; journal homepage: www.elsevier.com/locate/watres. 0043-1354/$ - see front matter. © 2011 Elsevier Ltd. All rights reserved.

Article history: Received 22 June 2011; Received in revised form 4 October 2011; Accepted 5 December 2011; Available online 11 December 2011.

Keywords: Online monitoring; UV/Vis spectroscopy; Variable weighting; Wastewater treatment

Abstract

A multi-sensor water quality monitoring system incorporating a UV/Vis spectrometer and a turbidimeter was used to monitor the Chemical Oxygen Demand (COD), Total Suspended Solids (TSS) and Oil & Grease (O&G) concentrations of the effluents from the Chinese restaurant on campus and an electrocoagulation-electroflotation (EC-EF) pilot plant. In order to handle the noise and information unbalance in the fused UV/Vis spectra and turbidity measurements during the calibration model building, an improved boosting method, Boosting-Iterative Predictor Weighting-Partial Least Squares (Boosting-IPW-PLS), was developed in the present study. The Boosting-IPW-PLS method incorporates IPW into the boosting scheme to suppress the quality-irrelevant variables by assigning them small weights, and builds up the models for the wastewater quality predictions based on the weighted variables. The monitoring system was tested in the field with satisfactory results, underlining the potential of this technique for the online monitoring of water quality.

1. Introduction

Monitoring wastewater quality is a subject of growing importance around the world, such that a better understanding of both treated and untreated effluents is needed for better control of treatment plants. For example, it has been estimated that online monitoring for real-time process control may save as much as 40% of the energy (the major part of the cost) currently needed for wastewater treatment by continuous aeration (Chambers and Jones, 1988; Olsson et al., 2005). However, the available wastewater quality monitoring technologies have several drawbacks in what concerns the control and optimization of the treatment plants (Pouet et al., 1999). In addition to sampling and sample storage problems, the standard analytical methods currently available do not allow the implementation of real-time monitoring and process control. The use of better online sensors is in imminent demand for water quality monitoring. However, due to the complex pollutant matrix, the generally hostile environment, and the lack of accurate, cost-effective and robust sensors, the automation of wastewater treatment systems is still not as developed as in other process industries.

In view of the high potential for the development and application of online measurements in water quality monitoring, UV/Vis spectroscopy has attracted substantial attention and led to some useful results (Bookman, 1997; Fogelman et al., 2006; Lourenco et al., 2006), although most of the reported wastewater UV/Vis spectrometric applications are based on visual observation and direct comparison of the UV/Vis spectra (Roig et al., 2002). Several reports applied multivariate monitoring approaches (Lourenco et al., 2006; Wu et al., 2006; Wu, 2007). It is worth pointing out that in most of these reports the UV/Vis spectra were examined with the wastewater samples being static and of low strength. However, for

some effluents such as restaurant wastewater, the concentrations of oil and grease, suspended solids and colloids are very high (Chen et al., 2000), making it very difficult to obtain a satisfactory monitoring performance using a single optical technique. In addition, these pollutants tend to foul the light transmitting windows, degrading the monitoring performance of the UV/Vis spectroscopy. Thus, one may need to develop a combined optical method, combining two or three optical spectral sources and process variables to compensate for their respective drawbacks so as to improve the monitoring performance (Russell et al., 2003; Thomas and Constant, 2004; Wu, 2007; Wu et al., 2007).

    The combined optical method is essentially one typical

    application of the sensor fusion technology (Wu, 2007). Sensor

    fusion, which is also known as multi-sensor data fusion, first

    appeared in the literature as mathematical models for data

    manipulation in the 1960s. It refers to the acquisition, pro-

    cessing and synergistic combination of information gathered

    by various knowledge sources and sensors to provide a better

    understanding of a phenomenon. It is a fascinating and

    rapidly evolving field that has generated a lot of excitement in

    the research and development community, and is being

applied to a wide variety of fields such as military

    command and control, robotics, image processing, air traffic

control, medical diagnostics, pattern recognition and environmental monitoring (Varshney, 1997). However, the applications of sensor fusion techniques are disparate and problem-dependent (Klein, 2004; Esteban et al., 2005); no single technique/algorithm, and no single implementation framework, fits all sensor fusion problems. Therefore, in order to

    construct an optimal sensor fusion system for the combined

    optical monitoring system, one has to properly select the

    optical spectral sources/process variables, the sensor fusion

    framework and its corresponding algorithms at each step. As

    a result, the candidate optical spectral sources and process

    variables for the combined optical method should be

    complementary from chemical and/or physical aspects for the

    water quality measurement. However, in spite of their infor-

    mation complementarities, information redundancy and

    unbalance problems usually occur because of the variable

    collinearity among spectral wavelengths and process vari-

    ables, and the significant dimensionality difference among

    the various types of optical spectra and process variables. If

    the sensor fusion strategy and fusion levels are not properly

    selected, the modeling effort may concentrate on the data

    sources with higher dimensionality rather on the more

    quality-informative low dimensional data sources. The cali-

    bration model obtained therefore may be less accurate due to

    the loss of quality-related information. In addition, the results

    obtained are oftenworsened by the presence of uninformative

    variables, such as the highly fluctuating background and

    noises, in optical spectra/process variables. Since not all

    wavelengths/variables are useful for quality prediction,

    various sensor fusion systems and variable selection/feature

    extraction methods (Wu, 2007; Wu et al., 2007; Centner et al.,

    1996; Forina et al., 1999; Swierenga et al., 2000; Goicoechea

and Olivieri, 2002; Chen et al., 2004; Lu et al., 2007) have been designed to cooperate with the calibration modeling methods. In this way, the variable selection procedure is performed only once, and only one set of the most quality-related variables is retained as descriptors for regression modeling. However, this may lead to more or less information loss compared with the original spectra/process variable space, resulting in accuracy loss of the regression models.


    In the past two decades, the application of Boosting to

    regression problems has received significant attention

    because its ensemble learning nature can produce higher

    predictive accuracy than single model strategies. Freund and

    Schapire (1997) proposed the first algorithm of Boosting for

    regression problems, the AdaBoost.R algorithm. The most

    important contribution of this method is the majority vote

    idea that combines a group of weighted weak learners (fitting

    models) which only guarantee to achieve an error rate of

    slightly less than that achieved by random guessing. The

    weights of the weak learners are defined by their accuracies,

respectively. Drucker (1997) developed the AdaBoost.R2 algorithm, which is an ad hoc modification of the AdaBoost.R algorithm. The advantage of Drucker's method is its flexibility, i.e., any learner, whether linear or non-linear, can be incorporated. Many other researchers (for example,

    Ridgeway (1999), Friedman et al. (2000), Friedman (2001),

    Zemel and Pitassi (2001), Duffy and Helmbold (2002)) have

    viewed Boosting as a gradient machine that optimizes

    a particular loss function. In this sense, Boosting is essentially

    a method that combines a group of weak learners that

    perform marginally better than random guessing to obtain

    a powerful learner in regression. These weak learners are

    constructed through iterative steps by always using a basic

    learning algorithm. In each step, a new learner (fitting model)

is established by relating the predictor variables of X (an n × p matrix with p predictors for n samples) to the residuals (prediction errors) of the responses y (an n × 1 vector for n samples) that are not fitted by previous learners. The

Boosting-partial least squares (Boosting-PLS) method and its recently proposed modifications (Zhang et al., 2005; Wu et al., 2006; Zhou et al., 2007; Lutz et al., 2008; Shao et al., 2010; Tan et al., 2010) introduced PLS into the boosting procedure by combining a set of shrunken PLS models. However, the above

    Boosting methods construct all the weak learners based on

    the same p predictor variables of X. The highly fluctuating

    background and noises in X definitely weaken the predictive

    ability of the weak learners, as well as the Boosting model.

From the nature of the sequential addition of weak learners in the boosting procedure, it is worthwhile to extract the quality-informative features/variables separately for the weak learner in each iteration, to improve the predictive accuracy and robustness of the Boosting model.

    Because the trade effluent surcharge of the restaurant is

    determined by the effluent COD value according to the sewage

    services ordinance of Hong Kong (DSD-HKSAR, 2011), and the

    suspended solids and oil & grease have strong impacts on the

    light penetration and the biodegradation ability of the water,

    these three water quality indices are of particular interest in

    the present study. The objective of the present study was

    therefore to develop a wastewater quality monitoring system

that incorporates a UV/Vis spectrometer and a turbidimeter to monitor the COD, TSS and O&G concentrations of the effluents of the Chinese restaurant on campus and a pilot EC-EF wastewater treatment plant. A sensor fusion technique was used to fuse the signals from these two sensors/instruments. An improved boosting method, Boosting-IPW-PLS, was developed here to handle the noise and information unbalance of the fused information, and to model and predict the water quality. The system was evaluated in field trials; satisfactory results were obtained, as shown subsequently.

    2. Theory and methods

    2.1. Partial least squares (PLS)

Suppose a data set {X, y}, where X is an n × p process variable data matrix with p predictor variables for n samples, e.g. n wastewater samples with corresponding UV/Vis spectra (each UV/Vis spectrum has p wavelengths), and y is the corresponding dependent variable vector with size n × 1, e.g. the COD measurements of these n wastewater samples. PLS models both the outer relations within the X and y blocks and the inner relations between the two blocks. The equations are as follows:

X = TP^T + E = Σ_{i=1}^{a} t_i p_i^T + E,  (1)

y = UQ^T + F = Σ_{i=1}^{a} u_i q_i^T + F,  (2)

û_i = b_i t_i,  (3)

where T, P and E are the score matrix, loading matrix and residual matrix of the X space, while U, Q and F are the score matrix, loading matrix and residual matrix of the y space, respectively; t_i, p_i, u_i and q_i are the corresponding vectors of the T, P, U and Q matrices. Eqs. (1) and (2) describe the outer relations, Eq. (3) describes the inner relation between the y space and the X space, a is the number of latent variables, and b_i = t_i^T u_i / (t_i^T t_i) is the regression coefficient between the PLS component t_i from the X space and the PLS component u_i from the y space. The standard PLS procedure based on the non-linear iterative partial least squares (NIPALS) algorithm and the methods to choose the number of latent variables (LVs) of PLS (cross-validation, jackknife and so on) can be found in the works of Wold et al. (1984), Geladi and Kowalski (1986), and Hoskuldsson (1988).

Based on the NIPALS algorithm, the prediction for a new sample j can be written as:

ŷ_j^a = x_j b^a + c^a,  (4)

where b^a is the coefficient vector (p × 1), c^a is the offset when a latent variables are used, and x_j is the predictor variable vector (1 × p) of sample j.

2.2. Uninformative variable elimination-partial least squares (UVE-PLS)

In a PLS model, some of the variables can be noisy and/or contain no information relevant to the prediction of y. Eliminating these variables from the explanatory data can improve the model. The UVE-PLS method proposed by Centner et al. (1996) is one of the methods to eliminate the uninformative variables. In the UVE-PLS method, a PLS regression coefficient matrix b^a = [b_1^a, ..., b_p^a] with a latent variables is calculated through a leave-one-out validation. Because each coefficient b_k^a represents the contribution of the corresponding variable to the established model, the reliability of each variable (wavelength) k can be quantitatively measured by its stability, defined as:

c_k = mean(b_k^a) / std(b_k^a),  (5)

where mean(b_k^a) and std(b_k^a) are the mean and standard deviation of the regression coefficients of variable k. To determine the uninformative variables, UVE-PLS adds an equal number of random predictors, i.e. artificial predictors with very small values (in the range of about 10^-10), to the original predictors. The maximum of the absolute reliability values given by Eq. (5) for the added artificial predictors is the cut-off value C_cutoff for the elimination of non-informative original predictors. Only the original variables whose reliability values are larger than C_cutoff are retained.

2.3. Iterative predictor weighting-partial least squares (IPW-PLS)

The IPW-PLS method, originally developed by Forina et al. (1999), aims at producing acceptable calibration models with a small number of variables, with useless or redundant predictors in the PLS regression eliminated. The key component of IPW-PLS is to multiply the variables by their importance in a cyclic repetition of the PLS regression. The importance of variable k is defined as:

z_k = |b_k^a| s_k / Σ_{k=1}^{p} |b_k^a| s_k,  (6)

where s_k and b_k^a are the standard deviation and PLS regression coefficient of variable k, respectively, and p is the number of variables.

    2.4. Boosting-partial least squares (Boosting-PLS)

The basic idea of Boosting is to sequentially construct additive regression models by fitting a basic learner to the current residuals that are not fitted by previous models; finally, the weighted predictions of the collection of regression models are used as the ensemble prediction. Using PLS as the basic/weak algorithm, one obtains the Boosting-PLS algorithm (Zhang et al., 2005). Because of the nature of Boosting, Boosting-PLS does not require the selection of an adequate number of latent variables, in contrast to classical PLS. With a proper shrinkage value and number of iterations, Boosting-PLS has prediction ability at least comparable to that of classical PLS (Zhang et al., 2005).

    2.5. Boosting-IPW-PLS

In the standard boosting procedures, the weak learners are constructed from the X matrix (n × p, with p predictors for n samples) and the current residuals (n × 1, for n samples) that are not fitted by previous models. In the Boosting model construction process, the weights of the samples are updated according to the prediction errors of previous models and are utilized for later weak learner training. However, the predictors in the X matrix are not weighted/modified. The quality-irrelevant predictors, which are not useful for the estimation of the quality indices, definitely affect the establishment of the weak learners; therefore, loss of accuracy and robustness of the Boosting model might occur. Realizing this fact, a robust calibration modeling method, Boosting-IPW-PLS, is proposed in the present study on the basis of predictor weighting and the boosting framework, as shown in Fig. 1. The proposed algorithm is derived from the SQUARELEV.R algorithm developed by Duffy and Helmbold (2002). It integrates predictor weighting into the boosting procedure to suppress the quality-irrelevant variables by assigning them small weights. The algorithm is as follows.

Consider a data set {X, y} and a Boosting model with size M (M weak learners), where X is an n × p process variable data matrix with p predictor variables for n samples, and y is the corresponding dependent variable vector with size n × 1. The Boosting model F_0 is initialized as the zero function. For m = 1, 2, ..., M, repeat the following Steps 1-6.

Step 1. Calculate the residual of the Boosting model obtained in the last iteration,

y_res,m = y - F_{m-1}(X).  (7)

Step 2. Perform predictor weighting on the original variables with respect to X and y_res,m using the IPW-PLS method, and get the variable weight vector w_m with size 1 × p according to Eq. (6).

Step 3. Multiply the variable weights with the predictors of the original X,

X_m = X .* W_m.  (8)

Here W_m is an n × p matrix with each row being the variable weight vector w_m, and the operator .* is the element-by-element multiplication operator.

Step 4. Construct a PLS weak learner f_m on the weighted predictors X_m and the current residual y_res,m,

y_res,m = f_m(X_m) + E_m = X_m b_m + c_m + E_m,  (9)

where b_m and c_m are the corresponding PLS regression coefficient vector and offset, and E_m is the residual not fitted by the current weak learner.

Step 5. Calculate the shrinkage value of the current weak learner,

α_m = [y_res,m - mean(y_res,m)]^T [f_m(X_m) - mean(f_m(X_m))] / ‖f_m(X_m) - mean(f_m(X_m))‖^2,  (10)

where mean(y_res,m) and mean(f_m(X_m)) are the mean values of the current residual and of the predictions, respectively.

Step 6. Update the Boosting model,

F_m = F_{m-1} + α_m f_m.  (11)

Fig. 1 - Schematic diagram of the Boosting-IPW-PLS method.

After finishing M boosting cycles, M member PLS models are built and their corresponding shrinkage parameters are determined. The dependent variables y_u (l × 1 vector) of l unknown samples with measurement matrix X_u (l × p matrix) are predicted as:

ŷ_u = α_1 ŷ_u,1 + α_2 ŷ_u,2 + α_3 ŷ_u,3 + ... + α_M ŷ_u,M = Σ_{m=1}^{M} α_m [(X_u .* U_m) b_m + c_m],  (12)

where U_m is an l × p matrix with each row being the variable weight vector w_m determined in Step 2, and b_m, c_m and α_m are determined in Steps 4 and 5, respectively.

Now three questions remain. The first two are the numbers of PLS components used in IPW-PLS for the variable weighting and in the PLS weak learner construction; the third is when to stop adding models, that is, the size M. For the first two questions, one may use cross-validation, jackknife and other methods to determine the latent variables, as discussed in the works of Wold et al. (1984), Geladi and Kowalski (1986) or Hoskuldsson (1988). For simplicity, only one PLS component was used in IPW-PLS for the variable weighting and for the PLS weak learner construction in the present study. Since the X space and the residual space are changed in every iteration, it is very important to select an appropriate number of iterations, M, to avoid overfitting. We propose one stopping criterion here to determine when to stop adding models, derived from the structural risk minimization (SRM) principle (Vapnik, 2000) and Akaike's information criterion (AIC) (Akaike, 1974):

min C_M = n ln( ‖y - ŷ_M‖^2 / n ) + 2 Σ_{m=1}^{M} α_m Σ_{h=1}^{p} δ_{m,h},

s.t. δ_{m,h} = 1 if |b_{m,h}| > 10^-6, and δ_{m,h} = 0 if |b_{m,h}| ≤ 10^-6; h = 1, 2, ..., p; m = 1, 2, ..., M,  (13)

where ŷ_M is the prediction of the response y using a Boosting model with size M, and b_{m,h} and α_m are the PLS regression coefficients and the shrinkage value of weak learner m. The first term of

Eq. (13) represents the empirical error of the model on the training set. The second term represents the complexity of the Boosting model. For each weak learner, its model complexity is considered to be the number of variables whose corresponding absolute PLS regression coefficients are larger than a very small value, for example 10^-6 in the present study. Since the prediction of a weak learner contributes only partially to the final prediction, as shown in Eq. (12), the corresponding weak learner's model complexity also contributes only partially to the final Boosting model complexity. Because of the nature of Boosting, the empirical error of the Boosting model with respect to the training set will decrease when more weak learners are added; however, the complexity of the Boosting model will increase accordingly. When the decrease of the empirical error predominates over the increase of the model complexity, CM decreases, and vice versa. As a consequence, CM decreases in the initial few iterations, and then increases as more weak learners are added. The optimal tradeoff between the empirical error and the model complexity therefore leads to the minimum CM. Thus, the M value corresponding to the minimum CM is selected as the size of the Boosting model.

3. Experimental

3.1. Hardware setup

The experimental setup, including the EC-EF treatment system and the water quality monitoring system, is schematically shown in Fig. 2. The water quality monitoring system consists of two parts: the UV/Vis spectrometer and the turbidimeter. Both were operated in continuous/online measurement mode in the present study. In particular, the wastewater was in motion, falling through an open chamber with a specified jet flow of 10 mm in diameter for the UV/Vis spectrum acquisition. Since there is no contact between the wastewater and the optical windows, fouling of the light transmitting windows could be eliminated. The monitoring performance of the proposed system could then be improved, and the corresponding maintenance of the sensors could be reduced to an acceptable level.

Fig. 2 - Schematic diagram of the EC-EF treatment system and the wastewater quality monitoring system.

The UV/Vis spectrometer has a wavelength range from 200 to 800 nm, with 1749 variables in the present study. The UV/Vis spectrum data could be directly imported to the computer, while the turbidity measurement needs to be acquired through the data acquisition system. The measurement range of the turbidimeter is 0-1000 NTU. As a consequence, two types of information, the turbidity measurement and the UV/Vis spectrum of the water sample, could be utilized for the water quality monitoring purpose, as seen from Fig. 2. Taking the turbidity measurement as one artificial variable at the UV/Vis wavelength 801 nm, the combination of the UV/Vis spectrum and the turbidity measurement has 1750 variables, ranging from 200 to 801 nm.

In order to introduce the wastewaters from the restaurant, the EC reactor and the EF reactor to the water quality monitoring system, a sampling system was installed. The wastewater flows supplied to the water quality monitoring system were determined by the operation of the switch valve. In the present study, the switching time of the switch valve was 3 min, providing 2 min for flushing and 1 min for water quality monitoring. Therefore, the sampling interval for the effluents from the restaurant, the EC reactor and the EF reactor was 9 min in the present study. It is worth noting that this comparatively long sampling interval is due to the switching operation of the switch valve for the introduction of the three wastewater sources to the monitoring system. If the monitoring system is dedicated to one water source, real-time water quality monitoring could be achieved, because the response time of the water quality monitoring system is only 1 s.

    3.2. Wastewater characteristics

In order to test the validity of the proposed water quality monitoring system, a total of 163 samples were collected from the Chinese restaurant and the pilot EC-EF wastewater treatment plant on campus for COD, TSS and O&G measurements. Due to a sample handling problem during the O&G analysis, 10 samples were discarded for the O&G measurements. As a consequence, the numbers of samples for the COD, TSS and O&G measurements are 163, 163 and 153, respectively. These sample sets were then randomly separated into two subsets: two-thirds for prediction model training and the remaining one-third for testing. The concentrations of COD, TSS and O&G and the sample numbers for model training and testing are listed in Table 1. The pH, conductivity and turbidity of these samples ranged from 4.65 to 9, 172 to 1750 μS/cm, and 1.68 to 254 NTU, respectively. Here, TSS, pH and conductivity were examined by the standard methods (APHA et al., 2005). Turbidity was measured using an HF Micro TOL turbidimeter (HF Scientific, Inc., USA). COD was measured using a COD reactor and a direct reading spectrophotometer (DR/2000, Hach Company, USA). O&G was examined by EPA method 1664A (US EPA, 1999).

Table 1 - Characteristics of wastewater samples.

       Range (mg/L)   Training samples   Testing samples
COD    176-2550       108                55
TSS    9.7-410        108                55
O&G    0.93-525       102                51

Table 2 lists the details of some typical wastewater samples, with their corresponding UV/Vis spectra shown in Fig. 3. As seen from Fig. 3, the UV/Vis spectra are very noisy due to the strong interactions between the UV/Vis light and the particles and the oil and grease droplets, especially in the UV wavelength range 200-250 nm and the near-infrared wavelength range 600-800 nm.

Table 2 - Typical wastewater samples.

             COD (mg/L)   TSS (mg/L)   O&G (mg/L)   Turbidity (NTU)
Restaurant   1370         193          213          164
EC reactor   782          224          81.7         120
EF reactor   497          53.6         14.5         24.1

Fig. 3 - Typical UV/Vis spectra (200-800 nm) of the effluents from (1) restaurant, (2) EC reactor, and (3) EF reactor.

4. Results and discussions

4.1. Data pre-processing and sensor fusion

The presence of high concentrations of solids, oil and grease would affect the UV/Vis spectrum measurement significantly, as illustrated in Section 1 and demonstrated in Section 3.2. The measurement of turbidity is a key test of water quality: it is a measure of the degree to which the water sample loses its transparency due to the presence of suspended particulates, droplets of oil and grease, etc., which scatter the light and prevent it from passing through. From this aspect, turbidity is essentially complementary to the UV/Vis spectrum. Thus, two different kinds of instruments/sensors, a UV/Vis spectrometer and a turbidimeter, were utilized and fused for the construction of the water quality monitoring system in the present study.

As described in Section 3.1, the turbidity measurement has only one variable while the UV/Vis spectrum has 1749 variables. Consequently, one could either perform the sensor fusion after the feature extraction of the UV/Vis spectrum, or vice versa. In the present study, signal-level fusion was adopted in order to utilize the information of the UV/Vis spectrum and the turbidity signal as much as possible. Prior to the sensor fusion, a standard variate transformation was performed on the UV/Vis spectra and the turbidity signals, respectively. As a result, the fused data have zero means and unit variances.

4.2. Performance indices

To evaluate the performance of the proposed method/system, the following indices are used.

Maximum prediction error:

MaxE = max_j |ŷ_j - y_j|, j = 1, ..., k.  (14)

Minimum prediction error:

MinE = min_j |ŷ_j - y_j|, j = 1, ..., k.  (15)

Root mean square error of prediction:

RMSEP = sqrt( (1/k) Σ_{j=1}^{k} (ŷ_j - y_j)^2 ).  (16)

Correlation coefficient between the prediction values and the analytical values:

R = Σ_{j=1}^{k} (ŷ_j - mean(ŷ))(y_j - mean(y)) / sqrt( Σ_{j=1}^{k} (ŷ_j - mean(ŷ))^2 · Σ_{j=1}^{k} (y_j - mean(y))^2 ),  (17)

where k is the number of samples in the test set, ŷ_j is the prediction of y_j for sample j of the test set, and mean(ŷ) and mean(y) are the mean values of the predictions and the responses, respectively.

Fig. 4 - The empirical error, model complexity and CM with various sizes of Boosting-IPW-PLS models for: (a) COD, (b) TSS, (c) O&G measurements.
    4.3. Boosting-IPW-PLS model building

4.3.1. Determination of the size of Boosting-IPW-PLS

Before obtaining final prediction models for different quality

    parameters, one needs to determine the size of Boosting-IPW-

    PLS, as illustrated in Section 2.5. Fig. 4 shows the influence of

    the ensemble sizes on the empirical error, model complexity,

and CM for COD, TSS and O&G measurements, respectively. As

    seen in Fig. 4, when the ensemble size increased, the empirical

    error gradually decreased. This was obvious because in each

    iteration the new weak learner focused on the residual of the

Boosting-IPW-PLS model obtained in the last iteration. However,

with too large an M, the Boosting model often becomes too

strongly tailored to the particularities of the training set and

the model's generalization capability to new water

samples would be poorer. In addition, when the ensemble size

    increased, more weak learners were added into the Boosting

model. As a consequence, the complexity of the Boosting model

was increased, as shown in Fig. 4. Because a prediction model

with too complex a structure is not preferable in practice, an

appropriate ensemble size of the Boosting model should be determined.
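The residual-fitting loop described above can be sketched as an L2-Boosting iteration (names are ours; `fit` stands in for the IPW-PLS weak-learner trainer, and the shrinkage factor of 0.9 is the value the paper uses for Boosting-PLS):

```python
def boost_residuals(fit, X, y, m_rounds, shrinkage=0.9):
    """L2-Boosting sketch: each round fits a weak learner to the current
    residuals and adds it to the ensemble with a shrinkage factor.
    `fit(X, r)` is a stand-in for the weak-learner trainer and must
    return a prediction function."""
    learners = []
    residual = list(y)
    for _ in range(m_rounds):
        predict = fit(X, residual)
        learners.append(predict)
        residual = [r - shrinkage * predict(x) for r, x in zip(residual, X)]
    # the ensemble prediction is the shrunken sum of all weak learners
    return lambda x: shrinkage * sum(p(x) for p in learners)

# toy weak learner: always predicts the mean of its training residuals
def mean_learner(X, r):
    mu = sum(r) / len(r)
    return lambda x: mu

model = boost_residuals(mean_learner, [0, 0], [4.0, 4.0], m_rounds=2)
```

With each added learner the remaining residual shrinks, which is why the empirical error in Fig. 4 decreases monotonically with M.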


The proposed stopping criterion CM first decreased and

    then increased as the ensemble size increased for the

    Boosting-IPW-PLS models of these three water quality

    measurements, as shown in Fig. 4. This phenomenon was

    expected because the trend of CM is determined by the

    empirical error and the model complexity, as described in Eq.

(13). CM would decrease if the decrease of the empirical error

    was larger than the increase of themodel complexity, and vice

versa. Therefore, the M value where CM reached the minimum

    was selected as the ensemble size of the Boosting model. As

    a result, the sizes of the Boosting-IPW-PLS models for COD,

    TSS, and O&G measurements were selected as 199, 183, and

    283 as found in Fig. 4, respectively.
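The selection rule just described (take the M at which CM is minimal) can be sketched as follows; the CM curve here is illustrative, and the CM formula itself (Eq. (13)) is defined earlier in the paper:

```python
def select_ensemble_size(cm_values):
    """Return the ensemble size M at which the stopping criterion CM is
    minimal; cm_values[m] holds CM for a model of m + 1 weak learners."""
    best = min(range(len(cm_values)), key=lambda m: cm_values[m])
    return best + 1  # ensemble sizes are counted from 1

# illustrative CM curve: falls while the error drop dominates, then
# rises once the growing model complexity takes over
cm_curve = [5.0, 3.0, 2.2, 1.9, 2.1, 2.6, 3.4]
```

Applied to the real CM curves of Fig. 4, this rule yields the sizes 199, 183 and 283 reported above.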

4.3.2. Variable weight evolution

In Boosting-IPW-PLS, the weight of each variable reflects the

    correlation between the variable and the quality index. The

    higher the weight is, the more informative the variable is. The

    evolution of the variable weights along ensemble size reveals

    the contribution of the variable to the final model.

    Fig. 5 shows the variable weight evolution of the first eight

    weak learners/ensembles of Boosting models for COD, TSS,

    and O&G measurements obtained in the previous section.

    Only a very small portion of the variables was enrolled in each

    weak learner construction. This may be attributed to the fact

that only one latent variable was used in the IPW-PLS algorithm for the variable weighting in the present study. In PLS, the most significant variance information between the independent space and the response space is expressed by the first several latent variables. If only one latent variable was used in IPW-PLS, only the variables contributing to the most significant covariance information were assigned large weights while the other variables may be assigned zero weights. This is equivalent to variable selection/feature extraction.

It is quite interesting that, for the three Boosting models, the turbidity measurement (at the wavelength of 801 nm) was assigned very large weights in the construction of the first weak learner. This fact indicates its information complementarity to the UV/Vis spectrum and its informativeness for the predictions of COD, TSS and O&G concentrations of wastewater samples, as discussed in previous sections.

Fig. 5 – The variable weight evolution of the first eight weak learners of Boosting-IPW-PLS models for: (a) COD, (b) TSS, (c) O&G measurements.

The frequency of the variable selected by the weak learners is another important indicator for process analysis and model pruning. If a variable has never been selected in any weak learner construction, it may be quality-irrelevant and can be

    of the variables selected by the weak learners of the Boosting

    models for COD, TSS and O&G measurements. Here, whether

or not a variable is selected by a weak learner is determined by its weights assigned by IPW-PLS. If the variable weight is less than a very small value, for example 10⁻⁶ in the present study, the corresponding variable is considered quality-irrelevant and will not be selected by the weak learner. As seen from Fig. 6, except for the turbidity measurement (at the wavelength of 801 nm), only a very small portion of the variables in the UV region and near-infrared region were selected in the construction of the prediction models. This was also proved in the variable weight evolution of the weak learners.

Fig. 6 – The frequency of the variables selected by the weak learners of the Boosting-IPW-PLS models for (a) COD, (b) TSS, (c) O&G measurements.

Fig. 7 – Scatter plots of predicted versus analytical values for the test set using Boosting-IPW-PLS models for (a) COD, (b) TSS, (c) O&G measurements.

Table 3 – Summary of COD prediction using different methods.

Method            Number of LVs  M     Number of variables  MaxE (mg/L)  MinE (mg/L)  RMSEP (mg/L)  R
PLS               7              –     1750                 372          347          176           0.903
Boosting-PLS      1              3013  1750                 467          314          157           0.922
UVE-PLS           8              –     205                  1020         413          216           0.915
IPW-PLS           7              –     22                   372          347          176           0.903
Boosting-IPW-PLS  1              199   36                   453          261          141           0.945
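The weight-threshold rule used to count how often each variable is selected can be sketched as follows (names are ours; the 10⁻⁶ tolerance is the value quoted in the text):

```python
def selection_frequency(weight_history, tol=1e-6):
    """Count how many weak learners selected each variable.
    weight_history[m][i] is the IPW-PLS weight of variable i in weak
    learner m; a variable counts as selected when its weight exceeds
    the small tolerance."""
    freq = [0] * len(weight_history[0])
    for weights in weight_history:
        for i, w in enumerate(weights):
            if w > tol:
                freq[i] += 1
    return freq

# toy weight history for two weak learners over three variables
history = [[0.9, 0.0, 1e-8], [0.7, 0.2, 0.0]]
```

Variables whose count stays at zero across all weak learners are the candidates for removal from the original X space.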

4.3.3. Prediction performance of Boosting-IPW-PLS

With variable weights and model size determined, one could

    use the Boosting-IPW-PLS models to predict the water quality

    indices, as shown in Fig. 7 and Tables 3e5. The predicted

    values fit the analytical values well with high correlation

coefficients and small RMSEP values of 0.945, 0.965, 0.945 and

    141, 30.2, 34 mg/L for COD, TSS and O&G measurements,

respectively, revealing the effectiveness of the proposed method.


    4.4. Comparisons of different methods

    For comparison of the performances of the commonly used

modeling methods, the MaxE, MinE, RMSEP and R values for the test sets obtained by PLS, Boosting-PLS, UVE-PLS, IPW-PLS and Boosting-IPW-PLS are summarized in Tables 3–5. Among these methods, PLS is the most commonly used one, UVE-PLS and IPW-PLS are known as variable selection/modeling methods, and Boosting-PLS is known as an ensemble one. Here, the number of latent variables of PLS, UVE-PLS and IPW-PLS was determined by 5-fold cross-validation with respect to the root mean square error. Boosting-PLS used only one latent variable, and the shrinkage value was set as 0.9 as suggested by Zhang et al. (2005). The size of the Boosting-PLS model was also determined by 5-fold cross-validation with respect to the root mean square error, while the size of Boosting-IPW-PLS was determined by the proposed stopping criterion, as discussed in previous sections.

Table 4 – Summary of TSS prediction using different methods.

Method            Number of LVs  M     Number of variables  MaxE (mg/L)  MinE (mg/L)  RMSEP (mg/L)  R
PLS               12             –     1750                 57.3         102          35.9          0.939
Boosting-PLS      1              4456  1750                 53.9         119          37.5          0.94
UVE-PLS           4              –     279                  54.3         108          34.7          0.96
IPW-PLS           12             –     45                   57.3         102          35.9          0.939
Boosting-IPW-PLS  1              183   20                   55.7         90.5         30.2          0.965

Table 5 – Summary of O&G prediction using different methods.

Method            Number of LVs  M     Number of variables  MaxE (mg/L)  MinE (mg/L)  RMSEP (mg/L)  R
PLS               18             –     1750                 95.3         80.2         35.9          0.936
Boosting-PLS      1              5356  1750                 84           118          37.8          0.94
UVE-PLS           6              –     41                   66.9         79.3         34.2          0.956
IPW-PLS           25             –     71                   98.3         92.2         38.3          0.928
Boosting-IPW-PLS  1              283   27                   85.1         90.3         34            0.945

From the results listed in Tables 3–5, it is clear that the Boosting-IPW-PLS method gives the best overall results in the predictions of these three water quality indices, followed by UVE-PLS, Boosting-PLS, PLS and IPW-PLS. Although UVE-PLS has comparable prediction performance in TSS and O&G predictions, its MaxE and MinE values in COD predictions are comparatively large. A possible reason is its inability to handle outliers. The fact that Boosting-PLS requires thousands of weak learners to build up the final Boosting model may be attributed to the use of only one latent variable in the PLS weak learner construction. With only one weak learner, only a small portion of the quality-informative features was extracted from the noisy fused data. As a consequence, more weak learners were added to achieve acceptable prediction ability, resulting in

    practice. This is one of the important reasons why IPW-PLS

was incorporated into the Boosting scheme to suppress the quality-irrelevant variables and further reduce the complexity of the model. It is quite interesting that PLS and IPW-PLS had almost the same prediction results for COD and TSS concentrations although there is a significant difference in the number of variables enrolled in the modeling. This may be attributed to the fact that both PLS and IPW-PLS used the same number of latent variables, and that the variables retained in the IPW-PLS models express the same amount of quality-informative features as the PLS models. It should be noted



that with different information expressed, the prediction

    ability of the models would differ, as shown in the O&G

    concentration predictions using PLS and IPW-PLS models.
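The 5-fold cross-validation used above to choose the number of latent variables can be sketched generically; `fit_predict` is a stand-in with a hypothetical signature for a PLS fit-and-predict routine, and the toy model below only illustrates the selection loop:

```python
import math

def rmse_cv(fit_predict, X, y, n_lv, folds=5):
    """5-fold cross-validated root mean square error for a model using
    n_lv latent variables.  fit_predict(X_tr, y_tr, X_te, n_lv) must
    return predictions for X_te."""
    fold_size = math.ceil(len(y) / folds)
    sq_err, n = 0.0, 0
    for f in range(folds):
        lo, hi = f * fold_size, (f + 1) * fold_size
        X_te, y_te = X[lo:hi], y[lo:hi]
        X_tr, y_tr = X[:lo] + X[hi:], y[:lo] + y[hi:]
        if not X_te or not X_tr:
            continue
        for p, yt in zip(fit_predict(X_tr, y_tr, X_te, n_lv), y_te):
            sq_err += (p - yt) ** 2
            n += 1
    return math.sqrt(sq_err / n)

def pick_n_lv(fit_predict, X, y, max_lv):
    """Number of latent variables minimizing cross-validated RMSE."""
    return min(range(1, max_lv + 1),
               key=lambda n_lv: rmse_cv(fit_predict, X, y, n_lv))

# toy stand-in whose error is smallest at three latent variables
def toy_pls(X_tr, y_tr, X_te, n_lv):
    bias = abs(n_lv - 3)
    mu = sum(y_tr) / len(y_tr)
    return [mu + bias for _ in X_te]

best = pick_n_lv(toy_pls, list(range(10)), [2.0] * 10, max_lv=5)
```

The same loop, run with an actual PLS routine, is how the latent-variable counts in Tables 3–5 would be obtained for PLS, UVE-PLS and IPW-PLS.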

    5. Conclusions

    A multi-sensor water quality monitoring system incorpo-

rating a UV/Vis spectrometer and a turbidimeter was first

    proposed and tested to monitor COD, TSS and O&G online.

    The COD, TSS and O&G prediction models were constructed

    by the Boosting-IPW-PLS method that was first developed in

    the present study. The sizes of COD, TSS and O&G prediction

    models were 199, 183 and 283 with only one PLS component

    used in IPW-PLS for variable weighting and the PLS weak

learner construction. The monitoring system was tested in the

    field. Experimental results showed that the predicted values

    fit the analytical values well with high correlation coefficients

and small RMSEP values of 0.945, 0.965, 0.945 and 141, 30.2

    and 34 mg/L for COD, TSS and O&G measurements, respec-

    tively, revealing the effectiveness of the proposed system.

References

Akaike, H., 1974. A new look at the statistical model identification. IEEE Transactions on Automatic Control 19, 716–723.

American Public Health Association (APHA), American Water Works Association, Water Environment Federation, 2005. Standard Methods for the Examination of Water and Wastewater, 21st ed. American Public Health Association, New York.

Bookman, S.K.E., 1997. Estimation of biochemical oxygen demand in slurry and effluent using ultraviolet spectrophotometry. Water Research 31, 372–374.

Centner, V., Massart, D.L., de Noord, O.E., de Jong, S., Vandeginste, B.M., Sterna, C., 1996. Elimination of uninformative variables for multivariate calibration. Analytical Chemistry 68, 3851–3858.

Chambers, B., Jones, G., 1988. Optimization and uprating of activated sludge plants by efficient process design. Water Science and Technology 20, 121–132.

Chen, D., Hu, B., Shao, X.G., Su, Q.D., 2004. Variable selection by modified IPW (iterative predictor weighting)-PLS (partial least squares) in continuous wavelet regression models. Analyst 129, 664–669.

Chen, G., Chen, X., Yue, P.L., 2000. Electrocoagulation and electroflotation of restaurant wastewater. Journal of Environmental Engineering 126, 858–863.

Drainage Services Department of the Government of the Hong Kong Special Administrative Region (DSD-HKSAR), 2011. Sewage Service Charging Scheme [Online]. Available at: http://www.dsd.gov.hk/EN/Sewage_Services_Charging_Scheme/index.html (accessed 10.09.11).

Drucker, H., 1997. Improving regressors using boosting techniques. In: Fisher, D.H. (Ed.), Proceedings of the 14th International Conference on Machine Learning. Morgan Kaufmann, San Mateo, CA, pp. 107–115.

Duffy, N., Helmbold, D., 2002. Boosting methods for regression. Machine Learning 47, 153–200.

Esteban, J., Starr, A., Willetts, R., Hannah, P., Bryanston-Cross, P., 2005. A review of data fusion models and architectures: towards engineering guidelines. Neural Computing & Applications 14, 273–281.

Fogelman, S., Blumenstein, M., Zhao, H., 2006. Estimation of chemical oxygen demand by ultraviolet spectroscopic profiling and artificial neural networks. Neural Computing & Applications 15, 197–203.

Forina, M., Casolino, C., Millan, C.P., 1999. Iterative predictor weighting (IPW) PLS: a technique for the elimination of useless predictors in regression problems. Journal of Chemometrics 13, 165–184.

Freund, Y., Schapire, R.E., 1997. A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences 55, 119–139.

Friedman, J., 2001. Greedy function approximation: a gradient boosting machine. Annals of Statistics 29, 1189–1232.

Friedman, J., Hastie, T., Tibshirani, R., 2000. Additive logistic regression: a statistical view of boosting. Annals of Statistics 28, 337–374.

Geladi, P., Kowalski, B., 1986. Partial least-squares regression: a tutorial. Analytica Chimica Acta 185, 1–17.

Goicoechea, H.C., Olivieri, A.C., 2002. Wavelength selection for multivariate calibration using a genetic algorithm: a novel initialization strategy. Journal of Chemical Information and Modeling 42, 1146–1153.

Hoskuldsson, A., 1988. PLS regression methods. Journal of Chemometrics 2, 211–228.

Klein, L.A., 2004. Sensor and Data Fusion: A Tool for Information Assessment and Decision Making. SPIE Press, Bellingham, Wash.

Lourenco, N.D., Chaves, C.L., Novais, J.M., Menezes, J.C., Pinheiro, H.M., Diniz, D., 2006. UV spectra analysis for water quality monitoring in a fuel park wastewater treatment plant. Chemosphere 65, 786–791.

Lu, X., Jiang, J.H., Wu, H.L., Shen, G.L., Yu, R.Q., 2007. Variable-weighted PLS. Chemometrics and Intelligent Laboratory Systems 85, 140–143.

Lutz, R.W., Kalisch, M., Buhlmann, P., 2008. Robustified L2 boosting. Computational Statistics and Data Analysis 52, 3331–3341.

Olsson, G., Nielsen, M.K., Yuan, Z., Lynggaard-Jensen, A., Steyer, J.-P., 2005. Instrumentation, Control and Automation in Wastewater Systems. IWA Publishing, London.

Pouet, M.-F., Thomas, O., Jacobsen, B.N., Lynggaard-Jensen, A., Quevauviller, P., 1999. Conclusions of the workshop on methodologies for wastewater quality monitoring. Talanta 50, 759–762.

Ridgeway, G., 1999. The state of boosting. Computing Science and Statistics 31, 172–181.

Roig, B., Chalmin, E., Touraud, E., Thomas, O., 2002. Spectroscopic study of dissolved organic sulfur (DOS): a case study of mercaptans. Talanta 56, 585–590.

Russell, S., Marshallsay, D., MacCraith, B., Devisscher, M., 2003. Non-contact measurement of wastewater polluting load – the Loadmon project. Water Science and Technology 47, 79–86.

Shao, X., Bian, X., Cai, W., 2010. An improved boosting partial least squares method for near-infrared spectroscopic quantitative analysis. Analytica Chimica Acta 666, 32–37.

Swierenga, H., Wulfert, E., de Noord, O.E., de Weijer, A.P., Smilde, A.K., Buydens, L.M.C., 2000. Development of robust calibration models in near infrared spectrometric applications. Analytica Chimica Acta 411, 121–135.

Tan, C., Wang, J., Wu, T., Qin, X., Li, M., 2010. Determination of nicotine in tobacco samples by near-infrared spectroscopy and boosting partial least squares. Vibrational Spectroscopy 54, 35–41.

Thomas, O., Constant, D., 2004. Trends in optical monitoring. Water Science and Technology 49, 1–8.

US Environmental Protection Agency (EPA), 1999. Method 1664, Revision A: N-Hexane Extractable Material (HEM; Oil and Grease) and Silica Gel Treated N-Hexane Extractable Material (SGT-HEM; Non-Polar Material) by Extraction and Gravimetry [Online]. Available at: http://water.epa.gov/scitech/methods/cwa/oil/upload/2007_07_10_methods_method_oil_1664.pdf (accessed 04.06.11).

Vapnik, V.N., 2000. The Nature of Statistical Learning Theory, second ed. Springer, New York.

Varshney, P.K., 1997. Multisensor data fusion. Electronics & Communication Engineering Journal, 245–253.

Wold, S., Ruhe, A., Wold, H., Dunn III, W., 1984. The collinearity problem in linear regression. The partial least squares (PLS) approach to generalized inverses. SIAM Journal on Scientific and Statistical Computing 5, 735–743.

Wu, X.L., 2007. An applied research on information fusion and ensemble learning for spectral analysis of water quality. Ph.D. thesis, Zhejiang University, Hangzhou, China.

Wu, X.L., Li, Y.J., Wu, T.J., 2006. A boosting-partial least squares method for ultraviolet spectroscopic analysis of water quality. Chinese Journal of Analytical Chemistry 8, 1091–1095.

Wu, X.L., Li, Y.J., Wu, T.J., 2007. Application of multi-spectral information fusion for water quality analysis. Chinese Journal of Analytical Chemistry 12, 1716–1720.

Zemel, R., Pitassi, T., 2001. A gradient-based boosting algorithm for regression problems. In: Leen, T.K., Dietterich, T.G., Tresp, V. (Eds.), Advances in Neural Information Processing Systems, vol. 13. MIT Press, Cambridge, MA.

Zhang, M.H., Xu, Q.S., Massart, D.L., 2005. Boosting partial least squares. Analytical Chemistry 77, 1423–1431.

Zhou, Y.P., Cai, C.B., Huan, S., Jiang, J.H., Wu, H.L., Shen, G.L., Yu, R.Q., 2007. QSAR study of angiotensin II antagonists using robust boosting partial least squares regression. Analytica Chimica Acta 593, 68–74.
