Water Research 46 (2012) 1133–1144

Available online at www.sciencedirect.com

journal homepage: www.elsevier.com/locate/watres

Wastewater quality monitoring system using sensor fusion and machine learning techniques

Xusong Qin, Furong Gao, Guohua Chen*

Department of Chemical and Biomolecular Engineering, The Hong Kong University of Science and Technology, Clear Water Bay,

Kowloon, Hong Kong, China

Article info

Article history:

Received 22 June 2011

Received in revised form

4 October 2011

Accepted 5 December 2011

Available online 11 December 2011

Keywords:

Online monitoring

UV/Vis spectroscopy

Turbidity

Variable weighting

Boosting-IPW-PLS

Wastewater treatment

* Corresponding author. E-mail address: [email protected] (G. Chen).

0043-1354/$ – see front matter © 2011 Elsevier Ltd. All rights reserved. doi:10.1016/j.watres.2011.12.005

Abstract

A multi-sensor water quality monitoring system incorporating a UV/Vis spectrometer and a turbidimeter was used to monitor the Chemical Oxygen Demand (COD), Total Suspended Solids (TSS) and Oil & Grease (O&G) concentrations of the effluents from the Chinese restaurant on campus and an electrocoagulation–electroflotation (EC–EF) pilot plant. In order to handle the noise and information unbalance in the fused UV/Vis spectra and turbidity measurements during calibration model building, an improved boosting method, Boosting-Iterative Predictor Weighting-Partial Least Squares (Boosting-IPW-PLS), was developed in the present study. The Boosting-IPW-PLS method incorporates IPW into the boosting scheme to suppress the quality-irrelevant variables by assigning them small weights, and builds up the models for the wastewater quality predictions based on the weighted variables. The monitoring system was tested in the field with satisfactory results, underlining the potential of this technique for the online monitoring of water quality.

© 2011 Elsevier Ltd. All rights reserved.

1. Introduction

Monitoring wastewater quality is a subject of growing importance around the world, since a better understanding of both treated and untreated effluent allows better control of treatment plants. For example, it has been estimated that online monitoring for real-time process control may save as much as 40% of the energy (the major part of the cost) currently needed for wastewater treatment by continuous aeration (Chambers and Jones, 1988; Olsson et al., 2005). However, the available wastewater quality monitoring technologies have several drawbacks with regard to the control and optimization of treatment plants (Pouet et al., 1999). In addition to sampling and sample storage problems, the standard analytical methods currently available do not allow the implementation of real-time monitoring and process control. Better online sensors are in urgent demand for water quality monitoring. However, due to the complex pollutant matrix, the generally hostile environment, and the lack of accurate, cost-effective and robust sensors, the automation of wastewater treatment systems is still not as developed as in other process industries.

In view of the high potential for the development and

application of online measurements in water quality moni-

toring, UV/Vis spectroscopy has attracted substantial atten-

tion and led to some useful results (Bookman, 1997; Fogelman

et al., 2006; Lourenco et al., 2006), although most of the re-

ported wastewater UV/Vis spectrometric applications are

based on visual observation and direct comparison of the UV/

Vis spectra (Roig et al., 2002). Several reports applied multi-

variate monitoring approaches (Lourenco et al., 2006; Wu

et al., 2006; Wu, 2007). It is worth pointing out that in most of

these reports UV/Vis spectra were examined with the waste-

water samples being static and of low strength. However, for


some effluents such as restaurant wastewater, concentrations

of oil and grease, suspended solids and colloids are very high

(Chen et al., 2000), making it very difficult to obtain a satis-

factory monitoring performance using a single optical tech-

nique. In addition, these pollutants tend to foul the light

transmitting windows, degrading the monitoring perfor-

mance of the UV/Vis spectroscopy. Thus, one may need to

develop a combined optical method, combining two or three

optical spectral sources and process variables to compensate

for their drawbacks so as to improve the monitoring performance

(Russell et al., 2003; Thomas and Constant, 2004; Wu, 2007; Wu

et al., 2007).

The combined optical method is essentially one typical

application of the sensor fusion technology (Wu, 2007). Sensor

fusion, which is also known as multi-sensor data fusion, first

appeared in the literature as mathematical models for data

manipulation in the 1960s. It refers to the acquisition, pro-

cessing and synergistic combination of information gathered

by various knowledge sources and sensors to provide a better

understanding of a phenomenon. It is a fascinating and

rapidly evolving field that has generated a lot of excitement in

the research and development community, and is being

applied to a wide variety of the fields such as military

command and control, robotics, image processing, air traffic

control, medical diagnostics, pattern recognition and envi-

ronmental monitoring (Varshney, 1997). However, the appli-

cations of sensor fusion techniques are disparate and

problem-dependent (Klein, 2004; Esteban et al., 2005). No single technique, algorithm or implementation framework fits all sensor fusion problems. Therefore, in order to

construct an optimal sensor fusion system for the combined

optical monitoring system, one has to properly select the

optical spectral sources/process variables, the sensor fusion

framework and its corresponding algorithms at each step. As

a result, the candidate optical spectral sources and process

variables for the combined optical method should be

complementary from chemical and/or physical aspects for the

water quality measurement. However, in spite of their infor-

mation complementarities, information redundancy and

unbalance problems usually occur because of the variable

collinearity among spectral wavelengths and process vari-

ables, and the significant dimensionality difference among

the various types of optical spectra and process variables. If

the sensor fusion strategy and fusion levels are not properly

selected, the modeling effort may concentrate on the data

sources with higher dimensionality rather than on the more

quality-informative low dimensional data sources. The cali-

bration model obtained therefore may be less accurate due to

the loss of quality-related information. In addition, the results

obtained are often worsened by the presence of uninformative

variables, such as the highly fluctuating background and

noises, in optical spectra/process variables. Since not all

wavelengths/variables are useful for quality prediction,

various sensor fusion systems and variable selection/feature

extraction methods (Wu, 2007; Wu et al., 2007; Centner et al.,

1996; Forina et al., 1999; Swierenga et al., 2000; Goicoechea

and Olivieri, 2002; Chen et al., 2004; Lu et al., 2007) have

been designed to cooperate with the calibration modeling

methods. In this way, the variable selection procedure is

performed only once, and only one set of those most quality-

related variables is retained as descriptors for regression

modeling. However, this may lead to information loss

compared with the original spectra/process variables space,

more or less, resulting in accuracy loss of the regression

models.

In the past two decades, the application of Boosting to

regression problems has received significant attention

because its ensemble learning nature can produce higher

predictive accuracy than single model strategies. Freund and

Schapire (1997) proposed the first algorithm of Boosting for

regression problems, the AdaBoost.R algorithm. The most

important contribution of this method is the majority vote

idea that combines a group of weighted weak learners (fitting

models) which only guarantee to achieve an error rate of

slightly less than that achieved by random guessing. The

weights of the weak learners are defined by their accuracies,

respectively. Drucker (1997) developed the AdaBoost.R2 algo-

rithm, which is an ad hoc modification of the AdaBoost.R

algorithm. The advantage of Drucker’s method is its ad hoc

ability, i.e., any learner, whether linear or non-linear, can be

incorporated. Many other researchers (for example,

Ridgeway (1999), Friedman et al. (2000), Friedman (2001),

Zemel and Pitassi (2001), Duffy and Helmbold (2002)) have

viewed Boosting as a “gradient machine” that optimizes

a particular loss function. In this sense, Boosting is essentially

a method that combines a group of weak learners that

perform marginally better than random guessing to obtain

a powerful learner in regression. These weak learners are

constructed through iterative steps by always using a basic

learning algorithm. In each step, a new learner (fitting model)

is established by relating the predictor variables of X (an $n \times p$

matrix with p predictors for n samples) to the residuals

(prediction errors) of the responses y (an $n \times 1$ vector for n

samples) that are not fitted by previous learners. The

Boosting-partial least squares (Boosting-PLS) and its modifi-

cations (Zhang et al., 2005; Wu et al., 2006; Zhou et al., 2007;

Lutz et al., 2008; Shao et al., 2010; Tan et al., 2010) proposed

recently introduced PLS into the boosting procedure by

combining a set of shrunken PLS models. However, the above

Boosting methods construct all the weak learners based on

the same p predictor variables of X. The highly fluctuating

background and noises in X definitely weaken the predictive

ability of the weak learners, as well as the Boosting model.

From the nature of sequential additions of weak learners in

boosting procedure, it is worthwhile to extract the quality-

informative features/variables only for the weak learner in

each iteration, for the improvement of the predictive accuracy

and robustness of the Boosting model.

Because the trade effluent surcharge of the restaurant is

determined by the effluent COD value according to the sewage

services ordinance of Hong Kong (DSD-HKSAR, 2011), and the

suspended solids and oil & grease have strong impacts on the

light penetration and the biodegradation ability of the water,

these three water quality indices are of particular interest in

the present study. The objective of the present study was

therefore to develop a wastewater quality monitoring system

that incorporates a UV/Vis spectrometer and a turbidimeter to

monitor COD, TSS and O&G concentrations of the effluents of

the Chinese restaurant on campus and a pilot EC–EF




wastewater treatment plant. Sensor fusion technique was

used to fuse the signals from these two sensors/instruments.

An improved boosting method, Boosting-IPW-PLS, was

developed here to handle the noises and information unbal-

ance of the fused information, to model and predict the water

quality. The system was evaluated in field trials. Satisfactory

results were obtained as seen subsequently.

2. Theory and methods

2.1. Partial least squares (PLS)

Suppose a data set {X, y} where X is an $n \times p$ process variable data matrix with p predictor variables for n samples, e.g. n wastewater samples with corresponding UV/Vis spectra (each UV/Vis spectrum has p wavelengths), and y is the corresponding dependent variable vector with size $n \times 1$, e.g. COD measurements of these n wastewater samples. PLS models both outer relations within X and y blocks and inner relations between two blocks. The equations are as follows:

$$X = TP^{\mathsf T} + E = \sum_{i=1}^{a} t_i p_i^{\mathsf T} + E, \tag{1}$$

$$y = UQ^{\mathsf T} + F = \sum_{i=1}^{a} u_i q_i^{\mathsf T} + F, \tag{2}$$

$$u_i = b_i t_i, \tag{3}$$

where T, P and E are the score matrix, loading matrix and

residual matrix of X space while U, Q and F are score matrix,

loading matrix and residual matrix of y space, respectively. ti,

pi, ui and qi are the corresponding vectors of T, P, U and Q

matrices. Eqs. (1) and (2) describe the outer relations, Eq. (3)

describes the inner relations between y space and X space,

a is the number of latent variables, and $b_i = t_i^{\mathsf T} u_i / (t_i^{\mathsf T} t_i)$ is the

regression coefficient between the PLS component ti from X

space and the PLS component ui from y space. The standard

PLS procedure based on non-linear iterative partial least

squares (NIPALS) algorithm and the methods to choose the

number of latent variables (LVs) of PLS (cross-validation,

jackknife and so on) can be found in the works of Wold et al.

(1984), Geladi and Kowalski (1986), and Hoskuldsson (1988).

Based on the NIPALS algorithm, the prediction for a new sample j can be written as:

$$\hat{y}_j^a = x_j b^a + c^a, \tag{4}$$

where $b^a$ is the coefficient vector ($p \times 1$), $c^a$ is the offset when a latent variables are used, and $x_j$ is the predictor variable vector ($1 \times p$) of sample j.
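As an illustration of Eqs. (1)–(4), the following is a minimal NumPy sketch of a single-response PLS fit by NIPALS-style deflation. It is not the authors' implementation: the function name `pls1_fit` and the choice to centre X and y before extracting components are assumptions made for the example.

```python
import numpy as np

def pls1_fit(X, y, n_components):
    """Fit single-response PLS by deflation; return (b, c) with y_hat = X @ b + c."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    E, f = X - x_mean, y - y_mean            # centred residual blocks (assumption)
    W, P, q = [], [], []
    for _ in range(n_components):
        w = E.T @ f
        w /= np.linalg.norm(w)               # weight vector of this component
        t = E @ w                            # X-score t_i
        tt = t @ t
        p = E.T @ t / tt                     # X-loading p_i
        qi = f @ t / tt                      # regression of the y residual on t_i
        E = E - np.outer(t, p)               # deflate X
        f = f - qi * t                       # deflate y
        W.append(w)
        P.append(p)
        q.append(qi)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    b = W @ np.linalg.solve(P.T @ W, q)      # coefficients in the original X space
    c = y_mean - x_mean @ b                  # offset, as in Eq. (4)
    return b, c
```

With as many components as predictors and noise-free linear data, this sketch reproduces the ordinary least squares coefficients, which is a convenient sanity check.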

2.2. Uninformative variable elimination-partial least squares (UVE-PLS)

In a PLS model, some of the variables can be

noisy and/or do not contain information relevant to the

prediction of y. Eliminating these variables from the explan-

atory data can improve the model. UVE-PLS method proposed

by Centner et al. (1996) is one of the methods to eliminate the

uninformative variables.

In the UVE-PLS method, a PLS regression coefficient matrix $b^a = [b_1^a, \ldots, b_p^a]$ with a latent variables is calculated through a leave-one-out validation. Because each coefficient $b_k^a$ represents the contribution of the corresponding variable to the established model, the reliability of each variable (wavelength) k can be quantitatively measured by its stability defined as:

$$c_k = \frac{\operatorname{mean}(b_k^a)}{\operatorname{std}(b_k^a)}, \tag{5}$$

where $\operatorname{mean}(b_k^a)$ and $\operatorname{std}(b_k^a)$ are the mean and standard deviation of the regression coefficients of variable k. To determine the uninformative variables, UVE-PLS adds an equal number of random predictors or artificial predictors with very small value (range of about $10^{-10}$) to the original predictors. The maximum of the absolute value of the reliability value defined by Eq. (5) for the added artificial predictors, $C_{\mathrm{cutoff}}$, is the cut-off value for the elimination of non-informative original predictors. Only the original variables which have reliability values larger than $C_{\mathrm{cutoff}}$ will be retained.
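The selection rule can be sketched as follows. This is a simplified illustration rather than the procedure of Centner et al. (1996): it assumes a one-latent-variable PLS model, Gaussian artificial predictors scaled to about 1e-10, and compares absolute reliability values with the cut-off. The names `pls1_coef_one_lv` and `uve_select` are hypothetical.

```python
import numpy as np

def pls1_coef_one_lv(X, y):
    """Coefficients of a one-latent-variable PLS model on centred data."""
    Xc, yc = X - X.mean(axis=0), y - y.mean()
    w = Xc.T @ yc
    w /= np.linalg.norm(w)
    t = Xc @ w
    return w * (t @ yc) / (t @ t)

def uve_select(X, y, seed=0):
    """Boolean mask of the original variables kept by a UVE-style rule."""
    n, p = X.shape
    rng = np.random.default_rng(seed)
    noise = rng.normal(size=(n, p)) * 1e-10       # artificial predictors (assumed Gaussian)
    XA = np.hstack([X, noise])
    # Leave-one-out: one coefficient vector per omitted sample.
    B = np.array([pls1_coef_one_lv(np.delete(XA, i, axis=0), np.delete(y, i))
                  for i in range(n)])
    c = B.mean(axis=0) / B.std(axis=0, ddof=1)    # stability, Eq. (5)
    cutoff = np.abs(c[p:]).max()                  # largest |c| among artificial predictors
    return np.abs(c[:p]) > cutoff
```

Because the stability in Eq. (5) is a ratio, it does not depend on the scale of a variable, which is why the tiny artificial predictors can serve as a noise reference without influencing the model itself.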

2.3. Iterative predictor weighting-partial least squares (IPW-PLS)

The IPW-PLS method originally developed by Forina et al.

(1999) aims at producing acceptable calibration models with

a small number of variables by eliminating the useless or redundant

predictors in the PLS regression. The key

component of IPW-PLS is to multiply the variables by their

importance in the cyclic repetition of PLS regression. The

importance of the variable is defined as:

$$z_k = \frac{\left|b_k^a\right| s_k}{\sum_{k=1}^{p} \left|b_k^a\right| s_k}, \tag{6}$$

where $s_k$ and $b_k^a$ are the standard deviation and PLS regression coefficient of the variable k, respectively, and p is the number of variables.
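The cyclic reweighting can be sketched as below. This is only an illustration under stated assumptions: a one-latent-variable PLS model, equal starting importances, a fixed number of cycles (`n_cycles`), and the standard deviation taken on the currently weighted variables. The function name `ipw_weights` is hypothetical and the details of Forina et al. (1999) may differ.

```python
import numpy as np

def ipw_weights(X, y, n_cycles=10):
    """Iteratively reweight predictors by the importance of Eq. (6)."""
    Xc, yc = X - X.mean(axis=0), y - y.mean()
    p = X.shape[1]
    z = np.full(p, 1.0 / p)                  # start from equal importances (assumption)
    for _ in range(n_cycles):
        Xw = Xc * z                          # multiply each variable by its importance
        w = Xw.T @ yc
        w /= np.linalg.norm(w)
        t = Xw @ w
        b = w * (t @ yc) / (t @ t)           # one-LV PLS coefficients on weighted X
        s = Xw.std(axis=0, ddof=1)           # standard deviation of each weighted variable
        z_new = np.abs(b) * s
        z = z_new / z_new.sum()              # normalised importance, Eq. (6)
    return z
```

In this sketch the importances of weakly related variables shrink quickly towards zero over the cycles, so the result behaves like a soft variable selection.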

2.4. Boosting-partial least squares (Boosting-PLS)

The basic idea of Boosting is to sequentially construct additive

regression models by fitting a basic learner to the current

residuals that are not fitted by previous models, and finally the

weighted predictions of a collection of regression models are

used as an ensemble prediction. Using PLS as the basic/weak

algorithm, one can obtain the Boosting-PLS algorithm (Zhang

et al., 2005). Because of the nature of Boosting, Boosting-PLS

does not require the selection of an adequate number of

latent variables, in contrast to the classical PLS. With proper

shrinkage value and number of iterations determined,

Boosting-PLS has a prediction ability at least comparable to that of

classical PLS (Zhang et al., 2005).

2.5. Boosting-IPW-PLS

In the standard boosting procedures, the weak learners are

constructed from the X matrix ($n \times p$, with p predictors for n

samples) and the current residuals ($n \times 1$, for n samples) that

are not fitted by previous models. In the Boosting model

construction process, weights of the samples are updated




according to the prediction errors of previous models, and are

utilized for later weak learner training. However, the predic-

tors in X matrix are not weighted/modified. The quality-

irrelevant predictors, which are not useful to the estimation

of quality indices, definitely affect the establishments of the

weak learners. Therefore, a loss of accuracy and robustness of the Boosting model might occur. Realizing this fact, a robust calibration modeling method, Boosting-IPW-PLS, is proposed on the basis of predictor weighting and the boosting framework in the present study, as shown in Fig. 1. The proposed algorithm is derived from the SQUARELEV.R algorithm developed by Duffy and Helmbold (2002). It integrates the predictor weighting into the boosting procedure to suppress the quality-irrelevant variables by assigning them small weights. The algorithm is as follows.

Consider a data set {X, y} and a Boosting model with size M

(M weak learners), where X is an $n \times p$ process variable data

matrix with p predictor variables for n samples, and y is the

corresponding dependent variable vector with size $n \times 1$. The

Boosting model $F_0$ is initialized as the zero function.

For m = 1, 2, …, M, repeat the following steps 1–6.

Step 1 Calculate the residual of the Boosting model obtained in the last iteration,

$$y_{\mathrm{res},m} = y - F_{m-1}(X). \tag{7}$$

Step 2 Perform predictor weighting on the original variables with respect to X and $y_{\mathrm{res},m}$ using the IPW-PLS method, and get the variable weight vector $w_m$ with size $1 \times p$ according to Eq. (6).

Step 3 Multiply the variable weights with the predictors of the original X,

$$X_m = X \mathbin{.\times} W_m. \tag{8}$$

Here $W_m$ is an $n \times p$ matrix with each row as the variable weight vector $w_m$, and the operator $.\times$ is the element-by-element multiplication operator.

Step 4 Construct a PLS weak learner $f_m$ on the weighted predictors $X_m$ and the current residual $y_{\mathrm{res},m}$,

$$y_{\mathrm{res},m} = f_m(X_m) + E_m = X_m b_m + c_m + E_m, \tag{9}$$

where $b_m$ and $c_m$ are the corresponding PLS regression coefficient vector and offset, and $E_m$ is the residual not fitted by the current weak learner.

Step 5 Calculate the shrinkage value of the current weak learner,

$$a_m = \frac{\left(y_{\mathrm{res},m} - \bar{y}_{\mathrm{res},m}\right)^{\mathsf T}\left(f_m(X_m) - \bar{f}_m(X_m)\right)}{\left\| f_m(X_m) - \bar{f}_m(X_m) \right\|^2}. \tag{10}$$

Here, $\bar{y}_{\mathrm{res},m}$ and $\bar{f}_m(X_m)$ are the mean values of the current residual and predictions, respectively.

Step 6 Update the Boosting model,

$$F_m = F_{m-1} + a_m f_m. \tag{11}$$

Fig. 1 – Schematic diagram of Boosting-IPW-PLS method.

After finishing M boosting cycles, M member PLS models are built and their corresponding shrinkage parameters are determined. The dependent variables $y_u$ ($l \times 1$ vector) of the unknown l samples with measurement matrix $X_u$ ($l \times p$ matrix) are predicted as:

$$y_u = a_1 y_{u,1} + a_2 y_{u,2} + a_3 y_{u,3} + \cdots + a_M y_{u,M} = \sum_{m=1}^{M} a_m \left( (X_u \mathbin{.\times} U_m)\, b_m + c_m \right), \tag{12}$$

where $U_m$ is an $l \times p$ matrix with each row as the variable weight vector $w_m$ determined in Step 2; $b_m$, $c_m$, and $a_m$ are determined in Steps 4 and 5, respectively.
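Steps 1–6 and the prediction rule of Eq. (12) can be sketched in NumPy as follows. This is a minimal illustration, not the authors' code: it assumes one-latent-variable PLS weak learners and replaces the full IPW cycle in Step 2 with a single pass of Eq. (6). All function names are hypothetical.

```python
import numpy as np

def one_lv_pls(X, y):
    """One-latent-variable PLS: return (b, c) with y_hat = X @ b + c."""
    xm, ym = X.mean(axis=0), y.mean()
    Xc, yc = X - xm, y - ym
    w = Xc.T @ yc
    w /= np.linalg.norm(w)
    t = Xc @ w
    b = w * (t @ yc) / (t @ t)
    return b, ym - xm @ b

def predictor_weights(X, y):
    """Single-pass importance of Eq. (6) (simplified stand-in for IPW-PLS)."""
    b, _ = one_lv_pls(X, y)
    z = np.abs(b) * X.std(axis=0, ddof=1)
    return z / z.sum()

def boosting_ipw_pls_fit(X, y, M):
    """Run Steps 1-6; return a list of (w_m, b_m, c_m, a_m), m = 1..M."""
    F = np.zeros_like(y, dtype=float)          # F_0 is the zero function
    model = []
    for _ in range(M):
        r = y - F                              # Step 1, Eq. (7)
        w = predictor_weights(X, r)            # Step 2
        Xm = X * w                             # Step 3, Eq. (8): row-wise broadcast of w_m
        b, c = one_lv_pls(Xm, r)               # Step 4, Eq. (9)
        f = Xm @ b + c
        fc = f - f.mean()
        a = (r - r.mean()) @ fc / (fc @ fc)    # Step 5, Eq. (10)
        F = F + a * f                          # Step 6, Eq. (11)
        model.append((w, b, c, a))
    return model

def boosting_ipw_pls_predict(model, Xu):
    """Eq. (12): shrinkage-weighted sum of the weak learners' predictions."""
    return sum(a * ((Xu * w) @ b + c) for w, b, c, a in model)
```

NumPy broadcasting of the weight vector over the rows of X plays the role of the matrices $W_m$ and $U_m$, so they never need to be formed explicitly.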

Now three questions remain. The first two questions are

the numbers of PLS components used in IPW-PLS for variable

weighting and the PLS weak learner construction. The third

one is when to stop adding models, that is, the size of M. For

the first two questions, one may use cross-validation, jack-

knife and other methods to determine the latent variables as

discussed in the works of Wold et al. (1984), Geladi and

Kowalski (1986) or Hoskuldsson (1988). For simplicity, only

one PLS component was used in IPW-PLS for variable

weighting and the PLS weak learner construction in the

present study. Since the X space and the residual space are

changed in every iteration, it is very important to select an

appropriate iteration time, M, to avoid overfitting. We

proposed one stopping criterion here to determine when to

stop adding models, derived from structural risk minimiza-

tion (SRM) principle (Vapnik, 2000) and Akaike’s information

criterion (AIC) (Akaike, 1974), as:

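$$\min\; C_M = n\left[\ln\!\left(\frac{2\pi}{n}\left\|y - \hat{y}_M\right\|^2\right) + 1\right] + \sum_{m=1}^{M} a_m \sum_{h=1}^{p} \beta_{m,h}$$

$$\text{s.t.}\quad M \in \mathbb{N}^{+},\qquad \beta_{m,h} = \begin{cases} 1, & \left|b_{m,h}\right| > 10^{-6} \\ 0, & \left|b_{m,h}\right| \le 10^{-6} \end{cases},\quad h = 1, 2, \ldots, p;\; m = 1, 2, \ldots, M, \tag{13}$$

where $\hat{y}_M$ is the prediction of the response y using a Boosting model with size M, and $b_{m,h}$ and $a_m$ are the PLS regression coefficient and shrinkage value of weak learner m. The first term of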




Eq. (13) represents the empirical error of the model of the

training set. The second term represents the complexity of the

Boosting model. For each weak learner, its model complexity

is considered as the number of the variables whose corre-

sponding absolute PLS regression coefficients are larger than

a very small value, for example $10^{-6}$ in the present study.

Since the prediction of the weak learner only contributes

partially to the final prediction as shown in Eq. (12), the cor-

responding weak learner’s model complexity also only

contributes partially to the final Boosting model complexity.

Because of the nature of Boosting, the empirical error of the

boosting model with respect to the training set will decrease

when more weak learners are added. However, the complexity

of the boosting model will increase accordingly. When the

decrease of the empirical error outweighs the increase of the model complexity, CM decreases, and vice versa. As a consequence, CM decreases in the initial few iterations, and then increases as more weak learners are added. The optimal tradeoff between the empirical error and the model complexity therefore corresponds to the minimum CM. Thus, the M value corresponding to the minimum CM is selected as the size of the Boosting model.
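A sketch of how the criterion could be evaluated is given below. It assumes Eq. (13) is read as $n[\ln(2\pi\|y - \hat{y}_M\|^2/n) + 1]$ plus the shrinkage-weighted count of coefficients whose absolute value exceeds the threshold. The function name `stopping_criterion` and its arguments are hypothetical.

```python
import numpy as np

def stopping_criterion(y, y_hat_M, coefs, shrinkages, tol=1e-6):
    """Evaluate C_M for a Boosting model of size M = len(coefs).

    coefs:      list of coefficient vectors b_m, one per weak learner
    shrinkages: list of shrinkage values a_m, one per weak learner
    """
    n = len(y)
    rss = np.sum((np.asarray(y) - np.asarray(y_hat_M)) ** 2)
    empirical = n * (np.log(2 * np.pi / n * rss) + 1)         # empirical error term
    complexity = sum(a * np.count_nonzero(np.abs(np.asarray(b)) > tol)
                     for b, a in zip(coefs, shrinkages))      # weighted variable count
    return empirical + complexity
```

In use, one would evaluate this value for M = 1, 2, … on the training set and keep the M at which it is smallest.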

3. Experimental

3.1. Hardware setup

The experimental setup, including the EC–EF treatment system and the water quality monitoring system, is shown schematically in Fig. 2. The water quality monitoring system consists of two parts: the UV/Vis spectrometer and the turbidimeter. Both were operated in continuous/online measurement mode in the present study. In particular, the wastewater was in motion, falling through an open chamber as a jet 10 mm in diameter for UV/Vis spectrum acquisition. Since there is no contact between the wastewater and the optical windows, fouling of the light transmitting windows could be eliminated. The monitoring performance of the proposed system could then be improved, and the corresponding maintenance of sensors could be reduced to an acceptable level.

Fig. 2 – Schematic diagram of the EC–EF treatment system and the wastewater quality monitoring system.

The UV/Vis spectrometer covers wavelengths from 200 to 800 nm with 1749 variables in the present study. The UV/Vis spectrum data could be directly imported to the computer, whereas the turbidity measurement needs to be acquired by the data acquisition system. The measurement range of the turbidimeter is 0–1000 NTU. As a consequence, two types of information, the turbidity measurement and the UV/Vis spectrum of the water sample, could be utilized for water quality monitoring, as seen from Fig. 2. Taking the turbidity measurement as one artificial variable at UV/Vis wavelength 801 nm, the combination of the UV/Vis spectrum and the turbidity measurement has 1750 variables, ranging from 200 to 801 nm.

In order to introduce the wastewaters from the restaurant, EC reactor, and EF reactor to the water quality monitoring system, a sampling system was installed. The wastewater flow supplied to the water quality monitoring system was then determined by the operation of the switch valve. In the present study, the switching time of the switch valve was 3 min, providing 2 min for flushing and 1 min for water quality monitoring. Therefore, the sampling interval for each of the effluents from the restaurant, EC reactor, and EF reactor was 9 min (three sources at 3 min each). It is worth noting that this comparatively long sampling interval is due to the switching operation of the switch valve for the introduction of the three wastewater sources to the monitoring system. If the monitoring system is dedicated to one water source, real-time water quality monitoring could be achieved because the response time of the water quality monitoring system is only 1 s.

Table 2 – Typical wastewater samples.

Sample source   COD (mg/L)   TSS (mg/L)   O&G (mg/L)   Turbidity (NTU)
Restaurant      1370         193          213          164
EC reactor      782          224          81.7         120
EF reactor      497          53.6         14.5         24.1

3.2. Wastewater characteristics

In order to test the validity of the proposed water quality

monitoring system, a total of 163 samples were collected from

the Chinese restaurant and the pilot ECeEF wastewater

treatment plant on campus for COD, TSS and O&G measurements. Due to

sample handling problems during O&G

analysis, 10 samples were discarded for O&G measurements.

As a consequence, the numbers of the samples for COD, TSS

and O&G measurements are 163, 163 and 153, respectively.

These sample sets were then randomly separated into two

subsets, two-thirds for prediction model training and the

other one-third for testing. The concentrations of COD, TSS

and O&G and sample numbers for model training and testing

are listed in Table 1. The pH, conductivity and turbidity of

these samples ranged from 4.65 to 9, 172 to 1750 μS/cm, and

1.68 to 254 NTU, respectively. Here, TSS, pH and conductivity

were examined by the standard methods (APHA et al., 2005).

Turbidity was measured using HF Micro TOL turbidimeter (HF

scientific, Inc., USA). COD was measured using COD reactor

and direct reading spectrophotometer (DR/2000, Hach

Company, USA). O&G was examined by EPA method 1664A (US

EPA, 1999).

Table 2 lists the details of some typical wastewater

samples with their corresponding UV/Vis spectra as shown in

Fig. 3. As seen from Fig. 3, the UV/Vis spectra are very noisy

due to the strong interactions between UV/Vis light and the

particles, oil and grease droplets, especially in UV wavelength

range 200–250 nm and near-infrared wavelength range

600–800 nm.


4. Results and discussions

4.1. Data pre-processing and sensor fusion

The presence of the high concentrations of solid, oil and

grease contents would affect the UV/Vis spectrum measure-

ment significantly, as illustrated in Section 1 and demon-

strated in Section 3.2. The measurement of turbidity is a key

test of water quality. It is a measure of the degree to which the

water sample loses its transparency due to the presence of

suspended particulates, droplets of oil and grease, etc., which

scatter the light and prevent it from passing through. It is

essentially complementary to UV/Vis spectrum from this

aspect. Thus, two different kinds of instruments/sensors, a UV/Vis spectrometer and a turbidimeter, were utilized and fused for the construction of the water quality monitoring system in the present study.

Table 1 – Characteristics of wastewater samples.

Parameter   Range (mg/L)   Training samples   Testing samples
COD         176–2550       108                55
TSS         9.7–410        108                55
O&G         0.93–525       102                51

As described in Section 3.1, the turbidity measurement has

only one variable while the UV/Vis spectrum has 1749 vari-

ables. Consequently, one could either perform the sensor

fusion after the feature extraction of UV/Vis spectrum, or vice

versa. In the present study, signal level of fusion was adopted

in order to utilize the information of the UV/Vis spectrum and

turbidity signal as much as possible. Prior to the sensor fusion,

standard variate transformation was performed on the UV/Vis

spectra and turbidity signals respectively. As a result, the

fused data have zero-means and unit variances.
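The signal-level fusion described above can be sketched as follows. This assumes the transformation amounts to column-wise autoscaling (zero mean, unit variance per variable), which matches the stated outcome but is an interpretation. The function name `fuse_signals` is hypothetical.

```python
import numpy as np

def fuse_signals(spectra, turbidity):
    """Autoscale each variable, then append turbidity as one extra column.

    spectra:   (n_samples, n_wavelengths) absorbance matrix
    turbidity: (n_samples,) turbidity readings
    """
    def autoscale(A):
        return (A - A.mean(axis=0)) / A.std(axis=0, ddof=1)

    S = autoscale(np.asarray(spectra, dtype=float))
    t = autoscale(np.asarray(turbidity, dtype=float).reshape(-1, 1))
    return np.hstack([S, t])   # e.g. 1749 spectral variables + 1 turbidity variable
```

For new samples, the means and standard deviations estimated on the training set would normally be reused rather than recomputed.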

4.2. Performance indices

To evaluate the performance of the proposed method/system,

the following indices are used.

Maximum prediction error:

$$\mathrm{MaxE} = \max\left(\hat{y}_j - y_j\right), \quad j = 1, \ldots, k. \tag{14}$$

Minimum prediction error:

$$\mathrm{MinE} = \min\left(\hat{y}_j - y_j\right), \quad j = 1, \ldots, k. \tag{15}$$

Root mean square error of prediction:

$$\mathrm{RMSEP} = \sqrt{\frac{1}{k}\sum_{j=1}^{k}\left(\hat{y}_j - y_j\right)^2}. \tag{16}$$

Fig. 3 – Typical UV/Vis spectra (absorbance, a.u., versus wavelength, nm) of the effluents from (1) restaurant, (2) EC reactor, and (3) EF reactor.



Fig. 4 – The empirical error, model complexity and CM with various sizes of Boosting-IPW-PLS models for: (a) COD, (b) TSS, (c) O&G measurements. Each panel plots the empirical error/model complexity and CM against the number of Boosting-IPW-PLS iterations (0–500); panel (a) marks the optimal M value at 199.


Correlation coefficient of the prediction values and analytical values:

$$R = \frac{\sum_{j=1}^{k}\left(\hat{y}_j - \bar{\hat{y}}\right)\left(y_j - \bar{y}\right)}{\sqrt{\sum_{j=1}^{k}\left(\hat{y}_j - \bar{\hat{y}}\right)^2 \sum_{j=1}^{k}\left(y_j - \bar{y}\right)^2}}, \tag{17}$$

where k is the number of samples of the test set, $\hat{y}_j$ is the prediction of $y_j$ for sample j of the test set, and $\bar{\hat{y}}$ and $\bar{y}$ are the mean values of the predictions and responses, respectively.
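The four indices can be computed in a few lines. The sketch below assumes the error is taken as prediction minus analytical value. This sign convention is an assumption that affects only MaxE and MinE. The function name `performance_indices` is hypothetical.

```python
import numpy as np

def performance_indices(y_true, y_pred):
    """Return MaxE, MinE, RMSEP and R for a test set, as in Eqs. (14)-(17)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_pred - y_true                          # assumed sign convention
    return {
        "MaxE": err.max(),
        "MinE": err.min(),
        "RMSEP": np.sqrt(np.mean(err ** 2)),
        "R": np.corrcoef(y_pred, y_true)[0, 1],    # Pearson correlation, Eq. (17)
    }
```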

4.3. Boosting-IPW-PLS model building

4.3.1. Determination of the size of Boosting-IPW-PLS

Before obtaining final prediction models for different quality

parameters, one needs to determine the size of Boosting-IPW-

PLS, as illustrated in Section 2.5. Fig. 4 shows the influence of

the ensemble sizes on the empirical error, model complexity,

and CM for COD, TSS and O&G measurements, respectively. As

seen in Fig. 4, when the ensemble size increased, the empirical

error gradually decreased. This was obvious because in each

iteration the new weak learner focused on the residual of the

Boosting-IPW-PLS model obtained in last iteration. However,

with a too large M, the Boosting model often becomes too

strongly tailored to the particularities of the training set and

the model’s generalization capability to the new water

samples would be poorer. In addition, when the ensemble size

increased, more weak learners were added into the Boosting

model. As a consequence, the complexity of Boosting model

was increased, as shown in Fig. 4. Because a prediction model

with too complex structure is not preferable in practice, an

appropriate ensemble size of Boosting model should be

selected.

The proposed stopping criterion CM first decreased and then increased as the ensemble size increased for the Boosting-IPW-PLS models of all three water quality measurements, as shown in Fig. 4. This was expected because the trend of CM is determined by the empirical error and the model complexity, as described in Eq. (13). CM decreases if the decrease in the empirical error is larger than the increase in the model complexity, and vice versa. Therefore, the M value at which CM reached its minimum was selected as the ensemble size of the Boosting model. As a result, the sizes of the Boosting-IPW-PLS models for COD, TSS, and O&G measurements were selected as 199, 183, and 283, respectively, as found in Fig. 4.
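The selection rule described above (take the M at which CM is smallest) can be sketched as follows. This is only an illustration: the criterion curve below is a hypothetical stand-in built as a decreasing error term plus a growing complexity term, and it is not the form of Eq. (13). The function name is also an assumption.

```python
def select_ensemble_size(criterion_values):
    """Return the ensemble size M (1-based) at which the criterion is smallest."""
    if not criterion_values:
        raise ValueError("criterion_values must not be empty")
    best_index = min(range(len(criterion_values)), key=criterion_values.__getitem__)
    return best_index + 1


# Hypothetical curves for M = 1..300 (NOT Eq. (13) of the paper):
# the empirical error shrinks with M while the complexity term grows with M.
sizes = range(1, 301)
empirical_error = [100.0 / m for m in sizes]
complexity = [0.01 * m for m in sizes]
c_m = [e + c for e, c in zip(empirical_error, complexity)]

best_m = select_ensemble_size(c_m)
print("selected ensemble size:", best_m)
```

With real data, `c_m` would be replaced by the criterion values computed from the training residuals and the model complexity at each ensemble size.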

4.3.2. Variable weight evolution

In Boosting-IPW-PLS, the weight of each variable reflects the correlation between the variable and the quality index: the higher the weight, the more informative the variable. The evolution of the variable weights with ensemble size reveals the contribution of each variable to the final model.

Fig. 5 shows the variable weight evolution of the first eight weak learners of the Boosting models for COD, TSS, and O&G measurements obtained in the previous section. Only a very small portion of the variables was involved in the construction of each weak learner. This may be attributed to the fact that only one latent variable was used in the IPW-PLS algorithm for variable weighting in the present study. In PLS, the most significant covariance information between the independent space and the response space is expressed by the first several latent variables. When only one latent variable is used in IPW-PLS, only the variables contributing to the most significant covariance information are assigned large weights, while the other variables may be assigned zero weights. This is equivalent to variable selection/feature extraction.
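The concentration of weight on a few variables can be reproduced with a simplified sketch of iterative predictor weighting around a one-latent-variable PLS model. It follows the general idea of Forina et al. (1999), in which the importance of a predictor is the absolute regression coefficient times the predictor's standard deviation, normalized to sum to one. The helper names, the number of iterations and the synthetic data are assumptions for illustration and do not reproduce the authors' implementation.

```python
import math
import random


def center(values):
    """Subtract the mean from a list of numbers."""
    m = sum(values) / len(values)
    return [v - m for v in values]


def ipw_pls_one_lv(X, y, n_iter=5):
    """Iterative predictor weighting with a one-latent-variable PLS model.

    Returns one non-negative weight per column of X (weights sum to 1).
    """
    n, p = len(X), len(X[0])
    cols = [center([row[j] for row in X]) for j in range(p)]  # centered predictors
    yc = center(y)
    weights = [1.0 / p] * p
    for _ in range(n_iter):
        # Scale each predictor by its current weight
        scaled = [[weights[j] * v for v in cols[j]] for j in range(p)]
        # One-LV PLS: loading weights w proportional to X'y, score t = Xw
        w = [sum(a * b for a, b in zip(scaled[j], yc)) for j in range(p)]
        norm = math.sqrt(sum(v * v for v in w))
        w = [v / norm for v in w]
        t = [sum(w[j] * scaled[j][i] for j in range(p)) for i in range(n)]
        q = sum(a * b for a, b in zip(t, yc)) / sum(a * a for a in t)
        b = [q * wj for wj in w]  # regression coefficients on scaled predictors
        # Importance: |b_j| times the standard deviation of the scaled predictor
        s = [math.sqrt(sum(v * v for v in scaled[j]) / (n - 1)) for j in range(p)]
        importance = [abs(b[j]) * s[j] for j in range(p)]
        total = sum(importance)
        weights = [z / total for z in importance]
    return weights


# Synthetic data (hypothetical): only column 4 carries information about y
random.seed(0)
n_samples, n_vars = 60, 20
X = [[random.gauss(0.0, 1.0) for _ in range(n_vars)] for _ in range(n_samples)]
y = [3.0 * row[4] + random.gauss(0.0, 0.1) for row in X]

weights = ipw_pls_one_lv(X, y)
selected = [j for j, wj in enumerate(weights) if wj > 1e-6]
print("variables with weight above 1e-6:", selected)
```

Because each pass multiplies a predictor's previous weight into both its coefficient and its standard deviation, weights of weakly covarying predictors shrink rapidly toward zero, which mirrors the selection-like behaviour described above.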

Interestingly, for all three Boosting models the turbidity measurement (at the wavelength of 801 nm) was assigned very large weights in the construction of the first weak learner. This indicates that the turbidity measurement is complementary to the UV/Vis spectrum and informative for predicting the COD, TSS and O&G concentrations of wastewater samples, as discussed in previous sections.

Fig. 5 – The variable weight evolution of the first eight weak learners of Boosting-IPW-PLS models for: (a) COD, (b) TSS, (c) O&G measurements. [Each panel plots the variable weight (0 to 1) of weak learners 1 to 8 against wavelength, 200–800 nm.]

The frequency with which a variable is selected by the weak learners is another important indicator for process analysis and model pruning. If a variable has never been selected in the construction of any weak learner, it may be quality-irrelevant and can be removed from the original X space. Fig. 6 shows the frequency of the variables selected by the weak learners of the Boosting models for COD, TSS and O&G measurements. Here, whether or not a variable is selected by a weak learner is determined by the weight assigned to it by IPW-PLS. If the variable weight is less than a very small value, for example $10^{-6}$ in the present study, the corresponding variable is considered quality-irrelevant and is not selected by the weak learner. As seen from Fig. 6, except for the turbidity measurement (at the wavelength of 801 nm), only a very small portion of the variables in the UV region and near-infrared region were selected in the construction of the prediction models. This is consistent with the variable weight evolution of the weak learners.

Fig. 6 – The frequency of the variables selected by the weak learners of the Boosting-IPW-PLS models for (a) COD, (b) TSS, (c) O&G measurements. [Each panel plots selection frequency against wavelength, 200–800 nm.]

Fig. 7 – Scatter plots of predicted versus analytical value for the test set using Boosting-IPW-PLS models for (a) COD, (b) TSS, (c) O&G measurements. [Axes: analytical values, mg/L, versus predicted values, mg/L; R = 0.945 (a), 0.965 (b), 0.945 (c).]

Table 3 – Summary of COD prediction using different methods.

Method            Number of LVs  M     Number of variables  MaxE (mg/L)  MinE (mg/L)  RMSEP (mg/L)  R
PLS               7              –     1750                 372          -347         176           0.903
Boosting-PLS      1              3013  1750                 467          -314         157           0.922
UVE-PLS           8              –     205                  1020         -413         216           0.915
IPW-PLS           7              –     22                   372          -347         176           0.903
Boosting-IPW-PLS  1              199   36                   453          -261         141           0.945

Table 4 – Summary of TSS prediction using different methods.

Method            Number of LVs  M     Number of variables  MaxE (mg/L)  MinE (mg/L)  RMSEP (mg/L)  R
PLS               12             –     1750                 57.3         -102         35.9          0.939
Boosting-PLS      1              4456  1750                 53.9         -119         37.5          0.94
UVE-PLS           4              –     279                  54.3         -108         34.7          0.96
IPW-PLS           12             –     45                   57.3         -102         35.9          0.939
Boosting-IPW-PLS  1              183   20                   55.7         -90.5        30.2          0.965
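The selection-frequency count of Section 4.3.2 (a variable counts as selected by a weak learner when its IPW-PLS weight is not below the $10^{-6}$ threshold) can be sketched as follows. The function name and the small weight matrix are hypothetical and serve only to illustrate the bookkeeping.

```python
THRESHOLD = 1e-6  # weights below this value count as "not selected"


def selection_frequency(weight_history, threshold=THRESHOLD):
    """Count, per variable, how many weak learners gave it a weight >= threshold."""
    n_vars = len(weight_history[0])
    freq = [0] * n_vars
    for weights in weight_history:
        for j, w in enumerate(weights):
            if w >= threshold:
                freq[j] += 1
    return freq


# Hypothetical weights of 3 weak learners over 5 variables (one row per learner)
weight_history = [
    [0.90, 0.00, 0.10, 0.0, 0.0],
    [0.70, 0.00, 0.00, 0.3, 0.0],
    [0.99, 1e-9, 0.01, 0.0, 0.0],
]
freq = selection_frequency(weight_history)
# Variables never selected are candidates for removal from the X space
never_selected = [j for j, f in enumerate(freq) if f == 0]
print("frequency:", freq)
print("never selected:", never_selected)
```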

4.3.3. Prediction performance of Boosting-IPW-PLS

With the variable weights and model sizes determined, the Boosting-IPW-PLS models can be used to predict the water quality indices, as shown in Fig. 7 and Tables 3–5. The predicted values fit the analytical values well, with high correlation coefficients (0.945, 0.965 and 0.945) and small RMSEP values (141, 30.2 and 34 mg/L) for COD, TSS and O&G measurements, respectively, revealing the effectiveness of the proposed system.

4.4. Comparisons of different methods

To compare the performance of commonly used modeling methods, MaxE, MinE, RMSEP and R for the test sets obtained by PLS, Boosting-PLS, UVE-PLS, IPW-PLS and Boosting-IPW-PLS are summarized in Tables 3–5. Among these methods, PLS is the most commonly used one, UVE-PLS and IPW-PLS are known as variable selection/modeling methods, and Boosting-PLS is known as an ensemble method. Here, the number of latent variables of PLS, UVE-PLS and IPW-PLS was determined by 5-fold cross-validation with respect to the root mean square error. Boosting-PLS used only one latent variable, and the shrinkage value was set to 0.9 as suggested by Zhang et al. (2005). The size of the Boosting-PLS model was also determined by 5-fold cross-validation with respect to the root mean square error, while the size of Boosting-IPW-PLS was determined by the proposed stopping criterion, as discussed in previous sections.

Table 5 – Summary of O&G prediction using different methods.

Method            Number of LVs  M     Number of variables  MaxE (mg/L)  MinE (mg/L)  RMSEP (mg/L)  R
PLS               18             –     1750                 95.3         -80.2        35.9          0.936
Boosting-PLS      1              5356  1750                 84           -118         37.8          0.94
UVE-PLS           6              –     41                   66.9         -79.3        34.2          0.956
IPW-PLS           25             –     71                   98.3         -92.2        38.3          0.928
Boosting-IPW-PLS  1              283   27                   85.1         -90.3        34            0.945

From the results listed in Tables 3–5, it is clear that the Boosting-IPW-PLS method gives the best overall results in the predictions of these three water quality indices, followed by UVE-PLS, Boosting-PLS, PLS and IPW-PLS. Although UVE-PLS has comparable prediction performance in TSS and O&G predictions, its MaxE and MinE values in COD predictions are comparatively large. A possible reason is its inability to handle outliers. The fact that Boosting-PLS requires thousands of weak learners to build up the final Boosting model may be attributed to the use of only one latent variable in the PLS weak learner construction. With only one latent variable per weak learner, only a small portion of the quality-informative features is extracted from the noisy fused data. As a consequence, more weak learners were added to achieve acceptable prediction ability, resulting in a very complex model structure, which is not preferable in practice. This is one of the important reasons why IPW-PLS was incorporated into the Boosting scheme: it suppresses the quality-irrelevant variables and further reduces the complexity of the model. Interestingly, PLS and IPW-PLS had almost the same prediction results for COD and TSS concentrations although there is a significant difference in the number of variables involved in the modeling. This may be attributed to the fact that both PLS and IPW-PLS used the same number of latent variables, and the variables retained in the IPW-PLS models express the same amount of quality-informative features as the PLS models. It should be noted that when different information is expressed, the prediction ability of the models differs, as shown in the O&G concentration predictions using PLS and IPW-PLS models.
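To illustrate why one-latent-variable weak learners lead to many boosting rounds, the following minimal sketch fits a Boosting-PLS style model in which each weak learner is a one-latent-variable PLS regression on the current residual, added with a shrinkage of 0.9. It is written in the spirit of Zhang et al. (2005) but is not their algorithm verbatim. The function names and synthetic data are assumptions, and the data are assumed to be roughly centered so that no intercept is fitted.

```python
import math
import random


def one_lv_pls_fit(X, r):
    """Fit a one-latent-variable PLS model to residual r; return coefficients."""
    n, p = len(X), len(X[0])
    w = [sum(X[i][j] * r[i] for i in range(n)) for j in range(p)]
    norm = math.sqrt(sum(v * v for v in w))
    if norm == 0.0:
        return [0.0] * p  # residual is orthogonal to X: nothing left to fit
    w = [v / norm for v in w]
    t = [sum(X[i][j] * w[j] for j in range(p)) for i in range(n)]
    q = sum(ti * ri for ti, ri in zip(t, r)) / sum(ti * ti for ti in t)
    return [q * wj for wj in w]


def boosting_pls(X, y, n_learners, shrinkage=0.9):
    """Return summed coefficients and the training RMSE after each weak learner."""
    n, p = len(X), len(X[0])
    coef = [0.0] * p
    residual = list(y)
    rmse_history = []
    for _ in range(n_learners):
        b = one_lv_pls_fit(X, residual)
        # Add the shrunken weak learner to the ensemble
        coef = [c + shrinkage * bj for c, bj in zip(coef, b)]
        step = [sum(X[i][j] * b[j] for j in range(p)) for i in range(n)]
        # The next weak learner is fitted to what is still unexplained
        residual = [r - shrinkage * s for r, s in zip(residual, step)]
        rmse_history.append(math.sqrt(sum(r * r for r in residual) / n))
    return coef, rmse_history


# Synthetic, roughly centered data (hypothetical)
random.seed(1)
n_samples, n_vars = 40, 6
X = [[random.gauss(0.0, 1.0) for _ in range(n_vars)] for _ in range(n_samples)]
beta = [2.0, -1.0, 0.5, 0.0, 0.0, 1.5]
y = [sum(b * x for b, x in zip(beta, row)) + random.gauss(0.0, 0.1) for row in X]

coef, rmse_history = boosting_pls(X, y, n_learners=50)
print("training RMSE after 1 and 50 learners:", rmse_history[0], rmse_history[-1])
```

Each round captures only the single direction that covaries most with the current residual, so the training error falls step by step rather than at once.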

5. Conclusions

A multi-sensor water quality monitoring system incorporating a UV/Vis spectrometer and a turbidimeter was first proposed and tested to monitor COD, TSS and O&G online. The COD, TSS and O&G prediction models were constructed by the Boosting-IPW-PLS method that was first developed in the present study. The sizes of the COD, TSS and O&G prediction models were 199, 183 and 283, respectively, with only one PLS component used both in IPW-PLS for variable weighting and in the PLS weak learner construction. The monitoring system was tested in the field. Experimental results showed that the predicted values fit the analytical values well, with high correlation coefficients (0.945, 0.965 and 0.945) and small RMSEP values (141, 30.2 and 34 mg/L) for COD, TSS and O&G measurements, respectively, revealing the effectiveness of the proposed system.

References

Akaike, H., 1974. A new look at the statistical model identification. IEEE Transactions on Automatic Control 19, 716–723.

American Public Health Association (APHA), American Water Works Association, Water Environment Federation, 2005. Standard Methods for the Examination of Water and Wastewater, 21st ed. American Public Health Association, New York.

Bookman, S.K.E., 1997. Estimation of biochemical oxygen demand in slurry and effluent using ultraviolet spectrophotometry. Water Research 31, 372–374.

Centner, V., Massart, D.L., de Noord, O.E., de Jong, S., Vandeginste, B.M., Sterna, C., 1996. Elimination of uninformative variables for multivariate calibration. Analytical Chemistry 68, 3851–3858.

Chambers, B., Jones, G., 1988. Optimization and uprating of activated sludge plants by efficient process design. Water Science and Technology 20, 121–132.

Chen, D., Hu, B., Shao, X.G., Su, Q.D., 2004. Variable selection by modified IPW (iterative predictor weighting)-PLS (partial least squares) in continuous wavelet regression models. Analyst 129, 664–669.

Chen, G., Chen, X., Yue, P.L., 2000. Electrocoagulation and electroflotation of restaurant wastewater. Journal of Environmental Engineering 126, 858–863.

Drainage Services Department of the Government of the Hong Kong Special Administrative Region (DSD-HKSAR), 2011. Sewage Service Charging Scheme [Online]. Available at: http://www.dsd.gov.hk/EN/Sewage_Services_Charging_Scheme/index.html (accessed 10.09.11).

Drucker, H., 1997. Improving regressors using boosting techniques. In: Fisher, D.H. (Ed.), Proceedings of the 14th International Conferences on Machine Learning. Morgan Kaufmann, San Mateo, CA, pp. 107–115.

Duffy, N., Helmbold, D., 2002. Boosting methods for regression. Machine Learning 47, 153–200.

Esteban, J., Starr, A., Willetts, R., Hannah, P., Bryanston-Cross, P., 2005. A review of data fusion models and architectures: towards engineering guidelines. Neural Computing & Applications 14, 273–281.

Fogelman, S., Blumenstein, M., Zhao, H., 2006. Estimation of chemical oxygen demand by ultraviolet spectroscopic profiling and artificial neural networks. Neural Computing & Applications 15, 197–203.

Forina, M., Casolino, C., Millan, C.P., 1999. Iterative predictor weighting (IPW) PLS: a technique for the elimination of useless predictors in regression problems. Journal of Chemometrics 13, 165–184.

Freund, Y., Schapire, R.E., 1997. A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences 55, 119–139.

Friedman, J., 2001. Greedy function approximation: a gradient boosting machine. Annals of Statistics 29, 1189–1232.

Friedman, J., Hastie, T., Tibshirani, R., 2000. Additive logistic regression: a statistical view of boosting. Annals of Statistics 28, 337–374.

Geladi, P., Kowalski, B., 1986. Partial least-squares regression: a tutorial. Analytica Chimica Acta 185, 1–17.

Goicoechea, H.C., Olivieri, A.C., 2002. Wavelength selection for multivariate calibration using a genetic algorithm: a novel initialization strategy. Journal of Chemical Information and Modeling 42, 1146–1153.

Hoskuldsson, A., 1988. PLS regression methods. Journal of Chemometrics 2, 211–228.

Klein, L.A., 2004. Sensor and Data Fusion: A Tool for Information Assessment and Decision Making. SPIE Press, Bellingham, Wash.

Lourenco, N.D., Chaves, C.L., Novais, J.M., Menezes, J.C., Pinheiro, H.M., Diniz, D., 2006. UV spectra analysis for water quality monitoring in a fuel park wastewater treatment plant. Chemosphere 65, 786–791.

Lu, X., Jiang, J.H., Wu, H.L., Shen, G.L., Yu, R.Q., 2007. Variable-weighted PLS. Chemometrics and Intelligent Laboratory Systems 85, 140–143.

Lutz, R.W., Kalisch, M., Buhlmann, P., 2008. Robustified L2 boosting. Computational Statistics and Data Analysis 52, 3331–3341.

Olsson, G., Nielsen, M.K., Yuan, Z., Lynggaard-Jensen, A., Steyer, J.-P., 2005. Instrumentation, Control and Automation in Wastewater Systems. IWA Publishing, London.

Pouet, M.-F., Thomas, O., Jacobsen, B.N., Lynggaard-Jensen, A., Quevauviller, P., 1999. Conclusions of the workshop on methodologies for wastewater quality monitoring. Talanta 50, 759–762.

Ridgeway, G., 1999. The state of boosting. Computing Science and Statistics 31, 172–181.

Roig, B., Chalmin, E., Touraud, E., Thomas, O., 2002. Spectroscopic study of dissolved organic sulfur (DOS): a case study of mercaptans. Talanta 56, 585–590.

Russell, S., Marshallsay, D., MacCraith, B., Devisscher, M., 2003. Non-contact measurement of wastewater polluting load – the Laodmon project. Water Science and Technology 47, 79–86.

Shao, X., Bian, X., Cai, W., 2010. An improved boosting partial least squares method for near-infrared spectroscopic quantitative analysis. Analytica Chimica Acta 666, 32–37.

Swierenga, H., Wulfert, E., de Noord, O.E., de Weijer, A.P., Smilde, A.K., Buydens, L.M.C., 2000. Development of robust calibration models in near infrared spectrometric applications. Analytica Chimica Acta 411, 121–135.

Tan, C., Wang, J., Wu, T., Qin, X., Li, M., 2010. Determination of nicotine in tobacco samples by near-infrared spectroscopy and boosting partial least squares. Vibrational Spectroscopy 54, 35–41.

Thomas, O., Constant, D., 2004. Trends in optical monitoring. Water Science and Technology 49, 1–8.


US Environmental Protection Agency (EPA), 1999. Method 1664, Revision A: N-Hexane Extractable Material (HEM; Oil and Grease) and Silica Gel Treated N-Hexane Extractable Material (SGT-HEM; Non-Polar Material) by Extraction and Gravimetry [Online]. Available at: http://water.epa.gov/scitech/methods/cwa/oil/upload/2007_07_10_methods_method_oil_1664.pdf (accessed 04.06.11).

Vapnik, V.N., 2000. The Nature of Statistical Learning Theory, second ed. Springer, New York.

Varshney, P.K., 1997. Multisensor data fusion. Electronics & Communication Engineering Journal, 245–253.

Wold, S., Ruhe, A., Wold, H., Dunn III, W., 1984. The collinearity problem in linear regression. The partial least squares (PLS) approach to generalized inverses. SIAM Journal on Scientific and Statistical Computing 5, 735–743.

Wu, X.L., 2007. An applied research on information fusion and ensemble learning for spectral analysis of water quality. Ph.D. thesis, Zhejiang University, Hangzhou, China.

Wu, X.L., Li, Y.J., Wu, T.J., 2006. A boosting-partial least squared method for ultraviolet spectroscopic analysis of water quality. Chinese Journal of Analytical Chemistry 8, 1091–1095.

Wu, X.L., Li, Y.J., Wu, T.J., 2007. Application of multi-spectral information fusion for water quality analysis. Chinese Journal of Analytical Chemistry 12, 1716–1720.

Zemel, R., Pitassi, T., 2001. A gradient-based boosting algorithm for regression problems. In: Leen, T.K., Dietterich, T.G., Tresp, V. (Eds.), 2001. Advances in Neural Information Processing Systems, vol. 13. MIT Press, Cambridge, MA.

Zhang, M.H., Xu, Q.S., Massart, D.L., 2005. Boosting partial least squares. Analytical Chemistry 77, 1423–1431.

Zhou, Y.P., Cai, C.B., Huan, S., Jiang, J.H., Wu, H.L., Shen, G.L., Yu, R.Q., 2007. QSAR study of angiotensin II antagonists using robust boosting partial least squares regression. Analytica Chimica Acta 593, 68–74.
