Water Research 46 (2012) 1133–1144
Available online at www.sciencedirect.com
journal homepage: www.elsevier.com/locate/watres
Wastewater quality monitoring system using sensor fusion and machine learning techniques
Xusong Qin, Furong Gao, Guohua Chen*
Department of Chemical and Biomolecular Engineering, The Hong Kong University of Science and Technology, Clear Water Bay,
Kowloon, Hong Kong, China
Article info
Article history:
Received 22 June 2011
Received in revised form 4 October 2011
Accepted 5 December 2011
Available online 11 December 2011
Keywords:
Online monitoring
UV/Vis spectroscopy
Turbidity
Variable weighting
Boosting-IPW-PLS
Wastewater treatment
* Corresponding author. E-mail address: [email protected] (G. Chen).
0043-1354/$ – see front matter © 2011 Elsevier Ltd. All rights reserved.
doi:10.1016/j.watres.2011.12.005
Abstract
A multi-sensor water quality monitoring system incorporating a UV/Vis
spectrometer and a turbidimeter was used to monitor the Chemical Oxygen
Demand (COD), Total Suspended Solids (TSS) and Oil & Grease (O&G)
concentrations of the effluents from the Chinese restaurant on campus and
an electrocoagulation–electroflotation (EC–EF) pilot plant. To handle the
noise and information imbalance in the fused UV/Vis spectra and turbidity
measurements during calibration model building, an improved boosting
method, Boosting-Iterative Predictor Weighting-Partial Least Squares
(Boosting-IPW-PLS), was developed in the present study. The
Boosting-IPW-PLS method incorporates IPW into the boosting scheme to
suppress quality-irrelevant variables by assigning them small weights, and
builds the models for wastewater quality prediction on the weighted
variables. The monitoring system was tested in the field with satisfactory
results, underlining the potential of this technique for the online
monitoring of water quality.
© 2011 Elsevier Ltd. All rights reserved.
1. Introduction
Monitoring wastewater quality is of growing importance around the world,
because a better understanding of both treated and untreated effluents
enables better control of treatment plants. For example, it has been
estimated that online monitoring for real-time process control may save
as much as 40% of the energy (the major part of the cost) currently
needed for wastewater treatment by continuous aeration
(Chambers and Jones, 1988; Olsson et al., 2005). However, the
available wastewater quality monitoring technologies have
several drawbacks with respect to the control and optimization
of treatment plants (Pouet et al., 1999). In addition
to sampling and sample storage problems, the standard
analytical methods currently available do not allow the
implementation of real-time monitoring and process control.
Better online sensors are in urgent demand for
water quality monitoring. However, owing to the complex
pollutant matrix, the generally hostile environment, and the lack
of accurate, cost-effective and robust sensors, the automation
of wastewater treatment systems is still not as developed as in
other process industries.
In view of the high potential for the development and
application of online measurements in water quality moni-
toring, UV/Vis spectroscopy has attracted substantial atten-
tion and led to some useful results (Bookman, 1997; Fogelman
et al., 2006; Lourenco et al., 2006), although most of the re-
ported wastewater UV/Vis spectrometric applications are
based on visual observation and direct comparison of the UV/
Vis spectra (Roig et al., 2002). Several reports applied multi-
variate monitoring approaches (Lourenco et al., 2006; Wu
et al., 2006; Wu, 2007). It is worth pointing out that in most of
these reports the UV/Vis spectra were acquired from static,
low-strength wastewater samples. However, for
some effluents such as restaurant wastewater, the concentrations
of oil and grease, suspended solids and colloids are very high
(Chen et al., 2000), making it very difficult to obtain a
satisfactory monitoring performance using a single optical
technique. In addition, these pollutants tend to foul the light
transmitting windows, degrading the monitoring performance
of UV/Vis spectroscopy. Thus, one may need to
develop a combined optical method that combines two or three
optical spectral sources and process variables to compensate
for their individual drawbacks and so improve the monitoring performance
(Russell et al., 2003; Thomas and Constant, 2004; Wu, 2007; Wu
et al., 2007).
The combined optical method is essentially one typical
application of the sensor fusion technology (Wu, 2007). Sensor
fusion, which is also known as multi-sensor data fusion, first
appeared in the literature as mathematical models for data
manipulation in the 1960s. It refers to the acquisition, pro-
cessing and synergistic combination of information gathered
by various knowledge sources and sensors to provide a better
understanding of a phenomenon. It is a rapidly evolving field
that has attracted considerable interest in
the research and development community, and is being
applied in a wide variety of fields such as military
command and control, robotics, image processing, air traffic
control, medical diagnostics, pattern recognition and
environmental monitoring (Varshney, 1997). However, the
applications of sensor fusion techniques are disparate and
problem-dependent (Klein, 2004; Esteban et al., 2005). No
one-size-fits-all technique, algorithm or implementation
framework can solve all sensor fusion problems.
Therefore, in order to
construct an optimal sensor fusion system for the combined
optical monitoring system, one has to properly select the
optical spectral sources/process variables, the sensor fusion
framework and its corresponding algorithms at each step. As
a result, the candidate optical spectral sources and process
variables for the combined optical method should be
complementary from chemical and/or physical aspects for the
water quality measurement. However, in spite of their infor-
mation complementarities, information redundancy and
unbalance problems usually occur because of the variable
collinearity among spectral wavelengths and process vari-
ables, and the significant dimensionality difference among
the various types of optical spectra and process variables. If
the sensor fusion strategy and fusion levels are not properly
selected, the modeling effort may concentrate on the data
sources with higher dimensionality rather than on the more
quality-informative low-dimensional data sources. The resulting
calibration model may therefore be less accurate due to
the loss of quality-related information. In addition, the results
are often worsened by the presence of uninformative
variables, such as a highly fluctuating background and
noise, in the optical spectra/process variables. Since not all
wavelengths/variables are useful for quality prediction,
various sensor fusion systems and variable selection/feature
extraction methods (Wu, 2007; Wu et al., 2007; Centner et al.,
1996; Forina et al., 1999; Swierenga et al., 2000; Goicoechea
and Olivieri, 2002; Chen et al., 2004; Lu et al., 2007) have
been designed to work together with the calibration modeling
methods. In these approaches, the variable selection procedure is
performed only once, and only one set of the most quality-related
variables is retained as descriptors for regression
modeling. However, this may lose some of the information in
the original spectra/process variable space and thus reduce
the accuracy of the regression models.
In the past two decades, the application of Boosting to
regression problems has received significant attention
because its ensemble learning nature can produce higher
predictive accuracy than single model strategies. Freund and
Schapire (1997) proposed the first algorithm of Boosting for
regression problems, the AdaBoost.R algorithm. The most
important contribution of this method is the majority vote
idea that combines a group of weighted weak learners (fitting
models) which only guarantee to achieve an error rate of
slightly less than that achieved by random guessing. The
weak learners are weighted according to their respective
accuracies. Drucker (1997) developed the AdaBoost.R2 algorithm,
which is an ad hoc modification of the AdaBoost.R
algorithm. The advantage of Drucker’s method is its
flexibility, i.e., any learner, whether linear or non-linear, can be
incorporated. Many other researchers (for example,
Ridgeway (1999), Friedman et al. (2000), Friedman (2001),
Zemel and Pitassi (2001), Duffy and Helmbold (2002)) have
viewed Boosting as a “gradient machine” that optimizes
a particular loss function. In this sense, Boosting is essentially
a method that combines a group of weak learners that
perform marginally better than random guessing to obtain
a powerful learner in regression. These weak learners are
constructed through iterative steps by always using a basic
learning algorithm. In each step, a new learner (fitting model)
is established by relating the predictor variables of X (an n × p
matrix with p predictors for n samples) to the residuals
(prediction errors) of the responses y (an n × 1 vector for n
samples) that are not fitted by previous learners. The
Boosting-partial least squares (Boosting-PLS) and its modifi-
cations (Zhang et al., 2005; Wu et al., 2006; Zhou et al., 2007;
Lutz et al., 2008; Shao et al., 2010; Tan et al., 2010) proposed
recently introduced PLS into the boosting procedure by
combining a set of shrunken PLS models. However, the above
Boosting methods construct all the weak learners based on
the same p predictor variables of X. The highly fluctuating
background and noise in X weaken the predictive
ability of the weak learners, and hence of the Boosting model.
Given the sequential addition of weak learners in the
boosting procedure, it is worthwhile to extract only the
quality-informative features/variables for the weak learner in
each iteration, in order to improve the predictive accuracy
and robustness of the Boosting model.
Because the trade effluent surcharge of the restaurant is
determined by the effluent COD value according to the sewage
services ordinance of Hong Kong (DSD-HKSAR, 2011), and the
suspended solids and oil & grease have strong impacts on the
light penetration and the biodegradation ability of the water,
these three water quality indices are of particular interest in
the present study. The objective of the present study was
therefore to develop a wastewater quality monitoring system
that incorporates a UV/Vis spectrometer and a turbidimeter to
monitor COD, TSS and O&G concentrations of the effluents of
the Chinese restaurant on campus and a pilot EC–EF
wastewater treatment plant. A sensor fusion technique was
used to fuse the signals from these two sensors/instruments.
An improved boosting method, Boosting-IPW-PLS, was
developed here to handle the noise and information imbalance
of the fused information, and to model and predict the water
quality. The system was evaluated in field trials, with
satisfactory results as shown in the following sections.
2. Theory and methods
2.1. Partial least squares (PLS)
Suppose a data set {X, y}, where X is an n × p process variable
data matrix with p predictor variables for n samples, e.g. n
wastewater samples with corresponding UV/Vis spectra (each
UV/Vis spectrum has p wavelengths), and y is the corresponding
dependent variable vector of size n × 1, e.g. COD
measurements of these n wastewater samples. PLS models
both the outer relations within the X and y blocks and the inner
relation between the two blocks. The equations are as follows:

$$X = TP^{T} + E = \sum_{i=1}^{a} t_i p_i^{T} + E, \tag{1}$$

$$y = UQ^{T} + F = \sum_{i=1}^{a} u_i q_i^{T} + F, \tag{2}$$

$$u_i = b_i t_i, \tag{3}$$

where T, P and E are the score matrix, loading matrix and
residual matrix of the X space, while U, Q and F are the score matrix,
loading matrix and residual matrix of the y space, respectively;
$t_i$, $p_i$, $u_i$ and $q_i$ are the corresponding vectors of the T, P, U and Q
matrices. Eqs. (1) and (2) describe the outer relations, Eq. (3)
describes the inner relation between the y space and the X space,
a is the number of latent variables, and $b_i = t_i^{T} u_i / (t_i^{T} t_i)$ is the
regression coefficient between the PLS component $t_i$ from the X
space and the PLS component $u_i$ from the y space. The standard
PLS procedure based on the non-linear iterative partial least
squares (NIPALS) algorithm and the methods for choosing the
number of latent variables (LVs) of PLS (cross-validation,
jackknife and so on) can be found in the works of Wold et al.
(1984), Geladi and Kowalski (1986), and Hoskuldsson (1988).
Based on the NIPALS algorithm, the prediction for a new sample j
can be written as:

$$\hat{y}_j^{a} = x_j b^{a} + c^{a}, \tag{4}$$

where $b^{a}$ is the coefficient vector (p × 1), $c^{a}$ is the offset when
a latent variables are used, and $x_j$ is the predictor variable
vector (1 × p) of sample j.
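The relations in Eqs. (1)–(4) can be illustrated with a minimal NIPALS sketch for a single response. It is not the authors' implementation: the function name `pls1_nipals`, the mean-centring step and the synthetic data are assumptions made for illustration only.

```python
import numpy as np

def pls1_nipals(X, y, n_components):
    """Fit PLS1 with NIPALS; return (b, c) such that y_hat = X @ b + c."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float).ravel()
    x_mean, y_mean = X.mean(axis=0), y.mean()
    E, f = X - x_mean, y - y_mean          # centred X block and y residual
    W, P, q = [], [], []
    for _ in range(n_components):
        w = E.T @ f                        # weight vector (covariance direction)
        norm = np.linalg.norm(w)
        if norm < 1e-12:                   # residual no longer covaries with X
            break
        w = w / norm
        t = E @ w                          # score vector t_i
        tt = t @ t
        p = E.T @ t / tt                   # loading vector p_i
        q_i = (t @ f) / tt                 # regression of the y residual on t_i
        E = E - np.outer(t, p)             # deflate X
        f = f - q_i * t                    # deflate y
        W.append(w)
        P.append(p)
        q.append(q_i)
    if not W:                              # no component could be extracted
        return np.zeros(X.shape[1]), y_mean
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    b = W @ np.linalg.solve(P.T @ W, q)    # coefficient vector b^a (length p)
    c = y_mean - x_mean @ b                # offset c^a
    return b, c

# Hypothetical noiseless data: 20 samples, 5 predictors.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + 3.0
b, c = pls1_nipals(X, y, n_components=5)
y_hat = X @ b + c                          # Eq. (4) applied to every sample
```

With as many latent variables as predictors the fit coincides with ordinary least squares, which is why the noiseless example is reproduced exactly; in practice a smaller number of latent variables would be chosen by cross-validation.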
2.2. Uninformative variable elimination-partial least squares (UVE-PLS)
In a PLS model, some of the variables may be noisy and/or may not
contain information relevant to the prediction of y. Eliminating
these variables from the explanatory data can improve the model.
The UVE-PLS method proposed by Centner et al. (1996) is one of
the methods for eliminating uninformative variables.
In the UVE-PLS method, a PLS regression coefficient matrix
$b^{a} = [b_1^{a}, \ldots, b_p^{a}]$ with a latent variables is calculated through
a leave-one-out validation. Because each coefficient $b_k^{a}$ represents
the contribution of the corresponding variable to the
established model, the reliability of each variable (wavelength)
k can be quantitatively measured by its stability, defined as:

$$c_k = \frac{\mathrm{mean}(b_k^{a})}{\mathrm{std}(b_k^{a})}, \tag{5}$$

where $\mathrm{mean}(b_k^{a})$ and $\mathrm{std}(b_k^{a})$ are the mean and standard
deviation of the regression coefficients of variable k. To determine
the uninformative variables, UVE-PLS adds an equal number
of random or artificial predictors with very small
values (on the order of $10^{-10}$) to the original predictors. The
maximum absolute reliability value obtained from Eq. (5) for the
added artificial predictors, $C_{cutoff}$, is taken as the cut-off
value for the elimination of uninformative original
predictors. Only the original variables whose reliability
values exceed $C_{cutoff}$ in absolute value are retained.
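The selection rule can be sketched as follows. This is a simplified illustration rather than the procedure of Centner et al. (1996): for brevity it assumes a one-latent-variable PLS model, and the names `one_lv_coefficients`, `uve_select` and `noise_scale`, as well as the synthetic data, are hypothetical.

```python
import numpy as np

def one_lv_coefficients(X, y):
    """Regression coefficients of a one-latent-variable PLS1 model."""
    Xc, yc = X - X.mean(axis=0), y - y.mean()
    w = Xc.T @ yc
    w = w / np.linalg.norm(w)
    t = Xc @ w
    return w * (t @ yc) / (t @ t)

def uve_select(X, y, noise_scale=1e-10, seed=0):
    """Return (keep_mask, stability, cutoff) following Eq. (5)."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    # Append p artificial predictors of very small magnitude.
    Z = np.hstack([X, noise_scale * rng.standard_normal((n, p))])
    # Leave-one-out: one coefficient vector per left-out sample.
    B = np.array([
        one_lv_coefficients(np.delete(Z, i, axis=0), np.delete(y, i))
        for i in range(n)
    ])
    stability = B.mean(axis=0) / B.std(axis=0)   # c_k for all 2p variables
    cutoff = np.abs(stability[p:]).max()         # from artificial predictors
    keep = np.abs(stability[:p]) > cutoff        # retained original variables
    return keep, stability[:p], cutoff

# Hypothetical data: only the first two of six predictors carry information.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 6))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 0.1 * rng.normal(size=200)
keep, stability, cutoff = uve_select(X, y)
```

The two informative predictors are retained because their coefficients barely change when a single sample is left out, whereas the coefficients of the artificial predictors fluctuate much more relative to their mean.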
2.3. Iterative predictor weighting-partial least squares (IPW-PLS)
The IPW-PLS method originally developed by Forina et al.
(1999) aims at producing acceptable calibration models with
a small number of variables, by eliminating the useless or
redundant predictors in the PLS regression. The key
component of IPW-PLS is to multiply the variables by their
importance in a cyclic repetition of the PLS regression. The
importance of variable k is defined as:

$$z_k = \frac{|b_k^{a}|\, s_k}{\sum_{i=1}^{p} |b_i^{a}|\, s_i}, \tag{6}$$

where $s_k$ and $b_k^{a}$ are the standard deviation and PLS regression
coefficient of variable k, respectively, and p is the number
of variables.
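A minimal sketch of the cyclic reweighting is given below. It assumes a one-latent-variable PLS model and takes $s_k$ as the standard deviation of the currently rescaled variable; both choices, the names `one_lv_coefficients` and `ipw_importance`, and the synthetic data are illustrative assumptions, not details confirmed by the text.

```python
import numpy as np

def one_lv_coefficients(X, y):
    """Regression coefficients of a one-latent-variable PLS1 model."""
    Xc, yc = X - X.mean(axis=0), y - y.mean()
    w = Xc.T @ yc
    norm = np.linalg.norm(w)
    if norm < 1e-12:                       # no covariance left to model
        return np.zeros(X.shape[1])
    w = w / norm
    t = Xc @ w
    return w * (t @ yc) / (t @ t)

def ipw_importance(X, y, n_cycles=5):
    """Iterate Eq. (6): rescale predictors by importance, refit, repeat."""
    p = X.shape[1]
    z = np.full(p, 1.0 / p)                # start from equal importance
    for _ in range(n_cycles):
        Xz = X * z                         # predictors multiplied by importance
        b = one_lv_coefficients(Xz, y)
        raw = np.abs(b) * Xz.std(axis=0)   # |b_k| * s_k
        total = raw.sum()
        if total == 0.0:
            break
        z = raw / total                    # normalise so the weights sum to 1
    return z

# Hypothetical data: only the first of six predictors is informative.
rng = np.random.default_rng(2)
X = rng.normal(size=(100, 6))
y = 5.0 * X[:, 0] + 0.5 * rng.normal(size=100)
z = ipw_importance(X, y)
```

Because each cycle multiplies a variable by its previous importance, the weights of weakly related variables shrink quickly towards zero, which is what makes the scheme behave like a soft variable selection.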
2.4. Boosting-partial least squares (Boosting-PLS)
The basic idea of Boosting is to construct additive regression
models sequentially by fitting a basic learner to the current
residuals that are not fitted by the previous models; the
weighted predictions of the resulting collection of regression
models are then used as an ensemble prediction. Using PLS as the
basic/weak algorithm, one obtains the Boosting-PLS algorithm (Zhang
et al., 2005). Because of the nature of Boosting, Boosting-PLS
does not require the selection of an adequate number of
latent variables, in contrast to classical PLS. With a proper
shrinkage value and number of iterations,
Boosting-PLS has a prediction ability at least comparable to that of
classical PLS (Zhang et al., 2005).
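The residual-fitting idea can be sketched as below. This is an illustration under stated assumptions, not the algorithm of Zhang et al. (2005): it uses a one-latent-variable PLS weak learner and a fixed shrinkage value, and the names `one_lv_pls`, `boosting_pls`, `shrinkage` and the synthetic data are hypothetical.

```python
import numpy as np

def one_lv_pls(X, y):
    """One-latent-variable PLS1: return (b, c) with y_hat = X @ b + c."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    w = Xc.T @ yc
    norm = np.linalg.norm(w)
    if norm < 1e-12:                       # residual is orthogonal to X
        return np.zeros(X.shape[1]), y_mean
    w = w / norm
    t = Xc @ w
    b = w * (t @ yc) / (t @ t)
    return b, y_mean - x_mean @ b

def boosting_pls(X, y, n_iter=50, shrinkage=0.5):
    """Add shrunken one-component PLS models fitted to the current residual."""
    b_total, c_total = np.zeros(X.shape[1]), 0.0
    for _ in range(n_iter):
        residual = y - (X @ b_total + c_total)
        b, c = one_lv_pls(X, residual)     # weak learner on the residual
        b_total = b_total + shrinkage * b  # shrunken additive update
        c_total = c_total + shrinkage * c
    return b_total, c_total

def rmse(X, y, b, c):
    return float(np.sqrt(np.mean((y - (X @ b + c)) ** 2)))

# Hypothetical data: 60 samples, 8 predictors, small noise.
rng = np.random.default_rng(3)
X = rng.normal(size=(60, 8))
y = X @ rng.normal(size=8) + 0.1 * rng.normal(size=60)
rmse_1 = rmse(X, y, *boosting_pls(X, y, n_iter=1))
rmse_200 = rmse(X, y, *boosting_pls(X, y, n_iter=200))
```

Since every weak learner here is linear, the ensemble collapses into a single coefficient vector and offset; the training error can only decrease as iterations are added, which is why a separate rule for choosing the number of iterations is needed.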
2.5. Boosting-IPW-PLS
In the standard boosting procedures, the weak learners are
constructed from the X matrix (n × p, with p predictors for n
samples) and the current residuals (n × 1, for n samples) that
are not fitted by previous models. In the Boosting model
construction process, the weights of the samples are updated
according to the prediction errors of previous models, and are
utilized for later weak learner training. However, the
predictors in the X matrix are not weighted or modified. The
quality-irrelevant predictors, which are not useful for the estimation
of the quality indices, affect the construction of the
weak learners, so the accuracy and robustness
of the Boosting model may suffer. To address this, a robust
calibration modeling method, Boosting-IPW-PLS, is proposed
in the present study on the basis of predictor weighting and the
boosting framework, as shown in Fig. 1. The proposed algorithm
is derived from the SQUARELEV.R algorithm developed by Duffy
and Helmbold (2002). It integrates predictor weighting
into the boosting procedure to suppress the quality-irrelevant
variables by assigning them small weights. The algorithm is as follows.
Consider a data set {X, y} and a Boosting model of size M
(M weak learners), where X is an n × p process variable data
matrix with p predictor variables for n samples, and y is the
corresponding dependent variable vector of size n × 1. The
Boosting model $F_0$ is initialized as the zero function.
For m = 1, 2, …, M, repeat the following Steps 1–6.
Step 1 Calculate the residual of the Boosting model obtained in
the last iteration,

$$y_{res,m} = y - F_{m-1}(X). \tag{7}$$

Step 2 Perform predictor weighting on the original variables
with respect to X and $y_{res,m}$ using the IPW-PLS method,
and obtain the variable weight vector $w_m$ of size 1 × p
according to Eq. (6).
Step 3 Multiply the predictors of the original X by the variable
weights,

$$X_m = X \mathbin{.\times} W_m. \tag{8}$$

Here $W_m$ is an n × p matrix with each row equal to the variable
weight vector $w_m$, and the operator .× denotes
element-by-element multiplication.
Step 4 Construct a PLS weak learner $f_m$ on the weighted
predictors $X_m$ and the current residual $y_{res,m}$,

$$y_{res,m} = f_m(X_m) + E_m = X_m b_m + c_m + E_m, \tag{9}$$

where $b_m$ and $c_m$ are the corresponding PLS regression
coefficient vector and offset, and $E_m$ is the residual not fitted
by the current weak learner.
Step 5 Calculate the shrinkage value of the current weak
learner,

$$a_m = \frac{\left(y_{res,m} - \bar{y}_{res,m}\right)^{T}\left(f_m(X_m) - \bar{f}_m(X_m)\right)}{\left\| f_m(X_m) - \bar{f}_m(X_m) \right\|^{2}}. \tag{10}$$

Here, $\bar{y}_{res,m}$ and $\bar{f}_m(X_m)$ are the mean values of the current
residual and predictions, respectively.
Step 6 Update the Boosting model,

$$F_m = F_{m-1} + a_m f_m. \tag{11}$$

After M boosting cycles, M member PLS models have been
built and their corresponding shrinkage parameters
determined. The dependent variables $y_u$ (l × 1 vector) of
l unknown samples with measurement matrix $X_u$ (l × p
matrix) are predicted as:

$$\hat{y}_u = a_1 \hat{y}_{u,1} + a_2 \hat{y}_{u,2} + a_3 \hat{y}_{u,3} + \cdots + a_M \hat{y}_{u,M} = \sum_{m=1}^{M} a_m \left( (X_u \mathbin{.\times} U_m)\, b_m + c_m \right), \tag{12}$$

where $U_m$ is an l × p matrix with each row equal to the variable
weight vector $w_m$ determined in Step 2; $b_m$, $c_m$ and $a_m$ are
determined in Steps 4 and 5, respectively.
Fig. 1 – Schematic diagram of the Boosting-IPW-PLS method.
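Steps 1–6 and the prediction rule of Eq. (12) can be sketched as follows. This is a simplified illustration, not the authors' code: it assumes one latent variable for both the weighting and the weak learner, a fixed number of IPW cycles, and $s_k$ taken from the rescaled variables; all function names and the synthetic data are hypothetical.

```python
import numpy as np

def one_lv_pls(X, y):
    """One-latent-variable PLS1: return (b, c) with y_hat = X @ b + c."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    w = Xc.T @ yc
    norm = np.linalg.norm(w)
    if norm < 1e-12:                          # no covariance left to model
        return np.zeros(X.shape[1]), y_mean
    w = w / norm
    t = Xc @ w
    b = w * (t @ yc) / (t @ t)
    return b, y_mean - x_mean @ b

def ipw_weights(X, y, n_cycles=3):
    """Variable weight vector w_m from repeated use of Eq. (6)."""
    z = np.full(X.shape[1], 1.0 / X.shape[1])
    for _ in range(n_cycles):
        Xz = X * z
        b, _ = one_lv_pls(Xz, y)
        raw = np.abs(b) * Xz.std(axis=0)
        if raw.sum() == 0.0:
            break
        z = raw / raw.sum()
    return z

def fit_boosting_ipw_pls(X, y, n_learners):
    model, F = [], np.zeros(len(y))           # F_0 is the zero function
    for _ in range(n_learners):
        y_res = y - F                         # Step 1, Eq. (7)
        w = ipw_weights(X, y_res)             # Step 2, Eq. (6)
        Xm = X * w                            # Step 3, Eq. (8) via broadcasting
        b, c = one_lv_pls(Xm, y_res)          # Step 4, Eq. (9)
        f = Xm @ b + c
        fc = f - f.mean()
        denom = fc @ fc
        a = ((y_res - y_res.mean()) @ fc) / denom if denom > 0 else 0.0  # Step 5
        F = F + a * f                         # Step 6, Eq. (11)
        model.append((w, b, c, a))
    return model

def predict(model, Xu):
    """Eq. (12): weighted sum of the member predictions."""
    return sum(a * ((Xu * w) @ b + c) for w, b, c, a in model)

def rmse(model, X, y):
    return float(np.sqrt(np.mean((y - predict(model, X)) ** 2)))

# Hypothetical data: three informative predictors out of ten.
rng = np.random.default_rng(4)
X = rng.normal(size=(80, 10))
y = 2.0 * X[:, 0] - X[:, 1] + 0.5 * X[:, 2] + 0.05 * rng.normal(size=80)
model_1 = fit_boosting_ipw_pls(X, y, n_learners=1)
model_20 = fit_boosting_ipw_pls(X, y, n_learners=20)
```

In this simplified setting each weak learner is itself a least-squares fit along its single score with an offset, so the value computed from Eq. (10) comes out numerically equal to 1 on the training data; a differently constructed weak learner would generally give other values.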
Three questions now remain. The first two concern the numbers
of PLS components used in IPW-PLS for variable
weighting and in the PLS weak learner construction. The third
is when to stop adding models, that is, the size M. For
the first two questions, one may use cross-validation,
jackknife or other methods to determine the number of latent
variables, as discussed in the works of Wold et al. (1984), Geladi and
Kowalski (1986) or Hoskuldsson (1988). For simplicity, only
one PLS component was used in IPW-PLS for variable
weighting and in the PLS weak learner construction in the
present study. Since the X space and the residual space
change in every iteration, it is very important to select an
appropriate number of iterations, M, to avoid overfitting. A
stopping criterion is proposed here to determine when to
stop adding models, derived from the structural risk
minimization (SRM) principle (Vapnik, 2000) and Akaike’s information
criterion (AIC) (Akaike, 1974), as:
$$\min\; C_M = n\left[\ln\left(\frac{2\pi}{n}\left\|y - \hat{y}_M\right\|^{2}\right) + 1\right] + \sum_{m=1}^{M} a_m \left(\sum_{h=1}^{p} \beta_{m,h}\right)$$

$$\text{s.t.}\quad M \in \mathbb{N}^{+},\qquad \beta_{m,h} = \begin{cases} 1, & |b_{m,h}| > 10^{-6} \\ 0, & |b_{m,h}| \le 10^{-6} \end{cases},\quad h = 1, 2, \ldots, p;\; m = 1, 2, \ldots, M, \tag{13}$$

where $\hat{y}_M$ is the prediction of the response y using a Boosting
model of size M, $b_{m,h}$ and $a_m$ are the PLS regression coefficient
of variable h and the shrinkage value of weak learner m, and
$\beta_{m,h}$ indicates whether variable h is counted in the complexity
of weak learner m. The first term of
Eq. (13) represents the empirical error of the model on the
training set. The second term represents the complexity of the
Boosting model. For each weak learner, the model complexity
is taken as the number of variables whose
absolute PLS regression coefficients are larger than
a very small value, $10^{-6}$ in the present study.
Since the prediction of a weak learner contributes only
partially to the final prediction, as shown in Eq. (12), the
model complexity of that weak learner likewise
contributes only partially to the final Boosting model complexity.
Because of the nature of Boosting, the empirical error of the
Boosting model with respect to the training set decreases
as more weak learners are added, while the complexity
of the Boosting model increases accordingly. When the
decrease of the empirical error outweighs the increase of
the model complexity, $C_M$ decreases, and vice versa. As
a consequence, $C_M$ decreases in the first few iterations and
then increases as more weak learners are added. The
optimal tradeoff between the empirical error and the model
complexity therefore corresponds to the minimum $C_M$, and the
M value at which $C_M$ is minimal is
selected as the size of the Boosting model.
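The criterion of Eq. (13) can be evaluated with a few lines of code. The sketch below is only an illustration of the formula; the function names `stopping_criterion` and `select_model_size`, the argument layout and the numbers in the example are hypothetical.

```python
import math

def stopping_criterion(rss, n, shrinkages, coefficient_vectors, tol=1e-6):
    """C_M of Eq. (13) for a Boosting model with M = len(shrinkages) learners.

    rss is the squared norm of (y - y_hat_M) on the n training samples.
    """
    empirical = n * (math.log(2.0 * math.pi * rss / n) + 1.0)
    complexity = sum(
        a * sum(1 for b in coeffs if abs(b) > tol)   # counted variables
        for a, coeffs in zip(shrinkages, coefficient_vectors)
    )
    return empirical + complexity

def select_model_size(criteria):
    """Return the 1-based M whose C_M is smallest, given C_1, C_2, ..."""
    return min(range(len(criteria)), key=criteria.__getitem__) + 1

# Hypothetical two-learner model on n = 10 training samples.
n = 10
rss = n / (2.0 * math.pi)                  # chosen so the logarithm vanishes
c_value = stopping_criterion(
    rss, n,
    shrinkages=[1.0, 0.5],
    coefficient_vectors=[[0.2, 0.0, 1e-9], [0.3, -0.4, 0.0]],
)
best_m = select_model_size([5.0, 3.0, 4.0])
```

In the example the first learner contributes one counted variable with shrinkage 1.0 and the second contributes two counted variables with shrinkage 0.5, so the complexity term is 2 and the empirical term equals n.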
3. Experimental
3.1. Hardware setup
The experimental setup, including the EC–EF treatment
system and the water quality monitoring system, is shown
schematically in Fig. 2. The water quality monitoring system consists
of two parts: the UV/Vis spectrometer and the turbidimeter.
Both were operated in continuous/online measurement mode in the
present study. In particular, the wastewater was in motion, falling
through an open chamber as a jet of 10 mm in diameter
for UV/Vis spectrum acquisition. Since there is no contact
between the wastewater and the optical windows, fouling
of the light transmitting windows could be eliminated. The
monitoring performance of the proposed system could then
be improved, and the corresponding maintenance of the sensors
could be reduced to an acceptable level.
Fig. 2 – Schematic diagram of the EC–EF treatment system and the wastewater quality monitoring system.
The UV/Vis spectrometer covers a wavelength range from
200 to 800 nm with 1749 variables in the present study. The
UV/Vis spectrum data could be imported directly to the
computer, whereas the turbidity measurement needs to be acquired
by the data acquisition system. The measurement range of the
turbidimeter is 0–1000 NTU. As a consequence, two
types of information, the turbidity measurement and the UV/Vis
spectrum of the water sample, could be utilized for
water quality monitoring, as seen from Fig. 2. Taking the
turbidity measurement as one artificial variable at UV/Vis
wavelength 801 nm, the combination of the UV/Vis spectrum
and the turbidity measurement has 1750 variables, ranging
from 200 to 801 nm.
In order to introduce the wastewaters from the restaurant,
EC reactor and EF reactor to the water quality monitoring
system, a sampling system was installed. The wastewater
flow supplied to the water quality monitoring system was
determined by the operation of the switch valve. In the
present study, the switching time of the switch valve was
3 min, providing 2 min for flushing and 1 min for water quality
monitoring. Therefore, with the three sources sampled in turn,
the sampling interval for each of the effluents
from the restaurant, EC reactor and EF reactor was 9 min in
the present study. It is worth noting that this comparatively
long sampling interval is due to the switching operation of the
switch valve for the introduction of the three wastewater
sources to the monitoring system. If the monitoring system is
dedicated to one water source, real-time water quality
monitoring could be achieved because the response time of the
water quality monitoring system is only 1 s.
Table 2 – Typical wastewater samples.
              COD (mg/L)   TSS (mg/L)   O&G (mg/L)   Turbidity (NTU)
Restaurant    1370         193          213          164
EC reactor    782          224          81.7         120
EF reactor    497          53.6         14.5         24.1
3.2. Wastewater characteristics
In order to test the validity of the proposed water quality
monitoring system, a total of 163 samples were collected from
the Chinese restaurant and the pilot EC–EF wastewater
treatment plant on campus for COD, TSS and O&G measurements.
Due to sample handling problems during the O&G
analysis, 10 samples were discarded for the O&G measurements.
As a consequence, the numbers of samples for the COD, TSS
and O&G measurements are 163, 163 and 153, respectively.
These sample sets were then randomly separated into two
subsets, two-thirds for prediction model training and the
remaining one-third for testing. The concentrations of COD, TSS
and O&G and the sample numbers for model training and testing
are listed in Table 1. The pH, conductivity and turbidity of
these samples ranged from 4.65 to 9, 172 to 1750 μS/cm, and
1.68 to 254 NTU, respectively. Here, TSS, pH and conductivity
were examined by the standard methods (APHA et al., 2005).
Turbidity was measured using an HF Micro TOL turbidimeter (HF
scientific, Inc., USA). COD was measured using a COD reactor
and a direct reading spectrophotometer (DR/2000, Hach
Company, USA). O&G was examined by EPA method 1664A (US
EPA, 1999).
Table 2 lists the details of some typical wastewater
samples, with their corresponding UV/Vis spectra shown in
Fig. 3. As seen from Fig. 3, the UV/Vis spectra are very noisy
due to the strong interactions between the UV/Vis light and the
particles and oil and grease droplets, especially in the UV wavelength
range 200–250 nm and the near-infrared wavelength range
600–800 nm.
4. Results and discussions
4.1. Data pre-processing and sensor fusion
High concentrations of solids, oil and
grease affect the UV/Vis spectrum measurement
significantly, as discussed in Section 1 and demonstrated
in Section 3.2. The measurement of turbidity is a key
test of water quality. It is a measure of the degree to which the
water sample loses its transparency due to the presence of
suspended particulates, droplets of oil and grease, etc., which
scatter the light and prevent it from passing through. In this
respect it is essentially complementary to the UV/Vis spectrum.
Thus, two different kinds of instruments/sensors, a UV/Vis
spectrometer and a turbidimeter, were utilized and fused for
the construction of the water quality monitoring system in the
present study.
Table 1 – Characteristics of wastewater samples.
        Range (mg/L)   Number of samples
                       Training   Testing
COD     176–2550       108        55
TSS     9.7–410        108        55
O&G     0.93–525       102        51
As described in Section 3.1, the turbidity measurement has
only one variable while the UV/Vis spectrum has 1749 variables.
Consequently, one could perform the sensor fusion either
after or before the feature extraction of the UV/Vis spectrum.
In the present study, signal-level fusion was adopted
in order to utilize as much of the information in the UV/Vis
spectrum and turbidity signal as possible. Prior to the sensor fusion,
standard variate transformation was performed on the UV/Vis
spectra and turbidity signals separately. As a result, the
fused data have zero means and unit variances.
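The signal-level fusion can be sketched as below. The sketch assumes that the transformation amounts to column-wise scaling to zero mean and unit variance, and that the statistics of the training set are reused for the test set; both points, the function names `autoscale` and `fuse`, and the random data are assumptions for illustration.

```python
import numpy as np

def autoscale(train, test):
    """Scale columns to zero mean and unit variance using training statistics."""
    mean = train.mean(axis=0)
    std = train.std(axis=0, ddof=1)
    std = np.where(std == 0, 1.0, std)     # guard against constant columns
    return (train - mean) / std, (test - mean) / std

def fuse(spectra_train, turb_train, spectra_test, turb_test):
    """Append the scaled turbidity as one extra variable after the spectrum."""
    s_tr, s_te = autoscale(spectra_train, spectra_test)
    t_tr, t_te = autoscale(turb_train.reshape(-1, 1), turb_test.reshape(-1, 1))
    return np.hstack([s_tr, t_tr]), np.hstack([s_te, t_te])

# Hypothetical data with the dimensions described in Section 3.1.
rng = np.random.default_rng(5)
spectra_train = rng.normal(size=(30, 1749))
spectra_test = rng.normal(size=(15, 1749))
turb_train = rng.uniform(1.68, 254.0, size=30)
turb_test = rng.uniform(1.68, 254.0, size=15)
fused_train, fused_test = fuse(spectra_train, turb_train, spectra_test, turb_test)
```

The fused matrices have 1750 columns, with the turbidity occupying the last column, which corresponds to the artificial variable at 801 nm.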
4.2. Performance indices
To evaluate the performance of the proposed method/system,
the following indices are used.
Maximum prediction error:

$$\mathrm{MaxE} = \max\left(\hat{y}_j - y_j\right), \quad j = 1, \ldots, k. \tag{14}$$

Minimum prediction error:

$$\mathrm{MinE} = \min\left(\hat{y}_j - y_j\right), \quad j = 1, \ldots, k. \tag{15}$$

Root mean square error of prediction:

$$\mathrm{RMSEP} = \sqrt{\frac{1}{k}\sum_{j=1}^{k}\left(\hat{y}_j - y_j\right)^{2}}. \tag{16}$$
Fig. 3 – Typical UV/Vis spectra of the effluents from (1) restaurant, (2) EC reactor, and (3) EF reactor (absorbance, a.u., versus wavelength, nm).
Fig. 4 – The empirical error, model complexity and CM with various sizes of Boosting-IPW-PLS models for: (a) COD, (b) TSS, (c) O&G measurements (plotted against the number of Boosting-IPW-PLS iterations; optimal M values marked in the panels: (a) 199, (b) 183, (c) 283).
Correlation coefficient of the prediction values and analytical
values:

$$R = \frac{\sum_{j=1}^{k}\left(\hat{y}_j - \bar{\hat{y}}\right)\left(y_j - \bar{y}\right)}{\sqrt{\sum_{j=1}^{k}\left(\hat{y}_j - \bar{\hat{y}}\right)^{2}\sum_{j=1}^{k}\left(y_j - \bar{y}\right)^{2}}}, \tag{17}$$

where k is the number of samples of the test set, $\hat{y}_j$ is the
prediction of $y_j$ for sample j of the test set, and $\bar{\hat{y}}$ and $\bar{y}$ are the
mean values of the predictions and responses, respectively.
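The four indices can be computed directly from paired predictions and analytical values. The sketch below takes the prediction error as prediction minus analytical value; the function name `performance_indices` and the four-sample example are hypothetical.

```python
import math

def performance_indices(y_pred, y_true):
    """Return MaxE, MinE, RMSEP and R as defined in Eqs. (14)-(17)."""
    k = len(y_true)
    errors = [p - t for p, t in zip(y_pred, y_true)]
    mean_pred = sum(y_pred) / k
    mean_true = sum(y_true) / k
    cov = sum((p - mean_pred) * (t - mean_true) for p, t in zip(y_pred, y_true))
    var_pred = sum((p - mean_pred) ** 2 for p in y_pred)
    var_true = sum((t - mean_true) ** 2 for t in y_true)
    return {
        "MaxE": max(errors),
        "MinE": min(errors),
        "RMSEP": math.sqrt(sum(e * e for e in errors) / k),
        "R": cov / math.sqrt(var_pred * var_true),
    }

# Hypothetical test set of four samples.
indices = performance_indices([1.5, 2.0, 2.5, 5.0], [1.0, 2.0, 3.0, 4.0])
```

For these numbers the prediction errors are 0.5, 0, -0.5 and 1, so MaxE is 1 and MinE is -0.5.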
4.3. Boosting-IPW-PLS model building
4.3.1. Determination of the size of Boosting-IPW-PLS
Before obtaining the final prediction models for the different quality
parameters, one needs to determine the size of Boosting-IPW-PLS,
as described in Section 2.5. Fig. 4 shows the influence of
the ensemble size on the empirical error, model complexity,
and $C_M$ for the COD, TSS and O&G measurements, respectively. As
seen in Fig. 4, the empirical error gradually decreased as the
ensemble size increased. This was expected because in each
iteration the new weak learner focused on the residual of the
Boosting-IPW-PLS model obtained in the last iteration. However,
with too large an M, the Boosting model often becomes too
strongly tailored to the particularities of the training set and
the model’s generalization capability for new water
samples would be poorer. In addition, as the ensemble size
increased, more weak learners were added to the Boosting
model, so the complexity of the Boosting model
increased, as shown in Fig. 4. Because a prediction model
with too complex a structure is not preferable in practice, an
appropriate ensemble size of the Boosting model should be
selected.
The proposed stopping criterion $C_M$ first decreased and
then increased as the ensemble size increased for the
Boosting-IPW-PLS models of all three water quality
measurements, as shown in Fig. 4. This was
expected because the trend of $C_M$ is determined by the
empirical error and the model complexity, as described in Eq.
(13). $C_M$ would decrease if the decrease of the empirical error
was larger than the increase of the model complexity, and vice
versa. Therefore, the M value at which $C_M$ reached its minimum
was selected as the ensemble size of the Boosting model. As
a result, the sizes of the Boosting-IPW-PLS models for the COD,
TSS and O&G measurements were selected as 199, 183 and
283, respectively, as found in Fig. 4.
4.3.2. Variable weight evolution
In Boosting-IPW-PLS, the weight of each variable reflects the
correlation between the variable and the quality index. The higher the
weight, the more informative the variable. The evolution of the variable
weights with ensemble size reveals the contribution of each variable to
the final model.
Fig. 5 shows the variable weight evolution of the first eight weak
learners of the Boosting models for COD, TSS and O&G measurements
obtained in the previous section. Only a very small portion of the
variables was involved in the construction of each weak learner. This
may be attributed to the fact that only one latent variable was used in
the IPW-PLS algorithm for variable weighting in the present study. In
PLS, the most significant covariance information between the independent
space and the response space is expressed by the first several latent
variables. If only one latent variable is used in IPW-PLS, only the
variables contributing to the most significant covariance information
are assigned large weights, while the other variables may be assigned
zero weights. This is equivalent to variable selection/feature
extraction.
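Why a single latent variable drives most weights towards zero can be seen in a simplified sketch of iterative predictor weighting. It follows the general idea of Forina et al. (1999), in which the importance of predictor j is proportional to |b_j| s_j (regression coefficient times standard deviation) and the predictors are re-weighted by their importance in every cycle. The number of cycles, the synthetic data and the names `autoscale` and `ipw_one_lv` are assumptions made for illustration, and the details may differ from the IPW-PLS variant used in this study.

```python
import math
import random

def autoscale(X):
    """Center each column and scale it to unit standard deviation."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    stds = [math.sqrt(sum((row[j] - means[j]) ** 2 for row in X) / (n - 1))
            for j in range(p)]
    return [[(row[j] - means[j]) / stds[j] for j in range(p)] for row in X]

def ipw_one_lv(X, y, n_cycles=5):
    """Iteratively re-weight autoscaled predictors using a one-LV PLS model."""
    n, p = len(X), len(X[0])
    z = [1.0 / p] * p                    # start from equal weights
    for _ in range(n_cycles):
        # With one latent variable, the coefficient b_j is proportional to the
        # covariance between the weighted predictor z_j * x_j and y.
        b = [z[j] * sum(X[i][j] * y[i] for i in range(n)) for j in range(p)]
        # The weighted, autoscaled predictor j has standard deviation z_j.
        importance = [abs(b[j]) * z[j] for j in range(p)]
        total = sum(importance)
        z = [v / total for v in importance]   # normalize so the weights sum to 1
    return z

# Synthetic, hypothetical data: 30 predictors, only the first three informative.
random.seed(1)
n, p = 200, 30
X_raw = [[random.gauss(0, 1) for _ in range(p)] for _ in range(n)]
y_raw = [2.0 * r[0] + 1.5 * r[1] + 1.0 * r[2] + random.gauss(0, 0.5)
         for r in X_raw]

X = autoscale(X_raw)
y_mean = sum(y_raw) / n
y = [v - y_mean for v in y_raw]

weights = ipw_one_lv(X, y)
selected = [j for j, zj in enumerate(weights) if zj >= 1e-6]
```

Because every cycle multiplies a weight by itself and by the covariance with the response, the gap between strongly and weakly related predictors grows rapidly, and after a few cycles only the most informative predictors keep a non-negligible weight.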
It is quite interesting that, for all three Boosting models, the
turbidity measurement (at the wavelength of 801 nm) was assigned very
large weights in the construction of the first weak learner. This
indicates that the turbidity measurement is complementary to the UV/Vis
spectrum and informative for the prediction of the COD, TSS and O&G
concentrations of wastewater samples, as discussed in previous sections.

Fig. 5 – The variable weight evolution of the first eight weak learners of Boosting-IPW-PLS models for: (a) COD, (b) TSS, (c) O&G measurements.
[Each panel plots variable weight (0 to 1) against wavelength (200–800 nm) for weak learners 1 to 8.]

The frequency with which a variable is selected by the weak learners is
another important indicator for process analysis and model pruning. If a
variable has never been selected in the construction of any weak
learner, it may be quality-irrelevant and can be removed from the
original X space. Fig. 6 shows the frequency of the variables selected
by the weak learners of the Boosting models for COD, TSS and O&G
measurements. Here, whether or not a variable is selected by a weak
learner is determined by the weight assigned to it by IPW-PLS. If the
variable weight is less than a very small value, for example 10⁻⁶ in the
present study, the corresponding variable is considered
quality-irrelevant and is not selected by the weak learner. As seen from
Fig. 6, apart from the turbidity measurement (at the wavelength of
801 nm), only a very small portion of the variables in the UV region and
near-infrared region were selected in the construction of the prediction
models. This is consistent with the variable weight evolution of the
weak learners.

Fig. 6 – The frequency of the variables selected by the weak learners of the Boosting-IPW-PLS models for (a) COD, (b) TSS, (c) O&G measurements.
[Each panel plots selection frequency against wavelength (200–800 nm).]

Fig. 7 – Scatter plots of predicted versus analytical value for the test set using Boosting-IPW-PLS models for (a) COD, (b) TSS, (c) O&G measurements.
[Each panel plots predicted values (mg/L) against analytical values (mg/L); R = 0.945 (a), 0.965 (b), 0.945 (c).]

Table 3 – Summary of COD prediction using different methods.

| Method | Number of LVs | M | Number of variables | MaxE (mg/L) | MinE (mg/L) | RMSEP (mg/L) | R |
|---|---|---|---|---|---|---|---|
| PLS | 7 | – | 1750 | 372 | -347 | 176 | 0.903 |
| Boosting-PLS | 1 | 3013 | 1750 | 467 | -314 | 157 | 0.922 |
| UVE-PLS | 8 | – | 205 | 1020 | -413 | 216 | 0.915 |
| IPW-PLS | 7 | – | 22 | 372 | -347 | 176 | 0.903 |
| Boosting-IPW-PLS | 1 | 199 | 36 | 453 | -261 | 141 | 0.945 |

Table 4 – Summary of TSS prediction using different methods.

| Method | Number of LVs | M | Number of variables | MaxE (mg/L) | MinE (mg/L) | RMSEP (mg/L) | R |
|---|---|---|---|---|---|---|---|
| PLS | 12 | – | 1750 | 57.3 | -102 | 35.9 | 0.939 |
| Boosting-PLS | 1 | 4456 | 1750 | 53.9 | -119 | 37.5 | 0.94 |
| UVE-PLS | 4 | – | 279 | 54.3 | -108 | 34.7 | 0.96 |
| IPW-PLS | 12 | – | 45 | 57.3 | -102 | 35.9 | 0.939 |
| Boosting-IPW-PLS | 1 | 183 | 20 | 55.7 | -90.5 | 30.2 | 0.965 |

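The selection-frequency count behind Fig. 6 can be sketched in a few lines. The sketch assumes that the weights of all weak learners are available as rows of a table (one row per weak learner, one column per variable) and uses the 10⁻⁶ threshold quoted above; the toy weights and the function name `selection_frequency` are hypothetical.

```python
def selection_frequency(weight_rows, threshold=1e-6):
    """Count, for each variable, how many weak learners gave it weight >= threshold."""
    n_vars = len(weight_rows[0])
    counts = [0] * n_vars
    for weights in weight_rows:
        for j, wj in enumerate(weights):
            if wj >= threshold:          # weights below the threshold are not selected
                counts[j] += 1
    return counts

# Toy weights: 3 weak learners, 4 variables.
weight_rows = [
    [0.9, 0.1, 0.0, 0.0],
    [0.5, 0.0, 0.5, 1e-9],
    [1.0, 0.0, 0.0, 0.0],
]

counts = selection_frequency(weight_rows)
never_selected = [j for j, c in enumerate(counts) if c == 0]
print(counts)          # [3, 1, 1, 0]
print(never_selected)  # [3]
```

Variables listed in `never_selected` are the candidates for removal from the original X space when the model is pruned.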
4.3.3. Prediction performance of Boosting-IPW-PLS
With the variable weights and model size determined, the
Boosting-IPW-PLS models can be used to predict the water quality
indices, as shown in Fig. 7 and Tables 3–5. The predicted values fit the
analytical values well, with high correlation coefficients (0.945, 0.965
and 0.945) and small RMSEP values (141, 30.2 and 34 mg/L) for COD, TSS
and O&G measurements, respectively, revealing the effectiveness of the
proposed system.
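For reference, the four figures of merit reported in Tables 3–5 can be computed as in the sketch below. The definitions are not restated in this section, so the sketch assumes that the error is the predicted value minus the analytical value, that MaxE and MinE are the largest and smallest signed errors, and that R is the Pearson correlation coefficient; the sample values and the function name `prediction_metrics` are hypothetical.

```python
import math

def prediction_metrics(analytical, predicted):
    """Return MaxE, MinE, RMSEP and R for a test set (assumed definitions)."""
    n = len(analytical)
    errors = [p - a for a, p in zip(analytical, predicted)]   # signed errors
    rmsep = math.sqrt(sum(e * e for e in errors) / n)
    mean_a = sum(analytical) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(analytical, predicted))
    var_a = sum((a - mean_a) ** 2 for a in analytical)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return {
        "MaxE": max(errors),
        "MinE": min(errors),
        "RMSEP": rmsep,
        "R": cov / math.sqrt(var_a * var_p),
    }

# Hypothetical test-set values in mg/L.
analytical = [100.0, 200.0, 300.0, 400.0]
predicted = [110.0, 190.0, 310.0, 390.0]

metrics = prediction_metrics(analytical, predicted)
```

The signed-error convention is consistent with the tables, where MaxE is positive and MinE is negative for every method.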
4.4. Comparisons of different methods
To compare the performance of commonly used modeling methods, MaxE,
MinE, RMSEP and R for the test sets obtained by PLS, Boosting-PLS,
UVE-PLS, IPW-PLS and Boosting-IPW-PLS are summarized in Tables 3–5.
Among these methods, PLS is the most commonly used one, UVE-PLS and
IPW-PLS are known as variable selection/modeling methods, and
Boosting-PLS is known as an ensemble method. Here, the number of latent
variables of PLS, UVE-PLS and IPW-PLS was determined by 5-fold
cross-validation with respect to the root mean square error.
Boosting-PLS used only one latent variable, and the shrinkage value was
set to 0.9 as suggested by Zhang et al. (2005). The size of the
Boosting-PLS model was also determined by 5-fold cross-validation with
respect to the root mean square error, while the size of the
Boosting-IPW-PLS model was determined by the proposed stopping
criterion, as discussed in previous sections.
From the results listed in Tables 3–5, it is clear that the
Boosting-IPW-PLS method gives the best overall results in the
predictions of these three water quality indices, followed by UVE-PLS,
Boosting-PLS, PLS and IPW-PLS. Although UVE-PLS has comparable
prediction performance for TSS and O&G, its MaxE and MinE values in the
COD predictions are comparatively large. A possible reason is its
inability to handle outliers. The fact that Boosting-PLS requires
thousands of weak learners to build up the final Boosting model may be
attributed to the use of only one latent variable in the construction of
each PLS weak learner. With a single latent variable, each weak learner
extracts only a small portion of the quality-informative features from
the noisy fused data. As a consequence, more weak learners had to be
added to achieve acceptable prediction ability, resulting in a very
complex model structure, which is not preferable in practice. This is
one of the important reasons why IPW-PLS was incorporated into the
Boosting scheme: to suppress the quality-irrelevant variables and
further reduce the complexity of the model. It is quite interesting that
PLS and IPW-PLS gave almost the same prediction results for COD and TSS
concentrations, although there is a significant difference in the number
of variables involved in the modeling. This may be attributed to the
fact that both PLS and IPW-PLS used the same number of latent variables,
and the variables retained in the IPW-PLS models express the same amount
of quality-informative features as the PLS models. It should be noted
that when different information is expressed, the prediction ability of
the models differs, as shown in the O&G concentration predictions using
the PLS and IPW-PLS models.

Table 5 – Summary of O&G prediction using different methods.

| Method | Number of LVs | M | Number of variables | MaxE (mg/L) | MinE (mg/L) | RMSEP (mg/L) | R |
|---|---|---|---|---|---|---|---|
| PLS | 18 | – | 1750 | 95.3 | -80.2 | 35.9 | 0.936 |
| Boosting-PLS | 1 | 5356 | 1750 | 84 | -118 | 37.8 | 0.94 |
| UVE-PLS | 6 | – | 41 | 66.9 | -79.3 | 34.2 | 0.956 |
| IPW-PLS | 25 | – | 71 | 98.3 | -92.2 | 38.3 | 0.928 |
| Boosting-IPW-PLS | 1 | 283 | 27 | 85.1 | -90.3 | 34 | 0.945 |

5. Conclusions
A multi-sensor water quality monitoring system incorporating a UV/Vis
spectrometer and a turbidimeter was first proposed and tested to monitor
COD, TSS and O&G online. The COD, TSS and O&G prediction models were
constructed by the Boosting-IPW-PLS method that was first developed in
the present study. The sizes of the COD, TSS and O&G prediction models
were 199, 183 and 283, respectively, with only one PLS component used in
IPW-PLS for variable weighting and in the construction of the PLS weak
learners. The monitoring system was tested in the field. Experimental
results showed that the predicted values fit the analytical values well,
with high correlation coefficients (0.945, 0.965 and 0.945) and small
RMSEP values (141, 30.2 and 34 mg/L) for COD, TSS and O&G measurements,
respectively, revealing the effectiveness of the proposed system.
References
Akaike, H., 1974. A new look at the statistical model identification. IEEE Transactions on Automatic Control 19, 716–723.
American Public Health Association (APHA), American Water Works Association, Water Environment Federation, 2005. Standard Methods for the Examination of Water and Wastewater, 21st ed. American Public Health Association, New York.
Bookman, S.K.E., 1997. Estimation of biochemical oxygen demand in slurry and effluent using ultraviolet spectrophotometry. Water Research 31, 372–374.
Centner, V., Massart, D.L., de Noord, O.E., de Jong, S., Vandeginste, B.M., Sterna, C., 1996. Elimination of uninformative variables for multivariate calibration. Analytical Chemistry 68, 3851–3858.
Chambers, B., Jones, G., 1988. Optimization and uprating of activated sludge plants by efficient process design. Water Science and Technology 20, 121–132.
Chen, D., Hu, B., Shao, X.G., Su, Q.D., 2004. Variable selection by modified IPW (iterative predictor weighting)-PLS (partial least squares) in continuous wavelet regression models. Analyst 129, 664–669.
Chen, G., Chen, X., Yue, P.L., 2000. Electrocoagulation and electroflotation of restaurant wastewater. Journal of Environmental Engineering 126, 858–863.
Drainage Services Department of the Government of the Hong Kong Special Administrative Region (DSD-HKSAR), 2011. Sewage Service Charging Scheme [Online]. Available at: http://www.dsd.gov.hk/EN/Sewage_Services_Charging_Scheme/index.html (accessed 10.09.11).
Drucker, H., 1997. Improving regressors using boosting techniques. In: Fisher, D.H. (Ed.), Proceedings of the 14th International Conferences on Machine Learning. Morgan Kaufmann, San Mateo, CA, pp. 107–115.
Duffy, N., Helmbold, D., 2002. Boosting methods for regression. Machine Learning 47, 153–200.
Esteban, J., Starr, A., Willetts, R., Hannah, P., Bryanston-Cross, P., 2005. A review of data fusion models and architectures: towards engineering guidelines. Neural Computing & Applications 14, 273–281.
Fogelman, S., Blumenstein, M., Zhao, H., 2006. Estimation of chemical oxygen demand by ultraviolet spectroscopic profiling and artificial neural networks. Neural Computing & Applications 15, 197–203.
Forina, M., Casolino, C., Millan, C.P., 1999. Iterative predictor weighting (IPW) PLS: a technique for the elimination of useless predictors in regression problems. Journal of Chemometrics 13, 165–184.
Freund, Y., Schapire, R.E., 1997. A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences 55, 119–139.
Friedman, J., 2001. Greedy function approximation: a gradient boosting machine. Annals of Statistics 29, 1189–1232.
Friedman, J., Hastie, T., Tibshirani, R., 2000. Additive logistic regression: a statistical view of boosting. Annals of Statistics 28, 337–374.
Geladi, P., Kowalski, B., 1986. Partial least-squares regression: a tutorial. Analytica Chimica Acta 185, 1–17.
Goicoechea, H.C., Olivieri, A.C., 2002. Wavelength selection for multivariate calibration using a genetic algorithm: a novel initialization strategy. Journal of Chemical Information and Modeling 42, 1146–1153.
Hoskuldsson, A., 1988. PLS regression methods. Journal of Chemometrics 2, 211–228.
Klein, L.A., 2004. Sensor and Data Fusion: A Tool for Information Assessment and Decision Making. SPIE Press, Bellingham, Wash.
Lourenco, N.D., Chaves, C.L., Novais, J.M., Menezes, J.C., Pinheiro, H.M., Diniz, D., 2006. UV spectra analysis for water quality monitoring in a fuel park wastewater treatment plant. Chemosphere 65, 786–791.
Lu, X., Jiang, J.H., Wu, H.L., Shen, G.L., Yu, R.Q., 2007. Variable-weighted PLS. Chemometrics and Intelligent Laboratory Systems 85, 140–143.
Lutz, R.W., Kalisch, M., Buhlmann, P., 2008. Robustified L2 boosting. Computational Statistics and Data Analysis 52, 3331–3341.
Olsson, G., Nielsen, M.K., Yuan, Z., Lynggaard-Jensen, A., Steyer, J.-P., 2005. Instrumentation, Control and Automation in Wastewater Systems. IWA Publishing, London.
Pouet, M.-F., Thomas, O., Jacobsen, B.N., Lynggaard-Jensen, A., Quevauviller, P., 1999. Conclusions of the workshop on methodologies for wastewater quality monitoring. Talanta 50, 759–762.
Ridgeway, G., 1999. The state of boosting. Computing Science and Statistics 31, 172–181.
Roig, B., Chalmin, E., Touraud, E., Thomas, O., 2002. Spectroscopic study of dissolved organic sulfur (DOS): a case study of mercaptans. Talanta 56, 585–590.
Russell, S., Marshallsay, D., MacCraith, B., Devisscher, M., 2003. Non-contact measurement of wastewater polluting load – the Laodmon project. Water Science and Technology 47, 79–86.
Shao, X., Bian, X., Cai, W., 2010. An improved boosting partial least squares method for near-infrared spectroscopic quantitative analysis. Analytica Chimica Acta 666, 32–37.
Swierenga, H., Wulfert, E., de Noord, O.E., de Weijer, A.P., Smilde, A.K., Buydens, L.M.C., 2000. Development of robust calibration models in near infrared spectrometric applications. Analytica Chimica Acta 411, 121–135.
Tan, C., Wang, J., Wu, T., Qin, X., Li, M., 2010. Determination of nicotine in tobacco samples by near-infrared spectroscopy and boosting partial least squares. Vibrational Spectroscopy 54, 35–41.
Thomas, O., Constant, D., 2004. Trends in optical monitoring. Water Science and Technology 49, 1–8.
US Environmental Protection Agency (EPA), 1999. Method 1664, Revision A: N-Hexane Extractable Material (HEM; Oil and Grease) and Silica Gel Treated N-Hexane Extractable Material (SGT-HEM; Non-Polar Material) by Extraction and Gravimetry [Online]. Available at: http://water.epa.gov/scitech/methods/cwa/oil/upload/2007_07_10_methods_method_oil_1664.pdf (accessed 04.06.11).
Vapnik, V.N., 2000. The Nature of Statistical Learning Theory, second ed. Springer, New York.
Varshney, P.K., 1997. Multisensor data fusion. Electronics & Communication Engineering Journal, 245–253.
Wold, S., Ruhe, A., Wold, H., Dunn III, W., 1984. The collinearity problem in linear regression. The partial least squares (PLS) approach to generalized inverses. SIAM Journal on Scientific and Statistical Computing 5, 735–743.
Wu, X.L., 2007. An applied research on information fusion and ensemble learning for spectral analysis of water quality. Ph.D. thesis, Zhejiang University, Hangzhou, China.
Wu, X.L., Li, Y.J., Wu, T.J., 2006. A boosting-partial least squared method for ultraviolet spectroscopic analysis of water quality. Chinese Journal of Analytical Chemistry 8, 1091–1095.
Wu, X.L., Li, Y.J., Wu, T.J., 2007. Application of multi-spectral information fusion for water quality analysis. Chinese Journal of Analytical Chemistry 12, 1716–1720.
Zemel, R., Pitassi, T., 2001. A gradient-based boosting algorithm for regression problems. In: Leen, T.K., Dietterich, T.G., Tresp, V. (Eds.), 2001. Advances in Neural Information Processing Systems, vol. 13. MIT Press, Cambridge, MA.
Zhang, M.H., Xu, Q.S., Massart, D.L., 2005. Boosting partial least squares. Analytical Chemistry 77, 1423–1431.
Zhou, Y.P., Cai, C.B., Huan, S., Jiang, J.H., Wu, H.L., Shen, G.L., Yu, R.Q., 2007. QSAR study of angiotensin II antagonists using robust boosting partial least squares regression. Analytica Chimica Acta 593, 68–74.