Modelling the distribution of solar spectral irradiance using data mining techniques



Environmental Modelling & Software 53 (2014) 163–172

Contents lists available at ScienceDirect

Environmental Modelling & Software

journal homepage: www.elsevier.com/locate/envsoft

Modelling the distribution of solar spectral irradiance using data mining techniques

Rafael Moreno-Sáez a,1, Llanos Mora-López b,*

a Departamento de Física Aplicada II, E.U. Politécnica, Universidad de Málaga, Campus de Teatinos, 29071 Malaga, Spain
b Departamento de Lenguajes y Ciencias de la Computación, E.T.S.I. Informática, Universidad de Málaga, Campus de Teatinos, 29071 Malaga, Spain

Article info

Article history:
Received 30 May 2013
Received in revised form 27 November 2013
Accepted 9 December 2013
Available online 27 December 2013

Keywords:
Solar spectral distribution
K-means clustering
Data mining techniques
Statistical techniques
Average photon energy

* Corresponding author. Tel.: +34 952132802.
E-mail addresses: [email protected] (R. Moreno-Sáez), llanos@lcc.uma.es (L. Mora-López).
1 Tel.: +34 951952299.

1364-8152/$ – see front matter © 2013 Elsevier Ltd. All rights reserved.
http://dx.doi.org/10.1016/j.envsoft.2013.12.002

Abstract

A procedure for modelling the distribution of solar spectral irradiance is proposed. It uses both statistical and data mining techniques. As a result, it is possible to simulate the solar spectral irradiance distribution using some astronomical parameters and the meteorological parameters solar irradiance, temperature and humidity. With these parameters, the average photon energy and the normalization factor, which characterise the solar spectra, are estimated. First, the Kolmogorov–Smirnov two-sample test is used to analyse and compare all measured spectra. The k-means data mining technique is subsequently used to cluster all measurements. We found that three clusters are enough to characterise all observed spectra. Finally, an artificial neural network and a multivariate linear regression are estimated to simulate the solar spectral distribution matching certain meteorological parameters. The results obtained show that over 99.98% of the cumulative probability distribution functions of measured spectra are the same as the simulated ones.

© 2013 Elsevier Ltd. All rights reserved.

1. Introduction

Classical methods that are used to characterise the solar spectral irradiance distributions are based on physically modelling the atmospheric processes, and they require the use of several environmental factors that account for atmospheric conditions. Basically, two different physical methods have been proposed: atmospheric transmittance methods and radiative transfer methods. The first methods are simpler: the atmosphere is modelled using a one-layer medium in which scattering and absorption processes attenuate the extraterrestrial solar radiation (Leckner, 1978; Bird et al., 1982; de La Casinier et al., 1997); examples of such methods are SPCTRAL2 (see Bird and Riordan, 1986) and SMARTS2 (see Gueymard, 2001); for the respective applications of these methods see Jacovides et al. (2004) and Kaskaoutis and Kambezidis (2008). The second methods are the radiative transfer methods, which are more rigorous: they use several scattering and absorbing layers to account for the vertical atmospheric inhomogeneity (see Liou, 1980; Stamnes et al., 1988); examples of such methods are the MODTRAN models (or their predecessor LOWTRAN) (see Anderson et al., 1993). There are also simpler models that include the effects of atmospheric components, such as gaseous pollutants and aerosols (see Jacovides et al., 2000), and there are methods that propose the characterisation of the solar spectral irradiance using global and diffuse irradiance measurements and account for the modification of the spectrum under different atmospheric conditions (see Kaskaoutis et al., 2006).

In general, the goal of these methods is to obtain the best representation of the atmosphere using local geographic coordinates, several types of atmospheric measurements and different aerosol models, which use the aerosol optical thickness (usually corresponding to 500 nm) and the Angstrom turbidity coefficient as inputs (see Utrillas et al., 1998). These last two parameters are estimated from the global, direct and diffuse integrated irradiance values. However, these methods are not appropriate for some engineering applications because of the detailed inputs and software that they require. Fields in which information about the solar spectral irradiance distribution is useful, and which lack such detailed measurements, include illumination engineering and solar thermal and photovoltaic applications; for example, in the photovoltaics field, new materials that are used for solar photovoltaic modules have a performance that depends on the spectral distribution of solar radiation (see, for example, Martín and Ruíz, 1999; Minemoto et al., 2009; Gottschalg et al., 2003; Piliougine et al., 2011; Myers, 2012). For this reason, several studies have been conducted in this field to tackle the problem of knowing the solar spectral irradiation distribution. Statistical and data mining methods that do not require a detailed description of the atmospheric composition, but use only the most typical atmospheric measurements that are available, such as global solar irradiance, temperature and humidity, have been used. The results obtained so far reveal some aspects of the statistical relationship between the spectral distribution of solar radiation and different meteorological variables. Most of the proposed statistical methods are based on the use of one parameter that characterises, and hence describes, the solar spectral distribution: Fabero and Chenlo (1991) use the Spectral Factor, Poissant et al. (2003) propose the Mismatch Factor and Williams et al. (2003) use the Average Photon Energy (APE).

All of these studies use only classical statistical methods to address the analysis and characterisation of the solar spectrum. Recently, other approaches that are based on the use of data mining models have been proposed; for example, some of these models are used in Moreno-Sáez et al. (2013) for characterising the solar spectral irradiance distribution by using a few meteorological parameters. Data mining techniques are widely applied in different areas, especially when it is necessary to address a large amount of data; these techniques have proved to be very useful. They allow us to process data and to identify models and patterns. A combination of different techniques to characterise environmental parameters has also been proposed in previous papers, such as in modelling ground-level ozone using principal component and multiple regression analysis (Abdul-Wahab et al., 2004) and using principal component regression and artificial neural networks (Al-Alawi et al., 2006).

This study seeks to obtain a model that allows us to establish the solar spectral irradiance distribution by using only meteorological parameters, which are usually available for different applications. For developing the model, we first characterised the measured spectra: we analysed in depth the relationship between the average photon energy (APE) parameter and the solar spectral irradiance distribution, and we also analysed how many different solar spectral irradiance distributions can be found in all of the measured spectral curves and how these different curves can be related, by means of the APE, to some meteorological parameters. For performing these analyses, we have used both statistical and data mining techniques, as were proposed in Moreno-Sáez et al. (2013). Once we found a one-to-one (biunivocal) relationship between the APE and the solar spectral irradiance distribution curves, we analysed how to simulate these curves using frequently available meteorological parameters. For this purpose, we have determined the relationships between these meteorological parameters and the parameters that characterise the spectra, namely the APE value and the total energy received in the range of the wavelengths used.

The following section describes the materials and methods that we propose to use for characterising and simulating solar spectral irradiance values. The third section describes the proposed methodology. The fourth section describes the data that was used for this paper. The obtained results are presented in the fifth section. Finally, the conclusions summarise the most relevant results that were obtained in this study.

2. Materials and methods

In this section, we describe the parameters and methods that we propose to use for characterising the solar irradiance spectral curves and for simulating these curves once we have found the significant parameters that characterise the spectra and their relationships with the available meteorological parameters. We propose to use a hybrid approach that is based on the use of both statistical and data mining methods. In a previous study we already used two of the techniques that we propose now: the Kolmogorov–Smirnov two-sample test and the k-means clustering method. The results obtained in that previous work are similar to some of the results that we present here, but they are focused on how the solar spectrum affects the performance of thin-film photovoltaic modules (Moreno-Sáez et al., 2013).

2.1. Average photon energy

We propose to use the average photon energy (APE) parameter for characterising the solar spectral distribution. The APE value is an index that was proposed by Gottschalg's group (Loughborough University) (Williams et al., 2003). APE is defined as the average energy per photon included in the spectrum (Williams et al., 2003); it is calculated by dividing the integrated irradiance by the integrated photon flux density, according to the following expression:

$$APE = \frac{\int_a^b E(\lambda)\,d\lambda}{q \int_a^b \Phi(\lambda)\,d\lambda} \;\; (\mathrm{eV}) \qquad (1)$$

where $E(\lambda)$ is the spectral irradiance at wavelength $\lambda$, $\Phi(\lambda)$ is the photon flux density at wavelength $\lambda$, $q$ is the electron charge, and $a$ and $b$ are the considered wavelength boundaries. The APE value for the standard spectrum AM 1.5 is 1.88 eV in the wavelength range that we are working in, which is 350–1050 nm.
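As an illustration of Eq. (1), the following minimal sketch computes the APE of a sampled spectrum. It assumes that the spectral irradiance is given in W m^-2 nm^-1 on a wavelength grid in nm, that the photon flux density is obtained as $\Phi(\lambda) = E(\lambda)\lambda/(hc)$, and that the integrals are approximated with the trapezoidal rule; the function names are hypothetical and are not taken from the original work.

```python
# Physical constants (exact SI values).
PLANCK = 6.62607015e-34      # J s
LIGHT_SPEED = 299792458.0    # m / s
CHARGE = 1.602176634e-19     # C, the q of Eq. (1)


def trapezoid(xs, ys):
    """Trapezoidal-rule integral of the samples ys taken at positions xs."""
    return sum(0.5 * (ys[i] + ys[i + 1]) * (xs[i + 1] - xs[i])
               for i in range(len(xs) - 1))


def average_photon_energy(wavelengths_nm, irradiance):
    """APE in eV (assumed units: wavelengths in nm, irradiance in W m^-2 nm^-1)."""
    # Photon flux density: irradiance divided by the photon energy h*c/lambda.
    flux = [e * (lam * 1e-9) / (PLANCK * LIGHT_SPEED)
            for lam, e in zip(wavelengths_nm, irradiance)]
    energy = trapezoid(wavelengths_nm, irradiance)   # numerator of Eq. (1)
    photons = trapezoid(wavelengths_nm, flux)        # integral in the denominator
    return energy / (CHARGE * photons)


# Illustrative input only: a flat spectrum over 350-1050 nm, not measured data.
grid = [350.0, 700.0, 1050.0]
print(round(average_photon_energy(grid, [1.0, 1.0, 1.0]), 4))
```

Because the same wavelength grid is used in the numerator and the denominator, the choice of nm as the integration unit cancels out.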

Minemoto et al. (2009) proved that the APE value can be used to statistically characterise the spectral irradiance using the methodology adopted by the International Electrotechnical Commission (IEC, 2007). They compare only spectra that have similar APE values by using the mean and standard deviation and conclude that an APE value uniquely yields the shape of a solar spectrum, although they remark that the uniqueness of APE was verified only for spectra that were collected at the measurement location. This uniqueness means that, with the APE value, it is possible to know the shape of the solar spectral irradiance; in other words, the relative irradiance that corresponds to each wavelength is determined by the value of the APE. We propose to use this index as one of the variables that explain the solar spectral irradiance distribution. To analyse the relationship between this variable and the different distributions recorded in Malaga, we have used both statistical and data mining techniques, which are described in the following sections.

2.2. Statistical test for comparing solar spectral distributions

We propose using a statistical test that addresses all of the measured data in the spectral curves and not only wavelength bands of 50 or 100 nm, as in previous studies (see, for example, Minemoto et al., 2009). We have implemented the classical Kolmogorov–Smirnov two-sample test to analyse the different solar spectral distributions measured, as explained in Mora and Mora-López (2010). We propose to check whether two solar spectral irradiance distributions are the same. We are given the following:

$$\{X_{\lambda_i}\}_{\lambda_1 = 350}^{\lambda_n = 1050} \quad \text{and} \quad \{Y_{\lambda_i}\}_{\lambda_1 = 350}^{\lambda_n = 1050}$$

which are the solar spectral irradiance values for the different wavelengths $\lambda_i$ of both measurements. Denote the probability distribution function (p.d.f.) of $X$ as $f_X(\cdot)$, which is estimated by using:

$$\hat{f}_X(\lambda_j) \equiv \frac{E_{\lambda_j}}{E_t} \qquad (2)$$

where:

$$E_t = \sum_{\lambda_i = 350}^{1050} E_{\lambda_i} \qquad (3)$$

Denote the cumulative probability distribution function (c.p.d.f.) of $X$ as $F_X(\cdot)$ and the c.p.d.f. of $Y$ as $F_Y(\cdot)$. Both $F_X(\cdot)$ and $F_Y(\cdot)$ are assumed to be continuous. Then, the null hypothesis that we propose is expressed as:

$$H_0 : F_X(\cdot) = F_Y(\cdot), \qquad (4)$$

versus the general alternative hypothesis

$$H_a : F_X(\cdot) \neq F_Y(\cdot), \qquad (5)$$

which makes no parametric assumption about the shape of these c.p.d.f.'s. This procedure is known as the "test of homogeneity between two samples". If the sample sizes $n$ and $m$ are sufficiently large, then this test can be performed by using the Kolmogorov–Smirnov statistic, which compares the empirical c.p.d.f.'s obtained with each sample (measurement). The sample sizes for this experiment are the number of solar spectral irradiance values measured (one for each wavelength recorded by the measurement equipment).

Specifically, for a real number $\lambda_j$ in the range [350.0, 1050.0] (which corresponds to one of the wavelengths measured), we define:

$$\hat{F}_X(\lambda_j) \equiv \sum_{\lambda_i = 350}^{\lambda_j} \frac{E^{1}_{\lambda_i}}{E_t} \quad \text{and} \quad \hat{F}_Y(\lambda_j) \equiv \sum_{\lambda_i = 350}^{\lambda_j} \frac{E^{2}_{\lambda_i}}{E_t} \qquad (6)$$

Then, the Kolmogorov–Smirnov statistic is:

$$D_{X,Y} \equiv \left(\frac{nm}{n+m}\right)^{1/2} \sup_{\lambda_j} \left| \hat{F}_X(\lambda_j) - \hat{F}_Y(\lambda_j) \right|. \qquad (7)$$

The null hypothesis is rejected with a significance level of $\alpha$ if $D_{X,Y} > c_\alpha$, where $c_\alpha$ is a critical value that depends only on $\alpha$ (for details, see, e.g., Rohatgi and Saleh, 2001).
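The test described by Eqs. (2) to (7) can be sketched as follows. This is a minimal illustration, assuming that both spectra are sampled on the same wavelength grid (so that $n = m$) and that each c.p.d.f. is normalised by the total of its own spectrum. The critical value 1.358 used in the helper is the usual asymptotic value for $\alpha = 0.05$; the significance level is an assumption made here, and the function names are hypothetical.

```python
import math
from itertools import accumulate


def cumulative_distribution(irradiance):
    """Empirical c.p.d.f. of a spectrum: running sum divided by the total (Eqs. (3), (6))."""
    total = sum(irradiance)
    return [partial / total for partial in accumulate(irradiance)]


def ks_statistic(spectrum_x, spectrum_y):
    """Kolmogorov-Smirnov two-sample statistic of Eq. (7) for spectra on a common grid."""
    if len(spectrum_x) != len(spectrum_y):
        raise ValueError("both spectra must be sampled on the same wavelength grid")
    fx = cumulative_distribution(spectrum_x)
    fy = cumulative_distribution(spectrum_y)
    n, m = len(fx), len(fy)
    largest_gap = max(abs(a - b) for a, b in zip(fx, fy))
    return math.sqrt(n * m / (n + m)) * largest_gap


def rejects_homogeneity(statistic, critical_value=1.358):
    """True if H0 is rejected; 1.358 is the assumed critical value for alpha = 0.05."""
    return statistic > critical_value


# Two spectra with the same shape but different magnitude give a statistic of zero.
print(ks_statistic([1.0, 2.0, 3.0, 2.0], [2.0, 4.0, 6.0, 4.0]))
```

Since only normalised cumulative sums are compared, the test is sensitive to the shape of the spectra and not to their absolute level.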

2.3. K-means for clustering the solar spectral distributions and selecting representative curves

The aim of clustering is to partition a data set (observations) into groups in such a way that one observation is more similar to the others of its cluster than to observations in other clusters, according to some objective function that defines similarity or dissimilarity among objects (Han and Kamber, 2001). Clustering techniques can be classified into two groups, hierarchical and partitional (Jain et al., 1999). Each observation is composed of m variables (m-dimensional space). Many different research areas have used clustering techniques, such as text mining, statistical learning and pattern recognition (Jain et al., 1999; Duda et al., 2001; Hastie et al., 2001).

We propose using the partitional data mining technique known as k-means clustering to decide how many different c.p.d.f.'s of spectra there are. K-means is the most widely used partitional clustering algorithm (Emre Celebi et al., 2013). It is based on analysing one or more attributes (variables) to identify a cluster of correlating results. Using these attributes, it is possible to gather observations that have some similarity. This partition minimises the sum, over all of the clusters, of the within-cluster sums of point-to-cluster-centroid distances. To measure the similarity between each observation and the centroid of each cluster, the distance from the sample to its cluster is used because all of the variables are numerical. We have used squared Euclidean distances, that is, the case $p = 2$ of the Minkowski metric defined in Jain et al. (1999):

$$d_p(X_i, X_j) = \left( \sum_{k=1}^{d} \left| x_{i,k} - x_{j,k} \right|^p \right)^{1/p} = \left\| X_i - X_j \right\|_p \qquad (8)$$

where $X_i$ and $X_j$ are the vectors of variables that characterise observations $i$ and $j$, respectively, and $d$ is the number of variables; in our study, the variables are the c.p.d.f. and APE values for each measurement.

Algorithm 1 is used for clustering the solar spectral irradiance distributions.

Algorithm 1. k-means algorithm.
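For orientation, the standard k-means iteration (Lloyd's algorithm) with squared Euclidean distances can be sketched as below. This is a generic textbook version, not necessarily identical to Algorithm 1; the random initialisation from the observations, the iteration limit and the function names are assumptions of the sketch.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two observations (Eq. (8) with p = 2, squared)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(points, k, iterations=100, seed=0):
    """Return (centroids, labels) for a list of equally long numeric vectors."""
    rng = random.Random(seed)
    # Assumed initialisation: k distinct observations chosen at random.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iterations):
        # Assignment step: each observation goes to its nearest centroid.
        new_labels = [min(range(k), key=lambda c: squared_distance(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments are stable, so the centroids no longer change
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(column) / len(members) for column in zip(*members)]
    return centroids, labels


# Toy one-dimensional data with two well separated groups (illustrative only).
centroids, labels = k_means([[0.0], [0.1], [0.2], [10.0], [10.1], [10.2]], k=2)
print(labels)
```

In the setting of this paper each observation would be a vector holding the c.p.d.f. values and the APE of one measurement, and the centroid of a cluster plays the role of its representative curve.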

2.4. Artificial neural network for estimating the APE from available meteorological parameters

The equipment that is necessary for measuring the solar spectral irradiance values, and then for obtaining the APE values, is expensive and is rarely available in many applications in which these measurements could be very useful. For this reason, we have analysed the relationships between the most typically available variables and the APE value. On the one hand, we propose using some meteorological variables that are usually recorded, such as the solar irradiance, temperature and humidity. On the other hand, we propose using variables that are related to the incidence angle of solar radiation, specifically the air mass factor and the solar extraterrestrial irradiance; the latter is used for estimating the value of the clearness index.

Solar extraterrestrial irradiance is defined as the instantaneous energy that is received on the outside of the atmosphere, per unit area. At every moment, the extraterrestrial solar radiation depends on the Earth–Sun distance, the declination, the latitude of the place and the hour angle considered. The solar extraterrestrial irradiance received on a horizontal surface is obtained by using the expression (Iqbal, 1983):

$$G_0 = I_{sc} E_0 \cos\theta_z = I_{sc} E_0 \left( \sin\delta \sin\phi + \cos\delta \cos\phi \cos\omega \right) \;\; (\mathrm{W\,m^{-2}}) \qquad (9)$$

where $I_{sc}$ is the solar constant, $E_0$ is the eccentricity factor, $\theta_z$ is the solar zenith angle, $\delta$ is the declination angle, $\phi$ is the latitude and $\omega$ is the hour angle. With this value, the clearness index is estimated by using the expression:

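$$K_t = \frac{G_t}{G_0} \qquad (10)$$

where $G_t$ is the solar global irradiance.

The air mass factor is the optical path length of the light when the Sun is at a given elevation divided by the corresponding value when the Sun is at its zenith. The air mass factor can be approximated by using the expression (Iqbal, 1983):

$$AM_t = \frac{1.0}{\cos\theta_z + 0.50572 \left( \arcsin(\cos\theta_z) + 6.07995 \right)^{-1.6364}}, \qquad (11)$$

where $\theta_z$ is the solar zenith angle of Eq. (9) and $\arcsin(\cos\theta_z)$, which is the solar elevation, is expressed in degrees.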
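Equations (9) to (11) can be evaluated with a few lines of code. The sketch below is illustrative: the default solar constant of 1367 W m^-2, the clipping of the irradiance to zero when the Sun is below the horizon, and the function names are assumptions, and the declination and eccentricity factor are taken as inputs rather than computed from the date.

```python
import math


def extraterrestrial_irradiance(declination_deg, latitude_deg, hour_angle_deg,
                                eccentricity=1.0, solar_constant=1367.0):
    """Horizontal extraterrestrial irradiance G0 of Eq. (9), in W m^-2."""
    delta, phi, omega = (math.radians(angle)
                         for angle in (declination_deg, latitude_deg, hour_angle_deg))
    cos_zenith = (math.sin(delta) * math.sin(phi)
                  + math.cos(delta) * math.cos(phi) * math.cos(omega))
    # Assumption: negative values (Sun below the horizon) are clipped to zero.
    return solar_constant * eccentricity * max(cos_zenith, 0.0)


def clearness_index(global_irradiance, extraterrestrial):
    """Clearness index Kt of Eq. (10)."""
    return global_irradiance / extraterrestrial


def air_mass(zenith_deg):
    """Air mass factor of Eq. (11); the solar elevation enters in degrees."""
    cos_zenith = math.cos(math.radians(zenith_deg))
    elevation_deg = math.degrees(math.asin(cos_zenith))
    return 1.0 / (cos_zenith + 0.50572 * (elevation_deg + 6.07995) ** -1.6364)


# Illustrative values: Sun at the zenith and at a zenith angle of 60 degrees.
print(round(air_mass(0.0), 3), round(air_mass(60.0), 3))
```

The correction term in the denominator of Eq. (11) is negligible when the Sun is high and only becomes important close to the horizon, where $1/\cos\theta_z$ alone would diverge.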

Although there is a relationship between the APE and each one of these variables, we have verified that it is not linear. For this reason, we propose to use artificial neural networks (ANNs) for predicting the APE value from the aforementioned variables, because they are non-linear statistical data modelling tools and, therefore, can be used to model complex relationships between variables. The algorithm for estimating ANNs is provided with labelled samples, in other words, with all of the aforementioned independent variables and the APE value (label) that corresponds to each instant. We have fitted the data by using different configurations of artificial neural networks, as is explained in the following section. The ANNs are trained using the Levenberg–Marquardt backpropagation algorithm, in which the artificial neural networks are organised in layers and send their signal forward, and the errors are propagated backwards. This learning function uses an adaptive learning rate. The data set is randomly divided into three groups: training, validation and testing. The mean squared errors and the regression values of each group have been estimated. The regression values measure the correlations between the outputs and targets. The mean squared error is estimated as the average squared difference between the outputs and targets, according to Eq. (12):

$$MSE = \frac{\sum_{i=1}^{N} \left( \widehat{APE}_i - APE_i \right)^2}{N} \qquad (12)$$

where $\widehat{APE}_i$ are the outputs of the ANN, and $APE_i$ are the APE values that are estimated from the measurements of the solar irradiance spectra.

The training automatically ends when generalisation stops improving, as indicated by an increase in the mean squared error of the validation samples.
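The two bookkeeping steps mentioned above, the random division of the samples and the validation-based stopping rule, can be sketched independently of the network itself. The split fractions (70/15/15), the patience of six epochs and the function names below are assumptions chosen for illustration; the source does not state the values that were actually used.

```python
import random


def split_indices(n_samples, fractions=(0.70, 0.15, 0.15), seed=0):
    """Randomly divide sample indices into training, validation and testing groups."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_train = round(fractions[0] * n_samples)
    n_val = round(fractions[1] * n_samples)
    return (indices[:n_train],
            indices[n_train:n_train + n_val],
            indices[n_train + n_val:])


def early_stopping_epoch(validation_mse, patience=6):
    """Epoch with the best validation MSE, stopping once it has not improved
    for `patience` consecutive epochs."""
    best_value, best_epoch = float("inf"), 0
    for epoch, value in enumerate(validation_mse):
        if value < best_value:
            best_value, best_epoch = value, epoch
        elif epoch - best_epoch >= patience:
            break  # generalisation has stopped improving
    return best_epoch


train, validation, test = split_indices(100)
print(len(train), len(validation), len(test))
print(early_stopping_epoch([0.50, 0.40, 0.30, 0.35, 0.36, 0.40], patience=2))
```

In practice the validation error would be recomputed after every training epoch, and the network weights of the returned epoch would be kept.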

2.5. Regression model to estimate the normalisation factor of the solar spectral irradiance distribution

The coefficient of normalisation $E_t$ (Eq. (3)), which corresponds to each measurement, must be known to obtain the solar spectral irradiance distribution from the values of the cumulative probability distribution function (Eq. (6)). This coefficient is related to the meteorological variables. Specifically, as has been proposed by Gottschalg et al. (2004), this distribution clearly depends on the value of the solar global irradiance. We have also included other available meteorological parameters that are related to this coefficient. We propose using the following linear multivariate regression to obtain the normalisation factor value, $E_t$:

$$E_t = a_0 + a_1 G_t + a_2 T_t + a_3 H_t + a_4 K_t + a_5 AM_t \qquad (13)$$

where $G_t$ is the solar global irradiance, $T_t$ is the temperature, $H_t$ is the humidity, $K_t$ is the clearness index, and $AM_t$ is the air mass.
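A linear model such as Eq. (13) can be fitted by ordinary least squares. The following sketch solves the normal equations with a small Gauss-Jordan elimination so that it needs no external library; this choice, the function names and the toy data are assumptions made for illustration, and a dedicated numerical library would normally be preferable for real, possibly ill-conditioned data.

```python
def fit_linear_regression(rows, targets):
    """Least-squares coefficients [a0, a1, ...] for targets ~ a0 + sum(a_k * x_k)."""
    design = [[1.0] + list(row) for row in rows]   # prepend the intercept column
    p = len(design[0])
    # Augmented normal equations: (X^T X | X^T y).
    system = [[sum(r[i] * r[j] for r in design) for j in range(p)]
              + [sum(r[i] * t for r, t in zip(design, targets))]
              for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(system[r][col]))
        system[col], system[pivot] = system[pivot], system[col]
        for r in range(p):
            if r != col:
                factor = system[r][col] / system[col][col]
                system[r] = [x - factor * y for x, y in zip(system[r], system[col])]
    return [system[i][p] / system[i][i] for i in range(p)]


def predict(coefficients, row):
    """Evaluate the fitted linear model for one vector of predictors."""
    return coefficients[0] + sum(a * x for a, x in zip(coefficients[1:], row))


# Toy data generated from y = 2 + 3*x1 - x2 (two predictors instead of five).
rows = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 3]]
targets = [2, 5, 1, 4, 5]
print([round(a, 6) for a in fit_linear_regression(rows, targets)])
```

For Eq. (13), each row would hold the five predictors $(G_t, T_t, H_t, K_t, AM_t)$ of one measurement and the target would be the measured $E_t$.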

2.6. Selected metrics for evaluating the proposed models

We propose to assess the performance of the proposed models with quantitative tools, as all of them are quantitative models. As pointed out in Bennett et al. (2013), the performance criterion for each proposed model depends on the model characteristics, on the data, information and knowledge that are available to the modeller, and on the specific goals of the modelling exercise. Moreover, quantitative testing involves the calculation of suitable numerical metrics to characterise model performance.

The k-means clustering method proposed in Section 2.3 falls into the direct value comparison category (Bennett et al., 2013), which "tests whether the model output shows similar characteristics as a whole to the set of comparison data" but does not directly compare observed and modelled data points. The same applies to the Kolmogorov–Smirnov two-sample homogeneity test that is used to compare the c.p.d.f.'s calculated from spectra, even though this test is classified as a residual method in its one-sample version.


Regarding the ANN used to estimate APE values, the methods used to measure the quantitative performance can be classified into the following categories (see Bennett et al., 2013):

• Coupling real and modelled values using residual methods, by calculating the difference between each pair of observed and modelled data points. We have selected the Mean Square Error (MSE) from among all the possible numerical calculations on model residuals, as it is one of the most common. The Mean Square Error criterion squares the residuals before calculating the mean, making all contributions positive and penalising greater errors more heavily.

• Preserving the data pattern using the coefficient of determination (R2), as it indicates how well the model explains the variance in the observations, compared with using their mean as the prediction.

Finally, the correlation coefficient has been used to measure the performance of the regression model that estimates the normalisation factor of the solar spectral irradiance distribution; it is included in the category of preserving the data pattern (Bennett et al., 2013). The correlation coefficient indicates how much of the variation of one (dependent) variable is explained by the other (independent) variables.
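The three metrics named in this section have simple closed forms, sketched below with their usual textbook definitions. The function names and the toy values are hypothetical; the correlation coefficient is taken here to be the Pearson coefficient, which is an assumption.

```python
import math


def mean_squared_error(observed, predicted):
    """Average squared residual, as in Eq. (12)."""
    return sum((p - o) ** 2 for o, p in zip(observed, predicted)) / len(observed)


def coefficient_of_determination(observed, predicted):
    """R2: 1 minus residual variation over the variation around the observed mean."""
    mean = sum(observed) / len(observed)
    residual = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    total = sum((o - mean) ** 2 for o in observed)
    return 1.0 - residual / total


def correlation_coefficient(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariance / (spread_x * spread_y)


# Toy APE values in eV (illustrative, not measured data).
observed = [1.80, 1.90, 2.00]
print(mean_squared_error(observed, [1.90, 1.90, 1.90]))
print(coefficient_of_determination(observed, observed))
```

A model that always predicts the mean of the observations obtains an R2 of zero, which is the baseline mentioned in the second item of the list above.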

3. Proposed methodology

This work seeks to characterise the solar spectral irradiance curves and, once this is achieved, to perform the inverse process: to simulate these curves using the meteorological parameters that characterise them. This characterisation is useful not only in traditional areas, as noted by Nann and Riordan (1991) (daylight availability predictions, radiation models for climate change predictions and biological impacts of climate change), but also in new fields, such as solar cell performance (Martín and Ruíz, 1999).

For characterising the solar spectral irradiance curves, we propose to use the procedure that is described in Fig. 1. The goal of this characterisation is to determine how many different spectral irradiance curves there are and to identify the parameters that explain these curves.

Fig. 1. Proposed procedure for characterising the solar spectral irradiance curves.

This work also aims to propose a model that allows the solar spectral irradiance distribution to be generated by using only some usually available meteorological parameters. We propose to use Algorithm 2 for obtaining the solar spectral irradiance distributions by using the results obtained in the characterisation of the spectra.

Algorithm 2. Algorithm for obtaining the solar spectral irradiance distribution.
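One step that follows directly from Eqs. (2), (3) and (6) is the inversion of a c.p.d.f. into a spectrum once the normalisation factor is known: each irradiance value is the increment of the c.p.d.f. multiplied by $E_t$. The sketch below illustrates only this step, with a hypothetical function name and toy numbers; it is not a transcription of Algorithm 2.

```python
def spectrum_from_cdf(cdf, normalisation_factor):
    """Recover irradiance values from a c.p.d.f. and the normalisation factor Et.

    Each value is Et times the increment of the c.p.d.f. at that wavelength,
    which inverts the cumulative sum of Eq. (6).
    """
    spectrum = []
    previous = 0.0
    for value in cdf:
        spectrum.append(normalisation_factor * (value - previous))
        previous = value
    return spectrum


# Toy c.p.d.f. on three wavelengths with Et = 8 (illustrative values only).
print(spectrum_from_cdf([0.25, 0.5, 1.0], 8.0))
```

In the proposed methodology the c.p.d.f. would be the representative curve associated with the estimated APE, and $E_t$ would come from the regression of Eq. (13).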

4. Data description and preparation

The data used were recorded at the Photovoltaic Systems Laboratory of the University of Malaga. The temperature, humidity, solar irradiance and solar spectral irradiance measurements were recorded at the same time. A grating spectroradiometer prepared for continuous outdoor exposure has been used to record the solar spectral irradiance. This spectroradiometer has a silicon sensor that provides a spectral measurement range of 330–1050 nm (VIS and NIR); it was placed on a fixed 21° slope. It shortens the measurement time to the range of 10 ms–5 s. The geographical coordinates of the Laboratory are latitude 36.7° N and longitude 4.5° W, at a height of 50 m. Measurements have been collected from November 2010 to May 2012. Spectra were obtained with a spectral resolution of below 8 nm at a wavelength interval of 0.75 nm. In total, up to 400,000 solar spectra were measured. Some measurements were removed to avoid the reflection that is produced by the angle of incidence and other effects. Spectra taken with a solar elevation angle of under fifteen degrees have also been removed (Nann and Riordan, 1991).

For this study, we used the irradiance values that correspond to wavelengths ranging from 350 to 1050 nm. The same range has been previously used (see Sirisamphanwong and Ketjoy, 2012). A total of 920 values have been used for each spectrum, and a total of 282,318 spectra (samples) have been used.

5. Results and discussion

5.1. Estimating APE values

The average photon energy (APE) value has been estimated for each one of the measured spectra. After calculating the APE value for every spectrum that was recorded, we have calculated the percentage distribution of the spectra in terms of the APE, to see the behaviour of the APE value at the location of Malaga. The distribution of the APEs calculated using Eq. (1) for all of the measured spectra is shown in Fig. 2.

Fig. 2 shows that a high percentage of the spectra, over fifty percent, have a higher APE value than the standard AM 1.5 spectrum, 1.88 eV. This can be explained by the location, which is a seaside town with high relative humidity (see Cornaro and Andreotti, 2013; Ishii et al., 2013).

Page 5: Modelling the distribution of solar spectral irradiance using data mining techniques

0

5

10

15

20

25

30

APE [eV]

Spec

tra [%

]

1.78

1.79

1.80

1.81

1.82

1.83

1.84

1.85

1.86

1.87

1.88

1.89

1.90

1.91

1.92

1.93

1.94

1.95

1.96

1.97

1.98

1.99

2.00

2.01

2.02

2.03

2.04

2.05

2.06

2.07

2.08

2.09

Fig. 2. Distribution of the APE values estimated from all of the solar spectral irradiance measurements.

R. Moreno-Sáez, L. Mora-López / Environmental Modelling & Software 53 (2014) 163–172

5.2. Comparing cumulative probability distribution functions of solar spectral irradiance distributions

For each of the recorded solar spectral irradiance curves (samples), the cumulative probability distribution function (c.p.d.f.) has also been estimated (Eq. (6)). These functions allow us to compare the recorded curves in terms of the shape of the spectral irradiance curves rather than their absolute values.
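The comparison of shapes can be sketched as follows. This is only an illustration under stated assumptions: Eq. (6) is taken to be the cumulative sum of the spectrum normalised to one, and Eq. (4) is taken to be the standard large-sample two-sample Kolmogorov–Smirnov criterion, with the number of wavelength points used as the sample size. The function names and the constants 1.358 and 1.628 (the usual coefficients for significance levels 0.05 and 0.01) are introduced here for the sketch.

```python
import numpy as np

# Large-sample Kolmogorov-Smirnov coefficients c(alpha).
KS_COEFF = {0.05: 1.358, 0.01: 1.628}


def cpdf(spectrum):
    """Cumulative distribution of a spectrum, normalised so it ends at 1
    (assumed form of Eq. (6))."""
    s = np.asarray(spectrum, dtype=float)
    return np.cumsum(s) / s.sum()


def ks_same_shape(spec_a, spec_b, alpha=0.05):
    """Return (D, critical value, accepted) for two spectra sampled at the
    same wavelengths. ASSUMPTION: sample sizes are the number of points."""
    n, m = len(spec_a), len(spec_b)
    d = float(np.max(np.abs(cpdf(spec_a) - cpdf(spec_b))))
    critical = KS_COEFF[alpha] * np.sqrt((n + m) / (n * m))
    return d, float(critical), bool(d <= critical)


# Hypothetical spectra: same shape at different intensity, and a different shape.
flat = np.ones(920)
brighter = 3.0 * flat
ramp = np.linspace(0.0, 1.0, 920)
print(ks_same_shape(flat, brighter)[2])  # True: only the height differs
print(ks_same_shape(flat, ramp)[2])      # False: the shapes differ
```

Because the c.p.d.f. is normalised, two spectra that differ only by a scale factor give a statistic of zero, which is the property the text relies on when it compares shapes instead of absolute values.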

We have performed two types of comparisons using Eq. (4) and Eq. (6). On the one hand, all of the possible comparisons between the c.p.d.f.’s of the spectra that have a similar APE value (±0.005) have been performed. On the other hand, after clustering the c.p.d.f.’s of the spectra as explained in the following subsection, we have compared the c.p.d.f. of each spectrum included in a cluster with the centroid of that cluster.

First, we have performed all of the possible comparisons among the c.p.d.f.’s that have similar APE values. Because the APE values range from 1.79 to 2.09 eV, we have separated the samples into 31 groups to perform these comparisons, using the following criterion:

$$X_i \in G_i \quad \text{if} \quad APE_i \in [\,1.78 + (i-1) \cdot 0.01,\; 1.78 + i \cdot 0.01\,[ \qquad (15)$$
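The grouping criterion of Eq. (15) amounts to binning the APE values in steps of 0.01 eV starting at 1.78 eV. A minimal sketch is given below; the function name `ape_group` and the rounding step, which only guards against floating-point noise at the bin edges, are choices made for this illustration.

```python
import numpy as np

APE_MIN = 1.78    # lower edge of the first group, from Eq. (15)
BIN_WIDTH = 0.01  # group width in eV, from Eq. (15)


def ape_group(ape):
    """Return the 1-based group index i such that
    1.78 + (i - 1) * 0.01 <= APE < 1.78 + i * 0.01."""
    ape = np.asarray(ape, dtype=float)
    # Rounding to 9 decimals avoids 1.79 landing in the wrong bin
    # because of binary floating-point representation.
    scaled = np.round((ape - APE_MIN) / BIN_WIDTH, 9)
    return np.floor(scaled).astype(int) + 1


print(ape_group([1.785, 1.79, 1.885, 2.085]))  # [ 1  2 11 31]
```

All pairwise comparisons are then made only among the spectra that share a group index.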

Using these criteria, we have considered that the APE values in each group are similar because the differences among them are always lower than 0.01 eV. Fig. 3 shows the results that are obtained when performing all of the possible comparisons between the c.p.d.f.’s of the samples that were included in each group. This figure shows the percentage of comparisons for which the null hypothesis is accepted; in other words, the compared c.p.d.f.’s are the same according to the homogeneity test (Eq. (4)).

Fig. 3. Percentage of comparisons for which it is accepted that the c.p.d.f.’s are the same for each APE value group.
[Bar chart; x-axis: APE [eV], 1.78 to 2.09; y-axis: Equal Samples per Class [%], 0 to 100.]

These results show that, for APE values that range from 1.79 to 2.00 eV, the c.p.d.f.’s of the solar spectral irradiance are the same (homogeneity test) in almost every case; that is, within almost every one of these groups the c.p.d.f.’s are similar (for this range, the lowest percentage is 99.27%). However, the results of the homogeneity test for APE values greater than 2.00 eV suggest that there are significant differences between c.p.d.f.’s. In other words, for these values it is not possible to propose a unique c.p.d.f. for each APE value; nevertheless, less than 1.0% of the spectral irradiance curves have an APE value greater than 2.00 eV.

To illustrate these results, Fig. 4 shows the probability and cumulative probability distribution functions of the spectral irradiance for two spectra that have similar APE values, and Fig. 5 shows the same functions for two spectra that have different APE values.

Fig. 4. Probability and cumulative probability distribution functions for two spectra with similar APE values.

Fig. 5. Probability and cumulative probability distribution functions for two spectra with different APE values.

As can be observed, the distributions that correspond to similar values of APE have the same shape, whilst the shapes of the distributions that correspond to the two spectra with different APE values are different; this is clearly shown in the cumulative probability distribution functions. What matters here is the similarity of the shapes, not the heights, of the curves. In other words, the APE is a good indicator for quantifying the relative contribution of each wavelength to the total irradiance by determining the position of the spectral absorption windows. These results agree with those obtained in Minemoto et al. (2009) and extend their conclusion to a different location. Minemoto et al. remark that the uniqueness of APE had been verified only for the spectra collected at their measurement location and therefore recommend performing similar analyses at different locations and climates before applying the uniqueness universally. The new results that we have obtained support the idea of the universality of this relationship. Because we performed a statistical analysis, our proposal does not address the components that are responsible for the attenuation at each wavelength, but the results can statistically quantify these attenuations for different wavelengths.

5.3. K-means for clustering solar spectral irradiance distributions

The results obtained for the APE values that range from 1.79 to 2.00 eV suggest that a smaller number of groups could be enough to classify all of the APE values. Deciding the minimum number of groups, and how to assign each sample to a group so that all of the samples in each group are similar, is computationally very expensive and has several solutions, depending on the criteria that are used for the assignment. For this reason, we propose using the k-means clustering data mining technique, which performs this classification automatically. It is only necessary to decide the number of groups, and the algorithm assigns each sample to the group with the most similar samples. The data are clustered using the algorithm described in Section 2.3. Moreover, for each cluster, the k-means algorithm selects the c.p.d.f. that is the most similar to all of the c.p.d.f.’s in the cluster. This c.p.d.f. is referred to as the centroid of the cluster, and all of the c.p.d.f.’s of each cluster are subsequently compared with its centroid. Table 1 shows the results obtained when the homogeneity test (Eq. (4)) is applied using two different significance levels (0.05 and 0.01) and when the values 2, 3, 4, 5 and 6 are used as the input parameter K (number of clusters) in the k-means algorithm.

Table 1
Percentage of comparisons for which the null hypothesis is accepted at the significance levels of 0.05 and 0.01.

Number of clusters    Homogeneity test results (%)
                      α = 0.05    α = 0.01
2                     98.21       99.09
3                     99.76       99.96
4                     99.89       99.99
5                     99.98       100.00
6                     99.99       100.00

These results allow us to conclude that when using three clusters, with a significance level of 0.05, 99.76% of the c.p.d.f.’s pass the homogeneity test; in other words, by using only three different c.p.d.f.’s, which correspond to the centroids of the clusters, it is possible to characterise all of the measured curves for APE values that are less than or equal to 2.00 eV. Table 2 shows the detailed results for each cluster when 3 clusters are used.

Table 2
Results obtained for each cluster when a total of 3 clusters are used (significance levels of 0.05 and 0.01).

Cluster number    Samples in cluster         Homogeneity test results (%)
                  Total      Percentage      α = 0.05    α = 0.01
1                 70,510     25.23           99.66       99.88
2                 177,441    63.48           100.00      100.00
3                 31,569     11.29           98.67       99.89
Total % of similar c.p.d.f.’s                99.76       99.96

To know which of these three curves must be used for each spectrum, we have also analysed the relationship between the APEs that belong to the same cluster. Fig. 6 shows how the APE values are distributed for each cluster when 3 clusters are used.

As shown, there is a relationship between the APE value and the cluster to which the c.p.d.f. of this APE belongs; this relationship appears to be unique when 3 clusters are used. Fig. 7 shows the c.p.d.f.’s of the three centroids when three clusters are selected as the optimal value.

As can be observed, each one of these spectra can be associated with a different atmospheric composition. For the relative spectrum that corresponds to centroid 1 (blue line), the absorption is greater at the shorter wavelengths, and as a result, the relative irradiances for the shorter wavelengths are lower. In contrast, centroid 3 (black line) corresponds to a relative spectrum in which the absorption is greater at the longer wavelengths, where the relative irradiance values are more attenuated than for the other two spectra. This finding explains why, even though the model does not include the effects of each of the atmospheric components, it is possible to statistically characterise their effects on the spectrum.

Finally, it should be noted that, when using 3 clusters, there are samples with an APE value of 1.88 eV in both cluster 1 and cluster 2; the same situation occurs for the APE value of 1.92 eV, for which there are samples in both cluster 2 and cluster 3. For all of these samples that could be in two different clusters, we have used the homogeneity test with the centroids of these clusters. In all of these cases, the null hypothesis is accepted at the 0.05 significance level, which means that these samples can be assigned to either of the two clusters. The results obtained allow us to conclude that, by using the APE value, it is possible to know, with an error lower than 0.012%, the c.p.d.f. of the solar spectral irradiance distribution by using the centroid (c.p.d.f.) of the cluster to which the APE value belongs.
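The clustering step of this subsection can be sketched with plain Lloyd iterations. The sketch does not reproduce the algorithm of Section 2.3: the squared Euclidean distance, the deterministic farthest-point initialisation and the toy c.p.d.f.’s (power-law curves) are all assumptions made here for illustration.

```python
import numpy as np


def farthest_point_init(data, k):
    """Deterministic start chosen for this sketch: take the first sample,
    then repeatedly add the sample farthest from those already chosen."""
    centroids = [data[0]]
    for _ in range(1, k):
        dist = np.min(
            [np.sum((data - c) ** 2, axis=1) for c in centroids], axis=0
        )
        centroids.append(data[np.argmax(dist)])
    return np.array(centroids, dtype=float)


def kmeans(data, k, max_iter=100):
    """Plain Lloyd iterations with squared Euclidean distance."""
    data = np.asarray(data, dtype=float)
    centroids = farthest_point_init(data, k)
    labels = np.zeros(len(data), dtype=int)
    for _ in range(max_iter):
        # Assignment step: nearest centroid for every sample.
        dist = np.stack(
            [np.sum((data - c) ** 2, axis=1) for c in centroids], axis=1
        )
        labels = dist.argmin(axis=1)
        # Update step: each centroid becomes the mean of its members.
        updated = np.array([
            data[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(updated, centroids):
            break
        centroids = updated
    return labels, centroids


# Hypothetical c.p.d.f.'s: three families of curves on a common grid.
x = np.linspace(0.0, 1.0, 50)
powers = [0.78, 0.80, 0.82, 0.98, 1.00, 1.02, 1.23, 1.25, 1.27]
toy_cpdfs = np.array([x ** p for p in powers])
labels, centroids = kmeans(toy_cpdfs, k=3)
print(labels)
```

Each member of a cluster would then be compared with its centroid using the homogeneity test, which is how the percentages in Tables 1 and 2 are obtained.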

5.4. Artificial neural network for estimating the APE value from the available meteorological data

We have analysed the relationship between the available meteorological variables and the APE value estimated for each spectrum using artificial neural networks (ANNs), because the APE values are not recorded. Specifically, the independent variables that are used are the solar irradiance, the temperature and the humidity (as meteorological parameters) and the solar extraterrestrial irradiance and air mass factor (which depend on the astronomical parameters for the time of the measurements).

Fig. 6. Distribution of the APE values in each cluster when using 3 clusters in k-means.
[Bar chart; x-axis: APE [eV], 1.78 to 2.00; y-axis: Number of Samples [%], 0 to 30; series: Cluster #1, Cluster #2, Cluster #3.]

We have checked three different configurations of ANNs: 5, 10 and 20 hidden layers. In all of the cases, there is only one output layer. To train the network, the samples are divided randomly into three sets: training, validation and testing. The artificial neural networks are trained using the Levenberg–Marquardt backpropagation algorithm. Training automatically ends when generalisation stops improving, as indicated by an increase in the mean squared error of the validation samples. Two measures of the performance of the artificial neural networks have been used: the mean squared error and the regression value R. The first measure is the average squared difference between the outputs and targets, and the second measure is the correlation between the outputs and targets.

Table 3 shows the results that were obtained for the three configurations. The third column shows the number of samples in each set, the fourth column shows the mean squared error and the last column shows the R values for each set and configuration.

As can be observed, the mean squared error is lower than 0.00019, and the R values are greater than 0.83 in all of the configurations and sets. This finding means that this type of model can be used for estimating the APE value from the above-mentioned meteorological and astronomical parameters. The improvement that is achieved when using 20 hidden layers with respect to the results obtained when using 10 hidden layers is not significant; moreover, if 10 hidden layers are used instead of 5, the mean squared error is lower than 0.00018 and the R value is greater than 0.85 for all of the sets. For these reasons, we propose using 10 hidden layers in the artificial neural network.
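The data partition and the two performance measures can be sketched as follows. The set sizes reported in Table 3 are consistent with a 70/15/15 random split of 266,826 samples; these proportions are not stated in the text, so they are an assumption here, as are the function names and the random seed. The network training itself (Levenberg–Marquardt backpropagation) is not reproduced.

```python
import numpy as np


def split_indices(n_samples, train=0.70, val=0.15, seed=0):
    """Random partition into training, validation and testing indices.
    ASSUMPTION: 70/15/15 proportions, inferred from the sizes in Table 3."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    n_train = int(round(train * n_samples))
    n_val = int(round(val * n_samples))
    return (order[:n_train],
            order[n_train:n_train + n_val],
            order[n_train + n_val:])


def mean_squared_error(outputs, targets):
    """Average squared difference between outputs and targets."""
    outputs = np.asarray(outputs, dtype=float)
    targets = np.asarray(targets, dtype=float)
    return float(np.mean((outputs - targets) ** 2))


def regression_r(outputs, targets):
    """Correlation coefficient between outputs and targets."""
    return float(np.corrcoef(outputs, targets)[0, 1])


train_idx, val_idx, test_idx = split_indices(266_826)
print(len(train_idx), len(val_idx), len(test_idx))  # 186778 40024 40024
```

The validation set is the one monitored for early stopping, while the testing set is only used to report the final MSE and R values.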

5.5. Multivariate regression model for estimating the normalisation factor

Fig. 7. Probability distribution functions for the centroids of clusters.

The normalisation factor that was used to estimate the original c.p.d.f. is needed to recover the solar spectral irradiance distribution from the centroid. We have estimated the normalisation factor value, Et, by using Eq. (13). The coefficients of this expression have been obtained by using all of the measurements of the meteorological parameters (solar irradiance, temperature and humidity) and the solar extraterrestrial irradiance and air mass for each measurement. The model obtained is the following:

$$E_t = 264020 + 827\,G_t - 7061\,T_t - 1025\,H_t + 258855\,K_t - 35843\,AM_t \qquad (16)$$

The multiple correlation coefficient of the fit is 0.937, and the coefficient of determination is 0.878.
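As a quick numerical check of Eq. (16), the sketch below evaluates the model with the input values of the practical example in Section 5.7. The mapping of the symbols to irradiance, temperature, relative humidity, clearness index and air mass is assumed from that example, and the function name is introduced only for this illustration.

```python
def normalisation_factor(g, t, h, k, am):
    """Eq. (16). Assumed meaning of the inputs: irradiance g, temperature t,
    relative humidity h, clearness index k and air mass am, in the units
    used in the practical example of Section 5.7."""
    return (264020.0 + 827.0 * g - 7061.0 * t - 1025.0 * h
            + 258855.0 * k - 35843.0 * am)


et = normalisation_factor(g=908.0, t=25.2, h=26.0, k=0.78, am=1.136)
print(round(et))  # 971538
```

The value obtained is within about 0.03% of the 971,276 reported in Section 5.7; the small difference presumably comes from the rounding of the listed inputs.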

5.6. Comparison with previous models

We have proposed a procedure that is based on statistical and data mining techniques, whereas previous studies are based on physical models of the atmosphere. For this reason, the results are difficult to compare, because the error measurements are not the same. Nevertheless, we review some of the previous results to clarify the validity and applicability of the proposed model in contrast with these previously proposed models. The most widely accepted methods for characterising the solar spectrum are the atmospheric transmittance methods and the radiative transfer methods. The input parameters of these methods are estimated from meteorological parameters and from different atmospheric models for the components of the atmosphere. Because of the more rigorous analysis and the complexity of radiative transfer models, the results obtained with these methods are not comparable with our results, and the requirements of each approach are very different.

We have reviewed the results that were obtained for atmospheric transmittance models. In Utrillas et al. (1998), an analysis of SPCTRAL2 and SMARTS2 (atmospheric transmittance methods) for data from Valencia is presented. Global and direct solar irradiance, atmospheric temperature and relative humidity are used as input data for obtaining different parameters of the models, such as the Angstrom turbidity coefficient, aerosol optical depth and water vapour content. The root mean square deviation (RMSD) is used as the error measurement for comparing the measured spectral values of the experimental direct irradiance with those obtained by the models. The mean RMSD is approximately 10% for SPCTRAL2, depending on the aerosol model, and ranges from 5.4 to 15% for SMARTS2. SMARTS is also used in Kaskaoutis and Kambezidis (2008) with data from Athens (urban area) for predicting the spectral direct-beam irradiance. The accuracy of the results is evaluated by using the relative RMSD, which is approximately 14%. The main drawbacks of using this type of model for certain applications are their relative complexity (they require the use of specific software) and the need for a priori knowledge of the aerosol model for each situation.

Table 3
Results of estimating the APE with artificial neural networks.

Hidden layers    Set           Samples    MSE          R
5                Training      186,778    0.0001798    0.8421
                 Validation    40,024     0.0001801    0.8426
                 Testing       40,024     0.0001849    0.8383
10               Training      186,778    0.0001589    0.8623
                 Validation    40,024     0.0001632    0.8576
                 Testing       40,024     0.0001646    0.8574
20               Training      186,778    0.0001606    0.8608
                 Validation    40,024     0.0001616    0.8590
                 Testing       40,024     0.0001595    0.8615

In our proposal, the errors for each of the models that make up the process are the following:

• Over 99.9% of the relative shapes of the solar spectral irradiance curves (c.p.d.f.’s) are well characterised by the centroid of the cluster to which the APE of the curve belongs (APE values ranging from 1.79 to 2.00 eV; significance level of 0.01).

• The mean squared error for estimating the APE value from meteorological parameters using an ANN is lower than 0.00019, and the R values are greater than 0.83 (Table 3).

• The multiple correlation coefficient of the multivariate regression for estimating the normalisation factor is greater than 0.93.

Thus, the errors of some of the models included in the proposed procedure are very small and improve on the results obtained with previously proposed methods; moreover, it may be possible to reduce the error in some of these models by investigating the use of other variables that have proven to be significant in the analysis of spectra, such as the aerosol spectral depth. The advantage of the proposed hybrid procedure is that the overall results can be improved by changing only some of the proposed models, specifically those that have greater errors.

5.7. Generating solar spectral irradiance distributions: a practical example

Using the procedure described in Section 3, the solar spectral irradiance distribution that corresponds to a given time can be generated. We now show an example of the proposed steps for generating a solar spectral distribution from the required meteorological and solar parameters; the example pertains to April 23rd, 2012 at 15:37:00 h.

Fig. 8. Simulated and measured solar spectral irradiance distribution for April 23rd, 2012 at 15:37:00.
[Line plot; x-axis: Wavelength, λ [nm], 300 to 1100; y-axis: Spectral Irradiance [W/m²/μm], 0 to 1800; series: Measured, Generated.]

Inputs:

• Meteorological parameters measured: irradiance: 908 Wh/m², temperature: 25.2 °C and relative humidity: 26%.

• Estimated parameters: solar extraterrestrial irradiance: 1161 Wh/m² (Eq. (9)), clearness index: 0.78 (Eq. (10)) and air mass: 1.136 (Eq. (11)).

Using the proposed models:

• Estimated APE value using the ANN (see Section 2.4): 1.90 eV.

• Selected cluster for APE 1.90 eV: 2 (the c.p.d.f. of this cluster, i.e. its centroid, is selected).

• Estimated normalisation factor Et using Eq. (16): 971,276.

• Using the c.p.d.f. of cluster 2 and Eq. (14), the solar spectral irradiance distribution is generated.

Output:

• Solar spectrum (simulated) for April 23rd, 2012 at 15:37:00 h.

The generated solar spectral irradiance distribution is shown in Fig. 8. The measured spectrum for this time has also been included in this figure.

As can be observed, the spectrum obtained with the proposed procedure is very similar to the measured spectrum, as the results obtained in the previous sections indicate.
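The last two steps of the example can be sketched as follows. Two assumptions are made and should be checked against the full text: the APE thresholds for selecting the cluster (1.88 and 1.92 eV) are read from the discussion of Fig. 6 in Section 5.3, and Eq. (14) is taken to de-accumulate the centroid c.p.d.f. and scale it by Et. The function names and the toy centroid are hypothetical.

```python
import numpy as np


def select_cluster(ape_ev, low=1.88, high=1.92):
    """Cluster number for an APE value (valid for APE <= 2.00 eV).
    ASSUMPTION: thresholds taken from the discussion of Fig. 6; samples at
    1.88 or 1.92 eV may be assigned to either neighbouring cluster."""
    if ape_ev < low:
        return 1
    if ape_ev <= high:
        return 2
    return 3


def spectrum_from_cpdf(centroid_cpdf, et):
    """ASSUMED form of Eq. (14): take the increments of the c.p.d.f.
    (its probability distribution) and scale them by Et."""
    cpdf = np.asarray(centroid_cpdf, dtype=float)
    pdf = np.diff(cpdf, prepend=0.0)
    return et * pdf


# Hypothetical centroid on a coarse grid, only to exercise the functions.
toy_shape = np.array([1.0, 3.0, 4.0, 3.0, 2.0])
toy_centroid = np.cumsum(toy_shape) / toy_shape.sum()

cluster = select_cluster(1.90)
generated = spectrum_from_cpdf(toy_centroid, et=971_276.0)
print(cluster)            # 2, as in the worked example
print(generated.sum())    # equals Et up to rounding
```

With the real centroid of cluster 2 in place of `toy_centroid`, the output would be the simulated spectrum that Fig. 8 compares with the measured one.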

6. Conclusions

We have developed a procedure to model the solar spectral irradiance distribution that uses both statistical and data mining techniques. This procedure is based on the fact that all of the different spectra measured in Malaga over twenty months can be characterised by using only three different types of spectrum, according to the results that we have obtained. To find out how many different spectra there are, we have used the k-means clustering technique and the Kolmogorov–Smirnov two-sample test. To analyse the solar spectral irradiance distributions, we first estimated the average photon energy of each measurement and its cumulative probability distribution function, using a normalisation factor for each measurement. With these values and the above-mentioned technique and test, it has been possible to reduce over 260,000 measured spectra to only three. More than 99.8% of the measured distributions are similar to the selected spectrum that corresponds to the cluster to which they belong.

After selecting these spectra, we have analysed the relationship among certain meteorological parameters (solar irradiance, temperature and humidity), certain astronomical parameters (air mass and extraterrestrial solar irradiance) and the variables necessary to characterise the spectra (average photon energy and normalisation factor). We have proposed using an artificial neural network for estimating the APE value from the meteorological and astronomical parameters. The mean squared error of the model is lower than 0.00018, and the R values are greater than 0.85. Finally, we have proposed using a multivariate linear model for estimating the normalisation factor; this model has a multiple correlation coefficient of 0.937 and a coefficient of determination of 0.878.

We can conclude that the proposed models allow the solar spectral irradiance distributions to be generated for any meteorological conditions while using only the values of the solar irradiance, temperature and humidity as the measured input parameters. The results obtained are in keeping with the previous results of Minemoto et al. (2009), which can be an indicator of their universality because the present study has been conducted in a different location.

Acknowledgements

This work has been supported by the projects P10-TIC-6441 and P11-RNM-07115 of the Junta de Andalucía, Spain.

References

Abdul-Wahab, S.A., Bakheit, C.S., Al-Alawi, S.M., 2004. Principal component and multiple regression analysis in modelling of ground-level ozone and factors affecting its concentrations. Environ. Modell. Softw. 20 (10), 1263–1271.

Al-Alawi, S.M., Abdul-Wahab, S.A., Bakheit, Charles S., 2006. Combining principal component regression and artificial neural networks for more accurate predictions of ground-level ozone. Environ. Modell. Softw. 23 (4), 396–403.

Anderson, G.P., Chetwynd, J.H., Theriault, J.M., Acharya, P., Berk, A., Robertson, D.C., et al., 1993. MODTRAN2: suitability for remote sensing. In: Proceedings of the Conference on Atmospheric Propagation and Remote Sensing II. SPIE, Orlando, pp. 514–525.

Bennett, N.D., Croke, B.F., Guariso, G., Guillaume, J.H., Hamilton, S.H., Jakeman, A.J., Marsili-Libelli, S., Newham, L.T., Norton, J.P., Perrin, C., Pierce, S.A., Robson, B., Seppelt, R., Voinov, A.A., Fath, B.D., Andreassian, V., 2013. Characterising performance of environmental models. Environ. Modell. Softw. 40, 1–20.

Bird, R.E., Riordan, C., 1986. Simple solar spectral model for direct and diffuse irradiance on horizontal and tilted planes at the earth’s surface for cloudless atmospheres. J. Clim. Appl. Meteorol. 25, 87–97.

Bird, R.E., Hulstrom, R.L., Kliman, A.W., Eldering, H.G., 1982. Solar spectral measurements in the terrestrial environment. Appl. Optics 21 (8), 1430–1436.

Cornaro, C., Andreotti, A., 2013. Influence of Average Photon Energy index on solar irradiance characteristics and outdoor performance of photovoltaic modules. Progress Photovolt. Res. Appl. 21 (5), 996–1003.

de La Casinier, A., Bokoye, A.I., Cabot, T., 1997. Direct solar spectral irradiance measurements and updated simple transmittance models. J. Appl. Meteorol. 36, 509–520.

Duda, R., Hart, P., Stork, D., 2001. Pattern Classification. John Wiley & Sons.

Emre Celebi, M., Kingravi, Hassan A., Vela, Patricio A., 2013. A comparative study of efficient initialization methods for the k-means clustering algorithm. Expert Syst. Appl. 40, 200–210.

Fabero, F., Chenlo, F., 1991. Variance in the solar spectrum with the position of the receiver surface during the day for PV applications. In: Proceedings of the 22nd IEEE Photovoltaic Specialists Conference. IEEE Press, New York, pp. 812–817.

Gottschalg, R., Infield, D.G., Kearney, M.J., 2003. Experimental study of variations of the solar spectrum of relevance to thin film solar cells. Sol. Energy Mater. Sol. Cells 79, 527–537.

Gottschalg, R., Betts, T.R., Infield, D.G., Kearney, M.J., 2004. On the importance of considering the incident spectrum when measuring the outdoor performance of amorphous silicon photovoltaic devices. Meas. Sci. Technol. 15, 460–466.

Gueymard, C., 2001. Parameterized transmittance model for direct beam and circumsolar spectral irradiance. Sol. Energy 71 (5), 325–346.

Han, J., Kamber, M., 2001. Data Mining Concepts and Techniques. Morgan Kaufmann, San Francisco.

Hastie, T., Tibshirani, R., Friedman, J., 2001. The Elements of Statistical Learning: Data Mining, Inference and Prediction. Springer.

International Electrotechnical Commission, 2007. IEC 60904-9.

Iqbal, M., 1983. An Introduction to Solar Radiation. Academic Press.

Ishii, T., Otani, K., Takashima, T., Xue, Y., 2013. Solar spectral influence on the performance of photovoltaic (PV) modules under fine weather and cloudy weather conditions. Progress Photovolt. Res. Appl. 21 (4), 481–489.

Jacovides, C.P., Steven, M.D., Asimakopoulos, D.N., 2000. Solar spectral irradiance under clear skies around a major metropolitan area. J. Appl. Meteorol. 39, 917–930.

Jacovides, C.P., Kaskaoutis, D.G., Tymvios, F.S., Asimakopoulos, D.N., 2004. Application of SPCTRAL2 parametric model in estimating spectral solar irradiances over polluted Athens atmosphere. Renew. Energy 29, 1109–1119.

Jain, A., Murty, M., Flynn, P., 1999. Data clustering: a review. ACM Comput. Surv. 31 (3), 264–323.

Kaskaoutis, D.G., Kambezidis, H.D., 2008. The role of aerosol models of the SMARTS code in predicting the spectral-beam irradiance in an urban area. Renew. Energy 33, 1532–1543.

Kaskaoutis, D.G., Kambezidis, H.D., Jacovides, C.P., Steven, M.D., 2006. Modification of solar radiation components under different atmospheric conditions in the Greater Athens Area, Greece. J. Atmos. Solar-Terres. Phys. 68, 1043–1052.

Leckner, B., 1978. The spectral distribution of solar radiation at the earth’s surface – elements of a model. Sol. Energy 20, 143–150.

Liou, K.N., 1980. An Introduction to Atmospheric Radiation. Academic Press.

Martín, N., Ruíz, J.M., 1999. A new method for the spectral characterization of PV modules. Prog. Photovolt: Res. Appl. 7, 299–310.

Minemoto, T., Nakada, Y., Takahashi, H., Takakura, H., 2009. Uniqueness verification of solar spectrum index of average photon energy for evaluating outdoor performance of photovoltaic modules. Sol. Energy 83, 1294–1299.

Mora, J., Mora-López, L., 2010. Comparing distributions with bootstrap techniques: an application to global solar radiation. Math. Comput. Simul. 81, 811–819.

Moreno-Sáez, R., Sidrach-de-Cardona, M., Mora-López, L., 2013. Data mining and statistical techniques for characterizing the performance of thin-film photovoltaic modules. Expert Syst. Appl. 40 (17), 7141–7150.

Myers, D.R., 2012. Direct beam and hemispherical terrestrial solar spectral distributions derived from broadband hourly solar radiation data. Sol. Energy 86, 2771–2782.

Nann, S., Riordan, C., 1991. Solar spectral irradiance under clear and cloudy skies: measurements and a semiempirical model. J. Appl. Meteorol. 30, 447–462.

Piliougine, M., Carretero Rubio, J.E., Mora-Lopez, L., Sidrach de Cardona Ortin, M., 2011. Experimental system for current-voltage curve measurement of photovoltaic modules under outdoor conditions. Progress Photovolt., 1–12.

Poissant, Y., Couture, L., Dignard-Bailey, L., Thevenard, D., Cusack, P., Oberholzer, H., 2003. Simple test methods for evaluating the energy rating of PV modules under various environmental conditions. In: Proceedings of ISES 2003, Gothenburg, Sweden.

Rohatgi, V.K., Saleh, A.K.M.E., 2001. An Introduction to Probability and Statistics, second ed. Wiley-Interscience, New York.

Sirisamphanwong, C., Ketjoy, N., 2012. Impact of spectral irradiance distribution on the outdoor performance of photovoltaic system under Thai climatic conditions. Renew. Energy 38, 69–74.

Stamnes, K., Tsay, S.C., Wiscombe, W., Jayaweera, K., 1988. Numerically stable algorithm for discrete-ordinate-method radiative transfer in multiple scattering and emitting layered media. Appl. Optics 27, 2502–2509.

Utrillas, M.P., Boscá, J.V., Martínez-Lozano, J.A., Cañada, J., Tena, F., Pinazo, J.M., 1998. A comparative study of SPCTRAL2 and SMARTS2 parameterised models based on spectral irradiance measurements at Valencia, Spain. Sol. Energy 63 (3), 161–171.

Williams, S.R., Betts, T.R., Helf, T., Gottschalg, R., Beyer, H.G., Infield, D.G., 2003. Modelling long-term module performance based on realistic reporting conditions with consideration to spectral effects. In: Proc. of the Third World Conference on Photovoltaic Energy Conversion, Osaka, Japan, pp. 1908–1911.