
This article was downloaded by: [York University Libraries]
On: 13 August 2014, At: 04:14
Publisher: Taylor & Francis
Informa Ltd Registered in England and Wales Registered Number: 1072954 Registered office: Mortimer House, 37-41 Mortimer Street, London W1T 3JH, UK

GIScience & Remote Sensing
Publication details, including instructions for authors and subscription information:
http://www.tandfonline.com/loi/tgrs20

Machine learning approaches to coastal water quality monitoring using GOCI satellite data
Yong Hoon Kim a, Jungho Im b, Ho Kyung Ha c, Jong-Kuk Choi d & Sunghyun Ha b

a RPS ASA, South Kingstown, RI, USA
b School of Urban and Environmental Engineering, Ulsan National Institute of Science and Technology, Ulsan, South Korea
c Department of Ocean Sciences, Inha University, Incheon, South Korea
d Korea Institute of Ocean Science & Technology, Ansan, South Korea
Published online: 03 Apr 2014.

To cite this article: Yong Hoon Kim, Jungho Im, Ho Kyung Ha, Jong-Kuk Choi & Sunghyun Ha (2014) Machine learning approaches to coastal water quality monitoring using GOCI satellite data, GIScience & Remote Sensing, 51:2, 158-174

To link to this article: http://dx.doi.org/10.1080/15481603.2014.900983

PLEASE SCROLL DOWN FOR ARTICLE

Taylor & Francis makes every effort to ensure the accuracy of all the information (the “Content”) contained in the publications on our platform. However, Taylor & Francis, our agents, and our licensors make no representations or warranties whatsoever as to the accuracy, completeness, or suitability for any purpose of the Content. Any opinions and views expressed in this publication are the opinions and views of the authors, and are not the views of or endorsed by Taylor & Francis. The accuracy of the Content should not be relied upon and should be independently verified with primary sources of information. Taylor and Francis shall not be liable for any losses, actions, claims, proceedings, demands, costs, expenses, damages, and other liabilities whatsoever or howsoever caused arising directly or indirectly in connection with, in relation to or arising out of the use of the Content.

This article may be used for research, teaching, and private study purposes. Any substantial or systematic reproduction, redistribution, reselling, loan, sub-licensing, systematic supply, or distribution in any form to anyone is expressly forbidden. Terms & Conditions of access and use can be found at http://www.tandfonline.com/page/terms-and-conditions


Machine learning approaches to coastal water quality monitoring using GOCI satellite data

Yong Hoon Kim a, Jungho Im b*, Ho Kyung Ha c, Jong-Kuk Choi d and Sunghyun Ha b

a RPS ASA, South Kingstown, RI, USA; b School of Urban and Environmental Engineering, Ulsan National Institute of Science and Technology, Ulsan, South Korea; c Department of Ocean Sciences, Inha University, Incheon, South Korea; d Korea Institute of Ocean Science & Technology, Ansan, South Korea

(Received 28 December 2013; accepted 25 February 2014)

Because coastal waters are among the marine systems most vulnerable to environmental pollution, operational monitoring of coastal water quality is essential. This study estimates two major water quality indicators, chlorophyll-a (chl-a) and suspended particulate matter (SPM) concentrations, in coastal environments on the west coast of South Korea using Geostationary Ocean Color Imager (GOCI) satellite data. Three machine learning approaches were evaluated for coastal water quality estimation: random forest, Cubist, and support vector regression (SVR). In situ measurements (63 samples) collected on four days in 2011 and 2012 were used as reference data. Because of the limited number of samples, leave-one-out cross validation (CV) was used to assess the performance of the water quality estimation models. Results show that SVR outperformed the other two machine learning approaches when using GOCI-derived radiance data, yielding a calibration R² of 0.91 and a CV root-mean-squared error (RMSE) of 1.74 mg/m³ (40.7%) for chl-a, and a calibration R² of 0.98 and a CV RMSE of 11.42 g/m³ (63.1%) for SPM. The relative importance of the predictor variables was also examined. When GOCI-derived radiance data were used, the ratio of band 2 to band 4 was the most influential input variable for predicting chl-a concentration, and bands 6 and 5 were the most influential for predicting SPM concentration. Hourly GOCI images were useful for examining the spatiotemporal distributions of the water quality parameters in relation to tidal phase along the west coast of Korea.

Keywords: chlorophyll-a concentration; suspended particulate matter; GOCI; water quality; machine learning

Introduction

Environmental threats to coastal and inner-shelf regions have increased in recent decades because of elevated anthropogenic inputs. Water quality monitoring is a fundamental activity for sound coastal and ocean management and for setting appropriate environmental policies. Satellite remote sensing with sensors such as MODIS, SeaWiFS, and MERIS has been commonly used to monitor coastal water quality because it can cover vast areas at high temporal resolution. While water quality in the open ocean is relatively straightforward to estimate from remote sensing, estimation in typical Case II coastal waters remains challenging because many factors influence water quality, including the various materials delivered by inland drainage systems and coastal circulation. Efforts continue to estimate water quality parameters in coastal waters accurately and in a timely manner from remote sensing data (Chen, Hu, and Muller-Karger 2007; Zhang et al. 2010; Son and Wang 2012; Choi et al. 2012; Tilstone et al. 2013).

*Corresponding author. Email: [email protected]

GIScience & Remote Sensing, 2014, Vol. 51, No. 2, 158–174, http://dx.doi.org/10.1080/15481603.2014.900983

© 2014 Taylor & Francis

Numerous studies have estimated water quality in coastal environments using satellite data. Most applied relatively simple linear or nonlinear regressions to satellite data based on empirical relationships between satellite-derived reflectance and the target water quality parameters (Chen, Hu, and Muller-Karger 2007; Choi et al. 2014; Darecki and Stramski 2004; Miller and McKee 2004; Son and Wang 2012; Tilstone et al. 2013). A major limitation of such empirical models developed for a particular coastal water is that they may not work well in other coastal environments, which therefore require new in situ data and parameter values optimized for those waters. In addition, colored dissolved organic matter, chlorophyll-a (chl-a) concentration, and the types and sizes of suspended particulate matter (SPM) affect the spectral response of coastal waters in complex ways, which makes it even more difficult to transfer an algorithm developed for one coastal water to another (e.g., Nechad, Ruddick, and Park 2010; Zhang et al. 2010; Son and Wang 2012; Tilstone et al. 2013; Choi et al. 2014).

Recently, machine learning approaches, which unlike conventional statistical approaches do not require assumptions about the data distribution, have proven useful in various remote sensing classification and regression tasks (Chau 2006; Gleason and Im 2012; Gong, Im, and Mountrakis 2011; Gong et al. 2012; Lu et al. 2010; Li, Im, and Beier 2013; Li et al. 2014; Rhee et al. 2008; Singh, Basant, and Gupta 2011; Vilas, Spyrakos, and Palenzuela 2011). Despite this advantage, few studies have applied machine learning approaches such as random forest (RF), regression trees, and support vector regression (SVR) to coastal or ocean water quality estimation from remote sensing data. For example, Camps-Valls et al. (2006) evaluated relevance vector machines for estimating chl-a concentration using the SeaBAM dataset, which includes 919 in situ measurements from around the United States and Europe. Mulia et al. (2013) developed a hybrid artificial neural network and genetic algorithm (ANN-GA) model for predicting water turbidity and chl-a concentration in the Singapore and Johor Straits. While machine learning approaches have been reported as promising tools in the literature, they tend to overfit the input sample data, especially when input data are limited (Im et al. 2012; Mountrakis, Im, and Ogole 2011).

The west coast of Korea is characterized by shallow water depths, extensive tidal flats, strong tidal currents, and various materials from inland water systems, which make it difficult to quantify water quality parameters such as chl-a and SPM concentrations using remote sensing (Kang et al. 2009; Choi et al. 2012). In addition, many civil structures constructed along the west coast during the past few decades have significantly changed the marine environment, making water quality variations more dynamic (Choi et al. 2012; Jung and Kim 2005). No robust model yet exists to accurately estimate water quality parameters in this region from satellite data such as MODIS and Geostationary Ocean Color Imager (GOCI) imagery.

The aim of this study was to evaluate machine learning approaches for estimating two major water quality indicators, chl-a and SPM concentrations, at two selected regions along the west coast of South Korea using GOCI satellite data. The specific objectives were to (1) assess three machine learning approaches for coastal water quality estimation, namely RF, Cubist regression trees (Cubist), and SVR; (2) examine the relative importance of the predictor variables for each model; and (3) map and explore the spatiotemporal distributions of chl-a concentration and turbidity from GOCI images using the selected machine learning model.


Study area and data

The Yellow Sea is a shallow marginal sea with an average depth of only 44 m (Wei et al. 2010). It is bounded by the Chinese mainland on the west and the Korean Peninsula on the east. It is strongly affected by physical factors such as tidal forcing, river discharge, and wind forcing, and the intensive interaction among these processes makes the physical, and thus environmental, conditions of the Yellow Sea complex. The major tidal components in the Yellow Sea are predominantly semidiurnal (M2, S2, and N2), overlaid with some diurnal components (K1 and O1). Tidal ranges along the coast of South Korea can reach up to 10 m in Gyeonggi Bay (Choi and Kim 2006).
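To see why hourly GOCI acquisitions sample changing tidal phases, the tide can be written as a sum of the harmonic constituents named above. The sketch below is a simplification: it uses the textbook constituent periods, but the amplitudes and phases are purely hypothetical (not fitted to Gyeonggi Bay), and nodal corrections and mean sea level are ignored, so it is an illustration rather than a tidal prediction.

```python
import math

# Standard periods (hours) of the constituents named in the text.
PERIODS_H = {
    "M2": 12.4206012,   # principal lunar semidiurnal
    "S2": 12.0,         # principal solar semidiurnal
    "N2": 12.65834751,  # larger lunar elliptic semidiurnal
    "K1": 23.93446966,  # lunisolar diurnal
    "O1": 25.81934171,  # principal lunar diurnal
}

def tidal_elevation(t_hours, amplitudes_m, phases_deg=None):
    """Elevation (m) = sum_k A_k * cos(2*pi*t/T_k - phi_k).

    t_hours is measured from an arbitrary reference time. amplitudes_m and
    phases_deg map constituent names to amplitude (m) and phase lag (deg);
    missing phases default to 0. Nodal corrections are ignored.
    """
    phases_deg = phases_deg or {}
    eta = 0.0
    for name, amp in amplitudes_m.items():
        omega = 2.0 * math.pi / PERIODS_H[name]
        phi = math.radians(phases_deg.get(name, 0.0))
        eta += amp * math.cos(omega * t_hours - phi)
    return eta

# Hypothetical amplitudes (m), chosen only for illustration.
amps = {"M2": 2.9, "S2": 1.1, "N2": 0.55, "K1": 0.4, "O1": 0.3}
# Elevations at eight hourly acquisition times (9:30 to 16:30).
series = [round(tidal_elevation(9.5 + k, amps), 2) for k in range(8)]
print(series)
# Fraction of one M2 cycle spanned by the 7 h acquisition window.
print(round(7 / PERIODS_H["M2"], 2))
```

Because the daily acquisition window spans roughly 56% of an M2 cycle, one day of GOCI images covers both rising and falling tide, which is what makes the tidal-phase analysis in the abstract possible.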

Two sites were selected for this study: site A in the central part of the Peninsula's west coast near the city of Incheon, and site B around the western tip of the Peninsula adjacent to the city of Mokpo (Figure 1). A total of 63 in situ chl-a and SPM samples were collected on four days in 2011 and 2012 and used as reference data. Samples were obtained from surface water using a plastic bowl and filtered through 25 mm and 47 mm GF/F filters to estimate chl-a and SPM concentrations, respectively. Water quality at each sampling location was measured in duplicate or triplicate for accuracy. In addition, the optical properties of surface water were measured with a hand-held spectroradiometer: the total water-leaving radiance (Lw), sky radiance, and downwelling irradiance were measured at each sampling location. Each optical property was measured three times, and the measurements were averaged and then quality-controlled using the methods described in Moon et al. (2012). The measured spectra were converted into remote sensing reflectance (Rrs) for the eight GOCI bands by weighting them with the GOCI spectral response functions. Table 1 summarizes the basic statistics of the in situ chl-a and SPM measurements and the tidal conditions over the four field sampling days. Notably, relatively high chl-a concentrations were found at site A on 19–20 June 2012, while very high SPM concentrations were recorded at site B on 26 October 2011.
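The conversion from field spectra to GOCI-band Rrs can be sketched in two steps. The paper defers the processing details to Moon et al. (2012), so the first step assumes the common above-water form Rrs = (Lt − ρ·Lsky)/Ed, where Lt is the total upwelling radiance above the water, Lsky the sky radiance, Ed the downwelling irradiance, and ρ a sea-surface reflectance factor (0.028 is a value often used in the literature and is an assumption here). The second step weights the spectrum by a band response; the Gaussian response below is a hypothetical stand-in for the actual GOCI response functions.

```python
import math

def rrs_above_water(lt, lsky, ed, rho=0.028):
    """Remote sensing reflectance (sr^-1) from above-water measurements.

    Subtracts skylight reflected at the surface (rho * Lsky) from the total
    upwelling radiance, then normalizes by downwelling irradiance.
    rho = 0.028 is an assumed, commonly used value.
    """
    return (lt - rho * lsky) / ed

def gaussian_response(wavelengths, center_nm, fwhm_nm):
    """Hypothetical Gaussian spectral response (stand-in for GOCI's real one)."""
    sigma = fwhm_nm / (2.0 * math.sqrt(2.0 * math.log(2.0)))
    return [math.exp(-0.5 * ((w - center_nm) / sigma) ** 2) for w in wavelengths]

def band_weighted(wavelengths, spectrum, response):
    """Band value = integral(spectrum * response) / integral(response).

    Both integrals use the trapezoidal rule on the given wavelength grid.
    """
    num = den = 0.0
    for i in range(len(wavelengths) - 1):
        dw = wavelengths[i + 1] - wavelengths[i]
        num += 0.5 * dw * (spectrum[i] * response[i] + spectrum[i + 1] * response[i + 1])
        den += 0.5 * dw * (response[i] + response[i + 1])
    return num / den

# Example: 1 nm grid and a made-up, linearly increasing Rrs spectrum.
wl = list(range(400, 901))
rrs = [0.002 + 0.00001 * (w - 400) for w in wl]
srf_555 = gaussian_response(wl, 555, 20)  # 20 nm FWHM is an assumption
print(band_weighted(wl, rrs, srf_555))
```

For a linear spectrum and a symmetric response, the band value equals the spectrum at the band center, which is a quick sanity check on the weighting.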

GOCI is the world’s first ocean color observation satellite placed in a geostationary orbit. GOCI covers a 2500 km × 2500 km area around the Korean Peninsula, including Japan, eastern China, and portions of Mongolia and Russia, and collects eight images per day at 500 m resolution at hourly intervals between 9:00 and 16:00. It has six visible bands centered at 412, 443, 490, 555, 660, and 680 nm and two near-infrared (NIR) bands centered at 745 and 865 nm (Choi et al. 2012). A total of 32 GOCI images collected on the same four days as the in situ measurements in 2011 and 2012 were used to estimate chl-a and SPM: eight images between 9:30 and 16:30 on each of 11 June 2011, 26 October 2011, 19 June 2012, and 20 June 2012. Over the west coast of Korea, GOCI data are acquired approximately 30 minutes after each hour.

Figure 1. Two study sites and in situ sampling locations in this study.

Table 1. Basic statistics of in situ measurements from the field surveys, along with tidal conditions, wind speed, current velocity, and river discharge, on each day of data collection at the two study sites.

Field sampling date (location) | Sample size | Chl-a mean (mg/m³) | Chl-a standard deviation (mg/m³) | SPM mean (g/m³) | SPM standard deviation (g/m³) | Tidal range (m)* | Wind speed (m/s)** | Current velocity (cm/s)*** | River discharge (m³/s)****
11 June 2011 (site A) | 17 | 1.05 | 0.99 | 2.06 | 1.08 | 5.42 | 3.9 | 18.65 | 565.08
26 October 2011 (site B) | 13 | 2.12 | 0.69 | 77.54 | 48.61 | 4.58 | 3.4 | 35.37 | –
19 June 2012 (site A) | 17 | 8.07 | 2.14 | 3.05 | 0.89 | 7.21 | 3.4 | 42.03 | 455.85
20 June 2012 (site A) | 16 | 5.44 | 1.33 | 2.83 | 0.69 | 7.34 | 2.7 | 29.03 | 511.76

Notes: *Measured at Incheon station for site A and Mokpo station for site B (by the Korea Hydrographic and Oceanographic Administration).
**Hourly data from 9 am to 4 pm, averaged, measured at Dukjuk-do station for site A and Mokpo station for site B (by the Korea Meteorological Administration).
***Mean of in situ observations using RCM9 at 17 stations during the 2011 field survey for site A and at 21 stations for site B; buoy observations (9 am–4 pm) at the Youngjong bridge station in 2012 for site A.
****Sum of river discharge measured at the Han River and Imjin River stations for site A (available from the Water Management Information System).

Atmospheric correction was performed on all GOCI images to remove the radiance reflected by the intervening atmosphere and the air–water interface and to generate images of Lw radiance. For GOCI atmospheric correction in the present study, we employed a modified version of the Management Unit of the North Sea Mathematical Models (MUMM) algorithm of Ruddick, Ovidio, and Rijkeboer (2000), which has been successfully applied in turbid water remote sensing (Choi et al. 2012; Fettweis, Nechad, and Van den Eynde 2007; Kowalczuk et al. 2010). The algorithm calculates the aerosol and water contributions to the satellite reflectance on a per-pixel basis. It assumes that the ratio of aerosol reflectance between the two NIR bands (GOCI band 7 to band 8) is spatially constant, while the ratio of water reflectance between the same two bands is allowed to change in proportion to turbidity, which avoids underestimating water reflectance in highly turbid water (Lee, Ahn, et al. 2012; Choi et al. 2013). Rrs images were then generated from the atmospherically corrected GOCI images.
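The per-pixel NIR separation at the core of MUMM-type correction can be sketched as a two-equation solve. This is a minimal sketch of the original two-band decomposition, not the study's modified implementation: it assumes that, in bands 7 and 8, the reflectance after Rayleigh correction (here labeled rho_rc, our notation) is the sum of aerosol and water parts, with a scene-constant aerosol ratio epsilon and a water ratio alpha. The study lets alpha vary with turbidity, but that dependence is not given here, so alpha is simply passed in; extrapolating the aerosol signal to the visible bands is also omitted.

```python
def mumm_nir_split(rho_rc7, rho_rc8, alpha, epsilon):
    """Separate aerosol and water reflectance in GOCI bands 7 and 8.

    Assumes in each NIR band rho_rc = rho_aerosol + rho_water, with
      epsilon = rho_aerosol(7) / rho_aerosol(8)  (spatially constant)
      alpha   = rho_water(7)   / rho_water(8)    (supplied per pixel here)
    Returns ((aerosol7, aerosol8), (water7, water8)).
    """
    if abs(alpha - epsilon) < 1e-12:
        raise ValueError("alpha and epsilon must differ for a unique solution")
    # From rho_rc7 = epsilon*a8 + alpha*w8 and rho_rc8 = a8 + w8.
    water8 = (rho_rc7 - epsilon * rho_rc8) / (alpha - epsilon)
    aerosol8 = rho_rc8 - water8
    return (epsilon * aerosol8, aerosol8), (alpha * water8, water8)

# Hypothetical pixel values and ratios, for illustration only.
aerosol, water = mumm_nir_split(0.0195, 0.015, alpha=1.8, epsilon=1.05)
print(aerosol, water)
```

The solve becomes ill-conditioned as alpha approaches epsilon, which is one reason the water-reflectance ratio matters in very turbid pixels.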

Methods

Three machine learning regression models (RF, Cubist, and SVR) were evaluated for estimating chl-a and SPM concentrations using in situ spectral reflectance and GOCI-derived Lw radiance data. A total of 11 input features were used in the models: the atmospherically corrected radiance data from the eight GOCI bands, plus three band ratios, band 2 (centered at 443 nm)/band 4 (centered at 555 nm), band 3 (centered at 490 nm)/band 4, and band 4/band 6 (centered at 680 nm). These ratios are often used in water quality estimation because of their high correlation with target variables such as chl-a and SPM concentrations.
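The 11-feature input vector described above can be assembled per sample or pixel as follows. This is a minimal sketch; the function and feature names are our own labels, and the band values in the example are hypothetical.

```python
# GOCI band centers (nm) for bands 1-8, as listed in the text.
GOCI_CENTERS_NM = (412, 443, 490, 555, 660, 680, 745, 865)

FEATURE_NAMES = [f"band{i}" for i in range(1, 9)] + [
    "band2/band4",  # 443 / 555 nm
    "band3/band4",  # 490 / 555 nm
    "band4/band6",  # 555 / 680 nm
]

def build_features(bands):
    """Return the 11 predictors: 8 band values followed by 3 band ratios.

    `bands` holds the values for GOCI bands 1-8 in order.
    """
    if len(bands) != len(GOCI_CENTERS_NM):
        raise ValueError("expected one value per GOCI band (8 values)")
    b1, b2, b3, b4, b5, b6, b7, b8 = bands
    return [b1, b2, b3, b4, b5, b6, b7, b8, b2 / b4, b3 / b4, b4 / b6]

# Hypothetical band values for one pixel.
pixel = [1.2, 1.4, 1.8, 2.0, 1.0, 0.8, 0.3, 0.2]
features = build_features(pixel)
print(dict(zip(FEATURE_NAMES, features)))
```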

Random forest (RF)

RF predicts a dependent variable by combining the predictions of multiple classification and regression trees (CART); each node is split using the Gini index for classification or, for regression as in this study, the reduction in the residual sum of squares. A well-known weakness of a single CART is its sensitivity to the training data (i.e., training bias from a single tree). RF adopts two levels of randomness to improve its robustness to training data outliers and to overfitting (Breiman 2001): a randomly selected (bootstrap) subset of training samples is used to grow each tree, and a randomly selected subset of predictor variables is considered at each node to determine the split feature. The many decorrelated trees produced by this randomness are then combined; for regression, their predictions are averaged. One advantage of RF is that it provides an internal error estimate without separate validation data: the training samples excluded from each tree's bootstrap sample (the out-of-bag samples) are used to calculate the error of that tree, which provides information about the relative strength and correlation of the trees (Breiman 2001). In addition, the relative importance of each feature (i.e., variable) can be determined from the increase in out-of-bag error when the values of that feature are randomly permuted. The RF model was implemented in the R statistical software (version 3.0.2; http://www.r-project.org/) using the randomForest add-on package, which is based on the work of Breiman and Cutler (Liaw and Wiener 2002), with default settings except that 1000 trees were used instead of the default 500.
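The two levels of randomness, the out-of-bag (OOB) error, and permutation importance can be illustrated with a toy sketch in plain Python. It is not the randomForest package and simplifies heavily: each "tree" is a single-split regression stump, so the random feature subset for its only node is drawn once per tree; importance is the raw mean increase in per-tree OOB MSE after permuting a feature, without further scaling; and the data are synthetic.

```python
import random

def avg(values):
    values = list(values)
    return sum(values) / len(values)

def fit_stump(X, y, features):
    """Depth-1 regression tree: the split minimizing the residual sum of squares."""
    best = None
    for f in features:
        values = sorted(set(row[f] for row in X))
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2.0
            left = [t for row, t in zip(X, y) if row[f] <= thr]
            right = [t for row, t in zip(X, y) if row[f] > thr]
            ml, mr = avg(left), avg(right)
            rss = sum((t - ml) ** 2 for t in left) + sum((t - mr) ** 2 for t in right)
            if best is None or rss < best[0]:
                best = (rss, f, thr, ml, mr)
    if best is None:  # candidate features are constant: predict the mean
        m = avg(y)
        return lambda row: m
    _, f, thr, ml, mr = best
    return lambda row: ml if row[f] <= thr else mr

def fit_forest(X, y, n_trees=200, mtry=1, seed=0):
    """Grow stumps on bootstrap samples, each with a random feature subset."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    forest = []
    for _ in range(n_trees):
        boot = [rng.randrange(n) for _ in range(n)]   # randomness 1: samples
        feats = rng.sample(range(p), mtry)            # randomness 2: features
        tree = fit_stump([X[i] for i in boot], [y[i] for i in boot], feats)
        oob = sorted(set(range(n)) - set(boot))       # out-of-bag samples
        forest.append((tree, oob))
    return forest

def predict(forest, row):
    """Regression output: the unweighted mean of all tree predictions."""
    return avg(tree(row) for tree, _ in forest)

def permutation_importance(forest, X, y, seed=1):
    """Mean increase in per-tree OOB MSE after permuting each feature."""
    rng = random.Random(seed)
    p = len(X[0])
    increase = [0.0] * p
    used = 0
    for tree, oob in forest:
        if len(oob) < 2:
            continue
        base = avg((tree(X[i]) - y[i]) ** 2 for i in oob)
        for j in range(p):
            shuffled = [X[i][j] for i in oob]
            rng.shuffle(shuffled)
            errs = []
            for i, v in zip(oob, shuffled):
                row = list(X[i])
                row[j] = v
                errs.append((tree(row) - y[i]) ** 2)
            increase[j] += avg(errs) - base
        used += 1
    return [v / used for v in increase]

# Synthetic data: y depends on feature 0 only; feature 1 is noise.
rng = random.Random(42)
X = [[rng.random(), rng.random()] for _ in range(60)]
y = [3.0 * row[0] + 0.1 * rng.random() for row in X]
forest = fit_forest(X, y)
print(permutation_importance(forest, X, y))
```

With the target driven only by feature 0, its importance should come out clearly larger than that of the noise feature; the same logic underlies the importance rankings reported later for the GOCI bands and ratios.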

Cubist regression trees (Cubist)

Cubist, developed by RuleQuest Research, Inc., is commercially available rule-based regression software that has been widely applied in remote sensing regression studies (Blackard et al. 2008; Chen et al. 2011; Im et al. 2009, 2012). Unlike CART-based regression trees (e.g., those in RF), which assign a single constant value to each terminal node, Cubist uses a modified regression tree system with instance-based criteria that produces rule-based multivariate regression output (i.e., each rule is associated with a multivariate regression model). Because Cubist generates rules, its results are more straightforward to interpret than those of RF. Walton (2008) notes that Cubist runs much faster than CART-based regression trees. When multiple rules apply to the same case, Cubist makes the final prediction by averaging the results of all of the applicable rule-based multivariate regressions.
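The rule-averaging step can be sketched as follows. This is a simplified, hypothetical illustration rather than RuleQuest's implementation: each rule is a list of conditions plus a linear model, and the prediction is the mean of the linear models of all rules whose conditions hold. Cubist features such as committees and instance-based (nearest-neighbor) adjustment are omitted, and the fallback value for uncovered cases is our own choice.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# A rule: (conditions, intercept, {feature index: coefficient}).
Rule = Tuple[List[Callable[[Sequence[float]], bool]], float, Dict[int, float]]

def rule_applies(rule: Rule, x: Sequence[float]) -> bool:
    conditions, _, _ = rule
    return all(cond(x) for cond in conditions)

def linear_model(rule: Rule, x: Sequence[float]) -> float:
    _, intercept, coefs = rule
    return intercept + sum(c * x[i] for i, c in coefs.items())

def predict_rules(rules: List[Rule], x: Sequence[float], fallback: float) -> float:
    """Average the linear models of every rule whose conditions hold for x."""
    outputs = [linear_model(r, x) for r in rules if rule_applies(r, x)]
    return sum(outputs) / len(outputs) if outputs else fallback

# Hypothetical rules on features (band4, band5); coefficients are made up.
rules: List[Rule] = [
    ([lambda x: x[0] <= 0.02], 1.0, {0: 50.0}),
    ([lambda x: x[0] > 0.01], 0.5, {0: 20.0, 1: 30.0}),
    ([lambda x: x[1] > 0.05], 10.0, {1: 100.0}),
]
# Rules 1 and 2 overlap for band4 between 0.01 and 0.02, so both are averaged.
print(predict_rules(rules, [0.015, 0.01], fallback=0.0))
```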

Support vector regression (SVR)

SVR is the regression version of support vector machines (SVM), which have been widely used in remote sensing applications, including forestry mapping (Li, Im, and Beier 2013), land cover classification (Kavzoglu and Colkesen 2009), and habitat monitoring (Boyd, Sanchez-Hernandez, and Foody 2006). SVR has been adopted for various regression tasks such as forest leaf-area-index estimation (Durbha, King, and Younan 2007), carbon stock estimation (Li et al. 2014), and soil property quantification (Ballabio 2009). Many studies have reported strong predictive performance of SVM/SVR, especially when the training sample is small (Pal and Mather 2005). A detailed description of SVM/SVR can be found in Mountrakis, Im, and Ogole (2011). Rather than separating classes as SVM does, SVR fits a function that deviates from the observed values of the dependent variable by at most a tolerance ε for as many training samples as possible while remaining as flat as possible; samples falling outside this ε-insensitive zone are penalized. A kernel function is typically used to map the original feature space into a higher-dimensional space in which such a function can be fit more easily. Commonly used kernel functions include linear, polynomial, Gaussian, sigmoid, spectral angle, and radial basis functions. The optimal solution is found iteratively, by adjusting the fitted function based on the errors associated with it.

In this study, several kernel functions were tested, and the radial basis kernel function was selected for further analysis based on its performance. The Library for Support Vector Machines (LIBSVM), an open-source library (http://www.csie.ntu.edu.tw/~cjlin/libsvm/), was used to implement SVR (Chang and Lin 2001). SVR was run as a stand-alone tool developed in the C# language with the Geospatial Data Abstraction Library (Gleason and Im 2012). A grid search optimization algorithm (Hsu, Chang, and Lin 2010; Huang and Wang 2006) was used in the tool to determine the two major parameters of the radial basis kernel function.
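A grid search over radial basis kernel SVR parameters can be sketched without LIBSVM as below. This is a simplified stand-in, not the study's C# tool: it solves the ε-SVR dual by coordinate descent with no explicit bias term (adding 1 to the kernel absorbs a regularized intercept, a common simplification, whereas LIBSVM uses an SMO-type solver with an explicit bias). We assume the two tuned parameters are the cost C and the kernel width γ, searched over an exponential (powers of 2) grid, and that candidates are scored by leave-one-out RMSE; the paper does not state its search ranges or scoring criterion.

```python
import math

def rbf(a, b, gamma):
    """Radial basis kernel exp(-gamma * ||a - b||^2)."""
    return math.exp(-gamma * sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def fit_svr(X, y, C, gamma, epsilon=0.1, sweeps=100):
    """Bias-free epsilon-SVR solved by coordinate descent on the dual.

    Minimizes 0.5*b'Kb - y'b + epsilon*sum|b_i| subject to -C <= b_i <= C,
    where K_ij = rbf(x_i, x_j) + 1 (the +1 acts as a regularized intercept).
    """
    n = len(X)
    K = [[rbf(X[i], X[j], gamma) + 1.0 for j in range(n)] for i in range(n)]
    beta = [0.0] * n
    for _ in range(sweeps):
        for i in range(n):
            s = sum(K[i][j] * beta[j] for j in range(n) if j != i)
            g = y[i] - s
            # Soft-threshold by epsilon (insensitive zone), then clip to [-C, C].
            b = math.copysign(max(abs(g) - epsilon, 0.0), g) / K[i][i]
            beta[i] = min(C, max(-C, b))

    def predict(x):
        return sum(bj * (rbf(xj, x, gamma) + 1.0) for bj, xj in zip(beta, X))

    return predict

def loo_rmse(X, y, C, gamma, epsilon=0.1):
    """Leave-one-out RMSE of the sketch SVR for one (C, gamma) pair."""
    sq = []
    for k in range(len(X)):
        model = fit_svr(X[:k] + X[k + 1:], y[:k] + y[k + 1:], C, gamma, epsilon)
        sq.append((model(X[k]) - y[k]) ** 2)
    return math.sqrt(sum(sq) / len(sq))

def grid_search(X, y, Cs, gammas, epsilon=0.1):
    """Return (best LOO RMSE, C, gamma) over the full parameter grid."""
    best = None
    for C in Cs:
        for gamma in gammas:
            score = loo_rmse(X, y, C, gamma, epsilon)
            if best is None or score < best[0]:
                best = (score, C, gamma)
    return best

# Synthetic 1-D example; the grid uses powers of 2.
X = [[i / 11] for i in range(12)]
y = [2.0 * x[0] for x in X]
grid = [2.0 ** k for k in (-3, 0, 3)]
print(grid_search(X, y, Cs=grid, gammas=grid))
```

An exponentially spaced grid covers several orders of magnitude with few candidates, which matters here because every candidate requires a full leave-one-out loop.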

Accuracy assessment

Because of the limited number of samples, leave-one-out cross validation (CV) was performed to assess the three machine learning models for water quality estimation: each of the 63 samples was withheld in turn, the model was trained on the remaining 62 samples, and the withheld sample was predicted. In the data mining and machine learning communities, leave-one-out CV has often been used for model evaluation and selection because it provides an almost unbiased estimate with minimal manual intervention (Refaeilzadeh, Tang, and Liu 2009). Five accuracy metrics were used: the coefficient of determination (R²), calibration root-mean-squared error (RMSE), relative RMSE, CV RMSE, and relative CV RMSE. Relative RMSE is a relative error estimate that allows results to be compared on the same scale, especially when different samples or target variables are used. Relative (CV) RMSE is calculated by dividing the (CV) RMSE by the mean of all observed samples of the dependent variable.
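The accuracy metrics and the leave-one-out loop can be sketched as below. Two points are assumptions: R² is computed here as the squared Pearson correlation between measured and predicted values (1 − SSE/SST is a common alternative), and `fit_line` is a hypothetical stand-in model used only to exercise the loop. The final lines pool the Table 1 means by sample size and divide the abstract's CV RMSEs by them.

```python
import math

def rmse(obs, pred):
    """Root-mean-squared error."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))

def relative_rmse(rmse_value, obs_mean):
    """(CV) RMSE divided by the mean of the observed values."""
    return rmse_value / obs_mean

def r_squared(obs, pred):
    """Squared Pearson correlation between observed and predicted values."""
    n = len(obs)
    mo, mp = sum(obs) / n, sum(pred) / n
    cov = sum((o - mo) * (p - mp) for o, p in zip(obs, pred))
    var_o = sum((o - mo) ** 2 for o in obs)
    var_p = sum((p - mp) ** 2 for p in pred)
    return cov * cov / (var_o * var_p)

def loo_predictions(fit, X, y):
    """Leave-one-out CV: refit without sample k, then predict sample k."""
    preds = []
    for k in range(len(X)):
        model = fit(X[:k] + X[k + 1:], y[:k] + y[k + 1:])
        preds.append(model(X[k]))
    return preds

def fit_line(X, y):
    """Hypothetical stand-in model: least-squares line on one predictor."""
    xs = [row[0] for row in X]
    mx, my = sum(xs) / len(xs), sum(y) / len(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(xs, y)) / sum((a - mx) ** 2 for a in xs)
    return lambda row: my + slope * (row[0] - mx)

def pooled_mean(groups):
    """Overall mean from (sample size, group mean) pairs."""
    return sum(n * m for n, m in groups) / sum(n for n, _ in groups)

# Table 1 means by field day: (sample size, mean).
chl_groups = [(17, 1.05), (13, 2.12), (17, 8.07), (16, 5.44)]
spm_groups = [(17, 2.06), (13, 77.54), (17, 3.05), (16, 2.83)]
chl_mean, spm_mean = pooled_mean(chl_groups), pooled_mean(spm_groups)
print(round(chl_mean, 2), round(100 * relative_rmse(1.74, chl_mean), 1))
print(round(spm_mean, 2), round(100 * relative_rmse(11.42, spm_mean), 1))
```

The pooled means (about 4.28 mg/m³ for chl-a and 18.1 g/m³ for SPM) turn the abstract's CV RMSEs of 1.74 mg/m³ and 11.42 g/m³ into 40.7% and 63.1%, matching the reported relative CV RMSEs.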

Results and discussion

Three machine learning models were used to predict two water quality measures, chl-a and SPM concentrations, from in situ spectra of surface water measured with a hand-held spectroradiometer. Figure 2 shows the correlation between the predicted and measured values for the three modeling approaches. For chl-a concentration, R² between the measured and predicted values ranges from 0.68 to 0.86, with SVR showing the best correlation. Both RMSE and CV RMSE are also the smallest for the SVR model (1.18–1.64 mg/m³; 27.5–38.4%). In contrast, for SPM concentration the Cubist method yielded the highest R² (0.99), followed very closely by SVR (0.98) and then RF. The calibration RMSE for SPM using Cubist (3.86 g/m³; 21.3%) was the smallest, about two-thirds of that for SVR (5.65 g/m³; 31.2%). However, the SVR model yielded the smallest CV RMSE (8.75 g/m³; 48.3%) in SPM estimation.

Figure 2. Scatterplots between observations and predicted values from three machine learning models for chl-a and SPM concentrations when using in situ spectra: (a) chl-a concentration using RF, (b) chl-a concentration using Cubist, (c) chl-a concentration using SVR, (d) SPM concentration using RF, (e) SPM concentration using Cubist, and (f) SPM concentration using SVR.

Figure 3 shows the same measures (i.e., R², RMSE, and CV RMSE) comparing water quality values predicted from GOCI data with the measured samples. For chl-a concentration, the SVR approach outperformed the other two methods, with the highest R² (0.91, versus 0.79 for Cubist and 0.57 for RF) and the lowest RMSE (0.99 mg/m³; 23%) and CV RMSE (1.74 mg/m³; 40.7%). This is consistent with the results based on the in situ spectra from the hand-held spectroradiometer. For SPM concentration, SVR and Cubist performed almost identically (R² of ~0.97 and RMSE of ~6 g/m³; 32%), whereas RF yielded a lower R² (0.87) and a larger calibration RMSE (13.47 g/m³; 74.4%). Although R² values were higher for SPM than for chl-a concentration because of a few samples with very large SPM concentrations, the relative RMSEs, which allow relative comparison between models, were higher for SPM than for chl-a because of the different ranges of the concentrations.

Although RF showed the worst performance among the three machine learning approaches, the difference between calibration RMSE and CV RMSE was the smallest for RF, which suggests that RF is more stable than the other two methods. The relatively poor performance of RF could result from the small sample size: RF determines the final output by combining numerous decorrelated trees, each developed from a different bootstrap sample that contains, on average, only about 63% of the distinct training samples. Indeed, RF has proven robust especially when large samples are used (Li et al. 2014; Lu et al. 2013; Yoo, Im, and Wagner 2012). On the other hand, the superior performance of SVR agrees with the literature, which reports that SVR works particularly well with a limited number of samples (Mountrakis, Im, and Ogole 2011).
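The small-sample argument can be made concrete: a bootstrap sample of size n drawn with replacement contains, on average, a fraction 1 − (1 − 1/n)^n of the distinct samples. The sketch below evaluates this for the 63 in situ samples and checks it by simulation; it is a generic property of bootstrapping, not a reproduction of the study's RF runs.

```python
import random

def expected_unique_fraction(n):
    """Expected share of distinct samples in a bootstrap sample of size n."""
    return 1.0 - (1.0 - 1.0 / n) ** n

def simulated_unique_fraction(n, trials=2000, seed=0):
    """Monte Carlo estimate of the same quantity."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        total += len({rng.randrange(n) for _ in range(n)})
    return total / (trials * n)

n = 63  # number of in situ samples in this study
frac = expected_unique_fraction(n)
print(round(frac, 3), round(n * frac, 1))
print(round(simulated_unique_fraction(n), 3))
```

Each tree therefore sees only about 40 distinct samples, which helps explain why RF's averaging offers limited benefit at this sample size.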

Figure 3. Scatterplots between observations and predicted values from three machine learning models for chl-a and SPM concentrations when using GOCI-derived radiance data: (a) chl-a concentration using RF, (b) chl-a concentration using Cubist, (c) chl-a concentration using SVR, (d) SPM concentration using RF, (e) SPM concentration using Cubist, and (f) SPM concentration using SVR.

RF provides the relative importance of input variables using out-of-bag data: the increase in mean squared error (MSE) when the values of a variable are randomly permuted in the out-of-bag samples is calculated and used to assess that variable's relative importance. When hand-held spectroradiometer data were used to predict chl-a concentration, MSE increased the most when the ratio of band 4 to band 6 was permuted (Figure 4(a)), which implies that this ratio is the most influential input for predicting chl-a concentration from the hand-held spectroradiometer data. Apart from this ratio, bands 3 through 8 appeared almost equally important in predicting chl-a. When estimating SPM concentration, unlike the chl-a case, no single input variable produced a markedly larger increase in MSE than the others; instead, bands 5 through 8 were more influential than the other bands (Figure 4(a)). Figure 4(b) shows the relative importance of input variables when GOCI-derived data were used to predict the water quality parameters. For the chl-a models, the ratio of band 2 to band 4 produced the largest increase in MSE. For SPM concentration, however, bands 6 and 5 were the most influential input variables, which is consistent with the linear regression model using GOCI band 5 to estimate SPM in Choi et al. (2012, 2014).

Figure 4. Relative variable importance (i.e., increase in MSE when the values of a variable were permuted in the out-of-bag samples) calculated from RF. (a) Using in situ spectral samples. (b) Using GOCI-derived radiance data.

Cubist provides variable usage information from the resulting rules and conditions, which can also be used to evaluate the relative importance of input variables (Table 2). For chl-a concentration estimated from GOCI-derived data, only band 4 was used in the rule conditions, while many of the input variables were used equally (100%) in the multivariate models. Interestingly, similar to the RF results, band 8 and band 6 appeared to be the most important features for predicting SPM concentration from in situ spectra and GOCI-derived data, respectively. The current GOCI standard algorithms for chl-a and SPM concentrations are based on empirical relationships derived from historical in situ data:

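$$\mathrm{Chl\text{-}a} = 1.8528\left(\frac{\left[\{R_{rs}(443) + R_{rs}(490)\} - R_{rs}(412)\right]}{R_{rs}(555)}\right)^{-3.263} \quad \left(\mathrm{mg\,m^{-3}}\right) \tag{1}$$

$$\mathrm{SPM} = 945.07\left[R_{rs}(555)\right]^{1.137} \quad \left(\mathrm{g\,m^{-3}}\right) \tag{2}$$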

where Rrs(412), Rrs(443), Rrs(490) and Rrs(555) are the remote sensing reflectances of GOCI bands 1, 2, 3 and 4, respectively (Moon et al. 2012). Since the blue or green band is known to be sensitive to suspended matter in the open ocean (Zhang et al. 2010), GOCI band 4, centered at 555 nm, is suitable for detecting turbidity, as in the GOCI SPM standard algorithm (Equation (2)). However, such an approach is not applicable to highly turbid coastal waters, where suspended matter comes mainly from the resuspension of debris, coastal erosion, and river discharge; in such cases, the red or NIR band could be more suitable (Nechad, Ruddick, and Park 2010). An empirical SPM algorithm using GOCI band 5, centered at 660 nm, was developed by Choi et al. (2012, 2014). Together with this literature, the findings from the present study suggest that GOCI red and NIR bands can be used to operationally predict SPM concentration in coastal waters.

Table 2. Variable usage by Cubist regression trees.

| Input variables | Chl-a concentration | SPM concentration |
|---|---|---|
| In situ spectra | Band 1 (100%) | Band 8 (100%/100%)* |
| | Band 5 (100%) | Band 2/Band 4 (25%/8%) |
| | Band 6 (100%) | Band 4 (100%) |
| | Band 7 (100%) | Band 5 (83%) |
| | Band 2/Band 4 (100%) | Band 6 (83%) |
| | Band 3/Band 4 (100%) | Band 7 (83%) |
| | | Band 4/Band 6 (83%) |
| GOCI-derived radiance | Band 4 (100%/100%) | Band 6 (100%/100%) |
| | Band 1 (100%) | Band 2/Band 4 (17%/89%) |
| | Band 3 (100%) | Band 1 (89%) |
| | Band 5 (100%) | Band 3 (89%) |
| | Band 6 (100%) | Band 4 (89%) |
| | Band 8 (100%) | Band 5 (89%) |
| | Band 4/Band 6 (100%) | Band 4/Band 6 (83%) |

Notes: *The former percentage indicates variable usage in conditions and the latter indicates usage in multivariate models. A single percentage means that the variable was used not in conditions, but in multivariate models.
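For comparison with the machine learning models, the two standard algorithms can be written as plain functions of remote sensing reflectance. This is a minimal sketch that assumes Equation (1) reads Chl-a = 1.8528 ([{Rrs(443) + Rrs(490)} × Rrs(412)] / Rrs(555))^(−3.263), with the operator between the blue-band sum and Rrs(412) taken to be a multiplication, and Equation (2) reads SPM = 945.07 [Rrs(555)]^1.137. The function names are hypothetical; inputs may be floats or NumPy arrays of per-pixel Rrs.

```python
def goci_chla(rrs412, rrs443, rrs490, rrs555):
    """Chl-a in mg m^-3 from the GOCI standard empirical relation.

    Assumed form of Equation (1):
    1.8528 * ([Rrs(443) + Rrs(490)] * Rrs(412) / Rrs(555)) ** -3.263
    """
    index = (rrs443 + rrs490) * rrs412 / rrs555
    return 1.8528 * index ** -3.263


def goci_spm(rrs555):
    """SPM in g m^-3 from Equation (2): a power law in Rrs(555) alone."""
    return 945.07 * rrs555 ** 1.137
```

Because Equation (2) uses only the green band, it inherits the limitation noted above for highly turbid coastal water, where the red (band 5, 660 nm) or NIR bands are expected to be more suitable.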

Based on the accuracy assessment results for estimating both chl-a and SPM concentrations, the SVR model was selected to map the spatial distribution of water quality properties with GOCI data. Figures 5 and 6 show the spatial and temporal variability of GOCI-derived chl-a and SPM concentrations, respectively, in Gyeonggi Bay. Chl-a concentration was generally high in the southern part of Gyeonggi Bay, that is, near Incheon and Namyang Bay (Figure 5), while SPM concentration showed relatively higher values in the northern part of Gyeonggi Bay, that is, the Han River estuary region (Figure 6). In general, SPM concentration is strongly influenced by resuspension processes, and physical processes, including waves, currents, and bathymetry, are the first-order controls on SPM distribution. In contrast, chl-a is mainly controlled by biological conditions such as proximity to nutrient sources (i.e., riverine inputs or upwelling areas). Thus, the different spatial patterns of chl-a and SPM concentrations might imply that more active biological processes (i.e., primary production) occurred in the southern part of Gyeonggi Bay, while active resuspension processes occurred in the northern part of the Bay with high SPM concentration.

Figure 5. Hourly (9:30 am to 4:30 pm) spatial distribution of chl-a concentration estimated using the SVR model on 20 June 2012 for site A. Color figures are available in the online version of this article.

SPM concentration is highly correlated with tidal stage because the alternation of high and low current velocities during maximum and slack tides is the main factor inducing the periodic repetition of resuspension and settling. By contrast, chl-a concentration depends less on tidal stage (Lee, Park, et al. 2013). The temporal change in SPM concentration, shown at 1-hour intervals, seems to be mainly driven by tidal cycles, because the study area is a shallow, macro-tidal environment (Table 1) with well-developed tidal flats. SPM exhibited relatively high concentration (~30 g/m³) in the first image (9:30 am; Figure 6(a)), decreased to its minimum at 2:30 pm (Figure 6(f)), and increased thereafter. Although the tide propagates from south to north in this region, the phase difference within site A is less than 1 hour (Hwang et al. 2014). Thus, each GOCI image is assumed to represent the same tidal stage over the entire area of site A. It should be noted that GOCI-derived SPM represents near-surface concentrations. Therefore, the tidal acceleration phase might not correspond exactly to the timing of the high concentrations displayed in the GOCI-derived images. A coherent relationship between SPM concentration and tidal cycles could be established through simultaneous observation of the vertical profiles of current velocities and SPM concentration, which were not measured in this study.
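Producing hourly maps such as those in Figures 5 and 6 amounts to applying the selected SVR model to every water pixel of each of the eight daily GOCI scenes. The sketch below shows that per-pixel step with NumPy under several assumptions: each scene is an (8, rows, cols) array of band values in the same units used for training, cloud and land pixels are already set to NaN, and the inputs are the eight bands plus the Band 2/Band 4, Band 3/Band 4 and Band 4/Band 6 ratios that appear in Table 2 (the study's exact input set may differ). `build_features`, `map_scene` and `RATIO_PAIRS` are hypothetical names.

```python
import numpy as np

# Hypothetical feature set: GOCI bands 1-8 plus the band ratios listed in Table 2.
RATIO_PAIRS = [(2, 4), (3, 4), (4, 6)]


def build_features(cube):
    """Turn an (8, rows, cols) band cube into an (n_pixels, 11) feature matrix."""
    bands = cube.reshape(cube.shape[0], -1).T  # one row per pixel, row-major order
    with np.errstate(divide="ignore", invalid="ignore"):
        ratios = [bands[:, a - 1] / bands[:, b - 1] for a, b in RATIO_PAIRS]
    return np.column_stack([bands] + ratios)


def map_scene(predict, cube):
    """Apply a fitted regressor pixel by pixel; NaN or inf inputs yield NaN output."""
    cube = np.asarray(cube, dtype=float)
    rows, cols = cube.shape[1:]
    features = build_features(cube)
    # Pixels with any non-finite feature (cloud, land, zero denominators) are skipped.
    valid = np.all(np.isfinite(features), axis=1)
    out = np.full(features.shape[0], np.nan)
    if valid.any():
        out[valid] = predict(features[valid])
    return out.reshape(rows, cols)
```

With a fitted model such as scikit-learn's `SVR`, one day of maps would be `[map_scene(model.predict, cube) for cube in hourly_cubes]`, where `hourly_cubes` is a hypothetical list of the eight scenes from 9:30 am to 4:30 pm.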

Figure 6. Hourly (9:30 am to 4:30 pm) spatial distribution of SPM concentration estimated using the SVR model on 20 June 2012 for site A.


Figures 7 and 8 show further examples of the SVR model estimating chl-a and SPM concentrations, in this case in the southwestern coastal region of Korea (site B in Figure 1). In general, chl-a concentration increased gradually from the first to the last image and reached its maximum around 4:30 pm (Figure 7). SPM concentration, however, was highest at 9:30 am (Figure 8(a)) and then decreased gradually until early afternoon, which is consistent with the tidal stage. In the southwestern coastal waters of Korea (i.e., around site B), strong tidal currents, together with high-energy wind waves, are the dominant factor controlling sediment transport, affecting the resuspension of bottom sediments and the redistribution of sediment according to grain size (e.g., Kim and Lim 2009). Low and high tide in the study area on 26 October 2011 occurred at around 6:00 and 14:00, respectively (Mokpo station; www.khoa.go.kr). Thus, high current velocities at the surface occurred in the early morning and late afternoon, which could make SPM concentration in the surface water high. This is also consistent with the finding of Choi et al. (2012) that current velocities were highest during the flood tide and decreased remarkably around the high-tide slack. Another interesting feature in this area is that chl-a and SPM concentrations were out of phase: high SPM concentration coincided with low chl-a concentration and vice versa. Although a detailed analysis of this phase relationship is beyond the scope of this study, it warrants more process-oriented studies of resuspension and primary production in that region.
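The out-of-phase behavior of chl-a and SPM at site B could be quantified by correlating their hourly site-mean series (for example, `np.nanmean` of each hourly map) at several time lags. The sketch below does this with NumPy. The series are synthetic, built from a semidiurnal (M2, about 12.42 h) cosine so that chl-a mirrors SPM, and are not the study's data; `lagged_correlation` is a hypothetical helper. With only eight hourly samples per day, such correlations are indicative at best.

```python
import numpy as np


def lagged_correlation(a, b, max_lag):
    """Pearson correlation of a[t] with b[t + k] for lags k = -max_lag..max_lag.

    a and b must have the same length (e.g., eight hourly site means).
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    out = {}
    for k in range(-max_lag, max_lag + 1):
        if k >= 0:
            x, y = a[: len(a) - k], b[k:]
        else:
            x, y = a[-k:], b[: len(b) + k]
        out[k] = float(np.corrcoef(x, y)[0, 1])
    return out


# Synthetic hourly site means at 9:30 am ... 4:30 pm (NOT the study's data):
# SPM follows an M2-like cosine and chl-a is its mirror image.
hours = np.arange(8.0)
spm = 20.0 + 10.0 * np.cos(2.0 * np.pi * hours / 12.42)
chla = 3.0 - 1.0 * np.cos(2.0 * np.pi * hours / 12.42)
r = lagged_correlation(chla, spm, max_lag=2)
```

A correlation near −1 at lag 0 indicates the anti-phase pattern described above; a peak at a nonzero lag would instead point to a phase lag between resuspension and the chl-a response.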

Figure 7. Hourly (9:30 am to 4:30 pm) spatial distribution of chl-a concentration estimated using the SVR model on 26 October 2011 for site B.


Conclusions

The present study examined three machine learning approaches for water quality monitoring in Korean coastal waters using high-temporal-resolution (eight times a day) GOCI satellite data. SVR produced the best results among the three models regardless of the input data used (i.e., in situ spectra or GOCI-derived data), yielding R² ~ 0.9 and a CV RMSE of ~1.6–1.7 mg/m³ (39%) for chl-a concentration, and R² ~ 0.98 and a CV RMSE of 8.8–11.4 g/m³ (48–63%) for SPM. The relative importance of the predictor variables was examined for each model except SVR. RF and Cubist showed similar patterns in identifying important variables for estimating water quality parameters. The important variables identified by RF and Cubist for estimating SPM concentration from GOCI data agreed well with previous studies that estimated SPM from GOCI data using a simple nonlinear regression. Maps of two water quality measures, chl-a and SPM concentrations, were prepared for two dates in 2011 and 2012 based on the SVR approach. The hourly GOCI results showed that the spatial and temporal distributions of chl-a and SPM were influenced by tidal phases.
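The CV RMSE values summarized above come from cross-validation, and the percentages are relative errors. A minimal NumPy sketch of k-fold CV RMSE follows. It assumes that the relative RMSE is the RMSE divided by the mean observed value (a common definition that this section does not state), and the fold count, the seed and the `fit_predict` callback are illustrative choices rather than the study's settings.

```python
import numpy as np


def cv_rmse(fit_predict, X, y, k=10, seed=0):
    """k-fold cross-validated RMSE and relative RMSE (% of the mean observation).

    fit_predict(X_train, y_train, X_test) must fit a model on the training
    fold and return predictions for X_test.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    idx = np.random.default_rng(seed).permutation(len(y))
    pred = np.empty_like(y)
    for test in np.array_split(idx, k):
        train = np.setdiff1d(idx, test)  # every sample not in the held-out fold
        pred[test] = fit_predict(X[train], y[train], X[test])
    rmse = float(np.sqrt(np.mean((y - pred) ** 2)))
    return rmse, 100.0 * rmse / float(np.mean(y))


# Synthetic check with a constant-prediction baseline (not the study's data).
X_demo = np.zeros((20, 1))
y_demo = np.full(20, 4.0)
rmse, rel = cv_rmse(lambda Xtr, ytr, Xte: np.full(len(Xte), 2.0), X_demo, y_demo, k=5)
```

Any regressor can be plugged in through `fit_predict`; for instance, with scikit-learn installed, `lambda Xtr, ytr, Xte: SVR().fit(Xtr, ytr).predict(Xte)` after `from sklearn.svm import SVR`, although the default hyperparameters would normally be tuned (Hsu, Chang, and Lin 2010).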

The results of this study indicate that high-temporal-resolution GOCI data are promising for operational monitoring of water quality in Korean coastal waters. Future research will include investigating seasonal and yearly variations of GOCI-based coastal water quality parameters using machine learning approaches, and developing operational models to estimate water quality in Korean coastal waters from GOCI-derived radiance data using more in situ observations.

Figure 8. Hourly (9:30 am to 4:30 pm) spatial distribution of SPM concentration estimated using the SVR model on 26 October 2011 for site B.

Funding

This research was supported by “Research for applications of Geostationary Ocean Color Imager (GOCI)-2nd Stage” and “Development of Satellite based Ocean Carbon Flux Model for Seas around Korea” funded by the Ministry of Oceans and Fisheries, South Korea, and Inha University Research Grant (INHA-49278).

References

Ballabio, C. 2009. “Spatial Prediction of Soil Properties in Temperate Mountain Regions Using Support Vector Regression.” Geoderma 151: 338–350.

Blackard, J., M. Finco, E. Helmer, G. Holden, M. Hoppus, D. Jacobs, A. Lister, et al. 2008. “Mapping US Forest Biomass Using Nationwide Forest Inventory Data and Moderate Resolution Information.” Remote Sensing of Environment 112: 1658–1677. doi:10.1016/j.rse.2007.08.021.

Boyd, D., C. Sanchez-Hernandez, and G. Foody. 2006. “Mapping a Specific Class for Priority Habitats Monitoring from Satellite Sensor Data.” International Journal of Remote Sensing 27 (13): 2631–2644. doi:10.1080/01431160600554348.

Breiman, L. 2001. “Random Forests.” Machine Learning 45: 5–32. doi:10.1023/A:1010933404324.

Camps-Valls, G., L. Gomez-Chova, J. Vila-Francés, J. Amorós-López, J. Muñoz-Marí, and J. Calpe-Maravilla. 2006. “Retrieval of Oceanic Chlorophyll Concentration with Relevance Vector Machines.” Remote Sensing of Environment 105 (1): 23–33.

Chang, C. C., and C. J. Lin. 2001. LIBSVM: A Library for Support Vector Machines. http://www.csie.ntu.edu.tw/~cjlin/libsvm

Chau, K. 2006. “A Review on Integration of Artificial Intelligence into Water Quality Modelling.” Marine Pollution Bulletin 52: 726–733. doi:10.1016/j.marpolbul.2006.04.003.

Chen, X., S. Liu, Z. Zhu, J. Vogelmann, Z. Li, and D. Ohlen. 2011. “Estimating Aboveground Forest Biomass Carbon and Fire Consumption in the US Utah High Plateaus Using Data from the Forest Inventory and Analysis Program, Landsat, and Landfire.” Ecological Indicators 11: 140–148. doi:10.1016/j.ecolind.2009.03.013.

Chen, Z. Q., C. M. Hu, and F. Muller-Karger. 2007. “Monitoring Turbidity in Tampa Bay Using MODIS/Aqua 250-m Imagery.” Remote Sensing of Environment 109: 207–220. doi:10.1016/j.rse.2006.12.019.

Choi, J., Y. Park, J. Ahn, H. Lim, J. Eom, and J. Ryu. 2012. “GOCI, The World’s First Geostationary Ocean Color Observation Satellite, for the Monitoring of Temporal Variability in Coastal Water Turbidity.” Journal of Geophysical Research: Oceans 117: C09004. doi:10.1029/2012JC008046.

Choi, J., Y. Park, B. Lee, J. Eom, J. Moon, and J. Ryu. 2014. “Application of the Geostationary Ocean Color Imager (GOCI) to Mapping the Temporal Dynamics of Coastal Water Turbidity.” Remote Sensing of Environment. doi:10.1016/j.rse.2013.05.032.

Choi, K., and S. P. Kim. 2006. “Late Quaternary Evolution of Macro-tidal Kimpo Tidal Flat, Kyonggi Bay, West Coast of Korea.” Marine Geology 232: 17–34. doi:10.1016/j.margeo.2006.06.007.

Darecki, M., and D. Stramski. 2004. “An Evaluation of MODIS and SeaWiFS Bio-Optical Algorithms in the Baltic Sea.” Remote Sensing of Environment 89: 326–350. doi:10.1016/j.rse.2003.10.012.

Durbha, S. S., R. L. King, and N. H. Younan. 2007. “Support Vector Machines Regression for Retrieval of Leaf Area Index from Multiangle Imaging Spectroradiometer.” Remote Sensing of Environment 107: 348–361. doi:10.1016/j.rse.2006.09.031.

Fettweis, M., B. Nechad, and D. Van den Eynde. 2007. “An Estimate of the Suspended Particulate Matter (SPM) Transport in the Southern North Sea Using SeaWiFS Images, in Situ Measurements and Numerical Model Results.” Continental Shelf Research 27: 1568–1583. doi:10.1016/j.csr.2007.01.017.


Gleason, C., and J. Im. 2012. “Forest Biomass Estimation from Airborne Lidar Data Using Machine Learning Approaches.” Remote Sensing of Environment 125: 80–91. doi:10.1016/j.rse.2012.07.006.

Gong, B., J. Im, J. Jensen, M. Coleman, J. Rhee, and E. Nelson. 2012. “Characterization of Forest Crops with a Range of Nutrient and Water Treatments Using AISA Hyperspectral Imagery.” GIScience and Remote Sensing 49 (4): 463–491. doi:10.2747/1548-1603.49.4.463.

Gong, B., J. Im, and G. Mountrakis. 2011. “An Artificial Immune Network Approach to Multi-Sensor Land Use/Land Cover Classification.” Remote Sensing of Environment 115: 600–614. doi:10.1016/j.rse.2010.10.005.

Hsu, C., C. Chang, and C. Lin. 2010. A Practical Guide to Support Vector Classification. https://www.cs.sfu.ca/people/Faculty/teaching/726/spring11/svmguide.pdf

Huang, C., and C. J. Wang. 2006. “A GA-Based Feature Selection and Parameters Optimization for Support Vector Machines.” Expert Systems with Applications 31: 231–240. doi:10.1016/j.eswa.2005.09.024.

Hwang, J. H., S. P. Van, B. J. Choi, and Y. H. Kim. Forthcoming. “The Physical Processes in the Yellow Sea.” Ocean and Coastal Management.

Im, J., J. Jensen, M. Coleman, and E. Nelson. 2009. “Hyperspectral Remote Sensing Analysis of Short Rotation Woody Crops Grown with Controlled Nutrient and Irrigation Treatments.” Geocarto International 24: 293–312. doi:10.1080/10106040802556207.

Im, J., Z. Lu, J. Rhee, and L. Quackenbush. 2012. “Impervious Surface Quantification Using a Synthesis of Artificial Immune Networks and Decision/Regression Trees from Multi-Sensor Data.” Remote Sensing of Environment 117: 102–113. doi:10.1016/j.rse.2011.06.024.

Jung, T. S., and S. G. Kim. 2005. “Monitoring System of Coastal Environment Changes Due to the Construction on the Sea.” Journal of Korean Society of Marine Environment and Energy 8 (2): 53–59.

Kang, J. W., S. R. Moon, S. J. Park, and K. H. Lee. 2009. “Analyzing Sea Level Rise and Tide Characteristics Change Driven by Coastal Construction at Mokpo Coastal Zone in Korea.” Ocean Engineering 36 (6–7): 415–425. doi:10.1016/j.oceaneng.2008.12.009.

Kavzoglu, T., and I. Colkesen. 2009. “A Kernel Functions Analysis for Support Vector Machines for Land Cover Classification.” International Journal of Applied Earth Observation and Geoinformation 11: 352–359. doi:10.1016/j.jag.2009.06.002.

Kim, C. S., and H. S. Lim. 2009. “Sediment Dispersal and Deposition Due to Sand Mining in the Coastal Waters of Korea.” Continental Shelf Research 29 (1): 194–204. doi:10.1016/j.csr.2008.01.017.

Kowalczuk, P., M. Darecki, M. Zabłocka, and I. Górecka. 2010. “Validation of Empirical and Semi-Analytical Remote Sensing Algorithms for Estimating Absorption by Coloured Dissolved Organic Matter in the Baltic Sea from SeaWiFS and MODIS Imagery.” Oceanologia 52: 171–196. doi:10.5697/oc.52-2.171.

Lee, B., J. Ahn, Y. Park, and S. Kim. 2013. “Turbid Water Atmospheric Correction for GOCI: Modification of MUMM Algorithm (Korean Ed.).” Korean Journal of Remote Sensing 29 (2): 173–182. doi:10.7780/kjrs.2013.29.2.2.

Lee, H. J., J. Y. Park, S. H. Lee, J. M. Lee, and T. K. Kim. 2013. “Suspended Sediment Transport in a Rock-Bound Macrotidal Estuary: Han Estuary, Eastern Yellow Sea.” Journal of Coastal Research 287 (2): 358–371. doi:10.2112/JCOASTRES-D-12-00066.1.

Li, M., J. Im, and C. Beier. 2013. “Machine Learning Approaches for Forest Classification and Change Analysis Using Multi-Temporal Landsat TM Images over Huntington Wildlife Forest.” GIScience and Remote Sensing 50 (4): 361–384.

Li, M., J. Im, L. Quackenbush, and T. Liu. 2014. “Forest Biomass and Carbon Stock Quantification Using Full Waveform Lidar Data in Montane Forests.” IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing. doi:10.1109/JSTARS.2014.2304642.

Liaw, A., and M. Wiener. 2002. “Classification and Regression by Random Forest.” R News 2 (3): 18–22. http://cran.r-project.org/doc/Rnews/Rnews_2002-3.pdf

Lu, Z., J. Im, L. Quackenbush, and K. Halligan. 2010. “Population Estimation Based on Multi-Sensor Data Fusion.” International Journal of Remote Sensing 31 (21): 5587–5604. doi:10.1080/01431161.2010.496801.

Lu, Z., J. Im, L. J. Quackenbush, and S. Yoo. 2013. “Remote Sensing Based House Value Estimation Using an Optimized Regional Regression Model.” Photogrammetric Engineering & Remote Sensing 79: 809–820. doi:10.14358/PERS.79.9.809.


Miller, R. L., and B. A. McKee. 2004. “Using MODIS Terra 250 m Imagery to Map Concentrations of Total Suspended Matter in Coastal Waters.” Remote Sensing of Environment 93: 259–266. doi:10.1016/j.rse.2004.07.012.

Moon, J., Y. Park, J. Ryu, J. Choi, J. Ahn, J. Min, Y. Son, S. Lee, H. Han, and Y. Ahn. 2012. “Initial Validation of GOCI Water Products against in Situ Data Collected around Korean Peninsula for 2010–2011.” Ocean Science Journal 47: 261–277. doi:10.1007/s12601-012-0027-1.

Mountrakis, G., J. Im, and C. Ogole. 2011. “Support Vector Machines in Remote Sensing: A Review.” ISPRS Journal of Photogrammetry and Remote Sensing 66: 247–259. doi:10.1016/j.isprsjprs.2010.11.001.

Mulia, I., H. Tay, K. Roopsekhar, and P. Tkalich. 2013. “Hybrid ANN-GA Model for Predicting Turbidity and Chlorophyll-a Concentrations.” Journal of Hydro-Environment Research 7: 279–299. doi:10.1016/j.jher.2013.04.003.

Nechad, B., K. G. Ruddick, and Y. Park. 2010. “Calibration and Validation of a Generic Multisensor Algorithm for Mapping of Total Suspended Matter in Turbid Waters.” Remote Sensing of Environment 114: 854–866. doi:10.1016/j.rse.2009.11.022.

Pal, M., and P. M. Mather. 2005. “Support Vector Machines for Classification in Remote Sensing.” International Journal of Remote Sensing 26 (5): 1007–1011. doi:10.1080/01431160512331314083.

Refaeilzadeh, P., L. Tang, and H. Liu. 2009. “Cross-Validation.” In Encyclopedia of Database Systems, edited by L. Liu and M. T. Özsu, 532–538. New York: Springer.

Rhee, J., J. Im, G. Carbone, and J. Jensen. 2008. “Delineation of Climate Regions Using In-Situ and Remotely-Sensed Data for the Carolinas.” Remote Sensing of Environment 112: 3099–3111. doi:10.1016/j.rse.2008.03.001.

Ruddick, K., F. Ovidio, and M. Rijkeboer. 2000. “Atmospheric Correction of SeaWiFS Imagery for Turbid Coastal and Inland Waters.” Applied Optics 39: 897–912. doi:10.1364/AO.39.000897.

Singh, K., N. Basant, and S. Gupta. 2011. “Support Vector Machines in Water Quality Management.” Analytica Chimica Acta 703: 152–162. doi:10.1016/j.aca.2011.07.027.

Son, S., and M. Wang. 2012. “Water Properties in Chesapeake Bay from MODIS-Aqua Measurements.” Remote Sensing of Environment 123: 163–174. doi:10.1016/j.rse.2012.03.009.

Tilstone, G., A. Lotliker, P. Miller, P. Ashraf, T. Kumar, T. Suresh, B. Ragavan, and H. Menon. 2013. “Assessment of MODIS-Aqua Chlorophyll-a Algorithms in Coastal and Shelf Waters of the Eastern Arabian Sea.” Continental Shelf Research 65: 14–26. doi:10.1016/j.csr.2013.06.003.

Vilas, L., E. Spyrakos, and J. Palenzuela. 2011. “Neural Network Estimation of Chlorophyll a from MERIS Full Resolution Data for the Coastal Waters of Galician Rias (NW Spain).” Remote Sensing of Environment 115: 524–535. doi:10.1016/j.rse.2010.09.021.

Walton, J. T. 2008. “Subpixel Urban Land Cover Estimation Comparing Cubist, Random Forests, and Support Vector Regression.” Photogrammetric Engineering and Remote Sensing 74: 1213–1222.

Wei, H., J. Shi, Y. Lu, and Y. Peng. 2010. “Inter-Annual and Long-Term Hydrographic Changes in the Yellow Sea during 1977–1998.” Deep Sea Research Part II: Topical Studies in Oceanography 57: 1025–1034. doi:10.1016/j.dsr2.2010.02.004.

Yoo, S., J. Im, and J. E. Wagner. 2012. “Variable Selection for Hedonic Model Using Machine Learning Approaches: A Case Study in Onondaga County, NY, USA.” Landscape and Urban Planning 107: 293–306. doi:10.1016/j.landurbplan.2012.06.009.

Zhang, M. W., J. W. Tang, Q. Dong, Q. T. Song, and J. Ding. 2010. “Retrieval of Total Suspended Matter Concentration in the Yellow and East China Seas from MODIS Imagery.” Remote Sensing of Environment 114: 392–403. doi:10.1016/j.rse.2009.09.016.
