anovelgeneticalgorithm-basedoptimizationframeworkfor ...variable selection for nir calibration of...

10
ResearchArticle A Novel Genetic Algorithm-Based Optimization Framework for the Improvement of Near-Infrared Quantitative Calibration Models Quanxi Feng, 1,2 Huazhou Chen , 1,2 Hai Xie, 1 Ken Cai, 3 Bin Lin, 1,2 and Lili Xu 4 1 College of Science, Guilin University of Technology, Guilin 541004, China 2 Center for Data Analysis and Algorithm Technology, Guilin University of Technology, Guilin 541004, China 3 College of Automation, Zhongkai University of Agriculture and Engineering, Guangzhou 510225, China 4 College of Marine Sciences, Beibu Gulf University, Qinzhou 535011, China Correspondence should be addressed to Huazhou Chen; [email protected] Received 24 December 2019; Revised 10 June 2020; Accepted 12 June 2020; Published 10 July 2020 Academic Editor: Mario Versaci Copyright © 2020 Quanxi Feng et al. is is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. e global fishmeal production is used for animal feed, and protein is the main component that provides nutrition to animals. In order to monitor and control the nutrition supply to animal husbandry, near-infrared (NIR) technology was utilized for rapid detection of protein contents in fishmeal samples. e aim of the NIR quantitative calibration is to enhance the model prediction ability, where the study of chemometric algorithms is inevitably on demand. In this work, a novel optimization framework of GSMW-LPC-GA was constructed for NIR calibration. In the framework, some informative NIR wavebands were selected by grid search moving window (GSMW) strategy, and then the variables/wavelengths in the waveband were transformed to latent principal components (LPCs) as the inputs for genetic algorithm (GA) optimization. GA operates in iterations as implementation for the secondary optimization of NIR wavebands. In steps of the variable’s population evolution, the parametric scaling mode was investigated for the optimal determination of the crossover probability and the mutation operator. With the GSMW-LPC-GA framework, the NIR prediction effect on fishmeal protein was experimentally better than the effect by simply adopting the moving window calibration model. e results demonstrate that the proposed framework is suitable for NIR quantitative determination of fishmeal protein. GA was eventually regarded as an implementable method providing an efficient strategy for improving the performance of NIR calibration models. e framework is expected to provide an efficient strategy for analyzing some unknown changes and influence of various fertilizers. 1. Introduction Fishmeal is a kind of popular animal food that provides high contents of protein and amino and fatty acids, which are essential nutrients in many formulated diets [1, 2]. Hence, fishmeal industry is widely spread across the world [3]. Approximately 65% of the annual global fishmeal produc- tion is used for animal feed, including poultry, swine, and even human foods [4]. In fishmeal manufacture, the nu- trition supply is mainly summarized as the variability of protein, which generally accounts for 52–75% of the dif- ferent fish species. Uniform protein composition is induced and produced from raw materials so that the detection of protein content is essential to guarantee the nutrient quality of fishmeal [5]. e conventional method to determine the protein content is the Kjeldahl method. However, the Kjeldahl method requires chemical operation skills, needs chemical reagents, and is time-consuming [6]. erefore, the study of rapid detection technologies is inevitable for precise determination of protein content in the process control of fishmeal production. Near-infrared (NIR) spectroscopy is a rapid detection technology that is able to record the response of the chemical functional groups such as C-H, O-H, and N-H bands, which are the primary chemical compositions of fishmeal protein [7, 8]. NIR spectroscopy shows its advantages in easy and Hindawi Computational Intelligence and Neuroscience Volume 2020, Article ID 7686724, 10 pages https://doi.org/10.1155/2020/7686724

Upload: others

Post on 24-Apr-2021

3 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: ANovelGeneticAlgorithm-BasedOptimizationFrameworkfor ...variable selection for NIR calibration of fishmeal for the quantitative determination of protein content. For NIR calibrations,

Research ArticleA Novel Genetic Algorithm-Based Optimization Framework forthe Improvement of Near-Infrared QuantitativeCalibration Models

Quanxi Feng12 Huazhou Chen 12 Hai Xie1 Ken Cai3 Bin Lin12 and Lili Xu4

1College of Science Guilin University of Technology Guilin 541004 China2Center for Data Analysis and Algorithm Technology Guilin University of Technology Guilin 541004 China3College of Automation Zhongkai University of Agriculture and Engineering Guangzhou 510225 China4College of Marine Sciences Beibu Gulf University Qinzhou 535011 China

Correspondence should be addressed to Huazhou Chen hzchengutfoxmailcom

Received 24 December 2019 Revised 10 June 2020 Accepted 12 June 2020 Published 10 July 2020

Academic Editor Mario Versaci

Copyright copy 2020 Quanxi Feng et al is is an open access article distributed under the Creative Commons Attribution Licensewhich permits unrestricted use distribution and reproduction in any medium provided the original work is properly cited

e global fishmeal production is used for animal feed and protein is the main component that provides nutrition to animals Inorder to monitor and control the nutrition supply to animal husbandry near-infrared (NIR) technology was utilized for rapiddetection of protein contents in fishmeal samples e aim of the NIR quantitative calibration is to enhance the model predictionability where the study of chemometric algorithms is inevitably on demand In this work a novel optimization framework ofGSMW-LPC-GA was constructed for NIR calibration In the framework some informative NIR wavebands were selected by gridsearch moving window (GSMW) strategy and then the variableswavelengths in the waveband were transformed to latentprincipal components (LPCs) as the inputs for genetic algorithm (GA) optimization GA operates in iterations as implementationfor the secondary optimization of NIR wavebands In steps of the variablersquos population evolution the parametric scalingmode wasinvestigated for the optimal determination of the crossover probability and the mutation operator With the GSMW-LPC-GAframework the NIR prediction effect on fishmeal protein was experimentally better than the effect by simply adopting the movingwindow calibrationmodele results demonstrate that the proposed framework is suitable for NIR quantitative determination offishmeal protein GA was eventually regarded as an implementable method providing an efficient strategy for improving theperformance of NIR calibration models e framework is expected to provide an efficient strategy for analyzing some unknownchanges and influence of various fertilizers

1 Introduction

Fishmeal is a kind of popular animal food that provides highcontents of protein and amino and fatty acids which areessential nutrients in many formulated diets [1 2] Hencefishmeal industry is widely spread across the world [3]Approximately 65 of the annual global fishmeal produc-tion is used for animal feed including poultry swine andeven human foods [4] In fishmeal manufacture the nu-trition supply is mainly summarized as the variability ofprotein which generally accounts for 52ndash75 of the dif-ferent fish species Uniform protein composition is inducedand produced from raw materials so that the detection of

protein content is essential to guarantee the nutrient qualityof fishmeal [5] e conventional method to determine theprotein content is the Kjeldahl method However theKjeldahl method requires chemical operation skills needschemical reagents and is time-consuming [6]erefore thestudy of rapid detection technologies is inevitable for precisedetermination of protein content in the process control offishmeal production

Near-infrared (NIR) spectroscopy is a rapid detectiontechnology that is able to record the response of the chemicalfunctional groups such as C-H O-H and N-H bands whichare the primary chemical compositions of fishmeal protein[7 8] NIR spectroscopy shows its advantages in easy and

HindawiComputational Intelligence and NeuroscienceVolume 2020 Article ID 7686724 10 pageshttpsdoiorg10115520207686724

fast application with no destruction to samples and minimalrequirement of pretreatment and thus it has been devel-oped to be a prevalent technique for quantitative andqualitative analysis and well established in fields of agri-cultural engineering food safety environmental assessmentand medical and pharmaceutical science [9ndash12] Conse-quently NIR spectrometers are increasingly becoming agood choice for in situ measurements on the spots Howeverin the quantitative prediction of fishmeal there are manycomponents other than protein (eg moisture ash andamino acids) which will raise major effects in the process ofprotein detection [13] NIR fast detection is accompaniedwith noises from the nontarget components and interfer-ences from the instable instrument

NIR spectral response is detected at each resolvablewavelength to describe a sample to be tested When hun-dreds of samples are tested the NIR data are recorded as aspectral matrix as the incident NIR light that is preset in acertain frequency range is split into a bunch of wavelengthsBut noting that not all of the wavelengths have a high signal-to-noise ratio spectral features should be extracted toidentify the informative wavelengths and a calibrationmodel should be established and optimized In recent yearsboth theoretical evidence and experimental evidence showthat variable selection is on critical demand to find out thespectral features corresponding to the target analyte com-ponent (eg the protein) so that the performance of thecalibration model can be significantly improved [14 15]

Many calibration methods are effectively applied to NIRquantitative analysis Partial least squares (PLS) is a standardlinear regression method commonly applied to NIR quan-titative calibration in areas of agriculture food and bio-medicine [16 17] e original form of PLS is to extractprincipal components both for the independent variables(ie the spectral wavelengths) and for the dependent var-iables (ie the contents of target components) It does notinclude any procedure of variable selection and all variablesare included in the original algorithm Even though a linearsubspace is built to suppress the influence of noise and thekey factor of the latent variable is tunable for model opti-mization the instinct noise interference of all variables isintroduced into the regression procedure along with thetarget information [18 19] and thus the model predictiveperformance is limited and can hardly be improved

Some refined methods related to variable selection wereproposed such as equivalent division of waveband [20]combined use of different subwavebands [21] movingwindow search of waveband [22] and competitive adaptivereweighted sampling of variables [23] ey are experi-mentally proved to be effective Nevertheless these methodsmainly emphasize the selection of variables at their initialstate (ie the wavelengths) or the extraction of principalcomponent variables from the initial wavelengths Seldomstudies concern on a secondary implementable variableoptimization which is much necessary for the enhancementof variable selection and further for the improvement ofmodel performance

Iterative optimization algorithms can be considered forthe secondary optimization of variable selection Genetic

algorithm (GA) is a kind of iterative optimization methodthat simulates the biological evolution in the sense of sur-vival fitness [24] It is on the basis of random global searchwith iterations for preset times Selection crossover andmutation will be carried out in the process of iteration sothat the initial population of variables will evolve as newgenerations [25] e genetic algorithm is well known andwidely used for variable selection for spectroscopic analysisembedded with the PLS or MLR regressions for quantitativecalibration [26 27]

In this paper genetic algorithmic iteration was designedas the core process of the implementable optimization forvariable selection in NIR quantitative analysis of fishmealprotein content Firstly the so-called informative wavebandswere identified using the refined method grid search movingwindow (GSMW) where the calibration models wereestablished by the classic PLS regression Considering thatprotein is the only target analyte representing the variationof fishmeal nutrition supply the search for latent variables inthe PLS algorithm is preferable to be simplified as theanalysis of latent principal component (LPC) en we aretending to launch a variable selection operation based on theprincipal component extraction combined with movingwindow technique (GSMW-LPC) where the genetic algo-rithm iteration is applied e available LPCs extracted fromeach subwaveband will be further optimized by GA iterationand parametric scaling mode is introduced for GA opti-mization Consequently the NIR calibration models werereestablished based on the GSMW-LPC-GA extracted var-iables and the models were experimentally examined toimprove the quantitative determination of protein contentsin fishmeal samples Hence genetic iteration is regarded asan implementable optimization strategy for variable selec-tion in NIR quantitative calibrations [28] As different fromprevious studies the design of GSMW-LPC-GA frameworkundertakes a hypercombined optimization of GSMW LPCand GA en the calibration model can be optimized injoint optimal selection of MW parameters and the numberof LPC and GA iterations Especially in the GA process thecrossover rate mutation rate and the iteration times weredesigned for tuning and a new subjection for stopping theGA iterations is identified is strategy has the potentialapplicable prospectiveness for the nondestructive rapiddetection of other analytes in fields of agriculture envi-ronment and animal husbandry

2 Methods

21 +e Parametric Scaling Genetic Algorithm for VariableSelection In genetic algorithm the trial individuals in eachgeneration are encoded the gene according to its chromo-somes using a specific language that guarantees a uniquemapping from genotype to phenotype [29] e phenotyperesults of the spectral calibration experiment are coded asgenotype results so that its properties components andcharacteristic responses can be predicted A loss function(also called fitness function or objectivity function) is de-fined to transform the genotype of an individual into a singlequantitative value that represents the evolutionary

2 Computational Intelligence and Neuroscience

performance of the individuals e genetic algorithm isexpressed as the following steps [30]

Step 1 a population is initially generated by randomlyselecting a subset of variables (ie spectral wave-lengths) e size of this population is preset A flag isset for each wavelength in the full-scanning regionEach variable is successively marked with flags wherethe flag is a binary marker equaling to 1 or 0 Flag 1represents the corresponding variable is selected intothe subset while Flag 0 represents the variable is notselectedStep 2 calibration models are established based on eachvariable subset and thus the model performance isvalidated and evaluated by computing the value of theloss function for the individuals (ie the variables) inthe current population in the way of cross-validationStep 3 the calibration models are compared and thenext generation is produced by selecting a suitablevariable subset that leads to improved performancewith high prediction accuracy e selected variablesubsets are further modified by crossover or bymutationStep 4 a modified variable subset is produced in oneway by the crossover of the available variables betweenthe selected variable subsets or in the other way bymutating the flags for each variable by smallprobabilityStep 5 the selected variable subsets and the modifiedvariable subsets are used to form a new variablepopulation that should be taken as the inputs andreturn to step 2

e steps 2ndash5 are repeated for iteration calculation thusto generate several new populations According to themodeling performance of cross-validation the iteration willbe stopped in one way when the model prediction errorbecomes stable or in the other way meeting the preset it-eration times e last generated population is regarded asthe optimal informative variable subset

e implementation of genetic algorithm depends oninternal parameters such as population size the proba-bility of crossover (commonly set as 50) and mutationoperators (commonly set as 12) and the stoppingcondition of iteration (commonly set as 500) [31] eparametric scaling strategy should be applied to achievethe optimization of genetic evolution e exploration ofparametric optimization refers to the enhancement of themodel prediction accuracy originated from the full-scanning variables regarding suitable genotype-phenotypecoding and aiming to find the minimal value of the lossfunction

22+eGrid SearchMode forMovingWindowOptimizationMoving window (MW) is a simple and effective way to findthe spectral informative features in a grid search optimi-zation manner [32 33] Combined with the classical PLSregression MW manages to extract spectral features in the

study of chemometric algorithmic optimization for NIRrapid analysis leading to appreciate prediction effects [34]

A window is initially defined by its size and its positionIn particular the size is annotated by the number of variables(N ie spectral wavelengths) in the window and a startingwavelength (S) represents the position of the window Wheneither N or S changes the window moves through the full-scanning spectral range with adjustable size us N and Sare regarded as two parameters to determine a window (iea subset of continuous wavelengths) A fix value of thecombination of (N S) precisely identifies a window in whichthe variables will be utilized for establishing calibrationmodels with some suitable quantitative regressionalgorithm

With a grid search manner the window parameters Nand S are respectively set tunable in designated changingranges For any fixed value of N S is set tunable from 1 toP minus N + 1 (here P represents the number of wavelengths inthe full-scanning range) the calibration models are estab-lished optimized and evaluated By comparing the modelprediction results the most optimal model corresponding tothe fixed N can be determined and simultaneously thepairing value of S can also be found On the counterpart forany fixed value of S N is set tunable from 1 to P minus S + 1 thecalibration models are also compared to determine the mostoptimal model corresponding to the prevalued S and theparing value of N is simultaneously observed In this way allpossible windows are tested and the most informativespectral waveband is identified

emethod of grid searchmoving window (GSMW) canbe expressed in statistical formulae to identify the paringvalue of N (or S) for a fixed value of S (or N)

Modelopt Nfixed( 1113857 middot window Nfix arg min ERR Stunable( 1113857 Nfixed11138681113868111386811138681113872 11138731113872 1113873

Modelopt Sfixed( 1113857 middot window arg min ERR Ntunable( 1113857 Sfix11138681113868111386811138681113872 1113873 Sfixed1113872 1113873

(1)

where the term on the left of the equal sign represents theoptimally found window (ie the most informative wave-band) the term on the right is the optimally selected windowsize and position expressed in the form of (N S) and ERR(middot)

is the loss function of the calibration model which is usuallydefined as the model prediction error

23 +e Optimization Framework of GSMW-LPC-GANIR calibration models were established and optimizedwhere the informative wavebands were selected by using thegrid search moving window operation As the parameters of(N S) were tunable all possible applicable subwavebandswere tested and thus several wavebands were identifiedwith outputting appreciable modeling results On this basisa secondary optimization was carried out using the para-metric scaling of GA as an implementable manner andbeforehand the latent principal variable components aregenerated by the method of principal component analysis

Figure 1 shows the flowchart of the GSMW-LPC-GAframework and the algorithmic steps are expressed asfollows

Computational Intelligence and Neuroscience 3

Framework Step 1 Wavelengths in the full-scanningrange of NIR spectroscopy were resolved as continu-ously changing variables e spectral data are nor-malized and pretreated by the method of standardnormal variate (SNV) [35]Framework Step 2 e grid search moving window(GSMW) technique was applied to the full-lengthspectral range In the way of screening all possiblewindows with applicable size and position the optimalsubwaveband can be identified and some quasioptimalsubwavebands (several appreciable parameter combi-nations of N and S) were found as wellFramework Step 3 e wavelengths in the selectedoptimal and quasioptimal subwavebands were trans-formed to the principal variables using the principalcomponent algorithm Each new generated compre-hensive variable of latent principal component (LPC) isexpressed by a linear function of the initial wavelengthsin the target subwaveband e LPCs are sorted in asequence with their descending information contri-bution and significant LPCs can be obtained in front ofthe sequence e LPCs should have been the optimaloutput variables for NIR predictions but GA isdesigned in this framework as the implementablemethod for secondary optimizationFramework Step 4 As for GA implementation thedesignated significant LPCs are taken as the candidateindividuals for GA inputs e initial population isgenerated by randomly selecting a subset of theavailable LPCs and then goes through the steps of the

parametric scaling GA iteration e loss function isreasonably defined obeying the rule of minimizing theNIR prediction errors Accompanied with the selectioncrossover and mutation processes new generations ofthe population are repeatedly produced and cyclicallyrefreshed us the secondary optimized informativevariables are updated in loops until the iteration stops

3 Data and Applications

31 Data Acquisition Fishmeal samples were prepared asanimal feeds A total of 194 samples were collected for ourexperiment 113 of them were acquired from Guangxi (anautonomous province in South China) and 81 were acquiredfrom Vietnam e samples were physically pretreated byair-drying crushing and sifting to particles with diametersno larger than 02mm e contents of protein were pre-viously detected for each sample using the conventionalKjeldahl method (Chinese National Standard GBT 6432-2018) e maximum and minimum values as well as thestatistical average value and the standard deviation wererecorded as 6703 5317 60670 and 4362 (wt)respectively

e NIR spectral measurement was performed by usingFOSS NIR Systems 5000 grating spectrometer (Foss NIR-Systems Inc Denmark) equipped with its diffuse reflectanceaccessory and a round sample cell To reduce the systematicerror each sample was repeatedly detected for 5 times andthe average spectrum was output as the spectrum of thesample e temperature in the laboratory was controlled at25plusmn 1degC and the humidity was controlled at 70plusmn 1 RH

Parametric scaling genetic optimization

Iteration

Spectral data

Pretreatment

Grid search moving window

Stunable = 1 1 (P ndash N + 1)Window(N arg(min ERR(Stunable))

Ntunable = 1 1 (P ndash S + 1)

Window(arg(min ERR(Ntunable) S)

Selected wavebands

Latent principal componentextraction

PCA transform

Latent variablesrepresenting main

information ofwavelengths

A set of individualwavelengths

Comprehensivevariables

Populationinitialized Calibration model Model evaluation

Variableoutput

Variableinput

Satisfied

NonsatisfiedOptimalvariables

N

S

Evolution

SelectionCrossover

Mutation

Figure 1 e flowchart of the optimization framework of GSMW-LPC-GA

4 Computational Intelligence and Neuroscience

throughout the spectral scanning processeNIR spectrumwas measured over the wavenumber range of 1100ndash2500 nmwith a resolution of 2 nm so that 700 distincable wave-lengths were identified as variables recording the spectraldata e spectra of 194 fishmeal samples are shown inFigure 2

32 Sample Sets and Model Indicators e proposed algo-rithmic framework of GSMW-LPC-GA was applied tovariable selection for NIR calibration of fishmeal for thequantitative determination of protein content For NIRcalibrations the fishmeal samples should be split into themodeling set and the testing set As for modeling repre-sentativeness the testing set was randomly picked in ad-vance as including 54 samples and the remaining sampleswere for modeling Because the NIR calibration models needestablishment and optimization the modeling samples werefurther divided into the calibration part (90 samples) and thevalidation part (50 samples) where the method of sample setpartitioning based on joint x-y distance [36] was appliedemaximum minimum and average values and the standarddeviation for the calibration part validation part and testingset are listed in Table 1 e statistical data of all sampleswere also listed

Quantitative indicators are needed to evaluate the cal-ibration model Root mean square error (RMSE) and cor-relation coefficients (R) are commonly used which aredefined as follows

RMSE

1113936 yi minus yiprime( 11138572

n

1113971

R 1113936 yi minus ym( 1113857 yi

prime minus ymprime( 1113857

1113936 yi minus ym( 11138572

1113936 yiprime minus ymprime( 1113857

21113969

(2)

where yi and yiprime are the Kjeldahl-measured protein content

and the NIR predictive value of the i-th fishmeal sample ym

and ymprime are the mean value of yi | i 1 2 n1113864 1113865 and

yiprime | i 1 2 n1113864 1113865 respectively and n represents the

number of target samples Consequently we denotedRMSECRC RMSEVRV and RMSETRT as the corre-sponding signs of RMSER for the calibration samplesvalidation samples and testing samples respectively

4 Results and Discussion

e optimization framework of GSMW-LPC-GA was ap-plied to the quantitative analysis of fishmeal protein basedon the NIR spectral matrix In the framework GSMWmodeling method was used to select wavebands with highsignal-to-noise ratio which were regarded as the informativecontinuous wavelengths for further optimization Next la-tent variables were extracted from the initial continuouswavelengths by using the latent principal componenttechnique so as to hold more information of the target withless designated variablesen genetic algorithmwas appliedto promote the variable population evolution with steps ofselection crossover and mutation In the end we managed

to obtain some informative variables for the calibration offishmeal protein aiming to have a prospective improvementfor the quality of NIR analysis

41 Waveband Selection for the Partition of GSMWGSMW is designed to use the grid search mode to optimizethe way of parameter identification for the moving windowalgorithm flow As is known that a specific window is de-termined by the position and its width (ie the number ofvariables in the window) the task of moving window op-timization is to find the target parameter combinations of(N S) that point to a solid window leading to optimal NIRprediction effect In this work we set N and S tunabletargeting 700 discrete initial scanning wavelengths Con-sidering that a model established on a large number ofvariables will increase the computational complexity thuswe design N valuing continuously when it is not over 100and changing with a jumping step of 10 points when it issmaller than 300 and with a step of 20 when larger than 300(ie N isin [1 1 100]cup [110 10 300]cup [320 20 700]) Inthis way the value of N roughly went through some rep-resentative cases from 1 to 700 and also guaranteed a lesscomputing load for acceptable operation with currentlyprevalent computers On the other hand the parametricvalue of S was set changing from 1 consecutively to 700 (ieS isin [1 1 700]) All applicable combinations of (N S)should subject to the rule of N + S minus 1leP where P is equalto 700 Consequently we totally tested 78 790 possiblewindows

PLS the commonly used calibration method was ap-plied for each determinant window to estimate the func-tional contributions of the window wavelengths to theresults of NIR calibrations e calibration samples wereused for model training and the validation samples forselecting the optimal model with a minimum value ofRMSEV e predictive RMSEV matrix is shown as acontour plot in Figure 3 corresponding to each fixed valueof (N S)en the most optimal window can be identified inFigure 3 Additionally some well-performing models areexpected to have further improvement in the next steps ofthe GSMW-LPC-GA framework us we finally determine

1100 1300 1500 1700 1900 2100 2300 250001

02

03

04

05

06

07

08

09

10

Log

(1R

)

Wavelength (nm)

Figure 2 NIR reflectance spectra of 193 fishmeal samples

Computational Intelligence and Neuroscience 5

to choose 5 optimal windows for further analysis (marked as(1) (2) (3) (4) and (5) in Figure 3)e prediction results aswell as their operating parameters of the PLS models in these5 windows are presented in Table 2 respectively We foundthat the most optimal waveband was 1446ndash1520 nm whichincludes 38 wavelengthsvariables Also we spotlight theselected 5 optimal wavebands in the full range (see Figure 4)

42GeneticOptimization for LPCs in the SelectedWavebandse selected 5 optimal wavebands were available for furtheroptimization e parameter scaling genetic algorithm wasapplied with principal component analysis plugged in evariables in each waveband were transformed to principalcomponents (LPC) as latent variables for analysis enumber of LPCs is determined by the number of the originalvariables in the waveband ese latent components weresorted in descending contribution to its original data whichmeans that the latent variables in front of the queue containmore information than the variables in the back us the

LPC transform projected the raw data into a reasonablyrefined variable space in which the target information iseasier to observe e original variables in windows (1) (2)(3) (4) and (5) were transformed into 24 23 38 36 and 18LPCs respectively All of the LPCs were used as the inputsfor genetic optimization

Genetic optimization is the vital implementation forvariable selection embedded in the GSMW-LPC-GAframework To make the GA procedure suitable for theselected input data the population size was set fitting the

0 50 100 150 200 250 300Serial number of the starting wavelength (S)

350 400 450 500 550 600 650 700

102030405060708090

100120140160180200220

260240

280300

340

Num

ber o

f var

iabl

es (N

) 380

420

460

500

540

580

620

660

700RMSEV

114108102969084787266605448

Figure 3 Contour plot of the validating results by the GSMW model

Table 1 e statistical data of fishmeal protein contents for the calibration part validation part and testing set

No of samples Maximum Minimum Average Standard deviationCalibration part 90 6703 5317 60593 4347Validation part 50 6619 5386 59174 4412Testing set 54 6693 5384 60437 4308All samples 194 6703 5317 60670 4360

Table 2e optimal model validating results corresponding to the5 selected wavebands

S N Waveband (nm) RMSEV (wt) RV

(1) 18 24 1134ndash1180 5146 0874(2) 77 23 1252ndash1296 5071 0884(3) 174 38 1446ndash1520 4944 0905(4) 441 36 1980ndash2050 5190 0892(5) 486 18 2070ndash2104 5312 0908

6 Computational Intelligence and Neuroscience

1100 1300 1500 1700 1900 2100 2300 250046

47

48

49

50

51

52

53

542070ndash2104

1980ndash2050

1446ndash1520

1252ndash12961134ndash1180

RMSE

V

Wavelength

The shape of thespectrum not fitting

the RMSEV-axis

Figure 4 e locations of the 5 selected wavebands in the full spectral range

(1)(4)(7)

(2)(5)(8)

(3)(6)(9)

(1)(4)(7)

(2)(5)(8)

(3)(6)(9)

(1)(4)(7)

(2)(5)(8)

(3)(6)(9)

(1)(4)(7)

(2)(5)(8)

(3)(6)(9)

(1)(4)(7)

(2)(5)(8)

(3)(6)(9)

Evolution for variablesin waveband 1252ndash1296nm

42

44

46

48

50

52

54

56

58

0 100 200 300 400 500Iterations

(b) (c)

(d) (e)

Evolution for variablesin waveband 1446ndash1520nm

42

44

46

48

50

52

54

56

58

RMSE

V(lo

ss fu

nctio

n)RM

SEV

(loss

func

tion)

0 100 200 300 400 500Iterations

Evolution for variablesin waveband 2070ndash2104nm

42

44

46

48

50

52

54

56

58

0 100 200 300 400 500Iterations

0 100 200 300 400 50042

44

46

48

50

52

54

56

58Evolution for variables

in waveband 1134ndash1180nm

RMSE

V(lo

ss fu

nctio

n)RM

SEV

(loss

func

tion)

RMSE

V(lo

ss fu

nctio

n)

Iterations(a)

Evolution for variablesin waveband 1980ndash2050nm

42

44

46

48

50

52

54

56

58

0 100 200 300 400 500Iterations

(1) (2) (3)

(4) (5) (6)

(7) (8) (9)

Probability of crossover40 50 60

Prob

abili

ty o

f mut

atio

n 08

12

15

Figure 5 e optimizational effects for the principal variables in the 5 selected wavebands based on the parametric scaling GA iterations

Computational Intelligence and Neuroscience 7

inputs and then the selection crossover and mutation werecarried out For parametric scaling tuning the probability ofcrossover was set as 40 50 and 60 for testing emutation operator was set testing the probabilities of 0812 and 15 en the population automatically evolvedin a loop iteration and the loop was stopped in case ofmeeting each of the following two conditions and theframework-optimized model was determined

Condition 1 When reaching 500 times iteration the loopwas mandatorily stopped

Condition 2 When there was no further change appearingin the loss function (ie the RMSEV) for consecutively 20times iteration the loop was adaptively stopped

We tested the specific scaling parameter valuing ofcrossover andmutation in genetic algorithm for each of the 5designated wavebands e framework optimization resultsin the genetic evolution process are shown in Figure 5 It canbe seen in Figure 5 that the evolution loop did not stop insome cases until 500 times iteration and in other cases theloop stopped before iteration reached 500 times Finally wewere able to find the best results for each of the 5 selectedwavebands by the genetic optimizational process e sec-ondary optimized RMSEV and the corresponding RV areshown in Table 3 as well as their preferable GA parametersWe concluded from Table 3 that all of the data in the 5selected wavebands were further optimized by GA and the

most optimal result was observed in the waveband of1980ndash2050 nm whose evolution loop stopped at the 434times iteration

43 +e Testing Results Based on the Optimal GSMW-LPC-GAModel As discussed above the predictive performance ofthe GSMW-LPC-GA framework on the calibration andvalidation sample sets was satisfactory e optimal modelselected by the GSMW-LPC-GA framework was establishedon the waveband 1980ndash2050 nm (36 variables included)with genetic optimization at 40 crossover and 08 mu-tation e evolution loop stopped at 434 times iterationis framework optimal model outputs the prediction errorRMSEV as 4215 and the correlation RV as 0926 To evaluatethe practical modeling effect of the framework we took thewaveband 1980ndash2050 nm as the target spectral data in thetesting part performed the principal component extractionand applied the same GA parameters to estimate themodeling effect for the 53 testing samples which were in-dependent of the training part e PLS regression plot isshown in Figure 6(a) demonstrating the testing resultswhere the RMSET and RT were 5341 and 0883 respectivelyAs for comparison the typical GA evolution method wasused for establishing and optimizing the PLS calibrationvalidation and testing processes With common algorithmicsettings GA runs with 50 crossover 12 mutation and500 times of iteration [31] e GA-PLS testing results are

Table 3 e optimal model validating results corresponding to the 5 selected wavebands

Waveband (nm)e optimal parameters for genetic evolution

RMSEV (wt) RVCrossover () Mutation () Iteration times

1134ndash1180 50 15 500 4755 09061252ndash1296 40 08 437 4546 08971446ndash1520 50 08 469 4353 09181980ndash2050 40 08 434 4215 09262070ndash2104 50 08 484 4649 0912e iteration times represent when the optimal RMSEV was kept as a constant value for a cycle of 20 times iteration or less than 20 in case the iteration hadreached 500 times

40 45 50 55 60 65 70 75 8040

45

50

55

60

65

70

75

80 GSMW-LPC-GA optimizationwaveband 1980ndash2050nm

Measured content

RMSEV = 5341RT = 0883

NIR

pre

dict

ed v

alue

(a)

40 45 50 55 60 65 70 75 8040

45

50

55

60

65

70

75

80

RMSEV = 6372RT = 0864

Typical GA optimizationwaveband 1100ndash2500nm

NIR

pre

dict

ed v

alue

Measured content

(b)

Figure 6 e PLS regression plot for test samples ((a) the proposed GSMW-LPC-GA framework and (b) the typical GA evolution)

8 Computational Intelligence and Neuroscience

shown as the regression plot in Figure 6(b) e comparativeexperiments show that the proposed GSMW-LPC-GAframework performs better than the typical GA evolution inthe rapid NIR quantitative prediction of fishmeal protein

5 Conclusions

NIR calibration model was established for the quantitativedetermination of protein content in fishmeal samples Anovel optimization framework of GSMW-LPC-GA wasconstructed for model optimization Possible wavebandswere tested for model enhancement by grid searchingmoving window parameters in which the starting point wasset continuously changing from 1 to the total number ofwavelengths in the scanning range and the window size wasset in a specific design to balance the model representa-tiveness and the computational complexity Successively 5optimal window wavebands were selected for further op-timization by LPC feature extraction and GA iterationGenetic algorithmic iteration was designed as the coreprocess of implementable optimization e variableswavelengths in each of the 5 selected wavebands weretransformed to LPCs as the input variables for GA iterationIn GA optimization the population size was set obeying thedimensions of the input variable set e probability ofcrossover and mutation was designed for parameter tuninge control of iteration enlarged the extension of parametricscaling for generation evolutions Finally the model wassecondarily optimized by the selection crossover andmutation operators in a refined interaction within theGSMW-LPC-GA framework

e model prediction effects resulting from the GSMW-LPC-GA framework were experimentally proved better thanthe effect by simply adopting the moving window PLSmodel In calibration-validation model training the optimalGA probabilities for crossover and mutation were 40 and08 respectively and the predictive RMSEV and RV werebest as 4215 and 0926 respectively at the waveband1980ndash2050 nm is is significantly better than the movingwindow output at the waveband 1446ndash1520 nm For modelevaluation the best model generated from the GSMW-LPC-GA framework was further estimated by the testing sampleset which is previously chosen outside of the trainingsamples e testing results were not so good as the trainingresults but it was practically acceptable for some industrialrapid detection cases with the RMSET and RT equaling to5341 and 0883 respectively

e experimental results demonstrate that the proposedGSMW-LPC-GA framework is suitable for NIR quantitativedetermination of fishmeal protein It was able to producebetter models in relation to the full-spectrum model withthe advantage of selecting informative wavebands for targetchemical analytes In the framework GA optimizationeventually plays an important role in the improvement of thecalibration model Further studies include using the NIRtechnology to perform rapid detections on other targets infields of animal husbandry agriculture and food scienceand genetic optimization iteration is expected to be animplementable method providing an efficient strategy for

analyzing some unknown changes and influence of variousfertilizers

Data Availability

e data used to support the findings of this study areavailable from the corresponding author upon request

Conflicts of Interest

e authors declare that there are no conflicts of interestregarding the publication of this paper

Acknowledgments

is work was supported by the National Natural ScienceFoundation of China (61505037 and 61763008) the NaturalScience Foundation of Guangxi Province(2018GXNSFAA050045) Guangzhou Science and Tech-nology Program (202002030246) and Fangchenggang Sci-entific Research and Technology Development Plan (2014no 42)

References

[1] M A Booth G L Allan J Frances and S ParkinsonldquoReplacement of fish meal in diets for Australian silver perchBidyanus bidyanusrdquo Aquaculture vol 196 no 1-2 pp 67ndash852001

[2] R W Hardy ldquoUtilization of plant proteins in fish diets effectsof global demand and supplies of fishmealrdquo AquacultureResearch vol 41 no 5 pp 770ndash776 2010

[3] A Corten C-B Braham and A S Sadegh ldquoe developmentof a fishmeal industry in Mauritania and its impact on theregional stocks of sardinella and other small pelagics inNorthwest Africardquo Fisheries Research vol 186 pp 328ndash3362017

[4] D Han X Shan W Zhang et al ldquoA revisit to fishmeal usageand associated consequences in Chinese aquaculturerdquo Re-views in Aquaculture vol 10 no 2 pp 493ndash507 2018

[5] D Cozzolino A Chree I Murray and J R Scaife ldquoeassessment of the chemical composition of fishmeal by nearinfrared reflectance spectroscopyrdquo Aquaculture Nutritionvol 8 no 2 pp 149ndash155 2002

[6] M Uddin E Okazaki H Fukushima S Turza Y Yumikoand Y Fukuda ldquoNondestructive determination of water andprotein in surimi by near-infrared spectroscopyrdquo FoodChemistry vol 96 no 3 pp 491ndash495 2006

[7] H Yamasaki and S Morita ldquoIdentification of the epoxycuring mechanism under isothermal conditions by thermalanalysis and infrared spectroscopyrdquo Journal of MolecularStructure vol 1069 no 1 pp 164ndash170 2014

[8] J Yan N Villarreal and B Xu ldquoCharacterization of degra-dation of cotton cellulosic fibers through near infraredspectroscopyrdquo Journal of Polymers and the Environmentvol 21 no 4 pp 902ndash909 2013

[9] P Sirisomboon P Pornchaloempong P RamsiriS Pongkuan and S Srikornkarn ldquoEvaluation of the saltcontent of canned sardines in tomato ketchup by diffusereflection near infrared spectroscopyrdquo Journal of Near In-frared Spectroscopy vol 22 no 5 pp 329ndash336 2014

[10] C A T dos Santos M Lopo R N M J Pascoa andJ A Lopes ldquoA review on the applications of portable near-

Computational Intelligence and Neuroscience 9

infrared spectrometers in the agro-food industryrdquo AppliedSpectroscopy vol 67 no 11 pp 1215ndash1233 2013

[11] A Sakudo ldquoNear-infrared spectroscopy for medical appli-cations Current status and future perspectivesrdquo ClinicaChimica Acta vol 455 pp 181ndash188 2016

[12] S Hong H Chen J Gu K Cai Z Liu and J WenldquoComparative assessment on smart pre-processing methodsfor extracting information in ft-nir measured datardquo Mea-surement vol 157 Article ID 107663 2020

[13] T N Kok and J W Park ldquoExtending the shelf life of set fishballrdquo Journal of Food Quality vol 30 no 1 pp 1ndash27 2007

[14] G Foca C Ferrari A Ulrici M C Ielo G Minelli andD P Lo Fiego ldquoIodine value and fatty acids determination onpig fat samples by FT-NIR spectroscopy benefits of variableselection in the perspective of industrial applicationsrdquo FoodAnalytical Methods vol 9 no 10 pp 2791ndash2806 2016

[15] F Westad and H Martens ldquoVariable selection in near in-frared spectroscopy based on significance testing in partialleast squares regressionrdquo Journal of Near Infrared Spectros-copy vol 8 no 2 pp 117ndash124 2000

[16] T Mehmood K H Liland L Snipen and S Saeligboslash ldquoA reviewof variable selection methods in partial least squares regres-sionrdquo Chemometrics and Intelligent Laboratory Systemsvol 118 pp 62ndash69 2012

[17] S Kuriakose and H Joe ldquoQualitative and quantitative analysisin sandalwood oils using near infrared spectroscopy com-bined with chemometric techniquesrdquo Food Chemistryvol 135 no 1 pp 213ndash218 2012

[18] M Versaci S Calcagno M Cacciola F C MorabitoI Palamara and D Pellicano ldquoStandard soft computingtechniques for characterization of defects in nondestructiveevaluationrdquo in Ultrasonic Nondestructive Evaluation SystemsP Burrascano S Callegari A Montisci et al Eds pp 175ndash199 Springer Cham Heidelberg Dordrecht London 2015

[19] M Versaci ldquoFuzzy approach and eddy currents NDTNDEdevices in industrial applicationsrdquo Electronics Letters vol 52no 11 pp 943ndash945 2016

[20] Y-H Yun H-D Li L E R Wood et al ldquoAn efficient methodof wavelength interval selection based on random frog formultivariate spectral calibrationrdquo Spectrochimica Acta Part AMolecular and Biomolecular Spectroscopy vol 111 pp 31ndash362013

[21] H Chen Z Liu K Cai L Xu and A Chen ldquoGrid searchparametric optimization for FT-NIR quantitative analysis ofsolid soluble content in strawberry samplesrdquo VibrationalSpectroscopy vol 94 pp 7ndash15 2018

[22] S Kasemsumran N Suttiwijitpukdee and V KeeratinijakalldquoRapid classification of turmeric based on DNA fingerprint bynear-infrared spectroscopy combined with moving windowpartial least squares-discrimination analysisrdquo Analytical Sci-ences vol 33 no 1 pp 111ndash115 2017

[23] K Zheng T Feng W Zhang et al ldquoVariable selection bydouble competitive adaptive reweighted sampling for cali-bration transfer of near infrared spectrardquo Chemometrics andIntelligent Laboratory Systems vol 191 pp 109ndash117 2019

[24] A Joo A Ekart and J P Neirotti ldquoGenetic algorithms fordiscovery of matrix multiplication methodsrdquo IEEE Transac-tions on Evolutionary Computation vol 16 no 5 pp 749ndash7512012

[25] E Haq I Ahmad A Hussain and I M Almanjahie ldquoA novelselection approach for genetic algorithms for global optimi-zation of multimodal continuous functionsrdquo ComputationalIntelligence and Neuroscience vol 2019 Article ID 86402182019

[26] B K Lavine and C G White ldquoBoosting the performance ofgenetic algorithms for variable selection in partial leastsquares spectral calibrationsrdquo Applied Spectroscopy vol 71no 9 pp 2092ndash2101 2017

[27] L Mo H Chen W Chen Q Feng and L Xu ldquoStudy onevolution methods for the optimization of machine learningmodels based on FT-NIR spectroscopyrdquo Infrared Physics ampTechnology vol 108 Article ID 103366 2020

[28] Y-H Yun H-D Li B-C Deng and D-S Cao ldquoAn overviewof variable selection methods in multivariate analysis of near-infrared spectrardquo TrAC Trends in Analytical Chemistryvol 113 pp 102ndash115 2019

[29] S Forrest ldquoGenetic algorithms principles of natural selectionapplied to computationrdquo Science vol 261 no 5123pp 872ndash878 1993

[30] H Jiang W Xu and Q Chen ldquoComparison of algorithms forwavelength variables selection from near-infrared (NIR)spectra for quantitative monitoring of yeast (saccharomycescerevisiae) cultivationsrdquo Spectrochimica Acta Part A Mo-lecular and Biomolecular Spectroscopy vol 214 pp 366ndash3712019

[31] J Koljonen T E M Nordling and J T Alander ldquoA review ofgenetic algorithms in near infrared spectroscopy and che-mometrics past and futurerdquo Journal of Near Infrared Spec-troscopy vol 16 no 3 pp 189ndash197 2008

[32] J-H Jiang R J Berry H W Siesler and Y OzakildquoWavelength interval selection in multicomponent spectralanalysis by moving window partial least-squares regressionwith applications to mid-infrared and near-infrared spec-troscopic datardquo Analytical Chemistry vol 74 no 14pp 3555ndash3565 2002

[33] H-Z Chen Q-Q Song G-Q Tang and L-L Xu ldquoAnoptimization strategy for waveband selection in FT-NIRquantitative analysis of corn proteinrdquo Journal of Cereal Sci-ence vol 60 no 3 pp 595ndash601 2014

[34] C Ranzan L F Trierweiler B Hitzmann andJ O Trierweiler ldquoNIR pre-selection data using modifiedchangeable size moving window partial least squares and purespectral chemometrical modeling with ant colony optimiza-tion for wheat flour characterizationrdquo Chemometrics andIntelligent Laboratory Systems vol 142 pp 78ndash86 2015

[35] T Fearn C Riccioli A Garrido-Varo and J E Guerrero-Ginel ldquoOn the geometry of SNV and MSCrdquo Chemometricsand Intelligent Laboratory Systems vol 96 no 1 pp 22ndash262009

[36] R Galvao M Araujo G Jose M Pontes E Silva andT Saldanha ldquoA method for calibration and validation subsetpartitioningrdquo Talanta vol 67 no 4 pp 736ndash740 2005

10 Computational Intelligence and Neuroscience

Page 2: ANovelGeneticAlgorithm-BasedOptimizationFrameworkfor ...variable selection for NIR calibration of fishmeal for the quantitative determination of protein content. For NIR calibrations,

fast application with no destruction to samples and minimalrequirement of pretreatment and thus it has been devel-oped to be a prevalent technique for quantitative andqualitative analysis and well established in fields of agri-cultural engineering food safety environmental assessmentand medical and pharmaceutical science [9ndash12] Conse-quently NIR spectrometers are increasingly becoming agood choice for in situ measurements on the spots Howeverin the quantitative prediction of fishmeal there are manycomponents other than protein (eg moisture ash andamino acids) which will raise major effects in the process ofprotein detection [13] NIR fast detection is accompaniedwith noises from the nontarget components and interfer-ences from the instable instrument

NIR spectral response is detected at each resolvablewavelength to describe a sample to be tested When hun-dreds of samples are tested the NIR data are recorded as aspectral matrix as the incident NIR light that is preset in acertain frequency range is split into a bunch of wavelengthsBut noting that not all of the wavelengths have a high signal-to-noise ratio spectral features should be extracted toidentify the informative wavelengths and a calibrationmodel should be established and optimized In recent yearsboth theoretical evidence and experimental evidence showthat variable selection is on critical demand to find out thespectral features corresponding to the target analyte com-ponent (eg the protein) so that the performance of thecalibration model can be significantly improved [14 15]

Many calibration methods are effectively applied to NIRquantitative analysis Partial least squares (PLS) is a standardlinear regression method commonly applied to NIR quan-titative calibration in areas of agriculture food and bio-medicine [16 17] e original form of PLS is to extractprincipal components both for the independent variables(ie the spectral wavelengths) and for the dependent var-iables (ie the contents of target components) It does notinclude any procedure of variable selection and all variablesare included in the original algorithm Even though a linearsubspace is built to suppress the influence of noise and thekey factor of the latent variable is tunable for model opti-mization the instinct noise interference of all variables isintroduced into the regression procedure along with thetarget information [18 19] and thus the model predictiveperformance is limited and can hardly be improved

Some refined methods related to variable selection wereproposed such as equivalent division of waveband [20]combined use of different subwavebands [21] movingwindow search of waveband [22] and competitive adaptivereweighted sampling of variables [23] ey are experi-mentally proved to be effective Nevertheless these methodsmainly emphasize the selection of variables at their initialstate (ie the wavelengths) or the extraction of principalcomponent variables from the initial wavelengths Seldomstudies concern on a secondary implementable variableoptimization which is much necessary for the enhancementof variable selection and further for the improvement ofmodel performance

Iterative optimization algorithms can be considered forthe secondary optimization of variable selection Genetic

algorithm (GA) is a kind of iterative optimization methodthat simulates the biological evolution in the sense of sur-vival fitness [24] It is on the basis of random global searchwith iterations for preset times Selection crossover andmutation will be carried out in the process of iteration sothat the initial population of variables will evolve as newgenerations [25] e genetic algorithm is well known andwidely used for variable selection for spectroscopic analysisembedded with the PLS or MLR regressions for quantitativecalibration [26 27]

In this paper genetic algorithmic iteration was designedas the core process of the implementable optimization forvariable selection in NIR quantitative analysis of fishmealprotein content Firstly the so-called informative wavebandswere identified using the refined method grid search movingwindow (GSMW) where the calibration models wereestablished by the classic PLS regression Considering thatprotein is the only target analyte representing the variationof fishmeal nutrition supply the search for latent variables inthe PLS algorithm is preferable to be simplified as theanalysis of latent principal component (LPC) en we aretending to launch a variable selection operation based on theprincipal component extraction combined with movingwindow technique (GSMW-LPC) where the genetic algo-rithm iteration is applied e available LPCs extracted fromeach subwaveband will be further optimized by GA iterationand parametric scaling mode is introduced for GA opti-mization Consequently the NIR calibration models werereestablished based on the GSMW-LPC-GA extracted var-iables and the models were experimentally examined toimprove the quantitative determination of protein contentsin fishmeal samples Hence genetic iteration is regarded asan implementable optimization strategy for variable selec-tion in NIR quantitative calibrations [28] As different fromprevious studies the design of GSMW-LPC-GA frameworkundertakes a hypercombined optimization of GSMW LPCand GA en the calibration model can be optimized injoint optimal selection of MW parameters and the numberof LPC and GA iterations Especially in the GA process thecrossover rate mutation rate and the iteration times weredesigned for tuning and a new subjection for stopping theGA iterations is identified is strategy has the potentialapplicable prospectiveness for the nondestructive rapiddetection of other analytes in fields of agriculture envi-ronment and animal husbandry

2 Methods

21 +e Parametric Scaling Genetic Algorithm for VariableSelection In genetic algorithm the trial individuals in eachgeneration are encoded the gene according to its chromo-somes using a specific language that guarantees a uniquemapping from genotype to phenotype [29] e phenotyperesults of the spectral calibration experiment are coded asgenotype results so that its properties components andcharacteristic responses can be predicted A loss function(also called fitness function or objectivity function) is de-fined to transform the genotype of an individual into a singlequantitative value that represents the evolutionary

2 Computational Intelligence and Neuroscience

performance of the individuals e genetic algorithm isexpressed as the following steps [30]

Step 1 a population is initially generated by randomlyselecting a subset of variables (ie spectral wave-lengths) e size of this population is preset A flag isset for each wavelength in the full-scanning regionEach variable is successively marked with flags wherethe flag is a binary marker equaling to 1 or 0 Flag 1represents the corresponding variable is selected intothe subset while Flag 0 represents the variable is notselectedStep 2 calibration models are established based on eachvariable subset and thus the model performance isvalidated and evaluated by computing the value of theloss function for the individuals (ie the variables) inthe current population in the way of cross-validationStep 3 the calibration models are compared and thenext generation is produced by selecting a suitablevariable subset that leads to improved performancewith high prediction accuracy e selected variablesubsets are further modified by crossover or bymutationStep 4 a modified variable subset is produced in oneway by the crossover of the available variables betweenthe selected variable subsets or in the other way bymutating the flags for each variable by smallprobabilityStep 5 the selected variable subsets and the modifiedvariable subsets are used to form a new variablepopulation that should be taken as the inputs andreturn to step 2

e steps 2ndash5 are repeated for iteration calculation thusto generate several new populations According to themodeling performance of cross-validation the iteration willbe stopped in one way when the model prediction errorbecomes stable or in the other way meeting the preset it-eration times e last generated population is regarded asthe optimal informative variable subset

e implementation of genetic algorithm depends oninternal parameters such as population size the proba-bility of crossover (commonly set as 50) and mutationoperators (commonly set as 12) and the stoppingcondition of iteration (commonly set as 500) [31] eparametric scaling strategy should be applied to achievethe optimization of genetic evolution e exploration ofparametric optimization refers to the enhancement of themodel prediction accuracy originated from the full-scanning variables regarding suitable genotype-phenotypecoding and aiming to find the minimal value of the lossfunction

22+eGrid SearchMode forMovingWindowOptimizationMoving window (MW) is a simple and effective way to findthe spectral informative features in a grid search optimi-zation manner [32 33] Combined with the classical PLSregression MW manages to extract spectral features in the

study of chemometric algorithmic optimization for NIRrapid analysis leading to appreciate prediction effects [34]

A window is initially defined by its size and its positionIn particular the size is annotated by the number of variables(N ie spectral wavelengths) in the window and a startingwavelength (S) represents the position of the window Wheneither N or S changes the window moves through the full-scanning spectral range with adjustable size us N and Sare regarded as two parameters to determine a window (iea subset of continuous wavelengths) A fix value of thecombination of (N S) precisely identifies a window in whichthe variables will be utilized for establishing calibrationmodels with some suitable quantitative regressionalgorithm

With a grid search manner the window parameters Nand S are respectively set tunable in designated changingranges For any fixed value of N S is set tunable from 1 toP minus N + 1 (here P represents the number of wavelengths inthe full-scanning range) the calibration models are estab-lished optimized and evaluated By comparing the modelprediction results the most optimal model corresponding tothe fixed N can be determined and simultaneously thepairing value of S can also be found On the counterpart forany fixed value of S N is set tunable from 1 to P minus S + 1 thecalibration models are also compared to determine the mostoptimal model corresponding to the prevalued S and theparing value of N is simultaneously observed In this way allpossible windows are tested and the most informativespectral waveband is identified

emethod of grid searchmoving window (GSMW) canbe expressed in statistical formulae to identify the paringvalue of N (or S) for a fixed value of S (or N)

Modelopt Nfixed( 1113857 middot window Nfix arg min ERR Stunable( 1113857 Nfixed11138681113868111386811138681113872 11138731113872 1113873

Modelopt Sfixed( 1113857 middot window arg min ERR Ntunable( 1113857 Sfix11138681113868111386811138681113872 1113873 Sfixed1113872 1113873

(1)

where the term on the left of the equal sign represents theoptimally found window (ie the most informative wave-band) the term on the right is the optimally selected windowsize and position expressed in the form of (N S) and ERR(middot)

is the loss function of the calibration model which is usuallydefined as the model prediction error

23 +e Optimization Framework of GSMW-LPC-GANIR calibration models were established and optimizedwhere the informative wavebands were selected by using thegrid search moving window operation As the parameters of(N S) were tunable all possible applicable subwavebandswere tested and thus several wavebands were identifiedwith outputting appreciable modeling results On this basisa secondary optimization was carried out using the para-metric scaling of GA as an implementable manner andbeforehand the latent principal variable components aregenerated by the method of principal component analysis

Figure 1 shows the flowchart of the GSMW-LPC-GAframework and the algorithmic steps are expressed asfollows

Computational Intelligence and Neuroscience 3

Framework Step 1 Wavelengths in the full-scanningrange of NIR spectroscopy were resolved as continu-ously changing variables e spectral data are nor-malized and pretreated by the method of standardnormal variate (SNV) [35]Framework Step 2 e grid search moving window(GSMW) technique was applied to the full-lengthspectral range In the way of screening all possiblewindows with applicable size and position the optimalsubwaveband can be identified and some quasioptimalsubwavebands (several appreciable parameter combi-nations of N and S) were found as wellFramework Step 3 e wavelengths in the selectedoptimal and quasioptimal subwavebands were trans-formed to the principal variables using the principalcomponent algorithm Each new generated compre-hensive variable of latent principal component (LPC) isexpressed by a linear function of the initial wavelengthsin the target subwaveband e LPCs are sorted in asequence with their descending information contri-bution and significant LPCs can be obtained in front ofthe sequence e LPCs should have been the optimaloutput variables for NIR predictions but GA isdesigned in this framework as the implementablemethod for secondary optimizationFramework Step 4 As for GA implementation thedesignated significant LPCs are taken as the candidateindividuals for GA inputs e initial population isgenerated by randomly selecting a subset of theavailable LPCs and then goes through the steps of the

parametric scaling GA iteration e loss function isreasonably defined obeying the rule of minimizing theNIR prediction errors Accompanied with the selectioncrossover and mutation processes new generations ofthe population are repeatedly produced and cyclicallyrefreshed us the secondary optimized informativevariables are updated in loops until the iteration stops

3 Data and Applications

31 Data Acquisition Fishmeal samples were prepared asanimal feeds A total of 194 samples were collected for ourexperiment 113 of them were acquired from Guangxi (anautonomous province in South China) and 81 were acquiredfrom Vietnam e samples were physically pretreated byair-drying crushing and sifting to particles with diametersno larger than 02mm e contents of protein were pre-viously detected for each sample using the conventionalKjeldahl method (Chinese National Standard GBT 6432-2018) e maximum and minimum values as well as thestatistical average value and the standard deviation wererecorded as 6703 5317 60670 and 4362 (wt)respectively

e NIR spectral measurement was performed by usingFOSS NIR Systems 5000 grating spectrometer (Foss NIR-Systems Inc Denmark) equipped with its diffuse reflectanceaccessory and a round sample cell To reduce the systematicerror each sample was repeatedly detected for 5 times andthe average spectrum was output as the spectrum of thesample e temperature in the laboratory was controlled at25plusmn 1degC and the humidity was controlled at 70plusmn 1 RH

Parametric scaling genetic optimization

Iteration

Spectral data

Pretreatment

Grid search moving window

Stunable = 1 1 (P ndash N + 1)Window(N arg(min ERR(Stunable))

Ntunable = 1 1 (P ndash S + 1)

Window(arg(min ERR(Ntunable) S)

Selected wavebands

Latent principal componentextraction

PCA transform

Latent variablesrepresenting main

information ofwavelengths

A set of individualwavelengths

Comprehensivevariables

Populationinitialized Calibration model Model evaluation

Variableoutput

Variableinput

Satisfied

NonsatisfiedOptimalvariables

N

S

Evolution

SelectionCrossover

Mutation

Figure 1 e flowchart of the optimization framework of GSMW-LPC-GA

4 Computational Intelligence and Neuroscience

throughout the spectral scanning processeNIR spectrumwas measured over the wavenumber range of 1100ndash2500 nmwith a resolution of 2 nm so that 700 distincable wave-lengths were identified as variables recording the spectraldata e spectra of 194 fishmeal samples are shown inFigure 2

32 Sample Sets and Model Indicators e proposed algo-rithmic framework of GSMW-LPC-GA was applied tovariable selection for NIR calibration of fishmeal for thequantitative determination of protein content For NIRcalibrations the fishmeal samples should be split into themodeling set and the testing set As for modeling repre-sentativeness the testing set was randomly picked in ad-vance as including 54 samples and the remaining sampleswere for modeling Because the NIR calibration models needestablishment and optimization the modeling samples werefurther divided into the calibration part (90 samples) and thevalidation part (50 samples) where the method of sample setpartitioning based on joint x-y distance [36] was appliedemaximum minimum and average values and the standarddeviation for the calibration part validation part and testingset are listed in Table 1 e statistical data of all sampleswere also listed

Quantitative indicators are needed to evaluate the cal-ibration model Root mean square error (RMSE) and cor-relation coefficients (R) are commonly used which aredefined as follows

RMSE

1113936 yi minus yiprime( 11138572

n

1113971

R 1113936 yi minus ym( 1113857 yi

prime minus ymprime( 1113857

1113936 yi minus ym( 11138572

1113936 yiprime minus ymprime( 1113857

21113969

(2)

where yi and yiprime are the Kjeldahl-measured protein content

and the NIR predictive value of the i-th fishmeal sample ym

and ymprime are the mean value of yi | i 1 2 n1113864 1113865 and

yiprime | i 1 2 n1113864 1113865 respectively and n represents the

number of target samples Consequently we denotedRMSECRC RMSEVRV and RMSETRT as the corre-sponding signs of RMSER for the calibration samplesvalidation samples and testing samples respectively

4 Results and Discussion

e optimization framework of GSMW-LPC-GA was ap-plied to the quantitative analysis of fishmeal protein basedon the NIR spectral matrix In the framework GSMWmodeling method was used to select wavebands with highsignal-to-noise ratio which were regarded as the informativecontinuous wavelengths for further optimization Next la-tent variables were extracted from the initial continuouswavelengths by using the latent principal componenttechnique so as to hold more information of the target withless designated variablesen genetic algorithmwas appliedto promote the variable population evolution with steps ofselection crossover and mutation In the end we managed

to obtain some informative variables for the calibration offishmeal protein aiming to have a prospective improvementfor the quality of NIR analysis

41 Waveband Selection for the Partition of GSMWGSMW is designed to use the grid search mode to optimizethe way of parameter identification for the moving windowalgorithm flow As is known that a specific window is de-termined by the position and its width (ie the number ofvariables in the window) the task of moving window op-timization is to find the target parameter combinations of(N S) that point to a solid window leading to optimal NIRprediction effect In this work we set N and S tunabletargeting 700 discrete initial scanning wavelengths Con-sidering that a model established on a large number ofvariables will increase the computational complexity thuswe design N valuing continuously when it is not over 100and changing with a jumping step of 10 points when it issmaller than 300 and with a step of 20 when larger than 300(ie N isin [1 1 100]cup [110 10 300]cup [320 20 700]) Inthis way the value of N roughly went through some rep-resentative cases from 1 to 700 and also guaranteed a lesscomputing load for acceptable operation with currentlyprevalent computers On the other hand the parametricvalue of S was set changing from 1 consecutively to 700 (ieS isin [1 1 700]) All applicable combinations of (N S)should subject to the rule of N + S minus 1leP where P is equalto 700 Consequently we totally tested 78 790 possiblewindows

PLS the commonly used calibration method was ap-plied for each determinant window to estimate the func-tional contributions of the window wavelengths to theresults of NIR calibrations e calibration samples wereused for model training and the validation samples forselecting the optimal model with a minimum value ofRMSEV e predictive RMSEV matrix is shown as acontour plot in Figure 3 corresponding to each fixed valueof (N S)en the most optimal window can be identified inFigure 3 Additionally some well-performing models areexpected to have further improvement in the next steps ofthe GSMW-LPC-GA framework us we finally determine

1100 1300 1500 1700 1900 2100 2300 250001

02

03

04

05

06

07

08

09

10

Log

(1R

)

Wavelength (nm)

Figure 2 NIR reflectance spectra of 193 fishmeal samples

Computational Intelligence and Neuroscience 5

to choose 5 optimal windows for further analysis (marked as(1) (2) (3) (4) and (5) in Figure 3)e prediction results aswell as their operating parameters of the PLS models in these5 windows are presented in Table 2 respectively We foundthat the most optimal waveband was 1446ndash1520 nm whichincludes 38 wavelengthsvariables Also we spotlight theselected 5 optimal wavebands in the full range (see Figure 4)

42GeneticOptimization for LPCs in the SelectedWavebandse selected 5 optimal wavebands were available for furtheroptimization e parameter scaling genetic algorithm wasapplied with principal component analysis plugged in evariables in each waveband were transformed to principalcomponents (LPC) as latent variables for analysis enumber of LPCs is determined by the number of the originalvariables in the waveband ese latent components weresorted in descending contribution to its original data whichmeans that the latent variables in front of the queue containmore information than the variables in the back us the

LPC transform projected the raw data into a reasonablyrefined variable space in which the target information iseasier to observe e original variables in windows (1) (2)(3) (4) and (5) were transformed into 24 23 38 36 and 18LPCs respectively All of the LPCs were used as the inputsfor genetic optimization

Genetic optimization is the vital implementation forvariable selection embedded in the GSMW-LPC-GAframework To make the GA procedure suitable for theselected input data the population size was set fitting the

0 50 100 150 200 250 300Serial number of the starting wavelength (S)

350 400 450 500 550 600 650 700

102030405060708090

100120140160180200220

260240

280300

340

Num

ber o

f var

iabl

es (N

) 380

420

460

500

540

580

620

660

700RMSEV

114108102969084787266605448

Figure 3 Contour plot of the validating results by the GSMW model

Table 1 e statistical data of fishmeal protein contents for the calibration part validation part and testing set

No of samples Maximum Minimum Average Standard deviationCalibration part 90 6703 5317 60593 4347Validation part 50 6619 5386 59174 4412Testing set 54 6693 5384 60437 4308All samples 194 6703 5317 60670 4360

Table 2e optimal model validating results corresponding to the5 selected wavebands

S N Waveband (nm) RMSEV (wt) RV

(1) 18 24 1134ndash1180 5146 0874(2) 77 23 1252ndash1296 5071 0884(3) 174 38 1446ndash1520 4944 0905(4) 441 36 1980ndash2050 5190 0892(5) 486 18 2070ndash2104 5312 0908

6 Computational Intelligence and Neuroscience

1100 1300 1500 1700 1900 2100 2300 250046

47

48

49

50

51

52

53

542070ndash2104

1980ndash2050

1446ndash1520

1252ndash12961134ndash1180

RMSE

V

Wavelength

The shape of thespectrum not fitting

the RMSEV-axis

Figure 4 e locations of the 5 selected wavebands in the full spectral range

(1)(4)(7)

(2)(5)(8)

(3)(6)(9)

(1)(4)(7)

(2)(5)(8)

(3)(6)(9)

(1)(4)(7)

(2)(5)(8)

(3)(6)(9)

(1)(4)(7)

(2)(5)(8)

(3)(6)(9)

(1)(4)(7)

(2)(5)(8)

(3)(6)(9)

Evolution for variablesin waveband 1252ndash1296nm

42

44

46

48

50

52

54

56

58

0 100 200 300 400 500Iterations

(b) (c)

(d) (e)

Evolution for variablesin waveband 1446ndash1520nm

42

44

46

48

50

52

54

56

58

RMSE

V(lo

ss fu

nctio

n)RM

SEV

(loss

func

tion)

0 100 200 300 400 500Iterations

Evolution for variablesin waveband 2070ndash2104nm

42

44

46

48

50

52

54

56

58

0 100 200 300 400 500Iterations

0 100 200 300 400 50042

44

46

48

50

52

54

56

58Evolution for variables

in waveband 1134ndash1180nm

RMSE

V(lo

ss fu

nctio

n)RM

SEV

(loss

func

tion)

RMSE

V(lo

ss fu

nctio

n)

Iterations(a)

Evolution for variablesin waveband 1980ndash2050nm

42

44

46

48

50

52

54

56

58

0 100 200 300 400 500Iterations

(1) (2) (3)

(4) (5) (6)

(7) (8) (9)

Probability of crossover40 50 60

Prob

abili

ty o

f mut

atio

n 08

12

15

Figure 5 e optimizational effects for the principal variables in the 5 selected wavebands based on the parametric scaling GA iterations

Computational Intelligence and Neuroscience 7

inputs and then the selection crossover and mutation werecarried out For parametric scaling tuning the probability ofcrossover was set as 40 50 and 60 for testing emutation operator was set testing the probabilities of 0812 and 15 en the population automatically evolvedin a loop iteration and the loop was stopped in case ofmeeting each of the following two conditions and theframework-optimized model was determined

Condition 1 When reaching 500 times iteration the loopwas mandatorily stopped

Condition 2 When there was no further change appearingin the loss function (ie the RMSEV) for consecutively 20times iteration the loop was adaptively stopped

We tested the specific scaling parameter valuing ofcrossover andmutation in genetic algorithm for each of the 5designated wavebands e framework optimization resultsin the genetic evolution process are shown in Figure 5 It canbe seen in Figure 5 that the evolution loop did not stop insome cases until 500 times iteration and in other cases theloop stopped before iteration reached 500 times Finally wewere able to find the best results for each of the 5 selectedwavebands by the genetic optimizational process e sec-ondary optimized RMSEV and the corresponding RV areshown in Table 3 as well as their preferable GA parametersWe concluded from Table 3 that all of the data in the 5selected wavebands were further optimized by GA and the

most optimal result was observed in the waveband of1980ndash2050 nm whose evolution loop stopped at the 434times iteration

43 +e Testing Results Based on the Optimal GSMW-LPC-GAModel As discussed above the predictive performance ofthe GSMW-LPC-GA framework on the calibration andvalidation sample sets was satisfactory e optimal modelselected by the GSMW-LPC-GA framework was establishedon the waveband 1980ndash2050 nm (36 variables included)with genetic optimization at 40 crossover and 08 mu-tation e evolution loop stopped at 434 times iterationis framework optimal model outputs the prediction errorRMSEV as 4215 and the correlation RV as 0926 To evaluatethe practical modeling effect of the framework we took thewaveband 1980ndash2050 nm as the target spectral data in thetesting part performed the principal component extractionand applied the same GA parameters to estimate themodeling effect for the 53 testing samples which were in-dependent of the training part e PLS regression plot isshown in Figure 6(a) demonstrating the testing resultswhere the RMSET and RT were 5341 and 0883 respectivelyAs for comparison the typical GA evolution method wasused for establishing and optimizing the PLS calibrationvalidation and testing processes With common algorithmicsettings GA runs with 50 crossover 12 mutation and500 times of iteration [31] e GA-PLS testing results are

Table 3 e optimal model validating results corresponding to the 5 selected wavebands

Waveband (nm)e optimal parameters for genetic evolution

RMSEV (wt) RVCrossover () Mutation () Iteration times

1134ndash1180 50 15 500 4755 09061252ndash1296 40 08 437 4546 08971446ndash1520 50 08 469 4353 09181980ndash2050 40 08 434 4215 09262070ndash2104 50 08 484 4649 0912e iteration times represent when the optimal RMSEV was kept as a constant value for a cycle of 20 times iteration or less than 20 in case the iteration hadreached 500 times

40 45 50 55 60 65 70 75 8040

45

50

55

60

65

70

75

80 GSMW-LPC-GA optimizationwaveband 1980ndash2050nm

Measured content

RMSEV = 5341RT = 0883

NIR

pre

dict

ed v

alue

(a)

40 45 50 55 60 65 70 75 8040

45

50

55

60

65

70

75

80

RMSEV = 6372RT = 0864

Typical GA optimizationwaveband 1100ndash2500nm

NIR

pre

dict

ed v

alue

Measured content

(b)

Figure 6 e PLS regression plot for test samples ((a) the proposed GSMW-LPC-GA framework and (b) the typical GA evolution)

8 Computational Intelligence and Neuroscience

shown as the regression plot in Figure 6(b) e comparativeexperiments show that the proposed GSMW-LPC-GAframework performs better than the typical GA evolution inthe rapid NIR quantitative prediction of fishmeal protein

5 Conclusions

NIR calibration model was established for the quantitativedetermination of protein content in fishmeal samples Anovel optimization framework of GSMW-LPC-GA wasconstructed for model optimization Possible wavebandswere tested for model enhancement by grid searchingmoving window parameters in which the starting point wasset continuously changing from 1 to the total number ofwavelengths in the scanning range and the window size wasset in a specific design to balance the model representa-tiveness and the computational complexity Successively 5optimal window wavebands were selected for further op-timization by LPC feature extraction and GA iterationGenetic algorithmic iteration was designed as the coreprocess of implementable optimization e variableswavelengths in each of the 5 selected wavebands weretransformed to LPCs as the input variables for GA iterationIn GA optimization the population size was set obeying thedimensions of the input variable set e probability ofcrossover and mutation was designed for parameter tuninge control of iteration enlarged the extension of parametricscaling for generation evolutions Finally the model wassecondarily optimized by the selection crossover andmutation operators in a refined interaction within theGSMW-LPC-GA framework

e model prediction effects resulting from the GSMW-LPC-GA framework were experimentally proved better thanthe effect by simply adopting the moving window PLSmodel In calibration-validation model training the optimalGA probabilities for crossover and mutation were 40 and08 respectively and the predictive RMSEV and RV werebest as 4215 and 0926 respectively at the waveband1980ndash2050 nm is is significantly better than the movingwindow output at the waveband 1446ndash1520 nm For modelevaluation the best model generated from the GSMW-LPC-GA framework was further estimated by the testing sampleset which is previously chosen outside of the trainingsamples e testing results were not so good as the trainingresults but it was practically acceptable for some industrialrapid detection cases with the RMSET and RT equaling to5341 and 0883 respectively

e experimental results demonstrate that the proposedGSMW-LPC-GA framework is suitable for NIR quantitativedetermination of fishmeal protein It was able to producebetter models in relation to the full-spectrum model withthe advantage of selecting informative wavebands for targetchemical analytes In the framework GA optimizationeventually plays an important role in the improvement of thecalibration model Further studies include using the NIRtechnology to perform rapid detections on other targets infields of animal husbandry agriculture and food scienceand genetic optimization iteration is expected to be animplementable method providing an efficient strategy for

analyzing some unknown changes and influence of variousfertilizers

Data Availability

e data used to support the findings of this study areavailable from the corresponding author upon request

Conflicts of Interest

e authors declare that there are no conflicts of interestregarding the publication of this paper

Acknowledgments

is work was supported by the National Natural ScienceFoundation of China (61505037 and 61763008) the NaturalScience Foundation of Guangxi Province(2018GXNSFAA050045) Guangzhou Science and Tech-nology Program (202002030246) and Fangchenggang Sci-entific Research and Technology Development Plan (2014no 42)

References

[1] M A Booth G L Allan J Frances and S ParkinsonldquoReplacement of fish meal in diets for Australian silver perchBidyanus bidyanusrdquo Aquaculture vol 196 no 1-2 pp 67ndash852001

[2] R W Hardy ldquoUtilization of plant proteins in fish diets effectsof global demand and supplies of fishmealrdquo AquacultureResearch vol 41 no 5 pp 770ndash776 2010

[3] A Corten C-B Braham and A S Sadegh ldquoe developmentof a fishmeal industry in Mauritania and its impact on theregional stocks of sardinella and other small pelagics inNorthwest Africardquo Fisheries Research vol 186 pp 328ndash3362017

[4] D Han X Shan W Zhang et al ldquoA revisit to fishmeal usageand associated consequences in Chinese aquaculturerdquo Re-views in Aquaculture vol 10 no 2 pp 493ndash507 2018

[5] D Cozzolino A Chree I Murray and J R Scaife ldquoeassessment of the chemical composition of fishmeal by nearinfrared reflectance spectroscopyrdquo Aquaculture Nutritionvol 8 no 2 pp 149ndash155 2002

[6] M Uddin E Okazaki H Fukushima S Turza Y Yumikoand Y Fukuda ldquoNondestructive determination of water andprotein in surimi by near-infrared spectroscopyrdquo FoodChemistry vol 96 no 3 pp 491ndash495 2006

[7] H Yamasaki and S Morita ldquoIdentification of the epoxycuring mechanism under isothermal conditions by thermalanalysis and infrared spectroscopyrdquo Journal of MolecularStructure vol 1069 no 1 pp 164ndash170 2014

[8] J Yan N Villarreal and B Xu ldquoCharacterization of degra-dation of cotton cellulosic fibers through near infraredspectroscopyrdquo Journal of Polymers and the Environmentvol 21 no 4 pp 902ndash909 2013

[9] P Sirisomboon P Pornchaloempong P RamsiriS Pongkuan and S Srikornkarn ldquoEvaluation of the saltcontent of canned sardines in tomato ketchup by diffusereflection near infrared spectroscopyrdquo Journal of Near In-frared Spectroscopy vol 22 no 5 pp 329ndash336 2014

[10] C A T dos Santos M Lopo R N M J Pascoa andJ A Lopes ldquoA review on the applications of portable near-

Computational Intelligence and Neuroscience 9

infrared spectrometers in the agro-food industryrdquo AppliedSpectroscopy vol 67 no 11 pp 1215ndash1233 2013

[11] A Sakudo ldquoNear-infrared spectroscopy for medical appli-cations Current status and future perspectivesrdquo ClinicaChimica Acta vol 455 pp 181ndash188 2016

[12] S Hong H Chen J Gu K Cai Z Liu and J WenldquoComparative assessment on smart pre-processing methodsfor extracting information in ft-nir measured datardquo Mea-surement vol 157 Article ID 107663 2020

[13] T N Kok and J W Park ldquoExtending the shelf life of set fishballrdquo Journal of Food Quality vol 30 no 1 pp 1ndash27 2007

[14] G Foca C Ferrari A Ulrici M C Ielo G Minelli andD P Lo Fiego ldquoIodine value and fatty acids determination onpig fat samples by FT-NIR spectroscopy benefits of variableselection in the perspective of industrial applicationsrdquo FoodAnalytical Methods vol 9 no 10 pp 2791ndash2806 2016

[15] F Westad and H Martens ldquoVariable selection in near in-frared spectroscopy based on significance testing in partialleast squares regressionrdquo Journal of Near Infrared Spectros-copy vol 8 no 2 pp 117ndash124 2000

[16] T Mehmood K H Liland L Snipen and S Saeligboslash ldquoA reviewof variable selection methods in partial least squares regres-sionrdquo Chemometrics and Intelligent Laboratory Systemsvol 118 pp 62ndash69 2012

[17] S Kuriakose and H Joe ldquoQualitative and quantitative analysisin sandalwood oils using near infrared spectroscopy com-bined with chemometric techniquesrdquo Food Chemistryvol 135 no 1 pp 213ndash218 2012

[18] M Versaci S Calcagno M Cacciola F C MorabitoI Palamara and D Pellicano ldquoStandard soft computingtechniques for characterization of defects in nondestructiveevaluationrdquo in Ultrasonic Nondestructive Evaluation SystemsP Burrascano S Callegari A Montisci et al Eds pp 175ndash199 Springer Cham Heidelberg Dordrecht London 2015

[19] M Versaci ldquoFuzzy approach and eddy currents NDTNDEdevices in industrial applicationsrdquo Electronics Letters vol 52no 11 pp 943ndash945 2016

[20] Y-H Yun H-D Li L E R Wood et al ldquoAn efficient methodof wavelength interval selection based on random frog formultivariate spectral calibrationrdquo Spectrochimica Acta Part AMolecular and Biomolecular Spectroscopy vol 111 pp 31ndash362013

[21] H Chen Z Liu K Cai L Xu and A Chen ldquoGrid searchparametric optimization for FT-NIR quantitative analysis ofsolid soluble content in strawberry samplesrdquo VibrationalSpectroscopy vol 94 pp 7ndash15 2018

[22] S Kasemsumran N Suttiwijitpukdee and V KeeratinijakalldquoRapid classification of turmeric based on DNA fingerprint bynear-infrared spectroscopy combined with moving windowpartial least squares-discrimination analysisrdquo Analytical Sci-ences vol 33 no 1 pp 111ndash115 2017

[23] K Zheng T Feng W Zhang et al ldquoVariable selection bydouble competitive adaptive reweighted sampling for cali-bration transfer of near infrared spectrardquo Chemometrics andIntelligent Laboratory Systems vol 191 pp 109ndash117 2019

[24] A Joo A Ekart and J P Neirotti ldquoGenetic algorithms fordiscovery of matrix multiplication methodsrdquo IEEE Transac-tions on Evolutionary Computation vol 16 no 5 pp 749ndash7512012

[25] E Haq I Ahmad A Hussain and I M Almanjahie ldquoA novelselection approach for genetic algorithms for global optimi-zation of multimodal continuous functionsrdquo ComputationalIntelligence and Neuroscience vol 2019 Article ID 86402182019

[26] B K Lavine and C G White ldquoBoosting the performance ofgenetic algorithms for variable selection in partial leastsquares spectral calibrationsrdquo Applied Spectroscopy vol 71no 9 pp 2092ndash2101 2017

[27] L Mo H Chen W Chen Q Feng and L Xu ldquoStudy onevolution methods for the optimization of machine learningmodels based on FT-NIR spectroscopyrdquo Infrared Physics ampTechnology vol 108 Article ID 103366 2020

[28] Y-H Yun H-D Li B-C Deng and D-S Cao ldquoAn overviewof variable selection methods in multivariate analysis of near-infrared spectrardquo TrAC Trends in Analytical Chemistryvol 113 pp 102ndash115 2019

[29] S Forrest ldquoGenetic algorithms principles of natural selectionapplied to computationrdquo Science vol 261 no 5123pp 872ndash878 1993

[30] H Jiang W Xu and Q Chen ldquoComparison of algorithms forwavelength variables selection from near-infrared (NIR)spectra for quantitative monitoring of yeast (saccharomycescerevisiae) cultivationsrdquo Spectrochimica Acta Part A Mo-lecular and Biomolecular Spectroscopy vol 214 pp 366ndash3712019

[31] J Koljonen T E M Nordling and J T Alander ldquoA review ofgenetic algorithms in near infrared spectroscopy and che-mometrics past and futurerdquo Journal of Near Infrared Spec-troscopy vol 16 no 3 pp 189ndash197 2008

[32] J-H Jiang R J Berry H W Siesler and Y OzakildquoWavelength interval selection in multicomponent spectralanalysis by moving window partial least-squares regressionwith applications to mid-infrared and near-infrared spec-troscopic datardquo Analytical Chemistry vol 74 no 14pp 3555ndash3565 2002

[33] H-Z Chen Q-Q Song G-Q Tang and L-L Xu ldquoAnoptimization strategy for waveband selection in FT-NIRquantitative analysis of corn proteinrdquo Journal of Cereal Sci-ence vol 60 no 3 pp 595ndash601 2014

[34] C Ranzan L F Trierweiler B Hitzmann andJ O Trierweiler ldquoNIR pre-selection data using modifiedchangeable size moving window partial least squares and purespectral chemometrical modeling with ant colony optimiza-tion for wheat flour characterizationrdquo Chemometrics andIntelligent Laboratory Systems vol 142 pp 78ndash86 2015

[35] T Fearn C Riccioli A Garrido-Varo and J E Guerrero-Ginel ldquoOn the geometry of SNV and MSCrdquo Chemometricsand Intelligent Laboratory Systems vol 96 no 1 pp 22ndash262009

[36] R Galvao M Araujo G Jose M Pontes E Silva andT Saldanha ldquoA method for calibration and validation subsetpartitioningrdquo Talanta vol 67 no 4 pp 736ndash740 2005

10 Computational Intelligence and Neuroscience

Page 3: ANovelGeneticAlgorithm-BasedOptimizationFrameworkfor ...variable selection for NIR calibration of fishmeal for the quantitative determination of protein content. For NIR calibrations,

performance of the individuals e genetic algorithm isexpressed as the following steps [30]

Step 1 a population is initially generated by randomlyselecting a subset of variables (ie spectral wave-lengths) e size of this population is preset A flag isset for each wavelength in the full-scanning regionEach variable is successively marked with flags wherethe flag is a binary marker equaling to 1 or 0 Flag 1represents the corresponding variable is selected intothe subset while Flag 0 represents the variable is notselectedStep 2 calibration models are established based on eachvariable subset and thus the model performance isvalidated and evaluated by computing the value of theloss function for the individuals (ie the variables) inthe current population in the way of cross-validationStep 3 the calibration models are compared and thenext generation is produced by selecting a suitablevariable subset that leads to improved performancewith high prediction accuracy e selected variablesubsets are further modified by crossover or bymutationStep 4 a modified variable subset is produced in oneway by the crossover of the available variables betweenthe selected variable subsets or in the other way bymutating the flags for each variable by smallprobabilityStep 5 the selected variable subsets and the modifiedvariable subsets are used to form a new variablepopulation that should be taken as the inputs andreturn to step 2

e steps 2ndash5 are repeated for iteration calculation thusto generate several new populations According to themodeling performance of cross-validation the iteration willbe stopped in one way when the model prediction errorbecomes stable or in the other way meeting the preset it-eration times e last generated population is regarded asthe optimal informative variable subset

e implementation of genetic algorithm depends oninternal parameters such as population size the proba-bility of crossover (commonly set as 50) and mutationoperators (commonly set as 12) and the stoppingcondition of iteration (commonly set as 500) [31] eparametric scaling strategy should be applied to achievethe optimization of genetic evolution e exploration ofparametric optimization refers to the enhancement of themodel prediction accuracy originated from the full-scanning variables regarding suitable genotype-phenotypecoding and aiming to find the minimal value of the lossfunction

22+eGrid SearchMode forMovingWindowOptimizationMoving window (MW) is a simple and effective way to findthe spectral informative features in a grid search optimi-zation manner [32 33] Combined with the classical PLSregression MW manages to extract spectral features in the

study of chemometric algorithmic optimization for NIRrapid analysis leading to appreciate prediction effects [34]

A window is initially defined by its size and its positionIn particular the size is annotated by the number of variables(N ie spectral wavelengths) in the window and a startingwavelength (S) represents the position of the window Wheneither N or S changes the window moves through the full-scanning spectral range with adjustable size us N and Sare regarded as two parameters to determine a window (iea subset of continuous wavelengths) A fix value of thecombination of (N S) precisely identifies a window in whichthe variables will be utilized for establishing calibrationmodels with some suitable quantitative regressionalgorithm

With a grid search manner the window parameters Nand S are respectively set tunable in designated changingranges For any fixed value of N S is set tunable from 1 toP minus N + 1 (here P represents the number of wavelengths inthe full-scanning range) the calibration models are estab-lished optimized and evaluated By comparing the modelprediction results the most optimal model corresponding tothe fixed N can be determined and simultaneously thepairing value of S can also be found On the counterpart forany fixed value of S N is set tunable from 1 to P minus S + 1 thecalibration models are also compared to determine the mostoptimal model corresponding to the prevalued S and theparing value of N is simultaneously observed In this way allpossible windows are tested and the most informativespectral waveband is identified

emethod of grid searchmoving window (GSMW) canbe expressed in statistical formulae to identify the paringvalue of N (or S) for a fixed value of S (or N)

Modelopt Nfixed( 1113857 middot window Nfix arg min ERR Stunable( 1113857 Nfixed11138681113868111386811138681113872 11138731113872 1113873

Modelopt Sfixed( 1113857 middot window arg min ERR Ntunable( 1113857 Sfix11138681113868111386811138681113872 1113873 Sfixed1113872 1113873

(1)

where the term on the left of the equal sign represents theoptimally found window (ie the most informative wave-band) the term on the right is the optimally selected windowsize and position expressed in the form of (N S) and ERR(middot)

is the loss function of the calibration model which is usuallydefined as the model prediction error

23 +e Optimization Framework of GSMW-LPC-GANIR calibration models were established and optimizedwhere the informative wavebands were selected by using thegrid search moving window operation As the parameters of(N S) were tunable all possible applicable subwavebandswere tested and thus several wavebands were identifiedwith outputting appreciable modeling results On this basisa secondary optimization was carried out using the para-metric scaling of GA as an implementable manner andbeforehand the latent principal variable components aregenerated by the method of principal component analysis

Figure 1 shows the flowchart of the GSMW-LPC-GAframework and the algorithmic steps are expressed asfollows

Computational Intelligence and Neuroscience 3

Framework Step 1 Wavelengths in the full-scanningrange of NIR spectroscopy were resolved as continu-ously changing variables e spectral data are nor-malized and pretreated by the method of standardnormal variate (SNV) [35]Framework Step 2 e grid search moving window(GSMW) technique was applied to the full-lengthspectral range In the way of screening all possiblewindows with applicable size and position the optimalsubwaveband can be identified and some quasioptimalsubwavebands (several appreciable parameter combi-nations of N and S) were found as wellFramework Step 3 e wavelengths in the selectedoptimal and quasioptimal subwavebands were trans-formed to the principal variables using the principalcomponent algorithm Each new generated compre-hensive variable of latent principal component (LPC) isexpressed by a linear function of the initial wavelengthsin the target subwaveband e LPCs are sorted in asequence with their descending information contri-bution and significant LPCs can be obtained in front ofthe sequence e LPCs should have been the optimaloutput variables for NIR predictions but GA isdesigned in this framework as the implementablemethod for secondary optimizationFramework Step 4 As for GA implementation thedesignated significant LPCs are taken as the candidateindividuals for GA inputs e initial population isgenerated by randomly selecting a subset of theavailable LPCs and then goes through the steps of the

parametric scaling GA iteration e loss function isreasonably defined obeying the rule of minimizing theNIR prediction errors Accompanied with the selectioncrossover and mutation processes new generations ofthe population are repeatedly produced and cyclicallyrefreshed us the secondary optimized informativevariables are updated in loops until the iteration stops

3 Data and Applications

31 Data Acquisition Fishmeal samples were prepared asanimal feeds A total of 194 samples were collected for ourexperiment 113 of them were acquired from Guangxi (anautonomous province in South China) and 81 were acquiredfrom Vietnam e samples were physically pretreated byair-drying crushing and sifting to particles with diametersno larger than 02mm e contents of protein were pre-viously detected for each sample using the conventionalKjeldahl method (Chinese National Standard GBT 6432-2018) e maximum and minimum values as well as thestatistical average value and the standard deviation wererecorded as 6703 5317 60670 and 4362 (wt)respectively

e NIR spectral measurement was performed by usingFOSS NIR Systems 5000 grating spectrometer (Foss NIR-Systems Inc Denmark) equipped with its diffuse reflectanceaccessory and a round sample cell To reduce the systematicerror each sample was repeatedly detected for 5 times andthe average spectrum was output as the spectrum of thesample e temperature in the laboratory was controlled at25plusmn 1degC and the humidity was controlled at 70plusmn 1 RH

Parametric scaling genetic optimization

Iteration

Spectral data

Pretreatment

Grid search moving window

Stunable = 1 1 (P ndash N + 1)Window(N arg(min ERR(Stunable))

Ntunable = 1 1 (P ndash S + 1)

Window(arg(min ERR(Ntunable) S)

Selected wavebands

Latent principal componentextraction

PCA transform

Latent variablesrepresenting main

information ofwavelengths

A set of individualwavelengths

Comprehensivevariables

Populationinitialized Calibration model Model evaluation

Variableoutput

Variableinput

Satisfied

NonsatisfiedOptimalvariables

N

S

Evolution

SelectionCrossover

Mutation

Figure 1 e flowchart of the optimization framework of GSMW-LPC-GA

4 Computational Intelligence and Neuroscience

throughout the spectral scanning processeNIR spectrumwas measured over the wavenumber range of 1100ndash2500 nmwith a resolution of 2 nm so that 700 distincable wave-lengths were identified as variables recording the spectraldata e spectra of 194 fishmeal samples are shown inFigure 2

32 Sample Sets and Model Indicators e proposed algo-rithmic framework of GSMW-LPC-GA was applied tovariable selection for NIR calibration of fishmeal for thequantitative determination of protein content For NIRcalibrations the fishmeal samples should be split into themodeling set and the testing set As for modeling repre-sentativeness the testing set was randomly picked in ad-vance as including 54 samples and the remaining sampleswere for modeling Because the NIR calibration models needestablishment and optimization the modeling samples werefurther divided into the calibration part (90 samples) and thevalidation part (50 samples) where the method of sample setpartitioning based on joint x-y distance [36] was appliedemaximum minimum and average values and the standarddeviation for the calibration part validation part and testingset are listed in Table 1 e statistical data of all sampleswere also listed

Quantitative indicators are needed to evaluate the cal-ibration model Root mean square error (RMSE) and cor-relation coefficients (R) are commonly used which aredefined as follows

RMSE

1113936 yi minus yiprime( 11138572

n

1113971

R 1113936 yi minus ym( 1113857 yi

prime minus ymprime( 1113857

1113936 yi minus ym( 11138572

1113936 yiprime minus ymprime( 1113857

21113969

(2)

where yi and yiprime are the Kjeldahl-measured protein content

and the NIR predictive value of the i-th fishmeal sample ym

and ymprime are the mean value of yi | i 1 2 n1113864 1113865 and

yiprime | i 1 2 n1113864 1113865 respectively and n represents the

number of target samples Consequently we denotedRMSECRC RMSEVRV and RMSETRT as the corre-sponding signs of RMSER for the calibration samplesvalidation samples and testing samples respectively

4 Results and Discussion

e optimization framework of GSMW-LPC-GA was ap-plied to the quantitative analysis of fishmeal protein basedon the NIR spectral matrix In the framework GSMWmodeling method was used to select wavebands with highsignal-to-noise ratio which were regarded as the informativecontinuous wavelengths for further optimization Next la-tent variables were extracted from the initial continuouswavelengths by using the latent principal componenttechnique so as to hold more information of the target withless designated variablesen genetic algorithmwas appliedto promote the variable population evolution with steps ofselection crossover and mutation In the end we managed

to obtain some informative variables for the calibration offishmeal protein aiming to have a prospective improvementfor the quality of NIR analysis

41 Waveband Selection for the Partition of GSMWGSMW is designed to use the grid search mode to optimizethe way of parameter identification for the moving windowalgorithm flow As is known that a specific window is de-termined by the position and its width (ie the number ofvariables in the window) the task of moving window op-timization is to find the target parameter combinations of(N S) that point to a solid window leading to optimal NIRprediction effect In this work we set N and S tunabletargeting 700 discrete initial scanning wavelengths Con-sidering that a model established on a large number ofvariables will increase the computational complexity thuswe design N valuing continuously when it is not over 100and changing with a jumping step of 10 points when it issmaller than 300 and with a step of 20 when larger than 300(ie N isin [1 1 100]cup [110 10 300]cup [320 20 700]) Inthis way the value of N roughly went through some rep-resentative cases from 1 to 700 and also guaranteed a lesscomputing load for acceptable operation with currentlyprevalent computers On the other hand the parametricvalue of S was set changing from 1 consecutively to 700 (ieS isin [1 1 700]) All applicable combinations of (N S)should subject to the rule of N + S minus 1leP where P is equalto 700 Consequently we totally tested 78 790 possiblewindows

PLS the commonly used calibration method was ap-plied for each determinant window to estimate the func-tional contributions of the window wavelengths to theresults of NIR calibrations e calibration samples wereused for model training and the validation samples forselecting the optimal model with a minimum value ofRMSEV e predictive RMSEV matrix is shown as acontour plot in Figure 3 corresponding to each fixed valueof (N S)en the most optimal window can be identified inFigure 3 Additionally some well-performing models areexpected to have further improvement in the next steps ofthe GSMW-LPC-GA framework us we finally determine

1100 1300 1500 1700 1900 2100 2300 250001

02

03

04

05

06

07

08

09

10

Log

(1R

)

Wavelength (nm)

Figure 2 NIR reflectance spectra of 193 fishmeal samples

Computational Intelligence and Neuroscience 5

to choose 5 optimal windows for further analysis (marked as(1) (2) (3) (4) and (5) in Figure 3)e prediction results aswell as their operating parameters of the PLS models in these5 windows are presented in Table 2 respectively We foundthat the most optimal waveband was 1446ndash1520 nm whichincludes 38 wavelengthsvariables Also we spotlight theselected 5 optimal wavebands in the full range (see Figure 4)

42GeneticOptimization for LPCs in the SelectedWavebandse selected 5 optimal wavebands were available for furtheroptimization e parameter scaling genetic algorithm wasapplied with principal component analysis plugged in evariables in each waveband were transformed to principalcomponents (LPC) as latent variables for analysis enumber of LPCs is determined by the number of the originalvariables in the waveband ese latent components weresorted in descending contribution to its original data whichmeans that the latent variables in front of the queue containmore information than the variables in the back us the

LPC transform projected the raw data into a reasonablyrefined variable space in which the target information iseasier to observe e original variables in windows (1) (2)(3) (4) and (5) were transformed into 24 23 38 36 and 18LPCs respectively All of the LPCs were used as the inputsfor genetic optimization

Genetic optimization is the vital implementation forvariable selection embedded in the GSMW-LPC-GAframework To make the GA procedure suitable for theselected input data the population size was set fitting the

0 50 100 150 200 250 300Serial number of the starting wavelength (S)

350 400 450 500 550 600 650 700

102030405060708090

100120140160180200220

260240

280300

340

Num

ber o

f var

iabl

es (N

) 380

420

460

500

540

580

620

660

700RMSEV

114108102969084787266605448

Figure 3 Contour plot of the validating results by the GSMW model

Table 1 e statistical data of fishmeal protein contents for the calibration part validation part and testing set

No of samples Maximum Minimum Average Standard deviationCalibration part 90 6703 5317 60593 4347Validation part 50 6619 5386 59174 4412Testing set 54 6693 5384 60437 4308All samples 194 6703 5317 60670 4360

Table 2e optimal model validating results corresponding to the5 selected wavebands

S N Waveband (nm) RMSEV (wt) RV

(1) 18 24 1134ndash1180 5146 0874(2) 77 23 1252ndash1296 5071 0884(3) 174 38 1446ndash1520 4944 0905(4) 441 36 1980ndash2050 5190 0892(5) 486 18 2070ndash2104 5312 0908

6 Computational Intelligence and Neuroscience

1100 1300 1500 1700 1900 2100 2300 250046

47

48

49

50

51

52

53

542070ndash2104

1980ndash2050

1446ndash1520

1252ndash12961134ndash1180

RMSE

V

Wavelength

The shape of thespectrum not fitting

the RMSEV-axis

Figure 4 e locations of the 5 selected wavebands in the full spectral range

(1)(4)(7)

(2)(5)(8)

(3)(6)(9)

(1)(4)(7)

(2)(5)(8)

(3)(6)(9)

(1)(4)(7)

(2)(5)(8)

(3)(6)(9)

(1)(4)(7)

(2)(5)(8)

(3)(6)(9)

(1)(4)(7)

(2)(5)(8)

(3)(6)(9)

Evolution for variablesin waveband 1252ndash1296nm

42

44

46

48

50

52

54

56

58

0 100 200 300 400 500Iterations

(b) (c)

(d) (e)

Evolution for variablesin waveband 1446ndash1520nm

42

44

46

48

50

52

54

56

58

RMSE

V(lo

ss fu

nctio

n)RM

SEV

(loss

func

tion)

0 100 200 300 400 500Iterations

Evolution for variablesin waveband 2070ndash2104nm

42

44

46

48

50

52

54

56

58

0 100 200 300 400 500Iterations

0 100 200 300 400 50042

44

46

48

50

52

54

56

58Evolution for variables

in waveband 1134ndash1180nm

RMSE

V(lo

ss fu

nctio

n)RM

SEV

(loss

func

tion)

RMSE

V(lo

ss fu

nctio

n)

Iterations(a)

Evolution for variablesin waveband 1980ndash2050nm

42

44

46

48

50

52

54

56

58

0 100 200 300 400 500Iterations

(1) (2) (3)

(4) (5) (6)

(7) (8) (9)

Probability of crossover40 50 60

Prob

abili

ty o

f mut

atio

n 08

12

15

Figure 5 e optimizational effects for the principal variables in the 5 selected wavebands based on the parametric scaling GA iterations

Computational Intelligence and Neuroscience 7

inputs and then the selection crossover and mutation werecarried out For parametric scaling tuning the probability ofcrossover was set as 40 50 and 60 for testing emutation operator was set testing the probabilities of 0812 and 15 en the population automatically evolvedin a loop iteration and the loop was stopped in case ofmeeting each of the following two conditions and theframework-optimized model was determined

Condition 1 When reaching 500 times iteration the loopwas mandatorily stopped

Condition 2 When there was no further change appearingin the loss function (ie the RMSEV) for consecutively 20times iteration the loop was adaptively stopped

We tested the specific scaling parameter valuing ofcrossover andmutation in genetic algorithm for each of the 5designated wavebands e framework optimization resultsin the genetic evolution process are shown in Figure 5 It canbe seen in Figure 5 that the evolution loop did not stop insome cases until 500 times iteration and in other cases theloop stopped before iteration reached 500 times Finally wewere able to find the best results for each of the 5 selectedwavebands by the genetic optimizational process e sec-ondary optimized RMSEV and the corresponding RV areshown in Table 3 as well as their preferable GA parametersWe concluded from Table 3 that all of the data in the 5selected wavebands were further optimized by GA and the

most optimal result was observed in the waveband of1980ndash2050 nm whose evolution loop stopped at the 434times iteration

43 +e Testing Results Based on the Optimal GSMW-LPC-GAModel As discussed above the predictive performance ofthe GSMW-LPC-GA framework on the calibration andvalidation sample sets was satisfactory e optimal modelselected by the GSMW-LPC-GA framework was establishedon the waveband 1980ndash2050 nm (36 variables included)with genetic optimization at 40 crossover and 08 mu-tation e evolution loop stopped at 434 times iterationis framework optimal model outputs the prediction errorRMSEV as 4215 and the correlation RV as 0926 To evaluatethe practical modeling effect of the framework we took thewaveband 1980ndash2050 nm as the target spectral data in thetesting part performed the principal component extractionand applied the same GA parameters to estimate themodeling effect for the 53 testing samples which were in-dependent of the training part e PLS regression plot isshown in Figure 6(a) demonstrating the testing resultswhere the RMSET and RT were 5341 and 0883 respectivelyAs for comparison the typical GA evolution method wasused for establishing and optimizing the PLS calibrationvalidation and testing processes With common algorithmicsettings GA runs with 50 crossover 12 mutation and500 times of iteration [31] e GA-PLS testing results are

Table 3 e optimal model validating results corresponding to the 5 selected wavebands

Waveband (nm)e optimal parameters for genetic evolution

RMSEV (wt) RVCrossover () Mutation () Iteration times

1134ndash1180 50 15 500 4755 09061252ndash1296 40 08 437 4546 08971446ndash1520 50 08 469 4353 09181980ndash2050 40 08 434 4215 09262070ndash2104 50 08 484 4649 0912e iteration times represent when the optimal RMSEV was kept as a constant value for a cycle of 20 times iteration or less than 20 in case the iteration hadreached 500 times

40 45 50 55 60 65 70 75 8040

45

50

55

60

65

70

75

80 GSMW-LPC-GA optimizationwaveband 1980ndash2050nm

Measured content

RMSEV = 5341RT = 0883

NIR

pre

dict

ed v

alue

(a)

40 45 50 55 60 65 70 75 8040

45

50

55

60

65

70

75

80

RMSEV = 6372RT = 0864

Typical GA optimizationwaveband 1100ndash2500nm

NIR

pre

dict

ed v

alue

Measured content

(b)

Figure 6 e PLS regression plot for test samples ((a) the proposed GSMW-LPC-GA framework and (b) the typical GA evolution)

8 Computational Intelligence and Neuroscience

shown as the regression plot in Figure 6(b) e comparativeexperiments show that the proposed GSMW-LPC-GAframework performs better than the typical GA evolution inthe rapid NIR quantitative prediction of fishmeal protein

5 Conclusions

NIR calibration model was established for the quantitativedetermination of protein content in fishmeal samples Anovel optimization framework of GSMW-LPC-GA wasconstructed for model optimization Possible wavebandswere tested for model enhancement by grid searchingmoving window parameters in which the starting point wasset continuously changing from 1 to the total number ofwavelengths in the scanning range and the window size wasset in a specific design to balance the model representa-tiveness and the computational complexity Successively 5optimal window wavebands were selected for further op-timization by LPC feature extraction and GA iterationGenetic algorithmic iteration was designed as the coreprocess of implementable optimization e variableswavelengths in each of the 5 selected wavebands weretransformed to LPCs as the input variables for GA iterationIn GA optimization the population size was set obeying thedimensions of the input variable set e probability ofcrossover and mutation was designed for parameter tuninge control of iteration enlarged the extension of parametricscaling for generation evolutions Finally the model wassecondarily optimized by the selection crossover andmutation operators in a refined interaction within theGSMW-LPC-GA framework

e model prediction effects resulting from the GSMW-LPC-GA framework were experimentally proved better thanthe effect by simply adopting the moving window PLSmodel In calibration-validation model training the optimalGA probabilities for crossover and mutation were 40 and08 respectively and the predictive RMSEV and RV werebest as 4215 and 0926 respectively at the waveband1980ndash2050 nm is is significantly better than the movingwindow output at the waveband 1446ndash1520 nm For modelevaluation the best model generated from the GSMW-LPC-GA framework was further estimated by the testing sampleset which is previously chosen outside of the trainingsamples e testing results were not so good as the trainingresults but it was practically acceptable for some industrialrapid detection cases with the RMSET and RT equaling to5341 and 0883 respectively

e experimental results demonstrate that the proposedGSMW-LPC-GA framework is suitable for NIR quantitativedetermination of fishmeal protein It was able to producebetter models in relation to the full-spectrum model withthe advantage of selecting informative wavebands for targetchemical analytes In the framework GA optimizationeventually plays an important role in the improvement of thecalibration model Further studies include using the NIRtechnology to perform rapid detections on other targets infields of animal husbandry agriculture and food scienceand genetic optimization iteration is expected to be animplementable method providing an efficient strategy for

analyzing some unknown changes and influence of variousfertilizers

Data Availability

e data used to support the findings of this study areavailable from the corresponding author upon request

Conflicts of Interest

e authors declare that there are no conflicts of interestregarding the publication of this paper

Acknowledgments

is work was supported by the National Natural ScienceFoundation of China (61505037 and 61763008) the NaturalScience Foundation of Guangxi Province(2018GXNSFAA050045) Guangzhou Science and Tech-nology Program (202002030246) and Fangchenggang Sci-entific Research and Technology Development Plan (2014no 42)

References

[1] M A Booth G L Allan J Frances and S ParkinsonldquoReplacement of fish meal in diets for Australian silver perchBidyanus bidyanusrdquo Aquaculture vol 196 no 1-2 pp 67ndash852001

[2] R W Hardy ldquoUtilization of plant proteins in fish diets effectsof global demand and supplies of fishmealrdquo AquacultureResearch vol 41 no 5 pp 770ndash776 2010

[3] A Corten C-B Braham and A S Sadegh ldquoe developmentof a fishmeal industry in Mauritania and its impact on theregional stocks of sardinella and other small pelagics inNorthwest Africardquo Fisheries Research vol 186 pp 328ndash3362017

[4] D Han X Shan W Zhang et al ldquoA revisit to fishmeal usageand associated consequences in Chinese aquaculturerdquo Re-views in Aquaculture vol 10 no 2 pp 493ndash507 2018

[5] D Cozzolino A Chree I Murray and J R Scaife ldquoeassessment of the chemical composition of fishmeal by nearinfrared reflectance spectroscopyrdquo Aquaculture Nutritionvol 8 no 2 pp 149ndash155 2002

[6] M Uddin E Okazaki H Fukushima S Turza Y Yumikoand Y Fukuda ldquoNondestructive determination of water andprotein in surimi by near-infrared spectroscopyrdquo FoodChemistry vol 96 no 3 pp 491ndash495 2006

[7] H Yamasaki and S Morita ldquoIdentification of the epoxycuring mechanism under isothermal conditions by thermalanalysis and infrared spectroscopyrdquo Journal of MolecularStructure vol 1069 no 1 pp 164ndash170 2014

[8] J Yan N Villarreal and B Xu ldquoCharacterization of degra-dation of cotton cellulosic fibers through near infraredspectroscopyrdquo Journal of Polymers and the Environmentvol 21 no 4 pp 902ndash909 2013

[9] P Sirisomboon P Pornchaloempong P RamsiriS Pongkuan and S Srikornkarn ldquoEvaluation of the saltcontent of canned sardines in tomato ketchup by diffusereflection near infrared spectroscopyrdquo Journal of Near In-frared Spectroscopy vol 22 no 5 pp 329ndash336 2014

[10] C A T dos Santos M Lopo R N M J Pascoa andJ A Lopes ldquoA review on the applications of portable near-

Computational Intelligence and Neuroscience 9

infrared spectrometers in the agro-food industryrdquo AppliedSpectroscopy vol 67 no 11 pp 1215ndash1233 2013

[11] A Sakudo ldquoNear-infrared spectroscopy for medical appli-cations Current status and future perspectivesrdquo ClinicaChimica Acta vol 455 pp 181ndash188 2016

[12] S Hong H Chen J Gu K Cai Z Liu and J WenldquoComparative assessment on smart pre-processing methodsfor extracting information in ft-nir measured datardquo Mea-surement vol 157 Article ID 107663 2020

[13] T N Kok and J W Park ldquoExtending the shelf life of set fishballrdquo Journal of Food Quality vol 30 no 1 pp 1ndash27 2007

[14] G Foca C Ferrari A Ulrici M C Ielo G Minelli andD P Lo Fiego ldquoIodine value and fatty acids determination onpig fat samples by FT-NIR spectroscopy benefits of variableselection in the perspective of industrial applicationsrdquo FoodAnalytical Methods vol 9 no 10 pp 2791ndash2806 2016

[15] F Westad and H Martens ldquoVariable selection in near in-frared spectroscopy based on significance testing in partialleast squares regressionrdquo Journal of Near Infrared Spectros-copy vol 8 no 2 pp 117ndash124 2000

[16] T Mehmood K H Liland L Snipen and S Saeligboslash ldquoA reviewof variable selection methods in partial least squares regres-sionrdquo Chemometrics and Intelligent Laboratory Systemsvol 118 pp 62ndash69 2012

[17] S Kuriakose and H Joe ldquoQualitative and quantitative analysisin sandalwood oils using near infrared spectroscopy com-bined with chemometric techniquesrdquo Food Chemistryvol 135 no 1 pp 213ndash218 2012

[18] M Versaci S Calcagno M Cacciola F C MorabitoI Palamara and D Pellicano ldquoStandard soft computingtechniques for characterization of defects in nondestructiveevaluationrdquo in Ultrasonic Nondestructive Evaluation SystemsP Burrascano S Callegari A Montisci et al Eds pp 175ndash199 Springer Cham Heidelberg Dordrecht London 2015

[19] M Versaci ldquoFuzzy approach and eddy currents NDTNDEdevices in industrial applicationsrdquo Electronics Letters vol 52no 11 pp 943ndash945 2016

[20] Y-H Yun H-D Li L E R Wood et al ldquoAn efficient methodof wavelength interval selection based on random frog formultivariate spectral calibrationrdquo Spectrochimica Acta Part AMolecular and Biomolecular Spectroscopy vol 111 pp 31ndash362013

[21] H Chen Z Liu K Cai L Xu and A Chen ldquoGrid searchparametric optimization for FT-NIR quantitative analysis ofsolid soluble content in strawberry samplesrdquo VibrationalSpectroscopy vol 94 pp 7ndash15 2018

[22] S Kasemsumran N Suttiwijitpukdee and V KeeratinijakalldquoRapid classification of turmeric based on DNA fingerprint bynear-infrared spectroscopy combined with moving windowpartial least squares-discrimination analysisrdquo Analytical Sci-ences vol 33 no 1 pp 111ndash115 2017

[23] K Zheng T Feng W Zhang et al ldquoVariable selection bydouble competitive adaptive reweighted sampling for cali-bration transfer of near infrared spectrardquo Chemometrics andIntelligent Laboratory Systems vol 191 pp 109ndash117 2019

[24] A Joo A Ekart and J P Neirotti ldquoGenetic algorithms fordiscovery of matrix multiplication methodsrdquo IEEE Transac-tions on Evolutionary Computation vol 16 no 5 pp 749ndash7512012

[25] E Haq I Ahmad A Hussain and I M Almanjahie ldquoA novelselection approach for genetic algorithms for global optimi-zation of multimodal continuous functionsrdquo ComputationalIntelligence and Neuroscience vol 2019 Article ID 86402182019

[26] B K Lavine and C G White ldquoBoosting the performance ofgenetic algorithms for variable selection in partial leastsquares spectral calibrationsrdquo Applied Spectroscopy vol 71no 9 pp 2092ndash2101 2017

[27] L Mo H Chen W Chen Q Feng and L Xu ldquoStudy onevolution methods for the optimization of machine learningmodels based on FT-NIR spectroscopyrdquo Infrared Physics ampTechnology vol 108 Article ID 103366 2020

[28] Y-H Yun H-D Li B-C Deng and D-S Cao ldquoAn overviewof variable selection methods in multivariate analysis of near-infrared spectrardquo TrAC Trends in Analytical Chemistryvol 113 pp 102ndash115 2019

[29] S Forrest ldquoGenetic algorithms principles of natural selectionapplied to computationrdquo Science vol 261 no 5123pp 872ndash878 1993

[30] H Jiang W Xu and Q Chen ldquoComparison of algorithms forwavelength variables selection from near-infrared (NIR)spectra for quantitative monitoring of yeast (saccharomycescerevisiae) cultivationsrdquo Spectrochimica Acta Part A Mo-lecular and Biomolecular Spectroscopy vol 214 pp 366ndash3712019

[31] J Koljonen T E M Nordling and J T Alander ldquoA review ofgenetic algorithms in near infrared spectroscopy and che-mometrics past and futurerdquo Journal of Near Infrared Spec-troscopy vol 16 no 3 pp 189ndash197 2008

[32] J-H Jiang R J Berry H W Siesler and Y OzakildquoWavelength interval selection in multicomponent spectralanalysis by moving window partial least-squares regressionwith applications to mid-infrared and near-infrared spec-troscopic datardquo Analytical Chemistry vol 74 no 14pp 3555ndash3565 2002

[33] H-Z Chen Q-Q Song G-Q Tang and L-L Xu ldquoAnoptimization strategy for waveband selection in FT-NIRquantitative analysis of corn proteinrdquo Journal of Cereal Sci-ence vol 60 no 3 pp 595ndash601 2014

[34] C Ranzan L F Trierweiler B Hitzmann andJ O Trierweiler ldquoNIR pre-selection data using modifiedchangeable size moving window partial least squares and purespectral chemometrical modeling with ant colony optimiza-tion for wheat flour characterizationrdquo Chemometrics andIntelligent Laboratory Systems vol 142 pp 78ndash86 2015

[35] T Fearn C Riccioli A Garrido-Varo and J E Guerrero-Ginel ldquoOn the geometry of SNV and MSCrdquo Chemometricsand Intelligent Laboratory Systems vol 96 no 1 pp 22ndash262009

[36] R Galvao M Araujo G Jose M Pontes E Silva andT Saldanha ldquoA method for calibration and validation subsetpartitioningrdquo Talanta vol 67 no 4 pp 736ndash740 2005

10 Computational Intelligence and Neuroscience

Page 4: ANovelGeneticAlgorithm-BasedOptimizationFrameworkfor ...variable selection for NIR calibration of fishmeal for the quantitative determination of protein content. For NIR calibrations,

Framework Step 1 Wavelengths in the full-scanningrange of NIR spectroscopy were resolved as continu-ously changing variables e spectral data are nor-malized and pretreated by the method of standardnormal variate (SNV) [35]Framework Step 2 e grid search moving window(GSMW) technique was applied to the full-lengthspectral range In the way of screening all possiblewindows with applicable size and position the optimalsubwaveband can be identified and some quasioptimalsubwavebands (several appreciable parameter combi-nations of N and S) were found as wellFramework Step 3 e wavelengths in the selectedoptimal and quasioptimal subwavebands were trans-formed to the principal variables using the principalcomponent algorithm Each new generated compre-hensive variable of latent principal component (LPC) isexpressed by a linear function of the initial wavelengthsin the target subwaveband e LPCs are sorted in asequence with their descending information contri-bution and significant LPCs can be obtained in front ofthe sequence e LPCs should have been the optimaloutput variables for NIR predictions but GA isdesigned in this framework as the implementablemethod for secondary optimizationFramework Step 4 As for GA implementation thedesignated significant LPCs are taken as the candidateindividuals for GA inputs e initial population isgenerated by randomly selecting a subset of theavailable LPCs and then goes through the steps of the

parametric scaling GA iteration e loss function isreasonably defined obeying the rule of minimizing theNIR prediction errors Accompanied with the selectioncrossover and mutation processes new generations ofthe population are repeatedly produced and cyclicallyrefreshed us the secondary optimized informativevariables are updated in loops until the iteration stops

3 Data and Applications

31 Data Acquisition Fishmeal samples were prepared asanimal feeds A total of 194 samples were collected for ourexperiment 113 of them were acquired from Guangxi (anautonomous province in South China) and 81 were acquiredfrom Vietnam e samples were physically pretreated byair-drying crushing and sifting to particles with diametersno larger than 02mm e contents of protein were pre-viously detected for each sample using the conventionalKjeldahl method (Chinese National Standard GBT 6432-2018) e maximum and minimum values as well as thestatistical average value and the standard deviation wererecorded as 6703 5317 60670 and 4362 (wt)respectively

e NIR spectral measurement was performed by usingFOSS NIR Systems 5000 grating spectrometer (Foss NIR-Systems Inc Denmark) equipped with its diffuse reflectanceaccessory and a round sample cell To reduce the systematicerror each sample was repeatedly detected for 5 times andthe average spectrum was output as the spectrum of thesample e temperature in the laboratory was controlled at25plusmn 1degC and the humidity was controlled at 70plusmn 1 RH

Parametric scaling genetic optimization

Iteration

Spectral data

Pretreatment

Grid search moving window

Stunable = 1 1 (P ndash N + 1)Window(N arg(min ERR(Stunable))

Ntunable = 1 1 (P ndash S + 1)

Window(arg(min ERR(Ntunable) S)

Selected wavebands

Latent principal componentextraction

PCA transform

Latent variablesrepresenting main

information ofwavelengths

A set of individualwavelengths

Comprehensivevariables

Populationinitialized Calibration model Model evaluation

Variableoutput

Variableinput

Satisfied

NonsatisfiedOptimalvariables

N

S

Evolution

SelectionCrossover

Mutation

Figure 1 e flowchart of the optimization framework of GSMW-LPC-GA

4 Computational Intelligence and Neuroscience

throughout the spectral scanning processeNIR spectrumwas measured over the wavenumber range of 1100ndash2500 nmwith a resolution of 2 nm so that 700 distincable wave-lengths were identified as variables recording the spectraldata e spectra of 194 fishmeal samples are shown inFigure 2

32 Sample Sets and Model Indicators e proposed algo-rithmic framework of GSMW-LPC-GA was applied tovariable selection for NIR calibration of fishmeal for thequantitative determination of protein content For NIRcalibrations the fishmeal samples should be split into themodeling set and the testing set As for modeling repre-sentativeness the testing set was randomly picked in ad-vance as including 54 samples and the remaining sampleswere for modeling Because the NIR calibration models needestablishment and optimization the modeling samples werefurther divided into the calibration part (90 samples) and thevalidation part (50 samples) where the method of sample setpartitioning based on joint x-y distance [36] was appliedemaximum minimum and average values and the standarddeviation for the calibration part validation part and testingset are listed in Table 1 e statistical data of all sampleswere also listed

Quantitative indicators are needed to evaluate the cal-ibration model Root mean square error (RMSE) and cor-relation coefficients (R) are commonly used which aredefined as follows

RMSE

1113936 yi minus yiprime( 11138572

n

1113971

R 1113936 yi minus ym( 1113857 yi

prime minus ymprime( 1113857

1113936 yi minus ym( 11138572

1113936 yiprime minus ymprime( 1113857

21113969

(2)

where yi and yiprime are the Kjeldahl-measured protein content

and the NIR predictive value of the i-th fishmeal sample ym

and ymprime are the mean value of yi | i 1 2 n1113864 1113865 and

yiprime | i 1 2 n1113864 1113865 respectively and n represents the

number of target samples Consequently we denotedRMSECRC RMSEVRV and RMSETRT as the corre-sponding signs of RMSER for the calibration samplesvalidation samples and testing samples respectively

4 Results and Discussion

e optimization framework of GSMW-LPC-GA was ap-plied to the quantitative analysis of fishmeal protein basedon the NIR spectral matrix In the framework GSMWmodeling method was used to select wavebands with highsignal-to-noise ratio which were regarded as the informativecontinuous wavelengths for further optimization Next la-tent variables were extracted from the initial continuouswavelengths by using the latent principal componenttechnique so as to hold more information of the target withless designated variablesen genetic algorithmwas appliedto promote the variable population evolution with steps ofselection crossover and mutation In the end we managed

to obtain some informative variables for the calibration offishmeal protein aiming to have a prospective improvementfor the quality of NIR analysis

41 Waveband Selection for the Partition of GSMWGSMW is designed to use the grid search mode to optimizethe way of parameter identification for the moving windowalgorithm flow As is known that a specific window is de-termined by the position and its width (ie the number ofvariables in the window) the task of moving window op-timization is to find the target parameter combinations of(N S) that point to a solid window leading to optimal NIRprediction effect In this work we set N and S tunabletargeting 700 discrete initial scanning wavelengths Con-sidering that a model established on a large number ofvariables will increase the computational complexity thuswe design N valuing continuously when it is not over 100and changing with a jumping step of 10 points when it issmaller than 300 and with a step of 20 when larger than 300(ie N isin [1 1 100]cup [110 10 300]cup [320 20 700]) Inthis way the value of N roughly went through some rep-resentative cases from 1 to 700 and also guaranteed a lesscomputing load for acceptable operation with currentlyprevalent computers On the other hand the parametricvalue of S was set changing from 1 consecutively to 700 (ieS isin [1 1 700]) All applicable combinations of (N S)should subject to the rule of N + S minus 1leP where P is equalto 700 Consequently we totally tested 78 790 possiblewindows

PLS the commonly used calibration method was ap-plied for each determinant window to estimate the func-tional contributions of the window wavelengths to theresults of NIR calibrations e calibration samples wereused for model training and the validation samples forselecting the optimal model with a minimum value ofRMSEV e predictive RMSEV matrix is shown as acontour plot in Figure 3 corresponding to each fixed valueof (N S)en the most optimal window can be identified inFigure 3 Additionally some well-performing models areexpected to have further improvement in the next steps ofthe GSMW-LPC-GA framework us we finally determine

1100 1300 1500 1700 1900 2100 2300 250001

02

03

04

05

06

07

08

09

10

Log

(1R

)

Wavelength (nm)

Figure 2 NIR reflectance spectra of 193 fishmeal samples

Computational Intelligence and Neuroscience 5

to choose 5 optimal windows for further analysis (marked as(1) (2) (3) (4) and (5) in Figure 3)e prediction results aswell as their operating parameters of the PLS models in these5 windows are presented in Table 2 respectively We foundthat the most optimal waveband was 1446ndash1520 nm whichincludes 38 wavelengthsvariables Also we spotlight theselected 5 optimal wavebands in the full range (see Figure 4)

42GeneticOptimization for LPCs in the SelectedWavebandse selected 5 optimal wavebands were available for furtheroptimization e parameter scaling genetic algorithm wasapplied with principal component analysis plugged in evariables in each waveband were transformed to principalcomponents (LPC) as latent variables for analysis enumber of LPCs is determined by the number of the originalvariables in the waveband ese latent components weresorted in descending contribution to its original data whichmeans that the latent variables in front of the queue containmore information than the variables in the back us the

LPC transform projected the raw data into a reasonablyrefined variable space in which the target information iseasier to observe e original variables in windows (1) (2)(3) (4) and (5) were transformed into 24 23 38 36 and 18LPCs respectively All of the LPCs were used as the inputsfor genetic optimization

Genetic optimization is the vital implementation forvariable selection embedded in the GSMW-LPC-GAframework To make the GA procedure suitable for theselected input data the population size was set fitting the

0 50 100 150 200 250 300Serial number of the starting wavelength (S)

350 400 450 500 550 600 650 700

102030405060708090

100120140160180200220

260240

280300

340

Num

ber o

f var

iabl

es (N

) 380

420

460

500

540

580

620

660

700RMSEV

114108102969084787266605448

Figure 3 Contour plot of the validating results by the GSMW model

Table 1 e statistical data of fishmeal protein contents for the calibration part validation part and testing set

No of samples Maximum Minimum Average Standard deviationCalibration part 90 6703 5317 60593 4347Validation part 50 6619 5386 59174 4412Testing set 54 6693 5384 60437 4308All samples 194 6703 5317 60670 4360

Table 2e optimal model validating results corresponding to the5 selected wavebands

S N Waveband (nm) RMSEV (wt) RV

(1) 18 24 1134ndash1180 5146 0874(2) 77 23 1252ndash1296 5071 0884(3) 174 38 1446ndash1520 4944 0905(4) 441 36 1980ndash2050 5190 0892(5) 486 18 2070ndash2104 5312 0908

6 Computational Intelligence and Neuroscience

1100 1300 1500 1700 1900 2100 2300 250046

47

48

49

50

51

52

53

542070ndash2104

1980ndash2050

1446ndash1520

1252ndash12961134ndash1180

RMSE

V

Wavelength

The shape of thespectrum not fitting

the RMSEV-axis

Figure 4 e locations of the 5 selected wavebands in the full spectral range

(1)(4)(7)

(2)(5)(8)

(3)(6)(9)

(1)(4)(7)

(2)(5)(8)

(3)(6)(9)

(1)(4)(7)

(2)(5)(8)

(3)(6)(9)

(1)(4)(7)

(2)(5)(8)

(3)(6)(9)

(1)(4)(7)

(2)(5)(8)

(3)(6)(9)

Evolution for variablesin waveband 1252ndash1296nm

42

44

46

48

50

52

54

56

58

0 100 200 300 400 500Iterations

(b) (c)

(d) (e)

Evolution for variablesin waveband 1446ndash1520nm

42

44

46

48

50

52

54

56

58

RMSE

V(lo

ss fu

nctio

n)RM

SEV

(loss

func

tion)

0 100 200 300 400 500Iterations

Evolution for variablesin waveband 2070ndash2104nm

42

44

46

48

50

52

54

56

58

0 100 200 300 400 500Iterations

0 100 200 300 400 50042

44

46

48

50

52

54

56

58Evolution for variables

in waveband 1134ndash1180nm

RMSE

V(lo

ss fu

nctio

n)RM

SEV

(loss

func

tion)

RMSE

V(lo

ss fu

nctio

n)

Iterations(a)

Evolution for variablesin waveband 1980ndash2050nm

42

44

46

48

50

52

54

56

58

0 100 200 300 400 500Iterations

(1) (2) (3)

(4) (5) (6)

(7) (8) (9)

Probability of crossover40 50 60

Prob

abili

ty o

f mut

atio

n 08

12

15

Figure 5 e optimizational effects for the principal variables in the 5 selected wavebands based on the parametric scaling GA iterations

Computational Intelligence and Neuroscience 7

inputs and then the selection crossover and mutation werecarried out For parametric scaling tuning the probability ofcrossover was set as 40 50 and 60 for testing emutation operator was set testing the probabilities of 0812 and 15 en the population automatically evolvedin a loop iteration and the loop was stopped in case ofmeeting each of the following two conditions and theframework-optimized model was determined

Condition 1 When reaching 500 times iteration the loopwas mandatorily stopped

Condition 2 When there was no further change appearingin the loss function (ie the RMSEV) for consecutively 20times iteration the loop was adaptively stopped

We tested the specific scaling parameter valuing ofcrossover andmutation in genetic algorithm for each of the 5designated wavebands e framework optimization resultsin the genetic evolution process are shown in Figure 5 It canbe seen in Figure 5 that the evolution loop did not stop insome cases until 500 times iteration and in other cases theloop stopped before iteration reached 500 times Finally wewere able to find the best results for each of the 5 selectedwavebands by the genetic optimizational process e sec-ondary optimized RMSEV and the corresponding RV areshown in Table 3 as well as their preferable GA parametersWe concluded from Table 3 that all of the data in the 5selected wavebands were further optimized by GA and the

most optimal result was observed in the waveband of1980ndash2050 nm whose evolution loop stopped at the 434times iteration

43 +e Testing Results Based on the Optimal GSMW-LPC-GAModel As discussed above the predictive performance ofthe GSMW-LPC-GA framework on the calibration andvalidation sample sets was satisfactory e optimal modelselected by the GSMW-LPC-GA framework was establishedon the waveband 1980ndash2050 nm (36 variables included)with genetic optimization at 40 crossover and 08 mu-tation e evolution loop stopped at 434 times iterationis framework optimal model outputs the prediction errorRMSEV as 4215 and the correlation RV as 0926 To evaluatethe practical modeling effect of the framework we took thewaveband 1980ndash2050 nm as the target spectral data in thetesting part performed the principal component extractionand applied the same GA parameters to estimate themodeling effect for the 53 testing samples which were in-dependent of the training part e PLS regression plot isshown in Figure 6(a) demonstrating the testing resultswhere the RMSET and RT were 5341 and 0883 respectivelyAs for comparison the typical GA evolution method wasused for establishing and optimizing the PLS calibrationvalidation and testing processes With common algorithmicsettings GA runs with 50 crossover 12 mutation and500 times of iteration [31] e GA-PLS testing results are

Table 3 e optimal model validating results corresponding to the 5 selected wavebands

Waveband (nm)e optimal parameters for genetic evolution

RMSEV (wt) RVCrossover () Mutation () Iteration times

1134ndash1180 50 15 500 4755 09061252ndash1296 40 08 437 4546 08971446ndash1520 50 08 469 4353 09181980ndash2050 40 08 434 4215 09262070ndash2104 50 08 484 4649 0912e iteration times represent when the optimal RMSEV was kept as a constant value for a cycle of 20 times iteration or less than 20 in case the iteration hadreached 500 times

40 45 50 55 60 65 70 75 8040

45

50

55

60

65

70

75

80 GSMW-LPC-GA optimizationwaveband 1980ndash2050nm

Measured content

RMSEV = 5341RT = 0883

NIR

pre

dict

ed v

alue

(a)

40 45 50 55 60 65 70 75 8040

45

50

55

60

65

70

75

80

RMSEV = 6372RT = 0864

Typical GA optimizationwaveband 1100ndash2500nm

NIR

pre

dict

ed v

alue

Measured content

(b)

Figure 6 e PLS regression plot for test samples ((a) the proposed GSMW-LPC-GA framework and (b) the typical GA evolution)

8 Computational Intelligence and Neuroscience

shown as the regression plot in Figure 6(b) e comparativeexperiments show that the proposed GSMW-LPC-GAframework performs better than the typical GA evolution inthe rapid NIR quantitative prediction of fishmeal protein

5 Conclusions

NIR calibration model was established for the quantitativedetermination of protein content in fishmeal samples Anovel optimization framework of GSMW-LPC-GA wasconstructed for model optimization Possible wavebandswere tested for model enhancement by grid searchingmoving window parameters in which the starting point wasset continuously changing from 1 to the total number ofwavelengths in the scanning range and the window size wasset in a specific design to balance the model representa-tiveness and the computational complexity Successively 5optimal window wavebands were selected for further op-timization by LPC feature extraction and GA iterationGenetic algorithmic iteration was designed as the coreprocess of implementable optimization e variableswavelengths in each of the 5 selected wavebands weretransformed to LPCs as the input variables for GA iterationIn GA optimization the population size was set obeying thedimensions of the input variable set e probability ofcrossover and mutation was designed for parameter tuninge control of iteration enlarged the extension of parametricscaling for generation evolutions Finally the model wassecondarily optimized by the selection crossover andmutation operators in a refined interaction within theGSMW-LPC-GA framework

e model prediction effects resulting from the GSMW-LPC-GA framework were experimentally proved better thanthe effect by simply adopting the moving window PLSmodel In calibration-validation model training the optimalGA probabilities for crossover and mutation were 40 and08 respectively and the predictive RMSEV and RV werebest as 4215 and 0926 respectively at the waveband1980ndash2050 nm is is significantly better than the movingwindow output at the waveband 1446ndash1520 nm For modelevaluation the best model generated from the GSMW-LPC-GA framework was further estimated by the testing sampleset which is previously chosen outside of the trainingsamples e testing results were not so good as the trainingresults but it was practically acceptable for some industrialrapid detection cases with the RMSET and RT equaling to5341 and 0883 respectively

e experimental results demonstrate that the proposedGSMW-LPC-GA framework is suitable for NIR quantitativedetermination of fishmeal protein It was able to producebetter models in relation to the full-spectrum model withthe advantage of selecting informative wavebands for targetchemical analytes In the framework GA optimizationeventually plays an important role in the improvement of thecalibration model Further studies include using the NIRtechnology to perform rapid detections on other targets infields of animal husbandry agriculture and food scienceand genetic optimization iteration is expected to be animplementable method providing an efficient strategy for

analyzing some unknown changes and influence of variousfertilizers

Data Availability

e data used to support the findings of this study areavailable from the corresponding author upon request

Conflicts of Interest

e authors declare that there are no conflicts of interestregarding the publication of this paper

Acknowledgments

is work was supported by the National Natural ScienceFoundation of China (61505037 and 61763008) the NaturalScience Foundation of Guangxi Province(2018GXNSFAA050045) Guangzhou Science and Tech-nology Program (202002030246) and Fangchenggang Sci-entific Research and Technology Development Plan (2014no 42)

References

[1] M A Booth G L Allan J Frances and S ParkinsonldquoReplacement of fish meal in diets for Australian silver perchBidyanus bidyanusrdquo Aquaculture vol 196 no 1-2 pp 67ndash852001

[2] R W Hardy ldquoUtilization of plant proteins in fish diets effectsof global demand and supplies of fishmealrdquo AquacultureResearch vol 41 no 5 pp 770ndash776 2010

[3] A Corten C-B Braham and A S Sadegh ldquoe developmentof a fishmeal industry in Mauritania and its impact on theregional stocks of sardinella and other small pelagics inNorthwest Africardquo Fisheries Research vol 186 pp 328ndash3362017

[4] D Han X Shan W Zhang et al ldquoA revisit to fishmeal usageand associated consequences in Chinese aquaculturerdquo Re-views in Aquaculture vol 10 no 2 pp 493ndash507 2018

[5] D Cozzolino A Chree I Murray and J R Scaife ldquoeassessment of the chemical composition of fishmeal by nearinfrared reflectance spectroscopyrdquo Aquaculture Nutritionvol 8 no 2 pp 149ndash155 2002

[6] M Uddin E Okazaki H Fukushima S Turza Y Yumikoand Y Fukuda ldquoNondestructive determination of water andprotein in surimi by near-infrared spectroscopyrdquo FoodChemistry vol 96 no 3 pp 491ndash495 2006

[7] H Yamasaki and S Morita ldquoIdentification of the epoxycuring mechanism under isothermal conditions by thermalanalysis and infrared spectroscopyrdquo Journal of MolecularStructure vol 1069 no 1 pp 164ndash170 2014

[8] J Yan N Villarreal and B Xu ldquoCharacterization of degra-dation of cotton cellulosic fibers through near infraredspectroscopyrdquo Journal of Polymers and the Environmentvol 21 no 4 pp 902ndash909 2013

[9] P Sirisomboon P Pornchaloempong P RamsiriS Pongkuan and S Srikornkarn ldquoEvaluation of the saltcontent of canned sardines in tomato ketchup by diffusereflection near infrared spectroscopyrdquo Journal of Near In-frared Spectroscopy vol 22 no 5 pp 329ndash336 2014

[10] C A T dos Santos M Lopo R N M J Pascoa andJ A Lopes ldquoA review on the applications of portable near-

Computational Intelligence and Neuroscience 9

infrared spectrometers in the agro-food industryrdquo AppliedSpectroscopy vol 67 no 11 pp 1215ndash1233 2013

[11] A Sakudo ldquoNear-infrared spectroscopy for medical appli-cations Current status and future perspectivesrdquo ClinicaChimica Acta vol 455 pp 181ndash188 2016

[12] S Hong H Chen J Gu K Cai Z Liu and J WenldquoComparative assessment on smart pre-processing methodsfor extracting information in ft-nir measured datardquo Mea-surement vol 157 Article ID 107663 2020

[13] T N Kok and J W Park ldquoExtending the shelf life of set fishballrdquo Journal of Food Quality vol 30 no 1 pp 1ndash27 2007

[14] G Foca C Ferrari A Ulrici M C Ielo G Minelli andD P Lo Fiego ldquoIodine value and fatty acids determination onpig fat samples by FT-NIR spectroscopy benefits of variableselection in the perspective of industrial applicationsrdquo FoodAnalytical Methods vol 9 no 10 pp 2791ndash2806 2016

[15] F Westad and H Martens ldquoVariable selection in near in-frared spectroscopy based on significance testing in partialleast squares regressionrdquo Journal of Near Infrared Spectros-copy vol 8 no 2 pp 117ndash124 2000

[16] T Mehmood K H Liland L Snipen and S Saeligboslash ldquoA reviewof variable selection methods in partial least squares regres-sionrdquo Chemometrics and Intelligent Laboratory Systemsvol 118 pp 62ndash69 2012

[17] S Kuriakose and H Joe ldquoQualitative and quantitative analysisin sandalwood oils using near infrared spectroscopy com-bined with chemometric techniquesrdquo Food Chemistryvol 135 no 1 pp 213ndash218 2012

[18] M Versaci S Calcagno M Cacciola F C MorabitoI Palamara and D Pellicano ldquoStandard soft computingtechniques for characterization of defects in nondestructiveevaluationrdquo in Ultrasonic Nondestructive Evaluation SystemsP Burrascano S Callegari A Montisci et al Eds pp 175ndash199 Springer Cham Heidelberg Dordrecht London 2015

[19] M Versaci ldquoFuzzy approach and eddy currents NDTNDEdevices in industrial applicationsrdquo Electronics Letters vol 52no 11 pp 943ndash945 2016

[20] Y-H Yun H-D Li L E R Wood et al ldquoAn efficient methodof wavelength interval selection based on random frog formultivariate spectral calibrationrdquo Spectrochimica Acta Part AMolecular and Biomolecular Spectroscopy vol 111 pp 31ndash362013

[21] H Chen Z Liu K Cai L Xu and A Chen ldquoGrid searchparametric optimization for FT-NIR quantitative analysis ofsolid soluble content in strawberry samplesrdquo VibrationalSpectroscopy vol 94 pp 7ndash15 2018

[22] S Kasemsumran N Suttiwijitpukdee and V KeeratinijakalldquoRapid classification of turmeric based on DNA fingerprint bynear-infrared spectroscopy combined with moving windowpartial least squares-discrimination analysisrdquo Analytical Sci-ences vol 33 no 1 pp 111ndash115 2017

[23] K Zheng T Feng W Zhang et al ldquoVariable selection bydouble competitive adaptive reweighted sampling for cali-bration transfer of near infrared spectrardquo Chemometrics andIntelligent Laboratory Systems vol 191 pp 109ndash117 2019

[24] A Joo A Ekart and J P Neirotti ldquoGenetic algorithms fordiscovery of matrix multiplication methodsrdquo IEEE Transac-tions on Evolutionary Computation vol 16 no 5 pp 749ndash7512012

[25] E Haq I Ahmad A Hussain and I M Almanjahie ldquoA novelselection approach for genetic algorithms for global optimi-zation of multimodal continuous functionsrdquo ComputationalIntelligence and Neuroscience vol 2019 Article ID 86402182019

[26] B K Lavine and C G White ldquoBoosting the performance ofgenetic algorithms for variable selection in partial leastsquares spectral calibrationsrdquo Applied Spectroscopy vol 71no 9 pp 2092ndash2101 2017

[27] L Mo H Chen W Chen Q Feng and L Xu ldquoStudy onevolution methods for the optimization of machine learningmodels based on FT-NIR spectroscopyrdquo Infrared Physics ampTechnology vol 108 Article ID 103366 2020

[28] Y-H Yun H-D Li B-C Deng and D-S Cao ldquoAn overviewof variable selection methods in multivariate analysis of near-infrared spectrardquo TrAC Trends in Analytical Chemistryvol 113 pp 102ndash115 2019

[29] S Forrest ldquoGenetic algorithms principles of natural selectionapplied to computationrdquo Science vol 261 no 5123pp 872ndash878 1993

[30] H Jiang W Xu and Q Chen ldquoComparison of algorithms forwavelength variables selection from near-infrared (NIR)spectra for quantitative monitoring of yeast (saccharomycescerevisiae) cultivationsrdquo Spectrochimica Acta Part A Mo-lecular and Biomolecular Spectroscopy vol 214 pp 366ndash3712019

[31] J Koljonen T E M Nordling and J T Alander ldquoA review ofgenetic algorithms in near infrared spectroscopy and che-mometrics past and futurerdquo Journal of Near Infrared Spec-troscopy vol 16 no 3 pp 189ndash197 2008

[32] J-H Jiang R J Berry H W Siesler and Y OzakildquoWavelength interval selection in multicomponent spectralanalysis by moving window partial least-squares regressionwith applications to mid-infrared and near-infrared spec-troscopic datardquo Analytical Chemistry vol 74 no 14pp 3555ndash3565 2002

[33] H-Z Chen Q-Q Song G-Q Tang and L-L Xu ldquoAnoptimization strategy for waveband selection in FT-NIRquantitative analysis of corn proteinrdquo Journal of Cereal Sci-ence vol 60 no 3 pp 595ndash601 2014

[34] C Ranzan L F Trierweiler B Hitzmann andJ O Trierweiler ldquoNIR pre-selection data using modifiedchangeable size moving window partial least squares and purespectral chemometrical modeling with ant colony optimiza-tion for wheat flour characterizationrdquo Chemometrics andIntelligent Laboratory Systems vol 142 pp 78ndash86 2015

[35] T Fearn C Riccioli A Garrido-Varo and J E Guerrero-Ginel ldquoOn the geometry of SNV and MSCrdquo Chemometricsand Intelligent Laboratory Systems vol 96 no 1 pp 22ndash262009

[36] R Galvao M Araujo G Jose M Pontes E Silva andT Saldanha ldquoA method for calibration and validation subsetpartitioningrdquo Talanta vol 67 no 4 pp 736ndash740 2005

10 Computational Intelligence and Neuroscience

Page 5: ANovelGeneticAlgorithm-BasedOptimizationFrameworkfor ...variable selection for NIR calibration of fishmeal for the quantitative determination of protein content. For NIR calibrations,

throughout the spectral scanning processeNIR spectrumwas measured over the wavenumber range of 1100ndash2500 nmwith a resolution of 2 nm so that 700 distincable wave-lengths were identified as variables recording the spectraldata e spectra of 194 fishmeal samples are shown inFigure 2

32 Sample Sets and Model Indicators e proposed algo-rithmic framework of GSMW-LPC-GA was applied tovariable selection for NIR calibration of fishmeal for thequantitative determination of protein content For NIRcalibrations the fishmeal samples should be split into themodeling set and the testing set As for modeling repre-sentativeness the testing set was randomly picked in ad-vance as including 54 samples and the remaining sampleswere for modeling Because the NIR calibration models needestablishment and optimization the modeling samples werefurther divided into the calibration part (90 samples) and thevalidation part (50 samples) where the method of sample setpartitioning based on joint x-y distance [36] was appliedemaximum minimum and average values and the standarddeviation for the calibration part validation part and testingset are listed in Table 1 e statistical data of all sampleswere also listed

Quantitative indicators are needed to evaluate the cal-ibration model Root mean square error (RMSE) and cor-relation coefficients (R) are commonly used which aredefined as follows

RMSE

1113936 yi minus yiprime( 11138572

n

1113971

R 1113936 yi minus ym( 1113857 yi

prime minus ymprime( 1113857

1113936 yi minus ym( 11138572

1113936 yiprime minus ymprime( 1113857

21113969

(2)

where yi and yiprime are the Kjeldahl-measured protein content

and the NIR predictive value of the i-th fishmeal sample ym

and ymprime are the mean value of yi | i 1 2 n1113864 1113865 and

yiprime | i 1 2 n1113864 1113865 respectively and n represents the

number of target samples Consequently we denotedRMSECRC RMSEVRV and RMSETRT as the corre-sponding signs of RMSER for the calibration samplesvalidation samples and testing samples respectively

4 Results and Discussion

e optimization framework of GSMW-LPC-GA was ap-plied to the quantitative analysis of fishmeal protein basedon the NIR spectral matrix In the framework GSMWmodeling method was used to select wavebands with highsignal-to-noise ratio which were regarded as the informativecontinuous wavelengths for further optimization Next la-tent variables were extracted from the initial continuouswavelengths by using the latent principal componenttechnique so as to hold more information of the target withless designated variablesen genetic algorithmwas appliedto promote the variable population evolution with steps ofselection crossover and mutation In the end we managed

to obtain some informative variables for the calibration offishmeal protein aiming to have a prospective improvementfor the quality of NIR analysis

41 Waveband Selection for the Partition of GSMWGSMW is designed to use the grid search mode to optimizethe way of parameter identification for the moving windowalgorithm flow As is known that a specific window is de-termined by the position and its width (ie the number ofvariables in the window) the task of moving window op-timization is to find the target parameter combinations of(N S) that point to a solid window leading to optimal NIRprediction effect In this work we set N and S tunabletargeting 700 discrete initial scanning wavelengths Con-sidering that a model established on a large number ofvariables will increase the computational complexity thuswe design N valuing continuously when it is not over 100and changing with a jumping step of 10 points when it issmaller than 300 and with a step of 20 when larger than 300(ie N isin [1 1 100]cup [110 10 300]cup [320 20 700]) Inthis way the value of N roughly went through some rep-resentative cases from 1 to 700 and also guaranteed a lesscomputing load for acceptable operation with currentlyprevalent computers On the other hand the parametricvalue of S was set changing from 1 consecutively to 700 (ieS isin [1 1 700]) All applicable combinations of (N S)should subject to the rule of N + S minus 1leP where P is equalto 700 Consequently we totally tested 78 790 possiblewindows

PLS the commonly used calibration method was ap-plied for each determinant window to estimate the func-tional contributions of the window wavelengths to theresults of NIR calibrations e calibration samples wereused for model training and the validation samples forselecting the optimal model with a minimum value ofRMSEV e predictive RMSEV matrix is shown as acontour plot in Figure 3 corresponding to each fixed valueof (N S)en the most optimal window can be identified inFigure 3 Additionally some well-performing models areexpected to have further improvement in the next steps ofthe GSMW-LPC-GA framework us we finally determine

1100 1300 1500 1700 1900 2100 2300 250001

02

03

04

05

06

07

08

09

10

Log

(1R

)

Wavelength (nm)

Figure 2 NIR reflectance spectra of 193 fishmeal samples

Computational Intelligence and Neuroscience 5

to choose 5 optimal windows for further analysis (marked as(1) (2) (3) (4) and (5) in Figure 3)e prediction results aswell as their operating parameters of the PLS models in these5 windows are presented in Table 2 respectively We foundthat the most optimal waveband was 1446ndash1520 nm whichincludes 38 wavelengthsvariables Also we spotlight theselected 5 optimal wavebands in the full range (see Figure 4)

42GeneticOptimization for LPCs in the SelectedWavebandse selected 5 optimal wavebands were available for furtheroptimization e parameter scaling genetic algorithm wasapplied with principal component analysis plugged in evariables in each waveband were transformed to principalcomponents (LPC) as latent variables for analysis enumber of LPCs is determined by the number of the originalvariables in the waveband ese latent components weresorted in descending contribution to its original data whichmeans that the latent variables in front of the queue containmore information than the variables in the back us the

LPC transform projected the raw data into a reasonablyrefined variable space in which the target information iseasier to observe e original variables in windows (1) (2)(3) (4) and (5) were transformed into 24 23 38 36 and 18LPCs respectively All of the LPCs were used as the inputsfor genetic optimization

Genetic optimization is the vital implementation forvariable selection embedded in the GSMW-LPC-GAframework To make the GA procedure suitable for theselected input data the population size was set fitting the

0 50 100 150 200 250 300Serial number of the starting wavelength (S)

350 400 450 500 550 600 650 700

102030405060708090

100120140160180200220

260240

280300

340

Num

ber o

f var

iabl

es (N

) 380

420

460

500

540

580

620

660

700RMSEV

114108102969084787266605448

Figure 3 Contour plot of the validating results by the GSMW model

Table 1 e statistical data of fishmeal protein contents for the calibration part validation part and testing set

No of samples Maximum Minimum Average Standard deviationCalibration part 90 6703 5317 60593 4347Validation part 50 6619 5386 59174 4412Testing set 54 6693 5384 60437 4308All samples 194 6703 5317 60670 4360

Table 2e optimal model validating results corresponding to the5 selected wavebands

S N Waveband (nm) RMSEV (wt) RV

(1) 18 24 1134ndash1180 5146 0874(2) 77 23 1252ndash1296 5071 0884(3) 174 38 1446ndash1520 4944 0905(4) 441 36 1980ndash2050 5190 0892(5) 486 18 2070ndash2104 5312 0908

6 Computational Intelligence and Neuroscience

1100 1300 1500 1700 1900 2100 2300 250046

47

48

49

50

51

52

53

542070ndash2104

1980ndash2050

1446ndash1520

1252ndash12961134ndash1180

RMSE

V

Wavelength

The shape of thespectrum not fitting

the RMSEV-axis

Figure 4 e locations of the 5 selected wavebands in the full spectral range

(1)(4)(7)

(2)(5)(8)

(3)(6)(9)

(1)(4)(7)

(2)(5)(8)

(3)(6)(9)

(1)(4)(7)

(2)(5)(8)

(3)(6)(9)

(1)(4)(7)

(2)(5)(8)

(3)(6)(9)

(1)(4)(7)

(2)(5)(8)

(3)(6)(9)

Evolution for variablesin waveband 1252ndash1296nm

42

44

46

48

50

52

54

56

58

0 100 200 300 400 500Iterations

(b) (c)

(d) (e)

Evolution for variablesin waveband 1446ndash1520nm

42

44

46

48

50

52

54

56

58

RMSE

V(lo

ss fu

nctio

n)RM

SEV

(loss

func

tion)

0 100 200 300 400 500Iterations

Evolution for variablesin waveband 2070ndash2104nm

42

44

46

48

50

52

54

56

58

0 100 200 300 400 500Iterations

0 100 200 300 400 50042

44

46

48

50

52

54

56

58Evolution for variables

in waveband 1134ndash1180nm

RMSE

V(lo

ss fu

nctio

n)RM

SEV

(loss

func

tion)

RMSE

V(lo

ss fu

nctio

n)

Iterations(a)

Evolution for variablesin waveband 1980ndash2050nm

42

44

46

48

50

52

54

56

58

0 100 200 300 400 500Iterations

(1) (2) (3)

(4) (5) (6)

(7) (8) (9)

Probability of crossover40 50 60

Prob

abili

ty o

f mut

atio

n 08

12

15

Figure 5 e optimizational effects for the principal variables in the 5 selected wavebands based on the parametric scaling GA iterations

Computational Intelligence and Neuroscience 7

inputs and then the selection crossover and mutation werecarried out For parametric scaling tuning the probability ofcrossover was set as 40 50 and 60 for testing emutation operator was set testing the probabilities of 0812 and 15 en the population automatically evolvedin a loop iteration and the loop was stopped in case ofmeeting each of the following two conditions and theframework-optimized model was determined

Condition 1 When reaching 500 times iteration the loopwas mandatorily stopped

Condition 2 When there was no further change appearingin the loss function (ie the RMSEV) for consecutively 20times iteration the loop was adaptively stopped

We tested the specific scaling parameter valuing ofcrossover andmutation in genetic algorithm for each of the 5designated wavebands e framework optimization resultsin the genetic evolution process are shown in Figure 5 It canbe seen in Figure 5 that the evolution loop did not stop insome cases until 500 times iteration and in other cases theloop stopped before iteration reached 500 times Finally wewere able to find the best results for each of the 5 selectedwavebands by the genetic optimizational process e sec-ondary optimized RMSEV and the corresponding RV areshown in Table 3 as well as their preferable GA parametersWe concluded from Table 3 that all of the data in the 5selected wavebands were further optimized by GA and the

most optimal result was observed in the waveband of1980ndash2050 nm whose evolution loop stopped at the 434times iteration

43 +e Testing Results Based on the Optimal GSMW-LPC-GAModel As discussed above the predictive performance ofthe GSMW-LPC-GA framework on the calibration andvalidation sample sets was satisfactory e optimal modelselected by the GSMW-LPC-GA framework was establishedon the waveband 1980ndash2050 nm (36 variables included)with genetic optimization at 40 crossover and 08 mu-tation e evolution loop stopped at 434 times iterationis framework optimal model outputs the prediction errorRMSEV as 4215 and the correlation RV as 0926 To evaluatethe practical modeling effect of the framework we took thewaveband 1980ndash2050 nm as the target spectral data in thetesting part performed the principal component extractionand applied the same GA parameters to estimate themodeling effect for the 53 testing samples which were in-dependent of the training part e PLS regression plot isshown in Figure 6(a) demonstrating the testing resultswhere the RMSET and RT were 5341 and 0883 respectivelyAs for comparison the typical GA evolution method wasused for establishing and optimizing the PLS calibrationvalidation and testing processes With common algorithmicsettings GA runs with 50 crossover 12 mutation and500 times of iteration [31] e GA-PLS testing results are

Table 3 e optimal model validating results corresponding to the 5 selected wavebands

Waveband (nm)e optimal parameters for genetic evolution

RMSEV (wt) RVCrossover () Mutation () Iteration times

1134ndash1180 50 15 500 4755 09061252ndash1296 40 08 437 4546 08971446ndash1520 50 08 469 4353 09181980ndash2050 40 08 434 4215 09262070ndash2104 50 08 484 4649 0912e iteration times represent when the optimal RMSEV was kept as a constant value for a cycle of 20 times iteration or less than 20 in case the iteration hadreached 500 times

40 45 50 55 60 65 70 75 8040

45

50

55

60

65

70

75

80 GSMW-LPC-GA optimizationwaveband 1980ndash2050nm

Measured content

RMSEV = 5341RT = 0883

NIR

pre

dict

ed v

alue

(a)

40 45 50 55 60 65 70 75 8040

45

50

55

60

65

70

75

80

RMSEV = 6372RT = 0864

Typical GA optimizationwaveband 1100ndash2500nm

NIR

pre

dict

ed v

alue

Measured content

(b)

Figure 6 e PLS regression plot for test samples ((a) the proposed GSMW-LPC-GA framework and (b) the typical GA evolution)

8 Computational Intelligence and Neuroscience

shown as the regression plot in Figure 6(b) e comparativeexperiments show that the proposed GSMW-LPC-GAframework performs better than the typical GA evolution inthe rapid NIR quantitative prediction of fishmeal protein

5 Conclusions

NIR calibration model was established for the quantitativedetermination of protein content in fishmeal samples Anovel optimization framework of GSMW-LPC-GA wasconstructed for model optimization Possible wavebandswere tested for model enhancement by grid searchingmoving window parameters in which the starting point wasset continuously changing from 1 to the total number ofwavelengths in the scanning range and the window size wasset in a specific design to balance the model representa-tiveness and the computational complexity Successively 5optimal window wavebands were selected for further op-timization by LPC feature extraction and GA iterationGenetic algorithmic iteration was designed as the coreprocess of implementable optimization e variableswavelengths in each of the 5 selected wavebands weretransformed to LPCs as the input variables for GA iterationIn GA optimization the population size was set obeying thedimensions of the input variable set e probability ofcrossover and mutation was designed for parameter tuninge control of iteration enlarged the extension of parametricscaling for generation evolutions Finally the model wassecondarily optimized by the selection crossover andmutation operators in a refined interaction within theGSMW-LPC-GA framework

e model prediction effects resulting from the GSMW-LPC-GA framework were experimentally proved better thanthe effect by simply adopting the moving window PLSmodel In calibration-validation model training the optimalGA probabilities for crossover and mutation were 40 and08 respectively and the predictive RMSEV and RV werebest as 4215 and 0926 respectively at the waveband1980ndash2050 nm is is significantly better than the movingwindow output at the waveband 1446ndash1520 nm For modelevaluation the best model generated from the GSMW-LPC-GA framework was further estimated by the testing sampleset which is previously chosen outside of the trainingsamples e testing results were not so good as the trainingresults but it was practically acceptable for some industrialrapid detection cases with the RMSET and RT equaling to5341 and 0883 respectively

e experimental results demonstrate that the proposedGSMW-LPC-GA framework is suitable for NIR quantitativedetermination of fishmeal protein It was able to producebetter models in relation to the full-spectrum model withthe advantage of selecting informative wavebands for targetchemical analytes In the framework GA optimizationeventually plays an important role in the improvement of thecalibration model Further studies include using the NIRtechnology to perform rapid detections on other targets infields of animal husbandry agriculture and food scienceand genetic optimization iteration is expected to be animplementable method providing an efficient strategy for

analyzing some unknown changes and influence of variousfertilizers

Data Availability

e data used to support the findings of this study areavailable from the corresponding author upon request

Conflicts of Interest

e authors declare that there are no conflicts of interestregarding the publication of this paper

Acknowledgments

is work was supported by the National Natural ScienceFoundation of China (61505037 and 61763008) the NaturalScience Foundation of Guangxi Province(2018GXNSFAA050045) Guangzhou Science and Tech-nology Program (202002030246) and Fangchenggang Sci-entific Research and Technology Development Plan (2014no 42)

References

[1] M A Booth G L Allan J Frances and S ParkinsonldquoReplacement of fish meal in diets for Australian silver perchBidyanus bidyanusrdquo Aquaculture vol 196 no 1-2 pp 67ndash852001

[2] R W Hardy ldquoUtilization of plant proteins in fish diets effectsof global demand and supplies of fishmealrdquo AquacultureResearch vol 41 no 5 pp 770ndash776 2010

[3] A Corten C-B Braham and A S Sadegh ldquoe developmentof a fishmeal industry in Mauritania and its impact on theregional stocks of sardinella and other small pelagics inNorthwest Africardquo Fisheries Research vol 186 pp 328ndash3362017

[4] D Han X Shan W Zhang et al ldquoA revisit to fishmeal usageand associated consequences in Chinese aquaculturerdquo Re-views in Aquaculture vol 10 no 2 pp 493ndash507 2018

[5] D Cozzolino A Chree I Murray and J R Scaife ldquoeassessment of the chemical composition of fishmeal by nearinfrared reflectance spectroscopyrdquo Aquaculture Nutritionvol 8 no 2 pp 149ndash155 2002

[6] M Uddin E Okazaki H Fukushima S Turza Y Yumikoand Y Fukuda ldquoNondestructive determination of water andprotein in surimi by near-infrared spectroscopyrdquo FoodChemistry vol 96 no 3 pp 491ndash495 2006

[7] H Yamasaki and S Morita ldquoIdentification of the epoxycuring mechanism under isothermal conditions by thermalanalysis and infrared spectroscopyrdquo Journal of MolecularStructure vol 1069 no 1 pp 164ndash170 2014

[8] J Yan N Villarreal and B Xu ldquoCharacterization of degra-dation of cotton cellulosic fibers through near infraredspectroscopyrdquo Journal of Polymers and the Environmentvol 21 no 4 pp 902ndash909 2013

[9] P Sirisomboon P Pornchaloempong P RamsiriS Pongkuan and S Srikornkarn ldquoEvaluation of the saltcontent of canned sardines in tomato ketchup by diffusereflection near infrared spectroscopyrdquo Journal of Near In-frared Spectroscopy vol 22 no 5 pp 329ndash336 2014

[10] C A T dos Santos M Lopo R N M J Pascoa andJ A Lopes ldquoA review on the applications of portable near-

Computational Intelligence and Neuroscience 9

infrared spectrometers in the agro-food industryrdquo AppliedSpectroscopy vol 67 no 11 pp 1215ndash1233 2013

[11] A Sakudo ldquoNear-infrared spectroscopy for medical appli-cations Current status and future perspectivesrdquo ClinicaChimica Acta vol 455 pp 181ndash188 2016

[12] S Hong H Chen J Gu K Cai Z Liu and J WenldquoComparative assessment on smart pre-processing methodsfor extracting information in ft-nir measured datardquo Mea-surement vol 157 Article ID 107663 2020

[13] T N Kok and J W Park ldquoExtending the shelf life of set fishballrdquo Journal of Food Quality vol 30 no 1 pp 1ndash27 2007

[14] G Foca C Ferrari A Ulrici M C Ielo G Minelli andD P Lo Fiego ldquoIodine value and fatty acids determination onpig fat samples by FT-NIR spectroscopy benefits of variableselection in the perspective of industrial applicationsrdquo FoodAnalytical Methods vol 9 no 10 pp 2791ndash2806 2016

[15] F Westad and H Martens ldquoVariable selection in near in-frared spectroscopy based on significance testing in partialleast squares regressionrdquo Journal of Near Infrared Spectros-copy vol 8 no 2 pp 117ndash124 2000

[16] T Mehmood K H Liland L Snipen and S Saeligboslash ldquoA reviewof variable selection methods in partial least squares regres-sionrdquo Chemometrics and Intelligent Laboratory Systemsvol 118 pp 62ndash69 2012

[17] S Kuriakose and H Joe ldquoQualitative and quantitative analysisin sandalwood oils using near infrared spectroscopy com-bined with chemometric techniquesrdquo Food Chemistryvol 135 no 1 pp 213ndash218 2012

[18] M Versaci S Calcagno M Cacciola F C MorabitoI Palamara and D Pellicano ldquoStandard soft computingtechniques for characterization of defects in nondestructiveevaluationrdquo in Ultrasonic Nondestructive Evaluation SystemsP Burrascano S Callegari A Montisci et al Eds pp 175ndash199 Springer Cham Heidelberg Dordrecht London 2015

[19] M Versaci ldquoFuzzy approach and eddy currents NDTNDEdevices in industrial applicationsrdquo Electronics Letters vol 52no 11 pp 943ndash945 2016

[20] Y-H Yun H-D Li L E R Wood et al ldquoAn efficient methodof wavelength interval selection based on random frog formultivariate spectral calibrationrdquo Spectrochimica Acta Part AMolecular and Biomolecular Spectroscopy vol 111 pp 31ndash362013

[21] H Chen Z Liu K Cai L Xu and A Chen ldquoGrid searchparametric optimization for FT-NIR quantitative analysis ofsolid soluble content in strawberry samplesrdquo VibrationalSpectroscopy vol 94 pp 7ndash15 2018

[22] S Kasemsumran N Suttiwijitpukdee and V KeeratinijakalldquoRapid classification of turmeric based on DNA fingerprint bynear-infrared spectroscopy combined with moving windowpartial least squares-discrimination analysisrdquo Analytical Sci-ences vol 33 no 1 pp 111ndash115 2017

[23] K Zheng T Feng W Zhang et al ldquoVariable selection bydouble competitive adaptive reweighted sampling for cali-bration transfer of near infrared spectrardquo Chemometrics andIntelligent Laboratory Systems vol 191 pp 109ndash117 2019

[24] A Joo A Ekart and J P Neirotti ldquoGenetic algorithms fordiscovery of matrix multiplication methodsrdquo IEEE Transac-tions on Evolutionary Computation vol 16 no 5 pp 749ndash7512012

[25] E Haq I Ahmad A Hussain and I M Almanjahie ldquoA novelselection approach for genetic algorithms for global optimi-zation of multimodal continuous functionsrdquo ComputationalIntelligence and Neuroscience vol 2019 Article ID 86402182019

[26] B K Lavine and C G White ldquoBoosting the performance ofgenetic algorithms for variable selection in partial leastsquares spectral calibrationsrdquo Applied Spectroscopy vol 71no 9 pp 2092ndash2101 2017

[27] L Mo H Chen W Chen Q Feng and L Xu ldquoStudy onevolution methods for the optimization of machine learningmodels based on FT-NIR spectroscopyrdquo Infrared Physics ampTechnology vol 108 Article ID 103366 2020

[28] Y-H Yun H-D Li B-C Deng and D-S Cao ldquoAn overviewof variable selection methods in multivariate analysis of near-infrared spectrardquo TrAC Trends in Analytical Chemistryvol 113 pp 102ndash115 2019

[29] S Forrest ldquoGenetic algorithms principles of natural selectionapplied to computationrdquo Science vol 261 no 5123pp 872ndash878 1993

[30] H Jiang W Xu and Q Chen ldquoComparison of algorithms forwavelength variables selection from near-infrared (NIR)spectra for quantitative monitoring of yeast (saccharomycescerevisiae) cultivationsrdquo Spectrochimica Acta Part A Mo-lecular and Biomolecular Spectroscopy vol 214 pp 366ndash3712019

[31] J Koljonen T E M Nordling and J T Alander ldquoA review ofgenetic algorithms in near infrared spectroscopy and che-mometrics past and futurerdquo Journal of Near Infrared Spec-troscopy vol 16 no 3 pp 189ndash197 2008

[32] J-H Jiang R J Berry H W Siesler and Y OzakildquoWavelength interval selection in multicomponent spectralanalysis by moving window partial least-squares regressionwith applications to mid-infrared and near-infrared spec-troscopic datardquo Analytical Chemistry vol 74 no 14pp 3555ndash3565 2002

[33] H-Z Chen Q-Q Song G-Q Tang and L-L Xu ldquoAnoptimization strategy for waveband selection in FT-NIRquantitative analysis of corn proteinrdquo Journal of Cereal Sci-ence vol 60 no 3 pp 595ndash601 2014

[34] C Ranzan L F Trierweiler B Hitzmann andJ O Trierweiler ldquoNIR pre-selection data using modifiedchangeable size moving window partial least squares and purespectral chemometrical modeling with ant colony optimiza-tion for wheat flour characterizationrdquo Chemometrics andIntelligent Laboratory Systems vol 142 pp 78ndash86 2015

[35] T Fearn C Riccioli A Garrido-Varo and J E Guerrero-Ginel ldquoOn the geometry of SNV and MSCrdquo Chemometricsand Intelligent Laboratory Systems vol 96 no 1 pp 22ndash262009

[36] R Galvao M Araujo G Jose M Pontes E Silva andT Saldanha ldquoA method for calibration and validation subsetpartitioningrdquo Talanta vol 67 no 4 pp 736ndash740 2005

10 Computational Intelligence and Neuroscience

Page 6: ANovelGeneticAlgorithm-BasedOptimizationFrameworkfor ...variable selection for NIR calibration of fishmeal for the quantitative determination of protein content. For NIR calibrations,

to choose 5 optimal windows for further analysis (marked as(1) (2) (3) (4) and (5) in Figure 3)e prediction results aswell as their operating parameters of the PLS models in these5 windows are presented in Table 2 respectively We foundthat the most optimal waveband was 1446ndash1520 nm whichincludes 38 wavelengthsvariables Also we spotlight theselected 5 optimal wavebands in the full range (see Figure 4)

42GeneticOptimization for LPCs in the SelectedWavebandse selected 5 optimal wavebands were available for furtheroptimization e parameter scaling genetic algorithm wasapplied with principal component analysis plugged in evariables in each waveband were transformed to principalcomponents (LPC) as latent variables for analysis enumber of LPCs is determined by the number of the originalvariables in the waveband ese latent components weresorted in descending contribution to its original data whichmeans that the latent variables in front of the queue containmore information than the variables in the back us the

LPC transform projected the raw data into a reasonablyrefined variable space in which the target information iseasier to observe e original variables in windows (1) (2)(3) (4) and (5) were transformed into 24 23 38 36 and 18LPCs respectively All of the LPCs were used as the inputsfor genetic optimization

Genetic optimization is the vital implementation forvariable selection embedded in the GSMW-LPC-GAframework To make the GA procedure suitable for theselected input data the population size was set fitting the

0 50 100 150 200 250 300Serial number of the starting wavelength (S)

350 400 450 500 550 600 650 700

102030405060708090

100120140160180200220

260240

280300

340

Num

ber o

f var

iabl

es (N

) 380

420

460

500

540

580

620

660

700RMSEV

114108102969084787266605448

Figure 3 Contour plot of the validating results by the GSMW model

Table 1 e statistical data of fishmeal protein contents for the calibration part validation part and testing set

No of samples Maximum Minimum Average Standard deviationCalibration part 90 6703 5317 60593 4347Validation part 50 6619 5386 59174 4412Testing set 54 6693 5384 60437 4308All samples 194 6703 5317 60670 4360

Table 2e optimal model validating results corresponding to the5 selected wavebands

S N Waveband (nm) RMSEV (wt) RV

(1) 18 24 1134ndash1180 5146 0874(2) 77 23 1252ndash1296 5071 0884(3) 174 38 1446ndash1520 4944 0905(4) 441 36 1980ndash2050 5190 0892(5) 486 18 2070ndash2104 5312 0908

6 Computational Intelligence and Neuroscience

1100 1300 1500 1700 1900 2100 2300 250046

47

48

49

50

51

52

53

542070ndash2104

1980ndash2050

1446ndash1520

1252ndash12961134ndash1180

RMSE

V

Wavelength

The shape of thespectrum not fitting

the RMSEV-axis

Figure 4 e locations of the 5 selected wavebands in the full spectral range

(1)(4)(7)

(2)(5)(8)

(3)(6)(9)

(1)(4)(7)

(2)(5)(8)

(3)(6)(9)

(1)(4)(7)

(2)(5)(8)

(3)(6)(9)

(1)(4)(7)

(2)(5)(8)

(3)(6)(9)

(1)(4)(7)

(2)(5)(8)

(3)(6)(9)

Evolution for variablesin waveband 1252ndash1296nm

42

44

46

48

50

52

54

56

58

0 100 200 300 400 500Iterations

(b) (c)

(d) (e)

Evolution for variablesin waveband 1446ndash1520nm

42

44

46

48

50

52

54

56

58

RMSE

V(lo

ss fu

nctio

n)RM

SEV

(loss

func

tion)

0 100 200 300 400 500Iterations

Evolution for variablesin waveband 2070ndash2104nm

42

44

46

48

50

52

54

56

58

0 100 200 300 400 500Iterations

0 100 200 300 400 50042

44

46

48

50

52

54

56

58Evolution for variables

in waveband 1134ndash1180nm

RMSE

V(lo

ss fu

nctio

n)RM

SEV

(loss

func

tion)

RMSE

V(lo

ss fu

nctio

n)

Iterations(a)

Evolution for variablesin waveband 1980ndash2050nm

42

44

46

48

50

52

54

56

58

0 100 200 300 400 500Iterations

(1) (2) (3)

(4) (5) (6)

(7) (8) (9)

Probability of crossover40 50 60

Prob

abili

ty o

f mut

atio

n 08

12

15

Figure 5 e optimizational effects for the principal variables in the 5 selected wavebands based on the parametric scaling GA iterations

Computational Intelligence and Neuroscience 7

inputs and then the selection crossover and mutation werecarried out For parametric scaling tuning the probability ofcrossover was set as 40 50 and 60 for testing emutation operator was set testing the probabilities of 0812 and 15 en the population automatically evolvedin a loop iteration and the loop was stopped in case ofmeeting each of the following two conditions and theframework-optimized model was determined

Condition 1 When reaching 500 times iteration the loopwas mandatorily stopped

Condition 2 When there was no further change appearingin the loss function (ie the RMSEV) for consecutively 20times iteration the loop was adaptively stopped

We tested the specific scaling parameter valuing ofcrossover andmutation in genetic algorithm for each of the 5designated wavebands e framework optimization resultsin the genetic evolution process are shown in Figure 5 It canbe seen in Figure 5 that the evolution loop did not stop insome cases until 500 times iteration and in other cases theloop stopped before iteration reached 500 times Finally wewere able to find the best results for each of the 5 selectedwavebands by the genetic optimizational process e sec-ondary optimized RMSEV and the corresponding RV areshown in Table 3 as well as their preferable GA parametersWe concluded from Table 3 that all of the data in the 5selected wavebands were further optimized by GA and the

most optimal result was observed in the waveband of1980ndash2050 nm whose evolution loop stopped at the 434times iteration

43 +e Testing Results Based on the Optimal GSMW-LPC-GAModel As discussed above the predictive performance ofthe GSMW-LPC-GA framework on the calibration andvalidation sample sets was satisfactory e optimal modelselected by the GSMW-LPC-GA framework was establishedon the waveband 1980ndash2050 nm (36 variables included)with genetic optimization at 40 crossover and 08 mu-tation e evolution loop stopped at 434 times iterationis framework optimal model outputs the prediction errorRMSEV as 4215 and the correlation RV as 0926 To evaluatethe practical modeling effect of the framework we took thewaveband 1980ndash2050 nm as the target spectral data in thetesting part performed the principal component extractionand applied the same GA parameters to estimate themodeling effect for the 53 testing samples which were in-dependent of the training part e PLS regression plot isshown in Figure 6(a) demonstrating the testing resultswhere the RMSET and RT were 5341 and 0883 respectivelyAs for comparison the typical GA evolution method wasused for establishing and optimizing the PLS calibrationvalidation and testing processes With common algorithmicsettings GA runs with 50 crossover 12 mutation and500 times of iteration [31] e GA-PLS testing results are

Table 3 e optimal model validating results corresponding to the 5 selected wavebands

Waveband (nm)e optimal parameters for genetic evolution

RMSEV (wt) RVCrossover () Mutation () Iteration times

1134ndash1180 50 15 500 4755 09061252ndash1296 40 08 437 4546 08971446ndash1520 50 08 469 4353 09181980ndash2050 40 08 434 4215 09262070ndash2104 50 08 484 4649 0912e iteration times represent when the optimal RMSEV was kept as a constant value for a cycle of 20 times iteration or less than 20 in case the iteration hadreached 500 times

40 45 50 55 60 65 70 75 8040

45

50

55

60

65

70

75

80 GSMW-LPC-GA optimizationwaveband 1980ndash2050nm

Measured content

RMSEV = 5341RT = 0883

NIR

pre

dict

ed v

alue

(a)

40 45 50 55 60 65 70 75 8040

45

50

55

60

65

70

75

80

RMSEV = 6372RT = 0864

Typical GA optimizationwaveband 1100ndash2500nm

NIR

pre

dict

ed v

alue

Measured content

(b)

Figure 6 e PLS regression plot for test samples ((a) the proposed GSMW-LPC-GA framework and (b) the typical GA evolution)

8 Computational Intelligence and Neuroscience

shown as the regression plot in Figure 6(b) e comparativeexperiments show that the proposed GSMW-LPC-GAframework performs better than the typical GA evolution inthe rapid NIR quantitative prediction of fishmeal protein

5 Conclusions

NIR calibration model was established for the quantitativedetermination of protein content in fishmeal samples Anovel optimization framework of GSMW-LPC-GA wasconstructed for model optimization Possible wavebandswere tested for model enhancement by grid searchingmoving window parameters in which the starting point wasset continuously changing from 1 to the total number ofwavelengths in the scanning range and the window size wasset in a specific design to balance the model representa-tiveness and the computational complexity Successively 5optimal window wavebands were selected for further op-timization by LPC feature extraction and GA iterationGenetic algorithmic iteration was designed as the coreprocess of implementable optimization e variableswavelengths in each of the 5 selected wavebands weretransformed to LPCs as the input variables for GA iterationIn GA optimization the population size was set obeying thedimensions of the input variable set e probability ofcrossover and mutation was designed for parameter tuninge control of iteration enlarged the extension of parametricscaling for generation evolutions Finally the model wassecondarily optimized by the selection crossover andmutation operators in a refined interaction within theGSMW-LPC-GA framework

e model prediction effects resulting from the GSMW-LPC-GA framework were experimentally proved better thanthe effect by simply adopting the moving window PLSmodel In calibration-validation model training the optimalGA probabilities for crossover and mutation were 40 and08 respectively and the predictive RMSEV and RV werebest as 4215 and 0926 respectively at the waveband1980ndash2050 nm is is significantly better than the movingwindow output at the waveband 1446ndash1520 nm For modelevaluation the best model generated from the GSMW-LPC-GA framework was further estimated by the testing sampleset which is previously chosen outside of the trainingsamples e testing results were not so good as the trainingresults but it was practically acceptable for some industrialrapid detection cases with the RMSET and RT equaling to5341 and 0883 respectively

e experimental results demonstrate that the proposedGSMW-LPC-GA framework is suitable for NIR quantitativedetermination of fishmeal protein It was able to producebetter models in relation to the full-spectrum model withthe advantage of selecting informative wavebands for targetchemical analytes In the framework GA optimizationeventually plays an important role in the improvement of thecalibration model Further studies include using the NIRtechnology to perform rapid detections on other targets infields of animal husbandry agriculture and food scienceand genetic optimization iteration is expected to be animplementable method providing an efficient strategy for

analyzing some unknown changes and influence of variousfertilizers

Data Availability

e data used to support the findings of this study areavailable from the corresponding author upon request

Conflicts of Interest

e authors declare that there are no conflicts of interestregarding the publication of this paper

Acknowledgments

is work was supported by the National Natural ScienceFoundation of China (61505037 and 61763008) the NaturalScience Foundation of Guangxi Province(2018GXNSFAA050045) Guangzhou Science and Tech-nology Program (202002030246) and Fangchenggang Sci-entific Research and Technology Development Plan (2014no 42)

References

[1] M A Booth G L Allan J Frances and S ParkinsonldquoReplacement of fish meal in diets for Australian silver perchBidyanus bidyanusrdquo Aquaculture vol 196 no 1-2 pp 67ndash852001

[2] R W Hardy ldquoUtilization of plant proteins in fish diets effectsof global demand and supplies of fishmealrdquo AquacultureResearch vol 41 no 5 pp 770ndash776 2010

[3] A Corten C-B Braham and A S Sadegh ldquoe developmentof a fishmeal industry in Mauritania and its impact on theregional stocks of sardinella and other small pelagics inNorthwest Africardquo Fisheries Research vol 186 pp 328ndash3362017

[4] D Han X Shan W Zhang et al ldquoA revisit to fishmeal usageand associated consequences in Chinese aquaculturerdquo Re-views in Aquaculture vol 10 no 2 pp 493ndash507 2018

[5] D Cozzolino A Chree I Murray and J R Scaife ldquoeassessment of the chemical composition of fishmeal by nearinfrared reflectance spectroscopyrdquo Aquaculture Nutritionvol 8 no 2 pp 149ndash155 2002

[6] M Uddin E Okazaki H Fukushima S Turza Y Yumikoand Y Fukuda ldquoNondestructive determination of water andprotein in surimi by near-infrared spectroscopyrdquo FoodChemistry vol 96 no 3 pp 491ndash495 2006

[7] H Yamasaki and S Morita ldquoIdentification of the epoxycuring mechanism under isothermal conditions by thermalanalysis and infrared spectroscopyrdquo Journal of MolecularStructure vol 1069 no 1 pp 164ndash170 2014

[8] J Yan N Villarreal and B Xu ldquoCharacterization of degra-dation of cotton cellulosic fibers through near infraredspectroscopyrdquo Journal of Polymers and the Environmentvol 21 no 4 pp 902ndash909 2013

[9] P Sirisomboon P Pornchaloempong P RamsiriS Pongkuan and S Srikornkarn ldquoEvaluation of the saltcontent of canned sardines in tomato ketchup by diffusereflection near infrared spectroscopyrdquo Journal of Near In-frared Spectroscopy vol 22 no 5 pp 329ndash336 2014

[10] C A T dos Santos M Lopo R N M J Pascoa andJ A Lopes ldquoA review on the applications of portable near-

Computational Intelligence and Neuroscience 9

infrared spectrometers in the agro-food industryrdquo AppliedSpectroscopy vol 67 no 11 pp 1215ndash1233 2013

[11] A Sakudo ldquoNear-infrared spectroscopy for medical appli-cations Current status and future perspectivesrdquo ClinicaChimica Acta vol 455 pp 181ndash188 2016

[12] S Hong H Chen J Gu K Cai Z Liu and J WenldquoComparative assessment on smart pre-processing methodsfor extracting information in ft-nir measured datardquo Mea-surement vol 157 Article ID 107663 2020

[13] T N Kok and J W Park ldquoExtending the shelf life of set fishballrdquo Journal of Food Quality vol 30 no 1 pp 1ndash27 2007

[14] G Foca C Ferrari A Ulrici M C Ielo G Minelli andD P Lo Fiego ldquoIodine value and fatty acids determination onpig fat samples by FT-NIR spectroscopy benefits of variableselection in the perspective of industrial applicationsrdquo FoodAnalytical Methods vol 9 no 10 pp 2791ndash2806 2016

[15] F Westad and H Martens ldquoVariable selection in near in-frared spectroscopy based on significance testing in partialleast squares regressionrdquo Journal of Near Infrared Spectros-copy vol 8 no 2 pp 117ndash124 2000

[16] T Mehmood K H Liland L Snipen and S Saeligboslash ldquoA reviewof variable selection methods in partial least squares regres-sionrdquo Chemometrics and Intelligent Laboratory Systemsvol 118 pp 62ndash69 2012

[17] S Kuriakose and H Joe ldquoQualitative and quantitative analysisin sandalwood oils using near infrared spectroscopy com-bined with chemometric techniquesrdquo Food Chemistryvol 135 no 1 pp 213ndash218 2012

[18] M Versaci S Calcagno M Cacciola F C MorabitoI Palamara and D Pellicano ldquoStandard soft computingtechniques for characterization of defects in nondestructiveevaluationrdquo in Ultrasonic Nondestructive Evaluation SystemsP Burrascano S Callegari A Montisci et al Eds pp 175ndash199 Springer Cham Heidelberg Dordrecht London 2015

[19] M Versaci ldquoFuzzy approach and eddy currents NDTNDEdevices in industrial applicationsrdquo Electronics Letters vol 52no 11 pp 943ndash945 2016

[20] Y-H Yun H-D Li L E R Wood et al ldquoAn efficient methodof wavelength interval selection based on random frog formultivariate spectral calibrationrdquo Spectrochimica Acta Part AMolecular and Biomolecular Spectroscopy vol 111 pp 31ndash362013

[21] H Chen Z Liu K Cai L Xu and A Chen ldquoGrid searchparametric optimization for FT-NIR quantitative analysis ofsolid soluble content in strawberry samplesrdquo VibrationalSpectroscopy vol 94 pp 7ndash15 2018

[22] S Kasemsumran N Suttiwijitpukdee and V KeeratinijakalldquoRapid classification of turmeric based on DNA fingerprint bynear-infrared spectroscopy combined with moving windowpartial least squares-discrimination analysisrdquo Analytical Sci-ences vol 33 no 1 pp 111ndash115 2017

[23] K Zheng T Feng W Zhang et al ldquoVariable selection bydouble competitive adaptive reweighted sampling for cali-bration transfer of near infrared spectrardquo Chemometrics andIntelligent Laboratory Systems vol 191 pp 109ndash117 2019

[24] A Joo A Ekart and J P Neirotti ldquoGenetic algorithms fordiscovery of matrix multiplication methodsrdquo IEEE Transac-tions on Evolutionary Computation vol 16 no 5 pp 749ndash7512012

[25] E Haq I Ahmad A Hussain and I M Almanjahie ldquoA novelselection approach for genetic algorithms for global optimi-zation of multimodal continuous functionsrdquo ComputationalIntelligence and Neuroscience vol 2019 Article ID 86402182019

[26] B K Lavine and C G White ldquoBoosting the performance ofgenetic algorithms for variable selection in partial leastsquares spectral calibrationsrdquo Applied Spectroscopy vol 71no 9 pp 2092ndash2101 2017

[27] L Mo H Chen W Chen Q Feng and L Xu ldquoStudy onevolution methods for the optimization of machine learningmodels based on FT-NIR spectroscopyrdquo Infrared Physics ampTechnology vol 108 Article ID 103366 2020

[28] Y-H Yun H-D Li B-C Deng and D-S Cao ldquoAn overviewof variable selection methods in multivariate analysis of near-infrared spectrardquo TrAC Trends in Analytical Chemistryvol 113 pp 102ndash115 2019

[29] S Forrest ldquoGenetic algorithms principles of natural selectionapplied to computationrdquo Science vol 261 no 5123pp 872ndash878 1993

[30] H Jiang W Xu and Q Chen ldquoComparison of algorithms forwavelength variables selection from near-infrared (NIR)spectra for quantitative monitoring of yeast (saccharomycescerevisiae) cultivationsrdquo Spectrochimica Acta Part A Mo-lecular and Biomolecular Spectroscopy vol 214 pp 366ndash3712019

[31] J Koljonen T E M Nordling and J T Alander ldquoA review ofgenetic algorithms in near infrared spectroscopy and che-mometrics past and futurerdquo Journal of Near Infrared Spec-troscopy vol 16 no 3 pp 189ndash197 2008

[32] J-H Jiang R J Berry H W Siesler and Y OzakildquoWavelength interval selection in multicomponent spectralanalysis by moving window partial least-squares regressionwith applications to mid-infrared and near-infrared spec-troscopic datardquo Analytical Chemistry vol 74 no 14pp 3555ndash3565 2002

[33] H-Z Chen Q-Q Song G-Q Tang and L-L Xu ldquoAnoptimization strategy for waveband selection in FT-NIRquantitative analysis of corn proteinrdquo Journal of Cereal Sci-ence vol 60 no 3 pp 595ndash601 2014

[34] C Ranzan L F Trierweiler B Hitzmann andJ O Trierweiler ldquoNIR pre-selection data using modifiedchangeable size moving window partial least squares and purespectral chemometrical modeling with ant colony optimiza-tion for wheat flour characterizationrdquo Chemometrics andIntelligent Laboratory Systems vol 142 pp 78ndash86 2015

[35] T Fearn C Riccioli A Garrido-Varo and J E Guerrero-Ginel ldquoOn the geometry of SNV and MSCrdquo Chemometricsand Intelligent Laboratory Systems vol 96 no 1 pp 22ndash262009

[36] R Galvao M Araujo G Jose M Pontes E Silva andT Saldanha ldquoA method for calibration and validation subsetpartitioningrdquo Talanta vol 67 no 4 pp 736ndash740 2005

10 Computational Intelligence and Neuroscience

Page 7: ANovelGeneticAlgorithm-BasedOptimizationFrameworkfor ...variable selection for NIR calibration of fishmeal for the quantitative determination of protein content. For NIR calibrations,

1100 1300 1500 1700 1900 2100 2300 250046

47

48

49

50

51

52

53

542070ndash2104

1980ndash2050

1446ndash1520

1252ndash12961134ndash1180

RMSE

V

Wavelength

The shape of thespectrum not fitting

the RMSEV-axis

Figure 4 e locations of the 5 selected wavebands in the full spectral range

(1)(4)(7)

(2)(5)(8)

(3)(6)(9)

(1)(4)(7)

(2)(5)(8)

(3)(6)(9)

(1)(4)(7)

(2)(5)(8)

(3)(6)(9)

(1)(4)(7)

(2)(5)(8)

(3)(6)(9)

(1)(4)(7)

(2)(5)(8)

(3)(6)(9)

Evolution for variablesin waveband 1252ndash1296nm

42

44

46

48

50

52

54

56

58

0 100 200 300 400 500Iterations

(b) (c)

(d) (e)

Evolution for variablesin waveband 1446ndash1520nm

42

44

46

48

50

52

54

56

58

RMSE

V(lo

ss fu

nctio

n)RM

SEV

(loss

func

tion)

0 100 200 300 400 500Iterations

Evolution for variablesin waveband 2070ndash2104nm

42

44

46

48

50

52

54

56

58

0 100 200 300 400 500Iterations

0 100 200 300 400 50042

44

46

48

50

52

54

56

58Evolution for variables

in waveband 1134ndash1180nm

RMSE

V(lo

ss fu

nctio

n)RM

SEV

(loss

func

tion)

RMSE

V(lo

ss fu

nctio

n)

Iterations(a)

Evolution for variablesin waveband 1980ndash2050nm

42

44

46

48

50

52

54

56

58

0 100 200 300 400 500Iterations

(1) (2) (3)

(4) (5) (6)

(7) (8) (9)

Probability of crossover40 50 60

Prob

abili

ty o

f mut

atio

n 08

12

15

Figure 5 e optimizational effects for the principal variables in the 5 selected wavebands based on the parametric scaling GA iterations

Computational Intelligence and Neuroscience 7

inputs and then the selection crossover and mutation werecarried out For parametric scaling tuning the probability ofcrossover was set as 40 50 and 60 for testing emutation operator was set testing the probabilities of 0812 and 15 en the population automatically evolvedin a loop iteration and the loop was stopped in case ofmeeting each of the following two conditions and theframework-optimized model was determined

Condition 1 When reaching 500 times iteration the loopwas mandatorily stopped

Condition 2 When there was no further change appearingin the loss function (ie the RMSEV) for consecutively 20times iteration the loop was adaptively stopped

We tested the specific scaling parameter valuing ofcrossover andmutation in genetic algorithm for each of the 5designated wavebands e framework optimization resultsin the genetic evolution process are shown in Figure 5 It canbe seen in Figure 5 that the evolution loop did not stop insome cases until 500 times iteration and in other cases theloop stopped before iteration reached 500 times Finally wewere able to find the best results for each of the 5 selectedwavebands by the genetic optimizational process e sec-ondary optimized RMSEV and the corresponding RV areshown in Table 3 as well as their preferable GA parametersWe concluded from Table 3 that all of the data in the 5selected wavebands were further optimized by GA and the

most optimal result was observed in the waveband of1980ndash2050 nm whose evolution loop stopped at the 434times iteration

43 +e Testing Results Based on the Optimal GSMW-LPC-GAModel As discussed above the predictive performance ofthe GSMW-LPC-GA framework on the calibration andvalidation sample sets was satisfactory e optimal modelselected by the GSMW-LPC-GA framework was establishedon the waveband 1980ndash2050 nm (36 variables included)with genetic optimization at 40 crossover and 08 mu-tation e evolution loop stopped at 434 times iterationis framework optimal model outputs the prediction errorRMSEV as 4215 and the correlation RV as 0926 To evaluatethe practical modeling effect of the framework we took thewaveband 1980ndash2050 nm as the target spectral data in thetesting part performed the principal component extractionand applied the same GA parameters to estimate themodeling effect for the 53 testing samples which were in-dependent of the training part e PLS regression plot isshown in Figure 6(a) demonstrating the testing resultswhere the RMSET and RT were 5341 and 0883 respectivelyAs for comparison the typical GA evolution method wasused for establishing and optimizing the PLS calibrationvalidation and testing processes With common algorithmicsettings GA runs with 50 crossover 12 mutation and500 times of iteration [31] e GA-PLS testing results are

Table 3 e optimal model validating results corresponding to the 5 selected wavebands

Waveband (nm)e optimal parameters for genetic evolution

RMSEV (wt) RVCrossover () Mutation () Iteration times

1134ndash1180 50 15 500 4755 09061252ndash1296 40 08 437 4546 08971446ndash1520 50 08 469 4353 09181980ndash2050 40 08 434 4215 09262070ndash2104 50 08 484 4649 0912e iteration times represent when the optimal RMSEV was kept as a constant value for a cycle of 20 times iteration or less than 20 in case the iteration hadreached 500 times

40 45 50 55 60 65 70 75 8040

45

50

55

60

65

70

75

80 GSMW-LPC-GA optimizationwaveband 1980ndash2050nm

Measured content

RMSEV = 5341RT = 0883

NIR

pre

dict

ed v

alue

(a)

40 45 50 55 60 65 70 75 8040

45

50

55

60

65

70

75

80

RMSEV = 6372RT = 0864

Typical GA optimizationwaveband 1100ndash2500nm

NIR

pre

dict

ed v

alue

Measured content

(b)

Figure 6 e PLS regression plot for test samples ((a) the proposed GSMW-LPC-GA framework and (b) the typical GA evolution)

8 Computational Intelligence and Neuroscience

shown as the regression plot in Figure 6(b) e comparativeexperiments show that the proposed GSMW-LPC-GAframework performs better than the typical GA evolution inthe rapid NIR quantitative prediction of fishmeal protein

5 Conclusions

NIR calibration model was established for the quantitativedetermination of protein content in fishmeal samples Anovel optimization framework of GSMW-LPC-GA wasconstructed for model optimization Possible wavebandswere tested for model enhancement by grid searchingmoving window parameters in which the starting point wasset continuously changing from 1 to the total number ofwavelengths in the scanning range and the window size wasset in a specific design to balance the model representa-tiveness and the computational complexity Successively 5optimal window wavebands were selected for further op-timization by LPC feature extraction and GA iterationGenetic algorithmic iteration was designed as the coreprocess of implementable optimization e variableswavelengths in each of the 5 selected wavebands weretransformed to LPCs as the input variables for GA iterationIn GA optimization the population size was set obeying thedimensions of the input variable set e probability ofcrossover and mutation was designed for parameter tuninge control of iteration enlarged the extension of parametricscaling for generation evolutions Finally the model wassecondarily optimized by the selection crossover andmutation operators in a refined interaction within theGSMW-LPC-GA framework

e model prediction effects resulting from the GSMW-LPC-GA framework were experimentally proved better thanthe effect by simply adopting the moving window PLSmodel In calibration-validation model training the optimalGA probabilities for crossover and mutation were 40 and08 respectively and the predictive RMSEV and RV werebest as 4215 and 0926 respectively at the waveband1980ndash2050 nm is is significantly better than the movingwindow output at the waveband 1446ndash1520 nm For modelevaluation the best model generated from the GSMW-LPC-GA framework was further estimated by the testing sampleset which is previously chosen outside of the trainingsamples e testing results were not so good as the trainingresults but it was practically acceptable for some industrialrapid detection cases with the RMSET and RT equaling to5341 and 0883 respectively

e experimental results demonstrate that the proposedGSMW-LPC-GA framework is suitable for NIR quantitativedetermination of fishmeal protein It was able to producebetter models in relation to the full-spectrum model withthe advantage of selecting informative wavebands for targetchemical analytes In the framework GA optimizationeventually plays an important role in the improvement of thecalibration model Further studies include using the NIRtechnology to perform rapid detections on other targets infields of animal husbandry agriculture and food scienceand genetic optimization iteration is expected to be animplementable method providing an efficient strategy for

analyzing some unknown changes and influence of variousfertilizers

Data Availability

e data used to support the findings of this study areavailable from the corresponding author upon request

Conflicts of Interest

e authors declare that there are no conflicts of interestregarding the publication of this paper

Acknowledgments

is work was supported by the National Natural ScienceFoundation of China (61505037 and 61763008) the NaturalScience Foundation of Guangxi Province(2018GXNSFAA050045) Guangzhou Science and Tech-nology Program (202002030246) and Fangchenggang Sci-entific Research and Technology Development Plan (2014no 42)

References

[1] M A Booth G L Allan J Frances and S ParkinsonldquoReplacement of fish meal in diets for Australian silver perchBidyanus bidyanusrdquo Aquaculture vol 196 no 1-2 pp 67ndash852001

[2] R W Hardy ldquoUtilization of plant proteins in fish diets effectsof global demand and supplies of fishmealrdquo AquacultureResearch vol 41 no 5 pp 770ndash776 2010

[3] A Corten C-B Braham and A S Sadegh ldquoe developmentof a fishmeal industry in Mauritania and its impact on theregional stocks of sardinella and other small pelagics inNorthwest Africardquo Fisheries Research vol 186 pp 328ndash3362017

[4] D Han X Shan W Zhang et al ldquoA revisit to fishmeal usageand associated consequences in Chinese aquaculturerdquo Re-views in Aquaculture vol 10 no 2 pp 493ndash507 2018

[5] D Cozzolino A Chree I Murray and J R Scaife ldquoeassessment of the chemical composition of fishmeal by nearinfrared reflectance spectroscopyrdquo Aquaculture Nutritionvol 8 no 2 pp 149ndash155 2002

[6] M Uddin E Okazaki H Fukushima S Turza Y Yumikoand Y Fukuda ldquoNondestructive determination of water andprotein in surimi by near-infrared spectroscopyrdquo FoodChemistry vol 96 no 3 pp 491ndash495 2006

[7] H Yamasaki and S Morita ldquoIdentification of the epoxycuring mechanism under isothermal conditions by thermalanalysis and infrared spectroscopyrdquo Journal of MolecularStructure vol 1069 no 1 pp 164ndash170 2014

[8] J Yan N Villarreal and B Xu ldquoCharacterization of degra-dation of cotton cellulosic fibers through near infraredspectroscopyrdquo Journal of Polymers and the Environmentvol 21 no 4 pp 902ndash909 2013

[9] P Sirisomboon P Pornchaloempong P RamsiriS Pongkuan and S Srikornkarn ldquoEvaluation of the saltcontent of canned sardines in tomato ketchup by diffusereflection near infrared spectroscopyrdquo Journal of Near In-frared Spectroscopy vol 22 no 5 pp 329ndash336 2014

[10] C A T dos Santos M Lopo R N M J Pascoa andJ A Lopes ldquoA review on the applications of portable near-

Computational Intelligence and Neuroscience 9

infrared spectrometers in the agro-food industryrdquo AppliedSpectroscopy vol 67 no 11 pp 1215ndash1233 2013

[11] A Sakudo ldquoNear-infrared spectroscopy for medical appli-cations Current status and future perspectivesrdquo ClinicaChimica Acta vol 455 pp 181ndash188 2016

[12] S Hong H Chen J Gu K Cai Z Liu and J WenldquoComparative assessment on smart pre-processing methodsfor extracting information in ft-nir measured datardquo Mea-surement vol 157 Article ID 107663 2020

[13] T N Kok and J W Park ldquoExtending the shelf life of set fishballrdquo Journal of Food Quality vol 30 no 1 pp 1ndash27 2007

[14] G Foca C Ferrari A Ulrici M C Ielo G Minelli andD P Lo Fiego ldquoIodine value and fatty acids determination onpig fat samples by FT-NIR spectroscopy benefits of variableselection in the perspective of industrial applicationsrdquo FoodAnalytical Methods vol 9 no 10 pp 2791ndash2806 2016

[15] F Westad and H Martens ldquoVariable selection in near in-frared spectroscopy based on significance testing in partialleast squares regressionrdquo Journal of Near Infrared Spectros-copy vol 8 no 2 pp 117ndash124 2000

[16] T Mehmood K H Liland L Snipen and S Saeligboslash ldquoA reviewof variable selection methods in partial least squares regres-sionrdquo Chemometrics and Intelligent Laboratory Systemsvol 118 pp 62ndash69 2012

[17] S Kuriakose and H Joe ldquoQualitative and quantitative analysisin sandalwood oils using near infrared spectroscopy com-bined with chemometric techniquesrdquo Food Chemistryvol 135 no 1 pp 213ndash218 2012

[18] M Versaci S Calcagno M Cacciola F C MorabitoI Palamara and D Pellicano ldquoStandard soft computingtechniques for characterization of defects in nondestructiveevaluationrdquo in Ultrasonic Nondestructive Evaluation SystemsP Burrascano S Callegari A Montisci et al Eds pp 175ndash199 Springer Cham Heidelberg Dordrecht London 2015

[19] M Versaci ldquoFuzzy approach and eddy currents NDTNDEdevices in industrial applicationsrdquo Electronics Letters vol 52no 11 pp 943ndash945 2016

[20] Y-H Yun H-D Li L E R Wood et al ldquoAn efficient methodof wavelength interval selection based on random frog formultivariate spectral calibrationrdquo Spectrochimica Acta Part AMolecular and Biomolecular Spectroscopy vol 111 pp 31ndash362013

[21] H Chen Z Liu K Cai L Xu and A Chen ldquoGrid searchparametric optimization for FT-NIR quantitative analysis ofsolid soluble content in strawberry samplesrdquo VibrationalSpectroscopy vol 94 pp 7ndash15 2018

[22] S Kasemsumran N Suttiwijitpukdee and V KeeratinijakalldquoRapid classification of turmeric based on DNA fingerprint bynear-infrared spectroscopy combined with moving windowpartial least squares-discrimination analysisrdquo Analytical Sci-ences vol 33 no 1 pp 111ndash115 2017

[23] K Zheng T Feng W Zhang et al ldquoVariable selection bydouble competitive adaptive reweighted sampling for cali-bration transfer of near infrared spectrardquo Chemometrics andIntelligent Laboratory Systems vol 191 pp 109ndash117 2019

[24] A Joo A Ekart and J P Neirotti ldquoGenetic algorithms fordiscovery of matrix multiplication methodsrdquo IEEE Transac-tions on Evolutionary Computation vol 16 no 5 pp 749ndash7512012

[25] E Haq I Ahmad A Hussain and I M Almanjahie ldquoA novelselection approach for genetic algorithms for global optimi-zation of multimodal continuous functionsrdquo ComputationalIntelligence and Neuroscience vol 2019 Article ID 86402182019

[26] B K Lavine and C G White ldquoBoosting the performance ofgenetic algorithms for variable selection in partial leastsquares spectral calibrationsrdquo Applied Spectroscopy vol 71no 9 pp 2092ndash2101 2017

[27] L Mo H Chen W Chen Q Feng and L Xu ldquoStudy onevolution methods for the optimization of machine learningmodels based on FT-NIR spectroscopyrdquo Infrared Physics ampTechnology vol 108 Article ID 103366 2020

[28] Y-H Yun H-D Li B-C Deng and D-S Cao ldquoAn overviewof variable selection methods in multivariate analysis of near-infrared spectrardquo TrAC Trends in Analytical Chemistryvol 113 pp 102ndash115 2019

[29] S Forrest ldquoGenetic algorithms principles of natural selectionapplied to computationrdquo Science vol 261 no 5123pp 872ndash878 1993

[30] H Jiang W Xu and Q Chen ldquoComparison of algorithms forwavelength variables selection from near-infrared (NIR)spectra for quantitative monitoring of yeast (saccharomycescerevisiae) cultivationsrdquo Spectrochimica Acta Part A Mo-lecular and Biomolecular Spectroscopy vol 214 pp 366ndash3712019

[31] J Koljonen T E M Nordling and J T Alander ldquoA review ofgenetic algorithms in near infrared spectroscopy and che-mometrics past and futurerdquo Journal of Near Infrared Spec-troscopy vol 16 no 3 pp 189ndash197 2008

[32] J-H Jiang R J Berry H W Siesler and Y OzakildquoWavelength interval selection in multicomponent spectralanalysis by moving window partial least-squares regressionwith applications to mid-infrared and near-infrared spec-troscopic datardquo Analytical Chemistry vol 74 no 14pp 3555ndash3565 2002

[33] H-Z Chen Q-Q Song G-Q Tang and L-L Xu ldquoAnoptimization strategy for waveband selection in FT-NIRquantitative analysis of corn proteinrdquo Journal of Cereal Sci-ence vol 60 no 3 pp 595ndash601 2014

[34] C Ranzan L F Trierweiler B Hitzmann andJ O Trierweiler ldquoNIR pre-selection data using modifiedchangeable size moving window partial least squares and purespectral chemometrical modeling with ant colony optimiza-tion for wheat flour characterizationrdquo Chemometrics andIntelligent Laboratory Systems vol 142 pp 78ndash86 2015

[35] T Fearn C Riccioli A Garrido-Varo and J E Guerrero-Ginel ldquoOn the geometry of SNV and MSCrdquo Chemometricsand Intelligent Laboratory Systems vol 96 no 1 pp 22ndash262009

[36] R Galvao M Araujo G Jose M Pontes E Silva andT Saldanha ldquoA method for calibration and validation subsetpartitioningrdquo Talanta vol 67 no 4 pp 736ndash740 2005

10 Computational Intelligence and Neuroscience

Page 8: ANovelGeneticAlgorithm-BasedOptimizationFrameworkfor ...variable selection for NIR calibration of fishmeal for the quantitative determination of protein content. For NIR calibrations,

inputs and then the selection crossover and mutation werecarried out For parametric scaling tuning the probability ofcrossover was set as 40 50 and 60 for testing emutation operator was set testing the probabilities of 0812 and 15 en the population automatically evolvedin a loop iteration and the loop was stopped in case ofmeeting each of the following two conditions and theframework-optimized model was determined

Condition 1 When reaching 500 times iteration the loopwas mandatorily stopped

Condition 2 When there was no further change appearingin the loss function (ie the RMSEV) for consecutively 20times iteration the loop was adaptively stopped

We tested the specific scaling parameter valuing ofcrossover andmutation in genetic algorithm for each of the 5designated wavebands e framework optimization resultsin the genetic evolution process are shown in Figure 5 It canbe seen in Figure 5 that the evolution loop did not stop insome cases until 500 times iteration and in other cases theloop stopped before iteration reached 500 times Finally wewere able to find the best results for each of the 5 selectedwavebands by the genetic optimizational process e sec-ondary optimized RMSEV and the corresponding RV areshown in Table 3 as well as their preferable GA parametersWe concluded from Table 3 that all of the data in the 5selected wavebands were further optimized by GA and the

most optimal result was observed in the waveband of1980ndash2050 nm whose evolution loop stopped at the 434times iteration

43 +e Testing Results Based on the Optimal GSMW-LPC-GAModel As discussed above the predictive performance ofthe GSMW-LPC-GA framework on the calibration andvalidation sample sets was satisfactory e optimal modelselected by the GSMW-LPC-GA framework was establishedon the waveband 1980ndash2050 nm (36 variables included)with genetic optimization at 40 crossover and 08 mu-tation e evolution loop stopped at 434 times iterationis framework optimal model outputs the prediction errorRMSEV as 4215 and the correlation RV as 0926 To evaluatethe practical modeling effect of the framework we took thewaveband 1980ndash2050 nm as the target spectral data in thetesting part performed the principal component extractionand applied the same GA parameters to estimate themodeling effect for the 53 testing samples which were in-dependent of the training part e PLS regression plot isshown in Figure 6(a) demonstrating the testing resultswhere the RMSET and RT were 5341 and 0883 respectivelyAs for comparison the typical GA evolution method wasused for establishing and optimizing the PLS calibrationvalidation and testing processes With common algorithmicsettings GA runs with 50 crossover 12 mutation and500 times of iteration [31] e GA-PLS testing results are

Table 3 e optimal model validating results corresponding to the 5 selected wavebands

Waveband (nm)e optimal parameters for genetic evolution

RMSEV (wt) RVCrossover () Mutation () Iteration times

1134ndash1180 50 15 500 4755 09061252ndash1296 40 08 437 4546 08971446ndash1520 50 08 469 4353 09181980ndash2050 40 08 434 4215 09262070ndash2104 50 08 484 4649 0912e iteration times represent when the optimal RMSEV was kept as a constant value for a cycle of 20 times iteration or less than 20 in case the iteration hadreached 500 times

40 45 50 55 60 65 70 75 8040

45

50

55

60

65

70

75

80 GSMW-LPC-GA optimizationwaveband 1980ndash2050nm

Measured content

RMSEV = 5341RT = 0883

NIR

pre

dict

ed v

alue

(a)

40 45 50 55 60 65 70 75 8040

45

50

55

60

65

70

75

80

RMSEV = 6372RT = 0864

Typical GA optimizationwaveband 1100ndash2500nm

NIR

pre

dict

ed v

alue

Measured content

(b)

Figure 6 e PLS regression plot for test samples ((a) the proposed GSMW-LPC-GA framework and (b) the typical GA evolution)

8 Computational Intelligence and Neuroscience

shown as the regression plot in Figure 6(b) e comparativeexperiments show that the proposed GSMW-LPC-GAframework performs better than the typical GA evolution inthe rapid NIR quantitative prediction of fishmeal protein

5 Conclusions

NIR calibration model was established for the quantitativedetermination of protein content in fishmeal samples Anovel optimization framework of GSMW-LPC-GA wasconstructed for model optimization Possible wavebandswere tested for model enhancement by grid searchingmoving window parameters in which the starting point wasset continuously changing from 1 to the total number ofwavelengths in the scanning range and the window size wasset in a specific design to balance the model representa-tiveness and the computational complexity Successively 5optimal window wavebands were selected for further op-timization by LPC feature extraction and GA iterationGenetic algorithmic iteration was designed as the coreprocess of implementable optimization e variableswavelengths in each of the 5 selected wavebands weretransformed to LPCs as the input variables for GA iterationIn GA optimization the population size was set obeying thedimensions of the input variable set e probability ofcrossover and mutation was designed for parameter tuninge control of iteration enlarged the extension of parametricscaling for generation evolutions Finally the model wassecondarily optimized by the selection crossover andmutation operators in a refined interaction within theGSMW-LPC-GA framework

e model prediction effects resulting from the GSMW-LPC-GA framework were experimentally proved better thanthe effect by simply adopting the moving window PLSmodel In calibration-validation model training the optimalGA probabilities for crossover and mutation were 40 and08 respectively and the predictive RMSEV and RV werebest as 4215 and 0926 respectively at the waveband1980ndash2050 nm is is significantly better than the movingwindow output at the waveband 1446ndash1520 nm For modelevaluation the best model generated from the GSMW-LPC-GA framework was further estimated by the testing sampleset which is previously chosen outside of the trainingsamples e testing results were not so good as the trainingresults but it was practically acceptable for some industrialrapid detection cases with the RMSET and RT equaling to5341 and 0883 respectively

e experimental results demonstrate that the proposedGSMW-LPC-GA framework is suitable for NIR quantitativedetermination of fishmeal protein It was able to producebetter models in relation to the full-spectrum model withthe advantage of selecting informative wavebands for targetchemical analytes In the framework GA optimizationeventually plays an important role in the improvement of thecalibration model Further studies include using the NIRtechnology to perform rapid detections on other targets infields of animal husbandry agriculture and food scienceand genetic optimization iteration is expected to be animplementable method providing an efficient strategy for

analyzing some unknown changes and influence of variousfertilizers

Data Availability

e data used to support the findings of this study areavailable from the corresponding author upon request

Conflicts of Interest

e authors declare that there are no conflicts of interestregarding the publication of this paper

Acknowledgments

is work was supported by the National Natural ScienceFoundation of China (61505037 and 61763008) the NaturalScience Foundation of Guangxi Province(2018GXNSFAA050045) Guangzhou Science and Tech-nology Program (202002030246) and Fangchenggang Sci-entific Research and Technology Development Plan (2014no 42)

References

[1] M A Booth G L Allan J Frances and S ParkinsonldquoReplacement of fish meal in diets for Australian silver perchBidyanus bidyanusrdquo Aquaculture vol 196 no 1-2 pp 67ndash852001

[2] R W Hardy ldquoUtilization of plant proteins in fish diets effectsof global demand and supplies of fishmealrdquo AquacultureResearch vol 41 no 5 pp 770ndash776 2010

[3] A Corten C-B Braham and A S Sadegh ldquoe developmentof a fishmeal industry in Mauritania and its impact on theregional stocks of sardinella and other small pelagics inNorthwest Africardquo Fisheries Research vol 186 pp 328ndash3362017

[4] D Han X Shan W Zhang et al ldquoA revisit to fishmeal usageand associated consequences in Chinese aquaculturerdquo Re-views in Aquaculture vol 10 no 2 pp 493ndash507 2018

[5] D Cozzolino A Chree I Murray and J R Scaife ldquoeassessment of the chemical composition of fishmeal by nearinfrared reflectance spectroscopyrdquo Aquaculture Nutritionvol 8 no 2 pp 149ndash155 2002

[6] M Uddin E Okazaki H Fukushima S Turza Y Yumikoand Y Fukuda ldquoNondestructive determination of water andprotein in surimi by near-infrared spectroscopyrdquo FoodChemistry vol 96 no 3 pp 491ndash495 2006

[7] H Yamasaki and S Morita ldquoIdentification of the epoxycuring mechanism under isothermal conditions by thermalanalysis and infrared spectroscopyrdquo Journal of MolecularStructure vol 1069 no 1 pp 164ndash170 2014

[8] J Yan N Villarreal and B Xu ldquoCharacterization of degra-dation of cotton cellulosic fibers through near infraredspectroscopyrdquo Journal of Polymers and the Environmentvol 21 no 4 pp 902ndash909 2013

[9] P Sirisomboon P Pornchaloempong P RamsiriS Pongkuan and S Srikornkarn ldquoEvaluation of the saltcontent of canned sardines in tomato ketchup by diffusereflection near infrared spectroscopyrdquo Journal of Near In-frared Spectroscopy vol 22 no 5 pp 329ndash336 2014

[10] C A T dos Santos M Lopo R N M J Pascoa andJ A Lopes ldquoA review on the applications of portable near-

Computational Intelligence and Neuroscience 9

infrared spectrometers in the agro-food industryrdquo AppliedSpectroscopy vol 67 no 11 pp 1215ndash1233 2013

[11] A Sakudo ldquoNear-infrared spectroscopy for medical appli-cations Current status and future perspectivesrdquo ClinicaChimica Acta vol 455 pp 181ndash188 2016

[12] S Hong H Chen J Gu K Cai Z Liu and J WenldquoComparative assessment on smart pre-processing methodsfor extracting information in ft-nir measured datardquo Mea-surement vol 157 Article ID 107663 2020

[13] T N Kok and J W Park ldquoExtending the shelf life of set fishballrdquo Journal of Food Quality vol 30 no 1 pp 1ndash27 2007

[14] G Foca C Ferrari A Ulrici M C Ielo G Minelli andD P Lo Fiego ldquoIodine value and fatty acids determination onpig fat samples by FT-NIR spectroscopy benefits of variableselection in the perspective of industrial applicationsrdquo FoodAnalytical Methods vol 9 no 10 pp 2791ndash2806 2016

[15] F Westad and H Martens ldquoVariable selection in near in-frared spectroscopy based on significance testing in partialleast squares regressionrdquo Journal of Near Infrared Spectros-copy vol 8 no 2 pp 117ndash124 2000

[16] T Mehmood K H Liland L Snipen and S Saeligboslash ldquoA reviewof variable selection methods in partial least squares regres-sionrdquo Chemometrics and Intelligent Laboratory Systemsvol 118 pp 62ndash69 2012

[17] S Kuriakose and H Joe ldquoQualitative and quantitative analysisin sandalwood oils using near infrared spectroscopy com-bined with chemometric techniquesrdquo Food Chemistryvol 135 no 1 pp 213ndash218 2012

[18] M Versaci S Calcagno M Cacciola F C MorabitoI Palamara and D Pellicano ldquoStandard soft computingtechniques for characterization of defects in nondestructiveevaluationrdquo in Ultrasonic Nondestructive Evaluation SystemsP Burrascano S Callegari A Montisci et al Eds pp 175ndash199 Springer Cham Heidelberg Dordrecht London 2015

[19] M Versaci ldquoFuzzy approach and eddy currents NDTNDEdevices in industrial applicationsrdquo Electronics Letters vol 52no 11 pp 943ndash945 2016

[20] Y-H Yun H-D Li L E R Wood et al ldquoAn efficient methodof wavelength interval selection based on random frog formultivariate spectral calibrationrdquo Spectrochimica Acta Part AMolecular and Biomolecular Spectroscopy vol 111 pp 31ndash362013

[21] H Chen Z Liu K Cai L Xu and A Chen ldquoGrid searchparametric optimization for FT-NIR quantitative analysis ofsolid soluble content in strawberry samplesrdquo VibrationalSpectroscopy vol 94 pp 7ndash15 2018

[22] S Kasemsumran N Suttiwijitpukdee and V KeeratinijakalldquoRapid classification of turmeric based on DNA fingerprint bynear-infrared spectroscopy combined with moving windowpartial least squares-discrimination analysisrdquo Analytical Sci-ences vol 33 no 1 pp 111ndash115 2017

[23] K Zheng T Feng W Zhang et al ldquoVariable selection bydouble competitive adaptive reweighted sampling for cali-bration transfer of near infrared spectrardquo Chemometrics andIntelligent Laboratory Systems vol 191 pp 109ndash117 2019

[24] A Joo A Ekart and J P Neirotti ldquoGenetic algorithms fordiscovery of matrix multiplication methodsrdquo IEEE Transac-tions on Evolutionary Computation vol 16 no 5 pp 749ndash7512012

[25] E Haq I Ahmad A Hussain and I M Almanjahie ldquoA novelselection approach for genetic algorithms for global optimi-zation of multimodal continuous functionsrdquo ComputationalIntelligence and Neuroscience vol 2019 Article ID 86402182019

[26] B K Lavine and C G White ldquoBoosting the performance ofgenetic algorithms for variable selection in partial leastsquares spectral calibrationsrdquo Applied Spectroscopy vol 71no 9 pp 2092ndash2101 2017

[27] L Mo H Chen W Chen Q Feng and L Xu ldquoStudy onevolution methods for the optimization of machine learningmodels based on FT-NIR spectroscopyrdquo Infrared Physics ampTechnology vol 108 Article ID 103366 2020

[28] Y-H Yun H-D Li B-C Deng and D-S Cao ldquoAn overviewof variable selection methods in multivariate analysis of near-infrared spectrardquo TrAC Trends in Analytical Chemistryvol 113 pp 102ndash115 2019

[29] S Forrest ldquoGenetic algorithms principles of natural selectionapplied to computationrdquo Science vol 261 no 5123pp 872ndash878 1993

[30] H Jiang W Xu and Q Chen ldquoComparison of algorithms forwavelength variables selection from near-infrared (NIR)spectra for quantitative monitoring of yeast (saccharomycescerevisiae) cultivationsrdquo Spectrochimica Acta Part A Mo-lecular and Biomolecular Spectroscopy vol 214 pp 366ndash3712019

[31] J Koljonen T E M Nordling and J T Alander ldquoA review ofgenetic algorithms in near infrared spectroscopy and che-mometrics past and futurerdquo Journal of Near Infrared Spec-troscopy vol 16 no 3 pp 189ndash197 2008

[32] J-H Jiang R J Berry H W Siesler and Y OzakildquoWavelength interval selection in multicomponent spectralanalysis by moving window partial least-squares regressionwith applications to mid-infrared and near-infrared spec-troscopic datardquo Analytical Chemistry vol 74 no 14pp 3555ndash3565 2002

[33] H-Z Chen Q-Q Song G-Q Tang and L-L Xu ldquoAnoptimization strategy for waveband selection in FT-NIRquantitative analysis of corn proteinrdquo Journal of Cereal Sci-ence vol 60 no 3 pp 595ndash601 2014

[34] C Ranzan L F Trierweiler B Hitzmann andJ O Trierweiler ldquoNIR pre-selection data using modifiedchangeable size moving window partial least squares and purespectral chemometrical modeling with ant colony optimiza-tion for wheat flour characterizationrdquo Chemometrics andIntelligent Laboratory Systems vol 142 pp 78ndash86 2015

[35] T Fearn C Riccioli A Garrido-Varo and J E Guerrero-Ginel ldquoOn the geometry of SNV and MSCrdquo Chemometricsand Intelligent Laboratory Systems vol 96 no 1 pp 22ndash262009

[36] R Galvao M Araujo G Jose M Pontes E Silva andT Saldanha ldquoA method for calibration and validation subsetpartitioningrdquo Talanta vol 67 no 4 pp 736ndash740 2005

10 Computational Intelligence and Neuroscience

Page 9: ANovelGeneticAlgorithm-BasedOptimizationFrameworkfor ...variable selection for NIR calibration of fishmeal for the quantitative determination of protein content. For NIR calibrations,

shown as the regression plot in Figure 6(b) e comparativeexperiments show that the proposed GSMW-LPC-GAframework performs better than the typical GA evolution inthe rapid NIR quantitative prediction of fishmeal protein

5 Conclusions

NIR calibration model was established for the quantitativedetermination of protein content in fishmeal samples Anovel optimization framework of GSMW-LPC-GA wasconstructed for model optimization Possible wavebandswere tested for model enhancement by grid searchingmoving window parameters in which the starting point wasset continuously changing from 1 to the total number ofwavelengths in the scanning range and the window size wasset in a specific design to balance the model representa-tiveness and the computational complexity Successively 5optimal window wavebands were selected for further op-timization by LPC feature extraction and GA iterationGenetic algorithmic iteration was designed as the coreprocess of implementable optimization e variableswavelengths in each of the 5 selected wavebands weretransformed to LPCs as the input variables for GA iterationIn GA optimization the population size was set obeying thedimensions of the input variable set e probability ofcrossover and mutation was designed for parameter tuninge control of iteration enlarged the extension of parametricscaling for generation evolutions Finally the model wassecondarily optimized by the selection crossover andmutation operators in a refined interaction within theGSMW-LPC-GA framework

e model prediction effects resulting from the GSMW-LPC-GA framework were experimentally proved better thanthe effect by simply adopting the moving window PLSmodel In calibration-validation model training the optimalGA probabilities for crossover and mutation were 40 and08 respectively and the predictive RMSEV and RV werebest as 4215 and 0926 respectively at the waveband1980ndash2050 nm is is significantly better than the movingwindow output at the waveband 1446ndash1520 nm For modelevaluation the best model generated from the GSMW-LPC-GA framework was further estimated by the testing sampleset which is previously chosen outside of the trainingsamples e testing results were not so good as the trainingresults but it was practically acceptable for some industrialrapid detection cases with the RMSET and RT equaling to5341 and 0883 respectively

e experimental results demonstrate that the proposedGSMW-LPC-GA framework is suitable for NIR quantitativedetermination of fishmeal protein It was able to producebetter models in relation to the full-spectrum model withthe advantage of selecting informative wavebands for targetchemical analytes In the framework GA optimizationeventually plays an important role in the improvement of thecalibration model Further studies include using the NIRtechnology to perform rapid detections on other targets infields of animal husbandry agriculture and food scienceand genetic optimization iteration is expected to be animplementable method providing an efficient strategy for

analyzing some unknown changes and influence of variousfertilizers

Data Availability

e data used to support the findings of this study areavailable from the corresponding author upon request

Conflicts of Interest

e authors declare that there are no conflicts of interestregarding the publication of this paper

Acknowledgments

is work was supported by the National Natural ScienceFoundation of China (61505037 and 61763008) the NaturalScience Foundation of Guangxi Province(2018GXNSFAA050045) Guangzhou Science and Tech-nology Program (202002030246) and Fangchenggang Sci-entific Research and Technology Development Plan (2014no 42)

References

[1] M A Booth G L Allan J Frances and S ParkinsonldquoReplacement of fish meal in diets for Australian silver perchBidyanus bidyanusrdquo Aquaculture vol 196 no 1-2 pp 67ndash852001

[2] R W Hardy ldquoUtilization of plant proteins in fish diets effectsof global demand and supplies of fishmealrdquo AquacultureResearch vol 41 no 5 pp 770ndash776 2010

[3] A Corten C-B Braham and A S Sadegh ldquoe developmentof a fishmeal industry in Mauritania and its impact on theregional stocks of sardinella and other small pelagics inNorthwest Africardquo Fisheries Research vol 186 pp 328ndash3362017

[4] D Han X Shan W Zhang et al ldquoA revisit to fishmeal usageand associated consequences in Chinese aquaculturerdquo Re-views in Aquaculture vol 10 no 2 pp 493ndash507 2018

[5] D Cozzolino A Chree I Murray and J R Scaife ldquoeassessment of the chemical composition of fishmeal by nearinfrared reflectance spectroscopyrdquo Aquaculture Nutritionvol 8 no 2 pp 149ndash155 2002

[6] M Uddin E Okazaki H Fukushima S Turza Y Yumikoand Y Fukuda ldquoNondestructive determination of water andprotein in surimi by near-infrared spectroscopyrdquo FoodChemistry vol 96 no 3 pp 491ndash495 2006

[7] H Yamasaki and S Morita ldquoIdentification of the epoxycuring mechanism under isothermal conditions by thermalanalysis and infrared spectroscopyrdquo Journal of MolecularStructure vol 1069 no 1 pp 164ndash170 2014

[8] J Yan N Villarreal and B Xu ldquoCharacterization of degra-dation of cotton cellulosic fibers through near infraredspectroscopyrdquo Journal of Polymers and the Environmentvol 21 no 4 pp 902ndash909 2013

[9] P Sirisomboon P Pornchaloempong P RamsiriS Pongkuan and S Srikornkarn ldquoEvaluation of the saltcontent of canned sardines in tomato ketchup by diffusereflection near infrared spectroscopyrdquo Journal of Near In-frared Spectroscopy vol 22 no 5 pp 329ndash336 2014

[10] C A T dos Santos M Lopo R N M J Pascoa andJ A Lopes ldquoA review on the applications of portable near-

Computational Intelligence and Neuroscience 9

infrared spectrometers in the agro-food industryrdquo AppliedSpectroscopy vol 67 no 11 pp 1215ndash1233 2013

[11] A Sakudo ldquoNear-infrared spectroscopy for medical appli-cations Current status and future perspectivesrdquo ClinicaChimica Acta vol 455 pp 181ndash188 2016

[12] S Hong H Chen J Gu K Cai Z Liu and J WenldquoComparative assessment on smart pre-processing methodsfor extracting information in ft-nir measured datardquo Mea-surement vol 157 Article ID 107663 2020

[13] T N Kok and J W Park ldquoExtending the shelf life of set fishballrdquo Journal of Food Quality vol 30 no 1 pp 1ndash27 2007

[14] G Foca C Ferrari A Ulrici M C Ielo G Minelli andD P Lo Fiego ldquoIodine value and fatty acids determination onpig fat samples by FT-NIR spectroscopy benefits of variableselection in the perspective of industrial applicationsrdquo FoodAnalytical Methods vol 9 no 10 pp 2791ndash2806 2016

[15] F Westad and H Martens ldquoVariable selection in near in-frared spectroscopy based on significance testing in partialleast squares regressionrdquo Journal of Near Infrared Spectros-copy vol 8 no 2 pp 117ndash124 2000

[16] T Mehmood K H Liland L Snipen and S Saeligboslash ldquoA reviewof variable selection methods in partial least squares regres-sionrdquo Chemometrics and Intelligent Laboratory Systemsvol 118 pp 62ndash69 2012

[17] S Kuriakose and H Joe ldquoQualitative and quantitative analysisin sandalwood oils using near infrared spectroscopy com-bined with chemometric techniquesrdquo Food Chemistryvol 135 no 1 pp 213ndash218 2012

[18] M Versaci S Calcagno M Cacciola F C MorabitoI Palamara and D Pellicano ldquoStandard soft computingtechniques for characterization of defects in nondestructiveevaluationrdquo in Ultrasonic Nondestructive Evaluation SystemsP Burrascano S Callegari A Montisci et al Eds pp 175ndash199 Springer Cham Heidelberg Dordrecht London 2015

[19] M Versaci ldquoFuzzy approach and eddy currents NDTNDEdevices in industrial applicationsrdquo Electronics Letters vol 52no 11 pp 943ndash945 2016

[20] Y-H Yun H-D Li L E R Wood et al ldquoAn efficient methodof wavelength interval selection based on random frog formultivariate spectral calibrationrdquo Spectrochimica Acta Part AMolecular and Biomolecular Spectroscopy vol 111 pp 31ndash362013

[21] H Chen Z Liu K Cai L Xu and A Chen ldquoGrid searchparametric optimization for FT-NIR quantitative analysis ofsolid soluble content in strawberry samplesrdquo VibrationalSpectroscopy vol 94 pp 7ndash15 2018

[22] S Kasemsumran N Suttiwijitpukdee and V KeeratinijakalldquoRapid classification of turmeric based on DNA fingerprint bynear-infrared spectroscopy combined with moving windowpartial least squares-discrimination analysisrdquo Analytical Sci-ences vol 33 no 1 pp 111ndash115 2017

[23] K Zheng T Feng W Zhang et al ldquoVariable selection bydouble competitive adaptive reweighted sampling for cali-bration transfer of near infrared spectrardquo Chemometrics andIntelligent Laboratory Systems vol 191 pp 109ndash117 2019

[24] A Joo A Ekart and J P Neirotti ldquoGenetic algorithms fordiscovery of matrix multiplication methodsrdquo IEEE Transac-tions on Evolutionary Computation vol 16 no 5 pp 749ndash7512012

[25] E Haq I Ahmad A Hussain and I M Almanjahie ldquoA novelselection approach for genetic algorithms for global optimi-zation of multimodal continuous functionsrdquo ComputationalIntelligence and Neuroscience vol 2019 Article ID 86402182019

[26] B K Lavine and C G White ldquoBoosting the performance ofgenetic algorithms for variable selection in partial leastsquares spectral calibrationsrdquo Applied Spectroscopy vol 71no 9 pp 2092ndash2101 2017

[27] L Mo H Chen W Chen Q Feng and L Xu ldquoStudy onevolution methods for the optimization of machine learningmodels based on FT-NIR spectroscopyrdquo Infrared Physics ampTechnology vol 108 Article ID 103366 2020

[28] Y-H Yun H-D Li B-C Deng and D-S Cao ldquoAn overviewof variable selection methods in multivariate analysis of near-infrared spectrardquo TrAC Trends in Analytical Chemistryvol 113 pp 102ndash115 2019

[29] S Forrest ldquoGenetic algorithms principles of natural selectionapplied to computationrdquo Science vol 261 no 5123pp 872ndash878 1993

[30] H Jiang W Xu and Q Chen ldquoComparison of algorithms forwavelength variables selection from near-infrared (NIR)spectra for quantitative monitoring of yeast (saccharomycescerevisiae) cultivationsrdquo Spectrochimica Acta Part A Mo-lecular and Biomolecular Spectroscopy vol 214 pp 366ndash3712019

[31] J Koljonen T E M Nordling and J T Alander ldquoA review ofgenetic algorithms in near infrared spectroscopy and che-mometrics past and futurerdquo Journal of Near Infrared Spec-troscopy vol 16 no 3 pp 189ndash197 2008

[32] J-H Jiang R J Berry H W Siesler and Y OzakildquoWavelength interval selection in multicomponent spectralanalysis by moving window partial least-squares regressionwith applications to mid-infrared and near-infrared spec-troscopic datardquo Analytical Chemistry vol 74 no 14pp 3555ndash3565 2002

[33] H-Z Chen Q-Q Song G-Q Tang and L-L Xu ldquoAnoptimization strategy for waveband selection in FT-NIRquantitative analysis of corn proteinrdquo Journal of Cereal Sci-ence vol 60 no 3 pp 595ndash601 2014

[34] C Ranzan L F Trierweiler B Hitzmann andJ O Trierweiler ldquoNIR pre-selection data using modifiedchangeable size moving window partial least squares and purespectral chemometrical modeling with ant colony optimiza-tion for wheat flour characterizationrdquo Chemometrics andIntelligent Laboratory Systems vol 142 pp 78ndash86 2015

[35] T Fearn C Riccioli A Garrido-Varo and J E Guerrero-Ginel ldquoOn the geometry of SNV and MSCrdquo Chemometricsand Intelligent Laboratory Systems vol 96 no 1 pp 22ndash262009

[36] R Galvao M Araujo G Jose M Pontes E Silva andT Saldanha ldquoA method for calibration and validation subsetpartitioningrdquo Talanta vol 67 no 4 pp 736ndash740 2005

10 Computational Intelligence and Neuroscience

Page 10: ANovelGeneticAlgorithm-BasedOptimizationFrameworkfor ...variable selection for NIR calibration of fishmeal for the quantitative determination of protein content. For NIR calibrations,

infrared spectrometers in the agro-food industryrdquo AppliedSpectroscopy vol 67 no 11 pp 1215ndash1233 2013

[11] A Sakudo ldquoNear-infrared spectroscopy for medical appli-cations Current status and future perspectivesrdquo ClinicaChimica Acta vol 455 pp 181ndash188 2016

[12] S Hong H Chen J Gu K Cai Z Liu and J WenldquoComparative assessment on smart pre-processing methodsfor extracting information in ft-nir measured datardquo Mea-surement vol 157 Article ID 107663 2020

[13] T N Kok and J W Park ldquoExtending the shelf life of set fishballrdquo Journal of Food Quality vol 30 no 1 pp 1ndash27 2007

[14] G Foca C Ferrari A Ulrici M C Ielo G Minelli andD P Lo Fiego ldquoIodine value and fatty acids determination onpig fat samples by FT-NIR spectroscopy benefits of variableselection in the perspective of industrial applicationsrdquo FoodAnalytical Methods vol 9 no 10 pp 2791ndash2806 2016

[15] F Westad and H Martens ldquoVariable selection in near in-frared spectroscopy based on significance testing in partialleast squares regressionrdquo Journal of Near Infrared Spectros-copy vol 8 no 2 pp 117ndash124 2000

[16] T Mehmood K H Liland L Snipen and S Saeligboslash ldquoA reviewof variable selection methods in partial least squares regres-sionrdquo Chemometrics and Intelligent Laboratory Systemsvol 118 pp 62ndash69 2012

[17] S Kuriakose and H Joe ldquoQualitative and quantitative analysisin sandalwood oils using near infrared spectroscopy com-bined with chemometric techniquesrdquo Food Chemistryvol 135 no 1 pp 213ndash218 2012

[18] M Versaci S Calcagno M Cacciola F C MorabitoI Palamara and D Pellicano ldquoStandard soft computingtechniques for characterization of defects in nondestructiveevaluationrdquo in Ultrasonic Nondestructive Evaluation SystemsP Burrascano S Callegari A Montisci et al Eds pp 175ndash199 Springer Cham Heidelberg Dordrecht London 2015

[19] M Versaci ldquoFuzzy approach and eddy currents NDTNDEdevices in industrial applicationsrdquo Electronics Letters vol 52no 11 pp 943ndash945 2016

[20] Y-H Yun H-D Li L E R Wood et al ldquoAn efficient methodof wavelength interval selection based on random frog formultivariate spectral calibrationrdquo Spectrochimica Acta Part AMolecular and Biomolecular Spectroscopy vol 111 pp 31ndash362013

[21] H Chen Z Liu K Cai L Xu and A Chen ldquoGrid searchparametric optimization for FT-NIR quantitative analysis ofsolid soluble content in strawberry samplesrdquo VibrationalSpectroscopy vol 94 pp 7ndash15 2018

[22] S Kasemsumran N Suttiwijitpukdee and V KeeratinijakalldquoRapid classification of turmeric based on DNA fingerprint bynear-infrared spectroscopy combined with moving windowpartial least squares-discrimination analysisrdquo Analytical Sci-ences vol 33 no 1 pp 111ndash115 2017

[23] K Zheng T Feng W Zhang et al ldquoVariable selection bydouble competitive adaptive reweighted sampling for cali-bration transfer of near infrared spectrardquo Chemometrics andIntelligent Laboratory Systems vol 191 pp 109ndash117 2019

[24] A Joo A Ekart and J P Neirotti ldquoGenetic algorithms fordiscovery of matrix multiplication methodsrdquo IEEE Transac-tions on Evolutionary Computation vol 16 no 5 pp 749ndash7512012

[25] E Haq I Ahmad A Hussain and I M Almanjahie ldquoA novelselection approach for genetic algorithms for global optimi-zation of multimodal continuous functionsrdquo ComputationalIntelligence and Neuroscience vol 2019 Article ID 86402182019

[26] B K Lavine and C G White ldquoBoosting the performance ofgenetic algorithms for variable selection in partial leastsquares spectral calibrationsrdquo Applied Spectroscopy vol 71no 9 pp 2092ndash2101 2017

[27] L Mo H Chen W Chen Q Feng and L Xu ldquoStudy onevolution methods for the optimization of machine learningmodels based on FT-NIR spectroscopyrdquo Infrared Physics ampTechnology vol 108 Article ID 103366 2020

[28] Y-H Yun H-D Li B-C Deng and D-S Cao ldquoAn overviewof variable selection methods in multivariate analysis of near-infrared spectrardquo TrAC Trends in Analytical Chemistryvol 113 pp 102ndash115 2019

[29] S Forrest ldquoGenetic algorithms principles of natural selectionapplied to computationrdquo Science vol 261 no 5123pp 872ndash878 1993

[30] H Jiang W Xu and Q Chen ldquoComparison of algorithms forwavelength variables selection from near-infrared (NIR)spectra for quantitative monitoring of yeast (saccharomycescerevisiae) cultivationsrdquo Spectrochimica Acta Part A Mo-lecular and Biomolecular Spectroscopy vol 214 pp 366ndash3712019

[31] J Koljonen T E M Nordling and J T Alander ldquoA review ofgenetic algorithms in near infrared spectroscopy and che-mometrics past and futurerdquo Journal of Near Infrared Spec-troscopy vol 16 no 3 pp 189ndash197 2008

[32] J-H Jiang R J Berry H W Siesler and Y OzakildquoWavelength interval selection in multicomponent spectralanalysis by moving window partial least-squares regressionwith applications to mid-infrared and near-infrared spec-troscopic datardquo Analytical Chemistry vol 74 no 14pp 3555ndash3565 2002

[33] H-Z Chen Q-Q Song G-Q Tang and L-L Xu ldquoAnoptimization strategy for waveband selection in FT-NIRquantitative analysis of corn proteinrdquo Journal of Cereal Sci-ence vol 60 no 3 pp 595ndash601 2014

[34] C Ranzan L F Trierweiler B Hitzmann andJ O Trierweiler ldquoNIR pre-selection data using modifiedchangeable size moving window partial least squares and purespectral chemometrical modeling with ant colony optimiza-tion for wheat flour characterizationrdquo Chemometrics andIntelligent Laboratory Systems vol 142 pp 78ndash86 2015

[35] T Fearn C Riccioli A Garrido-Varo and J E Guerrero-Ginel ldquoOn the geometry of SNV and MSCrdquo Chemometricsand Intelligent Laboratory Systems vol 96 no 1 pp 22ndash262009

[36] R Galvao M Araujo G Jose M Pontes E Silva andT Saldanha ldquoA method for calibration and validation subsetpartitioningrdquo Talanta vol 67 no 4 pp 736ndash740 2005

10 Computational Intelligence and Neuroscience