


Research Article

Small Target Detection Algorithm Based on Transfer Learning and Deep Separable Network

Peng Wang,1,2,3 Haiyan Wang,1,2 Xiaoyan Li,3 Lingling Zhang,3 Ruohai Di,3 and Zhigang Lv3

1 School of Marine Science and Technology, Northwestern Polytechnical University, Xi'an, Shaanxi 710072, China
2 Key Laboratory of Ocean Acoustics and Sensing (Northwestern Polytechnical University), Ministry of Industry and Information Technology, Xi'an, Shaanxi 710072, China
3 School of Electronic and Information Engineering, Xi'an Technological University, Xi'an 710021, China

Correspondence should be addressed to Xiaoyan Li; [email protected]

Received 30 July 2021; Revised 31 August 2021; Accepted 7 September 2021; Published 4 October 2021

Academic Editor: Haibin Lv

Copyright © 2021 Peng Wang et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

With the development of deep learning, target detection from vision sensors has achieved high accuracy and efficiency. However, small target detection remains a challenge due to inadequate use of the semantic information and detailed texture information of underlying features. To solve these problems, this paper proposes a small target detection algorithm based on the Mask R-CNN model which integrates transfer learning and a deep separable network. Firstly, a feature pyramid fusion structure is introduced to enhance the learning effect of low-level and high-level features, strengthening the information channel of low-level features and thereby enriching the feature information of small targets. Secondly, the ELU function is used as the activation function to solve the problem that the gradient of the original activation function vanishes on the negative half axis. Finally, a new loss function, F-Softmax, combined with Focal Loss is adopted to solve the imbalance between positive and negative sample proportions. Experiments on a self-made data set show that the proposed algorithm raises the detection accuracy of small targets to 66.5%.

1. Introduction

In recent years, aerial image target detection based on UAVs has become one of the forefront research topics [1–3]. Because of the distance to the target, UAV aerial images mostly contain small- and medium-sized targets. From the perspective of absolute size, a small target is defined as a target of about 32 × 32 pixels. In terms of relative size, a target that occupies no more than 0.1 times the size of the whole picture can be considered a small target [4–6]. Traditional target detection algorithms tend to produce false detections and missed detections of small targets in such images, and their detection rate for small targets is low, so small target detection is the focus and difficulty of this field [7–9].

Small target detection is a very important field in image processing, and it is only in recent years that more and more attention has been paid to its research [10, 11]. Different Gaussian methods are used in deep learning algorithms to detect small targets in maritime infrared images. However, because the imaging area of small targets in infrared images is small and the target features are insignificant, traditional Gaussian methods suffer from problems such as a high false positive rate [12–15]. A general band selection algorithm based on high-order cumulants has been analyzed and applied to detect small targets [16]. Although the detection effect was optimized to some extent, it depended strongly on the data set, and its robustness was poor. Singular value decomposition has been applied to convolutional feature compression to reduce the calculation and storage requirements of the model, and multiscale training has been adopted to adapt to the scale changes of aviation targets, but there was still a large rate of missed detection, and the detection rate was seriously affected

Hindawi Journal of Sensors, Volume 2021, Article ID 9006288, 10 pages. https://doi.org/10.1155/2021/9006288


[17]. The main reasons for missed detection and false detection are that the target object is not only interfered with by luminance, occlusion, and other factors but also affected by interference factors such as the small scale of the target, large scale changes, and a complex and changeable background containing many objects that are very similar to the target.

In this paper, we propose a small target detection algorithm based on transfer learning and a deep separable network to address the high false positive rate, poor robustness, and low detection rate under the battlefield environment. The key contributions of our work can be summarized as follows:

(1) In order to strengthen the relationship between the shallow layers and the deep layers, three fusion layers following the idea of the feature pyramid are proposed. The new feature layer obtained by fusion is taken as the input of the next layer for feature extraction. In addition, a deep separable convolutional network is used for feature extraction to reduce the computational load

(2) The exponential linear unit (ELU) is adopted instead of the traditional ReLU activation function. It achieves the effect of a BN layer while avoiding a lot of computation. At the same time, it is more robust to input noise and has low complexity

(3) An improved Softmax loss function, namely, F-Softmax, is proposed. By introducing an angle constraint, the distance between classes can be increased, and the distance within classes can be reduced through strict decision conditions, making the classification more accurate. The introduction of a key factor reduces the weight of samples that are easy to classify and makes the model focus on the samples that are difficult to classify during training, which solves the imbalance between positive and negative samples

2. Design of Small Target Detection Architecture

2.1. Network Structure. The Mask R-CNN [18] model is a pyramid-like structure, but the shallow feature maps with a large field of view lack the detailed information of the deep feature maps, and the deep feature maps with a small field of view cannot cover the target information, so Mask R-CNN is not good at small target detection. Meanwhile, the huge computation of the Mask R-CNN model makes positioning and classification slow. In order to enhance the relationship between shallow features and deep features, this paper proposes a Mask R-CNN model fused with feature pyramids, as shown in Figure 1.

In this paper, the three layers Conv7_1, Conv8_1, and Conv9_1 are selected for the feature pyramid fusion structure. Conv7_1 is fused with Conv6_2, Conv8_1 with Conv7_2, and Conv9_1 with Conv8_2. Next, the fusion of Conv7_1 and Conv6_2 is analyzed as an example, as shown in Figure 2. The other two fusions follow the same procedure.

The features in Conv7_1 are fused with the features in Conv6_2. The low-level feature Conv6_2 passes through a 1 × 1 convolutional layer to change the number of channels and reduce the dimension of the feature map. Similarly, Conv7_1 passes through a 1 × 1 convolutional layer to become 19 × 19 × 256. Then, the image size of Conv7_1 is doubled by bilinear interpolation to become 38 × 38 × 256. Finally, the low-level and high-level features are fused to obtain a new feature layer, which is taken as the input of the next layer for feature extraction.
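The fusion step above can be sketched in NumPy (the paper's experiments use PyTorch, but the shape bookkeeping is the same). The function names are ours, and the final merge is written as element-wise addition; whether the model adds or concatenates the two 256-channel maps is an assumption on our part:

```python
import numpy as np

def conv1x1(x, w):
    # A 1 x 1 convolution is per-pixel channel mixing:
    # x: (H, W, C_in), w: (C_in, C_out) -> (H, W, C_out)
    return np.einsum("hwc,cd->hwd", x, w)

def upsample2x_bilinear(x):
    # Doubles H and W with align-corners-style bilinear interpolation
    H, W, _ = x.shape
    ys = np.linspace(0, H - 1, 2 * H)
    xs = np.linspace(0, W - 1, 2 * W)
    y0 = np.floor(ys).astype(int); y1 = np.minimum(y0 + 1, H - 1)
    x0 = np.floor(xs).astype(int); x1 = np.minimum(x0 + 1, W - 1)
    wy = (ys - y0)[:, None, None]
    wx = (xs - x0)[None, :, None]
    top = x[y0][:, x0] * (1 - wx) + x[y0][:, x1] * wx
    bot = x[y1][:, x0] * (1 - wx) + x[y1][:, x1] * wx
    return top * (1 - wy) + bot * wy

rng = np.random.default_rng(0)
conv6_2 = rng.random((38, 38, 512))   # low-level feature map
conv7_1 = rng.random((19, 19, 512))   # high-level feature map
w_low = rng.random((512, 256))        # 1 x 1 conv weights (reduce to 256 channels)
w_high = rng.random((512, 256))
fused = conv1x1(conv6_2, w_low) + upsample2x_bilinear(conv1x1(conv7_1, w_high))
```

After the merge, `fused` has shape (38, 38, 256) and serves as the input of the next layer.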

2.2. Transfer Learning Structure. This paper combines transfer learning with a CNN to propose a remote sensing image target recognition algorithm based on transfer learning. The source domain is the PASCAL VOC2012 data set with ten types of targets, and the source task is image classification in the source domain. The target domain is the PASCAL VOC2012 data set with five types of small targets.

The target task is to classify small targets in images of the land battlefield. The overall structure of the transfer learning method is shown in Figure 3.

The transfer learning framework used in the algorithm is based on the Mask R-CNN network model, which includes multiple convolutional layers, activation layers, pooling layers, and fully connected layers. The algorithm can be divided into two stages: a preliminary training stage and a parameter fine-tuning stage. First, classification training is performed with the five small types of targets in the VOC2012 data set to obtain a classification model. Then, starting from this base model, classification training is carried out on the other kinds of target images.
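The two-stage procedure can be sketched as follows. This is a minimal illustration of the pretrain-then-fine-tune idea, not the authors' code; the function name, the layer names, and the choice of freezing the convolutional layers are our assumptions:

```python
def transfer_parameters(pretrained, target, frozen_prefixes=("conv",)):
    # Fine-tuning setup: copy every weight learned in the pretraining
    # stage into the new model where the layer names match, then report
    # which layers remain trainable (here: everything that is not a
    # convolutional layer).
    transferred, trainable = {}, []
    for name, weights in target.items():
        transferred[name] = pretrained.get(name, weights)
        if not name.startswith(frozen_prefixes):
            trainable.append(name)
    return transferred, trainable

# Hypothetical models: backbone pretrained on 5 classes,
# a freshly initialized head for the larger label set.
pretrained = {"conv1": "w_conv1*", "conv2": "w_conv2*"}
target = {"conv1": "init", "conv2": "init", "fc": "fc_init"}
params, trainable = transfer_parameters(pretrained, target)
```

In the fine-tuning stage, only the layers listed in `trainable` would be updated by the optimizer.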

The network model used in this paper includes 13 convolutional layers, 13 activation function layers, and 4 pooling layers. In the convolutional layers, the kernel size is 3, zero padding is 1, and the stride is 1. In the pooling layers, the window size is 2 and the stride is 2. The activation function used in the model is the ReLU activation function. The fully connected layer mitigates overfitting by using dropout, which randomly sets neurons in the model to 0 with a probability of 50% to reduce the dependence on fixed connections between neurons. The classification layer adopts the NMS function. There are 10 categories of labels, and each is predicted as the probability of the corresponding category.
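The 50% dropout described above can be sketched as follows (inverted dropout, the common formulation; the function name and the rescaling convention are our assumptions, not taken from the paper):

```python
import numpy as np

def dropout(x, p=0.5, training=True, rng=None):
    # Randomly zero each unit with probability p and rescale the
    # survivors by 1 / (1 - p) so the expected activation is unchanged.
    if not training:
        return x
    rng = np.random.default_rng(0) if rng is None else rng
    mask = rng.random(x.shape) >= p
    return x * mask / (1.0 - p)

activations = np.ones(8)
dropped = dropout(activations)   # each entry is either 0.0 or 2.0
```

At test time (`training=False`) the layer is the identity, so no rescaling of the learned weights is needed.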

2.3. Deeply Separable Network. The idea of the Xception model is used to balance accuracy and speed while extracting the attention features of the image. In the template feature extraction network and the feature extraction network for the image to be detected, depthwise separable convolutions are used instead of traditional convolution kernels to build the DS-AlexNet (Depthwise Separable-AlexNet) network; the other modules of the network do not need to be changed. This reduces the cost in network parameters without affecting the accuracy of the model.

2 Journal of Sensors

Page 3: Small Target Detection Algorithm Based on Transfer

The standard convolution structure is shown in Figure 4. Separable convolution factorizes a convolution into a per-channel spatial convolution kernel and a 1 × 1 cross-channel convolution kernel, so that the position information and the channel information in the image are learned separately. The separable convolution structure is shown in Figure 5.
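A direct (unoptimized) NumPy sketch of the two stages makes the factorization concrete; the function is illustrative, with stride 1 and no padding assumed:

```python
import numpy as np

def depthwise_separable_conv(x, dw_k, pw_k):
    # x: (H, W, M) input; dw_k: (D, D, M), one D x D filter per input
    # channel; pw_k: (M, N), the 1 x 1 pointwise channel mixing.
    H, W, M = x.shape
    D = dw_k.shape[0]
    out_h, out_w = H - D + 1, W - D + 1
    dw = np.zeros((out_h, out_w, M))
    for i in range(out_h):          # depthwise stage: spatial filtering
        for j in range(out_w):      # applied to each channel separately
            dw[i, j, :] = np.sum(x[i:i + D, j:j + D, :] * dw_k, axis=(0, 1))
    # pointwise stage: combine the M channels into N output channels
    return np.einsum("hwm,mn->hwn", dw, pw_k)

rng = np.random.default_rng(0)
out = depthwise_separable_conv(rng.random((13, 13, 384)),
                               rng.random((3, 3, 384)),
                               rng.random((384, 256)))
```

`out` has shape (11, 11, 256) here because no padding is applied.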

The main purpose of using separable convolution is to separate the spatial cross-correlation information from the channel cross-correlation information, so as to improve the recognition rate while speeding up calculation. Assume the input feature size is D_k × D_k × M, where M is the number of input channels. The standard convolution kernel is D × D × N, where D is the length and width of the convolution kernel and N is the number of output channels. The computation of a standard convolution is shown in Equation (1).

D_k × D_k × M × N × D × D.   (1)

In the case of separable convolution, D × D filters are applied to the M input channels, costing D × D × M × D_k × D_k, and N 1 × 1 × M convolution filters are applied to combine the M input channels into N output channels, costing M × N × D_k × D_k. Summing the two stages gives the computation shown in Equation (2).

D × D × M × D_k × D_k + M × N × D_k × D_k.   (2)

Compared with the standard convolution structure, such a separable convolution structure requires less computation, as shown in Equation (3).

(D × D × M × D_k × D_k + M × N × D_k × D_k) / (D × D × M × N × D_k × D_k) = 1/N + 1/D².   (3)
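Equations (1)–(3) can be checked numerically; the following sketch (function name ours) counts the multiply–accumulate operations for both structures:

```python
def conv_costs(D, Dk, M, N):
    # Eq. (1): standard convolution cost; Eq. (2): depthwise + pointwise
    standard = Dk * Dk * M * N * D * D
    separable = D * D * M * Dk * Dk + M * N * Dk * Dk
    return standard, separable

# Conv4 -> Conv5 example from the text: 13 x 13 x 384 input,
# 3 x 3 kernels, 256 output channels
std, sep = conv_costs(D=3, Dk=13, M=384, N=256)
ratio = sep / std   # equals 1/N + 1/D**2, about 0.115 here
```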

Figure 1: Overall network structure (input, DSCN backbone with layers Conv6 (FC6) and Conv7 (FC7) through Conv11, multiscale feature maps Conv7_2 through Conv10_2, ROI Align, and an FCN head outputting detection category and location).

Figure 2: Fusion structure (Conv6_2 (38 × 38 × 512) passes through a 1 × 1 convolution to become 38 × 38 × 256; Conv7_1 (19 × 19 × 512) passes through a 1 × 1 convolution and bilinear interpolation to become 38 × 38 × 256; the two are combined by concatenation into the fusion layer (38 × 38 × 256)).


Taking Conv4 to Conv5 as an example, the input feature map is 13 × 13 × 384, and the convolution kernel is 3 × 3 with 256 output channels. Substituting into Equation (3) gives Equation (4).

(3 × 3 × 384 × 13 × 13 + 384 × 256 × 13 × 13) / (3 × 3 × 384 × 256 × 13 × 13) = 1/256 + 1/9 ≈ 0.115.   (4)

As shown in Equation (4), the amount of computation is reduced to roughly 11.5% (about 1/9) of the original amount.

The above analysis shows that using the deep separable convolution structure instead of the traditional convolution structure can speed up calculation and reduce the computing resources used while ensuring the same feature extraction effect.

2.4. Activation Function and Loss Function

2.4.1. Activation Function. The exponential linear unit (ELU) is used to replace the traditional ReLU activation function. The ELU function and its derivative are shown in Equation (5).

f(x) = x, if x ≥ 0; α(e^x − 1), if x < 0,
f′(x) = 1, if x ≥ 0; f(x) + α, if x < 0.   (5)

The ELU function is an improvement over ReLU. When the argument is greater than or equal to 0, the computational complexity is low and learning is fast, since no exponential operation is needed, while the nonlinearity of the model is preserved. When the argument is less than 0, a smooth function replaces the hard zero of ReLU, so that the average output of the activation function is close to zero and convergence is faster. This achieves the effect of a BN layer while avoiding much of its computation. At the same time, the function is more robust to input noise and has lower complexity.
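Equation (5) translates directly into NumPy; this sketch is ours, with α = 1 as the default:

```python
import numpy as np

def elu(x, alpha=1.0):
    # Eq. (5): identity for x >= 0, smooth exponential for x < 0,
    # so the mean activation is pushed toward zero
    return np.where(x >= 0, x, alpha * (np.exp(x) - 1.0))

def elu_grad(x, alpha=1.0):
    # Derivative from Eq. (5): 1 for x >= 0, f(x) + alpha for x < 0
    return np.where(x >= 0, 1.0, elu(x, alpha) + alpha)
```

For large negative inputs the output saturates at −α, and the gradient approaches 0 smoothly instead of being cut off as in ReLU.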

2.4.2. Loss Function. The localization loss used in the Mask R-CNN model is Smooth L1, as shown in Equation (6).

Figure 3: The overall structure of the transfer learning (a pretraining stage on the source-domain data set and a parameter fine-tuning stage on the target-domain data set, each passing through convolutional layers 1–13 with activation layers, pooling layers, and fully connected layers).

Figure 4: Standard convolution (M input channels of size D_k × D_k mapped to N output channels).

Figure 5: Separable convolution (a depthwise convolution over the M input channels followed by a 1 × 1 pointwise convolution producing N output channels).


L_loc(x, l, g) = Σ_{i∈Positive}^{N} Σ_{m∈{cx,cy,w,h}} x_{ij}^k smooth_L1(l_i^m − g_j^m).   (6)

This paper proposes an improved Softmax loss function, named F-Softmax, which introduces an angle constraint and a key factor.

(1) Angle constraints

The outputs of the model are w_1 x, w_2 x, …, w_n x; the classification probability estimate obtained by the loss function is shown in Equation (7).

h_w(x) = (1 / (e^(w_1 x) + e^(w_2 x) + ⋯ + e^(w_n x))) [e^(w_1 x), e^(w_2 x), …, e^(w_n x)]^T.   (7)

In Equation (7), x represents the sample, n the number of categories, and w_n the parameter weight vector of category n. If the input sample x belongs to category n, then e^(w_n x) is the largest; that is, w_n x is also required to be the largest. Expanding this condition gives Equation (8).

‖w_n‖ ‖x‖ cos θ_n > max{‖w_1‖ ‖x‖ cos θ_1, ‖w_2‖ ‖x‖ cos θ_2, …}.   (8)

θ is the angle between the sample feature vector and the parameter weight vector. Assuming an integer n, Equation (9) can be obtained from the properties of the cosine function.

‖w_n‖ ‖x‖ cos θ_n > max{‖w_1‖ ‖x‖ cos θ_1, ‖w_2‖ ‖x‖ cos θ_2, …}, where θ_1, θ_2 ∈ [0, π/n].   (9)

Including this limiting condition makes the discrimination stricter. Under the original loss function, a target might plausibly belong to class A or class B; now, when judging the category of the target, not only must the probability vector align with the parameter weight vector, but the angle constraint must also be satisfied. Strict criteria make the distance between classes larger and the distance within classes smaller, so that the classification is more accurate.

(2) Key factors

The angle constraint deals with the distance between classes but does not take into account the balance of positive and negative samples. On this basis, the loss function is constructed using the idea of Focal Loss, whose formula is given in Equation (10).

Loss_FL = −α_t (1 − p_t)^γ log(p_t),   (10)

where p_t is given by Equation (11).

p_t = p, if y = 1; 1 − p, otherwise.   (11)

α_t is called the weighting factor, and γ is called the key factor; p_t represents the probability of belonging to the true label, within the range [0, 1], and p is the probability of the target predicted by the model, also within the range [0, 1].

Finally, substituting into Focal Loss yields the proposed loss function F-Softmax, as shown in Equation (12).

L_FS = (1/N) Σ_i −α_t (1 − p_t)^γ log(p).   (12)

The optimal initial values were selected through experiments: α = 0.25 and γ = 2. Here, n is the number of categories, and p is the probability precomputed by the Softmax function, calculated as shown in Equation (13).

p = e^(‖x_i‖ ‖w_i‖ cos θ_n) / (e^(‖x_i‖ ‖w_i‖ cos θ_n) + Σ_{i≠y_i} e^(‖x_i‖ ‖w_i‖ cos θ_n)), θ ∈ [0, π/n].   (13)
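The Focal Loss component of Equations (10)–(11) can be sketched in NumPy. The function name is ours, and applying the weight 1 − α to negative samples follows the common Focal Loss convention rather than anything stated in the paper:

```python
import numpy as np

def focal_loss(p, y, alpha=0.25, gamma=2.0):
    # Eqs. (10)-(11): p is the predicted probability of the positive
    # class and y is the label in {0, 1}; easy samples (p_t close to 1)
    # are down-weighted by the (1 - p_t)**gamma key factor
    p_t = np.where(y == 1, p, 1.0 - p)
    a_t = np.where(y == 1, alpha, 1.0 - alpha)
    return -a_t * (1.0 - p_t) ** gamma * np.log(p_t)

easy = focal_loss(0.9, 1)   # well-classified positive: tiny loss
hard = focal_loss(0.1, 1)   # misclassified positive: much larger loss
```

The key factor makes the loss of the easy sample orders of magnitude smaller than that of the hard sample, which is what steers training toward the difficult samples.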

3. Experimental Results

3.1. Introduction to the Experimental Environment. Under the Ubuntu 16.04 operating system, the algorithm in this paper is implemented with the deep learning framework PyTorch to realize the ground-field target detection algorithm based on the multilevel feature pyramid. The experimental platform uses an Intel(R) Core(TM) i5-8600 3.10 GHz CPU, 16 GB of memory, and an NVIDIA GTX 1080 Ti GPU; the network is trained and tested in this environment. In order to verify the accuracy and real-time performance of the algorithm, YOLO v3, Faster R-CNN, and Mask R-CNN, which currently perform well, were selected for comparison, all tested in the same environment. The training set is made by randomly extracting 70% of the data set, and the test set consists of the remaining 30%.
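The 70/30 random split can be sketched as follows (the function name and the fixed seed are ours, added for reproducibility):

```python
import random

def split_dataset(items, train_frac=0.7, seed=0):
    # Shuffle once, then cut: 70% training set, 30% test set
    items = list(items)
    random.Random(seed).shuffle(items)
    cut = int(len(items) * train_frac)
    return items[:cut], items[cut:]

train, test = split_dataset(range(30000))   # 21,000 / 9,000 images
```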

3.2. Introduction to the Experimental Data Set. The targets include tank, person, gun, cannon, helicopter, and car. The data set contains 9,000 images of the abovementioned targets, which were expanded to 27,000 by adding noise and scaling. We also collected 3,000 relevant video frames containing the targets from the Internet, so the data set consists of 30,000 images in total. Each image is manually annotated in the PASCAL VOC format. Some images of the data set are shown in Figure 6.

3.3. Ablation Experiment. The experimental data set is the expanded self-made data set. The data set was taken as input, and the parameters for network training were set as follows:


learning rate 0.1, decreased by one order of magnitude after each epoch; regularization term f = 0.1; and BatchSize 100. After each epoch, the data set was reshuffled randomly.

3.3.1. Activation Function. The recognition accuracy of the network on the test set versus iteration step (acc-step) and the loss function value versus iteration step (loss-step) are shown in Figures 7(a) and 7(b). It can be seen that as the iterations progress, the overall recognition accuracy finally reaches more than 90%, and both curves basically level off after about 3300 iterations.

The activation functions in the residual block are set to ReLU and LReLU, respectively; the activation functions are shown in Equations (5), (14), and (15). Only the activation function in the model is changed; the other parts remain unchanged and are trained on the same training set.

In Equation (5), α is the adjustment parameter that controls the activation of the ELU function on the negative half axis.

ReLU(x) = x, if x > 0; 0, if x ≤ 0.   (14)

LReLU(x_i) = x_i, if x_i ≥ 0; α_i x_i, if x_i < 0.   (15)

α_i is fixed, and the subscript i indicates that different channels correspond to different α_i.

It can be seen from Figure 8 that the convergence of LReLU is close to that of ELU during iterations 20000–25000. However, compared with ReLU and LReLU, ELU reaches the minimum final loss value as the number of iterations increases, and the training effect of the model is better.

3.3.2. Comparison of Loss Functions. In order to verify the superiority of the F-Softmax function, ROC curves are used to evaluate the influence of various loss functions on sample classification. The Softmax loss function and the cross-entropy loss function are compared with F-Softmax. To ensure objectivity, the other parameters of the model remain unchanged.

From Figure 9, the classification effect of Softmax is the worst. The cross-entropy loss function effectively solves the probability problem of multiple classifications and improves the classifier. The F-Softmax function not only effectively handles the guidance of difficult versus simple samples but also deals with sample imbalance, making the loss function more reasonable. The classification model using this loss function has the best classification performance.

3.3.3. Comparative Experiment of Transfer Learning. In order to verify the applicability of transfer learning, this paper first trains on five types of the self-made data set to obtain a classification model. Then, this model is used to train on the data set of all 10 kinds of targets, and the recognition rate and loss function curves of the two training conditions are recorded. The five types of data used to train the original model include person, armoured vehicle, gun, tank, and drone. The other


Figure 6: Part of the data set.


five data sets used to verify transfer learning include knife, helicopter, car, bulldozer, and cannon, and the 10-class data set includes all of the above targets. Figure 10 shows the comparison of classification accuracy and loss function curves of the model under zero-based learning and transfer learning modes.

The parameters of the network model were set as follows: learning rate 0.1, regularization term f = 0.1, and BatchSize 100. After each epoch, the data sets were reshuffled randomly, and the amplified self-made data set was used as input to train the network. As can be seen from Figures 10(a) and 10(b), when the five types of data sets are classified, the initial classification accuracy of zero-based learning is 11.6%, while that of transfer learning in the same period is as high as 62.3%. After about 1800 steps, the classification accuracy in the transfer learning mode reaches a peak of more than 90%, and after about 4500 steps, the accuracy curve shows no obvious change. The classification accuracy and loss function curves of the model combining transfer learning on all 10 types of targets are shown in Figures 10(c) and 10(d).

Figure 7: (a) Acc-step curve. (b) Loss-step curve.

Figure 8: Loss function changes (training loss curves of ReLU, ELU, and LReLU over 40,000 iterations).

Figure 9: ROC curves of the cross-entropy, Softmax, and F-Softmax loss functions.


The initial accuracy values are 21.3% and 44.1%, respectively. After about 2800 steps, the classification accuracy in the transfer learning mode exceeds 92%, slightly higher than the 90% reached in the zero-based learning mode after 4800 steps. The iteration curves of the transfer learning mode on the above two training sets are smoother, and model training is faster.

3.3.4. Model Comparison Experiment. In this section, the existing YOLO v3, Faster R-CNN [19], and Mask R-CNN methods are compared with our proposed algorithm, called Ours+. Ours+ is the model obtained by transferring the 5-class training. All methods are trained on the same data set and tested on the same test set. Eight metrics are used to compare the performance of the four algorithms: mAP (mean Average Precision), AP1 (Average Precision of tank), AP2 (person), AP3 (gun), AP4 (cannon), AP5 (helicopter), AP6 (car), and FPS (Frames Per Second).

As shown in Table 1, the YOLO v3 model mainly focuses on lightweight detection, so it is the fastest of the four models in terms of FPS, but its detection accuracy falls far behind the other methods. Faster R-CNN and Mask R-CNN were designed as two-stage structures with a candidate

Figure 10: Comparison of experimental results of transfer learning and zero-based learning. (a) Accuracy-iteration step curve for 5 types of targets. (b) Loss function-iteration step curve for 5 types of targets. (c) Accuracy-iteration step curve for 10 types of targets. (d) Loss function-iteration step curve for 10 types of targets.

Table 1: Test data.

       Ours+   YOLO v3   Faster R-CNN   Mask R-CNN
mAP     66.5      59.4           65.7         66.0
AP1     68.3      61.2           66.7         68.0
AP2     65.5      57.3           65.3         65.1
AP3     65.3      56.5           65.0         64.5
AP4     66.2      59.0           64.8         65.0
AP5     67.0      61.5           66.3         66.9
AP6     66.7      60.9           66.1         66.5
FPS     14        35             15           12


region generation network, so these networks are far more complex than the other methods, and their detection accuracy is superior to that of YOLO v3. The proposed method improves several shortcomings of the Mask R-CNN model, so both its accuracy and its FPS are higher than those of Mask R-CNN. Some test results are shown in Figure 11.

4. Conclusion

To solve the problem of low accuracy in small target detection, this paper proposes a small target detection algorithm based on transfer learning and a deep separable network. Firstly, feature extraction is carried out by a deep separable convolutional network, which reduces the amount of computation. Then, the feature pyramid fusion structure is used to fuse high-level and low-level feature information, optimize the shallow feature information of the network, and effectively compensate for the information loss caused by repeated pooling, so as to extract more shallow detail texture information and improve the detection performance for small targets. Finally, the activation function and loss function are optimized to solve the imbalance of positive and negative samples and thereby optimize network performance. The whole network model is trained with the transfer learning method, and experiments are carried out on the PASCAL VOC2012 data set. The experimental results show that the proposed model is significantly better than the other algorithm models in small target detection accuracy.

Data Availability

The processed data required to reproduce these findings cannot be shared at this time, as the data also form part of an ongoing study.

Conflicts of Interest

There are no potential competing interests in our paper.

Authors’ Contributions

All authors have seen the manuscript and approved its submission to the journal.

Acknowledgments

This work was supported by the National Natural Science Foundation of China (62171360), the Science and Technology Program of Xi'an Technology Department (2020KJRC0037), and the Principal Research Foundation of Xi'an Technological University (XGPY200217).

References

[1] X. Bai and F. Zhou, "Analysis of new top-hat transformation and the application for infrared dim small target detection," Pattern Recognition, vol. 43, no. 6, pp. 2145–2156, 2010.

[2] P. Wang, M. Sun, H. Wang, X. Li, and Y. Yang, "Convolution operators for visual tracking based on spatial-temporal regularization," Neural Computing and Applications, vol. 32, no. 10, pp. 5339–5351, 2020.

Figure 11: Part of the test results.

[3] C. Gao, "Infrared small-dim target detection based on Markov random field guided noise modeling," Pattern Recognition, vol. 76, pp. 463–475, 2018.

[4] P. Wang, J. Wu, H. Wang, and X. Li, "Low-light-level image enhancement algorithm based on integrated networks," Multimedia Systems, vol. 7, 2020.

[5] X. Bai and Y. Bi, "Derivative entropy-based contrast measure for infrared small-target detection," IEEE Transactions on Geoscience and Remote Sensing, vol. 56, no. 4, pp. 2452–2466, 2018.

[6] P. Głomb, K. Domino, M. Romaszewski, and M. Cholewa, Band Selection with Higher Order Multivariate Cumulants for Small Target Detection in Hyperspectral Images, 2018, https://arxiv.org/abs/1808.03513.

[7] S. Chen, F. Luo, C. Hu, and X. Nie, "Small target detection in sea clutter background based on Tsallis entropy of Doppler spectrum," Journal of Radars, vol. 8, no. 3, pp. 344–354, 2019.

[8] L. Hua, S. Yu-Long, and Q. Feng, Detection of Small Target in Aerial Photography Based on Deep Learning, Chinese Journal of Liquid Crystals and Displays, 2018.

[9] W. Sun, D. Yan, J. Huang, and C. Sun, "Small-scale moving target detection in aerial image by deep inverse reinforcement learning," Soft Computing, vol. 24, no. 8, pp. 5897–5908, 2020.

[10] L. Liangkui, W. Shaoyou, and T. Zhongxing, "Using deep learning to detect small targets in infrared oversampling images," Journal of Systems Engineering & Electronics, vol. 29, no. 5, pp. 947–952, 2018.

[11] Y. Li, W. Wang, and Y. Jiang, "Through wall human detection under small samples based on deep learning algorithm," IEEE Access, vol. 6, pp. 65837–65844, 2018.

[12] A. Mansour, W. M. Hussein, and E. Said, "Small objects detection in satellite images using deep learning," in 2019 Ninth International Conference on Intelligent Computing and Information Systems (ICICIS), IEEE, 2020.

[13] L. I. Dajun, H. E. Weilong, and G. Bingxuan, Building Target Detection Algorithm Based on Mask-RCNN, Science of Surveying and Mapping, 2019.

[14] P. Wang, F. Huitong, X. Li, J. Guo, Z. Lv, and R. Di, "Multi-feature fusion tracking algorithm based on generative compression network," Future Generation Computer Systems, vol. 124, pp. 206–214, 2021.

[15] A. O. Vuola, S. U. Akram, and J. Kannala, "Mask-RCNN and U-net ensembled for nuclei segmentation," in 2019 IEEE 16th International Symposium on Biomedical Imaging (ISBI 2019), IEEE, 2019.

[16] M. Andriluka, L. Pishchulin, P. Gehler, and B. Schiele, "2D human pose estimation: new benchmark and state of the art analysis," in CVPR, vol. 8, 2014.

[17] T. Y. Lin, P. Dollár, and R. Girshick, "Feature pyramid networks for object detection," IEEE Computer Vision and Pattern Recognition, vol. 11, 2017.

[18] K. He, G. Gkioxari, P. Dollár, and R. Girshick, "Mask R-CNN," IEEE International Conference on Computer Vision, vol. 12, 2017.

[19] S. Ren, K. He, R. Girshick, and J. Sun, "Faster R-CNN: towards real-time object detection with region proposal networks," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, no. 6, pp. 1137–1149, 2017.
