AI-based autonomous forest stand
generation
Using forest stands as key predictor
Diana Saveh
Master thesis
Main field of study: Computer Engineering
Credits: 30hp
Year: 2021
Supervisor: Magnus Eriksson
Examiner: Tingting Zhang
Course code: DT005A
Degree programme: MSE
Abstract

In recent years, technology has been moving towards more automated and smarter software, and implementing AI is a step towards that goal. The forest industry needs to become more automated and to decrease its manual labor, which would have a positive impact on both cost and the environment. After a literature study, the conclusion was to use Mask R-CNN to let the AI learn the patterns of the different stands. The different stands were extracted and masked for the Mask R-CNN. First, a comparison was made between running on a local computer and on Google Colab, and the results show that Google Colab delivered results slightly faster than the computer. Using a smaller area with fewer stands gave a better result and decreased the risk of the algorithm crashing. Using 42 areas with about 10 stands each gave better results than using one big area with 3248 stands: 42 areas gave an average IoU of 42%, compared to 28% IoU for 6 areas with about 10 stands each. Increasing the data split to 70/30 gave the best IoU, with a value of 47%.
Keywords: Forest stands, TensorFlow, AI, QGIS, Mask R-CNN
Acknowledgements

I would like to thank CGI for the guidance, specifically Magnus Olovsson for giving me this opportunity, and a huge thanks to Magnus Eriksson at Mid Sweden University for being a great supervisor. I would also like to thank Elijs Dima at Mid Sweden University for great support and information. Lastly, I would like to thank Tingting Zhang for being a great examiner and giving me advice.
Table of Contents
Abstract ............................................................................................................. 2
Acknowledgements ........................................................................................ 3
Terminology ..................................................................................................... 7
1 Introduction ............................................................................................ 8
1.1 Background and problem motivation ................................................. 8
1.2 Overall aim.............................................................................................. 8
1.3 Problem statement ................................................................................. 9
1.4 Research question .................................................................................. 9
1.5 Scientific goals (Concrete and verifiable) ........................................... 9
1.6 Scope ........................................................................................................ 9
1.7 Outline ..................................................................................................... 9
1.7.1 Contributions ........................................................................................ 10
2 Theory .................................................................................................... 11
2.1 Forest stands ......................................................................................... 11
2.2 QGIS ....................................................................................................... 11
2.3 Machine learning ................................................................................. 12
2.4 Supervised learning ............................................................................. 13
2.5 Computer vision .................................................................................. 14
2.6 Convolutional Neural Networks ....................................................... 14
2.6.1 Object detection .................................................................................... 14
2.6.2 Semantic segmentation ........................................................................ 15
2.6.3 R-CNN.................................................................................................... 15
2.6.4 Fast R-CNN ........................................................................................... 16
2.7 Mask R-CNN ........................................................................................ 16
2.7.1 Loss function ......................................................................................... 17
2.7.2 Intersection over union ........................................................................ 18
2.7.3 Non-Maximum Suppression .............................................................. 18
2.8 Related work ......................................................................................... 18
2.8.1 Mapping roads through deep learning and weakly supervised
training ............................................................................................................ 19
2.8.2 Generating retinal flow maps from structural optical coherence
tomography with artificial intelligence ...................................................... 19
2.8.3 Utilizing Mask R-CNN for Detection and Segmentation of Oral
Diseases ........................................................................................................... 19
2.8.4 Segmentation of Forest to Tree Objects ............................................. 19
2.8.5 Automating Forest stands with Image Segmentation ..................... 20
3 Methodology ........................................................................................ 21
3.1 Scientific method description ............................................................. 21
3.1.1 CRISP-DM ............................................................................................. 21
3.2 Business understanding ...................................................................... 22
3.3 Structure the work ............................................................................... 22
3.4 Datamining task ................................................................................... 23
3.5 Evaluate tool ......................................................................................... 23
3.5.1 Comparing R-CNN, Fast R-CNN and Mask R-CNN ..................... 23
3.5.2 Functionality ......................................................................................... 24
3.5.3 Usability and Support .......................................................................... 25
3.5.4 Identify resources ................................................................................. 25
3.6 Data understanding ............................................................................. 25
3.6.1 RGB band ............................................................................................... 26
3.7 Data preparation of TiFF and Shape file .......................................... 27
3.8 Evaluation ............................................................................................. 27
4 Implementation ................................................................................... 28
4.1 Handling the data ................................................................................ 28
4.2 Python .................................................................................................... 29
4.3 How Mask RCNN is used .................................................................. 29
4.3.1 Environment .......................................................................................... 30
4.3.14 Running the Mask R-CNN ........................................................... 30
4.4 Measurement setup ............................................................................. 31
5 Results ................................................................................................... 32
5.1 Manually masking a forest stand ...................................................... 32
5.2 Results of all the areas ......................................................................... 33
5.3 One area ................................................................................................. 35
5.3.1 6 areas ..................................................................................................... 38
5.3.2 42 areas ................................................................................................... 40
6 Discussion ............................................................................................. 43
6.1 Scientific discussion ............................................................................. 43
6.1.1 Manually masking a forest stand ....................................................... 43
6.2 One, six and 42 areas. .......................................................................... 43
6.3 Rpn value loss ....................................................................................... 44
6.4 Project method discussion .................................................................. 45
6.5 Ethical and societal discussion ........................................................... 46
7 Conclusions .......................................................................................... 47
7.1 Project continuation ............................................................................. 47
7.1.1 Computer set against Google Colab .................................................. 47
7.1.2 One big area with many stands versus 6 areas with smaller stands
48
7.1.3 Comparing 6 stands versus 42. ........................................................... 48
7.2 Scientific conclusions ........................................................................... 48
7.2.1 Is it possible to create forest stand with the help of ai? .................. 48
7.2.2 Find a good strategy. ............................................................................ 48
7.2.3 How long will it take for the AI to generate results, and will it be
more efficient than traditional methods? ................................................... 49
7.2.4 Compare generated forest stands with manually selected forest
stands. .............................................................................................................. 49
7.2.5 Compare the time and accuracy of forest stand generated in Google
Colab versus forest stands generated on the computer locally. .............. 49
7.2.6 Measure the accuracy. .......................................................................... 50
7.3 Future Work .......................................................................................... 50
Bibliography .................................................................................................. 51
Appendix A: ................................................................................................... 55
7.4 Different stands .................................................................................... 57
7.4.1 6 stands versus one stand on the computer ...................................... 57
7.4.2 6 stands running on computer versus running it on the cloud ..... 60
7.4.3 6 stands on the cloud versus 42 stands on the cloud ...................... 62
7.5 Areas ...................................................................................................... 65
7.5.1 One area ................................................................................................. 65
7.5.2 6 areas ..................................................................................................... 71
7.5.3 42 areas ................................................................................................... 75
7.5.4 42 areas with 70%30% .......................................................................... 76
Terminology

Acronyms
CNN Convolutional Neural Network
R-CNN Region Convolutional Neural Networks
FCN Fully Convolutional Network
FPN Feature Pyramid Network
CRISP-DM Cross-Industry Standard Process for Data Mining
RGB Red, Green and Blue
TIFF Tagged Image File Format
ROI Region of Interest
IoU Intersection over Union
1 Introduction

There is a need for better technology for dividing forest areas into different stands. Dividing the forest into different sections is crucial because cutting down trees to stimulate new growth keeps the forest healthy. CGI has indicated that being able to generate forest stands automatically would be extremely beneficial from an economic perspective.
1.1 Background and problem motivation
Today, traditional methods are used to determine what a forest stand looks like. A forest stand is an area of the forest that is divided into different sections.
Traditional methods use airplanes to take pictures of the forest, and then a person has to manually look at these photos to determine approximately where the forest stand should be. In some instances the companies then need to send people out into the forest to analyze it, mainly the height and variety of the trees, and then determine where the forest stand should be. This is very costly, time-consuming, and inefficient. The photography flights also have a negative impact on the environment.
1.2 Overall aim
This thesis investigates how forest stand delineation can be modernized with the help of AI/ML. By using modern solutions, the vision is to generate forest stands automatically and to create a more time- and cost-efficient process.
Other kinds of nature stands, not just forest stands, could also use this modern technology; the parameters could be adjusted to the specific stand type.
The overall aim will be achieved by using Mask R-CNN to generate forest stands. If this becomes accurate enough, it can replace the old method, i.e. aerial photos and manually sending out people to look at the different areas where these stands should be. According to CGI this would reduce costs by millions, and it would also be beneficial for the environment.
1.3 Problem statement
The problem is that regular forest stand selection is slow and wasteful, and there does not yet exist an automatic solution. Delineating forest stands is also very costly for forest-related companies.
1.4 Research question
The research question of this thesis is whether it is possible to generate forest stands from the coordinates and from the training data. What classification accuracy can be achieved by using location and images in a Mask R-CNN predictive model?
1.5 Scientific goals (Concrete and verifiable)
1. Is it possible to create forest stands with the help of AI?
2. Is it possible to use the location of the forest stand as a parameter to create and predict new forest stands?
3. How long will it take for the AI to generate this result, and will it be more efficient than traditional methods?
4. Compare generated forest stands with manually selected forest stands.
5. Compare the time and accuracy of forest stands generated in Google Colab versus forest stands generated locally on the computer.
6. Find a good strategy.
7. Measure the accuracy.
1.6 Scope
The training data will only cover Swedish coordinates for the forest stands. Forest stands in the rest of the world may differ from the ones used in this project.
1.7 Outline
Chapter 2 describes the theory of forest stands and gives an overview of machine learning. It then covers convolutional networks and a literature review.
Chapter 3 describes the methodology used in this thesis and the tools used to evaluate it: CRISP-DM and a method from Jose Solarte.
Chapter 4 describes the implementation of the work and how Mask R-CNN is used, with more details about the environment and measurements.
Chapter 5 describes the results of the tests conducted on 1, 6, and 42 areas. Graphs of the epoch loss and images of the ground truth and detections are showcased.
Chapters 6 and 7 discuss the results and provide conclusions.
1.7.1 Contributions
This is a continuation of the Miun project work; the group consisted of me, Herman Jansson, Lucas Thunberg Wessel and Jawad Ramezanja. The code that I will continue using to extract the stands was written by me and Herman Jansson.
2 Theory

This chapter explains how the forest industry operates with the different forest stands. A deeper explanation of the machine learning required for this project is also given. It then describes an overview of machine learning in general and its components. After that, an explanation of convolutional networks is given.
2.1 Forest stands
Forests contribute to the well-being of humans because of their great properties, such as cleaning the air and giving us useful material that we can harvest [1]. To preserve and keep the forest as long as possible it is important to use the knowledge we have about the trees, which includes trimming and dividing the area up into different sections.
A forest stand can be defined as “a contiguous group of trees sufficiently
uniform in age-class distribution, composition, and structure, and
growing on a site of sufficiently uniform quality, to be a distinguishable
unit”. [2]
There is no single established method to model a forest stand, but various methods have been used. One option is to use models at the stand, size-class, and tree level, which are the three usual categories of model. The stands themselves can contain a specific species or varied species, and be even-aged or uneven-aged. [3]
2.2 QGIS
QGIS was created in 2002 and is a program whose purpose is to let the user interact with data and see it on maps. QGIS is an open-source program and is classified as a geographic information system. [4]
2.3 Machine learning
To learn something requires different methods, such as exploring/practicing new information and storing the information [5]. Teaching computers has been of interest for decades. The term artificial intelligence was coined in 1956 by John McCarthy, who gave a presentation about it [6]. Alan Turing had also written an early paper about machines that could reason and, for example, play chess [6] [7].
The thought process behind machine learning was to emulate the human brain. In the 1950s Frank Rosenblatt created the perceptron, which is used when the dataset is linearly separable. The perceptron is designed after the neuron of the human brain: neurons receive and handle incoming messages, and the same thing happens in the perceptron. The perceptron handles one or more inputs and sends out an output. The perceptron is visualized in Figure 1. [8]
The numbered x values are the information receivers, and the circle represents the place where the input gets processed, which in the computer means where the calculation takes place. When the input has been processed, it goes to the output, where it is written as hθ. [8]
Figure 2 demonstrates an overview of the different parts of machine learning.
Figure 1 - Perceptron visualized (source: G. Hackeling,
Apply effective learning algorithms to real-world)
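The perceptron described above can be sketched in a few lines of Python. This is an illustrative example only, not code from this project; the function names and the AND-gate data are invented for the illustration.

```python
# Illustrative perceptron sketch (not code from this thesis).
# Inputs x are weighted, summed, and passed through a step function,
# mirroring the x -> processing -> h_theta flow of Figure 1.

def predict(weights, bias, x):
    # Weighted sum of the inputs followed by a hard threshold.
    total = sum(w * xi for w, xi in zip(weights, x))
    return 1 if total + bias > 0 else 0

def train(samples, labels, epochs=20, lr=0.1):
    # Rosenblatt's learning rule: nudge the weights by the prediction
    # error; converges when the data is linearly separable.
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            error = y - predict(weights, bias, x)
            weights = [w + lr * error * xi for w, xi in zip(weights, x)]
            bias += lr * error
    return weights, bias

# Learn the logical AND function, a linearly separable problem.
X = [[0, 0], [0, 1], [1, 0], [1, 1]]
y = [0, 0, 0, 1]
w, b = train(X, y)
```

After training on this data, predict(w, b, [1, 1]) returns 1 while the other three inputs map to 0.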
2.4 Supervised learning
There are two kinds of machine learning methods: supervised and unsupervised. In supervised machine learning the data is classified (labeled). One problem with this method is collecting the data and labeling it; another is that the labeling can be inaccurate, for example through a missing value or an implausible value. [9] It is important to correct such values in order to get an accurate result when running the algorithm.
Figure 2 - Overview of the different parts of machine learning
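The need to correct missing or implausible values before training can be illustrated with a small, hypothetical labeled dataset. The field names and values below are invented for illustration and do not come from this project's data.

```python
# Hypothetical labeled dataset illustrating the problems described in
# the text: one sample has a missing label and one has an implausible
# value. All names and numbers here are invented for illustration.

samples = [
    {"height_m": 12.0, "label": "pine"},
    {"height_m": 9.5,  "label": None},      # missing label
    {"height_m": -3.0, "label": "spruce"},  # implausible height
    {"height_m": 15.2, "label": "spruce"},
]

def is_valid(sample):
    # Keep only labeled samples with a plausible tree height.
    return sample["label"] is not None and 0 < sample["height_m"] < 120

clean = [s for s in samples if is_valid(s)]
```

Only the two well-formed samples survive this cleaning step and would be passed on to training.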
2.5 Computer vision
Computer vision was developed after AI was created, and experiments with it were documented as early as the 1970s. Computer vision is, as the name suggests, about the visual aspect: it was created to help give AI a visual ability, i.e. to try to make a computer recognize what is shown in a picture. This was developed at MIT. Early computer vision algorithms were developed to first look at the boundaries of a shape in 2D. [10]
CNN is an algorithm used for computer vision; it analyzes the image, extracts data, and uses it to group the different patterns and objects together to establish what they are. [11]
2.6 Convolutional Neural Networks
Convolutional networks are mostly used for the visual aspect, i.e. image data. The data is processed, and the AI finds patterns and constructs them. [11] [12] CNNs come in two different approaches: two-stage [13] and one-stage systems [14]. A two-stage system generates bounding boxes containing the candidate object proposals, and the candidates are created in two different steps [15] [12], while a one-stage system does the grouping and classification of the object proposals all at one time [15].
2.6.1 Object detection
Object detection and image classification differ in that image classification can tell what object is present but not how many, whereas object detection can determine how many there are. [12] Object detection is widely used in computer vision, but it becomes burdensome when it comes to background structure and size differences. [15] Object detection does simply two things: it pinpoints the units in an image and their positions in it. [12]
The first step is to do region proposals, which means that the image is read in and bounding boxes are produced at random. Then an algorithm is needed to find the optimal placement of each box, for example its size and where on the picture its coordinates should start and be centered. [12]
2.6.2 Semantic segmentation
Segmentation has an input and an output, where the input consists of pictures and the output entails regions and structures. When a picture is used it is crucial to treat the image, which is done by working on the filters, gradients, and colour data. [16] Semantic segmentation is the concept of evaluating every pixel in a picture and comprehending it. [16] [17]
2.6.3 R-CNN
R-CNN stands for "Region-based Convolutional Network". R-CNN begins with analyzing the picture and choosing some areas from it. Within these areas, which could be bounding boxes, it tries to identify and label each area. With the CNN algorithm it calculates characteristics of interest and uses them to forecast the classification. [18]
Figure 3 - R-CNN overview. Source [18]
With R-CNN there are multiple steps of clustering where the clusters get bigger. This type of clustering reflects that objects do not all share the same size [19]. There are different parts to selective search. First, selective search is applied to the picture; it gives the best guess of the areas that would be of interest and chooses them. [20]
After the selective search, the data goes through a CNN prior to reaching the output layer. [18] The CNN can only handle a small image, which is why the different regions are extracted and resized for it. [19]
Thirdly, to train the multiple support vector machines for the object classification, the information that the algorithm has received is united to create a sample for that purpose. Then the bounding box and the chosen attributes are also united to train a linear model for ground-truth bounding box forecasting. [18] The CNN can for example be AlexNet or VGGNet, and a linear model like a support vector machine is used in the classifier. [19]
2.6.4 Fast R-CNN
Fast R-CNN was built for the purpose of enhancing R-CNN. Ross Girshick made enhancements to R-CNN because of its slow speed and memory usage. [21] One of the changes was removing the selective search algorithm: looking at the architecture, the input picture has region proposals, but they are not generated by Fast R-CNN itself; the region proposals are generated beforehand. [22]
In the next step the CNN manages the images and produces the feature map. The feature map contains the parts of interest and where they are detected in the image. [22]
Figure 4 - Fast R-CNN architecture. Source [21]
2.7 Mask R-CNN
R-CNN is an abbreviation for Region Convolutional Neural Networks [15]. Mask R-CNN builds on the Fully Convolutional Network (FCN) [17] and Fast R-CNN, and was developed because of the difficulty of instance segmentation. [17] In instance segmentation a more precise classification is needed; to be even more precise, every single pixel is defined together with its object instance [12], going beyond plain semantic segmentation [17]. Therefore, a fusion of the two techniques, object detection and semantic segmentation, is required to do instance segmentation [17]. Mask R-CNN is the development of Faster R-CNN. [23] This is shown in Figure 5.
Figure 5 - Mask R-CNN architecture [17]
2.7.1 Loss function
Mask R-CNN is built on Fast R-CNN, and the mask-task loss is a mixture of the bounding box classification loss, the regression loss, and the mask loss. The formula is visible in Equation 1. [17]
Equation 1 - The mask-task loss formula for Mask R-CNN [17]: L = Lcls + Lbox + Lmask
Lcls is the classification loss and Lbox is the bounding box loss. [17] Comparing Lcls and Lbox, they are the same in Mask R-CNN as in Faster R-CNN, with the only difference being the addition of Lmask. Lcls added together with Lbox gives the formula in Equation 2. [21]
Equation 2 - Faster R-CNN loss function [21]: L = Lcls + Lbox
The output of Lmask is Km²; in the paper [17] it is explained that K stands for the number of classes, with one binary mask of resolution m × m for each of the K classes.
2.7.2 Intersection over union
IoU stands for Intersection over Union, and in some cases it is also recognized as the Jaccard index. The IoU is used in Mask R-CNN, and it is popular when it comes to segmentation. The formula for the IoU is shown in Equation 3. [24]
In Equation 3, I is the intersection area and U is the union area; A is the prediction and B is the ground truth bounding box. [24]
The IoU is normalized: the output does not depend on the absolute sizes of the boxes, only on how much they overlap. The IoU is used in machine learning when there is a need to analyze how accurately the bounding box matches the ground truth. [24]
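The IoU of Equation 3 can be sketched for axis-aligned bounding boxes as follows. This is an illustrative example, not code from this project, and the (x1, y1, x2, y2) box format is an assumption for the sketch.

```python
# Sketch of IoU (Jaccard index) for axis-aligned bounding boxes,
# following Equation 3: IoU = intersection area / union area.
# Boxes are (x1, y1, x2, y2); illustrative only, not thesis code.

def iou(box_a, box_b):
    # Intersection rectangle between prediction A and ground truth B.
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# Two 10x10 boxes offset by 5 in x: intersection 50, union 150 -> 1/3.
print(iou((0, 0, 10, 10), (5, 0, 15, 10)))
```

Note that identical boxes give an IoU of 1.0 and disjoint boxes give 0.0, matching the interpretation of IoU as overlap quality.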
2.7.3 Non-Maximum Suppression
Non-Maximum Suppression (NMS) is a powerful tool when it comes to recognizing replicas. When several detections match the same object, NMS removes the duplicates and outputs the detections without the matching pairs. [25]
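The duplicate-removal idea behind NMS can be sketched as a greedy loop that keeps the highest-scoring box and suppresses overlapping boxes above an IoU threshold. This is a minimal illustration under those assumptions, not this project's code.

```python
# Minimal NMS sketch (illustrative only, not thesis code): keep the
# highest-scoring box, drop boxes that overlap it above an IoU
# threshold, and repeat until no detections remain.

def iou(a, b):
    # IoU for (x1, y1, x2, y2) boxes, as in Equation 3.
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, threshold=0.5):
    # Sort detection indices by confidence, highest first.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        # Suppress remaining boxes that overlap the kept one too much.
        order = [i for i in order if iou(boxes[best], boxes[i]) < threshold]
    return keep

boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)
```

Here the two heavily overlapping boxes collapse into the single best-scoring detection, while the distant third box is kept.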
2.8 Related work
The first two articles are about Facebook's scientists implementing maps with the help of AI, and the rest are about researchers using Mask R-CNN to detect different problems.
Equation 3 - Formula for the IoU [24]: IoU = I / U = area(A ∩ B) / area(A ∪ B)
2.8.1 Mapping roads through deep learning and weakly supervised training
Researchers at the Facebook AI department use AI to draw maps where the existing maps are inaccurate or nonexistent; according to the researchers, they used weakly supervised training to guess different parts of the map. Satellite images were used to complete the maps. In one test they used training data from an entire country, which predicted accurate results in Thailand, but using the same training data in other countries did not produce the same result. [26]
2.8.2 Generating retinal flow maps from structural optical coherence tomography with artificial intelligence
This article is related because the scientists use retinal flow maps to train an AI and then use the model to classify retinal flow maps without professional help. Their dataset was separated into training, validation, and a held-out test set, with a 70%/15%/15% split. They also explain that the most suitable parameters were used. The scientists tested 12 different deep learning models, and the function used for training was the Adam optimizer. [27]
2.8.3 Utilizing Mask R-CNN for Detection and Segmentation of Oral Diseases
In this paper the researchers use Mask R-CNN which was based on
Matterport Inc code (https://github.com/matterport/Mask
RCNN), they limit the research to only use it for identifying the highly
widespread disease in oral pathology which is herpes labialis and
aphthous ulcer. They compared the result to another pixelwise classifier
in the same area and found out that their model had a higher
accuracy .744 vs .621 even though the latter had 258 images and their
only had 30 images. They hypothesize that it would have been a better
result if they had been able to have more images to use as data, and
they did prove that with this algorithm the accuracy will be higher even
with less data provided. [28]
2.8.4 Segmentation of Forest to Tree Objects
In this paper they research whether it is possible to do a segmentation using only the height of trees as the data, working with raster data. The data needed to be filtered because the LiDAR data contained some variations, and the irrelevant branches were removed.
LiDAR data is used to analyze the maximum and smallest reflection in a pixel. Some of the information in the data is lost, but this is done so that the program runs faster and the computational power required is decreased. [29]
2.8.5 Automating Forest stands with Image Segmentation
This work is a continuation of the work I did in a group. In the group we used a GAN to try to identify the different stands, but the algorithm went into mode collapse. The fallback was using object segmentation algorithms; the problem there was that they were not AI, so there was no learning process. [30]
3 Methodology

This chapter discusses the methodology used in this thesis. One important aspect of this work was handling a lot of data, because every stand took space; all of the data for Sweden was 13 GB (before dividing it up). When handling a lot of data it is important to do so carefully, or there could be an error when running it through the Mask R-CNN. The methodology used to help in this process was a mixture of CRISP-DM and a methodology proposed by Jose Solarte, which has a more detailed approach to tackling a data mining task. Some of the methodology steps focus on evaluating the project with the goal of delivering a product; since this project had no such intention, those parts of the methodology were discarded.
3.1 Scientific method description
I used a quantitative method to measure the accuracy of the project; the comparisons are numerical. I performed experiments and measured accuracy and time. A Gantt chart was used to keep the work on track. Gantt is a scheduling technique that gives the user the ability to keep the work trackable and to complete every single task before the deadline.
3.1.1 CRISP-DM
CRISP-DM is an acronym for Cross-Industry Standard Process for Data Mining. There are 7 different steps to this process, which are:
- Business understanding
- Data understanding
- Data preparation
- Modeling
- Evaluation
- Deployment
- Performance measurement
Figure 6 - CRISP-DM process. Source: [31]
The method used was CRISP-DM together with a method proposed by Jose Solarte, whose methodology is very detailed and based on CRISP-DM and SEMMA. However, Solarte's method was proposed for students in industrial engineering, which is why the economic perspective was not used and the focus was on data mining and analyzing the data. [32]
3.2 Business understanding
First, the business understanding (the first step) was analyzed for the organization, which in this particular case was CGI. CGI needed a solution to their forest stand problem: they wanted the forest stands to be generated automatically. Details of this were given in chapters 1 and 2.
3.3 Structure the work
After the business understanding, the work needed more structure and more information about the forest stands was required, hence a literature study was conducted. The literature study was conducted to understand which algorithm, and which way of structuring the data, would be most appropriate for the task. The thesis followed the proposed methodology of Solarte [32], which meant choosing between the following:
- Description and summarization
- Segmentation
- Concept description
- Classification
- Prediction
- Dependency analysis
An example of this process is shown in Figure 6 [31].
The segmentation task was picked after the literature study for the thesis. One experiment [33] used Mask R-CNN on nucleus data, and after reading about it and how the Mask R-CNN works, segmentation looked promising.
3.4 Data mining task
According to the figure in Appendix A: Figure 26, the data mining task needed to be analyzed. The analysis of the forest stands showed a need for segmentation. Segmentation was necessary because the stands had to be separated and analyzed by the algorithm in order to find a pattern for the different stands.
The next step in Appendix A: Figure 26 was to find the required technique. The technique used was a neural network, because the algorithm had to be trained with the aim of finding patterns.
3.5 Evaluate tool
To evaluate the chosen algorithm, Appendix A: Figure 26 guided the inspection of the different areas that needed to be examined.
The literature study pointed to an algorithm called Mask R-CNN. This algorithm solved the segmentation of the forest stands. The Mask R-CNN algorithm could be run locally on the computer; the specification of the computer is shown in Table 1.
3.5.1 Comparing R-CNN, Fast R-CNN and Mask R-CNN
Compared to R-CNN, Fast R-CNN and Faster R-CNN, Mask R-CNN is superior.
The limitation of R-CNN was its speed [18]: training the data involved multiple steps, so training took time and required a decent computer to perform the calculations. A lot of storage was also needed, because every object proposal in the data was written to the computer's storage unit. [21] This slow speed is why R-CNN is not used much. [18]
Lastly, a drawback of R-CNN was that object detection was not as fast as it could be when trained with the VGG16 network; detection took 47 seconds per picture. [21]
Compared to R-CNN, Fast R-CNN is 9 times faster in training. Comparing the time spent running the models, Fast R-CNN was 213 times faster than R-CNN. It also had a considerably better mAP score on the Pascal VOC 2012 dataset. [12]
Figure 7 - comparison of R-CNN and Fast R-CNN Source: [21]
Mask R-CNN is a more precise object detector than Faster R-CNN. [23] The overhead introduced by Mask R-CNN is minuscule because the algorithm reuses the already present network; because of this small overhead, its processing speed is practically the same as Faster R-CNN's. [12]
3.5.2 Functionality
Mask R-CNN has a good set of functionalities. One useful feature is that it is possible to train different layers separately; for example, if an error occurred when training all layers, it was possible to divide the layers and begin by training only the head layers. Dividing the layers was necessary at the beginning of this work, because the first run of the algorithm crashed when training all layers. This also helped in analyzing how to choose correct parameters for the Mask R-CNN.
Another neat functionality was the possibility to continue training an already trained model, which I did use, for example when more data became available. Instead of creating a new model from scratch, a previously trained model could be loaded and trained further with the new data. This functionality was very useful and saved a lot of time and computing power.
3.5.3 Usability and Support
Since the purpose of this work was not to create or evaluate a tool in terms of usability, the usability and support of Mask R-CNN were not considered. The focus was on functionality, to make sure the proper tests, training and analysis of the results could be performed well.
3.5.4 Identify resources
The next part of the process was about making sure that the project was feasible, in short, to determine whether the project could be executed. The steps were to identify the resources required by the tool, determine any additional required resources, and from that decide whether the project was feasible. The resources required by Mask R-CNN were the training and test data, along with an environment with sufficient memory to load the data necessary for training.
3.6 Data understanding
The next step was to gather all the data, prepare it, and run it through the machine learning algorithm. In this report, the data collected was the different stands of the different forest areas. The preparation consisted of randomly selecting smaller areas and dividing them into test and training sets. All the stands in those areas were then first extracted and then masked, so the Mask R-CNN had a ground truth. All the training data was then fed into the Mask R-CNN during the training phase.
The available parameters were:
- Height
- Tree species
- Vegetation
- Volume (m3sk/ha)
- Manually selected forest stands
3.6.1 RGB band
The data consisted of three bands, RGB, which stands for red, green and blue. In this report, these colors represented different aspects of the trees: the red band stood for the height of the trees. The height was especially important to know, because dividing the forest stands depends to a large extent on tree height.
The green band was the basal area, which indicates how many trees there were in the area. The basal area was measured at a height of 1.3 m above the ground, where the diameter of the tree was measured. [34] The formula for the basal area was:

BA = (π × dbh²) / 40000

Equation 4 – Basal area equation

The basal area was measured in square meters. dbh is short for "diameter at breast height" in cm, measured at around 1.3 m [34] above the ground. The summed areas of all trees can be used as a measurement of how crowded a stand is; this measurement is called the basal area of a stand. [35]
The blue band stood for the volume. The volume was calculated in m³ using the mid-point diameter and size of every tree. Two different conditions applied when measuring trees, and the one fulfilled first was used. The first condition was an overbark diameter of 7 cm (trees with a smaller diameter at breast height were routinely disregarded). The second condition was when the main stem was distinct. [35]
There are different methods to estimate the volume, but one method uses the following equation:

V = (BA / 2 + 0.0019) × H

Equation 5 – Volume of trees formula in m³

The formula describes the volume of the tree, measured in cubic meters, where BA is the basal area in m² and H is the height. [35]
Merging the RGB bands produced a picture containing all the values, and this merged data was used for the analysis.
3.7 Data preparation of TIFF and shape file
The data came as a large Tagged Image File Format (TIFF) file of an area in Sweden. There was also a shape file, a layer on the map with markings that outlined the different stands. These stands were the target data.
To make this project work, the stands needed to be extracted from the shape file. Each stand then needed to be masked, meaning it was filled in with white where the stand was, while the background was colored black, in order for the segmentation to work.
One problem that could occur was tiny stands consisting of just a couple of pixels; these were difficult for the Mask R-CNN to process and needed to be sorted out. It was also a problem when a forest stand area was too big, because the computer did not have enough processing power to render it.
3.8 Evaluation
I evaluated the accuracy by dividing the data into training and validation sets. The data consisted of several stands, and from each stand a number of masks were extracted into several PNG files.
The accuracy was based on the masked data. For example, if the Mask R-CNN found all the masks in the test data set, the accuracy would be 100%. The company wanted an accuracy of around 80%.
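The IoU metric that this evaluation relies on can be sketched as follows. This is a generic mask-IoU computation, not the Matterport notebook's exact code; binary NumPy masks are an assumption:

```python
import numpy as np

def mask_iou(pred: np.ndarray, truth: np.ndarray) -> float:
    """Intersection over Union of two binary masks (non-zero = stand pixel)."""
    pred = pred.astype(bool)
    truth = truth.astype(bool)
    union = np.logical_or(pred, truth).sum()
    if union == 0:
        return 0.0
    return np.logical_and(pred, truth).sum() / union

# Two 4x4 masks of 4 pixels each, overlapping in 1 pixel:
a = np.zeros((4, 4)); a[0:2, 0:2] = 1
b = np.zeros((4, 4)); b[1:3, 1:3] = 1
print(mask_iou(a, b))  # intersection 1 px, union 7 px -> ~0.143
```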
4 Implementation
This chapter shows a flowchart of how the areas were extracted and used in the Mask R-CNN. The programming language Python was chosen, and the way Mask R-CNN was used with the setup is described.
4.1 Handling the data
There were different steps to train the Mask R-CNN. These steps are demonstrated in Figure 8.
Firstly, the extraction was executed in several steps. The first step was to extract an area. Dividing the data was important because of the computer's processing time: if an area was not extracted, the processing load would be too big for the computer to handle.
Secondly, an area of choice was selected; in this report it was an area in the south of Sweden. There was a TIFF file and a shape file for the area. The shape layer was used to extract the different stands from the selected area. These stands were selected, and an extraction was made from the area.
Thirdly, the images were converted to a different format. This conversion was crucial because the Mask R-CNN input only accepted PNG files.
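The TIFF-to-PNG conversion step could look like the sketch below. The thesis does not name the conversion tool, so Pillow and the file names are assumptions:

```python
from PIL import Image

def tiff_to_png(tiff_path: str, png_path: str) -> None:
    """Convert an extracted area from TIFF to PNG for the Mask R-CNN input."""
    with Image.open(tiff_path) as img:
        img.convert("RGB").save(png_path, format="PNG")

# Hypothetical usage:
# tiff_to_png("area_01.tif", "area_01.png")
```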
Furthermore, the images were masked. Masking was a process where each stand in the area was extracted in order to teach the algorithm where the different stands were. With this in mind, it was not possible to input more than one stand at a time into the algorithm; the different stands in the picture had to be input separately to teach the algorithm the different stands. In the masked image, the stand was filled in with white and the rest of the picture was filled in with black. Then a new picture was created and the stand next to the previous one was chosen.
Figure 8 - flowchart of the data
The first approach to masking the pictures was manual, in Adobe Photoshop. The next step was doing it automatically.
With automation, the masks were filled in using code. The code analyzed each pixel: if the pixel contained a color, it was colored white; otherwise it was set to black. Black was created by setting the RGB values of the pixel to 0 (0,0,0), and the stand was colored in by setting the RGB values to white, which is 255 (255,255,255).
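That per-pixel rule can be sketched as follows, assuming the extracted stand is an RGB array in which background pixels are already zero:

```python
import numpy as np

def mask_stand(rgb: np.ndarray) -> np.ndarray:
    """Binarize an extracted stand image: any colored pixel -> white (255),
    everything else -> black (0)."""
    has_color = rgb.any(axis=-1)   # True where any channel is non-zero
    mask = np.zeros(rgb.shape[:2], dtype=np.uint8)
    mask[has_color] = 255
    return mask

# 2x2 image with one colored pixel and three black pixels
img = np.zeros((2, 2, 3), dtype=np.uint8)
img[0, 0] = (12, 200, 7)
print(mask_stand(img))  # only the top-left pixel becomes 255
```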
Lastly, the data was used as input to the Mask R-CNN. All the data was split into two parts: training data and validation data. The data sent into the algorithm was the TIFF file converted to PNG, plus the masked images for each stand.
4.2 Python
The code was written in Python, a programming language widely used for machine learning. A benefit of using Python is its focus on readability, which makes it easy to read; it also has strong support for modules and packages. [36]
4.3 How Mask R-CNN is used
The Mask R-CNN implementation was based on Matterport's code from GitHub. The code used had been tested on microscopic images. [37]
Running the Mask R-CNN requires a set of weights. The weights used in this paper were the MS-COCO weights, a pretrained model that makes it easier to train on a new dataset for the first time.
When training a Mask R-CNN model, it was possible to specify which layers to train. The user could either train all layers, a selection of layers, or a predefined setting called heads. The purpose of using heads was basically to keep the training time down when the new data was similar to the data the model had been trained on previously. With the heads setting, only the RPN (Region Proposal Network), the FPN (Feature Pyramid Network) and the mask heads of the network were trained. [17]
The RPN is a neural network that proposes which regions of an image might contain the objects being looked for, also called ROIs (Regions Of Interest). That way, the rest of the layers do not have to scan the whole image but can focus on the areas proposed by the RPN. [17]
The FPN is, very simplified, a CNN that tries to identify the searched-for objects at the different scales at which they may occur in the image. When searching for forest stands, it could find a small stand as well as a stand covering a quarter of the image. [38]
The mask head, which is responsible for masking all the ROIs identified in the previous steps, was also trained.
4.3.1 Environment
The specifications of the computer used to train the Mask R-CNN algorithm are given in Table 1.

Table 1 - Computer specifications and software version
Operating System: Windows
GPU: NVIDIA GeForce RTX 2060
CPU: 2.60 GHz 6-Core Intel(R) Core i7-10750H
Memory (RAM): 16.0 GB
CUDA version: 10.1.243
cuDNN: 7.6.3

4.3.2 Running the Mask R-CNN
An epoch is one pass of the algorithm through the whole training dataset. [39]
After every epoch the model was saved, which made it possible to go back and resume from a specific model; this was required to save time. When running the Mask R-CNN there was uncertainty about whether the algorithm would run to the end, because it could easily run out of memory if there was too much data in one batch.
4.4 Measurement setup
Along with the model, logs were written to file, making it possible to use TensorBoard for analyzing the data. The model file was saved as a .h5 file containing the trained model, along with a timestamp from when the epoch finished running.
In TensorBoard, the information given was the epoch loss, i.e. the sum of the following losses: the RPN classifier loss, the RPN bounding box loss, the loss for the classifier head of the Mask R-CNN, the loss for the Mask R-CNN bounding box, and lastly the binary cross-entropy loss for the mask head.
5 Results
The manually extracted stand data was not sufficient to run, because it did not contain enough stands. When code was used for the masking, the masks were made in an instant, which is necessary when working with a lot of training data.
Firstly, the big area with 3248 stands took 2 days to run and was on the verge of running out of memory. Secondly, areas were extracted with a smaller number of stands. That made a big difference to the memory on both the computer and Google Colab; both platforms managed to run the algorithm smoothly.
When the algorithm finished running, the log file was saved. The log file was then used in the analysis code provided by Matterport Inc [37]. The code contained a Jupyter notebook, which helped to investigate the dataset, run statistics on it, and analyze the detections from beginning to end.
5.1 Manually masking a forest stand
Figure 9 shows the stands masked manually in Photoshop. The masked images were used for a test run in Mask R-CNN. The test run failed because the algorithm requires at least two areas to run: one area for training and one for validation.
Figure 9 - manually masked stands in Photoshop
5.2 Results of all the areas
The epoch loss for the 1-, 6- and 42-area runs is visible in Figure 10. The epoch loss is all the different losses added together: the RPN classifier loss, the RPN bounding box loss, the loss for the classifier head of the Mask R-CNN, the loss for the Mask R-CNN bounding box, and lastly the binary cross-entropy loss for the mask head.
Figure 10 - Comparison of all the different areas
The RPN class loss, which is the RPN anchor classifier loss, is visible in Figure 11. The RPN is the region proposal network, and the figure displays its classification loss, which signifies the confidence in predicting the class labels.
Figure 11 - RPN class loss
The time to complete running the Mask R-CNN with the different numbers of stands is shown in Table 2. The average score was highest when using 42 areas with one validation area, but the average IoU was best when using 42 areas with a 70/30 split, at 47.5%.

Area | Time to complete | Stands | Average score | Average IoU
1 area run on computer | 2 days 7 min | 3248 | 64.5% | 15.2%
6 areas run on computer | 1 h 4 min | 87 | 85.2% | 24.7%
6 areas run on Google Colab | 2 h 35 min | 87 | 88.3% | 28.9%
42 areas run on Google Colab | 2 h 56 min | 624 | 88.5% | 42%
42 areas run on Google Colab with 70/30 split | 2 h 51 min | 624 | 86.9% | 47.5%

Table 2 - Time to complete running the stands with different numbers of areas, and average score and IoU for all the different areas
5.3 One area
The one big area was the most difficult to run because of its size and the number of stands in it: 3248 stands in total.
The setup was a laptop with a fan beneath it and a mini-fan at the front to prevent the computer from overheating while it ran for two days; this can be seen in Figure 12.
The computer ran out of memory many times; with a stroke of luck it worked once, but it did not manage to run again, which indicates that it was on the edge between working and not working. This was tested both on Google Colab and locally on the computer, and the only run that did not go out of memory was on the computer. Because of the massive size of 3248 stands, it took a long time to run, two days to be exact. This can be seen in Table 2.
The output from analyzing the log file in the Jupyter notebook is shown in Figure 13, Figure 14 and Figure 15. Figure 13 showcases all the stands it predicted and the ground truth; the score of the predictions is displayed in the captions as score/IoU. The green areas in Figure 13 are the ground truth and the red areas the predictions.
Figure 14 displays the detections after running NMS. Lastly, Figure 15 shows the ROIs after refinement.
5.3.1 6 areas
When conducting the test on the 6 areas, the total number of stands was 87. The head layers were trained on the computer, and the results are visible in Figure 10. It took 1 hour and 16 minutes to finish.
A comparison was made between training the head layers locally on the computer and on Google Colab. The contrast in epoch loss and completion time is shown in Figure 16. The run on the cloud took 1 hour and 12 minutes, a difference of 4 minutes. The epoch loss graphs are practically identical for the computer and Google Colab.
Once again, the results were analyzed with the Jupyter notebook, which shows the stands it predicted and the ground truth. The score of the predictions is displayed in the captions as score/IoU, visible in Figure 18; the green is the ground truth and the red the predictions.
Figure 17 displays the detections after running NMS. Lastly, Figure 19 shows the ROIs after refinement.
Figure 16 - Epoch loss of 6 areas running locally and on Google Colab cloud.
Figure 17 - Ground truth and Detections, captions:
score/IoU for 6 areas
Figure 18 - Detections after NMS for 6 areas
5.3.2 42 areas
The epoch loss and time for the 42-area run are visible in Figure 10; it took 2 hours and 56 minutes to finish. The number of stands in the 42 areas was 624.
Once again, the ground truth and the algorithm's predictions are visible in Figure 20, with the score of the predictions displayed in the captions as score/IoU, where green is the ground truth and red the predictions.
Figure 22 displays the detections after running NMS. Lastly, Figure 21 shows the ROIs after refinement.
Figure 19 - ROIs after refinement for 6 areas
6 Discussion
The result of the literature review was that the algorithm to use was Mask R-CNN.
A comparison is made between the results for 1, 6 and 42 different areas. The Mask R-CNN produced interesting results.
6.1 Scientific discussion
This report shows that Google Colab is faster than a computer with the specifications given in Table 1.
The limitations of this project were that the number of areas extracted was not enough to reach a high accuracy; a bigger set was needed when running the Mask R-CNN. The timeframe limited the number of areas that could be extracted manually.
6.1.1 Manually masking a forest stand
Masking the forest stands in Photoshop takes time because of the program, and it involves several steps.
Masking the forest stands manually is a tedious task. Completing one small area with six stands is inadequate, because when masking an area there need to be more than six stands for Mask R-CNN to run. In the future, manually painting and masking six small stands is not recommended, even for testing the algorithm.
6.2 One, six and 42 areas
The first big area extracted was 18.9 × 16.9 km and contained 3248 stands. This area caused the Mask R-CNN to be on the verge of crashing, and making it run was very difficult.
Examining Figure 23, there is a visible spike after 20 epochs. It is caused by a restart of the training, moving from training only the RPN, FPN and mask heads to training all the layers.
The graph in Figure 23 (the highlighted areas) indicates that the loss has not really flattened out, so this model is still a little underfitted. It seems to be moving towards a good fit, which indicates a better predicting model; if training had continued, it probably would have resulted in a good fit.
Every graph that has to do with the validation data looks different from the average. The reason may be that the validation set was too small.
In Table 2 the score is a confidence score for the prediction, i.e. a high confidence score means that the model is confident that the marked area is a stand. Looking at the average IoU, the best average IoU comes from the 42 areas with a 70/30 data split. The 70/30 split means that 70% is training data and 30% validation data; in this case the validation set was 12 stands. The result was an IoU of 47.5%, which is fairly decent and, with only 624 stands, better than expected.
To reach a high IoU there needs to be more automation in extracting the areas with the stands in them, making it possible to get a bigger training set without spending a lot of time on just extracting the stands.
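The 70/30 split described above can be sketched like this; the shuffling, seed and area names are illustrative assumptions:

```python
import random

def split_areas(areas, train_frac=0.7, seed=42):
    """Shuffle the extracted areas and split them into training and validation sets."""
    rng = random.Random(seed)
    shuffled = areas[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]

areas = [f"area_{i:02d}" for i in range(42)]
train, val = split_areas(areas)
print(len(train), len(val))  # 29 training areas, 13 validation areas
```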
6.3 RPN value loss
Analyzing the graphs from the results in Figure 10, the line is more stable for the 70/30 training/validation split than when using only one area as validation. Figure 24 is a zoomed-in view of the results in Figure 11.
Figure 23 - highlighted area shows stabilization.
The reason the values change more rapidly is that when a small change is made to improve the model, a bigger impact is visible in the graphs, and a bigger impact means the line is not stable. This is partly because the set it validates against is too small; with a bigger validation set the curve becomes smoother and better.
Figure 24 - Zoomed in figure of the Rpn class loss graph
6.4 Project method discussion
The chosen method was clear, concise, and easy to follow, but it was not optimal for this particular case because it was more suited to creating a product than to analyzing data.
The different stands were extracted and converted so the Mask R-CNN could process the given data and the AI could learn the pattern of the different stands.
To extract the different stands, QGIS was used to open the TIFF file; the stands were then extracted so they could later be converted into PNG. The conversion made it possible for the Mask R-CNN to process the data. After the conversion, the algorithm required masked images, so the images were masked by converting the RGB values. Running the masked images through the Mask R-CNN teaches the algorithm where the stands are.
TensorBoard was a great tool for evaluating the log files; the logs contain the model's performance throughout the training process.
The Jupyter notebook had a lot of great evaluation tools, such as the ground truth and detections with the score/IoU captions, the detections before/after NMS, and the ROIs before/after refinement.
6.5 Ethical and societal discussion
The ethical aspect of this work is that if the application is successful, the workers who currently do the work manually will probably have to do something else, because their services will not be needed once a computer can generate the stands and recognize the patterns in the work. There thus needs to be responsibility in how the software is used, and the workers should be given other tasks.
The environment will benefit from this work if it is implemented, because there will no longer be a reason to fly airplanes to take flight photos of the stands; the data already available is enough to make it work without them. When trees are cut down, the forest stands need to be re-evaluated; with this solution, workers would not have to travel by car to the different areas that need re-evaluation, because companies can use the Mask R-CNN to re-evaluate the stands.
7 Conclusions
After the literature review, the conclusion was to use Mask R-CNN. The results show that Google Colab is faster than running the forest stands locally on the computer; however, the IoU and the epoch loss did not depend on whether the Mask R-CNN ran on the cloud or locally. This result is limited to stands taken from the south of Sweden; if the position were changed to the north of Sweden, the results might change, because the vegetation differs between south and north.
To my knowledge, no one has used Mask R-CNN to segment forest stands before. Hopefully, this thesis will give the forest industry what it needs to make a change in how stands are divided. The method shows promising results, and with more training it could be of good use.
7.1 Project continuation
This project is a continuation of a project made at Mid Sweden University, where I was part of the group. In the previous project we did not manage to get an AI to analyze the different forest stands; we only managed to use object segmentation algorithms. In this project, however, I did manage to run the data through an AI and make it learn the pattern of the different stands to a degree: the best predicted stands had an IoU of 80%.
7.1.1 Computer versus Google Colab
Comparing the results, there is a difference in runtime between Google Colab and the test computer with the RTX 2060 graphics card on the same data. Google Colab was faster and finished sooner than the computer. Even though the difference was not huge, it could be when using a large amount of data.
Because there are many stands to extract and work on, the time required to run the algorithm is of great importance. Hence, it is better to use Google Colab and its services when the available computer has specifications similar to Table 1.
7.1.2 One big area with many stands versus 6 areas with smaller stands
Comparing how the algorithm handles data: the algorithm ran out of memory both locally and on the cloud when running the area with 3248 stands.
To improve both time and accuracy, such a massive area with so many stands should not be run. The results show that it is much more efficient and beneficial to divide an area and conquer it.
7.1.3 Comparing 6 areas versus 42
Processing 6 different areas took 2.5 hours and processing 42 different areas took 3 hours. Comparing these results shows that having many areas with stands is much more effective than, for example, having one big area with many stands in it.
The algorithm processed the data quickly even though the amount of data was multiplied by 7. This shows that the key to getting a working model is to extract many small areas rather than one big one.
7.2 Scientific conclusions
This work will probably help the forest industry and the company to figure out a good solution to their problem. There is still some work left to be done, and using this method could be a solution. Using a training/validation split of 70/30 was better than using one area as validation: the value loss differed and became more stable.
7.2.1 Is it possible to create forest stands with the help of AI?
It is possible to create a forest stand with the help of AI, but the difficult part is getting a high IoU. Some of the stands did get a high IoU. The company (CGI) required the AI to have a precision of about 80%, and only one stand reached that high.
7.2.2 Find a good strategy
It was difficult to find a good strategy, because forest segmentation is not common. The approach was therefore to use the nucleus code, which had been tested on microscopic data. The similarity is that a stand is an area in an image, much like the nuclei in the microscopic data, which is why it was chosen. In the literature review of the different approaches, Mask R-CNN seemed the most fitting because it was both faster and better than the other CNN algorithms.
7.2.3 How long will it take for the AI to generate results, and will it be more efficient than traditional methods?
It is more or less instant for the Mask R-CNN to generate prediction results. However, training a model takes time, and evaluating the different stands takes time. The run on Google Colab with 42 areas and a total of 624 stands took 2.455 hours. The forecast visible in Figure 25 shows that 504 stands would take 5.255 hours.
Figure 25 - Forecast of 504 areas
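A naive linear extrapolation of the training time could look like the sketch below. It scales from the 624-stand run and is only an assumption about how such a forecast might be computed, not the model behind Figure 25:

```python
def forecast_hours(n_stands: int, ref_stands: int = 624, ref_hours: float = 2.455) -> float:
    """Naive linear forecast of training time, scaled from a reference run."""
    return ref_hours * n_stands / ref_stands

# Doubling the stands roughly doubles the predicted time under this assumption
print(round(forecast_hours(1248), 2))  # 4.91
```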
7.2.4 Compare generated forest stands with manually selected forest stands
The manually selected forest stands currently do a better job than the automatically generated ones, because the accuracy of the generated stands is still too low to compete with the manually selected stands.
7.2.5 Compare the time and accuracy of forest stands generated in Google Colab versus forest stands generated locally on the computer
There is no noticeable difference in IoU between running the stands locally on the computer and running them on Google Colab. Both gave the same output; the difference was the time it took to process the data. Google Colab was faster and easier to use than running the code locally.
7.2.6 Measure the accuracy.
The accuracy was measured with the Jupyter notebook code, which
made it possible to see how big a difference adding more areas makes to
the accuracy of the stands predicted by the Mask R-CNN. The average
IoU was 42% with 42 areas, which makes the algorithm unusable for the
company, whose requirement is around 80%.
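The IoU measure itself is straightforward to compute from a predicted and a ground-truth mask. A minimal sketch (illustrative only; the actual evaluation used the notebook code) could look like this:

```python
import numpy as np

def mask_iou(pred, truth):
    """Intersection over Union of two binary masks of equal shape."""
    pred = pred.astype(bool)
    truth = truth.astype(bool)
    union = np.logical_or(pred, truth).sum()
    if union == 0:
        return 0.0  # both masks empty: define IoU as 0
    return np.logical_and(pred, truth).sum() / union

# Tiny example: 1 pixel of overlap, 3 pixels in the union -> IoU = 1/3
pred  = np.array([[1, 1], [0, 0]])
truth = np.array([[1, 0], [1, 0]])
print(mask_iou(pred, truth))
```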
7.3 Future Work
In the future, the forest areas should be extracted and sorted
automatically from start to finish. It is also important to have more
validation data than was used in this paper. For future work it would be
interesting to try at least 1000 areas and to compare those results with
the results from this paper.
It would also be interesting to add the nature of northern Sweden to the
stands run through the Mask R-CNN. For the most optimal result it
would be best to run all the stands from the map of Sweden through the
Mask R-CNN.
It could also be interesting to artificially enhance the resolution of the
test data to see if the output would differ. With this said, given the data
available, I believe it is fully possible to achieve an average IoU of 80%
with more training.
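The simplest form of such artificial resolution enhancement is nearest-neighbour upsampling, sketched below for a single-channel raster. This is only a baseline illustration; a learned super-resolution model would be the more interesting experiment.

```python
import numpy as np

def upscale_nearest(img, factor=2):
    """Nearest-neighbour upsampling of a 2-D (single-channel) image:
    each pixel is repeated as a factor x factor block."""
    return np.kron(img, np.ones((factor, factor), dtype=img.dtype))

tile = np.array([[1, 2],
                 [3, 4]])
print(upscale_nearest(tile))  # the 2x2 tile becomes a 4x4 array
```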
Bibliography
[1] Swedish government, "Den globala skogsutvärderingen 2015," in
FN:s Livsmedels- och jordbruksorganisation, 2015.
[2] J. Helms, The dictionary of forestry. Society of American Foresters,
Bethesda, 1998.
[3] H. E. Burkhart and M. Tomé, Modeling Forest Trees and Stands,
Netherlands: Springer, 2012.
[4] QGIS, "QGIS Documentation," [Online]. Available:
https://docs.qgis.org/3.16/en/docs/user_manual/preamble/feature
s.html. [Accessed 1 March 2021].
[5] J. G. Carbonell, R. S. Michalski and T. M. Mitchell, Machine
Learning: An Artificial Intelligence Approach, Springer-Verlag
Berlin Heidelberg, 1983.
[6] C. Smith, "The History of Artificial Intelligence," University of
Washington, 2006.
[7] C. Tate, "History of A.I.: Artificial Intelligence," Live Science, 25
August 2014. [Online]. Available:
https://www.livescience.com/47544-history-of-a-i-artificial-
intelligence-infographic.html. [Accessed 07 March 2021].
[8] G. Hackeling, Apply effective learning algorithms to real-world,
Birmingham: Packt Publishing Ltd., 2014.
[9] I. Maglogiannis, K. Karpouzis, M. Wallace and J. Soldatos,
"Supervised machine learning: A review of classification
techniques," Emerging artificial intelligence applications in computer
engineering, vol. 160, no. 1, pp. 3-24, 2007.
[10] R. Szeliski, Computer Vision: Algorithms and Applications,
NY,USA: Springer Science & Business Media, 2010.
[11] C. C. Aggarwal, Neural Networks, NY, USA: Springer, 2018.
[12] M. Sewak, P. Pujari and M. R. Karim, Practical Convolutional
Neural Networks: Implement Advanced Deep Learning Models
Using Python, Birmingham: Packt Publishing, 2018.
[13] J. Dai, Y. Li, K. He and J. Sun, "R-FCN: Object Detection via Region-
based Fully Convolutional Networks," 2016.
[14] T.-Y. Lin, P. Goyal, R. Girshick, K. He and P. Dollár, "Focal loss for
dense object detection," in Proc. Int. Conf. Comput. Vis., 2017.
[15] X. Jiang, A. Hadid, Y. Pang, E. Granger and X. Feng, Deep learning
in Object Detection and recognition, Singapore: Springer, 2019.
[16] T. Wang, "Semantic Segmentation," University of Toronto, Toronto.
[17] K. He, G. Gkioxari, P. Dollar and R. Girshick, "Mask R-CNN," in
Proceedings of the IEEE International Conference on Computer Vision .
[18] A. Zhang, Z. C. Lipton, M. Li and A. J. Smola, Dive into Deep
Learning, 2021.
[19] M. . M. Khapra, "CS7015 (Deep Learning) : Lecture 12 - Object
Detection: R-CNN, Fast R-CNN, Faster R-CNN, You Only Look
Once(YOLO)," [Online]. [Accessed 2021].
[20] J. R. R. Uijlings, K. E. A. van de Sande, T. Gevers and A. W. M.
Smeulders, "Selective Search for Object Recognition," International
Journal of Computer Vision, vol. 104, pp. 154-171, 2013.
[21] R. Girshick, "Fast R-CNN," Microsoft Research.
[22] "Advanced Computer Vision with TensorFlow," Coursera, 2020.
[Online]. Available: https://www.coursera.org/lecture/advanced-
computer-vision-with-tensorflow/fast-r-cnn-FjbUm. [Accessed 10
05 2021].
[23] L. Jiao, F. Zhang, F. Liu et al., "A Survey of Deep Learning-based
Object Detection," IEEE Access, vol. 7, no. 2169-3536, p. 128837–
128868, 2019.
[24] H. Rezatofighi, N. Tsoi, J. Gwak, A. Sadeghian, I. Reid and S.
Savarese, "Generalized Intersection over Union," The IEEE
Conference on Computer Vision and Pattern Recognition (CVPR),
2019.
[25] "ArcGIS Pro," esri, [Online]. Available:
https://pro.arcgis.com/en/pro-app/latest/tool-reference/image-
analyst/non-maximum-suppression.htm. [Accessed 01 05 2021].
[26] S. Basu, D. Bonafilia, J. Gill, D. Kirsanov and D. Yang, "Mapping
roads through deep learning and weakly supervised training,"
Facebook AI, 23 July 2019. [Online]. Available:
https://ai.facebook.com/blog/mapping-roads-through-deep-
learning-and-weakly-supervised-training/. [Accessed 09 March
2021].
[27] C. S. Lee, A. J. Tyring, Y. Wu, S. Xiao and A. S. Rokem, "Generating
retinal flow maps from structural optical coherence tomography
with artificial intelligence," Scientific Reports, vol. 9, no. 1, pp. 2045-
2322, 2019.
[28] R. Anantharaman, M. Velazquez and Y. Lee, "Utilizing Mask R-
CNN for Detection and Segmentation of Oral Diseases," in IEEE
International Conference on Bioinformatics and Biomedicine, 2018.
[29] M. Maltamo, E. Næsset and J. Vauhkonen, "Segmentation of Forest
to Tree Objects," Springer, Netherlands, 2014.
[30] D. Saveh, H. Jansson, L. Thunberg and J. Ramezanja, "Automating
Forest stands with Image Segmentation," Mid Sweden University,
2020.
[31] B. D. Ville, Microsoft Data Mining, Digital press, 2001.
[32] J. Solarte, "A Proposed Data Mining Methodology and its
Application to Industrial Engineering," The University of
Tennessee, Knoxville, 2002.
[33] G. Lv, K. Wen, Z. Wu, X. Jin, H. An and J. He, "Nuclei R-CNN:
Improve Mask R-CNN for Nuclei Segmentation," IEEE 2nd
International Conference on Information Communication and
Signal Processing (ICICSP), Weihai, China, 2019.
[34] C. Gilman, S. G. Letcher, R. M. Fincher, A. I. Perez, T. W. Madell
and A. L. F. Alex, "Recovery of floristic diversity and basal area in
natural forest regeneration and planted," BIOTROPICA, vol. 48, no.
6, pp. 798-808, 2016.
[35] P. Edwards, Timber Measurement: A Field Guide, Edinburgh:
Crown, 1998.
[36] Python, "What is python? executive summary," python, [Online].
Available: https://www.python.org/doc/essays/blurb/. [Accessed 1
March 2021].
[37] "Github," Matterport, Inc , [Online]. Available:
https://github.com/matterport. [Accessed 01 03 2021].
[38] T.-Y. Lin, P. Dollár, R. Girshick, K. He, B. Hariharan and S.
Belongie, "Feature Pyramid Networks for Object Detection,"
Facebook AI Research (FAIR), 2017.
[39] "DeepAI," [Online]. Available: https://deepai.org/machine-
learning-glossary-and-terms/epoch. [Accessed 01 05 2021].
Figure 28 - Continuation of the proposed method of Solarte [32]
7.4 Different stands
7.4.1 6 stands versus one stand on the computer
0.73 0.18
0.56 0.06
0.57 0.09
0.63 0.05
0.72 0.00
0.58 0.03
0.61 0.02
0.73 0.46
0.76 0.10
0.72 0.00
0.71 0.03
0.6 0.20
0.8 0.41
0.68 0.44
0.53 0.05
0.77 0.62
0.63 0.00
0.64 0.12
0.77 0.15
0.55 0.14
0.76 0.12
0.52 0.18
0.8 0.01
0.61 0.02
0.6 0.07
0.52 0.14
0.79 0.25
0.57 0.07
0.71 0.05
0.65 0.01
0.72 0.02
0.59 0.09
0.6 0.06
0.75 0.23
0.58 0.47
0.51 0.27
0.65 0.3
0.61 0.26
0.54 0.05
0.57 0.07
0.75 0.42
0.57 0.11
1.00 0.45
0.82 0.73
0.97 0.7
0.62 0.31
1.00 0.20
0.83 0.40
1.00 0.54
0.88 0.24
0.97 0.76
0.99 0.25
7.5.4 42 areas with a 70/30 split
Score IoU
0.97 0.36
0.59 0.32
0.99 0.44
0.62 0.73
0.95 0.66
0.84 0.38
1 0.18
1 0.6
0.65 0.25
0.99 0.8
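Averaging the IoU column of this table reproduces the roughly 47% average IoU reported for the 70/30 split (a quick check, assuming these ten rows are the full run):

```python
# IoU column from the 70/30 run above
ious = [0.36, 0.32, 0.44, 0.73, 0.66, 0.38, 0.18, 0.60, 0.25, 0.80]
avg = sum(ious) / len(ious)
print(f"average IoU = {avg:.0%}")  # average IoU = 47%
```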