AI for QC: Landsat and roadmap towards Sentinel-2
TRANSCRIPT
VH-RODA Workshop 2021 | 20-23 April 2021 | Slide 1
AI for QC: Landsat and roadmap
towards Sentinel-2
Kevin Halsall
Contents
▪ Introduction
▪ Background
▪ Data and Tooling
▪ Machine Learning model development for Landsat
▪ Supervised
▪ Semi-supervised
▪ Future Development Roadmap – towards Sentinel-2
Introduction
Kevin Halsall
Project Manager & Ease QC Product Owner, Telespazio UK
Telespazio UK
15+ years' experience performing QC assessments on EO data
Prime contractor for the IDEAS-QA4EO service for ESRIN, assessing ESA’s EO data
Ease QC
Internal programme within Telespazio UK applying Machine Learning techniques to EO data QC
Background
▪ Traditional EO Data Quality Control activities consist of:
▪ Automated checks (applied to whole dataset)
▪ Detailed human observations (subset of data)
▪ EO data volumes increasing year on year
▪ More satellites
▪ More complex data
▪ Funding and resources cannot increase at the same pace
▪ Could ML techniques support QC assessments to keep up with increases and/or improve upon existing assessment activities?
Data & Tooling
▪ Development of an ML model can be idealised as a 5-step process
▪ Data Preparation is often the most effort-intensive step
▪ If you want a good model you need a lot of good data
[Diagram: five numbered steps (1 to 5). Labels as extracted: Get Data; Clean, Prepare & Manipulate Data; Train Model; Improve; Test Data]
Data & Tooling
▪ Ease QC activities initially applied to Landsat data
▪ Driven by a need for lots of clearly labelled data
▪ QA4EO service undertaking assessment of Landsat 1-5 reprocessed data
▪ 600,000+ data products
▪ Allowed Ease QC team to source suitable, trustworthy, labelled data for training in-house
▪ Sourcing appropriate data can often be a major issue
▪ Much of the effort of this phase went to the development of a tool
▪ Support the labelling of data and definition of training datasets
▪ Integrate the activities of the QC engineers and ML developers
Data & Tooling
▪ “QCOLT” software application developed to support the project; enables:
▪ QC engineers to assess and label data
▪ Included features to make QC assessments more efficient
▪ ML developers to
▪ Define and select suitable training datasets based on assessments
▪ View results of models once applied to the data
▪ QC engineers can assess flagged data
▪ ML models can be re-developed based on new assessments
▪ Tightly integrated process
▪ Does not itself do any ‘machine learning’
▪ Supports all elements of the cycle
“Quality Control Optical Learning Tool”
Data & Tooling
▪ Customised GUI permitting:
▪ visual inspection of the data
▪ inspection of metadata & automated QC check information
▪ anomaly assignment
Machine Learning Model Development
▪ First proof of concept ML activity focused on ‘Supervised’ model development
▪ Model trained to detect a single anomaly type
▪ Anomaly criteria:
▪ #1: Visible in the product image
▪ #2: Deterministic detection unfeasible
▪ Convolutional Neural Network type used
▪ Training data consists of examples of the anomaly and those of ‘good’ data
▪ Only limited number of anomalous examples
▪ ‘Chipping’ technique employed to increase data sample size
▪ 25 positive products → 3,686 synthesised data products
Supervised models
“Scan start” anomaly
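The slides do not say how the 'chipping' was implemented or how the 3,686 figure was reached. As a rough illustration only, the sketch below assumes chipping means cutting each product image into overlapping square tiles with a sliding window; the function name, chip size and stride are hypothetical choices, not values from the project.

```python
def chip_image(image, chip_size, stride):
    """Cut a 2-D image (list of rows) into overlapping square chips.

    Returns a list of (top, left, chip) tuples, one per window position.
    chip_size and stride are illustrative parameters, not project values.
    """
    n_rows = len(image)
    n_cols = len(image[0]) if n_rows else 0
    chips = []
    for top in range(0, n_rows - chip_size + 1, stride):
        for left in range(0, n_cols - chip_size + 1, stride):
            # Slice chip_size rows, then chip_size columns from each row.
            chip = [row[left:left + chip_size]
                    for row in image[top:top + chip_size]]
            chips.append((top, left, chip))
    return chips


# Toy 6x8 "image" whose pixel values encode their own position.
image = [[r * 10 + c for c in range(8)] for r in range(6)]
chips = chip_image(image, chip_size=4, stride=2)
print(len(chips), "chips from one image")
```

A smaller stride yields more, more heavily overlapping chips from the same product, which is one way a handful of positive examples could be expanded into a much larger training sample.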
Machine Learning Model Development
▪ Completed model infers a ‘soft classifier’ for each product assessed
▪ % probability of a product having the anomaly (not True/False)
▪ Scan Start anomaly model was run over 39,000 Landsat-3 products
▪ The 5% and 95% probability levels were selected to define 3 confidence levels
▪ 0-5% - anomaly free
▪ 95-100% - anomaly detected
▪ (5%-95%) classified as ‘undecided’
▪ Active Learning iterations
▪ Data reassessed
▪ Model retrained & improved
Supervised models results
Active Learning
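The three confidence levels above can be sketched as a simple triage step. This is a minimal illustration, assuming the model output is a probability between 0 and 1 per product; the function names and product IDs are hypothetical, and the treatment of values exactly at 5% or 95% follows the ranges as written on the slide.

```python
def confidence_level(probability, low=0.05, high=0.95):
    """Map a soft-classifier probability to one of three confidence levels."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    if probability <= low:
        return "anomaly free"
    if probability >= high:
        return "anomaly detected"
    return "undecided"


def triage(scores, low=0.05, high=0.95):
    """Group product IDs by confidence level."""
    buckets = {"anomaly free": [], "anomaly detected": [], "undecided": []}
    for product_id, probability in scores.items():
        buckets[confidence_level(probability, low, high)].append(product_id)
    return buckets


# Hypothetical product IDs and model outputs.
scores = {"P001": 0.01, "P002": 0.99, "P003": 0.40, "P004": 0.05, "P005": 0.95}
buckets = triage(scores)
print(buckets["undecided"])
```

In an active learning iteration, the 'undecided' bucket is the natural candidate for reassessment by a QC engineer, with the new labels then fed back into retraining.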
Machine Learning Model Development
▪ Model is trained to recognise ‘normal’ data & detect anomalous data
▪ Potentially able to detect multiple types of anomaly
▪ Anomalies detected by the model are not classified
▪ Require further identification by the QC engineer
▪ Much more complex than a supervised model
▪ Sub-divided into two separate workflows
Semi-supervised “model”
▪ Workflow A: Complexity reduction
▪ Subdivide dataset into clusters
▪ Uses pre-existing ML models
▪ Resnet50
▪ K-means
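The slide names ResNet50 and K-means but does not say how they are combined. One common pattern, assumed here, is to use a pre-trained network as a feature extractor and then cluster the resulting feature vectors. The sketch below covers only the clustering half, using a small pure-Python K-means (Lloyd's algorithm) on toy 2-D vectors that stand in for image embeddings; all names and values are illustrative.

```python
import math
import random


def kmeans(vectors, k, iterations=20, seed=0):
    """Minimal K-means: returns (labels, centroids).

    vectors stands in for per-product feature vectors (e.g. embeddings
    from a pre-trained network); this is an assumption, not the project's code.
    """
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each vector joins its nearest centroid.
        labels = [
            min(range(k), key=lambda j: math.dist(v, centroids[j]))
            for v in vectors
        ]
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [v for v, label in zip(vectors, labels) if label == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Two well-separated toy groups standing in for two scene types.
features = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2],
            [5.0, 5.1], [5.2, 4.9], [4.9, 5.0]]
labels, centroids = kmeans(features, k=2)
print(labels)
```

Splitting the dataset this way means each downstream detector only has to model 'normal' for one cluster of similar-looking products.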
▪ Workflow B: Binary Classifier
▪ Detects anomalies in each cluster separately
▪ Convolutional Neural Network Auto-Encoder
▪ Support Vector Machine
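The slide does not describe how the auto-encoder and the Support Vector Machine are combined, so the sketch below does not attempt to reproduce them. As a deliberately simplified stand-in, it shows only the idea of detecting anomalies in each cluster separately: a per-cluster threshold is learned from scores of 'normal' data (for example, a reconstruction error), and a product is flagged when its score exceeds the threshold for its own cluster. The cluster names, scores and the 3-sigma rule are all hypothetical.

```python
import statistics


def fit_cluster_thresholds(normal_scores_by_cluster, n_sigma=3.0):
    """Learn one anomaly threshold per cluster from 'normal' scores only."""
    thresholds = {}
    for cluster_id, scores in normal_scores_by_cluster.items():
        mean = statistics.fmean(scores)
        spread = statistics.pstdev(scores)
        thresholds[cluster_id] = mean + n_sigma * spread
    return thresholds


def is_anomalous(cluster_id, score, thresholds):
    """Flag a product using the threshold of the cluster it belongs to."""
    return score > thresholds[cluster_id]


# Hypothetical anomaly scores for 'normal' products in two clusters.
normal_scores = {
    "desert": [0.10, 0.12, 0.11, 0.09],
    "coast": [0.30, 0.35, 0.32, 0.33],
}
thresholds = fit_cluster_thresholds(normal_scores)

# The same score is unusual for one cluster and ordinary for another.
print(is_anomalous("desert", 0.25, thresholds))
print(is_anomalous("coast", 0.25, thresholds))
```

The point of the per-cluster split is visible in the usage lines: what counts as anomalous depends on what is normal for that kind of scene.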
Machine Learning Model Development
▪ Promising preliminary results
▪ Highly dependent upon the content of the training data sets
▪ Good performance achieved on anomalies highly represented in the training data
▪ Model demonstrated success at detecting ‘Scan start’ anomalies
▪ Supervised model enabled identification of 20k+ examples
▪ Less success with anomalies that are less well represented
▪ Improving the performance of the model will require:
▪ Obtaining more labelled data
▪ Redevelopment of the training network and model architecture
Semi-supervised model results
Development Roadmap
▪ Focus on Landsat driven by availability of accurately labelled training data
▪ Similarities between Landsat and Sentinel-2 to be explored
▪ More data for training
▪ One model applicable to both (+ other similar instruments)
▪ Updates required:
▪ Existing models/processes need to be modified for new data
▪ Need to access a sufficient amount of training data (in the cloud, e.g. DIAS)
▪ Training data needs to be appropriately labelled
▪ Incorporate external QC analyses into ML development process
▪ Other improvements
▪ Incorporate full band data
▪ Assessments performed on RGB data to reduce complexity
Summary
▪ The quality and quantity of the training data are extremely important
▪ Required in sufficient volumes
▪ Requires accurate labelling
▪ Effort expended in preparing the data cannot be overstated
▪ Techniques/tools to support this part of the process very important
▪ Supervised model success demonstrates feasibility of using ML for EO data QC
▪ Dedicated effort required for detection of single anomaly type
▪ Results can be very effective – particularly for bulk assessments
▪ Semi-supervised ‘model’ has potential to be more generally applicable
▪ Detects multiple anomalies, potentially across similar instrument types
▪ Highly complex model requiring more development and data
Thank you for your attention
Kevin Halsall
Telespazio UK