
Image Classification: Accuracy Assessment

Lecture 10

Prepared by R. Lathrop 11/99

Updated 3/06

Readings:

ERDAS Field Guide, 5th ed., Ch. 6, pp. 234-260

Where in the World?

Learning objectives
• Remote sensing science concepts
– Rationale and technique for post-classification smoothing
– Errors of omission vs. commission
– Accuracy assessment: sampling methods and measures
– Fuzzy accuracy assessment
• Math concepts
– Calculating accuracy measures: overall accuracy, producer’s accuracy, user’s accuracy and kappa coefficient
• Skills
– Interpreting the contingency matrix and accuracy assessment measures

Post-classification “smoothing”
• Most classifications have a problem with “salt and pepper”, i.e., single pixels or small groups of misclassified pixels, because they are “point” operations that classify each pixel independently of its neighbors
• “Salt and pepper” may be real. Whether to filter/eliminate it depends on the chosen minimum mapping unit (MMU): does it equal a single pixel or an aggregation of pixels?
• Majority filtering: replaces the central pixel with the majority class in a specified neighborhood (e.g., a 3 x 3 window); con: alters edges
• Eliminate: clumps “like” pixels and replaces clumps under a size threshold with the majority class in the local neighborhood; pro: doesn’t alter edges

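Example: Majority filtering (example from ERDAS IMAGINE Field Guide, 5th ed.)

Input:
6 6 6 6 6
2 6 6 2 6
2 6 2 6 6
2 8 2 6 6
2 2 2 2 2

3x3 window around the center pixel (row 3, column 3):
6 6 2
6 2 6
8 2 6

Class 6 is the majority in the window (5 of 9 pixels), so the center pixel is recoded from 2 to 6.

Output:
6 6 6 6 6
2 6 6 6 6
2 2 6 6 6
2 2 2 6 6
2 2 2 2 2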
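The majority filter can be sketched in a few lines of Python and run on the ERDAS example grid. This is a minimal illustration, not the ERDAS IMAGINE implementation: the function name `majority_filter`, copying border pixels unchanged, keeping the center class when it ties for the majority, and reading every window from the unfiltered input are all assumptions. It recodes the highlighted center pixel from 2 to 6; software that filters in place or breaks ties differently may produce different values elsewhere in the grid.

```python
from collections import Counter


def majority_filter(grid, size=3):
    """Replace each interior pixel with the majority class of its size x size window.

    Assumptions for this sketch: border pixels are copied unchanged, every window
    is read from the unfiltered input, and a center class that ties for the
    majority is kept.
    """
    half = size // 2
    rows, cols = len(grid), len(grid[0])
    out = [row[:] for row in grid]  # copy so the input grid is not modified
    for r in range(half, rows - half):
        for c in range(half, cols - half):
            counts = Counter(
                grid[i][j]
                for i in range(r - half, r + half + 1)
                for j in range(c - half, c + half + 1)
            )
            top = max(counts.values())
            if counts[grid[r][c]] < top:  # center class is not (tied for) the majority
                out[r][c] = counts.most_common(1)[0][0]
    return out


# Input grid from the ERDAS IMAGINE Field Guide example
grid = [
    [6, 6, 6, 6, 6],
    [2, 6, 6, 2, 6],
    [2, 6, 2, 6, 6],
    [2, 8, 2, 6, 6],
    [2, 2, 2, 2, 2],
]
smoothed = majority_filter(grid)
print(smoothed[2][2])  # center of the highlighted window: 2 -> 6
```

Reading every window from the original input keeps the result independent of the order in which pixels are visited.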

Example: reduce single-pixel “salt and pepper”

Comparing input and output, isolated pixels such as the lone class 8 pixel are absorbed into the surrounding class.

Example: altered edge

The majority filter also shifts the edge between classes 2 and 6; e.g., the third row changes from 2 6 2 6 6 to 2 2 6 6 6.

Example: ERDAS “Eliminate” (no altered edge)

Input:
6 6 6 6 6
2 6 6 2 6
2 6 2 6 6
2 8 2 6 6
2 2 2 2 2

Output:
6 6 6 6 6
2 6 6 2 6
2 6 2 6 6
2 2 2 6 6
2 2 2 2 2

The small clump (the single class 8 pixel) is “eliminated” and replaced by the surrounding majority class 2; the edge between classes 2 and 6 is unchanged.
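The idea behind “Eliminate” can be sketched as clump labeling followed by recoding of small clumps. This is a rough illustration, not ERDAS’s actual algorithm: the names `find_clumps` and `eliminate`, 8-connectivity, and the `min_size` threshold are hypothetical choices. On the example grid, only the single class 8 pixel forms a clump below the threshold, so the result equals the Eliminate output grid.

```python
from collections import Counter, deque

# 8-connected neighborhood (an assumption; 4-connectivity is another option)
NEIGHBORS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]


def find_clumps(grid):
    """Group pixels into clumps of identical, 8-connected class values."""
    rows, cols = len(grid), len(grid[0])
    seen = [[False] * cols for _ in range(rows)]
    clumps = []
    for r in range(rows):
        for c in range(cols):
            if seen[r][c]:
                continue
            seen[r][c] = True
            queue, clump = deque([(r, c)]), []
            while queue:  # breadth-first flood fill over same-class neighbors
                i, j = queue.popleft()
                clump.append((i, j))
                for di, dj in NEIGHBORS:
                    ni, nj = i + di, j + dj
                    if (0 <= ni < rows and 0 <= nj < cols and not seen[ni][nj]
                            and grid[ni][nj] == grid[r][c]):
                        seen[ni][nj] = True
                        queue.append((ni, nj))
            clumps.append(clump)
    return clumps


def eliminate(grid, min_size=2):
    """Recode clumps smaller than min_size to the majority class of the pixels bordering them."""
    rows, cols = len(grid), len(grid[0])
    out = [row[:] for row in grid]
    for clump in find_clumps(grid):
        if len(clump) >= min_size:
            continue  # large clumps, and therefore class edges, are left untouched
        members = set(clump)
        border = Counter(
            grid[i + di][j + dj]
            for i, j in clump
            for di, dj in NEIGHBORS
            if 0 <= i + di < rows and 0 <= j + dj < cols and (i + di, j + dj) not in members
        )
        if border:
            new_class = border.most_common(1)[0][0]
            for i, j in clump:
                out[i][j] = new_class
    return out


grid = [
    [6, 6, 6, 6, 6],
    [2, 6, 6, 2, 6],
    [2, 6, 2, 6, 6],
    [2, 8, 2, 6, 6],
    [2, 2, 2, 2, 2],
]
for row in eliminate(grid):
    print(*row)
```

Because only whole clumps below the size threshold are recoded, pixels along the boundary between large clumps never change, which is why edges are preserved.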

Accuracy Assessment
• Always assess the accuracy of the final thematic map! How good is it?
• Various techniques assess the “accuracy” of the classified output by comparing the “true” identity of land cover derived from reference data (observed) with the classified identity (predicted) for a random sample of pixels
• The accuracy assessment is the means of communicating map quality to the user of the map and should be included in the metadata documentation

Accuracy Assessment
• R.S. classification accuracy is usually assessed and communicated through a contingency table, sometimes referred to as a confusion matrix
• Contingency table: m x m matrix where m = # of land cover classes
– Columns: usually represent the reference data
– Rows: usually represent the remotely sensed classification results (i.e., thematic or information classes)
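To make the row/column convention concrete, here is a minimal Python sketch that tallies a contingency matrix from paired samples, with rows as classified classes and columns as reference classes. The function name `contingency_matrix` and the six-pixel sample are hypothetical.

```python
def contingency_matrix(classified, reference, classes):
    """Rows = classified (predicted) class, columns = reference (observed) class."""
    if len(classified) != len(reference):
        raise ValueError("classified and reference samples must be paired")
    index = {cls: k for k, cls in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for predicted, observed in zip(classified, reference):
        matrix[index[predicted]][index[observed]] += 1
    return matrix


# Hypothetical sample of six pixels
classes = ["forest", "water", "urban"]
classified = ["forest", "forest", "water", "urban", "urban", "forest"]
reference = ["forest", "water", "water", "urban", "forest", "forest"]
for cls, row in zip(classes, contingency_matrix(classified, reference, classes)):
    print(cls, row)
```

Correct pixels accumulate on the diagonal; every off-diagonal count is an error of commission for its row class and an error of omission for its column class.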

Accuracy Assessment Contingency Matrix (rows = classified data, columns = reference data)

Class. Data \ Reference Data
            1.10  1.20  1.40  1.60  2.00  2.10  2.40  2.50  Row Total
1.10         109    11    14     0     0     0     1     1        136
1.20           2    82     2    13     0     0     0     0         99
1.40           3     4   123     0     0     0     0     0        130
1.60           2     1     0    22     0     0     1     1         27
2.00           0     0     0     3     5     0     0     0          8
2.10           0     0     0     0     0     9     0     0          9
2.40           0     2     1     0     0     0    74     0         77
2.50           0     1     0     0     0     1     4    41         47
Col Total    116   101   140    38     5    10    80    43        533

Accuracy Assessment
• Sampling approaches (random sampling reduces analyst bias):
– Simple random sampling: every pixel has an equal chance of being selected
– Stratified random sampling: the number of points is stratified to the distribution of thematic layer classes (larger classes get more points)
– Equalized random sampling: each class gets an equal number of random points
• Sample size: at least 30 samples per land cover class
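The three sampling approaches can be sketched with Python’s standard `random` module. The helper names, the (row, col) point format and the 10 x 10 test map are hypothetical; the stratified version rounds each class’s proportional share, so its total can differ slightly from the requested number of points.

```python
import random
from collections import defaultdict


def pixels_by_class(class_map):
    """Map each class value to the (row, col) positions it occupies."""
    groups = defaultdict(list)
    for r, row in enumerate(class_map):
        for c, cls in enumerate(row):
            groups[cls].append((r, c))
    return groups


def simple_random(class_map, n, rng):
    """Every pixel has an equal chance of selection."""
    everything = [(r, c) for r, row in enumerate(class_map) for c in range(len(row))]
    return rng.sample(everything, n)


def stratified_random(class_map, n, rng):
    """Points allocated in proportion to each class's share of the map."""
    groups = pixels_by_class(class_map)
    total = sum(len(g) for g in groups.values())
    points = []
    for members in groups.values():
        k = min(round(n * len(members) / total), len(members))
        points.extend(rng.sample(members, k))
    return points


def equalized_random(class_map, n_per_class, rng):
    """The same number of points in every class (capped at the class size)."""
    points = []
    for members in pixels_by_class(class_map).values():
        points.extend(rng.sample(members, min(n_per_class, len(members))))
    return points


# Hypothetical 10 x 10 map: 80 pixels of class 1, 20 of class 2
class_map = [[1] * 10 for _ in range(8)] + [[2] * 10 for _ in range(2)]
rng = random.Random(42)
strat = stratified_random(class_map, 50, rng)
print(sum(class_map[r][c] == 1 for r, c in strat), sum(class_map[r][c] == 2 for r, c in strat))  # 40 10
```

Calling `equalized_random(class_map, 30, rng)` would apply the 30-samples-per-class guideline, provided every class has at least 30 pixels.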

How good is good?
• How accurate should the classified map be?
• General rule of thumb is 85% accuracy
• Really depends on how much “risk” you are willing to accept if the map is wrong
• Are you more interested in the overall accuracy of the final map or in quantifying the ability to accurately identify and map individual classes?
• Which is more acceptable: overestimation or underestimation?

How good is good? Example
• USGS-NPS National Vegetation Classification Standard
• Horizontal positional locations meet National Map Accuracy Standards
• Thematic accuracy >80% per class
• Minimum mapping unit of 0.5 ha
• http://biology.usgs.gov/npsveg/aa/indexdoc.html

A whole set of field reference points can be developed using some form of random allocation, but because of travel/access constraints only a subset of the points is actually visited, resulting in a distribution that is not truly random.

Accuracy Assessment Issues

• What constitutes reference data?
- higher spatial resolution imagery (with visual interpretation)
- “ground truth”: GPSed field plots
- existing GIS maps

• Reference data can be polygons or points

Accuracy Assessment Issues
• Problem with “mixed” pixels: one option is to sample only homogeneous regions (e.g., 3x3 windows), but this introduces a subtle bias
• If smoothing was undertaken, accuracy should be assessed on that basis, i.e., at the scale of the mmu
• If a filter is used, this should be stated in the metadata
• Ideally, the % of the overall map that so qualifies should be quantified, e.g., 75% of the map is composed of homogeneous regions greater than 3x3 in size, so 75% of the map is assessed and 25% is not
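One way to quantify the share of a map that qualifies for homogeneous-window sampling is to flag every pixel whose 3x3 window contains a single class. This Python sketch uses that operationalization as an assumption (other definitions of “homogeneous region” are possible); the function names and the 5 x 5 test map are hypothetical.

```python
def homogeneous_centers(class_map, size=3):
    """True where the size x size window centered on a pixel holds a single class.
    Pixels too close to the map edge for a full window are marked False."""
    half = size // 2
    rows, cols = len(class_map), len(class_map[0])
    mask = [[False] * cols for _ in range(rows)]
    for r in range(half, rows - half):
        for c in range(half, cols - half):
            window = {class_map[i][j]
                      for i in range(r - half, r + half + 1)
                      for j in range(c - half, c + half + 1)}
            mask[r][c] = len(window) == 1
    return mask


def fraction_assessed(class_map, size=3):
    """Share of all map pixels that qualify as homogeneous sample locations."""
    mask = homogeneous_centers(class_map, size)
    return sum(map(sum, mask)) / (len(mask) * len(mask[0]))


# Hypothetical 5 x 5 map: one class except a single different pixel in the corner
class_map = [[1] * 5 for _ in range(5)]
class_map[0][0] = 2
print(f"{fraction_assessed(class_map):.0%} of map assessed")  # 32% of map assessed
```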

Errors of Omission vs. Commission

• Error of omission: pixels in class 1 erroneously assigned to class 2; from the class 1 perspective, these pixels should have been classified as class 1 but were omitted

• Error of commission: pixels in class 2 erroneously assigned to class 1; from the class 1 perspective, these pixels should not have been classified as class 1 but were included

Errors of Omission vs. Commission: from a Class 2 perspective

[Figure: overlapping histograms of Class 1 and Class 2; x-axis: digital number (0-255), y-axis: # of pixels]

Commission error: pixels in Class 1 erroneously assigned to Class 2

Omission error: pixels in Class 2 erroneously assigned to Class 1

Accuracy Assessment Measures
• Overall accuracy: divide the total correct (sum of the major diagonal) by the total number of sampled pixels; can be misleading, so individual categories should also be judged
• Producer’s accuracy: measure of omission error; total number correct in a category divided by the total # in that category as derived from the reference data (column total); measure of underestimation
• User’s accuracy: measure of commission error; total number correct in a category divided by the total # that were classified in that category (row total); measure of overestimation

Accuracy Assessment Contingency Matrix (rows = classified data, columns = reference data)

Class. Data \ Reference Data
            1.10  1.20  1.40  1.60  2.00  2.10  2.40  2.50  Row Total
1.10         308    23    12     1     0     1     3     0        348
1.20           2   279     9     2     0     0     2     1        295
1.40           0     1   372     0     1     1     4     0        379
1.60           0     1     0    26     0     0     0     0         27
2.00           0     0     1     0    10     0     2     5         18
2.10           1     0     2     0     2    93     1     0         99
2.40           3     1    12     0     0     1   176     1        194
2.50           1     0     0     0     0     1     1    48         51
ColTotal     315   305   408    29    13    97   189    55       1411

Accuracy Assessment Measures

Code  Land Cover Description                 Number Correct  Producer’s Accuracy  User’s Accuracy
1.10  Developed                              308             308/315 = 97.8%      308/348 = 88.5%
1.20  Cultivated/Grassland                   279             279/305 = 91.5%      279/295 = 94.6%
1.40  Forest/Scrub/Shrub                     372             372/408 = 91.2%      372/379 = 98.2%
1.60  Barren                                 26              26/29 = 89.7%        26/27 = 96.3%
2.00  Unconsolidated Shore                   10              10/13 = 76.9%        10/18 = 55.6%
2.10  Estuarine Emergent Wetland             93              93/97 = 95.9%        93/99 = 93.9%
2.40  Palustrine Wetland: Emergent/Forested  176             176/189 = 93.1%      176/194 = 90.7%
2.50  Water                                  48              48/55 = 87.3%        48/51 = 94.1%
Totals                                       1312

Overall Classification Accuracy = 1312/1411 = 93.0%
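The overall, producer’s and user’s accuracies in this example can be reproduced with a short Python sketch. The function name `accuracy_measures` and the nested-list layout (rows = classified, columns = reference, copied from the contingency matrix above) are assumptions for illustration.

```python
CLASSES = ["1.10", "1.20", "1.40", "1.60", "2.00", "2.10", "2.40", "2.50"]
# Rows = classified data, columns = reference data (example matrix)
MATRIX = [
    [308, 23, 12, 1, 0, 1, 3, 0],
    [2, 279, 9, 2, 0, 0, 2, 1],
    [0, 1, 372, 0, 1, 1, 4, 0],
    [0, 1, 0, 26, 0, 0, 0, 0],
    [0, 0, 1, 0, 10, 0, 2, 5],
    [1, 0, 2, 0, 2, 93, 1, 0],
    [3, 1, 12, 0, 0, 1, 176, 1],
    [1, 0, 0, 0, 0, 1, 1, 48],
]


def accuracy_measures(matrix):
    """Return (overall, producer's list, user's list) as fractions."""
    k = len(matrix)
    n = sum(map(sum, matrix))
    row_totals = [sum(row) for row in matrix]                              # classified totals
    col_totals = [sum(matrix[i][j] for i in range(k)) for j in range(k)]   # reference totals
    diagonal = [matrix[i][i] for i in range(k)]
    overall = sum(diagonal) / n
    producers = [d / c for d, c in zip(diagonal, col_totals)]  # omission side
    users = [d / r for d, r in zip(diagonal, row_totals)]      # commission side
    return overall, producers, users


overall, producers, users = accuracy_measures(MATRIX)
print(f"Overall accuracy: {overall:.1%}")  # Overall accuracy: 93.0%
for code, p, u in zip(CLASSES, producers, users):
    print(f"{code}: producer's {p:.1%}, user's {u:.1%}")
```

Class 2.00 shows why overall accuracy alone can mislead: the map is 93.0% correct overall, yet only 55.6% of pixels labeled Unconsolidated Shore are actually that class.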

Accuracy Assessment Measures
• Kappa coefficient: measures the difference between the observed agreement of two maps and the agreement that is contributed by chance alone
• A Kappa coefficient of 90% may be interpreted as a classification 90% better than would be expected by random assignment of classes, i.e., it achieves 90% of the possible improvement over chance agreement
• What’s a good Kappa? General range: K < 0.4: poor; 0.4 < K < 0.75: good; K > 0.75: excellent
• Allows for statistical comparisons between matrices (Z statistic); useful in comparing different classification approaches to objectively decide which gives the best results
• Alternative statistic: Tau coefficient

Kappa coefficient

$$\hat{K} = \frac{n \sum X_{ii} - \sum \left(X_{i+} \cdot X_{+i}\right)}{n^2 - \sum \left(X_{i+} \cdot X_{+i}\right)}$$

where SUM (Σ) = sum across all rows in the matrix
X_ii = diagonal element of row i
X_i+ = marginal row total (row i)
X_+i = marginal column total (column i)
n = # of observations

Takes into account the off-diagonal elements of the contingency matrix (errors of omission and commission)

Kappa coefficient: Example

SUM Xii = 308 + 279 + 372 + 26 + 10 + 93 + 176 + 48 = 1312

SUM (Xi+ * X+i) = (348*315) + (295*305) + (379*408) + (27*29) + (18*13) + (99*97) + (194*189) + (51*55) = 404,318

$$\hat{K} = \frac{1411(1312) - 404{,}318}{(1411)^2 - 404{,}318} = \frac{1{,}851{,}232 - 404{,}318}{1{,}990{,}921 - 404{,}318} = \frac{1{,}446{,}914}{1{,}586{,}603} = 0.912$$

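In summation notation, for a matrix with k classes:

$$\hat{K} = \frac{n \sum_{i=1}^{k} X_{ii} - \sum_{i=1}^{k} \left(X_{i+} \cdot X_{+i}\right)}{n^2 - \sum_{i=1}^{k} \left(X_{i+} \cdot X_{+i}\right)}$$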
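The Khat formula translates directly into Python. This sketch (function name `kappa` is hypothetical) reuses the example matrix, rows = classified and columns = reference, and reproduces the worked value of 0.912.

```python
# Contingency matrix from the example (rows = classified, columns = reference)
MATRIX = [
    [308, 23, 12, 1, 0, 1, 3, 0],
    [2, 279, 9, 2, 0, 0, 2, 1],
    [0, 1, 372, 0, 1, 1, 4, 0],
    [0, 1, 0, 26, 0, 0, 0, 0],
    [0, 0, 1, 0, 10, 0, 2, 5],
    [1, 0, 2, 0, 2, 93, 1, 0],
    [3, 1, 12, 0, 0, 1, 176, 1],
    [1, 0, 0, 0, 0, 1, 1, 48],
]


def kappa(matrix):
    """Khat = (n * SUM Xii - SUM(Xi+ * X+i)) / (n**2 - SUM(Xi+ * X+i))."""
    k = len(matrix)
    n = sum(map(sum, matrix))
    row_totals = [sum(row) for row in matrix]                             # Xi+
    col_totals = [sum(matrix[i][j] for i in range(k)) for j in range(k)]  # X+i
    diagonal_sum = sum(matrix[i][i] for i in range(k))                    # SUM Xii
    chance_term = sum(r * c for r, c in zip(row_totals, col_totals))      # SUM(Xi+ * X+i)
    return (n * diagonal_sum - chance_term) / (n ** 2 - chance_term)


print(round(kappa(MATRIX), 3))  # 0.912
```

A perfectly diagonal matrix gives Khat = 1.0, since the numerator and denominator then coincide.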


Accuracy Assessment Measures

Code  Land Cover Description                 Number Correct  Producer’s Accuracy (%)  User’s Accuracy (%)  Kappa
1.10  Developed                              308             97.8                     88.5                 .8520
1.20  Cultivated/Grassland                   279             91.5                     94.6                 .9308
1.40  Forest/Scrub/Shrub                     372             91.2                     98.2                 .9740
1.60  Barren                                 26              89.7                     96.3                 .9622
2.00  Unconsolidated Shore                   10              76.9                     55.6                 **
2.10  Estuarine Emergent Wetland             93              95.9                     93.9                 .9349
2.40  Palustrine Wetland: Emergent/Forested  176             93.1                     90.7                 .8929
2.50  Water                                  48              87.3                     94.1                 .9388
Totals                                       1312                                                          .9120

** Sample size for this land cover class too small (< 25) for a valid Kappa measure

Overall Classification Accuracy = 93.0%
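The per-class Kappa values in this table appear to match the conditional kappa computed from each class’s row (the user’s-accuracy side), (n·Xii - Xi+·X+i) / (n·Xi+ - Xi+·X+i). That formula is inferred from the numbers, not stated on the slides, and using the row total as the “sample size” for the < 25 check is likewise an assumption. The function name `conditional_kappa` is hypothetical.

```python
CLASSES = ["1.10", "1.20", "1.40", "1.60", "2.00", "2.10", "2.40", "2.50"]
# Contingency matrix from the example (rows = classified, columns = reference)
MATRIX = [
    [308, 23, 12, 1, 0, 1, 3, 0],
    [2, 279, 9, 2, 0, 0, 2, 1],
    [0, 1, 372, 0, 1, 1, 4, 0],
    [0, 1, 0, 26, 0, 0, 0, 0],
    [0, 0, 1, 0, 10, 0, 2, 5],
    [1, 0, 2, 0, 2, 93, 1, 0],
    [3, 1, 12, 0, 0, 1, 176, 1],
    [1, 0, 0, 0, 0, 1, 1, 48],
]
MIN_SAMPLES = 25  # threshold quoted in the table footnote


def conditional_kappa(matrix, i):
    """Per-class kappa for map class i, computed from its row:
    (n*Xii - Xi+*X+i) / (n*Xi+ - Xi+*X+i)."""
    n = sum(map(sum, matrix))
    row_total = sum(matrix[i])                 # Xi+
    col_total = sum(row[i] for row in matrix)  # X+i
    chance = row_total * col_total
    return (n * matrix[i][i] - chance) / (n * row_total - chance)


for i, code in enumerate(CLASSES):
    if sum(MATRIX[i]) < MIN_SAMPLES:  # assumption: sample size = row total
        print(f"{code}: sample too small for a valid kappa")
    else:
        print(f"{code}: {conditional_kappa(MATRIX, i):.4f}")
```

For example, class 1.10 gives 0.8520 and class 1.40 gives 0.9740, as in the table; class 2.00 (18 samples) is flagged instead of reported.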

Case Study: Multi-scale segmentation approach to mapping seagrass habitats using airborne digital camera imaging

Richard G. Lathrop¹, Scott Haag¹,², and Paul Montesano¹

¹Center for Remote Sensing & Spatial Analysis, Rutgers University, New Brunswick, NJ 08901-8551

²Jacques Cousteau National Estuarine Research Reserve, 130 Great Bay Blvd, Tuckerton, NJ 08087

Method> Field Surveys

All transect endpoints and individual check points were first mapped onscreen in the GIS.

Endpoints were then loaded into a GPS (±3 meters) for navigation on the water.

A total of 245 points were collected.

Method> Field Surveys

For each field reference point, the following data were collected:
• GPS location (UTM)
• Time
• Date
• SAV species presence/dominance: Zostera marina, Ruppia maritima or macroalgae
• Depth (meters)
• % cover (10% intervals), determined by visual estimation
• Blade height of the 5 tallest seagrass blades
• Shoot density (# of shoots per 1/9 m² quadrat, extracted and counted on the boat)
• Distribution (patchy/uniform)
• Substrate (mud/sand)
• Additional comments

Results> Accuracy Assessment

The resulting maps were compared with the 245 field reference points.

All 245 reference points were used to support the interpretation in some fashion, so they cannot be considered a truly independent validation.

The overall accuracy was 83% and the Kappa statistic was 56.5%, which can be considered a moderate degree of agreement between the two data sets.

GIS Map \ Reference    Seagrass Absent   Seagrass Present   User’s Accuracy
Seagrass Absent        67                32                 68%
Seagrass Present       10                136                93%
Producer’s Accuracy    87%               81%                83%

Results> Accuracy Assessment

The resulting maps were also compared with an independent set of 41 bottom sampling points collected as part of a seagrass-sediment study conducted during the summer of 2003 (Smith and Friedman, 2004).

The overall accuracy was 70.7% and the Kappa statistic was 43%, which can be considered a moderate degree of agreement between the two data sets.

GIS Map \ Reference    Seagrass Absent   Seagrass Present   User’s Accuracy
Seagrass Absent        14                3                  82%
Seagrass Present       9                 15                 62%
Producer’s Accuracy    61%               83%                71%

SAV Accuracy Assessment Issues

• Matching spatial scale of field reference data with scale of mapping

• Ensuring comparison of “apples to apples”

• Spatial accuracy of “ground truth” point locations

• Temporal coincidence of “ground truth” and image acquisition

“Fuzzy” Accuracy Assessment
• “Real world” is messy; natural vegetation communities are a continuum of states, often with one grading into the next
• R.S. classified maps generally break up land cover/vegetation into discrete either/or classes
• How to quantify this messy world? R.S. classified maps still have some error while having great utility
• Fuzzy accuracy assessment: doesn’t quantify errors as binary correct or incorrect but attempts to evaluate the severity of the error

“Fuzzy” Accuracy Assessment
• Fuzzy rating: the severity of error, or conversely the similarity between map classes, is defined from a user standpoint
• Fuzzy ratings can be developed quantitatively based on the deviation from a defined class, expressed as a % difference (i.e., within +/- so many %)
• Fuzzy set matrix: the fuzzy rating between each map class and every other class is compiled into a fuzzy set matrix

For more info, see: Gopal & Woodcock, 1994. PERS: 181-188

“Fuzzy” Accuracy Assessment: rating levels

Level  Description

5 Absolutely right: Exact match

4 Good: minor differences; species dominance or composition is very similar

3 Acceptable Error: mapped class does not match; types have structural or ecological similarity or similar species

2 Understandable but wrong: general similarity in structure but species/ecological conditions are not similar

1 Absolutely wrong: no conditions or structural similarity

http://biology.usgs.gov/npsveg/fiis/aa_results.pdf

http://www.fs.fed.us/emc/rig/includes/appendix3j.pdf


“Fuzzy” Accuracy Assessment
• Each user could redefine the fuzzy set matrix on an application-by-application basis to determine what percentage of each map class is acceptable and the magnitude of the errors within each map class
• Traditional map accuracy measures can be calculated at different levels of error:
– Exact: only level 5 (MAX)
– Acceptable: levels 5, 4, 3 (RIGHT)
• Example from USFS:

Label  #Sites  MAX (5 only)  RIGHT (3, 4, 5)
CON    88      71 (81%)      82 (93%)
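The MAX and RIGHT measures are simple counts over the fuzzy rating of the map label at each reference site. In this Python sketch the function name `fuzzy_accuracy` is hypothetical, and the split of the 88 CON sites across levels 1-4 is invented for illustration: only the totals (71 at level 5, 82 at levels 3-5) come from the USFS example.

```python
def fuzzy_accuracy(ratings):
    """ratings: fuzzy rating (1-5) of the map label at each reference site.
    MAX counts only exact matches (level 5); RIGHT counts levels 3, 4 and 5."""
    n = len(ratings)
    exact = sum(1 for r in ratings if r == 5)
    acceptable = sum(1 for r in ratings if r >= 3)
    return exact / n, acceptable / n


# Hypothetical ratings for 88 CON sites: 71 at level 5, 11 at levels 3-4, 6 below 3
ratings = [5] * 71 + [4] * 6 + [3] * 5 + [2] * 4 + [1] * 2
max_acc, right_acc = fuzzy_accuracy(ratings)
print(f"MAX {max_acc:.0%}, RIGHT {right_acc:.0%}")  # MAX 81%, RIGHT 93%
```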

“Fuzzy” Accuracy Assessment: example from USFS
Confusion matrix based on levels 3, 4, 5 as correct

Label  Sites  CON  MIX  HDW  SHB  HEB  NFO  Total
CON    88     X    0    1    5    0    0    6
MIX    14     2    X    1    1    0    0    4
HDW    6      1    1    X    0    0    0    2
SHB    8      1    0    0    X    0    0    1
HEB    1      0    0    0    1    X    0    1
NFO    4      3    0    0    3    0    X    6
Total  121    7    1    2    10   0    0    20


“Fuzzy” Accuracy Assessment
• Ability to evaluate the magnitude or seriousness of errors
• Difference table: tallies the error within each map class by magnitude; the error magnitude is the difference between the fuzzy rating of the map class at each ground reference point and the highest rating assigned to any other possible map class
• All points that are exact matches have Difference values >= 0; all mismatches are negative. Values of -1 to 4 generally correspond to correct map labels. Values of -2 to -4 correspond to map errors, with -4 representing the most serious error

“Fuzzy” Accuracy Assessment: Difference Table: example from USFS

                 # Mismatches          # Matches
Label  Sites   -4  -3  -2  -1      0   1   2   3   4
CON    88       4   2   0  11      3   0  12  23  33

Higher positive values indicate that pure conditions are well mapped while lower negative values show pure conditions to be poorly mapped. Mixed or transitional conditions, where a greater number of class types are likely to be considered acceptable, will fall more in the middle
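The Difference value for one reference site is the fuzzy rating of the map label minus the highest rating given to any other class; tallying these values by map label builds the Difference table. In this Python sketch the function name `difference_value` and the three sites and their ratings are hypothetical.

```python
from collections import Counter


def difference_value(site_ratings, map_label):
    """Fuzzy rating of the map label minus the highest rating of any other class.
    >= 0: the map label is (tied for) the best class; < 0: another class fits better."""
    best_other = max(r for cls, r in site_ratings.items() if cls != map_label)
    return site_ratings[map_label] - best_other


# Hypothetical reference sites: (ratings for every class, label on the map)
sites = [
    ({"CON": 5, "MIX": 3, "SHB": 2}, "CON"),
    ({"CON": 4, "MIX": 4, "SHB": 1}, "CON"),
    ({"CON": 2, "MIX": 3, "SHB": 5}, "CON"),
]
table = Counter(difference_value(ratings, label) for ratings, label in sites)
print(sorted(table.items()))  # [(-3, 1), (0, 1), (2, 1)]
```

With ratings on a 1-5 scale, Difference values always fall between -4 and 4, matching the columns of the table above.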

“Fuzzy” Accuracy Assessment
• Ambiguity table: tallies the map classes that characterize a reference site as well as the actual map label does
• Useful in identifying subtle confusion between map classes and may be useful in identifying additional map classes to be considered
• Example from USFS:

Label  Sites  CON  MIX  HDW  SHB  HEB  NFO  Total
CON    88     X    11   6    15   0    0    32

15 out of 88 reference sites mapped as conifer could have been equally well labeled as shrub

Alternative Ways of Quantifying Accuracy: Ratio Estimators

• Method of statistically adjusting for over- or underestimation

• Randomly allocate “test areas”, determine area from map and reference data

• Ratio estimation uses the ratio of Reference/Map area to adjust the mapped area estimate

• Uses the estimate of the variance to develop confidence intervals for land cover type area

Shiver & Borders, 1996. Sampling Techniques for Forest Resource Inventory. Wiley, New York, NY, pp. 166-169
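A minimal Python sketch of a ratio estimator for one land cover type, assuming one common textbook form: R = sum(reference) / sum(map) over the test areas, adjusted area = R × total mapped area, and var(R) ≈ (1 - n/N) · s² / (n · mean_map²), where s² is the residual variance about y = R·x. The exact variance form in Shiver & Borders may differ. The function name, the normal-approximation z = 1.96 (a t value suits few test areas better), and the test-area numbers are hypothetical.

```python
import math


def ratio_estimate(ref_areas, map_areas, total_mapped, n_possible=None, z=1.96):
    """Ratio estimate of the true area of one land cover type.

    ref_areas / map_areas: area in each randomly allocated test area from the
    reference data and from the map; total_mapped: total mapped area of the type;
    n_possible: number of possible test areas (for the finite population correction).
    Returns (R, adjusted_total, ci_half_width)."""
    n = len(ref_areas)
    if n < 2 or n != len(map_areas):
        raise ValueError("need at least two paired test areas")
    R = sum(ref_areas) / sum(map_areas)  # Reference/Map ratio
    mean_map = sum(map_areas) / n
    s2 = sum((y - R * x) ** 2 for y, x in zip(ref_areas, map_areas)) / (n - 1)
    fpc = 1.0 if n_possible is None else 1.0 - n / n_possible
    var_R = fpc * s2 / (n * mean_map ** 2)
    return R, R * total_mapped, z * math.sqrt(var_R) * total_mapped


# Hypothetical test areas (acres) and total mapped area
R, adjusted, half_width = ratio_estimate([9.0, 15.0, 24.0], [12.0, 18.0, 30.0], 1000.0)
print(f"R = {R:.2f}; adjusted area = {adjusted:.0f} +/- {half_width:.0f} acres")
```

Here the map overestimates the class (R < 1), so the adjusted area is smaller than the mapped area, the same direction as the Transitional/Barren example that follows.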

Example: NJ 2000 Land Use Update: comparison of urban/transitional land use as determined by photo-interpretation of 1 m B&W photography vs. 10 m SPOT PAN

[Figure: 1 m B&W photo vs. 10 m SPOT PAN image]

Comparison of Land Use between Reference Imagery & SPOT: Urban & Transitional
[Figure: scatter plot, one point per tile; x-axis: SPOT (acres), y-axis: Reference Imagery (acres); with 1-to-1 line]

Below the 1-to-1 line: overestimate
Above the 1-to-1 line: underestimate

Example: NJ Land Use Change

Land Use Change Category   Mapped Estimate (acres)   Statistically Adjusted Estimate with 95% CI (acres)
Urban                      73,191                    77,941 +/- 17,922
Transitional/Barren        20,861                    16,082 +/- 7,053
Total Urban & Barren       94,052                    89,876 +/- 16,528

Case Study: Sub-pixel Unmixing

[Figure: 30 m TM pixel grid on IKONOS image]

Urban/suburban mixed pixels: varying proportions of developed surface, lawn and trees

Objective: sub-pixel unmixing

[Figure panels: false color composite image (R: forest, G: lawn, B: impervious surface (IS)); impervious surface estimation; grass estimation; woody estimation]
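For readers unfamiliar with unmixing, here is a generic sketch of constrained linear spectral unmixing with three endmembers (e.g., impervious, grass, tree), assuming “LMM” in this case study refers to a linear mixture model. It is not the method used in the study, and it does not represent SOM-LVQ. The function name `unmix3` and all endmember reflectances are hypothetical. Fractions are forced to sum to one, but no non-negativity constraint is applied.

```python
def unmix3(pixel, e1, e2, e3):
    """Least-squares fractions (f1, f2, f3) with f1 + f2 + f3 = 1 such that
    pixel ~ f1*e1 + f2*e2 + f3*e3. Substituting f3 = 1 - f1 - f2 gives
    pixel - e3 = f1*(e1 - e3) + f2*(e2 - e3), solved via the 2 x 2 normal equations."""
    def dot(u, v):
        return sum(x * w for x, w in zip(u, v))

    y = [p - q for p, q in zip(pixel, e3)]
    a = [p - q for p, q in zip(e1, e3)]
    b = [p - q for p, q in zip(e2, e3)]
    aa, ab, bb = dot(a, a), dot(a, b), dot(b, b)
    ay, by = dot(a, y), dot(b, y)
    det = aa * bb - ab * ab
    if abs(det) < 1e-12:
        raise ValueError("endmember spectra are (nearly) collinear")
    f1 = (ay * bb - ab * by) / det  # Cramer's rule
    f2 = (aa * by - ab * ay) / det
    return f1, f2, 1.0 - f1 - f2


# Hypothetical 4-band reflectance endmembers
impervious = [0.30, 0.28, 0.26, 0.25]
grass = [0.05, 0.08, 0.06, 0.45]
tree = [0.03, 0.05, 0.03, 0.30]
mixed = [0.5 * i + 0.3 * g + 0.2 * t for i, g, t in zip(impervious, grass, tree)]
print([round(f, 3) for f in unmix3(mixed, impervious, grass, tree)])  # [0.5, 0.3, 0.2]
```

With noise-free synthetic data the fractions are recovered exactly; with real pixels, residuals and out-of-range fractions are common, particularly for tree cover affected by shadowing.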

Validation Data
• For homogeneous 90 m x 90 m test areas: interpreted DOQ, with DOQ pixels scaled to match TM
• For selected sub-areas: IKONOS multispectral image
– 3 key indicator land use classified map: impervious surface, lawn, and forest
– IKONOS pixels scaled to match TM

Egg Harbor City

[Figure: IKONOS, Landsat SOM-LVQ and Landsat LMM maps of impervious, grass and woody cover; charts “Interior LMM - Impervious Surface”, “Interior LMM - Lawn” and “Interior LMM - Urban Tree” comparing LMM and reference area (y-axis) for plots 1-13 (x-axis)]

Hammonton

[Figure: IKONOS, Landsat LMM and Landsat SOM-LVQ maps of impervious, grass and woody cover; charts “Interior LMM - Impervious Surface”, “Interior LMM - Lawn” and “Interior LMM - Urban Tree” comparing LMM and reference area (y-axis) for plots 1-15 (x-axis)]

Root Mean Square Error: 90 m x 90 m test plots (Hammonton; Egg Harbor City)

           Impervious   Lawn      Urban Tree
IKONOS     ± 5.6%       ± 5.8%    ± 6.1%
LMM        ± 7.7%       ± 12.5%   ± 19.6%
SOM_LVQ    ± 6.8%       ± 6.0%    ± 5.0%

           Impervious   Grass     Tree
IKONOS     ± 7.4%       ± 8.2%    ± 7.1%
LMM        ± 10.8%      ± 13.6%   ± 20.7%
SOM_LVQ    ± 12.0%      ± 10.3%   ± 11.0%

Study sub-area comparison: SOM-LVQ vs. IKONOS (3x3 TM pixel zonal %)

Sites: Hammonton, Egg Harbor City; classes: impervious, grass, trees

RMSE = ± 13.5%, ± 17.6%, ± 15.0%, ± 14.4%, ± 17.6%, ± 21.6%
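The RMSE figures above summarize the differences between estimated and reference cover across test plots. A minimal Python sketch follows; the function name `rmse` and the three plot values are hypothetical.

```python
import math


def rmse(estimated, reference):
    """Root mean square error between estimated and reference values (e.g., % cover per plot)."""
    if not estimated or len(estimated) != len(reference):
        raise ValueError("need two non-empty sequences of equal length")
    squared = [(e - r) ** 2 for e, r in zip(estimated, reference)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical % impervious cover for three 90 m x 90 m test plots
estimated = [42.0, 18.0, 65.0]
reference = [40.0, 20.0, 62.0]
print(f"RMSE = ± {rmse(estimated, reference):.1f}%")  # RMSE = ± 2.4%
```

Because errors are squared before averaging, a few badly estimated plots raise the RMSE more than many small errors do.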

Comparison of Landsat TM vs. NJDEP IS estimates

[Figure: scatter plots “NJDEP - Landsat_SOM IS Area” of impervious surface (IS) area, NJDEP (x-axis) vs. Landsat_SOM (y-axis), for Commercial, High Density Residential, Medium Density Residential and Low Density Residential land use]

Summary of Results
• Impervious surface estimation compares favorably to DOQ and IKONOS
– ±10 to 15% for impervious surface
– ±12 to 22% for grass and tree cover

• Shows strong linear relationship with IKONOS in impervious surface and grass estimation

• Greater variability in forest fraction due to variability in canopy shadowing and understory background

Summary of the lecture

1 Remove “salt & pepper” through post-classification smoothing: majority filtering or eliminating small clumps of like pixels.

2 Sampling methods of reference points

3 Contingency matrix and Accuracy assessment measures: overall accuracy, producer’s accuracy and user’s accuracy, and kappa coefficient.

4 Fuzzy accuracy assessment (fuzzy rating, fuzzy set matrix) and ratio estimators.

Homework

1 Homework: Accuracy Assessment

3 Reading: Textbook Ch 13, Field Guide Ch 6
