LAND COVER, POPULATION ESTIMATES, AND STATE BOUNDARIES: A COMPARISON OF UNCERTAINTY AMONG DATASETS

Kim Boggio, March 2014




GIS offers various techniques for visualizing land cover, population estimates, and state boundaries. These data are available from different sources, and each dataset has its own advantages and disadvantages. This document evaluates the uncertainty associated with the various datasets and methods used to represent land cover, population, and state boundaries in a GIS format.

This study grew out of work done for the Open Space Institute (OSI) to identify forested lands adjacent to urban and suburban areas that would be appropriate for acquisition.

Some background on the Open Space Institute:

• The Open Space Institute (OSI) protects scenic, natural, and historic landscapes to ensure public enjoyment, conserve habitats, and sustain community character.

• OSI achieves its goals through land acquisition, conservation easements, regional loan programs, fiscal sponsorship, creative partnerships, and analytical research. Through its New York land program, OSI has protected more than 100,000 acres in New York State by direct acquisition and conservation easements.

• Through the Conservation Finance Program, which provides low-cost bridge loans, OSI has assisted in the protection of an additional 1.6 million acres across the East Coast.

• The Research Program influences land use policy and practice through research, communication and training.

METHODS

The OSI project's task was to identify land adjacent to urban and suburban areas for open space acquisition. The approach was to combine land cover, impervious cover, federal lands, wilderness areas, and LANDSCAN datasets to locate those areas. Working with these raster and vector datasets made it clear that they differ substantially in scale, coordinate system, data classification, and attribute table structure, so the results carry uncertainty that depends on which dataset is used. The OSI project data are analyzed below to see how closely the raster and vector layers align.


The land cover rasters used in the study were the 30m grid from the National Land Cover Database (NLCD) and the 200m land cover grid from the National Atlas (see Appendix A for the respective websites). The population raster layers were the 30m NLCD Impervious layer and LANDSCAN (0.00833° × 0.00833° decimal degree cells, approximately 1010m × 488m) from Oak Ridge National Laboratory. State boundary shapefiles were downloaded from the National Atlas, Tiger Data (US Census Bureau), and ESRI.
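Because meridians converge toward the poles, the east-west ground size of a 0.00833° (30-arc-second) LANDSCAN cell shrinks as latitude increases. A minimal sketch, assuming a spherical Earth with a mean radius of 6,371 km, estimates the nominal footprint of an unprojected cell. The function name is illustrative. A figure computed on the ellipsoid, or the cell size of a reprojected or resampled raster, may differ from these nominal values.

```python
import math

# Mean Earth radius in metres: a spherical approximation (assumption), not an ellipsoid
EARTH_RADIUS_M = 6_371_000.0

def cell_footprint(cell_deg, lat_deg):
    """Approximate ground size of a cell_deg x cell_deg geographic cell.

    Returns (north_south_m, east_west_m, area_ha) for a cell centred at lat_deg.
    """
    metres_per_degree = 2 * math.pi * EARTH_RADIUS_M / 360  # about 111.2 km along a meridian
    ns = cell_deg * metres_per_degree
    # East-west spacing between meridians shrinks with the cosine of latitude
    ew = ns * math.cos(math.radians(lat_deg))
    return ns, ew, ns * ew / 10_000  # 1 ha = 10,000 m²

# 30 arc-seconds = 1/120 degree (about 0.00833 degrees)
ns, ew, ha = cell_footprint(1 / 120, 40.0)
print(f"{ns:.0f} m x {ew:.0f} m, about {ha:.1f} ha")
```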

The land cover datasets were compared for similarity of classification and for visual clarity at different map scales over the same extent. Land cover was also compared at 500 random points in a relatively small area at the head of the Chesapeake Bay. The population datasets were compared for classification method and for visual clarity at different map scales over the same extent. Finally, the state boundary shapefiles were examined for how closely they follow the Delmarva Peninsula coastline, which also reveals how closely the boundary files match one another.

LAND COVER RESULTS

The NLCD classifications used to analyze land cover are shown in Figure 1.

Figure 1: NLCD Land cover classifications used to analyze the data


LAND COVER AREA BY CLASS ANALYSIS

Table 1 shows the land cover classes, cell counts, and area by class for the 30m and 200m NLCD data in National Land Cover boundary zone 13.

Cell counts are much lower for the 200m data than for the 30m data, as expected: each 200m cell covers 4 ha (40,000 m²), whereas each 30m cell covers only 0.09 ha (900 m²). Table 1 also shows that the 30m and 200m data represent the land cover classes differently. For instance, class 43 (mixed forest) covers 164,543,000 m² (about 16,454 ha) in the 30m data but 1,940,280,000 m² (194,028 ha) in the 200m data. As Tables 2 and 3 show, the differences between the 30m and 200m data are statistically significant for the urban (residential), forest, and wetland cover classes, but not for farmland.

Table 1: 200m & 30m land cover area analysis

CLASS | DESCRIPTION | 30m COUNT | 30m AREA (m²) | 200m COUNT | 200m AREA (m²) | AREA DIFFERENCE (%)
11 | OPEN WATER | 1,014,628 | 865,176,000 | 24,611 | 984,440,000 | 13.8%
21 | LOW INTENSITY RESIDENTIAL | 2,043,829 | 1,742,780,000 | 49,179 | 1,967,160,000 | 12.9%
22 | HIGH INTENSITY RESIDENTIAL | 1,859,436 | 1,585,550,000 | 10,389 | 415,560,000 | 73.8%
23 | COMMERCIAL/ INDUSTRIAL/ TRANSPORTATION | 921,654 | 785,897,000 | 15,954 | 638,160,000 | 18.8%
24 | HIGH INTENSITY URBAN | 431,715 | 368,124,000 | - | - | N/A
31 | BARE ROCK/ SAND/ CLAY | 349,813 | 298,286,000 | - | - | N/A
32 | QUARRIES/ STRIP MINES | - | - | 2,455 | 98,200,000 | N/A
33 | TRANSITIONAL | - | - | 1,890 | 75,600,000 | N/A
41 | DECIDUOUS FOREST | 7,470,356 | 6,369,990,000 | 160,357 | 6,414,280,000 | 0.7%
42 | EVERGREEN FOREST | 1,235,803 | 1,053,770,000 | 24,979 | 999,160,000 | 5.2%
43 | MIXED FOREST | 192,967 | 164,543,000 | 48,507 | 1,940,280,000 | 1079.2%
51 | SHRUBLAND | - | - | - | - | 100.0%
61 | ORCHARDS/ VINEYARDS | - | - | - | - | N/A
71 | GRASSLAND/ HERBACEOUS | - | - | - | - | N/A
81 | PASTURE/ HAY | 6,504,092 | 5,546,050,000 | 186,453 | 7,458,120,000 | 34.5%
82 | ROW CROPS | 5,075,837 | 4,328,180,000 | 59,044 | 2,361,760,000 | 45.4%
83 | SMALL GRAINS | - | - | - | - | N/A
85 | URBAN/ RECREATIONAL GRASSES | - | - | 4,666 | 186,640,000 | N/A
90 | EMERGENT HERBACEOUS WETLANDS/ WOODY WETLANDS | 1,343,873 | 1,145,920,000 | - | - | N/A
91 | WOODY WETLANDS | - | - | 18,548 | 741,920,000 | N/A
92 | EMERGENT HERBACEOUS WETLANDS | - | - | 9,062 | 362,480,000 | N/A
95 | EMERGENT HERBACEOUS WETLANDS/ WOODY WETLANDS | 455,573 | 388,468,000 | - | - | N/A
TOTALS | | 28,899,576 | 24,642,734,000 | 616,094 | 24,643,760,000 | 0.0%
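The AREA and AREA DIFFERENCE columns follow from simple arithmetic. Area is the cell count times the cell area. Judging from the values in Tables 1 and 2, the percent difference is the absolute difference between the two areas divided by the 30m area. A short sketch, with illustrative function names, checked against rows of Table 1:

```python
def class_area_m2(cell_count, cell_size_m):
    """Area covered by cell_count square cells with side length cell_size_m (metres)."""
    return cell_count * cell_size_m ** 2

def percent_difference(area_30m, area_200m):
    """Absolute area difference relative to the 30m area, as in the AREA DIFFERENCE column."""
    return abs(area_200m - area_30m) / area_30m * 100

# Class 11 (open water) in the 200m grid: 24,611 cells of 200m x 200m
water_200m = class_area_m2(24_611, 200)
print(water_200m)
print(round(percent_difference(865_176_000, water_200m), 1))        # open water
print(round(percent_difference(164_543_000, 1_940_280_000), 1))     # mixed forest
```

Because the 30m area is the base, a class that is small in the 30m data, such as mixed forest, can show a difference well above 100%.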

Table 2: General land cover class totals

LAND CLASS | 30m CELL COUNT | 30m AREA (m²) | 200m CELL COUNT | 200m AREA (m²) | % DIFFERENCE
FOREST LAND TOTAL | 8,899,126 | 7,588,303,000 | 233,843 | 9,353,720,000 | 23.3%
RESIDENTIAL TOTAL | 5,256,634 | 4,482,351,000 | 75,522 | 3,020,880,000 | 32.6%
FARMLAND TOTAL | 11,579,929 | 9,874,230,000 | 245,497 | 9,819,880,000 | 0.6%
EMERGENT HERBACEOUS WETLANDS/ WOODY WETLANDS | 1,799,446 | 1,534,388,000 | 27,610 | 1,104,400,000 | 28.0%

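Table 3: General land cover class statistical analysis

Hypotheses: $H_0: \varphi_1 = \varphi_2$ vs. $H_A: \varphi_1 \neq \varphi_2$

Test statistic: $Z = (p_1 - p_2)/SE$ with $SE = \sqrt{\varphi(1 - \varphi)/n}$, where $p_1$ is the 30m proportion and $p_2$ the 200m proportion of the total area.

LAND CLASS | p1 | p2 | SE | p1 - p2 | Z | SIGNIFICANCE
FOREST LAND (CLASSES 41–43) | 0.308 | 0.380 | 0.017 | -0.072 | -4.26 | P < .01
RESIDENTIAL (CLASSES 21–24) | 0.182 | 0.123 | 0.008 | 0.059 | 7.72 | P < .01
FARMLAND (CLASSES 61–83) | 0.401 | 0.398 | 0.017 | 0.002 | 0.13 | N.S.
EMERGENT HERBACEOUS WETLANDS/ WOODY WETLANDS (CLASSES 90–95) | 0.062 | 0.045 | 0.003 | 0.017 | 5.71 | P < .01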
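For readers who want to run this kind of test themselves, here is a minimal sketch of the textbook pooled two-proportion z-test for $H_0: \varphi_1 = \varphi_2$. The pooled standard error is an assumption: the exact standard error behind Table 3 cannot be fully recovered from the source, so this sketch is not guaranteed to reproduce its SE and Z columns. Each proportion also needs a sample size, such as a number of sampled points. The counts in the usage line are hypothetical.

```python
import math

def two_proportion_z(x1, n1, x2, n2):
    """Pooled two-proportion z-test for H0: phi1 == phi2.

    x1, x2: counts falling in the class; n1, n2: sample sizes.
    Returns (p1, p2, se, z, two_sided_p).
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)  # common proportion under H0
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided p-value from the standard normal: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return p1, p2, se, z, p_value

# Hypothetical counts for illustration only: 50 of 100 versus 30 of 100
p1, p2, se, z, p = two_proportion_z(50, 100, 30, 100)
print(f"z = {z:.2f}, p = {p:.4f}")
```

This form treats the two samples as independent. Points read at identical locations in both rasters are paired, so a paired test such as McNemar's test on the cross-tabulated classes is a reasonable alternative for that design.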

LAND COVER RANDOM POINT ANALYSIS

The random point analysis was performed on a relatively small area at the head of the Chesapeake Bay. ArcMap generated 500 random points, and the land cover class at each point was read from both the 30m and 200m land cover layers, as depicted in Figure 2. As with the area-by-class analysis, there are statistically significant differences between the 30m and 200m data for the urban, forest, and wetland cover classes, as shown in Tables 5 and 6. Because both resolutions are sampled at identical locations, the random point analysis provides a direct cell-to-cell comparison of the two NLCD land cover datasets.
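The random point comparison can be sketched in a few lines of NumPy. This is not ArcMap's random point tool. The sketch assumes both rasters are north-up arrays covering exactly the same extent in the same projected coordinate system, and it uses small synthetic grids in place of the NLCD data; the function name is hypothetical.

```python
import numpy as np

def sample_at_random_points(fine, coarse, extent, n=500, seed=0):
    """Read class codes from two north-up rasters at the same n random points.

    fine, coarse: 2-D arrays of class codes covering the same extent
    extent: (xmin, ymin, xmax, ymax) in map units shared by both rasters
    Returns (xs, ys, fine_codes, coarse_codes).
    """
    xmin, ymin, xmax, ymax = extent
    rng = np.random.default_rng(seed)
    xs = rng.uniform(xmin, xmax, n)
    ys = rng.uniform(ymin, ymax, n)

    def lookup(grid):
        rows, cols = grid.shape
        # Column index grows eastward from xmin; row index grows southward from ymax
        col = ((xs - xmin) / (xmax - xmin) * cols).astype(int)
        row = ((ymax - ys) / (ymax - ymin) * rows).astype(int)
        # Clamp points lying exactly on the outer edge into the last cell
        return grid[np.minimum(row, rows - 1), np.minimum(col, cols - 1)]

    return xs, ys, lookup(fine), lookup(coarse)

# Synthetic 1,800 m x 1,800 m area: 60 x 60 cells at 30m, 9 x 9 cells at 200m
fine = np.full((60, 60), 41)     # deciduous forest...
fine[:, 30:] = 81                # ...with pasture/hay in the east half
coarse = np.full((9, 9), 43)     # mixed forest everywhere in the coarse grid
xs, ys, a, b = sample_at_random_points(fine, coarse, (0, 0, 1800, 1800))

# Tally the fine-grid classes at the sampled points
codes, counts = np.unique(a, return_counts=True)
print(dict(zip(codes.tolist(), counts.tolist())))
```

Comparing the two returned code arrays element by element gives the share of points at which the two resolutions agree.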

Figure 2: 200M & 30M land cover with random points


Table 4: 200m & 30m random point NLCD classification analysis

CLASS | DESCRIPTION | 30m POINT COUNT | 200m POINT COUNT
11 | OPEN WATER | 12 | 12
21 | LOW INTENSITY RESIDENTIAL | 31 | 48
22 | HIGH INTENSITY RESIDENTIAL | 38 | 8
23 | COMMERCIAL/ INDUSTRIAL/ TRANSPORTATION | 19 | 8
24 | HIGH INTENSITY URBAN | 2 | -
31 | BARE ROCK/ SAND/ CLAY | 4 | -
32 | QUARRIES/ STRIP MINES | - | 3
33 | TRANSITIONAL | - | 1
41 | DECIDUOUS FOREST | 127 | 116
42 | EVERGREEN FOREST | 23 | 15
43 | MIXED FOREST | 2 | 53
51 | SHRUBLAND | - | -
61 | ORCHARDS/ VINEYARDS | - | -
71 | GRASSLAND/ HERBACEOUS | - | -
81 | PASTURE/ HAY | 125 | 153
82 | ROW CROPS | 91 | 62
83 | SMALL GRAINS | - | -
85 | URBAN/ RECREATIONAL GRASSES | - | 1
90 | EMERGENT HERBACEOUS WETLANDS/ WOODY WETLANDS | 22 | -
91 | WOODY WETLANDS | - | 14
92 | EMERGENT HERBACEOUS WETLANDS | - | 5
95 | EMERGENT HERBACEOUS WETLANDS/ WOODY WETLANDS | 4 | -
TOTALS | | 500 | 499


Table 5: General land cover class random point totals

LAND CLASS | 30m POINT COUNT | 200m POINT COUNT | % DIFFERENCE
FOREST LAND TOTAL | 152 | 184 | 17.4%
RESIDENTIAL TOTAL | 90 | 64 | 40.6%
FARMLAND TOTAL | 216 | 215 | 0.5%
EMERGENT HERBACEOUS WETLANDS/ WOODY WETLANDS | 26 | 19 | 36.8%
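Tables 5 and 6 can be derived from Table 4 by summing point counts over NLCD code ranges and dividing by the number of sampled points. The sketch below does this with the Table 4 counts, using the 30m/200m column assignment that reproduces the Table 5 totals. The dictionary layout and function names are illustrative. Unlike Tables 1 and 2, the Table 5 percentages are consistent with using the 200m total as the base.

```python
# Per-class random point counts from Table 4 (NLCD code -> point count)
POINTS_30M = {11: 12, 21: 31, 22: 38, 23: 19, 24: 2, 31: 4, 41: 127, 42: 23,
              43: 2, 81: 125, 82: 91, 90: 22, 95: 4}
POINTS_200M = {11: 12, 21: 48, 22: 8, 23: 8, 32: 3, 33: 1, 41: 116, 42: 15,
               43: 53, 81: 153, 82: 62, 85: 1, 91: 14, 92: 5}

# Inclusive code ranges of the general classes used in Tables 5 and 6
GENERAL_CLASSES = {
    "FOREST LAND": range(41, 44),   # 41-43
    "RESIDENTIAL": range(21, 25),   # 21-24
    "FARMLAND": range(61, 84),      # 61-83
    "WETLANDS": range(90, 96),      # 90-95
}

def general_totals(points):
    """Sum per-class point counts into the general land cover classes."""
    return {name: sum(n for code, n in points.items() if code in codes)
            for name, codes in GENERAL_CLASSES.items()}

def proportions(points):
    """Share of all sampled points that falls in each general class (p1 or p2)."""
    total = sum(points.values())
    return {name: count / total for name, count in general_totals(points).items()}

def percent_difference_vs_200m(total_30m, total_200m):
    """Absolute difference relative to the 200m total, as in Table 5."""
    return abs(total_30m - total_200m) / total_200m * 100

print(general_totals(POINTS_30M))
print({name: round(p, 3) for name, p in proportions(POINTS_200M).items()})
```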

Table 6: General land cover class random point statistical analysis

Test statistic: $Z = (p_1 - p_2)/SE$ with $SE = \sqrt{\varphi(1 - \varphi)/n}$, where $p_1$ is the 30m proportion and $p_2$ the 200m proportion of the sampled points.

LAND CLASS | p1 | p2 | SE | p1 - p2 | Z | SIGNIFICANCE
FOREST LAND (CLASSES 41–43) | 0.304 | 0.369 | 0.017 | -0.065 | -3.89 | P < .01
RESIDENTIAL (CLASSES 21–24) | 0.180 | 0.128 | 0.008 | 0.052 | 6.48 | P < .01
FARMLAND (CLASSES 61–83) | 0.432 | 0.431 | 0.018 | 0.001 | 0.06 | N.S.
EMERGENT HERBACEOUS WETLANDS/ WOODY WETLANDS (CLASSES 90–95) | 0.052 | 0.038 | 0.003 | 0.014 | 5.32 | P < .01

LAND COVER RESOLUTION ANALYSIS

Figures 3 and 4 show the same area at 1:75,000. At this scale the 30m map still provides sharp delineations of land cover and boundaries, whereas the 200m map shows a blur of cells. The 200m data may be appropriate for a regional ("macro") view but not for detailed local analysis: the 200m map breaks down at scales larger than 1:1,000,000, whereas the 30m map is still useful at 1:50,000.
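How each grid behaves at a given map scale follows from how large one cell appears on the map: its ground size divided by the scale denominator. The sketch below assumes a 0.28 mm display pixel (the OGC "standardized rendering pixel" size); real screens vary, and the function name is illustrative. On that assumption, a 200m cell at 1:75,000 spans roughly 9.5 pixels and shows up as a visible block, while at 1:1,000,000 it is smaller than one pixel.

```python
# Assumed display pixel size: the OGC standardized rendering pixel of 0.28 mm
PIXEL_MM = 0.28

def cell_on_map(cell_m, scale_denominator):
    """Size of one raster cell on the map, in millimetres and in 0.28 mm pixels."""
    mm = cell_m * 1000 / scale_denominator  # ground metres -> map millimetres
    return mm, mm / PIXEL_MM

for cell in (30, 200):
    for scale in (50_000, 75_000, 1_000_000):
        mm, px = cell_on_map(cell, scale)
        print(f"{cell:>3} m cell at 1:{scale:,}: {mm:.2f} mm ({px:.1f} pixels)")
```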

Figure 3: 30M LAND COVER 1:75,000


Figure 4: 200M LANDCOVER 1:75,000

LANDSCAN RESULTS


LANDSCAN CLASSES

The LANDSCAN database provides population values for cells with an area of 54 ha at 40˚ latitude. As the screen shot in Figure 5 shows, the LANDSCAN symbology has population class breaks at 5, 25, 50, 100, 500, 2,500, 5,000, and 130,000 people per cell; each cell has an actual population count associated with it (see Table 8). The NLCD Impervious dataset records percent impervious surface, so it offers only a relative indication of population, and its attribute table contains no population values. However, its 30m cells may make it useful for cursory analysis.
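Because LANDSCAN stores a population count in every cell, the symbology classes are simply intervals of that count. A minimal sketch maps counts to 0-based class indices with NumPy's searchsorted. It assumes that each break is the inclusive upper bound of its class (0-5, 6-25, and so on), since the figure does not say how boundary values are assigned. The function name is illustrative.

```python
import numpy as np

# Upper class breaks (people per cell) from the LANDSCAN symbology in Figure 5
BREAKS = np.array([5, 25, 50, 100, 500, 2_500, 5_000, 130_000])

def classify_population(values):
    """Return the 0-based symbology class of each cell's population count.

    side="left" puts a value equal to a break into the class that the break closes.
    Counts above the last break would return len(BREAKS), i.e. out of range.
    """
    return np.searchsorted(BREAKS, np.asarray(values), side="left")

print(classify_population([0, 5, 6, 25, 26, 2_500, 130_000]))
```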

Figure 5: LANDSCAN Symbology

Table 8: LANDSCAN zonal statistics by NLCD class

CLASS | DESCRIPTION | COUNT | AREA (m²) | MEAN | STD | SUM | NO. PEOPLE/HA
11 | OPEN WATER | 732 | 747,490,000 | 83 | 361 | 60,810 | 0.8
21 | LOW INTENSITY RESIDENTIAL | 1,386 | 1,415,330,000 | 491 | 646 | 680,973 | 4.8
22 | HIGH INTENSITY RESIDENTIAL | 1,299 | 1,326,490,000 | 662 | 1,052 | 860,134 | 6.5
23 | COMMERCIAL/ INDUSTRIAL/ TRANSPORTATION | 659 | 672,945,000 | 1,171 | 1,532 | 771,940 | 11.5
24 | DEVELOPED/ HIGH INTENSITY | 326 | 332,899,000 | 1,658 | 2,114 | 540,591 | 16.2
31 | BARE ROCK/ SAND/ CLAY | 201 | 205,253,000 | 141 | 246 | 28,388 | 1.4
41 | DECIDUOUS FOREST | 4,384 | 4,476,770,000 | 121 | 296 | 532,410 | 1.2
42 | EVERGREEN FOREST | 229 | 233,846,000 | 133 | 258 | 30,446 | 1.3
43 | MIXED FOREST | 48 | 49,015,700 | 60 | 180 | 2,862 | 0.6
81 | PASTURE/ HAY | 4,703 | 4,802,520,000 | 113 | 263 | 530,065 | 1.1
82 | ROW CROPS | 3,390 | 3,461,740,000 | 118 | 320 | 398,785 | 1.2
90 | EMERGENT HERBACEOUS/ WOODY WETLANDS | 463 | 472,798,000 | 230 | 559 | 106,560 | 2.3
95 | EMERGENT HERBACEOUS/ WOODY WETLANDS | 276 | 281,841,000 | 120 | 709 | 33,092 | 1.2
TOTALS | | 18,096 | 18,478,937,700 | 5,102 | 8,535 | 4,577,056 |
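The people-per-hectare column in Table 8 is consistent with dividing each class SUM by its AREA converted to hectares. For example, for open water, 60,810 people over 74,749 ha gives about 0.8 people per hectare. The sketch below computes zonal statistics of this kind with NumPy. It assumes the land cover and population grids have already been resampled to the same cells, which is a preprocessing assumption and not a description of ArcGIS's Zonal Statistics tool. The function name is illustrative.

```python
import numpy as np

def zonal_stats(zones, population, cell_area_m2):
    """COUNT, AREA, MEAN, STD, SUM and people per hectare for each zone.

    zones: 2-D integer array of land cover codes
    population: 2-D array of people per cell, aligned cell-for-cell with zones
    cell_area_m2: area of one cell in square metres (assumed constant)
    """
    stats = {}
    for code in np.unique(zones):
        values = population[zones == code]
        area_m2 = values.size * cell_area_m2
        total = float(values.sum())
        stats[int(code)] = {
            "count": int(values.size),
            "area_m2": area_m2,
            "mean": float(values.mean()),
            "std": float(values.std()),  # population standard deviation (ddof=0)
            "sum": total,
            # 1 ha = 10,000 m², so people/ha = SUM / (AREA / 10,000)
            "people_per_ha": total / (area_m2 / 10_000),
        }
    return stats

# Tiny synthetic example: one open-water cell and three low-intensity residential cells
zones = np.array([[11, 21], [21, 21]])
population = np.array([[0.0, 100.0], [200.0, 300.0]])
result = zonal_stats(zones, population, cell_area_m2=1_000_000)
print(result[21])
```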

POPULATION ESTIMATE RESOLUTION ANALYSIS

As with the land cover datasets, the coarse-celled LANDSCAN map breaks down at scales larger than 1:1,000,000. The NLCD Impervious layer aligns well with the LANDSCAN layer (Figure 6), but only LANDSCAN provides actual population estimates.

Figure 6: LANDSCAN & NLCD IMPERVIOUS 1:1,500,000

Figure 7: LANDSCAN & NLCD IMPERVIOUS 1:50,000


STATE BOUNDARY ANALYSIS

Three state boundary shapefiles were used in this analysis: National Atlas, Tiger Data, and ESRI. All three were overlaid on the 30m land cover layer. Judged by how closely they followed the coastline, the National Atlas boundaries appeared the most accurate, followed by the Tiger Data and then the ESRI boundaries (see Figure 8). The ESRI boundaries were highly generalized and showed little detail when zoomed in. However, the ESRI attribute table provides detailed information for each state, such as area, population, and number of households, which would be useful for demographic studies. The National Atlas and Tiger Data attribute tables contain only information about the state boundary polygon shapes.
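The comparison in Figure 8 is visual. One way to put a number on how far two boundary lines diverge is the discrete Hausdorff distance between their vertices, which is offered here as an illustrative option and not the method used in this study. It is the largest distance from a vertex on either line to the nearest vertex on the other. The sketch assumes both lines are in the same projected coordinate system in metres. It considers vertices only, so long straight segments should be densified first. The vertex lists are hypothetical.

```python
import numpy as np

def hausdorff(a, b):
    """Discrete Hausdorff distance between two (N, 2) arrays of projected x, y vertices."""
    # Pairwise Euclidean distances between every vertex in a and every vertex in b
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)
    # Worst-case distance from a vertex on one line to its nearest vertex on the other
    return float(max(d.min(axis=1).max(), d.min(axis=0).max()))

# Hypothetical vertex lists in metres (not real boundary data)
coast = np.array([[0.0, 0.0], [100.0, 0.0], [200.0, 0.0]])
boundary = np.array([[0.0, 30.0], [100.0, 10.0], [200.0, 0.0]])
print(hausdorff(coast, boundary))
```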


Figure 8: State boundaries

STATE BOUNDARIES – NATIONAL ATLAS, TIGER DATA, ESRI SHAPEFILES 1:75,000

RED BOUNDARY – NATIONAL ATLAS STATE SHAPEFILE

BLUE BOUNDARY – TIGER STATE SHAPEFILE

MAGENTA BOUNDARY – ESRI STATE SHAPEFILE

CONCLUSIONS

The following recommendations are intended to reduce uncertainty in the Open Space Institute project:

LAND COVER

• The 30m NLCD land cover rasters provide much more visual detail at large (zoomed-in) map scales.

• There are statistically significant differences in how the NLCD classes are represented in the 30m and 200m NLCD data.

• 30m land cover data is appropriate for most analyses; 200m data may be used for a “macro” analysis.

POPULATION ESTIMATES


• The NLCD Impervious layer may be used for cursory analysis; LANDSCAN is appropriate for detailed analysis because it includes actual population counts for each cell.

STATE BOUNDARIES

• The National Atlas state boundary shapefile is more accurate than Tiger Data and ESRI, and should be used for most analyses.

• The ESRI state boundary shapefile is appropriate for demographic studies.

DATA SOURCES

http://www.mrlc.gov/nlcd_multizone_map.php

Multi-Resolution Land Characteristics Consortium (MRLC): National Land Cover Database (NLCD) multi-zone download site. NLCD 2001 includes 21 classes of land cover, percent tree canopy, and urban imperviousness at 30m cell resolution.

The Urban Imperviousness layer aligns well with the LANDSCAN data.

http://www.epa.gov/mrlc/nlcd-2001.html

The EPA site for NLCD data.

http://www.ornl.gov/sci/landscan/

The LANDSCAN dataset is a worldwide population database compiled on a 30" × 30" latitude/longitude grid. Census counts are allocated to cells based on proximity to roads, slope, land cover, nighttime lights, and other information.

http://eros.usgs.gov/products/elevation.html

The USGS site provides Digital Elevation Models (DEMs) at 30m resolution, downloadable as seamless 7.5-minute quadrangles, as well as access to the National Map Seamless Server.

http://www.nationalatlas.gov/

Access to the National Atlas seamless server. The state and county boundaries and federal land locations in the OSI map were downloaded from this site. The National Atlas also offers 200m resolution land cover maps.

http://www.census.gov/geo/www/tiger/

The Census Bureau site for Tiger data shapefiles.