
Comput. & Indus. Engng, Vol. 1, pp. 131-138. Pergamon Press, 1977. Printed in Great Britain

AN INTRODUCTION TO GEOGRAPHIC INFORMATION SYSTEMS

RONALD J. CLASSEN

University of Arkansas, Fayetteville, AR, U.S.A.

(Received 12 August 1976)

Abstract-Geographic information systems are concerned with the organization, handling and retrieval of data whose spatial position or geographic pattern is of concern. This paper defines a geographic information system, explains some terms associated with geographical or spatial data, and discusses methods of organizing such data for flexible and efficient retrieval. The paper also reviews the analysis possibilities available using spatial data including computer mapping and other nongraphic analyses such as correlation and regression, descriptive spatial statistics, trend surface analysis, cluster analysis, and network analysis. Potential applications of geographic information systems are presented including their use in the areas of public health, resource management and public safety.

INTRODUCTION

Industrial engineers have long been involved in the design, implementation and use of management information systems, especially those systems used in manufacturing concerns. However, in recent years industrial engineers have expanded their interests into such nonmanufacturing areas as state and local government. Here the need for knowledge of information systems is at least as great as in manufacturing.

The job of obtaining information for intelligent decision making is even more difficult in local government due to the size and diversity of city and state information needs. Within a state there are an enormous number of agencies collecting information for various different geographic reporting areas. Each agency usually invests a large amount of resources in collecting and maintaining data files relevant to its own needs. Therefore, valuable resources are often wasted due to duplication of effort.

The basis of the problem lies in the fact that data are aggregated according to some areal unit for analysis. This simplifies data collection but diminishes the value of the collected data. For example, unemployment data are generally collected and aggregated by county. However, this procedure has an averaging effect on data: a county-wide unemployment rate of 5% may obscure the fact that small areas of the county have unemployment rates of 35%.

In the same vein employment statistics might be needed for some other geographic area within a county such as a school district. Since the data have been aggregated at the county level there is no way to obtain these statistics for other areal units except to collect the same data again for the particular area.

One answer to this and similar problems is to not aggregate the data or to aggregate it as little as possible. This approach is not widely used because it requires more sophisticated collection methods and data handling methods. However, the cost of redundant data collection is so high and the need for accurate and versatile methods so great that attention has recently been focused on computer information systems capable of storing and manipulating large amounts of location-specific data. Such systems are known as geographic information systems since they are concerned with the geographic location of data.

A geographic information system is differentiated from other types of information systems by the requirement that each data element be referenced to a specific geographic location by some locational identifier. Locational identifiers (also called geographic codes or geocodes) can be anything from alphanumeric codes representing cities to the latitude and longitude of a specific point. In addition, there is general agreement that data in geographic information systems are manipulated and retrieved on geographical criteria and that the output generally takes the form of graphical presentation [8].

This paper is intended as an introduction to geographic information systems and to their use in state and local government. The author is presently designing a statewide geographic information system for the state of Arkansas. The system is intended to provide flexible geographic data handling capabilities for various governmental agencies and private groups.

GEOGRAPHIC DATA HANDLING

Geocoding systems†

The most noticeable attribute that separates geographic information systems from other information systems is the need for a complete subsystem to handle the locational aspect of data. Such a subsystem is based upon a system or method, known as a geographic coding system, used to assign some type of geographic code to each data element. The two major structural elements of all geocoding systems are a concept of areal division, classification or definition and some form of coding logic [11]. In addition to these major elements there are a number of characteristics by which various geocoding systems can be classified.

Size of unit coded. One important characteristic is the size of units coded which may range from a theoretically infinite number of discrete points to a few large areas such as states or regions. In general, the smaller the size of the unit the more flexible the system. However, since each unit coded usually corresponds to a logical record in the computer, flexibility must be balanced with practicality to ensure that sufficient secondary storage space is available for all data.

Type of unit coded. The type of unit coded is also an important characteristic of geocoding systems. Some systems, such as the Census Bureau system, code geo-political units such as counties, cities, townships and enumeration districts. Others identify economic areas such as ZIP marketing areas or physical features such as drainage basins, wildlife habitat, mineral deposits, etc. Still others identify units such as grid cells which are not related to any physical, economic, or political area.

Spatial relationship. Some geocoding systems use only an alphabetic code which indicates no spatial relationship between units. Other systems are similar to the Census Bureau system which indicates that a certain enumeration district is within a certain county. Still other systems, such as various grid and coordinate systems, indicate exactly the location of each grid unit or point.

Coding logic. There are four general types of coding schemes used to assign locational identifiers to units coded in a geocoding system. Figure 1 illustrates these identifiers which include [8]:

(1) External index
(2) Coordinate reference
(3) Arbitrary grid
(4) Explicit boundary.

External index codes, such as census tract codes and ZIP codes, do not actually define a geographic location but rather are an index which must be used with a map or sketch. External index codes are poorly suited for use in geographic information systems since most spatial analyses are not possible using data coded with external index codes.

Coordinate reference identifiers usually refer to the centroid of an area or to the specific location of an event. For example, a census tract would be identified by referencing its centroid while a traffic accident would be identified by the specific coordinates of the event.

In the systems using arbitrary grids as locators, data are referenced to the lower left corner coordinates of the grid cell in which they are located. The major difficulty encountered arises when grid cells contain data which are not homogeneous within the cell. For example, in classifying land according to its use, some land in the cell might be cropland and some might be forest. The usual solution is to classify the land as to predominant use which, of course, builds in some error. Another approach is to code the percentages of various land types contained in each cell.
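The two treatments of non-homogeneous cells can be illustrated with a minimal Python sketch. The cell size, the function names `cell_corner` and `summarize_land_use`, and the sample observations are all assumptions made for illustration; the sketch simply references each observation to the lower left corner of its cell and then reports both the predominant use and the percentage of each use within the cell.

```python
import math
from collections import Counter, defaultdict

CELL_SIZE = 1000.0  # hypothetical cell edge length, in the same units as x and y


def cell_corner(x, y, cell_size=CELL_SIZE):
    """Return the lower left corner of the grid cell containing (x, y)."""
    return (math.floor(x / cell_size) * cell_size,
            math.floor(y / cell_size) * cell_size)


def summarize_land_use(samples, cell_size=CELL_SIZE):
    """Group (x, y, land_use) samples by grid cell.

    Returns {corner: (predominant_use, {use: percentage})}.
    """
    counts = defaultdict(Counter)
    for x, y, use in samples:
        counts[cell_corner(x, y, cell_size)][use] += 1

    summary = {}
    for corner, counter in counts.items():
        total = sum(counter.values())
        percentages = {use: 100.0 * n / total for use, n in counter.items()}
        predominant = counter.most_common(1)[0][0]
        summary[corner] = (predominant, percentages)
    return summary


# Hypothetical sample observations: (x, y, land use)
samples = [
    (120.0, 340.0, "cropland"),
    (480.0, 910.0, "cropland"),
    (760.0, 150.0, "cropland"),
    (900.0, 880.0, "forest"),
    (1500.0, 200.0, "forest"),
]
land_use_by_cell = summarize_land_use(samples)
```

Coding only the predominant use would record the first cell as cropland and lose the forest share, whereas the percentage coding retains both at the cost of a longer record per cell.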

Explicit boundary identifiers are used to describe the boundary of an area or data element with a series of line segments. The land use classification problem is solved by defining one polygon boundary for cropland and another polygon boundary for forest land. The explicit boundary is the most sophisticated locator and also requires the most sophisticated data handling techniques.

†For a comprehensive review of national geocoding systems, see Werner [11].

Fig. 1. Locational identifiers for geographic information systems [8]. Panels: external index; coordinate reference (centroids and events); arbitrary grid; explicit boundary.

Geographic information systems

Automated geographic information systems can generally be classified according to the geocoding system upon which they are based as grid-cell systems, polygon systems, or point systems.

In grid-cell systems variables are associated with a uniform grid-cell matrix superimposed on the area being studied. Once the grid system is established, data associated with each grid cell can be encoded or digitized and stored in a computer file. The file can then be used for a variety of purposes including various spatial analyses and computer mapping.

Grid-cell systems have a number of advantages over other systems. In many cases the grid cell is an easy way to collect data. Also, due to the matrix structure, grid-cell data files are easy to manipulate with scientific programming languages such as FORTRAN, BASIC or PL/1. In addition, by making the cells small enough, grid-cell systems can be used for discrete as well as continuous types of data analysis. The main disadvantage of the grid-cell approach is that accuracy is lost when certain types of data are aggregated to the grid-cell level.

Polygon systems are based upon geocoding schemes which use explicit boundary locational identifiers. In a polygon system data are collected by polygon areal units such as census tracts. Polygon systems are also ideally suited for collecting natural resource data such as soil types since boundaries around a specific soil type can best be described by a polygon. A polygon system requires that data be digitized, stored, and retrieved in their basic form, which, unlike a grid-cell system, is not usually related to any regular geographic identification system. The main disadvantage of polygon systems is that they require sophisticated software and large amounts of computer processing time.

Point systems use some type of x-y coordinate to represent the location of various types of phenomena. Point data are often used to represent an area such as a census tract by referencing the centroid of the area and using that point in any calculations. Point data may also be used to represent a continuous distribution as in the case of air samples to approximate the continuous air quality over a region. In other cases point identifiers are used to represent entities such as weather stations, traffic accidents, factories, etc. Point systems are often used in conjunction with one of the other types of systems since point data can be aggregated to any higher level by various point-in-polygon routines.
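A point-in-polygon routine of the kind mentioned above might be sketched as follows. The ray-casting test used here is one common technique and is not necessarily the one employed by any particular system; the function names, tract outlines and accident points are hypothetical.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside the polygon.

    polygon is a list of (x, y) vertices in order; the last vertex is
    implicitly joined to the first. Points lying exactly on an edge are
    not given any special treatment.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through y?
        if (y1 > y) != (y2 > y):
            # x coordinate at which the edge crosses that line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def aggregate_points(points, polygons):
    """Count points per named polygon; unmatched points are counted under None."""
    counts = {name: 0 for name in polygons}
    counts[None] = 0
    for x, y in points:
        for name, polygon in polygons.items():
            if point_in_polygon(x, y, polygon):
                counts[name] += 1
                break
        else:
            counts[None] += 1
    return counts


# Hypothetical tract boundaries and point events
tracts = {
    "tract_A": [(0, 0), (4, 0), (4, 4), (0, 4)],
    "tract_B": [(4, 0), (8, 0), (8, 4), (6, 6), (4, 4)],
}
accidents = [(1, 1), (3, 2), (5, 1), (6, 5), (9, 9)]
counts = aggregate_points(accidents, tracts)
```

Because the points themselves are stored unaggregated, the same file could be re-tabulated against school districts or any other set of boundaries simply by supplying different polygons.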

Analysis possibilities. There are a number of spatial analysis techniques that can be employed by geographic information systems. One of the most common is computer mapping of data. While maps are not a new idea, the use of a computer to prepare maps of data contained in various files is a fairly recent development. Maps are valuable both as a descriptive tool and as a means of evaluating alternatives. Indeed, a picture is often worth a thousand words and certainly worth more than ten pounds of computer printout.

Maps can be produced by both pen and ink plotters and by line printers although line printers are probably more popular due to their speed. Generally a number of different map types are useful. Conformant or choropleth maps show discrete areas such as states or counties and, by various types of shading, the value or intensity of a specific variable which refers to that region. These areas are considered uniform with respect to the statistics collected within them. Contour maps display data by interpolating a continuous surface in areas where there are no data points. Trend surface maps display data by fitting a continuous surface which can be described mathematically. Discrete point data are mapped by indicating the location and magnitude of each data element by some symbol.
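As a rough illustration of how a shaded map can be produced on a character printer, the following Python sketch assigns each cell of a small matrix to one of several equal-interval classes and prints a progressively darker symbol for each class. The symbol palette, the equal-interval classing, the function names and the data values are all assumptions chosen for the example, not a description of any particular mapping package.

```python
SHADES = ".-+*#"  # light to dark; one symbol per class (an assumed palette)


def classify(value, low, high, n_classes):
    """Equal-interval class index (0 .. n_classes - 1) for value."""
    if high == low:
        return 0
    index = int((value - low) / (high - low) * n_classes)
    return min(index, n_classes - 1)  # the maximum falls in the top class


def shade_map(grid, shades=SHADES):
    """Render a matrix of values as rows of shading symbols.

    grid is a list of rows, listed north to south; None marks a cell
    outside the study area and prints as a blank.
    """
    values = [v for row in grid for v in row if v is not None]
    low, high = min(values), max(values)
    lines = []
    for row in grid:
        symbols = [" " if v is None else shades[classify(v, low, high, len(shades))]
                   for v in row]
        lines.append("".join(symbols))
    return lines


# Hypothetical unemployment rates (percent) by cell
unemployment = [
    [None, 4.0, 5.0, 6.0],
    [3.0, 5.0, 12.0, 20.0],
    [3.0, 8.0, 35.0, None],
]
printed = shade_map(unemployment)
print("\n".join(printed))
```

Even on so small a matrix, the single dark symbol makes the pocket of high unemployment stand out in a way that a column of figures would not.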

In addition to mapping, there are a number of nongraphic analyses which can be performed on geocoded data including the following:

(1) Nonspatial statistical analyses. Include the normal calculations for factor analysis, analysis of variance, correlation, regression, etc., and are commonly performed upon area aggregated summary tabulations.

(2) Descriptive spatial statistics [1]. Are analogous to nonspatial descriptive statistics, describing the central tendency, dispersion, and skewness of a distribution, only in a two-dimensional sense.

(3) Trend surface analysis [3]. Provides a method whereby mathematical surfaces of increasing complexity, starting with a plane, are fitted to point observations. The surface fitting method used is related to regression analysis, only in three dimensions rather than two.

(4) Cluster analysis [7]. A general term for a large class of numerical techniques for defining groups of related entities based on high similarity coefficients. In spatial analyses the coefficients may reflect point-to-point proximity or area-to-area adjacency.

(5) Territorial analysis. Similar to cluster analysis but deals with the allocation of areas, people, etc., between point centroids. For examples of territorial analysis used to allocate school children to neighborhood schools, see Barb[2].

(6) Network analysis. Includes a large class of techniques used to analyze flows and routes through a network of spatially distributed points.
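To make the second item in the list above more concrete, the sketch below computes two widely used descriptive spatial statistics for a set of points: the mean center, a two-dimensional analogue of the mean, and the standard distance, a two-dimensional analogue of the standard deviation. These two measures are chosen here only as familiar examples; the function names and the clinic coordinates are hypothetical.

```python
import math


def mean_center(points):
    """Average x and average y of a list of (x, y) points."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)


def standard_distance(points):
    """Root mean squared distance of the points from their mean center."""
    cx, cy = mean_center(points)
    n = len(points)
    return math.sqrt(sum((x - cx) ** 2 + (y - cy) ** 2 for x, y in points) / n)


# Hypothetical clinic locations
clinics = [(0.0, 0.0), (4.0, 0.0), (4.0, 4.0), (0.0, 4.0)]
center = mean_center(clinics)
spread = standard_distance(clinics)
```

A small standard distance indicates a distribution clustered tightly about its center, while a large one indicates a dispersed pattern.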

Data collection. Data collection for geographic information systems is similar to other data collection since data may be generated as a result of routine reports or through a special survey or census. However, collecting geographic data is different because some type of geocode must be attached to each data element. One of the most successful systems to automatically add geocodes to a data file was developed by the U.S. Census Bureau for use in the 1970 census.

In metropolitan areas the 1970 Population Census was conducted by mail rather than by enumerators as was the case in rural areas. In order to classify and aggregate data from the mail-in returns, the Bureau needed a method to translate urban street addresses to codes for census tracts, enumeration districts, etc. To accomplish this the Bureau developed files for each Standard Metropolitan Statistical Area (SMSA) in the United States at the time of the 1970 census. These files, known as DIME (Dual Independent Map Encoding) files, are actually computerized maps of a city (see Fig. 2). DIME files were built by coding each street segment (a length of street between two distinct vertices, or nodes) and relating it to intersections, census tracts, ZIP areas, etc.

DIME files were used by the Census Bureau in the following manner. Returns from each house in a city were keypunched and transferred to magnetic tape which was then processed with the DIME file as illustrated in Fig. 3. Each house address was compared to the address range of each record in the DIME file. When a match was found various codes for census areas were appended to the original file of returns. Once these codes had been added, the statistics for the area corresponding to each code could be accumulated.
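The matching step can be sketched in Python as follows. This is an illustration under stated assumptions rather than the Census Bureau's actual program: the record layout `DimeSegment`, the helper names, the rule that the two sides of a street are distinguished by odd and even house numbers, and the right-hand range and code given for Braemore are all hypothetical. The remaining sample values echo those appearing in Figs. 2 and 3.

```python
from dataclasses import dataclass


@dataclass
class DimeSegment:
    """A simplified, hypothetical subset of one street segment record."""
    street_name: str
    street_type: str
    left_range: tuple   # (low, high) house numbers on the left side
    right_range: tuple  # (low, high) house numbers on the right side
    left_codes: dict    # codes for the left side, e.g. block and tract
    right_codes: dict


def parse_address(address):
    """Split '257 Braemore St.' into (257, 'BRAEMORE', 'ST')."""
    parts = address.upper().replace(".", "").split()
    return int(parts[0]), " ".join(parts[1:-1]), parts[-1]


def in_range(number, address_range):
    low, high = address_range
    # Assumption: numbers on one side of a street share the parity of the range.
    return low <= number <= high and number % 2 == low % 2


def geocode(address, segments):
    """Return the codes for an address, or None if no segment matches."""
    number, name, street_type = parse_address(address)
    for seg in segments:  # compare against each record in turn
        if (seg.street_name, seg.street_type) != (name, street_type):
            continue
        if in_range(number, seg.left_range):
            return seg.left_codes
        if in_range(number, seg.right_range):
            return seg.right_codes
    return None


segments = [
    DimeSegment("MAIN", "ST", (101, 199), (100, 198),
                {"block": 138, "tract": 9}, {"block": 131, "tract": 9}),
    # Only the 251-299 range and code 60 come from Fig. 3; the rest is invented.
    DimeSegment("BRAEMORE", "ST", (251, 299), (250, 298),
                {"code": 60}, {"code": 61}),
]

returns = ["257 Braemore St.", "150 Main St.", "12 Elm Ave."]
geocoded = [(address, geocode(address, segments)) for address in returns]
```

An address that matches no segment is returned with no code, which in practice would flag the record for manual review.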

Fig. 2. DIME record contents [10]. For each street segment, a DIME record contains:

Street name        MAIN
Street type        ST
Left addresses     101-199
Right addresses    100-198
Left block         138
Left tract         9
Right block        131
Right tract        9
Low node           123
  X-Y coordinate   155000, 232000
High node          124
  X-Y coordinate   156000, 234000

Fig. 3. Automated geocoding using DIME files [10]. A local file record (street name and number: 257 Braemore St.) is matched in the computer against a DIME file record (street name and address range: 251-299 Braemore; code to be appended: 60), and the code is appended to the local file (257 Braemore St., appended code 60).

By using the DIME concept any file containing records with street addresses can be modified by adding various geocodes, including x-y coordinates, to each record. DIME files are available for every SMSA in the country and the concept could be extended to any city with a system of street addresses.

Of course, not all data can be related to a street address. Assigning geocodes to other spatial data becomes more difficult but not impossible due to the existence of electronic digitizers especially suited to this task. Digitizers are machines used to record x-y coordinate, or grid, values from a map. The cross hairs attached to a pointer are positioned over each location on a map which is to be digitized and a button pressed to automatically record the coordinate location of that point on a card or magnetic tape. While the coordinates are being recorded, other data such as names and statistical values relating to points and areas can be entered through the computer console or a remote terminal connected to the digitizer. The coordinates thus recorded can be set in relation to a given origin. If this zero point is also given its correct location in a particular coordinate system (such as its latitude and longitude), then all the stored coordinates can be converted by computer processing.
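The conversion described in the last two sentences can be sketched for the simplest case, a plane coordinate system. The sketch assumes that the map is aligned with the digitizer axes (no rotation) and that a single scale factor applies in both directions; conversion to latitude and longitude would additionally require the map projection, which is not attempted here. The function name, origin, scale and sample reading are hypothetical.

```python
def make_converter(origin_table, origin_ground, ground_units_per_table_unit):
    """Build a function mapping digitizer readings to ground coordinates.

    origin_table:  digitizer reading of the reference (zero) point
    origin_ground: known ground coordinates of that same point
    Assumes no rotation and one scale factor for both axes.
    """
    ox_t, oy_t = origin_table
    ox_g, oy_g = origin_ground
    scale = ground_units_per_table_unit

    def convert(x_table, y_table):
        return (ox_g + (x_table - ox_t) * scale,
                oy_g + (y_table - oy_t) * scale)

    return convert


# Hypothetical setup: a 1:24,000 map digitized in inches, ground units in feet.
# One inch on the map is 24,000 inches, or 2,000 feet, on the ground.
to_ground = make_converter(origin_table=(1.5, 2.0),
                           origin_ground=(155000.0, 232000.0),
                           ground_units_per_table_unit=2000.0)
ground_point = to_ground(2.0, 3.0)
```

Once the converter is established for a map sheet, every stored digitizer reading from that sheet can be passed through it in a single processing run.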

Machine requirements

An integrated geographic information system should contain a set of file management routines, a set of mathematical and statistical routines, a set of spatial analysis routines, and a package of graphic display programs. In addition, the system should have a control program to translate user requests into machine-readable code. Defining the equipment required to implement such a system is difficult since few, if any, complete systems exist. However, the Data Presentation System (DPS), developed by IBM for the U.S. Government, meets many of the requirements. DPS operates in an OS/360 environment with a minimum configuration of 256 K bytes of main storage. This would require an IBM System/360 Model 40 or larger. In addition to the normal tape drives, direct access device, high-speed printer, and a card reader, a Calcomp plotter is required for the graphic output [6].

The equipment required for DPS is probably the minimum necessary to implement a fully developed geographic information system. However, a fully-developed system is not necessary in many applications. For example, if the user is primarily interested in computer mapping, programs such as the SYMAP[5] package produce a full complement of maps on the printer but are limited in their analysis and file management capabilities. SYMAP is commercially available and requires a machine with 128 K main storage.

A number of mapping and spatial analysis routines have also been written for smaller computers although they are not as sophisticated or versatile as programs written for larger machines. Most programs for the smaller computers require a system comparable to the IBM System/360 Model 30 with 32 K bytes of main storage. Such a system would allow the user to perform a limited amount of analysis, produce some maps on the line printer, and perform street address geocoding as illustrated earlier in Fig. 3.

In summary, a 32 K machine is about the minimum required to handle spatial data without being severely limited. Larger machines give the user more versatility, with a machine in the range of the IBM System/360 Model 40 with 256 K being required for a fully-developed geographic information system.

APPLICATIONS

Health

Geocoded data are particularly relevant to a number of studies where the location or spatial pattern of a given variable may be as significant as its magnitude. One example of this is in the area of public health where health officials are interested not only in the actual number of cases of a specific disease but also in the spatial distribution in a state, region, or city. A geocoded data file can be used to produce computer maps which quickly show areas where a particular disease is above or below average.

A geocoded file of reported disease can also be compared on an area-by-area basis with other data files to determine whether areal relationships exist. A recent finding that certain kinds of cancer were more prevalent in areas with high concentrations of chemical processing plants is an example of this type of study.

Other files of health indicator data such as infant mortality, number of visits to a doctor, distance to nearest hospitals or clinics can also be geocoded and analyzed to determine needs and optimal locations for outpatient clinics.

Crime prevention

The procedure for analyzing crime statistics is analogous to the procedures used with health data. Basically, the concern is with the same type of questions. Officials need to know what type of crimes are occurring and, more importantly, where. Again computer mapping can easily pinpoint areas with high and low crime rates and thereby help decision makers to allocate scarce resources to the proper area. In addition, other analyses will show any trends that might be present and also the relationship of crime increases or decreases to other variables.

Resource management

The management of natural resources such as land, water and wildlife can be improved through the use of geographic information systems. Satellite photographs of the earth's surface can be digitized and classified according to land use, availability of land, and suitability for various activities. The resulting file can be analyzed by regional planners using a geographic information system. Land suitable for recreation, industry, housing, farming, etc., can be listed or mapped by area. Changes in land use can be determined by periodic updates to the file and the effect on wildlife, water resources, etc., of these changes can be studied.

Traffic safety

The analysis of traffic accidents according to their location is a common duty of at least one agency in almost every state. The procedures are similar, but one problem is common to most: how to specify the location of an accident for use in the analysis procedures. The problem is attacked in a number of ways including the use of elaborate systems of mileposts on all roads to which an accident can be referenced.

A combination of the various geocoding systems discussed gives a simpler solution. For accidents in metropolitan areas, the address of accidents can be matched against DIME files and an x-y coordinate, or other code, appended to the accident file. For rural areas the location of accidents can be marked on medium to large scale maps and the maps then digitized. This produces a geocoded file which can then be used to determine locations with abnormally high accident rates. Such locations are targets for various counter-measures designed to reduce the number of accidents.
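One simple way such a geocoded accident file might be screened is sketched below in Python. Accidents are tallied by grid cell and a cell is flagged when its count exceeds the mean by more than k standard deviations. The use of raw counts rather than rates, the grid-cell tally, the threshold k = 2, and the function name are all assumptions for illustration; an operational study would normally adjust for traffic volume.

```python
import math
from collections import Counter


def high_accident_cells(accidents, cell_size, k=2.0):
    """Return {cell: count} for cells whose count exceeds mean + k * std.

    accidents is a list of (x, y) coordinates. The mean and standard
    deviation are taken over cells with at least one accident; k = 2 is
    an arbitrary illustrative threshold.
    """
    counts = Counter((math.floor(x / cell_size), math.floor(y / cell_size))
                     for x, y in accidents)
    values = list(counts.values())
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return {cell: n for cell, n in counts.items() if n > mean + k * std}


# Hypothetical file: one accident in each of nine cells, ten in a tenth cell
accidents = [(i * 100 + 50, 50) for i in range(9)] + [(950, 950)] * 10
hot_spots = high_accident_cells(accidents, cell_size=100)
```

The flagged cells can then be listed or mapped as candidate sites for counter-measures.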

Other

Other applications of geographically referenced data depend upon the type of data files available. In addition to data sources mentioned above, some of the more common data files that might be available to a state or local government include:

(1) Population characteristics
(2) Housing characteristics
(3) Income statistics
(4) Land value
(5) Traffic data
(6) School census data
(7) Migration
(8) Housing market data
(9) Welfare data
(10) Insurance data.

SUMMARY

Industrial engineers have often worked with information systems in manufacturing concerns and as industrial engineers move into nonmanufacturing areas such as state and local government, their knowledge of information systems is an important asset. However, the informational needs of state or local government are different from those of manufacturing firms in that often data have a spatial attribute; that is, the location or spatial pattern of data is often as important as the magnitude. Computer information systems, known as geographic information systems, designed to handle the locational or geographical aspect of data show promise of providing better information to state and local administrators.

Geographic information systems can be classified as grid systems, polygon systems, or point systems based upon the manner in which data are referenced to a specific geographic area or point. Grid systems are the most flexible, while polygon systems are more accurate for certain kinds of data such as natural resource data. Point systems can be considered a subset of grid systems since grid cells become points as they are made very small.


Geographic information systems can be used to map statistical information as well as for other nongraphic statistical analyses. Geographic information systems have many applications in public health, resource management, and public safety. However, in order to reach their full potential, more development must occur in the areas of data definition, data collection methods and data analysis. Industrial engineers can make a significant contribution to more effective state and local government by applying their skills to develop these areas.

REFERENCES

1. C. E. Barb, Automated Street Address Geocoding Systems: Their Local Adaptation and Institutionalization. Unpublished Ph.D. dissertation, University of Washington (1974).

2. C. E. Barb, Compiler, GEOCODING '72, Papers of the Geographic Base File Special Interest Group Sessions, 1972 Annual Conference of the Urban and Regional Information Systems Association. San Francisco (1972).

3. J. P. Cole & C. A. M. King, Quantitative Geography. Wiley, New York (1968).

4. Environmental Systems Research Institute, AUTOMAP II User's Manual. Redlands, California (1975).

5. Harvard University, Laboratory for Computer Graphics and Spatial Analysis, SYMAP User's Manual. Cambridge (1975).

6. National Military Command System Support Center, Data Presentation System User's Manual. Alexandria, Virginia (1971).

7. R. R. Sokal & P. H. A. Sneath, Principles of Numerical Taxonomy. W. H. Freeman, San Francisco (1963).

8. R. F. Tomlinson, ed., Geographical Data Handling, Vol. I & II. International Geographical Union, Ottawa (1972).

9. R. F. Tomlinson, H. W. Calkins & D. F. Marble, Computer Handling of Geographical Data. Unesco Press, Paris (1976).

10. U.S. Census Bureau, Census Use Study. DIME: A Geographic Base File Package. U.S. Government Printing Office, Washington D.C. (1972).

11. P. Werner, A Survey of National Geo-Coding Systems. U.S. Government Printing Office, Washington D.C. (1972).
