
LSGI 521: Principles of GIS
Lecture 5: Spatial Data Management in GIS

Dr. Bo Wu
Department of Land Surveying & Geo-Informatics
The Hong Kong Polytechnic University
2011/10/12

Contents

1. Learning outcomes
2. From files to database
3. Definition of spatial database
4. Evolution of spatial database
5. Fundamental database elements
6. Spatial database design
7. ESRI GeoDatabase
8. Issues for spatial database creation

Learning Outcomes

• By the end of this lecture you should be able to:
– Outline why databases are important in GIS
– Explain how the relational database model works
– Describe how to set up a relational database
– Know the procedure of database design
– Explain how to create a database in a GIS
– List important considerations for GIS databases
– Know the basic structure of ESRI GeoDatabase

Data Store Issues

• When do we store data?
– During and at the end of a session of data acquisition, editing, updating, or processing
– Communication (e.g. satellite-ground)
– Data exchange
– Archiving
• What do we store in a database?
– Anything important for GIS: spatial, attribute, and topological information
• Where do we store data?
– Mass media: disks, tapes, CD-ROMs, and optical discs
– Working media: hard discs, memory, and radio waves
– Maintenance and security
• How do we store data?
– Files
– Database (spatial database)
– …

Data Store in Files

• File
– A file is a collection of organized records of information
– A record usually has a record number and record content
• ASCII files
– Use ASCII characters to represent information
– ASCII (American Standard Code for Information Interchange) characters: 0, 1, ..., 9, A, ..., Z, a, ..., z, +, -, ..., @, ...
• Binary files
– Organize information according to bits and combinations of bits, and are in a computer-readable format

Record #    Record Content
1           x1,y1
2           x2,y2
3           x3,y3
...         ...
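The difference between the two file types can be sketched in a few lines of Python. This is only an illustration: the coordinate values, the file names, and the 20-byte record layout (`<idd`, one 4-byte integer plus two 8-byte doubles) are assumptions made for the example and do not come from the lecture.

```python
import os
import struct
import tempfile

# Hypothetical coordinate records: (record number, x, y).
records = [(1, 836000.5, 819500.25), (2, 836120.0, 819480.75), (3, 836250.125, 819610.0)]

def write_ascii(path, rows):
    """ASCII file: one human-readable, comma-separated line per record."""
    with open(path, "w", encoding="ascii") as f:
        for rec_no, x, y in rows:
            f.write(f"{rec_no},{x},{y}\n")

def read_ascii(path):
    rows = []
    with open(path, "r", encoding="ascii") as f:
        for line in f:
            rec_no, x, y = line.strip().split(",")
            rows.append((int(rec_no), float(x), float(y)))
    return rows

# Binary file: fixed-size records. "<idd" means little-endian,
# one 4-byte int followed by two 8-byte doubles (20 bytes in total).
RECORD = struct.Struct("<idd")

def write_binary(path, rows):
    with open(path, "wb") as f:
        for row in rows:
            f.write(RECORD.pack(*row))

def read_binary(path):
    with open(path, "rb") as f:
        data = f.read()
    return [RECORD.unpack_from(data, offset) for offset in range(0, len(data), RECORD.size)]

workdir = tempfile.mkdtemp()
ascii_path = os.path.join(workdir, "points.txt")
binary_path = os.path.join(workdir, "points.bin")
write_ascii(ascii_path, records)
write_binary(binary_path, records)
ascii_rows = read_ascii(ascii_path)
binary_rows = read_binary(binary_path)
```

Both files hold the same records. The ASCII file can be opened in any text editor, while the binary file has a fixed record size, so record n can be located directly at byte offset (n - 1) × 20.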

Examples of Data Files

[Figure: an ASCII file and a binary file]

From Files to Database

Example: assume there are ten cities with different elevations: H1=4m, H2=1004m, H3=820m, H4=640m, ..., H10=20m

• A simple list file: 4, 1004, 820, 640, ..., 20
– Easy to generate and edit
– Slow to search for specific data
• An ordered sequential file: 4, 20, 640, 820, ..., 1004
– Searching becomes relatively easy
– Not easy to update
• Direct indexed files
– Low elevation (L): 1 ~ 200 m
– Medium elevation (M): 201 ~ 700 m
– High elevation (H): 701 ~ 2000 m
– Easy to search
– Easy to update

This is actually a small database
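The three file organizations can be compared with a short Python sketch. The slide lists only five of the ten elevations, so the other five values below are invented for illustration; the class boundaries (L, M, H) are the ones given above.

```python
from bisect import bisect_left

# Elevations (m) for ten cities; 150, 90, 1500, 300 and 700 are made-up fillers.
elevations = [4, 1004, 820, 640, 150, 90, 1500, 300, 700, 20]

def linear_search(values, target):
    """Simple list file: scan every record until the target is found."""
    for position, value in enumerate(values):
        if value == target:
            return position
    return -1

def binary_search(sorted_values, target):
    """Ordered sequential file: the sort order lets us halve the range each step."""
    position = bisect_left(sorted_values, target)
    if position < len(sorted_values) and sorted_values[position] == target:
        return position
    return -1

def build_index(values):
    """Direct indexed file: group records under the L / M / H elevation classes."""
    index = {"L": [], "M": [], "H": []}
    for value in values:
        if value <= 200:
            index["L"].append(value)
        elif value <= 700:
            index["M"].append(value)
        else:
            index["H"].append(value)
    return index

ordered = sorted(elevations)
index = build_index(elevations)
```

Appending a new city to the simple list or to one index class is a single `append`, whereas the ordered file must be re-sorted (or the record inserted at the right position) to stay searchable. That is the update cost the slide refers to.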

Why We Need Database in GIS

The problems with the traditional approach to data management:
• Data redundancy (the unnecessary repetition or duplication of data)
• High maintenance costs
• Difficulties in moving from one system to another
• Data-sharing difficulties
• Lack of security and standards
• Lack of coherent and integrated management of data

A different version of the visitor’s details may be stored in each of the separate systems

What Is a Spatial Database

• Database: a large collection of data in a computer system, organized so that it can be expanded, updated, and retrieved rapidly for various uses. It could be a file or a set of files.
• Spatial database: a database that stores georeferenced data, for example buildings with their locations, or bank account holders with their addresses.

Generally Accepted Definition of Spatial Database

• It is a spatially enabled DBMS (Database Management System), with additional capability to handle spatial data
• It offers Spatial Data Types (SDTs) in its physical data representation (data model)
• It supports a special query language, e.g. SQL (Structured Query Language) with spatial extensions, for efficient manipulation of spatial data and geometric operations
• It supports efficient spatial indexing and effective algorithms for handling spatial joins
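To give a feel for what spatial indexing buys, here is a minimal sketch of a uniform grid index in Python. It is an assumption-laden toy: the cell size, point names, and coordinates are hypothetical, and production spatial databases typically use more elaborate structures (for example R-trees or quadtrees) rather than this exact scheme.

```python
from collections import defaultdict

CELL_SIZE = 100.0  # assumed grid cell size, in map units

def cell_of(x, y):
    """Return the (column, row) of the grid cell that contains (x, y)."""
    return (int(x // CELL_SIZE), int(y // CELL_SIZE))

def build_grid_index(points):
    """Map each grid cell to the ids of the points that fall inside it."""
    grid = defaultdict(list)
    for point_id, (x, y) in points.items():
        grid[cell_of(x, y)].append(point_id)
    return grid

def window_query(points, grid, xmin, ymin, xmax, ymax):
    """Return ids of points inside the window, visiting only the overlapping cells."""
    col_min, row_min = cell_of(xmin, ymin)
    col_max, row_max = cell_of(xmax, ymax)
    hits = []
    for col in range(col_min, col_max + 1):
        for row in range(row_min, row_max + 1):
            for point_id in grid.get((col, row), []):
                x, y = points[point_id]
                if xmin <= x <= xmax and ymin <= y <= ymax:
                    hits.append(point_id)
    return sorted(hits)

# Hypothetical georeferenced points.
points = {"A": (10.0, 20.0), "B": (150.0, 40.0), "C": (420.0, 380.0), "D": (95.0, 105.0)}
grid = build_grid_index(points)
result = window_query(points, grid, 0.0, 0.0, 200.0, 120.0)
```

The query inspects only the cells that overlap the search window, so point C, which lies in a distant cell, is never tested.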

A Simple Relational Database

• Data are organized in a series of tables, each of which contains records for one entity. Tables are linked by common data known as keys
• Queries are possible on individual tables or on groups of tables
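The two points above can be tried out with Python's built-in `sqlite3` module. The table names, column names, and rows below are hypothetical and chosen only to show one table per entity, a key linking the tables, and queries on one table versus a group of tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE district (
    district_id INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE building (
    building_id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    floors INTEGER,
    district_id INTEGER REFERENCES district(district_id)  -- the linking key
);
INSERT INTO district VALUES (1, 'Kowloon City'), (2, 'Yau Tsim Mong');
INSERT INTO building VALUES
    (10, 'Library', 6, 2),
    (11, 'Sports Hall', 2, 2),
    (12, 'Clinic', 3, 1);
""")

# Query on an individual table.
tall = conn.execute(
    "SELECT name FROM building WHERE floors > 2 ORDER BY name"
).fetchall()

# Query on a group of tables, linked through the district_id key.
joined = conn.execute("""
    SELECT b.name, d.name
    FROM building AS b
    JOIN district AS d ON d.district_id = b.district_id
    ORDER BY b.name
""").fetchall()
conn.close()
```

Because the district name is stored once and referenced by key, renaming a district means updating a single row, which avoids the data redundancy problem described earlier.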

Early Spatial Database: Loosely Coupled Approach

• Characteristics
– A relational DBMS, or some components of it, for descriptive data
– A specific module for spatial data management
– Usually proprietary
– Examples: ArcInfo (ESRI), TiGRis (Intergraph)
• Drawbacks
– The coexistence of heterogeneous data models, which implies difficulties in modeling, use, and integration
– Because the systems are proprietary, they are hard to integrate and interoperate

[Figure: DB and Files]

Current Spatial Database: Extended DBMS

• Characteristics
– New spatial data types (points, lines, polygons) are handled in the same way as base alphanumeric types
– The query language SQL is extended to manipulate both spatial data and descriptive data
– Many other DBMS functions, e.g. spatial indexing and query optimization, are adapted to handle geospatial data efficiently
– Open in data model and architecture
– Examples: ArcSDE (ESRI), Oracle Spatial

[Figure: spatial & alphanumeric types]

Fundamental Database Elements

• Entity
– An entity is “a phenomenon of interest in reality that is not further subdivided into phenomena of the same kind”, e.g. city, building
– Entities in reality are modeled in a spatial database as objects
• Object
– An object is “a digital representation of all or part of an entity”
• Entity types
– An entity type is any grouping of similar phenomena that should eventually be represented and stored in an identical way (e.g. streets, buildings, rivers, vegetation)
– The 1st step in DB design is the selection and definition of the entity types to be included
– The 2nd step in DB design is to choose an appropriate method of spatial representation for each entity type

Fundamental Database Elements (cont'd)

• Spatial object type
– The digital representation of entity types in a spatial DB
– Classification is based on the spatial dimensions: 0, 1, 2, 3-D
– Point (0-D), line (1-D), area/polygon (2-D), volume (3-D)
• Object class
– An object class is the set of objects which represent a set of entities of the same type
• Attributes
– An attribute is a characteristic of an entity
– Usually non-spatial; can be stored as alphanumeric values
– Are presented as columns in attribute tables in the DB
• Attribute values
– The actual value of the attribute that has been recorded in the DB
– An entity type is almost always labeled and known by its attributes
– Are usually presented as cells in attribute tables

Fundamental Database Elements (cont'd)

• Data type
– The attribute of a variable, field, or column in a table that determines the kind of data it can store
– Common data types include character, integer, decimal, single, double, and string
• Layers
– Spatial objects can be grouped into layers, also called themes
– One layer may represent a single entity type or a group of conceptually related entity types
• Behavior
– A set of rules that defines how an object can be edited and drawn
• Relational join
– An operation by which two tables are related through a common field, known as a key
• Spatial join
– A type of table join operation in which fields from one layer’s attribute table are appended to another layer’s attribute table based on the relative locations of the features in the two layers
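A spatial join can be sketched in plain Python, assuming a point layer and a polygon layer. Everything here is illustrative: the layer contents are invented, the join rule is "point falls inside polygon", and the containment test is a standard ray-casting check rather than whatever a particular GIS package uses internally.

```python
def point_in_polygon(x, y, ring):
    """Ray-casting test; ring is a list of (x, y) vertices (not repeated at the end)."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Does a horizontal ray from (x, y) towards +x cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def spatial_join(points, polygons):
    """Append the 'zone' attribute of the containing polygon to each point's row."""
    joined = []
    for point in points:
        row = dict(point["attrs"])
        row["zone"] = None  # stays None when no polygon contains the point
        for polygon in polygons:
            if point_in_polygon(point["x"], point["y"], polygon["ring"]):
                row["zone"] = polygon["attrs"]["zone"]
                break
        joined.append(row)
    return joined

# Hypothetical polygon layer (land-use zones) and point layer (schools).
zones = [
    {"ring": [(0, 0), (10, 0), (10, 10), (0, 10)], "attrs": {"zone": "Residential"}},
    {"ring": [(10, 0), (20, 0), (20, 10), (10, 10)], "attrs": {"zone": "Commercial"}},
]
schools = [
    {"x": 3, "y": 4, "attrs": {"name": "School A"}},
    {"x": 15, "y": 5, "attrs": {"name": "School B"}},
    {"x": 30, "y": 5, "attrs": {"name": "School C"}},
]
result = spatial_join(schools, zones)
```

Unlike a relational join, no shared key column is needed: the two attribute tables are matched purely by where the features lie.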

Fundamental Database Elements (cont'd)

• Database model
– A conceptual description of a DB, defining entity types and their associated attributes
• Georelational data model
– A spatial data model that represents spatial features as an interrelated set of spatial and attribute data
– The georelational data model is the fundamental data model used in ArcInfo
• Geodatabase data model
– An object-oriented data model introduced by ESRI that represents spatial features and attributes as objects, together with the relationships between objects
– A geodatabase can store objects such as feature classes, feature datasets, non-spatial tables, and relationship classes

Database Design Considerations

• Almost all entities in reality have a 3-D spatial character, but not all dimensions may be needed
– Highway pavement actually has a depth, which might be important, but it is not as important as the width, which in turn is not as important as the length
• Representation should be based on the types of process that the application may ultimately use
– Vector vs. raster
• The map scale of the source data is important in constraining the level of detail represented in a DB
– On a 1:20000 map, individual buildings are not visible
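The map-scale point can be checked with simple arithmetic: map size = ground size / scale denominator. The short Python sketch below assumes a hypothetical 15 m wide building and an assumed legibility threshold of about 1 mm on paper; neither figure comes from the lecture.

```python
def map_size_mm(ground_size_m, scale_denominator):
    """Size on the map, in millimetres, of a feature with the given ground size."""
    return ground_size_m * 1000.0 / scale_denominator

# Assumed threshold: features smaller than about 1 mm on paper are hard to draw as areas.
MIN_LEGIBLE_MM = 1.0

def drawable_as_area(ground_size_m, scale_denominator):
    return map_size_mm(ground_size_m, scale_denominator) >= MIN_LEGIBLE_MM

building_width_m = 15.0  # hypothetical building footprint width
size_at_20000 = map_size_mm(building_width_m, 20000)
size_at_1000 = map_size_mm(building_width_m, 1000)
```

Under these assumptions the building is 0.75 mm wide at 1:20000, too small to show as an area, but 15 mm wide at 1:1000, where it can be captured as a polygon.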

Database Design Considerations (cont’d)

• What storage media to use?
– How large is the database?
– How much can be stored online?
– What access speed is required for what parts of the database?
– How should the database be laid out on the various media?
– What growth should be allowed for in acquiring storage devices?
• How will the database change over time?
– Will new attributes be added?
– Will the number of features stored increase?
• How should the data be partitioned, both geographically and thematically?
– Is source data partitioned?
– Will products be partitioned?

Database Design Considerations (cont’d)

• What security is needed?
– Who should be able to create new attributes and new objects?
– Who should be able to edit and update?
• Should the database be distributed or centralized?
– If distributed, how will it be partitioned between hosts?
• How should the database be documented?
• How should database creation be scheduled?
– Where will the data come from?
– Who determines product priorities?
– Who is responsible for scheduling data availability?

Spatial Database Design 1 - Conceptual

• Conceptual design
– Software and hardware independent
– Describes and defines entities and spatial objects
– Identifies how entities will be represented in the database
– Selection of spatial object types: points, lines, polygons, raster cells

Example questions:
– Should a building be represented as an area or a point?
– Should highway segments be explicitly linked in the DB?

Spatial Database Design 2 - Logical

• Logical design
– Software specific but hardware independent
– Translation of the conceptual model into the data model of the GIS
– Determined by the database management system

Spatial Database Design 3 - Physical

• Physical design
– Both hardware and software specific
– Related to issues of file structure, memory size, and access requirements

Spatial Database Design

[Figure: Conceptual Design, Logical Design, Physical Design]

Bad Aspects in Database Design

• Omitted data
• No update potential
• Inappropriate representation of entities
• Lack of integration between various parts of the database
• Unsupported applications

General Steps in Spatial Database Design: the ESRI Geodatabase Approach

Step 1: Model the User’s View

Things to do:
• Identify organizational functions
• Identify the data required to support the functions
• Organize the data into logical sets of common features

An Example: System Function Diagram for LSIS

[Figure: system function diagram for LSIS]

Step 2: Define Entities and Relationships

Things to do:
• Identify and describe the entities that you model
• Identify and describe the relationships among these entities
• Document the entities and relationships with diagrams (UML, Data Flow Diagrams)

An Example: System Entity Descriptions for LSIS

[Figure: system entity descriptions for LSIS]

An Example: E-R Diagram for LSIS

[Figure: E-R (Entity Relationship) diagram for LSIS]

Step 3: Select Geographic Representations

Things to be considered:
• The feature might be represented on a map
• The shape of the feature might be significant in performing geographic analysis (e.g., tracing a water pipe network)
• Accessing one feature from another by relationships
• Features will have different representations at different map scales
• Textual attributes will be displayed on the screen or on map products (e.g., labels, annotations)

Point, line, polygon, surface, image, ...?

An Example: Geographic Representation in LSIS

[Figure: same entity, different symbol for different map scales]

Step 4: Matching Entities to GeoDB Data Model

Things to do:
• Software specific (data model supported by the software)
• Develop an efficient and effective database schema
• Determine the appropriate geodatabase representation for entities
• Ensure that complex feature classes are supported

Step 5: Organize Spatial Database Structure

Things to do:
• Assign entities to feature classes and subtypes (subclasses)
• Group related sets of features into geometric networks or planar topologies
• Organize feature classes and datasets into a geodatabase

An Example: Spatial Database Structure

[Figure: example spatial database structure]

ESRI’s Geodatabase

• Core ESRI ArcGIS data model
• Set of ArcObjects components in ArcGIS for accessing data
• A physical store of geographic data

Geodatabase Data Management

• Personal Geodatabase
– Single user editing
– Stored in MS Access
– Size limit of 2 GB
• File Geodatabase (9.2)
– 1 TB per table
– Reduced storage requirements
• ArcSDE Geodatabase
– Enterprise
– Supports multiuser editing via versioning
– Requires ArcEditor or ArcInfo to edit

[Figure labels: ArcGIS; Personal Geodatabase; File Geodatabase; ArcSDE Geodatabase; ArcSDE; Oracle, SQL Server, DB2, Informix]

Create a Geodatabase

• Define schema in ArcCatalog
– Define feature classes, datasets, relationships, etc.
• Import and convert data from other formats
– Shapefile
– Coverage
– CAD
– Raster
• Copy and paste
• Use an ESRI Data Model
– Industry-specific data models available
– Copy geodatabase template

Introducing Versioning

• Effective functionality for the storage, retrieval, and query of historical data
• A new data item, “EDITED FEATURE”, is proposed to be added to each feature of the map layers
• Feature change log items have been identified, such as User ID, Geo-reference number, Map ID, Map Feature ID, and others
• The versioning function in ArcInfo 8 can be used as an efficient tool for
– Recording the historical change of certain types of land feature, such as land parcels
– Periodic archiving (e.g., once each week) of the overall database
• Data volume can be one of the major concerns
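The idea of a feature change log can be illustrated with a small append-only history in Python. This is not how ArcInfo 8 versioning is implemented; it is a toy sketch, and the class names, the integer version counter, and the sample parcels are all hypothetical. Only the log fields (user ID, map ID, map feature ID) echo the items named on the slide.

```python
from dataclasses import dataclass

@dataclass
class ChangeRecord:
    version: int       # hypothetical edit counter, increases with every edit
    user_id: str
    map_id: str
    feature_id: str
    attributes: dict   # attribute values of the feature after the edit

class FeatureHistory:
    """Append-only change log: earlier states are never overwritten."""

    def __init__(self):
        self._log = []

    def record_edit(self, user_id, map_id, feature_id, attributes):
        version = len(self._log) + 1
        self._log.append(ChangeRecord(version, user_id, map_id, feature_id, dict(attributes)))
        return version

    def state_at(self, feature_id, version):
        """Return the feature's attributes as of the given version, or None."""
        state = None
        for record in self._log:
            if record.version > version:
                break
            if record.feature_id == feature_id:
                state = record.attributes
        return state

history = FeatureHistory()
v1 = history.record_edit("u01", "M-12", "parcel-7", {"owner": "Chan", "area_m2": 420})
v2 = history.record_edit("u02", "M-12", "parcel-9", {"owner": "Lee", "area_m2": 300})
v3 = history.record_edit("u01", "M-12", "parcel-7", {"owner": "Wong", "area_m2": 420})
```

Because every edit adds a record instead of replacing one, historical queries are straightforward, but the log grows with each change. This is the data-volume concern noted in the last bullet.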

Key Hardware Parameters

• Data volume of a GIS
– Databases for GIS applications range from a few megabytes (a small resource management project) to hundreds of Gbytes

Case | Data Volume
A small raster-based project: IDRISI, 100 by 200 cells, 50 layers | 10 Mbytes
A mid-sized vector-based project: National Forest in ARC/INFO | 300 Mbytes
A national archival database | many hundreds of Gbytes
Spatial database of Landsat imagery accumulated to 1989 | order of 10^13 bytes
Hong Kong (3,000 sheets), BMS 1:1000, Arc/Info, 16 layers | 4 Gbytes
Hong Kong CIS (3200 maps): 1:1000, 33 layers, Arc/Info | 3 Gbytes

Key Hardware Parameters (cont’d)

• Access speed
– “On-line”: data which can be accessed
– Archival media:
  • Magnetic tape
  • CD-ROM
  • …
• Network configuration
– Should the database be centralized or distributed?
– All departments share one common database, or parts of the database exist on different workstations in an integrated network

A Centralized Database Architecture

• To enhance overall system performance
• To solve data integrity, version control, and data concurrency problems
• To integrate various GIS sub-systems
• To apply client-server and other technologies for handling data communication and management between the Information Centre and sub-offices

[Figure labels: Office-1, Office-2, ……, Office-19, Office-20; Server; HQ DB; Information Centre]

Scheduling Database Creation

• Database creation is a time-consuming and expensive operation which must be phased over several years of operation
• The complexity of the data on each input source document must be known in order to forecast the data input workload

Database Scheduling Issues

• To determine the order of dataset input, products must be ranked based on
– Perceived benefit
– Cost of necessary input
• Know the trade-offs between
– producing a single tile of a new product
– producing further tiles of an existing product
• Setting priorities under the constraint of data input capacity is a delicate operation for the database manager
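One simple way to turn the ranking idea into something concrete is a greedy ordering by benefit per unit of input cost, cut off at the available input capacity. The lecture does not prescribe this heuristic; it is offered only as an illustration, and the product names, benefit scores, and hour estimates below are invented.

```python
def schedule(products, capacity_hours):
    """Greedy ranking by benefit per hour of input effort, within a capacity limit."""
    ranked = sorted(products, key=lambda p: p["benefit"] / p["cost_hours"], reverse=True)
    chosen, remaining = [], capacity_hours
    for product in ranked:
        if product["cost_hours"] <= remaining:
            chosen.append(product["name"])
            remaining -= product["cost_hours"]
    return chosen, remaining

# Hypothetical products: perceived benefit (arbitrary score) and input cost (hours).
products = [
    {"name": "land parcels, tile 1", "benefit": 90, "cost_hours": 300},
    {"name": "road network, tile 1", "benefit": 60, "cost_hours": 100},
    {"name": "land parcels, tile 2", "benefit": 40, "cost_hours": 250},
    {"name": "vegetation, tile 1", "benefit": 20, "cost_hours": 200},
]
chosen, remaining = schedule(products, capacity_hours=500)
```

In this made-up case the second tile of an existing product ranks below the first tile of a new one, which is the kind of trade-off the database manager has to weigh. A greedy ordering is not guaranteed to be optimal, so it should be read as a starting point for discussion rather than a scheduling rule.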

Review

• Further readings
– Geodatabase, ESRI (http://www.esri.com/software/arcgis/geodatabase/index.html)
– David Arctur, 2004, Designing Geodatabases: Case Studies in GIS Data Modeling, Esri Press, ISBN-13: 978-1589480216, 393p.
• Summarization of the main ideas presented in this lecture
• Questions?