
Queries, overlays and lookup tables

a.k.a. ‘How to preprocess your data in a meaningful way’

Devis Tuia, v0.1, March 14 2018

Why are we talking about it

In a GIS project, we create a LOT of datasets

● We like to avoid unnecessary information (it takes up space)

● We like to avoid redundancy (it makes a mess)

● We like to avoid erroneous calculations (sometimes the tool used has an impact on the final result)


Example: a “Merge” duplicates!

Result of the merge:

ID   length
1    100     (same feature as ID 2)
2    100     (same feature as ID 1)
3    10

So if you calculate the total length, it will be 210 instead of 110.
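The effect of the duplicate on the total can be checked in a few lines of Python. This is only a minimal sketch: the `geom` key standing in for the geometry, and the rule "same geometry means same feature", are assumptions made for the illustration and not what the Merge tool does internally.

```python
# Rows of the merged attribute table; "geom" is a hypothetical stand-in
# for the geometry of each feature.
merged = [
    {"id": 1, "length": 100, "geom": "road_A"},
    {"id": 2, "length": 100, "geom": "road_A"},  # same feature as ID 1
    {"id": 3, "length": 10, "geom": "road_B"},
]

# Naive total: the duplicated feature is counted twice.
naive_total = sum(row["length"] for row in merged)

# Keep one row per geometry before summing.
unique = {}
for row in merged:
    unique.setdefault(row["geom"], row)
deduplicated_total = sum(row["length"] for row in unique.values())

print(naive_total, deduplicated_total)
```

The first number is the inflated total, the second the total once the duplicate is dropped.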

Menu of the day

Different families of data handling types

(can be queries, transformations or alterations)

Reviewing useful tools from each family

Introduction of lookup tables (called “reclassification tables” in ArcGIS)


Families of handling types

There are two main families

● Query-like: they work on the features, but do not modify the attributes

This means: you select, cut and paste, but the attributes stay the same.

● Overlays: they modify both geometry and attributes.

This means: the resulting features have a different set of attributes than the originals.


QUERIES, part 1

selecting


Queries, group 1: the select

Select only highlights features according to a query

It can be

● Spatial selection: ‘select buildings WITHIN Wageningen’

● Attribute selection: ‘select buildings where GM_NAAM = "Wageningen"’

In the first case, you need a second (polygon) feature class with the municipality boundaries. In the second, you need the municipality of every building in the attribute table, under ‘GM_NAAM’ (see the “overlays” tools later).
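The two kinds of selection can be illustrated outside a GIS with plain Python. This is a sketch under simplifying assumptions: buildings are reduced to points, the municipality boundary is reduced to a rectangle, and all coordinates and function names are hypothetical.

```python
# Hypothetical buildings: a point location plus the GM_NAAM attribute.
buildings = [
    {"id": 1, "x": 2.0, "y": 3.0, "GM_NAAM": "Wageningen"},
    {"id": 2, "x": 8.0, "y": 1.0, "GM_NAAM": "Ede"},
    {"id": 3, "x": 4.5, "y": 4.0, "GM_NAAM": "Wageningen"},
]

# Hypothetical municipality boundary, simplified to a rectangle
# (xmin, ymin, xmax, ymax).
wageningen = (0.0, 0.0, 5.0, 5.0)


def select_by_attribute(features, field, value):
    """Return the ids of the features whose attribute equals the value."""
    return [f["id"] for f in features if f[field] == value]


def select_by_location(features, box):
    """Return the ids of the point features that fall within the rectangle."""
    xmin, ymin, xmax, ymax = box
    return [
        f["id"]
        for f in features
        if xmin <= f["x"] <= xmax and ymin <= f["y"] <= ymax
    ]


by_attribute = select_by_attribute(buildings, "GM_NAAM", "Wageningen")
by_location = select_by_location(buildings, wageningen)
print(by_attribute, by_location)
```

Both functions only return which features are selected; the list of buildings itself is left unchanged, which mirrors the idea that a selection only highlights features.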


Examples of select by location


To select within a buffer, you don’t need to create and save a buffer

HINT about the tool: if your selecting feature class has a feature selected, the tool will select only within that feature

[Figure: two examples, each with an INPUT and a RESULT panel]

But remember: it is just a selection!

You haven’t saved the resulting features (your selection is only in memory)

If you want, you can:

● Save them manually (right click on the feature class being selected)

● Use another tool with direct saving options: this is necessary to guarantee reproducibility

Selections are taken into account when performing operations (see previous slide for buildings in Wageningen)


QUERIES, part 2

cutting out


Clip

A way to select by location AND save.

It does not create any new attribute: it just cuts and copies into a new feature class (the entity stays the same)

But remember: IT CUTS features

With select by location, you select the features and then save them, so you won’t alter their geometry


Comparing them:

The green feature class is being clipped to the area of the blue square

Clip modifies the geometry, selecting does not.

Neither adds attributes.
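The difference between selecting and clipping can be sketched in one dimension, with segments playing the role of the green features and an interval playing the role of the blue square. The data and function names are hypothetical; this is an illustration of the idea, not the ArcGIS implementation.

```python
# Hypothetical 1-D "features": (id, start, end) segments along a line.
green = [(1, 0, 4), (2, 3, 9), (3, 10, 12)]
blue_window = (2, 8)  # the selecting / clipping area


def select_by_location(features, window):
    """Keep whole features that overlap the window; geometry is untouched."""
    lo, hi = window
    return [(fid, s, e) for fid, s, e in features if s < hi and e > lo]


def clip(features, window):
    """Keep only the part of each feature inside the window; geometry is cut."""
    lo, hi = window
    clipped = []
    for fid, s, e in features:
        new_s, new_e = max(s, lo), min(e, hi)
        if new_s < new_e:  # skip features entirely outside the window
            clipped.append((fid, new_s, new_e))
    return clipped


selected = select_by_location(green, blue_window)
clipped = clip(green, blue_window)
print(selected)  # features 1 and 2, with their original extents
print(clipped)   # features 1 and 2, cut to the window
```

In both results the tuples carry the same fields as the input, so no attribute is added; only the clip changes the start and end of the features.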


Erase

Does the same job as clip, but removes the intersection with the second feature class instead of keeping it

So remember: it modifies the geometry.
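As a rough one-dimensional illustration, erasing can be written as keeping the pieces of each segment that lie outside the erasing interval. The segments and the function name are hypothetical.

```python
def erase(features, window):
    """Remove from each (id, start, end) segment the part inside the window."""
    lo, hi = window
    result = []
    for fid, s, e in features:
        if s < lo:  # piece to the left of the window
            result.append((fid, s, min(e, lo)))
        if e > hi:  # piece to the right of the window
            result.append((fid, max(s, hi), e))
    return result


# Hypothetical features; feature 4 spans the whole window and is split in two.
green = [(1, 0, 4), (2, 3, 9), (3, 10, 12), (4, 1, 11)]
erased = erase(green, (2, 8))
print(erased)
```

A feature crossing the whole window comes out as two pieces with the same id, which shows that the geometry is modified while the attributes are carried over unchanged.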


QUERIES, part 3

adding / removing features


The mergers

Sometimes we want to make a single feature class from two

= we concatenate the features

This time, they MUST have the same attributes!

(entities must be the same)

Can merge:

ID   Height  Width  Region
8    55      3      12
9    54      4      16
10   12      6      17

ID   Height  Width  Region
5    17      6      12
6    19      5      17
7    44      3      15

Cannot merge!

ID   Height  Width  Latitude
8    55      3      12N
9    54      4      16N
10   12      6      17N

ID   Height  Width  Region  Color
5    17      6      12      Red
6    19      5      17      Green
7    44      3      15      Red

Append VS merge

Both combine datasets into a single one

Differences:

● APPEND writes into an existing dataset, MERGE creates a new one

● In APPEND, the existing (target) dataset does not need to have the same attribute structure as the inputs if you use the NO_TEST option; input fields that do not match the target are then dropped unless you map them explicitly

BOTH GENERATE DUPLICATE FEATURES
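The behaviour described above can be mimicked with small Python tables. This is a simplified sketch, not the ArcGIS tools: the table layout, the strict field comparison (no NO_TEST equivalent) and the duplicated row with ID 10 in the second table are assumptions made for the illustration.

```python
def merge(first, second):
    """Return a NEW table with the rows of both inputs (fields must match)."""
    if first["fields"] != second["fields"]:
        raise ValueError("cannot merge: attribute structures differ")
    return {"fields": list(first["fields"]), "rows": first["rows"] + second["rows"]}


def append(target, source):
    """Add the rows of source to the EXISTING target table (fields must match)."""
    if target["fields"] != source["fields"]:
        raise ValueError("cannot append: attribute structures differ")
    target["rows"].extend(source["rows"])


fields = ["ID", "Height", "Width", "Region"]
office_a = {"fields": fields, "rows": [(8, 55, 3, 12), (9, 54, 4, 16), (10, 12, 6, 17)]}
# Hypothetical second table that also contains the feature with ID 10.
office_b = {"fields": fields, "rows": [(5, 17, 6, 12), (6, 19, 5, 17), (10, 12, 6, 17)]}
with_latitude = {"fields": ["ID", "Height", "Width", "Latitude"], "rows": [(8, 55, 3, "12N")]}

merged = merge(office_a, office_b)           # office_a is left untouched
duplicates = merged["rows"].count((10, 12, 6, 17))

try:
    merge(office_a, with_latitude)
    schema_error = None
except ValueError as err:
    schema_error = str(err)

rows_before_append = len(office_a["rows"])
append(office_a, office_b)                   # office_a itself grows
print(len(merged["rows"]), duplicates, schema_error)
```

Neither function checks whether a row is already present, so the shared feature ends up twice in the output in both cases.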


Example

We want to merge these two overlapping buildings feature classes

APPEND and MERGE will create an extra building feature!


OVERLAYS


Overlays in general

They are based on the concept of spatial joins.

They combine two feature classes

● can be of different feature types (point, line, polygon)

● can have different attributes

They proceed in three steps:

1. First, apply a spatial query based on location

2. Then, apply a specific geometric processing (based on set theory)

3. Finally, concatenate (join) the attributes: the output will have as many attributes as the sum of those of the input feature classes.


Geometric operations

Union

Intersect

Identity

Update

Symmetric difference


Set overlays: union

Obtains all the feature primitives of both feature classes, after intersecting them (and removing duplicates)

The primitives of Union are used in all the other tools.


Set overlays: union (attribute table)

Attributes (green feature class)   Attributes (blue feature class)
Values                             Values
<null>                             Values
Values                             Values
Values                             <null>
Values                             Values
Values                             <null>
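How the <null> entries arise can be sketched in one dimension: two layers of segments are cut at every start and end point, and each resulting primitive receives the attribute of each layer, or `None` where that layer is absent. The function and data are hypothetical, and the sketch assumes that segments within one layer do not overlap.

```python
def union(green, blue):
    """Split two lists of (start, end, attribute) segments into primitives.

    Each output row is (start, end, green_attribute, blue_attribute); an
    attribute is None where that layer does not cover the primitive.
    """
    # Every start or end point of either layer is a potential cut point.
    cuts = sorted({p for s, e, _ in green + blue for p in (s, e)})
    rows = []
    for lo, hi in zip(cuts, cuts[1:]):
        g = next((a for s, e, a in green if s <= lo and hi <= e), None)
        b = next((a for s, e, a in blue if s <= lo and hi <= e), None)
        if g is not None or b is not None:  # drop gaps covered by no layer
            rows.append((lo, hi, g, b))
    return rows


green = [(0, 6, "forest")]
blue = [(4, 10, "protected")]
primitives = union(green, blue)
for row in primitives:
    print(row)
```

The middle primitive is covered by both layers and has both attributes filled in; the two outer ones have `None` on one side, like the <null> rows of the table.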

Set overlays: union

Examples of use

Merge all built structures from a set of feature classes coming from different offices (some buildings will be duplicated, for example)

Merge a multi-year dataset of deforestation areas (non-forest areas remain from year to year)


Set overlays: intersection

Keeps only the primitives that belong to both inputs

= removes all primitives that belong to only one of the two


Set overlays: intersection (attribute table)




Values Values

Values Values

Values Values

Set overlays: intersection

Examples of use

Extract buildings within Wageningen

(ok, can also be done with a clip if joining the attributes is not desired)

Extract points of interest on the highway between Ede and Utrecht (when you need to be able to differentiate those in the Utrecht and Gelderland provinces; otherwise it’s a clip)


Set overlays: identity

keeps only the primitives spatially located on the original input feature class

(in our case the blue square in the left image)


Set overlays: identity (attribute table)




Values Values

<null> Values

Values Values

Values Values

Set overlays: identity

Examples of use

Extract, within a protected zone, which areas are forest (forest areas are generally bigger polygons, so the features will need to be cut) and which are not


Set overlays: update

Merges all primitives belonging to the update feature class.


Set overlays: update (attribute table)




<null> Values

Values <null>

Values <null>

Set overlays: update if same attributes


Feature Attributes

Values

Values

Values

Set overlays: update

Examples of use

Extract forest areas EXCLUSIVE OR protected zones

Merge two versions of the same map. The most recent remains identical and differences found in the old one are added


Set overlays: symmetric difference

Removes primitives spatially common to both feature classes

= Union - Intersection
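The set-theory view can be checked with Python's built-in sets. In this sketch each layer is described only by the (hypothetical) grid cells it covers, so it illustrates the geometric part of the overlays and ignores the attributes.

```python
# Hypothetical layers, described by the set of grid cells they cover.
forest = {"A1", "A2", "B1", "B2"}
protected = {"B2", "B3", "C3"}

union = forest | protected                 # cells covered by at least one layer
intersection = forest & protected          # cells covered by both layers
symmetric_difference = forest ^ protected  # cells covered by exactly one layer

# Symmetric difference = Union - Intersection
same_result = symmetric_difference == union - intersection

print(sorted(intersection))
print(sorted(symmetric_difference))
print(same_result)
```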



Set overlays: symmetric difference (attribute table)




<null> Values

Values <null>

Values <null>


Set overlays: symmetric difference

Examples of use

Extract areas that are EITHER forests OR protected (but not both)


Summing up

There are many tools at your disposal

First think of what you need for further calculations (carry attributes or not? Do you need to save the result, or is selecting enough?)

Then think of the most efficient way: most solutions can be reached with a combination of tools, but sometimes also with a single one!

Remember: if features are selected in the processed feature class, only those will be processed.

Be careful of duplicate features (overlaps) when merging datasets!


LOOKUP TABLES


What is a lookup table?

Also called Reclassification tables

It is basically an attribute that

● Has several repeated entries in a single feature class

● Appears with same name and meaning across feature classes

A classical example is the TDN code

Another is neighborhood codes


TDN codes

They are a centrally defined description of land use

They fit our description

● TDN codes are repeated: many polygons are of the same land use type

● Land use types can be found in attribute tables concerning agricultural fields, buildings, ...


Region codes

A single numerical code to define a region (it can also be the name of the region, of course, but then be careful with typos)

● Region codes are repeated: many fields are located in the same region

● Fields and protection areas can have a “region” attribute, to select them more easily.


Lookup tables are good for

Keeping in mind the important grouping variables

Ensuring that you do not have duplicates

Structuring your data: if your grouping variables are in tables, you will reuse them!


Lookup tables (by building type)


Lookup tables (by TDN code)


Careful!

The extra attributes you add are summed (so it is not very helpful to sum TDN codes, for instance)

Use it for distances, counts, ...

To obtain a lookup table we use the Frequency tool.

This tool is not to be confused with the Reclassify tool, where we recode an attribute in a raster (e.g. all values for deciduous forest (value 10) and pine forest (value 20) are reclassified into generic forest (value 1))
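The difference between the two operations can be sketched in plain Python. Both functions are simplified stand-ins, not the ArcGIS tools: the parcel data, the field names `code` and `area`, and the function names are hypothetical, and the output column is called `FREQUENCY` only to echo the count produced by a frequency table.

```python
from collections import defaultdict


def frequency(rows, key_field, sum_field):
    """Build a lookup table: one row per unique key, with a count and a sum."""
    table = defaultdict(lambda: {"FREQUENCY": 0, sum_field: 0})
    for row in rows:
        entry = table[row[key_field]]
        entry["FREQUENCY"] += 1
        entry[sum_field] += row[sum_field]
    return dict(table)


def reclassify(values, mapping):
    """Recode raster cell values with a mapping; unmapped values are kept."""
    return [mapping.get(v, v) for v in values]


# Hypothetical parcels with a land use code and an area.
parcels = [
    {"code": 10, "area": 2.5},
    {"code": 20, "area": 1.0},
    {"code": 10, "area": 4.0},
]

lookup = frequency(parcels, "code", "area")
generic_forest = reclassify([10, 20, 30, 10], {10: 1, 20: 1})
print(lookup)
print(generic_forest)
```

The frequency table has one row per code, with the number of parcels and the summed area; the reclassification keeps one value per cell and only changes the codes.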


JOINS


What is a join?

It is a logical extension of the concept of lookup tables

In a nutshell:

● You have two datasets

● They share a lookup attribute or spatial locations

● You want to join them (i.e. that the features of one get enriched with the attributes of the other)

They can be based on attributes or on spatial queries.


Spatial join

Performs a selection based on relative locations of the features belonging to two feature classes

Merges attributes only on features selected according to location

The others get <null> field values.

It allows flexibility of spatial selection criteria (matching operations in ArcGIS): e.g. join when

● Features intersect

● Features intersect the boundary of the other

● Features are contained completely in the other

● ...


Spatial join (2)

You can see it as creating a lookup table of locations and selecting only those with the same ID.

E.g. (let’s say we want to add flood risk values to a buildings feature class by intersection with flood risk areas):

● First, it performs a geometrical matching operation (intersect, ...)

● Then, it selects all the buildings according to it

● For the selected ones, it joins the attributes from the corresponding flood risk feature class.
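The three steps above can be sketched as follows, assuming that buildings are simplified to points and flood risk areas to rectangles. The coordinates, the `risk` attribute and the function name are hypothetical.

```python
def spatial_join(buildings, zones):
    """Add the 'risk' of the first matching zone to each building.

    Buildings are points (id, x, y); zones are rectangles
    (xmin, ymin, xmax, ymax, risk). Unmatched buildings get None (<null>).
    """
    joined = []
    for bid, x, y in buildings:
        risk = None
        for xmin, ymin, xmax, ymax, zone_risk in zones:
            # Matching operation: the point lies within the rectangle.
            if xmin <= x <= xmax and ymin <= y <= ymax:
                risk = zone_risk
                break
        joined.append({"id": bid, "x": x, "y": y, "risk": risk})
    return joined


buildings = [(1, 1.0, 1.0), (2, 6.0, 2.0), (3, 9.0, 9.0)]
flood_zones = [(0.0, 0.0, 3.0, 3.0, "high"), (5.0, 0.0, 8.0, 4.0, "low")]

result = spatial_join(buildings, flood_zones)
for row in result:
    print(row)
```

Every building is kept in the output; the one outside all zones simply receives `None` for the joined attribute.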


Attribute join (1)

It is the same concept, but based on one (or many) common attribute(s) (yes, a lookup table).

● Example: different suitabilities depending on road types AND municipality

You will need the common attributes in the two datasets being joined.

The second dataset can be a lookup table itself!


Ex: Road buffer size

For instance: in the lightrail project, you need to assign different buffer values depending on the road type:

● First you create a LUT for road types

● You add a new attribute to the LUT, the buffer size

● You enter the values (e.g. fietsers (cyclists) get a buffer of 100 m)


● Now you join the roads feature class with the LUT

● Your buffer widths are now replicated correctly for each road segment
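The join between the roads and the LUT can be sketched with a dictionary lookup. Only the 100 m value for fietsers comes from the example; the other road types, their buffer sizes, the field names and the function name are hypothetical.

```python
# Lookup table: one entry per road type, with the buffer size in metres.
buffer_lut = {
    "fietsers": 100,  # value from the example above
    "motorway": 300,  # hypothetical
    "local": 50,      # hypothetical
}

# Hypothetical road segments; the road type is the common attribute.
roads = [
    {"id": 1, "type": "fietsers"},
    {"id": 2, "type": "motorway"},
    {"id": 3, "type": "fietsers"},
    {"id": 4, "type": "track"},  # no entry in the LUT
]


def attribute_join(features, lut, key, new_field):
    """Return copies of the features enriched with the LUT value for their key."""
    return [{**f, new_field: lut.get(f[key])} for f in features]


joined = attribute_join(roads, buffer_lut, "type", "buffer_m")
for row in joined:
    print(row)
```

The buffer size is typed once per road type and replicated to every segment of that type; a segment whose type is missing from the LUT gets `None`, which is an easy way to spot typos in the codes.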

Final point: do you need to save it?

If yes, use the Join and Spatial Join tools

● They save a new feature class with the joined attributes

Is it only temporary? Keep it in memory (right click on the feature class and use the option ‘joins and relates’)

BUT NOT WITHIN THE LIGHTRAIL PROJECT!

Today

We discussed a number of operations (tools) to preprocess and organize your data

We saw query and overlay operations and highlighted their specificities

We introduced the concept of lookup tables and joins.


Putting it in the context of the light rail project...

You will encounter these concepts (or already have)

● Clipping to reduce a feature class extent (and the number of features)

● Overlays to exclude features or join datasets

● Lookup tables to assign suitability values to features (via a join)


THANK YOU!
