Queries, overlays and lookup tables
a.k.a ‘How to preprocess your data in a meaningful way’
Devis Tuia, v0.1, March 14 2018
Why are we talking about it?
In a GIS project, we create a LOT of datasets:
● We like to avoid unnecessary information (it takes up space)
● We like to avoid redundancy (it makes a mess)
● We like to avoid erroneous calculations (sometimes the tool used has an impact on the final result)
Example: a “Merge” creates duplicates!
Merging two feature classes that both contain the same segment gives:
ID length
1 100 (same segment as ID 2)
2 100 (same segment as ID 1)
3 10
So if you calculate the total length, it will be 210 instead of 110.
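A minimal sketch of the arithmetic above, in Python. It assumes each feature is a plain dict and that duplicates can be recognized by identical geometry, stored under a hypothetical `geom` key (a tuple of vertex coordinates). This illustrates the problem; it is not how ArcGIS stores or deduplicates features.

```python
# Toy features after a "Merge": IDs 1 and 2 are the same segment.
merged = [
    {"ID": 1, "length": 100, "geom": ((0, 0), (100, 0))},
    {"ID": 2, "length": 100, "geom": ((0, 0), (100, 0))},  # duplicate of ID 1
    {"ID": 3, "length": 10, "geom": ((100, 0), (110, 0))},
]

def total_length(features):
    """Sum the 'length' attribute over all features."""
    return sum(f["length"] for f in features)

def drop_geometric_duplicates(features):
    """Keep only the first feature for each distinct geometry."""
    seen = set()
    unique = []
    for f in features:
        if f["geom"] not in seen:
            seen.add(f["geom"])
            unique.append(f)
    return unique

print(total_length(merged))                             # 210
print(total_length(drop_geometric_duplicates(merged)))  # 110
```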
Menu of the day
Different families of data handling types (queries, transformations or alterations)
Reviewing useful tools from each family
Introduction to lookup tables (called “reclassification tables” in ArcGIS)
Families of handling types
There are two main families:
● Query-like: they work on the features, but do not modify the attributes.
This means: you select, cut and paste, but the attributes stay the same.
● Overlays: they modify both geometry and attributes.
This means: the resulting features have a different set of attributes than the originals.
QUERIES, part 1
selecting
Queries, group 1: the select
Select only highlights features according to a query
It can be:
● Spatial selection: ‘select buildings WITHIN Wageningen’
● Attribute selection: ‘select buildings where GM_NAAM = 'Wageningen'’
In the first case, you need a second (polygon) feature class with the municipality boundaries. In the second case, you need the municipality of every building in the attribute table, under ‘GM_NAAM’ (see the “overlays” tools later).
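The two kinds of selection can be contrasted in a short sketch. It is a simplification under stated assumptions: buildings are reduced to hypothetical centroid points, the Wageningen boundary is a made-up rectangle, and the point-in-polygon test is the classic ray-casting algorithm rather than anything ArcGIS exposes. The spatial query needs the boundary polygon; the attribute query only needs `GM_NAAM`.

```python
# Hypothetical building centroids; GM_NAAM is filled in for the attribute query.
buildings = [
    {"id": 1, "xy": (2.0, 2.0), "GM_NAAM": "Wageningen"},
    {"id": 2, "xy": (8.0, 1.0), "GM_NAAM": "Wageningen"},
    {"id": 3, "xy": (15.0, 3.0), "GM_NAAM": "Ede"},
]

# Hypothetical municipality boundary (polygon vertices, in order).
wageningen = [(0.0, 0.0), (10.0, 0.0), (10.0, 5.0), (0.0, 5.0)]

def point_in_polygon(x, y, poly):
    """Ray casting: count edge crossings of a horizontal ray going right from (x, y)."""
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # the edge straddles the ray's height
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

# Spatial selection: uses the boundary polygon, ignores the attributes.
spatial_ids = [b["id"] for b in buildings if point_in_polygon(*b["xy"], wageningen)]

# Attribute selection: uses GM_NAAM, ignores the geometry.
attribute_ids = [b["id"] for b in buildings if b["GM_NAAM"] == "Wageningen"]

print(spatial_ids, attribute_ids)  # [1, 2] [1, 2]
```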
Examples of select by location
To select within a buffer, you don’t need to create and save a buffer first: the tool can select the features within a given distance.
HINT: about the tool
If your selecting feature class has a feature selected, the tool will only select within that feature.
[Figure: INPUT and RESULT of two select by location examples]
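A minimal sketch of selecting within a distance without building a buffer layer, and of the hint about selected features in the selecting layer. Everything here is an illustrative assumption: bus stops and buildings are hypothetical points, a `selected` flag stands in for the interactive selection, distances are plain Euclidean (`math.dist`, Python 3.8+), and units are taken to be metres.

```python
import math

# Hypothetical selecting layer (bus stops) with a selection flag.
stops = [
    {"id": "A", "xy": (0.0, 0.0), "selected": True},
    {"id": "B", "xy": (500.0, 0.0), "selected": False},
]
# Hypothetical buildings, simplified to their centroids.
buildings = [
    {"id": 1, "xy": (30.0, 40.0)},    # 50 m from A
    {"id": 2, "xy": (520.0, 0.0)},    # 20 m from B
    {"id": 3, "xy": (300.0, 300.0)},  # far from both
]

def select_within_distance(targets, selecting, distance):
    """Return the targets within `distance` of a selecting feature (no buffer layer).

    If some selecting features are selected, only those are used;
    otherwise all selecting features are used.
    """
    active = [s for s in selecting if s["selected"]] or selecting
    return [
        t for t in targets
        if any(math.dist(t["xy"], s["xy"]) <= distance for s in active)
    ]

print([b["id"] for b in select_within_distance(buildings, stops, 100.0)])  # [1]
```

Building 2 is close to stop B, but B is not selected, so it is ignored; clearing the selection on A would make both stops count.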
But remember: it is just a selection!
You haven’t saved the resulting features (your selection is only in memory).
If you want to keep them, you can:
● Save them manually (right-click on the feature class being selected)
● Use another tool with direct saving options: this is necessary to guarantee reproducibility
Selections are taken into account when performing operations (see the previous slide for buildings in Wageningen).
QUERIES, part 2
cutting out
Clip
A way to select by location AND save the result.
It does not create any new attributes: it just cuts the features and copies them into a new feature class (the entity stays the same).
But remember: IT CUTS features.
With select by location, you select the features and then save them, so you won’t alter their geometry.
Comparing them:
The green is the feature class being clipped to the area of the blue square.
Clip modifies the geometry, selecting does not.
Neither adds attributes.
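The difference can be sketched with a toy geometry model, which is an assumption of this example and not how ArcGIS represents features: each polygon is a set of 1x1 grid cells `(col, row)`, so geometric overlap becomes set intersection. `rect`, `select_by_location` and `clip` are hypothetical helpers. Selection returns whole features untouched; clip cuts them to the blue square; neither adds attributes.

```python
def rect(c0, r0, c1, r1):
    """Toy polygon: all grid cells with c0 <= col < c1 and r0 <= row < r1."""
    return frozenset((c, r) for c in range(c0, c1) for r in range(r0, r1))

green = [
    {"attrs": {"id": 1, "landuse": "forest"}, "cells": rect(0, 0, 4, 4)},
    {"attrs": {"id": 2, "landuse": "grass"}, "cells": rect(6, 6, 8, 8)},
]
blue_square = rect(2, 2, 5, 5)  # the clipping area

def select_by_location(features, area):
    """Select: keep whole features that overlap the area (geometry unchanged)."""
    return [f for f in features if f["cells"] & area]

def clip(features, area):
    """Clip: cut each feature to the area, keep its attributes, drop empty results."""
    out = []
    for f in features:
        cut = f["cells"] & area
        if cut:
            out.append({"attrs": dict(f["attrs"]), "cells": cut})
    return out

selected = select_by_location(green, blue_square)
clipped = clip(green, blue_square)
print(len(selected[0]["cells"]), len(clipped[0]["cells"]))  # 16 4
```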
Erase
Does the same job as clip, but removes the part that intersects the second feature class instead of keeping it.
So remember: it modifies the geometry.
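Under the same kind of toy model (an assumption: polygons as sets of grid cells, with `rect` and `erase` as hypothetical helpers), erase is the set difference where clip is the intersection: each feature keeps its attributes, loses the part inside the erase area, and disappears if nothing is left.

```python
def rect(c0, r0, c1, r1):
    """Toy polygon: all grid cells with c0 <= col < c1 and r0 <= row < r1."""
    return frozenset((c, r) for c in range(c0, c1) for r in range(r0, r1))

green = [
    {"attrs": {"id": 1, "landuse": "forest"}, "cells": rect(0, 0, 4, 4)},
    {"attrs": {"id": 2, "landuse": "grass"}, "cells": rect(6, 6, 8, 8)},
]
blue_square = rect(2, 2, 5, 5)  # the erase area

def erase(features, area):
    """Erase: remove the part inside the area, keep attributes, drop emptied features."""
    out = []
    for f in features:
        rest = f["cells"] - area
        if rest:
            out.append({"attrs": dict(f["attrs"]), "cells": rest})
    return out

erased = erase(green, blue_square)
print([len(f["cells"]) for f in erased])  # [12, 4]
```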
QUERIES, part 3
adding / removing features
The mergers
Sometimes we want to make a single feature class from two, i.e. we concatenate the features.
This time, they MUST have the same attributes (the entities must be the same)!
Can merge (same attributes):
ID Height Width Region
8 55 3 12
9 54 4 16
10 12 6 17
and
ID Height Width Region
5 17 6 12
6 19 5 17
7 44 3 15

Cannot merge! (different attributes: Latitude instead of Region; an extra Color attribute):
ID Height Width Latitude
8 55 3 12N
9 54 4 16N
10 12 6 17N
and
ID Height Width Region Color
5 17 6 12 Red
6 19 5 17 Green
7 44 3 15 Red
APPEND vs MERGE
Both combine datasets into a single one.
Differences:
● APPEND writes into an existing dataset, MERGE creates a new one
● In APPEND, the inputs do not need the same attribute structure as the existing (target) dataset if you use the NO_TEST option: the target keeps its own attribute structure, and input attributes that do not match it are not transferred (unless you map them explicitly)
BOTH GENERATE DUPLICATE FEATURES
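A minimal sketch of these differences, with feature classes modelled as lists of attribute dicts (an assumption; geometry is left out). `merge` and `append` are hypothetical stand-ins, not the ArcGIS tools: `merge` builds a new list and here refuses inputs with different attributes, `append` writes into an existing list, and `schema_test=False` is a simplified take on NO_TEST. Neither removes the building that appears in both inputs.

```python
def _check_same_attributes(rows, fields):
    """Raise if any row does not have exactly the given attribute names."""
    for row in rows:
        if set(row) != set(fields):
            raise ValueError(f"attributes differ: {sorted(row)} vs {sorted(fields)}")

def merge(*inputs):
    """MERGE: build a NEW dataset (here, all inputs must share the same attributes)."""
    fields = list(inputs[0][0])
    for fc in inputs:
        _check_same_attributes(fc, fields)
    return [dict(row) for fc in inputs for row in fc]

def append(target, *inputs, schema_test=True):
    """APPEND: add rows INTO the existing (non-empty) target list, in place.

    schema_test=True plays the role of TEST (attributes must match).
    schema_test=False is a simplified NO_TEST: the target keeps its own
    attributes, unmatched input attributes are dropped, missing ones become None.
    """
    fields = list(target[0])
    for fc in inputs:
        if schema_test:
            _check_same_attributes(fc, fields)
        for row in fc:
            target.append({k: row.get(k) for k in fields})
    return target

a = [{"ID": 8, "Height": 55, "Width": 3, "Region": 12}]
b = [
    {"ID": 5, "Height": 17, "Width": 6, "Region": 12},
    {"ID": 8, "Height": 55, "Width": 3, "Region": 12},  # same building as in a
]
c = [{"ID": 8, "Height": 55, "Width": 3, "Latitude": "12N"}]

merged = merge(a, b)              # new list; a is untouched
print(len(merged), len(a))        # 3 1
try:
    merge(a, c)
except ValueError:
    print("cannot merge")         # Region vs Latitude
append(a, b)                      # a itself now holds 3 rows
print([row["ID"] for row in a])   # [8, 5, 8]: the duplicate is still there
append(a, c, schema_test=False)
print(a[-1])                      # {'ID': 8, 'Height': 55, 'Width': 3, 'Region': None}
```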
Example
We want to merge these two overlapping building feature classes.
APPEND and MERGE will both create an extra (duplicate) building feature!
OVERLAYS
Overlays in general
They are based on the concept of spatial joins.
They combine two feature classes, which:
● can be of different feature types (point, line, polygon)
● can have different attributes
1. First, a spatial query based on location is applied.
2. Then, a specific geometric processing (based on set theory) is applied.
3. Finally, the attributes are concatenated (joined): the output has as many attributes as the composing feature classes combined.
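The three steps can be sketched on a toy model, assumed here for illustration only: polygons are sets of grid cells, and the geometric processing chosen is the intersection (so this is the Intersect case). `rect` and `intersect_overlay` are hypothetical helpers, and attribute names are assumed not to clash between the two inputs.

```python
def rect(c0, r0, c1, r1):
    """Toy polygon: all grid cells with c0 <= col < c1 and r0 <= row < r1."""
    return frozenset((c, r) for c in range(c0, c1) for r in range(r0, r1))

forests = [{"attrs": {"forest_id": 1}, "cells": rect(0, 0, 4, 4)}]
zones = [
    {"attrs": {"zone": "protected"}, "cells": rect(2, 2, 6, 6)},
    {"attrs": {"zone": "urban"}, "cells": rect(10, 10, 12, 12)},
]

def intersect_overlay(fc_a, fc_b):
    """Overlay in three steps; attribute names are assumed not to clash."""
    out = []
    for fa in fc_a:
        for fb in fc_b:
            # 1. Spatial query: skip pairs that share no location.
            if fa["cells"].isdisjoint(fb["cells"]):
                continue
            # 2. Geometric processing (set theory): keep the common part.
            piece = fa["cells"] & fb["cells"]
            # 3. Concatenate the attributes of both inputs.
            out.append({"attrs": {**fa["attrs"], **fb["attrs"]}, "cells": piece})
    return out

result = intersect_overlay(forests, zones)
print(result[0]["attrs"], len(result[0]["cells"]))  # {'forest_id': 1, 'zone': 'protected'} 4
```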
Geometric operations
Union
Intersect
Identity
Update
Symmetric difference
Set overlays: union
Obtains all the feature primitives of both feature classes, after intersecting them (overlapping parts are kept only once).
The primitives produced by Union are used in all the other tools.
Set overlays: union (attribute table)
Attributes (green feature class) | Attributes (blue feature class)
Values | Values
<null> | Values
Values | Values
Values | <null>
Values | Values
Values | <null>
Set overlays: union
Examples of use
Merge all built structures from a set of feature classes provided by different offices (some buildings appear in several of them, for example)
Merge a multi-year dataset of deforestation areas (non-forest areas remain from year to year)
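A minimal sketch of the union primitives and their attribute table, using a toy model that is an assumption of this example: polygons are sets of grid cells, `None` plays the role of `<null>`, and features inside one feature class are assumed not to overlap each other. `rect` and `union_overlay` are hypothetical helpers. The output pieces are disjoint and together cover both inputs.

```python
def rect(c0, r0, c1, r1):
    """Toy polygon: all grid cells with c0 <= col < c1 and r0 <= row < r1."""
    return frozenset((c, r) for c in range(c0, c1) for r in range(r0, r1))

green = [{"attrs": {"landuse": "forest"}, "cells": rect(0, 0, 4, 4)}]
blue = [{"attrs": {"zone": "protected"}, "cells": rect(2, 2, 6, 6)}]

def union_overlay(fc_a, fc_b):
    """Return (cells, attrs_a, attrs_b) pieces; None stands for <null>."""
    all_a = frozenset().union(*(f["cells"] for f in fc_a))
    all_b = frozenset().union(*(f["cells"] for f in fc_b))
    pieces = []
    for fa in fc_a:                      # parts covered by both inputs
        for fb in fc_b:
            common = fa["cells"] & fb["cells"]
            if common:
                pieces.append((common, fa["attrs"], fb["attrs"]))
    for fa in fc_a:                      # parts covered by the first input only
        only_a = fa["cells"] - all_b
        if only_a:
            pieces.append((only_a, fa["attrs"], None))
    for fb in fc_b:                      # parts covered by the second input only
        only_b = fb["cells"] - all_a
        if only_b:
            pieces.append((only_b, None, fb["attrs"]))
    return pieces

for cells, attrs_green, attrs_blue in union_overlay(green, blue):
    print(len(cells), attrs_green, attrs_blue)
# 4 {'landuse': 'forest'} {'zone': 'protected'}
# 12 {'landuse': 'forest'} None
# 12 None {'zone': 'protected'}
```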
Set overlays: intersection
Keeps only the primitives that belong to both inputs
= removes all primitives that belong to only one of the two
Set overlays: intersection (attribute table)
Attributes (green feature class) | Attributes (blue feature class)
Values | Values
Values | Values
Values | Values
Set overlays: intersection
Examples of use
Extract buildings within Wageningen
(ok, can also be done with a clip if joining the attributes is not desired)
Extract points of interest on the highway between Ede and Utrecht (when you need to tell apart those in the provinces of Utrecht and Gelderland... otherwise it’s a clip)
Set overlays: identity
Keeps only the primitives located within the input feature class
(in our case, the blue square in the left image)
Set overlays: identity (attribute table)
Attributes (green feature class) | Attributes (blue feature class)
Values | Values
<null> | Values
Values | Values
Values | Values
Set overlays: identity
Examples of use
Extract, within a protected zone, which areas are forest and which are not (forest areas are generally bigger polygons, so the features will need to be cut)
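The protected-zone example can be sketched on a toy model, assumed for illustration: polygons are sets of grid cells, `None` stands for `<null>`, and `rect` and `identity_overlay` are hypothetical helpers. The protected zone is the input, the forests are the identity features; the output covers exactly the input, and the part of forest A outside the zone is cut away.

```python
def rect(c0, r0, c1, r1):
    """Toy polygon: all grid cells with c0 <= col < c1 and r0 <= row < r1."""
    return frozenset((c, r) for c in range(c0, c1) for r in range(r0, r1))

protected = [{"attrs": {"zone": "protected"}, "cells": rect(0, 0, 6, 6)}]  # input
forests = [
    {"attrs": {"forest": "A"}, "cells": rect(4, 4, 10, 10)},  # sticks out of the zone
    {"attrs": {"forest": "B"}, "cells": rect(0, 0, 2, 2)},    # fully inside
]

def identity_overlay(input_fc, identity_fc):
    """Cut the input by the identity features; the output covers exactly the input.

    Returns (cells, input_attrs, identity_attrs); None stands for <null>.
    """
    all_identity = frozenset().union(*(f["cells"] for f in identity_fc))
    pieces = []
    for fi in input_fc:
        for fid in identity_fc:
            common = fi["cells"] & fid["cells"]
            if common:
                pieces.append((common, fi["attrs"], fid["attrs"]))
        rest = fi["cells"] - all_identity
        if rest:
            pieces.append((rest, fi["attrs"], None))
    return pieces

for cells, zone, forest in identity_overlay(protected, forests):
    print(len(cells), zone, forest)
# 4 {'zone': 'protected'} {'forest': 'A'}
# 4 {'zone': 'protected'} {'forest': 'B'}
# 28 {'zone': 'protected'} None
```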
Set overlays: update
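Keeps the input features where they are not covered by the update feature class, and replaces the covered parts with the update features (geometry and attributes).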
Set overlays: update (attribute table)
Attributes (green feature class) | Attributes (blue feature class)
<null> | Values
Values | <null>
Values | <null>
Set overlays: update if same attributes
Attributes
Values
Values
Values
Set overlays: update
Examples of use
Extract forest areas EXCLUSIVE OR protected zones
Merge two versions of the same map: the most recent one remains identical, and the parts of the old one that it does not cover are added
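A minimal sketch of the two-versions example, on a toy model assumed for illustration: polygons are sets of grid cells, both versions share the same attribute (`landuse`), and `rect` and `update_overlay` are hypothetical helpers. The new map is kept whole; the old map survives only where the new one does not cover it.

```python
def rect(c0, r0, c1, r1):
    """Toy polygon: all grid cells with c0 <= col < c1 and r0 <= row < r1."""
    return frozenset((c, r) for c in range(c0, c1) for r in range(r0, r1))

old_map = [  # input features
    {"attrs": {"landuse": "forest"}, "cells": rect(0, 0, 4, 4)},
    {"attrs": {"landuse": "grass"}, "cells": rect(4, 0, 8, 4)},
]
new_map = [  # update features, same attributes
    {"attrs": {"landuse": "built"}, "cells": rect(2, 0, 6, 4)},
]

def update_overlay(input_fc, update_fc):
    """Keep the input outside the update features, then add the update features whole."""
    all_update = frozenset().union(*(f["cells"] for f in update_fc))
    pieces = []
    for fi in input_fc:
        rest = fi["cells"] - all_update
        if rest:
            pieces.append({"attrs": fi["attrs"], "cells": rest})
    for fu in update_fc:
        pieces.append({"attrs": fu["attrs"], "cells": fu["cells"]})
    return pieces

result = update_overlay(old_map, new_map)
print([(p["attrs"]["landuse"], len(p["cells"])) for p in result])
# [('forest', 8), ('grass', 8), ('built', 16)]
```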
Set overlays: symmetric difference
Removes primitives spatially common to both feature classes
= Union - Intersection
Set overlays: symmetric difference (attribute table)
Attributes (green feature class) | Attributes (blue feature class)
<null> | Values
Values | <null>
Values | <null>
Set overlays: symmetric difference
Examples of use
Extract areas that are EITHER forests OR protected (but not both)
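A minimal sketch of the forest/protected example, on a toy model assumed for illustration: polygons are sets of grid cells, `None` stands for `<null>`, and `rect` and `symmetric_difference_overlay` are hypothetical helpers. The covered area equals Union minus Intersection, which the check after the code confirms on this data.

```python
def rect(c0, r0, c1, r1):
    """Toy polygon: all grid cells with c0 <= col < c1 and r0 <= row < r1."""
    return frozenset((c, r) for c in range(c0, c1) for r in range(r0, r1))

forest = [{"attrs": {"landuse": "forest"}, "cells": rect(0, 0, 4, 4)}]
protected = [{"attrs": {"zone": "protected"}, "cells": rect(2, 2, 6, 6)}]

def symmetric_difference_overlay(fc_a, fc_b):
    """Keep only what belongs to exactly one input; None stands for <null>."""
    all_a = frozenset().union(*(f["cells"] for f in fc_a))
    all_b = frozenset().union(*(f["cells"] for f in fc_b))
    pieces = []
    for fa in fc_a:
        only_a = fa["cells"] - all_b
        if only_a:
            pieces.append((only_a, fa["attrs"], None))
    for fb in fc_b:
        only_b = fb["cells"] - all_a
        if only_b:
            pieces.append((only_b, None, fb["attrs"]))
    return pieces

pieces = symmetric_difference_overlay(forest, protected)
print([len(cells) for cells, _, _ in pieces])  # [12, 12]

# Check: covered area == Union - Intersection
a, b = forest[0]["cells"], protected[0]["cells"]
covered = frozenset().union(*(cells for cells, _, _ in pieces))
print(covered == (a | b) - (a & b))  # True
```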
Summing up
There are many tools at your disposal
First think of what you need for further calculations (Do you need to carry the attributes? Do you need to save the result, or is a selection enough?)
Then think of the most efficient way: most results can be obtained with a combination of tools, but also with a single one!
Remember: if features are selected in the processed feature class, only those will be processed.
Be careful of duplicate features (overlaps) when merging datasets!
LOOKUP TABLES
What is a lookup table?
Also called reclassification tables.
It is basically built from an attribute that:
● Has several repeated entries in a single feature class
● Appears with the same name and meaning across feature classes
A classic example is the TDN code.
Another is neighborhood codes.
TDN codes
They are a centrally defined description of land use.
They fit our description:
● Has several repeated entries in a single feature class
TDN codes are repeated: many polygons have the same land use type
● Appears with the same name and meaning across feature classes
Land use types can be found in attribute tables concerning agricultural fields, buildings, ...
Region codes
A single numerical code to define a region (it can also be the name of the region, of course, but then be careful with typos)
● Has several repeated entries in a single feature class
Region codes are repeated: many fields are located in the same region
● Appears with the same name and meaning across feature classes
Fields and protection areas can have a “region” attribute, to select them more easily.
Lookup tables are good for:
● Keeping the important grouping variables in mind
● Ensuring that you do not have duplicates
● Structuring your data: if your grouping variables are in tables, you will reuse them!
Lookup tables (by building type)
Lookup tables (by TDN code)
Careful!
The extra attributes you add are summed (so it is not very helpful to add TDN codes, for instance: their sum means nothing).
Use it for distances, counts, ...
To obtain a lookup table, we use the Frequency tool.
This tool is not to be confused with the Reclassify tool, where we recode an attribute in a raster (e.g. all values for deciduous forests (value 10) and pine forests (value 20) are reclassified into generic forest (value 1)).
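A minimal sketch of what a frequency-style lookup table contains, assuming rows are plain dicts: one row per distinct value of the grouping attribute, a count, and the extra (summary) attributes summed. The `FREQUENCY` field name and the building data are illustrative assumptions, not the tool's documented output. A small `reclassify` helper is added for contrast: it recodes raster cell values through a mapping instead of grouping rows; keeping unmapped values unchanged is also an assumption of this sketch.

```python
def frequency(rows, key, summary_fields=()):
    """One output row per distinct value of `key`: a count plus summed summary fields."""
    table = {}
    for row in rows:
        value = row[key]
        if value not in table:
            table[value] = {key: value, "FREQUENCY": 0}
            for f in summary_fields:
                table[value][f] = 0
        table[value]["FREQUENCY"] += 1
        for f in summary_fields:
            table[value][f] += row[f]  # summary attributes are summed
    return list(table.values())

def reclassify(raster, mapping):
    """Recode each cell value through `mapping`; unmapped values are kept as-is."""
    return [[mapping.get(v, v) for v in row] for row in raster]

buildings = [
    {"type": "house", "area": 120.0},
    {"type": "house", "area": 80.0},
    {"type": "shop", "area": 300.0},
]
print(frequency(buildings, "type", summary_fields=["area"]))
# [{'type': 'house', 'FREQUENCY': 2, 'area': 200.0}, {'type': 'shop', 'FREQUENCY': 1, 'area': 300.0}]

landcover = [[10, 20, 30], [20, 10, 30]]  # 10 = deciduous, 20 = pine, 30 = other
print(reclassify(landcover, {10: 1, 20: 1}))  # [[1, 1, 30], [1, 1, 30]]
```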
JOINS
What is a join?
It is a logical extension of the concept of lookup tables
In a nutshell:
● You have two datasets
● They share a lookup attribute or spatial locations
● You want to join them (i.e. the features of one get enriched with the attributes of the other)
They can be based on attributes or on spatial queries.
Spatial join
Performs a selection based on the relative locations of the features of two feature classes.
Merges attributes only for the features selected according to location; the others get <null> values in the joined fields.
It allows flexible spatial selection criteria (match options in ArcGIS): e.g. join when
● Features intersect
● Features intersect the boundary of the other
● Features are completely contained in the other
● ...
Spatial join (2)
You can see it as creating a lookup table of locations and selecting only those with the same ID.
E.g. let’s say we want to add flood risk values to a buildings feature class by intersection with flood risk areas:
● First, it performs the geometric matching operation (intersect, ...)
● Then, it selects all the buildings according to it
● For the selected ones, it joins the attributes from the corresponding flood risk feature class.
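The flood-risk steps can be sketched as follows, under simplifying assumptions: buildings are hypothetical centroid points, flood zones are hypothetical rectangles `(xmin, ymin, xmax, ymax)`, and the matching operation is "point inside rectangle". `within` and `spatial_join` are hypothetical helpers; this sketch simply takes the first matching zone, whereas a real tool lets you choose how multiple matches are handled.

```python
# Hypothetical flood risk zones, simplified to rectangles (xmin, ymin, xmax, ymax).
flood_zones = [
    {"risk": "high", "bbox": (0.0, 0.0, 50.0, 50.0)},
    {"risk": "medium", "bbox": (50.0, 0.0, 120.0, 50.0)},
]
# Hypothetical buildings, simplified to their centroids.
buildings = [
    {"id": 1, "xy": (10.0, 10.0)},
    {"id": 2, "xy": (80.0, 20.0)},
    {"id": 3, "xy": (200.0, 200.0)},
]

def within(xy, bbox):
    """Matching operation: is the point inside (or on the edge of) the rectangle?"""
    x, y = xy
    xmin, ymin, xmax, ymax = bbox
    return xmin <= x <= xmax and ymin <= y <= ymax

def spatial_join(targets, join_features, fields):
    """Copy `fields` from the first matching join feature; <null> (None) if no match."""
    joined = []
    for t in targets:
        match = next((j for j in join_features if within(t["xy"], j["bbox"])), None)
        row = dict(t)  # the original target rows are left untouched
        for f in fields:
            row[f] = match[f] if match is not None else None
        joined.append(row)
    return joined

result = spatial_join(buildings, flood_zones, ["risk"])
print([(b["id"], b["risk"]) for b in result])  # [(1, 'high'), (2, 'medium'), (3, None)]
```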
Attribute join (1)
It is the same concept, but based on one (or more) common attribute(s) (yes, a lookup table).
● Example: different suitabilities depending on road type AND municipality
You will need the common attributes in both datasets being joined.
The second dataset can be a lookup table itself!
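A minimal sketch of an attribute join on two common attributes (road type AND municipality), assuming both datasets are lists of dicts. The road segments, the suitability values and the `attribute_join` helper are hypothetical. It behaves like a left join: rows without a match get `None` (`<null>`), and the lookup table is assumed to have no duplicate keys.

```python
roads = [
    {"seg": 1, "road_type": "highway", "municipality": "Ede"},
    {"seg": 2, "road_type": "local", "municipality": "Ede"},
    {"seg": 3, "road_type": "local", "municipality": "Utrecht"},
    {"seg": 4, "road_type": "cycle", "municipality": "Utrecht"},
]
# Hypothetical suitability lookup table keyed on BOTH common attributes.
suitability_lut = [
    {"road_type": "highway", "municipality": "Ede", "suitability": 1},
    {"road_type": "local", "municipality": "Ede", "suitability": 3},
    {"road_type": "local", "municipality": "Utrecht", "suitability": 2},
]

def attribute_join(left, right, keys):
    """Copy `right`'s extra attributes onto `left` rows with equal key values."""
    index = {tuple(r[k] for k in keys): r for r in right}  # assumes unique keys
    extra = [f for f in right[0] if f not in keys]
    out = []
    for row in left:
        match = index.get(tuple(row[k] for k in keys))
        joined = dict(row)
        for f in extra:
            joined[f] = match[f] if match is not None else None
        out.append(joined)
    return out

joined = attribute_join(roads, suitability_lut, ["road_type", "municipality"])
print([r["suitability"] for r in joined])  # [1, 3, 2, None]
```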
Ex: Road buffer size
For instance, in the light rail project, you need to assign different buffer values depending on road type:
● First, you create a LUT for road types
● You add a new attribute to the LUT: the buffer size
● You enter the values (e.g. ‘fietsers’ (cyclists) get a buffer of 100 m)
● Now you join the roads feature class with the LUT
● Your buffer widths are now replicated correctly for each road segment
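The whole road-buffer workflow can be sketched with plain dicts. Only the value 100 for ‘fietsers’ comes from the slide; the `local` road type, its 50 m buffer and the helpers `build_lut` and `join_lut` are hypothetical. The join here writes into the road rows in place, like a join kept in memory.

```python
def build_lut(rows, key):
    """Step 1: one LUT row per distinct value of `key`."""
    lut = {}
    for row in rows:
        lut.setdefault(row[key], {key: row[key]})
    return lut

def join_lut(rows, key, lut, field):
    """Final step: copy `field` from the LUT onto every row (in place)."""
    for row in rows:
        row[field] = lut[row[key]].get(field)
    return rows

roads = [
    {"seg": 1, "road_type": "fietsers"},
    {"seg": 2, "road_type": "local"},  # hypothetical road type
    {"seg": 3, "road_type": "fietsers"},
]
lut = build_lut(roads, "road_type")
# Steps 2-3: add the buffer size attribute and enter the values.
lut["fietsers"]["buffer_m"] = 100  # value from the slide
lut["local"]["buffer_m"] = 50      # hypothetical value
join_lut(roads, "road_type", lut, "buffer_m")
print([(r["seg"], r["buffer_m"]) for r in roads])  # [(1, 100), (2, 50), (3, 100)]
```

Because each buffer value is typed once in the LUT, changing it there and re-joining updates every segment of that road type consistently.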
Final point: do you need to save it?
If yes, use the Join and Spatial Join tools
● They save a new feature class with the joined attributes
Is it only temporary? Keep it in memory (right-click on the feature class and use the ‘Joins and Relates’ option)
BUT NOT WITHIN THE LIGHTRAIL PROJECT!
Today
We discussed a number of operations (tools) to preprocess and organize your data
We saw query and overlay operations and highlighted their specificities.
We introduced the concepts of lookup tables and joins.
Putting it in the context of the light rail project...
You will encounter these concepts (or already have):
● Clipping to reduce a feature class extent (and the number of features)
● Overlays to exclude features or join datasets
● Lookup tables to assign suitability values to features (via a join)
THANK YOU!