technical description for data merging



  • 1

    Technical report on GIS Analysis, Mapping and Linking of Contextual Data to the

    European Social Survey

    HAPPINESS project of the Cross-National and Multi-level Analysis of Human Values,

    Institutions and Behaviour (HumVIB) programme

    Finbarr Brereton, University College Dublin, Ireland

    Mirko Moro, University of Stirling, The United Kingdom

    Tine Ningal, University College Dublin, Ireland

    Susana Ferreira, University of Georgia, USA

    Abstract

    This technical paper documents the work undertaken to link the European Social Survey, a

    biennial multi-country survey which measures attitudes, beliefs and values of individuals

    living in more than 30 nations, to multi-level variables capturing the physical environment

    and context of the respondents (air pollution, climate, land use, local GDP per capita,

    population density, unemployment rate, etc.). The process of linking the data involved

    creating a series of spatial identifiers based on the Nomenclature of Territorial Units for

    Statistics (NUTS) geocodes. In addition, while the macroeconomic contextual variables are

    typically available at the regional level, pollution and climate data are recorded at monitoring

    stations, and Geographic Information Systems (GIS) spatial interpolation techniques need to

    be applied prior to linking these to a particular respondent. GIS is increasingly used to

    process, analyse and display georeferenced data effectively due to its mapping capabilities.

    The resulting dataset provides a unique tool for quantitative investigation of interrelationships

    at the individual, regional and national levels in Europe.

    Financial support from the European Science Foundation (Cross-National and Multi-level Analysis of

    Human Values, Institutions and Behaviour (HumVIB)) is gratefully acknowledged. We thank Oana Borcan and

    Victor Peredo Alvarez for outstanding research assistance. Corresponding author: [email protected]


  • 2

    TABLE OF CONTENTS

    1. Introduction 3

    1.1 European Social Survey 3

    1.2 Geographical Information Systems 3

    1.3 Deliverables from the Project 5

    2. Creating a regional identifier in the ESS for data linking 7

    3. GIS analysis and mapping of air quality 9

    3.1 Data 9

    3.2 Methods 10

    3.2.1 Importing spreadsheet data into GIS 11

    3.2.2 Spatial Interpolation in GIS 14

    3.2.3 Integration of air quality with NUTS data 23

    4. GIS analysis and mapping of climate and land use data 27

    4.1 Climate data 27

    4.2 Land use data 28

    5. References 31

    6. Appendix 32

  • 3

    1. Introduction

    1.1 European Social Survey (ESS)

    The ESS is an academically driven, international survey examining changing social attitudes,

    beliefs and values across Europe. It was the first social science project to be awarded the

    prestigious Descartes Prize, granted by the European Commission for excellence in

    scientific research. In our project we focus on the first three waves of the survey. The

    first wave was fielded in 2002/2003 and the third in 2006/2007. ESS data are

    obtained using random (probability) samples, where the sampling strategies, which may vary

    by country, are designed to ensure representativeness and comparability across European

    countries. The three-wave cumulative dataset includes around 120,000 observations from 23

    European countries.1

    One of the variables collected in the survey is the region within a country where the

    respondent lives. This information allows us to match the survey data spatially to a map of

    Europe using Geographic Information Systems (GIS) and hence it is possible to combine

    individual data with a vector of spatial amenities.2 These two datasets are combined at the

    NUTS level.3 To assess the impact of changes in spatial amenities on individual variables (of

    particular interest for our project, self-reported subjective well-being) in a more precise

    manner, ideally, one would want to be able to match contextual factors to a particular

    individual rather than a particular area. At present, however, the data do not allow this and

    anonymity may preclude this in any case.

    1.2 Geographic Information Systems and the Social Sciences

    Adoption of Geographic Information Systems (GIS) and spatial modelling tools in the social

    sciences is in its infancy, primarily because social scientists have been slow to recognise

    the capability and capacity of such tools to support and develop new research areas and to aid

    and enhance research applications. GIS offers great potential to generate innovative

    approaches and advance knowledge in disciplines such as political science, economics,

    archaeology, environmental studies, history, demography, anthropology, and applied social

    sciences.

    1 They are Austria, Belgium, Czech Republic, Switzerland, Germany, Denmark, Estonia, Spain, Finland, France,

    Greece, Hungary, Ireland, Italy, Luxembourg, Netherlands, Norway, Poland, Portugal, Sweden, Slovenia,

    Slovakia and the UK.

    2 GIS works well when applied to static data, and less well when applied to time series analysis (Goodchild and

    Haining, 2004) and hence is well-suited to the cross-sectional data employed in this project.

    3 European Nomenclature of Territorial Units for Statistics (NUTS) is a geocode standard for referencing

    administrative divisions of countries for statistical purposes developed by the European Union.

  • 4

    GIS is widely used as a planning and analysis computing tool that allows the visual

    representation of spatially referenced data and provides a powerful set of tools for spatial

    analysis and modelling. It has advanced the technical ability to handle spatial data as

    countable numbers of points, lines and polygons4 in two-dimensional space (Goodchild and

    Haining, 2004) and to link various datasets using spatial identifiers (Bond and Devine, 1991).

    It represents a solid base for spatial data analysis and provides a range of techniques for

    analysis and visualisation of spatial data. It provides effective decision support through its

    database management capabilities, graphical user interfaces and cartographic visualisation

    (Wu et al., 2001). It provides tools for integrating, querying and analysing a wide variety of

    data types, such as scientific and cultural data, satellite imagery and aerial photography, as

    well as data collected by individuals, into projects, with geographic locations providing the

    integral link between all the data.

    With the rapid growth in the availability of geographical data in digital formats and parallel

    innovations in technology to allow for the manipulation, analyses and visualisation of these

    data, new types of information are being created. This underpins developments in

    Participatory GIScience which provides a better understanding of the complexity of decision

    situations involving human interactions with their physical environment. A recent article in

    Science (Butz and Torrey, 2006) highlights the importance of the new GIScience tools in

    providing the ability to analyse social behaviour across time and geographic scales. It further

    points out that their adoption by social scientists is still in its infancy.

    GIS methods can contribute to multi-level analysis; they can even generate new levels of

    analysis and allow access to levels previously only identifiable in principle. They can also

    help disseminate multi-level research findings. With a diverse range of disciplines involved

    in multi-disciplinary research (sociologists, psychologists, economists, political scientists

    etc.), creating policy documents that are accessible to the research community and to the

    general public can become a challenge. To this end, GIS applications allow cartographic

    representations of data and results that aid in disseminating information to a wide audience:

    a picture is worth a thousand words.

    4 A polygon is the GIS term for any multi-sided figure.

  • 5

    There now exist unprecedented individual-level data resources in Europe, typified by the

    European Social Survey (ESS). There also exist comprehensive system-level and contextual

    data. To date, however, there have been few analyses employing individual-level data linked to

    contextual data, and they typically cover a limited local area or a limited set of indicators (see,

    e.g., Brereton et al., 2008; MacKerron and Mourato, 2008; Luechinger, 2009). GIS facilitates

    linking contextual data (institutional, economic, environmental etc.) to individual-level data.

    While many social scientists are currently engaged in cross-national analysis, using GIS to

    link data at the regional level would allow investigators to go further and engage in analysis

    at the micro, meso and macro levels, using data that are comparable across a larger number of

    units of analysis (regions) and this would increase the validity of multi-level analysis.

    The growth in the availability of geographical data has not been accompanied by a coherent,

    coordinated data collection effort at the European level. For example, the National Institute

    for Regional and Spatial Analysis (NIRSA) in Ireland, the URBIS Digital Spatial Database in

    University College Dublin, EDINA in the UK and Eurostat all house digital spatial data,

    much of this overlapping. A goal of our project is to address the fragmentation that currently

    plagues digital data archives in Europe by creating a pan-European research dataset with

    environmental and other contextual data spatially referenced and linked to the ESS, and to

    share the dataset and methodologies used to create it.

    1.3 Deliverables from the Project

    A key deliverable of this project is a pan-European dataset of environmental and other spatial

    variables geo-referenced at a regional level and linked to the individuals in the ESS. The

    contextual variables can be classified into four groups: air pollution concentrations, climate,

    land use and macro-socioeconomic factors (Table 1).

  • 6

    Table 1. List of variables in spatial dataset

    Category             Indicators                                         Main Source

    Air Pollution        PM10 mean annual concentration (µg/m3)             EEA AirBase
                         CO mean annual concentration (mg/m3)               (http://acm.eionet.europa.eu/databases/airbase/)
                         SO2 mean annual concentration (µg/m3)
                         NO mean annual concentration (µg/m3)
                         NO2 mean annual concentration (µg/m3)
                         Benzene mean annual concentration (µg/m3)

    Climate              Annual mean temperature (°C)                       ECA
                         Mean of daily max. temperature in July (°C)        (http://eca.knmi.nl/)
                         Mean of daily min. temperature in January (°C)
                         Annual mean precipitation (mm)

    Land use             Residential                                        CORINE
                         Commercial and Industrial                          (http://www.eea.europa.eu/publications/COR0-landcover)
                         Mines and Dumps
                         Green Urban Spaces
                         Agricultural Land
                         Forestry
                         Natural Areas
                         Water bodies

    Macro-socioeconomic  GDP per capita                                     Eurostat
                         Population change (%)                              (http://epp.eurostat.ec.europa.eu/portal/page/portal/eurostat/home/)
                         Population density
                         Deaths from respiratory diseases
                         Unemployment rate (by age group/gender)

    In the following sections, we provide a more detailed description of the construction of the

    dataset, in particular for the pollution, climate and land use variables. Macro-socioeconomic

    variables were already available at a NUTS 2 or NUTS 3 level (depending on the variable)

    from the Eurostat database.

    The NUTS is a geocode standard for referencing administrative divisions of countries for

    statistical purposes developed by the European Union. Regions at NUTS level 1 are large

    sub-national units (such as Scotland or Bavaria) each of which usually comprises a number of

    NUTS 2 regions (examples of this level include the Autonomous Communities in Spain or

    the "regions" in France). In turn, these are made up of NUTS 3 regions (such as the "Kreis" in

  • 7

    Germany). Although broadly very stable over time in a number of countries, the NUTS

    classification has been amended several times, most recently in 1995, 1999 and 2003.5

    The Cumulative ESS database does not use a coherent definition of region; in some cases the

    regions are NUTS level 1, in other cases NUTS level 2 or NUTS level 3 (Table 2).

    Table 2. NUTS levels used in ESS for each participant country

    NUTS Level   Countries
    1            Belgium (BE), Germany (DE), France* (FR), Luxembourg (LU), United Kingdom (UK)
    2            Austria (AT), Switzerland (CH), Spain (ES), Finland (FI), France* (FR), Greece (GR),
                 Hungary (HU), Italy (IT), Ireland* (IE), Norway (NO), Poland (PL), Portugal (PT),
                 Sweden (SE)
    3            Czech Republic (CZ), Denmark (DK), Estonia (EE), Ireland* (IE), The Netherlands (NL),
                 Slovenia (SI), Slovakia (SK), Ukraine (UA)

    * ESS used mixed boundaries of NUTS levels 1 & 2 for France and levels 2 & 3 for Ireland.

    The spatial variables in Table 1 were linked to each respondent at the corresponding NUTS

    level in Table 2. In addition, we preserved them at the finest level of spatial disaggregation

    at which they were available (NUTS 3 for pollution, climate and land use data, NUTS 2-3 for

    macro-socioeconomic indicators). All the datasets will be publicly available on the project

    website (http://www.ucd.ie/happy/resear.html) from January 2012.

    2. Creating a regional identifier in the ESS for data linking

    The cumulative ESS dataset can be freely downloaded from the ESS website

    (www.europeansocialsurvey.org). We appended three additional spatial identifiers in order to

    facilitate the matching of the ESS data file with spatially referenced data and to carry out

    spatial analysis of data: "cntry", "region" and "code_id". "cntry" is the two-letter NUTS code for

    each country; "region" is the name of the region as reported in the ESS Cumulative dataset. The ESS

    contains multiple variables to identify the region where the respondent lives. We merge these

    into one new variable. The key new regional variable we create is "code_id": a unique

    identifier equal to the NUTS code of the respondent's region, at the NUTS level used for that

    country in the ESS (see Table 2).

    5 A detailed list of NUTS by each European country can be found at

    http://ec.europa.eu/eurostat/ramon/nuts/codelist_en.cfm?list=nuts. Maps of each country and regions with

    subdivision in NUTS levels can be found at http://circa.europa.eu/irc/dsis/regportraits/info/data/en/.

  • 8

    In the NUTS system, each country is divided following a three-level hierarchy of regions

    established on the basis of existing administrative regions or groupings of these. The NUTS 1

    code is composed of three alphanumeric characters. The first two refer to the country (and

    they are the same as the cntry variable), while the third one is usually a number. The NUTS 2

    code is composed of four alphanumeric characters, while the NUTS 3 code consists of five

    alphanumeric characters. Box 1 illustrates the NUTS code hierarchy for the Spanish regions.

    Therefore if code_id for a particular observation is, for example, GR11 it means that the

    respondent lives in a NUTS2 level region of Greece, while UA044 stands for a NUTS 3 level

    region of Ukraine, etc.

    Box 1: NUTS code hierarchy for Spanish regions

    ES (Spain)

    ES1 (represents NUTS 1 level identifying the North-West region)

    ES11 (represents NUTS 2 level identifying Galicia)

    ES111 (represents NUTS 3 level identifying La Coruña)

    ES112 (region at NUTS 3 level, Lugo)

    ES12 (Asturias)

    ES120 (Asturias)

    ES13 (Cantabria)

    ES130 (Cantabria)

    ES2 (NUTS 1 identifying North-East region)

    ES21 (Basque Country)

    ES211 (Álava/Araba)

    ES212 (Guipúzcoa/Gipuzkoa)

    ...

    ...

    ES7 (NUTS1 Canarias)

    ES70 (Canary Islands)

    ES701 (Las Palmas)

    ES702 (Tenerife)
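
    The length rule described above (two characters for the country, then one more per NUTS level)
    can be sketched in a few lines of Python. This is only an illustration of the code hierarchy in
    Box 1; the function names are hypothetical and are not part of the project's own tooling.

```python
def nuts_level(code):
    """Return the NUTS level (0 = country, 1, 2 or 3) implied by the code length."""
    code = code.strip()
    if not 2 <= len(code) <= 5:
        raise ValueError("not a plain NUTS code: %r" % code)
    return len(code) - 2


def nuts_ancestors(code):
    """List the country code and every parent region, ending with the code itself."""
    code = code.strip()
    return [code[:n] for n in range(2, len(code) + 1)]


print(nuts_level("GR11"))       # NUTS 2 region of Greece
print(nuts_level("UA044"))      # NUTS 3 region of Ukraine
print(nuts_ancestors("ES111"))  # ES -> ES1 -> ES11 -> ES111
```

    Truncating a code from the right is enough to move up the hierarchy, which is what makes it
    possible to aggregate NUTS 3 data to the coarser level used for a given country in the ESS.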

    In a few cases (Switzerland, France, Italy, Ireland), a new code was created to accommodate

    the fact that their ESS regions were an aggregation of NUTS. In this case, the following rule

    has been observed in creating the ad-hoc code:

    - CH02-04 is an aggregation of 3 Swiss regions at NUTS 2-level (i.e., CH02, CH03,

    and CH04).

    - FI18,20 represents 2 Finnish regions at NUTS 2-level (i.e., FI18 and FI20), etc. Ditto

    for France.

    - IE022-025 represents 4 Irish regions at the third NUTS level (i.e., IE022, IE023,

    IE024, IE025).
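
    The ad-hoc codes above can be expanded back into their constituent NUTS codes mechanically.
    The following Python sketch assumes that the varying part of each code is purely numeric and
    that a range or list only ever changes the trailing digits, as in the three examples given;
    the function name is hypothetical.

```python
def expand_code_id(code_id):
    """Expand an aggregated code_id such as 'CH02-04' or 'FI18,20' into NUTS codes."""
    code_id = code_id.strip()
    if "-" in code_id:
        # Range form: the text after the dash replaces the trailing digits.
        first, last_suffix = code_id.split("-", 1)
        width = len(last_suffix)
        stem = first[: len(first) - width]
        start, end = int(first[-width:]), int(last_suffix)
        return ["%s%0*d" % (stem, width, n) for n in range(start, end + 1)]
    if "," in code_id:
        # List form: each later item replaces the trailing digits of the first code.
        first, *rest = [part.strip() for part in code_id.split(",")]
        return [first] + [first[: len(first) - len(s)] + s for s in rest]
    return [code_id]


print(expand_code_id("CH02-04"))
print(expand_code_id("FI18,20"))
print(expand_code_id("IE022-025"))
```

    Expanding the codes in this way lets data held at the original NUTS level be aggregated to the
    combined ESS region before linking.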

  • 9

    The unique spatial identifier enables us to match any kind of data available at different spatial

    levels to the original Cumulative ESS dataset.
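
    As a minimal sketch of what this matching amounts to, the Python snippet below performs a
    many-to-one join of respondent records to regional records on code_id. The variable names
    other than cntry and code_id, and all of the values, are placeholders invented purely to show
    the mechanics; they are not taken from the ESS or from the project dataset.

```python
def link_context(respondents, context, key="code_id"):
    """Attach regional context variables to each respondent record."""
    linked = []
    for person in respondents:
        region_vars = context.get(person[key])
        row = dict(person)
        # Flag unmatched regions instead of silently dropping respondents.
        row["matched"] = region_vars is not None
        if region_vars is not None:
            row.update(region_vars)
        linked.append(row)
    return linked


# Placeholder records (invented values, for illustration only).
respondents = [
    {"resp_id": 1, "cntry": "ES", "code_id": "ES11"},
    {"resp_id": 2, "cntry": "GR", "code_id": "GR11"},
    {"resp_id": 3, "cntry": "ES", "code_id": "ES12"},
]
context = {
    "ES11": {"pm10": 25.0},
    "GR11": {"pm10": 40.0},
}

linked = link_context(respondents, context)
for row in linked:
    print(row)
```

    Every respondent in a region receives the same regional values, and respondents whose region
    has no contextual record are kept and flagged rather than lost.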

    3. GIS analysis and mapping of air quality

    This section describes the technical procedures involved in the GIS analyses and mapping of

    concentrations of PM10, SO2, NO, NO2, CO and Benzene across the 23 European countries in

    the cumulative ESS R1-R3 dataset (see footnote 1) from 2001 to 2008. The air quality

    datasets were obtained from AirBase - the European Air Quality Database in spreadsheet

    (Excel) format. The datasets underwent several preparation, conversion, interpolation,

    processing and analysis steps in spreadsheet and GIS formats. The first outcome is a GIS

    database with a grid cell size of 5 km on a side. The data were combined with EU NUTS data

    to map air quality at NUTS3 level. The resulting air quality database can be queried

    for air quality and NUTS information at any location in the study area and mapped at both

    NUTS3 and 5 km spatial resolutions.

    3.1. Air quality data

    The air quality data under investigation are Particulate Matter under 10 microns (PM10),

    Sulphur Dioxide (SO2), Nitrogen Monoxide (NO), Nitrogen Dioxide (NO2), Carbon Monoxide

    (CO) and Benzene (C6H6). The data are annual averages for 2001-2008 and cover

    23 European countries. The air quality data have been recorded by a network of monitoring

    stations and submitted to the European Topic Centre for Air Pollution and Climate Change

    Mitigation (ETC/ACM), an agent of the European Environment Agency (EEA). The data

    and other information on air quality are hosted by the European Air Quality Database (AirBase)

    where they are publicly accessible (http://acm.eionet.europa.eu/databases/airbase/).

    The datasets for the 23 countries were downloaded in spreadsheet (Excel) format and the

    tables were re-structured for eventual import into GIS. The structured tables were identical in

    their data types and each air pollution table contains 30 variables, ranging from

    administrative units to geographic coordinates (longitude and latitude) of the monitoring

    station and air quality values. Since all the spreadsheet tables have an identical number of

    columns and data types, all the air quality data for the different member states were merged

  • 10

    into a single Microsoft Excel spreadsheet. The combined spreadsheet

    file has six worksheets, each representing one of the six air pollutants.

    An enumeration of the monitoring stations by country showed Germany leading with over

    25% of the total air monitoring stations, followed by Spain, France and Italy (Table 3). Over

    60% of the total monitoring stations in the study are concentrated in these four countries:

    Table 3. Number of monitoring stations and land areas per country

    3.2 Methods

    The methodology employed in processing the air pollution datasets is multi-step and lengthy.

    In brief, three main phases can be identified: i) importing data into GIS, ii) undertaking

    interpolation, and iii) integrating NUTS with air quality data. Each of these main steps will be

    discussed in the following sections. Fig.1 provides a schematic overview of the workflow.

  • 11

    Figure 1. Workflow showing input, processing and output of air quality data. An example demonstrating

    the above workflow is shown in Fig.12 for Germany.

    3.2.1 Importing spreadsheet data into GIS

    In order to map air quality data in GIS, the data must be spatially referenced. The longitude

    and latitude provide the necessary spatial references that define the locations of the

    monitoring stations. However, these coordinates are in the World Geodetic System of 1984 (WGS84)

    and must be re-projected to the European ETRS projection system to overlay accurately with

    other GIS layers.

    The air quality data in the spreadsheet are prepared for import into GIS by shortening the

    column names to fewer than 10 characters and removing special characters from column names. The

    columns are formatted as numeric, text or dates accordingly, as ArcGIS has specific protocols

    regarding table structures and data types. After the necessary preparations are made in the

    spreadsheet, it is ready for import into GIS.
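
    The column-name preparation described above can be sketched as a small Python helper. It
    assumes that the characters to be removed are anything other than letters, digits and
    underscores, caps names at 9 characters (that is, fewer than 10) and keeps truncated names
    unique. The function name and the sample headers are hypothetical.

```python
import re


def gis_safe_names(columns, max_len=9):
    """Return short, unique column names without spaces or punctuation."""
    seen = set()
    result = []
    for name in columns:
        clean = re.sub(r"[^A-Za-z0-9_]", "_", name.strip())
        if not clean or not clean[0].isalpha():
            clean = "f" + clean  # field names should start with a letter
        base = clean[:max_len]
        candidate, n = base, 1
        while candidate.lower() in seen:
            # Resolve clashes created by truncation with a numeric suffix.
            suffix = str(n)
            candidate = base[: max_len - len(suffix)] + suffix
            n += 1
        seen.add(candidate.lower())
        result.append(candidate)
    return result


headers = ["Station code", "Annual mean (2005)", "Annual mean (2006)"]
print(gis_safe_names(headers))
```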

    In ArcMap, the Add XY Data command opens a dialogue window for the

    input of XY data from a table to create an event theme. In ArcMap 10, the command

    to add XY data is via File > Add Data > Add XY Data. In the dialogue window that

    follows, the relevant spreadsheet file is selected, then the columns that hold the X- and Y

  • 12

    coordinates are selected to match the corresponding X and Y fields, and the output coordinate

    system is specified (see Fig.2). The settings are checked and if satisfactory, the selection is

    confirmed with OK to execute the process of converting the spreadsheet data into GIS.

    Figure 2. The Add XY data dialogue windows in ArcMap to import tabular data with coordinates to

    create event themes, before filling in the details (left) and after entering the required parameters (right).

    After the tabular data are successfully imported into GIS, an event theme (temporary GIS

    layer) is created which places a point at each coordinate pair (see Fig.3-A). The result is

    examined for spatial accuracy and, if satisfactory, a permanent copy is made by

    exporting the event theme to a new GIS layer. The new GIS copy of air quality contains all

    the data from the spreadsheet and is now ready for processing and analysis in GIS.

    The air quality GIS layer is first re-projected from WGS84 coordinate system to ETRS

    Lambert Azimuthal Equal Area projection. This is necessary to adopt a common EU

    projection because all the countries have different local datums. On the ETRS datum, all the

    air pollution data accurately overlay with each other in ArcMap.
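
    For readers without access to ArcMap, the re-projection step can be sketched in plain Python
    using the standard ellipsoidal Lambert Azimuthal Equal Area forward formulas. The report does
    not state the projection parameters, so the values below are an assumption: they are those of
    the common ETRS89-LAEA definition (EPSG:3035: GRS80 ellipsoid, origin at 52N 10E, false easting
    4321000 m, false northing 3210000 m). The sketch also treats WGS84 and ETRS89 coordinates as
    interchangeable, which ignores a sub-metre difference.

```python
import math

# Assumed parameters (EPSG:3035, GRS80 ellipsoid); not stated in the report.
A = 6378137.0
F = 1.0 / 298.257222101
E2 = F * (2.0 - F)
E = math.sqrt(E2)
LAT0 = math.radians(52.0)
LON0 = math.radians(10.0)
FALSE_EASTING = 4321000.0
FALSE_NORTHING = 3210000.0


def _q(phi):
    """Authalic helper function q(phi) for the ellipsoid."""
    s = math.sin(phi)
    return (1.0 - E2) * (
        s / (1.0 - E2 * s * s)
        - (1.0 / (2.0 * E)) * math.log((1.0 - E * s) / (1.0 + E * s))
    )


QP = _q(math.pi / 2.0)
RQ = A * math.sqrt(QP / 2.0)
BETA0 = math.asin(_q(LAT0) / QP)
D = A * math.cos(LAT0) / math.sqrt(1.0 - E2 * math.sin(LAT0) ** 2) / (RQ * math.cos(BETA0))


def wgs84_to_etrs_laea(lon_deg, lat_deg):
    """Project longitude/latitude in degrees to LAEA easting/northing in metres."""
    phi = math.radians(lat_deg)
    dlam = math.radians(lon_deg) - LON0
    beta = math.asin(_q(phi) / QP)
    b = RQ * math.sqrt(
        2.0 / (1.0 + math.sin(BETA0) * math.sin(beta)
               + math.cos(BETA0) * math.cos(beta) * math.cos(dlam))
    )
    x = FALSE_EASTING + b * D * math.cos(beta) * math.sin(dlam)
    y = FALSE_NORTHING + (b / D) * (
        math.cos(BETA0) * math.sin(beta)
        - math.sin(BETA0) * math.cos(beta) * math.cos(dlam)
    )
    return x, y


print(wgs84_to_etrs_laea(10.0, 52.0))    # projection origin
print(wgs84_to_etrs_laea(-6.26, 53.35))  # a point near Dublin
```

    In practice a GIS package or projection library would be used for this step; the sketch only
    makes explicit what the re-projection computes.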

    When the monitoring stations are superimposed on the EU country layer, it is evident that

    some of the monitoring stations lie outside the country boundaries. This may be due to errors

    in the coordinate values from the spreadsheet. Using the country boundaries as a spatial filter,

  • 13

    the monitoring stations that are found inside the countries are saved to a new GIS layer while

    those that fall outside are excluded (Fig.3-B). The next step is then to filter and separate the

    pollution types by dates into the different years they were recorded. The attribute tables are

    queried and the monitoring stations are separated and saved into different years from 2001 to

    2008 for all the pollution types (see Fig.4). This process yields 48 GIS layers: each of

    the 6 original air quality layers is split into 8 layers, one for each year from 2001 to

    2008. These 48 GIS layers are now ready for interpolation in the subsequent

    stage.
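
    The split into one layer per pollutant and year can be pictured as a simple grouping of the
    station records. The Python sketch below is only an illustration of that bookkeeping; the
    record fields ("pollutant", "year", "value") are hypothetical names, and the records are
    empty placeholders rather than AirBase data.

```python
from collections import defaultdict


def split_by_pollutant_and_year(records, years=range(2001, 2009)):
    """Group station records into one list ("layer") per (pollutant, year)."""
    wanted = set(years)
    layers = defaultdict(list)
    for rec in records:
        if rec["year"] in wanted:
            layers[(rec["pollutant"], rec["year"])].append(rec)
    return dict(layers)


pollutants = ["PM10", "SO2", "NO", "NO2", "CO", "Benzene"]
# One placeholder record per pollutant and year, with no measured value.
records = [
    {"pollutant": p, "year": y, "value": None}
    for p in pollutants
    for y in range(2001, 2009)
]
layers = split_by_pollutant_and_year(records)
print(len(layers))  # 6 pollutants x 8 years
```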

    Figure 3. Event theme of PM10 created from spreadsheet (A) and the permanent copy made of

    monitoring stations that fall inside the countries GIS layer (B) shown in ArcMap. The permanent copy of

    PM10 is for all years from 2001 to 2008.

  • 14

    Figure 4. The PM10 layer over the study area is separated into different years from 2001 to 2008 for

    interpolation by individual years.

    3.2.2 Spatial Interpolation in GIS

    Ambient air concentrations are recorded at the monitoring-station level. However, because the

    stations are unevenly distributed, the concentrations between them remain unknown.

    The immediate solution is to apply spatial interpolation techniques to the available

    monitoring data to provide air quality information between monitoring stations (Denbyl et al.,

    2010).

    Air monitoring stations measure ambient air concentrations, generally at fixed locations, and

    they record changes in air concentrations. However, on their own, they are insufficient to

    provide estimates for the intervening locations or to visualize continuity and variability.

    This is where interpolation becomes necessary: its ability to create continuous

    surfaces from sample data makes it both powerful and useful. From the

    surface, the morphology and characteristics of the changes can be described (Childs, 2004).

    There are different methods of interpolation. Each method uses a different approach and they

    almost always produce different results, therefore the most appropriate method will depend

    on the distribution of the sample points and the phenomena being studied (Childs, 2004).

  • 15

    In GIS, a surface is represented by storing x, y and z values: the x,y values define the

    location of a sample and the z value the characteristic that changes across the surface. These points

    can be represented as contours where lines of equal values can be joined to depict the surface

    as in contour lines or alternatively, the points can be represented as triangular irregular

    network (TIN) or as grid surfaces. TIN is a vector data structure used to store and display

    surface models while grid is a spatial data structure that defines spaces as an array of cells of

    equal size that are arranged in rows and columns representing a surface. The various methods

    are aimed at representing continuous surfaces through interpolation.

    There are various interpolation techniques; some of the common ones available in GIS are

    spline, inverse distance weighting (IDW), kriging, trend surface and Thiessen polygons.

    Within ArcGIS, further spatial interpolation techniques such as natural neighbour, spline

    with barriers, topo to raster and trend are available. These spatial interpolation methods can

    be grouped into several categories based on their basic hypotheses and

    mathematical nature: geometric, statistical, geostatistical, stochastic

    simulation, physical model simulation and combined methods (Li et al., 2000). Ultimately, the

    rationale behind interpolation is to fill in the blanks between points and display a much

    smoother and finer surface. Therefore, a sufficient number of well-distributed data points in

    the area under investigation minimizes uncertainties between points.

    Research comparing the various spatial interpolation methods shows that there is

    no universally optimal method, only methods that are relatively optimal

    in particular situations (ibid). Therefore the best spatial interpolation method should be selected

    on the basis of quantitative analysis of the data and repeated experiments. In

    addition, the results of spatial interpolation should be strictly examined for validity (ibid).

    Studies relating to air pollution that implemented spatial interpolation to map air quality at

    European scale used additional datasets like land-cover, elevation, meteorology and

    population density to improve the methodology and reduce uncertainties in their models

    (Horálek et al., 2007; Horálek et al., 2010; Smet et al., 2009). A series of technical

    publications by the European Topic Centre on Air and Climate Change (ETC/ACC) deals

    extensively with the topic of interpolation and air quality mapping. In particular, a paper by

    Horálek and others (2007) on "Spatial Mapping of Air Quality for European Scale

    Assessment" is comprehensive and incorporates most of the common interpolation techniques

    in its methodology, with supplementary data, to map air quality in urban and rural areas

    across Europe. From a series of tests and models they concluded that for air quality

  • 16

    assessment, kriging methods are generally preferred over IDW and, for PM10, lognormal

    kriging over ordinary kriging. In addition, methodologies based on linear regression using

    supplementary data are preferred over pure interpolation methods. The

    use of concurrent meteorological data is reported to give better results than climatological

    data (Horálek et al., 2007). All these preferences and recommendations are based on repeated

    testing of their methodologies over time. Others, like Naoum and Tsanis (2004), stated that

    despite the numerous articles written about interpolation, there is little or no agreement

    among the authors on the superiority of some techniques over others. They added that

    judgement and experience come into play when considering which interpolation method to

    use. In a personal communication (2011), Peter de Smet, a leading researcher on air

    quality mapping over Europe from the European Topic Centre on Air Pollution and Climate

    Change Mitigation (ETC/ACM), confirmed that their tests showed kriging to

    produce accurate results and IDW to usually produce high uncertainties.

    After considering the data available to us, the methodologies used in other research, the
    tools at our disposal and the results of testing several interpolation techniques, the
    options came down to kriging and inverse distance weighted (IDW) interpolation. Although
    kriging is preferred over IDW for mapping air quality at the European scale, IDW remains
    popular where there are fewer data points and is suitable for rapid interpolation of in-situ
    air quality data. When both kriging and IDW were tested with varying numbers of monitoring
    stations, the differences were acceptable and the IDW values remained generally consistent.
    In addition, IDW retains a larger range of the original data after interpolation than
    kriging.

    IDW is grounded on the principle of inverse distance: the value of each cell is a linearly
    weighted combination of a set of sample points. The weight given to an input point is a
    function of its distance from the output cell location (the so-called distance decay
    concept). In other words, estimates are based on values at nearby locations, weighted only
    by their distance from the interpolation location. The greater the distance, the less
    influence a sample point has on the output value. IDW makes no assumptions about spatial
    relationships beyond the basic one that nearby points ought to be more closely related to
    the value at the interpolated location than distant points (Naoum and Tsanis, 2004). IDW
    interpolation is preferred over kriging in this work and is demonstrated in the following
    section.
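    The distance decay idea can be illustrated with a minimal sketch. It is not the ArcGIS
    implementation: the function name idw, the tuple layout of the stations and the exponent
    of 2 are assumptions made for illustration only.

```python
import math


def idw(stations, x, y, power=2.0):
    """Inverse distance weighted estimate at (x, y).

    stations: iterable of (station_x, station_y, value) tuples (hypothetical layout).
    power: distance-decay exponent (2.0 is an assumed, commonly used value).
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for sx, sy, value in stations:
        distance = math.hypot(x - sx, y - sy)
        if distance == 0.0:
            # The target coincides with a station: use its measurement directly.
            return value
        weight = 1.0 / distance ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total


# Two hypothetical stations 10 units apart, values in arbitrary units.
stations = [(0.0, 0.0, 10.0), (10.0, 0.0, 30.0)]
midpoint = idw(stations, 5.0, 0.0)    # equal distances, so the plain average
near_first = idw(stations, 1.0, 0.0)  # dominated by the nearer station
print(midpoint, near_first)
```

    A larger exponent makes the estimate follow the nearest station more closely, while a
    smaller one gives a smoother surface.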

    Choosing an appropriate geostatistical model

    The various geostatistical analysis models have advantages and drawbacks for different
    applications. After consulting a number of sources and testing the available interpolation
    techniques using Ireland as a case study, the inverse distance weighted (IDW) and kriging
    methods appear to suit the mapping of PM10 and other pollution data. IDW is a deterministic
    method that a number of sources mention as a suitable choice, while kriging, a
    geostatistical technique, is also recommended.

    A test over Germany using IDW and kriging was carried out and the values were compared on a
    cell-by-cell basis. The differences range between -9 and 9. IDW retains a wider range of
    values after interpolation, whereas kriging trims the values further, giving a narrower
    range in the result. In the test with PM10 for 2005 over Germany, the IDW interpolated
    values range from 11.6 to 36.5 with a mean of 24.1, whereas the kriging values range from
    17.1 to 31 with a mean of 24.1. The standard deviation is 3.2 for IDW and 2.6 for kriging.
    An Excel table of the test with descriptive statistics is available on request to help
    explain the differences between the two. In spite of these differences, the majority of the
    values compare well (Figure 4a).

    Figure 4a. IDW interpolation (A) compared to Kriging interpolation (B).
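    A cell-by-cell comparison of this kind could be sketched as follows. The function names
    and the three sample values are hypothetical and are not the report's data; both surfaces
    are assumed to be flattened to lists covering the same cells in the same order.

```python
from statistics import mean, pstdev


def summarise(values):
    """Descriptive statistics for one interpolated surface (flattened to a list)."""
    return {
        "min": min(values),
        "max": max(values),
        "mean": mean(values),
        "sd": pstdev(values),
    }


def compare_surfaces(idw_cells, kriging_cells):
    """Cell-by-cell difference (IDW minus kriging) plus per-surface statistics."""
    if len(idw_cells) != len(kriging_cells):
        raise ValueError("both surfaces must cover the same cells")
    differences = [a - b for a, b in zip(idw_cells, kriging_cells)]
    return {
        "idw": summarise(idw_cells),
        "kriging": summarise(kriging_cells),
        "difference_range": (min(differences), max(differences)),
    }


# Hypothetical values for three cells, not taken from the report.
result = compare_surfaces([12.0, 24.0, 36.0], [18.0, 24.0, 30.0])
print(result["difference_range"])
```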


    Data preparation prior to Interpolation

    Before any interpolation is carried out, a number of steps are necessary to ensure correct
    outcomes. The first step is to reproject the pollution layer from WGS84 to the ETRS 1989
    projection system on the European datum. Since the air quality data are a composite of
    time-series data between 2001 and 2008, the next step is to filter and separate them into
    individual layers, which results in 8 separate GIS layers for each pollution type. The
    pollution data for all years are then loaded into ArcMap one pollutant at a time for
    processing.

    At this stage, the geoprocessing environment is configured with the EU countries as the
    maximum processing extent for interpolation. After the environment is set, the IDW
    interpolation tool is invoked, which opens the IDW interpolation dialogue window (see
    Fig.5). The input point features parameter takes the pollution point data, the Z value field
    is the field holding the air quality data to interpolate, and the output raster gives the
    name and location of the interpolation result. Once the dialogue options are selected and
    adjusted, the interpolation is executed. The result is displayed immediately, as shown in
    Fig.6. This procedure is repeated for every year from 2001 to 2008 for each pollutant. When
    the interpolation is complete, 48 raster layers have been created from 6 pollution types.
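    The bookkeeping behind the 48 layers (6 pollution types times 8 years) can be sketched as
    a simple grouping step. The record layout, the function name split_layers and the
    pollutant codes other than PM10 are placeholders, since this section names only PM10.

```python
from collections import defaultdict


def split_layers(records):
    """Group station records into one layer per (pollutant, year) pair."""
    layers = defaultdict(list)
    for record in records:
        layers[(record["pollutant"], record["year"])].append(record)
    return dict(layers)


# Placeholder pollutant codes: only PM10 is named in this section.
pollutants = ["PM10", "P2", "P3", "P4", "P5", "P6"]
records = [
    {"station": "S1", "pollutant": p, "year": y, "value": 0.0}
    for p in pollutants
    for y in range(2001, 2009)
]
layers = split_layers(records)
print(len(layers))
```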

    Figure 5. Dialogue windows for IDW interpolation in ArcMap: default user interactive window (A) and

    populated IDW window ready for processing (B).


    Figure 6. The outcome of IDW Interpolation for PM10 for 2001 across the 23 EU member states.

    The ideal scenario would be to query and extract information on air quality from a single

    database as opposed to dealing with a number of separate layers which would involve

    additional time and effort besides taking up more storage space, particularly for raster

    datasets. In order to combine all the air quality data into a single GIS database, a vector grid

    matrix is created as a container to store the extracted raster values from the interpolation

    results. The steps involved are detailed as follows.

    Fishnet vector grid matrix to store interpolation results

    The fishnet function in ArcGIS can create a regularized vector grid matrix of any cell size at

    any given extent. Setting the EU countries as the maximum geoprocessing extent, the fishnet

    tool is implemented to create a vector grid matrix of 5x5 km cell size. The result is a vector

    GIS layer that spans the extent of the EU countries and contains 534,378 grid cells (Fig.7-A).

    A spatial overlay is made between the two layers and the grids that intersect with the EU

    countries are saved to a new GIS layer for further processing. The resulting grid matrix has

    177,645 grid cells (Fig.7-B).
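    The two operations described above, building a regular grid over an extent and keeping
    only the cells that intersect the study area, can be sketched in plain Python. This is a
    simplification: the study area is approximated by a rectangle rather than country
    polygons, and the function names and coordinates are hypothetical.

```python
import math


def fishnet(xmin, ymin, xmax, ymax, cell=5000.0):
    """Return (cell_id, x0, y0, x1, y1) tuples covering the extent row by row."""
    ncols = math.ceil((xmax - xmin) / cell)
    nrows = math.ceil((ymax - ymin) / cell)
    cells = []
    cell_id = 0
    for row in range(nrows):
        for col in range(ncols):
            x0 = xmin + col * cell
            y0 = ymin + row * cell
            cells.append((cell_id, x0, y0, x0 + cell, y0 + cell))
            cell_id += 1
    return cells


def intersects(cell, box):
    """True if the grid cell overlaps the rectangle (bx0, by0, bx1, by1)."""
    _, x0, y0, x1, y1 = cell
    bx0, by0, bx1, by1 = box
    return x0 < bx1 and x1 > bx0 and y0 < by1 and y1 > by0


# Hypothetical 20 km x 10 km extent in projected metres.
grid = fishnet(0.0, 0.0, 20000.0, 10000.0)
study_area = (0.0, 0.0, 7000.0, 4000.0)  # stand-in for the country outline
selected = [cell for cell in grid if intersects(cell, study_area)]
print(len(grid), [cell[0] for cell in selected])
```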


    Figure 7. The EU countries GIS layer on the left is used to select the 5x5km vector grid that it intersects

    with and the result saved to a new file as shown on the right.

    Since the interpolation and vector grid layers are created from a common map extent and

    their grid sizes set to 5km on the side, there is an exact match when the vector grid is

    superimposed on the interpolation raster layer (see Fig.8). The attribute table of the vector

    grid is restructured and 8 new fields are added with their nomenclature sequentially ranging

    from PM10_2001 to PM10_2008. These fields will store the pollution values of the

    corresponding years from the interpolation raster layers. One vector GIS grid matrix is

    created for each pollution type resulting in 6 grids and their attribute tables restructured to

    store data for all years.

    Figure 8. Overlay of vector grid matrix on PM10 raster interpolation result showing the exact

    match between the raster cells and the vector grid cells. The figure on the right is an inset of the

    box on the figure to the left.


    In the next step the vector grid is overlaid on the raster layer and the raster pixel
    values are transferred to the corresponding vector grid cells. The pixel values of each
    raster layer are transferred to the attribute for the corresponding year in the vector GIS
    database. For example, the pixel values from PM10 for 2001 are transferred to the attribute
    (field/column) for 2001 in the GIS attribute table, the pixel values for PM10 for 2002 are
    stored in the attribute for 2002, and so on (see Fig.10). This process is repeated for all
    8 raster layers until the 8 corresponding attributes in the vector database for PM10 are
    updated. The advantages of storing the data in vector format are the flexibility to query
    the database in various ways and the smaller storage space the vector data structure
    occupies. The same process is replicated for the remaining five pollutants and, in the end,
    the 48 raster layers are compacted into just 6 vector GIS layers.

    Transferring raster cell values to vector attribute table

    The transfer of raster cell values to the grid matrix is a two-step process. First, an
    intermediate point layer is created from the polygon grid matrix layer (Fig.9-A), where
    each point represents the centre (centroid) of a grid cell (Fig.9-B). Using a GIS surface
    overlay tool called Extract Raster Values (Fig.9-D), the raster cell values (Fig.9-C) are
    extracted and transferred to the attribute table of the corresponding points (see Fig.10).
    The second step is to transfer the attributes from the point GIS layer across to the
    polygon grid matrix through an attribute transfer function. Since the points were created
    from the grid matrix layer, they have identical feature identification record addresses and
    the values in the point layer are transferred by address matching. This procedure is
    repeated for all the other layers.
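    The two-step transfer can be sketched as follows. This is an illustration under stated
    assumptions, not the ArcGIS tool: the raster is a list of rows with the northernmost row
    first, grid cells are (id, x0, y0, x1, y1) tuples, and all function names are
    hypothetical.

```python
def centroids(cells):
    """Step 1a: one point per grid cell, keeping the cell's identifier."""
    return {cid: ((x0 + x1) / 2.0, (y0 + y1) / 2.0) for cid, x0, y0, x1, y1 in cells}


def extract_values(points, raster, xmin, ymax, cell_size):
    """Step 1b: read the raster value under each point.

    raster is a list of rows, the first row being the northernmost.
    """
    values = {}
    for cid, (x, y) in points.items():
        col = int((x - xmin) // cell_size)
        row = int((ymax - y) // cell_size)
        values[cid] = raster[row][col]
    return values


def join_to_grid(grid_table, values, field):
    """Step 2: copy point values to the polygon records with the same identifier."""
    for cid, value in values.items():
        grid_table.setdefault(cid, {})[field] = value
    return grid_table


# Hypothetical 2 x 2 raster with 5 km cells and two grid cells on its diagonal.
raster = [[1.0, 2.0], [3.0, 4.0]]
cells = [(0, 0.0, 0.0, 5000.0, 5000.0), (1, 5000.0, 5000.0, 10000.0, 10000.0)]
points = centroids(cells)
values = extract_values(points, raster, xmin=0.0, ymax=10000.0, cell_size=5000.0)
table = join_to_grid({}, values, "PM10_2001")
print(table)
```

    Calling join_to_grid again with another year's values and a field such as PM10_2002 adds
    a second column to the same records, which mirrors how one grid stores all years.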


    Figure 9. Raster Value Extraction procedure showing the construction of vector grid matrix (A),

    creating points/centroids from the grid (B) and extracting the raster pixel values (C). The GIS

    tool used to extract raster values is shown in D.

    Figure 10. The pixel values of the 8 raster layers for PM10 from 2001-2008 (top) are
    transferred to the corresponding fields in the vector grid attribute table (bottom),
    resulting in only one file storing PM10 data for all years.


    Querying and mapping the pollution data at 5x5 km grid matrix

    The previous step produced 6 vector grid layers which can be queried and mapped. The IDW
    interpolation results show that the range of the original air quality values is
    compressed: the minimum rises and the maximum falls. For example, the raw data for PM10 for
    2001 ranged from 6.3 to 103.4, whereas the interpolated values range from 8 to 95.
    Interpolated values cannot be higher than the highest or lower than the lowest value in the
    original data, a typical feature of IDW interpolation. From one vector GIS layer, 8 time
    series maps for 2001 to 2008 can be produced (Fig.11). A variety of statistical analyses
    and queries are possible when the data are in vector format.
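    The bound on the interpolated values follows from the form of the IDW estimator. A short
    derivation, assuming the usual inverse-power weights (the symbols z_i, d_i, w_i, p and
    s_0 are introduced here for illustration and do not appear in the original report):

```latex
\hat{z}(s_0) = \sum_{i=1}^{N} w_i \, z_i ,
\qquad
w_i = \frac{d_i^{-p}}{\sum_{j=1}^{N} d_j^{-p}} ,
\qquad
w_i \ge 0 , \quad \sum_{i=1}^{N} w_i = 1
\;\Longrightarrow\;
\min_i z_i \;\le\; \hat{z}(s_0) \;\le\; \max_i z_i .
```

    Because the estimate is a weighted average with non-negative weights that sum to one, it
    can never fall outside the range of the station values z_i.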

    Figure 11. Rapid outputs of annual mean PM10 from 2001 to 2008 across the study area.

    3.2.3 Integration of air quality with NUTS data

    The smallest mapping unit, or resolution, of the air quality maps is 5x5 km. These data can
    be integrated with other datasets, such as climate, demographic, socio-economic and
    transport data, to enrich the data content and help answer questions on a broad range of
    themes. One of the aims of this study is to map air quality at NUTS3 level. This implies
    aggregating the interpolation results to NUTS3 scale, which involves a number of steps to
    produce a new dataset that integrates information on both NUTS3 and pollution.


    The first step requires combining the air quality data and NUTS3 into a single dataset.
    This task is carried out with the union operation in GIS overlay. In a union, the two GIS
    data layers are geometrically intersected, so that each output feature carries the
    attributes of both the grid cell and the NUTS3 unit it falls in. After the union, some
    clean-up is necessary and the portions of grid cells that fall outside the boundaries of
    the countries are eliminated.

    At this stage the smallest mapping unit is still 5x5 km, with the NUTS3 boundaries cutting
    through the grid cells like a cookie cutter (see Fig.12-E). Since the desired output is a
    map of air quality at NUTS3 level, the air quality data are aggregated to NUTS3 level using
    a generalization tool called dissolve, which melts away the 5x5 km grid cells within each
    NUTS3 unit. During the aggregation, the numeric data are averaged while the non-numeric
    data are carried over by taking either the first or the last value. After the dissolve
    operation, the pollution data are aggregated at NUTS3 level and can be queried and mapped.
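    The averaging performed by the dissolve step can be sketched as a group-by mean over the
    pieces produced by the union. The region codes "A" and "B", the field names and the
    values are hypothetical, and a simple unweighted mean is used because the text describes
    the numeric data as averaged.

```python
from collections import defaultdict


def dissolve_mean(pieces, zone_field, value_field):
    """Average value_field over all pieces that share the same zone_field."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for piece in pieces:
        totals[piece[zone_field]] += piece[value_field]
        counts[piece[zone_field]] += 1
    return {zone: totals[zone] / counts[zone] for zone in totals}


# Hypothetical grid pieces after the union; codes and values are placeholders.
pieces = [
    {"nuts3": "A", "PM10_2005": 20.0},
    {"nuts3": "A", "PM10_2005": 24.0},
    {"nuts3": "B", "PM10_2005": 30.0},
]
means = dissolve_mean(pieces, "nuts3", "PM10_2005")
print(means)
```

    An area-weighted mean would treat the partial cells along NUTS3 boundaries differently;
    the sketch deliberately does not attempt this.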

    The workflow in Fig.1 is illustrated in Fig.12 using Germany as an example. It shows the
    steps involved: extracting the air quality monitoring stations over Germany, applying the
    interpolation technique to create a surface, and extracting the raster values to vector
    grids of 5x5 km resolution. The values are then transferred and averaged to the NUTS3
    level, at which PM10 for 2005 is mapped (Fig.12:A-G).


    Figure 12. Demonstrating actual steps involved in creating PM10 map at NUTS3 scale for Germany.

    Notes on Fig.12 A-G

    A: Spatial distribution of monitoring stations across Germany.
    B: IDW interpolation result from the monitoring stations.
    C: Extract raster values from interpolation surface to 5x5 km vector grid matrix through
       overlay function.
    D: Mapping PM10 at 5x5 km grid resolution.
    E: Geometric intersection with NUTS3 GIS layer through union overlay function.
    F: Dissolving the 5x5 km grid resulting in air quality data aggregated and mapped at NUTS3
       scale.
    G: Final PM10 map for 2005 over Germany at NUTS3 scale.


    4. GIS analysis and mapping of climate and land use data

    4.1 Climate data

    Temperature and precipitation data were obtained from the European gridded dataset of
    surface temperature and precipitation for the period 1950-2008, version 4.0, produced by
    the European Climate Assessment & Dataset (ECA&D). This freely available dataset contains
    daily observations at meteorological stations throughout Europe and the Mediterranean
    (http://eca.knmi.nl/).

    The selected data files were supplied compressed in NetCDF format, with an original
    resolution of 0.25 degrees on a latitude-longitude grid. The relevant NetCDF files were
    extracted using the specialised software CDO (Climate Data Operators), developed at the
    Max Planck Institute for Meteorology. CDO is a collection of tools developed to manipulate
    and analyse climate and forecast model data.

    The factors produced were: i) mean annual temperature, ii) maximum temperature (July),
    iii) minimum temperature (January) and iv) mean annual precipitation. Mean values were
    obtained from the daily data using CDO.

    The extracted maps were re-projected from their latitude and longitude coordinates to
    Lambert Azimuthal Equal Area (LAEA) and re-sampled to a 5,000 metre resolution using
    ArcGIS. The resulting maps show continuous temperature and precipitation data. A
    characteristic of such continuous data is that values can change from one 5,000 metre cell
    to the next.

    To obtain mean values at NUTS 3 level, the images for each factor were analysed using the
    Zonal Statistics module in ArcGIS. This module calculates statistics on the values of a
    raster within the zones of another dataset. In this case, the NUTS 3 dataset defined the
    zones and each factor supplied the values to be aggregated.
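    The idea behind a zonal mean can be sketched with two aligned grids, one holding the
    climate values and one holding the zone each cell belongs to. The function name, the
    zone codes and the values are hypothetical; None marks a cell without data.

```python
def zonal_mean(value_raster, zone_raster):
    """Mean of value_raster cells within each zone of an aligned zone_raster."""
    totals = {}
    counts = {}
    for value_row, zone_row in zip(value_raster, zone_raster):
        for value, zone in zip(value_row, zone_row):
            if value is None or zone is None:
                continue  # skip cells with no data or outside every zone
            totals[zone] = totals.get(zone, 0.0) + value
            counts[zone] = counts.get(zone, 0) + 1
    return {zone: totals[zone] / counts[zone] for zone in totals}


# Hypothetical 2 x 3 temperature grid and matching zone grid.
temperature = [[8.0, 10.0, None], [9.0, 11.0, 12.0]]
zones = [["A", "A", "B"], ["A", "B", "B"]]
zone_means = zonal_mean(temperature, zones)
print(zone_means)
```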

    In total, 32 maps were produced for the period 2001 to 2008, one for each year for each of
    the four climatic variables (see Fig.13 for an example).

    In order to produce attribute tables for each factor and link them to the NUTS 3 database,
    the produced maps were transformed to vector files. This produced attribute tables for each
    factor with the same Object ID as the NUTS 3 layer, making it possible to link the
    databases. Finally, using the Spatial Join module, all attribute tables were integrated
    into the NUTS 3 database. The Spatial Join module creates a table join in which fields from
    one layer's attribute table are appended to another layer's attribute table based on the
    relative location of the features in the two layers.

    [Figure 13 panels: Mean Annual Temperature at NUTS 3 level, one map per year from 2001 to
    2008; legend in °C, High: 20, Low: -30.]

    Figure 13. Mean Annual Temperature 2001 to 2008 across the study area.

    4.2 Land Use Data

    The vector of spatial land use factors comes from the Coordination of Information on the
    Environment (CORINE) land cover database. CORINE is a pan-European database compiled
    within each European member state. It is a vector spatial dataset in which land cover is
    digitized from the interpretation of medium-resolution satellite imagery and assigned a
    land use class according to a standardized land cover nomenclature defined by the European
    Environment Agency. The minimum area mapped in the dataset is 25 hectares. Within this
    research, broad land use statistics were derived from the CORINE database for the years
    2000 and 2006. For the purposes of this study, the original 44 land use categories of the
    CORINE nomenclature are re-categorised into the following classes.

    1. Residential
    2. Commercial and Industrial
    3. Mines and Dumps
    4. Green Urban Spaces
    5. Agricultural Land
    6. Forestry
    7. Natural Areas (see footnote 6)
    8. Waterbodies

    This re-categorisation is considered more appropriate to capture the environmental typologies

    of interest in this study due to the low spatial resolution of 25 hectares and for econometric

    purposes (i.e. to avoid multicollinearity).

    As a quantitative representation of land cover in the ESS regions we used the area in
    square meters of each of the 44 CORINE land cover classes (see Appendix). However, this
    was not straightforward, as the regions used in ESS for different countries were based on
    the boundaries of different NUTS levels (Table 2). Moreover, in some cases different NUTS
    levels were used even within the same country (e.g. France and Ireland). We therefore
    composed an ESS regions map, which includes the boundaries of the corresponding NUTS level
    for each country. The Europe NUTS 1-, 2- and 3-level maps provided by ESRI were used as the
    base for this composite map, which we call the ESS Regions map hereafter.

    Figure 14: Geographical coverage of CORINE for 2000 and 2006

    Footnote 6: Natural Areas are EU-designated as areas of outstanding natural beauty.


    As mentioned above, the studied ESS rounds were implemented in 2001, 2004 and 2006. Thus,
    only for 2006 do we have data from both sources (ESS and CORINE). We therefore used linear
    interpolation to estimate the land cover statistics for the intermediate years of 2001 and
    2004 from the CORINE data of 2000 and 2006. In the first stage, the ArcGIS Spatial Analyst
    Tabulate Area function was used to calculate areas by land cover class for each region of
    the ESS Regions map from the 2000 and 2006 CORINE raster maps. The results were then
    exported to MS Excel and used to estimate the corresponding values for 2001/2 and 2004
    with the following formula for each land cover class and region:

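    \[
    CLC^{n,c}_{2002} = CLC^{n,c}_{2000} + \frac{2}{6}\left(CLC^{n,c}_{2006} - CLC^{n,c}_{2000}\right),
    \qquad
    CLC^{n,c}_{2004} = CLC^{n,c}_{2000} + \frac{4}{6}\left(CLC^{n,c}_{2006} - CLC^{n,c}_{2000}\right)
    \]

    where $CLC^{n,c}_{y}$ is the area of a land cover class c in a NUTS unit n in the year y.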
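    The linear interpolation between the 2000 and 2006 CORINE areas can be sketched as a
    single function. The function name and the example areas are hypothetical.

```python
def interpolate_area(area_2000, area_2006, year):
    """Linearly interpolate a land cover area between the 2000 and 2006 CORINE values."""
    fraction = (year - 2000) / (2006 - 2000)
    return area_2000 + (area_2006 - area_2000) * fraction


# Hypothetical areas (square metres) of one land cover class in one region.
area_2002 = interpolate_area(600.0, 1200.0, 2002)
area_2004 = interpolate_area(600.0, 1200.0, 2004)
print(area_2002, area_2004)
```

    Passing 2000 or 2006 as the year returns the observed CORINE value unchanged, which is a
    quick check that the interpolation is anchored at both ends.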

    As a result, for each region used in ESS we obtained 44 land cover class areas in square
    meters: for 2000 and 2006 from the actual CORINE datasets, and for 2001/2 and 2004/5 from
    linear interpolation. Finally, the resulting land cover data table was joined with the ESS
    dataset and the demographic data provided with the ESRI Europe NUTS maps. The ArcGIS Join
    attributes from a table function was applied using the NUTS name as the linking field.
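    A join on the NUTS name can be sketched as a dictionary lookup. The field names, region
    names and values below are hypothetical; respondents whose region has no match keep only
    their own fields, as in a left join.

```python
def join_by_region(respondents, region_table, key="nuts_name"):
    """Append regional attributes to each respondent record by matching on key."""
    joined = []
    for row in respondents:
        attributes = region_table.get(row[key], {})
        joined.append({**row, **attributes})
    return joined


# Hypothetical respondent records and regional land cover table.
respondents = [
    {"id": 1, "nuts_name": "Region A"},
    {"id": 2, "nuts_name": "Region B"},
    {"id": 3, "nuts_name": "Region C"},
]
region_table = {
    "Region A": {"forest_m2": 1000.0},
    "Region B": {"forest_m2": 2500.0},
}
linked = join_by_region(respondents, region_table)
print(linked)
```

    Joining on names is sensitive to spelling differences between sources, so unmatched rows
    such as the third record above are worth listing and checking.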


    References

    Bond, Derek and Devine, Paula. The Role of Geographic Information Systems in Survey
    Analysis. The Statistician, 1991, 40 (2), pp. 209-216.

    Brereton, F., Clinch, J.P. and Ferreira, S. (2008) Happiness, Geography and the
    Environment. Ecological Economics 65, 386-396.

    Butz, W.P. & Torrey, B.B. (2006) Some Frontiers in Social Science, Science, 312, 30,
    1898-1900.

    Childs, C. 2004. Interpolating Surfaces in ArcGIS Spatial Analyst. Developer's Corner -
    ArcUser July-September 2004. ESRI.

    Denby, B., Garcia, V., Holland, D. & Hogrefe, C. 2010. Integration of air quality modeling
    and monitoring data for enhanced health exposure assessment. EM Magazine [Online], Special
    Issue. Available:
    http://www.google.ie/url?sa=t&source=web&cd=1&ved=0CBUQFjAA&url=http%3A%2F%2Foaspub.epa.gov%2Feims%2Feimscomm.getfile%3Fp_download_id%3D491678&ei=9SI0TrLKHYbMhAfc3NHiCg&usg=AFQjCNFvcSq8o9Gm6ZGWASgacUkky3Gj-Q
    [Accessed 30 July 2011].

    Goodchild, Michael, F. and Haining, Robert, P. GIS and Spatial Data Analysis: Converging
    Perspectives. Papers in Regional Science, 2004, 83, pp. 363-385.

    Horálek, J., Denby, B., Smet, P. d., Leeuw, F. d., Kurfürst, P., Swart, R. & Noije, T. v.
    2007. Spatial mapping of air quality for European scale assessment. ETC/ACC Technical
    Paper 2006/6. Bilthoven: European Topic Centre on Air and Climate Change.

    Horálek, J., Smet, P. d., Leeuw, F. d., Coňková, M., Denby, B. & Kurfürst, P. 2010.
    Methodological improvements on interpolating European air quality maps. ETC/ACC Technical
    Paper 2009/16. Bilthoven: European Topic Centre on Air and Climate Change.

    Li, X., Cheng, G. & Lu, L. 2000. Comparison of Spatial Interpolation Methods. Advances in
    Earth Science, 260-265.

    Luechinger, S. (2009). Valuing Air Quality Using the Life Satisfaction Approach. Economic
    Journal 119, 482-515.

    MacKerron, G. and Mourato, S. (2009). Life satisfaction and air quality in London.
    Ecological Economics, 68(5): 1441-1453.

    Naoum, S. & Tsanis, I. K. 2004. Ranking Spatial Interpolation Techniques using a GIS-Based
    DSS. Global Nest: The International Journal, 6, 1-20.

    Smet, P. d. 2011. RE: Interpolation techniques and modelling for mapping air monitoring
    values across EU. Personal communication to Ningal, T.

    Smet, P. d., Horálek, J., Coňková, M., Kurfürst, P., Leeuw, F. d. & Denby, B. 2009.
    European air quality maps of ozone and PM10 for 2007 and their uncertainty analysis.
    ETC/ACC Technical Paper 2009/9. Bilthoven: European Topic Centre on Air and Climate
    Change.

    Wu, Yi-Hwa; Miller, Harvey, J. and Hung, Ming-Chih. A GIS-based Decision Support System
    for Analysis of Route Choice in Congested Urban Road Networks. Journal of Geographical
    Systems, 2001, 3, pp. 3-24.


    Appendix. Classes of the CORINE nomenclature (3 levels)

    GRID CODE | LABEL1 | LABEL2 | LABEL3
    1 | Artificial surfaces | Urban fabric | Continuous urban fabric
    2 | Artificial surfaces | Urban fabric | Discontinuous urban fabric
    3 | Artificial surfaces | Industrial, commercial and transport units | Industrial or commercial units
    4 | Artificial surfaces | Industrial, commercial and transport units | Road and rail networks and associated land
    5 | Artificial surfaces | Industrial, commercial and transport units | Port areas
    6 | Artificial surfaces | Industrial, commercial and transport units | Airports
    7 | Artificial surfaces | Mine, dump and construction sites | Mineral extraction sites
    8 | Artificial surfaces | Mine, dump and construction sites | Dump sites
    9 | Artificial surfaces | Mine, dump and construction sites | Construction sites
    10 | Artificial surfaces | Artificial, non-agricultural vegetated areas | Green urban areas
    11 | Artificial surfaces | Artificial, non-agricultural vegetated areas | Sport and leisure facilities
    12 | Agricultural areas | Arable land | Non-irrigated arable land
    13 | Agricultural areas | Arable land | Permanently irrigated land
    14 | Agricultural areas | Arable land | Rice fields
    15 | Agricultural areas | Permanent crops | Vineyards
    16 | Agricultural areas | Permanent crops | Fruit trees and berry plantations
    17 | Agricultural areas | Permanent crops | Olive groves
    18 | Agricultural areas | Pastures | Pastures
    19 | Agricultural areas | Heterogeneous agricultural areas | Annual crops associated with permanent crops
    20 | Agricultural areas | Heterogeneous agricultural areas | Complex cultivation patterns
    21 | Agricultural areas | Heterogeneous agricultural areas | Land principally occupied by agriculture, with significant areas of natural vegetation
    22 | Agricultural areas | Heterogeneous agricultural areas | Agro-forestry areas
    23 | Forest and semi natural areas | Forests | Broad-leaved forest
    24 | Forest and semi natural areas | Forests | Coniferous forest
    25 | Forest and semi natural areas | Forests | Mixed forest
    26 | Forest and semi natural areas | Scrub and/or herbaceous vegetation associations | Natural grasslands
    27 | Forest and semi natural areas | Scrub and/or herbaceous vegetation associations | Moors and heathland
    28 | Forest and semi natural areas | Scrub and/or herbaceous vegetation associations | Sclerophyllous vegetation
    29 | Forest and semi natural areas | Scrub and/or herbaceous vegetation associations | Transitional woodland-shrub
    30 | Forest and semi natural areas | Open spaces with little or no vegetation | Beaches, dunes, sands
    31 | Forest and semi natural areas | Open spaces with little or no vegetation | Bare rocks
    32 | Forest and semi natural areas | Open spaces with little or no vegetation | Sparsely vegetated areas
    33 | Forest and semi natural areas | Open spaces with little or no vegetation | Burnt areas
    34 | Forest and semi natural areas | Open spaces with little or no vegetation | Glaciers and perpetual snow
    35 | Wetlands | Inland wetlands | Inland marshes
    36 | Wetlands | Inland wetlands | Peat bogs
    37 | Wetlands | Maritime wetlands | Salt marshes
    38 | Wetlands | Maritime wetlands | Salines
    39 | Wetlands | Maritime wetlands | Intertidal flats
    40 | Water bodies | Inland waters | Water courses
    41 | Water bodies | Inland waters | Water bodies
    42 | Water bodies | Marine waters | Coastal lagoons
    43 | Water bodies | Marine waters | Estuaries
    44 | Water bodies | Marine waters | Sea and ocean
    48 | NODATA | NODATA | NODATA
    49 | UNCLASSIFIED | UNCLASSIFIED LAND SURFACE | UNCLASSIFIED LAND SURFACE
    50 | UNCLASSIFIED | UNCLASSIFIED WATER BODIES | UNCLASSIFIED WATER BODIES
    255 | UNCLASSIFIED | UNCLASSIFIED | UNCLASSIFIED