Using e-Infrastructures for Biodiversity Conservation - Module 3
Gianpaolo Coro ISTI-CNR, Pisa, Italy
Module 3 - Outline
• Biodiversity and geospatial data
• Trends in biodiversity observations
• Combining species observations
• Combining biodiversity and geospatial data
D4Science
D4Science is both a Data and a Computational e-Infrastructure
• Used by several Projects: i-Marine, EUBrazil OpenBio, ENVRI;
• Implements the notion of e-Infrastructure as-a-Service: it offers on demand access to data management services and computational facilities;
• Hosts several VREs for Fisheries Managers, Biologists, Statisticians…and Students.
D4Science - Resources
• A large set of connected biodiversity and taxonomic datasets
• A network to distribute and access geospatial data
• A distributed storage system to store datasets and documents
• A social network to share opinions and useful news
• Algorithms for biology-related experiments
Biodiversity and Geospatial Data
Biodiversity Data Providers
i-Marine hosts biodiversity datasets coming from several data providers:
• Some are accessed remotely and are maintained by their respective owners;
• Others reside in the e-Infrastructure.
Currently, the accessible datasets are:
• Catalogue of Life (CoL)
• Global Biodiversity Information Facility (GBIF)
• Integrated Taxonomic Information System (ITIS)
• Interim Register of Marine and Nonmarine Genera (IRMNG)
• Ocean Biogeographic Information System (OBIS)
• World Register of Marine Species (WoRMS)
• World Register of Deep-Sea Species (WoRDSS)
Some data providers collect data from other data providers, but the alignment between them is not guaranteed!
The datasets allow retrieving:
• Occurrence points (presence points or specimens)
• Taxa names
Online Examples:
http://www.catalogueoflife.org/
http://www.gbif.org/
http://www.iobis.org/
Geospatial Data Providers
Providers shown: Bio-ORACLE, World Ocean Atlas
Formats shown: NetCDF, ASCII, ArcGIS, raw formats
Online Examples:
http://www.myocean.eu
https://www.nodc.noaa.gov/OC5/woa13/
http://www.oracle.ugent.be/
ToolsUI: ftp://ftp.unidata.ucar.edu/pub/netcdf-java/v4.5/toolsUI-4.5.jar
Trends in Biodiversity Observations
Trendylyzer
Trendylyzer allows discovering trends in species observations. It is based on the OBIS collector.
This trend tells the story of the Coelacanth discovery
Online Example: the i-Marine Trendylyzer
https://i-marine.d4science.org/group/biodiversitylab/trends-production
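As a rough illustration of what an observation trend is, the sketch below counts records per year. It is not the Trendylyzer implementation: the record layout (a dictionary with a `year` key) and the sample values are hypothetical.

```python
from collections import Counter


def observations_per_year(records):
    """Count how many observation records fall in each year.

    Records without a year are skipped (hypothetical record layout).
    """
    counts = Counter(r["year"] for r in records if r.get("year") is not None)
    return dict(sorted(counts.items()))


# Made-up records for illustration only, not real OBIS data.
records = [
    {"species": "Example species", "year": 1950},
    {"species": "Example species", "year": 1952},
    {"species": "Example species", "year": 1952},
    {"species": "Example species", "year": None},
]

trend = observations_per_year(records)
print(trend)
```

A sudden rise in the yearly count is the kind of signal a trend plot makes visible.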
Combining Species Observations
Occurrence Points Operations
Operations: Cleaning, Union, Difference, Intersection
Each occurrence record (A or B) carries:
• x,y
• Event Date
• Modif Date
• Author
• Species Scientific Name
Two records A and B are evaluated as equal (A = B) when:
d(x,y) < Distance Thr and LD(Author) * LD(SciName) > Lexical Thr
When two records are equal, the most recent one is taken.
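The similarity rule can be sketched as follows. Several points are assumptions rather than facts from the slides: `LD` is taken to be a Levenshtein-based similarity normalised to [0, 1], `d` is taken to be the Euclidean distance between coordinates, "most recent" is decided on the modification date, and the comparisons are inclusive so that the exact-match setting (DThr=0; LThr=1) can succeed. All class and function names are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from math import hypot


@dataclass
class Occurrence:
    x: float
    y: float
    event_date: date
    modif_date: date
    author: str
    sci_name: str


def levenshtein(a, b):
    """Classic edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]


def lexical_similarity(a, b):
    """Similarity in [0, 1]; 1 means identical (assumed meaning of LD)."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def similar(a, b, distance_thr, lexical_thr):
    """Apply both thresholds; inclusive comparisons are an assumption."""
    close = hypot(a.x - b.x, a.y - b.y) <= distance_thr
    lexical = (lexical_similarity(a.author, b.author)
               * lexical_similarity(a.sci_name, b.sci_name))
    return close and lexical >= lexical_thr


def most_recent(a, b):
    """Return the record with the later modification date."""
    return b if b.modif_date > a.modif_date else a


# Made-up records for illustration only.
rec_a = Occurrence(1.0, 2.0, date(2000, 1, 1), date(2010, 1, 1),
                   "J. Smith", "Solea solea")
rec_b = Occurrence(1.0, 2.0, date(2000, 1, 1), date(2012, 1, 1),
                   "J. Smith", "Solea solea")
rec_c = Occurrence(1.0, 2.0, date(2000, 1, 1), date(2011, 1, 1),
                   "J. Smith", "Solea solea (Linnaeus, 1758)")

print(similar(rec_a, rec_b, 0.0, 1.0))  # identical fields
print(similar(rec_a, rec_c, 0.0, 1.0))  # name with authorship differs
```

Lowering the lexical threshold lets records whose names differ only in format (for example, a name with an appended code or authorship) still be treated as the same observation.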
Experiment: Solea solea
Record counts shown: 57 085 Records, 2 324 Records, 1 871 Records, 10 542 Records
Steps shown: Duplicates Deletion with Exact Match (DThr=0; LThr=1), then Subtraction
Subtraction settings shown: DThr=0.01, LThr=0; DThr=0.01, LThr=1; DThr=0.0001, LThr=0.8
Subtraction results shown: 183 Records, 0 Records, 0 Records
Main remarks:
• The “recordedBy” fields use different name formats
• The Scientific Name fields differ (names vs names and codes)
• D4Science helps in collecting a larger number of unique Solea solea occurrence records
• Although GBIF collects data from OBIS, its coverage is not up to date
Occurrence Points Operations
Occurrences Duplicates Deleter: an algorithm for deleting similar occurrences in a set of occurrence points coming from the Species Discovery Facility of D4Science.
Occurrences Intersection: between two occurrence sets A and B, keeps the elements of B that are similar to elements in A.
Occurrences Subtraction: between two occurrence sets A and B, keeps the elements of A that are not similar to any element in B.
Occurrences Merger: between two occurrence sets A and B, enriches A with the elements of B that are not in A, and updates the elements of A with more recent elements in B. If one element in A corresponds to several more recent elements in B, these replace the element of A.
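The four operations can be sketched with plain lists and a pluggable similarity test. This is an illustration under assumptions, not the D4Science code: records are dictionaries with a hypothetical `modified` field, `similar` is any two-argument predicate supplied by the caller, and the duplicates deleter is assumed to keep the most recent record of each group.

```python
def delete_duplicates(points, similar):
    """Keep one record per group of similar records (the most recent)."""
    kept = []
    for p in points:
        for i, k in enumerate(kept):
            if similar(p, k):
                if p["modified"] > k["modified"]:
                    kept[i] = p
                break
        else:
            kept.append(p)
    return kept


def intersection(a, b, similar):
    """Elements of B that are similar to at least one element of A."""
    return [q for q in b if any(similar(p, q) for p in a)]


def subtraction(a, b, similar):
    """Elements of A that are not similar to any element of B."""
    return [p for p in a if not any(similar(p, q) for q in b)]


def merge(a, b, similar):
    """Enrich A with B: newer similar elements of B replace elements of A,
    and elements of B without a match in A are appended."""
    merged = []
    for p in a:
        newer = [q for q in b
                 if similar(p, q) and q["modified"] > p["modified"]]
        for item in (newer or [p]):
            if item not in merged:
                merged.append(item)
    merged.extend(q for q in b if not any(similar(p, q) for p in a))
    return merged


# Toy similarity and made-up records for illustration only.
def same_place_and_name(p, q):
    return p["name"] == q["name"] and abs(p["x"] - q["x"]) <= 0.01


set_a = [{"name": "a", "x": 0.0, "modified": 2010},
         {"name": "b", "x": 5.0, "modified": 2010}]
set_b = [{"name": "a", "x": 0.0, "modified": 2012},
         {"name": "c", "x": 9.0, "modified": 2011}]

print(intersection(set_a, set_b, same_place_and_name))
print(subtraction(set_a, set_b, same_place_and_name))
print(merge(set_a, set_b, same_place_and_name))
```

Passing the similarity test as a parameter keeps the set logic independent of the thresholds chosen for distance and lexical matching.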
Online experiments: the i-Marine Occurrence Management system
https://i-marine.d4science.org/group/biodiversitylab/processing-tools
Combining Biodiversity and Geospatial data
Environmental layers are combined with a species occurrence dataset to produce an enriched dataset.
Online Experiments:
https://i-marine.d4science.org/group/biodiversitylab/processing-tools
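A minimal sketch of the enrichment step, assuming each environmental layer is a regular grid stored as rows of values, with row 0 at the southern edge. The layer layout, key names and values are hypothetical and do not reflect the D4Science data model.

```python
def sample_layer(layer, x, y):
    """Return the value of the grid cell containing (x, y), or None outside."""
    col = int((x - layer["x_min"]) // layer["cell_size"])
    row = int((y - layer["y_min"]) // layer["cell_size"])
    values = layer["values"]
    if 0 <= row < len(values) and 0 <= col < len(values[0]):
        return values[row][col]
    return None


def enrich(occurrences, layers):
    """Attach one value per environmental layer to every occurrence."""
    enriched = []
    for occ in occurrences:
        extra = {name: sample_layer(layer, occ["x"], occ["y"])
                 for name, layer in layers.items()}
        enriched.append({**occ, **extra})
    return enriched


# Made-up 2 x 2 layer and occurrence points for illustration only.
layers = {
    "depth": {"x_min": 0.0, "y_min": 0.0, "cell_size": 1.0,
              "values": [[10, 20], [30, 40]]},
}
occurrences = [{"x": 0.5, "y": 0.5}, {"x": 1.5, "y": 1.2}, {"x": 5.0, "y": 5.0}]

enriched = enrich(occurrences, layers)
print(enriched)
```

Points that fall outside a layer receive `None`, so missing environmental values stay visible instead of being silently filled.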
One practical application
The giant squid - Architeuthis
The giant squid (Architeuthis) has been reported worldwide since before the 16th century, and in 2012 it was observed live in its habitat for the first time.
Why rare species?
• Biological and evolutionary investigations
• Fisheries management policies and conservation
• Vulnerable Marine Ecosystems
• Key role in affecting biodiversity richness
• Indicators of degradation for aquatic ecosystems
Detecting rare species
• How to build a reliable distribution from few observations?
• How to account for absence locations?
• Is there any approach for rare species?
Data quality
For rare species, data quality is fundamental:
• Reliable presence data
• Reliable absence locations
• High quality environmental features
• Non-noisy environmental features
Tools – i-marine.d4science.org
D4Science e-Infrastructure:
• Retrieve presence data
• Generate absence data
• Get environmental data
• Model, adjust data and produce maps
• Share results
1. Presence data of A. dux from D4S
https://i-marine.d4science.org/group/biodiversitylab/species-data-discovery
2. Simulating A. dux absence locations from AquaMaps
https://i-marine.d4science.org/group/biodiversitylab/processing-tools
Absence locations: cells where 0 < Prob. < 0.2 in AquaMaps Native
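The selection of simulated absence locations can be sketched as a filter on a probability map followed by random sampling. The input layout (tuples of x, y and probability) and the function names are hypothetical, and random sampling is only one assumed way of spreading the selected locations.

```python
import random


def candidate_absences(cells, low=0.0, high=0.2):
    """Keep cells whose probability lies strictly between low and high."""
    return [(x, y) for x, y, prob in cells if low < prob < high]


def sample_absences(cells, n, seed=0):
    """Draw up to n absence locations at random from the candidates."""
    candidates = candidate_absences(cells)
    rng = random.Random(seed)
    return rng.sample(candidates, min(n, len(candidates)))


# Made-up probability cells for illustration only.
cells = [(0, 0, 0.0), (1, 1, 0.1), (2, 2, 0.19), (3, 3, 0.2), (4, 4, 0.9)]

print(candidate_absences(cells))
print(sample_absences(cells, 1))
```

Cells with probability exactly 0 are excluded by the strict lower bound, matching the 0 < Prob. < 0.2 range.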
3. Environmental Features
https://i-marine.d4science.org/group/biodiversitylab/geo-visualisation
https://i-marine.d4science.org/group/biodiversitylab/processing-tools
Most of these layers were available in D4Science.
Depth and Distance from land were imported using the Statistical Manager.
4. MaxEnt model as filter
https://i-marine.d4science.org/group/biodiversitylab/processing-tools
Presence data and environmental data are given to MaxEnt, which returns the filtered environmental features: the features most correlated to the giant squid.
5. Presence/absence modelling: Artificial Neural Networks (ANN)
Inputs: presence/absence data and filtered environmental features
The model is trained on positive and negative examples, expressed in terms of environmental features
Output: binary file
https://i-marine.d4science.org/group/biodiversitylab/processing-tools
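To make the idea of training on positive and negative examples concrete, here is a very small feed-forward network in plain Python. It is a sketch under assumptions, not the D4Science algorithm: one hidden layer, sigmoid activations, stochastic gradient descent on the cross-entropy loss, and made-up feature vectors scaled to [0, 1].

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class TinyNetwork:
    """One hidden layer with sigmoid units; the last weight of each
    neuron is its bias."""

    def __init__(self, n_inputs, n_hidden, seed=0):
        rng = random.Random(seed)
        self.hidden = [[rng.uniform(-1, 1) for _ in range(n_inputs + 1)]
                       for _ in range(n_hidden)]
        self.output = [rng.uniform(-1, 1) for _ in range(n_hidden + 1)]

    def _forward(self, x):
        # zip stops at the shorter sequence, so the bias is added separately.
        h = [sigmoid(sum(w * v for w, v in zip(ws, x)) + ws[-1])
             for ws in self.hidden]
        out = sigmoid(sum(w * v for w, v in zip(self.output, h))
                      + self.output[-1])
        return h, out

    def predict(self, x):
        """Return a presence score between 0 and 1."""
        return self._forward(x)[1]

    def train(self, samples, epochs=2000, rate=0.5):
        for _ in range(epochs):
            for x, target in samples:
                h, out = self._forward(x)
                d_out = out - target  # cross-entropy gradient at the output
                # Hidden deltas use the output weights before their update.
                d_hidden = [d_out * self.output[j] * h[j] * (1 - h[j])
                            for j in range(len(h))]
                for j in range(len(h)):
                    self.output[j] -= rate * d_out * h[j]
                self.output[-1] -= rate * d_out
                for j, ws in enumerate(self.hidden):
                    for i in range(len(x)):
                        ws[i] -= rate * d_hidden[j] * x[i]
                    ws[-1] -= rate * d_hidden[j]


# Made-up examples: (environmental features, 1 = presence / 0 = absence).
samples = [
    ([0.90, 0.80], 1), ([0.80, 0.90], 1), ([0.85, 0.70], 1),
    ([0.10, 0.20], 0), ([0.20, 0.10], 0), ([0.15, 0.30], 0),
]

net = TinyNetwork(n_inputs=2, n_hidden=3)
net.train(samples)
print(net.predict([0.9, 0.8]), net.predict([0.1, 0.2]))
```

In practice the number of hidden neurons (the learning topology) would be chosen by comparing several candidates, which this sketch does not attempt.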
6. Projection of the Neural Network
https://i-marine.d4science.org/group/biodiversitylab/processing-tools
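Projecting a trained model means evaluating it on every cell of a map. The sketch below walks a global longitude/latitude grid at a chosen resolution; `predict` and `features_at` are hypothetical callables standing in for the trained model and for the lookup of environmental features at a location.

```python
def project(predict, features_at, resolution=0.5):
    """Evaluate predict at the centre of each cell of a global grid.

    Row 0 is the southernmost row; cells without features get None.
    """
    n_cols = int(round(360 / resolution))
    n_rows = int(round(180 / resolution))
    grid = []
    for r in range(n_rows):
        lat = -90 + (r + 0.5) * resolution
        row = []
        for c in range(n_cols):
            lon = -180 + (c + 0.5) * resolution
            feats = features_at(lon, lat)
            row.append(None if feats is None else predict(feats))
        grid.append(row)
    return grid


# Toy stand-ins for illustration only: one feature derived from latitude,
# and a "model" that simply returns that feature.
def toy_features(lon, lat):
    return [abs(lat) / 90.0]


def toy_predict(features):
    return features[0]


coarse_map = project(toy_predict, toy_features, resolution=90)
print(coarse_map)
```

A coarse resolution is used here only to keep the output small; the resolution would normally follow that of the environmental features.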
7. Comparison
Maps compared with the expert map (Nesis, 2003):
• MaxEnt (presence-only)
• AquaMaps Suitable (expert system)
• Neural Network (presence/absence)
Similarity values shown: 22.01%, 21.68%, 42.83%
Similarity calculated using Maps Comparison, by Coro, Ellenbroek, Pagano, DOI: 10.1080/15481603.2014.959391
https://i-marine.d4science.org/group/biodiversitylab/processing-tools
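One simple way to express the similarity of two maps as a percentage is to count the cells where their values agree within a tolerance. This is only an assumed stand-in to illustrate the idea; it is not the published Maps Comparison procedure, and the function name and tolerance are hypothetical.

```python
def similarity_percent(map_a, map_b, tolerance=0.1):
    """Percentage of cells, defined in both maps, whose values differ
    by at most the tolerance."""
    compared = 0
    agreeing = 0
    for row_a, row_b in zip(map_a, map_b):
        for a, b in zip(row_a, row_b):
            if a is None or b is None:
                continue  # skip cells missing in either map
            compared += 1
            if abs(a - b) <= tolerance:
                agreeing += 1
    return 100.0 * agreeing / compared if compared else 0.0


# Made-up probability maps for illustration only.
model_map = [[0.10, 0.90], [None, 0.50]]
expert_map = [[0.15, 0.20], [0.30, 0.50]]

score = similarity_percent(model_map, expert_map)
print(round(score, 2))
```

Only cells present in both maps enter the count, so maps with different coverage can still be compared.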
Conclusions
• Enhancing data quality produces a high-performance distribution
• A presence/absence ANN combines these data
• Biological, observation and expert evidence confirm the prediction by the ANN
Summary: modelling rare species distributions
1. Retrieve high quality presence locations by relying on the metadata of the records,
2. Use expert knowledge or an expert system to detect absence locations. Select absence locations as widespread as possible,
3. Select a number of environmental characteristics correlated to the species presence,
4. Use MaxEnt to filter the environmental characteristics that are really important with respect to the presence points,
5. Train an Artificial Neural Network on presence and absence locations and select the best learning topology,
6. Project the ANN at global scale, using a resolution equal to the maximum in the environmental features,
7. Train a MaxEnt model as comparison system.
Just another example
Coelacanth, Smith 1939
Models shown: GARP, MaxEnt, AquaMaps, Neural Network
Coro, Gianpaolo, Pasquale Pagano, and Anton Ellenbroek. "Combining simulated expert knowledge with Neural Networks to produce Ecological Niche Models for Latimeria chalumnae." Ecological Modelling 268 (2013): 55-63.