Environment Canada's Data Management Service
TRANSCRIPT
![Page 1: Environment Canada's Data Management Service](https://reader035.vdocuments.mx/reader035/viewer/2022062304/55a6ae3f1a28ab4d418b466a/html5/thumbnails/1.jpg)
A Brief History in Time Series Data at Environment Canada
James Doyle, Project Manager
& Christopher Thorne, Geomatics Data Analyst
![Page 2: Environment Canada's Data Management Service](https://reader035.vdocuments.mx/reader035/viewer/2022062304/55a6ae3f1a28ab4d418b466a/html5/thumbnails/2.jpg)
Environment Canada’s Data Management Program (2011 – Present)
Projects:
1. Data Governance and Architecture (Data Stewardship Model & Standards)
2. Data Catalogue (supporting Open Data and Federal Geospatial Platform)
3. Data Access and Sharing
4. Data Consolidation
5. Data Integration
![Page 3: Environment Canada's Data Management Service](https://reader035.vdocuments.mx/reader035/viewer/2022062304/55a6ae3f1a28ab4d418b466a/html5/thumbnails/3.jpg)
EC Subject Area Model
![Page 4: Environment Canada's Data Management Service](https://reader035.vdocuments.mx/reader035/viewer/2022062304/55a6ae3f1a28ab4d418b466a/html5/thumbnails/4.jpg)
Hunting for a standard - XML Architecture
North American Profile of ISO 19115 (ISO/TS 19139)
Geography Markup Language 3.2 (ISO 19136)
Observations and Measurements 2.0 (ISO 19156)
SWE Common Data Model 2.0
WaterML 2.0 Part 1 - Timeseries
TimeSeriesML
• WMO/NOAA and EC want WaterML 2.0 Part 1 rebranded
• IMD is participating in the OGC TimeSeriesML SWG
![Page 5: Environment Canada's Data Management Service](https://reader035.vdocuments.mx/reader035/viewer/2022062304/55a6ae3f1a28ab4d418b466a/html5/thumbnails/5.jpg)
COMP Logical Data Model
Provides a simple, stable, logical layer used for:
User interfaces
Data resource modularization
Common Observation and Measurement Profile
A common XML exchange profile for time series data that is 100% compliant with the OGC international standards:
wml2: WaterML 2.0 Part 1 – Timeseries
om: Observations & Measurements
swe: Sensor Web Enablement Common Data Model
gml: Geography Markup Language
What does the standard look like? The Anatomy of COMP
![Page 6: Environment Canada's Data Management Service](https://reader035.vdocuments.mx/reader035/viewer/2022062304/55a6ae3f1a28ab4d418b466a/html5/thumbnails/6.jpg)
What does COMP offer EC and its partners?
XML Data Exchange
A common XML exchange profile for time series data that is 100% compliant with OGC international standards – no local extensions
COMP Viewer
When you open an online COMP XML file in your browser, the Viewer tracks down all the external references and presents you with a complete picture of the metadata and data as an HTML report in the official language of your choice, with outlining for easy navigation
COMP Data Point Utilities
To extract data values into tabular formats for consumption by your analytical software
Value Added Tools
GIS Mapping
Data Visualization
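The Data Point Utilities idea (pulling time/value pairs out of the XML into a table) can be sketched roughly as follows. This is an illustration only, not the actual COMP utility: the sample fragment is hypothetical and heavily trimmed, and the function names `extract_points` and `to_csv` are invented for this sketch. The element names and namespace are those of OGC WaterML 2.0.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Namespace URI of OGC WaterML 2.0.
WML2 = "http://www.opengis.net/waterml/2.0"
NS = {"wml2": WML2}

# Hypothetical, heavily trimmed fragment; a real COMP file carries far more metadata.
SAMPLE = f"""<wml2:MeasurementTimeseries xmlns:wml2="{WML2}">
  <wml2:point>
    <wml2:MeasurementTVP>
      <wml2:time>2010-01-01T00:00:00Z</wml2:time>
      <wml2:value>1.25</wml2:value>
    </wml2:MeasurementTVP>
  </wml2:point>
  <wml2:point>
    <wml2:MeasurementTVP>
      <wml2:time>2010-01-02T00:00:00Z</wml2:time>
      <wml2:value>0.5</wml2:value>
    </wml2:MeasurementTVP>
  </wml2:point>
</wml2:MeasurementTimeseries>"""


def extract_points(xml_text):
    """Return a list of (time, value) tuples from every MeasurementTVP element."""
    root = ET.fromstring(xml_text)
    rows = []
    for tvp in root.iter(f"{{{WML2}}}MeasurementTVP"):
        time = tvp.findtext("wml2:time", namespaces=NS)
        value = tvp.findtext("wml2:value", namespaces=NS)
        rows.append((time, float(value)))
    return rows


def to_csv(rows):
    """Serialize the extracted points as CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["time", "value"])
    writer.writerows(rows)
    return buf.getvalue()


points = extract_points(SAMPLE)
table = to_csv(points)
```

The resulting CSV text can be handed to a spreadsheet or statistics package; missing or non-numeric values would need extra handling that this sketch leaves out.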
![Page 7: Environment Canada's Data Management Service](https://reader035.vdocuments.mx/reader035/viewer/2022062304/55a6ae3f1a28ab4d418b466a/html5/thumbnails/7.jpg)
COMP XML
This XML fragment references a name and a unit of measure in SKOS taxonomies
SKOS Taxonomies
Define these terms in English and French
COMP Viewer
Looks up these SKOS references and resolves them in English and French
en-CA, fr-CA
Simple Knowledge Organization System
EC ISO-NAP NatChem (Air Quality) Substance Unit of Measure WaterML2 Species Bio-organism Water Quality Water Quantity Meteorology Ice Service Wildlife Service ?
Example of COMP Use of SKOS Taxonomies
![Page 8: Environment Canada's Data Management Service](https://reader035.vdocuments.mx/reader035/viewer/2022062304/55a6ae3f1a28ab4d418b466a/html5/thumbnails/8.jpg)
COMP in Action
http://www.ec.gc.ca/data_donnees/comp
1. Browser uses COMP XSLT
2. When the user clicks on the file, it asks the browser to render the XML using the COMP Viewer instead of its own default XSLT script (see 2nd line of syntax)
3. Selecting a download option invokes the Download Service
4. The Download Service pipes the output back to the browser, invoking its standard file download facilities
Components: COMP XML File, COMP Viewer XSLT, Download Service, Data Point Extraction Scripts, COMP Files, SKOS Taxonomies
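The "2nd line of syntax" mechanism is the standard `xml-stylesheet` processing instruction, which tells a browser which XSLT to apply. A minimal sketch of reading that instruction from a document is shown here; the file name `comp-viewer.xsl`, the sample document, and the helper `stylesheet_href` are assumptions made for illustration, not the actual COMP deployment.

```python
import re
from xml.dom import minidom

# Hypothetical COMP-style document: the second line names the XSLT to apply.
SAMPLE = """<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="comp-viewer.xsl"?>
<wml2:Collection xmlns:wml2="http://www.opengis.net/waterml/2.0"/>"""


def stylesheet_href(xml_text):
    """Return the href of the first xml-stylesheet instruction, or None."""
    doc = minidom.parseString(xml_text)
    for node in doc.childNodes:
        if (node.nodeType == node.PROCESSING_INSTRUCTION_NODE
                and node.target == "xml-stylesheet"):
            # The instruction's data is a string of pseudo-attributes.
            match = re.search(r'href="([^"]*)"', node.data)
            if match:
                return match.group(1)
    return None


href = stylesheet_href(SAMPLE)
```

A browser performs the equivalent lookup itself; the sketch only makes visible which line of the file redirects rendering to the viewer.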
![Page 9: Environment Canada's Data Management Service](https://reader035.vdocuments.mx/reader035/viewer/2022062304/55a6ae3f1a28ab4d418b466a/html5/thumbnails/9.jpg)
Setting up Pilot Project
What EC Monitoring Program will be our guinea pig?
Weather Monitoring
Water Quality & Availability Monitoring
Air Quality Monitoring
Emissions Sources (Air, Water, Land)
Species & Habitat Monitoring
…etc.
Pick me
Pick me!
![Page 10: Environment Canada's Data Management Service](https://reader035.vdocuments.mx/reader035/viewer/2022062304/55a6ae3f1a28ab4d418b466a/html5/thumbnails/10.jpg)
Selecting Program Observations (Input Dataset)
Data Input:
The National Atmospheric Chemistry Database (NatChem) NARSTO Quality Science Center of the U.S. Oak Ridge Laboratory.
Accessory COMP-specific XLS data entry templates, for data not found or not easily accessible within the source data.
Output Data:
OGC WaterML 2.0 - Timeseries (XML): observations data linking to reference master data
Reference Data: monitoring sites, instrument procedures, parameters (data types), bilingual terms & lookup lists.
![Page 11: Environment Canada's Data Management Service](https://reader035.vdocuments.mx/reader035/viewer/2022062304/55a6ae3f1a28ab4d418b466a/html5/thumbnails/11.jpg)
Who is going to migrate the data?
“No problem, Chris will do it!”
(Correction: Chris + FME)
![Page 12: Environment Canada's Data Management Service](https://reader035.vdocuments.mx/reader035/viewer/2022062304/55a6ae3f1a28ab4d418b466a/html5/thumbnails/12.jpg)
What does the Input Data Look Like?
NatChem holds 100s of these NARSTO files, organized by study or monitoring network across Canada (100s of sites)
~35 years of data at each location/region
~500 instrument and sampling measurement procedures
Time series data can be logged in days, hours, or minutes
NARSTO files are TXT/CSV
With some (incomplete) accessory Program reference data (CSV, XLS)
All stored on a file share drive.
![Page 13: Environment Canada's Data Management Service](https://reader035.vdocuments.mx/reader035/viewer/2022062304/55a6ae3f1a28ab4d418b466a/html5/thumbnails/13.jpg)
Input file – header info*
File Begins
NARSTO Variables
Contacts
File Description/Name
File Abstract / Versioning Info
…n
![Page 14: Environment Canada's Data Management Service](https://reader035.vdocuments.mx/reader035/viewer/2022062304/55a6ae3f1a28ab4d418b466a/html5/thumbnails/14.jpg)
Input file – Monitoring site information
Site Location(s)
Table Schema/Metadata (uom)
Table Info
…n
![Page 15: Environment Canada's Data Management Service](https://reader035.vdocuments.mx/reader035/viewer/2022062304/55a6ae3f1a28ab4d418b466a/html5/thumbnails/15.jpg)
Input file – Observation data & metadata
Time Series by Site Observations (data point records)
Table Schema/Metadata
Observation Table Name & Notes
Column Metadata: instrument/sampling procedures
…
![Page 16: Environment Canada's Data Management Service](https://reader035.vdocuments.mx/reader035/viewer/2022062304/55a6ae3f1a28ab4d418b466a/html5/thumbnails/16.jpg)
Project Planning
ETL
Data Request: by time, by location, by substance…
Web Services (controlled, user-driven quality data products)
(centralization & cleaning within DB)
Master Data Recast
(conversion & migration transactions)
Reporting: data profiling, QA/QC, internal business needs, data process logs
Resource Intensity (time & resource)
Quality of Data
![Page 17: Environment Canada's Data Management Service](https://reader035.vdocuments.mx/reader035/viewer/2022062304/55a6ae3f1a28ab4d418b466a/html5/thumbnails/17.jpg)
Task Breakdown
1. Parse NARSTO-formatted CSV sources and load them into an MS SQL database.
2. Reference Data
i. Develop data profiling & reporting methods to QA/QC the reference data across submitted observation files.
ii. Centralize the Program's master reference data: bilingual definitions, contacts, sites, variables (procedures), and observations.
iii. Map reference data to OGC WaterML 2.0, covering the convert, store, and publish processes.
3. Time Series Data
i. Create a physical data model within MSSQL for storage and for the TimeSeries XML output.
ii. Join/link reference data to 34 years of observations (semantic web relationships).
iii. Produce, validate & publish to the online COMP Viewer.
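The reference data profiling step (task 2.i) amounts to checking whether the same reference item is described consistently across submitted files. A rough sketch of that check is below; the records, field names, and the function `find_conflicts` are hypothetical and only stand in for whatever the real QA/QC reports compared.

```python
from collections import defaultdict

# Hypothetical site records as they might look after loading several source files.
RECORDS = [
    {"file": "a.csv", "site_id": "S1", "lat": "45.40", "lon": "-75.70"},
    {"file": "b.csv", "site_id": "S1", "lat": "45.40", "lon": "-75.70"},
    {"file": "c.csv", "site_id": "S2", "lat": "49.25", "lon": "-123.10"},
    {"file": "d.csv", "site_id": "S2", "lat": "49.52", "lon": "-123.10"},
]


def find_conflicts(records, key, fields):
    """Report, per key, every field that takes more than one distinct value."""
    seen = defaultdict(lambda: defaultdict(set))
    for rec in records:
        for field in fields:
            seen[rec[key]][field].add(rec[field])
    conflicts = {}
    for key_value, by_field in seen.items():
        bad = {f: sorted(v) for f, v in by_field.items() if len(v) > 1}
        if bad:
            conflicts[key_value] = bad
    return conflicts


report = find_conflicts(RECORDS, "site_id", ["lat", "lon"])
```

Here site `S2` is flagged because two files disagree on its latitude; a report like this is the kind of item that would go back to the program for correction.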
![Page 18: Environment Canada's Data Management Service](https://reader035.vdocuments.mx/reader035/viewer/2022062304/55a6ae3f1a28ab4d418b466a/html5/thumbnails/18.jpg)
Data Publication System Architecture
ETL
Data Request, COMP Viewer & Conversion
Web Services (controlled, user-driven quality data products)
(prepping & cleaning within DB)
Master Data Recast
(conversion & migration transactions)
Reporting: data profiling, QA/QC, internal business needs, data process logs
COMP WaterML2.0
XML
Data Sources
…n
XLS
COMP Templates
(data entry)
+
SME (NatChem)
![Page 19: Environment Canada's Data Management Service](https://reader035.vdocuments.mx/reader035/viewer/2022062304/55a6ae3f1a28ab4d418b466a/html5/thumbnails/19.jpg)
NARSTO Parser using FME
Reader: TEXTLINE (line by line)
Transformers:
StringSearcher, StringReplacer, AttributeSplitter, ListExploder, ListSearcher, AttributeTrimmer, AttributeRemover, NARSTOFileMetadata (custom)
Writer: MSSQL tables ->
File header, observation, site, and lookup tables of NARSTO information
The FME workbench was HUGE!
Mostly due to the complexity of the NARSTO custom structure.
Lists were my friend.
Able to perform batch imports on folders!
Once run, able to query and validate across files within MSSQL.
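The line-by-line approach can be illustrated outside FME with a small Python sketch. Everything specific here is an assumption: the tagged layout below is a simplified, hypothetical stand-in and not the real NARSTO specification, SQLite stands in for MS SQL, and the table and function names are invented.

```python
import csv
import sqlite3

# Simplified, hypothetical tagged CSV layout (NOT the real NARSTO format).
SAMPLE = """*FILE NAME,example.csv
*CONTACT,Jane Doe
*TABLE NAME,Observations
*TABLE COLUMN NAME,site,datetime,value
*TABLE BEGINS
S1,2010-01-01,1.25
S1,2010-01-02,0.50
*TABLE ENDS"""


def parse_lines(lines):
    """Split tagged header lines from table rows, reading one line at a time."""
    header, columns, rows, in_table = {}, [], [], False
    for fields in csv.reader(lines):
        if not fields:
            continue
        tag = fields[0].strip()
        if tag == "*TABLE BEGINS":
            in_table = True
        elif tag == "*TABLE ENDS":
            in_table = False
        elif tag == "*TABLE COLUMN NAME":
            columns = [c.strip() for c in fields[1:]]
        elif tag.startswith("*"):
            header[tag.lstrip("*")] = ",".join(fields[1:]).strip()
        elif in_table:
            rows.append(dict(zip(columns, (f.strip() for f in fields))))
    return header, rows


def load(conn, source, header, rows):
    """Write header tags and observation rows into two tables, keyed by source file."""
    conn.execute("CREATE TABLE IF NOT EXISTS file_header (source TEXT, tag TEXT, value TEXT)")
    conn.execute("CREATE TABLE IF NOT EXISTS observation "
                 "(source TEXT, site TEXT, datetime TEXT, value REAL)")
    conn.executemany("INSERT INTO file_header VALUES (?, ?, ?)",
                     [(source, k, v) for k, v in header.items()])
    conn.executemany("INSERT INTO observation VALUES (?, ?, ?, ?)",
                     [(source, r["site"], r["datetime"], float(r["value"])) for r in rows])


conn = sqlite3.connect(":memory:")
header, rows = parse_lines(SAMPLE.splitlines())
load(conn, "example.csv", header, rows)
count = conn.execute("SELECT COUNT(*) FROM observation").fetchone()[0]
```

Calling `load` once per file in a folder gives the batch import, after which a single SQL query can validate values across all imported files.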
![Page 20: Environment Canada's Data Management Service](https://reader035.vdocuments.mx/reader035/viewer/2022062304/55a6ae3f1a28ab4d418b466a/html5/thumbnails/20.jpg)
Database System View
START
Database objects:
1. NARSTO Files (CSV)
2. Query data content across imported txt files.
3. Create TABLES: sites, & observations
Create TABLES: sites, file header, variable name, lookup tables, & observations
4. Data Consolidation & Assessment
5a. Clean Reference Values
5b. Upload Reference COMP Templates (terms, contacts)
6a. Join Reference Values
6b. Join Reference files
7. Store XML
8. COMP Viewer (XML)
Processes:
A. Custom File Parser & Batch File Importer
B. List Values Parsing / Table Schema Extraction
C. Create QA/QC Tables (reports)
D. Data Value Consolidation & Assessment
E. Reference Data Creation
F. Build & Map XML
G. Publish XML to Website
FINISH
![Page 21: Environment Canada's Data Management Service](https://reader035.vdocuments.mx/reader035/viewer/2022062304/55a6ae3f1a28ab4d418b466a/html5/thumbnails/21.jpg)
Data Quality Feedback Loop
…n
ETL
Data Request, COMP Viewer (Internal)
Web Services (controlled, user-driven quality data products)
(prepping & cleaning within DB)
Master Data Recast
(conversion & migration transactions)
Reporting: data profiling, QA/QC, internal business needs
COMP WaterML2.0
XML
Data Sources
XLS
…n
COMP Templates
(data entry)
Program - QA/QC
+
SME
Data Quality Improvement process feedback loop…
![Page 22: Environment Canada's Data Management Service](https://reader035.vdocuments.mx/reader035/viewer/2022062304/55a6ae3f1a28ab4d418b466a/html5/thumbnails/22.jpg)
Remember This?
COMP Logical Model(WaterML2.0)
![Page 23: Environment Canada's Data Management Service](https://reader035.vdocuments.mx/reader035/viewer/2022062304/55a6ae3f1a28ab4d418b466a/html5/thumbnails/23.jpg)
Mapping Tables to WaterML and store.
n….
![Page 24: Environment Canada's Data Management Service](https://reader035.vdocuments.mx/reader035/viewer/2022062304/55a6ae3f1a28ab4d418b466a/html5/thumbnails/24.jpg)
Semantic Web Data Uniform Resource Identifiers (URIs):
<om:name
xlink:href="../def/natchem/1-0/natchem-skos.rdf#ObservationType"
xlink:title="Category Parameter"
owns="false"
xlink:type="simple"
/>
Links to semantic values:
</skos:Concept>
<skos:Concept rdf:about="http://intranet.ec.gc.ca/donnees-data/comp/def/natchem/1-0/natchem-skos.rdf#ObservationType">
<skos:prefLabel xml:lang="en-CA">Observation type</skos:prefLabel>
<skos:prefLabel xml:lang="fr-CA">Type d'observation</skos:prefLabel>
<skos:inScheme rdf:resource="http://intranet.ec.gc.ca/donnees-data/comp/def/natchem/1-0/natchem-skos.rdf" />
</skos:Concept>
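What the COMP Viewer does with such a reference (look up the concept and pick the label in the requested language) can be sketched as follows. This is an illustration under assumptions, not the Viewer's actual XSLT: the concept from the slide is wrapped in a minimal `rdf:RDF` root so that it parses, and the function `pref_label` is invented for this sketch.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "skos": "http://www.w3.org/2004/02/skos/core#",
}
RDF_ABOUT = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}about"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

CONCEPT_URI = ("http://intranet.ec.gc.ca/donnees-data/comp/def/natchem/1-0/"
               "natchem-skos.rdf#ObservationType")

# The concept shown on the slide, wrapped in a minimal root element for parsing.
SKOS_RDF = f"""<rdf:RDF xmlns:rdf="{NS['rdf']}" xmlns:skos="{NS['skos']}">
  <skos:Concept rdf:about="{CONCEPT_URI}">
    <skos:prefLabel xml:lang="en-CA">Observation type</skos:prefLabel>
    <skos:prefLabel xml:lang="fr-CA">Type d'observation</skos:prefLabel>
  </skos:Concept>
</rdf:RDF>"""


def pref_label(rdf_text, concept_uri, lang):
    """Return the preferred label of a concept in the given language, or None."""
    root = ET.fromstring(rdf_text)
    for concept in root.findall("skos:Concept", NS):
        if concept.get(RDF_ABOUT) != concept_uri:
            continue
        for label in concept.findall("skos:prefLabel", NS):
            if label.get(XML_LANG) == lang:
                return label.text
    return None


label_en = pref_label(SKOS_RDF, CONCEPT_URI, "en-CA")
label_fr = pref_label(SKOS_RDF, CONCEPT_URI, "fr-CA")
```

Keeping the labels in the taxonomy rather than in each data file is what lets one XML file be rendered in either official language.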
![Page 25: Environment Canada's Data Management Service](https://reader035.vdocuments.mx/reader035/viewer/2022062304/55a6ae3f1a28ab4d418b466a/html5/thumbnails/25.jpg)
Unexpected Challenges: Converting Tabular Values to Semantic Web Data
Due to the complexity of the source data and the huge volume of descriptive reference data, the transformations required:
Lots of StringSearchers & StringReplacers to replace the tabular values with the URI reference location on the web.
Lots of FeatureMergers (>100) due to source data complexity.
With semantic web values, relative vs. absolute URI paths have to be dealt with.
Where do all these values go within the WaterML 2.0 logical components? XMLTemplater was a big help!
All of this spread across many workbenches (~20 .fmw files).
Overall, lots of time and effort reworking the data and transformations, plus facilitation with the program to ensure quality: ~6 months of effort.
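The two URI issues mentioned above (swapping a tabular value for a taxonomy reference, and relative vs. absolute paths) can be sketched in a few lines. The term mapping and the document URL are hypothetical; only the relative taxonomy path and the `ObservationType` concept come from the slides.

```python
from urllib.parse import urljoin

# Relative taxonomy path as used in the COMP XML fragment on the slides.
TAXONOMY = "../def/natchem/1-0/natchem-skos.rdf"

# Hypothetical mapping from tabular source values to concept identifiers.
TERM_TO_CONCEPT = {"Observation type": "ObservationType"}

# Hypothetical location of the XML file that contains the reference.
DOC_URL = "http://intranet.ec.gc.ca/donnees-data/comp/data/example.xml"


def to_relative_uri(value):
    """Replace a tabular value with its relative SKOS concept reference."""
    return f"{TAXONOMY}#{TERM_TO_CONCEPT[value]}"


def to_absolute_uri(relative_uri, document_url):
    """Resolve a relative reference against the URL of the containing document."""
    return urljoin(document_url, relative_uri)


relative = to_relative_uri("Observation type")
absolute = to_absolute_uri(relative, DOC_URL)
```

A relative reference only resolves correctly when the file sits at the expected depth below the taxonomy folder, which is one reason the choice between relative and absolute paths needs care.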
![Page 26: Environment Canada's Data Management Service](https://reader035.vdocuments.mx/reader035/viewer/2022062304/55a6ae3f1a28ab4d418b466a/html5/thumbnails/26.jpg)
Benefits of Using FME
The FME Workspace transformation diagram helps communicate required areas of improvement back to data owners.
Similar to a data model diagram, it can demonstrate data transformation complexities and issues.
Once workbenches are set up, Programs can run the FME Workbenches as new or updated data comes in.
Improved overall data quality management and reporting.
Supports all data consumers' needs for air quality data, now and in the future.
![Page 27: Environment Canada's Data Management Service](https://reader035.vdocuments.mx/reader035/viewer/2022062304/55a6ae3f1a28ab4d418b466a/html5/thumbnails/27.jpg)
Next Steps…
API
WFS Service
Query
Response: COMP XML Payload
Audience: EC, GOC, International
Built-in Functionality: COMP Viewer, Data Point Downloads
Data Warehouse
Query Dimensions: temporal extent, spatial extent, sites, variables, techniques
XML-Relational Hybrid: Indexed SQL Tables pointing to XML CLOBs
Query-specific collections of COMP components are assembled on the fly for the API
FME Server
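The XML-relational hybrid idea (indexed columns for the query dimensions, with each row pointing to a stored XML fragment) might look roughly like this. The schema, the sample rows, the `<collection>` wrapper element, and the function `assemble` are all assumptions for illustration, and SQLite stands in for the eventual data warehouse.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Indexed columns carry the query dimensions; xml_clob holds the stored fragment.
conn.execute("""CREATE TABLE component (
    site TEXT, variable TEXT, start_date TEXT, end_date TEXT, xml_clob TEXT)""")
conn.execute("CREATE INDEX idx_component ON component (site, variable, start_date)")

# Hypothetical rows; real CLOBs would be full COMP components.
conn.executemany("INSERT INTO component VALUES (?, ?, ?, ?, ?)", [
    ("S1", "SO2", "2010-01-01", "2010-12-31", "<series id='a'/>"),
    ("S1", "SO2", "2011-01-01", "2011-12-31", "<series id='b'/>"),
    ("S2", "SO2", "2010-01-01", "2010-12-31", "<series id='c'/>"),
])


def assemble(conn, site, variable, start, end):
    """Collect every fragment overlapping the requested period into one payload."""
    cur = conn.execute(
        "SELECT xml_clob FROM component "
        "WHERE site = ? AND variable = ? AND start_date <= ? AND end_date >= ? "
        "ORDER BY start_date",
        (site, variable, end, start),
    )
    members = "".join(row[0] for row in cur)
    return f"<collection>{members}</collection>"


one_year = assemble(conn, "S1", "SO2", "2010-06-01", "2010-06-30")
two_years = assemble(conn, "S1", "SO2", "2010-06-01", "2011-06-01")
```

The relational side answers the "which components?" question quickly, while the stored XML is returned untouched, so the API never has to rebuild a component from individual values.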
![Page 28: Environment Canada's Data Management Service](https://reader035.vdocuments.mx/reader035/viewer/2022062304/55a6ae3f1a28ab4d418b466a/html5/thumbnails/28.jpg)
Thank You!
Questions?
For more information:
James Doyle - [email protected]
Christopher Thorne – [email protected]