Quality, Uncertainty and Bias Representations of Atmospheric Remote Sensing Information Products Peter Fox, and … others Xinformatics 4400/6400 Week 11, April 21, 2015




Reading

• Audit/Workflow
• Information Discovery
  – Information discovery graph (IDG)
  – Projects using information discovery
  – Information discovery and Library Sciences
  – Information discovery and retrieval tools
  – Social Search
• Metadata
  – http://en.wikipedia.org/wiki/Metadata
  – http://www.niso.org/publications/press/UnderstandingMetadata.pdf
  – http://dublincore.org/


Acronyms

AOD Aerosol Optical Depth

MDSA Multi-sensor Data Synergy Advisor

MISR Multi-angle Imaging Spectro-Radiometer

MODIS Moderate Resolution Imaging Spectro-radiometer

OWL Web Ontology Language

REST Representational State Transfer

UTC Coordinated Universal Time

XML eXtensible Markup Language

XSL eXtensible Stylesheet Language

XSLT XSL Transformation


Where are we with respect to the data challenge?

“The user cannot find the data;
If he can find it, cannot access it;
If he can access it, he doesn't know how good they are;
if he finds them good, he can not merge them with other data”

The Users View of IT, NAS 1989


PROBLEM STATEMENT

Data quality is an ill-posed problem because:

• It is not uniquely defined
• It is user dependent
• It is difficult to quantify
• It is handled differently by different teams
• It is perceived differently by data providers and data users

User question: Which data or product is better for me?


QUALITY CONCERNS ARE POORLY ADDRESSED

Data quality issues have lower priority than building an instrument, launching rockets, collecting/processing data, and publishing papers using the data.

Little attention is paid to how validation measurements are passed from Level 1 to Level 2 and higher as that information propagates in time and space.


USERS’ PERSPECTIVE

There might be a better product somewhere but if I cannot easily find it and understand it, I am going to use whatever I have and know already.


(Some) Facets of Quality

• Accuracy: closeness to truth
  – Bias: systematic deviation
  – Uncertainty: non-systematic deviation
• Completeness: how well data cover a domain
  – Spatial
  – Temporal
• Consistency
  – Spatial: absence of spurious spatial artifacts
  – Temporal: absence of trend, spike and offset artifacts
• Resolution
  – Temporal: time between successive measurements of the same volume
  – Spatial: distance between adjacent measurements
• Ease of Use
• Latency: time between data collection and receipt
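The accuracy and completeness facets can be made concrete in a few lines. This is a minimal sketch, assuming co-located pairs of retrieved and reference values (for example, satellite AOD against ground-based measurements) with missing retrievals stored as None; the `quality_facets` name and the input layout are hypothetical. Bias is taken as the mean difference and uncertainty as the standard deviation of the differences, which is one common convention rather than the only one.

```python
import math
import statistics


def quality_facets(retrieved, reference):
    """Hypothetical helper: summarize bias, uncertainty and completeness.

    retrieved -- satellite values; None (or NaN) marks a missing retrieval
    reference -- co-located "truth" values, same length as retrieved
    """
    # Keep only the pairs where the satellite actually produced a value.
    pairs = [(r, t) for r, t in zip(retrieved, reference)
             if r is not None and not math.isnan(r)]
    # Completeness: fraction of requested samples that have data.
    completeness = len(pairs) / len(retrieved) if retrieved else 0.0
    diffs = [r - t for r, t in pairs]
    # Bias: systematic deviation, i.e. the mean difference from the reference.
    bias = statistics.fmean(diffs) if diffs else math.nan
    # Uncertainty: non-systematic deviation, i.e. the spread of the differences.
    uncertainty = statistics.stdev(diffs) if len(diffs) > 1 else math.nan
    return {"bias": bias, "uncertainty": uncertainty, "completeness": completeness}


facets = quality_facets([0.12, 0.25, None, 0.31], [0.10, 0.20, 0.30, 0.30])
print(facets)
```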


Pretend you’re a museum curator...
...and you’re putting together an exhibit on wildfires with some cool satellite data

Which data quality facet is most important to you?

A – Accuracy
B – Resolution (spatial and/or temporal)
C – Completeness (spatial and/or temporal)
D – Latency
E – Ease of Use

Museum Curator


Museum Curator Poll

Which data quality facet is most important to you?

A – Accuracy
B – Resolution (spatial and/or temporal)
C – Completeness (spatial and/or temporal)
D – Latency
E – Ease of Use


You’re an operational user and...
...you want to use satellite wildfire data to direct HotShot team deployments

Which data quality facet is most important to you?

A – Accuracy
B – Resolution (spatial and/or temporal)
C – Completeness (spatial and/or temporal)
D – Latency
E – Ease of Use


Operational User / HotShot

Which data quality facet is most important to you?

A – Accuracy
B – Resolution (spatial and/or temporal)
C – Completeness (spatial and/or temporal)
D – Latency
E – Ease of Use


You’re an operational user and...
...you want to use satellite wildfire data to estimate burn scar areas for landslide prediction

Which data quality facet is most important to you?

A – Accuracy
B – Resolution (spatial and/or temporal)
C – Completeness (spatial and/or temporal)
D – Latency
E – Ease of Use


Operational User / Landslide

Which data quality facet is most important to you?

A – Accuracy
B – Resolution (spatial and/or temporal)
C – Completeness (spatial and/or temporal)
D – Latency
E – Ease of Use


You’re an ecology researcher and...
...you want to use satellite wildfire data to predict extinction risk of threatened species

Which data quality facet is least important to you?

A – Accuracy
B – Resolution (spatial and/or temporal)
C – Completeness (spatial and/or temporal)
D – Latency
E – Ease of Use


Ecology Researcher

Which data quality facet is least important to you?

A – Accuracy
B – Resolution (spatial and/or temporal)
C – Completeness (spatial and/or temporal)
D – Latency
E – Ease of Use


You’re a remote sensing researcher...
...you want to perfect an algorithm to detect and estimate active burning areas at night with visible and infrared radiances

Which data quality facet is least important to you?

A – Accuracy
B – Resolution (spatial and/or temporal)
C – Completeness (spatial and/or temporal)
D – Latency
E – Ease of Use


Remote Sensing Researcher...

Which data quality facet is least important to you?

A – Accuracy
B – Resolution (spatial and/or temporal)
C – Completeness (spatial and/or temporal)
D – Latency
E – Ease of Use


Giovanni Earth Science Data Visualization & Analysis Tool

• Developed and hosted by NASA/ Goddard Space Flight Center (GSFC)

• Multi-sensor and model data analysis and visualization online tool

• Supports dozens of visualization types

• Generate dataset comparisons

• ~1500 Parameters

• Used by modelers, researchers, policy makers, students, teachers, etc.


Giovanni Allows Scientists to Concentrate on the Science

Web-based tools like Giovanni allow scientists to compress the time needed for pre-science preliminary tasks: data discovery, access, manipulation, visualization, and basic statistical analysis.

[Figure: timeline (Jan–Oct) comparing "The Old Way" with "The Giovanni Way". The Old Way: months of pre-science work (find data; retrieve high-volume data; learn formats and develop readers; extract parameters; perform spatial and other subsetting; identify quality and other flags and constraints; perform filtering/masking; develop analysis and visualization; accept/discard/get more data from satellite, model, and ground-based sources), then exploration, initial analysis, use of the best data for the final analysis, deriving conclusions, and writing and submitting the paper. The Giovanni Way: Mirador and Giovanni web-based services read data, reformat, reproject, extract parameters, subset spatially, filter quality, analyze, explore, and visualize in minutes, leaving days for exploration before the same final steps. Scientists have more time to do science!]


EXPECTATIONS FOR DATA QUALITY

What do most users want?

Gridded data (without gaps) with error bars in each grid cell

What do they get instead?

• Level 2 swath in satellite projection with poorly defined quality flags
• Level 3 monthly data with many suspicious aggregations, and with standard deviation used as an uncertainty measure (a fallacy): the standard deviation mostly reflects the variability within the grid box, not the retrieval error
• Little or no information on sampling (Level 3)
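A small synthetic example shows why the grid-cell standard deviation is a poor uncertainty measure. It assumes every Level 2 retrieval in a cell is exact but the aerosol field has a real gradient across the cell; the Level 3 standard deviation is then large even though the retrieval error is zero. The numbers and the `aggregate_cell` helper are hypothetical; the pixel count is returned alongside because it carries the sampling information that is usually missing.

```python
import statistics


def aggregate_cell(values):
    """Hypothetical Level 3 aggregation of the Level 2 values falling in one grid cell."""
    return {
        "mean": statistics.fmean(values),
        "std": statistics.stdev(values),  # what Level 3 products often report
        "count": len(values),             # sampling information
    }


# True AOD rises across the cell (e.g., toward a smoke plume).
true_aod = [0.10, 0.20, 0.30, 0.40, 0.50]
# Assume perfect retrievals: zero retrieval error for every pixel.
retrieved = list(true_aod)

cell = aggregate_cell(retrieved)
retrieval_errors = [r - t for r, t in zip(retrieved, true_aod)]

print(cell)
print("max retrieval error:", max(abs(e) for e in retrieval_errors))
```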


The effect of bad-quality data is often not negligible

[Figure: Total column precipitable water (kg/m2) for Hurricane Ike, 9/10/2008, shown by quality level: Best, Good, Do Not Use]


Data Usage Workflow

Data Discovery → Assessment → Access → Manipulation → Visualization → Analyze


Data Usage Workflow

Data Discovery → Assessment → Access → Manipulation → Visualization → Analyze

Integration: Reformat, Re-project, Filtering, Subset / Constrain*

*Giovanni helps streamline / automate


Data Usage Workflow

Data Discovery → Assessment → Access → Manipulation → Visualization → Analyze

Integration Planning: Precision Requirements, Quality Assessment Requirements, Intended Use
Integration: Reformat, Re-project, Filtering, Subset / Constrain*

*Giovanni helps streamline / automate


Challenge

• Giovanni streamlines data processing, performing required actions on behalf of the user
  – but automation amplifies the potential for users to generate and use results they do not fully understand
• The assessment stage is integral for the user to understand fitness-for-use of the result
  – but Giovanni did not assist in assessment
• We were challenged to instrument the system to help users understand results


Producers                 Consumers
Quality Control           Quality Assessment
Fitness for Purpose       Fitness for Use
Trustee                   Trustor


Definitions – for an atmospheric scientist

• Quality
  – Is in the eyes of the beholder: worst-case scenario… or a good challenge
• Uncertainty
  – Has aspects of accuracy (how accurately the real-world situation is assessed; it also includes bias) and precision (down to how many digits)


Quality Control vs. Quality Assessment

• Quality Control (QC) flags in the data (assigned by the algorithm) reflect “happiness” of the retrieval algorithm, e.g., all the necessary channels indeed had data, not too many clouds, the algorithm has converged to a solution, etc.

• Quality assessment is done by analyzing the data “after the fact” through validation, intercomparison with other measurements, self-consistency, etc. It is presented as bias and uncertainty. It is rather inconsistent and is scattered across papers, validation reports, and other sources.
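The contrast between QC and QA can be sketched in code. QC is computed by the retrieval itself from its own diagnostics, before any comparison with independent data; QA needs external reference data. The diagnostic names, thresholds, and the 0 to 3 flag scale below are illustrative assumptions, not any mission's actual QC logic.

```python
from dataclasses import dataclass


@dataclass
class RetrievalDiagnostics:
    # Hypothetical per-pixel diagnostics available inside a retrieval algorithm.
    channels_present: int
    channels_required: int
    cloud_fraction: float
    converged: bool


def qc_flag(d: RetrievalDiagnostics) -> int:
    """Quality Control: the algorithm's own "happiness", 0 (bad) to 3 (very good).

    No independent data is used here; thresholds are illustrative only.
    """
    if not d.converged or d.channels_present < d.channels_required:
        return 0
    if d.cloud_fraction > 0.5:
        return 1
    if d.cloud_fraction > 0.2:
        return 2
    return 3


def qa_bias(retrieved, validation):
    """Quality Assessment: mean difference from independent validation data."""
    diffs = [r - v for r, v in zip(retrieved, validation)]
    return sum(diffs) / len(diffs)


print(qc_flag(RetrievalDiagnostics(7, 7, 0.1, True)))  # all channels, few clouds
print(qc_flag(RetrievalDiagnostics(6, 7, 0.1, True)))  # a required channel is missing
print(qa_bias([0.25, 0.35], [0.20, 0.30]))
```

In this sketch a pixel can carry the best QC flag and still belong to a biased product; only the after-the-fact assessment reveals that.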


Definitions – for an atmospheric scientist

• Bias has two aspects:
  – Systematic error resulting in the distortion of measurement data caused by prejudice or faulty measurement technique
  – A vested interest, or strongly held paradigm or condition, that may skew the results of sampling, measuring, or reporting the findings of a quality assessment:
    • Psychological: for example, when data providers audit their own data, they usually have a bias to overstate its quality.
    • Sampling: sampling procedures that result in a sample that is not truly representative of the population sampled. (Larry English)


Data quality needs: fitness for use

• Measuring Climate Change:
  – Model validation: gridded contiguous data with uncertainties
  – Long-term time series: bias assessment is a must, especially for sensor degradation and for changes in orbit and spatial sampling
• Studying phenomena using multi-sensor data:
  – Cross-sensor bias is needed
• Realizing Societal Benefits through Applications:
  – Near-Real Time for transport/event monitoring: in some cases, coverage and timeliness might be more important than accuracy
  – Pollution monitoring (e.g., air quality exceedance levels): accuracy
• Educational (users generally not well-versed in the intricacies of quality; just taking all the data as usable can impair educational lessons): only the best products


Level 2 data


• Swath for MISR, orbit 192 (2001)


Level 3 data


MODIS vs. MERIS

Same parameter, same space & time

Different results – why?

[Figure: MODIS and MERIS maps of the same parameter, side by side]

A threshold used in MERIS processing effectively excludes high aerosol values. Note: MERIS was designed primarily as an ocean-color instrument, so aerosols are “obstacles” not signal.
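The effect of a processing threshold like the one described for MERIS can be illustrated with a few numbers. The sketch assumes both instruments see the same aerosol field, but one processing chain discards retrievals above a cutoff; the cutoff and the AOD samples are made up for illustration and are not MERIS's actual threshold.

```python
import statistics

# Hypothetical AOD values over the same area and time, including a dust/smoke event.
aod_scene = [0.08, 0.12, 0.15, 0.22, 0.35, 0.60, 0.95, 1.40]

# Hypothetical cutoff: retrievals above it are treated as "not signal" and dropped.
CUTOFF = 0.5


def mean_with_cutoff(values, cutoff=None):
    """Mean and count of the values kept after applying an optional upper cutoff."""
    kept = values if cutoff is None else [v for v in values if v <= cutoff]
    return statistics.fmean(kept), len(kept)


full_mean, full_n = mean_with_cutoff(aod_scene)
cut_mean, cut_n = mean_with_cutoff(aod_scene, CUTOFF)
print(f"all retrievals: mean={full_mean:.3f} from {full_n} values")
print(f"with cutoff:    mean={cut_mean:.3f} from {cut_n} values")
```

Same parameter, same place and time, different answers: in this example the difference comes from a sampling choice in processing, not from a retrieval error.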


Spatial and temporal sampling – how to quantify to make it useful for modelers?

• Completeness: MODIS dark target algorithm does not work for deserts
• Representativeness: monthly aggregation is not enough for MISR and even MODIS
• Spatial sampling patterns are different for MODIS Aqua and MISR Terra: “pulsating” areas over ocean are oriented differently due to different orbital direction during day-time measurement

Cognitive bias

[Figure: MODIS Aqua AOD, July 2009 | MISR Terra AOD, July 2009]


Three projects with data quality flavor

• Multi-sensor Data Synergy Advisor
  – Product-level Quality: how closely the data represent the actual geophysical state
• Data Quality Screening Service
  – Pixel-level Quality: algorithmic guess at usability of data point
  – Granule-level Quality: statistical roll-up of Pixel-level Quality
• Aerosol Statistics
  – Record-level Quality: how consistent and reliable the data record is across generations of measurements
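Pixel-level screening and a granule-level roll-up, the two Data Quality Screening Service levels listed above, can be sketched as follows. It assumes each pixel carries a quality flag from 0 (worst) to 3 (best) and that flags of 2 or higher are acceptable; the flag scale, threshold, NaN fill value, and function names are hypothetical.

```python
import math

FILL = math.nan  # hypothetical fill value for screened-out pixels


def screen_pixels(values, qa_flags, min_flag=2):
    """Pixel-level quality: keep a value only if its flag is at least min_flag."""
    return [v if q >= min_flag else FILL for v, q in zip(values, qa_flags)]


def granule_summary(qa_flags, min_flag=2):
    """Granule-level quality: a statistical roll-up of the pixel-level flags."""
    counts = {flag: qa_flags.count(flag) for flag in range(4)}
    usable = sum(1 for q in qa_flags if q >= min_flag)
    usable_fraction = usable / len(qa_flags) if qa_flags else 0.0
    return {"flag_counts": counts, "usable_fraction": usable_fraction}


aod = [0.21, 0.18, 0.95, 0.30, 0.25, 0.40]
qa = [3, 3, 0, 2, 1, 3]
screened = screen_pixels(aod, qa)
summary = granule_summary(qa)
print(screened)
print(summary)
```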


Multi-Sensor Data Synergy Advisor (MDSA)

• Goal: Provide science users with clear, cogent information on salient differences between data candidates for fusion, merging and intercomparison
  – Enable scientifically and statistically valid conclusions
• Develop MDSA on current missions:
  – NASA: Terra, Aqua, (maybe Aura)
• Define implications for future missions


How does MDSA work?

MDSA is a service designed to characterize the differences between two datasets and advise a user (human or machine) on the advisability of combining them.

• Provides the Giovanni online analysis tool
• Describes parameters and products
• Documents steps leading to the final data product
• Enables better interpretation and utilization of parameter difference and correlation visualizations
• Provides clear and cogent information on salient differences between data candidates for intercomparison and fusion
• Provides information on data quality
• Provides advice on available options for further data processing and analysis


Correlation – same instrument, different satellites

[Figure: correlation map with the anomaly marked]

MODIS Level 3 data day definition leads to artifact in correlation


…is caused by an Overpass Time Difference


Effect of the Data Day definition on Ocean Color data correlation with Aerosol data

[Figure: correlation between MODIS Aqua AOD (Ocean group product) and MODIS Aqua AOD (Atmosphere group product); pixel count distribution]

Only half of the Data Day artifact is present because the Ocean Group uses the better Data Day definition!
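The data-day artifact comes down to how an observation time is assigned to a calendar day. A minimal sketch, assuming definition A bins observations by their UTC date and definition B by their local mean solar date (UTC shifted by longitude/15 hours); the slides do not spell out either definition, so both, along with the helper names, are assumptions. The example uses the nominal 10:30 and 13:30 local crossing times for Terra and Aqua at a longitude near the date line.

```python
from datetime import datetime, timedelta


def to_local_solar(utc_time: datetime, lon_deg: float) -> datetime:
    """Approximate local mean solar time: 1 hour per 15 degrees of longitude."""
    return utc_time + timedelta(hours=lon_deg / 15.0)


def to_utc(local_solar: datetime, lon_deg: float) -> datetime:
    """Inverse of to_local_solar."""
    return local_solar - timedelta(hours=lon_deg / 15.0)


def data_day_utc(utc_time: datetime) -> str:
    """Hypothetical definition A: the data day is the UTC calendar date."""
    return utc_time.date().isoformat()


def data_day_local(utc_time: datetime, lon_deg: float) -> str:
    """Hypothetical definition B: the data day is the local solar calendar date."""
    return to_local_solar(utc_time, lon_deg).date().isoformat()


lon = 170.0  # near the date line
# Nominal local crossing times on the same local day (naive datetimes, no tz).
terra_utc = to_utc(datetime(2009, 7, 15, 10, 30), lon)
aqua_utc = to_utc(datetime(2009, 7, 15, 13, 30), lon)

for name, t in [("Terra", terra_utc), ("Aqua", aqua_utc)]:
    print(name, t.isoformat(timespec="minutes"),
          "UTC data day:", data_day_utc(t),
          "local data day:", data_day_local(t, lon))
```

Near the date line the two overpasses of the same local day land on different UTC days, while the local-date definition keeps them together; this is consistent with the overpass-time explanation given on the slides.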


Research approach

• Systematizing quality aspects
  – Working through literature
  – Identifying aspects of quality and their dependence on measurement and environmental conditions
  – Developing Data Quality ontologies
  – Understanding and collecting internal and external provenance
• Developing rulesets that allow us to infer which pieces of knowledge to extract and assemble
• Presenting the data quality knowledge with good visuals, statements and references


Semantic Web Basics

• The triple: {subject-predicate-object}
  – Interferometer is-a optical instrument
  – Optical instrument has focal length
• W3C is the primary (but not sole) governing organization for the languages
  – RDF programming environments exist for 14+ languages, including C, C++, Python, Java, JavaScript, Ruby, PHP, ... (no Cobol or Ada yet ;-( )
  – OWL 1.0 and 2.0 (Web Ontology Language): programming for Java
• Query, rules, inference…
• Closed World: where complete knowledge is known (encoded); AI relied on this
• Open World: where knowledge is incomplete/evolving; the Semantic Web promotes this
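The triple model, simple inference, and the open/closed-world distinction can be illustrated without any RDF tooling. This plain-Python sketch stores triples as tuples and follows is-a links to infer inherited properties; it is a stand-in for an RDF store and an OWL reasoner, not how those libraries work, and all names are hypothetical.

```python
# Hypothetical mini triple store: each fact is a (subject, predicate, object) tuple.
triples = {
    ("Interferometer", "is-a", "OpticalInstrument"),
    ("OpticalInstrument", "has", "FocalLength"),
}


def superclasses(cls, facts):
    """All classes reachable from cls via is-a links (transitive closure)."""
    seen, frontier = set(), [cls]
    while frontier:
        current = frontier.pop()
        for s, p, o in facts:
            if s == current and p == "is-a" and o not in seen:
                seen.add(o)
                frontier.append(o)
    return seen


def infer_has(cls, facts):
    """Properties a class has directly or inherits from its superclasses."""
    classes = {cls} | superclasses(cls, facts)
    return {o for s, p, o in facts if p == "has" and s in classes}


def ask(triple, facts, open_world=True):
    """Closed world: an absent fact is False. Open world: it is unknown (None)."""
    if triple in facts:
        return True
    return None if open_world else False


print(infer_has("Interferometer", triples))
print(ask(("Interferometer", "has", "Aperture"), triples, open_world=True))
print(ask(("Interferometer", "has", "Aperture"), triples, open_world=False))
```

Under the closed-world assumption the missing fact is reported as false; under the open-world assumption it is only unknown, which is the stance the Semantic Web takes.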


Ontology Spectrum

[Figure: the ontology spectrum, running from lightweight to more expressive: Catalog/ID; Terms/glossary; Thesauri (“narrower term” relation); Informal is-a; Formal is-a; Formal instance; Frames (properties); Value restrictions; General logical constraints; with Selected logical constraints (disjointness, inverse, …) also marked]

Originally from the AAAI 1999 Ontologies Panel by Gruninger, Lehmann, McGuinness, Uschold, Welty; updated by McGuinness. Description in: www.ksl.stanford.edu/people/dlm/papers/ontologies-come-of-age-abstract.html


Model for Quality Evidence


Data Quality Ontology Development (Quality flag)

Working together with Chris Lynnes’s DQSS project, we started from the pixel-level quality view.


Data Quality Ontology Development (Bias)

http://cmapspublic3.ihmc.us:80/servlet/SBReadResourceServlet?rid=1286316097170_183793435_22228&partName=htmltext


MDSA Aerosol Data Ontology Example

Ontology of Aerosol Data made with the cmap ontology editor: http://tw.rpi.edu/web/project/MDSA/DQ-ISO_mapping


Multi-Domain Knowledgebase

[Figure: the multi-domain knowledge base spans the Provenance Domain, the Earth Science Domain, and the Data Processing Domain]


RuleSet Development

[DiffNEQCT:
    (?s rdf:type gio:RequestedService),
    (?s gio:input ?a), (?a rdf:type gio:DataSelection),
    (?s gio:input ?b), (?b rdf:type gio:DataSelection),
    (?a gio:sourceDataset ?a.ds), (?b gio:sourceDataset ?b.ds),
    (?a.ds gio:fromDeployment ?a.dply), (?b.ds gio:fromDeployment ?b.dply),
    (?a.dply rdf:type gio:SunSynchronousOrbitalDeployment),
    (?b.dply rdf:type gio:SunSynchronousOrbitalDeployment),
    (?a.dply gio:hasNominalEquatorialCrossingTime ?a.neqct),
    (?b.dply gio:hasNominalEquatorialCrossingTime ?b.neqct),
    notEqual(?a.neqct, ?b.neqct)
    ->
    (?s gio:issueAdvisory giodata:DifferentNEQCTAdvisory)
]
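The DiffNEQCT rule reads: if a requested service takes two data selections whose source datasets come from sun-synchronous deployments with different nominal equatorial crossing times, issue the DifferentNEQCTAdvisory. A plain-Python restatement of that logic follows for readers less used to rule syntax; the class and function names are hypothetical, and the RequestedService wrapper is folded into the function call.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Deployment:
    name: str
    sun_synchronous: bool
    nominal_equatorial_crossing_time: Optional[str]  # e.g. "10:30"


@dataclass
class DataSelection:
    source_dataset: str
    deployment: Deployment


def different_neqct_advisory(a: DataSelection, b: DataSelection) -> Optional[str]:
    """Mirror of the DiffNEQCT rule: two inputs from sun-synchronous deployments
    with known but different crossing times trigger an advisory."""
    da, db = a.deployment, b.deployment
    if (da.sun_synchronous and db.sun_synchronous
            and da.nominal_equatorial_crossing_time is not None
            and db.nominal_equatorial_crossing_time is not None
            and da.nominal_equatorial_crossing_time != db.nominal_equatorial_crossing_time):
        return "DifferentNEQCTAdvisory"
    return None


terra = Deployment("Terra", True, "10:30")
aqua = Deployment("Aqua", True, "13:30")
advice = different_neqct_advisory(DataSelection("MODIS Terra AOD", terra),
                                  DataSelection("MODIS Aqua AOD", aqua))
same = different_neqct_advisory(DataSelection("MODIS Terra AOD", terra),
                                DataSelection("MISR AOD", terra))
print(advice, same)
```

Expressing the check as a declarative rule rather than code like this lets new advisories be added to the knowledge base without changing the service.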


Advisor Knowledge Base

Advisor rules test for potential anomalies and create associations between service metadata and anomaly metadata in the Advisor KB.


Assisting in Assessment

Data Discovery → Assessment → Access → Manipulation → Visualization → Analyze → Re-Assessment

Integration Planning: Precision Requirements, Quality Assessment Requirements, Intended Use
Integration: Reformat, Re-project, Filtering, Subset / Constrain
Added to assist in assessment: MDSA Advisory Report; Provenance & Lineage Visualization


Thus - Multi-Sensor Data Synergy Advisor

• Assemble semantic knowledge base
  – Giovanni Service Selections
  – Data Source Provenance (external provenance - low detail)
  – Giovanni Planned Operations (what service intends to do)
• Analyze service plan
  – Are we integrating/comparing/synthesizing?
    • Are similar dimensions in data sources semantically comparable? (semantic diff)
    • How comparable? (semantic distance)
  – What data usage caveats exist for data sources?
• Advise regarding general fitness-for-use and data-usage caveats
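The semantic diff and semantic distance steps above can be sketched as a dimension-by-dimension comparison. This assumes each data source has already been reduced to a flat dictionary of dimension values, a big simplification of the ontology-backed knowledge base; the keys, values, and the fraction-of-mismatches distance are hypothetical.

```python
# Hypothetical descriptions of two data sources' dimensions, as an advisor might
# assemble them from provenance metadata. Keys and values are illustrative.
source_a = {
    "parameter": "AerosolOpticalDepth",
    "wavelength_nm": 550,
    "spatial_resolution_deg": 1.0,
    "temporal_aggregation": "daily",
    "data_day_definition": "UTC",
}
source_b = {
    "parameter": "AerosolOpticalDepth",
    "wavelength_nm": 558,
    "spatial_resolution_deg": 0.5,
    "temporal_aggregation": "daily",
    "data_day_definition": "local",
}


def semantic_diff(a, b):
    """Per-dimension comparison: True if the two sources agree on that dimension."""
    keys = sorted(set(a) | set(b))
    return {k: a.get(k) == b.get(k) for k in keys}


def semantic_distance(diff):
    """Crude distance: fraction of compared dimensions that differ (0 = identical)."""
    if not diff:
        return 0.0
    return sum(1 for same in diff.values() if not same) / len(diff)


diff = semantic_diff(source_a, source_b)
for dimension, same in diff.items():
    print(f"{dimension:25s} {'same' if same else 'DIFFERENT'}")
print("distance:", semantic_distance(diff))
```

Exact equality is the crudest possible test; an ontology-backed comparison could, for example, treat 550 nm and 558 nm as close rather than simply different.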


Semantic Advisor Architecture

RPI


…. complexity


Presenting data quality to users

• Global or product-level quality information, e.g., consistency, completeness, etc., that can be presented in tabular form.
• Regional/seasonal. This is where we've tried various approaches:
  – maps with outlined regions, one map per sensor/parameter/season
  – scatter plots with error estimates, one per combination of Aeronet station, parameter, and season, with different colors representing different wavelengths, etc.


Advisor Presentation Requirements

• Present metadata that can affect fitness for use of result
• In comparison or integration of data sources
  – Make obvious which properties are comparable
  – Highlight differences (that affect comparability) where present
• Present descriptive text (and if possible visuals) for any data usage caveats highlighted by expert ruleset
• Presentation must be understandable by Earth Scientists!! Oh you laugh…


Advisory Report

• Tabular representation of the semantic equivalence of comparable data source and processing properties

• Advise of and describe potential data anomalies/bias


Advisory Report (Dimension Comparison Detail)


Advisory Report (Expert Advisories Detail)


Quality Comparison Table for Level-3 AOD (Global example)

| Quality Aspect | MODIS | MISR |
|---|---|---|
| Completeness | | |
| Total Time Range | Terra: 2/2/2000-present; Aqua: 7/2/2002-present | 2/2/2000-present |
| Local Revisit Time | Terra: 10:30 AM; Aqua: 1:30 PM | Terra: 10:30 AM |
| Revisit Time | Global coverage of entire earth in 1 day; coverage overlap near pole | Global coverage of entire earth in 9 days & coverage in 2 days in polar region |
| Swath Width | 2330 km | 380 km |
| Spectral AOD | AOD over ocean for 7 wavelengths (466, 553, 660, 860, 1240, 1640, 2120 nm); AOD over land for 4 wavelengths (466, 553, 660, 2120 nm) | AOD over land and ocean for 4 wavelengths (446, 558, 672, and 866 nm) |
| AOD Uncertainty or Expected Error (EE) | ±0.03 ± 5% (over ocean; QAC ≥ 1); ±0.05 ± 20% (over land; QAC = 3) | 63% fall within 0.05 or 20% of Aeronet AOD; 40% are within 0.03 or 10% |
| Successful Retrievals | 15% of time | 15% of time (slightly more, because retrievals are also made over the glint region) |
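The MODIS expected-error entries can be turned into a simple check against collocated reference data. This sketch reads "±0.03 ± 5%" as an absolute term plus a percentage of the reference AOD, which is the usual way MODIS expected error is stated but is an interpretation here; the collocations are made up. The MISR column's "within 0.05 or 20%" phrasing would be a different test, so the sketch mirrors only the MODIS entries.

```python
def within_expected_error(retrieved, reference, abs_term=0.03, rel_term=0.05):
    """True if the retrieval falls inside the envelope abs_term + rel_term * reference."""
    return abs(retrieved - reference) <= abs_term + rel_term * reference


def fraction_within_ee(pairs, abs_term=0.03, rel_term=0.05):
    """Fraction of (retrieved, reference) pairs inside the expected-error envelope."""
    if not pairs:
        return float("nan")
    hits = sum(within_expected_error(r, t, abs_term, rel_term) for r, t in pairs)
    return hits / len(pairs)


# Hypothetical collocations of satellite AOD with Aeronet AOD: (retrieved, reference).
collocations = [(0.10, 0.10), (0.20, 0.15), (0.43, 0.40), (0.70, 0.50)]
ocean = fraction_within_ee(collocations)                                # ±(0.03 + 5%)
land = fraction_within_ee(collocations, abs_term=0.05, rel_term=0.20)   # ±(0.05 + 20%)
print(ocean, land)
```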


What they really like!


Summary

• Quality is very hard to characterize; different groups will focus on different and inconsistent measures of quality
  – Modern ontology representations and reasoning to the rescue!
• Products with known quality (whether good or bad) are more valuable than products with unknown quality
  – Known quality helps you correctly assess fitness-for-use
• Harmonization of data quality is even more difficult than characterizing the quality of a single data product


Summary

• Advisory Report is not a replacement for proper analysis planning
  – But it benefits all user types by summarizing general fitness-for-use, integrability, and data usage caveat information
  – Science user feedback has been very positive
• Provenance trace dumps are difficult to read, especially for non-software engineers
  – Science user feedback: “Too much information in provenance lineage, I need a simplified abstraction/view”
• Transparency → Translucency: make the important stuff stand out


Current Work

• Advisor suggestions to correct for potential anomalies

• Views/abstractions of provenance based on specific user group requirements

• Continued iteration on visualization tools based on user requirements

• Present a comparability index / research techniques to quantify comparability
