Eurachem Dublin, 14-15 May 2018 Can we Quantify the Quality of the Data? Stefano Cappè Data Management Team Leader Evidence Management Unit European Food Safety Authority

Post on 31-Jul-2020

Eurachem Dublin, 14-15 May 2018

Can we Quantify the Quality of the Data?

Stefano Cappè

Data Management Team Leader

Evidence Management Unit

European Food Safety Authority

Page 2

Summary

Background

Engagement from the start

Quality of data

Quantification of data quality

Visualising data quality

Governance and collaboration with Member States

Conclusions

Page 3

Background: Founding regulation

Regulation (EC) No 178/2002, Article 33

Data collection: The MSs shall take the necessary measures to enable the data they collect in the fields of EFSA to be transmitted to the Authority

Data harmonisation: EFSA forwards to Member States and the EC appropriate recommendations which might improve the technical comparability of the data

Page 4

Background: Data collection

EFSA has no laboratories

The laboratories in the Member States are EFSA’s laboratories

Page 5

Background: Evidence management unit

Page 7

A priority from the start: Data harmonisation initiatives

2010 - Standard Sample Description ver. 1

2013 - Standard Sample Description ver. 2

Based on XML (eXtensible Markup Language)

Data format specified with XML schemas

Page 8

A priority from the start: Standard terminologies

Around 120 catalogues

Parameters (PARAM)

Matrices (FoodEx2)

Analytical methods (ANLYMD)

Standardised concepts

Unique identities

Language independent

Reusable

Around 200,000 term identities

Around 350,000 connected elements (e.g. CAS numbers, species Latin names)

Page 9

A priority from the start: Standardised validation rules

▪ Validation rules are implemented in XML language

Diagram labels: businessRule Set, name, Payload, businessRule, scientific specification, technical specification

Elements listed: businessRuleCode, description, infoMessage, infoType, status, lastUpdate, checkedDataElements

Page 10

A priority from the start: Standardised validation rules

▪ Business rules are implemented in XML language

Diagram labels: businessRule Set, name, Payload, businessRule, scientific specification, technical specification

Elements listed: includes, appliesTo, ignoreNull, forEach, transformation, condition, verify

Complex: cannot be converted into a table
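As an illustration of rules expressed in XML, the following minimal Python sketch parses one hypothetical businessRule element and reads the descriptive elements named on the slides. The nesting of the XML, the rule code BR_EXAMPLE and every value are assumptions made for illustration; they are not taken from EFSA's actual rule files.

```python
import xml.etree.ElementTree as ET

# Hypothetical rule document: element names come from the slides,
# but the nesting and all values are assumed for illustration only.
RULE_XML = """
<businessRule>
  <businessRuleCode>BR_EXAMPLE</businessRuleCode>
  <description>resLOD must not exceed resLOQ</description>
  <infoMessage>LOD is greater than LOQ</infoMessage>
  <infoType>error</infoType>
  <status>active</status>
  <lastUpdate>2018-01-01</lastUpdate>
  <checkedDataElements>resLOD resLOQ</checkedDataElements>
</businessRule>
"""

def parse_rule(xml_text):
    """Return the child elements of a businessRule as a plain dict."""
    root = ET.fromstring(xml_text)
    return {child.tag: (child.text or "").strip() for child in root}

rule = parse_rule(RULE_XML)
# Assumed convention: checked data elements are separated by whitespace.
checked = rule["checkedDataElements"].split()
print(rule["businessRuleCode"], rule["infoType"], checked)
```

Keeping the rule as data rather than code is what allows the same rule set to be shared with data providers, which is the point the slides make about standardisation.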

Page 11

A priority from the start: Acknowledgement message

Provide automatic feedback to data providers

Page 12

A priority from the start: Hierarchical rules

All data domains: Standard Sample Description ver. 1; General validation rules

Contaminants: Specific requirements for contaminants; Specific validation rules for contaminants

Pesticide residues: Guidance for pesticide residues; Specific validation rules for pesticide residues

Page 13

A priority from the start: data validation

Data Collection Framework (DCF) System: XML Parser, Terminology Management System, Validation Rules Engine, Data Quality Checks

Inputs: Analytical results (XML file), XML Schema (XML file), Validation rules (XML files), Terminologies (XML files)

Output: Acknowledgement (XML file) as feedback to sender
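The flow on this slide can be sketched roughly as follows. This is a toy Python model and not the DCF implementation: the catalogue contents, the single validation rule, the record layout and the acknowledgement format are all assumed, and the XML Schema validation step is left out because the Python standard library does not provide it.

```python
import xml.etree.ElementTree as ET

# Assumed toy catalogue; real terminologies hold many thousands of terms.
CATALOGUES = {"paramCode": {"LEAD", "CADMIUM"}}

def rule_lod_not_above_loq(rec):
    """Hypothetical validation rule: resLOD must not exceed resLOQ."""
    return float(rec["resLOD"]) <= float(rec["resLOQ"])

def validate(xml_text):
    """Parse results, check terminology and rules, return per-record feedback."""
    feedback = []
    for i, node in enumerate(ET.fromstring(xml_text).iter("result")):
        rec = {child.tag: child.text for child in node}
        errors = []
        for field, terms in CATALOGUES.items():
            if rec.get(field) not in terms:
                errors.append(f"unknown term in {field}")
        if not rule_lod_not_above_loq(rec):
            errors.append("resLOD greater than resLOQ")
        feedback.append((i, errors))
    return feedback

def acknowledgement(feedback):
    """Build a small XML acknowledgement summarising the feedback."""
    ack = ET.Element("acknowledgement")
    for i, errors in feedback:
        rec = ET.SubElement(ack, "record", index=str(i),
                            status="rejected" if errors else "accepted")
        for msg in errors:
            ET.SubElement(rec, "message").text = msg
    return ET.tostring(ack, encoding="unicode")

DATA = """<dataset>
  <result><paramCode>LEAD</paramCode><resLOD>0.01</resLOD><resLOQ>0.03</resLOQ></result>
  <result><paramCode>XYZ</paramCode><resLOD>0.05</resLOD><resLOQ>0.02</resLOQ></result>
</dataset>"""

feedback = validate(DATA)
ack_xml = acknowledgement(feedback)
print(ack_xml)
```

In this sketch the first record is accepted and the second is rejected with two messages, which mirrors the idea of returning automatic, record-level feedback to the sender.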

Page 14

A priority from the start: Process standardisation...

Consumption

Consumption data

• Comprehensive Food Consumption

• EUMenu

Chemicals

SSD ver. 1

• Contaminant concentration

• Pesticide residues

• Additive concentration

SSD ver. 2

• Chemical contaminants

• Pesticide residues

• Veterinary Medicinal Products

Zoonoses

Prevalence

Antimicrobial resistance

Food borne outbreaks

Animal disease

Animal population

TSEs

√ Ad-hoc data collections

Standardised operating procedures

√ Centralised data collections

Centralised data management and governance

Page 15

A priority from the beginning… not only for EFSA

Standardisation of data reporting applies across different domains and the entire data collection chain

EFSA

National competent authorities

Local authorities

Laboratories

Country data

Page 17

Quality of data: More data quality, please!

▪ Workload intensive

▪ All data reporting levels are impacted for:

▪ Resources

▪ Costs

▪ Data users want more data quality

Can we actually use the data?

Page 18

Quality of data

Page 19

Quality of data: A definition

Data quality is the extent to which data are fit for their intended use

(in line with the general definition of quality as set in the standard ISO 9000: “degree to which a set of inherent characteristics of an object fulfils requirements”)

Page 20

Going to the next level: Data Quality Management Framework

Page 21

Going to the next level: Data Quality Virtuous Cycle

1. DEFINE use cases and requirements for data quality (Data Quality Objectives)

2. MEASURE quality of data by Key Performance Indicators

3. ANALYSE quality assessment outcomes

4. IMPROVE by taking corrective actions

This also matches the PDCA (Plan, Do, Check, Act) cycle of the more general quality assurance process (ISO 9001, 2016).

Data Quality Management Framework

Page 23

Quantification of data quality: DQ dimensions

▪ Data quality dimensions: a classification system of positive features often desired in the data, used to help specify data quality objectives and to aggregate, report and compare data quality analyses

▪ Classification system: DQ dimensions classify features of data as food classes classify food (e.g. cereal and cereal products, vegetables, dairy products)

▪ Positive: dimensions are standardised to describe positive features (mainly to avoid confusion)

▪ Often: they do not all need to be present at all times

Page 24

Quantification of data quality: DQ dimensions

Validity

• Are data elements consistent with their format, type and range?

• Are constraints respected?

Uniqueness

• Are the records present only once in the database?

• Are unique database identifiers available?

Timeliness

• Are data available when needed?

• Are data up to date for their uses?
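One way to turn the validity questions into a number is to count the records whose elements respect type and range constraints. The Python sketch below is only an illustration under stated assumptions: the field names resVal and resUnit, the list of allowed units and the sample records are hypothetical.

```python
ALLOWED_UNITS = {"mg/kg", "ug/kg"}  # assumed list of permitted units

def is_valid(record):
    """True if the result value is a non-negative number and the unit is allowed."""
    try:
        value = float(record["resVal"])
    except (KeyError, TypeError, ValueError):
        return False
    return value >= 0 and record.get("resUnit") in ALLOWED_UNITS

def validity(records):
    """Proportion of valid records (0.0 for an empty dataset)."""
    if not records:
        return 0.0
    return sum(is_valid(r) for r in records) / len(records)

records = [
    {"resVal": "0.12", "resUnit": "mg/kg"},
    {"resVal": "abc", "resUnit": "mg/kg"},   # wrong type
    {"resVal": "-1", "resUnit": "mg/kg"},    # out of range
    {"resVal": "3.4", "resUnit": "ug/kg"},
]
score = validity(records)
print(score)
```

Two of the four sample records pass both checks, so the sketch reports a validity of 0.5.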

Page 25

Quantification of data quality: DQ dimensions

Accuracy

• Do data elements correctly represent the real world from which they are extracted?

• Are the data plausible?

Completeness

• Is information reported in the data elements comprehensive?

• Are valuable data elements missing?

Consistency

• Are different data elements providing non-conflicting details for a specific piece of information?

Page 26

Quantification of data quality: DQ use cases and objectives

▪ Data use cases: Define the main uses for which data are collected and managed. For example:

▪ Risk assessment, exposure assessment

▪ Risk management

▪ Data quality objectives: Requirements that data users expect in the data, in order to fulfil their uses

Page 27

Quantification of data quality: DQ KPIs and thresholds

▪ Data quality Key Performance Indicators (KPIs):

▪ Indicators measuring the level of fulfilment of a data quality objective

▪ Data quality thresholds:

▪ Levels that the DQ KPIs must achieve in order to consider the DQ objective fulfilled

Page 28

Quantification of data quality: general overview

DQ use case 1

DQ objective 1: DQ KPI 1, DQ KPI 2

DQ objective 2: DQ KPI 3

DQ dimension 1, DQ dimension 2
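The relationship between use cases, objectives, KPIs, dimensions and thresholds could be modelled as a small data structure. The Python sketch below is an assumption-based illustration: the class names, the KPI values, the thresholds, the assignment of KPIs to dimensions and the rule that an objective is fulfilled only when all of its KPIs reach their thresholds are not taken from the slides.

```python
from dataclasses import dataclass, field

@dataclass
class KPI:
    name: str
    dimension: str
    value: float       # measured proportion between 0 and 1 (made-up numbers below)
    threshold: float   # assumed minimum level for the KPI to count as met

    def met(self):
        return self.value >= self.threshold

@dataclass
class Objective:
    name: str
    kpis: list = field(default_factory=list)

    def fulfilled(self):
        """Assumed rule: fulfilled when all KPIs reach their thresholds."""
        return all(k.met() for k in self.kpis)

@dataclass
class UseCase:
    name: str
    objectives: list = field(default_factory=list)

use_case = UseCase("DQ use case 1", [
    Objective("DQ objective 1", [
        KPI("DQ KPI 1", "DQ dimension 1", 0.97, 0.95),
        KPI("DQ KPI 2", "DQ dimension 1", 0.80, 0.90),
    ]),
    Objective("DQ objective 2", [KPI("DQ KPI 3", "DQ dimension 2", 0.99, 0.95)]),
])
status = {o.name: o.fulfilled() for o in use_case.objectives}
print(status)
```

With these made-up numbers, objective 1 is not fulfilled because one of its KPIs falls below its threshold, while objective 2 is fulfilled.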

Page 29

Quantification of data quality: general principles

1. Keep it simple

2. Focus on incoming data: Data Quality KPIs are implemented and calculated on incoming data as submitted by data providers. Data quality at entrance.

3. Only the most relevant dimensions

4. Minimise KPIs: potentially one per objective

5. Avoid complex formulas for KPIs: use only proportions

6. Agree data quality objectives and KPIs with data providers
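Principle 5 means that each KPI can be written as a simple proportion. As a sketch, let $N$ be the number of incoming records and $n_{\text{ok}}$ the number of those records that meet the objective; both symbols, and the threshold symbol $t$, are introduced here for illustration and do not appear on the slides.

```latex
\[
\mathrm{KPI} = \frac{n_{\text{ok}}}{N}, \qquad 0 \le \mathrm{KPI} \le 1,
\qquad \text{objective fulfilled if } \mathrm{KPI} \ge t .
\]
```

For example, if 930 of 1000 submitted records meet the objective, the KPI is 0.93, which would fall short of an agreed threshold of 0.95.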

Page 30

An example: chemical contaminants

Page 31

Quantification of data quality: define data quality objectives/KPIs

Columns: DQ use case, DQ objectives, DQ KPI

DQ use case: Risk assessment/exposure assessment

DQO_CHEM_01: Timely availability of the data for analysis to risk assessors and risk managers

KPI_CHEM_01: Proportion of data records in “SUBMITTED” status by data collection deadline

KPI_CHEM_02: Proportion of data records confirmed by data providers within one month (calendar) from data collection deadline

DQO_CHEM_03: No duplication of records

KPI_CHEM_03: Proportion of data records not duplicated in a data collection

DQO_CHEM_06: No mistakes for relevant numerical values (e.g. VAL, LOD or LOQ)

KPI_CHEM_04: Proportion of records containing the correct numerical value for the analytical result (i.e. resVal)

KPI_CHEM_05: Proportion of records containing the correct limit of detection for the analytical result (i.e. resLOD)

KPI_CHEM_06: Proportion of records containing the correct limit of quantification for the analytical result (i.e. resLOQ)
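Two of these KPIs can be sketched in a few lines of Python. The record layout, the field names, the deadline and the reading of "not duplicated" as "identifier occurs exactly once" are assumptions for illustration; the slides do not specify how the KPIs are computed.

```python
from datetime import date

DEADLINE = date(2018, 10, 1)  # assumed data collection deadline

# Hypothetical records: field names are illustrative, not the SSD data model.
records = [
    {"id": "A1", "status": "SUBMITTED", "status_date": date(2018, 9, 20)},
    {"id": "A2", "status": "SUBMITTED", "status_date": date(2018, 10, 5)},
    {"id": "A3", "status": "DRAFT", "status_date": date(2018, 9, 1)},
    {"id": "A1", "status": "SUBMITTED", "status_date": date(2018, 9, 20)},
]

def kpi_chem_01(recs, deadline):
    """Proportion of records in SUBMITTED status by the deadline."""
    on_time = [r for r in recs
               if r["status"] == "SUBMITTED" and r["status_date"] <= deadline]
    return len(on_time) / len(recs)

def kpi_chem_03(recs):
    """Proportion of records not duplicated: records whose id occurs once."""
    counts = {}
    for r in recs:
        counts[r["id"]] = counts.get(r["id"], 0) + 1
    return sum(1 for r in recs if counts[r["id"]] == 1) / len(recs)

k1 = kpi_chem_01(records, DEADLINE)
k3 = kpi_chem_03(records)
print(k1, k3)
```

Both functions return plain proportions, so they can be compared directly with agreed thresholds.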

Page 33

Visualising data quality: contaminants DQ dimensions

Page 34

Visualising data quality: Contaminants DQ Objectives

[Figure: scores on an axis from 0.2 to 1 for DQO_CHEM_01, DQO_CHEM_03, DQO_CHEM_06, DQO_CHEM_07, DQO_CHEM_08 and DQO_CHEM_10]

Page 35

Visualising data quality: Contaminants DQ KPIs

Page 36

Visualising data quality: Contaminants Drill down completeness

Terms Score

Foodex 1.00

Analytical Method 1.00

Country of Sampling 1.00

Country of Origin 0.84

Sampling Method 0.93

Sampling Point 0.94

Program Type 0.99

Sampling Strategy 0.91

Reported LOQ 0.91

Product Treatment 0.67

Completeness: generic terms

Not Reported, Not Available, Unknown, ...
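A completeness score of this kind could be computed by treating generic terms as if the value were missing. The Python sketch below assumes exactly that; the slides do not state the formula behind the scores, and the sample values are invented.

```python
# Generic terms named on the slide; treating them as missing is an assumption.
GENERIC_TERMS = {"Not Reported", "Not Available", "Unknown"}

def completeness(values):
    """Proportion of values that are present and not a generic term."""
    if not values:
        return 0.0
    informative = [v for v in values if v and v not in GENERIC_TERMS]
    return len(informative) / len(values)

# Hypothetical column of 'Product Treatment' values for six records.
product_treatment = ["Raw", "Unknown", "Cooked", "Not Reported", "Raw", None]
score = round(completeness(product_treatment), 2)
print(score)
```

Three of the six sample values are informative, so the sketch reports 0.5 for this column.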

Page 37

Visualising data quality: contaminants trends

Trend in Data Quality since 2013

Validity

Timeliness (submission)

Analytical method

LOQ reported

Page 38

Visualising data quality: DQ actions

◼ Timeliness

Streamline communication: clear rules must be defined and disseminated by EFSA on how to send data, by when, and how to perform data confirmation

◼ Completeness

Re-discuss the inclusion of generic terms in the catalogues related to the data elements highlighted:

Mandatory + generic term (e.g. “Unspecified”) = Optional

◼ Validity

Promote automation of transmissions to reduce non-standard formats (20%)

Page 39

Summary

Background

Engagement from the start

Quality of data

Quantification of data quality

Visualising data quality

Governance and collaboration with Member States

Benefits, limitations and lessons learnt

Page 40

Governance and collaboration with MSs

Select action

Assign resources

Implement

Data Quality Governance

Data Providers

EFSA Data analysts / EFSA Data stewards / EFSA Data managers

European Commission

Scientific Network

Page 41

Governance and collaboration with MSs

▪ Support definition of DQ Objectives and KPIs

▪ Agree actions for data quality improvement

▪ Coordinate data stewardship activities to improve data quality, and cross domain issues

▪ Decide priorities to invest available resources for data stewardship and data management automation

Pilot of a Framework Partnership Agreement (FPA) on Data Quality

Framework Partnership Agreement on Data Quality

Page 42

Governance and collaboration with Member States

1. Data Governance and Coordination

2. System enhancements

3. Data stewardship

Framework Partnership Agreement on Data Quality: Objectives

Page 43

Governance and collaboration with Member States

▪ Pilot

▪ Five countries (Cyprus, Denmark, France, Germany, Slovakia) selected by geographic distribution and size

▪ Essential to negotiate data quality objectives and KPIs

▪ Additional network members were involved in the discussions

Framework Partnership Agreement on Data Quality

Page 44

Governance and collaboration with MSs: Data governance pyramid

AF

National Data Coordinators

Scientific Networks

Operational

Tactical

Strategic

PILOT

Define DQO, select KPIs, propose actions

Broader aspects related to data models, cross-domain aspects, ...

Specify needs on a longer scale, allocate resources by priorities

Framework Partnership Agreement on Data Quality

?

Page 46

Benefits

▪ The effort for improving data quality is connected to actual value (DQ use cases) for the data users

▪ The need for certain data quality improvement actions becomes evident through the process of defining DQ Objectives and KPIs (e.g. adding a new business rule)

▪ Make evident the cost of data quality, define limits

▪ Provide governance of the process, where actions and responsibilities are shared with data providers (e.g. Member States)

Page 47

Constraints

▪ DQ Framework applied only to incoming data and not to the entire EFSA Scientific Data Warehouse

▪ DQ Framework applied only to data passing the validation phase. The framework should be extended to the entire set of applied validation rules

▪ Only National Competent Authorities involved so far: particularly from FPA pilot. Reactions from other data providers must be investigated

Page 48

Lessons learnt

▪ Engagement of data providers is essential

▪ Start small (few dimensions, objectives and KPIs)

▪ Keep It Simple approach for KPIs

▪ Low score in KPIs is an effect and not a cause: always explain it to Data Providers

▪ Be ready to clean the house on recipient side:

▪ Clarify Service Level Agreements on receiver side

Page 49

Acknowledgements

▪ Valentina Bocca: Data Quality/Data validation

▪ Alessandro Carletti: Data Quality/Data Quality Framework

Page 50

Any questions?

Page 51

References

Page 52

References: Standard Sample Description (SSD)

▪ SSD ver 1.0

▪ European Food Safety Authority; Standard sample description for food and feed. EFSA Journal 2010;8(1):1457 [54 pp.].doi:10.2903/j.efsa.2010.1457.

▪ SSD ver 2.0

▪ EFSA (European Food Safety Authority), 2013. Standard Sample Description ver. 2.0. EFSA Journal 2013;11(10):3424, 114 pp., doi:10.2903/j.efsa.2013.3424

Page 53

References: Contaminants Requirements

▪ For SSD1 data format:

▪ EFSA (European Food Safety Authority), 2017. Specific reporting requirements for contaminants and food additives occurrence data submission. EFSA supporting publication 2017:EN-1262. 27 pp. doi:10.2903/sp.efsa.2017.EN-126

▪ For SSD2 data format:

▪ EFSA (European Food Safety Authority), 2017. Specific reporting requirements for contaminants and food additives occurrence data submission in SSD2. EFSA supporting publication 2017:EN-1261. 43 pp. doi:10.2903/sp.efsa.2017.EN-1261

Page 54

References: Pesticide residues requirements

▪ Yearly updated guidance, publication 2017

▪ For SSD1 data format:

▪ EFSA (European Food Safety Authority), Brancato A, Brocca D, Erdos Z, Ferreira L, Greco L, Jarrah S, Leuschner R, Lythgo C, Medina P, Miron I, Nougadere A, Pedersen R, Reich H, Santos M, Stanek A, Tarazona J, Theobald A and Villamar-Bouza L, 2017. Guidance for reporting data on pesticide residues in food and feed according to Regulation (EC) No 396/2005 (2016 data collection). EFSA Journal 2017;15(5):4792, 48 pp. https://doi.org/10.2903/j.efsa.2017.4792

Page 55

References: Veterinary Med. Prod. reporting requirements

▪ Quick start reporting guide ver 2

▪ https://zenodo.org/record/1204115#.WvBfby5ubcv

Page 56

Contaminants KPIs (1)

Columns: Data Quality Objective, Description

DQO_CHEM_01 Timely availability of the data for analysis to risk assessors and risk managers. Late or last-minute data transmissions delay the availability of data to support risk assessment and management processes. In addition, they increase the risk of not identifying and fixing possible data quality issues.

DQO_CHEM_02 Timely availability of data updates. These requests are part of broader workflows for the use of data. The time for answering influences the entire process

DQO_CHEM_03 No duplication of records. Finding, resolving and reducing the incidence of duplicated records in delivered datasets by data providers. Duplicated records impede the suitability of the data for immediate use and, if not identified, put at risk the value of the analysis.

DQO_CHEM_05 Completeness of the dataset with respect to the planned and performed analyses and inclusion of all results. Omitting results from the original plan reduces the representativeness of the data and not reporting some results seriously biases the statistics on the occurrence of the hazards and consequently the exposure estimate.

DQO_CHEM_06 No mistakes for relevant numerical values (e.g. VAL, LOD or LOQ) or in the associated unit of measurement. Finding, resolving and reducing the incidence in records of numerical errors in delivered datasets by data providers. Records with numerical errors impede the suitability of the data for immediate use and, if not identified, put at risk the value of the analysis.

Page 57

Contaminants KPIs (2)

Columns: Data Quality Objective, Description

DQO_CHEM_07 Detailed and consistent identification of the analysed matrix (e.g. food, feed) coded according to the relevant food classification system. A detailed identification of the matrix allows a finer granularity of the data analysis, thus improving the assessment. A detailed and correct use of the matrix catalogue expedites the direct use of the collected data for analysis and reduces the time needed for manual data cleansing when preparing the data for risk assessment.

DQO_CHEM_08 Precise and consistent identification of the Parameter (analyte) not using generic browsing terms. The Parameter catalogue is a multi-level catalogue and the aggregated terms are useful for navigating the hierarchy, but only the use of detailed terms allows a precise data analysis.

DQO_CHEM_09 Pertinence and correctness of the expression of the result for the parameter-matrix combinations. The matrix reported must be relevant for the parameter analysed, and the result must be given in the correct expression of the result.

DQO_CHEM_10 No incomplete data for mandatory and recommended elements. Different catalogues also include generic descriptors (e.g. other, unspecified, not in list), but the use of such descriptors seriously reduces the usability of data.

DQO_CHEM_11 No data provided outside the agreed standard format (SSD1 or SSD2). Chemical contaminants data are often provided in a non-standard format that can cause various transcription and processing issues.