Report of the Workshop on Integrated DATRAS Products (WKIDP)


ICES WKIDP REPORT 2014 SCICOM STEERING GROUP ON ECOSYSTEM SURVEYS SCIENCE AND TECHNOLOGY

SSGESST:17

REF. ACOM, DIG, SCICOM

Report of the Workshop on Integrated DATRAS Products

(WKIDP)

7–9 October 2014

ICES Headquarters, Copenhagen


International Council for the Exploration of the Sea Conseil International pour l’Exploration de la Mer

H. C. Andersens Boulevard 44–46 DK-1553 Copenhagen V Denmark Telephone (+45) 33 38 67 00 Telefax (+45) 33 93 42 15 www.ices.dk [email protected]

Recommended format for purposes of citation:

ICES. 2014. Report of the Workshop on Integrated DATRAS Products (WKIDP), 7–9 October 2014, ICES Headquarters, Copenhagen. SSGESST:17. 70 pp.

For permission to reproduce material from this publication, please apply to the General Secretary.

The document is a report of an Expert Group under the auspices of the International Council for the Exploration of the Sea and does not necessarily represent the views of the Council.

© 2014 International Council for the Exploration of the Sea


Contents

Executive Summary

1 Terms of Reference

2 Current use of DATRAS (ToR a)
  2.1 Contact to and feedbacks from DATRAS users
  2.2 Users’ data requests and downloads from web site
    2.2.1 Use of existing products and downloading activities
    2.2.2 Addressing specific requests sent to Data Centre
  2.3 Recent features in DATRAS that are worth being aware of
  2.4 Conclusions for ToR a)
    2.4.1 Different types of DATRAS users
    2.4.2 Different types of needs for improved DATRAS products

3 Type of needs 1: Improved quality of and access to raw data
  3.1 Dealing with missing data and updates
    3.1.1 Problem description
    3.1.2 Potential future developments
  3.2 Publication of other parameters involved in calculations
    3.2.1 Problem description
    3.2.2 Future work

4 Type of needs 2: Improved data mining, exploration and mapping of raw data
  4.1 What are the needs?
  4.2 Data mining tools on DATRAS homepage
    4.2.1 Better documentation of the data submission
    4.2.2 The “Flex File”, a simpler version of the exchange data, easier to explore
    4.2.3 DATRAS web services
    4.2.4 ICES Geoportal
  4.3 Data mining tools publicly available on web outside of DATRAS website
    4.3.1 DTU DATRAS R package (DTU Aqua)
    4.3.2 RICES R package based on DATRAS web services (MRI)
    4.3.3 Published surveys and population indicators (IFREMER)
    4.3.4 ERDDAP visualization interface (Marine Institute)
    4.3.5 D. Beare DATRAS R package
  4.4 What is not publicly available but could be useful to users

5 Type 3: Application of existing products to new outputs (e.g. to new species, new quarters, new areas, new surveys etc)
  5.1 Overview of existing products by survey
  5.2 Applying existing products to new surveys: ongoing work for the Beam trawl Surveys (WGBEAM)
    5.2.1 Standard products from DATRAS input
    5.2.2 Results from the BTS index
    5.2.3 Internal WGBEAM products for survey summary

6 Type of needs 4: Changes in estimation or calculation procedures of existing products
  6.1 Description of the issues
  6.2 An example of how to tackle such issues – WGBIFS changes in ALK and maturity estimation procedures
  6.3 Ways forward

7 Type of needs 5: Other new products
  7.1 New products needed for single stock assessments
    7.1.1 Estimation of swept area
    7.1.2 Confidence intervals for indices
    7.1.3 Mean weight at age and maturity ogive
    7.1.4 Abundance and biomass indices for Data Limited Stocks (DLS category 3)
  7.2 Products needed for integrated ecosystem assessments and MSFD
    7.2.1 What
    7.2.2 Why
    7.2.3 When?
    7.2.4 How?
    7.2.5 Who?

8 Conclusions
  8.1 Synopsis
  8.2 Summary of suggested tasks to ICES Data Centre and surveys WG
  8.3 References

Annex 1: List of participants

Annex 2: Recommendations

Annex 3: Protocol for new data services (ICES, 2011)

Annex 4: Filled templates for new DATRAS products

Annex 5: Flex File structure draft


Executive Summary

The Workshop on Integrated DATRAS Products (WKIDP), chaired by Clara Ulrich (Denmark), met at ICES Headquarters, Copenhagen, Denmark, 7–10 October 2014 to:

a) Create a list of working groups using DATRAS data on different aggregation levels (e.g. by species over several surveys);

b) Create detailed descriptions for the output/products needed, e.g. CPUE per length per haul, ALK files, etc.;

c) Investigate the possibility to create those products;

d) Develop an implementation plan for the creation of those products and the metadata, based on WG priorities and the time schedule of the ICES Data Centre.

The workshop gathered providers of DATRAS data involved in some surveys Working Groups, as well as users of these data.

A review of the current uses of and requests to DATRAS (ToR a) highlighted frequent and diverse activity. Beyond standard analytical stock assessment using age-based survey indices, DATRAS data and products are also used for giving advice on Data Limited Stocks (DLS), for many research projects and publications, and, increasingly, for integrated ecosystem assessment and for MSFD (Marine Strategy Framework Directive) related queries. User profiles also appeared quite diverse, with some users requesting only access to raw data in order to develop and apply their own methods, and others requesting more standardised and documented ready-to-use products.

On this basis, WKIDP collected a suite of ideas and suggestions for further developments and improvements of the DATRAS services. These ideas have been classified into five different groups, depending on the type of needs that they address. The first two groups deal with raw data (improved submission process and improved exploration tools). The next two groups deal with existing computed products (extending their use or changing their computing methodology). Finally, the last group deals with new products not yet available from the DATRAS home page. The five types of needs are addressed individually (ToR b and c). For each of these, the report 1) describes the issue, 2) reviews what is currently being done about it and 3) if necessary, sets up an action plan for future work.

Addressing the second group of needs, it became obvious that many useful generic tools for the exploration of raw exchange data already exist, and WKIDP was a great opportunity to share knowledge and documentation about these.

Finally, the needs for new products included aspects of both single-stock assessment (estimation of weight and maturity at age, abundance and biomass indices for the DLS) and ecosystem assessment (standard data and calculations for selected MSFD GES indicators).

A general recommendation from the group is to enhance communication and collaboration between the data end-users, the ICES Data Centre and the ICES Survey WGs.


1 Terms of Reference

The Workshop on Integrated DATRAS Products (WKIDP), chaired by Clara Ulrich* (Denmark), will meet at ICES Headquarters, Copenhagen, Denmark, 7–10 October 2014 to:

a) Create a list of working groups using DATRAS data on different aggregation levels (e.g. by species over several surveys);

b) Create detailed descriptions for the output/products needed, e.g. CPUE per length per haul, ALK files, etc.;

c) Investigate the possibility to create those products;

d) Develop an implementation plan for the creation of those products and the metadata, based on WG priorities and the time schedule of the ICES Data Centre.

WKIDP will report by 1 November 2014 to the attention of DIG, ACOM and SCICOM.


2 Current use of DATRAS (ToR a)

2.1 Contact to and feedbacks from DATRAS users

Ahead of the workshop, the Chair and the Secretariat circulated information about the workshop to the list of WG chairs and stock coordinators, and asked for feedback and input regarding this ToR. The topic was also brought to the attention of SCICOM and ACOM during the 2014 Annual Science Conference.

However, direct feedback from those groups has been limited to very few people. Indirect or informal feedback has been provided by a few others. WKIDP did not have the time or the resources to go through all WG reports to address ToR a) in detail and establish a full list of DATRAS uses. Rather, WKIDP built on the experience of the experts attending the meeting to provide a general overview of the various uses of DATRAS data within ICES WGs. The most common data uses include:

• Standard indices calculations by year and stock area for analytical single-stock assessments;

• Mapping of spatial distributions;

• Stock weights at age and maturity ogives;

• Abundance and/or biomass estimates for Data Limited Stocks (DLS) category 3.

Some of those uses are directly provided through existing DATRAS products; others are computed by the relevant experts in the WG, based on DATRAS raw exchange data.

Additionally, the range of topics addressed by ICES is constantly broadening, especially regarding the wider ecosystem considerations (Integrated Ecosystem Assessment, Marine Strategy, etc.). Consequently, new uses of DATRAS data within the ICES community are emerging, but those uses are largely based on exchange data rather than on existing products.

Finally, DATRAS exchange data are a very valuable source of information for scientific analyses. On Google Scholar, more than 220 records contain a reference to ICES DATRAS, many of those being peer-reviewed publications. WKIDP notes furthermore that ICES WGs may not always refer properly to DATRAS in their reports, making it difficult to assess their exact dependencies on DATRAS products using standard queries.

2.2 Users’ data requests and downloads from web site

To complete this broad overview, the ICES Data Centre summarised the activities from the DATRAS website and the type and frequency of ad-hoc data requests it receives.

2.2.1 Use of existing products and downloading activities

There are three main categories of data product available on the download page:

1) Exchange data: raw survey data, as provided by the data submitters, without any calculations or aggregation procedures applied. This is the most downloaded product in the ICES community. Users:

• Data submitters: review the uploaded data as part of the quality control procedure.

• Data scientists: use this type of data directly in their model/application/programme.

2) Length- and age-based aggregated data: Catch Per Unit Effort by selected criteria. These data products are useful for their easy-to-understand format and standardized catch figures. Users:

• Data scientists: data products with different levels of aggregation save time for stock-assessment scientists and provide basic data for inter-survey or inter-area comparison.

• Students: aggregated data products provide a good tool for research projects.

3) Indices and bootstrap: survey-specific data products. Users:

• Data scientists: survey working groups and stock assessment groups are the main users of this product, as it is standardised and the routines are agreed by individual survey working groups, based on the sampling design and the stock assessment needs; calculation documentation is also available in the documents section of the DATRAS homepage.
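To make the relation between the first two product categories concrete, the following minimal Python sketch derives a CPUE-per-length-per-haul table from raw length records. It is only an illustration under stated assumptions: CPUE is assumed to be expressed as numbers caught per hour of hauling, and the record layout (haul identifier, haul duration in minutes, length class, count) is hypothetical and does not reproduce the DATRAS exchange format or the agreed calculation procedures.

```python
from collections import defaultdict


def cpue_per_length_per_haul(haul_duration_min, length_records):
    """Return {(haul_id, length_cm): numbers per hour}.

    haul_duration_min -- dict mapping haul_id to haul duration in minutes
                         (hypothetical layout, not the exchange format)
    length_records    -- iterable of (haul_id, length_cm, count) tuples
                         (hypothetical layout, not the exchange format)
    """
    # Sum counts for records sharing the same haul and length class.
    totals = defaultdict(float)
    for haul_id, length_cm, count in length_records:
        totals[(haul_id, length_cm)] += count

    # Standardise each total to one hour of hauling (assumed effort unit).
    cpue = {}
    for (haul_id, length_cm), number in totals.items():
        duration = haul_duration_min[haul_id]
        if duration <= 0:
            raise ValueError(f"haul {haul_id} has a non-positive duration")
        cpue[(haul_id, length_cm)] = number * 60.0 / duration
    return cpue


# Illustrative values only; they are not taken from any survey.
haul_duration_min = {"H1": 30, "H2": 60}
length_records = [("H1", 20, 5), ("H1", 20, 3), ("H1", 25, 2), ("H2", 20, 6)]
result = cpue_per_length_per_haul(haul_duration_min, length_records)
```

In this example the 30-minute haul H1 has its counts doubled to an hourly rate, while the 60-minute haul H2 is left unchanged, which is what makes hauls of different duration comparable.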

The following figures show the number of downloads by year for the various products, in decreasing order of download volume.

[Figures not reproduced: twelve bar charts of number of downloads per year (2009, 2010, 2012, 2013, 2014), one per product: Exchange Data; CPUE per length per haul; CPUE per length per area; Indices; SMALK; CPUE per length per subarea; CPUE per age per haul; ALK; CPUE per age per subarea; CPUE per age per area; Bootstrap Data; Range divide by median bootstrap.]

2.2.2 Addressing specific requests sent to Data Centre

Specific requests by working group members or data users are made by filling in the request form (http://ices.dk/marine-data/guidelines-and-policy/Pages/Requesting-data-from-ICES.aspx) and sending it to [email protected]. If the request needs expertise and advice from a wider audience, the ICES Data Centre asks the Data and Information Group (DIG) for help. As these ad hoc requests are not in the ICES Data Centre work plan, they are prioritised according to other ongoing work within that plan and the level of the user (see Section 2.2.2.2 on priorities).

2.2.2.1 Typical types of requests

DATRAS-related requests vary widely in type, and can be classified by requester, timing, or actions/results.

The typical requests can be classified as below.

By requester:

I. ICES Community
  I.I. Direct requests by EU commission or ACOM
  I.II. Expert groups requests
    I.II.I. Requests for standard data products (develop new or edit the existing)
    I.II.II. Requests for changes in quality checks or formats
    I.II.III. Ad-hoc requests for data products and data extractions
  I.III. New surveys in DATRAS
  I.IV. Data submitters

II. Scientists (in and outside of ICES community) interested in trawl survey data
  II.I. External quality assurance

III. Beginners (scientists and students)
  III.I. Information and guidelines

Timing:

• Regular (e.g. data calls for data submissions and calculations for Advisory assessments)

• Ad hoc (e.g. requests to produce a new data output or check data availability)

• Official (e.g. via the request form or WG recommendation)

• Unofficial (e.g. data submitter questioning the screening output results or a data user trying to find out what DATRAS data are)

Actions and results:

• Supply of information by the ICES Data Centre, giving the data user the knowledge they lacked on DATRAS-related matters.

• Implementation of new checks in the screening program, resulting in better data quality.

• Elimination of bugs from current products, resulting in better data quality.

• Production of new or updated data products, improving the quality of DATRAS data services and widening the circle of potential data users.

Typical examples of data requests are:

• Indices calculation based on a certain subarea

• Mean length of species based on selected criteria

• CPUE per area based on a selected subarea

• Data mining questions (e.g. relation of ICES StatRecs to survey areas/ICES areas)

• Quality and data-product description queries (e.g. CPUE calculation methods)

2.2.2.2 DATRAS users profiles and prioritisation in requests

Priorities in the execution of requests are generally defined by the workload based on the work plan, the size of the request, etc. In general, it is up to the ICES Data Centre to decide how the requests should be prioritized, and if needed, DIG can help with the prioritizing or request approval.

The approximate scheme of priorities:

• Priority 1. Requests by survey working groups about quality procedures and data warehouse issues.

• Priority 2. Requests by stock assessment groups, ACOM and EU commission concerning data stored in DATRAS or new calculation methods. Requests by data users regarding data mining or aggregation of data.

• Priority 3. Requests by survey working groups or ICES community users for development of new data products.

• Priority 4. Various ad hoc queries about DATRAS web, data format and understanding of data products.
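The approximate scheme above can be sketched as a simple lookup, for instance to order a queue of incoming requests. This is a hedged illustration only: the requester and topic labels are hypothetical names introduced here, and the sketch ignores the workload and request-size considerations that the Data Centre also weighs in practice.

```python
from dataclasses import dataclass

# Each rule: (priority level, requester labels, topic labels).
# All labels are hypothetical; the report describes the scheme only in prose.
PRIORITY_RULES = [
    (1, {"survey_wg"}, {"quality_procedure", "data_warehouse"}),
    (2, {"stock_assessment_wg", "acom", "eu_commission"},
     {"stored_data", "calculation_method"}),
    (2, {"data_user"}, {"data_mining", "aggregation"}),
    (3, {"survey_wg", "ices_community"}, {"new_product"}),
]
# Priority 4 covers the remaining ad hoc queries (web, formats, products).
DEFAULT_PRIORITY = 4


@dataclass(frozen=True)
class Request:
    requester: str
    topic: str


def priority(request):
    """Return the approximate priority level (1 = handled first)."""
    for level, requesters, topics in PRIORITY_RULES:
        if request.requester in requesters and request.topic in topics:
            return level
    return DEFAULT_PRIORITY


queue = [
    Request("student", "data_format"),
    Request("survey_wg", "new_product"),
    Request("survey_wg", "quality_procedure"),
    Request("acom", "calculation_method"),
]
# sorted() is stable, so requests of equal priority keep their arrival order.
ordered = sorted(queue, key=priority)
```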

2.2.2.3 Workload, time constraints and database requirements to address these

Many issues affect the development of a product. The most important is that successful requests require involvement from both sides. If the Data Centre needs detailed information, verification, etc., about the request, the faster the requester responds, the faster the work on the product will be done. Good communication is the key to fast and effective development of the requested product.

2.2.2.4 From ad hoc requests to generic products

Ad-hoc requests by ICES Working Groups are usually survey- and species-specific. They might be one-time requests, but they can generally be considered as a platform for the development of new data products. If an expert group requests a specific data extract from DATRAS, or a data product alternative to an existing one, and finds it useful, this might end up as one of the standard DATRAS data products. An example is the NS-IBTS extended cod indices, which WGNSSK has now requested to be published as a standard data product.

2.3 Recent features in DATRAS that are worth being aware of

DATRAS is developing constantly, evolving to improve the quality of services and to attract data users.

The main landing page in DATRAS features many direct links to improve direct access to DATRAS-related services and products.

The main features are:

1) Online data download per data product and survey.

2) Online data upload with built-in data quality checks.

3) DATRAS webservices: a brand new feature that aims to ease direct access to the data.

4) The DATRAS documents section stores all DATRAS-related documents, including manuals, user help documents, and calculation procedures.

5) The News and Updates section aims to inform DATRAS users about recent developments, quality changes, or recent request information.

6) The DATRAS exchange format page provides an overview of the fields within survey data records, with their description and units, and a direct link to the survey-specific list of codes.

7) Submission Status provides an overview of the latest data submissions per survey, year and quarter.

8) ISO 19115 metadata records for trawl surveys are stored at the ICES Geonetwork.

9) DATRAS data distribution maps can be viewed from the ICES Data Portal.

10) The ICES Data Policy specifies data use and data citation.

11) The ICES Reference codes vocabulary (vocab.ices.dk) provides the complete listing of ICES codes, including trawl surveys.

12) DATRAS screening now includes a listing per check ID, which improves the manageability of datasets during the upload process.

Please refer to the slides below for visual overview of DATRAS features.


DATRAS webservices (menu 3) are explained in Section 4.2.3 below.

DATRAS documents (menu 4) contain useful links to e.g. survey manuals, calculation procedures, and a FAQ that is updated regularly when new questions pop up.

News and updates (menu 5) document the history of changes and major events involv-ing DATRAS.

Screenshots of the menus 6 to 10, with some explanations, are given below.


2.4 Conclusions for ToR a)

2.4.1 Different types of DATRAS users

The previous paragraphs illustrate that the community of DATRAS users is large and heterogeneous, and that it is difficult to frame a typical user profile. The high number of downloads of exchange data, together with the numerous requests sent to the ICES Data Centre, may indicate that many needs are not covered by the current range of calculated products on the DATRAS homepage (CPUE, indices, ALK), which leads users to develop their own code and estimation procedures. This can be interpreted in two ways: either those needs are so variable that it is best to leave it to end-users to cover them and to let DATRAS concentrate on providing high-quality raw data; or, on the contrary, the range of current DATRAS products is not sufficient and should be expanded to a broader range of generic products.


WKIDP reflected upon this question and noted that, based on participants’ own experience within ICES Working Groups, user profiles vary even within the same WG or the same research institute:

1) There are advanced users with knowledge of data manipulation and statistical procedures for the estimation of survey metrics, who prefer to keep their own control of computation processes from raw data.

2) On the other hand, there are less experienced users, or users who spend only a limited part of their time on assessment-related work, who prefer to use standardised input data within short time frames.

The first group does not necessarily need changes to the DATRAS homepage, especially after the implementation of DATRAS web services allowing for automated extraction of exchange data (see Section 4). WKIDP acknowledges, however, that some benefits could be gained by gathering this group of users around a common forum for documentation and sharing of their procedures, since different scientists might use different statistical methods for the same purpose (for example “filling” of missing values in ALK).

WKIDP was primarily intended to support the second group of users, on the assumption that providing a greater range of tools and products would reduce the number of data requests sent to the ICES Data Centre. This would also contribute to improved documentation and capacity building within the ICES Community.

2.4.2 Different types of needs for improved DATRAS products

Based on the above list of current DATRAS uses, WKIDP mapped a list of potential needs for further developments and improvements. These needs have been classified into three directions, depending on whether they deal with raw data, with existing computed products or with new computed products not yet available from the DATRAS home page. Two types of needs have been identified within each of the first two directions, leading to five different areas of work:

1) Improved quality of and access to raw data

2) Improved data mining, exploration and mapping of raw data

3) Application of existing products to new outputs (e.g. to new species, new quarters, new areas, new surveys etc)

4) Changes in estimation or calculation procedures of existing products

5) Other new products

These five types of needs are addressed individually in the next five sections (Sections 3 to 7; ToR b and c). For each of these, the report 1) describes the issue, 2) reviews what is currently being done about it and 3) if necessary, sets up an action plan for future work.

The needs that have been identified as requiring a specific action plan for development and implementation are described in the section narrative, and are supplemented by an action plan annex following a fixed template (ToR d). All of these are gathered into Annex 4.


3 Type of needs 1: Improved quality of and access to raw data

3.1 Dealing with missing data and updates

3.1.1 Problem description

Over the years, the suite of species uploaded in DATRAS has changed, at least for a number of surveys. As a result, it is not clear to data users whether a species is missing because the submission criteria changed or because the species was not present in the catch.

3.1.1.1 Data (re)submission

DATRAS should reflect the most current state of the data, so it is important to have a good overview of the last update. This can be found at

https://datras.ices.dk/Data_products/Submission_Status.aspx

Here, data users can see when the latest version of a data-series was submitted.

Resubmitting large amounts of data is time-consuming and therefore data-submitters do not always re-submit data to DATRAS when small changes are being made in the institute’s database.

3.1.1.2 Responsibilities related to data upload

a) Institutes are responsible for the quality of the uploaded data in DATRAS.

b) Survey working groups (i.e. WGBIFS, IBTSWG, WGBEAM) discuss and decide on the species lists that have to be uploaded, and check regularly if all institutes follow the guidelines.

c) Deadlines for data submission are set by the survey working groups, and influenced by assessment working groups working with the data.

d) A survey working group can decide that data have to be resubmitted when they are incomplete; the institute is responsible for the resubmission.

e) If data users (e.g. individual scientists, ICES working groups) suspect incomplete or incorrect data, they should contact the ICES Data Centre, which can check whether there is a problem with data understanding, a system error, or a data quality issue. If it is the latter, then the relevant institute or survey working group should be contacted.

3.1.2 Potential future developments

1 ) To facilitate contact between data-users and data-submitters, a list should be established, containing the survey acronym, the responsible survey working group, the country, and the contact person(s) for the country and

Page 21: Report of the Workshop on Integrated DATRAS Products (WKIDP)

ICES WKIDP REPORT 2014 | 15

survey, being the data-submitter(s). If needed, the relevant WG chair can be incorporated in mail discussions. The table can be produced based on the ICES Data Centre’s list of data sub-mitters, and will be automated. A link to the table could be incorporated in the downloads from the DATRAS webpage. An example of the table head-ings:

SURVEY ACRONYM

COUNTRY SURVEY WORKING GROUP

CONTACT PERSON

EMAIL

2 ) Making re-submission less time-consuming might be done by automating

resubmissions. In DATRAS only complete survey sets by year, quarter and ship can be submitted. As re-submitted information is usually largely iden-tical to the originally submitted information, there is no need to report on those values that the data-submitter has already agreed upon at first sub-mission. Basically, for re-submissions only records differing from the origi-nal records should be checked. If all changed data are within the limits, the resubmission should be accepted directly. Feedback on potential errors should only be given on the changed information. Potential solutions might be to develop partial upload (in progress), or to keep the history of the ‘click-ing result’.

3 ) In addition to the calculation date, it would be useful to add the date of the last submission to the Exchange file if partial update becomes available.
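
The idea in point 2, checking only the records that differ from the original submission, can be sketched as follows. This is a minimal illustration and not the DATRAS implementation; the record key (Year, Quarter, Ship, HaulNo) and the field names are assumptions chosen for the example.

```python
def changed_records(original, resubmitted, key_fields):
    """Return only the resubmitted records that are new or differ from the original."""
    def key(rec):
        return tuple(rec[f] for f in key_fields)

    original_by_key = {key(rec): rec for rec in original}
    changed = []
    for rec in resubmitted:
        previous = original_by_key.get(key(rec))
        if previous != rec:  # new record, or at least one field changed
            changed.append(rec)
    return changed


# Hypothetical haul records; field names and values are illustrative only.
original = [
    {"Year": 2013, "Quarter": 1, "Ship": "AAA", "HaulNo": 1, "Depth": 45},
    {"Year": 2013, "Quarter": 1, "Ship": "AAA", "HaulNo": 2, "Depth": 60},
]
resubmitted = [
    {"Year": 2013, "Quarter": 1, "Ship": "AAA", "HaulNo": 1, "Depth": 45},
    {"Year": 2013, "Quarter": 1, "Ship": "AAA", "HaulNo": 2, "Depth": 62},
]
to_check = changed_records(original, resubmitted, ["Year", "Quarter", "Ship", "HaulNo"])
```

Only the records in `to_check` would then need to go through range checks and submitter feedback; unchanged records keep the status they were given at first submission.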

3.2 Publication of other parameters involved in calculations

3.2.1 Problem description

Beyond the exchange data, a number of other parameters are involved in the calculations of indices and other products. These include, for example:

• Strata size (especially for surveys stratified by depth rather than by ICES rectangles)
• Conversion factors between ships or gears
• Sunrise-sunset functions
• Swept area calculations (based on gear characteristics); see also chapter 7.1.1 below
• List of all species with all WoRMS codes

The issue is that these parameters are usually not published, making it difficult for users to reproduce the estimates.

3.2.2 Future work

This issue has already been raised and is already being addressed. For example, gear conversion factors and area weights for the BITS survey are now published on the DATRAS website as CSV files, available at http://www.ices.dk/marine-data/data-portals/Pages/DATRAS-Docs.aspx

It is suggested to publish all these other parameters for all surveys in the same way.

WKIDP does not consider it necessary to develop a specific interface for access to these parameters; the current interface seems sufficient for the time being.

4 Type of needs 2: Improved data mining, exploration and mapping of raw data

4.1 What are the needs?

ICES data policy states “ICES will be a leader in marine data and information management, providing best practices, data mobilization (i.e. making it easy to access and evaluate) and services for its advisory and science groups and the wider marine and maritime communities”. While the survey data held in DATRAS are now largely complete and secure, attention at WKIDP was turned to data services and “mobilization”.

Data mining, the process of exploring patterns in large data sets, is currently not well developed on the DATRAS homepage and must therefore largely be performed by individual users on other platforms (Excel, R, etc.). Increasingly numerous and diverse users are accessing the raw exchange format (see download trends in section 2). DATRAS has also grown over the years from a simple data repository targeted at survey working groups into a complex database with many features. WKIDP acknowledged that the comprehensive content of the DATRAS homepage and the data structure can present a steep learning curve for newcomers and casual users, who do not yet have the necessary understanding of what the database contains. Therefore, it is worth considering whether improved facilities for data access and exploration would add value.

Before the launch of the DATRAS web services in May 2014 (which are themselves not intended for direct use, but serve as an export facility for other software or platforms, see below), the only way to access raw data was to go to the DATRAS website, navigate to the download page and download each survey dataset individually in the exchange format.

It could be argued that such access to DATRAS is fine as it is, and sufficient for the purpose. These raw exchange format data are freely accessible, and all the information is available to users as to how the data were collected, which species are loaded, which time series there are and when they are carried out. In data mining/mobilization terms, however, many extraction and data manipulation steps are required before any conclusions can be reached about the data.

As an example, nine steps are required to download exchange data for the UK Beam Trawl survey in the eastern English Channel before one can begin to look at the data:

1 ) Navigate to the ICES Data Centre Portal and click on the DATRAS link (red circle)

2 ) Once on the DATRAS page, click the DOWNLOAD DATA PRODUCTS link.

3 ) This opens the download page. Then choose ‘Exchange data’ from the data products and select which record types are needed (HH, HL and/or CA; choosing these requires a basic understanding of the data structure).

4 ) Then choose the BTS survey from the survey list.

5 ) Then choose the correct quarter (one may need to go to the WGBEAM webpage or the WGBEAM manual page to get the relevant information on survey timing)

6 ) Then choose years.

7 ) Then choose the correct ship or ships (again one might need to look up that information as well)

8 ) Then read and accept the warning about the data use.

9 ) Finally, the exchange data are downloaded and can then be read in and explored in, for example, Excel or any other platform.

Additionally, survey data are multidimensional, and data visualisation will typically explore patterns with regard to time, space and/or size (age-length). Capturing the main patterns of the dataset becomes even more complex when multiple species, areas and survey dynamics are included, and this requires a fair amount of similar data filtering and manipulation before the main features of the data can be apprehended.

The suite of screenshots illustrates that, while the overall procedure is correct and now well known to regular ICES users, the current DATRAS homepage only allows extraction of raw data with some filtering functions. It does not provide many facilities for quick and easy exploration of the dataset, which must be done on other platforms. Therefore, any additional tool providing easy access, manipulation and visualisation of exchange data would potentially benefit many users, both within and outside the ICES community.

WKIDP has identified a number of initiatives already available to address this need, some already part of the DATRAS homepage and others developed by outside users. These initiatives are described in the sections below. Additionally, WKIDP has identified further options that could be developed in the near future.

4.2 Data mining tools on DATRAS homepage

4.2.1 Better documentation of the data submission

As recalled in section 3, the submission status of uploaded files is available on the DATRAS homepage (https://datras.ices.dk/Data_products/Submission_Status.aspx). To enhance the usability of the submission status page, a few more parameters and columns could be added to give the user an overview of submitted data, either as a summary of all species or at the level of an individual species.

WKIDP recommends that one extra parameter and seven new columns are added to the submission status table:

New parameter proposed: species; the user can select ‘all’ or ‘standard’ species based on the survey.

New columns proposed:

• No_of_Hauls = Total valid hauls for selected survey, year, quarter
• No_of_obs_Hauls = Total number of hauls where the selected species are observed
• No_of_Lngt_Dist = Number of length frequencies for the species selected (number of unique occurrences of HL by species by haul where SpecVal = 1)
• No_of_Indi_Meas = Total number of individuals measured in length data sampled
• No_of_Indi_Raised = Total number of individuals in length data raised to total catch
• No_of_Age = Number of age readings
• No_of_Spec = Number of species encountered in the data selected
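
As an illustration of how the proposed columns could be derived from the exchange records, a minimal sketch is given below. It is not the Data Centre's implementation. The field names (HaulNo, HaulVal, SpecCode, SpecVal, HLNoAtLngt, SubFactor, NoAtALK) are assumed to follow exchange-format naming, "V" is assumed to flag a valid haul, raising is assumed to be number-at-length times sub-sampling factor, and the number of age readings is assumed to be the sum of NoAtALK.

```python
def submission_summary(hh, hl, ca, species=None):
    """Compute the seven proposed columns for one survey/year/quarter selection.

    species: set of species codes to include, or None for all species.
    """
    def selected(rec):
        return species is None or rec["SpecCode"] in species

    # Assumption: HaulVal == "V" marks a valid haul.
    valid_hauls = {r["HaulNo"] for r in hh if r["HaulVal"] == "V"}
    hl_sel = [r for r in hl if r["HaulNo"] in valid_hauls and selected(r)]
    ca_sel = [r for r in ca if r["HaulNo"] in valid_hauls and selected(r)]

    return {
        "No_of_Hauls": len(valid_hauls),
        "No_of_obs_Hauls": len({r["HaulNo"] for r in hl_sel}),
        "No_of_Lngt_Dist": len(
            {(r["HaulNo"], r["SpecCode"]) for r in hl_sel if r["SpecVal"] == 1}
        ),
        "No_of_Indi_Meas": sum(r["HLNoAtLngt"] for r in hl_sel),
        # Assumption: raised number = number at length x sub-sampling factor.
        "No_of_Indi_Raised": sum(r["HLNoAtLngt"] * r["SubFactor"] for r in hl_sel),
        "No_of_Age": sum(r["NoAtALK"] for r in ca_sel),
        "No_of_Spec": len({r["SpecCode"] for r in hl_sel}),
    }


# Made-up records, only to exercise the function.
hh = [
    {"HaulNo": 1, "HaulVal": "V"},
    {"HaulNo": 2, "HaulVal": "V"},
    {"HaulNo": 3, "HaulVal": "I"},
]
hl = [
    {"HaulNo": 1, "SpecCode": "cod", "SpecVal": 1, "HLNoAtLngt": 3, "SubFactor": 2.0},
    {"HaulNo": 1, "SpecCode": "cod", "SpecVal": 1, "HLNoAtLngt": 2, "SubFactor": 2.0},
    {"HaulNo": 2, "SpecCode": "plaice", "SpecVal": 1, "HLNoAtLngt": 4, "SubFactor": 1.0},
    {"HaulNo": 3, "SpecCode": "cod", "SpecVal": 1, "HLNoAtLngt": 9, "SubFactor": 1.0},
]
ca = [{"HaulNo": 1, "SpecCode": "cod", "NoAtALK": 2}]

all_species = submission_summary(hh, hl, ca)
cod_only = submission_summary(hh, hl, ca, species={"cod"})
```

Passing `species=None` corresponds to the ‘all’ option of the proposed parameter; a ‘standard’ species list per survey would be supplied as a set of codes.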

The webpage would need a new design to allow the user to extract this information regarding the submission status.

Draft design:

See also Annex 4 (page 59) for the more detailed action plan.

4.2.2 The “Flex File”, a simpler version of the exchange data, easier to explore

Currently DATRAS provides a number of useful, but quite specific products, such as CPUE per length per haul. The data provided are for a single survey per download and for non-zero data only. If the investigator wants the flexibility, for example, to explore data across surveys or to include effort from tows where the species was not observed, then it is necessary to download the raw Exchange Data Files.

The Exchange files consist of three separate “related” record types covering the range of information necessary to describe the hauls (HH), the length data (HL) and the age information (CA). A degree of flexibility is included in the file structure to allow for different sampling methods or measurement units between institutes. Flexibility requires certain conversions to be done to standardise outputs and therefore a number of fields have flags and codes associated with them, which must be understood in order to interpret the data correctly.

Discussions over a number of years at IBTSWG, WKDATR and DUAP have highlighted the difficulty for all users (both experts and non-experts) in using these files as a quick access download product. The request for a simplified “flat file” product was formalized by WKDATR in 2013, and supported by the survey groups the same year (ICES, 2013, section 4.2.4; see footnote 1). In contrast to a relational database type structure, where data are held in a memory-efficient manner over a number of related tables, a “flat file” structure is generally assumed to be a single self-contained table. Age data (CA Exchange File records), being of a quite different structure, may require a separate file, but the rationale discussed above holds.

1 http://www.ices.dk/sites/pub/Publication%20Reports/Expert%20Group%20Report/SSGESST/2013/WKDATR13.pdf

The ICES Data Centre is currently working on implementing such a file, referred to as the “flex file”. It is derived from the exchange data: unnecessary fields will be removed from the data records and some derived fields will be introduced, so that data users can refer to a single data product to perform analyses by selected criteria. The product will also be more flexible in accommodating user requests for changes in structure. See the draft structure of the “Flex file” in Annex 4.
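
The difference between the related record types and a single flat table can be illustrated with a small sketch that joins each length record (HL) to its haul record (HH). This is only an illustration of the flat-file idea under assumed field names and an assumed haul key; the actual draft structure of the flex file is the one given in Annex 4.

```python
def flatten(hh, hl, haul_key=("Year", "Quarter", "Ship", "HaulNo")):
    """Join each length record (HL) to its haul record (HH) into one flat row."""
    hauls = {tuple(h[k] for k in haul_key): h for h in hh}
    flat = []
    for rec in hl:
        haul = hauls.get(tuple(rec[k] for k in haul_key))
        if haul is None:
            continue  # length record without a matching haul: skipped in this sketch
        row = dict(haul)   # haul-level fields first
        row.update(rec)    # then the length-level fields
        flat.append(row)
    return flat


# Made-up records; field names are assumptions for the example.
hh = [{"Year": 2013, "Quarter": 3, "Ship": "AAA", "HaulNo": 1,
       "ShootLat": 55.0, "HaulDur": 30}]
hl = [{"Year": 2013, "Quarter": 3, "Ship": "AAA", "HaulNo": 1,
       "SpecCode": "cod", "LngtClass": 25, "HLNoAtLngt": 4}]

flat = flatten(hh, hl)
```

Note that a join driven by the HL records drops hauls where the species was not observed; a flat product that includes zero catches would need those hauls added back explicitly.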

4.2.3 DATRAS web services

Web services introduce a method for retrieving data directly from DATRAS without creating a static download. Thus, other applications can access DATRAS data and manipulate and display them, always retrieving the most current version of the data directly from the database (an example of such an application is given in section 4.3.2 below). Web services are accessed by requesting data through a URL, and the result returned is data formatted in XML. This allows other applications to automatically submit requests and retrieve data in a known format. There can be a small delay for larger data requests, but generally, retrieving data through web services is much faster than going to the website, downloading a snapshot of data and then importing it into analytical packages or visualisation software.

The XML format from web services can also be stored and worked with in Excel as described in the example below:

1 ) Access the web service for either HH, HL, or CA records, as defined on the DATRAS Web Service description page (https://datras.ices.dk/WebServices/Webservices.aspx). The URL can be manipulated to change the survey series, year and quarter for which data are to be extracted. In this example, we will use the default HH format for stations/hauls and retrieve the North Sea IBTS data from quarter 1 in 1966. This is all achieved by clicking on the URL in the web service page, but the year, quarter and survey name can be changed to other values to get data from other years, quarters or surveys. In this example, the address entered into the web browser’s address line is: https://datras.ices.dk/WebServices/DATRASWebService.asmx/getHHdata?survey=NS-IBTS&year=1966&quarter=1

2 ) When the page loads, it will return a set of XML data, which can then be read in by an application or saved:

Initially, this format may look daunting to an end-user who is not used to working with XML. However, the data can be saved as an XML file (in most browsers, simply right click and choose “Save As”). Once the file is saved, it can be dragged directly into Excel and displayed as a table:

During WKIDP, several products were presented and discussed (see next section on tools outside DATRAS), and some of these use the web services to ingest the DATRAS data directly. An early prototype of a web preview of stations was also mocked up during WKIDP (contact Jens Rasmussen for more info/code). This simply allows a simpler display of data in formatted tables on a web page, as well as an interactive web map:

The above map is a direct display of a given survey/year/quarter combination pulled directly from the web service using a local open source web server, PHP, and open source tools (the Leaflet JavaScript library and jQuery). Similarly, a table can be generated which can be copied into Excel or saved as a CSV file:

3 ) The URL for calling the web service can be altered to retrieve new data. In the web prototype, a form was built allowing users to call different surveys, years, and quarters. However, there is currently no method for resolving the years available under a particular survey (this requires a visit to the download page or previous knowledge of what is kept in DATRAS).
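
The same request can be scripted rather than typed into a browser. The sketch below builds the request URL shown in step 1 and converts an XML response into a list of records. The parser assumes that every child of the root element is one record whose children are the fields; the element names in the sample response are hypothetical and heavily shortened, and are used only so that the sketch runs offline.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

BASE_URL = "https://datras.ices.dk/WebServices/DATRASWebService.asmx"


def build_url(record_type, survey, year, quarter):
    """Build the request URL for HH, HL or CA records, e.g. getHHdata."""
    query = urlencode({"survey": survey, "year": year, "quarter": quarter})
    return f"{BASE_URL}/get{record_type}data?{query}"


def parse_records(xml_text):
    """Turn the returned XML into a list of dicts, one per record.

    Assumes each child of the root is a record and each grandchild a field;
    any "{namespace}" prefix on the tag names is stripped.
    """
    root = ET.fromstring(xml_text)
    records = []
    for record in root:
        fields = {}
        for field in record:
            name = field.tag.split("}")[-1]
            fields[name] = (field.text or "").strip()
        records.append(fields)
    return records


# Hypothetical, shortened response; not the real DATRAS element names.
sample = """<Records xmlns="urn:example">
  <Record><Survey>NS-IBTS</Survey><Year>1966</Year><HaulNo>1</HaulNo></Record>
  <Record><Survey>NS-IBTS</Survey><Year>1966</Year><HaulNo>2</HaulNo></Record>
</Records>"""

url = build_url("HH", "NS-IBTS", 1966, 1)
rows = parse_records(sample)
```

To work with live data, the text passed to `parse_records` would be the body returned for `url`, for example via `urllib.request.urlopen(url).read()`. All values come back as strings and would need converting to numbers where appropriate.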

Based on this review and testing of Web services, WKIDP also suggested a few web queries that could be added to the list:

i. getSurveys. Provide a simple list of the names of surveys, possibly with some additional information such as full names. Principally, the purpose is to retrieve the web service usage name of each survey. The return would be a simple list containing the lookup names of all surveys held in DATRAS.

ii. getSurveyYears(Survey). For any given survey, return a list of all the years for which data are held.

iii. getSurveyQuarters(Survey, Year). Return the quarters for which data are available in a given survey and year.

These generic reference web services would assist in better automation of data retrieval via web services: no explicit knowledge of the years and quarters available for specific surveys would be required, since they could be looked up via a web service instead.
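
How such lookup services would support automated retrieval can be sketched as below. The three lookups are passed in as plain functions because, at the time of writing, they are proposals only; here they are backed by a small made-up catalogue so that the sketch runs offline.

```python
def list_available(get_surveys, get_survey_years, get_survey_quarters):
    """Enumerate every (survey, year, quarter) combination via lookup calls."""
    combos = []
    for survey in get_surveys():
        for year in get_survey_years(survey):
            for quarter in get_survey_quarters(survey, year):
                combos.append((survey, year, quarter))
    return combos


# Stand-in catalogue; the content is illustrative, not a DATRAS inventory.
catalogue = {"NS-IBTS": {2013: [1, 3]}, "BITS": {2013: [1, 4]}}

combos = list_available(
    get_surveys=lambda: sorted(catalogue),
    get_survey_years=lambda survey: sorted(catalogue[survey]),
    get_survey_quarters=lambda survey, year: catalogue[survey][year],
)
```

Each tuple in `combos` could then be fed to the data services (HH, HL, CA) without the user having to know in advance which years and quarters exist.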

4.2.4 ICES Geoportal

WKIDP reviewed the current use of ICES GeoPortal facilities for GIS exploration of DATRAS data, and found that it has so far had only limited use as a data mining tool, as it is currently only able to output one type of information from DATRAS (total abundance).

WKIDP considers that this feature could certainly be enhanced, with a broader range of DATRAS data and products directly visualised on the GeoPortal.

4.3 Data mining tools publicly available on web outside of DATRAS website

In addition to the web services available on the ICES website, there are additional tools that have been developed outside of the ICES Data Centre. WKIDP reviewed those that the group was aware of, but others might exist. WKIDP concluded that it is worthwhile to have a good overview of products made by scientists and recommends that the ICES Data Centre link to the known products on the DATRAS webpage.

4.3.1 DTU DATRAS R package (DTU Aqua)

DTU Aqua (C. Berg, K. Kristensen) have developed an R package that allows users with basic R knowledge to directly access and download full survey datasets and, using functions within the R package, combine and manipulate data in any combination. The DATRAS R package has facilities for quickly downloading, reading, and subsetting the exchange data from the DATRAS database. Conversion and standardization of units is performed within the DATRAS package, so that it is easy to get lengths in centimetres and Latin species names rather than the codes used in the exchange data. It is well suited to data mining, since R has excellent functions for inspecting, tabulating, plotting and making various summaries of the data. Some specialized plotting functions are also built into the package. This data mining tool (http://rforge.net/DATRAS/Tutorial.html), which is free to download and use, starts to make data mining from the DATRAS database more accessible to the general user (screen shot below).

4.3.2 RICES R package based on DATRAS web services (MRI)

Along the same lines, MRI (Einar Hjorleifsson) has put together some wrapping code into an R package (RICES) available on GitHub (https://github.com/einarhjorleifsson/rices). This package takes advantage of the DATRAS web services to read in the data and load them into R, where they can subsequently be manipulated and analysed. WKIDP successfully tested this feature on a standard Windows laptop, using the following installation code (here with an example of extraction of North Sea IBTS quarter 3 in 2013):

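    # Install the package from GitHub (requires devtools).
    # The "user/repo" form is used here; the call tested in 2014 was
    # install_github("rices", "einarhjorleifsson").
    require(devtools)
    install_github("einarhjorleifsson/rices")
    require(rices)

    # Point RCurl at its bundled certificate file so that HTTPS requests can be verified
    options(RCurlOptions = list(cainfo = system.file("CurlSSL", "cacert.pem", package = "RCurl")))
    options(stringsAsFactors = FALSE)

    # Haul (HH) and age (CA) records for North Sea IBTS, quarter 3, 2013
    st <- get_hh_data(survey = "NS-IBTS", year = 2013, quarter = 3)
    ca <- get_ca_data(survey = "NS-IBTS", year = 2013, quarter = 3)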

Here st and ca are ordinary R data frames. A useful example script, exploring the raw data with the ggplot2 library, was also provided and is available at the GitHub address above.

4.3.3 Published surveys and population indicators (IFREMER)

Ifremer commissioned the creation of open source software called Coser for checking and preparing bottom trawl survey data for the calculation of indicator time series. The software is available at http://maven-site.forge.codelutin.com/coser/en/index.html. Coser has the following data checking capabilities, some operations being automatic and others requiring intervention by an expert:

• checking completeness of data, incl. availability of haul swept areas and strata surface areas
• checking consistency of species names with a reference list
• checking for errors in length measurements, e.g. error in unit (mm instead of cm or vice versa) using histograms
• preparing species list for population indicator calculation using criteria for minimum occurrence (proportion of hauls in which the species is present) and minimum density
• preparing species list for species with length measurements in all years (in certain surveys not all species have been length measured throughout the years)
• allowing the user to merge species at the genus or higher level in cases where taxonomic identification is considered variable over time, i.e. dependent on who was on board
• selecting data for certain years and strata only to ensure homogeneity of the species that were noted, length measured etc.

All data selection choices are saved in a file so that the same choices can be applied in the future.
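
As an illustration of one of the criteria listed above, the minimum occurrence rule (proportion of hauls in which a species is present) can be sketched as follows. This is not Coser's code; the function name, the data layout and the threshold value are assumptions made for the example.

```python
def species_meeting_occurrence(hauls_by_species, n_hauls, min_occurrence):
    """Keep species present in at least `min_occurrence` (a proportion) of hauls."""
    return sorted(
        species
        for species, hauls in hauls_by_species.items()
        if len(hauls) / n_hauls >= min_occurrence
    )


# Made-up presence data: species -> set of haul numbers where it was caught.
presence = {"cod": {1, 2, 3, 4}, "plaice": {1, 2}, "john dory": {4}}

kept = species_meeting_occurrence(presence, n_hauls=10, min_occurrence=0.2)
```

Saving the chosen threshold alongside the result is what allows the same selection to be reapplied when new survey years are added.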

Ifremer has also created an R package called RSufi that calculates a number of population and community indicators using as input the data files that were checked and created with the Coser software. The R package is available on demand. The results for all areas with a French contribution are published on the internet at http://www.ifremer.fr/SIH-indices-campagnes/index. The web site also contains the prepared data files and a description of the indicator calculation. A system for tracing versions of data sets and results is in place.

4.3.4 ERDDAP visualization interface (Marine Institute)

WKIDP looked at the NOAA-developed ERDDAP data server used at the Marine Institute to make some of its larger datasets available to the public in a digestible format (http://erddap2.marine.ie/erddap/index.html). ERDDAP (the Environmental Research Division's Data Access Program) is a data server that provides a simple, consistent way to download subsets of scientific datasets in common file formats and to make graphs and maps. It is a free and open source, all-Java (servlet) web application that runs in a web application server (for details see http://coastwatch.pfeg.noaa.gov/erddap/download/setup.html).

ERDDAP is largely a data server and allows filtering of any fields in the data table behind the application (Fig 4.3.3.1). Once selections are made, the data can be downloaded in a broad range of file types for further visualisation and/or analysis. More importantly, the query, including the desired export file type, is encapsulated in a URL that can be copied and run again at any point by simply pasting it into any web browser. The URL can also be incorporated into other analytical packages such as R so that the data are refreshed dynamically any time the code is run.
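
The way a query and its export file type are encapsulated in a single URL can be sketched as below, following ERDDAP's general tabledap pattern (dataset identifier, file type extension, comma-separated variables, then constraints). The dataset identifier and variable names are hypothetical and do not refer to an actual dataset on the Marine Institute server.

```python
from urllib.parse import quote


def erddap_table_url(server, dataset_id, file_type, variables, constraints=()):
    """Build a tabledap request URL: variables first, then one '&' per constraint."""
    query = ",".join(variables)
    for name, op, value in constraints:
        if isinstance(value, str):
            value = f'"{value}"'  # string values are quoted in constraints
        # Percent-encode characters such as '>' and '"'; keep '=' readable.
        query += "&" + name + quote(f"{op}{value}", safe="=")
    return f"{server}/tabledap/{dataset_id}.{file_type}?{query}"


# Hypothetical dataset and variable names, for illustration only.
url = erddap_table_url(
    "http://erddap2.marine.ie/erddap",
    "IBTS_example",
    "csv",
    ["Year", "Species", "CatchKg"],
    [("Year", ">=", 2010), ("Species", "=", "COD")],
)
```

Changing only the file type extension (for example from `csv` to another supported format) returns the same selection in a different format, which is what makes the URL convenient to reuse in scripts.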

Fig 4.3.3.1. Screenshot of the ERDDAP “graphs” screen for IBTS Trawl Data. Data are plotted on the fly using the axes specified in the upper drop-down menus, while the values plotted can be constrained to single values or ranges of values in the lower drop-down menus. Existing data values are provided in the drop-down menus and can be selected.

ERDDAP is only one of many proprietary and open source data web server packages. Several others, such as Leaflet (http://leafletjs.com/) and OpenLayers (http://openlayers.org/), are also free and open source and appear to be more mapping-focused. Experience at the Marine Institute was that ERDDAP was initially set up and running within about two days, and that additional data sets take a day or less to add, depending on complexity. A review of available products is underway at the Institute, and conclusions will hopefully be fed back for inclusion in this report.

4.3.5 D. Beare DATRAS R package

Another R package, developed by D. Beare, was found on Google Code (https://code.google.com/p/datras/), but no information on it was made available to WKIDP.

4.4 What is not publicly available but could be useful to users

WKIDP acknowledged that there are certainly many other pieces of code developed by various users, which could advantageously be gathered and shared, for the benefit of the wider user community and for improved capacity building.

There is no obvious way to do this, as such work develops rather informally following users’ needs. WKIDP suggested that trials could be initiated for gathering such pieces of code into an open repository, such as a wiki or GitHub, accessible from the DATRAS homepage.

5 Type 3: Application of existing products to new outputs (e.g. to new species, new quarters, new areas, new surveys, etc.)

5.1 Overview of existing products by survey

All surveys in DATRAS have the Exchange data available online. Some of the surveys have more data products developed. The full overview is given in Table 5.1.1 (DATRAS data products availability per survey).

Normally, a survey follows a standard routine to achieve storage of data and data products in DATRAS. The process for establishing a DATRAS survey and its data products includes the following stages.

1. New survey
1.1. After the initial request to store the survey data, confirmed by ACOM, the reporting format is agreed between ICES and the WG.
1.2. Data screening set-up (codes, fields, conditions, and ranges) and testing.
1.3. Data upload set-up and testing.
1.4. Upload of the raw survey data by the data submitters.
2. New products (or revision of the old ones)
2.1. Request by an Expert group for data products other than the Exchange data, with methods and examples included.
2.2. ICES Data Centre, in co-operation with the Survey working group, develops the algorithm and produces the draft products.
2.3. The Expert group: verification of the product output.
2.4. ICES Data Centre: delivery of the product to the requester, or publishing the products online and informing the requester.

As a general comment, WKIDP recommends that any changes to data products, and any new data products, be documented much more systematically on the DATRAS homepage. Progress on documentation and information has already been achieved using the “news and updates” page (http://ices.dk/marine-data/data-portals/Pages/DATRAS-News-and-updates.aspx), so WKIDP encourages the ICES Data Centre to make regular use of this page as an information log.

Table 5.1.1. DATRAS data products availability per survey (October 2014).

| Survey | Exchange Data | CPUE L/Haul | CPUE L/Area | CPUE L/SubArea | CPUE Age/Haul | CPUE Age/Area | CPUE Age/SubArea | SMALK | ALK | Indices | Bootstrap Range/median | Bootstrap |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| BITS | A | A | A | A | A | A | A | A | A | A | N | N |
| BTS | A | R | R | R | R | R | R | R | R | R | N | N |
| BTS-VIIA | A | P | P | P | P | P | P | P | P | P | N | N |
| DWS | A* | N | N | N | N | N | N | N | N | N | N | N |
| EVHOE | A | A | A | A | R | R | R | A | A | R | N | N |
| FR-CGFS | A | N | N | N | N | N | N | N | N | N | N | N |
| IE-IGFS | A* | N | N | N | N | N | N | N | N | N | N | N |
| NIGFS | A | N | N | N | N | N | N | N | N | N | N | N |
| NS-IBTS | A | A | A | A | A | A | A | A | A | A | A | A |
| PT-IBTS | A | N | N | N | N | N | N | N | N | N | N | N |
| ROCKALL | A | N | N | N | N | N | N | N | N | N | N | N |
| SP_ARSA | A | N | N | N | N | N | N | N | N | N | N | N |
| SP-NORTH | A | N | N | N | N | N | N | N | N | N | N | N |
| SP-PORC | A | N | N | N | N | N | N | N | N | N | N | N |
| SWC-IBTS | A | A | A | A | A | A | A | A | A | A | N | N |
| IDPS-IS | P | N | N | N | N | N | N | N | N | N | N | N |
| IDPS-NS | P | N | N | N | N | N | N | N | N | N | N | N |

Key: A = available; A* = limited availability; R = under revision; P = planned; N = not planned (yet).

5.2 Applying existing products to new surveys: ongoing work for the Beam trawl Surveys (WGBEAM)

No estimation products are available so far on the DATRAS homepage for the BTS survey; these are currently being developed in collaboration between WGBEAM and the ICES Data Centre. The current developments are documented here, as a summary overview below and as a more detailed implementation plan in Annex 4 (page 55).

WGBEAM2013 requested that the original standard BTS and BTS VIIa products be removed from the DATRAS webpage. WGBEAM decided on the standard products that should be available for different user groups, and defined them in detail during WGBEAM2014.

When the upload facility for BTS-VIII and the inshore surveys is ready, and data are uploaded, the output products can be made available for the next WGBEAM meeting. For the output to be calculated by the ICES Data Centre, a formal data product request was made in May 2014 by WGBEAM. The products are:

5.2.1 Standard products from DATRAS input:

Update frequency: continuous

Information type: flexible (always use most recent data)

Data used: all available in DATRAS

Timing: always

Location: on the DATRAS webpage, where people also can download the Exchange file

Products: ALK for all species, SMALK for all species, CPUE per length per haul (numbers/km2 and numbers/hour), CPUE per haul per species (numbers/km2 and numbers/hour)

WGBEAM decided only to provide data up to the haul level because users themselves should decide how to combine the different gears. As long as WGBEAM does not have a well-developed protocol for this, the group feels it should be clear to all users that one has to be aware of the different characteristics of the surveys.
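
The two units mentioned for the haul-level CPUE products (numbers/hour and numbers/km2) can be illustrated with the generic arithmetic below. This is a sketch, not the WGBEAM or DATRAS algorithm: swept area is assumed here to be tow distance times gear (beam) width, and the function names and example values are made up.

```python
def cpue_per_hour(numbers, haul_duration_min):
    """Catch in numbers standardised to one hour of towing."""
    return numbers * 60.0 / haul_duration_min


def swept_area_km2(distance_m, gear_width_m):
    """Assumed swept area: tow distance x gear width, converted from m2 to km2."""
    return distance_m * gear_width_m / 1_000_000.0


def cpue_per_km2(numbers, area_km2):
    """Catch in numbers per square kilometre swept."""
    return numbers / area_km2


# Made-up haul: 32 fish, 30-minute tow, 4000 m towed with an 8 m beam.
per_hour = cpue_per_hour(32, 30)
area = swept_area_km2(4000, 8)
per_km2 = cpue_per_km2(32, area)
```

Because both units depend on gear characteristics (towing time, beam width), values from different gears are not directly comparable, which is the reason given above for providing products only up to the haul level.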

5.2.2 Results from the BTS index

a. Frequency: once per year

Information type: fixed (so: no updates throughout the year)

Data used: only the WGBEAM approved indices and related products as used in the assessment groups. In 2014 and 2015: sole and plaice in the North Sea for Netherlands (Tridens and Isis), and England and in VIId. If there is a need to calculate indices for other than those species, see section 5.

Timing: prior to WGNSSK (depending on the WGNSSK deadlines)

Location: on the DATRAS webpage; the files marked with * will be added to the Indices download and will not be available as separately downloadable products

NB: before the information can be made available:

• fine-tuning of Dutch index to be done
• English index calculations for IVa have to be approved by Cefas
• Documentation on index calculation.

Products: Indices, SD, Age_composition, ALK, Combined_HHHL

b. Frequency: continuous

Information type: flexible (always latest data available)

Data used: all index calculations for available data

Timing: always available

Location: at DATRAS webpage, only available on login

Products: all ‘in-between’ products as produced by current DATRAS index calculation

5.2.3 Internal WGBEAM products for survey summary

Frequency: continuous

Information type: flexible

Data used: all available of the last survey year (in 2015: all data for 2014)

Timing: always available

Location: at DATRAS webpage, only available on login

Products: Time schedule by week, Overview of hauls carried out, Biological sampling: ages, Biological sampling: not aged, Map with hauls by gear

Additional products for survey summary sheets by WGBEAM: distribution plots of target species by age-group, distribution plots of species. WKIDP noted that this last type of questions could be answered using the generic data mining tools and packages described in section 4.

6 Type of needs 4: Changes in estimation or calculation procedures of existing products

6.1 Description of the issues

WKIDP acknowledged that two issues in particular might contribute to discredit and mistrust of DATRAS among users.

The first issue is the repeatedly heard claim that “indices cannot be reproduced”. On a number of occasions, individual scientists or working groups have tried to reproduce the results published on DATRAS (ALK, CPUE, indices) with their own procedures, but could not arrive at exactly the same values. At present, the various steps in the algorithms used for product calculation are described in the survey manuals as general principles, but this is evidently not sufficient to allow full reproducibility.

The second issue is the validity and accuracy of the methods used. The algorithms for survey products involve a suite of complex calculations, some of which are based on specific hypotheses and methodological choices. For example, they must deal with standardisation of vessels/gears, missing data and holes in the ALKs, etc. These algorithms were often established a long time ago, and more recent statistical approaches may now be available. There may thus be several recipes to estimate the same products, and the official DATRAS calculation may be disputed.

These two issues are important and should not be ignored, in order to maintain a high level of trust in the survey data. The survey working groups should be in charge of this, as they have the ultimate responsibility for the quality and transparency of their survey products.

6.2 An example of how to tackle such issues – WGBIFS changes in ALK and maturity estimation procedures.

As an example, WKIDP discussed how the BITS WG (WGIBFS) did address such issues in close collaboration with ICES Data Centre.

CPUE per age, ALK, SMALK, etc. have been available from DATRAS for many years for the Baltic International Trawl Surveys (BITS) conducted in quarters 1 and 4. The Working Group on Baltic International Fish Survey (WGBIFS) is responsible for planning the BITS. In addition, the group has described algorithms for estimating stock indices, mean weight-at-age and the maturity ogive, which are given in the BITS manual. Age-based CPUE values are provided by DATRAS based on these algorithms, and are used to estimate stock indices for cod and flatfish. However, when different members of WGBIFS tried to recalculate the stock indices from the exchange data and the described algorithms, the recalculated CPUE values differed from those given on the DATRAS website. Consequently, the data available on the website were not used for stock assessment. In addition, not all data used in the estimation procedure were available on the DATRAS website, for example the conversion factors for transferring CPUE values of national gears into units of the standard gear, and the areas of the depth layers.

A cross-check of all steps of the calculation procedure was started in August 2013 by members of WGBIFS and the ICES Data Centre. All incorrect calculation procedures were replaced by corrected versions. Finally, the estimated CPUE per age per area, the final product of the calculation procedure, was checked for several randomly selected BITS surveys. The work was finished in spring 2014, and since April 2014 the assessment working group of the Baltic Sea has used the data products provided by ICES. The example illustrates the responsibility of the survey working group for the correct implementation of the defined procedures (given in the corresponding manual) for calculating the data products provided by DATRAS. The experience also showed that changes to the calculation of data products provided by DATRAS must be made in close cooperation with members of the survey working group, and that cross-checks of the implemented procedures are necessary.

6.3 Ways forward

The first and most effective way forward is, as in all other sections, to deal with the issues collaboratively between the ICES Data Centre and the Survey WGs, following the 2011 protocol for new data services from the Working Group on Data and Information Management (WGDIM) (ICES, 2011).

The ICES Data Centre hosts the DATRAS database and ensures the proper functioning of data screening and upload. The ICES Data Centre is responsible for the data output, including the raw survey data, and for the development and implementation of algorithms for the data products. ICES survey groups are responsible for checking and verifying the quality of the data and the results of the applied algorithms. The ICES trawl survey working groups are the ones that define new data products and changes to the procedures for estimating existing data products. Confirmation by the responsible working group is required before the ICES Data Centre publishes new data products, or changes to the procedures of existing data products, in DATRAS. Cooperation between the ICES Data Centre and the Expert Group is the key to implementing a successful data product.

Additionally, WKIDP suggested that some intermediate calculations could be published on the homepage for full transparency and reproducibility of the index values used in assessment. For example, the subset of raw data and the ALK before and after filling could be made available, so that users are able to track where and why their own estimates might differ.
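To illustrate what publishing an ALK "before filling and after filling" could look like, the following minimal sketch builds both versions from the same counts. It assumes a simple "borrowing" rule in which an empty length class takes the age proportions of the nearest length class that has data; this rule, the function names and the example counts are illustrative assumptions and are not the procedure implemented in DATRAS.

```python
def alk_proportions(alk_counts):
    """Turn aged-fish counts per length class into age proportions.

    Length classes with no aged fish get None, so the holes stay visible.
    """
    proportions = {}
    for length, counts in alk_counts.items():
        total = sum(counts.values())
        if total > 0:
            proportions[length] = {age: n / total for age, n in counts.items()}
        else:
            proportions[length] = None
    return proportions


def fill_by_borrowing(proportions):
    """Fill each empty length class from the nearest class that has data.

    Ties are resolved in favour of the smaller length class (an arbitrary,
    assumed choice for this sketch).
    """
    donors = [length for length, p in proportions.items() if p is not None]
    if not donors:
        raise ValueError("no length class has age data to borrow from")
    filled = {}
    for length, p in proportions.items():
        if p is None:
            nearest = min(donors, key=lambda d: (abs(d - length), d))
            p = proportions[nearest]
        filled[length] = dict(p)  # copy, so the two versions stay independent
    return filled


# Hypothetical aged-fish counts: {length class (cm): {age: number aged}}
raw_counts = {20: {1: 8, 2: 2}, 21: {}, 22: {2: 3, 3: 1}}
alk_before = alk_proportions(raw_counts)
alk_after = fill_by_borrowing(alk_before)
filled_classes = [length for length, p in alk_before.items() if p is None]
print("length classes filled by borrowing:", filled_classes)
```

Publishing both `alk_before` and `alk_after`, together with the list of filled classes, would let a user see exactly which length classes were affected by the filling step.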


7 Type of needs 5: Other new products

This section gathers all the DATRAS development ideas that are not covered in the previous sections, i.e. ideas dealing with new products rather than with improvements of existing products. These new suggestions are separated according to whether they deal with single-stock data for assessment WGs or with ecosystem data for integrated WGs.

7.1 New products needed for single stock assessments

7.1.1 Estimation of swept area

Current DATRAS products are all calculated per unit of time (e.g. number per hour), but there is also a demand for products calculated per unit of swept area (see also 7.2.2 below), as this is considered to be potentially more appropriate and less variable. To do so, swept area must be calculated.

The area swept varies between stations due to the distance towed and the vertical and horizontal net openings. Before any calculations are produced, a full dataset of information must be available. Within the survey working groups (WGBEAM and IBTSWG) there is at present an ongoing process for filling in missing values for ‘Distance’, ‘Groundspeed’ (both IBTSWG and WGBEAM surveys), ‘Netopening’, ‘WingSpread’ and ‘DoorSpread’ (IBTSWG surveys only) in the HH Exchange file. In the past not all of these values were mandatory (they are now); however, historical values will have to be filled in in order to create complete time-series. For example, for the IBTSWG, all countries are expected to have submitted the missing values to DATRAS by the end of October 2014. (WGBIFS surveys are not included, as swept area cannot be calculated for them: many surveys are missing gear geometry parameters for almost all years.)

Of course, it is expected that not all missing values can be filled in (when no real value was recorded during the survey), and these will have to be calculated. A standard method for calculating the area swept by the gear at each station should then be produced and made available. WKIDP briefly discussed some of the possible methods to fill in missing values and estimate swept areas, but concluded that this question was beyond the scope of the workshop and should be dealt with during another dedicated workshop.

In conclusion, WKIDP considered that this issue was already being dealt with, and that no further actions were necessary at this stage beyond those already on the way.

Once swept area is calculated, it will be possible to estimate new CPUE and indices in the same way as before, thus linking to section 5 above.
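As a rough illustration of the calculation discussed in this section, the sketch below computes a swept area per haul as towed distance multiplied by wing spread, and then a catch per unit of swept area. Several points are assumptions made for the example rather than an agreed standard method: the units (distance and wing spread in metres, ground speed in knots, haul duration in minutes), the use of a haul duration field (called `HaulDur` here) to derive a missing distance, and the choice of wing spread as the horizontal opening.

```python
def towed_distance_m(distance_m, groundspeed_kn, haul_dur_min):
    """Return the recorded distance, or derive it from speed and duration."""
    if distance_m is not None:
        return distance_m
    if groundspeed_kn is not None and haul_dur_min is not None:
        # 1 knot = 1852 m per hour; duration is assumed to be in minutes
        return groundspeed_kn * 1852.0 * haul_dur_min / 60.0
    return None  # cannot be derived here; left for a dedicated gap-filling method


def swept_area_km2(distance_m, wing_spread_m):
    """Swept area as towed distance times wing spread, converted to km2."""
    if distance_m is None or wing_spread_m is None:
        return None
    return distance_m * wing_spread_m / 1.0e6


def cpue_per_km2(number_caught, area_km2):
    """Catch in numbers per km2 of swept area."""
    if area_km2 is None or area_km2 <= 0:
        return None
    return number_caught / area_km2


# Hypothetical haul record with a missing distance
haul = {"Distance": None, "GroundSpeed": 4.0, "HaulDur": 30.0, "WingSpread": 20.0}
distance = towed_distance_m(haul["Distance"], haul["GroundSpeed"], haul["HaulDur"])
area = swept_area_km2(distance, haul["WingSpread"])
density = cpue_per_km2(100, area)
print(distance, area, density)
```

Returning `None` rather than a guessed value keeps hauls with unrecoverable gear geometry visible, which matches the point that gap filling needs its own agreed method.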

7.1.2 Confidence intervals for indices

The need to produce confidence intervals for survey indices has been under discussion for a long time. WKIDP recalls that such confidence intervals are already available, but are seldom used. In 2007, the EU Commission requested ICES to implement uncertainty estimation in DATRAS and to supplement the routine abundance indices with uncertainty estimates. Various studies were initiated at that time to review and evaluate the methods available for computing uncertainty. These are summarized in ICES, 2007 (footnote 2). It was then decided that non-parametric bootstrapping was the most suitable method. A standard DATRAS product is therefore a set of 500 bootstrap replicates for the indices (NS-IBTS only). The confidence interval can be summarized through the 25–75% quartile range divided by the median, which is also computed as a standard DATRAS product.
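The idea of a non-parametric bootstrap with 500 replicates, summarized by the quartile range divided by the median, can be sketched as follows. This is a simplified illustration only: it resamples hauls with replacement from a single pool and uses the mean CPUE as the index, whereas the actual DATRAS procedure (ICES, 2007) may resample within strata and involve further steps. The function names and example values are hypothetical.

```python
import random
import statistics


def bootstrap_index(haul_cpue, n_replicates=500, seed=1):
    """Resample hauls with replacement and recompute the mean CPUE each time."""
    rng = random.Random(seed)  # fixed seed so the replicates can be reproduced
    n = len(haul_cpue)
    return [statistics.fmean(rng.choices(haul_cpue, k=n)) for _ in range(n_replicates)]


def relative_quartile_range(replicates):
    """25-75% quartile range of the replicates divided by their median."""
    q25, median, q75 = statistics.quantiles(replicates, n=4)
    return (q75 - q25) / median


# Hypothetical CPUE values (number per hour) for the hauls of one survey
haul_cpue = [0.0, 2.0, 5.0, 1.0, 12.0, 3.0, 0.0, 7.0]
replicates = bootstrap_index(haul_cpue, n_replicates=500, seed=42)
print(relative_quartile_range(replicates))
```

Fixing the random seed is a design choice for reproducibility: anyone rerunning the script with the same input obtains the same set of replicates.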

It is the understanding of WKIDP that this bootstrap has not been widely used. ICES WGISDAA (2012; footnote 3) explored this and briefly looked at the consequences for survey and assessment working groups, using IBTS Q1 survey indices for North Sea haddock (ages 1–6, years 1983–2011) as a test example. Figure 7.1.1 gives the estimated CV (standard error divided by the mean) for each year and age, along with a variance metric provided by ICES on the DATRAS website (range divided by median). The figure shows that the CVs of this survey were generally low, around 10–20%, although there has been an increase in CV for all ages towards the end of the time-series. The CVs on older ages towards the end of the time-series exceeded the 30% level. The increasing CVs were seen as a source of concern, and possible causes should be investigated by IBTSWG. WGISDAA (2012) then compared the results from a standard SURBAR run for North Sea haddock (with no downweighting of index data during sum-of-squares minimization) with results from a run with inverse-variance downweighting applied, using the inverse of the estimated CVs as weights. This made very little difference to the stock estimates from SURBAR in this case. However, WGISDAA noted that, aside from some of the CVs on the older ages towards the end of the time-series, the estimated variances on the survey indices were quite stable for this particular case. The effect of downweighting in a case with more extreme and changeable CVs still needs to be explored. In any case, the ability in an assessment to downweight an index with a lower CV, such as a commercial index, in favour of a possibly more variable but less biased survey index, for example, requires that the CVs be available for evaluation.

2 http://www.ices.dk/marine-data/Documents/DATRAS/Final_report_to_EU_Bootstrap_calculation.pdf
3 http://www.ices.dk/sites/pub/Publication%20Reports/Expert%20Group%20Report/SSGESST/2012/WGISDAA12.pdf


Figure 7.1.1. Variance estimates for the IBTS Q1 survey indices for North Sea haddock (from the WGISDAA 2012 report).
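The downweighting comparison described above can be illustrated with a small sketch of a sum of squares in which each index value is weighted by the inverse of its CV. This is not the SURBAR implementation: the use of log residuals, the function names and the numbers are assumptions made only to show how available CVs would enter such a calculation.

```python
import math


def cv(standard_error, mean):
    """Coefficient of variation: standard error divided by the mean."""
    return standard_error / mean


def weighted_ssq(observed, fitted, cvs=None):
    """Sum of squared log residuals, optionally weighted by 1 / CV.

    With cvs=None every observation gets weight 1 (no downweighting).
    """
    total = 0.0
    for i, (obs, fit) in enumerate(zip(observed, fitted)):
        weight = 1.0 if cvs is None else 1.0 / cvs[i]
        total += weight * (math.log(obs) - math.log(fit)) ** 2
    return total


# Hypothetical index values for two ages, with model-fitted values and CVs
observed = [120.0, 15.0]
fitted = [100.0, 25.0]
cvs = [0.10, 0.35]  # the less precise (older-age) index gets the smaller weight
print(weighted_ssq(observed, fitted), weighted_ssq(observed, fitted, cvs))
```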

7.1.3 Mean weight at age and maturity ogive

Mean weight at age in the stock and maturity information are standard inputs for most analytical assessments and normally have to be updated every year. The estimates are based on the scientific surveys, and the calculation procedure will not vary between most surveys. The maturity ogive might vary from species to species due to the different maturity scales used, but the basic methodology is the same. The mathematical background for the estimation of mean weight at age and the maturity ogive is given in the BITS manual. However, these additional data products are not yet available for BITS, because additional methods are required to approximate the relative age distribution, the mean weight at length and the proportion of spawners by age for length classes where CPUE per length was larger than zero and age-based data were not available. The missing data were estimated by means of tools developed by the stock assessor.

Methods were developed to solve this open problem for BITS. The methods were discussed by WGBIFS and WGBFAS in 2014. Both groups recommended the implementation of the proposed methods in DATRAS. After implementation, the methods can also be applied to other surveys. The action plan is available in Annex 4, page 59.
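For orientation, one common way of combining CPUE per length, an ALK and mean weight at length into a mean weight at age is sketched below. It is a generic formulation, not necessarily the one given in the BITS manual or proposed by WGBIFS and WGBFAS, and it assumes the ALK has already been completed for every length class with positive CPUE. All names and values are hypothetical.

```python
def mean_weight_at_age(cpue_per_length, alk, weight_at_length):
    """Mean weight at age, weighting each length class by its numbers at age.

    cpue_per_length:  {length: number per hour}
    alk:              {length: {age: proportion of that length class}}
    weight_at_length: {length: mean individual weight}
    """
    weight_sum = {}
    number_sum = {}
    for length, cpue in cpue_per_length.items():
        for age, proportion in alk.get(length, {}).items():
            numbers = cpue * proportion  # CPUE at this length assigned to this age
            weight_sum[age] = weight_sum.get(age, 0.0) + numbers * weight_at_length[length]
            number_sum[age] = number_sum.get(age, 0.0) + numbers
    return {age: weight_sum[age] / number_sum[age] for age in weight_sum if number_sum[age] > 0}


# Hypothetical inputs: lengths in cm, weights in g
cpue = {20: 10.0, 30: 10.0}
alk = {20: {1: 1.0}, 30: {1: 0.5, 2: 0.5}}
weights = {20: 100.0, 30: 300.0}
result = mean_weight_at_age(cpue, alk, weights)
print(result)
```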

7.1.4 Abundance and biomass indices for Data Limited Stocks (DLS category 3)

Data Limited Stocks represent the majority of stocks for which ICES provides advice. Most of them fall under DLS category 3, where advice is based on a survey index. As no age-based analytical assessment is performed, a single synthetic index integrated over all ages/lengths is required. For the 2014 advice this index was mostly based on biomass (e.g. tonnes per hour), but in some other cases an abundance index was used (e.g. number per hour). The algorithms used may currently differ from stock to stock, but it might be possible to develop generic and standardized products on DATRAS for groups of DLS stocks within the same survey.

CPUE per length is in principle available from DATRAS for most, if not all, species caught in the surveys. This may thus be a better starting point than exchange data, since aspects such as zero hauls and conversion factors are already taken care of. However, care must be taken to check whether the current subarea definitions are appropriate for the given stocks (e.g. whether roundfish areas can be used for flatfish).

From there, a biomass index is considered more appropriate than an abundance index, since an abundance index may mainly reflect recruitment. But this requires converting CPUE per length per hour into weight. For this it must be decided whether a length-weight key can be fitted to the data, or whether fixed length-weight parameters should be used if the data are too sparse. Similarly, a maturity ogive can be fitted if the data are sufficient. WKIDP acknowledged that guidance could be provided on which DLS stocks have ALK/SMALK data that can be considered statistically good enough to compute biomass indices, depending on the survey protocols and on the number of measurements for each species. WKIDP considered that this question could be brought to WGISDAA for generic input, and WKIDP encourages a continuous dialogue between the survey groups and the DLS assessment groups on the estimation of biomass indices.
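The conversion from CPUE per length to a biomass index can be sketched with the usual allometric length-weight relationship W = a · L^b, fitted by least squares on log-transformed data. The sketch below is only an illustration under stated assumptions: lengths in cm, weights in g, an index expressed in kg per hour, and hypothetical function names and data. Whether the fit or fixed parameters should be used for a given stock is exactly the decision discussed above.

```python
import math


def fit_length_weight(lengths_cm, weights_g):
    """Fit W = a * L**b by linear regression of log(W) on log(L)."""
    xs = [math.log(length) for length in lengths_cm]
    ys = [math.log(weight) for weight in weights_g]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("at least two distinct lengths are needed for the fit")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = math.exp(mean_y - b * mean_x)
    return a, b


def biomass_index_kg_per_hour(cpue_per_length, a, b):
    """Sum over length classes of number per hour times predicted weight (g to kg)."""
    grams = sum(number * a * length ** b for length, number in cpue_per_length.items())
    return grams / 1000.0


# Hypothetical individual measurements and CPUE per length (number per hour)
sample_lengths = [10.0, 20.0, 30.0, 40.0]
sample_weights = [0.01 * length ** 3 for length in sample_lengths]
a_hat, b_hat = fit_length_weight(sample_lengths, sample_weights)
index = biomass_index_kg_per_hour({10.0: 2.0, 20.0: 1.0}, a_hat, b_hat)
print(a_hat, b_hat, index)
```

If the data are too sparse for a reliable fit, the same `biomass_index_kg_per_hour` function can be called with fixed, externally supplied values of `a` and `b`.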

Alternatively, abundance estimates (in number) could be computed for adults only, using the average adult length provided by IBTSWG.

As all available information is needed for DLS species, it is recommended that download by species over all surveys be explicitly facilitated, whether via manual download, web services or in any other way.

Methodological approaches to data poor stocks are currently in full development, and therefore DLS requests from DATRAS are likely to evolve in the near future.

7.2 Products needed for integrated ecosystem assessments and MSFD

7.2.1 What

The implementation of MSFD means that several indicators for determining Good Environmental Status (GES) are being developed both on a national level and internationally (common indicators). To enable members and the ICES community to perform and develop indicator calculations, compare processes, and research these, it is important to start out from the data evidence. DATRAS is a “living” database where updates are performed on a routine basis, both to ensure best quality of data and to align with assessment periods. However, for the purpose of the lower frequency reporting for MSFD (every 3 years), it is important that the evidence base used in developing indicators can be shared and submitted. To do this, it is necessary to create a dataset that does not change.

A common reference base would be needed for the calculation of indicators for MSFD. It is suggested that a new data product is developed as an extract from DATRAS. The product is different from the DATRAS “raw” data in that interpretation is required in order to ensure a high degree of coverage and completeness of the data product, even where there may be individual observations missing from the raw data. The intended use of the product is specifically for the assessment of Good Environmental Status, with particular emphasis on addressing D1 and D4 needs, which are likely to differ from those associated with D3.


The product specification details the information required for the product, while the annex details some of the considerations required to improve completeness of the data product.
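One possible way to make "a dataset that does not change" verifiable is to publish each frozen extract together with a version label and a checksum, so that users can confirm they are working from the same evidence base. The sketch below illustrates the idea only; the record layout, the field names, the version label and the use of SHA-256 are all assumptions and not part of the product specification.

```python
import hashlib
import json


def freeze_extract(records, version_label):
    """Describe a frozen extract by its version, record count and checksum.

    Records are serialised with sorted keys and then sorted, so the checksum
    does not depend on field order or on the order of the records.
    """
    canonical = sorted(
        json.dumps(record, sort_keys=True, separators=(",", ":")) for record in records
    )
    payload = "\n".join(canonical).encode("utf-8")
    return {
        "version": version_label,
        "n_records": len(records),
        "sha256": hashlib.sha256(payload).hexdigest(),
    }


# Hypothetical extract: two haul-level records with made-up field names
extract = [
    {"survey": "NS-IBTS", "year": 2013, "quarter": 1, "haul": 1, "cpue": 12.5},
    {"survey": "NS-IBTS", "year": 2013, "quarter": 1, "haul": 2, "cpue": 0.0},
]
manifest = freeze_extract(extract, "example-v1")
print(manifest)
```

Any later correction to the underlying data changes the checksum, which makes it explicit that a new version of the product is needed rather than a silent update.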

7.2.2 Why

The aim is to ensure consistency in assessments of Good Environmental Status that utilise data from DATRAS and the groundfish surveys. Extracting, filtering, and adjusting the raw data for these purposes is a time-consuming task that will have to be repeated many times by different member nations and for each assessment cycle, potentially generating multiple versions of ostensibly the same monitoring programme data set. As such, it is deemed highly beneficial if the product is developed and resides at ICES as a product that everyone can utilise for the particular purpose of assessing GES. This is an excellent example of potential sharing and reuse of a data product, reducing the risk of assessments being performed on similar, but subtly different, data sets, which could lead to inconsistencies in the assessment outcomes. Additionally, it provides a clearer provenance of data, effectively shortening the “process-chain” required to move from raw data collection through data formatting to final product.

One of the clear recommendations from the Marine Strategy Coordination Group on monitoring programmes was that “Monitoring programmes have to be ‘coordinated’, ‘compatible’, ‘coherent’, ‘consistent’ and ‘comparable’” (footnote 4). While many aspects of the DATRAS data already meet these requirements in some form, the “comparable” aspect is not fully realised until there is a coordinated approach and, effectively, a product available.

7.2.3 When?

It is proposed that the data product is developed as a periodic output from DATRAS (every 3 years), survey by survey.

OSPAR is aiming to complete common indicator testing by December 2014, and then to undertake an intermediate assessment during 2016, but the first full assessment for MSFD is due by 2018. However, to give time for the actual work on assessments of GES, a product would need to be developed during 2014 and 2015, to allow time for uptake and for modification of current approaches to assessment (e.g. where some individual DATRAS extract has been modified for a particular assessment).

7.2.4 How?

Ground rules for the ETL and manipulation of data from DATRAS are described in the product description and Annex11.5. There are multiple ways in which this operation can be performed, and it is assumed that the ICES Data Centre would be able to recommend the most suitable technical option for achieving this.

It would, however, be beneficial if the code or pseudo-code for the generation of the product could be released alongside the documentation and the data product itself. Ideally, this would be an R-style script.

It is recommended that the published data product becomes a persistent resource and is referenced as such (e.g. the version released in 2017 will be superseded by the 2020 version, but the 2017 data product remains available for full transparency and compatibility).

4 Recommendation 2 from OSPAR Commission Report: OSPAR Coordinates Monitoring in the North-East Atlantic. http://www.ospar.org/documents/dbase/publications/p00622/p00622_ospar_monitoring_coordination_report.pdf

7.2.5 Who?

This is a highly collaborative task between the ICES Data Centre, ICES Survey Expert Groups, ICES WGECO, possibly ICES CSG MSFD, and the member nations requiring the output product (in the first instance, the UK is committed to providing the Abundance of a Suite of Sensitive Species and the Large Fish Indicator as common indicators, but the data are likely to be needed to support several other D1 and D4 MSFD indicators by a number of other Member States). As such, the discussion at the workshop should be sufficient to commence prototyping of the product, while a group of users needing the product will need the opportunity to provide feedback and to make decisions, during development, on when and whether missing or blank values need to be substituted with derived or interpreted values. Expert advice from the survey groups may well be needed, although manuals on the DATRAS submission and fields are already available to guide this process.


8 Conclusions

8.1 Synopsis

The conclusions from participants were that the workshop had been useful to bring together a number of expectations and opportunities regarding the use of ICES trawl survey data.

The workshop has highlighted that the community of data end-users is wide and diverse, making it difficult to come up with a one-size-fits-all approach. Indices and other integrated products used in stock and ecosystem assessments result from a complex suite of calculations performed on a complex dataset. The calculations themselves can follow several statistical procedures.

This means that the DATRAS portal has a dual role. The first is that DATRAS serves as the repository of high quality raw data. The second is that DATRAS offers a standardised framework for the estimation of indices based on those data. Distinguishing between the two aspects, and identifying their corresponding user profiles, was useful for formulating the needs for further developments. The general approach to DATRAS protocols should therefore ensure that both the raw data and the suite of calculations are fully accessible and documented, so that all estimated products are fully trustworthy and reproducible.

The development of integrated products raised some questions of responsibility during the workshop. When several statistical methods can be applied to derive the same products (for example, filling missing ALK data by “borrowing” or by “fitting a model”), with potentially different outcomes, the users, the Data Centre and the corresponding Surveys WG should collaborate to assess the various methods and decide upon the most appropriate. The development of an open source community with sharing of scripts and tutorials is encouraged, to enhance transparency in the methods used and capacity building among users.

8.2 Summary of suggested tasks to ICES Data Centre and surveys WG

The suggested improvements to DATRAS are summarised as follows:

• Establishment of a data submitters contact list (1 person per country and per survey)
• Easier data submission process and improved documentation
• Publication of fixed parameters involved in products’ calculation
• Ongoing documentation of changes and updates
• Improved visualisation of DATRAS data on ICES GeoPortal
• Links on DATRAS webpage to tutorials and packages using DATRAS data, and possible
• Development of products for BTS surveys
• Publication of intermediate calculations for existing products
• Development of new products for single-stock and ecosystem assessment

8.3 References

ICES. 2012. Report of the Working Group on Improving use of Survey Data for Assessment and Advice (WGISDAA), 10–13 January 2012, ICES Headquarters, Copenhagen. ICES CM 2012/SSGESST:18. 39 pp.

ICES. 2012. Report of the Workshop on Implementation in DATRAS of Confidence Limits Estimation, 10–12 May 2006, ICES Headquarters, Copenhagen. 53 pp.

ICES. 2013. Report of the Workshop on DATRAS data Review Priorities and checking Procedures (WKDATR), 29–31 January 2013, ICES Headquarters, Copenhagen. ICES CM 2013/SSGESST:05. 45 pp.


Annex 1: List of participants

Name: Clara Ulrich (Chair)
Address: DTU Aqua - National Institute of Aquatic Resources, Jægersborg Allé 1, 2920 Charlottenlund, Denmark
Phone/Fax: Phone +45 35 88 33 95/21157486; Fax +45 3588 3333

Name: Casper Berg
Address: Casper Willestofte Berg, DTU Aqua - National Institute of Aquatic Resources, Section for Fisheries Advice, Charlottenlund Slot, Jægersborg Alle 1, 2920 Charlottenlund, Denmark
Phone/Fax: Phone +45 35 88 34 33

Name: Ingeborg de Boois
Address: Wageningen IMARES, P.O. Box 68, 1970 AB IJmuiden, Netherlands

Name: Gary Burt
Address: Centre for Environment, Fisheries and Aquaculture Science (Cefas), Lowestoft Laboratory, Pakefield Road, NR33 0HT Lowestoft, Suffolk, United Kingdom

Name: Henrik Degel
Address: DTU Aqua - National Institute of Aquatic Resources, Section for Fisheries Advice, Charlottenlund Slot, Jægersborg Alle 1, 2920 Charlottenlund, Denmark
Phone/Fax: Phone +45 21314880; Fax +45 33 96 3333

Name: Brian Harley
Address: Centre for Environment, Fisheries and Aquaculture Science (Cefas), Lowestoft Laboratory, Pakefield Road, NR33 0HT Lowestoft, Suffolk, United Kingdom
Phone/Fax: Phone 44 (0) 1502 562244


Name: Kasper Kristensen
Address: DTU Aqua - National Institute of Aquatic Resources, Section for Fisheries Advice, Charlottenlund Slot, Jægersborg Alle 1, 2920 Charlottenlund, Denmark
Phone/Fax: Phone +45 33 96 33 00; Fax +45 33 96 33 33

Name: Chris Lynam
Address: Christopher Lynam, Centre for Environment, Fisheries and Aquaculture Science (Cefas), Lowestoft Laboratory, Pakefield Road, NR33 0HT Lowestoft, Suffolk, United Kingdom
Phone/Fax: Phone +44 1502 52 4514; Fax +44 1502 313865

Name: Rainer Oeberst
Address: Thünen Institute, Institute for Baltic Sea Fisheries, Alter Hafen Süd 2, 18069 Rostock, Germany
Phone/Fax: Phone +49 381 811 6125; Fax +49 381 811 6199

Name: Anna Osypchuk
Address: International Council for the Exploration of the Sea, H. C. Andersens Boulevard 44-46, 1553 Copenhagen V, Denmark

Name: Jens Rasmussen
Address: Marine Scotland Science, Marine Laboratory, 375 Victoria Road, AB11 9DB Aberdeen, United Kingdom

Name: Vaishav Soni
Address: International Council for the Exploration of the Sea, H. C. Andersens Boulevard 44-46, 1553 Copenhagen V, Denmark

Name: David Stokes
Address: Marine Institute, Rinville, Oranmore, Co. Galway, Ireland
Phone/Fax: Phone +353 (0)91 387200; Fax +353 (0)91 387201


Name: Verena Trenkel
Address: Ifremer Nantes Centre, P.O. Box 21105, 44311 Nantes Cédex 03, France
Phone/Fax: Phone +33 240 374157

Name: Francisco Velasco
Address: Instituto Español de Oceanografía, Centro Oceanográfico de Santander, P.O. Box 240, 39004 Santander, Cantabria, Spain
Phone/Fax: Phone +34 942 291060; Fax +34 942 275072


Annex 2: Recommendations

RECOMMENDATION: A general recommendation is to keep developing documentation on what is available where on the DATRAS portal, which requests are dealt with and which changes are performed, etc. Secondly, WKIDP has formulated a number of suggestions for DATRAS improvements, summarised in section 8.2.
TO: ICES Data Centre, Surveys WG

RECOMMENDATION: Advice is sought on the minimum level of quality and coverage of ALK/SMALK data that can be used for converting CPUE-at-length data into biomass indices for DLS stocks.
TO: Surveys WG, WGISDAA


Annex 3: Protocol for new data services (ICES, 2011)

Input

Outline of steps to be taken when a request for a new data service facility is received

Figure 8.1. Flowchart of the decision-making process of new data storage at ICES.

Checklist for the request of new data services at ICES Data Centre

The framework used to identify the relevant information was the set of what, why, when, who and how questions. From this, a checklist for data service requests was developed.

The access to data should always be in line with ICES Data Policy, ICES Data Centre Guidelines.

1) Request justification:
   • Relevance / Rationale for the requester
   • Relevance / Rationale for the Expert Groups
   • Strategic importance for ICES
2) Request description:
   • (Online) Database (options, more than one possible):
     o Data housing
     o Data checking (if yes, valid variable ranges, mandatory variables should be provided)
     o Automatic uploading facility
     o Output interface (if yes, a description should be provided)
   • Other
3) Database definition:
   • Metadata information
   • Basic dataset description
   • Description of existing systems
   • Repository of the originator
   • External linkages (e.g. to vocabularies)
4) Database delivery deadline
   • Expert Group’s own resources
5) Customer definition:
   • Contact person
   • Data owner (options):
     o Personal (individual researchers, students, etc.)
     o Organization
     o Project (like EU projects)
     o ICES working groups or expert groups
   • Data end-use description:
     o Who
     o How
     o What

[Figure 8.1 flowchart labels: Requester (Expert Group/external); ICES Data Centre; Generation of request; Formal request for a new service; Checklist for new service; ICES Data Centre Review; Status of the request; Workplan; Rejected; Not accepted; Accepted; WGDIM decide.]

Prioritizing of requests was not discussed as it is assumed that this is part of the ICES Data Centre’s review process.

Output

The subgroup was tasked with discussing how the ICES Data Centre should handle product requests from requesters inside and outside ICES, including the ICES expert groups and ICES advisory groups.

Outline of steps to be taken when a request for data output is received.

1) The request should be forwarded to the ICES Data Centre.
2) If the product is available through the website, or could be produced by automated processes on the website, the ICES Data Centre gives advice on how it can be extracted.
3) If the product isn't available through the website, the ICES Data Centre sends a set of predefined questions to the data requester. Predefined questions are outlined below.
4) Based on the answers given by the data requester, the ICES Data Centre decides whether the product can be produced by ICES or whether the requester ought to be given information about other places to obtain the product.

The predefined questions are a mechanism to prioritize the requests. For equal request priorities, the Head of the ICES Data Centre should be consulted before any decision is made.


Figure 8.2. Flowchart of the decision-making process of new data output from ICES.

Checklist for the request of data output from ICES Data Centre

The following checklist to handle product requests is suggested:

1) Data selection definition:
   • Parameters
   • Geographical area
   • Period
2) Output definition:
   • Product description (options):
     o Raw data
     o Calculated data (e.g. indices)
     o Other output (e.g. maps)
   • Output format required
3) Output delivery deadline
4) Customer definition:
   • Contact person
   • Type of request (options):
     o Personal request (individual researchers, students, etc.)
     o Organization request
     o Project request (like EU projects)
     o ICES working groups or expert groups request
     o Commercial use request

[Figure 8.2 flowchart labels: Requester; ICES Data Centre; Requesting for data product; Data product questionnaire; Advice on data extraction; 1. Request for data product; 2. Request for questionnaire; 2. Data available on the Web; Redirection to other data sources; Create data product; 3. Filled out questionnaire; 4. Product generation not possible or desirable; 4. Product generation possible & desirable.]

Annex 4: Filled templates for new DATRAS products

Extended submission status page

a) Contributor(s): Vaishav

b) Product name: Extended submission status page

c) Origin of the request: [e.g. ICES WG, institute, user(s)] DATRAS data product workshop 2014

d) Product use: [what is the main reason for product development] More visibility and quick overview of the present data in the database for data users

e) Surveys involved: [use DATRAS coding] ALL

f) Methodology description (if available): [narrative, try to use DATRAS names of products to be used, like ‘HH Exchange file’, ’HL Exchange file’, ’CPUE per length per haul’] DATRAS submission page

g) Bottlenecks: [clearly describe why we couldn’t deal with this request during WKIDP] Required web-programming

h) Follow-up actions:

Action no. | Action description | Action holder | Action before | Status
1 | Create procedure for retrieving data from database | Vaishav | |
2 | Design status page | Vaishav | |
3 | Test with different selection | Anna | |
4 | Publish the page | DATRAS user | |
5 | | | |


Additions to DATRAS Web services

a) Contributor(s): WKIDP (Jens Rasmussen (UK))

b) Product name: Additions to DATRAS Web services

c) Origin of the request: [e.g. ICES WG, institute, user(s)]: WKIDP

d) Product use: [what is the main reason for product development]:

The existing data services provide an excellent way of retrieving HH, HL, and CA records from a given survey, year and quarter. However, in order to determine which surveys, years and quarters are available, you currently need either knowledge of the surveys and their coverage, or to go to the DATRAS website to retrieve this information. From a development/automation perspective, having this information available as simple reference web services would make it possible to fully automate access to DATRAS data from external code, whether in R or in local data exploration tools.

e) Surveys involved: [use DATRAS coding]: All, but only referencing/lookup of broad level data.

f) Methodology description (if available): [narrative, try to use DATRAS names of products to be used, like ‘HH Exchange file’, ’HL Exchange file’, ’CPUE per length per haul’]

Suggest 3 additional reference services:

1 ) getSurveys. Provide a simple list of the names of surveys, possibly with some additional information such as full names. Principally, the purpose is to retrieve the web service usage name of each survey. The return would be a simple list containing the lookup names of all surveys held in DATRAS.

2 ) getSurveyYears(Survey). For any given survey, return a list of all the years for which data are held.

3 ) getSurveyQuarters(Survey, Year). Return the quarters for which data are available in a given survey and year.

Once a developer has retrieved information from these 3 services, the existing HL, HH, and CA calls can be utilised more effectively.
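The way the three suggested lookups would chain together can be sketched as follows. This is a minimal illustration in Python, not a description of any deployed service: the three lookups are passed in as plain functions, the stub functions and the coverage table are invented stand-ins (not real DATRAS coverage), and no endpoint URL or response format is assumed.

```python
from typing import Callable, Iterable, List, Tuple


def list_available(
    get_surveys: Callable[[], Iterable[str]],
    get_survey_years: Callable[[str], Iterable[int]],
    get_survey_quarters: Callable[[str, int], Iterable[int]],
) -> List[Tuple[str, int, int]]:
    """Walk the three lookups and return every (survey, year, quarter) held."""
    combos = []
    for survey in get_surveys():
        for year in get_survey_years(survey):
            for quarter in get_survey_quarters(survey, year):
                combos.append((survey, year, quarter))
    return combos


# Invented stand-in coverage, for illustration only (hypothetical survey names).
_COVERAGE = {
    "SURVEY_A": {2013: [1, 3], 2014: [1]},
    "SURVEY_B": {2014: [4]},
}


def stub_get_surveys():
    return sorted(_COVERAGE)


def stub_get_survey_years(survey):
    return sorted(_COVERAGE[survey])


def stub_get_survey_quarters(survey, year):
    return _COVERAGE[survey][year]


available = list_available(
    stub_get_surveys, stub_get_survey_years, stub_get_survey_quarters
)
for survey, year, quarter in available:
    # Each combination is what an existing HH, HL or CA call needs as input.
    print(survey, year, quarter)
```

In real use the three stubs would be replaced by thin wrappers around whatever the reference services turn out to be; the loop itself would not need to change.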

g) Bottlenecks: [clearly describe why we couldn’t deal with this request during WKIDP]. We don’t have the access or the capability to program web services for DATRAS.

h) Follow-up actions:

Action no. | Action description | Action holder | Action before | Status
1 | Service Development | ICES Data Centre | |
2 | Service Testing | JR + others? | |
3 | Service deployment on ICES website | ICES | |
4 | How-to guide(s) with examples of using the web services? | Web service users/DIG | |
5 | | | |


WGBEAM products (see 5.2, Surveys under WGBEAM)

For the current BTS data (survey BTS) the products below should be made available as soon as possible, and at the latest before WGBEAM 2015. When the upload facility for BTS-VIII and the inshore surveys is ready, and data have been uploaded, the output products listed under item 3 can be made available for those surveys. For the output to be calculated by the ICES Data Centre, a formal data product request was submitted by WGBEAM in May 2014. Separate Excel files containing detailed calculation examples have been sent to the ICES Data Centre.

1. Standard products from DATRAS input:
Update frequency: continuous
Information type: flexible (always use most recent data)
Data used: all available in DATRAS
Timing: always
Location: on the DATRAS webpage, where people also can download the Exchange file

Products:

To be created by | Product name | Aggregation level | Methodology | Target audience | Countries involved | Contact person WGBEAM
ICES Data Centre | ALK | By country and ship for all CA species (real numbers) | Pivot table from CA, all aged fish (age >= 0) | Wider audience | NED, GFR, ENG, BEL | Ingeborg
ICES Data Centre | SMALK | By country and ship for all CA species (real numbers) | Pivot table from CA, all aged fish (age >= 0) and sex IN ('M','F') | Wider audience | NED, GFR, ENG, BEL | Ingeborg
ICES Data Centre | CPUE per species by length by haul | CPUE as in numbers/km2. File does not include 0 values! | Combine HH and HL data | Wider audience, including WGEF, WGCEPH, WGNEW | NED, GFR, ENG, BEL | Ingeborg
ICES Data Centre | CPUE per species by haul | CPUE as in numbers/km2, file includes 0 values. | Combine HH and HL data | Wider audience, including WGEF, WGCEPH, WGNEW | NED, GFR, ENG, BEL | Ingeborg
ICES Data Centre | Flat file | As described in WKDATR report, and recommendation from WGBEAM 2013 | See WKDATR report | Wider audience | NED, GFR, ENG, BEL | Ingeborg


Additional requests based on WKIDP 2014

To be created by | Product name | Aggregation level | Methodology | Target audience | Countries involved | Contact person WGBEAM
ICES Data Centre | CPUE per species by length by haul | CPUE as in numbers/hour. File does not include 0 values! | Combine HH and HL data | Wider audience, including WGEF, WGCEPH, WGNEW | NED, GFR, ENG, BEL | Ingeborg
ICES Data Centre | CPUE per species by haul | CPUE as in numbers/hour, file includes 0 values. | Combine HH and HL data | Wider audience, including WGEF, WGCEPH, WGNEW | NED, GFR, ENG, BEL | Ingeborg

WGBEAM decided only to provide data up to the haul level because users themselves should decide how to combine the different gears. As long as WGBEAM does not have a well-developed protocol for this, the group feels it should be clear to all users that one has to be aware of the different characteristics of the surveys.
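The "CPUE per species by haul" products combine HH and HL data and, unlike the by-length products, include 0 values. A minimal Python sketch of that combination is given below, assuming simplified records: the dictionary keys (haul_id, haul_dur_min, swept_area_km2, species, number) are hypothetical names chosen for illustration, not Exchange file field names, and the sketch takes the swept area as given rather than deriving it. It stays at haul level and does not combine gears.

```python
from collections import defaultdict


def cpue_per_species_by_haul(hh_rows, hl_rows, include_zeros=True):
    """Combine haul (HH) and catch (HL) records into CPUE per species per haul.

    Returns one row per haul and species with numbers/hour and numbers/km2.
    With include_zeros=True, a species absent from a haul gets a 0 row.
    """
    species_list = sorted({r["species"] for r in hl_rows})

    # Total numbers caught per (haul, species), summed over length classes.
    counts = defaultdict(float)
    for r in hl_rows:
        counts[(r["haul_id"], r["species"])] += r["number"]

    out = []
    for haul in hh_rows:
        hours = haul["haul_dur_min"] / 60.0
        for sp in species_list:
            n = counts.get((haul["haul_id"], sp), 0.0)
            if n == 0 and not include_zeros:
                continue
            out.append({
                "haul_id": haul["haul_id"],
                "species": sp,
                "n_per_hour": n / hours,
                "n_per_km2": n / haul["swept_area_km2"],
            })
    return out


# Invented example data (not survey results).
example_hh = [
    {"haul_id": 1, "haul_dur_min": 30, "swept_area_km2": 0.05},
    {"haul_id": 2, "haul_dur_min": 30, "swept_area_km2": 0.04},
]
example_hl = [
    {"haul_id": 1, "species": "Solea solea", "number": 10},
    {"haul_id": 1, "species": "Pleuronectes platessa", "number": 20},
    {"haul_id": 2, "species": "Pleuronectes platessa", "number": 8},
]

with_zeros = cpue_per_species_by_haul(example_hh, example_hl)
without_zeros = cpue_per_species_by_haul(example_hh, example_hl, include_zeros=False)
```

In the example, haul 2 caught no sole, so the version with zeros has a row for Solea solea in haul 2 with CPUE 0, while the version without zeros drops it.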

2. Results from the BTS index

c. Frequency: once per year
Information type: fixed (so: no updates throughout the year)
Data used: only the WGBEAM approved indices and related products as used in the assessment groups. In 2014 and 2015: sole and plaice in the North Sea for Netherlands (Tridens and Isis), and England and in VIId.
Timing: prior to WGNSSK (depending on the WGNSSK deadlines)
Location: on the DATRAS webpage; the files marked with * will be added to the Indices download and won’t be available as separately downloadable products

NB: before the information can be made available:

• fine-tuning of Dutch index to be done (action Ingeborg)
• English index calculations for IVa have to be approved by Cefas (action Brian)
• Documentation on index calculation (action Vaishav/Ingeborg)

Products:

To be created by | Product name | Calculation step | Species | Countries involved | Comment
ICES Data Centre | Indices | S6 | Pleuronectes platessa, Solea solea | NED, ENG | No. of countries will be extended after index approval by WGBEAM
ICES Data Centre | SD* | S6 | Pleuronectes platessa, Solea solea | NED, ENG | No. of countries will be extended after index approval by WGBEAM
ICES Data Centre | Aco4* | S5 | Pleuronectes platessa, Solea solea | NED, ENG | Remove meanlen columns (and meanweight?); rename file into Age_composition; no. of countries will be extended after index approval by WGBEAM
ICES Data Centre | ALK* | S4 | Pleuronectes platessa, Solea solea | NED, ENG | No. of countries will be extended after index approval by WGBEAM
ICES Data Centre | CombineHH_HL | S2 | Pleuronectes platessa, Solea solea | NED, ENG | No. of countries will be extended after index approval by WGBEAM

d. Frequency: continuous
Information type: flexible (always latest data available)
Data used: all index calculations for available data
Timing: always available
Location: at DATRAS webpage, only available on login
Products: all ‘in-between’ products as produced by current DATRAS index calculation

3. Internal WGBEAM products for survey summary
Frequency: continuous
Information type: flexible
Data used: all available of the last survey year (in 2015: all data for 2014)
Timing: always available
Location: at DATRAS webpage, only available on login
Products:

To be created by | Product name | Product | Methodology | Contact person WGBEAM | Comment
ICES Data Centre | WGBEAM1 | Time schedule, by week | From HH file, calculate week number and put in sheet | Ingeborg de Boois |
ICES Data Centre | WGBEAM2 | Overview of hauls carried out | Pivot table from HH (valid and invalid hauls) + number of planned stations | Ingeborg de Boois | Planned tows are in sheet WGBEAM2a
ICES Data Centre | WGBEAM3 | Biological sampling: ages | Pivot table from CA, all aged fish (age >= 0) | Ingeborg de Boois |
ICES Data Centre | WGBEAM4 | Biological sampling: not aged | Pivot table from CA, not aged information (age = -9) | Ingeborg de Boois |
ICES Data Centre | WGBEAM5** | Map with hauls, only valid hauls | From HH table, real towing positions. Map including ICES rectangles. | Ingeborg de Boois | Options: different colour by gear, by country or by ship
WGBEAM | WGBEAM6 | Distribution plots by age group (sole, plaice) | | |
WGBEAM | WGBEAM7** | Species distribution plots densities | | |

** might be taken from the DATRAS plotting facility directly if it becomes available.
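The pivot tables from CA named above (ALK, WGBEAM3 for aged fish with age >= 0, WGBEAM4 for not aged fish with age = -9) can be sketched in Python as simple grouped counts. The record keys (country, ship, species, length_class, age, number) are hypothetical names for illustration; "number" is assumed to be the count of fish represented by the record, and the ship code and data are invented.

```python
from collections import defaultdict

NOT_AGED = -9  # code for "not aged" used in the product description


def pivot_aged(ca_rows):
    """Counts of aged fish (age >= 0) by country, ship, species, length class and age."""
    table = defaultdict(int)
    for r in ca_rows:
        if r["age"] >= 0:
            key = (r["country"], r["ship"], r["species"], r["length_class"], r["age"])
            table[key] += r["number"]
    return dict(table)


def pivot_not_aged(ca_rows):
    """Counts of sampled but not aged fish (age = -9) by country, ship, species and length class."""
    table = defaultdict(int)
    for r in ca_rows:
        if r["age"] == NOT_AGED:
            key = (r["country"], r["ship"], r["species"], r["length_class"])
            table[key] += r["number"]
    return dict(table)


# Invented example records (ship code "SHIP1" is a placeholder).
example_ca = [
    {"country": "NED", "ship": "SHIP1", "species": "Solea solea",
     "length_class": 250, "age": 2, "number": 3},
    {"country": "NED", "ship": "SHIP1", "species": "Solea solea",
     "length_class": 250, "age": 2, "number": 1},
    {"country": "NED", "ship": "SHIP1", "species": "Solea solea",
     "length_class": 250, "age": 3, "number": 2},
    {"country": "NED", "ship": "SHIP1", "species": "Solea solea",
     "length_class": 260, "age": NOT_AGED, "number": 5},
]

aged = pivot_aged(example_ca)
not_aged = pivot_not_aged(example_ca)
```

A sex-specific variant (SMALK) would add the sex code to the key and keep only records with sex 'M' or 'F'.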


Meanweight at age and Maturity

a) Contributor(s):

b) Product name: Standard report: Stock mean weight at age

c) Origin of the request: [e.g. ICES WG, institute, user(s)] WGBFAS

d) Product use: The product is standard input to cohort analysis, and central calculation will facilitate improved quality assurance and proper documentation, and will relieve workload

e) Surveys involved: BITS, NS-IBTS…

f) Methodology description

The stock mean weight at age is in theory the weighted mean weight at 1 January for each age as it appears in the stock (which is different from the mean weight by age in the commercial catches). Therefore, only first quarter data are used. The calculation is based on information included in the CA records in the exchange format. The mean weight should initially be stratified on stock, year, area, country and quarter (only 1st) and later aggregated to stock using weighting by numbers. It is important that values are available on disaggregated level as well as on stock level in order to be able to scrutinize discrepancies in the different mean weights included in the stock mean weight.

If a plus-group is defined for a given stock, both the individuals stated as belonging to the plus-group and the individuals equal to or older than the plus-group should be treated as plus-group individuals.
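A minimal Python sketch of the calculation described above is given here, under stated assumptions. The record keys (year, area, country, quarter, age, ind_wgt, number) are hypothetical names; all records are assumed to belong to one stock; and "weighting by numbers" is read as weighting by the number of fish in the CA records, which is only one possible reading (survey abundance per stratum could equally be meant).

```python
from collections import defaultdict


def mean_weight_at_age(ca_rows, plus_group=None):
    """Mean weight at age from first-quarter CA records of a single stock.

    Returns (by_stratum, by_stock):
      by_stratum[((year, area, country), age)] = (mean weight, number of fish)
      by_stock[(year, age)] = mean of the stratum values, weighted by numbers
    """
    sums = defaultdict(lambda: [0.0, 0.0])  # [sum of weight * n, sum of n]
    for r in ca_rows:
        # Only first-quarter, aged fish with a recorded individual weight.
        if r["quarter"] != 1 or r["age"] < 0 or r["ind_wgt"] is None:
            continue
        age = r["age"]
        if plus_group is not None and age >= plus_group:
            age = plus_group  # fold older fish into the plus-group
        stratum = (r["year"], r["area"], r["country"])
        acc = sums[(stratum, age)]
        acc[0] += r["ind_wgt"] * r["number"]
        acc[1] += r["number"]

    by_stratum = {k: (w / n, n) for k, (w, n) in sums.items() if n > 0}

    stock_acc = defaultdict(lambda: [0.0, 0.0])
    for (stratum, age), (mean_w, n) in by_stratum.items():
        acc = stock_acc[(stratum[0], age)]
        acc[0] += mean_w * n
        acc[1] += n
    by_stock = {k: w / n for k, (w, n) in stock_acc.items()}
    return by_stratum, by_stock


# Invented example records (weights in grams).
example_ca = [
    {"year": 2014, "area": "A", "country": "X", "quarter": 1, "age": 2, "ind_wgt": 100.0, "number": 1},
    {"year": 2014, "area": "A", "country": "X", "quarter": 1, "age": 2, "ind_wgt": 200.0, "number": 3},
    {"year": 2014, "area": "B", "country": "X", "quarter": 1, "age": 2, "ind_wgt": 300.0, "number": 4},
    {"year": 2014, "area": "B", "country": "X", "quarter": 1, "age": 7, "ind_wgt": 900.0, "number": 1},
    {"year": 2014, "area": "B", "country": "X", "quarter": 3, "age": 2, "ind_wgt": 999.0, "number": 9},
]

by_stratum, by_stock = mean_weight_at_age(example_ca, plus_group=5)
```

Both outputs are returned so that the disaggregated values can be inspected alongside the stock-level value, as the methodology asks.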

g) Bottlenecks: [clearly describe why we couldn’t deal with this request during WKIDP]

h) Follow-up actions:

Action no. | Action description | Action holder | Action before | Status
1 | Developing algorithms | Rainer Oeberst, Vaishav Soni | ???? | Under development
2 | Implementation in DATRAS | Vaishav Soni, Anna Osypchuk | ??? | Pending
3 | Documentation | Vaishav Soni, Anna Osypchuk | ??? | Pending
4 | | | |
5 | | | |

i) Contributor(s):

j) Product name: Standard report: Maturity ogives

k) Origin of the request: [e.g. ICES WG, institute, user(s)]

l) Product use: The product is standard input to cohort analysis, and central calculation will facilitate improved quality assurance and proper documentation, and will relieve workload

m) Surveys involved: BITS, NS-IBTS…


n) Methodology description: The calculation is based on information included in the CA records in the exchange format. Various groups of fish species are assigned to different maturity scales. The stages of each maturity scale must be defined as being mature or immature. This information is obtained from auxiliary sources. The fraction of mature individuals of each age class is to be calculated. The final output is the fraction mature for each age class. The analysis is initially stratified on stock, year, area, country and quarter and later aggregated to stock level. No weighting is applied.
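The maturity ogive calculation can be sketched in Python as follows. Several points are assumptions made only for illustration: the record keys are hypothetical, all records belong to one stock, the stage-to-mature lookup shown is invented (the real definitions come from auxiliary sources), and "no weighting" is read as a simple average of the stratum fractions when aggregating to stock level.

```python
from collections import defaultdict

# Hypothetical maturity scale: stage code -> True if the stage counts as mature.
EXAMPLE_SCALE = {"1": False, "2": True, "3": True}


def maturity_ogive(ca_rows, mature_lookup):
    """Fraction mature at age, by stratum and aggregated to stock level.

    Returns (by_stratum, by_stock):
      by_stratum[((year, area, country, quarter), age)] = fraction mature
      by_stock[(year, age)] = unweighted mean of the stratum fractions
    """
    counts = defaultdict(lambda: [0, 0])  # [number mature, number staged]
    for r in ca_rows:
        # Skip fish that were not aged or whose stage is not in the scale.
        if r["age"] < 0 or r["maturity"] not in mature_lookup:
            continue
        stratum = (r["year"], r["area"], r["country"], r["quarter"])
        acc = counts[(stratum, r["age"])]
        if mature_lookup[r["maturity"]]:
            acc[0] += r["number"]
        acc[1] += r["number"]

    by_stratum = {k: m / t for k, (m, t) in counts.items() if t > 0}

    fractions = defaultdict(list)
    for (stratum, age), frac in by_stratum.items():
        fractions[(stratum[0], age)].append(frac)
    by_stock = {k: sum(f) / len(f) for k, f in fractions.items()}
    return by_stratum, by_stock


# Invented example records.
example_ca = [
    {"year": 2014, "area": "A", "country": "X", "quarter": 1, "age": 2, "maturity": "2", "number": 1},
    {"year": 2014, "area": "A", "country": "X", "quarter": 1, "age": 2, "maturity": "1", "number": 3},
    {"year": 2014, "area": "B", "country": "X", "quarter": 1, "age": 2, "maturity": "3", "number": 3},
    {"year": 2014, "area": "B", "country": "X", "quarter": 1, "age": 2, "maturity": "1", "number": 1},
]

by_stratum, by_stock = maturity_ogive(example_ca, EXAMPLE_SCALE)
```

If pooling all fish before taking the fraction were intended instead, the stock-level step would sum the mature and staged counts across strata rather than averaging the fractions.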

o) Bottlenecks: [clearly describe why we couldn’t deal with this request during WKIDP] None

p) Follow-up actions:

Action no. | Action description | Action holder | Action before | Status
1 | Developing algorithms | Rainer Oeberst, Vaishav Soni | ???? | Under development
2 | Implementation in DATRAS | Vaishav Soni, Anna Osypchuk | ??? | Pending
3 | Documentation | Vaishav Soni, Anna Osypchuk | ¼-2015 | Pending
4 | | | |
5 | | | |


MSFD Extract Product

a) Contributor(s): Simon Greenstreet (UK), Jens Rasmussen (UK)

b) Product name: MSFD Extract Product

c) Origin of the request: [e.g. ICES WG, institute, user(s)]: Initially, Marine Scotland, UK (as part of work for OSPAR common MSFD indicators for D1 + D4)

d) Product use: [what is the main reason for product development]:

The main concept is to provide a common, static extract of DATRAS data in a format that has dealt with the majority of missing data by documented methods for deriving values where they are missing (where possible). The static approach is promoted to ensure a common resource for use in MSFD assessments, and reporting to the EU. It is fully acknowledged that this product is not suitable for many other purposes where the most up-to-date data are required. However, for the purposes of assessment, consistency and a shared resource for development of common and possibly national indicators are deemed of higher importance. Further, there is a need to calculate swept areas and to convert abundance and size data to biomass. (See link to specification below.)

e) Surveys involved: [use DATRAS coding]: Preferably a common format for all surveys, but initially NS_IBTS, Scottish West Coast, Irish Ground fish, Beam Trawl survey, BITS

f) Methodology description (if available): [narrative, try to use DATRAS names of products to be used, like ‘HH Exchange file’, ’HL Exchange file’, ’CPUE per length per haul’]

Separate extract of data. It can be in exchange file-type format or a simple CSV download. The product is only expected to be an extract performed every 3 years. For details on the required fields and challenges, see the accompanying document in the WKIDP SharePoint (DATRAS_MSFDProduct_Spec.doc)

g) Bottlenecks: [clearly describe why we couldn’t deal with this request during WKIDP]. The task is too big for the time available during WKIDP, and it overlaps with IBTSWG, WGBIFS and WGBEAM developments. There is not full consensus on the approach or the speed of development. WKIDP proposed starting the development as a wiki-type approach where the product is developed over time.

It is suggested that a “prototype” format is developed based on the existing product specification, and then made available for discussion. The discussion could be a combination of an online wiki-type approach and an ICES Workshop in autumn 2015 to discuss the use of DATRAS specifically for MSFD. The workshop should be attended by DATRAS experts, MSFD indicator users and the ICES Data Centre. Ideally, a prototype data format would be ready ahead of a workshop in order to have something to relate to. A discussion in WGISUR in January 2015 to prepare the workshop would also be useful.

h) Follow-up actions:


Action no. | Action description | Action holder | Action before | Status
1 | Determine list of required algorithms/calculations for fields in prototype format. Match this with ongoing work in survey working groups. | WKIDP | End of WS? |
2 | Timeline for completion of calculation methods based on Working group progress | WKIDP | End October? |
3 | Prototype detailed specification including “rules” for checks, calculations and filling blanks. | | January, for discussion at ISUR? |
4 | Prototype data extract development | Data Centre | Before MSFD Product Workshop? |
5 | MSFD Product Workshop | ? | Discussion/evaluation of tool/format |
6 | Online wiki/discussion of format | Data Centre? | To coincide with Workshop? |


Annex 5: Flex File structure draft

Fields | Units
RecordType | HH
HH_ID | automated DB reference
Survey | survey acronym
Quarter |
Country | survey country code
Ship | survey ship code
Gear | gear code
HaulNo |
Date | year/month/day
TimeShot | HH:MM
Stratum | depth stratum code
HaulDur | duration in minutes
DayNight | reported code
HaulLat |
HaulLong |
StatRec |
ICES Area | derived field by haul coordinates
Depth | in metres
HaulVal | Haul validity code
DataType | Data type code
DoorSpread | in metres
Calculated DoorSpread | derived field in metres
DS_Flag | Door spread flag
WingSpread | in metres
Calculated WingSpread | derived field in metres
WS_Flag | Wing spread flag
Distance | reported, in metres
Calculated Distance | calculated, in metres
SweptAreaDSKM2 | derived based on Door spread, m2
SweptAreaWSKM2 | derived based on Wing spread, m2

Fields | Units
RecordType | HL
HH_ID | automated DB reference
Species_reported | Latin Name
Species_valid | Latin Name
SpecVal | Species validity code
Sex | sex code
TotalNo | totals per haul, species, sex, category
CatIdentifier | category number
NoMeas | measured per haul, species, sex, category
SubFactor | subsampling factor
LngtClas_in_MM | millimetres
HLNoAtLngt | numbers at length


Fields | Units
RecordType | CA
HH_ID | automated DB reference
Species_reported | Latin Name
Species_valid | Latin Name
AreaCode | Survey-specific area code
LngtClas_in_MM | millimetres
Sex | sex code
National Maturity | National code, if submitted
International Maturity | International code
PlusGr | age plus group, if assigned
Age | in years
IndWgt | in grams
NoAtALK | numbers at age and length
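The derived swept-area fields in the HH part of the draft could be computed along the following lines. This Python sketch is an assumption-laden illustration, not the agreed method: it takes swept area as distance times spread, prefers a reported value over a calculated one when both exist, and returns square kilometres as the field names SweptAreaDSKM2 and SweptAreaWSKM2 suggest (the draft lists m2 as the unit, so the conversion factor is an assumption). The function names are hypothetical.

```python
def prefer_reported(reported, calculated):
    """Use the reported value when present, otherwise fall back to the calculated one."""
    return reported if reported is not None else calculated


def swept_area_km2(distance_m, spread_m):
    """Swept area in km2 from towed distance and gear spread, both in metres.

    Returns None when either input is missing.
    """
    if distance_m is None or spread_m is None:
        return None
    return distance_m * spread_m / 1_000_000.0  # m2 -> km2


# Invented example haul: door spread not reported, so the calculated value is used.
distance = prefer_reported(3700.0, 3650.0)
door_spread = prefer_reported(None, 80.0)
wing_spread = prefer_reported(20.0, 19.5)

swept_area_ds = swept_area_km2(distance, door_spread)
swept_area_ws = swept_area_km2(distance, wing_spread)
```

The DS_Flag and WS_Flag fields would then record which of the two spread values was used for each haul.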