Introduction
• The IMBER Scientific Steering Committee (SSC) created a Data Management Committee (DMC), which has made three major proposals:
– Educate researchers and change the negative ethos around data management (DM)
– Help you to do DM by creating a cookbook of simple guidelines
– Advocate that every IMBER project or cruise identify a Data Scientist to help with DM
Writing papers
• Writing papers is an essential part of a researcher’s job
• Writing papers is time-consuming
• Writing papers is tedious/boring
• Writing papers needs attention to detail
• Publications are a legacy of your research
Data management
• Data management is an essential part of a researcher’s job
• Data management is time-consuming
• Data management is tedious/boring
• Data management needs attention to detail
• Data sets are a legacy of your research
– potentially more objective than your publications, and available for re-interpretation
So why do we accept that we must write papers, but treat DM as the poor relation?
• Because we get recognition for publishing
• “Publish or perish”
• But we don’t get recognition for DM
• Let’s try to change that
Recognition for DM
• Carrots and sticks
• Stick
– No more funding
• Carrots
– Referenceable data sets using DOIs (Digital Object Identifiers), a SCOR/IODE initiative
– Give help with DM: develop a cookbook
– Data Scientist will look good on your CV
Example 1
Example of a data spreadsheet submitted to the British Oceanographic Data Centre (BODC)
Good filename
Good data set title
Good overall data organisation
Good explicit column headers
Use consistent data format – use a number to indicate a missing numerical value and use text to indicate a missing text value
Avoid blank cells unless the value is missing
Do not mix characters and numbers in the same field
Avoid free text for dates – best to use separate columns for year, month, day.
Definitions need to be explicit
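Several of the Example 1 guidelines can be checked automatically before a spreadsheet is submitted. The sketch below is a minimal illustration in Python and not a BODC tool: the missing-value codes (-999 for numbers, "NA" for text), the column names and the sample rows are all assumptions made for this example.

```python
import csv
import io

# Assumed conventions for this sketch (not prescribed by the source):
MISSING_NUMBER = "-999"  # number used to flag a missing numerical value
MISSING_TEXT = "NA"      # text used to flag a missing text value


def is_number(cell):
    """Return True if the cell parses as a number."""
    try:
        float(cell)
        return True
    except ValueError:
        return False


def check_table(text, numeric_columns):
    """Return a list of (row_number, column, problem) tuples."""
    problems = []
    reader = csv.DictReader(io.StringIO(text))
    for row_number, row in enumerate(reader, start=1):
        for column, cell in row.items():
            cell = cell.strip()
            if cell == "":
                # Blank cells are ambiguous: suggest the explicit missing code.
                code = MISSING_NUMBER if column in numeric_columns else MISSING_TEXT
                problems.append((row_number, column, f"blank cell, use {code} if missing"))
            elif column in numeric_columns and not is_number(cell):
                # Characters and numbers are mixed in the same field.
                problems.append((row_number, column, "text in numeric column"))
    return problems


# Hypothetical sample with separate year, month and day columns.
SAMPLE = """station,year,month,day,temperature_degC,remarks
1,2010,6,14,12.4,NA
2,2010,6,14,,NA
3,2010,6,15,approx 13,calm sea
4,2010,6,15,-999,NA
"""

NUMERIC = {"station", "year", "month", "day", "temperature_degC"}
for problem in check_table(SAMPLE, NUMERIC):
    print(problem)
```

In this sample the check reports the blank temperature in row 2 and the mixed text and number in row 3; the explicit -999 in row 4 passes because it is a consistent numeric code.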
Station  Lon        Lat       Time   SPEED
1        -69.30732  39.86233  7:00
2        -68.93825  38.70241  8:00   29.21
3        -68.54282  37.30523  9:00   34.85
4        -67.96285  35.5917   10:00  43.42
5        -66.56567  33.1664   11:00  67.18
6        -66.11751  32.45462  12:00  20.19
7        -67.54106  34.58994  13:00  61.59
8        -65.03667  30.87291  14:00  107.57
9        -64.11399  30.84654  15:00  22.15
10       -63.56039  31.37378  16:00  18.35
11       -65.64299  34.53722  18:00  45.45
12       -67.35653  38.46515  19:00  102.85
13       -60.89783  38.14881  19:15  620.78
14       -67.67287  39.41418  20:00  220.55
15       -68.25284  40.38957  21:00  27.23
Example guideline 2
• Map stations
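Mapping the stations is a visual check; a numerical companion is to compute how fast the ship would have had to travel between consecutive stations. The sketch below is an illustration only, under stated assumptions: positions and times are copied from the station table above, all times are taken to fall on the same day, distances use the haversine formula on a spherical Earth, and the rule "flag any leg faster than three times the median" is a hypothetical threshold chosen for this example.

```python
import math
from statistics import median

# Station, longitude, latitude and time (h:mm) copied from the example table.
STATIONS = [
    (1, -69.30732, 39.86233, "7:00"),
    (2, -68.93825, 38.70241, "8:00"),
    (3, -68.54282, 37.30523, "9:00"),
    (4, -67.96285, 35.5917, "10:00"),
    (5, -66.56567, 33.1664, "11:00"),
    (6, -66.11751, 32.45462, "12:00"),
    (7, -67.54106, 34.58994, "13:00"),
    (8, -65.03667, 30.87291, "14:00"),
    (9, -64.11399, 30.84654, "15:00"),
    (10, -63.56039, 31.37378, "16:00"),
    (11, -65.64299, 34.53722, "18:00"),
    (12, -67.35653, 38.46515, "19:00"),
    (13, -60.89783, 38.14881, "19:15"),
    (14, -67.67287, 39.41418, "20:00"),
    (15, -68.25284, 40.38957, "21:00"),
]

EARTH_RADIUS_KM = 6371.0
SPEED_FACTOR = 3.0  # hypothetical: flag legs faster than 3x the median speed


def hours(clock):
    """Convert an h:mm string to decimal hours (same-day times assumed)."""
    h, m = clock.split(":")
    return int(h) + int(m) / 60.0


def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in km between two positions given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def leg_speeds(stations):
    """Return (arrival_station, km_per_hour) for each consecutive pair."""
    speeds = []
    for (_, lon1, lat1, t1), (s2, lon2, lat2, t2) in zip(stations, stations[1:]):
        elapsed = hours(t2) - hours(t1)
        speeds.append((s2, haversine_km(lon1, lat1, lon2, lat2) / elapsed))
    return speeds


def suspicious_legs(stations, factor=SPEED_FACTOR):
    """Return arrival stations of legs much faster than the median leg."""
    speeds = leg_speeds(stations)
    limit = factor * median(v for _, v in speeds)
    return [s for s, v in speeds if v > limit]


print(suspicious_legs(STATIONS))
```

With the table's values this flags the legs arriving at stations 13 and 14, that is, the jump out to station 13 and back again, which is the kind of position or time error that a map of the stations would also expose.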
Some suggestions
• Talk to a Data Centre from the start
• Fill in Metadata right from the start
• Delegate someone to help right from the start
• Follow the guidelines in the cookbook
• Maintain an Event Log during a cruise
• Take regular copies of notes and data
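An event log need not be elaborate. As a minimal sketch, assuming a plain CSV layout with hypothetical fields (UTC time, station, instrument, action, operator, comment) that are not taken from the cookbook, each deployment or sampling event could be appended as one row:

```python
import csv
import io
from datetime import datetime, timezone

# Hypothetical column set for this sketch; a real cruise would agree its own.
FIELDS = ["time_utc", "station", "instrument", "action", "operator", "comment"]


def log_event(stream, station, instrument, action, operator, comment="", when=None):
    """Append one event as a CSV row; write the header if the stream is empty."""
    when = when or datetime.now(timezone.utc)
    writer = csv.DictWriter(stream, fieldnames=FIELDS)
    if stream.tell() == 0:
        writer.writeheader()
    writer.writerow({
        "time_utc": when.strftime("%Y-%m-%dT%H:%M:%SZ"),
        "station": station,
        "instrument": instrument,
        "action": action,
        "operator": operator,
        "comment": comment,
    })


# Demonstration with an in-memory buffer and made-up entries.
log = io.StringIO()
log_event(log, 1, "CTD", "deployed", "AB",
          when=datetime(2010, 6, 14, 7, 0, tzinfo=timezone.utc))
log_event(log, 1, "CTD", "recovered", "AB", comment="bottle 3 leaked",
          when=datetime(2010, 6, 14, 7, 45, tzinfo=timezone.utc))
print(log.getvalue())
```

Recording the time in UTC and keeping one row per event makes it straightforward to match the log against instrument files later; copying the log file off the ship's computer at regular intervals covers the last suggestion in the list.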
The bottom line
• DM cannot be an afterthought
• If you give DM some thought when you first plan a project, it will be
– relatively straightforward
– not too much effort
– remarkably useful to all participants
– valuable to those who come after
• Help is at hand: talk to a Data Centre right at the start
So, what is a Data Scientist?
• The Data Scientist is someone who helps and advises the project/cruise Principal Scientist and researchers to document their data sets so that they are properly described
• The DS also interacts with PIs and Data Specialists to calibrate, validate, save and archive data
• Why is it FUN? Because you learn so much yourself by having to talk to people
• The role can be full-time or part-time, paid or unpaid; you can hire someone, cajole someone or find a volunteer
What does the DS gain?
• Broadening your experience, learning from other PIs
• Advancing your own DM skills
• Great management training! (listening to others, looking for problems)
• Looks great on your CV
• You might even get paid
IMBER cookbook
• Draft version by Christmas
• Find it online via the IMBER web site
– Click on Data/How to do?
• We will circulate it widely and seek your comments
• Create a downloadable version plus supporting templates etc. (to take to sea)
IMBER cookbook demo
Summary
• Plan DM right at the start and allocate funds (5-10%, includes Data Centre time)
• Help us get the cookbook right
• Follow the guidelines in the cookbook
• Appoint/delegate/cajole somebody to be Data Scientist on a major project/cruise
– Both he/she and you will benefit