TRANSCRIPT
The Best of Data, the Worst of Data…
Experiences with small-project data collection. D. Gochis, NCAR/RAL
Tales of Joy:
~a dozen field campaigns over the last 10 years, all with different scopes, types of data, levels of complexity, and degrees of multi-disciplinarity…
Tales of Joy:
Attributes of successful efforts…
– Data ‘engineering’ is at the table from day 1
– Methods for data collection, archival, and access are clear from the beginning, using relatively mature technologies
– ‘Raw’ data is archived either in real time or immediately after collection
– Efficient methods to browse or ‘discover’ data within each project archive…critical for the revolving door of students and post-docs
– Data downloads are controlled in a manner that lets the provider help the user interpret the data
Tales of Joy: BEACHON-MEF through RAL Winter Weather site
• Easy navigability/browsing
• Real-time monitoring (…or soon after upload to database)
• Automated data availability reporting
• Password-controlled downloading
• Capitalized on an existing capability
Tales of Joy: EOL/NAME Field Data Catalog
• ‘Big data’ management, but for a disparate group of small projects
• Minimal overhead for data providers
• ‘Perpetual’ access to individual measurements (not just large synthesis datasets) some 10 years after data collection
• Reasonable data browsing/discovery capabilities
Tales of Woe:
~a dozen field campaigns over the last 10 years, all with different scopes, types of data, levels of complexity, and degrees of multi-disciplinarity… yet lacking consistent data plans…
Tales of Woe:
Attributes of ‘painful’ efforts…
– No clear pathway for data (result: to each their own and no one knows what to do)
– No clear archiving strategy (result: lost data)
– No browsability or discoverability component (result: lots of duplicative emails… lots of wasted student time)
– No data format plan (result: umpteen different formats to convert between)
Synthesis of Key Attributes for Managing Small Data:
1. Low barriers to entry for inputting and updating data
2. Utilize data standards (formats, etc.) but provide conversion tools for those standards (getting easy with Python… see the first sketch after this list)
3. Cross-referencing and browsing
4. Automated data availability diagnostics (see the second sketch after this list)
5. Longevity for multiple student generations
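As a sketch of point 2, the kind of conversion tooling the talk says is “getting easy with Python” might look like the following: a small script that converts an ad-hoc station CSV into CF-annotated netCDF. The file names, column names, units, and metadata here are hypothetical placeholders, not from the talk.

```python
# Minimal sketch: convert ad-hoc CSV station data to a standard
# (CF-style netCDF) format. Requires pandas and xarray, plus a netCDF
# backend such as netCDF4. All names below are illustrative assumptions.
import pandas as pd
import xarray as xr

def csv_to_netcdf(csv_path, nc_path):
    """Read a station CSV with a 'time' column and write CF-annotated netCDF."""
    df = pd.read_csv(csv_path, parse_dates=["time"])
    df = df.set_index("time")

    ds = xr.Dataset.from_dataframe(df)

    # Attach minimal CF-style metadata so downstream users can interpret
    # variables without emailing the data provider.
    if "precip_mm" in ds:
        ds["precip_mm"].attrs = {"standard_name": "precipitation_amount",
                                 "units": "mm"}
    ds.attrs["title"] = "Example small-project station archive"

    ds.to_netcdf(nc_path)

csv_to_netcdf("station_001.csv", "station_001.nc")
```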
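And a sketch of point 4, automated data availability diagnostics: a script that scans a project archive for expected daily files and reports gaps, so missing data are caught soon after collection rather than years later. The one-file-per-station-per-day naming convention and directory layout are assumptions for illustration.

```python
# Minimal sketch: automated data-availability reporting over an archive.
# Assumes one file per station per day named like STATION_YYYYMMDD.nc;
# that convention is illustrative only.
from datetime import date, timedelta
from pathlib import Path

def report_missing(archive_dir, station, start, end):
    """Print each expected daily file that is absent from the archive."""
    day = start
    while day <= end:
        fname = Path(archive_dir) / f"{station}_{day:%Y%m%d}.nc"
        if not fname.exists():
            print(f"MISSING: {fname.name}")
        day += timedelta(days=1)

report_missing("archive", "station_001", date(2012, 6, 1), date(2012, 6, 30))
```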
Final thought:
• A lot of emphasis on ‘Big Data’ these days… but an overwhelming amount of research, particularly NSF-funded research, comes from small projects
• Most ‘Big Data’ efforts already have substantial engineering support (Rich Get Richer…)
• The new White House directive is a game-changer for small projects
• The biggest impact on the community would be the development and availability of tools for small projects to archive and access their data