CDI: DATA MANAGEMENT WORKING GROUP
Heather Henkel and Viv Hutchison
Outline
Data Management in USGS: Why?
Evolution of the Data Management Working Group
Monthly Presentations
Sub-Team Accomplishments
Powell Center Proposal and Alternative Results
FY12 Proposals
Data Management in USGS: Why?
About the Data Rescue Program: “It is both a great and terrible thing that we have such a program at the USGS” (J. Faundeen, August 16, 2011)
CDI recognizes: good data management is a prerequisite for data integration.
Data Management in USGS: Why?
Credit: DataONE
Data Management Working Group Goals
The Data Management Working Group will:
seek mechanisms for incorporating data management into USGS science
develop ways to educate USGS scientists about the value of data management
The group seeks to elevate the practice of data management so that it is seen as a critical partner in the pursuit of science in the USGS
What is Data Management?
“The business function that develops and executes plans, policies, practices and projects that acquire, control, protect, deliver and enhance the value of data and information.”
Source: DAMA Dictionary of Data Management, 1st Ed.
Evolution of the Data Management Working Group
2010 CDI Meeting: formation of group
Monthly telecons with ~50 working group participants from across the USGS Mission Areas and partner agencies
Data Management Working Group Wiki
my.usgs.gov/confluence/display/cdi/Data+Management+Working+Group
Monthly meeting notes, presentations, sub-teams, membership information
DMWG: Presentations
Basics of Using Mendeley - Natalie Latysh, USGS
EROS Scientific Records Appraisal Process - John Faundeen, USGS
Ocean Biodiversity Information System (OBIS) - Phillip Goldstein, University of Colorado-Boulder
USGS Professional/Profile Pages and SharePoint - James Sayer, USGS
DMWG: Presentations
National Geological and Geophysical Data Preservation Program's Best Practices for Data Preservation Project - Brian Buczkowski, USGS
Data Dissemination through Cloud Computing: the Next Generation of Data.gov - Ray Obuch, USGS
USGS Survey Manual Policy Development and Status on Policy of Interest to the CDI - Carolyn Reid, USGS
Presentation on Data Basin - Denny Grossman, Jim Strittholt, and Brendan Ward, Data Basin
DMWG: Monthly Meetings and Presentations
Topics covered during CDI DM WG calls:
Charter for group
Powell Center proposal
Coordination with the Tech Stack Group
Development of three sub-teams (Policy (RGE/EDGE), Best Practices, Data Management Workshop/Meeting)
Discussion of abilities and specialties among CDI DM working group members
Encouraging people to create USGS Professional Pages to highlight data management work and experience
FY12 proposals
DMWG Sub-team: Data Policy
DMWG Sub-Team: Data Policy Goals
Work toward formal incorporation of data management into the Survey Manual
Explore opportunities to relate good data management to the Research Grade Evaluation (RGE) and Equipment Development Grade Evaluation (EDGE)
Partner with the Office of Science Quality and Integrity to:
review existing USGS policies on data management
help write new policies
provide feedback to RGE/EDGE processes
DMWG Data Policy Sub-Team: Survey Manual Chapters
New policy chapters in progress, in varying stages of review:
Survey Manual Chapter 502.x – Fundamental Science Practices: Metadata for Datasets and Information Products
Survey Manual Chapter 502.x - Fundamental Science Practices: Safeguarding Unpublished U.S. Geological Survey Data and Information
Survey Manual Chapter .XXX - Release of Computer Databases and Computer Programs
DMWG Data Policy Sub-Team: RGE Review
Where and how do we enable our scientists to report data management activities in RGE?
Actions:
Attended peer-review panel (March 2011)
Reviewed USGS RGE Enhancement Team charter (2007)
Interviewed RGE scientists
Reviewed RGE/EDGE guides/evaluation forms
Presented to the RGE Panel (August 4, 2011)
DMWG Data Policy Sub-Team
Questioned 4 RGE scientists about data management practices and needs
Short-term result: Input form created to discover and track data management needs of scientists
DMWG Data Policy Sub-Team
“Never quite sure a dataset is the most recent. Some datasets don't have an author/creator listed so tracking down if it is the most recent is a challenge.”
“Our data are located all over and it is a real effort just to locate data.”
“It should be obvious to a researcher how to cite a dataset. Peer-reviewed papers that point to how scientists find out about a dataset are critical.”
The sub-team learned about scientists' needs and could make recommendations to the RGE Panel based on its findings.
DMWG Data Policy Sub-Team: Recommendations to RGE Panel
Make it easier for USGS scientists to do data management (CDI)
Incentivize good data management through RGE (RGE)
Make it easier to document data (CDI)
Allow easier reporting of data management in RGE by modifying self-evaluation documents (RGE)
Develop criteria for the RGE panel to recognize and reward good data management (CDI-RGE)
DMWG Data Policy Sub-Team: Next Steps
Continue work with RGE Coordinators on recommendations and on feedback from their input
Explore opportunities to include informatics professionals in the RGE/EDGE process
Assist in bringing Survey Manual chapters to publication
Assist in the ‘refresh’ of the new-scientist Orientation Checklist and Exit Survey
Review the Survey Manual for relevant data management language
DMWG Sub-team: Data Best Practices
DMWG Sub-team: Data Best Practices Goals
The Best Practices Sub-team was formed in early 2011 to:
compile a suite of best practices, lessons learned, and learning opportunities regarding data management
organize this information and make it available through a website or portal
Participants: John Faundeen (Lead), Brian Buczkowski, Tom Burley, Jennifer Carlino, Robin Fegeas, Dave Govoni, Heather Henkel, Sally Holl, Donn Holmes, Richard Huffine, Viv Hutchison, Tim Kern, Tim Mancuso, Elizabeth Martin, Scott McEwen, Ellyn Montgomery, Cassandra Ladino, Daniel Sandhaus, Steve Tessler, Jessica Thompson, Lisa Zolly, Joseph Kalfsbeek
Work Approach
Monthly Webex sessions: March 9, April 6, May 4, June 1, July 6, August 3
Workshop: July 26-27, Reston
Beginning Steps
Step 1: Data Lifecycle Model…
Develop/adopt a data lifecycle model that accurately reflects how USGS science data does or should travel through its life.
Foundational to sub-team goals: simplicity, intuitiveness, identification of roles
“As the government looks to its plan for open government through the development of tools such as Data.gov, it is important to integrate these tools into the overall federal architecture and project lifecycle.”
Harnessing the Power of Digital Data: Taking the Next Step. Scientific Data Management (SDM) for Government Agencies: report from the Workshop to Improve SDM held June 29 – July 1, 2010, Washington, DC.
Work Item: Data Life Cycle Model
“Literature Search”
Compilation
Review
NSF Workshop
USGS Workshop
Guidance
“The business function that develops and executes plans, policies, practices and projects that acquire, control, protect, deliver and enhance the value of data and information.”
Source: DAMA Dictionary of Data Management, 1st Ed.
Draft Data Lifecycle Model
Stages: PLAN, ACQUIRE & PROCESS, ANALYZE, PRESERVE, PUBLISH/SHARE
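A minimal sketch of how the draft stages could be encoded so that tools can ask where a dataset goes next. It assumes the stages are read in the order listed and that the model is a cycle (Publish/Share feeding back into Plan); the draft graphic does not confirm either point. The names `Stage`, `next_stage`, and `remaining_stages` are hypothetical.

```python
from enum import Enum
from typing import List


class Stage(Enum):
    """Stages of the draft data lifecycle model, in the order listed on the slide."""
    PLAN = "Plan"
    ACQUIRE_PROCESS = "Acquire & Process"
    ANALYZE = "Analyze"
    PRESERVE = "Preserve"
    PUBLISH_SHARE = "Publish/Share"


# Enum members iterate in definition order, so this list fixes the stage sequence.
ORDER: List[Stage] = list(Stage)


def next_stage(stage: Stage) -> Stage:
    """Return the stage that follows `stage`.

    Assumption: the model is cyclic, so Publish/Share wraps around to Plan.
    """
    i = ORDER.index(stage)
    return ORDER[(i + 1) % len(ORDER)]


def remaining_stages(stage: Stage) -> List[Stage]:
    """List the stages still ahead of a dataset currently in `stage` (no wrap-around)."""
    return ORDER[ORDER.index(stage) + 1:]


if __name__ == "__main__":
    print(next_stage(Stage.PLAN).value)            # Acquire & Process
    print(next_stage(Stage.PUBLISH_SHARE).value)   # Plan
    print([s.value for s in remaining_stages(Stage.ANALYZE)])  # ['Preserve', 'Publish/Share']
```

Keeping the order in one list means that a later revision of the model (for example, splitting Acquire and Process) changes a single definition rather than every tool that uses it.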
CDI Data Blast Poster
“Write-On” Poster
DMWG Sub-team: Data Best Practices Next Steps
Digest Data Blast comments
Receive CDI sponsor feedback
Assign roles to model
Finalize graphic
Establish science review
FY12: Validation; beginning of outreach effort
Communicate final model to USGS
Start aligning best practices to model
Determine gaps
Funded FY11 Data Management Projects
Group convened in December 2010 to put together a data management proposal to the Powell Center: Heather Henkel, Sally Holl, Viv Hutchison, Steve Tessler, Jessica Thompson, Lisa Zolly
Proposal not funded, but the Powell Center instead supported funding it at a higher (enterprise) level
Proposal modified and resubmitted
June 20th: funding received from Core Science Systems (CSS) and CDI
Work begun in July
Funded Data Management Projects
Creation of data management website:
Provide one place for best practices, tools, education, key points, recommended reading, and checklists
Internal initially, with plans to expand to an external site in FY12
Funded Data Management Projects
Categorization of existing data management materials:
Creation of bibliography
Content for website
Purchase of enterprise-wide licenses for:
DAMA Dictionary of Data Management
DAMA Data Management Body of Knowledge
Funded Data Management Projects
DM training for team:
Expose team to the same DM background
Build upon the same core training
Intent to provide focused DM training to others, based upon the initial training
DM education products:
Educate about and encourage data management practices
Repurpose existing materials created through DataONE
Make available on website
DM planning tool:
Template to guide users through the creation of a DM plan
Build upon existing work done by DataONE and USGS
FY12: Proposals from the Working Group
Moving Forward
Proposals requested from anyone within the CDI-DM working group
Initial discussion during monthly telecons
20 proposals submitted
Presentation of draft submissions during Tuesday afternoon’s working group session
Work done on combining similar proposals and tasks
Identification of cross-cutting tasks
Creation of slides for this presentation
FY12 Proposals
Data Management Website (Phase 2)
Summary: Well-managed data is critical to data integration. Enhancements to the Phase 1 data management website will provide USGS researchers with the information they need to implement data management practices in their work.
Deliverables: An internal (eventually public-facing), usability-tested data management website that underscores the Bureau’s understanding of the importance of data management. USGS researchers will have easy access to the standards, tools, and best practices that ensure adherence to good data management practices.
Data Management Framework
Summary: Well-managed data is critical to data integration. With a framework that USGS researchers can use to guide their data from planning through preservation, the USGS can offer better access to data that are ready for integration.
Deliverables: A cross-Mission Area, agreed-upon framework for standardizing data management planning, ultimately resulting in improved access to and integration of research data products. Outreach and training materials will accompany the framework to help communicate it to USGS scientists and science managers.
USGS Science Center Adaptable Data Management Plan Framework
Summary: Devise an adaptable baseline framework for science center data management plans (DMPs):
Conduct analysis of existing USGS and external DMPs
Address project, data, and business model variations
Refine and test the proposed framework through implementations at the Alaska (Integrated) Science Center and the Texas Water Science Center
Deliverables:
Publish a wiki version of the DMP framework to encourage future participation and development
Validate Data Life Cycle Model
Summary: Because this model is intended to be the conceptual foundation from which our data management best practices, tools, policies, and procedures will emanate, it is vital that it be reviewed extensively… Directly engage our scientists & management
Deliverables:
Formal science review (all Regions & Mission Areas)
USGS-wide opportunity to comment through the Data Management website
Town hall sessions (Reston & Denver, the locations with the largest numbers of USGS staff)
Communicate final model to the Bureau (outreach element)
Data Preservation Mechanisms for USGS Researchers
Summary: Identify and provide information to USGS researchers about available data preservation mechanisms they can use and where they can submit their critical data for preservation.
Deliverables: Summary reports of potential data preservation mechanisms and of what USGS researchers need in order to participate in data preservation activities.
A data preservation webpage on the USGS Data Management website.
Identification of the most feasible data preservation mechanism(s) that USGS researchers can use to preserve their critical data, with information on how to participate in those efforts.
National Vegetation Classification Standard
Summary: In this proposal we identify three possible sub-tasks related to implementing the National Vegetation Classification Standard (FGDC 2008). The content for the National Vegetation Classification (NVC) is currently being developed through a variety of FGDC and Ecological Society of America Vegetation Panel efforts. Each sub-task is a critical component of the full cyber-infrastructure needed to support the standard. Prototypes currently exist for several of these components, but each was developed independently, and they ultimately need to be linked in a common framework.
National Vegetation Classification Standard (cont.)
Deliverables:
Vegetation classification – interim database design
Supports content (community types and descriptions) being developed through grant funding, and links to the NVC website.
Vegetation Plot Database – migration plan
Provide a centralized database of vegetation plots (currently housed at NCEAS – VegBank) and linkages to existing plot databases in partner agencies (distributed network)
Peer Review Infrastructure – workplan
Prototype software exists; a data management workflow and document management system is needed.
Data Mgt – CHA CHA “like” Proposal
Summary: A text-, Internet-, and chat-based service for rapid response to, and networking around, USGS data management questions, activities, and support.
Deliverables:
Network of <10 USGS data managers
Mechanisms to submit questions by text, e-mail, chat, and web Q&A
Integration with the USGS Data Management site
Mobile submission application
Training for CHA CHA experts
Promotion/outreach/education materials
9-month evaluation of effectiveness, next steps, etc.
Outcomes:
Network of USGS data managers to support data management
Development of architecture & services for a USGS CHA CHA service
An easy-to-use, multi-channel submission method for rapid response to USGS data management questions & issues
More effective data management practices, greater awareness, and better leveraging of expertise
Data Integration Potential from Linking Monitoring Protocols
Summary: Efforts to identify, collect, and characterize online monitoring protocol libraries will provide a valuable reference resource to USGS scientists and foster coordinated science and integration opportunities.
Deliverables: Centralized access, through the Data Management website, to existing tools that collect documented monitoring protocols
Leverage existing resources, expertise, technology, and content of existing efforts such as the Natural Resources Monitoring Partnership, Pacific Northwest Aquatic Monitoring Partnership, and National Environmental Methods Index
Common elements identified that will enable interoperability among the systems
Leverage the Data Management website as a mechanism for collecting USGS scientists' needs for specific protocols and for promoting additional content into the monitoring library resources
Quick Response Team to Web-enable Data
Summary: High-level (OSTP, NSTC, DOI Secretary, USGS Director, USGS Assistant Director) initiatives require a timely response from the agency with relevant data and tools. Mobilizing a team of a metadata creation expert and a Web/map service IT expert will help scientists address these data requests in a timely manner and demonstrate USGS relevance and competency in meeting the information needs of the Department and higher levels.
Deliverables: A process for handling data-release activities that could be transferred to other data management activities under development
An undetermined number of datasets made broadly available for a specific purpose, with ancillary benefits showcased
Leverage development of the GOS to Data.gov migration to develop a process that will be sustainable for future initiatives
Leverage thesauri, other metadata standards, and existing tools
Leverage the Document Production process of the Records Management Office
Develop A Data Standard Process For USGS, Using A TIME Standard As The Pilot
Summary: Data standards are generally lacking across the USGS landscape, and inconsistencies in how we name, describe, and populate various common data elements are impediments to effective data integration. There is currently no process in place for establishing a data standard.
How we name TIME fields and characterize our temporal data is critical to fostering data integration across the enterprise. TIME is also not a simple data element: temporal data can represent a full date-time; only a year, month, or day; a time interval (range); or a timestamp in a data system. The concept of ‘valid time’ (the time interval over which a value is valid) also needs to be considered.
Deliverable: Establish a formal process for proposing, evaluating, approving, and implementing a data standard within the USGS. TIME, a ubiquitous data element made up of date and time components, can serve as the pilot data standard.
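To make the point about the many forms of TIME concrete, here is a minimal Python sketch of one way a TIME element could carry both a value and its recorded precision, plus an interval type that could hold a ‘valid time’. The names `Granularity`, `TimeValue`, and `TimeInterval` are hypothetical illustrations, not a USGS standard; writing reduced-precision ISO 8601 strings is one plausible convention a pilot might evaluate, and the `..` open-end marker is an assumed convention borrowed from EDTF.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum
from typing import Optional


class Granularity(Enum):
    """Hypothetical precision levels a TIME value may carry."""
    YEAR = "year"
    MONTH = "month"
    DAY = "day"
    DATETIME = "datetime"


@dataclass(frozen=True)
class TimeValue:
    """A point in time stored together with the precision at which it was recorded."""
    value: datetime
    granularity: Granularity

    def to_iso(self) -> str:
        # Write only the precision actually known (ISO 8601 reduced precision),
        # so a year-only value is not silently turned into January 1.
        if self.granularity is Granularity.YEAR:
            return f"{self.value.year:04d}"
        if self.granularity is Granularity.MONTH:
            return f"{self.value.year:04d}-{self.value.month:02d}"
        if self.granularity is Granularity.DAY:
            return self.value.date().isoformat()
        return self.value.isoformat()


@dataclass(frozen=True)
class TimeInterval:
    """A time range, e.g. a sampling period or the 'valid time' of a value."""
    start: TimeValue
    end: Optional[TimeValue] = None  # None means open-ended (still valid)

    def to_iso(self) -> str:
        # ISO 8601 separates interval endpoints with "/"; ".." for an open end
        # follows the EDTF convention and is an assumption here.
        end = self.end.to_iso() if self.end is not None else ".."
        return f"{self.start.to_iso()}/{end}"


if __name__ == "__main__":
    annual = TimeValue(datetime(2011, 1, 1), Granularity.YEAR)
    print(annual.to_iso())  # 2011
    workshop = TimeInterval(TimeValue(datetime(2010, 6, 29), Granularity.DAY),
                            TimeValue(datetime(2010, 7, 1), Granularity.DAY))
    print(workshop.to_iso())  # 2010-06-29/2010-07-01
```

Storing the granularity explicitly is the design choice that matters here: two datasets that both hold "2011" can then be recognized as year-level values and aligned deliberately, rather than compared as if both were January 1, 2011.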
Write A ‘How To…’ Publication On How To Identify And Resolve Issues Involving Non-uniform (Mixed) Time Scales When Integrating Data For Research Use
Summary: Facilitate best practices for the integration of data in temporal dimensions.
Deliverables/Work: Organize subject matter experts to outline and discuss the problems, existing solutions, and use cases at project and program levels.
Prepare a publication on how to identify and resolve these issues in order to use data from various sources and studies for USGS research.
Write A ‘How To…’ Publication On How To Identify And Resolve Issues Involving Non-uniform (Mixed) Spatial Scales When Integrating Data For Research Use
Summary: Facilitate best practices for the integration of data in spatial dimensions.
Deliverables/Work: Organize subject matter experts to outline and discuss the problems, existing solutions, and use cases at project and program levels.
Prepare a publication on how to identify and resolve these issues in order to use data from various sources and studies for USGS research.
Survey of USGS Scientists about DM
Summary: To inform our future actions to assist USGS in managing its data, a survey of current practices will help us identify where USGS is performing well and where gaps may exist that we can work to close.
Deliverables: A USGS survey, adapted from the DataONE survey of scientists, with results compiled and analyzed.
Data Exit Survey for USGS Scientists
Summary: To prevent the loss of information about data held by departing employees, our administrative processes need an exit interview focused on data.
Deliverables: A USGS exit survey/interview, given to exiting employees, that asks questions such as “Has your data been archived?”, “Is the metadata complete?”, and “Where is the data located?”
Thank you!
Questions? Comments?
Titles of Proposals
Data Management Website (Phase 2)
Data Management Framework for USGS
USGS Science Center Adaptable Data Management Plan Framework
Validate Data Life Cycle Model
Data Preservation Mechanisms for USGS Researchers
National Vegetation Classification Standard
Data Mgt – CHA CHA “like” Proposal
Data Integration Potential from Linking Monitoring Protocols
Quick Response Team to Web-enable Data
Develop A Data Standard Process For USGS, Using A TIME Standard As The Pilot
Write A ‘How To…’ Publication: Time Scales
Write A ‘How To…’ Publication: Spatial Scales
Survey of USGS Scientists about DM
Data Exit Survey for USGS Scientists