DataNet: Infrastructure to Connect Data, People, and Science
Mission: Lower barriers to conducting interdisciplinary human-environment interactions research by making data with different formats from different scientific domains easily interoperable.
Data, Tools & Services
Source Data
• Census microdata and aggregate data
• Land use/land cover and climate data
• Other population and environmental data
Data Integration
Methods to integrate diverse data using spatial location and geographic boundaries to link data contents
Web-based Data Access System
• Explore available data and metadata
• Select variables of interest
• Merge data from different source datasets and formats
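As a rough illustration of linking data through geographic boundaries, the sketch below averages gridded values within each geographic unit and attaches the result to individual records from the same unit. It is a minimal example under stated assumptions, not the project's actual method: the sample values, the cell-to-unit lookup, and the function names `summarize_by_unit` and `attach_context` are all hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical gridded data: (row, col) cell -> value (e.g. a climate measure).
raster = {(0, 0): 10.0, (0, 1): 12.0, (1, 0): 20.0, (1, 1): 22.0}

# Hypothetical lookup assigning each grid cell to a geographic unit.
cell_to_unit = {(0, 0): "A", (0, 1): "A", (1, 0): "B", (1, 1): "B"}

# Hypothetical microdata: one record per person, tagged with a unit.
people = [
    {"person_id": 1, "unit": "A", "age": 34},
    {"person_id": 2, "unit": "B", "age": 51},
]


def summarize_by_unit(raster, cell_to_unit):
    """Average the cell values that fall inside each geographic unit."""
    values = defaultdict(list)
    for cell, value in raster.items():
        unit = cell_to_unit.get(cell)
        if unit is not None:  # skip cells outside every unit
            values[unit].append(value)
    return {unit: mean(vals) for unit, vals in values.items()}


def attach_context(records, summaries, name):
    """Add the area-level summary for each record's unit as a new field."""
    return [{**rec, name: summaries.get(rec["unit"])} for rec in records]


area_summary = summarize_by_unit(raster, cell_to_unit)
linked = attach_context(people, area_summary, "mean_value")
```

Here the geographic unit serves as the shared key, so the same pattern works in either direction: gridded data can be summarized to areas, and area-level summaries can be attached to individual records.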
Human Networks
Development System Testing
Opportunities to explore pre-release versions and provide feedback at conferences, including AGU (contact [email protected] to participate)
Development Community
Feedback through surveys and beta testing. Sign up at www.terrapop.org
Mission: Support the “long tail” of research through an environment with low barriers to deposit, active and social curation, and links to existing preservation infrastructure for long-term access.
Data, Tools & Services
Social Networking Environment
VIVO instance with researcher profiles, publications and data citations for discovery of expertise, publications, and data with network visualizations
Active Content Repository
Storage for data and metadata undergoing active use with capabilities for deposit, metadata extraction, previewing, tagging and social curation
Virtual Archive
Distributed storage for long-term archiving and dissemination of ‘finished’ data products in institutional repositories and topical archives
Human Networks
Active and Social Data Curation
Tools for incorporating community-generated tags, annotations, assessments, and repurposing notes in metadata and for identification and generation of archival data packages
Science Community Networking
Compiling connections among individual scientists, research teams, publications, source datasets and derived datasets and tools for traversing the network to discover related people and work
Mission: Enable collaborative research through policy- and standards-based federation of existing data management infrastructure.
Data, Tools & Services
iRODS Data Grids
Sharable collections of remotely located datasets managed by policies that automate administrative tasks, validation, and federation
Workflow Integration
Capture processes applied to data to support documentation, repeatability, sharing, and re-execution
Interoperability Mechanisms
Enable access to community resources using their protocols and register remote data into collaboration environments
Human Networks
Collaboration Environments
Enable groups of researchers to access common datasets, workflows, and relationships between data and workflows
Educational Access to Live Data
Support controlled access to collections of data allowing students to build personal reference collections and perform defined data management and analysis tasks
Mission: Develop an institutional solution for the collection, preservation and re-use of data; encourage collaboration by enabling researchers to find someone else’s data products and assess their potential for re-use and re-combination.
Data, Tools & Services
Data Conservancy Service and Reference UI
• Robust ingest framework
• Query interface
• Archival store abstraction over the Fedora Repository
• HTTP APIs supporting ingest, query, and retrieval of data
• Browser-based user interface
Integrations with External Systems
• Antarctica Dry Valley Glacier Photograph Collection at National Snow and Ice Data Center (NSIDC) – Uses search and access APIs
• arXiv.org Pre-Print Repository – Uses search, access and ingest APIs
Human Networks
DC Instances at JHU and NSIDC
• Technical tools and organizational services for data collection, curation, management, storage, preservation, and sharing
• JHU Data Management Services – Helps researchers develop data management plans and both preserve and share research data
• NSIDC – Facilitates curation of results from knowledge documentation projects in Arctic communities by the Exchange for Local Observations and Knowledge of the Arctic project
Education
Graduate programs, training courses, webinars, and other resources on data curation and management.
Mission: Enable new science and knowledge creation through universal access to data about life on earth and the environment that sustains it.
Data, Tools & Services
Distributed Data Network
• Member Nodes – Existing data collections exposed through DataONE
• Coordinating Nodes – Support indexing and replication services across member nodes
• Common Search and Discovery – ONEMercury finds data in all member nodes from a single entry point
Investigator Toolkit
• Data Management Planning Tool – Guides development of DMPs for grant proposals
• Data Citations – ONEMercury search results are tagged for import into common bibliography management tools
• DataUp – Best practice checks and metadata creation to prepare data in Excel for archives
Human Networks
DataONE Users’ Group
Annual meetings and other opportunities for stakeholders to learn about and guide DataONE’s development
Working Groups
Identify, describe, and implement DataONE cyber-infrastructure, governance, and other projects
Education
Training sessions, education models, and graduate courses relating to various aspects of data management for students and citizen scientists
[Figure labels: Institutional Repositories; Network of Data Producers; Web User Interface; Active Content Repository; Services Provided; Virtual Archives; User Network; Data Conservancy; IU; ICPSR; Content Mining; Curation Decisions; Archival data generation; Other services; RPI; UIUC; UM]
For more information: www.dataone.org Amber Budden, Director for Community Engagement and Outreach [email protected]
For more information: www.dataconservancy.org Shonna Clark, Project Coordinator [email protected]
For more information: http://datafed.org Mary Whitton, Project Manager [email protected]
For more information: http://sead-data.net Marietta Van Buhler, Project Manager [email protected]
For more information: www.terrapop.org Tracy Kugler, Project Manager [email protected]
[Figure labels: get; create; replicate; synchronize; search]
Cross-DataNet Collaboration
The five DataNet projects collaborate through monthly conference calls, in-person PI meetings, and joint projects to build interoperable cyber-infrastructure and to engage with a broad network of researchers in the natural and social sciences.
Interoperable Cyber-Infrastructure
Examples of Joint Projects
• Access to TerraPop extracts in DFC collaboration environments
• Integration of Data Conservancy DCS-Lite and SEAD Active Content Repository tools
• Projects participating in DataONE as member nodes
DataNet Collaboration Areas
• Semantic integration
• Technical best practices for sustainability
• Data discovery, formats, and interoperability from the scientist’s perspective
Human Networks
• Training and education – Joint development and cross-program utilization of data management courses, sessions, and workshops
• Cross-disciplinary data awareness – Introducing scientists to data from other disciplines through cross-program conference activities and other outreach
• Long-term financial sustainability – Identifying and implementing funding and revenue models to support long-term data preservation and access
• Governance – Mechanisms for gathering stakeholder feedback and decision-making
[Figure labels: Rasters – Population and environmental data in grids; Area-level data – Environmental and population summaries for spatial units; Microdata – Individuals and households with their environmental and social context]
[Figure labels: Researchers (client); Shared Collection; Data Grid; iRODS; controlled workflows; Storage]
Minnesota Population Center