USGS Bioinformatics Activities
Ecoinformatics
January 2010
Gladys Cotter
Mike Frame
Topics for Discussion
1. USGS Bioinformatics Activities
2. Potential areas of collaboration
3. Questions
Bioinformatics
USGS NBII – addressing bioinformatics challenges through collaboration, content development, technology, and creating long-term infrastructure
Collecting
• Tools
• Protocols
• Standards
Linking
• Cross-referencing
• Relationship of data
Storage
• DBMS
• Central & Distributed
• Security
• Backups
• Archival
• Standards
Organization
• Structure
• Governance
• Standards
• Policies
Integration
• Multi-levels
• Difficult
• Mashups
• Standards
Analysis Synthesis
• Tools
• Standards
• Usability
• Training
• Non-biased
Delivery
• Tools
• Governance
• Infrastructure
• User analysis
• Tools
• Protocols
• Standards
Applications for
• Fusion
• Blending
• Related Integration
• Analysis
• Models
• Research
• Decision Making
• Policies
• Education
• Outreach
Sustainable, Reliable, Outreach, Training
Biological Spatial Infrastructure
NBII
• Over 72,000 records
• Based on FGDC BDP
• Training Program
• QA/QC Program
• Standards Cross-walks
– EML
– Dublin Core
• Establishing Administrative Tools
• Expanding internationally
• Embedding in-line visualization
World Data Center for Biodiversity & Ecology
• World Data System created through the International Council of Scientific Unions (ICSU) in 1957
• Currently 50 World Data Centers (WDC) in place internationally
• USGS National Biological Information Infrastructure (NBII) network designated as the WDC for Biodiversity & Ecology in 2002
WDC Current Activities
• Renewable Energy Project Prequalification Demonstration project
– Goal: support rapid prequalification of sites across the nation that are potentially suitable for renewable energy (with an initial focus on federal lands).
– Data sets include, but are not limited to:
  • Land Cover (GAP)
  • Protected areas/Stewardship (GAP)
  • Species Distributions/Habitat Affinities (GAP)
  • Species Occurrences (US-GBIF Mirror Site and NBII)
  • Integrated Taxonomic Information System (ITIS)
  • Topography (USGS)
  • Landforms (USGS/GAM)
  • Soil Moisture (USGS/GAM)
  • Ecosystems (USGS/GAM)
  • Renewable Energy Potential (i.e., wind, solar, geothermal, and biofuels; NREL)
  • Infrastructure (i.e., power grid, projected smart grid, and roads; NREL and USGS)
• Protected areas – working with WDPA, USGS GAP
• Sponsoring WDC for Biodiversity & Human Health
– South Africa is hosting
– Providing workshops, training, demonstration projects
– Evaluating how to leverage ILTER activities
Multilingual IABIN Catalog
Ability to search by:
• IABIN TNMap interface
• Resource Type
• Language
• Taxonomy
• Multi-lingual thesaurus
Thesaurus web-services
• English
• Spanish
• Portuguese
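A multilingual thesaurus search of this kind can be sketched as a term-expansion step followed by an ordinary catalog match. The sketch below is only an illustration under stated assumptions: the tiny thesaurus, the concept ids, the catalog records, and the function names are all hypothetical and do not describe IABIN data or the actual IABIN web services.

```python
# Hypothetical mini-thesaurus: concept id -> preferred label per language.
THESAURUS = {
    "c1": {"en": "wetlands", "es": "humedales", "pt": "zonas úmidas"},
    "c2": {"en": "birds", "es": "aves", "pt": "aves"},
}

# Hypothetical catalog records (not real IABIN entries).
CATALOG = [
    {"id": 1, "title": "Inventario de humedales", "language": "es"},
    {"id": 2, "title": "Wetlands of the Americas", "language": "en"},
    {"id": 3, "title": "Aves do Brasil", "language": "pt"},
]


def expand_term(term):
    """Return the term plus its equivalents in every thesaurus language."""
    needle = term.casefold()
    expanded = {needle}
    for labels in THESAURUS.values():
        folded = {label.casefold() for label in labels.values()}
        if needle in folded:
            expanded |= folded
    return expanded


def search(term, language=None):
    """Return ids of records whose title matches any equivalent term.

    `language` optionally restricts results to records in one language.
    """
    terms = expand_term(term)
    hits = []
    for record in CATALOG:
        if language is not None and record["language"] != language:
            continue
        title = record["title"].casefold()
        if any(t in title for t in terms):
            hits.append(record["id"])
    return hits
```

With this layout a query entered in English can retrieve Spanish or Portuguese records, and the language facet can still narrow the result afterwards. Plain substring matching is used only to keep the sketch short.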
NBII Search
Unique Facets
• Dynamic biological clusters
• Refine Results
• Biological images
• Map Display
Additional Unique Facets
• Thesaurus integration
• Publisher refinement
Diverse Sources
• DBMS
• Websites
• Federation
• Documents
• Weighting of sources
Integrated Taxonomic Information System
• Multi-agency partnership
• Primarily North American taxa
• Used Globally
• Web-services released Summer 2009
• Taxonomic Workbench 2010
NBII Species Mashups
• Designed for
– One-stop shop for species information in SE
– Integrate diverse sources
• Content Type
• UI Presentation
USGS Data Integration
3 Major Goals:
1. Establishing corporate data available via ESRI services
2. Improving access to modeling data, including water quality, stream, etc.
3. Providing easy-to-use “data upload”, “registry”, and “discovery” tools
North American EOL
• Multi-agency partnership designed to develop a prototype for “species information” within the Great Lakes and Chesapeake Bay regions
NSF DataNet Grant Background
• NSF solicitation to establish
– Long-term archives for science data
– Develop sustainable business model to support these activities
– Involve multi-disciplinary domains
– Develop various R&D needed to support effort
– Provide ongoing “operational” support
Funded 2:
DataONE
The Data Conservancy
DataONE Areas of emphasis
• Data loss: preserving all the work that has been done; by preserving at-risk (orphaned) biological, ecological, and environmental data from individual scientists
• Data dispersion: finding the needle in the haystack; by facilitating discovery and access of data through a single easy-to-use portal
• Data deluge: navigating the flood of increasingly heterogeneous data; by providing a toolbox that empowers scientists and organizations to more easily and effectively manage, analyze, and synthesize data
• Data Practice: using the best tools to do the job; by creating an informatics-literate workforce through innovative outreach and training efforts (e.g., best-practice videos, podcasts, on-line certificate programs, downloadable best practice guides and exemplars of data management plans)
DataONE Technology Directions
• DataONE will enable new science and knowledge creation through universal access to data about life on earth and the environment that sustains it by:
– making the scientist an active member of the data preservation process,
– creating cyberinfrastructure that supports the full data life cycle,
– promulgating cultural changes that value data stewardship and data sharing,
– broadly promoting best practices,
– engaging citizens in science,
– domain-agnostic solutions
Partnering organizations
• Libraries & digital libraries
• Academic institutions
• Research networks
• NSF- and government-funded synthesis & supercomputer centers/networks
• Governmental organizations
• International organizations
• Data and metadata archives
• Professional societies
• NGOs
• Commercial sector
Why is this relevant to Ecoinformatics?
Share similar cyberinfrastructure needs:
• Architecture
• Portals
• Distributed approaches
• Replication
• Secure, controlled access
• Authentication methods
• Tools deployed and supported
• Data discovery & interoperability methods
• Standards developed, deployed
• Life cycle data management tools (e.g., Investigator Toolkit, CI)
• R&D activities in the areas of CS, IS, SS, GIS, Env., etc.
• Opportunity for broad governmental & international participation (e.g., working groups, tool evaluations)
• Complementary to several of our groups' goals, projects, activities
• Potential Microsoft-related projects (e.g., MS Excel)
Potential areas of collaboration
• NBII Metadata Expansion
• Incorporation of additional species data into NA EOL, NBII Species Mashups, etc.
• USGS Data Integration activities
• NSF DataONE Grant
• Potential Microsoft tools
Technical Architecture & Discussions
DataONE: Enabling Data-Intensive Biological and Environmental Research
Existing biological data archives
ESA’s Ecological Archive
Long Term Ecological Research Network
Fire Research & Management Exchange System
National Biological Information Infrastructure
Distributed Active Archive Center
Knowledge Network for Biocomplexity
Example data holdings
Data Archive | Types of Data Managed | Metadata Standard(s)
[logo] | Biodiversity, taxonomic, ecological | BDP, DwC, DC, OGIS
[logo] | Biogeochemical dynamics, terrestrial ecological, Earth observation imagery | DIF, BDP, ECHO
[logo] | Ecological, biodiversity, biophysical, social, genomics, and taxonomic | EML
[logo] | Avian populations and molecular biology | DwC
[logo] | Biological and taxonomic | DC subset
[logo] | Biophysical, biodiversity, disturbance, and Earth observation imagery | EML
[logo] | Biodiversity, biotic structure, function/process, biogeochemical, climate, and hydrologic | EML
Metadata Interoperability Across Data Holdings
EML = Ecological Metadata Language
BDP = Biological Data Profile
DwC = Darwin Core
DC = Dublin Core
ECHO = EOS ClearingHOuse
OGIS = OpenGIS
DC subset = Dublin Core subset
DIF = Directory Interchange Format
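Interoperability across holdings that use different metadata standards typically relies on a crosswalk between element sets. The sketch below maps a few EML dataset fields onto Dublin Core element names. It is an illustration under assumptions: the flat dictionary record and the particular five-field mapping are simplifications chosen here (real EML is nested XML, and a production crosswalk covers many more elements), and the function name is hypothetical.

```python
# Assumed, simplified mapping from EML dataset fields to Dublin Core elements.
EML_TO_DC = {
    "title": "title",
    "creator": "creator",
    "abstract": "description",
    "pubDate": "date",
    "keyword": "subject",
}


def crosswalk(eml_record):
    """Translate a flat EML-style record into Dublin Core elements.

    Returns (dc_record, unmapped), where `unmapped` lists the source
    fields that have no Dublin Core target in this mapping, so that
    information loss is visible rather than silent.
    """
    dc_record = {}
    unmapped = []
    for field_name, value in eml_record.items():
        target = EML_TO_DC.get(field_name)
        if target is None:
            unmapped.append(field_name)
        else:
            dc_record[target] = value
    return dc_record, sorted(unmapped)
```

Reporting the unmapped fields matters because a richer standard such as EML carries detail (for example methods) that a general-purpose element set like Dublin Core cannot hold.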
Distributed framework
Member Nodes
• diverse institutions
• serve local community
• provide resources for managing their data
Coordinating Nodes
• retain complete metadata catalog
• subset of all data
• perform basic indexing
• provide network-wide services
• ensure data availability (preservation)
• provide replication services
Flexible, scalable, sustainable network
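The division of labor between member and coordinating nodes can be sketched in a few lines. This is a toy model under stated assumptions, not the DataONE API: the class names, method names, in-memory storage, and the two-copy replication target are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # compare nodes by identity, not by contents
class MemberNode:
    """Holds data objects for its local community."""
    name: str
    objects: dict = field(default_factory=dict)  # identifier -> bytes

    def store(self, identifier, data):
        self.objects[identifier] = data


@dataclass(eq=False)
class CoordinatingNode:
    """Keeps the full metadata catalog and drives replication."""
    members: list
    catalog: dict = field(default_factory=dict)  # identifier -> metadata
    target_copies: int = 2  # assumed replication policy

    def register(self, origin, identifier, data, metadata):
        """Store data at its origin member node, catalog it, replicate it."""
        origin.store(identifier, data)
        self.catalog[identifier] = dict(metadata)
        self.replicate(identifier)

    def holders(self, identifier):
        return [m for m in self.members if identifier in m.objects]

    def replicate(self, identifier):
        """Copy the object to more member nodes until the target is met."""
        holders = self.holders(identifier)
        if not holders:
            raise KeyError(identifier)
        data = holders[0].objects[identifier]
        for node in self.members:
            if len(holders) >= self.target_copies:
                break
            if node not in holders:
                node.store(identifier, data)
                holders.append(node)

    def search(self, keyword):
        """Network-wide discovery over the metadata catalog only."""
        needle = keyword.casefold()
        return sorted(
            identifier
            for identifier, meta in self.catalog.items()
            if needle in meta.get("title", "").casefold()
        )
```

In this model discovery never touches the data itself, only the catalog, and re-running `replicate` after a member node loses an object restores the target number of copies.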
Supporting the data lifecycle
UCSB Node
UNM Node
ORC Node
The data lifecycle
1. Deposition/acquisition/ingest
2. Curation and metadata management
3. Protection, including privacy
4. Discovery, access, use, and dissemination
5. Interoperability, standards, and integration
6. Evaluation, analysis, and visualization
Use Cases, Architecture Planning
http://mule1.dataone.org/ArchitectureDocs/index.html
Changing science culture
1. Education and training
2. Engaging citizens in science
3. Building global communities of practice
Career Long Learning:
• best practice guides
• exemplary data management plans
• podcasts, web-casts
• workshops and seminars
• downloadable curricula
Education and training
Best Practice Guide: How to Cite Your Data (6 in a series)
Best Practice Guide: Using Metadata for e-research (5 in a series)
Gold Star Data Management Plan: Here’s How
www.CitizenScience.org
Engaging citizens in science
Building global long-lived communities of practice:
• Broad, active community engagement
– Involvement of library and science educators engaging new generations of students in best practices
– Existing outreach and education programs
• Transparent, participatory governance
• Adoption/creation of innovative and sustainable business and organizational models
NSF
DataNet Partners
External Advisory Committee
DIUG
Leadership Team
Principal Investigator
Executive Director
DataONE Office
Director, Development & Operations
R&D
Operations
Core CI Team
Coordinating Nodes
Member Nodes
Director, Community Engagement & Outreach
Education and Outreach Team
Engagement Working Groups
• Sociocultural barriers to data sharing and preservation
• Long-term sustainability and governance
• Community engagement and education
• Citizen science and public outreach
• Usability and assessment
Infrastructure and Research Working Groups
• Data integration and semantics
• Data preservation, metadata, and interoperability
• Distributed storage
• Federated security
• Scientific workflows
• Usability and assessment
• Exploration, Visualization, Analysis
Thanks!
Leadership Team:
Bill Michener – UNM, PI
Suzie Allard – UT
John Cobb – ORNL
Bob Cook – ORNL
Patricia Cruse – CDL
Mike Frame – USGS
Stephanie Hampton – UCSB
Viv Hutchison – USGS
Matt Jones – UCSB
Steve Kelling – Cornell
Kathleen Smith – Duke
Carol Tenopir – UT
Dave Vieglais – KU, DataONE
Bruce Wilson – Joint ORNL – UT