EaGLe: Data Archiving and Metadata
The EaGLe Legacy
R-8286750
EaGLe Metadata Training | 24-JUN-04 | Page 2
Why archive the EaGLe data?
To ensure its preservation for future generations of scientists
To ensure it is broadly available for current scientists to use
To create the broadest possible public benefit from this taxpayer-funded program
To help EPA retain the data that is collected / created through its funding
Because we wish that earlier researchers had archived their data for us to use
EaGLe Data Committee Mission Statement
Develop an information management plan to archive EaGLe data with appropriate metadata so that EPA can make it readily available
Ensure that data usefulness outlives the EaGLe project (and does not require continued maintenance by EaGLe researchers)
DATA & METADATA
1 Types of data
2 What to archive
3 Data objects
4 Data packages
5 What is metadata?
6 Why collect metadata?
7 The ‘cons’ of metadata
8 The ‘locs’ of metadata
9 Sample metadata file 1
10 Sample metadata file 2
11 Getting in gear
12 EaGLe metadata entry
13 Metadata checklist
14 Checklist continued
15 Data file formats

EML and XML
A) Metadata standards
B) What is EML?
C) Ecological metadata
D) What good is EML?
E) EML vocabulary
F) What is XML?
G) What good is XML?
H) Do I have to learn XML?

METADATA RETRIEVAL
IJ) EIMS overview
K) Data flow: You to EIMS
L) EaGLe home page
M-N) Global search
O-S) Metadata report
T-U) Searches

COST-EFFECTIVENESS
1$ Seems awfully complicated
2$ How much will it cost me?
3$ How long does it take?
4$ What good are metadata?
5$ Who needs metadata?

SECURITY ISSUES
(1) Access controls
(2) Approval process
(3) The ‘locs’ of metadata

MORE INFORMATION
Optional data archival
Do NOT archive
Non-standardized metadata
EaGLe contacts
End
1 EaGLe Data Types
Geospatial & Imagery
Genomic
Remote Sensing
Biological
Routine Monitoring
2 What data must be archived?
All new data created or collected using EaGLe funds
• Field data
• Genomics experiments
• New GIS coverages
• New remote sensing data
• Other images, models
All important summary, supplemental, and explanatory information
• Journal articles
• Poster sessions
• Presentations
• Rules governing data QC or transforms
• SOPs, protocols, experimental design documents, QA/QC documents
3 Types of Data Objects
Literature Objects
• Journal articles, bibliographies, books, Adobe PDF files, etc.
Flat Files
• Stand-alone tables (e.g., SAS tables), spreadsheet data
Relational Databases
• Many normalized tables joined by relational rules
• Data views, query objects: combined bits from separate tables
Graphical Objects
• Maps, photos, digital sounds, presentations, Web sites
Material Objects
• Soil samples, stained slides, microfiche, posters, video tapes, etc.
4 What is a Data Package?
Together, electronic data objects and their metadata file constitute a Data Package.
The metadata file is like the box, inventory tag and instruction manual
The data themselves are the content of the package
Data inventory requires good-quality metadata
Even material objects can have electronic metadata
5 What’s metadata?
Metadata means “beside the data” or “data about data”
Metadata files contain summary and reference data about primary data objects:
• Any information needed to identify, decode, interpret, track, store, locate, assign ownership of, or control access to a data object.
Everyday examples:
• Library card catalogue; key to map symbols; checkbook register
Scientific metadata examples:
• Particulate matter instruments: equipment models and settings, detection limits, replication, sample handling details
• Journal article citation, methods citation
6 Why Collect Metadata?
Long-term Storage
• Keep EaGLe data safely banked for future reuse
• Support long-term data tracking and retrieval
Data Broadcasting
• Publish metadata via the Environmental Research and Science Library (ERSL) public interface
Foster collaborative and cross-cutting research
• Meta-analyses made possible: small dataset mergers
• Cross-regional data, cross-media data
• Longitudinal time-series analyses: data recombining
7 The ‘cons’ of metadata
Content: What is in the data object?
• Data descriptions, citation info, electronic file formats
Contacts: Who owns the data?
• Authors, contact person, organization
Context: What is the provenance of the data?
• Applicable knowledge areas, methods, project origins, etc.
8 The ‘locs’ of metadata
Location
• Where is the electronic file located?
• What is the geographic coverage of the data object?
Locks
• Final version (protected against inadvertent updates)
• Viewing access controls
• Editing/downloading access controls
• Release date, expiration date
9 Sample Indented Metadata file
10 Sample 2 indented metadata
[Embedded Acrobat document: second sample of indented metadata]
11 Getting in Gear:
Feb. 1, 2004: Begin metadata creation.
Summer 2004: Begin EaGLe data uploading.
Jan. 2005: EaGLe metadata completed.
End of no-cost extensions (early 2006): Most EaGLe datasets archived but password-protected.
Jan. 2008: Most EaGLe data released to the public.
12 Metadata Creation / Data Uploading
Metadata Entry Form (MEF)
• Generates an EML-compliant metadata file in XML format
Automatic upload to ERSL
• Data packages stored in EIMS repository (ERSL backend)
EaGLe Portal: intranet interface for grantees
• Review, Approval, and Release Processes
Post-Release: Search, Store and Update
• Searchable Metadata Records in one area of EIMS/ERSL
• Actual Datasets stored in EIMS/ERSL Repository
13 Metadata Checklist
General Information
• Data Set Title
• Point of Contact
• Time period of the information contained in the dataset
• Abstract (brief description) of the dataset
• Geographic coverage of the dataset
• Data format (e.g., shape-file, coverage, spreadsheet, etc.)
Dataset Creation
• Formal authors
• Others who contributed
• Research objectives for dataset
• Common misinterpretations of the data, if any
14 Metadata Checklist (continued)
Dataset Contents
• Was a georeferencing system used? If so, what is it?
• What does each dataset record describe?
• What are the attributes that describe these features?
• Define each attribute and provide measurement units. Also provide resolution and estimated accuracy, if possible.
• Define or reference coded attributes (e.g., FIPS codes, error codes)
Dataset Processes
• Citation of source of original data, if applicable (e.g., GIS data)
• Types of major data processing steps
• Detailed methodology of data collection, including study designs, protocols, equipment, analyses, etc., and any changes in data collection procedures during the study
• Record any QA tests performed and their results
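As a rough illustration of how checklist answers could end up in a structured metadata file, the sketch below builds a small XML tree in Python. The element names are loosely modeled on EML but simplified for readability; they are assumptions for this example, the output is not schema-valid EML, and it is not the output of the Metadata Entry Form. The dataset values are invented.

```python
import xml.etree.ElementTree as ET


def build_metadata(record: dict) -> ET.Element:
    """Build a simplified, EML-style metadata tree from checklist answers.

    Element names here are illustrative only, not the real EML schema.
    """
    root = ET.Element("dataset")
    ET.SubElement(root, "title").text = record["title"]
    ET.SubElement(root, "contact").text = record["contact"]
    ET.SubElement(root, "abstract").text = record["abstract"]

    coverage = ET.SubElement(root, "coverage")
    ET.SubElement(coverage, "geographic").text = record["geographic_coverage"]
    ET.SubElement(coverage, "temporal").text = record["time_period"]

    # One <attribute> per column: name, definition, and measurement unit.
    attribute_list = ET.SubElement(root, "attributeList")
    for name, definition, unit in record["attributes"]:
        attribute = ET.SubElement(attribute_list, "attribute")
        ET.SubElement(attribute, "attributeName").text = name
        ET.SubElement(attribute, "attributeDefinition").text = definition
        ET.SubElement(attribute, "unit").text = unit
    return root


# Hypothetical checklist answers, invented for this example.
example = {
    "title": "Nearshore water temperature, 2003",
    "contact": "Data manager",
    "abstract": "Daily water temperature at two nearshore sites.",
    "geographic_coverage": "Two nearshore sampling sites",
    "time_period": "2003-06-01 to 2003-08-31",
    "attributes": [
        ("site_id", "Sampling site identifier", "none"),
        ("temp_c", "Water temperature at 1 m depth", "degrees Celsius"),
    ],
}

tree = build_metadata(example)
xml_text = ET.tostring(tree, encoding="unicode")
print(xml_text)
```

Each checklist item maps to one element, so a missing answer shows up as a missing or empty element instead of being silently forgotten.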
15 Data File Formats:
Acceptable
• Files converted into character-delimited ASCII files (e.g., comma-delimited .csv files)
• jpeg, jpg, tiff, gif, img, png, geo-tiff, ecw, ArcView, simple html or htm, xml, LaTeX, TeX, pdf (method files)
• Programs in programming language (must have text support).
Unacceptable
• Excel spreadsheets (convert to .csv)
• Presentation files such as PowerPoint (convert to .pdf)
• Word-processing files (convert to ASCII)
• Proprietary files
• RTF files
• Special characters (Greek letters and other symbols not found in ASCII)
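The conversion to comma-delimited ASCII and the check for special characters can be sketched with Python's standard library. This is a minimal example under stated assumptions: the rows are invented, and the helper names `non_ascii_cells` and `to_csv_text` are hypothetical, not part of any EaGLe tool.

```python
import csv
import io


def non_ascii_cells(rows):
    """Return (row, column, value) for every cell containing non-ASCII text."""
    problems = []
    for r, row in enumerate(rows):
        for c, value in enumerate(row):
            if not str(value).isascii():
                problems.append((r, c, value))
    return problems


def to_csv_text(rows):
    """Serialize rows as comma-delimited text, one record per line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()


# Invented sample table; the last row contains a Greek-derived symbol.
rows = [
    ["site", "temp_c", "note"],
    ["A1", 12.5, "clear"],
    ["B2", 14.0, "µg/L sample"],
]

problems = non_ascii_cells(rows)
csv_text = to_csv_text(rows)

for r, c, value in problems:
    print(f"row {r}, column {c}: non-ASCII value {value!r}")
```

Flagged cells would need to be spelled out in ASCII (for example "ug/L") before the file is archived.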
A) Standards for Metadata Creation
FGDC Content Standard for Digital Geospatial Metadata http://www.fgdc.gov/metadata/contstan.html http://www.fgdc.gov/metadata/metadata.html
National Biological Information Infrastructure http://www.nbii.gov/
Ecological Metadata Language http://knb.ecoinformatics.org/software/eml
Knowledge Network for Biocomplexity (MORPHO) http://knb.ecoinformatics.org/
Dublin Core Metadata Element Set www.dublincore.org
Encoded Archival Description (EAD) http://www.loc.gov/ead/
Data Documentation Initiative http://www.icpsr.umich.edu/DDI/
B) So, what’s EML?
Ecological Metadata Language
A metadata standard designed to handle cross-disciplinary research
A ‘wrapper’ that holds metadata for many different types of primary data (geospatial, biological, genomic, etc.)
Widely accepted standard in the ecological communities of interest.
A container that meshes with other types of metadata standards
A metadata standard based on XML vocabulary.
An information ‘tree’ that can graft on new branches of knowledge when they become necessary to the knowledge community
C) EML: Standard for Ecological Metadata
Core: Definitions and units of the columns (fields or attributes) in all data tables
Methods, procedures, and protocols
Research questions and hypotheses
Site selection
Authors, contacts, and proper citation for use
Sampling Extent: spatial, biological, & temporal
D) What good is EML?
Ease of data interchange with other scientists
Enhances precision in data documentation
• Forces clarity in defining measurement units
• Missing-data codes, other interpretative codes
Enforces data access rules
Improves rapid search capability
E) EML Specialty Terms
Common usage | EML Term
Field, independent variable, column name, header | Attribute
Abstract, Brief, Executive Summary | Abstract
Project Officer, Primary investigator | Party
F) What is XML?
eXtensible Markup Language
A subset of Standard Generalized Markup Language (SGML)
A method for marking up plain text
• To distinguish clearly between the content (text) and the document structure (title, paragraph, line, etc.)
• Note: Textual attributes (bold, large, italic, etc.) are NOT included.
• To make electronic documents readily machine-readable
• Makes document structures explicit and modular
• Permits easy transformations between document formats
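To make the split between content and structure concrete, here is a small sketch using Python's standard XML parser. The document and its tag names (`article`, `title`, `paragraph`) are invented for illustration.

```python
import xml.etree.ElementTree as ET

# A tiny marked-up document: tags carry structure, text carries content.
document = """<article>
  <title>Water temperature survey</title>
  <paragraph>Readings were taken daily.</paragraph>
  <paragraph>Sensors were calibrated weekly.</paragraph>
</article>"""

root = ET.fromstring(document)

# Structure: the sequence of element names, independent of the wording.
structure = [child.tag for child in root]

# Content: the text inside each element, independent of the markup.
content = [child.text for child in root]

for tag, text in zip(structure, content):
    print(f"{tag}: {text}")
```

Because the structure is explicit, a program can pull out every title or every paragraph without guessing from fonts or layout.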
G) What good is XML?
Allows document contents to be re-used in new ways
Allows document elements to be stored just like tables of numerical data
Enforces precise translation of document “look and feel” from one presentation mode (hard-copy) to another (web)
Transparency of markup to future readers
Can accommodate new kinds of text markup at need (audio tags, motion tags, etc)
Converts information to platform and software independent formats to maximize long‑term utility
H) Do I have to learn XML?
NO!
The Metadata Entry Form automatically creates a valid XML document
• Data entered into the form automatically follows the EML constraints on mandatory inclusion of metadata elements
Only system administrators and metadata librarians need XML expertise
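The idea of a form enforcing mandatory metadata elements can be sketched as a simple check before a file is written. This is only an illustration: the list of required fields and the function name are assumptions for this example, not the actual rules of the Metadata Entry Form or of EML.

```python
# Hypothetical list of mandatory fields; the real form's rules may differ.
REQUIRED_FIELDS = ("title", "contact", "abstract")


def missing_fields(form: dict) -> list:
    """Return the required field names that are absent or blank in the form."""
    return [
        name
        for name in REQUIRED_FIELDS
        if not str(form.get(name, "")).strip()
    ]


# Invented form input with one blank mandatory field.
draft = {"title": "Nearshore water temperature, 2003", "contact": " "}
gaps = missing_fields(draft)

if gaps:
    print("Cannot generate metadata file; missing:", ", ".join(gaps))
```

A form that runs a check like this before saving is what lets the person entering metadata ignore the XML details.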
IJ) EIMS overview
Metadata (data about data)
Information Objects: Data Sets, Databases, Documents, Meetings, Models, Multimedia, Projects, Spatial Data, Web Site
K) Data Flow: From You to EIMS & back
[Flow diagram linking EaGLe, the EaGLe Portal, and EIMS, with these labeled steps:]
• Metadata entry into existing EaGLe system
• Data load into EIMS
• Data update / retrieval from EaGLe intranet portal into EIMS
• Data retrieval from EaGLe internet portal
• Future…
L) EaGLe Prototype Home Page
M) EaGLe Prototype Global Search
Enter selection criteria and click Global Search
N) EaGLe Prototype Search Results
Click link to display Metadata Report
O) EaGLe Metadata Report
Links to headers in the Metadata Report
Header
Link to top of page
P) EaGLe Metadata Report (continued)
Q) EaGLe Metadata Report (continued)
R) EaGLe Metadata Report (continued)
S) EaGLe Prototype Simple Search
Enter selection criteria…
…and click Search
T) EaGLe Prototype Advanced Search
Enter selection criteria…
U) EaGLe Prototype Advanced Search (continued)
Enter selection criteria…
…and click Search
Optional Data Archival
Historical data owned by EaGLe researchers
Data used strictly for QA/QC
• e.g., temperature of experimental tanks
Work that produced no analyzable data
• Qualitative reports
Pilot data
Do NOT Archive
Data not owned by EaGLe researchers
Data already archived elsewhere
• e.g., many GIS coverages
“Dirty” data
• Sans quality controls
• Containing many missing values, duplicates, etc.
Non-standardized metadata
Field notes
Marginalia
Large object free text fields
Index cards
Voice recordings
Personal communications
Mental notes (non-transcribed knowledge)
Who is working on EaGLe data archiving?
EaGLe data committee (EDC):
• Valerie Brady (chair)
• Terry Brown (GLEI)
• Peter Noble (CEER-GOM)
• Lexia Valdes (ACE INC)
• Webb Sprague (PEEIR)
• Chris Pfeiffer (ASC)
Environmental Information Management System (EIMS)
• John Sykes (USEPA EIMS)
Computer Sciences Corporation (CSC)
• Derek Lane
• Susan Eversole
• Steve Walata III
• Geoff Blair
• Wally Schwab
• And others
1$ Seems awfully complicated…
...but it’s easier than statistics
• No need to learn the whole of EML to use the relevant bits
No more complicated than programming a VCR
• Time, Date, Channel, Skip commercials
Similar to writing a journal article
• Abstract, Background, Protocol, Methods, Analysis, Discussion, Results, Caveats, Secondary analysis potential
• Author Names, Affiliations, Bibliography
The EaGLe MEF or Morpho user interface allows production of the most useful metadata
2$ How much does it cost to collect metadata?
Estimate the value of your research results
• Total amount of research grant(s) plus 15% added value
Divide by number of years project is funded
Allocate 10% of resulting $/efforts to metadata collection
Distribute amounts evenly over years; don’t stint!
• Collecting metadata at the beginning of a study captures important data decisions and research design elements
Use metadata collection as an ad hoc method of data quality control during each year of the study.
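The budgeting rule above can be worked through as a short calculation. The grant amount and project length below are invented for illustration; only the 15% added value and the 10% allocation come from the slide, and the function name is hypothetical.

```python
def annual_metadata_budget(total_grant, years, added_value=0.15, metadata_share=0.10):
    """Apply the slide's rule of thumb for a yearly metadata budget.

    value = grant total plus 15% added value
    per_year = value / number of funded years
    budget = 10% of the per-year amount
    """
    research_value = total_grant * (1 + added_value)
    per_year = research_value / years
    return per_year * metadata_share


# Hypothetical project: $1,000,000 in grants over 4 funded years.
budget = annual_metadata_budget(1_000_000, 4)
print(f"Metadata budget per year: ${budget:,.2f}")
```

For this invented project the steps are 1,000,000 × 1.15 = 1,150,000, then 1,150,000 / 4 = 287,500 per year, and 10% of that is about 28,750 per year for metadata collection.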
3$ How much time is this going to take?
Between 8 and 40 hours per data group
• All similar data bundled together: not a per-dataset cost!
• More complex datasets take more time
• Loading or linking to pre-written material can save time
Training for use of Metadata Entry Form
• One-time 3-hour training session
• Minimum 3 hours hands-on practice
• Availability of live “help” during first solo MEF work
4$ What Good are Metadata?
High-quality metadata serve 5 purposes:
Data integrity maintenance over the long term: 20-year rule
• Across expected changes in data storage technology, compression, etc.
Tracking, searching for, and retrieving datasets
• Like a library card catalogue: where to find data, where to shelve it.
Scientific collaboration
• Joint analysis and secondary analysis potential
Cathedral effect
• Pooling data across regions contributes to an environmental “big picture”
• Longitudinal studies: building science efforts upon a shared data foundation.
Economical
• Extending the shelf life of data gives taxpayers more return on investment
5$ Who needs the EaGLe metadata?
Other scientists
• Today’s Colleagues & Scientific Collaborators
• Tomorrow’s meta-analysts
• The next generation
Archivists
• Data Librarians
• Data Exchange Tools (CDX)
The Public
• Citizens and Citizen Groups
• Legislators and other decision-makers
1) Data Access and Security
Only registered users may enter or edit a metadata record
• Record-level edit permissions required for input and update
Only registered Data Librarians can release records to a designated user base (Public, EPA Only, Group, Owner)
Confidential records can be restricted to a subset of users
• EPA Only: accessible only to EPA registered users
• Group: accessible only to members of a specified group of users (including system users outside the EPA firewall, if necessary)
• Owner: accessible only by the designated owner of the EIMS record
Post-release: any internet user may view metadata records.
• Separate access controls for actual datasets
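The four release levels can be read as a simple viewing rule. The sketch below is one possible interpretation, not the EIMS implementation: the `User` and `Record` classes, their fields, and the `can_view` function are all hypothetical, and the assumption that each level is checked independently (for instance, an owner gets no automatic access to a Group record) is ours.

```python
from dataclasses import dataclass, field


@dataclass
class User:
    name: str
    registered: bool = False
    is_epa: bool = False
    groups: set = field(default_factory=set)


@dataclass
class Record:
    owner: str
    release_level: str  # "Public", "EPA Only", "Group", or "Owner"
    group: str = ""


def can_view(user: User, record: Record) -> bool:
    """Decide whether a user may view a metadata record (illustrative rule)."""
    if record.release_level == "Public":
        return True  # any internet user may view released public records
    if not user.registered:
        return False  # every restricted level requires a registered user
    if record.release_level == "EPA Only":
        return user.is_epa
    if record.release_level == "Group":
        return record.group in user.groups
    if record.release_level == "Owner":
        return user.name == record.owner
    return False  # unknown level: deny by default


# Invented users and records for demonstration.
visitor = User(name="visitor")
grantee = User(name="grantee", registered=True, groups={"lake-team"})

public_record = Record(owner="grantee", release_level="Public")
group_record = Record(owner="grantee", release_level="Group", group="lake-team")

print(can_view(visitor, public_record), can_view(visitor, group_record))
print(can_view(grantee, group_record))
```

Denying by default for an unrecognized level keeps a mistyped release level from exposing a confidential record.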
Generations of Research
For a true confluence of research efforts, clarity in metadata is the key