Methods for Data Discovery – Portals

Upload: suzan-hodges

Post on 24-Dec-2015


TRANSCRIPT

Page 1: Methods for Data Discovery – Portals

• Portal facilitates access to and also assimilation of data
• Portal is not simply a web site: it offers services such as data reformatting, subsetting, brokering, etc.
• Portal is not just a collection of information and links: portal takes you elsewhere through a service
• Portal answers questions: abstracts data or does simple analysis
• Identify phases:
  – Phase 1: need a simple presence (web page) to start: avoid initial overreaching
• Could be multiple portals/interfaces
• Define discovery
  – Identifying what you know you want
  – Also, importantly, “accidental” discoveries that derive from the broad scope of disciplines and nations
  – PIs want “definitive datasets”: vetted for quality, coverage, etc.
• Metadata is key
  – In US, 10% of all IT spending is for metadata generation
  – 85% of data is unstructured
  – Need a new means (other than a list returned from a search) to present the data to the users
• Vetted datasets
  – Desired and useful
  – Danger of cliques taking control
  – Root of ‘vet’ also leads to ‘veto’; overreaching?
• A desired interface: a list that is classified and aggregated
• Who are the users? Don’t forget education and outreach community
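One way to picture “a list that is classified and aggregated” is a search result grouped by a facet and summarized per group, instead of one flat list. The sketch below is only an illustration under assumptions: the function name `classify_and_aggregate`, the `discipline` facet, and the sample hits are all hypothetical and do not come from the slides.

```python
from collections import defaultdict


def classify_and_aggregate(hits, facet):
    """Group search hits by one facet and summarize each group."""
    groups = defaultdict(list)
    for hit in hits:
        # Hits lacking the facet are collected under "unclassified".
        groups[hit.get(facet, "unclassified")].append(hit["title"])
    return {
        label: {"count": len(titles), "titles": sorted(titles)}
        for label, titles in sorted(groups.items())
    }


# Hypothetical search results; the titles and disciplines are made up.
hits = [
    {"title": "Sea ice extent", "discipline": "cryosphere"},
    {"title": "Snow depth transects", "discipline": "cryosphere"},
    {"title": "Seabird counts", "discipline": "biology"},
    {"title": "Untitled field notes"},
]

summary = classify_and_aggregate(hits, "discipline")
for label, group in summary.items():
    print(f"{label} ({group['count']}): {', '.join(group['titles'])}")
```

Grouping by a different facet (for example region or data center) only requires passing a different key, which is one reason to keep the classification separate from the search itself.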

Page 2: Methods for Data Discovery – Portals

• IPY legacy:
  – Need long-term stewardship of metadata and data
• Define audiences: scientists and public
  – Public needs access to information products
• Phase 0: list of datasets and datacenters
• Phase 1: metadata for datasets
• Phase 2: publications
• Phase 3: services: visualizations
• Start with a single data center (?) NSIDC?
• Stages:
  – 1. IPY project honeycomb charts: identify sources of data
    • Done by 2007
    • Science base
    • Dataflows:
      – Regional focus, discipline focus which point to archive or individuals
  – 2. Complementary Portals (links)
  – 3. Services that allow discovery (esp. databases) of unexpected connections
    • Search – access
    • Interactive – community tools
    • Visualization
    • Integrative

Page 3: Methods for Data Discovery – Portals

• Portal must be accessible through search engines (Google)
• Alignment of commercial interests with IPY
• GoogleBase as a metadata service
• Target audiences: scientists and education and outreach
  – Also recognize that
• Not designing a portal: actually designing a process
• Portal captures user interaction and uses this to enhance future use (e.g. Amazon)
• Need to address ontology, metadata design, data collection design early in the process; counterpoint: we don’t have enough a priori information to design
• Data managers come up with good plans, but implementation is spotty, unless compelled
• Location is a common element that could tie discovery and integration together
• Involve projects in classifying the honeycomb and building the initial lists in Stage 1

Page 4: Methods for Data Discovery – Portals

• Addendums following group discussion
  – Who is going to do this? (Implementation plan)
    • Agencies
    • National Committees
    • PIs
    • DIS
    • Arctic Council working groups
    • International bodies
    • NGOs
  – Use lessons learned from groups like ice coring, oceanographers, etc. who are already good at sharing data
  – All of this goes into the “funding agency data management letter”; can this be articulated in time?
  – Letter needs to go to agency IPY point of contact.
  – Three questions
    • Who is responsible for IPY?
    • How will info be used?
    • Where will info go? (ipy.org)
  – Create metadata to describe portals
    • AMD is an example for metadata and services descriptions
    • Enable search of portals
    • Annotate with keywords to limit search results
      – Geographic focus
      – Stakeholders
      – Disciplines
  – Create an online mechanism for users to input list of portals and annotate them; that is, put the burden on the community
    • Suggestions: use GCMD and AMD
    • Use this to solicit feedback and ideas that are desired by the user community
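The “metadata to describe portals” idea, with keyword annotation by geographic focus, stakeholders, and disciplines, can be sketched as a small record type plus a keyword filter. This is a minimal sketch under assumptions, not the AMD or GCMD schema: the `PortalRecord` fields, the `search_portals` helper, and the example portals and URLs are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class PortalRecord:
    """Hypothetical metadata record describing one portal."""
    name: str
    url: str
    geographic_focus: set = field(default_factory=set)
    stakeholders: set = field(default_factory=set)
    disciplines: set = field(default_factory=set)


def search_portals(records, **required):
    """Return the records that carry every requested keyword.

    Each keyword argument names a facet (a PortalRecord field) and
    gives the single keyword that facet must contain.
    """
    return [
        record
        for record in records
        if all(keyword in getattr(record, facet)
               for facet, keyword in required.items())
    ]


# Made-up portals, as a community member might enter and annotate them.
portals = [
    PortalRecord("Example Arctic Ocean Portal", "https://example.org/arctic",
                 {"Arctic"}, {"scientists"}, {"oceanography"}),
    PortalRecord("Example Polar Classroom", "https://example.org/classroom",
                 {"Arctic", "Antarctic"}, {"education and outreach"},
                 {"oceanography", "glaciology"}),
    PortalRecord("Example Ice Core Portal", "https://example.org/icecore",
                 {"Antarctic"}, {"scientists"}, {"glaciology"}),
]

arctic_ocean = search_portals(portals, geographic_focus="Arctic",
                              disciplines="oceanography")
print([portal.name for portal in arctic_ocean])
```

Each added keyword narrows the result, which matches the stated goal of annotating portals in order to limit search results.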