gbrds tech issues op


Uploaded by vishwas-chavan on 18 May 2015


TRANSCRIPT

Page 1: Gbrds Tech Issues Op

GLOBAL BIODIVERSITY INFORMATION FACILITY

Tim Robertson

Systems Architect

September 2009

WWW.GBIF.ORG

Technical Issues and Opportunities for Resource Discovery

Page 2

Content

A look at the past, present and future of the GBIF registry and portals for biodiversity resource discovery:

Register existence

Associate metadata

Enable discovery through search
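The three steps above can be sketched as a tiny in-memory registry. This is a minimal illustration, not the GBRDS implementation: the `Registry` class, its `register`/`associate`/`search` methods and the plain substring search are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    key: str
    title: str
    metadata: dict[str, str] = field(default_factory=dict)


class Registry:
    """Hypothetical in-memory registry covering register, describe, discover."""

    def __init__(self) -> None:
        self._resources: dict[str, Resource] = {}

    def register(self, key: str, title: str) -> Resource:
        # Register existence: a resource is known only once it is recorded here.
        if key in self._resources:
            raise ValueError(f"{key!r} is already registered")
        resource = Resource(key, title)
        self._resources[key] = resource
        return resource

    def associate(self, key: str, **metadata: str) -> None:
        # Associate metadata: attach descriptive fields to a registered resource.
        self._resources[key].metadata.update(metadata)

    def search(self, text: str) -> list[str]:
        # Enable discovery: case-insensitive substring match on title and metadata values.
        needle = text.lower()
        return [
            r.key
            for r in self._resources.values()
            if needle in r.title.lower()
            or any(needle in v.lower() for v in r.metadata.values())
        ]
```

Keeping the three operations separate mirrors the slide: a resource can be registered before anyone has described it, and search quality depends directly on how much metadata has been associated.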

Page 3

Registry: The past…

Universal Description, Discovery and Integration (UDDI)

“…XML-based registry for businesses worldwide to list themselves on the Internet …”

UDDI                     GBIF
Businesses               Institutions
  + Services               + Collections
  + Service Bindings       + Endpoints (DiGIR etc.)
  + TModels                + Application Schemas (DwC etc.)
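The UDDI-to-GBIF mapping above can be expressed as nested Python dataclasses. This is a minimal sketch; the class and field names are hypothetical, chosen only to mirror the table.

```python
from dataclasses import dataclass, field


@dataclass
class Endpoint:  # UDDI service binding
    url: str
    protocol: str  # e.g. "DiGIR"
    schema: str  # UDDI tModel -> application schema, e.g. "DwC"


@dataclass
class Collection:  # UDDI service
    name: str
    endpoints: list[Endpoint] = field(default_factory=list)


@dataclass
class Institution:  # UDDI business
    name: str
    collections: list[Collection] = field(default_factory=list)


def endpoints_of(institution: Institution) -> list[tuple[str, str, str]]:
    """Flatten the hierarchy into (collection, protocol, url) rows."""
    return [
        (collection.name, endpoint.protocol, endpoint.url)
        for collection in institution.collections
        for endpoint in collection.endpoints
    ]
```

Note that this is a strict hierarchy: each collection belongs to exactly one institution, and there is nowhere to record that a network also serves it.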

Page 4

UDDI: Metadata

Limited, by and large, to:

Contact information (e-mails, addresses, etc.)

Key-value pairs, e.g. ISO country code, endorsing node

Allows search by title, contact, etc.

Only 2 levels of credit

Data provenance is lost: lack of recognition!

Page 5

Past: Search capabilities

Recognising that federated search was limited, GBIF built the Data Portal (http://data.gbif.org):

Harvesting of resources registered in the UDDI via TAPIR, DiGIR and BioCASe

Rich search for individual records and resources by Darwin Core-type terms (the what, where, when, etc.), made possible by building indexes

Limited metadata search capabilities: DiGIR, BioCASe, TAPIR, etc. offer TECHNICAL metadata only
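Building indexes over harvested records is what makes "what, where, when" queries fast. A minimal sketch of an inverted index keyed on (Darwin Core term, value) pairs. The term names are real Darwin Core terms, but the index layout and exact-match AND query are simplifying assumptions, not the Data Portal's actual design.

```python
from collections import defaultdict

Record = dict[str, str]


def build_index(records: list[Record]) -> dict[tuple[str, str], set[str]]:
    """Map each (Darwin Core term, value) pair to the ids of the records that carry it."""
    index: dict[tuple[str, str], set[str]] = defaultdict(set)
    for record in records:
        for term, value in record.items():
            if term != "id":
                index[(term, value)].add(record["id"])
    return index


def query(index: dict[tuple[str, str], set[str]], **criteria: str) -> set[str]:
    """Ids of records matching every term=value criterion (AND semantics)."""
    matches = [index.get((term, value), set()) for term, value in criteria.items()]
    return set.intersection(*matches) if matches else set()
```

For example, `query(index, scientificName="Puma concolor", year="2004")` intersects two posting sets instead of scanning every harvested record.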

Page 6

GBIF Network: The real scenario

Challenge #1: Model the true nature of the network's makeup.

A graph, not a tree

Multiple entity types: institutions, networks, collections, GBIF Nodes

Many relationship types

Page 7

Registry: A graph-based model

Benefits:

Accurate data provenance

Duplicate record detection

Ability to model sub-networks

Opportunity: re-use of the registry for your own purposes
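A minimal sketch of how a graph supports the provenance and duplicate-detection benefits: entities carry a type, relationships are typed directed edges, and a collection may have several parents. The relationship names (`owns`, `aggregates`) and the rule "a collection aggregated by more than one network is likely harvested twice" are assumptions for illustration.

```python
from collections import defaultdict


class RegistryGraph:
    """Hypothetical typed, directed multigraph of registry entities."""

    def __init__(self) -> None:
        self.kind: dict[str, str] = {}  # entity name -> entity type
        # target entity -> list of (relationship type, source entity)
        self.incoming: defaultdict[str, list[tuple[str, str]]] = defaultdict(list)

    def add_entity(self, name: str, kind: str) -> None:
        self.kind[name] = kind

    def relate(self, source: str, relation: str, target: str) -> None:
        # Unlike a tree, a target may receive any number of incoming edges.
        self.incoming[target].append((relation, source))

    def sources(self, target: str, relation: str) -> list[str]:
        return [src for rel, src in self.incoming.get(target, []) if rel == relation]

    def duplicate_candidates(self) -> dict[str, list[str]]:
        """Collections aggregated by more than one network, i.e. likely harvested twice."""
        result: dict[str, list[str]] = {}
        for name, kind in self.kind.items():
            if kind == "collection":
                networks = self.sources(name, "aggregates")
                if len(networks) > 1:
                    result[name] = networks
        return result
```

Following the `owns` edge back from a collection recovers the institution that should receive credit, even when its records arrived through one or more networks.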

Page 8

Registry: A graph-based model

Challenge #2: Scalable deployment supporting this reuse (99.9% uptime, 24/7)

Authentication model: identity management? Cascading permissions? Wiki style?

Or perhaps copy the model of ?

“Institution X requests to be associated with you. Would you like to accept this association?”
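The quoted prompt suggests a request-and-accept workflow for creating relationships. A minimal sketch of that idea, assuming each association is a directed (requester, target) pair that stays pending until the target accepts; all names here are hypothetical.

```python
from enum import Enum


class Status(Enum):
    PENDING = "pending"
    ACCEPTED = "accepted"
    REJECTED = "rejected"


class AssociationRequests:
    """Hypothetical request/accept flow: an association exists only once the other party accepts."""

    def __init__(self) -> None:
        self._requests: dict[tuple[str, str], Status] = {}

    def request(self, requester: str, target: str) -> str:
        key = (requester, target)
        if self._requests.get(key) is Status.ACCEPTED:
            raise ValueError("already associated")
        self._requests[key] = Status.PENDING
        # The message the target's curator would see.
        return (f"{requester} requests to be associated with you. "
                "Would you like to accept this association?")

    def respond(self, requester: str, target: str, accept: bool) -> Status:
        key = (requester, target)
        if self._requests.get(key) is not Status.PENDING:
            raise ValueError("no pending request")
        self._requests[key] = Status.ACCEPTED if accept else Status.REJECTED
        return self._requests[key]

    def associations(self) -> list[tuple[str, str]]:
        return [key for key, status in self._requests.items() if status is Status.ACCEPTED]
```

The appeal of this model is that no central curator is needed: each party controls only the edges that point at it.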

Page 9

Registry: A graph-based model

Challenge #2 (cont.): Who should curate?

Private and community copies?

Single (scalable) instance or multiple masters?

Opportunity: offering tagging (machine and human) allows people to use the registry in ways we would not have envisioned, e.g.:

myimagebank.org:containsTypesInTaxon = Leiopelmatidae
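The machine tag above appears to follow a `namespace:predicate = value` pattern. A minimal parser and lookup under that assumption; the syntax is inferred from this single example, not from a GBRDS specification, and the function names are hypothetical.

```python
from typing import NamedTuple


class MachineTag(NamedTuple):
    namespace: str
    predicate: str
    value: str


def parse_machine_tag(tag: str) -> MachineTag:
    """Parse 'namespace:predicate = value' (syntax assumed from the slide example)."""
    left, sep, value = tag.partition("=")
    if not sep:
        raise ValueError(f"missing '=' in {tag!r}")
    namespace, sep, predicate = left.strip().partition(":")
    if not sep or not namespace or not predicate:
        raise ValueError(f"expected 'namespace:predicate' before '=' in {tag!r}")
    return MachineTag(namespace, predicate, value.strip())


def find_tagged(tags_by_resource: dict[str, list[str]], predicate: str, value: str) -> list[str]:
    """Resources carrying at least one machine tag with this predicate and value."""
    hits = []
    for resource, tags in tags_by_resource.items():
        parsed = [parse_machine_tag(t) for t in tags]
        if any(p.predicate == predicate and p.value == value for p in parsed):
            hits.append(resource)
    return hits
```

Because the namespace is kept, a third party such as the image bank in the example can define its own predicates without coordinating with the registry.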

Page 10

Provider monitoring

Endpoint monitoring: http://bioguid.info/status/ (Rod Page)
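Endpoint monitoring can be sketched as periodic probing with a consecutive-failure counter. This is an assumed design for illustration, not how the bioguid.info service works; `http_probe`, `update_status` and the default threshold of 3 are hypothetical.

```python
import http.client
import urllib.request
from typing import Callable


def http_probe(url: str, timeout: float = 10.0) -> bool:
    """True if the endpoint answers with a 2xx/3xx status within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (OSError, ValueError, http.client.HTTPException):
        # URLError/HTTPError and timeouts are OSError subclasses; ValueError covers malformed URLs.
        return False


def update_status(urls: list[str], failures: dict[str, int],
                  probe: Callable[[str], bool] = http_probe,
                  threshold: int = 3) -> list[str]:
    """Probe each endpoint once; return those that have now failed `threshold` times in a row."""
    for url in urls:
        failures[url] = 0 if probe(url) else failures.get(url, 0) + 1
    return [url for url in urls if failures[url] >= threshold]
```

Counting consecutive failures rather than alerting on the first one avoids flagging providers for a single transient network error; the `probe` parameter also lets the logic be tested without network access.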

Page 11

Enabling discoverability

Combining human-authored and machine-generated metadata?

“…artificial intelligence is just that; ‘ARTIFICIAL intelligence’. For a system to feel smart to humans, you need human crafted metadata…”

Page 12

Associating data and metadata

Challenge #3: If there is agreement to improve discoverability by associating automatically generated metadata with a registered entity:

How do we uniquely identify resources within the registry while preserving existing (multiple) identifiers?

Where does one stop? (An inventory of taxa, for example?)

What services are required to enable this association? E.g. find the resource for a given “DwC:collectionCode”
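One way to approach the identifier questions: the registry assigns its own key but indexes every existing identifier (scheme and value) against it, so "find resource for DwC:collectionCode" becomes an index lookup. A minimal sketch; the class, the scheme labels and the use of random UUIDs as registry keys are assumptions.

```python
import uuid
from collections import defaultdict

Identifier = tuple[str, str]  # (scheme, value), e.g. ("DwC:collectionCode", "ORN")


class IdentifierIndex:
    """Hypothetical: one registry key per resource, plus all of its pre-existing identifiers."""

    def __init__(self) -> None:
        self._keys_by_identifier: defaultdict[Identifier, set[str]] = defaultdict(set)
        self.identifiers: defaultdict[str, set[Identifier]] = defaultdict(set)

    def register(self, *identifiers: Identifier) -> str:
        key = str(uuid.uuid4())  # registry-assigned key; existing identifiers are kept alongside it
        for identifier in identifiers:
            self.add_identifier(key, identifier)
        return key

    def add_identifier(self, key: str, identifier: Identifier) -> None:
        self.identifiers[key].add(identifier)
        self._keys_by_identifier[identifier].add(key)

    def find(self, scheme: str, value: str) -> set[str]:
        # May return several keys: codes such as collectionCode are not globally unique.
        return set(self._keys_by_identifier.get((scheme, value), set()))

    def find_all(self, *identifiers: Identifier) -> set[str]:
        """Keys carrying every given identifier, e.g. institutionCode plus collectionCode."""
        matches = [self.find(scheme, value) for scheme, value in identifiers]
        return set.intersection(*matches) if matches else set()
```

Returning a set rather than a single key makes the ambiguity explicit: a bare collection code often matches several resources, and only a combination such as institution code plus collection code narrows it down.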

Page 13

Existing metadata stores

There are many existing resources…

Identification of the master copy is critical for success

Conflict resolution: how do we achieve this?

Complete copies or subset copies?

Wikipedia style: make copies available?
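If a master copy is identified, the simplest conflict-resolution policy is "master wins". A minimal sketch of that one policy (an assumption; the slide leaves the approach open) that also reports disagreements so they can be passed back to the master's curators. The function name is hypothetical.

```python
def merge_metadata(master: dict[str, str],
                   *others: dict[str, str]) -> tuple[dict[str, str], list[str]]:
    """Master-wins merge of one resource's metadata held in several stores.

    Fields the master lacks are taken from the first other copy that supplies them;
    a field is reported as a conflict when any copy disagrees with the merged value.
    """
    merged = dict(master)
    conflicts: list[str] = []
    for other in others:
        for name, value in other.items():
            if name not in merged:
                merged[name] = value
            elif merged[name] != value and name not in conflicts:
                conflicts.append(name)
    return merged, conflicts
```

Letting subset copies fill gaps but never override the master keeps the result deterministic, while the conflict list avoids silently discarding corrections made elsewhere.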

Page 14

Service registration

Enabling service-oriented architecture (SOA) workflow definition requires the definition of:

Service endpoints

Input formats

Output formats
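With endpoints, input formats and output formats registered, a workflow tool can at least check that adjacent steps fit together. A minimal sketch; `ServiceDescription`, the format labels and the exact-match rule are hypothetical simplifications.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceDescription:
    endpoint: str
    input_format: str
    output_format: str


def check_workflow(steps: list[ServiceDescription]) -> list[str]:
    """Return one problem message for each adjacent pair whose formats do not line up."""
    problems = []
    for upstream, downstream in zip(steps, steps[1:]):
        if upstream.output_format != downstream.input_format:
            problems.append(
                f"{upstream.endpoint} emits {upstream.output_format}, "
                f"but {downstream.endpoint} expects {downstream.input_format}"
            )
    return problems
```

An empty result means every step's output is accepted by the next; real format matching would likely need subtyping or converters rather than string equality.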

Remember:

Page 15

GUID Resolution

Awaiting a recommendation from the task group

Do we envisage GBIF running a generic resolver (multiple)?

Should it act as a cache? Should it include endpoint monitoring and an early warning system?
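If GBIF did run a resolver that acts as a cache, its core behaviour might be: serve fresh cached documents, fall back to a stale copy when the upstream authority fails, and count consecutive failures as input to an early warning system. A sketch under those assumptions; `fetch`, the TTL and the failure counter are hypothetical, and no particular GUID scheme is implied.

```python
import time
from typing import Callable, Optional


class CachingResolver:
    """Hypothetical caching front end for GUID resolution.

    `fetch` stands in for the real upstream call (e.g. an HTTP request to the
    authority that issued the identifier) and returns None when that call fails.
    """

    def __init__(self, fetch: Callable[[str], Optional[str]], ttl: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch
        self._ttl = ttl
        self._clock = clock
        self._cache: dict[str, tuple[float, str]] = {}  # guid -> (fetched_at, document)
        self.failures: dict[str, int] = {}  # guid -> consecutive upstream failures

    def resolve(self, guid: str) -> Optional[str]:
        now = self._clock()
        cached = self._cache.get(guid)
        if cached is not None and now - cached[0] < self._ttl:
            return cached[1]  # fresh cache hit: upstream is not contacted
        document = self._fetch(guid)
        if document is None:
            # Early-warning signal: count consecutive failures per identifier.
            self.failures[guid] = self.failures.get(guid, 0) + 1
            return cached[1] if cached is not None else None  # serve stale copy if we have one
        self.failures[guid] = 0
        self._cache[guid] = (now, document)
        return document
```

Serving a stale copy keeps identifiers resolvable during a provider outage, while the non-zero failure counts show which authorities need attention.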

Page 16

Vocabulary definitions

Requires consensus within the community that the terms adequately describe the content.

A community site for authoring vocabularies?

The same applies to extensions to the Darwin Core: the GBIF Integrated Publishing Toolkit (IPT) uses the GBRDS as the source for extension definitions and vocabulary definitions.
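Once a community has agreed on a vocabulary, publishing tools can check and normalise incoming values against it. A minimal sketch assuming a vocabulary is a mapping from preferred term to synonyms. `PreservedSpecimen` and `HumanObservation` are real Darwin Core `basisOfRecord` values, but the synonyms and function names are illustrative assumptions, and this is not how the IPT actually consumes GBRDS definitions.

```python
BASIS_OF_RECORD = {
    "PreservedSpecimen": ["specimen", "preserved specimen"],  # synonyms are illustrative
    "HumanObservation": ["observation", "sight record"],
}


def build_lookup(vocabulary: dict[str, list[str]]) -> dict[str, str]:
    """Map every preferred term and synonym (case-insensitively) to the preferred term."""
    lookup = {}
    for preferred, synonyms in vocabulary.items():
        for label in [preferred, *synonyms]:
            lookup[label.casefold()] = preferred
    return lookup


def normalise(values: list[str], lookup: dict[str, str]) -> tuple[list[str], list[str]]:
    """Return (normalised values, values not covered by the vocabulary)."""
    known, unknown = [], []
    for value in values:
        term = lookup.get(value.strip().casefold())
        if term is None:
            unknown.append(value)
        else:
            known.append(term)
    return known, unknown
```

The list of unknown values is the useful by-product: it shows where the agreed terms do not yet "adequately describe the content".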

Page 17

Be smart with our limited resources

Page 18

Contact

Web site: http://www.gbif.org

Data portal: http://data.gbif.org

GBIF Secretariat, Universitetsparken 15, 2100 Copenhagen, Denmark

E-mail: [email protected]

Phone: +45 3532 1487