aggregation as tactic sm new


Post on 17-May-2015






  • 1. Aggregation as a tactic - to support discovery. Peter Burnhill & Stuart Macdonald, EDINA national data centre, University of Edinburgh. CERN workshop on Innovations in Scholarly Communication (OAI7), University of Geneva, 23 June 2011


  • RDTF Vision:
  • The joint JISC / RLUK Resource Discovery Task Force (RDTF) Vision:
  • UK researchers and students will have easy, flexible, and ongoing access to content and services through a collaborative, aggregated and integrated resource discovery and delivery framework which is comprehensive, open and sustainable
  • Making content more discoverable, both by people and machine, via a mixed economy of technological solutions.
  • The Discovery Initiative aims to:
  • Engage stakeholders across libraries, archives and museums
  • Build critical mass of open content to inspire others to participate
  • Encourage development of purposeful aggregations and compelling applications - mashing at the macro-level
  • Exemplify what can be done across domains to free data and explore how to make that data work harder
  • No one-size-fits-all solution!

Context 3.

  • Key concept in the RDTF Vision is aggregation, directly or represented through metadata, to unlock the online & digital riches held in our organisations
  • Regard aggregation as an intervention to exploit the telematic opportunity for things [that] are 'remote, digital & published' - a phrase derived from an IASSIST conference in 1990 exploring what it meant with the Internet if we regarded all [content] as remote and published.
  • The Web in the mid-1990s simplified and thus improved
  • Unfortunately, even now, much that is online and on the Web is badly or inadequately published
  • We have to improve, re-interpreting what it means to be well-published

aggregation as a tactic - a phrase coined to end an impasse during a meeting to discuss technical aspects of the RDTF Vision statement to identify stakeholder groups 4.

  • The term aggregation is used a lot in computer science for:
    • objects assembled or configured together to create a more complex object (UML, IBM)
    • aggregating resources based on properties: they are owl:sameAs and their other properties can be intermixed.
  • For purposes of RDTF aggregation means:
  • an assembly of data sources
    • more than a collection of objects (image banks, data services, catalogues, activity data) related or otherwise
  • for machine-as-user independent of presentation layer
  • However, aggregation is not a goal nor an end in itself. It is an intervention to be used for a twofold strategic purpose:
  • improvement - merge & match, customisation and consumption, multiple output formats, reduce duplication of effort
  • discoverability via promiscuous or well-dressed metadata through e.g. Google or tailored services
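
The owl:sameAs idea mentioned above (resources declared the same can have their other properties intermixed) can be sketched in a few lines. This is only an illustration under stated assumptions: the record layout, the function name `merge_same_as` and the example URIs are hypothetical, and picking the lexicographically smallest URI as the canonical one is an arbitrary choice, not something the RDTF work prescribes.

```python
from collections import defaultdict


def merge_same_as(records, same_as_pairs):
    """Merge property sets of resources linked by owl:sameAs-style pairs.

    records: dict mapping a resource URI to {property: [values]}.
    same_as_pairs: iterable of (uri_a, uri_b) statements.
    Returns a dict keyed by one canonical URI per group (assumption:
    the lexicographically smallest URI in the group).
    """
    parent = {uri: uri for uri in records}

    def find(uri):
        # Follow parent links to the group's root, compressing the path.
        while parent[uri] != uri:
            parent[uri] = parent[parent[uri]]
            uri = parent[uri]
        return uri

    for a, b in same_as_pairs:
        parent.setdefault(a, a)
        parent.setdefault(b, b)
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # Keep the smaller URI as the root of the merged group.
            parent[max(root_a, root_b)] = min(root_a, root_b)

    merged = defaultdict(lambda: defaultdict(set))
    for uri, props in records.items():
        root = find(uri)
        for key, values in props.items():
            merged[root][key].update(values)

    return {
        root: {key: sorted(values) for key, values in props.items()}
        for root, props in merged.items()
    }
```

Two catalogues describing the same serial under different URIs would, once linked by a sameAs pair, come out as a single record carrying both the title from one source and the ISSN from the other.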


  • Digital Library has mixed parentage - a re-mix of the document tradition & the computation tradition
    • approaches based on a concern with documents, with signifying records: archives, bibliography, documentation, librarianship, records management, and the like [ Content Provider speak ]
    • approaches based on uses of formal techniques, whether mechanical (such as punch cards and data-processing equipment) or mathematical/computational (as in algorithmic procedures). [ Developer speak ]
      • Prof. Michael Buckland, Presidential Address, American Society for Information Science, JASISs 50th (1998)

Language & Perspectives 6.

  • EDINA - develops and delivers JISC-sponsored national online services
    • adding value to data and content
      • Digimap Collections (OS mapping; SeaZone; BGS)
      • NewsfilmOnline (various; digitised with JISC )
      • UK Access Management Federation (institutions; authentication)
  • Data Library move from support to middle folk
      • Research data support for Edinburgh researchers
      • Research data management guidelines, training, OER materials
      • Edinburgh DataShare open data repository
      • RADAR Researching A Data Asset Registry
  • Maybe as middle folk - c.f. those who deal in middleware
      • sometimes having the role of creator and supplier of some service
      • sometimes being the user of what others supply
      • inter-operator

Perspectives as provider 7. Perspective as aggregator: developing and delivering JISC-sponsored aggregation services

    • JISC MediaHub - links to collections & hosted content (c. 1m resources)
      • CultureGrid; First World War Poetry; Films of Scotland; Getty images (all content searchable and viewable within JISC MediaHub)
    • GoGeo! - metadata registry for spatially-referenced data
      • Geodoc Metadata creation tool, ShareGeo Open
    • SUNCAT serials union catalogue: 80 libraries
      • metadata/links to full text, download MARC records (& XML & SUTRS - Simple Unstructured Text Record Syntax - data exchange format widely used in Z39.50)
    • PEPRS - e-journal preservation registry jointly led by EDINA with the ISSN International Centre
      • metadata registry of available back copy e-journals - aggregated from preservation agencies (incl. British Library, UK LOCKSS Alliance, CLOCKSS)

8. Some RDTF-related projects @ EDINA

    • GOgeo Linked Data (GOLD) - triplify INSPIRE-compliant metadata to improve discoverability of metadata records via search engines
    • SUNCAT: Exploring Open [bibliographic] Metadata (working with OKF to open up data sent by contributing libraries and convert it to RDF)
    • Sharing OpenURL Activity Data - monthly usage data: date & time; anonymised IP address/inst. ID; title; author; ISSN, DOI
    • Uses: article/journal recommendations, publishers reviewing what content is of interest to specific communities, innovative services to meet users' needs
    • CHALICE - use data mining to extract placenames from the English Place Name Survey to create a UK historic gazetteer published as Linked Data & link it to the Geonames ontology on the semantic web.
    • AddressingHistory - geo-parsing of Scottish Post Office Directories, API onto digitised content, output in XML, CSV, JSON
    • 3 further case studies on other EDINA services illustrating how other collections can benefit from the same techniques.
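
The activity-data item above mentions releasing usage records with an anonymised IP address. One possible way to do that is sketched below, purely as an assumption: the slides do not say how the project anonymises addresses, and the keyed-hash approach, the field name `ip` and the function names here are hypothetical.

```python
import hashlib
import hmac


def anonymise_ip(ip_address: str, secret: bytes) -> str:
    """Replace an IP address with a stable, keyed pseudonym.

    A keyed hash (HMAC-SHA256) gives the same token for the same
    address, so sessions can still be grouped, while the original
    address cannot be looked up without the secret key.
    """
    digest = hmac.new(secret, ip_address.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def anonymise_record(record: dict, secret: bytes) -> dict:
    """Return a copy of an activity record with its 'ip' field pseudonymised."""
    cleaned = dict(record)
    cleaned["ip"] = anonymise_ip(cleaned["ip"], secret)
    return cleaned
```

Keeping the token stable per address is a design choice: it preserves the "who also looked at" signal that recommendation services need, at the cost of requiring the key to be stored securely or discarded after each release.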

9. The end is the start of a new beginning

  • In earlier web time we had the MODELS user-verbs:
    • Discover -> Locate -> Request -> Access (Deliver)
    • Dempsey, Russell & Murray (1999)
    • where Access was the end game for us middle folk even if the beginning & part of a deeper process for researchers, students
  • Now there is call for more than bilateral & negotiated interoperability, where Access is the beginning for developers and for other services
  • RDF/Linked Data enables information to be shared in a more Web-friendly way
  • RDF/Linked Data enables structure and content of those data sources to be explicit - vocabularies, ontologies, relationships
  • Exposing the complexity and relationship in the underlying data, hanging the insides on the outside!
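
To make "hanging the insides on the outside" concrete, the sketch below writes a flat metadata record out as N-Triples, one explicit statement per line. It is a minimal illustration only: the function name `to_ntriples` and the example subject URI are hypothetical, the Dublin Core predicates are used simply as a familiar vocabulary, and treating any value that starts with http:// or https:// as a URI is a simplifying assumption.

```python
def to_ntriples(subject_uri, properties):
    """Serialise (predicate_uri, value) pairs about one subject as N-Triples.

    Values that look like HTTP(S) URIs become resource links; everything
    else becomes a plain string literal with basic escaping.
    """
    lines = []
    for predicate_uri, value in properties:
        if value.startswith("http://") or value.startswith("https://"):
            obj = f"<{value}>"
        else:
            escaped = (
                value.replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("\n", "\\n")
            )
            obj = f'"{escaped}"'
        lines.append(f"<{subject_uri}> <{predicate_uri}> {obj} .")
    return "\n".join(lines)


if __name__ == "__main__":
    print(
        to_ntriples(
            "http://example.org/dataset/42",  # hypothetical record URI
            [
                ("http://purl.org/dc/terms/title", "Historic place names"),
                ("http://purl.org/dc/terms/publisher", "http://example.org/org/edina"),
            ],
        )
    )
```

Each line names the subject, the relationship and the value in full, so another service can consume the record without knowing anything about the originating database or user interface.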

10. The treasures are on show inside, but Centre Pompidou 11. and so to summarise...

  • Early web approaches focused on making content accessible for humans
  • hiding the complexity and relationship in the underlying data
    • paying attention to the user interface: HCI & GUI; Usability and Accessibility
  • However to ensure content gets noticed it must be made easier for machines to understand by:
  • e