EuroSakai CLIF project presentation
DESCRIPTION
A presentation given at the EuroSakai 2011 conference in Amsterdam on 27th September 2011. It covers the work of the CLIF project to investigate the management of the digital lifecycle across systems, using the integration of the Sakai collaboration and learning environment with the Fedora digital repository system as an exemplar.
TRANSCRIPT
Enabling the digital content lifecycle: content flow between Sakai and Fedora
Chris Awre
Library and Learning Innovation
EuroSakai
Amsterdam, 27th September 2011
CLIF Project
CLIF - Content Lifecycle Integration Framework
Funded by JISC
01 July 2009 – 31 March 2011
Project partners
University of Hull
King’s College London
Centre for e-Research (CeRch)
Background
• CLIF is building on work within the JISC-funded RepoMMan and REMAP projects
• In particular, REMAP explored how a repository could support records management and digital preservation as part of a lifecycle management approach for digital content
• Previous work had sought to push the repository upstream in the workflow
• Dilemma was that the repository risked becoming another content silo alongside other content management systems on campus (in our case, Sakai and SharePoint)
• How can the repository become more integrated in the institutional environment?
Fedora
• Powerful digital repository framework
• Adopted at University of Hull in 2005
• Live institutional repository since 2008
• Developed and managed through DuraSpace
• Strong community model, akin to Sakai
• Features we like (the advert!)
• Powerful digital object model
• Extensible metadata management
• Expressive inter-object relationships
• Version management
• Configurable security architecture
Local repository need
• Scalable solution (not one that has an upper limit): digital content is only going to grow
• Standards-based (open standards where possible): to provide a future-proof exit strategy
• Content agnosticism: we don’t know what types of content may come along
• Content semantics: recording the relationships between different pieces of content supports future use and preservation
Other repository systems?
• The focus of the work was based around systems that were in place at Hull
• Other repository options were not actively considered
• Following on from work looking at integration of DSpace and Sakai through the CTREP project
• Aimed to achieve the same end goal of seamless integration for Fedora
• Regardless of the system, it is important to understand what you are trying to achieve in the management of content through integration
• Repository choice driven by external factors of how repository management is carried out
CTREP
• CTREP project was a JISC-funded project, 2007-9
• Aimed to increase repository usage through integration within the LMS, using Sakai as the platform
• Cambridge examined integration with DSpace
• University of Highlands & Islands (UHI) examined integration with Fedora
• Work focused on use of Sakai ContentHostingHandler
• DSpace work successful, albeit that information being sent between the two was limited
• Fedora work halted as it became clear that the version of Sakai CHH at the time was not able to deal with rich Fedora objects
• Re-visiting this has been possible through Sakai developments
• We are grateful to CTREP for pioneering this approach
Lifecycle
Lifecycle management within a repository
Can this be enabled across systems?
Lifecycle integration
Content flows between systems according to need in lifecycle
Sakai SharePoint
Repository
Sakai and content management
• Content management for teaching & learning makes heavy use of the Resources tool
• Some imaginative ways used for how content from here is used by other tools within the system
• Content is also shared between sites, and staff are encouraged to make their content shareable
• Focus of content management is to support use within Sakai
• Focus is on Sakai, not the content
• A content silo?
• How could integration with a content store – a repository – enhance how Sakai manages and uses content?
CLIF project objectives
• Understand how digital content can be managed across systems as part of the digital content lifecycle
• Recognising that individual systems cannot always support the whole lifecycle from creation to preservation or deletion
• Specifically investigate the role of repositories in the digital content lifecycle
• Where is the repository best positioned within the lifecycle?
• What roles can digital repositories play?
• Understand how content will flow in and out of a repository as part of the lifecycle
• CLIF has been agnostic about this
CLIF use cases I
• Use cases cover research, teaching and administration
• Based on interviews with staff at partner institutions
• Academic staff (Head of Department / Senior Lecturer)
• Records Manager
• Research active staff
• Interviews highlighted that staff were managing as best they could within single systems they were familiar with
• Potential to exploit additional functionality in other systems welcomed
CLIF use cases II
• Research
• Capturing data produced through experimental equipment and archiving this for use in future work in the repository
• Preparation of research outputs and archiving of these for dissemination
• Teaching
• Teaching materials accessed from within a repository to inform current courses
• Exam papers created in one system and archived for future reference in the repository (marks could be archived for private access as well)
• Administration
• Committee papers circulated to committee members before a meeting are moved to the repository for wider access post-meeting
CLIF outputs
• Literature review on managing the digital content lifecycle across systems
• Technology integrations as exemplars of how a repository can support lifecycle management across systems
• Fedora – Sakai integration
• Fedora – SharePoint integration
• Software available on GitHub
• Technical appendix to final report describing architecture and implementation
A digital content lifecycle
© Digital Curation Centre
There are many variations and versions of lifecycle models - another is not required
Each has a number of stages
CLIF sought to capture use cases that encompassed a number of these stages and tested how they could be managed across systems
Literature review
• There was little literature directly addressing the system aspects of managing the digital content lifecycle
• Work was focused within a system or was more architecture-based without addressing specific systems
• Possibly due to flux in technology development
• Terminology is key to addressing lifecycle management
• There are many different lifecycles (knowledge, digitisation, metadata, etc.) that may overlap
• Can be easier to break down the lifecycle into stages, many of which are common
Lifecycle characteristics
• The use of standards can greatly ease movement between systems
• cf. the use of the Hydra digital object approach
• Policy is as important as technology in determining how different systems are used to manage a lifecycle
• Digital preservation can be greatly supported if considered at the beginning of the lifecycle (as REMAP found)
• There is a need to identify how people and roles fit into an overall lifecycle
• It may be valuable to record information about the lifecycle itself as content moves, but this has resource implications
• cf. the use of PREMIS events metadata recording what happens to an object
Sakai – Fedora integration
• Sakai 2.6.1
• Fedora v3.4
• Extends and enhances the JISC CTREP Fedora ContentHostingHandler plugin
• CHH is a pluggable provider model for hosting content
• Content displayed in standard Sakai Resources Tool
• Enabled and configured by uploading a mountpoint.properties text file
• Resources Tree view shows a ‘live view’ of a specific Fedora collection
• ‘Show other sites’ allows files and/or nested folders to be copied/moved between MyWorkspace site and Fedora mounted site
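The idea of a pluggable provider attached at a mount point can be illustrated with a small Python sketch. This is not the Sakai ContentHostingHandler code (which is Java); the class names, method names and paths below are all hypothetical and only show how requests under a mounted folder might be routed to a repository-backed handler while everything else stays with the default store.

```python
class DefaultStore:
    """Stand-in for the platform's own content storage (hypothetical)."""

    def list_children(self, path):
        return [f"{path}/local-file.txt"]


class RepositoryHandler:
    """Stand-in for a pluggable handler backed by an external repository."""

    def __init__(self, collection):
        self._collection = collection  # labels of the members of one collection

    def list_children(self, path):
        return [f"{path}/{label}" for label in self._collection]


class Resolver:
    """Routes a path to the handler mounted at the longest matching prefix."""

    def __init__(self, default):
        self._default = default
        self._mounts = {}

    def mount(self, prefix, handler):
        self._mounts[prefix.rstrip("/")] = handler

    def handler_for(self, path):
        matches = [p for p in self._mounts if path == p or path.startswith(p + "/")]
        if not matches:
            return self._default
        return self._mounts[max(matches, key=len)]

    def list_children(self, path):
        return self.handler_for(path).list_children(path)


resolver = Resolver(DefaultStore())
# Hypothetical mount point: one folder that shows a repository collection.
resolver.mount("/group/demo/repository", RepositoryHandler(["report.pdf", "data.csv"]))
mounted = resolver.list_children("/group/demo/repository")
local = resolver.list_children("/group/demo/notes")
```

The user-facing tool only ever calls the resolver, which is why a mounted repository folder can appear alongside ordinary folders without the tool changing.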
.properties configuration file
Sakai to Fedora
Or…
ContentHostingHandlerImplFedora
ContentHostingHandlerResolverImpl
DBContentService
BaseContentService
CHS API
Resources Tool
Linking Sakai and Fedora
• Content is held very differently in Sakai and Fedora
• Sakai holds files
• Fedora holds objects made up of a collection of datastreams, one of which is the file (others will contain metadata)
• In linking Sakai and Fedora, three considerations need to be addressed
• Displaying Fedora objects in a tree structure and Fedora collections as folders
• Issue for security around the objects
• Depositing a file in Fedora from Sakai requires a Fedora object with associated metadata to be created
• Retrieving a file from Fedora for use in Sakai requires use of the search capability within Fedora
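The difference between a file and an object made up of datastreams can be sketched in a few lines of Python. This is an illustrative model only, not the Fedora API: the class names, the datastream ids "content" and "metadata", and the identifier "demo:1" are assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Datastream:
    """One named component of a repository object (hypothetical model)."""
    dsid: str        # datastream identifier, e.g. "content"
    mime_type: str
    data: bytes


@dataclass
class RepositoryObject:
    """A Fedora-style object: an identifier plus a set of datastreams."""
    pid: str
    datastreams: dict = field(default_factory=dict)

    def add(self, ds: Datastream) -> None:
        self.datastreams[ds.dsid] = ds


def wrap_file(pid: str, filename: str, mime_type: str, payload: bytes) -> RepositoryObject:
    """Deposit direction: wrap a plain file in an object with metadata."""
    obj = RepositoryObject(pid=pid)
    obj.add(Datastream("content", mime_type, payload))
    # Minimal descriptive metadata; a real deposit would carry far more.
    metadata = f"title={filename}\n".encode("utf-8")
    obj.add(Datastream("metadata", "text/plain", metadata))
    return obj


def extract_file(obj: RepositoryObject) -> bytes:
    """Retrieval direction: only the content datastream comes back as a file."""
    return obj.datastreams["content"].data


obj = wrap_file("demo:1", "lecture01.pdf", "application/pdf", b"%PDF-1.4 example")
```

The sketch also shows why metadata can be lost on the way back: a system that only understands files sees the content datastream and nothing else.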
Lessons learned
• SOAP messaging between the two systems made the link very slow
• Due to use of HTTPS
• Switching to HTTP improved performance and allowed easier debugging
• Other performance improvements enabled included:
• Caching of resources and folder objects
• Minimising web service calls by using one call to retrieve multiple properties
• No pre-fetching of datastreams
• The CHH code is over-complicated at times
• Impact of changes at high level can be extensive lower down
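Two of the performance measures above, caching and fetching several properties in one call, can be illustrated with a minimal Python sketch. The client below is a stand-in that simply counts calls; its method name, the property names and the sample record are hypothetical and do not reflect the actual Fedora web service interface.

```python
class FakeRepositoryClient:
    """Stand-in for a remote web service; counts the calls made to it."""

    def __init__(self, records):
        self._records = records
        self.calls = 0

    def get_properties(self, object_id, names):
        """Return several properties in a single round trip."""
        self.calls += 1
        record = self._records[object_id]
        return {name: record[name] for name in names}


class CachingResourceView:
    """Fetches all needed properties in one call and caches the result."""

    PROPERTY_NAMES = ("label", "size", "modified")

    def __init__(self, client):
        self._client = client
        self._cache = {}

    def properties(self, object_id):
        if object_id not in self._cache:
            self._cache[object_id] = self._client.get_properties(
                object_id, self.PROPERTY_NAMES
            )
        return self._cache[object_id]


client = FakeRepositoryClient(
    {"demo:1": {"label": "Lecture 1", "size": 2048, "modified": "2011-09-27"}}
)
view = CachingResourceView(client)
label = view.properties("demo:1")["label"]
size = view.properties("demo:1")["size"]  # served from the cache, no new call
```

A cache like this trades freshness for speed, so a real integration would also need some way to invalidate entries when the object changes in the repository.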
Sakai – Fedora features
• The repository is embedded as a set of resources that appear like any other set of resources
• The majority of menu functions work in the same manner as with standard resources, e.g., upload, copy, paste, move, delete, create
• This applies to folders as well as individual objects
• Folders represent collection objects in the repository
• Metadata can be captured in Sakai for use in Fedora (though Sakai is not able to re-use this when retrieving an object from Fedora)
• User can browse Fedora collection (though not yet search)
• User does not need to know they are working with the repository
Fedora 2
• Very flexible – this has made exchanging objects between Fedora instances and between Fedora and other systems difficult
• Common approach to structuring digital objects is required
• Systems interacting with Fedora can build objects using this common approach
• CLIF adopted the approach developed through the Hydra project
• http://projecthydra.org/
Fedora 2 contd.
• Common structuring/modelling approach allows for object metadata to be edited in the repository as part of their lifecycle management
• Each object has:
• rightsmetadata
• …and could have…
• descmetadata (using MODS)
• contentmetadata
• techmetadata
• etc.
• If Sakai can provide this
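A system building objects to this common structure could check them before deposit. The Python sketch below is an assumption-laden illustration, not Hydra code: it takes the datastream names exactly as listed on the slide, treats rightsmetadata as required and the others as optional, and compares names case-insensitively because the exact spelling used in a given repository is not stated here.

```python
# Names as listed on the slide; which are required follows the slide's wording.
REQUIRED = {"rightsmetadata"}
OPTIONAL = {"descmetadata", "contentmetadata", "techmetadata"}


def check_structure(datastream_ids):
    """Report how a set of datastream ids compares with the common structure."""
    present = {dsid.lower() for dsid in datastream_ids}
    return {
        "missing_required": sorted(REQUIRED - present),
        "optional_present": sorted(OPTIONAL & present),
        "unrecognised": sorted(present - REQUIRED - OPTIONAL),
    }


# Hypothetical object carrying rights and descriptive metadata plus a file.
report = check_structure(["rightsMetadata", "descMetadata", "content"])
incomplete = check_structure(["content"])
```

An empty "missing_required" list means the object meets the minimum structure; anything under "unrecognised" is simply outside the names listed above, not necessarily an error.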
Copy/move to/from Repository
Copying and moving folders/files between Fedora and MyWorkspace is easy! Copy…
Copy/move to/from Repository
…paste!
It looks easy, but…
… you don’t see what is going on underneath!
© 2008 Richard Green
Outstanding work
• Managing versions from within Sakai, or accessing them, isn’t currently possible
• Some of the commands under the Edit functionality have no current effect on the object in Fedora
• The metadata captured is minimal, and Sakai cannot make use of metadata added within Fedora
• Folders with large numbers of resources have a noticeable impact on performance when browsing or carrying out actions upon them
Evaluation
• There needs to be a clear understanding and view about where the boundaries are between the different systems being used, to avoid confusion
• There needs to be clarity over why different systems are being used, to overcome concerns about having to work with multiple systems
• There is a need for better preservation and a recognition that integrating the repository could support this, but also a need to be clear about what needs preserving
• There is benefit in being able to access other content stores from within your current working environment in order to see what is available more broadly
Sakai-repository evaluation
• The seamless access was much valued
• Having access to resources that could be used within Sakai was a valuable addition to being able to browse resources inside Sakai
• Providing access to resources in context was considered very important, hence, linking to the files in the repository instead of copying them across may be preferred
• Why create a copy if access is OK where the content is?
• Reference or irregular content was considered to fit best into the model of access via repository
• Bulk movement likely to be more useful than object by object movement
Sakai OAE
• Focus on presentation of content in context
• This tallies with findings in CLIF
• Focus on use of APIs where available
• Institutional repository systems are not so good at this
• A challenge for these systems
• Capturing annotations alongside original content would enhance archival records
• Exporting multiple resources, as IMS CP or other, also a route for managing content across systems
Conclusions
• Diverse content management systems can be effectively integrated to allow cross-system lifecycle management
• Better adoption of interface standards would be helpful
• Standardisation in the structure of the content being moved maximises how the content can be managed by the different systems
• Where the repository is one of the systems involved its current primary role appears to be as a recipient of content (for preservation)
• Perception that content in the repository can be used there without moving it into the other integrated systems
Demo
Copyright © copyright-free-photos.org.uk
Thank you
Chris Awre – [email protected]
Richard Green – [email protected]
Andrew Thompson – [email protected]
Simon Waddington – [email protected]
Project website - http://www2.hull.ac.uk/discover/clif.aspx
Project GitHub - https://github.com/uohull/clif-sharepoint and https://github.com/uohull/clif-sakai
Project final report - http://edocs.hull.ac.uk/splash.jsp?parentId=hull:1647%26pid=hull:4194