8/8/2019 The DSpace Digital Reposito.
http://slidepdf.com/reader/full/the-dspace-digital-reposito 1/7
{ 2006 11 09 }
The DSpace Digital Repository: A Project Analysis
Here is the conclusion of my analysis of DSpace. I liked this one; I had a fun time doing it.
The issue is that I use LaTeX and BibTeX, so I couldn’t copy text from the PDF into my blog
without stripping out the references. But here is a full copy of my PDF, so you can
read it all if you want. I will update things when I can get the full paper translated.
Update: Full paper below the cut, thanks to latex2rtf.
DSpace Digital Repository: A Project Analysis / Subject/Object http://subjectobject.net/2006/11/09/the-dspace-digital-repository-a-proj...
7 27/8/2010 6:02 PM
1 Introduction
DSpace is an advanced digital repository system that aims to simplify the long-term
archivization and access of digital research objects in any format. DSpace is an open-source,
web-based system which can be accessed remotely by submitters, administrators and the
general public, and which can be modified to suit a particular institution’s needs. While
DSpace’s flexibility allows it to be used in a variety of scenarios (“Introducing DSpace” 2006),
this paper examines the usefulness of DSpace as a research repository implemented by
the library of a large university for the use of its faculty and departments. Here we will examine
the installation, implementation and usage of a DSpace set-up, and address some problems
or questions that may arise. A test installation of the software is beyond the scope of this
analysis, but reports from other users will be cited. In the end we will conclude that any
limitations of DSpace are minor, and that it would be a highly useful tool for any university to
implement.
2 Project Summary
DSpace was completed in November 2002 through a joint effort between Hewlett-Packard
Labs (HP) and the Massachusetts Institute of Technology (MIT), who have released the
resulting code under an open-source licence, specifically the permissive BSD license (Smith,
2003). This means that end-users can adjust, modify or improve the code as they see fit, and
the project developers evaluate and reincorporate improvements made
by users into the main distribution (Smith, 2003). As of this writing the software is hosted on
the open-source repository SourceForge, which currently offers version 1.4 of the software,
indicating that the project is past beta testing and ready for end-users (“DSpace” 2006). The DSpace
Federation’s unofficial list has over 100 institutions using DSpace (“DSpaceInstances” 2006).
From this we can conclude that the software is well tested and supported by a community of users.
However, as the software is open-source, neither MIT nor HP offers official support (Smith,
2003).
The project was designed as a tool for institutions (in MIT’s case, a university) to implement
a central location where faculty, departments, disciplines, labs and research centres could
store their published and pre-published research for access by others and for long-term
archivization. The developers claim that the software was built to support “every function
that a research organization needs to run a production digital repository service, but as
simply as possible” (Smith, 2003). Furthermore, the software was designed to be
multidisciplinary: it is built around the idea of the “Community,” which designs its own
workflows and manages its own deposits, as we will examine under “Usage and
Institutional Policy.” Communities can be any size, from labs to departments to entire
research institutes (Smith, 2003).
As well, the repository does not simply archive text, as some other e-print servers do, but
anything that may be part of faculty research. Text, audio and video are the most obvious
data formats, but the system will accept anything in any format for viewing with the
appropriate software: data sets, complex computer models and simulations, even binary
software (e.g. .EXE files) (“EndUserFaq” 2006). The software goes beyond the needs of
eTheses and pre-print servers, although these have also been implemented with DSpace (Jones
2004; Nixon 2003). The director of the project, MacKenzie Smith, envisions a future where
scholarly journals are removed from the publishing process and universities self-publish
faculty research with the help of software like DSpace (“Interview: A journey into
DSpace” 2003). DSpace is a robust and flexible repository implementation that, with the right
policies, will be able to handle any research users would wish to deposit in it.
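DSpace’s internal format handling is not described in the sources cited here, so the following Python sketch only illustrates the behaviour described above under stated assumptions: a small, hypothetical extension-to-format table (`FORMAT_REGISTRY`), unknown files still accepted as generic binary data, and the choice between viewing in the browser and downloading expressed as an HTTP `Content-Disposition` header (`inline` or `attachment`).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BitstreamFormat:
    mime_type: str
    browser_viewable: bool  # can a typical web browser display it inline?


# Hypothetical, heavily simplified format table keyed by file extension.
FORMAT_REGISTRY = {
    "pdf": BitstreamFormat("application/pdf", True),
    "html": BitstreamFormat("text/html", True),
    "jpg": BitstreamFormat("image/jpeg", True),
    "mp3": BitstreamFormat("audio/mpeg", True),
    "csv": BitstreamFormat("text/csv", False),
    "exe": BitstreamFormat("application/octet-stream", False),
}

# Anything unrecognised is still accepted, just treated as opaque binary data.
UNKNOWN = BitstreamFormat("application/octet-stream", False)


def identify_format(filename: str) -> BitstreamFormat:
    """Guess a file's format from its extension, falling back to UNKNOWN."""
    _, dot, extension = filename.rpartition(".")
    if not dot:
        return UNKNOWN
    return FORMAT_REGISTRY.get(extension.lower(), UNKNOWN)


def content_disposition(filename: str) -> str:
    """Show browser-viewable formats inline; offer everything else as a download."""
    disposition = "inline" if identify_format(filename).browser_viewable else "attachment"
    return f'{disposition}; filename="{filename}"'


print(content_disposition("thesis.pdf"))      # inline; filename="thesis.pdf"
print(content_disposition("simulation.exe"))  # attachment; filename="simulation.exe"
```

The point of the fallback is that nothing is refused: an unfamiliar data set or model file is simply stored and handed back as a download.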
3 Technology Considerations
3.0.1 Requirements
DSpace is designed to run on a standard UNIX system with minimal resources (Smith, 2003),
which should already be in place in most university environments. The system itself is
composed of standard open-source database (PostgreSQL) and web-server (Apache and
Tomcat) software. The back end of the service runs on Java, so in theory it could run in
any operating system environment, but this is untested by the developers (Smith, 2003). The
DSpace Federation recommends IT support by someone with both UNIX administration
experience and Java programming ability (“DSpace System Manager: Implement
DSpace” 2006), although this may only be necessary if an institution were looking to heavily
modify its local installation. Given someone familiar with UNIX software installation and
networking, a basic system could be installed quickly and simply (Horsman & Pompe
2005).
3.0.2 Support
While neither MIT nor HP offers official support, there is a very active community around the
software, and it is under active development. Beyond the DSpace Wiki <http://wiki.dspace.org>,
which addresses both technical and non-technical questions, there are also general,
technical and development mailing lists at <http://dspace.org/feedback/mailing.html>, which
are very active, and bugs are tracked on the SourceForge site
<http://sourceforge.net/projects/dspace/>. There may be some issues for universities that are
not experienced with the support process for open-source software and are more familiar with
commercial customer support. Nevertheless, most large university libraries have IT staff
with the recommended level of experience, who should be very familiar with open-source
software.
4 Usage and Institutional Policy
4.0.1 Submission
After installation the system is accessed through a set of three web-based interfaces (Smith,
2003): one for end-users, one for those in the submission process (discussed below),
and one for administrators. Formats viewable from within the browser
are loaded on demand, with all other formats available for download and viewing with the
required software (Smith, 2003). As an installation was beyond the scope of this analysis, we
examine the system from the perspective of a submitter or an administrator by citing
other users’ impressions of the software. Nixon (2003) outlines a seven-step process for
depositing materials: three Description steps, Upload, Verify, Licence and Complete. These
steps are tracked by a progress bar, and the submitter is free to move back and forth
between the steps. For ease of use, the submitter, who might not be technically inclined, does
not have to know the file format of their submission, as DSpace analyses the file and assigns
an appropriate designation upon upload (Nixon 2003). One issue Horsman and Pompe (2005)
found was that the upload process was slow, particularly for larger files, although this may
have been improved in later versions. Lastly, the submitter can select a licence for their
submission, allowing for the choice of an open-content licence (e.g. Creative Commons) if
desired.
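The navigation Nixon describes can be modelled as a simple step tracker. The Python sketch below is our illustration, not DSpace code: the step labels, the `Submission` class and its methods are hypothetical, and we assume a submitter may revisit any step already reached but not skip ahead, and that nothing can be changed once the deposit is complete.

```python
# Hypothetical labels for the seven steps listed by Nixon (2003).
STEPS = ["Describe 1", "Describe 2", "Describe 3", "Upload", "Verify", "Licence", "Complete"]


class Submission:
    """Tracks a single deposit as it moves through the submission steps."""

    def __init__(self) -> None:
        self.current = 0   # index of the step currently shown
        self.furthest = 0  # furthest step reached so far

    @property
    def step(self) -> str:
        return STEPS[self.current]

    @property
    def complete(self) -> bool:
        return self.current == len(STEPS) - 1

    def progress_bar(self) -> str:
        """'#' for each step reached so far, '-' for the rest."""
        return "".join("#" if i <= self.furthest else "-" for i in range(len(STEPS)))

    def next(self) -> None:
        """Advance one step; does nothing once the deposit is complete."""
        if not self.complete:
            self.current += 1
            self.furthest = max(self.furthest, self.current)

    def go_to(self, name: str) -> None:
        """Jump back (or forward again) to any step already reached."""
        if self.complete:
            raise RuntimeError("submission is already complete")
        index = STEPS.index(name)
        if index > self.furthest:
            raise ValueError(f"step {name!r} has not been reached yet")
        self.current = index


s = Submission()
for _ in range(4):
    s.next()                      # Describe 1 -> ... -> Verify
s.go_to("Describe 2")             # go back to correct the description
print(s.step, s.progress_bar())   # Describe 2 #####--
```

Keeping `furthest` separate from `current` is what lets the progress bar stay filled while the submitter steps back to fix an earlier page.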
4.0.2 Communities
The submission process itself depends greatly on the policies of a particular “Community” as
understood by DSpace. As noted, communities can be of any size, from a small lab to a large
institute. They are defined by their internal policies regarding submission and access to the
research of that group. Submitters are not bound to a particular community, but they do have
to select which community their work will be submitted to (Nixon 2003). Users of the system
with different levels of involvement work within a community to access the submission and
prepare it for archivization; a work is not archived until it has gone through the community’s
process (Smith, 2003).
4.0.3 Policy
While it could be the policy of a community to allow any of its faculty to submit papers which
are automatically archived, a more complex example may have a group of people designated
as reviewers, a member who is responsible for metadata (discussed below) and a project
co-ordinator who gives final approval (Smith, 2003). A research object would need to be
reviewed and edited according to the community’s policy before it was ultimately archived.
Each person with a role in the process can log on to the system to see which objects are at
which stage of review, and what action must be taken by the various members of the process.
The developers of DSpace call this a “workflow” (Smith, 2003), and have designed the
system to be flexible enough to handle the workflows of all researchers, from a lone English
professor to a complex biochemical or medical research team.
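To make the idea of a community-defined workflow concrete, here is a minimal Python sketch, assuming a workflow is nothing more than an ordered list of roles that must each approve an item. The `Community` and `Item` classes, the role names and the method names are hypothetical and do not reflect DSpace’s actual data model; an empty workflow stands for the policy of archiving submissions automatically.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    title: str
    step: int = 0           # index of the next workflow role that must act
    archived: bool = False  # set only after every role has approved


@dataclass
class Community:
    name: str
    workflow: list[str]     # ordered roles; empty means "archive on submission"
    pending: list[Item] = field(default_factory=list)
    archive: list[Item] = field(default_factory=list)

    def submit(self, item: Item) -> None:
        if self.workflow:
            self.pending.append(item)
        else:
            self._archive(item)

    def tasks_for(self, role: str) -> list[str]:
        """What a member logging on with this role currently has to act on."""
        return [i.title for i in self.pending if self.workflow[i.step] == role]

    def approve(self, item: Item, role: str) -> None:
        if item.archived or item not in self.pending:
            raise ValueError(f"{item.title!r} is not awaiting review here")
        if self.workflow[item.step] != role:
            raise PermissionError(f"a {role} cannot act at this stage")
        item.step += 1
        if item.step == len(self.workflow):
            self.pending.remove(item)
            self._archive(item)

    def _archive(self, item: Item) -> None:
        item.archived = True
        self.archive.append(item)


solo = Community("English: single author", workflow=[])
lab = Community("Biochemistry lab", ["reviewer", "metadata editor", "coordinator"])

essay = Item("Essay on Middlemarch")
solo.submit(essay)                        # archived straight away

preprint = Item("Enzyme kinetics preprint")
lab.submit(preprint)
print(lab.tasks_for("reviewer"))          # ['Enzyme kinetics preprint']
for role in lab.workflow:
    lab.approve(preprint, role)
print(preprint.archived)                  # True
```

The same class serves both ends of the spectrum in the text: the lone author’s community has no roles at all, while the lab’s item waits in `pending` until each role has acted in order.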
There can be problems, however, with the implementation of communities. Nixon (2003)
found the communities too “flat,” as sub-communities were not implemented. However, I
believe this critique misunderstands the role of the community. Communities are not
primarily for the organization of the archive, which can easily be handled by metadata, but are
necessary for the submission process, which can be radically different not only for different
departments across the university, but also for “sub-communities” within each department.
In any case, Nixon (2003) does note that sub-communities were added as of version 1.2 of
DSpace.
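The argument that sub-communities matter mainly for submission can be illustrated with a small tree in which every sub-community carries its own workflow. This Python sketch is hypothetical (`CommunityNode`, `walk` and the example names are ours), and it deliberately does not model any inheritance of policy from parent to child, since the sources above do not describe one.

```python
from dataclasses import dataclass, field
from typing import Iterator


@dataclass
class CommunityNode:
    name: str
    workflow: list[str] = field(default_factory=list)  # empty = archive on submission
    children: list["CommunityNode"] = field(default_factory=list)

    def walk(self, prefix: str = "") -> Iterator[tuple[str, list[str]]]:
        """Yield every (sub-)community's path together with its own workflow."""
        path = f"{prefix}/{self.name}" if prefix else self.name
        yield path, self.workflow
        for child in self.children:
            yield from child.walk(path)


chemistry = CommunityNode("Chemistry", ["coordinator"], [
    CommunityNode("Teaching materials"),
    CommunityNode("Protein Lab", ["reviewer", "metadata editor", "coordinator"]),
])

for path, workflow in chemistry.walk():
    print(path, "->", workflow or "archived on submission")
```

Within one department, teaching materials can go straight into the archive while the lab’s research passes through three reviewers, which is exactly the kind of difference a flat list of communities could not express.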
5 Metadata and Access
5.0.1 Metadata
DSpace archives all research objects under a qualified Dublin Core metadata standard
(Smith, 2003). This metadata is recorded at the time of submission and displayed with the item when
accessed, and end-users can search items by their metadata (Nixon 2003). As in all
discussions of metadata, however, there are those who require more information and those who
require less. Jones (2004) found the available metadata more than adequate for his
uses, while Horsman and Pompe (2005) found the metadata severely lacking in
specificity for archival purposes. Furthermore, they found the lack of multilevel description and
authority control over vocabulary problematic (Horsman & Pompe 2005). Browsing the
University of Toronto’s own “T-Space” repository list of subjects
<https://tspace.library.utoronto.ca/browse-subject>, which lacks a controlled vocabulary and
classification scheme, proves daunting, and searching by subject is very difficult as well.
It might be possible for individual communities to control their own vocabulary, but this is not
a function of the software itself.
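The kind of vocabulary control a community might impose can be sketched in a few lines. Below, metadata is held as (element.qualifier, value) pairs in the qualified Dublin Core style (names such as `dc.contributor.author` follow that style), and a hypothetical, community-maintained table maps variant subject terms to a preferred term. Unknown terms are reported rather than rejected, on the assumption that a librarian in the workflow would decide on them. None of this is a DSpace feature.

```python
# Metadata as (element.qualifier, value) pairs, in the qualified Dublin Core style.
record = [
    ("dc.title", "Building a digital archive: A Dutch experience"),
    ("dc.contributor.author", "Horsman, P."),
    ("dc.contributor.author", "Pompe, K."),
    ("dc.date.issued", "2005"),
    ("dc.subject", "Digital archives"),
    ("dc.subject", "digital archiving"),
    ("dc.subject", "Netherlands"),
]

# Hypothetical community vocabulary: lower-cased variant -> preferred term.
SUBJECT_VOCABULARY = {
    "digital archives": "Digital archives",
    "digital archiving": "Digital archives",
    "electronic archives": "Digital archives",
    "institutional repositories": "Institutional repositories",
}


def apply_vocabulary(record):
    """Replace known subject variants with preferred terms; collect unknown terms."""
    cleaned, unknown = [], []
    for element, value in record:
        if element == "dc.subject":
            preferred = SUBJECT_VOCABULARY.get(value.strip().lower())
            if preferred is None:
                unknown.append(value)   # left for a librarian to decide
            else:
                value = preferred
        if (element, value) not in cleaned:  # drop exact duplicates
            cleaned.append((element, value))
    return cleaned, unknown


cleaned, unknown = apply_vocabulary(record)
print([v for e, v in cleaned if e == "dc.subject"])  # ['Digital archives', 'Netherlands']
print(unknown)                                      # ['Netherlands']
```

Collapsing the two variant subjects into one preferred term is what would make a subject browse list like T-Space’s shorter and searchable.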
5.0.2 Integration
This standard metadata scheme allows tight integration between DSpace and other
digital repositories through its implementation of the Open Archives Initiative Protocol for
Metadata Harvesting (OAI-PMH) (Smith, 2003), which allows data submitted to DSpace to be
“harvested” by other repositories. For instance, a community working in Library and Information
Science, while submitting their papers to their local DSpace repository, might also concurrently
make their work available to an OAI-compliant pre-print repository such as the Digital Library of
Information Science and Technology (DLIST) <http://dlist.sir.arizona.edu> without having to
re-upload files or re-enter metadata. This makes the connections between databases easy and
efficient, promoting scholarly interaction beyond the local department or faculty.
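On the harvesting side, OAI-PMH is a plain HTTP protocol: a harvester requests records with a `verb` such as `ListRecords` and a `metadataPrefix` such as `oai_dc` (simple Dublin Core), and receives XML. The Python sketch below builds such a request and pulls identifiers and titles out of a response using only the standard library. The endpoint URL and the sample response are hypothetical, and the sample omits elements a real response contains (such as `responseDate` and `request`) as well as resumption tokens for paging.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple
from urllib.parse import urlencode

# Hypothetical endpoint; a real repository publishes its own OAI-PMH base URL.
BASE_URL = "https://repository.example.edu/oai/request"

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url: str, set_spec: Optional[str] = None) -> str:
    """Build a ListRecords request for simple Dublin Core records."""
    params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
    if set_spec:
        params["set"] = set_spec  # optionally restrict the harvest to one set
    return f"{base_url}?{urlencode(params)}"


def harvest_titles(xml_text: str) -> List[Tuple[str, str]]:
    """Return (OAI identifier, dc:title) for each record in a response."""
    root = ET.fromstring(xml_text)
    results = []
    for record in root.iterfind(".//oai:record", NS):
        identifier = record.findtext("oai:header/oai:identifier", namespaces=NS)
        title = record.findtext(".//dc:title", namespaces=NS)
        results.append((identifier, title))
    return results


# Trimmed, hypothetical response (no responseDate, request or resumptionToken).
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:repository.example.edu:12345/67</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>A sample LIS preprint</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

print(list_records_url(BASE_URL, "lis"))
print(harvest_titles(SAMPLE))
```

Because the harvester only needs the base URL and the shared Dublin Core vocabulary, the submitter never has to re-enter anything for the second repository.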
5.0.3 Access
Works are accessed by a unique identifier called a “handle,” the goal being to have
persistent citations to a particular document or object for as long as possible (Smith, 2003).
Handles are resolved by a special proxy server which keeps track of handles and their
corresponding objects, allowing an item to move or change while retaining the same URL for
web-browser access. As already noted, the user’s web browser will open any formats it
recognizes, and any other formats will be downloaded for viewing with the appropriate
software. Not only does this allow for secure archivization and cataloguing of materials, but it
also gives researchers direct links to previously read materials and long-lasting citations
within their own publications, so that others can follow what they had read. These permanent
URLs also facilitate long-term archivization: as file formats and technologies change, archived
objects that can be translated into new formats can retain the same URL, allowing transparent
access to users in the distant future (Smith, 2003).
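The essential property of handles, a stable identifier whose target can change, can be shown with a toy resolver. This is a hypothetical Python stand-in for the proxy server, not the Handle System’s actual interface: the `HandleResolver` class, the proxy base URL and the `12345/678` handle are all made up for illustration.

```python
from typing import Dict


class HandleResolver:
    """Toy stand-in for a handle proxy: persistent handle -> current location."""

    def __init__(self, proxy_base: str) -> None:
        self.proxy_base = proxy_base.rstrip("/")
        self._locations: Dict[str, str] = {}

    def register(self, handle: str, location: str) -> str:
        """Record where an item lives and return its citable, persistent URL."""
        prefix, slash, suffix = handle.partition("/")
        if not (prefix and slash and suffix):
            raise ValueError(f"expected 'prefix/suffix', got {handle!r}")
        self._locations[handle] = location
        return f"{self.proxy_base}/{handle}"

    def move(self, handle: str, new_location: str) -> None:
        """The item moves (new server, new file format); its handle does not change."""
        if handle not in self._locations:
            raise KeyError(handle)
        self._locations[handle] = new_location

    def resolve(self, handle: str) -> str:
        return self._locations[handle]


resolver = HandleResolver("https://hdl.example.org/")
citation = resolver.register("12345/678", "https://old-server.example.edu/items/678/paper.doc")
resolver.move("12345/678", "https://new-server.example.edu/items/678/paper.pdf")
print(citation)                        # https://hdl.example.org/12345/678
print(resolver.resolve("12345/678"))   # https://new-server.example.edu/items/678/paper.pdf
```

The citation printed in a paper never changes; only the resolver’s table is updated when the object is migrated, which is the long-term access argument made above.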
6 Summary of Issues and Benefits
6.0.1 Issues
As has been addressed, there are some problems with DSpace. In the first place, the
software is open source. While this comes with its own benefits, it also comes with its
own problems. There is currently no commercial support for the software, either for
installation or for later technical issues. Libraries used to working with commercial software
or ILS vendors may find implementation difficult. Furthermore, some who have previously
implemented the software have had problems with performance while updating files and with
the structure of the communities, although these may have been fixed in later releases
of the software.
The major difficulty we have found is with DSpace’s handling of metadata. While we feel that
the number of fields in Dublin Core is adequate for most if not all uses (DCMI Usage Board
2006), we are troubled by the lack of authority control when completing its fields. Without
some control over uniform titles, authors and subjects, accessing the items in the future will
be very problematic. However, this could be solved at the level of institutional policy, with
guidelines for submission and with librarians or faculty having roles in the “workflow” overseeing
metadata. While a discussion of the necessity of controlled vocabulary is beyond the scope of
this paper, we stress that this necessity applies not just to paper documents, but to
digital ones as well.
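Authority control over author names, as suggested above, amounts to mapping variant forms to one authorised heading. The following Python sketch assumes a small, hypothetical local authority file (`AUTHORITY`) and a deliberately crude matching key (case, accents and punctuation ignored); real authority work, for instance against Library of Congress name authorities, would need far more careful matching and human review. Names that match nothing are passed through and flagged for a cataloguer.

```python
import unicodedata


def name_key(name: str) -> str:
    """Crude matching key: drop accents and punctuation, lower-case, squeeze spaces."""
    decomposed = unicodedata.normalize("NFKD", name)
    kept = "".join(c for c in decomposed if c.isalnum() or c.isspace())
    return " ".join(kept.lower().split())


# Hypothetical local authority file: authorised heading -> variant forms seen in deposits.
AUTHORITY = {
    "Tremblay, Hélène": ["Tremblay, H.", "Hélène Tremblay"],
}


def build_index(authority):
    """Map the key of every heading and variant to its authorised heading."""
    index = {}
    for heading, variants in authority.items():
        for form in [heading, *variants]:
            index[name_key(form)] = heading
    return index


def authorise(names, index):
    """Return (names with authorised headings, names needing a cataloguer's decision)."""
    resolved, unresolved = [], []
    for name in names:
        heading = index.get(name_key(name))
        if heading is None:
            unresolved.append(name)
            heading = name  # pass through unchanged
        resolved.append(heading)
    return resolved, unresolved


index = build_index(AUTHORITY)
names, to_review = authorise(["TREMBLAY, Helene", "Hélène Tremblay", "Doe, J."], index)
print(names)      # ['Tremblay, Hélène', 'Tremblay, Hélène', 'Doe, J.']
print(to_review)  # ['Doe, J.']
```

This is the kind of check a metadata role in the workflow could run before approving an item, with the unmatched names forming that person’s to-do list.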
6.0.2 Benefits
Despite this fault, we find that DSpace has many positive aspects. We find it to be an
amazingly flexible and robust system which would be ready to handle almost any university’s
needs right out of the box. It has the flexibility to handle all types of documents and methods
of research, as well as the simplicity to encourage non-technical users towards the Open
Access (OA) of scholarly research. We also feel that, given Smith’s intentions as cited above,
the system would be ready for a university to experiment with self-publishing even a part of
its faculty’s research. Furthermore, while open source can have its drawbacks, it has some
definite benefits. The software itself is customizable from the ground up, and any perceived
problems with the system could be fixed by an institution if it so desired. Even if this were
beyond the abilities of the institution, the software is free, has modest hardware requirements,
and would require little administration for a simple, uncustomized installation.
7 Conclusions
It is the goal of the developers of DSpace to make the collection, preservation, indexing and
distribution of digital research objects so simple (Smith, 2003) that it encourages
researchers to self-archive their own work. Despite a few drawbacks that we have noted,
particularly the lack of control over metadata, DSpace is an excellent digital repository
system supported by an active community of both users and developers. Given DSpace’s
flexibility to archive any type of digital object and accommodate any model of research within a
department or other research community, it is a highly recommended system which can only
improve with further development. This flexibility is increased by the fact that DSpace is open
source: institutions can implement modifications or improvements themselves and
share those improvements with the wider research community.
References
DCMI Usage Board (2006). DCMI metadata terms. Retrieved November 8 2006 from the
Dublin Core Metadata Initiative website: http://dublincore.org/documents/dcmi-terms/.
DSpace (2006). Retrieved November 8 2006 from SourceForge website:
http://sourceforge.net/projects/dspace/.
DSpaceInstances (2006). Retrieved November 8 2006 from DSpace Wiki:
http://wiki.dspace.org/index.php/DspaceInstances.
DSpace System Manager: Implement DSpace (2006). Retrieved November 8 2006 from
DSpace Federation website: http://dspace.org/implement/sys-man.html.
EndUserFaq (2006). Retrieved November 8 2006 from DSpace Wiki:
http://wiki.dspace.org/index.php//EndUserFaq.
Horsman, P. & Pompe, K. (2005). Building a digital archive: A Dutch experience. RLG
DigiNews, 9(6). Retrieved November 8 2006 from RLG website:
http://www.rlg.org/en/page.php?Page_ID=20865#article2.
Interview: A journey into DSpace (2003, October 20). Open Access Now. Retrieved
November 8 2006 from: http://www.biomedcentral.com/openaccess/archive/?page=features&issue=7.
Introducing DSpace (2006). Retrieved November 8 2006 from DSpace Federation website:
http://dspace.org/introduction/index.html.
Jones, R. (2004). DSpace vs. ETD-db: Choosing software to manage electronic theses and
dissertations. Ariadne, 38. Retrieved November 8 2006 from:
http://www.ariadne.ac.uk/issue38/jones/.
Nixon, W. (2003). DAEDALUS: Initial experiences with EPrints and DSpace at the University
of Glasgow. Ariadne, 37. Retrieved November 8 2006 from:
http://www.ariadne.ac.uk/issue37/nixon/.
Smith, M., Bass, M., McClellan, G., Tansley, R., Barton, M., & Branschofsky, M. (2003).
DSpace: An open source dynamic digital repository. D-Lib Magazine, 9(1). Retrieved
November 8 2006 from: http://www.dlib.org/dlib/january03/smith/01smith.html.
TechnicalFaq (2006). Retrieved November 8 2006 from DSpace Wiki:
http://wiki.dspace.org/index.php//TechnicalFaq.
Posted by Steven Chabot on Thursday, November 9th, 2006, at 9:27 pm, and filed under
Uncategorized.
Similar Posts:
Between Books and Bytes
Beneath the Metadata: Some Philosophical Problems with Folksonomy – Elaine
Peterson
On Hobbies
Serendipitous Browsing: A summary and commentary of Thomas Mann’s “What’s
Going on at the Library of Congress?”
Jealousy, or, why closed access journal articles not only hurt scholarship, but basicthe flow of knowledge
Comments
Dorothea | 10-Nov-06 at 1:26 pm | Permalink
Thank you; this is an excellent summary.
Re: authority control. While DSpace could conceivably provide a
scaffold for authority control, even tying it into national or international
authority files wouldn’t solve the problem in fields where the monograph
is not the primary mode of publication. Too many scientists don’t have
authority records! For what it’s worth, I check authority via the LoC, intervene in the
database as necessary to unite author representations, and don’t fret
about representations for authors with no authority records.
A union authority database would be a wonderful thing
1.
Steven Chabot | 10-Nov-06 at 9:07 pm | Permalink
Thank you for your comments.
And I agree with you on the authority control issue. I realize that things
like that are difficult, but I had to say something negative about the
project. And as I indicated, authority control could be implemented by a
librarian.
However my conclusions are genuine. I am particularly excited by the
project and I would love to get involved with DSpace installation at my
own university, but it doesn’t seem to be as publicized as it could be. I
never knew U of T had its own repository all through my undergrad here,
and looking at it now things are kind of a mess.
Too bad that the student positions they advertise seem to be for
undergraduates and not graduate library students.
2.
Unilever Centre for Molecular Informatics, Cambridge - Jim Downing
» Blog Archive » | 14-Nov-06 at 10:22 am | Permalink
[...] Steven Chabot has posted an analysis of the DSpace project and
software (Full report in PDF). As has been addressed, there are some problems with DSpace. In the first place, the software is open source.
While this does come with its own benefits, it also comes with its own
problems. Commercial support for the software does not exist at this
time, neither for installation nor for later technical issues. Libraries used
to working with commercial software or ILS vendors may find
implementation difficult. Furthermore, some who have previously
implemented the software have had problems with performance while
updating files and with the structure of the communities, although these
may have been fixed in successive releases of the software. The major
difficulty we have found is with DSpace’s handling of metadata. While
we feel that the number of fields in Dublin Core is adequate for most if
not all uses (DCMI Usage Board 2006), we are troubled by the lack of
authority control when completing its fields. Without some control over
uniform titles, authors and subjects accessing the items in the future will
very problematic. However, this could be solved at an institutional policy
level, with guidelines for submission and librarians or faculty having
roles in the “workflow” overseeing metadata. While there is no scope in
this paper for a discussion of necessity of controlled vocabulary, we will
stress that this necessity does not just apply to paper documents, but to
digital ones as well. [...]
3.
Jenny | 09-Oct-07 at 12:04 am | Permalink
I’m surprised that the fact that DSpace is open source is considered a
‘problem’. It is just open source. Open source has considerable
advantages over proprietary software where the code is unavailable –
you can actually do things with it. This is a benefit and not a problem.
Anyone who works with open source, including libraries, understands
that open source is not free – that you need to have the tech support in
place or available to support the implementation. But products like
DSpace (and others like Moodle, Sakai, Shibboleth etc) have been built
by a group or community who have a professional approach to the
development process. It’s not a world of cowboys out there any more,
but communities of contributors. The establishment of the DSpace
foundation also means that DSpace will be properly supported into the
future.
4.
Steven Chabot | 09-Oct-07 at 12:07 am | Permalink
Jenny,
All things with which I agree, and which I also addressed in my analysis.
5.
Dirk Swart | 03-Apr-08 at 9:25 am | Permalink
This is a great article. Any chance you could do the same thing for
Fedora?
Jenny, I completely agree with you about open source, but want to add
that in my experience implementing FOSS at universities is typically
more expensive than an off the shelf solution, and that a significant
portion of the costs are hidden, so much so that it may look cheaper.
This increased cost is not necessarily bad – it spends money “at home”,
usually on people, and given low staff turnover there is at least a strong
case that this is a sound investment.
6.
© 2010 Steven Chabot | Thanks, WordPress | Barthelme theme by Scott | Valid XHTML & CSS | RSS: Posts & Comments