Selective Archiving of Croatian Web Resources
A Study of Processing Costs at the National and University Library of Croatia
Tanja Buzina* Karolina Holub* Miroslav Milinović** Nebojša Topolšćak**
Mirna Willer* Jasenka Zajec*
*National and University Library, Croatia
[email protected]
**University of Zagreb Computing Centre, Croatia
LIDA 2007
Contents
• Background
• Research aim and tasks involved
• Task/Staff members involved
• Assessment analysis
• Conclusions
• Acknowledgements
Background: Archiving selected Croatian web resources
The National and University Library of Croatia archives web resources selectively.
The National and University Library and the University of Zagreb Computing Centre (Srce) developed the Digital Archive of Web Resources as part of a co-operative project (2003-): Design of the System for Harvesting and Archiving Legal Deposit of Croatian Web Publications.
The Digital Archive is fully integrated with the library information system and has been running as a service since January 2004.
The funding for the development of the Digital Archive, its integration with the catalogue, the purchase of the appropriate computer system, and the training of the staff involved all came from the running budget.
Background: Digital Archive of Web Resources
After two years of experimenting within the Project team (2003-2005), the Library selected three members of staff to work full time in a newly created Unit for Processing Web Resources (UPWR) on the broadly defined jobs of identifying, selecting, cataloguing and archiving web resources.
The UPWR is supported by the project manager and several part-time staff members from different Library departments, and by the development and technical staff and the project manager from Srce.
The UPWR’s everyday tasks run in parallel with the third phase of the Project, which is to be finished in October 2007.
Background: The Digital Archive: statistics
• May 15, 2007
– 1,640 items
– 15,390 instances, growing at the rate of approximately 400 items per year
– total size of archive: 1 TB
– 147 web resources had disappeared from the live web
– 20 items: access restricted for commercial reasons
• May 15, 2006
– 1,364 items
– 6,041 instances
– total size of archive: 277 GB
Research aim and tasks involved
The aim of this research was to assess the costs of processing web resources.
Two analyses were done:
(1) time and type of task per item archived
(2) other costs related to maintenance and development of the service.
The period of the assessment was two months, during which the staff involved monitored their tasks in minute detail.
Tasks involved
1 Identification
2 Selection
3 Formal and Subject Cataloguing
4 Archiving
5 Updating the catalogue
6 Communication with publishers
7 Updating publisher’s register
8 Training the library staff and publishers
9 Promoting the Digital Archive
10 Communication within the Library and with other institutions & projects
11 Design of the System for Harvesting and Archiving Legal Deposit of Croatian Web Publications: the third phase of the project
12 Tasks performed by the University of Zagreb Computing Centre (Srce)
Cataloguing and archiving: workflow
[Workflow diagram. Elements: WEB RESOURCES; 1 IDENTIFICATION; 2 SELECTION; 3 CATALOGUING; 4 ARCHIVING; 4.2 UPDATING DATA IN THE ARCHIVE; 6, 7, 10 ADMINISTRATIVE TASKS & CO-ORDINATION; PUBLISHERS]
Total number of staff members (16) working full time (3) / part time (13) on web resources, per organisational unit (10) and per type of task (12)

| Name of Unit | Number of Staff | Type of task | Involvement |
|---|---|---|---|
| Unit for Processing Web Resources | 3 | 1-11 [except for 5] | full time |
| CIP Unit | 1 | 6 | part time |
| ISSN Centre for Croatia | 2 | 1, 2, 3, 6, 7, 10 | part time |
| Authority Control Department | 1 | 3 | part time |
| Subject and Classification Department | 1 | 3 | part time |
| Music Collection | 1 | 1, 2, 3, 6, 10 | part time |
| Croatian Institute for Librarianship (project co-ordination) | 1 | 9, 10, 11 | part time |
| Serials Cataloguing Department (project member; ISSN and serials cataloguing co-ordination) | 1 | 9, 10, 11 | part time |
| Information Technology Department (project member and lead programmer) | 1 | 5 | part time |
| Croatian Institute for Librarianship (project member) | 1 | 11 | part time |
| University of Zagreb Computing Centre (Srce) | 2 | 4, 6, 8, 9, 11, 12 | part time |
| University of Zagreb Computing Centre (Srce) (project co-ordination) | 1 | 8, 9, 10, 11, 12 | part time |
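The headline counts on this slide (16 staff, 3 full time, 13 part time, 10 organisational units, 12 task types) can be cross-checked against the rows of the table. The following is only a sketch: the tuple layout, the separate role-note field and the name `summarise_staffing` are illustrative assumptions, and the entry "1-11 [except for 5]" is expanded by hand.

```python
# Staffing table transcribed as (unit, role note, staff, task numbers, involvement).
# The tuple layout and the function name are illustrative assumptions.
STAFFING = [
    ("Unit for Processing Web Resources", "", 3, [1, 2, 3, 4, 6, 7, 8, 9, 10, 11], "full"),
    ("CIP Unit", "", 1, [6], "part"),
    ("ISSN Centre for Croatia", "", 2, [1, 2, 3, 6, 7, 10], "part"),
    ("Authority Control Department", "", 1, [3], "part"),
    ("Subject and Classification Department", "", 1, [3], "part"),
    ("Music Collection", "", 1, [1, 2, 3, 6, 10], "part"),
    ("Croatian Institute for Librarianship", "project co-ordination", 1, [9, 10, 11], "part"),
    ("Serials Cataloguing Department", "project member", 1, [9, 10, 11], "part"),
    ("Information Technology Department", "project member and lead programmer", 1, [5], "part"),
    ("Croatian Institute for Librarianship", "project member", 1, [11], "part"),
    ("University of Zagreb Computing Centre (Srce)", "", 2, [4, 6, 8, 9, 11, 12], "part"),
    ("University of Zagreb Computing Centre (Srce)", "project co-ordination", 1, [8, 9, 10, 11, 12], "part"),
]


def summarise_staffing(rows):
    """Recompute the counts quoted in the slide title from the table rows."""
    return {
        "staff": sum(r[2] for r in rows),
        "full_time": sum(r[2] for r in rows if r[4] == "full"),
        "part_time": sum(r[2] for r in rows if r[4] == "part"),
        "units": len({r[0] for r in rows}),          # distinct unit names
        "task_types": len({t for r in rows for t in r[3]}),
    }


print(summarise_staffing(STAFFING))
```

Counting distinct unit names treats the two Croatian Institute for Librarianship rows and the two Srce rows as one unit each, which is how the figure of 10 units appears to arise.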
Assessment analysis: results
The assessment period was two months:
• March 15 to May 15, 2007
• 42 working days or 315 hours (7.5 hours/day or 450 minutes/day)
Items fully processed and cost analysed:
• 385 items were processed
Items dealt with but not fully processed:
• About 100 items were identified and evaluated for inclusion in the Digital Archive, but did not fulfil the selection criteria
• 14 items were password protected, but the publishers/authors did not give permission for full-text access during the assessment period
– these were excluded from the cost analysis per item
– the time spent on them is given in (2) other activities
Distribution of tasks (1 Identification, 2 Selection, 3 Cataloguing, 4 Archiving): minutes per item processed
Data analysis: Distribution of type of resource in the sample
[Pie chart. Values as extracted: 28.57%, 54.81%, 16.62%. Legend: serial (1), integrating resource (2), monographic resource (3).]
Distribution of type of format: web pages (text, image, sound, video); doc/pdf (text); other
[Pie chart. Values as extracted: 12.99%, 78.70%, 8.31%. Legend: doc/pdf, web pages, unknown.]
Frequency of harvesting
manual = not automated harvesting; daily etc. = automated harvesting
[Pie chart. Values as extracted: 59.74%, 2.60%, 1.82%, 10.65%, 19.74%, 5.45%. Legend: manual, daily, day in a week, day in a month, month in a year, unknown.]
Number of harvesting parameters used
[Pie chart. Values as extracted: 68.31%, 5.19%, 0.78%, 5.45%, 3.12%, 17.14%. Legend: 1, 2, 3, 4, >4, unknown.]
Types of harvesting parameters per item: examples for eight randomly selected items to be harvested
recursion_depth, unwanted_path_pattern, always_get_embeded_resources
recursion_depth
recursion_depth, unwanted_path_pattern, alternative_host, always_get_embeded_resources, remove_url_param
recursion_depth
recursion_depth
recursion_depth, alternative_host
recursion_depth, synonym, alternative_host
recursion_depth, unwanted_path_pattern, always_get_embeded_resources, remove_url_param
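The eight example parameter sets can be tallied in the same way as the "number of harvesting parameters used" chart groups items (1, 2, 3, 4, >4). This is a minimal sketch: the parameter names are copied from the slide as written, while the list structure and the helper names `bucket`, `parameter_count_distribution` and `parameter_usage` are assumptions made for illustration. Nothing here describes how the harvester itself stores or interprets these parameters.

```python
from collections import Counter

# One list per example item; names are copied from the slide as written.
EXAMPLE_ITEMS = [
    ["recursion_depth", "unwanted_path_pattern", "always_get_embeded_resources"],
    ["recursion_depth"],
    ["recursion_depth", "unwanted_path_pattern", "alternative_host",
     "always_get_embeded_resources", "remove_url_param"],
    ["recursion_depth"],
    ["recursion_depth"],
    ["recursion_depth", "alternative_host"],
    ["recursion_depth", "synonym", "alternative_host"],
    ["recursion_depth", "unwanted_path_pattern", "always_get_embeded_resources",
     "remove_url_param"],
]


def bucket(n):
    """Map a parameter count to the chart's categories (hypothetical helper)."""
    return str(n) if n <= 4 else ">4"


def parameter_count_distribution(items):
    """How many items use 1, 2, 3, 4 or more than 4 parameters."""
    return Counter(bucket(len(params)) for params in items)


def parameter_usage(items):
    """In how many items each parameter name appears."""
    return Counter(name for params in items for name in set(params))


print(parameter_count_distribution(EXAMPLE_ITEMS))
print(parameter_usage(EXAMPLE_ITEMS))
```

In this small sample every item sets `recursion_depth`, and three of the eight items use it alone.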
(1) Items fully processed: percentage of time per task
58% archiving; 33% cataloguing; 7% selection; 2% identification
[Pie chart. Values: 58%, 33%, 7%, 2%. Legend: average t.4, average t.3, average t.2, average t.1.]
(1) Items fully processed: total sum of time per task
126 h / archiving; 72 h / cataloguing; 14.81 h / selection; 4.4 h / identification
[Chart. Total minutes per task: total t.1 (identification) 264; total t.2 (selection) 889; total t.3 (cataloguing) 4,334; total t.4 (archiving) 7,572.]
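The hours, percentages and per-item averages quoted on the slides can be recomputed from the four minute totals and the 385 fully processed items. The sketch below assumes that the totals 264, 889, 4334 and 7572 correspond to tasks 1 to 4 in that order, which matches the hour figures given; the dictionary layout and the function name `summarise_tasks` are illustrative.

```python
ITEMS_PROCESSED = 385

# Total minutes per task, as read from the chart labels (t.1 to t.4).
MINUTES_PER_TASK = {
    "identification": 264,
    "selection": 889,
    "cataloguing": 4334,
    "archiving": 7572,
}


def summarise_tasks(minutes, items):
    """Convert minute totals into hours, share of total time and minutes per item."""
    total = sum(minutes.values())
    summary = {
        task: {
            "hours": round(m / 60, 2),
            "share_pct": round(100 * m / total),
            "min_per_item": round(m / items, 2),
        }
        for task, m in minutes.items()
    }
    summary["all tasks"] = {
        "hours": round(total / 60, 2),
        "share_pct": 100,
        "min_per_item": round(total / items, 2),
    }
    return summary


for task, row in summarise_tasks(MINUTES_PER_TASK, ITEMS_PROCESSED).items():
    print(f"{task:15s} {row}")
```

The combined figure comes out at 33.92 minutes per item and cataloguing at 11.26 minutes per item, the values used later in the conclusions. Selection rounds to 14.82 h here, whereas the slide gives 14.81 h, which looks like truncation rather than rounding.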
(1) Items fully processed: total sum of time per task
126 h / archiving
4 Archiving
4.1 New items: archiving process
• checking the item on its live address on the Web;
• defining the harvesting parameters; registering the item in the harvesting queue;
• checking the quality of the first harvesting:
– repeating, when necessary, the harvesting with changed parameters,
– deleting unsuccessful or poor instances of harvesting,
– checking the archived item for display in the catalogue and the Digital Archive’s web interface;
• defining the frequency of harvesting;
4.2 Existing items: quality control of archived instances
• checking the availability of the item on its live Web address according to the monthly automatic report;
• changing harvesting parameters if a change in properties/structure has taken place:
– deactivating harvesting parameters if the web resource has disappeared from the live Web,
– control of the multiple harvesting instances,
– deleting unsuccessful harvesting;
• checking automatic daily reports on possible duplicates, and deleting them;
• changing frequency of harvesting parameters;
4.2.5 reporting on harvesting problems.
(1) Items fully processed: percentage of time per archiving subtask
52% archiving new items (4.1); 45% archiving existing items (4.2); 3% reporting on harvesting problems (4.2.5)
[Pie chart. Values: 3%, 45%, 52%. Legend: average t.4.2.5, average t.4.2, average t.4.1.]
(1) Items fully processed: total sum of time (min) per archiving subtask
66.18 h / archiving new items (4.1); 56.28 h / archiving existing items (4.2); 3.73 h / reporting on harvesting problems (4.2.5)
[Chart. Total minutes per subtask: total t.4.1 3,971; total t.4.2 3,377; total t.4.2.5 224.]
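The archiving subtask totals can be checked against the overall archiving total of 7,572 minutes reported for task 4. The sketch below assumes the three chart values belong to subtasks 4.1, 4.2 and 4.2.5 in that order, which agrees with the hour figures in the slide title; the names used are illustrative.

```python
ITEMS_PROCESSED = 385
ARCHIVING_TOTAL_MIN = 7572  # total for task 4 from the per-task chart

# Minutes per archiving subtask, as read from the chart labels.
SUBTASK_MINUTES = {"4.1": 3971, "4.2": 3377, "4.2.5": 224}


def archiving_breakdown(subtasks, items):
    """Hours and share per subtask, plus per-item time for harvesting work only."""
    total = sum(subtasks.values())
    rows = {
        name: {"hours": round(m / 60, 2), "share_pct": round(100 * m / total)}
        for name, m in subtasks.items()
    }
    # Per-item time for 4.1 + 4.2, leaving out the reporting subtask 4.2.5.
    harvesting_per_item = round((subtasks["4.1"] + subtasks["4.2"]) / items, 2)
    return total, rows, harvesting_per_item


total, rows, per_item = archiving_breakdown(SUBTASK_MINUTES, ITEMS_PROCESSED)
print(total == ARCHIVING_TOTAL_MIN, rows, per_item)
```

The three subtasks sum exactly to the task 4 total. The 19.08 min/item quoted for archiving in the conclusions seems consistent with (3971 + 3377) / 385, that is, with reporting excluded; this reading is an inference from the numbers, not something the slides state.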
(1) Items fully processed: average time per task per type of resource processed
[Bar chart. Vertical axis: 0.00 to 40.00. Categories: average t.1, average t.2, average t.3, average t.4, average t.1-t.4. Series: all, PDF only, HTML/web.]
(2) Assessment of costs of other activities
Project 81.9 h; communication within library & others 33.7 h; training 26.8 h; communication with publishers 17.4 h; identification & evaluation 8.5 h
• IDENTIFICATION AND EVALUATION: 5%
• COMMUNICATION WITH PUBLISHERS: 10%
• UPDATING PUBLISHER’S REGISTERS: 1%
• TRAINING LIBRARY STAFF: 15%
• PROMOTING DIGITAL ARCHIVE: 4%
• COMMUNICATION WITHIN LIBRARY: 19%
• PROJECT: 46%
Conclusions: (1) comparison
National Library of Australia:
• the costs of processing web resources (acquisitions to archiving) at the National Library of Australia.[1]
[1] Phillips, Margaret E. Selective Archiving of Web Resources: A Study of Acquisition Costs at the National Library of Australia. // RLG (vol.9, no.13, 2005). Available at: http://www.rlg.org/en/page.php?Page_ID=20666#article0
Conclusions: (1) comparison: National Library of Australia / National and University Library, Croatia
• Identification and selection: 30 min; 3 min/item + 8.5 h / 6 staff = ± 80 min/staff member
• Publisher contact, negotiating permission to archive the title, and filing of correspondence: 30 min; 2.56 min/item + 7.4 h / 8 staff = < 60 min/staff member
• Gathering, quality assurance, and archiving instances: 210 min; 19.08 min/item
• Cataloguing: 81 min; 11.26 min/item
• Other activities (correspondence with I&A services, reference inquiries, contribution to the development of PANDAS): 60 min; (Project) 81.9 h / 8 staff = ± 60 min/staff member
Conclusions: (2) comparison: National and University Library, Croatia - print publications
• As far as we are aware, there are no comparable analyses of processing costs for the different tasks pertaining to print publications, so only the cost of cataloguing can be compared.
– March – May 2007: 10 items/day (monographs or serials)
• Web resources:
– Identification, selection, cataloguing & archiving <3+6 staff>: 33.92 min/item = ± 14 items/day
– Cataloguing <3+4 staff>: 11.26 min/item = ± 28 items/day <!!>
Conclusions: general observations
• The analysis is a snapshot of activities within 2 months: the results obtained are not absolute, and should be interpreted with the specific conditions taken into consideration
• Cataloguing: the results show
– A relatively small number of entries (resources) compared to print publications
– Almost the same time used for original cataloguing and for updating existing records
– Updating due to changes in the characteristics of resources: specific to web resources vs. print publications
– Further analysis is needed to determine the percentages of original cataloguing & updating
Conclusions: general observations
• Archiving
– High percentage of time used for archiving
– Further training of the cataloguing staff needed, or
– Employment of staff with technical skills: knowledge of web technology and techniques
– A staff member with technical skills should be a member of the Unit for Processing Web Resources (UPWR)
• Development
– High percentage of the UPWR’s tasks dedicated to development: research, services and tools (guidelines for cataloguing)
– The time used by the UPWR (36.5 h per 1.5 staff) and the ISSN Centre for Croatia (10.5 h per 3 staff) vs. the time of the co-ordinator in the Croatian Institute for Librarianship (22.8 h) shows that the UPWR and, to a lesser degree, the ISSN Centre have taken on much of the development as part of their everyday activities.
The Digital Archive of Croatian Web Resources is freely available to anyone, anywhere in the world
– via the catalogue: http://katalog.nsk.hr/
– via the Digital Archive’s interface: http://www.nsk.hr/digarhiv
Acknowledgements
The authors wish to thank colleagues who took part in this assessment exercise.
They are Hrvoje Brozović (IT Department), Danijela Getliher and Renata Petrušić (ISSN), Sofija Klarin (Croatian Institute for Librarianship, Project member), Tatjana Mihalić (Music Collection), Robert Ravnić (Authority Control Department), Ingeborg Rudomino (UPWR), Tomica Vrbanc (CIP Unit) and Mirjana Vujić (Subject and Classification Department), all from the National and University Library.