Selective Archiving of Croatian Web Resources A Study of Processing Costs at the National and University Library of Croatia Tanja Buzina* Karolina Holub* Miroslav Milinović** Nebojša Topolšćak** Mirna Willer* Jasenka Zajec* *National and University Library, Croatia [email protected] **University of Zagreb Computing Centre, Croatia [email protected] LIDA 2007


Selective Archiving of Croatian Web Resources

A Study of Processing Costs at the National and University Library of Croatia

Tanja Buzina*, Karolina Holub*, Miroslav Milinović**, Nebojša Topolšćak**, Mirna Willer*, Jasenka Zajec*

*National and University Library, Croatia
[email protected]

**University of Zagreb Computing Centre, Croatia
[email protected]

LIDA 2007


Contents

• Background

• Research aim and tasks involved

• Task/Staff members involved

• Assessment analysis

• Conclusions

• Acknowledgements

Background: Archiving selected Croatian web resources

The National and University Library of Croatia archives web resources selectively.

The National and University Library and the University of Zagreb Computing Centre (Srce) developed the Digital Archive of Web Resources as part of a co-operative project (2003-): Design of the System for Harvesting and Archiving Legal Deposit of Croatian Web Publications.

The Digital Archive is fully integrated with the library information system and has been running as a service since January 2004.

The funding for the development of the Digital Archive, its integration with the catalogue, the purchase of the appropriate computer system, and the training of the staff involved was all taken from the running budget.

Background: Digital Archive of Web Resources

After two years of experimenting within the Project team (2003-2005), the Library selected three staff members to work full time in a newly created Unit for Processing Web Resources (UPWR) on the broadly defined jobs of identifying, selecting, cataloguing and archiving web resources.

The UPWR is supported by the project manager and several staff members from different Library departments working part time, and by the development and technical staff and the project manager from Srce.

The UPWR's everyday tasks run in parallel with the third phase of the Project, due to be finished in October 2007.

Background: The Digital Archive: statistics

• May 15, 2007
– 1,640 items
– 15,390 instances, growing at the rate of approximately 400 items per year
– total size of archive: 1 TB
– 147 web resources had disappeared from the live web
– 20 items: access restricted for commercial reasons

• May 15, 2006
– 1,364 items
– 6,041 instances
– total size of archive: 277 GB

Research aim and tasks involved

The aim of this research was to assess the costs of processing web resources.

Two analyses were done:
(1) time and type of task per item archived
(2) other costs related to maintenance and development of the service.

The assessment covered two months, during which the staff involved monitored their tasks in detail.

Tasks involved

1 Identification
2 Selection
3 Formal and Subject Cataloguing
4 Archiving
5 Updating the catalogue
6 Communication with publishers
7 Updating publisher's register
8 Training the library staff and publishers
9 Promoting the Digital Archive
10 Communication within the Library and with other institutions & projects
11 Design of the System for Harvesting and Archiving Legal Deposit of Croatian Web Publications: the third phase of the project
12 Tasks performed by the University of Zagreb Computing Centre (Srce)

Cataloguing and archiving: workflow

[Workflow diagram. Boxes: WEB RESOURCES; 1 IDENTIFICATION; 2 SELECTION; 3 CATALOGUING; 4 ARCHIVING; 4.2 UPDATING DATA IN THE ARCHIVE; 6, 7, 10 ADMINISTRATIVE TASKS & CO-ORDINATION; PUBLISHERS]

Total number of staff members (16) working full (3)/part (13) time on web resources per organisational unit (10) per type of task (12)

Name of Unit | Number of Staff | Type of task | Involvement
Unit for Processing Web Resources | 3 | 1-11 [except for 5] | full time
CIP Unit | 1 | 6 | part time
ISSN Centre for Croatia | 2 | 1, 2, 3, 6, 7, 10 | part time
Authority Control Department | 1 | 3 | part time
Subject and Classification Department | 1 | 3 | part time
Music Collection | 1 | 1, 2, 3, 6, 10 | part time
Croatian Institute for Librarianship (project co-ordination) | 1 | 9, 10, 11 | part time
Serials Cataloguing Department (project member; ISSN and serials cataloguing co-ordination) | 1 | 9, 10, 11 | part time
Information Technology Department (project member and lead programmer) | 1 | 5 | part time
Croatian Institute for Librarianship (project member) | 1 | 11 | part time
University of Zagreb Computing Centre (Srce) | 2 | 4, 6, 8, 9, 11, 12 | part time
University of Zagreb Computing Centre (Srce) (project co-ordination) | 1 | 8, 9, 10, 11, 12 | part time

Assessment analysis: results

The assessment period was two months:
• March 15 to May 15, 2007
• 42 working days or 315 hours (7.5 hours/day or 450 minutes/day)

Items fully processed and cost analysed:
• 385 items were processed

Items dealt with but not fully processed:
• About 100 items were identified and evaluated for inclusion in the Digital Archive, but did not fulfil the selection criteria
• 14 items were password protected but the publishers/authors did not give permission for full text access during the assessment period
– these were excluded from cost analysis per item
– time given in (2) other activities
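The working-time figures for the assessment period can be checked with a few lines of arithmetic. This is only a sketch in Python; the constant names are illustrative and not part of the study.

```python
# Figures taken from the slide: 42 working days at 7.5 hours per day.
WORKING_DAYS = 42
HOURS_PER_DAY = 7.5

total_hours = WORKING_DAYS * HOURS_PER_DAY        # hours in the assessment period
minutes_per_day = HOURS_PER_DAY * 60              # minutes in one working day
total_minutes = WORKING_DAYS * minutes_per_day    # minutes in the whole period

print(f"{total_hours:.0f} h in total, {minutes_per_day:.0f} min per day")
print(f"{total_minutes:.0f} min in total")
```

The result is the time available to one full-time staff member; it is not multiplied by the number of staff involved.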

Distribution of tasks: 1 Identification, 2 Selection, 3 Cataloguing, 4 Archiving; time per item processed, in minutes

Data analysis: Distribution of type of resource in the sample

[Pie chart. Shares: 28.57%, 54.81%, 16.62%. Legend: serial (1), integrating resource (2), monographic resource (3)]
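Because the sample consists of the 385 fully processed items, the percentage shares on this slide can be turned back into item counts. The sketch below assumes the shares are exact fractions of 385 rounded to two decimals; it does not assign the shares to legend labels, since that mapping cannot be recovered from the transcript.

```python
SAMPLE_SIZE = 385  # items fully processed during the assessment

# Shares as printed on the slide, in slide order (labels not assigned here).
shares_percent = [28.57, 54.81, 16.62]

def counts_from_shares(shares, total):
    """Convert percentage shares of `total` into whole-item counts."""
    return [round(total * share / 100) for share in shares]

counts = counts_from_shares(shares_percent, SAMPLE_SIZE)
print(counts)        # [110, 211, 64]
print(sum(counts))   # 385
```

The counts adding up to 385 supports the assumption that every item in the sample was assigned exactly one resource type.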

Distribution of type of format: web pages (text, image, sound, video); doc/pdf (text); other

[Pie chart. Shares: 12.99%, 78.70%, 8.31%. Legend: doc/pdf, web pages, unknown]

Frequency of harvesting
manual = not automated harvesting
daily etc. = automated harvesting

[Pie chart. Shares: 59.74%, 2.60%, 1.82%, 10.65%, 19.74%, 5.45%. Legend: manual, daily, day in a week, day in a month, month in a year, unknown]

Number of harvesting parameters used

[Pie chart. Shares: 68.31%, 5.19%, 0.78%, 5.45%, 3.12%, 17.14%. Legend: 1, 2, 3, 4, >4, unknown]

Types of harvesting parameters per item: examples for eight randomly chosen items to be harvested

recursion_depth, unwanted_path_pattern, always_get_embeded_resources

recursion_depth

recursion_depth, unwanted_path_pattern, alternative_host, always_get_embeded_resources, remove_url_param

recursion_depth

recursion_depth

recursion_depth, alternative_host

recursion_depth, synonym, alternative_host

recursion_depth, unwanted_path_pattern, always_get_embeded_resources, remove_url_param
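The eight examples above can be tallied to show how many parameters an item typically needs and which parameters recur. This is a minimal sketch: the parameter names are copied from the slide as printed, while the variable names are illustrative and say nothing about how the harvesting system itself stores its configuration.

```python
from collections import Counter

# Parameter names per item, copied from the eight examples on the slide.
examples = [
    ["recursion_depth", "unwanted_path_pattern", "always_get_embeded_resources"],
    ["recursion_depth"],
    ["recursion_depth", "unwanted_path_pattern", "alternative_host",
     "always_get_embeded_resources", "remove_url_param"],
    ["recursion_depth"],
    ["recursion_depth"],
    ["recursion_depth", "alternative_host"],
    ["recursion_depth", "synonym", "alternative_host"],
    ["recursion_depth", "unwanted_path_pattern",
     "always_get_embeded_resources", "remove_url_param"],
]

# How many of the eight items use 1, 2, 3, ... parameters.
items_by_param_count = Counter(len(params) for params in examples)

# How often each parameter name appears across the eight items.
param_usage = Counter(name for params in examples for name in params)

print(sorted(items_by_param_count.items()))
print(param_usage.most_common())
```

In this small sample every item sets recursion_depth, and three of the eight items use it as their only parameter.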

(1) Items fully processed: percentage of time per task
58% archiving; 33% cataloguing; 7% selection; 2% identification

[Pie chart. Shares: 58%, 33%, 7%, 2%. Legend: average t.4, average t.3, average t.2, average t.1]

(1) Items fully processed: total sum of time per task
126 h /archiving; 72 h /cataloguing; 14.81 h /selection; 4.4 h /identification

[Chart, values in minutes. total t.1: 264; total t.2: 889; total t.3: 4,334; total t.4: 7,572]
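The hour totals and the percentage split per task both follow from the minute totals on this slide. The sketch below recomputes them; the dictionary keys are just readable labels for tasks t.1 to t.4.

```python
# Minutes per task as shown on the chart (t.1 to t.4).
task_minutes = {
    "identification": 264,
    "selection": 889,
    "cataloguing": 4334,
    "archiving": 7572,
}

total_minutes = sum(task_minutes.values())  # 13059

for task, minutes in task_minutes.items():
    hours = minutes / 60
    share = 100 * minutes / total_minutes
    print(f"{task:15s} {hours:7.2f} h  {share:5.1f} %")
```

Rounded to whole percentages, the shares come out as 2%, 7%, 33% and 58%. Selection works out to 14.82 h when rounded, against 14.81 h on the slide, which suggests the slide's figure was truncated rather than rounded.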

(1) Items fully processed: total sum of time per task
126 h /archiving

4 Archiving

4.1 New items: archiving process
• checking the item on its live address on the Web;
• defining the harvesting parameters; registering the item in the harvesting queue;
• checking the quality of the first harvesting:
– repeating, when necessary, the harvesting with changed parameters,
– deleting unsuccessful or poor instances of harvesting,
– checking the archived item for display in the catalogue and the Digital Archive's web interface;
• defining the frequency of harvesting;

4.2 Existing items: quality control of archived instances
• checking the availability of the item on its live Web address according to the monthly automatic report;
• changing harvesting parameters if a change in properties/structure has taken place:
– deactivating harvesting parameters if the web resource has disappeared from the live Web,
– control of the multiple harvesting instances,
– deleting unsuccessful harvesting;
• checking automatic daily reports on possible duplicates, and deleting them;
• changing frequency of harvesting parameters;

4.2.5 reporting on harvesting problems.

(1) Items fully processed: percentage of archiving time per sub-task
52% archiving new items (4.1); 45% archiving existing items (4.2); 3% reporting on harvesting problems (4.2.5)

[Pie chart. Shares: 3%, 45%, 52%. Legend: average t.4.2.5, average t.4.2, average t.4.1]

(1) Items fully processed: total sum of time/min per task
66.18 h/archiving new items (4.1); 56.28 h/archiving existing items (4.2); 3.73 h/reporting on harvesting problems (4.2.5)

[Chart, values in minutes. total t.4.1: 3,971; total t.4.2: 3,377; total t.4.2.5: 224]
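The archiving sub-task figures can be cross-checked against the archiving total of 7,572 minutes reported for task 4. The following sketch converts each sub-task to hours and to a share of archiving time; the labels are shortened from the slide text.

```python
archiving_total_minutes = 7572  # t.4 total from the per-task chart

subtask_minutes = {
    "4.1 new items": 3971,
    "4.2 existing items": 3377,
    "4.2.5 reporting problems": 224,
}

# The three sub-task totals should add up to the archiving total.
assert sum(subtask_minutes.values()) == archiving_total_minutes

# For each sub-task: (hours rounded to 2 decimals, whole-percent share).
subtask_summary = {
    name: (round(minutes / 60, 2), round(100 * minutes / archiving_total_minutes))
    for name, minutes in subtask_minutes.items()
}
print(subtask_summary)
```

The computed hours (66.18, 56.28, 3.73) and shares (52%, 45%, 3%) agree with the values stated on the two archiving slides.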

(1) Items fully processed: average time per task per type of resource processed

[Bar chart. Vertical axis: 0.00 to 40.00. Categories: average t.1, average t.2, average t.3, average t.4, average t.1-t.4. Series: all, PDF only, HTML/web]

(2) Assessment of costs of other activities
Project 81.9 h; communication within library & others 33.7 h; training 26.8 h; communication with publishers 17.4 h; identification & evaluation 8.5 h

[Pie chart]
IDENTIFICATION AND EVALUATION: 5%
COMMUNICATION WITH PUBLISHERS: 10%
UPDATING PUBLISHER'S REGISTERS: 1%
TRAINING LIBRARY STAFF: 15%
PROMOTING DIGITAL ARCHIVE: 4%
COMMUNICATION WITHIN LIBRARY: 19%
PROJECT: 46%

Conclusions: (1) comparison

National Library of Australia:
• the costs of processing web resources (acquisitions to archiving) at the National Library of Australia. [1]

[1] Phillips, Margaret E. Selective Archiving of Web Resources: A Study of Acquisition Costs at the National Library of Australia. // RLG (vol.9, no.13, 2005). Available at: http://www.rlg.org/en/page.php?Page_ID=20666#article0

Conclusions: (1) comparison

National Library of Australia (first figure in each line); National and University Library, Croatia (figures after the semicolon)

• Identification and selection = 30 min; 3 min/item + 8.5 h / 6 staff = ± 80 min/staff member
• Publisher contact, negotiating permission to archive the title, and filing of correspondence = 30 min; 2.56 min/item + 7.4 h / 8 staff = < 60 min/staff member
• Gathering, quality assurance, and archiving instances = 210 min; 19.08 min/item
• Cataloguing = 81 min; 11.26 min/item
• Other activities (correspondence with I&A services, reference inquiries, contribution to the development of PANDAS) = 60 min; (Project) 81.9 h / 8 staff = ± 60 min/staff member
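Most of the Croatian per-item figures in this comparison can be reproduced from the minute totals reported earlier, divided by the 385 items processed. The sketch below does this under one assumption that the slides do not state: that the 19.08 min/item archiving figure excludes the time spent reporting on harvesting problems (4.2.5). The 2.56 min/item figure for publisher contact is not covered, because its underlying total is not given.

```python
ITEMS = 385  # items fully processed

# Minute totals from the per-task and archiving sub-task charts.
minutes = {
    "identification": 264,
    "selection": 889,
    "cataloguing": 4334,
    "archiving_new": 3971,       # 4.1
    "archiving_existing": 3377,  # 4.2
    "reporting": 224,            # 4.2.5
}

def per_item(*tasks):
    """Average minutes per item over the named tasks."""
    return sum(minutes[t] for t in tasks) / ITEMS

ident_and_selection = per_item("identification", "selection")
# Assumption: the archiving figure leaves out reporting on harvesting problems.
archiving = per_item("archiving_new", "archiving_existing")
cataloguing = per_item("cataloguing")
all_tasks = per_item(*minutes)  # every task, reporting included

print(f"{ident_and_selection:.2f} {archiving:.2f} {cataloguing:.2f} {all_tasks:.2f}")
```

Under that assumption the results land within 0.01 min of the stated values: about 3 min/item for identification and selection, 19.08 for archiving, 11.26 for cataloguing, and the 33.92 min/item overall figure quoted in the comparison with print publications.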

Conclusions: (2) comparison: National and University Library, Croatia - print publications

• As far as we are aware, there are no comparable analyses of processing cost for different tasks pertaining to print publications, so only the cost of cataloguing can be compared.
– March - May 2007: 10 items/day (monographs or serials)
• Web resources:
– Identification, selection, cataloguing & archiving: 33.92 min/item <3+6 staff> = ± 14 items/day
– Cataloguing <3+4 staff>: 11.26 min/item = ± 28 items/day <!!>

Conclusions: general observations

• The analysis is a snapshot of activities within 2 months: the results obtained are not absolute, and should be interpreted in the light of the specific conditions
• Cataloguing: the results show
– Relatively small number of entries (resources) compared to print publications
– Almost the same time used for original cataloguing and updating existing records
– Updating due to changes in the resources' characteristics: specific to web resources vs. print publications
– Further analysis is needed to see the percentage of original cataloguing & updating

Conclusions: general observations

• Archiving
– High percentage of time used for archiving
– Further training of the cataloguing staff needed, or
– Employment of staff with technical skills: knowledge of web technology and techniques
– A staff member with technical skills should be a member of the Unit for Processing Web Resources (UPWR)
• Development
– High percentage of the UPWR's tasks dedicated to development: research, services and tools (guidelines for cataloguing)
– Time used by the UPWR (36.5 h per 1.5 staff) and the ISSN Centre for Croatia (10.5 h per 3 staff) vs. time used by the co-ordinator in the Croatian Institute for Librarianship (22.8 h) shows that the UPWR, and to a lesser degree the ISSN Centre, have taken on much of the development as part of their everyday activities.

The Digital Archive of Croatian Web Resources is freely available to anyone, anywhere in the world

– via the catalogue: http://katalog.nsk.hr/
– via the Digital Archive's interface: http://www.nsk.hr/digarhiv

Acknowledgements

The authors wish to thank the colleagues who took part in this assessment exercise.

They are Hrvoje Brozović (IT Department), Danijela Getliher and Renata Petrušić (ISSN), Sofija Klarin (Croatian Institute for Librarianship, Project member), Tatjana Mihalić (Music Collection), Robert Ravnić (Authority Control Department), Ingeborg Rudomino (UPWR), Tomica Vrbanc (CIP Unit) and Mirjana Vujić (Subject and Classification Department) from the National and University Library.