Farl web archiving
![Page 1: Farl web archiving](https://reader034.vdocuments.mx/reader034/viewer/2022052601/55988e101a28ab90128b475c/html5/thumbnails/1.jpg)
A survey of web-based art resources with findings applicable to FARL electronic records collection development
Alison Rhonemus, LIS 698, Seminar and Practicum, Dr. Tula Giannini
Frick Art Reference Library
Deborah Kempe, Chief, Collections Management & Access
Web Survey and Collection Development
Coffee on the terrace
![Page 2: Farl web archiving](https://reader034.vdocuments.mx/reader034/viewer/2022052601/55988e101a28ab90128b475c/html5/thumbnails/2.jpg)
M-LEAD-TWO
Intern enterprises: "collection assessments, digital resource surveys, web archiving, provide support for important consortial programs such as shared resources"
● Brooklyn Museum: Mark Daly, Ronnette Hope, Project Manager: Emily Atwater
● NYARC Latin American Resources (MOMA): Ralph Baylor
● FARL: Gretchen Nadasky, Alison Rhonemus
![Page 3: Farl web archiving](https://reader034.vdocuments.mx/reader034/viewer/2022052601/55988e101a28ab90128b475c/html5/thumbnails/3.jpg)
Frick Art Reference Library
In early 2011, the Frick Art Reference Library and the Thomas J. Watson Library at The Metropolitan Museum of Art completed a pilot project to address coordinated collecting of born-digital auction catalogs using CONTENTdm and Archive-It.
![Page 4: Farl web archiving](https://reader034.vdocuments.mx/reader034/viewer/2022052601/55988e101a28ab90128b475c/html5/thumbnails/4.jpg)
The FARL web archiving program is situated in Collection Development.
Current plans for website capture include online auction catalogs and art web resources cataloged by NYARC.
Fellow M-LEAD-TWO intern Gretchen Nadasky has just described online auction catalogs.
My project focused on NYARC-cataloged websites.
![Page 5: Farl web archiving](https://reader034.vdocuments.mx/reader034/viewer/2022052601/55988e101a28ab90128b475c/html5/thumbnails/5.jpg)
Web Archiving
"The Internet Archive is already doing it."
Actually, the IA is providing the tools for other institutions to use in archiving.
![Page 6: Farl web archiving](https://reader034.vdocuments.mx/reader034/viewer/2022052601/55988e101a28ab90128b475c/html5/thumbnails/6.jpg)
Archive-It uses open source tools developed by the Internet Archive
● Heritrix Web Crawler
● Wayback Interface
● WARC format, an ISO standard
![Page 7: Farl web archiving](https://reader034.vdocuments.mx/reader034/viewer/2022052601/55988e101a28ab90128b475c/html5/thumbnails/7.jpg)
![Page 8: Farl web archiving](https://reader034.vdocuments.mx/reader034/viewer/2022052601/55988e101a28ab90128b475c/html5/thumbnails/8.jpg)
the report and manual checks
Partner and WAYBACK interface
Quality Assurance
![Page 9: Farl web archiving](https://reader034.vdocuments.mx/reader034/viewer/2022052601/55988e101a28ab90128b475c/html5/thumbnails/9.jpg)
• Password-protected sites: cannot be archived.
• JavaScript: more complicated implementations can be difficult to capture and display. This is an ongoing area of development.
• Videos: difficulty with some proprietary formats.
• Form- and database-driven content: may be archived using a sitemap or other direct links to the content.
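The sitemap route for form- and database-driven content can be sketched as follows. This is a minimal illustration, not part of the Archive-It workflow itself: it assumes the site publishes a sitemap in the standard sitemaps.org format, and the function name `urls_from_sitemap` and the example.org URLs are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def urls_from_sitemap(xml_text):
    """Return every <loc> URL listed in a sitemaps.org-style sitemap."""
    root = ET.fromstring(xml_text)
    return [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    ]


# Hypothetical sitemap for a database-driven catalog.
sample = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.org/collection/item?id=1</loc></url>
  <url><loc>https://example.org/collection/item?id=2</loc></url>
</urlset>"""

for url in urls_from_sitemap(sample):
    print(url)
```

Each URL recovered this way is a direct link that a crawler could not otherwise reach through a search form, so the list can be reviewed as candidate seeds.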
Evaluating seeds
![Page 10: Farl web archiving](https://reader034.vdocuments.mx/reader034/viewer/2022052601/55988e101a28ab90128b475c/html5/thumbnails/10.jpg)
Robots.txt Blocks
The crawler by default respects all robots.txt files. Check post-crawl reports for blocked seeds or documents.
If your site is blocked:
a) Contact the site owner and ask if they will unblock
b) Ask your Partner Specialist to turn on “ignore robots” feature in your account
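A robots.txt block can also be checked before a crawl is run. The sketch below uses Python's standard `urllib.robotparser`; the helper name `is_blocked`, the sample rules, and the example.org URLs are hypothetical, and the default user agent string is an assumption that should be confirmed against the crawler actually in use.

```python
from urllib.robotparser import RobotFileParser


def is_blocked(robots_txt, url, user_agent="archive.org_bot"):
    """Return True if the given robots.txt rules disallow this URL.

    The default user_agent is an assumption, not confirmed by the slides.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


# Hypothetical robots.txt that blocks one directory for all crawlers.
sample_rules = """User-agent: *
Disallow: /private/
"""

print(is_blocked(sample_rules, "https://example.org/private/catalog.pdf"))
print(is_blocked(sample_rules, "https://example.org/exhibitions/"))
```

A seed that comes back blocked here is a candidate for option a) or b) above.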
Notes:
/ denotes single directory seed
subdomains.archive.org (add individually or expand seed)
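The two notes on seeds can be illustrated with a simplified scope check. This is a sketch under stated assumptions, not Archive-It's actual scoping logic: it treats a URL as in scope only when the host matches the seed exactly (so subdomains must be added separately) and the path sits under the seed's directory. The function name `in_scope` and all URLs are hypothetical.

```python
from urllib.parse import urlsplit


def in_scope(seed, url):
    """Simplified, assumed scope rule: same host, path under the seed directory."""
    s, u = urlsplit(seed), urlsplit(url)
    # A different host, including a subdomain, is treated as out of scope.
    if (s.hostname or "") != (u.hostname or ""):
        return False
    # The seed directory is everything up to and including the last slash.
    seed_dir = s.path[: s.path.rfind("/") + 1] or "/"
    return (u.path or "/").startswith(seed_dir)


seed = "https://example.org/library/"
print(in_scope(seed, "https://example.org/library/catalogs/a.html"))
print(in_scope(seed, "https://example.org/museum/"))
print(in_scope(seed, "https://blog.example.org/library/"))
```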
![Page 11: Farl web archiving](https://reader034.vdocuments.mx/reader034/viewer/2022052601/55988e101a28ab90128b475c/html5/thumbnails/11.jpg)
Site Survey Criteria
● html/flash/pdf
● images
● embedded material
● links
● directories and subdomains
● terms, rights statements and permissions
![Page 12: Farl web archiving](https://reader034.vdocuments.mx/reader034/viewer/2022052601/55988e101a28ab90128b475c/html5/thumbnails/12.jpg)
Obvious ruse
![Page 13: Farl web archiving](https://reader034.vdocuments.mx/reader034/viewer/2022052601/55988e101a28ab90128b475c/html5/thumbnails/13.jpg)
More of the obvious
Sites created without the intention of being archived are the sites in need of archiving.
![Page 14: Farl web archiving](https://reader034.vdocuments.mx/reader034/viewer/2022052601/55988e101a28ab90128b475c/html5/thumbnails/14.jpg)
Survey Says
● 257 cataloged entries
● 168 resources are possible to capture
● 82 resources would require more research or display definite red flags for web archiving.
● PDFs are available for at least some of the content in 75 resources.
● Flash was an element in 23 resources
● 16 sites used HTML5
● 54 used a CMS like Drupal or WordPress
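For a sense of proportion, the counts above can be expressed as shares of the 257 cataloged entries. The short calculation below uses only the figures from this slide; the labels are paraphrases, and the categories overlap, so the shares are not expected to sum to 100.

```python
TOTAL = 257  # cataloged entries surveyed

counts = {
    "possible to capture": 168,
    "needs research or red flags": 82,
    "PDF for some content": 75,
    "Flash element": 23,
    "HTML5": 16,
    "CMS (Drupal, WordPress, etc.)": 54,
}

# Share of all cataloged entries, as a percentage rounded to one decimal.
shares = {label: round(100 * n / TOTAL, 1) for label, n in counts.items()}

for label, pct in shares.items():
    print(f"{label}: {counts[label]} of {TOTAL} ({pct}%)")
```

Roughly two thirds of the cataloged resources (65.4%) look capturable, while just under a third (31.9%) need further research or show red flags.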
![Page 15: Farl web archiving](https://reader034.vdocuments.mx/reader034/viewer/2022052601/55988e101a28ab90128b475c/html5/thumbnails/15.jpg)
There were 3 cataloged resources no longer available on the live web but viewable through Internet Archive. Another 2 defunct resources were not available through Internet Archive. The main page for one of these lost resources was available as a snapshot in WAYBACK but the actual cataloged resource was not available.
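A check of this kind can be scripted against the Internet Archive's Wayback availability endpoint. This is a sketch, not the method used in the survey: the helper names `closest_snapshot` and `lookup` are hypothetical, and the response shape assumed here (an `archived_snapshots` object with an optional `closest` entry) should be verified against the current API documentation.

```python
import json
from typing import Optional
from urllib.parse import urlencode
from urllib.request import urlopen

API = "https://archive.org/wayback/available"


def closest_snapshot(response: dict) -> Optional[str]:
    """Return the closest snapshot URL from an availability response, if any."""
    closest = response.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


def lookup(url: str) -> Optional[str]:
    """Query the availability endpoint for one URL (requires network access)."""
    with urlopen(f"{API}?{urlencode({'url': url})}", timeout=30) as resp:
        return closest_snapshot(json.load(resp))
```

As the lost resource above shows, a snapshot of a site's main page does not guarantee that the cataloged resource itself was captured, so the lookup should be run on the cataloged URL rather than on the home page.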
![Page 16: Farl web archiving](https://reader034.vdocuments.mx/reader034/viewer/2022052601/55988e101a28ab90128b475c/html5/thumbnails/16.jpg)
![Page 17: Farl web archiving](https://reader034.vdocuments.mx/reader034/viewer/2022052601/55988e101a28ab90128b475c/html5/thumbnails/17.jpg)
![Page 18: Farl web archiving](https://reader034.vdocuments.mx/reader034/viewer/2022052601/55988e101a28ab90128b475c/html5/thumbnails/18.jpg)
![Page 19: Farl web archiving](https://reader034.vdocuments.mx/reader034/viewer/2022052601/55988e101a28ab90128b475c/html5/thumbnails/19.jpg)
Change is Constant
Archive-It Updates:
● Heritrix 1 series to Heritrix 3 series (February)
● Archive-It 4.8 (May)
![Page 20: Farl web archiving](https://reader034.vdocuments.mx/reader034/viewer/2022052601/55988e101a28ab90128b475c/html5/thumbnails/20.jpg)
Archive-It 4.8
![Page 21: Farl web archiving](https://reader034.vdocuments.mx/reader034/viewer/2022052601/55988e101a28ab90128b475c/html5/thumbnails/21.jpg)
Plans
● Upcoming grants
● Capture of NYARC institution websites
● Include Wayback interface links in Arcade catalog records
● Continue to identify websites for capture and implement capture
![Page 22: Farl web archiving](https://reader034.vdocuments.mx/reader034/viewer/2022052601/55988e101a28ab90128b475c/html5/thumbnails/22.jpg)
Conclusions
○ Digital resources not prevalent enough to reassign current staff
○ Website capture most costly in terms of staff time
○ Copyright continues to be an issue
○ Long term digital preservation needs yet to be assessed
○ Capture of Frick Collection sites and NYARC will be a challenging test case