
Page 1: 1 Advanced Archive-It Application Training: Quality Assurance October 17, 2013


Advanced Archive-It Application Training:

Quality Assurance

October 17, 2013

Page 2: 1 Advanced Archive-It Application Training: Quality Assurance October 17, 2013

Goals

• Effective use of tools within the Archive-It web application to get the best quality capture possible of your archived content, including embedded resources necessary to the display and functionality of all in scope content.

• See recorded training videos for more detailed information about crawl scoping.


Quality Assurance Tips

1. Prioritize crawls and websites within your collection to use your time effectively

2. Review reports, including the QA report

3. Browse your websites
   – Wayback QA
   – Proxy Mode


Reviewing Reports

• How to make the most of your time reviewing reports:
   – Review high-level reports first (Seed Status and Seed Source) for seed-level issues
   – Then review more detailed reports (Hosts report and file type specific reports)
   – Run a QA Report to see if any embedded content on your seed pages was not captured


Seed Status Report

• Are there any seeds not being crawled?
   – Double-check that your seed URLs are correct
   – Consider ignoring robots.txt if it is blocking the seed
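
One way to check whether a seed that was not crawled is blocked by robots.txt is to test the site's rules locally. The sketch below uses Python's standard urllib.robotparser module. The sample rules, the example.org URLs, and the "archive.org_bot" user-agent string are assumptions for illustration; substitute the actual robots.txt of your seed's host and the user agent your crawls use.

```python
from urllib.robotparser import RobotFileParser


def is_blocked(robots_txt: str, url: str, user_agent: str = "archive.org_bot") -> bool:
    """Return True if the given robots.txt text disallows `url` for `user_agent`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


# Hypothetical robots.txt content, for illustration only.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
"""

if __name__ == "__main__":
    for seed in ("http://example.org/", "http://example.org/private/report.html"):
        status = "blocked" if is_blocked(SAMPLE_ROBOTS, seed) else "allowed"
        print(seed, status)
```

If a seed is reported as blocked, that is a cue to decide whether ignoring robots.txt for that seed is appropriate.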


Seed Source Report

• Are there any seeds that are capturing far fewer or far more URLs than others?
   – Fewer: Was the seed marked “Not Crawled” in the Seed Status report?
   – More: Check the Hosts report for any obvious area to limit your crawl
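
For collections with many seeds, the "far fewer or far more" comparison can be roughed out in a few lines. This is a minimal sketch, not an Archive-It feature: it assumes you have copied per-seed URL counts out of the Seed Source report into a dictionary, and the factor of 10 around the median is an arbitrary threshold chosen for illustration. The seeds and counts are made up.

```python
from statistics import median


def flag_outlier_seeds(url_counts: dict, factor: float = 10.0) -> dict:
    """Label seeds whose URL count is far below or far above the median count."""
    if not url_counts:
        return {}
    mid = median(url_counts.values())
    flags = {}
    for seed, count in url_counts.items():
        if count * factor < mid:
            flags[seed] = "fewer"  # check the Seed Status report
        elif count > mid * factor:
            flags[seed] = "more"   # check the Hosts report for areas to limit
    return flags


# Hypothetical counts, for illustration only.
SAMPLE_COUNTS = {
    "http://example.org/": 1200,
    "http://example.com/news/": 950,
    "http://example.net/": 3,
    "http://example.edu/calendar/": 48000,
    "http://example.org/blog": 1500,
}

if __name__ == "__main__":
    for seed, label in flag_outlier_seeds(SAMPLE_COUNTS).items():
        print(label, seed)
```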


Hosts Report

• Are there numbers in the “Queued” or “Robots.txt Blocked” column?
   – Check the URL lists to see if you want to capture these URLs or not

• Are there hosts with fewer or more archived URLs than you expected?
   – Fewer: Are any expected URLs “Out of Scope”?
   – More: Are there parts of the site or specific URLs you want to block?
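
The first Hosts report check above can also be scripted if you have the report as delimited text. The sketch below is illustrative only: the CSV layout and the column names (host, urls, queued, robots_blocked, out_of_scope) are assumptions, not the actual export format, so adjust them to match whatever file you are working from.

```python
import csv
import io


def hosts_needing_review(report_csv: str) -> list:
    """Return hosts that have any queued or robots.txt-blocked URLs."""
    reader = csv.DictReader(io.StringIO(report_csv))
    flagged = []
    for row in reader:
        if int(row["queued"]) > 0 or int(row["robots_blocked"]) > 0:
            flagged.append(row["host"])
    return flagged


# Hypothetical report rows, for illustration only.
SAMPLE_REPORT = """\
host,urls,queued,robots_blocked,out_of_scope
example.org,1200,0,0,15
cdn.example.net,40,0,12,0
example.com,9800,350,0,2
"""

if __name__ == "__main__":
    for host in hosts_needing_review(SAMPLE_REPORT):
        print("review URL lists for", host)
```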


File Type/PDF/Videos Reports

• Are there file types you expected to archive that were not archived?
   – Check the “Out of Scope” column of the Hosts report for files not captured


QA Report

• Is there embedded content on your seed pages that was not captured?
   – Run a Patch Crawl!
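
To make "embedded content that was not captured" concrete, the sketch below lists the images, scripts, and stylesheets a page references and reports those absent from a set of captured URLs. It illustrates the idea only and is not how the QA Report is implemented; the sample HTML, URLs, and the captured set are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class EmbedCollector(HTMLParser):
    """Collect absolute URLs of images, scripts, and stylesheets in a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.embeds = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ("img", "script") and attrs.get("src"):
            self.embeds.append(urljoin(self.base_url, attrs["src"]))
        elif tag == "link" and attrs.get("href"):
            if "stylesheet" in (attrs.get("rel") or "").lower():
                self.embeds.append(urljoin(self.base_url, attrs["href"]))


def missing_embeds(page_url, html, captured):
    """Return embedded resource URLs from `html` that are not in `captured`."""
    collector = EmbedCollector(page_url)
    collector.feed(html)
    return [url for url in collector.embeds if url not in captured]


# Hypothetical seed page and capture list, for illustration only.
SAMPLE_HTML = """\
<html><head>
<link rel="stylesheet" href="/css/site.css">
<script src="js/app.js"></script>
</head><body><img src="http://cdn.example.net/logo.png"></body></html>
"""
SAMPLE_CAPTURED = {"http://example.org/css/site.css"}

if __name__ == "__main__":
    for url in missing_embeds("http://example.org/home/", SAMPLE_HTML, SAMPLE_CAPTURED):
        print("not captured:", url)
```

URLs reported this way are the kind of resources a patch crawl is meant to pick up.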


QA Report

• Quickly see from the Reports menu which crawls you have run a QA report for already.


Reviewing Archived Websites

How to make the most of your time reviewing archived websites:

• Browse in Proxy Mode

• Use Wayback QA to check for missing URLs that may be important to the display and functioning of the website.
   – Be sure to check pages that are heavy in JavaScript or video files, to ensure that content was archived


Proxy Mode

• Why is this helpful?
   – Browsing in proxy mode ensures that you are only seeing archived versions of files, and no content is coming from the live web
   – Sometimes sites that are heavy in JavaScript display more fully in proxy mode, so this can help you ensure that content was captured


Proxy Mode

• Live demonstration


Wayback QA

• Why is this helpful?
   – Wayback QA allows you to perform automated quality assurance work as you’re browsing through your archived pages in Wayback.
   – Wayback QA will note any missing files from the pages you view and allow you to run a patch crawl in order to capture these files and improve the display of your archived pages.


Wayback QA

• Live Demonstration


Wayback QA - Tips

• Browse through all of the sites that you would like to QA before running a patch crawl; you can run one patch crawl across your entire collection.

• Sometimes Wayback QA can be an iterative process.

• Ignoring Robots.txt for a patch crawl does not change crawl settings for future crawls.


Wayback QA vs. QA Report

Wayback QA

• Immediately check for missing resources.
• Can be conducted on any page
• Occurs while browsing in Wayback
• Patch crawl: selective

QA Report

• Takes 24 hours to generate after content is in Wayback.
• Includes initial seed pages
• Tied to a specific crawl report
• Patch crawl: all or nothing


Potential Workflow

1. After the crawl completes, log in to the web application.

2. Analyze reports. Any surprises?

3. Check pages in Wayback. Any surprises?

4. Request a QA Report and run a patch crawl.

5. In archive mode, run Wayback QA on necessary seeds, as well as some linked content or pages that may not have archived well.
   Optional: compare sites in Proxy Mode versus archive mode.

6. Run a patch crawl from Wayback QA.

7. Check for improvements to archived content.

8. Use the “Submit a Question” link to get further help and guidance for difficult-to-archive sites.

What is your workflow like?


Questions?

Please take our quick survey to let us know what you thought about today’s training, and any suggestions or ideas you have for further Archive-It trainings!

http://www.surveymonkey.com/s/FHVCVP6

(see Webex chat for link)