An Arizona Model for Capturing and Describing Documents on
the Web
Richard Pearce-Moses
Director of Digital Government Information
Arizona State Library, Archives and Public Records
rpm at lib.az.us
What Does WWW Stand For?
World Wide Web or Wild, Wild West? They both abbreviate to WWW
Rugged Individualism
Lack of standards ~ Lawlessness
[Collage of Robert Conrad as James West in the Wild, Wild West removed to
avoid violation of copyright.]
The Dream
To collect, manage, preserve, and make useful the
enormous amount of digital information
our culture is now producing
The Reality
Two Approaches
Bibliocentric (Item-by-Item)
Tech-centric (Capture-It-All)
Emphasis on Software Tools and Technology
Limited Assistance from Content Providers
Library of Congress & NDIIPP
University of Illinois at Urbana-Champaign
School of Library and Information Science
OCLC
Content Providers
Tufts University Perseus Project • Michigan State University Library • State libraries: Arizona, Connecticut, Illinois, North Carolina, Wisconsin • UIUC partners: NCSA • WILL-AM/FM/TV • Information Management Services
Digital Archives
Libraries
Artificial collections • Item-Level Control
Archives
Provenance • Original Order • Hierarchy • Aggregate Control
Websites as Archival Collections
Documents of Common Provenance
Organized into Directories (Archival Series)
Publications v. Records
The Art and Craft of Building a Collection
What we do remains the same
How we do it will change
Identification/Selection
Acquisition
Description
Reference
Preservation
Identification — Where Do We Look?
Finding the Forest az.gov • state.az.us
Domain Tool
• Identifies all distinct domains
• Reports new sites since previous spider
• Reports when sites disappear
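The three reports listed for the Domain Tool can be illustrated with a small set comparison. This is only a sketch under stated assumptions, not the tool itself: the function names and the sample host names are hypothetical, and each spider run is assumed to yield a plain list of URLs.

```python
from urllib.parse import urlsplit


def distinct_domains(urls):
    """Return the set of host names found in an iterable of URLs."""
    hosts = set()
    for url in urls:
        host = urlsplit(url).hostname  # None for relative or malformed URLs
        if host:
            hosts.add(host)
    return hosts


def compare_spiders(previous_urls, current_urls):
    """Report domains that appeared or disappeared between two spider runs."""
    before = distinct_domains(previous_urls)
    after = distinct_domains(current_urls)
    return {
        "new": sorted(after - before),
        "disappeared": sorted(before - after),
    }


# Hypothetical spider output; the agency host names are made up for illustration.
previous_run = [
    "http://www.az.gov/index.html",
    "http://agency-a.az.gov/reports/",
]
current_run = [
    "http://www.az.gov/index.html",
    "http://agency-b.az.gov/plan.pdf",
]
report = compare_spiders(previous_run, current_run)
print(report)
```

Comparing sets of host names rather than full URLs keeps the report at the level of sites, which matches the "forest" view the slide describes.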
Selection: Which Collections Do We Harvest?
Collection-Level Analysis
Macro appraisal sets priorities
Materials appraised as series
Content Providers Taxonomy Tool
Names • Administrative history
Relationships • Subjects • Functions
Selection: Which Documents Do We Harvest?
Identify Series
• Aggregate selection
• Set frequency of harvests
Site Analysis Tool
• Display structure
• Harmonize physical, intellectual structure
• Identify inaccessible content
• Show what’s new
• Show significant changes
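Two of the Site Analysis Tool functions, displaying structure and showing what is new or changed, can be sketched as follows. The assumptions are mine: each harvest is represented as a dictionary of path to content bytes, the top-level directory stands in for the archival series, and any byte-level difference counts as a change (deciding which changes are "significant" would need a further rule that the slides do not specify). All names and sample paths are hypothetical.

```python
import hashlib
from collections import defaultdict
from pathlib import PurePosixPath


def group_by_series(paths):
    """Group document paths by top-level directory (treated here as a series)."""
    series = defaultdict(list)
    for path in paths:
        parts = PurePosixPath(path).parts  # e.g. ('/', 'reports', '2004.pdf')
        name = parts[1] if len(parts) > 2 else "(root)"
        series[name].append(path)
    return dict(series)


def compare_harvests(previous, current):
    """Compare two harvests given as {path: content bytes}."""
    def digest(data):
        return hashlib.sha256(data).hexdigest()

    shared = set(previous) & set(current)
    return {
        "new": sorted(set(current) - set(previous)),
        "removed": sorted(set(previous) - set(current)),
        "changed": sorted(p for p in shared
                          if digest(previous[p]) != digest(current[p])),
    }


# Hypothetical harvests of one site.
earlier = {"/index.html": b"home", "/reports/2003.pdf": b"old report"}
later = {"/index.html": b"home, updated", "/reports/2003.pdf": b"old report",
         "/reports/2004.pdf": b"new report"}

structure = group_by_series(later)
changes = compare_harvests(earlier, later)
print(structure)
print(changes)
```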
Description
To be able to locate documents
• when the creator or provenance is known
• when the subject is known
• and to aid in selection as to character
Series Description
• Make directory name a meaningful title
• Scope and contents note
• High-level subject headings
• Recorded in site analysis tool database
Document Description
• Creator: taxonomy, internal metadata
• Title: from internal metadata, noun phrases
• Subject: from series metadata, internal metadata
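The document-level rules above (take what the page's internal metadata offers, and inherit the rest from the series) might look like this in outline. This is a minimal sketch using Python's standard html.parser: the class and function names, the sample page, and the shape of the series record are assumptions for illustration, and the noun-phrase fallback for titles is not implemented here.

```python
from html.parser import HTMLParser


class MetadataParser(HTMLParser):
    """Collect the <title> text and <meta name=... content=...> values."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and attrs.get("name") and attrs.get("content"):
            self.meta.setdefault(attrs["name"].lower(), []).append(attrs["content"])

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def describe(html, series):
    """Build a document description, inheriting from the series where needed."""
    parser = MetadataParser()
    parser.feed(html)
    parser.close()
    return {
        # Internal metadata wins; otherwise fall back to the series creator.
        "creator": parser.meta.get("creator") or series.get("creator", []),
        "title": parser.title.strip() or None,
        # Series-level subjects first, then any the document declares itself.
        "subject": series.get("subject", []) + parser.meta.get("subject", []),
    }


# Hypothetical page and series record.
page = ('<html><head><title>Drought Plan</title>'
        '<meta name="Subject" content="drought"></head><body></body></html>')
series_record = {"creator": ["Governor's Drought Task Force"],
                 "subject": ["water conservation"]}
record = describe(page, series_record)
print(record)
```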
Access
Finding Aids
• A valuable bird’s-eye view for archivists
• Of limited value to patrons . . .
• Unless they’re transformed into topic maps
Full Text Search Engines
• Ranking Algorithms
• Categorization / Packaging Results
• Based on series-level metadata
• Based on autoclassification
Description and Access
Series-Level Description
name=“Creator” Governor’s Drought Task Force • Rural Watershed Alliance
name=“Subject” reservoirs • ground water
name=“Subject” drought • water conservation
name=“Subject” potable water • agriculture
name=“Type” planning reports
Categorized Results
Your search for water, Phoenix
Found documents in the following categories
• water (500+)
• water conservation (357)
• Salt River Project (210)
• drought (110)
• flood control (98)
• xeriscape (25)
Found documents from the following agencies
• Water Resources (135)
• Governor's Drought Task Force (102)
• Phoenix (87)
• Maricopa County (84)
• Corporation Commission (35)
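Category and agency counts like those in the mock-up could be produced by tallying the metadata attached to each matching document. The sketch below assumes each search hit already carries subject headings and an agency name (for example, inherited from series-level metadata); the function name, record layout, and sample hits are hypothetical and far smaller than the figures on the slide.

```python
from collections import Counter


def categorize(results):
    """Count search hits per subject heading and per agency."""
    subjects = Counter()
    agencies = Counter()
    for doc in results:
        subjects.update(doc.get("subjects", []))
        agencies[doc["agency"]] += 1
    return subjects, agencies


# Hypothetical hits for a search on "water, Phoenix".
hits = [
    {"agency": "Water Resources", "subjects": ["water", "drought"]},
    {"agency": "Water Resources", "subjects": ["water", "water conservation"]},
    {"agency": "Phoenix", "subjects": ["water"]},
]
by_subject, by_agency = categorize(hits)

# List categories from most to least frequent, as in the mock-up.
for heading, count in by_subject.most_common():
    print(f"{heading} ({count})")
for agency, count in by_agency.most_common():
    print(f"{agency} ({count})")
```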
Administration / Curation / Stewardship
Systematic
• Regular Workflows
• Not idiosyncratic
Collaborative
• Consensual, Not Idiosyncratic
• Avoid Redundant Efforts
Quality Control
• Need for Good Metrics
• Need for Regular Audits
Stay Tuned . . . .