boudrez - preserving websites controlling the life
-
7/29/2019 BOUDREZ - Preserving Websites Controlling the Life
1/13
Preserving websites:
controlling the life-cycle
Filip Boudrez
ErpaSeminar: Preserving the web
Kerkira, 23rd May 2003
-
TOC
1. Archival questions and answers:
1.1 archiving the original?
1.2 archiving the record!
1.3 snapshot preservation
2. Archiving antwerpen.be
2.1 current procedure
2.2 preservation legacy websites
2.3 storage
3. Accessing antwerpen.be
-
1.1 Archiving the original?
Wanted: the original!
static websites
dynamic/interactive websites:
different contents
original: server version? client version?
websites designed for specific browsers, e.g. Dynamic HTML versus static W3C HTML
-
1.2 Archiving records!
snapshot
web browser, plug-ins (PDF reader, Flash player, ...)
server scripts, executables, stylesheets, ...
tools
logic
data/content
logfiles
deep web
transactions
flat files,
XML
identification of records + appraisal
selective approach
-
1.3 Snapshot preservation
Browsable off-line snapshot: detach website from webserver and back-end
- platform independent preservation
- tool: harvester / off-line browser
Preservation of a web browser independent version:
- standard file formats
- official (X)HTML tags and attributes
- avoid vendor/software specific features
Long-term preservation:
- migration: feasible?
- web browser emulation
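
The advice to keep to official (X)HTML tags and avoid vendor-specific features can be checked mechanically before a snapshot is archived. The following is a minimal sketch, not a procedure from the slides: the set `VENDOR_SPECIFIC_TAGS` and the function name `find_vendor_tags` are assumptions chosen for illustration, and the tag list is deliberately incomplete.

```python
from html.parser import HTMLParser

# Illustrative, incomplete set of elements that were never part of the
# official HTML 4.01 / XHTML 1.0 specifications (assumption for this sketch).
VENDOR_SPECIFIC_TAGS = {"marquee", "blink", "layer", "ilayer", "bgsound"}


class VendorTagFinder(HTMLParser):
    """Collects every vendor-specific start tag found in a page."""

    def __init__(self):
        super().__init__()
        self.findings = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names in lower case.
        if tag in VENDOR_SPECIFIC_TAGS:
            line, _column = self.getpos()
            self.findings.append((tag, line))


def find_vendor_tags(html_text):
    """Return a list of (tag, line) pairs for vendor-specific elements."""
    finder = VendorTagFinder()
    finder.feed(html_text)
    finder.close()
    return finder.findings


if __name__ == "__main__":
    sample = "<html><body>\n<MARQUEE>news</MARQUEE>\n<p>text</p></body></html>"
    for tag, line in find_vendor_tags(sample):
        print(f"line {line}: non-standard element <{tag}>")
```

A page that yields no findings is not guaranteed to be browser independent (scripts and proprietary attributes are not inspected here), so such a check would complement rather than replace manual inspection.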
-
1.3 Snapshot preservation
Drawbacks of snapshot approach:
- loss of functionality: document in metadata
- intervals: changes / versions in between?
- practical problems
- technology dependent websites: DHTML, Flash
  filming? preserving static version? emulation?
-
2.1 Archiving current antwerpen.be
creation / design:
- make archivable websites
- registration of metadata: within webpages, external documentation
- awareness: logfiles, e-mails, deep web
management:
- outdated information? changes?
- controlled environment
- keeping documentation
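
Metadata registered within webpages is typically carried in `<meta>` elements, which makes it easy to collect at archiving time. The sketch below shows one possible way to read it; the function name `read_page_metadata` and the sample field name are assumptions for illustration and do not reflect the metadata scheme actually used for antwerpen.be.

```python
from html.parser import HTMLParser


class MetaTagReader(HTMLParser):
    """Collects name/content pairs from <meta> elements."""

    def __init__(self):
        super().__init__()
        self.metadata = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        content = attributes.get("content")
        # Skip <meta http-equiv=...> and other entries without a name.
        if name and content is not None:
            self.metadata[name.lower()] = content


def read_page_metadata(html_text):
    """Return a dict mapping lower-cased meta names to their content."""
    reader = MetaTagReader()
    reader.feed(html_text)
    reader.close()
    return reader.metadata


if __name__ == "__main__":
    # Hypothetical page header, for illustration only.
    sample = '<head><meta name="DC.Creator" content="Webmaster"></head>'
    print(read_page_metadata(sample))
```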
-
2.1 Archiving current antwerpen.be
archiving:
- quality requirements?
- practical issues?
- archiving action: when? who? what?
- organisation of the website archive
-
2.1 Archiving current antwerpen.be
[Workflow diagram: archiving procedure]
Actors: Editorial board, Content responsibles, Webmasters, Archival service
Steps:
- informing
- register metadata (metadata: Excel form)
- capture website (pull/push)
- collect metadata
- inspection: checking links (broken anchors? missing files?), file formats?
- inspection result: NOK / OK
- transfer
- remove website
- add metadata + export as XML
- adjust portal site
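
The inspection step (checking links, missing files) lends itself to automation on an off-line snapshot. The following is a minimal sketch under stated assumptions: the snapshot is a local folder of HTML files, only `href` and `src` attributes are examined, and the names `LinkCollector` and `find_missing_files` are hypothetical. It is not the tool used in the procedure described here.

```python
from html.parser import HTMLParser
from pathlib import Path
from urllib.parse import unquote, urlsplit


class LinkCollector(HTMLParser):
    """Collects the values of href and src attributes in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(value)


def find_missing_files(snapshot_dir):
    """Return (page, link) pairs whose local target file does not exist."""
    root = Path(snapshot_dir)
    missing = []
    for page in sorted(root.rglob("*.htm*")):
        collector = LinkCollector()
        collector.feed(page.read_text(encoding="utf-8", errors="replace"))
        collector.close()
        for link in collector.links:
            parts = urlsplit(link)
            # Skip external links (http:, mailto:, ...) and pure "#anchor" links.
            if parts.scheme or parts.netloc or not parts.path:
                continue
            local_path = unquote(parts.path)
            if local_path.startswith("/"):
                target = root / local_path.lstrip("/")  # root-relative link
            else:
                target = page.parent / local_path  # document-relative link
            if not target.exists():
                missing.append((page.relative_to(root).as_posix(), link))
    return missing
```

An empty result corresponds to the "OK" branch for this particular check; broken in-page anchors and file format checks would need separate tests.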
-
2.2 Archiving legacy antwerpen.be
Version 1.5/2:
- rebuild backup computer
- manual adjusting of hyperlinks: absolute to document-relative
- migration of obsolete formats
- too many files preserved
- incomplete metadata
Version 3:
- definitive loss
Version 4:
- manual adjusting of hyperlinks: absolute to root-relative
- back on webserver: creation of snapshot (absolute to document-relative)
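
Rewriting absolute hyperlinks as document-relative ones is what makes a snapshot browsable independently of its original server. A minimal sketch of that conversion follows; the function name `to_document_relative`, its parameters, and the `www.example.org` URLs are hypothetical and only illustrate the idea, not the manual procedure applied to the legacy versions.

```python
import posixpath
from urllib.parse import urlsplit


def to_document_relative(link, page_url, site_host):
    """Rewrite an absolute link on site_host relative to the linking page.

    Links to other hosts, and links that are already relative, are
    returned unchanged.
    """
    target = urlsplit(link)
    if target.netloc.lower() != site_host.lower():
        return link
    # Folder of the page that contains the link, e.g. "/site/a".
    page_dir = posixpath.dirname(urlsplit(page_url).path) or "/"
    relative = posixpath.relpath(target.path or "/", start=page_dir)
    if target.query:
        relative += "?" + target.query
    if target.fragment:
        relative += "#" + target.fragment
    return relative


if __name__ == "__main__":
    print(to_document_relative(
        "http://www.example.org/images/logo.gif",
        "http://www.example.org/site/a/index.html",
        "www.example.org",
    ))  # prints ../../images/logo.gif
```

Note that `posixpath.relpath` normalises paths, so a trailing slash on a folder link is dropped; a production tool would need to decide how to map such links to an index file.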
-
2.3 Storage
in EDMS: storage, no consultation possible
on server: storage + consultation
on optical medium:
CD: ISO 9660 limits the length of folder and file names
DVD: UDF standardisation
on tape: slow
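
Because ISO 9660 restricts the length of folder and file names, it can help to screen a snapshot before writing it to CD. The sketch below flags names that exceed a configurable limit. The default of 31 characters is an assumption based on the commonly cited limit for ISO 9660 interchange level 2 (level 1 is stricter); the function name `find_long_names` is hypothetical.

```python
from pathlib import PurePosixPath


def find_long_names(relative_paths, max_length=31):
    """Return (path, name) pairs for path components longer than max_length.

    relative_paths: iterable of "/"-separated paths inside the snapshot.
    max_length: assumed limit; adjust it to the target disc format.
    """
    problems = []
    for path in relative_paths:
        for name in PurePosixPath(path).parts:
            if len(name) > max_length and (path, name) not in problems:
                problems.append((path, name))
    return problems


if __name__ == "__main__":
    # Hypothetical snapshot paths, for illustration only.
    paths = ["site/index.html", "site/" + "a" * 40 + ".html"]
    for path, name in find_long_names(paths):
        print(f"{path}: '{name}' has {len(name)} characters")
```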
-
3. Accessing archived antwerpen.be
-
More information?
http://www.antwerpen.be/david