boudrez - preserving websites controlling the life


  • 7/29/2019 BOUDREZ - Preserving Websites Controlling the Life

    1/13

    Preserving websites:

    controlling the life-cycle

    Filip Boudrez

    ErpaSeminar: Preserving the web

    Kerkira, 23rd May 2003

    2/13

    TOC

    1. Archival questions and answers:

    1.1 archiving the original?

    1.2 archiving the record!

    1.3 snapshot preservation

    2. Archiving antwerpen.be

    2.1 current procedure

    2.2 preservation of legacy websites

    2.3 storage

    3. Accessing antwerpen.be

    3/13

    1.1 Archiving the original?

    Wanted: the original!

    static websites

    dynamic/interactive websites: the contents differ, so which is the original: the server version? the client version?

    websites designed for specific browsers, e.g. Dynamic HTML vs. static W3C HTML

    4/13

    1.2 Archiving records!

    - snapshot
    - webbrowser, plug-ins (PDF reader, Flash player, ...)
    - server scripts, executables, stylesheets, tools
    - logic
    - data/content
    - logfiles
    - deep web
    - transactions
    - flat files, XML

    identification of records + appraisal: selective approach

    5/13

    1.3 Snapshot preservation

    Browsable off-line snapshot: detach the website from the webserver and back-end
    - platform-independent preservation
    - tool: harvester / off-line browser

    Preservation of a webbrowser-independent version:
    - standard file formats
    - official (X)HTML tags and attributes
    - avoid vendor/software-specific features

    Long-term preservation:
    - migration: feasible?
    - webbrowser emulation
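To illustrate the advice to avoid vendor- or software-specific features, here is a minimal Python sketch that flags a few proprietary elements in a page before it is preserved. The element list is an illustrative assumption (a handful of Netscape and Internet Explorer extensions that are not part of W3C HTML 4.01), not an exhaustive check; validating against the official (X)HTML DTD would be the thorough route.

```python
from html.parser import HTMLParser

# Illustrative assumption: a few proprietary elements that are not part of
# W3C HTML 4.01 / XHTML 1.0. Not an exhaustive list.
VENDOR_ELEMENTS = {
    "blink": "Netscape",
    "layer": "Netscape",
    "ilayer": "Netscape",
    "marquee": "Internet Explorer",
    "bgsound": "Internet Explorer",
}


class VendorTagFinder(HTMLParser):
    """Collects (line, tag, vendor) for every proprietary start tag found."""

    def __init__(self):
        super().__init__()
        self.findings = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser already lower-cases tag names, so <MARQUEE> is caught too.
        if tag in VENDOR_ELEMENTS:
            line, _column = self.getpos()
            self.findings.append((line, tag, VENDOR_ELEMENTS[tag]))


def find_vendor_tags(html_text):
    """Return the proprietary elements used in one HTML page."""
    parser = VendorTagFinder()
    parser.feed(html_text)
    parser.close()
    return parser.findings


if __name__ == "__main__":
    page = "<html><body>\n<p>News</p>\n<marquee>Welcome!</marquee>\n</body></html>"
    for line, tag, vendor in find_vendor_tags(page):
        print(f"line {line}: <{tag}> is {vendor}-specific")
```

Run as a script, the example page reports the `<marquee>` element on line 3.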

    6/13

    1.3 Snapshot preservation

    Drawbacks of snapshot approach:

    - loss of functionality: document it in the metadata

    - intervals: what about changes / versions in between?

    - practical problems

    - technology-dependent websites (DHTML, Flash): filming? preserving a static version? emulation?
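The interval drawback can at least be measured: comparing two successive snapshots shows which files changed between captures, though not the versions that existed in between. Here is a minimal Python sketch, assuming each snapshot is stored as a local directory tree. `compare_snapshots` is a hypothetical helper that hashes every file with SHA-256.

```python
import hashlib
from pathlib import Path


def fingerprint(root):
    """Map each file path (relative to root, '/'-separated) to its SHA-256 digest."""
    root = Path(root)
    digests = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            relative = path.relative_to(root).as_posix()
            digests[relative] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def compare_snapshots(old_root, new_root):
    """Return (added, removed, changed) sorted path lists between two snapshot trees."""
    old, new = fingerprint(old_root), fingerprint(new_root)
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    changed = sorted(p for p in old.keys() & new.keys() if old[p] != new[p])
    return added, removed, changed
```

A comparison like this only tells whether a new capture differs from the previous one. Whatever happened between the two capture dates remains invisible.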

    7/13

    2.1 Archiving current antwerpen.be

    creation / design:
    - make archivable websites
    - registration of metadata: within webpages, in external documentation
    - awareness: logfiles, e-mails, deep web

    management:
    - outdated information? changes?
    - controlled environment
    - keeping documentation
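Metadata registered within webpages can be collected automatically at archiving time. Here is a minimal Python sketch, assuming the metadata sits in standard `<meta name="..." content="...">` elements. The output element names (`webpage`, `meta`) are hypothetical and do not follow any prescribed archival schema.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Collects name/content pairs from <meta name="..." content="..."> elements."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name, content = attributes.get("name"), attributes.get("content")
        # <meta http-equiv=...> has no name attribute and is skipped.
        if name and content is not None:
            self.meta[name.lower()] = content


def meta_to_xml(url, html_text):
    """Return an XML string describing one page (hypothetical element names)."""
    collector = MetaCollector()
    collector.feed(html_text)
    collector.close()
    record = ET.Element("webpage", url=url)
    for name, content in sorted(collector.meta.items()):
        ET.SubElement(record, "meta", name=name).text = content
    return ET.tostring(record, encoding="unicode")
```

Metadata kept in external documentation (rather than in the pages themselves) would have to be merged into such a record separately.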

    8/13

    2.1 Archiving current antwerpen.be

    archiving:
    - quality requirements?
    - practical issues?
    - archiving action: when? who? what?
    - organisation of the website archive

    9/13

    2.1 Archiving current antwerpen.be

    Workflow diagram. Actors: editorial board, content responsibles, webmasters, archival service.
    - informing; register metadata (metadata: Excel form)
    - capture website (website: pull/push); collect metadata
    - inspection: checking links (broken anchors? missing files?), file formats? Result: OK / NOK
    - transfer; remove website; adjust portal site
    - add metadata + export as XML
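Part of the inspection step (broken anchors? missing files?) can be automated. Here is a minimal Python sketch, assuming the captured website is held as a dictionary that maps site-relative file paths to HTML text. `check_links` is a hypothetical helper that resolves document-relative and root-relative links and skips external ones. It reports targets missing from the capture and `#fragment` anchors with no matching `id` or `<a name>`.

```python
import posixpath
from html.parser import HTMLParser
from urllib.parse import unquote, urlsplit


class LinkParser(HTMLParser):
    """Collects the href values and the anchor targets (id, <a name>) of one page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []
        self.anchors = set()

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if attributes.get("id"):
            self.anchors.add(attributes["id"])
        if tag == "a":
            if attributes.get("name"):
                self.anchors.add(attributes["name"])
            if attributes.get("href"):
                self.hrefs.append(attributes["href"])


def parse_page(html_text):
    parser = LinkParser()
    parser.feed(html_text)
    parser.close()
    return parser


def resolve(page, link_path):
    """Resolve a link path against the linking page (both relative to the site root)."""
    if link_path.startswith("/"):  # root-relative link
        return posixpath.normpath(link_path.lstrip("/"))
    return posixpath.normpath(posixpath.join(posixpath.dirname(page), link_path))


def check_links(site):
    """site maps 'dir/page.html' -> HTML text; returns (page, href, problem) tuples.

    Assumption: every captured file is a key in `site`; non-HTML files
    (images, PDFs) would need entries too, e.g. with empty text.
    """
    pages = {path: parse_page(text) for path, text in site.items()}
    problems = []
    for page, parsed in pages.items():
        for href in parsed.hrefs:
            parts = urlsplit(href)
            if parts.scheme or parts.netloc:
                continue  # external link (http:, mailto:, ...): not checked here
            target = resolve(page, unquote(parts.path)) if parts.path else page
            if target not in pages:
                problems.append((page, href, "missing file"))
            elif parts.fragment and parts.fragment not in pages[target].anchors:
                problems.append((page, href, "broken anchor"))
    return problems
```

File-format checks and external links would need separate handling.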

    10/13

    2.2 Archiving legacy antwerpen.be

    Version 1.5/2: - rebuild backup computer

    - manual adjustment of hyperlinks: absolute to document-relative

    - migration of obsolete formats

    - too many files preserved

    - incomplete metadata

    Version 3: - definitive loss

    Version 4: - manual adjustment of hyperlinks: absolute to root-relative

    - back on webserver: creation of a snapshot, hyperlinks absolute to document-relative
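The hyperlink adjustments listed above were made by hand. Here is a minimal Python sketch of the conversion rule for a single link. It assumes the site's base URL is known (`http://www.antwerpen.be/` is used only as an example) and that links to other hosts stay absolute. `to_root_relative` and `to_document_relative` are hypothetical helpers; rewriting the links inside the HTML files is not shown.

```python
import posixpath
from urllib.parse import urlsplit, urlunsplit


def to_root_relative(link, site_base):
    """'http://host/a/b.html#x' -> '/a/b.html#x' if the link is on site_base's host."""
    parts, base = urlsplit(link), urlsplit(site_base)
    if (parts.scheme, parts.netloc) != (base.scheme, base.netloc):
        return link  # external link: keep it absolute
    return urlunsplit(("", "", parts.path or "/", parts.query, parts.fragment))


def to_document_relative(link, page_path, site_base):
    """Make a link relative to the directory of page_path (e.g. '/news/a.html')."""
    rooted = to_root_relative(link, site_base)
    if not rooted.startswith("/"):
        return rooted  # external link was left unchanged
    parts = urlsplit(rooted)
    start = posixpath.dirname(page_path)  # directory of the linking page
    relative = posixpath.relpath(parts.path, start)
    if parts.path.endswith("/") and not relative.endswith("/"):
        relative += "/"  # keep links to directories recognisable
    return urlunsplit(("", "", relative, parts.query, parts.fragment))
```

Document-relative links keep working wherever the snapshot is stored (on a server, on CD, in a subfolder). Root-relative links only work when the snapshot sits at the root of a webserver.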

    11/13

    2.3 Storage

    in EDMS: storage, no consultation possible

    on server: storage + consultation

    on optical medium:

    CD: ISO 9660 limits on the length of folder and file names

    DVD: UDF standardisation

    on tape: slow
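Before a snapshot is written to CD, its paths can be checked against the ISO 9660 name restrictions. Here is a minimal Python sketch of the strictest interchange level (Level 1) as it is commonly summarized from the standard:
- 8.3 file names
- directory names of at most 8 characters
- only uppercase letters, digits and underscore
- at most eight directory levels, with the root counting as level 1

The Joliet and Rock Ridge extensions, which relax these limits, are not modelled. `level1_problems` is a hypothetical helper.

```python
import re

# ISO 9660 "d-characters": uppercase A-Z, digits 0-9 and underscore.
D_CHARS = re.compile(r"[A-Z0-9_]*")
MAX_LEVELS = 8  # the root directory counts as level 1


def level1_problems(path):
    """List ISO 9660 Level 1 violations for a '/'-separated path relative to the disc root."""
    *dirs, filename = path.split("/")
    problems = []
    if len(dirs) + 1 > MAX_LEVELS:
        problems.append(f"too deep: {len(dirs) + 1} directory levels")
    for name in dirs:
        if not (1 <= len(name) <= 8 and D_CHARS.fullmatch(name)):
            problems.append(f"directory name {name!r} is not 1-8 d-characters")
    # Level 1 file names: up to 8 characters, a dot, up to 3 characters (8.3 form).
    stem, _dot, ext = filename.partition(".")
    valid_file = (
        len(stem) <= 8
        and len(ext) <= 3
        and (stem or ext)
        and D_CHARS.fullmatch(stem)
        and D_CHARS.fullmatch(ext)
    )
    if not valid_file:
        problems.append(f"file name {filename!r} is not in 8.3 form with d-characters")
    return problems
```

Typical web file names such as `index.html` already fail this check. The names in a snapshot (and the links pointing to them) may therefore have to be rewritten before burning a Level 1 CD.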

    12/13

    3. Accessing archived antwerpen.be

    13/13

    More information?

    http://www.antwerpen.be/david
