Texts and Digital Objects What seems to have changed

Uploaded by: miguel-monroe

Posted on 27-Mar-2015

The web as universal library

• Generation I, the ASCII text: a web of text nodes, with documents at the nodes

• Generation II, the XML text: a web where the documents retain their deep structure, but the web is still the library

• Generation III, the book as object: the library will be imported to the web, page by page, library by library. The web is simply a way of accessing the universal library of print objects.

But are we going backwards?

Some of the movement looks a trifle retrograde

Generation I

The primacy of texts: “Nodes can in principle also contain non-text information such as diagrams, pictures, sound, animation etc. The term hypermedia is simply the expansion of the hypertext idea to these other media.” (Tim Berners-Lee, 1989 proposal for the WWW, written at CERN)

Texts: hypertext, HTTP and ASCII will do
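
A Generation I web can be modelled as nothing more than plain-text documents sitting at nodes, joined by links. The sketch below is a hypothetical illustration: the node names, texts and the `reachable` helper are invented for the example, and an in-memory dictionary stands in for documents fetched over HTTP.

```python
from collections import deque

# Hypothetical hypertext: each node holds a plain-ASCII document and the
# names of the nodes it links to. Names and texts are invented.
NODES = {
    "home": ("Welcome. See the library and the catalogue.", ["library", "catalogue"]),
    "library": ("Plain-text books live here.", ["home", "gutenberg"]),
    "catalogue": ("A list of titles.", ["library"]),
    "gutenberg": ("Project Gutenberg e-texts.", []),
}


def reachable(start):
    """Follow links breadth-first and return node names in visiting order."""
    seen = {start}
    order = []
    queue = deque([start])
    while queue:
        name = queue.popleft()
        order.append(name)
        _text, links = NODES[name]
        for target in links:
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return order


# Generation I needs nothing richer than ASCII at the nodes.
assert all(text.isascii() for text, _links in NODES.values())
print(reachable("home"))  # ['home', 'library', 'catalogue', 'gutenberg']
```

All the meaning lives in the links between whole documents; nothing inside a node is structured beyond its characters.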

Generation I circa 1995

A forest of connected texts, which, frankly, doesn’t look too great.

Project Gutenberg

• Texts are what matter

• Accuracy matters

• Page numbering doesn’t

• Typography doesn’t matter either

But a good deal is lost

• Typography may not matter, but good web design does

• Typography carries a lot of meta-data

• Meta-data and the formal structure of the text need to be kept

• Variety, flexibility, and machine-readability: hence XML
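
To make the contrast concrete, here is a minimal sketch built on an invented XML fragment (the element names `book`, `meta`, `chapter`, `head`, `p` and `emph` are illustrative, not a particular encoding standard). Flattening it keeps the words but discards the meta-data and structure; keeping the XML lets a program still ask structural questions.

```python
import xml.etree.ElementTree as ET

# Hypothetical marked-up book fragment; element names are illustrative only.
SOURCE = """<book>
  <meta><title>Example Title</title><author>A. Writer</author></meta>
  <chapter n="1">
    <head>Opening</head>
    <p>First paragraph with an <emph>emphasised</emph> word.</p>
    <p>Second paragraph.</p>
  </chapter>
</book>"""


def flatten(root):
    """Generation I view: keep only the characters, collapsing whitespace."""
    return " ".join(" ".join(root.itertext()).split())


def chapter_outline(root):
    """Generation II view: (number, heading, paragraph count) per chapter."""
    return [
        (ch.get("n"), ch.findtext("head"), len(ch.findall("p")))
        for ch in root.iter("chapter")
    ]


root = ET.fromstring(SOURCE)
print(root.findtext("meta/author"))  # A. Writer
print(chapter_outline(root))         # [('1', 'Opening', 2)]
print(flatten(root))                 # the same words, but no way to tell title from text
```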

Generation II circa 2000

Books repurposed for the web look a lot better than flat ASCII.

But there is a big overhead.

Republished for the web

• Inevitable duplication

• Page numbers don’t matter

• Typography can be optimised for web browsers

• Structure and added value are preserved

• Links and HTTP connections are fine

• But this repurposing is a hassle and ultimately confusing

So Google has a better idea

• Words matter

• Pages matter

• Books matter

• Libraries matter

• And they should be searched in the way that all other digital objects and collections can be searched

Generation III circa 2005

Put books on the web just as they are. Books, not texts, are the primary resource for a library.

Keep it simple

• Scan every page of every book

• OCR every word and symbol

• Store every word and symbol in a database

• Store an image of every page in the database

• Know precisely where every word is on every page
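
The recipe above amounts to a positional index: every OCR'd word is stored with the page it came from and its location on that page image. This sketch assumes OCR output of the form (page, word, bounding box); the sample words and coordinates are made up, an in-memory dictionary stands in for the database, and it is not a description of Google's actual system.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Where one OCR'd word sits on a page image (pixel units assumed)."""
    page: int
    x: int
    y: int
    w: int
    h: int


# Hypothetical OCR output: (page number, word, (x, y, width, height)).
OCR_WORDS = [
    (1, "Texts", (40, 30, 60, 12)),
    (1, "and", (105, 30, 30, 12)),
    (1, "objects", (140, 30, 70, 12)),
    (2, "Digital", (40, 30, 60, 12)),
    (2, "objects", (105, 30, 70, 12)),
]


def build_index(words):
    """Map each lower-cased word to every position where it occurs."""
    index = defaultdict(list)
    for page, word, (x, y, w, h) in words:
        index[word.lower()].append(Box(page, x, y, w, h))
    return index


def search(index, term):
    """Group the positions of a term by page: {page: [Box, ...]}."""
    hits = defaultdict(list)
    for box in index.get(term.lower(), []):
        hits[box.page].append(box)
    return dict(hits)


index = build_index(OCR_WORDS)
print(search(index, "Objects"))
```

Because each hit carries its page and coordinates, a search answers both "which pages?" and "where on the page?", which is what the page-image display needs.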

How the Google system works

• The browser receives a JPEG with some HTML around it

• The web page is an image with the search terms highlighted

• The intelligence is in the database

• Search is precise and fast

• The Google database would be the universal library
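
One plausible way to deliver an image with search terms highlighted is to send the page JPEG unchanged and let HTML draw semi-transparent boxes over it at the coordinates held in the database. This sketch assumes that approach; the `page_html` helper and the image path are hypothetical, and the source does not say how Google actually renders its highlights.

```python
from html import escape


def page_html(image_url, boxes):
    """Wrap a page image in HTML and overlay one highlight per (x, y, w, h) box.

    The coordinates are assumed to be pixel offsets on the page image, as
    recorded when the page was OCR'd.
    """
    overlays = "".join(
        f'  <div class="hit" style="position:absolute;left:{x}px;top:{y}px;'
        f'width:{w}px;height:{h}px;background:rgba(255,255,0,0.4)"></div>\n'
        for x, y, w, h in boxes
    )
    return (
        '<div style="position:relative">\n'
        f'  <img src="{escape(image_url)}" alt="Scanned page">\n'
        f"{overlays}"
        "</div>"
    )


# Hypothetical image path and hit box for one search term.
snippet = page_html("/pages/book-123/p0002.jpg", [(105, 30, 70, 12)])
print(snippet)
```

The page itself stays a picture; the only "text" the browser sees is the handful of coordinates used for highlighting.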

Pages really matter

• Every print page is a web page

• A book is just a collection of web pages

• The concept of a ‘union catalogue’ will now have its correlative, a ‘union library collection’ (i.e. what counts as a duplicate?)

• There is no such thing as a Google edition

• Are the Google standards of preservation good enough?

Simplicity and Conservatism

• Publishers should be flattered

• Book designers, editors and typographers should be more than flattered

• Authors are still authors

• Catalogues and references work with minimal adjustment

• Book warehouses become obsolete

So what is lost?

• Perhaps publishers and authors lose profits?

• The text is lost. The text is readable and searchable, but there is no text.

• A searchable text, but not an entire and complete text: a collection of pages (JPEGs).

• Certainly none of the deep structure of the XML is retained

• Linkages and references are absent

What is gained?

• Books: all texts, documents and libraries become fully searchable.

• Automation of reading and accessibility of rare editions.

• Incredibly cheap in relation to the enhanced availability

• Bibliographies, catalogues and other systems of metadata are preserved

There is much left to do

• No fine structure in the pages

• Poor navigation within the books

• The commercial model has to be invented

• It will not all be advertising-driven

Exact Editions uses a Google-style platform for magazines

Technology is similar but the sociology is different.

Similar to Google Book Search

• Platform for publishers of magazines

• Publishers can add web functionality (links and advertisements)

• PDF as input and automated production

• Subscription or free access

• Full web functionality (statistics and integration with web apps)

Adam Hodgkin
