

Digital Annotation and Exploration Techniques for Handling Image-Based Hypermedia

Laurent Robert, Eric Lecolinet

ENST / CNRS URA 820, 46 rue Barrault, 75013 Paris, France

{lrobert,elc}@enst.fr – http://www.enst.fr/~{lrobert,elc}

Abstract: Users need more sophisticated tools to retrieve and manipulate the growing number of image-based documents available in databases. In this paper, we describe interaction and visualization techniques for handling image-based hypermedia. First, we propose a digital annotation scheme that provides an indirect means of indexing complex documents that cannot be processed automatically by the usual techniques. We then present two graphical workspaces for accessing data through interactive manipulation in a more intuitive way. The first system provides focus+context views for navigating through large information spaces. The second is dedicated to the management and retrieval of documents that have already been seen.

Keywords: hypermedia, digitized documents, digital annotation, text/image coupling, text encoding and rendering, XML, information visualization, navigation in large hypermedia, bookmark space.

1 Introduction

Digital libraries make it possible to access vast collections of digitized documents such as photographs, pictures, maps, newspapers and various handwritten material (including personal notes, literary manuscripts or experimentation notebooks) (William Blake Archive, 2000). While setting up and organizing digital repositories to provide better access to vast amounts of heterogeneous data is a major industrial and research topic, less attention seems to have been devoted to the interactive access and manipulation of such material. Indeed, accessing documents in large information spaces is not a trivial task, and may require many interactive operations to retrieve the appropriate documents.

The first difficulty comes from the computer representation of digitized documents. Many of them (especially handwritten or complex composite documents containing pictures) are bitmap images that cannot be processed automatically in most cases (common automatic recognition systems such as OCR being mostly restricted to rather simple machine-printed documents). As a consequence, automatic indexing techniques will fail or give poor results on such documents.

A possible solution consists in annotating bitmap images in order to associate textual data (such as notes, keywords or complete transcriptions) with interesting locations in these images. Automatic indexing and various text processing techniques can then be performed on these annotations and other attached data. Annotation thus provides an indirect means of indexing complex documents that could not be processed automatically by the usual techniques. In addition, general annotation techniques make it possible to link bitmap image locations to various kinds of data by adding hyperlinks.

Section 2 presents a system that was designed for transcribing or annotating handwritten or complex composite documents in an interactive way. This work proposes a text/image coupling scheme that is based on the capabilities of the XML/TEI markup language. The system makes it possible to perform tight coupling between digitized documents and their hypertext annotations (these annotations constituting a hypermedia description of the original documents). It was implemented using the Ubit toolkit (Lecolinet, 1999), a new GUI toolkit whose specific architecture is well suited to building hypermedia applications.

Accessing documents by interactive exploration can be quite a time-consuming and inefficient task. Users quite often get “lost in the information space”. They are often unable to figure out precisely where they are currently located in the information space and which path they should follow to reach the data they are looking for. Besides, they often have a hard time remembering how to come back to previously visited locations of interest (a problem that is typical of the World Wide Web, where people often need to find certain pieces of information again). Users are thus confronted with the disorientation problem (Nielsen, 1990), which is caused by the complexity of the hyperlink network and by the large range of navigation choices available to them. Moreover, users can usually only see a small fraction of the information space at a given time (typically, the single page that is currently displayed by their browser). This makes it harder to associate ideas and to understand relationships between data.

Section 3 presents some visualization techniques for facilitating the interactive retrieval of documents. The graphical workspaces that have been developed provide image-based representations and interactive manipulation capabilities.

2 Augmenting Digitized Facsimiles

This section describes a system that was initially developed for transcribing literary hypermedia (Lecolinet, 1998). It was then re-implemented and generalized for annotating, editing and visualizing various kinds of complex documents. As said in the introduction, a major goal of this new system is the ability to integrate various kinds of documents into a single information space and to provide unified access to all subparts. On the one hand, handwritten or complex documents that cannot be processed automatically by the usual techniques can be annotated or transcribed in order to allow indirect access by content. Automatic indexing is performed on the hypertext annotations that are linked to the corresponding areas in the original bitmap images. On the other hand, generalized hyperlinks can be created for linking various hypermedia elements to specific areas in the original bitmap images, thus allowing a continuous and uniform browsing scheme in heterogeneous data spaces (digital annotation can thus be seen as a generalization of the hyperlink concept).

This scheme makes it possible, for instance, to integrate archive documents including manuscript or pictorial data into a common hypermedia framework and to provide extended access capabilities for this kind of material. Various kinds of pre-computer-age data can thus be exploited with modern information management tools.

Another example is the capability to integrate and tightly relate various kinds of data elements that are commonly used in everyday life. As shown in Figure 1, this system could for instance be used to annotate a newspaper article (the article being encoded either as a scanned bitmap image, a hypertext document or a PDF file) and to link interesting parts with personal (typed or handwritten) notes, with other files or with related Web URLs. The system can thus be seen as an attempt to unify the way people interact with various kinds of readable data by providing uniform capabilities that do not strongly depend on data encoding schemes (bitmap images, PDF data, markup languages…).

Three technologies, which are detailed in the following subsections, were used to implement the presented system. First, text/image coupling techniques were designed in order to facilitate user interaction. Normalization and reusability are other important concerns that were taken into account in this project. A crucial point was the possibility for other tools to reuse the data produced by our system. This goal was achieved by using the capabilities of the XML/TEI markup language to store annotations and text/image coupling data in a standardized way. Implementing user interfaces that manage extended hypermedia editing capabilities is not a trivial task. Our system implementation is based on the Ubit toolkit, which allows for a one-to-one correspondence between GUI objects and markup language tags. This specific architectural design simplifies the implementation of hypermedia authoring tools.

Figure 1 : Annotating a newspaper article.

2.1 Text/Image Coupling

The interactive transcription and annotation procedure is performed in two successive steps (Figure 2). The user first selects a specific area in the original bitmap image by using specially designed tools (called interactive markers) that make it possible to edit various kinds of shapes in the original image (Figure 3). Four markers are currently available, each of them corresponding to a different selection task (e.g. selecting a word, a text line, a paragraph or an arbitrary region). The user can then annotate the image content relative to the selected area and choose the appropriate XML tag (note, line, paragraph, etc.) that will characterize this element. As said before, digital annotations can constitute a full transcription of the original text, or they can just contain keywords, comments, rough descriptions of the original image content, references to external data or hyperlinks.
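The data produced by one such selection step can be sketched as a simple record tying an image zone to its annotation. This is an illustrative sketch only: the class and field names below are ours, not those of the actual system.

```python
from dataclasses import dataclass

# Illustrative record for one annotated zone; names are ours, not the system's.
@dataclass
class Annotation:
    image: str                  # bitmap file the zone belongs to (e.g. "lauz.gif")
    zone: list                  # closed polygon, as (x, y) vertex pairs
    tag: str                    # XML/TEI element characterizing the content ("note", "lb", "p", ...)
    text: str = ""              # transcription, keywords, comment or hyperlink
    note_type: str = "comment"  # category used later for filtering ("note", "keyword", "link", ...)

# A keyword annotation attached to a rectangular zone of a manuscript image.
ann = Annotation(image="lauz.gif",
                 zone=[(0, 98), (330, 98), (330, 640), (0, 640)],
                 tag="note", text="Troubadour", note_type="keyword")
```

The note_type field mirrors the role played by the type attribute of the <note> element described in Section 2.2: it lets later processing stages filter which annotations to show.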

Figure 2 : The annotation and transcription system: a) an image of a manuscript. b) the XML encoding. c) the graphical representation of the XML elements.

The transcription/annotation interface automatically appears over the original image after zone selection. It is automatically positioned near the selected area (Figure 4). Spatial proximity between the original bitmap zone and the annotation tool helps the user focus on the transcription task. As shown in Figure 4, the editing interface is mostly transparent in order to increase this visual proximity effect. This effect is intended to provide contextual information to the user, as both resources (the typed text and the image) are closely located in her visual field. However, transparency can be deactivated at any moment in order to increase readability, which strongly depends on the nature of the background bitmap image.

Figure 3 : Interactive markers make it possible to select various shapes in digitized facsimiles.

Figure 4 : The transparent annotation / transcription tool. Transparency can be deactivated when preferable.

XML elements are automatically created and linked to the appropriate image areas by the annotation interface. The system provides automatic layout capabilities: the textual content of XML elements is displayed in a way that respects the spatial layout of the corresponding zones in the original image (this text being either superimposed on the original image or displayed in a separate window). The user can take advantage of text/image coupling during the editing or browsing stages: activating an image area (by clicking on it or entering it, depending on the visualization mode) will automatically highlight the corresponding annotations, and vice versa, as shown in Figure 2 (parts A and C). These capabilities improve the visual correspondence between text and image resources.

The system takes advantage of annotation types when automatic processing is performed on XML documents. These types can also be used for filtering the data that is actually shown to the user. This scheme prevents users from being disturbed by the simultaneous display of much unrelated information.

2.2 XML/TEI Encoding

Annotation encoding is based on the XML (W3C, 2000) version of the TEI (Sperberg, 1994), a DTD which is widely used in different fields of the Humanities. It offers high expressive power for describing para-textual phenomena such as insertions, corrections and unreadable portions of text.


A transcription is made up of lines, paragraphs and columns, which are respectively encoded with the <lb>, <p> and <cb> XML/TEI elements.

<text><body>
<cb><p id="p1">
<lb id="l1">Bernartz deuentedorn</lb>
<lb id="l2">Qan uei lalauzeta mouer. deioi sas</lb>
……
</p></cb>
</body></text>

Annotations are encoded using the <note> element. This element makes it possible to categorize annotation types (such as note, comment, keyword, link) by means of the value of its type attribute. Annotations are grouped within a <noteGrp> element, which is located at the end of the transcription file in order to facilitate the readability of the produced files.

<noteGrp>
<note id="n1" type="keyword">Troubadour</note>
<note id="n2" type="link"><url>
http://pro.wanadoo.fr/cc-pays.ventadour/troubadours.html
</url></note>
</noteGrp>

XML/TEI elements are uniquely identified by means of their id attribute, which is also used to link XML elements to image areas. As image fragments are contained in separate files that are stored in specific external formats (such as GIF, JPEG or PNG), it is not possible to refer to these files using the simple XML identification reference procedure. Moreover, no general image-subpart pointing mechanism is currently available in the XML specifications. This problem was solved by extending the semantics of the standard <xptr> element. This extension relies on existing mechanisms in the SGML version of the TEI DTD.

An XML pointer (an <xptr> element) is created for each part of the image that must be referenced. The doc attribute identifies a bitmap image while the from attribute indicates a closed polygon in this image (which has been selected interactively). The following example illustrates this XML encoding.

<ptrGrp>
<xptr id="xp1" type="rect" doc="lauz.gif"
      from="space (3d) (0 98 330) (5 640 788)" />
<xptr id="xp2" type="line" doc="lauz.gif"
      from="space (3d) (0 253 452) (30 54 55)" />
</ptrGrp>

The <xptr> elements that identify image zones and the XML annotations are then related by means of the <corresp> tag, as shown in the next example. Besides, arbitrary XML elements can be linked together by using the <link> tag (which creates actual hyperlinks). A digital annotation can for instance be linked to any element of the transcription.

<correspGrp>
<corresp targets="p1 xp1" />
<corresp targets="n2 xp2" />
</correspGrp>
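Because the encoding is standard XML, other tools can exploit it directly. The following hedged sketch shows how an external tool could resolve a <corresp> pair back to an image zone using Python's standard library; it is not the system's own code, and the toy document merely follows the element names given in this section.

```python
# Sketch: resolving a <corresp> pair from the encoding described above.
import xml.etree.ElementTree as ET

doc = ET.fromstring("""
<tei>
  <noteGrp>
    <note id="n1" type="keyword">Troubadour</note>
  </noteGrp>
  <ptrGrp>
    <xptr id="xp1" type="rect" doc="lauz.gif"
          from="space (3d) (0 98 330) (5 640 788)" />
  </ptrGrp>
  <correspGrp>
    <corresp targets="n1 xp1" />
  </correspGrp>
</tei>
""")

# Index every identified element, then read the <corresp> target pairs.
by_id = {e.get("id"): e for e in doc.iter() if e.get("id")}
pairs = [c.get("targets").split() for c in doc.iter("corresp")]

# Resolve the keyword annotation "n1" to its image zone.
for left, right in pairs:
    if left == "n1":
        zone = by_id[right]
        print(zone.get("doc"), zone.get("from"))
```

The same two-step lookup (index by id, then walk the correspondence pairs) works for <link> elements as well, since both mechanisms rely on the shared id space.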

2.3 Linking XML and GUI Elements

A key point in our system implementation is that all data is stored in a single tree structure. Each node integrates the necessary data for storing the hypertext part of a digital annotation, the corresponding coordinates in the bitmap image, and the corresponding GUI components for displaying it on the screen. This scheme allows for complex interactions between the textual and image representations. This design feature is made possible by the use of the Ubit GUI toolkit (Lecolinet, 1999). This toolkit, which was also developed at our institute, is based on the concept of “basic bricks”: lightweight modular objects that can easily be combined together. This design offers several advantages. First, basic bricks can easily be derived into a large variety of customized objects that fit various specific tasks. For example, a brick can implement a specific object such as an XML element or attribute. This flexibility makes it possible to build user interface objects that really match the needs of the application domain. In addition, an interesting characteristic of lightweight components is that a very large number of objects can be created simultaneously while using a reasonable amount of memory. This makes it possible to develop elegant and fully object-oriented implementations without sacrificing performance. In contrast, the classical widget-based architecture adopted by most GUI toolkits does not allow for such flexibility and imposes limitations on the number of object instances that can reasonably be used.

We took advantage of the specific features of the Ubit toolkit in the following way. Instead of dealing with separate internal representations for the XML tree and the set of UI objects, we merged them into a single graph. Each XML node is thus a Ubit brick that implements both the appropriate XML and graphical functions. This representation is well suited to building interactive editors, as each tagged object appearing on the screen corresponds to an actual (and unique) object in the internal representation. The textual representation displayed in our system (Figure 2, part C) can be seen as a projection of the internal XML/Ubit tree that is automatically managed (and updated on the screen) by the Ubit toolkit. This design also provides a reverse correspondence from the visual representation to the internal data. This means that the correspondence between the cursor location in the text and the XML nodes is permanently known. Thus, moving the cursor on the screen is conceptually equivalent to moving a pointer in the XML/Ubit tree. The user interacts directly with a visual representation that is virtually equivalent to the internal data.

The XML/Ubit nodes also include text/image coupling data. These nodes thus integrate three different kinds of data in a single representation:
• the XML characteristics of the tag, its attributes, and its textual content,
• the graphical features for appropriate rendition of the tagged text in the transcription and XML views,
• the spatial coordinates of this text in the bitmap image.
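The merged node described above might be sketched as follows. The class and field names are ours, not those of the Ubit toolkit; the point is only to show the three kinds of data coexisting in one node.

```python
# Illustrative sketch of the merged XML/Ubit node; names are ours, not Ubit's.
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class XmlUbitNode:
    # 1. XML characteristics: tag, attributes, textual content
    tag: str
    attrs: dict = field(default_factory=dict)
    text: str = ""
    children: list = field(default_factory=list)
    # 2. Graphical features: a shared style object, in the spirit of style sheets
    style: Optional[object] = None
    # 3. Spatial coordinates of the text in the bitmap image (closed polygon)
    zone: list = field(default_factory=list)

# A paragraph node holding one transcription line, as in the Section 2.2 example.
root = XmlUbitNode(tag="p", attrs={"id": "p1"})
root.children.append(XmlUbitNode(tag="lb", attrs={"id": "l1"},
                                 text="Bernartz deuentedorn",
                                 zone=[(0, 98), (330, 98), (330, 640), (0, 640)]))
```

Keeping the style in a separate shared object (rather than copying graphical attributes into every node) mirrors the "style objects" discussed below.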

This tight correspondence between XML-tagged nodes and their graphical representations may seem surprising at first glance, as XML encoding tends to separate the structure and semantics of the text from its graphical rendition. First, it is important to recall that this tight correspondence only exists in the internal representation of the data, in order to improve data management and user interaction; it does not have any impact on external (XML) representations. Second, another interesting point is that the graphical characteristics for text rendition are not usually stored in Ubit nodes but in separate “style objects”. Style objects, another feature of the toolkit, make it possible to parameterize the graphical appearance of GUI objects in a way that is coherent with hypertext style sheets. This feature also facilitates the translation between the XML and UI worlds, thanks to a double level of correspondence:
• the XML tag / Ubit brick correspondence for text representation,
• the XML style sheet / Ubit style object correspondence for text rendering.

3 Interactive Exploration

The previous section focused on the annotation and editing of image-based hypermedia. This second part describes information visualization techniques for manipulating and retrieving this data in a more comprehensive way. Two different cases are considered here. The first case addresses the navigation problem, where the user has to find appropriate data in a hyperspace by performing visual exploration. The second case addresses the “bookmarking” problem, where the user has to find again data that was already seen at a previous stage.

Different techniques, which are presented in the next subsections, were developed for dealing with each case. These techniques were initially implemented for handling literary information spaces including manuscript documents and Web pages, as described previously.

3.1 Finding Information

Users are often disoriented by the complexity of hypermedia networks. One way to reduce users’ disorientation is to provide structural cues (such as structured overviews) (Rouet, 1996). We developed a navigator, ZoomTree (Robert, 1998), which creates structured presentations of the information space and displays them in hierarchical focus+context views. ZoomTree takes advantage of hierarchical structures, which are easier to display and understand than arbitrary graphs. Moreover, people are used to using these structures (for example, tables of contents and file system explorers).

ZoomTree shows multiple hierarchical local views that focus on different parts of the hyperdocument. A local view shows a small group of related hypertext pages, which is represented as a tree of windows in the workspace (Figure 5). Texts, images and other elements of the hyperdocument are displayed in windows that can be moved or rescaled interactively by direct manipulation.

Once the root document of the tree has been chosen by the user, ZoomTree generates the hierarchical structure by following hyperlinks (each document being referenced only once). The depth of the generated hierarchy can be adjusted by the user.
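This kind of hierarchy construction can be sketched as a breadth-first walk of the hyperlink graph with a visited set (so each document is referenced only once) and a depth limit. This is a hedged sketch, not ZoomTree's actual code; get_links stands in for the application's link extraction.

```python
# Sketch of hierarchy generation by link-following, as described above.
from collections import deque

def build_tree(root, get_links, max_depth):
    """Return {document: [children]} for the generated hierarchy."""
    tree = {root: []}
    visited = {root}
    queue = deque([(root, 0)])
    while queue:
        doc, depth = queue.popleft()
        if depth >= max_depth:      # user-adjustable depth limit
            continue
        for target in get_links(doc):
            if target not in visited:   # reference each document only once
                visited.add(target)
                tree[doc].append(target)
                tree[target] = []
                queue.append((target, depth + 1))
    return tree

# Toy link structure standing in for a small Web site.
links = {"home": ["news", "docs"], "news": ["home", "archive"],
         "docs": ["api"], "archive": [], "api": []}
tree = build_tree("home", lambda d: links.get(d, []), max_depth=2)
```

Note how the back-link from "news" to "home" is ignored: the visited set is what turns an arbitrary hyperlink graph into the tree that ZoomTree can display.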

Figure 5: ZoomTree, browsing the hyperspace with hierarchical local focus+context views.

This kind of visual representation offers several advantages. First, users can access information in context. A focused document can be visualized in detail while other documents, which are displayed at a smaller size, constitute its local context. This representation helps the user understand the place of the focused document in the hyperbase and its relationships with other data (context understanding generally enhances document interpretation).

ZoomTree also makes it possible to produce multi-focus representations (Figure 5), so that several documents can be visualized simultaneously at various scales. Multi-focus views seem to help users compare documents and associate ideas. Besides, this representation also provides visual information (the related pages) which helps the user make navigation choices. This feature should decrease the number of erroneous navigation choices that are usually observed when users interact with traditional Web browsers (such as clicking the “back” button many times).

Other systems like PadPrints (Hightower, 1998) and WebTOC (Nation, 1997) also use hierarchical local views to support navigation. PadPrints implements a thumbnail-based Web history mechanism that helps users remember which pages they have already browsed and which navigation paths they followed. A major difference with ZoomTree is that PadPrints focuses on representing the navigation history rather than on giving hints about possible navigation paths from the current location. WebTOC generates a hierarchical table of contents with additional graphical statistics for navigating through a Web site. Users easily get an overview of the content of the documents but have no visual representation of this content (thumbnail images, etc.). Besides, the ability to compare documents (that can be distant in terms of hyperlinks) in order to associate ideas (through multi-focus representations) does not seem to be taken into account in these studies.

Fisheye views (Furnas, 1986) are often used to define multi-focus representations (Shipman, 1999). The method consists in expanding the amount of space devoted to the “objects of interest” while decreasing the space given to the remaining objects, thus providing both focus and context. The visualization workspace acts as a constraining area, all the objects remaining visible. However, from a user perspective, multi-focus fisheye views are unfamiliar. They introduce spatial distortions which can be difficult to understand. Moreover, the interactions needed to define distortion parameters are often awkward. Pad++ (Bederson, 1996), a pan/zoom navigation interface, supports multiple foci through multiple windows called portals. Since it does not perform automatic portal management, the user has to create, delete and arrange portals manually. It also seems difficult to understand the relationships between portal views and thus to compare them. ZoomTree supports multi-focus views through a coherent and unified zooming interaction scheme.

Interaction. Our interaction model is based on a direct zooming paradigm that lets users handle document windows in a quick and straightforward way. Users should not be disturbed by complicated and time-consuming interactions, so that they can focus their attention on the information retrieval task. Direct zooming operations are used both to customize the representation of the documents (by rescaling them appropriately) and to organize the global visualization space.

The user can define her main focus points and keep explicit control over how much context is viewed by rescaling windows interactively. Unlike in usual Web browsers, window resizing does not change the spatial layout of the contained pages but only rescales their content, preserving the relative size ratio of their components. Hence, users are not disturbed by frequent changes of the document layout, and this consistency helps them memorize and recognize documents even when they are shown at a very small size.

This model unifies two kinds of operations in one interaction. Zooming is used to manage both the level of detail of the documents and the spatial layout of the document windows. Zooming is considered here as a local operation, applied to the elements of the representation that the user wants to focus on. This scheme differs from global zooming systems such as Pad++ (Bederson, 1996) or Zomit (Pook, 2000), where zooming serves to move through the information space (and to explore it progressively). Such systems transform the whole representation displayed on the screen in a global way. Graphical objects located near the zooming focus point get bigger (or are semantically replaced by alternate representations) while objects located near the borders move out of the visualization scope. This interaction scheme is quite interesting for navigating in large information spaces, but some authors object that inexperienced users may get disoriented and that it may be difficult to know where to zoom, as most objects do not appear in the initial view.

Rescaling windows individually would quickly become a tedious task when many documents are displayed simultaneously. ZoomTree thus provides zooming techniques for handling groups of document windows in order to change the spatial organization of the display in an efficient way.

Hierarchical zooming makes it possible to resize a hierarchy of windows in a single interaction. In this mode, all children of the selected window are simultaneously rescaled while preserving their relative aspect ratio. This makes it possible to magnify a group of related windows in one action in order to get an overview of the corresponding documents.

Contextual zooming can be used for distributing display space between a focus area and the other elements. Ascendant and descendant windows that are linked to the selected window are proportionally resized according to their distance (in terms of links) from the selected window. Conversely, the other document windows (which are not linked to the selection) are decreased in size.
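One simple way to realize this distance-proportional policy is a geometric decay of the scale factor with link distance. The sketch below is illustrative only: the decay factor and the scale given to unrelated windows are parameters we made up, not values from the actual system.

```python
# Hedged sketch of the contextual zooming policy described above.
def contextual_scales(link_distance, decay=0.75, unrelated=0.5):
    """Map each window to a scale factor.

    link_distance maps a window to its distance (in links) from the
    selected window along ascendant/descendant chains, or None if the
    window is not linked to the selection.
    """
    scales = {}
    for window, dist in link_distance.items():
        if dist is None:
            scales[window] = unrelated      # unrelated windows are shrunk
        else:
            scales[window] = decay ** dist  # selection (dist 0) keeps full size
    return scales

scales = contextual_scales({"focus": 0, "child": 1, "grandchild": 2, "other": None})
```

The selected window keeps scale 1.0, related windows shrink gradually with link distance, and unrelated windows drop to the fixed minimum.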

Zooming operations automatically update the layout of the window trees, the windows being repositioned or resized. An animation process is automatically performed when the transformations are too complex. It allows the user to track the transformations, and thus reduces the cognitive load required to understand the new spatial layout.

3.2 Finding Information Again

It is often difficult to retrieve documents that have been previously visualized in a hypermedia space. The solution that we discuss below concerns graphical information workspaces (Card, 1991). This concept refers to environments in which documents of interest can be compared, manipulated, and stored for future use. Iconic desktops and Web favorites or bookmarks are typical examples of information workspaces.

2D desktops (Windows and Mac OS) use a spatial layout of icons for representing documents. Accessing a document can be done by pointing at the corresponding icon. However, it is often difficult to remember pertinent information about documents. Another problem concerns the limited number of icons that can be displayed simultaneously. The Treemap, which is used for overviews of document collections, is another form of 2D spatial layout (Johnson, 1991). It is a space-filling layout that is generated automatically. However, automatic layout strategies may limit the user’s ability to recognize and understand spatial relationships (in 2D and 3D). In contrast, 3D systems like Workscape (Ballay, 1994) or the Web Forager (Card, 1996) let users lay out the documents themselves in order to improve later retrieval. The use of 3D techniques may make interaction difficult for inexperienced users. The Data Mountain (Robertson, 1998) system allows users to interactively place thumbnail images of Web pages on an inclined plane using a 2D interaction technique. Perspective is used to increase the maximum number of documents that can be displayed, but the system is limited to one visualization layer. Moreover, mechanisms for managing and indexing groups of documents do not seem to be available.

The workspace that we developed consists of a hierarchical information structure. Each node of the hierarchy is a visualization layer on which documents can be stored and grouped, as shown in Figure 6. These surfaces can be seen as visual bookmarks.

Figure 6: Defining a logical organization of the documents. Icon A references a group of documents (and another layer in the hierarchy).

The documents are represented by thumbnail images which can be placed anywhere (and at any size) on a layer of the workspace. These images can then be moved at any time with a traditional mouse drag technique. Zooming can also be performed on thumbnail images in a similar way. This capability makes it possible to adapt the level of detail of a document to the user's current interest (the larger an object is, the more visually prominent it becomes).

As the layout of the documents is defined manually, it can evolve over time and fit the user's needs. Classification schemes can be established between documents (for example, they can be placed so as to form categories). During later explorations, users take advantage of their spatial and visual memory to retrieve documents. They do not have to rebuild a mental map of the information space (where are the interesting documents? how can they be accessed? etc.), which reduces the associated cognitive load. Our work relies on other studies (Darken, 1996; Robertson, 1998) showing that the use of human spatial memory (i.e. the ability to remember where you put something) and of visual images helps users retrieve information.

As the user can create and manage a hierarchy of 2D layers, the documents are not confined to a single visualization space. Each element of the hierarchy can be seen as a folder dedicated to a specific task, a given subject of interest, or a particular document space. The user is thus provided with two levels of organization: the hierarchical level and the spatial level. Each visualization space is linked to a thumbnail image displayed on the surface of its parent, and clicking on such an icon opens the associated layer. This provides users with an efficient way of indexing groups of documents. For example, icon A in Figure 6 represents a visualization layer (shown in Figure 7) where similar manuscript documents have previously been stored. Finally, the user can also browse the hierarchy with a classical graphical file explorer, as shown in Figure 7.
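The two-level organization (hierarchical folders of layers, free spatial placement of thumbnails within a layer) could be modeled as in the following sketch. All class and field names here are illustrative assumptions, not the paper's actual implementation.

```python
class Thumbnail:
    """A document thumbnail placed freely on a layer."""
    def __init__(self, title, x, y, scale=1.0, opens_layer=None):
        self.title = title
        self.x, self.y = x, y           # free spatial position on the layer
        self.scale = scale              # zoom level chosen by the user
        self.opens_layer = opens_layer  # child layer if this icon is a folder

class Layer:
    """One visualization surface, i.e. a node of the workspace hierarchy."""
    def __init__(self, name):
        self.name = name
        self.thumbnails = []

    def add_document(self, title, x, y, scale=1.0):
        """Place a document thumbnail anywhere on this surface."""
        self.thumbnails.append(Thumbnail(title, x, y, scale))

    def add_sublayer(self, title, x, y):
        """Create a child layer, represented by an icon on this surface."""
        child = Layer(title)
        self.thumbnails.append(Thumbnail(title, x, y, opens_layer=child))
        return child

def open_icon(thumb):
    """Clicking an icon opens either its child layer or the document itself."""
    return thumb.opens_layer if thumb.opens_layer else thumb.title
```

In this sketch, the icon-A behavior of Figure 6 corresponds to a `Thumbnail` whose `opens_layer` field points at a child `Layer` holding a group of related documents.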

Figure 7: Another node of our hierarchical workspace. The "plane explorer" is displayed on the left.

4 Conclusion

We have presented interaction schemes for accessing digitized facsimiles and other kinds of documents in large information spaces. A distinguishing point of our work is that it attempts to integrate and unify techniques that are generally developed in separate research communities (hypertext, structured documents and information visualization).

First, we propose an annotation scheme that links digitized facsimiles with digital XML annotations. Text/image coupling provides indirect access by content to the original bitmap images, and automatic processing can then take place on the associated XML data. The second part presents interaction techniques for manipulating and browsing such documents. The proposed environment contains two workspaces: the first provides contextual navigation through the information space; the second is dedicated to the management and the retrieval of documents that have previously been seen.

Future work will focus on pen-based interaction in order to experiment with digital annotation techniques on specialized hardware such as graphic tablets and e-books. As part of this research, we have also worked on semi-automatic procedures for encoding digital handwritten annotations using both word recognition and interactive mechanisms. Finally, we will experiment with the proposed visualization techniques to evaluate their usability and their actual contribution to the navigation task.

References

Ballay, J.M. Designing Workscape: An Interdisciplinary Experience. ACM Conf. on Human Computer Interaction (CHI'94), pp. 10-15.

Bederson, B., Hollan, J.D., Perlin, K., Meyer, J., Bacon, D. & Furnas, G. Pad++: a Zoomable Graphical Sketchpad for Exploring Alternate Interface Physics. Journal of Visual Languages and Computing, 7:3-31, 1996.

Card, S.K., Robertson, G. & Mackinlay, J.D. The Information Visualizer, an Information Workspace. ACM Conf. on Computer Human Interaction (CHI'91), pp. 181-188.

Card, S.K., Robertson, G. & York, W. The WebBook and the Web Forager: An Information Workspace for the World Wide Web. ACM Conf. on Computer Human Interaction (CHI'96), pp. 111-117.

Darken, R. & Sibert, J.L. A Toolset for Navigating in Virtual Environments. Int. Journal of Human-Computer Interaction, 8, 49-72, 1996.

Editors & Staff of the William Blake Archive. The Persistence of Vision: Images and Imaging at the William Blake Archive. RLG DigiNews 4.1, 2000.

Furnas, G. Generalized Fisheye Views. ACM Conf. on Computer Human Interaction (CHI'86), pp. 16-23.

Hightower, R.R., Ring, L.T., Helfman, J.I., Bederson, B.B. & Hollan, J.D. Graphical Multiscale Web Histories: a Study of PadPrints. ACM Conf. on Hypertext and Hypermedia (HT'98), pp. 58-65.

Johnson, B. & Shneiderman, B. Space-filling Approach to the Visualization of Hierarchical Information Structures. IEEE Visualization'91, pp. 284-291.

Lecolinet, E., Role, F., Robert, L., Likforman-Sulem, L. & Lebrave, J-L. An Integrated Reading and Editing Environment for Scholarly Research on Literary Works and their Handwritten Sources. ACM Conf. on Digital Libraries (DL'98), pp. 144-151.

Lecolinet, E. A Brick Construction Game Model for Creating Graphical User Interfaces: the Ubit Toolkit. IFIP Conf. on Human-Computer Interaction (INTERACT'99), pp. 510-518.

Nation, D.A., Plaisant, C., Marchionini, G. & Komlodi, A. Visualizing Websites Using a Hierarchical Table of Contents Browser: WebTOC. Conf. on Human Factors and the Web, 1997.

Nielsen, J. Hypertext and Hypermedia. Academic Press, 1990.

Pook, S., Lecolinet, E., Vaysseix, G. & Barillot, E. Context and Interaction in Zoomable User Interfaces. ACM Conf. on Advanced Visual Interfaces (AVI 2000), pp. 227-231.

Robert, L. & Lecolinet, E. Browsing Hyperdocuments with Multiple Focus+Context Views. ACM Conf. on Hypertext and Hypermedia (HT'98), pp. 293-294.

Robertson, G., Czerwinski, M., Larson, K., Robbins, D.C., Thiel, D. & van Dantzich, M. Data Mountain: Using Spatial Memory for Document Management. ACM Symp. on User Interface Software and Technology (UIST'98), pp. 153-162.

Rouet, J. & Lovonen, J. Studying and Learning with Hypertext: Empirical Studies and their Implications. In Hypertext and Cognition. Lawrence Erlbaum, New Jersey, 1996.

Shipman, F.M., Marshall, C. & LeMere, F. Beyond Location: Hypertext Workspaces and Non-Linear Views. ACM Conf. on Hypertext and Hypermedia (HT'99), pp. 121-130.

Sperberg-McQueen, C.M. & Burnard, L. Guidelines for Electronic Text Encoding and Interchange (TEI P3). ACH-ACL-ALLC Text Encoding Initiative, 1994.

W3C (World Wide Web Consortium). XML 1.0, W3C Recommendation. http://www.w3.org/TR/2000/REC-xml-20001006.