Table of Contents
DocEng 2012 Symposium Organization viii
DocEng 2012 Sponsors & Supporters x
Keynote Address I
Session Chair: Peter King (University of Manitoba)
• Document and Archive: Editing the Past 1
Bruno Bachimont (Université de Technologie de Compiègne)
Session 1: Layout and Presentation Generation
Session Chair: David Brailsford (University of Nottingham)
• Ad Insertion in Automatically Composed Documents 3
Niranjan Damera-Venkata (Hewlett-Packard Laboratories), Jose Bento (Stanford University)
• Optimal Guillotine Layout 13
Graeme Gange (University of Melbourne), Kim Marriott (Monash University), Peter Stuckey (University of Melbourne)
• ALMcss: A Javascript Implementation of the CSS Template Layout Module 23
Cesar Acebal (Universidad de Oviedo), Bert Bos (W3C/ERCIM), Maria Rodriguez, Juan Manuel Cueva (Universidad de Oviedo)
• Learning How to Trade Off Aesthetic Criteria in Layout 33
Peter Moulder, Kim Marriott (Monash University)
Session 2: Document Analysis
Session Chair: Simone Marinai (University of Florence)
• Challenges in Generating Bookmarks from TOC Entries in e-Books 37
Chandrashekar Ramanathan, Yogalakshmi Jayabal, Mehul Sheth (International Institute of Information Technology)
• A Section Title Authoring Tool for Clinical Guidelines 41
Mark Truran (Teesside University), Gersende Georg (Haute Autorité de Santé), Marc Cavazza (Teesside University), Dong Zhou (Hunan University of Science and Technology)
• A Methodology for Evaluating Algorithms for Table Understanding in PDF Documents 45
Max Göbel, Tamir Hassan (Technische Universität Wien), Ermelinda Oro (Università della Calabria), Giorgio Orsi (University of Oxford)
Session 3: Multimedia and Hypermedia
Session Chair: Cyril Concolato (Télécom ParisTech)
• Interactive Non-Linear Video: Definition and XML Structure 49
Britta Meixner, Harald Kosch (University of Passau)
• Just-In-Time Personalized Video Presentations 59
Jack Jansen, Pablo Cesar (Centrum Wiskunde & Informatica), Rodrigo Laiola Guimaraes, Dick C. A. Bulterman (Centrum Wiskunde & Informatica & VU University)
• TAL Processor for Hypermedia Applications 69
Carlos de Salles Soares Neto, Hedvan Fernandes Pinto (UFMA), Luiz Fernando G. Soares (PUC-Rio)
• Advene as a Tailorable Hypervideo Authoring Tool: A Case Study 79
Olivier Aubert, Yannick Prie (Université de Lyon, CNRS & Université Lyon I, LIRIS), Daniel Schmitt (Université de Strasbourg)
Keynote Address II
Session Chair: Patrick Schmitz (University of California, Berkeley)
• Content and Document Based Approach for Digital Productivity Applications 83
Thierry Delprat (Nuxeo)
Session 4: XML and Related Tools
Session Chair: Matthew Hardy (Adobe Systems Inc.)
• A First Approach to the Automatic Recognition of Structural Patterns in XML Documents 85
Angelo Di Iorio, Silvio Peroni, Francesco Poggi, Fabio Vitali (University of Bologna)
• XML Query-Update Independence Analysis Revisited 95
Muhammad Junedi (Inria/LIG), Pierre Geneves (CNRS & Inria/LIG), Nabil Layaïda (Inria/LIG)
• Structure-Conforming XML Document Transformation Based on Graph Homomorphism 99
Tyng-Ruey Chuang (Academia Sinica), Hui-Yin Wu (National Chengchi University)
• Toward Automated Schema-Directed Code Revision 103
Raquel Oliveira (Inria/LIG), Pierre Geneves (CNRS and Inria/LIG), Nabil Layaïda (Inria/LIG)
Session 5: OCR and Visual Analysis
Session Chair: Laurence Likforman-Sulem (Télécom ParisTech)
• Effective Radical Segmentation of Offline Handwritten Chinese Characters Towards Constructing Personal Handwritten Fonts 107
Zhanghui Chen (Chinese Academy of Sciences), Baoyao Zhou (EMC Labs China)
• Structural and Visual Comparisons for Web Page Archiving 117
Marc Teva Lawo, Nicolas Thome, Stephane Gancarski, Matthieu Cord (Sorbonne University)
• Receipts2Go: The Big World of Small Documents 121
Bill Janssen, Eric Saund, Eric Bier (Palo Alto Research Center), Patricia Wall, Mary Ann Sprague (Xerox Research Center Webster)
• Displaying Chemical Structural Formulae in ePub Format 125
Simone Marinai, Stefano Quiriconi (Università di Firenze)
• Logical Segmentation for Article Extraction in Digitized Old Newspapers 129
Thomas Palfray, David Hebert, Stephane Nicolas, Pierrick Tranouez, Thierry Paquet (University of Rouen)
• Scientific Table Type Classification in Digital Library 133
Seongchan Kim, Keejun Han (Korea Advanced Institute of Science and Technology), Soon Young Kim (KISTI), Ying Liu (Korea Advanced Institute of Science and Technology)
• Document Understanding of Graphical Content in Natively Digital PDF Documents 137
Aysylu Gabdulkhakova (Ufa State Aviation Technical University), Tamir Hassan (Technische Universität Wien)
Session 6: Demonstrations and Posters
Session Chair: Kim Marriott (Monash University)
• HP Relate - A Customer Communication System for the SMB Market 141
Steve Pruitt, Anthony Wiley (Hewlett-Packard & Exstream R&D)
• Structured and Fragmented Content in Collaborative XML Publishing Chains 145
Stephane Crozat (Université de Technologie de Compiègne)
• Typesetting Multiple Interacting Streams 149
Blanca Mancilla, Jarryd P. Beck, John Plaice (The University of New South Wales)
• An Inheritance Model for Documents in Web Applications with Sydonie 153
Jean-Marc Lecarpentier, Pierre-Yves Buard, Hervé Le Crosnier, Romain Brixtel (Université de Caen)
• 500 Year Documentation 157
Francis T. Marchese, Maninder Pal Kaur Shergill (Pace University)
Session 7: Search and Sensemaking
Session Chair: Charles Nicholas (University of Maryland, Baltimore County)
• Personalized Document Clustering with Dual Supervision 161
Yeming Hu, Evangelos E. Milios, James Blustein, Shali Liu (Dalhousie University)
• The Glozz Platform: A Corpus Annotation and Mining Tool 171
Antoine Widlöcher, Yann Mathet (Université de Caen)
• Sift: An End-User Tool for Gathering Web Content on the Go 181
Matthias Geel, Timothy Church, Moira C. Norrie (ETH Zurich)
• Faceted Documents: Describing Document Characteristics Using Semantic Lenses 191
Silvio Peroni (University of Bologna), David Shotton (University of Oxford), Fabio Vitali (University of Bologna)
Session 8: Digital Humanities
Session Chair: Patrick Schmitz (University of California, Berkeley)
• A Framework for Retrieval and Annotation in Digital Humanities Using XQuery Full Text and Update in BaseX 195
Cerstin Mahlow (University of Basel), Christian Grün, Alexander Holupirek, Marc H. Scholl (University of Konstanz)
• DocExplore: Overcoming Cultural and Physical Barriers to Access Ancient Documents 205
Pierrick Tranouez, Stephane Nicolas, Vladislavs Dovgalecs, Alexandre Burnett, Laurent Heutte (University of Rouen), Yiqing Liang, Richard Guest, Michael Fairhurst (University of Kent)
• Evaluation of BILBO Reference Parsing in Digital Humanities via a Comparison of Different Tools 209
Young-Min Kim (University of Avignon), Patrice Bellot (Aix-Marseille University), Jade Tavernier (University of Avignon), Elodie Faath, Marin Dacos (Aix-Marseille University)
• Glyph Spotting for Mediaeval Handwritings by Template Matching 213
Jan-Hendrik Worch (University of Bremen), Mathias Lawo (Berlin-Brandenburgische Akademie der Wissenschaften), Bjorn Gottfried (University of Bremen)
Session 9: Architecture and Document Management
Session Chair: Anthony Wiley (Hewlett-Packard)
• Architecture for Hypermedia Dynamic Applications with Content and Behavior Constraints 217
Luiz Fernando G. Soares (PUC-Rio), Carlos de Salles Soares Neto (UFMA), Jose Geraldo de Sousa (PUC-Rio)
• Full-Text Search on Multi-Byte Encoded Documents 227
Raymond K. Wong (University of New South Wales & National ICT Australia), Fengming Shi, Nicole Lam (University of New South Wales)
• Deriving Document Workflows from Feature Models 237
Ma Carmen Penades, Abel Gomez, José H. Canos (Universitat Politècnica de València)
• Charactles: More than Characters 241
Blanca Mancilla, John Plaice (The University of New South Wales)
Author Index 245