incremental updates in structured documents

152

Upload: aalto-fi

Post on 03-Apr-2023

1 views

Category:

Documents


0 download

TRANSCRIPT

Incremental updates in

structured documents

Greger Lind�en

Licentiate ThesisFebruary �� ����Department of Computer ScienceUniversity of Helsinki

ii

Acknowledgements

I am deeply grateful to my advisor Professor Heikki Mannila for his expertguidance and unfailing support during the course of this work� I also wish tothank Associate Professor Jukka Paakki for his many valuable comments onmy work� I thank Professor Martti Tienari� head of the Department of Com�puter Science� and my colleagues at the department� for providing excellentworking conditions� The �nancial support of the Academy of Finland and ofthe Technology Development Centre TEKES is greatly appreciated�

Finally� I wish to thank all who have enriched my life during studies andresearch�

iii

iv

Contents

� Introduction ���� Related work � � � � � � � � � � � � � � � � � � � � � � � � � � � ���� Outline of the thesis � � � � � � � � � � � � � � � � � � � � � � � �

� Lazy and incremental methods ���� Incrementality � � � � � � � � � � � � � � � � � � � � � � � � � � � ���� Laziness � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � ���� Discussion � � � � � � � � � � � � � � � � � � � � � � � � � � � � � �

� Structured text and the HST system �� �� Grammars for structure description � � � � � � � � � � � � � � � �� �� Views of documents � � � � � � � � � � � � � � � � � � � � � � � � �� � The document preparation model of the HST system � � � � � �� �� Lazy transformation � � � � � � � � � � � � � � � � � � � � � � � �� �� Incremental updates � � � � � � � � � � � � � � � � � � � � � � � �� �� Combining incrementality with laziness � � � � � � � � � � � � � ��

� Grammars� parsing and tree distances ����� Context�free grammars � � � � � � � � � � � � � � � � � � � � � � ����� Parse trees � � � � � � � � � � � � � � � � � � � � � � � � � � � � � ���� Some further de�nitions and notations � � � � � � � � � � � � � ����� Recursive descent parsing � � � � � � � � � � � � � � � � � � � � ���� The LL� conditions � � � � � � � � � � � � � � � � � � � � � � � ���� Grammars with variable strings � � � � � � � � � � � � � � � � � ��� The LL� condition for textable grammars � � � � � � � � � � � ���� Syntax�directed translations � � � � � � � � � � � � � � � � � � � ���� Edit distances between trees � � � � � � � � � � � � � � � � � � � ��

v

� Incremental parsing ����� Related work � � � � � � � � � � � � � � � � � � � � � � � � � � � ����� Incremental Parsing � � � � � � � � � � � � � � � � � � � � � � � � ���� Data structure support � � � � � � � � � � � � � � � � � � � � � � ����� Start position � � � � � � � � � � � � � � � � � � � � � � � � � � � ����� Termination conditions � � � � � � � � � � � � � � � � � � � � � � ����� Incremental parsing� the method � � � � � � � � � � � � � � � � ����� Iterations � � � � � � � � � � � � � � � � � � � � � � � � � � � � � ����� Error recovery � � � � � � � � � � � � � � � � � � � � � � � � � � � ����� Complexity � � � � � � � � � � � � � � � � � � � � � � � � � � � � ������ Improving the incremental parser � � � � � � � � � � � � � � � � ������ An example of incremental parsing � � � � � � � � � � � � � � � ��

� Incremental syntaxdirected translation ���� Incremental translations � � � � � � � � � � � � � � � � � � � � � ����� Data structure support � � � � � � � � � � � � � � � � � � � � � � ���� Termination conditions � � � � � � � � � � � � � � � � � � � � � � ����� Iterations � � � � � � � � � � � � � � � � � � � � � � � � � � � � � ����� A naive algorithm � � � � � � � � � � � � � � � � � � � � � � � � � � ��� An enhanced algorithm � � � � � � � � � � � � � � � � � � � � � � ����� An example of incremental tree translation � � � � � � � � � � � ��

� Lazy syntaxdirected translation ������ A lazy view grammar � � � � � � � � � � � � � � � � � � � � � � � �� ��� Iterations � � � � � � � � � � � � � � � � � � � � � � � � � � � � � ����� Continuous transformation � � � � � � � � � � � � � � � � � � � � ������ Combining lazy translation and incremental updates � � � � � ������ Examples of lazy transformation � � � � � � � � � � � � � � � � � ���

The HST approach and implementation ������ The HST system � � � � � � � � � � � � � � � � � � � � � � � � � ������ The incremental parser � � � � � � � � � � � � � � � � � � � � � � ����� Test results � � � � � � � � � � � � � � � � � � � � � � � � � � � � ������ The lazy translator � � � � � � � � � � � � � � � � � � � � � � � � ������ Incremental translation � � � � � � � � � � � � � � � � � � � � � � ������ Discussion � � � � � � � � � � � � � � � � � � � � � � � � � � � � � ������ Future work � � � � � � � � � � � � � � � � � � � � � � � � � � � � ���

vi

� Conclusion ���

References ���

Index ���

vii

List of Figures

��� Document preparation model of HST� � � � � � � � � � � � � � � �

�� A document tree� � � � � � � � � � � � � � � � � � � � � � � � � � �� �� An annotated parse tree� � � � � � � � � � � � � � � � � � � � � � �� � Di�erent representations of a document in the HST� � � � � � � �� �� A document with only a part of the logical representation

transformed� � � � � � � � � � � � � � � � � � � � � � � � � � � � � �� �� An incremental update in a document with several views� � � � � �� A lazily transformed document and its update� � � � � � � � � � ��

��� A tree and its translation� � � � � � � � � � � � � � � � � � � � � ����� Two labeled trees with minimum edit distance �� � � � � � � � � �� Two labeled trees with ordered nodes� � � � � � � � � � � � � � ��

��� A scenario for incremental parsing� � � � � � � � � � � � � � � � ����� A parse tree and its corresponding token list� � � � � � � � � � � ���� A parse tree with with one token not synchronized with the

input token list� � � � � � � � � � � � � � � � � � � � � � � � � � � ����� A parse tree and two modi�cations� � � � � � � � � � � � � � � � � ��� A parse tree and its modi�cation� � � � � � � � � � � � � � � � � ����� The incremental parser� main parse functions � � � � � � � � � ����� An instance of a parse tree and the corresponding editor tokens� ����� A parse tree and modi�ed editor tokens� � � � � � � � � � � � � � ��� A partly updated parse tree and corresponding editor tokens� � ������ An updated parse tree and corresponding editor tokens� � � � � ��

��� A scenario for incremental syntax�directed translation� � � � � ����� Two corresponding trees of a translation form� � � � � � � � � � ��

viii

�� A change of production during retranslation� � � � � � � � � � � ����� A parse tree of an input string� � � � � � � � � � � � � � � � � � ����� The output string of a translation as a tree� � � � � � � � � � � ������ A parse tree of a modi�ed input string� � � � � � � � � � � � � � ������ A parse tree of a retranslated output string� � � � � � � � � � � ���

��� A parse tree and its lazy view� � � � � � � � � � � � � � � � � � � ������ A lazy view of an iteration� � � � � � � � � � � � � � � � � � � � ����� An subtree of a logical tree� � � � � � � � � � � � � � � � � � � � ������ A totally computed view subtree� � � � � � � � � � � � � � � � � ������ A lazily computed view tree� phase one� � � � � � � � � � � � � � �� ��� A lazily computed view subtree� phase two� � � � � � � � � � � � ���

��� An overview of the user interface of the HST system� � � � � � ������ The two parts of HST� � � � � � � � � � � � � � � � � � � � � � � ���

ix

List of Algorithms

Top�down parsing of a nonterminal � � � � � � � � � � � � � � � � � � �Syntax�directed translation of a tree � � � � � � � � � � � � � � � � � �Preorder transformation � � � � � � � � � � � � � � � � � � � � � � � � ��Parse transformation � � � � � � � � � � � � � � � � � � � � � � � � � � �

Incremental update traverse � � � � � � � � � � � � � � � � � � � � � � ��Incremental top�down parsing of nonterminal � � � � � � � � � � � � ��Incremental top�down parsing of constant string � � � � � � � � � � � ��Incremental top�down parsing of variable string � � � � � � � � � � � � Incremental top�down parsing of empty string � � � � � � � � � � � � ��

Naive incremental syntax�directed translation � � � � � � � � � � � � ��Enhanced incremental syntax�directed translation � � � � � � � � � � ��

Lazy translation � � � � � � � � � � � � � � � � � � � � � � � � � � � � � ���Lazy incremental document preparation � � � � � � � � � � � � � � � ���

x

Chapter �

Introduction

Document preparation has been an increasingly important application ofcomputers for over thirty years� The �rst systems were only extensions toprogram editing tools� low level formatters used to construct mainly lines ofequal length� and to produce justi�ed pages� Today we see a large numberof di�erent systems� processing� among other things� multimedia� structureddocuments� and hypertext�

What is actually a document� Most dictionaries de�ne it as an o�cialrecord� e�g�� the Collins Cobuild English Language Dictionary de�nes a docu�ment as �one ore more pieces of paper with writing on them which provide ano�cial record or o�cial evidence about something� �Sin���� We shall use theterm in a somewhat broader sense� We see a document as a paper� memo�note� computer program� or any written material electronic or on paperthat can be read or modi�ed� A document may also be circularly de�ned asthe product of a document preparation system �AFQ��a��

Document preparation systems can be classi�ed into batch�oriented sys�tems and interactive systems� Batch�oriented systems such as TEX �Knu����LaTEX �Lam��� and Scribe �Rei��� transform an input �le of text and markupcommands into a formatted output �le� This process takes place without theinteraction of the user� and the e�ect of a local change can be seen only byreformatting the entire document� Interactive systems� on the other hand�like Grif �QV��� and Microsoft�s Word �Mic��� permit the author to viewand edit the formatted text directly� The e�ect of changes in the documentare immediately shown� These systems are often called WYSIWYG Whatyou see is what you get� Interactive systems combine the editor and the for�

matter into a single system� so that users can interact with both componentswithout changing environments �CKS�����

Document preparation systems can also be classi�ed according to theirprocedurality� This classi�cation is based on the level of the commands bywhich the user communicates with the system� In a procedural system� theuser controls formatting by low level commands� such as �skip two lines��or �use Roman style�� The formatter follows the instructions by the userwithout any understanding of the reason behind the commands� On theother hand� a declarative system operates on a higher level� The user markshis document with tags that identify parts of the document such as footnotes�sections� section titles etc� Each tag is then interpreted by the system whichproduces the correct format for the tagged object� Examples of declarativesystems are Scribe �Rei��� and LaTEX �Lam���� For surveys on documentpreparation systems� see �FSS��� And��� Fur����

Recent document preparation systems provide several views of a docu�ment �Bro��� QV��� CH���� Special research e�ort has been put into two�view editors that show a textual declarative view and a formatted viewof a document simultaneously� The idea is that the user can edit any view�changes are propagated to the other view immediately� Propagation demandsthat processing is fast and e�cient� One approach of achieving this is to useincremental updates� An incremental update performs minimal update workafter a modi�cation�

Maintaining several views of a document usually relies on the structureof the document� Some systems give the user the opportunity to de�ne ageneral structure of the document� The simplest document model representsa document as a sequence of characters� A more advanced model decomposesthe document into logical parts� In a structured document the logical partsof the document are organized hierarchically� For example� a structureddocument can consist of a title and several sections� The sections can consistof subsections or paragraphs� Examples of structured documents are reports�dictionaries� and manuals�

In this thesis� we present incremental techniques for updating struc�tured documents� The Helsinki Structured Text Database System HST�KLMN��� is an environment for creating� modifying and querying struc�tured documents� In a structured document� there are usually logical partsthat we like to distinguish from formatting details� In HST� the user cande�ne the hierarchical structure of a generic document� We call this de�ni�

tion the logical de�nition of a generic document� Documents or instancesover this generic de�nition are called logical documents� The logical de�ni�tion de�nes the logical parts of a document such as title� list of sections� andbibliography� but it does not deal with formatting details such as newlines�blank spaces� and fonts�

In HST� a document over the generic de�nition can be presented in di�er�ent formatted versions called views� For every di�erent formatted version ofa document there is a separate view de�nition� A view de�nition correspondsclosely to a logical de�nition� Most logical parts of a document given in alogical de�nition are usually enclosed also in the view de�nition� But a viewde�nition can remove some logical parts� and reorder other parts� E�g�� aview de�nition can state that a view of a document should contain sectiontitles only�

In order to present a logical document in some formatted version� wede�ne a transformation from the logical document to a view� For everydi�erent view of the document� we need a separate transformation� We alsode�ne the inverse transformation� With an inverse transformation we canextract the logical document from a view�

When the user creates a document� he follows a certain view de�nition�On demand� the HST system extracts the logical structure by an inversetransformation� If the user wants to see di�erent formatted versions of thedocument� he de�nes additional views� i�e�� transformations� The system au�tomatically transforms the logical document and presents these views� Thiswe call the document preparation model of HST Figure ����

Updates in one view are propagated �rst to the corresponding logicaldocument and then to other views� The propagation of a modi�cation in aview of a document can take considerable time� First the modi�ed logicalstructure must be extracted from the view� This computation reconstructsthe entire logical document� The next step is to update other views bytransforming the logical document and reconstruct the views�

In this thesis� we study modi�cations on the document and the problemof how to e�ciently update the di�erent representations due to these mod�i�cations� The suggested solution is based on incremental algorithms� Anincremental algorithm updates an old representation instead of reconstruct�ing the entire representation� Updating due to a modi�cation in a view startswith incrementally extracting the logical structure� i�e�� we update the oldlogical document� If other views of the document exist� they must also be

� �� �

��

transform

transform

load

save

read

write

logicaldocument

view textual

document

user

Figure ���� Document preparation model of HST�

updated� Therefore� the logical document is incrementally transformed toother views� The principal idea is to minimize the amount of update workand thereby make the system more e�cient and interactive�

We also study lazy techniques for producing a part of a view or a lazyview of a document� A lazy view shows only a part of a formatted document�There are several reasons for using a lazy view� First� the user can only workon a part of a document view at a time� Therefore it su�ces to transform apart of a logical document� Second� transforming a large document entirelyis time consuming� Lazy transformation produces a part of a view only ondemand� For example� when the user wants to see a document� only the�rst part of it is transformed into a view� The amount that is transformedcan depend on the size of the editor where the view is presented� or on thesize of the window in which the view is shown� The principal idea here isto postpone as much work as possible until the user actually demands atransformation�

It should be noted that the HST system is not a formatter producinghigh quality paper output� The system is mainly used for producing di�erentviews of a document on the screen� It can well be used as a preprocessor toLaTEX �Lam��� or some other formatting system� The user can de�ne LaTEXviews of a document and prepare documents to be printed by LaTEX�

It should also be noted that the logical de�nition of a document is given

as a context�free grammar� Preparing a document will therefore include bothparsing and syntax�directed translation� Consequently� incremental updateswill include incremental parsing and incremental syntax�directed translation�We present these concepts in Sections � � and ��

��� Related work

As mentioned� there exist several di�erent sorts of document preparationsystems� We shall here give a short description of systems related to theHST system �KLMN���� These systems usually present to the user two ormore views of a document�

The Sam system �Tri��� was one of the �rst two�view editors for graphi�cal pictures� It combines graphics and a layout language� the user can edita picture in two views� The program view displays a textual description ofgraphical objects in a declarative layout language� The layout view is a pic�ture of the graphical objects themselves� The two views are connected byan internal representation� a parse tree� over the layout language� A syntax�directed editor in the program view prevents the user from introducing in�consistencies� Only modi�cations consistent with the syntax of the languageare allowed� All modi�cations are re�ected in the internal structure incre�mentally� but the incremental updates are greatly simpli�ed thanks to therestricted modi�cations in the program view�

Janus �CKS���� CBG���� was the �rst two�view text processing system�It provides the user with two views of a document on two di�erent screens�The markup view shows the document marked with SGML tags �Bar���� Theformatted view shows a page of the formatted document� Only the markupview is editable� except for some simple layout editing in the formatted view�Modi�cations in the markup view are updated by reformatting the corre�sponding output view� Modi�cations can have far�reaching e�ects due toforward and backward references and therefore the reformatting of the entiredocument is ine�cient� Because reformatting is not done incrementally� theuser can instead edit a document in di�erent modes� In fast mode� the data inboth views is locally consistent� but there might be errors in� e�g�� pagination�Safe mode ensures that everything is correct except for forward references�This can be viewed as the best single�pass formatting� Perfect mode ensuresthat both views are consistent� In general� this can be achieved only by

multipass formatting� When the internal structure changes both views areupdated�

Juno �Nel��� integrates a language describing graphical pictures with aWYSIWYG image editor� In the textual view� pictures are described withprocedures using a declarative language based on points and constraints� Inthe graphical view the user can create� select� and move points in addition toexpressing various constraints on the points� The textual view is not updatedon every operation� but new procedures are added when the user so requests�

VORTEX �CCH��� Che��� CH��� CHM��� is a TEX�based �Knu��� doc�ument preparation system� Both source and target representations of TEXdocuments are maintained and shown� Changes to the source representa�tion are automatically propagated to the target representation� The internalstructure of the document is represented as a tree� combining both the sourcerepresentation and the target representation and some auxiliary information�The leaves of the subtrees in the source representation form the actual con�tents of the source �le� The target representation has TEX boxes as leaves�Corresponding nodes in the source and target representations are linked�When the source is modi�ed� the corresponding changes are made in the tar�get representation incrementally� The two views are redisplayed after eachmodi�cation�

The Grif system �QV��� QVB��a� QVB��b� FQA��� is an interactive sys�tem for editing structured documents� It is a structure�oriented editor whichguides the user in accordance with the structure of the document� The usercan de�ne new document structures� The presentation of a document is de��ned as a presentation schema� A document can have several presentationschemas� de�ning di�erent views of the document� The default view showsthe entire document� Other views are� e�g�� a table of contents and the listof all formulas de�ned in the document� Grif is not a formatter even if thepresented views are almost WYSIWYG� Instead� Grif can be used as a pre�processor to TEX and LaTEX �QVB��a�� A presentation schema de�nes someconversion rules that transform a document into TEX or LaTEX� Especiallythe structure is here taken into account� The transformation cannot onlyremove certain parts from a document� it can also reorder elements in thedocument� The internal structure of Grif is a tree where the logical docu�ment is saved� A modi�cation in a view propagates through the tree� Grifknows what constraints exist between the modi�ed part and its neighbors�and updates can be done incrementally �QV����

Tweedle �Ase��� lets the user edit graphical objects using a procedurallanguage called Dum� The internal data is represented as a parse tree� Whenthe user makes changes to the textual representation� Tweedle must decidehow to change the graphical picture� Reparsing is done from scratch� andthe old and the new parse tree are compared to determine where the changesoccur in the graphical picture� This technique is claimed to work acceptablywell in practice� The textual representation is reexecuted to form the newpicture� Often reexecution means execution from scratch� but in some casesit can be incremental�

Lilac �Bro��� Bro��� o�ers both WYSIWYG editing and language�baseddescription side by side� First� the user de�nes the logical structure of thedocument� Then he can edit a document in either the language descriptionview or the WYSIWYG view� Modi�cations in one view are immediatelyre�ected in the other view� An incremental interpreter keeps the two viewsup to date�

Quill �CHL���� CHP��� Cha��� Lun��� Cha��� is a WYSIWYG systemdesigned to support full integration of various sorts of graphics editing to�gether with text editing� The logical document is described with SGML tags�Bar���� The document is organized as a hierarchy of elements of mixedtypes� The system is organized as a collection of cooperating editors� onefor each type of material in the document� The underlying data structureis a document tree� The logical document is incrementally updated when amodi�cation is made in the WYSIWYG document� The rest of the WYSI�WYG document is reformatted in the background while the user continuesto modify the current view�

A related area is incremental program generation consisting of incremen�tal parsing and incremental compilation �Rei��� Fri���� We review somerelated work later in Section � on incrementality and laziness in general andin Section � on incremental parsing�

��� Outline of the thesis

This thesis is organized as follows� In Section � we present lazy and incre�mental techniques and give their connection to document preparation� InSection we give an introduction to structured texts and the HST system�KLMN���� which is an example of a document preparation system that han�

dles structured texts� We also informally study the goals of this thesis� InSection � we present some concepts that we need for explaining the incre�mental techniques� We give an overview of grammars and parsing� and followup with de�ning syntax�directed translations and tree�to�tree corrections� InSections � and � we study two cases of incremental tree updates in partic�ular� incremental parsing and incremental syntax�directed translation� InSection � we de�ne a model of lazy transformation� Section � discusses someimplementation details of the HST system and how incrementality and lazi�ness can be merged into the existing system� Finally� we give a summary ofthe thesis in Section ��

Chapter �

Lazy and incremental methods

Traditionally� computable problems have been solved through what couldbe called a one�shot paradigm� An algorithm computes a complete solutionto a problem instance each time it is activated� However� an incrementalparadigm is a more natural characteristic for a number of problems� Anincremental algorithm updates an old solution instead of recomputing thesolution entirely� Moreover� some problems are instances of a lazy paradigmwhere a computation is not made until it is absolutely needed� In the follow�ing we give a brief overview of incrementality and laziness and refer to someresearch in the area�

��� Incrementality

In the one�shot paradigm� the solution of a modi�ed problem instance iscompletely recomputed� Many problems� however� are incremental in theirnature� An incremental algorithm updates an existing solution in response toa change in the problem instance� Typical applications include program de�velopment parsing� compiling� text processing formatting� window man�agement� and maintenance of consistent information spreadsheets� There isa distinction between incremental analysis and on�line analysis� The formerstudies real time as a function of the size of a change in the underlying datastructures� whereas the latter studies real time as a function of the size ofthe data structures themselves and the number of requests made �AHR�����

Assume that an incremental algorithm has solved a problem instance

P and denote by S the solution to P � Assume P is modi�ed to P � bymodi�cation �P � The task of the incremental algorithm is to produce �S�an update to the old solution S�

Batch�mode computations usually study the complexity of algorithmsas a function of the size of the entire input� The sum of the sizes of thechanges in the problem instance and in the solution is a better parameterwhen evaluating the time complexity of an incremental algorithm� Denoteby changed the sum j�P j� j�Sj� Note that changed only characterizesthe amount of work absolutely necessary to perform a given incrementalproblem� It does not take into account updating costs of various internaldata structures used by the particular incremental algorithm� changed isalso not known a priori� when the update process begins� only j�P j is known�RR����

An incremental algorithm is bounded if the computation of the solutionupdate takes time dependent only on changed� and not on the size of theentire input� Otherwise� an incremental algorithm is unbounded� A problemis said to be incremental if it has a bounded algorithm� Boundedness is notthe only relevant criterion in the study of incremental computation� and infact relatively few bounded incremental algorithms are known �RR����

Some examples of research on incrementality are the following� Rama�lingam and Reps �RR��� have studied incrementality in general and theyorder known incremental algorithms into a complexity hierarchy� They em�phasize that the sum of the sizes of the changes in the input and in theoutput is a better parameter than the size of the entire input when evalu�ating complexities� Berman et al� �BPR��� present a general method forproving lower bounds of incremental algorithms� As mentioned� there areseveral application areas for incremental algorithms� Incremental text pro�cessing� as in incremental syntax�directed editing �Rep��� RT��� and incre�mental formatting �CHM���� has received some attention� A closely relatedarea is incremental program generation parsing and compiling �Fri��� Fri����incremental semantic analysis �Hed���� and incremental program execution�KW��� Bha��� HM���� Incremental attribute evaluation has been stud�ied both per se �DRT��� Rep��� Kap��� Alb��� and as part of applications�Hud��� HK���� Finally� incremental algorithms have been applied on userinterfaces �Hol��� Hol��� and data �ow analysis �Ryd� � Zad����

��

Example � We shall give an example of incremental update in a constraintsystem� Assume that we have the following equations

a � � d � c� �b � � e � d� �c � a� b f � b� �

The system consists of six equations� Dependence between equations is im�portant� In this system we say that an equation eq depends on an equationeq� if the left hand side of equation eq� is a part of the right hand side of eq�The dependencies can be shown in the following way�

a � � d � c � �� �

c � a� b

� �b � � e � c� �

�f � b� �

In the �gure� a dependency is denoted by an expression eq� � eq with themeaning that equation eq depends on equation eq� � The system is consistentwhen every variable on the left hand side of an equation has the value com�puted by the right hand side of the same equation� This system is consistentwhen we have the following values for the variables� a � �� b � �� c � �d � �� e � �� and f � ��

Assume that we change the value of variable a to �� A reevaluation fromscratch would reevaluate all variables� including b and f even if they do notdepend on a� An incremental evaluation would only reevaluate variables thatare changed� i�e�� the variables c� d� and e� The values that would make thesystem consistent are a � �� b � �� c � �� d � �� e � �� and f � ��

Now� assume that we swap the values of a and b in the original system�i�e�� we set a � � and b � �� A reevaluation from scratch would againreevaluate all variables� An incremental evaluator would reevaluate f andc� After reevaluating c� the evaluator would notice that the value of c doesnot change and therefore reevaluation of variables depending only on theequation c � a � b would be unnecessary� The values that would make thesystem consistent are a � �� b � �� c � � d � �� e � �� and f � �� �

��

��� Laziness

Laziness is a concept related to incrementality� Traditionally� the one�shotparadigm computes the entire solution of a problem instance� Again� manyproblems bene�t from a lazy paradigm� A lazy algorithm postpones the com�putation as much as possible� There can be several reasons not to performa computation completely� Some part of the computation may be totallyirrelevant or not needed immediately� or some results are not visible to theuser and therefore redundant� Applications include user interface manage�ment� database management� and maintenance of consistent information asin a constraint restriction system�

A lazy algorithm solves only a part of a problem� Let the problem instanceP consist of some possibly overlapping instance parts� A lazy algorithmsolves a sub�problem by giving a part of the solution� There is a fundamentaldistinction between incrementality and laziness� The incremental algorithmupdates the complete solution after a modi�cation in the problem instance�the lazy algorithm solves di�erent parts of the problem instance� The lazyalgorithm must also be told which part of the instance to solve�

Lazy methods are used� e�g�� in program generation �HKR��� HKR����Traditional program generators usually require that a program must be gen�erated in its entirety before it can be used� If generation time is scarce� itmay be better to generate only those parts that are indispensable for process�ing some particular data� A related application is lazy generation of scannersand parsers �HKr��� HKR��� Kos���� A lazy scanner postpones the construc�tion of parts of an automaton for recognizing certain textual patterns untilthe parts are needed� Similarly� a lazy parser generates a parse table duringparsing� but only those parts that are needed to parse the input sentence athand�

User interfaces are another application area �Hud��� HK��� Hud���� Vari�ous components of the user interface are produced only when they are directlyor indirectly observed by the user� The UIMS user interface managementsystem �HK��� supports direct manipulation� It provides semantic feedbackwith a graphical representation and automatic screen update� The underly�ing data model is an attributed graph which is incrementally updated whena change occurs in the user interface� The update consists of two phases� amark phase that traverses dependencies in the graph and marks nodes outof date� and a reevaluation phase that only evaluates nodes that are out of

��

date and important� A node is important if it has an immediate e�ect on thesystem� Only nodes that need to be reevaluated are evaluated� Hudson alsosuggests a way of having di�erent views of the data in an attributed graph�Hud��� Hud���� For each node there is an attribute V which computes theview node� If a subview is missing� i�e�� it is not visible� it is replaced bydots� Views are constructed �rst top down what is visible and then bottomup subviews� The user interface is updated lazily� i�e�� only attributes thata�ect what the user currently sees are updated�

Example � Let us study a small example of lazy update in the previousconstraint system of Example �� We have the following dependencies�

a � � d � c � �� �

c � a� b

� �b � � e � c� �

�f � b� �

Assume that the user is only interested in the variable e� There could beseveral reasons for this� For example� the variable e could be the only onewhose value is currently shown to the user� or it could be the only one thata�ects some system that the user is working with�

Assume that we change the value of the variable a to �� A reevaluationfrom scratch would reevaluate all variables� An incremental evaluation wouldonly reevaluate the variables c� d� and e� A lazy evaluation would reevaluateonly variables c and e� The value of d would be inconsistent with its function�d would be updated later when the user wants the value of d� The values ofthe variables after a lazy reevaluation would be� a � �� b � �� c � � � d � ��e � �� f � �� �

��� Discussion

We saw in Section � that there exists a lot of document preparation sys�tems that use incremental updates� Incrementality in document preparationhas been considered both important and necessary� Especially in interactive

systems the response time is of greatest importance� Shneiderman �Shn���argues that a response time that is too fast can produce stress and highererror rates� He thinks that the response time should be kept within a two sec�ond range� This is of course highly dependent on the application� We do notexpect to reach this range even with incremental updates in our documentpreparation system HST� there should be no danger of too fast response� Onthe contrary� any speed up in the response time is needed�

Laziness� on the other hand� can further improve the response time�� Wehave not seen laziness used with document preparation but expect it to bewell suited for the area�

Incrementality and laziness are orthogonal concepts� we can have onewithout the other� The obvious approach� however� is to combine them�If only some parts of a problem instance are computed� an update algo�rithm needs to concentrate only on these parts� In Section we see howthese concepts are combined� In order to enhance the e�ciency of the doc�ument preparation system HST we apply lazy and incremental methods onthe transformations between di�erent representations of a document� Themethods are divided into two parts� lazy transformation or transformationof a part of the logical document to a view� and incremental transformationsbetween the document representations�

�It would seem that laziness would not make anything faster� However� we urge thereader to remember that we use the term laziness in a slightly di�erent manner than indaily life�

��

Chapter �

Structured text and the HST

system

Text with structure is quite common� dictionaries� reference manuals andannual reports are typical examples� In recent years� research on systemsfor writing structured documents has been very intensive� Recent surveysof the �eld are �AFQ��a� AFQ��b� Qui��� Sal���� The interest in the areahas led to the creation of several document standards� e�g�� ODA and SGML�Hor��� HKN��� Jol��� Bar��� Bro��� App���� In this section we give anexample of structured document preparation and also outline the goals ofthe thesis�

��� Grammars for structure description

One way to describe the structure of a document is to use context�free gram�mars �BR��� CIV��� QV��� GT��� FQA��� KP���� Thus� in database termi�nology� grammars correspond to schemas� and parse trees to instances�

The Helsinki Structured Text Database System HST is an environ�ment for writing� editing and querying structured documents �KLMN���KLM���a�� The document model of HST is based on the work of Gonnetand Tompa �GT���� extended with some further enhancements �KLM���b��

In HST we describe the document structure with a context�free grammar�a document grammar� and we de�ne views of the document by annotatingthis grammar� An annotated grammar� a view grammar� de�nes a view

��

of the document� It thereby also de�nes a transformation from the logicalrepresentation to a document view� We can de�ne several view grammarsfor a document�

A document tree is a parse tree over a document grammar� A documenttree describes the structure of a document� Additionally it contains thetext of the document in its leaves� A view tree is a parse tree over a viewgrammar� The view tree describes the structure of a view� The textual viewof a document instance consists of the terminal leaves� i�e�� the frontier� of aview tree�

When the user writes and modi�es a document� the HST system main�tains three representations of it� the document tree� at least one view tree�and the plain text in an editor�

Example � Assume that the user wants to establish a collection of bibli�ographic references� He describes the structure of the document by Gram�mar �� For simplicity� we use a very short grammar� See Section � forde�nitions of grammars and parse trees�

Grammar �� List � Publications� Publications � Publication�

Publication � Author Title Journal Year� Author � Text� Title � Text� Journal � Text� Year � �

� Year � Text

Note that the grammar de�nes only the logical structure� no formattingdetails� e�g�� newlines or blank spaces� are present here� A parse tree overthis grammar is shown in Figure ���

In the following we denote nonterminals with an emphasized string be�ginning with a capital letter� e�g�� Journal� A� or Listed Publications� Termi�nals are denoted by boldface strings� e�g�� end� � or Murder�Police� Forclarity� we can enclose terminals in single quotation marks� e�g�� ���� ���� or

��

List

Publicationshhhhhh

������� � � � � �Publication

��������������HHHH

XXXXXXXAuthor Title Journal Year

AText Text Text Text

Fletcher Murder Police ����

Figure ��� A document tree�

�Murder�on�the�rocks�� Note that any symbols can be part of a terminal�especially also blank spaces and newlines� The empty string is denoted by �

and a blank space by ��

��� Views of documents

In the HST system� the user de�nes views by annotating the document gram�mar� An annotated grammar or view grammar contains a modi�ed produc�tion for each production of the document grammar� The modi�ed productioncan omit some of the nonterminals of the original production� reorder the re�maining nonterminals and insert terminal strings�

Example � Assume that we have the logical document in Example andthat the user wants to produce some views of it� Assume that he wants asimple view of his references� Then he can produce a representation of thefollowing form�

Fletcher Murder� Police� �����nn

��

Here nn is the newline character� When adding new references� a represen�tation such as

Author FletcherTitle MurderJournal PoliceYear ����

could be more useful�Suppose the user has edited the list of references and wants to produce

input for the LaTEX document preparation system �Lam���� Then he wouldlike to have a representation such as

nitem Fletcher Murder� fnem Policeg� �����

Grammar � is an annotation of Grammar �� A parse tree over this gram�mar is a view of the document instance� The productions have been num�bered to show the correspondence with the productions in Grammar ��

Grammar �� List � Publications� Publications � Publication�

Publication � Author � �� Title ����Journal Year ��nn�

� Author � Text� Title � Text� Journal � Text� Year � �

� Year � ���� Text

Terminal symbols are surrounded by simple quotation marks� and the emptystring is denoted by �� The symbol �nn� stands for the newline character� Aparse tree according to this annotated grammar is shown in Figure ���

��

List

Publicationshhhhhh

������� � � � � �Publication

�����������

��������

���HHHH

XXXXXXX

hhhhhhhhhhhhAuthor Title Journal Year

A��Text Text Text Text

Fletcher Murder � Police � ���� �nn

Figure ��� An annotated parse tree�

��� The document preparation model of theHST system

The document representations are processed in di�erent ways Figure � �The textual view is parsed to form a view tree� which is transformed toform the document tree� A document tree can be transformed into severalview trees� The frontier of a view tree always forms a textual view of thedocument� so the view tree is �attened and the catenation of the strings inthe leaves is presented as a textual view� The transformations are furtherdescribed in �Nik���� For e�ciency reasons� a transformation is always doneonly by demand when a view is saved� The user is given the possibility tocomplete a modi�cation before other representations are updated� Automatictransformations� even if sometimes desirable� would easily produce a greatamount of intermediate errors�

The textual view of a document is presented in an editor� It must benoted that the HST system is not a syntax�directed editor� Instead� the userhas the freedom to write what he wants� On demand� the text is saved andparsed� Like in program generation� the text must be syntactically correct orthe parser will notify errors� We will take a closer look at the parsing processin Section ��

HST document preparation can be compared with program generation�

��

������

AAAAAA

������

AAAAAA

� �� �

transform �atten

transform parse

logical document

tree databaseformatted

documenttree view

textual view

in editor

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

Figure � � Di�erent representations of a document in the HST�

When we produce a program� we develop di�erent views or representationsof it� e�g�� the source code� the object code and the executable code� Westart by writing the source code with an editor� Thereafter this source codeis parsed and compiled to produce the object code which is linked to producethe executable program�

Many document preparation systems follow a similar scenario� E�g�� aLaTEX document �Lam��� is �rst produced as LaTEX code with an editor�Thereafter we have two phases� the LaTEX code is formatted into a deviceindependent form dvi� which furthermore is transformed into Postscriptcode �Ado��a� Ado��b�� If we reverse the document �ow in the LaTEX system�we could� e�g�� update the Postscript version of a LaTEX document in orderto get modi�cations in the LaTEX source�

An incremental document �ow in the HST system comprises incrementalparsing and incremental transformation� An additional phase is incremen�tal update of the editor� which we do not consider here� Closely connectedis lazy transformation which only transforms a part of the document� Theincrementalization of the document �ow can be compared with incremen�tal compilation �Rei��� Fri��� where the program transformations are madeincrementally� Incremental compilation consists of incremental analysis� in�cremental parsing� incremental code generation and incremental linking� Fur�thermore� we can have incremental execution� where execution is continued

��

������

AAAAAA

������

� �

lazilytransform �atten

logical

treeviewtree

editor

AA

AA

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

Figure ��� A document with only a part of the logical representation trans�formed�

after a program has been recompiled�

��� Lazy transformation

When working with large documents� the transformation of the entire logicaldocument to a view can take considerable time� On the other hand� we aremainly interested in what we can see in the editor� This suggests postponinga part of the transformation� According to lazy transformation only a partof the logical document is transformed into a view� The part is determinedby a predicate telling where to start the transformation in the documenttree and how much to transform� For example� we could try to minimizethe transformation and transform only the amount of the document that �tsinto one editor window�

Assume that the editor contains the �rst part of the document Fig�ure ��� If the user wants to move forward in the document� the next partof the logical document must be transformed and loaded into the editor� Atthe same time some parts of the document might have to be removed fromthe editor window�

The swapping of transformed document parts must be taken into accountwhenever the user performs an editing operation that moves through the text�

��

e�g�� an operation that moves to the end or beginning of the text� moves onepage backwards or forwards or searches the entire text for a certain string�

We assume that the document tree contains the entire document� butthat the view tree can be partly empty depending on what parts of thelogical document have been transformed� Before a new part is loaded intothe editor� the old part must be saved so that possible modi�cations to thetext are saved� If the text is read�only� the part can easily be erased as theunderlying logical document remains unmodi�ed in the database�

��� Incremental updates

Assume that a user is working on a document� He has opened two or moreviews of it and he is making modi�cations in one view� The HST systemupdates the document and thereby the other views in order to make themconsistent with the modi�cations� When working with large documents ormany views� these updates considerably slow down the work� What we needhere is an incremental update� where as little work as possible is done tomake the system consistent� Instead of scanning the entire document� onlythe modi�ed portion is processed and the modi�cations are mapped to otherviews�

The changes are propagated through the di�erent representations� Whenthe user makes a modi�cation in one textual view� the text is parsed and theview tree is transformed to form a new document tree� The document treeis transformed for every other open view to form new view trees� and thetextual views are updated�

Example � Assume that the user has opened three views of a documentand that he is currently making a modi�cation to the �rst view Figure ���

When he saves the �rst view� the modi�cation must be propagated to thedi�erent representations of the document� The incremental update starts byparsing the modi�cation and updating the �rst view tree Figure ��� parsea� The modi�ed part of the view tree is inverted back to the document tree�updating the old tree Figure ��� transformation b� After that the modi��cations in the document tree are transformed to update the other view treesFigure ��� transformations c and d� The update ends with substitutingthe corresponding text in the editor windows Figure ��� substitutions e andf� All modi�cations are made incrementally�

��

document tree

view tree � view tree � view tree

view � view � view

������

AAAAAA

������

AAAAAA

������

AAAAAA

������

AAAAAA

��AA

��AA

��AA

��AA

SSSSSSSSw

a e f

b c d

p p p p p

p p p p p

p p p p p

p p p p p

p p p p p

p p p p p

p p p p p p

p p p p p p

p p p p p p

p p p p p

p p p p

p p p

p p

p

p p p p p

p p p p

p p p

p p

p

p p p p p

p p p p

p p p

p p

p

p p p p p

p p p p

p p p

p p

p

Figure ��� An incremental update in a document with several views�

A simple modi�cation modi�es only one leaf in the view tree� It can bea change� insertion� or deletion� A modi�cation to the textual view consistsof a simple modi�cation or a series of simple modi�cations� Usually� sim�ple modi�cations are easier to update� but we place no restrictions on themodi�cations made� Depending on the view document grammar� a simplemodi�cation can a�ect the entire view document tree�

Several modi�cations can be considered as a series of simple modi�cationsand processed as a series of incremental updates� i�e�� the incremental updatealgorithm is called for every simple modi�cation� Several modi�cations can

also be considered as one large modi�cation and the incremental updatealgorithm is called only once� Sometimes two modi�cations to a text mightcancel the e�ects on the view tree� sometimes two modi�cations only a�ectthe view tree in the close vicinity of the corresponding leaves�

��� Combining incrementality with laziness

Laziness and incrementality combine very well� We give an example of howto bene�t from the advantages of both facilities�

Assume that the user has opened a lazily transformed document� Assumealso that the space of the editor is limited� Before transforming and loadingmore parts of the document� the current part must be parsed and transformedback to the document tree�

������

AAAAAA

������

� �� �

lazilytransform �atten

incre�mentallytransform

incrementallyparse

logical

treeviewtree

editor

AA

AA

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

p

Figure ��� A lazily transformed document and its update�

If the user has made some modi�cations to the text� the relevant partof the document is incrementally parsed and the view tree is updated� Themodi�cations are incrementally transformed to the document tree and otherpossible views are incrementally updated Figure ��� Subsequent partsof the document can be updated in the same way� Lazy transformationis applied in one direction from the logical representation to the textualrepresentation� whereas the incremental update is applied �rst towards thelogical document and then towards other views�

��

Some questions remain open� For example� we must decide what thegranularity of the lazy transformation is� A natural size of the portion tobe processed is� of course� a subtree� But subtrees can be of various sizes�Implementation plans and technical problems are discussed in Section ��

��

Chapter �

Grammars� parsing and tree

distances

Many types of structural information can be de�ned by context�free gram�mars� We use context�free grammars to de�ne the structure of documents�The text of a document is parsed to form a parse tree over a certain grammar�Here we use the recursive descent parsing method� Parsed documents can betransformed into other parse trees for producing di�erent views� These trans�formations are achieved by syntax�directed translations� In this section wegive some general de�nitions on context�free grammars� parsing� and syntax�directed translations� We also present the tree�to�tree correction problemand de�ne some di�erent ways to update a tree� These concepts will beused later in the development of our incremental parser and our incrementalsyntax�directed translator�

��� Context�free grammars

We start with some general de�nitions� A rewriting system is a pair R �V� P � where V is an alphabet and P is a �nite set of ordered pairs of stringsover V � The elements of P are referred to as rewriting rules or productionsand denoted � � �� where �� � � V ��

Given a rewriting system the yield relation � on the set V � is de�nedas follows� For any strings � and �� the relation � � � holds if and only ifthere are strings ��� ��� �� �� such that � � ����� and � � ����� and � � �

��

is a production in the system� We also say that � derives �� The re�exivetransitive closure of the relation � is denoted by ��� Thus ��� � holds ifand only if there are n � � strings ��� � � � � �n such that � � ��� � � �n and�i � �i�� holds for every i � �� � � � � n � �� Correspondingly� the transitiveclosure of the relation � is denoted by ��� and � �� � holds if and onlyif there are n � � strings ��� � � � � �n such that � � ��� � � �n and �i � �i��holds for every i � �� � � � � n � �� We say that �� derives � in zero or moresteps� if ��� � and that �� derives � in one or more steps� if ��� ��

A grammar is a quadruple G � N�T� P� S� where N �T� P is a rewrit�ing system� and S � N � We call N the nonterminal alphabet or the set ofnonterminals� The set of tokens T is called the terminal alphabet or the setof terminals� The symbol S is the start symbol of G�

Grammars are classi�ed according to the Chomsky hierarchy into fourgroups �Cho��� Sal���� It can be shown that the hierarchy is a strictly de�creasing hierarchy of language families� In the �rst group� we set no restric�tions on the productions�

In the second group� the productions are of type �A� � ���� where �and � are arbitrary strings over V � � is a nonempty string over V and A isa nonterminal symbol� Also productions of the type S � � are possible � isthe empty string� but only if S does not occur on the right hand side of anyproduction� These grammars are called context�sensitive�

In the third group� the productions are of type A� �� where A � N and� is an arbitrary string over V � These grammars are called context�free�

In the fourth group� the productions are of type A � aB� where A� B� N and a � T � We can also have productions of the type A � � whereA � N � These grammars are called right�linear or regular�

The language LG generated by a given grammar G is the set fwjS ��

wg of strings of terminal symbols in T �A word on notation� We use emphasized capital letters from the beginning

of the alphabet and strings beginning with capital letters for nonterminals�e�g�� A� B� Journal� Terminals are denoted by boldface strings� e�g�� a� begin� Terminals can be surrounded by simple quotation marks� e�g�� ���� �a��Capital emphasized letters from the end of the alphabet stand for terminalsor nonterminals� e�g�� X� Y� The greek letters �� �� � � � � possibly indexed�denote any string over the alphabet V �

��

��� Parse trees

A parse tree shows pictorially how the start symbol of a grammar derivesa string in the language� Before we de�ne parse trees formally� we take ageneral look at trees�

A tree is a collection of elements called nodes� one of which is distin�guished as a root� along with a relation parenthood that places a hierarchi�cal structure on the nodes� Formally� a tree can be de�ned in the followingmanner� �� A single node is itself a tree� The node is also the root of thetree� �� Suppose n is a node and t�� t�� � � � tk are trees with roots n�� n�� � � � �nk� respectively� We construct a new tree by making n the parent of nodesn�� n�� � � � �nk� In this tree� n is the root and t�� t�� � � � � tk are the subtrees ofthe root� Nodes n�� n�� � � � � nk are called the children of node n �AHU����

Often it is useful to associate a value or label with a node of a tree� Thelabel is not the name of the node� even if we often identify a node by itslabel�

Trees are usually traversed in preorder� postorder� or inorder� In preorder�the root of a subtree is �rst processed� thereafter the children from left toright� In postorder� the children are �rst processed from left to right andthereafter the root� In inorder� we �rst the process the �rst child� then theroot� and thereafter the rest of the children from left to right�

Given a context�free grammar G� a parse tree is a tree with the followingproperties �ASU����

�� The root is labeled by the start symbol�

�� Each leaf is labeled by a terminal or by ��

� Each interior node is labeled by a nonterminal�

�� If A is the nonterminal labeling some interior node m and X�� X�� � � ��Xn are the labels of the children of that node from left to right� then A�X�X� Xn is a production of the grammar G� HereX��X�� � � � �Xn

stand for symbols that are either terminals or nonterminals� As aspecial case� if A� � is a production in P then a node labeled A mayhave a single child labeled ��

We say that the production A � X�X� Xn has been expanded or ap�plied at node m� The leaves of the parse tree read from left to right form the

��

yield or the frontier of the tree� which is the string generated or derived fromthe nonterminal at the root of the parse tree� The grammar G is unambiguousif for each terminal string there is at most one possible parse tree�

Example � We already saw some examples of parse trees� which were doc�ument instances and views� See Examples and � on pages �� and ��� �

When we speak about a symbol in a production rule of a grammar� wetalk about its occurrence� A symbol can occur several times in the right handside of a production� When we expand a production at a node in a parsetree� the label of the node is an instance of a symbol� Let n be a node in aparse tree and let p be the production expanded at n� Then the node n hasas many children as there are symbol occurrences on the right hand of theproduction p� The labels of the children� the symbol instances� correspondto the occurrences on the right hand side of the production�

��� Some further denitions and notations

Let G � N�T� P� S be a context�free grammar� Let V � N � T � We de�nethe following functions on V �

First�X� � fa j a � T andX�

� a� for some � � V �g

Follow�X� � fa j a � T and S�

� �Xa� for some� � V �and� �V �g

Nullable�X� � if X�

� � then true else false

The function First determines the �rst set of the symbol X� This set con�tains all the terminals that X can derive as its �rst symbol� Correspondingly�the Follow function determines the follow set of the symbol X� The followset contains all the terminal symbols that immediately can follow the symbolX in a derivation� The Boolean function Nullable tells if X can derive theempty string� We extend the function Nullable for a sequence of symbols inV � A sequence X�X� Xn is nullable� if every symbol Xi in the sequenceis nullable� i�e�� NullableX�X� Xn if and only if NullableXi for every i�where � i n�

We de�ne on V � the following function�

��

Last��� � if � � � then � elseX where X � V and � � �X and � � V �

The Last function determines the last symbol of ��

In a parsing process we usually want to be able to choose the appropriateproduction by only looking at the next input symbol� Therefore we de�nethe function Dirsym on productions of G �MPS����

DirsymX� � X�X� Xn � fa j a � FirstXi� where � i n and

NullableX�X� Xi�� g �if NullableX�X� Xn then FollowX� else �

TheDirsym function determines the dirsym of a production� The set containsall terminals that can be the �rst symbols derived using this production� Ifall nonterminals on the right hand side of the production are nullable� weinclude the follow set of the left hand side nonterminal X��

Example � Consider the context�free grammar Grammar � Boldface char�acters stand for terminal symbols and capital letters for nonterminals�

Grammar � S � B A C � gA � a b c C � �

A � d e f D � hB � C D D � �

Here we have the following dirsym sets�

Production Dirsym set Production Dirsym setS � BA fg�h�a�dg C � g fggA� abc fag C � � fa�d�hgA� def fdg D � h fhgB � CD fg�h�a�dg D � � fa�dg

It is important that the grammar is unambiguous� Here we see that if twoproductions have the same left hand symbol� their dirsym sets are alwaysdisjoint� The parser will know which production to expand by only lookingat one symbol at a time� The requirement that the dirsym sets are disjointis actually equivalent to the LL� condition to be given in Section ���� �

Algorithm � �Top�down parsing of a nonterminal�

procedure parseXInput� Text to be parsed�

Output� Parsed subtree according to nonterminal X�

Task� Parse input and construct parse tree �top�down��

begin

� Let a be the next input symbol �

� if X is a terminal then

� if X � a then

� Read a�� else

� error�

� else

� Let a be in the dirsym set

Dirsym�X � X� X� Xk�� Expand the production X � X� X� Xk �

for all Xi do

� parseXi�

end�

��� Recursive descent parsing

Parsing is the process of determining if a string of tokens can be generatedby a grammar� Most parsing methods fall into one of two classes� calledtop�down and bottom�up methods� These terms refer to the order in whichnodes in a parse tree are constructed� In the former� construction starts atthe root and proceeds towards the leaves� whereas in the latter� constructionstarts at the leaves and proceeds towards the root�

Recursive�descent parsing is a top�down method of syntax analysis inwhich we execute a set of recursive procedures to process the input� A pro�cedure is associated with each nonterminal of the grammar� Here we considera special form of recursive�descent parsing called predictive parsing� in whichthe lookahead symbol� i�e�� the next symbol in the string being parsed� un�ambiguously determines the procedure selected for each nonterminal� The

sequence of procedures called in processing the input implicitly de�nes aparse tree for the input �ASU����

Each procedure in a predictive parser does two things� First� it decideswhich production to use by looking at the lookahead symbol� The productionA � � is used if the lookahead symbol is in DirsymA � �� If there is acon�ict between two right sides for any lookahead symbol� we cannot use thisparsing method�

Second� it uses a production mimicking the right side� A nonterminalresults in a call to the procedure for the nonterminal� and a token matchingthe lookahead symbol results in the next input symbol being read� If at somepoint the token in the production does not match the lookahead symbol� anerror is emitted� The parse function for symbol X is given in Algorithm ��

The parse procedures can be automatically constructed from a givengrammar� The parsing can also be done in an interpreting way� See �Nik���for further details on parsing structured documents�

��� The LL�� conditions

Before the parser expands a production with a certain right hand side� itchecks if the next input symbol belongs to the dirsym set of the production�Therefore we demand that the dirsym sets of two productions with the sameleft hand side must be disjoint� that is� for each pair of productions A � �

and A � � we have that

DirsymA� � �DirsymA� � � �

This equation is equivalent with the so called LL��� conditions for a grammar�We say that the grammar is LL��� if the following conditions hold for eachpair of productions A � �� A � ��

�� First��First� � ��

�� Only one of � and � can derive the empty string�

� If � �� �� then � does not derive a string beginning with a terminal inFollowA�

These conditions guarantee that the grammar is unambiguous�

��� Grammars with variable strings

In HST we use context�free grammars not only to describe the logical struc�ture of the documents� but also to describe formatting details� We extend thegrammars to include two kinds of terminals� constant strings and variablestrings� The constant strings can consist of any characters in the characterset� Examples of constant strings are �if�� ��� and ����� In the following wedenote constant strings with c and assume that we can substitute this stringfor any character sequence� The constant strings are given in the grammar�The variable strings can contain any character sequence� but we do not knowbefore parsing what the string will be� A variable string is derived fromthe nonterminal Text and its instance depends on the succeeding constantstrings� The Text nonterminal is nullable� We call grammars with variablestrings textable�

Example Consider the grammar

S � A�A�A � Text

where � and � are constant strings� The nonterminal A derives variablestrings� the �rst A in the �rst production derives only strings that do notcontain � as a substring� the second A derives strings that do not contain� as a substring� Therefore an occurrence of the Text nonterminal dependson what constant strings succeed it in a derivation and we can say that thegrammar is context�sensitive if we write it in the form

S � A�A�A � TextText�w� � c��w�

Text� � c��

where c� denotes any string that does not contain the constant string �and c� denotes any string that does not contain the constant string �� Inthe following� however� we consider a context�free grammar extended withtextable productions still to be context�free� �

Parsing is always done according to a view grammar� Therefore it isnot wise to have productions of type A � Text Text c� because the secondoccurrence of the Text nonterminal would always derive an empty string� Ifwe have a delimiter a constant string between the two occurrences of Textnonterminals� the parser knows when to stop parsing the �rst occurrence ofText and when to continue with the second� In a document grammar this sortof production is possible� the document tree is translated from a view tree�where the textual instances of the two Text occurrences are clearly separated�

We de�ne a new Boolean function Textable on V � which tells when asymbol can derive a variable string as its immediate starter�

TextableX � if X�

� Text � for some � � V �

then true else false

We extend the de�nition of the function Textable for sequences of symbolsin V � A sequence X�X� Xn is textable if for some i we have TextableXi and NullableX�X� Xi�� � We say that a production is textable� if itsright hand side is textable�

All terminal symbols are of type string� For textable grammars� the tokenclasses are determined by the grammars� When we talk about tokens in thesequel we mean constant or variable strings�

Parsing is usually preceded by a lexical analysis process� An analyzerreads the input and returns input tokens to a parser� When the tokens arenot context�dependent� the analyzer is implemented as a separate process�This is the case in most programming languages� A program consists ofsome reserved words constant strings and some identi�ers variable strings�The identi�ers can be determined without looking at the grammar� e�g�� anidenti�er might be an alphanumeric string that starts with a letter� When wedeal with context�dependent tokens we cannot separate the analyzer from theparser� Instead the tokens are determined by a grammar� The parser readsthe input� a grammar determines where to stop and start reading the nexttoken� This is implemented by giving a set of �stop strings� as a parameterto the parser� Similar techniques can be found in �Kos��� JKP��� discussingmodular implementation techniques of programming languages�

In the sequel� we consider our grammars to be context�free� For this rea�son we must slightly modify the LL� condition for our textable grammars�

��� The LL�� condition for textable gram�

mars

Here we study what the LL� conditions mean for textable grammars� As�sume that the grammar G contains the productions A � � and A � ��

�� First we demand that First� � First� � ��

Then � and � cannot start with the same terminal� but we have animportant exception� We allow production pairs of type

A � TextA � c

where � � Text and � � c which is a constant string� Here the vari�able string is determined by the constant string� It can contain anysubstrings but it cannot start with the constant string c� Two textableproductions of the same grammar must� however� have di�erent lefthand sides� i�e�� we do not allow production pairs of type A � Text���and A � Text���

�� Only one of � and � can derive the empty string�

The Text nonterminal is nullable� If � �� Text A is textable� thenwe may not have � �� Text or � ��

�� But� e�g�� � �� c Text isacceptable�

� If � ���� then � does not derive a string beginning with a terminal

in FollowA�

Here again we have to take into account that the Text nonterminal isnullable� If we have

S � ABA � TextA � �B � ��

we have a problem parsing the sentence �� because the grammaris ambiguous� The nonterminal A is nullable and can start with aterminal that belongs to its follow set�

When checking the LL� conditions we have to remember that the Text non�terminal is nullable� The only exception that we make is to allow productionpairs of type A� Text and A� c� This is not actually an exception� becausewhen we introduce the production A � c� we give the Text nonterminal anew meaning� Without such a production in the grammar� Text could de�rive all possible strings over the terminal alphabet� and now it can derive allstrings that do not contain the constant string c�

�� Syntax�directed translations

In this section we de�ne the meaning of an annotated grammar formally� Asyntax�directed translation scheme SDTS �AU��� is a ��tuple F � N � ���� R� S� where N is a �nite set of nonterminal symbols� � is a �nite inputalphabet� and � is a �nite output alphabet� R is a �nite set of rules of theform W � ���� where W � N � � � N � ��� and � � N � ��� and thenonterminals of � are a permutation of the nonterminals of �� Finally S is adistinguished nonterminal of N � the start symbol�

Now� suppose the user has described the structure of a document bya context�free grammar G � N�T� P� S and that he has annotated theproductions of P � Then an annotated grammar for G is a translation schemaFG � N�T� T �� P �� S� where N is the common set of nonterminals of G�and T and T � are the sets of terminals of the grammar G and of the annotatedversion� respectively� and P � is the set of productions W� ���� Here W�� is the original production of G and W � � is the annotated version of it�Nik����

Let W� � � � be a rule� To each nonterminal of � there is associated anidentical nonterminal of �� If a nonterminal B appears only once in � and�� then the association is obvious� If there are more than one occurrence ofB in �� we associate the B�s in � with B�s in � in the same order�� The �rst

�We could also use integer superscripts to indicate the association� This associationwould be an intimate part of the rule� For example� in the rule W � B�CB� � B� B�

C� the three positions in B�CB� are associated with positions �� �� and �� respectively� inB�B�C �AU��

B in � is associated with the �rst B in � and so on� This concerns especiallyiterations� In the rule W� B� � B�� the occurrences of B in � are associatedwith the occurrences of B in � in the same order�

A translation form of a syntax directed translation schema F is de�nedas follows �AU����

�� S� S is a translation form and the two S�s are said to be associated�

�� If �A����A� � is a translation form� in which the two explicit oc�currences of A are associated� and if A � �� �� is a rule in R� then���� ������ is a translation form� The nonterminals of � and �� areassociated as they are associated in the rule A � �� �� and the nonter�minals of � and � are associated with those of �� and �� in the newtranslation form ���� ������ in the same way as they are associatedin the translation form �A����A���

The relation between the translation forms �A����A� � and ���� ������ isdenoted by ��F and we write �A����A�� ��F ���� ������� We often

drop the subscript F � The relation is transitive and re�exive and we use�

��to stand for the re�exive transitive closure�

The syntax�directed translation SDT de�ned by F � denoted � F � is a set

of pairs� � � fx� y j S� S�

�� x� y where x � �� and y � ��g� Finally�the input grammar Fi � N��� P� S of an SDTS F � N����� R� S hasthe production set P � fW � � j W � ��� � Rg and the output grammarFo � N��� P �� S of F has the production set P � � fW � � j W � ��� �Rg�

Let F be an SDTS and let x be a string of the language LFi� The viewof x is a string y in LFo such that x� y is in � F � Both grammars Fi

and Fo are considered to be unambiguous� The translation can therefore beconsidered as a function mapping strings to strings� Here we consider thetranslation as a function mapping trees to trees� every string in LFi hasa corresponding parse tree which is mapped to the parse tree of the stringy � LFo� if y is the view of x�

Example � Consider a syntax�directed translation schema F with produc�tions

Algorithm � �Syntax�directed translation of a tree�

procedure translateX�t� parse tree��

Input� Tree t�

Output� Tree t translated into t��

Task� Translate tree t recursively�

begin

� n �� t�root�

� for i � � to n�size do

� if �n�childi is a leaf� then

� delete n�childi�

� Let A � � � Gi be the production expanded at n

� and let A � � be the corresponding annotated

� production in Go �

� Permute the children of n in accordance with the

association between nonterminals of � and ��

� Insert leaves into n so that the labels of its

�� children form ��

�� for i � � to n�size do

�� if �n�childi is NOT a leaf� then

�� translaten�childi�root�label�n�childi��

end�

Author � First Name Middle initial� Last name �Last name ��� First name Middle initial�

First name � Text � TextMiddle initial � Text ��� � Text ���Last name � Text � Text

We have the following derivation

Author� Author�� First name Middle initial Last name �

Last name ��� First name Middle initial�� Text Middle initial Last name �

Last name ��� Text Middle initial�� Jessica Middle initial Last name �

Last name ��� Jessica Middle initial�� Jessica Text ��� Last name �

Last name ��� Jessica Text ����� Jessica B � Last name � Last name ��� Jessica B ����� Jessica B � Text � Text ��� Jessica B ����� Jessica B ��� Fletcher � Fletcher ��� Jessica B ���

A view or translation of the string Jessica B� Fletcher is the stringFletcher� Jessica B�� Note that even if the translation forms above containseveral instances of the nonterminal Text� it is always clear how they areassociated due to the de�nition of associated nonterminals� If we present thestrings as parse trees t and t� we say that a translation of tree t is tree t�

Figure �����

A translation can be done top�down starting at the root� Assume thatwe have a grammar Gi as an input grammar of an SDTS F and let Go be theoutput grammar of F � Let t be a parse tree over Gi representing a document�We transform the tree into a parse tree t� over Go� The translation methodis given in Algorithm �� The algorithm is given in �Nik��� pages �� ����

We use the following notations in the algorithm� Let t be a parse subtree�Then t �root is the root of the tree� Let n be a node in the tree� Then n�sizeindicates the number of children of the node n� n�childi indicates the subtreerooted at the ith child of n� and n�label indicates the label of the node n��

Example �� We continue Example �� Consider a syntax�directed transla�tion schema F with the following productions�

t t�

Author����

HHHH

Author����

HHHHFirst Middle Lastname initial name

���AA

Last First Middlename name initial

AA

Text Text Text Text Text Text

Jessica B � Fletcher Fletcher� Jessica B �

Figure ���� A tree and its translation�

Author � First name Middle initial� Last name �Last name ��� First name Middle initial�

First name � Text � TextMiddle initial � Text ��� � Text ���Last name � Text � Text

and the parse tree

Author����

HHHHFirst Middle Lastname initial name

���AA

Text Text Text

Jessica B � Fletcher

��

We now use the translation algorithm to transform the tree� First thesubtrees of the node Author are permuted a and the terminal childrenare inserted to the root b� Then the terminals of the input grammar arereplaced by the terminals of the output grammar� but here this phase istrivial� because they do not change� Actually� the constant string child ��� ofthe node Middle initial is �rst removed and then reinserted�

Author����

HHHHFirst Middle Lastname initial name

���AA

Text Text Text

Jessica B � Fletcher

a��

Author����

HHHHLast First Middlename name initial

AA

Text Text Text

Fletcher Jessica B �

b��

Author����

HHHHLast First Middlename name initial

AA

Text Text Text

Fletcher� Jessica B �

��� Edit distances between trees

In this section we give some measures of di�erence between two trees� tak�ing into account how transformations between the trees are done� Thesemeasures help us to express the e�ciency of di�erent incremental updatealgorithms see Sections ���� ����� and ������

The tree�to�tree correction problem is to determine for two labeled trees tand t� the distance from t to t� as measured by the minimum cost sequenceof edit operations needed to transform t into t�� We consider three kinds ofoperations� changes� deletes and inserts� Since a string can be considered atree of depth two with a virtual root added� the string�to�string correctionproblem �WF��� is just a special case of the tree�to�tree correction problem�

The tree�to�tree correction problem has been studied �rst by Selkow�Sel���� Tai �Tai��� presented an algorithm with time complexity Ojt j jt �j dt dt �� where jtj and jt�j are the number of nodes in each tree�respectively� and dt and dt � stand for the depths of the trees� Zhangand Shasha have presented an algorithm with time complexity Ojt j jt �j

��

minfdt� ltg minfdt �� lt �g� where lt and lt� are the number ofleaves in the trees �ZS���� They have given further enhancements in �SZ����

We de�ne below three di�erent edit distances between trees� The mini�mum edit distance tells the minimal cost of all edit sequences that transformone tree into another no restrictions on the trees� The preorder distancetells the minimal cost of all edit sequences that are done in strict preorderand transform one tree into another and� �nally� the parse distance tells theminimal cost of all edit sequences that transform one tree into another wherethe trees follow a certain grammar and the parsing is done top�down�

����� The minimum edit distance

We consider here three kinds of operations� A change in a node meanschanging the label of the node� Deleting a node n means removing the nodeand attaching its children to the parent of n� Inserting is the complementof delete� When we insert a node n as a child of a node n�� we make it theparent of a consecutive sequence of children of the node n��

Let ! denote the null node� An edit operation is written b �� c� whereb and c are either nodes or !� We let the label of the node represent thenode when no misunderstanding can occur� The operation b �� c is a changeoperation if b �� ! and c �� !� It is a delete operation if b �� ! and c � !�and it is an insert operation if b � ! and c �� !� The null operation ! �� !is not considered� The null node is more a technicality for de�ning the deleteand insert operations� When we de�ne a tree� we never explicitly say that itcan contain null nodes�

Let S be a sequence s�� � � � � sk of edit operations� An S�derivation fromtree t to tree t� is a sequence of trees t�� � � � � tk� such that t � t�� t� � tk�and we get tree ti from tree ti�� by applying the edit operation si� where� i k�

Let � be a cost function that assigns to each edit operation b �� c a non�negative real number �b �� c� This cost can vary for separate nodes� Weconstrain � to be metric� so the following conditions must hold�

�� �b �� c � � and �b �� b � �

�� �b �� c � �c �� b

� �a �� c �a �� b � �b �� c

��

����

AAAA

t

S

A B C����

AAAA

t�

S

B C

Figure ���� Two labeled trees with minimum edit distance ��

We extend the cost function � to the edit sequence S � s�� s�� � � � � sk byletting �S �

Pi�ki�� �si� Now� we can de�ne the minimum edit distance

between two trees �Tai����

De�nition � Let t and t� be two labeled trees� The minimum edit distancefor t and t� is

mindistt � t � � minf�S j S is an edit sequence transforming t to t �g

The distance function mindist is metric due to its de�nition�

Example �� In Figure ��� we have two labeled trees t and t�� Changingtree t to tree t� can be done in several ways� Let the labels of the nodes standfor the nodes themselves� Then the edit sequence A �� B� B �� C� C �� !changes t to t�� Under the unit cost function i�e�� �b �� c � �� when b �� c�the cost of this sequence is three edit operations� Another sequence� A�� !� transforms t to t� with cost �� which is obviously the minimum cost�Therefore mindistt � t � � � under the unit cost function�

����� Mappings

The number of di�erent sequences of edit operations which transform t intot� is in�nite� Therefore it is hard to enumerate all valid sequences and �nd

the minimum cost� We de�ne mappings �Tai��� that help us compute thedistance in polynomial time� Intuitively� a mapping is a description of how asequence of edit operations transforms t into t�� ignoring the order in whichthe edit operations are applied� Before de�ning a mapping we need to de�nesome auxiliary concepts� The ancestor of a node in a tree is its parent or aparent of an ancestor� The root of a tree is an ancestor of all nodes in thetree except of the root itself� Consequently� a node m is a descendant of anode n if n is an ancestor of m� A node is a sibling of another node if bothnodes have the same parent� We say that a node m is to the left of anothernode n if m occurs before n in postorder and m is not a descendent of n�

Suppose we have an ordering for each tree� Any complete ordering willdo� We denote by t�i� the ith node of tree t in the given ordering� A mappingis a triple M� t� t�� where M is a set of pairs of integers i� j� The followingconditions must hold�

�� for any i� j �M we have � i jtj and � j jt�j

�� for any pairs i�� j� and i�� j� in M we have

a i� � i� if and only if j� � j� one to one relation

b t�i�� is to the left of t�i�� if and only if t��j�� is to the left of t��j��

c t�i�� is an ancestor of t�i�� if and only if t��j�� is an ancestor of t��j���

Condition �b states that the sibling order is preserved� and condition �cthat the ancestor order is preserved�

When we transform t into t� we perform the following operations� Thenodes in t that are mapped on nodes in t� are relabeled if their labels di�erfrom those of their corresponding nodes in t�� The nodes in tree t that are notmapped are removed and the nodes in t� which do not have a correspondingnode in t are inserted into t� When there is no confusion we use M insteadof M� t� t�� Let M be a mapping from t to t� and let Ut and Ut� be the set ofindices of nodes in t and t�� respectively� which are not members of any pairin M � Then we can de�ne the cost of M

�M �X

�i�j ��M

t �i � �� t ��j � �X

i�Ut

t �i � �� ��X

j�Ut �

� �� t ��j �

The cost is the sum of the costs of all changes� deletes� and inserts�

��

The relation between a mapping and a sequence of edit operations is asfollows�

Lemma � Given a sequence S � s�� � � � � sk of edit operations from t to t��there exists a mapping M from t to t� such that �M �S� Conversely�for any mapping M � there exists a sequence of editing operations such that�S � �M�

Proof� See �ZS���� �

Hence� we have the following corollary�

Corollary � We can de�ne the minimum edit distance between two trees tand t� as

mindistt � t � � minf�M jM is a mapping from t to t �g

Proof Follows from Lemma �� �

Example �� In Example �� we saw two edit sequences that changed a treet to a tree t�� Assume that the nodes of the trees are ordered in preorder�i�e�� the node order of tree t is S�A�B�C and the order of tree t� is S�B�CFigure �� � nodes are indexed with order number� We have a mappingM� � f�� �� �� �� � g� The cost of this mapping is assume a unit costfunction

�M� � �t��� �� t���� � �t��� �� t���� � �t� � �� t�� � ��t��� �� !

� �S �� S � �A �� B � �B �� C � �C �� !� � � � � � � ��

Another mapping is M� � f�� �� � �� �� g The cost of this mapping is

�M� � �t��� �� t���� � �t� � �� t���� � �t��� �� t�� � ��t��� �� !

� �S �� S � �B �� B � �C �� C � �A �� !� � � � � � � �� �

��

����

AAAA

t

S�

A� B� C

����

AAAA

t�

S�

B� C�

Figure �� � Two labeled trees with ordered nodes�

This is less or equal then the cost of any mapping and consequently we havemindistt � t � � �� Both mappings M� and M� follow the conditions givenabove� But� e�g�� the triple f�� �� �� �g� t� t� is not a mapping as there isno conservation of the ancestor order� �

The solutions to the tree�to�tree correction problem are based on minimalmappings and several algorithms have been developed for computing themappings �Tai��� ZS��� SZ����

����� The preorder distance

The minimum edit distance corresponds to the minimum number of editoperations needed to transform one tree into another� Let t be a parse treeconstructed according to a grammarG and some input text� When we modifythe tree t because of a modi�cation in the input we get an updated tree t��The structure of t� is unknown at the start� We know� however� that the treet� follows the productions of the grammar� The work that must be done toconstruct the updated tree t� does not relate directly to the minimum editdistance between the trees� Here we de�ne two other distance functions thatmore accurately measure the reparsing�

We assume the recursive descent parsing technique to be applied for atext that follows a context�free textable grammar� Therefore� it is natural tomeasure the distance of the trees by counting the mismatches when the treesare traversed in preorder� We consider the same three operations� changes�

��

inserts and deletes� Informally we de�ne the preorder distance as the mini�mum cost sequence of edit operations needed to transform tree t into tree t�

when we traverse and transform the trees simultaneously in preorder� Whena mismatch occurs� the proper edit operation change� delete or insert mustbe applied to correct the mismatch�

In order to de�ne this distance formally� we �rst de�ne the preorder map�ping between two trees� Suppose we have an ordering for each tree anycomplete ordering will do� We denote by t�i� the ith node of tree t in thegiven ordering� A complete preorder mapping MPre� t� t

� is similar to anordinary mapping� but with the following conditions�

�� for any pair i� j �MPre we have � i jtj and � j jt�j

�� i� j �MPre if and only if

a� jtj � and jt�j � and i � j � �

that is� the root of t is mapped onto the root of t�

or

b� there is a pair i�� j� �MPre such that

a t�i�� is the father of t�i� and

b t��j�� is the father of t��j� and

c t�i� and t��j� are both the nth child of t�i�� and t��j ��� respec�tively�

A complete preorder mapping induces conditions that are much stricterthan the conditions of an ordinary mapping� This means that usually moreediting actions must be taken when the transformation is done� We de�nethe cost of the preorder mapping MPre in the same way as the cost of M �

�MPre �X

�i�j ��MPre

t �i � �� t ��j � �X

i�Ut

t �i � �� ��X

j�Ut �

� �� t ��j �

Here Ut and Ut� are the sets of indices of nodes in t and t�� respectively� thatare not members of any pair inMPre� Before we de�ne the preorder distance�we give the following lemma�

��

Lemma � The complete preorder mapping is unique�Proof The preorder is unique� therefore there can be only one preorder

mapping� �

Now we can de�ne the preorder distance formally�

De�nition � Let t and t� be labeled trees� The preorder distance of thetrees is

predistt � t � � �MPre

where MPre is the complete preorder mapping of t and t�� �

The function predist is metric due to its de�nition�The algorithm of the preorder transformation is given as Algorithm �

The algorithm uses the following notations� Let t be a tree� Then t �root isthe root node of tree t� Let n be a node in the tree� Then n�label is the labelof node n� n�size the number of the children of node n� and n�childi the ithchild of node n� The time complexity of the algorithm is Ojtj � jt�j� Thetrees are traversed exactly once each� possibly nearly all of the nodes ofthe tree t must be removed and new nodes according to the tree t� insertedwhich requires a total time of about �jtj� jt�j� under unit cost�

The preorder distance can be determined by the algorithm by counting thenumber of change� delete and insert operations that the algorithm performs�It can easily be shown that the preorder distance is always greater or equalto the minimum edit distance�

Example �� In Example ��� we have that MPre � M� ��

����� The parse distance

The preorder distance does not yet take into account the grammar whicht and t� follow� We de�ne the parse distance as the total size of subtreesthat are changed� inserted or deleted when transforming t into t�� A subtreeis considered to have changed if the production expanded at the root ofthe subtree has been substituted for another production� If the productionexpanded at the root of tree t is changed� the parse distance is the sum of

��

Algorithm � �Preorder transformation�

procedure preorder transform�t t��

Input� Two subtrees t and t��

Output� The subtree t transformed into t��

Task� Check and change nodes in preorder�

begin

� n � t�root �

� n� � t��root �

� for i � � to max�n�size n��size� do

� if �n��childi � nil� then

� for all nodes m in n�childi do

� delete m�

� delete n�

� else

if �n�childi � nil� then

� insert node with label n��childi�label

�� �and no children� as i�th child of n �

�� else if �n�childi�root�label �

n��childi�root�label� then

�� change n�childi�root�label to

n��childi�root�label�

�� preorder transform�n�childi n��childi��

end�

the size of t and t� �� If the production of the root stays the same we traversethe children from left to right and check the productions expanded at thechildren� If an expanded production has been changed� we increment theparse distance with the sum of the size of the subtrees at these children�

We de�ne the parse distance by means of a mapping� Suppose we havean ordering for each tree any complete ordering will do� We denote by t�i�

�This seems to be a very coarse estimation of the edit distance between the trees� butthe underlying assumption is that in order to update the tree t� we have to remove allnodes in t and reparse the entire frontier of t

�� This work corresponds to the sum of thesizes of the trees�

��

the ith node of tree t in the given ordering� Suppose that the trees followa certain grammar G and that at every node in a tree a certain productionhas been expanded� A complete parse mapping MParse� t� t

� is similar to anordinary mapping� but with the following conditions�

�� for any pair i� j �MParse we have � i jtj and � j jt�j

�� i� j �MParse if and only if

a� jtj � and jt�j � and i � j � � and the productions applied atthe roots of the trees are the same

that is� the root of t is mapped onto the root of t�

or

b� there is a pair i�� j� �MParse such that

a t�i�� is the father of t�i� and

b t��j�� is the father of t��j� and

c t�i� and t��j� are both the nth child of t�i�� and t��j ��� respec�tively

d the productions applied at t�i� and t��j� are the same�

A complete parse mapping induces conditions that are even stricter thanthe conditions of a complete preorder mapping�

We de�ne the cost of a complete parse mapping MParse in the same wayas the cost of M and MPre�

�MParse �X

�i�j ��MParse

t �i � �� t ��j � �X

i�Ut

t �i � �� ��X

j�Ut �

� �� t ��j �

Here Ut and Ut� are the sets of indices of nodes in t and t�� respectively� thatare not members of any pair in MParse� There is a mapping from a node intree t to a node in tree t� only if the same production has been expandedwhich implies that the labels of the nodes must be the same� Therefore thesumP

�i�j��MParse�t�i� � t�j� is zero and the cost of the complete parse

mapping is reduced to

�MParse �X

i�Ut

t �i � �� ��X

j�Ut �

� �� t ��j �

��

that is� the sum of the costs of all inserts and deletes� We have

Lemma � The complete parse mapping is unique� if the grammar is unam�biguous�

Proof Skipped� �

The parse distance is de�ned accordingly�

De�nition � Let t and t� be labeled parse trees over some grammar G� Theparse distance is

parsedistt � t � � �MParse

where MParse is the complete parse mapping of t and t�� �

The function parsedist is metric due to its de�nition� The parse transforma�tion algorithm is given as Algorithm �� The notation is the same as used inAlgorithm � Additionally� we assume that each node has information aboutwhat production has been expanded at the node� this is indicated by n�prod �The copy command copies a tree preserving the structure of the tree�

This corresponds fairly well to the amount of reparsing done when updat�ing the tree t to get t�� When reparsing� assume that we have to change theproduction at a certain node� This means that we have to continue the re�cursive descent parsing according to this production� We start by calling the�rst child of the production� etc� If we do not have to change the production�we still have to traverse the children and check if some of the productions atthese nodes change�

If two corresponding nodes in t and t� have a production that containsan iteration on its right side� and the iteration is used to produce a di�erentnumber of children for the nodes� we assume that the production is the samein both nodes� Consider� e�g�� the production S � A� used in correspondingnodes t�i� and t��j�� such that t�i� has two children two As and t��j� has threechildren three As� Then the productions used at both nodes are the same�and so are probably the productions used at the two �rst children of thenodes� Only the third child di�ers� which is then the only di�erence a�ectingthe cost function�

Example �� Assume that we have the same trees as in Example ��� Themindist and predist functions do not take any grammar into account� Forthe parsedist function� we need the grammar of the parse trees the same

��

grammar for both trees� Assume that the grammar contains the productionsS � ABC and S � BC� The trees in Figure �� must be constructed withtwo separate productions� because the grammar is supposed to be LL��The complete parse mapping is empty� because the productions expandedat the roots of trees t and t� are di�erent� The cost of the complete parsemapping MParse is assume again unit cost

�MParse � �t���� ! � �t���� !� �t� �� ! � �t���� ! �

�!� t���� � �!� t���� � �!� t�� �

� �S � ! � �A� ! � �B � ! � �C � ! �

�!� S � �!� B � �!� C

� � � � � � � � � � � � � �

� �

Now� parsedistt � t � predistt � t � because the production expanded at theroot is di�erent in t and t�� For this pair of trees we have mindistt � t � predistt � t � parsedistt � t ��

We can show that

Lemma � For any pair of labeled trees t and t� we have

mindistt � t � predistt � t �

and if the trees are constructed over a certain unambiguous grammar gram�mars we have

mindistt � t � predistt � t � parsedistt � t �

Proof From discussion above� �

Generally we do not expect the distances to be equal�

��

Algorithm � �Parse transformation�

procedure parsing transform�t t��

Input� Two subtrees t and t��

Output� The subtree t transformed into t��

Task� Parse transform t into t��

begin

� n � t�root �

� n� � t��root �

� for i � � to max�n�size n��size� do

� if �n��childi � nil� then

� for all nodes m in n�childi do

� delete m�

� delete n�

� else if �n�childi � nil� then

insert nodes of t��childi into t�childi

Copy t��childi as i�th child of t

� else if n�childi�root�prodn �

n�childi�root�prodn then

�� for all nodes m in t do

�� delete m�

�� delete n�

insert nodes of t��childi into t�childi

�� Copy t��childi as i�th child of t

�� else

�� parsing transform�n�childi n��childi��

end�

Chapter �

Incremental parsing

A parser structures a text forming a parse tree over a grammar� If we modifythe text� the parser updates the tree to correspond to the modi�cations� Theparse�from�scratch technique updates the tree by reparsing the entire text�If the text is large� this technique can be very time�consuming� Incrementalparsing� however� reuses the old parse tree modifying it only where it shouldbe updated� We use parsing for capturing the structure of text documents�Incremental parsing brings e�ciency to the structuring�

��� Related work

Themes connected to incremental parsing are present already in �Loc���Lin���� Research on incremental parsing handles both LR parsing �Cel���GM��� GM��� Weg��� JG��� AD� � DMM��� Lar���� and LL or recursivedescent parsing �Kah��� Str��� SDB��� MPS���� We distinguish incrementalparsing from incremental generation of parsers �Kos��� GHKT���� where amodi�cation in the grammar leads to an incremental update of the parser�

In the HST system �KLMN��� we use a recursive descent parsing tech�nique for capturing the structure of documents� We shall� therefore� take acloser look at some incremental LL parsers� All techniques start by parsingthe input text� They di�er� however� in the way they update the parse treeafter a modi�cation in the input�

Str"omberg �Str��� presents a way to incrementally parse Pascal pro�grams� Two attributes are associated with every node in the parse tree� One

��

tells the length of the frontier of the subtree for the node and the othertells the length of the reserved word that may be associated with the node�When an insertion occurs in the input� the incremental parser obtains theinsert position and the length of the inserted text as parameters� The parserthen traverses the parse tree top�down looking for a modi�ed node n� Thenode n is found by comparing the insert position with the cumulated sum ofthe lengths of subtree frontiers and reserved words depending on how far thetraversal has proceeded in the tree� When the right node has been found� theinserted text is parsed and inserted into the tree� The parser then updatesthe length attributes of the ancestor nodes of n� If parsing fails� the parserreturns to the parent of the node n and tries again� This is a weak point ofthe incremental parser� if the parser backtracks far enough� it must reparsethe entire text�

Schwartz et al� �SDB��� DMS��� MS��� present another incremental re�cursive descent parser for Pascal� When the program is modi�ed� the parsersplits the parse tree into a sequence of parse subtrees� The subtree corre�sponding to the modi�cation is removed� possible insertions are parsed� andthe sequence of parse subtrees is joined to form the complete parse tree� Thefragmentation technique used depends on language constructs and there�fore this sort of incremental parser cannot be constructed automatically aspointed out by �MPS����

Murching et al� �MPS��� present an incremental parser that can be auto�matically generated for any LL� grammar� The parser forms a parse treeand connects the input tokens with the leaves of the tree� The input tokensare also connected� forming a list� When the input is modi�ed� the parser up�dates the list by marking some of the tokens deleted� inserted� or unchanged�The parser traverses the token list and updates the parse tree� It �nds themodi�ed nodes in the parse tree by following links from the input tokens tothe leaves of the tree� We extend this technique and present an incrementalparser for structured documents� The extended method is presented in thefollowing sections�

��� Incremental Parsing

Incremental parsing aims at minimizing the work performed in a parse treeupdate� Let G � N��� P� S be a context�free unambiguous grammar and

��

t

t�

x

x�

incrementalmodi�cation

parse

incremental

parse

incrementalmodi�cation

������

AAAAAA

������

AAAAAA

��AA

p p p p p

p p p p p

p p p p p

p p p p p

p p p p

p p p

p p

p

p p p p p

p p p p p

p p p p p

modi�ed part

Figure ���� A scenario for incremental parsing�

let x be a string of the language L�� Let t be the parse tree over G withfrontier x� Assume that we modify the string x obtaining the new inputstring x�� The parse�from�scratch technique reads the entire string x� andreconstructs the parse tree giving t�� The incremental technique� however�reuses the old parse tree t� updates it� and produces a modi�ed parse tree t�

Figure ����An optimal incremental parser traverses and updates the parse tree only

where it should be modi�ed� A lower bound for an incremental parser ismindistt � t �� the cost of the edit sequence required to transform tree t intotree t�� We do not expect to reach this bound� A more practical bound willlie somewhere between predistt � t � and parsedistt � t ��

��� Data structure support

The parse�from�scratch technique always rebuilds the complete parse tree�there is no need to express the associations between the tree and the inputstring� Incremental parsers� however� have to connect the input with the

��

parse tree� When the input is modi�ed� the parser e�ciently �nds the a�ectednodes in the parse tree�

While the input string is being parsed� the parser connects pieces of itwith corresponding leaf nodes in the parse tree� The parser divides theinput string into input tokens� An input token is determined by the textablegrammar� and is either a constant string or a variable string� Every inputtoken is tagged� determined by a start tag and an end tag� The tags arenot visible to the user� The tags are needed by the incremental parser forrecognizing the structure of the text�

Every input token is connected to a parse tree token� the correspondingleaf node in the parse tree� We say that the input token and the parsetree token are associated� We connect the input tokens to the parse treetokens only at the leaf level� The input tokens corresponding to a subtreeare determined by the associated parse tree tokens of the subtree� Associatedtokens are connected with links that are bidirectional�

Example �� Assume that we have two parse trees corresponding to theproductions Author� Text and Year� �� respectively� The �rst tree consistsof a nonterminal node� Author� a parse tree token� Fletcher� and an inputtoken� Fletcher � which is the piece of string that is associated with thetree� Note that we distinguish input tokens from parse tree tokens by framingthem� The second tree is similar� it consists of a nonterminal node� Year� aparse tree token� �� and an input token� � � which in this case is an epsilontoken empty string� Epsilon tokens are tagged but not visible to the user�

Author Yearj j

Fletcher �

l l

Fletcher �

For further examples of tagged text see Figure ��� of Example �� page ����

We see that every input token is connected to its associated parse tree token�This is true only when the input string has been parsed� An unparsed stringhas no associations�

��

����� Editor support

Tagged text can be supported by the editor in di�erent ways� We have usedthe Sun Microsystems TexteditTool in OpenWindows �Sun��� which auto�matically supports invisible tags in the text� The XView developing environ�ment �Hel��� o�ers some procedures for maintaining� creating� destroying�and listing tags in the text� In the example above assume that the epsilonnode � is the following leaf node after the node Fletcher in left to rightorder� In the editor we would have the following text�

Fletcher� � ��t� t� t�t

There are four tags inserted� two t� and t� delimiting the terminal stringFletcher� and two t� and t delimiting the invisible epsilon token� TheHST system �KLMN��� maintains a table linking a pair of tags to the cor�responding parse tree token� In the example we would have the followingtable�

Start Tag End Tag Link to Nodet� t� �� t� t ���

The numbers �� and ��� are links to the Fletecher and � nodes� respec�tively�

We are of course not limited to editors that support invisible tags in thetext� The document preparation system in hand can explicitly maintain thetags itself� We could also use a representation that explicitly shows the tagsin the text� An example is the following�

Author����Fletcher�Author Year�����Year

The input tokens are delimited by SGML�like tags �Bar��� augmented withlinks to the corresponding parse tree tokens� This representation is not veryuser friendly as links are interspersed with ordinary text� Additionally� therecan be problems with the consistency between links and the parse tree if theuser is allowed to modify the links�

��

����� Parse tree notation

We associate some information with each tree node in a parse tree� Let t bea parse tree and n a node in it�

t �root the root node of tree tn�size number of children of node nn�childi ith child subtree of node nn�label label of node n left side of production

expanded at nn�prod production expanded at node nn�string constant or variable string connected

with leaf node nprod �symboli�label label of ith symbol on the right hand

side of production

The attribute n�size denotes the number of the children of node n� A leafnode has always size �� The attribute n�childi denotes the i�th child of noden� The label of node n is denoted by n�label � The expanded production at nis denoted by n�prod � All the leaf nodes n parse tree tokens contain a stringdenoted by n�string� The notation prod �childi�label refers to the labels of non�terminals or to the terminals on the right hand side of a production prod � Ifn is a node in a parse tree� we have n�childi �root �label � n�prod �symboli�label

��� Start position

Incremental parsing divides into three subproblems� The parser must �rstdecide where to start reparsing� It must determine which node in the treeis the �rst to be modi�ed� The second subproblem is the method� how willthe parser proceed updating the parse tree e�ciently� For the third� a parsermust know when it can terminate� In this section� we motivate the choiceof the start position� In the following section we de�ne the terminationconditions� before going into the method�

We locate the start position as in �MPS���� Let x � vwz be the old inputstring� where v�w� z � T � and let x� � vw�z� where w� � T � be the modi�edstring� Let t be the parse tree corresponding to the string x� The old inputstring x does not exist any more� it has been replaced by the new input

��

string x�� However� the old parse tree t still corresponds to the string x� Thesubstring v has not been modi�ed� the part of the parse tree correspondingto this string should� in consequence� not change� The parser� however� needsto look ahead one symbol before choosing which production to expand at acertain node� The string w� can be preceded by epsilon tokens that are notpart of the unmodi�ed string v� In order to capture such possible epsilons�the incremental parser starts reparsing at Lastv� i�e�� the last input tokenof the string v�

From the last input token of v we follow the link to the associated parsetree token in the parse tree� The comparison between the parse tree and theinput starts at these two tokens see Section ����� for details�

��� Termination conditions

Determining the termination position of a reparse is not straightforward� Asimple modi�cation in the input text can sometimes a�ect the entire parsetree� We give here termination conditions that indicate when the parser canstop updating the parse tree� If these conditions hold we claim that the parsetree is consistent with the modi�ed input� even if we have not traversed theinput string totally�

Termination conditions for incremental LR parsing have been presentedby �Weg��� JG��� Yeh� � YK���� We present some conditions that concernincremental LL parsing� Some further enhancements can be made by apply�ing the skipping heuristic presented in �YK��� but introduced by �Weg����where parts of parsing are skipped even if the parser has not passed the lastedit change in the text�

����� Local synchronizations

The termination conditions are divided into two concepts� local synchroniza�tions and possible termination nodes� We show that reparsing can terminateif a local synchronization holds at a possible termination node�

Assume that we have an input string� the corresponding parse tree andlinks between associated input and parse tree tokens� Assume also that theinput has been modi�ed and that the parse tree is not up to date�

��

List

Publicationshhhhhh

������� � � � � �Publication

�����������

��������

���HHHH

XXXXXXX

hhhhhhhhhhhhAuthor Title Journal Year

AFletcher � Murder �� Police ������ �nn

� � � � � � � �

Fletcher � Murder �� Police �� ���� �nn

Figure ���� A parse tree and its corresponding token list�

De�nition � We say that all unmodi�ed input tokens are locally synchro�nized with their associated parse tree tokens� All modi�ed input tokens� orinserted text that does not belong to an input token� are not synchronized��

We use the term locally synchronized because other parts of the parse treeand the input string can still be inconsistent�

Example �� In Figure ��� we have an example of a parse tree with associ�ated input tokens� The parse tree is constructed over Grammar � given inExample � on page ���

The parse tree represents the input string Fletcher �Murder�� Police�������nn� All input tokens are locally synchronized with their associatedparse tree tokens� Now� assume that we modify the input string� e�g�� sub�stituting the string Police by Crook� Then the new input token Crookmaintains its link to the parse tree token Police� The tokens are� of course�not synchronized� All the other input tokens are locally synchronized withtheir associated parse tree tokens Figure �� �

��

List

Publicationshhhhhh

������� � � � � �Publication

�����������

��������

���HHHH

XXXXXXX

hhhhhhhhhhhhAuthor Title Journal Year

AFletcher � Murder �� Police ������ �nn

� � � � � � � �

Fletcher � Murder �� Crook �� ���� �nn

Figure �� � A parse tree with with one token not synchronized with the inputtoken list�

A local synchronization says that if a tagged input token has been parsedbefore� the parser does not have to update the parse tree at the associatedparse tree token because it is already correct� A local synchronization be�tween two tokens is not a su�cient termination condition� There are tworeasons for this� First� if the parser has not passed the last modi�ed in�put token� there will most certainly be other modi�cations in the parse tree�Second� the structure of the tree may have changed due to the update� thismeans that the parse tree may be restructured even after the last inputmodi�cation�

Example �� Consider a grammar G with productions P � fS � ABCD jEBFG� A� a� B � b� C � c� D � d� E � e� F � cd j f � G� �g� Thegrammar is LL� and unambiguous� We can derive three di�erent terminalstrings� abcd� ebcd� and ebf�

We shall show why a local synchronization is not a su�cient terminationcondition� First� we show that the parser must proceed until it has passedthe last modi�ed input token� Assume that we have a parse tree for thestring abcd Figure ���� tree t� Assume also that we change the terminal

��

BBB

���

t

S

A B C D

a b c d

t�

S

BBB

���

E B F G

e b f �

t��

S

BBB

���

E B F G

e b cd �

Figure ���� A parse tree and two modi�cations�

string a to e and the terminal string cd to f� The corresponding tree isgiven as tree t� in Figure ���� The only locally synchronized node� B� andthe corresponding terminal string� b� have been surrounded by a dashed box�We start updating the tree t by parsing the string e� If we stop when wereach the local synchronization� the rest of the tree will never ve updatednodes F and G�

Second� we show that a local synchronization is not a su�cient termina�tion condition because the structure of the tree might change because of anearlier modi�cation� Assume that we start from the tree t in Figure ��� andchange the terminal string a to e� This time� the last modi�ed input token isthe token corresponding to the parse tree token e in tree t��� But the parsercannot stop when it reaches the local synchronization at node b� Even ifthere has been no modi�cations among the input tokens after the string b�the tree is still changed�

We can extend the de�nition of local synchronization for a sequence ofinput tokens� A sequence of input tokens is locally synchronized with a parsesubtree� if each input token is synchronized with a parse tree token in thesubtree and the input tokens occur in the same order as the parse tree tokensin the frontier of the parse subtree�

����� Termination nodes

We now de�ne a property of a node in the parse that says that it is possibleto stop the reparsing there� This property depends on the history of thenode� that is� on the development of its ancestors� its right brothers and itsright ancestors right brothers of its ancestors�

De�nition � A node n is a possible termination node for reparsing if duringthe reparsing

a no production expanded at an ancestor of n is changedorb no label of a right ancestor or a right brother is changed

and no right brother or right ancestor of n is deleted orinserted

We should make two important comments on this de�nition� First� conditiona does not imply condition b or vice versa� Second� because reparsing isdone top�down and the production to be expanded at a node is chosen beforethe parser reaches the children of the node� we always know � if an ancestorhas changed its production� and � if an ancestor or a right ancestor haschanged its label�

Example � In Figure ��� we see a parse tree t and its modi�cation t� aftera reparse� The grammar contains the two productions S � ABC and S �BC and satis�es the LL� conditions� We study the node B in tree t�� Theproduction expanded at the root of t has changed� so condition a in thede�nition above does not hold at B� But no ancestor� right ancestor or rightbrother of B has changed its label and therefore it is a possible terminationnode� Also nodes S and C are possible termination nodes�

The following lemma gives su�cient conditions for stopping a reparse andstill producing a correct new parse tree for the modi�ed text�

Lemma � Let G be a context�free� unambiguous LL� grammar� Assumethat we have an input string x � vwz where v�w� z � T �� its parse tree t over

��

����

AAAA

t

S

A B C

����

AAAA

t�

S

B C

Figure ���� A parse tree and its modi�cation�

the grammar G and links between the associated tokens in the input and theparse tree� Assume that the input is modi�ed to x� � vw�z where w� � T �

and that we reparse it� The parser starts at the last input token of stringv� The parser can stop the reparsing if the following termination conditionshold

a The last modi�ed input token has been parsed�b Parsing is locally synchronized�c The parser has reached a possible termination node�

Proof The proof should be obvious and we will only informally sketch it�Assume that the three conditions given above hold� If the last modi�ed inputtoken has been parsed� the parser must be processing the string z� This stringhas been parsed before� Additionally we know that the structure of the treeconcerning the string z has not changed because the parser has reached apossible termination node� We also know that the parser has found a localsynchronization� Because the grammar is LL� and unambiguous� there canbe no other changes in the tree after this local synchronization� �

��� Incremental parsing� the method

In this section we concentrate on the parsing method� We extend the tech�nique presented by Murching et al� �MPS���� They give a top�down incre�

��

mental parsing algorithm which can be automatically constructed from acontext�free grammar� We give four extensions� First� we let the parserhandle also variable strings� Second� we use the termination conditions ofSection ��� to stop the incremental parsing� Third� we also give an enhance�ment of the algorithm that uses termination conditions for locally skippingunmodi�ed text strings during an incremental parse� Fourth� we discussiterations and their update problems�

����� Scenario of the reparse

Before we present the incremental algorithms� we take an informal look onthe method� We start by some notations� Let x � vwz be the old inputstring� where v�w� z � T � and let x� � vw�z be the modi�ed string� wherew� � T �� The old string x does not exist any more� it has been replaced in theeditor by the new string x�� However� before the old parse tree is updated�we can reconstruct the old string x by concatenating the strings in the parsetree tokens the leaves� Let t be the old parse tree corresponding to thestring x� The incremental parsing algorithm updates the tree t to a parsetree t��

The incremental parsing algorithm traverses the old parse tree and thenew input string simultaneously� updating the tree t when necessary� Travers�ing is done with the help of two pointers� oldptr that points to parse treetokens in tree t and newptr that points to input tokens in the string x�� Be�cause newly inserted text is not divided into input tokens before the texthas been parsed� newptr can point to a certain inserted character� Duringparsing� oldptr points to the tree node that is currently being updated andnewptr points to the next character being read from the input�

Incremental parsing starts with the initialization of oldptr and newptr�Then the parser calls the function Traverse for traversal of the old parse treesimultaneously with reading the new input string� The Traverse function �rstdetermines the start position ClimbUpTree and then calls the appropriateparse functions ParseX� Parsec� ParseText � or Parse�� see Figure ����

The parser sets the pointers to point to Lastv of x and x� respectively�i�e�� oldptr is set to point to the parse tree token that comes directly beforethe �rst one to be modi�ed� and newptr is set to the last unmodi�ed inputtoken before the �rst modi�ed token� The pointers are set this way because ofpossible epsilon tokens in the parse tree and the editor� Because the structure

��

Traverse

ParseX Parsec ParseText Parse�ClimbUpTree

������

������

XXXXXX

XXXXXX

����

��

HHHH

HH

Figure ���� The incremental parser� main parse functions

of the parse tree is inherent in the editor through the input tokens� invisibleepsilon tokens must be inserted into the editor� This is done solely by theparser when needed for consistency with the parse tree� If there are epsilontokens following Lastv among the editor tokens� the number of these mustbe made equal with the number of epsilon tokens at the corresponding placein the parse tree�

When the number of epsilon tokens has been made equal in the editor andin the parse tree Algorithm �� lines ��� the actual reparsing can start�Both pointers� oldptr and newptr are advanced to the following parse treetoken and input token� respectively� If the strings of the associated tokensdi�er� the text has been modi�ed and must be reparsed� The question ofwhich parse function to use is solved in the following way� The parse treeis traversed upwards from the oldptr token until a node is reached that isnot the leftmost child of its parent� This is the �rst node in a top�downtraversal where the expanded production can change� The subtree at thisnode is reparsed� After the reparse� oldptr is advanced beyond the yield ofthe old subtree� The pointer newptr is advanced automatically when readinginput�

The traversal of the parse tree continues� gradually changing it to the newtree t�� The pointer oldptr is advanced� a start position is found by traversingthe tree upwards from the oldptr token to a node that is not the leftmostchild of is parent� and the correct parsing function is called� The update canstop when a termination node has been reached�

The parsing functions need some further explaining� A parsing function

��

recursively parses the new input text� However� it tries to avoid reparsingunmodi�ed parts of the parse tree� Nonterminals� constant strings� variablestrings� and the empty string all have slightly di�erent functions� Assumethat a parse function of a certain nonterminal is called when the traversal ofthe parse tree has reached should reach node n� If the node n is missing� itis inserted� in addition to the children according to the production expandedat the node� If the production expanded at n is changed� new children areinserted� All these children must be reparsed�

If the production expanded at n does not change� parsing can continue intwo ways� If the next input token is di�erent from the next parse tree token�then either some node in the leftmost branch of the subtree at n changes itsproduction� or the production at n is textable� This is di�erent from thecorresponding algorithm in �MPS���� where there are no variable strings� Inthese both cases the leftmost branch of the subtree is traversed and reparsed�The rest of the tree is searched for further modi�cations� If the input tokenand the parse tree token are locally synchronized� the leftmost branch of thetree does not change� but the rest of the subtree must be traversed becauseit might have to be be modi�ed�

A traversal of a subtree is equivalent to the top level traversal explainedabove� This traversal is referred to as traverse yield of tree�tree��

����� Algorithms for incremental parsing

The incremental parsing functions are given in Algorithms � �� The toplevel traversal algorithm is given in Algorithm �� Some auxiliary functionsare brie�y explained�

Traverse�oldptr� newptr� reads the input from the �rst modi�cation pointuntil a termination node is reached� The function expects that oldptrand newptr have been set to accurate parse tree and input tokens� TheTraverse function �rst balances the number of epsilon tokens in theparse tree and the input� Thereafter� it updates the parse tree at theoldptr node according to the input at newptr� The function is given inAlgorithm ��

Traverse yield of tree�tree� is similar to the Traverse function� Thisfunction traverses a subtree� updating it due to modi�cations in theinput�

��

Algorithm � �Incremental update traverse�

procedure Traverse �oldptr newptr� pointer��

Input� oldptr and newptr set to Last�v� in x and x��

Task� Traverse parse tree and update it�

begin

� while not termination node do

� if Next�Lookahead�oldptr�� then equate eps

� while Symbol�oldptr� � � do

� if Symbol�newptr� � � then

� Link Symbol�newptr� to parse tree�

� Advance newptr�

� else

� Insert epsilon token and link�

Advance oldptr�

� while Symbol�newptr� � � do

�� Delete epsilon token and Advance newptr�

�� Link Scan�Lookahead�oldptr�� to tree �

�� Advance oldptr�

�� else Lookahead�oldptr� �� Next�newptr�

�� n �� ClimbUpTree�oldptr��

�� Parsen�label�n Follow�n�label���

end�

ParseX�tree� followset�� For each nonterminal X in the grammar thereis a parse function ParseX tree� followset� The function expands aproduction with left hand side X at the root n of the tree�

First� if no tree node corresponding to the nonterminal exists� a new treenode with label X is inserted� This may be the case if the productionexpanded at a node has been changed� If the new production hasmore children than the old one� additional children must be inserted�Depending on the productions the node may or may not have the samenumber of children after the change� Among the productions with X

on the left hand side� the parser �nds the production whose Dirsym set

��

Algorithm � �Incremental top�down parsing of nonterminal�

procedure ParseX�t� tree node� folset� strings��

Input� Tree node t and follow set folset�

Output� Parse tree with root label X�

Task� Recursively parse input�

begin

� if t � nil then insert node with label X�

� prod �� find production with lhs X whose Dirsym set

� contains next input token �possibly textable��

� if t is new or production of t changed then

� forall children t�childi do

� folseti �� folset � Follow�prod�childi�label��

� Parseprod�childi�label��t�childi folseti��

� else t�prod � prod

if Next�Lookahead�oldptr�� then

� Substitute old token by new�

�� Traverse yield of t�

�� else

�� folset� �� folset � Follow�prod�child��label��

�� Parseprod�child��label�t folset���

�� Traverse yield of t�

�� Remove excess subtrees�

end�

the next input token belongs to� If one of the productions is textable�it is chosen as the last possible one� If no appropriate production isfound� there is an error in the input�

The chosen production is expanded in one of the two following ways�

�� If the tree node n is newly created or if the old production ex�panded at n is di�erent from the new one� node n is changed torepresent the new production� The subtrees of the node are val�idated to represent the new production by calling the functionscorresponding to the right hand side symbols of the production�

��

The pointer oldptr is advanced beyond the yield of the tree t� Ifthe old production had more symbols on the right hand side thanthe new production� the parser deletes all the excess subtrees ofthe tree node�

�� If there is no change in the production expanded at tree node n�there are two possible cases�

a The lookahead symbol of oldptr is the same as the next inputtoken� Then the leftmost branch of the tree will not change�but it is possible that some node within the subtree needsmodi�cation� Therefore� the parser traverses the yield of theentire subtree Traverse yield of tree�tree��

b The lookahead symbol of oldptr is di�erent from the nexteditor token� This indicates that there is a change in someproduction in the leftmost branch of the subtree� Anotherpossibility is that the nonterminal X is textable� The parsercalls the function corresponding to the �rst symbol on theright hand side of the production� Any other modi�cations inthe other subtrees of the tree are taken care of by traversingthe yield of the rest of the tree Traverse yield of tree�tree��

The parse functions are given a follow set as a parameter� There aretwo reasons for this� First� because the Text nonterminal is context�sensitive the parser must know when to stop parsing a certain Textinstance� The text is scanned until one of the terminals in the followset is encountered see also the ParseText function below� Second� evenif there is no error recovery included at the moment� there are plansto include it later see Section ���� The ParseX function is given asAlgorithm ��

Parsec�tree� followset�� There is a function Parsectree� followset for ev�ery constant string c �� �� The function inserts a node� if noneexists� If the next input token is not c� we have an error in the input�Otherwise newptr is advanced to the character the next input tokenfollowing the scanned token and the constant string is added as an at�tribute to the node� All excess children of the tree are deleted� Thepointer oldptr is advanced beyond the yield of the old subtree t� TheParsec function is given as Algorithm ��

��

Algorithm � �Incremental top�down parsing of constant string�

procedure Parsec�t� tree node� followset� strings��

Input� Tree node t�

Output� Parsed tree with label String and

constant string as frontier�

Task� Read input and form string until a symbol

in followset is detected� If next symbol

differs from constant string then error�

begin

� if t � nil then

� t �� insert tree node with label String�

� else

� Delete children of t�

� if Next�c� then

� t�string �� Scan until�followset��

� else

� Error�

Advance oldptr beyond yield of old tree�t��

end�

ParseText�tree� followset�� There is a function ParseTexttree� followsetfor parsing variable strings� This is an extension to the technique ofMurching et al� �MPS���� If there is no leaf node corresponding to thevariable string� the parser inserts one� Then the input text is scanneduntil one of the strings in followset is discovered� The scanned textstring is added as a child to the Text node cf� Parsec� The ParseTextfunction is given as Algorithm ��

Parse��tree�� The function Parse�tree di�ers from Parsec in the followingrespect� If newptr is pointing to an epsilon input token� only the linksare updated� Otherwise a new epsilon token is inserted to the left ofnewptr� The Parse� function is given as Algorithm ��

Some other functions needed in the process are brie�y explained below�

��

Algorithm �Incremental top�down parsing of variable string�

procedure ParseText�t� tree node� followset� strings��

Input� Tree node t�

Output� Parse tree with root label Text and

variable string as frontier�

Task� Read input and form string until a symbol

in followset is detected�

begin

� if t � nil then

� Insert tree node with label Text�

� else

� Delete the children of t�

� str �� Scan until�followset��

� Add str as an attribute to t with label String�

� Advance oldptr beyond yield of old tree�t��

end�

Advance oldptr moves oldptr forward to the next parse tree token in theparse tree� If there are no tokens� oldptr is set to nil�

Advance oldptr beyond yield of old tree�t� moves oldptr to the parsetoken following subtree t�

Advance newptr moves newptr forwards in the input text� If the followinginput token is an old epsilon token� the parser sets newptr to point tothis token� otherwise it is set to point to the following token the nextvisible character that follows the token that was scanned when newptrpointed to the previous token�

ClimbUpTree�oldptr� �nds a suitable start node for the reparsing� Thefunction goes along the link from the parse tree token oldptr is pointingto and follows the ancestors upwards of this node until it �nds a nodethat is not the leftmost child of its parent or until it reaches the rootnode�

Algorithm � �Incremental top�down parsing of empty string�

procedure Parse��t� tree node��

Input� Tree node t�

Output� Parse tree with root label Eps and

one epsilon child�

Task� Insert epsilon child at t�

begin

� if t � nil then

� t �� create tree node with label Epsilon�

� Attach t to parent node�

� else

� Delete children of t�

� if Symbol�newptr� � � then

� Update links from token to tree�

� Advance newptr�

else

� Insert new epsilon token to the left

�� of newptr and update links�

end�

Lexical analysis within the parser is important when dealing with variablegrammars� Variable strings can be checked and read by scanning the in�put until a certain string is encountered with the functions Next until andScan until� We need the following lexical functions�

Lookahead�oldptr� returns the contents of the terminal node oldptr ispointing to� If it is an epsilon node� the contents of the followingnon empty leaf node is returned�

Symbol�oldptr� returns the token string that oldptr is pointing to� Ifoldptr is pointing to an epsilon token� this token is returned�

Symbol�newptr� returns the token string that newptr is pointing to� Ifnewptr is pointing to an epsilon token� this token is returned�

��

Next�Str� checks if the string starting at newptr starts with the string Str�Does not move newptr�

Scan�Str� scans the string Str in the input and moves newptr past the stringStr�

Next until�Strset� reads the input until a string in the set Strset is en�countered� The pointer newptr is not moved�

Scan until�Strset� reads the input until a string in the set Strset is en�countered� The pointer newptr is moved past the scanned string�

For an example of incremental parsing� see Section ����� A part of thisincremental parser has been implemented� See Section � for further details�

��� Iterations

In this section we study how to update iterations e�ciently� An iteration pro�duction S� ���� contains an iteration part � that can be repeated when theproduction is expanded� Consequently� an iteration production is expandedat an iteration node� which as its children gets iteration subtrees�

The algorithms in Section ����� do not handle iterations e�ciently� Whenan iteration subtree is inserted or deleted at an iteration node� the incremen�tal parser reparses all subsequent iteration subtrees� No subsequent iterationsubtree is locally synchronized because oldptr and newptr are out of phase�

In practice� we mostly handle iteration productions of type S � A��where the iteration part consists of one nonterminal� Every other iterationproduction can be rewritten in this form� We give three solutions to theproblem of updating an iteration node�

First� consider the insertion or deletion of an iteration part as a changeof production� The incremental parser reparses all subtrees with changedproductions� thereby reparsing all iteration subtrees� The extra work makesthis approach less e�cient than the method used by the incremental parserin the previous section�

Second� let the parser reparse all iteration subtrees after the �rst deletedor inserted one� The parser produces new subtrees or deletes old subtrees atthe end of the iteration node� The parser also knows the iteration part andhow to build new iteration subtrees�

��

Third� let the parser reparse only the changed� inserted or deleted itera�tion subtrees� We take a closer look on this approach� which is more optimalthan the other two�

���� A change in an iteration subtree

The incremental parser presented in Section ����� handles all changes in theparse tree e�ciently� A modi�ed iteration subtree is updated according tothe changes in the input� The parser continues until it reaches a terminationnode� A change in an iteration subtree can� however� a�ect the rest of thetree� and the rest of the input might have to be completely reparsed�

���� An insertiondeletion of an iteration part

Consider �rst the deletion of an iteration part� The old input tokens thatconstituted the deleted iteration part have been tagged� When the inputtokens are removed� the links must remain� The parser �nds the nodes ofthe deleted iteration parts and removes them from the parse tree� Whenreaching an unchanged iteration part input token� the parser notices alocal synchronization� If there are no more modi�ed input tokens� parsingcan terminate�

When an instance of an iteration part has been inserted into the text�there is no tagged text marking the iteration subtree in the parse tree� There�fore the lack of links tells the parser that there is an insertion� When theinsertion is parsed� the parser has to know what the iteration part is� Forexample� it could call the function of the parse function of the iteration pro�duction with a parameter telling from which child to start parsing� Theparser continues until a termination node is reached� The termination nodeis not reached as long as inserted text is parsed�

���� Insertions� deletions and changes combined

The update of a change in an iteration subtree di�ers from the update of aninsertion or a deletion� We now combine these two modi�cations�

The following is a suggestion for a solution� The parser starts reparsingat the �rst modi�ed input token� By following links from the input to theparse tree tokens� the parser can deduce what subtree to construct� As long

��

as there are tagged input tokens� the parser changes and updates the parsetree� If new text has been inserted� the parser constructs a new subtree� If theconstruct is to be an iteration subtree� it is inserted� Otherwise the subtree isinserted and a corresponding structure is removed from the tree� If the parserencounters deleted input tokens only links remaining the associated tokenssubtrees are removed� The parser continues until it reaches a terminationnode�

�� Error recovery

Error recovery is not considered in this thesis� We have� however� includeda follow set as a parameter for each parse function� The follow set of a parsefunction for a nonterminal contains the tokens that can possibly follow thenonterminal in any derivation� This makes it easy to include some form oferror handling� E�g�� if a correct �rst symbol is not encountered in the inputtext� the parser skips all tokens until a follow set symbol is reached� For thiskind of error handling see� e�g�� �WM��� pages �� �� ��

��� Complexity

Determining the complexity of the incremental parser is hard� A simplemodi�cation can have far�reaching e�ects on the parse tree� and many nodesthat do not change might have to be traversed� We will now use the tree editdistances de�ned in Section � to measure the complexity of the incrementalparser at least in some cases�

First� consider the time complexity of the parse�from�scratch algorithmgiven as Algorithm �� Let t be the old parse tree corresponding to the in�put x and t� the new parse tree corresponding to the modi�ed input x�� Webase the complexity of the algorithm not solely on the changes in input themodi�cation but also on the changes in the output the parse tree� Theparse�from�scratch algorithm disregards the old parse tree t� This tree must�however� in any real application be destroyed to deallocate space� Destroyingthe tree demands work clearly related to the size jtj of the tree� The parse�from�scratch algorithm reconstructs the entire parse tree� The algorithm isrecursive� traversing and building one node at a time� The work done corre�

��

sponds to the size jt�j of the resulting tree t�� Additionally� the entire input isreread� The input is inherent in the new tree t� so we can assume that recon�structing the parse tree has a time complexity of Ojt�j� Clearly� it cannotbe constructed in less time� Combining this result with the destruction ofthe old parse tree gives us a time complexity of Ojtj� jt�j�

When using the parse�from�scratch algorithm� the parse tree does nothave to carry information about productions expanded at nodes� If we as�sume a simple implementation of a parse tree t where every node has infor�mation about � its label� � its �rst child� its right sibling� and �its parent� we would need space of size � jtj� We conclude that the spacecomplexity of the parse�from�scratch algorithm is Ojtj� jxj� where jtj is thesize of the parse tree t and jxj is the size of the input� We assume that thegrammar� �rst� follow� and dirsym sets� etc�� must be stored in any case andwe do not take them into account here�

Second� let us study the complexity of the incremental parser� Assume�rst that the modi�cation in the input is a simple one that only changes oneinput token� Let t be the old parse tree corresponding to the input x � vwz

and t� the updated parse tree corresponding to the modi�ed input x� � vw�z�The incremental parser reparses the modi�ed input token and subsequenttokens until it reaches a termination node� Even a simple modi�cation cana�ect all succeeding input tokens and the structure of the tree� In the bestcase� however� the tree is not a�ected except in the modi�ed parse token� Theupdate work corresponds to reading the modi�ed input token and changingone parse tree token� resulting in a time cost of Ojw�j� Assuming the mod�i�ed input token shorter than a constant� the work can be done in constanttime�

In the worst case� however� all input tokens and parse tree tokens suc�ceeding the modi�ed one in addition to the structure of the tree might bea�ected by the simple modi�cation� In this case� the rest of the input mustbe reread� and the �rest� of the tree� i�e�� all nodes that are to the right ofthe path from the �rst modi�ed parse token to the root� might have to betraversed� We get a time complexity of Ojw�zj for reading the rest of theinput and Oparsedistt � t � for traversing the rest of the tree� We thereforeconclude that the time complexity of the incremental parser in the worst caseis Ojw �z j� parsedistt � t ��

Now let us consider arbitrary modi�cations� These consist of a sequence ofsimple modi�cations� Each simple modi�cation is updated corresponding to

��

the time complexity given above� Our incremental parser� however� traversesthe entire tree between the �rst and the last modi�cation� Therefore thetime complexity of an incremental update can be greater than the sum ofthe complexities of the individual simple modi�cations� This makes it hard toestimate the time complexity� but the lower bound will be Oparsedistt � t ��jw �j� In the next section we give some enhancements on the incrementalalgorithm that result in a better time complexity�

We see that in either case� processing simple modi�cations or arbitrarymodi�cations� the update process is not dependent only on the size of thechanges in the input input tokens or output parse tree� We thereforeconclude that the incremental parser is unbounded�

The incremental parser requires some additional information to be main�tained in the parse tree� For example� it has to know which production hasbeen expanded at a certain node� Furthermore� input tokens are linked toparse tokens and vice versa� These additions require an extra space com�pared with the original tree of jtj� We conclude that the extra spaceneeded is Omaxjtj� jt�j�

Our conclusions are summarized in Observation ��

Observation � Let G be a context�free textable grammar� x � vwz a stringof the language LG� and t the corresponding parse tree� Let x� � vw�z bea string of the language LG� The incremental parser Algorithms � �executes correctly producing the updated parse tree t� over G correspondingto the string x�� If the modi�cation w� is simple� the time complexity is inthe best case Ojw�j and in the worst case Oparsedistt � t � � jw �z j� Anarbitrary modi�cation gives a time complexity of Ojw�j in the best case� Inthe worst case the time complexity is Ojtj� jt�j� jw�zj� the same as in theparse�from�scratch technique� �

���� Improving the incremental parser

The algorithms given by Murching et al� �MPS��� and extended in Sec�tion ����� are not optimal� The incremental parser starts at the �rst modi�edinput token and continues until it has passed the last modi�ed input tokenand reached a termination node� Input tokens in between are not reparsedif they are unmodi�ed� still the algorithm traverses all these tokens togetherwith associated parse tree tokens checking if they have been modi�ed or not�

��

Therefore� a natural enhancement is to give as input to the algorithma list of simple modi�cations list of changes� i�e�� a list of modi�ed inputtokens� When one simple modi�cation has been reparsed and the parsetree updated� the algorithm continues with the next modi�cation withoutunnecessary checking of unmodi�ed editor tokens in between�

As explained earlier� a modi�cation can have a far�reaching in�uence� Wetherefore demand that the reparsing process reaches a local termination nodebefore skipping to the next simple modi�cation� A local termination node isa node where parsing is locally synchronized and where no ancestor� brother�or right ancestor of the node has changed its production or contents� Localtermination nodes are similar to termination nodes with the exception thatwe do not demand that the parser has passed the last modi�ed input token�

Hence� the parser reparses only modi�ed parts of the input string andskips the unmodi�ed parts� For every modi�ed part� if the production ex�panded at a node changes� the subtree at the node is reparsed� Therefore� wecan expect a time complexity close to Oparsedistt � t � � jlist of changesjin the worst� where t and t� are the old and the updated parse tree� respec�tively� In the best case� where the modi�ed input tokens only a�ect the asso�ciated parse tree tokens we have a time complexity of Ojlist of changesj �length of modi�ed tokens� where length of modified tokens indicates thesum of the length of modi�ed input tokens�

A second enhancement is to take into account unchanged subtrees of anode with a changed production� This corresponds to a time complexity ofOpredistt � t �� We leave this as an open problem�

We conclude our observations about the enhancement of the incrementalparser in Observation ��

Observation � Let G be a context�free textable grammar� x � vwz a stringof the language LG� and t the corresponding parse tree� Let x� � vw�z bea string of the language LG� Let list of changes be a list of modi�edinput tokens� where the modi�cations transform w into w�� Then� the in�cremental parser Algorithms � � executes correctly producing the updatedparse tree t� over G corresponding to the string x�� An arbitrary modi��cation requires a time complexity of Oparsedistt � t � � jlist of changesjworst case� In the best case� the time complexity is Ojlist of changesj �length of modi�ed tokens� �

��

���� An example of incremental parsing

Example �� Assume that G is a grammar G � N�T� P� List� where N �fList� Publ� Author� Title� Journal� Yearg� T � all possible ascii strings�and let the productions and the First� Follow and Dirsym sets be as follows�

Prod First Follow Dirsym TxbleList � Publ� T feotg T yesPubl � Author � �� T T � feotg T yes

Title ����JournalYear ���

Author � Text T n f���g f���g T yesTitle � Text T n f���g f���g T yesJournal � Text T n f����� ��g f����� ��g T yesYear � �

f���g f���gf���g no

Year � ���� Text f���g no

The terminal eot stands for end of text� The column Txble indicates if theproduction is textable or not�

Assume that we have the text

x � Fletcher �Murder��Police�������

which has been parsed and we are changing it to

x� � Fletcher �Murder��Crook�

Both texts are consistent with the grammar G� We denote v � Fletcher �Murder�� w � Police������� z � � and w� � Crook� For the string x wehave the parse tree in Figure ���� This parse tree has been produced fromscratch� The bottom row represents the input token list� These tokens arelinked with the parse tree tokens� The parse tree is up to date� and all theassociated tokens are locally synchronized� the associated tokens representthe same constant or variable string�

Now we modify the input tokens� We substitute the journal name Policeby Crook and delete the comma and the year� Before the parsing is startedwe have the tree and the token list as in Figure ���� The inserted string

��

List

Publicationshhhhhh

������� � � � � �Publication

�����������

��������

���HHHH

XXXXXXX

hhhhhhhhhhhhAuthor Title Journal Year

AFletcher � Murder �� Police ������ �

� � � � � � � �

Fletcher � Murder �� Police �� ���� �

Figure ���� An instance of a parse tree and the corresponding editor tokens�

Crook is not yet an input token and has no associated parse tree token�The links corresponding to the tokens Police� ���� and ���� are still therebut the input tokens are empty�

The incremental parser sets oldptr and newptr to point to Last�v� amongthe input tokens and parse tokens� respectively� The pointer oldptr is set tothe parse tree token ���� and the pointer newptr is set to the input token ��Figure ����

After setting the pointers� the parser calls the function Traverse�oldptrnewptr� to reparse the modi�ed part� Because newptr and oldptr are pointingto the same symbol and there are no epsilons involved� the pointers areadvanced� oldptr is set to point to the token Police in the parse tree andnewptr is set to point to the �rst character of the string Crook i�e�� �C��

The parser goes to line �� in the Traverse function� The function Climb�UpTree�oldptr� returns the node with the label Journal� The parser sets tto point to this node and calls ParseJournal�t Follow�Journal��� SubstituteJournal for X in Algorithm ��

The function ParseJournal�t Follow�Journal�� does not create a new node�because we already have a node with the label Journal� Instead we look for

��

List

Publicationshhhhhh

������� � � � � �Publication

�����������

��������

���HHHH

XXXXXXX

hhhhhhhhhhhhAuthor Title Journal Year

AFletcher � Murder �� Police������ �

� � � � � � � �

Fletcher � Murder �� Crook �

newptr�

oldptr

���

Figure ���� A parse tree and modi�ed editor tokens�

the proper production with the left hand side Journal to expand� Thereis only one production for Journal and its Dirsym set approves the char�acter C� This is� of course� the same production that was expanded whenthe input token Police was parsed� so the parser goes to line � of theParseJournal function� Because the text has changed� it calls the functionParseText�tchild� followset�� The parser scans the new string Crook un�til one of the strings in the follow set is encountered and attaches this stringto the node� This moves newptr to the �nal input token � � Now the parsermoves oldptr past the yield of the old subtree which was only the parsetree token Police� The pointer oldptr is therefore set to point to the commatoken ���� The new scanned text is made into an input token� which replacesthe old token Figure ����

The loop in the Traverse function has now been executed once� Thepointer oldptr points to the comma token ��� and the lookahead symbol ofoldptr is ���� But the input text starts with a period� The parser calls thefunction ClimbUpTree�oldptr� and sets t to point to the node with the labelYear� Thereafter the parser calls the function ParseYear�t Follow�Year���

List

Publicationshhhhhh

������� � � � � �Publication

�����������

��������

���HHHH

XXXXXXX

hhhhhhhhhhhhAuthor Title Journal Year

AFletcher � Murder �� Crook ������ �

� � � � � � � �

Fletcher � Murder �� Crook �

oldptr

JJ�

newptr�

Figure ���� A partly updated parse tree and corresponding editor tokens�

List

Publicationshhhhhh

������� � � � � �Publication

�����������

��������

���HHHH

XXXXXXX

hhhhhhhhhhhhAuthor Title Journal Year

Fletcher � Murder �� Crook � �

� � � � � � �

Fletcher � Murder �� Crook � �

Figure ����� An updated parse tree and corresponding editor tokens�

��

The node exists� so no new one is created� We see that the productionthat is expanded this time is a new one� The parser chooses the produc�tion that derives the empty string because the following input charactersmatch the Dirsym set of this production� The parser calls the productionParse�t �child� �

Here the parser inserts an epsilon input token into the text and linksit with the tree� The parser returns to the Traverse function and skipsTraverse yield of tree�tree� and removes the excess child the second of thenode with the label Year� The associated input tokens are also removed� Thepointer oldptr is then advanced beyond the yield of the tree i�e�� to pointto the period token ���� Now both pointers have been moved beyond thechange� the parsing is locally synchronized and oldptr points to a possibletermination node� Therefore� parsing terminates Figure �����

We can compare the amount of work done by the incremental parser withthe work done by a parser that reparses the entire parse tree� In this exam�ple� the incremental parser calls � parse functions ParseJournal � ParseText �ParseYear � Parse�� There is one call to the Traverse function and two callsto the ClimbUpTree function� The parser parse�from�scratch would call � parse functions only for the Publication subtree ParsePublication � ParseAuthor �etc�� Of course� the parser would not be limited to this subtree but it wouldtraverse the entire parse tree� In this case� the advantage of the incrementalparser is obvious� �

��

Chapter �

Incremental syntax�directed

translation

We have studied syntax�directed translations in Section �� A string of the in�put language is translated to a string of the output language� Where parsinguses only one grammar for constructing the parse tree of a string� syntax�directed translation uses two grammars� an input and an output grammar�The translate�from�scratch technique retranslates the input string after it hasbeen modi�ed� Incremental syntax�directed translating reuses the old transla�tion� updating the output string where needed� Because the input and outputstrings can be viewed as parse trees� we also use the term incremental treetranslation�

��� Incremental translations

We start by de�ning what we mean with an incremental translation� LetF � N����� R� S be a syntax�directed translation schema and let x be astring of the input language L� and y a string of the output language L�such that x� y is a translation form of F � Let tx and ty be the correspondingparse trees�

Assume that the string x and hence its parse tree tx are modi�ed givinga new string x� and a parse tree t�x� We can for simplicity assume that thetree is updated through incremental parsing� but this is not essential for thefollowing discussion�

��

tx

t�x

ty

t�y

�incremental�modi�cation

translate

incremental

translation

incrementalmodi�cation

������

AAAAAA

������

AAAAAA

������

AAAAAA

������

AAAAAA

��AA

��AA

p p p p p

p p p p

p p p

p p

p

p p p p p

p p p p

p p p

p p

p

p p p p p

p p p p p

p p p p p

modi�ed part

Figure ���� A scenario for incremental syntax�directed translation�

An incremental translation updates the tree ty by retranslating modi�edsubtrees in the tree t�x modi�ed with respect to the tree tx and substitutesthe corresponding subtrees in the tree ty with the retranslated trees seeFigure ���� The goal is to do minimal work with respect to the update� Aswe shall see� this is not always possible�

����� Alternative approaches of retranslation

We give three alternatives for incrementally retranslating a tree� In the �rstalternative we start retranslating the tree t�x at its �rst changed node inpreorder �rst edit change� The node can be found top�down by �ndingthe �rst changed subtree or bottom�up by going upwards in the tree fromthe �rst edit change� The translation proceeds until it has passed the lastedit change the last changed subtree and reaches a termination node seebelow�

In the second alternative� the roots of the changed subtrees in tree tx aregiven as input to the incremental translation� The list is traversed in left toright order retranslating changed subtrees� This is a more e�cient method

��

than the �rst alternative� but demands additional parameters�The third alternative is to combine incremental tree translating with in�

cremental parsing or another incremental tree translation� As soon as asubtree in the input tree changes� it is retranslated and the correspondingsubtree in the output tree is substituted�

����� Relation to incremental parsing

Incremental translation is clearly related to incremental top�down parsing�The relation to incremental parsing is explained through the following cor�respondences�

Concept Incremental parsing Incremental tree translation

Modi�cation Change in string �� Change in input tree ��Change in tree Change in output tree

Method Top�down parsing Top�down translationInput String with changes Input tree with changes

First and last edit First and last editor list of changes or list of changes

Parse grammar Input and output grammarOld parse tree Old output tree

Output Updated parse tree Updated output tree

For a presentation of related work of incremental parsing see Section ��

��� Data structure support

In order to do the retranslation e�ciently� we need some information of theassociated nodes in the input tree tx and its corresponding output tree ty�Two nodes n and m in tx and ty� respectively are associated� if the resultof translating the subtree rooted at n is the subtree rooted at m�

We assume that we can always �nd the associated node m of n� This canbe implemented in two ways� We can connect n with m with a physical linkreference� This is very time e�cient� but demands an extra attribute ofevery node in the input tree�

��

The associated node can also be found by traversing� First �nd the parentn� of n and the associated nodem� of n�� Then traverse the children ofm� untilthe associated node m of n is found� If the SDTS is simple� i�e�� nonterminalsin the input and output productions occur in the same order� the traversalis easy� Let the label of n be l� Now there might be several nodes among thesiblings of n with label l� Assume that n is the ith node among siblings withlabel l i�e�� there are i� � siblings of n with label l to the left of n� Thenm is the ith l�labeled node among the children of m���

In our approach� we demand that iteration productions are simple and ofthe type I � A�� Then the ith subtree of an iteration always corresponds tothe ith subtree of the associated subtree in the output tree� The nonterminalsof other productions can be permuted� except for nonterminals with the samelabel� these always occur in the same order between themselves� Therefore�the associated node can be found as above in the case of a simple SDTS�

The traversal approach trades space for time� Let k be the maximumlength of an input�output production pair� Then the time complexity ofthe traversal is clearly Ok� for the production in question� Iterationsare traversed di�erently� see below� The link approach demands an extraattribute for every node� but an associated node is accessed in one step�

��� Termination conditions

As in incremental parsing� we de�ne here some termination conditions forincremental tree translation�

Assume that a node n is a termination node in an incremental update of atree and that the tree is used as an input tree in an incremental translation�Let the associated node of n in the output tree be n�� It is not certain�however� that the associated node n� is a termination node for the incrementaltree translation see Example ��� As a matter of fact� if n is a constant stringthere is no associated node in the output tree� But if we translate subtreesof the input tree in left to right order until we reach a termination node� all

�If the nonterminals can occur in di�erent order in the input and output productions�the grammars must be considered when looking for the associated node� We assume� how�ever� that if there are several occurrences of a nonterminal in two associated productionsin an SDTS� the occurrences are associated in the order they appear� Compare with thede�nition of a translation form on page �

��

modi�ed subtrees in the input tree will be correctly translated� as long asthe production expanded at the parent node of the termination node has notchanged�

Example �� Let F be a syntax�directed translation schema with the pro�duction S �� ABC�CBA� Assume that we have instances of the input andoutput strings where the corresponding trees are tx and ty Figure ����

����

AAAA

tx

S

A B C

����

AAAA

ty

S

C B A

translation �

Figure ���� Two corresponding trees of a translation form�

If node B in the input tree tx is a termination node when updating theinput tree� the corresponding node cannot be considered as a terminationnode in the tree ty� then the node A would not be translated and the nodeC would be translated unnecessarily� If we� however� translate the nodes intree tx in left to right order until we reach a termination node in the inputtree� all subtrees are correctly translated� �

If the production at the parent of a termination node changes in theinput tree� the entire subtree with the parent as root must sometimes beretranslated We give the following clarifying example�

Example �� Let F be a syntax�directed translation schema with produc�tions S �� ABC�ABC a and S �� DBC�DBC b� where a and b areconstant strings� Assume that we have instances of the input and outputstrings where the corresponding trees are tx and ty Figure �� �

Let node B in the input tree tx be a termination node� This means thatthe node B remains the same after the modi�cation� Even if retranslations

��

����

AAAA

t�x

S

D B C

����

AAAA

HHHH

tx

S

A B C a

����

AAAAHHHH

t�y

S

D B C b

incremental�translation

����

AAAA

ty

S

A B C

translation�

Figure �� � A change of production during retranslation�

of all subtrees preceding B are done� the resulting output tree is incorrect ifthe subtree rooted at S is not retranslated� The production at node S haschanged� thereby changing constant strings� �

We state our observations as a lemma compare with Lemma � on page ���

Lemma � Let F � N����� R� S be a syntax�directed translation schema�Let x� y be a translation form of F with tx and ty as the corresponding parsetrees� Assume a left to right top�down incremental tree translation order�Assume also that the string x is modi�ed to x�� The incremental retranslatorcan stop retranslating when the following termination conditions hold�

a All nodes with new expanded productions have beenretranslated�

b The retranslation has passed a termination node in theinput tree�

Proof We brie�y sketch the proof� It is obvious from the example abovethat the retranslator must process all nodes with a new expanded produc�

��

tion� Otherwise� consistency of the resulting tree with the input may notbe maintained� On the other hand� the contents of a leaf node might havechanged without changing the expanded production at an ancestor node�Therefore condition b corresponds to conditions a and c in Lemma � en�suring that the translator has retranslated all the modi�ed nodes� �

If we want to set a termination node m for the output tree it can bedetermined as follows� Let n be the termination node of the input tree� If noancestor of n in the input tree has changed its production� m is the associatednode of n or the associated node of the nearest nonterminal node followingn in preorder if n does not have an associated node�

If some ancestor of n has changed its production� we search for the oldestsuch ancestor� The termination node m is the associated node of the �rstnonterminal node following the ancestor in preorder�

��� Iterations

We have so far handled only changes changed contents or changes of ex�panded productions in the input tree� not insertions or deletions� These canoccur when the input and output grammar contain iterations�

Iterations are more di�cult to retranslate� we must know where to insertor delete an iteration part� and how to permute the nonterminals in it� Ifan iteration part is inserted to or deleted from a subtree� we can consider theproduction to have changed at the root and retranslate the entire subtree�This solution is still better than the transformation�from�scratch method�but we might do a lot of extra work retranslating iteration parts that havenot changed�

We restrict the iteration productions to the type I �� A�� where theiteration part consists of a single nonterminal or a constant string� Otheriterations can easily be rewritten in this form by adding more productions�

����� Deletions

Deletions in the input tree are easily updated in the output tree� But asthe deleted subtrees are no longer present in the input tree� the incrementaltranslation needs a separate list of deleted nodes� All deleted subtrees havean associated node in the output tree which is deleted�

��

In the implementation� deleted nodes can be removed from the iteration�marked as deleted and maintained as separate trees� This guarantees thatthey can be given as parameters to the incremental translation algorithm�

����� Insertions

Insertions are more di�cult to handle than deletions� When retranslatingchanged and new subtrees� new nodes belonging to iterations have no asso�ciated nodes in the old output tree� Because iteration productions are of thesimple type I �� A�� there is no problem with permutations� A new nodein the input tree is translated top�down and the resulting subtree is insertedto the right of the associated node of the left brother of the new node� If thenew node has no left brother� the result is inserted as the �rst child of theiteration node in the output tree�

��� A naive algorithm

We give an algorithm that follows the second alternative approach of transla�tion Algorithm ��� p� ��� The algorithm is given a list of modi�ed inserted�deleted� or changed subtrees to retranslate� a subtree has been modi�ed ifthe production applied at its root has been changed� or if its contents in thecase of a Text node has changed� Only non�overlapping modi�ed subtreesare included in the list� modi�ed subtrees contained in another modi�ed sub�tree are not included� E�g�� if the production applied at the root of the inputtree has changed� the list contains only one element� the root�

The algorithm traverses the list� handling one modi�ed subtree at a time�If the subtree t has been deleted and it is part of an iteration it is alsodeleted from the output tree� If it has been inserted� an associated node ofthe root of the subtree must be inserted in the output tree� The insertionpoint is determined as follows� First �nd the left brother of the root n of tand its associated node m in the output tree� Then insert a new node m� tothe right of m� If there is no left brother of n� insert m� as the �rst child inthe iteration� Node m� is the root of a subtree t� corresponding to t� Nodesare only marked as deleted or inserted if they are roots of subtrees in aniteration� Inserted new nodes are translated top�down�

If t was neither deleted nor inserted� it must have been modi�ed new

Algorithm �� �Naive incremental syntax�directed translation�

procedure inc translate�list of changes tx ty��

Input� A modified input tree tx a list of nonover�

lapping nonredundant modified subtrees in tx and

the old output tree ty�

Output� Updated output tree ty�

Task� Propagate modifications through tree�

begin

� while list of changes not empty do

� t �� next modified subtree�

� if t deleted then

� Remove associated node t��

� Mark t� deleted

� elsif t inserted then

� Insert an associated node t��

� Mark t� inserted�

Top�down translate t at t��

� else t is modified

�� Top�down translate t�

�� Replace associated node t� in ty�

�� Mark t� modified�

end�

production or modi�ed contents� The subtree t is translated top�down andthe result replaces the old associated tree t��

Inserted and modi�ed nodes are marked accordingly� All these nodes mustbe collected in a list of all modi�ed iteration parts� this makes it possible touse the output tree as an input tree in another incremental translation��

The modi�ed subtrees of the input list are completely retranslated� Thisis due to the translation�from�scratch algorithm which is called to translatethe modi�ed tree top�down� A tree is considered to be modi�ed when the

�This is actually the case� when di�erent document representations are updated in theHST system� A view tree is retranslated to update the logical tree� which is retranslatedto update other view trees� etc� see Section ���

��

production expanded at the root is changed� Therefore� if the productionexpanded at the root has changed� the input tree is completely retranslated�But even if a new production is applied at the root� some of the subtreesmay remain unchanged see Example �� on page ����

����� Complexity

All subtrees with a changed production are retranslated� Therefore the workdone by his algorithm is proportional to parsedisttx � t �x � parsedistty� t �y�We traverse the modi�ed subtrees in tree t�x and translate them to subtrees inthe tree t�y� The method that translates the tree from scratch demands timeproportional to the sum of the size of trees t�x and t�y� which is the complexityin the worst case of Algorithm ���

Compared with translation from scratch� some extra space is needed forboth the input and output trees� The algorithm reads constructs a list ofmodi�ed subtrees� For every modi�ed subtree on the list we need to knowif it has been deleted� inserted� or modi�ed� The references to the subtreescan be pointers� The incremental translation therefore demands extra spaceof size � jlist of changesj� An upper bound on the list of changes is thesize of the input output tree� The algorithm needs links assuming thisimplementation for associated nodes� This extra space is proportional tothe size of the input tree�

Our conclusions are summarized in Observation �

Observation � Let tree ty be a translation of the tree tx� let t�x be an updateof the tree tx and list of changes a nonredundant covering list of modi�edsubtrees in tx� Then Algorithm �� executes correctly producing the updatedversion of the output tree in time Oparsedisttx � t �x � parsedistty� t �y andextra space Ojtxj� jlist of changesj� �

��� An enhanced algorithm

Algorithm �� is not optimal� When the production at a node changes� itis still possible that parts of the subtree rooted at the node remain intact�The incremental tree translation Algorithm �� retranslates all subtrees withchanged productions in order to maintain consistency�

��

Algorithm �� �Enhanced incremental syntax�directed translation�

procedure inc translate��tx ty��

Input� Modified input tree tx where all new and modified

nodes have been marked and old output tree ty�

Output� Updated output tree ty�

Task� Propagate modifications through tree�

begin

� Let n be the root of tx�

� Let W �� � be the production expanded at n and

� let W �� � be the annotated production�

Assume � consists of symbols X��X�� �Xk�

� Let m be the root of ty�

� Reorder insert and delete children of m so that

their labels form � � X�X� Xk�

� Let J � fj�� j�� � � � � jpg be the permutation order

where p is the number of permuted nonterminals

� Let n�� n�� � � � � nl be the children of n and

m��m�� � � �mk the children of m�

� for all marked nonterminal children ni of n do

inc translate� �ni mji��

� mark mji�

end�

Consider Example �� on page ��� Because the production at node S isreplaced in the input tree� Algorithm �� retranslates the entire tree� To bemore e�cient� the algorithm could delete old constant strings and insert newones in the output tree and then retranslate subtrees until a termination nodein the input tree is reached� In the example� the constant string a could bereplaced with the string b� and as B is a termination node� only the subtreeat D should be retranslated�

We give an enhanced incremental translation algorithm Algorithm ��which uses a more e�cient method of change propagation� This new al�gorithm can be called instead of the translation�from�scratch procedure inAlgorithm ��� The algorithm can also be executed with only the input and

��

output trees as parameters� A precondition of the algorithm is that all mod�i�ed subtrees nodes in the input tree are marked�

The algorithm starts by �nding the production expanded at the root oftree tx and its associated production for the tree ty� Thereafter it reorders�deletes� and inserts nodes to the root of the output tree so that the labels ofthe nodes form the right hand side of the output production� Old nodes arereused as much as possible even constant strings� As most usable produc�tions contain less than a constant number of children iterations excluded�this reordering is not too time consuming�

If� however� the production is an iteration of type I � A�� we mayassume that references to deleted� inserted� and modi�ed nodes are kept in alist associated with their parent� The time saved might very well compensatefor the extra space used�

Modi�ed nodes are marked correspondingly and incrementally retrans�lated� in Algorithm �� they were translated from scratch� Deleted or insertednodes do not have to be marked except in iterations� the translator noticesthem to be missing or present and can take appropriate actions�

����� Complexity

Algorithm �� retranslates only modi�ed subtrees� In order to determinewhich subtrees to translate� all the children of the root of a modi�ed subtreeare checked� Considering that checking for changed children takes much lesstime than retranslating a subtree� the time complexity is Olog jtxjpredisttx�t�x � log jtyjpredistty� t�y� where the logarithmic factor stands for the pathfrom the root to the modi�ed node� In any practical case� the number ofchildren is limited iterations excluded and the checking can be done inconstant time� Iterations are handled in a di�erent way by associating a listwith deleted� inserted and changed nodes with the root of the iteration inthe input tree� These nodes can then be processed under the same timecomplexity�

The logarithmic factor disappears if the algorithm is given a list of allmodi�ed� inserted� or deleted nodes as input�

Algorithm �� demands more space than the naive algorithm� For boththe input tree and the output tree we need an attribute for each node tellingwhether it has been inserted or changed or if it is unmodi�ed� Deleted nodeshave been removed or are being removed from the tree and do not need to be

��

marked� Furthermore� the list at iteration nodes containing inserted� deleted�and changed nodes demands a number of references to these nodes� limitedby the size of the tree� The link implementation between associated nodesdemands an extra space of the size of the input tree� Therefore we concludethat the extra space needed is at most jtxj�� jty j or Ojtxj� jtyj� wheretx and ty are the input and output tree� respectively�

Our conclusions are summarized in Observation ��

Observation � Let tree ty be a translation of the tree tx� let t�x be an up�date of the tree tx� and assume that changed or inserted nodes in tx havebeen marked� Assume also that iterations have been �marked� with a listcontaining all modi�ed nodes� Then Algorithm �� executes correctly produc�ing the updated version of the output tree in time Olog jtxjpredisttx� t�x �log jtyjpredistty� t�y and extra space Ojtxj� jtyj� �

��� An example of incremental tree transla�

tion

We give a small example of incremental tree translation as a successive stepto an incremental update incremental parse or incremental translation ofthe input tree tx�

Example �� Let F be an SDTS with the following productions� Note thetwo di�erent representations of the Publication production� A blank space isdenoted by ��

Grammar � List � Publications � PublicationsPublications � Publication� � Publication�

Publication � Author � �� Title ���� Journal Year ��� �Author Title Journal Year

Publication � �Author �� Author ���Title �� Title ����Title Author

Author � Text � TextTitle � Text � TextJournal � Text � TextYear � ���� Text � Text

��

Listtx

Publicationshhhhhh

������� � � � � �Publication

�����������

��������

���HHHH

XXXXXXX

hhhhhhhhhhhhAuthor Title Journal Year

A��Text Text Text Text

Fletcher � Murder �� Police ������ �

Figure ���� A parse tree of an input string�

A syntax�directed translation from scratch of the tree in Figure ��� pro�duces the tree in Figure ����

A modi�cation in the view tree� e�g�� changing the parse tree tokens

Fletcher �Murder��Police�������

to

Author �Fletcher��Title � Crook�

invokes the incremental parser to produce the corresponding modi�cations inthe view tree Figure ���� The period is the termination node of the reparse�No node after it is changed in the update of the view tree�

Now we invoke the incremental translator� The naive Algorithm �� isgiven a list of modi�ed subtrees not contained in each other consisting ofone subtree� the subtree at Publication� Because the production expanded atthe root of this subtree was changed� the entire subtree is retranslated seeFigure ���� This corresponds to removing all � old nodes in the Publicationsubtree of tree ty Figure ��� and inserting � new nodes in the same subtreeFigure ���� all together �� �node modi�cations� parsedistty� t �y � �� �

��

Listty

Publicationshhhhhh

������� � � Publication

XXXX

Publication��������������HHHH

XXXXXXXAuthor Title Journal Year

A

Author

Text Text Text Text Text

Fletcher Murder Police ���� � � �

Figure ���� The output string of a translation as a tree�

In retranslating the subtree t�x it must be traversed� This corresponds totraversing � nodes in the subtree at Publication�

If we want to set a termination node for the updated parse tree� it is setto the Author node in the following subtree Figure ����

Let us assume that we invoke Algorithm �� with the subtree at the Pub�lication node as parameter� We also assume that modi�ed nodes have beenmarked� in this case the Publication and Title nodes have been marked aschanged� Additionally� two nodes have been deleted Journal and Year�The incremental translator starts by processing the changed node Publica�tion� The translator checks the children and removes the associated nodeswhich were deleted in the input tree� Then no nodes are inserted� and theremaining nodes are reordered as in Figure ���� The node Author is notretranslated because it has not been marked� The node Title has been mod�i�ed� and is retranslated� After this the termination node the period in theinput tree has been reached and the incremental tree translation is termi�nated� We see that the incremental translator removes � nodes� reorders �nodes and retranslates nodes the subtree beginning at Title� All in allthere are �� �node modi�cations��

���

Listt�x

Publicationshhhhhh

������� � � � � �Publication

�������������

��QQQ

XXXXXXX

hhhhhhhhhhhhAuthor Title

Text Text

Author � Fletcher ��Title � Crook �

Figure ���� A parse tree of a modi�ed input string�

t�y List

Publicationshhhhhh

������� � � Publication

HHH

Publication���HHHH

Title Author Author

Text Text Text

Crook Fletcher � � �

Figure ���� A parse tree of a retranslated output string�

���

The termination node of the output tree is set to the Author node� whichis the node following the last changed subtree the second Author node inFigure ���� The result of the retranslation is of course the same as the resultof the translation�from�scratch Figure ����

���

Chapter

Lazy syntax�directed

translation

Lazy syntax�directed translation transforms a part of a document instance toa view� The subtrees in the part are all transformed forming a lazy view ofthe document� A lazy translation di�ers from pruning a document� A prunedview of a document contains only certain logical parts� e�g�� all section titles�Lazy views and pruned views are both special cases of partial views� A generalpartial view can contain any transformed parts of a document instance� Inthis section we present a formalism and technique for lazy translation of adocument instance� Lazy translation is combined with incremental updates�

��� A lazy view grammar

Lazy translation transforms a part of the document instance into a view� LetG be a grammar and Ga its annotation� We denote by T G and T Ga thesets of all possible parse trees overG and Ga� Let t � T G be a parse tree fora document� We say that a lazily transformed document is a tree w � T W �where W is a suitable grammar we explain W later� Transforming a partof a document instance is a function T W � R � T W � where R is a setof ranges that indicate where the transforming should start and how muchof the document should be transformed�

We demand the grammarW to be such that the lazily transformed tree wcan contain subtrees according to the grammar G� We furthermore demand

��

that T G � T W and T Ga � T W � Let

pG � AG � B�B� � � �Bn

be a production of G and let

pGa� A � b�Bi�b�Bi�b� � � �bnBinbn��

be the corresponding annotated production of Ga� A lazy production of pGand pGa

for a lazy view grammar is a production

pW � A � b�Bi�b�Bi�b� � � �bnBinbn�� j AG�

a combination of pG and pGa� Each bi is a terminal string or empty� and the

indices i�� � � � � in are a permutation of the sequence �� � � � � n�Let t � T G be a document instance� We say that the parse tree w �

T W is a lazily transformed representation lazy view of t if for each noden � w we have that the production of Ga corresponding to n has beenapplied� or the transfer production referring to G has been applied and thesubtree n�t� for n is the corresponding subtree of t i�e�� the two trees sharethe subtree� Technically� we implement a transfer production as a link fromthe lazy view tree to the logical tree�

Example �� Let grammar G contain productions P � fSG � AG BG CG�AG � a� BG � b� CG � cg and let its annotated grammar Ga contain thecorresponding productions Pa � fSGa

� CGa��� BGa

��� AGa� AGa

� a� BGa

� b� CGa� cg� The lazy grammar has productions PW � fSGa

� CGa���

BGa��� AGa

j SG� AGa� a j AG� BGa

� b j BG� CGa� c j CGg � P � In

Figure ��� we have parse tree t over grammar G and a lazy view t� of t� Thesubtrees with roots CG and BG have been transformed� The subtree withroot AG has not been transformed� Instead the transfer production has beenexpanded in the lazy view t�� In practice� we implement this by a special link from the node AGa

to the associated node AG in tree t see Figure ����

The grammarW is similar to the view grammar Ga� but every productionin Ga is extended with a nonterminal that transfers the production into G�A tree w over T W can be partially transformed� but those parts that are

���

t t�

SG����

HHHH

SGa

����

��

HHHHAG BG CG CGa

BGaAGa

a b c c � b �

Figure ���� A parse tree and its lazy view�

not transformed are directly linked to the document tree� Every node in aview tree is then transformed or linked to the document tree� A transformednode can be the root of a subtree that has been totally transformed� Atransformed node can also have children that have not been transformed�

��� Iterations

We modify the iteration productions of a lazy view grammar for better e��ciency� According to the de�nition of a lazy production given above� the rootof an iteration subtree must be totally transformed all children present� orlinked to the associated iteration node in the document tree� This violatesthe lazy principle� Obviously� we want to transform only those iteration sub�trees that are needed� As a solution we give a slightly di�erent lazy iterationproduction� Assume that we have the production

pG � AG � B�

G

and its annotated version

pGa� AGa

� BGa����

with a semicolon delimiting each BGa� We have the lazy production

���

SW�

���

����

eeee

aaaaa

BW� BW �rAG

��

��

����

eeee

bbbbb

BG BG BG BG� � �

not transformed

Figure ���� A lazy view of an iteration�

pW � AW � BW ���� AG j �

If we have an iteration in n and the transfer production has been applied�the subtree n�t� beginning at n is the combination of the corresponding sub�tree in t and a predicate � that indicates those nodes of the iteration thathave been transformed�

Example �� In Figure ���� we have a lazy view tree� where the �rst twosubtrees BG of an iteration have been transformed� The subtree for AG ispart of the document t� The predicate � tells how much of the iteration hasbeen transformed� �

��� Continuous transformation

Transforming more of a lazily transformed document is a function T W � R� T W � where R is the set of ranges� Assume that w � T W is a lazy viewof the document� When applying the lazy translator� we choose some noden � w� which is indicated by the range r � R and where the transfer produc�tion has been applied� and call the appropriate translation procedures� Theresult is a lazily transformed tree� Algorithm �� performs a lazy translation�

���

Algorithm �� �Lazy translation�

procedure Lazy translate�t r��

Input� Lazy view tree t and range r�

Output� Lazy view tree�

Task� Transform t in range r�

begin

� while in range r do

� n �� next node in preorder�

� if transfer production nG applied then

� Translate nG�

� Replace n with result�

end�

We denote the range r � R by a triple n� ch#� amount� where n is anode in the lazy parse tree and ch# is the ordinal number of a characterin the frontier of the subtree with the node n as a root� This characteris the �rst character that is part of the result of the transformation� Thevariable amount indicates how much how many characters the result shouldcomprise� The amount can also be negative indicating that the direction ofthe transformation should be from right to left in the input tree�� This waywe can specify a forward or backward lazy transformation� Obviously� it isnot su�cient to specify the starting point by simply giving a node� a Textnode can contain an arbitrarily long text�

We can compare the lazy translator with the incremental translator� Ev�ery untransformed subtree in the lazy view tree can be considered as a mod�i�ed subtree� The lazy translator only retranslates such subtrees� The in�cremental translator �nds associated nodes in the input and output treesthrough links� The lazy translator uses transfer productions for �nding as�sociated nodes�

The complexity of Algorithm �� can be compared with the complexity ofthe incremental translator Algorithm ��� The lazy translator transforms

�This can be done by changing line �� in Algorithm � on page �� to for i � n�size to

� do� i�e�� by processing the children of a node in opposite order

���

only untransformed subtrees� Still� the time complexity depends much on thegiven range r� If the range also includes transformed parts of the view tree�the lazy translator traverses these in order to �nd untransformed subtrees�In this case the lazy translator is not bounded by� e�g�� the measures parsedistor predist� but only by the size of the range r� If the range r is carefully cho�sen� the algorithm will traverse only untransformed parts of the view tree�When it �nds an untransformed node� it traverses the associated subtreeof the input tree and performs the transformation� Let t�� t�� � � � � tm be thenonoverlapping subtrees in the range r� The subtrees can be transformed�untransformed� or partially transformed� Let the corresponding transformedsubtrees be t��� t

�� � � � � t�

m� We have that tj � t�j for some � j m trans�formed subtrees and tk �� t�k for some k untransformed� partially trans�formed� We conclude that the time complexity of the lazy translator liesbetween Ojt�j� � jtmj and Ojt�j� � jtmj� jt��j � jt

mj� wherejtij denotes the size of tree ti� �

If we improve the time complexity by associating with every node in theview tree the status of the node transformed� untransformed� or partiallytransformed� the lazy translator does not have to traverse transformed sub�trees and the time complexity is Opredistt� � t �� � � predisttm� t �m�where predistti� t �i � � if ti � t�i�

��� Combining lazy translation and incre�

mental updates

We combine lazy translation with incremental updates� Assume that theuser is modifying a large document� Assume also that editor space is lim�ited� When the user opens the document� only the �rst part of the documentis transformed and loaded into the editor� Subsequent parts are transformedand loaded when the user moves to other parts of the document� Conse�quently� some parts must then be removed from the editor to make space�If the user has made modi�cations in some parts� these must be parsed andthe document tree must be updated� All updates are done incrementally�

The processing algorithm is sketched as Algorithm � � A portion of textis either loaded or saved� If it is saved� the textual view is �rst parsed toupdate the view tree� The view tree is translated to the document tree� If

���

Algorithm �� �Lazy incremental document preparation�

procedure Process�act t r��

Input� act� load or save view tree t range r�

Task� Transform and load more of a lazy view or

save and update a part propagate changes

through representations�

begin

� if action � save and view modified then

� Incrementally parse view �� view tree t�

� Incrementally translate

view tree t �� logical tree tl�

� if other views open then

� Incrementally translate

logical tree tl �� view trees�

� Update views�

� elseif action � load then

� if no more space then

Process�save t rold��

� Remove parts of view�

�� Lazy translate�t r��

�� Update view�

end�

other views are open� the document tree is translated to di�erent view treesand the corresponding textual views are updated� If a portion of a textualview is loaded� actions are �rst taken possibly to save loaded portions� Theseportions are saved� and their position in the view tree t is indicated by a rangeparameter that has been updated for every load� When there is su�cientroom in the editor the lazy translator can proceed and transform some partof the document� The textual view is updated with the result�

���

��� Examples of lazy transformation

We look at a larger example of lazy transformation� We concentrate onlazy transformation� not on incremental updates� Examples of incrementalparsing and translation can be found in Sections � and ��

Example �� Assume we have the following context�free grammar Gram�mar � and its annotated version Grammar � for modeling the structure ofthe document�

Grammar � Grammar �Document grammar View grammar

List � Publs List � PublsPubls � Publ$ Publs � Publ$Publ � Authors Publ � Authors � ��

Title Title ����Journal Journal � ��Volume VolumeYear ���� Year �����Pages Pages ���

Authors � Author Author$ Authors � Author ���� Author$Author � Initials Name Author � Initials ��� NameInitials � Text Initials � TextName � Text Name � TextTitle � Text Title � TextJournal � Text Journal � TextVolume � Text Volume � TextYear � Text Year � TextPages � Start End Pages � Start ���� EndStart � Text Start � TextEnd � Text End � Text

A part of a document tree might look like shown in Figure �� � The treeonly contains one instance of Publication

If the entire document tree in Figure �� is translated we get an annotatedview tree which is shown in Figure ����

���

List

Publs

Publ

Authors Title JournalVolume Year Pages

��

������

PPPPPP

������������

hhhhhhhhhhhh

Author

Initials Name

Start End�� HH

�� HH

Text Text

Text Text Text Text

Text Text

J� Fay Me Life � � � �

Figure �� � An subtree of a logical tree�

The view that is loaded into the editor is

J��Fay �Me��Life �������������

Assume that this text does not �t totally in the editor what an editor%�Only some characters can be loaded at a time� The lazy translator starts withthe document tree in Figure �� and transforms the �rst part of it� Assumethat the editor can only load �� characters at a time� Then this reference toan article really is too long�

Before calling the lazy translator� we must specify the range parameterr� Obviously� because the document is not open we want to transform the�rst part of it� and the range is set to r � Publ#� �� ��� where Publ# isa link to the root of the subtree at Publ� the number � indicates that theresult should start with the leftmost character of the frontier of the subtree�and the number �� indicates how many characters should be in the result atmost�

���

List

Publs

Publ

Authors Title JournalVolume Year Pages

��

������

PPPPPP

������������

������������

Author

Initials Name

Start End�� HH

�� HH

Text Text

Text Text Text Text

Text Text

J�� Fay Me Life � � � ��

��������

����

�� �

HHHH

��

XXXXXXXX

��� ��

hhhhhhhhhhhhhhhh

J�

Fay

Me

Life

��

��

���

��

Figure ���� A totally computed view subtree�

The lazy translator does not �nd a view tree so �rst it calls TranslateListsee Algorithm � on page �� TranslateList calls Translate Publs and so onuntil the translator has a view tree that contains about �� characters i�e�� atleast �� characters� but if the last node transformed is omitted� there shouldbe less than �� characters� We get a lazy view tree like the one in Figure ����Only the �rst part of the document tree has been transformed� We see thatthe node Authors� the string � ��� the node Title and the string ���� togethercontain �� characters they will �t into the editor� But if also the node Jour�nal is included we get too many characters� Consequently� the rest of theview tree is not yet computed� Instead some nodes have transfer productionsdenoted by r and they are linked to the associated nodes in the docu�

���

List

Publs

Publ

Authors Title JournalVolume Year Pages

��

������

PPPPPP

������������

������������

Author

Initials Name

�� HH

p

p

p

r

p

p

p

r

p

p

p

r

p

p

p

r

Text Text

Text

J�� Fay Me�

��������

����

�� �

HHHH

��

XXXXXXXX

���

hhhhhhhhhhhhhhhh

J�

Fay

Me

��

��

���

Figure ���� A lazily computed view tree� phase one�

ment tree� The range parameter is updated to r � Journal#� � � �� � whereJournal# stands for a link to the Journal node� The string �� charactersloaded into the editor is

J��Fay �Me��

The lazy translator knows that the last node that was loaded was thenode with the constant string ����� The nodes Journal� Volume� Year andPages remain untransformed in addition to the nodes List� Publs� and Publpartly untransformed� The rest of the nodes are totally transformed�

We now try to load more of the view into the editor� The lazy translatorstarts with the node Journal� It is not yet transformed� so the lazy translator

��

List

Publs

Publ

Authors Title JournalVolume Year Pages

��

������

PPPPPP

������������

������������

Author

Initials Name

�� HH

Text Text

Text Text Text Text

J�� Fay Me Life � ��

��������

����

�� �

HHHH

��

XXXXXXXX

���

p

p

p

r

hhhhhhhhhhhhhhhh

J�

Fay

Me

Life

��

��

���

Figure ���� A lazily computed view subtree� phase two�

transforms it and continues� taking into account the constant strings� It hasto go as far as the terminal string following the node Year to get a new stringthat contains exactly �� characters� The node Pages still remains untrans�formed� The range parameter is again updated to r � Publ#� � � �� � Thestring that is the result of this translation is

Life ������

The corresponding view tree is shown in Figure ���� The next load bringsthe rest of the text into the editor� The rest of the text is

�������

���

The totally transformed subtree is shown in Figure ���� The range param�eter would now be r � Publ#� �� � �� � At this moment the entire view istransformed� If the user does not modify the view� new parts of the view caneasily be loaded� If the user chooses to move backwards in the document� theamount parameter in the range is set to a negative value and the transformedview is loaded from a certain position in the tree�

We have not considered here incremental updates as we saw several ex�amples of these in Sections � and �� �

���

Chapter

The HST approach and

implementation

The Helsinki Structured Text Database System HST is an environment forpreparing structured documents� The system keeps di�erent representationsof a document� plain text in an editor� a view tree� and a document tree� Inthis section we brie�y present the incremental parser that has been developed�We also give a brief discussion on problems related to the implementation ofthe incremental and lazy translators�

�� The HST system

The Helsinki Structured Text Database System �KLMN��� was developedduring a three years period starting in ����� The project was part of theFINSOFT software technology program under the Technology DevelopmentCenter TEKES and the work was supported by TEKES and the Academyof Finland� We presented the functional principles in Section � Figure ���shows the user interface of the system� In the �gure� we see two views of adocument containing references from a bibliography database�

The system was developed in C under Sun�s OpenWindows environment�Sun���� For developing the user interface we used the XView �Hel��� macropackage� We also used a relational database system as storage for the struc�tured documents�

We divided the HST system into two parts� the user interface and the

���

Figure ���� An overview of the user interface of the HST system�

PQLM process� The user interface Figure ��� lets the user convenientlymanage di�erent views of a document� The PQLM� process handles parsing�transformations and the connection to the database see Figure ����

User interface ��PQLM�parsingtransformationdatabase

Figure ���� The two parts of HST�

�PQLM stands for �P�string Query Language Machine�� an implementation for the p�string query language PQL we constructed �KLM���b based on the p�string language ofGonnet and Tompa �GT��

���

�� The incremental parser

We have built a simple incremental parser following the speci�cation givenin Section �� Some restrictions were made� For simplicity� we excluded itera�tions� We did not integrate the incremental parser with the entire HST sys�tem but only with the PQLM process� We started by writing an incrementalparser for a certain view grammar and then continued by developing a parsergenerator� The generator consists of about ���� lines of code in the PQL lan�guage� which is a procedural tree manipulation language �KLM���b�� Thegenerator produces incremental parse functions for context�free grammarswith textable productions�

�� Test results

We have tested the incremental parser and compared its performance withparsing from scratch� We have also compared it with the original interpretingparser� used in HST� The results show that the incremental parser is fasterthan parsing from scratch� It should be noted though� that we use theincremental parser to parse from scratch by forcing it to parse also unmodi�edinput tokens� The incremental parser also seems to be slightly faster thanthe interpreting parser� The interpreting parser consists of ��� lines of PQLcode whereas the generated parser for this example consists of ��� lines ofPQL code� The tests were run on a Sun Sparc station SLC�

We �rst tested the parsers on the documents and grammars of Exam�ple ��� where we had a very small document� We have the text

Fletcher Murder� Police� �����

and change it to

Fletcher Murder� Crook�

We parsed the �rst text with the interpreting parser and the incrementalparser from scratch� Then we made the modi�cation and parsed the modi��cation with the interpretive parser and the incremental parser from scratch

�An interpreting parser is a generic parser that takes a grammar and a text as input�The text is parsed according to the grammar� No parse functions are generated� See�Nik�� for details�

���

as well as incrementally only the modi�ed part with the incremental parser�The results are shown in Table ��

Table �Interpreting Incremental Incrementalparser parser parser�from scratch��ms �from scratch��ms �incrementally��ms

First text ��� �� Modi�cation � ��

The documents were too small � and �� bytes to give reliable resultsbut the tendency shows that the incremental parser performs well� The factthat the interpreting parser performs di�erently on the two strings may bedue to the fact that the modi�ed string is shorter and therefore parsing ittakes less time�

In the next test we used the same grammar� but enlarged the documentsto about � kB� The author�s name was now about � kB and the rest ofthe documents were the same as in the case above� We also used the samemodi�cation in the �rst document to create the second text� The results areshown in Table ��

Table �Interpreting Incremental Incrementalparser parser parser�from scratch��ms �from scratch��ms �incrementally��ms

First text ���� ���� Modi�cation ���� ���� ��

Here� it is clear that the parser loses time when it is obliged to read theentire text about � kB instead of just the modi�cation� The results alsoshow that the parser with generated functions the incremental parser isfaster than the interpreting parser� In our opinion this clearly shows thatthe incremental parser is a valuable extension to HST�

���

�� The lazy translator

There are several problems connected with the implementation of lazy trans�lation� We want lazy translation to be invisible to the user� This meansthat the user prepares a document without knowing that only parts of it areloaded� Consequently� parts must be loaded and saved automatically whenthe user moves in the document� Edit operations like move� search� etc�� af�fect this process� Another problem is the granularity of the document parts�How much should the translator transform at a time� This problem arisesespecially with textable productions� Take as an example the production S� Text� A corresponding parse tree can contain a Text node with an ar�bitrarily long string� The lazy translator must be able to divide the stringduring transformation�

�� Incremental translation

There are also several problems connected with the implementation of anincremental translator� Incremental translation is heavily based on associ�ated nodes and links between them� It is easy to link a view tree to thecorresponding document tree� But a document tree can have several corre�sponding view trees� A solution might be to link associated nodes in viewtrees and the document tree circularly� A second problem is the update ofthe node status� A modi�cation can mark a node inserted� deleted� changed�or unchanged� All nodes in a tree cannot be marked this way after eachmodi�cation� If the modi�cations are propagated further to other trees� thestatus mark must be maintained during the propagation�

�� Discussion

As mentioned� we have only implemented a part of the incremental parser�The incremental translator and the lazy translator have not been imple�mented� The HST system is based on a procedural query language PQL�which has been implemented earlier �KLM���b�� Our intention was to im�plement searches� parsing� translations� etc�� of a document in this language�In fact some research has been done on structured searches �Kil���� However�

���

programming with the PQL language proved to be di�cult� Only a subsetof the features of a conventional procedural language was implemented inthe PQL language� This makes it hard to implement certain data structures�This is also a reason for the large size of the incremental parser generator�

�� Future work

We are at the moment thinking of rebuilding the entire system from scratch�possibly moving it to other environments than the Sun&OpenWindows envi�ronment� In this case incrementality and laziness could be part of the newsystem� No de�nite decisions have been made�

���

Chapter �

Conclusion

We have studied some incremental and lazy techniques customized for thepreparation of structured documents�

Incremental updates have an advantage over non�incremental updates� wehave shown that they are faster� However� all problems are not incrementalin nature� Incremental algorithms demand extra space and maintenance ofauxiliary data structures� Closely related to the incremental paradigm is thelazy paradigm� We also expect lazy algorithms to be faster than algorithmsthat compute the entire solution�

Parsing and syntax�directed translation are two areas where we can useincremental and lazy algorithms� We have extended an incremental parsingalgorithm by Murching et al� �MPS��� to handle structured texts�

We divide incremental parsing into three subproblems� First� the startingposition of the parse tree update must be determined� Our algorithm �ndsthe starting position by comparing the �rst modi�ed input token with theold parse tree�

The second problem is the method used when reparsing� We have giventwo di�erent solutions� The �rst algorithm starts reparsing at the �rst mod�i�ed input position and continues until it has passed the last modi�ed inputposition� It also reparses or traverses input tokens and parse subtrees thathave not been modi�ed� We have shown that time complexity does not de�pend on the size of the modi�cations� Still� it is bounded by the size ofthe input and the size of the parse tree� We have also shown that in thecase of a simple modi�cation� the time complexity is bounded by the size ofthe modi�cation and its update� Our enhancement of the parsing algorithm

���

traverses only modi�ed parse subtrees� The input and the old parse treecan be consistent with each other between modi�cations in the input� Wehave discussed how the algorithm could be implemented to skip such localsynchronizations� Under the assumption that the algorithm is given a list ofmodi�ed input tokens as a parameter this latter algorithm has a lower timecomplexity than the former�

The third subproblem is the termination of a reparse� We have shownthat a modi�cation can have a far�reaching in�uence on the parse tree� Asimple modi�cation can restructure the entire parse tree from the �rst mod�i�ed token to the last token in the tree� We have given two terminationconditions� when the conditions hold� and the last modi�ed input token hasbeen parsed� the incremental parser can stop reparsing� The �rst conditioncaptures synchronizations between input and the parse tree� Input tokensthat have already been parsed before are not reparsed� if they are in theright position� The second condition restricts the development of the parsetree structure� If the structure of parse tree is not modi�ed beyond a certainnode� this node can be considered as a possible termination point� The top�down incremental parser is aware of the modi�cations in the tree structureduring traversal from the root towards the leaves� and any modi�cation on ahigher level in the tree is always noticed before reaching a lower level�

Incremental translation is closely related to incremental parsing� We havepresented an incremental syntax�directed translator that e�ciently updatesan output parse tree due to modi�cations in an input parse tree� A naivetranslator transforms all modi�ed input subtrees� The algorithm is givenas input a list of non�overlapping modi�ed subtrees� thereby correspondingto the enhanced version of the incremental parser� We have also presentedan enhancement to the incremental translator� A modi�ed input subtree cancontain unmodi�ed parts� Our enchanced algorithm is optimal in the respectthat it retransforms only modi�ed parts�

The start position of the incremental translator is the �rst modi�ed sub�tree� i�e�� the �rst subtree in the list given to the translator� The terminationposition can be determined from the termination position of the input tree�When the input tree is updated by incremental parsing or translation� a ter�mination position is determined� This termination node has a correspondingposition in the output tree not necessarily its associated node�

We have presented a lazy syntax�directed translator that transforms onlypart of an input tree� Applied on structured document preparation� the

��

algorithm computes lazy views of document instances� Lazy and incrementaltechniques combine very well� A lazy view is updated incrementally� Wehave presented a scenario of lazy incremental document preparation� In theprocess� parts of a document instance are automatically transformed intolazy views� which are incrementally updated when needed�

Finally� we have discussed the implementation and test results of a simpleincremental parser and some implementation problems of the incrementaltranslator and the lazy translator in the structured text database systemHST�

���

Bibliography

�AD� � Rakesh Agrawal and Keith D� Detro� An e�cient incrementalLR parser for grammars with epsilon productions� Acta Infor�matica� ���� �� ��� September ��� �

�Ado��a� Adobe Systems Corporated� PostScript Language ReferenceManual� Addison�Wesley� June �����

�Ado��b� Adobe Systems Corporated� PostScript Language Tutorial andCookbook� Addison�Wesley� July �����

�AFQ��a� J� Andr'e� Richard Furuta� and Vincent Quint� By way of anintroduction� Structured documents� What and why� In J� An�dr'e� R� Furuta� and V� Quint� editors� Structured documents�The Cambridge Series on Electronic Publishing� pages � ��Cambridge University Press� �����

�AFQ��b� J� Andr'e� Richard Furuta� and Vincent Quint� editors� Struc�tured documents� The Cambridge Series on Electronic Publish�ing� Cambridge University Press� �����

�AHR���� Bowen Alpern� Roger Hoover� Barry K� Rosen� Peter F� Swee�ney� and Kenneth Zadeck� Keeping priorities straight� An inves�tigation of incremental algorithms� Technical Report CS����� �Brown University� November �����

�AHU��� Alfred V� Aho� John E� Hopcroft� and Je�rey D� Ullman� DataStructures and Algorithms� Addison�Wesley� April �����

�Alb��� Henk Alblas� Incremental attribute evaluation� In H� Alblasand B� Melichor� editors� Proceedings of the International Sum�

���

mer School on Attribute Grammars Applications and Systems�SAGA� Prague� number ��� in Lecture Notes in ComputerScience� pages ��� � � Springer�Verlag� June �����

�And��� J� Andr'e� Manipulation de documents� bibliographie� TSI �Techniques et Sciences Informatiques� ��� � ��� Jul Aug�����

�App��� Wolfgang Appelt� Document Architecture in Open Systems TheODA Standard� Springer�Verlag� May �����

�Ase��� Paul John Asente� Editing Graphical Objects Using ProceduralRepresentations� PhD thesis� Technical report No CSL�TR���� � � Computer Systems Laboratory� Stanford University� Octo�ber �����

�ASU��� Alfred A� Aho� Ravi Sethi� and Je�rey D� Ullman� CompilersPrinciples Techniques and Tools� Addison�Wesley� �����

�AU��� Alfred V� Aho and Je�rey D� Ullman� The Theory of ParsingTranslating and Compiling� volume I� Prentice�Hall� �����

�Bar��� David Barron� Why use SGML� Electronic Publishing� ��� ��� April �����

�Bha��� M� Afzal Bhatti� Incremental execution environment� UK Com�puting Laboratory Report No ��� University of Kent at Canter�bury� June �����

�BPR��� A� Michael Berman� Marvin C� Paull� and Barbara G� Ryder�Proving relative lower bounds for incremental algorithms� ActaInformatica� ������� �� � July �����

�BR��� F� Bancilhon and P� Richard� Managing texts and facts in amixed data base environment� In G� Gardarin and E� Gelenbe�editors� New Applications of Databases� pages �� ���� Aca�demic Press� �����

�Bro��� Kenneth P� Brooks� A two�view document editor with user�de�nable document structure� Technical Report No � DigitalSystems Research Center� November �����

���

�Bro��� Heather Brown� Standards for structured documents� The Com�puter Journal� ������ ���� December �����

�Bro��� Kenneth P� Brooks� Lilac� A two�view document editor� IEEEComputer� ����� ��� June �����

�CBG���� Donald D� Chamberlin� O� P� Bertrand� M� J� Goodfellow� J� C�King� Donald R� Slutz� Stephen J� P� Todd� and Bradford W�Wade� JANUS� An interactive document formatter based ondeclarative tags� IBM Systems Journal� �� ���� ���� �����

�CCH��� Pehong Chen� John Coker� and Michael A� Harrison� TheVORTEX document preparation environment� In TEX for Scien�ti�c Documentation Second European Conference StrasbourgFrance� pages �� ��� June �����

�Cel��� A� Celentano� Incremental LR parsers� Acta Informatica���� �� ��� �����

�CH��� Pehong Chen and Michael A� Harrison� Multiple representationdocument development� IEEE Computer� ������ �� January�����

�Cha��� Donald Chamberlin� An adaptation of data�ow methods forWYSIWYG document processing� In Proceedings of the ACMConference on Document Processing Systems Santa Fe NewMexico� pages ��� ���� December �����

�Cha��� Donald D� Chamberlin� Managing properties in a system of co�operating editors� In Richard Furura� editor� EP�� � Proceedingsof the International Conference on Electronic Publishing Doc�ument Manipulation and Typography Gaithersburg Maryland�The Cambridge Series on Electronic Publishing� pages � ���Cambridge University Press� September �����

�Che��� Pehong Chen� A Multiple�representation Paradigm for Docu�ment Development� PhD thesis� Report No UCB&CSD ��&� ��Computer Science Division� University of California� Berkeley�July �����

���

�CHL���� Donald D� Chamberlin� Helmut F� Hasselmeier� Allen W� Lu�niewski� Dieter P� Paris� Bradford W� Wade� and Mitch L� Zol�liker� Quill� An extensible system for editing documents ofmixed type� In Proceedings of the ��st Hawaii InternationalConference on System Sciences Kailu�Kona Hawaii� pages �� ��� January �����

�CHM��� Pehong Chen� Michael A� Harrison� and Ikuo Minakata� In�cremental document formatting� In Proceedings of the ACMConference on Document Processing Systems� pages � ����December �����

�Cho��� Noam Chomsky� On certain formal properties of grammars�Information and Control� ��� � ���� June �����

�CHP��� Donald D� Chamberlin� Hemut F� Hasselmeier� and Dieter P�Paris� De�ning document styles for WYSIWYG processing� InJ� C� van Vliet� editor� EP�� � Proceedings of the InternationalConference on Electronic Publishing Document Manipulationand Typography Nice France� The Cambridge Series on Elec�tronic Publishing� pages ��� � �� Cambridge University Press�April �����

�CIV��� Giovanni Coray� Rolf Ingold� and Christine Vanoirbeek� Format�ting structured documents� Batch versus interactive� In J� C�Vliet� editor� Proceedings of the International Conference onText Processing and Document Manipulation� British ComputerSociety Workshop Series� pages ��� ���� Cambridge UniversityPress� April �����

�CKS���� Donald D� Chamberlin� James C� King� Donald R� Slutz�Stephen J� P� Todd� and Bradford W� Wade� JANUS� An in�teractive system for document composition� In Proceedings ofthe ACM SIGPLAN SIGOA Symposium on Text Manipulation�pages �� ��� June ����� Proceedings also published as ACMSIGPLAN Notices ���� June �����

�DMM��� Pierpaolo Degano� Stefano Mannucci� and Bruno Mojana� E��cient incremental LR parsing for syntax�directed editors� ACM

���

Transactions on Programming Languages and Systems� ��� �� � � July �����

�DMS��� Norman M� Delisle� David E� Menicosy� and Mayer D� Schwartz�Viewing a programming environment as a single tool� ACMSIGPLAN Notices� ������ ��� May �����

�DRT��� Alan Demers� Thomas Reps� and Tim Teitelbaum� Incrementalevaluation for attribute grammars with application to syntax�directed editors� In Proceedings of the �th Annual ACM Sympo�sium on Principles of Programming Languages WilliamsburgVirginia� pages ��� ���� January �����

�FQA��� Richard Furuta� Vincent Quint� and J� Andr'e� Interactivelyediting structured documents� Electronic Publishing� ����� ��� April �����

�Fri��� Peter Fritzon� Fine�grained compilation for Pascal�like lan�guages� Research Report LiTH�MAT�R������� Link"oping In�stitute of Technology� July �����

�Fri��� Peter Fritzson� Incremental symbol processing� In D� Ham�mer� editor� Compiler Compilers and High Speed CompilationProceedings of the �nd CCHSC Workshop Berlin GDR� num�ber �� in Lecture Notes in Computer Science G� Goos and J�Hartmanis eds�� pages �� �� Springer�Verlag� October �����

�FSS��� Richard Furuta� Je�rey Sco�eld� and Alan Shaw� Document for�matting systems� Survey� concepts� and issues� ACM ComputingSurveys� �� ���� ���� September �����

�Fur��� Richard Furuta� Important papers in the history of docu�ment preparation systems� basic sources� Electronic Publishing������ ��� March �����

�GHKT��� T� Gyim'othy� T� Horv'ath� F� Kocsis� and J� Toczki� Incremen�tal algorithms in PROF�LP� In D� Hammer� editor� CompilerCompilers and High Speed Compilation Proceedings of the �ndCCHSC Workshop Berlin GDR� number �� in Lecture Notes

���

in Computer Science G� Goos and J� Hartmanis eds�� pages� ���� Springer�Verlag� October �����

�GM��� Carlo Ghezzi and Dino Mandrioli� Incremental parsing� ACMTransactions on Programming Languages and Systems� ����� ��� July �����

�GM��� Carlo Ghezzi and Dino Mandrioli� Augmenting parsers to sup�port incrementality� Journal of the Association for ComputingMachinery� �� ���� ���� July �����

�GT��� Gaston H� Gonnet and Frank Wm� Tompa� Mind your gram�mar� A new approach to modelling text� In Peter M� Stocker�WilliamKent� and Peter Hammersley� editors� Proceedings of theThirteenth International Conference on Very Large DatabasesBrighton England� pages � ��� September �����

�Hed��� G"orel Hedin� Incremental Semantic Analysis� PhD thesis� Re�port LUTEXD&TECS ��� &�����&����� Department ofComputer Science� Lund University� May �����

�Hel��� Dan Heller� XView Programming Manual for Version �� of theX Window system� volume � of The De�nitive Guides to the XWindow System� O�Reilly ( Associates� July �����

�HK��� Scott E� Hudson and Roger King� Semantic feedback in theHiggens UIMS� IEEE Transactions on Software Engineering��������� ����� August �����

�HKN��� Roy Hunter� Per Kaijser� and Frances H� Nielsen� ODA� a docu�ment architecture for open systems� Computer Communications������� ��� April �����

�HKR��� J� Heering� P� Klint� and Jan Rekers� Lazy and incrementalprogram generation� Technical Report CS�R����� Centrum voorWiskunde en Informatica CWI� November �����

�HKr��� J� Heering� P� Klint� and J� rekers� Incremental generation ofparsers� In proceedings of the SIGPLAN ��� Conference on Pro�gramming Langugae Design and Implementation Portland Ore�

� �

gon� pages ��� ���� July ����� Published as SIGPLAN Notices���� July �����

�HKR��� J� Heering� P� Klint� and J� Rekers� Lazy and incremental pro�gram generation revised version� Technical Report CS�R�����Centrum voor Wiskunde en Informatica CWI� April �����

�HKR��� J� Heering� P� Klint� and J� Rekers� Incremental generation oflexical scanners� ACM Transactions on Programming Languagesand Systems� ������� ���� October �����

�HM��� G"orel Hedin and Boris Magnusson� Incremental execution ina programming environment based on compilation� TechnicalReport LUTFD�& TFCS� ���& ����& ����� Lund University�October �����

�Hol��� Niklas Holsti� Incrementality in problem solving with comput�ers� In B� Ford and F� Chatelin� editors� Proceedings of the IFIPTC ��WG� Working Conference on Problem Solving Envi�ronments for Scienti�c Computing Sophia Antipolis France�pages �� ��� Elsevier Science Publishers� June �����

�Hol��� Niklas Holsti� Incremental MATLAB a program with an incre�mental user interface� Technical Report Report A������ � De�partment of Computer Science� University of Helsinki� March�����

�Hor��� Wolfgang Horak� O�ce document architecture and o�ce docu�ment interchange formats� Current status of international stan�dardization� IEEE Computer� ������� ��� October �����

�Hud��� Scott E� Hudson� A User Interface Management System WhichSupports Direct Manipulation� PhD thesis� University of Col�orado� �����

�Hud��� Scott E� Hudson� Incremental attribute evaluation� A �exiblealgorithm for lazy update� Technical Report TR ������ Univer�sity of Arizona� Tucson� Arizona� �����

� �

�Hud��� Scott E� Hudson� Incremental attribute evaluation� A �exiblealgorithm for lazy update� ACM Transactions on ProgrammingLanguages and Systems� � � �� ��� July �����

�JG��� Fahimeh Jalili and Jean H� Gallier� Building friendly parsers� InProceedings of the �th Annual ACM Symposium on Principlesof Programming Languages Albuquerque New Mexico� pages��� ���� January �����

�JKP��� Esa J"arnvall� Kai Koskimies� and Jukka Paakki� The design ofthe Tampere language editor TaLE� Technical Report A��������� Department of Computer Science� University of Tampere�December �����

�Jol��� Vania Jolobo�� Document representation� Concepts and stan�dards� In J Andr'e� R� Furuta� and V� Quint� editors� Struc�tured Documents� The Cambridge Series on Electronic Publish�ing� pages �� ���� Cambridge University Press� �����

�Kah��� Mark Kahrs� Implementation of an interactive programmingsystem� Sigplan Notices� ������ ��� August �����

�Kap��� S� M� Kaplan� Incremental attribute evaluation on node�labelcontrolled graphs� Technical Report UIUCDCS�R����� ��� Uni�versity of Illinois� May �����

�Kil��� Pekka Kilpel"ainen� Tree Matching Problems with Applicationsto Structured Text Databases� PhD thesis� Report A�������� De�partment of Computer Science� University of Helsinki� Novem�ber �����

�KLM���a� Pekka Kilpel"ainen� Greger Lind'en� Heikki Mannila� Erja Niku�nen� and Kari�Jouko R"aih"a� The architecture of the Helsinkistructured text database system HST� Unpublished report�Department of Computer Science� University of Helsinki� �����

�KLM���b� Pekka Kilpel"ainen� Greger Lind'en� Heikki Mannila� Erja Niku�nen� and Kari�Jouko R"aih"a� The data model and the query lan�guage of the Helsinki structured text database system HST�

� �

Unpublished report� Department of Computer Science� Univer�sity of Helsinki� �����

�KLMN��� Pekka Kilpel"ainen� Greger Lind'en� Heikki Mannila� and ErjaNikunen� A structured document database system� In RichardFuruta� editor� EP�� � Proceedings of the International Confer�ence on Electronic Publishing Document Manipulation � Ty�pography Gaithersburg Maryland� The Cambridge Series onElectronic Publishing� pages � � ���� Cambridge UniversityPress� September �����

�Knu��� Donald E� Knuth� The TEXbook� Addison�Wesley� December�����

�Kos��� Kai Koskimies� Lazy recursive descent parsing for modularlanguage implementation� Software � Practice and Experience�������� ���� August �����

�KP��� Eila Kuikka and Martti Penttonen� Designing a syntax�directedtext processing system� In Kai Koskimies and Kari�Jouko R"aih"a�editors� proceedings of the Second Symposium on ProgrammingLanguages and Software Tools Pirkkala Finland� Technical Re�port A ������� University of Tampere� pages ��� ���� August�����

�KW��� Raghu R� Karinthi and Mark Weiser� Incremental re�executionof programs� In Proceedings of the SIGPLAN ��� Symposium onInterpreters and Interpretive Techniques St Paul Minnesota�pages � ��� ACMPress� May ����� Proceedings also publishedas ACM SIGPLAN Notices� ���� � ��� July �����

�Lam��� Leslie Lamport� A Document Preparation system LaTEX User�sGuide � Reference Manual� Addison�Wesley� April �����

�Lar��� J� M� Larchev)eque� Using an LALR compiler compiler to gener�ate incremental parsers� In D� Hammer� editor� Compiler Com�pilers Proceedings of the third international Workshop CC ���Schwerin FRG� number ��� in Lecture Notes in Computer Sci�ence G� Goos and J� Hartmanis eds�� pages ��� ���� Springer�Verlag� October �����

�Lin��� G� Lindstrom� The design of parsers for incremental languageprocessors� In Proceedings of the �nd ACM Symposium on The�ory of Computing Northampton Massachusetts� pages �� ���May �����

�Loc��� K� Lock� Structuring programs for multiprogram time�sharingon�line applications� In AFIPS ��� Fall Joint Computer Con�ference �FJCC�� volume ��� pages ��� ���� American Fed�eration of Information Processing Societies AFIPS� SpartanBooks� �����

�Lun��� Allen W� Luniewski� Intent�based page modelling using blocksin the Quill document editor� In J� C� van Vliet� editor� EP�� �Proceedings of the International Conference on Electronic Pub�lishing Document Manipulation and Typography Nice France�The Cambridge Series on Electronic Publishing� pages ��� ����Cambridge University Press� April �����

�Mic��� Microsoft Corporation� Microsoft Word User�s Guide� �����

�MPS��� Arvind M� Murching� Y� V� Prasad� and Y� N� Srikant� In�cremental recursive descent parsing� Computer Languages������� ���� April �����

�MS��� Joseph M� Morris and Mayer D� Schwartz� The design of alanguage�directed editor for block�structured languages� ACMSIGPLAN Notices� ������ � June �����

�Nel��� Greg Nelson� Juno� a constraint�based graphics systems� InProceedings of SIGGRAPH �� San Fransisco� July ����� Pro�ceedings also published as ACM SIGGRAPH Computer Graph�ics �� � July �����

�Nik��� Erja Nikunen� Views in Structured Text Databases� Phil� Lic�Thesis� Report C��������� Department of Computer Science�University of Helsinki� November �����

�Qui��� Vincent Quint� Systems for manipulation of structured docu�ments� In J� Andr'e� R� Furuta� and V� Quint� editors� Struc�

� �

tured Documents� The Cambridge Series on Electronic Publish�ing� pages � ��� Cambridge University Press� �����

�QV��� Vincent Quint and Ir*ene Vatton� GRIF� An interactive systemfor structured document manipulation� In J� C� van Vliet� editor�Proceedings of the International Conference on Text Processingand Document Manipulation� British Computer Society Work�shop Series� pages ��� �� � Cambridge University Press� April�����

�QV��� Vincent Quint and Ir*ene Vatton� An abstract model for inter�active pictures� In H� J� Bullinger and B� Shackel� editors� Hu�man Computer Interaction � INTERACT ���� pages �� ����IFIP� Elsevier Science Publishers� September �����

�QVB��a� Vincent Quint� Ir*ene Vatton� and Hassan Bedor� Grif� An in�teractive environment for TEX� In Jacques D'esarm'enien� editor�TEX for Scienti�c Documentation� number � � in Lecture Notesin Computer Science G� Goos and J� Hartmanis eds�� pages��� ���� Springer�Verlag� June �����

�QVB��b� Vincent Quint� Ir*ene Vatton� and Hassan Bedor� Le syst*emeGrif� TSI � Technique et Science Informatiques� ��� � ��� July August �����

�Rei��� Brian K� Reid� A high�level approach to computer documentformatting� In Proceedings of the �th Annual ACM Symposiumon Principles of Programming Languages Las Vegas Nevada�pages �� �� January �����

�Rei��� Steven P� Reiss� An approach to incremental compilation� InM� Van Deusen� editor� Compiler Construction Proceedings ofthe ACM SIGPLAN ��� symposium� pages ��� ���� June����� Proceedings also published as ACM SIGPLAN Notices�������� ���� June �����

�Rep��� Thomas Reps� Optimal�time incremental semantic analysis forsyntax�directed editors� In �th Annual ACM Symposium on

� �

Principles of Programming Languages Albuquerque New Mex�ico Jan � � ��� pages ��� ���� January �����

�Rep��� Thomas Reps� Generating Language�Based Environments� MITPress� �����

�RR��� G� Ramalingam and Thomas Reps� On the computational com�plexity of incremental algorithms� Computer Sciences TechnicalReport #�� � University of Wisconsin�Madison� August �����

�RT��� Thomas Reps and Tim Teitelbaum� The Synthesizer GeneratorA System for Constructing Language�Based Editors� Springer�Verlag� �����

�Ryd� � Barbara G� Ryder� Incremental data �ow analysis� In ��thAnnual ACM Symposium on Principles of Programming Lan�guages Austin Texas� pages ��� ���� January ��� �

�Sal��� Arto Salomaa� Formal languages and power series� In Jan vanLeeuwen� editor� Handbook of Theoretical Computer Science�volume B� pages �� � �� Elsevier� �����

�Sal��� Airi Salminen� Rakenteisen tekstin hallinta Management ofstructured text� Technical Report OM� � Department of Com�puter science� University of Jyv"askyl"a� September ����� InFinnish�

�SDB��� Mayer D� Schwartz� Norman M� Delisle� and Vimal S� Begwani�Incremental compilation in Magpie� ACM SIGPLAN Notices�������� � �� June �����

�Sel��� Stanley M� Selkow� The tree�to�tree editing problem� Informa�tion Processing Letters� ������ ���� December �����

�Shn��� Ben Shneiderman� Designing the User Interface� Strategies forE�ective Human�Computer Interaction� Addison�Wesley� �ndedition� ����� �� pages�

�Sin��� John M� Sinclair� editor� Collins Cobuild English Language Dic�tionary� Collins� London and Glasgow� �����

� �

�Str��� Dan Str"omberg� Text editing and incremental parsing� Techni�cal Report LiTH�MAT�R���� �� Link"oping University� October�����

�Sun��� Sun Microsystems Inc� OpenWindows Version � DeskSet Envi�ronment Reference Guide� June ����� Revision A�

�SZ��� Dennis Shasha and Kaizhong Zhang� Fast algorithms for theunit cost editing distance between trees� Journal of Algorithms�������� ���� December �����

�Tai��� Kuo�Chung Tai� The tree�to�tree correction problem� Journalof the Association for Computing Machinery� �� ���� � �July �����

�Tri��� Stephen Trimberger� Combining graphics and a layout lan�guage in a single interactive system� In Proceedings of the ��thACM�IEEE Design Automation Conference Nashville Ten�nessee� pages � � � �� Computer Science Press� June �����

�Weg��� Mark Wegman� Parsing for structural editors� In ��st An�nual Symposium on Foundations of Computer Science Syra�cuse New York� pages �� ��� IEEE� October �����

�WF��� Robert A� Wagner and Michael J� Fischer� The string�to�stringcorrection problem� Journal of the ACM� ������� �� � Jan�uary �����

�WM��� Jim Welsh and Michael McKeag� Structured System Program�ming� Prentice�Hall International Series in Computer ScienceC� A� R� Hoare� series ed�� Prentice�Hall� �����

�Yeh� � Dashing Yeh� On incremental shift�reduce parsing� BIT�� �� � ��� ��� �

�YK��� Dashing Yeh and Uwe Kastens� Automatic construction of incre�mental LR��parsers� ACM SIGPLAN Notices� � � March�����

� �

�Zad��� F� Kenneth Zadeck� An e�cient algorithm for incremental data��ow analysis� Technical Report No� CS������� Brown University�February �����

�ZS��� Kaizhong Zhang and Dennis Shasha� Simple fast algorithms forthe editing distance between trees and related problems� SiamJournal of Computing� �������� ����� December �����

� �

Index

Advance newptr � Advance oldptr � ancestor ��annotated grammar ��applied production ��associated nodes ��associated nonterminals �associated tokens ��

batch�oriented systems �bidirectional link ��bottom�up parsing �bounded incremental algorithm ��

change � child of a node ��Chomsky hierarchy ��ClimbUpTree � complete parse mapping ��complete preorder mapping ��constant string context�free grammar ��� ��context�sensitive grammar ��

declarative document preparation�

deletion � � ��derivation ��derived string ��descendant ��

Dirsym �dirsym set �document �ow ��document grammar ��document instance document preparation model of HST

document preparation system �document preparation �� ��document standard ��document tree ��document ��

edit distance ��end tag ��expanded production ��

First ���rst set ��Follow ��follow set ��formatted view �formatting detail �frontier of tree ��� ��

general document structure �generated string ��grammar ��graphical view �

� �

Helsinki Structured Text DatabaseSystem HST ��

HST document model ��

incremental algorithm �incremental analysis �incremental applications �incremental compilation �� ��incremental document �ow ��incremental methods �incremental parsing �� ��incremental syntax�directed trans�

lation �� ��incremental transformation ��incremental translation ��incremental tree translation ��incremental update of an editor ��incremental update �� ��inorder ��input grammar �input token ��insertion � � ��instance of a symbol ��interactive systems �inverse transformation iteration node ��iteration part ��iteration production ��iteration subtree ��

label ��language of a grammar ��Last �last symbol �layout view �lazy applications ��lazy iteration production ���

lazy methods �lazy paradigm ��lazy production ���lazy transformation ��� ��lazy view grammar ���lazy view �� �� link ��� ��� ��LL� grammar �local synchronization ��local termination node ��locally synchronized ��logical de�nition logical part �Lookahead ��

mapping ��markup view �mindist � minimum edit distance ��� � � ��

newptr ��Next ��Next until ��node ��node label ��nonterminal ��nonterminal alphabet ��Nullable ��

occurrence of a symbol ��ODA ��oldptr ��on�line analysis �one�shot paradigm �output grammar �

ParseText ��ParseX ��

���

Parsec ��Parse� ��parsedist ��parse distance ��� ��parse mapping ��parse tree token ��parse tree ��parsing �partial view �� possible termination node ��� ��postorder ��predictive parsing �predist ��preorder distance ��� ��preorder mapping ��preorder ��presentation schema �problem instance �procedural document preparation �procedurality of documents �production ��program generation ��program view �propagation ��pruned view ��

range �� � ���recursive descent parsing �re�exive transitive closure ��regular grammar ��rewriting rule ��rewriting system ��right�linear grammar ��root ��

Scan until ��Scan ��

S�derivation ��SDTS �SDT �SGML ��sibling ��simple SDTS ��simple modi�cation � start symbol ��start tag ��structured document �structured text ��subtree ��Symbol ��syntax�directed translation scheme

�syntax�directed translation �� �

tagged input token ��terminal alphabet ��termination conditions ��� ��Textable �textable grammar textable production �textual view �� ��token class �top�down parsing �transformation transitive closure ��translation form �translation �Traverse ��tree node ��tree�to�tree correction ��

unambiguous grammar ��unbounded incremental algorithm

��

���

variable string view de�nition view grammar ��� ��view tree �attening ��view tree ��view � ��� �

WYSIWYG systems �

yield of tree ��yield relation ��

���