managing the evolution of web-based applications with webscm · 2009. 11. 10. · for web-based...

10
Managing the evolution of Web-based applications with WebSCM Tien N. Nguyen, Ethan V. Munson, and Cheng Thao Computer Science Department University of Wisconsin-Milwaukee {tien,munson,chengt}@cs.uwm.edu Abstract In traditional software engineering tools, software con- figuration management (SCM) is the dominant approach to manage the evolution of a software system. However, the evolution of Web-based applications presents special chal- lenges that have not been well addressed by existing SCM systems. They often treat hyperlinked Web documents as a set of text files in a file system and disregard crucial struc- tures in a Web-based application such as internal structure, navigational structure, logical structure, and compositional structure among Web objects. Key limitations include their inadequacy to represent Web object semantics and their inability to manage changes to those important structures among objects making up a Web application. This paper presents a novel SCM-centered development environment for Web-based applications, named WebSCM. With Web- SCM, Web developers can manage fine-grained, structural evolution of Web objects and important structures among them at different levels of abstraction (e.g. conceptual, nav- igational, presentation, and implementation levels). This paper also describes the motivation for this environment as well as its user interfaces, features, and implementation. 1 Introduction Millions of Web sites have been developed in the past ten years. A lot of efforts and time have been spent to main- tain Web-based applications in the daily basis. Small Web sites with less than dozen HTML pages can easily be devel- oped and maintained. However, many organizations have failed or have struggled to avoid major failures in develop- ing large-scaled and high-quality Web-based applications. The main reason is that many Web developers commonly use ad hoc development processes that lack rigor, system- atic techniques, sound methodologies, and quality assur- ance and may pay little attention to issues such as require- ments analysis, quality, performance evaluation, configura- tion management, maintainability, and scalability [19]. To address this, many techniques from traditional and modern software engineering have been applied to Web de- velopment and maintenance. Especially, software config- uration management (SCM) technology has been used to manage the evolution of Web-based systems. However, the maintenance of Web-based applications do have distinct characteristics that arise from the nature of hyperlinks and the Web environment [19, 26]. The evolution of Web appli- cations presents special challenges that have not been well addressed by existing SCM systems. First of all, Web systems are built from a wide diver- sity of objects including documents in markup languages, document templates, style sheets, images, streaming media, animations, applets, scripts, etc. Unlike software engineer- ing, where program source code is seen as the central ar- tifact, in Web systems it is difficult to identify one type of object as most important. So, an SCM system for Web ap- plications must be compatible with a wide variety of file types with different kinds of structure. Second, a Web doc- ument, either static or dynamic, has some internal structure. An HTML document has a tree-based syntactical structure, while a program can be regarded as an abstract syntax tree (AST) of syntactic units. Taking advantage of the internal structure of a Web page results in various benefits in pro- cessing, visualizing, searching and retrieving Web informa- tion [14, 15, 29]. As the Internet moves toward XML [21], it is likely that Web documents will gain structure with se- mantics of growing importance. Structural semantics is de- scribed, at least to some extent, by Document Type Defini- tions (DTD) or XML Schemas, which are also evolving ob- jects. So, Web content must evolve along with its structural rules. Traditional SCM tools are not well-suited for Web applications because they often use a line-oriented model of internal changes that disregards the internal structures of Web contents. In addition, a good navigational structure of a Web sys- tem is one of keys to success of a Web application. Navi- gational structure refers to the structure of Web documents with respect to hyperlinks among them. Web documents in a Web system are logically related and connected to each Proceedings of the 21st IEEE International Conference on Software Maintenance (ICSM’05) 1063-6773/05 $20.00 © 2005 IEEE Authorized licensed use limited to: TOKYO INSTITUTE OF TECHNOLOGY. Downloaded on November 9, 2009 at 06:03 from IEEE Xplore. Restrictions apply.

Upload: others

Post on 24-Jan-2021

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Managing the evolution of Web-based applications with WebSCM · 2009. 11. 10. · for Web-based applications, named WebSCM. With Web-SCM, Web developers can manage fine-grained,

Managing the evolution of Web-based applications with WebSCM

Tien N. Nguyen, Ethan V. Munson, and Cheng ThaoComputer Science Department

University of Wisconsin-Milwaukee{tien,munson,chengt}@cs.uwm.edu

Abstract

In traditional software engineering tools, software con-figuration management (SCM) is the dominant approach tomanage the evolution of a software system. However, theevolution of Web-based applications presents special chal-lenges that have not been well addressed by existing SCMsystems. They often treat hyperlinked Web documents as aset of text files in a file system and disregard crucial struc-tures in a Web-based application such as internal structure,navigational structure, logical structure, and compositionalstructure among Web objects. Key limitations include theirinadequacy to represent Web object semantics and theirinability to manage changes to those important structuresamong objects making up a Web application. This paperpresents a novel SCM-centered development environmentfor Web-based applications, named WebSCM. With Web-SCM, Web developers can manage fine-grained, structuralevolution of Web objects and important structures amongthem at different levels of abstraction (e.g. conceptual, nav-igational, presentation, and implementation levels). Thispaper also describes the motivation for this environment aswell as its user interfaces, features, and implementation.

1 Introduction

Millions of Web sites have been developed in the pastten years. A lot of efforts and time have been spent to main-tain Web-based applications in the daily basis. Small Websites with less than dozen HTML pages can easily be devel-oped and maintained. However, many organizations havefailed or have struggled to avoid major failures in develop-ing large-scaled and high-quality Web-based applications.The main reason is that many Web developers commonlyuse ad hoc development processes that lack rigor, system-atic techniques, sound methodologies, and quality assur-ance and may pay little attention to issues such as require-ments analysis, quality, performance evaluation, configura-tion management, maintainability, and scalability [19].

To address this, many techniques from traditional andmodern software engineering have been applied to Web de-velopment and maintenance. Especially, software config-uration management (SCM) technology has been used tomanage the evolution of Web-based systems. However,the maintenance of Web-based applications do have distinctcharacteristics that arise from the nature of hyperlinks andthe Web environment [19, 26]. The evolution of Web appli-cations presents special challenges that have not been welladdressed by existing SCM systems.

First of all, Web systems are built from a wide diver-sity of objects including documents in markup languages,document templates, style sheets, images, streaming media,animations, applets, scripts, etc. Unlike software engineer-ing, where program source code is seen as the central ar-tifact, in Web systems it is difficult to identify one type ofobject as most important. So, an SCM system for Web ap-plications must be compatible with a wide variety of filetypes with different kinds of structure. Second, a Web doc-ument, either static or dynamic, has some internal structure.An HTML document has a tree-based syntactical structure,while a program can be regarded as an abstract syntax tree(AST) of syntactic units. Taking advantage of the internalstructure of a Web page results in various benefits in pro-cessing, visualizing, searching and retrieving Web informa-tion [14, 15, 29]. As the Internet moves toward XML [21],it is likely that Web documents will gain structure with se-mantics of growing importance. Structural semantics is de-scribed, at least to some extent, by Document Type Defini-tions (DTD) or XML Schemas, which are also evolving ob-jects. So, Web content must evolve along with its structuralrules. Traditional SCM tools are not well-suited for Webapplications because they often use a line-oriented modelof internal changes that disregards the internal structures ofWeb contents.

In addition, a good navigational structure of a Web sys-tem is one of keys to success of a Web application. Navi-gational structure refers to the structure of Web documentswith respect to hyperlinks among them. Web documents ina Web system are logically related and connected to each

Proceedings of the 21st IEEE International Conference on Software Maintenance (ICSM’05)

1063-6773/05 $20.00 © 2005 IEEE

Authorized licensed use limited to: TOKYO INSTITUTE OF TECHNOLOGY. Downloaded on November 9, 2009 at 06:03 from IEEE Xplore. Restrictions apply.

Page 2: Managing the evolution of Web-based applications with WebSCM · 2009. 11. 10. · for Web-based applications, named WebSCM. With Web-SCM, Web developers can manage fine-grained,

other via these hyperlinks. When a user understands wherehe can navigate, and how he can reach a desired target point,he can benefit the most from the Web application. Naviga-tional structure also evolves over time. Recording its evo-lution would facilitate the management of “dangling” linksand avoid the navigation to an incorrect version of a desti-nation document.

Configuration management for Web-based applicationsis much more than providing version control and changemanagement for individual Web documents. Logical struc-tures and relationships in a Web system are also very im-portant. As developing large-scaled Web applications, de-velopers may apply a design methodology, which requirescertain logical design models, separately from the imple-mentation model (e.g. HTML files or scripts). For exam-ple, in OOHDM [26], the conceptual model expresses rela-tions among data objects, while the navigation model andpresentation model address navigational structure and in-formation presentation in a Web system, respectively. Enti-ties in each of these design models are often related to eachother. The relationships among them are crucial for the suc-cess of a design process [26]. For example, between entitiesin the data conceptual model and those in the presentationmodel, there exists logical mappings such as in which pagesor screens a piece of data or an object will be presented. Be-tween presentation and navigation models, the informationsuch as how screens or pages are connected by hyperlinksis very important. In addition, the mapping between screendesign and implementation files is important, for example,in which HTML files a screen in the presentation model isrealized. All of those logical mappings, structures, and re-lations evolve over time. Recording the evolution of thoselogical connections would greatly benefit for developers intheir development and maintenance tasks.

This paper presents a novel SCM-centered developmentenvironment for Web-based applications, named WebSCM.WebSCM was built using Molhado [22], an object-orientedSCM infrastructure, that is able to manage versions of asoftware system at the logical level. With WebSCM, Webdevelopers can manage fine-grained, structural evolution ofWeb objects and important structures among them at dif-ferent levels of abstraction (e.g. conceptual, navigational,presentation, implementation models). Important logicalconnections between those models are recorded over time.WebSCM always maintains version consistency not onlyamong implementation artifacts (e.g. source code, HTML,etc), but also among logical entities in design models. Webdevelopers can easily relate together changes in their de-signs and implementation artifacts.

The next section will discuss related work on apply-ing version control and configuration management to Web-based applications. Section 3 briefly describes the basicdata and version model that is used in WebSCM. Section 4

presents our representation for designs at the conceptualdata model in a Web-based application. Section 5 describesWebSCM’s supports for screen designs for Web pages. Therepresentation for Web documents is presented in Section 6.Section 7 describes how the system manages versions andconfigurations of Web contents, important structures, andlogical mappings. The highlights of GUI-based functional-ity are discussed in Section 8. Conclusions appear in thelast section.

2 Related work

Researchers and vendors in the configuration manage-ment area are taking different approaches to SCM for theWeb. All have added Web functions to their SCM tools byoffering access to some or all SCM functionality through abrowser [7]. WebSynergy [35] provides a Web front-endinto all of its existing SCM capabilities as well as Webauthoring tools. Similar to our approach, MKS’s WebIn-tegrity [34] integrates its version control facilities with anauthoring tool, while in Merant’s PVCS [20], version con-trol is the core part and is separate from authoring sys-tems. However, both of them version control at the filelevel. StarTeam [30] is Web-enabled with the intention oftool integration. TrueChange [33] provides content changemanagement along with its version control, but with lessfocus on configuration management. Serena’s eChange-Man [13] is focused on process management and changetracking. Rational’s ClearCase [18] provides configurationmanagement for text files via total versioning approach.

ClearQuest [5] is a change request management toolof ClearCase, coordinating many developers in chang-ing Web documents. Content change management inSourceSafe [28] is line-oriented. Computer Associate’sCCC/Harvest [4] pays considerable attention to supportingcollaboration among distributed development teams. Per-force [25] is more lightweight than other SCM tools andit has the ability to migrate repositories from other SCMtools such as CVS and PVCS into its internal repository.Although all of these SCM tools have distinguished andvaluable features, they are focused on version control offiles, rather than on configuration management for objectsmaking up a Web system. None of them supports fine-grained change management at the logical level and all con-tent change management is line-oriented.

On the other hand, some Web application develop-ment environments (also called Web authoring environ-ments or Web content management environments) have re-alized that SCM practices need to be incorporated intotheir tools. Tools such as FrontPage, Macromedia’sDreamWeaver [10], and ColdFusion [6] have no built-in version control support. Other tools, such as Story-Server [31] and TeamSite [32], are designed to support

Proceedings of the 21st IEEE International Conference on Software Maintenance (ICSM’05)

1063-6773/05 $20.00 © 2005 IEEE

Authorized licensed use limited to: TOKYO INSTITUTE OF TECHNOLOGY. Downloaded on November 9, 2009 at 06:03 from IEEE Xplore. Restrictions apply.

Page 3: Managing the evolution of Web-based applications with WebSCM · 2009. 11. 10. · for Web-based applications, named WebSCM. With Web-SCM, Web developers can manage fine-grained,

many aspects of Web development, with particular strengthin supporting collaborative work. TeamSite provides visualdifferencing tools so that two versions of the same con-tent can be examined side by side. Inso’s DynaBase [17]is an integrated content management and publishing plat-form for Web applications. Similar to our approach, it isXML-based, allowing better management and reuse of data.In DynaBase, configuration management use the “tagging”technique also seen in CVS. Configurations act like a bill-of-materials for the Web site, enumerating which items areincluded in the site and which version of each item is in use.ArticleBase [1] integrates content management and versioncontrol into an authoring and publishing system. Its versioncontrol support is on file basis.

Many research in versioned hypermedia community [16,24, 37] have focused on version control for documents inthe presence of hyperlinks. However, the main goals of ver-sioned hypermedia systems often do not include supportsfor Web application development. Therefore, supports forprogram source code are very limited.

To improve the authoring and browsing features for ver-sioned contents of Web pages, some researchers in this areafollowed the language-oriented approach. They have at-tempted to change the Uniform Resource Locator (URL)of a Web page to include a version identifier [27]. Theyuse existing Web infrastructure such as forms, Java applets,and plug-ins to create a user interface for revision controlsystems on the server. Bendix and Vitali proposed VTML(versioned text markup language) [3] to express change op-erations for HTML documents. For example, they intro-duced two new tags (INS and DEL) to express insertion anddeletion. The WebDAV protocol [36] is an extension of theHypertext Transfer Protocol (HTTP) to support distributedauthoring and versioning. It extends HTTP to include ver-sioning operations for Web pages. ττApache is transaction-time HTTP server that supports document versioning [12].To construct a document version history, snapshots of thedocuments files are obtained over time. Transaction timesare associated with each file version. The transaction timeis the system time of the edit that created the version.

In general, existing SCM systems for Web applicationshave a large variety of useful functionality. The similaritiesamong them include supports for Web and scripting lan-guages, templates and stylesheets, versioning of files, roll-back of complete sites via backup, audit logging, workflowsupport for collaborative work, commercial database inter-faces, change tracking and management support. However,configuration management support is still limited to files.None of them cares about the navigational or internal struc-ture of Web documents. Their content change managementis coarse-grained, with differencing done on a line-by-linebasis. None of them supports for managing logical connec-tions among entities of different design models over time.

Figure 1. Data model

3 Data model

WebSCM is built based on an object-oriented SCM in-frastructure, named Molhado [22]. This section describesthe main concepts of the data model used in Molhado.Those concepts are node, slot, attribute, and sequence. Anode is the basic unit of identity and is used to representany abstraction. A slot is a location that can store a valuein any data type, possibly a reference to a node. A slot canexist in isolation but typically slots are attached to nodes,using an attribute. An attribute is a mapping from nodes toslots. An attribute may have particular slots for some nodesand map all other nodes to a default slot. The data modelcan thus be regarded as an attribute table whose rows corre-spond to nodes and columns correspond to attributes. Thecells of the attribute table are slots (see Figure 1).

There are three kinds of slots. A constant slot is im-mutable; such a slot can only be given a value once, when itis defined. A simple slot may be assigned even after it hasbeen defined. The third kind of slot is the versioned slot,which may have different values in different versions. A se-quence is a container with slots of the same data type. It hasa unique identifier. Sequences may be fixed or variable insize and share common slots together. Slots in an attributetable can also be simple slots or constant slots. Once we addversioning, the table gets a third dimension: the version.Details of our version model are described in Section 7.

Trees and directed graphs are built using this data model.A directed graph is defined with an attribute (“children” at-tribute) that maps each node to a sequence holding its chil-dren. Trees additionally have the “parent” attribute, whichmaps each node to its parent. Each node in a tree or agraph can be associated with other slots defined by otherattributes. WebSCM uses these attributed tree and directedgraph data structures to represent important structures in aWeb-based system. This data model is used as the basis ofour representation for artifacts in design and implementa-tion models of a Web-based application (e.g. concept, pre-sentation, navigation, and implementation).

Proceedings of the 21st IEEE International Conference on Software Maintenance (ICSM’05)

1063-6773/05 $20.00 © 2005 IEEE

Authorized licensed use limited to: TOKYO INSTITUTE OF TECHNOLOGY. Downloaded on November 9, 2009 at 06:03 from IEEE Xplore. Restrictions apply.

Page 4: Managing the evolution of Web-based applications with WebSCM · 2009. 11. 10. · for Web-based applications, named WebSCM. With Web-SCM, Web developers can manage fine-grained,

title: stringtype: string

...

Story

text: string

illustration:......

Essay

audio_file: ......

Interview

tel: stringname: string

...

Author

email: string

granted_by

is_author_of

1

32

4 5

6

a. Diagram

b. Representation

n1

n2

n3

story

"component"

interview

essay

n4 undefined

n5 author

n6 undefined

"type"class

class

class

classrelation

relation

c. Attribute table

Figure 2. Conceptual modeling

4 Supports for design in conceptual model

The starting point of a Web application design process isthe elaboration of a model of the application domain, whichdetermines the universe of discourse [26]. This task is of-ten referred to as conceptual design in Web engineering. Inconventional Web applications, i.e. those in which most ofWeb objects are static pages, any simple document modelsuch as DOM [8] would be enough to model Web objects.However, in large-scaled Web sites where information maychange dynamically with complex computations or queries,we need a behaviorally richer conceptual model. To addressthis, we have chosen the OOHDM conceptual model [26].In OOHDM, a conceptual schema is built upon classes, re-lationships, and sub-systems. Classes are described as usualin object-oriented models, though attributes may be multi-typed representing different perspectives of the same real-world entity. A schema consists of a set of classes con-nected by relationships. Objects are instances of classes,and thus, when a relationship holds between classes, it ab-stracts corresponding object-to-object relationships.

Figure 2 shows an example of a conceptual schema foran online newspaper. There are stories, which can be es-says or interviews. Every story has an author, and an inter-view is related to the person who grants the interview. Classand relation in a conceptual schema are defined as Molhadoatomic components [22]. To represent this schema, we usean attributed directed graph. Each entity (class, relation,etc) is represented by a node except that each inheritancerelation is represented by an edge (e.g. between “n1” and“n2”). Edges connect entities together to reflect the relation-ships in the schema. Therefore, they form a directed graph(see Figure 2b)). Each node in the directed graph is asso-ciated with multiple attributes. The “component” attribute

defines for each node a reference to the corresponding en-tities except for the “relation” nodes (e.g. “n4” and “n6” inFigure 2b)). The “relation” nodes are associated with ad-ditional slots representing other properties of the relationssuch as arity, name, etc. Figure 2c) partially shows the at-tribute table for that conceptual schema. Each property ofa class is represented by a versioned slot (in a data type)within the class component.

Data records for objects instantiated from a class arestored in an attributed tree with the depth of two, branch-ing off the class node. Each of those tree nodes at the firstlevel represents a record. Each record can have a sequenceof fields, which are represented by children nodes of therecord node. Each of these field nodes has additional at-tributes such as “name”, “type”, and “value” to representthe name, type, and value of each field. In general, the con-ceptual schema and data records for objects are representedas attributed directed graphs and trees at two abstract levels:the schema level and the individual data object level.

5 Web screen design

5.1 Presentation design

In current Web development practice, presentation infor-mation is often encoded within HTML files. However, it isalways a good practice to separate the screen design phasefrom the coding phase for Web pages [26]. The separa-tion between screen design and implementation allows Webdesigners to work on the design aspects of a Web systemwithout worrying about concrete level of implementation inWeb languages (e.g. HTML, scripting languages). Also,this separation enables an automatic generation of imple-mentation files from the screen designs. This feature is par-ticularly important in E-commerce Web sites whose most ofWeb pages are dynamically generated.

Supports for Web screen design in WebSCM can be clas-sified into three categories. Firstly, Web developers canspecify presentation information for a Web page in a CSS-like stylesheet. Secondly, in WebSCM, developers are ableto work on the composition of Web screens. Pieces of infor-mation in the conceptual model can be presented on screens.A screen is either an atomic or composite component. Acomposite screen is a composition of atomic screens and/orother composite screens. A composite screen representssomething like an HTML frame containing multiple HTMLpages. An atomic screen, which does not contain anyother screen, represents an individual Web page. In gen-eral, screens are defined according to Molhado componentframework [22]. The compositional hierarchy of a compos-ite screen is represented as a tree. For each tree node, theassociated “component” slot holds a reference to a memberscreen contained within the composite screen.

Proceedings of the 21st IEEE International Conference on Software Maintenance (ICSM’05)

1063-6773/05 $20.00 © 2005 IEEE

Authorized licensed use limited to: TOKYO INSTITUTE OF TECHNOLOGY. Downloaded on November 9, 2009 at 06:03 from IEEE Xplore. Restrictions apply.

Page 5: Managing the evolution of Web-based applications with WebSCM · 2009. 11. 10. · for Web-based applications, named WebSCM. With Web-SCM, Web developers can manage fine-grained,

screen

1

frame B

screen 2

frame A

n0

n1 n2 n3

n4

n5 n6

A

B

n0

n1n2

n3

n4

nullscreen 1screen 2

"component"

frame B

n5 screen 3null

n6 screen 4

Composite Representation

...

frame B

screen 3 screen 4

Figure 3. Composite screen representation

screen

1

screen

2

screen

3screen

4

n1

n3

n2

n4

Representation

n1

n2

n3

n4

screen 1

screen 2

screen 3

"component"

screen 4

Navigational structure

Figure 4. Navigational structure

Figure 3 shows an example of a composite screen. Frame“A” contains two atomic screens (“1” and “2”) and anotherframe (frame “B”). Frame “B” has two atomic screens.Frame “A” is represented by a tree rooted at “n0”. Nodes“n1” and “n2” are associated with slots containing refer-ences to screen “1” and “2” respectively, while the “compo-nent” slot for node “n3” refers to frame “B”. The represen-tation of frame “B” is in a similar manner. The third type ofsupport for screen design is navigational design.

5.2 Navigational design

The simplicity of HTML as the implementation languageand the lack of a hypermedia design culture resulted in Websites in which users experienced cognitive overhead and dis-orientation while navigating the Web [26]. Navigational de-sign is a critical step in the design of a Web-based applica-tion. A navigational model is built as a view over a con-ceptual model, thus allowing the construction of differentmodels according to users’ profiles. A navigational modelprovides a “subjective” view of the conceptual model.

To represent the design for navigational structure amongscreens, it is very straightforward to use a directed graphin which edges represent hyperlinks and each node in thedirected graph is associated with a slot referring to the cor-responding screen. In other words, the navigational struc-ture is represented in the same way as the data conceptualschema as described in Figure 2 except that the “compo-

navigation

links

data objects screens

mapping edges

mapping

conceptual data model tree presentation model graph

a. Mappings

b. Representation graph

Figure 5. Mappings from data to screens

nent” slots refer to screens and hyperlinks are representedas edges instead of nodes. Figure 4 shows our represen-tation for a simple navigational structure among four Webpages. In a navigational schema, one representative edgebetween a pair of screens is kept even if there are mul-tiple hyperlinks among them. The reason for our designchoice is that the main purpose of navigational design is tosketch out coarse-grained linking structure, while detailedand fine-grained hyperlinks should be accommodated at theimplementation level such as in HTML pages or scripts.

WebSCM aims not only to provide version control fordesigns in the conceptual and navigational models, but alsoto manage the evolution of the logical mappings among en-tities in those models. For example, to record the screenswhere a data record or object would be displayed, logi-cal links are added from data elements in the conceptualdata model to screens in the presentation model. Figure 5shows the representation of those links in terms of map-ping edges. If we put together the conceptual schema graph(representing for a conceptual schema), the conceptual datamodel tree (representing for data objects), and the presenta-tion model graph (for navigational structure) with mappingedges, we have a complex directed graph in which edges aretyped. For example, hyperlink edges are for navigationalhyperlinks, and mapping edges for logical mappings fromconceptual data model to presentation model. Details ofhow WebSCM versions this graph will be explained later.

6 Web documents

6.1 Structure-oriented representation

This section describes our representation for differenttypes of Web contents at the implementation level such asmarkup files, program source code, scripts, etc. Web con-tent can consist of data objects, code, and component li-braries. Examples of data objects are data files, documents,

Proceedings of the 21st IEEE International Conference on Software Maintenance (ICSM’05)

1063-6773/05 $20.00 © 2005 IEEE

Authorized licensed use limited to: TOKYO INSTITUTE OF TECHNOLOGY. Downloaded on November 9, 2009 at 06:03 from IEEE Xplore. Restrictions apply.

Page 6: Managing the evolution of Web-based applications with WebSCM · 2009. 11. 10. · for Web-based applications, named WebSCM. With Web-SCM, Web developers can manage fine-grained,

<fulladdress><name> MSL </name>

<address POBox = "781">

<street> 3200 Crammer

<city> Milwaukee </city>

<state> WI </state></address>

</fulladdress>

a."operator"

n1n1

n2

"content"

n3

n4

n5

</street>

"POBox"

n6

n7

n8

n9

n10

b.

intop("full...")

intop(x): an intermediateoperator with the name x

...

textop: the text operator

null undefined

intop("na...") null

textop "MSL"

intop("add..") null "781"

intop("stre..") null

textop "3200..."

intop("city..") null

null

textop "Milwa.."

intop("state")

textop "WI"

Legend:

undefined

undefined

undefined

undefined

undefined

undefined

undefined

undefined

Figure 6. Document representation

images, audios, and videos. Code can be active controls andscripts that can be embedded into an HTML page. Com-ponent libraries are reusable codes such as JavaBeans andMicrosoft Foundation Classes. Among Web documents, hy-perlinks exist that can point to any page in the Web. Tem-plates such as stylesheets, which are written in some lan-guages, enable separation of content and presentation style.To provide fine-grained version control and content man-agement for Web documents, WebSCM follows the prin-ciple that textual information such as HTML, XML docu-ments, and program source code are treated as structuredobjects, while binary information such as audio clips, videoclips, and component libraries are considered to have no in-ternal structure and are versioned on the file basis.

Our document representation uses a structure-orientedapproach where each Web document is a document treecomposed of document nodes. Each document node can beassociated with multiple pairs of attribute name and value.A document tree, representing internal structure of a Webdocument, is versioned in a fine-grained manner (see Sec-tion 7). Document nodes in a document tree carry differentlogical senses depending on the document’s type. In a pro-gram or a script, a document node corresponds to a syntac-tic unit such as a class, method, expression, or statement. Inother Web documents (e.g. HTML, XML, etc), dependingon a document’s DTD, a document node might represent asection, paragraph, sentence, or phrase. That is, a documentnode represents logical semantics encoded via a markup.

Figure 6 shows an example of our representation for anXML-based or a markup-based document. A syntax tree,which represents the internal structure of a document, is atree with an additional slot that stores an operator for eachnode via “operator” attribute. The text operator is used torepresent XML’s character data (CDATA) construct. Eachnode associated with the text operator has an additional slot(defined by the “content” attribute) that holds the CDATAstring. Each element node in an XML document is asso-

ciated with an intermediate operator, whose name is theelement’s name. Each node associated with an intermediateoperator has one additional slot for each XML element-levelattribute that is defined for that element type. In Figure 6,node “n1”, representing the “fulladdress” element, is asso-ciated with an intermediate operator with the name “fullad-dress”. Node “n3”, whose “content” slot contains the string“MSL”, is associated with the unique text operator. The“POBox” attribute of “address” is valid only for “n4”.

For a program or a script, its AST (as its document tree)is constructed in the same way as an XML document tree. Aset of operators is defined to represent the AST, with eachnode in the AST being represented as a node in the datamodel. In a syntax tree, each node is associated with a slotreferring to an operator that defines the syntax for the syn-tactical unit represented by the node. For example, an addi-tion expression “x + y” would be represented by a node thatis associated with a slot referring to the “AddExpression”operator. Children nodes are left child and right child of theaddition expression. They are both associated with the sameoperator named “NameExpression”, and each has a specialslot whose value refers to its own identifier.

In general, this document tree representation is logicallyequivalent to the one supported by DOM [8]. A conver-sion facility is implemented to allow the import and exportof DOM-compatible and XML-based documents into ourrepresentation. The maturity of XML technology [21] al-lows us to support a wide variety of information types thatare based on XML, including graphics and animation writ-ten in SVG, XML-based design diagrams (e.g. UML di-agrams), and HTML documents. Any XML-based sourcecode formats for scripting or programming languages, alongtheir respective structured editors, can be easily pluggedinto WebSCM. WebSCM currently supports an XML-basedrepresentation for Java programs similar to JavaML [2]. Forcode artifacts that are not XML-based, appropriate parsersand structured editors are required to generate our documenttree representation. WebSCM has a built-in Java parser toparse textual Java programs into our AST representation.

6.2 Mappings from design to implementation

To help Web developers to keep track of which imple-mentation files (HTML, scripts, etc) realize a particularscreen in a design, WebSCM maintains the logical connec-tions from entities in the design models to entities in the im-plementation model. Similar to the case between the con-ceptual data model and presentation model, the mappingsare realized as typed edges between design nodes and im-plementation nodes in the representation graph and tree, re-spectively. Figure 7 illustrates edges from screen nodes inthe screen design representation graph to file nodes in a di-rectory structure representation tree.

Proceedings of the 21st IEEE International Conference on Software Maintenance (ICSM’05)

1063-6773/05 $20.00 © 2005 IEEE

Authorized licensed use limited to: TOKYO INSTITUTE OF TECHNOLOGY. Downloaded on November 9, 2009 at 06:03 from IEEE Xplore. Restrictions apply.

Page 7: Managing the evolution of Web-based applications with WebSCM · 2009. 11. 10. · for Web-based applications, named WebSCM. With Web-SCM, Web developers can manage fine-grained,

dir A

dir B dir C

a.html c.html... ...

Directory structureScreen design

Screen design

representation graphDirectory structure

representation tree

Figure 7. Mappings from screen to code

7 Structure-oriented product versioning

This section describes how WebSCM provides versioncontrol and configuration management for Web contentsand crucial structures in a Web-based application. Firstof all, the SCM model applied in WebSCM is the Mol-hado product versioning SCM model [22]. A version inthis model is global across entire Web-based project and isa point in a tree-structured discrete time abstraction. Thatis, the third dimension in the attribute table in Figure 1 istree-structured and versions move discretely from one pointto another. All system objects are versioned in a uniform,global version space across a software project, called prod-uct versioning space. Molhado emphasizes the evolutionof a software system as a whole. In this SCM model, thestate of the whole software system is captured at certaindiscrete time points and only these captured versions canbe retrieved in later sessions.

The current version is the version designating the currentstate of the project. When a particular version is set to becurrent, the state of the whole project is set back to thatversion. Changes made to versioned slots of the project atthe current version create a temporary version, branchingoff the current version. That temporary version will onlybe recorded if a user explicitly requests that it be captured.On the other hand, no version is created if a simple slotis modified. To record the history of an individual object,the whole project is captured. Capturing the whole projectis quite efficient because the version control system onlyrecords changes and works at the slot level [22]. It usestechniques derived from the work of Driscoll et al [11] tostore and retrieve different data types in attribute tables.

To be able to provide fine-grained versioning for all log-ical entities and structures in design and implementationmodels, as well as logical connections among them, a struc-ture versioning algorithm for an attributed directed graph isrequired. The algorithm assumes that editors for all logical

1 2 3

4

5 6 7

8

n1

n2

n3

n4

n5

n6

n7

n8

"content""children"

[n2]

null

1 2 3

4

6 7

8

109

Version v1 Version v2

[n3,n6]

[n2]

null

[n5,n7,n8]

null

[n4]

"1"

"2"

"3"

"4"

"5"

"6"

"7"

"8"

n1

n2

n3

n4

n5

n6

n7

n8

"content""children"

[n2]

[n6]

[n6]

[n2]

[n7,n8]

null

[n9]

"1+"

"2"

"3"

"4"

"6"

"7"

"8"

(content is

modified)

undefined

n9

n10

[n4]

[n9]

"9"

"10"

Figure 8. Structure versioning

structures or entities in models will call library functions fortrees, graphs, or slots in order to modify their structures orvalues. Those library functions will accordingly update thevalues of slots in an attribute table including structural slots(e.g. “parent” and “children”).

Figure 8 illustrates the algorithm via an example. In Fig-ure 8, a “content” attribute is defined that holds string valuesfor nodes. Other attributes can also be defined. Assume thatthere are two versions: v1 and v2. The shape of the graphat two versions is shown. Assume that after a sequence ofediting operations on the graph, several changes had beenmade to both the structure and the “content” slots associatedwith nodes (see Figure 8). The values of versioned slots inthe attribute table are changed to reflect modifications to thegraph at these versions. For example, at the version v2, the“content” slot of node “n1” contains a new value (the string“1+”), and the “children” slot of node “n2” contains a ref-erence to a new sequence object. That new sequence objectcontains only “n6” because the edge from “n2” to “n3” wasremoved. If there is a request for the values of slots associ-ated with node “n5” at v2, undefined values will be returnedsince “n5” was deleted at version v2. Note that “n9” and“n10” were inserted in the attribute table at v2.

The versioning mechanism for directed graphs withedges having attributes (e.g. type) is similar except that eachedge is now represented as a node. Each edge node is as-sociated with two main attributes: “source node” and “sinknode” attributes define for each edge node the source nodeand the destination node of the edge in a directed graph, re-spectively. Other attributes can also be defined, e.g, “typed”attribute defines for each edge node the type of the edge inthe directed graph. The steps of the fine-grained structureversioning mechanism are still the same.

This structure versioning mechanism is powerful since

Proceedings of the 21st IEEE International Conference on Software Maintenance (ICSM’05)

1063-6773/05 $20.00 © 2005 IEEE

Authorized licensed use limited to: TOKYO INSTITUTE OF TECHNOLOGY. Downloaded on November 9, 2009 at 06:03 from IEEE Xplore. Restrictions apply.

Page 8: Managing the evolution of Web-based applications with WebSCM · 2009. 11. 10. · for Web-based applications, named WebSCM. With Web-SCM, Web developers can manage fine-grained,

Figure 9. OOHDM conceptual schema

it can be used to version various forms of structures andlogical mappings that were presented in previous sections.It is also very efficient since common structures are sharedamong versions and all information including structures andcontents are versioned via one mechanism. Importantly,the algorithm is general for any subgraph, therefore, fine-grained version control is achieved for any abstraction rep-resented by a node. The versioning mechanism for a tree issimilar except that the “parent” attribute is added.

8 GUI-based functionality

This section presents GUIs for distinguished SCM func-tionality in WebSCM at both design and implementationlevels as well as the logical connections among them.

8.1 Version control for conceptual schemas

At the conceptual level, WebSCM provides supports forWeb application design with the OOHDM conceptual datamodel. A user can open an existing Web project or cre-ate a new one. The user selects a current working version,and opens or composes his conceptual design diagram viaa specialized editor. Figure 9 shows an example of a con-ceptual schema of an online department store. The user candefine different types of object-to-object relationships. Theuser is able to modify the properties of items and structuresin the conceptual schema. All fine-grained changes to theconceptual schema are captured. At the data level, datais currently stored in an attribute table. No real back-enddatabase is connected. A structural difference tool for con-ceptual schemas allows the user to see structural changesbetween two versions of a conceptual design diagram [22].

Figure 10. Logical mappings among models

8.2 Logical connections among models

In addition to supports for designs at the conceptualdata model, WebSCM also provides for developers a toolto work on presentation, screen, and navigational designsfor a Web application. The right window in the Figure 10shows the navigational and compositional designs for theUWM Multimedia Software Laboratory’s Web site. The topleft window displays data objects (e.g. lab staffs, students,projects, publications, etc) in a hierarchical view. The direc-tory structure of the Web project at the implementation levelis presented in the bottom left window. In WebSCM, Webdevelopers are able to know in which screens a data itemis presented at a particular version. For example, when dataobject “Tien” is selected, the screen “Tien” is highlighted inthe right window. WebSCM is also able to record the evo-lution of a mapping from a screen in the design model toactual files that realize the design of that screen (e.g. screen“Tien” and “tien.html”). In addition, developers can man-age the composition of screens and map them into HTMLframes or pages. If a screen is a composite entity, whenusers click on it, its member components will be presentedsuch as the “Main” page in Figure 10.

8.3 Fine-grained versioning for Web documents

WebSCM manages the versions of Web contents at a finegranularity. Figure 11 displays the contents of an HTMLdocument at two different versions v4 and v7. In the HTMLeditor, a user can select any document node and view its ver-sion history. For example, the un-ordered list of the HTMLdocument in the left window is chosen. A new window isdisplayed to show the version history of the list (see Fig-

Proceedings of the 21st IEEE International Conference on Software Maintenance (ICSM’05)

1063-6773/05 $20.00 © 2005 IEEE

Authorized licensed use limited to: TOKYO INSTITUTE OF TECHNOLOGY. Downloaded on November 9, 2009 at 06:03 from IEEE Xplore. Restrictions apply.

Page 9: Managing the evolution of Web-based applications with WebSCM · 2009. 11. 10. · for Web-based applications, named WebSCM. With Web-SCM, Web developers can manage fine-grained,

Figure 11. Versioning for HTML documents

Figure 12. HTML document comparison

ure 13). The list’s content at the version v4 is shown. Theuser is also able to see structural changes between those twoversions (v4 and v7) using the structural difference tool forHTML documents (see Figure 12). Icons attached to entriesshow the nature of changes to the entries. In this snapshot,between v4 and v7, some of document nodes have beenmodified in terms of both their attributes and structures, forexample, the “body” of the document (an “a” icon and atree icon are both attached to the node icon). Meanwhile,the “table” node has only an “a” icon since only the colorattribute of the table has been changed. Also, the third itemof the un-ordered list was deleted from v4 to v7 since it hasan “eraser” icon next to its entry in the left window.

This fine-grained, structural difference tool was easilybuilt due to our structure-oriented versioning, in which ev-

Figure 13. History of an HTML fragment

ery change to a document tree is recorded. Neither a treedifferencing algorithm nor extensive analysis is required tocompare different versions of an XML document tree sincethe SCM system works on the attributed tree representa-tion, rather than on the textual representation of an XMLdocument. Structural changes are stored as parent/childrenslots. The tool only needs to retrieve and display structuralchanges. No add-on to the SCM system is needed to derivestructural changes. Structural differencing tools for pro-grams and project directory structure were also built [23].Figure 14 shows structural changes in a Java program. Asimilar set of icons is used to show the nature of changes.

The current built-in editing services in WebSCM for

Figure 14. Program comparison

Proceedings of the 21st IEEE International Conference on Software Maintenance (ICSM’05)

1063-6773/05 $20.00 © 2005 IEEE

Authorized licensed use limited to: TOKYO INSTITUTE OF TECHNOLOGY. Downloaded on November 9, 2009 at 06:03 from IEEE Xplore. Restrictions apply.

Page 10: Managing the evolution of Web-based applications with WebSCM · 2009. 11. 10. · for Web-based applications, named WebSCM. With Web-SCM, Web developers can manage fine-grained,

Web documents at the implementation level include struc-tured document editors for XML, XHTML, HTML andplain text documents, and a syntax-recognizing Java edi-tor. Some of them are re-used from the previous version ofWebSCM [23]. Any externally stored documents in thoseformats can easily imported into WebSCM with built-inparsers. WebSCM is extensible to integrate any structurededitors and parsers for new document formats. If the newdocument format is XML-based, the built-in XML parsercan be re-used. For example, the drawSWF SVG graphicand animation editor [9], was easily integrated into Web-SCM since its document format is XML-based.

9 Conclusions

This paper presents a novel SCM-centered developmentenvironment for Web applications that allows Web devel-opers to manage fine-grained, structural evolution of Webcontents and their crucial structures. The separation be-tween design and implementation levels makes WebSCMwell-suited for large-scaled Web-based applications. Eachdesign and implementation model helps developers on aparticular aspect of the development process of a Web ap-plication. The evolution of logical entities and structures indesign models and logical mappings among them are inte-grally captured and related to each other in a cohesive man-ner. Version consistency is maintained not only among im-plementation artifacts, but also among all logical entities indifferent models at different levels of abstraction. This sortof versioning for model-to-model logical connections is fea-sible due to Molhado’s structure versioning engine [22]. In-ternal structure of a Web page is versioned in a fine-grainedmanner. Difference tools help Web developers to keep trackof fine-grained changes at multiple structural levels. An ex-perimental study is being conducted to evaluate the perfor-mance, efficiency, and usability of the system.

References

[1] ArticleBase. http://www.runningstart.com/.[2] G. Badros. JavaML: A markup language for java source

code. In Proceedings of the 9th International World WideWeb Conference, pages 159–177. ACM Press, 2000.

[3] L. Bendix and F. Vitali. VTML for Fine-grained Changetracking in Editing Structured Documents. In Proceedingsof the 1999 Software Configuration Management Workshop.Springer Verlag, 1999.

[4] CCC/Harvest. http://www3.ca.com/.[5] ClearQuest. www.rational.com/products/clearquest/index.jsp.[6] ColdFusion. http://www.allaire.com/.[7] S. Dart. Configuration Management: the missing link in Web

engineering. Artech House, 2000.[8] Document Object Model. http://www.w3.org/dom/.[9] DrawSWF. drawswf.sourceforge.net.

[10] Macromedia DreamWeaver. http://www.dreamweaver.com/.

[11] J. R. Driscoll, N. Sarnak, D. D. Sleator, and R. E. Tarjan.Making data structures persistent. J. Comput. Syst. Sci.,38(1):86–124, Feb. 1989.

[12] C. Dyreson, H.-L. Lin, and Y. Wang. Managing Versionsof Web Documents in a Transaction-time Web Server. InProceedings of the 13th International World Wide Web Con-ference (WWW13). ACM Press, 2004.

[13] eChangeMan. http://www.serena.com/.[14] E. J. Glover, K. Tsioutsiouliklis, S. Lawrence, D. M. Pen-

nock, and G. W. Flake. Using Web Structure for Classifyingand Describing Web Pages. In Proceedings of the 11th Inter-national World Wide Web Conference. ACM Press, 2002.

[15] S. Gupta, G. Kaiser, D. Neistadt, and P. Grimm. DOM-basedContent Extraction of HTML Documents. In Proceedings ofthe 12th International World Wide Web Conference, 2003.

[16] D. L. Hicks, J. J. Leggett, P. J. Nurnberg, and J. L. Schnase. Ahypermedia version control framework. ACM Transactionson Information Systems (TOIS), 16(2):127–160, 1998.

[17] Dynabase content management. http://www.rbii.com/ prod-uct/dynabase.

[18] D. Leblang. The CM challenge: Configuration managementthat works. Configuration Management, 2, 1994.

[19] D. Lowe and J. Eklund. Client Needs and the Design Processin Web Projects. J. of Web Engineering, 1(1):23–36, 2002.

[20] PVCS. http://www.merant.com/.[21] L. Mignet, D. Barbosa, and P. Veltri. The XML Web: a First

Study. In Proceedings of the 12th International World WideWeb Conference. ACM Press, 2003.

[22] T. N. Nguyen, E. V. Munson, J. T. Boyland, and C. Thao. AnInfrastructure for Development of Object-Oriented, Multi-level Configuration Management Services. In Proceedings ofthe 27th International Conference on Software Engineering(ICSE 2005). ACM Press, 2005.

[23] T. N. Nguyen, E. V. Munson, and C. Thao. Fine-grained,Structured Configuration Management for Web Projects. InProceedings of the 13th International World Wide Web Con-ference. ACM Press, 2004.

[24] K. Østerbye. Structural and cognitive problems in providingversion control for hypertext. In Proceedings of the ACMconference on Hypertext, pages 33–42. ACM Press, 1992.

[25] Perforce. http://www.perforce.com/.[26] D. Schwabe and G. Rossi. An Object Oriented Approach

to Web-based Application Design. Theory and Practice ofObject Systems, 4(4), 1998.

[27] J. Simonson, D. Berleant, X. Zhang, M. Xie, and H. Vo.Version augmented URIs for reference permanence via anApache module design. In Proceedings of the WWW7 Con-ference, Computer Networks and ISDN Systems, 1998.

[28] Microsoft Visual SoureSafe. http://msdn.microsoft.com/ssafe/prodinfo/overview.asp.

[29] E. Spertus and L. A. Stein. Squeal: A Structured Query Lan-guage for the Web. In Proceedings of the 9th InternationalWorld Wide Web Conference. ACM Press, 2000.

[30] StarTeam. http://www.startbase.com/.[31] StoryServer. http://www.vignette.com/.[32] TeamSite. http://www.interwoven.com/.[33] TrueChange. http://www.truesoft.com/.[34] WebIntegrity. http://www.mks.com/.[35] WebSynergy. http://www.continuus.com/.[36] E. J. Whitehead, Jr. WebDAV and DeltaV: collaborative au-

thoring, versioning, and configuration management for theWeb. In Proceedings of the ACM conference on Hypertextand Hypermedia, pages 259–260. ACM Press, 2001.

[37] E. J. Whitehead, Jr. An Analysis of the Hypertext VersioningDomain. PhD thesis, University of California – Irvine, 2000.

Proceedings of the 21st IEEE International Conference on Software Maintenance (ICSM’05)

1063-6773/05 $20.00 © 2005 IEEE

Authorized licensed use limited to: TOKYO INSTITUTE OF TECHNOLOGY. Downloaded on November 9, 2009 at 06:03 from IEEE Xplore. Restrictions apply.