
J Grid Computing (2013) 11:523–551
DOI 10.1007/s10723-013-9263-6

Semantics and Planning Based Workflow Composition for Video Processing

Gayathri Nadarajan · Yun-Heh Chen-Burger · Robert B. Fisher

Received: 15 May 2012 / Accepted: 24 May 2013 / Published online: 12 June 2013
© Springer Science+Business Media Dordrecht 2013

Abstract This work proposes a novel workflow composition approach that hinges upon ontologies and planning as its core technologies within an integrated framework. Video processing problems provide a fitting domain for investigating the effectiveness of this integrated method, as tackling such problems has not been fully explored by the workflow, planning and ontological communities despite their combined beneficial traits for confronting this known hard problem. In addition, the pervasiveness of video data has increased the need for more automated assistance for image processing-naive users, but no adequate support has been provided as of yet. The integrated approach was evaluated on a video set of varying quality originating from an open sea environment. Experiments to evaluate the efficiency, adaptability to users' changing needs and user learnability of this approach were conducted on users who did not possess image processing expertise. The findings indicate that using this integrated workflow composition and execution method: (1) provides a speed up of over 90 % in execution time for video classification tasks using fully automatic processing compared to manual methods, without loss of accuracy; (2) is more flexible and adaptable in response to changes in user requests than modifying existing image processing programs when the domain descriptions are altered; and (3) assists the user in selecting optimal solutions by providing recommended descriptions.

This research was funded by European Commission FP7 grant 257024, in the Fish4Knowledge project (www.fish4knowledge.eu).

G. Nadarajan (B) · Y.-H. Chen-Burger
Centre for Intelligent Systems and their Applications, School of Informatics, University of Edinburgh, 10 Crichton St., Edinburgh EH8 9AB, UK
e-mail: [email protected]

R. B. Fisher
Institute for Perception, Action and Behaviour, School of Informatics, University of Edinburgh, 10 Crichton St., Edinburgh EH8 9AB, UK

Keywords Automatic workflow composition · Intelligent video analysis · Ontology-based workflows · HTN planning

1 Introduction

Traditional workflow systems have several drawbacks, e.g. in their inability to rapidly react to changes, to construct workflows automatically (or with user involvement) and to improve performance autonomously (or with user involvement) in an incremental manner according to specified goals. Overcoming these limitations would be highly beneficial for complex domains where such adversities are exhibited. Video processing is one such domain that increasingly requires attention as larger amounts of images and videos become available to those who are not technically adept in modelling the processes involved in constructing complex video processing workflows. The requirements for addressing the problem of automated video analysis for non-expert users include (i) process automation; (ii) rich process modelling; (iii) performance-based tool selection; and (iv) adaptability to changing user needs.

Conventional video and image processing (VIP) systems, on the other hand, are developed by image processing experts and are tailored to produce highly specialised hand-crafted solutions for very specific tasks, making them rigid and non-modular. Traditionally they produce single-executable systems that work accurately on a specific set of data. The knowledge-based vision community has attempted to produce more modular solutions by incorporating ontologies. However, ontologies have not been maximally utilised to encompass aspects such as application context descriptions (e.g. lighting and clearness effects) and qualitative measures.

This work aims to tackle some of the research gaps yet to be addressed by the workflow and knowledge-based image processing communities by proposing a novel workflow composition approach within an integrated framework. This framework distinguishes three levels of abstraction via the design, workflow and processing layers. The core technologies that drive the workflow composition mechanism are ontologies and planning. Video processing problems provide a fitting domain for investigating the effectiveness of this integrated method, as tackling such problems has not been fully explored by the workflow, planning and ontological communities despite their combined beneficial traits for confronting this known hard problem. In addition, the pervasiveness of video data has increased the need for more automated assistance for image processing-naive users, but no adequate support has been provided as of yet.

A set of modular ontologies was constructed to capture the goals, video descriptions and capabilities (video processing tools). They are used in conjunction with a domain-independent planner to help with performance-based selection of solution steps based on preconditions, effects and postconditions.

Two key innovations of the planner are the ability to support workflow execution (by interleaving planning with execution) and the ability to perform in automatic or semi-automatic (interactive) mode. In the interactive mode, the user is involved in tool selection based on the recommended descriptions provided by the workflow system via the ontology. Once planning is complete, the result of applying the tool of their choice is presented to the user visually for verification. This plays a pivotal role in providing the user with control and the ability to make informed decisions. Video processing problems can also be solved in more modular, reusable and adaptable ways compared to conventional image processing systems.

The integrated approach was evaluated on a test set consisting of videos of varying quality originating from an open sea environment. Experiments to evaluate the efficiency, adaptability to users' changing needs and user learnability of this approach were conducted on users who did not possess image processing expertise. The findings indicate that using this integrated workflow composition and execution method: (1) provides a speed up of over 90 % in execution time for video classification tasks using fully automatic processing compared to manual methods, without loss of accuracy; (2) is more flexible and adaptable in response to changes in user requests (be it in the task, constraints to the task or descriptions of the video) than modifying existing image processing programs when the domain descriptions are altered; and (3) assists the user in selecting optimal solutions by providing recommended descriptions.

Outline Section 2 describes existing workflow composition initiatives and highlights their shortfalls. The workflow composition framework is outlined in Section 3, which introduces the workflow tool and its components. One of the major components, the ontology, is detailed in Section 4. The second core technology, the planning mechanism, is explained in Section 5. Evaluation of the overall approach and analysis are provided in Section 6. Section 7 concludes with future directions for this research.


2 Grid Workflow Systems

With the advent of distributed computing in the past two decades, workflows have been deployed on distributed platforms. In a distributed context, such as the Grid or e-Science [9], a workflow can be abstracted as a composite web service, i.e. a service that is made up of other services that are orchestrated in order to perform some higher level functionality. The goal of e-Science workflow systems is to provide a specialised programming environment to simplify the programming effort required by scientists to orchestrate a computational science experiment [31]. Therefore, Grid-enabled systems must facilitate the composition of multiple resources, and provide mechanisms for creating and enacting these resources in a distributed manner. This requires means for composing complex workflows for execution, which has attracted considerable effort within the Grid workflow community.

Several major workflow systems were analysed in terms of workflow composition. These include Pegasus [7], a workflow management system that aims to support large-scale data management in a variety of applications such as astronomy, neuroscience, biology, gravitational-wave science and high-energy physics; Triana [32, 33], a problem solving and workflow programming environment that has been used for text, speech and image processing tasks; Taverna [27], an open source workflow engine that aims to provide a language and software tools to facilitate easy use of workflow and distributed compute technology for biologists and bioinformaticians; and Kepler [17], a workflow project that consists of a set of Java packages supporting heterogeneous, concurrent modelling to design and execute scientific workflows. Workflow composition mechanisms and the limitations of current efforts are discussed in the following subsections.

2.1 Workflow Composition Mechanisms

Studies on workflow systems have revealed four aspects of the workflow lifecycle: composition, mapping (onto resources), execution and provenance capture [6]. This paper focuses on the composition and execution aspects of the workflow lifecycle. Workflow composition can be textual, graphical or semantics-based. Textual workflow editing requires the user to describe the workflow in a particular workflow language such as BPEL [1], SCUFL [26], DAGMan [29] or DAX [7]. This method can be extremely difficult or error-prone even for users who are technically adept with the workflow language. Graphical renderings of workflows, such as those utilised by Triana, Kepler and VisTrails [3], are easy for small workflows with fewer than a few dozen tasks. However, many e-Science and video processing workflows are more complex. Some workflow systems have both textual and graphical composition abilities. The CoG Kit's Karajan [35] uses either a scripting language, GridAnt, or a simple graphical editor to create workflows.

Blythe et al. [2] have researched a planning-based approach to workflow construction and declarative representations of data shared between several components in the Grid. This approach can be extended for use in a web services context. Workflows are generated semi-automatically with the integration of the Chimera system [10]. In Pegasus [7], abstract workflows (directed acyclic graphs composed of tasks and the data dependencies between them) may be constructed with assistance from a workflow editor such as the Composition Analysis Tool (CAT) [16], which critiques partial workflows composed by users and offers suggestions to fix composition errors and to complete the workflow templates. It assumes that the user may not have explicit descriptions of the desired goals at the beginning. It utilises classical planning to perform workflow verification. Wings [12] extends this by dealing with the creation and validation of very large scientific workflows. However, CAT requires the user to construct a workflow before interactively verifying it to produce a final workflow. Our effort, in contrast, aims to construct the workflow interactively or automatically.

Splunter et al. [34] propose a fully automated agent-based mechanism for web service composition and execution using an open matching architecture. In a similar vein to these two approaches, our effort aims to provide semi-automatic and automatic means for workflow composition, but does not deal with the mapping of resources onto the workflow components.

Three distinct stages are distinguished for workflow creation: the creation of workflow templates, the creation of workflow instances and the creation of executable workflows (done by Pegasus). Workflow templates specify complex analysis sequences while workflow instances specify data. Workflow templates and instances are semantic objects that are represented in ontologies using OWL-DL. While semantics-based workflow composition is the subject of current research, most efforts have focused on easing the creation of large-scale computational workflows or on web services [6].

2.2 Limitations of Current Grid Workflow Solutions

The major workflow systems mentioned at the beginning of this section possess some features that are worth investigating in order to assess their suitability and limitations for the purposes of this study. In terms of composition itself, Pegasus's main strength is in mapping abstract workflows to their concrete (executable) forms, which are then executed by a scheduler. It also provides adaptivity through a partitioner that uses planning to produce partial executable workflows. It does not, however, provide automatic composition of the abstract workflows.

Triana, Taverna and Kepler contain similar elements; Triana's tasks are conceptually the same as Taverna's processes and Kepler's actors. The approach in Kepler is very similar to Triana in that the workflow is visually constructed from actors (Java components), which can either be local processes or can invoke remote services such as Web services. In terms of applicability, Pegasus would best suit a domain with well-defined requirements and where the overall goal could be determined from a given set of rules and constraints. Triana is well suited for composing complex workflows for Web services and peer-to-peer services. Taverna is also suitable for Web and Grid services contexts, but its use may be limited to composing simple workflows, whereas Kepler works very well for composing workflows for complex tasks but has yet to reach its potential as a fully Grid-enhanced system. Kepler is built upon Ptolemy II, which is primarily aimed at modelling concurrent systems. Furthermore, it is designed to be used by scientists, which imposes some level of expertise on the user.

While existing workflow systems have more recently incorporated ontologies, their use is still limited. The use of such technologies should not be exclusively independent; rather, they should be fully integrated into the system. Existing systems do not provide full ontological handling or integration; instead they make use of separate ontology tools to define and manipulate ontologies. The main limitations of existing workflow initiatives can be summarised as follows:

– Limited or no automated support in constructing workflows, thus requiring the user to possess domain expertise.

– Inability to improve performance autonomously (or with user involvement) in an incremental manner according to specified goals.

– Generally no full integration of ontologies that would allow for more powerful representation and reasoning abilities.

Next, the framework constructed to overcome the limitations of existing efforts, in particular to provide automatic workflow composition, is described.

3 Hybrid Three-layered Workflow Composition Framework

A hybrid semantics-based workflow composition method within a three-layered framework was devised and implemented (Fig. 1). It distinguishes three different levels of abstraction through the design, workflow and processing layers. Each layer contains several key components that interact with one another and with components in other layers. This has been the backbone of the Semantics-based Workflows for Analysing Videos (SWAV) tool [21].

Fig. 1 Overview of the hybrid workflow composition framework for video processing. It provides three levels of abstraction through the design, workflow and processing layers. The core technologies include ontologies and a planner, used in the SWAV tool

The design layer contains components that describe the domain knowledge and available video processing tools. These are represented using ontologies and a process library. A modeller is someone who is able to manipulate the components of the design layer, for example to populate the process library and modify the ontologies. Typically the modeller has training in conceptual modelling and has knowledge of the application domain, but not necessarily. The components can also be updated automatically, as will be shown in Section 5. Knowledge about image processing tools, user-defined goals and domain descriptions is organised qualitatively and defined declaratively in this layer, allowing for versatility, rich representation and semantic interpretation. The ontologies used for this purpose are described in Section 4.

The process library developed in the design layer of the workflow framework contains the code for the image processing tools and methods available to the system. These are known as the process models. The first attempt at populating the process library involved identifying all primitive tasks for the planner based on the finest level of granularity. A primitive task is one that is not further decomposable and may be performed directly by one or more image processing tools, for instance a function call to a module within an image processing library, or an arithmetic, logical or assignment operation. Each primitive task may take in one or more input values and return one or more output values. Additionally, the process library contains the decompositions of non primitive tasks, or methods. This will be explained in Section 5. The complete list of independent executables can be found in [20].

The workflow layer is the main interface between the user and the system. It also acts as an intermediary between the design and processing layers. It ensures the smooth interaction between the components, access to and from various resources such as raw data and the image and video processing toolset, and communication with the user. Its main reasoning component is an execution-enhanced planner that is responsible for transforming high level user requests into low level video processing solutions. Detailed workings of the planner are given in Section 5.

The workflow enactor plays the important role of choreographing the flow of processing within the system. It should be noted that, unlike the workflow enactors covered in Section 2, the SWAV tool does not deal with resource allocation and scheduling, but rather with the composition of specific operators and the execution of the operators given their predetermined parameters. First it reads in the user request in textual form (the user selects from a list of options). Next it consults the goal and video description ontologies to formulate the input that is then fed to the planner. When the planner, with the assistance of the process library and capability ontology, returns the final solution plan, the enactor prompts the user for further action. The user has visual access to the final result of the video processing task, and has the choice to (1) rerun the same task on the same video but with modifications to the domain information; (2) rate the quality of the result; or (3) perform another task. The composed workflow is saved in a script file that can be invoked off-line. By being able to view the result of each solution with changes to the domain information, the user can assess the quality of the solution produced. This feedback mechanism could be used as a basis for improving the overall performance of the system, as verifying the quality of video processing solutions automatically is not a trivial task. The planning mechanism is described in Section 5.

The processing layer consists of a set of video and image processing tools that can perform various image processing functions. The functions (primitive tasks) that these tools can perform are represented semantically in the capability ontology in the design layer, described in Section 4. Once a tool has been selected by the planner, it is applied to the video directly. The final result is passed back to the workflow layer for output and evaluation. A set of video processing tools developed for this research is available at [20].

Several prototypes of the workflow interface were designed over time; initially a textual interface was used to communicate with the user, and more recently the SWAV tool incorporates a graphical interface (Fig. 2).

Fig. 2 The SWAV interface, which allows the user to select a video and a task (top left panels) and adjust domain settings (right panel). Annotated results are displayed to the user

4 Video and Image Processing Ontology

Ontologies are used for capturing knowledge and semantics in a domain and have been used widely in several major fields including medicine, linguistics and enterprise. Domain ontologies are often modelled in a collaborative effort between domain and ontology experts to capture consensual knowledge that can be shared and reused among the domain experts. In the video processing field, ontologies are extremely suitable for many problems that require prior knowledge to be modelled and utilised in both a descriptive and prescriptive capacity, since they encode the concepts and relationships between the components in the world.

4.1 Modularisation

For the purposes of this research, a set of ontologies was required to model the video and image processing (VIP) field so that it can be used for domain description and understanding, as well as inference. The ontology should describe the domain knowledge and support reasoning tasks, while being reasonably independent from the system. The principles adopted for the ontology construction included simplicity, conciseness and appropriate categorisation. For this reason, three aspects of the VIP field were highlighted. These were identified as goal, video description and capability. These aspects were motivated by the context of their use within a planning system that requires the goal and initial domain state model (which includes the initial video description) and also a performance-based selection of operators. Following the SUMO (Suggested Upper Merged Ontology, http://www.ontologyportal.org/) ontology representation, a modular ontology construction was adopted (see Fig. 3). The modularisation aims to separate the formulation of the problems from the description of the data and the solutions to be produced.

Fig. 3 Modular structure of the Video and Image Processing (VIP) Ontology

The next three subsections describe the three ontologies; a more detailed explanation of their construction in relation to the Fish4Knowledge project can be found in [19].

4.2 Goal Ontology

The Goal Ontology contains the high level questions posed by the user and interpreted by the workflow as VIP tasks, termed goals, and the constraints on the goals. The main concepts of the Goal Ontology are shown in Fig. 4.

Under the 'Goal' class, more specialised subclasses of video processing goals are described. Some examples include 'Object detection', 'Event detection' and 'Object Clustering'. Under each of these, even more specialised goals are described. For instance, more specific goals of 'Object detection' include 'Fish detection' and 'Coral detection', which are relevant for this work.

'Constraint on Goal' refers to the conditions that restrict the video and image processing tasks or goals further. In our context, the main constraint on a VIP goal is the 'Duration', a subclass of 'Temporal Constraint'. Each task may be performed on all the historical videos, or on a portion specified by the user – within a day, night, week, month, year, season, sunrise or sunset (all specified as instances of the class 'Duration').

Other constraint types include 'Control Constraint', 'Acceptable Error' and 'Detail Level'. The control constraints are those related to the speed of VIP processing and the quality of the results expected by the user. 'Performance Criteria' allows the user to state whether the goal that they wish to perform should be executed using a faster algorithm (indicated by the criterion processing time) or whether it should take less (CPU) memory. Processing time and memory are instances of 'Performance Criteria'. Instances of the class 'Quality Criteria' are reliability and robustness. 'Quality Criteria' with the value reliability constrains the solution to be the most accurate result. If such a solution cannot be found, then the system should fail rather than produce alternative options. Robustness indicates the reverse: the system should not break down completely in cases where a reliable solution cannot be found, but should instead return an alternative (imperfect) result.

Another constraint that the user may want to insert is the threshold for errors, contained in the class 'Acceptable Error'. A typical example of this is contained in its subclass 'Accuracy', which states the accuracy level of a detected object. Two instances of accuracy are prefer miss than false alarm and prefer false alarm than miss. Miss and false alarm are terminologies used within VIP tasks that involve the detection of objects to indicate the accuracy level of the detection. Consider a real object to be the object that needs to be detected. A miss (false negative) occurs when a real object exists but is not detected. A false alarm (false positive) occurs when an object that is not a real object has been detected.

Fig. 4 Goal ontology denoting the main classes of goals and constraints

The class 'Detail Level' contains constraints that are specific to particular details, for example the detail of 'Occurrence'. The criterion 'Occurrence' is used for detection tasks to restrict the number of objects to be detected. The value all for occurrence requires that all the objects should be identified.

The Goal Ontology is used for consistency checks when a user query is detected in the system. It can check that the query matches a goal or set of goals that is achievable within the workflow system. It is also used to guide the selection of higher level tasks for the workflow and to formulate input values for the reasoning engine that is responsible for searching the VIP solution set for a VIP task, i.e. to compose the workflow.

4.3 Video Description Ontology

The Video Description Ontology describes the concepts and relationships of the video and image data, such as what constitutes video/image data, acquisition conditions such as lighting conditions, colour information, texture and environmental conditions, as well as spatial relations and the range and type of their values. Figure 5 gives a pictorial overview of the main components of the Video Description Ontology. The upper level classes include 'Video Description', 'Descriptor Value', 'Relation' and 'Measurement Unit'.

Fig. 5 Main concepts of the Video Description Ontology

Page 10: Semantics and Planning Based Workflow Composition for Video Processing

532 G. Nadarajan et al.

The main class of this ontology is the 'Video Description' class, which has two subclasses: 'Description Element' and 'Descriptor'. A description element can be either a 'Visual Primitive' or an 'Acquisition Effect'. A visual primitive describes visual effects of a video/image such as an observed object's geometric and shape features, e.g. size, position and orientation, while an acquisition effect contains the non-visual effects of the whole video/image, such as brightness (luminosity), hue and noise conditions. The descriptors for the description elements are contained under the 'Descriptor' class and are connected to the 'Description Element' class via the object property 'hasDescriptionElement' (not visible in the diagram).

Typical descriptors include shape, edge, colour, texture and environmental conditions. Environmental conditions, which are acquisition effects, include factors such as current velocity, pollution level, water salinity, surge or wave, water turbidity, water temperature and typhoon, specified as instances. The values that the descriptors can hold are specified in the 'Descriptor Value' class and connected by the object property 'hasValue'. For the most part, qualitative values such as 'low', 'medium' and 'high' are preferred to quantitative ones (e.g. numerical values). Qualitative values can be transformed to quantitative values using the 'convertTo' relation. This requires the specific measurement unit derived from one of the classes under the concept 'Measurement Unit' and a conversion function for the respective descriptor; e.g. a low velocity could be interpreted as movement with a velocity in the range of 0 to 25 m s−1 (currently there is no fixed conversion formula; the actual formula used will be determined as we gain more experience with the workflow system over time). Some descriptor values can be tied to their appropriate measurement units. The property that specifies this is 'hasMeasurementUnit', which relates instances in the class 'Descriptor' to instances in the class 'Measurement Unit'.
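To make this structure concrete, the following is a minimal sketch of how such descriptor knowledge might be expressed as Prolog facts, the language used for the planner and domain model (Section 5.1). The predicate names (has_value/2, convert_to/3, has_measurement_unit/2) and all numeric ranges other than the 0–25 m s−1 example above are illustrative assumptions rather than the actual ontology vocabulary.

% Illustrative sketch only: descriptor facts loosely mirroring the
% Video Description Ontology. Predicate names and the 'medium' range
% are assumptions; the low-velocity range follows the example above.
descriptor(brightness).
descriptor(clearness).
descriptor(current_velocity).

has_value(brightness, [low, medium, high]).
has_value(current_velocity, [low, medium, high]).

has_measurement_unit(current_velocity, metres_per_second).

% 'convertTo': map a qualitative value to a quantitative range
convert_to(current_velocity, low,    range(0, 25)).
convert_to(current_velocity, medium, range(25, 50)).   % assumed range

% Example query:
% ?- convert_to(current_velocity, low, R).
% R = range(0, 25)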

The Goal and Video Description Ontologies were developed in collaboration with image processing experts, knowledge-based vision communities and domain experts. Preliminary work with the Hermes project [28] brought about initial versions of these two ontologies [22].

4.4 Capability Ontology

The Capability Ontology (Fig. 6) contains the classes of video and image processing functions and tools. Each function (or capability) is associated with one or more tools. A tool is a software component that can perform a video or image processing task independently, or a technique within an integrated vision library that may be invoked with given parameters. This ontology is used directly by the planner in order to identify the tools that will be used to solve the problem. As this ontology was constructed from scratch, the METHONTOLOGY methodology [13] was adopted. It is a comprehensive methodology for building ontologies either from scratch, reusing other ontologies as they are, or by a process of re-engineering them. The framework enables the construction of ontologies at the knowledge level, i.e. the conceptual level, as opposed to the implementation level.

This ontology is used to identify the tools for workflow composition and execution of VIP tasks. In our context, it is used by a reasoner and a process library for the selection of optimal VIP tools. The main concepts intended for this ontology have been identified as 'VIP Tool', 'VIP Technique' and 'Domain Descriptions for VIP Tools'. Each VIP technique can be used in association with one or more VIP tools. 'Domain Description for VIP Tool' represents a combination of known domain descriptions (video descriptions and/or constraints on goals) that are recommended for a subset of the tools. This is used to indicate the suitability of a VIP tool when a given set of domain conditions holds at a certain point of execution. At present these domain descriptions are represented as strings and tied to VIP tools; e.g. the Gaussian background model has the description 'Clear and Fast Background Movement' to indicate the best selection criteria for it.
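As an illustration, the relations named in Algorithm 1 (Section 4.5), canPerform and hasPerformanceIndicator, could be seen as facts of the following kind; the task and tool identifiers below are simplified stand-ins apart from the Gaussian background model example just mentioned.

% Illustrative sketch only: capability-ontology content as Prolog facts.
% Relation names follow Algorithm 1; instance names are simplified.
instance(gaussian_bg_model, background_modelling_tool).
descendant_of(background_modelling_tool, tool).

% which primitive task (planning step) a tool can perform
canPerform(gaussian_bg_model, model_background).

% recommended domain descriptions tied to a tool
hasPerformanceIndicator(gaussian_bg_model,
                        'Clear and Fast Background Movement').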

The main types of VIP tools are video analysis tools, image enhancement tools, clustering tools, image transform tools, basic structures and operations tools, object description tools, structural analysis tools and object recognition and classification tools. At present fish detection and tracking have been performed more than the other tasks, hence the ontology has been populated with most of the tools associated with these tasks. For other tasks that have not been performed, e.g. fish clustering, the ontology will be extended and populated in due course. Detection and tracking tools fall under the class 'Video Analysis Tool'. Other types of video analysis tools are event detection tools, background modelling tools and motion estimation tools.

Fig. 6 Main concepts of the Capability Ontology

The class 'Object Description Tool' specifies tools that extract features such as colour, texture, size and contour, while image transform tools are those concerned with operations such as point, geometric and domain transformations. 'VIP Technique' is a class that contains technologies that can perform VIP operations. For now, two types of machine learning techniques have been identified. These techniques can be used to accomplish the task of one or more VIP tools. For example, neural networks can be used as classifiers as well as detectors.

The Capability Ontology can be used for reasoning during workflow composition using planning. As planning takes preconditions into account before selecting a step or tool, it will assess the domain conditions that hold and use them in conjunction with an appropriate VIP tool.

4.5 Walkthrough

Based on the devised sub-ontologies, a walkthrough of their usage to provide different levels of vocabulary in a seamless and related manner is given here. The flowchart in Fig. 7 outlines the workflow enactment process, emphasising the points that require ontological usage.

Fig. 7 Flowchart of the workflow enactor's interaction with the user, annotated with ontology use. Shaded processes denote user-provided input

The user may have a high level goal or task such as "Detect all the fish in the video 1.mpeg" in mind. One way this could be represented and selected is via the following criterion-value pairs in natural language:

[Goal: goal = fish_detection]
[Constraints: Performance = memory,
              Quality = reliability,
              Accuracy = prefer_miss_than_false_alarm,
              Occurrence = all]
[Video Descriptions: Brightness = not_known,
                     Clearness = not_known,
                     Green Tone = not_known]

As a first step, the user selects the goal s/he wishes to achieve via the workflow interface. This corresponds to steps 1 and 2 in the flowchart. Once the goal is retrieved, it is checked against the goal ontology (step 4). Axioms in the goal ontology are used to check whether the goal is indeed one that is valid for the application in question. In this case the instance 'fish_detection' is checked against the class 'goal' to determine whether they belong to the same hierarchy.

Next, the related constraints for the goal are determined (steps 5 and 6). These are additional parameters that specify rules or restrictions that apply to the goal. For the goal 'fish_detection', the relevant constraints are Performance Criteria, Quality Criteria, Accuracy and Occurrence, contained in the goal ontology. The user may choose to provide all, some or none of these values. Adopting the same principle used for goal retrieval, constraint values are checked for validity (step 7), i.e. whether they are descendants of the class 'Constraint on Goal'.

Then, depending on the goal, the user is prompted for the video descriptions (steps 9 and 10). For the task 'fish_detection', the descriptions required are the brightness, clearness and green tone levels. As before, the user may choose to provide all, some or none of these values. The values obtained are checked against the video description ontology (step 11). The validity check is again the same as for the goal and constraints, except that the values (instances) are checked against the class 'descriptor' in the video description ontology. In the absence of user information, constraints are set to default values while video descriptions are obtained via preliminary analysis.
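A minimal sketch of how the validity checks in steps 4, 7 and 11 could be realised over the ontologies is given below; the subclass_of/2 and instance_of/2 predicates and the example class names are assumptions introduced purely for illustration.

% Illustrative sketch only: an instance is valid if its class is a
% descendant of the expected top-level class (goal, constraint on goal
% or descriptor). Predicate and class names are assumed.
subclass_of(object_detection, goal).
subclass_of(fish_detection_goal, object_detection).
instance_of(fish_detection, fish_detection_goal).

% reflexive-transitive closure over subclass links
is_descendant_of(C, C).
is_descendant_of(C, Ancestor) :-
    subclass_of(C, Parent),
    is_descendant_of(Parent, Ancestor).

valid_for(Instance, TopClass) :-
    instance_of(Instance, Class),
    is_descendant_of(Class, TopClass).

% ?- valid_for(fish_detection, goal).
% true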

Once the goal, constraints and video descriptions are determined, the formulation of the user's problem is complete and this information is fed to the planner (step 12). The inner workings of the planner are described in Section 5.1. Essentially, the planner seeks to find steps in the form of VIP tools composed in sequential, iterative and conditional fashions in order to solve the task. At each step of the way, the planner attempts to find a suitable tool by consulting the capability ontology (steps 14/16). This is done using the 'check_capability' function, described in Algorithm 1.

First it retrieves a tool that can perform the planning step. Subsequently, it checks whether the selected tool is linked to a list of domain conditions that are deemed suitable for it according to image processing experts' heuristics. Not all tools are tied to domain conditions. The domain conditions that are suitable for this tool are checked against the current domain conditions. If all of them hold, then the tool is selected for execution. Otherwise it will try to find another tool for which such conditions hold, failing otherwise. This is applied when the planning mode is automatic (steps 13 and 14). In semi-automatic mode, the user makes this tool selection whenever more than one tool is available to perform a planning step. All the tools and their recommended descriptions are displayed to the user, who selects one of them (steps 15 and 16). The descriptions are expressed in natural language to ease readability for the user. When a tool is selected, it is applied directly to the video or image in question (step 18). This planning interleaved with execution process continues until the task is solved, i.e. when the goal is achieved (step 19).

Algorithm 1 VIP tool (planning step) selection via the 'check_capability' function

check_capability(planning-step S, planning-mode Auto)
  If Auto is true
    Retrieve a tool, T, that can perform S from the capability ontology: getTool(S)
    Check that the domain-criteria tied to this tool (if any) hold: check_criteria(T)
  Else
    Display the description of S, D, to the user: hasDescription(S, D)
    Retrieve ALL tools Ts that can perform S from the capability ontology
    For each tool ti in Ts
      Retrieve the recommended domain descriptions, RD, for ti: hasPerformanceIndicator(ti, RD)
      Display ti and RD to the user
    End for
    Display also the system's recommended tool, ts: getTool(S)
    Prompt the user to select a tool T
  Return T

getTool(S)
  canPerform(T, S)
  instance(T, T1)
  descendant_of(T1, tool)
  Return T

check_criteria(T)
  If a set of domain-criteria, DC, exists for this tool: hasPerformanceIndicator(T, DC)
    Retrieve the list of preconditions, P, for DC: instance_att_list(DC, P)
    If all preconditions in P hold
      return DC
    Else return Fail
  Else return 'no_criteria'
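For concreteness, a minimal Prolog rendering of the automatic branch of Algorithm 1 could look as follows. The relation names canPerform, hasPerformanceIndicator, instance and descendant_of follow the listing; preconditions_of/2 (standing in for instance_att_list), holds/1 and the example facts are assumptions made purely for illustration, not the system's actual code.

% Illustrative sketch only: automatic-mode tool selection.
check_capability(Step, Tool) :-
    get_tool(Step, Tool),
    check_criteria(Tool).

get_tool(Step, Tool) :-
    canPerform(Tool, Step),
    instance(Tool, Class),
    descendant_of(Class, tool).

% a tool passes if no criteria are tied to it, or if all of its
% recommended preconditions hold in the current domain state
check_criteria(Tool) :-
    \+ hasPerformanceIndicator(Tool, _).
check_criteria(Tool) :-
    hasPerformanceIndicator(Tool, Criteria),
    preconditions_of(Criteria, Preconds),
    all_hold(Preconds).

all_hold([]).
all_hold([P|Ps]) :- holds(P), all_hold(Ps).

% example facts (assumed)
canPerform(gaussian_bg_model, model_background).
instance(gaussian_bg_model, background_modelling_tool).
descendant_of(background_modelling_tool, tool).
hasPerformanceIndicator(gaussian_bg_model, clear_fast_bg).
preconditions_of(clear_fast_bg, [clearness(clear), background_movement(fast)]).
holds(clearness(clear)).
holds(background_movement(fast)).

% ?- check_capability(model_background, T).
% T = gaussian_bg_model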

5 Planning and Workflow Enactment

At the heart of the system lies a workflow enactor that interfaces the interactions with the user and coordinates all the activities between the components within the system. The main component is a planner that is responsible for the derivation of video and image processing (VIP) solutions based on the provided goal and domain descriptions. The planner is therefore a reasoner that translates the high level non-technical terms (user goals and preferences) into low level technical terms (VIP operations) for workflow composition. This is done with the assistance of the process library and the ontologies. Two key innovations of the planner are the ability to support workflow execution (it interleaves planning with execution) and the ability to perform in automatic or semi-automatic (interactive) mode. It extends the capabilities of typical planners by guiding users to construct more optimal solutions via the provision of recommended descriptions for the tools. It is also highly adaptable to user preferences.

5.1 Planner Design and Representation

The planner was designed with the aim of enhancing the flexibility of existing Hierarchical Task Network (HTN) planners. HTN planning uses so-called methods or refinements to describe the decomposition of non primitive tasks in terms of primitive and other non primitive tasks. Primitive tasks are directly executable by applying an operator. The planner starts with an abstract network representing the task to be accomplished, and proceeds by expanding the abstract network into more detailed networks lower down in the abstraction hierarchy, until the networks contain only executable actions. HTN methods are a form of domain-specific knowledge (in the form of task decompositions). This greatly reduces the search space of the planner by encoding knowledge on how to go about looking for a plan in a domain, and also enables the user to control the type of solutions that are considered. Due to its successes and practicality, several major HTN planners have been developed over the past decades, namely O-Plan [5], SHOP [25], I-X [30], HyHTN [18] and SHOP2 [24].

The design includes interleaving planning with execution, and providing automatic and semi-automatic (interactive) modes of planning. The semi-automatic mode of planning enables the user to construct more optimal solutions by providing recommended descriptions when more than one tool is available to perform a primitive task.

The implementation of the planner has been kept separate from the modelling of the domain. The planner itself is domain-independent and can be tailored for use in different problem scenarios as long as the domain descriptions and relevant ontologies are provided. Domain-specific information is encoded in the knowledge base as facts and in the process library as primitive tasks and methods. The planner and domain model are described declaratively using SICStus Prolog 4.0. The planner was built on ordered task decomposition, where the planning algorithm composes tasks in the same order that they will be executed.

The input to the planner is the goals, the objects and the conditions of the objects at the beginning of the problem (the initial state), and a representation of the actions that can be applied directly to primitive tasks (operators). For HTN planning, a set of methods that describe how non primitive tasks are decomposed into primitive and non primitive tasks is also required. The output should be (partial) orderings of operators guaranteed to achieve the goals when applied to the initial state.

A video processing task can be modelled as an HTN planning problem, where the goal list G is represented by the VIP task(s) to be solved, the primitive tasks p are represented by the VIP primitive tasks and the operators O are represented by the VIP tools that may perform the primitive tasks directly. The methods M specify how the non primitive tasks are decomposed into primitive and non primitive subtasks. The primitive tasks and methods are contained in the process library.

Adapting the conventions provided in Ghallab et al. [11], an HTN planning problem is a 5-tuple

P = (s0, G, P, O, M)

where s0 is the initial state, G is the goal list, P is a set of primitive tasks, O is a set of operators, and M is a set of HTN methods. A primitive task, p ∈ P, is a 3-tuple

p = (name(p), preconditions(p), postconditions(p))

where name(p) is a unique name for the primitive task, preconditions(p) is a set of literals that must hold for the task to be applied and postconditions(p) is a set of literals that must hold after the task is applied.

An HTN method, m ∈ M, is a 6-tuple

m = (name(m), task(m), preconditions(m), decomposition(m), effects(m), postconditions(m))

where name(m) is a unique name for the method, task(m) is a non primitive task, preconditions(m) is a set of literals that must hold for the task to be applied, decomposition(m) is a set of primitive or non primitive tasks that m can be decomposed into, effects(m) is a set of literals to be asserted after the method is applied and postconditions(m) is a set of literals that must hold after the task is applied. The planning domain, D, is the pair (O, M).


5.2 Primitive Tasks, Operators and Methods

Thirty VIP operators were identified and implemented for performing the tasks of video classification according to brightness, clearness and algal levels, and fish detection and counting. They were developed using a combination of top-down and bottom-up approaches that allowed a suitable level of granularity for the operators [23]. The corresponding primitive tasks that the operators can act upon are encoded in the process library. As stated earlier, primitive tasks are those that can be performed directly by operators or VIP tools.

For each primitive task, its corresponding technical name, preconditions, parameter values, output values, postconditions and the effects of applying the task are specified. The preconditions are all the conditions that must hold (prerequisites) for the primitive task to be performed. The effects are conditions that will be asserted or retracted after completion of the task, and the postconditions are all the conditions that must hold after the task is applied.

Non primitive tasks are decomposable into primitive and non primitive subtasks. Schemes for reducing them are encoded as methods in the process library. For each method, the name of the method, the preconditions, the decomposition, the effects and the postconditions are specified. The decomposition is given by a set of subtasks that must be performed in order to achieve the non primitive task. For VIP tasks, the methods are broadly categorised into three distinct types: non recursive, recursive and multiple conditions. A non recursive method is the most common form and does not involve loops or branching of any sort, for example a direct decomposition of a video classification method into processing the image frames, followed by performing the classification on the frames and finally writing the resulting frames to a video. Recursive methods, as the name suggests, model loops, hence the decomposition of such a method includes itself as a subtask. Multiple conditions arise when there is more than one decomposition for a method. For instance, texture features can be computed either by computing the histogram values followed by the statistical moments, or by computing a Gabor filter, so two separate decompositions are possible for computing the texture features.
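As an illustration of how such entries might look in a declarative process library, the sketch below encodes some primitive tasks and the two alternative decompositions for computing texture features mentioned above. The predicate names, argument layout and task identifiers are assumptions for illustration, not the system's actual encoding.

% Illustrative sketch only: process-library entries.
% primitive_task(Name, Preconditions, Postconditions)
primitive_task(compute_histogram,
               [frame_loaded],
               [histogram_available]).
primitive_task(compute_statistical_moments,
               [histogram_available],
               [texture_features_computed]).
primitive_task(compute_gabor_filter,
               [frame_loaded],
               [texture_features_computed]).

% method(Name, Task, Preconditions, Decomposition, Effects, Postconditions)
% 'multiple conditions': two alternative decompositions for the same task
method(texture_via_histogram, compute_texture_features,
       [frame_loaded],
       [compute_histogram, compute_statistical_moments],
       [texture_features_computed],
       [texture_features_computed]).

method(texture_via_gabor, compute_texture_features,
       [frame_loaded],
       [compute_gabor_filter],
       [texture_features_computed],
       [texture_features_computed]).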

5.3 Planner Algorithm

Planning algorithms solve problems by applying applicable operators to the initial state to create new states, then repeating this process from the new states until the goal state is reached. In HTN planning, the algorithm stops when all the goal tasks have been achieved. Algorithm 2 describes the main workings of the planner implemented for this work.

The domain is represented by predicates that contain the video descriptions (e.g. brightness, clearness and green tone levels), the constraints (e.g. accuracy and processing time), the methods (decompositions) and the operators (tools that execute the primitive tasks). The algorithm is a recursive function that works on the goal list until it is empty. It inspects each item in the goal list to see whether it is a primitive task. If the item is a primitive task, it seeks to find an operator that can perform the primitive task. This is done automatically or semi-automatically, depending on the planning mode selected by the user. Once found, the operator is applied and the primitive task is accomplished. If the task is not primitive, it looks for a method instance that matches it and appends its decompositions to the start of the goal list. The basis for the planning algorithm was taken from HTN planners that generate plans in a totally ordered fashion, where tasks are decomposed from left to right in the same order that they will be executed. In addition, it can plan interactively, interleave planning with workflow execution and has been enriched to make use of knowledge from the ontologies.

Algorithm 2 Workings of the SWAV planner in semi-automatic mode

gplanner(initial-state S, goal-list [g1|G], domain D, solution-plan P)
  Initialise P to be empty
  If the goal-list is empty, return P
  Consider the first element, g1, of the goal-list [g1|G]
  Case 1: If g1 is primitive AND all its preconditions hold
    1.1. If one or more operator instances (tools) match g1
      Retrieve ALL operators (tools) T = [t1,..,tn] from the capability
      ontology that can perform g1
      For each tool ti in T
        Retrieve the suitable domain conditions for ti from the capability ontology
      End for
      If more than one operator is available to perform g1
        Display all applicable tools with their suitable domain conditions
        Prompt the user to select the preferred tool, tp
      Else tp is the only available operator to perform g1
      Apply tp to S and the planning algorithm to the rest of the goal-list:
        apply-operator(tp, Input_list, Add_list)
        Check that all postconditions of g1 hold
        gplanner(tp(S), G, D, P)
    1.2. Else return fail
  Case 2: If g1 is non primitive
    2.1. If a method instance m matches g1 in S AND all its preconditions hold
      Append the decompositions of m to the front of the goal-list
      Add all elements in m's Add List to S
      Check that all postconditions of m hold
      Apply the planning algorithm to this new list of goals:
        gplanner(S, append(m(g1), G), D, P)
    2.2. Else return fail

apply-operator(Tool, Input_list, Add_list, P, S)
  Update solution-plan P with Tool (append Tool to the end of P)
  Execute Tool with parameters Input_list
  Add all elements in Add_list (effects) to S
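The following is a stripped-down, automatic-mode skeleton of this ordered task decomposition loop, shown only to make the control flow concrete. It reuses the illustrative primitive_task/3 and method/6 entries sketched in Section 5.2 and the check_capability/2 sketch from Section 4.5; tool execution is stubbed out and precondition/postcondition checking is omitted, so it is not the actual SWAV planner.

% Illustrative sketch only: ordered task decomposition, automatic mode.
:- use_module(library(lists)).   % append/3

plan(_State, [], Plan, Plan).                   % empty goal list: done
plan(State, [Task|Rest], Acc, Plan) :-
    primitive_task(Task, _, _), !,              % Case 1: primitive task
    check_capability(Task, Tool),               % select a tool (Section 4.5 sketch)
    apply_operator(Tool, State, NewState),      % execute it before planning further
    append(Acc, [Tool], Acc1),
    plan(NewState, Rest, Acc1, Plan).
plan(State, [Task|Rest], Acc, Plan) :-
    method(_, Task, _, Decomposition, _, _),    % Case 2: non primitive task
    append(Decomposition, Rest, NewGoals),      % expand left to right
    plan(State, NewGoals, Acc, Plan).

% stub: a real implementation would invoke the external VIP tool and
% assert its effects; here the call is merely recorded in the state
apply_operator(Tool, State, [executed(Tool)|State]).

A call such as plan(InitialState, [GoalTask], [], Plan) would then decompose GoalTask left to right, executing each selected tool as soon as it is chosen.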

5.4 Automatic or Interactive Mode of Planning

The planner works in either automatic or semi-automatic (interactive) mode, determined by the user before planning takes place. In the automatic mode, user intervention during tool selection is not required. At each planning step, the planner itself selects the tool deemed most optimal, based on the domain conditions that match the tool's recommended domain conditions encoded in the capability ontology. These conditions have been determined based on image processing experts' heuristics (determined empirically). Thus the preferred step in Algorithm 2 is selected by the system rather than the user. When there are no heuristics to guide the tool selection, the first tool encountered that can perform the primitive task is selected for execution. Hence it follows a depth-first style of search.

In the semi-automatic mode, the planning process is interactive when more than one tool is available to perform a primitive task. At this level, the planner derives all the applicable VIP tools and their recommended domain descriptions from the capability ontology (see Algorithm 1 in Section 4.5 for the reasoning mechanism). The user selects the VIP operator/tool of their choice, using the recommended domain descriptions for each tool as guidance. Thus the planner allows the user to select a tool during the planning process, giving them control and also the ability to make informed decisions. The next section will highlight how this control is followed through with user verification of the final result.

5.5 Interleaving Planning with Execution

The planner follows a style of planning interleaved with execution. Once a VIP tool is selected, it is executed before the next planning step is inferred. This is because domain conditions could change as a result of the application of a selected tool, which would then affect the planning of subsequent steps. However, replanning is not allowed until execution of the whole task is completed. This is because intermediate results cannot be used to assess the optimality of the selected VIP tools (neither by the system nor the user), and finding a suitable heuristic for this purpose is not trivial.

Once planning is complete, the user has access to the final video containing the result of applying the tool(s) that they have selected. This gives them a good indication of the optimality of the tool(s) they have selected. Figure 8 shows an example of this for a detection task. After viewing this result, they may decide to replan in order to try different choices of tools. Section 6 includes an evaluation of the learnability level achieved by the user in selecting optimal solutions based on the descriptions provided by the system using the semi-automatic planning mode. The next section will illustrate two examples of how domain descriptions can affect the selection of planning operators.

Fig. 8 Results of applying four different background models for the fish detection and counting task on the same video: (a) Adaptive Gaussian Mixture Model, (b) IFD Model, (c) Poisson Model, (d) W4 Model

6 Evaluation

Three hypotheses were formulated, taking into consideration factors such as diversity in user requirements, variety in the quality of the videos (lighting conditions, object movement, etc.) and the vastness of the data made available:

1. Automated support could be provided for users without image processing expertise to perform VIP tasks in a time-efficient manner using a novel semantics- and planning-based workflow composition and execution system, without loss of accuracy in the quality of the solutions produced.

2. Constructing VIP solutions using multipleVIP executables employing planning andworkflow technologies is more flexible andadaptable towards changing users’ needs thanmodifying single executable programs.

3. The semantics and planning based automated-assisted mechanism to compose and executeworkflows for video processing tasks helps theuser manage and learn the processes involvedin constructing optimal solutions.

6.1 Data Set: Ecogrid Videos

Videos are captured, stored and made accessible to marine biologists continuously via Ecogrid [8], a joint effort between the National Center for High Performance Computing (NCHC), Taiwan and several local and international research institutes, which provides a Grid-based infrastructure for ecological research. Data is acquired using geographically distributed sensors in various protected sites such as Ken-Ting national park, Hobihu station and Lanyu island. The video streams collected have enabled analysis in underwater reef monitoring, among others. Interesting behaviours and characteristics of marine life such as fish and coral can be extracted from the videos by performing analysis such as classification, detection, counting and tracking.

As can be seen from the image captures in Fig. 9, the videos were taken in an uncontrolled open sea environment where the degree of luminosity and water flow may vary depending upon the weather and the time of the day. The water may also have varying degrees of clearness and cleanness. In addition, the lighting conditions change very slowly, the camera and the background are fixed, images are degraded by a blocking effect due to the compression, and the fish appear as regions whose colours are different from the background and are bigger than a minimal area (they are unusable otherwise). Furthermore, as algae grow rapidly in subtropical waters and on the camera lens, they affect the quality of the videos taken. Consequently, different degrees of greenish videos are produced. In order to decrease the algae, frequent manual cleaning of the lens is required.

6.2 Tasks and Subjects

Based on the video descriptions and other features displayed from the video captures, as well as discussions with marine biologists, several broad categories of tasks have been identified as useful for this test data. These are described below, in order of increasing complexity:

T1. Video classification based on brightness, clearness and algal levels.

T2. Fish detection and counting in individual frames.

T3. Fish detection, counting and tracking in video.

The results are annotated to the original video in text and numerical format. For example, Fig. 10a shows the brightness, clearness and green tone values on the resulting video while Fig. 10b shows the detected fish and the number of fish in the current frame image on the top left (incorrect in this frame). These tasks can be conducted in a combined fashion for more sophisticated analysis, for instance T1 and T3 can be combined to conduct video classification, fish detection, counting and tracking. Figure 10c shows an example result for this task. The final video is annotated with brightness, clearness, green tone and fish presence values, as well as the number of fish in the current frame (number at the top left of image) and the number of fish so far in the video (number at the bottom left of image).

Fig. 9 Sample shots from Ecogrid videos. From left to right: clear with fish, algal presence on camera, medium brightness with fish, completely dark and human activity

Fig. 10 Sample visual results for video processing tasks: (a) T1: video classification; (b) T2: fish detection and counting in frames; (c) T1 & T3: video classification, fish detection and tracking

These tasks will be useful to extract higher level analyses such as population analysis, fish behaviour at certain times of the day or year and suitable environmental conditions for fish presence. However, extracting useful characteristics and features from these videos would take too long to perform manually. One minute's clip requires approximately 15 minutes' human effort on average for basic processing tasks [4]. The following subjects and systems were used for experimental purposes:

S1. An image processing-naive user who constructs the solution with the assistance of the workflow tool using full automatic and/or semi-automatic mode.

S2. An image processing-naive user who solves the goal manually.

S3. An image processing expert who constructs the solution using specialised tools (e.g. by writing an OpenCV [15] program).

S4. A workflow modeller who constructs the solution using the workflow tool (e.g. by manipulating the process library and ontologies).

6.3 Statistical Hypothesis Testing Using the t-distribution

Experiments demonstrating performance gains of plan generation with the assistance of the workflow tool over manual processing and program generation from scratch were conducted. Performance gains are measured in CPU time, manual processing time, and quality of solutions as compared to the ground truth provided by marine biologists, where appropriate. The experiments were designed according to the principles outlined in [14], where first a hypothesis and its counterpart null hypothesis are formulated, followed by the variables, conditions and subjects for the experiments, then the hypothesis is tested by providing measurement levels to report the results and finally an analysis is made to accept or reject the null hypothesis. The two sample dependent t-test was performed to determine the t value and its corresponding p value in order to accept or reject the null hypothesis. The t value is given by equation (1) below:

t = d̄e / √(σ²de / n)    (1)

where n is the sample size, d̄e is the mean of the differences between the manual and automatic times and σde is the standard deviation of these differences. Based on the values of t and n, a significance level was computed. A significance level of p < 0.05 was taken as an acceptable condition to reject the null hypothesis.
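As an illustration, equation (1) can be computed directly in a few lines of Python. The sketch below is an assumption-laden reading of (1): it takes σde to be the population standard deviation of the differences (computed over n rather than n − 1) and reports a one-tailed significance level. Under these assumptions it reproduces, up to rounding, the t value reported in Section 6.4.2 when applied to the time differences of Table 1.

    import numpy as np
    from scipy import stats

    def paired_t(differences, ddof=0):
        # t statistic of equation (1) and its one-tailed significance level.
        d = np.asarray(differences, dtype=float)
        n = d.size
        t = d.mean() / np.sqrt(d.var(ddof=ddof) / n)
        p = stats.t.sf(abs(t), df=n - 1)   # one-tailed p value
        return t, p

    # Time differences d_e (automatic minus manual, in seconds) from Table 1.
    d_e = [-45.78, -37.52, -38.03, -43.28, -32.89, -46.11, -35.14, -15.93]
    t_value, p_value = paired_t(d_e)       # roughly -11.43; p is far below 0.0001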

Where appropriate, tasks and systems from Section 6.2 will be referred to. All the experiments were conducted on 27 videos selected from Ecogrid. For the experiments to test efficiency (Section 6.4) and user learnability (Section 6.6), eight participants from a variety of backgrounds were selected as subjects. None of them possessed image processing expertise; three were computer scientists and five were non computer scientists. The backgrounds of the non computer scientists include medicine, history and archaeology, physics, ecology and marine biology. The reason for having this variety was to test the usability and effects of the system on a mix of users, with technical expertise (computer scientists) and without, and with domain expertise (ecologist and marine biologist) and without. These two experiments also required user-driven evaluation measures. All experiments were conducted on a laptop operating on Ubuntu 8.04 with 512 MB RAM and 1.6 GHz processor speed.

6.4 Manual vs. Automatic Approaches for Video Classification Task

In this experiment, subjects were asked to classify 10 videos according to brightness, clearness and green tone (algal) levels. First, they were required to conduct the task manually, and then using the workflow tool. Using manual processing, each video was played using video processing software where the subject could pause and repeat the video in order to determine the brightness, clearness and green tone levels. The subjects recorded the classification value for brightness as “bright”, “medium” or “dark”, the classification value for clearness as “clear”, “medium” or “blur” and the classification value for green tone level as “green” or “not green”. They were advised to select these classification values based on their own judgement.

Using the workflow tool, the subjects selected the option to perform the task ‘Video classification according to brightness, clearness and green tone levels’. Then they were prompted to select between a fast and a slow processing mode. For each video, they were required to run the slow processing followed by the fast processing. They were given the freedom to perform this task automatically using just the fast processing mode when they were confident enough to do so, otherwise they would proceed to use the two modes on the videos. The reason for conducting the experiment in this fashion was to test the subjects’ confidence in the quality of the fast processing. The number of videos processed before the user switched to using the fast processing mode only was noted.

The time of the fast processing algorithm was taken as the automatic processing time, as it has been empirically shown to produce the same quality of results as the slow one. The CPU time taken to perform the task automatically and the time taken to perform the task manually were computed. The accuracy of the results was computed using the following procedure. As there were three classification criteria (brightness, clearness and green tone), each matching value of the automatic (system) or manual (subject) classification value with the ground truth was given a score of 1. For each video manipulated by each subject, a score of 3 would indicate 100 % accuracy, in which case all three classification values (brightness, clearness and green tone) matched the ground truth values. For all 10 videos manipulated by each subject, an average score was computed. A percentage was then calculated for the quality of the solution produced. The users were also asked three additional questions: whether they noticed any difference in the fast and slow automatic processing times, whether they found the automatic processing less tedious than the manual processing and which method they would prefer to use if the tasks were to be done frequently.
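A small sketch of this scoring scheme is given below; the dictionary layout and field names are hypothetical, but the scoring follows the procedure just described (one point per matching criterion, 3 out of 3 corresponding to 100 %).

    CRITERIA = ("brightness", "clearness", "green_tone")

    def video_score(classification, ground_truth):
        # Score 0-3: one point per criterion that matches the ground truth.
        return sum(classification[c] == ground_truth[c] for c in CRITERIA)

    def accuracy_percent(classifications, ground_truths):
        # Average score over all videos of a subject, expressed as a percentage.
        scores = [video_score(c, g) for c, g in zip(classifications, ground_truths)]
        return 100.0 * sum(scores) / (3 * len(scores))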

6.4.1 Results

Table 1 contains the average time measurements, accuracies of the results and the differences between automatic and manual processing for each subject who participated in the experiment. As explained in the previous section, the metrics were produced based on 10 videos processed by each subject, out of a total of eight subjects. There was a total of 27 videos, where each video was manipulated three times throughout the entire experiment. The classification result provided by a marine biologist was taken as the baseline for the accuracy. Based on these values, statistical tests were performed to evaluate the efficiency of the methods of processing and the quality of the solutions produced by the methods.


Table 1 Time and accuracy of automatic versus manual processing, and their differences for video classification task

Subject    Automatic               Manual                  Difference
           Time (s)  Accuracy (%)  Time (s)  Accuracy (%)  Time de   Accuracy da
1          2.12      61.11         47.90     76.19         −45.78    −15.08
2          2.13      61.11         39.65     53.33         −37.52    7.78
3          2.09      61.11         40.12     25.00         −38.03    36.11
4          2.06      61.11         45.33     87.50         −43.28    −26.39
5          2.13      61.11         35.02     52.38         −32.89    8.73
6          2.14      61.11         48.25     80.00         −46.11    −18.89
7          2.06      61.11         37.20     66.67         −35.14    −5.56
8          2.02      61.11         17.95     52.78         −15.93    8.33
Average    2.09      61.11         38.93     61.73         −36.83    −0.62

6.4.2 Testing of Efficiency

Statistical hypothesis testing using the t-distribution was conducted to measure the dependencies between the results obtained for the times taken to conduct automatic and manual processing. The hypothesis, null hypothesis, independent and dependent variables for this test are given below.

– Hypothesis: Image processing-naive users solve VIP tasks using the workflow tool faster than performing them manually.

– Null hypothesis: There is no difference in the time taken for image processing-naive users to solve the task manually and with the workflow tool.

– Independent variables:

  – Data (27 Ecogrid videos of various quality).
  – Subjects and systems used for solving task: S1, S2.
  – Task T1: Classify video according to brightness, clearness and algal levels.

– Dependent variables: Time taken to perform the task manually versus time taken to perform the task automatically.

The t value, given by (1), was computed for this data, yielding −11.43. The degree of freedom is set to n − 1, which is 7. A value of t(7) = −11.43 corresponds to a significance level of p ≪ 0.0001. This means that the null hypothesis can be rejected. It can be concluded that the efficiency of automatic processing is significantly higher than the efficiency of manual processing.

6.4.3 Testing of Accuracy

A similar statistical hypothesis testing using the t-distribution was conducted to measure the dependencies between the accuracies of the results obtained for automatic and manual processing. The hypothesis, null hypothesis, independent and dependent variables for this test are given below.

– Hypothesis: The quality of the solutions produced by image processing-naive users when solving the video classification task using the automatic workflow tool is higher than the quality of the solutions produced when they perform the task manually.

– Null hypothesis: There is no difference in the quality of the solutions produced by image processing-naive users to solve the task manually and with the workflow tool.

– Independent variables:

  – Data (27 Ecogrid videos of various quality).
  – Subjects and systems used for solving task: S1, S2.
  – Task T1: Classify video according to brightness, clearness and algal levels.

– Dependent variables: Quality of results as compared to ground truth.

The t value was computed using (1), yielding −0.09133. The degree of freedom is set to n − 1, which is 7. A value of t(7) = −0.09133 corresponds to a significance level of p = 0.4649. This means that the null hypothesis cannot be rejected. Hence, the accuracy of manual processing, although slightly higher on average, is not significantly higher than that of the solutions produced by automatic processing.

6.4.4 Analysis

One observation from the results concerns the relationship between the manual processing times and accuracy. The accuracy of the subjects' manual classification varied slightly. It was noted that subjects who had domain knowledge (e.g. the ecologist and marine biologist) had higher levels of accuracy in the video classification than their counterparts without domain expertise (e.g. the computer scientists). However, they did not take less time in performing this classification. The main findings of this experiment, which compares automatic and manual processing using efficiency and accuracy metrics, are the following:

– Automatic processing for video classification is on average 94.73 % faster than manual processing without loss of accuracy in the solutions produced.

– 75 % of the subjects found that performing the video processing task using the automatic tool was less tedious than performing it manually.

– All subjects preferred to use the automatic tool over the manual method if they were to conduct the video classification task frequently.

The next experiment was aimed at testing the efficiency and accuracy levels of the workflow tool and conventional image processing approaches when it comes to adapting to changes in user preferences, with a focus on varying domain descriptions.

6.5 Single- vs. Multiple-Executable Approaches on Software Alteration

This experiment aims to show that the workflow system, which adopts a multiple-executable approach, adapts quicker to changes in user preferences than its single-executable counterpart (specialised image processing programs). This is the test of the adaptability of the workflow system to reconstruct solutions efficiently when the user changes the domain descriptions for a task.

In this experiment, an image processing expert and a workflow modeller have access to the same set of video and image processing tools; the former has them as an OpenCV program with the available image processing algorithms written as functions, and the latter has them in the form of independent executables defined in the process library.

Both subjects were familiar with the systems that they were manipulating. They were given an identical task to perform – fish detection, counting and tracking in a video. Both systems were able to perform this task using a default detecting and tracking algorithm. In the workflow tool, the Gaussian mixture model was defined as the detection algorithm; no methods were defined for the selection of any other detection algorithm. In the OpenCV program, the Gaussian mixture model was used as the detection algorithm. Six scenarios were presented to both subjects containing changes to domain conditions (see Table 2). Both subjects were asked to make modifications or additions of code to their respective systems to cater for these changes in order to solve the VIP task as best as possible. For this purpose, they were both told which detection algorithm should be selected in each case. The number of lines of code (OpenCV for the image processing expert and Prolog for the workflow modeller) and the time taken to make these modifications were computed for both subjects. A line of code in OpenCV is represented by a valid C++ line of code, i.e. a line ending with a semi-colon (;) or a looping or conditional statement (if/for/while). In Prolog, a line of code is a single predicate or fact ending with a full stop (.) or a statement ending with a comma (,).

The quality of the solutions is calculated as follows. There are two values to be considered: the first is the number of fish in the current frame and the second is the number of fish in the video so far. Each of these is given a score of 1 if there is a match with the ground truth. For each frame, the accuracy could thus be 0 %, 50 % or 100 %. An average accuracy as a percentage is computed by taking the accuracy of 10 frames (1st, 6th, 11th, ..., 46th) from each video over all 27 videos.
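The sketch below illustrates this scoring; the data layout (per-video lists of (fish in frame, fish so far) pairs indexed by frame) is a hypothetical assumption, but the 0/50/100 % per-frame scoring and the sampling of every fifth frame follow the description above.

    def frame_accuracy(counts, truth):
        # counts and truth are (fish_in_current_frame, fish_so_far); returns 0, 50 or 100.
        matches = sum(int(c == t) for c, t in zip(counts, truth))
        return 50.0 * matches

    def detection_accuracy(video_results, video_truths, sampled=range(0, 50, 5)):
        # Average the per-frame accuracy of the sampled frames (1st, 6th, ..., 46th)
        # over all videos.
        per_frame = [frame_accuracy(res[i], tru[i])
                     for res, tru in zip(video_results, video_truths)
                     for i in sampled]
        return sum(per_frame) / len(per_frame)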


Table 2 Comparisons of number of new lines of code written, processing times and accuracies of solutions between single-executable image processing program and multiple-executable workflow system to adapt to changing domain descriptions

Domain descriptions               Image processing expert              Workflow modeller
(User preference)                 New lines   Time     Accuracy        New lines   Time     Accuracy
                                  of code     (min.)   (%)             of code     (min.)   (%)
Prefer false alarm than miss      43          16       58.25           3           3        59.30
Prefer miss than false alarm      56          23       62.55           2           2        64.80
Clear, no background movement     43          16       58.46           3           3        60.71
Clear, background movement        61          27       60.42           2           2        60.10
Blur, no background movement      43          16       60.88           3           3        62.09
Blur, background movement         57          32       63.80           2           2        61.22
Average                           50.50       21.67    60.73           2.50        2.50     61.37

6.5.1 Results

Table 2 contains the results obtained for this experiment. For each domain description altered, the time taken to modify the system, the number of lines of code added, and the accuracy of the solution are given.

The results produced are used to compare the efficiency of two different problem solving systems given equal starting points in the form of available solutions and equal expectations in the alterations required when user preferences change. Despite measuring the lines of code between two programming languages, the experiment does not intend to compare the two programming languages, but rather to show the differences in effort required (i.e. time) to solve video processing problems using two different approaches (single- versus multiple-executable systems).

6.5.2 Testing of Efficiency

Statistical hypothesis testing using the t-distribution was conducted to measure the dependencies between the results obtained for the times taken to make changes to the workflow tool and the OpenCV program. The parameters for this test are given below.

– Hypothesis: Constructing VIP solutions using the workflow tool is faster than modifying existing image processing programs each time a domain description is altered for a fish detection, counting and tracking task.

– Null hypothesis: There is no difference in the time taken to solve the task using the workflow tool and modifying existing image processing programs each time a domain description is altered for the same task.

– Independent variables:

  – Data (27 Ecogrid videos of various quality).
  – Subjects and systems used for solving task: S3, S4.
  – Task: T3 with the domain conditions given in Table 2.

– Dependent variables:

  (1) Time taken to modify the existing OpenCV program versus time taken to encode the changes in the workflow tool.

  (2) Number of new lines of code added to encode the changes in the OpenCV program and the workflow tool.

Using the formula provided by (1), the value of t was computed to be 7.01. The degree of freedom is set to n − 1, which is 5. A value of t(5) = 7.01 corresponds to a significance level of p ≪ 0.05. This means that the null hypothesis can be rejected. Thus the workflow tool is faster to adapt to changes in domain descriptions than the image processing program.

6.5.3 Testing of Accuracy

Again, statistical hypothesis testing using the t-distribution was conducted to measure the dependencies between the results obtained for the accuracies of the solutions provided by the workflow tool and the OpenCV program. The parameters for this test are given below.

– Hypothesis: Constructing VIP solutions using the workflow tool yields more accurate solutions than modifying existing image processing programs each time a domain description is altered for a VIP task.

– Null hypothesis: There is no difference in the quality of the solutions obtained using the workflow tool and modifying existing image processing programs each time a domain description is altered for a VIP task.

– Independent variables:

  – Data (27 Ecogrid videos of various quality).
  – Subjects and systems used for solving task: S3, S4.
  – Task: T3 with the domain conditions given in Table 2.

– Dependent variables: Quality of solutions, assessed against manually determined values.

Using the formula provided by (1), the value of t was calculated to be 1.01. The degree of freedom is set to n − 1, which is 5. A value of t(5) = 1.01 corresponds to a significance level of p = 0.1794. This means that the null hypothesis cannot be rejected. Thus, the quality of the solutions produced by the workflow tool, although on average slightly better than the quality of the solutions of the image processing program, is not significantly better and therefore cannot be considered superior.

6.5.4 Analysis

When an existing specialised image processing program can be found to support a specific task, the specialised program works very well. However, when the user requirements (domain descriptions) are altered, this is no longer guaranteed without modifications to the program. This modification could range from 16 to 32 min for fish detection and counting tasks and take at most 61 lines of code to be written. The workflow tool, however, adapts to these changes very efficiently, taking just 3 min and 3 lines of code at most. The steps involved for the workflow modeller to encode these changes include adding a method in the process library to encapsulate the new domain descriptions as preconditions, together with the relevant detection algorithm. Two more lines are added in the capability ontology to introduce this description as a performance criterion and to tie it to a relevant tool.

In terms of accuracy, the workflow tool on average performed slightly better than the image processing program; however, this difference was not significant enough to conclude that it produced solutions of better quality. Hence, without loss of accuracy, the multiple-executable workflow tool is a more adaptable problem solver than the single-executable image processing program.

6.6 User Learnability in Selection of Optimal Tool for Fish Detection Task

In this experiment, the system's ability to help the user learn and manage the processes involved in constructing optimal video processing solutions is tested. An optimal tool is one that yields the best overall performance for the VIP task. When the workflow tool is run in full automatic mode, it self-manages the creation of workflows for the user. This is achieved by making use of expert heuristics in assisting with the planning process. However, the system is not able to assess the optimality of the plan that results from this automatic solution, as verifying video processing solutions computationally is not a trivial task. Humans, on the other hand, are able to assess the optimality of the plan by viewing the video containing the results. For example, it is trivial for a human to verify that the system has counted two fish in a frame by observing the count displayed in the resulting frame and the bounding boxes around the fish (e.g. Fig. 10c). The aim of this experiment is to test whether the user can compare the results of different tools for the same task and learn the best performing tool from this comparison.

In this experiment, each subject was presented with seven pairs of similar videos and seven pairs of dissimilar videos to perform a fish detection and counting task. Similar videos have the same video descriptions (brightness, clearness and green tone levels), while dissimilar videos have differing video descriptions. For these 14 pairs, it was determined previously that similar videos would use the same detection algorithm (hence the same optimal detection tool) while dissimilar videos required different detection algorithms (hence not the same optimal detection tool), and these determinations have been used as baseline values for the evaluation. The subject was asked to perform this task using the semi-automatic mode of the workflow tool on all 14 pairs of videos. The ordering of the pairs was mixed between similar and dissimilar. The aim was to test whether subjects were able to determine the most optimal tool for the detection algorithm based on the recommended descriptions provided by the workflow tool. If they were, then they should select this tool as the most optimal one in the next similar video presented to them. They should also not select this tool as the most optimal one in the next dissimilar video presented to them.

Before conducting the experiment, subjects were given a demonstration of one run using the semi-automatic mode to familiarise them with the procedures involved. Furthermore, they had performed the experiment in Section 6.4 and had some familiarity with the system. For each pair, they first conducted the task on the first video given. The workflow tool will display the video to the subject before proceeding to solve the task. Knowing the descriptions of the video (e.g. brightness, clearness, movement speed, presence of fish), the subject will have more information when selecting a detection algorithm with the help of the recommended descriptions provided by the workflow tool. After a detection algorithm has been selected, the workflow tool will display the final result of this task visually. The subject repeated the same task on the same video by trying another detection algorithm, if they wished, and observed the result.

When they were confident about the most optimal tool, they proceeded to perform the same task on the second video in the pair. This video could have similar video descriptions to the previous one (i.e. similar) or not (i.e. dissimilar). If the video descriptions are similar then the best tool inferred from the previous run should also be the best (most optimal) tool for the second run, otherwise it should not. During the second run, subjects were asked which tool they would choose based on what they had inferred from having conducted the experiment on the first video. Each time they selected the best tool inferred from the first video for the second video in the similar pairs, they were given a score of “correct”. Each time they selected the best tool from the first video for the second video in the dissimilar pairs, they were given a score of “incorrect”. If they were not able to infer the most optimal tool from the first run, and thus could not infer any tool for the second video based on the first, they were given a score of “incorrect”. The subjects were asked five additional questions on the usability of the system. These included verification of the system's design principles, which included the option to provide constraints and video descriptions, the usefulness of the recommended descriptions, and ease of use.

6.6.1 Testing of User Learnability

Statistical hypothesis testing using the t-distribution was conducted to measure the dependencies between the results obtained for the number of times the user selected the correct tool based on a previous similar video and the number of times the user selected the incorrect tool based on a previous dissimilar video. The parameters for this test are given below.

– Hypothesis: The semi-automated mechanism to compose workflows for the fish detection and counting task helps the user manage and learn the processes involved in constructing optimal solutions.

– Null hypothesis: The descriptions provided by the workflow tool using the semi-automated mechanism to compose workflows for the fish detection and counting task do not assist the user to learn the optimal solutions.

– Independent variables:

  – Data: 14 pairs of videos from the 27 Ecogrid videos, 7 similar pairs and 7 dissimilar pairs.
  – Subjects and systems used for solving task: S1.
  – Task T2: Detect and count fish in frames.

Page 27: Semantics and Planning Based Workflow Composition for Video Processing

Semantics and Planning Based Workflow Composition 549

– Dependent variables:

  – Number of times the correct tool was selected as the optimal tool in the second video for similar pairs of videos.
  – Number of times the incorrect tool was selected as the optimal tool in the second video for dissimilar pairs of videos.

6.6.2 Results

Table 3 shows the results obtained to assess the level of user learnability when using the workflow tool for seven pairs of similar videos and seven pairs of dissimilar videos.

The value of t was calculated to be 5.59 using (1). The degree of freedom is set to n − 1, which is 7. A value of t(7) = 5.59 corresponds to a significance level of p = 0.0004. This means that the null hypothesis can be rejected. With this, it can be concluded that subjects, more often than not, selected the correct (optimal) tool when they were presented with a similar video and did not choose this tool by chance. Hence, the semi-automatic mode has helped the user learn and manage the processes involved in selecting the optimal steps when solving a task.
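Under the same assumptions as before, the paired_t sketch introduced after equation (1) approximately reproduces this result from the per-subject differences d = c − i in Table 3:

    correct_similar = [6, 4, 5, 6, 4, 6, 4, 5]        # column c of Table 3
    incorrect_dissimilar = [2, 3, 2, 1, 3, 2, 2, 3]   # column i of Table 3
    d = [c - i for c, i in zip(correct_similar, incorrect_dissimilar)]
    t_value, p_value = paired_t(d)   # approximately 5.59 and a p value of about 0.0004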

6.6.3 Analysis

The results in Table 3 indicate that subjects were able to select the correct optimal tool when they were presented with a similar video 5 out of 7 times on average (71.43 % of the time). This is relatively high compared to the instances when they incorrectly chose an optimal tool learnt from one video for another video that is not similar to the one from which they had inferred the tool, which occurred 2.25 out of 7 times on average (32.14 % of the time). The statistical test shows that the workflow tool is indeed able to help subjects learn optimal VIP tools without having any image processing knowledge, but purely from their own visual capabilities in judging visual descriptions and from the textual descriptions provided by the system.

Table 3 Number of times the “correct” tool was selected in seven similar pairs of videos and number of times the “incorrect” tool was selected in seven dissimilar pairs of videos

Subject    Similar pairs            Dissimilar pairs           Difference
           No. correct choices c    No. incorrect choices i    c − i (d)
1          6                        2                          4
2          4                        3                          1
3          5                        2                          3
4          6                        1                          5
5          4                        3                          1
6          6                        2                          4
7          4                        2                          2
8          5                        3                          2
Average    5                        2.25                       2.75

The user's understanding is tested via the provision of the description of the tools when there was more than one tool to perform a task. By repeating the same task using a different tool, the user can gain insight into how a different solution can be produced and could then ‘judge’ which tool should be used for producing the most optimal solution. Hence, when a new video is presented and the same task is performed, the user would have learnt and gained confidence in selecting the most suitable tool. The user can evaluate the performance of any particular tool that they have selected for a particular task. Using this feedback loop, the system can now learn which tools work better for which type of videos (according to their video descriptions) and under which constraint conditions.

To further validate the system's usability and learnability features, the findings of the questionnaire are summarised as follows:

– All subjects agreed on the appropriateness of having the option to provide constraints and video descriptions to specify the VIP task in more detail.

– All subjects found the recommended domain descriptions suitable and helpful in assisting them to make more informed decisions when selecting the best detection algorithms (i.e. VIP tools).

– All subjects felt it was appropriate to be given the control to select tools despite not having image processing expertise.

– On average, subjects were able to make the decision to run the fast processing option for the video classification task after just 6.4 runs.

– All subjects would prefer to use the automated tool if they were to perform video classification and fish detection and counting tasks frequently.

These findings indicate that although the workflow tool and the domain that it manipulates are both complex, users without expertise in the workflow or image processing domains can learn how to use the workflow tool, and they can even learn some aspects of solving image processing problems via the selection of VIP tools. This was achieved in a short period of time, as indicated by the experiment's results. The experimental findings also verify and validate the strength of the integrated approach. The efficiency and accuracy of the results obtained indicate that the workflow system is able to produce results comparable to those produced by humans and by competing systems (e.g. image processing programs). The use of ontologies has proven to be beneficial via the provision of recommended descriptions that were useful to users. The planner's correctness and completeness have been tested on a set of VIP tasks (goals), constraints and detection algorithms.

7 Conclusions

This research has contributed towards the provision of automatic workflow composition via a novel framework that hinges on two key technologies: ontologies and planning. The integration has resulted in a semantics-rich and flexible mechanism that is beneficial for the scientific, research and application communities.

The framework has been evaluated on video processing tasks, in particular for underwater videos that vary in quality and for user requirements that change. Higher time-efficiency has been achieved in automation-supported video processing tasks without loss of accuracy. In addition, a new, more flexible approach for video processing which utilises multiple reusable executables was introduced. Consequently, non-experts can also learn to construct optimal video processing workflows, which was not possible previously.

Optimistically, this framework could eventually be used for general-purpose video processing, not just confined to underwater videos. It could also serve as an educational tool for naive users on a larger scale. At present, this approach is being integrated into a heterogeneous multi-core environment via a resource scheduler and also into a web-based front-end. It is envisaged that this workflow will be extended to support the management of live video streams for on-demand user queries.

References

1. Andrews, T., Curbera, F., Dholakia, H., Goland, Y., Klein, J., Leymann, F., Liu, K., Roller, D., Smith, D., Thatte, S., Trickovic, I., Weerawarana, S.: Business Process Execution Language for Web Services Version 1.1 (BPEL). IBM, BEA Systems, Microsoft, SAP AG, Siebel Systems (2003). http://download.boulder.ibm.com/ibmdl/pub/software/dw/specs/ws-bpel/ws-bpel.pdf. Last accessed 22 April 2013

2. Blythe, J., Deelman, E., Gil, Y.: Automatically composed workflows for Grid environments. IEEE Intell. Syst. 19(4), 16–23 (2004)

3. Callahan, S.P., Freire, J., Santos, E., Scheidegger, C.E., Silva, C.T., Vo, H.T.: Managing the evolution of dataflows with VisTrails. In: IEEE Workshop on Workflow and Data Flow for Scientific Applications (Sciflow'06), pp. 71–75 (2006)

4. Chen-Burger, Y.H., Lin, F.P.: A semantic-based workflow choreography for integrated sensing and processing. In: The 9th IEEE International Workshop on Cellular Neural Networks and Their Applications (CNNA'05) (2005)

5. Currie, K., Tate, A.: O-Plan: The open planning architecture. Artif. Intell. 52, 49–86 (1991)

6. Deelman, E., Gannon, D., Shields, M., Taylor, I.: Workflows and e-Science: An overview of workflow system features and capabilities. Future Gener. Comput. Syst. 25(5), 528–540 (2009)

7. Deelman, E., Singh, G., Su, M.H., Blythe, J., Gil, Y., Kesselman, C., Mehta, G., Patil, S., Vahi, K., Berriman, B., Good, J., Laity, A., Jacob, J.C., Katz, D.S.: Pegasus: a framework for mapping complex scientific workflows onto distributed systems. Sci. Program. J. 13(3), 219–237 (2005)

8. Ecogrid: National Center for High Performance Computing, Hsin-Chu, Taiwan (2006). http://ecogrid.nchc.org.tw. Last accessed 21 April 2012

9. Foster, I., Kesselman, C. (eds.): The Grid 2: Blueprint for a New Computing Infrastructure, 2nd edn. Morgan Kaufmann (2003)

10. Foster, I., Voeckler, J., Wilde, M., Zhao, Y.: Chimera: A virtual data system for representing, querying, and automating data derivation. In: 14th Conference on Scientific and Statistical Database Management, pp. 37–46 (2002)

11. Ghallab, M., Nau, D., Traverso, P.: Automated Planning: Theory & Practice. Morgan Kaufmann (2004)

12. Gil, Y., Ratnakar, V., Deelman, E., Mehta, G., Kim, J.: Wings for Pegasus: creating large-scale scientific applications using semantic representations of computational workflows. In: Proceedings of the 19th National Conference on Innovative Applications of Artificial Intelligence (IAAI'07), pp. 1767–1774. AAAI Press (2007)

13. Gómez-Pérez, A., Fernández-López, M., Corcho, O.: Ontological Engineering. Springer (2004)

14. Howell, D.C.: Statistical Methods for Psychology, 6th edn. Belmont, CA (2007)

15. Intel: Open Source Computer Vision (OpenCV) Library (2006). http://sourceforge.net/projects/opencvlibrary. Last accessed 22 April 2013

16. Kim, J., Spraragen, M., Gil, Y.: An intelligent assistant for interactive workflow composition. In: Proceedings of the International Conference on Intelligent User Interfaces (IUI'04), pp. 125–131. ACM Press (2004)

17. Ludäscher, B., Altintas, I., Berkley, C., Higgins, D., Jaeger, E., Jones, M., Lee, E.A., Tao, J., Zhao, Y.: Scientific workflow management and the Kepler system. Concurr. Comput. Pract. Exper. 18(10), 1039–1065 (2005)

18. McCluskey, T.L., Liu, D., Simpson, R.M.: GIPO II: HTN planning in a tool-supported knowledge engineering environment. In: ICAPS'03, pp. 92–101 (2003)

19. Nadarajan, G., Chen-Burger, Y.H.: Goal, video descriptions and capability ontologies for Fish4Knowledge domain. In: Special Session on Intelligent Workflow, Cloud Computing and Systems (KES-AMSTA'12) (2012)

20. Nadarajan, G., Chen-Burger, Y.H., Fisher, R.B.: A knowledge-based planner for processing unconstrained underwater videos. In: IJCAI'09 Workshop on Learning Structural Knowledge from Observations (STRUCK'09) (2009)

21. Nadarajan, G., Chen-Burger, Y.H., Fisher, R.B.: SWAV: Semantics-based workflows for automatic video analysis. In: Special Session on Intelligent Workflow, Cloud Computing and Systems (KES-AMSTA'11) (2011)

22. Nadarajan, G., Renouf, A.: A modular approach for automating video processing. In: 12th International Conference on Computer Analysis of Images and Patterns (CAIP'07) (2007)

23. Nadarajan, G., Spampinato, C., Chen-Burger, Y.H., Fisher, R.B.: A flexible system for automated composition of intelligent video analysis. In: 7th International Symposium on Image and Signal Processing and Analysis (ISPA'11) (2011)

24. Nau, D., Au, T.C., Ilghami, O., Kuter, U., Murdock, J.W., Wu, D., Yaman, F.: SHOP2: An HTN planning system. J. Artif. Intell. Res. 20, 379–404 (2003)

25. Nau, D.S., Cao, Y., Lotem, A., Muñoz-Avila, H.: SHOP: Simple hierarchical ordered planner. In: International Joint Conference on Artificial Intelligence (IJCAI'99), pp. 968–973 (1999)

26. Oinn, T., Addis, M., Ferris, J., Marvin, D., Senger, M., Greenwood, M., Carver, T., Glover, K., Pocock, M.R., Wipat, A., Li, P.: Taverna: A tool for the composition and enactment of bioinformatics workflows. Bioinformatics 20(17), 3045–3054 (2004)

27. Oinn, T., Greenwood, M., Addis, M., Alpdemir, N., Ferris, J., Glover, K., Goble, C., Goderis, A., Hull, D., Marvin, D., Li, P., Lord, P., Pocock, M., Senger, M., Stevens, R., Wipat, A., Wroe, C.: Taverna: Lessons in creating a workflow environment for the life sciences. Concurr. Comput. Pract. Exper. 18(10), 1067–1100 (2006)

28. Renouf, A., Clouard, R.: Hermes—A human-machine interface for the formulation of image processing applications (2007). https://clouard.users.greyc.fr/Hermes/index.html. Last accessed 22 April 2013

29. Tannenbaum, T., Wright, D., Miller, K., Livny, M.: Condor—A distributed job scheduler. In: Sterling, T. (ed.) Beowulf Cluster Computing with Linux. MIT Press (2001)

30. Tate, A.: Intelligible AI planning—generating plans represented as a set of constraints. In: Proceedings of the Twentieth British Computer Society Special Group on Expert Systems International Conference on Knowledge Based Systems and Applied Artificial Intelligence (ES'00), pp. 3–16. Springer (2000)

31. Taylor, I., Deelman, E., Gannon, D., Shields, M.: Workflows for e-Science. Springer, New York (2007)

32. Taylor, I., Shields, M., Wang, I., Rana, O.: Triana applications within Grid computing and peer to peer environments. J. Grid Computing 1(2), 199–217 (2003)

33. Taylor, I., Shields, M., Wang, I., Harrison, A.: The Triana workflow environment: architecture and applications. In: Taylor, I., Deelman, E., Gannon, D., Shields, M. (eds.) Workflows for e-Science, pp. 320–339. Springer, New York (2007)

34. van Splunter, S., Brazier, F.M.T., Padget, J.A., Rana, O.F.: Dynamic service reconfiguration and enactment using an open matching architecture. In: International Conference on Agents and Artificial Intelligence (ICAART'09), pp. 533–539 (2009)

35. von Laszewski, G., Hategan, M.: Java CoG Kit Karajan/GridAnt workflow guide. Technical report, Argonne National Laboratory, Argonne, IL, USA (2005)