cubrik integration guidelines

Page 1: CUbRIK Integration Guidelines

INTEGRATION GUIDELINES Human-enhanced time-aware multimedia search

CUbRIK

Project FP7-ICT-287704

Deliverable D9.1 WP9

Deliverable Version 1.0 – 31 January 2012

Document. ref.: cubrik.D9.1.ATN.WP9.V1.0

CUbRIK Integration Guidelines D9.1 Version 1.0

Programme Name: ICT
Project Number: 287704
Project Title: CUbRIK
Partners: Coordinator: ENG
Contractors: UNITN, TUD, QMUL, LUH, POLMI, CERTH, NXT, MICT, ATN, FRH, INNEN, HOM, CVCE, EIPCM

Document Number: cubrik.D9.1.ATN.WP9.V1.0
Work-Package: WP9
Deliverable Type: Document
Contractual Date of Delivery: 31 January 2012
Actual Date of Delivery: 31 January 2012
Title of Document: Integration Guidelines
Author(s): Björn Decker (ATN), Ralph Traphöner (ATN), Igor Novakovic (ATN), Vincenzo Croce (ENG), Lorenzo Eccher (ENG), Paolo Mabboni (ENG), Piero Fraternali (POLMI), Aldo Bongio (WBM), Alessandro Bozzon (POLMI)

Approval of this report: Vincenzo Croce (ENG)
Summary of this report: Overview and software engineering regulations on how to develop software within CUbRIK
History:
Keyword List: Integration Guidelines, SMILA, Architecture, Software Development, Software Engineering
Availability: Public

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.

This work is partially funded by the EU under grant IST-FP7-287704


Disclaimer

This document contains confidential information in the form of the CUbRIK project findings, work and products and its use is strictly regulated by the CUbRIK Consortium Agreement and by Contract no. FP7-ICT-287704.

Neither the CUbRIK Consortium nor any of its officers, employees or agents shall be responsible or liable in negligence or otherwise howsoever in respect of any inaccuracy or omission herein.

The research leading to these results has received funding from the European Union Seventh Framework Programme (FP7-ICT-2011-7) under grant agreement n° 287704.

The contents of this document are the sole responsibility of the CUbRIK consortium and can in no way be taken to reflect the views of the European Union.


Table of Contents

EXECUTIVE SUMMARY

1. INTRODUCTION: AN AGILE APPROACH TO INTEGRATION

2. LAYERS OF THE CUBRIK ARCHITECTURE
2.1 THE REQUIREMENTS FOR THE CUBRIK ARCHITECTURE
2.2 CONTENT AND USER ACQUISITION TIER
2.3 CONTENT PROCESSING TIER
2.4 QUERY PROCESSING TIER
2.5 SEARCH TIER
2.6 DATA STORAGE SERVICES
2.7 APPLICATIONS PROGRAMMING INTERFACES

3. HOW TO INTEGRATE COMPONENTS IN CUBRIK
3.1 SET UP THE DEVELOPMENT ENVIRONMENT
3.2 WRAP COMPONENT
3.3 CREATE PIPELINE / ASYNCHRONOUS WORKFLOW
3.4 TEST WORKFLOW

4. CONSIDERING LICENSE ISSUES

5. HOW TO DESCRIBE COMPONENTS
5.1 GENERAL COMPONENTS DESCRIPTION
5.1.1 Installation and configuration file naming and location
5.1.2 Component Description
5.1.3 Contact Information
5.1.4 Code Example
5.1.5 Dependencies on other CUbRIK Components
5.1.6 Third Party Libraries
5.1.7 Interface
5.1.8 Installation Prerequisites
5.1.9 Unpackaging
5.1.10 Component (Back End) and Web Service Installation
5.1.11 Component (Back End) and Web Service Test

6. HOW TO DEVELOP AND DELIVER COMPONENTS
6.1 DEVELOPMENT CONVENTIONS
6.1.1 Naming Conventions
6.1.2 Guidelines Writing Source Code
6.1.3 Exception Handling
6.1.4 Logging Guidelines
6.1.5 Third Party Components Integration Guidelines
6.2 DESIGN PATTERN FOR CUBRIK COMPONENTS
6.2.1 Pattern for Common Features
6.2.2 Pattern for Agents
6.2.3 Pattern for Crawlers
6.2.4 Pattern for Pipelets
6.2.5 Pattern for Pipelines
6.2.6 Additional Guidelines Pipelines
6.3 TESTING
6.3.1 Unit Test
6.3.2 Consumer-Provider Testing
6.3.3 Integration Testing
6.3.4 End to End Functional Testing
6.4 DELIVERY PROCESS GUIDELINES
6.4.1 Component and Pipelines Provision Form
6.4.2 Installation and Configuration guideline
6.4.3 Path and references
6.4.4 Test client
6.5 SOFTWARE FACTORY SUPPORT ENVIRONMENT
6.5.1 CUbRIK Subversion System
6.5.2 CUbRIK SVN Structure
6.5.3 Bugs, releases and feedbacks tracking system
6.5.4 Bugzilla Reports
6.5.5 CUbRIK Bugzilla structure

REFERENCES

ANNEX A: INST_CONF.TXT FOOCOMPONENT EXAMPLE


Table of Figures

Figure 2-1 Essential aspects of the CUbRIK architecture as illustrated in the DoW
Figure 2-2 Detailed view of the CUbRIK architecture
Figure 3-1 Screenshot of BPEL Designer
Figure 5-1 Folders and files structure for a 3 parts components
Figure 5-2 Table with Example of description of a 3rd party library
Figure 6-1 DoSomething datamodel class diagram
Figure 6-2 Foo Agent class view
Figure 6-3 Foo Agent sequence diagram
Figure 6-4 Foo Crawler class view
Figure 6-5 Foo Crawler sequence diagram
Figure 6-6 DoSomething pipelet class diagram
Figure 6-7 DoSomething pipelet sequence diagram
Figure 6-8 SVN Structure
Figure 6-9 CUbRIK Bugzilla permissions schema


Executive Summary

This deliverable describes the background and procedures on how to integrate the components developed in CUbRIK – together with third party components – to create an open platform for multimedia search.

The actual architecture will be described in D9.8 Architecture Design (m8).


1. Introduction: An Agile Approach to Integration

Multimedia search engines today are “black-box” systems. This closed architecture makes it difficult for technology providers, application integrators, and end-users to try out novel approaches for multimedia content and query processing, because there is no place where one can deploy content, components, and processes, integrate them with complementary technologies, and assess the results in a real and scalable environment.

The key technical principle of CUbRIK is to create a “white-box” version of a multimedia content & query processing system, by unbundling its functionality into a set of search processing pipelines, i.e., orchestrations of Open Source and third-party components instantiating current algorithms for multimedia content analysis, query processing, and relevance feedback evaluation. Examples will be pipelines for extracting metadata from media collections using the software mix that best fits application requirements, for processing multimodal queries, and for analyzing users' feedback in novel ways.

CUbRIK aims at constructing an open platform for multimedia search practitioners, researchers and end-users, where different classes of contributors can meet and advance the state-of-the-art by joining forces. Important scientific contributions will be the systematic integration of human and social computation in the design and execution of pipelines, and the enrichment of multimedia content and query processing with temporal and spatial entities.

On the business side, CUbRIK will endorse an ecosystem, where a multitude of actors will concur to implement real application scenarios that validate the platform features in real world conditions and for vertical search domains. The CUbRIK community will bring together technology developers, software integrators, social network and crowdsourcing providers, content owners and SMEs, to promote the open search paradigm for the creation of search solutions tailored to user needs in vertical domains.

To fulfill these objectives, an agile approach to integration will be applied within CUbRIK [1]. Based on initial scenarios, the application within CUbRIK will be developed based on early feedback from users. This will involve users early in the development of CUbRIK applications, thus reducing risks during this development.

These Integration Guidelines provide the organizational regulations on how this agile integration will be performed:

• Chapter two (Layers of the CUbRIK Architecture) gives an overview of the CUbRIK architecture. This helps developers to locate their developments within the CUbRIK platform. Users get an overview of the main functionality of CUbRIK.

• Chapter three (How to Integrate Components in CUbRIK) describes how components implementing and extending the functionalities of the CUbRIK platform should be integrated. While this chapter is mainly intended for developers, it allows technology-savvy users to integrate their own components.

• Chapter four (Considering License Issues) provides guidelines on how components with different licenses can be integrated into the CUbRIK architecture. When considered during development, it builds the basis for applications based on the CUbRIK platform to be used in commercial settings – an important issue for the sustainable use of CUbRIK results.

• Chapter five (How to Describe Components) provides a template explaining how CUbRIK components have to be described. By following this template, the components within CUbRIK are described in a way that allows users to evaluate, adapt and deploy the CUbRIK platform based on their needs.

• Chapter six (How to Develop and Deliver Components) gives a detailed description of how the actual development is performed. It provides the technical background on how the agile approach to integration can actually be performed. For a user, these guidelines assure that a system is developed using state-of-the-art software engineering techniques.


2. Layers of the CUbRIK Architecture

The goal of the CUbRIK architecture design is to provide a light-weight platform for executing pipelines of heterogeneous components, mixing open source and proprietary software components and services realized by different organizations.

The architectural concept of CUbRIK differs from the development of a monolithic do-it-all architecture (e.g., multimedia databases), in that:

• There are no technological assumptions on the nature of the multimedia data processing components that participate in a content or query processing pipeline;

• There are no assumptions on the structure of the pipelines that realize a piece of CUbRIK functionality. Each pipeline is characterized by its specific workflow, which is an arbitrary mix of human and machine tasks. As a default, pipelines are represented in BPEL, but any language with comparable expressive power is an allowed replacement;

• The integration mechanism of pipelines and components is a conceptual model of data (specified in deliverable D2.1), which expresses the object model of CUbRIK data. Each component is responsible for adapting itself to the CUbRIK data model, via appropriate input/output adaptors;

• In principle, there is also no assumption about the data storage technology or the location of data. Components could use different data storage and distribution technologies and policies;

• If, for example for performance reasons, multiple components or pipelines need to share ad hoc data structures and formats (e.g., data caches), they remain responsible for the inter-component and inter-pipeline communication and data exchange protocols local to such a component / pipeline pool. When they communicate with other generic components and pipelines (e.g., for accessing input data collections, for communicating conflicts to a conflict resolution module, for interacting with general-purpose crowdsourcing platforms), they must adhere to the CUbRIK data model for data representation and rely on SOAP and REST for multimedia Web service interactions, possibly enhanced with MTOM [2] and SOAP Messages with Attachments [3].

The design of the CUbRIK architecture is an example of differential design. The architecture is based on SMILA [4] as the underlying framework for supporting workflow definition and execution.

Therefore, the design of the CUbRIK architecture has proceeded by:

• Identifying the technical requirements of human computation enhanced multimedia processing in CUbRIK;

• Analyzing the capacities of SMILA related to CUbRIK objectives;

• Identifying gaps between SMILA and CUbRIK;

• Designing the architectural extensions needed to bridge the identified gaps.


2.1 The requirements for the CUbRIK architecture

The original concept of the CUbRIK architecture is specified in the Project DoW, as shown in Figure 2-1.

Figure 2-1 Essential aspects of the CUbRIK architecture as illustrated in the DoW

This schema highlights that the CUbRIK architecture relies on a framework for executing processes (aka pipelines), consisting of collections of tasks to be executed in a distributed fashion. Each pipeline is described by a workflow of tasks, allocated to executors. Task executors can be software components (e.g., data analysis algorithms, metadata indexing tools, search engines of different nature, result presentation modules, etc.). Tasks can also be allocated to individual human users (e.g., via a gaming interface) or to an entire community (e.g., by a crowdsourcing component).

Different pipelines are defined for the different processes of a multimedia search application: content analysis and metadata extraction, query processing, and relevance feedback processing. Pipeline descriptions are stored in a process repository, for example encoded in the BPEL standard workflow language. A suitable data model supports the data exchanges across services in order to cope with the data-intensive nature of multimedia content processing and search.
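As an illustration only, the idea of a pipeline as a workflow of tasks allocated to software or human executors can be sketched in Python as follows. All names here (Task, Pipeline, the executor kinds, the two example tasks) are hypothetical and are not taken from SMILA or from the CUbRIK code base; an actual pipeline would be described in a workflow language such as BPEL and executed by the pipeline engine.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical names throughout: nothing here is a SMILA or CUbRIK API.


@dataclass
class Task:
    name: str
    executor_kind: str              # assumed values: "software", "human", "crowd"
    run: Callable[[Dict], Dict]     # takes a record, returns the enriched record


@dataclass
class Pipeline:
    name: str
    tasks: List[Task]

    def execute(self, record: Dict) -> Dict:
        # Run the tasks in workflow order; each task receives the output
        # of the previous one.
        for task in self.tasks:
            record = task.run(dict(record))
        return record


def extract_keyframes(record: Dict) -> Dict:
    # Stand-in for an automatic analysis component.
    record["keyframes"] = [f"{record['id']}-kf0"]
    return record


def ask_human_for_label(record: Dict) -> Dict:
    # Stand-in for a task routed to a person, e.g. through a gaming interface.
    record["label"] = "unverified"
    return record


content_analysis = Pipeline(
    name="content-analysis",
    tasks=[
        Task("keyframes", "software", extract_keyframes),
        Task("labelling", "human", ask_human_for_label),
    ],
)

result = content_analysis.execute({"id": "video-1"})
```

The sketch runs the tasks sequentially in one process; distribution, persistence of pipeline descriptions in a process repository and asynchronous human tasks are deliberately left out.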

The high level architecture schema of Figure 2-1 is made operational in the refined architecture diagram shown in Figure 2-2.

The CUbRIK architecture is divided into four main layers or tiers:

1. Content and user acquisition;
2. Content processing;
3. Querying;
4. Search.

In addition, the architecture comprises general purpose data storage facilities and APIs for developing applications on top of CUbRIK.


Figure 2-2 Detailed view of the CUbRIK architecture

2.2 Content and User Acquisition Tier

The content and user acquisition tier has a twofold purpose:

o Supporting the provision of content into the CUbRIK platform, as required by one or more pipelines. Two major content provision modalities are supported:

• By copy: content is physically ingested and stored inside an instance of the platform. This mode is viable when content access rights permit uploading the content to, or copying it into, a CUbRIK instance;

• By reference: content is stored externally and accessed by CUbRIK pipelines when necessary. This is the case, for example, of external semantic repositories (e.g., Entitypedia), which obviously cannot be stored internally, but need to be accessed via data transfer APIs from within a CUbRIK pipeline.

o Supporting the subscription of users to a CUbRIK platform instance, for using CUbRIK to launch multimedia search applications or to join CUbRIK as human performers of multimedia processing tasks. The acquisition of users to a CUbRIK instance can occur in two ways:

• By registration: users are offered an explicit application to register themselves to a CUbRIK instance and provide their profile data;

• By import: users can be detected and imported from external systems, including crowdsourcing platforms and social networks. This modality is respectful of the terms of usage of the user’s community platform and may simply provide a reference to the user’s identity (e.g., a public profile) with no associated profile data.

User identities (profile data for registered users and simple references for users of other platforms) are normalized according to the CUbRIK data model and made available to the other subsystems and modules via suitable user data access Web services.

The content and user acquisition tier consists of the subsystems described in the following.

The Subscription Manager

The Subscription Manager handles the explicit registration of users to the platform. There are two broad classes of users: searchers and performers.

o Searchers use CUbRIK apps for searching and interacting with information; they may be exploited to get feedback on query result quality (e.g., via click stream analysis);

o Performers execute tasks on CUbRIK applications (for example, via an external crowdsourcing platform, a gaming or Query&Answer application) to provide contributions, inspections or conflict resolution.

The registration process is lightweight and leverages existing single sign-on platforms (e.g., login with a Facebook account via OAuth, OpenID, etc.). In this case, the profile data of a user who registers with the CUbRIK platform are kept in the system where they belong, and CUbRIK stores only the minimal amount of data needed for referencing the external user.

The registration process associates a unique CUbRIK ID to the existing account of the user.

The CUbRIK Subscription Manager should be designed to be “privacy-friendly”, thus using (and retaining) only the user information required for internal purposes (e.g., performance monitoring of CUbRIK task execution by performers, as needed to implement capacity-based task-to-user allocation policies), reusing the users' native profiles and personal data on social networks as much as possible and in compliance with the data access terms of each social network platform.
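A minimal sketch of this association, assuming a simple in-memory mapping, could look as follows. The class name SubscriptionRegistry, the "cubrik-" ID prefix and the use of random UUIDs are assumptions made for illustration; the source does not specify how CUbRIK IDs are generated or stored.

```python
import uuid
from typing import Dict, Tuple


class SubscriptionRegistry:
    """Hypothetical sketch: maps an external account to a CUbRIK ID.

    Only the reference (provider, external account) is kept; profile data
    stays in the external system, in line with the privacy-friendly design.
    """

    def __init__(self) -> None:
        self._ids: Dict[Tuple[str, str], str] = {}

    def register(self, provider: str, external_account: str) -> str:
        key = (provider, external_account)
        # Registering the same external account twice returns the same ID.
        if key not in self._ids:
            self._ids[key] = f"cubrik-{uuid.uuid4()}"  # assumed ID format
        return self._ids[key]


registry = SubscriptionRegistry()
alice_id = registry.register("facebook", "alice")
```

Keying on the pair (provider, account) keeps accounts with the same name on different sign-on platforms apart.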

As a consequence of this design principle, CUbRIK pipelines should not rely on personal data retention features other than the work statistics of users who subscribed to perform specific CUbRIK tasks in human-enhanced CUbRIK pipelines.

The Upload and Crawling Managers

The Upload Manager and the Crawling Manager are the subsystems responsible for provisioning a CUbRIK instance with raw content (images, video, audio, text). Content elements can be added to a CUbRIK platform via upload or by scheduled crawling. After acquisition, they are subjected to normalization by means of one or more content processing pipelines, so as to be made ready for search pipelines. These Managers are based on the import functionality provided by SMILA.

Crawling may import external metadata in popular formats (e.g., Dublin Core, or a useful subset of MPEG-7) into CUbRIK. External metadata are reformatted according to the CUbRIK data model by a suitable stage of the content processing pipeline.

Content registration gives each content element a CUbRIK ID, and then stores:

• The content element in raw format;
• The associated crawled or manual metadata (if any);
• The associated content rights metadata (if any).

Content acquisition is independent of and asynchronous w.r.t. content processing; content processing pipelines can be activated immediately after the acquisition terminates, or at a later stage. The Upload and Crawling Managers maintain log information on the content that has been acquired and is pending for some content processing pipeline to be applied to it.
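The registration step and the log of pending content can be sketched as below. This is an in-memory illustration under stated assumptions: ContentRecord, ContentRegistry, the "content-N" ID scheme and the method names are hypothetical, and a real instance would rely on the import functionality and storage services of the platform.

```python
from dataclasses import dataclass
from itertools import count
from typing import Dict, List, Optional


@dataclass
class ContentRecord:
    cubrik_id: str
    raw: bytes                          # content element in raw format
    metadata: Optional[Dict] = None     # crawled or manual metadata, if any
    rights: Optional[Dict] = None       # content rights metadata, if any


class ContentRegistry:
    """Hypothetical sketch of content registration with a pending log."""

    def __init__(self) -> None:
        self._next = count(1)
        self.store: Dict[str, ContentRecord] = {}
        self.pending: List[str] = []    # acquired, not yet processed

    def register(self, raw: bytes, metadata: Optional[Dict] = None,
                 rights: Optional[Dict] = None) -> str:
        cubrik_id = f"content-{next(self._next)}"   # assumed ID scheme
        self.store[cubrik_id] = ContentRecord(cubrik_id, raw, metadata, rights)
        # Acquisition only logs the element; processing happens later.
        self.pending.append(cubrik_id)
        return cubrik_id

    def mark_processed(self, cubrik_id: str) -> None:
        self.pending.remove(cubrik_id)


content_registry = ContentRegistry()
first = content_registry.register(b"...", metadata={"dc:title": "Sample"})
second = content_registry.register(b"...")
content_registry.mark_processed(first)
```

Keeping the pending list separate from the store mirrors the asynchronous relation between acquisition and processing: an element can be stored long before any pipeline touches it.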

Content acquisition includes extraction of metadata, including rights and license information.

The Content and Metadata Acquisition Manager

The Content and Metadata Acquisition Manager is an internal sub-system that offers services to the Upload Manager and to the Crawling Manager. It factors out the logic for normalizing the representation of content and of the associated metadata to the internal standards of CUbRIK. Metadata are aligned to the CUbRIK data model. Content is preserved in its native format and stored either by copy or by reference, depending on the copyright permissions and on the requirements of the content processing and query pipelines.

The Content and Metadata Acquisition Manager has a plug-in structure, whereby specific content reformatting tasks can be attached to an acquisition or upload job. For example, a post-processing task may require that video is transcoded into a different format from the native one, for cross publishing on a different access device (e.g., content from the fixed internet is transcoded at a lower resolution for mobile access).

Where possible, the Content and Metadata Acquisition Manager will be implemented based on SMILA, e.g., by using pipelets that extract the needed metadata from the source or the file.

The Copyright Awareness Manager

The goal of this component is to address copyright aspects (e.g., regarding right of reproduction, right of communication to the public) for content approval, storage, annotation, transformation, presentation and distribution. The idea is to partially automate, both using automatic annotation and rules as well as user input, the determination of whether and how content is processed and used in the platform. The aim is to maximize availability of content for users, while ensuring respect for copyright holders at the same time, especially respect for the rights of users that participate in crowdsourcing and content production processes in CUbRIK.

The Copyright Awareness Manager is responsible for:

a.) Determining the content status / content approval for the system. This is done by using relevant information (contextual and otherwise) to determine content provenance / authenticity and trust in the source / content provider;

b.) Using and interpreting relevant metadata (including CC content licenses, ACAP and other relevant information) and contextual information aggregated and harmonized by the Upload and Crawling Managers and the Content and Metadata Acquisition Manager, deriving permissions for how content (and derived information / metadata) should be handled in the system, and communicating these permissions to the relevant system domains (storage, annotation, transformation, presentation, delivery), where they will be interpreted;


c.) Communication with rights holders and crowds to track, complement and modify rights, license and provenance information and resolve possible conflicts.

2.3 Content Processing Tier

The Content Processing Tier is not a monolithic subsystem, but rather a set of pipelines (implemented using the SMILA framework) that are devoted to processing the content acquired by the modules of the Content Acquisition Tier, in order to make it amenable to query processing.

The content processing pipelines are independent from the content acquisition jobs. A pipeline can be applied to multiple content collections and, conversely, the same content collection can be subjected to more than one processing pipeline, e.g., to support different query scenarios.

The link between acquisition and processing is loose and based on a registration and notification mechanism. A pipeline registers for the availability of given content and is notified when new content elements become available.
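The loose coupling described here resembles a publish/subscribe arrangement, which can be sketched as follows. The class ContentNotifier, its method names and the use of a content type string as the subscription key are assumptions for illustration; the source does not define the actual registration interface.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List


class ContentNotifier:
    """Hypothetical sketch of the registration and notification mechanism."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Callable[[str], None]]] = (
            defaultdict(list)
        )

    def register(self, content_type: str,
                 callback: Callable[[str], None]) -> None:
        # A pipeline registers its interest in a given kind of content.
        self._subscribers[content_type].append(callback)

    def content_available(self, content_type: str, cubrik_id: str) -> None:
        # Called by the acquisition side; it does not know which pipelines
        # (if any) are listening.
        for callback in self._subscribers.get(content_type, []):
            callback(cubrik_id)


notified: List[str] = []
notifier = ContentNotifier()
notifier.register("image", notified.append)
notifier.content_available("image", "content-1")
notifier.content_available("video", "content-2")   # nobody registered
```

Because acquisition only emits notifications, new pipelines can be attached without changing the acquisition jobs.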

The Content Processing Manager wraps the functionality of the native SMILA pipeline engine. It listens to a queue of pending content processing tasks and is responsible for starting, suspending, resuming, terminating and rescheduling a content processing task.

The content processing tier processes contents at different levels of granularity:

• Collection;
• Sub-collection;
• Content element;
• Derivative content element.

The Content Processing Manager is oblivious to the internal status of a specific content processing task, which is the responsibility of the SMILA pipeline engine.

A Content Processing Task has five external states:

• Pending;
• Running;
• Ended;
• Suspended;
• Terminated.
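The external states listed above can be modelled as a small state machine. Note that the source names the states but not the permitted transitions; the transition table below is an assumption derived from the operations attributed to the Content Processing Manager (start, suspend, resume, terminate), and the names TaskState, ALLOWED and transition are hypothetical.

```python
from enum import Enum


class TaskState(Enum):
    PENDING = "pending"
    RUNNING = "running"
    ENDED = "ended"
    SUSPENDED = "suspended"
    TERMINATED = "terminated"


# Assumed transitions, not specified by the source:
# start, suspend, resume, normal completion and forced termination.
ALLOWED = {
    TaskState.PENDING: {TaskState.RUNNING, TaskState.TERMINATED},
    TaskState.RUNNING: {TaskState.SUSPENDED, TaskState.ENDED,
                        TaskState.TERMINATED},
    TaskState.SUSPENDED: {TaskState.RUNNING, TaskState.TERMINATED},
    TaskState.ENDED: set(),
    TaskState.TERMINATED: set(),
}


def transition(current: TaskState, target: TaskState) -> TaskState:
    """Return the new state, or raise if the change is not allowed."""
    if target not in ALLOWED[current]:
        raise ValueError(f"cannot go from {current.value} to {target.value}")
    return target


state = transition(TaskState.PENDING, TaskState.RUNNING)
state = transition(state, TaskState.SUSPENDED)
state = transition(state, TaskState.RUNNING)
```

Rescheduling is not modelled here; one possible reading is that it creates a new pending task rather than reviving an ended one.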

The SMILA pipeline engine orchestrates the actual execution of pipelines in response to requests for content processing. It is aware of the internal status of the content processing task (which pipelines have been executed and which ones are still to be executed). It deals with exceptions, pipeline monitoring and restart.

Content processing must be incremental: the same content can be subject to several processing pipelines, where each pipeline adds a new annotation. The goal is to enable real-time search, while giving to the infrastructure the time to perform full-annotation.

A pipeline is characterized by a “priority” flag (e.g., real-time, deferred, etc.) that instructs the SMILA pipeline engine about the importance of a given content processing task.
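One way to honour such a priority flag is a priority queue in which real-time tasks overtake deferred ones while tasks of equal priority keep their submission order. The sketch below uses Python's standard heapq module; the class TaskQueue, the rank table and the pipeline names are hypothetical, and the source does not state how the SMILA pipeline engine actually schedules tasks.

```python
import heapq
from itertools import count
from typing import List, Tuple

# Assumed ranking of the priority flags mentioned in the text;
# a lower rank is served first.
PRIORITY_RANK = {"real-time": 0, "deferred": 1}


class TaskQueue:
    """Hypothetical sketch of priority-aware scheduling of pipeline tasks."""

    def __init__(self) -> None:
        self._heap: List[Tuple[int, int, str]] = []
        self._seq = count()   # tie-breaker: preserves submission order

    def submit(self, pipeline: str, priority: str) -> None:
        rank = PRIORITY_RANK[priority]
        heapq.heappush(self._heap, (rank, next(self._seq), pipeline))

    def next_task(self) -> str:
        return heapq.heappop(self._heap)[2]


queue = TaskQueue()
queue.submit("full-annotation", "deferred")
queue.submit("thumbnail", "real-time")
queue.submit("ocr", "deferred")
order = [queue.next_task() for _ in range(3)]
```

This matches the stated goal of enabling real-time search first while leaving the infrastructure time to complete the full annotation afterwards.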

Control flow among pipelines is expressed declaratively (e.g., as a macro-pipeline formed of sub-pipelines, with runtime conditions governing the flow of control).

The typical output of a content processing pipeline consists of:

• Derivative content (e.g., key frames, thumbnails, audio summaries);
• Low-level features;
• Facts, i.e., annotations (a.k.a. high-level annotations) + confidence values;
• Entities;
• Conflicts (low confidence facts, contradictory facts).

However, depending on the application, a content processing pipeline may output only a subset of the abovementioned elements.
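To make the notion of a conflict concrete, the following sketch flags facts whose confidence falls below a threshold and facts that contradict each other about the same attribute of the same content element. The Fact structure, the function find_conflicts and the threshold of 0.5 are assumptions for illustration; the CUbRIK data model defines the actual representation.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Fact:
    content_id: str
    attribute: str
    value: str
    confidence: float


def find_conflicts(facts: List[Fact], threshold: float = 0.5) -> List[Fact]:
    """Return low-confidence facts followed by contradictory facts."""
    # 1. Low-confidence facts (the threshold is an assumed value).
    conflicts = [f for f in facts if f.confidence < threshold]

    # 2. Contradictory facts: different values for the same
    #    (content element, attribute) pair.
    groups: Dict[Tuple[str, str], List[Fact]] = defaultdict(list)
    for fact in facts:
        groups[(fact.content_id, fact.attribute)].append(fact)
    for group in groups.values():
        if len({f.value for f in group}) > 1:
            for fact in group:
                if fact not in conflicts:
                    conflicts.append(fact)
    return conflicts


facts = [
    Fact("c1", "person", "Alice", 0.9),
    Fact("c1", "person", "Bob", 0.8),     # contradicts the fact above
    Fact("c2", "logo", "Acme", 0.3),      # low confidence
    Fact("c3", "logo", "Acme", 0.95),     # unproblematic
]
conflicts = find_conflicts(facts)
```

Treating every pair of differing values as a contradiction is a simplification; multi-valued attributes (e.g., several persons in one image) would need a different rule.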

The Conflict Manager

The Conflict Manager handles the set of conflicts and the assignment of conflicts to applications and performers.

Conflicts can be assigned:

• To an application: in this case the application manages the allocation of conflicts to be solved to performers. This is the typical case for GWAP apps;

• To an application-performer pair: in this case, the application routes the conflict to the selected performer. This is the typical case of Q&A apps.

The Conflict Manager is responsible for closing a conflict and storing the produced facts in the conflict store and, possibly, in the native knowledge repository (e.g., the Entitypedia semantic store).

The Conflict Manager can implement a policy of escalation (e.g., re-routing a hard conflict to a more skilled performer, or to the CUbRIK admin). It also accesses the performer store in order to implement assignment policies that take into account the difficulty level of a task and the available profile data of a user (e.g., the record of task resolutions).
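A possible escalation policy is sketched below: the conflict goes to an untried performer whose skill level is sufficient for its difficulty, and falls back to the administrator when no such performer remains. Integer skill and difficulty levels, the function route_conflict, the ADMIN identifier and the choice of the least-skilled adequate performer are all assumptions; the source leaves the concrete policy open.

```python
from typing import Dict, Set

ADMIN = "cubrik-admin"   # hypothetical identifier for the CUbRIK admin


def route_conflict(difficulty: int, skills: Dict[str, int],
                   tried: Set[str]) -> str:
    """Pick the next performer for a conflict, escalating if needed.

    skills maps performer IDs to an assumed integer skill level;
    tried holds performers who already failed to resolve the conflict.
    """
    eligible = [
        (skill, name)
        for name, skill in skills.items()
        if name not in tried and skill >= difficulty
    ]
    if not eligible:
        return ADMIN
    # Choose the least-skilled adequate performer so that the most
    # skilled ones remain available for harder conflicts.
    return min(eligible)[1]


skills = {"ann": 1, "bob": 3, "cy": 5}
first_choice = route_conflict(2, skills, tried=set())
second_choice = route_conflict(2, skills, tried={"bob"})
last_resort = route_conflict(2, skills, tried={"bob", "cy"})
```

Each failed attempt adds the performer to the tried set, so repeated calls walk up the skill ladder before reaching the administrator.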

The Performer Manager

The Performer Manager is responsible for keeping statistics on performers (profile, history of solved conflicts, throughput, quality of decision, etc.). These data can be used for several purposes:

• The construction of leader boards for gaming applications;
• The design of task-to-performer allocation policies based on task difficulty levels and the performer’s skill level;
• The design of task-to-performer allocation policies based on demographic data (e.g., location-aware assignment policies, where the cultural background of the user may influence his or her adequacy as a task executor).
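As a sketch of how such statistics could feed a leader board, the example below keeps two counters per performer and ranks performers by the share of correct decisions. PerformerStats, leaderboard and the ranking criteria are hypothetical choices, not the CUbRIK implementation; only work statistics are stored, in keeping with the privacy-friendly design of the Subscription Manager.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PerformerStats:
    performer_id: str
    solved: int = 0     # number of conflicts handled
    correct: int = 0    # number of decisions later judged correct

    def record(self, was_correct: bool) -> None:
        self.solved += 1
        if was_correct:
            self.correct += 1

    @property
    def quality(self) -> float:
        # Share of correct decisions; 0.0 for performers with no history.
        return self.correct / self.solved if self.solved else 0.0


def leaderboard(stats: List[PerformerStats], top: int = 3) -> List[str]:
    # Assumed ranking: quality first, then volume, then ID for stable ties.
    ranked = sorted(stats,
                    key=lambda s: (-s.quality, -s.solved, s.performer_id))
    return [s.performer_id for s in ranked[:top]]


a, b, c = PerformerStats("a"), PerformerStats("b"), PerformerStats("c")
for outcome in (True, True, False):
    a.record(outcome)
b.record(True)
board = leaderboard([a, b, c])
```

The same quality figure could serve as the skill level used by an allocation policy, although a production system would likely smooth it for performers with very short histories.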

The Conflict Resolution Applications

The CUbRIK platform does not perform conflict resolution as part of its core pipelines. Rather, it offers APIs for exporting conflicts and transforming them into tasks that can be performed by humans, by means of conflict resolution applications.

A conflict resolution application can be:

• An application built on top of an existing crowdsourcing platform (e.g., on top of Microtask);
• A gaming application;
• A query and answer application.

The Relevance Feedback Pipelines

Some pipelines are designed to receive feedback from the user on the results of a query. This feedback is routed to a feedback manager module that updates the level of trust of performers (human and automatic) in the component and performer store.

2.4 Query Processing Tier

CUbRIK accepts users' queries by means of external applications.

A Query App contains the front-end for issuing queries and viewing results; it normally triggers a SMILA query pipeline, passes to it the user’s input, receives the results of the query pipeline and formats them according to the interface specifications of the query application.

Queries are expressed according to a given query language, serialized and submitted to a CUbRIK pipeline (through a Web services API) according to a given query protocol; the simplest case is that of mono-modal queries expressed as a bag of keywords. More complex queries are mono-modal content similarity queries and multi-modal content similarity queries. The former require the submission of one content sample. The latter may require more than one content sample or a composite content sample (e.g., a video fragment).

Results are organized according to a given result schema, serialized, and returned as responses from CUbRIK to the app. Results in a result list are sorted and chunked. Results typically consist of:

• References to matched content elements;
• Match details for each content element;
• Annotation values for the collection of results (facets).

The query protocol should allow CUbRIK applications to provide such information as the requested information profile (e.g., only content elements, content elements + associated annotations, content elements + associated annotations + annotation values for the result list collection).
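The sorting, chunking and facet computation can be illustrated with plain Python data structures. The dictionary keys ("ref", "score", "annotations") and the function names are assumptions; the source does not fix the result schema, and the sketch ranks by a single numeric score.

```python
from collections import Counter
from typing import Dict, List


def build_result_pages(matches: List[Dict],
                       page_size: int) -> List[List[Dict]]:
    """Sort matches by descending score and split them into pages."""
    ranked = sorted(matches, key=lambda m: m["score"], reverse=True)
    return [ranked[i:i + page_size]
            for i in range(0, len(ranked), page_size)]


def facets(matches: List[Dict], annotation: str) -> Counter:
    """Count the annotation values over the whole result collection."""
    return Counter(
        m["annotations"][annotation]
        for m in matches
        if annotation in m["annotations"]
    )


matches = [
    {"ref": "c1", "score": 0.4, "annotations": {"type": "image"}},
    {"ref": "c2", "score": 0.9, "annotations": {"type": "video"}},
    {"ref": "c3", "score": 0.7, "annotations": {"type": "image"}},
]
pages = build_result_pages(matches, page_size=2)
type_facet = facets(matches, "type")
```

An application that requested only content elements could ignore the facet counts, whereas the richest information profile would return pages, per-element annotations and facets together.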

Part of the functionalities of this tier can be implemented using a (synchronous) query pipeline based on SMILA. This pipeline might reuse components used for content processing or data acquisition (e.g., the metadata of an uploaded picture or a set of low level descriptors derived from the uploaded picture).

The Query Interpreter

The Query Interpreter receives the user’s queries coming from a Query Application, through a query APIs. Classes of supported queries are:

� Keyword; � Visual similarity (Still image and video), also called visual content based queries; � Aural similarity, also called aural content based queries; � Multimodal (keyword + one similarity criterion).

Further features of the Query Interpreter are:

• Queries must also accept temporal and spatial conditions (e.g., spatial proximity, temporal constraints);

• Queries accept the following Query Modifiers:

  o Number of results;

  o Number of results per page;

  o Maximum execution time.
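The query classes and query modifiers listed above could be modelled as in the following minimal Java sketch. All type and field names (`QuerySketch`, `QueryClass`, `Modifiers`, `Query`) are hypothetical and chosen for illustration only; the actual CUbRIK query language and protocol are not defined here.

```java
// Hypothetical query model; names are illustrative and not taken from the CUbRIK API.
final class QuerySketch {

    // The classes of supported queries listed above.
    enum QueryClass { KEYWORD, VISUAL_SIMILARITY, AURAL_SIMILARITY, MULTIMODAL }

    // The three query modifiers listed above.
    static final class Modifiers {
        final int numberOfResults;
        final int resultsPerPage;
        final long maxExecutionMillis;

        Modifiers(int numberOfResults, int resultsPerPage, long maxExecutionMillis) {
            if (numberOfResults <= 0 || resultsPerPage <= 0 || maxExecutionMillis <= 0) {
                throw new IllegalArgumentException("All query modifiers must be positive");
            }
            this.numberOfResults = numberOfResults;
            this.resultsPerPage = resultsPerPage;
            this.maxExecutionMillis = maxExecutionMillis;
        }

        // Number of pages needed to show all requested results (rounded up).
        int pageCount() {
            return (numberOfResults + resultsPerPage - 1) / resultsPerPage;
        }
    }

    static final class Query {
        final QueryClass queryClass;
        final String keywords; // may be null for pure similarity queries
        final Modifiers modifiers;

        Query(QueryClass queryClass, String keywords, Modifiers modifiers) {
            if (queryClass == null || modifiers == null) {
                throw new IllegalArgumentException("queryClass and modifiers must be set");
            }
            boolean needsKeywords =
                queryClass == QueryClass.KEYWORD || queryClass == QueryClass.MULTIMODAL;
            if (needsKeywords && (keywords == null || keywords.trim().isEmpty())) {
                throw new IllegalArgumentException("keywords are required for " + queryClass);
            }
            this.queryClass = queryClass;
            this.keywords = keywords;
            this.modifiers = modifiers;
        }
    }
}
```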

The Query Broker

The Query Broker associates a query to the query pipeline(s) that are in charge of processing it. The association of a query type to a pipeline can be:

• Static: the query application is registered into the CUbRIK platform as the submitter of a specific class of query that is associated to a specific query pipeline;

• Dynamic: The query pipeline that is associated (statically or dynamically) to the query has the responsibility of translating the query in the format expected by the search engine(s) and of sending the query or sub-queries to the search engine(s);

• For similarity queries, the Query Broker acts as a content acquisition app. It sends the query content element to the content acquisition and registration manager, with a high priority flag in order to have the content element analyzed and indexed immediately;


• The Response Builder normalizes and fuses the responses from the search engine(s) and creates a single result list, to be returned to the query app;

• The Query Interpreter, the Query Broker, and the Response Builder have a plugin structure. It should be possible to add a new search engine and register the logic for analyzing the query, translating it, and processing the result list.
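The static association of a query class to one or more query pipelines, together with the plugin-style registration mentioned above, could look roughly like the following Java sketch. The registry class, its method names and the example pipeline names used with it are assumptions for illustration; they do not describe an existing CUbRIK component.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical registry for the static association of query classes to query pipelines.
final class QueryBrokerSketch {

    private final Map<String, List<String>> pipelinesByQueryClass = new HashMap<>();

    // Registers a pipeline as being in charge of the given query class.
    void register(String queryClass, String pipelineName) {
        if (queryClass == null || pipelineName == null) {
            throw new IllegalArgumentException("queryClass and pipelineName must be set");
        }
        pipelinesByQueryClass
            .computeIfAbsent(queryClass, key -> new ArrayList<>())
            .add(pipelineName);
    }

    // Returns the pipelines registered for a query class, in registration order.
    List<String> pipelinesFor(String queryClass) {
        List<String> pipelines = pipelinesByQueryClass.get(queryClass);
        if (pipelines == null) {
            throw new IllegalArgumentException(
                "No pipeline registered for query class: " + queryClass);
        }
        return Collections.unmodifiableList(pipelines);
    }
}
```

Adding a new search engine then amounts to registering one more pipeline for the query classes it supports.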

2.5 Search Tier

The search tier contains a collection of independent Search Engines that support the execution of query pipelines.

There are two principal ways to use a search engine from a CUbRIK pipeline:

• Black box: the search engine is external to the CUbRIK platform and is used to fetch content useful in the query pipeline (e.g., calling Google Images to retrieve logo images corresponding to a brand name input by the user). In this case, the search engine is used without access to its index;

• White box: the search engine is internal to the CUbRIK platform and is used to index and query content procured by the CUbRIK content acquisition tier.

Each search engine used in white box mode can access the content and annotation store(s) to build/rebuild its indexes. Indexing is independent and asynchronous w.r.t. content processing and acquisition. Each white box search engine should listen to the content processing manager events, in order to understand when to build, re-build, extend, and update its indexes.
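The idea of a white box search engine listening to content processing events can be sketched as a simple listener in Java. The event types, the `ContentProcessingManager` stand-in and all other names are assumptions made for this example; the sketch only records pending index work, whereas a real engine would process it asynchronously.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical listener wiring between content processing and a white box search engine.
final class IndexingSketch {

    enum EventType { CONTENT_ADDED, CONTENT_UPDATED, CONTENT_DELETED }

    interface ContentProcessingListener {
        void onEvent(EventType type, String contentId);
    }

    // Stand-in for the content processing manager: it only notifies its listeners.
    static final class ContentProcessingManager {
        private final List<ContentProcessingListener> listeners = new ArrayList<>();

        void addListener(ContentProcessingListener listener) {
            listeners.add(listener);
        }

        void publish(EventType type, String contentId) {
            for (ContentProcessingListener listener : listeners) {
                listener.onEvent(type, contentId);
            }
        }
    }

    // The engine queues index operations instead of indexing inside the event callback.
    static final class WhiteBoxSearchEngine implements ContentProcessingListener {
        private final List<String> pendingOperations = new ArrayList<>();

        @Override
        public void onEvent(EventType type, String contentId) {
            pendingOperations.add(type + ":" + contentId);
        }

        List<String> pendingOperations() {
            return new ArrayList<>(pendingOperations);
        }
    }
}
```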

2.6 Data Storage Services

The CUbRIK Stores are persistent storage repositories accessible via Web service under access control policies. All CUbRIK data elements that need to be referred to or generated externally to the platform must have a unique ID (e.g., contentID, factID, conflictID, userID). Wherever possible, standard identification mechanisms will be used.

Storage (in principle) should be distributed. This implies that IDs should be assigned according to a distributed object identification mechanism.
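One common way to assign IDs without a central authority is to use random UUIDs, as in the following Java sketch. The prefix scheme (`kind:uuid`) and the class name are assumptions for illustration; the document does not prescribe a specific identification mechanism.

```java
import java.util.UUID;

// Hypothetical ID factory: each node can create IDs locally without coordination.
final class IdSketch {

    // Returns an ID such as "contentID:<random UUID>"; the prefix scheme is an assumption.
    static String newId(String kind) {
        if (kind == null || kind.isEmpty()) {
            throw new IllegalArgumentException("kind must be set");
        }
        return kind + ":" + UUID.randomUUID();
    }
}
```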

Content should be encoded (or re-encoded) so as to be available for rendering and use on several platforms (e.g., iPhone/iPad). Since SMILA provides a coherent set of JSON REST interfaces, these interfaces should be used as a basis.

2.7 Applications Programming Interfaces

The CUbRIK Apps are Web and mobile applications that interact with CUbRIK by accessing the Web services offered by the platform.

The classes of CUbRIK apps are:

• GWAPs for contribution (new annotation generation);

• GWAPs for inspection;

• GWAPs and Q&A applications for conflict resolution;

• Vertical search applications (use case demonstrators);

• Horizontal search applications (feature demonstrators);

• Collection management applications (manually adding-deleting-updating collections, content elements, facts, conflicts…);

• Community management applications (monitoring user activity, managing incentives, visualizing social structures, messaging, etc).

CUbRIK apps could be standalone or accessible through social network or crowdsourcing platforms (e.g., Facebook apps, Microtask Apps).


3. How to Integrate Components in CUbRIK

While Chapter 2 gave an overview of the layers of the CUbRIK architecture, this chapter provides technical background that needs to be considered when components are integrated. Since the CUbRIK architecture is based on SMILA as the underlying framework for supporting workflow definition and execution, the How to of component integration for SMILA is relevant for component integration in CUbRIK.

Most of these descriptions are already available online as part of the SMILA documentation. Therefore, only a short overview with references to the corresponding online documentation is given. Furthermore, details of the software engineering process and coding conventions are given in Chapter 6.

3.1 Set Up the Development Environment

To start with the development using SMILA, you need to set up the development environment. For SMILA, a Java Development Kit, together with the current Eclipse IDE release, is needed.

After installation of this environment, you may get the software distribution of SMILA.

In addition – to be able to prepare the definitions of a workflow – you might also add the BPEL Editor extension to your IDE.

Further details can be found in [5].

3.2 Wrap Component

After the development environment has been set up, you can start integrating your own component into SMILA. In most cases, you will implement a processing component that works upon data crawled by SMILA. In SMILA, you have two ways to integrate such kinds of components:

• as a pipelet, being part of a synchronous pipeline [6], or

• as a worker within an asynchronous workflow [7].

If in doubt which one to select, use the pipelet implementation, since there is a worker available that lets you execute a synchronous pipeline or an individual pipelet in an asynchronous workflow as well [8].

When you want to integrate a new data source, you might extend existing data importers. The respective API, as well as the extension of data importers, is described in [9], in the Importing section.

When you want to include third party components, you need to consider their license model. In particular, if you want to create a CUbRIK component that should become part of a SMILA distribution, you should choose software whose license is compatible with the EPL. See Chapter 4 of this document for details.

3.3 Create Pipeline / Asynchronous Workflow

After integrating your components, you need to orchestrate the pipelines or workers:

• For synchronous processes, i.e. pipelines, you can use the Open Source BPEL editor to define the workflow with editor support. Details are covered in [10];


Figure 3-1 Screenshot of BPEL Designer

• For asynchronous workflows – as of Feb 2012 – there is no editor support available, hence you will have to create the workflow definition file manually [11].

3.4 Test Workflow

After defining your workflow, you need to test it. This is done by invoking the defined pipeline or workflow:

• When you want to index from a certain data source, you need to invoke the asynchronous workflow or pipeline via the SMILA job management [12]. Please note that you might need to wrap the invoked pipeline within a worker [13];

• When you want to use a pipeline for searching, you have to provide the pipeline name via the Search API [14].

Depending on the components you use, you might need additional tools (e.g., database clients) to inspect the results of a workflow / pipeline execution.


4. Considering License Issues

After reading the previous chapters, you should be able to locate your development within the layers of the CUbRIK architecture and implement the needed pipelines or workflows.

This chapter briefly covers how to consider licensing issues. This important issue is captured in its own chapter since it contributes substantially to the “open the search box” vision of the CUbRIK project. By considering the licenses, CUbRIK can build upon already existing development.

For usage in industrial settings, a proper consideration of licenses is indispensable. Due to the complex nature of this issue, this chapter can only provide general guidelines – without any guarantee concerning legal issues.

In general, components used within CUbRIK should have licenses compatible with the Eclipse Public License [15]. This allows a potential integration into the CUbRIK distribution as well as the general SMILA distribution. Components with different license models can nevertheless be integrated in the CUbRIK Platform.

If the goal is to develop an open source component that is CUbRIK Platform license compliant, a list of compatible licenses is given in the Eclipse legal process poster [16]. As of Feb 2012, the compatible licenses mentioned in this poster are:

• Apache Software License 1.1;

• Apache Software License 2.0;

• W3C Software License;

• Common Public License Version 1.0;

• IBM Public License 1.0;

• Mozilla Public License Version 1.1;

• Common Development and Distribution License (CDDL) Version 1.0;

• GNU Free Documentation License Version 1.3;

• BSD;

• MIT.

Non-compatible licenses mentioned in this poster are:

• GNU GPL 2.0;

• GNU LGPL;

• Sun Binary Code License Agreement.
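For a quick pre-check of third party components, the two lists above can be encoded as a simple lookup, as in this Java sketch. The class and method names are hypothetical, the license names are copied from the lists above as of Feb 2012, and any license not found in either list is reported as unknown so that it is reviewed manually.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical lookup mirroring the license lists above (snapshot as of Feb 2012).
final class LicenseCheckSketch {

    enum Compatibility { COMPATIBLE, NOT_COMPATIBLE, UNKNOWN }

    private static final Set<String> COMPATIBLE_LICENSES =
        Collections.unmodifiableSet(new HashSet<>(Arrays.asList(
            "Apache Software License 1.1",
            "Apache Software License 2.0",
            "W3C Software License",
            "Common Public License Version 1.0",
            "IBM Public License 1.0",
            "Mozilla Public License Version 1.1",
            "Common Development and Distribution License (CDDL) Version 1.0",
            "GNU Free Documentation License Version 1.3",
            "BSD",
            "MIT")));

    private static final Set<String> NOT_COMPATIBLE_LICENSES =
        Collections.unmodifiableSet(new HashSet<>(Arrays.asList(
            "GNU GPL 2.0",
            "GNU LGPL",
            "Sun Binary Code License Agreement")));

    // Looks up a license by the exact name used in the lists above.
    static Compatibility check(String licenseName) {
        if (licenseName == null) {
            throw new IllegalArgumentException("licenseName must not be null");
        }
        String name = licenseName.trim();
        if (COMPATIBLE_LICENSES.contains(name)) {
            return Compatibility.COMPATIBLE;
        }
        if (NOT_COMPATIBLE_LICENSES.contains(name)) {
            return Compatibility.NOT_COMPATIBLE;
        }
        return Compatibility.UNKNOWN;
    }
}
```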

Besides the non-compatible open source licenses, most closed source software licenses are also not compatible with source code integration in an Open Source CUbRIK distribution. If in doubt whether you are allowed to use a component within a pipeline, please contact the owner of this component before integrating it.

A way to deal with non-compatible licenses is to exclude the source code or binary of a component from a distribution, and to describe how to download it and integrate it in the CUbRIK component you developed (see footnote 1). However, whether this way of handling incompatibilities is allowed depends on the actual license used.

1 A place to describe the integration would be in SMILA bazaar [17]. Contact one of the administrators to obtain access.


5. How to Describe Components

The previous chapters described how reusable components can be created and orchestrated. Since it is the purpose of CUbRIK to employ these components in multiple pipelines, they need to be described in a way that allows easy reuse.

In this chapter, we describe a template for this description covering the relevant aspects for component integration. The template structure emerged during previous development and integration experiences, including the description of components identified in the SMILA Hackathlon event, held in November 2011. During this event, several components were integrated based on the SMILA platform. The structure of this description will be adapted during the project based on the feedback of component integrators and users. The description is intended to support two kinds of users: first, developers who are developing components to be run in CUbRIK; second, users of CUbRIK components, who need to evaluate whether a component fits their needs and who need support in their first steps using the component.

This description is complementary to the procedure for component Development and Delivery provided in Chapter 6.

5.1 General Components Description

Each component comes with a description file, the installation and configuration guideline. It should be a text file containing a step-by-step guideline designed for installers with no knowledge of the component itself. An example of an installation and configuration file, related to the FOO component, can be found in Annex A: Inst_Conf.txt FOOComponent. It serves as a fully filled-in template for developers who are developing and delivering CUbRIK components and/or pipelines.

5.1.1 Installation and configuration file naming and location

The name of the installation and configuration file should be: Inst_Conf.txt.

Components made of parts, delivered in different folders in a packed zip file, should be accompanied by an installation and configuration file located in the main folder. As an alternative, each folder can have its related installation and configuration file, but a main installation and configuration file has to be supplied in the main folder. This main file should list the component parts, and should specify where other installation and configuration files can be found and the installation order of the component parts. In this case, the name of the main file is Inst_Conf.txt, while the name of each secondary installation and configuration file should be Inst_Conf_<SUFFIX>.txt, where SUFFIX is defined by the component owner.

For example, folders and files related to component A composed of three parts A1, A2 and A3, deployed for version x.y from partner PHz: A_PHz/dist/vx.y/ (that contains):

A
    A_hist.txt
    A1
        Inst_Conf_A1.txt
        A1_hist.txt
        A1files.zip
    A2
        Inst_Conf_A2.txt
        A2_hist.txt
        A2files.zip
    A3
        Inst_Conf_A3.txt
        A3_hist.txt
        A3files.zip

Figure 5-1 Folder and file structure for a component with three parts
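The naming rule for installation and configuration files can be checked mechanically, for instance before packaging a delivery. The following Java sketch assumes that SUFFIX consists of letters and digits only; this restriction, like the class and method names, is an assumption for illustration, since the suffix is defined by the component owner.

```java
import java.util.regex.Pattern;

// Hypothetical validator for installation and configuration file names.
final class InstConfNames {

    // Assumption: SUFFIX is made of ASCII letters and digits only.
    private static final Pattern SECONDARY_FILE =
        Pattern.compile("Inst_Conf_[A-Za-z0-9]+\\.txt");

    // True for the main file in the main folder.
    static boolean isMainFile(String fileName) {
        return "Inst_Conf.txt".equals(fileName);
    }

    // True for a secondary file such as Inst_Conf_A1.txt.
    static boolean isSecondaryFile(String fileName) {
        return fileName != null && SECONDARY_FILE.matcher(fileName).matches();
    }
}
```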

Similarly to components, the pipeline should also be delivered with a proper installation and configuration guideline. This file should be a text file containing a step by step guideline designed for installers with no knowledge of the pipeline itself.

5.1.2 Component Description

This section describes in 2-5 sentences what the component does. This includes some information about the context, e.g., which operating system is needed to run the component.

This section allows a user to determine whether the component is relevant or not.

5.1.3 Contact Information

Each component must report contact information that a user can use to ask questions or discuss improvements. This should be the same as reported in the CUbRIK bug-tracking system. The latter is recommended as the official communication tool.

This section states who should be contacted in case of questions. At least, this should be an email. In addition, further information or a link to a wiki page can be provided.

5.1.4 Code Example

This section gives at least one example of how the component is used within pipelines. This example could be taken from an actual pipeline or workflow where the component is used.

This section helps a user to employ this component in pipelines or workflows.

5.1.5 Dependencies on other CUbRIK Components

Pipelines by their nature depend on components implementing specific tasks. All pipelines should therefore be delivered with the list of components they depend on.

In some cases preliminary relationships exist among components, which affect the installation of each component. The installation of Component A can have as a precondition the completed installation of Component B. Each component should be delivered with the list of its component interdependencies.

5.1.6 Third Party Libraries

The installation of third party libraries – libraries, components and other software artifacts not developed within CUbRIK project - can be preliminary to the installation of CUbRIK components. These dependencies should be clearly reported in the installation and configuration file. Libraries should be delivered jointly with related components where possible. If this is not possible because of technical or legal (licenses) or other reasons, the installation and configuration file should report download locations and describe installation and configuration details with reference to the proper URLs. This section allows a user to estimate the complexity of installing this component as well as determining whether the license fits.

E.g.: Component A depends on the PostgreSQL DB:

Component A depends on: PostgreSQL JDBC Driver, JDBC3 Version 8.3-604 (http://jdbc.postgresql.org/).

The library is version 8.3 Build 604, JDBC 3, downloadable at [see table]. Installation of the library is described in the “About” section of the same page [http://www.postgresql.org/docs/]. In this case a library configuration is not necessary (http://jdbc.postgresql.org/download.html).

The installation and configuration file should also describe changes of libraries (such as version etc.).

Library | Version | Note/Ratio | License | Download

PostgreSQL JDBC Driver | 8.3-604 | In use | BSD License | http://jdbc.postgresql.org/download.html#others

PostgreSQL JDBC Driver | 8.2 Build 509 | Replaced with 8.3-604 version. | BSD License | http://jdbc.postgresql.org/download.html#others

Figure 5-2 Table with Example of description of a 3rd party library

5.1.7 Interface

This section lists the input and output parameters along with their data type. This section allows adapting the component to the actual record structure of an application.

• Input: List of parameters and their datatype that is expected by the component. It also states whether each input parameter is required or optional;

• Output: List of parameters and their datatype that is created by the component. Additional information (like confidence of the information created) could be added if needed.

5.1.8 Installation Prerequisites

Component installation prerequisites should be described in a specific section of the installation and configuration file. This section contains information on how to install the component on a target system. Following a general introduction of about 2-5 sentences, the following information should be provided:

• Environment, like Java virtual machine and related variables to be set;


• Application server version and related setting (Tomcat, Axis2);

• Class path settings;

• Licenses if requested;

• Source Code: Link to source code;

• SMILA Version: SMILA Version under which this component runs;

• Documentation: An optional link to further documentation;

• Example Data: An optional link to example data to test the component;

• Restrictions: If needed, restrictions like operating system, performance needed or memory usage are described.

This information allows a user to determine whether the component will run in his or her environment. Furthermore, it eases the installation effort and thus increases the likelihood of a successful reuse and evolution of a component.

5.1.9 Unpackaging

Instructions on how and where to unzip the file delivered should be provided in an installation and configuration file, along with additional information on paths or other steps to be followed if needed.

5.1.10 Component (Back End) and Web Service Installation

Components are usually structured in a back end part and the actual web service. For both of them, a step by step installation instruction should be provided in the installation and configuration file, in order to guide the installer along the process. Specific sections are provided in an installation and configuration template. Differences in installation under different operating systems should be detailed, along with installation instructions for third party libraries if needed, including their installation order in case of dependencies.

5.1.11 Component (Back End) and Web Service Test

The installation and configuration file has specific sections for back end testing and web service testing. These sections should give step by step instructions on how to install and use the Java code, pre-compiled tests or soapUI test (see footnote 2) delivered with the component.

2 http://www.soapui.org/


6. How to Develop and Deliver Components

SW production from CUbRIK partners is expected to follow common rules in naming convention, quality certification, development systems, libraries, documentation and forms of SW delivery: all these issues are taken in consideration in the following, in order to set up proper foundations for the SW building. Where available at the time this document is issued, examples of guidelines and system configurations are provided in the appendix.

According to this goal, this section provides specific procedures for developers who have to deliver CUbRIK components. These procedures are intended to ensure that delivered components are installable and work properly. Ideally, by using these procedures, components need only to be retrieved from binary repository, installed, configured and tested to run, optimising the overall production cycle lead time.

6.1 Development Conventions

These paragraphs are intended to give some common conventions for CUbRIK software development. In the case of CUbRIK, for all kinds of artifacts produced, (components, pipelets and pipelines) the development conventions should be adopted.

6.1.1 Naming Conventions

The definition of a common and unanimous naming convention for developed bundles and created packages is essential in a project with various partners.

The names chosen should be easy to understand, giving an idea of the feature provided and the kind of bundle produced.

The name of the project should always be present in all features and bundles.

Package names may also contain the coded name of the partner.

e.g.:

Feature Name: cubrikproject.pipelet.eng.wikilyrics

Base package name: eu.cbrkprj.pipelet.eng.wikilyrics

6.1.2 Guidelines Writing Source Code

The source code should follow the standard Java and JavaBeans conventions:

a. The package names should have only small letters;

b. All the class names should start with a capital letter;

c. All the class attributes and member names and also local variable names should start with a small letter;

d. When a class, attribute, member or variable name is composed of more than one word, the first letter of each word except the first should be a capital;

e. Public static constants should be written in capital letters and, if they consist of several words, the words should be joined by underscores;

f. All the class attributes should be private;

g. All attributes must be accessed through their own getters and setters;

h. Getter and setter methods should be public;

i. Getter methods have no parameters and must return the same value type as accepted by the corresponding setter methods;

j. Getter methods that don’t return boolean values must start with “get”;

k. Getter methods that return boolean values should start either with “is” or “get”;

l. Adding list methods should start with “add”;

m. Removing list methods should start with “remove”.
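The rules above can be illustrated with a small JavaBean-style class. The class and its attributes are invented for this example (loosely following the wikilyrics naming example in 6.1.1) and do not correspond to an actual CUbRIK component; the letters in the comments refer to the rules above.

```java
import java.util.ArrayList;
import java.util.List;

// A real class would live in a lower-case package (rule a), e.g. the base package from 6.1.1.
final class LyricsRecord {                                   // b. class name starts with a capital

    public static final int MAX_TITLE_LENGTH = 200;          // e. constant in capitals with underscore

    private String songTitle;                                 // c., d., f. private camel-case attribute
    private boolean instrumental;
    private final List<String> verses = new ArrayList<>();

    public String getSongTitle() {                            // h., j. public getter starting with "get"
        return songTitle;
    }

    public void setSongTitle(String songTitle) {              // i. setter type matches getter type
        this.songTitle = songTitle;
    }

    public boolean isInstrumental() {                         // k. boolean getter starting with "is"
        return instrumental;
    }

    public void setInstrumental(boolean instrumental) {
        this.instrumental = instrumental;
    }

    public void addVerse(String verse) {                      // l. list adding method starts with "add"
        verses.add(verse);
    }

    public boolean removeVerse(String verse) {                // m. list removing method starts with "remove"
        return verses.remove(verse);
    }

    public List<String> getVerses() {
        return new ArrayList<>(verses);                       // defensive copy keeps the attribute private
    }
}
```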

6.1.3 Exception Handling

Well-designed exception handling is very important in framework-based software such as CUbRIK.

It is important that checked exceptions are thrown and handled properly without changing their meaning. If an exception is thrown that the calling code cannot handle, the calling code should not swallow it and should not change its message; if it has to throw a different exception, it should at least create the new exception based on the original one.

Unchecked exceptions must be prevented by proper checks on parameters, variables and attributes. Each parameter should be checked for integrity to prevent null pointer exceptions, format exceptions and arithmetic exceptions.
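Both rules can be sketched in a few lines of Java. The exception type `ComponentException` and the methods shown are hypothetical and only serve to illustrate wrapping a checked exception with its original cause and checking parameters up front.

```java
import java.io.IOException;

// Hypothetical example of exception wrapping and parameter checks.
final class ExceptionHandlingSketch {

    // Hypothetical checked exception of a component; keeps the original exception as cause.
    static final class ComponentException extends Exception {
        ComponentException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    // Stand-in for a low-level operation that fails with a checked exception.
    static String readDescriptor(String path) throws IOException {
        throw new IOException("cannot read " + path);
    }

    // Wraps the low-level exception without changing its message.
    static String loadDescriptor(String path) throws ComponentException {
        if (path == null || path.isEmpty()) {
            throw new IllegalArgumentException("path must be set");
        }
        try {
            return readDescriptor(path);
        } catch (IOException e) {
            throw new ComponentException(e.getMessage(), e);
        }
    }

    // Checks the divisor first, so that no ArithmeticException can occur.
    static int average(int sum, int count) {
        if (count == 0) {
            throw new IllegalArgumentException("count must not be zero");
        }
        return sum / count;
    }
}
```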

6.1.4 Logging Guidelines

In a highly distributed system like CUbRIK it is really important to create a useful and well designed logging system. As advised in SMILA, there are some easy rules to follow.

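Avoid a static reference to an Apache Commons Logging instance. For example, the following is good practice:

    // Instance field (not static); Log and LogFactory come from org.apache.commons.logging
    private final Log _log = LogFactory.getLog(MyClass.class);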

Always check the log level before logging. For example, the following is good practice:

    if (_log.isErrorEnabled()) {
        _log.error("Your error message", e);
    }

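Do not log an exception before throwing it (or before throwing a new one). For example, the following is bad practice:

    ...
    if (paramXY == null) {
        // Bad: the error is logged here and will be logged again where the exception is handled
        if (_log.isErrorEnabled()) {
            _log.error("paramXY is not set");
        }
        throw new NullPointerException("paramXY is not set");
    }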
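A corrected variant of the last example throws without logging and logs exactly once at the place where the exception is finally handled. To keep the sketch self-contained, it uses `java.util.logging` from the JDK as a stand-in for Apache Commons Logging; the class and method names are hypothetical.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical sketch; java.util.logging is used here instead of Apache Commons Logging.
final class LoggingSketch {

    // Instance field, not static, following the first rule above.
    private final Logger log = Logger.getLogger(LoggingSketch.class.getName());

    // Throws without logging; the message travels with the exception.
    void process(String paramXY) {
        if (paramXY == null) {
            throw new NullPointerException("paramXY is not set");
        }
    }

    // Boundary method: the exception is handled here, so it is logged here, exactly once.
    boolean tryProcess(String paramXY) {
        try {
            process(paramXY);
            return true;
        } catch (RuntimeException e) {
            if (log.isLoggable(Level.SEVERE)) {
                log.log(Level.SEVERE, "Processing failed", e);
            }
            return false;
        }
    }
}
```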

6.1.5 Third Party Components Integration Guidelines

The installation of third party libraries (libraries, components and other software artifacts not developed within the CUbRIK project; for example, some graphical tools rely on a third party library like GTK+) can be a prerequisite for the installation of CUbRIK components. As already mentioned in 5.1.6, these dependencies should be clearly reported in the installation and configuration file.

Libraries should be delivered jointly with related components where possible. If this is not possible because of technical, legal (license) or other reasons, the installation and configuration file should report download locations and versions, and describe installation and configuration details with reference to the proper URLs. All this information is collected in the installation and configuration file.


6.2 Design Pattern for CUbRIK Components

The development of a CUbRIK component should follow specific patterns depending on its kind. This creates bundles that have the same construction principles, thus making them easier to understand as well as easier to maintain.

6.2.1 Pattern for Common Features

Common features are common third party bundles or specifically developed utilities that are useful for other components. They should be implemented as plug-in bundles designed for high cohesion.

An example of a common feature is a data model. The following figure shows the class model of a dosomething.datamodel and its relationship with smila.datamodel:

Figure 6-1 DoSomething datamodel class diagram

6.2.2 Pattern for Agents

An Agent component is a platform plug-in that shows specific behaviours when specific events occur. Although the actions obviously differ between agent types, their shape should remain the same. In particular, agents have to implement the following common features:

• Initialize themselves;

• Create and manipulate Record objects;

• Perform an action when requested.

This is summed up in the following figures:


Figure 6-2 Foo Agent class view

Figure 6-3 Foo Agent sequence diagram

When an agent needs third party libraries or features, they should be provided in the same plug-in except if they are intended as useful common features themselves. If so, the common feature should be converted in a bundle plug-in that other components could import.

Please note that with the release of SMILA 1.0, data connectors should use the new import framework [9]. The Agent API will be replaced in the near future.

6.2.3 Pattern for Crawlers

Like an Agent, a Crawler is a component platform plug-in, but it is activated explicitly by the platform. On the developer side, Crawlers are different objects from Agents but with common features:

• Initialize themselves;

• Create and manipulate Record objects;

• Perform an action when requested.

The following figures show this:

Figure 6-4 Foo Crawler class view


Figure 6-5 Foo Crawler sequence diagram

When a crawler needs third party libraries or features, they should be provided in the same plug-in except if they are intended as useful common features themselves. If so, the common feature should be converted in a bundle plug-in that other components could import.

Please note that with the release of SMILA 1.0, data connectors should use the new import framework [9]. The Crawler API will be replaced in the near future.

6.2.4 Pattern for Pipelets

Pipelets are specific features used inside a pipeline. They are developed as platform plug-ins with the following common features:

• Reading configurations;

• Creating and manipulating specific Record objects;

• Processing Records.


Figure 6-6 DoSomething pipelet class diagram

Figure 6-7 DoSomething pipelet sequence diagram
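The three common features of a pipelet (reading configuration, manipulating records, processing records) can be illustrated with the following self-contained Java sketch. Note that `SimplePipelet` is a hypothetical stand-in that uses plain maps as records; it is not the SMILA Pipelet interface, whose actual signatures are described in the SMILA documentation [6]. The attribute and configuration key names are assumptions as well.

```java
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch of the pipelet pattern; NOT the SMILA Pipelet API.
final class PipeletSketch {

    // Stand-in contract: configure once, then process batches of records.
    interface SimplePipelet {
        void configure(Map<String, String> configuration);

        void process(List<Map<String, Object>> records);
    }

    // Example pipelet: writes an upper-case copy of one attribute into another attribute.
    static final class UpperCasePipelet implements SimplePipelet {
        private String inputAttribute = "text";
        private String outputAttribute = "textUpper";

        @Override
        public void configure(Map<String, String> configuration) {
            if (configuration == null) {
                throw new IllegalArgumentException("configuration must not be null");
            }
            inputAttribute = configuration.getOrDefault("inputAttribute", inputAttribute);
            outputAttribute = configuration.getOrDefault("outputAttribute", outputAttribute);
        }

        @Override
        public void process(List<Map<String, Object>> records) {
            if (records == null) {
                throw new IllegalArgumentException("records must not be null");
            }
            for (Map<String, Object> record : records) {
                Object value = record.get(inputAttribute);
                if (value instanceof String) {
                    record.put(outputAttribute, ((String) value).toUpperCase(Locale.ROOT));
                }
            }
        }
    }
}
```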


6.2.5 Pattern for Pipelines

A pipeline is defined as a sequence of pipelets designed to provide the features described in well-defined use cases. It can be developed as specific worker objects or as BPEL workflows.

Each pipeline must be described as a use case using UML diagrams and must be deployed with exhaustive comments reporting needed pipelets, configuration and setup information.
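Conceptually, a pipeline applies its pipelets to a record one after the other. The following Java sketch shows this idea only; in SMILA the orchestration itself is defined in BPEL or in workflow definitions rather than in Java code, and the class `PipelineSketch` with its methods is a hypothetical illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

// Hypothetical sketch: a pipeline as an ordered sequence of record-processing steps.
final class PipelineSketch {

    private final List<UnaryOperator<Map<String, Object>>> steps = new ArrayList<>();

    // Appends a step (standing in for a pipelet) and returns the pipeline for chaining.
    PipelineSketch then(UnaryOperator<Map<String, Object>> step) {
        if (step == null) {
            throw new IllegalArgumentException("step must not be null");
        }
        steps.add(step);
        return this;
    }

    // Runs all steps in the order in which they were added.
    Map<String, Object> run(Map<String, Object> record) {
        Map<String, Object> current = record;
        for (UnaryOperator<Map<String, Object>> step : steps) {
            current = step.apply(current);
        }
        return current;
    }
}
```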

6.2.6 Additional Guidelines Pipelines

The CUbRIK project relies on the SMILA infrastructure for pipeline definition and running. An extensive description on how to use BPEL Workflow definitions within SMILA is available online [18]. Therefore, only a reference to this description is stated here.


6.3 Testing

This section reports on the testing activities performed in order to release the final CUbRIK platform. Testing has been organised in two main phases:

1. Component testing has the goal of finding the defects of each module of the CUbRIK platform. It is divided into Unit tests and Consumer-Provider tests (described below);

2. Platform testing has the goal of finding defects when modules are integrated into a functional chain. It consists of two phases: the first one, named Functional test, is run on the CUbRIK components deployed locally by each partner. The second one, named Integration test, tests the actual “installability” level of the components developed by the CUbRIK partners.

For pipelines developed in CUbRIK, the testing procedure can be treated as part of the Functional test. In fact, pipelines are part of the Use Case workflow and can be tested in terms of the fulfilment of workflow and dataflow requirements. In this case, it is a prerequisite that the components involved are already in place and tested.

6.3.1 Unit Test

Unit testing tests a component before its official release, by means of functional tests performed to check the correct component behaviour, following its functional specs. Unit testing is not standardized within the CUbRIK project, and is carried out under the producer’s responsibility.

6.3.2 Consumer-Provider Testing

Consumer-provider testing tests a module from the point of view of its direct clients - which exploit the component under testing - by using specifically written applications, named Consumer Provider (C/P) tests. Each C/P test invokes the module using its standard input and checks for the correctness of the output (returned values and status code).

Consumer-provider testing is under the responsibility of the partner consumer of a service exposed by a module, on the basis of the dependency chart of the respective module. For example: a partner develops and deploys Component X and exposes it as a web service. The wsdl file provides all necessary information to test the services exposed. Consumer-provider test is performed when a new component is released or changes are applied to the component. It guarantees that component behaviour is compliant to the defined interface.

Consumer-provider tests are defined jointly by consumer and provider: the provider makes an initial proposal, which the consumer has to agree to. This kind of testing is not standardized within the CUbRIK project, but dedicated Java clients can be developed for C/P test implementation using soapUI (footnote 3), an open source tool for web service testing. Additionally, the SMILA APIs can be tested using a JSON REST client; such clients are available for most browsers.
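A C/P test as described above reduces to invoking the module with a standard input and checking both the status code and the returned value. The following Java sketch illustrates only that shape. The names `ServiceResponse` and `CpTest` are hypothetical and not part of CUbRIK or SMILA, and the service is passed in as a function so that a real client could plug in an actual web service call.

```java
import java.util.function.Function;

/** Hypothetical value object: status code and body returned by the module under test. */
final class ServiceResponse {
    final int status;
    final String body;

    ServiceResponse(int status, String body) {
        this.status = status;
        this.body = body;
    }
}

/** Minimal consumer-provider check: invoke with a standard input, verify status and output. */
final class CpTest {
    // Assumption: the call into the module under test is wrapped as a function.
    private final Function<String, ServiceResponse> service;

    CpTest(Function<String, ServiceResponse> service) {
        this.service = service;
    }

    /** Returns null on success, or an informative message describing the mismatch. */
    String check(String input, int expectedStatus, String expectedBody) {
        ServiceResponse r = service.apply(input);
        if (r.status != expectedStatus) {
            return "unexpected status " + r.status + " (expected " + expectedStatus + ")";
        }
        if (!expectedBody.equals(r.body)) {
            return "unexpected output '" + r.body + "' (expected '" + expectedBody + "')";
        }
        return null;
    }
}
```

Returning a message instead of a boolean keeps the test output informative, which is what the provider and consumer need when they agree on the expected behaviour.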

6.3.3 Integration Testing

Integration testing tests the integration of a component into the chain. Installation is checked against the installation requirements defined in the corresponding Platform release. Integration testing is the final test before platform packaging: it follows the end-to-end tests, which check that the platform meets its functional requirements, and it starts from a “from scratch” installation to ensure that the components are correct and complete. Integration testing is the responsibility of the partner in charge of the installation.

3 http://www.soapui.org/


6.3.4 End to End Functional Testing

End to End Functional testing verifies the functionality of the integrated system. Use cases (or fragments of them), formalized as CUbRIK pipelines or pipeline fragments, are exercised through the User Interface in test sessions. Functional testing is performed mainly by the partner in charge of Platform Integration, supported by the other partners involved in the Use Case.

Each test session calls one of the entry points of the prototype and checks for the desired response.

Note that the relationship between use cases and test pipelines need not be one-to-one: a complex use case may offer several distinct ways of interacting with the CUbRIK Platform, and each of them must be the subject of a specific test session.

Functional testing is supported by the Bugzilla infrastructure, described in section 6.5.3, Bugs, releases and feedbacks tracking system.

Pipeline testing can be carried out by the pipeline owner and/or the Platform integrator. The pipeline owner performs an initial test so that later testing runs smoothly. The platform integrators then perform the complete functional test, supported by the pipeline owner and by the owners of all the components involved.


6.4 Delivery Process Guidelines

Each developed component has to follow the guidelines specified for its kind so that it can be delivered into the CUbRIK bundle repository.

6.4.1 Component and Pipelines Provision Form

For components, only compiled code should be supplied, to avoid problems caused by code compilation. Third party libraries should be provided as compiled and installable files.

Pipelines must likewise be thoroughly tested before delivery and supplied with a description file that helps to understand how they work.

6.4.2 Installation and Configuration guideline

An installation and configuration guideline file, as described in section 5.1, should be provided together with each component. The file should be stored in the SVN folder specific to the component. If the component is delivered as a packaged file, the installation and configuration guideline should be included in the package.

6.4.3 Path and references

Hard-coded paths should be avoided. Paths that are subject to change should be specified in configuration files. In particular, hard-coded URL references to WSDL and services.xml files should be avoided, because references to the WSDL are changed automatically at deployment time and the URL in the services.xml file is just a namespace (an alias of a namespace).

An actionMapping alias might be needed in case ws-addressing is used, but there is no need to use a namespace referenced by a URL; namespaces from client calls should be removed as well.

Each bundle that uses third party or external processes for data manipulation should use its own configuration reader component.
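As an illustration of keeping changeable paths out of the code, the following Java sketch reads a path from a properties-style configuration source. It is a minimal sketch under stated assumptions: the class name `BundleConfig` and the key `foo.data.dir` are hypothetical, and a real bundle may well use a different configuration format.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Properties;

/** Hypothetical per-bundle configuration reader: paths come from configuration, not from code. */
final class BundleConfig {
    private final Properties props = new Properties();

    BundleConfig(Reader source) throws IOException {
        props.load(source);
    }

    /** Convenience factory for configuration text already held in memory. */
    static BundleConfig parse(String text) {
        try {
            return new BundleConfig(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Looks up a configured path and fails loudly when the key is missing. */
    Path path(String key) {
        String value = props.getProperty(key);
        if (value == null) {
            throw new IllegalStateException("missing configuration key: " + key);
        }
        return Paths.get(value);
    }
}
```

A bundle would typically construct the reader once from its own configuration file and ask it for every path it needs, so that moving the installation only requires editing that file.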

6.4.4 Test client

Installing an instance of the CUbRIK platform is a step-by-step process: one component is installed at a time and then tested by means of a specific test client, which the component owner should supply along with the component itself. The test client should check that all functionalities work as expected and should output informative messages in case of problems. Test clients should be provided as Java code to be included as part of the testing code, and the web service testing tool soapUI (footnote 4) should be used.
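The one-component-at-a-time procedure can be sketched in Java as a loop that runs each component's test client in installation order and stops at the first failure. This is only an illustration: `PlatformInstaller` is a hypothetical name, each test client is modelled as a supplier that returns null on success or a problem description, and the installation step itself is assumed to have happened before the corresponding test client is called.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical driver for step-by-step installation checks. */
final class PlatformInstaller {
    // Component name -> test client; insertion order is the installation order.
    private final Map<String, Supplier<String>> testClients = new LinkedHashMap<>();

    PlatformInstaller add(String component, Supplier<String> testClient) {
        testClients.put(component, testClient);
        return this;
    }

    /** Runs the test clients in order and stops at the first failing component. */
    List<String> run() {
        List<String> log = new ArrayList<>();
        for (Map.Entry<String, Supplier<String>> e : testClients.entrySet()) {
            String problem = e.getValue().get();
            if (problem != null) {
                // Informative message, then stop: later components depend on this one.
                log.add(e.getKey() + ": FAILED - " + problem);
                break;
            }
            log.add(e.getKey() + ": OK");
        }
        return log;
    }
}
```

Stopping at the first failure keeps the report short and points the integrator directly at the component whose owner has to be contacted.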

4 http://www.soapui.org/


6.5 Software Factory Support Environment

The overall delivery process is based on the support systems described below, with special regard to the organisation of internal folders, the naming conventions, and the reports that are part of a correct software production workflow.

The CUbRIK Software Factory support environment is a platform that supports the partnership in the distributed process of developing, updating, deploying, testing and packaging the CUbRIK Platform.

The supporting environment is composed of:

• a Subversion System;

• a Bug, Release and Feedback tracking system.

Both systems are tailored to the specific configuration needs of the CUbRIK Platform realization.

6.5.1 CUbRIK Subversion System

Apache™ Subversion® (SVN, footnote 5) is an open source version control system that keeps track of all work and all changes in a set of files and provides concurrent access to the revisions of the project resources. It is composed of two components:

1. Repository: running on a central server, it maintains the project versions and its history;

2. Client: used by developers to connect to the server in order to perform:

• Checkout: to acquire a copy of a component maintained in the SVN repository;

• Commit: to save the local copy into the SVN repository;

• Update: to update the local copy with the latest version of the SVN repository.

An example of a Windows client is TortoiseSVN (footnote 6), free software for revision control (also called version control or source control).

6.5.2 CUbRIK SVN Structure

Folders structure

The CUbRIK SVN structure reflects the list of components following the CUbRIK platform architecture organisation.

WORK, the root folder, contains all the component folders that developers use for checking files out and in. Each CUbRIK component corresponds to a subfolder named “component name_partner name”, e.g. <FOO_ENG>. This is the first nesting level.

Under the component folder there is a further subfolder called dist, which contains the component distribution delivered by the owner and ready for installation. When a component is delivered, the responsible partner creates a subfolder under dist named after the version of the component. For example, when component A is delivered in version x.y by partner PHz, the folder A_PHz/dist/vx.y/ is created and the related files are committed to it. The following figure illustrates the structure:

5 http://subversion.apache.org/

6 http://tortoisesvn.net/


Figure 6-8 SVN Structure

For CUbRIK platform packaging, refer to section 6.4, Delivery Process Guidelines.

Similarly for PIPELINE: a specific folder is created for each pipeline, named “p_pipelinename_partner name”, e.g. <p_ContentAnalytics3DObject_ENG>.
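The folder naming rules above can be captured in a few helper functions, which is useful when delivery scripts have to compute SVN paths. The Java sketch below is illustrative only: the class `SvnNames` and its method names are hypothetical, and the returned paths are relative to the respective root folder.

```java
/** Hypothetical helpers that build folder names following the CUbRIK SVN conventions. */
final class SvnNames {
    private SvnNames() {
    }

    /** Component folder "<component>_<partner>", for example FOO_ENG. */
    static String componentFolder(String component, String partner) {
        return component + "_" + partner;
    }

    /** Distribution folder of one delivered version, for example A_PHz/dist/vx.y/. */
    static String distFolder(String component, String partner, String version) {
        return componentFolder(component, partner) + "/dist/v" + version + "/";
    }

    /** Pipeline folder "p_<pipeline>_<partner>", for example p_ContentAnalytics3DObject_ENG. */
    static String pipelineFolder(String pipeline, String partner) {
        return "p_" + pipeline + "_" + partner;
    }
}
```

For instance, `SvnNames.distFolder("A", "PHz", "x.y")` yields the path used in the example above, `A_PHz/dist/vx.y/`.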

Folder Access Policy

The folder access policy gives every partner read/write access to every SVN folder.

Each partner will be provided with an account; further accounts can be added when necessary.

The definition of account name is as follows: <partner>_xxx (Example: ENG_0001).

6.5.3 Bugs, releases and feedbacks tracking system

Bugzilla is an open-source tool developed to support bug handling, reporting and tracking during software development. In CUbRIK development, Bugzilla supports:

1. Software defects handling;

2. Integration of software components in releases;

3. Collection of external feedback.

6.5.4 Bugzilla Reports

The following 3 types of reports are provided by CUbRIK Bugzilla:

a. Software Defects Tracking (Type A);

b. Release tracking (Type B);


c. User feedbacks (Type C).

Software Defects Tracking (Type A)

Functional overview:

• each bug report is associated to one platform component or pipeline;

• any creation and change of a bug report is notified via e-mail;

• the e-mail is sent to the component/pipeline owner, with a group of people in CC;

• bugs have a status (new, assigned, resolved, …);

• the bug status can be seen by all users, but can be changed only by the component owner and by the members of the associated group.

Some other functionalities allow the user to:

• search for bug reports;

• create reports and charts;

• change user preferences, including e-mail notifications;

• tag bugs with specific keywords to make tracking easier.

Release Tracking (Type B)

The process of releasing and testing a new version of a CUbRIK component consists of three main phases:

1. creation of a report regarding the new release;

2. testing of all the components which depend on the released one;

3. system integration test.

Any CUbRIK component owner can create reports about a new release of the component itself.

After a new report has been created, an e-mail notification is sent to the person responsible for the final integration and to all members of the CC list. The CC list consists of the partners/owners of the components/pipelines which depend on the released one.

The component/pipeline owner, the owners of the depending components and the person responsible for final integration have permission to edit and comment on the report.

The reporter can add people other than those defined above to the CC list; they receive read rights for the report. Figure 6-9 shows the permissions schema.


Figure 6-9 CUbRIK Bugzilla permissions schema
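The permission rules for release reports can also be written down as a small role-to-rights table. The Java sketch below is only a model of the rules stated in the text, not Bugzilla configuration or a Bugzilla API: the `Role` and `Right` enums and the class `ReleaseReportPermissions` are hypothetical, and it assumes that anyone who may edit a report may also read it.

```java
import java.util.EnumSet;
import java.util.Set;

/** Hypothetical roles taking part in a release (Type B) report. */
enum Role { COMPONENT_OWNER, DEPENDING_OWNER, FINAL_INTEGRATOR, ADDITIONAL_CC }

/** Hypothetical rights on a release report. */
enum Right { READ, COMMENT, EDIT }

final class ReleaseReportPermissions {
    private ReleaseReportPermissions() {
    }

    /** Maps each role to the rights described for release reports. */
    static Set<Right> rightsOf(Role role) {
        switch (role) {
            case COMPONENT_OWNER:
            case DEPENDING_OWNER:
            case FINAL_INTEGRATOR:
                // Owners and the integrator may edit and comment (read is assumed).
                return EnumSet.of(Right.READ, Right.COMMENT, Right.EDIT);
            case ADDITIONAL_CC:
                // People added to the CC list by the reporter only get read rights.
                return EnumSet.of(Right.READ);
            default:
                throw new IllegalArgumentException("unknown role: " + role);
        }
    }
}
```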

User Feedback Collection (Type C)

Access to CUbRIK Bugzilla is also permitted to external users such as content providers or consumers. These users can view and edit only a particular type of report (Type C).

Reports could regard issues such as:

• Accuracy in results: aspects related to precision and quality of results, including relevance and ranking;

• Data and User Security: issues regarding security in access and search, including spam, access rights, copyrighted contents;

• Installation, configuration, user licenses and access: problems regarding the set-up of a CUbRIK system;

• Performance Problems: issues related to perceived performance.

Recipients of these reports can, if needed, create software defect (Type A) reports and link them to the original one.

6.5.5 CUbRIK Bugzilla structure

The CUbRIK Bugzilla is configured to support different activities, such as component development and platform installation. It is organised into the following products:

1. CUbRIK Features: collects and manages features for CUbRIK platform versions. This product is useful for tracking and reporting the results of the End to End Functional testing described in section 6.3.4;

2. CUbRIK Components/Pipelines: space for entering bugs related to any component or pipeline of the CUbRIK platform;

3. CUbRIK Data model: this product collects all aspects related to CUbRIK data models;

4. CUbRIK Use Cases: this product collects all aspects related to CUbRIK use cases;

5. CUbRIK Platform: this product collects all aspects related to the CUbRIK Platform.


References

1. Beck, K., Beedle, M., van Bennekum, A., Cockburn, A., Cunningham, W., Fowler, M., Grenning, J., Highsmith, J., Hunt, A., Jeffries, R., Kern, J., Marick, B., Martin, R.C., Meller, S., Schwaber, K., Sutherland, J., Thomas, D.: Manifesto for Agile Software Development, http://agilemanifesto.org/.

2. Gudgin, M., Mendelsohn, N., Nottingham, M., Ruellan, H.: SOAP Message Transmission Optimization Mechanism, http://www.w3.org/TR/soap12-mtom/.

3. Barton, J.J., Thatte, S., Nielsen, H.F.: SOAP Messages with Attachments, http://www.w3.org/TR/SOAP-attachments.

4. Eclipse Foundation: SMILA - Unified Information Access Architecture Homepage, http://www.eclipse.org/smila/.

5. Eclipse Foundation: SMILA - How to set up dev environment, http://wiki.eclipse.org/SMILA/Documentation/HowTo/Howto_set_up_dev_environment.

6. Eclipse Foundation: SMILA - Project Concepts - BPEL Pipelining Concept, http://wiki.eclipse.org/SMILA/Project_Concepts/BPEL_Pipelining_Concept.

7. Eclipse Foundation: SMILA - How to write a Worker, http://wiki.eclipse.org/SMILA/Documentation/HowTo/How_to_write_a_Worker.

8. Eclipse Foundation: SMILA - PipeletProcessorWorker, http://wiki.eclipse.org/SMILA/Documentation/Worker/PipeletProcessorWorker.

9. Eclipse Foundation: SMILA - Documentation, http://wiki.eclipse.org/SMILA/Documentation#Importing.

10. Eclipse Foundation: SMILA - BPEL Designer, http://wiki.eclipse.org/SMILA/BPEL_Designer.

11. Eclipse Foundation: SMILA - Worker and Workflows, http://wiki.eclipse.org/SMILA/Documentation/WorkerAndWorkflows.

12. Eclipse Foundation: SMILA - JobRuns, http://wiki.eclipse.org/SMILA/Documentation/JobRuns.

13. Eclipse Foundation: SMILA - PipelineProcessorWorker, http://wiki.eclipse.org/SMILA/Documentation/Worker/PipelineProcessorWorker.

14. Eclipse Foundation: SMILA - Search, http://wiki.eclipse.org/SMILA/Documentation/Search#Search_Service_API.

15. Eclipse Foundation: Eclipse Public License - Version 1.0, http://www.eclipse.org/legal/epl-v10.html.

16. Eclipse Foundation: Eclipse Legal Process.

17. smila-bazaar - Share and find components for SMILA, http://code.google.com/a/eclipselabs.org/p/smila-bazaar/.

18. Eclipse Foundation: SMILA - How to filter and access record data in BPEL, http://wiki.eclipse.org/SMILA/Documentation/HowTo/How_to_filter_and_access_record_data_in_BPEL.


Annex A: Inst_Conf.txt FOOComponent Example

Note: The information provided below should serve as an example only. FOOComponent’s actual installation and configuration could slightly differ from that presented in the following guideline.

/***********************************************************************/

* Copyright (C) 2007 XXXX

*

*Copyright (c) 2011, 2012 ACME. All rights reserved.

* These materials are made available under the terms of the XXXX

* License v.YY which accompanies this distribution.

* The XXXX License is available at : *http://www. *

* Disclaimer *

* This document contains confidential information in the form of a description of the CUbRIK project findings, work and

* products and its use is strictly regulated by the CUbRIK Consortium

* Agreement and by contract no. FP7-287704. Neither

* the CUbRIK Consortium nor any of its officers, employees or agents

* shall be responsible or liable in negligence or

* otherwise howsoever in respect of any inaccuracy or omission herein.

* Without derogating from the generality of the

* foregoing neither the CUbRIK Consortium nor any of its officers,

* employees or agents shall be liable for any direct

* or indirect or consequential loss or damage, personal injury or death,

* caused by or arising from any information,

* advice or inaccuracy or omission herein. This document has been

* produced with the assistance of the European Union.

* The contents of this document are the sole responsibility of CUbRIK

* consortium and can in no way be taken to reflect

* the views of the European Union.

*

* CUbRIK is a project partially funded by the European Union.

/***********************************************************************/

Installation and Configuration guidelines location

--------------------------------------------------

FOOComponent is composed of two further components, FOO_A and FOO_B. This file is the main installation and configuration guideline. Further installation and configuration guidelines are provided for each sub-component; they are named Inst_Conf_FOO_A.txt and Inst_Conf_FOO_B.txt and are located under <main>/FOO_A and <main>/FOO_B respectively, where <main> is the directory where the FOO zip file was unzipped. FOO_B is the actual component deployed as a web service; it relies on FOO_A to perform the foreseen tasks.

The two components have to be installed in the following order:

1. install FOO_A

2. test FOO_A

3. install FOO_B

4. test FOO_B.

Component Description

--------------------------------------------------

The FOO component is specific to the realization of process XX. It implements the following algorithms: H3C, HTC, VT21. …..etc……

Contact Information

--------------------------------------------------

The FOO component is developed by XXXX. To report bugs, please refer to https://85.18.109.178/cubrikbugs/

For specific comments and requests for improvement, please refer to http://wiki.cubrikproject.eu

Code Example (for Twitter component)

--------------------------------------------------

Example of the twitter data source (tweets.xml):

<?xml version="1.0" encoding="UTF-8"?>

<DataSourceConnectionConfig

xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"

xsi:noNamespaceSchemaLocation="../org.eclipse.smila.connectivity.framework.agent.twitter/schemas/TwitterAgentDataSourceConnectionConfigSchema.xsd"

>

<DataSourceID>tweets</DataSourceID>

<SchemaID>org.eclipse.smila.connectivity.framework.agent.twitter</SchemaID>

<DataConnectionID>

<Agent>TwitterAgent</Agent>

</DataConnectionID>

<DeltaIndexing>disabled</DeltaIndexing>

<Attributes>


<Attribute Type="String" Name="jsonString" KeyAttribute="true" HashAttribute="true">

<jsonString>jsonString</jsonString>

</Attribute>

</Attributes>

<Process>

<idsDatabase>

<ip>127.0.0.1</ip>

<port>27017</port>

<database>tweetRanking</database>

<collection>userIds</collection>

</idsDatabase>

<idsMaxNumber>5000</idsMaxNumber>

<skip>0</skip>

<twitterAcc>

<user>YourTwUserName</user>

<password>yourPW</password>

</twitterAcc>

</Process>

</DataSourceConnectionConfig>

Dependencies on other CUbRIK Components

--------------------------------------------------

Components required for FOO are :

1. Component A and Pipelet A

2. Component B and Pipelet B

3. _____

Third party libraries

---------------------------------------------

FOO_A depends on:

1 PostgreSQL JDBC Driver,

JDBC3 Version 8.3-604 (http://jdbc.postgresql.org/).

The library is version 8.3 Build 604, JDBC 3, downloadable at [1]. Installation of the library is described in the “About” section of the same page [1]. In this case no library configuration is necessary.

[1] http://jdbc.postgresql.org/download.html.

Configuration is described in Install FOO section.


2 wsdl4j (version 1.6.2), it can be downloaded from

http://repo1.maven.org/maven2/wsdl4j/wsdl4j/1.6.2/wsdl4j-1.6.2.jar

Configuration is described in Install FOO section.

3 etc ....

Some libraries were updated from a previous version; the table below gives an overview:

------------------------------------------------------------------------------------------------
| Library                | Version       | Note/Rationale                 | License | Download |
------------------------------------------------------------------------------------------------
| PostgreSQL JDBC Driver | 8.3-604       | In use.                        |         |          |
------------------------------------------------------------------------------------------------
| PostgreSQL JDBC Driver | 8.2 Build 509 | Replaced with version 8.3-604. |         |          |
|                        |               | The new version fixes the      |         |          |
|                        |               | issues related to bugs #103,   |         |          |
|                        |               | #106 and #201.                 |         |          |
------------------------------------------------------------------------------------------------
| wsdl4j                 | 1.6.2         | In use.                        |         |          |
------------------------------------------------------------------------------------------------
| .....                  | .....         | .....                          |         |          |
------------------------------------------------------------------------------------------------

Interface

---------------------------------------------

- Input

user IDs to follow (String)

twitter credentials (String)

- Output

twitter update as org.eclipse.smila.datamodel.Record

Installation Prerequisite

--------------------------------------------------

The FOO component requires a specific environment and is tested in a pipeline developed on top of the SMILA infrastructure. Some notes follow:

0. Source code is available at www.Foo.com

1. Java SE Development Kit (JDK) 5.0 is required. The latest version of Java 5.0 can be downloaded from

http://java.sun.com/javase/downloads/index_jdk5.jsp.


* JAVA_HOME environment variable needs to be set and its value should be the location where JDK is installed.

2. Apache Tomcat is required. The tested and recommended Tomcat version is 6.0.16, which can be downloaded from

http://archive.apache.org/dist/tomcat/tomcat-6/v6.0.16/bin/apache-tomcat-6.0.16.zip.

* Note that on Windows, if more than one Tomcat is installed and the CATALINA_HOME system environment variable is set to one of them, that one will always be used. Change the value of CATALINA_HOME in order to use another Tomcat installation.

3. Axis2 must be installed in Tomcat. The tested and recommended Axis2 version is 1.3.

* Download the Axis2 1.3 war file from http://www.powertech.no/apache/dist/ws/axis2/1_3/axis2-1.3-war.zip.

* Unzip the downloaded zip file, copy the axis2.war file it contains into the webapps/ folder of Apache Tomcat, and then start Tomcat. Tomcat will automatically install Axis2.

* After the installation, open http://localhost:8080/axis2/ in a web browser to check that the Axis2 welcome page is shown.

4. A license file for FOO:

* Send a license request to Mr………. ( email@.....) to get a valid license file in order to use FOO component.

5. SMILA V.1.0:

* Download SMILA from www.eclipse.org/smila/

* Install SMILA according to the guideline available at http://www.eclipse.org/smila/documentation.php

Unpack the package CUbRIK_ACME_FOO-1.0.1.zip

---------------------------------------------

1. Unzip the package into a location, for example c:\ld on Windows or /ld on Linux. This location is called <foo-home> in the following text.

* Note that <foo-home> should NOT have a long path; otherwise the extracted files might exceed the file name length restriction on Windows and Linux, and errors would occur during unzipping.

2. When the unzipping is done, a CUbRIK_ACME_FOO-1.0.1 folder should be under <foo-home>.

3. Read the README file under the <foo-home>/CUbRIK_ACME_FOO-1.0.1/ folder, which gives an overview of the package.

4. Note that some of the links shown in the following sections might block wget access from Linux. Please use a web browser to download from them.

Install FOO Engine (in case the component is released as Engine (back office) + web service)

---------------------------------------------

FOO Engine is already in the <foo-home>/CUbRIK_ACME_FOO-1.0.1/fooengine/ folder, but it requires some third party library dependencies

in order to function properly, as well as a valid license file.

1. Go to <foo-home>/CUbRIK_ACME_FOO-1.0.1/fooengine/ folder:

* ONLY needed for windows users: Follow the instructions in the README.txt to download necessary software.

* Here is the direct link to Unzip: ftp://ftp.tex.ac.uk/tex-archive/tools/zip/info-zip/WIN32/unz552xN.exe. Don't forget to put it under the httpgetz folder.

* Here is the direct link to Gzip: http://www.gzip.org/gzip124xN.zip. Don't forget to put it under the httpgetz folder.

* Here is the direct link to Tar: http://gnuwin32.sourceforge.net/downlinks/tar-bin.php. Double-click to install it first, and then add the folder containing the binary (the default is C:\Program Files\GnuWin32\bin) to the system environment variable PATH.

2. Go to <foo-home>/CUbRIK_ACME_FOO-1.0.1/fooengine/httpgetz/ folder, and execute "httpgetz.exe" for windows or "./httpgetz order_sheet_linux.txt" for linux to download third party libraries.

* Note that the user needs to type yes to continue the downloading process. The whole process takes around 50 mins to complete.

3. Put the received license file (i.e. CUbRIK_FOO.lic) under <foo-home>/CUbRIK_ACME_FOO-1.0.1/fooengine/mars/mars/licenses folder.

* Note that without the license file, it is NOT possible to use FOO engine.

4. Download some third party libraries that are required by FOO engine and copy them to proper location:

* Download wsdl4j (version 1.6.2) from http://repo1.maven.org/maven2/wsdl4j/wsdl4j/1.6.2/wsdl4j-1.6.2.jar, and then copy this jar into <foo-home>/CUbRIK_ACME_FOO-1.0.1/fooengine/mars/mars/lib/utilities/mmaccess/lib/ folder.

Test FOO Engine

---------------------------------------------


1. Use the following commands to start FOO engine:

* on windows:

cd <foo-home>/CUbRIK_ACME_FOO-1.0.1/fooengine/mars/mars/bin/

mars.bat

* on linux:

cd <foo-home>/CUbRIK_ACME_FOO-1.0.1/fooengine/mars/mars/bin/

sh mars.sh

2. Once the juno:fooNode/FOONode> prompt is shown, type the following command:

int FOOS

3. Once the FOO> prompt is shown, type the following commands: (after the load Logger command, press Enter before typing the exec Logger command)

load logger

exec logger

4. If the screen output is "Execution completed successfully.", FOO engine is installed successfully.

5. Type the following commands to stop FOO engine: (after the first quit command, press Enter before typing the second quit command)

quit

quit

Install FOO Web Service

---------------------------------------------

1. It is highly recommended to install FOO Web Service and FOO Engine on the same machine to improve performance.

2. Assume Apache Tomcat is already installed in a location, for example, c:\ld\apache-tomcat-6.0.16 for windows or /ld/apache-tomcat-6.0.16 for linux. This location is called <tomcat-home> in the following texts.

3. Stop Tomcat if it is running.

4. Copy the files in <foo-home>/CUbRIK_ACME_FOO-1.0.1/webservice/classes/ folder to <tomcat-home>/webapps/axis2/WEB-INF/classes/ folder. These are configuration files for FOO web service.

* Note that the original file "log4j.properties" will be overwritten by the new one. If there are other applications that also made changes to that file, please manually merge the changes together (requiring some knowledge of log4j).


5. Copy the files in <foo-home>/CUbRIK_ACME_FOO-1.0.1/webservice/lib/ folder to <tomcat-home>/webapps/axis2/WEB-INF/lib/ folder. These are library dependencies for FOO web service.

6. Copy the file in <foo-home>/CUbRIK_ACME_FOO-1.0.1/webservice/services/ folder to <tomcat-home>/webapps/axis2/WEB-INF/services/ folder. This is the .aar file for FOO web service.

7. Download some third party libraries that are required by FOO web service and copy them to proper location:

* Download osgi core from http://repo1.maven.org/maven2/org/eclipse/osgi/3.2.1-R32x_v20060919/osgi-3.2.1-R32x_v20060919.jar, and then copy this jar into <tomcat-home>/webapps/axis2/WEB-INF/lib/ folder.

* Download osgi services from http://repo1.maven.org/maven2/org/eclipse/osgi/services/3.1.100-v20060601/services-3.1.100-v20060601.jar, and then copy this jar into <tomcat-home>/webapps/axis2/WEB-INF/lib/ folder.

8. Go to <tomcat-home>/webapps/axis2/WEB-INF/classes/ folder and change the configuration files:

* Open the log4j.properties file and change each occurrence of "/ld/apache-tomcat-6.0.16/" to the current <tomcat-home>. Note that there should be two changes, since two log files are used (one for the administrator and the other for normal users).

* Open the project.properties file and change each occurrence of "/ld/apache-tomcat-6.0.16/" to the current <tomcat-home>.

* Note that if the Windows-style file path separator '\' is used in a configuration file, it must be written as '\\', because the Java Properties class treats a single backslash as an escape character.

9. The default endpoint for FOO web service is 'http://localhost:8080/axis2/services/FOOWebService'. If it needs to be modified: unzip the FOOWebService.aar file under the <foo-home>/CUbRIK_ACME_FOO-1.0.1/webservice/services/ folder; open the FOOWebService.wsdl file under the META-INF/ folder; change every occurrence of 'http://localhost:8080/axis2/services/FOOWebService' to the desired endpoint and save the changes; re-zip the unzipped files together with the modified wsdl file into a new FOOWebService.aar file; and replace the old FOOWebService.aar file with the new one. After the changes, repeat Step 6 of this section.
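The backslash rule in step 8 follows from the way java.util.Properties parses escape sequences. The short Java sketch below is an illustration rather than part of the FOO package: the class `BackslashDemo` and the key `log.dir` are hypothetical. It loads one line of properties text and returns the value as the application would see it.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

/** Hypothetical demo of how java.util.Properties handles backslashes in values. */
final class BackslashDemo {
    private BackslashDemo() {
    }

    /** Loads a single line of properties text and returns the value stored under key. */
    static String load(String line, String key) {
        Properties p = new Properties();
        try {
            p.load(new StringReader(line));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return p.getProperty(key);
    }

    public static void main(String[] args) {
        // File text  log.dir=c:\\ld\\tomcat  keeps its separators when loaded.
        System.out.println(load("log.dir=c:\\\\ld\\\\tomcat", "log.dir"));
        // File text  log.dir=c:\ld\tomcat  loses them: \l becomes l and \t becomes a tab.
        System.out.println(load("log.dir=c:\\ld\\tomcat", "log.dir"));
    }
}
```

Using forward slashes in the configuration files avoids the issue altogether where the consuming code accepts them.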

Test FOO Web Service

---------------------------------------------

1. In the following text, it is assumed that the FOO web service client is on the same machine as the FOO web service, and that the FOO web service is running on port 8080. If not, please check the README.txt file under the <foo-home>/CUbRIK_ACME_FOO-1.0.1/client/ folder for how to modify the configuration accordingly.

2. Start Tomcat, and use the web browser to check http://localhost:8080/axis2/services/listServices. FOOWebService should be in the Available services section on that page.


3. Start FOO engine as shown in the "Test FOO Engine" section above.

4. Download some third party libraries that are required by FOO web service client and copy them to proper location:

* Download wsdl4j from http://repo1.maven.org/maven2/wsdl4j/wsdl4j/1.6.1/wsdl4j-1.6.1.jar, and then copy this jar into the <foo-home>/CUbRIK_ACME_FOO-1.0.1/client/lib/ folder.

5. Use the following commands to start the FOO web service client:

* on windows:

cd <foo-home>/CUbRIK_ACME_FOO-1.0.1/client

run.bat

* on linux:

cd <foo-home>/CUbRIK_ACME_FOO-1.0.1/client

sh run.sh

6. Once the FOOclient> prompt is shown, type the following command:

run

7. Once the screen output shows that processing has completed, check that a file "logger.out" has been generated in the <foo-home>/CUbRIK_ACME_FOO-1.0.1/client/testdata/ folder with the same content as the file "logger.txt" in that folder. If so, FOO web service is installed successfully.

8. For more details on using the FOO web service client, please check the README.txt file under the <foo-home>/CUbRIK_ACME_FOO-1.0.1/client/ folder.