
Published in IET Software
Received on 29th October 2008
doi: 10.1049/iet-sen.2009.0040
ISSN 1751-8806

Empirical support for the generation of domain-oriented quality models

M.T. Villalba 1, L. Fernández-Sanz 2, J.J. Martínez 2
1 Universidad Europea de Madrid, C/Tajo s/n, 28670 Madrid, Spain
2 Universidad de Alcalá de Henares, Ctra. Madrid-Barcelona Km 33,600, 28771 Madrid, Spain
E-mail: [email protected]

Abstract: The difficulties of software quality evaluation found during our activity in different projects and publications led us to investigate a systematic method for building domain-oriented quality models on a sound empirical basis. General quality models need to be adapted to specific types of software products to be effective. The related literature reveals that existing quality models tend to suffer from poor empirical support. To overcome this situation, a review of applicable existing standards and related literature has enabled the generation of a basis for software quality models which can be adapted to specific application domains. The process, called DUMOD (Domain-oriented qUality MOdels Development), includes the collection of experts' opinions and the application of multivariate analysis techniques in order to eliminate redundant information and construct a validated model that is more efficient and reliable. In this study, the empirical method to devise quality models for specific application domains is presented, as well as its application to a case study for security software products, empirically validated by an extensive collection of data from IT professionals.

1 Introduction

A software quality model provides a hierarchy of software quality features devised to support the evaluation of software systems and applications. Different standards and proposals have arisen to offer guidance or solutions to the problem of software quality evaluation (see [1]). Perhaps the best known is the ISO/IEC 9126-1 standard [2], which provides an internationally accepted (as a document supported by ISO) model for the quality evaluation of software. ISO/IEC 9126, by its nature as a standard, is generic and not directly applicable in daily practice, so an effort is required to fit it to each specific application domain and evaluation approach. In fact, this is one of the main shortcomings of many proposals of general software quality models: they are difficult to adapt to specific evaluation domains [3, 4], for example, COTS (commercial off-the-shelf) software such as enterprise resource planning (ERP) systems, security products and so on, so important aspects of each field of action tend to be missed. For example, ISO 9126 does not include the specific attributes required for a sound evaluation of security, it does not facilitate the evaluation of the efficiency or usability of software products, and it is difficult to highlight which functionalities are essential for the type of software to be selected. In addition, we need to decide the importance of each characteristic in order to assign its weight within the aggregated evaluation. The influence of, or relationships among, the characteristics is needed too, in order to avoid excessive influence of some characteristics over others. Therefore the definition of quality models is a complex task which requires considerable time, application domain knowledge and experience with the evaluated products. Owing to this, quality models oriented to specific application domain products have already been defined in the literature. This type of model can provide a more accurate evaluation, as its properties can be defined more precisely. Of course, having predefined quality models saves time because they do not have to be devised from scratch each time. Domain quality models can still be refined depending on the target organisation type (e.g. small and large organisations) where the selection or development process will be carried out, or depending on the specific product. For example, a small organisation may just be interested in the existence of a fault tolerance mechanism, while a larger one will be interested in a certain kind of mechanism.

2 Related works

A systematic review of software quality models for specific domains was carried out with the objective of determining whether a valid systematic methodology for generating domain-oriented quality models had already been proposed. As a result, the most relevant proposals were identified, analysed and compared following the guidelines proposed by Kitchenham [5], who first adapted this research technique to the field of software engineering. The number of primary studies analysed was 374; after applying the inclusion criteria, 61 remained in the list; finally, once the exclusion criteria were applied and redundant results were eliminated and/or grouped, the final number of distinct primary studies was 24. The main conclusions drawn from our analysis are the following:

† The first step towards a valid model is the use of standards and other related literature as a basis. So one of the first phases of the method should be the review and analysis of related literature, as done in [6–21].

† Factor reduction enables practical and efficient quality models. An interesting approach to obtaining a reusable model is to collect experts' opinions from different viewpoints to achieve this reduction, covering a broad range of perceptions that makes the model stronger and more applicable.

† A method for determining the weights or importance of the criteria and the relations among them is another recommended element of the process. Findings of the review suggest the importance of taking experts' opinion into account, either by consensus (using a Delphi process) [7] or by collecting data with questionnaires and using the average values [22–24]. Different quality perspectives (for example, managers, developers and users) that embrace a wider concept of quality are also important to consider [6, 25, 26]. This variety of opinions gives more trustworthy results, as it reflects more precisely the common view of quality of the universe of stakeholders.

† Finally, [15] presents an interesting step-by-step process to develop any domain-oriented quality model from scratch. However, this methodology does not provide any method to reduce the number of factors, nor to determine weights for each criterion or the relations among criteria. Moreover, experts' opinion is not taken into account, nor are the quality perspectives considered by the other authors mentioned above.

Obviously, existing domain-oriented software quality models represent a high-value contribution to the knowledge of specific types of applications. Given the fact that in the software engineering area it is hard to obtain industrial validation of methods, proposals based on the collection of experts' opinions are very interesting approaches. However, the proposed statistical analysis of the collected data presents weaknesses. According to our experience, it is very difficult to discriminate between the most important factors and their weights using average or median values. Moreover, these statistics do not help to reduce the number of factors or to find out the weights of and relations among the criteria.

Therefore we infer from the results that, although several studies offer interesting ideas and methods which can be reused, deeper work is necessary in the definition of a methodology that overcomes the detected problems. The aim of this methodology is not only to facilitate the definition of relevant criteria from standards, related studies and the researchers' experience or knowledge, but also to validate such criteria. It is important to check that the criteria are adequate and to reduce their number to obtain only the essential ones. Moreover, these data can provide information about the weight of each criterion and the relations among the criteria. So, the participation of other experts in the process, in order to seek consensus and embrace a broad range of the different views and knowledge disciplines involved in the problem, can help to obtain these models.

Other models not based on ISO/IEC 9126 were analysed as a consequence of this review, in order to find a method to analyse the collected data. In our search, we found another interesting approach: collecting data from questionnaires and then applying a multivariate statistical analysis to reduce the number of attributes (variables) to be analysed. This is the idea behind the development of models for evaluating satisfaction with services in sociological and marketing studies, for example, the widely used SERVQUAL model for measuring service quality [27, 28]. These approaches are based either on feature catalogues collected and grouped by the researchers or on the adaptation of other validated models. Then, factorial analysis (using the principal components solution) is applied in order to reduce the number of variables, keeping the most important ones. We have adapted this approach to the development of quality models in order to overcome the drawbacks of the previous methods mentioned above.

Using the best of all these approaches and extending them to overcome their weaknesses, DUMOD (Domain-oriented qUality MOdels Development), a method for building validated quality models for specific application domains, is presented in this paper. The paper is organised as follows. Section 3 introduces the DUMOD process, Sections 4–6 explain the process in depth and describe a case study as an example of its application and, lastly, Section 7 presents the conclusions of this work.


3 DUMOD process

The DUMOD process is aimed at defining a quality model for a specific domain taking two approaches into consideration: industry practices and academic research. As a consequence, the initial set of criteria is defined using standards and literature; then the criteria are validated (and refined) by application domain experts, IT managers and software engineers. Fig. 1 shows the phases of the DUMOD process.

4 Initial analysis

As we mentioned above, due to the generality of existing standards it is necessary to fit the characteristics and subcharacteristics stated in ISO/IEC 9126-1 [2] to each specific domain. To that end, the definition of the application domain provides the framework on which the rest of the process will be applied. Once the application domain is defined, we are ready to adapt the ISO/IEC 9126 standard to that domain.

Figure 1 DUMOD process

In the rest of this work, the application domain of network security COTS products is used as an example of the application of DUMOD. The network security COTS domain includes any product whose main function is to protect networks against malicious intrusions [29]. In this case, the ISO/IEC 9126 model has been used with the following modifications in order to adapt it to the domain of security COTS:

† The scalability subcharacteristic has been added to the efficiency characteristic. This factor commonly comes up in specifications of security products and it has previously been used in COTS quality models by other authors [30–32].

† Analysability and testability (maintainability subcharacteristics) have been removed. COTS products have particular properties: as a rule, no information on internal structure or code is available and customers are not involved in the development process. Therefore the analysability and testability subcharacteristics are not suitable for our perspective, since they require internal access to the product [31].

† For the same reason, and based on our experience in security product evaluation, the modifiability subcharacteristic has been replaced by updatability, which we have defined as 'attributes of software that bear on its ability to maintain the product without faults or security vulnerabilities'.

† Replaceability is an intrinsic property of COTS products and therefore has not been taken into account either.

† The security and accuracy subcharacteristics have not been included, owing to ISO/IEC 15408-2 [33], which defines the generic security attributes (and functionality properties in the case of security products) and provides protection profiles in order to fit the properties to the specific domain.

5 Definition of preliminary model

Once the quality model is adapted to the specific field of application, the attributes (atomic criteria) suitable for the specific domain are compiled in order to obtain the complete quality model. ISO/IEC 25051 [34] defines the set of requirements for COTS, so it is recommendable to analyse it in conjunction with ISO/IEC 9126-1. Moreover, ISO/IEC 15408 [35] deals with the evaluation criteria for IT security through the Common Criteria application: this means it describes the criteria to evaluate one of the quality evaluation subcharacteristics defined in ISO/IEC 9126-1. Finally, it is possible that other standards related to the specific application domain have to be reviewed. So, one should review standards like ISO/IEC 9126-2 [36], ISO 9241-110 [37] and ISO 9241-11 [38] or ISO/IEC 15408-2 [33], related published research articles, publicly available technical reports and lessons learned from in-house evaluations.

As a complementary view, experiences in COTS evaluation [1, 39–48] show that there is another class of requirements that should be considered: those related to non-technical aspects such as costs, licensing, business view, vendor assistance or reliability. Identified works in this area present non-technical criteria as generic catalogues for COTS products, but they contain a great number of items, so it is necessary to eliminate redundancies and reduce the number of non-technical attributes for efficiency. Therefore at least two attribute catalogues are obtained in this phase: a technical factors (TF) catalogue and a non-technical factors (NTF) catalogue.

On the other hand, the initial set of attributes is usually too large for an efficient application of the model. So, it is convenient to perform an internal review in order to identify and combine duplicated factors, controlling and resolving each conflict and rewriting the ambiguous ones. During this process, reviewers should have specific knowledge of at least the specific application domain, software quality fundamentals and IT management. Knowledge of software design and computer programming can be useful too when dealing with the development of an information system COTS component, as can evaluation process knowledge and experience if the objective is COTS evaluation or selection. At the end of this phase, a whole set of attributes to be validated by experts is obtained.

The following sources were used in the case study shown here as an example of the application of DUMOD:

† International standards used as references (ISO 9126-1 [2], ISO 9126-2 [36], ISO 9241-110 [37], ISO 9241-11 [38], ISO 25051 [34], IEEE 1061 [49], ISO 15408-2 [33]).

† Related literature [20, 32, 39, 40, 41, 43, 50–54].

† Technical reports and books [1, 44, 45].

† Usability tests: the Software Usability Measurement Inventory (SUMI) [47], ISOMetrics [48] and usability guidelines [46].

† Lessons learned from in-house evaluations [55–57].

The total number of attributes obtained in this phase, after an internal review, was 248 (as mentioned above, security attributes were not included). The review was carried out by a group of ten experts: five security specialists (two of them with experience in security product evaluation), three software engineering experts with software quality knowledge (one of them with experience in software evaluation) and two IT managers. Moreover, in order to identify unclear attributes, we used a pilot questionnaire with final-year students of an IT degree as well as Masters-level students (IT management). The final result was 116 attributes: 96 for the TF model and 20 for the NTF model.

6 Model validation and reduction

Once the attributes are obtained, the domain-oriented model is ready to be adapted to the specific product and project. To that end, these attributes have to be decomposed into atomic elements, the relations and weights of the factors have to be identified and, finally, metrics have to be defined in order to measure the defined factors. In our case, this model was adapted in two evaluation projects [55, 56]. The evaluation projects took too much time to be practical, due to the high number of factors obtained in both of them. The proposed solution consists of a reduction of the number of factors while the essential information is retained. Factor reduction can be achieved by analysing the correlation between factors. So, factorial analysis can be applied in order to find the most correlated factors. The experts' opinion may be useful not only for this reduction, but also for validating the practical utility of the factors obtained from standards and literature.

The main objective of this phase is the establishment and validation of the factors' relative importance and the reduction of their number for greater efficiency. An external review process is proposed, with three basic steps, based on the knowledge and experience of IT professionals who are experts in the specific application domain.

6.1 Design of factors questionnaires

Based on the set of attributes obtained in the previous phase, a questionnaire is designed in order to collect information from IT professionals. The questionnaire has the following structure:

† An initial section to explain the goal of the study and to collect anonymous demographic data such as sector of activity (banking, IT, communication technology etc.), technical profile, experience or professional qualification, and experience with products of the specific application domain. These questions will help us to accept or reject data, considering that high expertise of the participants in the study is required.

† Multiple-item questions on the TF and NTF obtained from phase one (see Section 5), using a five-point Likert-type scale ranging from 1 (strongly agree) to 5 (strongly disagree). The questionnaires also include a free space at the end to give respondents the opportunity to propose additional criteria.

In the case study reported here, refined questionnaires were implemented on an access-restricted web page for an anonymous online survey. Owing to the high number of technical attributes, they had to be separated into two questionnaires: usability attributes (51 variables) and the rest of the technical attributes (45 variables). Due to space limitations, we present in this paper the results of the study for the TF and NTF questionnaires.

6.2 Data collection

Since an important part of the process is based on the data collected from IT professionals, it is important to control their profiles. Demographic data should be analysed in order to exclude possible respondents who do not fulfil the expected profile.

In the case study presented here, we used restricted-access web questionnaires, and respondents were specifically invited after checking their profile during events or with the collaboration of professional associations. The survey process captured a total of 203 completed and valid questionnaires, so the margin of error was 6.9% at a confidence level of 95%. Most respondents were between 31 and 40 (36.46%) or 41 and 50 years old (34.25%). The Information and Communications Technology sector (38.12%), followed by Industry (23.20%) and Government and Education (9.94%), were the main productive sectors. The technical profile of respondents was quite balanced between IT managers (30.54%) and security engineers (25.82%), followed by software engineers (19.34%), project managers (9.94%), IT consultants (7.73%) and educators and research staff (6.63%). Most of them had more than five years of experience in their position (64.64%). Regarding their professional qualification related to security software products, they had high experience with the use of this kind of product: expert level for 64.64% (very high level), evaluator/consultancy level for 13.81% (high level), administrator level for 13.81% (medium level) and only 4.97% with user level (low level). Lastly, most of them had basic (49.72%) or advanced (32.04%) training on security products.
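As a point of reference (our own arithmetic, not a formula given in the paper), the reported margin of error follows from the standard worst-case estimate for a proportion under simple random sampling, with p = 0.5, z = 1.96 for 95% confidence and n = 203:

\[
e = z\sqrt{\frac{p(1-p)}{n}} = 1.96\sqrt{\frac{0.5 \times 0.5}{203}} \approx 0.069
\]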

Moreover, we included one more question in order to determine the perceived utility of such an evaluation model, ranging from 0 (it is not useful) to 1 (it is useful). The results show that 94.48% of the experts consider it useful to have a specific quality model for security COTS products with the attributes shown in the questionnaire.

6.3 Data analysis

As mentioned above, we use factorial analysis to reduce the number of factors in the quality model and to eliminate redundancy. The reliability of the model is tested through Cronbach's alpha coefficient before applying the factorial analysis. This measure helps to verify whether the factors are related among themselves, that is, whether all the factors measure the same concept. Factorial analysis is then applied to verify which concept each group of factors measures. This allows the structure of the scale to be determined [24].

The next sections explain the activities for data analysis in detail, as well as their application to the case study shown here, in order to clarify the DUMOD process.

6.3.1 Preliminary analysis: Exploratory analysis enables two important actions: the exclusion of possible errors made during data collection, as well as checking the feasibility of factorial analysis. The analysis requires the examination of descriptive statistics (mean, standard deviation, median, minimum and maximum, and absolute and relative frequencies) for all the variables in the study. Moreover, box plots help to detect data entry errors and the coefficient of variation can be used to check the homogeneity of the data. The correlation matrix offers information about the applicability of factorial analysis: correlations higher than 0.30, and significance levels and determinants close to zero, show that there are correlated variables [58]. The result of this analysis is the set of data to analyse.

In the case study, we used the SPSS 16.0.1 and LISREL 8.80 statistical tools to analyse the data collected for security COTS products. A first visual inspection of the correlation matrix showed us that a substantial number of correlations were higher than 0.30; consequently, we concluded that there were interrelated variables [58]. Moreover, as almost all significance levels were close to zero, we had to reject the null hypothesis and conclude that there was a linear relationship among the variables. The determinants are near zero too (4.97E-10 for the NTF and 6.81E-24 for the TF questionnaire), which is an indicator confirming that these variables are highly correlated among themselves and, therefore, that factorial analysis is applicable.
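The paper ran these checks in SPSS and LISREL; purely as an illustrative sketch of the same preliminary screening (the file name and column layout are our assumptions, with one Likert item per column and one respondent per row), the equivalent steps in Python could look like this:

```python
import numpy as np
import pandas as pd

# Hypothetical input: Likert responses (1-5), one item per column.
data = pd.read_csv("ntf_responses.csv")  # e.g. columns PROV12 ... TECN10

# Descriptive statistics used to screen for data-entry errors.
print(data.describe())           # mean, std, min, max, quartiles
print(data.std() / data.mean())  # coefficient of variation per item

# Applicability of factor analysis: many |r| > 0.30 plus a determinant
# close to zero indicate strongly correlated variables.
corr = data.corr()
off_diag = corr.abs().to_numpy()[~np.eye(len(corr), dtype=bool)]
print("share of correlations > 0.30:", (off_diag > 0.30).mean())
print("determinant:", np.linalg.det(corr.to_numpy()))
```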

6.3.2 Reliability analysis: The first validation carried out is the reliability analysis of the scales used. Reliability is the degree to which the observed variable measures the real value and is free of error. Factors (called items in statistical terminology) are grouped into their characteristics. Then, the list of factors for each characteristic (commonly called a dimension) is reduced by examining corrected item-to-total correlations and deleting those items whose removal improves the reliability (alpha) coefficient. The factor analysis is conducted on each component separately, because it is important to establish that the construct is valid in each group or, in other words, that all the factors included in the same subcharacteristic measure the same thing. The resulting scales should exceed the conventional minimum of 0.7 [59–61].

In the case of the application domain of network security COTS presented here, we have summarised the results obtained in Tables 1 and 2: Table 1 shows the Cronbach's alpha coefficient values of the resulting scales for the NTF model and Table 2 for the TF model. As can be observed in these tables, the alpha coefficient values range from 0.736 to 0.930, demonstrating high consistency and reliability of each dimension.
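A minimal sketch of this reliability step (the helper functions are ours, not SPSS output; `data` is the hypothetical frame used above):

```python
import pandas as pd

data = pd.read_csv("ntf_responses.csv")  # same hypothetical file as above

def cronbach_alpha(items: pd.DataFrame) -> float:
    """Cronbach's alpha for the items (columns) of one dimension."""
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1).sum()
    total_variance = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_variances / total_variance)

def corrected_item_total(items: pd.DataFrame) -> pd.Series:
    """Correlation of each item with the sum of the remaining items."""
    return pd.Series({c: items[c].corr(items.drop(columns=c).sum(axis=1))
                      for c in items.columns})

# Example: the vendor-related NTF dimension (alpha = 0.846 in Table 1).
vendor = data[["PROV12", "PROV13", "PROV14", "PROV15", "PROV17", "PROV18"]]
print(cronbach_alpha(vendor))
print(corrected_item_total(vendor))  # drop items whose removal raises alpha
```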

6.3.3 Exploratory factorial analysis: Exploratory factorial analysis (EFA) enables the underlying structure of the relations to be identified, providing a predictive validation of the model. In order to assess the acceptability of the factorial analysis results, the Kaiser–Meyer–Olkin (KMO) index (KMO > 0.5 should be accepted; KMO between 0.7 and 0.8 is considered a good value; KMO > 0.8 is meritorious [65]) and Bartlett's test of sphericity should be checked.


DUMOD uses the method of principal components as the extraction method, instead of common factor analysis, since one of the main aims is factor reduction. For the same reason, Varimax (with Kaiser normalisation) is used as the rotation method and the breaks-in-eigenvalues criterion [58] is used to decide on the initial number of factors to keep. Factor loadings (the weight of each factor on its subcharacteristic) equal to or greater than 0.5 are considered strong [58]. Factors with low loadings on all components are also eliminated (the cut-off value for loadings was 0.32 [62]).
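Again as an illustration only (the study used SPSS; the `factor_analyzer` package and the hypothetical data frame below are our substitutions), the KMO/Bartlett checks and a principal components extraction with varimax rotation could be scripted as:

```python
import pandas as pd
from factor_analyzer import FactorAnalyzer
from factor_analyzer.factor_analyzer import (
    calculate_bartlett_sphericity, calculate_kmo)

data = pd.read_csv("ntf_responses.csv")  # same hypothetical file as above

# Adequacy checks: KMO > 0.8 is meritorious; Bartlett's p should be near 0.
chi2, p = calculate_bartlett_sphericity(data)
_, kmo_total = calculate_kmo(data)
print(f"KMO = {kmo_total:.3f}, Bartlett chi2 = {chi2:.1f}, p = {p:.4f}")

# Principal components extraction with varimax rotation; n_factors chosen
# from the breaks in the eigenvalues (three components for the NTF scale).
fa = FactorAnalyzer(n_factors=3, method="principal", rotation="varimax")
fa.fit(data)
loadings = pd.DataFrame(fa.loadings_, index=data.columns)
print(loadings.where(loadings.abs() >= 0.32))  # hide loadings under cut-off
```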

In order to clarify the application of EFA, its application to the case of network security COTS is presented next. Table 1 shows the KMO and Bartlett's test results for the NTF model and Table 2 for the TF model in the case study reported here. The KMO was acceptable, with values greater than 0.79, and Bartlett's test indicates that there is a meaningful relationship among the variables. Table 1 also presents the extracted components, which have been labelled as follows:

1. Vendor characteristics. Criteria related to the vendors (experience, reputation etc.).

2. Product non-technical characteristics. Criteria related to organisational issues (type of license, support etc.).

3. Standards conformance. Complying with quality standards (product and vendor sides).

Note that EFA has extracted one factor more than the initial definition based on the related literature (Section 5), which we have called 'standards conformance'. This means that the factors 'security certification' and 'conformance with quality standards' are not related to the other factors.

Table 1 EFA results for NTF scale

Factor | Component (after varimax rotation) | Loading | Other loadings shown

Non-technical vendor-related factors (coefficient alpha = 0.846)
PROV12 vendor market share | vendor properties | 0.87 |
PROV13 vendor reputation | vendor properties | 0.79 |
PROV14 vendor solvency | vendor properties | 0.75 |
PROV15 vendor experience | vendor properties | 0.58 | 0.40
PROV17 vendor autonomy | vendor properties | 0.62 |
PROV18 vendor service quality certification | vendor properties | 0.63 |

Non-technical product-related factors (coefficient alpha = 0.823)
PROD21 compatibility with the existent architecture | product NTF properties | 0.76 |
PROD22 compatibility with corporate security policy | product NTF properties | 0.69 | 0.35
PROD23 product stability in the market | product NTF properties | 0.54 | 0.45
PROD25 license type | product NTF properties | 0.76 |
PROD27 support type | product NTF properties | 0.71 |
PROD28 training offer | product NTF properties | 0.69 |

TECN9 security certification | standards conformance | 0.83 |
TECN10 conformance with quality standards | standards conformance | 0.81 |

Kaiser–Meyer–Olkin measure of sampling adequacy: 0.842
Bartlett's test of sphericity (approx. chi-square): 363.014 (p < 0.001)
Correlation matrix determinant: 0.001

EFA = exploratory factor analysis; loadings < 0.32 not shown; cross-loadings are listed without column assignment; total variance extracted by the three factors = 62.319%


Table 2 EFA results for TF scale

Factor | Dimension (after varimax rotation) | Loading | Other loadings shown

Functionality (coefficient alpha = 0.736)
FUNC2 use of standard interfaces | interoperability | 0.56 | 0.36
FUNC3 HW requirements | interoperability | 0.88 |
FUNC4 SW requirements | interoperability | 0.83 | 0.35

Reliability (coefficient alpha = 0.902)
FIAB8 percent of faults (product maturity) | stability | 0.61 | 0.45
FIAB9 time to fix a patch (time elapsed while the vendor investigates a fault, develops and tests a fix) | stability | 0.69 | 0.38
FIAB10 software faults do not interfere with other programs or the operating system | stability | 0.78 |
FIAB11 number of critical faults | stability | 0.79 |
FIAB12 minor faults do not interfere with the other functions | stability | 0.78 |
FIAB13 serious faults do not interfere with critical functions | stability | 0.87 |
FIAB14 ability to return to normal status after a fault automatically | recoverability | 0.72 | 0.36
FIAB15 unavailability time after fault | recoverability | 0.65 | 0.42
FIAB16 ability to return to a previous status after an abnormal event (restore) | recoverability | 0.66 | 0.36
FIAB17 ability to return to a normal status after a fault (recovery) | recoverability | 0.72 | 0.32
FIAB18 recovery documentation | recoverability | 0.61 | 0.45

Usability (coefficient alpha = 0.853)
USAB21 understandability | usability | 0.64 | 0.55
USAB22 learnability | usability | 0.86 |
USAB23 operability | usability | 0.78 |

Efficiency (coefficient alpha = 0.819)
EFIC26 response time | time behaviour | 0.59 | 0.32, 0.36
EFIC27 throughput (processing capability per time unit) | time behaviour | 0.56 | 0.49
EFIC28 memory use | resource behaviour | 0.92 |
EFIC29 processor use | resource behaviour | 0.84 |
EFIC30 disk space use | resource behaviour | 0.74 | 0.38
EFIC31 scalability | time behaviour | 0.56 | 0.45, 0.34

Maintainability (coefficient alpha = 0.896)
MANT33 time to deploy an update (how long it takes for users to obtain an update installed) | upgradability | 0.57 | 0.37
MANT36 stable patches | upgradability | 0.74 |
MANT37 ability to return to a previous status after a patch installation (upgrade) | upgradability | 0.73 | 0.34
MANT38 upgrades do not remove previous configuration | upgradability | 0.70 |

Portability (coefficient alpha = 0.930)
PORT40 installation easiness | supportability | 0.84 |
PORT41 installation effort | supportability | 0.87 |
PORT42 installation documentation | supportability | 0.74 |
PORT46 configuration assistant | supportability | 0.82 |
PORT47 configuration help and documentation | supportability | 0.72 |

Kaiser–Meyer–Olkin measure of sampling adequacy: 0.79
Bartlett's test of sphericity (approx. chi-square): 2050.612 (p < 0.001)
Correlation matrix determinant: 5.98 × 10^-16

EFA = exploratory factor analysis; loadings < 0.32 not shown; cross-loadings are listed without column assignment; total variance extracted by the eight factors = 76.921%


On the other hand, Table 2 also presents the extracted components, which we labelled and defined as follows:

1. Interoperability. Easiness of interaction with specified systems.

2. Stability (maturity and fault tolerance). Frequency of failure (due to faults in the software) and ability to maintain a specified level of performance in case of failure.



3. Recoverability. Ability to re-establish its level of performance in case of failure.

4. Time behaviour. Response and processing times, and throughput rates [2].

5. Resource behaviour. Amount of resources used and the duration of such use [2].

6. Usability. A set of attributes that bear on the effort needed for use, and on the individual assessment of such use, by a stated or implied set of users [2].

7. Upgradability. Attributes of software that bear on its ability to maintain the product without faults or security vulnerabilities and with the most advanced technologies.

8. Supportability. Easiness of software management (installation, configuration, upgrades etc.).

In the case of the TF model, the results are quite similar to the ISO 9126 model: EFA has distributed the factors into the same subcharacteristics as the ISO 9126 model, with the exception that some of them have been eliminated (apart from those eliminated during the adaptation of the standard to the application domain in Section 4). The main difference is that the stability subcharacteristic is here a mix of maturity and fault tolerance, due to factors related to both of them. Moreover, we have renamed some of them in order to adapt them to the remaining factors and to the application domain of network security COTS. For example, in the case of the maintainability subcharacteristic, only the factors related to updates have been kept after the EFA execution, so we have renamed maintainability as upgradability. Something similar happens with the portability subcharacteristic: we have called it supportability because of the meaning of the factors which now compose it, all of them related to installation and configuration features.

Table 3 CFA results

Goodness-of-fit statistic | Suggested cut-off value | Technical factors scale | Non-technical factors scale
χ² (df) | | 725.54 (436) | 116.923 (74)
Normed χ² (χ²/df) | >1, <2 | 1.66 | 1.58
RMSEA | <0.08 | 0.0789 | 0.0692
CFI | >0.9 | 0.900 | 0.979
NFI | >0.9 | 0.907 | 0.944
NNFI | >0.9 | 0.955 | 0.974

CFA = confirmatory factor analysis; RMSEA = root mean square error of approximation; CFI = comparative fit index; NFI = normed fit index; NNFI = non-normed fit index

Figure 2 TF final quality model for security COTS



6.3.4 Confirmatory factorial analysis: The predictive validation obtained after applying EFA in the previous section should be confirmed to obtain the final model. Confirmatory factor analysis (CFA) through structural equation modelling and the maximum likelihood (ML) estimation method is used in order to assess the validity of the model. Table 3 shows the minimum recommended values for good fit [63]. After checking the goodness-of-fit statistics, CFA yields the path diagram as the final model, with the relations between latent variables (dimensions) and the weights of each variable (attribute) on its dimension.
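For reference, these indices follow from the standard definitions below (our summary, not formulas reproduced from the paper), where the subscript m denotes the fitted model, b the baseline (independence) model and N the sample size; the normed chi-square row of Table 3 is simply the ratio χ²/df, e.g. 725.54/436 ≈ 1.66 for the TF scale:

\[
\mathrm{RMSEA} = \sqrt{\frac{\max(\chi^2_m - \mathrm{df}_m,\,0)}{\mathrm{df}_m\,(N-1)}},
\qquad
\mathrm{NFI} = \frac{\chi^2_b - \chi^2_m}{\chi^2_b},
\]
\[
\mathrm{NNFI} = \frac{\chi^2_b/\mathrm{df}_b - \chi^2_m/\mathrm{df}_m}{\chi^2_b/\mathrm{df}_b - 1},
\qquad
\mathrm{CFI} = 1 - \frac{\max(\chi^2_m - \mathrm{df}_m,\,0)}{\max(\chi^2_b - \mathrm{df}_b,\;\chi^2_m - \mathrm{df}_m,\;0)}.
\]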

Table 4 Correlations between latent variables (dimensions) for TF model

 | Interoperability | Stability | Recoverability | Usability | Time behaviour | Resource behaviour | Upgradability | Supportability
interoperability | 1.00 | | | | | | |
stability | 0.51 | 1.00 | | | | | |
recoverability | 0.41 | 0.72 | 1.00 | | | | |
usability | 0.18 | 0.34 | 0.48 | 1.00 | | | |
time behaviour | 0.55 | 0.71 | 0.49 | 0.44 | 1.00 | | |
resource behaviour | 0.32 | 0.42 | 0.37 | 0.41 | 0.56 | 1.00 | |
upgradability | 0.49 | 0.72 | 0.69 | 0.47 | 0.71 | 0.53 | 1.00 |
supportability | 0.18 | 0.27 | 0.48 | 0.65 | 0.50 | 0.52 | 0.68 | 1.00

Figure 3 NTF final quality model for security COTS


Regarding the case study, Table 3 shows that the results obtained for the goodness-of-fit statistics for both models exceed the minimum recommended values for good fit, providing evidence of discriminant validity. On the other hand, Fig. 2 shows the path diagram obtained for the TF model (due to space limitations and for a correct understanding of the data, the correlation coefficients for the latent variables have been removed from the figure; they are shown in Table 4). The diagram shows the weights of the variables on each dimension, and Table 4 shows the correlation coefficients for the latent variables, that is, the degree of dependency among the dimensions. As can be observed in this table, the highest dependencies obtained are the relationship between stability and recoverability and that between stability and upgradability.

Regarding the NTF model, as shown in Fig. 3, the highest correlation appears between the product-related and vendor-related criteria. This shows the high dependency between these criteria, which other authors [39–41] have considered as only one set (called 'non-technical', 'organisational' or 'strategic' criteria).

7 Conclusions

Quality models are an essential instrument in the discipline of software engineering as a framework for control. Many proposals can be found in the literature, although they frequently offer no data on empirical support (if any exists) or justification of the details of each model. Even more, known models are not easily adaptable to specific domains where non-general characteristics arise as key factors for software products. This is the case for COTS software products, where we have experienced the difficulty of performing consistent and empirically solid evaluations.

Based on the broad set of references in the literature, we have conceptualised, built, refined and tested a quality model for security software products. We have collected the opinions of experts about the desired properties of IT security COTS products and then analysed these data. As a result, we have obtained two models for measuring the quality of software products (commonly called COTS) in the security domain: the TF model (relevant for the measurement of the technical quality properties of the product) is an eight-dimensional, 32-item model, whereas the NTF model (relevant for the measurement of the non-technical quality properties related to vendor and product) is a three-dimensional, 14-item model. Apart from validating these two models, a reduction of quality factors has been achieved through the elimination of redundancy (variables correlated with other ones). The total reduction was 39.47% of the variables. This reduction will improve the efficiency of quality evaluation processes. Moreover, this effort can help researchers to develop new domain-oriented quality models through the use of the supporting model development process, DUMOD.


As an additional result to the conclusions obtained for the security COTS area (outside the scope of this paper), we have presented two important results related to quality models. The first is that we have revealed relationships between the model characteristics. Therefore it is not advisable to focus on only one characteristic, or a small set of them, to analyse COTS quality. Moreover, these results define and confirm the above-mentioned influence (suggested but not confirmed in the literature [15, 64, 65]) between the different ISO 9126 subcharacteristics. The second conclusion is that these results contribute to quantifying not only these relationships but also the weights of the attributes within each subcharacteristic.

Finally, we have confirmed that the factorial analysis approach is statistically sounder than those of previous studies; nevertheless, the case study suffers from two limitations. The first is related to the sample of IT professionals, which it is advisable to extend to a broader population. Moreover, the application of the statistical analysis requires some knowledge of multivariate analysis techniques. So, purchasers and evaluators need experts who know how to use these methods correctly. Perhaps cooperation with a local university could be a possible approach to overcoming this difficulty. In any case, these results are helping us in ongoing evaluations of security products [57] and we strongly believe they are a valuable asset for evaluators and for IT security COTS developers to promote the systematic assessment and improvement of security products.

8 References

[1] FENTON N., PFLEEGER S.L.: 'Software metrics: a rigorous and practical approach' (PWS Publishing Co., Boston, MA, USA, 1997, 2nd edn.)

[2] ISO/IEC 9126-1:2001: 'Software engineering – product quality – Part 1: quality model' (International Standards Organization, Geneva, 2001)

[3] KITCHENHAM B., PFLEEGER S.L.: 'Software quality: the elusive target', IEEE Softw., 1996, 13, (1), pp. 12–21

[4] DROMEY R.G.: 'Cornering the chimera', IEEE Softw., 1996, 13, (1), pp. 33–43

[5] KITCHENHAM B.: 'Procedures for performing systematic reviews'. Technical Report TR/SE0401, Keele University, and Technical Report 0400011T.1, National ICT Australia, 2004

[6] HALL M.J.J., HALL R., ZELEZNIKOW J.: 'A process for evaluating legal knowledge-based systems based upon the context criteria contingency-guidelines framework'. Proc. Ninth Int. Conf. Artificial Intelligence and Law, Scotland, United Kingdom, 2003

[7] LOSAVIO F., MATTEO A., RAHAMUT R.: 'Web services domain analysis based on quality standards', in 'Software architecture' (LNCS, Springer, 2008)

[8] HELMUT N., BENJAMIN Z., JENS G.: 'An approach to quality engineering of TTCN-3 test specifications', Int. J. Softw. Tools Technol. Transf., 2008, 10, (4), pp. 309–326

[9] MALAK G., BADRI L., BADRI M., SAHRAOUI H.: 'Towards a multidimensional model for web-based applications quality assessment', in BAUKNECHT K., BICHLER M., PRÖLL B. (EDS.): 'E-commerce and web technologies' (Springer, 2004), pp. 316–327

[10] MORAGA M.A., CALERO C., GARZÁS J., PIATTINI M.: 'Assessment of portlet quality: collecting real experience', Comput. Stand. Interfaces, 2009, 31, (2), pp. 336–347

[11] PEREZ L.S.V., TORNES A.F.G., RIVERON E.M.F.: 'MECRAD: model and tool for the technical quality evaluation of software products in visual environment'. Proc. Third Int. Multi-Conf. Computing in the Global Information Technology (ICCGI'08), 2008, pp. 107–112

[12] STRAHONJA V.: 'The evaluation criteria of workflow metamodels'. Proc. 29th Int. Conf. Information Technology Interfaces (ITI 2007), 2007, pp. 553–558

[13] YEONGSEOK L., JUNGHYUN B., SEOKKOO S.: 'Development of quality evaluation metrics for BPM (Business Process Management) system'. Proc. Fourth Annual ACIS Int. Conf. Computer and Information Science, 2005

[14] SANGEETA N., HAUSI A.M.: 'Quality criteria and an analysis framework for self-healing systems'. Proc. 29th Int. Conf. Software Engineering Workshops, 2007

[15] FRANCH X., CARVALLO J.P.: 'Using quality models in software package selection', IEEE Softw., 2003, 20, (1), pp. 34–41

[16] CARVALLO J.P., FRANCH X., QUER C., RODRIGUEZ N.: 'A framework for selecting workflow tools in the context of composite information systems', in 'Database and expert systems applications' (LNCS, Springer, Berlin/Heidelberg, 2004), pp. 109–119

[17] CARVALLO J.P., FRANCH X., QUER C.: 'Defining a quality model for mail servers'. Proc. Second Int. Conf. COTS-Based Software Systems, 2003

[18] FRANCH X., CARVALLO J.P.: 'A quality-model-based approach for describing and evaluating software packages'. Proc. 10th Anniversary IEEE Joint Int. Conf. Requirements Engineering, 2002

[19] BOTELLA P., BURGUÉS X., CARVALLO J.P., FRANCH X., PASTOR J.A., QUER C.: 'Towards a quality model for the selection of ERP systems', in 'Component-based software quality' (LNCS, 2693, Springer, Berlin/Heidelberg, 2003), pp. 225–245

[20] CARVALLO J.P., FRANCH X.: 'Extending the ISO/IEC 9126-1 quality model with non-technical factors for COTS components selection'. Proc. 2006 Int. Workshop on Software Quality, Shanghai, China, 2006

[21] FRANCH X., QUER C., CANTON J., SALIETTI R.: 'Experience report on the construction of quality models for some content management software domains'. Proc. Seventh Int. Conf. Composition-Based Software Systems (ICCBSS 2008), 2008

[22] OH G.O., KIM D.Y., KIM S.I., RHEW S.Y.: 'A quality evaluation technique of RFID middleware in ubiquitous computing'. Proc. 2006 Int. Conf. Hybrid Information Technology, 2006, vol. 2

[23] WON J.S., JI H.K., SUNG Y.R.: 'A quality model for open source software selection'. Proc. Sixth Int. Conf. Advanced Language Processing and Web Information Technology (ALPIT 2007), 2007

[24] LEE K., LEE S.J.: 'A quantitative evaluation model using the ISO/IEC 9126 quality model in the component based development process'. Proc. 2006 Int. Conf. Computational Science and its Applications (ICCSA 2006), Scotland, 2006 (LNCS), pp. 917–926

[25] STEFAN W., FLORIAN D.: 'An integrated approach to quality modelling'. Proc. Fifth Int. Workshop on Software Quality, 2007

[26] BEHKAMAL B., KAHANI M., AKBARI M.K.: 'Customizing ISO 9126 quality model for evaluation of B2B applications', Inf. Softw. Technol., 2009, 51, (3), pp. 599–609

[27] PARASURAMAN A., ZEITHAML V.A., BERRY L.L.: 'SERVQUAL: a multiple-item scale for measuring consumer perceptions of service quality', J. Retail., 1988, 64, pp. 12–40

[28] PARASURAMAN A., ZEITHAML V.A., BERRY L.L.: 'A conceptual model of service quality and its implications for future research', J. Mark., 1985, 49, (4), pp. 41–50

[29] LINGYU W., ANOOP S., SUSHIL J.: 'Toward measuring network security using attack graphs'. Proc. 2007 ACM Workshop on Quality of Protection, Alexandria, VA, USA, 2007

[30] KUNDA D.: 'STACE: social technical approach to COTS software evaluation', in CECHICH A., PIATTINI M., VALLECILLO A. (EDS.): 'Component-based software quality' (Springer, 2003), pp. 64–84

[31] TORCHIANO M., JACCHERI L., SØRENSEN C.F., WANG A.I.: 'COTS products characterization'. Proc. 14th Int. Conf. Software Engineering and Knowledge Engineering, Ischia, Italy, 2002

[32] KUNDA D., BROOKS L.: 'Identifying and classifying processes (traditional and soft factors) that support COTS component selection: a case study', Eur. J. Inf. Syst., 2000, 9, (4), pp. 226–234

[33] ISO/IEC 15408-2:2005: 'Information technology – security techniques – evaluation criteria for IT security – Part 2: security functional requirements' (International Standards Organization, Geneva, 2005, 1st edn.)

[34] ISO/IEC 25051: 'Software engineering – software product quality requirements and evaluation (SQuaRE) – requirements for quality of commercial off-the-shelf (COTS) software product and instructions for testing' (International Standards Organization, Geneva, 2006, 1st edn.)

[35] ISO/IEC 15408: 'Information technology – security techniques – evaluation criteria for IT security' (International Standards Organization, Geneva, 2005, 1st edn.)

[36] ISO/IEC 9126-2:2003: 'Software engineering – product quality – Part 2: external metrics' (International Standards Organization, Geneva, 2003)

[37] ISO 9241-110: 'Ergonomics of human-system interaction – Part 110: dialogue principles' (International Standards Organization, Geneva, 2006)

[38] ISO 9241-11: 'Ergonomic requirements for office work with visual display terminals (VDTs) – Part 11: guidance on usability' (International Standards Organization, Geneva, 1998)

[39] KONTIO J., CALDIERA G., BASILI V.R.: 'Defining factors, goals and criteria for reusable component evaluation'. Proc. 1996 Conf. Centre for Advanced Studies on Collaborative Research, Toronto, ON, Canada, 1996

[40] OCHS M., PFAHL D., CHROBOK-DIENING G., NOTHHELFER-KOLB B.: 'A method for efficient measurement-based COTS assessment and selection – method description and evaluation results'. Proc. Seventh Int. Symp. Software Metrics, 2001

[41] COMELLA-DORDA S., DEAN J.C., MORRIS E., OBERNDORF P.: 'A process for COTS software product evaluation'. Proc. First Int. Conf. COTS-Based Software Systems (ICCBSS 2002), Orlando, FL, USA, 2002, pp. 86–96

[42] LAWLIS P.K., MARK K.E., THOMAS D.A., COURTHEYN T.: 'A formal process for evaluating COTS software products', IEEE Comput., 2001, 34, (5), pp. 58–63

[43] CARVALLO J.P., FRANCH X., QUER C.: 'Managing non-technical requirements in COTS components selection'. Proc. 14th IEEE Int. Requirements Engineering Conf. (RE'06), 2006, pp. 316–321

[44] OBERNDORF P.A., BROWNSWORD L., MORRIS E., SLEDGE C.: 'Workshop on COTS-based systems'. Special Report CMU/SEI-97-SR-019, Software Engineering Institute, Carnegie Mellon University, November 1997

[45] COMELLA-DORDA S., DEAN J., LEWIS G., MORRIS E., OBERNDORF P., HARPER E.: 'A process for COTS software product evaluation'. Technical Report CMU/SEI-2003-TR-017, ESC-TR-2003-017, Software Engineering Institute, Carnegie Mellon University, Pittsburgh, 2004

[46] KOYANI S.J., BAILEY R.W., NALL J.R.: 'Research-based web design and usability guidelines', U.S. Dept. of Health and Human Services, http://www.usability.gov/pdfs/guidelines.html, accessed 2006

[47] KIRAKOWSKI J., CORBETT M.: 'SUMI: the software usability measurement inventory', Br. J. Educ. Technol., 1993, 24, pp. 210–212

[48] GEDIGA G., HAMBORG K.-C., DÜNTSCH I.: 'The IsoMetrics usability inventory: an operationalisation of ISO 9241-10 supporting summative and formative evaluation of software systems', Behav. Inf. Technol., 1999, 18, (3), pp. 151–164

[49] IEEE Std. 1061–1998: 'Standard for a software quality metrics methodology' (IEEE Computer Society, 1998)

[50] OBESO M.E.A.: 'Metodología de Medición y Evaluación de la Usabilidad en Sitios Web Educativos' (Universidad de Oviedo, Oviedo, 2005)

[51] PEREZ L.S.V., TORNES A.F.G.: 'MECHDAV – a quality model for the technical evaluation of applications development tools in visual environments'. Software Measurement European Forum (SMEF 2005), Rome, Italy, 2005, pp. 155–163

[52] AVIZIENIS A., LAPRIE J.-C., RANDELL B., LANDWEHR C.: 'Basic concepts and taxonomy of dependable and secure computing', IEEE Trans. Dependable Secur. Comput., 2004, 1, (1), pp. 11–33

[53] BERTOA M., VALLECILLO A.: 'Atributos de calidad para componentes COTS'. Proc. IDEAS'02, La Habana, Cuba, 2002, pp. 352–363

[54] MORISIO M., TORCHIANO M.: 'Definition and classification of COTS: a proposal'. Proc. First Int. Conf. COTS-Based Software Systems (ICCBSS'02), London, UK, 2002, pp. 165–175

[55] VILLALBA M.T., FERNÁNDEZ-SANZ L.: 'Technical report – evaluation report of Internet Security and Acceleration (ISA) Server 2006', eSecurity, European Security, November 2007, vol. 14, pp. 78–81

[56] VILLALBA M.T., FERNÁNDEZ-SANZ L.: 'Technical report – evaluation report of Cryptosec 2048', eSecurity, European Security, June 2007, vol. 13, pp. 78–81

[57] VILLALBA M.T., FERNÁNDEZ-SANZ L.: 'Technical report – evaluation report of Internet Application Gateway (IAG) 2007', eSecurity, European Security, June 2008, vol. 18, pp. 53–57

[58] HAIR J.F., TATHAM R.L., ANDERSON R.E., BLACK W.: 'Multivariate data analysis' (Prentice Hall, 1998, 5th edn.)

[59] THOMSEN E.: 'OLAP solutions: building multidimensional information systems' (Wiley, 2002)

[60] NUNNALLY J., BERNSTEIN I.: 'Psychometric theory' (McGraw-Hill, New York, 1994)

[61] FLYNN P., CURRAN K., LUNNEY T.: 'A decision support system for telecommunications', Int. J. Netw. Manag., 2002, 12, (2), pp. 69–80

[62] TABACHNICK B.G., FIDELL L.S.: 'Using multivariate statistics' (Allyn & Bacon, Pearson Education, 2006, 5th edn.)

[63] HU L., BENTLER P.M.: 'Cutoff criteria for fit indexes in covariance structure analysis: conventional criteria versus new alternatives', Struct. Equ. Model., 1999, 6, (1), pp. 1–55

[64] VEENENDAAL E.P.W.M.V., TRIENEKENS J.J.M.: 'Testing based on users' quality needs'. Proc. IFIP TC5 WG5.4 Third Int. Conf. Reliability, Quality and Safety of Software-Intensive Systems, Athens, Greece, 1997

[65] PUNTER T., SOLINGEN R.V., TRIENEKENS J.: 'Software product evaluation – current status and future needs for customers and industry'. Fourth IT Evaluation Conf. (EVIT-97), Delft, The Netherlands, 1997