automated reasoning for multi-step software …briand/white-mstep.pdfautomated reasoning for...

Automated Reasoning for Multi-step Software Product-lineConfiguration Problems

Jules White, Brian Dougherty, and Doulas C. SchmidtVanderbilt University

Email:{jules, briand, schmidt}@dre.vanderbilt.edu

David BenavidesUniversity of Seville

Email:[email protected]

Abstract—The increasing complexity and cost of software-intensive systems has led developers to seek ways of reusingsoftware components across development projects. One approachto increasing software reusability is to develop a SoftwareProduct-line (SPL), which is a software architecture that can bereconfigured and reused across projects. Rather than developingsoftware from scratch for a new project, a new configuration ofthe SPL is produced. It is hard, however, to find a configurationof the SPL that meets an arbitrary requirement set and does notviolate any configuration constraints in the SPL.

Existing research has focused on techniques that produce aconfiguration of the SPL in a single step. Budgetary constraints orother restrictions, however, may require multi-step configurationprocesses. For example, an automotive manufacturer may wantto produce a series of configurations of a car over a span of yearswithout exceeding a yearly budget to add features.

This paper provides three contributions to the study of multi-step configuration for SPLs. First, we present a formal modelofmulti-step SPL configuration and map this model to constraintsatisfaction problems (CSPs). Second, we show how solutions tothese SPL configuration problems can be automatically derivedwith a constraint solver by mapping them to CSPs. Third,we present empirical results demonstrating that our CSP-basedreasoning technique can scale to SPL models with hundreds offeatures and multiple configuration steps.

I. I NTRODUCTION

The high-cost of developing distributed real-time and em-bedded (DRE) systems has pushed developers to find novelsolutions to increase the reusability of software. One promisingreuse approach is Software Product-lines (SPLs) [6], whicharesoftware architectures that are designed with built-in points ofvariability that can be altered so that the software can be morereadily reused across projects. For example, an SPL for a carcan be built with the ability to use multiple engine controlsoftware components so it can be adapted to cars with differentengine types.

Ensuring that a correct software product is produced froman SPL involves building models of the rules for configuringthe points of variabilty. For example, an SPL configurationfor a car cannot simultaneously employ two different enginecontrol software components or the wrong component for thegiven engine type. A common technique for specifying SPLconfiguration rules is afeature model[12], which abstracts thecomponents and points of variaiblity in a software product asfeatures.

Feature models are typically implemented as tree-like struc-tures that specify how the components and points of variabilityaffect one another. For example, the feature model of a car

in Figure 1 can optionally include anAutomated DrivingController. If the car includes this feature it must alsoinclude theCollision Avoidance Breaking feature. Anyarbitrary configuration can be checked against the featuremodel to determine if it is a complete and correct configurationof a software product.

When an SPL is configured for a new set of requirements,developers must find a selection of the features from thefeature model that (1) satisfy the requirements and (2) adhereto the rules in the feature model. This configuration processinvolves reasoning over a complex set of constraints to meetan end goal. Various tools [2], [13], [1], [4], [5], [16] havebeen developed to help reduce the complexity of this processby automating parts of the feature selection process.

Open problems.Some configuration problems require start-ing at an arbitrary state and deriving a new configuration thatmeets the target requirements. For instance, an automotivesoftware designer using an SPL may start with no featuresselected and derive a selection of features for the automobilesoftware to meet the needs of a new model year car. Often,however, constraints limit developers from directly transition-ing from the starting state to the desired end configuration.

For example, assume that a group of automotive SPLdevelopers want to modify the configuration of an existingSPL car model to include automated driving capabilities, asshown in Figure 1. The developers have determined that thecost of adding all the new features will be 88 million dollars.The developers only have a annual development budget of 35million dollars to reconfigure the SPL variant, however, whichmeans that the developers cannot simply add all features ina single year. Management has also asked the developers tomake continual progress on developing the car by adding newfeatures to it every year.

To manage these constraints, the developers must incre-mentally add the desired features over a series of steps,i.e.,over several years the developers will produce a series ofintermediate configurations that leverage each other to reachthe desired configuration. For example, they can develop asubset of the new features in the first year’s car configuration,add more of the remaining desired features in the secondyear, and add the rest of the desired features and reach thenew configuration in the third year. This process of producinga series of intermediate configurations—i.e., a configurationpath—is shown in Figure 2. We call this sequence of activitiesa multi-step configuration problem.

Fig. 1: A Configuration Problem Requiring Multiple Steps

Year 1 Year 2 Year 3 Year 4

Year 1 Year 2 Year 3 Year 4

Development

Budget

Exceeded !!!!

Feature Model

Rules Violated

!!!!

!! Invalid !!

Configuration

Path

Valid

Configuration

Path

Fig. 2: Potential Configuration Paths

A key challenge is that the developers cannot arbitrarily pickand choose features to add in a given year to meet the budgetconstraint. For example, developers will violate the featuremodel rules if they choose to addParallel Parking in year3 without Lateral Range Finder, which is required via across-tree constraint, as shown in Figure 2. Developers musttherefore not only adhere to their constraints on the changesthat can be produced in a given year (such as the maximumallowed annual development budget) but also ensure that thechanges they choose create a valid configuration at the end ofeach year. The developers also cannot choose an intermediateconfiguration to transition through that will not function andhence cannot be sold.

Further complicating the multi-step configuration problemis that developers may need to foresee tradeoffs that must bemade along the way. For example, it is not possible to simulta-neously addCollision Avoidance Braking andEnhancedAvoidance in the same year since their total developmentcost is 36 million dollars, as shown in Figure 2. Develop-ers must therefore addCollision Avoidance Braking and

Standard Avoidance one year (to simultaneously meet thebudget constraint and the feature model constraints) and thenremove theStandard Avoidance feature at a later step toadd the desiredEnhanced Avoidance feature.

Prior work on configuration analysis and automation [13],[1], [4], [5], [18], [11] focused on creating one configurationthat meets a specific set of requirements,i.e., they find aconfiguration in one step and assume that it is possible todirectly transition to it. These techniques do not, however,support the need to split the configuration over multiplesteps to adhere to a change constraint, such as the maximumdevelopment budget per year. A gap therefore exists in currenttechniques when developers need to reason about and automateconfiguration over multiple steps.

Solution overview and contributions. To fill the gap inexisting research, we have developed an automated methodfor deriving a set of configurations that meet a series ofrequirements over a span of configuration steps. We call ourtechnique theMUlti-step Software Configuration probLEmsolver (MUSCLE). MUSCLE transforms multi-step feature

configuration problems into constraint satisfaction problems(CSPs) [9]. Once a CSP has been produced for the problem,MUSCLE uses a constraint solver (which is an automatedtool for finding solutions to CSPs) to generate a series ofconfigurations that meet the multi-step constraints.

This paper provides the following contributions to the studyof feature model configuration over a span of multiple steps:

1) We provide a formal model of multi-step configuration,2) We show how the formal model of multi-step configu-

ration can be mapped to a CSP,3) We show how multi-step requirements, such as limits on

the cost of feature changes between two successive con-figurations, can be specified using our CSP formulationof multi-step configuration,

4) We describe mechanisms for optimally deriving a setof configurations that meet the requirements and min-imize or maximize a property of the configurations orconfiguration process, such as total configuration cost,

5) We show how multi-step optimizations can be per-formed, such as deriving the series of configurations thatmeet a set of end-goals in the fewest time steps, and

6) We present empirical results from experiments thatdemonstrate that MUSCLE can scale to feature modelswith hundreds of features and configured over multiplesteps.

Paper organization. The remainder of the paper is orga-nized as follows: Section II summarizes the challenges of per-forming automated configuration reasoning over a sequence ofsteps; Section III describes a formal model of multi-step con-figuration; Section IV explains MUSCLE’s CSP-based auto-mated multi-step configuration reasoning approach; Section Vanalyzes empirical results from experiments demonstrating thescalability of MUSCLE; Section VI compares MUSCLE withrelated work; and Section VII presents concluding remarks.

II. CHALLENGES

A multi-step configuration problem for an SPL involvestransitioning from a starting configuration through a seriesof intermediate configurations to a configuration that meetsa desired set of end state requirements. The solution space forproducing a series of successive intermediate configurationsto reach the desired end state can be represented as a directedgraph, as shown in Figure 3. Each successive series of pointsrepresents potential configurations of the feature model atagiven step. For example, the configurationsB0 . . .Bi representthe intermediate configurations that can be reached in one stepfrom the starting configuration. In this section we use thisgraph formulation of the problem’s solution space to showcasethe challenges of finding valid solutions.

A. Challenge 1: Graph Complexity

A critical challenge to developers attempting to derivesolutions to multi-step configuration problems manually ortouse a graph algorithm is that there are an exponential numberof potential intermediate configurations and paths that couldbe used to reach the desired end state. In the worst case, at any

Fig. 3: A Graph of a Multi-step Configuration Problem

given intermediate step, there can beO(2n) points (wheren isthe number of features in the feature model). In the worst case,therefore, there are 2n potential permutations of the features inthe feature model that could form a configuration. Moreover,for a multi-step configuration problem overK time steps, thereareO(K2n) possible intermediate points.

Further compounding this problem is that for any inter-mediate configuration at stepT, there are in the worst case2n − 1 points at stepT + 1 that could be reached from itby adding or removing features to its feature selection. Theintermediate configurations that do not precede the end pointwill therefore have 2n−1 outgoing edges. Section IV discusseshow MUSCLE uses CSP-based automation to eliminate theneed for developers to manually find solutions to these multi-step configuration problems, which reduces configuration timeand cost.

B. Challenge 2: Point Configuration Constraints

Although there are a substantial number of potential in-termediate configurations, many of these configurations willnot meet developer requirements. For example, many of theK2n arbitrary permutations of feature selections will representconfigurations that do not adhere to the feature model con-straints. Moreover, other external constraints, such as safetyconstraints requiring a specific feature to be selected at alltimes, may not be met. We term these constraints on theallowed configurations at a given steppoint configurationconstraints.

Point configuration constraints eliminate many potentialconfiguration paths. These constraints may create small ad-ditional restrictions, such as that a particular feature mustalways be selected. Complex step-based constraints may alsobe present, such as a particular automotive feature becomesun-available after a specific time step (year) because the supplierdiscontinues it. Finally, a multi-step configuration problemmay not dictate an exact starting and ending configuration,but merely a series of point configuration constraints that musthold for the start and end points of the configuration path. Themyriad of possible point configuration constraints significantlyincreases the challenge of finding a valid configuration pathfor a multi-step configuration problem. Section IV-C describes

how MUSCLE models these constraints using a CSP, whichenables a CSP solver to automatically derive solutions thatadhere to these constraints and thus reduce tedious and error-prone manual configuration.

C. Challenge 3: Configuration Change/Edge Constraints

The automotive example in Figure 1 requires that developersadding new features spend no more than 35 million dollarsin one year. The cost of adding/removing features can becaptured as the length or weight of the edges connectingtwo transitions. For example, to transition directly from thestarting configuration to the desired end configuration requires88 million dollars and has an edge weight of 88.

Developers must not only find a path that reaches thedesired end state without violating the point configurationconstraints in Section II-B, but also ensure that any constraintson the edges connecting successive configurations are met.Transitioning directly from the start configuration to endconfiguration would violate the edge constraint of the 35million dollar yearly development budget. Edge constraintsfurther reduce the number of valid paths and add complexity tothe problem. Section IV-D shows how these edge restrictionscan be encoded as constraints on MUSCLE’s CSP variables toplan configuration paths that adhere to development budgets,which is hard to determine manually.

D. Challenge 4: Configuration Path Optimization

There may often be multiple correct configuration paths thatreach the desired end point. In these cases, developers wouldlike to optimize the path chosen, for example to minimize totalcost (the sum of the edge weights). In other cases, it may bemore imperative to meet the desired end point constraints inas few time steps as possible.

For example, in Figure 4, developers have an initial devel-opment budget of 35 million dollars and then a subsequentyearly budget of 50 million dollars. Although the cost of the

Fig. 4: Optimization of Total Steps

path through intermediate configurationsBi andCi is cheaper(70 million), developers may prefer to pass throughB0 andC0 since they will already have a configuration that meets theend goals atC0. Developers must therefore not only contendwith numerous multi-step constraints, but must also perform

complex optimizations on the properties of the configurationpath. Section IV-E shows how optimization can be performedon MUSCLE’s CSP formulation of multi-step configurationto allow developers to find the fastest and most cost-effectivemeans of achieving a configuration goal.

III. A F ORMAL DEFINITION OF MULTI -STEP

CONFIGURATION

This section presents a formal model of multi-step con-figuration. In its most general form, multi-step configurationinvolves finding a sequence of at mostK configurations thatsatisfy a series of point configuration constraints and edgeconstraints. This definition requires the start and end config-urations meet a set of point constraints, but does not dictatethat there be asinglevalid starting and ending configuration.

General formal model. We define a multi-step configuration problem using the 6-tupleMsc=< E,PC,∆(FT ,FU),K,FStart,Fend >, where:

• E is the set of edge constraints, such as the maximumdevelopment cost per year for features,

• PC is the set of point configuration constraints that mustbe met at each step, such as the feature model rules thatdevelopers may require to be adhered to across all steps(feature model rules do not have to be enforced at eachtime step),

• ∆(FT ,FU) is a function that calculates the change cost oredge weight of moving from a configurationFT at stepT to a configurationFU at stepU ,

• K is the maximum number of steps in the configurationproblem,

• FStart is a set of configuration constraints on the startingconfiguration, such as a list of features that must initiallybe selected,

• Fend is a set of configuration constraints on the finalconfiguration, such as a list of features that must beselected or maximum cost of the final configuration.

We define a configuration path from stepT overK steps asa K-tuple

P =< FT ,FT+1, . . .FT+K−1 >

, where the configuration at stepT is denoted byFT . Eachconfiguration,FT , denotes the set of selected features at stepT.

Section IV shows how this formal model can be specifiedas a CSP. Although we use CSPs for reasoning on the formalmodel, we could also use SAT solvers, propositional logic, orother techniques to reason about this model. The formal modelis thus applicable to a wide range of reasoning approaches.

A. Constraint and Function Examples

We now describe how the formal model presented above canbe used to model typical SPL configuration constraints. Weshow how common configuration needs, such as the selectionof specific features or budgetary constraints, can be mappedto portions of our multi-step configuration problem tuple.

Edge constraint examples.The set of edge constraintsEcan include numerous types of constraints on the transitionfrom one configuration to another. For example, a constrainte1 ∈ E may dictate that the maximum weight of any edgebetween successive configurations inFT ,FT+1 ∈ P have atmost weight 35 (for the automotive problem from Figure 1):

∀T ∈ (0..K−1), ∆(FT ,FT+1) ≤ 35

Edge constraints may also vary depending on the step, forexample a development budget may start at $35 million andmay expand as a function of the step:

∀T ∈ (0..K−1), ∆(FT ,FT+1) ≤35

1− (.01∗T)

Edge constraints may also be attached to specific time steps:

∀T ∈ (0..4,6..K−1), ∆(FT ,FT+1) ≤ 351−(.01∗T)

∆(F5,F6) ≤ 40

Point configuration constraint examples.The point config-uration constraints specify properties that must hold for thefeature selection at a given step. Both the starting and endingpoints for the multi-step configuration problem are defined aspoint configuration constraints on the first and last steps. Forexample, we want to start at a specific configurationFs andreach another configurationFe:

(F0 = Fs)∧ (FK = Fe)

Another general constraintpc1 ∈ PC could require that forany stepT, the feature selectionFT satisfies the feature modelconstraintsFc:

∀T ∈ (0..K−1), FT ⇒ Fc

Developers could also require that a specific set of featuresFs, such as safety critical braking features, be selected at alltimes:

∀T ∈ (0..K−1), Fs ⊂ FT

Change calculation function examples.The function∆(FT ,FU) calculates the cost of changing from one configura-tion to another configuration at a different step. For example,the following change calculation function computes the costof changing from one configuration to another:

Fadded = FU −FT

∆(FT ,FU) = ∑ fi ∗ ci, fi ∈ Fadded

where fi is the ith added feature andci is the price of addingthat feature.

IV. A CSP MODEL OF MULTI -STEPCONFIGURATION

This section describes how MUSCLE uses CSPs to automat-ically derive solutions to mulit-step configuration problems.To address the challenges outlined in Section II we show thatderiving a configuration path for a multi-step configurationproblem can be modeled as a CSP [9] using the formalframework from Section III. After a CSP formulation of amulti-step configuration problem is built, MUSCLE can use

a CSP solver to automatically derive a valid configurationpath on behalf of the developer. Automating the configurationpath derivation helps reduce the complexity from Challenge1 in Section II-A. Moreover, the CSP solver can be used toperform optimizations that would be extremely hard to achievemanually.

Prior work on automated feature model configuration [3],[16], [17] has yielded a framework for representing featuremodels and configuration problems as CSPs. This sectionshows how a new formulation of feature models and configura-tion problems can be developed that (1) incorporates multiplesteps, (2) allows a constraint solver to derive a configurationpath for evolving a feature selection over multiple intermediatesteps to meet an end goal, (3) permits the specification of inter-mediate configuration constraints, (4) allows for change/edgeconstraints on the transition between feature selections,and(5) can be leveraged to optimize configuration path properties,such as path length or cost.

A. CSP Automated Configuration Background

A CSP is a set of variables and a set of constraints over thevariables. For example,(X−Y > 0)∧(X < 10) is a simple CSPinvolving the integer variablesX andY. A constraint solver isan automated tool that takes a CSP as input and produces alabeling or set of values for the variables that simultaneouslysatisfies all of the constraints. The solver can also be used tofind a labeling of the variables that maximizes or minimizesa function of the variablese.g. maximizeX +Y yields X =9,Y = 8.

A feature model can be modeled as a CSP through a seriesof integer variablesF , where the variablefi ∈F corresponds tothe ith feature in the feature model. A configuration is definedas a series of values for these variables such thatfi = 1 impliesthat theith feature is selected in the configuration. If theithfeature is not selected,fi = 0. Configuration rules from thefeature model are represented as constraints over the variablesin F , as shown in Figure 5. More details on building a CSP

Fig. 5: Mapping Feature Model Rules to a CSP

from a feature model is described in [16], [3].

B. Introducing Multiple Steps into the CSP

The goal of automated configuration over multiple-steps isto find a configuration path that permutes a given starting con-figuration through a sequence of intermediate configurations

to reach a desired end state. For example, the configurationpaths in Figure 2 capture sequential modifications to the carconfiguration, shown in Figure 1, that will incorporate high-end features into the base automobile model. To reason abouta configuration path over a span of steps, we first introduce anotion of a configuration step into MUSCLE’s CSP model ofconfiguration.

CSP model of configuration steps.To introduce config-uration steps into MUSCLE’s configuration CSP, we modifythe configuration CSP formulation outlined in Section IV-A.We no longer use a variablefi to refer to whether or not theith feature is selected or deselected. Instead,we refer to theselection state of each feature at a specific stepT with thevariable fiT , i.e., if the ith feature is selected at stepT, fiT = 1.We refer to an entire configuration at a specific step as a set ofvalues for these variables,fiT ∈ FT . A solution to the CSP isconfiguration path defined by a labeling of all of the variablesin the K-tuple:< FT ,FT+1 . . .FT+K−1 >.

For example, if theABS feature (denotedfa) is not selectedat stepT and is selected at stepT +1, then:

faT = 0faT+1 = 1

Figure 6 shows a visualization of how thefiT ∈ FT variablesmap to feature selections.

Fig. 6: An Example of Variables Representing Feature Selec-tion State at Specific Steps

C. CSP Point Configuration Constraints

To address Challenge 2 from Section II-B, the point con-figuration constraints (which are the constraints that definewhat constitutes a valid intermediate configuration) can bemodeled as constraints on the variablesfiT ∈ FT . Each pointconfiguration constraint has a specific set of steps,Tpc, duringwhich it must be met,i.e., the constraint must only evaluate totrue on the precise steps for which it is in effect. For example, asimple constraint would be that the 2nd and 3rd configurationsmust have the featuref1 selected. The set of steps for whichthis constraint must hold would beTpc = {2,3}.

CSP model of point configuration constraints.A CSPpoint configuration constraint,pci ∈ PC, requires that:

∀T ∈ Tpc, FT ⇒ pci

Arbitrary point configuration constraints can be built usingthis model to restrict the valid configurations that are passedthrough by the configuration path. This flexible point con-figuration constraint mechanism allows developers to specifyand automatically find solutions to problems involving theconstraints from Challenge 2 in Section II-B.

CSP point configuration constraint example.Assume thatwe want to find values forFT . . .FT+K such that we neverviolate any of the feature model constraints at any step. Furtherassume that the constraints in the feature model remain staticover theK steps (feature model changes over multiple stepscan also be modeled). If thejth feature is a mandatory childof the ith feature, we add the constraint:

∀T ∈ (0. . .K), ( fiT = 1) ⇔ (FjT = 1)

That is, we require that at any stepT, if the ith feature (FiT )is selected, thejth feature (f jT ) is also selected. Furthermore,at any stepT, if the jth feature (FjT ) is selected, theith fea-ture (fiT ) is also selected. Other example point configurationconstraints can be mapped to the CSP as shown in Figure 7and Figure 8.

Fig. 7: Example Mappings of Static Feature Model ConstraintsOver Multiple Steps to a CSP

Fig. 8: Example Point Configuration Constraint Mappings toa CSP

D. CSP Edge/Change Constraints

Challenge 3 from Section II-C described how developersmust be able to specify and adhere to constraints on thedifference between two configurations at different steps. Theseedge/change constraints can be modeled in the CSP as con-straints over the variables in two configurationsFT and FU .By extending the CSP techniques we have developed in pastwork [17], we can specifically capture which features areselected or deselected between any two steps and constrainthese changes via budget or other restrictions.

CSP model of edge/change constraints.To capture differ-ences between feature selections between stepsT andU , wecreate two new sets of variablesSTU andDTU. These variableshave the following constraints applied to them:

∀siTU ∈ STU, (siTU = 1) ⇔ ( fiT = 0)∧ ( fiU = 1)∀diTU ∈ DTU, (diTU = 1) ⇔ ( fiT = 1)∧ ( fiU = 0)

If a feature is selected at time stepT and not at time stepU , therefore,diTU is equal to 1. Similarly, if a feature is notselected at stepT and selected at stepU , siTU is equal to 1.

An edgeedge(T,U) between the configurations at stepsTandU is defined as a 2-tuple:

edge(T,U) =< DTU,STU >

an edge is thus defined by the features deselected and selectedto reach configurationFU from configurationFT . The weightof the edgeweight(edge(T,U)) can then be calculated as afunction of the edge tuple. For example, if theith feature costsci to add or remove then

weight(edge(T,U)) =n

∑i=0

siTU ∗ ci +n

∑i=0

diTU ∗ ci

CSP edge/change constraint example.The cost of includ-ing a particular feature may change over time. For example,the cost of adding a GPS guidance system to a car does notremain fixed, but instead typically decreases from one year tothe next as GPS technology is commoditized. We can modeland account for these changes in MUSCLE’s CSP formulationand constrain the configuration path so that it adds featuresat times when they are sufficiently cheap. We will definean edge constraint that takes into account changing featuremodification costs and limits the change in cost between twosuccessive configurations to $35 million dollars.

We assume we can calculate that the price of including theith feature so that it is included in the feature selection at stepT by the function:

Cost(i,T) =ci

T +1We can then define the cost of adding features to a configu-ration as:

weight(edge(T,T +1)) =n

∑i=1

(siTT+1 ∗Cost(i,T +1))

We can now limit the cost of any two succesive configurationsvia the edge constraint:

∀T ∈ (0..K−1), weight(edge(T,T +1))≤ 35

E. Multi-step Configuration Optimization

Challenge 4 from Section II-D showed that optimizing theconfiguration path is an important issue. CSP solvers canautomatically perform optimization while finding values forthe variables in a CSP (though it may be impractical time-wise for some problems). We can define goal functions overthe CSP variables to leverage these optimization capabilitiesand address Challenge 4.

In some cases, developers may not want to just find anyconfiguration path that ends in the desired state. Instead,they may want a path that produces a configuration thatmeets the end goals as early as possible. For example, inthe automotive problem from Section I developers may wantto find a configuration path that meets their constraints andincludes the high-end features in the base model in fewer thanfive years.

CSP model of path length.To support path length opti-mization, we define a measure of the number of steps neededto reach a valid end state. We must therefore determine ifthe constraints on the final configurationFend (which is thegoal state) are met by some configuration prior to the lastconfiguration (FT whereT < K−1). If we meet the final stateconstraints sooner than the final configuration, then we havefound a configuration process that requires fewer configurationsteps.

To track whether or not a configuration has met the con-straints on the ending configurationFend, we create a series ofvariableswT ∈W to represent whether or not the configurationFT ∈ P satisfiesFend. For each configuration,FT ∈ P, if Fend

is satisifed:(FT ⇒ Fend) ⇒ (wT = 1)

i.e., if at any step (up to and including the last step) we satisfythe end state requirements, setwT equal to 1. We also requirethat after one step has reached a correct ending configuration,the remaining steps also keep the correct configuration and donot alter it:

(wT = 1) ⇒ (wT+1 = 1)

(wT = 1) ⇒ (n

∑i=0

siT T+1 +n

∑i=0

diTT+1 = 0)

Optimization examples.We can optimize to find the short-est configuration path to reach the goals over K steps by askingthe solver to maximize:

K−1

∑T=0

wT

The reason that maximizing this sum will minimize thenumber of steps taken to reach the desired end state is thatthe sooner the state is reached, the more stepswT will equal1.

The most straightforward optimization functions are definedas functions of the variables in the configuration pathP. Forexample, we can instruct the solver to minimize the cost ofthe ending configuration. Assume that the cost ofith feature

at stepK is denoted by the variableci ∈ CK , minimize CK ,where:

CK =n

∑i=0

fi ∗ ci

Other optimizations can be performed on the weights ofthe edges. For example, to find the configuration path withthe lowest development cost, where the development cost isthe edge weight the goal is to minimize:

K−1

∑T=0

weight(edge(T,T +1))

F. A Complete Multi-step CSP Example

Fig. 9: Point Configuration Constraints for the AutomobileExample

We now provide a complete mapping of the automotiveconfiguration problem in Section I to MUSCLE’s multi-stepCSP. For this problem, the automotive developers want toinclude the high-end features into the base model over thecourse of five years (K = 5). We first create a series ofconfiguration variables to represent the feature selectionat theend of each of the five years:

F0 = ( f00, f10, . . . f80)F1 = ( f00, f11, . . . f81)

. . .

F4 = ( f00, f14, . . . f84)

The mappings of the automobile features from Figure 1 toCSP variables can be seen in Figure 9. A configuration pathis defined by a set of feature selections for each of the fiveyears:

P =< F0,F1, . . .F4 >

The point constraint,pc0 ∈PC, ensures that the feature modelconstraints are met by each year’s configuration, as shownin Figure 9. We must also specify the point configurationconstraint for the starting configuration:

Fstart = ( f00 = 1)∧ ( f10 = 1)∧ ( f20 = 0)∧ ( f30 = 0)∧( f40 = 0)∧ ( f50 = 0)∧ ( f60 = 0)∧ f70 = 0)∧( f80 = 0)

Moreover, we must ensure that the high-end features areincluded in the last configuration:

Fend = ( f74 = 0)∧ ( f04 = 1)∧ ( f14 = 1)∧ ( f24 = 1)∧( f34 = 1)∧ ( f44 = 1)∧ ( f54 = 1)∧ ( f64 = 1)∧( f84 = 1)

Our complete set of point configuration constraints isPC =(pc0).

Finally, we must specify how the change cost between twoconfigurations is calculated and enforce the edge constraintthat at most $35 million dollars is spent per year.

∆(FT ,FT+1) = 20s2TT+1 +14s3TT+1 +19s4TT+1 +8s5TT+1

+11s6TT+1 +1s7TT+1 +16s8TT+1

E = (∀T ∈ (0..3), ∆(FT ,FT+1) ≤ 35)

Given this CSP formulation, we can use a constraint solverto automatically derive a solution to the multi-step automotiveconfiguration problem described in Section I.

V. RESULTS

As described in Section II-A, configuring an SPL overmultiple steps is a highly combinatorial problem. An auto-mated multi-step SPL configuration technique should be ableto scale to hundreds of features and multiple steps. This sectionpresents empirical results from experiments we performed todetermine the scalability of MUSCLE. We tested a number ofhypotheses related to the scalability of MUSCLE using variousSPL configuration parameters, such as the total number ofconfiguration steps.

A. Experimental Platform

Our experiments were performed with an implementation ofthe MUSCLE provided by the open-source Ascent Design Stu-dio (available from code.google.com/p/ascent-design-studio).The Ascent Design Studio’s implementation of MUSCLE isbuilt using the Java Choco open-source CSP solver (availablefrom choco.sourceforge.net). The experiments were performedon a computer with an Intel Core DUO 2.4GHZ CPU, 2gigabytes of memory, Windows XP, and a version 1.6 JavaVirtual Machine (JVM). The JVM was run in server modeusing a heap size of 40 megabytes (-Xms40m) and a maximummemory size of 256 megabytes (-Xmx256m).

To test the scalability of MUSCLE we needed 1,000s offeature models to test with, which posed a problem sincethere are not many large-scale feature models available toresearchers. To solve this problem, we used a random featuremodel generator developed in prior work [17]. The featuremodel generator and code for these experiments is availablein open-source form along with the Ascent Design Studio. Weused a maximum branching factor of 5 children per feature anda maximum of 1/3 of the features were in an XOR group.1

Our experiments uncovered trends similar to what observedin prior work [17]. In particular, the branching factor, depth,and cross-tree constraints had little effect on configuration

1XOR feature groups are features that require the set of theirselectedchildren to satisfy a cardinality constraint (the constraint is 1..1 for XOR).

time. The key indicator of the solving complexity was thenumber of XOR-feature groups in a model. The other keyindicators of solving complexity where whether or not opti-mization was used and the total number of time steps involvedin the configuration.

B. Experiment: Multi-step Configuration Scalability

Hypothesis.We hypothesized that MUSCLE could scale upto hundreds of features and 10 or more time steps. We alsobelieved that a CSP solver would be fast enough to derive aconfiguration path in a few seconds.

Experiment design.To test the scalability of MUSCLE, wegenerated random multi-step configuration problems and thensolved for configuration paths that involved larger and largernumbers of steps. The problems were created by generatinga semi-random feature model with 500 features as well asstarting and ending configurations. MUSCLE was used toderive a configuration path between the two configurations.

Our initial experiments were performed withlarge-scaleconfiguration paths, which were produced by forcing thesolver to find a configuration path that involved switchingbetween two children of the root feature that were involvedin an XOR group. Figure 10 shows an example large-scaleconfiguration path problem where the solver must derive aconfiguration path that switches from including featureA tofeatureB. With this type of configuration problem, the solver

Fig. 10: Changing Between Two XOR Subtrees

was forced to change every feature selection in the startingconfiguration to reach the end state,i.e., these experimentsmaximized the number of changes that the solver could everbe required to make.

We generated and solved temporal configuration path prob-lems for feature models with 500 features. We successivelyincreased the number of time steps involved in the configura-tion path to produce larger and larger configuration paths. Themaximum number of changes per configuration checkpointwere bounded to 1/4 of the total number of features. Wesolved 100 randomly generated configuration path problemsper problem size.

Results and analysis.The results from the experiment areshown in Figure 11. This figure shows the solving time inmilliseconds for the configuration path derivation versus thetotal number of time steps in the configuration problem. Asshown in Figure 11, the solving time scales roughly linearlywith the number of time steps.

The apparent linear scaling of the technique with respect tothe number of time steps is a promising result. Although morework is needed to show that this linear scaling continues for

Fig. 11: Automated Configuration Time for Varying Numbersof Time Steps

different configuration path properties, these results indicatethat the technique may scale well as the number of time stepsgrows. Our future work is investigating the scalability of thetechnique and improve MUSCLE’s CSP formulation.

VI. RELATED WORK

This section compares MUSCLE with related work.Automated Single-step Configuration:Several single-step

feature model configuration and validation techniques havebeen proposed [2], [13], [1], [4], [5], [16]. These techniquesuse CSPs and propositional logic to derive feature modelconfigurations in a single stage as well as assure their validity.These techniques help address the high complexity of findinga valid feature selection for a feature model that meets a setof intricate constraints.

While these techniques are useful for the derivation andvalidation of configurations in a single step, they do not con-sider feature configuration over the course of multiple steps.In scenarios, such as the automotive example from Section I,the ability to reason about configuration over multiple steps iscritical. MUSCLE provides this automated reasoning acrossmultiple steps. Moreover, MUSCLE can also be used forsingle-step configurations since it is a special case of multi-step configuration where there is only one stepK = 1.

Staged Configuration: Czarnecki et al. [7] describe amethod for using staged feature selection to achieve a finaltarget configuration. This multi-stage selection considers casesin which the selection of features in a previous stage impactsthe validitiy of later stage feature selections. Our techniquealso examines the production of a feature model configurationover multiple configuration steps. MUSCLE is complementaryto Czarnecki et al.’s work and provides a general formalframework that can be used to perform automated reasoningon staged configuration processes. Moreover, MUSCLE canalso be used to reason about other multi-step configurationprocesses that do not fit into the staged configuration model,such as the the example from Section I where each step mustreach a valid configuration.

Staged configuration can be modeled as a special instanceof multi-step configuration. Specifically, staged configurationis an instance of a multi-step configuration problem where:

E = /0, Fstart = /0, Fend= (FK−1 ⇒ Fc), K is set to the numberof stages,∆(FT ,FU) is not defined, andFc is the set of featuremodel constraints. That is, there are no limitations on thechanges that can be made between successive configurations,the starting configuration has no features selected, and theend-ing configuration yields a valid feature model configuration.The staged configuration definition can be refined to guaranteethat successive stages only add features:∀T ∈ (0..K−1),FT ⊂FT+1.

Quality Attribute Evaluation: Several techniques havebeen proposed for using quality attribute evaluation[8], [10],[15], [14], [18], [11] to guide a configuration process. Thesetechniques provide a framework for assessing the impactof each feature selection on the overall capabilities of theconfigured system. As a result, quality characteristics, suchas reliability, can be taken into account when selecting fea-tures. These techniques are also designed for single stepconfiguration processes. These techniques could be used ina complementary fashion to MUSCLE to produce the pointconfiguration, edge, and other constraints in the multi-stepconfiguration model.

VII. C ONCLUDING REMARKS

Many production SPL configuration problems require de-velopers to evolve a configuration over multiple steps, ratherthan in a single iteration. Multi-step configuration, however,must take into account constraints on the change betweensuccessive configurations, such as the increase in cost of anautomobile’s configuration from one year to the next. More-over, even though configuration is performed over multiplesteps, a valid configuration must still be produced at the endof each year, further adding complexity while maintaining afunctional system configuration.

It is hard to determine a sequence of feature model configu-rations and feature selections such that an initial configurationcan be transformed into a desired target configuration. Thispaper introduces a technique, called theMUlti-step SoftwareConfiguration probLEm solver(MUSCLE), for modeling andsolving multi-step configuration problems. MUSCLE repre-sents the problem as a CSP, which enables CSP solvers todetermine a path from a starting configuration to a targetconfiguration. The output from MUSCLE is a valid sequenceof feature selections that will lead from a starting configurationto the desired target configuration while also taking intoaccount resource constraints.

The following are lessons learned from our efforts examin-ing multi-step configuration using MUSCLE:

• SPL multi-step optimziation. Multi-step optimizationscan be performed to minimize both the number of timesteps and the resource consumption required to reach atarget SPL configuration.

• Multi-step SPL configuration complexity. For each timestep, there exists a worst case exponential number ofintermediate configurations. Due to this, it is paramountto employ an algorithm that does not rely on exhaustive

exploration. Our future work therefore plans to improveour solving methods to increase performance.

• Multi-step SPL CSP linear scaling.Emprical data hasshown that our technique scales linearly for varyingnumbers of time steps. Our future work will therefore testthe scalability of MUSCLE in response to other factors,including problem size and tightness of constraints.

Open-source implementations of MUSCLE are available inthe Ascent Design Studio (ascent-design-studio.googlecode.com) and FAMA (www.isa.us.es/fama).

REFERENCES

[1] D. Batory. Feature Models, Grammars, and PrepositionalFormulas.Software Product Lines: 9th International Conference, SPLC 2005,Rennes, France, September 26-29, 2005: Proceedings, 2005.

[2] D. Benavides, S. Segura, P. Trinidad, and A. Ruiz-Cortés. FAMA:Tooling a framework for the automated analysis of feature models. InProceeding of the First International Workshop on Variability Modellingof Software-intensive Systems (VAMOS), 2007.

[3] D. Benavides, P. Trinidad, and A. Ruiz-Cortes. Automated Rea-soning on Feature Models. InProceedings of the 17th Conferenceon Advanced Information Systems Engineering, Porto, Portugal, 2005.ACM/IFIP/USENIX.

[4] D. Beuche. Variant Management with Pure:: variants. Technical report,Pure-Systems GmbH, http://www.pure-systems.com, 2003.

[5] R. Buhrdorf, D. Churchett, and C. Krueger. Salion’s Experience witha Reactive Software Product Line Approach. InProceedings of the 5thInternational Workshop on Product Family Engineering, Siena, Italy,November 2003.

[6] P. Clements and L. Northrop.Software Product Lines: Practices andPatterns. Addison-Wesley, Boston, USA, 2002.

[7] K. Czarnecki, S. Helsen, and U. Eisenecker. Staged ConfigurationUsing Feature Models.Software Product Lines: Third InternationalConference, SPLC 2004, Boston, MA, USA, August 30-September 2,2004: Proceedings, 2004.

[8] L. Etxeberria and G. Sagardui. Variability Driven Quality Evaluation inSoftware Product Lines. InSoftware Product Line Conference, 2008.SPLC’08. 12th International, pages 243–252, 2008.

[9] P. V. Hentenryck.Constraint Satisfaction in Logic Programming. MITPress, Cambridge, MA, USA, 1989.

[10] A. Immonen. A method for predicting reliability and availability atthe architectural level. Research Issues in Software Product-Lines-Engineering and Management, T. Käkölä and JC Dueñas, Editors, 2005.

[11] S. Jarzabek, B. Yang, and S. Yoeun. Addressing quality attributesin domain analysis for product lines. InSoftware, IEE Proceedings-,volume 153, pages 61–73, 2006.

[12] K. C. Kang, S. Kim, J. Lee, K. Kim, E. Shin, and M. Huh. FORM:A Feature-Oriented Reuse Method with Domain-specific ReferenceArchitectures. Annals of Software Engineering, 5(0):143–168, January1998.

[13] M. Mannion. Using first-order logic for product line model validation.Proceedings of the Second International Conference on Software Prod-uct Lines, 2379:176–187, 2002.

[14] E. Niemelä and A. Immonen. Capturing quality requirements of productfamily architecture. Information and Software Technology, 49(11-12):1107–1120, 2007.

[15] F. Olumofin and V. Misic. Extending the ATAM Architecture Evaluationto Product Line Architectures. InIEEE/IFIP Working Conference onSoftware Architecture, WICSA, 2005.

[16] J. White, A. Nechypurenko, E. Wuchner, and D. C. Schmidt. AutomatingProduct-Line Variant Selection for Mobile Devices. InProceedingsof the 11th Annual Software Product Line Conference (SPLC), Kyoto,Japan, Sept. 2007.

[17] J. White, D. C. Schmidt, D. Benavides, P. Trinidad, and A. Ruiz-Cortez.Automated Diagnosis of Product-line Configuration Errors in FeatureModels. In Proceedings of the Software Product Lines Conference(SPLC), Limerick, Ireland, Sept. 2008.

[18] H. Zhang, S. Jarzabek, and B. Yang. Quality Prediction and Assessmentfor Product Lines.LECTURE NOTES IN COMPUTER SCIENCE, pages681–695, 2003.

automated reasoning for multi-step software …briand/white-mstep.pdfautomated reasoning for...

Documents