Towards Automated A/B Testing
TRANSCRIPT
Towards Automated A/B Testing
Alessandro [email protected]
Università della Svizzera Italiana, Lugano (CH)
Giordano [email protected]
Università della Svizzera Italiana, Lugano (CH)
User-intensive Applications
• Large and evolving populations of users
• Meeting user preferences is a crucial factor
• Almost impossible to design applications that accurately capture all possible and meaningful user preferences upfront.
A/B Testing: what?
• Initial (inaccurate) implementation based on the available knowledge
A/B Testing: two distinct variants of the same application are compared using live experiments.
• Run-time continuous monitoring, refinement and improvement to meet newly discovered user preferences
A/B Testing: what?
• Widely adopted in industry: “The A/B Test: Inside the Technology That’s Changing the Rules of Business”.
A/B Testing: pitfalls
1. Development and deployment of multiple variants.
2. What is a variant?
3. How many variants?
4. How to select variants?
5. How to evaluate variants?
6. When to stop?
• Still a difficult, tedious, and costly manual activity.
About this work
• Adopt SBSE methods to improve/automate the A/B Testing process
• Investigate the feasibility and draft a possible solution (GA+AOP)
• Not (yet) a polished & ready-to-use solution
A/B Testing: an optimization problem
• Features: From an abstract viewpoint a program p can be viewed as a finite set of features: F = {f1, …, fn}.
• Each feature f has an associated domain D that specifies which values are valid/allowed for f.
A/B Testing: an optimization problem
• Instantiation functions: an instantiation function associates a feature with a specific value from its domain.
• To obtain a concrete implementation of a program it is necessary to specify instantiations for all its features.
• Different instantiations yield different concrete implementations of the same abstract program.
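The definitions above can be made concrete with a small sketch. Feature names, domain values, and the class name below are invented for illustration and are not the paper's implementation: domains are represented as a map from feature name to a list of allowed values, and an instantiation picks one value per feature.

```java
import java.util.*;

// Hypothetical sketch: a program's features, each with a finite domain,
// and an instantiation function that assigns one domain value per feature.
public class Instantiation {
    static final Map<String, List<String>> DOMAINS = new LinkedHashMap<>();
    static {
        // Illustrative features (same examples as the slides use later)
        DOMAINS.put("checkOutButtonText", List.of("CheckOut", "Buy", "Buy Now!"));
        DOMAINS.put("fontSize", List.of("12", "14", "16", "18"));
    }

    // An instantiation: for every feature, choose one value from its domain.
    // Different choices yield different concrete variants of the same program.
    static Map<String, String> instantiate(Map<String, Integer> choice) {
        Map<String, String> variant = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> e : DOMAINS.entrySet())
            variant.put(e.getKey(), e.getValue().get(choice.get(e.getKey())));
        return variant;
    }
}
```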
A/B Testing: an optimization problem
• Variants: We call a concrete implementation of a program p a variant of p.
• Constraints: A constraint is a function Ci,j : Di → P(Dj) that, given a domain value for feature i, returns the subset of values not allowed for feature j.
• A variant is valid if it satisfies all the defined constraints.
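A constraint Ci,j can be sketched directly as a function from a value of one feature to the set of forbidden values of another; the concrete rule below (long button text forbids the smallest font) is invented purely to illustrate the validity check:

```java
import java.util.*;
import java.util.function.Function;

// Hypothetical sketch of a constraint C(i,j): given a value of feature i,
// return the subset of feature j's domain that becomes forbidden.
// A variant is valid if none of its chosen values is forbidden.
public class Constraints {
    // Illustrative rule: the text "Buy Now!" forbids fontSize "12"
    static final Function<String, Set<String>> FORBIDDEN_FONT_SIZES =
        text -> text.equals("Buy Now!") ? Set.of("12") : Set.of();

    static boolean isValid(String buttonText, String fontSize) {
        return !FORBIDDEN_FONT_SIZES.apply(buttonText).contains(fontSize);
    }
}
```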
A/B Testing: an optimization problem
• Assessment Function: An assessment function is a function o : Vp → R, where Vp = {v1, …, vm} is the set of all possible variants of the program.
• This function assigns each variant a numeric value that indicates how good the variant is with respect to the goal of the program.
A/B Testing: an optimization problem
• Thus, A/B testing can be formulated as:
v* = argmax_{v ∈ Vp} o(v)
• Goal: exploit search algorithms to enable automated A/B testing.
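The argmax formulation can be sketched as a brute-force search over a tiny variant space. The assessment function here is a stub (in real A/B testing it would come from live-user measurements), and the two-feature layout is an assumption made only to keep the example short:

```java
import java.util.*;

// Minimal sketch of v* = argmax o(v): enumerate all variants of a
// two-feature program and keep the one with the highest assessment.
public class ArgmaxSearch {
    // Stub assessment o(v); a real one would be measured on live users.
    static double assess(List<String> variant) {
        return variant.get(0).length() + variant.get(1).length();
    }

    static List<String> best(List<List<String>> domains) {
        List<String> best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (String a : domains.get(0))
            for (String b : domains.get(1)) {
                List<String> v = List.of(a, b);
                double s = assess(v);
                if (s > bestScore) { bestScore = s; best = v; }
            }
        return best;
    }
}
```

Exhaustive enumeration is only feasible for toy spaces; this is exactly why the talk turns to search algorithms next.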
Towards automated A/B Testing
• Two ingredients:
• A design-time declarative facility to specify program features
• A run-time framework in charge of automatically and iteratively exploring the solution space
Towards automated A/B Testing
Specifying Features
• Ad-hoc annotations specify the set of relevant features
• They allow developers to write a parametric program that represents all possible variants
• We rely on Aspect-Oriented Programming (AOP) to dynamically create variants from the parametric program
Primitive Type Features
@StringFeature(name="checkOutButtonText",
               values={"CheckOut", "Buy", "Buy Now!"})
String buttonText;                        // Primitive feature specification

@IntegerFeature(name="fontSize", range="12:1:18")
int textSize;                             // Primitive feature specification

Button checkOutButton = new Button();
checkOutButton.setText(buttonText);       // Primitive feature use
checkOutButton.setFontSize(textSize);     // Primitive feature use
Generic Data Type Features

@GenericFeatureInterface(name="sort",
    values={"com.example.SortByPrice", "com.example.SortByName"})
public interface AbstractSortingInterface {
    public List<Item> sort();
}

@GenericFeature
AbstractSortingInterface sortingFeature;  // ADT feature specification
sortingFeature.sort();                    // ADT feature use

public class SortByPrice implements AbstractSortingInterface {
    public List<Item> sort() {
        // Sort by price
    }
}

public class SortByName implements AbstractSortingInterface {
    public List<Item> sort() {
        // Sort by name
    }
}
Towards automated A/B Testing
Encoding
• Each feature declared by developers directly maps into a gene, while each variant maps into a chromosome.
• The assessment function, which evaluates variants on live users, corresponds directly to the fitness function, which evaluates chromosomes.
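The feature-to-gene mapping can be sketched as follows; the domains and class name are illustrative assumptions (reusing the slide's example features), not the paper's actual encoding:

```java
import java.util.*;

// Hypothetical encoding sketch: each feature becomes a gene holding an
// index into that feature's domain; a variant is a chromosome with one
// gene per feature. Decoding a chromosome yields the concrete variant.
public class Encoding {
    static final List<List<String>> DOMAINS = List.of(
        List.of("CheckOut", "Buy", "Buy Now!"),   // checkOutButtonText
        List.of("12", "14", "16", "18"));         // fontSize

    static List<String> decode(int[] chromosome) {
        List<String> variant = new ArrayList<>();
        for (int i = 0; i < chromosome.length; i++)
            variant.add(DOMAINS.get(i).get(chromosome[i]));
        return variant;
    }
}
```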
Selection
• At each iteration, selection identifies a finite number of chromosomes in the population that survive.
• Several strategies are possible.
• Selection strategies relieve developers from manually selecting variants during A/B testing.
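As one example of the "several possible strategies", the sketch below implements tournament selection; the fitness array stands in for assessment values measured on live users, and the whole class is an illustrative assumption rather than the talk's implementation:

```java
import java.util.*;

// Sketch of one selection strategy: tournament selection. Two chromosomes
// are drawn at random and the fitter one survives; repeating this fills
// the set of survivors for the next iteration.
public class Selection {
    static int tournament(double[] fitness, int a, int b) {
        return fitness[a] >= fitness[b] ? a : b;  // fitter of the two wins
    }

    static List<Integer> select(double[] fitness, int survivors, Random rnd) {
        List<Integer> chosen = new ArrayList<>();
        for (int k = 0; k < survivors; k++) {
            int a = rnd.nextInt(fitness.length);
            int b = rnd.nextInt(fitness.length);
            chosen.add(tournament(fitness, a, b));
        }
        return chosen;
    }
}
```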
Crossover & Mutation
• Crossover and mutation contribute to the generation of new variants.
• In traditional A/B testing, the process of generating new variants of a program is performed manually by the developers.
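How crossover and mutation generate new variants automatically can be sketched on the index-based chromosomes introduced above; the operators below (one-point crossover, single-gene mutation) are standard GA textbook choices, not necessarily the ones the authors used:

```java
import java.util.Random;

// Sketch of automatic variant generation: one-point crossover recombines
// two parent chromosomes, and mutation replaces one gene with another
// value drawn from that feature's domain.
public class Operators {
    static int[] crossover(int[] p1, int[] p2, int point) {
        int[] child = p1.clone();
        for (int i = point; i < p2.length; i++)
            child[i] = p2[i];                 // tail copied from parent 2
        return child;
    }

    static int[] mutate(int[] chrom, int gene, int[] domainSizes, Random rnd) {
        int[] out = chrom.clone();
        out[gene] = rnd.nextInt(domainSizes[gene]); // new value from the domain
        return out;
    }
}
```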
Some results: setup
• We consider a program with n features.
• We assume that each feature has a finite, countable domain.
• We split the users into groups.
• We adopt an assessment function that ranges from 0 (worst case) to 1000 (best case).
• We set a distance threshold t: if the distance between a feature's value and the user's favourite value exceeds t, the variant's assessment is 0.
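The simulated assessment described above can be sketched as below. The exact scoring formula within the threshold is an assumption (the slides only fix the 0–1000 range and the threshold rule), so treat this as one plausible instantiation:

```java
// Sketch of the simulated assessment: scores range from 0 (worst) to
// 1000 (best). If any feature's value is farther than threshold t from
// the user's favourite value, the variant scores 0; otherwise each
// feature contributes a share that shrinks with its distance.
// The within-threshold formula is an illustrative assumption.
public class Assessment {
    static double assess(int[] variant, int[] favourite, int t) {
        double score = 0;
        for (int i = 0; i < variant.length; i++) {
            int d = Math.abs(variant[i] - favourite[i]);
            if (d > t) return 0;              // beyond threshold: worst case
            score += (1000.0 / variant.length) * (1.0 - (double) d / (t + 1));
        }
        return score;                         // 1000 when variant == favourite
    }
}
```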
Some results
Key messages
• An automated solution is indeed possible and worth investigating.
• Heterogeneous user groups imply more complex A/B Testing.
• Intermediate variants of the program provided good assessment function values.
• Future work:
• Real testbed
• Customised mutation operators
• Full support for constraints.
Thank you
1: A/B Testing is a complex and manual activity
2: A/B Testing can be seen as an optimisation problem
3: We can write parametric programs
4: GAs can carry out A/B Testing campaigns for us