biological modeling of software development dynamics

Software development can be treated as an optimization problem:

Software quality: number of features, performance, reliability, etc.

Resources: time, money, etc.

Maximise software quality subject to the constraint of limited resources.
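One way to write this statement down, as a sketch only: let x denote a development plan, Q(x) a quality score, and R_i(x) the consumption of resource i against a budget R_i^max. All of these symbols are assumptions introduced for illustration; the slides do not define them.

```latex
\max_{x} \; Q(x)
\quad \text{subject to} \quad
R_i(x) \le R_i^{\max}, \qquad i = 1, \dots, m
```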

Questions to answer:

› How long will it take to complete a particular task?

› How long will it take to test a particular module?

› How many defects remain in a particular module after a release?

› At a particular time, what portion of resources should be devoted to testing as opposed to developing new features?

How to answer the previous questions:

› By building a model

Purpose of modelling

› Predict, explain, discover, guide

Modelling is a two-step process:

› Determine the set of variables of interest

› Determine the set of equations that describe how these variables change and which kinds of relations exist among them

Software development is an iterative process.

The process of software development can be described by the phrase "evolution of software".

The phrase implies similarities with the evolution of biological systems.

How can we use this fact? This is the central idea of this presentation.

Bio-inspired optimization algorithms:

› Genetic algorithm – motivated by Darwin's principle of natural selection

› Memetic algorithm – motivated by genetic evolution combined with experience

› Particle swarm optimization – motivated by the behavior of a flock of birds

› Ant colony optimization – motivated by the behavior of an ant colony

› Bee-inspired optimization, shuffled frog leaping algorithm, etc.

Artificial neural networks

› A computational structure inspired by the central nervous system

Artificial immune system

› A computationally intelligent system inspired by the principles of the immune system

Capture-recapture bug-estimation method

› A statistical method developed for population estimation in bio-ecosystems
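The capture-recapture idea can be sketched with the Lincoln-Petersen estimator, the simplest estimator of this family. The slides do not say which estimator is meant, so this choice is an assumption, and the function name and bug IDs below are hypothetical. Two independent testers inspect the same module; the overlap between their findings plays the role of the recaptured animals.

```python
def lincoln_petersen(found_by_a: set, found_by_b: set) -> float:
    """Estimate the total number of defects from two independent passes.

    N is approximated by n_a * n_b / m, where m is the number of defects
    found by both passes.
    """
    overlap = len(found_by_a & found_by_b)
    if overlap == 0:
        raise ValueError("no overlap between the passes; estimate is undefined")
    return len(found_by_a) * len(found_by_b) / overlap


# Hypothetical bug IDs reported by two independent testers.
tester_a = {"B1", "B2", "B3", "B4", "B5", "B6"}
tester_b = {"B4", "B5", "B6", "B7", "B8"}

estimated_total = lincoln_petersen(tester_a, tester_b)
found_so_far = len(tester_a | tester_b)
estimated_remaining = estimated_total - found_so_far
print(estimated_total, found_so_far, estimated_remaining)
```

The estimate assumes that both passes are independent and that every defect is equally likely to be found, which rarely holds exactly in practice.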

Rapid release (RR) methodology: accepted by most major software companies (Google, Facebook, Mozilla and Microsoft)

Main principle: abandon long development cycles in favor of faster releases in order to bring the latest features and fixes to end-users

Information for deciding whether software is ready for a new release:

› The relationship between the number of bugs in the system (b(t)) and the effort necessary to resolve them (e(t)).

› The more effort is spent on resolving defects, the less time is left for developing new features.

O1 – An increase (or decrease) in e(t) leads to a decrease (or increase) in b(t):

$e(t)\uparrow \;\Rightarrow\; b(t)\downarrow, \qquad e(t)\downarrow \;\Rightarrow\; b(t)\uparrow$

O2 – An increase (or decrease) in b(t) requires an increase (or decrease) in e(t):

$b(t)\uparrow \;\Rightarrow\; e(t)\uparrow, \qquad b(t)\downarrow \;\Rightarrow\; e(t)\downarrow$

O3 – As a result of the previous two observations, both b(t) and e(t) exhibit periodic oscillations.

O4 – The relationship from O1 and O2 is not linear.

O5 – A small increase in effort can lead to a significant reduction in the number of defects.

O6 – Pareto principle: the majority of time is spent on a small number of difficult bugs (approximately, removal of 30% of bugs requires 70% of time).

O7 – Code churn exhibits a growth rate that can be modeled by a sigmoid function.

O8 – b(t) increases more steeply than it decreases.

O9 – e(t) decreases more steeply than it increases.
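O7 can be illustrated with a logistic curve, one common sigmoid. The sketch below takes one possible reading of O7, namely that cumulative churn follows ceiling / (1 + exp(-rate * (t - midpoint))). The function name and every parameter value are hypothetical and not fitted to any project.

```python
import math


def logistic_churn(t, ceiling, rate, midpoint):
    """Cumulative code churn at time t under an assumed logistic curve."""
    return ceiling / (1.0 + math.exp(-rate * (t - midpoint)))


# Hypothetical values: 10,000 changed lines in total, inflection at week 12.
CEILING, RATE, MIDPOINT = 10_000.0, 0.5, 12.0
weekly_churn = [logistic_churn(week, CEILING, RATE, MIDPOINT) for week in range(25)]
```

Churn grows slowly at first, fastest around the midpoint, and then levels off as it approaches the ceiling.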

Observations O1-O9 imply similarity with a predator-prey ecosystem:

› Predator – tester, programmer

› Prey – bugs

The most famous predator-prey model, the Lotka-Volterra (LV) model, is expressed by:

$$\frac{db(t)}{dt} = \alpha b - \beta e b$$

$$\frac{de(t)}{dt} = -\gamma e + \delta b e$$

where α, β, γ and δ are positive constants.

Main characteristics:

› In the case e(t)=0 (no hunt for bugs), b(t) grows exponentially.

› The rate at which developers/testers detect and remove bugs is proportional to the number of bugs and to the effort invested in bug reduction (βeb). Intuitively, the more bugs there are in the system, the easier it is to detect and eliminate some of them, and the more effort is invested in bug reduction, the more bugs are removed.

› In the absence of bugs (b(t)=0), effort decays exponentially to zero.

› The rate at which the effort grows is proportional to the rate at which the developers/testers encounter bugs.
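The characteristics above can be checked numerically. The sketch below integrates the classical Lotka-Volterra equations, db/dt = alpha*b - beta*e*b and de/dt = -gamma*e + delta*b*e, with a hand-written fourth-order Runge-Kutta step. Only beta is named in the slides; alpha, gamma and delta follow the usual textbook convention, and all numeric values are illustrative assumptions rather than fitted parameters.

```python
import math


def lv_rhs(b, e, alpha, beta, gamma, delta):
    """Right-hand side of the classical LV model (b = bugs, e = effort)."""
    db = alpha * b - beta * e * b
    de = -gamma * e + delta * b * e
    return db, de


def rk4_step(f, b, e, dt, *params):
    """Advance (b, e) by one classical fourth-order Runge-Kutta step."""
    k1b, k1e = f(b, e, *params)
    k2b, k2e = f(b + 0.5 * dt * k1b, e + 0.5 * dt * k1e, *params)
    k3b, k3e = f(b + 0.5 * dt * k2b, e + 0.5 * dt * k2e, *params)
    k4b, k4e = f(b + dt * k3b, e + dt * k3e, *params)
    b_next = b + dt * (k1b + 2 * k2b + 2 * k3b + k4b) / 6.0
    e_next = e + dt * (k1e + 2 * k2e + 2 * k3e + k4e) / 6.0
    return b_next, e_next


def simulate_lv(b0, e0, params, dt=0.001, steps=20000):
    """Return the list of (b, e) states visited, including the initial one."""
    b, e = b0, e0
    trajectory = [(b, e)]
    for _ in range(steps):
        b, e = rk4_step(lv_rhs, b, e, dt, *params)
        trajectory.append((b, e))
    return trajectory


def lv_invariant(b, e, alpha, beta, gamma, delta):
    """Quantity that stays constant along every exact LV trajectory."""
    return delta * b - gamma * math.log(b) + beta * e - alpha * math.log(e)


# Illustrative values only: (alpha, beta, gamma, delta).
PARAMS = (1.0, 0.5, 0.8, 0.2)
trajectory = simulate_lv(10.0, 1.0, PARAMS)
bugs = [b for b, _ in trajectory]
print(min(bugs), max(bugs))
```

With these values the equilibrium is b = gamma/delta = 4 and e = alpha/beta = 2, and the simulated bug count keeps cycling around it. Because the invariant is conserved, the orbit never settles, and any perturbation moves the system onto a different closed orbit for good.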

Issues with the presented model:

› Extremely sensitive to small perturbations

› Allows unlimited exponential growth of the number of bugs

› Assumes an unlimited ability of a single developer/tester to detect and eliminate bugs

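Improvement of the LV model that resolves the previous issues:

The number of bugs is limited by the size and complexity of the project, which is specified by the parameter K.

The rate at which a single developer/tester detects and removes bugs is limited.

$$\frac{db(t)}{dt} = \alpha b\left(1 - \frac{b}{K}\right) - \beta \frac{b}{k + b}\, e$$

$$\frac{de(t)}{dt} = -\gamma e + \delta \frac{b}{k + b}\, e$$

Here the term b/(k + b) saturates as b grows, which caps the rate at which a single developer/tester removes bugs; α, β, γ and δ are positive rate constants.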

The relationship among the parameters defines two main regions.

The model allows regular oscillations (suitable for the beginning and the middle of a project) as well as damped oscillations (suitable for a project near completion).
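The two regions can be explored with a short simulation. The sketch below assumes the improved model has the form db/dt = alpha*b*(1 - b/K) - beta*b*e/(k + b) and de/dt = -gamma*e + delta*b*e/(k + b), which is a reconstruction, and it uses the standard linear-stability condition for this class of models: the coexistence equilibrium b* = gamma*k/(delta - gamma) is stable when b* > (K - k)/2 and unstable otherwise. Whether this matches the boundary used in the slides is not confirmed. All names and parameter values are hypothetical.

```python
def improved_rhs(b, e, p):
    """Improved model: logistic bug growth and saturating bug removal."""
    removal = b / (p["k"] + b) * e
    db = p["alpha"] * b * (1.0 - b / p["K"]) - p["beta"] * removal
    de = -p["gamma"] * e + p["delta"] * removal
    return db, de


def equilibrium(p):
    """Coexistence point where both derivatives vanish (needs delta > gamma)."""
    b_star = p["gamma"] * p["k"] / (p["delta"] - p["gamma"])
    e_star = p["alpha"] / p["beta"] * (1.0 - b_star / p["K"]) * (p["k"] + b_star)
    return b_star, e_star


def regime(p):
    """Classify the long-run behaviour from the sign of the Jacobian trace."""
    b_star, _ = equilibrium(p)
    return "damped" if b_star > (p["K"] - p["k"]) / 2.0 else "sustained"


def simulate(p, b0, e0, dt=0.01, steps=40000):
    """Integrate with classical RK4 and return the bug counts over time."""
    b, e = b0, e0
    bugs = [b]
    for _ in range(steps):
        k1 = improved_rhs(b, e, p)
        k2 = improved_rhs(b + 0.5 * dt * k1[0], e + 0.5 * dt * k1[1], p)
        k3 = improved_rhs(b + 0.5 * dt * k2[0], e + 0.5 * dt * k2[1], p)
        k4 = improved_rhs(b + dt * k3[0], e + dt * k3[1], p)
        b += dt * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6.0
        e += dt * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6.0
        bugs.append(b)
    return bugs


def late_amplitude(bugs):
    """Peak-to-peak swing of b(t) over the second half of the run."""
    tail = bugs[len(bugs) // 2:]
    return max(tail) - min(tail)


# Illustrative values only; the two cases differ in the bug capacity K.
BASE = {"alpha": 1.0, "beta": 1.0, "gamma": 0.5, "delta": 1.0, "k": 10.0}
small_K = dict(BASE, K=20.0)
large_K = dict(BASE, K=100.0)

amplitudes = {}
for label, params in (("small_K", small_K), ("large_K", large_K)):
    amplitudes[label] = late_amplitude(simulate(params, b0=5.0, e0=5.0))
    print(label, regime(params), amplitudes[label])
```

In this sketch the small-capacity case spirals into its equilibrium, so the late swing of b(t) shrinks toward zero, while the large-capacity case keeps oscillating. Unlike the plain LV model, the number of bugs stays bounded in both cases.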

Conclusions:

› The model was evaluated on a real-life, small-size project developed under the RR methodology

› The model captures observations O1-O9 fairly accurately

› Future work: investigate whether the results can be generalized to describe projects with different characteristics.

Normalized values of e(t) and b(t)
