Ants, Mutants and Beyond - BCS SGAI
Nature-inspired Computing Search Based Software Engineering Genetic Improvement Automatic Improvement References References
“Ants, Mutants and Beyond”
Combining formal and stochastic techniques to improve software
October 12, 2015
Overview
1 Nature-inspired Computing.
2 Search Based Software Engineering (SBSE).
3 Genetic Improvement (GI).
4 Automatic Improvement (AIP).
5 Industry perspective on SBSE — Douglas Carson (Keysight).
6 Hylas and Antbox — Zoltan A. Kocsis (U. Stirling).
Naturally-inspired Computing
Some classes of problem are well-understood and have efficient solution methods (e.g. the simplex algorithm for problems with linear objectives/constraints).
However, in many cases we only have an operational description of the problem: calculating derivatives to drive a ‘conventional’ optimization method such as Newton’s method may not be easy (or even possible).
In such cases, it is common to turn to a variety of ‘nature-inspired’ metaheuristic search methods: Genetic Algorithms, Ant-Colony Optimization etc.
General problem-solving with Metaheuristics
Metaheuristics are stochastic (a form of ‘generate-and-test’) and require only a few problem-specific ingredients:
Solution representation: e.g. list of cities for the TSP.
Objective function: measures the quality of a solution (e.g. tour length).
Search operators: some means of mutating or (re)combining solutions to produce a new solution (e.g. swapping two cities).
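As a minimal sketch of how the three ingredients above fit together, the following Java class applies a simple stochastic hill-climber to a tiny TSP instance. The class name, the city coordinates and the choice of hill-climbing (rather than any specific metaheuristic from the talk) are assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.Random;

// Illustrative sketch only: names and the hill-climbing strategy are assumptions.
class TspHillClimber {

    // Objective function: length of the closed tour visiting cities in the given order.
    static double tourLength(int[] tour, double[][] xy) {
        double total = 0.0;
        for (int i = 0; i < tour.length; i++) {
            double[] a = xy[tour[i]];
            double[] b = xy[tour[(i + 1) % tour.length]];
            total += Math.hypot(a[0] - b[0], a[1] - b[1]);
        }
        return total;
    }

    // Search operator: a copy of the tour with two randomly chosen cities swapped.
    static int[] swapTwoCities(int[] tour, Random rng) {
        int[] next = tour.clone();
        int i = rng.nextInt(next.length);
        int j = rng.nextInt(next.length);
        int tmp = next[i];
        next[i] = next[j];
        next[j] = tmp;
        return next;
    }

    // Generate-and-test loop: keep a mutated tour only if it is no longer than the current one.
    static int[] search(double[][] xy, int iterations, long seed) {
        Random rng = new Random(seed);
        // Solution representation: a permutation of city indices.
        int[] best = new int[xy.length];
        for (int i = 0; i < best.length; i++) best[i] = i;
        double bestLength = tourLength(best, xy);
        for (int it = 0; it < iterations; it++) {
            int[] candidate = swapTwoCities(best, rng);
            double length = tourLength(candidate, xy);
            if (length <= bestLength) {
                best = candidate;
                bestLength = length;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[][] xy = {{0, 0}, {1, 1}, {1, 0}, {0, 1}};
        int[] tour = search(xy, 1000, 42L);
        System.out.println(Arrays.toString(tour) + " length " + tourLength(tour, xy));
    }
}
```

Only `tourLength` and `swapTwoCities` are specific to the TSP; the loop in `search` would be unchanged for any other representation.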
Genetic Algorithms/Genetic Programming - I
Idea: generate/breed solutions and apply ‘survival of the fittest’.
John H. Holland (1929-2015) John Koza
Genetic Algorithms/Genetic Programming - II
Search Based Software Engineering
There is a recent trend to tackle problems in software engineering using metaheuristics:
Requirements Selection:
  Objective function: cost, value, ...
  Representation: bitvector of requirements.
Test case prioritization:
  Objective function: rate of coverage, time, faults, ...
  Representation: permutation of the test suite.
Program Synthesis:
  Objective function: correctness, speed, power, memory ...
  Representation: proof trees; expression trees; source code.
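To make the first formulation concrete, here is a small sketch of requirements selection with a bitvector representation. The budget constraint, the example figures and the use of exhaustive enumeration in place of a metaheuristic are all assumptions for illustration.

```java
import java.util.Arrays;

// Illustrative sketch: the budget constraint and all figures are assumptions.
class RequirementsSelection {

    // Objective function: total value of the selected requirements,
    // or -1 if their total cost exceeds the budget (infeasible selection).
    static int objective(boolean[] selected, int[] cost, int[] value, int budget) {
        int totalCost = 0;
        int totalValue = 0;
        for (int i = 0; i < selected.length; i++) {
            if (selected[i]) {
                totalCost += cost[i];
                totalValue += value[i];
            }
        }
        return totalCost <= budget ? totalValue : -1;
    }

    // Enumerates every bitvector. This is only feasible for a handful of
    // requirements, which is why metaheuristics are used on realistic instances.
    static boolean[] bestSelection(int[] cost, int[] value, int budget) {
        int n = cost.length;
        boolean[] best = new boolean[n];
        int bestScore = objective(best, cost, value, budget);
        for (int mask = 1; mask < (1 << n); mask++) {
            boolean[] candidate = new boolean[n];
            for (int i = 0; i < n; i++) candidate[i] = (mask & (1 << i)) != 0;
            int score = objective(candidate, cost, value, budget);
            if (score > bestScore) {
                best = candidate;
                bestScore = score;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        int[] cost = {4, 3, 2};
        int[] value = {10, 7, 4};
        // With a budget of 5, requirements 1 and 2 together (value 11) beat requirement 0 alone (value 10).
        System.out.println(Arrays.toString(bestSelection(cost, value, 5))); // prints [false, true, true]
    }
}
```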
Pragmatics of Program Synthesis
Scalability remains an issue for program synthesis:
We don’t yet know how to generate sizeable algorithms from scratch.
Generative approaches such as GP still work best at the scale of expressions ...
... but human ingenuity already provides a vast repertoire of (abstract) algorithms and (concrete) programs ...
Templar - Improving algorithms
The ‘Template Method’ Design Pattern¹ divides an algorithm into a fixed skeleton and some variants.
The fixed parts orchestrate the behaviour of the variants.
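Example: Quicksort performance depends on the pivot function, so we can treat it as a variant:

    import java.util.stream.DoubleStream;

    // Variant: the implementation of pivotFn can be varied generatively.
    // It must return an element of arr, otherwise the recursion need not terminate.
    abstract double pivotFn(double[] arr);

    // Fixed skeleton (a member of the same abstract class as pivotFn).
    double[] qsort(double[] arr) {
        if (arr.length <= 1) return arr; // base case
        double pivot = pivotFn(arr);
        DoubleStream less = DoubleStream.of(qsort(DoubleStream.of(arr).filter(x -> x < pivot).toArray()));
        DoubleStream equal = DoubleStream.of(arr).filter(x -> x == pivot);
        DoubleStream greater = DoubleStream.of(qsort(DoubleStream.of(arr).filter(x -> x > pivot).toArray()));
        return DoubleStream.concat(DoubleStream.concat(less, equal), greater).toArray();
    }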
Expressing algorithms as templates allows us to learn good implementations for the variant parts.
¹ [Gamma, Helm, et al. 1995].
‘Template Method Hyper-heuristics’³
Templar² is a Java™ framework designed to make the generation of customized algorithms as simple as possible.
Ingredients:
A list of variation points describing the parts of the algorithm to be automatically generated.
An algorithm template expressing the algorithm skeleton. The template produces a customized version of the algorithm from automatically-generated implementations of the variation points.
An objective function to evaluate the customized algorithm.
An algorithm factory that searches the space of variation points to produce an optimized version of the algorithm.
² [Swan and Burles 2015].
³ [Woodward and Swan 2014].
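The four ingredients can be sketched in plain Java as follows. None of the names below are taken from the Templar API; they are hypothetical. The objective (elements scanned during partitioning) is an assumed stand-in for a measured quantity such as energy, and the "factory" merely picks the best of a fixed candidate list rather than searching generatively.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.ToDoubleFunction;

// Hypothetical sketch: none of these names are taken from the Templar API.
class TemplateSketch {

    // Work counter used by the objective function (assumed proxy for cost).
    static long partitionSteps;

    // Algorithm template: quicksort skeleton with the pivot choice as the variation point.
    // The pivot function must return an element of the array it is given.
    static double[] qsort(double[] arr, ToDoubleFunction<double[]> pivotFn) {
        if (arr.length <= 1) return arr;
        double pivot = pivotFn.applyAsDouble(arr);
        partitionSteps += arr.length;
        double[] less = Arrays.stream(arr).filter(x -> x < pivot).toArray();
        double[] equal = Arrays.stream(arr).filter(x -> x == pivot).toArray();
        double[] greater = Arrays.stream(arr).filter(x -> x > pivot).toArray();
        double[] left = qsort(less, pivotFn);
        double[] right = qsort(greater, pivotFn);
        double[] out = new double[arr.length];
        System.arraycopy(left, 0, out, 0, left.length);
        System.arraycopy(equal, 0, out, left.length, equal.length);
        System.arraycopy(right, 0, out, left.length + equal.length, right.length);
        return out;
    }

    // Objective function: work done when sorting a training input (lower is better).
    static long cost(ToDoubleFunction<double[]> pivotFn, double[] input) {
        partitionSteps = 0;
        qsort(input, pivotFn);
        return partitionSteps;
    }

    // "Factory": returns the cheapest implementation of the variation point from a fixed list.
    static ToDoubleFunction<double[]> best(List<ToDoubleFunction<double[]>> candidates, double[] input) {
        ToDoubleFunction<double[]> winner = candidates.get(0);
        for (ToDoubleFunction<double[]> c : candidates) {
            if (cost(c, input) < cost(winner, input)) winner = c;
        }
        return winner;
    }

    public static void main(String[] args) {
        double[] sorted = {1, 2, 3, 4, 5, 6, 7, 8};
        ToDoubleFunction<double[]> first = a -> a[0];
        ToDoubleFunction<double[]> middle = a -> a[a.length / 2];
        System.out.println(cost(first, sorted) + " vs " + cost(middle, sorted)); // prints "35 vs 17"
    }
}
```

On already-sorted input the first-element pivot degenerates, which is the kind of difference an objective function lets the search detect.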
Hyper-quicksort: Optimizing for energy consumption
[Figure: energy consumption in Joules (log2 scale) against input array size (log2 scale, 8 to 2048) for four pivot strategies: Mid, Sedgewick, Random and Hyper-quicksort.]
Hyper-Quicksort - Results
Array size  Middle index  Sedgewick  Random index  Hyper-quicksort
         8         0.191      0.163         0.446            0.094
        16         0.296      0.345         0.410            0.173
        32         0.651      0.757         0.967            0.410
        64         1.366      1.145         1.708            0.976
       128         3.505      4.034         5.221            2.341
       256         8.175      7.646         9.269            6.387
       512        19.777     21.391        27.685           15.268
      1024        62.961     42.508        41.245           33.012
      2048       198.438    132.663       111.894           70.234
Energy consumption (Joules) against input array size
Genetic Improvement (GI)
Addresses scalability by applying (a variant of) GP to pre-existing programs. It can be used to:
Fix bugs⁴ (maintenance dominates software lifecycle cost).
Obtain a multi-objective trade-off between Non-Functional Properties⁵ (NFPs).
Optimize/improve functional properties⁶.
⁴ [Le Goues, Forrest, et al. 2013].
⁵ [Harman, Langdon, et al. 2012].
⁶ [Burles, Swan, et al. 2015].
Gen-O-Fix - Improving programs
A Scala framework for self-improving software systems⁷, i.e. it can improve a system as it runs.
Performs both GIP and GP, rather than ‘plastic surgery’.
Tight integration of compiler and improvement mechanism (via reflection) is more efficient and less brittle than existing approaches⁸.
Callbacks to newly-generated functionality can also be injected into legacy Java code.
Uses an actor-based approach for executing program variants.
⁷ [Swan, Epitropakis, et al. 2014].
⁸ [Langdon and Harman 2013]; [Le Goues, Nguyen, et al. 2012].
Why Scala?
Supports expressive webservice frameworks (‘Hello, Web’ in 6 lines of code).
Increasingly popular for concurrency support (Twitter core rewritten in Scala).
Extensively used in industry:
Gen-O-Fix System Diagram
Gen-O-Fix example: Hotswapping webserver code
A stock-price predictor for shares⁹ in David Bowie¹⁰.
Achieved via univariate symbolic regression ...
... of a function extracted from web-application source code.
⁹ Not actual shares.
¹⁰ Not the actual David Bowie.
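Univariate symbolic regression searches for an expression in one variable that fits sampled data. The sketch below shows the idea with small expression trees and a mean-squared-error objective. It uses plain random search as an assumed stand-in for GP, does not reproduce Gen-O-Fix, and all names and data are hypothetical.

```java
import java.util.Random;

// Illustrative sketch: random search over small expression trees, used here as an
// assumed stand-in for GP-based symbolic regression. This is not Gen-O-Fix itself.
class SymbolicRegressionSketch {

    // A univariate expression: maps x to a value.
    interface Expr {
        double eval(double x);
    }

    static Expr x() { return v -> v; }
    static Expr constant(double c) { return v -> c; }
    static Expr add(Expr a, Expr b) { return v -> a.eval(v) + b.eval(v); }
    static Expr mul(Expr a, Expr b) { return v -> a.eval(v) * b.eval(v); }

    // Objective function: mean squared error over the sample points.
    static double mse(Expr e, double[] xs, double[] ys) {
        double sum = 0.0;
        for (int i = 0; i < xs.length; i++) {
            double d = e.eval(xs[i]) - ys[i];
            sum += d * d;
        }
        return sum / xs.length;
    }

    // Generates a random expression tree of at most the given depth.
    static Expr randomExpr(Random rng, int depth) {
        if (depth == 0 || rng.nextInt(3) == 0) {
            return rng.nextBoolean() ? x() : constant(rng.nextInt(5));
        }
        Expr left = randomExpr(rng, depth - 1);
        Expr right = randomExpr(rng, depth - 1);
        return rng.nextBoolean() ? add(left, right) : mul(left, right);
    }

    // Generate-and-test: keep the candidate with the lowest error seen so far.
    static Expr search(double[] xs, double[] ys, int tries, long seed) {
        Random rng = new Random(seed);
        Expr best = constant(0);
        double bestError = mse(best, xs, ys);
        for (int i = 0; i < tries; i++) {
            Expr candidate = randomExpr(rng, 3);
            double error = mse(candidate, xs, ys);
            if (error < bestError) {
                best = candidate;
                bestError = error;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[] xs = {-2, -1, 0, 1, 2};
        double[] ys = {5, 2, 1, 2, 5}; // samples of x*x + 1
        Expr fitted = search(xs, ys, 10000, 1L);
        System.out.println("error of fitted expression: " + mse(fitted, xs, ys));
    }
}
```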
Automatic Improvement
AIP versus Formal Methods
Automatic Improvement Programming (AIP) provides (some of) the benefits of formal methods without the difficulty of writing formal specs.
Formal Methods:
  Require highly specialized developers.
  Hard to write large programs.
AIP:
  Doesn’t need formal specs or tools with a steep learning curve.
  Has no scalability issues since we search for transformable patterns in the source code.
AIP versus GI
Metaheuristic approaches to program improvement (e.g. GI):
  Rely heavily on random perturbation/recombination.
  Can degrade program structure/correctness/explanatory power.
AIP:
  Can use semantics-preserving transformations.
  Can come with an asymptotic guarantee of superiority.
Automatic Improvement Programming - 1
Well-known that implementing hashCode in Java is (often fatally) error-prone.
Our analysis revealed 487 incorrect implementations in Apache Hadoop¹¹.
We repaired Hadoop by correcting hashCode implementations (semantics-preserving), whilst simultaneously improving on the efficiency of the incorrect version (generative).
¹¹ [Kocsis, Neumann, et al. 2014].
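For readers unfamiliar with the contract involved: Java requires that objects which are equal under `equals` return the same `hashCode`. The sketch below shows one common way to violate the contract and a repaired version. Both classes are hypothetical and are not taken from Hadoop or from the cited study.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical classes for illustration; they are not taken from Hadoop.
class HashCodeContract {

    // Broken: equals() is overridden but hashCode() is inherited from Object,
    // so two equal keys will usually land in different hash buckets.
    static final class BrokenKey {
        final String host;
        final int port;
        BrokenKey(String host, int port) { this.host = host; this.port = port; }
        @Override public boolean equals(Object o) {
            if (!(o instanceof BrokenKey)) return false;
            BrokenKey other = (BrokenKey) o;
            return port == other.port && host.equals(other.host);
        }
    }

    // Repaired: hashCode() is computed from exactly the fields that equals() compares.
    static final class FixedKey {
        final String host;
        final int port;
        FixedKey(String host, int port) { this.host = host; this.port = port; }
        @Override public boolean equals(Object o) {
            if (!(o instanceof FixedKey)) return false;
            FixedKey other = (FixedKey) o;
            return port == other.port && host.equals(other.host);
        }
        @Override public int hashCode() { return Objects.hash(host, port); }
    }

    public static void main(String[] args) {
        Set<FixedKey> fixed = new HashSet<>();
        fixed.add(new FixedKey("node1", 8020));
        System.out.println(fixed.contains(new FixedKey("node1", 8020))); // prints true

        Set<BrokenKey> broken = new HashSet<>();
        broken.add(new BrokenKey("node1", 8020));
        // Typically prints false, even though the two keys are equal.
        System.out.println(broken.contains(new BrokenKey("node1", 8020)));
    }
}
```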
Automatic Improvement Programming - 2
PolyFunic uses Category Theory to replace stochastic Gen-O-Fix mutation operators with catamorphisms:
This guarantees that the mutation is semantics-preserving.
Trial study obtained an asymptotic improvement in efficiency¹² (O(n) to O(1)).
[Figure: performance comparison of naïve and AIP-optimized code.]
¹² [Kocsis and Swan 2014].
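A catamorphism is a structural fold: it consumes a recursive data structure by replacing each constructor with a function. As a minimal sketch of the concept only (it does not reproduce PolyFunic or the transformation in the cited study), here is a fold over a hand-written cons-list; all names are hypothetical.

```java
import java.util.function.BiFunction;

// Illustrative sketch of a list catamorphism (fold); it does not reproduce PolyFunic.
class CataSketch {

    // An immutable singly linked list: either empty, or a head followed by a tail.
    static final class ConsList<A> {
        final A head;
        final ConsList<A> tail; // null marks the empty list
        private ConsList(A head, ConsList<A> tail) {
            this.head = head;
            this.tail = tail;
        }
        boolean isEmpty() { return tail == null; }
    }

    static <A> ConsList<A> nil() { return new ConsList<>(null, null); }

    static <A> ConsList<A> cons(A head, ConsList<A> tail) { return new ConsList<>(head, tail); }

    // The catamorphism: replaces the empty list by 'onNil' and every cons cell by 'onCons'.
    static <A, R> R cata(ConsList<A> list, R onNil, BiFunction<A, R, R> onCons) {
        if (list.isEmpty()) return onNil;
        return onCons.apply(list.head, cata(list.tail, onNil, onCons));
    }

    // Two functions expressed purely as catamorphisms.
    static <A> int length(ConsList<A> list) {
        return cata(list, 0, (head, rest) -> rest + 1);
    }

    static int sum(ConsList<Integer> list) {
        return cata(list, 0, (head, rest) -> head + rest);
    }

    public static void main(String[] args) {
        ConsList<Integer> xs = cons(1, cons(2, cons(3, CataSketch.<Integer>nil())));
        System.out.println(length(xs) + " " + sum(xs)); // prints "3 6"
    }
}
```

Because every function written this way follows the shape of the data type, it can be reasoned about and rewritten by equational laws instead of by random mutation.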
Real-World Impact
£80K research income from:
Dataductus: Funded the application of search combinators to hybrid search.
BT: Funding research studentship in adaptive scheduling.
Keysight: Funded development of Hylas (one of only 30 such grants worldwide).
References I
[1] Nathan Burles, Jerry Swan, Edward Bowles, Alexander E. I. Brownlee, Zoltan A. Kocsis, and Nadarajen Veerapen. “Object-Oriented Genetic Improvement for Improved Energy Consumption in Google Guava”. In: Search-Based Software Engineering - 7th International Symposium, SSBSE 2015, Bergamo, Italy, September 5-7, 2015, Proceedings. 2015, pp. 255–261. doi: 10.1007/978-3-319-22183-0_20.
[2] Jerry Swan and Nathan Burles. “Templar - A Framework for Template-Method Hyper-Heuristics”. In: Genetic Programming, LNCS 9025. Ed. by Penousal Machado et al. 2015. isbn: 978-3-319-16500-4.
[3] Z. A. Kocsis, G. Neumann, J. Swan, M. G. Epitropakis, A. E. I. Brownlee, S. O. Haraldsson, and E. Bowles. “Repairing and optimizing Hadoop hashCode implementations”. In: Symposium on Search-Based Software Engineering, Brazil, August 26 - 29. 2014.
[4] Zoltan A. Kocsis and Jerry Swan. “Asymptotic Genetic Improvement Programming with Type Functors and Catamorphisms”. In: Parallel Problem Solving from Nature - PPSN XIV - 13th International Conference, Ljubljana, Slovenia, September 13-17, 2014, Proceedings. Ed. by Jurij Silc. Lecture Notes in Computer Science. Springer, 2014.
References II
[5] Jerry Swan, Michael G. Epitropakis, and John R. Woodward. Gen-O-Fix: An embeddable framework for Dynamic Adaptive Genetic Improvement Programming. Tech. rep. CSM-195. Stirling FK9 4LA, Scotland: Computing Science and Mathematics, University of Stirling, 2014, pp. 1–12.
[6] John R. Woodward and Jerry Swan. “Template Method Hyper-heuristics”. In: Proceedings of the 2014 Conference Companion on Genetic and Evolutionary Computation Companion. GECCO Comp ’14. Vancouver, BC, Canada: ACM, 2014, pp. 1437–1438. isbn: 978-1-4503-2881-4. doi: 10.1145/2598394.2609843. url: http://doi.acm.org/10.1145/2598394.2609843.
[7] William B. Langdon and Mark Harman. “Optimising Existing Software with Genetic Programming”. In: IEEE Transactions on Evolutionary Computation (2013). Accepted. issn: 1089-778X. doi: 10.1109/TEVC.2013.2281544.
[8] Claire Le Goues, Stephanie Forrest, and Westley Weimer. “Current Challenges in Automatic Software Repair”. In: Software Quality Journal 21 (3 2013), pp. 421–443.
References III
[9] Mark Harman, William B. Langdon, Yue Jia, David Robert White, Andrea Arcuri, and John A. Clark. “The GISMOE challenge: constructing the pareto program surface using genetic programming to find better programs.” In: ASE. Ed. by Michael Goedicke, Tim Menzies, and Motoshi Saeki. ACM, 2012, pp. 1–14. isbn: 978-1-4503-1204-2.
[10] Claire Le Goues, ThanhVu Nguyen, Stephanie Forrest, and Westley Weimer. “GenProg: A Generic Method for Automatic Software Repair”. In: IEEE Transactions on Software Engineering 38 (2012), pp. 54–72. issn: 0098-5589. doi: http://doi.ieeecomputersociety.org/10.1109/TSE.2011.104.
[11] Erich Gamma, Richard Helm, Ralph Johnson, and John Vlissides. Design patterns: elements of reusable object-oriented software. Boston, MA, USA: Addison-Wesley Longman Publishing Co., Inc., 1995. isbn: 0-201-63361-2.