supporting software integration activities with fine ... · untangling fine-grained code changes...
TRANSCRIPT
Supporting Software Integration Activities
with Fine-grained Code Changes
Martín Dias
Advisor: Stéphane Ducasse
Co-Advisor: Damien Cassou
Research team: RMoD - Inria Lille
Begin: November 30th, 2012
End: November 29th, 2015
Fundings: INRIA doctoral grant
Software Integration
IntegratorDevelopers
Branch
MergeCommit
? $@%&!
Text-based
Semantical Conflicts
Impacts
Cherry-picking
1
2
2+𝝙
Integrator
1+𝝙
Developers
ah crotte!(biiiip)
• Who is the owner of this changed code?
• Which entities (e.g. classes, methods) have been changed?
• What is the intention of this commit?
• What bug fixes also affected the entities impacted by this change?
• Does this commit depend on previous ones?
Authorship
Code structure
Change intention
Bug tracking
Change sequence
Do Tools Support Code Integration? A Survey
(1): RMoD Inria Lille–Nord Europe, University of Lille — CRIStAL, France(2): Norizzk.com, Belgium
(Under submission to Journal of Technology)
Martín Dias (1), Stéphane Ducasse (1), Damien Cassou (1), Verónica Uquillas-Gómez (2)
RQ What is the importance and tools support of each question?
RQ What questions do integrators ask?
➡ Open call in 3 development mailing-lists➡ Literature survey
➡ Survey experts (42 integrators)
Impact (ripple effects) ?
Tangled changes?
Understanding Change Impact
Understanding Change Dependencies when Cherrypicking
Understanding Change Scattering
Most important questions without tool support
Untangling Fine-Grained Code Changes
(1): RMoD Inria Lille–Nord Europe, University of Lille — CRIStAL, France(2): SORCERERS @ Software Engineering Research Group, Delft University of Technology, The Netherlands (3): Digital Security Group, Radboud Universiteit Nijmegen, The Netherlands
(SANER’15)
Martín Dias (1), Alberto Bacchelli (2), Georgios Gousios (3), Damien Cassou (1), Stéphane Ducasse (1)
VCS
commit
t
Feature #6
Fix Bug
Feature #6
CH1 A.toString()
CH2 B.getFoo()
CH3 B.toString()
CH4 C.getBar()
CH5 B.toString()
Tangled
Development
VCS
commit
CH1 A.toString()
CH2 B.getFoo()
CH3 B.toString()
CH4 C.getBar()
CH5 B.toString()-1
+1
Integration
VCS
commit
CH1 A.toString()
CH2 B.getFoo()
CH3 B.toString()
CH4 C.getBar()
CH5 B.toString()-1
+1
Integration
VCS
commit
commit
CH1
CH4
CH5
CH2
CH3
TangledCH1
CH4
CH5
CH2
CH3
untangler tool
Herzig and Zeller (MSR 2013)
VCS repositories of 6 Java projects
Tangled commits: 20%
Untangling algorithm using features of code changes
Features of code changes?
CH4
CH5
CH1
CH3
CH1
CH2
Pair
…
…
…
…
…
…
Linear Regression
…
Call Graph Distance
…
File Distance
…
CH1
CH4
CH5
CH2
CH3
CH1
CH4
CH5
CH2
CH3
CH4
CH5
CH1
CH3
CH1
CH2
… ……… …
…
…
…
Herzig and Zeller (MSR 2013)
Limitations
dynamically-typed languages
light static analysis
t
CH1
CH2
CH3 B.toString()
CH4
CH5 B.toString()
t
CH1
CH2
CH3 B.toString()
CH4
CH5 B.toString()
t
CH1
CH2
CH3 B.toString()
CH4
CH5 B.toString()
Shadowing
Negara et al. (ECOOP’12)
“We found that 37% of code changes are shadowed by other changes, and are not
stored in VCS.”
t
CH1
CH2
CH3
CH4
CH5
t
CH1
CH2
CH3
CH4
CH5
Timestamps
t
CH1
CH2
CH3
CH4
CH5
Timestamps
VCSs don’t have this information
t
CH1
CH2
CH3
CH4
CH5
unit test☑
unit test☒
Test Runner
Activity
t
CH1
CH2
CH3
CH4
CH5
unit test☑
unit test☒
Test Runner
Activity
Overcoming such limitations
Epicea
fine-grained code changes & IDE events logging
plugin
A.toString()
B.getFoo()
B.toString()
C.getBar()
B.toString()
t
CH1
CH2
CH3
CH4
CH5
unit test☒
unit test☑
💡
Fine-Grained
Code Changes
Epicea Model: Events
Epicea Model: Code Changes
Epicea Log Browser
pluginA.toString()
B.getFoo()
B.toString()
C.getBar()
B.toString()
t
CH1
CH2
CH3
CH4
CH5
unit test☒
unit test☑☑
☒
💡
untangler tool
Fine-Grained
Code Changes
Epicea Untangler
CH1
CH4
CH5
CH2
CH3
CH1
CH4
CH5
CH2
CH3
CH4
CH5
CH1
CH3
CH1
CH2
… ……… …
…
…
…
any binary classifier+ fine-grained
features
A.toString()
B.getFoo()
B.toString()
C.getBar()
B.toString()
t
CH1
CH2
CH3
CH4
CH5
unit test☒
unit test☑
CH4
CH5
CH2
CH3
CH1
CH3
Pair
…
…
Ordered distance
2
1
1
…
…
Time difference
300”
10”
10”
…
…
Same test run
F
T
F
…
…
Fine-Grained
Code Changes
Epicea Untangler: Features
Static Code Analysis
same class same package same method name # shared variable accesses # shared method calls # shared variable accesses in delta # shared method calls in delta # variable accesses reciprocal method calls both cosmetic changes
Fine-grained Code Change Analysis
ordered distance timestamp difference same test run
Epicea Untangler: Features
binary logistic regression naïve bayes random forests
Epicea Untangler: Classifiers
different assumptions on underlaying data and model
Epicea Untangler
CH1
CH4
CH5
CH2
CH3
CH1
CH4
CH5
CH2
CH3
CH4
CH5
CH1
CH3
CH1
CH2
… ……… …
…
…
…
binary classifier+ fine-grained
features
Which features are dominant?
Most effective classifier?RQRQ
4 monthsx 2
plugin
☑
☒
Manual Untangling
http://dx.doi.org/10.6084/m9.figshare.1241571
Published
training testing
same
crossed
combined
Most effective classifier?RQ
1
2
AUC ACC PREC REC F.MEASURE G.MEAN
binary logistic regression 0.92 0.68 0.43 0.96 0.60 0.76
naïve bayes 0.88 0.65 0.41 0.94 0.57 0.73
random forests 0.99 0.96 0.96 0.88 0.92 0.93
1 1
1 2
1 12
Which features are dominant?RQ
AUC ACC PREC REC F.MEASURE G.MEAN
binary logistic regression 0.92 0.68 0.43 0.96 0.60 0.76
naïve bayes 0.88 0.65 0.41 0.94 0.57 0.73
random forests 0.99 0.96 0.96 0.88 0.92 0.93
random forests w/ 0.98 0.95 0.96 0.82 0.88 0.90
time difference
ordered distance
same class
dominant { Simple
features!
dominant
Which features are dominant?RQ
95% of accuracyCH x 200
97% of accuracyCH x 800
Only 2 days
of work!
time difference
ordered distance
same class
dominant { Simple
features!
CH1
CH4
CH5
CH2
CH3
CH1
CH4
CH5
CH2
CH3
CH4
CH5
CH1
CH3
CH1
CH2
… ……… …
…
…
…
RQ Is it effective with new data from real users?
Good with
initial data
plugin EpiceaUntangler
manual
sorting
2 weeksx 6
Is it effective with new data?RQ
➡Median success rate: 91%
➡Qualitative feedback:
• “It works good in many cases, especially for not so big change sets”
• “It was a bit painful to check everything”
# successfully clustered changes # changes
Success rate =
Conclusion
Supporting Software Integration Activities with Fine-grained Code Changes
Martín Dias, RMoD Inria-Lille — University of Lille 1, Cristal
IntegratorDevelopers
Branch
MergeCommit