helping developers help themselves: automatic ...€¦ · how do software engineers understand code...
Post on 20-Jun-2020
3 Views
Preview:
TRANSCRIPT
H E L P I N G D E V E L O P E R S H E L P T H E M S E LV E S : A U T O M A T I C D E C O M P O S I T I O N O F C O D E R E V I E W C H A N G E S
M I K E B A R N E T T, C H R I S T I A N B I R D , A N D S H U V E N D U K . L A H I R I M I C R O S O F T R E S E A R C H
J O Ã O B R U N E T F E D E R A L U N I V E R S I T Y O F C A M P I N A G R A N D E ( U F C G ) , B R A Z I L
1
2
25,000 developers 100,000 reviews per month
CodeFlow
The code review process
The Problem
“Fixed #244 by adding inmethod binders in between properties and indexers.
Also refactors expression-bodied member semantic model tests into their own file
and adds some extra tests.”
3
Developers commit loosely related changes.
It is difficult to review unrelated changes at once.
4
It is difficult to review unrelated changes at once.
4
Literature said so “The more files and diffs the more rela3vely independent changes” [1]
“Par3cipants call for a tool that can automa3cally decompose a change into separate sub-‐changes“ [2]
Wouldn’t it be nice?
5
Wouldn’t it be nice?
ClusterChanges!
5
6
ClusterChanges
diff regions
7
ClusterChangesStandard set of diff regions
8
ClusterChangesSome of them are syntactically and semantically related.
9
ClusterChanges
Partial program analysis
Roslyn
9
ClusterChanges
Partial program analysis
Roslyn Entities (nodes)
methods, fields, properties etc
Relationships (edges) method calls, field access etc
def-use relationship
10
The relationships
def-use relationship
use$
defini&on(
10
The relationships
use-use relationship
11
The relationships
use-use relationship
defini&on(
use( use(
11
The relationships
12
ClusterChangesnon-trivial partition non-trivial partition trivial partitions
12
ClusterChangesnon-trivial partition non-trivial partition trivial partitions
Diffs within the same method are linked together. We don’t split method across partitions.
2-step EvaluationQuantitative Qualitative
Distribu3on of par33ons
1000 Bing and Office reviews
Manual inspec3on
100 reviews
Interviews: 20 developers from 13 projects
13
Quantitative: trivial partitions
Why?
14
Quantitative: trivial partitions
Why?
14
Quantitative: trivial partitions
Why?e.g. Log Messages
14
Quantitative: non-trivial partitions
15
Quantitative: non-trivial partitions
15
Quantitative: non-trivial partitions
We found 3 false-negatives from 50 Changesets
15
Qualitative: The ground truth
16
Qualitative: The ground truth
16
Qualitative: The ground truth
RQ1: Do developers agree with our decomposition?
16
Qualitative: The ground truth
RQ1: Do developers agree with our decomposition?
16
Qualitative: The ground truth
RQ1: Do developers agree with our decomposition?
16
Qualitative: The ground truth
RQ1: Do developers agree with our decomposition?
RQ2: Can organizing a changeset using our decomposition help reviewers?
16
Firehouse interviews• Monitored review submissions
• Criteria: distance and #partitions
17
Firehouse interviews• Monitored review submissions
• Criteria: distance and #partitions
50mi%&>%80.5%km%
Rush to the scene!
17
RQ1: Do developers agree with our decomposition?
18
RQ1: Do developers agree with our decomposition?
19
RQ1: Do developers agree with our decomposition?
19
RQ1: Do developers agree with our decomposition?
Some trivial should be moved to one of non-trivial partitions.20
RQ1: Do developers agree with our decomposition?
21
RQ1: Do developers agree with our decomposition?
“These were actually two different changes and I actually split them in two different things after this review” [P7].
21
RQ1: Do developers agree with our decomposition?
“These were actually two different changes and I actually split them in two different things after this review” [P7].
“I would like tag partitions” [P14].
21
RQ1: Do developers agree with our decomposition?
22
RQ1: Do developers agree with our decomposition?
“In some sort of hypothetical perfect splitting that read my mind, there is one change in one line which was a variable changed (regular expression) that could be in a different partition. But I would not expect that, because it is difficult” [P9].
22
RQ1: Do developers agree with our decomposition?
23
RQ1: Do developers agree with our decomposition?
[P13] “There is no reason to commit unrelated changes.”
“The tool should be 95% correct or else I would not use it because it would be annoying.”
23
RQ1: Do developers agree with our decomposition?
[P13] “There is no reason to commit unrelated changes.”
“The tool should be 95% correct or else I would not use it because it would be annoying.”
[P17] “What is the why behind it?”
“If you do not have something showing why/how the partitions were created, it is difficult to see its value.”
23
RQ2: Can organizing a changeset using our decomposition help reviewers?
“For large sets it it would be very helpful"
“it is useful because allow different reviewers with different purposes to focus on what they want.”
24
RQ2: Can organizing a changeset using our decomposition help reviewers?
It would speed up the review process.
pre-commit usage
"If I had a way to run this tool before I commit, I would have even considered splitting this partition 2 into a second commit" [P4].
25
And, more importantly…
Please, can I try?
26
Developers mentioned CodeFlow and ask to use or integrate.
Microsoft internal event with developers [Techfest] • hundreds signed up to be notified • quite a few asked for the prototype • a few have joined the project and are now contributing
Related WorkTao et al. How do software engineers understand code changes? An exploratory study in industry. In FSE, 2012.
Kirinuki et al. Hey! are you committing tangled changes? In ICPC, 2014.
Herzig and Zeller: The impact of tangled code changes. In MSR, 2013.
27
Summary
28
Summary
28
Summary
28
Summary
28
Summary
28
Summary
28
Summary
28
top related