Late Propagation in Software Clones
Liliane Barbour, Foutse Khomh, and Ying Zou
Late Propagation (LP)
• Definition: An inconsistent change that diverges a clone pair, later followed by a consistent, re-synchronizing change.
• It can be risky: failing to propagate a change between the clones in a clone pair can lead to faults.
• In our work, we found that 8-21% of genealogies contain a late propagation.
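The definition above can be read as a pattern over a clone pair's history: consistent, then inconsistent, then consistent again. A minimal sketch, assuming each tracked revision of a genealogy has already been labeled as consistent or not (the function name and the boolean encoding are illustrative, not taken from the study's tooling):

```python
from typing import Sequence


def has_late_propagation(consistent_by_revision: Sequence[bool]) -> bool:
    """Return True if the pair diverges and is later re-synchronized.

    consistent_by_revision[i] is True when the two clones are consistent
    with each other at the i-th tracked revision (hypothetical encoding).
    """
    seen_consistent = False  # pair was in sync at some earlier revision
    diverged = False         # an inconsistent change has split the pair
    for consistent in consistent_by_revision:
        if consistent and diverged:
            return True      # re-synchronizing change after a divergence
        if consistent:
            seen_consistent = True
        elif seen_consistent:
            diverged = True  # inconsistent change: the pair diverges
    return False
```

Under this encoding, a history such as revisions 595, 602, 604 in the ArgoUML example would be written as `[True, False, True]`.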
LP With Propagation Example from ArgoUML
//Clone A, Revision 595
addField(new UMLComboBox(typeModel),1,0,0);
//Clone B, Revision 595
addField(new UMLComboBox(classifierModel),2,0,0);
//Diverging Change: Clone A, Revision 602
addField(new UMLComboBoxNavigator(this,"NavClass",
    new UMLComboBox(typeModel)),1,0,0);
//Re-synchronizing Change: Clone B, Revision 604
addField(new UMLComboBoxNavigator(this,"NavClass",
    new UMLComboBox(classifierModel)),2,0,0);
[Figure: revision timeline for Clone A and Clone B: both present at revision 595, diverging change at revision 602, re-synchronizing change at revision 604]
LP Without Propagation Example from Ant
//Clone A, Revision 270250
if( destFile == null ){
    destFile = new File(destDir,file.getName());
}
//Clone B, Revision 270250
if (destFile == null ) {
    destFile = new File(destDir,file.getName());
}
//Diverging Change: Clone A, Revision 270264
if ( m_destFile == null ){
    m_destFile = new File(m_destDir,m_file.getName());
}
//Re-synchronizing Change: Clone A, Revision 271109
if ( destFile == null ) {
    destFile = new File(destDir,file.getName());
}
[Figure: revision timeline for Clone A and Clone B: both present at revision 270250, diverging change at revision 270264, re-synchronizing change at revision 271109]
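Types of Late Propagation

| Propagation Category | LP Type | Modified During Diverging Change | Modified During the Period of Divergence | Modified During Re-synchronizing Change |
|---|---|---|---|---|
| Propagation Always Occurs | LP1 | A | A | B |
| Propagation Always Occurs | LP2 | A | A and B | B |
| Propagation Always Occurs | LP3 | A | A | A and B |
| Propagation May or May Not Occur | LP4 | A | A and B | A |
| Propagation May or May Not Occur | LP5 | A | A and B | A and B |
| Propagation May or May Not Occur | LP6 | A and B | A and B | A or B |
| Propagation May or May Not Occur | LP7 | A and B | A and B | A and B |
| Propagation Never Occurs | LP8 | A | A | A |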
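The table above amounts to a lookup from "which clones were modified in each phase" to an LP type. A minimal sketch of that lookup, assuming the three phases have already been extracted from a genealogy; the names `LP_TYPES` and `classify_lp` and the `"A"`/`"B"`/`"AB"` encoding are hypothetical, not part of the study's tooling:

```python
from typing import Optional, Tuple

A, B, AB = frozenset("A"), frozenset("B"), frozenset("AB")

# (diverging change, period of divergence, re-synchronizing change)
# -> (LP type, propagation category), transcribed from the table above.
LP_TYPES = {
    (A, A, B): ("LP1", "always"),
    (A, AB, B): ("LP2", "always"),
    (A, A, AB): ("LP3", "always"),
    (A, AB, A): ("LP4", "may or may not"),
    (A, AB, AB): ("LP5", "may or may not"),
    (AB, AB, A): ("LP6", "may or may not"),  # "A or B" in the table
    (AB, AB, B): ("LP6", "may or may not"),
    (AB, AB, AB): ("LP7", "may or may not"),
    (A, A, A): ("LP8", "never"),
}


def classify_lp(diverging: str, during: str, resync: str) -> Optional[Tuple[str, str]]:
    """Look up the LP type for the clones modified in each phase.

    Each argument is "A", "B", or "AB" (both clones modified).
    Returns None for combinations the table does not list.
    """
    key = (frozenset(diverging), frozenset(during), frozenset(resync))
    return LP_TYPES.get(key)
```

For example, `classify_lp("A", "A", "A")` returns `("LP8", "never")`: only clone A is ever touched, so no change is propagated to clone B.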
Research Questions
RQ1: Are there different types of LP?
RQ2: Are some types of LP more fault-prone than others?
RQ3: Which type of LP experiences the highest proportion of faults?
Subject Systems
| System | # LOC | # Revisions | # Gen CCFinder | # LP CCFinder | # Gen Simian | # LP Simian |
|---|---|---|---|---|---|---|
| ArgoUML | 3.1M | 18k | 14k | 1.1k | 111 | 23 |
| Ant | 2.3M | 1.0M | 30k | 4.7k | 461 | 80 |
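As a quick check of these figures, the share of genealogies that contain a late propagation can be computed per system and tool. This sketch assumes "# Gen" is the number of clone genealogies and "# LP" the number of those containing an LP, and it uses the rounded values from the slide, so the results are approximate:

```python
# Rounded counts as reported on the slide: (genealogies, LP genealogies).
counts = {
    ("ArgoUML", "CCFinder"): (14_000, 1_100),
    ("ArgoUML", "Simian"): (111, 23),
    ("Ant", "CCFinder"): (30_000, 4_700),
    ("Ant", "Simian"): (461, 80),
}

# Percentage of genealogies that contain a late propagation.
lp_share = {
    key: round(100 * lp / gen, 1) for key, (gen, lp) in counts.items()
}

for (system, tool), pct in lp_share.items():
    print(f"{system} / {tool}: {pct}% of genealogies contain an LP")
```

The resulting shares run from roughly 8% (ArgoUML with CCFinder) to roughly 21% (ArgoUML with Simian), in line with the 8-21% range quoted in the definition slide.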
Our Approach
Mining the SVN
• Use J-Rex to mine the SVN
• Heuristics used to identify reason for commit (Mockus et al., 2000)
• Snapshots of all revisions to each Java file are stored in an XML file
• Test files are removed
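Two of the steps above, identifying the reason for a commit and removing test files, can be illustrated with simple filters. This is only a sketch: the keyword list and the test-file naming convention are assumptions made for illustration and are not the actual heuristics of J-Rex or of Mockus et al. (2000).

```python
import re

# Hypothetical keyword list; the study's actual heuristics follow
# Mockus et al. (2000) and are not reproduced here.
FAULT_KEYWORDS = re.compile(r"\b(fix(es|ed)?|bugs?|defects?|faults?)\b", re.IGNORECASE)


def is_fault_fix(commit_message: str) -> bool:
    """Guess whether a commit message describes a fault fix."""
    return bool(FAULT_KEYWORDS.search(commit_message))


def is_test_file(path: str) -> bool:
    """Assumed convention: tests live in a test directory or end in Test.java."""
    parts = path.lower().split("/")
    return "test" in parts or "tests" in parts or path.endswith("Test.java")


def keep_for_analysis(path: str) -> bool:
    """Keep Java source files that are not test files."""
    return path.endswith(".java") and not is_test_file(path)
```

A keyword match is a coarse signal; a commit that mentions "fix" in passing would be misclassified, which is why such heuristics are usually validated on a sample of commits.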
Clone Detection
• Contents of each method revision extracted into individual files
• Perform clone detection once on all snapshots
• Two existing clone detection tools are used
  – Simian (text-based) and CCFinder (token-based)
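To give a feel for what a text-based comparison does, the toy check below treats two method bodies as clones when they are identical after all whitespace is removed. It is a deliberately naive stand-in and does not reproduce the algorithms of Simian or CCFinder; the function names are hypothetical.

```python
import re


def normalize(fragment: str) -> str:
    """Remove all whitespace so formatting differences do not matter."""
    return re.sub(r"\s+", "", fragment)


def is_textual_clone(fragment_a: str, fragment_b: str) -> bool:
    """Toy check: fragments match if they are equal once whitespace is removed."""
    return normalize(fragment_a) == normalize(fragment_b)


clone_a = "if( destFile == null ){ destFile = new File(destDir,file.getName());}"
clone_b = "if (destFile == null ) {\n    destFile = new File(destDir,file.getName());}"
diverged_a = "if ( m_destFile == null ){\n    m_destFile = new File(m_destDir,m_file.getName());}"

same_before = is_textual_clone(clone_a, clone_b)    # formatting differs only
same_after = is_textual_clone(diverged_a, clone_b)  # identifiers were renamed
```

With the Ant fragments from the earlier example, the two clones match at revision 270250 despite their different layout, while the renamed identifiers at revision 270264 break the match under this strict comparison.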
Building Clone Genealogies
• Build clone genealogies using the existing clone list
• Query the SVN using diff to track changes to each clone in a clone pair over time.
• If a change modifies one of the clones in a clone pair, query the clone list for a matching clone
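The steps above can be sketched as a loop over revisions: whenever a change touches one of the clones, look the pair up in the clone list for that revision. The data structures below (`changed_by_revision`, `clone_list_by_revision`) and the rule "still in the clone list means consistent" are assumptions made for illustration, not the study's actual implementation.

```python
from typing import Dict, List, Set, Tuple

ClonePair = Tuple[str, str]  # identifiers of the two cloned methods


def label_changes(
    pair: ClonePair,
    changed_by_revision: Dict[int, Set[str]],
    clone_list_by_revision: Dict[int, Set[ClonePair]],
) -> List[Tuple[int, str]]:
    """Label each revision that touches the pair.

    A change is 'consistent' if the pair is still reported in the clone
    list afterwards, and 'inconsistent' otherwise (assumed criterion).
    """
    labels = []
    for revision in sorted(changed_by_revision):
        if not changed_by_revision[revision] & set(pair):
            continue  # neither clone was modified in this revision
        still_clones = pair in clone_list_by_revision.get(revision, set())
        labels.append((revision, "consistent" if still_clones else "inconsistent"))
    return labels


# Hypothetical input modeled on the ArgoUML example (revision 603 is made up).
history = label_changes(
    ("A", "B"),
    changed_by_revision={602: {"A"}, 603: {"C"}, 604: {"B"}},
    clone_list_by_revision={602: set(), 604: {("A", "B")}},
)
```

Here `history` records an inconsistent change at revision 602 followed by a consistent one at revision 604, which is the late propagation pattern.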
RQ1: Are there different types of LP?
There is representation from multiple types of LP and across all categories of LP.
[Figure: Breakdown of LP Type by System. Bar chart; x-axis: LP Types (LP1 to LP8); y-axis: Percentage of All LP Occurrences (0% to 80%); series: ArgoUML - Simian, ArgoUML - CCFinder, Ant - Simian, Ant - CCFinder]
RQ2: Are some types of LP more fault-prone than others?
Part 1: Is Late Propagation fault-prone?
Part 2: Are specific types of late propagation more fault-prone?
Part 1: Is Late Propagation Fault-prone?
[Figure: LP vs. Non-LP Odds Ratios. Bar chart; y-axis: Odds Ratio (0 to 3.5); bars: Ant - Simian, ArgoUML - CCFinder, Ant - CCFinder]
Note: ArgoUML - Simian is omitted because it is not statistically significant.
In all significant cases, the odds ratio is greater than 1. Therefore, LP genealogies are more fault-prone than non-LP genealogies.
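For readers unfamiliar with the measure, an odds ratio compares the odds of a fault in one group with the odds in another. The sketch below assumes each genealogy is labeled either faulty or fault-free (the slides do not spell out the exact counting), and the numbers are invented purely to show the arithmetic:

```python
def odds_ratio(lp_faulty: int, lp_clean: int,
               non_lp_faulty: int, non_lp_clean: int) -> float:
    """Odds of a fault in LP genealogies divided by the odds in non-LP ones."""
    if lp_clean == 0 or non_lp_faulty == 0 or non_lp_clean == 0:
        raise ValueError("odds ratio is undefined when a required cell is zero")
    return (lp_faulty / lp_clean) / (non_lp_faulty / non_lp_clean)


# Hypothetical counts, for illustration only (not data from the study).
example = odds_ratio(lp_faulty=30, lp_clean=70, non_lp_faulty=20, non_lp_clean=80)
```

With these made-up counts the ratio is (30/70) / (20/80) = 12/7, about 1.71. A value above 1 means faults are more likely in LP genealogies, a value below 1 means they are less likely, and a value of exactly 1 means no difference.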
Part 2: Are specific types of late propagation more fault-prone?
[Figure: Odds Ratios Between Each LP Type and Non-LP Genealogies. Bar chart; x-axis: LP Type (LP1 to LP8); y-axis: Odds Ratio (0 to 16); series: Ant - Simian, ArgoUML - CCFinder, Ant - CCFinder]
Note: ArgoUML - Simian is omitted because it is not statistically significant.
RQ2 Observations
• Some LP types are less fault-prone than non-LP genealogies (i.e., odds ratio < 1).
• Some types that make up a small proportion of LP instances have a very high odds ratio.
• LP7 and LP8 occur frequently but have low odds ratios.
Each type of LP has a different level of fault-proneness.
RQ3: Which type of LP experiences the highest proportion of faults?
[Figure: Percentage of Fault Occurrences Broken Down by LP Type. Bar chart; x-axis: LP Type (LP1 to LP8); y-axis: Percentage of Fault Occurrences (0% to 80%); series: Ant - Simian, ArgoUML - CCFinder, Ant - CCFinder]
Note: ArgoUML - Simian is omitted because it is not statistically significant.
RQ3 Observations
• LP7 and LP8 contribute a large proportion of the faults but have lower odds ratios (RQ2)
  – When faults occur, they occur in large numbers
• Overall, LP7 and LP8 are the most dangerous; the fault-proneness of the other types is system-dependent.
The proportion of faults is different for each LP type.
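The quantity plotted for RQ3 is a share of fault occurrences rather than an odds ratio, which is why a type can dominate here while scoring low in RQ2. A minimal sketch of the share computation, using invented counts that are not data from the study:

```python
def fault_share(faults_by_type: dict) -> dict:
    """Percentage of all LP fault occurrences attributed to each LP type."""
    total = sum(faults_by_type.values())
    if total == 0:
        return {lp_type: 0.0 for lp_type in faults_by_type}
    return {lp_type: 100 * n / total for lp_type, n in faults_by_type.items()}


# Hypothetical fault counts, for illustration only.
shares = fault_share({"LP1": 5, "LP7": 60, "LP8": 35})
```

In this made-up case LP7 accounts for 60% of the fault occurrences simply because it has the most faults in absolute terms, regardless of how likely any single LP7 genealogy is to be faulty.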
Conclusion
• In general, LP genealogies are more fault-prone than non-LP genealogies.
• LP7 and LP8 are the riskiest in terms of their fault-proneness and magnitude of faults.
  – LP8 contains no propagation of changes
  – LP7 may or may not contain any propagation of changes
• Fault-proneness and fault occurrence depend on the LP type and on the system.