![Page 1: PhD Dissertation](https://reader033.vdocuments.mx/reader033/viewer/2022052910/559a88a31a28ab187d8b4821/html5/thumbnails/1.jpg)
Patricia Deshane
Ph.D. Defense
April 14, 2010
![Page 2: PhD Dissertation](https://reader033.vdocuments.mx/reader033/viewer/2022052910/559a88a31a28ab187d8b4821/html5/thumbnails/2.jpg)
Introduction
Perspectives on copy-and-paste and code cloning
Problems of cloning and possible solutions
Dimensions for Tool Development
How related clone tracking tools define clone
properties and provide clone lifecycle support
Evaluation
Prevalence of clones, renaming, and errors
User study on clone visualization and renaming
User study on clone comparison (forthcoming)
Conclusion
![Page 3: PhD Dissertation](https://reader033.vdocuments.mx/reader033/viewer/2022052910/559a88a31a28ab187d8b4821/html5/thumbnails/3.jpg)
![Page 4: PhD Dissertation](https://reader033.vdocuments.mx/reader033/viewer/2022052910/559a88a31a28ab187d8b4821/html5/thumbnails/4.jpg)
Copy and paste – love it or hate it?
Short-term benefits
Copy/paste variable, type, or method names
Save typing
Remember a name’s spelling
Copy/paste blocks, methods, or classes
A similar solution exists…so why write from scratch?
Learn from past projects and examples
Different from plagiarism: this is software reuse!
Libraries & frameworks are designed well for reuse
But, the resulting clones need to be modified
(software maintenance … software quality)
![Page 5: PhD Dissertation](https://reader033.vdocuments.mx/reader033/viewer/2022052910/559a88a31a28ab187d8b4821/html5/thumbnails/5.jpg)
Clones as a software maintenance problem
Clone location & relationship forgotten over time
![Page 6: PhD Dissertation](https://reader033.vdocuments.mx/reader033/viewer/2022052910/559a88a31a28ab187d8b4821/html5/thumbnails/6.jpg)
![Page 7: PhD Dissertation](https://reader033.vdocuments.mx/reader033/viewer/2022052910/559a88a31a28ab187d8b4821/html5/thumbnails/7.jpg)
Clones as a software evolution problem
Clones also naturally evolve over time
Long-term or short-term…still need maintenance
Clones as a software quality problem
Software bugs and inconsistencies
(can be made for various reasons)
The addition of a new feature – apply update to all?
A bug is propagated & fixed – can become a new bug!
A clone is modified to fit its task – inconsistent rename
A single clone (parameterized clone) is modified,
usually only identifier names and literal constants
(numerical, character, boolean, or string values)
![Page 8: PhD Dissertation](https://reader033.vdocuments.mx/reader033/viewer/2022052910/559a88a31a28ab187d8b4821/html5/thumbnails/8.jpg)
Clones as an aesthetic or design problem
How does the source code look and smell?
Look
Code clones – an artificial increase in # of LOC
(duplication adds “unnecessary” lines of code)
Clones can make code more complex, less readable
Smell
Code smell – a hint that something could be wrong
(abstraction should be used whenever possible)
Design decision:
Create abstractions from the beginning, later on, or
not at all? If not from the beginning, cloning takes place
![Page 9: PhD Dissertation](https://reader033.vdocuments.mx/reader033/viewer/2022052910/559a88a31a28ab187d8b4821/html5/thumbnails/9.jpg)
Clone Detection
Algorithms and tools to detect code clones (exact
duplicates & “near-miss clones”) in pre-existing,
legacy source code
Retroactive – clone detection gives false positives
& false negatives…humans need to verify results
Clone Prevention?
Clone detection in real-time, disable copy/paste?
Clone Removal
Remove clones from system ASAP (refactoring)
When? Once and Only Once vs. Rule of Three
![Page 10: PhD Dissertation](https://reader033.vdocuments.mx/reader033/viewer/2022052910/559a88a31a28ab187d8b4821/html5/thumbnails/10.jpg)
Clones can be reasonable, beneficial, or
necessary
Clones can keep code clean & understandable
(GUI code, procedure with too many parameters)
Programming language can have limitations
(lack of expressiveness, no abstraction support)
Clones should be kept in the source code
Is it worth refactoring? Clone genealogies study:
Short-lived clones may diverge soon
Long-lived clones are due to shortcomings of the language
Making changes to clones is risky for companies
![Page 11: PhD Dissertation](https://reader033.vdocuments.mx/reader033/viewer/2022052910/559a88a31a28ab187d8b4821/html5/thumbnails/11.jpg)
Clones exist (the cloning problems do, too)
It may not be desirable or possible to refactor
the clones; they need to be managed instead
Tool support is needed for all stages of the
clone lifecycle from clone creation to clone
extinction (which includes clone editing)
CnP – suite of Eclipse plug-ins for proactive
copy-and-paste-induced clone management
CReN – consistent renaming of identifiers
LexId – inferring lexical patterns in identifiers
CSeR – code segment reuse
CnP Clone Visualization
CSeR Diff-Visualization
![Page 12: PhD Dissertation](https://reader033.vdocuments.mx/reader033/viewer/2022052910/559a88a31a28ab187d8b4821/html5/thumbnails/12.jpg)
Clone tracking tools with a focus on editing
[CnP, Clonescape, CPC] – proactive
[Codelink, LAPIS, CloneTracker] – retroactive
Definitions of clone properties
Clone similarity
Clone model
Clone visualization
Clone persistence
Clone documentation and clone attributes
Clone lifecycle support
4 lifecycle stages: clone creation, clone capture,
clone editing, and clone extinction
![Page 13: PhD Dissertation](https://reader033.vdocuments.mx/reader033/viewer/2022052910/559a88a31a28ab187d8b4821/html5/thumbnails/13.jpg)
![Page 14: PhD Dissertation](https://reader033.vdocuments.mx/reader033/viewer/2022052910/559a88a31a28ab187d8b4821/html5/thumbnails/14.jpg)
![Page 15: PhD Dissertation](https://reader033.vdocuments.mx/reader033/viewer/2022052910/559a88a31a28ab187d8b4821/html5/thumbnails/15.jpg)
Defining clones (similarity when captured)
Retroactive clone tracking tools rely on clone
detection tools or programmer’s selection of
clones – can yield inaccurate clones
Proactive clone tracking tools (including CnP)
capture copy/paste – 100% accurate, identical
Managing clones (similarity when edited)
Some corresponding code between related clones
(identifiers, substrings, fields/methods, etc.)
Longest common subsequence (LCS) algorithm
Levenshtein distance (LD) – the edit distance
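The two similarity measures named above can be sketched as follows. This is a minimal illustration using the textbook dynamic-programming formulations, not the implementation of any of the tools discussed; the class name `CloneSimilarity` and its method names are assumptions made for this example.

```java
// Hypothetical helper class; names are illustrative, not taken from any tool.
final class CloneSimilarity {

    /** Levenshtein distance: minimum number of single-character
     *  insertions, deletions, and substitutions turning a into b. */
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev; // reuse the two rows
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** Length of the longest common subsequence of a and b. */
    static int lcsLength(String a, String b) {
        int[][] dp = new int[a.length() + 1][b.length() + 1];
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                dp[i][j] = a.charAt(i - 1) == b.charAt(j - 1)
                        ? dp[i - 1][j - 1] + 1
                        : Math.max(dp[i - 1][j], dp[i][j - 1]);
            }
        }
        return dp[a.length()][b.length()];
    }

    public static void main(String[] args) {
        // Identifiers from the Paint program used in the user study.
        System.out.println(levenshtein("rSlider", "bSlider"));
        System.out.println(lcsLength("rSlider", "bSlider"));
    }
}
```

A small edit distance or a long common subsequence suggests that two fragments still correspond after editing.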
![Page 16: PhD Dissertation](https://reader033.vdocuments.mx/reader033/viewer/2022052910/559a88a31a28ab187d8b4821/html5/thumbnails/16.jpg)
Clone location
Character offset and length in a file
Copied and pasted source code is mapped to the
largest contiguous set of abstract syntax tree (AST)
nodes within the range
Copied and pasted source code that is only partially
contained within an AST node is not captured
File name plus line range
Clone region descriptor (CRD)
Describes the clone’s relative location using syntactic,
structural, and lexical information
(for example, the clone’s alignment with code blocks)
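The offset-and-length rule above (keep nodes that lie entirely within the copied range, skip nodes that are only partially covered) might look like the sketch below. `NodeRange` is a hypothetical stand-in for an AST node's start position and length; this example does not use the Eclipse AST API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; NodeRange stands in for an AST node's source range.
final class RangeCapture {

    static final class NodeRange {
        final int start;   // character offset of the node in the file
        final int length;  // length of the node in characters

        NodeRange(int start, int length) {
            this.start = start;
            this.length = length;
        }

        int end() {
            return start + length; // exclusive end offset
        }
    }

    /** Returns the sibling nodes that lie entirely inside the selection.
     *  Nodes that only partially overlap the selection are not captured. */
    static List<NodeRange> fullyContained(List<NodeRange> siblings,
                                          int selStart, int selLength) {
        int selEnd = selStart + selLength;
        List<NodeRange> captured = new ArrayList<>();
        for (NodeRange n : siblings) {
            if (n.start >= selStart && n.end() <= selEnd) {
                captured.add(n);
            }
        }
        return captured;
    }
}
```

Because sibling nodes appear in source order and do not overlap, the nodes that pass this test form one contiguous run.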
![Page 17: PhD Dissertation](https://reader033.vdocuments.mx/reader033/viewer/2022052910/559a88a31a28ab187d8b4821/html5/thumbnails/17.jpg)
Clone relationship
Clone group – related clones are treated symmetrically,
at the same level of group membership
(also called: region set, clone class, etc.)
Knowing the origin can be useful for clone comparison
and clone visualization (separate from the model)
Clone family – distinguishes between the original
code (the parent) and the duplicated copies
(children, which are siblings to each other)
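The difference between the two relationship models can be sketched as data structures. All class and field names here are assumptions for illustration only; none of the surveyed tools is claimed to use these types.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model classes; not taken from any of the surveyed tools.
final class CloneModels {

    static final class Clone {
        final String file;
        final int offset;
        final int length;

        Clone(String file, int offset, int length) {
            this.file = file;
            this.offset = offset;
            this.length = length;
        }
    }

    /** Clone group: all members are equal; no origin is recorded. */
    static final class CloneGroup {
        final List<Clone> members = new ArrayList<>();
    }

    /** Clone family: the original (parent) is kept apart from its copies. */
    static final class CloneFamily {
        final Clone parent;
        final List<Clone> children = new ArrayList<>();

        CloneFamily(Clone parent) {
            this.parent = parent;
        }

        /** Flattens the family into a symmetric group (parent first).
         *  The parent/child distinction is lost in the result. */
        CloneGroup toGroup() {
            CloneGroup g = new CloneGroup();
            g.members.add(parent);
            g.members.addAll(children);
            return g;
        }
    }
}
```

A family can always be flattened into a group, but a group cannot be turned back into a family unless the origin is stored somewhere else.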
![Page 18: PhD Dissertation](https://reader033.vdocuments.mx/reader033/viewer/2022052910/559a88a31a28ab187d8b4821/html5/thumbnails/18.jpg)
Clone groups (related clones)
Clone group #1
Clone group #2
![Page 19: PhD Dissertation](https://reader033.vdocuments.mx/reader033/viewer/2022052910/559a88a31a28ab187d8b4821/html5/thumbnails/19.jpg)
Markers – colored bars and highlights
CnP clone visualization – shows clone locations,
clone groups, clone origin and subsequent pastes
CSeR diff-visualization - highlights user edits
Warnings – error prevention or detection
CnP – warnings about external identifier scoping
Alerts – clone modification notification
Alert the programmer when clones are edited
Views and graphs
Views – show lists of clones, clone groups, etc.
Graphs – can be complicated to understand
![Page 20: PhD Dissertation](https://reader033.vdocuments.mx/reader033/viewer/2022052910/559a88a31a28ab187d8b4821/html5/thumbnails/20.jpg)
Pasted code
Original code
CnP Clone Visualization
![Page 21: PhD Dissertation](https://reader033.vdocuments.mx/reader033/viewer/2022052910/559a88a31a28ab187d8b4821/html5/thumbnails/21.jpg)
CSeR Diff-Visualization
![Page 22: PhD Dissertation](https://reader033.vdocuments.mx/reader033/viewer/2022052910/559a88a31a28ab187d8b4821/html5/thumbnails/22.jpg)
A flat database (text file)
CnP - stores each clone’s location (file name,
clone’s starting character position & length in #
of characters) within each clone group
CReN - stores each identifier’s location
(identifier’s starting character position & length
in # of characters) within each identifier group
CSeR - stores character positions & change info.
XML file, SQL database, file meta-data
Can store clone information for tagged clones,
including links between clones, copy/paste
activity, and clone modification history
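As a rough sketch of the flat text file option, each clone can be written as one line holding its group, file name, starting character position, and length. The tab-separated layout and the names `FlatCloneStore` and `Entry` are assumptions for this example; the actual CnP file format is not given in the slides.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical flat-file layout: groupId <TAB> fileName <TAB> offset <TAB> length
final class FlatCloneStore {

    static final class Entry {
        final int groupId;
        final String fileName;
        final int offset;  // starting character position
        final int length;  // length in number of characters

        Entry(int groupId, String fileName, int offset, int length) {
            this.groupId = groupId;
            this.fileName = fileName;
            this.offset = offset;
            this.length = length;
        }
    }

    /** Writes one line per clone. Assumes file names contain no tabs or newlines. */
    static String serialize(List<Entry> entries) {
        StringBuilder sb = new StringBuilder();
        for (Entry e : entries) {
            sb.append(e.groupId).append('\t')
              .append(e.fileName).append('\t')
              .append(e.offset).append('\t')
              .append(e.length).append('\n');
        }
        return sb.toString();
    }

    /** Reads the lines produced by serialize back into entries. */
    static List<Entry> parse(String text) {
        List<Entry> out = new ArrayList<>();
        for (String line : text.split("\n")) {
            if (line.isEmpty()) {
                continue; // skip blank lines
            }
            String[] f = line.split("\t");
            out.add(new Entry(Integer.parseInt(f[0]), f[1],
                              Integer.parseInt(f[2]), Integer.parseInt(f[3])));
        }
        return out;
    }
}
```

Character offsets stored this way go stale when the file is edited outside the tool, which is one reason other tools use richer storage such as XML or a database.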
![Page 23: PhD Dissertation](https://reader033.vdocuments.mx/reader033/viewer/2022052910/559a88a31a28ab187d8b4821/html5/thumbnails/23.jpg)
Additional information about clones provided
by the programmer
The reason why the code was duplicated
Only one form of clone classification
Whether the clone should be removed from the
system (clone severity)
CnP does not have this feature (yet)
![Page 24: PhD Dissertation](https://reader033.vdocuments.mx/reader033/viewer/2022052910/559a88a31a28ab187d8b4821/html5/thumbnails/24.jpg)
![Page 25: PhD Dissertation](https://reader033.vdocuments.mx/reader033/viewer/2022052910/559a88a31a28ab187d8b4821/html5/thumbnails/25.jpg)
How were the clones created?
Copy/paste
Other – manual typing, cut/paste/paste,
automatic code generation, etc.
Why were the clones created?
Intentional clones – code that the programmer
intended to reuse
Accidental clones – code that is similar due to a
protocol requirement
![Page 26: PhD Dissertation](https://reader033.vdocuments.mx/reader033/viewer/2022052910/559a88a31a28ab187d8b4821/html5/thumbnails/26.jpg)
Tracking copy-and-paste actions (proactive)
Detects the creation of new clones that are made
via copying and pasting
Listens to document activity in Eclipse’s Java
editor & makes correspondences when identical
CnP tracks only “significant” clones that contain:
More than two statements, or
At least one conditional statement, loop statement, or
method, or
A type definition (class or interface)
Other tools’ policies:
At least 30 tokens, specified minimum clone length
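The "significant clone" policy listed above for CnP can be written as a single predicate. `FragmentSummary` is a hypothetical summary of a pasted fragment; in a real tool these facts would be read from the AST rather than passed in by hand.

```java
// Hypothetical sketch of the significance policy; names are illustrative.
final class CloneSignificance {

    static final class FragmentSummary {
        final int statementCount;
        final boolean hasConditionalLoopOrMethod;
        final boolean hasTypeDefinition; // class or interface

        FragmentSummary(int statementCount,
                        boolean hasConditionalLoopOrMethod,
                        boolean hasTypeDefinition) {
            this.statementCount = statementCount;
            this.hasConditionalLoopOrMethod = hasConditionalLoopOrMethod;
            this.hasTypeDefinition = hasTypeDefinition;
        }
    }

    /** True if the fragment has more than two statements, or at least one
     *  conditional, loop, or method, or a type definition. */
    static boolean isSignificant(FragmentSummary f) {
        return f.statementCount > 2
                || f.hasConditionalLoopOrMethod
                || f.hasTypeDefinition;
    }
}
```

A token-count threshold, as used by other tools, would replace this predicate with one comparison against the minimum clone length.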
![Page 27: PhD Dissertation](https://reader033.vdocuments.mx/reader033/viewer/2022052910/559a88a31a28ab187d8b4821/html5/thumbnails/27.jpg)
Importing from clone detection tools
(retroactive)
Complements proactive clone tracking
Clone detection tool results are listed; the
programmer selects which of the reported clone
groups to import and starts tracking these clones
Programmer selection may not be required
Selecting clones (retroactive)
Clones are simply selected manually by the programmer
Programmer needs to know which clones to
select and where they are in the system
![Page 28: PhD Dissertation](https://reader033.vdocuments.mx/reader033/viewer/2022052910/559a88a31a28ab187d8b4821/html5/thumbnails/28.jpg)
Inter-clone editing (between clones)
The same physical change is needed in all
related clones, such as a new feature or a bug fix
In related work, this is called linked editing,
synchronous editing, and simultaneous editing
Update in one place like with an abstraction
But with inter-clone editing, clones remain in system
CnP does not have this feature (yet)
![Page 29: PhD Dissertation](https://reader033.vdocuments.mx/reader033/viewer/2022052910/559a88a31a28ab187d8b4821/html5/thumbnails/29.jpg)
Intra-clone editing (within clones)
Only the relationship is the same between the
clones, not the physical change itself
CReN – consistent renaming
of identifiers within clones
Copied code
Pasted code
![Page 30: PhD Dissertation](https://reader033.vdocuments.mx/reader033/viewer/2022052910/559a88a31a28ab187d8b4821/html5/thumbnails/30.jpg)
for(i = 1; i < size; i++)
{
    if(array[i] < low) {
        low = array[i];
    }
}
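Taking the loop above as a pasted clone, consistent renaming within the clone could be approximated as below. This is a text-based simplification that replaces whole-word matches only inside the clone's text. CReN itself is described as AST-based, so this sketch is not its algorithm, and the name `renameInClone` is an assumption.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical, text-based approximation of renaming within one clone.
final class ConsistentRename {

    /** Replaces every whole-word occurrence of oldName in the clone text.
     *  Unlike an AST-based tool, this cannot tell apart two different
     *  variables that happen to share a name. */
    static String renameInClone(String cloneText, String oldName, String newName) {
        Pattern wholeWord = Pattern.compile("\\b" + Pattern.quote(oldName) + "\\b");
        return wholeWord.matcher(cloneText)
                        .replaceAll(Matcher.quoteReplacement(newName));
    }

    public static void main(String[] args) {
        String pasted =
                "for(i = 1; i < size; i++)\n"
              + "{\n"
              + "    if(array[i] < low) {\n"
              + "        low = array[i];\n"
              + "    }\n"
              + "}\n";
        // Renaming low to high updates both occurrences inside the clone.
        System.out.println(renameInClone(pasted, "low", "high"));
    }
}
```

Only the pasted fragment is passed in, so occurrences of the same name outside the clone are left alone.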
![Page 31: PhD Dissertation](https://reader033.vdocuments.mx/reader033/viewer/2022052910/559a88a31a28ab187d8b4821/html5/thumbnails/31.jpg)
Intra-clone editing (within clones)
Only the relationship is the same between the
clones, not the physical change itself
LexId – consistent renaming
of substrings within clones
Copied code
Pasted code
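One way to picture substring renaming: when the programmer renames one identifier (for example `rPanel` to `gPanel`), the changed substring is found by stripping the common prefix and suffix, and the same change is offered for other identifiers in the clone. The heuristic below is an assumption made for illustration and is not LexId's actual inference algorithm.

```java
// Hypothetical heuristic; not the algorithm used by LexId.
final class SubstringRename {

    /** Infers which substring changed between oldName and newName, then
     *  applies the same change to other if it has the old substring at
     *  the same position. Otherwise returns other unchanged. */
    static String applySameEdit(String oldName, String newName, String other) {
        int max = Math.min(oldName.length(), newName.length());

        int p = 0; // length of the common prefix
        while (p < max && oldName.charAt(p) == newName.charAt(p)) {
            p++;
        }
        int s = 0; // length of the common suffix (not overlapping the prefix)
        while (s < max - p
                && oldName.charAt(oldName.length() - 1 - s)
                   == newName.charAt(newName.length() - 1 - s)) {
            s++;
        }

        String oldPart = oldName.substring(p, oldName.length() - s);
        String newPart = newName.substring(p, newName.length() - s);

        if (oldPart.isEmpty() || !other.startsWith(oldPart, p)) {
            return other; // no comparable substring to rename
        }
        return other.substring(0, p) + newPart
                + other.substring(p + oldPart.length());
    }

    public static void main(String[] args) {
        // After rPanel is renamed to gPanel, suggest the same change for rSlider.
        System.out.println(applySameEdit("rPanel", "gPanel", "rSlider"));
    }
}
```

This position-based match is naive: it would also rewrite an unrelated identifier that merely starts with the same letter, so a real tool needs a stricter notion of lexical pattern.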
![Page 32: PhD Dissertation](https://reader033.vdocuments.mx/reader033/viewer/2022052910/559a88a31a28ab187d8b4821/html5/thumbnails/32.jpg)
Refactoring
Actually a form of clone editing
CnP does not have this feature (yet)
Clone group #1
Clone group #2
![Page 33: PhD Dissertation](https://reader033.vdocuments.mx/reader033/viewer/2022052910/559a88a31a28ab187d8b4821/html5/thumbnails/33.jpg)
Refactoring
Actually a form of clone editing
CnP does not have this feature (yet)
Clone group #1
Clone group #2
Abstraction #2
Abstraction #1
![Page 34: PhD Dissertation](https://reader033.vdocuments.mx/reader033/viewer/2022052910/559a88a31a28ab187d8b4821/html5/thumbnails/34.jpg)
Clone divergence (loss of similarity)
Clones may naturally separate from one another
But if copied and pasted, likely to retain similarity
Unlike with refactoring, with clone divergence
the cloning relationship is removed
Tools allow the programmer to remove a
clone from a clone group (for tracking)
Programmer has full control over the clones that
are considered related (similar) to one another
Clones can be “linked” and “unlinked”
(for inter-clone editing)
![Page 35: PhD Dissertation](https://reader033.vdocuments.mx/reader033/viewer/2022052910/559a88a31a28ab187d8b4821/html5/thumbnails/35.jpg)
![Page 36: PhD Dissertation](https://reader033.vdocuments.mx/reader033/viewer/2022052910/559a88a31a28ab187d8b4821/html5/thumbnails/36.jpg)
There is significant code reuse in commercial
and open source software
Clone detection tools find clones during tests
Case study with CCFinderX and SimScan
clone detection tools on SCL and Eclipse JDT
UI plug-in source code
For SCL, SimScan found 102 clone groups, 70 of
which were intentional and useful clone groups
50 out of the 70 intentional, useful clone groups
consisted of clones that were likely copy/pasted
These 50 groups could have been supported with CnP
![Page 37: PhD Dissertation](https://reader033.vdocuments.mx/reader033/viewer/2022052910/559a88a31a28ab187d8b4821/html5/thumbnails/37.jpg)
Most (65-67%) copied-and-pasted code
fragments require renaming at least one
identifier [CP-Miner]
Difficult to tell retroactively whether code
was actually copy/pasted or renamed
Some tools look at the correspondence
between identifiers over software versions
(in version control systems), which can be
used to determine renaming inconsistencies
[Clever, Vaci]
![Page 38: PhD Dissertation](https://reader033.vdocuments.mx/reader033/viewer/2022052910/559a88a31a28ab187d8b4821/html5/thumbnails/38.jpg)
[CP-Miner]
![Page 39: PhD Dissertation](https://reader033.vdocuments.mx/reader033/viewer/2022052910/559a88a31a28ab187d8b4821/html5/thumbnails/39.jpg)
[Bug Isolation]
![Page 40: PhD Dissertation](https://reader033.vdocuments.mx/reader033/viewer/2022052910/559a88a31a28ab187d8b4821/html5/thumbnails/40.jpg)
[DECKARD-based]
![Page 41: PhD Dissertation](https://reader033.vdocuments.mx/reader033/viewer/2022052910/559a88a31a28ab187d8b4821/html5/thumbnails/41.jpg)
![Page 42: PhD Dissertation](https://reader033.vdocuments.mx/reader033/viewer/2022052910/559a88a31a28ab187d8b4821/html5/thumbnails/42.jpg)
Subject Characteristics
14 male subjects - 8 undergraduate, 6 graduate
students from Clarkson MCS and ECE departments
Knowledge of Java/Swing required, IDE optional
Study Procedure
Subjects came one at a time to a user study lab
Background about the problems of copy/paste
clones and the three features were presented
The source code and graphical Paint program
that were used for the tasks were shown to them
Subjects were recorded with video/audio
![Page 43: PhD Dissertation](https://reader033.vdocuments.mx/reader033/viewer/2022052910/559a88a31a28ab187d8b4821/html5/thumbnails/43.jpg)
Annotated screenshot of the Paint program
![Page 44: PhD Dissertation](https://reader033.vdocuments.mx/reader033/viewer/2022052910/559a88a31a28ab187d8b4821/html5/thumbnails/44.jpg)
Debugging tasks
Task 1: Moving the blue slider does not change
the pixel color.
rSlider should be bSlider (on line 120)
![Page 45: PhD Dissertation](https://reader033.vdocuments.mx/reader033/viewer/2022052910/559a88a31a28ab187d8b4821/html5/thumbnails/45.jpg)
Debugging tasks
Task 2: Moving the thickness slider does not
change the pixel thickness.
colorChangeListener should be
thicknessChangeListener (on line 142)
![Page 46: PhD Dissertation](https://reader033.vdocuments.mx/reader033/viewer/2022052910/559a88a31a28ab187d8b4821/html5/thumbnails/46.jpg)
Modification tasks
Task 3: Add a titled border to colorPanel and to
thicknessPanel.
![Page 47: PhD Dissertation](https://reader033.vdocuments.mx/reader033/viewer/2022052910/559a88a31a28ab187d8b4821/html5/thumbnails/47.jpg)
Task 3
![Page 48: PhD Dissertation](https://reader033.vdocuments.mx/reader033/viewer/2022052910/559a88a31a28ab187d8b4821/html5/thumbnails/48.jpg)
Modification tasks
Task 4: Add color to the label of each color slider
- red, green, and blue.
![Page 49: PhD Dissertation](https://reader033.vdocuments.mx/reader033/viewer/2022052910/559a88a31a28ab187d8b4821/html5/thumbnails/49.jpg)
Task 4
![Page 50: PhD Dissertation](https://reader033.vdocuments.mx/reader033/viewer/2022052910/559a88a31a28ab187d8b4821/html5/thumbnails/50.jpg)
Renaming tasks (with CReN)
Task 5: Rename colorPanel to thicknessPanel and
rPanel to tPanel within the clone.
![Page 51: PhD Dissertation](https://reader033.vdocuments.mx/reader033/viewer/2022052910/559a88a31a28ab187d8b4821/html5/thumbnails/51.jpg)
Renaming tasks (with CReN)
Task 6: Rename toolPanel to clearUndoPanel,
pencilButton to clearButton, and eraserButton to
undoButton within the clone.
![Page 52: PhD Dissertation](https://reader033.vdocuments.mx/reader033/viewer/2022052910/559a88a31a28ab187d8b4821/html5/thumbnails/52.jpg)
Renaming tasks (with LexId)
Task 7: Rename rPanel to gPanel and rSlider to
gSlider in the green slider clone (shown), and
rPanel to bPanel and rSlider to bSlider in the blue
slider clone.
![Page 53: PhD Dissertation](https://reader033.vdocuments.mx/reader033/viewer/2022052910/559a88a31a28ab187d8b4821/html5/thumbnails/53.jpg)
Renaming tasks (with LexId)
Task 8: Rename bPanel to tPanel and bSlider to
tSlider in the thickness slider clone.
![Page 54: PhD Dissertation](https://reader033.vdocuments.mx/reader033/viewer/2022052910/559a88a31a28ab187d8b4821/html5/thumbnails/54.jpg)
Results – Time per Task
The time (in minutes) to complete each pair of tasks.
![Page 55: PhD Dissertation](https://reader033.vdocuments.mx/reader033/viewer/2022052910/559a88a31a28ab187d8b4821/html5/thumbnails/55.jpg)
Results – Time per Task
Statistical hypothesis testing on the paired time data.
![Page 56: PhD Dissertation](https://reader033.vdocuments.mx/reader033/viewer/2022052910/559a88a31a28ab187d8b4821/html5/thumbnails/56.jpg)
Results – Solution Correctness
Correct states when running the program or when finished.
![Page 57: PhD Dissertation](https://reader033.vdocuments.mx/reader033/viewer/2022052910/559a88a31a28ab187d8b4821/html5/thumbnails/57.jpg)
Results – Method of Completion
Number of subjects who used each location and inspection
method for debugging and modification tasks.
![Page 58: PhD Dissertation](https://reader033.vdocuments.mx/reader033/viewer/2022052910/559a88a31a28ab187d8b4821/html5/thumbnails/58.jpg)
Results – Method of Completion
Number of times each renaming method was used for
renaming tasks.
![Page 59: PhD Dissertation](https://reader033.vdocuments.mx/reader033/viewer/2022052910/559a88a31a28ab187d8b4821/html5/thumbnails/59.jpg)
Discussion
Confounding factors for clone visualization
Clone visualization is not forced on the user
Subjects would have produced correct solutions to
Task 3 if they had made use of cloning information
Varying levels of subjects’ prior experience
Threats to validity
Some subjects had more prior knowledge/experience
Tasks fairly close to real-world GUI programming tasks
Tool design
Need to further improve the clone visualization
Need to tell programmers exactly what was renamed
![Page 60: PhD Dissertation](https://reader033.vdocuments.mx/reader033/viewer/2022052910/559a88a31a28ab187d8b4821/html5/thumbnails/60.jpg)
![Page 61: PhD Dissertation](https://reader033.vdocuments.mx/reader033/viewer/2022052910/559a88a31a28ab187d8b4821/html5/thumbnails/61.jpg)
Research contributions
The copy-and-paste (CnP) tool
Proactive tracking
Intra-clone editing
AST-based
Dimensions of clone tracking tool development
Definition of the clone lifecycle
Realization about clone visualization
Future work
Theory about copy-and-paste and abstractions
Other applications of this research
![Page 62: PhD Dissertation](https://reader033.vdocuments.mx/reader033/viewer/2022052910/559a88a31a28ab187d8b4821/html5/thumbnails/62.jpg)
![Page 63: PhD Dissertation](https://reader033.vdocuments.mx/reader033/viewer/2022052910/559a88a31a28ab187d8b4821/html5/thumbnails/63.jpg)
CnP user study paper published in ICPC 2010
Papers went through a rigorous reviewing process
Only 15/76 (< 20%) accepted as full papers
CReN paper published in ETX 2007
This is a young topic that is getting recognition
Cited 7 times according to ACM Digital Library
2 additional citations are reported on CiteSeerX
39 downloads in the last year (from ACM website)
4 downloads in the last week (from ACM website)
![Page 64: PhD Dissertation](https://reader033.vdocuments.mx/reader033/viewer/2022052910/559a88a31a28ab187d8b4821/html5/thumbnails/64.jpg)
The position of the source code characters as represented in
an ASTNode.
![Page 65: PhD Dissertation](https://reader033.vdocuments.mx/reader033/viewer/2022052910/559a88a31a28ab187d8b4821/html5/thumbnails/65.jpg)
The three cases when capturing a range of source
code using the Eclipse AST API.
![Page 66: PhD Dissertation](https://reader033.vdocuments.mx/reader033/viewer/2022052910/559a88a31a28ab187d8b4821/html5/thumbnails/66.jpg)
Identifier Matching
![Page 67: PhD Dissertation](https://reader033.vdocuments.mx/reader033/viewer/2022052910/559a88a31a28ab187d8b4821/html5/thumbnails/67.jpg)
Identifier Partitioning
![Page 68: PhD Dissertation](https://reader033.vdocuments.mx/reader033/viewer/2022052910/559a88a31a28ab187d8b4821/html5/thumbnails/68.jpg)