ICSM 2011: You Can't Control the Unfamiliar
12-04-2023/ W&I / MDSE PAGE 1
Metrics are usually computed at a low level: classes, methods, …
Multitude of data values obscures a general picture of the system maintainability
That we are actually interested in!
You Can't Control the Unfamiliar: A Study on the Relations Between Aggregation Techniques for Software Metrics
Bogdan Vasilescu
Alexander Serebrenik
Mark van den Brand
Two kinds of aggregation
Same artifact, different metrics
Same metrics, different artifacts
Various techniques can be found in the literature
Same metrics, different artifacts
Traditional: mean, median, sum, …
Econometric inequality indices: Gini, Theil, Hoover, Kolm, Atkinson
Which aggregation technique should we use?
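The inequality indices named on this slide can be sketched as follows. This is a minimal illustration using the textbook definitions of Gini, Theil and Hoover, not the authors' tooling; the SLOC lists are hypothetical class sizes within one package.

```python
import math

def gini(values):
    """Gini index: sum of absolute pairwise differences / (2 * n^2 * mean)."""
    n = len(values)
    mean = sum(values) / n
    total_diff = sum(abs(x - y) for x in values for y in values)
    return total_diff / (2 * n * n * mean)

def theil(values):
    """Theil index: mean of (x/mean) * ln(x/mean), taking 0 * ln(0) as 0."""
    n = len(values)
    mean = sum(values) / n
    return sum((x / mean) * math.log(x / mean) for x in values if x > 0) / n

def hoover(values):
    """Hoover index: sum of absolute deviations from the mean / (2 * total)."""
    total = sum(values)
    mean = total / len(values)
    return sum(abs(x - mean) for x in values) / (2 * total)

# Hypothetical class-level SLOC values for two packages (assumed data).
sloc_equal = [100, 100, 100, 100]   # every class has the same size
sloc_skewed = [10, 10, 10, 370]     # one class dominates the package

for name, index in (("Gini", gini), ("Theil", theil), ("Hoover", hoover)):
    print(name, index(sloc_equal), index(sloc_skewed))
```

All three indices are zero for the perfectly equal package and grow as the size distribution becomes more concentrated in a few classes.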
Questions
1. Which aggregation techniques agree, and to what extent?
2. What is the nature of the relation between the various aggregation techniques?
3. How does the correlation coefficient change as the systems evolve?
Qualitas Corpus 20101126
• Qualitas Corpus 20101126r, 106 systems
• FitJava v1.1, 2 packages, 2240 SLOC
• NetBeans v6.9.1, 3373 packages, 1890536 SLOC
1) Agreement between different techniques
• Agreement:
  • Aggregation: class-level SLOC → package level
  • Techniques agree if they rank the packages similarly
We use a rank-based correlation coefficient: Kendall’s τ
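A rank-based agreement check of this kind can be sketched as below. The function is a plain O(n²) implementation of Kendall's τ-b (the tie-corrected variant, chosen here as an assumption; the slides do not say which variant was used), and the per-package aggregate values are invented for illustration.

```python
import math
from itertools import combinations

def kendall_tau_b(xs, ys):
    """Kendall's tau-b: (concordant - discordant) pairs, corrected for ties."""
    concordant = discordant = tied_x_only = tied_y_only = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            continue                 # tied in both rankings: ignored
        elif dx == 0:
            tied_x_only += 1
        elif dy == 0:
            tied_y_only += 1
        elif dx * dy > 0:
            concordant += 1          # both techniques order the pair alike
        else:
            discordant += 1          # the techniques disagree on the pair
    untied_x = concordant + discordant + tied_y_only
    untied_y = concordant + discordant + tied_x_only
    # Note: undefined (division by zero) if one ranking is entirely tied.
    return (concordant - discordant) / math.sqrt(untied_x * untied_y)

# Hypothetical aggregates for four packages (assumed data).
gini_per_package = [0.10, 0.35, 0.50, 0.72]
theil_per_package = [0.02, 0.20, 0.45, 0.90]
sum_per_package = [900, 120, 4000, 300]

print(kendall_tau_b(gini_per_package, theil_per_package))
print(kendall_tau_b(gini_per_package, sum_per_package))
```

In this toy data the first pair of techniques ranks the packages identically, while the second pair has as many concordant as discordant package pairs.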
1) Agreement: different inequality indices?
• Gini, Theil, Hoover, Atkinson – agree
  • aggregates obtained convey the same information
• Kolm does not!
1) Agreement: traditional and inequality indices?
• mean and Kolm: strong (0.8) and statistically significant (92%)
• median, standard deviation, and variance
• sum
  • does not correlate with any other aggregation technique
2) Nature of the relation: Typical patterns
• Theil is known to be more sensitive to the rich
• Theil increases faster when Gini increases
• Linear relation with a “fat” head
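The pattern above can be reproduced on a toy example. The sketch below uses the textbook Gini and Theil definitions and a hypothetical package of four classes in which one class keeps growing; it then compares how fast Theil rises per unit of Gini increase. The data and the helper name `slopes` are assumptions for illustration only.

```python
import math

def gini(values):
    """Gini index: sum of absolute pairwise differences / (2 * n^2 * mean)."""
    n = len(values)
    mean = sum(values) / n
    return sum(abs(x - y) for x in values for y in values) / (2 * n * n * mean)

def theil(values):
    """Theil index: mean of (x/mean) * ln(x/mean), taking 0 * ln(0) as 0."""
    n = len(values)
    mean = sum(values) / n
    return sum((x / mean) * math.log(x / mean) for x in values if x > 0) / n

# Hypothetical package: three classes of 100 SLOC and one growing "rich" class.
rich_sizes = [100, 200, 400, 800, 1600]
points = [(gini([100, 100, 100, s]), theil([100, 100, 100, s]))
          for s in rich_sizes]

# Slope of Theil against Gini between consecutive points.
slopes = [(t2 - t1) / (g2 - g1)
          for (g1, t1), (g2, t2) in zip(points, points[1:])]
print(slopes)
```

For this family of distributions the slopes increase from one step to the next: the larger Gini already is, the more Theil gains for the same additional increase in Gini.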
Which aggregation technique? (1)
• Theil, Hoover, Gini and Atkinson agree
  • Any can be chosen from the correlation point of view
• Some might be “better” in each specific case
  • easy to interpret: Gini [0,1]
  • provide additional insights: Theil (explanation)
  • negative values: Gini, Hoover
    − affects the domain!
  • sensitive for high values: Theil, Atkinson
  • deviations from uniformity: Gini, Hoover
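The remark about negative values and the domain can be illustrated with a small sketch. Gini and Hoover remain computable when some metric values are negative (Theil and Atkinson rely on logarithms or roots of the values), but the result may leave the [0,1] range. The input below is made up for illustration and does not correspond to any particular metric.

```python
def gini(values):
    """Gini index: sum of absolute pairwise differences / (2 * n^2 * mean)."""
    n = len(values)
    mean = sum(values) / n
    return sum(abs(x - y) for x in values for y in values) / (2 * n * n * mean)

def hoover(values):
    """Hoover index: sum of absolute deviations from the mean / (2 * total)."""
    total = sum(values)
    mean = total / len(values)
    return sum(abs(x - mean) for x in values) / (2 * total)

# Hypothetical metric values containing negatives (assumed data).
with_negatives = [-10, -10, 30]

print(gini(with_negatives), hoover(with_negatives))
```

Both results exceed 1 here, so the usual reading of the index as a value between 0 and 1 no longer applies.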
Which aggregation technique? (2)
• Kolm and mean agree
  • Kolm is reliable for skewed distributions
    − better alternative (“by no means”)
  • Not in the paper:
    − agreement observed for NOC
    − but not for DIT!
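For reference, the Kolm index can be sketched as below, using its standard definition K = (1/α) · ln((1/n) · Σ exp(α(μ − xᵢ))). The parameter value α = 1 and the sample data are assumptions, and the log-sum-exp rearrangement is only there to avoid floating-point overflow on large SLOC values. Unlike the relative indices, Kolm is an absolute index: it is expressed in the units of the metric and does not change when the same constant is added to every value.

```python
import math

def kolm(values, alpha=1.0):
    """Kolm index, computed with a log-sum-exp shift for numerical stability."""
    n = len(values)
    mean = sum(values) / n
    exponents = [alpha * (mean - x) for x in values]
    shift = max(exponents)
    log_mean_exp = shift + math.log(sum(math.exp(e - shift) for e in exponents) / n)
    return log_mean_exp / alpha

# Hypothetical class-level SLOC values (assumed data).
base = [10, 10, 10, 370]
shifted = [x + 1000 for x in base]   # same spread, every class 1000 SLOC larger
doubled = [2 * x for x in base]      # same proportions, twice the scale

print(kolm(base), kolm(shifted), kolm(doubled))
```

Shifting all values leaves the index unchanged, while doubling them increases it, which is the opposite of how scale-invariant indices such as Gini behave.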
Conclusions