ase2013
Class Level Fault Prediction using Software Clustering, for IEEE ASE 2013, by Giuseppe Scanniello (1), Carmine Gravino (2), Andrian Marcus (3), and Tim Menzies (4), from (1) University of Basilicata, Italy; (2) University of Salerno, Italy; (3) Wayne State University, USA; (4) West Virginia University, USA.
![Page 1: Ase2013](https://reader035.vdocuments.mx/reader035/viewer/2022081403/554a0f85b4c90507558b4b7e/html5/thumbnails/1.jpg)
Giuseppe Scanniello (1), Carmine Gravino (2), Andrian Marcus (3), Tim Menzies (4)
(1) University of Basilicata, Italy
(2) University of Salerno, Italy
(3) Wayne State University, USA
(4) West Virginia University, USA

Class Level Fault Prediction using Software Clustering
![Page 2: Ase2013](https://reader035.vdocuments.mx/reader035/viewer/2022081403/554a0f85b4c90507558b4b7e/html5/thumbnails/2.jpg)
This talk: BorderFlow clustering for defect prediction
• Defect prediction
  – Sort modules by odds of having defects
  – Used to prioritize subsequent work
• BorderFlow
  – Finds code clusters with
    • High cohesion
    • Low coupling
  – Produces better defect predictors
![Page 3: Ase2013](https://reader035.vdocuments.mx/reader035/viewer/2022081403/554a0f85b4c90507558b4b7e/html5/thumbnails/3.jpg)
Q: Why? A: Too much blah, blah
• Software is being written by
  – More people
  – For more tasks
  – Using changing tools
  – On ever-changing platforms
• Any claim that X is always THE prime determiner of defects, efforts, livelocks, etc., etc., etc. is …
  – Trite simplifications of a more complex issue
![Page 4: Ase2013](https://reader035.vdocuments.mx/reader035/viewer/2022081403/554a0f85b4c90507558b4b7e/html5/thumbnails/4.jpg)
Local lessons > Trite global claims
• Cluster data
• Learn 1 model per cluster
  – Context-specific solutions
  – Better predictions
  – Lower variance
  – Lower false alarms
  – Faster runtimes
  – Better explanation
  – etc.
• As recommended by any number of papers
  – Turhan:ESEj’09
  – Menzies:ASE’11, TSE’13
  – Bettenburg:MSR’12
  – Yang:IST’13
  – etc., etc., etc.
![Page 5: Ase2013](https://reader035.vdocuments.mx/reader035/viewer/2022081403/554a0f85b4c90507558b4b7e/html5/thumbnails/5.jpg)
Related Work
• How to cluster?
  – By intra-module features?
    • Menzies:ASE’11, TSE’13; Bettenburg:MSR’12
  – By performance deltas of models learned from different stratifications?
    • Yang:IST’13; He:ASEj’12; He:ESEM’13
• And what features to use?
  – Just static code measures:
    • Menzies:ASE’11, TSE’13; Bettenburg:MSR’12
  – Using design + code artifacts?
    • Schroter:ESEM’06
  – Using software process + product measures?
    • Kamei:ICSM’10
  – Using synthesized attributes using PCA or LSI?
    • Nagappan:ICSE’06; Tan:WCRE’11
• Premise of this paper:
  – These How and What are related
Need principles for reducing option space
Hence, this paper
![Page 6: Ase2013](https://reader035.vdocuments.mx/reader035/viewer/2022081403/554a0f85b4c90507558b4b7e/html5/thumbnails/6.jpg)
New idea
• So cluster not by intra-module features
  – e.g. as done by Menzies:ASE’11, TSE’13; Bettenburg:MSR’12; etc.
  – But by inter-module features
• Use a clusterer that understands cohesion and coupling
  – This talk: BorderFlow clustering for defect prediction
• Software has natural clusters
  – Regions of high cohesion and low coupling
• What if we exploited that naturally occurring structure?
![Page 7: Ase2013](https://reader035.vdocuments.mx/reader035/viewer/2022081403/554a0f85b4c90507558b4b7e/html5/thumbnails/7.jpg)
Target domain
• Defect prediction from static code features
• Easy to use:
  – Scalable feature extractors + logs of defects found
• Widely used:
  – Prioritize inspections: find the 20% of code with 80% of errors
    • Ostrand:ISSTA’04; Nagappan:ICSE’06; Menzies:ASEj’10; Tosun:IAAI’10; etc., etc.
• Useful to use:
  – Compared to (some) samples of industrial practices…
    • Finds more defects: Menzies:TSE’07
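To make "sort modules by odds of having defects" and "find 20% of code with 80% of errors" concrete, here is a minimal Python sketch. The scores, LOC counts, and defect counts are hypothetical inputs (in practice a trained predictor would supply the scores), the 20% LOC budget is only the rule of thumb quoted on the slide, and `inspection_order` and `recall_at_budget` are illustrative helper names, not code from the paper.

```python
from typing import Dict, List


def inspection_order(scores: Dict[str, float]) -> List[str]:
    """Sort modules so those with the highest predicted defect odds come first."""
    return sorted(scores, key=lambda m: scores[m], reverse=True)


def recall_at_budget(
    order: List[str],
    loc: Dict[str, int],
    defects: Dict[str, int],
    budget: float = 0.2,
) -> float:
    """Fraction of all known defects found by inspecting modules in `order`
    until the next module would push us past `budget` of the total LOC."""
    total_loc = sum(loc.values())
    total_defects = sum(defects.values())
    used, found = 0, 0
    for module in order:
        if used + loc[module] > budget * total_loc:
            break  # the inspection budget is exhausted
        used += loc[module]
        found += defects[module]
    return found / total_defects if total_defects else 0.0


if __name__ == "__main__":
    # Hypothetical predictor scores and defect logs for three modules.
    scores = {"A": 0.9, "B": 0.1, "C": 0.5}
    loc = {"A": 100, "B": 300, "C": 100}
    defects = {"A": 4, "B": 1, "C": 0}
    order = inspection_order(scores)
    print(order, recall_at_budget(order, loc, defects))  # ['A', 'C', 'B'] 0.8
```

Here inspecting only module A (20% of the LOC) finds 4 of the 5 known defects, which is the kind of payoff that makes ranking by predicted odds worthwhile.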
![Page 8: Ase2013](https://reader035.vdocuments.mx/reader035/viewer/2022081403/554a0f85b4c90507558b4b7e/html5/thumbnails/8.jpg)
BorderFlow   Ngomo:CICLing’09
• Graph $G = (V, E)$; $V$ are classes
• $e(c_i, c_j) \in E$ if $c_j$ references $c_i$
  – In class instantiation, method invocation, or field access
  – JRIPPLES: http://jripples.sourceforge.net/
• A cluster $X$ is a subset of $V$ that maximizes:
  – $F(X) = \Omega(b(X), X) \,/\, \Omega(b(X), n(X))$
  – $b(X)$ = border nodes inside $X$
  – $n(X)$ = direct neighbors of $b(X)$, outside of $X$
  – $\Omega$ = number of edges between subsets of $V$: $\Omega(X, Y) = \sum_{c_i \in X,\, c_j \in Y} e(c_i, c_j)$
• Iteratively inserts nodes in $n(X)$ till $F(X)$ is maximized:
  – 1) Candidates: find $C(X)$ = nodes not in $X$ where $F(X + C(X)) > F(X)$
  – 2) Prune: the subset $Y$ of $C(X)$ that maximizes $\Omega(Y, n(X))$
  – 3) Test: if $F(X + Y) \ge F(X)$, then $X = X + Y$
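The three BorderFlow steps on this slide can be sketched in Python. This is a minimal, literal reading of the slide under stated assumptions: the graph is unweighted and directed, stored as a set of `(c_i, c_j)` pairs; Ω counts directed edges from one node set to another; the prune step keeps every candidate tied for the largest Ω({v}, n(X)); and only a single cluster is grown from one seed class. It is not the authors' implementation nor Ngomo's reference code, and the class names in the usage example are hypothetical.

```python
from typing import Set, Tuple

Edge = Tuple[str, str]  # (c_i, c_j): c_j references c_i


def omega(edges: Set[Edge], xs: Set[str], ys: Set[str]) -> int:
    """Omega(X, Y): number of edges (c_i, c_j) with c_i in X and c_j in Y."""
    return sum(1 for ci, cj in edges if ci in xs and cj in ys)


def border(edges: Set[Edge], x: Set[str]) -> Set[str]:
    """b(X): nodes of X with at least one edge leaving X."""
    return {ci for ci, cj in edges if ci in x and cj not in x}


def neighbors(edges: Set[Edge], x: Set[str]) -> Set[str]:
    """n(X): nodes outside X that the border of X has edges to."""
    return {cj for ci, cj in edges if ci in x and cj not in x}


def flow_ratio(edges: Set[Edge], x: Set[str]) -> float:
    """F(X) = Omega(b(X), X) / Omega(b(X), n(X)); infinite if nothing leaves X."""
    b = border(edges, x)
    outgoing = omega(edges, b, neighbors(edges, x))
    if outgoing == 0:
        return float("inf")
    return omega(edges, b, x) / outgoing


def borderflow(edges: Set[Edge], seed: str) -> Set[str]:
    """Grow one cluster from `seed` until no step raises F(X)."""
    x = {seed}
    while True:
        n = neighbors(edges, x)
        if not n:
            return x
        f = flow_ratio(edges, x)
        # 1) Candidates: neighbors whose addition increases F(X).
        cands = {v for v in n if flow_ratio(edges, x | {v}) > f}
        if not cands:
            return x
        # 2) Prune: keep the candidates with the most edges into n(X).
        best = max(omega(edges, {v}, n) for v in cands)
        y = {v for v in cands if omega(edges, {v}, n) == best}
        # 3) Test: accept Y only if F(X) does not drop.
        if flow_ratio(edges, x | y) >= f:
            x = x | y
        else:
            return x


if __name__ == "__main__":
    # Two hypothetical triangles of mutually referencing classes joined by C <-> D.
    pairs = [("A", "B"), ("B", "C"), ("A", "C"), ("D", "E"), ("E", "F"), ("D", "F"), ("C", "D")]
    edges = {e for p, q in pairs for e in ((p, q), (q, p))}
    print(sorted(borderflow(edges, "A")))  # ['A', 'B', 'C']
```

In the example, absorbing D into {A, B, C} would lower F from 2.0 to 0.5, so the expansion stops at the tightly coupled triangle, which is the cohesion/coupling behavior the slide relies on.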
![Page 9: Ase2013](https://reader035.vdocuments.mx/reader035/viewer/2022081403/554a0f85b4c90507558b4b7e/html5/thumbnails/9.jpg)
Experiment: leave-one-out, Java classes, learn from clusters vs. learn from all

    Java systems from promisedata.googlecode.com
    X = one of Ant, JEdit, Lucene, POI, Synapse, Velocity, Xalan, Xerces
    Version = one version of X
    Clusters = BorderFlow(Version)
    For Cluster in Clusters
        For Class in Version
            Test = Class
            a = Class.Faults
            Train = Version - Test
            Model0 = SwLSR(Train)              # baseline: global model
            p0 = Model0(Test)
            Model1 = SwLSR(Cluster - Class)    # local model
            p1 = Model1(Class)
• Dependent variable: ClassFault
• Independent variables:
  – WMC: Weighted Methods per Class
  – DIT: Depth of Inheritance Tree
  – NOC: Number Of Children
  – CBO: Coupling Between Object classes
  – RFC: Response For Class
  – LCOM: Lack of Cohesion in Methods
  – NPM: Number of Public Methods
  – LOC: Lines Of Code
• Hypothesis test: Mann-Whitney (5%)
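The leave-one-out loop above can be approximated in a few lines of Python. This sketch makes several assumptions the slide does not state: it swaps stepwise linear regression (SwLSR) for an ordinary least-squares line on a single metric, it tests each class only against the cluster that contains it, it skips clusters with fewer than two classes, and the metric values and fault counts in the usage example are made up. `fit_line` and `leave_one_out` are hypothetical helper names.

```python
from statistics import mean
from typing import Callable, Dict, List, Set, Tuple

Row = Tuple[float, float]  # (one metric such as LOC, number of faults)


def fit_line(rows: List[Row]) -> Callable[[float], float]:
    """Least-squares line y = a + b*x (a simple stand-in for SwLSR)."""
    mx = mean(x for x, _ in rows)
    my = mean(y for _, y in rows)
    sxx = sum((x - mx) ** 2 for x, _ in rows)
    slope = 0.0 if sxx == 0 else sum((x - mx) * (y - my) for x, y in rows) / sxx
    intercept = my - slope * mx
    return lambda x: intercept + slope * x


def leave_one_out(
    version: Dict[str, Row], clusters: List[Set[str]]
) -> Tuple[List[float], List[float]]:
    """Absolute residuals |p - a| of the global (p0) and local (p1) models."""
    global_err, local_err = [], []
    for cluster in clusters:
        if len(cluster) < 2:
            continue  # nothing left to train a local model on
        for cls in sorted(cluster):
            x, actual = version[cls]
            # Baseline: global model trained on every other class in the version.
            model0 = fit_line([row for c, row in version.items() if c != cls])
            # Local model trained on the rest of this class's cluster.
            model1 = fit_line([version[c] for c in cluster if c != cls])
            global_err.append(abs(model0(x) - actual))
            local_err.append(abs(model1(x) - actual))
    return global_err, local_err


if __name__ == "__main__":
    # Hypothetical data: faults grow with LOC in one cluster and shrink in the other.
    version = {
        "a1": (10, 1), "a2": (20, 2), "a3": (30, 3),
        "b1": (10, 5), "b2": (20, 4), "b3": (30, 3),
    }
    clusters = [{"a1", "a2", "a3"}, {"b1", "b2", "b3"}]
    g, l = leave_one_out(version, clusters)
    print(f"global MAR={mean(g):.3f}  local MAR={mean(l):.3f}")
```

The toy data is deliberately built so that each cluster follows its own trend: a single global line has to average two opposite slopes, while each local line fits its cluster exactly.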
![Page 10: Ase2013](https://reader035.vdocuments.mx/reader035/viewer/2022081403/554a0f85b4c90507558b4b7e/html5/thumbnails/10.jpg)
Results: Error = mean absolute residuals = mean of |p − a| (more = worse)
• Usually, local is best
  – Has less error
  – Sometimes, much better
• Exceptions:
  – See Velocity, JEdit (but only some versions)
[Chart: error of the global model, the local model, and their delta for JEdit 3.2.1, JEdit 4.1, Velocity 1.4, JEdit 4.2, JEdit 4.3, JEdit 4.0, Velocity 1.6.1, and Velocity 1.5; y-axis from −1 to 1.5]
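The error measure and the 5% Mann-Whitney test used to compare global and local models can be sketched in plain Python. This is an illustration, not the paper's analysis script: it uses the large-sample normal approximation to the U statistic with no tie or continuity correction, so for small or heavily tied samples an exact test from a statistics package would be preferable. The names `mar`, `ranks`, and `mann_whitney` are hypothetical, and the residuals in the usage example are made up.

```python
import math
from typing import List, Sequence, Tuple


def mar(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Mean absolute residual: the average of |p - a| (more = worse)."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def mann_whitney(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Two-sided Mann-Whitney U test; returns (U for xs, approximate p-value)."""
    n1, n2 = len(xs), len(ys)
    r = ranks(list(xs) + list(ys))
    u1 = sum(r[:n1]) - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mu) / sigma
    # Two-sided p-value under the normal approximation: 2 * (1 - Phi(|z|)).
    return u1, math.erfc(abs(z) / math.sqrt(2))


if __name__ == "__main__":
    # Hypothetical absolute residuals of a global and a local model.
    global_err = [6, 7, 8, 9, 10]
    local_err = [1, 2, 3, 4, 5]
    u, p = mann_whitney(local_err, global_err)
    print(u, p < 0.05)  # 0.0 True
```

A rank test fits here because absolute residuals are rarely normally distributed, so comparing their ranks avoids assuming a particular error distribution.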
![Page 11: Ase2013](https://reader035.vdocuments.mx/reader035/viewer/2022081403/554a0f85b4c90507558b4b7e/html5/thumbnails/11.jpg)
Summary
• Too many options
  – Need principles to design data miners for software engineering
• We applied a core SE principle
  – Coupling and cohesion
• Used it to select both
  – A data miner: BorderFlow
  – And the attributes it explores
    • Inter-module features
• Obtained better results

Future Work
• Repeat on more data sets
• Compare with other local learners
• Test if inter-module is always best