ase2013

Class Level Fault Prediction using Software Clustering. Giuseppe Scanniello (1), Carmine Gravino (2), Andrian Marcus (3), Tim Menzies (4). [email protected], [email protected], [email protected], [email protected]. 1 University of Basilicata, Italy; 2 University of Salerno, Italy; 3 Wayne State University, USA; 4 West Virginia University, USA

Upload: cs-ncstate

Post on 06-May-2015


DESCRIPTION

Class Level Fault Prediction using Software Clustering, for IEEE ASE 2013, by Giuseppe Scanniello (1), Carmine Gravino (2), Andrian Marcus (3), and Tim Menzies (4). 1 University of Basilicata, Italy; 2 University of Salerno, Italy; 3 Wayne State University, USA; 4 West Virginia University, USA

TRANSCRIPT

Page 1: Ase2013

Giuseppe Scanniello (1), Carmine Gravino (2), Andrian Marcus (3), Tim Menzies (4)

[email protected], [email protected], [email protected], [email protected]

1 University of Basilicata, Italy
2 University of Salerno, Italy
3 Wayne State University, USA
4 West Virginia University, USA

Class Level Fault Prediction using Software Clustering

Page 2: Ase2013

This talk = BorderFlow clustering for defect prediction

•  Defect prediction
   –  Sort modules by odds of having defects
   –  Used to prioritize subsequent work
•  BorderFlow
   –  Finds code clusters with
      •  High cohesion
      •  Low coupling
•  Produces better defect predictors.

Page 3: Ase2013

Q: Why? A: Too much blah, blah

•  Software is being written by
   –  More people
   –  For more tasks
   –  Using changing tools
   –  On ever-changing platforms
•  Any claim that X is always THE prime determiner of defects, efforts, livelocks, etc. is …
   –  A trite simplification of a more complex issue

Page 4: Ase2013

Local lessons > Trite global claims

•  Cluster data
•  Learn 1 model per cluster
   –  Context-specific solutions
   –  Better predictions,
   –  lower variance,
   –  lower false alarms,
   –  faster runtimes
   –  better explanation
   –  etc.
•  As recommended by any number of papers
   –  Turhan:ESEj'09;
   –  Menzies:ASE'11,TSE'13;
   –  Bettenburg:MSR'12;
   –  Yang:IST'13
   –  etc.

Page 5: Ase2013

Related Work

•  How to cluster?
   –  By intra-module features?
      •  Menzies:ASE'11,TSE'13; Bettenburg:MSR'12
   –  By performance deltas of models learned from different stratifications?
      •  Yang:IST'13; He:ASEj'12; He:ESEM'13
•  And what features to use?
   –  Just static code measures:
      •  Menzies:ASE'11,TSE'13; Bettenburg:MSR'12
   –  Using design + code artifacts?
      •  Schroter:ESEM'06
   –  Using software process + product measures?
      •  Kamei:ICSM'10
   –  Using attributes synthesized with PCA or LSI?
      •  Nagappan:ICSE'06, Tan:WCRE'11
•  Premise of this paper:
   –  These How and What are related

Need principles for reducing option space

Hence, this paper

Page 6: Ase2013

New idea

•  Software has natural clusters
   –  Regions of high cohesion and low coupling
•  What if we exploited that naturally occurring structure?
•  So cluster not by intra-module features;
   –  e.g. as done by Menzies:ASE'11,TSE'13; Bettenburg:MSR'12; etc.
   –  But by inter-module features
•  Use a clusterer that understands cohesion and coupling
   –  This talk: BorderFlow clustering for defect prediction

Page 7: Ase2013

Target domain

•  Defect prediction from static code features
•  Easy to use:
   –  scalable feature extractors + logs of defects found
•  Widely used:
   –  Prioritize inspections: find 20% of code with 80% of errors
      •  Ostrand:ISSTA'04; Nagappan:ICSE'06; Menzies:ASEj'10; Tosun:IAAI'10; etc.
•  Useful to use:
   –  Compared to (some) samples of industrial practices…
      •  Finds more defects: Menzies:TSE'07
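The "20% of code with 80% of errors" idea can be illustrated with a minimal sketch. It assumes each module is a dict with hypothetical keys `loc`, `score` (the predictor's output), and `defects` (the logged count); none of these names come from the slides, and the function only shows one way to measure how many defects fall inside an inspection budget.

```python
def defects_found_in_budget(modules, budget=0.20):
    """Fraction of all defects found when inspecting modules in
    descending score order until `budget` of the total LOC is used."""
    total_loc = sum(m["loc"] for m in modules)
    total_defects = sum(m["defects"] for m in modules)
    inspected = 0
    found = 0
    for m in sorted(modules, key=lambda m: m["score"], reverse=True):
        # Stop as soon as the next module would exceed the LOC budget.
        if inspected + m["loc"] > budget * total_loc:
            break
        inspected += m["loc"]
        found += m["defects"]
    return found / total_defects if total_defects else 0.0


# Hypothetical data, for illustration only.
example_modules = [
    {"loc": 100, "score": 0.9, "defects": 6},
    {"loc": 100, "score": 0.8, "defects": 2},
    {"loc": 400, "score": 0.3, "defects": 1},
    {"loc": 400, "score": 0.1, "defects": 1},
]
print(defects_found_in_budget(example_modules))
```

A better predictor pushes the defective modules toward the front of the ranking, so the same inspection budget uncovers a larger share of the defects.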

Page 8: Ase2013

BorderFlow (Ngomo:CLCing'09)

•  Graph G = (V, E); V are classes
•  e(ci, cj) ∈ E if cj references ci
   –  In class instantiation, method invocation, or field access
   –  JRIPPLES: http://jripples.sourceforge.net/
•  A cluster X is a subset of V that maximizes:
   –  F(X) = Ω(b(X), X) / Ω(b(X), n(X))
   –  b(X) = border nodes inside X
   –  n(X) = direct neighbors of b(X), outside of X
   –  Ω = number of edges between subsets of V;
      Ω(X, Y) = Σ e(ci, cj) | ci ∈ X and cj ∈ Y
•  Iteratively inserts nodes in n(X) till F(X) is maximized.
   –  1) Candidates: find C(X) = nodes not in X where F(X + C(X)) > F(X)
   –  2) Prune: subset Y in C(X) that maximizes Ω(Y, n(X))
   –  3) Test: if F(X + Y) >= F(X), then X = X + Y
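The definitions and the three-step loop above can be sketched in a few lines of Python. This is one reading of the slide, not the authors' implementation: it assumes the reference graph is treated as undirected, takes candidates from n(X), evaluates each candidate on its own in step 1, and keeps in step 2 the candidates with the largest Ω({v}, n(X)). All function names are hypothetical.

```python
def undirected(edges):
    """Build a symmetric adjacency map from (class, class) pairs."""
    g = {}
    for u, v in edges:
        g.setdefault(u, set()).add(v)
        g.setdefault(v, set()).add(u)
    return g

def omega(g, xs, ys):
    """Omega(X, Y): count of ordered pairs (u, v), u in X, v in Y, linked by an edge."""
    return sum(1 for u in xs for v in g[u] if v in ys)

def border(g, x):
    """b(X): nodes of X with at least one neighbor outside X."""
    return {u for u in x if g[u] - x}

def neighbors(g, x):
    """n(X): nodes outside X adjacent to a border node of X."""
    return {v for u in border(g, x) for v in g[u]} - x

def flow_ratio(g, x):
    """F(X) = Omega(b(X), X) / Omega(b(X), n(X)); infinite if no edge leaves X."""
    b = border(g, x)
    outward = omega(g, b, neighbors(g, x))
    return float("inf") if outward == 0 else omega(g, b, x) / outward

def grow_cluster(g, seed):
    """Grow a cluster from one seed class until F(X) stops improving."""
    x = {seed}
    while True:
        fx = flow_ratio(g, x)
        n_x = neighbors(g, x)
        # 1) Candidates: neighbors whose insertion raises F.
        cands = {v for v in n_x if flow_ratio(g, x | {v}) > fx}
        if not cands:
            return x
        # 2) Prune: keep the candidates best connected to n(X).
        best = max(omega(g, {v}, n_x) for v in cands)
        y = {v for v in cands if omega(g, {v}, n_x) == best}
        # 3) Test: accept Y only if F does not drop.
        if flow_ratio(g, x | y) >= fx:
            x |= y
        else:
            return x


# Two triangles joined by a single edge (illustrative data).
demo = undirected([("a", "b"), ("b", "c"), ("a", "c"),
                   ("d", "e"), ("e", "f"), ("d", "f"), ("c", "d")])
print(sorted(grow_cluster(demo, "a")))
```

On this toy graph the cluster grown from `a` stops at the first triangle, because adding `d` would lower the ratio of inward to outward edges at the border.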

Page 9: Ase2013

Experiment: leave-one-out, Java classes, learn from clusters vs learn from all

Java systems from promisedata.googlecode.com

X = one of Ant, Jedit, Lucene, POI, Synapse, Velocity, Xalan, Xerces
Version = one version of X
Clusters = BorderFlow(Version)
For Cluster in Clusters
    For Class in Version
        Test   = Class
        a      = Class.Faults
        Train  = Version – Test
        Model0 = SwLSR(Train)            # baseline: global model
        p0     = Model0(Test)
        Model1 = SwLSR(Cluster – Class)  # local model
        p1     = Model1(Class)
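The loop above can be sketched as runnable Python under some loud assumptions. The slides do not define SwLSR, so a plain one-feature least-squares line (faults against LOC) stands in for it here. The sketch also iterates over the classes of each cluster, so that the local model is always trained on the test class's own cluster. The record keys `name`, `loc`, and `faults` are hypothetical.

```python
from statistics import mean

def fit_line(rows):
    """Least-squares line faults ~ intercept + slope * loc (stand-in for SwLSR)."""
    xs = [r["loc"] for r in rows]
    ys = [r["faults"] for r in rows]
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    intercept = my - slope * mx
    return lambda row: intercept + slope * row["loc"]

def leave_one_out(version, clusters):
    """Return (global_errors, local_errors), each a list of |p - a| per class."""
    by_name = {r["name"]: r for r in version}
    global_err, local_err = [], []
    for cluster in clusters:
        for name in cluster:
            test = by_name[name]
            actual = test["faults"]
            train = [r for r in version if r is not test]            # Version - Test
            local = [by_name[n] for n in cluster if n != name]       # Cluster - Class
            if not local:
                continue  # a singleton cluster has no local training data
            p0 = fit_line(train)(test)   # baseline: global model
            p1 = fit_line(local)(test)   # local model
            global_err.append(abs(p0 - actual))
            local_err.append(abs(p1 - actual))
    return global_err, local_err


# Illustrative data: two clusters with opposite LOC/fault trends.
toy_version = [
    {"name": "A1", "loc": 100, "faults": 1}, {"name": "A2", "loc": 200, "faults": 2},
    {"name": "A3", "loc": 300, "faults": 3}, {"name": "B1", "loc": 100, "faults": 4},
    {"name": "B2", "loc": 200, "faults": 3}, {"name": "B3", "loc": 300, "faults": 2},
]
toy_clusters = [["A1", "A2", "A3"], ["B1", "B2", "B3"]]
g_err, l_err = leave_one_out(toy_version, toy_clusters)
print(mean(g_err), mean(l_err))
```

The toy data is built so that each cluster follows its own trend; a single global line averages the two trends away, which is the situation in which a per-cluster model is expected to help.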

•  Dependent variable
   –  ClassFault
•  Independent variables:
   –  WMC: Weighted Methods per Class
   –  DIT: Depth of Inheritance Tree
   –  NOC: Number Of Children
   –  CBO: Coupling Between Object classes
   –  RFC: Response For Class
   –  LCOM: Lack of Cohesion in Methods
   –  NPM: Number of Public Methods
   –  LOC: Lines Of Code
•  Hypothesis test: Mann-Whitney (5%)
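As an illustration of the hypothesis test named above, here is a minimal standard-library sketch of a two-sided Mann-Whitney U test. It uses midranks for ties and a normal approximation with no tie or continuity correction, which is an assumption made for brevity; the slides do not say which variant or tool was used, and small samples would normally call for exact tables or a statistics package.

```python
from math import sqrt
from statistics import NormalDist

def mann_whitney_u(xs, ys):
    """Return (U, two-sided p) using midranks and a normal approximation."""
    pooled = sorted((v, group) for group, vals in ((0, xs), (1, ys)) for v in vals)
    rank_sum_x = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        # Tied values at positions i..j-1 share the average of ranks i+1..j.
        midrank = (i + 1 + j) / 2
        rank_sum_x += midrank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    n1, n2 = len(xs), len(ys)
    u1 = rank_sum_x - n1 * (n1 + 1) / 2
    u = min(u1, n1 * n2 - u1)
    mu = n1 * n2 / 2
    sigma = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    p = min(1.0, 2 * NormalDist().cdf((u - mu) / sigma))
    return u, p


# Illustrative error samples (not results from the talk).
local_errors = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
global_errors = [1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8]
u_stat, p_value = mann_whitney_u(local_errors, global_errors)
print(u_stat, p_value < 0.05)
```

At the 5% level used in the talk, a p-value below 0.05 would be read as a significant difference between the two error distributions.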

Page 10: Ase2013

Results: Error = mean absolute residuals = mean of |p − a| (more = worse)

•  Usually, local is best
   –  Has less error
   –  Sometimes, much better
•  Exceptions:
   –  See Velocity, Jedit (but only some versions)

[Figure: global error, local error, and delta per system version (vertical axis from -1 to 1.5), shown for JEdit 3.2.1, JEdit 4.1, Velocity 1.4, JEdit 4.2, JEdit 4.3, JEdit 4.0, Velocity 1.6.1, Velocity 1.5]
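The error measure on this slide can be written out as a short sketch. The slides do not define "delta", so it is assumed here to be the global error minus the local error (positive when the local model is better); the sample numbers are invented for illustration and are not results from the talk.

```python
from statistics import mean

def mean_absolute_residual(predicted, actual):
    """Mean of |p - a| over all classes; larger values are worse."""
    return mean([abs(p - a) for p, a in zip(predicted, actual)])


# Hypothetical fault counts and predictions for four classes.
actual_faults = [0, 2, 1, 4]
global_pred = [1, 1, 1, 1]
local_pred = [0, 2, 2, 3]

global_error = mean_absolute_residual(global_pred, actual_faults)
local_error = mean_absolute_residual(local_pred, actual_faults)
delta = global_error - local_error  # assumed sign convention
print(global_error, local_error, delta)
```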

Page 11: Ase2013

Summary

•  Too many options
   –  Need principles to design data miners for software engineering
•  We applied a core SE principle
   –  Coupling and cohesion
•  Used it to select both
   –  A data miner: BorderFlow
   –  And the attributes it explores
      •  Inter-module features
•  Obtained better results

Future Work

•  Repeat on more data sets
•  Compare with other local learners
•  Test whether inter-module features are always best