empirical evaluations in software engineering research: a personal perspective

112
1 Empirical Evaluations in Software Engineering Research: A Personal Perspective Ahmed E. Hassan Canada Research Chair in Software Analytics Blackberry Ultra Large Scale Systems Chair School of Computing, Queen’s University Kingston, Canada

Upload: sailqu

Post on 21-Jan-2018

247 views

Category:

Presentations & Public Speaking


0 download

TRANSCRIPT

Page 1: Empirical Evaluations in Software Engineering Research: A Personal Perspective

1

Empirical Evaluations in Software Engineering Research: A Personal Perspective

Ahmed E. Hassan Canada Research Chair in Software Analytics Blackberry Ultra Large Scale Systems Chair

School of Computing, Queen’s University Kingston, Canada

Page 2: Empirical Evaluations in Software Engineering Research: A Personal Perspective

2

Today’s goal:

Give concrete ideas Inspire change

Page 3: Empirical Evaluations in Software Engineering Research: A Personal Perspective

Kingston Home of the 1,000 Islands

3

Page 4: Empirical Evaluations in Software Engineering Research: A Personal Perspective

Kingston Home of the 1,000 Islands

3

Page 5: Empirical Evaluations in Software Engineering Research: A Personal Perspective

4

Page 6: Empirical Evaluations in Software Engineering Research: A Personal Perspective

5

Intelligence throughout the lifetime of a software system

from inception to production http://sail.cs.queensu.ca

Page 7: Empirical Evaluations in Software Engineering Research: A Personal Perspective

1851

6

SAIL’s new home: A 5,000 sq. ft. historical building

Page 8: Empirical Evaluations in Software Engineering Research: A Personal Perspective

1851

6

SAIL’s new home: A 5,000 sq. ft. historical building

Page 9: Empirical Evaluations in Software Engineering Research: A Personal Perspective

1851

6

SAIL’s new home: A 5,000 sq. ft. historical building

Page 10: Empirical Evaluations in Software Engineering Research: A Personal Perspective

~15 years of Mining Software Repositories

7http://msrconf.org

Page 11: Empirical Evaluations in Software Engineering Research: A Personal Perspective

Early MSR Days

8

Page 12: Empirical Evaluations in Software Engineering Research: A Personal Perspective

9

Early Day Miner

Page 13: Empirical Evaluations in Software Engineering Research: A Personal Perspective

MSR Today: Easy Peasy!

10

Page 14: Empirical Evaluations in Software Engineering Research: A Personal Perspective

11

Data track GHTorrent

Lots of data!

Page 15: Empirical Evaluations in Software Engineering Research: A Personal Perspective

Then

12

Now

Page 16: Empirical Evaluations in Software Engineering Research: A Personal Perspective

13

Thanks to a large scale community

effort!

Page 17: Empirical Evaluations in Software Engineering Research: A Personal Perspective

14

Yet low practitioner interest!!

Page 18: Empirical Evaluations in Software Engineering Research: A Personal Perspective

15

Even though our friendly reviewers are pleased!

Page 19: Empirical Evaluations in Software Engineering Research: A Personal Perspective

Why?!?

16

Page 20: Empirical Evaluations in Software Engineering Research: A Personal Perspective

Why?!?

16

We continue to raise the bar for studies

Page 21: Empirical Evaluations in Software Engineering Research: A Personal Perspective

Empirical evaluations continues to mature and evolve

17

Page 22: Empirical Evaluations in Software Engineering Research: A Personal Perspective

Empirical evaluations continues to mature and evolve

17

No validation

Single System

Multiple Systems

GitHub Scale

+ Practitioner Feedback

Page 23: Empirical Evaluations in Software Engineering Research: A Personal Perspective

18

No validation

Single System

Multiple Systems

GitHub Scale

+ Practitioner Feedback

Page 24: Empirical Evaluations in Software Engineering Research: A Personal Perspective

18

No validation

Single System

Multiple Systems

GitHub Scale

+ Practitioner Feedback

Are things this bad? What can we do to prevent this?

Page 25: Empirical Evaluations in Software Engineering Research: A Personal Perspective

We are killing

research areas

19

Software Visualization

Program Understand.

Page 26: Empirical Evaluations in Software Engineering Research: A Personal Perspective

We are following industry instead of leading!!

Web Apps (e.g, AJAX) Data Analytics

Scripting

Infrastructure Mobile Apps IoT

Page 27: Empirical Evaluations in Software Engineering Research: A Personal Perspective

We are doing this to ourselves!!

21

Page 28: Empirical Evaluations in Software Engineering Research: A Personal Perspective

22

Page 29: Empirical Evaluations in Software Engineering Research: A Personal Perspective

22

Practitioner buy-in

More studies (N+1)

Replication data

Not Soft. Eng!

Page 30: Empirical Evaluations in Software Engineering Research: A Personal Perspective

23

Practitioner buy-in

More studies (N+1)

Replication data

Page 31: Empirical Evaluations in Software Engineering Research: A Personal Perspective

23

Practitioner buy-in

More studies (N+1)

Replication data

Not Soft. Eng!

Page 32: Empirical Evaluations in Software Engineering Research: A Personal Perspective

Not Software Enginering

24

Page 33: Empirical Evaluations in Software Engineering Research: A Personal Perspective

Not Software Enginering

24

Mobile AppsLogs

Build SystemsLoad Testing

NoSQL Systems

Page 34: Empirical Evaluations in Software Engineering Research: A Personal Perspective

25

Page 35: Empirical Evaluations in Software Engineering Research: A Personal Perspective

25

"I found the overall presentation disjointed…. This needs to focus more

on the IR issues and less on web analysis.” SIGIR Rejection 98

Page 36: Empirical Evaluations in Software Engineering Research: A Personal Perspective

26

Practitioner buy-in

Replication data

Not Soft. Eng!

Page 37: Empirical Evaluations in Software Engineering Research: A Personal Perspective

26

Practitioner buy-in

More studies (N+1)

Replication data

Not Soft. Eng!

Page 38: Empirical Evaluations in Software Engineering Research: A Personal Perspective

27

We can early-detect

brain cancer!!!

Page 39: Empirical Evaluations in Software Engineering Research: A Personal Perspective

28

How about skin/lung cancer?!!!

Page 40: Empirical Evaluations in Software Engineering Research: A Personal Perspective

The “N+1” Critique encourages

29

Case Study Fishing

Shallow Analysis

Page 41: Empirical Evaluations in Software Engineering Research: A Personal Perspective

Generalization of approaches instead of generalization of results

30

Page 42: Empirical Evaluations in Software Engineering Research: A Personal Perspective

Encourage Meta Analysis

31

Page 43: Empirical Evaluations in Software Engineering Research: A Personal Perspective

Encourage Meta Analysis

31

Page 44: Empirical Evaluations in Software Engineering Research: A Personal Perspective

Encourage Meta Analysis

31

Page 45: Empirical Evaluations in Software Engineering Research: A Personal Perspective

Encourage Meta Analysis

31

N=15..241

Total N= 2,659

Page 46: Empirical Evaluations in Software Engineering Research: A Personal Perspective

Encourage Meta Analysis

32

Page 47: Empirical Evaluations in Software Engineering Research: A Personal Perspective

Encourage Meta Analysis

32

1972

Page 48: Empirical Evaluations in Software Engineering Research: A Personal Perspective

More is less!

33

Page 49: Empirical Evaluations in Software Engineering Research: A Personal Perspective

34

More studies (N+1)

Replication data

Not Soft. Eng!

Page 50: Empirical Evaluations in Software Engineering Research: A Personal Perspective

34

Practitioner buy-in

More studies (N+1)

Replication data

Not Soft. Eng!

Page 51: Empirical Evaluations in Software Engineering Research: A Personal Perspective

35

Smoking kills!

Page 52: Empirical Evaluations in Software Engineering Research: A Personal Perspective

36

What do smokers think?!

Page 53: Empirical Evaluations in Software Engineering Research: A Personal Perspective

37

54.9% of the time patients did not receive the recommended care for their condition

[New England Journal of Medicine 2003]

Page 54: Empirical Evaluations in Software Engineering Research: A Personal Perspective

Less than half of developers have

a CS degree [stack overflow survey 2016]

38

Page 55: Empirical Evaluations in Software Engineering Research: A Personal Perspective

“A couple billion lines of code later: static checking in the real world” Dawson Engler

39

Page 56: Empirical Evaluations in Software Engineering Research: A Personal Perspective

“A couple billion lines of code later: static checking in the real world” Dawson Engler

39

Page 57: Empirical Evaluations in Software Engineering Research: A Personal Perspective

“A couple billion lines of code later: static checking in the real world” Dawson Engler

39

Page 58: Empirical Evaluations in Software Engineering Research: A Personal Perspective

Do practitioners agree with

each other?

Page 59: Empirical Evaluations in Software Engineering Research: A Personal Perspective

4141

Page 60: Empirical Evaluations in Software Engineering Research: A Personal Perspective

Practitioners do not agree with each other

42

Presentation

–---------–-------–---------–-------

–---------–-------–---------–-------

–---------–-------–---------–-------

Survey (93 stakeholders)

Page 61: Empirical Evaluations in Software Engineering Research: A Personal Perspective

Practitioners do not agree with each other

42

Presentation

–---------–-------–---------–-------

–---------–-------–---------–-------

–---------–-------–---------–-------

Survey (93 stakeholders)

Semi-structured interviews

(15 Key engineers)

–---------–-------–---------–-------–---------–-------–---------–-------

Interviewee’s list

Page 62: Empirical Evaluations in Software Engineering Research: A Personal Perspective

Practitioners do not agree with each other

42

Presentation

–---------–-------–---------–-------

–---------–-------–---------–-------

–---------–-------–---------–-------

Survey (93 stakeholders)

Semi-structured interviews

(15 Key engineers)

Validation survey Interviews

(25 Senior engineers)

–---------–-------–---------–-------

–---------–-------–---------–-------

–---------–-------–---------–-------

–---------–-------–---------–-------–---------–-------–---------–-------

Interviewee’s list

–---------–-------–---------–-------–---------–-------–---------–-------–---------–-------–---------–-------

Implications

Page 63: Empirical Evaluations in Software Engineering Research: A Personal Perspective

Practitioners do not agree with each other

42

Presentation

–---------–-------–---------–-------

–---------–-------–---------–-------

–---------–-------–---------–-------

Survey (93 stakeholders)

Semi-structured interviews

(15 Key engineers)

Validation survey Interviews

(25 Senior engineers)

–---------–-------–---------–-------

–---------–-------–---------–-------

–---------–-------–---------–-------

–---------–-------–---------–-------–---------–-------–---------–-------

Interviewee’s list

–---------–-------–---------–-------–---------–-------–---------–-------–---------–-------–---------–-------

Implications

–---------–-------–---------–-------–---------–-------–---------–-------–---------–-------–---------–-------

Verified implications

Page 64: Empirical Evaluations in Software Engineering Research: A Personal Perspective

Practitioners do not agree with each other

42

Presentation

–---------–-------–---------–-------

–---------–-------–---------–-------

–---------–-------–---------–-------

Survey (93 stakeholders)

Semi-structured interviews

(15 Key engineers)

Validation survey Interviews

(25 Senior engineers)

–---------–-------–---------–-------

–---------–-------–---------–-------

–---------–-------–---------–-------

–---------–-------–---------–-------–---------–-------–---------–-------

Interviewee’s list

–---------–-------–---------–-------–---------–-------–---------–-------–---------–-------–---------–-------

Implications

–---------–-------–---------–-------–---------–-------–---------–-------–---------–-------–---------–-------

Verified implications

As low as 68% agreement

Page 65: Empirical Evaluations in Software Engineering Research: A Personal Perspective

43

Page 66: Empirical Evaluations in Software Engineering Research: A Personal Perspective

43

even within the same system

Page 67: Empirical Evaluations in Software Engineering Research: A Personal Perspective

44

Practitioner buy-in

More studies (N+1)

Not Soft. Eng!

Page 68: Empirical Evaluations in Software Engineering Research: A Personal Perspective

44

Practitioner buy-in

More studies (N+1)

Replication data

Not Soft. Eng!

Page 69: Empirical Evaluations in Software Engineering Research: A Personal Perspective

45

Replication data is not sufficient!

Page 70: Empirical Evaluations in Software Engineering Research: A Personal Perspective

MSR Today: Easy Peasy!

46

Page 71: Empirical Evaluations in Software Engineering Research: A Personal Perspective

Limited knowledge of machine learning is a serious risk

47

Page 72: Empirical Evaluations in Software Engineering Research: A Personal Perspective

A Simple Example

Page 73: Empirical Evaluations in Software Engineering Research: A Personal Perspective

A Simple Example

Logistic Regression

Page 74: Empirical Evaluations in Software Engineering Research: A Personal Perspective

Empirical Hypothesis Testing (is complex code more risky?)

49

Bugs ~ f(LOC, CC_max)

NOT about predicting bugs

Page 75: Empirical Evaluations in Software Engineering Research: A Personal Perspective

Including correlated metrics

50

Model M1: Bugs ~ f(CC_max, CC_avg, PAR_max, FOUT_max) Model M2: Bugs ~ f(CC_avg, CC_max, PAR_max, FOUT_max)

Page 76: Empirical Evaluations in Software Engineering Research: A Personal Perspective

Including correlated metrics

51

Model M1: Bugs ~ f(CC_max, CC_avg, PAR_max, FOUT_max) Model M2: Bugs ~ f(CC_avg, CC_max, PAR_max, FOUT_max)

Page 77: Empirical Evaluations in Software Engineering Research: A Personal Perspective

Including correlated metrics

51

Model M1: Bugs ~ f(CC_max, CC_avg, PAR_max, FOUT_max) Model M2: Bugs ~ f(CC_avg, CC_max, PAR_max, FOUT_max)

Page 78: Empirical Evaluations in Software Engineering Research: A Personal Perspective

Including correlated metrics

51

Model M1: Bugs ~ f(CC_max, CC_avg, PAR_max, FOUT_max) Model M2: Bugs ~ f(CC_avg, CC_max, PAR_max, FOUT_max)

60% of 2000-2011 studies do not handle correlated metrics [Shihab 2012]

Page 79: Empirical Evaluations in Software Engineering Research: A Personal Perspective

Using F-measure in studies

52

Threshold = 0.2 Threshold = 0.5 Threshold = 0.8

Page 80: Empirical Evaluations in Software Engineering Research: A Personal Perspective

Resampling imbalanced data

53

Page 81: Empirical Evaluations in Software Engineering Research: A Personal Perspective

Resampling imbalanced data

53

Page 82: Empirical Evaluations in Software Engineering Research: A Personal Perspective

Resampling imbalanced data

54

Page 83: Empirical Evaluations in Software Engineering Research: A Personal Perspective

Resampling imbalanced data

54

Page 84: Empirical Evaluations in Software Engineering Research: A Personal Perspective

Variable importance: Original vs under sampled data

55

Page 85: Empirical Evaluations in Software Engineering Research: A Personal Perspective

Variable importance: Original vs under sampled data

55

Page 86: Empirical Evaluations in Software Engineering Research: A Personal Perspective

Variable importance: Original vs under sampled data

55

Page 87: Empirical Evaluations in Software Engineering Research: A Personal Perspective

Not including control metrics

56

Model M1: Bugs ~ f( , CC_max, PAR_max, FOUT_max) Model M2: Bugs ~ f(TLOC, CC_max, PAR_max, FOUT_max)

Page 88: Empirical Evaluations in Software Engineering Research: A Personal Perspective

Not including control metrics

57

Model M1: Bugs ~ f( , CC_max, PAR_max, FOUT_max) Model M2: Bugs ~ f(TLOC, CC_max, PAR_max, FOUT_max)

Page 89: Empirical Evaluations in Software Engineering Research: A Personal Perspective

Not including control metrics

57

Model M1: Bugs ~ f( , CC_max, PAR_max, FOUT_max) Model M2: Bugs ~ f(TLOC, CC_max, PAR_max, FOUT_max)

Page 90: Empirical Evaluations in Software Engineering Research: A Personal Perspective

58

Unseen100

out of sample bootstrap

10-fold cross validation

10x10 Cross Validation

Using 10 fold cross validation

Page 91: Empirical Evaluations in Software Engineering Research: A Personal Perspective

59

Unseen25

out of sample bootstrap

100 out of sample

bootstrap

10-fold cross validation

10x10 cross validation

Using 10 fold cross validation

Page 92: Empirical Evaluations in Software Engineering Research: A Personal Perspective

60

Unseen25

out of sample bootstrap

100 out of sample

bootstrap

10-fold cross validation

10x10 cross validation

Using 10 fold cross validation

Page 93: Empirical Evaluations in Software Engineering Research: A Personal Perspective

61

Unseen25

out of sample bootstrap

100 out of sample

bootstrap

10-fold cross validation

10x10 cross validation

Using 10 fold cross validation

Page 94: Empirical Evaluations in Software Engineering Research: A Personal Perspective

Using default ANOVA

62

Model M1: Bugs ~ f( TLOC, ….. ) Model M2: Bugs ~ f( ……,TLOC )

Page 95: Empirical Evaluations in Software Engineering Research: A Personal Perspective

Using default ANOVA

63

Model M1: Bugs ~ f( TLOC, ….. ) Model M2: Bugs ~ f( ……,TLOC )

Page 96: Empirical Evaluations in Software Engineering Research: A Personal Perspective

Using default ANOVA

63

Model M1: Bugs ~ f( TLOC, ….. ) Model M2: Bugs ~ f( ……,TLOC )

Page 97: Empirical Evaluations in Software Engineering Research: A Personal Perspective

Pitfalls in Analytical Modelling

Page 98: Empirical Evaluations in Software Engineering Research: A Personal Perspective

Pitfalls in Analytical Modelling

Joint work with Kla Tantihamthavorn

Page 99: Empirical Evaluations in Software Engineering Research: A Personal Perspective

65

Pitfalls in Analytical Modelling

Page 100: Empirical Evaluations in Software Engineering Research: A Personal Perspective

66

Pitfalls in Analytical Modelling

Page 101: Empirical Evaluations in Software Engineering Research: A Personal Perspective

Open Scripts

67

Page 102: Empirical Evaluations in Software Engineering Research: A Personal Perspective

Open Scripts

67

Encourages community mentorship

Page 103: Empirical Evaluations in Software Engineering Research: A Personal Perspective

Open Scripts

67

Encourages community mentorship

Works even for industrial data

Page 104: Empirical Evaluations in Software Engineering Research: A Personal Perspective

Open Scripts

67

Encourages community mentorship

Works even for industrial data

Ensures research transparency

Page 105: Empirical Evaluations in Software Engineering Research: A Personal Perspective

68

Page 106: Empirical Evaluations in Software Engineering Research: A Personal Perspective

68

Page 107: Empirical Evaluations in Software Engineering Research: A Personal Perspective

68

Page 108: Empirical Evaluations in Software Engineering Research: A Personal Perspective

68

Page 109: Empirical Evaluations in Software Engineering Research: A Personal Perspective

68

Page 110: Empirical Evaluations in Software Engineering Research: A Personal Perspective

N

69

Where do we go from

here?

Page 111: Empirical Evaluations in Software Engineering Research: A Personal Perspective

• Trailblazing research should be encouraged and defended

• Inaccurate analysis is a serious and growing threat

• Deeper analysis is more valuable than large scale studies

• Generalization is not feasible nor desired by practitioners

• Industry is a partner not an idol

Parting thoughts

70

http://sail.cs.queensu.ca

Page 112: Empirical Evaluations in Software Engineering Research: A Personal Perspective

Think Different!