Survival analysis of database technologies in open source Java projects

Survival Analysis of Database Technologies in Open Source Java Projects
Mathieu Goeminne, Tom Mens, Software Engineering Lab, University of Mons, Belgium
http://informatique.umons.ac.be/genlog/projects/disse
ICSME 2015 Early Research Achievements, Bremen, Germany, September 2015


Page 1: Survival analysis of database technologies in open source Java projects

Survival Analysis of Database Technologies in Open Source Java Projects

Mathieu Goeminne, Tom Mens, Software Engineering Lab, University of Mons, Belgium

http://informatique.umons.ac.be/genlog/projects/disse

ICSME 2015 Early Research Achievements, Bremen, Germany, September 2015

Page 2: Survival analysis of database technologies in open source Java projects

September 2015, International Conference on Software Maintenance and Evolution (ICSME 2015), Bremen, Germany

Context

• FNRS research project “Data-Intensive Software System Evolution”
  – Interuniversity collaboration with Université de Namur
• Expand empirical MSR research to include database-related activities
• Overall goal
  – Empirically analyse and support co-evolution between program code and database schema in data-intensive software systems
• This paper:
  – Study co-evolution of Java ORM database technologies

Page 3: Survival analysis of database technologies in open source Java projects


Focus

• Open source Java projects
  – Extracted from the GitHub Java Corpus [Allamanis & Sutton, MSR 2013]
  – We considered 13,307 Java projects still having a Git repository in March 2015

Page 4: Survival analysis of database technologies in open source Java projects

Focus

• Many Java relational database technologies

Page 5: Survival analysis of database technologies in open source Java projects

Focus

• Java relational database technologies

Page 6: Survival analysis of database technologies in open source Java projects

Focus

• Java relational database technologies
• 19 different Java technologies considered (each at least 3 years old)
• Detected by looking at import statements and configuration files
• This left us with 3,819 Java projects
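The import-based part of this detection step can be illustrated with a minimal sketch. The package prefixes below are the well-known root packages of the five technologies discussed later in the talk, but the mapping itself, the regular expression, and the function name `detect_technologies` are assumptions made for illustration; the study's actual detection rules (which also inspect configuration files) are not shown on the slides.

```python
import re

# Hypothetical mapping from technology to Java package prefixes.
# The study's real detection rules are not given in the slides.
PACKAGE_PREFIXES = {
    "JDBC": ("java.sql.", "javax.sql."),
    "JPA": ("javax.persistence.",),
    "Hibernate": ("org.hibernate.",),
    "Spring": ("org.springframework.",),
    "Vaadin": ("com.vaadin.",),
}

# Matches "import a.b.C;", "import a.b.*;" and "import static a.b.C.d;".
IMPORT_RE = re.compile(
    r"^\s*import\s+(?:static\s+)?([\w.]+(?:\.\*)?)\s*;", re.MULTILINE
)


def detect_technologies(java_source: str) -> set[str]:
    """Return the technologies whose packages are imported in one Java file."""
    found = set()
    for imported in IMPORT_RE.findall(java_source):
        for tech, prefixes in PACKAGE_PREFIXES.items():
            # str.startswith accepts a tuple of candidate prefixes.
            if imported.startswith(prefixes):
                found.add(tech)
    return found


example = """
package demo;
import java.sql.Connection;
import javax.persistence.*;
import java.util.List;
"""
print(sorted(detect_technologies(example)))
```

Applying such a function to every Java file in every commit would give, per project, the set of technologies present at each point in its history.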

Page 7: Survival analysis of database technologies in open source Java projects

Focus

• The top 5 technologies each occurred in over 200 projects
• 3,707 Java projects used at least one of these 5 technologies

Page 8: Survival analysis of database technologies in open source Java projects

Research Questions

RQ1 Which combinations of database technologies co-occur in the projects in which they are used?
RQ2 How long do database technologies survive in the projects in which they occur?
RQ3 Does the introduction of a technology influence the survivability of another one?
RQ4 How long does it take to introduce a second technology after a previous one was introduced?

Page 9: Survival analysis of database technologies in open source Java projects

RQ1 Which combinations of technologies co-occur in the projects in which they are used?

Different technologies used within the same Java project (not necessarily at the same time)

[Figure not reproduced; visible labels: 61% and 34.5%]

Page 10: Survival analysis of database technologies in open source Java projects


RQ1 Which combinations of technologies co-occur in the projects in which they are used?
• How frequently do these technologies actually co-occur?
• Answer: most of the time!

JPA annotations as an indicator of the use of any framework but JPA itself.

Vaadin is a framework for developing web applications. It introduces the notion of domain layer, which abstracts the database structure through Java classes hosting the business logic of the application.

RQ1: Which combinations of database frameworks “co-occur” in the projects in which they are used?

We identified which of the 5 considered database frameworks occurred throughout the lifetime of each considered project, and we computed all possible intersections of framework occurrence in Fig. 2.

JDBC occurs as the only database framework in 56.3% of all projects. At the other side of the spectrum, Hibernate occurs as the only framework in only 2.9% of all projects. If we look at their intersection, the large majority (82.8%) of all projects that have used Hibernate have also used JDBC during their lifetime.

Something similar can be observed for JDBC and JPA. JPA occurs in isolation in 29.5% of all projects, while almost half of all projects that have used JPA (49.3% to be precise) have also used JDBC during their lifetime.

Similarly, when comparing Hibernate and JPA, we observe that 49.6% of all projects that have used Hibernate have also used JPA, while 44.1% of all projects that have used Hibernate have also used JPA and JDBC.

Fig. 2. Number of Java projects using a given number of database frameworks (over the entire project’s lifetime).

These high numbers could be due to the fact that some database frameworks are used as supporting technologies for others (e.g., Spring typically uses JDBC for database access), while some frameworks are complementing each other (e.g., Vaadin has an optional module called JPAContainer for supporting JPA annotations).

To determine for which frameworks this is the case, we studied the “co-occurrence” of different frameworks within the same project. This happens when files relating to both frameworks are present in at least one of the project’s commits (but typically in many more commits).

Table II shows vertically the number of projects having used a given number of distinct frameworks over their entire history, and horizontally the maximum number of distinct “co-occurring” frameworks. Almost all values reside on the diagonal, implying that in the large majority of all cases (95.3%, i.e., 1213/1273), different database frameworks used in a project tend to co-occur.

TABLE II
NUMBER OF PROJECTS INVOLVING A GIVEN NUMBER OF FRAMEWORKS, OVER THEIR ENTIRE LIFETIME AND IN CO-OCCURRENCE.

                             # co-occurring frameworks
# total frameworks used        1      2      3      4      5
1                          2,443
2                             22    776
3                              2     16    328
4                              0      0     18    104
5                              0      0      1      1      5
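As a small worked check, the ratio 1213/1273 quoted for Table II can be recomputed from the table: the numerator is the sum of the diagonal for projects that used at least two frameworks, and the denominator is the total number of such projects. The dictionary layout and the function name `co_occurrence_share` are illustrative choices, not part of the original study.

```python
# Table II: total number of frameworks used -> project counts per maximum
# number of co-occurring frameworks (columns 1..k).
TABLE_II = {
    1: [2443],
    2: [22, 776],
    3: [2, 16, 328],
    4: [0, 0, 18, 104],
    5: [0, 0, 1, 1, 5],
}


def co_occurrence_share(table):
    """Return (projects on the diagonal, all projects) for rows with >= 2 frameworks."""
    multi = {k: row for k, row in table.items() if k >= 2}
    on_diagonal = sum(row[k - 1] for k, row in multi.items())
    total = sum(sum(row) for row in multi.values())
    return on_diagonal, total


on_diag, total = co_occurrence_share(TABLE_II)
print(f"{on_diag}/{total} = {on_diag / total:.1%}")  # prints "1213/1273 = 95.3%"
```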

Focusing on specific combinations of frameworks, Table III reports the number of projects in which two database frameworks co-occurred at least once during the project’s lifetime. Not surprisingly, we observe that JDBC frequently co-occurs with other frameworks. That lets us suppose that JDBC is used as a supporting technology that provides services not offered by the other frameworks. 80.1% of all projects that used Hibernate have also used JDBC in co-occurrence; 48.4% of all projects that used JPA have used JDBC in co-occurrence; 41.3% of all projects that used Spring have used JDBC in co-occurrence; and 39.6% of all projects that used Vaadin have used JDBC in co-occurrence.

TABLE III
NUMBER OF PROJECTS IN WHICH PAIRS OF DATABASE FRAMEWORKS CO-OCCUR.

           Spring     JPA  Vaadin  Hibernate
JDBC          645     565     143        192
Spring                558      76        156
JPA                            98        105
Vaadin                                    22
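A table of this shape could be derived from per-commit observations roughly as follows. This is only a sketch under the co-occurrence definition given earlier (both frameworks present in at least one commit of the project); the input layout, the toy data, and the function name `pair_co_occurrences` are assumptions for illustration.

```python
from collections import Counter
from itertools import combinations


def pair_co_occurrences(projects):
    """Count, per framework pair, the projects where both appear in the same commit.

    `projects` maps a project name to a list of per-commit framework sets.
    """
    counts = Counter()
    for snapshots in projects.values():
        pairs = set()
        for present in snapshots:
            # Sorting gives each pair a canonical (alphabetical) order.
            pairs.update(combinations(sorted(present), 2))
        counts.update(pairs)  # each project counts at most once per pair
    return counts


# Toy data, not taken from the study.
toy_projects = {
    "p1": [{"JDBC"}, {"JDBC", "Spring"}, {"Spring"}],
    "p2": [{"Hibernate"}, {"JPA"}],  # both used, but never in the same commit
    "p3": [{"JDBC", "JPA", "Spring"}],
}
for pair, n in sorted(pair_co_occurrences(toy_projects).items()):
    print(pair, n)
```

Project `p2` shows why usage over the lifetime and co-occurrence are counted separately: it used two frameworks but contributes to no pair.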

Some database frameworks seem to complement one another. For example, 47.8% of all projects using Spring also use JPA. Other database frameworks appear to be in competition. For example, Vaadin co-occurs with Hibernate in only 22 projects, which makes up 9.2% of all projects using Hibernate, and only 6.1% of all projects using Vaadin. To a lesser extent, Vaadin also co-occurs infrequently together with JPA or Spring.

RQ2: How long do database frameworks “survive” in the projects in which they occur?

Fig. 3 shows the Kaplan-Meier survival curves of the selected frameworks. After their introduction, all database frameworks remain present in more than 45% of the projects. Nevertheless, we observe different trends in framework survivability. For example, in 11.7% of all cases Hibernate is removed 30 days after its introduction. In the same time interval, Spring is only removed from 3.7% of the projects. Three years after its

Page 11: Survival analysis of database technologies in open source Java projects


RQ1 Which combinations of technologies co-occur in the projects in which they are used?
• JDBC co-occurs with most other technologies
• Spring and JPA co-occur very frequently
• Vaadin occurs infrequently with Hibernate, JPA, Spring

Page 12: Survival analysis of database technologies in open source Java projects

RQ2 How long do database technologies survive in the projects in which they occur?
• Use the statistical technique of survival analysis
• The Kaplan-Meier estimator gives the probability of surviving beyond a given time, i.e., that a specific event has not yet occurred by then
• Takes into account right-censored data (e.g., the event did not occur yet)
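To make the estimator concrete, here is a minimal pure-Python sketch of the standard Kaplan-Meier product-limit formula, S(t) = product over event times t_i <= t of (1 - d_i / n_i), where d_i is the number of events at t_i and n_i the number of subjects still at risk. In the setting of this talk, an "event" would be the removal of a technology from a project, and a project where the technology is still present at the last observed commit is right-censored. The function name, input layout, and toy data are assumptions for illustration; the slides do not say which tool the authors used.

```python
from collections import Counter


def kaplan_meier(durations, observed):
    """Return [(t, S(t))] at each time t where at least one event occurred.

    durations: time (e.g. in days) from introduction to removal or last observation.
    observed:  True if the removal was seen, False if right-censored.
    """
    events = Counter(t for t, seen in zip(durations, observed) if seen)
    leaving = Counter(durations)  # subjects leaving the risk set at t
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = events.get(t, 0)
        if d:
            # Censored subjects with the same t still count as at risk here.
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        at_risk -= leaving[t]
    return curve


# Toy data, not taken from the study: six projects, three observed removals.
durations = [30, 30, 90, 400, 400, 1000]
observed = [True, False, True, True, False, False]
for t, s in kaplan_meier(durations, observed):
    print(f"day {t}: S = {s:.3f}")
```

Censored observations never lower the curve themselves; they only shrink the risk set used for later events, which is how the estimator uses projects whose technology has not been removed yet.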

Page 13: Survival analysis of database technologies in open source Java projects

RQ2 How long do database technologies survive in the projects in which they occur?
• Most technologies tend to survive (>50% probability) over the project’s lifetime
• A bit less for Vaadin and Hibernate

Page 14: Survival analysis of database technologies in open source Java projects

RQ3 Does the introduction of a technology influence the survivability of another one?

• Visually we observe slightly improved survival for some combinations, but the difference is statistically insignificant

[Figure: two survival plots, one for "A = spring, B = jdbc" and one for "A = jdbc, B = spring", each with curves C1 and C2; horizontal axis from 0 to 6000, vertical axis from 0.0 to 1.0]

Page 15: Survival analysis of database technologies in open source Java projects


Effect  of  project  size?

• Project size and age follow a log-normal distribution
• How does this affect the results?
  – Split projects in two equal bins according to project size, and compare results
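The split into two equal bins amounts to a median split, which can be sketched as below. The size measure and the function name `split_by_size` are assumptions; the slides do not state how project size was measured.

```python
def split_by_size(sizes):
    """Split project names into (smaller half, larger half) by size.

    `sizes` maps a project name to a numeric size; with an odd number of
    projects the larger half receives the extra one.
    """
    ordered = sorted(sizes, key=sizes.get)
    middle = len(ordered) // 2
    return ordered[:middle], ordered[middle:]


# Toy data, not taken from the study.
small, large = split_by_size({"a": 120, "b": 15, "c": 4800, "d": 900})
print(small, large)  # prints "['b', 'a'] ['d', 'c']"
```

Each research question would then be re-run on the two halves separately and the results compared.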

Time until technology introduction

• Technologies tend to be introduced at the start of the project
• Some small projects also introduce technologies near the end of their observed lifetime

Page 16: Survival analysis of database technologies in open source Java projects


RQ4 How long does it take to introduce a second technology after a previous one was introduced?

• Small projects are less likely to introduce a second technology, and do it later
• JDBC is rarely complemented with another technology
• Hibernate is often quickly complemented with (or replaced by) another technology
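The quantity behind RQ4 is the delay between the first and the second technology introduced in a project. A minimal sketch, assuming each project is summarised by the date on which each technology was first seen (the input layout, toy dates, and the function name `days_until_second` are illustrative, not the authors' implementation):

```python
from datetime import date


def days_until_second(introductions):
    """Return (first, second, days between them), or None if fewer than two.

    `introductions` maps a technology name to the date it was first seen.
    A None result corresponds to a right-censored observation: no second
    technology had been introduced by the end of the observed history.
    """
    ordered = sorted(introductions.items(), key=lambda item: item[1])
    if len(ordered) < 2:
        return None
    (first, d1), (second, d2) = ordered[0], ordered[1]
    return first, second, (d2 - d1).days


# Toy data, not taken from the study.
project = {
    "Hibernate": date(2012, 1, 10),
    "JPA": date(2012, 2, 9),
    "JDBC": date(2013, 5, 1),
}
print(days_until_second(project))  # prints "('Hibernate', 'JPA', 30)"
```

These delays, together with the censoring flag for single-technology projects, are the kind of input a survival estimator needs to answer how long it takes for a second technology to appear.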

Page 17: Survival analysis of database technologies in open source Java projects

Conclusions

• Some technologies are much more popular than others
  – JDBC, JPA, Hibernate, Spring
• Different Java database technologies tend to co-occur
  – Especially in the larger Java projects
• Technologies, once introduced, tend to remain
• Introducing new technologies does not “replace” existing ones but rather “complements” them
  – Survival of existing technologies is not negatively affected
• Big projects tend to behave differently from small ones
• Many technologies are used in combination with JDBC, but JDBC is also often used in isolation

Page 18: Survival analysis of database technologies in open source Java projects

Future  Work

• Analyse co-occurrence of technologies at a finer level of granularity
• Look at NoSQL database technologies
• Study co-evolution between changes in the source code and changes in the database schema
• Study social aspects

Page 19: Survival analysis of database technologies in open source Java projects

References

• M. Goeminne, T. Mens. Towards a survival analysis of database framework usage in Java projects. ICSME 2015 ERA track
• M. Goeminne, A. Decan, T. Mens. Co-evolving code-related and database-related changes in a data-intensive software system. CSMR-WCRE 2014 ERA track
• L. Meurice, A. Cleve. DAHLIA: A visual analyzer of database schema evolution. CSMR-WCRE 2014 Tool Demo
• A. Cleve, T. Mens, J.-L. Hainaut. Data-intensive system evolution. IEEE Computer 43(8): 110-112 (2010)
• A. Cleve, M. Gobert, L. Meurice, J. Maes, J. Weber. Understanding database schema evolution: A case study. Science of Computer Programming (2013)

Page 20: Survival analysis of database technologies in open source Java projects

References

• T. Mens, A. Serebrenik, A. Cleve (Eds.). Evolving Software Systems. Springer, 2014, XXIII, 404 p. ISBN 978-3-642-45398-4