ICSME 2016 keynote: An ecosystemic and socio-technical view on software maintenance and evolution

Posted on 16-Apr-2017


An Ecosystemic and Socio-Technical View on Software Maintenance & Evolution

Tom Mens (@tom_mens)
COMPLEXYS Research Institute, University of Mons, Belgium

Career: PhD at VUB (until 1999), postdoc at VUB (1999-2003), (full) professor (2003-now)

Research topics over time: OO design & refactoring; MDSE, model transformation; empirical research of software ecosystems

Research Collaborators

Research Context

2012-2017: ongoing research project “Ecological Studies of Open Source Software Ecosystems”
- Interdisciplinary research
- Uses ideas from biological ecology to understand and improve the evolution of software ecosystems

“A software ecosystem is a collection of software projects that are developed and evolve together in the same environment.” (Mircea Lungu, PhD thesis, 2008)


Software Ecosystem Examples: Gnome, KDE, Debian, Ubuntu, CRAN, JavaScript, Ruby

When things go wrong…

CRAN package dependency graph (credits: http://www.designandanalytics.com/cran-gephi/)

More than 9K active packages and more than 21K dependencies in April 2016

CRAN
• Increasing number of R packages hosted on GitHub
  – “non-transparent nature of the CRAN submission / rejection process”
  – “CRAN […] is revealing some limitations of the current design. One such problem is the general lack of dependency versioning in the infrastructure.”
• Problems with breaking dependencies
  – “It is more and more of a pain if the package I’m depending on breaks”
  – “One recent example was the forced roll-back of the ggplot2 update to version 0.9.0, because the introduced changes caused several other packages to break.”

Decan et al. “When GitHub Meets CRAN: An Analysis of Inter-Repository Package Dependency Problems.” SANER 2016

JavaScript: more than 317K packages and more than 728K dependencies in June 2016

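JavaScript
• Deliberate desire to distribute micropackages
• Lots of dependencies on micropackages

Example: isarray (150 direct and 77K transitive reverse dependencies in August 2016)

```javascript
// isarray: use the native Array.isArray when available,
// otherwise fall back to checking the internal [[Class]] tag via toString
var toString = {}.toString;

module.exports = Array.isArray || function (arr) {
  return toString.call(arr) == '[object Array]';
};
```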

David Haney’s code blog, March 2016: http://www.haneycodes.net/npm-left-pad-have-we-forgotten-how-to-program/
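The “direct” versus “transitive” reverse-dependency counts quoted for isarray can be obtained by inverting the dependency graph and traversing it. A minimal sketch, assuming a hypothetical in-memory map `dependsOn` (package to its dependencies); fetching and parsing real npm metadata is not shown, and the function name is our own.

```javascript
// Hypothetical dependency graph: each package maps to the packages it depends on.
const dependsOn = {
  "app-a": ["lib-x"],
  "app-b": ["lib-y"],
  "lib-x": ["isarray"],
  "lib-y": ["lib-x", "isarray"],
  isarray: [],
};

// Count the packages that depend on `target` directly, and directly or indirectly.
function reverseDependencies(graph, target) {
  // Invert the edges: for each package, list the packages that depend on it.
  const dependents = {};
  for (const [pkg, deps] of Object.entries(graph)) {
    for (const dep of deps) {
      if (!dependents[dep]) dependents[dep] = [];
      dependents[dep].push(pkg);
    }
  }
  const direct = new Set(dependents[target] || []);
  // Breadth-first traversal of the inverted edges collects all transitive
  // dependents (direct ones included); the membership check guards against cycles.
  const transitive = new Set();
  const queue = [...direct];
  while (queue.length > 0) {
    const pkg = queue.shift();
    if (transitive.has(pkg)) continue;
    transitive.add(pkg);
    queue.push(...(dependents[pkg] || []));
  }
  return { direct: direct.size, transitive: transitive.size };
}

console.log(reverseDependencies(dependsOn, "isarray")); // { direct: 2, transitive: 4 }
```

A package with a large transitive count is exactly the kind of single point of failure that makes an unpublished package ripple through the whole ecosystem.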

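Example: leftpad

• Package leftpad:

```javascript
// leftpad: pads `str` on the left with `ch` (a space by default) up to length `len`
function leftpad (str, len, ch) {
  str = String(str);
  var i = -1;
  if (!ch && ch !== 0) ch = ' ';
  len = len - str.length;
  while (++i < len) {
    str = ch + str;
  }
  return str;
}
```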
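Since ECMAScript 2017, JavaScript strings provide the standard `String.prototype.padStart`, so for the common case of a single-character pad the whole package reduces to one built-in call. A sketch; the name `leftpadNative` is hypothetical, not taken from any package.

```javascript
// leftpad equivalent built on String.prototype.padStart (ES2017+).
// Matches leftpad for a single-character `ch`; leftpad prepends a multi-character
// `ch` whole each time, whereas padStart truncates the fill to reach exactly `len`.
function leftpadNative(str, len, ch = " ") {
  return String(str).padStart(len, String(ch));
}

console.log(leftpadNative(17, 5, 0));                 // prints 00017
console.log(JSON.stringify(leftpadNative("foo", 5))); // prints "  foo"
```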

• What happened? Its developer unpublished all of his modules from npm.

“This impacted many thousands of projects. [...] We began observing hundreds of failures per minute, as dependent projects – and their dependents, and their dependents... – all failed when requesting the now-unpublished package.”

http://blog.npmjs.org/post/141577284765/kik-left-pad-and-npm


Departure of a central contributor

• All bug handling became concentrated in one contributor
• The contributor suddenly left the project, dissatisfied
• Lasting negative impact on bug handling performance

Zanetti et al. “The rise and fall of a central contributor: Dynamics of social organization and performance in the Gentoo community.” CHASE 2013


Eclipse: strict policy and tools for ensuring backward compatibility
• “Prime Directive: When evolving the Component API from release to release, do not break existing clients”

Bogart et al. “How to break an API: Cost negotiation and community values in three software ecosystems.” FSE 2016
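A strict “do not break existing clients” policy is typically backed by tooling that compares successive API versions. As a much-simplified illustration (the two release objects are hypothetical stand-ins for two versions of a module; real API tools also compare signatures and behaviour), removed or retyped exports can be flagged like this:

```javascript
// Flag exports of the old release that a client could have used but that the
// new release removes or replaces with a different kind of value.
function breakingChanges(oldApi, newApi) {
  const problems = [];
  for (const name of Object.keys(oldApi)) {
    if (!(name in newApi)) {
      problems.push(`removed: ${name}`);
    } else if (typeof oldApi[name] !== typeof newApi[name]) {
      problems.push(`changed kind: ${name}`);
    }
  }
  return problems; // additions are not reported: they do not break existing clients
}

// Two hypothetical releases of the same module.
const release1 = { parse: (s) => s.trim(), format: (x) => String(x), VERSION: "1.0" };
const release2 = { parse: (s) => s.trim(), render: (x) => `<${x}>`, VERSION: 2 };

console.log(breakingChanges(release1, release2)); // [ 'removed: format', 'changed kind: VERSION' ]
```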


May lead to stagnation and drive away developers
– Coordination around synchronized yearly releases slows down development

“If you have hip things, then you get people who create new APIs on top of that […] These things don’t happen on the Eclipse platform anymore.”

“you have to be very patient and know who to talk with […] in order to get your patches accepted, and I think it’s very intimidating for some new people to come on.”

Bogart et al. “How to break an API: Cost negotiation and community values in three software ecosystems.” FSE 2016

Socio-Technical View

• Software ecosystems suffer from problems because of technical factors, social reasons, or both.

• A socio-technical view is therefore essential for software ecosystem evolution research.

• Socio-technical analyses can benefit from mixed-method research
  – Combine quantitative and qualitative methods into a single study:
    • empirical analysis of objective data
    • user surveys and interviews
  – Exploiting their complementarity increases confidence in the findings

Johnson et al. Mixed methods research: A research paradigm whose time has come. Educational Researcher 33(7): 14–26, 2004

Software Ecosystem (SECO) Research Challenges

Understanding SECOs
• How are SECOs structured?
• What are their tools, habits, values, boundaries?
• How do they emerge and evolve over time?
• What are the mechanisms driving their dynamics?
• How do different SECOs compare?
• How to face technical challenges?

Serebrenik et al. “Challenges in Software Ecosystems Research.” IWSECO-WEA 2015

Software Ecosystem Research Challenges

Supporting SECO communities
• How can they be made more sustainable and resilient?
• How can we predict their evolution?
• How can we improve the SECO?
  – In terms of productivity, quality, diversity, maintainability, survival, popularity, …

Serebrenik et al. “Challenges in Software Ecosystems Research.” IWSECO-WEA 2015

Supporting SECOs: Increasing resilience & sustainability

Can the SECO
• resist major disturbances?
• return to a stable equilibrium after a major disturbance?

Possible approach:
• Estimate, predict and reduce bus factor risk

Bus factor: social view

A specific activity is concentrated in a few persons. Examples:
– A single person responsible for bug handling in Gentoo
– Only one developer knows some part of the code

Bus factor: technical view

Too many software components depend on a single software component.
– Makes components more brittle to future changes
– Example: npm leftpad

Bus factor

Active area of research: at least 4 GitHub projects compute the (social) bus factor.

Cosentino et al. “Assessing the bus factor of Git repositories.” SANER 2015

Avelino et al. “A novel approach for estimating truck factors.” ICPC 2016

Bus factor tool support:
• Experimental support on GitHub: https://libraries.io/bus-factor
• https://dependencyci.com
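The greedy idea behind truck-factor estimation, as in Avelino et al., can be sketched in a few lines: repeatedly remove the top author until more than half of the files have no remaining author. This is a simplification: the `authorship` map (file to authors) below is hypothetical and given directly, whereas the paper derives authorship from the commit history with a degree-of-authorship model that is not reproduced here.

```javascript
// Greedy truck-factor estimate: repeatedly remove the author of the most files
// until more than `threshold` of the files have no remaining author.
// `authors` maps each file to its authors (given directly here; hypothetical).
function truckFactor(authors, threshold = 0.5) {
  const files = Object.keys(authors);
  const remaining = new Set(files.flatMap((f) => authors[f]));
  const orphaned = () => files.filter((f) => authors[f].every((a) => !remaining.has(a))).length;
  let removed = 0;
  while (orphaned() <= threshold * files.length && remaining.size > 0) {
    // Find the remaining author who (co-)authors the largest number of files.
    let top = null;
    let topCount = -1;
    for (const a of remaining) {
      const count = files.filter((f) => authors[f].includes(a)).length;
      if (count > topCount) {
        top = a;
        topCount = count;
      }
    }
    remaining.delete(top);
    removed += 1;
  }
  return removed;
}

// Hypothetical repository.
const authorship = {
  "src/core.js": ["ann"],
  "src/util.js": ["ann"],
  "src/io.js": ["ann", "ben"],
  "src/ui.js": ["ben"],
  "README.md": ["cat"],
};

console.log(truckFactor(authorship)); // 2: losing ann and ben orphans 4 of the 5 files
```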

Supporting SECOs: Improving quality

Technical view: increasing technical wealth by reducing technical debt

Technical debt (term coined by Ward Cunningham, 1992): “a concept in programming that reflects the extra development work that arises when code that is easy to implement in the short run is used instead of applying the best overall solution”

http://legacycoderocks.libsyn.com/technical-wealth-with-declan-wheelan

Implementation of SQALE model in SonarQube
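In SQALE-style tooling, technical debt is typically expressed as an estimated remediation effort and compared with the estimated cost of developing the code. The sketch below assumes SonarQube-like defaults (development cost of 30 minutes per line of code; ratings A to E at debt ratios of 5%, 10%, 20% and 50%); all of these are configurable assumptions, and the function names are hypothetical.

```javascript
// SQALE-style technical debt ratio: estimated remediation effort divided by the
// estimated cost of developing the code. Assumption: development cost is
// linesOfCode * minutesPerLine, with 30 minutes per line as the default.
function debtRatio(remediationMinutes, linesOfCode, minutesPerLine = 30) {
  return remediationMinutes / (linesOfCode * minutesPerLine);
}

// Maintainability rating from the debt ratio; the 5/10/20/50 % thresholds are
// assumed defaults and would normally be read from the tool's configuration.
function maintainabilityRating(ratio) {
  if (ratio <= 0.05) return "A";
  if (ratio <= 0.1) return "B";
  if (ratio <= 0.2) return "C";
  if (ratio <= 0.5) return "D";
  return "E";
}

// Hypothetical module: 1,000 lines of code and 50 hours of estimated remediation.
const ratio = debtRatio(50 * 60, 1000);
console.log(ratio, maintainabilityRating(ratio)); // 0.1 B
```

Increasing “technical wealth” then amounts to driving the remediation estimate, and hence the ratio, down over successive releases.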

Social view: reducing social debt, i.e. “Unforeseen project cost connected to sub-optimal organizational-social structures”

Reducing social debt by removing community smells:
– Organisational silo: high decoupling and lack of communication between tasks
– Black cloud: lack of people able to bridge the knowledge and experience gap between distinct communities
– Prima-donnas: seemingly condescending and egotistical behaviour, unreceptiveness to collaboration
– Sharing villainy: lack of incentives for knowledge exchange
– Organisational skirmish: misalignment of organisational cultures between distinct communities
– …

Interdisciplinary research

“Many challenges we face are not solvable by people remaining in their single discipline silos”…

www.newscientist.com/article/mg20928002-100-open-your-mind-to-interdisciplinary-research/


“bringing […] disciplines together in the long term is what provides the big, big breakthroughs”

Interdisciplinary research: Social Network Analysis (SNA)

Social network centrality measures:
• Degree: number of incoming or outgoing links (e.g. dependencies) of a node.
• Betweenness: quantifies the number of times a node acts as a bridge along the shortest path between two other nodes.
• Closeness: the more central a node, the lower its total distance to all other nodes.
• Eigenvector centrality and PageRank: measure the influence of a node in a network.
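The first three measures can be computed with breadth-first search. A sketch on a small hypothetical collaboration graph (names and edges invented for illustration); betweenness uses Brandes' algorithm, unnormalised, for an undirected, unweighted graph.

```javascript
// Hypothetical undirected collaboration graph (adjacency lists).
const graph = {
  alice: ["bob", "carol"],
  bob: ["alice", "carol", "dave"],
  carol: ["alice", "bob"],
  dave: ["bob", "erin"],
  erin: ["dave"],
};

// Degree centrality: number of neighbours of each node.
function degree(g) {
  const result = {};
  for (const v of Object.keys(g)) result[v] = g[v].length;
  return result;
}

// Closeness centrality: (n - 1) divided by the sum of BFS distances to all other
// nodes (assumes a connected graph); higher means more central.
function closeness(g) {
  const nodes = Object.keys(g);
  const result = {};
  for (const source of nodes) {
    const dist = { [source]: 0 };
    const queue = [source];
    while (queue.length > 0) {
      const v = queue.shift();
      for (const w of g[v]) {
        if (!(w in dist)) {
          dist[w] = dist[v] + 1;
          queue.push(w);
        }
      }
    }
    const total = Object.values(dist).reduce((a, b) => a + b, 0);
    result[source] = total > 0 ? (nodes.length - 1) / total : 0;
  }
  return result;
}

// Betweenness centrality (Brandes' algorithm): for every node, the shortest paths
// between other pairs that pass through it (split fractionally when a pair has
// several shortest paths).
function betweenness(g) {
  const nodes = Object.keys(g);
  const score = Object.fromEntries(nodes.map((v) => [v, 0]));
  for (const s of nodes) {
    const order = [];
    const pred = Object.fromEntries(nodes.map((v) => [v, []]));
    const sigma = Object.fromEntries(nodes.map((v) => [v, 0]));
    const dist = Object.fromEntries(nodes.map((v) => [v, -1]));
    sigma[s] = 1;
    dist[s] = 0;
    const queue = [s];
    while (queue.length > 0) {
      const v = queue.shift();
      order.push(v);
      for (const w of g[v]) {
        if (dist[w] < 0) {
          dist[w] = dist[v] + 1;
          queue.push(w);
        }
        if (dist[w] === dist[v] + 1) {
          sigma[w] += sigma[v];
          pred[w].push(v);
        }
      }
    }
    // Accumulate dependencies in order of non-increasing distance from s.
    const delta = Object.fromEntries(nodes.map((v) => [v, 0]));
    while (order.length > 0) {
      const w = order.pop();
      for (const v of pred[w]) delta[v] += (sigma[v] / sigma[w]) * (1 + delta[w]);
      if (w !== s) score[w] += delta[w];
    }
  }
  // In an undirected graph every pair was counted from both endpoints.
  for (const v of nodes) score[v] /= 2;
  return score;
}

console.log(degree(graph));      // bob has the highest degree (3)
console.log(betweenness(graph)); // bob: 4, dave: 3, all others: 0
console.log(closeness(graph));   // bob is closest to everyone (0.8)
```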

Social Network Analysis can be used to:
– detect social debt
– identify the social bus factor
– predict software failures
– … and much more

Social bus factor in Gentoo Linux:
– All bug handling became concentrated in one contributor
– Measured by a significant increase of centralization and performance

Zanetti et al. “The rise and fall of a central contributor: Dynamics of social organization and performance in the Gentoo community.” CHASE 2013
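Centralization summarises how much a whole network revolves around a single node. One standard measure is Freeman's degree centralization, sketched below on two hypothetical five-person networks; it is not necessarily the exact measure used by Zanetti et al.

```javascript
// Freeman degree centralization of an undirected graph (adjacency lists):
// sum of (max degree - degree) over all nodes, divided by the largest value
// that sum can take for n nodes, (n - 1)(n - 2), which a star reaches.
function degreeCentralization(g) {
  const degrees = Object.values(g).map((neighbours) => neighbours.length);
  const n = degrees.length;
  if (n < 3) return 0;
  const maxDegree = Math.max(...degrees);
  const total = degrees.reduce((acc, d) => acc + (maxDegree - d), 0);
  return total / ((n - 1) * (n - 2));
}

// Everyone handles bugs with two colleagues (a ring): no centralization.
const ring = {
  ann: ["bo", "ed"], bo: ["ann", "cy"], cy: ["bo", "di"], di: ["cy", "ed"], ed: ["di", "ann"],
};
// All bug handling goes through "ann" (a star): maximal centralization.
const star = { ann: ["bo", "cy", "di", "ed"], bo: ["ann"], cy: ["ann"], di: ["ann"], ed: ["ann"] };

console.log(degreeCentralization(ring), degreeCentralization(star)); // 0 1
```

Tracking this value over time is one way to see activity drifting towards a single contributor before that person becomes a social bus factor.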

Social bus factor in Gentoo Linux:
– The contributor suddenly left the project, dissatisfied
– Sentiment analysis showed a correlation with negative emotions
– Lasting negative impact on the bug handling performance of the community

Zanetti et al. “The rise and fall of a central contributor: Dynamics of social organization and performance in the Gentoo community.” CHASE 2013

Use of SNA to better predict software failures:
– By combining program dependency information with social network information

Bird et al. “Putting it All Together: Using Socio-Technical Networks to Predict Failures.” ISSRE 2009

Pinzger et al. “Can developer-module networks predict failures?” FSE 2008

Mirroring hypothesis

Conway’s law: software structure tends to mirror the organisational/social structure.

A.k.a. socio-technical congruence: the alignment between technical dependencies and social coordination in a project.
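One common way to quantify socio-technical congruence follows the matrix formulation of Cataldo et al. (not cited on this slide): coordination requirements are derived from who works on which task and how tasks depend on each other, then compared with the coordination that actually takes place. A sketch with hypothetical 3×3 matrices.

```javascript
// Matrix helpers for small dense matrices (arrays of rows).
function multiply(a, b) {
  return a.map((row) => b[0].map((_, j) => row.reduce((sum, x, k) => sum + x * b[k][j], 0)));
}
function transpose(m) {
  return m[0].map((_, j) => m.map((row) => row[j]));
}

// Coordination requirements CR = TA * TD * TA^T (developer x developer), where
// TA[d][t] = 1 if developer d works on task t and TD[t][u] = 1 if task t depends
// on task u. Congruence is the fraction of required developer pairs (i != j)
// that actually coordinate according to CA.
function congruence(TA, TD, CA) {
  const CR = multiply(multiply(TA, TD), transpose(TA));
  let required = 0;
  let matched = 0;
  for (let i = 0; i < CR.length; i++) {
    for (let j = 0; j < CR.length; j++) {
      if (i === j || CR[i][j] === 0) continue;
      required += 1;
      if (CA[i][j] > 0) matched += 1;
    }
  }
  return required === 0 ? 1 : matched / required; // no requirements: trivially congruent
}

// Hypothetical project: developer i owns task i; task 1 depends on tasks 0 and 2
// and vice versa; only developers 0 and 1 actually talk to each other.
const TA = [[1, 0, 0], [0, 1, 0], [0, 0, 1]];
const TD = [[0, 1, 0], [1, 0, 1], [0, 1, 0]];
const CA = [[0, 1, 0], [1, 0, 0], [0, 0, 0]];

console.log(congruence(TA, TD, CA)); // 0.5: developers 1 and 2 should coordinate but do not
```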

Conway’s law:
• Evidence in favor: commercial “in-house” development
• Evidence against: “community-based” development
  – More modular software => emergent “complex network” structure?

MacCormack et al. “Exploring the duality between product and organizational architectures: A test of the mirroring hypothesis.” Research Policy, 2012.

Colfer et al. “The mirroring hypothesis: Theory, evidence and exceptions.” Harvard Business School, 2010.

Interdisciplinary research: Complex Systems and Complexity Theory

“A new approach to science that investigates how relationships between parts give rise to the collective behaviors of a system and how the system interacts and forms relationships with its environment.”

Emergence: process whereby larger entities, patterns, and regularities arise through interactions among smaller or simpler entities that themselves do not exhibit such properties.

Complexity Theory: Network Theory

Quotation from Mitchell’s book:

“network thinking is providing novel ways to think about difficult problems such as how to do efficient search on the Web, […] how to manage large organisations, how to preserve ecosystems, […] and, more generally, what kind of resilience and vulnerabilities are intrinsic to natural, social, and technological networks, and how to exploit and protect such systems.”

Some characteristics of complex networks:

Small-world property
• Low average path length between any two nodes
• Highly clustered components linked through hubs

Skewed distributions (power-law behaviour)
• Few nodes with a very high in-degree (resp. out-degree), many nodes with a very small in-degree (resp. out-degree)

Scale-freeness
• The observed degree distribution is very similar regardless of the scale of observation

Scale-free networks are resilient
• Robust to the deletion of random (non-hub) nodes
• Vulnerable to the deletion of hubs
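The contrast between random failures and targeted hub deletion can be illustrated by measuring the largest connected component before and after removing a node. The toy network below is hypothetical and far too small to be scale-free; it only shows the mechanism that makes losing a hub (a technical bus factor) so damaging.

```javascript
// Size of the largest connected component of an undirected graph (adjacency
// lists), treating the nodes in `removed` as deleted.
function largestComponent(g, removed = new Set()) {
  const seen = new Set(removed);
  let best = 0;
  for (const start of Object.keys(g)) {
    if (seen.has(start)) continue;
    // Depth-first search from `start`, counting the nodes it reaches.
    let size = 0;
    const stack = [start];
    seen.add(start);
    while (stack.length > 0) {
      const v = stack.pop();
      size += 1;
      for (const w of g[v]) {
        if (!seen.has(w)) {
          seen.add(w);
          stack.push(w);
        }
      }
    }
    best = Math.max(best, size);
  }
  return best;
}

// Hypothetical network in which "hub" connects every other part.
const net = {
  hub: ["a1", "b1", "c", "d", "e"],
  a1: ["hub", "a2"], a2: ["a1"],
  b1: ["hub", "b2"], b2: ["b1"],
  c: ["hub"], d: ["hub"], e: ["hub"],
};

console.log(largestComponent(net));                   // 8: fully connected
console.log(largestComponent(net, new Set(["c"])));   // 7: losing a leaf barely matters
console.log(largestComponent(net, new Set(["hub"]))); // 2: losing the hub fragments it
```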

Examples of complex networks exhibiting these characteristics:
– World-Wide Web
– (Technical) software dependency graphs
– Social networks (e.g. Facebook)
– (Socio-technical) software ecosystems

[Figure: examples of software system dependency networks]

Network Theory: Possible applications for SECOs
• Provide prediction/forecasting models
  – of how SECOs emerge
  – of how SECOs grow/evolve
• Estimate the resilience and sustainability of SECOs after major disturbances
• Assess the risk of deleting hub nodes => bus factor!

Network Theory: How do SECOs emerge and grow?

A popular model is preferential attachment: over time, nodes with a higher degree receive more links than nodes with a lower degree.

Extensions of this model have been proposed to simulate the growth of complex software systems by mimicking the principle of coupling & cohesion.

Barabasi et al. Emergence of Scaling in Random Networks. Science 286, 1999

Li et al. Multi-Level Formation of Complex Software Systems. Entropy 18(178), 2016
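Preferential attachment is easy to simulate. The sketch below is the simplest Barabási-Albert-style variant (each new node adds a single link, m = 1, starting from two linked nodes); drawing a uniformly random endpoint of an existing edge selects each node with probability proportional to its degree. The seeded generator (mulberry32, a small public-domain PRNG) only makes runs reproducible; the software-specific extensions of Li et al. are not reproduced.

```javascript
// Small seeded pseudo-random generator (mulberry32) returning floats in [0, 1).
function mulberry32(seed) {
  let a = seed | 0;
  return function () {
    a = (a + 0x6d2b79f5) | 0;
    let t = Math.imul(a ^ (a >>> 15), 1 | a);
    t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t;
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Grow a network by preferential attachment, one link per new node (m = 1).
// Every edge stores both endpoints in `endpoints`, so a uniform pick from that
// list chooses an existing node with probability proportional to its degree.
function preferentialAttachment(nodeCount, random) {
  const degree = [1, 1];
  const endpoints = [0, 1];
  const edges = [[0, 1]];
  for (let v = 2; v < nodeCount; v++) {
    const target = endpoints[Math.floor(random() * endpoints.length)];
    edges.push([v, target]);
    degree.push(1);
    degree[target] += 1;
    endpoints.push(v, target);
  }
  return { degree, edges };
}

const { degree } = preferentialAttachment(5000, mulberry32(42));
const sorted = [...degree].sort((a, b) => a - b);
console.log("median degree:", sorted[Math.floor(sorted.length / 2)]);
console.log("maximum degree:", sorted[sorted.length - 1]);
```

With a few thousand nodes, the maximum degree typically ends up far above the median: a handful of hubs emerge, which is the skewed distribution described for complex networks.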


Interdisciplinary research: Ecology and natural ecosystems

Biodiversity of species, e.g. hosts – parasites, plants – pollinators

Mutual dependency and functional redundancy: disappearing species may be compensated by others if there is sufficient diversity in both layers.

Diversity metrics:
• Species richness = number of different species in the ecosystem
• Species evenness (entropy) = relative abundance of the population of each species in the ecosystem
• Shannon diversity index (relative entropy) = specialisation of a given species in relation to the species in the other level
• Simpson index = degree of concentration when individuals are classified into species
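The standard textbook forms of these indices are easy to compute from abundance counts. In the sketch below, “species” could equally be contributors and abundances their commit counts (a hypothetical mapping for illustration). Shannon uses natural logarithms, evenness is Pielou's J = H / ln(S), and 1 minus the Simpson index (the Gini-Simpson or Blau index) is often used as a diversity score, e.g. for gender diversity in team studies.

```javascript
// Diversity indices from abundance counts (individuals per species, or e.g.
// commits per contributor). Species with a zero count are ignored.
function diversity(counts) {
  const total = counts.reduce((a, b) => a + b, 0);
  const p = counts.filter((c) => c > 0).map((c) => c / total);
  const richness = p.length; // S: number of species present
  const shannon = -p.reduce((acc, pi) => acc + pi * Math.log(pi), 0); // H = -sum p ln p
  // Pielou's evenness J = H / ln S (undefined for a single species; 0 here by convention)
  const evenness = richness > 1 ? shannon / Math.log(richness) : 0;
  const simpson = p.reduce((acc, pi) => acc + pi * pi, 0); // concentration: sum p^2
  return { richness, shannon, evenness, simpson, giniSimpson: 1 - simpson };
}

console.log(diversity([10, 10, 10, 10])); // perfectly even: J = 1, Simpson = 0.25
console.log(diversity([97, 1, 1, 1]));    // dominated by one species: low J, Simpson near 1
```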


Software Ecosystems: Diversity in software ecosystems

Mutual dependency and functional redundancy: the disappearance of projects or contributors may be compensated by others.

Software Ecosystems: Diversity

Are software project teams diverse?
– In terms of code ownership, types of activity, gender balance, seniority, …

How does this diversity affect …
– defect-proneness?
– productivity?
– …

Success story of diversity measures: assessing defect-proneness in software projects
• More focused developers introduce fewer defects.
• Modules receiving narrowly focused activity are more likely to contain defects.

Posnett et al. Dual Ecological Measures of Focus in Software Development. ICSE 2013

Software Ecosystems: Gender Diversity

Effect of gender diversity on productivity? Women are underrepresented in programming:
– industry: 16-18% female developers
– open source: ~10%
– social coding platforms:
  • GitHub: ~9%
  • StackOverflow: ~7%

Vasilescu et al. Gender and tenure diversity in GitHub teams. CHI 2015
Vasilescu et al. A Data Set for Social Diversity Studies of GitHub Teams. MSR 2015

Success story of diversity measures:
– Gender and tenure diversity are positive and significant predictors of productivity
– Teams that are more balanced in terms of gender and seniority have higher productivity rates

Vasilescu et al. Gender and tenure diversity in GitHub teams. CHI 2015

Interdisciplinary research: Survival Analysis

Statistical technique used in many disciplines to analyse the time until the occurrence of an event of interest:
• Medicine
  – Effect of a treatment or medicine to cure a disease
  – Effect of a disease on patient mortality
• Sociology
  – Factors influencing marriage or divorce

Success story: OSS project survival

Factors positively influencing survival:
• number of contributors
• project age

Basis for prediction models

Samoladas et al. Survival analysis on the duration of open source projects. IST 2010
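A common starting point for such an analysis is the Kaplan-Meier estimator of the survival function, S(t) = ∏ over event times tᵢ ≤ t of (1 − dᵢ/nᵢ), where dᵢ projects “die” at tᵢ and nᵢ projects are still at risk just before tᵢ. The sketch below uses hypothetical project lifetimes in months, with projects still active at the end of observation treated as right-censored; it is not claimed to be the exact procedure of Samoladas et al.

```javascript
// Kaplan-Meier estimate of the survival function S(t).
// Each observation is { time, event }: event = true means the project died
// (e.g. became inactive) at `time`; event = false means it was still alive when
// observation stopped (right-censored).
function kaplanMeier(observations) {
  const eventTimes = [...new Set(observations.filter((o) => o.event).map((o) => o.time))]
    .sort((a, b) => a - b);
  let survival = 1;
  const curve = [];
  for (const t of eventTimes) {
    const atRisk = observations.filter((o) => o.time >= t).length;
    const deaths = observations.filter((o) => o.event && o.time === t).length;
    survival *= 1 - deaths / atRisk;
    curve.push({ time: t, survival });
  }
  return curve;
}

// Hypothetical project lifetimes in months.
const projects = [
  { time: 6, event: true },
  { time: 12, event: false },
  { time: 18, event: true },
  { time: 24, event: true },
  { time: 30, event: false },
];

console.log(kaplanMeier(projects)); // S(6) = 0.8, S(18) ≈ 0.533, S(24) ≈ 0.267
```

Comparing such curves between groups (e.g. projects with few versus many contributors) is what turns the estimator into evidence about survival factors.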

SECO Research Challenges (continued)

Understanding SECOs
• How do different SECOs compare?
• How to face technical challenges?
  – Big data
  – Privacy versus reproducibility

Serebrenik et al. “Challenges in Software Ecosystems Research.” IWSECO-WEA 2015

Research Challenge: Comparing SECOs

• Each software ecosystem
  – has specific habits, expectations and change policies
  – uses specific tools
• Taking these differences into account is important
  – to support SECO maintenance and evolution
  – to generalise research findings across SECOs

Bogart et al. “How to break an API: Cost negotiation and community values in three software ecosystems.” FSE 2016

Decan et al. “On the topology of package dependency networks – A comparison of three programming language ecosystems.” WEA 2016

Research Challenge: Big Data

The 4 V's: Volume, Velocity, Variety, Veracity

Research Challenge: Privacy vs reproducibility

How to preserve the privacy of individuals?
– EU Regulation 2016/679 (General Data Protection Regulation) on the protection of natural persons with regard to the processing of personal data and on the free movement of such data:
  “The principles of data protection should apply to any information concerning an identified or identifiable natural person.”
– Appropriate anonymisation and privacy-preserving data mining techniques are needed

Fung et al. Privacy-preserving data publishing: A survey of recent developments. ACM Computing Surveys 2010

Malik et al. Privacy preserving data mining techniques: Current scenario and future prospects. IC3T 2012

• Increase/ensure reproducible research results
  – Awareness is increasing
  – Solutions are being put into place
  – Big data problems remain an issue
• How to reconcile privacy with reproducibility?

Gonzalez-Barahona et al. On the reproducibility of empirical software engineering studies based on data retrieved from development repositories. Emp. Softw. Eng. 2012
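One partial answer is to publish pseudonymised data: developer identities are replaced by keyed hashes, so per-developer analyses can be reproduced without exposing names or e-mail addresses. Pseudonymisation is not anonymisation, though: the regulation still treats data that can be attributed to a person with additional information (here, the key) as personal data. A sketch using Node's built-in `crypto` module; the normalisation rule, the 16-character truncation and the record fields are hypothetical choices for illustration.

```javascript
const crypto = require("crypto");

// Replace a developer identity (e.g. a commit author e-mail) with a keyed hash.
// The same identity always yields the same pseudonym, so per-developer analyses
// stay reproducible; linking a pseudonym back to a person requires the secret
// key (to recompute the hashes of candidate identities).
function pseudonymise(identity, secretKey) {
  const normalised = identity.trim().toLowerCase();
  return crypto.createHmac("sha256", secretKey).update(normalised).digest("hex").slice(0, 16);
}

// Hypothetical commit records and key; the key must never be published with the data.
const secretKey = "replace-with-a-long-random-secret";
const commits = [
  { author: "alice@example.org", filesChanged: 3 },
  { author: " Alice@Example.org", filesChanged: 1 },
  { author: "bob@example.org", filesChanged: 5 },
];

const published = commits.map((c) => ({
  author: pseudonymise(c.author, secretKey),
  filesChanged: c.filesChanged,
}));
console.log(published); // the first two records share one pseudonym
```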

Wrap-up

Research on SECO evolution requires
– A socio-technical view
– Mixed-method research
– Interdisciplinary research

Many technical challenges need to be faced

Are you willing to take up the challenge?
