on the topology of package dependency networks: a comparison of programming language ecosystems


On the Topology of Package Dependency Networks: A Comparison of Programming Language Ecosystems

Alexandre Decan, Tom Mens, Maëlick Claes
Software Engineering Lab

29 November 2016 – Int’l Workshop Software Ecosystem Architectures (WEA)

Research Team

Previous Work

• A. Decan, T. Mens, M. Claes, P. Grosjean
  – IWSECO-WEA 2015: "On the Development and Distribution of R Packages: An Empirical Analysis of the R Ecosystem"
  – SANER 2016: "When GitHub Meets CRAN: An Analysis of Inter-Repository Package Dependency Problems"
• A. Serebrenik, T. Mens
  – WEA 2015: "Challenges in Software Ecosystems Research"
    • Generalizability
    • Comparing different ecosystems

Software Packaging Ecosystems

• Ecosystem: "a collection of software projects which are developed and evolve together in the same environment" [Lungu]
• Software distributed as packages
  – Dependency relationships between packages
  – Package versioning

Software Packaging Ecosystems for programming languages

• Many programming-language-specific package managers:
  – npm (JavaScript)
  – PyPI (Python)
  – RubyGems (Ruby)
  – CRAN (R)

Software Packaging Ecosystems for programming languages

[Figure: IEEE Spectrum ranking of the most popular programming languages (http://spectrum.ieee.org/image/Mjc5MjI0Ng.png)]

“The real standard library people want is more like what you find in Python or Ruby, and it’s more batteries included, feature complete, and that is not in JavaScript. That’s in the NPM world or the larger world.”

Ecosystem comparison

                       CRAN        PyPI        npm
Snapshot date          2016-04-26  2016-02-17  2016-06-28
Packages               9k          56k         317k
Dependencies           21k         53k         728k
New packages in 2015   1.6k        17k         113k
Updates in 2015        8k          131k        711k
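A rough reading of this table: dividing the number of dependencies by the number of packages gives the average number of direct dependencies per package. The sketch below assumes that "Dependencies" counts dependency relationships between packages; because the figures are rounded and the snapshots were taken on different dates, the results are approximations only. The names `ecosystems` and `dependenciesPerPackage` are illustrative.

```javascript
// Rounded figures (in thousands) copied from the comparison table.
// Assumption: "Dependencies" counts dependency relationships between packages.
const ecosystems = {
  CRAN: { packages: 9, dependencies: 21 },
  PyPI: { packages: 56, dependencies: 53 },
  npm: { packages: 317, dependencies: 728 },
};

// Mean number of direct dependencies per package (average out-degree).
function dependenciesPerPackage({ packages, dependencies }) {
  return dependencies / packages;
}

for (const [name, counts] of Object.entries(ecosystems)) {
  const ratio = dependenciesPerPackage(counts).toFixed(2);
  console.log(`${name}: ~${ratio} dependencies per package`);
}
// CRAN: ~2.33, PyPI: ~0.95, npm: ~2.30
```

On these rounded figures, PyPI stands out with fewer than one dependency per package on average.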

Data extraction

• CRAN: https://github.com/ecos-umons/extractoR
• npm: https://registry.npmjs.org
• PyPI: missing dependency information
  => https://kgullikson88.github.io/blog/pypi-analysis.html

Terminology

[Figure: example dependency graph over packages a, b, c, d, e, f and g]

• b is a dependency of a
• a is a reverse dependency of b
• c is a transitive dependency of a
• a is a transitive reverse dependency of c
• {a, b, c, d, e, f} is a (weakly connected) component
• g is an isolated package
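To make these definitions concrete, here is a minimal JavaScript sketch. The slide's figure is not reproduced, so the graph `deps` below is a hypothetical one, chosen only to satisfy the statements above; the helper names (`reverseDependencies`, `reachable`, `weakComponents`, `isolatedPackages`) are illustrative and not taken from the authors' tooling.

```javascript
// Hypothetical graph consistent with the statements above (not the slide's figure).
// Each key lists its direct dependencies.
const deps = {
  a: ["b"],
  b: ["c"],
  c: [],
  d: ["b"],
  e: ["c"],
  f: ["e"],
  g: [],
};

// Reverse the edges: for each package, the packages that directly depend on it.
function reverseDependencies(graph) {
  const reverse = {};
  for (const pkg of Object.keys(graph)) reverse[pkg] = [];
  for (const [pkg, targets] of Object.entries(graph)) {
    for (const target of targets) reverse[target].push(pkg);
  }
  return reverse;
}

// Packages reachable from `start` along the edges (iterative depth-first search).
// On `deps` these are transitive dependencies; on the reversed graph they are
// transitive reverse dependencies.
function reachable(graph, start) {
  const seen = new Set();
  const stack = [...graph[start]];
  while (stack.length > 0) {
    const pkg = stack.pop();
    if (seen.has(pkg)) continue;
    seen.add(pkg);
    stack.push(...graph[pkg]);
  }
  return seen;
}

// Weakly connected components: ignore edge direction and group packages
// linked by any chain of dependency or reverse dependency relations.
function weakComponents(graph) {
  const reverse = reverseDependencies(graph);
  const seen = new Set();
  const components = [];
  for (const root of Object.keys(graph)) {
    if (seen.has(root)) continue;
    seen.add(root);
    const component = [];
    const stack = [root];
    while (stack.length > 0) {
      const pkg = stack.pop();
      component.push(pkg);
      for (const next of [...graph[pkg], ...reverse[pkg]]) {
        if (!seen.has(next)) {
          seen.add(next);
          stack.push(next);
        }
      }
    }
    components.push(component.sort());
  }
  return components;
}

// Isolated packages have neither dependencies nor reverse dependencies.
function isolatedPackages(graph) {
  const reverse = reverseDependencies(graph);
  return Object.keys(graph).filter(
    (pkg) => graph[pkg].length === 0 && reverse[pkg].length === 0
  );
}

const rev = reverseDependencies(deps);
console.log(reachable(deps, "a").has("c")); // c is a transitive dependency of a
console.log(reachable(rev, "c").has("a"));  // a is a transitive reverse dependency of c
console.log(weakComponents(deps));          // two components: a to f, and g alone
console.log(isolatedPackages(deps));        // g is the only isolated package
```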

Dependency usage in programming language ecosystems

PyPI has proportionally more isolated packages (possibly due to Python's extensive standard library)

“The real standard library people want is more like what you find in Python or Ruby, and it’s more batteries included, feature complete, and that is not in JavaScript. That’s in the NPM world or the larger world.”

Topology of programming language ecosystems

The vast majority of non-isolated packages are part of a single huge component

Largest component:
• 76.5% (CRAN), 35.6% (PyPI), 63.8% (npm) of all packages
• 91% (CRAN), 88% (PyPI), 92% (npm) of all non-isolated packages
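The two rows of percentages can be combined. If L is the size of the largest component, N the number of packages and M the number of non-isolated packages, then (L/N) / (L/M) = M/N, the share of packages that are not isolated. A small sketch of that arithmetic, using the slide's rounded percentages (so the results are approximate; the names are illustrative):

```javascript
// Share of packages in the largest component, as reported on the slide.
const largestComponentShare = {
  CRAN: { ofAll: 0.765, ofNonIsolated: 0.91 },
  PyPI: { ofAll: 0.356, ofNonIsolated: 0.88 },
  npm: { ofAll: 0.638, ofNonIsolated: 0.92 },
};

// (L / N) / (L / M) = M / N: the fraction of packages that are not isolated.
function nonIsolatedShare({ ofAll, ofNonIsolated }) {
  return ofAll / ofNonIsolated;
}

for (const [name, shares] of Object.entries(largestComponentShare)) {
  const nonIsolated = nonIsolatedShare(shares);
  const isolatedPct = (100 * (1 - nonIsolated)).toFixed(0);
  console.log(`${name}: ~${isolatedPct}% of packages are isolated`);
}
// CRAN: ~16%, PyPI: ~60%, npm: ~31%
```

By this estimate roughly 60% of PyPI packages are isolated, against about 16% for CRAN and 31% for npm, in line with the observation that PyPI has proportionally more isolated packages.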

Differences in dependencies between programming language ecosystems

npm packages have a much higher ratio of transitive dependencies
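The slide does not define "ratio" precisely. One possible reading, assumed here, compares for each package the number of its transitive dependencies (everything reachable through the dependency graph) with the number of its direct dependencies. The toy graph and the function names below are hypothetical.

```javascript
// Hypothetical toy graph: each key lists its direct dependencies.
const deps = {
  app: ["x", "y"],
  x: ["z"],
  y: ["z", "w"],
  z: ["w"],
  w: [],
};

// All packages reachable from `pkg` (its transitive dependencies), via iterative DFS.
function transitiveDependencies(graph, pkg) {
  const seen = new Set();
  const stack = [...graph[pkg]];
  while (stack.length > 0) {
    const next = stack.pop();
    if (seen.has(next)) continue;
    seen.add(next);
    stack.push(...graph[next]);
  }
  return seen;
}

// Ratio of transitive to direct dependencies; null for packages without dependencies.
function transitiveRatio(graph, pkg) {
  const direct = graph[pkg].length;
  if (direct === 0) return null;
  return transitiveDependencies(graph, pkg).size / direct;
}

for (const pkg of Object.keys(deps)) {
  const total = transitiveDependencies(deps, pkg).size;
  const ratio = transitiveRatio(deps, pkg);
  console.log(`${pkg}: ${deps[pkg].length} direct, ${total} transitive, ratio ${ratio}`);
}
```

Under this reading, a package whose few direct dependencies each pull in further packages (such as `app` or `x` here) gets a high ratio.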

Differences in reverse dependencies between programming language ecosystems

There are proportionally more very popular npm packages (i.e., packages with a higher number of transitive reverse dependencies)

Differences in reverse dependencies between programming language ecosystems

Number of packages required by more than 2% of the ecosystem
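A sketch of how such a count could be obtained, assuming that "required by" means counting a package's transitive reverse dependencies (all packages that depend on it directly or indirectly) and that "more than" is a strict comparison. The toy ecosystem and the function names are hypothetical. With only seven packages, 2% is less than one package, so any package with a single dependent passes; a larger share is shown as well.

```javascript
// Hypothetical toy ecosystem: each key lists its direct dependencies.
const deps = {
  util: [],
  core: ["util"],
  web: ["core"],
  cli: ["core", "util"],
  plot: [],
  stats: ["plot"],
  misc: [],
};

// For each package, the packages that directly depend on it.
function reverseDependencies(graph) {
  const reverse = {};
  for (const pkg of Object.keys(graph)) reverse[pkg] = [];
  for (const [pkg, targets] of Object.entries(graph)) {
    for (const target of targets) reverse[target].push(pkg);
  }
  return reverse;
}

// Number of packages that depend on `pkg` directly or indirectly.
function transitiveReverseCount(reverse, pkg) {
  const seen = new Set();
  const stack = [...reverse[pkg]];
  while (stack.length > 0) {
    const next = stack.pop();
    if (seen.has(next)) continue;
    seen.add(next);
    stack.push(...reverse[next]);
  }
  return seen.size;
}

// Packages required (transitively) by more than `share` of the whole ecosystem.
function popularPackages(graph, share) {
  const reverse = reverseDependencies(graph);
  const total = Object.keys(graph).length;
  return Object.keys(graph).filter(
    (pkg) => transitiveReverseCount(reverse, pkg) / total > share
  );
}

console.log(popularPackages(deps, 0.02)); // util, core and plot
console.log(popularPackages(deps, 0.3));  // only util (required by 3 of 7 packages)
```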

Possible explanation: micro-packages in npm

“In a lot of JavaScript environments, space is at a premium. [...] Several larger libraries [...] have actually intentionally split themselves into sub-modules because people usually only ever load them to use a single merge function.”

Example: isarray, with 150 direct and 77K transitive reverse dependencies in August 2016

```javascript
var toString = {}.toString;

// Use the native Array.isArray if present; otherwise compare the type tag.
module.exports = Array.isArray || function (arr) {
  return toString.call(arr) == '[object Array]';
};
```

Known problems: leftpad

```javascript
function leftpad (str, len, ch) {
  str = String(str);
  var i = -1;
  // Default to a space; 0 is accepted as a padding character.
  if (!ch && ch !== 0) ch = ' ';
  // Number of padding characters still needed.
  len = len - str.length;
  while (++i < len) {
    str = ch + str;
  }
  return str;
}
```

Its developer removed all his packages from npm: “This impacted many thousands of projects. [...] We began observing hundreds of failures per minute, as dependent projects – and their dependents, and their dependents... – all failed when requesting the now-unpublished package.”

http://blog.npmjs.org/post/141577284765/kik-left-pad-and-npm


Known problems: leftpad

npm managers un-unpublished leftpad, but …

“a number of dependency chains [...] explicitly requested 0.0.3.”

http://blog.npmjs.org/post/141577284765/kik-left-pad-and-npm
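Why an explicit request for 0.0.3 matters can be seen with a small version-matching sketch. It assumes a simplified subset of npm's semantic-versioning ranges: an exact version such as "0.0.3" and a caret range such as "^0.0.3" (pre-release tags and all other range syntax are ignored). The helper names are hypothetical and are not the API of npm's `semver` package; the replacement version 1.0.0 used below is likewise an illustrative assumption.

```javascript
// Simplified, hypothetical matcher for two npm range forms only:
// exact versions ("0.0.3") and caret ranges ("^0.0.3").
function parseVersion(text) {
  const [major, minor, patch] = text.split(".").map(Number);
  return { major, minor, patch };
}

// Negative, zero or positive, like a comparator for Array.prototype.sort.
function compareVersions(a, b) {
  return a.major - b.major || a.minor - b.minor || a.patch - b.patch;
}

// Exclusive upper bound of a caret range: bump the left-most non-zero component.
// ^1.2.3 -> <2.0.0, ^0.2.3 -> <0.3.0, ^0.0.3 -> <0.0.4
function caretUpperBound({ major, minor, patch }) {
  if (major > 0) return { major: major + 1, minor: 0, patch: 0 };
  if (minor > 0) return { major: 0, minor: minor + 1, patch: 0 };
  return { major: 0, minor: 0, patch: patch + 1 };
}

function matchesRange(version, range) {
  const v = parseVersion(version);
  if (range.startsWith("^")) {
    const low = parseVersion(range.slice(1));
    return compareVersions(v, low) >= 0 && compareVersions(v, caretUpperBound(low)) < 0;
  }
  return compareVersions(v, parseVersion(range)) === 0; // exact pin
}

console.log(matchesRange("1.0.0", "0.0.3"));  // false: an exact pin admits only 0.0.3
console.log(matchesRange("1.0.0", "^0.0.3")); // false: ^0.0.3 means >=0.0.3 <0.0.4
console.log(matchesRange("0.0.3", "^0.0.3")); // true
```

Under these assumptions, a replacement published under a new version number cannot satisfy chains that request 0.0.3, whether pinned exactly or through a caret range; only that exact release can.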

Conclusion

• Simple metrics can be used to compare the topology of different package-based software ecosystems
• Similarities in the dependency graph structure
  – Most non-isolated packages are part of a large weakly connected component
• Differences that can be explained by the specificities of each ecosystem
  – Python's extensive standard library
  – CRAN's particular versioning policy
  – npm's abundance of micro-packages

Future work

• See our SANER 2017 article "An empirical comparison of dependency issues in OSS packaging ecosystems"
  – Include RubyGems
  – Study the evolution over time
  – Frequency of package updates
  – Resilience of packages to failures in dependencies
  – Impact of solutions that rely on dependency constraints and semantic versioning
• Beyond SANER 2017: study the interplay between social and technical aspects

Thanks for your attention!

Questions?