Human Migration of Open-Source Contributors Kick-off Presentation Erik Kouters Graduation supervisor: A. Serebrenik Graduation tutor: B. Vasilescu


Post on 17-Dec-2015


TRANSCRIPT

Page 1: Human Migration of Open-Source Contributors Kick-off Presentation Erik Kouters Graduation supervisor: A. Serebrenik Graduation tutor: B. Vasilescu

Human Migration of Open-Source Contributors

Kick-off Presentation

Erik Kouters

Graduation supervisor: A. Serebrenik

Graduation tutor: B. Vasilescu

Page 2:

What have I done so far?

• Geographical Movement of Mailing List Participants

  • Seminar SET

  • Capita Selecta SET

• Who’s who in GNOME: using LSA to merge software repository identities

  • ICSM 2012 ERA Track

/ Software Engineering & Technology PAGE 2 18-04-23

Page 3:

What are the main topics?

• Human migration of open-source contributors

• Identity matching

• Case study: GNOME

Page 4:

Why is human migration of open-source contributors interesting?

• A passionate contributor would visit a conference.

• Don't program on Fridays!

  • Contributors who appear to be weekend commuters are less likely to introduce bugs on Fridays.

• Translators who reside outside the country of the target language are expected to deliver lower-quality translations.

Page 5:

What’s so interesting about this human migration of open-source contributors?

• What (geographical) patterns does the migration of open-source contributors follow?

  • Which patterns (source → destination) are most popular?

    − Commute

    − Conferences

• What are the factors that influence this migration?

  • Which factors are most influential?

Page 6:

How am I planning to trace these migrations?

• Extract emails from mailing list archive

• Resolve emails to location

  • Email A is sent from locationA at timestampA

  • Email B is sent from locationB at timestampB

  • <locationA, timestampA> + <locationB, timestampB> = migration!

• But what if the contributor uses multiple email addresses?
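
The tracing steps on this slide can be sketched roughly as follows. This is only an illustration, not the tool's actual implementation: the `Sighting` record, the `find_migrations` function, and the sample addresses and cities are all hypothetical, and the step that resolves an email to a location is assumed to have happened already.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class Sighting:
    """One email already resolved to a location (hypothetical record type)."""
    sender: str
    location: str
    timestamp: datetime


def find_migrations(sightings: Iterable[Sighting]) -> List[Tuple[Sighting, Sighting]]:
    """Return consecutive sightings of the same sender whose locations differ."""
    by_sender: Dict[str, List[Sighting]] = {}
    for sighting in sightings:
        by_sender.setdefault(sighting.sender, []).append(sighting)

    migrations = []
    for history in by_sender.values():
        # Order each sender's emails in time, then compare neighbours.
        history.sort(key=lambda s: s.timestamp)
        for earlier, later in zip(history, history[1:]):
            if earlier.location != later.location:
                migrations.append((earlier, later))
    return migrations


# Made-up sample data, for illustration only.
emails = [
    Sighting("alice@example.org", "Eindhoven", datetime(2012, 3, 1, 9, 0)),
    Sighting("alice@example.org", "Eindhoven", datetime(2012, 3, 2, 9, 0)),
    Sighting("alice@example.org", "Berlin", datetime(2012, 3, 5, 14, 0)),
    Sighting("bob@example.org", "Paris", datetime(2012, 3, 1, 8, 0)),
]
moves = find_migrations(emails)
for src, dst in moves:
    print(f"{src.sender}: {src.location} -> {dst.location}")
```

Because the sketch groups emails by sender address, a contributor who writes from two addresses is treated as two people and their moves are split or missed, which is exactly the problem the last bullet raises.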

Page 7:

What exactly is Identity Matching?

• Identifying which aliases belong to the same individual

• Aliases commonly take the form <name, emailAddress>

  • <“George Stefanakis”, george.stefanakis@domainA>

  • <“Stephanakis, George”, g.stephanakis@domainB>

• Needs some similarity measure (e.g. edit distance)
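
As a minimal sketch of such a similarity measure, the two example names above can be compared with Levenshtein edit distance. The `normalise` step (lowercasing and reordering "Last, First") and the length-based scaling in `similarity` are assumptions made for this illustration, not something the slides specify.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Scale the distance to [0, 1], where 1.0 means identical strings."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def normalise(name: str) -> str:
    """Lowercase the name and turn 'Last, First' into 'first last' (assumed rule)."""
    name = name.strip().lower()
    if "," in name:
        last, first = [part.strip() for part in name.split(",", 1)]
        name = f"{first} {last}"
    return name


name_a = normalise("George Stefanakis")
name_b = normalise("Stephanakis, George")
score = similarity(name_a, name_b)
print(name_a, "|", name_b, "|", round(score, 3))
```

Where to put the threshold for treating two aliases as the same person is a separate tuning decision that this sketch leaves open.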

Page 8:

How am I going to match these identities?

Page 9:

What will I be doing to improve the identity matching?

• Increase confidence when merging email addresses

  • Look at fellow recipients (mailing list)

  • Look at coauthors (source code repository)

• Use multiple similarity measures

  • Currently Levenshtein and Cosine Similarity

  • Compare performance with others (e.g. Jaccard, Jaro-Winkler, Dice’s coefficient)

• Improve implementation

  • Currently slow

  • Data set limited to system’s memory

• Release the tool as open-source (e.g. GitHub)

  • Compare to current implementations
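
To make the comparison of similarity measures concrete, here is a small sketch of three of the set- and vector-based measures named on this slide. It assumes strings are compared through their character bigrams; the slides do not say which features the existing tool uses, so that choice and all function names are illustrative only.

```python
import math
from collections import Counter


def bigrams(text: str) -> Counter:
    """Count the overlapping two-character substrings of the lowercased text."""
    text = text.lower()
    return Counter(text[i:i + 2] for i in range(len(text) - 1))


def jaccard(a: str, b: str) -> float:
    """Shared bigrams divided by all distinct bigrams."""
    x, y = set(bigrams(a)), set(bigrams(b))
    union = x | y
    return len(x & y) / len(union) if union else 1.0


def dice(a: str, b: str) -> float:
    """Dice's coefficient: twice the shared bigrams over the sum of both set sizes."""
    x, y = set(bigrams(a)), set(bigrams(b))
    total = len(x) + len(y)
    return 2 * len(x & y) / total if total else 1.0


def cosine(a: str, b: str) -> float:
    """Cosine of the angle between the two bigram count vectors."""
    x, y = bigrams(a), bigrams(b)
    dot = sum(count * y[gram] for gram, count in x.items())
    norm_x = math.sqrt(sum(c * c for c in x.values()))
    norm_y = math.sqrt(sum(c * c for c in y.values()))
    return dot / (norm_x * norm_y) if norm_x and norm_y else 0.0


# Local parts of the two example addresses from the identity-matching slide.
left, right = "george.stefanakis", "g.stephanakis"
scores = {f.__name__: f(left, right) for f in (jaccard, dice, cosine)}
for name, value in scores.items():
    print(f"{name}: {value:.3f}")
```

Running several measures over the same labelled pairs and comparing their precision and recall is one way the performance comparison mentioned above could be carried out.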

Page 10:

So, what will I be doing?

1. Improve the identity matching algorithm’s performance

2. Run the algorithm on the data from the mailing list archive

3. Send out a questionnaire to verify the results

4. While waiting for the questionnaire, improve the algorithm with more advanced techniques

5. Once sufficient questionnaire responses have been received, analyse the data and look for patterns

Page 11:

A questionnaire? What about privacy?

• Only the individual can access the data

  • Participation by entering their email address

  • Unique URL (hash) mailed to the email address

• Data will not be made public

  • Research published based on the data will be anonymised
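
One possible way to derive the unique URL is sketched below. The slide only says "hash"; the use of a keyed HMAC-SHA256, the secret key, the base URL, and every function name here are assumptions for illustration, not the scheme the questionnaire actually uses.

```python
import hashlib
import hmac

# Both values are hypothetical placeholders for this sketch.
SECRET_KEY = b"replace-with-a-long-random-secret"
BASE_URL = "https://survey.example.org/r/"


def token_for(email: str) -> str:
    """Derive a token from the email address that cannot be guessed without the key."""
    normalised = email.strip().lower().encode("utf-8")
    return hmac.new(SECRET_KEY, normalised, hashlib.sha256).hexdigest()


def personal_url(email: str) -> str:
    """Build the link that would be mailed to the participant."""
    return BASE_URL + token_for(email)


def is_valid(email: str, token: str) -> bool:
    """Check a presented token in constant time."""
    return hmac.compare_digest(token_for(email), token)


link = personal_url("Contributor@Example.org")
print(link)
```

A keyed hash is assumed here rather than a plain hash of the address, since an unkeyed hash of a publicly known mailing-list address could be recomputed by anyone.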

Page 12:

How do I confirm the identity matching?

Page 13:

How do I confirm the migrations?

Page 14:

Looks promising…

Page 15:

And what am I hoping to achieve?

• A more advanced and better-performing identity matching algorithm than currently exists

  • Versatile and open-source tool

• Insight into the patterns by which, and the reasons why, skilled workers (open-source contributors) migrate

  • Work during holiday → Hobbyist

  • Visits conferences → High activity in project

• More publications!

Page 16:

Thank you!

• Questions?
