Leveraging the Crowd: Supporting Newcomers to Build an OSS Community
Marco Aurélio Gerosa
University of São Paulo (USP)
Northern Arizona University (NAU)
Keynote @ PARIS Workshop (Methods and Tools for Project / Architecture / Risk Management in Globally Distributed Software Development Projects)
August 2, 2016
Irvine, California, US
Companies are open sourcing their code after using and contributing to open source software projects
Context…
But why are companies open sourcing their code?
Leveraging the Crowd
“The World Wide Web became a tool for bringing together the small contributions of millions of people and making them matter”
Collaboration on a scale never seen before
Where do they find the time?
Activity                                        Time spent
Wikipedia project                               100,000,000 hours of human thought
Television watching in the U.S. (every year)    200,000,000,000 hours, or 2,000 Wikipedia projects per year
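The comparison in the last row is plain division. As a quick check, a minimal Python sketch using the slide's two estimates (the constant and function names are illustrative, not from the source):

```python
# Estimates taken from the slide, in hours of human thought.
WIKIPEDIA_PROJECT_HOURS = 100_000_000       # effort behind one Wikipedia
US_TV_HOURS_PER_YEAR = 200_000_000_000      # U.S. television watching, per year


def wikipedias_per_year(tv_hours: int, wiki_hours: int) -> float:
    """How many Wikipedia-sized projects one year of TV time corresponds to."""
    return tv_hours / wiki_hours


print(f"{wikipedias_per_year(US_TV_HOURS_PER_YEAR, WIKIPEDIA_PROJECT_HOURS):,.0f}")  # 2,000
```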
"given enough eyeballs, all bugs are shallow"
Distributed and collaborative software development
• OpenStack:
  • 1.7M lines of code
  • 19 programming languages
  • 17K community members
  • 4.5K code contributors (2.1K in the last 12 months)
  • 38K e-mail messages
  • 51K followers on Twitter
  • took an estimated 507 years of effort (COCOMO model); first commit in 2006
• Mozilla Firefox:
  • 13.5M lines of code
  • 37 programming languages
  • 4K contributors (1K in the last 12 months)
  • 4,231 years of effort (COCOMO model); first commit in 2002
• Swift:
  • 445K lines of code
  • 428 contributors (403 in the last 12 months)
Sources: https://www.openhub.net/p/openstack, https://opensource.com/business/14/6/openstack-numbers, https://www.openhub.net/p/apple_swift
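The effort figures above come from the COCOMO model. A minimal sketch of basic COCOMO, assuming the organic-mode coefficients (a = 2.4, b = 1.05) and assuming Open Hub applied this variant to the raw line counts; `basic_cocomo_years` is a hypothetical helper, not an Open Hub API:

```python
def basic_cocomo_years(lines_of_code: int, a: float = 2.4, b: float = 1.05) -> float:
    """Basic COCOMO, organic mode: effort = a * KLOC**b person-months, converted to years."""
    kloc = lines_of_code / 1000
    person_months = a * kloc ** b
    return person_months / 12


# Line counts from the slide; the coefficients are the textbook organic-mode values.
for project, loc in [("OpenStack", 1_700_000), ("Mozilla Firefox", 13_500_000), ("Swift", 445_000)]:
    print(f"{project}: ~{basic_cocomo_years(loc):,.0f} years of effort")
```

Under these assumptions the sketch yields roughly 490 years for OpenStack and roughly 4,300 for Firefox, close to the reported 507 and 4,231; the gap presumably reflects the exact line counts behind the published numbers.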
Meet the community
https://github.com/about/press
> 14 million users
http://www.alexa.com/siteinfo/github.com
56th most accessed site
DIAS, L.F., Igor STEINMACHER, Gustavo PINTO, Costa, D.A., and Marco GEROSA, How does the shift to GitHub impact project collaboration? - ICSME 2016 Era Track
Casual contributors
Pinto, Steinmacher & Gerosa (2016) “More Common Than You Think: An In-Depth Study of Casual Contributors”, 23rd IEEE International Conference on Software Analysis, Evolution, and Reengineering (SANER 2016)
• We analyzed the most popular projects in each language (300 in total)
• 49% of contributors contributed only once (casual contributors)
• They are responsible for 2% of the total number of commits
• Casual contributions:
  • bug fixes (30%)
  • fixing typos and grammar issues (29%)
  • adding new features (19%)
  • code refactoring (9%)
• Both casual contributors and project maintainers believe that casual contributions have more benefits than drawbacks (survey)
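The first two numbers boil down to counting commits per author. A minimal sketch, assuming a casual contributor is anyone with exactly one commit and that each commit maps to a single author identity; the function name and toy history are hypothetical, and real studies also need to merge aliases of the same person:

```python
from collections import Counter


def casual_contributor_shares(commit_authors: list[str]) -> tuple[float, float]:
    """Return (share of contributors with exactly one commit, share of commits they authored).

    `commit_authors` holds one author identity per commit, e.g. the output of
    `git log --format=%ae` split into lines (identity merging is ignored here).
    """
    commits_per_author = Counter(commit_authors)
    casual = [author for author, n in commits_per_author.items() if n == 1]
    share_of_contributors = len(casual) / len(commits_per_author)
    # Each casual contributor authored exactly one commit.
    share_of_commits = len(casual) / len(commit_authors)
    return share_of_contributors, share_of_commits


# Toy history: two regulars and three one-time contributors.
authors = ["ana"] * 40 + ["bob"] * 8 + ["carla", "dan", "eve"]
people, commits = casual_contributor_shares(authors)
print(f"{people:.0%} of contributors, {commits:.1%} of commits")
```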
Heterogeneity
• Meet multiple and specific demands
• Cloud computing
• Micro-services
SPEED
Ok, Let’s do it!
Let’s attract new developers!
Let’s open our code!
However…
“I opened my browser and typed the website address: http://www.libreoffice.org/. I will need to contribute to LibreOffice but I don’t have any clue on how to do it”
“... I am a little lost, so I will try a bug that I think I can work with...”
“I don’t know what I was supposed to do after finishing the compilation process. I will watch the video tutorial once again to find it out. I need to define my next steps, I don’t know what these steps are.”
“The information I found in the project website are long and confusing. I felt really lost and concerned.”
Igor STEINMACHER, Tayana CONTE, Marco GEROSA, David REDMILES (2015) “Social barriers faced by newcomers placing their first contribution in open source software projects”, 18th ACM Conference on Computer Supported Cooperative Work (CSCW 2015)
Why do newcomers drop out of OSS projects?
We analyzed:
• 60 months of the Hadoop project
• Mailing lists (50K messages), issue tracker (8K issues, 76K comments), and VCS
• A survey
Absence of response, politeness, usefulness, and the type of author influence the retention of newcomers in an open source project
Steinmacher, Wiese, Chaves & Gerosa, “Why do newcomers abandon open source software projects?”, 6th Int. Workshop on Cooperative and Human Aspects of Software Engineering (CHASE 2013)
82% of dropouts!!!
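One factor above, absence of response, can be turned into a simple comparison. A minimal sketch, assuming a newcomer counts as retained if they post again after their first message; the `Newcomer` record and this definition are hypothetical, not the operationalization used in the CHASE 2013 study, and the sample data are made up:

```python
from dataclasses import dataclass


@dataclass
class Newcomer:
    # Hypothetical per-newcomer record; not the schema used in the study.
    got_reply: bool   # did anyone answer the newcomer's first message?
    came_back: bool   # did the newcomer post again later?


def retention_rate(group: list[Newcomer]) -> float:
    """Fraction of newcomers in the group who came back (0.0 for an empty group)."""
    return sum(n.came_back for n in group) / len(group) if group else 0.0


def retention_by_reply(newcomers: list[Newcomer]) -> tuple[float, float]:
    """Retention of answered newcomers vs. newcomers whose first message got no reply."""
    answered = [n for n in newcomers if n.got_reply]
    ignored = [n for n in newcomers if not n.got_reply]
    return retention_rate(answered), retention_rate(ignored)


# Made-up toy data for illustration only.
sample = [Newcomer(True, True), Newcomer(True, True), Newcomer(True, True), Newcomer(True, False),
          Newcomer(False, True), Newcomer(False, False), Newcomer(False, False), Newcomer(False, False)]
print(retention_by_reply(sample))  # (0.75, 0.25)
```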
How to better support newcomers?
Our research goal
To understand the entrance of newcomers into open source software projects by means of empirical studies, and to mitigate the barriers they face by means of processes and tools, leveraging sociotechnical information from software repositories
[Figure: Our general approach. Research cycle of Understand, Engineer, and Evaluate, producing models & theories and executable code]
1. Empirical studies using a mixed-methods approach to understand the phenomenon
2. Engineering of innovative tool support for different stakeholders based on the understanding obtained
3. Evaluation using rigorous and systematic scientific studies
Systematic literature review
STEINMACHER, I.; SILVA, M.A.; GEROSA, M.A.; REDMILES, D.F. “A systematic literature review on the barriers faced by newcomers to open source software projects.” Information and Software Technology, v. 59, p. 67-85, 2015
(“OSS” OR “Open Source” OR “Free Software” OR FLOSS OR FOSS) AND (newcomer OR “joining process” OR newbie OR “new developer” OR “new member” OR “new contributor” OR novice OR beginner OR “potential participant” OR retention OR joiner OR onboarding OR “new committer”)
291 papers initially found
20 papers selected
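The search string is two OR-groups joined by AND. As an illustration only, a Python sketch that applies it to a title or abstract; digital libraries have their own matching rules (stemming, field scoping), so this local filter is an assumption, not how the review's query was executed:

```python
import re

# The two OR-groups of the search string above, joined by AND.
POPULATION = ["OSS", "Open Source", "Free Software", "FLOSS", "FOSS"]
NEWCOMER_TERMS = ["newcomer", "joining process", "newbie", "new developer", "new member",
                  "new contributor", "novice", "beginner", "potential participant",
                  "retention", "joiner", "onboarding", "new committer"]


def _mentions_any(terms: list[str], text: str) -> bool:
    # Case-insensitive whole-phrase match; "s?" also accepts simple plurals.
    return any(re.search(rf"\b{re.escape(t)}s?\b", text, re.IGNORECASE) for t in terms)


def matches_search_string(text: str) -> bool:
    """True if the text satisfies (any population term) AND (any newcomer term)."""
    return _mentions_any(POPULATION, text) and _mentions_any(NEWCOMER_TERMS, text)
```

For example, `matches_search_string("Onboarding newcomers to open source projects")` is true, while a title that mentions only open source licensing is not.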
RQ: What are the barriers that hinder the contribution of newcomers in OSS projects?
Understand
Empirical studies
• Interviews: 36 subjects, 14 projects
• Survey: 24 answers, 9 projects
• Ethnographic study: 2 courses
Igor STEINMACHER, Tayana CONTE, Marco GEROSA, David REDMILES (2015) “Social barriers faced by newcomers placing their first contribution in open source software projects”, 18th ACM Conference on Computer Supported Cooperative Work (CSCW 2015)
In collaboration with: Prof. David Redmiles
Understand
Put everything together
Understand
The Barriers Model
Understand
FLOSSCoach: a portal for newcomers
http://www.flosscoach.com/
Engineer
“Awareness denotes the practices through which actors tacitly and seamlessly align and integrate their distributed and yet interdependent activities.” Kjeld Schmidt (2002)
(Big)Data! Mining software repositories
• Mining yields information about a project, about an ecosystem, and about Software Engineering
• Applications (for practitioners and researchers): decision making, software understanding, support maintenance, empirical validation of ideas & techniques, collaboration and software production
[Figure: tag cloud from the MSR 2014 CFP]
The Mining Software Repositories (MSR) field analyzes the rich data available in software repositories to uncover interesting and actionable information about software systems and projects.
Software repositories
GitHub (https://github.com/about/press, http://octoverse.github.com/):
• 31 million repositories
• 12 million users
• In a single year:
  • 3 million new users
  • 152 million pushes
  • 25 million comments
  • 14 million issues
  • 7 million pull requests
• CodePlex: 36K projects (http://en.wikipedia.org/wiki/CodePlex)
• Launchpad: 30K projects (https://launchpad.net)
• SourceForge: 324K projects, 3.4 million developers (http://sourceforge.net/apps/trac/sourceforge/wiki/What%20is%20SourceForge.net)
• 250K projects (http://en.wikipedia.org/wiki/Comparison_of_open_source_software_hosting_facilities)
• 93K projects, 1 million users
• Apache: 200 projects (http://projects.apache.org/indexes/alpha.html)
• 661K projects, 29 billion lines of code, 3 million users
• 33K projects
Source: http://www.ohloh.net/
FLOSSCoach evaluation
• Deployed the portal for 6 different projects
• Developers reported their progress in user diaries
• Surveyed developers using the Technology Acceptance Model and self-efficacy instruments
Evaluate
Evaluation
• The portal improved newcomers’ experiences of the contribution process
Igor STEINMACHER, Tayana CONTE, Christoph TREUDE, Marco GEROSA, “Overcoming Open Source Project Entry Barriers with a Portal for Newcomers”, International Conference on Software Engineering (ICSE 2016), Austin, Texas.
“The tool seems to be good, because it solves doubts that range from the skills needed to start to pointing how to submit a contribution.” “I could check what newcomers need to know regarding the development environment, accessing the links to documentation and relevant guidelines, understanding how to search for help and who to talk to in case of problems.”
“…the tool helped me a lot, because it gave me an outstanding guidance about what I needed to do and, consequently, made me spend less time and made me more confident”
“The flow was great. I always used it, and from here I accessed the other information. It is easy”
Evaluate
Next steps
How does the shift to GitHub impact projects’ collaboration?
DIAS, L.F., Igor STEINMACHER, Gustavo PINTO, Costa, D.A., and Marco GEROSA, How does the shift to GitHub impact project collaboration? - ICSME 2016 Era Track
We also investigated the number of pull requests and issues
Understand
What are the benefits and challenges of open-sourcing a proprietary software project?
In collaboration with: Prof. Gustavo Pinto (IFPA)
Understand
What are the benefits and barriers of contributing to OSS in a Software Engineering course?
Understanding newcomers’ motivations and engagement programs
Code engagement programs… may motivate students to engage in open source
GSoC 2014
Analysis per project/tool: before, during, after -> only 2 got back!
Country                                                          Participants
Sri Lanka                                                        13
China                                                            5
India                                                            3
USA                                                              2
Spain                                                            2
Ireland                                                          2
United Kingdom, France, South Korea, Portugal, Hungary, Estonia  1 each
33 students
Understand
In collaboration with: Prof. Daniel German (UVic)
What about mentors?
What are the benefits?
Is it worth it?
What is the process?
What are the motivations?
What are the challenges?
Understand
In collaboration with: Anita Sarma (Oregon State University)
Documentation is everywhere
Next steps – more techniques
• Natural Language Processing: deals with analyzing, understanding, and generating the languages that humans use naturally
• Information Retrieval: obtains information resources relevant to an information need from a collection of resources
• Mining Software Repositories: uncovers interesting and actionable information about software systems and projects
[Figure: summarize, visualize, search, sort, filter]
Engineer
In collaboration with: Prof. Christoph Treude
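Information retrieval fits the "documentation is everywhere" problem: given a newcomer's question, rank the project documents that best match it. A minimal sketch of classic TF-IDF ranking with cosine similarity; the function names and the three sample documents are hypothetical, and this is not how FLOSSCoach or any specific tool implements search:

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def rank_documents(query: str, docs: list[str]) -> list[int]:
    """Return document indices ordered by TF-IDF cosine similarity to the query."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed IDF (+1) so terms present in every document keep a small weight.
    idf = {term: math.log(n / count) + 1.0 for term, count in df.items()}

    def vectorize(tokens: list[str]) -> dict[str, float]:
        # Query terms unseen in the corpus get weight 0.
        return {t: c * idf.get(t, 0.0) for t, c in Counter(tokens).items()}

    def cosine(u: dict[str, float], v: dict[str, float]) -> float:
        dot = sum(w * v.get(t, 0.0) for t, w in u.items())
        norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(sum(w * w for w in v.values()))
        return dot / norm if norm else 0.0

    q = vectorize(tokenize(query))
    scores = [cosine(q, vectorize(tokens)) for tokens in tokenized]
    return sorted(range(n), key=lambda i: scores[i], reverse=True)


DOCS = [
    "How to build the project from source and run the tests",
    "Code of conduct for the community mailing list",
    "Submitting a patch: fork the repository and open a pull request",
]
print(rank_documents("how do I submit a pull request", DOCS))  # [2, 0, 1]
```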
Improving engagement
Engineer
In collaboration with: Sabrina Marczak (PUCRS)
Thomas Zimmermann, Peter Weissgerber, Stephan Diehl, and Andreas Zeller. 2005. Mining Version Histories to Guide Software Changes. IEEE Trans. Software Eng. 31, 6 (June 2005), 429-445.
Solving specific barriers: recommending co-changes
Engineer
Gustavo OLIVA, Marco GEROSA, Change coupling between software artifacts: learning from past changes - in: The Art and Science of Analyzing Software Data (2015)
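Zimmermann et al. mined version histories for files that tend to change together. A simplified sketch of that association-rule idea, with support as the number of commits touching both files and confidence as support divided by the commits touching the antecedent; the thresholds, function names, and toy history are assumptions, and this is not the original tool:

```python
from collections import Counter
from itertools import combinations


def mine_co_change_rules(transactions, min_support=2, min_confidence=0.5):
    """Mine rules 'if X changes, Y usually changes too' from commit file sets.

    support(X, Y)     = number of commits touching both X and Y
    confidence(X->Y)  = support(X, Y) / number of commits touching X
    """
    file_count = Counter()
    pair_count = Counter()
    for files in transactions:
        unique = set(files)
        file_count.update(unique)
        for a, b in combinations(sorted(unique), 2):
            pair_count[(a, b)] += 1

    rules = {}
    for (a, b), support in pair_count.items():
        if support < min_support:
            continue
        for x, y in ((a, b), (b, a)):  # rules are directional
            confidence = support / file_count[x]
            if confidence >= min_confidence:
                rules.setdefault(x, []).append((y, support, confidence))
    for recs in rules.values():
        # Strongest first; ties broken by support, then file name.
        recs.sort(key=lambda r: (-r[2], -r[1], r[0]))
    return rules


def recommend_co_changes(rules, changed_file):
    """Files that historically changed together with `changed_file`."""
    return [target for target, _, _ in rules.get(changed_file, [])]


history = [  # toy commit history: one set of touched files per commit
    {"Parser.java", "ParserTest.java"},
    {"Parser.java", "ParserTest.java", "Lexer.java"},
    {"Parser.java", "docs/grammar.md"},
    {"Lexer.java", "LexerTest.java"},
    {"Lexer.java", "LexerTest.java", "Parser.java"},
    {"Parser.java", "ParserTest.java"},
]
rules = mine_co_change_rules(history)
print(recommend_co_changes(rules, "Parser.java"))  # ['ParserTest.java']
```

A newcomer editing `Parser.java` would thus be reminded to update `ParserTest.java` too, a dependency a first-time contributor could easily miss.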
Social information to predict co-changes
• Using context information about software changes collected from communication (comments), coordination (issues), and cooperation (artifacts)
• A random forest classifier for each specific co-change
Engineer
Igor WIESE, Reginaldo RÉ, Igor STEINMACHER, Rodrigo KURODA, Gustavo OLIVA, Christoph TREUDE, Marco GEROSA, “Using contextual information to predict co-change”, Journal of Systems and Software (JSS), Elsevier (2016)
Detecting code issues
Different architectural roles have different metric distributions
Engineer
Mauricio ANICHE, Gabriele BAVOTA, Christoph TREUDE, Arie van DEURSEN, Marco GEROSA, A validated set of smells in Model-View-Controller architectures - ICSME 2016
Specific smells to specific architectural roles
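Why role-specific thresholds matter can be shown with a hypothetical example: a value that is normal for one architectural role can be an outlier for another. The sketch below derives a per-role cut-off from the 90th percentile of made-up metric values; the roles, numbers, and percentile are assumptions for illustration, not the detection strategy of the cited paper:

```python
from statistics import quantiles


def role_thresholds(metrics_by_role: dict[str, list[float]], percentile: int = 90) -> dict[str, float]:
    """Cut-off per architectural role: the given percentile of that role's metric values."""
    # quantiles(..., n=100) returns the 99 percentile cut points.
    return {role: quantiles(values, n=100)[percentile - 1]
            for role, values in metrics_by_role.items()}


def flag_outliers(classes: list[tuple[str, str, float]], thresholds: dict[str, float]) -> list[str]:
    """Names of classes whose metric exceeds the threshold of their own role."""
    return [name for name, role, value in classes if value > thresholds[role]]


# Made-up metric values (say, number of dependencies) observed per role.
observed = {
    "controller": [5, 6, 7, 8, 9, 10, 11, 12, 13, 14],
    "repository": [20, 22, 25, 28, 30, 32, 35, 38, 40, 45],
}
limits = role_thresholds(observed)
candidates = [("OrderController", "controller", 18),
              ("OrderRepository", "repository", 18),
              ("UserRepository", "repository", 50)]
print(flag_outliers(candidates, limits))  # ['OrderController', 'UserRepository']
```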
Training the next generation of software engineers
Yorah Bosse, PhD candidate
[Chart: result of these enrollments: 70.69% pass, 29.31% fail and abort]
Programming language, by year:
Year   C     Java   Python   VBA
2010   78%   10%    3%       13%
2011   78%   8%     0%       13%
2012   75%   9%     3%       13%
2013   28%   6%     53%      13%
2014   48%   0%     40%      13%
29.3% of these enrollments resulted in fail and abort. C and Python were the most used programming languages.
[Chart: Fail/Abort vs. Pass]
One of these courses had 62.2% fail and abort in this period.
[Chart: distribution over categories 1, 2, 3, 4, 5, >5: 73.31%, 19.60%, 4.49%, 1.81%, 0.43%, 0.36%]
Engineer
Leonidas BRANDÃO, Yorah BOSSE, Marco GEROSA, “Visual programming and automatic evaluation of exercises: an experience with a STEM course”, Frontiers in Education conference (FIE), Erie, PA, October, 2016
Next steps – more evaluation
• Free and Open Source Competence Center
• OSS in Education
• Federal Government Public Software Portal
• In the wild…
Evaluate
Vision: Bug Exchange
Bug Exchange: helping newcomers familiarize themselves with the sociotechnical aspects of an ecosystem of projects, generating a workforce of contributors who can transfer knowledge across projects
Anita SARMA, Marco GEROSA, Igor STEINMACHER, R. LEANO, “Training the future workforce through task curation in an OSS ecosystem”, Foundations of Software Engineering (FSE 2016), Visions and Reflections Track (FSE-VaR)
• Identifying required skills
• Determining task complexity
• Identifying information needs and providing documentation
• Recommending tasks
• Supporting peer mentor networks
• Transferring knowledge across projects
• Crowdsourcing tasks
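Several items above (identifying required skills, determining task complexity, recommending tasks) combine into a simple filter. A minimal sketch, assuming each task has already been curated with its required skills and a complexity score; the `Task` record, the 1 to 5 scale, and the cut-off are hypothetical:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    # Hypothetical curated task record; the fields are illustrative.
    title: str
    required_skills: frozenset[str]
    complexity: int  # assumed scale: 1 (easy) to 5 (hard)


def recommend_tasks(newcomer_skills: set[str], tasks: list[Task], max_complexity: int = 2) -> list[Task]:
    """Tasks whose required skills the newcomer already has, easiest first."""
    suitable = [t for t in tasks
                if t.required_skills <= newcomer_skills and t.complexity <= max_complexity]
    return sorted(suitable, key=lambda t: (t.complexity, t.title))


backlog = [
    Task("Fix typo in README", frozenset({"markdown"}), 1),
    Task("Add unit test for parser", frozenset({"python", "pytest"}), 2),
    Task("Refactor build scripts", frozenset({"python", "cmake"}), 3),
    Task("Port widget to Rust", frozenset({"rust"}), 2),
]
for task in recommend_tasks({"python", "pytest", "markdown"}, backlog):
    print(task.title)
```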
In collaboration with: Anita Sarma
Recap
For existing projects:
Lower the barriers to boost contributions, especially from newcomers
Steinmacher, I., Gerosa, M.A., “Fostering Free/Libre Open Source Software community formation: guidelines for communities to support newcomers’ onboarding,” in: XVI International Free Software Workshop (WSL 2015)
For companies opening their code:
YOU NEVER GET A SECOND CHANCE TO MAKE A FIRST IMPRESSION
For researchers:
• 64 barriers faced by newcomers
• We still need methods and tools for Project/Architecture/Risk Management
Thank you!
Marco Aurelio Gerosa
@gerosa_marco
http://www.ime.usp.br/~gerosa