Licensing is Software Too: Achievements and Challenges (and how this relates to code provenance)
Massimiliano Di Penta
University of Sannio, Italy
http://www.rcost.unisannio.it/mdipenta
2
Acknowledgements
Daniel M. Germán, Univ. Victoria, Canada
Julius Davies, Univ. Victoria, Canada
Giuliano Antoniol, Ecole Polyt. Montréal, Canada
Yann-Gaël Guéhéneuc, Ecole Polyt. Montréal, Canada
3
Reusing Open Source Software
When developing a software system, we try (if possible) not to reinvent the wheel
  Components, libraries, and source code snippets are out there, ready to be reused
  Code search engines are becoming popular
Open source code modification and redistribution are governed by
  Software licenses
  Copyright statements
  Everything contained in a licensing block…
4
What does a licensing block contain?
/* -*- Mode: C++; tab-width: 2; indent-tabs-mode: nil; c-basic-offset: 2 -*- */
/* ***** BEGIN LICENSE BLOCK *****
* Version: MPL 1.1/GPL 2.0/LGPL 2.1
*
* The contents of this file are subject to the Mozilla Public License Version
* 1.1 (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
* http://www.mozilla.org/MPL/
….
* Portions created by the Initial Developer are Copyright (C) 2002
* the Initial Developer. All Rights Reserved.
*
* Contributor(s):
* Brian Ryner <[email protected]>
….
* decision by deleting the provisions above and replace them with the notice
* and other provisions required by the GPL or the LGPL. If you do not delete
* the provisions above, a recipient may use your version of this file under
* the terms of any one of the MPL, the GPL or the LGPL.
*
* ***** END LICENSE BLOCK ***** */
#include "nsXULAppAPI.h"
#ifdef XP_WIN
#include <windows.h>
License (MPL+GPL+LGPL)
Copyright statement
Copyright year
Contributor
5
Restrictive vs. permissive licenses
Restrictive (aka copyleft or reciprocal)
  Changed software must be made available under terms similar to the original
  Example: GPL
Permissive
  Modifications/enhancements may remain proprietary
  Distribution of source code or binary permitted
    – Provided copyright notice and/or liability disclaimers are included
    – Contributor names do not imply endorsement
  Examples: Berkeley Software Distribution (BSD), Apache Software License, MIT
6
FOSS development teams care! (source: Debian)
I am in the process of trying to prepare 0.8.0 for Debian GNU/Linux. I have started going over the copyright/license headers. In src/celeste many files are missing copyright information. Most of these are files imported with minimal changes from Gabor API http://www.kung-foo.tv/gaborapi.php or libsvm http://www.csie.ntu.edu.tw/~cjlin/libsvm/.
The attached patch adds copyright and license statements to these files.[1]
Please apply and update the headers (adding copyright holders) if you make substantial changes.
thanks, cu andreas
[1] I have doublechecked with Gabor API's upstream author Adriaan Tijsseling that files like ContrastFilter.cpp are Copyright (c) Adriaan Tijsseling and licensed under GPLv2+, although the original headers just say:
Original Author: Yasunobu Honma
Modifications by: Adriaan Tijsseling (AGT)
7
Conjectures
Since licenses determine the way software can be composed and redistributed
  They may change/evolve like any other part of the software
  They might be subject to bugs too
    – See our ICPC 2010 paper about how to identify licensing incompatibilities
  They might determine the success/failure of a software project
Code provenance and licenses:
  Licenses constrain source code migration between projects
  Code provenance might be useful to determine the licensing of closed components
8
Licenses influence the software lifetime
  OpenBSD founder and project leader Theo de Raadt removed a security software package called IP-Filter [written by Darren Reed] after its author changed its license.
  Stephen Shankland, CNET News, 2001/05/30.
Licenses evolve as software does
  Failing to account for that could cause copyright infringements
  Decisions on license changes have an impact, like other decisions on software evolution
  Little attention so far from the scientific community
Need for methods and tools to audit licensing and its changes
9
Example: Java
Until November 2006, the license of Java JDK v1.2 said:
  “Except as specifically authorized in any Supplemental License Terms, you may not make copies of Software, other than a single copy of Software for archival purposes”
  This disallowed the inclusion of Java in Linux distributions
Java 5.0 released under the GPL v2 with the CLASSPATH exception:
  Java could be modified/updated under the GPL v2
  Java programs could be released under any license as long as they satisfy the conditions stated in the CLASSPATH exception
Changing the license of a system can promote and ease its distribution and reuse
11
Example: QT
First released under a non-open source but free license, called the FreeQT License, and a commercial license
  QT became the basis for KDE
QT v2.0 was released under a new license, the Q Public License, incompatible with the GPL
  GNOME project started as a QT-free alternative to KDE
  Harmony project started as a GPL replacement of QT
Trolltech changed the license of QT v3 to the GPL v2
  The Harmony project was abandoned
Changing the license of a FOSS system towards a more permissive one might cause the abandonment of a competing system
13
Empirical Study
Goal: analyze licensing evolution
Purpose: investigating how developers change licensing statements
Context: CVS/SVN repositories of ArgoUML, Eclipse-JDT, the FreeBSD and the OpenBSD kernels, Mozilla, Samba
14
Research Questions
RQ1: To what extent are files changing their licenses?
RQ2: How are copyright years changed in licensing statements?
RQ3: Who are the contributors of a software project and how do they change?
15
Licensing Analysis Method – Extracting Licensing statements
/* -*- Mode: C++; tab-width: 2; indent-tabs-mode: nil; c-basic-offset: 2 -*- */
/* ***** BEGIN LICENSE BLOCK *****
* Version: MPL 1.1/GPL 2.0/LGPL 2.1
*
* The contents of this file are subject to the Mozilla Public License Version
* 1.1 (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
* http://www.mozilla.org/MPL/
….
* Portions created by the Initial Developer are Copyright (C) 2002
* the Initial Developer. All Rights Reserved.
*
* Contributor(s):
* Brian Ryner <[email protected]>
….
* decision by deleting the provisions above and replace them with the notice
* and other provisions required by the GPL or the LGPL. If you do not delete
* the provisions above, a recipient may use your version of this file under
* the terms of any one of the MPL, the GPL or the LGPL.
*
* ***** END LICENSE BLOCK ***** */
#include "nsXULAppAPI.h"
#ifdef XP_WIN
#include <windows.h>
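A minimal sketch of the extraction step, assuming the licensing statement is whatever sits in the block comments at the very top of a C, C++ or Java file. The function name and the stripping rules are illustrative assumptions, not the tooling used in the study.

```python
import re

# Matches one C-style block comment (/* ... */), across lines, non-greedy.
BLOCK_COMMENT = re.compile(r"/\*.*?\*/", re.DOTALL)


def extract_licensing_statement(source):
    """Return the text of the block comments that open a source file."""
    pos = 0
    parts = []
    while True:
        # Skip whitespace between consecutive header comments.
        while pos < len(source) and source[pos].isspace():
            pos += 1
        match = BLOCK_COMMENT.match(source, pos)
        if match is None:
            break  # first non-comment token: the header has ended
        parts.append(match.group(0))
        pos = match.end()

    lines = []
    for line in "\n".join(parts).splitlines():
        line = line.strip()
        # Drop the comment delimiters, then the leading '*' decoration.
        line = re.sub(r"^/\*+|\*+/$", "", line).strip()
        line = re.sub(r"^\*+\s?", "", line)
        if line:
            lines.append(line)
    return "\n".join(lines)


if __name__ == "__main__":
    sample = (
        "/* ***** BEGIN LICENSE BLOCK *****\n"
        " * Version: MPL 1.1/GPL 2.0/LGPL 2.1\n"
        " * ***** END LICENSE BLOCK ***** */\n"
        '#include "nsXULAppAPI.h"\n'
    )
    print(extract_licensing_statement(sample))
```

Files that use line comments (// or #) would need an additional rule; this sketch only handles block comments.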
16
Licensing Analysis Method – Classifying licenses
FoSSology [Gobeille, MSR 2008]: detects licenses using the Binary Symbolic Alignment Matrix (bSAM)
Ninka [German et al., ASE 2010]: uses a pattern-matching approach
Example: the Mozilla license block shown earlier is classified as MPL 1.1/GPL 2.0/LGPL 2.1
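The general idea of pattern-based classification can be sketched as follows. This is neither FoSSology's bSAM nor Ninka's actual rule set: the four patterns below are hypothetical and deliberately tiny, chosen only to show how a licensing statement is normalised and matched against known license phrases.

```python
import re

# Hypothetical rule set: (license name, phrase that identifies it).
LICENSE_PATTERNS = [
    ("MPL 1.1", re.compile(r"Mozilla Public License Version 1\.1", re.I)),
    ("GPL 2", re.compile(r"GNU General Public License.{0,80}?version 2", re.I)),
    ("Apache 2.0", re.compile(r"Apache License, Version 2\.0", re.I)),
    ("BSD-style", re.compile(r"Redistribution and use in source and binary forms", re.I)),
]


def classify(statement):
    """Return the names of all licenses whose phrase occurs in the statement."""
    # Remove comment decoration, then collapse runs of whitespace so that
    # phrases broken across comment lines still match.
    text = re.sub(r"[*/]+", " ", statement)
    text = " ".join(text.split())
    found = [name for name, pattern in LICENSE_PATTERNS if pattern.search(text)]
    return found or ["Unknown"]


if __name__ == "__main__":
    header = (
        " * The contents of this file are subject to the Mozilla Public License Version\n"
        ' * 1.1 (the "License"); you may not use this file except in compliance with\n'
    )
    print(classify(header))
```

Returning a list rather than a single name leaves room for files that carry more than one license, as the Mozilla example does.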
17
Licensing Analysis Method – Identifying changes in copyright years
Mining references to years in licensing…
(Illustrated on two copies of the Mozilla license block shown above.)
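One way to mine copyright years, sketched under the assumption that years appear on lines containing the word "copyright", either alone or as ranges such as 2002-2004. The regular expressions and function names are illustrative, not the ones used in the study.

```python
import re

COPYRIGHT_LINE = re.compile(r"copyright", re.IGNORECASE)
YEAR = re.compile(r"\b(19[6-9]\d|20\d\d)\b")
YEAR_RANGE = re.compile(r"\b(19[6-9]\d|20\d\d)\s*-\s*(19[6-9]\d|20\d\d)\b")


def copyright_years(statement):
    """Collect every year mentioned on a copyright line, expanding ranges."""
    years = set()
    for line in statement.splitlines():
        if not COPYRIGHT_LINE.search(line):
            continue
        for start, end in YEAR_RANGE.findall(line):
            years.update(range(int(start), int(end) + 1))
        years.update(int(year) for year in YEAR.findall(line))
    return years


def year_changes(old_statement, new_statement):
    """Return (years added, years removed) between two revisions."""
    before = copyright_years(old_statement)
    after = copyright_years(new_statement)
    return sorted(after - before), sorted(before - after)


if __name__ == "__main__":
    old = " * Portions created by the Initial Developer are Copyright (C) 2002"
    new = " * Portions created by the Initial Developer are Copyright (C) 2002-2004"
    print(year_changes(old, new))
```

Comparing the sets of years between consecutive revisions of a file is what flags a commit as a copyright-year update.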
18
Licensing Analysis Method – Identifying contributor names
Mining emails, plus various patterns
  Copyright … year name
  Contributor(s) …
And mapped to committers, whenever possible
(Illustrated on two copies of the Mozilla license block shown above.)
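The patterns named above (email addresses, "Copyright … year name", "Contributor(s) …") might be mined roughly as follows. The regular expressions, the helper names and the sample names are assumptions made for illustration; mapping the results to committers is left out.

```python
import re

EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
# "Copyright (C) 2002 Some Name" or "Copyright 2002-2004 by Some Name"
COPYRIGHT_NAME = re.compile(
    r"copyright\s+(?:\(c\)\s*)?\d{4}(?:\s*[-,]\s*\d{4})*,?\s+(?:by\s+)?(.+)",
    re.IGNORECASE,
)
CONTRIBUTOR_HEADER = re.compile(r"contributors?(?:\(s\))?\s*:\s*(.*)", re.IGNORECASE)


def clean(name):
    """Remove an <email> suffix and surrounding punctuation from a name."""
    return re.sub(r"<[^>]*>", "", name).strip(" .,")


def contributors(statement):
    """Return (names, emails) found in a licensing statement."""
    names, emails = set(), set()
    in_contributor_list = False
    for raw in statement.splitlines():
        line = raw.strip().lstrip("*").strip()
        emails.update(EMAIL.findall(line))
        copyright_match = COPYRIGHT_NAME.search(line)
        header_match = CONTRIBUTOR_HEADER.match(line)
        if copyright_match:
            names.add(clean(copyright_match.group(1)))
            in_contributor_list = False
        elif header_match:
            in_contributor_list = True
            if header_match.group(1):
                names.add(clean(header_match.group(1)))
        elif in_contributor_list:
            if line:
                names.add(clean(line))
            else:
                in_contributor_list = False  # a blank line ends the list
    return names, emails


if __name__ == "__main__":
    header = (
        " * Copyright (C) 2002 Jane Example\n"
        " *\n"
        " * Contributor(s):\n"
        " *   John Sample <john@example.org>\n"
        " *\n"
        " * Alternatively, the contents of this file may be used\n"
    )
    print(contributors(header))
```

A name that wraps onto the line after the year, as in the Mozilla block, is not captured by this sketch.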
19
RQ1: Most relevant license changes
Eclipse-JDT
  Common Public License v1.0 -> Eclipse Public License v1.0    CHANGE  2394
  Common Public License v0.5 -> Common Public License v1.0     UPDATE   808
Mozilla
  NPL -> 'NPL v1.1'-style+GPL v2+LGPL v2.1                     DUAL    2914
  NPL -> 'Dual MPL GPL'-style+MPL                              DUAL    1274
  'Dual MPL GPL'-style+MPL -> NPL                              BUG     1194
Licensing updated as new licenses were developed
Eclipse JDT: CPL 0.5 -> CPL 1.0 -> EPL 1.0
  IBM has relinquished control of licenses to the Eclipse Foundation
Mozilla: NPL -> MPL + GPL (+ LGPL)
  NPL allowed releasing Netscape 6 as a proprietary system
  MPL only allows redistributing the source code under the MPL
  Multiple licenses to deal with incompatibilities
  Files wrongly changed to NPL (bug #98089)
20
RQ1: Most relevant license changes
FreeBSD
  BSD UCRegents (4-cl BSD) -> 'BSD UCRegents'-style (4-cl BSD)        UPDATE  491
  'BSD UCRegents'-style (4-cl BSD) -> 'INRIA-OSL'-style (3-cl BSD)    UPDATE  300
OpenBSD
  'BSD UCRegents'-style (4-cl BSD) -> 'INRIA-OSL'-style (3-cl BSD)    UPDATE  964
  BSD UCRegents (4-cl BSD) -> 'BSD UCRegents'-style (4-cl BSD)        UPDATE  414
FreeBSD and OpenBSD are more eclectic than other projects
  Moving from the 4-clause BSD license to the more permissive 3-clause and 2-clause BSD licenses
21
RQ1: Most relevant license changes
ArgoUML
  None -> 'Free with copyright clause'-style + 'UC Regents free with copyright clause'-style    ADD  127
Samba
  None -> GPL v2    ADD  15
ArgoUML and Samba kept the same licenses over the analyzed time span
  Change is from None to a simple license
  Authors realized the importance of including a license
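Counts like those in the RQ1 tables could be obtained by tallying license transitions over each file's revision history. The sketch below assumes the license of every revision has already been classified; the file names and the input layout are hypothetical.

```python
from collections import Counter


def license_transitions(histories):
    """Count (old license, new license) pairs over all file histories.

    histories maps a file path to the list of licenses detected in its
    revisions, oldest revision first.
    """
    counts = Counter()
    for licenses in histories.values():
        for old, new in zip(licenses, licenses[1:]):
            if old != new:
                counts[(old, new)] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical histories, for illustration only.
    histories = {
        "a.java": ["CPL 0.5", "CPL 1.0", "CPL 1.0", "EPL 1.0"],
        "b.java": ["CPL 1.0", "EPL 1.0"],
        "c.java": ["None", "None"],
    }
    for (old, new), count in license_transitions(histories).most_common():
        print(f"{old} -> {new}: {count}")
```

Labelling a transition as CHANGE, UPDATE, DUAL, ADD or BUG still needs a human or a further rule; the tally only says which transitions are frequent enough to inspect.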
22
RQ2: How and why were copyright years changed?
Files for which the copyright years were updated underwent a significantly higher number of changes than others
When developers perform substantial changes to a file, they also update copyright years
Required by copyright regulations
  Lack of updates with substantial changes would allow an infringer to claim “innocent infringement”
Commits explicitly targeted to copyright years
  “Updated copyrights”
  “Updated copyrights to 2004”
23
RQ3: When do contributors change?
Changes where contributor names are added are significantly bigger than other changes
  Contributors often added when they make substantial changes
Contributor names are important assets in source code
  Like the signature on a picture
However…
  contributors can change over time
  no standard way of reporting them
  no clear rule on when one should become a contributor
Their presence can have legal implications
Licenses Influence Code Migration
25
Free (software) as a bird…
As birds migrate differently during different seasons…
Code might have a preferential migration direction
Given two systems, e.g. FreeBSD and Linux
We find the same code in both systems
Three scenarios:
  Migration FreeBSD -> Linux
  Migration Linux -> FreeBSD
  Migration third-party -> FreeBSD, Linux
27
Sibling(s) Origin
Identify siblings between systems using clone detection
  CCFinderX, with >100 tokens as threshold, plus other heuristics
Trace back into past siblings – their code fragments in the same files
  Again clone detection, the sibling fragment wrt. previous file revisions
  When they disappear, then we have their origins
Take the oldest of the two as the true origin
[Figure: cloned fragments (siblings) in Sys 1 – File i and Sys 2 – File j, with the migration direction between them]
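The trace-back step can be sketched as follows. To keep the example self-contained, clone detection is replaced by a plain substring check, which is a much weaker stand-in for CCFinderX; the revision layout and function names are assumptions.

```python
from datetime import date


def first_appearance(revisions, fragment):
    """Walk back from the newest revision until the fragment disappears.

    revisions is a list of (date, file text) pairs, oldest first. Returns the
    date of the oldest revision in the unbroken run that contains the fragment,
    or None if the newest revision does not contain it.
    """
    origin = None
    for when, text in reversed(revisions):
        if fragment not in text:  # stand-in for a clone-detection query
            break
        origin = when
    return origin


def migration_direction(name_a, revisions_a, name_b, revisions_b, fragment):
    """Return (source system, destination system), or None if undecidable."""
    origin_a = first_appearance(revisions_a, fragment)
    origin_b = first_appearance(revisions_b, fragment)
    if origin_a is None or origin_b is None or origin_a == origin_b:
        return None
    # The older of the two origins is taken as the true origin.
    return (name_a, name_b) if origin_a < origin_b else (name_b, name_a)


if __name__ == "__main__":
    freebsd = [
        (date(1999, 1, 1), "x"),
        (date(2000, 5, 1), "x FRAGMENT"),
        (date(2003, 1, 1), "x FRAGMENT y"),
    ]
    linux = [
        (date(2001, 1, 1), "z"),
        (date(2002, 6, 1), "z FRAGMENT"),
    ]
    print(migration_direction("FreeBSD", freebsd, "Linux", linux, "FRAGMENT"))
```

Dates alone cannot distinguish a direct migration from the third scenario, where both systems took the code from a third party.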
28
Code Migration and Licenses

FreeBSD      Linux        Files
BSD          GPL          8
BSD          MIT          2
BSD          None         2
Corporate    BSD+GPL      89
GPL          None         1
Phrase       BSD+GPL      1
X.Net+BSD    MIT          1

Linux        FreeBSD      Files
BSD+GPL      Corporate    8
GPL          BSD          17
GPL          BSD+GPL      1
GPL          CPL+BSD+GPL  1
MIT          BSD          1
MIT+GPL      None         2
None         BSD          1
Phrase+GPL   MIT          2

OpenBSD      Linux        Files
BSD          BSD+GPL      1
BSD          MIT          2
BSD          Unknown      1
BSD+GPL      GPL          1
BSD+Phrase   Phrase+GPL   1
MIT          GPL          23

Callouts on the slide:
  After Jan 1, 2002 (nothing before)
  Before Jan 1, 2002 (almost nothing after)
29
Discussion
Siblings have a preferential flow
  Initially from BSD(s) to Linux – frequent
  Today from Linux to FreeBSD – less frequent
  Thus, due to licenses but also to the system level of development
Companies directly contribute to code in different kernels – see Intel drivers with dual licenses
  In this case, code migrates from a third party towards Linux and FreeBSD
Identifying licenses of jar archives
31
Motivations
Very often, Java open source software is distributed in jar archives
  See http://mvnrepository.com/
Problem: the jar might not contain licensing info
  Under what conditions can we integrate the component?
  The jar might not be legally usable
  Even if it comes from open source code, we might not find exactly the same jar
32
Search-driven approach
Extracting info from the class bytecode
  Class and package names… or a fingerprint…
  We use the ASM library (http://asm.ow2.org/)
Querying Google Code Search
  Using the fully qualified class name
  Using the package only
  Query performed using the Google Code API (http://code.google.com/apis/gdata/)
If the same class is not found, its license is inferred from those of classes belonging to the same package
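The fallback from "found" to "inferred" licenses can be sketched as below. The code-search service is replaced by an in-memory dictionary (a hypothetical index from fully qualified class name to license), and choosing the most common license in the package is an assumption; the slides only say the license is obtained from classes of the same package.

```python
from collections import Counter


def package_of(class_name):
    """Return the package part of a fully qualified class name."""
    return class_name.rpartition(".")[0]


def license_for(class_name, index):
    """Return (license, how), where how is 'found', 'inferred' or 'unknown'."""
    if class_name in index:
        return index[class_name], "found"
    package = package_of(class_name)
    # Assumption: take the most common license among same-package classes.
    votes = Counter(
        license_name
        for name, license_name in index.items()
        if package_of(name) == package
    )
    if votes:
        return votes.most_common(1)[0][0], "inferred"
    return None, "unknown"


if __name__ == "__main__":
    # Hypothetical search results, for illustration only.
    index = {
        "org.example.codec.Base64": "Apache 2.0",
        "org.example.codec.Hex": "Apache 2.0",
        "org.example.util.Misc": "BSD",
    }
    print(license_for("org.example.codec.Base64", index))
    print(license_for("org.example.codec.Digest", index))
    print(license_for("com.other.Widget", index))
```

Keeping the "found" and "inferred" cases apart in the return value mirrors the way the results are reported separately for the two.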
33
Google Code Search Output
34
% of correct classifications
Found licenses:
  Min. 29% (commons.codec), avg. 82%, median 89.5%
Inferred licenses:
  Min. 62% (JLayer 1.0), avg. 95%, median 100%
The inferring heuristic is significantly better in terms of both completeness and precision
35
Incorrect classifications
Most of them are between LGPL and GPL and between BSD and Apache.
commons-codec: mismatch between Apache and BSD
  files licensed under the Apache v 1.1, derived from the BSD
JLayer: mismatch between GPL and LGPL
  same inferred licenses in both releases (0.4 and 1.0)
  however, JLayer moved from GPL to LGPL from release 0.4 to release 1.0
36
Conclusions
We proposed a code analysis method as support for lawyers as well as for software engineers
We studied how licensing statements are used and evolve
  License type, copyright year, contributors
Main findings:
  Licenses influence project outcomes
  Licenses influence code migration
  Moving towards more permissive licenses
  Copyright years and contributor names updated to preserve rights on new code
37
Licensing and code provenance
Licensing influences the direction in which code flows from one system towards another
  Often code flows in the direction of more permissive licenses…
  …but there are many other factors influencing how code flows
Search-driven approaches can be adopted to determine what code a closed component comes from
  And thus its licensing…
  Issues related to the capabilities of the code search tools
38
Thank you!
39
References
Daniel M. Germán, Jens H. Weber-Jahnke, Massimiliano Di Penta: Lawful Software Engineering. Proceedings of FoSER: Working Conference on the Future of Software Engineering Research, November 2010, Santa Fe, USA, ACM
Daniel M. Germán, Massimiliano Di Penta, Julius Davies: Understanding and Auditing the Licensing of Open Source Software Distributions. ICPC 2010: 84-93
Massimiliano Di Penta, Daniel M. Germán, Yann-Gaël Guéhéneuc, Giuliano Antoniol: An exploratory study of the evolution of software licensing. ICSE 2010: 145-154
Massimiliano Di Penta, Daniel M. Germán, Giuliano Antoniol: Identifying licensing of jar archives using a code-search approach. MSR 2010: 151-160
Massimiliano Di Penta, Daniel M. Germán: Who are Source Code Contributors and How do they Change? WCRE 2009: 11-20
Daniel M. Germán, Massimiliano Di Penta, Yann-Gaël Guéhéneuc, Giuliano Antoniol: Code siblings: Technical and legal implications of copying code between applications. MSR 2009: 81-90
Daniel M. Germán, Yuki Manabe, Katsuro Inoue: A sentence-matching method for automatic license identification of source code files. ASE 2010: 437-446
Daniel M. Germán, Ahmed E. Hassan: License integration patterns: Addressing license mismatches in component-based development. ICSE 2009: 188-198
Robert Gobeille: The FOSSology project. MSR 2008: 47-50