tracing software build processes to uncover license compliance inconsistencies

81
Tracing Software Build Processes to Uncover License Compliance Inconsistencies @shane_mcintosh Shane McIntosh Sander van der Burg Eelco Dolstra Julius Davies Daniel M. Germán Armijn Hemel Tjaldur Software Governance Solutions

Upload: shane-mcintosh

Post on 05-Dec-2014

169 views

Category:

Software


3 download

DESCRIPTION

Open Source Software (OSS) components form the basis for many software systems. While the use of OSS components accelerates development, client systems must comply with the license terms of the OSS components that they use. Failure to do so exposes client system distributors to possible litigation from copyright holders. Yet despite the importance of license compliance, tool support for license compliance assessment is lacking. In this paper, we propose an approach to extract and analyze the Concrete Build Dependency Graph (CBDG) of a software system by tracing system calls that occur at build-time. Through a case study of seven open source systems, we show that the extracted CBDGs: (1) accurately classify sources as included in or excluded from deliverables with 88%-100% precision and 98%-100% recall, and (2) can uncover license compliance inconsistencies in real software systems - two of which prompted code fixes in the CUPS and FFmpeg systems.

TRANSCRIPT

Page 1: Tracing Software Build Processes to Uncover License Compliance Inconsistencies

Tracing Software Build Processes to Uncover License Compliance Inconsistencies

@shane_mcintosh

Shane McIntosh

Sander van der Burg

Eelco Dolstra

Julius Davies

Daniel M. Germán

Armijn Hemel

! Tjaldur!

Software Governance Solutions

Page 2: Tracing Software Build Processes to Uncover License Compliance Inconsistencies

Software reuse enables rapid!development of new applications

2

Page 3: Tracing Software Build Processes to Uncover License Compliance Inconsistencies

Software reuse enables rapid!development of new applications

GNU

2

Page 4: Tracing Software Build Processes to Uncover License Compliance Inconsistencies

Software reuse enables rapid!development of new applications

GNUMozilla

2

Page 5: Tracing Software Build Processes to Uncover License Compliance Inconsistencies

Software reuse enables rapid!development of new applications

GNUApacheMozilla

2

Page 6: Tracing Software Build Processes to Uncover License Compliance Inconsistencies

Reusable components are released under different license terms

GNUMozilla

3

Apache

Page 7: Tracing Software Build Processes to Uncover License Compliance Inconsistencies

Reusable components are released under different license terms

GNUMozilla

3

Apache

Page 8: Tracing Software Build Processes to Uncover License Compliance Inconsistencies

Reusable components are released under different license terms

GNUMozilla

3

Apache

Page 9: Tracing Software Build Processes to Uncover License Compliance Inconsistencies

Reusable components are released under different license terms

GNUMozilla

3

Apache Public License

Apache

Page 10: Tracing Software Build Processes to Uncover License Compliance Inconsistencies

Reuse puts legal constraints on how client systems can be distributed

External component

Client system

Used by

4

Page 11: Tracing Software Build Processes to Uncover License Compliance Inconsistencies

Reuse puts legal constraints on how client systems can be distributed

External component

Client system

Used by

4

Page 12: Tracing Software Build Processes to Uncover License Compliance Inconsistencies

Failure to comply with license terms can lead to costly legal issues

5

Page 13: Tracing Software Build Processes to Uncover License Compliance Inconsistencies

Failure to comply with license terms can lead to costly legal issues

5

Page 14: Tracing Software Build Processes to Uncover License Compliance Inconsistencies

Failure to comply with license terms can lead to costly legal issues

5

Page 15: Tracing Software Build Processes to Uncover License Compliance Inconsistencies

Failure to comply with license terms can lead to costly legal issues

5

Page 16: Tracing Software Build Processes to Uncover License Compliance Inconsistencies

6

Which source files are enabled?

Ensuring license compliance with reused components

.c.c.c.c

Page 17: Tracing Software Build Processes to Uncover License Compliance Inconsistencies

6

Which source files are enabled?

Ensuring license compliance with reused components

.c.c.c

.c

Page 18: Tracing Software Build Processes to Uncover License Compliance Inconsistencies

6

Which source files are enabled?

Which components are used?

Ensuring license compliance with reused components

.c.c.c

.c

Page 19: Tracing Software Build Processes to Uncover License Compliance Inconsistencies

6

Which source files are enabled?

Which components are used?

How are they combined?

Ensuring license compliance with reused components

.c.c.c

.c

Static link

Dynamic link

Page 20: Tracing Software Build Processes to Uncover License Compliance Inconsistencies

6

Which source files are enabled?

Which components are used?

How are they combined?

Ensuring license compliance with reused components

.c.c.c

.c

Static link

Dynamic link

The build system can!answer these questions!

Page 21: Tracing Software Build Processes to Uncover License Compliance Inconsistencies

What is a build system?

7

Page 22: Tracing Software Build Processes to Uncover License Compliance Inconsistencies

Deliverable

What is a build system?

7

Page 23: Tracing Software Build Processes to Uncover License Compliance Inconsistencies

.tex

.c

.cc

.o

.o

.dvi

.a

.exe

.pdf

.deb

Build systems describe how sources are!translated into deliverables

8

Page 24: Tracing Software Build Processes to Uncover License Compliance Inconsistencies

Step 1 - Configuration

Step 2 - Construction

Step 3 - Certification

Step 4 - Packaging

Step 5 - Deployment

9

Page 25: Tracing Software Build Processes to Uncover License Compliance Inconsistencies

Step 1 - Configuration

Step 2 - Construction

Step 3 - Certification

Step 4 - Packaging

Step 5 - Deployment

9

Focus of this paper

Page 26: Tracing Software Build Processes to Uncover License Compliance Inconsistencies

Step 1 - Configuration

10

Page 27: Tracing Software Build Processes to Uncover License Compliance Inconsistencies

11

Step 2 - Construction

Page 28: Tracing Software Build Processes to Uncover License Compliance Inconsistencies

patchelf: patchelf.o g++ patchelf.o -o patchelf

Incompleteness of build specs makes license compliance assessment difficult

patchelf.o: patchelf.cc g++ -c patchelf.cc

install: patchelf install patchelf /usr/bin/

12

Page 29: Tracing Software Build Processes to Uncover License Compliance Inconsistencies

patchelf: patchelf.o g++ patchelf.o -o patchelf

patchelf.o: patchelf.cc g++ -c patchelf.cc

install: patchelf install patchelf /usr/bin/

13

Incompleteness of build specs makes license compliance assessment difficult

Page 30: Tracing Software Build Processes to Uncover License Compliance Inconsistencies

patchelf: patchelf.o g++ patchelf.o -o patchelf

patchelf.o: patchelf.cc g++ -c patchelf.cc

install: patchelf install patchelf /usr/bin/

Header file dependencies are not listed

13

Incompleteness of build specs makes license compliance assessment difficult

Page 31: Tracing Software Build Processes to Uncover License Compliance Inconsistencies

patchelf.ccelf.h

patchelf.o

ExtractedMissing

Dependencies

14

Incompleteness of build specs makes license compliance assessment difficult

Page 32: Tracing Software Build Processes to Uncover License Compliance Inconsistencies

patchelf.ccelf.h

patchelf.o

ExtractedMissing

Dependencies

14

Incompleteness of build specs makes license compliance assessment difficult

Page 33: Tracing Software Build Processes to Uncover License Compliance Inconsistencies

patchelf.ccelf.h

patchelf.o

ExtractedMissing

Dependencies

14

Incompleteness of build specs makes license compliance assessment difficult

Page 34: Tracing Software Build Processes to Uncover License Compliance Inconsistencies

patchelf: patchelf.o g++ patchelf.o -o patchelf

patchelf.o: patchelf.cc g++ -c patchelf.cc

install: patchelf install patchelf /usr/bin/

15

Incompleteness of build specs makes license compliance assessment difficult

Page 35: Tracing Software Build Processes to Uncover License Compliance Inconsistencies

patchelf: patchelf.o g++ patchelf.o -o patchelf

patchelf.o: patchelf.cc g++ -c patchelf.cc

install: patchelf install patchelf /usr/bin/

External library dependencies are not listed

15

Incompleteness of build specs makes license compliance assessment difficult

Page 36: Tracing Software Build Processes to Uncover License Compliance Inconsistencies

patchelf.ccelf.h

patchelf.o

patchelf

libstdc++ExtractedMissing

Dependencies

16

Incompleteness of build specs makes license compliance assessment difficult

Page 37: Tracing Software Build Processes to Uncover License Compliance Inconsistencies

patchelf.ccelf.h

patchelf.o

patchelf

libstdc++ExtractedMissing

Dependencies

16

Incompleteness of build specs makes license compliance assessment difficult

Page 38: Tracing Software Build Processes to Uncover License Compliance Inconsistencies

patchelf.ccelf.h

patchelf.o

patchelf

libstdc++ExtractedMissing

Dependencies

16

Incompleteness of build specs makes license compliance assessment difficult

Page 39: Tracing Software Build Processes to Uncover License Compliance Inconsistencies

patchelf: patchelf.o g++ patchelf.o -o patchelf

patchelf.o: patchelf.cc g++ -c patchelf.cc

install: patchelf install patchelf /usr/bin/

17

Incompleteness of build specs makes license compliance assessment difficult

Page 40: Tracing Software Build Processes to Uncover License Compliance Inconsistencies

patchelf: patchelf.o g++ patchelf.o -o patchelf

patchelf.o: patchelf.cc g++ -c patchelf.cc

install: patchelf install patchelf /usr/bin/

Hidden relationship between patchelf and

/usr/bin/patchelf

17

Incompleteness of build specs makes license compliance assessment difficult

Page 41: Tracing Software Build Processes to Uncover License Compliance Inconsistencies

patchelf.ccelf.h

patchelf.o

patchelf

/usr/bin/patchelflibstdc++ExtractedMissing

Dependencies

18

Incompleteness of build specs makes license compliance assessment difficult

Page 42: Tracing Software Build Processes to Uncover License Compliance Inconsistencies

patchelf.ccelf.h

patchelf.o

patchelf

/usr/bin/patchelflibstdc++ExtractedMissing

Dependencies

18

Incompleteness of build specs makes license compliance assessment difficult

Page 43: Tracing Software Build Processes to Uncover License Compliance Inconsistencies

We use system tracing to recover the missing dependencies

Build process

19

Trace!log

Page 44: Tracing Software Build Processes to Uncover License Compliance Inconsistencies

OS kernel

open()

We use system tracing to recover the missing dependencies

Build process

19

read()

write()

close()

Trace!log

Page 45: Tracing Software Build Processes to Uncover License Compliance Inconsistencies

Trace!log

We mine build traces to construct a concrete build dependency graph

patchelf.ccelf.h

patchelf.o

patchelf

/usr/bin/patchelflibstdc++ExtractedMissing

Dependencies

20

Page 46: Tracing Software Build Processes to Uncover License Compliance Inconsistencies

Trace!log

We mine build traces to construct a concrete build dependency graph

patchelf.ccelf.h

patchelf.o

patchelf

/usr/bin/patchelflibstdc++ExtractedMissing

Dependencies

20

Page 47: Tracing Software Build Processes to Uncover License Compliance Inconsistencies

Trace!log

We mine build traces to construct a concrete build dependency graph

patchelf.ccelf.h

patchelf.o

patchelf

/usr/bin/patchelf

g++

libstdc++ExtractedMissing

Dependencies

20

Page 48: Tracing Software Build Processes to Uncover License Compliance Inconsistencies

Trace!log

We mine build traces to construct a concrete build dependency graph

patchelf.ccelf.h

patchelf.o

patchelf

/usr/bin/patchelf

g++

libstdc++

g++

ExtractedMissing

Dependencies

20

Page 49: Tracing Software Build Processes to Uncover License Compliance Inconsistencies

Trace!log

We mine build traces to construct a concrete build dependency graph

patchelf.ccelf.h

patchelf.o

patchelf

/usr/bin/patchelf

g++

libstdc++

g++install

ExtractedMissing

Dependencies

20

Page 50: Tracing Software Build Processes to Uncover License Compliance Inconsistencies

patchelf.ccelf.h

patchelf.o

patchelf

/usr/bin/patchelf

g++

libstdc++

g++install

ExtractedMissing

Dependencies

Annotate build graph nodes with license information using Ninka

21

Page 51: Tracing Software Build Processes to Uncover License Compliance Inconsistencies

patchelf.ccelf.h

patchelf.o

patchelf

/usr/bin/patchelf

g++

libstdc++

g++install

ExtractedMissing

Dependencies

Annotate build graph nodes with license information using Ninka

21

Page 52: Tracing Software Build Processes to Uncover License Compliance Inconsistencies

Inconsistency introduced!

patchelf.ccelf.h

patchelf.o

patchelf

/usr/bin/patchelf

g++

libstdc++

g++install

ExtractedMissing

Dependencies

Annotate build graph nodes with license information using Ninka

21

Page 53: Tracing Software Build Processes to Uncover License Compliance Inconsistencies

22

Page 54: Tracing Software Build Processes to Uncover License Compliance Inconsistencies

Empirical study!!

!

!

!

!

!

!

!

!

!

22

Page 55: Tracing Software Build Processes to Uncover License Compliance Inconsistencies

Empirical study!!

!

!

!

!

!

!

!

!

!

(RQ1)!Accuracy

22

Page 56: Tracing Software Build Processes to Uncover License Compliance Inconsistencies

Empirical study!!

!

!

!

!

!

!

!

!

!

(RQ1)!Accuracy

(RQ2)!Practicality

22

Page 57: Tracing Software Build Processes to Uncover License Compliance Inconsistencies

Empirical study!!

!

!

!

!

!

!

!

!

!

(RQ1)!Accuracy

(RQ2)!Practicality

23

Page 58: Tracing Software Build Processes to Uncover License Compliance Inconsistencies

24

Measuring the accuracy!of our CBDG approach

Included .c.c.c.c

Excluded

Page 59: Tracing Software Build Processes to Uncover License Compliance Inconsistencies

24

Measuring the accuracy!of our CBDG approach

Included .c.c.c

.cExcluded

Page 60: Tracing Software Build Processes to Uncover License Compliance Inconsistencies

24

Measuring the accuracy!of our CBDG approach

Included .c.c .c

.cExcluded

Delete

Page 61: Tracing Software Build Processes to Uncover License Compliance Inconsistencies

24

Measuring the accuracy!of our CBDG approach

Included .c.c .c

.cExcluded

Delete

Execute build

Page 62: Tracing Software Build Processes to Uncover License Compliance Inconsistencies

24

Broken means true positive

Measuring the accuracy!of our CBDG approach

Included .c.c .c

.cExcluded

Delete

Execute build

Page 63: Tracing Software Build Processes to Uncover License Compliance Inconsistencies

24

Clean means false positive

Broken means true positive

Measuring the accuracy!of our CBDG approach

Included .c.c .c

.cExcluded

Delete

Execute build

Page 64: Tracing Software Build Processes to Uncover License Compliance Inconsistencies

24

Clean means false positive

Broken means true positive

Measuring the accuracy!of our CBDG approach

Included .c.c .c

.cExcluded

Delete

Execute build

.c

Page 65: Tracing Software Build Processes to Uncover License Compliance Inconsistencies

24

Clean means false positive

Broken means true positive

Measuring the accuracy!of our CBDG approach

Included .c.c .c

.cExcluded

Delete

Execute build

Execute build

.c

Page 66: Tracing Software Build Processes to Uncover License Compliance Inconsistencies

24

Clean means false positive

Broken means true positive

Measuring the accuracy!of our CBDG approach

Included .c.c .c

.cExcluded

Delete

Execute build

Broken means false negative

Execute build

.c

Page 67: Tracing Software Build Processes to Uncover License Compliance Inconsistencies

24

Clean means false positive

Broken means true positive

Measuring the accuracy!of our CBDG approach

Included .c.c .c

.cExcluded

Delete

Execute build

Clean means true negative

Broken means false negative

Execute build

.c

Page 68: Tracing Software Build Processes to Uncover License Compliance Inconsistencies

Our approach accurately selects the files that impact system deliverables

Aterm Opkg Bash CUPS Xalan OpenSSL FFmpeg

Technology Make Make Make Make Ant Make Make

Precision 100% 97% 88% 100% 99% 99% 99%

Recall 98% 99% 100% 99% 100% 100% 100%

25

Page 69: Tracing Software Build Processes to Uncover License Compliance Inconsistencies

But there are cases where our approach makes mistakes

Aterm Opkg Bash CUPS Xalan OpenSSL FFmpeg

Technology Make Make Make Make Ant Make Make

Precision 100% 97% 88% 100% 99% 99% 99%

Recall 98% 99% 100% 99% 100% 100% 100%

26

Page 70: Tracing Software Build Processes to Uncover License Compliance Inconsistencies

And there are cases when our approach misses files that impact deliverables

Aterm Opkg Bash CUPS Xalan OpenSSL FFmpeg

Technology Make Make Make Make Ant Make Make

Precision 100% 97% 88% 100% 99% 99% 99%

Recall 98% 99% 100% 99% 100% 100% 100%

27

Page 71: Tracing Software Build Processes to Uncover License Compliance Inconsistencies

Empirical study!!

!

!

!

!

!

!

!

!

!

(RQ1)!Accuracy

Precision: 88%-100%

Recall: 98%-100%

(RQ2)!Practicality

28

Page 72: Tracing Software Build Processes to Uncover License Compliance Inconsistencies

Empirical study!!

!

!

!

!

!

!

!

!

!

(RQ1)!Accuracy

Precision: 88%-100%

Recall: 98%-100%

(RQ2)!Practicality

29

Page 73: Tracing Software Build Processes to Uncover License Compliance Inconsistencies

Bugs filed using our approach!on multi-licensed packages

FFmpeg

License!was updated!within 3 days

+

30

Page 74: Tracing Software Build Processes to Uncover License Compliance Inconsistencies

Bugs filed using our approach!on multi-licensed packages

FFmpeg

License!was updated!within 3 days

+

CUPS

+

Offending files were removed within 2 days

30

Page 75: Tracing Software Build Processes to Uncover License Compliance Inconsistencies

Empirical study!!

!

!

!

!

!

!

!

!

!

(RQ1)!Accuracy

Precision: 88%-100%

Recall: 98%-100%

(RQ2)!Practicality

Prompted quick code changes in two systems

31

Page 76: Tracing Software Build Processes to Uncover License Compliance Inconsistencies
Page 77: Tracing Software Build Processes to Uncover License Compliance Inconsistencies
Page 78: Tracing Software Build Processes to Uncover License Compliance Inconsistencies
Page 79: Tracing Software Build Processes to Uncover License Compliance Inconsistencies
Page 80: Tracing Software Build Processes to Uncover License Compliance Inconsistencies
Page 81: Tracing Software Build Processes to Uncover License Compliance Inconsistencies