Tracing Software Build Processes to Uncover License Compliance Inconsistencies

Post on 05-Dec-2014

DESCRIPTION

Open Source Software (OSS) components form the basis for many software systems. While the use of OSS components accelerates development, client systems must comply with the license terms of the OSS components that they use. Failure to do so exposes client system distributors to possible litigation from copyright holders. Yet despite the importance of license compliance, tool support for license compliance assessment is lacking. In this paper, we propose an approach to extract and analyze the Concrete Build Dependency Graph (CBDG) of a software system by tracing system calls that occur at build-time. Through a case study of seven open source systems, we show that the extracted CBDGs: (1) accurately classify sources as included in or excluded from deliverables with 88%-100% precision and 98%-100% recall, and (2) can uncover license compliance inconsistencies in real software systems - two of which prompted code fixes in the CUPS and FFmpeg systems.

TRANSCRIPT

Tracing Software Build Processes to Uncover License Compliance Inconsistencies

@shane_mcintosh

Shane McIntosh

Sander van der Burg

Eelco Dolstra

Julius Davies

Daniel M. Germán

Armijn Hemel

Tjaldur Software Governance Solutions

Software reuse enables rapid development of new applications

GNU, Apache, Mozilla

Reusable components are released under different license terms

GNU, Mozilla, Apache

Apache Public License

Reuse puts legal constraints on how client systems can be distributed

An external component is used by a client system

Failure to comply with license terms can lead to costly legal issues

Ensuring license compliance with reused components

Which source files are enabled?

Which components are used?

How are they combined? (Static link, dynamic link)

The build system can answer these questions

What is a build system?

Build systems describe how sources are translated into deliverables

Figure: file types on the path from sources to deliverables: .tex, .c, .cc, .o, .dvi, .a, .exe, .pdf, .deb

Step 1 - Configuration

Step 2 - Construction

Step 3 - Certification

Step 4 - Packaging

Step 5 - Deployment

Focus of this paper

Step 1 - Configuration

Step 2 - Construction

Incompleteness of build specs makes license compliance assessment difficult

patchelf: patchelf.o
	g++ patchelf.o -o patchelf

patchelf.o: patchelf.cc
	g++ -c patchelf.cc

install: patchelf
	install patchelf /usr/bin/

Header file dependencies are not listed

patchelf.cc -> patchelf.o (extracted from the build spec)

elf.h -> patchelf.o (missing from the build spec)

External library dependencies are not listed

patchelf.o -> patchelf (extracted from the build spec)

libstdc++ -> patchelf (missing from the build spec)

Hidden relationship between patchelf and /usr/bin/patchelf

patchelf -> /usr/bin/patchelf (missing from the build spec)
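The gap between declared and actual dependencies in the patchelf example can be sketched in a few lines of Python. This is only an illustration, not the authors' tooling: the rule parser handles nothing beyond the three simple rules shown on the slides, and the `ACTUAL` edge set is transcribed by hand from the slides' dependency graph.

```python
# Declared dependencies, parsed from the Makefile rules shown on the slides.
MAKEFILE = """\
patchelf: patchelf.o
\tg++ patchelf.o -o patchelf

patchelf.o: patchelf.cc
\tg++ -c patchelf.cc

install: patchelf
\tinstall patchelf /usr/bin/
"""


def declared_edges(makefile_text):
    """Return (prerequisite, target) pairs for each 'target: prereqs' rule."""
    edges = set()
    for line in makefile_text.splitlines():
        # Recipe lines start with a tab; rule lines contain a colon.
        if not line or line.startswith("\t") or ":" not in line:
            continue
        target, prereqs = line.split(":", 1)
        for prereq in prereqs.split():
            edges.add((prereq, target.strip()))
    return edges


# Edges drawn on the slides' dependency graph (transcribed by hand).
ACTUAL = {
    ("patchelf.cc", "patchelf.o"),
    ("elf.h", "patchelf.o"),
    ("patchelf.o", "patchelf"),
    ("libstdc++", "patchelf"),
    ("patchelf", "/usr/bin/patchelf"),
}

declared = declared_edges(MAKEFILE)
missing = sorted(ACTUAL - declared)
for src, dst in missing:
    print(f"missing: {src} -> {dst}")
```

The three edges reported as missing are the header file, the external library, and the installed copy, which are the cases the slides call out.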

We use system tracing to recover the missing dependencies

Figure: the build process issues open(), read(), write(), and close() calls to the OS kernel, and these calls are recorded in a trace log
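As a rough sketch of what mining such a trace log could involve, the Python snippet below extracts successful open() calls from a small log. The log format is an assumption, loosely modelled on strace-style output, and the file name `missing.h` is invented to show a failed call; the slides do not specify the actual tracer or log layout.

```python
import re

# Hypothetical trace log, loosely modelled on strace-style output:
# "<pid> open(<path>, <flags>) = <fd or -1>".
TRACE_LOG = """\
101 open("patchelf.cc", O_RDONLY) = 3
101 open("elf.h", O_RDONLY) = 4
101 open("missing.h", O_RDONLY) = -1
101 open("patchelf.o", O_WRONLY|O_CREAT|O_TRUNC) = 5
"""

OPEN_RE = re.compile(r'^(\d+) open\("([^"]+)", ([A-Z_|]+)\) = (-?\d+)')


def parse_opens(log_text):
    """Yield (pid, path, mode) for each successful open() call in the log."""
    for line in log_text.splitlines():
        match = OPEN_RE.match(line)
        if not match:
            continue
        pid, path, flags, result = match.groups()
        if int(result) < 0:
            continue  # failed open: the file was not actually used
        flag_set = set(flags.split("|"))
        mode = "write" if flag_set & {"O_WRONLY", "O_RDWR"} else "read"
        yield int(pid), path, mode


events = list(parse_opens(TRACE_LOG))
for pid, path, mode in events:
    print(pid, mode, path)
```

Treating any file opened with a write flag as an output and everything else as an input is a simplification chosen for this sketch.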

We mine build traces to construct a concrete build dependency graph

patchelf.cc, elf.h -> g++ -> patchelf.o

patchelf.o, libstdc++ -> g++ -> patchelf

patchelf -> install -> /usr/bin/patchelf
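One way to picture the graph construction is the Python sketch below. It assumes the trace has already been reduced to (process, path, mode) events; the process labels and the rule "every file a process reads feeds every file it writes" are simplifications made for illustration, not details confirmed by the slides.

```python
from collections import defaultdict

# Hypothetical trace events: (process, path, mode). Process labels are
# illustrative; a real trace would identify processes by PID.
EVENTS = [
    ("g++ (compile)", "patchelf.cc", "read"),
    ("g++ (compile)", "elf.h", "read"),
    ("g++ (compile)", "patchelf.o", "write"),
    ("g++ (link)", "patchelf.o", "read"),
    ("g++ (link)", "libstdc++", "read"),
    ("g++ (link)", "patchelf", "write"),
    ("install", "patchelf", "read"),
    ("install", "/usr/bin/patchelf", "write"),
]


def build_graph(events):
    """Map each written file to the set of files its process read."""
    reads, writes = defaultdict(set), defaultdict(set)
    for process, path, mode in events:
        (writes if mode == "write" else reads)[process].add(path)
    graph = defaultdict(set)
    for process, outputs in writes.items():
        for output in outputs:
            graph[output] |= reads[process]
    return graph


def inputs_of(graph, deliverable):
    """Return every file that transitively feeds into the deliverable."""
    seen, stack = set(), [deliverable]
    while stack:
        node = stack.pop()
        for dep in graph.get(node, ()):
            if dep not in seen:
                seen.add(dep)
                stack.append(dep)
    return seen


graph = build_graph(EVENTS)
print(sorted(inputs_of(graph, "/usr/bin/patchelf")))
```

Walking backwards from the installed binary reaches elf.h and libstdc++, the two inputs that the Makefile never mentions.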

Annotate build graph nodes with license information using Ninka

Inconsistency introduced
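The annotation step might be pictured as follows. Everything license-related in this Python sketch is invented for illustration: the `LICENSES` labels stand in for the per-file output of a license identification tool, the `ALLOWED_INPUTS` policy is a placeholder for real compatibility rules, and no Ninka interface is called. The graph is assumed to be acyclic.

```python
# Hypothetical license annotations, standing in for the per-file output of a
# license identification tool such as Ninka. The labels are invented.
LICENSES = {
    "patchelf.cc": "License-A",
    "elf.h": "License-B",
    "libstdc++": "License-A",
}

# Hypothetical policy: which input licenses a deliverable released under a
# given license may contain. Real compatibility rules need legal review.
ALLOWED_INPUTS = {
    "License-A": {"License-A"},
}

# File-level dependency graph: output file -> files it was built from.
GRAPH = {
    "patchelf.o": {"patchelf.cc", "elf.h"},
    "patchelf": {"patchelf.o", "libstdc++"},
    "/usr/bin/patchelf": {"patchelf"},
}


def source_inputs(graph, node):
    """Return the leaf files (no recorded inputs) that feed into node."""
    deps = graph.get(node)
    if not deps:
        return {node}
    leaves = set()
    for dep in deps:
        leaves |= source_inputs(graph, dep)
    return leaves


def inconsistencies(graph, licenses, deliverable, declared_license):
    """List inputs whose license the declared license does not allow."""
    allowed = ALLOWED_INPUTS[declared_license]
    return sorted(
        (path, licenses.get(path, "UNKNOWN"))
        for path in source_inputs(graph, deliverable)
        if licenses.get(path, "UNKNOWN") not in allowed
    )


found = inconsistencies(GRAPH, LICENSES, "/usr/bin/patchelf", "License-A")
print(found)
```

In this made-up scenario the header is flagged because its label differs from the deliverable's declared license, which mirrors how an input missing from the build spec can introduce an inconsistency.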

Empirical study

(RQ1) Accuracy

(RQ2) Practicality

Measuring the accuracy of our CBDG approach

The approach classifies each source file (.c) as included or excluded

Delete an included file and execute the build: broken means true positive, clean means false positive

Delete an excluded file and execute the build: broken means false negative, clean means true negative
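The delete-and-rebuild check maps directly onto the usual precision and recall formulas, as the Python sketch below shows. The observations are hypothetical and the file names are placeholders; only the four outcome rules come from the slides.

```python
def classify(predicted_included, build_breaks_without):
    """Label one file using the delete-and-rebuild check from the slides."""
    if predicted_included:
        return "TP" if build_breaks_without else "FP"
    return "FN" if build_breaks_without else "TN"


def precision_recall(outcomes):
    """Compute precision and recall from a list of TP/FP/FN/TN labels."""
    tp, fp, fn = (outcomes.count(k) for k in ("TP", "FP", "FN"))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical observations: (file, predicted included?, build breaks?).
OBSERVATIONS = [
    ("a.c", True, True),
    ("b.c", True, True),
    ("c.c", True, False),
    ("d.c", False, False),
    ("e.c", False, True),
]

outcomes = [classify(pred, broke) for _, pred, broke in OBSERVATIONS]
precision, recall = precision_recall(outcomes)
print(outcomes, precision, recall)
```

Precision is the share of files predicted as included whose removal breaks the build, and recall is the share of build-breaking files that were predicted as included.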

Our approach accurately selects the files that impact system deliverables

            Aterm  Opkg  Bash  CUPS  Xalan  OpenSSL  FFmpeg
Technology  Make   Make  Make  Make  Ant    Make     Make
Precision   100%   97%   88%   100%  99%    99%      99%
Recall      98%    99%   100%  99%   100%   100%     100%

But there are cases where our approach makes mistakes

And there are cases when our approach misses files that impact deliverables
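The overall ranges reported for the study can be checked against the per-system figures with a short Python snippet. The numbers are transcribed from the table; the variable names are arbitrary.

```python
# Per-system results transcribed from the table: (precision %, recall %).
RESULTS = {
    "Aterm": (100, 98),
    "Opkg": (97, 99),
    "Bash": (88, 100),
    "CUPS": (100, 99),
    "Xalan": (99, 100),
    "OpenSSL": (99, 100),
    "FFmpeg": (99, 100),
}

precisions = [p for p, _ in RESULTS.values()]
recalls = [r for _, r in RESULTS.values()]

precision_range = (min(precisions), max(precisions))
recall_range = (min(recalls), max(recalls))

# Systems with the lowest precision and the lowest recall.
lowest_precision = min(RESULTS, key=lambda name: RESULTS[name][0])
lowest_recall = min(RESULTS, key=lambda name: RESULTS[name][1])

print(precision_range, recall_range, lowest_precision, lowest_recall)
```

The lowest precision comes from Bash and the lowest recall from Aterm, which gives the 88%-100% and 98%-100% ranges quoted in the abstract.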

Empirical study

(RQ1) Accuracy

Precision: 88%-100%

Recall: 98%-100%

(RQ2) Practicality

Bugs filed using our approach on multi-licensed packages

FFmpeg: license was updated within 3 days

CUPS: offending files were removed within 2 days

Empirical study

(RQ1) Accuracy

Precision: 88%-100%

Recall: 98%-100%

(RQ2) Practicality

Prompted quick code changes in two systems
