Date Awarded

2018

Document Type

Dissertation

Degree Name

Doctor of Philosophy (Ph.D.)

Department

Computer Science

Advisor

Denys Poshyvanyk

Committee Member

Andreas Stathopoulos

Committee Member

Peter Kemper

Committee Member

Xu Liu

Committee Member

Daniel German

Abstract

Open source licensing determines how open source systems are reused, distributed, and modified from a legal perspective. While it facilitates rapid development, it can present difficulty for developers in understanding due to the legal language of these licenses. Because of misunderstandings, systems can incorporate licensed code in a way that violates the terms of the license. Such incompatibilities between licensing can result in the inability to reuse a particular library without either relicensing the system or redesigning the architecture of the system. Prior efforts have predominantly focused on license identification or understanding the underlying phenomena without reasoning about compatibility in a broad scale. The work in this dissertation first investigates the rationale of developers and identifies the areas that developers struggle with respect to free/open source software licensing. First, we investigate the diffusion of licenses and the prevalence of license changes in a large scale empirical study of 16,221 Java systems. We observed a clear lack of traceability and a lack of standardized licensing that led to difficulties and confusion for developers trying to reuse source code. We further investigated the difficulty by surveying the developers of the systems with license changes to understand why they first adopted a license and then changed licenses. Additionally, we performed an analysis on issue trackers and legal mailing lists to extract licensing bugs. From these works, we identified key areas in which developers struggled and needed support. While developers need support to identify license incompatibilities and understand both the cause and implications of the incompatibilities, we observed that state-of-the-art license identification tools did not identify license exceptions. Since these exceptions directly modify the license terms (either the permissions granted by the license or the restrictions imposed by the license), we proposed an approach to complement current license identification techniques in order to classify license exceptions. The approach relies on supervised machine learners to classify the licensing text to identify the particular license exceptions or the lack of a license exception. Subsequently, we built an infrastructure to assist developers with evaluating license compliance warnings for their system. The infrastructure evaluates compliance across the dependency tree of a system to ensure it is compliant with all of the licenses of the dependencies. When an incompatibility is present, it notes the specific library/libraries and the conflicting license(s) so that the developers can investigate these compliance warnings, which would prevent distribution of their software, in their system. We conduct a study on 121,094 open source projects spanning 6 programming languages, and we demonstrate that the infrastructure is able to identify license incompatibilities between these projects and their dependencies.

DOI

http://dx.doi.org/10.21220/s2-xp8w-0w53

Rights

© The Author

Share

COinS