Date Awarded


Document Type


Degree Name

Doctor of Philosophy (Ph.D.)


Computer Science


Adwait Nadkarni

Committee Member

Denys Poshyvanyk

Committee Member

Gang Zhou

Committee Member

Robert Lewis

Committee Member

Kevin Moran


There has been a massive shift towards the use of IoT products in recent years. While companies have come a long way in making these devices and services easily accessible to consumers, very little is known about the privacy issues pertaining to these devices. In this dissertation, we focus on evaluating privacy in commodity-IoT devices by studying the device usage behavior of consumers and the privacy disclosure practices of IoT vendors. Our analyses consider the intricacies of the commodity-IoT domain and yield findings that help in building automated tools for large-scale analysis. We first present the design and implementation of Helion, a framework that generates natural home automation scenarios by identifying the regularities in user-driven home automation sequences, which are in turn generated from routines created by end-users. We hypothesize that smart home event sequences created by users exhibit inherent semantic patterns, or naturalness, that can be modeled and used to generate valid and useful scenarios. To evaluate our approach, we first empirically demonstrate that this naturalness hypothesis holds, using a corpus of 30,518 home automation events constructed from 273 routines collected from 40 users. We then demonstrate, through two studies with 16 external evaluators, that the scenarios generated by Helion seem valid to end-users. We further demonstrate the usefulness of Helion’s scenarios by addressing the challenge of policy specification, using Helion to generate 17 policies that help improve security/safety and privacy with minimal effort.
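One common way to make a naturalness hypothesis concrete is to train a statistical language model on event sequences and check that real sequences receive lower cross-entropy than scrambled ones. The sketch below illustrates that idea only; it is not Helion's implementation. The bigram model with add-one smoothing, the function names `train_bigram` and `cross_entropy`, and the event tokens are all assumptions made for this example.

```python
import math
from collections import Counter

def train_bigram(sequences):
    """Count bigrams over event sequences (hypothetical helper, not Helion's API)."""
    context_counts, bigram_counts, vocab = Counter(), Counter(), set()
    for seq in sequences:
        padded = ["<s>"] + list(seq)  # "<s>" marks the start of a sequence
        vocab.update(padded)
        for prev, cur in zip(padded, padded[1:]):
            context_counts[prev] += 1
            bigram_counts[(prev, cur)] += 1
    return context_counts, bigram_counts, vocab

def cross_entropy(seq, model):
    """Average negative log2-probability per event, with add-one smoothing."""
    if not seq:
        raise ValueError("sequence must contain at least one event")
    context_counts, bigram_counts, vocab = model
    padded = ["<s>"] + list(seq)
    total = 0.0
    for prev, cur in zip(padded, padded[1:]):
        p = (bigram_counts[(prev, cur)] + 1) / (context_counts[prev] + len(vocab))
        total -= math.log2(p)
    return total / len(seq)

# Made-up event sequences standing in for user-created routines.
routines = [
    ["door_unlocked", "light_on", "thermostat_heat"],
    ["door_unlocked", "light_on", "tv_on"],
    ["door_locked", "light_off", "thermostat_eco"],
]
model = train_bigram(routines)

natural = ["door_unlocked", "light_on", "thermostat_heat"]
shuffled = ["thermostat_heat", "light_on", "door_unlocked"]

# A lower value means the model finds the sequence more predictable.
print(cross_entropy(natural, model), cross_entropy(shuffled, model))
```

Under this toy model, the sequence that follows the observed ordering scores lower than its reversed counterpart. A gap of this kind, measured on a much larger corpus, is what a naturalness claim would rest on.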
We then perform a systematic and data-driven analysis of the current state of smart home privacy policies, with a particular focus on three key questions: (1) how hard privacy policies are for consumers to obtain, (2) how existing policies describe the collection and sharing of device data, and (3) how accurate these descriptions are when compared to information derived from alternate sources. Our analysis of 596 smart home vendors, affecting 2,442 smart home devices, yields 17 findings that impact millions of users and demonstrate critical gaps in existing smart home privacy policies, as well as challenges and opportunities for their automated analysis. Finally, we propose a framework called PrivQuery that facilitates the representation and querying of regulatory statements in privacy regulations using a set of natural language processing techniques. PrivQuery converts unstructured statements from privacy regulations into a structured format, representing them as tuples. PrivQuery then enables querying of these tuples across privacy regulations to identify statements with semantically similar contexts. We evaluate the tuples extracted by PrivQuery, demonstrating an F1 score above 80% in terms of both coverage and accuracy. We further compare our approach with existing techniques and show how PrivQuery reduces manual effort by providing an optimal number of similarity results. Our results demonstrate that PrivQuery can be useful in automating the task of privacy compliance and policy analysis, as it enables identification and querying of semantically similar contexts from complex regulatory statements.



© The Author