Date Thesis Awarded

5-2018

Document Type

Honors Thesis

Degree Name

Bachelor of Science (BS)

Department

Computer Science

Advisor

Denys Poshyvanyk

Committee Members

Robert M Lewis

Peter Kemper

Ross Iaci

Abstract

Upon installing a mobile application, people can, to a great extent, immediately tell what the subcomponents of the screen do. They know which button returns them to the previous screen, which one submits their login information, and which one opens the menu. This is the result of a combination of intuitive design and cross-platform design standards that let users draw on previous experience. Whatever the cause, the fact that humans can understand the functionality of screen components at a glance suggests that semantic information is encoded in a mobile application's GUI. In this work, we present an automated approach to exploring the nature of the semantic information encoded in the GUI of a mobile application. We do this using three modalities: (1) a screenshot of the application's screen, (2) text descriptions of the functionality of GUI components, sourced through Amazon's Mechanical Turk, and (3) information parsed from the XML dump of the screen hierarchy. The first two modalities are aligned using a convolutional neural network, which detects objects in the screenshot and extracts salient features, paired with a bidirectional recurrent neural network that serves as a language model. Each model maps its modality into a shared semantic space, where the two representations are aligned. The third modality is incorporated using a Seq2Seq model that maps the screen's XML dump directly to plausible descriptions of the screen's functionality. Our experiments reveal that the semantic information extracted from these representations of a mobile application's GUI is comparable to that of real-world images such as those in the MSCOCO dataset. We compare our results to those of similar models trained on MSCOCO, and compare the results obtained from the different screen representations against each other.
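The abstract does not state the exact objective used to align the CNN and BRNN representations. The sketch below assumes an objective in the style of Karpathy and Fei-Fei's visual-semantic alignment model: each word embedding is matched to its best-scoring screen-region embedding, the per-word maxima are summed into a screen/description score, and a bidirectional max-margin ranking loss pushes true screen/description pairs above mismatched ones. The function names, the dot-product score, and the margin of 1.0 are illustrative assumptions rather than the thesis's implementation, and the embeddings are taken as given instead of being produced by real networks.

```python
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two embeddings in the (assumed) shared semantic space."""
    return sum(x * y for x, y in zip(a, b))


def alignment_score(regions: List[Vector], words: List[Vector]) -> float:
    """Screen/description score: match each word to its best-scoring
    screen region and sum those per-word maxima."""
    return sum(max(dot(r, w) for r in regions) for w in words)


def ranking_loss(screens: List[List[Vector]],
                 sentences: List[List[Vector]],
                 margin: float = 1.0) -> float:
    """Bidirectional max-margin loss over a batch in which screens[k]
    and sentences[k] are the true pairs (hypothetical batching scheme)."""
    n = len(screens)
    scores = [[alignment_score(screens[k], sentences[l]) for l in range(n)]
              for k in range(n)]
    loss = 0.0
    for k in range(n):
        for l in range(n):
            if l == k:
                continue
            # Rank descriptions for screen k: true pair should beat description l.
            loss += max(0.0, scores[k][l] - scores[k][k] + margin)
            # Rank screens for description k: true pair should beat screen l.
            loss += max(0.0, scores[l][k] - scores[k][k] + margin)
    return loss
```

For example, with toy 2-D embeddings where screen 0's only region [1, 0] matches description 0's only word [1, 0], and screen 1's region [0, 1] matches description 1's word [0, 1], the correctly paired batch has zero loss, while swapping the two descriptions yields a loss of 8.0. Taking the maximum over regions lets each word attach to the single GUI component it most plausibly describes.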
