Date Thesis Awarded

5-2024

Access Type

Honors Thesis -- Open Access

Degree Name

Bachelor of Science (BS)

Department

Computer Science

Advisor

Denys Poshyvanyk

Committee Members

Adwait Nadkarni

Cristiano Fanelli

Martin White

Abstract

Large Language Models (LLMs) can model long-term dependencies in sequences of tokens and are consequently often used to generate text through language modeling. These capabilities are increasingly applied to code generation; however, LLM-powered code generation tools such as GitHub's Copilot have been generating insecure code and thus pose a cybersecurity risk. To generate secure code, we must first understand why LLMs generate insecure code. This non-trivial task can be approached through interpretability methods, which investigate the hidden state of a neural network to explain model outputs. One recent interpretability method is rationales: the minimal subset of input tokens that leads to the model's output. By obtaining rationales for insecure code, we can investigate the relationship between model inputs and LLM-generated insecure code tokens, furthering efforts to mitigate the cybersecurity risks currently posed by LLM-generated code.
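
As an illustration of the rationale concept, the following is a minimal sketch of greedy rationalization; it is not the thesis's implementation. The helpers prob(subset, token) and top_prediction(subset) are hypothetical and would, in practice, be backed by a GPT-2 model able to score partial contexts.

def greedy_rationale(num_context_tokens, target, prob, top_prediction):
    """Greedily grow a subset of context positions until the model's top
    prediction given only that subset is the target token."""
    rationale = []                                  # selected context positions
    remaining = set(range(num_context_tokens))
    while remaining and top_prediction(rationale) != target:
        # Add the position that most increases P(target | selected subset).
        best = max(remaining, key=lambda p: prob(rationale + [p], target))
        rationale.append(best)
        remaining.remove(best)
    return sorted(rationale)

The returned positions form the (approximately) minimal subset of input tokens responsible for the target prediction.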

We conduct a case study on two common, pervasive, and severe real-world weaknesses: cross-site scripting (XSS, CWE-79) and SQL injection (CWE-89). We first collected data, then obtained rationales for our weak Python code samples via the greedy rationalization algorithm and a GPT-2 model, allowing us to identify the specific context tokens that lead to insecure token generation. We also explored an aggregation function for code rationales, a structural code taxonomy, which allowed us to investigate rationales at both the local and global levels. Our prototype study found that rationales for CWE-79 and CWE-89 code samples map to different structural code taxonomy categories. This implies that each LLM-generated weakness arises from different aspects of the code context, and thus efforts to mitigate insecure LLM-generated code must be targeted precisely to each weakness.
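
As a rough illustration of the aggregation step, the sketch below tallies how often rationale tokens fall into each structural code taxonomy category; the data layout and helper name are assumptions for illustration, not the thesis's exact taxonomy or code.

from collections import Counter

def taxonomy_distribution(rationales):
    """rationales: one list of (token, category) pairs per weak code sample.
    Returns the relative frequency of each taxonomy category across all samples."""
    counts = Counter(category for rationale in rationales
                     for _token, category in rationale)
    total = sum(counts.values())
    return {category: count / total for category, count in counts.items()}

Comparing the resulting distributions for the CWE-79 and CWE-89 samples is what reveals whether the two weaknesses draw on different aspects of the code context.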

Comments

This honors thesis was written and accepted for departmental honors in Computer Science, also serving as partial fulfillment of the author's computer science minor. A second thesis (not honors) was written and accepted for partial fulfillment of the author's Bachelor of Science in Physics. The second thesis, titled Data Analysis and Machine Learning on DIRC Hit Patterns for PID, explores a different application of neural networks: enhancement of subatomic particle classification methods through a novel neural network architecture, the Swin vision transformer.
