Security Flaws In Generative AI Code: Evaluating Prompt Strategies and Model Parameters in Python
2025 (English) Independent thesis Basic level (degree of Bachelor), 10 credits / 15 HE credits
Student thesis
Abstract [en]
This study further investigates security risks in code generated by large language models (LLMs), focusing on how prompt design and model parameters influence the occurrence of security vulnerabilities in the output. Python functions were generated from prompts through OpenAI's API, and the output was analyzed with Bandit, a static analysis tool for Python code. The findings suggest that adopting prompt strategies substantially reduced the security risks appearing in the code, to roughly one tenth of the number found when no prompt strategies were used. Adjusting the model's parameters did not have a comparable effect: there were some small differences, but the variation was not large enough to support a definite conclusion. Code was generated across more than a dozen categories, each with its own prompt style and parameter settings, and security risks still appeared in every category. This underscores that generated code needs continued review by humans or security tools before deployment, as generative AI cannot yet reliably produce flawless code.
Abstract [sv] (translated to English)
This study examines security risks in code generated by large language models (LLMs), focusing on how adjusting prompt design and model parameters affects the occurrence of vulnerabilities. By generating various Python functions through OpenAI's API and analyzing the results with a static analysis tool (Bandit), it identified clear differences between optimized and non-optimized prompts. Prompt design turned out to have a much greater impact on the security of the code than adjusting the model's parameters: code generated with optimized prompts contained on average about ten times fewer vulnerabilities than code generated with non-optimized prompts. The study was also motivated by a wish to convey to developers that they should not place too much trust in LLM-generated code, and that a manual human assessment of the code should always be required before it is used.
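The workflow summarized in the abstract (generate code under different prompt styles and sampling parameters, then scan it with Bandit) can be illustrated with a minimal sketch. It is not the thesis's actual code. The parameter values, the prompt style names, and the helper names `build_batches`, `run_bandit`, and `count_by_severity` are all assumptions made for illustration, and the code generation step itself is left out. The sketch relies only on Bandit's `-r`, `-f json`, and `-q` command-line options and on the `results` and `issue_severity` fields of its JSON report.

```python
import json
import subprocess
from collections import Counter
from itertools import product

# Hypothetical grid: the values used in the thesis are not stated in the abstract.
PROMPT_STYLES = ["plain", "security_aware"]
TEMPERATURES = [0.2, 0.7, 1.0]
TOP_PS = [0.5, 1.0]


def build_batches():
    """Return one configuration per (prompt style, temperature, top_p) combination."""
    return [
        {"prompt_style": style, "temperature": temp, "top_p": top_p}
        for style, temp, top_p in product(PROMPT_STYLES, TEMPERATURES, TOP_PS)
    ]


def run_bandit(path):
    """Scan a directory of generated code with Bandit and return its JSON report.

    Assumes the `bandit` CLI is installed. Bandit exits with a non-zero status
    when it reports issues, so the return code is deliberately not checked.
    """
    proc = subprocess.run(
        ["bandit", "-r", path, "-f", "json", "-q"],
        capture_output=True,
        text=True,
    )
    return json.loads(proc.stdout)


def count_by_severity(report):
    """Tally the findings in a Bandit JSON report by issue severity."""
    return Counter(item["issue_severity"] for item in report.get("results", []))
```

Comparing the counters returned by `count_by_severity` across the batches from `build_batches` would give the kind of per-category comparison the abstract describes, assuming each batch's generated code is saved to its own directory.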
Place, publisher, year, edition, pages
2025. p. 39
Series
TRITA-EECS-EX ; 2025:347
Keywords [en]
LLM, Temperature, Top-P, Security, Prompt Engineering
Keywords [sv]
LLM, Temperature, Top-P, Security, Prompt Engineering
National Category
Computer and Information Sciences
Identifiers
URN: urn:nbn:se:kth:diva-367688
OAI: oai:DiVA.org:kth-367688
DiVA, id: diva2:1985840
2025-07-30, 2025-07-28, 2025-07-30. Bibliographically approved