Propostas com alunos

Gerado a 2025-07-04 04:30:34 (Europe/Lisbon).

Titulo Estágio

Evaluating Security in LLMs via Prompt Injection

Local do Estágio

CISUC-SSE

Enquadramento

Large Language Models (LLMs) have revolutionized natural language processing and artificial intelligence. These models, trained on vast datasets, can generate human-like text, answer questions, and perform various language-related tasks with remarkable accuracy. Given their appeal and potential, they have been widely integrated into various production environments. However, their complexity and the opacity of their internal mechanisms present significant challenges in ensuring their robustness, reliability, and security. They can be tricked into producing misleading, harmful, or unintended outputs through carefully crafted inputs. The fact that they are often fully integrated with other services and enhanced by Retrieval Augmented Generation (RAG) significantly increases the potential harm of an attack.

Prompt injection, a technique where specific inputs are crafted to elicit particular responses from an LLM, has emerged as both a tool for testing these models and a potential vector for security threats. This thesis proposal aims to explore the use of prompt injection to systematically test LLMs, with a dual focus on evaluating their performance and identifying security issues.

This work will focus on assessing prompt injection on state-of-the-art pre-trained LLMs (e.g., LLaMA, Falcon 180B, ChatGPT 3.5). A realistic scenario, with privacy and access controls concerns, relying on semi/fully automated GenAI-powered agents, such as a chatbot personal assistant or email application, will be devised for this study.

Objetivo

The learning objectives of this master internship are:
1) Security, vulnerabilities: study the subject of software security and vulnerabilities;
2) Secure Software Development: understand concepts related to secure software development, focusing on security-by-design, architecture, vulnerability detection;
3) Natural Language Processing and Large-Language Models: study AI/ML concepts, specifically LLMs; understand existing state-of-the-art LLM-based architectures;
4) Evaluating LLMs: study software/security validation approaches, and how existing techniques, with a focus on prompt injection, can be used for evaluating LLMs
5) Research Design: understand how to design and execute an experimental process to address complex and open research issues

Plano de Trabalhos - Semestre 1

[09/09/2024 a 20/10/2024] Literature review
Study the concepts to be used in the internship, namely security, vulnerabilities, LLMs, validation
[21/10/2024 a 05/11/2024] Analysis and selection of target techniques
Identification, analysis, and selection of which target datasets and scenarios, validation practices, machine learning techniques and LLMs will be studied
[06/11/2024 a 03/12/2024] Definition of the experimental process
Design and plan the experimental process that will be used to conduct the study
[04/12/2024 a 15/01/2025] Write the dissertation plan

Plano de Trabalhos - Semestre 2

[06/02/2025 a 06/03/2025] Set up the experimental testbed
Set up the testbed required to conduct the experiments
[07/03/2025 a 17/04/2025] Conduct the experimental campaign
Use the testbed to conduct the experimental process
[18/04/2025 a 08/05/2025] Analyze, explore, and process the results
Process, explore and analyze the results obtained from the experimental process, on the use of prompt injection to evaluate LLMs across various datasets and scenarios. Compare with existing results from the literature
[09/05/2025 a 05/06/2025] Write a scientific paper
[06/06/2025 a 08/07/2025] Write the thesis

Condições

Depending on the evolution of the internship a studentship may be available to support the development of the work in the second semester. The work is to be executed at the laboratories of the CISUC’s Software and Systems Engineering Group.

Observações

Co-supervised by Professor Leonardo Mariani from University of Milano – Bicocca, Italy

Orientador

João R. Campos
jrcampos@dei.uc.pt 📩