Submitted Proposals

DEI - FCTUC
Generated on 2024-11-21 22:12:27 (Europe/Lisbon).

Internship Title

Prompt Engineering for Knowledge Extraction from Large Language Models

Areas of Specialization

Intelligent Systems

Internship Location

DEI / CISUC

Background

Large Language Models (LLMs) have significantly advanced the field of Natural Language Processing (NLP) with their ability to represent human language and automate complex processes. They can perform a broad range of tasks and generate text guided by straightforward natural language prompts, but their adoption has been hindered by the high computational resources they require and by their black-box reasoning mechanisms. Knowledge Bases (KBs), on the other hand, are structured collections of human-accessible facts that can be queried with formal languages, which makes them more consistent. However, they have limited coverage and lack the ability to learn.

In this dissertation, solutions that combine the strengths of both LLMs and KBs will be studied, tested and proposed. They should leverage the flexibility of LLMs for text completion and generation, and use prompt engineering for Knowledge Extraction (KE). Extracted domain knowledge should be formalized in a KB, using standard technologies such as RDF. We expect this KB to be an accessible, editable and interpretable representation of what the model “knows”, and one that supports reasoning.
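
As a rough illustration of the pipeline described above (prompt an LLM, parse its answer into facts, formalize them in RDF), the following minimal Python sketch may help. It is not the project's method: the prompt wording, the `subject | relation | object` line format, the `http://example.org/kb/` namespace and all function names are assumptions made for this example, and the model completion is a hand-written stand-in rather than real LLM output.

```python
import re
from typing import List, Tuple

Triple = Tuple[str, str, str]

# Hypothetical namespace, chosen only for this illustration.
BASE = "http://example.org/kb/"


def build_prompt(subject: str, relation: str) -> str:
    """Build a prompt asking for facts in a fixed, easy-to-parse line format."""
    return (
        f"List the known values of the relation '{relation}' for '{subject}'.\n"
        "Answer with one fact per line in the form: subject | relation | object"
    )


def parse_triples(completion: str) -> List[Triple]:
    """Keep only lines with exactly three non-empty '|'-separated fields."""
    triples = []
    for line in completion.splitlines():
        parts = [p.strip() for p in line.split("|")]
        if len(parts) == 3 and all(parts):
            triples.append((parts[0], parts[1], parts[2]))
    return triples


def to_iri(label: str) -> str:
    """Turn a free-text label into an IRI under the assumed namespace."""
    slug = re.sub(r"[^A-Za-z0-9]+", "_", label.strip()).strip("_")
    return f"<{BASE}{slug}>"


def to_ntriples(triples: List[Triple]) -> str:
    """Serialize triples as N-Triples, one statement per line."""
    return "\n".join(f"{to_iri(s)} {to_iri(p)} {to_iri(o)} ." for s, p, o in triples)


# Hand-written stand-in for a model completion (assumption, not real output).
completion = """insulin | produced by | pancreas
Sure, here is another fact:
insulin | regulates | blood glucose"""

triples = parse_triples(completion)
print(build_prompt("insulin", "produced by"))
print(to_ntriples(triples))
```

Modelling every object as an IRI is a simplification; a real KB would likely distinguish entities from literals and map relations to an existing vocabulary.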

This internship will involve research on and use of LLMs, such as GPT, OPT, PaLM and BERT, and the creation of easily accessible and interpretable KBs that support reasoning on specific domains (e.g., finance, biology).

Objective

In this dissertation, the student should study, propose and implement a solution that combines the strengths of both LLMs and KBs. To achieve this goal, the following objectives are set:
- Review the state of the art of LLMs and prompt engineering;
- Study and select the domains of application, based on available datasets and challenges;
- Explore prompt engineering for KE from different LLMs;
- Devise a structured representation of the knowledge extracted;
- Propose a framework for creating domain KBs from an LLM;
- Draw conclusions on using the created KBs versus the LLM directly.

Work Plan - Semester 1

- Literature review;
- Identification of possible datasets / domains of application;
- Familiarization with LLMs, prompt engineering and KE through preliminary experimentation;
- Selection of the domains of application, LLMs and knowledge representation standards to use;
- Draft a framework for KE and formalization from LLMs;
- Write intermediate report: Introduction, State of the Art, Project Proposal and Preliminary Work.

Work Plan - Semester 2

- Experiment with possible approaches for KE from LLMs;
- Review the initial framework based on experimental results;
- Analyze the advantages and limitations of using LLMs directly versus domain KBs extracted from them, considering consistency, reasoning and explainability;
- Write the Master's thesis.
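
One way the consistency comparison mentioned in the plan above could be made concrete is sketched below, under stated assumptions. The metric (share of answers agreeing with the most frequent answer across paraphrases of one question) is just one possible choice, not one prescribed by the proposal. The function name, the toy KB and the sample answers are all made up for illustration and are not experimental results.

```python
from collections import Counter
from typing import Dict, List, Tuple


def consistency(answers: List[str]) -> float:
    """Share of answers that agree with the most frequent (normalized) answer."""
    if not answers:
        return 0.0
    normalized = [a.strip().lower() for a in answers]
    top_count = Counter(normalized).most_common(1)[0][1]
    return top_count / len(normalized)


# Made-up answers standing in for what an LLM might return when the same
# question is asked with three different phrasings (assumption, not real data).
llm_answers = ["Pancreas", "pancreas", "liver"]

# Toy KB: a lookup returns the same stored fact however the question is phrased.
kb: Dict[Tuple[str, str], str] = {("insulin", "produced by"): "pancreas"}
kb_answers = [kb[("insulin", "produced by")] for _ in range(3)]

print("LLM consistency:", consistency(llm_answers))
print("KB consistency:", consistency(kb_answers))
```

A score like this captures only agreement, not correctness, so it would need to be paired with an accuracy measure against a reference dataset.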

Conditions

The work will take place in a CISUC laboratory, with regular communication with the supervisors. Since the dissertation will contribute to a funded research project, the selected student may apply for a research grant of 930€ / month, with a duration of 6 to 9 months.

Supervisors

Hugo Oliveira, Catarina Silva, Bruno Ferreira
hroliv@dei.uc.pt