Assigned proposals 2024/2025

DEI - FCTUC
Generated on 2024-07-17 07:16:39 (Europe/Lisbon).

Internship Title

Humor Generation in Portuguese with Genetic Algorithms and Large Language Models

Areas of Specialization

Intelligent Systems

Internship Location

DEI / CISUC

Background

Verbal humor is a general phenomenon that depends heavily on complex linguistic knowledge and fluency. For this reason, researchers in Artificial Intelligence and Natural Language Processing have worked over the last decades to build machines that can not only recognize but also create funny texts. One example is an ongoing PhD project at CISUC on detecting and generating punning humor in Portuguese.

Different methods can be adopted for generating humorous text. GALMET (Genetic Algorithm using Language Models for Evolving Text) creates funny headlines for news articles with: (i) a Masked Language Model fine-tuned on humor, which replaces words in a given text; (ii) a regression model that estimates the level of funniness and serves as the fitness function of a Genetic Algorithm (GA). This approach looks well suited for adaptation to pun generation and opens several research paths, such as including explicit phonetic transcription in the process.
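
The loop described above (word replacement as mutation, a funniness estimate as fitness) can be sketched as follows. This is only a minimal illustration under stated assumptions, not GALMET itself: `VOCAB`, `FUNNY_WORDS`, `mutate`, `crossover`, `fitness` and `evolve` are hypothetical names, the random word replacement stands in for the masked language model, the toy score stands in for the regression model, and the use of one-point crossover and elitism is an assumption about a generic GA rather than a detail taken from GALMET.

```python
import random

# Hypothetical stand-ins: a real system would use a masked language model
# for word replacement and a regression model for the funniness score.
VOCAB = ["vaca", "forno", "parede", "lasanha", "porta", "graus"]
FUNNY_WORDS = {"vaca", "lasanha"}  # toy assumption, not a real humor model


def mutate(tokens, rng):
    """Replace one randomly chosen token (placeholder for masked-LM infilling)."""
    child = list(tokens)
    child[rng.randrange(len(child))] = rng.choice(VOCAB)
    return child


def crossover(a, b, rng):
    """One-point crossover between two equal-length token lists (length >= 2)."""
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:]


def fitness(tokens):
    """Toy funniness score: fraction of tokens found in FUNNY_WORDS."""
    return sum(t in FUNNY_WORDS for t in tokens) / len(tokens)


def evolve(seed_text, generations=20, pop_size=12, elite=2, seed=0):
    """Evolve variants of seed_text and return the fittest one as a string."""
    rng = random.Random(seed)
    seed_tokens = seed_text.split()
    # Initial population: single-word mutations of the seed text.
    population = [mutate(seed_tokens, rng) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]  # truncation selection
        children = population[:elite]  # elitism: keep the best unchanged
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            children.append(mutate(crossover(a, b, rng), rng))
        population = children
    return " ".join(max(population, key=fitness))
```

Calling `evolve("o forno está na parede", generations=10, seed=1)` returns a five-word variant of the seed. In an actual adaptation, `mutate` and `fitness` would be the two components to swap for trained models.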

Moreover, Large Language Models (LLMs) have revolutionized the area of Natural Language Generation, with growing abilities to generate fluent texts. However, there is still little research assessing how such models handle creativity and, specifically, humor. Within this context, this work could also explore pun generation using Transformers and LLMs.

Objective

This project aims to explore recent text generation methods for transforming Portuguese texts into punning texts. Among others, this should include adapting GALMET or fine-tuning a language model like T5 (https://huggingface.co/google-t5/t5-base) for this purpose. The work can exploit Puntuguese (https://huggingface.co/datasets/Superar/Puntuguese), a corpus of punning jokes in Portuguese and their non-humorous counterparts.
Some examples (in Portuguese), each followed by its non-humorous counterpart in parentheses:
* A que velocidade correm as vacas? Acém. (A que velocidade correm as vacas? 20km/h.)
* O que é um fuinho? É um buaquinho na pauede. (O que é um fuinho? É um animal.)
* Segui as instruções na embalagem da lasanha que dizia para colocar o forno a 180 graus. Agora como é que eu abro a porta, se o forno está virado para a parede? (Segui as instruções na embalagem da lasanha que dizia para colocar o forno a 180 graus. Agora como é que eu abro a porta, se o forno está tão quente?)

Puntuguese can be used to train a fitness function model or to fine-tune a generative LLM. The work may also take advantage of models already trained during the aforementioned PhD project. Although trained on Puntuguese, the resulting methods should be applicable to virtually any short text in Portuguese, including news headlines.
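
As a rough illustration of these two uses, the sketch below turns pun/counterpart pairs into (a) input/target examples for a sequence-to-sequence model and (b) labelled texts for a funniness model. The two pairs are copied from the examples in this proposal. Everything else is an assumption made for illustration: the `PunPair` class, the field names, the task prefix, and the binary labels are hypothetical and do not reflect the actual schema of the Puntuguese dataset.

```python
from dataclasses import dataclass

# Hypothetical task prefix; T5-style models are commonly prompted this way.
PREFIX = "transformar em trocadilho: "


@dataclass(frozen=True)
class PunPair:
    plain: str  # non-humorous counterpart
    pun: str    # punning version


# Two pairs taken from the examples given in this proposal.
PAIRS = [
    PunPair(
        plain="A que velocidade correm as vacas? 20km/h.",
        pun="A que velocidade correm as vacas? Acém.",
    ),
    PunPair(
        plain="O que é um fuinho? É um animal.",
        pun="O que é um fuinho? É um buaquinho na pauede.",
    ),
]


def to_seq2seq(pairs, prefix=PREFIX):
    """Plain text as input, pun as target (for fine-tuning a generator)."""
    return [{"input_text": prefix + p.plain, "target_text": p.pun} for p in pairs]


def to_classification(pairs):
    """Labelled texts for a fitness model: 1 = pun, 0 = plain (an assumption)."""
    rows = []
    for p in pairs:
        rows.append({"text": p.pun, "label": 1})
        rows.append({"text": p.plain, "label": 0})
    return rows
```

Each pair yields one generation example and two labelled texts, so the same corpus can feed both the fine-tuning route and the fitness function of the genetic algorithm.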

Finally, the results should be analyzed to draw useful conclusions about how well the employed methods generate humor, and evaluated following common practice in the related literature. This involves manual evaluation by humans according to criteria such as funniness, fluency, and grammaticality, among others.
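
One simple way to summarize such a manual evaluation is to average the scores per system and criterion. The sketch below assumes a hypothetical record format (`system`, `criterion`, `score`) and a 1 to 5 scale. The ratings in it are invented placeholders that only exercise the function; they are not results.

```python
from collections import defaultdict
from statistics import mean


def mean_scores(ratings):
    """Average each (system, criterion) pair over all annotators and items."""
    buckets = defaultdict(list)
    for r in ratings:
        buckets[(r["system"], r["criterion"])].append(r["score"])
    return {key: mean(scores) for key, scores in buckets.items()}


# Invented placeholder ratings (1-5 scale), not real evaluation results.
ratings = [
    {"system": "ga", "criterion": "funniness", "score": 2},
    {"system": "ga", "criterion": "funniness", "score": 4},
    {"system": "t5", "criterion": "funniness", "score": 3},
    {"system": "t5", "criterion": "fluency", "score": 5},
]
summary = mean_scores(ratings)
```

A fuller analysis would normally also report inter-annotator agreement, which this sketch leaves out.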

As an extra objective, Machine Learning Explainability methods can be integrated to provide human-interpretable insights into the generation process.

Briefly, the objectives are the following:
* Implement a Genetic Algorithm to generate humorous puns in Portuguese;
* Fine-tune a Language Model for transforming any short Portuguese text into a pun;
* (Extra) Study how to integrate ML Explainability methods in humor generation;
* Draw conclusions on the success of each approach, considering the type and quality of the resulting text as a humorous artifact.

Work Plan - Semester 1

* Literature review: Humor Generation, Genetic Algorithms, Transformer Language Models, Punning Humor
* Familiarization with Text Generation models, initial experimentation
* Master’s thesis project writing and presentation

Work Plan - Semester 2

* Adaptation of the genetic algorithm to punning humor (fitness, mutation operations)
* Fine-tuning of a T5 model for humor transformation
* Experimentation
* Consider other possible approaches and the integration of ML Explainability
* Analysis and Evaluation
* Writing of the Master’s thesis

Conditions

The work will take place in a CISUC laboratory, with regular communication with the supervisors. Depending on initial progress, the selected student may apply for a research grant of 990€ / month, with a duration of 6 to 9 months.

Supervisors

Hugo Gonçalo Oliveira and Marcio Lima Inácio
hroliv@dei.uc.pt