Submitted Proposals

Generated on 2025-07-17 19:29:40 (Europe/Lisbon).

Internship Title

Multimodal Machine Learning for Scalable Biodiversity Monitoring

Areas of Specialization

Intelligent Systems

Internship Location

CISUC

Background

Modern Artificial Intelligence (AI) systems are increasingly multimodal: they learn from and combine different types of data, such as images, audio, and text. This growing trend in Machine Learning (ML) enables more flexible and powerful models that integrate complementary sources of information, much as humans combine sight and hearing.
This internship will explore the foundations of multimodal learning, focusing on combining visual data (images) and acoustic data (sounds) for classification tasks. The idea is to build a simple yet effective system that takes input from both modalities and uses them jointly to improve classification decisions. A practical example would be identifying species from both photos and recordings of their vocalizations, but the core goal is to understand and prototype basic multimodal architectures.
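
As an illustration of using both modalities together, the sketch below shows one simple option, late fusion, in which each unimodal classifier produces class scores and the resulting probabilities are averaged. It is only a minimal sketch under stated assumptions: the function names, the weighting parameter, and the example scores are hypothetical and not part of the proposal, and other fusion strategies (for example, combining intermediate features) are equally valid candidates.

```python
import math


def softmax(logits):
    """Convert raw class scores into probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def late_fusion(image_logits, audio_logits, image_weight=0.5):
    """Weighted average of the per-modality class probabilities.

    image_weight is a hypothetical tuning parameter: 1.0 trusts only the
    image model, 0.0 only the audio model.
    """
    if len(image_logits) != len(audio_logits):
        raise ValueError("both modalities must score the same classes")
    p_img = softmax(image_logits)
    p_aud = softmax(audio_logits)
    w = image_weight
    return [w * pi + (1.0 - w) * pa for pi, pa in zip(p_img, p_aud)]


# Made-up scores for three classes from two unimodal models (illustration only).
image_logits = [2.0, 1.0, 0.1]
audio_logits = [0.5, 2.5, 0.3]

fused = late_fusion(image_logits, audio_logits)
predicted = max(range(len(fused)), key=fused.__getitem__)
print(predicted, [round(p, 3) for p in fused])
```

In this made-up case the image model alone would pick the first class, while the confident audio model shifts the fused decision to the second, which is the kind of interaction the internship would study.
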
The work begins with exploratory experiments on well-known benchmark datasets for each modality before progressing toward more realistic and domain-specific scenarios. Examples of relevant datasets include the “200 Bird Species Image Dataset” (https://www.kaggle.com/datasets/veeralakrishna/200-bird-species-with-11788-images) and the “British Birdsong Dataset” (https://www.kaggle.com/datasets/rtatman/british-birdsong-dataset), which could be used independently or aligned at the class level to create a custom multimodal dataset. Such a dataset would support experimentation with different fusion strategies and architectural designs for combining image and audio inputs in a unified classification framework.
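
Class-level alignment could be prototyped along the following lines: images and recordings that share a class label are paired at random, and classes present in only one modality are skipped. This is a sketch under assumptions, not a description of either dataset's actual layout. The file names, labels, and the `pair_by_class` helper are hypothetical, and the labels are assumed to have already been mapped to a shared vocabulary.

```python
import random
from collections import defaultdict


def pair_by_class(image_items, audio_items, pairs_per_class=2, seed=0):
    """Pair image and audio samples that share a class label.

    Each item is a (path, label) tuple. Only labels present in both
    modalities yield pairs; all other labels are skipped.
    """
    rng = random.Random(seed)  # fixed seed so the pairing is reproducible
    images, audio = defaultdict(list), defaultdict(list)
    for path, label in image_items:
        images[label].append(path)
    for path, label in audio_items:
        audio[label].append(path)

    pairs = []
    for label in sorted(set(images) & set(audio)):
        for _ in range(pairs_per_class):
            pairs.append((rng.choice(images[label]), rng.choice(audio[label]), label))
    return pairs


# Hypothetical file names and labels, for illustration only.
image_items = [
    ("img/robin_1.jpg", "robin"),
    ("img/robin_2.jpg", "robin"),
    ("img/wren_1.jpg", "wren"),
    ("img/heron_1.jpg", "heron"),  # no matching audio: skipped
]
audio_items = [
    ("wav/robin_a.wav", "robin"),
    ("wav/wren_a.wav", "wren"),
    ("wav/owl_a.wav", "owl"),  # no matching image: skipped
]

pairs = pair_by_class(image_items, audio_items)
for image_path, audio_path, label in pairs:
    print(label, image_path, audio_path)
```

Because the pairs are sampled rather than recorded together, the image and the sound in a pair come from different individuals of the same class; whether that is acceptable depends on the task.
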

Objectives

To achieve the goal defined above, the following objectives will be pursued:
- Study the state of the art in multimodal deep learning, with a focus on combining visual and acoustic data
- Study the available multimodal machine learning architectures
- Possibly construct a multimodal dataset
- Define, implement, and test the framework on the available data

Work Plan - Semester 1

- Review relevant literature on multimodal learning, CNNs, and audio-visual data fusion
- Explore and evaluate existing benchmark datasets for image and audio classification
- Run preliminary experiments using unimodal CNN architectures for both images and spectrograms
- Start implementing the proposed framework
- Write intermediate report
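
The spectrogram inputs mentioned for the preliminary experiments can be derived from raw audio roughly as sketched below, so that a CNN can treat a recording as a 2D image. This is a minimal sketch assuming NumPy is available. The frame length, hop size, and sampling rate are illustrative choices rather than values fixed by the proposal, and a mel-scaled spectrogram from an audio library would be a common alternative.

```python
import numpy as np


def log_spectrogram(signal, frame_len=256, hop=128):
    """Log-magnitude short-time Fourier transform.

    Returns an array with one row per frequency bin and one column per frame.
    Assumes len(signal) >= frame_len.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([
        signal[i * hop : i * hop + frame_len] * window
        for i in range(n_frames)
    ])
    magnitude = np.abs(np.fft.rfft(frames, axis=1))
    return np.log1p(magnitude).T  # shape: (frame_len // 2 + 1, n_frames)


# Synthetic one-second 1 kHz tone at an assumed 8 kHz sampling rate.
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 1000 * t)

spec = log_spectrogram(tone)
peak_bin = int(spec.mean(axis=1).argmax())
print(spec.shape, peak_bin * sample_rate / 256)  # peak frequency in Hz
```
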

Work Plan - Semester 2

- Implement the proposed solution
- Test and evaluate performance
- Write final report

Conditions

This work is expected to take place within a research project at CISUC. A 6-month scholarship may be available.

Supervisors

Catarina Silva, Bernardete Ribeiro, Dinis Costa
catarina@dei.uc.pt