Internship Title
Active Learning Algorithm based on Surprise and Curiosity
Areas of Specialization
Intelligent Systems
Internship Location
CMS-DEI-FCTUC
Background
The key idea behind active learning is that a machine learning algorithm can perform better with less training if it is allowed to choose the data from which it learns. An active learner may pose "queries," usually in the form of unlabeled data instances to be labeled by an oracle (e.g., a human annotator) that already understands the nature of the problem. This approach is well motivated in many modern machine learning and data mining applications, where unlabeled data may be abundant or easy to come by, but training labels are difficult, time-consuming, or expensive to obtain. Several algorithms of this kind already exist. For example, in pool-based active learning the learner (i.e., the classifier) has access to a pool of unlabeled instances (e.g., emails) and can request labels (spam/not spam) for the instances most likely to help in building a predictive model.
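To make the pool-based loop concrete, the sketch below uses uncertainty sampling (least confidence), one common query strategy: at each round the learner retrains on the labels it has, then queries the pool instance whose most likely label it is least sure of. The nearest-centroid classifier, the 2-D toy points, the `oracle` rule, and all function names (`fit_centroids`, `predict_proba`, `least_confidence`, `pool_based_active_learning`) are hypothetical stand-ins chosen so the example runs without external libraries; this is not the surprise-based algorithm the project aims to develop.

```python
import math

# Hypothetical stand-in classifier: one centroid (mean point) per class.
def fit_centroids(xs, ys):
    centroids = {}
    for label in set(ys):
        members = [x for x, y in zip(xs, ys) if y == label]
        centroids[label] = (
            sum(p[0] for p in members) / len(members),
            sum(p[1] for p in members) / len(members),
        )
    return centroids

# Class probabilities: softmax over negative Euclidean distances to centroids.
def predict_proba(centroids, x):
    scores = {label: -math.dist(x, c) for label, c in centroids.items()}
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {label: math.exp(s - top) for label, s in scores.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}

# Least-confidence uncertainty: 1 - probability of the most likely label.
def least_confidence(proba):
    return 1.0 - max(proba.values())

def pool_based_active_learning(labeled, pool, oracle, budget):
    """Query up to `budget` labels from `oracle`, one pool instance per round."""
    labeled, pool, queried = list(labeled), list(pool), []
    for _ in range(min(budget, len(pool))):
        model = fit_centroids(*zip(*labeled))  # retrain on all labels so far
        # Pick the pool instance the current model is least sure about.
        idx = max(range(len(pool)),
                  key=lambda i: least_confidence(predict_proba(model, pool[i])))
        x = pool.pop(idx)
        labeled.append((x, oracle(x)))  # the oracle (e.g., a human) labels it
        queried.append(x)
    return fit_centroids(*zip(*labeled)), queried

# Usage on a toy problem: label 1 iff the coordinates sum to more than 10.
seed = [((0, 0), 0), ((10, 10), 1)]
pool = [(1, 1), (9, 9), (5, 5.5), (2, 8), (8, 1)]
oracle = lambda p: int(p[0] + p[1] > 10)
model, queried = pool_based_active_learning(seed, pool, oracle, budget=2)
print(queried)
```

Only `budget` labels are requested instead of labeling the whole pool; a different selection criterion, such as a surprise-based one, could replace `least_confidence` in the selection step without changing the rest of the loop.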
Objective
The goal of this project is to develop a new algorithm based on surprise [Macedo et al 2004, 2001], to be incorporated into artificial agents in order to reduce the amount of information the agents consider in their own decisions and/or the amount of information they provide to humans. Such artificial agents can be used in several domains, such as the medical domain or spam detection; the application domain will be selected by the student. For instance, if integrated with a supervised machine learning classification system in the medical domain (e.g., when a medical doctor has to label the training examples of a machine learning system for disease categorization), such an agent would select a small set of examples, sparing the doctor the burden of labeling thousands of them. Alternatively, if integrated into a personal alerting system, such as one within a hospital information system, it would alert healthcare professionals, or the patients themselves, when relevant changes are noticed in real-time patient data.
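The two uses of surprise described above can be sketched as follows. This sketch assumes that surprise is computed from the agent's probability distribution over outcomes as S = log2(1 + P(most expected outcome) - P(observed outcome)), a value in [0, 1]; treat this formula, and the hypothetical functions `surprise`, `expected_surprise`, `select_queries`, and `surprise_alerts`, as illustrative assumptions to be checked against [Macedo et al 2004, 2001], not as the algorithm the project will produce. It ranks unlabeled examples by the surprise the model expects to feel once a doctor labels them, and raises an alert when a discretized real-time reading (hypothetical categories `low`/`normal`/`high`) is surprising given the readings seen so far.

```python
import math
from collections import Counter

def surprise(proba, observed):
    """ASSUMED measure: log2(1 + P(most expected) - P(observed)), in [0, 1]."""
    p_max = max(proba.values())
    return math.log2(1.0 + p_max - proba.get(observed, 0.0))

def expected_surprise(proba):
    """Surprise the agent expects once the oracle reveals the true label."""
    return sum(p * surprise(proba, y) for y, p in proba.items())

def select_queries(pool_probas, k):
    """Indices of the k pool instances with the highest expected surprise."""
    ranked = sorted(range(len(pool_probas)),
                    key=lambda i: expected_surprise(pool_probas[i]),
                    reverse=True)
    return ranked[:k]

def surprise_alerts(stream, categories, threshold):
    """Alert on observations whose surprise exceeds `threshold`.

    Beliefs are frequency counts with a Laplace (add-one) prior; every
    observation must be one of `categories`.
    """
    counts = Counter({c: 1 for c in categories})
    alerts = []
    for t, obs in enumerate(stream):
        total = sum(counts.values())
        proba = {c: counts[c] / total for c in categories}
        s = surprise(proba, obs)
        if s > threshold:
            alerts.append((t, obs, s))
        counts[obs] += 1  # update beliefs after judging the observation
    return alerts

# Hypothetical classifier outputs for three unlabeled examples.
pool_probas = [{"A": 0.95, "B": 0.05}, {"A": 0.5, "B": 0.5}, {"A": 0.7, "B": 0.3}]
print(select_queries(pool_probas, k=2))

# Hypothetical discretized heart-rate stream: 20 normal readings, then a high one.
stream = ["normal"] * 20 + ["high"]
print(surprise_alerts(stream, ("low", "normal", "high"), threshold=0.5))
```

Under this assumed measure an evenly split distribution gives zero surprise for every outcome, so, unlike least-confidence sampling, the example `{"A": 0.5, "B": 0.5}` is ranked last: the criterion favours instances where the model holds a definite expectation that could be violated. In the alerting case, only the reading that breaks a well-established pattern is reported, which is how the agent reduces the information passed on to humans.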
Work Plan - Semester 1
Phase 1 - Literature review and state of the art.
Phase 2 - Development of a simple prototype as a proof of concept.
Phase 3 - Preparation of the dissertation proposal.
Work Plan - Semester 2
Phase 4 - Development of solutions according to the proposed research plan.
Phase 5 - Testing, experimentation, and evaluation of results.
Phase 6 - Writing of the dissertation.
Phase 7 - Writing of a scientific paper.
Conditions
The work will be carried out in a CMS laboratory, with access to adequate computing resources.
Supervisor
Luís Macedo
macedo@dei.uc.pt