Internship Proposals 2011/2012

Generated on 2025-02-04 05:57:15 (Europe/Lisbon).

Internship Title

iData: an interactive data analysis tool for physicians using Microsoft Pivot

Technological Area

Medical Informatics

Internship Location



Unsupervised clustering methods group records into subclasses that reflect patterns inherent in the data. These methods do not always match the human ability to identify useful clusters, especially when dimensionality is low and the data can be visualized. This has prompted the development of interactive clustering algorithms, which integrate a human teacher into the clustering loop. These techniques leverage the user's intrinsic ability to group similar items mentally in small, low-dimensional datasets, allowing users to compare clusters, break up individual groups by different attributes, re-cluster them by other attributes, and so on. Without proper support, however, both the size of the dataset and the types of attributes an individual can handle are quite limited.

In interactive clustering strategies, support is provided mainly in two areas: visualization of clustering results (e.g. Microsoft Pivot) and interaction with the clustering process. Interaction, in turn, can occur at two levels: with the result set and with the clustering process itself. Interaction with the result set is usually centred either on hiding data or on domain-specific methods. Systems that allow interaction with the clustering process are less common. Many let the user define the initial cluster centres visually; others go further and let the user interact with the clustering process as it runs. This interaction may take place during intermediate cluster validation stages (e.g. merging, splitting or deleting clusters, or assigning confidence levels to them) or through constraints that must be satisfied during clustering.

Another form of knowledge-driven clustering that can be explored in interactive clustering is (semi-)supervised clustering. In this setting, a series of item sets is provided together with their complete corresponding clusterings. The algorithm adjusts itself so that future item sets are partitioned in the same way as these training instances; that is, supervised clustering is applied to classified examples with the aim of producing clusters with high probability density with respect to individual classes.
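
Of the interaction styles described above, the simplest is letting the user choose the initial cluster centres. The Python sketch below, an illustration rather than part of the proposal, runs standard (Lloyd's) k-means from centres supplied by the user instead of randomly drawn ones. The function names and the (age, systolic blood pressure) records are hypothetical; with real data the attributes would need to be normalised so that no single unit dominates the Euclidean distance.

```python
# Minimal sketch (hypothetical names): Lloyd's k-means started from centres
# chosen by the user, e.g. by picking representative patients in a viewer.

def dist2(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Component-wise mean of a non-empty list of tuples."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def kmeans_from_user_centres(points, centres, max_iter=100):
    """Return (labels, centres); labels[i] is the centre index of points[i]."""
    centres = [tuple(c) for c in centres]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest current centre.
        new_labels = [
            min(range(len(centres)), key=lambda k: dist2(p, centres[k]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments unchanged: converged
        labels = new_labels
        # Update step: move each centre to the mean of its members.
        # An empty cluster keeps its previous centre.
        for k in range(len(centres)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                centres[k] = mean(members)
    return labels, centres


# Hypothetical patients described by (age, systolic blood pressure).
patients = [(35, 120), (38, 125), (40, 118), (70, 160), (72, 165), (68, 158)]
user_centres = [(40, 120), (70, 160)]  # picked by the user
labels, centres = kmeans_from_user_centres(patients, user_centres)
print(labels)  # -> [0, 0, 0, 1, 1, 1]
```

Seeding with user-chosen centres also makes the result reproducible, which matters when a physician compares successive clustering runs.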

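Constraint-based interaction can be illustrated in a similar way. The following is a simplified sketch in the spirit of COP-KMeans (Wagstaff et al., 2001), a published constrained k-means variant chosen here only for illustration, not necessarily the method the thesis would adopt. Pairs of record indices in `must_link` must share a cluster and pairs in `cannot_link` must not; each point is assigned greedily to its nearest feasible centre, and the function returns `None` when a point has no feasible cluster. Unlike the original algorithm, it does not compute the transitive closure of the must-link constraints, so it may fail on some satisfiable inputs. All names are hypothetical.

```python
# Simplified COP-KMeans-style sketch (hypothetical names): k-means whose
# assignment step refuses to break user-given pairwise constraints.

def dist2(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Component-wise mean of a non-empty list of tuples."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def violates(i, k, labels, must_link, cannot_link):
    """True if putting point i in cluster k conflicts with an assigned point."""
    for a, b in must_link:
        if i in (a, b):
            j = b if a == i else a
            if labels[j] is not None and labels[j] != k:
                return True
    for a, b in cannot_link:
        if i in (a, b):
            j = b if a == i else a
            if labels[j] == k:
                return True
    return False


def cop_kmeans(points, centres, must_link=(), cannot_link=(), max_iter=100):
    """Return one cluster index per point, or None if assignment gets stuck."""
    centres = [tuple(c) for c in centres]
    prev = None
    for _ in range(max_iter):
        labels = [None] * len(points)
        for i, p in enumerate(points):
            # Try clusters from nearest to farthest; keep the first feasible one.
            for k in sorted(range(len(centres)), key=lambda c: dist2(p, centres[c])):
                if not violates(i, k, labels, must_link, cannot_link):
                    labels[i] = k
                    break
            else:
                return None  # no cluster satisfies the constraints for point i
        if labels == prev:
            break  # assignments unchanged: converged
        prev = labels
        for k in range(len(centres)):
            members = [q for q, lab in zip(points, labels) if lab == k]
            if members:
                centres[k] = mean(members)
    return labels
```

For example, with points `[(1.0,), (1.2,), (5.0,), (5.2,)]` and centres `[(1.0,), (5.0,)]`, a must-link between records 1 and 2 pulls record 2 into the first cluster, giving `[0, 0, 0, 1]`.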

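For the (semi-)supervised setting, one simple, well-known technique is seeded k-means (Basu et al., 2002), swapped in here purely as an illustration: groups of records the user has already labelled supply the initial centres, and with `keep_seeds=True` those records also keep their user-given group throughout (the "constrained" variant). This is narrower than the supervised clustering described above, which learns from complete example partitions. The function name, its parameters and the data are hypothetical.

```python
# Sketch of seeded k-means (hypothetical names): user-labelled example
# groups provide the initial centres of an otherwise standard k-means.

def dist2(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Component-wise mean of a non-empty list of tuples."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def seeded_kmeans(points, seeds, keep_seeds=False, max_iter=100):
    """seeds maps a cluster name to indices of points the user put in it.

    Returns one cluster name per point.
    """
    names = sorted(seeds)
    centres = [mean([points[i] for i in seeds[name]]) for name in names]
    # With keep_seeds, seed points stay pinned to their user-given cluster.
    pinned = (
        {i: k for k, name in enumerate(names) for i in seeds[name]}
        if keep_seeds
        else {}
    )
    labels = None
    for _ in range(max_iter):
        new_labels = []
        for i, p in enumerate(points):
            if i in pinned:
                new_labels.append(pinned[i])
            else:
                new_labels.append(
                    min(range(len(centres)), key=lambda k: dist2(p, centres[k]))
                )
        if new_labels == labels:
            break  # assignments unchanged: converged
        labels = new_labels
        for k in range(len(centres)):
            members = [q for q, lab in zip(points, labels) if lab == k]
            if members:
                centres[k] = mean(members)
    return [names[k] for k in labels]


points = [(0.0,), (0.5,), (1.0,), (9.0,), (9.5,), (10.0,)]
print(seeded_kmeans(points, {"low": [0], "high": [5]}))
```

Pinning the seeds trades flexibility for fidelity to the user's examples: a record the user explicitly placed in a group is never silently reassigned.
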
The goal of this thesis is to develop an interactive patient clustering tool, based on the Microsoft Pivot technology (see …), that assists physicians in identifying groups of patients who share common parameters or characteristics, according to predetermined search criteria or search examples (e.g. example groupings) defined by the user in each case. To this end, a data exploration pipeline will be researched that includes i) convenient patient-group visualization methods, ii) suitable user interaction models for expressing knowledge and formulating sub-problems, and iii) the integration of both into an iterative clustering method.
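
The interaction side of such a pipeline (items ii and iii) can be sketched as a small data structure that records user edits. The hypothetical `PatientGroups` class below supports two of the operations mentioned earlier, merging two groups and splitting a group on one attribute (at the median by default), plus a per-group summary of the kind a visual front end might display. Attribute names and values are invented for illustration, and nothing here uses the Pivot API.

```python
from statistics import mean, median


class PatientGroups:
    """Hypothetical helper: a grouping of patient records the user can edit."""

    def __init__(self, records, labels):
        self.records = records      # list of dicts: attribute name -> number
        self.labels = list(labels)  # current group name of each record

    def members(self, group):
        """Indices of the records currently in `group`."""
        return [i for i, g in enumerate(self.labels) if g == group]

    def merge(self, a, b, new_name):
        """User action: fuse groups a and b into one group called new_name."""
        self.labels = [new_name if g in (a, b) else g for g in self.labels]

    def split_by(self, group, attribute, threshold=None):
        """User action: break a non-empty group in two on one attribute.

        Uses the median of the attribute within the group unless a
        threshold is given; returns the threshold actually used.
        """
        idx = self.members(group)
        if threshold is None:
            threshold = median([self.records[i][attribute] for i in idx])
        for i in idx:
            side = "high" if self.records[i][attribute] > threshold else "low"
            self.labels[i] = f"{group}/{attribute}-{side}"
        return threshold

    def summary(self, attribute):
        """Per-group (size, mean of attribute), e.g. for a group view."""
        return {
            g: (
                len(self.members(g)),
                mean([self.records[i][attribute] for i in self.members(g)]),
            )
            for g in sorted(set(self.labels))
        }


records = [
    {"age": 30, "sbp": 118},
    {"age": 45, "sbp": 130},
    {"age": 62, "sbp": 150},
    {"age": 70, "sbp": 162},
]
groups = PatientGroups(records, ["A", "A", "B", "B"])
groups.merge("A", "B", "AB")
groups.split_by("AB", "age")  # median age of the merged group is 53.5
print(groups.summary("sbp"))
```

In an iterative loop, the edited labels could in turn seed or constrain the next clustering run.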

Work Plan - Semester 1

-Analysis and learning of Microsoft Pivot and Silverlight
-Analysis of the state of the art in interactive clustering
-Formulation of hypotheses
-Specification of the interactive clustering solution
-Writing of the first-semester report

Work Plan - Semester 2

-Development of the interactive clustering solution
-Research into interactive clustering algorithms
-Integration of the interactive clustering algorithms into the interactive clustering solution
-Evaluation of the interactive clustering solution on existing clinical databases
-Writing of the dissertation


Good knowledge of intelligent data analysis
Good programming skills


Paulo de Carvalho