Submitted Proposals

DEI - FCTUC
Generated on 2024-05-02 13:26:40 (Europe/Lisbon).

Internship Title

GeoDataMining: Retrieval, Enrichment and Clustering of Geospatial Data

Areas of Specialization

Intelligent Systems

Internship Location

CISUC

Context

Over the last few years, foreign investment in Portugal has been growing and the country has attracted the interest of large corporations. According to AICEP, the Portuguese Trade & Investment Agency, multinational companies are well positioned to transfer their competitive products and processes, and are looking for locations that best match the factors that inform their investment decisions. To facilitate a first contact with these companies, AICEP provides a portal based on a web GIS (Geographic Information System) platform that presents, as features on a map, static socio-economic information about the municipalities and the locations of industrial parks and available facilities (http://globalfind.globalparques.pt/).

However, a vast amount of georeferenced data is available online, whether on social platforms or elsewhere on the Internet (as open data or in specialized directories). Once extracted and structured, these data could enrich that information and make this first contact even more appealing. In addition, different regions of the national territory offer their own conditions (e.g., financial incentives, available land, environmental impact, competing companies or potential partners, availability of labor, and accessibility of transport routes) that may influence where these companies locate. Furthermore, AICEP is committed to making its digital transformation (“Transformação Digital”) a reality, and this project is part of that strategy (https://www.dn.pt/lusa/interior/aicep-preve-investir-quase-1me-na-transformacao-digital-da-agencia-8819956.html).

Objective

The main objective of this proposal is to develop a Web Mining engine that collects, structures, and fuses georeferenced data and makes them available for visualization in the AICEP Geographic Information System. The first main goal comprises the following steps:

(i) Step 1: Data Collection: to collect and fuse the data to be made available for visualization. These data are already available at AICEP from partner institutions. A region of study will be chosen as a proof of concept. Beyond this, historical data comprising investments, funds applied and current PT2020 funds per region, goods/services exported, target markets, and successful/unsuccessful businesses/industries counselled in the past are available in AICEP internal databases. We aim to publish AICEP data as anonymized datasets for the benefit of other researchers and planners, following the European Union directive on Public Sector Information (PSI), which encourages Member States to make as much information available for re-use as possible. Data fusion mechanisms and spatial alignment techniques will be used to organize these data.
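
Step 1 names data fusion and spatial alignment without fixing a method. As a minimal sketch, assuming each source delivers records carrying a municipality name and, for point data, WGS84 coordinates, fusion can key on a normalized municipality name and alignment can snap points onto a common grid. The record fields, the helper names (normalize_name, fuse_by_municipality, grid_cell) and the 0.05-degree cell size are hypothetical. A production pipeline would more likely join against official administrative boundary polygons with a GIS library.

```python
import math
import unicodedata
from collections import defaultdict


def normalize_name(name: str) -> str:
    """Lower-case, strip accents and collapse whitespace so 'São Pedro ' matches 'sao pedro'."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(no_accents.lower().split())


def fuse_by_municipality(*sources):
    """Merge lists of dict records on a normalized 'municipality' key.

    Later sources add fields and overwrite earlier values of the same field.
    """
    fused = defaultdict(dict)
    for source in sources:
        for record in source:
            key = normalize_name(record["municipality"])
            fused[key].update({k: v for k, v in record.items() if k != "municipality"})
    return dict(fused)


def grid_cell(lat: float, lon: float, cell_deg: float = 0.05):
    """Snap a WGS84 point to the (row, col) index of a regular lat/lon grid."""
    return (math.floor(lat / cell_deg), math.floor(lon / cell_deg))


# Toy records from two hypothetical sources (values are illustrative only).
socio_economic = [{"municipality": "Coimbra", "population_band": "B"}]
industrial = [{"municipality": "coimbra ", "industrial_parks": 2}]
print(fuse_by_municipality(socio_economic, industrial))
print(grid_cell(40.2033, -8.4103))
```

Keying on a normalized name is the simplest option; official municipality codes, where both sources provide them, would avoid spelling mismatches altogether.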

(ii) Step 2: Data Enrichment: to enrich these data with information from social networks and the Web, adding dimensions that are unavailable or lack detail: companies already established in the territory, of different sizes and from distinct productive sectors; real estate data on available land lots, stores, and offices; and professionals who may be interested in working in that region and for this type of company.
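
One way to turn scraped company records into a new dimension for a candidate location is to count nearby companies per sector. The sketch below assumes each record has hypothetical "lat", "lon" and "sector" fields and uses the haversine great-circle distance with an assumed 10 km radius; neither the radius nor the field names come from the proposal.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two WGS84 points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def enrich_location(location, companies, radius_km=10.0):
    """Return a copy of `location` with counts of companies within `radius_km`, per sector."""
    by_sector = {}
    for company in companies:
        d = haversine_km(location["lat"], location["lon"], company["lat"], company["lon"])
        if d <= radius_km:
            by_sector[company["sector"]] = by_sector.get(company["sector"], 0) + 1
    enriched = dict(location)  # leave the input record untouched
    enriched["nearby_companies_by_sector"] = by_sector
    enriched["nearby_companies_total"] = sum(by_sector.values())
    return enriched


# Toy data: one candidate location and three company records (hypothetical).
site = {"id": "lot-1", "lat": 40.0, "lon": -8.0}
companies = [
    {"sector": "IT", "lat": 40.0, "lon": -8.0},
    {"sector": "IT", "lat": 40.05, "lon": -8.0},       # about 5.6 km north
    {"sector": "Textiles", "lat": 41.0, "lon": -8.0},  # about 111 km north
]
print(enrich_location(site, companies))
```

The linear scan is fine for a single region of study; for many locations, a spatial index would avoid comparing every location with every company.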

(iii) Step 3: Feature Engineering: attributes will be extracted from both the new and the previously available information to build rich datasets of past location-choice instances.
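
Before locations can be compared, the fused and enriched records must become numeric vectors on comparable scales. A minimal sketch, assuming hypothetical feature names, mean imputation for missing values and z-score standardization (both common defaults rather than choices stated in the proposal):

```python
import statistics

# Hypothetical feature names; the real attributes would come from Steps 1 and 2.
FEATURES = ["nearby_companies_total", "land_price_eur_m2", "distance_to_highway_km"]


def to_vectors(records, features=FEATURES):
    """Turn dict records into numeric rows, mean-imputing missing values per feature."""
    rows = [[r.get(f) for f in features] for r in records]
    for j in range(len(features)):
        present = [row[j] for row in rows if row[j] is not None]
        fill = statistics.fmean(present) if present else 0.0
        for row in rows:
            if row[j] is None:
                row[j] = fill
    return [[float(x) for x in row] for row in rows]


def standardize(vectors):
    """Z-score each column; zero-variance columns become 0. Returns (rows, means, stds)."""
    columns = list(zip(*vectors))
    means = [statistics.fmean(c) for c in columns]
    stds = [statistics.pstdev(c) for c in columns]
    scaled = [
        [(x - m) / s if s > 0 else 0.0 for x, m, s in zip(row, means, stds)]
        for row in vectors
    ]
    return scaled, means, stds


X = to_vectors([{"a": 1, "b": 10}, {"a": 3}], ["a", "b"])
Z, means, stds = standardize(X)
print(Z)
```

Returning the means and standard deviations lets new locations be scaled with the same parameters as the historical data.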

The second main goal is to develop a model that organizes the data in an unsupervised manner based on a similarity function, using machine learning and pattern recognition techniques. It comprises the following steps:

(iv) Step 4: Learning a Similarity Function: to learn a similarity function between locations from historical data on the location choices made by companies in the past.
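
The proposal does not fix how the similarity function is learned. One simple option, assumed here for illustration, is a diagonal weighting. Pairs of locations chosen by companies from the same sector are treated as "similar". Each standardized feature is weighted by the inverse of its mean squared difference over those pairs, and the weights are plugged into a Gaussian similarity. Full metric-learning methods (e.g., learning a Mahalanobis matrix) generalize this idea. The pairing rule, `eps` and `gamma` are assumptions.

```python
import math


def learn_feature_weights(vectors, similar_pairs, eps=1e-6):
    """Learn diagonal feature weights from index pairs of locations assumed to be similar.

    A feature that differs little between similar locations gets a large weight.
    Weights are rescaled to sum to the number of features.
    """
    if not similar_pairs:
        raise ValueError("need at least one similar pair")
    dim = len(vectors[0])
    msd = [0.0] * dim  # mean squared difference per feature
    for i, j in similar_pairs:
        for k in range(dim):
            msd[k] += (vectors[i][k] - vectors[j][k]) ** 2 / len(similar_pairs)
    raw = [1.0 / (m + eps) for m in msd]
    total = sum(raw)
    return [dim * w / total for w in raw]


def similarity(u, v, weights, gamma=1.0):
    """Gaussian similarity on the weighted squared Euclidean distance (1.0 for identical points)."""
    d2 = sum(w * (a - b) ** 2 for w, a, b in zip(weights, u, v))
    return math.exp(-gamma * d2)


# Toy standardized vectors; pair (0, 1) stands for two sites chosen by same-sector firms.
X = [[0.0, 0.0], [0.0, 2.0], [1.0, 0.0]]
w = learn_feature_weights(X, [(0, 1)])
print(w)
print(similarity(X[0], X[2], w), similarity(X[0], X[1], w))
```

In the toy data, the second feature varies between the "similar" pair, so it is almost ignored, and location 1 ends up far more similar to location 0 than location 2 does.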

(v) Step 5: Model Building: to define a clustering model, based on hierarchical and density-based clustering algorithms, that groups similar locations. This model will then be used to classify new locations, which may belong to more than one cluster, with membership strengths that differ from cluster to cluster.
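
The sketch below covers only the density-based half of this step; hierarchical clustering is not shown. It implements plain DBSCAN and then gives a new location a graded membership in every cluster via a normalized Gaussian affinity to cluster centroids. The centroid-based membership and the `eps`, `min_pts` and `sigma` values are assumptions. Centroids are a crude summary of non-convex DBSCAN clusters, and a tested library implementation (e.g., scikit-learn's `DBSCAN`) would be preferable in practice.

```python
import math


def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def dbscan(points, eps, min_pts):
    """Plain O(n^2) DBSCAN. Returns one label per point; -1 marks noise."""
    labels = [None] * len(points)

    def neighbours(i):
        return [j for j in range(len(points)) if euclidean(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # may later be relabelled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reach = neighbours(j)
            if len(reach) >= min_pts:  # j is a core point: keep expanding
                queue.extend(reach)
    return labels


def soft_membership(x, points, labels, sigma=1.0):
    """Graded membership of x in each cluster from Gaussian affinity to cluster centroids."""
    affinities = {}
    for c in sorted({l for l in labels if l >= 0}):
        members = [p for p, l in zip(points, labels) if l == c]
        centroid = [sum(col) / len(members) for col in zip(*members)]
        affinities[c] = math.exp(-euclidean(x, centroid) ** 2 / (2 * sigma ** 2))
    total = sum(affinities.values())
    return {c: a / total for c, a in affinities.items()} if total > 0 else affinities


# Toy 2-D feature vectors: two dense groups and one isolated location.
pts = [(0, 0), (0, 0.5), (0.5, 0), (10, 10), (10, 10.5), (10.5, 10), (50, 50)]
labels = dbscan(pts, eps=1.0, min_pts=3)
print(labels)
print(soft_membership((5, 5), pts, labels, sigma=5.0))
```

A location between the two groups gets comparable strengths in both clusters, which matches the idea that a new location may belong to more than one cluster.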

(vi) Step 6: Model Tuning and Validation: to select appropriate ML clustering algorithms for building the Web Mining engine, and to perform sampling and model evaluation using, e.g., contingency tables, the sum-of-squared-error criterion, mutual information, and the Rand index.
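
Each of the evaluation criteria listed above takes only a few lines to compute. The sketch below gives plain-Python versions of a contingency table, the Rand index, mutual information (in nats) and the sum-of-squared-error criterion. The external measures (Rand index, mutual information) need a reference partition; using, for example, past location choices grouped by sector is an assumption here. scikit-learn's `sklearn.metrics` module provides tested equivalents such as `rand_score` and `mutual_info_score`.

```python
import math
from collections import Counter
from itertools import combinations


def contingency_table(labels_true, labels_pred):
    """Counts of points for each (reference class, predicted cluster) pair."""
    return Counter(zip(labels_true, labels_pred))


def rand_index(labels_true, labels_pred):
    """Fraction of point pairs on which the two partitions agree (same vs. different)."""
    pairs = list(combinations(range(len(labels_true)), 2))
    agree = sum(
        (labels_true[i] == labels_true[j]) == (labels_pred[i] == labels_pred[j])
        for i, j in pairs
    )
    return agree / len(pairs)


def mutual_information(labels_true, labels_pred):
    """Mutual information (nats) between two labelings, from the contingency table."""
    n = len(labels_true)
    a, b = Counter(labels_true), Counter(labels_pred)
    return sum(
        (nij / n) * math.log(n * nij / (a[i] * b[j]))
        for (i, j), nij in contingency_table(labels_true, labels_pred).items()
    )


def sse(points, labels):
    """Sum of squared Euclidean distances of points to their cluster centroid."""
    total = 0.0
    for c in set(labels):
        members = [p for p, l in zip(points, labels) if l == c]
        centroid = [sum(col) / len(members) for col in zip(*members)]
        total += sum(sum((x - m) ** 2 for x, m in zip(p, centroid)) for p in members)
    return total


truth, pred = [0, 0, 1, 1], [1, 1, 0, 0]  # same partition, different label names
print(rand_index(truth, pred), mutual_information(truth, pred))  # Rand index 1.0, MI = ln 2
print(sse([(0, 0), (2, 0), (5, 5)], [0, 0, 1]))
```

Note that `sse` treats a DBSCAN noise label (-1) as one more cluster; noise points would normally be excluded first.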

Work Plan - Semester 1

•Literature Review;
•Compilation of a dataset of data already available at AICEP and its partners;
•Preliminary Analysis of the Web Mining and Clustering Engine;
•Development of a scraping tool for extracting real estate information from the Web for a given region of study;
•Writing of an intermediate report.
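
A scraping tool of the kind planned for the first semester can start from Python's standard-library `html.parser`. This minimal sketch only parses an HTML string. The page layout and the CSS class names ("listing", "price", "area", "location") are hypothetical and must be adapted to each real estate site. Fetching pages (e.g., with `urllib.request`, respecting each site's terms of use and robots.txt) is left out.

```python
from html.parser import HTMLParser


class ListingParser(HTMLParser):
    """Collect the text of price/area/location fields inside each 'listing' element.

    The CSS class names are hypothetical; a real site needs its own selectors.
    """

    FIELD_CLASSES = {"price", "area", "location"}

    def __init__(self):
        super().__init__()
        self.listings = []
        self._field = None  # field currently being read, if any

    def handle_starttag(self, tag, attrs):
        classes = set((dict(attrs).get("class") or "").split())
        if "listing" in classes:
            self.listings.append({})
        matched = classes & self.FIELD_CLASSES
        if matched and self.listings:
            self._field = matched.pop()

    def handle_endtag(self, tag):
        self._field = None  # assumes field elements contain plain text only

    def handle_data(self, data):
        text = data.strip()
        if self._field and text:
            current = self.listings[-1]
            current[self._field] = (current.get(self._field, "") + " " + text).strip()


# Toy page with two listings (values are illustrative only).
page = """
<div class="listing"><span class="price">250000 EUR</span>
  <span class="area">1200 m2</span><span class="location">Coimbra</span></div>
<div class="listing"><span class="price">90000 EUR</span><span class="location">Cantanhede</span></div>
"""
parser = ListingParser()
parser.feed(page)
parser.close()
print(parser.listings)
```

Extracted strings would still need parsing into numbers (prices, areas) and geocoding before they can join the datasets of Step 1.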

Work Plan - Semester 2

•Extract, prepare, and preprocess a collection of social network data for experiments, using different APIs;
•Study and select machine learning (ML) clustering algorithms and feature selection (FS) algorithms for building the unsupervised model that groups locations in the region of study;
•Analyze experimental results, e.g., study parameter values and compare the performance on the reduced datasets against previous results;
•Writing of a scientific article;
•Writing of the thesis.
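
For the feature selection (FS) part of the second semester, a filter method is one simple unsupervised starting point. The sketch below first drops near-constant features and then drops features that correlate strongly with a feature already kept, scanning columns in order. The thresholds and the greedy order are assumptions. The variance filter should run on unscaled features, since every feature has unit variance after z-scoring.

```python
import math
import statistics


def pearson(x, y):
    """Pearson correlation of two non-constant sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def select_features(vectors, names, min_variance=1e-3, max_abs_corr=0.95):
    """Greedy unsupervised filter: drop near-constant features, then features highly
    correlated with one already kept (scanned in column order)."""
    columns = [list(col) for col in zip(*vectors)]
    kept = []
    for j, col in enumerate(columns):
        if statistics.pvariance(col) < min_variance:
            continue  # carries almost no information for grouping locations
        if any(abs(pearson(col, columns[k])) > max_abs_corr for k in kept):
            continue  # redundant with a feature already selected
        kept.append(j)
    return [names[j] for j in kept], [[row[j] for j in kept] for row in vectors]


X = [[1, 5, 2, 10], [2, 5, 4, 7], [3, 5, 6, 9]]
names = ["a", "constant", "a_times_2", "b"]
print(select_features(X, names))
```

Comparing clustering results on the reduced and full feature sets, as the work plan suggests, shows whether the discarded features mattered.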

Conditions

This work will be carried out in collaboration between the Ambient Intelligence Lab (AmILab) and the Laboratory of Neural Networks (LARN) of CISUC (Center of Informatics and Systems of the University of Coimbra), with regular supervision and feedback from the supervisor and co-supervisor.

Familiarity with machine learning and data mining algorithms and software tools is essential. Participating students will acquire valuable knowledge and experience in model building and data science by mining massive datasets, skills that are currently in high demand among technology employers because of their relevance to many applications. Students unfamiliar with GIS will receive tutoring from the supervisor.

Remarks

A scholarship grant will become available during the internship, depending on the candidate's first-semester evaluation. The grant is funded by AICEP (6 months × 745 euros). This project is part of a proposal submitted to FCT in 2018 under the AI & Data Science Initiative program (https://www.fct.pt/apoios/projectos/concursos/datascience/index.phtml.en). Candidates are ranked for the grant based on an interview score and their CV.

Logistics @ Ambient Intelligence Lab (AmILab) and Laboratory of Neural Networks (LARN)
DEI-FCTUC

Supervisors

Supervisor: Ana Alves (ana@dei.uc.pt)
Co-supervisor: Bernardete Ribeiro (bribeiro@dei.uc.pt)