Internship Title
Modeling Events Information
Technological Area
Pattern Recognition
Internship Location
DEI
Context
Nowadays, the most relevant events in the city are advertised online. However, this information is usually published as unstructured natural-language text, which makes it hard to exploit its full potential, for example in modeling urban mobility.
At AmILab/CMS, we have been using Natural Language Processing (NLP) techniques to transform this unstructured information into structured knowledge that can be used, for example, to estimate the impact of events on transportation systems.
In our project CROWDS (funded by FCT, reference PTDC/EIA-EIA/115014/2009), one of the main tasks is to correlate mobility with event semantics. With rich information on events, we should be able to understand their mobility implications. This project involves collaboration with the Massachusetts Institute of Technology (MIT).
The role of the student in this project is to work on the event analysis side, particularly on modeling event information using Machine Learning techniques such as probabilistic topic models and supervised Latent Dirichlet Allocation.
Objective
Building on work currently being developed in our lab in connection with MIT and SMART, the goal is to use Machine Learning approaches (e.g. Latent Dirichlet Allocation) to understand the underlying topics of event descriptions (e.g. is it more about Music or more about Comedy?) and how these relate to other event characteristics such as price, attendance (tickets sold), target audience, categories, and ultimately impact on mobility.
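As an illustration of the kind of topic modeling mentioned in the objective, the following is a minimal sketch of Latent Dirichlet Allocation fitted with collapsed Gibbs sampling in plain Python. It is not the lab's implementation: the function name `fit_lda`, the hyperparameter values (`alpha`, `beta`, `iterations`) and the four toy event descriptions are all assumptions made up for this example.

```python
import random
from collections import Counter


def fit_lda(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling over tokenized documents.

    Returns (theta, topic_word): smoothed per-document topic proportions
    and per-topic word counts.
    """
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})

    # z[d][i] is the topic currently assigned to token i of document d.
    z = [[rng.randrange(num_topics) for _ in doc] for doc in docs]
    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [Counter() for _ in range(num_topics)]
    topic_total = [0] * num_topics
    for d, doc in enumerate(docs):
        for w, k in zip(doc, z[d]):
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                # Remove this token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Unnormalized conditional probability of each topic.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                z[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1

    theta = [
        [(c + alpha) / (len(doc) + num_topics * alpha) for c in counts]
        for doc, counts in zip(docs, doc_topic)
    ]
    return theta, topic_word


# Hypothetical event descriptions, invented for illustration only.
docs = [
    "live band concert guitar music".split(),
    "jazz music concert live piano".split(),
    "stand up comedy night jokes".split(),
    "comedy show jokes improv night".split(),
]
theta, topic_word = fit_lda(docs, num_topics=2)
for doc, proportions in zip(docs, theta):
    print(" ".join(doc), [round(p, 2) for p in proportions])
```

Each row of `theta` is one event's distribution over topics, which is the quantity that would then be related to price, attendance, and the other event characteristics. On a real corpus, an established library implementation would likely be a better choice than this hand-written sampler.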
Work Plan - Semester 1
The tentative plan for this project (semester 1) is the following:
- October 15th - State of the art. (1.5 months)
- October 31st - Understanding previous work in the lab. (2 months)
- November 30th - Topic modeling over events descriptions. (1 month)
- December 15th - Finding correlations between the distribution of events over topics and other event characteristics. (1 month)
- January 15th - Journal paper submission. (2 months)
- January 31st - Intermediate report. Plan for new techniques to explore in the following semester. Possible techniques include supervised Latent Dirichlet Allocation (sLDA) and max-margin Latent Dirichlet Allocation (MedLDA). (1 month)
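The correlation step in the plan above could start from something as simple as a Pearson correlation between one topic's share in each event description and a numeric event characteristic. The sketch below assumes this approach; the helper name `pearson` and the sample values for topic share and ticket price are hypothetical and are not project data.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical values, invented for illustration only:
# share of a "Music" topic per event, and that event's ticket price.
music_share = [0.9, 0.7, 0.4, 0.1]
ticket_price = [40.0, 30.0, 20.0, 10.0]

r = pearson(music_share, ticket_price)
print(f"correlation between Music share and price: {r:.3f}")
```

A coefficient near 1 or -1 would suggest a strong linear relationship and a value near 0 a weak one. Categorical characteristics such as target audience would need a different measure.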
Work Plan - Semester 2
The tentative plan for this project (semester 2) is the following:
- March 31st - Experimentation with new techniques according to the previously defined plan. (2 months)
- May 31st - Experiments report. Paper submission. (2 months)
- June 30th - MSc thesis delivery. (1 month)
Requirements
Strong programming skills (Java, Python, C/C++).
Other interesting (optional) skills/interests include Machine Learning and Natural Language Processing techniques.
Willingness to communicate in English with other researchers is also important.
Remarks
The candidate's curriculum vitae (CV) is required.
The CROWDS project comprises one or more short internships (1-2 months) at MIT.
There is also a possibility of funding for this work.
Supervisors
Francisco Câmara Pereira and Filipe Rodrigues (CISUC)
camara@dei.uc.pt