Internship Title
Ensemble Learning for Keyword Extraction
Technological Area
Pattern Recognition
Internship Location
DEI
Background
Nowadays, the most relevant events in a city are advertised online. However, it is often difficult to know exactly what is happening in a given place: the information is spread across too many websites and is frequently hard to understand.
In the AmILab/CMS, we have been working on the "enrichment" of place and event information. An example of enrichment applied to events would be to know (for example from Wikipedia (DBpedia)) that a specific rock group has a specific style, a famous song, etc., even if this is not advertised online. Another example would be to infer the popularity of the event (from Google hits and opinion mining).
In our project CROWDS (funded by FCT, reference PTDC/EIA-EIA/115014/2009), one of the main tasks is to correlate mobility with event semantics. With rich information on events, we should be able to understand their mobility implications. This project involves collaboration with the Massachusetts Institute of Technology (MIT).
The role of the student in this project is to work on the event analysis side, particularly on inferring popularity of events and mining for other information.
Objective
Building on work that is currently being developed in our lab in connection with SMART and MIT, the goal is to apply supervised learning approaches (e.g. Bayesian model averaging) to the results of known keyword extraction systems (OpenCalais, TextWise, SemanticHacker, Annie, KUSCO) in order to combine them into a single set of keywords for each event description.
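As a rough illustration of the ensembling idea, the sketch below combines the keyword sets returned by several extractors through weighted voting. This is a deliberately simpler stand-in for the supervised approaches mentioned above (it is not Bayesian model averaging). The function name, the weights and the threshold are hypothetical, and the inputs are assumed to be plain lists of keywords rather than the actual response formats of these systems. In a supervised setting, the weights would be learned from annotated event descriptions instead of being set by hand.

```python
from collections import defaultdict


def ensemble_keywords(extractor_outputs, weights, threshold=0.5):
    """Combine keyword sets from several extractors by weighted voting.

    extractor_outputs: dict mapping extractor name -> iterable of keywords
    weights: dict mapping extractor name -> non-negative weight
             (hypothetical values; a learned model would supply these)
    threshold: minimum normalised score for a keyword to be kept
    """
    total = sum(weights[name] for name in extractor_outputs)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")

    scores = defaultdict(float)
    for name, keywords in extractor_outputs.items():
        # Normalise case and deduplicate so each extractor votes
        # at most once per keyword.
        for keyword in {k.strip().lower() for k in keywords}:
            scores[keyword] += weights[name] / total

    # Keep keywords above the threshold, highest score first
    # (ties broken alphabetically for a deterministic result).
    kept = [(kw, s) for kw, s in scores.items() if s >= threshold]
    return sorted(kept, key=lambda item: (-item[1], item[0]))
```

For instance, with made-up outputs from three extractors and weights of 2, 1 and 1, a keyword proposed only by one low-weight extractor receives a score of 0.25 and is dropped at the default threshold, while a keyword proposed by all three receives a score of 1.0.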
Work Plan - Semester 1
The tentative plan for this project (semester 1) is the following:
- October 15th - State of the art (1.5 months)
- October 31st - Understanding previous work in the lab (2 months)
- November 30th - Keyword Extraction ensembling using available keyword extraction systems, Part I. (1 month)
- December 15th - Experimentation. (1 month)
- January 31st - Intermediate report. Plan for the following months, defining which supervised learning algorithms are to be applied to the system. Further experimentation and validation. (1 month)
Work Plan - Semester 2
The tentative plan for this project (semester 2) is the following:
- April 30th - Implementation of supervised learning algorithms (2 months)
- May 31st - Experiments and validation report. Journal paper submission. (2 months)
- June 30th - MSc thesis delivery. (1 month)
Requirements
Strong skills in programming (Java, Python, C/C++).
Other interesting (optional) skills/interests include Machine Learning and Natural Language Processing techniques.
Willingness to communicate in English with other researchers is also important.
Remarks
The candidate's curriculum vitae is required.
The CROWDS project includes one or more short internships (1-2 months) at MIT.
There is also a possibility of funding for this work.
Supervisor
Francisco Câmara Pereira and Ana Alves (CISUC)
camara@dei.uc.pt