Internship Title
Parsimonious sensing with Active Learning: applications with context mining and environmental sensing
Areas of Specialization
Intelligent Systems
Internship Location
AmIlab
Background
The Internet is an excellent oracle for events in the real world. From weather information to social events, and from mundane events such as traffic incidents to historic ones such as the Arab Spring or the swine flu, it has become the first place where people publish and look for information. Given this correlation with real-world phenomena, it can be a valuable source for developing predictive models, for example for traffic prediction. In other words, the Internet can help us understand when, where and why events happen, and this information can be relevant for predictive applications.
For each real-world event, the related web-based data is its context. A context is dynamic whenever it can change over time; an example is the statistics of tweets about a certain event over time. Our context mining project explores machine learning and information retrieval methodologies to capture static and dynamic context data from the web and bring it into predictive models.
Objective
Given a certain event for which we want to collect data, a trivial approach to dynamic context mining is to generate all possible combinations of queries that relate to this event (e.g. its title, location, type, performer, etc.). However, when we have thousands of future events to target, as in the case of cities like London, Boston or Singapore, serious scalability issues arise: the limits of the available APIs (e.g. Google, Facebook, Twitter) are quickly reached and data collection stops.
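To give a rough sense of how quickly the trivial approach grows, the sketch below enumerates one query per non-empty subset of an event's fields. The event record, its field names and the figure of 5000 events are hypothetical and only serve as an illustration; they are not taken from the project's dataset.

```python
from itertools import combinations
from typing import Dict, List


def all_queries(event: Dict[str, str]) -> List[str]:
    """Build one query string per non-empty subset of the event's fields."""
    values = list(event.values())
    queries = []
    for size in range(1, len(values) + 1):
        for subset in combinations(values, size):
            queries.append(" ".join(subset))
    return queries


# Hypothetical event record; the field names are illustrative only
event = {
    "title": "Summer Rock Night",
    "location": "Boston",
    "type": "concert",
    "performer": "Band A",
}
queries = all_queries(event)
print(len(queries))         # 15 queries for 4 fields (2**4 - 1)
print(len(queries) * 5000)  # 75000 queries for a hypothetical 5000 events
```

Even with only four fields per event, a daily API quota would probably be exhausted long before every query had been issued once.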
Our approach is to reframe the problem from a sampling perspective: given a certain budget of daily API calls and a much bigger set of possible queries, can we define the subset of queries that covers the largest possible amount of useful information? We can exploit relationships between events (e.g. multiple rock events can be associated with a small set of queries) as well as the temporal dynamics.
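One simple way to picture this budgeted selection is the classic greedy heuristic for maximum coverage, sketched below. It is only a baseline illustration and not the method the internship is expected to produce: it assumes we already know which events each query informs (the hypothetical `coverage` mapping), and it ignores both temporal dynamics and any active learning component.

```python
from typing import Dict, List, Set


def greedy_query_selection(coverage: Dict[str, Set[str]], budget: int) -> List[str]:
    """Pick up to `budget` queries, each time taking the one that covers
    the most events not yet covered (greedy maximum coverage)."""
    covered: Set[str] = set()
    chosen: List[str] = []
    remaining = dict(coverage)
    while remaining and len(chosen) < budget:
        # Sort the keys so that ties are broken deterministically
        best = max(sorted(remaining), key=lambda q: len(remaining[q] - covered))
        if not remaining[best] - covered:
            break  # no remaining query adds new information
        chosen.append(best)
        covered |= remaining.pop(best)
    return chosen


# Hypothetical toy data: query -> events it returns information about
coverage = {
    "rock concert boston": {"e1", "e2", "e3"},
    "band A boston": {"e1"},
    "band B boston": {"e2"},
    "jazz festival boston": {"e4", "e5"},
    "band C boston": {"e3", "e4"},
}
selected = greedy_query_selection(coverage, budget=2)
print(selected)  # ['rock concert boston', 'jazz festival boston']
```

In this toy case two shared queries cover all five events, which is the kind of saving the relationship between events is meant to provide. In practice the coverage of a query is not known in advance, which is where a learned model could decide what to sample next.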
This internship will build on the above principles while leaving plenty of room for the candidate to use their own creativity and skills and to develop a consistent body of work. It is crucial both to produce relevant scientific results and to make this a well-defined and successful internship.
Work Plan - Semester 1
•Study of the state of the art in information retrieval and of the methodologies to explore
•Preliminary analysis of the available dataset (general statistics, topic modeling)
•Design and development of the context sensor model
•Writing of the intermediate report
Work Plan - Semester 2
•Validation of the model on the available dataset
•Refinement of context sensor model
•Development of benchmark model
•Comparative validation of the model using real data
•Writing of scientific article
•Writing of the thesis
Conditions
This work will be carried out in the Ambient Intelligence Lab (AmIlab) of CISUC, with regular supervision and feedback from the supervisors.
This work will be paid through a BSc research grant.
Remarks
Depending on the quality of the work during the first semester, this project may involve a long-term international visit (3 to 6 months) at MIT in Boston or Singapore.
Supervisors
Bernardete Ribeiro, Francisco Pereira (bribeiro@dei.uc.pt, camara@mit.edu)