Internship Title
Malicious Bot Mining Detection in Twitter
Areas of Specialization
Intelligent Systems
Internship Location
Laboratory of Neural Networks (LARN-CISUC)
Background
The popularity and open structure of Twitter have attracted a large number of automated programs, known as bots. Legitimate bots generate large volumes of benign tweets, delivering news and updating feeds, while malicious bots spread spam or malicious content. Bots have been linked to effects such as influencing the stock market, causing panic during emergencies, exposing private information, contributing to the strong polarization of political discussion, distorting the perception of influence on social media, and damaging a company's reputation, to name a few.
Objective
The main objective of this proposal is to develop a Malicious Bot Detection System for the Twitter social network. The first goal comprises the following steps:
(i) Step 1: Data Collection: collect Twitter users' data and tweets for preprocessing and analysis, using the Twitter API and an open-source library.
(ii) Step 2: Feature Engineering: design hand-crafted features to represent Twitter bots, e.g., user profile, network, tweeting behavior, topic modeling, and sentiment features.
(iii) Step 3: Feature Analytics: perform data analytics to select the most informative features for model building, namely by including sentiment analysis and topic modeling.
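As an illustration of Steps 2 and 3, the following minimal sketch computes a few hand-crafted profile and tweeting-behavior features for a single account. It rests on stated assumptions: the `UserRecord` type and the `extract_features` function are hypothetical, the field names only loosely mirror the Twitter API user object, and sentiment, topic, and network features are left out for brevity.

```python
import re
from dataclasses import dataclass
from datetime import datetime, timezone

# Simple regular expressions for tweet content (illustrative, not exhaustive).
URL_RE = re.compile(r"https?://\S+")
HASHTAG_RE = re.compile(r"#\w+")
MENTION_RE = re.compile(r"@\w+")


@dataclass
class UserRecord:
    # Hypothetical, simplified account record; field names are assumptions
    # loosely modelled on the Twitter API user object.
    followers_count: int
    friends_count: int
    statuses_count: int
    created_at: datetime
    default_profile_image: bool
    tweets: list[str]


def extract_features(user: UserRecord, now: datetime) -> dict[str, float]:
    """Compute a few hand-crafted profile and tweeting-behavior features."""
    age_days = max((now - user.created_at).days, 1)  # avoid division by zero
    n = max(len(user.tweets), 1)
    return {
        # Profile features
        "follower_friend_ratio": user.followers_count / (user.friends_count + 1),
        "account_age_days": float(age_days),
        "default_profile_image": float(user.default_profile_image),
        # Tweeting-behavior features
        "tweets_per_day": user.statuses_count / age_days,
        "url_fraction": sum(bool(URL_RE.search(t)) for t in user.tweets) / n,
        "mean_hashtags": sum(len(HASHTAG_RE.findall(t)) for t in user.tweets) / n,
        "mean_mentions": sum(len(MENTION_RE.findall(t)) for t in user.tweets) / n,
        # Share of repeated tweets, a common sign of automated posting
        "duplicate_fraction": (1 - len(set(user.tweets)) / n) if user.tweets else 0.0,
    }


# Usage with a made-up, spam-like account
user = UserRecord(
    followers_count=10,
    friends_count=1999,
    statuses_count=5000,
    created_at=datetime(2020, 1, 1, tzinfo=timezone.utc),
    default_profile_image=True,
    tweets=["Buy now http://x.co #deal", "Buy now http://x.co #deal", "hi @bob"],
)
feats = extract_features(user, now=datetime(2020, 1, 11, tzinfo=timezone.utc))
print(feats)
```

Each account becomes a fixed-length numeric vector, which is the form that the feature analytics of Step 3 and the models of Step 4 would consume.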
The second goal is to develop a model able to detect malicious bots, using machine learning and pattern recognition techniques. It comprises the following steps:
(iv) Step 4: Model Building: train and evaluate a bot detection engine using machine learning techniques, especially neural networks.
(v) Step 5: Model Tuning: select appropriate ML algorithms for building the bot detection model; perform sampling and model evaluation using, e.g., accuracy, precision, recall, F-measure, ROC curves, and AUC.
(vi) Step 6: Model Validation: Validate the overall Model with real data.
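For the evaluation in Steps 5 and 6, a minimal plain-Python sketch of the listed metrics follows, assuming binary labels (1 = bot, 0 = human) and a classifier that outputs a bot score. The function names are hypothetical; in practice a library such as scikit-learn would typically be used. AUC is computed here as the probability that a randomly chosen bot receives a higher score than a randomly chosen human (ties count one half), which is equivalent to the area under the ROC curve.

```python
from typing import Sequence


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> dict[str, float]:
    """Accuracy, precision, recall and F-measure for binary labels (1 = bot)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC via pairwise comparison of bot and human scores (O(P*N), fine for a sketch)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


# Usage with made-up labels and scores, thresholded at 0.5
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.3, 0.6, 0.2, 0.1, 0.4, 0.05]
y_pred = [1 if s >= 0.5 else 0 for s in scores]
m = classification_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(m, auc)
```

Precision, recall, and F-measure depend on the chosen threshold, while AUC summarizes ranking quality over all thresholds; reporting both helps when bots are a small minority of accounts.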
Work Plan - Semester 1
•Literature Review;
•Complete the database setup framework;
•Select, prepare, and preprocess a collection of Twitter datasets for experiments;
•Preliminary analysis of bot detection on the Twitter dataset;
•Writing of intermediate report.
Work Plan - Semester 2
•Study and select machine learning (ML) and feature selection (FS) algorithms for building a predictive model that detects bots in the Twitter stream;
•Analyze experimental results, e.g., study parameter values and compare the performance on the reduced datasets with previous results;
•Writing of a scientific article;
•Writing of the thesis.
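For the feature selection (FS) study and the comparison of reduced datasets planned for the second semester, one simple filter method is the Fisher score, which ranks each feature by the ratio of between-class to within-class variance. The proposal does not name specific FS algorithms, so this is only an illustrative choice; `fisher_scores` and `select_top_k` are hypothetical helpers, and the data is made up.

```python
from statistics import fmean, pvariance


def fisher_scores(X: list[list[float]], y: list[int]) -> list[float]:
    """Fisher score per feature: between-class over within-class variance."""
    classes = sorted(set(y))
    scores = []
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        mu = fmean(col)
        between = within = 0.0
        for c in classes:
            vals = [v for v, label in zip(col, y) if label == c]
            between += len(vals) * (fmean(vals) - mu) ** 2
            within += len(vals) * pvariance(vals)
        if within > 0:
            scores.append(between / within)
        else:
            # Zero within-class spread: perfectly separating if classes differ, else useless
            scores.append(float("inf") if between > 0 else 0.0)
    return scores


def select_top_k(X: list[list[float]], y: list[int], k: int):
    """Return indices of the k highest-scoring features and the reduced dataset."""
    scores = fisher_scores(X, y)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]
    reduced = [[row[j] for j in ranked] for row in X]
    return ranked, reduced


# Usage: feature 0 separates bots (1) from humans (0); features 1 and 2 do not
X = [[0.9, 5.0, 1.0], [0.8, 3.0, 1.0], [0.1, 4.0, 1.0], [0.2, 4.0, 1.0]]
y = [1, 1, 0, 0]
scores = fisher_scores(X, y)
ranked, reduced = select_top_k(X, y, k=1)
print(scores, ranked, reduced)
```

Filter methods such as this score each feature independently of the classifier, so they are cheap to run on large datasets; wrapper or embedded methods could then be compared against them on the same reduced datasets.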
Conditions
This work will be carried out in the Laboratory of Neural Networks (LARN) of CISUC, with regular supervision and feedback from the supervisor and co-supervisors.
Familiarity with machine learning and data mining algorithms and software tools is essential. Participating students will acquire valuable knowledge and experience in model building and data science by mining massive datasets, skills that are currently in high demand among technology employers because of their relevance to many applications.
Notes
Grant opportunities may become available during the internship, depending on the candidate's first-semester evaluation.
Logistics: Laboratory of Neural Networks (LARN)
DEI-FCTUC
Supervisor
Supervisor: Catarina Silva (catarina@dei.uc.pt)
Co-supervisors: Bernardete Ribeiro; Hugo Oliveira