Internship Title
A method for Sentiment Analysis for Portuguese Informal Texts
Internship Location
DEI-FCTUC
Background
With the growing number of online shopping sites, more people shop online instead of going to stores. Since almost all online shopping sites let shoppers write and read comments about the products on sale, people looking for items to purchase read these comments and decide accordingly.
Furthermore, studies show that not only online shoppers but also offline shoppers rely on online shopping sites, reading online comments before going shopping. Online shopping sites are not the only websites that offer comments on items: forums and blogs are also huge data sources, and social networking sites like Facebook and Twitter are sometimes used to rate a service or product.
Sentiment analysis concentrates on analyzing and summarizing people's opinions, feelings and notions towards all kinds of entities, such as products, services and topics, and their aspects. Three levels of sentiment analysis are commonly studied: document-level, sentence-level and feature-level.
Document-level sentiment analysis (DLSA) aims to determine whether the overall sentiment of a document is positive or negative. Sentence-level sentiment analysis (SLSA) focuses on determining the orientation of the sentiment in each sentence. Both DLSA and SLSA are too coarse to judge the individual features of items/products; hence feature-level sentiment analysis (FLSA) was developed to find the polarity of the sentiment towards each aspect of the entities mentioned in a sentence or document.
All FLSA methods share two steps: feature extraction and feature-sentiment classification. The feature extraction step extracts features from a given text, while the feature-sentiment classification step matches the extracted features with the sentiments expressed in the text. In this work, a new method to extract the features of a topic from a given set of documents is proposed. Its main components are the frequency of nouns in the item reviews and web search.
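As a rough illustration of the frequency-based step, the sketch below counts candidate nouns across a handful of reviews and keeps those that appear in at least a minimum fraction of them. The noun lexicon `NOUNS`, the sample reviews, the function name `candidate_features` and the `min_support` threshold are all assumptions made for this example; a real system would likely use a Portuguese part-of-speech tagger instead of a fixed word list.

```python
import re
from collections import Counter

# Hypothetical noun lexicon standing in for a POS tagger (assumption).
NOUNS = {"bateria", "ecrã", "câmara", "preço", "telemóvel", "loja"}


def candidate_features(reviews, nouns=NOUNS, min_support=0.5):
    """Return the nouns that occur in at least `min_support` of the reviews."""
    doc_freq = Counter()
    for review in reviews:
        # Count each noun at most once per review (document frequency).
        tokens = set(re.findall(r"\w+", review.lower()))
        doc_freq.update(tokens & nouns)
    threshold = min_support * len(reviews)
    return sorted(n for n, c in doc_freq.items() if c >= threshold)


# Made-up informal reviews, for illustration only.
reviews = [
    "A bateria dura pouco mas o ecrã é ótimo",
    "Bateria fraca, preço alto",
    "O ecrã é lindo e a câmara tb",
    "comprei na loja, bateria ok",
]
features = candidate_features(reviews)
print(features)
```

Document frequency is used here instead of raw term frequency so that one long review cannot dominate the counts; this is a design choice of the sketch, not a requirement of the proposal.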
This proposal puts forward a new unsupervised approach to feature extraction for sentiment analysis. The main goal is to improve the performance of frequency-based feature extraction by using a search engine. Although frequency-based feature extraction methods achieve good precision and recall on formal texts, they are not very successful on informal texts. The idea is that the proposed algorithm takes the item features suggested by a frequency-based feature extraction method and then eliminates those that do not co-occur with the item whose features are sought. The approach is also expected to construct the candidate feature set in a domain-independent way. The experiments are expected to show that, for informal Portuguese texts, the approach performs much better than the frequency-based method alone.
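The filtering idea can be sketched as follows, under stated assumptions: candidate features are kept only if a search for the item together with the candidate returns enough results relative to a search for the candidate alone. The `FAKE_HITS` table stands in for a real search engine, and the ratio score, the `min_ratio` threshold and all function names are hypothetical choices for this example, not the measure the internship will necessarily adopt.

```python
# Hypothetical hit counts standing in for a real search engine (assumption).
FAKE_HITS = {
    "telemóvel": 1_000_000,
    "bateria": 400_000,
    "loja": 900_000,
    "telemóvel bateria": 120_000,
    "telemóvel loja": 9_000,
}


def hit_count(query, index=FAKE_HITS):
    """Return the number of results for `query`; unknown queries count as 0."""
    return index.get(query, 0)


def filter_by_cooccurrence(item, candidates, min_ratio=0.1, hits=hit_count):
    """Keep the candidates that co-occur with `item` often enough.

    The score is hits(item + candidate) / hits(candidate), an illustrative
    choice rather than a prescribed co-occurrence measure.
    """
    kept = []
    for cand in candidates:
        total = hits(cand)
        if total == 0:
            continue  # no evidence at all: drop the candidate
        ratio = hits(f"{item} {cand}") / total
        if ratio >= min_ratio:
            kept.append(cand)
    return kept


kept = filter_by_cooccurrence("telemóvel", ["bateria", "loja", "preço"])
print(kept)
```

Passing the lookup as the `hits` parameter keeps the filter independent of any particular search service, so a real backend could be swapped in without touching the filtering logic.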
Objectives
In practice, the expected outcomes of this internship are:
- Develop approaches for sentiment analysis;
- Provide an approach for sentiment analysis of informal Portuguese texts;
- Define an ontology to support sentiment analysis by classification algorithms;
- Write a research paper, to be submitted to and presented at a top international conference, describing the approach and the main results obtained from the experiments.
Work Plan - Semester 1
[Some tasks might overlap; M=Month]
T1 (M1 – M3): Knowledge transfer and state-of-the-art literature review on sentiment analysis.
T2 (M3): Identification of the data set to be used in the experiments.
T3 (M3 – M4): Experiments with the approaches gathered in task T1 on the data set found in task T2.
T5 (M5): Writing the intermediate report.
Work Plan - Semester 2
[Some tasks might overlap; M=Month]
T6 (M6): Integration of the intermediate defense comments and completion of the sentiment analysis.
T7 (M6 – M7): Implementation of the approach for sentiment analysis to informal Portuguese texts, and execution of tests.
T8 (M8): Execution of experiments and analysis of results.
T9 (M9): Writing a research paper and submitting it to a top international venue in the Text Mining area (Database Systems - TODS, Database Systems for Advanced Applications - DASFAA, IEEE International Conference on Data Engineering - ICDE, IEEE Transactions on Knowledge and Data Engineering, etc.).
T10 (M10): Writing the thesis.
Conditions
The work will be carried out in the facilities of the Department of Informatics Engineering at the University of Coimbra (CISUC - Software and Systems Engineering Group), where a workspace and the necessary computer resources will be provided.
Remarks
A scholarship may be available (value to be defined) for at least part of the duration of the internship.
Supervisor
Rodrigo Rocha Silva
rrochas@dei.uc.pt