Abstract:
Sentiment classification is an emerging research field. Due to the rich opinionated web content, people and organizations are interested in knowing others' opinions, so they need
an automated tool for analyzing and summarizing these opinions. One of the major tasks of sentiment classification is to classify a document (i.e. a blog, news article or review) as holding an overall positive or negative sentiment. Machine learning approaches have
succeeded in achieving better results than semantic orientation approaches in document-level sentiment classification; however, they still need to take linguistic context into account,
by making use of the so-called contextual valence shifters. Early research has tried to add sentiment features and contextual valence shifters to the machine learning approach to tackle this problem, but the classifier's performance was low.In this study, we would like to improve the performance of document-level sentiment classification using the machine learning approach by proposing new feature sets that refine the traditional sentiment feature extraction method and take contextual valence shifters
into consideration from a different perspective than the earlier research. These feature sets include: 1) a feature set consisting of 16 features for counting different categories of contextual valence shifters (intensifiers, negators and polarity shifters) as well as the frequency of words grouped according to their final (modified) polarity; and 2) another feature set consisting of the frequency of each sentiment word after modifying its prior polarity. We performed
several experiments to: 1) compare our proposed feature sets with the traditional sentiment features that count the frequency of each sentiment word while disregarding its prior polarity; 2) compare our proposed feature sets after combining them with stylistic features and n-grams with traditional sentiment features combined with stylistic features and n-grams; and 3) evaluate the effectiveness of our proposed feature sets against stylistic features and n-grams by performing feature selection. The results of all the experiments show a significant improvement over the baselines, in terms of the accuracy, precision and recall, which indicate that our proposed feature sets are effective in document-level sentiment classification.