
This paper presents an empirical study of the efficacy of machine learning techniques for classifying text messages by semantic meaning. We use movie review comments from the popular social network Digg as our data set and classify text by subjectivity/objectivity and by negative/positive attitude. We propose several approaches to extracting text features: the bag-of-words model, using a large movie review corpus, restricting features to adjectives and adverbs, handling negations, bounding word frequencies by a threshold, and using WordNet synonym knowledge. We evaluate the effect of each approach on the accuracy of four machine learning methods: Naive Bayes, Decision Trees, Maximum Entropy, and K-Means clustering. We conclude the study by explaining the observed trends in accuracy rates and suggesting directions for future work.
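The abstract does not spell out how the feature extractors are built. As a rough illustration, the Python sketch below combines three of the listed ideas (bag-of-words features, negation handling, and a minimum word-frequency threshold) and feeds them to a multinomial Naive Bayes classifier. It is not the authors' implementation: the negation cue list, the `NOT_` prefix convention, the `min_count` parameter, and every function and class name are hypothetical. Adjective/adverb filtering and WordNet synonyms are omitted because they require a part-of-speech tagger and a lexical database.

```python
import math
import re
from collections import Counter

# Hypothetical negation cue list; the paper's actual list is not given.
NEGATIONS = {"not", "no", "never", "cannot", "n't"}
PUNCT = re.compile(r"^[.,;:!?]$")


def tokenize(text):
    """Lowercase, split "n't" off its verb, and keep clause punctuation."""
    text = text.lower().replace("n't", " n't")
    return re.findall(r"n't|[a-z']+|[.,;:!?]", text)


def mark_negation(tokens):
    """Prefix NOT_ to each word after a negation cue, up to the next punctuation.

    Negation cues and punctuation themselves are dropped from the output.
    """
    out, negated = [], False
    for tok in tokens:
        if PUNCT.match(tok):
            negated = False
        elif tok in NEGATIONS:
            negated = True
        else:
            out.append("NOT_" + tok if negated else tok)
    return out


def features(text):
    """Bag-of-words tokens with negation marking (hypothetical helper)."""
    return mark_negation(tokenize(text))


def build_vocab(docs, min_count=2):
    """Keep only words occurring at least min_count times in the training docs."""
    counts = Counter(tok for doc in docs for tok in doc)
    return {tok for tok, c in counts.items() if c >= min_count}


class NaiveBayes:
    """Multinomial Naive Bayes over word counts with add-one (Laplace) smoothing."""

    def fit(self, docs, labels, vocab):
        self.vocab = vocab
        self.classes = sorted(set(labels))
        self.log_prior, self.log_lik = {}, {}
        for c in self.classes:
            class_docs = [d for d, y in zip(docs, labels) if y == c]
            self.log_prior[c] = math.log(len(class_docs) / len(docs))
            counts = Counter(t for d in class_docs for t in d if t in vocab)
            denom = sum(counts.values()) + len(vocab)
            self.log_lik[c] = {t: math.log((counts[t] + 1) / denom) for t in vocab}
        return self

    def predict(self, doc):
        # Tokens outside the thresholded vocabulary contribute nothing.
        def score(c):
            return self.log_prior[c] + sum(
                self.log_lik[c][t] for t in doc if t in self.vocab
            )
        return max(self.classes, key=score)


if __name__ == "__main__":
    # Invented toy examples, not data from the study.
    train = [
        ("a great and fun movie", "pos"),
        ("great acting, fun plot", "pos"),
        ("not great, boring plot", "neg"),
        ("boring and not fun", "neg"),
    ]
    docs = [features(text) for text, _ in train]
    labels = [label for _, label in train]
    model = NaiveBayes().fit(docs, labels, build_vocab(docs, min_count=2))
    print(features("I don't like it"))  # ['i', 'do', 'NOT_like', 'NOT_it']
    print(model.predict(features("boring plot")))  # neg
    print(model.predict(features("great fun movie")))  # pos
```

Marking negated words as separate features lets "fun" and "not fun" count as evidence for different classes. Raising `min_count` shrinks the vocabulary but discards rare words that may still be informative; in the toy run above it already drops every `NOT_` feature.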
