Sentiment Analysis in Czech Social Media Using Supervised Machine Learning

This article provides an in-depth study of machine learning methods for sentiment analysis of Czech social media. Whereas in English, Chinese, or Spanish this field has a long history and evaluation datasets for various domains are widely available, for the Czech language no systematic research has yet been conducted. We tackle this issue and establish a common ground for further research by providing a large human-annotated Czech social media corpus. Furthermore, we evaluate state-of-the-art supervised machine learning methods for sentiment analysis. We explore different pre-processing techniques and employ various features and classifiers. Moreover, in addition to our newly created social media dataset, we also report results on other widely popular domains, such as movie and product reviews. We believe that this article will not only extend the current sentiment analysis research to another family of languages, but will also encourage competition, which could lead to the production of high-end commercial solutions.

Keywords: Social media, sentiment analysis, Czech corpus, classifier, supervised approach, Facebook

Year: 2013

Download: Full text [256 kB]

Authors of this publication:

Josef Steinberger


Josef is an associate professor at the Department of Computer Science and Engineering at the University of West Bohemia in Pilsen, Czech Republic. He is interested in media monitoring and analysis, mainly automatic text summarisation, sentiment analysis and coreference resolution.

Related Projects:


Multilingual Sentiment Analysis

Authors: Josef Steinberger
Desc.: Sentiment analysis of news and social media in multiple languages.