Gesellschaft für Informatik e.V.

Lecture Notes in Informatics

10th International Conference on Innovative Internet Community Systems (I2CS) - Jubilee Edition 2010 - P-165, 180-189 (2010).

Gesellschaft für Informatik, Bonn

Copyright © Gesellschaft für Informatik, Bonn


Classifying business types from Twitter posts using active learning

Chanattha Thongsuk, Choochart Haruechaiyasak and Phayung Meesad


Today, many companies have adopted Twitter as an additional marketing medium to advertise and promote their business activities. One possible solution for organizing a large number of posts is to classify them into predefined categories of business types. Applying standard text categorization techniques to Twitter is ineffective because each post is short (140-character limit) and because of the large amount of unlabelled data. In this paper, we propose a text categorization approach based on the active learning technique for classifying Twitter posts into three business types: airline, food, and computer & technology. Applying active learning, we start by constructing an initial text categorization model from a small set of labelled data. Using this model, we obtain more positive data instances for constructing a new model by selecting the test data that are predicted as positive. As the experimental results show, our proposed approach based on active learning increased classification accuracy over the standard text categorization approach.
Twitter, a well-known micro-blogging website, has recently gained a lot of popularity among the Web 2.0 community. Increasingly, businesses use Twitter as a new channel to promote their products, services, and other related activities. For example, many airlines use Twitter to post special flight discounts or promotions for their followers. As with many social networking websites, Twitter is considered an important part of the Web 2.0 community. Web 2.0 is a departure from traditional websites and represents a large Internet social networking group that constantly collects a large amount of online information. On a social networking website, people can follow other users based on their personal interests. Advertising on social networking websites is growing and attracting interest because it can reach many customers with low overhead costs.
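
The procedure summarized in the abstract (build an initial model from a small labelled set, add the posts it predicts as positive, then build a new model) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the Naive Bayes classifier, the confidence threshold of 0.7, the whitespace tokenizer, the toy posts, and the function names `train`, `predict`, and `expand_and_retrain`. The abstract does not state which classifier or selection rule the authors used, and this reading is closer to self-training than to query-based active learning.

```python
import math
from collections import Counter, defaultdict

def tokenize(text):
    # Assumption: simple lowercase whitespace tokenization.
    return text.lower().split()

def train(labelled):
    """Fit a multinomial Naive Bayes model from (text, label) pairs."""
    word_counts = defaultdict(Counter)  # label -> token frequencies
    class_counts = Counter()            # label -> number of posts
    vocab = set()
    for text, label in labelled:
        class_counts[label] += 1
        for tok in tokenize(text):
            word_counts[label][tok] += 1
            vocab.add(tok)
    return word_counts, class_counts, vocab

def predict(model, text):
    """Return (best_label, posterior probability of that label)."""
    word_counts, class_counts, vocab = model
    total_docs = sum(class_counts.values())
    log_scores = {}
    for label in class_counts:
        total_words = sum(word_counts[label].values())
        score = math.log(class_counts[label] / total_docs)
        for tok in tokenize(text):
            if tok in vocab:  # words never seen in training are ignored
                # Laplace (add-one) smoothing
                score += math.log((word_counts[label][tok] + 1)
                                  / (total_words + len(vocab)))
        log_scores[label] = score
    # Convert log scores to normalized probabilities.
    top = max(log_scores.values())
    probs = {lab: math.exp(s - top) for lab, s in log_scores.items()}
    norm = sum(probs.values())
    best = max(probs, key=probs.get)
    return best, probs[best] / norm

def expand_and_retrain(labelled, unlabelled, rounds=3, threshold=0.7):
    """Add confidently predicted posts to the training set, then retrain."""
    labelled = list(labelled)
    pool = list(unlabelled)
    model = train(labelled)
    for _ in range(rounds):
        remaining, added = [], 0
        for text in pool:
            label, confidence = predict(model, text)
            if confidence >= threshold:
                labelled.append((text, label))  # treat prediction as label
                added += 1
            else:
                remaining.append(text)
        pool = remaining
        if added == 0:  # nothing new to learn from
            break
        model = train(labelled)
    return model, labelled

# Toy data (invented for illustration only).
seed = [
    ("cheap flight deals to bangkok this weekend", "airline"),
    ("new pizza menu and free dessert today", "food"),
    ("laptop and tablet discount on new software", "computer & technology"),
]
pool = [
    "cheap flight deals to phuket",
    "free pizza today with any order",
    "software discount on every tablet",
    "good morning everyone",
]
model, labelled = expand_and_retrain(seed, pool)
```

In this sketch, a post that shares no vocabulary with the labelled data receives a uniform posterior and stays in the unlabelled pool, which hints at why 140-character posts are hard to classify from a small seed set.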
Twitter provides an attractive platform for advertisers to promote their brands. Customers receive information and promotions from companies and can, in turn, post their opinions or complaints to them. Twitter thus acts as a third-party provider where partners may place advertisements for their products and services. Today many brands and companies use Twitter to advertise, get customer feedback, and gain more revenue. With a large number of posts, one approach for organizing them is to apply a text categorization model. Previous research on text categorization considered textual documents such as news articles, publications, and web pages, which typically contain hundreds or thousands of words. Applying a text categorization model to Twitter is very challenging for the following reasons.

ISBN 978-3-88579-259-8
