JCSE, vol. 10, no. 4, pp.128-136, 2016
DOI: http://dx.doi.org/10.5626/JCSE.2016.10.4.128
Company Name Discrimination in Tweets using Topic Signatures Extracted from News Corpus
Beomseok Hong, Yanggon Kim, and Sang Ho Lee
Department of Computer and Information Science, Towson University, Towson, MD, USA
School of Software, Soongsil University, Seoul, Korea
Abstract: It is impossible for any human being to analyze the more than 500 million tweets that are generated per day. Lexical
ambiguities on Twitter make it difficult to retrieve the desired data and relevant topics. Most of the solutions for the word
sense disambiguation problem rely on knowledge base systems. Unfortunately, it is expensive and time-consuming to
manually create a knowledge base system, resulting in a knowledge acquisition bottleneck. To solve the knowledge acquisition
bottleneck, a topic signature is used to disambiguate words. In this paper, we evaluate the effectiveness of
various features of newspapers on the topic signature extraction for word sense discrimination in tweets. Based on our
results, topic signatures obtained from a snippet feature exhibit higher accuracy in discriminating company names than
those from the article body. We conclude that topic signatures extracted from news articles improve the accuracy of word
sense discrimination in the automated analysis of tweets.
Keywords:
Twitter; Tweet; Word sense discrimination; Topic signature
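To make the idea in the abstract concrete, the following is a minimal sketch of how a topic signature might be built from news snippets and then used to decide whether a tweet mentions a company. It is an illustration under stated assumptions, not the authors' method: the abstract does not say how terms are weighted, so a smoothed log-ratio of term frequencies stands in for the weighting, and the function names (`topic_signature`, `score`, `is_company`), the threshold, and the toy snippets are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def topic_signature(relevant_docs, background_docs, top_k=10):
    """Return up to top_k terms that are over-represented in relevant_docs.

    Assumption: the weight is a smoothed log-ratio of relative term
    frequencies, chosen only for illustration.
    """
    rel = Counter(t for d in relevant_docs for t in tokenize(d))
    bg = Counter(t for d in background_docs for t in tokenize(d))
    rel_total = sum(rel.values())
    bg_total = sum(bg.values())
    vocab = len(set(rel) | set(bg))

    weights = {}
    for term, count in rel.items():
        # Add-one smoothing keeps unseen background terms from dividing by zero.
        p_rel = (count + 1) / (rel_total + vocab)
        p_bg = (bg[term] + 1) / (bg_total + vocab)
        weight = math.log(p_rel / p_bg)
        if weight > 0:
            weights[term] = weight

    top = sorted(weights.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
    return dict(top)


def score(tweet, signature, target):
    """Sum the signature weights of the tweet's tokens, skipping the target word."""
    return sum(signature.get(t, 0.0) for t in tokenize(tweet) if t != target)


def is_company(tweet, signature, target, threshold=0.0):
    """Label the tweet as company-related if its score exceeds the threshold."""
    return score(tweet, signature, target) > threshold


# Hypothetical toy data: snippets about the company vs. other uses of the word.
company_snippets = [
    "Apple unveils new iPhone as shares rise",
    "Apple stock climbs after iPhone sales beat forecasts",
]
background_snippets = [
    "Apple pie recipe with cinnamon and sugar",
    "Farmers harvest apple orchards this fall",
]

signature = topic_signature(company_snippets, background_snippets)
print(is_company("new apple iphone is great", signature, "apple"))
print(is_company("baked an apple pie today", signature, "apple"))
```

In this sketch the ambiguous word itself is excluded from scoring, since it appears in both senses and carries no discriminating signal; only the surrounding context words contribute.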