JCSE, vol. 4, no. 2, pp.110-127, June, 2010
Representation of Texts into String Vectors for Text Categorization
School of Computer and Information Engineering, Inha University, Korea
Abstract: In this study, we propose a method for encoding documents into string vectors, instead ofnumerical vectors. A traditional approach to text categorization usually requires encodingdocuments into numerical vectors. The usual method of encoding documents therefore causestwo main problems: huge dimensionality and sparse distribution. In this study, we modify orcreate machine learning-based approaches to text categorization, where string vectors arereceived as input vectors, instead of numerical vectors. As a result, we can improve textcategorization performance by avoiding these two problems.
Full Paper: 34 Downloads, 849 View