| JCSE, vol. 4, no. 2, pp.110-127, 2010 DOI:    Representation of Texts into String Vectors for Text Categorization  Taeho JoSchool of Computer and Information Engineering, Inha University, Korea
 Abstract: In this study, we propose a method for encoding documents into string vectors, instead ofnumerical vectors. A traditional approach to text categorization usually requires encodingdocuments into numerical vectors. The usual method of encoding documents therefore causestwo main problems: huge dimensionality and sparse distribution. In this study, we modify orcreate machine learning-based approaches to text categorization, where string vectors arereceived as input vectors, instead of numerical vectors. As a result, we can improve textcategorization performance by avoiding these two problems. Keyword: 
                                                  
                                                  No keyword
                                                  
                                                 Full Paper:  128 Downloads, 4245 View |