
JCSE, vol. 4, no. 2, pp.110-127, 2010


Representation of Texts into String Vectors for Text Categorization

Taeho Jo
School of Computer and Information Engineering, Inha University, Korea

Abstract: In this study, we propose a method for encoding documents into string vectors, instead of numerical vectors. A traditional approach to text categorization usually requires encoding documents into numerical vectors. The usual method of encoding documents therefore causes two main problems: huge dimensionality and sparse distribution. In this study, we modify or create machine learning-based approaches to text categorization, where string vectors are received as input vectors, instead of numerical vectors. As a result, we can improve text categorization performance by avoiding these two problems.
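
The contrast between the two encodings can be illustrated with a minimal sketch. The abstract does not define how a string vector is built, so this example assumes one plausible reading: a string vector is a fixed-length, ordered list of a document's most frequent words. The function names, the tokenizer, and the padding rule are hypothetical and are not taken from the paper.

```python
from collections import Counter
import re


def tokenize(text):
    # Hypothetical tokenizer: lowercase alphabetic runs only.
    return re.findall(r"[a-z]+", text.lower())


def to_numerical_vector(text, vocabulary):
    # Traditional encoding: one count per vocabulary word.
    # The length equals the vocabulary size, and most entries are 0.
    counts = Counter(tokenize(text))
    return [counts[word] for word in vocabulary]


def to_string_vector(text, d):
    # Assumed encoding: the d most frequent words, in rank order.
    # Ties are broken alphabetically; short documents are padded with "".
    counts = Counter(tokenize(text))
    ranked = sorted(counts, key=lambda word: (-counts[word], word))
    top = ranked[:d]
    return top + [""] * (d - len(top))


doc = "cat sat on the mat the cat"
vocabulary = ["cat", "dog", "the"]  # illustrative; a real vocabulary is far larger
numerical = to_numerical_vector(doc, vocabulary)
strings = to_string_vector(doc, 3)
```

Under this assumption the string vector has a small fixed length d regardless of vocabulary size, and it contains no zero entries apart from padding, which is how it would sidestep the dimensionality and sparsity problems named above.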



ⓒ Copyright 2010 KIISE – All Rights Reserved.    
Korean Institute of Information Scientists and Engineers (KIISE)   #401 Meorijae Bldg., 984-1 Bangbae 3-dong, Seo-cho-gu, Seoul 137-849, Korea
Phone: +82-2-588-9240    Fax: +82-2-521-1352