JCSE, vol. 4, no. 2, pp.110-127, 2010

Representation of Texts into String Vectors for Text Categorization

Taeho Jo
School of Computer and Information Engineering, Inha University, Korea

Abstract: In this study, we propose a method for encoding documents into string vectors instead of numerical vectors. A traditional approach to text categorization usually requires encoding documents into numerical vectors. The usual method of encoding documents therefore causes two main problems: huge dimensionality and sparse distribution. In this study, we modify or create machine learning-based approaches to text categorization, where string vectors are received as input vectors instead of numerical vectors. As a result, we can improve text categorization performance by avoiding these two problems.
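
The contrast between the two encodings can be illustrated with a minimal sketch. The abstract does not specify how a string vector is constructed, so the sketch assumes one plausible reading: a fixed-length, ordered list of a document's most frequent words. The function names (`tokenize`, `numerical_vector`, `string_vector`), the tie-breaking rule, and the sample documents are all hypothetical and are not taken from the paper.

```python
from collections import Counter

def tokenize(text):
    # Lowercase and split on whitespace; keep purely alphabetic tokens.
    return [w for w in text.lower().split() if w.isalpha()]

def numerical_vector(tokens, vocabulary):
    # Bag-of-words encoding: one count per vocabulary word.
    # Its length grows with the vocabulary and most entries are zero.
    counts = Counter(tokens)
    return [counts[w] for w in vocabulary]

def string_vector(tokens, size):
    # ASSUMPTION: a string vector is taken to be the `size` most frequent
    # words, ordered by descending frequency (ties broken alphabetically)
    # and padded with empty strings. The paper may define it differently.
    counts = Counter(tokens)
    ranked = sorted(counts, key=lambda w: (-counts[w], w))
    return ranked[:size] + [""] * max(0, size - len(ranked))

docs = [
    "stocks fell as markets reacted to rates",
    "the team won the match and the fans cheered",
]
token_lists = [tokenize(d) for d in docs]
vocabulary = sorted({w for tokens in token_lists for w in tokens})

dense = [numerical_vector(tokens, vocabulary) for tokens in token_lists]
strings = [string_vector(tokens, 3) for tokens in token_lists]

print(len(vocabulary), dense[0])  # length tracks the vocabulary size
print(strings)                    # length is fixed at 3 per document
```

Under these assumptions, each numerical vector has one entry per vocabulary word and is mostly zeros, while each string vector keeps a small fixed length regardless of vocabulary size. A classifier that consumes string vectors would also need a similarity measure defined on words rather than numbers, which this sketch does not attempt.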

 
 
ⓒ Copyright 2010 KIISE – All Rights Reserved.    
Korean Institute of Information Scientists and Engineers (KIISE)   #401 Meorijae Bldg., 984-1 Bangbae 3-dong, Seo-cho-gu, Seoul 137-849, Korea
Phone: +82-2-588-9240    Fax: +82-2-521-1352    Homepage: http://jcse.kiise.org    Email: office@kiise.org