
JCSE, vol. 4, no. 2, pp.110-127, June, 2010


Representation of Texts into String Vectors for Text Categorization

Taeho Jo
School of Computer and Information Engineering, Inha University, Korea

Abstract: In this study, we propose a method for encoding documents into string vectors, instead of numerical vectors. A traditional approach to text categorization usually requires encoding documents into numerical vectors. The usual method of encoding documents therefore causes two main problems: huge dimensionality and sparse distribution. In this study, we modify or create machine learning-based approaches to text categorization, where string vectors are received as input vectors, instead of numerical vectors. As a result, we can improve text categorization performance by avoiding these two problems.
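To make the contrast in the abstract concrete, the following is a minimal sketch of the two encodings. The abstract does not define a string vector, so the sketch assumes one plausible reading: a fixed-length, ordered list of a document's words ranked by frequency. The function names (`tokenize`, `numerical_vector`, `string_vector`), the whitespace tokenizer, and the empty-string padding are illustrative assumptions, not the paper's actual procedure.

```python
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep purely alphabetic tokens (a simplifying assumption)."""
    return [w for w in text.lower().split() if w.isalpha()]


def numerical_vector(text, vocabulary):
    """Traditional encoding: one frequency per vocabulary word.

    The vector has len(vocabulary) entries, most of them zero for any
    single document, which is the dimensionality and sparsity issue
    the abstract mentions.
    """
    counts = Counter(tokenize(text))
    return [counts[w] for w in vocabulary]


def string_vector(text, d):
    """Assumed string-vector encoding: the d most frequent words, in rank order.

    Ties are broken alphabetically; short documents are padded with
    empty strings so that every vector has exactly d elements.
    """
    counts = Counter(tokenize(text))
    ranked = sorted(counts, key=lambda w: (-counts[w], w))
    return (ranked + [""] * d)[:d]


if __name__ == "__main__":
    doc = "cat sat cat mat cat sat"
    vocab = ["cat", "dog", "mat", "sat"]  # hypothetical corpus vocabulary
    print(numerical_vector(doc, vocab))
    print(string_vector(doc, 2))
```

Under this assumption, the length of a string vector is a chosen constant `d` rather than the size of the corpus vocabulary, and its elements are words rather than mostly-zero counts. A classifier that takes such vectors as input needs a similarity measure defined over words instead of a numerical distance, which is why the abstract speaks of modifying or creating learning approaches.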

Keyword: No keyword


 
 
ⓒ Copyright 2010 KIISE – All Rights Reserved.    
Korean Institute of Information Scientists and Engineers (KIISE)   #401 Meorijae Bldg., 984-1 Bangbae 3-dong, Seo-cho-gu, Seoul 137-849, Korea
Phone: +82-2-588-9240    Fax: +82-2-521-1352    Homepage: http://jcse.kiise.org    Email: office@kiise.org