Lecture Notes in Informatics

Natural Language Processing and Information Systems, 8th International Conference on Applications of Natural Language to Information Systems, June 2003, Burg (Spreewald), Germany. P-29, 141-154 (2003).

Antje Düsterhöft (ed.), Bernhard Thalheim (ed.)

Approaches to feature selection for document categorization

Huaizhong Kou, Georges Gardarin and Karina Zeitouni


One problem faced by document categorization is that the terms present in the collection of example documents are numerous. From the point of view of coherence between the models used in document categorization, we analyse the frameworks of both the k-NN and NB categorization models and the feature selection problem. Two feature selection algorithms, CBA and IBA, are proposed. Empirical results obtained with k-NN and NB classifiers show that coherence between the models in a categorization system can benefit performance.

GI, Gesellschaft für Informatik, Bonn
ISBN 3-88579-358-X

