Approaches to feature selection for document categorization
One problem in document categorization is the large number of terms present in the collection of example documents. From the standpoint of coherence between the models used in document categorization, we analyze the frameworks of the k-nearest-neighbor (k-NN) and naive Bayes (NB) categorization models together with the feature selection problem. We propose two feature selection algorithms, CBA and IBA. Empirical results obtained with k-NN and NB classifiers show that coherence between the models in a categorization system can improve performance.