Gesellschaft für Informatik e.V.

Lecture Notes in Informatics


Information Systems Technology and its Applications, ISTA'2005, 4th International Conference, 23-25 May 2005, Palmerston North, New Zealand. GI 2005 P-63, 19-30 (2005).

GI, Gesellschaft für Informatik, Bonn
2005


Editors

Roland Kaschek, Heinrich C. Mayr, Stephen Liddle (eds.)


Copyright © GI, Gesellschaft für Informatik, Bonn

Contents

On the impact of document representation on classifier performance in e-mail categorization

H. Berger, M. Koehle and D. Merkl

Abstract


This paper analyzes multi-class e-mail categorization performance. To this end, it compares the quality of various classification algorithms on two distinct document representation formalisms: a standard word-based document representation and a character n-gram document representation. The latter is regarded as highly noise-tolerant and was originally proposed for automatic language identification and as a convenient means of producing compact document indices. Furthermore, the paper explores the impact of using available e-mail-specific meta-information on classification performance and presents the findings.
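
The two representation formalisms named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the tokenization rule, the n-gram length of 3, and the overlap measure are all assumptions chosen for illustration, since the abstract does not specify them.

```python
from collections import Counter
import re


def word_features(text):
    """Word-based representation: count lowercase alphanumeric tokens.
    (Hypothetical tokenization; the abstract does not specify one.)"""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def char_ngram_features(text, n=3):
    """Character n-gram representation: count every window of n
    consecutive characters after lowercasing and collapsing whitespace.
    (n=3 is an assumed value, not taken from the paper.)"""
    normalized = " ".join(text.lower().split())
    return Counter(normalized[i:i + n] for i in range(len(normalized) - n + 1))


def overlap(a, b):
    """Fraction of the feature occurrences in a that also occur in b."""
    total = sum(a.values())
    return sum((a & b).values()) / total if total else 0.0


if __name__ == "__main__":
    clean = "meeting tomorrow"
    noisy = "meting tomorow"  # two misspelled words
    print("word overlap:   ", overlap(word_features(clean), word_features(noisy)))
    print("trigram overlap:", overlap(char_ngram_features(clean), char_ngram_features(noisy)))
```

With the misspelled input, the word-based features share nothing with the clean text, while most character trigrams survive. This is one way to read the abstract's remark that n-grams are regarded as noise-tolerant.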



GI, Gesellschaft für Informatik, Bonn
ISBN 3-88579-392-X

