Gesellschaft für Informatik e.V.

Lecture Notes in Informatics


Information Systems Technology and its Applications, International Conference ISTA'2003, June 19-21, 2003, Kharkiv, Ukraine, Proceedings. P-30, 47-58 (2003).

GI, Gesellschaft für Informatik, Bonn
2003


Editors

Mikhail Godlevsky (ed.), Stephen W. Liddle (ed.), Heinrich C. Mayr (ed.)


Copyright © GI, Gesellschaft für Informatik, Bonn

Contents

Using neighborhood information for automated categorization of web pages

Nadejda Panteleeva

Abstract


In this paper we discuss several issues related to how expanding a Web document's representation affects the quality of topical categorization of Web pages. We consider expanding a Web page with the text content of its linking pages. We show that naive expansion can pick up too much noise and substantially harm categorization results. We present an approach to the automated pruning of linking Web pages. We report that forming a Web page representation with our approach always leads to better results than traditional categorization of the single Web page alone.
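The abstract does not state the pruning criterion. The following is therefore only a minimal sketch of the general idea, not the author's method. It assumes bag-of-words term vectors and uses a cosine-similarity threshold as a stand-in pruning rule. The functions `term_vector`, `cosine` and `expand_representation` are hypothetical, as are the parameters `threshold` and `weight`. A linking page is discarded when its similarity to the target page falls below `threshold`. The terms of each remaining page are added to the target page's vector, scaled down by `weight`.

```python
import math
from collections import Counter


def term_vector(text: str) -> Counter:
    # Simple lowercase whitespace tokenization (assumption); a real system
    # would also apply stemming and stop-word removal.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse term-frequency vectors.
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def expand_representation(page_text, linking_texts, threshold=0.1, weight=0.5):
    """Expand a page's term vector with text from its linking pages.

    Hypothetical pruning rule (assumption, not from the paper): keep a
    linking page only if its cosine similarity to the target page is at
    least `threshold`. Kept pages contribute their terms scaled by `weight`.
    """
    base = term_vector(page_text)
    expanded = Counter(base)
    kept = []
    for text in linking_texts:
        vec = term_vector(text)
        if cosine(base, vec) >= threshold:
            kept.append(text)
            for term, count in vec.items():
                expanded[term] += weight * count
    return expanded, kept


page = "jazz music concerts and jazz festivals"
links = ["upcoming jazz concerts in the city", "buy cheap shoes online"]
expanded, kept = expand_representation(page, links)
print(kept)  # only the on-topic linking page survives pruning
```

Setting `threshold=0.0` reproduces naive expansion. Every linking page is then kept, including off-topic ones such as the shop page above, which is the kind of noise the abstract warns about.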



ISBN 3-88579-359-8

