Lecture Notes in Informatics

INFORMATIK 2003 - Innovative Informatikanwendungen, Band 2, Beiträge der 33. Jahrestagung der Gesellschaft für Informatik e.V. (GI), 29. September - 2. Oktober 2003 in Frankfurt am Main. P-35, 286-291 (2003).

Klaus R. Dittrich (ed.), Wolfgang König (ed.), Andreas Oberweis (ed.), Kai Rannenberg (ed.), Wolfgang Wahlster (ed.)

Classifying documents by distributed P2P clustering

Martin Eisenhardt, Wolfgang Müller and Andreas Henrich


Clustering documents into classes is an important task in many Information Retrieval (IR) systems. The resulting grouping makes it possible to describe the contents of a document collection in terms of the classes its documents fall into. Such a compact description is even more desirable when the document collection is spread across different computers and locations: document classes can then be used to describe each partial document collection in a conveniently short form that can easily be exchanged with other nodes on the network. Unfortunately, most clustering schemes cannot easily be distributed. In addition, the cost of transferring all data to a central clustering service is prohibitive in large-scale systems. In this paper, we introduce an approach that is capable of classifying documents distributed across a Peer-to-Peer (P2P) network. We present measurements taken on a P2P network using synthetic and real-world data sets.

ISBN 3-88579-364-4

