Gesellschaft für Informatik e.V.

Lecture Notes in Informatics


11th International Conference on Innovative Internet Community Services (I²CS 2011), P-186, pp. 139-148 (2011).

Gesellschaft für Informatik, Bonn
2011


Copyright © Gesellschaft für Informatik, Bonn


Topic detection based on the PageRank's clustering property

Mario Kubek and Herwig Unger

Abstract


This paper introduces a method for clustering graphs of semantically related terms extracted from texts by means of PageRank calculations. The method is intended for text mining tasks such as automatically discovering the different topics in a text corpus. It is evaluated empirically on real text corpora. The results show that this application of the PageRank formula yields a suitable clustering, with the mean similarity between the terms in each cluster reaching a high level. A special state transition in the mean term similarity, observed when analysing texts that contain stopwords, is also discussed.
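
To make the underlying computation concrete, the following minimal Python sketch runs the standard PageRank iteration on a small graph of co-occurring terms. It is an illustration under stated assumptions and not the authors' implementation: the abstract does not say how the term graph is built or how clusters are derived from the PageRank values, so the sentence-level co-occurrence rule, the function names cooccurrence_graph and pagerank, the damping factor of 0.85, and the sample sentences are all hypothetical. The sketch stops at the PageRank scores and does not attempt the clustering step.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence_graph(sentences):
    """Build an undirected graph linking terms that share a sentence.

    Assumption: sentence-level co-occurrence stands in for
    'semantically related'; the paper may use a different criterion.
    """
    graph = defaultdict(set)
    for sentence in sentences:
        terms = set(sentence.lower().split())
        for a, b in combinations(sorted(terms), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def pagerank(graph, damping=0.85, iterations=50):
    """Standard PageRank by power iteration on an undirected graph.

    Every node in the graph has at least one neighbour, so there are
    no dangling nodes to handle. The damping value is an assumption.
    """
    nodes = list(graph)
    n = len(nodes)
    if n == 0:
        return {}
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        new_rank = {}
        for node in nodes:
            # Each neighbour shares its rank equally among its own neighbours.
            incoming = sum(rank[nb] / len(graph[nb]) for nb in graph[node])
            new_rank[node] = (1 - damping) / n + damping * incoming
        rank = new_rank
    return rank


# Hypothetical toy corpus: two short "sentences" of terms.
sentences = ["apple banana", "banana cherry"]
graph = cooccurrence_graph(sentences)
ranks = pagerank(graph)

# Print terms from highest to lowest PageRank score.
for term, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(f"{term}: {score:.4f}")
```

In this toy graph the term that co-occurs with both other terms receives the highest score. How such scores would be turned into topic clusters is left open here, since the abstract does not specify it.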



ISBN 978-3-88579-280-2

