Gesellschaft für Informatik e.V.

Lecture Notes in Informatics


Natural Language Processing and Information Systems, 8th International Conference on Applications of Natural Language to Information Systems, June 2003, Burg (Spreewald), Germany. P-29, 228-234 (2003).

GI, Gesellschaft für Informatik, Bonn
2003


Editors

Antje Düsterhöft (ed.), Bernhard Thalheim (ed.)


Copyright © GI, Gesellschaft für Informatik, Bonn

Contents

Keyword extraction for text characterization

Ingrid Renz, Andrea Ficzay and Holger Hitzler

Abstract


Keywords are a valuable means of characterizing texts. To extract keywords, we propose an efficient and robust, language- and domain-independent approach based on small word parts (quadgrams). The basic algorithm can be improved by re-examining and re-ranking keywords using edit distance (i.e. Levenshtein distance) and an algorithm based on the relativistic addition of velocities (here: weights). For evaluation, we compare our approach to frequency-based keyword extraction on a sample text collection of 45,000 intranet documents in German and English.
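
The abstract names three building blocks: quadgrams, Levenshtein distance, and the relativistic addition of weights. The following Python sketch illustrates each of them in isolation and shows one possible way to combine them. It is not the authors' implementation: the abstract does not say how the parts interact, so the function names, the merge strategy in merge_similar, the threshold max_dist, and the assumption that weights are normalized to the interval [0, 1) are all hypothetical choices made for illustration.

```python
def quadgrams(word):
    """Return the overlapping 4-character substrings of a word (lowercased)."""
    w = word.lower()
    return [w[i:i + 4] for i in range(len(w) - 3)]


def levenshtein(a, b):
    """Edit distance via the standard row-by-row dynamic programme."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def combine_weights(u, v):
    """Relativistic velocity addition with c = 1.

    For u, v in [0, 1) the result is at least max(u, v) and stays below 1.
    """
    return (u + v) / (1.0 + u * v)


def merge_similar(weighted, max_dist=1):
    """Hypothetical re-ranking step (an assumption, not taken from the paper):
    fold each keyword into an already accepted keyword within max_dist edits
    and combine their weights relativistically."""
    merged = []  # list of [keyword, weight], strongest candidates first
    for word, weight in sorted(weighted.items(), key=lambda kv: -kv[1]):
        for entry in merged:
            if levenshtein(word, entry[0]) <= max_dist:
                entry[1] = combine_weights(entry[1], weight)
                break
        else:
            merged.append([word, weight])
    return sorted(((w, s) for w, s in merged), key=lambda e: -e[1])


if __name__ == "__main__":
    print(quadgrams("keyword"))
    print(levenshtein("keyword", "keywords"))
    print(merge_similar({"keyword": 0.5, "keywords": 0.5, "text": 0.3}))
```

One reason the relativistic formula may suit weight combination is that it is bounded: two strong weights reinforce each other, yet the combined weight never reaches 1, so a normalized ranking cannot overflow.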



GI, Gesellschaft für Informatik, Bonn
ISBN 3-88579-358-X

