Gesellschaft für Informatik e.V.

Lecture Notes in Informatics

Business Information Systems, 9th International Conference on Business Information Systems (BIS 2006), May 31 - June 2, 2006, Klagenfurt, Austria, P-85, 342-352 (2006).



Witold Abramowicz, Heinrich C. Mayr (eds.)


Expected utility of content blocks in web content extraction

Marek Kowalkiewicz


In this paper we discuss the possible application of three new concepts in web content extraction: utility assessment, utility annealing, and dynamic aggregated document generation. After an analysis of the state of the art in web content extraction, the results of a survey among Polish managers are presented. The discussion covers a web content extraction system with possible extensions that may help tackle the information overload problem. These extensions go beyond the current state of the art. Utility assessment takes an economic view of the value of information, while utility annealing removes content blocks whose information has already been acquired from other content blocks. Combining existing content block extraction technology with the new concepts proposed in the paper makes it possible to generate aggregated documents dynamically.


ISBN 3-88579-179-X
