Expected utility of content blocks in web content extraction
In this paper we discuss the possible application of new concepts in web content extraction: utility assessment, utility annealing, and dynamic aggregated document generation. After analysis of the state of the art in web content extraction, results of a survey study among Polish managers are presented. The discussion covers a web content extraction system with possible extensions that may help tackle the information overload problem. The discussed extensions go beyond current state of the art. Utility assessment considers economical view on value of information, while utility annealing allows for removing content blocks that cover information already acquired from other content blocks. Due to the existing content block extraction technology and new concepts proposed in the paper, it is possible to dynamically generate aggregated documents.
Full Text: PDF