Detecting source topics by analysing directed co-occurrence graphs
This paper describes a new method for determining the sources of topics that influence the main topics in texts by analysing directed co-occurrence graphs with an extended version of the HITS algorithm. The method can also be used to identify characteristic terms in texts. To obtain the required directed term relations, the notion of term association is introduced to cover asymmetric real-life relationships between concepts, and it is described how these associations can be calculated by statistical means. The experiments show that the detected source topics and characteristic terms can be used to find similar documents, and documents that mainly deal with those topics and terms, in large corpora such as the World Wide Web. Applying the method iteratively to documents from these corpora makes it easy to follow topics. Interactive search systems can thus offer users a new search function that goes beyond a simple presentation of similar documents. This application is elaborated on as well.
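To make the pipeline summarised above more concrete, the following Python sketch builds a small directed term graph and runs the standard, weighted HITS iteration on it. It is an illustration under stated assumptions and not the method of the paper: the abstract does not specify the association measure or the HITS extension, so the sketch assumes a simple asymmetric measure (the share of sentences containing term a that also contain term b) and uses plain HITS. The function names `term_associations`, `directed_graph` and `hits` are hypothetical.

```python
from collections import Counter
from itertools import combinations
import math


def term_associations(sentences):
    """Assumed asymmetric measure: assoc[(a, b)] = count(a with b) / count(a)."""
    term_count = Counter()
    pair_count = Counter()
    for sentence in sentences:
        terms = set(sentence)
        term_count.update(terms)
        for a, b in combinations(sorted(terms), 2):
            pair_count[(a, b)] += 1
    assoc = {}
    for (a, b), n in pair_count.items():
        assoc[(a, b)] = n / term_count[a]
        assoc[(b, a)] = n / term_count[b]
    return assoc


def directed_graph(assoc):
    """Keep only the stronger direction of each term pair (ties keep both)."""
    return {(a, b): w for (a, b), w in assoc.items() if w >= assoc[(b, a)]}


def _normalise(scores):
    """Scale a score dictionary to unit Euclidean length."""
    norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
    return {n: v / norm for n, v in scores.items()}


def hits(edges, iterations=50):
    """Standard weighted HITS; returns (hub, authority) score dictionaries."""
    nodes = sorted({n for edge in edges for n in edge})
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority update: sum of weighted hub scores over incoming edges.
        new_auth = {n: 0.0 for n in nodes}
        for (a, b), w in edges.items():
            new_auth[b] += w * hub[a]
        auth = _normalise(new_auth)
        # Hub update: sum of weighted authority scores over outgoing edges.
        new_hub = {n: 0.0 for n in nodes}
        for (a, b), w in edges.items():
            new_hub[a] += w * auth[b]
        hub = _normalise(new_hub)
    return hub, auth
```

Calling `hits(directed_graph(term_associations(sentences)))` on a list of tokenised sentences yields one hub and one authority score per term. Which of the two rankings corresponds to source topics and which to characteristic terms depends on the paper's definitions and is deliberately left open here.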