Identifying hot cancer and novel cell-cycle genes from literature
Gene prioritization based on background knowledge mined from the literature has become an important method for analyzing results from high-throughput experimental assays such as gene expression microarrays, RNAi screens, and genome-wide association studies. We apply our gene mention identifier, which achieved the best result (over 80\%) in the BioCreative II text-mining challenge [HPR+08], and show how text-mined associations can be complemented by guilt-by-association on high-confidence protein interaction networks. First, we predict hand-curated gene-disease relationships in the OMIM database, Entrez Gene summaries, and GeneRIFs with a 37\% success rate. Second, we confirm 24\% of the novel cell-cycle genes identified in a recent RNAi screen [KPH+07] using text mining and high-confidence protein interactions. Moreover, we show that 71\% of GOA cell-cycle annotations can be recovered automatically. Third, we devise a method to rank genes by novelty, increasing interest, impact, and popularity.
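To make the guilt-by-association step concrete, here is a minimal sketch of one way such a rule could work. It is not the scoring scheme used in the paper: the function names, the confidence cutoff, the neighbour-fraction threshold, and the toy interaction data (including the placeholder genes `GENE_X` to `GENE_W`) are all assumptions made for illustration. A gene with no text-mined association is predicted to share an annotation when a sufficient fraction of its high-confidence interaction partners already carry it.

```python
from collections import defaultdict


def build_network(interactions, min_confidence=0.9):
    """Build an undirected neighbour map from (gene_a, gene_b, confidence)
    triples, keeping only edges at or above the (assumed) cutoff."""
    neighbours = defaultdict(set)
    for gene_a, gene_b, confidence in interactions:
        if confidence >= min_confidence and gene_a != gene_b:
            neighbours[gene_a].add(gene_b)
            neighbours[gene_b].add(gene_a)
    return neighbours


def guilt_by_association(neighbours, seeds, min_fraction=0.5, min_neighbours=2):
    """Score each non-seed gene by the fraction of its neighbours that are
    seed genes; return the genes whose score reaches min_fraction.
    Both thresholds are illustrative assumptions."""
    predictions = {}
    for gene, partners in neighbours.items():
        if gene in seeds or len(partners) < min_neighbours:
            continue
        fraction = len(partners & seeds) / len(partners)
        if fraction >= min_fraction:
            predictions[gene] = fraction
    return predictions


# Hypothetical toy data: interaction triples with made-up confidence scores.
interactions = [
    ("CDK1", "CCNB1", 0.95),
    ("CDK1", "GENE_X", 0.92),
    ("CCNB1", "GENE_X", 0.91),
    ("GENE_X", "GENE_Y", 0.40),  # below the cutoff, so this edge is dropped
    ("GENE_Y", "CDK1", 0.93),
    ("GENE_Y", "GENE_Z", 0.97),
    ("GENE_Z", "GENE_W", 0.99),
]

# Seed genes stand in for genes linked to "cell cycle" by text mining.
seeds = {"CDK1", "CCNB1"}

network = build_network(interactions)
predicted = guilt_by_association(network, seeds)
print(predicted)
```

In this toy network, `GENE_X` interacts only with seed genes and `GENE_Y` with one seed out of two partners, so both are predicted, while `GENE_Z` has no seed partners and `GENE_W` has too few partners to be scored.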