Two graph-based approaches for finding cross-species conserved gene orders
Abstract
Identification of homologous regions across genomes is a crucial step in comparative genomics. This task is usually performed by genome alignment software such as WABA or blastz [KZ00, SKS+03]. Alternatively, such regions can be defined at a higher level of abstraction, namely as conserved gene orders. At this level, homologies can be found even between more distantly related genomes that cannot be aligned by standard alignment software. We present two approaches to identify such regions of conserved synteny. This naturally involves the prediction of orthologous genes. While existing methods such as best reciprocal hits or Inparanoid [ATLS06] predict orthology from sequence similarity alone, our methods use similarity as well as synteny information. Pairwise and multiple-species comparisons between human, mouse, rat and dog show that the different genomes exhibit extensive collinearity. 76\% of human genes are found in blocks conserved between human and dog, and 55\% are still found in blocks after including mouse. This value drops to about 50\% for blocks spanning all four species. For quality assessment we compared the SYNTENATOR orthologs to the Ensembl gene orthology predictions. Our method recovered 97\% of the Ensembl one-to-one orthologs. In addition, more than 34\% of the genes for which Ensembl predicted a one-to-many relationship could be resolved to one-to-one orthologs.
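For orientation, the similarity-only baseline mentioned above, best reciprocal hits, can be sketched in a few lines. This is a minimal illustration of that baseline, not of the graph-based methods presented here; the function names, gene identifiers and scores are hypothetical toy values, and the scores are assumed to be pairwise similarity scores where higher means more similar.

```python
def best_hits(scores):
    """Map each query gene to its highest-scoring target gene.

    `scores` maps (query, target) pairs to a similarity score
    (assumed: higher is better). Ties keep the first pair seen.
    """
    best = {}
    for (query, target), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (target, score)
    return {query: target for query, (target, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return sorted (a, b) pairs where a and b are each other's best hit."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


# Hypothetical toy scores, for illustration only.
human_vs_dog = {
    ("h1", "d1"): 950,
    ("h1", "d2"): 400,
    ("h2", "d2"): 700,
    ("h3", "d2"): 720,
}
dog_vs_human = {
    ("d1", "h1"): 940,
    ("d2", "h3"): 715,
    ("d2", "h2"): 690,
}

pairs = reciprocal_best_hits(human_vs_dog, dog_vs_human)
print(pairs)
```

In this toy input, h2 and h3 both have d2 as their best hit, but d2 points back only to h3, so h2 is left without an ortholog. Cases like this, where similarity alone is ambiguous, are where gene-order context can add information.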