Abstract
We define two new computational problems in the domain of perfect genome rearrangements, and propose three algorithms to solve them. The rearrangement scenarios modeled by the problems consider Reversal and Block Interchange operations, and a PQtree is utilized to guide the allowed operations and to compute their weights. In the first problem, Constrained TreeToString Divergence (CTTSD), we define the basic structureinformed rearrangement based divergence measure. Here, we assume that the gene order members of the gene cluster from which the PQtree is constructed are permutations. The PQtree representing the gene cluster is ordered such that the series of gene IDs spelled by its leaves is equivalent to the reference gene order. Then, a structureinformed gene rearrangement measure is computed between the ordered PQtree and the target gene order. The second problem, TreeToString Divergence (TTSD), generalizes CTTSD, where the gene order members are not necessarily permutations and the structureinformed rearrangement based divergence measure is extended to also consider up to d_S and d_T gene insertion and deletion operations, respectively, when modelling the PQtree informed divergence process from the reference order to the target order.
The first algorithm solves CTTSD in O(n γ² ⋅ (m_p ⋅ 1.381^γ + m_q)) time and O(n²) space, where γ is the maximum number of children of a node, n is the length of the string and the number of leaves in the tree, and m_p and m_q are the number of Pnodes and Qnodes in the tree, respectively. If one of the penalties of CTTSD is 0, then the algorithm runs in O(n m γ²) time and O(n²) space. The second algorithm solves TTSD in O(n² γ² {d_T}² {d_S}² m² (m_p ⋅ 5^γ γ + m_q)) time and O(d_T d_S m (m n + 5^γ)) space, where γ is the maximum number of children of a node, n is the length of the string, m is the number of leaves in the tree, m_p and m_q are the number of Pnodes and Qnodes in the tree, respectively, and allowing d_T deletions from the tree and d_S deletions from the string. The third algorithm is intended to reduce the space complexity of the second algorithm. It solves a variant of the problem (where one of the penalties of TTSD is 0) in O(n γ² {d_T}² {d_S}² m² (m_p ⋅ 4^γ γ²n(d_T+d_S+m+n) + m_q)) time and O(γ² n m² d_T d_S (d_T+d_S+m+n)) space.
The algorithm is implemented as a software tool, denoted MEMRearrange, and applied to the comparative and evolutionary analysis of 59 chromosomal gene clusters extracted from a dataset of 1,487 prokaryotic genomes.
BibTeX  Entry
@InProceedings{ozery_et_al:LIPIcs.WABI.2022.11,
author = {Ozery, Eden and Zehavi, Meirav and ZivUkelson, Michal},
title = {{New Algorithms for Structure Informed Genome Rearrangement}},
booktitle = {22nd International Workshop on Algorithms in Bioinformatics (WABI 2022)},
pages = {11:111:19},
series = {Leibniz International Proceedings in Informatics (LIPIcs)},
ISBN = {9783959772433},
ISSN = {18688969},
year = {2022},
volume = {242},
editor = {Boucher, Christina and Rahmann, Sven},
publisher = {Schloss Dagstuhl  LeibnizZentrum f{\"u}r Informatik},
address = {Dagstuhl, Germany},
URL = {https://drops.dagstuhl.de/opus/volltexte/2022/17045},
URN = {urn:nbn:de:0030drops170454},
doi = {10.4230/LIPIcs.WABI.2022.11},
annote = {Keywords: PQtree, Gene Cluster, Breakpoint Distance}
}
Keywords: 

PQtree, Gene Cluster, Breakpoint Distance 
Collection: 

22nd International Workshop on Algorithms in Bioinformatics (WABI 2022) 
Issue Date: 

2022 
Date of publication: 

26.08.2022 
Supplementary Material: 

The code for our tool, the data used in experiments, and the log file produced by the run of the reported benchmark, can be found on GitHub. Software (Source Code and Data): https://github.com/edenozery/MEMRearrange 