License: Creative Commons Attribution 4.0 International license (CC BY 4.0)
When quoting this document, please refer to the following
DOI: 10.4230/LIPIcs.WABI.2023.10
URN: urn:nbn:de:0030-drops-186364
URL: https://drops.dagstuhl.de/opus/volltexte/2023/18636/


Rizzo, Nicola ; Cáceres, Manuel ; Mäkinen, Veli

Finding Maximal Exact Matches in Graphs



Abstract

We study the problem of finding maximal exact matches (MEMs) between a query string Q and a labeled graph G. MEMs are an important class of seeds, often used in practical seed-chain-extend alignment methods because of their strong connections to classical metrics. A principled way to speed up chaining is to limit the number of MEMs by considering only MEMs of length at least κ (κ-MEMs). However, on arbitrary input graphs, the problem of finding MEMs cannot be solved in truly sub-quadratic time under SETH (Equi et al., ICALP 2019), even on acyclic graphs.
In this paper, we present an O(n ⋅ L ⋅ d^{L-1} + m + M_{κ,L})-time algorithm that finds all κ-MEMs between Q and G spanning exactly L nodes in G, where n is the total length of node labels, d is the maximum degree of a node in G, m = |Q|, and M_{κ,L} is the number of output MEMs. We use this algorithm to develop a κ-MEM finding solution on indexable Elastic Founder Graphs (Equi et al., Algorithmica 2022) running in time O(nH² + m + M_κ), where H is the maximum number of nodes in a block, and M_κ is the total number of κ-MEMs.
Our results generalize to the analysis of multiple query strings (MEMs between G and any of the strings). Additionally, we provide some preliminary experimental results showing that the number of graph MEMs is an order of magnitude smaller than the number of string MEMs of the corresponding concatenated collection.
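
To make the notion of a κ-MEM concrete, the following is a minimal sketch for the plain string case only: a query Q matched against a single text T, with no graph structure. It is a naive quadratic-time illustration of the definition (a match of length at least κ that cannot be extended to the left or to the right), not the algorithm of the paper. The function name kappa_mems and the example strings are assumptions made for illustration.

```python
def kappa_mems(query, text, kappa=1):
    """Naive illustration: list all kappa-MEMs between two plain strings.

    Returns triples (i, j, length) with query[i:i+length] == text[j:j+length],
    length >= kappa, and the match not extendable to the left or right.
    Runs in O(len(query) * len(text)) time and space.
    """
    m, n = len(query), len(text)
    # lcp[i][j] = length of the longest common prefix of query[i:] and text[j:].
    # The extra row and column of zeros handle the string ends.
    lcp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m - 1, -1, -1):
        for j in range(n - 1, -1, -1):
            if query[i] == text[j]:
                lcp[i][j] = lcp[i + 1][j + 1] + 1

    mems = []
    for i in range(m):
        for j in range(n):
            length = lcp[i][j]
            if length == 0 or length < kappa:
                continue
            # Right-maximality holds by construction, since lcp is the longest
            # common prefix. Left-maximality must be checked explicitly.
            left_maximal = i == 0 or j == 0 or query[i - 1] != text[j - 1]
            if left_maximal:
                mems.append((i, j, length))
    return mems


if __name__ == "__main__":
    # Hypothetical example strings, chosen only for illustration.
    print(kappa_mems("GATTACA", "TTACGAT", kappa=3))
```

With κ = 3 the example reports the matches "GAT" and "TTAC" as (0, 4, 3) and (2, 0, 4). The shorter match "TAC" is not reported because it can be extended to the left into "TTAC".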

BibTeX - Entry

@InProceedings{rizzo_et_al:LIPIcs.WABI.2023.10,
  author =	{Rizzo, Nicola and C\'{a}ceres, Manuel and M\"{a}kinen, Veli},
  title =	{{Finding Maximal Exact Matches in Graphs}},
  booktitle =	{23rd International Workshop on Algorithms in Bioinformatics (WABI 2023)},
  pages =	{10:1--10:17},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-294-5},
  ISSN =	{1868-8969},
  year =	{2023},
  volume =	{273},
  editor =	{Belazzougui, Djamal and Ouangraoua, A\"{i}da},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/opus/volltexte/2023/18636},
  URN =		{urn:nbn:de:0030-drops-186364},
  doi =		{10.4230/LIPIcs.WABI.2023.10},
  annote =	{Keywords: Sequence to graph alignment, bidirectional BWT, r-index, suffix tree, founder graphs}
}

Keywords: Sequence to graph alignment, bidirectional BWT, r-index, suffix tree, founder graphs
Collection: 23rd International Workshop on Algorithms in Bioinformatics (WABI 2023)
Issue Date: 2023
Date of publication: 29.08.2023
Supplementary Material: Software (Source Code): https://github.com/algbio/br-index-mems archived at: https://archive.softwareheritage.org/swh:1:dir:62181b3b38fb659c1c266508948c6058a61f3623

