License:
Creative Commons Attribution 3.0 Unported license (CC BY 3.0)
When quoting this document, please refer to the following
DOI: 10.4230/LIPIcs.CPM.2020.5
URN: urn:nbn:de:0030-drops-121302
URL: https://drops.dagstuhl.de/opus/volltexte/2020/12130/
Ben-Nun, Stav ;
Golan, Shay ;
Kociumaka, Tomasz ;
Kraus, Matan
Time-Space Tradeoffs for Finding a Long Common Substring
Abstract
We consider the problem of finding, given two documents of total length n, a longest string occurring as a substring of both documents. This problem, known as the Longest Common Substring (LCS) problem, has a classic 𝒪(n)-time solution dating back to the discovery of suffix trees (Weiner, 1973) and their efficient construction for integer alphabets (Farach-Colton, 1997). However, these solutions require Θ(n) space, which is prohibitive in many applications. To address this issue, Starikovskaya and Vildhøj (CPM 2013) showed that for n^{2/3} ≤ s ≤ n, the LCS problem can be solved in 𝒪(s) space and 𝒪̃(n²/s) time. Kociumaka et al. (ESA 2014) generalized this tradeoff to 1 ≤ s ≤ n, thus providing a smooth time-space tradeoff from constant to linear space. In this paper, we obtain a significant speed-up for instances where the length L of the sought LCS is large. For 1 ≤ s ≤ n, we show that the LCS problem can be solved in 𝒪(s) space and 𝒪̃(n²/(L⋅s) +n) time. The result is based on techniques originating from the LCS with Mismatches problem (Flouri et al., 2015; Charalampopoulos et al., CPM 2018), on space-efficient locally consistent parsing (Birenzwige et al., SODA 2020), and on the structure of maximal repetitions (runs) in the input documents.
BibTeX - Entry
@InProceedings{bennun_et_al:LIPIcs:2020:12130,
author = {Stav Ben-Nun and Shay Golan and Tomasz Kociumaka and Matan Kraus},
title = {{Time-Space Tradeoffs for Finding a Long Common Substring}},
booktitle = {31st Annual Symposium on Combinatorial Pattern Matching (CPM 2020)},
pages = {5:1--5:14},
series = {Leibniz International Proceedings in Informatics (LIPIcs)},
ISBN = {978-3-95977-149-8},
ISSN = {1868-8969},
year = {2020},
volume = {161},
editor = {Inge Li G{\o}rtz and Oren Weimann},
publisher = {Schloss Dagstuhl--Leibniz-Zentrum f{\"u}r Informatik},
address = {Dagstuhl, Germany},
URL = {https://drops.dagstuhl.de/opus/volltexte/2020/12130},
URN = {urn:nbn:de:0030-drops-121302},
doi = {10.4230/LIPIcs.CPM.2020.5},
annote = {Keywords: longest common substring, time-space tradeoff, local consistency, periodicity}
}
Keywords: |
|
longest common substring, time-space tradeoff, local consistency, periodicity |
Collection: |
|
31st Annual Symposium on Combinatorial Pattern Matching (CPM 2020) |
Issue Date: |
|
2020 |
Date of publication: |
|
09.06.2020 |