Abstract
Two equal length strings are a parameterized match (pmatch) iff there exists a onetoone function that renames the symbols in one string to those in the other. The Parameterized Suffix Tree (PST) [Baker, STOC' 93] is a fundamental data structure that handles various string matching problems under this setting. The PST of a text T[1,n] over an alphabet Σ of size σ takes O(nlog n) bits of space. It can report any entry in (parameterized) (i) suffix array, (ii) inverse suffix array, and (iii) longest common prefix (LCP) array in O(1) time. Given any pattern P as a query, a position i in T is an occurrence iff T[i,i+P1] and P are a pmatch. The PST can count the number of occurrences of P in T in time O(Plog σ) and then report each occurrence in time proportional to that of accessing a suffix array entry. An important question is, can we obtain a compressed version of PST that takes space close to the text’s size of nlogσ bits and still support all three functionalities mentioned earlier? In SODA' 17, Ganguly et al. answered this question partially by presenting an O(nlogσ) bit index that can support (parameterized) suffix array and inverse suffix array operations in O(log n) time. However, the compression of the (parameterized) LCP array and the possibility of faster suffix array and inverse suffix array queries in compact space were left open. In this work, we obtain a compact representation of the (parameterized) LCP array. With this result, in conjunction with three new (parameterized) suffix array representations, we obtain the first set of PST representations in o(nlog n) bits (when logσ = o(log n)) as follows. Here ε > 0 is an arbitrarily small constant.
 Space O(n logσ) bits and query time O(log_σ^ε n);
 Space O(n logσlog log_σ n) bits and query time O(log log_σ n); and
 Space O(n logσ log^ε_σ n) bits and query time O(1).
The first tradeoff is an improvement over Ganguly et al.’s result, whereas our third tradeoff matches the optimal time performance of Baker’s PST while squeezing the space by a factor roughly log_σ n. We highlight that our tradeoffs match the spaceandtime bounds of the bestknown compressed text indexes for exact pattern matching and further improvement is highly unlikely.
BibTeX  Entry
@InProceedings{ganguly_et_al:LIPIcs.ICALP.2022.65,
author = {Ganguly, Arnab and Shah, Rahul and Thankachan, Sharma V.},
title = {{Fully Functional Parameterized Suffix Trees in Compact Space}},
booktitle = {49th International Colloquium on Automata, Languages, and Programming (ICALP 2022)},
pages = {65:165:18},
series = {Leibniz International Proceedings in Informatics (LIPIcs)},
ISBN = {9783959772358},
ISSN = {18688969},
year = {2022},
volume = {229},
editor = {Boja\'{n}czyk, Miko{\l}aj and Merelli, Emanuela and Woodruff, David P.},
publisher = {Schloss Dagstuhl  LeibnizZentrum f{\"u}r Informatik},
address = {Dagstuhl, Germany},
URL = {https://drops.dagstuhl.de/opus/volltexte/2022/16406},
URN = {urn:nbn:de:0030drops164061},
doi = {10.4230/LIPIcs.ICALP.2022.65},
annote = {Keywords: Data Structures, Suffix Trees, String Algorithms, Compression}
}
Keywords: 

Data Structures, Suffix Trees, String Algorithms, Compression 
Collection: 

49th International Colloquium on Automata, Languages, and Programming (ICALP 2022) 
Issue Date: 

2022 
Date of publication: 

28.06.2022 