License: Creative Commons Attribution 4.0 International license (CC BY 4.0)
When quoting this document, please refer to the following
DOI: 10.4230/LIPIcs.WABI.2023.16
URN: urn:nbn:de:0030-drops-186424
URL: https://drops.dagstuhl.de/opus/volltexte/2023/18642/
Khan, Jamshed ;
Rubel, Tobias ;
Dhulipala, Laxman ;
Molloy, Erin ;
Patro, Rob
Fast, Parallel, and Cache-Friendly Suffix Array Construction
Abstract
String indexes such as the suffix array (SA) and the closely related longest common prefix (LCP) array are fundamental objects in bioinformatics and have a wide variety of applications. Despite their importance in practice, few scalable parallel algorithms for constructing these are known, and the existing algorithms can be highly non-trivial to implement and parallelize. In this paper we present CaPS-SA, a simple and scalable parallel algorithm for constructing these string indexes inspired by samplesort. Due to its design, CaPS-SA has excellent memory-locality and thus incurs fewer cache misses and achieves strong performance on modern multicore systems with deep cache hierarchies. We show that despite its simple design, CaPS-SA outperforms existing state-of-the-art parallel SA and LCP-array construction algorithms on modern hardware. Finally, motivated by applications in modern aligners where the query strings have bounded lengths, we introduce the notion of a bounded-context SA and show that CaPS-SA can easily be extended to exploit this structure to obtain further speedups.
BibTeX - Entry
@InProceedings{khan_et_al:LIPIcs.WABI.2023.16,
author = {Khan, Jamshed and Rubel, Tobias and Dhulipala, Laxman and Molloy, Erin and Patro, Rob},
title = {{Fast, Parallel, and Cache-Friendly Suffix Array Construction}},
booktitle = {23rd International Workshop on Algorithms in Bioinformatics (WABI 2023)},
pages = {16:1--16:21},
series = {Leibniz International Proceedings in Informatics (LIPIcs)},
ISBN = {978-3-95977-294-5},
ISSN = {1868-8969},
year = {2023},
volume = {273},
editor = {Belazzougui, Djamal and Ouangraoua, A\"{i}da},
publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
address = {Dagstuhl, Germany},
URL = {https://drops.dagstuhl.de/opus/volltexte/2023/18642},
URN = {urn:nbn:de:0030-drops-186424},
doi = {10.4230/LIPIcs.WABI.2023.16},
annote = {Keywords: Suffix Array, Longest Common Prefix, Data Structures, Indexing, Parallel Algorithms}
}