Abstract
Simon’s congruence ∼_k is a relation on words defined by Imre Simon in the 1970s and intensely studied since then. This congruence was initially used in connection to piecewise testable languages, but also found many applications in, e.g., learning theory, databases theory, or linguistics. The ∼_krelation is defined as follows: two words are ∼_kcongruent if they have the same set of subsequences of length at most k. A long standing open problem, stated already by Simon in his initial works on this topic, was to design an algorithm which computes, given two words s and t, the largest k for which s∼_k t. We propose the first algorithm solving this problem in linear time O(s+t) when the input words are over the integer alphabet {1,…,s+t} (or other alphabets which can be sorted in linear time). Our approach can be extended to an optimal algorithm in the case of general alphabets as well.
To achieve these results, we introduce a novel datastructure, called SimonTree, which allows us to construct a natural representation of the equivalence classes induced by ∼_k on the set of suffixes of a word, for all k ≥ 1. We show that such a tree can be constructed for an input word in linear time. Then, when working with two words s and t, we compute their respective SimonTrees and efficiently build a correspondence between the nodes of these trees. This correspondence, which can also be constructed in linear time O(s+t), allows us to retrieve the largest k for which s∼_k t.
BibTeX  Entry
@InProceedings{gawrychowski_et_al:LIPIcs.STACS.2021.34,
author = {Gawrychowski, Pawe{\l} and Kosche, Maria and Ko{\ss}, Tore and Manea, Florin and Siemer, Stefan},
title = {{Efficiently Testing Simon’s Congruence}},
booktitle = {38th International Symposium on Theoretical Aspects of Computer Science (STACS 2021)},
pages = {34:134:18},
series = {Leibniz International Proceedings in Informatics (LIPIcs)},
ISBN = {9783959771801},
ISSN = {18688969},
year = {2021},
volume = {187},
editor = {Bl\"{a}ser, Markus and Monmege, Benjamin},
publisher = {Schloss Dagstuhl  LeibnizZentrum f{\"u}r Informatik},
address = {Dagstuhl, Germany},
URL = {https://drops.dagstuhl.de/opus/volltexte/2021/13679},
URN = {urn:nbn:de:0030drops136796},
doi = {10.4230/LIPIcs.STACS.2021.34},
annote = {Keywords: Simon’s congruence, Subsequence, Scattered factor, Efficient algorithms}
}
Keywords: 

Simon’s congruence, Subsequence, Scattered factor, Efficient algorithms 
Collection: 

38th International Symposium on Theoretical Aspects of Computer Science (STACS 2021) 
Issue Date: 

2021 
Date of publication: 

10.03.2021 