Abstract
A factorization of a string S is a partition of w into substrings u_1,… ,u_k such that S = u_1 u_2 ⋯ u_k. Such a partition is called equalityfree if no two factors are equal: u_i ≠ u_j, ∀ i,j with i ≠ j. The maximum equalityfree factorization problem is to find for a given string S, the largest integer k for which S admits an equalityfree factorization with k factors.
Equalityfree factorizations have lately received attention because of their applications in DNA selfassembly. The best approximation algorithm known for the problem is the natural greedy algorithm, that chooses iteratively from left to right the shortest factor that does not appear before. This algorithm has a √n approximation ratio (SOFSEM 2020) and it is an open problem whether there is a better solution.
Our main result is to show that the natural greedy algorithm is a Θ(n^{1/4}) approximation algorithm for the maximum equalityfree factorization problem. Thus, we disprove one of the conjectures of Mincu and Popa (SOFSEM 2020) according to which the greedy algorithm is a Θ(√n) approximation.
The most challenging part of the proof is to show that the greedy algorithm is an O(n^{1/4}) approximation. We obtain this algorithm via prefix free factor families, i.e. a set of nonoverlapping factors of the string which are pairwise nonprefixes of each other. In the paper we show the relation between prefix free factor families and the maximum equalityfree factorization. Moreover, as a byproduct we present another approximation algorithm that achieves an approximation ratio of O(n^{1/4}) that we believe is of independent interest and may lead to improved algorithms. We then show that the natural greedy algorithm has an approximation ratio that is Ω(n^{1/4}) via a clever analysis which shows that the greedy algorithm is Θ(n^{1/4}) for the maximum equalityfree factorization problem.
BibTeX  Entry
@InProceedings{kraus_et_al:LIPIcs.CPM.2023.19,
author = {Kraus, Matan and Lewenstein, Moshe and Popa, Alexandru and Porat, Ely and Sadia, Yonathan},
title = {{String Factorization via Prefix Free Families}},
booktitle = {34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023)},
pages = {19:119:10},
series = {Leibniz International Proceedings in Informatics (LIPIcs)},
ISBN = {9783959772761},
ISSN = {18688969},
year = {2023},
volume = {259},
editor = {Bulteau, Laurent and Lipt\'{a}k, Zsuzsanna},
publisher = {Schloss Dagstuhl  LeibnizZentrum f{\"u}r Informatik},
address = {Dagstuhl, Germany},
URL = {https://drops.dagstuhl.de/opus/volltexte/2023/17973},
URN = {urn:nbn:de:0030drops179738},
doi = {10.4230/LIPIcs.CPM.2023.19},
annote = {Keywords: string factorization, NPhard problem, approximation algorithm}
}
Keywords: 

string factorization, NPhard problem, approximation algorithm 
Collection: 

34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023) 
Issue Date: 

2023 
Date of publication: 

21.06.2023 