Abstract
Motivated by computing duplication patterns in sequences, a new fundamental problem called the longest letterduplicated subsequence (LLDS) is proposed. Given a sequence S of length n, a letterduplicated subsequence is a subsequence of S in the form of x₁^{d₁}x₂^{d₂}⋯ x_k^{d_k} with x_i ∈ Σ, x_j≠ x_{j+1} and d_i ≥ 2 for all i in [k] and j in [k1]. A linear time algorithm for computing the longest letterduplicated subsequence (LLDS) of S can be easily obtained. In this paper, we focus on two variants of this problem. We first consider the constrained version when Σ is unbounded, each letter appears in S at least 6 times and all the letters in Σ must appear in the solution. We show that the problem is NPhard (a further twist indicates that the problem does not admit any polynomial time approximation). The reduction is from possibly the simplest version of SAT that is NPcomplete, (≤ 2,1, ≤ 3)SAT, where each variable appears at most twice positively and exact once negatively, and each clause contains at most three literals and some clauses must contain exactly two literals. (We hope that this technique will serve as a general tool to help us proving the NPhardness for some more tricky sequence problems involving only one sequence  much harder than with at least two input sequences, which we apply successfully at the end of the paper on some extra variations of the LLDS problem.) We then show that when each letter appears in S at most 3 times, then the problem admits a factor 1.5O(1/n) approximation. Finally, we consider the weighted version, where the weight of a block x_i^{d_i} (d_i ≥ 2) could be any positive function which might not grow with d_i. We give a nontrivial O(n²) time dynamic programming algorithm for this version, i.e., computing an LDsubsequence of S whose weight is maximized.
BibTeX  Entry
@InProceedings{lai_et_al:LIPIcs.CPM.2022.7,
author = {Lai, Wenfeng and Liyanage, Adiesha and Zhu, Binhai and Zou, Peng},
title = {{Beyond the Longest LetterDuplicated Subsequence Problem}},
booktitle = {33rd Annual Symposium on Combinatorial Pattern Matching (CPM 2022)},
pages = {7:17:12},
series = {Leibniz International Proceedings in Informatics (LIPIcs)},
ISBN = {9783959772341},
ISSN = {18688969},
year = {2022},
volume = {223},
editor = {Bannai, Hideo and Holub, Jan},
publisher = {Schloss Dagstuhl  LeibnizZentrum f{\"u}r Informatik},
address = {Dagstuhl, Germany},
URL = {https://drops.dagstuhl.de/opus/volltexte/2022/16134},
URN = {urn:nbn:de:0030drops161348},
doi = {10.4230/LIPIcs.CPM.2022.7},
annote = {Keywords: Segmental duplications, Tandem duplications, Longest common subsequence, NPcompleteness, Dynamic programming}
}
Keywords: 

Segmental duplications, Tandem duplications, Longest common subsequence, NPcompleteness, Dynamic programming 
Collection: 

33rd Annual Symposium on Combinatorial Pattern Matching (CPM 2022) 
Issue Date: 

2022 
Date of publication: 

22.06.2022 