License: Creative Commons Attribution 3.0 Unported license (CC BY 3.0)
When quoting this document, please refer to the following
DOI: 10.4230/LIPIcs.AofA.2020.17
URN: urn:nbn:de:0030-drops-120476
URL: https://drops.dagstuhl.de/opus/volltexte/2020/12047/


Janson, Svante ; Szpankowski, Wojciech

Hidden Words Statistics for Large Patterns



Abstract

We study subsequence pattern matching, also known as hidden pattern matching, in which one searches for a given pattern w of length m as a subsequence in a random text of length n. The quantity of interest is the number of occurrences of w as a subsequence (i.e., occurring in not necessarily consecutive text locations). This problem has many applications, ranging from intrusion detection to trace reconstruction, deletion channels, and DNA-based storage systems. In all of these applications, the pattern w is of variable length. To the best of our knowledge, this problem has previously been analyzed only for fixed length m=O(1) [P. Flajolet et al., 2006]. In our main result, Theorem 5, we prove that for m=o(n^{1/3}) the number of subsequence occurrences is asymptotically normally distributed. In addition, in Theorem 6 we show that, under some constraints on the structure of w, the asymptotic normality can be extended to m=o(√n). For a special pattern w consisting of a single repeated symbol, we indicate that for m=o(n) the distribution of the number of subsequence occurrences is either asymptotically normal or asymptotically log-normal. We conjecture that this dichotomy holds for all patterns. We use Hoeffding’s projection method for U-statistics to prove our findings.
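
To make the quantity of interest concrete, the following is a minimal Python sketch that counts the occurrences of a pattern w as a subsequence of a given text. It uses a standard dynamic-programming recurrence, which is an illustrative choice and not a method taken from the paper; the function name count_subsequence_occurrences is hypothetical.

```python
def count_subsequence_occurrences(text: str, pattern: str) -> int:
    """Count index tuples i_1 < ... < i_m with text[i_j] == pattern[j].

    Illustrative sketch only: a standard dynamic program, not the
    analysis technique used in the paper.
    """
    m = len(pattern)
    # ways[j] = number of ways to match the first j pattern symbols
    # inside the part of the text scanned so far. The empty prefix
    # of the pattern is matched in exactly one way.
    ways = [1] + [0] * m
    for symbol in text:
        # Scan j from right to left so that a single text position
        # is never used twice within the same occurrence.
        for j in range(m, 0, -1):
            if pattern[j - 1] == symbol:
                ways[j] += ways[j - 1]
    return ways[m]
```

For example, count_subsequence_occurrences("abab", "ab") returns 3, corresponding to the position pairs (1,2), (1,4), and (3,4). The running time is O(nm) for a text of length n and a pattern of length m.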

BibTeX - Entry

@InProceedings{janson_et_al:LIPIcs:2020:12047,
  author =	{Svante Janson and Wojciech Szpankowski},
  title =	{{Hidden Words Statistics for Large Patterns}},
  booktitle =	{31st International Conference on Probabilistic, Combinatorial and Asymptotic Methods for the Analysis of Algorithms (AofA 2020)},
  pages =	{17:1--17:15},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-147-4},
  ISSN =	{1868-8969},
  year =	{2020},
  volume =	{159},
  editor =	{Michael Drmota and Clemens Heuberger},
  publisher =	{Schloss Dagstuhl--Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/opus/volltexte/2020/12047},
  URN =		{urn:nbn:de:0030-drops-120476},
  doi =		{10.4230/LIPIcs.AofA.2020.17},
  annote =	{Keywords: Hidden pattern matching, subsequences, probability, U-statistics, projection method}
}

Keywords: Hidden pattern matching, subsequences, probability, U-statistics, projection method
Collection: 31st International Conference on Probabilistic, Combinatorial and Asymptotic Methods for the Analysis of Algorithms (AofA 2020)
Issue Date: 2020
Date of publication: 10.06.2020

