License: Creative Commons Attribution 3.0 Unported license (CC BY 3.0)
When quoting this document, please refer to the following
DOI: 10.4230/LIPIcs.AofA.2020.15
URN: urn:nbn:de:0030-drops-120459
URL: https://drops.dagstuhl.de/opus/volltexte/2020/12045/
Go to the corresponding LIPIcs Volume Portal


Jacquet, Philippe ; Szpankowski, Wojciech

Analysis of Lempel-Ziv'78 for Markov Sources

pdf-format:
LIPIcs-AofA-2020-15.pdf (0.7 MB)


Abstract

Lempel-Ziv'78 is one of the most popular data compression algorithms. Over the last few decades fascinating properties of LZ78 were uncovered. Among others, in 1995 we settled the Ziv conjecture by proving that for a memoryless source the number of LZ78 phrases satisfies the Central Limit Theorem (CLT). Since then the quest commenced to extend it to Markov sources. However, despite several attempts this problem is still open. The 1995 proof of the Ziv conjecture was based on two models: In the DST-model, the associated digital search tree (DST) is built over m independent strings. In the LZ-model a single string of length n is partitioned into variable length phrases such that the next phrase is not seen in the past as a phrase. The Ziv conjecture for memoryless source was settled by proving that both DST-model and the LZ-model are asymptotically equivalent. The main result of this paper shows that this is not the case for the LZ78 algorithm over Markov sources. In addition, we develop here a large deviation for the number of phrases in the LZ78 and give a precise asymptotic expression for the redundancy which is the excess of LZ78 code over the entropy of the source. We establish these findings using a combination of combinatorial and analytic tools. In particular, to handle the strong dependency between Markov phrases, we introduce and precisely analyze the so called tail symbol which is the first symbol of the next phrase in the LZ78 parsing.

BibTeX - Entry

@InProceedings{jacquet_et_al:LIPIcs:2020:12045,
  author =	{Philippe Jacquet and Wojciech Szpankowski},
  title =	{{Analysis of Lempel-Ziv'78 for Markov Sources}},
  booktitle =	{31st International Conference on Probabilistic, Combinatorial and Asymptotic Methods for the Analysis of Algorithms (AofA 2020)},
  pages =	{15:1--15:19},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-147-4},
  ISSN =	{1868-8969},
  year =	{2020},
  volume =	{159},
  editor =	{Michael Drmota and Clemens Heuberger},
  publisher =	{Schloss Dagstuhl--Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/opus/volltexte/2020/12045},
  URN =		{urn:nbn:de:0030-drops-120459},
  doi =		{10.4230/LIPIcs.AofA.2020.15},
  annote =	{Keywords: Lempel-Ziv algorithm, digital search trees, depoissonization, analytic combinatorics, large deviations}
}

Keywords: Lempel-Ziv algorithm, digital search trees, depoissonization, analytic combinatorics, large deviations
Collection: 31st International Conference on Probabilistic, Combinatorial and Asymptotic Methods for the Analysis of Algorithms (AofA 2020)
Issue Date: 2020
Date of publication: 10.06.2020


DROPS-Home | Fulltext Search | Imprint | Privacy Published by LZI