License: Creative Commons Attribution 3.0 Unported license (CC BY 3.0)
When quoting this document, please refer to the following
DOI: 10.4230/LIPIcs.ITCS.2017.19
URN: urn:nbn:de:0030-drops-81763
Go to the corresponding LIPIcs Volume Portal

Ghazi, Badih ; Haramaty, Elad ; Kamath, Pritish ; Sudan, Madhu

Compression in a Distributed Setting

LIPIcs-ITCS-2017-19.pdf (0.6 MB)


Motivated by an attempt to understand the formation and development of (human) language, we introduce a "distributed compression" problem. In our problem a sequence of pairs of players from a set of K players are chosen and tasked to communicate messages drawn from an unknown distribution Q.
Arguably languages are created and evolve to compress frequently occurring messages, and we focus on this aspect.
The only knowledge that players have about the distribution Q is from previously drawn samples, but these samples differ from player to player.
The only common knowledge between the players is restricted to a common prior distribution P and some constant number
of bits of information (such as a learning algorithm).
Letting T_epsilon denote the number of iterations it would take for a typical player
to obtain an epsilon-approximation to Q in total variation distance, we ask
whether T_epsilon iterations suffice to compress the messages down roughly to their
entropy and give a partial positive answer.

We show that a natural uniform algorithm can compress the communication down to an average cost per
message of O(H(Q) + log (D(P || Q)) in tilde{O}(T_epsilon) iterations
while allowing for O(epsilon)-error,
where D(. || .) denotes the KL-divergence between distributions.
For large divergences
this compares favorably with the static algorithm that ignores all samples and
compresses down to H(Q) + D(P || Q) bits, while not requiring T_epsilon * K iterations that it would take players to develop optimal but separate compressions for
each pair of players.
Along the way we introduce a "data-structural" view of the task of
communicating with a natural language and show that our natural algorithm can also be
implemented by an efficient data structure, whose storage is comparable to the storage requirements of Q and whose query complexity is comparable to the lengths of the message to be
Our results give a plausible mathematical analogy to the mechanisms by which
human languages get created and evolve, and in particular highlights the
possibility of coordination towards a joint task (agreeing on a language)
while engaging in distributed learning.

BibTeX - Entry

  author =	{Badih Ghazi and Elad Haramaty and Pritish Kamath and Madhu Sudan},
  title =	{{Compression in a Distributed Setting}},
  booktitle =	{8th Innovations in Theoretical Computer Science Conference (ITCS 2017)},
  pages =	{19:1--19:22},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-029-3},
  ISSN =	{1868-8969},
  year =	{2017},
  volume =	{67},
  editor =	{Christos H. Papadimitriou},
  publisher =	{Schloss Dagstuhl--Leibniz-Zentrum fuer Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{},
  URN =		{urn:nbn:de:0030-drops-81763},
  doi =		{10.4230/LIPIcs.ITCS.2017.19},
  annote =	{Keywords: Distributed Compression, Communication, Language Evolution, Isolating Hash Families}

Keywords: Distributed Compression, Communication, Language Evolution, Isolating Hash Families
Collection: 8th Innovations in Theoretical Computer Science Conference (ITCS 2017)
Issue Date: 2017
Date of publication: 28.11.2017

DROPS-Home | Fulltext Search | Imprint | Privacy Published by LZI