License: Creative Commons Attribution 4.0 International license (CC BY 4.0)
When quoting this document, please refer to the following
DOI: 10.4230/LIPIcs.SEA.2023.21
URN: urn:nbn:de:0030-drops-183715
URL: https://drops.dagstuhl.de/opus/volltexte/2023/18371/
Go to the corresponding LIPIcs Volume Portal


Jaud, Stephen ; Wirth, Anthony ; Choudhury, Farhana

Maximum Coverage in Sublinear Space, Faster

pdf-format:
LIPIcs-SEA-2023-21.pdf (2 MB)


Abstract

Given a collection of m sets from a universe 𝒰, the Maximum Set Coverage problem consists of finding k sets whose union has largest cardinality. This problem is NP-Hard, but the solution can be approximated by a polynomial time algorithm up to a factor 1-1/e. However, this algorithm does not scale well with the input size.
In a streaming context, practical high-quality solutions are found, but with space complexity that scales linearly with respect to the size of the universe n = |𝒰|. However, one randomized streaming algorithm has been shown to produce a 1-1/e-ε approximation of the optimal solution with a space complexity that scales only poly-logarithmically with respect to m and n. In order to achieve such a low space complexity, the authors used two techniques in their multi-pass approach:
- F₀-sketching, allows to determine with great accuracy the number of distinct elements in a set using less space than the set itself.
- Subsampling, consists of only solving the problem on a subspace of the universe. It is implemented using γ-independent hash functions.
This article focuses on the sublinear-space algorithm and highlights the time cost of these two techniques, especially subsampling. We present optimizations that significantly reduce the time complexity of the algorithm. Firstly, we give some optimizations that do not alter the space complexity, number of passes and approximation quality of the original algorithm. In particular, we reanalyze the error bounds to show that the original independence factor of Ω(ε^{-2} k log m) can be fine-tuned to Ω(k log m); we also show how F₀-sketching can be removed. Secondly, we derive a new lower bound for the probability of producing a 1-1/e-ε approximation using only pairwise independence: 1- (4/(c k log m)) compared to 1-(2e/(m^{ck/6})) with Ω(k log m)-independence.
Although the theoretical guarantees are weaker, suggesting the approximation quality would suffer, for large streams, our algorithms perform well in practice. Finally, our experimental results show that even a pairwise-independent hash-function sampler does not produce worse solution than the original algorithm, while running significantly faster by several orders of magnitude.

BibTeX - Entry

@InProceedings{jaud_et_al:LIPIcs.SEA.2023.21,
  author =	{Jaud, Stephen and Wirth, Anthony and Choudhury, Farhana},
  title =	{{Maximum Coverage in Sublinear Space, Faster}},
  booktitle =	{21st International Symposium on Experimental Algorithms (SEA 2023)},
  pages =	{21:1--21:20},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-279-2},
  ISSN =	{1868-8969},
  year =	{2023},
  volume =	{265},
  editor =	{Georgiadis, Loukas},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/opus/volltexte/2023/18371},
  URN =		{urn:nbn:de:0030-drops-183715},
  doi =		{10.4230/LIPIcs.SEA.2023.21},
  annote =	{Keywords: streaming algorithms, subsampling, maximum set cover, k-wise independent hash functions}
}

Keywords: streaming algorithms, subsampling, maximum set cover, k-wise independent hash functions
Collection: 21st International Symposium on Experimental Algorithms (SEA 2023)
Issue Date: 2023
Date of publication: 19.07.2023
Supplementary Material: Software (Source Code): https://github.com/caesiumCode/streaming-maximum-cover archived at: https://archive.softwareheritage.org/swh:1:dir:1012da79a9177f4dc0ae4e5851608b597e79fa8d


DROPS-Home | Fulltext Search | Imprint | Privacy Published by LZI