When quoting this document, please refer to the following
DOI: 10.4230/LIPIcs.APPROX/RANDOM.2022.27
URN: urn:nbn:de:0030-drops-171497
URL: https://drops.dagstuhl.de/opus/volltexte/2022/17149/
 Go to the corresponding LIPIcs Volume Portal

### Exploring the Gap Between Tolerant and Non-Tolerant Distribution Testing

 pdf-format:

### Abstract

The framework of distribution testing is currently ubiquitous in the field of property testing. In this model, the input is a probability distribution accessible via independently drawn samples from an oracle. The testing task is to distinguish a distribution that satisfies some property from a distribution that is far in some distance measure from satisfying it. The task of tolerant testing imposes a further restriction, that distributions close to satisfying the property are also accepted.
This work focuses on the connection between the sample complexities of non-tolerant testing of distributions and their tolerant testing counterparts. When limiting our scope to label-invariant (symmetric) properties of distributions, we prove that the gap is at most quadratic, ignoring poly-logarithmic factors. Conversely, the property of being the uniform distribution is indeed known to have an almost-quadratic gap.
When moving to general, not necessarily label-invariant properties, the situation is more complicated, and we show some partial results. We show that if a property requires the distributions to be non-concentrated, that is, the probability mass of the distribution is sufficiently spread out, then it cannot be non-tolerantly tested with o(√n) many samples, where n denotes the universe size. Clearly, this implies at most a quadratic gap, because a distribution can be learned (and hence tolerantly tested against any property) using 𝒪(n) many samples. Being non-concentrated is a strong requirement on properties, as we also prove a close to linear lower bound against their tolerant tests.
Apart from the case where the distribution is non-concentrated, we also show if an input distribution is very concentrated, in the sense that it is mostly supported on a subset of size s of the universe, then it can be learned using only 𝒪(s) many samples. The learning procedure adapts to the input, and works without knowing s in advance.

### BibTeX - Entry

```@InProceedings{chakraborty_et_al:LIPIcs.APPROX/RANDOM.2022.27,
author =	{Chakraborty, Sourav and Fischer, Eldar and Ghosh, Arijit and Mishra, Gopinath and Sen, Sayantan},
title =	{{Exploring the Gap Between Tolerant and Non-Tolerant Distribution Testing}},
booktitle =	{Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2022)},
pages =	{27:1--27:23},
series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
ISBN =	{978-3-95977-249-5},
ISSN =	{1868-8969},
year =	{2022},
volume =	{245},
editor =	{Chakrabarti, Amit and Swamy, Chaitanya},
publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},