Abstract
Several well-studied models of access to data samples, including statistical queries, local differential privacy, and low-communication algorithms, rely on queries that provide information about a function of a single sample. (For example, a statistical query (SQ) gives an estimate of E_{x ~ D}[q(x)] for any choice of the query function q mapping X to the reals, where D is
an unknown data distribution over X.) Yet some data analysis algorithms rely on properties of functions that depend on multiple samples. Such algorithms would naturally be implemented using k-wise queries, each of which is specified by a function q mapping X^k to the reals. Hence it is natural to ask whether algorithms using k-wise queries can solve learning problems more efficiently, and by how much.
Blum, Kalai, and Wasserman (2003) showed that for any weak PAC learning problem over a fixed distribution, the complexity of learning with k-wise SQs is smaller than the (unary) SQ complexity by a factor of at most 2^k. We show that for more general problems over distributions the picture is substantially richer. For every k, the complexity of distribution-independent PAC learning with k-wise queries can be exponentially larger than learning with (k+1)-wise queries. We then give two approaches for simulating a k-wise query using unary queries. The first approach exploits the structure of the
problem that needs to be solved. It generalizes and strengthens (exponentially)
the results of Blum et al. It allows us to derive strong lower bounds for
learning DNF formulas and stochastic constraint satisfaction problems that hold
against algorithms using k-wise queries. The second approach exploits the
k-party communication complexity of the k-wise query function.
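To make the query models concrete, here is a minimal sketch (not from the paper; the function names and the toy distribution are illustrative) of how an oracle would answer a unary SQ and a k-wise query from i.i.d. samples: the unary case averages q over single samples, while the k-wise case averages q over disjoint k-tuples of samples.

```python
import random

def answer_sq(samples, q):
    """Answer a unary statistical query: the empirical mean of q over the
    samples, an estimate of E_{x ~ D}[q(x)]. q maps X to [0, 1]."""
    return sum(q(x) for x in samples) / len(samples)

def answer_kwise_sq(samples, q, k):
    """Answer a k-wise query q : X^k -> [0, 1] by averaging q over disjoint
    k-tuples of samples, estimating E[q(x_1, ..., x_k)] for k independent
    draws from D."""
    tuples = [tuple(samples[i:i + k])
              for i in range(0, len(samples) - k + 1, k)]
    return sum(q(t) for t in tuples) / len(tuples)

# Toy example: D is a uniform random bit.
random.seed(0)
data = [random.randint(0, 1) for _ in range(10000)]
est1 = answer_sq(data, lambda x: x)                       # close to 0.5
est2 = answer_kwise_sq(data, lambda t: t[0] * t[1], k=2)  # close to 0.25
```

Note that the 2-wise query above (the AND of two independent bits) already illustrates why k-wise queries are at least as expressive as unary ones: a unary query can only estimate expectations of single-sample functions, while the k-wise oracle directly estimates a correlation across samples.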
BibTeX Entry
@InProceedings{feldman_et_al:LIPIcs:2017:8180,
author = {Vitaly Feldman and Badih Ghazi},
title = {{On the Power of Learning from k-Wise Queries}},
booktitle = {8th Innovations in Theoretical Computer Science Conference (ITCS 2017)},
pages = {41:1--41:32},
series = {Leibniz International Proceedings in Informatics (LIPIcs)},
ISBN = {978-3-95977-029-3},
ISSN = {1868-8969},
year = {2017},
volume = {67},
editor = {Christos H. Papadimitriou},
publisher = {Schloss Dagstuhl--Leibniz-Zentrum fuer Informatik},
address = {Dagstuhl, Germany},
URL = {http://drops.dagstuhl.de/opus/volltexte/2017/8180},
URN = {urn:nbn:de:0030-drops-81801},
doi = {10.4230/LIPIcs.ITCS.2017.41},
annote = {Keywords: Statistical Queries, PAC Learning, Differential Privacy, Lower bounds, Communication Complexity}
}
Keywords: Statistical Queries, PAC Learning, Differential Privacy, Lower bounds, Communication Complexity
Collection: 8th Innovations in Theoretical Computer Science Conference (ITCS 2017)
Issue Date: 2017
Date of publication: 28.11.2017