Abstract
Sketching has emerged as a powerful technique for speeding up problems in numerical linear algebra, such as regression. In the overconstrained regression problem, one is given an n x d matrix A, with n >> d, as well as an n x 1 vector b, and one wants to find a vector \hat{x} so as to minimize the residual error \|Ax-b\|_2. Using the sketch-and-solve paradigm, one first computes S \cdot A and S \cdot b for a randomly chosen matrix S, then outputs x' = (SA)^{\dagger} Sb so as to minimize \|SAx' - Sb\|_2.
The sketch-and-solve paradigm gives a bound on \|x'-x^*\|_2 when A is well-conditioned, where x^* denotes the exact minimizer of \|Ax-b\|_2. Our main result is that, when S is the subsampled randomized Fourier/Hadamard transform, the error x' - x^* behaves as if it lies in a "random" direction within this bound: for any fixed direction a in R^d, we have with 1 - d^{-c} probability that
(1) \langle a, x'-x^* \rangle \lesssim \frac{\|a\|_2 \|x'-x^*\|_2}{d^{\frac{1}{2}-\gamma}},
where c, \gamma > 0 are arbitrary constants. This implies \|x'-x^*\|_{\infty} is a factor d^{\frac{1}{2}-\gamma} smaller than \|x'-x^*\|_2. It also gives a better bound on the generalization of x' to new examples: if rows of A correspond to examples and columns to features, then our result gives a better bound for the error introduced by sketch-and-solve when classifying fresh examples. We show that not all oblivious subspace embeddings S satisfy these properties. In particular, we give counterexamples showing that matrices based on Count-Sketch or leverage score sampling do not satisfy these properties.
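The step from guarantee (1) to the \ell_\infty statement can be sketched as follows. This is a standard union-bound argument written out for illustration, not a quotation from the paper; it assumes (1) is applied to the 2d fixed directions \pm e_i (the signed standard basis vectors) and that c > 1 is chosen so that the failure probability stays small.

```latex
% Apply (1) with a = e_i and a = -e_i for each i in {1,...,d};
% each application fails with probability at most d^{-c}.
\begin{align*}
|x'_i - x^*_i|
  = \max\{\langle e_i, x'-x^*\rangle,\ \langle -e_i, x'-x^*\rangle\}
  &\lesssim \frac{\|e_i\|_2\,\|x'-x^*\|_2}{d^{\frac{1}{2}-\gamma}}
  = \frac{\|x'-x^*\|_2}{d^{\frac{1}{2}-\gamma}}, \\
\|x'-x^*\|_\infty = \max_{1 \le i \le d} |x'_i - x^*_i|
  &\lesssim \frac{\|x'-x^*\|_2}{d^{\frac{1}{2}-\gamma}}
  \quad \text{with probability at least } 1 - 2d \cdot d^{-c} = 1 - 2d^{1-c}.
\end{align*}
```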
We also provide lower bounds, both on how small \|x'-x^*\|_2 can be, and for our new guarantee (1), showing that the subsampled randomized Fourier/Hadamard transform is nearly optimal. Our lower bound on \|x'-x^*\|_2 shows that there is an O(1/epsilon) separation in the dimension of the optimal oblivious subspace embedding required for outputting an x' for which \|x'-x^*\|_2 <= epsilon \|Ax^*-b\|_2 \cdot \|A^{\dagger}\|_2, compared to the dimension of the optimal oblivious subspace embedding required for outputting an x' for which \|Ax'-b\|_2 <= (1+epsilon) \|Ax^*-b\|_2; that is, the former problem requires dimension Omega(d/epsilon^2) while the latter problem can be solved with dimension O(d/epsilon). This explains why known upper bounds on the dimensions of these two variants of regression have differed in prior work.
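To make the sketch-and-solve pipeline from the abstract concrete, here is a minimal illustration in standard-library Python. It is not the authors' code. It assumes S is a subsampled randomized Hadamard transform built as sqrt(n/m) * P * (H / sqrt(n)) * D, with n a power of two, and it solves the small d x d problems by normal equations, which is adequate only for tiny, well-conditioned demos. The function names (`fwht`, `srht_sketch`, `least_squares`, `demo`) and all parameter values are hypothetical.

```python
import math
import random


def fwht(v):
    """Unnormalized fast Walsh-Hadamard transform; len(v) must be a power of two."""
    v = list(v)
    n, h = len(v), 1
    while h < n:
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                a, b = v[j], v[j + h]
                v[j], v[j + h] = a + b, a - b
        h *= 2
    return v


def srht_sketch(columns, m, rng):
    """Apply one SRHT S = sqrt(n/m) * P * (H/sqrt(n)) * D to every column.

    D flips signs at random, H is the Hadamard matrix, and P keeps m rows
    sampled uniformly without replacement. The same S is used for all columns.
    """
    n = len(columns[0])
    signs = [rng.choice((-1.0, 1.0)) for _ in range(n)]
    rows = rng.sample(range(n), m)
    scale = 1.0 / math.sqrt(m)  # sqrt(n/m) times the 1/sqrt(n) normalization of H
    sketched = []
    for col in columns:
        t = fwht([s * x for s, x in zip(signs, col)])
        sketched.append([scale * t[r] for r in rows])
    return sketched


def least_squares(columns, b):
    """Minimize ||Ax - b||_2 via normal equations and Gaussian elimination."""
    d = len(columns)
    dot = lambda u, w: sum(p * q for p, q in zip(u, w))
    # Augmented system [A^T A | A^T b].
    G = [[dot(columns[i], columns[j]) for j in range(d)] + [dot(columns[i], b)]
         for i in range(d)]
    for k in range(d):
        piv = max(range(k, d), key=lambda r: abs(G[r][k]))  # partial pivoting
        G[k], G[piv] = G[piv], G[k]
        for r in range(k + 1, d):
            f = G[r][k] / G[k][k]
            for c in range(k, d + 1):
                G[r][c] -= f * G[k][c]
    x = [0.0] * d
    for k in reversed(range(d)):
        x[k] = (G[k][d] - sum(G[k][c] * x[c] for c in range(k + 1, d))) / G[k][k]
    return x


def demo(n=1024, d=4, m=256, seed=0):
    """Return (x_star, x_prime) for a synthetic Gaussian regression instance."""
    rng = random.Random(seed)
    A_cols = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(d)]
    x_true = [float(j + 1) for j in range(d)]
    b = [sum(A_cols[j][i] * x_true[j] for j in range(d)) + 0.1 * rng.gauss(0.0, 1.0)
         for i in range(n)]
    x_star = least_squares(A_cols, b)         # exact minimizer of ||Ax - b||_2
    sk = srht_sketch(A_cols + [b], m, rng)    # S*A and S*b with the same S
    x_prime = least_squares(sk[:d], sk[d])    # x' = (SA)^+ Sb
    return x_star, x_prime


if __name__ == "__main__":
    x_star, x_prime = demo()
    diff = [p - q for p, q in zip(x_prime, x_star)]
    print("l2 error:  ", math.sqrt(sum(t * t for t in diff)))
    print("linf error:", max(abs(t) for t in diff))
```

Running the script prints the \ell_2 and \ell_\infty norms of x' - x^* for one random draw of S. With only d = 4 coordinates the gap between the two norms is necessarily small; the d^{\frac{1}{2}-\gamma} factor in (1) becomes visible only for larger d.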
BibTeX Entry
@InProceedings{price_et_al:LIPIcs:2017:7448,
author = {Eric Price and Zhao Song and David P. Woodruff},
title = {{Fast Regression with an $\ell_\infty$ Guarantee}},
booktitle = {44th International Colloquium on Automata, Languages, and Programming (ICALP 2017)},
pages = {59:1--59:14},
series = {Leibniz International Proceedings in Informatics (LIPIcs)},
ISBN = {978-3-95977-041-5},
ISSN = {1868-8969},
year = {2017},
volume = {80},
editor = {Ioannis Chatzigiannakis and Piotr Indyk and Fabian Kuhn and Anca Muscholl},
publisher = {Schloss Dagstuhl--Leibniz-Zentrum fuer Informatik},
address = {Dagstuhl, Germany},
URL = {http://drops.dagstuhl.de/opus/volltexte/2017/7448},
URN = {urn:nbn:de:0030-drops-74488},
doi = {10.4230/LIPIcs.ICALP.2017.59},
annote = {Keywords: Linear regression, Count-Sketch, Gaussians, Leverage scores, $\ell_\infty$-guarantee}
}
Keywords: Linear regression, Count-Sketch, Gaussians, Leverage scores, $\ell_\infty$-guarantee
Collection: 44th International Colloquium on Automata, Languages, and Programming (ICALP 2017)
Issue Date: 2017
Date of publication: 07.07.2017