Abstract
In dictionary learning we observe Y = AX + E for some Y in R^{n*p}, A in R^{m*n}, and X in R^{m*p}, where p >= max{n, m}, and typically m >=n. The matrix Y is observed, and A, X, E are unknown. Here E is a "noise" matrix of small norm, and X is columnwise sparse. The matrix A is referred to as a dictionary, and its columns as atoms. Then, given some small number p of samples, i.e. columns of Y , the goal is to learn the dictionary A up to small error, as well as the coefficient matrix X. In applications one could for example think of each column of Y as a distinct image in a database. The motivation is that in many applications data is expected to sparse when represented by atoms in the "right" dictionary A (e.g. images in the Haar wavelet basis), and the goal is to learn A from the data to then use it for other applications.
Recently, the work of [Spielman/Wang/Wright, COLT'12] proposed the dictionary learning algorithm ERSpUD with provable guarantees when E = 0 and m = n. That work showed that if X has independent entries with an expected Theta n nonzeroes per column for 1/n <~ Theta <~ 1/sqrt(n), and with nonzero entries being subgaussian, then for p >~ n^2 log^2 n with high probability ERSpUD outputs matrices A', X' which equal A, X up to permuting and scaling columns (resp. rows) of A (resp. X). They conjectured that p >~ n log n suffices, which they showed was information theoretically necessary for any algorithm to succeed when Theta =~ 1/n. Significant progress toward showing that p >~ n log^4 n might suffice was later obtained in [Luh/Vu, FOCS'15].
In this work, we show that for a slight variant of ERSpUD, p >~ n log(n/delta) samples suffice for successful recovery with probability 1  delta. We also show that without our slight variation made to ERSpUD, p >~ n^{1.99} samples are required even to learn A, X with a small success probability of 1/ poly(n). This resolves the main conjecture of [Spielman/Wang/Wright, COLT'12], and contradicts a result of [Luh/Vu, FOCS'15], which claimed that p >~ n log^4 n guarantees high probability of success for the original ERSpUD algorithm.
BibTeX  Entry
@InProceedings{blasiok_et_al:LIPIcs:2016:6324,
author = {Jaroslaw Blasiok and Jelani Nelson},
title = {{An Improved Analysis of the ERSpUD Dictionary Learning Algorithm}},
booktitle = {43rd International Colloquium on Automata, Languages, and Programming (ICALP 2016)},
pages = {44:144:14},
series = {Leibniz International Proceedings in Informatics (LIPIcs)},
ISBN = {9783959770132},
ISSN = {18688969},
year = {2016},
volume = {55},
editor = {Ioannis Chatzigiannakis and Michael Mitzenmacher and Yuval Rabani and Davide Sangiorgi},
publisher = {Schloss DagstuhlLeibnizZentrum fuer Informatik},
address = {Dagstuhl, Germany},
URL = {http://drops.dagstuhl.de/opus/volltexte/2016/6324},
URN = {urn:nbn:de:0030drops63246},
doi = {10.4230/LIPIcs.ICALP.2016.44},
annote = {Keywords: dictionary learning, stochastic processes, generic chaining}
}
Keywords: 

dictionary learning, stochastic processes, generic chaining 
Collection: 

43rd International Colloquium on Automata, Languages, and Programming (ICALP 2016) 
Issue Date: 

2016 
Date of publication: 

23.08.2016 