License: Creative Commons Attribution 3.0 Unported license (CC BY 3.0)
When quoting this document, please refer to the following
DOI: 10.4230/LIPIcs.ICALP.2016.44
URN: urn:nbn:de:0030-drops-63246
Go to the corresponding LIPIcs Volume Portal

Blasiok, Jaroslaw ; Nelson, Jelani

An Improved Analysis of the ER-SpUD Dictionary Learning Algorithm

LIPIcs-ICALP-2016-44.pdf (0.6 MB)


In dictionary learning we observe Y = AX + E for some Y in R^{n*p}, A in R^{m*n}, and X in R^{m*p}, where p >= max{n, m}, and typically m >=n. The matrix Y is observed, and A, X, E are unknown. Here E is a "noise" matrix of small norm, and X is column-wise sparse. The matrix A is referred to as a dictionary, and its columns as atoms. Then, given some small number p of samples, i.e. columns of Y , the goal is to learn the dictionary A up to small error, as well as the coefficient matrix X. In applications one could for example think of each column of Y as a distinct image in a database. The motivation is that in many applications data is expected to sparse when represented by atoms in the "right" dictionary A (e.g. images in the Haar wavelet basis), and the goal is to learn A from the data to then use it for other applications.

Recently, the work of [Spielman/Wang/Wright, COLT'12] proposed the dictionary learning algorithm ER-SpUD with provable guarantees when E = 0 and m = n. That work showed that if X has independent entries with an expected Theta n non-zeroes per column for 1/n <~ Theta <~ 1/sqrt(n), and with non-zero entries being subgaussian, then for p >~ n^2 log^2 n with high probability ER-SpUD outputs matrices A', X' which equal A, X up to permuting and scaling columns (resp. rows) of A (resp. X). They conjectured that p >~ n log n suffices, which they showed was information theoretically necessary for any algorithm to succeed when Theta =~ 1/n. Significant progress toward showing that p >~ n log^4 n might suffice was later obtained in [Luh/Vu, FOCS'15].

In this work, we show that for a slight variant of ER-SpUD, p >~ n log(n/delta) samples suffice for successful recovery with probability 1 - delta. We also show that without our slight variation made to ER-SpUD, p >~ n^{1.99} samples are required even to learn A, X with a small success probability of 1/ poly(n). This resolves the main conjecture of [Spielman/Wang/Wright, COLT'12], and contradicts a result of [Luh/Vu, FOCS'15], which claimed that p >~ n log^4 n guarantees high probability of success for the original ER-SpUD algorithm.

BibTeX - Entry

  author =	{Jaroslaw Blasiok and Jelani Nelson},
  title =	{{An Improved Analysis of the ER-SpUD Dictionary Learning Algorithm}},
  booktitle =	{43rd International Colloquium on Automata, Languages, and Programming (ICALP 2016)},
  pages =	{44:1--44:14},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-013-2},
  ISSN =	{1868-8969},
  year =	{2016},
  volume =	{55},
  editor =	{Ioannis Chatzigiannakis and Michael Mitzenmacher and Yuval Rabani and Davide Sangiorgi},
  publisher =	{Schloss Dagstuhl--Leibniz-Zentrum fuer Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{},
  URN =		{urn:nbn:de:0030-drops-63246},
  doi =		{10.4230/LIPIcs.ICALP.2016.44},
  annote =	{Keywords: dictionary learning, stochastic processes, generic chaining}

Keywords: dictionary learning, stochastic processes, generic chaining
Collection: 43rd International Colloquium on Automata, Languages, and Programming (ICALP 2016)
Issue Date: 2016
Date of publication: 23.08.2016

DROPS-Home | Fulltext Search | Imprint | Privacy Published by LZI