Sparse Nonnegative Matrix Factorization: New Formulations and Algorithms

Abstract
Nonnegative matrix factorization (NMF) is a powerful and well-established technique in data mining. We generalize NMF significantly by introducing several new and rich formulations for sparse nonnegative matrix factorization (SNMF). Our formulations are substantially more general than previously studied SNMF problems. In particular, we model sparsity via constraints and regularizers, for which we describe a variety of concrete choices of sparsity tuning functions. To solve SNMF, we develop convergent and computationally efficient algorithms based on proximal minimization. To illustrate the performance of our algorithms we provide experimental results that also indicate potential benefits of the increased modeling power offered by the new formulations.
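To make the abstract's phrase "sparsity via constraints and regularizers" concrete, here is a hedged sketch of two generic SNMF templates; the penalty $\psi$, weight $\lambda$, and bound $\alpha$ are illustrative assumptions and not necessarily the paper's exact formulations ($A \approx BC$ is the NMF model defined in the introduction below):

\[
\min_{B \ge 0,\; C \ge 0} \tfrac{1}{2}\|A - BC\|_F^2 + \lambda\,\psi(C) \qquad \text{(regularized)}
\]
\[
\min_{B \ge 0,\; C \ge 0} \tfrac{1}{2}\|A - BC\|_F^2 \quad \text{s.t.} \quad \psi(c_i) \le \alpha,\; i = 1,\dots,n \qquad \text{(constrained)}
\]

Here $\psi$ stands for a sparsity tuning function, e.g., the $\ell_1$ norm.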
1 Introduction
In the versatile repertoire of a data miner, a fundamental position is occupied by matrix factorization methods, amongst which a powerful choice is nonnegative matrix factorization (NMF). The standard NMF problem is formulated as follows. Let $A \in \mathbb{R}^{m \times n}_+$ be an input matrix. We seek a set of $k$ nonnegative "basis" vectors $B = [b_1, \ldots, b_k]$ that can be nonnegatively (conically) combined to well-approximate $A$. In symbols, NMF seeks vectors $b_j \ge 0$ and coefficients $c_{ji} \ge 0$, such that for each $i$,

(1.1)   $a_i \approx \sum_{j=1}^{k} b_j c_{ji}, \qquad c_{ji} \ge 0.$

Equivalently, NMF seeks matrices $B$ and $C$, such that

(1.2)   $A \approx BC, \quad \text{where } B \in \mathbb{R}^{m \times k}_+,\ C \in \mathbb{R}^{k \times n}_+.$

NMF was originally studied in linear algebra in the early 1970s [29]. It found application in a factor-analysis setting much later [32]. However, NMF gained widespread attention only after the 1999 paper of Lee & Seung [21]. Ever since, NMF has been significantly generalized (see e.g., [10, 11, 18, 37]), several algorithms have been proposed for it (see e.g., [4, 20, 22, 24]), and an enormous number of applications have emerged (e.g., see [4, 38], and references therein).

The huge attention received by NMF may be largely attributed to the following obvious but crucial feature: it approximates nonnegative data by nonnegative factors. This feature not only results in models of the data that are easier to interpret, but it also favors discovering a "parts-based" structure within the data [21]. Another structural property of data that often plays a key role in data analysis is sparsity. Indeed, sparsity-constrained data modeling has recently received tremendous attention, especially because it offers a powerful approach to analyze and model the complex, high-dimensional datasets common in modern applications [25, 34, 35]. Therefore, a natural question to ask is whether NMF can be augmented with ideas from sparse modeling to obtain a richer, more flexible data analysis tool?

The answer to this question is yes. In fact, NMF already implicitly favors models sparser than traditional PCA / SVD factors, but this sparsity is a byproduct of the nonnegativity constraints, and as such, the user does not have explicit control over it. Hoyer [18] introduced a simple but effective penalty function to help regulate the sparsity of NMF models; other attempts at incorporating sparsity control include [14, 16, 20], for instance. However, these approaches are either limited to a fixed sparsity-inducing function, or they are computationally demanding. In this paper we investigate sparse NMF problems that are not only general and broadly applicable, but also computationally practical. Before delving into the exact details, let us summarize the main contributions of this paper for the reader's convenience.
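The paper's own algorithms are based on proximal minimization. As a much simpler baseline that still shows how the model A ≈ BC and a sparsity regularizer fit together, the sketch below uses the classical Lee-Seung multiplicative updates for min_{B,C ≥ 0} ½‖A − BC‖²_F + λ‖C‖₁, with the ℓ1 penalty as an assumed, generic choice of sparsity function rather than the paper's. The function name, parameters, and toy data are all illustrative.

    # Minimal sketch: multiplicative-update NMF with an L1 (sparsity) penalty on C.
    # Not the paper's proximal algorithm; a generic baseline under the assumptions above.
    import numpy as np

    def sparse_nmf(A, k, lam=0.1, iters=200, eps=1e-9, seed=0):
        rng = np.random.default_rng(seed)
        m, n = A.shape
        B = rng.random((m, k))   # nonnegative "basis" vectors b_j
        C = rng.random((k, n))   # nonnegative coefficients c_ji
        for _ in range(iters):
            # Update C: the L1 weight lam enters the denominator of the multiplicative rule
            C *= (B.T @ A) / (B.T @ B @ C + lam + eps)
            # Update B: plain (unpenalized) least-squares multiplicative rule
            B *= (A @ C.T) / (B @ C @ C.T + eps)
        return B, C

    # Toy usage: factor a random nonnegative 20x30 matrix with k = 5
    A = np.random.default_rng(1).random((20, 30))
    B, C = sparse_nmf(A, k=5, lam=0.5)
    print(np.linalg.norm(A - B @ C), (C < 1e-6).mean())  # fit vs. fraction of near-zero coefficients

Increasing lam drives more entries of C toward zero at the cost of a larger residual ‖A − BC‖_F, which is exactly the trade-off that explicit sparsity control is meant to expose.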

Download full report
http://googleurl?sa=t&source=web&cd=1&ve...sdmNMF.pdf&ei=uL9ETr-bA8urrAfXg-3cAw&usg=AFQjCNHGqjqIcwCwBRLQg-78-lyPGwNYdw&sig2=7pk61TTsM6xqgp88s4fQ-Q


