dimensional data. The number of data samples (N) required to estimate a function of several variables grows exponentially with the number of dimensions (D). Since, in practice, the number of available observations is limited, high-dimensional spaces are inherently sparse. This fact is responsible for the so-called curse of dimensionality and is often referred to as the empty space phenomenon. Although most real-world problems involve observations composed of several (often many) variables, they usually do not suffer severely from this problem, because the data lie near a manifold of dimension r smaller than D. Therefore, according to Verleysen, any learning task should begin with an attempt to reduce the dimensionality of the data, since this is a key issue in high-dimensional learning. The rationale is to take advantage of the redundancies in the data in order to circumvent the curse of dimensionality (or at least to attenuate the problems inherent to high dimensions) [234, 233].

In this context, the NMF algorithm can be used to reduce the data dimensionality while preserving the information of the most relevant features, so that accurate approximations of the original data can be rebuilt. NMF is a non-linear unsupervised technique for discovering a parts-based representation of objects [264, 111], with applications in image processing, text mining, document clustering, multimedia data, bioinformatics, micro-array data analysis, molecular pattern discovery, physics, air emission control, collaborative filtering and financial analysis, among others [73, 115, 188, 264]. Essentially, it decomposes a matrix containing only non-negative coefficients into the product of two other matrices (also composed of non-negative coefficients): a parts-based matrix and a matrix containing the fractional combinations of the parts that approximate the original data.
Since the factorized matrices are usually created with reduced ranks, NMF can be perceived as a method for reducing the number of features while preserving the information that allows for the reconstruction of the original data. Reducing the dimensionality of the data has several advantages. First, since noise is usually random, NMF cannot represent it in the lower-dimensional space; noise disturbances are simply discarded, because there is no room to represent them. For the same reason, redundant (highly correlated) data will be compacted. Second, it helps circumvent the curse of dimensionality and the empty space phenomenon, thereby improving the accuracy of the resulting models. Third, the computational cost of the problem is usually reduced (since less data needs to be processed) and otherwise intractable problems may become manageable. Moreover, learning methods (including NNs) often have difficulty handling high-dimensional feature vectors. Thus, reducing the data dimensionality is particularly relevant when dealing with data vectors in high-dimensional spaces, and may be crucial in domains such as face recognition or text categorization, where the number of available features is typically high.
Formally, given a non-negative data matrix V with dimensions D × N, NMF seeks two non-negative matrices, W (D × r) and H (r × N), such that

V ≈ WH. (7.1)
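As a concrete illustration of (7.1), the factorization can be computed with the classical Lee–Seung multiplicative update rules for the Frobenius-norm objective. The sketch below is a minimal NumPy implementation; the matrix sizes, iteration count and tolerance are hypothetical choices for the example, not values from the text.

```python
import numpy as np

def nmf(V, r, n_iter=500, eps=1e-9, seed=0):
    """Approximate the non-negative D x N matrix V as W @ H using
    Lee-Seung multiplicative updates (Frobenius-norm variant)."""
    rng = np.random.default_rng(seed)
    D, N = V.shape
    W = rng.random((D, r)) + eps   # parts-based matrix (D x r)
    H = rng.random((r, N)) + eps   # fractional combinations (r x N)
    for _ in range(n_iter):
        # Multiplicative updates keep all coefficients non-negative.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Toy data: 20 samples of 8 non-negative features lying on a rank-3 manifold.
rng = np.random.default_rng(1)
V = rng.random((8, 3)) @ rng.random((3, 20))
W, H = nmf(V, r=3)
err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
```

Because the toy data were generated with rank 3 and the factorization uses r = 3, the relative reconstruction error becomes small, while W and H remain entrywise non-negative by construction of the updates.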
Generally, the value of r is chosen to satisfy (D + N)r < DN, so that the approximation WH can be viewed as a compressed form of the original data.
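To make the compression condition concrete, the following sketch uses hypothetical sizes (D = 1000 features, N = 500 samples; not values from the text): any rank r below DN/(D + N) means the factorization stores fewer coefficients than the original matrix.

```python
# Hypothetical example sizes (illustration only): D features, N samples.
D, N = 1000, 500
r = 50                      # reduced rank; satisfies (D + N) * r < D * N
original = D * N            # coefficients stored in V
factored = (D + N) * r      # coefficients stored in W (D x r) and H (r x N)
compression = original / factored
```

Here the factorized form stores 75,000 coefficients instead of 500,000, a compression factor of about 6.7.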