Matrix and Tensor Decompositions in Signal Processing. Gérard Favier
Читать онлайн книгу.the mode-n unfolding of χ7;
– in the case of the CMTucker model, the coupling considered here relates to the first modes of the tensor χ and the matrix Y of data via the common matrix factor A.
Coupled matrix and tensor factorization (CMTF) models were introduced in Acar et al. (2011b) by coupling a CPD model with a matrix factorization and using the gradient descent algorithm to estimate the parameters. This type of model was used by Acar et al. (2017) to merge EEG and fMRI data with the goal of analyzing brain activity. The EEG signals are modeled with a normalized CPD model (see Chapter 5), whereas the fMRI data are modeled with a matrix factorization. The data are coupled through the subjects mode (see Table I.1). The cost function to be minimized is therefore given by:
[I.3]
where the column vectors of the matrix factors (A, B, C) have unit norm, Σ is a diagonal matrix whose diagonal elements are the coefficients of the vector σ and α > 0 is a penalty parameter that allows the importance of the sparseness constraints on the weight vectors (g, σ) to be increased or decreased, modeled by means of the l1 norm. The advantage of merging EEG and fMRI data with the criterion [I.3] is that the acquisition and observation methods are complementary in terms of resolution, since EEG signals have a high temporal resolution but low spatial resolution, while fMRI imaging provides high spatial resolution;
– in the case of the CSTF model (Li et al. 2018), the tensor of high-resolution hyperspectral images (HR-HSI) is represented using a third-order Tucker model that has a sparse core with the following modes: space (width) × space (height) × spectral bands. The matrices denote the dictionaries for the width, height and spectral modes, composed of nw, nh and ns atoms, respectively, and the core tensor contains the coefficients relative to the three dictionaries. The matrices W∗, H∗ and S∗ are spatially and spectrally subsampled versions with respect to each mode. The term λ is a regularization parameter for the sparseness constraint on the core tensor, expressed in terms of the l1 norm of .
7 See definition [3.41] of the unfolding Xn, and definitions [1.65] and [1.67] of the Frobenius norm (||.||F) and the nuclear norm (||.||∗) of a matrix; for a tensor, see section 3.16.
The criteria listed in Table I.4 can be globally minimized using a nonlinear optimization method such as a gradient descent algorithm (with fixed or optimal step size), or the Gauss–Newton and Levenberg–Marquardt algorithms, the latter being a regularized form of the former. In the case of constrained optimization, the augmented Lagrangian method is very often used, as it allows the constrained optimization problem to be transformed into a sequence of unconstrained optimization problems.
The drawbacks of these optimization methods include slow convergence for gradient-type algorithms and high numerical complexity for the Gauss–Newton and Levenberg–Marquardt algorithms due to the need to compute the Jacobian matrix of the criterion w.r.t. the parameters being estimated, as well as the inverse of a large matrix.
Alternating optimization methods are therefore often used instead of a global optimization w.r.t. all matrix and tensor factors to be estimated. These iterative methods perform a sequence of separate optimizations of criteria linear in each unknown factor while fixing the other factors with the values estimated at previous iterations. An example is the standard ALS (alternating least squares) algorithm, presented in Chapter 5 for estimating PARAFAC models. For constrained optimization, the alternating direction method of multipliers (ADMM) is often used (Boyd et al. 2011).
To complete this introductory chapter, let us outline the key knowledge needed to employ tensor tools, whose presentation constitutes the main objective of this second volume:
– arrangement (also called reshaping) operations that express the data tensor as a vector (vectorization), a matrix (matricization), or a lower order tensor by combining modes; conversely, the tensorization and Hankelization operations allow us to construct tensors from data contained in large vectors or matrices;
– tensor operations such as transposition, symmetrization, Hadamard and Kronecker products, inversion and pseudo-inversion;
– the notions of eigenvalue and singular value of a tensor;
– tensor decompositions/models, and their uniqueness properties;
– algorithms used to solve dimensionality reduction problems and, hence, best low-rank approximation, parameter estimation and missing data imputation. This algorithmic aspect linked to tensors will be explored in more depth in Volume 3.
I.6. Brief description of content
Tensor operations and decompositions often use matrix tools, so we will begin by reviewing some matrix decompositions in Chapter 1, going into further detail on eigenvalue decomposition (EVD) and SVD, as well as a few of their applications.
The Hadamard, Kronecker and Khatri–Rao matrix products are presented in detail in Chapter 2, together with many of their properties and a few relations between them. To illustrate these operations, we will use them to represent first-order partial derivatives of a function, and solve matrix equations, such as Sylvester and Lyapunov ones. This chapter also introduces an index convention that is very useful for tensor computations. This convention, which generalizes Einstein’s summation convention (Pollock 2011), will be used to represent various matrix products and to prove some matrix product vectorization formulae, as well as various relations between the Kronecker, Khatri-Rao and Hadamard products. It will be used in Chapter 3 for tensor matricization and vectorization in an original way, as well as in Chapter 5 to establish matrix forms of the Tucker and PARAFAC decompositions.
Chapter 3 presents various sets of tensors before introducing the notions of matrix and tensor slices and of mode combination on which reshaping operations are based. The key tensor operations listed above are then presented. Several links between products of tensors and systems of tensor equations are also outlined, and some of these systems are solved with the least squares method.
Chapter 4 is dedicated to introducing the notions of eigenvalue and singular value for tensors. The problem of the best rank-one approximation of a tensor is also considered.
In Chapter 5, we will give a detailed presentation of various tensor decompositions, with a particular focus on the basic Tucker and CPD decompositions, which can be viewed as generalizations of matrix SVD to tensors of order greater than two. Block tensor models and constrained tensor models will also be described, as well as certain variants, such as HOSVD and BTD (block term decomposition). CPD-type decompositions are generally used to estimate latent parameters, whereas Tucker decomposition is often used to estimate modal subspaces and reduce the dimensionality via low multilinear rank approximation and truncated HOSVD.
A