Artificial Intelligence and Quantum Computing for Advanced Wireless Networks. Savo G. Glisic
$$\frac{\partial\big((I \otimes \phi(x^l))\,\mathrm{vec}(F)\big)}{\partial(\mathrm{vec}(F)^T)} = I \otimes \phi(x^l).$$
We have used the fact that $\partial(Xa^T)/\partial a = X$ or $\partial(Xa)/\partial(a^T) = X$, so long as the matrix multiplications are well defined. This equation leads to
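$$\frac{\partial z}{\partial(\mathrm{vec}(F)^T)} = \frac{\partial z}{\partial(\mathrm{vec}(Y)^T)}\,\big(I \otimes \phi(x^l)\big) = \left(\big(I \otimes \phi(x^l)^T\big)\,\frac{\partial z}{\partial\,\mathrm{vec}(Y)}\right)^T = \left(\mathrm{vec}\!\left(\phi(x^l)^T\,\frac{\partial z}{\partial Y}\right)\right)^T$$
(3.93)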
Taking the transpose, we get
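$$\frac{\partial z}{\partial\,\mathrm{vec}(F)} = \mathrm{vec}\!\left(\phi(x^l)^T\,\frac{\partial z}{\partial Y}\right)$$
(3.94)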
Both Eqs. (3.87) and (3.88) are used in the above derivation, giving $\partial z/\partial F = \phi(x^l)^T\,\partial z/\partial Y$, which is a simple rule for updating the parameters in the l‐th layer: the gradient with respect to the convolution parameters is the product of $\phi(x^l)^T$ (the transposed im2col expansion) and $\partial z/\partial Y$ (the supervision signal transferred from the (l + 1)‐th layer).
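The update rule above can be checked numerically. The following is a minimal sketch, not the book's implementation: it assumes stride 1, no padding, and a column-major ordering of patch entries, and it uses an illustrative loss z = ½‖Y‖² (so that ∂z/∂Y = Y). The helper `im2col` and all variable names are hypothetical.

```python
import numpy as np

def im2col(x, H, W):
    """Return phi(x): one row per output location p, one column per kernel entry q.

    Assumes stride 1, no padding, and column-major (first index fastest) ordering.
    """
    Hl, Wl, Dl = x.shape
    H_out, W_out = Hl - H + 1, Wl - W + 1
    phi = np.empty((H_out * W_out, H * W * Dl))
    for j1 in range(W_out):
        for i1 in range(H_out):
            p = i1 + H_out * j1
            # Flatten the H x W x Dl patch with the first index running fastest.
            phi[p] = x[i1:i1 + H, j1:j1 + W, :].reshape(-1, order="F")
    return phi

rng = np.random.default_rng(0)
x = rng.standard_normal((5, 4, 3))        # input x^l with H^l=5, W^l=4, D^l=3
H, W, D = 2, 3, 2                         # kernel height, width, number of kernels
F = rng.standard_normal((H * W * 3, D))   # one vectorized kernel per column

phi = im2col(x, H, W)
Y = phi @ F                               # convolution written as a matrix product

# Illustrative loss z = 0.5 * ||Y||^2, hence dz/dY = Y (an assumption of this sketch).
dz_dY = Y
grad_analytic = phi.T @ dz_dY             # dz/dF = phi(x^l)^T dz/dY

def loss(F_):
    return 0.5 * np.sum((phi @ F_) ** 2)

# Central finite differences on every entry of F.
eps = 1e-6
grad_numeric = np.zeros_like(F)
for idx in np.ndindex(*F.shape):
    step = np.zeros_like(F)
    step[idx] = eps
    grad_numeric[idx] = (loss(F + step) - loss(F - step)) / (2 * eps)

print(np.max(np.abs(grad_analytic - grad_numeric)))  # should be close to zero
```

The two gradients should agree up to finite-difference rounding error, which is what the product rule predicts.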
The matrix $\phi(x^l)$ has $H^{l+1}W^{l+1} \cdot HWD^l$ elements, and we know from the above that its elements are indexed by a pair (p, q). So far, from Eq. (3.84) we know: (i) from q we can determine $d^l$, the channel of the convolution kernel that is used, as well as i and j, the spatial offsets inside the kernel; (ii) from p we can determine $i^{l+1}$ and $j^{l+1}$, the spatial offsets inside the convolved result $x^{l+1}$; and (iii) the spatial offsets in the input $x^l$ can be determined as $i^l = i^{l+1} + i$ and $j^l = j^{l+1} + j$. In other words, the mapping $m: (p, q) \mapsto (i^l, j^l, d^l)$ assigns exactly one triplet to every pair (p, q), and thus is a valid function. The inverse mapping, however, is one to many (and thus not a valid function), because the same input element can be copied into several positions of $\phi(x^l)$. If we use $m^{-1}$ to represent the inverse mapping, then $m^{-1}(i^l, j^l, d^l)$ is a set S, where each (p, q) ∈ S satisfies $m(p, q) = (i^l, j^l, d^l)$. Now we take a look at $\phi(x^l)$ from a different perspective.
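The mapping m and its set-valued inverse can be sketched in a few lines. This is only an illustration under stated assumptions: stride 1, no padding, and a column-major index convention (p = i^{l+1} + H^{l+1} j^{l+1}, q = i + H j + H W d^l); the convention actually fixed by Eq. (3.84) should take precedence. The function names `m` and `inverse_m` are hypothetical.

```python
from collections import defaultdict

def m(p, q, Hl, H, W):
    """Map a position (p, q) in phi(x^l) to its source index (i^l, j^l, d^l) in x^l.

    Assumed convention: p = i_next + H_out * j_next, q = i + H * j + H * W * d.
    """
    H_out = Hl - H + 1
    i_next, j_next = p % H_out, p // H_out        # offsets in the output x^{l+1}
    i, j, d = q % H, (q // H) % W, q // (H * W)   # offsets inside the kernel, channel
    return (i_next + i, j_next + j, d)

def inverse_m(Hl, Wl, Dl, H, W):
    """Collect, for each input index, the set S of all (p, q) that map to it."""
    H_out, W_out = Hl - H + 1, Wl - W + 1
    inv = defaultdict(set)
    for p in range(H_out * W_out):
        for q in range(H * W * Dl):
            inv[m(p, q, Hl, H, W)].add((p, q))
    return inv

# A 3 x 3 single-channel input with a 2 x 2 kernel.
inv = inverse_m(Hl=3, Wl=3, Dl=1, H=2, W=2)
print(m(3, 0, Hl=3, H=2, W=2))   # (1, 1, 0)
print(len(inv[(1, 1, 0)]))       # 4: the center pixel lies in all four patches
print(len(inv[(0, 0, 0)]))       # 1: a corner pixel lies in a single patch
```

Each (p, q) yields one triplet, whereas the center pixel is reached from four different (p, q) pairs, which is why the inverse is a set rather than a function.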
The question is: what information is required in order to fully specify this function? Exactly three types of information are needed. For every element of $\phi(x^l)$, we need to know
(A) Which region does it belong to; that is, what is the value of p ($0 \le p < H^{l+1}W^{l+1}$)?
(B) Which element is it inside the region (or, equivalently, inside the convolution kernel); that is, what is the value of q ($0 \le q < HWD^l$)?
The above two types of information determine a location (p, q) inside $\phi(x^l)$. The only missing information is
(C) What is the value in that position, that is, $[\phi(x^l)]_{pq}$?
Since every element in $\phi(x^l)$ is a verbatim copy of one element from $x^l$, we can reformulate question (C) into a different but equivalent one:
(C.1) Where is the value of a given $[\phi(x^l)]_{pq}$ copied from? Or, what is its original location inside $x^l$, that is, an index u that satisfies $0 \le u < H^l W^l D^l$?
(C.2) The entire $x^l$.
It is easy to see that the collective information in [A, B, C.1] (for the entire range of p, q, and u) and (C.2) ($x^l$) contains exactly the same amount of information as $\phi(x^l)$. Since $0 \le p < H^{l+1}W^{l+1}$, $0 \le q < HWD^l$, and $0 \le u < H^l W^l D^l$, we can use a matrix $M \in \mathbb{R}^{(H^{l+1}W^{l+1}HWD^l) \times (H^l W^l D^l)}$ to encode the information in [A, B, C.1].
Then, we can use the “indicator” method to encode the function $m(p, q) = (i^l, j^l, d^l)$ into M. That is, for any element of M, its row index x determines a (p, q) pair, its column index y determines an $(i^l, j^l, d^l)$ triplet, and M is defined as
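$$M(x, y) = \begin{cases} 1 & \text{if } m(p, q) = (i^l, j^l, d^l), \\ 0 & \text{otherwise.} \end{cases}$$
(3.95)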
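The indicator construction of Eq. (3.95) can be sketched as follows. This is a minimal illustration, assuming stride 1, no padding, and column-major vectorization of both $\phi(x^l)$ and $x^l$; the helper names `build_M` and `im2col` are hypothetical, and a practical implementation would store M in a sparse format rather than as a dense array.

```python
import numpy as np

def build_M(Hl, Wl, Dl, H, W):
    """Dense indicator matrix M: rows index (p, q), columns index (i^l, j^l, d^l)."""
    H_out, W_out = Hl - H + 1, Wl - W + 1
    P, Q = H_out * W_out, H * W * Dl
    M = np.zeros((P * Q, Hl * Wl * Dl))
    for q in range(Q):
        i, j, d = q % H, (q // H) % W, q // (H * W)
        for p in range(P):
            i_next, j_next = p % H_out, p // H_out
            row = p + P * q                                        # position of (p, q) in vec(phi)
            col = (i_next + i) + Hl * (j_next + j) + Hl * Wl * d   # position in vec(x)
            M[row, col] = 1.0
    return M

def im2col(x, H, W):
    """Independent slice-based construction of phi(x), used only as a cross-check."""
    Hl, Wl, Dl = x.shape
    H_out, W_out = Hl - H + 1, Wl - W + 1
    phi = np.empty((H_out * W_out, H * W * Dl))
    for j1 in range(W_out):
        for i1 in range(H_out):
            phi[i1 + H_out * j1] = x[i1:i1 + H, j1:j1 + W, :].reshape(-1, order="F")
    return phi

rng = np.random.default_rng(1)
x = rng.standard_normal((4, 3, 2))        # H^l=4, W^l=3, D^l=2
M = build_M(4, 3, 2, H=2, W=2)

print(M.shape)                            # (48, 24)
print(int(M.sum(axis=1).min()), int(M.sum(axis=1).max()))  # 1 1: one nonzero per row

# Sanity check: applying M to the vectorized input reproduces the vectorized expansion.
vec_x = x.reshape(-1, order="F")
vec_phi = im2col(x, 2, 2).reshape(-1, order="F")
print(np.allclose(M @ vec_x, vec_phi))    # True
```

Every row of M contains a single 1, so M only records where each entry of the expansion is copied from; the values themselves come from the input.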
The matrix M is very high dimensional. At the same time, it is also very sparse: there is only one nonzero entry among the $H^l W^l D^l$ elements in each row, because m is a function. M, which uses information [A, B, C.1], encodes only the correspondence between each element of $\phi(x^l)$ and the element of $x^l$ it is copied from; it does not encode any specific value in $x^l$. Putting together the correspondence information in M and the value information in $x^l$, we have