Artificial Intelligence and Quantum Computing for Advanced Wireless Networks. Savo G. Glisic


… (∂z/∂Y)F^T]_{(p, q)}.

In other words, to compute ∂z/∂X, we do not need to explicitly use the extremely high‐dimensional matrix M. Instead, Eqs. (3.102) and (3.84) can be used to efficiently find it. The convolution example from Figure 3.23 is used to illustrate the inverse mapping m⁻¹ in Figure 3.25.

In the right half of Figure 3.25, the 6 × 4 matrix is (∂z/∂Y)F^T. To compute the partial derivative of z with respect to one element of the input X, we need to find which elements of (∂z/∂Y)F^T are involved and add them. In the left half of Figure 3.25, the input element 5 (shown in larger font) is involved in four convolution operations, shown by the gray, light gray, dotted gray, and black boxes, respectively. These four convolution operations correspond to p = 1, 2, 3, 4. For example, when p = 2 (the light gray box), 5 is the third element in the convolution, hence q = 3, and we put a light gray circle at the (2, 3)‐th element of the (∂z/∂Y)F^T matrix. After all four circles are placed, the partial derivative is the sum of the elements at these four locations of (∂z/∂Y)F^T. The set m⁻¹(i^l, j^l, d^l) contains at most HWD^l elements. Hence, Eq. (3.102) requires at most HWD^l summations to compute one element of ∂z/∂X.
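The lookup-and-sum procedure can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: a single channel, stride 1, no padding, a 3 × 4 input with a 2 × 2 kernel (which gives the 6 × 4 shape mentioned above), and column-major ordering of both the window index p and the in-window index q. The function names `inverse_map` and `grad_wrt_input` are hypothetical, the values in `G` are placeholders rather than the numbers in Figure 3.25, and indices are 0-based, so the pair (p, q) = (2, 3) in the text appears as (1, 2).

```python
import numpy as np

def inverse_map(H_in, W_in, H, W):
    """Map each input position (i, j) to the (p, q) entries it contributes to.

    Assumes one channel, stride 1, no padding, and column-major ordering of
    both the convolution windows p and the in-window offsets q (0-based).
    """
    H_out, W_out = H_in - H + 1, W_in - W + 1
    m_inv = {(i, j): [] for i in range(H_in) for j in range(W_in)}
    for j_out in range(W_out):
        for i_out in range(H_out):
            p = i_out + H_out * j_out          # index of the convolution window
            for j in range(W):
                for i in range(H):
                    q = i + H * j              # position inside the window
                    m_inv[(i_out + i, j_out + j)].append((p, q))
    return m_inv

def grad_wrt_input(G, H_in, W_in, H, W):
    """Sum the entries of G = (dz/dY) F^T selected by the inverse mapping."""
    m_inv = inverse_map(H_in, W_in, H, W)
    dX = np.zeros((H_in, W_in))
    for (i, j), pairs in m_inv.items():
        dX[i, j] = sum(G[p, q] for p, q in pairs)
    return dX

# 3 x 4 input, 2 x 2 kernel: 6 windows of 4 elements each, so G is 6 x 4.
G = np.arange(24, dtype=float).reshape(6, 4)   # placeholder values only
m_inv = inverse_map(3, 4, 2, 2)
print(m_inv[(1, 1)])   # [(0, 3), (1, 2), (2, 1), (3, 0)]
dX = grad_wrt_input(G, 3, 4, 2, 2)
print(dX[1, 1])        # 30.0, i.e. G[0,3] + G[1,2] + G[2,1] + G[3,0]
```

The input position (1, 1) is touched by four windows, so its gradient is a sum of four entries of `G`, and the large matrix M is never formed.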

The pooling layer: Let x^l ∈ ℝ^{H^l × W^l × D^l} be the input to the l‐th layer, which is now a pooling layer. The pooling operation requires no parameters (i.e., w^l is null, and hence no parameter learning is needed for this layer). The spatial extent of the pooling (H × W) is specified in the design of the CoNN structure. Assuming that H divides H^l, W divides W^l, and the stride equals the pooling spatial extent, the output of pooling (y, or equivalently x^{l + 1}) is an order‐3 tensor of size H^{l + 1} × W^{l + 1} × D^{l + 1}, with H^{l + 1} = H^l/H, W^{l + 1} = W^l/W, D^{l + 1} = D^l. A pooling layer operates on x^l channel by channel, independently. Within each channel, the matrix with H^l × W^l elements is divided into H^{l + 1} × W^{l + 1} nonoverlapping subregions, each of size H × W. The pooling operator then maps each subregion to a single number. Two types of pooling operators are widely used: max pooling and average pooling. Max pooling maps a subregion to its maximum value, while average pooling maps a subregion to its average value, as illustrated in Figure 3.26.

Figure 3.25 Computing ∂z/∂X. (for more details see the color figure in the bins).

Figure 3.26 Illustration of pooling layer operation. (for more details see the color figure in the bins).

Formally this can be represented as

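$$
\begin{aligned}
\text{max:}\quad & y_{i^{l+1},\, j^{l+1},\, d} = \max_{0 \le i < H,\ 0 \le j < W} x^l_{i + i^{l+1} \times H,\ j + j^{l+1} \times W,\ d}, \\
\text{average:}\quad & y_{i^{l+1},\, j^{l+1},\, d} = \frac{1}{HW} \sum_{0 \le i < H,\ 0 \le j < W} x^l_{i + i^{l+1} \times H,\ j + j^{l+1} \times W,\ d},
\end{aligned}
\tag{3.103}
$$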

where 0 ≤ i^{l + 1} < H^{l + 1}, 0 ≤ j^{l + 1} < W^{l + 1}, and 0 ≤ d < D^{l + 1} = D^l.
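As a concrete illustration of the forward computation, the following NumPy sketch pools an H^l × W^l × D^l array over nonoverlapping H × W subregions, channel by channel. It assumes, as in the text, that H divides H^l, W divides W^l, and the stride equals the pooling extent. The function name `pool2d` and its `mode` argument are hypothetical choices made for this example.

```python
import numpy as np

def pool2d(x, H, W, mode="max"):
    """Nonoverlapping H x W pooling of an (H_l, W_l, D_l) array.

    The stride is assumed equal to the pooling extent, so each input
    element belongs to exactly one subregion.
    """
    H_l, W_l, D_l = x.shape
    if H_l % H != 0 or W_l % W != 0:
        raise ValueError("H must divide H^l and W must divide W^l")
    # Split each spatial axis into (subregion index, offset inside subregion):
    # x[i_out * H + i, j_out * W + j, d] becomes blocks[i_out, i, j_out, j, d].
    blocks = x.reshape(H_l // H, H, W_l // W, W, D_l)
    if mode == "max":
        return blocks.max(axis=(1, 3))
    if mode == "avg":
        return blocks.mean(axis=(1, 3))
    raise ValueError("mode must be 'max' or 'avg'")

x = np.arange(16, dtype=float).reshape(4, 4, 1)   # one 4 x 4 channel holding 0..15
print(pool2d(x, 2, 2, "max")[:, :, 0])   # [[ 5.  7.] [13. 15.]]
print(pool2d(x, 2, 2, "avg")[:, :, 0])   # [[ 2.5  4.5] [10.5 12.5]]
```

The output has shape (H^l/H) × (W^l/W) × D^l, so the number of channels is unchanged, in line with D^{l + 1} = D^l.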

Pooling is a local operator, and its forward computation is straightforward. For backpropagation, we discuss only max pooling, and we can again resort to an indicator matrix. All this indicator matrix needs to encode is: for every element in y, where does it come from in x^l?

We need a triplet (i^l, j^l, d^l) to locate one element in the input x^l, and another triplet (i^{l + 1}, j^{l + 1}, d^{l + 1}) to locate one element in y. The pooling output y_{i^{l + 1}, j^{l + 1}, d^{l + 1}} comes from x^l_{i^l, j^l, d^l} if and only if the following conditions are met: (i) they are in the same channel; (ii) the (i^l, j^l)‐th spatial entry belongs to the (i^{l + 1}, j^{l + 1})‐th subregion; and (iii) the (i^l, j^l)‐th spatial entry is the largest one in that subregion. This can be represented as

$$
d^{l+1} = d^l, \quad \left\lfloor i^l / H \right\rfloor = i^{l+1}, \quad \left\lfloor j^l / W \right\rfloor = j^{l+1}, \quad x^l_{i^l,\, j^l,\, d^l} \ge x^l_{i + i^{l+1} \times H,\ j + j^{l+1} \times W,\ d^l}, \quad \forall\, 0 \le i < H,\ 0 \le j < W,
$$

where ⌊·⌋ is the floor function. If the stride is not H in the vertical direction or W in the horizontal direction, the equation must be changed accordingly. Given an (i^{l + 1}, j^{l + 1}, d^{l + 1}) triplet, there is only one (i^l, j^l, d^l) triplet that satisfies all these conditions. So, we define an indicator matrix
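The three conditions (same channel, membership in the subregion, largest entry in the subregion) can be checked directly in code. The sketch below is a hedged illustration rather than the book's construction: it assumes stride equal to the pooling extent, 0-based indices, and no ties within a subregion (with ties, `np.argmax` simply returns the first maximum). The names `max_pool_source` and `satisfies_conditions` are hypothetical.

```python
import numpy as np

def max_pool_source(x, H, W, i_out, j_out, d):
    """Return the input triplet (i_l, j_l, d_l) that output (i_out, j_out, d) comes from."""
    window = x[i_out * H:(i_out + 1) * H, j_out * W:(j_out + 1) * W, d]
    i, j = np.unravel_index(np.argmax(window), window.shape)
    return i_out * H + int(i), j_out * W + int(j), d

def satisfies_conditions(x, H, W, src, dst):
    """Check conditions (i)-(iii) for input triplet src and output triplet dst."""
    (i_l, j_l, d_l), (i_out, j_out, d_out) = src, dst
    same_channel = d_l == d_out
    in_subregion = (i_l // H == i_out) and (j_l // W == j_out)
    window = x[i_out * H:(i_out + 1) * H, j_out * W:(j_out + 1) * W, d_out]
    is_largest = bool(np.all(x[i_l, j_l, d_l] >= window))
    return same_channel and in_subregion and is_largest

x = np.arange(16, dtype=float).reshape(4, 4, 1)   # one 4 x 4 channel holding 0..15
src = max_pool_source(x, 2, 2, 1, 0, 0)
print(src)                                             # (3, 1, 0)
print(satisfies_conditions(x, 2, 2, src, (1, 0, 0)))   # True
```

For the output element (1, 0, 0), the subregion covers input rows 2 to 3 and columns 0 to 1, and its largest entry (the value 13) sits at input position (3, 1, 0).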
