Recall the standard vectorization identity: for matrices A, X, and B of compatible sizes,

vec(AXB) = (Bᵀ ⊗ A) vec(X),

where ⊗ denotes the Kronecker product.
This identity can be applied in both directions. Applying it to y = ϕ(x^l) F, we can write

vec(y) = vec(ϕ(x^l) F I) = (I ⊗ ϕ(x^l)) vec(F), (3.89)

vec(y) = vec(I ϕ(x^l) F) = (Fᵀ ⊗ I) vec(ϕ(x^l)), (3.90)
where I is an identity matrix of appropriate size. In Eq. (3.89), the size of I is determined by the number of columns in F, and hence I ∈ ℝ^{D × D}. In Eq. (3.90), the size of I is determined by the number of rows in ϕ(x^l), and hence I ∈ ℝ^{H^{l+1}W^{l+1} × H^{l+1}W^{l+1}}.
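These identities are easy to verify numerically. The following is a minimal NumPy sketch (all sizes are arbitrary illustrative choices, not taken from the text) that checks the vectorization identity and both forms (3.89) and (3.90); note that vec(·) here stacks columns, which corresponds to Fortran-order flattening.

```python
import numpy as np

def vec(M):
    # Column-stacking vectorization, matching vec(.) in the text.
    return M.flatten(order="F")

rng = np.random.default_rng(0)

# Generic identity: vec(AXB) = (B^T kron A) vec(X)
A = rng.standard_normal((5, 4))
X = rng.standard_normal((4, 3))
B = rng.standard_normal((3, 2))
assert np.allclose(vec(A @ X @ B), np.kron(B.T, A) @ vec(X))

# Illustrative sizes: H^{l+1}W^{l+1} = 6, HWD^l = 4, D = 3 (assumed).
phi = rng.standard_normal((6, 4))   # plays the role of phi(x^l)
F = rng.standard_normal((4, 3))     # plays the role of the kernel matrix F
y = phi @ F

# Eq. (3.89): vec(y) = (I kron phi(x^l)) vec(F), with I of size D x D
assert np.allclose(vec(y), np.kron(np.eye(3), phi) @ vec(F))

# Eq. (3.90): vec(y) = (F^T kron I) vec(phi(x^l)), with I of size 6 x 6
assert np.allclose(vec(y), np.kron(F.T, np.eye(6)) @ vec(phi))
```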
Update the parameters – backward propagation: First, we need to compute ∂z/∂vec(x^l) and ∂z/∂vec(F), where the first term will be used for backward propagation to the previous ((l − 1)-th) layer, and the second term will determine how the parameters of the current (l-th) layer are updated. Keep in mind that f, F, and w^l refer to the same thing (modulo reshaping of the vector, matrix, or tensor). Similarly, we can reshape y into a matrix Y ∈ ℝ^{H^{l+1}W^{l+1} × D^{l+1}}. The notation is summarized in Table 3.3.
Table 3.3 Variables for the derivation of gradients (ϕ and φ denote the same im2row expansion).

| Symbol | Alias | Size and Meaning |
|---|---|---|
| X | x^l | H^l W^l × D^l, the input tensor |
| F | f, w^l | HW D^l × D, D kernels, each H × W with D^l channels |
| Y | y, x^{l+1} | H^{l+1}W^{l+1} × D^{l+1}, the output, D^{l+1} = D |
| ϕ(x^l) | | H^{l+1}W^{l+1} × HW D^l, the im2row expansion of x^l |
| M | | H^{l+1}W^{l+1}HW D^l × H^l W^l D^l, the indicator matrix for ϕ(x^l) |
| ∂z/∂Y | | H^{l+1}W^{l+1} × D^{l+1}, gradient for y |
| ∂z/∂F | | HW D^l × D, gradient to update the convolution kernels |
| ∂z/∂X | | H^l W^l × D^l, gradient for x^l, useful for back propagation |
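To make the role of the im2row expansion ϕ(x^l) concrete, here is a minimal sketch for stride 1 and no padding (the helper im2row, the patch-flattening order, and all sizes are illustrative assumptions, not from the text); the convolution forward pass then reduces to the matrix product Y = ϕ(x^l) F.

```python
import numpy as np

def im2row(x, H, W):
    """im2row expansion: row k holds the k-th H x W patch of x, flattened.
    x: (H_in, W_in, D_in) input; returns (H_out * W_out, H * W * D_in)."""
    H_in, W_in, D_in = x.shape
    H_out, W_out = H_in - H + 1, W_in - W + 1  # stride 1, no padding
    rows = np.empty((H_out * W_out, H * W * D_in))
    for i in range(H_out):
        for j in range(W_out):
            # The flattening order is a convention; F must be flattened the same way.
            rows[i * W_out + j] = x[i:i + H, j:j + W, :].reshape(-1)
    return rows

rng = np.random.default_rng(0)
H, W, Dl, D = 3, 3, 2, 4                  # kernel size, input channels, kernel count
x = rng.standard_normal((5, 5, Dl))       # x^l
F = rng.standard_normal((H * W * Dl, D))  # D kernels, each H x W x D^l, flattened
phi = im2row(x, H, W)                     # (H^{l+1} W^{l+1}) x (HW D^l) = 9 x 18
Y = phi @ F                               # (H^{l+1} W^{l+1}) x D = 9 x 4
```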
From the chain rule, it is easy to compute ∂z/∂vec(F) as

∂z/∂(vec(F))ᵀ = ∂z/∂(vec(y))ᵀ ∂vec(y)/∂(vec(F))ᵀ. (3.91)
The first term on the right in Eq. (3.91) has already been computed in the (l + 1)-th layer as ∂z/∂(vec(x^{l+1}))ᵀ. Based on Eq. (3.89), we have
∂vec(y)/∂(vec(F))ᵀ = I ⊗ ϕ(x^l). (3.92)
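Combining (3.91) with (3.92) and reshaping back to matrix form yields ∂z/∂F = ϕ(x^l)ᵀ (∂z/∂Y). A minimal sketch of a numerical check, assuming an illustrative scalar loss z = ‖Y‖²/2 so that ∂z/∂Y = Y (the loss and all sizes are assumptions, not from the text):

```python
import numpy as np

def vec(M):
    # Column-stacking vectorization, matching vec(.) in the text.
    return M.flatten(order="F")

rng = np.random.default_rng(1)
phi = rng.standard_normal((6, 4))   # phi(x^l), illustrative sizes
F = rng.standard_normal((4, 3))
Y = phi @ F
dz_dY = Y                           # gradient of z = ||Y||^2 / 2 w.r.t. Y

# Eq. (3.91) with (3.92): dz/d(vec F)^T = dz/d(vec y)^T (I kron phi(x^l))
dz_dvecF = vec(dz_dY) @ np.kron(np.eye(3), phi)

# Equivalent matrix form: dz/dF = phi(x^l)^T (dz/dY)
dz_dF = phi.T @ dz_dY
assert np.allclose(dz_dvecF, vec(dz_dF))

# Finite-difference check on a single entry of F.
eps = 1e-6
F_pert = F.copy()
F_pert[1, 2] += eps
z0 = 0.5 * np.sum((phi @ F) ** 2)
z1 = 0.5 * np.sum((phi @ F_pert) ** 2)
assert np.isclose((z1 - z0) / eps, dz_dF[1, 2], rtol=1e-4)
```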