Artificial Intelligence and Quantum Computing for Advanced Wireless Networks. Savo G. Glisic

w affect four components of vec(A_{n,u}), that is, a₃, V₂, a₂, and […]. By the properties of derivatives for matrix products and the chain rule,

(5.86) equation

holds. Thus, (vec(R_{n,u}))^′ · ∂vec(A_{n,u})/∂w is the sum of four contributions. To derive a method for computing those terms, let I_a denote the a × a identity matrix, let ⊗ be the Kronecker product, and suppose that P_a is an a² × a matrix such that vec(diag(v)) = P_a v for any vector v ∈ ℝ^a. By the properties of the Kronecker product, vec(AB) = (B^′ ⊗ I_a) · vec(A) holds for matrices A, B, and I_a having compatible dimensions [67]. Thus, we have

which implies

Similarly, using the properties vec(ABC) = (C^′ ⊗ A) · vec(B) and vec(AB) = (I_a ⊗ A) · vec(B), it follows that

where d_h is the number of hidden neurons. Then, we have

(5.87) equation

(5.88) equation

(5.89) equation

(5.90) equation

where the aforementioned Kronecker product properties have been used.
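The Kronecker product properties used above can be checked numerically. The following is a minimal sketch, assuming the column-stacking (column-major) convention for vec; the helper names `vec` and `diag_selector` and the matrix sizes are illustrative choices, not taken from the text.

```python
import numpy as np

def vec(M):
    """Stack the columns of M into a single vector (column-major order)."""
    return M.reshape(-1, order="F")

def diag_selector(a):
    """Build the a^2 x a matrix P_a such that vec(diag(v)) = P_a @ v."""
    P = np.zeros((a * a, a))
    for i in range(a):
        # Entry (i, i) of diag(v) lands at position i*a + i of the stacked vector.
        P[i * a + i, i] = 1.0
    return P

rng = np.random.default_rng(0)
a, b, c, d = 3, 4, 2, 5            # arbitrary compatible dimensions
A = rng.standard_normal((a, b))
B = rng.standard_normal((b, c))
C = rng.standard_normal((c, d))
v = rng.standard_normal(a)

# vec(diag(v)) = P_a v
ok_diag = np.allclose(vec(np.diag(v)), diag_selector(a) @ v)
# vec(AB) = (B' kron I_a) vec(A), where A has a rows
ok_right = np.allclose(vec(A @ B), np.kron(B.T, np.eye(a)) @ vec(A))
# vec(AB) = (I kron A) vec(B), where the identity has as many rows as B has columns
ok_left = np.allclose(vec(A @ B), np.kron(np.eye(c), A) @ vec(B))
# vec(ABC) = (C' kron A) vec(B)
ok_triple = np.allclose(vec(A @ B @ C), np.kron(C.T, A) @ vec(B))
```

Note that in the identity vec(AB) = (I ⊗ A) · vec(B) the size of the identity is set by the number of columns of B, which is what "compatible dimensions" amounts to in this sketch.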

It follows that (vec(R_{n,u}))^′ · ∂vec(A_{n,u})/∂w can be written as the sum of the four contributions given by Eqs. (5.87)–(5.90). The second and fourth terms, Eqs. (5.88) and (5.90), can be computed directly from the corresponding formulas. The first can be calculated by observing that […] has the form of the function computed by a three‐layered FNN that is identical to h_w except for the activation function of the last layer. In fact, if we denote such a network by […], then

(5.91) equation

holds, where […]. Similar reasoning applies to the third contribution.

Required number of operations: The above method involves two tasks: the matrix multiplications of Eqs. (5.87)–(5.90) and the backpropagation defined by Eq. (5.91). By inspection of Eqs. (5.87)–(5.90), the first task costs approximately 2s² + 12s · hi_h + 10s² · hi_h floating‐point operations, where hi_h denotes the number of hidden‐layer neurons implementing the function h. The second task has approximately the same cost as a backpropagation phase through the original function h_w.

The estimate for the first task follows from these observations. For an a × b matrix C and a b × c matrix D, the product CD requires approximately 2abc operations; more precisely, abc multiplications and ac(b − 1) sums. If D is a diagonal b × b matrix, then CD requires 2ab operations. Similarly, if C is an a × b matrix, D is a b × a matrix, and P_a is the a² × a matrix defined above and used in Eqs. (5.87)–(5.90), then computing (vec(CD))^′ P_a costs only 2ab operations, provided that a sparse representation is used for P_a. Finally, a₁, a₂, a₃ are already available, since they are computed during the forward phase of the learning algorithm. Thus, the complexity of computing ∂p_w/∂w is […].

Note, however, that even though the sum in Eq. (5.85) ranges over all the arcs of the graph, only those arcs (n, u) for which R_{n,u} ≠ 0 have to be considered. In practice, R_{n,u} ≠ 0 is a rare event: it happens only when the columns of the Jacobian are larger than μ, and a penalty function was used to limit the occurrence of these cases. As a consequence, a better estimate of the complexity of computing ∂p_w/∂w is O([…]), where t_R is the average number of nodes u such that R_{n,u} ≠ 0 holds for some n.
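The operation counts above can be made concrete with a short sketch. The function names are hypothetical, and the code only restates the counts given in the text; it is not a profile of any actual GNN implementation. It also illustrates why combining vec(CD) with P_a is cheap: the result is just the diagonal of CD, which needs a dot products of length b (about 2ab operations) rather than the full a × a product.

```python
import numpy as np

def matmul_flops(a, b, c):
    """Exact (multiplications, additions) for a dense (a x b) @ (b x c) product."""
    return a * b * c, a * c * (b - 1)

def kronecker_terms_flops(s, hi_h):
    """Approximate operation count quoted in the text for Eqs. (5.87)-(5.90)."""
    return 2 * s**2 + 12 * s * hi_h + 10 * s**2 * hi_h

def diag_of_product(C, D):
    """diag(C @ D) via a dot products of length b, never forming the a x a product."""
    return np.einsum("ij,ji->i", C, D)

rng = np.random.default_rng(0)
a, b = 4, 6                         # arbitrary sizes chosen for illustration
C = rng.standard_normal((a, b))
D = rng.standard_normal((b, a))

# Dense reference: build P_a explicitly and apply it to vec(CD) (column-major vec).
P_a = np.zeros((a * a, a))
P_a[np.arange(a) * a + np.arange(a), np.arange(a)] = 1.0
dense = P_a.T @ (C @ D).reshape(-1, order="F")

sparse_matches_dense = np.allclose(dense, diag_of_product(C, D))
mults, adds = matmul_flops(a, b, a)  # cost of forming C @ D in full, for comparison
```

With a = 4 and b = 6, forming CD in full costs 96 multiplications and 80 additions, whereas the diagonal alone needs 24 multiplications and 20 additions.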

1 Instructions b = (∂e_w/∂o)(∂G_w/∂x)(x, l_N) and c = (∂e_w/∂o)(∂G_w/∂w)(x, l_N): The terms b and c can be calculated by backpropagating ∂e_w/∂o through the network that implements g_w. Since this operation must be repeated for each node, the time complexity of instructions b and c is […] for all the GNN models.

2 Instruction d = z(t)(∂F_w/∂w)(x, l): By definition of F_w, f_w, and BP, we have

(5.92) equation

where y = [l_n, x_u, l_{(n, u)}, l_u] and BP₁ indicates that we are considering only the first part of the output of BP. Similarly

(5.93) equation

where y = [l_n, x_u, l_{(n, u)}, l_u]. These two equations provide a direct method to compute d in positional and nonlinear GNNs, respectively.
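Returning to instructions b and c in item 1: the per-node backpropagation through g_w can be sketched as follows. This is an illustration under stated assumptions, not the book's implementation. It assumes a hypothetical g_w with one tanh hidden layer, a linear output, and no biases, taking the state x_n and label l_n as input; the names `g_forward`, `g_backward`, and `compute_b_c` are invented for the example.

```python
import numpy as np

def g_forward(x_n, l_n, W1, W2):
    """Hypothetical g_w: one tanh hidden layer, linear output, no biases."""
    inp = np.concatenate([x_n, l_n])
    h = np.tanh(W1 @ inp)
    return W2 @ h, (inp, h)

def g_backward(delta_o, cache, W1, W2, s):
    """Backpropagate delta_o = de_w/do_n; return gradients w.r.t. x_n, W1, W2."""
    inp, h = cache
    delta_h = (W2.T @ delta_o) * (1.0 - h**2)   # derivative through tanh
    grad_W2 = np.outer(delta_o, h)
    grad_W1 = np.outer(delta_h, inp)
    grad_x = (W1.T @ delta_h)[:s]               # keep only the state part of the input
    return grad_x, grad_W1, grad_W2

def compute_b_c(X, L, Delta, W1, W2):
    """One backward pass per node: b stacks state gradients, c sums weight gradients."""
    s = X.shape[1]
    b = np.zeros_like(X)
    c_W1, c_W2 = np.zeros_like(W1), np.zeros_like(W2)
    for n in range(X.shape[0]):
        _, cache = g_forward(X[n], L[n], W1, W2)
        b[n], gW1, gW2 = g_backward(Delta[n], cache, W1, W2, s)
        c_W1 += gW1
        c_W2 += gW2
    return b, c_W1, c_W2

rng = np.random.default_rng(1)
N, s, d_l, d_hid, m = 4, 3, 2, 5, 2            # illustrative sizes
X = rng.standard_normal((N, s))                # node states
L = rng.standard_normal((N, d_l))              # node labels
Delta = rng.standard_normal((N, m))            # rows play the role of de_w/do_n
W1 = rng.standard_normal((d_hid, s + d_l))
W2 = rng.standard_normal((m, d_hid))
b, c_W1, c_W2 = compute_b_c(X, L, Delta, W1, W2)
```

Because the same backward pass is run once per node, the cost grows linearly with the number of nodes, in line with the remark in item 1 that the operation must be repeated for each node.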

For linear GNNs, let