Here, ω_{y,j} are the weights associated with the delayed outputs. The previous networks exhibit a locally recurrent structure, but when connected into a larger network, they have a feedforward architecture and are referred to as locally recurrent–globally feedforward (LRGF) architectures. A general LRGF architecture is shown in Figure 3.14. It allows dynamic synapses, realized by some of the aforementioned schemes, to be included within both the input (represented by H_1, …, H_M) and the output feedback (represented by H_FB). Some typical examples of these networks are shown in Figures 3.15–3.18.
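As a minimal numerical sketch of such a locally recurrent neuron, the fragment below feeds the delayed outputs back through the weights ω_{y,j} discussed above; the function name, the tanh nonlinearity, and the dimensions are illustrative assumptions rather than details taken from the figures.

```python
import numpy as np

def locally_recurrent_neuron(x_seq, w_x, w_y, phi=np.tanh):
    """Single neuron with local output feedback: the delayed outputs y(k-1), ..., y(k-J)
    are fed back through the weights w_y (the omega_{y,j} of the text)."""
    J = len(w_y)                  # number of delayed outputs in the feedback path
    y_hist = np.zeros(J)          # [y(k-1), ..., y(k-J)]
    outputs = []
    for x in x_seq:               # x: external input vector at time k
        v = np.dot(w_x, x) + np.dot(w_y, y_hist)     # internal activation
        y = phi(v)                                   # nonlinear neuron output
        y_hist = np.concatenate(([y], y_hist[:-1]))  # shift the feedback delay line
        outputs.append(y)
    return np.array(outputs)
```

Connecting several such neurons in a purely feedforward graph yields the globally feedforward part of the LRGF structure.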
The following equations fully describe the RNN from Figure 3.17:

y_n(k) = Φ(v_n(k)),  v_n(k) = Σ_{l=1}^{p+N+1} ω_{n,l}(k) u_l(k),  n = 1, 2, …, N
u(k) = [s(k − 1), …, s(k − p), 1, y_1(k − 1), …, y_N(k − 1)]^T   (3.65)
Figure 3.14 General locally recurrent–globally feedforward (LRGF) architecture.
Figure 3.15 An example of an Elman recurrent neural network (RNN).
Figure 3.16 An example of a Jordan recurrent neural network (RNN).
where the (p + N + 1) × 1 dimensional vector u(k) comprises the external and feedback inputs to a neuron, as well as the unity-valued constant bias input.
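As a sketch of how (3.65) can be evaluated numerically, the step below assumes the ordering [s(k − 1), …, s(k − p), 1, y_1(k − 1), …, y_N(k − 1)] for u(k) and a tanh nonlinearity; both are assumptions consistent with, but not dictated by, the text.

```python
import numpy as np

def rnn_step(W, s_delayed, y_prev, phi=np.tanh):
    """One time step of the fully connected RNN of Figure 3.17.
    W         : N x (p + N + 1) weight matrix, row n holds the weights of neuron n
    s_delayed : external inputs s(k-1), ..., s(k-p)
    y_prev    : previous outputs y_1(k-1), ..., y_N(k-1)
    Returns the new outputs y(k) and the composite input vector u(k) of (3.65)."""
    u = np.concatenate((s_delayed, [1.0], y_prev))   # (p + N + 1)-dimensional input
    v = W @ u                                        # internal activations v_n(k)
    y = phi(v)                                       # y_n(k) = Phi(v_n(k))
    return y, u
```

The first element of y then plays the role of the prediction output y_1(k) used in the training discussion that follows.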
Training: Here, we discuss training of the single fully connected RNN shown in Figure 3.17. The nonlinear time series prediction uses only one output neuron of the RNN. Training of the RNN is based on minimizing the instantaneous squared error at the output of the first neuron of the RNN, which can be expressed as

E(k) = (1/2) e²(k) = (1/2) [s(k) − y_1(k)]²   (3.66)
where e(k) denotes the error at the output y_1 of the RNN, and s(k) is the training signal. Hence, the correction for the l-th weight of neuron n at the time instant k is

Δω_{n,l}(k) = −η ∂E(k)/∂ω_{n,l}(k) = −η e(k) ∂e(k)/∂ω_{n,l}(k)   (3.67)
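A compact sketch of this correction is given below, assuming the sensitivities ∂y_1(k)/∂ω_{n,l}(k) have already been computed (their recursive evaluation is sketched at the end of this section); the function and variable names are illustrative.

```python
def weight_update(W, eta, s_k, y1_k, dy1_dW):
    """Gradient-descent correction of all weights at time k.
    e(k) = s(k) - y_1(k)          (error of (3.66))
    dW   = eta * e(k) * dy1_dW    (follows from (3.67), since de/dW = -dy_1/dW)
    dy1_dW has the same shape as W and holds dy_1(k)/d omega_{n,l}(k)."""
    e_k = s_k - y1_k
    return W + eta * e_k * dy1_dW, e_k
```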
Figure 3.17 A fully connected recurrent neural network (RNN; Williams–Zipser network). The neurons (nodes) are depicted by circles and incorporate the operation Φ applied to the sum of their inputs.
Since the external signal vector s does not depend on the elements of W, the error gradient becomes ∂e(k)/∂ω_{n,l}(k) = −∂y_1(k)/∂ω_{n,l}(k). Using the chain rule gives
∂y_1(k)/∂ω_{n,l}(k) = Φ′(v_1(k)) [ Σ_{α=1}^{N} ω_{1,α+p+1}(k) ∂y_α(k − 1)/∂ω_{n,l}(k) + δ_{n,1} u_l(k) ]   (3.68)

where the Kronecker delta δ_{n,1} = 1 if n = 1 and 0 otherwise. When the learning rate η is sufficiently small, we have ∂y_α(k − 1)/∂ω_{n,l}(k) ≈ ∂y_α(k − 1)/∂ω_{n,l}(k − 1). By introducing the notation
π_{n,l}^{j}(k) = ∂y_j(k)/∂ω_{n,l}(k),  j = 1, …, N   (3.69)
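To close the loop, a sketch of the resulting sensitivity recursion is given below; it follows the form of (3.68), uses the small-learning-rate approximation from the previous paragraph (sensitivities from time k − 1 are reused), and stores the quantities of (3.69) in a tensor pi of shape N × N × (p + N + 1). The tanh nonlinearity (so Φ′ = 1 − Φ²) and the layout of u(k) are illustrative assumptions.

```python
import numpy as np

def rtrl_sensitivities(W, u, v, pi_prev, p):
    """Update pi[j, n, l] = dy_j(k)/d omega_{n,l}(k) for all neurons j.
    W       : N x (p + N + 1) weight matrix
    u       : composite input vector u(k) of (3.65)
    v       : internal activations v_j(k) at time k
    pi_prev : sensitivities from time k-1, shape N x N x (p + N + 1)
    p       : number of delayed external inputs"""
    N = W.shape[0]
    phi_prime = 1.0 - np.tanh(v) ** 2      # Phi'(v_j(k)) for a tanh neuron
    W_fb = W[:, p + 1:]                    # weights acting on the fed-back outputs
    pi = np.zeros_like(pi_prev)
    for j in range(N):
        # recurrent term: sum_alpha omega_{j,alpha+p+1}(k) * dy_alpha(k-1)/d omega_{n,l}
        recur = np.tensordot(W_fb[j], pi_prev, axes=([0], [0]))
        direct = np.zeros_like(recur)
        direct[j, :] = u                   # for j = 1 this is the delta_{n,1} u_l(k) term of (3.68)
        pi[j] = phi_prime[j] * (recur + direct)
    return pi
```

The gradient needed in (3.67) is then pi[0], the sensitivities of y_1, so the weight_update sketch above can be driven directly by this recursion.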