Figure 3.6 Finite impulse response (FIR) network unfolding.
Example
For the network shown in Figure 3.6, all connections are made by second‐order (three‐tap) FIRs. Although at first sight it looks as though the network has only 10 connections, there are in fact a total of 30 variable filter coefficients (10 connections × 3 taps each, not counting five bias weights). Starting at the output, each tap delay can be interpreted as a "virtual neuron" whose input is delayed by the given number of time steps. A tap delay can be "removed" by replicating the previous layers of the network and delaying the input to the network as shown in Figure 3.6. The procedure is then carried backward through each layer until all delays have been removed. The final unfolded structure is depicted at the bottom of Figure 3.6.
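As a concrete illustration, the sketch below computes the output of a single three‐tap FIR connection; the coefficients and input sequence are made up for illustration only. Each delayed tap plays the role of one "virtual neuron" in the unfolded structure.

```python
# A single second-order (three-tap) FIR "synapse": the connection's output
# at time k is the dot product of its three coefficients with the current
# and two delayed inputs. Coefficients and input are illustrative.
import numpy as np

w = np.array([0.5, -0.2, 0.1])        # three tap coefficients (second order)
x = np.array([1.0, 2.0, 3.0, 4.0])    # input sequence x(0), ..., x(3)

def fir_out(w, x, k):
    """Output of the FIR connection at time k: sum_t w[t] * x(k - t)."""
    return sum(w[t] * x[k - t] for t in range(len(w)) if k - t >= 0)

# Each delayed tap is fed by x(k - t); unfolding the network replicates
# earlier layers once per tap with correspondingly delayed inputs, which
# is exactly the sum that fir_out evaluates.
print([fir_out(w, x, k) for k in range(len(x))])
```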
3.2.3 Adaptation
For supervised learning with input sequence x(k), the difference between the desired output d(k) at time k and the actual output y(k) of the network is the error

e(k) = d(k) - y(k)   (3.17)
The total squared error over the sequence is given by

J = \sum_k e^2(k)   (3.18)
The objective of training is to determine the set of FIR filter coefficients (weights) that minimizes the cost J subject to the constraint of the network topology. As before, a gradient descent approach is used, in which the weights are updated iteratively.
For instantaneous gradient descent, the FIR filters may be updated at each time step as

w_{ij}^l(k+1) = w_{ij}^l(k) - \mu \frac{\partial e^2(k)}{\partial w_{ij}^l}   (3.19)

where w_{ij}^l denotes the vector of tap coefficients of the FIR filter connecting neuron i in layer l to neuron j in layer l+1, and μ is the learning rate. Evaluating this instantaneous gradient exactly, however, requires backpropagating through the complete unfolded structure at every time step; the unfolding duplicates weights and grows with the number of layers and taps, so the computation is expensive and the gradient terms are not local to the individual filters.
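Before turning to the alternative, it is useful to see the simplest case: if the entire network degenerates to a single linear FIR filter, update (3.19) reduces to the familiar LMS algorithm. A minimal sketch follows, in which the unknown system, filter order, and step size are illustrative assumptions.

```python
# Instantaneous gradient descent (3.19) for a single linear FIR filter,
# which is exactly the LMS algorithm. The "true" system w_true, the filter
# order, and the step size mu are toy choices for illustration.
import numpy as np

rng = np.random.default_rng(1)
K, taps, mu = 500, 3, 0.05
w_true = np.array([0.4, 0.25, -0.1])   # unknown system to identify
w = np.zeros(taps)                      # adaptive filter coefficients

x = rng.normal(size=K)
for k in range(taps, K):
    u = x[k - taps + 1:k + 1][::-1]     # delay line [x(k), x(k-1), x(k-2)]
    d = w_true @ u                      # desired output d(k)
    e = d - w @ u                       # instantaneous error e(k), as in (3.17)
    w += 2 * mu * e * u                 # w(k+1) = w(k) - mu * d e^2(k)/d w

print(w)                                # converges toward w_true
```

Since e(k) = d(k) - w·u(k), the instantaneous gradient is ∂e²(k)/∂w = -2 e(k) u(k), which gives the familiar sign-flipped LMS update in the last line of the loop.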
Temporal backpropagation is an alternative approach that avoids this problem. To derive it, consider two alternative expansions of the true gradient of the cost function:

\frac{\partial J}{\partial w_{ij}^l} = \sum_k \frac{\partial e^2(k)}{\partial w_{ij}^l} = \sum_k \frac{\partial J}{\partial s_j^{l+1}(k)} \, \frac{\partial s_j^{l+1}(k)}{\partial w_{ij}^l}   (3.20)

where s_j^{l+1}(k) is the summing junction of neuron j in layer l+1, defined in (3.22) below. Note that the individual terms of the two expansions are not equal,

\frac{\partial e^2(k)}{\partial w_{ij}^l} \neq \frac{\partial J}{\partial s_j^{l+1}(k)} \, \frac{\partial s_j^{l+1}(k)}{\partial w_{ij}^l}

only their sums over all k are equal. Based on this new expansion, each term in the sum is used to form the following stochastic algorithm:

w_{ij}^l(k+1) = w_{ij}^l(k) - \mu \frac{\partial J}{\partial s_j^{l+1}(k)} \, \frac{\partial s_j^{l+1}(k)}{\partial w_{ij}^l}   (3.21)
For small learning rates, the total accumulated weight change is approximately equal to the true gradient. This training algorithm is termed temporal backpropagation.
To complete the algorithm, recall that the summing junction is defined as

s_j^{l+1}(k) = \sum_i w_{ij}^l \cdot x_i^l(k)   (3.22)

where x_i^l(k) = [x_i^l(k), x_i^l(k-1), \ldots, x_i^l(k-T)]^T is the tap‐delay‐line state of the connection from neuron i in layer l, and w_{ij}^l = [w_{ij}^l(0), w_{ij}^l(1), \ldots, w_{ij}^l(T)] is the corresponding vector of filter coefficients.
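Gathering (3.17)–(3.22), the following is a minimal NumPy sketch of temporal backpropagation for a two‐layer FIR network, run offline over a finite sequence so that the future deltas required by the hidden layer are available. The layer sizes, tanh hidden nonlinearity, linear output, toy one‐step prediction task, and learning rate are all illustrative assumptions rather than the book's configuration.

```python
# Temporal backpropagation for a two-layer FIR network (batch form of
# (3.21) over one finite sequence). Hidden-unit deltas at time k depend on
# next-layer deltas at times k, ..., k+T, filtered through the same FIR
# coefficients -- the hallmark of the algorithm.
import numpy as np

rng = np.random.default_rng(0)
K, n0, n1, n2, T = 200, 1, 5, 1, 2   # sequence length, layer widths, FIR order
mu = 0.05                            # learning rate (toy choice)

# W[j, i, t] is tap t of the FIR filter from neuron i to neuron j.
W1 = rng.normal(scale=0.3, size=(n1, n0, T + 1))
W2 = rng.normal(scale=0.3, size=(n2, n1, T + 1))
b1, b2 = np.zeros(n1), np.zeros(n2)

def fir(W, a, k):
    """Summing junction (3.22): dot product of taps with the delay line."""
    s = np.zeros(W.shape[0])
    for t in range(W.shape[2]):
        if k - t >= 0:
            s += W[:, :, t] @ a[k - t]
    return s

# Toy task: one-step-ahead prediction of a sinusoid.
x = np.sin(0.1 * np.arange(K))[:, None]
d = np.sin(0.1 * (np.arange(K) + 1))[:, None]

for epoch in range(200):
    # Forward pass over the whole sequence.
    s1 = np.array([fir(W1, x, k) + b1 for k in range(K)])
    a1 = np.tanh(s1)
    y  = np.array([fir(W2, a1, k) + b2 for k in range(K)])
    e  = d - y                                # error (3.17)

    d2 = -2.0 * e                             # dJ/ds2(k) for J = sum_k e^2(k)
    d1 = np.zeros_like(s1)
    for k in range(K):
        acc = np.zeros(n1)
        for t in range(T + 1):                # future deltas, reversed filter
            if k + t < K:
                acc += W2[:, :, t].T @ d2[k + t]
        d1[k] = (1.0 - a1[k] ** 2) * acc      # tanh'(s) = 1 - tanh(s)^2

    # Accumulate gradients over the sequence and update each tap.
    for t in range(T + 1):
        gW2 = sum(np.outer(d2[k], a1[k - t]) for k in range(t, K)) / K
        gW1 = sum(np.outer(d1[k], x[k - t])  for k in range(t, K)) / K
        W2[:, :, t] -= mu * gW2
        W1[:, :, t] -= mu * gW1
    b2 -= mu * d2.mean(axis=0)
    b1 -= mu * d1.mean(axis=0)

print("final MSE:", float((e ** 2).mean()))
```

Because the hidden deltas need next‐layer deltas up to T steps in the future, an online implementation must delay its updates by T time steps per layer; the offline form above simply makes the whole sequence available at once.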