the sum runs over the sources at layer l for a fixed neuron k at layer l + 1, whereas in definition (4.8) the sum runs over the sinks at layer l + 1 for a fixed neuron i at layer l. When using Eq. (4.8) to define the relevance of a neuron from its messages, condition (4.9) is sufficient to ensure that Eq. (4.2) holds. Summing the left-hand side of Eq. (4.9) over all neurons k at layer l + 1 yields

$$\sum_k R_k^{(l+1)} = \sum_k \sum_i R_{i \leftarrow k}^{(l,l+1)} = \sum_i \sum_k R_{i \leftarrow k}^{(l,l+1)} = \sum_i R_i^{(l)},$$

where the last step uses definition (4.8); this is exactly the layer-wise conservation required by Eq. (4.2).
One can interpret condition (4.9) by saying that the messages are used to distribute the relevance of a neuron k onto its input neurons at layer l. In the following sections, we will use this notion and the stricter form of relevance conservation given by definition (4.8) and condition (4.9). We take Eqs. (4.8) and (4.9) as the main constraints defining LRP; a solution following this concept is required to define the messages according to these equations.
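As a minimal numerical sketch of these two constraints (not part of the original text), the following Python snippet builds a hypothetical set of messages $R_{i \leftarrow k}^{(l,l+1)}$ that satisfies condition (4.9) by construction, aggregates them according to definition (4.8), and checks that the total relevance is conserved across the two layers; the layer sizes and random values are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: 4 neurons at layer l, 3 neurons at layer l+1.
# R_upper holds relevances R_k^{(l+1)}; messages[i, k] holds R_{i<-k}^{(l,l+1)}.
R_upper = rng.uniform(size=3)

# Construct messages that satisfy condition (4.9): for each sink k,
# the messages distribute R_k^{(l+1)} over the source neurons i.
shares = rng.uniform(size=(4, 3))
shares /= shares.sum(axis=0, keepdims=True)   # each column sums to 1
messages = shares * R_upper                   # column k now sums to R_k^{(l+1)}

# Definition (4.8): relevance of a source neuron is the sum of its incoming messages.
R_lower = messages.sum(axis=1)

# Layer-wise conservation, Eq. (4.2): total relevance is preserved.
assert np.isclose(R_lower.sum(), R_upper.sum())
print(R_lower.sum(), R_upper.sum())
```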
Now we can derive an explicit formula for LRP for our example by defining the messages $R_{i \leftarrow k}^{(l,l+1)}$. LRP should reflect the messages passed during classification time. We know that during classification time, a neuron i inputs $a_i w_{ik}$ to neuron k, provided that i has a forward connection to k. Thus, we can rewrite the earlier expressions for the relevances in our example so that they match the structure of the right-hand sides of Eqs. (4.8) and (4.9), as follows:
Matching the right-hand sides of the initial expressions against the right-hand sides of Eqs. (4.10) and (4.11) can be expressed in general as

$$R_{i \leftarrow k}^{(l,l+1)} = R_k^{(l+1)} \,\frac{a_i w_{ik}}{\sum_h a_h w_{hk}}. \qquad (4.12)$$
Although this solution, Eq. (4.12), for the message terms still needs to be adapted so that it remains usable when the denominator becomes zero, it already gives an idea of what a message could be, namely the relevance of a sink neuron that has already been computed, weighted proportionally by the input contributed by neuron i from the preceding layer l.
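To make this concrete, here is a minimal Python sketch of one propagation step based on Eq. (4.12). The small constant eps added to the denominator is one common way to handle the zero-denominator case mentioned above, not necessarily the adaptation intended here; the layer sizes, weights, and the function name lrp_messages are chosen for illustration only.

```python
import numpy as np

def lrp_messages(a, W, R_upper, eps=1e-6):
    """Distribute upper-layer relevances R_k^{(l+1)} to lower-layer neurons
    via messages R_{i<-k} = R_k * a_i w_ik / (sum_h a_h w_hk), as in Eq. (4.12).

    a       : activations a_i of layer l, shape (n_lower,)
    W       : weights w_ik connecting layer l to layer l+1, shape (n_lower, n_upper)
    R_upper : relevances R_k^{(l+1)}, shape (n_upper,)
    eps     : small stabilizer added to the denominator (one possible way to
              handle the case where sum_h a_h w_hk is zero).
    """
    z = a[:, None] * W                      # contributions a_i * w_ik
    denom = z.sum(axis=0)                   # sum_h a_h w_hk for every sink k
    denom = denom + eps * np.sign(denom + (denom == 0))  # keep the sign, avoid 0
    messages = z / denom * R_upper          # R_{i<-k}^{(l,l+1)}
    return messages.sum(axis=1)             # Eq. (4.8): R_i^{(l)}

# Toy usage: relevance is (approximately) conserved from layer l+1 to layer l.
rng = np.random.default_rng(0)
a = rng.uniform(size=4)
W = rng.normal(size=(4, 3))
R_upper = rng.uniform(size=3)
R_lower = lrp_messages(a, W, R_upper)
print(R_lower.sum(), R_upper.sum())
```

With eps > 0 the conservation of Eq. (4.2) holds only approximately, which is the price of making the rule numerically usable when the denominator vanishes.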
Taylor-type decomposition: An alternative approach for achieving a decomposition as in Eq. (4.1) for a general differentiable predictor f is a first-order Taylor approximation,

$$f(x) \approx f(x_0) + \sum_d \frac{\partial f}{\partial x_d}(x_0)\,\bigl(x_d - x_{0,d}\bigr),$$

taken around a reference point $x_0$.
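As an illustration of this idea (again a sketch, not taken from the text), the snippet below decomposes a toy differentiable predictor via its first-order Taylor expansion around a reference point $x_0$; the particular predictor, its weights, and the choice of $x_0$ with $f(x_0) = 0$ are assumptions made for the example.

```python
import numpy as np

# Toy differentiable predictor f(x) = tanh(w @ x + b); w and b are illustrative.
w = np.array([1.0, -2.0, 0.5])
b = 0.3

def f(x):
    return np.tanh(w @ x + b)

def grad_f(x):
    # Analytic gradient of f at x.
    return (1.0 - np.tanh(w @ x + b) ** 2) * w

# First-order Taylor decomposition of f around a reference point x0.
# x0 is chosen here so that f(x0) = 0 (w @ x0 + b = 0); this particular choice
# is an assumption made for the example, not prescribed by the text above.
x = np.array([0.8, 0.1, -0.4])
x0 = np.array([-0.3, 0.0, 0.0])

# Per-dimension terms R_d = (df/dx_d)(x0) * (x_d - x0_d)
R = grad_f(x0) * (x - x0)

print("f(x)                       :", f(x))
print("f(x0) + sum_d R_d (approx.):", f(x0) + R.sum())
print("per-dimension terms R_d    :", R)
```

Since f(x0) = 0 in this setup, the per-dimension terms R_d sum (approximately) to f(x), which is exactly the kind of decomposition that Eq. (4.1) asks for.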