Efficient Processing of Deep Neural Networks. Vivienne Sze

which researchers and practitioners have made available to help enable the rapid progress in DNN model and hardware research and development.

      As discussed in Chapter 1, DNNs are composed of several processing layers, where in most layers the main computation is a weighted sum. There are several different types of layers, which primarily differ in terms of how the inputs and outputs are connected within the layers. There are two main attributes of the connections within a layer:

      1. The connection pattern between the input and output activations, as shown in Figure 2.1a: if every input activation is connected to every output activation, then we call that layer fully connected. On the other hand, if only a subset of the input activations are connected to each output activation, then we call that layer sparsely connected. Note that the weights associated with these connections can be zero or non-zero; if a weight happens to be zero (e.g., as a result of training), it does not mean there is no connection (i.e., the connection still exists).


      Figure 2.1: Properties of connections in DNNs (Figure adapted from [4]).

      For sparsely connected layers, a sub-attribute relates to the structure of the connections. Input activations may connect to any output activation (i.e., global), or they may only connect to output activations in their neighborhood (i.e., local). The consequence of such local connections is that each output activation is a function of a restricted window of input activations, which is referred to as the receptive field.

      2. The value of the weight associated with each connection: the most general case is that the weight can take on any value (e.g., each weight can have a unique value). A more restricted case is that the same value is shared by multiple weights, which is referred to as weight sharing. (A short sketch illustrating both of these connection attributes is given after this list.)
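
      To make these two attributes concrete, the following sketch (Python with NumPy; the sizes and variable names are illustrative only and not taken from the text) computes a fully connected layer as a weighted sum over all input activations, and a sparsely, locally connected layer with weight sharing as a weighted sum over a sliding window (i.e., a 1-D convolution).

import numpy as np

# Illustrative sizes only (assumptions for this sketch).
num_inputs, num_outputs = 8, 6
x = np.random.randn(num_inputs)            # input activations

# Fully connected: every input activation contributes to every output
# activation, so the layer holds a dense matrix of (possibly zero) weights.
W_fc = np.random.randn(num_outputs, num_inputs)
y_fc = W_fc @ x                            # each output is a weighted sum of all inputs

# Sparsely and locally connected with weight sharing: each output activation
# only sees a small window of the input (its receptive field), and the same
# filter weights are reused at every window position.
filter_size = 3
w_shared = np.random.randn(filter_size)    # one set of weights shared by all outputs
y_local = np.array([
    w_shared @ x[i:i + filter_size]        # weighted sum over a local window
    for i in range(num_inputs - filter_size + 1)
])

      Note that in the fully connected case each output activation uses its own row of weights, whereas in the local, weight-shared case the same three weights are reused at every window position.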

      Alternatively, the output can be fed back to the input of its own layer, in which case the connection is often referred to as recurrent. With recurrent connections, the output of a layer is a function of both the current and prior input(s) to the layer. This creates a form of memory in the DNN, which allows long-term dependencies to affect the output. DNNs that contain these connections are referred to as recurrent neural networks (RNNs), which are commonly used to process sequential data (e.g., speech, text), and will be discussed in more detail in Section 2.5.
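
      As a rough illustration of a recurrent connection, the sketch below (Python with NumPy; the tanh nonlinearity, weight names, and sizes are assumptions for illustration, not taken from the text) feeds the layer's output back as an additional input at the next time step, so each output depends on both the current and the prior inputs.

import numpy as np

# Illustrative sizes and weights (assumptions for this sketch).
input_size, hidden_size = 4, 5
Wx = np.random.randn(hidden_size, input_size)   # weights on the current input
Wh = np.random.randn(hidden_size, hidden_size)  # recurrent weights on the fed-back output

h = np.zeros(hidden_size)                       # state carried across time steps
for x_t in np.random.randn(10, input_size):     # a sequence of 10 inputs
    # The output depends on both the current input x_t and the prior
    # output h, which is what gives the network its form of memory.
    h = np.tanh(Wx @ x_t + Wh @ h)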

      In this section, we will discuss the various popular layers used to form DNNs. We will begin by describing the CONV and FC layers, whose main computation is a weighted sum, since that tends to dominate the computation cost in terms of both energy consumption and throughput. We will then discuss various layers that can optionally be included in a DNN and do not use weighted sums, such as nonlinearity, pooling, and normalization.

      These layers can be viewed as primitive layers, which can be combined to form compound layers. Compound layers are often given names as a convenience when the same combination of primitive layers is frequently used together. In practice, people often refer to either primitive or compound layers as just layers.


      Figure 2.2: Dimensionality of convolutions. (a) Shows the traditional 2-D convolution used in image processing. (b) Shows the high dimensional convolution used in CNNs, which applies a 2-D convolution on each channel.

Shape Parameter   Description
N                 Batch size of 3-D fmaps
M                 Number of 3-D filters / number of channels of ofmap (output channels)
C                 Number of channels of filter / ifmap (input channels)
H/W               Height/width of ifmap plane
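
      The shape parameters above can be made concrete with the following sketch of the CONV layer computation shown in Figure 2.2(b) (Python with NumPy). The filter height/width (here R, S) and the ofmap height/width (here P, Q) follow the same naming style but are assumptions in this sketch, since those rows of the table are not shown above; a unit stride and no padding are also assumed.

import numpy as np

# Illustrative shape parameters; R/S and P/Q are assumed names (see above).
N, M, C, H, W = 1, 4, 3, 8, 8
R = S = 3
P, Q = H - R + 1, W - S + 1               # output size for unit stride, no padding

ifmap   = np.random.randn(N, C, H, W)     # N input fmaps, each with C channels
filters = np.random.randn(M, C, R, S)     # M filters, each with C channels
ofmap   = np.zeros((N, M, P, Q))          # N output fmaps, each with M channels

# Each output channel m applies a 2-D convolution to every input channel c
# and accumulates the C partial sums into one output activation.
for n in range(N):
    for m in range(M):
        for p in range(P):
            for q in range(Q):
                window = ifmap[n, :, p:p + R, q:q + S]      # C x R x S receptive field
                ofmap[n, m, p, q] = np.sum(window * filters[m])

      Each output channel m is thus produced by applying a 2-D convolution to every input channel and accumulating the results, which is the high-dimensional convolution described in the caption of Figure 2.2.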