Machine Vision Inspection Systems, Machine Learning-Based Approaches. Various authors

Related work | Bayesian network | Neural network | Siamese neural network | Capsule network
Lake et al. [6]  X
Koch et al. [7]  X X
Hinton et al. [11]  X
Bertinetto et al. [24]  X
Chen et al. [13]  X
Fei-Fei et al. [4]  X
Lie et al. [15]  X
Bromley et al. [28]  X
Kumar et al. [31]  X
Zhao et al. [32]  X
Sabour et al. [9]  X
Sethy et al. [12]  X

      Twin network: The twin network consists of two identical networks that share their weights. Sharing weights guarantees that both networks produce the same output when the same image is fed to them. Since we wanted the twin network to learn to extract features that help distinguish images, we experimented with convolutional layers, capsule layers and deep capsule layers; the model based on deep capsule layers gave the best performance.
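
The effect of weight sharing can be illustrated with a minimal sketch. It is not the chapter's model: the single linear layer with ReLU in `embed`, the layer sizes, and the names `embed`, `twin_forward` and `shared_weights` are all assumptions made for illustration. The point is only that both branches read the same weights, so the same image yields the same features on both sides.

```python
import random


def embed(image, weights):
    """One linear layer followed by ReLU; a stand-in for the real feature extractor."""
    return [max(0.0, sum(w * x for w, x in zip(row, image))) for row in weights]


def twin_forward(image_a, image_b, weights):
    """Both branches use the same weights object, i.e. the weights are shared."""
    return embed(image_a, weights), embed(image_b, weights)


random.seed(0)
# Hypothetical sizes: 4 input values, 3 output features
shared_weights = [[random.uniform(-1, 1) for _ in range(4)] for _ in range(3)]
image = [0.2, 0.5, 0.1, 0.9]

# Feeding the same image to both branches gives identical feature vectors
out_a, out_b = twin_forward(image, image, shared_weights)
same_image_identical = out_a == out_b
```

Because there is only one set of weights, any training update changes both branches at once, which keeps the two feature extractors in step.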

      The capsule network consists of four layers. Since we consider relatively simple images with plain backgrounds, having many layers has less of an effect. The first layer is a convolutional layer with 256 kernels of size 9 × 9 and a stride of 5, which discovers basic features in the 2D image. The second, third, and fourth layers are capsule layers with 32 channels of 4-dimensional capsules, where each capsule consists of 4 convolutional units with a 3 × 3 kernel and strides of 2 and 1, respectively. The next capsule layer contains 16 channels of 6-dimensional capsules, each consisting of a convolutional unit with a 3 × 3 kernel and a stride of 2. The sixth layer is a fully connected capsule layer called the entity capsule layer. It contains 20 capsules of dimension 16. We use the dynamic routing proposed in Ref. [9] between the final convolutional capsule layer and the entity capsule layer, with three routing iterations.
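
The routing step between the final convolutional capsule layer and the entity capsule layer can be sketched as follows. This is a plain-Python rendering of routing-by-agreement as described in Ref. [9], under stated assumptions: the prediction vectors `u_hat` are random stand-ins for the outputs of learned transformation matrices, and the number of lower-level capsules (`N_IN = 8`) is hypothetical, since it depends on the input image size. Only the 20 output capsules of dimension 16 and the three routing iterations come from the text.

```python
import math
import random


def squash(s, eps=1e-9):
    """Shrink a vector so its length lies in [0, 1) while keeping its direction."""
    norm_sq = sum(x * x for x in s)
    scale = norm_sq / (1.0 + norm_sq) / math.sqrt(norm_sq + eps)
    return [scale * x for x in s]


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def dynamic_routing(u_hat, iterations=3):
    """u_hat[i][j] is the prediction of lower capsule i for higher capsule j."""
    n_in, n_out, dim = len(u_hat), len(u_hat[0]), len(u_hat[0][0])
    b = [[0.0] * n_out for _ in range(n_in)]  # routing logits, start at zero
    v = []
    for _ in range(iterations):
        c = [softmax(row) for row in b]  # coupling coefficients per lower capsule
        v = []
        for j in range(n_out):
            s_j = [sum(c[i][j] * u_hat[i][j][d] for i in range(n_in))
                   for d in range(dim)]
            v.append(squash(s_j))
        # Raise the logit where a prediction agrees with the resulting output
        for i in range(n_in):
            for j in range(n_out):
                b[i][j] += sum(p * q for p, q in zip(u_hat[i][j], v[j]))
    return v


random.seed(0)
N_IN, N_OUT, DIM = 8, 20, 16  # N_IN is hypothetical; 20 and 16 follow the text
u_hat = [[[random.gauss(0, 1) for _ in range(DIM)] for _ in range(N_OUT)]
         for _ in range(N_IN)]
entity_capsules = dynamic_routing(u_hat, iterations=3)
lengths = [math.sqrt(sum(x * x for x in v)) for v in entity_capsules]
```

Each entry of `lengths` is below 1 because of the squashing function, which is what allows a capsule's length to be read as a detection probability.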

      Vector difference layer: After the twin network identifies and extracts the important features of the two input images, the vector difference layer compares those features to reach a final decision about similarity. Each capsule in the twin network is trained to extract a specific type of property or entity, such as an object or part of an object. The length and the direction of a capsule's output vector are determined by the probability of feature detection and the state of the detected feature, respectively [11]. For example, when a detected feature changes its state by moving, the probability, and with it the vector length, stays the same while the orientation changes. Because of this property, taking a scalar difference using the L1 distance is not enough; a more complex vector difference must be computed and analysed. We obtain 20 vectors of dimension 16 from the difference layer and feed them to a fully connected network.
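
A small worked example may make the argument concrete. It rests on one reading of the paragraph above, namely that a scalar comparison looks only at capsule lengths; the function names and the 2-dimensional capsules are assumptions chosen for readability (the model itself uses 20 capsules of dimension 16). Two capsules with the same length but different orientation look identical to the scalar comparison, while the vector difference still separates them.

```python
import math


def length(v):
    return math.sqrt(sum(x * x for x in v))


def length_difference(caps_a, caps_b):
    """Scalar comparison: absolute difference of capsule lengths only."""
    return [abs(length(va) - length(vb)) for va, vb in zip(caps_a, caps_b)]


def vector_difference(caps_a, caps_b):
    """Element-wise difference per capsule; keeps orientation information."""
    return [[a - b for a, b in zip(va, vb)] for va, vb in zip(caps_a, caps_b)]


# One capsule per image: same length (0.5), rotated by 90 degrees
caps_a = [[0.5, 0.0]]
caps_b = [[0.0, 0.5]]

scalar = length_difference(caps_a, caps_b)   # [0.0]
vector = vector_difference(caps_a, caps_b)   # [[0.5, -0.5]]
```

The scalar result is zero although the detected feature has changed state, whereas the vector difference is non-zero and can be passed on for further analysis.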

      Fully connected network: The fully connected network comprises four fully connected layers, with the parameters shown in Figure 2.1. The last fully connected layer uses sigmoid activation; the other fully connected layers use Rectified Linear Unit (ReLU) activation [35]. Multiple fully connected layers are used in this study to analyse the complex output of the vector difference layer and produce an accurate probability.
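
The shape of such a head can be sketched in plain Python. The layer widths in `WIDTHS` are hypothetical placeholders, not the parameters of Figure 2.1, and the weights are random rather than trained; only the count of four layers, the ReLU and sigmoid activations, and the 20 × 16 input come from the text.

```python
import math
import random


def relu(z):
    return max(0.0, z)


def sigmoid(z):
    # Numerically stable form of 1 / (1 + exp(-z))
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def dense(x, weights, biases, activation):
    """One fully connected layer: activation(W x + b)."""
    return [activation(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, biases)]


def init_layer(n_in, n_out, rng):
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


rng = random.Random(0)
WIDTHS = [20 * 16, 64, 32, 16, 1]  # hypothetical widths; 320 inputs = 20 x 16
layers = [init_layer(a, b, rng) for a, b in zip(WIDTHS, WIDTHS[1:])]


def similarity(diff_vectors):
    """Flatten the difference vectors and pass them through the four layers."""
    x = [value for vector in diff_vectors for value in vector]
    for index, (weights, biases) in enumerate(layers):
        is_last = index == len(layers) - 1
        x = dense(x, weights, biases, sigmoid if is_last else relu)
    return x[0]


# Random stand-in for the output of the vector difference layer
diff = [[rng.gauss(0, 1) for _ in range(16)] for _ in range(20)]
probability = similarity(diff)
```

The final sigmoid keeps `probability` strictly between 0 and 1, so it can be interpreted as the likelihood that the two input images show the same character.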

      The goal of this study is to classify characters in new alphabets. After fine-tuning the model for the verification task, we expect it to have learned a function general enough to distinguish between any two images. Hence, we can model character classification as a one-shot learning task that

