Information gain (ig): This is an impurity‐based criterion that uses the entropy (e) measure (originating in information theory) as the impurity measure:

ig(ai, S) = e(y, S) − Σ_{vi,j ∈ dom(ai)} (|σ_{ai = vi,j} S| / |S|) · e(y, σ_{ai = vi,j} S)

where

e(y, S) = − Σ_{cj ∈ dom(y)} (|σ_{y = cj} S| / |S|) · log2 (|σ_{y = cj} S| / |S|) (2.14)

Here σ denotes selection: σ_{ai = vi,j} S is the subset of the sample S in which the attribute ai takes the value vi,j.
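As a brief illustration (a minimal sketch, not code from the book), Eq. (2.14) can be computed directly from arrays of attribute and target values; the function names and the use of NumPy are our own choices:

import numpy as np

def entropy(y):
    # e(y, S): entropy of the target values in the sample, in bits
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(a, y):
    # ig(ai, S): entropy of y minus the weighted entropy of each subset sigma_{ai = v} S
    weighted = sum((a == v).mean() * entropy(y[a == v]) for v in np.unique(a))
    return entropy(y) - weighted

# example: a three-valued attribute against a binary target
a = np.array(["low", "mid", "high", "low", "mid", "high"])
y = np.array([0, 0, 1, 0, 1, 1])
print(information_gain(a, y))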
Gini index: This is an impurity‐based criterion that measures the divergence between the probability distributions of the target attribute’s values. The Gini (G) index is defined as
G(y, S) = 1 − Σ_{cj ∈ dom(y)} (|σ_{y = cj} S| / |S|)² (2.15)
Consequently, the evaluation criterion for selecting the attribute ai is defined as the Gini gain (GG):
GG(ai, S) = G(y, S) − Σ_{vi,j ∈ dom(ai)} (|σ_{ai = vi,j} S| / |S|) · G(y, σ_{ai = vi,j} S) (2.16)
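The Gini index and Gini gain of Eqs. (2.15) and (2.16) follow the same pattern as the entropy sketch above; again the function names are our own:

import numpy as np

def gini(y):
    # G(y, S) of Eq. (2.15)
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def gini_gain(a, y):
    # GG(ai, S) of Eq. (2.16): Gini of y minus the weighted Gini of each subset
    weighted = sum((a == v).mean() * gini(y[a == v]) for v in np.unique(a))
    return gini(y) - weighted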
Likelihood ratio chi‐squared statistics: The likelihood ratio (lr) is defined as
lr(ai, S) = 2 · ln 2 · |S| · ig(ai, S) (2.17)
This ratio is useful for measuring the statistical significance of the information gain criterion. The null hypothesis (H0) is that the input attribute and the target attribute are conditionally independent. If H0 holds, the test statistic is distributed as χ2 with degrees of freedom equal to (|dom(ai)| − 1) · (|dom(y)| − 1).
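The significance test can be sketched as follows (our own helper names; SciPy's chi2.sf gives the upper-tail probability of the χ2 distribution). The factor 2 · ln 2 converts the information gain, measured in bits, into a natural-log deviance statistic:

import numpy as np
from scipy.stats import chi2

def _entropy(y):
    p = np.unique(y, return_counts=True)[1] / len(y)
    return -np.sum(p * np.log2(p))

def _info_gain(a, y):
    return _entropy(y) - sum((a == v).mean() * _entropy(y[a == v]) for v in np.unique(a))

def likelihood_ratio_test(a, y, alpha=0.05):
    lr = 2.0 * np.log(2.0) * len(y) * _info_gain(a, y)    # Eq. (2.17)
    dof = (len(np.unique(a)) - 1) * (len(np.unique(y)) - 1)
    p_value = chi2.sf(lr, dof)                            # P(chi2_dof >= lr)
    return lr, p_value, p_value < alpha                   # reject H0 when p < alpha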
Normalized impurity‐based criterion: The impurity‐based criteria described above are biased toward attributes with larger domains; that is, they prefer input attributes with many values over attributes with fewer values. For instance, an input attribute that represents a social security number will probably obtain the highest information gain, since it uniquely identifies each instance. However, adding this attribute to a decision tree will result in poor generalization accuracy. For this reason, it is useful to “normalize” the impurity‐based measures, as described in the subsequent paragraphs.
Gain ratio (gr): This ratio “normalizes” the information gain (ig) as follows: gr(ai, S) = ig(ai, S)/e(ai, S), where e(ai, S) is the entropy of the attribute ai itself. Note that this ratio is not defined when the denominator is zero, and that it may tend to favor attributes for which the denominator is very small. Consequently, a two‐stage selection procedure is suggested: first, the information gain is calculated for all attributes; then, considering only those attributes that have performed at least as well as the average information gain, the attribute with the best gain ratio is selected. It has been shown that the gain ratio tends to outperform the simple information gain criterion, both in accuracy and in terms of classifier complexity.
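The two‐stage procedure might be sketched as below, reusing the _entropy and _info_gain helpers from the previous sketch; the dictionary input (attribute name mapped to its value array) is our own convention:

import numpy as np

def gain_ratio(a, y):
    # gr(ai, S) = ig(ai, S) / e(ai, S); undefined when the attribute entropy is zero
    split_entropy = _entropy(a)
    return _info_gain(a, y) / split_entropy if split_entropy > 0 else float("nan")

def select_by_gain_ratio(attributes, y):
    # stage 1: keep attributes with at least the average information gain
    gains = {name: _info_gain(a, y) for name, a in attributes.items()}
    average_gain = np.mean(list(gains.values()))
    candidates = [name for name, g in gains.items() if g >= average_gain]
    # stage 2: among the candidates, pick the best gain ratio
    return max(candidates, key=lambda name: gain_ratio(attributes[name], y))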
Distance measure (dm): Similar to the gain ratio, this measure also normalizes the impurity measure, but in a different manner:

dm(ai, S) = ig(ai, S) / e(ai, y, S)

where

e(ai, y, S) = − Σ_{vi,j ∈ dom(ai)} Σ_{ck ∈ dom(y)} (|σ_{ai = vi,j ∧ y = ck} S| / |S|) · log2 (|σ_{ai = vi,j ∧ y = ck} S| / |S|) (2.18)

is the joint entropy of the input attribute and the target attribute.
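A sketch of Eq. (2.18), again with our own names and reusing _info_gain from the sketch above; the denominator is estimated from the empirical joint distribution of (attribute value, class) pairs:

from collections import Counter
import numpy as np

def joint_entropy(a, y):
    # entropy of the joint (attribute value, class) distribution: denominator of Eq. (2.18)
    counts = np.array(list(Counter(zip(a, y)).values()))
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def distance_measure(a, y):
    # dm(ai, S): information gain normalized by the joint entropy
    return _info_gain(a, y) / joint_entropy(a, y)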
Binary criteria: These are used for creating binary decision trees. These measures are based on the division of the input attribute domain into two subdomains.
Let β(ai, d1, d2, S) denote the binary criterion value for attribute ai over sample S when d1 and d2 are its corresponding subdomains. The value obtained for the optimal division of the attribute domain into two mutually exclusive and exhaustive subdomains is used for comparing attributes, namely

β*(ai, S) = max β(ai, d1, d2, S) subject to d1 ∪ d2 = dom(ai); d1 ∩ d2 = ∅ (2.19)
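Since Eq. (2.19) maximizes over all two‐way partitions of dom(ai), a direct (exponential‐time) search can be sketched as follows; the callable beta is any binary criterion with our assumed signature beta(a, y, d1, d2):

from itertools import combinations
import numpy as np

def best_binary_split(a, y, beta):
    # evaluate beta over every partition dom(ai) = d1 ∪ d2 with d1 ∩ d2 = ∅ (Eq. (2.19))
    values = list(np.unique(a))
    best_value, best_split = -np.inf, None
    for r in range(1, len(values) // 2 + 1):
        for d1 in combinations(values, r):
            d2 = tuple(v for v in values if v not in d1)
            score = beta(a, y, set(d1), set(d2))
            if score > best_value:
                best_value, best_split = score, (set(d1), set(d2))
    # note: when |dom(ai)| is even, the half-size partitions are visited twice
    # (once per orientation), which is harmless when taking the maximum
    return best_value, best_split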
Twoing criterion: The Gini index may encounter problems when the domain of the target attribute is relatively wide. In such cases, the use of the binary criterion called the twoing (tw) criterion has been suggested. This criterion is defined as

tw(ai, d1, d2, S) = 0.25 · (|σ_{ai ∈ d1} S| / |S|) · (|σ_{ai ∈ d2} S| / |S|) · (Σ_{ck ∈ dom(y)} | |σ_{ai ∈ d1 ∧ y = ck} S| / |σ_{ai ∈ d1} S| − |σ_{ai ∈ d2 ∧ y = ck} S| / |σ_{ai ∈ d2} S| |)² (2.20)
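A sketch of Eq. (2.20) with our own function name; it matches the beta(a, y, d1, d2) signature assumed by best_binary_split above, so the two can be combined to find the optimal twoing split:

import numpy as np

def twoing(a, y, d1, d2):
    # tw(ai, d1, d2, S) of Eq. (2.20)
    in1, in2 = np.isin(a, list(d1)), np.isin(a, list(d2))
    n1, n2 = in1.sum(), in2.sum()
    if n1 == 0 or n2 == 0:
        return 0.0
    # sum over classes of |P(ck | ai in d1) - P(ck | ai in d2)|
    diff = sum(abs((y[in1] == c).mean() - (y[in2] == c).mean()) for c in np.unique(y))
    return 0.25 * (n1 / len(a)) * (n2 / len(a)) * diff ** 2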