Intelligent Data Analytics for Terror Threat Prediction. Group of authors

Memory, etc. In this section two major classification algorithms are discussed.

Schematic illustration of the process of the classification of rumor.

       1.4.1.1 Naïve Bayes Classifier

User features in social networks | Tweet features in Twitter  | Comments features in social networks
---------------------------------|----------------------------|-------------------------------------
No. of followers                 | No. of records             | No. of replies
No. of friends                   | No. of words               | No. of words
User has location in his profile | No. of characters          | No. of characters
User has URL?                    | Tweet contains URL?        | Comments contain URL?
User is a verified user?         | Source of tweet            | Source of comment
Ratio of friends/followers       | Length of tweet            | Length of tweet
Age of the user account          | No. of hash tags           | No. of question marks
Ratio of statuses/followers      | No. of mentions            | No. of pronouns
                                 | No. of pronouns            | No. of URLs
                                 | No. of URLs                | No. of exclamation marks
                                 | No. of question marks      | Polarity
                                 | No. of exclamation marks   | Presence of colon symbol
                                 | Polarity                   |
                                 | Presence of colon symbol   |

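      The posterior probability of a class can be calculated using Bayes' theorem:

      (1.1) $$P(c \mid x) = \frac{P(x \mid c)\,P(c)}{P(x)}$$

      where

       $P(c \mid x)$ is the posterior probability of the class $c$ given the predictor $x$.

       $P(c)$ is the prior probability of the class.

       $P(x \mid c)$ is the likelihood, that is, the probability of the predictor given the class.

       $P(x)$ is the prior probability of the predictor.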
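
As a worked illustration of Bayes' theorem, the short Python sketch below computes a posterior from a prior, a likelihood, and the evidence. The probabilities are hypothetical numbers chosen only for the example, and the function and variable names are assumptions, not part of the original text.

```python
def posterior(prior_c: float, likelihood_x_given_c: float, evidence_x: float) -> float:
    """Return P(c|x) = P(x|c) * P(c) / P(x)."""
    if evidence_x <= 0:
        raise ValueError("P(x) must be positive")
    return likelihood_x_given_c * prior_c / evidence_x


# Hypothetical values, for illustration only.
p_rumor = 0.2             # P(c): prior probability that a tweet is a rumor
p_url_given_rumor = 0.6   # P(x|c): probability that a rumor tweet contains a URL
p_url_given_other = 0.3   # probability that a non-rumor tweet contains a URL

# P(x) from the law of total probability over the two classes.
p_url = p_url_given_rumor * p_rumor + p_url_given_other * (1 - p_rumor)

p_rumor_given_url = posterior(p_rumor, p_url_given_rumor, p_url)
print(round(p_rumor_given_url, 3))  # prints 0.333
```

With these assumed numbers, observing a URL raises the probability of the rumor class from the prior of 0.2 to about 0.333.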

      The Naïve Bayes classifier combines Bayes' theorem with the naïve assumption that the input features are independent of one another given the class. This assumption keeps the calculation tractable even when multiple parameters are used as input. Rumor detection is based on classifying either text or images. For example, detecting rumors in social networks such as Twitter or Facebook requires considering several features, which fall into three categories: user features, tweet features, and comment features. All of these features deal with text data [32]. If a tweet, post, or comment includes these features, the Naïve Bayes classifier can be applied to classify it as a rumor or not. Some of the dataset features are listed in Table 1.2.
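
The idea can be sketched in a few lines of Python. This is a minimal Bernoulli Naïve Bayes written from scratch, not the authors' implementation. It assumes three yes/no features loosely based on the feature lists (tweet contains a URL, user is verified, tweet contains a question mark). The six training rows, the labels, and all function names are hypothetical.

```python
import math
from collections import defaultdict


def train_bernoulli_nb(rows, labels, alpha=1.0):
    """Estimate class priors and per-feature probabilities with Laplace smoothing."""
    n_features = len(rows[0])
    class_counts = defaultdict(int)
    feature_counts = defaultdict(lambda: [0] * n_features)
    for row, label in zip(rows, labels):
        class_counts[label] += 1
        for j, value in enumerate(row):
            feature_counts[label][j] += value
    model = {}
    for label, count in class_counts.items():
        prior = count / len(rows)
        # P(feature_j = 1 | class), smoothed so that no probability is 0 or 1.
        probs = [(feature_counts[label][j] + alpha) / (count + 2 * alpha)
                 for j in range(n_features)]
        model[label] = (prior, probs)
    return model


def predict_nb(model, row):
    """Return the class with the highest log posterior (up to the constant P(x))."""
    best_label, best_score = None, -math.inf
    for label, (prior, probs) in model.items():
        score = math.log(prior)
        for value, p in zip(row, probs):
            # Naive assumption: features contribute independently given the class.
            score += math.log(p if value else 1.0 - p)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy data: [contains_url, user_verified, has_question_mark]
rows = [[0, 0, 1], [0, 0, 1], [1, 0, 1], [1, 1, 0], [1, 1, 0], [0, 1, 0]]
labels = ["rumor", "rumor", "rumor", "non-rumor", "non-rumor", "non-rumor"]

model = train_bernoulli_nb(rows, labels)
print(predict_nb(model, [0, 0, 1]))  # prints rumor
print(predict_nb(model, [1, 1, 0]))  # prints non-rumor
```

The denominator P(x) is the same for every class, so the sketch compares only the numerators of Bayes' theorem, in log form to avoid multiplying many small probabilities.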

Schematic illustration of Naïve Bayes classifier.

      The figure shows two classes of data points and how they are separated with maximum distance.

      The two classes are:

       i. Circle

       ii. Triangle

      Adding more parameters to the input dataset reduces the accuracy compared with using fewer parameters. To increase the accuracy, another popular model, the SVM, can be used.

       1.4.1.2 Support Vector Machine

      Support vector machine (SVM) is one of the best machine learning algorithms and is used for both classification and regression. It is widely used to classify data points even when the input vectors are mapped non-linearly [8]. In social networks, data is available in many forms, so detecting rumors requires classifying the given text data with classification algorithms based on dataset features. Classifying a dataset that has multiple features and multiple dimensions is a challenging task, and an SVM can give better results in this setting.
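
To make the maximum-margin idea concrete, the following Python sketch trains a linear SVM by batch subgradient descent on the regularised hinge loss. It is an illustration under simplifying assumptions, not the method used in the cited work: it handles only the linear case (no kernel for non-linear mappings), the two-dimensional toy points are hypothetical, and the labels -1 and +1 stand in for the circle and triangle classes. The learning rate, regularisation strength, and epoch count are arbitrary choices.

```python
def train_linear_svm(points, labels, lr=0.1, lam=0.01, epochs=100):
    """Minimise (lam/2)*||w||^2 + mean hinge loss by batch subgradient descent."""
    n, dim = len(points), len(points[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        hinge_w = [0.0] * dim   # sum of y*x over points that violate the margin
        hinge_b = 0.0           # sum of y over points that violate the margin
        for x, y in zip(points, labels):
            score = sum(wj * xj for wj, xj in zip(w, x)) + b
            if y * score < 1:   # inside the margin or misclassified
                for j in range(dim):
                    hinge_w[j] += y * x[j]
                hinge_b += y
        # Subgradient step: regulariser shrinks w, violators pull the boundary.
        w = [wj - lr * (lam * wj - hj / n) for wj, hj in zip(w, hinge_w)]
        b += lr * hinge_b / n
    return w, b


def predict_svm(w, b, x):
    """Return +1 or -1 depending on the side of the separating hyperplane."""
    score = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1 if score >= 0 else -1


# Hypothetical toy data: -1 = circle, +1 = triangle.
points = [(-2, -2), (-1, -2), (-2, -1), (2, 2), (1, 2), (2, 1)]
labels = [-1, -1, -1, 1, 1, 1]

w, b = train_linear_svm(points, labels)
print([predict_svm(w, b, p) for p in points])  # prints [-1, -1, -1, 1, 1, 1]
print(predict_svm(w, b, (3, 3)))               # prints 1
```

Only the points that fall inside the margin contribute to each update, which is why the final boundary is determined by the points closest to it, the support vectors.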

