Machine Learning for Healthcare Applications. Group of authors
…for a particular day. However, an individual's health status cannot be judged accurately from a single day's output. Phase-I therefore uses a decision tree classifier that takes the activities of an individual as input and produces the status of each health parameter for one day. The outputs of Phase-I are collected over a week and fed to Phase-II.
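The hand-off between the two phases can be sketched as grouping consecutive Phase-I daily outputs into seven-day vectors, one per Phase-II input. This is a minimal sketch: the helper name `weekly_inputs`, the non-overlapping weeks and the decision to drop an incomplete trailing week are assumptions, not details given in the chapter. The status codes follow the sleep classes of Table 2.1.

```python
from typing import List, Sequence

DAYS_PER_WEEK = 7  # Phase-II consumes one week of Phase-I outputs


def weekly_inputs(daily_statuses: Sequence[int], days: int = DAYS_PER_WEEK) -> List[List[int]]:
    """Group consecutive Phase-I daily status codes into complete, non-overlapping weeks.

    Hypothetical helper: each returned week is one Phase-II input vector.
    A trailing partial week is dropped because Phase-II expects exactly `days` values.
    """
    full = len(daily_statuses) // days * days
    return [list(daily_statuses[i:i + days]) for i in range(0, full, days)]


# 10 days of sleep status codes (0 = normal, 1 = less sleep, 2 = more sleep)
sleep_codes = [0, 0, 1, 1, 0, 2, 0, 1, 1, 1]
print(weekly_inputs(sleep_codes))  # [[0, 0, 1, 1, 0, 2, 0]]
```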
2.4.3 Phase-II
Phase-II of the model processes the data received from the data sources together with the output of Phase-I. It also uses a decision tree classifier, which is first trained on the dataset received from the data sources. The classifier takes the daily status of a health parameter over a week (i.e., the output of Phase-I) as input and estimates the individual's health status for that week. Based on this estimate, Phase-II outputs alerts and predictions for the health parameter, together with suggestions to be notified to the individual.
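The chapter does not say which decision tree implementation or splitting criterion is used. As a minimal, dependency-free sketch, the tree below grows a binary tree with CART-style Gini splits; in practice a library class such as scikit-learn's `DecisionTreeClassifier` would typically be used instead. The training weeks and alert labels are hypothetical placeholders for the generated Phase-II dataset.

```python
from collections import Counter
from typing import List, Optional, Sequence


def gini(labels: Sequence[str]) -> float:
    """Gini impurity of a non-empty list of class labels."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


class Node:
    """Internal nodes test x[feature] <= threshold; leaves hold a label."""

    def __init__(self, label: Optional[str] = None, feature: int = -1,
                 threshold: float = 0.0, left: "Optional[Node]" = None,
                 right: "Optional[Node]" = None) -> None:
        self.label = label
        self.feature = feature
        self.threshold = threshold
        self.left = left
        self.right = right


def build_tree(X: List[List[float]], y: List[str], max_depth: int = 5) -> Node:
    """Greedily grow a binary tree, choosing the split with the lowest weighted Gini."""
    majority = Counter(y).most_common(1)[0][0]
    if max_depth == 0 or len(set(y)) == 1:
        return Node(label=majority)
    best = None  # (weighted impurity, feature index, threshold)
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            if not left or not right:
                continue  # this threshold does not separate anything
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t)
    if best is None:  # identical feature vectors with mixed labels
        return Node(label=majority)
    _, f, t = best
    li = [i for i, row in enumerate(X) if row[f] <= t]
    ri = [i for i, row in enumerate(X) if row[f] > t]
    return Node(feature=f, threshold=t,
                left=build_tree([X[i] for i in li], [y[i] for i in li], max_depth - 1),
                right=build_tree([X[i] for i in ri], [y[i] for i in ri], max_depth - 1))


def predict(node: Node, x: Sequence[float]) -> str:
    """Walk from the root to a leaf and return its label."""
    while node.label is None:
        node = node.left if x[node.feature] <= node.threshold else node.right
    return node.label


# Hypothetical Phase-II training data: one row = 7 daily sleep codes from Phase-I
# (0 = normal, 1 = less sleep, 2 = more sleep); the alert labels are illustrative.
weeks = [
    [0, 0, 0, 0, 0, 0, 0], [0, 1, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 0, 1, 1], [1, 1, 1, 1, 1, 1, 1],
    [2, 2, 0, 2, 2, 2, 2], [2, 2, 2, 2, 2, 2, 2],
]
alerts = ["no alert", "no alert", "alert: too little sleep", "alert: too little sleep",
          "alert: too much sleep", "alert: too much sleep"]

tree = build_tree(weeks, alerts)
print(predict(tree, [1, 1, 0, 1, 1, 1, 1]))  # alert: too little sleep
```

Because the weekly input is just a vector of daily codes, the same trainer can be reused for every health parameter by swapping in that parameter's weekly dataset.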
2.4.4 Dataset Generation
The subsections below describe how the rules are collected and how the dataset is generated. The generated dataset is used to train the model proposed in the previous section.
2.4.4.1 Rules Collection
Preparing the datasets requires a set of rules describing how an individual's daily activities affect their health status. The rules are collected from trusted sources [5] and [1]. Based on an individual's activities and measures, these rules give that individual's overall health status. For example, the recommended sleep time for a person aged 6 to 13 years is 9 to 11 h. A sleep time between 7 and 8 h is a little less than normal, and one between 11 and 12 h is a little more than normal; sleeping more than 12 h or less than 7 h affects health.
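The sleep rule for ages 6 to 13 can be written as a small classification function. This is a sketch under stated assumptions: the ranges in the rule share endpoints (11 h is both "normal" and "a little more"), and the interval between 8 and 9 h is not covered, so here the normal band takes precedence and 8 to 9 h is treated as "a little less than normal". The function name is hypothetical.

```python
def sleep_status_6_to_13(hours: float) -> str:
    """Classify one night's sleep for a person aged 6-13 using the rule in the text.

    Assumptions: the normal band [9, 11] is checked first, and the uncovered
    gap between 8 and 9 h is folded into "a little less than normal".
    """
    if 9 <= hours <= 11:
        return "normal"
    if 7 <= hours < 9:
        return "a little less than normal"
    if 11 < hours <= 12:
        return "a little more than normal"
    return "affects health"  # below 7 h or above 12 h


for hours in (6, 8, 10, 11.5, 13):
    print(hours, sleep_status_6_to_13(hours))
```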
2.4.4.2 Feature Selection
Features are selected from the collected rules, since each rule depends on particular activities and measures of an individual. For example, the alcohol consumption rules for females differ from those for males, and the recommended calorie intake for a person weighing 100 kg differs from that for a person weighing 50 kg [1]. In these examples, gender and weight are the selected features. In the same way, the following features were collected: age, gender, height, weight, calorie intake, units smoked, units drunk, physical activity, screen time and sleep time.
2.4.4.3 Feature Reduction
Although all of these features were collected, some of them do not affect a person's health status directly, so they are transformed into derived features that do. The Harris–Benedict equation [4] is used for this reduction. It estimates an individual's basal metabolic rate (BMR), and the number of calories the individual should consume then depends on the BMR and the level of physical activity.
Specifically, the calories to be consumed are 1.2 × BMR for a sedentary or only slightly active person, 1.375 × BMR for a lightly active person, 1.55 × BMR for moderate activity, 1.725 × BMR for intense exercise and 1.9 × BMR for extra hard exercise.
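$$\text{Calories required} = k \times \text{BMR}, \qquad k \in \{1.2,\ 1.375,\ 1.55,\ 1.725,\ 1.9\} \tag{2.1}$$

where $k$ is the activity factor for the individual's level of physical activity.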
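A minimal sketch of the calorie requirement computation follows. The chapter does not give the BMR coefficients it uses, so this assumes the revised Harris–Benedict coefficients of Roza and Shizgal (1984); the dictionary keys and function names are hypothetical labels for the activity levels listed above.

```python
# Activity factors as listed in the text.
ACTIVITY_FACTORS = {
    "sedentary": 1.2,
    "lightly active": 1.375,
    "moderate": 1.55,
    "intense exercise": 1.725,
    "extra hard exercise": 1.9,
}


def bmr_harris_benedict(gender: str, weight_kg: float, height_cm: float, age_years: float) -> float:
    """Basal metabolic rate in kcal/day.

    Assumption: revised Harris-Benedict coefficients (Roza and Shizgal, 1984);
    the chapter does not say which version of the equation it applies.
    """
    gender = gender.lower()
    if gender == "male":
        return 88.362 + 13.397 * weight_kg + 4.799 * height_cm - 5.677 * age_years
    if gender == "female":
        return 447.593 + 9.247 * weight_kg + 3.098 * height_cm - 4.330 * age_years
    raise ValueError(f"unsupported gender value: {gender!r}")


def required_calories(gender: str, weight_kg: float, height_cm: float,
                      age_years: float, activity: str) -> float:
    """Daily calories to be consumed: activity factor k times BMR."""
    return ACTIVITY_FACTORS[activity] * bmr_harris_benedict(gender, weight_kg, height_cm, age_years)


print(round(required_calories("Male", 63, 176, 21, "lightly active")))  # 2279
```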
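Height, weight, calorie intake and physical activity are thereby combined into a single feature, the difference between the calories consumed and the calories required. Thus, the total number of inputs is reduced to seven: Age, Gender, Number of Units Smoked, Units of Alcohol Consumed, Screen Time, Sleep Time and Calories Difference.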
2.4.4.4 Dataset Generation From Rules
Based on the rules discussed in Section 2.4.4.1, all the required features, covering both the daily activities and the physical measures of an individual, are extracted. The number of features is then reduced using the Harris–Benedict equation [4], as discussed in Section 2.4.4.3.
Because the proposed system has two phases, Phase-I and Phase-II each need their own dataset with class labels. A sample of the Phase-I dataset is described in Table 2.1.
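Generating a labelled dataset from rules amounts to sampling feature values and labelling each one with the matching rule. The sketch below does this for the smoking rows of Table 2.1 only. The uniform sampling over 0 to 15 and the function names are assumptions, and class 3 is left out because its rule is cut off in this excerpt.

```python
import random
from typing import List, Optional, Tuple


def smoke_class(cigars_per_day: int) -> Optional[int]:
    """Class code from the smoking rows of Table 2.1 (classes 0-2 only).

    Returns None above 15, because the rule for class 3 is truncated here.
    """
    if cigars_per_day == 0:
        return 0  # good smoking status
    if 1 <= cigars_per_day <= 4:
        return 1  # reasonable smoking status
    if 5 <= cigars_per_day <= 15:
        return 2  # bad smoking status
    return None


def generate_smoke_rows(n: int, seed: int = 0) -> List[Tuple[int, Optional[int]]]:
    """Sample n values uniformly from 0-15 (an assumption) and label each with the rule."""
    rng = random.Random(seed)  # seeded so the generated dataset is reproducible
    rows = []
    for _ in range(n):
        value = rng.randint(0, 15)  # inclusive range covered by classes 0-2
        rows.append((value, smoke_class(value)))
    return rows


for value, label in generate_smoke_rows(5):
    print(value, label)
```

The same pattern extends to the other features: each rule becomes a labelling function, and rows combining several sampled features form the Phase-I dataset.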
2.4.4.5 Example
Let the individual's activities and measures for a day be:
Input = (Age = 21) ∩ (Gender = Male) ∩ (No. of cigars smoked = 0) ∩ (Units of Alcohol Consumed = 2) ∩ (Screen Time = 6 h) ∩ (Sleep Time = 8 h) ∩ (Height = 176 cm) ∩ (Weight = 63 kg) ∩ (Calorie Intake = 1,800 kcal) ∩ (Physical Activity = Lightly Active).
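As a worked sketch of how this input maps to the seven reduced features, assume the revised Harris–Benedict coefficients for men and define Calories Difference as calorie intake minus required calories (both are assumptions, since the chapter does not state them):

$$
\begin{aligned}
\text{BMR} &= 88.362 + 13.397 \times 63 + 4.799 \times 176 - 5.677 \times 21 \approx 1657.78\ \text{kcal/day}\\
\text{Calories required} &= 1.375 \times 1657.78 \approx 2279.45\ \text{kcal/day}\\
\text{Calories Difference} &= 1800 - 2279.45 \approx -479.45\ \text{kcal/day}
\end{aligned}
$$

Under these assumptions the reduced input is (Age = 21, Gender = Male, Units Smoked = 0, Units of Alcohol Consumed = 2, Screen Time = 6 h, Sleep Time = 8 h, Calories Difference ≈ −479 kcal), and a sleep time of 8 h falls in the normal band (7 to 9 h) for ages 18 to 25 in Table 2.1.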
Table 2.1 Sample Dataset for Phase-I.
Class | Condition | Class label | Description |
---|---|---|---|
Sleep | | | |
0 | Age < 2: 11–14 h; age 3–5: 10–13 h; age 6–13: 9–11 h; age 14–17: 8–10 h; age 18–25: 7–9 h; age 26–64: 7–9 h; age > 65: 7–8 h | normal | Sleep duration is within the optimal range for the age group |
1 | Age < 2: 9–10 h; age 3–5: 8–9 h; age 6–13: 7–8 h; age 14–17: 7–8 h; age 18–25: 6–7 h; age 26–64: 6–7 h; age > 65: 5–6 h | less sleep | Sleep duration is below the optimal range for the age group |
2 | Age < 2: 15–16 h; age 3–5: 13–14 h; age 6–13: 11–12 h; age 14–17: 10–11 h; age 18–25: 9–10 h; age 26–64: 9–10 h; age > 65: 8–9 h | more sleep | Sleep duration is above the optimal range for the age group |
Smoke | | | |
0 | Number of cigars smoked is 0 | good smoking status | |
1 | Number of cigars smoked is between 1 and 4 | reasonable smoking status | |
2 | Number of cigars smoked is between 5 and 15 | bad smoking status | |
3 | If the … | | |