Machine Learning Algorithms and Applications. Group of authors
an individual’s property.
The growth of Open Data calls for new tools and techniques to support it. Digital transformation requires companies to look for new tools and techniques that can meet the increasing need for faster delivery of services at a large number of delivery points. Technologies such as SaaS, mobile, and the Internet of Things are gaining ground by increasing the number of endpoints, thus enabling the success of the Open Data Initiative.
1.1.2 Air Quality
The State of Global Air 2017 report, recently published by the Institute for Health Metrics [1], stated that in 2015 there were 1,090,400 deaths in India alone due to an increase in PM2.5. High concentrations of PM2.5 in the air are caused mainly by the burning of petroleum fuels, household fuels, and wood fuels, by agricultural fires, and by industry-related pollutants and contaminants. In 2015, India and Bangladesh ranked just behind North African and Middle Eastern countries among places with high concentrations of PM2.5 in the air.
The report compares ambient concentrations with the air quality guidelines established by the WHO in 2005. According to the WHO, in 2015, 92% of the world’s population and 86% of the Indian population lived in areas where concentrations exceeded safe limits. There is therefore an urgent need for tools that give non-expert users better forecasts and an easy understanding of their surrounding environment at the lowest possible cost. The Air Quality Index (AQI) is commonly used by agencies to inform residents about the quality of the air in their vicinity.
The irony of today’s Internet world is that even though we are inundated with large quantities of data and information, we as humans still struggle to interpret it correctly. Extracting meaningful information from plain textual data in old tabular formats is a laborious task. It is under these circumstances that data visualizations play a vital role.
The objective of this work was to build a machine learning–based visualization app that evaluates air quality and assesses air pollution from the various parameters that contribute to it. Existing approaches did not account for variations in parameter values across locations. We therefore trained separate models for different locations to capture local trends better.
1.1.3 Impact of Lockdown on Air Quality
COVID-19 is a highly infectious disease caused by a newly discovered coronavirus that was first identified in Wuhan, Central China. As of 20th June, 2020, it had taken more than 460,000 lives around the world. Because of the pandemic, a nationwide lockdown was imposed in India from 24th March, 2020, and was extended for several weeks. It has been observed that the lockdown helped reduce pollution levels to a certain extent. This study tries to capture the variations in air pollution levels with and without lockdown.
1.2 Literature Survey
Air pollution occurs when particulates (PM2.5 and PM10), biological molecules, and other harmful substances are introduced into Earth’s atmosphere. Both natural processes and human activities can generate air pollution. Air pollution can be further classified into two types: visible air pollution and invisible air pollution.
Proactive monitoring and control of our natural and built environments is important in various application scenarios. Semantic Sensor Web technologies have been well researched and used in environmental monitoring applications to expose sensor data for analysis, so that responsive actions can be taken in situations of interest [2]. A sliding window approach that employs the Multilayer Perceptron model to predict short-term PM2.5 pollution situations has been integrated into such a proactive monitoring and control framework [2]. Time series data in practical applications always contain missing values due to sensor malfunction, network failure, outliers, etc. [3]. A spatiotemporal prediction framework based on missing value processing algorithms and a deep recurrent neural network (DRNN) has been proposed [3]. A generic methodology for weather forecasting based on an incremental K-means clustering algorithm is proposed in [4].
Air pollution data are available to the public on web pages as numeric values of the concentration of pollutants in the air [5, 6]. Such numeric information is not conducive to determining the air pollution level intuitively [6]. To address this problem, the study in [6] developed and implemented a program that visualizes the air pollution level for six pollutants by obtaining real-time air pollution data through an API and generating a Keyhole Markup Language (KML) file that displays the data intuitively on Google Earth [6]. The visualization method in [7] is made intuitive and reliable through data quality checking and information sharing with multi-perspective air pollution graphs [7]. This method allows the data to be easily understood by the public and can inspire or aid further studies in other fields [7].
Because these tools were built using spatio-temporal visualization and visual analytics for general-purpose visualization of geo-referenced time series of air quality and environmental data, they can be applied to other environmental monitoring data (temperature, precipitation, etc.) with some configuration [8].
According to a survey mentioned in [9], pollution levels in many cities across the country dropped drastically within a few days of the lockdown being imposed. Also, as discussed in the study [10], lockdown could be an effective alternative measure for controlling air pollution.
The studies above show that these machine learning techniques can be used to predict and subsequently evaluate air pollution. Implementation details are described in the next section.
1.3 Implementation Details
Several paradigms can be used to classify air quality. The novelty of this application lies in predicting the future air quality of different places in detail, with estimated values of the various parameters along with the resulting air quality class and AQI. The application can also visualize, efficiently and descriptively, data that is hard to analyze numerically in its raw form.
1.3.1 Proposed Methodology
The steps of our proposed methodology are as follows:
1. Fetch real-time air quality data through an API of Open Data.
2. Cluster the air quality data based on AQI and assign classes of air quality from good to severe.
3. Train a Support Vector Machine (SVM) model on the previously clustered data.
4. Train a separate time series Long Short-Term Memory (LSTM) model, a type of Recurrent Neural Network (RNN), for each place to predict the future air quality of that place based on the previous trend.
5. Assign an air quality class and AQI to the observed/predicted values of the parameters. AQI is assigned based on the worst 24-hour average of all the parameters.
6. Visualize the past data and future predictions using heat maps, graphs, etc.
7. Compare variations in different parameters contributing toward air pollution at different places.
8. Provide a user-friendly web app to predict and analyze air quality.
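Step 5 can be illustrated with a minimal sketch, given here under stated assumptions rather than as the implementation used in this work. Each pollutant's 24-hour average is mapped to a sub-index by linear interpolation between breakpoints, and the AQI is taken as the worst (largest) sub-index. The breakpoint values, the category names, and the function names (`sub_index`, `category`, `assign_aqi`) are assumptions for illustration, patterned on India's National AQI for PM2.5 and PM10; the chapter does not state which table it uses.

```python
from statistics import mean

# ASSUMED breakpoints, patterned on India's National AQI (24-hour averages,
# in micrograms per cubic metre). Each row: (conc_low, conc_high, idx_low, idx_high).
BREAKPOINTS = {
    "pm25": [(0, 30, 0, 50), (30, 60, 50, 100), (60, 90, 100, 200),
             (90, 120, 200, 300), (120, 250, 300, 400), (250, 500, 400, 500)],
    "pm10": [(0, 50, 0, 50), (50, 100, 50, 100), (100, 250, 100, 200),
             (250, 350, 200, 300), (350, 430, 300, 400), (430, 600, 400, 500)],
}

# ASSUMED category names, keyed by the upper bound of each AQI band.
CATEGORIES = [(50, "Good"), (100, "Satisfactory"), (200, "Moderate"),
              (300, "Poor"), (400, "Very Poor"), (500, "Severe")]


def sub_index(pollutant, concentration):
    """Map a 24-hour average concentration to a sub-index by linear interpolation."""
    if concentration < 0:
        raise ValueError("concentration must be non-negative")
    for c_lo, c_hi, i_lo, i_hi in BREAKPOINTS[pollutant]:
        if concentration <= c_hi:
            return i_lo + (i_hi - i_lo) * (concentration - c_lo) / (c_hi - c_lo)
    return 500.0  # cap values beyond the last breakpoint


def category(aqi):
    """Return the air quality class for an AQI value."""
    for upper, name in CATEGORIES:
        if aqi <= upper:
            return name
    return "Severe"


def assign_aqi(hourly_readings):
    """Compute (AQI, class, dominant pollutant) from hourly readings.

    hourly_readings maps a pollutant name to a list of hourly values;
    only the most recent 24 values of each list are averaged.
    """
    indices = {
        pollutant: sub_index(pollutant, mean(values[-24:]))
        for pollutant, values in hourly_readings.items()
    }
    dominant = max(indices, key=indices.get)  # worst pollutant drives the AQI
    aqi = round(indices[dominant])
    return aqi, category(aqi), dominant
```

For example, with a constant PM2.5 reading of 45 and a constant PM10 reading of 175 over the last 24 hours, the sub-indices under these assumed breakpoints are 75 and 150, so `assign_aqi` returns an AQI of 150 with PM10 as the dominant pollutant. The same function can be applied to either observed or predicted parameter values.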
Figure 1.1 Workflow of the application.
Figure 1.1 shows the detailed working of the application. The latest data for a specific location is fetched in JSON format from the Open Data repository through a RESTful API. Future air quality is then predicted from the past values. The predicted data is displayed to the user, with an appropriate message, through the web app. Visualization involved displaying the results in a human-understandable format. For this, heat maps were generated and appropriate messages were displayed based on the WHO guidelines. The