Data Mining and Machine Learning Applications. Group of authors
Features of KNIME: KNIME [25] is an open-source analytics platform for data science. It helps users understand and design data science workflows, analyze time-series data, build machine learning models, and explore data with visualization tools (charts, plots, etc.). It can also export the reports it generates. The KNIME workbench consists of the KNIME Explorer, Workflow bench, Node Repository, Workflow Editor, Description, Outline, and Console. KNIME supports data wrangling, so one can collect and process data from a wide variety of sources. It comes in two flavors:
◦ KNIME Analytics Platform
◦ KNIME Server
Both platforms are available on Microsoft Azure and Amazon AWS.
KNIME Tool Installation
You can download the installer from the KNIME website. Once the download completes, start the installation as shown in Figure 1.5. As with most installers, you must accept the license agreement before continuing (Figure 1.6). Next, specify the installation path: the installer proposes a default path, which you can change by clicking “Browse” (Figures 1.7 and 1.8).
Figure 1.5 Installation of KNIME.
Figure 1.6 Installation of KNIME (2).
Figure 1.7 Setting path for installing KNIME.
Figure 1.8 Starting installation of KNIME.
Figures 1.9–1.16 show the remaining steps, including selecting a workspace directory; if you want to change the workspace path, click “Browse.” Finally, Figure 1.16 shows the initial KNIME screen, from which mining work begins.
Figure 1.9 Selecting directory as a workspace.
Figure 1.10 Starting KNIME.
Figure 1.11 Completing setup wizard.
Figure 1.12 Installing Workspace in KNIME.
Figure 1.13 Installing KNIME (2).
Figure 1.14 Specifying memory for KNIME.
Figure 1.15 Finalizing the installation of KNIME.
Figure 1.16 Initial screen of KNIME.
1.7.3 RapidMiner
Instructions for downloading RapidMiner are available at https://rapidminer.com/products/studio/. Its main features are as follows:
◦ Speedy creation of predictive models, backed by a rich set of modeling libraries such as Bayesian modeling, regression, clustering, neural networks, and decision trees.
◦ Templates that guide users through common tasks.
◦ Connectivity to many data sources, including MS Excel, MS Access, CSV, NoSQL databases such as MongoDB and Cassandra, Microsoft SQL Server, MySQL, PDF, HTML, and XML.
◦ Support for ETL (extract–transform–load), multiple file types, and data exploration using statistical analysis.
◦ A code control and management module responsible for background process execution, automatic optimization, scripting, macros, logging, process control, and process-based reporting.
◦ Visualization through scatter plots, scatter matrices, line, bubble, parallel, deviation, box, 3-D, density, histogram, area, bar, stacked bar, and pie charts, survey plots, self-organizing maps, Andrews curves, quartile plots, surface/contour plots, time-series plots, and Pareto/lift charts.
◦ Validation of the designed model before deployment through split validation, bootstrapping, batch cross-validation, wrapper cross-validation, lift charts, and confusion matrices [24].
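RapidMiner performs these validations through its own visual operators, and the sketch below is not its API. It is a minimal plain-Python illustration of what split validation, k-fold cross-validation, bootstrapping, and a confusion matrix compute, assuming a toy one-feature dataset and a hypothetical 1-nearest-neighbour stand-in model (`train_1nn`); all function names are illustrative. Bootstrapping is shown in one common variant (testing on the out-of-bag rows), and plain k-fold stands in for RapidMiner's batch and wrapper cross-validation.

```python
import random
from collections import Counter


def train_1nn(train):
    """Hypothetical stand-in model: 1-nearest neighbour on one numeric feature."""
    def predict(x):
        # Predict the label of the training row whose feature is closest to x.
        nearest = min(train, key=lambda row: abs(row[0] - x))
        return nearest[1]
    return predict


def accuracy(model, rows):
    """Fraction of (feature, label) rows the model labels correctly."""
    return sum(model(x) == y for x, y in rows) / len(rows)


def confusion_matrix(actual, predicted, labels):
    """Row = true label, column = predicted label, cell = count."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in labels] for a in labels]


def split_validation(rows, train_fraction=0.7, seed=0):
    """Holdout: shuffle once, train on the first part, test on the remainder."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return accuracy(train_1nn(shuffled[:cut]), shuffled[cut:])


def cross_validation(rows, k=5, seed=0):
    """k-fold: each row is tested exactly once; returns the mean fold accuracy."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    scores = []
    for i, test in enumerate(folds):
        train = [row for j, fold in enumerate(folds) if j != i for row in fold]
        scores.append(accuracy(train_1nn(train), test))
    return sum(scores) / k


def bootstrap_validation(rows, rounds=20, seed=0):
    """Bootstrap: train on a resample drawn with replacement,
    test on the rows that were never drawn (out-of-bag)."""
    rng = random.Random(seed)
    scores = []
    for _ in range(rounds):
        picked = [rng.randrange(len(rows)) for _ in rows]
        drawn = set(picked)
        test = [row for i, row in enumerate(rows) if i not in drawn]
        if test:  # skip the rare round in which every row was drawn
            scores.append(accuracy(train_1nn([rows[i] for i in picked]), test))
    return sum(scores) / len(scores) if scores else float("nan")


# Toy data: feature x in 0.0..9.5, labelled "low" below 5.0 and "high" otherwise.
rows = [(x / 2, "low" if x < 10 else "high") for x in range(20)]
print(split_validation(rows), cross_validation(rows), bootstrap_validation(rows))

model = train_1nn(rows)
print(confusion_matrix([y for _, y in rows], [model(x) for x, _ in rows], ["low", "high"]))
# -> [[10, 0], [0, 10]]
```

The last confusion matrix is perfect only because the model is scored on its own training rows, where every row is its own nearest neighbour. That optimism is exactly what the three validation schemes avoid by always testing on rows the model did not train on. Fixed seeds keep the shuffles reproducible.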
References
1. Silberschatz, A., Korth, H.F., Sudarshan, S., Database system concepts, McGraw-Hill, New York, 1997.
2. Fayyad, U., Piatetsky-Shapiro, G., Smyth, P., From data mining to knowledge discovery in databases. AI Mag., 17, 3, 37–37, 1996.
3. Tan, P.-N., Steinbach, M., Kumar, V., Introduction to data mining, Pearson Education India, New Delhi, 2016.
4. Sumathi, S. and Sivanandam, S.N., Introduction to data mining and its applications, vol. 29, Springer, Berlin Heidelberg, 2006.
5. Mehrotra, S., Rastogi, R., Korth, H.F., Silberschatz, A., A transaction model for multidatabase systems, in: ICDCS, pp. 56–63. Mining, What Is Data. Data mining: Concepts and techniques, vol. 10, pp. 559–569, Morgan Kaufmann, 2006.
6. Pyo, S., Uysal, M., Chang, H., Knowledge discovery in the database for tourist destinations. J. Travel Res., California, USA, 40, 4, 374–384, 2002.
7. Gehrke, J., Ginsparg, P., Kleinberg, J., Overview of the 2003 KDD Cup. ACM Sigkdd