Inductive hybrid model of data clustering using density-based algorithms. Scientific Papers. Ukrainian Academy of Printing

Author(s)	Collection number	Pages
Yasinska-Damri L. M.	№ 2 (63)	64-76

Summary

The paper presents results on the practical implementation of a hybrid inductive model of data clustering based on the density-based algorithms DBSCAN and OPTICS. A comparative analysis of various internal clustering quality criteria and the corresponding external quality criteria is performed for several types of synthetic data. It is shown that the choice of internal clustering quality criteria is essential for assessing how well objects are grouped in a cluster structure, and that for each type of dataset studied the combination of internal quality criteria should be formed with regard to how both the objects and the clusters are distributed in the feature space. The simulation is carried out on the synthetic dataset Compound, which, according to its annotation, contains six clusters of various shapes. The comparison of internal clustering quality criteria shows that, for the Compound dataset, the optimal criteria in terms of both minimal reproducibility error and optimal cluster structure are Dunn, Gamma, and Xie-Beni. Functions for calculating the balance criterion, whose components are the selected internal clustering quality criteria and the respective external clustering quality criteria, are formed. As a result of the simulation, charts of the balance criterion versus the Eps value are created for each value of the MinPts parameter. It is shown that the proposed model allows the parameters of the density-based clustering algorithms DBSCAN and OPTICS to be optimized with regard to the distribution of objects in the respective clusters. Moreover, the simulation results indicate an advantage of the OPTICS algorithm: it operates more stably during cluster structure formation and is less sensitive to variation of the algorithm parameters.
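
The sweep of a balance criterion over Eps for each MinPts can be illustrated with a minimal Python sketch. It is not the paper's implementation, and several parts are assumptions made only for illustration: the data are a toy two-blob set rather than Compound, the two subsets are formed by a simple even/odd index split, the Dunn index is the only internal criterion, the external criterion is taken as the normalized difference of the internal criterion on the two subsets, and the function `balance_criterion` combines them in a hypothetical way. Only DBSCAN is shown; OPTICS is omitted.

```python
import math
from itertools import combinations


def dbscan(points, eps, min_pts):
    """Plain DBSCAN; returns one label per point (-1 = noise, 0.. = cluster id)."""
    n = len(points)
    # Each point's eps-neighborhood includes the point itself.
    neighbors = [
        [j for j in range(n) if math.dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_pts:
            labels[i] = -1  # provisional noise; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbors[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a core point is a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbors[j]) >= min_pts:  # only core points expand the cluster
                queue.extend(neighbors[j])
    return labels


def dunn_index(points, labels):
    """Dunn index: smallest inter-cluster distance / largest cluster diameter."""
    clusters = {}
    for p, lab in zip(points, labels):
        if lab != -1:  # noise points are ignored
            clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        return 0.0
    diameter = max(
        (math.dist(a, b) for c in clusters.values() for a, b in combinations(c, 2)),
        default=0.0,
    )
    separation = min(
        math.dist(a, b)
        for c1, c2 in combinations(clusters.values(), 2)
        for a in c1
        for b in c2
    )
    return separation / diameter if diameter > 0 else 0.0


def balance_criterion(points, eps, min_pts):
    """Hypothetical balance score (an assumption, not the paper's formula)."""
    subset_a, subset_b = points[0::2], points[1::2]  # assumed even/odd split
    q_a = dunn_index(subset_a, dbscan(subset_a, eps, min_pts))
    q_b = dunn_index(subset_b, dbscan(subset_b, eps, min_pts))
    if q_a + q_b == 0:
        return 0.0
    internal = (q_a + q_b) / 2                   # higher is better
    external = abs(q_a - q_b) / (q_a + q_b)      # reproducibility error, lower is better
    return internal * (1 - external)


def sweep(points, eps_values, min_pts_values):
    """Balance score for every (MinPts, Eps) pair."""
    return {
        (min_pts, eps): balance_criterion(points, eps, min_pts)
        for min_pts in min_pts_values
        for eps in eps_values
    }


# Toy data (assumed): two well-separated 6x6 grids of points.
blob = [(0.1 * i, 0.1 * j) for i in range(6) for j in range(6)]
data = blob + [(x + 3.0, y + 3.0) for x, y in blob]

scores = sweep(data, eps_values=[0.05, 0.25, 5.0], min_pts_values=[3, 5])
best_min_pts, best_eps = max(scores, key=scores.get)
print("best MinPts and Eps:", best_min_pts, best_eps)
```

In this toy setting, an Eps that is too small marks every point as noise and an Eps that is too large merges everything into one cluster, so both ends of the sweep score zero and the maximum lies in between. Plotting `scores` against Eps separately for each MinPts would give charts of the kind the abstract describes.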

Keywords: data clustering, density-based clustering algorithms, internal and external clustering quality criteria, objective clustering inductive technology.

doi: 10.32403/1998-6912-2021-2-63-64-76
