Author(s) | Collection number | Pages |
---|---|---|
Yasinska-Damri L. M. | № 2 (63) | 64-76 |
The paper presents the results of research on the practical implementation of a hybrid inductive model of data clustering based on the density-based algorithms DBSCAN and OPTICS. A comparative analysis of different internal clustering quality criteria and the corresponding external quality criteria is performed for various types of synthetic data. It is shown that the choice of internal clustering quality criteria is essential for assessing the quality of object grouping within a cluster structure and that, for each type of dataset studied, the combination of internal quality criteria should be formed with regard to the nature of the distribution of both objects and clusters in the feature space. The simulation is carried out on the synthetic dataset Compound, which, according to its annotation, contains six clusters of different shapes. The comparative analysis of the internal clustering quality criteria has shown that for the Compound dataset the optimal criteria, in terms of both minimal reproducibility error and optimal cluster structure, are the Dunn, Gamma and Xie-Beni indices. Functions for calculating the balance criterion, whose components are the selected internal clustering quality criteria and the respective external clustering quality criteria, are formed. Based on the simulation results, plots of the balance criterion versus the Eps value are constructed for each value of the MinPts parameter. It is shown that the proposed model allows optimal determination of the parameters of the density-based clustering algorithms DBSCAN and OPTICS, taking into account the nature of the distribution of objects in the respective clusters.
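The grouping step and the Eps/MinPts sweep described above can be sketched in pure Python. This is a minimal illustration under stated assumptions, not the author's implementation: `dbscan`, `dunn_index` and `sweep` are hypothetical helper names, noise points are excluded when computing the Dunn index (larger values mean compact, well-separated clusters), and the Dunn index alone stands in for the paper's balance criterion, whose exact formula (combining several internal and external criteria) is not given in the abstract.

```python
import math
from itertools import combinations


def region_query(points, i, eps):
    """Indices of all points within distance eps of points[i], the point itself included."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Minimal DBSCAN: one label per point, -1 for noise."""
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbours = region_query(points, i, eps)
        if len(neighbours) < min_pts:
            labels[i] = -1  # provisional noise; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        seeds = list(neighbours)
        k = 0
        while k < len(seeds):
            j = seeds[k]
            k += 1
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a core point becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbours = region_query(points, j, eps)
            if len(j_neighbours) >= min_pts:
                seeds.extend(j_neighbours)  # only core points expand the cluster
    return labels


def dunn_index(points, labels):
    """Smallest inter-cluster distance divided by the largest cluster diameter."""
    clusters = {}
    for p, lab in zip(points, labels):
        if lab != -1:  # ASSUMPTION: noise points are ignored
            clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        return 0.0  # undefined for fewer than two clusters; treated as the worst value
    groups = list(clusters.values())
    diameter = max(
        (math.dist(a, b) for g in groups for a, b in combinations(g, 2)), default=0.0
    )
    separation = min(
        math.dist(a, b) for g1, g2 in combinations(groups, 2) for a in g1 for b in g2
    )
    return separation / diameter if diameter > 0 else math.inf


def sweep(points, eps_values, min_pts_values):
    """One curve of (Eps, criterion) pairs per MinPts value.

    Hypothetical stand-in: dunn_index replaces the paper's balance criterion.
    """
    return {
        m: [(eps, dunn_index(points, dbscan(points, eps, m))) for eps in eps_values]
        for m in min_pts_values
    }


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11), (5, 5)]
    for min_pts, curve in sweep(data, [0.5, 1.0, 1.5, 3.0, 20.0], [2, 3, 4]).items():
        print(f"MinPts={min_pts}:", [f"{eps}: {d:.3f}" for eps, d in curve])
```

Each entry of the returned dictionary corresponds to one curve of criterion value versus Eps for a fixed MinPts; under the paper's approach, the balance criterion would take the place of `dunn_index` when choosing the Eps and MinPts values.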
Moreover, the simulation results indicate an advantage of the OPTICS algorithm, owing to its higher stability during the formation of the cluster structure on the one hand and its lower sensitivity to variations of the algorithm parameters on the other.
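One well-known property of OPTICS helps explain why it tolerates parameter variation: it builds a single reachability ordering for a generating distance Eps, and DBSCAN-like partitions for any smaller Eps' can then be read off that ordering without re-clustering (the ExtractDBSCAN procedure from the original OPTICS paper by Ankerst et al., 1999). The pure-Python sketch below illustrates only that property; it is not the paper's experimental setup, and `core_distance`, `optics` and `extract_dbscan` are hypothetical names. The extracted partitions agree with DBSCAN up to the assignment of some border points.

```python
import heapq
import math


def core_distance(points, i, eps, min_pts):
    """Distance to the min_pts-th nearest point (itself included), or None if not core within eps."""
    within = sorted(d for d in (math.dist(points[i], q) for q in points) if d <= eps)
    return within[min_pts - 1] if len(within) >= min_pts else None


def optics(points, eps, min_pts):
    """Minimal OPTICS: returns (ordering, reachability, core distances) for generating distance eps."""
    n = len(points)
    reach = [math.inf] * n
    core = [None] * n
    processed = [False] * n
    order = []
    for start in range(n):
        if processed[start]:
            continue
        heap = [(math.inf, start)]  # seed list ordered by current reachability distance
        while heap:
            _, i = heapq.heappop(heap)
            if processed[i]:
                continue  # stale entry superseded by a smaller reachability value
            processed[i] = True
            order.append(i)
            core[i] = core_distance(points, i, eps, min_pts)
            if core[i] is None:
                continue  # non-core points do not update their neighbours
            for j, q in enumerate(points):
                d = math.dist(points[i], q)
                if processed[j] or d > eps:
                    continue
                new_reach = max(core[i], d)
                if new_reach < reach[j]:
                    reach[j] = new_reach
                    heapq.heappush(heap, (new_reach, j))
    return order, reach, core


def extract_dbscan(order, reach, core, eps_prime):
    """DBSCAN-like labels for any eps_prime <= eps, read off one OPTICS ordering (-1 = noise)."""
    labels = [-1] * len(order)
    cluster = -1
    for i in order:
        if reach[i] > eps_prime:
            if core[i] is not None and core[i] <= eps_prime:
                cluster += 1  # i starts a new cluster
                labels[i] = cluster
            # otherwise i stays noise
        else:
            labels[i] = cluster  # density-reachable from the current cluster
    return labels


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11), (5, 5)]
    order, reach, core = optics(data, eps=5.0, min_pts=3)  # computed once
    for eps_prime in (0.5, 1.0, 1.5, 3.0):
        print(eps_prime, extract_dbscan(order, reach, core, eps_prime))
```

For real datasets, scikit-learn's `sklearn.cluster.OPTICS` together with `sklearn.cluster.cluster_optics_dbscan` performs the same kind of extraction with spatial indexing; the quadratic loops above are kept only for readability.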
Keywords: data clustering, density-based clustering algorithms, internal and external clustering quality criteria, objective clustering inductive technology.
doi: 10.32403/1998-6912-2021-2-63-64-76