Hybrid inductive model of gene expression profiles clustering based on SOTA algorithm

Author(s): Yasinska-Damri L. M., Liakh I. M., Durniak B. V., Babichev S. A.
Collection number: № 1 (64)
Pages: 48-62

The paper presents the development of a hybrid inductive model for clustering gene expression profiles, based on the joint application of the SOTA (Self-Organizing Tree Algorithm) clustering algorithm and a convolutional neural network. The model is presented as a structural block chart of a stepwise procedure: in the first step, the clustering algorithm is implemented within the framework of the objective clustering inductive technology; in the second step, a convolutional neural network is applied to the gene expression data in the formed clusters. As experimental data, the authors used gene expression data of patients examined for lung cancer. In total, 156 patients were studied, of whom 65 were identified as healthy and 91 were diagnosed with cancer. Each studied object was characterized by the expression of 54,675 genes. In the first stage, the 10,000 most informative genes in terms of statistical criteria and Shannon entropy were selected. Intermediate clusterings were formed by analysing the values of the balance criterion, whose components were both the internal and the external clustering quality criteria. The final choice of the optimal clustering corresponded to the maximum classification accuracy of the objects when using a convolutional neural network.
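The gene selection stage described above can be illustrated with a minimal sketch. It is not the authors' procedure: the abstract does not state the thresholds, the statistical criteria, or how they are combined with Shannon entropy. The sketch assumes that variance is the statistical criterion, that entropy is estimated from a histogram, and that a profile with high variance and low entropy counts as informative. The function names `shannon_entropy` and `select_informative_genes` and the parameter `bins` are hypothetical.

```python
import numpy as np


def shannon_entropy(profile, bins=10):
    """Shannon entropy (in bits) of one expression profile.

    Assumption: the distribution is estimated from an equal-width histogram.
    """
    counts, _ = np.histogram(profile, bins=bins)
    probs = counts[counts > 0] / counts.sum()
    return float(-np.sum(probs * np.log2(probs)))


def select_informative_genes(expr, n_keep, bins=10):
    """Return sorted column indices of the n_keep selected genes.

    expr: array of shape (n_samples, n_genes).
    Assumption: genes are ranked by the sum of two ranks,
    descending variance and ascending Shannon entropy.
    """
    variance = expr.var(axis=0)
    entropy = np.array(
        [shannon_entropy(expr[:, j], bins) for j in range(expr.shape[1])]
    )
    var_rank = np.argsort(np.argsort(-variance))  # 0 = highest variance
    ent_rank = np.argsort(np.argsort(entropy))    # 0 = lowest entropy
    combined = var_rank + ent_rank
    keep = np.argsort(combined, kind="stable")[:n_keep]
    return np.sort(keep)


if __name__ == "__main__":
    # Synthetic stand-in for the data; the study used 156 samples,
    # 54,675 genes and kept 10,000 of them.
    rng = np.random.default_rng(0)
    expr = rng.normal(size=(156, 500))
    selected = select_informative_genes(expr, n_keep=100)
    print(selected.shape)
```

With the real data, `expr` would be the 156 x 54,675 expression matrix and `n_keep` would be 10,000; the ranking rule would need to be replaced by the criteria actually used in the paper.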

The research creates the conditions for improving the objectivity of object identification by parallelizing the information processing, carefully selecting the most informative gene expression profiles according to the classification quality criteria, and making a compromise decision based on the classification results for objects that contain only the most informative gene expression profiles.

A further perspective of the authors' research is the practical implementation of the proposed technique on various current gene expression data sets.

Keywords: SOTA clustering algorithm, convolutional neural network, gene expression data, clustering of gene expression profiles, objective clustering inductive technology, data classification, classification accuracy.

doi: 10.32403/1998-6912-2022-1-64-48-62

