| Author(s) | Collection number | Pages |
|---|---|---|
| Yasinska-Damri L. M., Liakh I. M., Durniak B. V., Babichev S. A. | No. 1 (64) | 48-62 |
The paper presents the results of research on the development of a hybrid inductive model for clustering gene expression profiles. The model is based on the joint application of the SOTA (Self-Organizing Tree Algorithm) clustering algorithm and a convolutional neural network. It is presented as a structural block diagram of a two-step procedure. In the first step, the clustering algorithm is implemented within the framework of objective clustering inductive technology. In the second step, a convolutional neural network is applied to the gene expression data in the resulting clusters. As experimental data, the authors used gene expression data from patients examined for lung cancer. In total, 156 patients were studied: 65 were identified as healthy and 91 were diagnosed with cancer. Each studied object was described by 54,675 genes. At the first stage, the 10,000 most informative genes according to statistical criteria and Shannon entropy were selected. Intermediate clusterings were formed by analyzing the values of a balance criterion whose components include both internal and external clustering quality criteria. The optimal clustering was finally chosen as the one that gave the maximum object classification accuracy with the convolutional neural network.
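The abstract does not state which statistical criteria or entropy estimator were used to select the 10,000 most informative genes. The sketch below is a hypothetical illustration of one common approach, not the authors' procedure. It first drops near-constant genes using a variance threshold. It then estimates each gene's Shannon entropy from an equal-width histogram. Finally, it keeps the `k` genes with the lowest entropy, on the assumption that a near-uniform (maximum-entropy) profile carries mostly noise. The function names, the `bins` and `min_variance` parameters, and the direction of the entropy ranking are all assumptions.

```python
import numpy as np


def shannon_entropy_per_gene(X: np.ndarray, bins: int = 10) -> np.ndarray:
    """Shannon entropy (in bits) of every gene's expression profile.

    X has shape (n_samples, n_genes). Each column is discretized into
    `bins` equal-width bins spanning that column's own min..max range.
    """
    entropies = np.empty(X.shape[1])
    for j in range(X.shape[1]):
        counts, _ = np.histogram(X[:, j], bins=bins)
        p = counts[counts > 0] / counts.sum()       # empirical bin probabilities
        entropies[j] = np.sum(p * np.log2(1.0 / p))  # H = -sum p log2 p
    return entropies


def select_genes(X: np.ndarray, k: int, bins: int = 10,
                 min_variance: float = 1e-12) -> np.ndarray:
    """Hypothetical filter: indices of the k retained genes, sorted ascending.

    1. Statistical criterion (assumed): drop genes whose variance across
       samples does not exceed `min_variance`.
    2. Entropy criterion (assumed direction): among the remaining genes,
       keep the k with the LOWEST entropy, treating near-uniform
       (high-entropy) profiles as mostly noise.
    """
    candidates = np.flatnonzero(X.var(axis=0) > min_variance)
    h = shannon_entropy_per_gene(X[:, candidates], bins)
    order = np.argsort(h, kind="stable")
    return np.sort(candidates[order[:k]])


# Toy usage: 100 samples x 3 genes.
X = np.column_stack([
    np.zeros(100),               # constant gene: removed by the variance filter
    np.repeat([0.0, 1.0], 50),   # two-valued gene: 1 bit
    np.linspace(0.0, 1.0, 100),  # evenly spread gene: log2(10) bits
])
print(shannon_entropy_per_gene(X))  # entropies ≈ 0, 1 and log2(10) ≈ 3.32 bits
print(select_genes(X, k=1))         # [1]
```

For data shaped like the study's (156 samples by 54,675 genes), the corresponding call would be `select_genes(X, k=10_000)`. The histogram estimator is the simplest choice, and other entropy estimators could be substituted without changing the selection logic.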
The research creates conditions for making object identification more objective in three ways. Information processing is parallelized. The most informative gene expression profiles are carefully selected according to classification quality criteria. A compromise decision is then made by analyzing the classification results for objects described only by the most informative gene expression profiles.
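The second step can be sketched as follows: classify the objects using only the genes of each formed cluster, then keep the clustering that yields the highest accuracy. This is a hypothetical illustration, not the authors' implementation. To keep it short and runnable with NumPy alone, a leave-one-out nearest-centroid classifier stands in for the convolutional neural network. Each candidate clustering is scored by its best single cluster. This aggregation rule is an assumption, because the abstract does not say how per-cluster accuracies are combined. In the described model, the gene cluster labels would come from SOTA after the balance-criterion analysis. Here they are simply passed in as arrays, and all function names are illustrative.

```python
import numpy as np


def loo_nearest_centroid_accuracy(X: np.ndarray, y: np.ndarray) -> float:
    """Leave-one-out accuracy of a nearest-centroid classifier.

    Stand-in (assumption) for the convolutional neural network in the paper.
    """
    n = X.shape[0]
    correct = 0
    for i in range(n):
        train = np.arange(n) != i  # hold out sample i
        X_tr, y_tr = X[train], y[train]
        classes = np.unique(y_tr)
        centroids = np.stack([X_tr[y_tr == c].mean(axis=0) for c in classes])
        dists = np.linalg.norm(centroids - X[i], axis=1)
        correct += int(classes[np.argmin(dists)] == y[i])
    return correct / n


def best_cluster(X, y, gene_labels):
    """(cluster_id, accuracy) of the gene cluster that classifies best on its own."""
    gene_labels = np.asarray(gene_labels)
    scores = {}
    for c in np.unique(gene_labels):
        genes = np.flatnonzero(gene_labels == c)  # columns belonging to cluster c
        scores[int(c)] = loo_nearest_centroid_accuracy(X[:, genes], y)
    best = max(scores, key=scores.get)
    return best, scores[best]


def choose_clustering(X, y, candidates):
    """Index of the candidate clustering whose best cluster is most accurate."""
    results = [best_cluster(X, y, labels) for labels in candidates]
    idx = max(range(len(results)), key=lambda i: results[i][1])
    return idx, results[idx]


# Toy data: 8 samples (4 per class) x 4 genes. Genes 0-1 track the class;
# genes 2-3 alternate independently of it.
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])
noise = np.array([0, 10, 0, 10, 0, 10, 0, 10], dtype=float)
X = np.column_stack([y, y, noise, 10 - noise]).astype(float)

mixed = [0, 1, 0, 1]      # each cluster mixes a class gene with a noise gene
separated = [0, 0, 1, 1]  # class-related genes grouped together
print(choose_clustering(X, y, [mixed, separated]))  # (1, (0, 1.0))
```

With the study's data, `X` would hold the expression values of the pre-selected genes for the 156 patients, and `y` would hold the healthy/cancer labels. Replacing the stand-in classifier with a convolutional neural network would change only `loo_nearest_centroid_accuracy`.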
A direction for the authors' further research is the practical implementation of the proposed technique on various current gene expression datasets.
Keywords: SOTA clustering algorithm, convolutional neural network, gene expression data, clustering of gene expression profiles, objective clustering inductive technology, data classification, classification accuracy.
doi: 10.32403/1998-6912-2022-1-64-48-62