Hybrid model for identifying gene expression data patterns based on cluster-bicluster analysis using convolutional neural networks

Author(s): Liakh I. M., Durniak B. V.
Collection number: № 1 (68)
Pages: 136-144

In modern bioinformatics, gene expression data analysis plays a crucial role in studying biological processes and gene regulation mechanisms. The increasing volume of data generated by DNA microarrays and RNA sequencing makes their analysis a challenging task. Traditional data analysis methods, such as cluster analysis, may not provide sufficient accuracy and information for identifying biologically significant patterns.

New approaches that combine multiple data analysis methods offer the potential for deeper and more comprehensive analysis. One such approach is cluster-bicluster analysis, which combines cluster analysis with bicluster analysis. Cluster analysis groups genes by the similarity of their expression profiles across all conditions, while bicluster analysis identifies groups of genes that are co-expressed under subsets of conditions.
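
The difference between grouping genes over all conditions and finding genes that are co-expressed only under some conditions can be illustrated with a small sketch. It uses the mean squared residue, a common bicluster coherence score, on a made-up expression matrix. The choice of this score, the function name `mean_squared_residue`, and the toy values are assumptions for illustration only and are not taken from the study.

```python
from statistics import mean


def mean_squared_residue(matrix, rows, cols):
    """Mean squared residue of the submatrix given by rows x cols.

    A value of 0 means the selected genes vary perfectly coherently
    (additively) across the selected conditions.
    """
    sub = [[matrix[i][j] for j in cols] for i in rows]
    row_means = [mean(r) for r in sub]
    col_means = [mean(c) for c in zip(*sub)]
    overall = mean(row_means)  # equals the submatrix mean (equal row lengths)
    total = 0.0
    for r, rm in zip(sub, row_means):
        for v, cm in zip(r, col_means):
            total += (v - rm - cm + overall) ** 2
    return total / (len(rows) * len(cols))


# Toy expression matrix (hypothetical values): rows are genes, columns are conditions.
expr = [
    [1.0, 2.0, 3.0, 9.0],
    [2.0, 3.0, 4.0, 1.0],
    [3.0, 4.0, 5.0, 7.0],
    [8.0, 1.0, 6.0, 2.0],
]

# Genes 0-2 are coherent under conditions 0-2 only, so they form a bicluster.
msr_bicluster = mean_squared_residue(expr, rows=[0, 1, 2], cols=[0, 1, 2])
# Over all genes and all conditions the coherence is lost.
msr_full = mean_squared_residue(expr, rows=[0, 1, 2, 3], cols=[0, 1, 2, 3])

print(msr_bicluster, msr_full)
```

In this toy case the selected submatrix scores zero while the full matrix does not, which is the kind of local pattern that clustering over all conditions can miss.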

In this study, a Bayesian optimization algorithm is used to determine the optimal hyperparameters of a convolutional neural network applied to gene expression data produced by cluster-bicluster analysis. The resulting models are trained and validated using 5-fold cross-validation.
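
How 5-fold cross-validation can be used to compare hyperparameter values may be sketched as follows. This is a minimal illustration under strong assumptions: a k-nearest-neighbour classifier stands in for the convolutional neural network, a plain search over a few candidate values stands in for Bayesian optimization, and the data are synthetic. All names (`k_fold_indices`, `cross_validate`, `knn_accuracy`) are hypothetical and do not come from the study.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Shuffle sample indices and split them into k nearly equal folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validate(score_fn, n_samples, k=5, seed=0):
    """Average score_fn(train_idx, val_idx) over the k train/validation splits."""
    folds = k_fold_indices(n_samples, k, seed)
    scores = []
    for i, val_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        scores.append(score_fn(train_idx, val_idx))
    return sum(scores) / k


def knn_accuracy(X, y, train_idx, val_idx, n_neighbors):
    """Validation accuracy of a k-NN classifier (stand-in for the CNN)."""
    correct = 0
    for v in val_idx:
        nearest = sorted(
            train_idx,
            key=lambda t: sum((a - b) ** 2 for a, b in zip(X[t], X[v])),
        )[:n_neighbors]
        votes = sum(y[t] for t in nearest)
        pred = 1 if 2 * votes > n_neighbors else 0
        correct += pred == y[v]
    return correct / len(val_idx)


# Synthetic two-class "expression profiles" (hypothetical data, 4 features each).
rng = random.Random(1)
X = [[rng.gauss(c, 1.0) for _ in range(4)] for c in (0.0, 2.0) for _ in range(50)]
y = [0] * 50 + [1] * 50

# Plain search over candidate values (stand-in for Bayesian optimization).
candidates = [1, 3, 5, 7]
scores = {
    c: cross_validate(lambda tr, va, c=c: knn_accuracy(X, y, tr, va, c), len(X))
    for c in candidates
}
best = max(scores, key=scores.get)
print(best, scores[best])
```

Each candidate is scored on the same five splits, so the comparison between hyperparameter values is not affected by differences in how the data were partitioned.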

The results of the comparative analysis of cancer type classification accuracy confirm the feasibility of the proposed step-by-step gene expression data processing procedure, which includes cluster-bicluster analysis, and highlight its potential for use in diagnostic systems based on gene expression data. The effectiveness of the step-by-step procedure for clustering and biclustering gene expression data is thus examined with the use of gene ontology analysis.

Keywords: gene expression, clustering, biclustering, gene ontology, data analysis.

doi: 10.32403/1998-6912-2024-1-68-136-144

