Author(s) | Collection number | Pages |
---|---|---|
Liakh I. M., Durniak B. V. | № 1 (68) | 136-144 |
In modern bioinformatics, gene expression data analysis plays a crucial role in studying biological processes and gene regulation mechanisms. The increasing volume of data generated by DNA microarrays and RNA sequencing makes their analysis a challenging task. Traditional data analysis methods, such as cluster analysis, may not provide sufficient accuracy and information for identifying biologically significant patterns.
New approaches that combine multiple data analysis methods offer the potential for deeper and more comprehensive analysis. One such method is cluster-bicluster analysis, which combines cluster analysis with bicluster analysis. Cluster analysis groups genes by the similarity of their expression profiles across all conditions, while bicluster analysis identifies groups of genes that are co-expressed under only a subset of conditions.
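The distinction between the two views can be illustrated with a toy example. The sketch below is not the authors' algorithm: the expression values, the gene names, the Pearson threshold of 0.9, and the helper `coexpressed_with` are all assumptions made for illustration. It only shows how a gene can look unrelated over the full profile yet be co-expressed on a subset of conditions, and it takes the condition subset as given rather than searching for it.

```python
from statistics import mean

def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if either is constant)."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den if den else 0.0

def coexpressed_with(seed, expression, conditions, threshold=0.9):
    """Genes whose profile, restricted to `conditions`, correlates with the seed gene."""
    seed_profile = [expression[seed][c] for c in conditions]
    group = []
    for gene, profile in expression.items():
        restricted = [profile[c] for c in conditions]
        if pearson(seed_profile, restricted) >= threshold:
            group.append(gene)
    return sorted(group)

# Hypothetical expression matrix: 4 genes measured under 6 conditions.
expression = {
    "g1": [1, 2, 3, 4, 5, 6],
    "g2": [2, 3, 4, 5, 6, 7],  # follows g1 under every condition
    "g3": [1, 2, 3, 9, 1, 5],  # follows g1 only under conditions 0-2
    "g4": [6, 5, 4, 3, 2, 1],  # anti-correlated with g1
}

# Clustering view: similarity is judged over the full expression profile.
cluster_like = coexpressed_with("g1", expression, list(range(6)))

# Biclustering view: similarity is judged over a subset of conditions.
bicluster_like = coexpressed_with("g1", expression, [0, 1, 2])
```

With these values, `cluster_like` contains only `g1` and `g2`, whereas `bicluster_like` also picks up `g3`, whose similarity to `g1` is hidden when all six conditions are compared at once.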
In this study, a Bayesian optimization algorithm is used to determine the optimal hyperparameters of a convolutional neural network applied to gene expression data produced by cluster-bicluster analysis. The resulting models are trained and validated using 5-fold cross-validation.
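The evaluation scheme can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's implementation: a plain loop over three candidate settings stands in for the Bayesian optimizer, and `toy_train_and_score` is a hypothetical placeholder for training the convolutional network on the training folds and returning its validation accuracy. The hyperparameter name `learning_rate` and all numeric values are likewise illustrative.

```python
import random
from statistics import mean

def k_fold_indices(n_samples, k=5, seed=0):
    """Shuffle sample indices and split them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]

def cross_validate(train_and_score, params, n_samples, k=5):
    """Mean validation score of one hyperparameter setting over k folds."""
    folds = k_fold_indices(n_samples, k)
    scores = []
    for i, val_idx in enumerate(folds):
        # Every fold except the i-th is used for training.
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        scores.append(train_and_score(params, train_idx, val_idx))
    return mean(scores)

def toy_train_and_score(params, train_idx, val_idx):
    # Hypothetical stand-in for "train the CNN, return validation accuracy".
    # The score peaks at learning_rate = 0.01 purely for demonstration.
    return 1.0 - abs(params["learning_rate"] - 0.01) * 10

# A plain candidate loop replaces the Bayesian optimizer in this sketch.
candidates = [{"learning_rate": lr} for lr in (0.001, 0.01, 0.1)]
best = max(
    candidates,
    key=lambda p: cross_validate(toy_train_and_score, p, n_samples=50),
)
```

A Bayesian optimizer would replace the fixed `candidates` list by proposing each new setting from the scores observed so far, while the cross-validated score would remain the objective being maximized.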
A comparative analysis of cancer type classification accuracy confirms the feasibility of the proposed step-by-step gene expression data processing procedure, which includes cluster-bicluster analysis, and highlights its potential for use in diagnostic systems based on gene expression data. The study thus examines the effectiveness of a step-by-step procedure for clustering and biclustering gene expression data using gene ontology analysis.
Keywords: gene expression, clustering, biclustering, gene ontology, data analysis.
doi: 10.32403/1998-6912-2024-1-68-136-144