Author(s) | Collection number | Pages |
---|---|---|
Liakh I. M. | № 2 (67) | 136-144 |
The paper describes the development of a technology for forming subsets of co-expressed and significant gene expression profiles for further use in diagnostic systems based on gene expression data. A technology is proposed for removing uninformative genes based on statistical criteria using gene ontology analysis, taking into account the number of genes and the nature of their interaction. The results of a practical implementation of the proposed technology are presented using gene expression data from patients tested for various types of cancer. Analysis of the results shows that the proposed model is highly effective. Out of 19947 gene expression profiles, 14487 significant genes were identified, and classification of samples using the identified significant genes as attributes reached an accuracy of 97.6 %. Of the 619 samples in the test data subset, only 15 were classified incorrectly. The presented research creates conditions for improving the efficiency of a hybrid model for diagnosing complex objects based on gene expression data. The key aspects of gene ontology analysis are considered: the gene ontology (GO) itself, selection of significant genes, enrichment analysis, functional interpretation, and statistical analysis. A flowchart of a step-by-step procedure for applying GO analysis to select significant genes is also presented.
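The reported figures can be cross-checked with a few lines of arithmetic. The sketch below uses only the counts quoted in the abstract; the variable names are illustrative and not taken from the paper.

```python
# Counts quoted in the abstract.
total_genes = 19947        # gene expression profiles before filtering
significant_genes = 14487  # profiles kept as significant
test_samples = 619         # samples in the test subset
misclassified = 15         # samples classified incorrectly

# Accuracy is the share of correctly classified test samples.
accuracy_pct = round(100 * (test_samples - misclassified) / test_samples, 1)

# Share of the original profiles retained after removing uninformative genes.
retained_pct = round(100 * significant_genes / total_genes, 1)

print(f"accuracy: {accuracy_pct} %")   # 604 / 619, consistent with the stated 97.6 %
print(f"genes retained: {retained_pct} %")
```

The 15 errors out of 619 samples correspond to 604 correct classifications, which agrees with the stated accuracy after rounding to one decimal place.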
The enrichment estimates are verified using two test statistics: Fisher's criterion and the Kolmogorov-Smirnov test. A common list of important genes for both tests is created, unique genes are identified, and new data containing the selected important genes as attributes are generated. The obtained results are consistent with the modelling results obtained by applying fuzzy logic reasoning and criterion analysis systems for statistics and entropy. The adequacy of the model is assessed by applying a classifier to the generated data.
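The verification step described above can be illustrated with a minimal standard-library sketch. It assumes the one-sided form of Fisher's exact test that is common in GO enrichment analysis, a two-sample Kolmogorov-Smirnov statistic comparing scores of genes inside and outside a GO term, and a merge of the two gene lists with duplicates removed. The gene names, scores, cutoff, and per-test gene lists are hypothetical toy values, and the paper's actual procedure may differ.

```python
from bisect import bisect_right
from math import comb


def fisher_enrichment_p(k, n, K, N):
    """One-sided Fisher's exact test (upper hypergeometric tail).

    N: background genes, K: background genes annotated to the GO term,
    n: candidate genes, k: candidate genes annotated to the term.
    Returns P(X >= k) for sampling n genes without replacement.
    """
    upper = min(n, K)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: largest gap between the empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    d = 0.0
    for x in sorted(set(a) | set(b)):
        fa = bisect_right(a, x) / len(a)
        fb = bisect_right(b, x) / len(b)
        d = max(d, abs(fa - fb))
    return d


# Hypothetical toy data: one score per gene and one GO term's annotations.
gene_scores = {"g1": 0.001, "g2": 0.004, "g3": 0.020, "g4": 0.300,
               "g5": 0.450, "g6": 0.600, "g7": 0.750, "g8": 0.900}
term_genes = {"g1", "g2", "g3", "g5"}
candidates = {g for g, s in gene_scores.items() if s < 0.05}  # illustrative cutoff

# Fisher: are the candidates over-represented among the term's genes?
p_fisher = fisher_enrichment_p(k=len(candidates & term_genes), n=len(candidates),
                               K=len(term_genes), N=len(gene_scores))

# KS: do scores of the term's genes differ in distribution from the rest?
d_ks = ks_statistic([gene_scores[g] for g in term_genes],
                    [s for g, s in gene_scores.items() if g not in term_genes])

# Hypothetical gene lists selected by each test, merged with duplicates removed.
fisher_genes = ["g1", "g2", "g3"]
ks_genes = ["g2", "g3", "g5"]
merged_genes = sorted(set(fisher_genes) | set(ks_genes))

print(p_fisher, d_ks, merged_genes)
```

In practice a statistics library would also supply a p-value for the KS statistic; for example, SciPy provides `scipy.stats.fisher_exact` and `scipy.stats.ks_2samp`. The merged list would then define the attribute columns of the dataset passed to the classifier.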
Keywords: gene ontology analysis, gene expression, Fisher’s test, modelling, classification.
doi: 10.32403/1998-6912-2023-2-67-136-144