Application of gene ontology analysis for the formation of a subset of significant genes

Author(s): Liakh I. M.
Collection number: № 2 (67)
Pages: 136-144

This paper describes the development of a technology for forming subsets of co-expressed, significant gene expression profiles for further use in diagnostic systems based on gene expression data. A technology is proposed for removing uninformative genes on the basis of statistical criteria using gene ontology analysis, taking into account the number of genes and the nature of their interactions. The proposed technology was applied in practice to gene expression data from patients tested for various types of cancer, and the results indicate that the proposed model is highly effective. Out of 19947 gene expression profiles, 14487 significant genes were identified, and the accuracy of classifying samples that use the identified significant genes as attributes was 97.6 %: of the 619 samples in the test data subset, only 15 were misclassified. The presented research creates conditions for improving the efficiency of a hybrid model for diagnosing complex objects based on gene expression data. The key aspects of gene ontology are considered, namely: gene ontology (GO); selection of significant genes; enrichment analysis; functional interpretation; statistical analysis. A flowchart of a step-by-step procedure for applying GO analysis to select significant genes is also presented.
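The reported figures can be checked with simple arithmetic, assuming that accuracy means the share of correctly classified test samples (the abstract does not define the metric explicitly):

$$
\begin{aligned}
\text{accuracy} &= \frac{N_{\text{test}} - N_{\text{err}}}{N_{\text{test}}} = \frac{619 - 15}{619} = \frac{604}{619} \approx 0.976 = 97.6\,\%,\\
\text{share of genes retained} &= \frac{14487}{19947} \approx 0.726 .
\end{aligned}
$$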

The enrichment estimates are verified with two test statistics, Fisher's test and the Kolmogorov-Smirnov test. A list of important genes common to both tests is then compiled, unique genes are identified, and a new dataset is generated with the selected important genes as attributes. The results agree with modelling results obtained with fuzzy logic inference and with statistical and entropy-based criterion analysis systems, and the adequacy of the model is assessed by applying a classifier to the generated data.
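The selection step described above can be sketched in Python under stated assumptions. This is not the author's implementation. The sketch computes a one-sided Fisher's exact test (the hypergeometric upper tail) for over-representation of a GO term and takes Kolmogorov-Smirnov p-values as precomputed input, since computing them is not shown. It then collects the genes annotated to terms that pass a significance threshold, intersects the two gene lists, and keeps only those genes as attributes of the expression matrix. The function names, the threshold `alpha = 0.05`, and the toy term and gene labels are hypothetical.

```python
from math import comb


def fisher_enrichment_p(k: int, n: int, K: int, N: int) -> float:
    """One-sided Fisher's exact test for over-representation of a GO term.

    N: genes in the universe; K: universe genes annotated to the term;
    n: genes in the selected list; k: selected genes annotated to the term.
    Returns P(X >= k) for X ~ Hypergeometric(N, K, n).
    """
    upper = min(n, K)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


def genes_of_significant_terms(p_values, term_genes, alpha=0.05):
    """Set of genes annotated to any term with p < alpha (hypothetical rule)."""
    return {g for term, p in p_values.items() if p < alpha for g in term_genes[term]}


def common_genes(fisher_p, ks_p, term_genes, alpha=0.05):
    """Genes selected by both tests; the sets drop duplicates shared across terms."""
    selected_fisher = genes_of_significant_terms(fisher_p, term_genes, alpha)
    selected_ks = genes_of_significant_terms(ks_p, term_genes, alpha)
    return sorted(selected_fisher & selected_ks)


def subset_expression(matrix, gene_names, keep):
    """Keep only the gene columns listed in `keep`; rows are samples."""
    col = {g: i for i, g in enumerate(gene_names)}
    return [[row[col[g]] for g in keep] for row in matrix]


if __name__ == "__main__":
    # Made-up toy inputs: term labels, genes, p-values and expression values are illustrative only.
    term_genes = {"termA": ["G1", "G2"], "termB": ["G2", "G3"], "termC": ["G1", "G4"]}
    fisher_p = {"termA": 0.01, "termB": 0.20, "termC": 0.03}
    ks_p = {"termA": 0.02, "termB": 0.01, "termC": 0.50}  # assumed precomputed elsewhere
    keep = common_genes(fisher_p, ks_p, term_genes)
    print(keep)  # ['G1', 'G2']
    gene_names = ["G1", "G2", "G3", "G4"]
    matrix = [[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0]]
    print(subset_expression(matrix, gene_names, keep))  # [[1.0, 2.0], [5.0, 6.0]]
    print(round(fisher_enrichment_p(3, 4, 5, 20), 4))  # 0.032
```

Taking the intersection is one reading of "a list of important genes common to both tests". A union of the two lists would be a looser alternative that keeps more attributes.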

Keywords: gene ontology analysis, gene expression, Fisher’s test, modelling, classification.

doi: 10.32403/1998-6912-2023-2-67-136-144

