A fuzzy model for the removal of uninformative gene expression profiles using statistical and entropy criteria

Author(s): Liakh I. M.
Collection number: № 1 (66)
Pages: 39-55

This paper presents research on forming subsets of co-expressed gene expression profiles for the subsequent reconstruction of gene regulatory networks. A technique is proposed for removing uninformative genes based on statistical criteria and Shannon entropy, taking into account the priority of each criterion. The ranges of the model's input parameters are determined from summary statistics. In the first step, the maximum absolute expression value is determined for each profile. Summary statistics are then computed for the resulting vector of maximum values, the vector of profile variances, and the vector of Shannon entropy values. The fuzzy model is built on the interquartile ranges of the maximum absolute value, variance, and Shannon entropy of the gene expression profiles; each range is divided into three intervals, and each interval is assigned a corresponding term. A fuzzy model for forming a subset of informative gene expression profiles is developed and validated by applying a classifier to objects whose attributes are the expression values of the genes selected in the subset. The classification results show that the proposed model is highly effective, since the values of the classification quality criteria correspond to the level of informativeness of the corresponding group of gene expression profiles.
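The input stage described above can be illustrated with a minimal sketch. It is not the author's implementation: the function names, the histogram-based entropy estimate with 10 bins, and the triangular membership functions over the interquartile range are all assumptions made for illustration. The rule base that combines the three criteria is not given in the abstract and is therefore omitted.

```python
import numpy as np


def shannon_entropy(profile, bins=10):
    """Histogram-based Shannon entropy (in bits) of one expression profile.

    The binning scheme is an assumption; the abstract does not specify
    how the entropy is estimated.
    """
    counts, _ = np.histogram(profile, bins=bins)
    p = counts[counts > 0] / counts.sum()
    return float(-(p * np.log2(p)).sum())


def profile_criteria(expr):
    """Compute the three criteria for each gene.

    expr: array of shape (genes, samples).
    Returns the per-gene maximum absolute expression, variance, and entropy.
    """
    expr = np.asarray(expr, dtype=float)
    max_abs = np.abs(expr).max(axis=1)
    variance = expr.var(axis=1, ddof=1)
    entropy = np.array([shannon_entropy(row) for row in expr])
    return max_abs, variance, entropy


def fuzzify_iqr(values):
    """Map values to memberships in three terms (low, medium, high).

    The terms partition the interquartile range [Q1, Q3]; values outside
    the range are clipped to its nearest edge. The triangular shape of the
    membership functions is an assumption.
    """
    values = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(values, [25, 75])
    span = q3 - q1
    if span == 0:
        # Degenerate range: place every value at the centre of the range.
        t = np.full_like(values, 0.5)
    else:
        t = (np.clip(values, q1, q3) - q1) / span  # position in [0, 1]
    low = np.clip(1.0 - 2.0 * t, 0.0, 1.0)
    high = np.clip(2.0 * t - 1.0, 0.0, 1.0)
    medium = 1.0 - low - high
    return np.stack([low, medium, high], axis=1)
```

Calling `profile_criteria` on a genes-by-samples matrix and passing each returned vector to `fuzzify_iqr` yields, for every gene, three membership degrees per criterion that a fuzzy rule base could then combine.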

Future work will focus on the practical implementation of the proposed model for forming subsets of informative gene expression profiles for the reconstruction of gene regulatory networks.

Keywords: gene expression, statistical criteria, Shannon entropy, fuzzy logic, classification criteria, ROC analysis.

doi: 10.32403/1998-6912-2023-1-66-39-55

