Method of the proximity degree of complex objects evaluation on the basis of the modified index of mutual information maximization

Author(s): Yasinska-Damri L. M.
Collection number: № 1 (62)
Pages: 42-51

The paper presents a method for estimating the proximity degree of complex objects based on a modified mutual information index. The index applies a set of methods for calculating Shannon’s entropy to assess the mutual information of the examined objects. The final decision on the proximity degree of the respective objects is made using the Harrington desirability function, whose components are the results of the various methods applied to calculate Shannon’s entropy. The effectiveness of the proposed method was evaluated by classifying the studied objects and calculating classification quality criteria. A random forest binary classifier was used for data classification during the simulation. A structural block chart of the step-by-step algorithm for forming informative data attributes according to the modified mutual information index is proposed. The proposed method was tested on gene expression data of patients examined for lung cancer. The number of nearest gene expression profiles was increased stepwise from 2 to 100, and at each step the examined objects were classified and the classification quality criteria were calculated. Accuracy, F-score, and the Matthews correlation coefficient were used as the classification criteria. The simulation results are presented as diagrams of these criteria values versus the number of gene expression profiles. Analysis of the obtained results shows the high effectiveness of the proposed method: the classification accuracy exceeds 99%. The increase in objectivity, in this case, is due to the correct application of a set of methods for calculating Shannon’s entropy, the values of which were used to assess the mutual information of the respective gene expression profiles.
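The idea described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the abstract does not say which entropy estimators, discretization, or scaling were used. The sketch assumes equal-width binning, two entropy estimators (plug-in and Miller-Madow), a linear mapping of mutual information onto Harrington's dimensionless scale from -2 to 5, and aggregation by geometric mean. All function names are hypothetical.

```python
import math
from collections import Counter


def discretize(values, bins=4):
    """Equal-width binning of a numeric profile (assumed preprocessing)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / bins
    return [min(int((v - lo) / width), bins - 1) for v in values]


def entropy_ml(labels):
    """Plug-in (maximum likelihood) Shannon entropy, in nats."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def entropy_miller_madow(labels):
    """Plug-in entropy plus the Miller-Madow correction (m - 1) / (2n)."""
    m = len(Counter(labels))
    return entropy_ml(labels) + (m - 1) / (2 * len(labels))


def mutual_information(x, y, entropy):
    """I(X;Y) = H(X) + H(Y) - H(X,Y) for two discretized profiles."""
    return entropy(x) + entropy(y) - entropy(list(zip(x, y)))


def harrington(value, worst, best, y_worst=-2.0, y_best=5.0):
    """One-sided Harrington desirability d = exp(-exp(-Y)).

    Y is a linear rescaling of `value`; the [-2, 5] range is an assumption.
    """
    y = y_worst + (y_best - y_worst) * (value - worst) / (best - worst)
    return math.exp(-math.exp(-y))


def proximity_index(a, b, bins=4):
    """Geometric mean of desirabilities, one per entropy estimator."""
    x, y = discretize(a, bins), discretize(b, bins)
    estimators = (entropy_ml, entropy_miller_madow)
    best = math.log(bins)  # nominal upper bound of plug-in MI for `bins` states
    scores = [
        harrington(mutual_information(x, y, h), 0.0, best) for h in estimators
    ]
    return math.prod(scores) ** (1 / len(scores))


if __name__ == "__main__":
    profile = [1, 2, 3, 4, 5, 6, 7, 8]
    scaled = [2 * v for v in profile]          # fully dependent on `profile`
    alternating = [1, 8, 1, 8, 1, 8, 1, 8]     # independent after binning
    print(proximity_index(profile, scaled))
    print(proximity_index(profile, alternating))
```

In this toy example the fully dependent pair gets an index close to 1 and the independent pair an index close to 0. Combining several estimators in one desirability value is meant to keep the ranking of profiles from depending on any single entropy estimate.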

Keywords: Shannon entropy, maximization of mutual information, Harrington desirability function, gene expression, binary classification.

doi: 10.32403/1998-6912-2021-1-62-42-51
