Author(s) | Collection number | Pages
---|---|---
Yasinska-Damri L. M. | № 1 (62) | 42-51
The paper presents a method for estimating the degree of proximity of complex objects based on a modified mutual information index, in which a set of methods for calculating Shannon entropy is applied to assess the mutual information of the examined objects. The final decision on the proximity degree of the respective objects is made using the Harrington desirability function, whose components are the results of the various Shannon entropy calculation methods. The effectiveness of the proposed method was evaluated by classifying the studied objects and computing classification quality criteria; a random forest binary classifier was used for data classification during the simulation. A structural block diagram of the step-by-step algorithm for forming informative data attributes according to the modified mutual information index is proposed. The method was tested on gene expression data of patients examined for lung cancer. The proposed technique involved stepwise increasing the number of nearest gene expression profiles from 2 to 100, classifying the examined objects at each step and calculating the classification quality criteria. Accuracy, F-score and the Matthews correlation coefficient were used as the classification criteria. Diagrams of the variation of these criteria versus the number of gene expression profiles were produced as the simulation results. Analysis of the obtained results shows the high effectiveness of the proposed method, since the data classification accuracy exceeds 99%. The increased objectivity in this case is due to the correct application of a set of methods for calculating Shannon entropy, the values of which were used to assess the mutual information of the respective gene expression profiles.
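A minimal sketch of the pipeline described above, written for illustration only: it assumes histogram-based Shannon entropy estimators with different binning rules (Sturges, Scott, Freedman-Diaconis), a one-sided Harrington desirability with a simple linear rescaling, geometric-mean aggregation into a single proximity score, and synthetic data in place of the lung cancer gene expression set. The abstract does not specify the actual entropy estimators, scaling constants, or reference-profile choice, so all of these are assumptions rather than the authors' implementation.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score, matthews_corrcoef
from sklearn.model_selection import train_test_split


def shannon_entropy(x, bins):
    """Plug-in Shannon entropy (in nats) from a histogram with `bins` bins."""
    counts, _ = np.histogram(x, bins=bins)
    p = counts[counts > 0] / counts.sum()
    return -np.sum(p * np.log(p))


def mutual_information(x, y, bins):
    """MI(X;Y) = H(X) + H(Y) - H(X,Y), all estimated from histograms."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    p = joint[joint > 0] / joint.sum()
    h_joint = -np.sum(p * np.log(p))
    return shannon_entropy(x, bins) + shannon_entropy(y, bins) - h_joint


def harrington(value, low, high):
    """One-sided Harrington desirability d = exp(-exp(-y')), with an assumed
    linear rescaling of `value` to the dimensionless y' scale."""
    y = (value - low) / (high - low + 1e-12)
    return np.exp(-np.exp(-y))


def modified_mi_index(x, y, bin_rules=("sturges", "scott", "fd")):
    """Fold MI estimates obtained with several entropy estimators into one
    score via the generalized (geometric-mean) Harrington desirability."""
    mi = []
    for rule in bin_rules:
        bins = max(len(np.histogram_bin_edges(x, bins=rule)) - 1, 2)
        mi.append(mutual_information(x, y, bins))
    mi = np.asarray(mi)
    d = harrington(mi, mi.min(), mi.max())
    return float(np.prod(d) ** (1.0 / d.size))


# Stepwise evaluation on synthetic data (a stand-in for the lung cancer gene
# expression set, which is not reproduced here): the k profiles closest to a
# hypothetical reference profile are fed to a random forest, and accuracy,
# F-score and the Matthews correlation coefficient are reported for each k.
rng = np.random.default_rng(0)
X = rng.normal(size=(120, 300))      # 120 patients x 300 genes (synthetic)
y = rng.integers(0, 2, size=120)     # binary diagnosis labels (synthetic)
reference = X[:, 0]                  # assumed reference profile

scores = np.array([modified_mi_index(reference, X[:, j]) for j in range(X.shape[1])])
ranked = np.argsort(scores)[::-1]

for k in range(2, 101):
    X_tr, X_te, y_tr, y_te = train_test_split(
        X[:, ranked[:k]], y, test_size=0.3, random_state=0, stratify=y)
    model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
    pred = model.predict(X_te)
    print(f"k={k:3d}  acc={accuracy_score(y_te, pred):.3f}  "
          f"F1={f1_score(y_te, pred):.3f}  MCC={matthews_corrcoef(y_te, pred):.3f}")
```

The geometric mean is the conventional way to combine individual Harrington desirabilities into a generalized desirability, so a single low mutual information estimate pulls the combined proximity score down rather than being averaged away.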
Keywords: Shannon entropy, maximization of mutual information, Harrington desirability function, gene expression, binary classification.
doi: 10.32403/1998-6912-2021-1-62-42-51