Figures and data

This figure illustrates the workflow of our research.
In contrast to the existing sequence-based prediction method (left), our method (right) directly uses disease descriptor strings as input. Here $d$ and $D$ denote the sets of miRNA-associated and gene-associated disease descriptors, respectively, with $d_n \in d$ and $D_m \in D$ denoting individual descriptors. Their embedding vectors are computed as $u_n = f_{\mathrm{SBERT}}(d_n)$ and $v_m = f_{\mathrm{SBERT}}(D_m)$.
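The embedding step in this caption can be sketched as a pairwise similarity matrix between the two descriptor sets. This is a minimal sketch, not the authors' implementation. Cosine similarity is assumed as the comparison metric because the caption does not name one. `encode` is a hypothetical stand-in for $f_{\mathrm{SBERT}}$; with the sentence-transformers library one could pass `SentenceTransformer(model_name).encode` in its place.

```python
import math
from typing import Callable, List, Sequence

# Hypothetical stand-in for f_SBERT: any function mapping a string to a vector.
Encoder = Callable[[str], Sequence[float]]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def pairwise_similarity(
    mirna_diseases: List[str],  # descriptors d_n in d
    gene_diseases: List[str],   # descriptors D_m in D
    encode: Encoder,
) -> List[List[float]]:
    """Return S where S[n][m] = cos(u_n, v_m)."""
    u = [encode(d_n) for d_n in mirna_diseases]  # u_n = f_SBERT(d_n)
    v = [encode(D_m) for D_m in gene_diseases]   # v_m = f_SBERT(D_m)
    return [[cosine(u_n, v_m) for v_m in v] for u_n in u]


if __name__ == "__main__":
    # Toy 2-D "embeddings" for illustration only; real SBERT vectors are far larger.
    toy = {
        "Heart Diseases": [1.0, 0.0],
        "Cardiomyopathies": [0.8, 0.6],
        "Neoplasms": [0.0, 1.0],
    }
    S = pairwise_similarity(["Heart Diseases"], ["Cardiomyopathies", "Neoplasms"], toy.__getitem__)
    print(S)  # approximately [[0.8, 0.0]]
```

Each row of `S` corresponds to one miRNA-associated descriptor and each column to one gene-associated descriptor, so every $(d_n, D_m)$ pair receives exactly one similarity value.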

Example from the BIOSSES dataset.
BIOSSES is a benchmark dataset designed for evaluating semantic similarity models on biomedical text. It consists of 100 sentence pairs extracted from the biomedical literature; each pair was independently scored by five experts, and the average of their scores serves as the pair's reference similarity.

Hierarchical structure of the disease descriptors ‘Cardiovascular Infections’ and ‘Out-of-Hospital Cardiac Arrest’ in the MeSH tree.

Summary of MTIs and Source Databases Employed in the Study

Example from the MeSHDS dataset.
Disease 1 and Disease 2 are drawn from the MeSH tree, and their association score is computed using the lowest common ancestor (LCA) method described in Section 2.2.2.
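The LCA idea can be illustrated on MeSH tree numbers. Every dot-separated prefix of a tree number is an ancestor, so the lowest common ancestor of two positions is their longest shared prefix. This is a hedged sketch, not the method of Section 2.2.2, whose scoring formula is not reproduced in this section. A Wu–Palmer-style ratio, 2·depth(LCA)/(depth(a) + depth(b)), is used here as a stand-in assumption. Taking the maximum over a descriptor's multiple tree positions is likewise an assumption. The function names and the tree numbers in the test values are hypothetical.

```python
from itertools import product
from typing import Iterable, List


def lca(tree_a: str, tree_b: str) -> List[str]:
    """Lowest common ancestor of two MeSH tree numbers, as a list of segments.

    An ancestor of a tree number is any dot-separated prefix of it, so the LCA
    is the longest shared prefix; [] means only the category root is shared.
    """
    common: List[str] = []
    for seg_a, seg_b in zip(tree_a.split("."), tree_b.split(".")):
        if seg_a != seg_b:
            break
        common.append(seg_a)
    return common


def wu_palmer(tree_a: str, tree_b: str) -> float:
    """Assumed stand-in score: 2 * depth(LCA) / (depth(a) + depth(b))."""
    depth_a = len(tree_a.split("."))
    depth_b = len(tree_b.split("."))
    return 2 * len(lca(tree_a, tree_b)) / (depth_a + depth_b)


def descriptor_similarity(trees_a: Iterable[str], trees_b: Iterable[str]) -> float:
    """A descriptor may occupy several tree positions; keep the best-matching pair."""
    return max((wu_palmer(a, b) for a, b in product(trees_a, trees_b)), default=0.0)
```

Identical positions score 1.0, positions under different top-level branches score 0.0, and deeper shared ancestry moves the score toward 1.0.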

Model performance benchmark.
The x-axis shows the correlation coefficient between model-generated similarity scores and the BIOSSES reference scores, measuring each model's ability to capture semantic similarity in the biomedical domain. The y-axis shows the correlation coefficient between model-generated similarity scores and the MeSHDS similarity scores, measuring the effectiveness of fine-tuning.
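Each axis value in this figure reduces to a correlation between one model's scores and the reference scores of one benchmark. The caption does not state which coefficient is used, so the sketch below provides both Pearson and Spearman. It also computes the BIOSSES reference score as the mean of the five expert annotations, as described in the BIOSSES caption. All helper names are hypothetical.

```python
import math
import statistics
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length score lists."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation = Pearson correlation of the tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def reference_scores(expert_scores: Sequence[Sequence[float]]) -> List[float]:
    """BIOSSES-style reference score: mean of the expert annotations per pair."""
    return [statistics.fmean(pair) for pair in expert_scores]


if __name__ == "__main__":
    # Hypothetical annotations and model outputs for three sentence pairs.
    gold = reference_scores([[4, 4, 3, 4, 4], [1, 0, 1, 2, 1], [2, 3, 2, 2, 3]])
    model = [0.91, 0.12, 0.55]
    print(pearson(model, gold), spearman(model, gold))
```

Under this reading, a model's x-coordinate would be the correlation computed against the BIOSSES reference scores and its y-coordinate the same computation against the MeSHDS scores.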

F1 Scores for Different SBERT Models and Machine Learning Classifiers

Performance Comparison of miRNA Target Prediction Methods

Performance Comparison Using Different Features

This figure presents the average frequency distribution of similarity values between miRNA-associated and gene-associated diseases for experimentally validated and predicted MTIs.
Experimentally validated (functional) MTIs are those confirmed by western blot or reporter assay, whereas predicted MTIs are derived from sequence-based computational predictions. The x-axis shows the similarity-value intervals, and the y-axis shows the average frequency of similarity values within each interval. Compared with predicted MTIs, functional MTIs exhibited a lower frequency in the low-similarity range (0–0.2; p = 4.10 × 10⁻⁴⁷, independent-samples t-test) and a higher frequency in the moderate-similarity range (0.2–0.4; p = 6.22 × 10⁻⁷¹). These results indicate that, in experimentally validated MTIs, the diseases associated with the miRNA and those associated with the gene are more closely related than in predicted MTIs.
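The per-interval comparison described in this caption can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the authors' code. Each MTI is assumed to contribute a list of similarity values between its miRNA-associated and gene-associated diseases. That list is converted to per-interval relative frequencies, which are then averaged across MTIs. The t-test is assumed to compare the per-MTI frequencies of a single interval between the two groups. All function names are hypothetical. The sketch returns only the t statistic. A two-sided p-value follows from a t distribution with n_a + n_b − 2 degrees of freedom, for example via `scipy.stats.ttest_ind(a, b)`, whose default `equal_var=True` matches the pooled-variance form used here.

```python
import math
import statistics
from typing import List, Sequence

# Assumed binning: five equal-width intervals on [0, 1] (the caption names 0-0.2 and 0.2-0.4).
BIN_EDGES = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]


def bin_frequencies(sims: Sequence[float], edges: Sequence[float] = BIN_EDGES) -> List[float]:
    """Fraction of one MTI's disease-disease similarity values in each interval.

    Intervals are [lo, hi), except the last, which also includes its upper edge.
    Values outside [edges[0], edges[-1]] are not assigned to any interval.
    """
    counts = [0] * (len(edges) - 1)
    for s in sims:
        for i in range(len(counts)):
            in_last = i == len(counts) - 1 and s == edges[-1]
            if edges[i] <= s < edges[i + 1] or in_last:
                counts[i] += 1
                break
    n = len(sims)
    return [c / n for c in counts] if n else [0.0] * len(counts)


def average_distribution(per_mti: Sequence[Sequence[float]]) -> List[float]:
    """Average the per-MTI frequency vectors interval by interval (the figure's y-axis)."""
    freqs = [bin_frequencies(sims) for sims in per_mti]
    return [statistics.fmean(col) for col in zip(*freqs)]


def student_t(a: Sequence[float], b: Sequence[float]) -> float:
    """Independent-samples t statistic with pooled (equal) variance."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * statistics.variance(a) + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    return (statistics.fmean(a) - statistics.fmean(b)) / math.sqrt(pooled * (1 / na + 1 / nb))


def interval_t(validated: Sequence[Sequence[float]],
               predicted: Sequence[Sequence[float]],
               interval: int) -> float:
    """Compare one interval's per-MTI frequencies between the two MTI groups."""
    fa = [bin_frequencies(sims)[interval] for sims in validated]
    fb = [bin_frequencies(sims)[interval] for sims in predicted]
    return student_t(fa, fb)
```

With this binning, interval index 0 corresponds to 0–0.2 and index 1 to 0.2–0.4. A negative t for index 0 and a positive t for index 1 would match the direction of the differences reported in the caption.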