Pie chart (a) showing an overview of the reporting and non-reporting (none) of sex only, age, or both sex and age in a set of 15,311 studies published between 1994 and 2014 by stating the number and …
PubMed search terms used for each disease group and their approaches.
Example rules for identification of sex and age.
The figure shows the top 70 journals, out of a total of 628, that each published 30 or more articles of the corpus; together these correspond to 81.05% of the papers assessed. The journals are organised in …
The reporting of these variables was assessed for six groups of diseases from the top 10 causes of death according to the W.H.O. This analysis was performed in the set of 14,225 articles published …
The reporting of sex was assessed for each disease by research topic: genetics (a), immunology (b), physiopathology (c), or therapy (d). This analysis was performed in the set of …
The graph shows the reporting in particular diseases. All of these diseases are among the most frequently reported causes of death worldwide or are commonly used models. The distribution is presented …
Impact factor (a) and h-index (b) of the journals in which the papers were published. Spearman’s rank correlation coefficient (r squared) is shown alongside the regression lines. The scatter plots …
Evaluation of the performance of the text mining system.
Characteristic | True positives | True negatives | False positives | False negatives | Precision (%) | Recall (%) | F-score (%) |
---|---|---|---|---|---|---|---|
Sex | 29 | 16 | 3 | 2 | 90.6 | 93.5 | 92.0 |
Age | 31 | 14 | 1 | 4 | 96.8 | 88.5 | 92.4 |
A total of 50 articles were used as the data set to evaluate the performance of the text mining system (Supplementary file 2D). Precision (P), calculated as TP/(TP+FP), measures how accurately the system recognizes the desired terms. Recall (R), calculated as TP/(TP+FN), measures the coverage of the system. The F-score is the harmonic mean of precision and recall, calculated as 2*P*R/(P+R).
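The three formulas above can be checked against the counts in the table with a short script. This is a minimal sketch for illustration, not the authors' evaluation code; the function name `precision_recall_f` and the `counts` dictionary are hypothetical, and only the true-positive, false-positive and false-negative counts are taken from the table.

```python
def precision_recall_f(tp, fp, fn):
    """Return precision, recall and F-score as percentages."""
    precision = tp / (tp + fp)   # P = TP / (TP + FP)
    recall = tp / (tp + fn)      # R = TP / (TP + FN)
    # F-score: harmonic mean of precision and recall
    f_score = 2 * precision * recall / (precision + recall)
    return 100 * precision, 100 * recall, 100 * f_score


# (TP, FP, FN) counts copied from the evaluation table above
counts = {"Sex": (29, 3, 2), "Age": (31, 1, 4)}

results = {name: precision_recall_f(*c) for name, c in counts.items()}

for name, (p, r, f) in results.items():
    print(f"{name}: P = {p:.3f}%  R = {r:.3f}%  F = {f:.3f}%")
```

Values recomputed this way without intermediate rounding may differ from the tabulated figures in the last digit, depending on how the percentages were rounded or truncated.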
Summary of the data sets used in this study.
Sets of articles | Number of articles | Task | File |
---|---|---|---|
Data 1 | 15,311 | Corpus for assessing reporting of the sex and age of the mice | Supplementary file 1* |
Data 2 | 40 | Creating the text-mining rules | Supplementary file 2A |
Data 3 | 40 | Manual inspection for finding the location of the mention of the sex and age of the mice | Supplementary file 2B |
Data 4 | 70 | Enhancing the performance of the text-mining rules | Supplementary file 2C |
Data 5 | 50 | Evaluating the text-mining system | Supplementary file 2D |
*Supplementary file 1 also contains data sets of the six groups of diseases analyzed (cardiovascular diseases; cancer; diabetes mellitus; lung diseases; infectious diseases; and neurological disorders), as well as of the different approaches to assess the disease models (i.e. genetics, immunology, physiopathology and therapy), and the disease example for each of the six disease groups.
Corpus for assessing reporting of the sex and age of the mice.
(A) Set of articles for creating the text-mining rules.
(B) Set of articles for finding the location of the mention of the sex and age of the mice.
(C) Set of articles for enhancing the performance of the text-mining rules.
(D) Set of articles for evaluating the text-mining system.
Rules used to identify the sex and age of experimental mouse models.