Trends by biology subdiscipline.
Among the 4,964 images in the training set, we identified those that were “Definitely okay” or “Definitely problematic”. The graph shows, for each subdiscipline indicated in the article metadata, the percentage of these images that were “Definitely problematic”. Many images were associated with multiple subdisciplines; each such image was counted once for every subdiscipline with which it was associated. We used a chi-squared goodness-of-fit test to calculate the P-value.
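
The counting described in this legend can be illustrated with a minimal Python sketch. It is not the authors' analysis code. The records, the subdiscipline names, and the function names are hypothetical. The legend does not state the expected distribution used in the goodness-of-fit test, so the sketch assumes one plausible choice: expected "Definitely problematic" counts proportional to each subdiscipline's share of all counted images.

```python
from collections import Counter

# Hypothetical records: (label, subdisciplines from the article metadata).
images = [
    ("Definitely okay", ["Genetics", "Cell biology"]),
    ("Definitely problematic", ["Genetics"]),
    ("Definitely okay", ["Ecology"]),
    ("Definitely problematic", ["Ecology", "Cell biology"]),
    ("Definitely okay", ["Cell biology"]),
    ("Definitely okay", ["Genetics"]),
]


def tally(records):
    """Count each image once for every subdiscipline it is associated with."""
    totals, problematic = Counter(), Counter()
    for label, subdisciplines in records:
        for name in subdisciplines:
            totals[name] += 1
            if label == "Definitely problematic":
                problematic[name] += 1
    return totals, problematic


def percent_problematic(totals, problematic):
    """Percentage of counted images that were 'Definitely problematic'."""
    return {name: 100.0 * problematic[name] / totals[name] for name in totals}


def chi_squared_gof(totals, problematic):
    """Chi-squared goodness-of-fit statistic and degrees of freedom.

    Assumption (not stated in the legend): the expected problematic count
    for a subdiscipline is its total count times the overall problematic rate.
    """
    overall_rate = sum(problematic.values()) / sum(totals.values())
    statistic = 0.0
    for name in totals:
        expected = totals[name] * overall_rate
        statistic += (problematic[name] - expected) ** 2 / expected
    return statistic, len(totals) - 1


totals, problematic = tally(images)
percentages = percent_problematic(totals, problematic)
statistic, dof = chi_squared_gof(totals, problematic)
```

The P-value would then follow from the chi-squared distribution with `dof` degrees of freedom, for example by passing the observed and expected counts to `scipy.stats.chisquare`. Because an image with several subdisciplines contributes to several categories, the counts are not fully independent, which is worth keeping in mind when interpreting such a test.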