Meta-Research: Centralized scientific communities are less likely to generate replicable results

  1. Valentin Danchev (corresponding author)
  2. Andrey Rzhetsky
  3. James A Evans (corresponding author)
  1. University of Chicago, United States
  2. Stanford University, United States
  3. Santa Fe Institute, United States
4 figures and 4 additional files

Figures

Figure 1 with 2 supplements
Alignment of drug-gene interaction (DGI) claims reported in the literature with DGI claims from high-throughput experiments.

(A) Our analysis of 51,292 DGI claims (see Supplementary file 1) in the literature revealed 60,159 supporting findings (green) and 4253 opposing findings (pink) in aggregate. These DGI claims …

https://doi.org/10.7554/eLife.43094.002
Figure 1—figure supplement 1
Publications that share authors are more likely to agree about the direction of a drug-gene interaction than publications with distinct authors, computed among pairs of papers reporting claims in the sub-corpus of 2493 claims.

(A) Bootstrap samples of agreement (1 = agreement, 0 = disagreement) among pairs of papers that reported findings about the same drug-gene interaction depending on whether the pairs of papers shared …

https://doi.org/10.7554/eLife.43094.003
Figure 1—figure supplement 2
Estimates of probability density functions for variables of interest in our corpus using a normal kernel function.

We used the normal kernel function to estimate probability density functions.

https://doi.org/10.7554/eLife.43094.004
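As a rough illustration of what estimating a density with a normal kernel involves, the following minimal sketch places a Gaussian bump on each observation and averages them. The function name `gaussian_kde`, the sample values, and the fixed `bandwidth` are assumptions made for illustration; the bandwidth used for the figure is not stated in the caption.

```python
import math


def gaussian_kde(samples, bandwidth):
    """Return a function estimating the density of `samples`.

    Illustrative sketch only: uses a normal (Gaussian) kernel with a
    fixed, user-supplied bandwidth (an assumption, not the paper's value).
    """
    if not samples:
        raise ValueError("samples must be non-empty")
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive")

    n = len(samples)
    # Normalizing constant so the estimate integrates to 1.
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        # Sum one Gaussian bump per observation, centered on that observation.
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density


# Hypothetical values, e.g. probabilities of support for a few claims.
example_density = gaussian_kde([0.2, 0.55, 0.6, 0.9], bandwidth=0.1)
```

A smaller bandwidth gives a spikier estimate and a larger one a smoother estimate, so the choice affects how the plotted curves look.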
Figure 2 with 2 supplements
Estimates of claim replication as a function of the probability of support in the literature and generalizability across high-throughput experiments.

(A) Posterior distributions of probability of support in the biomedical literature for a sample of seven DGI claims for which there are at least two findings (supporting and/or opposing). Note that …

https://doi.org/10.7554/eLife.43094.005
Figure 2—figure supplement 1
Replication increases with claim’s probability of support in the literature.

(A) Probability of claim replication estimated via a logistic regression model with replication as the response variable and support in the literature and experimental variability as predictors, for …

https://doi.org/10.7554/eLife.43094.007
Figure 2—figure supplement 2
Description of claim types in the whole corpus of 51,292 claims.

(A) Mean number of findings supporting or opposing the direction of the effect for a scientific claim across our typology. (B) Cumulative distribution of the probability of support in the …

https://doi.org/10.7554/eLife.43094.006
Figure 3 with 2 supplements
Exemplary networks comprising social, methodological, and references dependences and centralization patterns in scientific communities.

(A) Multilayer networks for four of the claims shown in Figure 2A. The nodes in each layer are scientific papers. Pairs of papers are connected by an unweighted edge in the top layer if they agree …

https://doi.org/10.7554/eLife.43094.008
Figure 3—figure supplement 1
Pearson correlation coefficients between network indices.

Social, methodological, and prior knowledge independencies are positively correlated with each other and negatively correlated with the network centralization of scientific communities. n = 2493 …

https://doi.org/10.7554/eLife.43094.009
Figure 3—figure supplement 2
Papers and pairs of papers are differentiated by the number of findings they report (in the sub-corpus of 2493 claims).

(A) Distribution of papers against the number of supporting findings they report. (B) Distribution of pairs of papers against the number of supporting findings they report.

https://doi.org/10.7554/eLife.43094.010
Figure 4 with 5 supplements
Predictors of replication success.

(A) Odds ratios derived from logistic regression models with claim replication as the response variable and seven predictors modeled independently (disconnected colored dots; n = 2493) and …

https://doi.org/10.7554/eLife.43094.011
Figure 4—figure supplement 1
Predictors of replication success.

(A) Relative risk (RR) of claim replication derived from the logistic regression models in Figure 4A. (B–H) Predicted probabilities (PP) of claim replication $PP(R=1)=\frac{\exp(\beta_0+\beta_1 X)}{1+\exp(\beta_0+\beta_1 X)}$ for the logistic regression models in …

https://doi.org/10.7554/eLife.43094.012
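The predicted probability in the caption above is the standard logistic function of a single predictor. The sketch below evaluates it for given coefficients and shows how an odds ratio relates to the slope. The function names and any coefficient values passed in are hypothetical; they are not the fitted estimates from the paper.

```python
import math


def predicted_probability(beta0, beta1, x):
    """PP(R=1) = exp(beta0 + beta1*x) / (1 + exp(beta0 + beta1*x)).

    beta0 and beta1 are placeholder coefficients supplied by the caller.
    """
    z = beta0 + beta1 * x
    # Branch on the sign of z to avoid overflow in exp() for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def odds_ratio(beta1):
    """Multiplicative change in the odds of replication per unit increase in x."""
    return math.exp(beta1)
```

For example, `predicted_probability(0.0, 1.0, 0.0)` evaluates to 0.5, since the linear predictor is zero and the odds are even.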
Figure 4—figure supplement 2
Claims reported by centralized communities are less likely to replicate.

(A–C) Logistic interaction model with claim replication as the response variable regressed on support in the literature and author centralization as interacting predictors, controlling for …

https://doi.org/10.7554/eLife.43094.014
Figure 4—figure supplement 3
Claims reported by multiple socially independent teams are more likely to replicate.

(A–C) Logistic interaction model with claim replication as the response variable regressed on support from the literature and social independence as interacting predictors, controlling for …

https://doi.org/10.7554/eLife.43094.015
Figure 4—figure supplement 4
Estimates of probability density functions for our variables on the sub-corpus of claims with determined direction of the drug-gene effect in CTD and LINCS L1000.

We used the normal kernel function to estimate probability density functions.

https://doi.org/10.7554/eLife.43094.013
Figure 4—figure supplement 5
Support in the literature and decentralization of scientific communities remain strong and significant predictors of claim replication success after we account for multicollinearity.

Odds ratios derived from logistic model with replication as the response variable, including only predictors with relatively low variance inflation factor (VIF < 4). Predictors are rescaled $\frac{x_i-\min(x)}{\max(x)-\min(x)}$ for …

https://doi.org/10.7554/eLife.43094.016
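The rescaling mentioned in the caption above is min-max normalization, which maps each predictor onto the interval [0, 1] so that odds ratios are comparable across predictors. A minimal sketch follows; the function name `min_max_rescale` and the handling of constant predictors are assumptions, not details taken from the paper.

```python
def min_max_rescale(values):
    """Rescale each x_i to (x_i - min(x)) / (max(x) - min(x)).

    Raises ValueError for a constant predictor, where the denominator
    would be zero (how the authors treat that case is not stated).
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("cannot rescale a constant predictor")
    span = hi - lo
    return [(v - lo) / span for v in values]
```

For instance, `min_max_rescale([2, 4, 6])` returns `[0.0, 0.5, 1.0]`.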

Additional files

Supplementary file 1

Data about the corpus of 51,292 drug-gene interaction claims.

https://doi.org/10.7554/eLife.43094.017
Supplementary file 2

Data about the sub-corpus of 2493 drug-gene interaction claims for which there are two or more published findings.

https://doi.org/10.7554/eLife.43094.018
Supplementary file 3

Logistic regression models with claim replication R [Replicated = 1, Non-replicated = 0] as response variable and predictors modelled independently.

https://doi.org/10.7554/eLife.43094.019
Supplementary file 4

Logistic regression models with claim replication R [Replicated = 1, Non-replicated = 0] as response variable and predictors modelled simultaneously.

https://doi.org/10.7554/eLife.43094.020
