Research topics for intramural and extramural projects.

The topics listed were identified by clustering publications based on their titles and abstracts via Word2Vec (see Methods). The relative ratio of intramural projects for each topic was calculated as the proportion of total grants that the topic represented in the intramural portfolio divided by the corresponding proportion in the extramural portfolio. A relative ratio >1 signifies that the topic accounts for a larger share of the intramural portfolio than of the extramural portfolio. For example, if a topic comprised 10% of grants in the intramural portfolio but only 5% of grants in the extramural portfolio, this would represent a 2:1 intramural:extramural relative ratio, or 2.0.
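The relative ratio described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function name `relative_ratio`, the topic labels, and the grant counts are all hypothetical, chosen to reproduce the 10% vs. 5% example.

```python
def relative_ratio(intramural_counts, extramural_counts):
    """Return {topic: intramural share / extramural share}.

    Each argument maps a topic label to its number of grants in that
    portfolio. Topics absent from the extramural portfolio are skipped
    because their ratio is undefined.
    """
    intra_total = sum(intramural_counts.values())
    extra_total = sum(extramural_counts.values())
    ratios = {}
    for topic, intra_n in intramural_counts.items():
        extra_n = extramural_counts.get(topic, 0)
        if extra_n == 0:
            continue  # avoid division by zero
        intra_share = intra_n / intra_total  # proportion of intramural grants
        extra_share = extra_n / extra_total  # proportion of extramural grants
        ratios[topic] = intra_share / extra_share
    return ratios


# Hypothetical counts: topic_a is 10% of intramural and 5% of extramural grants.
intramural = {"topic_a": 10, "topic_b": 90}
extramural = {"topic_a": 50, "topic_b": 950}
ratios = relative_ratio(intramural, extramural)
print(ratios["topic_a"])
```

With these made-up counts, `topic_a` has a relative ratio of about 2.0, while `topic_b` falls below 1.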

Project funding for intramural and extramural projects.

(A) Mean project cost (on a log scale) versus year for intramural grants (green) and extramural grants (red) between 2009 and 2019. Error bars denote 95% confidence intervals. Total costs were used rather than only direct costs in order to fully account for the degree of government investment. Error bars are larger for intramural data because of the smaller total number of awards (98,648 extramural and 1,594 intramural).

Annual outputs from intramural and extramural projects.

(A) Mean number of papers per project for intramural projects (green) and extramural projects (blue) between 2008 and 2020. The difference (inset) was close to 1.5 papers per project in 2008, but this gap closed over time. (B) Relative citation ratio per project. (C) Approximate potential to translate (APT) per project. (D) Clinical citation counts per project. (E) Number of papers with at least one clinical citation per project. Error bars denote 95% confidence intervals. Error bars are larger for intramural data because of the smaller total number of awards (98,648 extramural and 1,594 intramural).

Cost effectiveness of intramural and extramural projects.

(A) A measure of cost effectiveness versus progression (i.e., year of grant) for intramural research (green) and extramural research (red), for projects of different durations: 1–3 years (top row), 4–6 years (middle), and 7–10 years (bottom). These regressions do not control for other characteristics, but rather represent the raw ratios. For the first column, the Y-axis displays log10(ratio) + 1, where ratio is the ratio of cumulative total costs to cumulative total research output for each metric (cost:output; for the first column, output = number of papers); error bars denote the 95% confidence intervals. The remaining columns show measures of cost effectiveness for relative citation ratio, approximate potential to translate, total clinical citation counts, and a binary measure of clinical citations. To account for the fact that many papers are published after funding for the relevant grant has ended, grant amounts were multiplied by a deflator, which represents the proportion of papers published to date against the anticipated number of future publications, as determined by empirical measurements (Supplemental Table 1). In most cases, according to this analysis, extramural research is more cost effective than intramural research when observing uncontrolled regressions. (B–D) Linear regression results of the cost efficiency of research output measures against project types (intramural vs. extramural). The regression model was fitted for each year of the project’s progression. Unlike panel (A), this regression model controls for grant, investigator, and collaboration characteristics in order to obtain a more accurate estimate of the relative cost efficiency of intramural vs. extramural projects. The Y-axis coefficient indicates the mean disparity in research output between intramural and extramural projects, controlling for these other variables (see Methods).
Because covariates might confound the data, separate regressions were conducted for all projects (B, the default); for balanced projects using 1:1 propensity score matching (one extramural grant for every intramural grant), in order to compare the most similar grants and reduce the influence of unobserved covariates (C); and, as a robustness check, for projects matched similarly to (C) but using 1:4 propensity score matching (D).
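The quantity plotted in the first column of panel (A), log10(ratio) + 1, can be illustrated with a short Python sketch. It is not the authors' implementation: the function name `cost_effectiveness_curve` and all numbers are hypothetical, and the sketch assumes that one deflator is applied to each project year's cost before accumulation. The caption does not say at which level the deflator is applied.

```python
import math
from itertools import accumulate


def cost_effectiveness_curve(yearly_costs, yearly_outputs, deflators):
    """Return log10(cumulative cost / cumulative output) + 1 for each year.

    Assumption: each year's cost is multiplied by that year's deflator
    before accumulation. Years with no cumulative output (or no cost)
    yield None because the ratio is undefined.
    """
    adjusted_costs = [c * d for c, d in zip(yearly_costs, deflators)]
    cum_costs = list(accumulate(adjusted_costs))
    cum_outputs = list(accumulate(yearly_outputs))
    curve = []
    for cost, output in zip(cum_costs, cum_outputs):
        if cost <= 0 or output <= 0:
            curve.append(None)
        else:
            curve.append(math.log10(cost / output) + 1)
    return curve


# Hypothetical 3-year project: total costs in dollars, papers per year,
# and made-up deflators (the real values are in Supplemental Table 1).
costs = [100_000, 100_000, 100_000]
papers = [0, 2, 3]
deflators = [1.0, 1.0, 0.5]
curve = cost_effectiveness_curve(costs, papers, deflators)
print(curve)
```

Because the ratio is cost:output, lower values on this scale correspond to more output per dollar.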

Comparison of scores for human-focused research, animal research, and molecular/cellular research for intramural and extramural projects.

(A–C) Average Human, Animal, and Molecular/Cellular scores for publications funded by extramural vs. intramural grants; scores were downloaded from iCite (Hutchins, Davis, et al., 2019; iCite et al., 2019). (A) Average scores for human-focused research for intramural research (green) and extramural research (blue) for projects of different durations: 1–3 years (left), 4–6 years (middle), and 7–10 years (right). (B) Average scores for animal research. (C) Average scores for molecular/cellular research. Mann-Whitney U tests were conducted to test the difference between the scores for intramural and extramural projects. *** p < 0.001, ** p < 0.01, * p < 0.05. No asterisk indicates that the difference is not statistically significant.
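The asterisk convention above amounts to a simple threshold mapping, sketched below in Python. The function name `significance_label` is hypothetical. The p-value itself would come from a Mann-Whitney U test (for example, `scipy.stats.mannwhitneyu`), which this sketch does not perform.

```python
def significance_label(p_value):
    """Map a p-value to the asterisk notation used in the caption."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must lie in [0, 1]")
    if p_value < 0.001:
        return "***"
    if p_value < 0.01:
        return "**"
    if p_value < 0.05:
        return "*"
    return ""  # not statistically significant


# Hypothetical p-values, one per significance band.
labels = [significance_label(p) for p in (0.0004, 0.004, 0.04, 0.4)]
print(labels)
```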

t-SNE plot illustrating the distribution of topic clusters, with colors consistent with those in Figure 1.

This shows a dimensionality reduction of the locations of these grants in Word2Vec space, indicating their relative proximity to one another. Grants with similar semantic meaning appear closer to one another, while those with less semantic similarity appear farther apart. Similar grants tend to cluster together in t-SNE visualizations.

t-SNE plot illustrating the distribution of grants in NIH institutes, using the same methodology as Supplementary Figure 1.

This shows a dimensionality reduction of the locations of these grants in Word2Vec space, indicating their relative proximity to one another. Grants with similar semantic meaning appear closer to one another, while those with less semantic similarity appear farther apart. Similar grants tend to cluster together in t-SNE visualizations.

Ternary contour plots representing the clinical citation efficiency in the human, animal, and molecular/cellular score system for intramural (D) and extramural projects (E).

(Hutchins, Davis, Meseroll, & Santangelo, 2019; Weber, 2013). Here, efficiency was defined as the percentile of the cost per output, ranked in descending order. Each contour line denotes a constant efficiency percentile. Yellow/green areas of the triangle are high-efficiency regions, and blue areas are low-efficiency regions.

Number and proportion of papers published after grant ended by activity code prefix.