Visualizing recent self-citation rates and temporal trends. a) Kernel density estimate of the distribution of First Author, Last Author, and Any Author self-citation rates in the last five years. b) Average self-citation rates over every year since 2000, with 95% confidence intervals calculated by bootstrap resampling.

Self-citation rates in 2016-2020 for First, Last, and Any Authors by field.

Average self-citation rates for each academic age in years 2016-2020. a) Self-citation rate vs. academic age for both First and Last Authors. Shaded regions show 95% confidence intervals obtained via bootstrap resampling. b) Comparison of self-citation rates by academic age for First and Last Authors. For a given academic age, a single point is plotted as (x=First Author self-citation rate for authors of academic age a, y=Last Author self-citation rate for authors of academic age a). The dashed line represents the y=x line, and the coloring of the points from dark to light represents increasing academic age.

Self-citation rates by country for First and Last Authors from 2016-2020. First Author data are shown in panel (a) and Last Author data in panel (b). Only countries with >50 papers were included in the analysis. Country was determined by the affiliation of the author.

Self-citation rates by topic. Results are presented for a) First, b) Last, and c) Any Authors. Topics were determined by Latent Dirichlet Allocation. Confidence intervals of the average self-citation rate are shown based on 1000 iterations of bootstrap resampling.
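Several captions report confidence intervals from bootstrap resampling. The following is a minimal sketch of one common variant, the percentile bootstrap of the mean, using only the Python standard library. The function name `bootstrap_ci`, the example rates, and the choice of the percentile method are assumptions for illustration; the captions state only that 1000 iterations of bootstrap resampling were used.

```python
import random
from statistics import mean

def bootstrap_ci(values, n_iter=1000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for the mean (illustrative sketch)."""
    rng = random.Random(seed)
    n = len(values)
    # Resample with replacement n_iter times and record each resample's mean.
    boot_means = sorted(mean(rng.choices(values, k=n)) for _ in range(n_iter))
    # Read the interval off the empirical percentiles of the bootstrap means.
    tail = (1.0 - level) / 2.0
    lower = boot_means[int(tail * (n_iter - 1))]
    upper = boot_means[int((1.0 - tail) * (n_iter - 1))]
    return lower, upper

# Hypothetical per-paper self-citation rates, not data from the study.
rates = [0.00, 0.02, 0.05, 0.08, 0.10, 0.12, 0.15, 0.20, 0.03, 0.07]
low, high = bootstrap_ci(rates)
```

Fixing the seed makes the interval reproducible; other bootstrap variants (for example, bias-corrected intervals) would give slightly different bounds.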

Gender disparities in authorship and self-citation. a) Proportion of papers written by men and women First and Last Authors since 2000. b) Average self-citation rates for men and women First and Last Authors. c) Ratio of average self-citation rates of men to women for First and Last Authors. d) Self-citation rates by academic age for men and women authors, where the dashed line represents men and the solid line women. e) Ratio of self-citation rates of men to women by academic age. f) Number of papers by academic age for men and women, where the dashed line represents men and the solid line women. g) Ratio of average number of papers of men to women by academic age. In all subplots, 95% confidence intervals of the mean were calculated with 1000 iterations of bootstrap resampling.

Smooth predictors for generalized additive models presented in Table 2. Models for a) First Authors and self-citation counts, b) Last Authors and self-citation counts, c) First Authors and self-citation rates, d) Last Authors and self-citation rates, e) First Authors and publication history, f) Last Authors and publication history. The number in parentheses on each y-axis reflects the effective degrees of freedom. All P values were P<2e-16 except those for year citing in the Last Author count (P=5.0e-5) and rate (P=0.176) models.

Coefficients and P values for parametric terms in the models. Separate models were created for First and Last Authors. Models were also made for self-citation counts, self-citation rates, and the number of previously published papers. Quantile-quantile plots are presented in Figure S11. Results from 100 random resamplings are presented in Figure S12. Please note that model covariates were not included in the multiple comparisons correction in Table S9. *P<0.05, **P<1e-5, ***P<1e-10.

Data exclusions. Each cell shows the number of articles or citations remaining after exclusion, as well as the percentage that were dropped by the exclusion criteria.

All journals included in this analysis by field, sorted alphabetically.

Comparison between manual scoring of self-citation rates and self-citation rates estimated from Python scripts in 5 Psychiatry journals: American Journal of Psychiatry, Biological Psychiatry, JAMA Psychiatry, Lancet Psychiatry, and Molecular Psychiatry. In total, 906 articles were manually evaluated (10 articles per journal per year from 2000-2020; four articles were excluded because their very long author lists made manual scoring too difficult). For manual scoring, we downloaded information about all references for a given article and searched for matching author names.

Percentiles of self-citation rates in articles from 2016-2020.

Temporal trends in First Author, Last Author, and Any Author self-citation rates from 2000-2020 in Neurology, Neuroscience, and Psychiatry papers. Shaded regions show 95% confidence intervals calculated with bootstrap resampling.

Correlations between year and self-citation rate and corresponding slopes by field.

Average of normalized self-citation counts for each academic age in years 2016-2020. For the normed self-citation counts, the number of self-citations was divided by the number of previously published papers by the author.
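As a rough illustration of this normalization, the sketch below divides each author's self-citation count by their number of previously published papers and then averages within each academic age. The record layout, the function name `mean_normed_counts_by_age`, and the decision to skip authors with zero previous papers are assumptions made for this example, not details given in the caption.

```python
from collections import defaultdict
from statistics import mean

def mean_normed_counts_by_age(records):
    """records: iterable of (academic_age, n_self_citations, n_previous_papers)."""
    by_age = defaultdict(list)
    for age, n_self, n_prev in records:
        if n_prev == 0:
            # Assumption: the ratio is undefined without prior papers, so skip.
            continue
        by_age[age].append(n_self / n_prev)
    # Average the normed counts within each academic age.
    return {age: mean(vals) for age, vals in sorted(by_age.items())}

# Hypothetical records, not data from the study.
records = [(5, 2, 10), (5, 3, 10), (20, 6, 120), (20, 0, 0)]
result = mean_normed_counts_by_age(records)
```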

First Author and Last Author self-citation rates by affiliation country of the author for papers from 2016-2020. 95% confidence intervals obtained via bootstrap resampling are included in parentheses. Only countries with at least 50 papers were included in the analysis.

Mean impact factor by country for a) First Authors and b) Last Authors. Mean number of previous papers by country for c) First Authors and d) Last Authors. Normed number of self-citations for e) First Authors and f) Last Authors. The normed number of self-citations was computed as the number of self-citations divided by the number of previously published papers.

LDA perplexity on training and validation data for different numbers of topics. The lowest validation perplexity was for seven topics.

Topic word clouds for 13 topics. These are the most common words appearing in each of our LDA model topics. Based on the word clouds, we assigned overall themes, or topic names.

Topic word clouds for seven topics. These are the most common words appearing in each of our LDA model topics. Based on the word clouds, we assigned overall themes, or topic names.

a) First Author, b) Last Author, and c) Any Author self-citation rates for seven topics.

Self-citation rates by number of papers for women and men. Self-citation rates were grouped in bins by number of previous papers: 0-9, 10-19, 20-29, 30-39, 40-49, 50-59, 60-69, 70-79, 80-89, 90-99, 100-149, 150-199, 200-249, 250-299, 300-399, 400-499, >500. Error bars reflect 95% confidence intervals obtained with bootstrap resampling.
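The uneven bin widths listed above can be reproduced with a simple lookup over the lower bin edges. This is a minimal sketch; the names `BIN_EDGES` and `bin_label` are hypothetical, and placing exactly 500 previous papers in the top bin is an assumption, since the caption lists the last bin as >500.

```python
from bisect import bisect_right

# Lower edge of each bin: width 10 up to 99, width 50 up to 299,
# width 100 up to 499, then one open-ended top bin.
BIN_EDGES = list(range(0, 100, 10)) + [100, 150, 200, 250, 300, 400, 500]

def bin_label(n_previous_papers):
    """Return the label of the bin that contains n_previous_papers."""
    if n_previous_papers < 0:
        raise ValueError("number of previous papers must be non-negative")
    # Index of the last lower edge that is <= the value.
    i = bisect_right(BIN_EDGES, n_previous_papers) - 1
    lower = BIN_EDGES[i]
    if i == len(BIN_EDGES) - 1:
        return f">={lower}"  # assumption: 500 itself falls in the top bin
    return f"{lower}-{BIN_EDGES[i + 1] - 1}"

labels = [bin_label(n) for n in (0, 99, 149, 350, 1000)]
```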

Topic and gender interactions. Proportion of men and women authors by each topic for a) First Authors and b) Last Authors. Average self-citation rates for men and women authors by each topic for c) First Authors and d) Last Authors. Darker shades (top bar in each pair) are aggregated across men, and lighter shades (bottom bar in each pair) are aggregated across women.

Models with affiliation continent instead of low- and middle-income country terms. *P<0.05, **P<1e-5, ***P<1e-10.

Models with interaction terms for gender/academic age and gender/number of previous papers. *P<0.05, **P<1e-5, ***P<1e-10.

Quantile-quantile plots for all models. The plots were generated with a simulation-based approach using the DHARMa package in R.

Tests for uniformity, outliers, and dispersion in models. Tests were performed using the DHARMa package in R. Uniformity: Asymptotic one-sample Kolmogorov-Smirnov test. DHARMa outlier test based on exact binomial test with approximate expectations. DHARMa nonparametric dispersion test via sd of residuals fitted vs. simulated. DHARMa zero-inflation test via comparison to expected zeros with simulation under H0 = fitted model.

Values for parametric terms in models across 100 random resamplings.

Single author self-citation rates for Dustin Scheinost. a) Histogram of Scheinost-Scheinost self-citation rates, which were computed as the proportion of references with Scheinost as an author across every paper. b) Scheinost-Scheinost self-citation rate over time. c) Any Author self-citation rates for all papers with Scheinost as an author.

Comparison of self-citation rates in the entire field of Neuroscience and the journal Nature Neuroscience.

Data missingness.

Distribution of the natural log of exchangeability block size.

P values for all 44 comparisons performed in this study. P values are corrected for multiple comparisons with the Benjamini/Hochberg false discovery rate (FDR) correction with α=0.05. For P values determined by permutation testing, 10,000 permutations were used. Significant values (Pcorrected<0.05) are marked with an asterisk in the “Finding” column.
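For readers unfamiliar with the Benjamini/Hochberg procedure, the sketch below computes FDR-adjusted P values in plain Python. It is an illustrative implementation of the standard step-up adjustment, not the code used in the study; the function name `benjamini_hochberg` and the example P values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini/Hochberg FDR-adjusted P values in the input order."""
    m = len(p_values)
    # Indices of the P values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, scaling by m / rank and
    # taking a running minimum so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical raw P values, not results from the study.
p_raw = [0.001, 0.008, 0.039, 0.041, 0.60]
p_adj = benjamini_hochberg(p_raw)
significant = [p < 0.05 for p in p_adj]
```

In this toy example the two smallest P values remain below 0.05 after correction, while 0.039 and 0.041 do not, which shows how the correction can change which comparisons are marked significant.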