
Meta-Research: Task specialization across research careers

  1. Nicolas Robinson-Garcia (corresponding author)
  2. Rodrigo Costas
  3. Cassidy R Sugimoto
  4. Vincent Larivière
  5. Gabriela F Nane
  1. Delft Institute of Applied Mathematics, Delft University of Technology, Netherlands
  2. Centre for Science and Technology Studies, Leiden University, Netherlands
  3. Centre for Research on Evaluation, Science and Technology (CREST), Stellenbosch University, South Africa
  4. School of Informatics, Computing, and Engineering, Indiana University Bloomington, United States
  5. École de bibliothéconomie et des sciences de l'information, Université de Montréal, Canada
Feature Article
Cite this article as: eLife 2020;9:e60586 doi: 10.7554/eLife.60586
8 figures, 7 tables, 1 data set and 1 additional file

Figures

Figure 1
Distribution of contributions by career stage and author order.

(A) Share of publications of authors by contributorship at each career stage. (B) Share of publications of authors by contributorship based on their author position in each paper. Only publications with at least 3 authors are included in B. Career stages: junior stage (< 5 years since first publication); early-career stage (≥ 5 and < 15 years since first publication); mid-career stage (≥ 15 and < 30 years since first publication); and full career stage (≥ 30 years since first publication). WR (wrote the paper); AD (analyzed the data); CE (conceived and designed the experiments); CT (contributed reagents/materials/analysis tools); PE (performed the experiments); NC (number of contributions).
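
The career-stage thresholds in this caption can be written as a small lookup. The following is a minimal sketch, not the authors' code; the function name `career_stage` and the returned labels are illustrative assumptions, while the cut-offs (5, 15 and 30 years) are taken from the caption.

```python
def career_stage(years_since_first_publication: float) -> str:
    """Map years since first publication to the career stage named in the caption.

    The function name and label strings are hypothetical; only the
    thresholds come from the caption.
    """
    if years_since_first_publication < 0:
        raise ValueError("years since first publication cannot be negative")
    if years_since_first_publication < 5:
        return "junior"
    if years_since_first_publication < 15:
        return "early-career"
    if years_since_first_publication < 30:
        return "mid-career"
    return "full career"


if __name__ == "__main__":
    # Boundary values fall into the later stage because the lower bounds are inclusive.
    for years in (0, 5, 14, 15, 30):
        print(years, career_stage(years))
```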

Figure 2 with 1 supplement
Mixed correlation matrix of contributorship and bibliometric variables (A) and the Bayesian network used for predicting contributorship (B).

Contribution variables are in green, bibliometric variables are in blue. Bibliometric variables: PO (author’s position); AU (number of authors); DT (document type); CO (number of countries); IN (number of institutions); YE (years since first publication); PU (average number of publications). Contribution variables: WR (wrote the paper); AD (analyzed the data); CE (conceived and designed the experiments); CT (contributed reagents/materials/analysis tools); PE (performed the experiments); NC (number of contributions).

Figure 2—figure supplement 1
Bayesian network structure used for predicting contributorship highlighting whitelisted arc relations.

Contribution variables are in green, bibliometric variables are in blue. Red arcs correspond to whitelisted relations, that is, arcs that were identified by the algorithm but whose direction was modified to allow contributorship to be predicted from bibliometric information.

Figure 3
Probability density functions of contribution roles predicted using the Bayesian Network model.

Distributions are aggregated by career stage. (A) Probability distributions for the contributorship Wrote the manuscript. (B) Probability distributions for the contributorship Analyzed the data. (C) Probability distributions for the contributorship Conceived and designed the experiments. (D) Probability distributions for the contributorship Contributed with tools. (E) Probability distributions for the contributorship Performed the experiments. (F) Probability distributions for estimated Number of contributions of an author. Red color refers to scientists’ junior stage, green to early-career stage, blue to mid-career stage and purple to late-career stage.

Figure 4 with 1 supplement
Coefficient values of contributorships by archetype, per career stage.

Two archetypes are identified in the junior stage (Specialized and Supporting), three have been identified for the early- and mid-career (Leader, Specialized and Supporting) and two have been identified for the late-career stage (Leader and Supporting). Uncertainty intervals of coefficients are shown in brackets. Color grades reflect the value of the parameters. Contributions statements: WR, wrote the manuscript; AD, analyzed data; CE, conceived and designed the experiments; PE, performed the experiments; CT, contributed with tools.

Figure 4—figure supplement 1
Scree plots of the residual sum of squares (RSS), used to determine the number of archetypes for each career stage.
Figure 5 with 1 supplement
Career trajectories, productivity and citation impact boxplots by archetype.

(A) Sankey diagrams indicating the number of scientists by archetype at each career stage and transitions from one stage to the next, including changes in researchers’ archetypes. (B) Productivity boxplots, by archetype and career stage. Productivity is calculated as the cumulative number of publications scientists had authored at each given stage. (C) Boxplots of the share of highly cited publications, by archetype and career stage. Highly cited publications are defined as those which are among the 10% most highly cited publications in their field and year of publication. Red refers to the Leader archetype, Blue refers to the Specialized archetype and Green refers to the Supporting archetype.
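
The definition of highly cited publications in panel C can be illustrated as follows. This is a minimal sketch under stated assumptions, not the authors' procedure: the record layout (`id`, `field`, `year`, `citations`), the function names, and the handling of ties (all papers tied at the cut-off are kept) are hypothetical choices made for the example.

```python
import math
from collections import defaultdict


def flag_highly_cited(papers, top_share=0.10):
    """Return ids of papers in the top `top_share` by citations within each (field, year).

    Assumptions: ties at the cut-off are all flagged, and a group with a
    single paper flags that paper.
    """
    groups = defaultdict(list)
    for paper in papers:
        groups[(paper["field"], paper["year"])].append(paper)

    flagged = set()
    for group in groups.values():
        counts = sorted((p["citations"] for p in group), reverse=True)
        k = math.ceil(top_share * len(counts))  # number of papers in the top share
        cutoff = counts[k - 1]
        flagged.update(p["id"] for p in group if p["citations"] >= cutoff)
    return flagged


def share_highly_cited(author_paper_ids, flagged):
    """Share of an author's papers that are flagged as highly cited."""
    ids = list(author_paper_ids)
    if not ids:
        return 0.0
    return sum(1 for i in ids if i in flagged) / len(ids)


if __name__ == "__main__":
    # Toy data: ten papers in one field and year, with 0..9 citations.
    toy = [{"id": i, "field": "biology", "year": 2015, "citations": i} for i in range(10)]
    top = flag_highly_cited(toy)
    print(top, share_highly_cited([9, 0], top))
```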

Figure 5—figure supplement 1
Effect size for the differences between archetypes within each career stage for (A) number of publications and (B) share of highly cited papers.

Colored areas provide descriptive interpretation. Yellow indicates a small effect size, green indicates medium and purple indicates a large effect size.

Figure 6 with 3 supplements
Estimated proportion of scientists, along with 95% confidence intervals, by gender and career stage for each archetype.

Top-left panel refers to the junior stage, in which only two archetypes are present: specialized and supporting. Top-right refers to the early-career stage. Bottom-left refers to the mid-career stage. Bottom-right refers to the late-career stage, where again only two archetypes are observed: leader and supporting. Blue refers to women scientists and yellow to men scientists.
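
A proportion with a 95% confidence interval, as plotted here, could be computed along these lines. The page does not state which interval the authors used, so the normal-approximation (Wald) interval below is an assumption chosen for illustration, and the function name is hypothetical.

```python
import math


def wald_interval(successes: int, total: int, z: float = 1.96):
    """Normal-approximation confidence interval for a proportion.

    Assumption: a Wald interval with z = 1.96 for 95% coverage; the
    original analysis may have used a different method.
    """
    if total <= 0 or not 0 <= successes <= total:
        raise ValueError("need 0 <= successes <= total and total > 0")
    p = successes / total
    half_width = z * math.sqrt(p * (1 - p) / total)
    # Clamp to [0, 1] because a proportion cannot leave that range.
    return max(0.0, p - half_width), min(1.0, p + half_width)


if __name__ == "__main__":
    # Toy counts, not taken from the study.
    print(wald_interval(50, 100))
```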

Figure 6—figure supplement 1
Sankey diagram indicating the number of male scientists by archetype at each career stage and transitions from one stage to the next, including changes in researchers’ archetypes.

Red refers to the Leader archetype, Blue refers to the Specialized archetype and Green refers to the Supporting archetype.

Figure 6—figure supplement 2
Sankey diagram indicating the number of female scientists by archetype at each career stage and transitions from one stage to the next, including changes in researchers’ archetypes.

Red refers to the Leader archetype, Blue refers to the Specialized archetype and Green refers to the Supporting archetype.

Figure 6—figure supplement 3
Effect sizes for proportion tests to identify differences by gender and archetype at each career stage.

Colored areas provide descriptive interpretation. Yellow indicates a small effect size, green indicates medium and purple indicates a large effect size. Values for all archetypes in the junior and late-career stages overlap.

Figure 7 with 1 supplement
Percentage of scientists by author position, along with 95% confidence intervals, for each archetype and career stage.

Top-left panel refers to the junior stage, in which only two archetypes are present: specialized and supporting. Top-right refers to the early-career stage. Bottom-left refers to the mid-career stage. Bottom-right refers to the late-career stage, where again only two archetypes are observed: leader and supporting. Blue refers to the share of scientists publishing as first authors, green refers to those publishing as middle authors, and pink refers to those publishing as last authors.

Figure 7—figure supplement 1
Effect sizes for differences in proportions by author position and archetype at each stage.

Colored areas provide descriptive interpretation. Yellow indicates a small effect size, green indicates medium and purple indicates a large effect size. Values for specialized and supporting at the junior stage, and for leader and supporting at the late-career stage, overlap.

Author response image 1

Tables

Table 1
Definition of variables included in the dataset.
Acronym | Definition | Source
Bibliometric variables
PO | Author’s position in the paper | WoS
AU | Total number of authors in the paper | WoS
DT | Document type. Letters are excluded | WoS
CO | Number of countries to which authors of the paper are affiliated | WoS
IN | Number of institutions to which authors of the paper are affiliated | WoS
YE | Number of years since first publication at the time the paper was published | WoS
PU | Average number of publications (full counting) per year of the author at the time the paper was published | WoS
Contribution variables
WR | Wrote the paper | PLoS
AD | Analyzed the data | PLoS
PE | Performed the experiments | PLoS
CE | Conceived and designed the experiments | PLoS
CT | Contributed reagents/materials/analysis tools | PLoS
NC | Number of contributions | PLoS
Table 2
Distribution of papers by journal of the seed dataset on contributions.
Journal | No. of papers
PLOS ONE | 62,174
PLOS GENETICS | 2,408
PLOS PATHOGENS | 1,882
PLOS COMPUTATIONAL BIOLOGY | 1,684
PLOS NEGLECTED TROPICAL DISEASES | 1,432
PLOS BIOLOGY | 697
PLOS MEDICINE | 417
Table 3
Classification error rates from cross-validation of Bayesian Network model for the contribution variables.

For contributorships, the percentage of mis-classified predictions is shown, while for NC, the mean squared error between the predicted and the observed values is reported.

Variables | Min. | Median | Mean | Max.
WR | 0.062 | 0.064 | 0.064 | 0.065
AD | 0.064 | 0.067 | 0.067 | 0.069
PE | 0.072 | 0.075 | 0.075 | 0.077
CE | 0.062 | 0.064 | 0.064 | 0.066
CT | 0.077 | 0.078 | 0.078 | 0.081
NC | 0.120 | 0.125 | 0.125 | 0.127
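
The two error measures reported in Table 3 (share of misclassified predictions for the binary contributorships, mean squared error for NC) and their summary across cross-validation runs can be sketched as below. This is an illustration only: the function names are hypothetical and the numbers in the example are toy values, not results from the study.

```python
import statistics


def misclassification_rate(observed, predicted):
    """Share of predictions that differ from the observed labels."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(o != p for o, p in zip(observed, predicted)) / len(observed)


def mean_squared_error(observed, predicted):
    """Mean of squared differences between observed and predicted values."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)


def summarize(errors):
    """Min., median, mean and max. of per-run errors, matching the table columns."""
    return {
        "min": min(errors),
        "median": statistics.median(errors),
        "mean": statistics.mean(errors),
        "max": max(errors),
    }


if __name__ == "__main__":
    # Toy values for illustration only.
    print(misclassification_rate([1, 0, 1, 1], [1, 0, 0, 1]))
    print(mean_squared_error([2, 3, 1, 4], [2, 2, 1, 5]))
    print(summarize([0.1, 0.2, 0.3, 0.4]))
```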
Author response table 1
 | WR | AD | CE | CT | PE
Precision | 0.98 | 0.88 | 0.89 | 0.89 | 0.87
Recall | 0.89 | 0.99 | 0.99 | 0.99 | 0.99
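
For reference, precision and recall follow the standard definitions based on true positives (TP), false positives (FP) and false negatives (FN). The sketch below uses toy counts and a hypothetical function name; it does not reproduce the values in the table.

```python
def precision_recall(true_positives: int, false_positives: int, false_negatives: int):
    """Return (precision, recall) from confusion-matrix counts.

    precision = TP / (TP + FP); recall = TP / (TP + FN).
    """
    predicted_positive = true_positives + false_positives
    actual_positive = true_positives + false_negatives
    if predicted_positive == 0 or actual_positive == 0:
        raise ValueError("precision or recall is undefined for these counts")
    return true_positives / predicted_positive, true_positives / actual_positive


if __name__ == "__main__":
    # Toy counts for illustration only.
    print(precision_recall(true_positives=8, false_positives=2, false_negatives=0))
```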
Author response table 2
WR | 517008 | 0.48 | 0.50
AD | 517008 | 0.52 | 0.50
CE | 517008 | 0.48 | 0.50
CT | 517008 | 0.35 | 0.48
PE | 517008 | 0.51 | 0.50
NC | 517008 | 2.46 | 1.32
Author response table 3
Probability | All | Women | Men
P(early Leader) | 0.371 | 0.268 | 0.4314
P(early Leader | junior Specialized) | 0.4214 | 0.3379 | 0.2991
P(mid Leader) | 0.356 | 0.249 | 0.415
P(mid Leader | early Leader) | 0.685 | 0.606 | 0.71
P(mid Leader | early Specialized) | 0.197 | 0.144 | 0.335
P(late Leader) | 0.088 | 0.054 | 0.107
P(late Leader | mid Leader) | 0.246 | 0.218 | 0.256
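
Marginal and conditional probabilities of this kind can be estimated from counts of archetype transitions between consecutive career stages. The following is a minimal sketch under assumptions: each scientist is represented as a hypothetical `(earlier archetype, later archetype)` pair, the function names are illustrative, and the data are toy values rather than the study's trajectories.

```python
def marginal(trajectories, later):
    """P(later archetype): share of scientists with the given later-stage archetype."""
    if not trajectories:
        raise ValueError("no trajectories given")
    return sum(1 for _, b in trajectories if b == later) / len(trajectories)


def conditional(trajectories, later, earlier):
    """P(later archetype | earlier archetype), estimated from transition counts."""
    matching = [b for a, b in trajectories if a == earlier]
    if not matching:
        raise ValueError("no scientist has the conditioning archetype")
    return sum(1 for b in matching if b == later) / len(matching)


if __name__ == "__main__":
    # Toy (early-career, mid-career) pairs for illustration only.
    toy = [
        ("Leader", "Leader"),
        ("Leader", "Leader"),
        ("Leader", "Supporting"),
        ("Specialized", "Leader"),
        ("Specialized", "Specialized"),
    ]
    print(marginal(toy, "Leader"))
    print(conditional(toy, "Leader", "Leader"))
    print(conditional(toy, "Leader", "Specialized"))
```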
Author response table 4
Comparison | min | 1st quartile | median | mean | 3rd quartile | max
Junior stage
Spe. vs. supp. | -1.00 | -0.87 | -0.61 | -0.50 | -0.26 | 1.00
Early-career
Lead. vs. spe. | -1.00 | -0.25 | 0.08 | 0.08 | 0.44 | 1.00
Spe. vs. supp. | -1.00 | -0.08 | 0.25 | 0.19 | 0.48 | 1.00
Lead. vs. supp. | -1.00 | -0.13 | 0.15 | 0.10 | 0.39 | 1.00
Mid-career
Lead. vs. spe. | -1.00 | 0.18 | 0.51 | 0.41 | 0.76 | 1.00
Lead. vs. supp. | -1.00 | 0.16 | 0.42 | 0.39 | 0.68 | 1.00
Spe. vs. supp. | -1.00 | -0.14 | 0.06 | 0.02 | 0.23 | 1.00
Late-career
Lead. vs. supp. | -1.00 | -0.41 | -0.17 | -0.12 | 0.11 | 1.00

Data availability

All data is openly accessible at http://doi.org/10.5281/zenodo.3891055.

The following data sets were generated
  1. N Robinson-Garcia, R Costas, CR Sugimoto, V Larivière, GF Nane (2020) Datasets on contributorship and bibliometric variables for the study 'Task specialization and its effects on research careers'. Zenodo. https://doi.org/10.5281/zenodo.3891055

Additional files
