Analysis of science journalism reveals gender and regional disparities in coverage

  1. Natalie R Davidson
  2. Casey S Greene (corresponding author)
  1. University of Colorado School of Medicine, United States
3 figures, 10 tables and 1 additional file

Figures

Figure 1 with 1 supplement
Data and processing pipeline overview.

(a), left, depicts an example news article and the type of data extracted from the text. Green and blue highlighted text depicts all quotes, and associated speakers identified by the coreNLP …

Figure 1—figure supplement 1
Benchmark data.

The performance of gender prediction for pipeline-identified quoted speakers.

Figure 2 with 2 supplements
Speakers predicted to be men are sometimes over-represented in quotes, but this depends on the year and article type.

(a), left, depicts an example of the names extracted from quoted speakers in news articles and authors in papers. (a), right, highlights the data types and processes used to analyze the predicted …

Figure 2—figure supplement 1
Speakers predicted to be men are over-represented in news quotes regardless of predicted journalist gender.

(a) depicts two trend lines: yellow: proportion of Nature news articles written by a journalist predicted to be a woman; blue: proportion of Nature news articles written by a journalist predicted to be a man. We …

Figure 2—figure supplement 2
Speakers predicted to be men are over-represented in news quotes when compared against Springer Nature authorship.

(a) depicts three trend lines: purple: proportion of Nature quotes for a speaker estimated to be a man; light gray: proportion of The Guardian quotes for a speaker estimated to be a man; yellow: …

Figure 3 with 4 supplements
Analysis of quotes and citations found over-representation of Celtic/English and under-representation of East Asian predicted name origins.

(a), left, depicts an example of the names extracted from quoted speakers and citations found within news articles and authors in papers. (a), right, highlights the data types and processes used to …

Figure 3—figure supplement 1
Predicted Celtic/English and European name origins are the most cited, quoted, and mentioned.

(a) depicts the number of quotes, mentions, citations, or research articles considered in the name origin analysis. (b–g) depicts the proportion of a name origin in a given dataset, citations in …

Figure 3—figure supplement 2
Distribution of name origins in Nature and Springer Nature articles.

(a–d) depicts the predicted name origins of first and last authors in our background sets. (a and b) show the predicted name origins of Nature first and last authors, respectively. (c and d) show …

Figure 3—figure supplement 3
Over-representation of predicted Celtic/English and under-representation of East Asian name origins are also found in comparison to Nature and Springer Nature articles.

(a–f) depicts 10 plots, each for a possible name origin comparison against a background set. (a, c) and (e) compare the citation (a), quote (c), or mention (e) rate against Nature first and last …

Figure 3—figure supplement 4
Over-representation of predicted Celtic/English and under-representation of East Asian quotes and mentions are reduced when additionally considering citation.

(a–d) depicts twelve plots, each for a possible name origin comparison against a background set. (a and b) compare name origin proportions of quotes from people that were also cited in the same article. (c and d) compare name origin proportions from mentions of people that were also cited in …

Tables

Table 1
Breakdown of quotes at major processing steps.
| Processing step | Frequency |
| --- | --- |
| Total quotes | 105,457 |
| Quotes with a full name or pronoun associated | 96,620 |
| Quotes with a gender prediction | 96,390 |
| Quotes with a full name | 88,535 |
| Quotes with a name origin prediction | 100,457 |
Table 2
Breakdown of citations at major processing steps.
| Writer of article | Total citations | Total Springer Nature citations | First author citations with a full name | Last author citations with a full name | First author citations with a name origin prediction | Last author citations with a name origin prediction |
| --- | --- | --- | --- | --- | --- | --- |
| Journalist | 15,713 | 5736 | 4452 | 4464 | 4449 | 4447 |
| Scientist | 40,707 | 14,597 | 11,276 | 11,170 | 11,276 | 11,152 |
Table 3
Breakdown of all Springer Nature papers at major processing steps.
| Processing step | Frequency |
| --- | --- |
| # Springer Nature articles | 38,400 |
| # First + last authors with a full name in Springer Nature articles | 55,370 |
| # First + last authors with a gender prediction in Springer Nature articles | 51,686 |
| # First + last authors with a name origin prediction in Springer Nature articles | 55,197 |
Table 4
Breakdown of all Nature papers at major processing steps.
| Processing step | Frequency |
| --- | --- |
| # Nature articles | 13,414 |
| # First + last authors with a full name in Nature articles | 21,996 |
| # First + last authors with a gender prediction in Nature articles | 21,173 |
| # First + last authors with a name origin prediction in Nature articles | 21,996 |
Table 5
Quoted speaker gender by name origin.
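| Name origin | Women | Men | Proportion men |
| --- | --- | --- | --- |
| African | 270 | 1554 | 0.8519737 |
| ArabTurkPers | 346 | 1765 | 0.8360966 |
| CelticEnglish | 6399 | 33,329 | 0.8389297 |
| EastAsian | 1090 | 4438 | 0.8028220 |
| European | 4788 | 22,844 | 0.8267226 |
| Greek | 73 | 445 | 0.8590734 |
| Hebrew | 213 | 1303 | 0.8594987 |
| Hispanic | 760 | 2450 | 0.7632399 |
| Nordic | 593 | 2397 | 0.8016722 |
| SouthAsian | 465 | 2019 | 0.8128019 |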
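
The "Proportion men" column in Table 5 appears to be the count of quotes from predicted men divided by the total of predicted women and men; the tabulated values are consistent with that reading. A minimal sketch that recomputes three rows under this assumption (the function name `proportion_men` is illustrative and not taken from the authors' code):

```python
# Counts copied from three rows of Table 5 as (women, men).
counts = {
    "African": (270, 1554),
    "EastAsian": (1090, 4438),
    "Hispanic": (760, 2450),
}


def proportion_men(women: int, men: int) -> float:
    """Share of gender-predicted quotes attributed to predicted men.

    Assumes the column is men / (women + men); this is an inference
    from the table values, not a formula stated in the source.
    """
    return men / (women + men)


proportions = {
    origin: proportion_men(women, men)
    for origin, (women, men) in counts.items()
}

for origin, value in proportions.items():
    # Seven decimal places, matching the precision used in the table.
    print(f"{origin}: {value:.7f}")
```

The same function can be applied to the remaining rows to check the table against its own counts.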
Table 6
Mean fold change comparison with Nature from bootstrap samples with 95% CI.
| Comparison | CelticEnglish | EastAsian | European |
| --- | --- | --- | --- |
| citation_journalist_first vs. nature_first | 1.36 (0.96, 1.74) | 0.7 (0.46, 0.91) | 1.01 (0.8, 1.25) |
| citation_journalist_last vs. nature_last | 1.18 (0.93, 1.54) | 0.82 (0.42, 1.27) | 0.93 (0.71, 1.19) |
| citation_scientist_first vs. nature_first | 1.26 (1.05, 1.5) | 0.81 (0.66, 1.02) | 1.05 (0.88, 1.22) |
| citation_scientist_last vs. nature_last | 1.11 (0.95, 1.31) | 0.77 (0.58, 0.99) | 1.06 (0.93, 1.19) |
| quote vs. nature_first | 2.12 (1.77, 2.51) | 0.25 (0.2, 0.32) | 1.01 (0.81, 1.22) |
| quote vs. nature_last | 1.52 (1.32, 1.75) | 0.39 (0.3, 0.49) | 0.89 (0.79, 1.01) |
| mention vs. nature_first | 2.03 (1.67, 2.39) | 0.29 (0.23, 0.36) | 1.02 (0.81, 1.22) |
| mention vs. nature_last | 1.44 (1.26, 1.67) | 0.45 (0.35, 0.54) | 0.89 (0.79, 1) |
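
This page does not describe how the bootstrap fold changes were computed. To illustrate what a "mean fold change with 95% CI" can look like, here is a generic percentile-bootstrap sketch on synthetic data. Everything in it is an assumption made for illustration: the function name `bootstrap_fold_change`, the 0/1 input encoding, the number of replicates, and the percentile interval. It is not the authors' pipeline.

```python
import random
import statistics


def bootstrap_fold_change(foreground, background, n_boot=1000, seed=0):
    """Mean and 95% percentile interval of a ratio of two proportions.

    `foreground` and `background` are lists of 0/1 flags marking whether
    each name was predicted to have the origin of interest (hypothetical
    inputs; the real analysis may be structured differently).
    """
    rng = random.Random(seed)
    ratios = []
    for _ in range(n_boot):
        # Resample each set with replacement, keeping its original size.
        fg = rng.choices(foreground, k=len(foreground))
        bg = rng.choices(background, k=len(background))
        bg_rate = sum(bg) / len(bg)
        if bg_rate == 0:
            continue  # ratio undefined for this replicate; skip it
        ratios.append((sum(fg) / len(fg)) / bg_rate)
    ratios.sort()
    lower = ratios[int(0.025 * (len(ratios) - 1))]
    upper = ratios[int(0.975 * (len(ratios) - 1))]
    return statistics.mean(ratios), lower, upper


# Synthetic example: origin present in 40% of quotes vs. 20% of papers.
quotes = [1] * 400 + [0] * 600
papers = [1] * 200 + [0] * 800
mean_fc, ci_low, ci_high = bootstrap_fold_change(quotes, papers)
print(f"{mean_fc:.2f} ({ci_low:.2f}, {ci_high:.2f})")
```

A fold change above 1 with an interval that excludes 1 would indicate over-representation relative to the background set, and below 1 under-representation, which is how the entries in Tables 6 and 7 can be read.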
Table 7
Mean fold change comparison with Springer Nature from bootstrap samples with 95% CI.
| Comparison | CelticEnglish | EastAsian | European |
| --- | --- | --- | --- |
| citation_journalist_first vs. springer_first | 1.99 (1.42, 2.64) | 0.69 (0.47, 0.96) | 1.14 (0.89, 1.47) |
| citation_journalist_last vs. springer_last | 2.01 (1.31, 3.08) | 0.56 (0.3, 0.82) | 1.12 (0.91, 1.37) |
| citation_scientist_first vs. springer_last | 1.54 (0.95, 2.17) | 0.91 (0.62, 1.64) | 1.13 (0.91, 1.93) |
| citation_scientist_last vs. nature_last | 1.11 (0.95, 1.31) | 0.77 (0.58, 0.99) | 1.06 (0.93, 1.19) |
| quote vs. springer_last | 2.58 (1.74, 3.6) | 0.28 (0.2, 0.54) | 1.08 (0.84, 1.35) |
| quote vs. nature_last | 1.52 (1.32, 1.75) | 0.39 (0.3, 0.49) | 0.89 (0.79, 1.0) |
| mention vs. springer_last | 2.45 (1.65, 3.42) | 0.32 (0.23, 0.59) | 1.08 (0.85, 1.32) |
| mention vs. nature_last | 1.44 (1.26, 1.67) | 0.45 (0.35, 0.54) | 0.89 (0.79, 1) |
Table 8
Quoted speaker name origin, by journalist name origin.
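| Journalist name origin | African | Arab Turk Pers | Celtic English | East Asian | European | Greek | Hebrew | Hispanic | Nordic | South Asian |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| CelticEnglish | 0.020 | 0.025 | 0.484 | 0.038 | 0.319 | 0.006 | 0.016 | 0.033 | 0.035 | 0.022 |
| EastAsian | 0.018 | 0.017 | 0.354 | 0.243 | 0.250 | 0.004 | 0.016 | 0.026 | 0.036 | 0.035 |
| European | 0.022 | 0.023 | 0.420 | 0.086 | 0.326 | 0.005 | 0.016 | 0.043 | 0.032 | 0.027 |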
Table 9
Quoted + cited speaker name origin, by journalist name origin.
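| Journalist name origin | African | Arab Turk Pers | Celtic English | East Asian | European | Greek | Hebrew | Hispanic | Nordic | South Asian |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| CelticEnglish | 0.016 | 0.027 | 0.368 | 0.070 | 0.363 | 0.008 | 0.017 | 0.023 | 0.083 | 0.025 |
| EastAsian | 0.002 | 0.077 | 0.377 | 0.143 | 0.167 | 0.000 | 0.012 | 0.133 | 0.019 | 0.080 |
| European | 0.014 | 0.028 | 0.363 | 0.116 | 0.352 | 0.006 | 0.030 | 0.026 | 0.035 | 0.030 |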
Table 10
Quoted speakers (with US-affiliated citation) name origin, by journalist name origin.
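| Journalist name origin | African | Arab Turk Pers | Celtic English | East Asian | European | Greek | Hebrew | Hispanic | Nordic | South Asian |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| CelticEnglish | 0.011 | 0.023 | 0.378 | 0.086 | 0.361 | 0.010 | 0.021 | 0.029 | 0.056 | 0.025 |
| EastAsian | 0.000 | 0.066 | 0.340 | 0.148 | 0.209 | 0.000 | 0.005 | 0.148 | 0.033 | 0.049 |
| European | 0.021 | 0.030 | 0.410 | 0.111 | 0.300 | 0.012 | 0.023 | 0.019 | 0.030 | 0.046 |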
