Emergence of trait variability through the lens of nitrogen assimilation in Prochlorococcus

  1. Paul M Berube (corresponding author)
  2. Anna Rasmussen
  3. Rogier Braakman
  4. Ramunas Stepanauskas
  5. Sallie W Chisholm
  1. Massachusetts Institute of Technology, United States
  2. Bigelow Laboratory for Ocean Sciences, United States
7 figures, 5 tables and 2 additional files

Figures

Figure 1
Distributions of the nitrate reductase (narB) and nitrite reductase (nirA) genes across a core marker gene phylogeny of 329 Prochlorococcus and Synechococcus genomes.

Selected Prochlorococcus clades are highlighted as pie slices. Red bars indicate genomes with the potential to use nitrate, based on the presence of a narB gene. Blue bars indicate genomes with an annotated nirA gene in the genome assembly. The outer ring shows, as a gray bar chart, the estimated percentage of each genome recovered. Reference culture genomes are indicated by gray circles, and the results of the PCR screen for narB are indicated by filled (present) and open (absent) squares. Single cells belonging to the HLIII and HLIV clades of Prochlorococcus were not screened by PCR and are included only as additional reference genomes. The nucleotide phylogeny is based on a concatenated alignment of 37 marker genes in the PhyloSift software package and was inferred by maximum likelihood in RAxML (GTRCAT model) with automatic bootstopping criteria (250 replicate trees). Filled purple circles on branches indicate that the associated genomes clustered together in at least 75% of replicate trees.

https://doi.org/10.7554/eLife.41043.002
Figure 1—source data 1

Compressed tar archive (zip format) containing the concatenated codon alignment (fasta format) and tree file (newick format) used to generate Figure 1.

https://doi.org/10.7554/eLife.41043.003
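The purple-circle criterion (a clade recovered in at least 75% of replicate trees) amounts to a simple frequency count. The Python sketch below illustrates it under stated assumptions: each replicate tree has already been reduced to the set of clades it contains, each clade is written as a frozenset of genome names, and the helpers `clade_support` and `well_supported` are hypothetical names, not RAxML functions. Parsing the Newick tree files supplied with Figure 1 into this form is not shown.

```python
from collections import Counter
from typing import Dict, FrozenSet, List, Set

Clade = FrozenSet[str]


def clade_support(replicate_trees: List[Set[Clade]]) -> Dict[Clade, float]:
    """Fraction of replicate trees in which each clade appears."""
    counts: Counter = Counter()
    for clades in replicate_trees:
        counts.update(clades)  # a set, so each clade is counted at most once per tree
    n_trees = len(replicate_trees)
    return {clade: count / n_trees for clade, count in counts.items()}


def well_supported(support: Dict[Clade, float], threshold: float = 0.75) -> Set[Clade]:
    """Clades recovered in at least `threshold` of the replicate trees."""
    return {clade for clade, frac in support.items() if frac >= threshold}


# Toy example with four hypothetical replicate trees.
ab, cd, ac = frozenset({"A", "B"}), frozenset({"C", "D"}), frozenset({"A", "C"})
replicates = [{ab, cd}, {ab, cd}, {ab}, {ac}]
support = clade_support(replicates)
print(support[ab], support[cd])
print(well_supported(support) == {ab})
```

Here clade {A, B} appears in three of four trees (0.75) and would receive a circle, while {C, D} (0.5) would not.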
Figure 2 with 1 supplement
Hierarchical clustering of presence and absence distributions for flexible CyCOGs found in 83 Prochlorococcus HLII single cell genomes with genome recoveries of at least 75% (median 90%).

Genomes are sorted by ocean (Atlantic or Pacific) and by the presence or absence of the narB gene, a marker for the capacity to assimilate nitrate. Other than genes in the nitrate assimilation gene cluster (green box), no CyCOGs were over- or under-represented among the flexible genes in genomes containing narB. Genomes from the Atlantic, regardless of whether they contained the narB marker gene, were enriched in phosphorus assimilation genes (blue box).

https://doi.org/10.7554/eLife.41043.004
Figure 2—source data 1

Binary matrix containing the raw presence and absence data for each CyCOG in each genome analyzed for Figure 2 and Figure 2—figure supplement 1.

https://doi.org/10.7554/eLife.41043.006
Figure 2—source data 2

Compressed tar archive (zip format) containing input and output files for gene enrichment analysis using BiNGO 3.0.3 (Maere et al., 2005) in Cytoscape 3.4 (Shannon et al., 2003).

https://doi.org/10.7554/eLife.41043.007
Figure 2—figure supplement 1
Hierarchical clustering of presence and absence distributions for flexible CyCOGs found in 22 Prochlorococcus single cell genomes belonging to the LLI clade with genome recoveries of at least 75% (median 87%).

Genomes are sorted by ocean (Atlantic or Pacific) and by the presence or absence of the narB gene, a marker for the capacity to assimilate nitrate. Other than genes in the nitrate assimilation gene cluster (green box), no CyCOGs were over- or under-represented among the flexible genes in genomes containing narB.

https://doi.org/10.7554/eLife.41043.005
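The Figure 2 legends do not state which distance measure or linkage rule was used for the hierarchical clustering. As an illustration only, the sketch below assumes Jaccard distance between binary CyCOG presence/absence profiles (the kind of matrix provided as source data for Figure 2) and average linkage; both choices, the function names, and the four toy genomes are assumptions rather than the authors' method.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Sequence, Tuple

Profile = Sequence[int]  # 1 = CyCOG present, 0 = absent


def jaccard_distance(a: Profile, b: Profile) -> float:
    """1 - |CyCOGs shared| / |CyCOGs present in either genome|."""
    shared = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return 0.0 if either == 0 else 1.0 - shared / either


def average_linkage(
    profiles: Dict[str, Profile],
) -> List[Tuple[FrozenSet[str], FrozenSet[str], float]]:
    """Agglomerative clustering; returns merges in order as (cluster, cluster, distance)."""

    def cluster_distance(c1: FrozenSet[str], c2: FrozenSet[str]) -> float:
        # Average linkage: mean distance over all cross-cluster genome pairs.
        dists = [jaccard_distance(profiles[i], profiles[j]) for i in c1 for j in c2]
        return sum(dists) / len(dists)

    clusters = [frozenset([name]) for name in profiles]
    merges = []
    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2), key=lambda pair: cluster_distance(*pair))
        merges.append((c1, c2, cluster_distance(c1, c2)))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 | c2]
    return merges


# Hypothetical presence/absence profiles over four flexible CyCOGs.
genomes = {
    "g1": [1, 1, 0, 0],
    "g2": [1, 1, 0, 1],
    "g3": [0, 0, 1, 1],
    "g4": [0, 1, 1, 1],
}
for left, right, d in average_linkage(genomes):
    print(sorted(left), sorted(right), round(d, 3))
```

The merge order defines the dendrogram used to sort rows; with real data a library routine would normally replace this quadratic loop.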
Figure 3 with 1 supplement
The core marker protein phylogeny of Prochlorococcus and Synechococcus (A) in comparison to phylogenies for the nitrate/nitrite transporter, NapA (B), the nitrate reductase, NarB (C), the molybdopterin biosynthesis protein, MoaA (D), the nitrite transporter, FocA (E), and the nitrite reductase, NirA (F).

Interclade horizontal gene transfer is minimal for genes encoding proteins in the upstream half of the nitrate assimilation pathway (B–D), since clades defined by the core phylogeny (A) remain separate. Horizontal gene transfer is observed in a few instances for genes encoding proteins in the downstream half of the nitrate assimilation pathway (E–F). One single cell (AG-363-P06; brown circle) from the high-light adapted HLVI clade possesses FocA and NirA proteins most similar to those from low-light adapted Prochlorococcus. Prochlorococcus belonging to the LLI clade possess two types of NirA, as indicated by well-supported phylogenetic divergence of NirA among LLI cells. Filled purple circles on branches indicate that the associated taxa clustered together in at least 75% of trees. Scale bars are 0.1 changes per site.

https://doi.org/10.7554/eLife.41043.008
Figure 3—source data 1

Compressed tar archive (zip format) containing codon alignments (fasta format) and tree files (newick format) used to generate Figure 3 and Figure 3—figure supplement 1.

https://doi.org/10.7554/eLife.41043.010
Figure 3—figure supplement 1
The core marker gene phylogeny of Prochlorococcus and Synechococcus (A) in comparison to gene phylogenies for the nitrate/nitrite transporter, napA (B), the nitrate reductase, narB (C), the molybdopterin biosynthesis protein, moaA (D), the nitrite transporter, focA (E), and the nitrite reductase, nirA (F).

Filled purple circles on branches indicate that the associated taxa clustered together in at least 75% of trees. Scale bars are 0.1 changes per site.

https://doi.org/10.7554/eLife.41043.009
Figure 4 with 3 supplements
The gene order and genomic location of the nitrate assimilation gene cluster in high-light and low-light adapted Prochlorococcus genomes.

The location of the nitrate and nitrite assimilation genes is shown relative to conserved core marker genes (black bars). The proportion of genomes in each clade with the indicated location is shown next to the clade names. The percentage of genomes in each group with a specific gene content and order is shown to the left of each gene order plot.

https://doi.org/10.7554/eLife.41043.011
Figure 4—figure supplement 1
Mauve alignments visualized for representative contigs from HLII Prochlorococcus single cell genome assemblies in comparison to the reference genome Prochlorococcus AS9601 (HLII; non-nitrate assimilating).

The nitrate assimilation gene cluster is found in a local collinear block shared with AS9601. CyCOG_60001297 and DNA polymerase I genes are marked as reference points.

https://doi.org/10.7554/eLife.41043.012
Figure 4—figure supplement 2
Mauve alignments visualized for representative contigs from HLI and HLVI Prochlorococcus single cell genome assemblies in comparison to the reference genomes Prochlorococcus MED4 (HLI) and Prochlorococcus AS9601 (HLII).

The reference genomes do not contain the nitrate assimilation gene cluster. In single cells, this cluster is found in the genomic island ISL1 (sensu Kettler et al., 2007).

https://doi.org/10.7554/eLife.41043.013
Figure 4—figure supplement 3
Mauve alignments visualized for representative contigs from LLI Prochlorococcus single cell genome assemblies in comparison to the reference genome Prochlorococcus NATL2A (LLI; nitrite assimilation only).

The nitrate assimilation gene cluster is found in a local collinear block shared with NATL2A. The pyrG and ppk genes are marked as reference points.

https://doi.org/10.7554/eLife.41043.014
Figure 5
Representative phylogenetic patterns for DNA gyrase subunit B (gyrB) in comparison to nitrate reductase (narB) and the phosphate assimilation gene pstB (encoding the ABC transporter ATP-binding subunit) from single cells in surface populations at HOT (Hawai’i Ocean Time-series) and BATS (Bermuda Atlantic Time-series Study).

The gyrB and narB genes do not exhibit significant phylogenetic divergence between the two sites. The pstB gene, in contrast, has significantly diverged into Atlantic-like and Pacific-like clusters of sequences due to frequent recombination, gene transfer, and/or selection (sensu Coleman and Chisholm, 2010).

https://doi.org/10.7554/eLife.41043.017
Figure 6
Percent nucleotide difference of selected genes for the HLII (A) and LLI (B) clades of Prochlorococcus.

The 'core' genes are a concatenated alignment of up to 37 PhyloSift marker genes. Genes in the nitrate assimilation cluster are in gray. Center lines show the medians, box limits indicate the 25th and 75th percentiles, whiskers extend 1.5 times the interquartile range from the 25th and 75th percentiles, and outliers are represented by dots. For HLII genes (A), n = 5151, 4186, 3321, 3570, 4371, 3240, 378, 351, 351, 351 sample points. For LLI genes (B), n = 496, 406, 435, 435, 406, 406, 153, 136, 136, 171, 210, 28 sample points.

https://doi.org/10.7554/eLife.41043.020
Figure 6—source data 1

Compressed tar archive (zip format) containing the alignments (fasta format) and column distance matrices used in the preparation of Figure 6.

https://doi.org/10.7554/eLife.41043.021
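The quantities in Figure 6 can be sketched as follows, under assumptions not stated in the legend: percent difference is taken as the share of aligned columns at which two sequences differ, skipping columns with a gap in either sequence, and every pair of sequences in an alignment is compared once. The listed sample sizes appear consistent with such all-vs-all comparisons (for example, 102 sequences give 102 × 101 / 2 = 5151 pairs), although the legend does not say so explicitly. The `tukey_box` helper follows the box-plot convention described in the legend. All function names are hypothetical.

```python
from itertools import combinations
from math import comb
from statistics import quantiles
from typing import Dict, List

GAP_CHARS = set("-.")


def percent_difference(a: str, b: str) -> float:
    """Percent of compared columns at which two aligned sequences differ.

    Columns with a gap in either sequence are skipped (an assumption).
    """
    if len(a) != len(b):
        raise ValueError("sequences must come from the same alignment")
    compared = diffs = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in GAP_CHARS or y in GAP_CHARS:
            continue
        compared += 1
        diffs += x != y
    return 100.0 * diffs / compared if compared else float("nan")


def pairwise_percent_differences(aln: Dict[str, str]) -> List[float]:
    """All-vs-all comparisons: k sequences give k * (k - 1) / 2 values."""
    return [percent_difference(aln[p], aln[q]) for p, q in combinations(sorted(aln), 2)]


def tukey_box(values: List[float]) -> Dict[str, object]:
    """Box-plot summary with whiskers at the most extreme points within 1.5 x IQR."""
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in values if low_fence <= v <= high_fence]
    return {
        "q1": q1, "median": median, "q3": q3,
        "whisker_low": min(inside), "whisker_high": max(inside),
        "outliers": sorted(v for v in values if v < low_fence or v > high_fence),
    }


print(percent_difference("ACGT-A", "ACGAAA"))
print(comb(102, 2))
print(tukey_box([1, 2, 3, 4, 100])["outliers"])
```

In the toy comparison, one of five gap-free columns differs (20%), and 100 falls beyond the upper fence of the five-point sample, so it would be drawn as an outlier dot.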
Figure 7 with 1 supplement
Proposed model mapping the vertical inheritance of the nitrate assimilation pathway onto a general model of speciation in Prochlorococcus (Braakman et al., 2017).

We assume different cost:benefit ratios at the top and bottom of the water column based on the energetic requirements of the nitrate assimilation pathway and the selective advantage of carrying this trait in environments with low nitrogen availability (see main text for further discussion). We further assume that ancestral Prochlorococcus, similar to Synechococcus, were capable of nitrate assimilation. As new Prochlorococcus clades/ecotypes emerged to more efficiently harness light energy and facilitate the draw-down of nitrogen at the surface, basal lineages were partitioned to the deeper regions of the euphotic zone (Braakman et al., 2017). In these basal lineages, higher relative costs of the pathway combined with access to other nitrogen sources (e.g., amino acids) hastened the loss of this trait through stochastic gene loss. In more recently emerging lineages (LLI, HLI, HLII, HLVI), the trait has been retained, with intra-clade frequencies influenced by the specific chemical and physical characteristics of the environment in which they are found (Berube et al., 2016). The founder effect has driven punctuated changes (e.g., genome-wide rearrangements) during speciation, while homologous recombination has acted to constrain the divergence of gene sequence and order within clades.

https://doi.org/10.7554/eLife.41043.028
Figure 7—source data 1

Source data used to create Figure 7—figure supplement 1.

This file is provided in nexus format suitable for analysis in Mesquite. Contents include the taxa, phylogeny, and character matrix required for reconstructing the ancestral state of Prochlorococcus with respect to the nitrate assimilation trait.

https://doi.org/10.7554/eLife.41043.030
Figure 7—figure supplement 1
Reconstruction of the state of the ancestral Prochlorococcus node with regard to the presence or absence of narB.

When parsimony is used to infer whether ancestral Prochlorococcus possessed the nitrate assimilation trait (A, B), the results are highly dependent on the relative costs assigned to gain and loss of the trait. If these costs are weighted equally (A), all changes are estimated to occur at the leaves of the tree and consist primarily of acquisition events; this explanation seems unlikely given the conservation of gene order and location within clades (Figure 4). If, however, a bias towards loss of the nitrate assimilation trait is imposed, the more parsimonious explanation is that ancestral Prochlorococcus possessed this trait (B). Maximum likelihood also estimates a reasonable probability that ancestral Prochlorococcus were capable of nitrate assimilation (C), and it generally supports a higher rate of loss than gain of the trait in Prochlorococcus: under this asymmetrical model, the estimated forward (trait gain) rate was 14.5 and the estimated backward (trait loss) rate was 27.4.

https://doi.org/10.7554/eLife.41043.029
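The dependence of the parsimony result on gain and loss costs can be reproduced on a toy tree. The sketch below uses Sankoff's weighted (generalized) parsimony, in which each change from absent to present costs `gain` and each change from present to absent costs `loss`. It is not the Mesquite analysis used for this figure supplement, and the four-leaf tree, its narB states, and the function names are hypothetical.

```python
import math
from typing import Dict, List, Optional, Tuple, Union

# A rooted tree as nested tuples of leaf names, e.g. (("A", "B"), ("C", "D")).
Tree = Union[str, Tuple["Tree", ...]]
STATES = (0, 1)  # 0 = narB absent, 1 = narB present


def sankoff(tree: Tree, leaf_states: Dict[str, int], cost: List[List[float]]) -> List[float]:
    """Minimum weighted number of changes below `tree`, for each possible state at its root.

    cost[s][t] is the cost of a parent in state s having a child in state t.
    """
    if isinstance(tree, str):  # leaf: only the observed state is allowed
        return [0.0 if s == leaf_states[tree] else math.inf for s in STATES]
    children = [sankoff(child, leaf_states, cost) for child in tree]
    return [
        sum(min(cost[s][t] + child[t] for t in STATES) for child in children)
        for s in STATES
    ]


def ancestral_state(tree: Tree, leaf_states: Dict[str, int],
                    gain: float, loss: float) -> Optional[int]:
    """Most parsimonious root state, or None if both states tie."""
    cost = [[0.0, gain], [loss, 0.0]]
    root = sankoff(tree, leaf_states, cost)
    if root[0] == root[1]:
        return None
    return 0 if root[0] < root[1] else 1


toy_tree = (("A", "B"), ("C", "D"))
toy_states = {"A": 1, "B": 0, "C": 1, "D": 0}
print(ancestral_state(toy_tree, toy_states, gain=1, loss=1))
print(ancestral_state(toy_tree, toy_states, gain=2, loss=1))
print(ancestral_state(toy_tree, toy_states, gain=1, loss=2))
```

With equal costs the root state of this toy tree is ambiguous (a tie), whereas making loss cheaper than gain tips the reconstruction toward an ancestor that carried the trait, illustrating in miniature why panels A and B can differ. For the maximum likelihood rates, a related reading: in a two-state Markov model with gain rate q01 and loss rate q10, the long-run probability of carrying the trait is q01/(q01 + q10), which for rates of 14.5 and 27.4 is about 0.35.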

Tables

Table 1
Rates of recombination relative to mutation for representative genomic regions in high-light and low-light adapted Prochlorococcus.
https://doi.org/10.7554/eLife.41043.015
| Region | Functional group | Sequences in alignment | Alignment length (bp) | κ | δ | ν | R/θ | ρ/θ | r/m |
|---|---|---|---|---|---|---|---|---|---|
| High-light adapted Prochlorococcus (HLII and HLVI clades) | | | | | | | | | |
| nirA-moaA | nitrate | 33 | 11428 | 3.32 | 4202 | 0.012 | 0.68 | 1.36 | 34.0 |
| polA region* | core flanking | 19 | 11554 | 3.72 | 8035 | 0.020 | 0.20 | 0.41 | 33.2 |
| Low-light adapted Prochlorococcus (LLI clade) | | | | | | | | | |
| moaB-narB | nitrate | 17 | 11985 | 2.91 | 435 | 0.021 | 1.78 | 3.57 | 16.5 |
| pyrG region† | core flanking | 15 | 10080 | 2.81 | 964 | 0.029 | 2.17 | 4.34 | 59.7 |
| ppk region‡ | core flanking | 15 | 11918 | 3.19 | 1056 | 0.031 | 0.93 | 1.85 | 30.3 |

* 3’ flanking region, containing polA, adjacent to the nitrate assimilation gene cluster in the HLII clade.
† 5’ flanking region, containing pyrG, adjacent to the nitrate assimilation gene cluster in the LLI clade.
‡ 3’ flanking region, containing ppk, adjacent to the nitrate assimilation gene cluster in the LLI clade.
κ, Transition/transversion rate ratio as estimated by PhyML under the HKY85 model.
δ, Mean length of DNA imported by homologous recombination.
ν, Divergence rate, per site, of DNA imported by homologous recombination.
R/θ, Per-site rate of initiation of recombination relative to the population mutation rate.
ρ/θ, Population recombination rate relative to the population mutation rate (ρ = 2R).
r/m, Relative impact of recombination versus mutation on the per-site substitution rate. Equal to (R/θ) × δ × ν.

Table 1—source data 1

Compressed tar archive (zip format) containing alignments used in CLONALFRAMEML analyses.

https://doi.org/10.7554/eLife.41043.016
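The footnote relations ρ = 2R and r/m = (R/θ) × δ × ν can be checked against the tabulated values. The sketch below does this for two rows of Table 1; the class name `RecombinationEstimates` is hypothetical and is not part of ClonalFrameML, which produced the underlying estimates. Because R/θ, δ and ν are printed rounded, the recomputed r/m (about 34.3 for nirA-moaA and about 30.4 for the ppk region) agrees with the table only to within rounding.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecombinationEstimates:
    """Per-region recombination parameters as reported in Table 1."""
    r_over_theta: float  # R/θ: recombination initiation rate relative to mutation rate
    delta: float         # δ: mean length (bp) of imported DNA
    nu: float            # ν: per-site divergence of imported DNA

    @property
    def rho_over_theta(self) -> float:
        return 2 * self.r_over_theta  # ρ = 2R

    @property
    def r_over_m(self) -> float:
        return self.r_over_theta * self.delta * self.nu  # r/m = (R/θ) × δ × ν


rows = {
    "nirA-moaA": RecombinationEstimates(0.68, 4202, 0.012),
    "ppk region": RecombinationEstimates(0.93, 1056, 0.031),
}
for name, est in rows.items():
    print(name, round(est.rho_over_theta, 2), round(est.r_over_m, 1))
```

An r/m well above 1, as in every row of the table, means recombination has introduced more substitutions per site than point mutation has.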
Table 2
Significance (p value) for divergent phylogenetic gene clusters separating populations in the North Pacific Subtropical Gyre (HOT) and in the North Atlantic Subtropical Gyre (BATS).
https://doi.org/10.7554/eLife.41043.018
| Gene | Product | UniFrac p value (replicates 1, 2, 3) | P-test p value (replicates 1, 2, 3) |
|---|---|---|---|
| amtB | Ammonium transporter | 0.651, 0.135, 0.199 | 0.576, **0.006**, 0.554 |
| glnA | Glutamine synthetase | 0.058, 0.129, 0.136 | **0.045**, 0.238, 0.214 |
| glsF | Glutamate synthase | 0.277, 0.066, 0.428 | 0.547, 0.235, 0.249 |
| gyrB | DNA gyrase subunit B | 0.425, 0.110, 0.477 | 0.867, 0.058, **0.006** |
| moaA | Molybdopterin biosynthesis | **0.040**, 0.138, 0.090 | 0.067, 0.577, 0.555 |
| napA | Nitrate/nitrite transporter | 0.427, 0.082, 0.600 | 0.229, 0.051, 0.225 |
| narB | Nitrate reductase | 0.326, 0.547, 0.355 | 0.558, 0.060, 0.221 |
| nirA | Nitrite reductase | **0.004**, **0.039**, 0.072 | **0.010**, 0.066, 0.056 |
| pstB | Phosphate transporter ATP-binding | **0.001**, **<0.001**, **<0.001** | **0.001**, **<0.001**, **<0.001** |
| pstS | Phosphate transporter substrate binding | **<0.001**, **<0.001**, **<0.001** | **<0.001**, **<0.001**, **<0.001** |
P values are reported for three replicate analyses, each using 18 single cells subsampled from HOT (AG-347; n = 9) and BATS (AG-355; n = 9). Trees exhibiting significant phylogenetic divergence between the two populations (p<0.05) are in bold face type.

Table 2—source data 1

Compressed tar archive (zip format) containing the alignments and group files used in beta diversity analyses.

https://doi.org/10.7554/eLife.41043.019
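UniFrac significance tests and the P-test both ask whether the split between HOT and BATS cells on a gene tree is stronger than expected if site labels were assigned at random. The sketch below shows that label-permutation logic with a deliberately simpler statistic (mean between-site minus mean within-site pairwise distance) in place of UniFrac's branch-length measure, so it illustrates the permutation idea rather than reimplementing either test. The 9 + 9 design mirrors the subsampling described in the table note; the positions, distance function, and function names are hypothetical.

```python
import random
from itertools import combinations
from statistics import mean
from typing import Callable, Sequence

Distance = Callable[[int, int], float]


def separation(labels: Sequence[str], dist: Distance) -> float:
    """Mean between-site distance minus mean within-site distance."""
    within, between = [], []
    for i, j in combinations(range(len(labels)), 2):
        (within if labels[i] == labels[j] else between).append(dist(i, j))
    return mean(between) - mean(within)


def permutation_p_value(labels: Sequence[str], dist: Distance,
                        n_perm: int = 999, seed: int = 0) -> float:
    """One-sided p value: how often shuffled site labels separate at least as well."""
    rng = random.Random(seed)
    observed = separation(labels, dist)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if separation(shuffled, dist) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Hypothetical 1-D "positions" for 9 HOT and 9 BATS cells.
positions = [float(i) for i in range(9)] + [100.0 + i for i in range(9)]
sites = ["HOT"] * 9 + ["BATS"] * 9
p = permutation_p_value(sites, lambda i, j: abs(positions[i] - positions[j]))
print(p)
```

Repeating the analysis on independent subsamples, as the three replicate columns in Table 2 do, shows how sensitive the conclusion is to which cells happen to be drawn.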
Table 3
Estimates of dN/dS and results of site model tests for adaptive evolution among codon sites for Prochlorococcus and Synechococcus.

LRT statistics in bold exceed the chi-square critical value for a significance level of 0.001. For all genes, the inclusion of a class of neutral sites (M1) fits the data better than a single dN/dS value for all sites (M0). While the inclusion of a class of sites under positive selection may be statistically justified under the M2 and M8 models, all dN/dS values are well below one, suggesting that most sites are under purifying or neutral selection.

https://doi.org/10.7554/eLife.41043.022
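Columns M0 to M8 give the log-likelihoods of the site models for adaptive evolution; the last three columns give likelihood ratio test (LRT) statistics for model pairs, with degrees of freedom in parentheses.

| Gene | Taxa | dN/dS | M0 | M1 | M2 | M7 | M8 | M0 vs. M1 (1) | M1 vs. M2 (2) | M7 vs. M8 (2) |
|---|---|---|---|---|---|---|---|---|---|---|
| gyrB | 226 | 0.036 | −105998 | −101367 | −101367 | −102434 | −100472 | **9263** | 0 | **3924** |
| pstB | 200 | 0.045 | −39945 | −39466 | −39466 | −38543 | −38484 | **957** | 0 | **118** |
| amtB | 200 | 0.066 | −70319 | −68351 | −68234 | −68226 | −67620 | **3935** | **235** | **1212** |
| glnA | 229 | 0.031 | −66948 | −65905 | −65905 | −65462 | −65118 | **2086** | 0 | **688** |
| glsF | 215 | 0.103 | −267696 | −249591 | −249591 | −253085 | −246724 | **36210** | 0 | **12722** |
| napA | 76 | 0.054 | −23709 | −23028 | −23028 | −23201 | −22961 | **1363** | 0 | **480** |
| narB | 78 | 0.119 | −40791 | −38790 | −38790 | −39102 | −38498 | **4001** | 0 | **1209** |
| moaA | 67 | 0.206 | −24794 | −22973 | −22931 | −23268 | −22770 | **3643** | **84** | **996** |
| focA | 60 | 0.078 | −18477 | −17701 | −17701 | −17895 | −17615 | **1553** | 0 | **562** |
| nirA | 115 | 0.109 | −54829 | −52361 | −52361 | −52063 | −51470 | **4937** | 0 | **1187** |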
Table 3—source data 1

Compressed tar archive (zip format) containing example codeml control files, codon alignments (phylip format), and tree files (newick format) used for site model tests of adaptive evolution.

https://doi.org/10.7554/eLife.41043.023
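Each LRT statistic in Table 3 is twice the difference in log-likelihood between the nested models, compared against a chi-square distribution with the listed degrees of freedom. The sketch below computes both pieces with the Python standard library, using the closed forms of the chi-square upper tail for 1 and 2 degrees of freedom; the function names are hypothetical and this is not part of codeml. Recomputing gyrB M0 vs. M1 from the printed log-likelihoods gives 9262 rather than the tabulated 9263, presumably because the log-likelihoods are shown rounded. The branch-site tests in Tables 4 and 5 use the same statistic with 1 degree of freedom.

```python
import math


def lrt_statistic(lnl_null: float, lnl_alt: float) -> float:
    """Likelihood ratio test statistic, 2 * (lnL_alternative - lnL_null)."""
    return 2.0 * (lnl_alt - lnl_null)


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail chi-square probability; closed forms exist for df = 1 and df = 2."""
    if x <= 0:
        return 1.0
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        return math.exp(-x / 2.0)
    raise ValueError("this sketch only implements df = 1 or df = 2")


# gyrB, M0 vs. M1 (df = 1), using the rounded log-likelihoods from Table 3.
stat = lrt_statistic(-105998, -101367)
print(stat, chi2_sf(stat, df=1) < 0.001)

# amtB, M1 vs. M2 (df = 2).
print(chi2_sf(lrt_statistic(-68351, -68234), df=2) < 0.001)
```

The critical values at p = 0.001 are about 10.83 (1 df) and 13.82 (2 df), so every nonzero statistic in Table 3 clears the threshold.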
Table 4
Branch-site model tests for adaptive evolution among codon sites for the foreground HLII branch.

Background lineages include Synechococcus and all other Prochlorococcus. LRT statistics in bold exceed the chi-square critical value for a significance level of 0.001 with 1 degree of freedom.

https://doi.org/10.7554/eLife.41043.024
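| Gene | Hypothesis | Log-likelihood | Parameter | Site class 0 | Site class 1 | Site class 2a | Site class 2b | LRT statistic |
|---|---|---|---|---|---|---|---|---|
| gyrB | H0 | −104105 | proportion of sites | 0.98759 | 0.00138 | 0.01102 | 0.00002 | **6074** |
| | | | background dN/dS | 0 | 1 | 0 | 1 | |
| | | | foreground dN/dS | 0 | 1 | 1 | 1 | |
| | H1 | −101068 | proportion of sites | 0.99666 | 0.00177 | 0.00157 | 0 | |
| | | | background dN/dS | 0.00908 | 1 | 0.00908 | 1 | |
| | | | foreground dN/dS | 0.00908 | 1 | 1 | 1 | |
| pstB | H0 | −39463 | proportion of sites | 0.99801 | 0.00161 | 0.00038 | 0 | 0 |
| | | | background dN/dS | 0.03255 | 1 | 0.03255 | 1 | |
| | | | foreground dN/dS | 0.03255 | 1 | 1 | 1 | |
| | H1 | −39463 | proportion of sites | 0.99801 | 0.00161 | 0.00038 | 0 | |
| | | | background dN/dS | 0.03255 | 1 | 0.03255 | 1 | |
| | | | foreground dN/dS | 0.03255 | 1 | 1 | 1 | |
| amtB | H0 | −68231 | proportion of sites | 0.9926 | 0.00663 | 0.00076 | 0.00001 | −2 |
| | | | background dN/dS | 0.02052 | 1 | 0.02052 | 1 | |
| | | | foreground dN/dS | 0.02052 | 1 | 1 | 1 | |
| | H1 | −68232 | proportion of sites | 0.99288 | 0.00657 | 0.00054 | 0 | |
| | | | background dN/dS | 0.0206 | 1 | 0.0206 | 1 | |
| | | | foreground dN/dS | 0.0206 | 1 | 2.17216 | 2.17216 | |
| glnA | H0 | −65896 | proportion of sites | 0.99839 | 0.00144 | 0.00018 | 0 | 0 |
| | | | background dN/dS | 0.01722 | 1 | 0.01722 | 1 | |
| | | | foreground dN/dS | 0.01722 | 1 | 1 | 1 | |
| | H1 | −65896 | proportion of sites | 0.99839 | 0.00144 | 0.00018 | 0 | |
| | | | background dN/dS | 0.01722 | 1 | 0.01722 | 1 | |
| | | | foreground dN/dS | 0.01722 | 1 | 1 | 1 | |
| glsF | H0 | −249282 | proportion of sites | 0.99092 | 0.00679 | 0.00227 | 0.00002 | 0 |
| | | | background dN/dS | 0.0227 | 1 | 0.0227 | 1 | |
| | | | foreground dN/dS | 0.0227 | 1 | 1 | 1 | |
| | H1 | −249282 | proportion of sites | 0.99092 | 0.00679 | 0.00227 | 0.00002 | |
| | | | background dN/dS | 0.0227 | 1 | 0.0227 | 1 | |
| | | | foreground dN/dS | 0.0227 | 1 | 1 | 1 | |
| napA | H0 | −23009 | proportion of sites | 0.97966 | 0.01219 | 0.00805 | 0.0001 | 0 |
| | | | background dN/dS | 0.00928 | 1 | 0.00928 | 1 | |
| | | | foreground dN/dS | 0.00928 | 1 | 1 | 1 | |
| | H1 | −23009 | proportion of sites | 0.97966 | 0.01219 | 0.00805 | 0.0001 | |
| | | | background dN/dS | 0.00928 | 1 | 0.00928 | 1 | |
| | | | foreground dN/dS | 0.00928 | 1 | 1 | 1 | |
| narB | H0 | −38711 | proportion of sites | 0.94311 | 0.02164 | 0.03446 | 0.00079 | 0 |
| | | | background dN/dS | 0.02489 | 1 | 0.02489 | 1 | |
| | | | foreground dN/dS | 0.02489 | 1 | 1 | 1 | |
| | H1 | −38711 | proportion of sites | 0.94311 | 0.02164 | 0.03447 | 0.00079 | |
| | | | background dN/dS | 0.02489 | 1 | 0.02489 | 1 | |
| | | | foreground dN/dS | 0.02489 | 1 | 1 | 1 | |
| moaA | H0 | −22958 | proportion of sites | 0.94068 | 0.03375 | 0.02468 | 0.00089 | 0 |
| | | | background dN/dS | 0.03882 | 1 | 0.03882 | 1 | |
| | | | foreground dN/dS | 0.03882 | 1 | 1 | 1 | |
| | H1 | −22958 | proportion of sites | 0.94068 | 0.03375 | 0.02468 | 0.00089 | |
| | | | background dN/dS | 0.03882 | 1 | 0.03882 | 1 | |
| | | | foreground dN/dS | 0.03882 | 1 | 1 | 1 | |
| nirA | H0 | −52348 | proportion of sites | 0.97996 | 0.01302 | 0.00693 | 0.00009 | **14** |
| | | | background dN/dS | 0.04327 | 1 | 0.04327 | 1 | |
| | | | foreground dN/dS | 0.04327 | 1 | 1 | 1 | |
| | H1 | −52341 | proportion of sites | 0.97254 | 0.0124 | 0.01487 | 0.00019 | |
| | | | background dN/dS | 0.04293 | 1 | 0.04293 | 1 | |
| | | | foreground dN/dS | 0.04293 | 1 | 1 | 1 | |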
Table 4—source data 1

Compressed tar archive (zip format) containing example codeml control files, codon alignments (phylip format), and tree files (newick format) used for branch-site model tests of adaptive evolution in the HLII clade.

https://doi.org/10.7554/eLife.41043.025
Table 5
Branch-site model tests for adaptive evolution among codon sites for the foreground LLI branch.

Background lineages include Synechococcus and all other Prochlorococcus. LRT statistics in bold exceed the chi-square critical value for a significance level of 0.001 with 1 degree of freedom.

https://doi.org/10.7554/eLife.41043.026
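| Gene | Hypothesis | Log-likelihood | Parameter | Site class 0 | Site class 1 | Site class 2a | Site class 2b | LRT statistic |
|---|---|---|---|---|---|---|---|---|
| gyrB | H0 | −101367 | proportion of sites | 0.99813 | 0.00187 | 0 | 0 | **22** |
| | | | background dN/dS | 0.01062 | 1 | 0.01062 | 1 | |
| | | | foreground dN/dS | 0.01062 | 1 | 1 | 1 | |
| | H1 | −101356 | proportion of sites | 0.99768 | 0.00186 | 0.00046 | 0 | |
| | | | background dN/dS | 0.01053 | 1 | 0.01053 | 1 | |
| | | | foreground dN/dS | 0.01053 | 1 | 1 | 1 | |
| pstB | H0 | −39407 | proportion of sites | 0.98046 | 0.00147 | 0.01805 | 0.00003 | 0 |
| | | | background dN/dS | 0.03172 | 1 | 0.03172 | 1 | |
| | | | foreground dN/dS | 0.03172 | 1 | 1 | 1 | |
| | H1 | −39407 | proportion of sites | 0.98046 | 0.00147 | 0.01805 | 0.00003 | |
| | | | background dN/dS | 0.03172 | 1 | 0.03172 | 1 | |
| | | | foreground dN/dS | 0.03172 | 1 | 1 | 1 | |
| amtB | H0 | −68211 | proportion of sites | 0.99157 | 0.00673 | 0.00169 | 0.00001 | 0 |
| | | | background dN/dS | 0.01992 | 1 | 0.01992 | 1 | |
| | | | foreground dN/dS | 0.01992 | 1 | 1 | 1 | |
| | H1 | −68211 | proportion of sites | 0.9918 | 0.00677 | 0.00143 | 0.00001 | |
| | | | background dN/dS | 0.01992 | 1 | 0.01992 | 1 | |
| | | | foreground dN/dS | 0.01992 | 1 | 1 | 1 | |
| glnA | H0 | −65885 | proportion of sites | 0.99799 | 0.00141 | 0.0006 | 0 | 0 |
| | | | background dN/dS | 0.01727 | 1 | 0.01727 | 1 | |
| | | | foreground dN/dS | 0.01727 | 1 | 1 | 1 | |
| | H1 | −65885 | proportion of sites | 0.99799 | 0.00141 | 0.0006 | 0 | |
| | | | background dN/dS | 0.01727 | 1 | 0.01727 | 1 | |
| | | | foreground dN/dS | 0.01727 | 1 | 1 | 1 | |
| glsF | H0 | −249371 | proportion of sites | 0.98865 | 0.00686 | 0.00445 | 0.00003 | 0 |
| | | | background dN/dS | 0.02306 | 1 | 0.02306 | 1 | |
| | | | foreground dN/dS | 0.02306 | 1 | 1 | 1 | |
| | H1 | −249371 | proportion of sites | 0.98865 | 0.00686 | 0.00445 | 0.00003 | |
| | | | background dN/dS | 0.02306 | 1 | 0.02306 | 1 | |
| | | | foreground dN/dS | 0.02306 | 1 | 1 | 1 | |
| napA | H0 | −23014 | proportion of sites | 0.97128 | 0.01248 | 0.01604 | 0.00021 | 0 |
| | | | background dN/dS | 0.01004 | 1 | 0.01004 | 1 | |
| | | | foreground dN/dS | 0.01004 | 1 | 1 | 1 | |
| | H1 | −23014 | proportion of sites | 0.97122 | 0.01246 | 0.01611 | 0.00021 | |
| | | | background dN/dS | 0.01004 | 1 | 0.01004 | 1 | |
| | | | foreground dN/dS | 0.01004 | 1 | 1 | 1 | |
| narB | H0 | −38790 | proportion of sites | 0.97596 | 0.02404 | 0 | 0 | **40** |
| | | | background dN/dS | 0.02939 | 1 | 0.02939 | 1 | |
| | | | foreground dN/dS | 0.02939 | 1 | 1 | 1 | |
| | H1 | −38770 | proportion of sites | 0.9577 | 0.02346 | 0.01839 | 0.00045 | |
| | | | background dN/dS | 0.02762 | 1 | 0.02762 | 1 | |
| | | | foreground dN/dS | 0.02762 | 1 | 1 | 1 | |
| moaA | H0 | −22961 | proportion of sites | 0.90692 | 0.03565 | 0.05526 | 0.00217 | 0 |
| | | | background dN/dS | 0.03704 | 1 | 0.03704 | 1 | |
| | | | foreground dN/dS | 0.03704 | 1 | 1 | 1 | |
| | H1 | −22961 | proportion of sites | 0.90692 | 0.03565 | 0.05526 | 0.00217 | |
| | | | background dN/dS | 0.03704 | 1 | 0.03704 | 1 | |
| | | | foreground dN/dS | 0.03704 | 1 | 1 | 1 | |
| focA | H0 | −17695 | proportion of sites | 0.97605 | 0.01411 | 0.00969 | 0.00014 | 0 |
| | | | background dN/dS | 0.01425 | 1 | 0.01425 | 1 | |
| | | | foreground dN/dS | 0.01425 | 1 | 1 | 1 | |
| | H1 | −17695 | proportion of sites | 0.97605 | 0.01411 | 0.0097 | 0.00014 | |
| | | | background dN/dS | 0.01425 | 1 | 0.01425 | 1 | |
| | | | foreground dN/dS | 0.01425 | 1 | 1 | 1 | |
| nirA type I | H0 | −52349 | proportion of sites | 0.97292 | 0.01243 | 0.01446 | 0.00018 | 0 |
| | | | background dN/dS | 0.04333 | 1 | 0.04333 | 1 | |
| | | | foreground dN/dS | 0.04333 | 1 | 1 | 1 | |
| | H1 | −52349 | proportion of sites | 0.97292 | 0.01243 | 0.01446 | 0.00018 | |
| | | | background dN/dS | 0.04333 | 1 | 0.04333 | 1 | |
| | | | foreground dN/dS | 0.04333 | 1 | 1 | 1 | |
| nirA type II | H0 | −52358 | proportion of sites | 0.97139 | 0.01216 | 0.01625 | 0.0002 | 0 |
| | | | background dN/dS | 0.0442 | 1 | 0.0442 | 1 | |
| | | | foreground dN/dS | 0.0442 | 1 | 1 | 1 | |
| | H1 | −52358 | proportion of sites | 0.97139 | 0.01216 | 0.01625 | 0.0002 | |
| | | | background dN/dS | 0.0442 | 1 | 0.0442 | 1 | |
| | | | foreground dN/dS | 0.0442 | 1 | 1 | 1 | |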
Table 5—source data 1

Compressed tar archive (zip format) containing example codeml control files, codon alignments (phylip format), and tree files (newick format) used for branch-site model tests of adaptive evolution in the LLI clade.

https://doi.org/10.7554/eLife.41043.027

Additional files

Supplementary file 1

Tab-delimited table containing the genomes and associated IMG accession numbers used in the final data set.

The Clade column indicates the respective Synechococcus subcluster or Prochlorococcus clade. Results from the narB PCR screening assay are given as binary values (0 = negative; 1 = positive; n.d. = not determined). Estimated genome recovery was determined using CheckM (Parks et al., 2015). The presence or absence of annotated reductase and transporter genes for nitrite (nirA and focA) and nitrate (narB and napA) in each genome assembly is likewise given as binary values (0 = absent; 1 = present).

https://doi.org/10.7554/eLife.41043.031
Transparent reporting form
https://doi.org/10.7554/eLife.41043.032

Cite this article

Berube PM, Rasmussen A, Braakman R, Stepanauskas R, Chisholm SW (2019) Emergence of trait variability through the lens of nitrogen assimilation in Prochlorococcus. eLife 8:e41043. https://doi.org/10.7554/eLife.41043