
The landscape of coadaptation in Vibrio parahaemolyticus

  1. Yujun Cui (corresponding author)
  2. Chao Yang
  3. Hongling Qiu
  4. Hui Wang
  5. Ruifu Yang
  6. Daniel Falush (corresponding author)

  1. State Key Laboratory of Pathogen and Biosecurity, Beijing Institute of Microbiology and Epidemiology, China
  2. Shenzhen Centre for Disease Control and Prevention, China
  3. School of Public Health, Shanghai Jiao Tong University School of Medicine, China
  4. Institute for Nutritional Sciences, Chinese Academy of Sciences, China
  5. The Center for Microbes, Development and Health, Key Laboratory of Molecular Virology and Immunology, Institut Pasteur of Shanghai, Chinese Academy of Sciences, China
Research Article
Cite this article as: eLife 2020;9:e54136 doi: 10.7554/eLife.54136
9 figures, 3 tables and 3 additional files

Figures

Figure 1 with 1 supplement
Neighbor-joining (NJ) trees and principal coordinate analysis (PCoA) of the 469-strain non-redundant dataset and the 198-strain discovery dataset.

(a, c) NJ trees of strains from the two datasets. Red branches indicate strains in the discovery dataset. The colored circles indicate populations (inner) and ecogroups (outer) according to the legend on the left. (b, d) PCoA of strains from the two datasets based on core SNPs and accessory genes. Colors of points indicate the populations and ecogroups according to the legend on the right. Two distinct genotype clusters separated by O-3 variants are highlighted with dashed ellipses.

Figure 1—figure supplement 1
fineSTRUCTURE showing no clonal frame in the discovery dataset of 198 VppAsia isolates.

Coancestry matrix of 201 strains, comprising the discovery dataset of 198 strains and one isolate each from VppX, VppUS1 and VppUS2. The color of each cell indicates the expected number of chunks imported from a donor (column) to a recipient (row).

Figure 2
Frequency distribution of Fisher exact test P values between genetic variants.

Colors and shapes indicate the interactions between different types of variants. The vertical dotted line shows the threshold p = 10^-10.
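
The pairwise test behind this distribution can be illustrated with a minimal sketch. It assumes each variant is coded as a 0/1 vector across strains (SNP allele, or accessory gene absence/presence) and computes a two-sided Fisher exact test P value from the resulting 2x2 table using only the Python standard library. The function names, the toy vectors and the two-sided choice are assumptions made for illustration, not the authors' pipeline; only the 10^-10 threshold comes from the caption.

```python
from math import comb

P_THRESHOLD = 1e-10  # threshold stated in the caption


def contingency(x, y):
    """Count strains by joint state of two binary variants (hypothetical helper)."""
    a = sum(1 for i, j in zip(x, y) if i == 1 and j == 1)
    b = sum(1 for i, j in zip(x, y) if i == 1 and j == 0)
    c = sum(1 for i, j in zip(x, y) if i == 0 and j == 1)
    d = sum(1 for i, j in zip(x, y) if i == 0 and j == 0)
    return a, b, c, d


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact test P value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(k):
        # Hypergeometric probability of k strains in the top-left cell.
        return comb(row1, k) * comb(row2, col1 - k) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum all tables that are no more probable than the observed one.
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))


# Toy example: two variants in perfect linkage across 198 strains.
x = [1] * 40 + [0] * 158
y = list(x)
p_linked = fisher_exact_two_sided(*contingency(x, y))
is_candidate = p_linked < P_THRESHOLD
```

Exact integer binomials avoid underflow in the intermediate terms, which matters when P values of interest lie far below 10^-10.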

Figure 3 with 3 supplements
The largest interaction group (IG1) in V. parahaemolyticus.

(a, b) Hierarchical clustering of 469 non-redundant strains (columns) based on coadapted loci (rows) of IG1. Heatmap colors indicate the status of genetic variants, as shown in the legend (bottom right). Background colors of the upper and left clustering trees indicate ecogroups (EGs) and tiers, respectively, matching the colors in panels b-d. Colored bars below the upper clustering tree indicate the populations of strains. SNPs in lateral flagellar genes are marked by grey bars on the left of the heatmap. Panel b is a zoomed-in view of specific tiers in panel a. (c) Distribution of coadaptation SNPs in the lateral flagellar gene cluster region (VPA1538-1557). The top shows the gene organization of the lateral flagellar gene cluster. Light orange rectangles indicate regions in the accessory genome. The histograms show the distribution of SNPs along the gene cluster, with bar colors indicating coadaptation tiers. (d) Coadaptation blocks of IG1, shown in their genomic locations. Four different reference genomes were used, as indicated at the top of each panel, since no single genome contains all of the accessory genome variants. Grey horizontal arrows indicate core genes. Accessory genes are colored according to their tier, or are tan if they do not belong to one. Vertical colored bars within grey arrows indicate core SNPs in IG1, with colors indicating different coadaptation tiers. COG classification labels are shown above the genes. The numerical labels above the reference genomes identify the coadaptation genome blocks, corresponding to the information in Supplementary file 1. Colors of the curves at the bottom left indicate the average copying probability (the probability that genetic variants are inherited from the same ecogroup) of different EGs, calculated using chromosome painting.

Figure 3—figure supplement 1
Hierarchical clustering of 469 non-redundant strains (columns) based on different subsets of IG1 variants (rows).

(a) Core SNPs. (b) Accessory genes. (c) SNPs of the lateral flagellar gene cluster. Heatmap colors indicate the status of genetic variants. Colored bars below the upper clustering tree indicate the populations and ecogroups of strains, and colored bars on the left indicate different coadaptation tiers, as shown in the legend (bottom).

Figure 3—figure supplement 2
Neighbor-joining trees (top) and average copy probability value distributions (bottom) of geographical populations (a) and semi-clonal group (SCG) strains (b) of V. parahaemolyticus.

Branch colors of the trees indicate populations (a) and SCGs (b), respectively. Horizontal dotted line indicates the expected average copy probability value. Different chromosomes are separated by a vertical dotted line.

Figure 3—figure supplement 3
NJ tree of the lateral flagellar gene cluster region (VPA1538-1557) in the genus Vibrio.

Three randomly selected strains of EG1a and NonEG1a are used and marked in blue and red, respectively.

Figure 4
Phenotypes of strains from different EGs.

(a) Swimming (top) and swarming (bottom) ability of strains in different ecogroups. Motility was measured as the colony diameter on swimming and swarming plates. (b) Growth curves of strains in different ecogroups. Each curve shows the average optical density at 600 nm (OD600) of five replicates; vertical lines indicate the standard deviation. (c) Biofilm formation (top) and colony morphology (bottom). Biofilm formation was measured by OD595 values. Colony morphologies of strains at different salinities are shown at the bottom. In panels a-c, five replicates were performed for each strain, and colors indicate different ecogroups.

Figure 5 with 1 supplement
Characteristics of detected interaction groups.

(a) COG classification and GC content of all analyzed genes (top panel) and of different types of coadapted genes (lower panels). Red indicates core genes and blue indicates accessory genes. The first number in brackets is the number of genes with COG annotation and the second is the total number of genes in the category. (b) Gene maps of different IGs. The color of the bar on the left indicates the average linkage strength of the loci in each IG. Arrows indicate genes. Accessory genes are shown in blue and genes with no coadaptation signal are shown in orange. Core genes containing coadapted SNPs are shown in red, with the SNPs marked by black vertical lines. Vertical dotted lines separate compatible genes that are more than 3 kb apart, or genes located on different contigs, chromosomes or strains. Dotted rectangles indicate incompatible genes. IGs with a genome block longer than 60 kb are broken by a double slash and shown zoomed out in (c). COG classification labels are shown above the genes.

Figure 5—figure supplement 1
Clustering of 469 strains based on variations of IG2-90.

Hierarchical clustering of strains (columns) based on coadaptation loci (rows) of all IGs except IG1, 2, 3, 14 and 42, which are shown in the main-text figures. Heatmap colors indicate the status of genetic variants, and colored bars below the upper clustering tree indicate the populations of strains, as shown in the legend.

Figure 6
Four representative interaction groups.

Hierarchical clustering of 469 non-redundant strains (columns) based on coadaptation loci (rows) of four representative IGs. Heatmap colors indicate the status of genetic variants, with light orange/orange for the two alleles of a SNP and light yellow/brown for absence/presence of an accessory gene. Bar colors below the upper tree indicate the populations of strains according to the legend. A summary of the function of the genes involved is shown at the top of each heatmap. Arcs on the right indicate the causal links retained after ARACNE filtering; the color and width of each arc scale with the P value.
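
The ARACNE filtering mentioned in this caption rests on the data processing inequality: within any fully connected triplet of loci, the weakest link is treated as indirect and removed. A minimal sketch of that pruning step follows. It assumes link strength is a positive score in which larger means stronger (for example -log10 of the P value); the edge representation, the `tolerance` parameter and the toy scores are hypothetical and are not taken from the article.

```python
from itertools import combinations


def dpi_filter(edges, tolerance=0.0):
    """Prune the weakest edge of every triangle (data processing inequality).

    edges maps frozenset({u, v}) to a strength score (assumed: larger = stronger).
    """
    nodes = sorted({n for pair in edges for n in pair})
    to_drop = set()
    for i, j, k in combinations(nodes, 3):
        triangle = [frozenset(p) for p in ((i, j), (i, k), (j, k))]
        if not all(e in edges for e in triangle):
            continue  # only fully connected triplets are examined
        weakest = min(triangle, key=edges.get)
        strongest_two = [edges[e] for e in triangle if e != weakest]
        # Mark the edge only if it is clearly weaker than both other edges.
        if edges[weakest] < min(strongest_two) * (1 - tolerance):
            to_drop.add(weakest)
    # Removal happens after all triplets are scanned, so the outcome does not
    # depend on the order in which triangles are visited.
    return {e: w for e, w in edges.items() if e not in to_drop}


# Toy network: locus A-C is the weakest side of the triangle and is removed.
toy_edges = {
    frozenset({"A", "B"}): 30.0,
    frozenset({"B", "C"}): 25.0,
    frozenset({"A", "C"}): 12.0,
}
filtered = dpi_filter(toy_edges)
```

A non-zero `tolerance` keeps edges whose strength is close to the others in the triangle, which makes the pruning less aggressive when scores are noisy.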

Figure 7 with 1 supplement
Comparison with other epistasis detection methods.

(a) Correlation between Fisher exact test P value and SuperDCA coupling strength (red and blue for ecogroup 1a (EG1a) and non-EG1a SNPs, respectively), and between Fisher exact test P value and SpydrPick mutual information (green). (b) Overlap of strongly linked SNP sites detected by the Fisher exact test (p < 10^-10, excluding EG1a SNPs) and SuperDCA (coupling strength > 10^-2.2). Red indicates interacting SNP pairs detected by both methods; blue indicates SNP pairs detected only by the Fisher exact test.

Figure 7—figure supplement 1
EG1a interactions detected only by the Fisher exact test.

Figure 8
Conceptual model of four stages of coadaptation, analogous to human relationships.

Circles indicate bacterial strains within a population. Red and green stars indicate the two alleles of a SNP. Red and green rectangles indicate different accessory genes or genome islands. Blue arrows indicate transitions between different SNP alleles, or gain/loss of genes and genome islands. Black arrows indicate the transitions between different stages, or cycles within a stage. (a) Casual: frequent gene flow generates multiple combinations of different variants within a population. Some combinations might have high fitness, but alternative combinations arise frequently due to gene flux and adaptation at individual loci. (b) Going steady: coadapted interactions, such as those between SNPs (red star) and accessory genes (red rectangles), become more difficult to dislodge despite ongoing genetic exchange, owing to their high fitness when present in combination. (c) Married: coadapted interactions become fixed in the population and lead to further coadaptation in multiple genome regions. (d) Setting up home together: as the coadapted regions of the genome progressively enlarge, the entire genome becomes differentiated, which gives rise to barriers (horizontal line) to genetic exchange between different ecogroups.

Author response image 1
Maximum likelihood tree of 469 non-redundant strains based on 53 concatenated ribosomal gene nucleotide sequences.

Red branches indicate strains in the discovery dataset. The colored circles indicate populations (inner) and ecogroups (outer) according to the legend.

Tables

Table 1
Summary of interactions detected in coadaptation screen.
Item | Total number | IG1 number | IG1 fraction | IG2-90 number | IG2-90 fraction
SNP-SNP pair | 2.3 × 10^10 | 289186 | 0.00% | 22751 | 0.00%
SNP-Accessory gene pair | 2.2 × 10^9 | 113973 | 0.01% | 1188 | 0.00%
Accessory gene-gene pair | 2.1 × 10^8 | 13487 | 0.01% | 12264 | 0.01%
SNP | 151957 | 1540 | 1.01% | 333 | 0.22%
Synonymous (Syn) | 117541 | 1084 | 0.92% | 226 | 0.19%
Nonsynonymous (NonSyn) | 23673 | 379 | 1.60% | 107 | 0.45%
NonSyn/Syn | 0.2 | 0.35 | | 0.47 |
Core gene | 3936 | 82 | 2.25% | 18 | 0.56%
Accessory gene | 14486 | 338 | 2.33% | 1122 | 7.75%
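
The fraction columns and the NonSyn/Syn row of Table 1 follow from simple arithmetic on the counts. The sketch below recomputes them for a subset of rows, assuming each fraction is the interaction-group count divided by the total count in the same row. The dictionary layout and function name are illustrative choices; the counts are copied from the table.

```python
# Counts from Table 1, as (total, IG1, IG2-90); only a subset of rows is used.
counts = {
    "SNP": (151957, 1540, 333),
    "Synonymous": (117541, 1084, 226),
    "Nonsynonymous": (23673, 379, 107),
    "Accessory gene": (14486, 338, 1122),
}


def percent(part, total):
    """Share of `part` in `total`, as a percentage rounded to two decimals."""
    return round(100 * part / total, 2)


# Fraction of each variant class that falls in IG1 and in IG2-90.
fractions = {
    name: (percent(ig1, total), percent(ig2_90, total))
    for name, (total, ig1, ig2_90) in counts.items()
}

# NonSyn/Syn ratio for all SNPs, IG1 SNPs and IG2-90 SNPs, in that order.
nonsyn_syn = tuple(
    round(nonsyn / syn, 2)
    for nonsyn, syn in zip(counts["Nonsynonymous"], counts["Synonymous"])
)
```

With these inputs, the ratio rises from 0.2 across all SNPs to 0.35 in IG1 and 0.47 in IG2-90, matching the NonSyn/Syn row.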
Table 2
Summary of interaction group 1 variants.
Tier | Core SNPs | Accessory genes
a-T1 | 520 SNPs (359 Syn, 137 NonSyn) in 21 genes of 8 blocks, 14 genes encoding lateral flagellar | 66 genes in 10 blocks, 11 COG M genes, 5 T2SS genes
a-T2 | 121 SNPs (68 Syn, 43 NonSyn) in 18 genes of 10 blocks, 10 genes encoding lateral flagellar | 22 genes in 10 blocks, 2 COG M genes, 3 T2SS genes
a-T3 | 190 SNPs (137 Syn, 38 NonSyn) in 44 genes of 18 blocks, 17 genes encoding lateral flagellar | 46 genes in 13 blocks, 5 COG M genes, 3 COG NU genes, 4 T2SS genes
b-T1 | 12 SNPs (5 Syn, 2 NonSyn) in 3 genes of 2 blocks, 2 genes encoding lateral flagellar | 33 genes in 2 blocks, 12 COG M genes
b-T2 | 25 SNPs (15 Syn, 10 NonSyn) in 3 genes of 1 block encoding lateral flagellar | 8 genes in 2 blocks, 1 COG M gene
d-T1 | 5 SNPs (3 Syn, 2 NonSyn) in 1 gene encoding transmembrane | 31 genes in 5 blocks, 23 genes (1 block) encoding T6SS
d-T2 | 0 SNP | 14 genes in 4 blocks, 8 genes (1 block) encoding cellulose synthase
O-1 | 36 SNPs (28 Syn, 10 NonSyn) in 3 genes of 2 blocks, 2 genes encoding lateral flagellar, 1 TonB gene | 6 genes in 4 blocks, 1 COG H gene
O-2 | 27 SNPs (18 Syn, 1 NonSyn) in 3 genes in 1 block, 2 genes encoding LuxR family transcriptional regulator | 2 genes in 1 block, COG M and T
O-3 | 368 SNPs (269 Syn, 97 NonSyn) in 4 genes of 2 blocks, encoding multidrug resistance protein, lipase and long-chain fatty acid transport protein | 0 gene
O-4 | 0 SNP | 7 genes in 5 blocks, 4 genes encoding transferase
Others | 236 SNPs (182 Syn, 41 NonSyn) in 48 genes, 8 genes encoding lateral flagellar | 102 genes, 15 COG M genes
Key resources table
Reagent type (species) or resource | Designation | Source or reference | Identifiers | Additional information
Software, algorithm | fineSTRUCTURE | http://paintmychromosomes.com https://doi.org/10.1371/journal.pgen.1002453 | SCR_018170 |
Software, algorithm | MUMmer | http://mummer.sourceforge.net/ https://doi.org/10.1186/gb-2004-5-2-r12 | SCR_018171 |
Software, algorithm | Prokka | https://github.com/tseemann/prokka https://doi.org/10.1093/bioinformatics/btu153 | SCR_014732 |
Software, algorithm | Roary | https://sanger-pathogens.github.io/Roary/ https://doi.org/10.1093/bioinformatics/btv421 | SCR_018172 |
Software, algorithm | TreeBest | http://treesoft.sourceforge.net/treebest.shtml | SCR_018173 |
Software, algorithm | iTOL | https://itol.embl.de/ https://doi.org/10.1093/nar/gkw290 | SCR_018174 |
Software, algorithm | SuperDCA | https://github.com/santeripuranen/SuperDCA https://doi.org/10.1099/mgen.0.000184 | SCR_018175 |
Software, algorithm | SpydrPick | https://github.com/santeripuranen/SpydrPick https://doi.org/10.1093/nar/gkz656 | SCR_018176 |
Software, algorithm | Circos | http://circos.ca/ https://doi.org/10.1101/gr.092759.109 | SCR_011798 |
