Evolution of host-microbe cell adherence by receptor domain shuffling

  1. EmilyClare P Baker
  2. Ryan Sayegh
  3. Kristin M Kohler
  4. Wyatt Borman
  5. Claire K Goodfellow
  6. Eden R Brush
  7. Matthew F Barber (corresponding author)
  1. Institute of Ecology and Evolution, University of Oregon, United States
  2. Department of Biology, University of Oregon, United States
7 figures, 1 table and 2 additional files

Figures

Figure 1
Interactions between epithelial carcinoembryonic antigen-related cell adhesion molecules (CEACAMs) and bacterial adhesins.

Bacterial attachment to host cells via adhesin proteins (purple) facilitates epithelial adherence. Adhesins also contribute to pathogenicity by promoting invasion, modulating host cell signaling pathways, and promoting the delivery of virulence factors into the host cell cytoplasm.

Figure 2 with 1 supplement
Rapid evolution of primate carcinoembryonic antigen-related cell adhesion molecule (CEACAM) N-domains.

(A) Sites in CEACAM proteins exhibiting elevated ω. The domain structure of each CEACAM is outlined in red (N-domain), light gray (IgC-like domains), dark gray (transmembrane domain), and black (cytoplasmic domain). All rapidly evolving sites identified by at least one phylogenetic analysis (PAML, FUBAR, or MEME) are marked by a white line; sites identified by two or three tests are marked by gray and red asterisks, respectively. The blue line shows the proportion of rapidly evolving sites identified across a 10 amino acid sliding window. (B) Multiple sequence alignment of hominid CEACAM1 residues 26–98. Sites identified as evolving under positive selection and sites known to influence adhesin and host protein binding are highlighted (Figure 2—source data 1F). (C) Protein co-crystal structure of human CEACAM1 (gray) and the HopQ adhesin (purple) from Helicobacter pylori strain G27 (PDB ID: 6GBG). CEACAM1 sites identified as evolving under positive selection by two or more tests are highlighted.
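The two summaries described in panel A (how many tests support each site, and the proportion of flagged sites in a 10 amino acid sliding window) can be illustrated with a minimal Python sketch. This is not the authors' code, which is provided in Figure 2—source code 1. The function names, the toy site lists, and the choice to report only full-length windows are assumptions made for illustration.

```python
from collections import Counter


def support_counts(sites_by_test):
    """Map each site (1-based residue number) to the number of tests that flagged it."""
    counts = Counter()
    for sites in sites_by_test.values():
        counts.update(set(sites))  # set() guards against duplicate entries per test
    return counts


def sliding_proportion(flagged, length, window=10):
    """Proportion of flagged sites in every full window of `window` residues.

    Element i of the result covers residues i+1 .. i+window (1-based).
    """
    is_flagged = [1 if pos in flagged else 0 for pos in range(1, length + 1)]
    return [
        sum(is_flagged[i:i + window]) / window
        for i in range(length - window + 1)
    ]


# Hypothetical toy input: sites flagged by each test in a 25-residue protein.
toy_sites = {"PAML": [3, 5, 12], "FUBAR": [5, 12, 14], "MEME": [5, 20]}
counts = support_counts(toy_sites)
proportions = sliding_proportion(set(counts), length=25, window=10)
```

In this toy input, site 5 is supported by all three tests and site 12 by two, which corresponds to the red and gray asterisk categories in the figure.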

Figure 2—source code 1

Code to generate graphs and images for Figure 2A.

https://cdn.elifesciences.org/articles/73330/elife-73330-fig2-code1-v3.zip
Figure 2—source data 1

(a) Summary of primate carcinoembryonic antigen-related cell adhesion molecule (CEACAM) sequences used in analyses.

Table summarizing primate CEACAM sequences extracted for evolutionary analyses and phylogenetic reconstructions. (b) Summary of primate CEACAM identification. Table summarizing BLAST results, genome annotation, and sequence analyses used to identify human CEACAM orthologs in primates. (c) Additional notes on primate CEACAM identification. Table of additional notes on CEACAM sequences used in analyses. (d) PAML NS sites results summary. Table of PAML NS sites tests of selection in primate CEACAMs. (e) Summary of sites identified by evolutionary analyses. Table of sites identified as evolving under positive selection by evolutionary analyses and GARD predicted recombination breakpoints. (f) References for CEACAM1 binding sites. Table of references for sites identified as contributing to CEACAM1 binding with host proteins and bacterial adhesins as well as the specific sites identified.

https://cdn.elifesciences.org/articles/73330/elife-73330-fig2-data1-v3.xlsx
Figure 2—source data 2

Trimmed carcinoembryonic antigen-related cell adhesion molecule (CEACAM) sequences and primate species trees used for evolutionary analyses.

https://cdn.elifesciences.org/articles/73330/elife-73330-fig2-data2-v3.zip
Figure 2—source data 3

Results files for evolutionary analyses.

https://cdn.elifesciences.org/articles/73330/elife-73330-fig2-data3-v3.zip
Figure 2—figure supplement 1
Primate carcinoembryonic antigen-related cell adhesion molecule (CEACAM) evolutionary analysis summary.

Sites with elevated dN/dS in all human CEACAM proteins. (A) Sites in CEACAM proteins identified as evolving rapidly in specific domains by one (white line), two (gray asterisks), or three (red asterisks) evolutionary analyses. Dotted blue line indicates the proportion of sites identified as evolving rapidly across a 10 amino acid sliding window. Open triangles show GARD predictions of the approximate locations of recombination breakpoints. (B) Location of human CEACAM genes along chromosome 19. Other genes on chromosome 19 not shown.

Figure 3 with 1 supplement
Carcinoembryonic antigen-related cell adhesion molecule 1 (CEACAM1) divergence in great apes restricts bacterial adhesin recognition.

(A) Binding between primate GFP-tagged CEACAM1 N-domain orthologs and bacteria determined by pulldown assays and visualized by western blotting. Input is 10% of the CEACAM1 protein used in bacterial pulldowns. Primate species relationships are indicated by the phylogenetic tree. (B) Pulldown experiments of Helicobacter pylori strain G27 incubated with CEACAM1 N-domain constructs or GFP, assayed by flow cytometry. Binding is indicated by relative GFP fluorescence. Representative western blot and flow cytometry experiments are depicted. For flow cytometry, all tests shown were performed as part of a single experiment using H. pylori strain G27 alone as a negative control.

Figure 3—source data 1

Raw and labeled western blot images for Figure 3A and flow cytometry data for Figure 3B.

https://cdn.elifesciences.org/articles/73330/elife-73330-fig3-data1-v3.zip
Figure 3—figure supplement 1
Helicobacter pylori G27 Δhopq pulldown.

Binding assay to assess interactions between H. pylori strain G27 Δhopq and GFP-tagged CEACAM1 N-domain constructs for human, chimpanzee, and gorilla, by pulldown experiments and visualization by western blot.

Figure 4 with 6 supplements
Recurrent episodes of gene conversion among adhesin-binding carcinoembryonic antigen-related cell adhesion molecules (CEACAMs).

(A) Maximum likelihood-based phylogeny of full-length primate CEACAM protein-coding sequences. (B) Phylogeny of the IgV-like (N-domain) of primate CEACAM proteins. (C) Expanded cladogram view of the clade containing the N-domains of CEACAM1, CEACAM3, CEACAM5, and CEACAM6 from panel B. Arrows indicate nodes designating clades for Old World monkeys (OWM), hominoids (Hom), and New World monkeys (NWM). Specific subclades, gorilla CEACAM3 and CEACAM5, orangutan CEACAM5 and CEACAM1, and NWM are further magnified and highlighted with bootstrap support at nodes. (D) Domain structures of CEACAM proteins predicted to have undergone recombination by GARD analysis with sites of predicted breakpoints highlighted (blue arrows). CEACAM N-domains are denoted in red.

Figure 4—source code 1

Code to generate images for Figure 4D.

https://cdn.elifesciences.org/articles/73330/elife-73330-fig4-code1-v3.zip
Figure 4—source data 1

Sequence alignments of trimmed carcinoembryonic antigen-related cell adhesion molecule (CEACAM) sequences used for phylogenetic reconstructions.

https://cdn.elifesciences.org/articles/73330/elife-73330-fig4-data1-v3.zip
Figure 4—figure supplement 1
Alignment of human-pan carcinoembryonic antigen-related cell adhesion molecule (CEACAM) sequences.

Human, chimpanzee, and bonobo CEACAM1 (A) and CEACAM5 (B) alignments by MAFFT translation alignment implemented in Geneious Prime 2020.2.2. Black lines mark differences from consensus. Lower bars show location of CEACAM domains.

Figure 4—figure supplement 2
Expanded full-length carcinoembryonic antigen-related cell adhesion molecule (CEACAM) tree.

Maximum likelihood-based phylogeny of full-length CEACAM protein-coding sequences as represented in Figure 4A, with clades expanded. Clades encompassing individual CEACAM orthologs are shown isolated and expanded.

Figure 4—figure supplement 3
Expanded carcinoembryonic antigen-related cell adhesion molecule (CEACAM) N-domain tree.

Maximum likelihood-based phylogeny of CEACAM IgV-like (N-domain) sequences as represented in Figure 4B, with clades expanded. Clades encompassing individual CEACAM orthologs along with the CEACAM1, CEACAM3, CEACAM5, and CEACAM6 clade are shown isolated and expanded.

Figure 4—figure supplement 4
Expanded view of carcinoembryonic antigen-related cell adhesion molecule (CEACAM)1,3,5,6 N-domain clade.

Expanded view of CEACAM1, CEACAM3, CEACAM5, and CEACAM6 clade from Figure 4B.

Figure 4—figure supplement 5
Expanded carcinoembryonic antigen-related cell adhesion molecule (CEACAM) IgC domains tree.

Maximum likelihood-based phylogeny of CEACAM IgC-like domain sequences. Expanded view of CEACAM20 clade shown.

Figure 4—figure supplement 6
Expanded carcinoembryonic antigen-related cell adhesion molecule (CEACAM) cytoplasmic domain tree.

Maximum likelihood-based phylogeny of CEACAM cytoplasmic domain sequences. Clades encompassing individual CEACAM orthologs are shown isolated and expanded.

Figure 5 with 1 supplement
Rapid divergence of the bonobo carcinoembryonic antigen-related cell adhesion molecule 1 (CEACAM1) N-domain impairs bacterial adhesin recognition.

(A) Sliding-window plot (50 base pair window) of sequence identity between the bonobo CEACAM1 N-domain sequence and other CEACAM sequences. Asterisks mark locations of residues mutated for adhesin-binding assays. (B) Windows show amino acids and their structures at sites selected for mutational analysis in humans and bonobos. Lower right depicts a protein co-crystal structure of human CEACAM1 and Helicobacter pylori G27 HopQ with sites selected for mutagenesis highlighted. (C) Representative western blots of pulldown experiments assaying binding between chimeric human and bonobo CEACAM1 N-domain constructs and bacterial strains.
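The sliding-window identity comparison in panel A can be sketched as follows. This is an illustrative example rather than the authors' implementation. It assumes two pre-aligned nucleotide sequences of equal length, treats alignment gaps as mismatches, and uses a hypothetical function name and toy sequences.

```python
def sliding_identity(seq_a, seq_b, window=50, step=1):
    """Fraction of identical positions in each window of two aligned sequences.

    Assumption: a gap character ("-") in either sequence counts as a mismatch.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    matches = [
        a == b and a != "-"
        for a, b in zip(seq_a.upper(), seq_b.upper())
    ]
    return [
        sum(matches[i:i + window]) / window
        for i in range(0, len(matches) - window + 1, step)
    ]


# Hypothetical toy alignment: 100 bp with a 10 bp divergent block at 60-69.
toy_a = "A" * 100
toy_b = "A" * 60 + "C" * 10 + "A" * 30
identity = sliding_identity(toy_a, toy_b, window=50)
```

Windows that overlap the divergent block show reduced identity, which is the kind of local dip such a plot is meant to reveal.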

Figure 5—figure supplement 1
Alignment of rapidly evolving N-domain region in hominids.

Multiple sequence alignment of carcinoembryonic antigen-related cell adhesion molecule (CEACAM)1, CEACAM3, CEACAM5, and CEACAM8 orthologs for human, bonobo, chimpanzee, gorilla, and orangutan. Translation of each nucleotide sequence is positioned on the line below. Sites known to influence adhesin and host protein binding (Figure 2—source data 1F) are indicated as are sites identified as evolving under positive selection.

Figure 6 with 4 supplements
Abundant human carcinoembryonic antigen-related cell adhesion molecule 1 (CEACAM1) variants restrict pathogen binding.

(A) Frequency of haplotypes containing variants Q1K, A49V, and Q89H across human populations (map from BioRender.com). (B) CEACAM1 crystal structure highlighting high-frequency human variants and sites found to be evolving under positive selection across simian primates. (C) Representative western blots of pulldown experiments testing binding between combinations of high-frequency human variants in the human CEACAM1 reference background and bacterial strains.
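The per-population frequencies summarized in panel A can be computed as in the minimal sketch below. The authors' analysis code is in Figure 6—source code 1; this example only illustrates the calculation. It assumes each haplotype is recorded as a population label plus the set of variants it carries. The population labels and haplotypes shown are invented toy data.

```python
from collections import defaultdict


def variant_frequencies(haplotypes, variants=("Q1K", "A49V", "Q89H")):
    """Per-population fraction of haplotypes that carry each variant.

    `haplotypes` is an iterable of (population, set_of_variants) pairs.
    """
    totals = defaultdict(int)
    carriers = defaultdict(lambda: defaultdict(int))
    for population, carried in haplotypes:
        totals[population] += 1
        for variant in variants:
            if variant in carried:
                carriers[population][variant] += 1
    return {
        population: {v: carriers[population][v] / totals[population] for v in variants}
        for population in totals
    }


# Hypothetical toy data: population labels and haplotypes are illustrative only.
toy_haplotypes = [
    ("POP1", {"Q1K"}),
    ("POP1", {"Q1K", "A49V"}),
    ("POP1", set()),
    ("POP1", {"Q89H"}),
    ("POP2", set()),
    ("POP2", {"A49V"}),
]
frequencies = variant_frequencies(toy_haplotypes)
```

Because a haplotype can carry more than one variant, the frequencies within a population need not sum to one.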

Figure 6—source code 1

Code for analyzing carcinoembryonic antigen-related cell adhesion molecule 1 (CEACAM1) haplotypes and generating graphs for Figure 6A.

https://cdn.elifesciences.org/articles/73330/elife-73330-fig6-code1-v3.zip
Figure 6—source data 1

Data files for carcinoembryonic antigen-related cell adhesion molecule 1 (CEACAM1) haplotypes for Figure 6A and Figure 6—figure supplements 1 and 2.

https://cdn.elifesciences.org/articles/73330/elife-73330-fig6-data1-v3.zip
Figure 6—source data 2

Raw and labeled western blot images for Figure 6C.

https://cdn.elifesciences.org/articles/73330/elife-73330-fig6-data2-v3.zip
Figure 6—figure supplement 1
Human carcinoembryonic antigen-related cell adhesion molecule (CEACAM)-like CEACAM1 haplotypes.

Other CEACAM-like human CEACAM1 haplotypes. Alignment of human CEACAM1, CEACAM3, and CEACAM5 N-domain reference nucleotide sequences with amino acid translations below. Long invariable alignment regions are removed. Sites that differ in CEACAM3 or CEACAM5 relative to CEACAM1 are bolded. Sites found in variant CEACAM1 haplotypes are in black. Changes that encode the high-frequency variants Q1K, A49V, and Q89H are in red. Below alignment each row is a unique human CEACAM1 N-domain haplotype. Lines indicate variant regions in CEACAM1. Only haplotypes that increase similarity to CEACAM3 or CEACAM5 are shown.
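One way to decide whether a variant haplotype "increases similarity" to a paralog is sketched below. It assumes equal-length aligned sequences and a simple count of identical positions. The function names and toy sequences are hypothetical, and the criterion actually used for this figure may differ.

```python
def paralog_like_sites(reference, haplotype, paralog):
    """0-based positions where the haplotype departs from the reference
    and matches the paralog instead."""
    if not (len(reference) == len(haplotype) == len(paralog)):
        raise ValueError("sequences must be aligned to the same length")
    return [
        i
        for i, (r, h, p) in enumerate(zip(reference, haplotype, paralog))
        if h != r and h == p
    ]


def increases_similarity(reference, haplotype, paralog):
    """True if the haplotype shares more positions with the paralog than the reference does."""
    def identical(x, y):
        return sum(a == b for a, b in zip(x, y))

    return identical(haplotype, paralog) > identical(reference, paralog)


# Hypothetical toy sequences (not real CEACAM sequence).
toy_reference = "ACGTACGT"
toy_paralog = "ACCTACAT"
toy_hap_like = "ACCTACGT"   # one change toward the paralog
toy_hap_other = "ACGTTCGT"  # one change unrelated to the paralog

like_sites = paralog_like_sites(toy_reference, toy_hap_like, toy_paralog)
other_sites = paralog_like_sites(toy_reference, toy_hap_other, toy_paralog)
```

Under this criterion only the first toy haplotype would be retained for display.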

Figure 6—figure supplement 2
Human carcinoembryonic antigen-related cell adhesion molecule 1 (CEACAM1) haplotype frequencies.

Frequency of variant human CEACAM1 haplotypes. (A) Overall frequency of CEACAM1 variants Q1K, A49V, Q89H, and other variant haplotypes in humans. The indicated CEACAM-like haplotypes are enumerated in Figure 6—figure supplement 1. (B) Frequency of CEACAM1 variants across different human populations.

Figure 6—figure supplement 2—source code 1

Code for analyzing carcinoembryonic antigen-related cell adhesion molecule 1 (CEACAM1) haplotypes and generating graphs for Figure 6—figure supplement 2.

https://cdn.elifesciences.org/articles/73330/elife-73330-fig5-code5-v3.zip
Figure 6—figure supplement 3
Human carcinoembryonic antigen-related cell adhesion molecule 3 (CEACAM3) variation.

Human CEACAM1-like CEACAM3 haplotypes. (A) Alignment of human CEACAM1 and CEACAM3 reference sequences. Disagreements are bolded in red with the amino acid translation below each sequence. Below alignment each row represents a unique human CEACAM3 haplotype. Lines indicate variant regions that match the human CEACAM1 reference sequence. Only haplotypes that increase similarity to the human CEACAM1 reference sequence are shown. (B) Overall frequency of variant CEACAM3 haplotypes in humans. The CEACAM1-like haplotypes indicated are enumerated in panel A. (C) Frequency of CEACAM3 variants across different human populations.

Figure 6—figure supplement 3—source code 1

Code for analyzing carcinoembryonic antigen-related cell adhesion molecule 3 (CEACAM3) haplotypes and generating graphs for Figure 6—figure supplement 3.

https://cdn.elifesciences.org/articles/73330/elife-73330-fig6-code6-v3.zip
Figure 6—figure supplement 3—source data 1

Data files for carcinoembryonic antigen-related cell adhesion molecule 3 (CEACAM3) haplotypes for Figure 6—figure supplement 3.

https://cdn.elifesciences.org/articles/73330/elife-73330-fig6-figsupp3-data1-v3.zip
Figure 6—figure supplement 4
Human carcinoembryonic antigen-related cell adhesion molecule 5 (CEACAM5) variation.

Human CEACAM1-like CEACAM5 haplotypes. (A) Alignment of human CEACAM1 and CEACAM5 reference sequences. Disagreements are bolded in red with the amino acid translation below each sequence. Below alignment each row represents a unique human CEACAM5 haplotype. Lines indicate variant regions that match the human CEACAM1 reference sequence. Only haplotypes that increase similarity to the human CEACAM1 reference sequence are shown. (B) Overall frequency of variant CEACAM5 haplotypes in humans. The CEACAM1-like haplotypes indicated are enumerated in panel A. (C) Frequency of CEACAM5 variants across different human populations.

Figure 6—figure supplement 4—source code 1

Code for analyzing and generating graphs for carcinoembryonic antigen-related cell adhesion molecule 5 (CEACAM5) haplotypes for Figure 6—figure supplement 4.

https://cdn.elifesciences.org/articles/73330/elife-73330-fig7-code7-v3.zip
Figure 6—figure supplement 4—source data 1

Data files for carcinoembryonic antigen-related cell adhesion molecule 5 (CEACAM5) haplotypes for Figure 6—figure supplement 4.

https://cdn.elifesciences.org/articles/73330/elife-73330-fig6-figsupp4-data1-v3.zip
Figure 7
Model of carcinoembryonic antigen-related cell adhesion molecule (CEACAM) evolution in primates.

(A) Bacterial adhesins recognize a subset of epithelial CEACAM proteins and avoid binding with decoy CEACAM receptors present on neutrophils. (B) Gene conversion facilitates the shuffling of regions of the CEACAM N-domain that alter binding to bacterial adhesins. (C) Through gene conversion outlined in B, epithelial CEACAM proteins avoid binding by bacterial adhesins while the CEACAM decoy receptor gains binding, triggering bacterial clearance through phagocytosis.

Tables

Key resources table
| Reagent type (species) or resource | Designation | Source or reference | Identifiers | Additional information |
|---|---|---|---|---|
| Strain, strain background (Helicobacter pylori) | G27 | Baltrus et al., 2009 | | |
| Strain, strain background (Helicobacter pylori) | J99 | Alm et al., 1999 | | |
| Strain, strain background (Helicobacter pylori) | Tx30a | ATCC | 51932 | |
| Strain, strain background (Helicobacter pylori) | omp27::cat-sacB in NSH57 | Yang et al., 2019 | | H. pylori strain G27 with HopQ deletion |
| Strain, strain background (Escherichia coli) | Rosetta (DE3) pLyS | Lab collection | | E. coli strain for outer membrane IPTG inducible expression of Neisserial Opa proteins |
| Strain, strain background (Escherichia coli) | DH5α | Lab collection | | E. coli strain for maintenance and propagation of pET-28a plasmid constructs |
| Strain, strain background (Escherichia coli) | One Shot Top10 Chemically Competent cells | Thermo Fisher Scientific | C404010 | E. coli strain for cloning, maintenance and propagation of pcDNA3 GFP LIC plasmid constructs |
| Cell line (Homo sapiens) | HEK293T | ATCC | RRID:CVCL_0063; CRL-3216 | |
| Recombinant DNA reagent | pET-28a (plasmid) | Genscript | | Plasmid backbone for expression of Neisserial Opa proteins |
| Recombinant DNA reagent | pcDNA3 GFP LIC (plasmid) | Addgene | RRID:Addgene_30127; #30127 | Plasmid backbone for expression of primate CEACAM1 N-domain constructs in HEK293T cells |
| Antibody | Mouse monoclonal antibody mixture; Mouse α-GFP clones 7.1 and 13.1 | Sigma-Aldrich | RRID:AB_390913; 11814460001 | 1:10^3 dilution; Primary antibody for visualization of GFP labeled CEACAM1 N-domain constructs |
| Antibody | Goat polyclonal antibody; goat α-mouse conjugated to horseradish peroxidase | Jackson ImmunoResearch | RRID:AB_10015289; 115-035-003 | 1:10^4 dilution; Secondary antibody for visualization of GFP labeled CEACAM1 N-domain constructs |
| Other | Advansta WesternBright ECL HRP Substrate | Thomas Scientific | K-12049-D50 | Reagent to visualize proteins bound by secondary antibody in a western blot |
| Software, algorithm | PAML 4.9h | http://abacus.gene.ucl.ac.uk/software/paml.html Yang, 2007 | RRID:SCR_014932 | |
| Software, algorithm | FUBAR | https://www.datamonkey.org Murrell et al., 2013 | RRID:SCR_010278 | |
| Software, algorithm | MEME | classic.datamonkey.org Murrell et al., 2012 | RRID:SCR_010278 | |
| Software, algorithm | GARD | classic.datamonkey.org Kosakovsky Pond et al., 2006 | RRID:SCR_010278 | |
| Sequence-based reagent | bon_gCCM1N_F3 | This paper | PCR primer | Primer for initial amplification of bonobo CEACAM1 N-domain from genomic DNA [TTCACAGAGTGCGTGTACCC] |
| Sequence-based reagent | bon_gCCM1N_R2 | This paper | PCR primer | Primer for initial amplification of bonobo CEACAM1 N-domain from genomic DNA [CCTCCCAGGTTCAAGCGATT] |
| Sequence-based reagent | bon_gCCM1N_F1 | This paper | PCR primer | Primer for secondary amplification of bonobo CEACAM1 N-domain from genomic DNA [CAGTGGAGGGGTGAAGACAC] |
| Sequence-based reagent | bon_gCCM1N_R1 | This paper | PCR primer | Primer for secondary amplification of bonobo CEACAM1 N-domain from genomic DNA [CATGTTGGTCAGGCTGGTCT] |
| Sequence-based reagent | bon_gCCM1N_seqF1 | This paper | Sequencing primer | Primer to sequence bonobo CEACAM1 N-domain amplified from genomic DNA [CCCGTTTTTCCACCCTAATGC] |
| Sequence-based reagent | bon_gCCM1N_seqF4 | This paper | Sequencing primer | Primer to sequence bonobo CEACAM1 N-domain amplified from genomic DNA [GGGGAAAGAGTGGATGGCAA] |
| Sequence-based reagent | bon_gCCM1N_seqR2 | This paper | Sequencing primer | Primer to sequence bonobo CEACAM1 N-domain amplified from genomic DNA [TGGGGGAATCACTCACGGTA] |
| Biological sample (Pan paniscus) | AG05253 | Nels Elde | RRID:CVCL_1G37 | Bonobo genomic DNA sample |
| Software, algorithm | R v4.1.2 | https://cran.r-project.org/ | RRID:SCR_003005 | |
| Software, algorithm | Python 3.7 | Python Software Foundation https://www.python.org/ | RRID:SCR_008394 | |
| Software, algorithm | Jupyter Notebook 5.7.4 | Project Jupyter https://jupyter.org/ | RRID:SCR_018315 | |
| Software, algorithm | Anaconda Navigator 1.9.12 | Anaconda, Inc https://www.anaconda.com/ | | |

Additional files

Supplementary file 1

A. Oligomers and DNA templates.

Table of oligomers, DNA templates, and their order in assembly reactions used to assemble carcinoembryonic antigen-related cell adhesion molecule 1 (CEACAM1) N-domain expression plasmids. B. Sources of templates for plasmid components. Table listing sources of template sequences for CEACAM1 and other plasmid components used for expression plasmid construction.

https://cdn.elifesciences.org/articles/73330/elife-73330-supp1-v3.xlsx
Transparent reporting form
https://cdn.elifesciences.org/articles/73330/elife-73330-transrepform1-v3.docx

Cite this article

  1. EmilyClare P Baker
  2. Ryan Sayegh
  3. Kristin M Kohler
  4. Wyatt Borman
  5. Claire K Goodfellow
  6. Eden R Brush
  7. Matthew F Barber
(2022)
Evolution of host-microbe cell adherence by receptor domain shuffling
eLife 11:e73330.
https://doi.org/10.7554/eLife.73330