Overview of MSH6 structure and histone reader domains

a) Predicted (AlphaFold2) structures of MSH6 (MutS homolog 6). b) Heatmaps of predicted aligned error, a measure of confidence in the predicted fold. “Linker” marks the region, predicted to be intrinsically disordered, that separates the MutS repair domains from the histone reader domains. c) Alignment of the PWWP and Tudor domains of MSH6, with shared amino acids outlined in black and arrows marking amino acids experimentally shown to bind histone PTMs. d) Predicted structure of the human MSH6 PWWP domain interacting with H3K36me3, highlighting the aromatic residues that form the binding cage. e) Predicted structure of the Arabidopsis MSH6 Tudor domain interacting with H3K4me1, highlighting the aromatic residues that form the binding cage.

Taxonomic diversity of MSH6 histone reader presence and absence.

Tree is based on NCBI taxonomy, showing species for which unambiguous MSH6 orthologs and domain presence/absence could be called. Tree generated with PhyloT, with branches representing taxonomic hierarchy, not time. See Supplemental Table 1 for complete list of species and MSH6 domain calls. For this visualization, fungal genera are represented by a single species. The color of branches indicates proportion of species containing histone reader domains in MSH6 (green = Tudor, orange = PWWP, gray = No reader).

Conservation in MSH6 domain architectures and histone readers.

a) MSH6 domain architectures (InterProScan) for MSH6 proteins (200 random species with and without histone reader domains), aligned to the MutS1 domain of MSH6. b, c) Relationship between distance to H3 amino acids and amino acid Shannon entropy across all species, colored by conservation (proportion of amino acids matching the consensus), for b) PWWP domains and c) Tudor domains. Consensus amino acids are shown with their functional class. Amino acid conservation is the frequency of the most frequently observed amino acid at each position. Distances: Tudor + H3K4me1 approximated by superimposition of domains onto reference PDB 7DE9; PWWP + H3K36me3 with reference PDB 5iu.
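As a minimal sketch of the two per-position statistics named in this legend, the Python below computes Shannon entropy and conservation (frequency of the most common residue) for each column of an alignment. The toy sequences, the function name `column_stats`, the choice of log base 2, and the exclusion of gap characters are illustrative assumptions, not details taken from the analysis.

```python
import math
from collections import Counter

def column_stats(alignment, gap="-"):
    """Return (consensus, conservation, entropy) for each alignment column.

    Assumptions for illustration: gaps are ignored and entropy is in bits.
    """
    stats = []
    for column in zip(*alignment):
        residues = [aa for aa in column if aa != gap]
        if not residues:
            # Column made only of gaps: nothing to summarize.
            stats.append((None, 0.0, 0.0))
            continue
        counts = Counter(residues)
        total = len(residues)
        # Consensus = most frequent residue; conservation = its frequency.
        consensus, top_count = counts.most_common(1)[0]
        conservation = top_count / total
        # Shannon entropy H = sum(p * log2(1 / p)) over observed residues.
        entropy = sum((n / total) * math.log2(total / n) for n in counts.values())
        stats.append((consensus, conservation, entropy))
    return stats

# Hypothetical toy alignment: 4 sequences, 3 columns.
toy_alignment = ["WYF", "WYL", "WFL", "WF-"]
for position, (cons, conservation, entropy) in enumerate(column_stats(toy_alignment), start=1):
    print(position, cons, round(conservation, 2), round(entropy, 2))
```

A fully conserved column has conservation 1 and entropy 0; entropy rises as residues become more evenly mixed, which is why the two measures are plotted together.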

Species with histone readers fused to MSH6 differ in genomic and other traits.

a) Summary of genomic traits among species with MSH6-Tudor, -PWWP, or no reader, scaled to percentile. Boxplots: box = median ± interquartile range (IQR), whiskers = 1.5 × IQR. b) Summary of contrasts: binomial generalized linear model and linear model (left), phylogenetic binary regression with phylolm::phylolm and with ape::binaryPGLMM (right) using the TimeTree v5 phylogeny (Kumar et al. 2022). Analyses were run on both log10-transformed and rank values. Binomial tests compare species with either Tudor or PWWP vs species with no reader. c) Example of a phylogenetic trend in mean #exons/gene in relation to the presence of an MSH6 reader. Left: ancestral reconstruction of mean log10(#exons/gene) across eukaryotes, with tips colored by presence of Tudor (green) and PWWP (orange) domains in MSH6. Right: results from binomial phylogenetic regression. d) Population and other traits across species as described in (a). The number of species with these data is indicated by (n = X). Estimated body size and effective population size from Buffalo 2021. Lifespan from Tacutu et al. 2018. Mutation data from Wang & Obbard 2023. e) Statistical contrasts as described in (b).
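The percentile scaling and boxplot conventions described in panel (a) can be sketched as follows. This is an illustration only: the function names, the tie handling (each value maps to the percentage of values less than or equal to it), and the "inclusive" quartile method are assumptions, and plotting libraries differ in how they compute quartiles.

```python
from bisect import bisect_right
from statistics import quantiles

def percentile_scale(values):
    """Map each value to the percentage of values less than or equal to it."""
    ordered = sorted(values)
    n = len(values)
    return [100.0 * bisect_right(ordered, v) / n for v in values]

def box_summary(values):
    """Quartiles plus the 1.5 x IQR fences that bound the whiskers."""
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "lower_fence": q1 - 1.5 * iqr,
        "upper_fence": q3 + 1.5 * iqr,
    }

# Hypothetical trait values for four species.
print(percentile_scale([10, 30, 20, 40]))
print(box_summary([1, 2, 3, 4, 5]))
```

Scaling to percentiles puts traits with very different units and ranges on a common 0 to 100 axis, which is what makes them comparable side by side in a single panel.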

Phylogeny of eukaryotes with branch lengths.

a) Tree from TimeTree v5 with branches and tips colored by presence of MSH6 reader domains. Contains 2004 species. See Supplemental Table 1 for complete list of species. b) Illustrative examples of MSH6 lacking a histone reader domain in Metazoa: AlphaFold-predicted structures of MSH6 in the model organisms Drosophila melanogaster and Caenorhabditis elegans.

Ancestral reconstruction of MSH6 reader domain presence and absence.

Generated with the ace function from ape in R. Left shows results from ace(reader, tree, type = "discrete", model = "ER"); right shows ace(reader, tree, type = "discrete", model = "ARD"). Top panels show reconstruction of all readers, middle shows Tudor alone, and bottom shows PWWP alone. Pie plots at internal nodes show the posterior probability of ancestral state.

Results from MSH6 blastp queries in all archaeal NCBI proteomes.

a) Human MSH6 with PWWP domain. b) Arabidopsis MSH6 with Tudor domain. All results returned were annotated as MutS homologs. The top 100 NCBI query results are shown. Queries with the human MSH6 PWWP and Arabidopsis MSH6 Tudor domains did not yield any hits. In contrast, significant blastp results for PWWP and Tudor were found in MSH6 orthologs of distantly related eukaryotes, Cnidaria and Stramenopiles, respectively.

Lack of evidence that brown algae (Stramenopiles) acquired MSH6-Tudor via horizontal gene transfer from green algae.

Maximum likelihood (ML) tree from MutS domains of putative MSH6 orthologs in non-metazoan and non-fungal organisms. The ML tree was built with IQ-TREE from protein alignments restricted to MutS domains I, II, III, IV and V of MSH6. Tips highlighted in green have blastp results for the Tudor domain. The topology shows that the Stramenopiles’ MutS domains are sister to those of other SAR organisms, which lack Tudor domains, rather than sister to those of green algae, which would have been the putative donor through horizontal gene transfer.

ESCO fusions to fungal MSH6.

a) NCBI taxonomy tree of fungi with branches in red marking presence of ESCO domains. b) Domain architectures of MSH6-ESCO, aligned to MutS domain I.

MSH7 in plants lacks a Tudor domain.

a) NCBI taxonomy tree of eukaryotes with branches colored (red) according to the fraction of species with predicted MSH7 orthologs. 274/314 Viridiplantae species had predicted MSH7 orthologs, with 0 showing evidence of Tudor domains. b) Illustrative examples from Zea mays and Arabidopsis thaliana: AlphaFold2-predicted structures of MutS homolog 7 (MSH7) in plants. MSH7 arose in plants via duplication of MSH6. The lack of a histone reader domain at the terminus of MSH7 is also confirmed by domain prediction with InterProScan.

Species with histone readers fused to MSH6 differ in genomic and other traits.

Summary of genomic and other characteristics among species with MSH6-Tudor, -PWWP, or no reader, log10-transformed values. Boxplots: box = median ± interquartile range (IQR), whiskers = 1.5 × IQR. a) All species. Significance of differences reported in Fig. 4b,d. b) Only protostome animals. Significance of difference from binomial regression: ***p < 1×10⁻⁸, **p < 1×10⁻³, *p < 0.05.

Workflow to identify MSH6 orthologs and their fusions to PWWP or Tudor domain histone readers.