Regulatory and coding sequences of TRNP1 co-evolve with brain size and cortical folding in mammals

  1. Zane Kliesmete
  2. Lucas Esteban Wange
  3. Beate Vieth
  4. Miriam Esgleas
  5. Jessica Radmer
  6. Matthias Hülsmann
  7. Johanna Geuder
  8. Daniel Richter
  9. Mari Ohnuki
  10. Magdalena Götz
  11. Ines Hellmann (corresponding author)
  12. Wolfgang Enard (corresponding author)
  1. Anthropology and Human Genomics, Faculty of Biology, Ludwig-Maximilians-Universität, Germany
  2. Physiological Genomics, BioMedical Center - BMC, Ludwig-Maximilians-Universität, Germany
  3. Institute for Stem Cell Research, Helmholtz Zentrum München, German Research Center for Environmental Health, Germany
  4. Department of Environmental Microbiology, Eawag, Switzerland
  5. Department of Environmental Systems Science, ETH Zurich, Switzerland
  6. SYNERGY, Excellence Cluster of Systems Neurology, BioMedical Center (BMC), Ludwig-Maximilians-Universität München, Germany
4 figures, 1 table and 4 additional files

Figures

Figure 1 with 3 supplements
TRNP1 amino acid substitution rates co-evolve with brain size and cortical folding in mammals.

(A) Mammalian species for which body mass, brain size, gyrification index (GI) measurements, and TRNP1 coding sequences were available (n=30) (Figure 1—figure supplement 1). Log2-transformed units: body mass and brain size in kg; GI is a ratio (cortical surface/perimeter of the brain surface). (B) Estimated marginal and partial correlations between ω of TRNP1 and the three traits using Coevol (Lartillot and Poujol, 2011). Size indicates posterior probability (pp). (C) TRNP1 protein substitution rates (ω) significantly correlate with brain size (r=0.83, pp=0.97). (D) The average correlation across 124 control proteins with brain size (r̄=0.10). (E) TRNP1 ω correlation with GI compared to the average across control proteins. (F) TRNP1 ω correlation with body mass compared to the average across control proteins. (C, D, E, F) Error bars indicate standard errors. (G) Distribution of partial correlations between ω and brain size of the control proteins and TRNP1. (H) Distribution of partial correlations between ω and GI of the control proteins and TRNP1. (I) Scheme of the mouse TRNP1 protein (223 amino acids [AAs]) with intrinsically disordered regions (orange) and sites (red lines) subject to positive selection in mammals (ω > 1, pp > 0.95; Figure 1—figure supplement 1). Letter size of the depicted AAs represents the abundance of AAs at the positively selected sites.
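The difference between the marginal and partial correlations in panel (B) can be illustrated with a minimal sketch. Coevol estimates these quantities from the posterior covariance matrix of all traits and rates; the snippet below only shows the textbook first-order formula for three variables, and the input values are hypothetical, not taken from the paper.

```python
import math


def partial_correlation(r_xy: float, r_xz: float, r_yz: float) -> float:
    """First-order partial correlation of x and y, controlling for z."""
    denom = math.sqrt((1.0 - r_xz ** 2) * (1.0 - r_yz ** 2))
    return (r_xy - r_xz * r_yz) / denom


# Hypothetical marginal correlations (illustration only):
# x = omega, y = brain size, z = body mass
r_partial = partial_correlation(r_xy=0.6, r_xz=0.5, r_yz=0.8)
print(f"partial r(x, y | z) = {r_partial:.3f}")
```

When the third variable is uncorrelated with both others, the partial correlation equals the marginal one; the more the third variable explains, the further the two can diverge.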

Figure 1—figure supplement 1
TRNP1 protein-coding sequence analysis.

(A) Multiple alignment of 45 TRNP1 coding sequences (99.0% completeness) using the phylogeny-aware aligner PRANK. The alignment is 735 bases long, which translates to 245 amino acids (AAs). For comparison, the human TRNP1 coding sequence is 227 AAs long and the mouse sequence is 223 AAs long. (B) Sites under positive selection across the phylogenetic tree according to the PAML M8 site model (in total 9.8% of sites with ω > 1; likelihood ratio test [LRT], p-value < 0.001). The depicted sites had a posterior probability Pr(ω > 1) > 0.95 according to naive empirical Bayes analysis. Colours of the amino acids indicate their relatedness in biochemical properties. Sites with a light-grey background and a dash indicate indels.

Figure 1—figure supplement 2
Estimated marginal (A) and partial (B) correlation matrices of the combined Coevol model including the three traits and substitution rates of TRNP1.

Posterior probabilities of the associations are depicted in brackets.

Figure 1—figure supplement 3
Control protein evolution rate correlation with brain size, gyrification, and body mass.

(A) A flowchart depicting the selection of comparable control proteins to infer the average correlation of protein evolution rates with the included phenotypes across the mammalian phylogeny of 30 species. 1088 1-exon protein sequences with a length comparable to TRNP1 were available in human CCDS. Of these, 132 had good full-length alignment quality across the 30 species and a log(dS) tree length less than 3× SD away from the average, with each belonging to a different gene. 124 of these converged between the three Coevol runs. RBB: reciprocal best BLAT. (B) TRNP1 and 132 control proteins show comparable synonymous substitution rates (log(dS), top) and protein evolution rates (ω, middle), inferred using the PAML branch free-ratios model. All included proteins have high-quality full-length alignments (bottom), quantified as the mean relative tree length across all alignment positions per protein. Brown lines indicate TRNP1, black dashed lines indicate the average across all proteins, and black dotted lines indicate the median across all proteins. (C) Marginal and partial correlation distributions of 125 proteins, including TRNP1, with the three phenotypes (brain size, gyrification index [GI], and body mass) inferred using Coevol. Dashed lines indicate the average across all proteins.
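The tree-length criterion in panel (A) amounts to a simple outlier filter. The following is a minimal sketch of such a filter, assuming a mapping from protein name to log(dS) tree length; the function name, the protein names, and the values are hypothetical, and the authors' actual pipeline may differ in detail.

```python
import statistics


def within_k_sd(values: dict, k: float = 3.0) -> list:
    """Return the keys whose value lies less than k sample SDs from the mean."""
    data = list(values.values())
    mean = statistics.mean(data)
    sd = statistics.stdev(data)
    return [name for name, v in values.items() if abs(v - mean) < k * sd]


# Hypothetical log(dS) tree lengths: 19 similar proteins and one extreme value
tree_lengths = {f"protein_{i}": 1.0 for i in range(19)}
tree_lengths["outlier"] = 10.0

kept = within_k_sd(tree_lengths, k=3.0)
print(len(kept), "of", len(tree_lengths), "proteins retained")
```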

Figure 2 with 1 supplement
TRNP1 proliferative activity correlates with brain size and cortical folding.

(A) Five different TRNP1 orthologues were transfected into neural stem cells (NSCs) isolated from cerebral cortices of 14-day-old mouse embryos, and proliferation rates were assessed after 48 hr using Ki67 immunostaining as proliferation marker and green fluorescent protein (GFP) as transfection marker in 7–12 independent biological replicates. (B) Representative image of the transfected cortical NSCs immunostained for GFP and Ki67. Arrows indicate three transfected cells, of which two (solid arrows) are Ki67-positive (Figure 2—figure supplement 1). (C) Induced proliferation in NSCs transfected with TRNP1 orthologues from five different species (Supplementary file 2). Proliferation rates are a significant predictor for brain size (χ²=10.04, df=1, BH-adjusted p-value=0.0018, β=11.75 ± 2.412, R²=0.89) and GI (χ²=5.85, df=1, BH-adjusted p-value=0.016, β=16.97 ± 6.568, R²=0.69) in the respective species (phylogenetic generalized least squares [PGLS], likelihood ratio test [LRT]). Error bars indicate standard errors. Included species: human (Homo sapiens), rhesus macaque (Macaca mulatta), northern greater galago (Otolemur garnettii), house mouse (Mus musculus), common bottlenose dolphin (Tursiops truncatus).
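As a rough illustration of how a likelihood ratio test statistic with one degree of freedom maps to a p-value, the sketch below uses the identity between the chi-square(1) tail and the complementary error function. The function names are hypothetical, and the resulting p-values are unadjusted, so they are not expected to match the BH-adjusted values in the caption exactly.

```python
import math


def lrt_statistic(loglik_full: float, loglik_null: float) -> float:
    """Likelihood ratio test statistic: 2 * (logLik_full - logLik_null)."""
    return 2.0 * (loglik_full - loglik_null)


def chi2_sf_df1(stat: float) -> float:
    """Upper-tail probability of a chi-square distribution with 1 df.

    For df = 1, P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(stat / 2.0))


# Test statistics as reported in the caption (df = 1 in both cases)
for trait, stat in [("brain size", 10.04), ("GI", 5.85)]:
    print(f"{trait}: chi2 = {stat}, unadjusted p = {chi2_sf_df1(stat):.4f}")
```

In practice the two log-likelihoods would come from fitting the PGLS model with and without the proliferation-rate predictor; here only the final conversion step is shown.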

Figure 2—figure supplement 1
Proliferation induced by TRNP1.

(A) Proliferation induced in neural stem cells (NSCs) transfected with TRNP1 (all TRNP1 orthologues combined) compared to the control NSCs transfected only with green fluorescent protein (GFP). TRNP1 presence in NSCs significantly increases the proliferation rates (TRNP1: 0.53 (±0.02), control: 0.34 (±0.02), df = 57). (B) Proliferation induced in NSCs transfected with TRNP1 orthologues from five different species (Supplementary file 2). (A, B) Bars indicate standard errors of logistic regression and asterisks indicate the significance of pairwise comparisons (Tukey test, p-value: <0.1, *<0.05, **<0.001, ****<2e-16).

Figure 3 with 2 supplements
Activity of a cis-regulatory element (CRE) of TRNP1 correlates with cortical folding in catarrhines.

(A) Experimental setup of the massively parallel reporter assay (MPRA). Regulatory activity of seven putative TRNP1 CREs from 75 species was assayed in neural progenitor cells (NPCs) derived from human and cynomolgus macaque induced pluripotent stem cells (Figure 3—figure supplement 1). (B) Fraction of the detected CRE tiles in the plasmid library per species across regions. The detection rates are unbiased and uniformly distributed across species and clades, with only one extreme outlier, Dipodomys ordii. (C) Fraction of the detected CRE tiles in the plasmid library per region across species. (D) Log-transformed total regulatory activity per CRE in human NPCs across species with available brain size and gyrification index (GI) measurements (n=45). (E) Total activity per CRE across species. Exon 1 (E1), intron (I), and the downstream (D) regions are more active and longer than other regions. (B, C, E) Each box represents the median and first and third quartiles, with the whiskers indicating the furthest value no further than 1.5 * IQR from the box. Individual points indicate outliers (Figure 3—figure supplement 2). (F) Regulatory activity of the intron CRE is weakly associated with gyrification across mammals (phylogenetic generalized least squares [PGLS], likelihood ratio test [LRT] p-value=0.097, R²=0.07, n=37) and strongest across great apes and Old World monkeys, that is, catarrhines (PGLS, LRT p-value=0.003, R²=0.58, n=10).

Figure 3—figure supplement 1
Length of the covered cis-regulatory element (CRE) sequences in the massively parallel reporter assay (MPRA) library across the tree.

Species for which the regions were inferred based on DNase hypersensitive sites (DHS) from embryonic brain (Bernstein et al., 2010) are marked in bold and black: human (H. sapiens) and mouse (M. musculus). These species do not show extreme differences in length compared to others (human: 5/7, mouse: 3/5 regions within the 10% and 90% quantiles). The orthologous CRE sequence length differs strongly between primate and non-primate species, being on average 1.8–2.8 times longer in the primate species than in the other mammals.

Figure 3—figure supplement 2
Analysis of massively parallel reporter assay (MPRA) data.

(A) Pairwise correlation of the log2-transformed cis-regulatory element (CRE) tile activity between the three transduced cell lines: human 1, human 2, and macaque. Pearson’s r is specified in the brackets of figure titles. (B) Pairwise correlation of the log2-transformed summarized activity per CRE region between cell lines. Pearson’s r is specified in the brackets of figure titles.
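The pairwise comparison described in this caption can be sketched as Pearson's r computed on log2-transformed activities. The example below is a minimal standard-library illustration with hypothetical activity values; it is not the authors' analysis code.

```python
import math


def pearson_r(x: list, y: list) -> float:
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def log2_pearson(activity_a: list, activity_b: list) -> float:
    """Pearson's r between log2-transformed activities of two cell lines."""
    log_a = [math.log2(v) for v in activity_a]
    log_b = [math.log2(v) for v in activity_b]
    return pearson_r(log_a, log_b)


# Hypothetical CRE tile activities (strictly positive) in two cell lines
human_1 = [1.0, 2.0, 4.0, 8.0]
macaque = [2.0, 4.0, 8.0, 16.0]
print(f"Pearson's r = {log2_pearson(human_1, macaque):.2f}")
```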

Figure 4 with 2 supplements
Transcription factors (TFs) with binding site enrichment on intron cis-regulatory elements (CREs) regulate cell proliferation and are candidates to explain the observed activity across catarrhines.

(A) Orthologous intron CRE sequences show different regulatory activities under the same cellular conditions, suggesting variation in cis regulation across species. (B) Variance-stabilized expression in neural progenitor cells (NPCs) of TRNP1 and the 22 TFs with enriched binding sites (motif weight ≥ 1) on the intron CREs. Each box represents the median, first and third quartiles with the whiskers indicating the furthest value no further than 1.5 * IQR from the box. Points indicate individual expression values. Vertical line indicates average expression across all 392 TFs (5.58), grey area: standard deviation (1.61). (C) Eight top enriched biological processes (Gene Ontology, Fisher’s exact test p-value <0.05) of the 22 TFs. Background: all expressed TFs (392). (D) Variation in binding scores of the enriched TFs across catarrhines. Heatmaps indicate standardized binding scores (grey), gyrification index (GI) values (blue) and intron CRE activities (yellow) from the respective species. TF background colour indicates gene ontology assignment of the TFs to the two most significant biological processes. The bottom panel indicates the spatial position of the top binding site (motif score >3) for each TF on the human sequence. (E) Binding scores of three TFs (CTCF, ZBTB26, SOX8) are the best candidates to explain intron CRE activity, whereas only CTCF binding shows an association with the GI (phylogenetic generalized least squares [PGLS], likelihood ratio test [LRT] p-value <0.05). (F) Predicted intron CRE activity by the binding scores of the three TFs vs. the measured intron CRE activity across catarrhines.

Figure 4—figure supplement 1
TRNP1 expression in human and cynomolgus macaque (Macaca fascicularis) cell lines.

(A) Variance-stabilized expression of TRNP1. (B) Expression ranks of TRNP1. (C) Differential expression results between the three cell lines. Colour indicates significant differential expression using Benjamini-Hochberg adjusted p-value cutoff of 0.05.

Figure 4—figure supplement 2
Human genome tracks for the TRNP1 locus (hg19).

A TAD boundary (blue) ends just upstream of TRNP1 exon 1 in the human germinal zone (gestational week 8, Hi-C) (Won et al., 2016). In agreement, a CTCF ChIP-seq peak was detected within the intron (I) CRE (neural progenitor cells [NPCs]). Another nearby CTCF-bound site lies in the upstream 1 (U1) CRE sequence (ENCODE Project Consortium, 2012). Histone mark H3K27ac (light brown) and H3K4me2 (dark brown) data are from human developing cortex (gestational week 7; Reilly et al., 2015).

Tables

Appendix 1—key resources table
| Reagent type (species) or resource | Designation | Source or reference | Identifiers | Additional information |
| --- | --- | --- | --- | --- |
| Gene (45 mammal species) | TRNP1 | See Supplementary file 1a | See Supplementary file 1a | See Supplementary file 1a |
| Strain, strain background (E. coli) | NEB 10-beta | New England Biolabs; Rowley, MA, United States | Cat# C3020K | Electrocompetent E. coli |
| Strain, strain background (E. coli) | NEB 5-alpha High Efficiency | New England Biolabs; Rowley, MA, United States | Cat# C2987I | Chemically competent E. coli |
| Cell line (Macaca fascicularis) | Cynomolgus Macaque NPC | This paper, based on Geuder et al., 2021 | N15_39B2 | Macaca fascicularis neural progenitor cells |
| Cell line (Mus musculus) | N2A | ATCC; Manassas, VA, United States | CCL-131 | |
| Cell line (Homo sapiens) | HEK293T | ATCC; Manassas, VA, United States | CRL-11268 | |
| Cell line (Homo sapiens, female) | Human NPC 1 | This paper, based on Geuder et al., 2021 | N4_29B5 | Human neural progenitor cells |
| Cell line (Homo sapiens, male) | Human NPC 2 | This paper, based on Geuder et al., 2021 | N4_12 C2 | Human neural progenitor cells |
| Biological sample (Mus musculus) | Primary murine cerebral cortex cells (NSC) | This paper, based on Esgleas et al., 2020 | primary | See Methods |
| Sequence-based reagent | MPRA oligo Library Trnp1 CRE | Custom Array; Redmond, WA, United States | custom | See https://github.com/Hellmann-Lab/Co-evolution-TRNP1-and-GI |
| Transfected construct (multiple species) | MPRA Library in lentiviral particles | This paper | custom | Lentiviral particles with pMPRA-lenti and TRNP1 CRE library |
| Antibody | rabbit anti Ki67 (monoclonal) | Abcam; Waltham, MA, United States | Cat# ab92742, Clone EPR3610 | 1:100 |
| Antibody | chicken anti-GFP (polyclonal) | Aves Labs; Davis, CA, United States | RRID: AB_2307313, Cat# GFP-1010, Polyclonal | 1:500 |
| Recombinant DNA reagent | pCAG-GFP_Gateway plasmid | Dr. Paolo Malatesta | NA | Kind gift of Dr. Paolo Malatesta |
| Recombinant DNA reagent | pMDLg/pRRE plasmid | Addgene; Watertown, MA, United States | Addgene 12251 | |
| Recombinant DNA reagent | pRSV-Rev plasmid | Addgene; Watertown, MA, United States | Addgene 12253 | |
| Recombinant DNA reagent | pMD2.G plasmid | Addgene; Watertown, MA, United States | Addgene 12259 | |
| Recombinant DNA reagent | pMPRAlenti1 plasmid | Addgene; Watertown, MA, United States | Addgene 61600 | Kind gift of Dr. Davide Cacchiarelli |
| Recombinant DNA reagent | pNL3.1[Nluc/minP] plasmid, SfiI restriction site mutated | Dr. Davide Cacchiarelli | NA | Kind gift of Dr. Davide Cacchiarelli |
| Recombinant DNA reagent | pMPRA1 plasmid | Addgene; Watertown, MA, United States | Addgene 49349 | Kind gift of Dr. Davide Cacchiarelli |
| Recombinant DNA reagent | pENTR1a plasmid | Stahl et al., 2013 | pENTR1a | |
| Peptide, recombinant protein | hEGF | Miltenyi Biotec; Bergisch Gladbach, Germany | Cat#130-093-825 | |
| Peptide, recombinant protein | B-27 Supplement | Thermo Fisher Scientific; Waltham, MA, United States | Cat#12587–010 | |
| Peptide, recombinant protein | N2 Supplement | Thermo Fisher Scientific; Waltham, MA, United States | Cat#17502048 | |
| Peptide, recombinant protein | L-Ascorbic acid 2-phosphate | Sigma/Merck; St. Louis, MO, United States | Cat#A8960-5G | |
| Peptide, recombinant protein | poly-D-lysine | Sigma/Merck; St. Louis, MO, United States | Cat# A-003-E | |
| Peptide, recombinant protein | bFGF | PeproTech, Cranbury, New Jersey, United States | Cat#100-18B | |
| Commercial assay or kit | GenomiPhi V2 DNA-Amplification Kit | Sigma/Merck; St. Louis, MO, United States | Cat# GE25-6600-32 | |
| Commercial assay or kit | Gateway LR Clonase Enzyme mix | Thermo Fisher Scientific; Waltham, MA, United States | Cat# 11791019 | |
| Commercial assay or kit | Lipofectamine 2000 | Thermo Fisher Scientific; Waltham, MA, United States | Cat# 11668019 | |
| Commercial assay or kit | Lipofectamine 3000 | Thermo Fisher Scientific; Waltham, MA, United States | Cat# L3000015 | |
| Commercial assay or kit | Micellula DNA Emulsion & Purification Kit | Roboklon; Berlin, Germany | Cat# E3600-01 | |
| Commercial assay or kit | Agilent High Sensitivity DNA Kit | Agilent; Santa Clara, CA, United States | Cat# 5067–4626 | |
| Commercial assay or kit | Nextera XT DNA Library Preparation Kit | Illumina; San Diego, CA, United States | Cat# FC-131–1024 | |
| Chemical compound, drug | GlutaMax-I | Thermo Fisher Scientific; Waltham, MA, United States | Cat# 35050038 | |
| Chemical compound, drug | Blasticidin S HCl | Thermo Fisher Scientific; Waltham, MA, United States | Cat# R21001 | |
| Chemical compound, drug | DMEM-GlutaMAX | Thermo Fisher Scientific; Waltham, MA, United States | Cat# 10566016 | |
| Chemical compound, drug | Polybrene | Sigma/Merck; St. Louis, MO, United States | Cat# TR-1003-G | |
| Chemical compound, drug | TRI reagent | Sigma/Merck; St. Louis, MO, United States | Cat# T9424-200ML | |
| Chemical compound, drug | Geltrex | Thermo Fisher Scientific; Waltham, MA, United States | Cat# A1413302 | |
| Sequence-based reagent | Trnp1 CRE resequencing primers | Integrated DNA Technologies, Coralville, IO, United States | custom | See https://github.com/Hellmann-Lab/Co-evolution-TRNP1-and-GI |
| Sequence-based reagent | Trnp1 coding resequencing forward primer | Integrated DNA Technologies, Coralville, IO, United States | custom | GGGAGGAGTAAACACGAGCC |
| Sequence-based reagent | Trnp1 coding resequencing reverse primer | Integrated DNA Technologies, Coralville, IO, United States | custom | AGCCAGGTCATTCACAGTGG |
| Software, algorithm | Hotspot version 4.0.0 | John et al., 2011, http://www.uwencode.org/software/hotspot | NA | |
| Software, algorithm | BLAT version 35x1 | Kent, 2002, https://github.com/djhshih/blat | NA | |
| Software, algorithm | PriMux, compiled on 20 July 2014 | Hysom et al., 2012, https://sourceforge.net/projects/primux/ | NA | |
| Software, algorithm | deML version 1.1.3 | Renaud et al., 2015, https://github.com/grenaud/deml | NA | |
| Software, algorithm | cutadapt version 1.6 | Martin, 2011, https://anaconda.org/bioconda/cutadapt | NA | |
| Software, algorithm | Trinity version 2.0.6 | Grabherr et al., 2011, https://github.com/trinityrnaseq/trinityrnaseq/releases | NA | |
| Software, algorithm | rBLAST version 0.99.2 | https://github.com/mhahsler/rBLAST | NA | |
| Software, algorithm | PRANK version 150803 | Löytynoja, 2021, http://wasabiapp.org/software/prank/ | NA | |
| Software, algorithm | PAML version 4.8 | Yang, 1997, http://abacus.gene.ucl.ac.uk/software/paml.html | NA | |
| Software, algorithm | Coevol version 1.4 | Lartillot and Poujol, 2011, https://megasun.bch.umontreal.ca/People/lartillot/www/downloadcoevol.html | NA | |
| Software, algorithm | NextGenMap (NGM) version 0.0.1 | Sedlazeck et al., 2013, http://cibiv.github.io/NextGenMap/ | NA | |
| Software, algorithm | Primer Blast | Ye et al., 2012 | NA | |
| Software, algorithm | zUMIs version 2.4.5b | Parekh et al., 2018, https://github.com/sdparekh/zUMIs | NA | |
| Software, algorithm | STAR version STAR_2.6.1c | Dobin et al., 2013, https://github.com/alexdobin/STAR | NA | |
| Software, algorithm | DESeq2 version 1.26.0 | Love et al., 2014, Bioconductor | NA | |
| Software, algorithm | Cluster Buster, compiled on Jun 13 2019 | Frith et al., 2003, http://cagt.bu.edu/page/ClusterBuster_download | NA | |
| Software, algorithm | R version 3.6/4 | https://www.r-project.org/ | NA | |
| Software, algorithm | nlme version 3.1–143 | https://cran.r-project.org/web/packages/nlme/index.html | NA | |
| Software, algorithm | topGO version 2.40.0 | Alexa, 2009, https://bioconductor.org/packages/release/bioc/html/topGO.html | NA | |
| Software, algorithm | ape version 5.4 | https://cran.r-project.org/web/packages/ape/index.html | NA | |
| Software, algorithm | multcomp version 1.4–13 | https://cran.r-project.org/web/packages/multcomp/index.html | NA | |
| Software, algorithm | RR2 version 1.0.2 | https://cran.r-project.org/web/packages/rr2/index.html | NA | |

Additional files

Supplementary file 1

Summaries of all information for the Coevol analyses, including the data sources for genome sequence and phenotype information as well as relevant Coevol outputs.

Source information on TRNP1 protein sequences (1a), primate gDNA (1b), and phenotype information (1c), as well as detailed results from PAML (Yang, 1997) (1d) and Coevol (Lartillot and Poujol, 2011) for TRNP1 and the control proteins (1e, 1f, 1g).

https://cdn.elifesciences.org/articles/83593/elife-83593-supp1-v1.xlsx
Supplementary file 2

Model selection for NSC proliferation (2a) as well as proliferation rates based on the selected model (2b) and statistical testing of pairwise differences (2c).

https://cdn.elifesciences.org/articles/83593/elife-83593-supp2-v1.xlsx
Supplementary file 3

Analyses of TRNP1 CREs and their activities and a characterization of TF binding sites within.

TRNP1 DNase hypersensitive sites (3a), phylogenetic generalized least squares (PGLS) model selections using likelihood ratio test for all seven CREs and the whole phylogeny (3b) as well as only the intron CRE in Old World monkeys and great apes (3c) and enriched gene ontologies based on the transcription factors (TFs) with binding site enrichment in the intron CRE (3d).

https://cdn.elifesciences.org/articles/83593/elife-83593-supp3-v1.xlsx
MDAR checklist
https://cdn.elifesciences.org/articles/83593/elife-83593-mdarchecklist1-v1.pdf

Zane Kliesmete, Lucas Esteban Wange, Beate Vieth, Miriam Esgleas, Jessica Radmer, Matthias Hülsmann, Johanna Geuder, Daniel Richter, Mari Ohnuki, Magdalena Götz, Ines Hellmann, Wolfgang Enard (2023)
Regulatory and coding sequences of TRNP1 co-evolve with brain size and cortical folding in mammals
eLife 12:e83593.
https://doi.org/10.7554/eLife.83593