Recombination, meiotic expression and human codon usage

Abstract

Synonymous codon usage (SCU) varies widely among human genes. In particular, genes involved in different functional categories display a distinct codon usage, which was interpreted as evidence that SCU is adaptively constrained to optimize translation efficiency in distinct cellular states. We demonstrate here that SCU is not driven by constraints on tRNA abundance, but by large-scale variation in GC-content, caused by meiotic recombination, via the non-adaptive process of GC-biased gene conversion (gBGC). Expression in meiotic cells is associated with a strong decrease in recombination within genes. Differences in SCU among functional categories reflect differences in levels of meiotic transcription, which is linked to variation in recombination and therefore in gBGC. Overall, the gBGC model explains 70% of the variance in SCU among genes. We argue that the strong heterogeneity of SCU induced by gBGC in mammalian genomes precludes any optimization of the tRNA pool to the demand in codon usage.

https://doi.org/10.7554/eLife.27344.001

Introduction

In humans, the usage of synonymous codons varies substantially among genes. Both adaptive and nonadaptive processes, not mutually exclusive, have been proposed to explain the existence of codon usage biases (Duret, 2002; Chamary et al., 2006; Plotkin and Kudla, 2011). The main adaptive model, called translational selection, proposes that synonymous codon usage (SCU) and abundance of tRNA are co-adapted to optimize the efficiency of translation (Ikemura, 1981; Kanaya et al., 2001; Drummond and Wilke, 2008; Hershberg and Petrov, 2008; dos Reis and Wernisch, 2009). The selective pressure on translational efficiency (in terms of both speed and accuracy) is expected to be more pronounced in highly expressed genes because they mobilize a large number of ribosomes (Bulmer, 1991) and are subject to stronger constraints on translational errors (Akashi, 1994; Drummond and Wilke, 2008). A first prediction of this model is that preferred codons should correspond to the most abundant tRNAs, particularly in highly expressed genes. A second prediction is that codon usage bias should correlate with gene expression patterns and tRNA contents. Both predictions are verified in some animals, such as flies and nematodes, the genomes of which show clear signatures of translational selection (Shields et al., 1988; Duret and Mouchiroud, 1999; Duret, 2002; Castillo-Davis and Hartl, 2002).

The situation is different in mammals, and notably humans, where the influence of translational selection is still strongly debated (Duret, 2002; Chamary et al., 2006; Plotkin and Kudla, 2011). It has long been shown that variation in SCU between genes is correlated to large-scale fluctuations of GC-content along chromosomes, the so-called isochores (Bernardi et al., 1985; Mouchiroud et al., 1988; Mouchiroud et al., 1991; Clay and Bernardi, 2011). The fact that codon usage correlates with the base composition of non-coding regions demonstrates that SCU is affected by a process that is not linked to translational selection. And indeed, there is strong evidence that isochores are the consequence of GC-biased gene conversion (gBGC), a form of segregation distortion that occurs during meiotic recombination and that favors the transmission of GC alleles over AT alleles (Duret and Galtier, 2009; Munch et al., 2014; Williams et al., 2015). This non-adaptive process leads to an increase in GC-content in regions of high recombination rate, which affects both coding and non-coding regions, including synonymous codon positions (Galtier and Duret, 2007; Duret and Galtier, 2009; Glémin et al., 2015).

In principle, this does not exclude the possibility that, besides gBGC, codon usage bias is also affected by translational selection. Interestingly, several studies have reported that human codon usage varies among genes expressed in different tissues or cell types (Vinogradov, 2003; Plotkin et al., 2004; Gingold et al., 2014). In particular, strong variations in SCU are observed among sets of human genes associated with different functional categories, notably between sets of genes involved in cellular proliferation or differentiation (Gingold et al., 2014). The relative abundance of tRNAs also varies according to the proliferative or differentiation state of cells, which was logically interpreted in terms of translational selection: different cell types express specific sets of genes whose coding sequences are co-adapted with specific pools of tRNAs (Gingold et al., 2014). If true, this has important implications regarding the role of translational regulation in determining cell fate (differentiation versus proliferation).

However, this interpretation is contradicted by two other studies examining tRNA abundance in mammals. First, although expression levels of individual tRNA genes vary substantially between tissue types and developmental stages in mice, the collective expression levels of isoacceptor tRNAs (which recognize the same codon) remain constant. Thus, the pool of available anticodons is stable throughout development (Schmitt et al., 2014). Second, following up on this work, a recent study specifically contrasted cells undergoing proliferation with cells undergoing differentiation, and found no covariation of tRNA pool and codon usage between these cells (Rudolph et al., 2016). Both results are inconsistent with the hypothesis that the differences in SCU between functional classes are a consequence of translational selection.

The question of the relative contributions of adaptive and nonadaptive processes to variation in codon usage in mammals therefore remains open: on the one hand, patterns of tRNA abundance do not fit the translational selection model, but on the other hand, the reason why codon usage varies among functional categories is not yet understood. Here, we examined the hypothesis that variation in codon usage might result from differences in transcription activity in meiotic cells. Indeed, it has been observed that intragenic recombination rate correlates negatively with expression level in the germline (McVicker and Green, 2010). It is therefore possible that differences in germline expression levels among functional categories induce differences in gBGC, and hence codon usage biases.

To test this hypothesis, we analyzed SCU among different functional categories of human genes, and investigated covariation with GC-content, recombination rate and expression patterns. We first show that the variation in codon usage among functional categories results from differences in GC content. Then, we propose a new test that demonstrates that variation in SCU is not associated with translational selection. Instead, SCU correlates with large-scale variation in genomic GC-content and with differences in intragenic recombination rate. In turn, the difference in intragenic recombination rate between functional categories is explained by their expression level in meiosis. Altogether, GC-content of non-coding regions and meiotic expression explain 70% of the variation in SCU of human genes. In the end, our results are fully consistent with the hypothesis that SCU is driven by gBGC, and not by translational selection. They indicate that the differences observed among functional categories reflect variation in long-term intragenic recombination rates, resulting from differences in meiotic expression levels.

Results

Variation in codon usage among functional categories results from differences in GC-content

To better understand the causes of the differences in codon usage between sets of genes involved in cellular proliferation and differentiation (reported by Gingold et al., 2014), we started by investigating the main factors that discriminate codon usage between functional categories in general. For this purpose, we grouped genes per functional category (687 biological processes, each associated with more than 40 genes in the Gene Ontology database), and computed codon frequencies for each of these gene sets. We used the classification proposed by Gingold et al. (2014) to distinguish GO gene sets associated with ‘proliferation’ or ‘differentiation’. Variation in relative synonymous codon usage (RSCU; see Materials and methods) among GO gene sets was analyzed by Principal Component Analysis (PCA). The first principal component of this analysis segregates ‘proliferation’ (red dots) from ‘differentiation’ (blue dots) GO categories (Figure 1A). Thus, in agreement with Gingold et al. (2014), synonymous codon usage clearly varies between functional categories in general, and between proliferation and differentiation in particular. Previous studies had shown that synonymous codon usage is correlated with GC content at the third position of codons, termed GC3 (Mouchiroud et al., 1988). And indeed, we observed that the average GC3 of each GO gene set is almost perfectly correlated with its coordinate on the first PCA axis (R² = 0.99; Figure 1B). Hence, variation in SCU between functional categories is almost entirely explained by variation in GC3.
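As an illustration of the RSCU statistic used above, the following is a minimal sketch assuming the usual definition (observed codon count divided by the count expected if all synonymous codons of that amino acid were used equally). The exact definition used in the study is given in Materials and methods; the function name `rscu` and the toy sequence are hypothetical.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (NCBI translation table 1).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def rscu(cds):
    """Return {codon: RSCU} for an in-frame coding sequence (illustrative helper)."""
    cds = cds.upper()
    codons = (cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3))
    counts = Counter(c for c in codons if c in CODON_TABLE)
    values = {}
    # Stop codons and single-codon amino acids (Met, Trp) carry no synonymous choice.
    for aa in sorted(set(AMINO) - {"*", "M", "W"}):
        family = [c for c, a in CODON_TABLE.items() if a == aa]
        total = sum(counts[c] for c in family)
        if total == 0:
            continue  # amino acid absent from this sequence
        for c in family:
            values[c] = counts[c] * len(family) / total
    return values


# Toy sequence: three CAG codons and one CAA codon (both encode Gln).
example = rscu("CAGCAGCAGCAA")
print(example["CAG"], example["CAA"])
```

In the analysis described above, such values would be computed on the concatenated coding sequences of each GO gene set before running the PCA.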

Figure 1


Variation in synonymous codon usage and in GC3 among functional categories.

(A) Factorial map of the principal-component analysis of synonymous codon usage in GO functional categories in the human genome. Each dot corresponds to a GO gene set, for which the relative synonymous codon usage (RSCU) was computed. GO categories that are associated with ‘differentiation’ or with ‘proliferation’ are displayed in blue and in red, respectively. (B) Correlation between the RSCU of GO gene sets (first PCA axis) and their average GC-content at third codon position (GC3). (C) Distribution of GC3 of human protein coding genes. Red: ‘proliferation’ genes (N = 1,008); blue: ‘differentiation’ genes (N = 2,833); grey: other genes (N = 12,129). (D) Correlation between the GC3 of mono-isoacceptor amino acids and multi-isoacceptor amino acids. For each GO gene set, the average GC3 was computed separately for amino acids decoded by multiple tRNA isoacceptors (N = 14 multi-isoacceptor amino acids), and for those decoded by one single tRNA isoacceptor (mono-isoacceptor amino acids: Phe, Asp, His, Cys). Amino-acids encoded by a single codon (Met, Trp) were excluded.

https://doi.org/10.7554/eLife.27344.002

On average, in our dataset, each gene is associated with nine GO biological processes. Many genes belong to more than one GO biological-process category, either because they have several functions (pleiotropy) or because these categories are nested from specific to broad functions. Hence, GO-terms are not independent. To avoid this redundancy, for the remainder of this study we switched from analyses at the level of GO gene sets to analyses at the level of individual genes (except when stated otherwise). Each gene was assigned to one of three categories based on its GO annotation: 1008 genes associated with ‘proliferation’, 2833 genes associated with ‘differentiation’, and 12,129 ‘other’ genes unrelated to these key words (see Materials and methods). Genes associated with ‘proliferation’ are on average less GC-rich than genes associated with ‘differentiation’ (mean GC3 of 0.53 and 0.61, respectively). The two distributions of GC3 differ significantly from each other (t-test, p-value<2.10⁻¹⁶), and their peaks coincide with the two modes observed for the rest of the genome (Figure 1C).

Variation in synonymous codon usage is not driven by translational selection

We first investigated whether the observed variation in synonymous codon usage (i.e. variation in GC3) might be driven by translational selection. This model proposes that the relative usage of synonymous codons should co-vary with the abundance of their cognate tRNAs. A property of the tRNA gene repertoire allows us to test this hypothesis. The human genome contains 506 tRNA genes (decoding the 20 standard amino acids), corresponding to 48 different tRNA isoacceptors (Chan and Lowe, 2016). Among the 18 amino acids having two or more synonymous codons, 4 are decoded by a single tRNA isoacceptor (mono-isoacceptor amino acids: Phe, Asp, His and Cys), and the other 14 are decoded by several tRNA isoacceptors (multi-isoacceptor amino acids).

For multi-isoacceptor amino acids, the relative abundance of the different tRNA isoacceptors can vary among cell types, and hence might covary with the relative synonymous codon usage of genes preferentially expressed in these cell types. For instance, let us consider Gln, which has two synonymous codons (CAG, CAA) that are decoded by two tRNA isoacceptors (anticodons CTG and TTG, respectively). Let us consider a theoretical example of two cell types (say A and B) that differ in their relative tRNA abundance (CTG-tRNA being more abundant in A cells, and TTG-tRNA in B cells). According to the translational selection model, genes that are over-expressed in A cells should preferentially use the CAG codon, whereas genes that are over-expressed in B cells should preferentially use the CAA codon. However, mono-isoacceptor amino acids are, by definition, decoded by a single tRNA isoacceptor, so the relative abundance of isoacceptors cannot vary across cell types. Hence, according to the translational selection model, the relative synonymous codon usage for mono-isoacceptor amino acids is not expected to vary among cell-specific gene sets. In other words, for mono-isoacceptor amino acids, variation in synonymous codon usage among GO gene sets cannot be explained by co-adaptation with the tRNA pool.

To test whether variation in synonymous codon usage was driven by translational selection, we computed synonymous codon usage (GC3) in GO gene sets, separately for codons corresponding to mono-isoacceptor amino acids and for codons corresponding to multi-isoacceptor amino acids. We observed that the range of variation in GC3 is very similar for mono- and multi-isoacceptor amino acids. Importantly, the two parameters are strongly correlated (R² = 0.90) (Figure 1D). This implies that GC3 variation is driven by a process that affects both mono-isoacceptor and multi-isoacceptor amino acids, and hence that this process is not related to variation in tRNA abundance. This observation holds true for all functional categories, including those associated to differentiation or proliferation (red and blue dots in Figure 1D).
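The test described above can be sketched as follows, assuming GC3 is simply the fraction of codons ending in G or C. The mono-isoacceptor set (Phe, Asp, His, Cys) and the exclusion of Met and Trp follow the text; the function name and the toy sequence are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (NCBI translation table 1).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

MONO = set("FDHC")     # Phe, Asp, His, Cys: one tRNA isoacceptor each (per the text)
EXCLUDED = set("MW*")  # single-codon amino acids and stop codons


def gc3_by_isoacceptor_class(cds):
    """GC3 computed separately for mono- and multi-isoacceptor amino acids."""
    tallies = {"mono": [0, 0], "multi": [0, 0]}  # [GC-ending codons, all codons]
    cds = cds.upper()
    for i in range(0, len(cds) - len(cds) % 3, 3):
        codon = cds[i:i + 3]
        aa = CODON_TABLE.get(codon)
        if aa is None or aa in EXCLUDED:
            continue
        key = "mono" if aa in MONO else "multi"
        tallies[key][0] += codon[2] in "GC"
        tallies[key][1] += 1
    return {k: (gc / n if n else None) for k, (gc, n) in tallies.items()}


# Toy sequence: TTC (Phe), GAC (Asp), CAG (Gln), CAA (Gln), CTG (Leu).
result = gc3_by_isoacceptor_class("TTCGACCAGCAACTG")
print(result)
```

Applied to each GO gene set, the two values give one point of a scatter plot like Figure 1D; under translational selection alone, the mono-isoacceptor value would not be expected to track the multi-isoacceptor value.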

Impact of large-scale variation in genomic GC-content on synonymous codon usage

We observed that the GC3 of genes correlates with the GC-content of their flanking regions (Figure 2A, Figure 2—figure supplement 1, R² = 0.48, p-value<2.10⁻¹⁶). This correlation is observed for all genes, including the subsets of genes associated with ‘proliferation’ and ‘differentiation’ (R² = 0.48 and 0.46, all p-values<2.10⁻¹⁶). Thus, variation in SCU between genes is to a large extent attributable to the GC-content of the genomic region in which they are located (the isochore effect). However, when the regional GC-content is controlled for, there remains a difference in GC3 between gene categories (Figure 2A): for a given regional GC-content, the GC3 of proliferation-associated genes is lower than that of differentiation or other genes. This difference is highly significant (Figure 2A, Figure 2—figure supplement 1, p-value<2.10⁻¹⁶). This implies that the difference in synonymous codon usage between these gene categories does not result from a preferential location in different isochores.

Figure 2 (with 1 supplement)


Difference in SCU between ‘proliferation’ and ‘differentiation’ genes is linked to variation in intragenic crossover rate, and not to their isochore context.

(A) Variation in gene GC3 according to the GC content of their flanking region (GC-flank) in each functional category. Genes were first binned into 10 classes of equal sample size according to their GC-flank, and then split into three sets according to their functional category: ‘proliferation’ (red), ‘differentiation’ (blue), and ‘other’ genes (grey). Boxplots display the distribution of GC3 for each functional category within each GC-flank bin. (B) Mean sex-averaged intragenic crossover rate (HapMap) in each functional category. Error bars represent the 95% confidence interval of the mean.

https://doi.org/10.7554/eLife.27344.003
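The stratified comparison shown in Figure 2A (equal-size bins of GC-flank, then a split by functional category) can be sketched as below. This is a simplified illustration with made-up values; it reports a median per group rather than a full boxplot, and all names are hypothetical.

```python
from collections import defaultdict
from statistics import median


def equal_size_bins(values, n_bins):
    """Assign each value to one of n_bins rank-based bins of (near) equal size."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, idx in enumerate(order):
        bins[idx] = rank * n_bins // len(values)
    return bins


def median_gc3_per_bin(gc_flank, gc3, category, n_bins=10):
    """Median GC3 for each (GC-flank bin, functional category) pair."""
    groups = defaultdict(list)
    for b, g, cat in zip(equal_size_bins(gc_flank, n_bins), gc3, category):
        groups[(b, cat)].append(g)
    return {key: median(vals) for key, vals in groups.items()}


# Toy data (invented for illustration only): four genes, two GC-flank bins.
gc_flank = [0.35, 0.36, 0.50, 0.51]
gc3 = [0.40, 0.50, 0.60, 0.70]
category = ["proliferation", "differentiation", "proliferation", "differentiation"]
summary = median_gc3_per_bin(gc_flank, gc3, category, n_bins=2)
print(summary)
```

Comparing categories within a bin, rather than across the whole genome, is what controls for the isochore context.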

Variation in synonymous codon usage among functional categories correlates with differences in intragenic recombination rate

Previous studies have demonstrated that the evolution of GC-content along chromosomes is driven by meiotic recombination, both on a broad (Mb) scale (Duret and Arndt, 2008; Munch et al., 2014) and on a fine (kb) scale (Clément and Arndt, 2013; Pratto et al., 2014). There is now strong evidence that this correlation between GC-content and recombination is caused by the process of GC-biased gene conversion (gBGC), which increases the GC-content in regions of high recombination (Galtier et al., 2001; Galtier and Duret, 2007; Duret and Galtier, 2009; Munch et al., 2014; Pratto et al., 2014; Williams et al., 2015). Recombination rate varies along chromosomes, and notably tends to be lower within genes than in flanking regions (Myers et al., 2005; McVicker and Green, 2010). Interestingly, we observed that intragenic crossover rates (in cM/Mb) differ among the three sets of genes defined previously, and covary with their GC3: the average intragenic crossover rate is lower in ‘proliferation’ genes than in other genes, whereas it is higher in ‘differentiation’ genes (Figure 2B; p-value of Kruskal-Wallis test <2.10⁻¹⁶, as for all pairwise Wilcoxon tests). These observations are therefore consistent with the hypothesis that differences in GC3 between ‘differentiation’ and ‘proliferation’ genes could also be driven by gBGC.
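To make the expected direction of the effect concrete, here is a minimal sketch based on a textbook two-allele fixation-bias model, which is not taken from this article. If AT→GC and GC→AT mutations occur with a rate ratio of 1:λ and gBGC acts like a population-scaled fixation bias B in favor of GC alleles, the equilibrium GC-content is 1 / (1 + λ·exp(−B)). The parameter values below are purely illustrative assumptions.

```python
import math


def equilibrium_gc(mutation_bias, scaled_gbgc):
    """Equilibrium GC-content under a two-allele model with fixation bias.

    mutation_bias: ratio of GC->AT to AT->GC mutation rates (lambda).
    scaled_gbgc:   population-scaled gBGC coefficient (B); 0 means no gBGC.
    """
    return 1.0 / (1.0 + mutation_bias * math.exp(-scaled_gbgc))


# Illustrative values only: AT-biased mutation (lambda = 2), increasing gBGC strength.
for b in (0.0, 0.5, 1.0, 2.0):
    print(b, round(equilibrium_gc(2.0, b), 3))
```

Under this simplified model, a region with lower long-term recombination (smaller B) settles at a lower GC-content, which is the direction of the difference reported for ‘proliferation’ genes.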

The difference in intragenic recombination rate between functional categories is explained by their expression level in meiosis

McVicker and Green (2010) reported a negative correlation between intragenic recombination rate and meiotic gene expression level. We reevaluated this relationship using recently published high-resolution genetic maps (Bhérer et al., 2017), maps of meiotic double-strand breaks (DSBs) (Pratto et al., 2014) and meiotic gene expression datasets (Guo et al., 2015; Lesch et al., 2016). These new data show that the relationship between crossover rate and meiotic gene expression is even stronger than initially reported: we observed that the crossover rate is 3.5 (males) to 5.4 (females) times lower in highly expressed genes (top 10%) than in weakly expressed genes (bottom 10%) (Figure 3A, Figure 3—figure supplement 3A,B). This reduction in crossover rate is explained, at least in part, by a lower density of meiotic DSB hotspots within highly expressed genes (Figure 3—figure supplement 3C). In agreement with Bhérer et al. (2017), we observed an elevation of crossover rate around transcription start sites, specifically in females (Figure 4—figure supplement 1). However, this peak is observed only in genes with low or medium meiotic expression level (Figure 4). Within genes with high meiotic expression level, we observed a strong reduction of crossover rate in both sexes, affecting the entire transcription unit, from the TSS to the polyadenylation site (Figure 4).

Figure 3 (with 3 supplements)


Variation in intragenic crossover rate and GC3 according to expression levels in meiotic cells.

(A) Genes were classified according to their sex-averaged expression level in meiotic cells into 10 bins of equal sample size. The mean sex-averaged intragenic crossover rate (HapMap) was computed for each bin. Error bars represent the 95% confidence interval of the mean. Similar results were obtained when analyzing sex-specific crossover rates and expression levels or when using DSB maps as a measure of recombination rate (Figure 3—figure supplement 3). (B) Variation in GC3 according to meiotic expression levels. Genes were first binned into 3 classes of equal sample size according to their sex-averaged expression level in meiotic cells (low: <3.07 FPKM; high: >22.68 FPKM; medium: the others), and then split into three sets according to their functional category: ‘proliferation’ (red), ‘differentiation’ (blue), and ‘other’ genes (grey). Boxplots display the distribution of GC3 for each functional category within each expression bin.

https://doi.org/10.7554/eLife.27344.005

Figure 4 (with 2 supplements)


Variation in crossover rate as a function of the distance to transcription start site (TSS) and to the polyadenylation site, and according to meiotic expression level.

Autosomal genes longer than 5 kb (N = 15,055) were classified into three bins of equal sample size according to their expression level in female (top panels) or male meiosis (bottom panels): low (green), medium (orange) and high (red) expression level. Sex-specific crossover rates were measured in 1 kb-long non-overlapping windows. Shaded areas represent the 95% confidence interval of the mean.

https://doi.org/10.7554/eLife.27344.009

We also analyzed other RNA-seq data sets (either from single cells or bulk samples), covering a broad range of tissues and cell types: somatic or germ cells at different stages of developing male and female embryos (20 different conditions; [Guo et al., 2015]) and differentiated adult tissues (26 somatic tissues, plus testis, which contains a fraction of germ cells; [Fagerberg et al., 2014]). In agreement with McVicker and Green (2010), we observed that the negative correlation between expression level and intragenic crossover rate is stronger in germ cells than in somatic samples (Figure 3—figure supplement 1), which indicates that recombination is associated with expression level specifically in meiotic cells.

Many ‘proliferation’ genes are involved in basic cellular functions, and hence tend to be expressed at relatively high levels in many tissues and at all developmental stages. In particular, most of these genes are highly expressed in meiotic cells: 65% of ‘proliferation’ genes are among the top 33% of genes with highest expression level (whereas only 11% are in the first tercile; Figure 3—figure supplement 2). Conversely, only 26% of ‘differentiation’ genes are highly expressed in meiotic cells, while 42% are in the first tercile (Figure 3—figure supplement 2). This large proportion of ‘proliferation’ genes with high meiotic expression levels can therefore explain why they tend to have a relatively low intragenic crossover rate (Figure 2B), and hence, given the gBGC process, why they tend to have a lower GC3 (Figure 1C). To further test whether these differences in expression patterns could account for the difference in GC3 between ‘proliferation’, ‘differentiation’ and ‘other’ genes, we binned genes into three classes of increasing meiotic expression level. The distribution of GC3 is clearly shifted toward lower values for genes highly expressed at meiosis (top 33%), compared to genes weakly expressed (bottom 33%): the average GC3 is 0.51 in the ‘high’ category compared to 0.65 in the ‘low’ category (p-value<2.10⁻¹⁶) (Figure 3B). However, there is no significant difference in the distribution of GC3 between ‘proliferation’ and ‘differentiation’ within bins of low or high expression (p-value=0.68 and 0.15 respectively). For the mid-expression bin, there is still a significant difference in GC3 between ‘proliferation’ and ‘differentiation’ (p-value=3.2.10⁻⁸), potentially explained by differences in expression between categories within this bin. Thus, most of the difference in synonymous codon usage between functional categories (Figure 1C) disappears once the level of expression during meiosis is controlled for (Figure 3B).

Thus, differences in synonymous codon usage among gene categories in human can be explained through the following causative chain: (i) The set of ‘proliferation’ genes is enriched in genes highly expressed in meiosis. (ii) Because high expression at meiosis is associated with a decreased rate of recombination, intragenic recombination rates are lower in the ‘proliferation’ set. (iii) In turn, reduced intragenic recombination diminishes the effect of gBGC on exon base composition, and hence GC3 is lower in the set ‘proliferation’ compared to ‘differentiation’.

To check whether this cascade of effects fully recapitulates the difference in synonymous codon usage between ‘proliferation’ and ‘differentiation’, we investigated whether differences in SCU between functional categories are driven by expression level in cells undergoing meiosis, rather than by expression level in another cell type or tissue. We examined the relationship between GC3 and expression levels in a broad panel of cell and tissue conditions (Figure 5). As predicted by our model, expression levels in germ cells, either from single-cell samples or from testis (which contains germ cells), are better predictors of GC3 than expression in all other somatic tissues. Strikingly, the level of expression in primary germ cells is, on average, twice as informative as expression in somatic cells taken at a comparable stage of development (Figure 5B). Among all individual samples, the strongest correlation between GC3 and expression level was found in male meiotic cells (pachytene spermatocytes, R² = 6.3%, p-value<2.10⁻¹⁶). Female meiotic cells (primordial germ cells, PGC 17 W) showed a similar correlation level (R² = 4.0%, p-value<2.10⁻¹⁶). As expected, the correlation is even stronger with sex-averaged meiotic expression level (R² = 8.6%, p-value<2.10⁻¹⁶). Hence, these results confirm that the cell type for which gene expression level is the best predictor of GC3 (and therefore SCU) corresponds to meiotic cells.
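The sample-by-sample comparison above amounts to computing, for each tissue or cell type, the R² between expression level and GC3 across genes, and then ranking the samples. A minimal sketch follows; the log(FPKM + 1) transform is an assumption made here for illustration, and the sample names and values are invented.

```python
import math


def pearson_r2(x, y):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


def rank_samples_by_r2(expression_by_sample, gc3):
    """Sort samples by increasing R² between log expression and GC3."""
    scores = {
        name: pearson_r2([math.log(v + 1.0) for v in fpkm], gc3)
        for name, fpkm in expression_by_sample.items()
    }
    return sorted(scores.items(), key=lambda item: item[1])


# Toy data (invented): four genes, two samples.
gc3 = [0.7, 0.6, 0.5, 0.4]
expression = {
    "somatic_sample": [5.0, 1.0, 6.0, 2.0],
    "germ_cell_sample": [1.0, 3.0, 9.0, 27.0],
}
ranking = rank_samples_by_r2(expression, gc3)
print(ranking)
```

Sorting by increasing R² mirrors the layout of Figure 5, where the most informative samples appear last.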

Figure 5


Correlation between expression level and GC3 in a panel of tissues and cell types.

(A) Bulk adult tissues data (Fagerberg et al., 2014) and (B) early embryo single-cell data (Guo et al., 2015). These two subsets were obtained via very different protocols, which prevents direct cross-comparisons. Samples are sorted by increasing correlation coefficient (R²) between expression levels and GC3 (NB: all correlations are negative). Samples containing somatic cells are shown in blue; male germ cells in orange (testis or single cell) and female germ cells in red (PGC: primordial germ cells). The green point corresponds to cells from the inner cell mass (ICM) of the blastocysts, i.e. pluripotent cells from an early stage of development preceding the differentiation of germ cells.

https://doi.org/10.7554/eLife.27344.012

GC-content of non-coding regions and meiotic expression explain 70% of the variation in synonymous codon usage of human genes

Meiotic expression is associated with a deficit of recombination all along the gene (Figure 4). Thus, the expression pattern is expected to affect gBGC intensity (and hence the GC-content) both in exons and in introns. Consistent with that prediction, the GC3 of human genes is strongly correlated with the GC-content of their introns (GCi, R² = 62.7%, p-value<2.10⁻¹⁶). We built a linear model to quantify the relative contribution of the different parameters that covary with the GC3 of human genes (GCi, GC-flank, intragenic crossover rate, meiotic expression level, and ‘proliferation’ or ‘differentiation’ functional category). The analysis of variance demonstrates that GCi is by far the best predictor of GC3, but GC-flank, intragenic crossover rate and gene expression level during meiosis also significantly improve the model (by 0.2%, 3.9% and 1.4%, respectively; Table 1, ANOVA, p-values<2.10⁻¹⁶). Adding the functional category (‘differentiation’ versus ‘proliferation’) as a categorical variable also improves the model significantly, but its quantitative influence is minor (0.1%, p-value<2.10⁻¹⁶, Table 1). Altogether, 68.2% of the variance in GC3 among human genes can be explained by the first four parameters (GCi, GC-flank, intragenic crossover rate, meiotic expression). Adding interaction terms to the linear model gives very similar results (70.4% variance explained, same levels of significance for all variables).

Table 1

Analysis of the variance of GC3 among individual genes.

Variables included in the linear model are: GC-content of introns (GCi), GC-content of flanking regions (GC-flank), HapMap sex-averaged intragenic crossover rate (log scale), sex-averaged meiotic gene expression level (log scale) and functional category (‘differentiation’, ‘proliferation’ and ‘other’). Pairwise correlations (pairwise R²) were computed between GC3 and each of the other variables. Correlations of the model (model R²) were computed by adding variables sequentially.

https://doi.org/10.7554/eLife.27344.013

GC3 predictors	Pairwise R²	Pairwise p-value	Model R² (cumulative)	F statistic	Model p-value
GCi	62.7%	<2.10⁻¹⁶	62.7%	30232.4	<2.10⁻¹⁶
GC-flank	48.1%	<2.10⁻¹⁶	62.9%	126.8	<2.10⁻¹⁶
Intragenic crossover rate	12.8%	<2.10⁻¹⁶	66.8%	1453.3	<2.10⁻¹⁶
Expression level in meiosis	8.3%	<2.10⁻¹⁶	68.2%	875.7	<2.10⁻¹⁶
Functional category	1%	<2.10⁻¹⁶	68.3%	30.43	<2.10⁻¹⁶
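The "Model R²" column is obtained by adding predictors one at a time and recording the cumulative R² of the linear model. The sketch below illustrates that bookkeeping with a small pure-Python ordinary least-squares fit; it is not the authors' code, the predictor names and data are invented, and a statistics package would normally be used instead.

```python
def ols_r2(rows, y):
    """R² of an ordinary least-squares fit of y on the given rows (intercept added)."""
    design = [[1.0] + list(r) for r in rows]
    p = len(design[0])
    # Normal equations (X'X) beta = X'y, stored as an augmented matrix.
    aug = [
        [sum(r[i] * r[j] for r in design) for j in range(p)]
        + [sum(r[i] * yi for r, yi in zip(design, y))]
        for i in range(p)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(p):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    beta = [aug[i][p] / aug[i][i] for i in range(p)]
    mean_y = sum(y) / len(y)
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    ss_res = sum(
        (yi - sum(b * x for b, x in zip(beta, r))) ** 2 for r, yi in zip(design, y)
    )
    return 1.0 - ss_res / ss_tot


def sequential_r2(predictors, y):
    """Cumulative R² as predictors (dict: name -> values) are added in order."""
    used, out = [], []
    for name, column in predictors.items():
        used.append(column)
        out.append((name, ols_r2(list(zip(*used)), y)))
    return out


# Toy data (invented): the response is an exact linear function of both predictors.
predictors = {
    "predictor_a": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "predictor_b": [1.0, 0.0, 1.0, 0.0, 0.0, 1.0],
}
response = [2 * a + 3 * b for a, b in zip(*predictors.values())]
steps = sequential_r2(predictors, response)
print(steps)
```

Because the increments depend on the order in which variables enter, the gain attributed to a predictor such as GC-flank is small when a correlated predictor (GCi) is already in the model, even though its pairwise R² is large.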

Discussion

Biased gene conversion drives codon usage in humans

In the human genome, gene sets that belong to different functional categories differ in their synonymous codon usage. This pattern was initially interpreted as evidence that the translation program is under tight control, notably to ensure a precise regulation of genes involved in cellular differentiation or proliferation (Gingold et al., 2014). According to this model, selection should optimize the match between the SCU of genes and tRNA abundances in the cells where they are expressed. However, the comparison of synonymous codon usage for amino acids with single or multiple tRNA isoacceptors (Figure 1D) shows that the difference in SCU between functional categories does not result from constraints linked to tRNA abundance. In fact, variation in synonymous codon usage among functional categories is explained by one single dominant factor: the GC-content at third codon position (Figure 1B). The GC3 of human genes is strongly correlated with the GC-content of their introns and flanking regions (Table 1). This implies that variation in SCU results from a process that affects both coding and non-coding regions (including non-transcribed intergenic regions), and hence that it is not related to the process of translation. In fact, this observation invalidates all the models that assume that SCU is driven by a selective pressure acting on RNAs (not only translational selection, but also selection on mRNA processing, structure or stability).

Many lines of evidence indicate that large-scale variation in GC-content along chromosomes (isochores) is driven by the gBGC process, both in mammals and birds. First, there is direct evidence that recombination favors the transmission of GC-alleles over AT-alleles during meiosis (Odenthal-Hesse et al., 2014; Arbeithuber et al., 2015; de Boer et al., 2015; Williams et al., 2015; Smeds et al., 2016). Second, the analysis of polymorphism and divergence at different physical scales (from kb to Mb) showed that recombination induces a fixation bias in favor of GC alleles (Duret and Arndt, 2008; Clément and Arndt, 2013; Munch et al., 2014; Pratto et al., 2014; Weber et al., 2014; Glémin et al., 2015; Singhal et al., 2015). Third, the gBGC model predicts that the GC-content of a given genomic segment should reflect its average long-term recombination rate over tens of millions of years (Duret and Arndt, 2008). Consistent with this prediction, analyses of ancestral genetic maps in the primate lineage revealed a very strong correlation between long-term recombination rates (in 1 Mb long windows) and stationary GC-content (R² = 0.64; Munch et al., 2014). The strong correlation between GC3 and GC-flank therefore implies that variation in synonymous codon usage is primarily driven by large-scale variation in long-term recombination rate.

Besides these regional fluctuations, recombination rates also vary at a finer scale. In particular, recombination rates tend to be reduced within human genes compared to their flanking regions (Myers et al., 2005), and this decrease depends on the level of expression of genes during meiosis (McVicker and Green, 2010; see also Figure 3A and Figure 4). Hence, the gBGC model predicts that the GC3 of a gene should depend not only on the long-term recombination rate of the region where it is located, but also on its specific pattern of expression. And indeed, we observed that the difference in synonymous codon usage between ‘proliferation’ and ‘differentiation’ genes is not due to their preferential location in different classes of isochores, but to the fact that ‘proliferation’ genes tend to be expressed at a high level in meiotic cells, and therefore to have a reduced intragenic recombination rate (Figures 2 and 3).

Figure 6. Relationships between GC-content, intragenic crossover rates and meiotic expression levels (sex-averaged) among functional gene categories.

Average values of these parameters were computed for each GO gene set. We then measured correlations between these parameters: (A) Mean GC3 vs. mean sex-averaged intragenic crossover rate (HapMap). (B) Mean intragenic crossover rate vs. mean expression level in meiotic cells. (C) Mean GC3 vs. mean expression level in meiotic cells. (D) Mean intronic GC-content (GCi) vs. mean intragenic crossover rate. GO gene sets associated to ‘proliferation’ (red) or ‘differentiation’ (blue) are displayed as in Figure 1. Similar results were obtained when analyzing separately expression levels in female or male meiosis (Figure 6—figure supplement 1).

https://doi.org/10.7554/eLife.27344.014

To test whether this observation holds true for other functional categories, we measured the average GC3, intragenic crossover rate and meiotic expression level of each GO gene set. As predicted by the gBGC model, we observed a strong correlation between GC3 and the average intragenic crossover rate of GO gene sets (R² = 0.51, Figure 6A). The variance in intragenic crossover rate, in turn, is very well explained by differences in meiotic expression levels among functional classes (R² = 0.46, Figure 6B). As mentioned previously, these correlations measured on gene concatenates should be interpreted with caution because the different points are not independent (the same gene can belong to several GO categories). However, this analysis clearly shows that a large fraction of the variance in SCU observed among GO gene sets can be explained by variation in gBGC intensity, caused by variation in intragenic crossover rates, linked to differences in expression patterns (Figure 6C). In agreement with the gBGC model, the intragenic crossover rate correlates with the base composition of the entire gene, including introns (Figure 6D). This observation clearly invalidates the hypothesis that the observed differences in SCU among functional categories might be driven by selection on codon usage.

In summary, the SCU of individual genes depends primarily on the isochore in which they are located (i.e. large-scale long-term variation in recombination rate), and secondarily on their meiotic expression level (which locally affects the intragenic recombination rate) (Table 1). In gene set analyses, the variance in SCU explained by expression (Figure 6) appears much larger than in analyses of individual genes (Table 1). This is because, in gene set analyses, SCU is averaged over a large number of genes located in different isochores, which reduces the isochore effect among functional categories (and hence mechanically increases the fraction of the variance explained by expression). Overall, the different variables linked to the intensity of gBGC explain 70% of the variance in GC3 of individual genes (Table 1). In other words, the gBGC model can account for most of the variation in synonymous codon usage in the human genome.
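
The averaging argument can be illustrated with a toy simulation. The sketch below is not the analysis performed in this study: it assumes a purely hypothetical linear model in which a GC3-like score is the sum of a large regional (isochore) term and a smaller negative expression term, and it builds artificial gene sets by grouping genes of similar expression, which is an extreme simplification of real GO categories. The function names (`pearson_r2`, `simulate`) and all parameter values are illustrative assumptions.

```python
import random


def pearson_r2(xs, ys):
    """Squared Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy * sxy / (sxx * syy)


def simulate(n_genes=2000, n_sets=20, seed=1):
    """Return (R2 across individual genes, R2 across gene-set averages)."""
    rng = random.Random(seed)
    genes = []
    for _ in range(n_genes):
        expression = rng.random()       # hypothetical meiotic expression in [0, 1)
        isochore = rng.gauss(0.0, 1.0)  # hypothetical regional (isochore) effect
        # Toy GC3-like score (not clipped to [0, 1]): strong regional term,
        # weaker negative expression term. Coefficients are arbitrary.
        score = 0.5 + 0.1 * isochore - 0.1 * expression
        genes.append((expression, score))

    r2_genes = pearson_r2([e for e, _ in genes], [s for _, s in genes])

    # Artificial gene sets: consecutive blocks of genes sorted by expression.
    genes.sort()
    size = n_genes // n_sets
    set_expr, set_score = [], []
    for k in range(n_sets):
        chunk = genes[k * size:(k + 1) * size]
        set_expr.append(sum(e for e, _ in chunk) / size)
        set_score.append(sum(s for _, s in chunk) / size)

    r2_sets = pearson_r2(set_expr, set_score)
    return r2_genes, r2_sets
```

Under these assumptions, the regional term averages out within each set, so the squared correlation between expression and the GC3-like score is expected to be much larger across set averages than across individual genes.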

It should be noted that co-variation between SCU and expression is generally considered as a typical signature of translational selection and is often used to predict optimal codons (Duret, 2002; Plotkin et al., 2004; dos Reis and Wernisch, 2009). However, as shown here, such correlations can also emerge as a result of a non-adaptive process. Given that gBGC is widespread in eukaryotes (Mancera et al., 2008; Capra and Pollard, 2011; Pessia et al., 2012; de Boer et al., 2015; Williams et al., 2015; Smeds et al., 2016), it appears essential to take this process into account to interpret variation in synonymous codon usage (and more generally in base composition) among genes.

Relationship between meiotic expression and recombination

The reason why intragenic recombination rate correlates negatively with meiotic expression level is not known. In humans and mice, the location of recombination hotspots is determined by PRDM9, a zinc-finger DNA-binding protein with histone H3 lysine 4 trimethylation (H3K4me3) activity. PRDM9 is expressed during early meiosis and marks sites where DSBs are afterwards introduced by Spo11 (for review, see Baudat et al., 2013). These DSBs are then repaired by homologous recombination, forming either crossovers (reciprocal exchanges of genetic material between parental chromosomes) or noncrossovers. Knockout experiments in mice have demonstrated that PRDM9 targets recombination away from active promoters (Brick et al., 2012). Analyses of male DSB maps suggest that PRDM9 plays the same role in humans: we observed a deficit of DSB hotspots around the transcription start site (TSS), specifically within genes that are highly expressed in meiotic cells (Figure 4—figure supplement 2). The decrease in recombination rate within highly expressed genes is, however, not restricted to the promoter region: in both sexes, there is a strong deficit of crossovers within the entire transcription unit, from the TSS to the polyadenylation site (Figure 4). In species that lack Prdm9 (such as dogs, birds, Arabidopsis or yeast), recombination hotspots are strongly enriched in active promoters (Auton et al., 2013; Choi et al., 2013; Singhal et al., 2015; Lam and Keeney, 2015), which indicates that there is no mechanistic incompatibility between recombination and transcription activity in meiotic cells. However, there is evidence that in highly expressed genes, H3K36me3 marks trigger DNA methylation in the gene body, and thereby prevent spurious transcription initiation (Neri et al., 2017).
It is therefore possible that the peculiar chromatin state of highly expressed genes also interferes with the binding of PRDM9 (or with its histone modification activity), and thereby decreases the rate of DSB formation within the transcription unit. Consistent with this hypothesis, we observed a deficit in male DSB hotspot density along the transcription unit of highly expressed genes (Figure 4—figure supplement 2). This difference in DSB rates is, however, much less pronounced than the difference in male crossover rates (Figure 4; Figure 3—figure supplement 3). Furthermore, the profile of DSB hotspot density in highly expressed genes differs from that of crossover rates, with a strong deficit around the TSS and an excess around the polyadenylation site (Figure 4—figure supplement 2), whereas the deficit in male crossovers is more uniform along the transcription unit (Figure 4). This suggests that the differences in crossover profiles observed between highly and weakly expressed genes might also reflect differences in the way recombination events are resolved (crossovers vs. noncrossovers).

gBGC precludes selection on translation efficiency in humans

There is clear evidence that the usage of synonymous codons is under selective pressure in some metazoan species (such as Drosophila or nematodes), which implies that it has a significant impact on the fitness of organisms (for review, see Duret, 2002; Chamary et al., 2006; Plotkin and Kudla, 2011). It is a priori expected that codon usage should also affect translation efficiency (speed and accuracy) in mammals. However, our results show that selection on codon usage is not strong enough to counteract the impact of gBGC. In principle, this does not exclude the hypothesis that the human genome might be subject to selection for translational efficiency: even if the GC-content of genes is driven by non-adaptive processes, there might be a selective pressure on the expression of tRNA genes to match the demand in synonymous codon usage. However, recent analyses of tRNA isoacceptor pools found no evidence for such variation (Schmitt et al., 2014; Rudolph et al., 2016). Moreover, we argue here that the peculiar base composition landscape induced by gBGC in the genomes of mammals and birds makes it impossible to match the tRNA pool to the demand in codon usage. Indeed, large-scale variation in recombination rates along the genome causes very strong variation in GC3 among genes, regardless of their functional category. In particular, ‘proliferative’ genes, which are involved in basic cellular processes and are expressed at high levels in most tissues, show a very strong heterogeneity in GC3 (from 20% to almost 100%; Figure 1C). This implies that in any given cell, the set of highly expressed genes will show a very heterogeneous usage of synonymous codons. Hence, whatever the pool of tRNA available in that cell, there will be a large fraction of genes with a codon usage that does not match tRNA abundance.
In other words, the heterogeneity of synonymous codon usage in mammalian genomes reflects a non-optimal situation, caused by the gBGC process, in which it is not possible to adapt the tRNA pool to the demand in codon usage of the transcriptome of any cell type.

Materials and methods

Human protein coding genes

For each of the human protein coding genes in Ensembl (RRID: SCR_002344) release 83 (Yates et al., 2016; assembly GRCh38.p5), we identified a canonical transcript as defined in http://www.ensembl.org/Help/Glossary?id=346 (PERL script available in supplementary material). Mitochondrial genes were excluded from this analysis. Sequences of the remaining 19,766 canonical transcripts, together with exon coordinates, were downloaded through the BioMart query interface (Smedley et al., 2015; RRID: SCR_010714).

Recombination rates

Sex-specific crossover rates were measured using pedigree-based genetic maps (Bhérer et al., 2017). For sex-averaged crossover rates, we used the HapMap genetic map (Frazer et al., 2007; RRID: SCR_002846), which is based on the analysis of linkage disequilibrium in human populations, and provides a higher resolution than pedigree-based genetic maps.
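
The text does not detail how per-gene rates were extracted from these maps. As a minimal sketch, assuming a genetic map supplied as sorted physical positions (bp) with matching cumulative genetic positions (cM), an intragenic crossover rate in cM/Mb could be obtained by linear interpolation at the gene boundaries. The function names and the input layout are hypothetical and are not taken from the supplementary scripts.

```python
from bisect import bisect_right


def genetic_position(map_bp, map_cm, pos):
    """Interpolate the genetic position (cM) at physical position `pos` (bp).

    `map_bp` must be strictly increasing; positions outside the map are
    clamped to the first or last marker (an assumption of this sketch).
    """
    if pos <= map_bp[0]:
        return map_cm[0]
    if pos >= map_bp[-1]:
        return map_cm[-1]
    j = bisect_right(map_bp, pos)  # index of the first marker beyond pos
    x0, x1 = map_bp[j - 1], map_bp[j]
    y0, y1 = map_cm[j - 1], map_cm[j]
    return y0 + (y1 - y0) * (pos - x0) / (x1 - x0)


def intragenic_rate(map_bp, map_cm, start, end):
    """Average crossover rate (cM/Mb) between gene start and end (bp)."""
    if end <= start:
        raise ValueError("end must be greater than start")
    centimorgans = (genetic_position(map_bp, map_cm, end)
                    - genetic_position(map_bp, map_cm, start))
    return centimorgans / ((end - start) / 1e6)
```

For example, with markers at 0, 1 and 2 Mb carrying cumulative positions of 0, 1 and 3 cM, a gene spanning 0.5 to 1.5 Mb covers 1.5 cM over 1 Mb.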

The density in DSB hotspots along genes was measured using the map of DSB hotspots (targeted by Prdm9 alleles A, B or C) identified by DMC1-ChipSeq experiments in male meiotic cells (Pratto et al., 2014).

Definition of functional categories

The GO Term Accessions and GO domain were retrieved from Ensembl version 83 for the 19,766 genes. We retrieved biological process GO terms, counted the number of genes associated with each GO term and kept the ones that include at least 40 genes, except GO:0005515, which is too general to be informative (‘protein binding’ GO set, which includes 14,542 genes). This led to a final list of 687 GO gene sets. For each gene set, we concatenated coding sequences to compute the total codon usage, the relative synonymous codon usage (RSCU) and GC-content, and we also computed the average intragenic crossover rate and average expression levels (see below). The RSCU of a given codon corresponds to its frequency, normalized by its expected frequency if all corresponding synonymous codons were equally used (Sharp et al., 1986). For a given amino acid $x$, encoded by $n_x$ synonymous codons, the RSCU of its codon $y$ is given by:

$$\mathrm{RSCU}_{xy} = \frac{C_{xy}}{A_{x} / n_{x}}$$

where $C_{xy}$ is the number of occurrences of codon $y$ for amino acid $x$, and $A_{x}$ is the total number of occurrences of codons for amino acid $x$.
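
As an illustration of the RSCU definition, the following minimal sketch computes RSCU values and GC3 from a coding sequence. It is not the authors' script: it assumes the standard genetic code, ignores stop codons and codons containing ambiguous bases, and computes GC3 over all complete codons, which may differ from the conventions used in this study. The names `rscu` and `gc3` are hypothetical.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ('*' marks stops).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Synonymous families: amino acid -> list of its codons (stops excluded).
FAMILIES = {}
for _codon, _aa in CODON_TABLE.items():
    if _aa != "*":
        FAMILIES.setdefault(_aa, []).append(_codon)


def split_codons(cds):
    """Split a coding sequence into complete codons (trailing bases dropped)."""
    cds = cds.upper()
    return [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]


def rscu(cds):
    """Return {codon: RSCU} for amino acids present in the sequence."""
    counts = Counter(c for c in split_codons(cds)
                     if CODON_TABLE.get(c, "*") != "*")
    values = {}
    for family in FAMILIES.values():
        a_x = sum(counts[c] for c in family)  # total codons for this amino acid
        if a_x == 0:
            continue                          # RSCU undefined if amino acid absent
        n_x = len(family)                     # number of synonymous codons
        for c in family:
            values[c] = counts[c] / (a_x / n_x)
    return values


def gc3(cds):
    """Fraction of G or C at third codon positions."""
    codons = split_codons(cds)
    if not codons:
        raise ValueError("sequence contains no complete codon")
    return sum(c[2] in "GC" for c in codons) / len(codons)
```

For a gene set, the same functions can be applied to the concatenation of the coding sequences of its genes, as described above.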

Following the classification used by Gingold et al. (2014), we further defined two broad functional categories: ‘proliferation’ and ‘differentiation’. GO terms containing the following keywords were associated to ‘proliferation’: ‘Chromatin modification’, ‘chromatin remodeling’, ‘mitotic cell cycle’, ‘mRNA metabolic process’, ‘negative regulation of cell cycle’, ‘nucleosome assembly’, ‘translation’. GO terms containing the following keywords were associated to ‘differentiation’: ‘Development’, ‘differentiation’, ‘cell adhesion’, ‘pattern specification’, ‘multicellular organism growth’, ‘angiogenesis’. Please note that GO terms corresponding to negative effects were excluded where appropriate (e.g. ‘negative regulation of proliferation’ was not included in the ‘proliferation’ category). Complete lists of GO terms are available in the supplementary material.
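
The gene-count threshold and the keyword-based assignment described in this section can be sketched as follows. This is an illustration under stated assumptions, not the procedure actually used: matching is done by case-insensitive substring search, and the terms excluded "where appropriate" are represented by a hypothetical, manually curated set (`excluded_names`) that the caller must supply. The authoritative lists are those of the supplementary material.

```python
PROLIFERATION_KEYWORDS = (
    "chromatin modification", "chromatin remodeling", "mitotic cell cycle",
    "mrna metabolic process", "negative regulation of cell cycle",
    "nucleosome assembly", "translation",
)
DIFFERENTIATION_KEYWORDS = (
    "development", "differentiation", "cell adhesion", "pattern specification",
    "multicellular organism growth", "angiogenesis",
)


def keep_gene_sets(go_to_genes, min_genes=40, too_general=("GO:0005515",)):
    """Keep GO terms with at least `min_genes` genes, dropping overly general terms."""
    return {term: genes for term, genes in go_to_genes.items()
            if len(genes) >= min_genes and term not in too_general}


def classify_go_term(name, excluded_names=frozenset()):
    """Return the set of broad categories whose keywords occur in a GO term name.

    `excluded_names` holds lower-case term names rejected by manual curation
    (hypothetical input; not derived automatically here).
    """
    lowered = name.lower()
    if lowered in excluded_names:
        return set()
    categories = set()
    if any(keyword in lowered for keyword in PROLIFERATION_KEYWORDS):
        categories.add("proliferation")
    if any(keyword in lowered for keyword in DIFFERENTIATION_KEYWORDS):
        categories.add("differentiation")
    return categories
```

Returning a set rather than a single label makes terms that match both keyword lists visible, so that they can be inspected by hand.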

Analyses of individual genes

We also measured the codon usage of individual genes, to analyze covariations with their GC-content, expression levels and sex-averaged intragenic crossover rate (HapMap). Owing to the low SNP density in human populations, the resolution of recombination maps is limited to about 5 kb (Myers et al., 2005). Because we investigate the relationship between GC3 and intragenic crossover rate, we selected genes that are long enough to measure recombination, that is at least 5 kb long (N = 16,223 genes).

We defined three non-overlapping classes of genes according to their GO category: genes associated to at least one of the ‘proliferation’ GO terms (N = 1,008), genes associated to ‘differentiation’ GO terms (N = 2,833) and other genes (N = 12,129). A group of 253 genes that were associated to both ‘proliferation’ and ‘differentiation’ GO terms were discarded from further analyses. The final dataset used in our analyses included 15,970 genes. In this dataset, there were 15,848 genes that contain at least one intron and for which we computed the GC content of intronic regions. The analyses of sex-specific crossover rates and of DSB hotspot densities (Figure 4; Figure 4—figure supplement 2) were based on 15,055 autosomal genes.
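
The partition into non-overlapping classes amounts to a few set operations, sketched below together with a consistency check on the gene counts reported in this section. The function names are hypothetical and the code is only an illustration of the logic.

```python
def partition_genes(all_genes, proliferation, differentiation):
    """Split genes into three exclusive classes; genes in both categories are discarded."""
    prolif = proliferation & all_genes
    diff = differentiation & all_genes
    both = prolif & diff
    return {
        "proliferation": prolif - both,
        "differentiation": diff - both,
        "other": all_genes - prolif - diff,
        "discarded": both,
    }


def reported_counts_consistent():
    """Check the arithmetic of the counts quoted in the text."""
    selected = 16223                      # genes at least 5 kb long
    discarded = 253                       # genes in both categories
    classes = (1008, 2833, 12129)         # proliferation, differentiation, other
    return sum(classes) == selected - discarded == 15970
```

The three classes sum to 15,970 genes, which equals the 16,223 selected genes minus the 253 discarded ones.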

Expression data

Gene expression levels were collected from three publicly available human RNA-seq datasets. The first one includes 27 differentiated adult tissues (Fagerberg et al., 2014; Kryuchkova-Mostacci and Robinson-Rechavi, 2015; EBI accession number E-MTAB-1733). We downloaded normalized expression levels, already averaged across replicates, from Fagerberg et al. (2014) and Kryuchkova-Mostacci and Robinson-Rechavi (2015) (see supplementary information). The second one is based on single-cell RNA-seq analysis and includes 20 samples, corresponding to the inner cell mass (ICM) of blastocysts, and to primordial germ cells (PGCs) and somatic cells from male and female embryos at different developmental stages (4, 7 or 8, 10, 11 and 17 or 19 weeks; Guo et al., 2015; GEO accession number GSE63818). We downloaded normalized expression levels from their dataset of pool-split PGCs (for more details see supplementary information). Female PGCs at 17 weeks have entered meiosis (Guo et al., 2015). This sample was therefore taken as representative of the transcriptome of meiotic cells in females. The third dataset corresponds to human male germ cells at the pachytene spermatocyte stage (i.e. cells entering meiosis) and at the round spermatid stage (post-meiotic stage) (Lesch et al., 2016; GEO accession number GSE68507, human RNA expression datasets GSM1673959, GSM1673963, GSM1673967, GSM1673971, GSM1673975 and GSM1673978). The Guo and Lesch datasets include several replicates for each sample. We therefore computed the average expression level over all replicates for each sample. The sex-averaged meiotic expression level was estimated by computing the mean of expression levels in female 17-week PGCs (Guo et al., 2015) and male spermatocytes or spermatids (Lesch et al., 2016). The correspondence between gene expression datasets and codon usage tables was based on Ensembl gene identifiers (Fagerberg and Lesch datasets), or on gene names (Guo dataset).
In total, our analyses of expression levels were based on 15,305 genes (665 genes were absent from the Guo dataset).
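
The two averaging steps (over replicates, then over sexes) can be sketched as follows. This is a simplified illustration: expression tables are assumed to be plain dictionaries mapping a gene identifier to a normalized expression level, genes missing from any input are dropped, and a single male sample is used, whereas the study considers spermatocytes or spermatids. The function names are hypothetical.

```python
def average_replicates(replicates):
    """Mean expression per gene over a list of replicate tables (dicts)."""
    if not replicates:
        raise ValueError("at least one replicate is required")
    shared = set(replicates[0])
    for table in replicates[1:]:
        shared &= set(table)              # keep genes measured in every replicate
    return {gene: sum(table[gene] for table in replicates) / len(replicates)
            for gene in shared}


def sex_averaged(female, male):
    """Mean of female and male meiotic expression for genes present in both tables."""
    return {gene: (female[gene] + male[gene]) / 2
            for gene in female.keys() & male.keys()}
```

Restricting the result to genes present in both tables mirrors the fact that genes absent from one dataset cannot be assigned a sex-averaged value.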

Statistical analysis

Unless stated otherwise, reported R² values correspond to Pearson correlation tests. R version 3.2.2 (R Core Team, 2015) was used with the base package for statistical tests and graphics, plus the ade4 library (Dray and Dufour, 2007) for PCA. The data and R scripts needed to reproduce the figures and tests presented here are provided in the supplementary material.

Supplementary information

Supplementary materials with R scripts and supplementary methods are available at: http://doi.org/10.5281/zenodo.835063 (Pouyet et al., 2017).

Data availability

The following previously published data sets were used

1. Guo F
2. Yan L
3. Guo H
4. Li L
5. Hu B
6. Zhao Y
7. Yong J
8. Hu Y
9. Wang X
10. Wei Y
11. et al
(2015) The transcriptome and DNA methylome landscapes of human primordial germ cells.
Publicly available at the NCBI Gene Expression Omnibus (accession no: GSE63818).

https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE63818
1. Fagerberg L
2. Hallström BM
3. Oksvold P
4. Kampf C
5. Djureinovic D
6. Odeberg J
7. Habuka M
8. Tahmasebpoor S
9. Danielsson A
10. Edlund K
11. et al
(2014) Analysis of the human tissue-specific expression by genome-wide integration of transcriptomics and antibody-based proteomics
Publicly available at ArrayExpress (accession no. E-MTAB-1733).

https://www.ebi.ac.uk/arrayexpress/experiments/E-MTAB-1733/
(2016) Parallel evolution of male germline epigenetic poising and somatic development in animals.
Publicly available at the NCBI Gene Expression Omnibus (accession no: GSE68507).

https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE68507
1. Yates A
2. Akanni W
3. Amode MR
4. Barrell D
5. Billis K
6. Carvalho-Silva D
7. Cummins C
8. Clapham P
9. Fitzgerald S
10. Gil L
11. et al
(2007) A second generation human haplotype map of over 3.1 million SNPs.
Publicly available at the NCBI ftp site for HapMap.

ftp://ftp.ncbi.nlm.nih.gov/hapmap/recombination/
(2017) Refined genetic maps reveal sexual dimorphism in human meiotic recombination
Publicly available at Github (https://github.com/).

https://github.com/cbherer/Bherer_etal_SexualDimorphismRecombination

References

1. Akashi H
(1994)
Synonymous codon usage in Drosophila melanogaster: natural selection and translational accuracy

Genetics 136:927–935.
(2015) Crossovers are associated with mutation and biased gene conversion at recombination hotspots
PNAS 112:2109–2114.

https://doi.org/10.1073/pnas.1416622112
1. Auton A
2. Rui Li Y
3. Kidd J
4. Oliveira K
5. Nadel J
6. Holloway JK
7. Hayward JJ
8. Cohen PE
9. Greally JM
10. Wang J
11. Bustamante CD
12. Boyko AR
(2013) Genetic recombination is targeted towards gene promoter regions in dogs
PLoS Genetics 9:e1003984.

https://doi.org/10.1371/journal.pgen.1003984
(2013) Meiotic recombination in mammals: localization and regulation
Nature Reviews Genetics 14:794–806.

https://doi.org/10.1038/nrg3573
1. Bernardi G
2. Olofsson B
3. Filipski J
4. Zerial M
5. Salinas J
6. Cuny G
7. Meunier-Rotival M
8. Rodier F
(1985) The mosaic genome of warm-blooded vertebrates
Science 228:953–958.

https://doi.org/10.1126/science.4001930
(2017) Refined genetic maps reveal sexual dimorphism in human meiotic recombination at multiple scales
Nature Communications 8:14994.

https://doi.org/10.1038/ncomms14994
(2012) Genetic recombination is directed away from functional genomic elements in mice
Nature 485:642–645.

https://doi.org/10.1038/nature11089
1. Bulmer M
(1991)
The selection-mutation-drift theory of synonymous codon usage

Genetics 129:897–907.
1. Capra JA
2. Pollard KS
(2011) Substitution patterns are GC-biased in divergent sequences across the metazoans
Genome Biology and Evolution 3:516–527.

https://doi.org/10.1093/gbe/evr051
1. Castillo-Davis CI
2. Hartl DL
(2002) Genome evolution and developmental constraint in Caenorhabditis elegans
Molecular Biology and Evolution 19:728–735.

https://doi.org/10.1093/oxfordjournals.molbev.a004131
(2006) Hearing silence: non-neutral evolution at synonymous sites in mammals
Nature Reviews Genetics 7:98–108.

https://doi.org/10.1038/nrg1770
1. Chan PP
2. Lowe TM
(2016) GtRNAdb 2.0: an expanded database of transfer RNA genes identified in complete and draft genomes
Nucleic Acids Research 44:D184–D189.

https://doi.org/10.1093/nar/gkv1309
1. Choi K
2. Zhao X
3. Kelly KA
4. Venn O
5. Higgins JD
6. Yelina NE
7. Hardcastle TJ
8. Ziolkowski PA
9. Copenhaver GP
10. Franklin FC
11. McVean G
12. Henderson IR
(2013) Arabidopsis meiotic crossover hot spots overlap with H2A.Z nucleosomes at gene promoters
Nature Genetics 45:1327–1336.

https://doi.org/10.1038/ng.2766
1. Clay OK
2. Bernardi G
(2011) GC3 of genes can be used as a proxy for isochore base composition: a reply to Elhaik et al
Molecular Biology and Evolution 28:21–23.

https://doi.org/10.1093/molbev/msq222
1. Clément Y
2. Arndt PF
(2013) Meiotic recombination strongly influences GC-content evolution in short regions in the mouse genome
Molecular Biology and Evolution 30:2612–2618.

https://doi.org/10.1093/molbev/mst154
Book
1. Core Team R
(2015)
R: A Language and Environment for Statistical Computing

Austria: Vienna.
(2015) Local and sex-specific biases in crossover vs. noncrossover outcomes at meiotic recombination hot spots in mice
Genes & Development 29:1721–1733.

https://doi.org/10.1101/gad.265561.115
1. dos Reis M
2. Wernisch L
(2009) Estimating translational selection in eukaryotic genomes
Molecular Biology and Evolution 26:451–461.

https://doi.org/10.1093/molbev/msn272
1. Dray S
2. Dufour A-B
(2007)
The ade4 Package: implementing the duality diagram for ecologists

Journal of Statistical Software 22:1–20.
1. Drummond DA
2. Wilke CO
(2008) Mistranslation-induced protein misfolding as a dominant constraint on coding-sequence evolution
Cell 134:341–352.

https://doi.org/10.1016/j.cell.2008.05.042
1. Duret L
2. Arndt PF
(2008) The impact of recombination on nucleotide substitutions in the human genome
PLoS Genetics 4:e1000071.

https://doi.org/10.1371/journal.pgen.1000071
1. Duret L
2. Galtier N
(2009) Biased gene conversion and the evolution of mammalian genomic landscapes
Annual Review of Genomics and Human Genetics 10:285–311.

https://doi.org/10.1146/annurev-genom-082908-150001
1. Duret L
2. Mouchiroud D
(1999) Expression pattern and, surprisingly, gene length shape codon usage in Caenorhabditis, Drosophila, and Arabidopsis
PNAS 96:4482–4487.

https://doi.org/10.1073/pnas.96.8.4482
1. Duret L
(2002) Evolution of synonymous codon usage in metazoans
Current Opinion in Genetics & Development 12:640–649.

https://doi.org/10.1016/S0959-437X(02)00353-2
1. Fagerberg L
2. Hallström BM
3. Oksvold P
4. Kampf C
5. Djureinovic D
6. Odeberg J
7. Habuka M
8. Tahmasebpoor S
9. Danielsson A
10. Edlund K
11. Asplund A
12. Sjöstedt E
13. Lundberg E
14. Szigyarto CA
15. Skogs M
16. Takanen JO
17. Berling H
18. Tegel H
19. Mulder J
20. Nilsson P
21. Schwenk JM
22. Lindskog C
23. Danielsson F
24. Mardinoglu A
25. Sivertsson A
26. von Feilitzen K
27. Forsberg M
28. Zwahlen M
29. Olsson I
30. Navani S
31. Huss M
32. Nielsen J
33. Ponten F
34. Uhlén M
(2014) Analysis of the human tissue-specific expression by genome-wide integration of transcriptomics and antibody-based proteomics
Molecular & Cellular Proteomics 13:397–406.

https://doi.org/10.1074/mcp.M113.035600
1. Frazer KA
2. Ballinger DG
3. Cox DR
4. Hinds DA
5. Stuve LL
6. Gibbs RA
7. Belmont JW
8. Boudreau A
9. Hardenbol P
10. Leal SM
11. Pasternak S
12. Wheeler DA
13. Willis TD
14. Yu F
15. Yang H
16. Zeng C
17. Gao Y
18. Hu H
19. Hu W
20. Li C
21. Lin W
22. Liu S
23. Pan H
24. Tang X
25. Wang J
26. Wang W
27. Yu J
28. Zhang B
29. Zhang Q
30. Zhao H
31. Zhao H
32. Zhou J
33. Gabriel SB
34. Barry R
35. Blumenstiel B
36. Camargo A
37. Defelice M
38. Faggart M
39. Goyette M
40. Gupta S
41. Moore J
42. Nguyen H
43. Onofrio RC
44. Parkin M
45. Roy J
46. Stahl E
47. Winchester E
48. Ziaugra L
49. Altshuler D
50. Shen Y
51. Yao Z
52. Huang W
53. Chu X
54. He Y
55. Jin L
56. Liu Y
57. Shen Y
58. Sun W
59. Wang H
60. Wang Y
61. Wang Y
62. Xiong X
63. Xu L
64. Waye MM
65. Tsui SK
66. Xue H
67. Wong JT
68. Galver LM
69. Fan JB
70. Gunderson K
71. Murray SS
72. Oliphant AR
73. Chee MS
74. Montpetit A
75. Chagnon F
76. Ferretti V
77. Leboeuf M
78. Olivier JF
79. Phillips MS
80. Roumy S
81. Sallée C
82. Verner A
83. Hudson TJ
84. Kwok PY
85. Cai D
86. Koboldt DC
87. Miller RD
88. Pawlikowska L
89. Taillon-Miller P
90. Xiao M
91. Tsui LC
92. Mak W
93. Song YQ
94. Tam PK
95. Nakamura Y
96. Kawaguchi T
97. Kitamoto T
98. Morizono T
99. Nagashima A
100. Ohnishi Y
101. Sekine A
102. Tanaka T
103. Tsunoda T
104. Deloukas P
105. Bird CP
106. Delgado M
107. Dermitzakis ET
108. Gwilliam R
109. Hunt S
110. Morrison J
111. Powell D
112. Stranger BE
113. Whittaker P
114. Bentley DR
115. Daly MJ
116. de Bakker PI
117. Barrett J
118. Chretien YR
119. Maller J
120. McCarroll S
121. Patterson N
122. Pe'er I
123. Price A
124. Purcell S
125. Richter DJ
126. Sabeti P
127. Saxena R
128. Schaffner SF
129. Sham PC
130. Varilly P
131. Altshuler D
132. Stein LD
133. Krishnan L
134. Smith AV
135. Tello-Ruiz MK
136. Thorisson GA
137. Chakravarti A
138. Chen PE
139. Cutler DJ
140. Kashuk CS
141. Lin S
142. Abecasis GR
143. Guan W
144. Li Y
145. Munro HM
146. Qin ZS
147. Thomas DJ
148. McVean G
149. Auton A
150. Bottolo L
151. Cardin N
152. Eyheramendy S
153. Freeman C
154. Marchini J
155. Myers S
156. Spencer C
157. Stephens M
158. Donnelly P
159. Cardon LR
160. Clarke G
161. Evans DM
162. Morris AP
163. Weir BS
164. Tsunoda T
165. Mullikin JC
166. Sherry ST
167. Feolo M
168. Skol A
169. Zhang H
170. Zeng C
171. Zhao H
172. Matsuda I
173. Fukushima Y
174. Macer DR
175. Suda E
176. Rotimi CN
177. Adebamowo CA
178. Ajayi I
179. Aniagwu T
180. Marshall PA
181. Nkwodimmah C
182. Royal CD
183. Leppert MF
184. Dixon M
185. Peiffer A
186. Qiu R
187. Kent A
188. Kato K
189. Niikawa N
190. Adewole IF
191. Knoppers BM
192. Foster MW
193. Clayton EW
194. Watkin J
195. Gibbs RA
196. Belmont JW
197. Muzny D
198. Nazareth L
199. Sodergren E
200. Weinstock GM
201. Wheeler DA
202. Yakub I
203. Gabriel SB
204. Onofrio RC
205. Richter DJ
206. Ziaugra L
207. Birren BW
208. Daly MJ
209. Altshuler D
210. Wilson RK
211. Fulton LL
212. Rogers J
213. Burton J
214. Carter NP
215. Clee CM
216. Griffiths M
217. Jones MC
218. McLay K
219. Plumb RW
220. Ross MT
221. Sims SK
222. Willey DL
223. Chen Z
224. Han H
225. Kang L
226. Godbout M
227. Wallenburg JC
228. L'Archevêque P
229. Bellemare G
230. Saeki K
231. Wang H
232. An D
233. Fu H
234. Li Q
235. Wang Z
236. Wang R
237. Holden AL
238. Brooks LD
239. McEwen JE
240. Guyer MS
241. Wang VO
242. Peterson JL
243. Shi M
244. Spiegel J
245. Sung LM
246. Zacharia LF
247. Collins FS
248. Kennedy K
249. Jamieson R
250. Stewart J
251. International HapMap Consortium
(2007) A second generation human haplotype map of over 3.1 million SNPs
Nature 449:851–861.

https://doi.org/10.1038/nature06258
1. Galtier N
2. Duret L
(2007) Adaptation or biased gene conversion? extending the null hypothesis of molecular evolution
Trends in Genetics 23:273–277.

https://doi.org/10.1016/j.tig.2007.03.011
(2001)
GC-content evolution in mammalian genomes: the biased gene conversion hypothesis

Genetics 159:907–911.
1. Gingold H
2. Tehler D
3. Christoffersen NR
4. Nielsen MM
5. Asmar F
6. Kooistra SM
7. Christophersen NS
8. Christensen LL
9. Borre M
10. Sørensen KD
11. Andersen LD
12. Andersen CL
13. Hulleman E
14. Wurdinger T
15. Ralfkiær E
16. Helin K
17. Grønbæk K
18. Orntoft T
19. Waszak SM
20. Dahan O
21. Pedersen JS
22. Lund AH
23. Pilpel Y
(2014) A dual program for translation regulation in cellular proliferation and differentiation
Cell 158:1281–1292.

https://doi.org/10.1016/j.cell.2014.08.011
1. Glémin S
2. Arndt PF
3. Messer PW
4. Petrov D
5. Galtier N
6. Duret L
(2015) Quantification of GC-biased gene conversion in the human genome
Genome Research 25:1215–1228.

https://doi.org/10.1101/gr.185488.114
1. Guo F
2. Yan L
3. Guo H
4. Li L
5. Hu B
6. Zhao Y
7. Yong J
8. Hu Y
9. Wang X
10. Wei Y
11. Wang W
12. Li R
13. Yan J
14. Zhi X
15. Zhang Y
16. Jin H
17. Zhang W
18. Hou Y
19. Zhu P
20. Li J
21. Zhang L
22. Liu S
23. Ren Y
24. Zhu X
25. Wen L
26. Gao YQ
27. Tang F
28. Qiao J
(2015) The transcriptome and DNA methylome landscapes of human primordial germ cells
Cell 161:1437–1452.

https://doi.org/10.1016/j.cell.2015.05.015
1. Hershberg R
2. Petrov DA
(2008) Selection on codon bias
Annual Review of Genetics 42:287–299.

https://doi.org/10.1146/annurev.genet.42.110807.091442
1. Ikemura T
(1981) Correlation between the abundance of Escherichia coli transfer RNAs and the occurrence of the respective codons in its protein genes: a proposal for a synonymous codon choice that is optimal for the E. coli translational system
Journal of Molecular Biology 151:389–409.

https://doi.org/10.1016/0022-2836(81)90003-6
1. Kanaya S
2. Yamada Y
3. Kinouchi M
4. Kudo Y
5. Ikemura T
(2001) Codon usage and tRNA genes in eukaryotes: correlation of codon usage diversity with translation efficiency and with CG-dinucleotide usage as assessed by multivariate analysis
Journal of Molecular Evolution 53:290–298.

https://doi.org/10.1007/s002390010219
1. Kryuchkova-Mostacci N
2. Robinson-Rechavi M
(2015) Tissue-specific evolution of protein coding genes in human and mouse
PLoS One 10:e0131673.

https://doi.org/10.1371/journal.pone.0131673
1. Lam I
2. Keeney S
(2015) Nonparadoxical evolutionary stability of the recombination initiation landscape in yeast
Science 350:932–937.

https://doi.org/10.1126/science.aad0814
(2016) Parallel evolution of male germline epigenetic poising and somatic development in animals
Nature Genetics 48:888–894.

https://doi.org/10.1038/ng.3591
1. Mancera E
2. Bourgon R
3. Brozzi A
4. Huber W
5. Steinmetz LM
(2008) High-resolution mapping of meiotic crossovers and non-crossovers in yeast
Nature 454:479–485.

https://doi.org/10.1038/nature07135
1. McVicker G
2. Green P
(2010) Genomic signatures of germline gene expression
Genome Research 20:1503–1511.

https://doi.org/10.1101/gr.106666.110
(1991) The distribution of genes in the human genome
Gene 100:181–187.

https://doi.org/10.1016/0378-1119(91)90364-H
(1988) The compositional distribution of coding sequences and DNA molecules in humans and murids
Journal of Molecular Evolution 27:311–320.

https://doi.org/10.1007/BF02101193
(2014) A fine-scale recombination map of the human-chimpanzee ancestor reveals faster change in humans than in chimpanzees and a strong impact of GC-biased gene conversion
Genome Research 24:467–474.

https://doi.org/10.1101/gr.158469.113
1. Myers S
2. Bottolo L
3. Freeman C
4. McVean G
5. Donnelly P
(2005) A fine-scale map of recombination rates and hotspots across the human genome
Science 310:321–324.

https://doi.org/10.1126/science.1117196
1. Neri F
2. Rapelli S
3. Krepelova A
4. Incarnato D
5. Parlato C
6. Basile G
7. Maldotti M
8. Anselmi F
9. Oliviero S
(2017) Intragenic DNA methylation prevents spurious transcription initiation
Nature 543:72–77.

https://doi.org/10.1038/nature21373
(2014) Transmission distortion affecting human noncrossover but not crossover recombination: a hidden source of meiotic drive
PLOS Genetics 10:e1004106.

https://doi.org/10.1371/journal.pgen.1004106
1. Pessia E
2. Popa A
3. Mousset S
4. Rezvoy C
5. Duret L
6. Marais GA
(2012) Evidence for widespread GC-biased gene conversion in eukaryotes
Genome Biology and Evolution 4:675–682.

https://doi.org/10.1093/gbe/evs052
1. Plotkin JB
2. Kudla G
(2011) Synonymous but not the same: the causes and consequences of codon bias
Nature Reviews Genetics 12:32–42.

https://doi.org/10.1038/nrg2899
(2004) Tissue-specific codon usage and the expression of human genes
PNAS 101:12588–12591.

https://doi.org/10.1073/pnas.0404957101
Data
(authors) (2017) Recombination, meiotic expression and human codon usage
Zenodo.

https://doi.org/10.5281/zenodo.835063
(2014) DNA recombination. Recombination initiation maps of individual human genomes
Science 346:1256442.

https://doi.org/10.1126/science.1256442
1. Rudolph KL
2. Schmitt BM
3. Villar D
4. White RJ
5. Marioni JC
6. Kutter C
7. Odom DT
(2016) Codon-driven translational efficiency is stable across diverse mammalian cell states
PLOS Genetics 12:e1006024.

https://doi.org/10.1371/journal.pgen.1006024
1. Schmitt BM
2. Rudolph KL
3. Karagianni P
4. Fonseca NA
5. White RJ
6. Talianidis I
7. Odom DT
8. Marioni JC
9. Kutter C
(2014) High-resolution mapping of transcriptional dynamics across tissue development reveals a stable mRNA-tRNA interface
Genome Research 24:1797–1807.

https://doi.org/10.1101/gr.176784.114
(1986) Codon usage in yeast: cluster analysis clearly differentiates highly and lowly expressed genes
Nucleic Acids Research 14:5125–5143.

https://doi.org/10.1093/nar/14.13.5125
(1988) "Silent" sites in Drosophila genes are not neutral: evidence of selection among synonymous codons
Molecular Biology and Evolution 5:704–716.
1. Singhal S
2. Leffler EM
3. Sannareddy K
4. Turner I
5. Venn O
6. Hooper DM
7. Strand AI
8. Li Q
9. Raney B
10. Balakrishnan CN
11. Griffith SC
12. McVean G
13. Przeworski M
(2015) Stable recombination hotspots in birds
Science 350:928–932.

https://doi.org/10.1126/science.aad0843
1. Smedley D
2. Haider S
3. Durinck S
4. Pandini L
5. Provero P
6. Allen J
7. Arnaiz O
8. Awedh MH
9. Baldock R
10. Barbiera G
11. Bardou P
12. Beck T
13. Blake A
14. Bonierbale M
15. Brookes AJ
16. Bucci G
17. Buetti I
18. Burge S
19. Cabau C
20. Carlson JW
21. Chelala C
22. Chrysostomou C
23. Cittaro D
24. Collin O
25. Cordova R
26. Cutts RJ
27. Dassi E
28. Di Genova A
29. Djari A
30. Esposito A
31. Estrella H
32. Eyras E
33. Fernandez-Banet J
34. Forbes S
35. Free RC
36. Fujisawa T
37. Gadaleta E
38. Garcia-Manteiga JM
39. Goodstein D
40. Gray K
41. Guerra-Assunção JA
42. Haggarty B
43. Han DJ
44. Han BW
45. Harris T
46. Harshbarger J
47. Hastings RK
48. Hayes RD
49. Hoede C
50. Hu S
51. Hu ZL
52. Hutchins L
53. Kan Z
54. Kawaji H
55. Keliet A
56. Kerhornou A
57. Kim S
58. Kinsella R
59. Klopp C
60. Kong L
61. Lawson D
62. Lazarevic D
63. Lee JH
64. Letellier T
65. Li CY
66. Lio P
67. Liu CJ
68. Luo J
69. Maass A
70. Mariette J
71. Maurel T
72. Merella S
73. Mohamed AM
74. Moreews F
75. Nabihoudine I
76. Ndegwa N
77. Noirot C
78. Perez-Llamas C
79. Primig M
80. Quattrone A
81. Quesneville H
82. Rambaldi D
83. Reecy J
84. Riba M
85. Rosanoff S
86. Saddiq AA
87. Salas E
88. Sallou O
89. Shepherd R
90. Simon R
91. Sperling L
92. Spooner W
93. Staines DM
94. Steinbach D
95. Stone K
96. Stupka E
97. Teague JW
98. Dayem Ullah AZ
99. Wang J
100. Ware D
101. Wong-Erasmus M
102. Youens-Clark K
103. Zadissa A
104. Zhang SJ
105. Kasprzyk A
(2015) The BioMart community portal: an innovative alternative to large, centralized data repositories
Nucleic Acids Research 43:W589–W598.

https://doi.org/10.1093/nar/gkv350
(2016) High-resolution mapping of crossover and non-crossover recombination events by whole-genome re-sequencing of an avian pedigree
PLOS Genetics 12:e1006044.

https://doi.org/10.1371/journal.pgen.1006044
1. Vinogradov AE
(2003) Isochores and tissue-specificity
Nucleic Acids Research 31:5212–5220.

https://doi.org/10.1093/nar/gkg699
(2014) Evidence for GC-biased gene conversion as a driver of between-lineage differences in avian base composition
Genome Biology 15:549.

https://doi.org/10.1186/s13059-014-0549-1
1. Williams AL
2. Genovese G
3. Dyer T
4. Altemose N
5. Truax K
6. Jun G
7. Patterson N
8. Myers SR
9. Curran JE
10. Duggirala R
11. Blangero J
12. Reich D
13. Przeworski M
(2015) Non-crossover gene conversions show strong GC bias and unexpected clustering in humans
eLife 4:e04637.

https://doi.org/10.7554/eLife.04637
1. Yates A
2. Akanni W
3. Amode MR
4. Barrell D
5. Billis K
6. Carvalho-Silva D
7. Cummins C
8. Clapham P
9. Fitzgerald S
10. Gil L
11. Girón CG
12. Gordon L
13. Hourlier T
14. Hunt SE
15. Janacek SH
16. Johnson N
17. Juettemann T
18. Keenan S
19. Lavidas I
20. Martin FJ
21. Maurel T
22. McLaren W
23. Murphy DN
24. Nag R
25. Nuhn M
26. Parker A
27. Patricio M
28. Pignatelli M
29. Rahtz M
30. Riat HS
31. Sheppard D
32. Taylor K
33. Thormann A
34. Vullo A
35. Wilder SP
36. Zadissa A
37. Birney E
38. Harrow J
39. Muffato M
40. Perry E
41. Ruffier M
42. Spudich G
43. Trevanion SJ
44. Cunningham F
45. Aken BL
46. Zerbino DR
47. Flicek P
(2016) Ensembl 2016
Nucleic Acids Research 44:D710–D716.

https://doi.org/10.1093/nar/gkv1157

Article and author information

Author details

Fanny Pouyet

Laboratoire de Biométrie et Biologie Evolutive, Université de Lyon, Université Claude Bernard, Villeurbanne, France

Contribution
Conceptualization, Data curation, Formal analysis, Methodology, Writing—original draft

Competing interests
No competing interests declared

ORCID iD: 0000-0001-5614-6998
Dominique Mouchiroud

Laboratoire de Biométrie et Biologie Evolutive, Université de Lyon, Université Claude Bernard, Villeurbanne, France

Contribution
Conceptualization, Supervision, Methodology

Competing interests
No competing interests declared
Laurent Duret

Laboratoire de Biométrie et Biologie Evolutive, Université de Lyon, Université Claude Bernard, Villeurbanne, France

Contribution
Conceptualization, Data curation, Formal analysis, Supervision, Methodology, Writing—original draft

For correspondence
Laurent.Duret@univ-lyon1.fr

Competing interests
No competing interests declared

ORCID iD: 0000-0003-2836-3463
Marie Sémon

Laboratory of Biology and Modelling of the Cell, UnivLyon, ENS de Lyon, Univ Claude Bernard, CNRS UMR 5239, INSERM U1210, Laboratoire de Biologie et Modélisation de la Cellule, Lyon, France

Contribution
Conceptualization, Formal analysis, Supervision, Methodology, Writing—original draft

For correspondence
marie.semon@ens-lyon.fr

Competing interests
No competing interests declared

ORCID iD: 0000-0003-3479-7524

Funding

Agence Nationale de la Recherche (ANR-15-CE12-0010-01/DaSiRe)

Laurent Duret

École Normale Supérieure de Lyon (Projet Emergent)

Marie Sémon

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Acknowledgements

This work was supported by the French National Research Agency (ANR) grant DaSiRe (ANR-15-CE12-0010-01/DaSiRe) and by the "appel d'offre fond recherche-projets emergents" of the ENS de Lyon. It was performed using the computing facilities of the CC LBBE/PRABI. FP received a doctoral scholarship from the École Normale Supérieure de Lyon (http://www.ens-lyon.eu/). We thank Gaël Yvert for initiating the discussion, and Adam Eyre-Walker and Vincent Daubin for helpful suggestions on a first version of our manuscript.

Copyright

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.