Effective population size does not explain long-term variation in genome size and transposable element content in animals

Alba Marino; Gautier Debaecker; Anna-Sophie Fiston-Lavier; Annabelle Haudry; Benoit Nabholz

doi:10.7554/eLife.100574.2

eLife Assessment

This important study offers a powerful empirical test of a highly influential hypothesis in population genetics. It incorporates a large number of animal genomes spanning a broad phylogenetic spectrum and treats them in a rigorous unified pipeline, providing the convincing negative result that effective population size scales neither with the content of transposable elements nor with overall genome size. These observations demonstrate that there is still no simple, global hypothesis that can explain the observed variation in transposable element content and genome size in animals.

https://doi.org/10.7554/eLife.100574.2.sa3

Significance of findings

important: Findings that have theoretical or practical implications beyond a single subfield

Strength of evidence

convincing: Appropriate and validated methodology in line with current state-of-the-art

During the peer-review process the editor and reviewers write an eLife assessment that summarises the significance of the findings reported in the article (on a scale ranging from landmark to useful) and the strength of the evidence (on a scale ranging from exceptional to inadequate).

Abstract

Animal genomes exhibit a remarkable variation in size, but the evolutionary forces responsible for such variation are still debated. As the effective population size (N_e) reflects the intensity of genetic drift, it is expected to be a key determinant of the fixation rate of nearly-neutral mutations. Accordingly, the Mutational Hazard Hypothesis postulates lineages with low N_e to have bigger genome sizes due to the accumulation of slightly deleterious transposable elements (TEs), and those with high N_e to maintain streamlined genomes as a consequence of a more effective selection against TEs. However, the existence of both empirical confirmation and refutation using different methods and different scales precludes its general validation. Using high-quality public data, we estimated genome size, TE content and rate of non-synonymous to synonymous substitutions (dN/dS) as an N_e proxy for 807 species including vertebrates, molluscs and insects. After collecting available life-history traits, we tested the associations among population size proxies, TE content and genome size, while accounting for phylogenetic non-independence. Our results confirm TEs as major drivers of genome size variation, and endorse life-history traits and dN/dS as reliable proxies for N_e. However, we do not find any evidence for increased drift to result in an accumulation of TEs across animals. Within more closely related clades, only a few isolated and weak associations emerge in fishes and birds. Our results outline a scenario where TE dynamics vary according to lineage-specific patterns, lending no support for genetic drift as the predominant force driving long-term genome size evolution in animals.

1. Introduction

The variation in genome size among animals is remarkable, spanning four orders of magnitude, from 0.02 Gb in the nematode Pratylenchus coffeae to 120 Gb in the marbled lungfish Protopterus aethiopicus (Gregory 2023). Understanding why such a huge variation occurs is a long-standing question in evolutionary biology. It is now established that genome size is not related to organismal complexity (C-value enigma) or the number of coding genes in eukaryotes. Rather, variation in DNA content depends on the amount of non-coding DNA such as Transposable Elements (TEs), introns and pseudogenes (Elliott and Gregory 2015; Kidwell 2002; Lynch et al. 2011). However, the evolutionary mechanisms leading certain lineages to inflate their genome size or to maintain streamlined genomes are still debated (Galtier 2024).

The various hypotheses that have been proposed can be divided into adaptive and non-adaptive. Adaptive theories such as the nucleoskeletal (Cavalier-Smith 1978) and the nucleotypic one (Gregory and Hebert 1999) consider genome size to be mainly composed of “indifferent” DNA (Graur et al. 2015), whose bulk is indirectly selected as a consequence of its effect on nuclear and cellular volume. Cellular phenotypes, in turn, are thought to influence the fitness of organisms by affecting traits such as cell division rate (Bennett and Riley 1997), metabolic rate (Vinogradov 1995; Vinogradov 1997) and developmental time and complexity (Jockusch 1997; Olmo et al. 1989; Gregory 2002). On the other hand, non-adaptive theories emphasize the importance of the neutral processes of mutation and genetic drift in determining genome size (Lynch and Conery 2003; Petrov 2001). In particular, the concepts originally proposed by Lynch and Conery were later formalized within the framework of the Mutational Hazard Hypothesis (MHH) (Lynch 2007).

The MHH posits that tolerance to the accumulation of non-coding DNA depends on its mutational liability, which is minimized when the mutation rate is low and the effective population size (N_e, which is inversely proportional to the intensity of genetic drift) is high. The fundamental assumption is that most of such extra DNA is mildly deleterious and its fate in the population depends on the interplay between selection and genetic drift: a newly emerged nearly-neutral allele with a given negative selective effect (s) should be effectively removed from the population when N_e is high (|N_es| >> 1), while it should have approximately the same chance of fixation as a neutral allele when N_e is low (|N_es| ∼ 1 or lower than 1) as drift will be the predominant force (Ohta 1992).
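To make the role of the product N_e·s concrete, the following minimal Python sketch compares the fixation probability of a slightly deleterious allele with that of a neutral one. It is an illustration rather than part of the analyses reported here: it assumes Kimura's classical diffusion approximation for a diploid population whose census size equals N_e, additive selection, and a new mutation starting at frequency 1/(2N_e). The function names and the value of s are hypothetical.

```python
import math

def fixation_probability(n_e: float, s: float) -> float:
    """Fixation probability of a new mutation (diffusion approximation).

    Assumptions (illustrative only): diploid population, census size equal
    to n_e, initial frequency 1 / (2 * n_e), additive selection coefficient s.
    """
    if s == 0.0:
        return 1.0 / (2.0 * n_e)  # neutral expectation
    exponent = -4.0 * n_e * s
    if exponent > 700.0:
        # exp() would overflow: selection is so effective that fixation
        # of the deleterious allele is numerically impossible.
        return 0.0
    # (1 - exp(-2s)) / (1 - exp(-4 * n_e * s)), written with expm1 for accuracy.
    return math.expm1(-2.0 * s) / math.expm1(exponent)

def relative_to_neutral(n_e: float, s: float) -> float:
    """Ratio between the fixation probability and the neutral value 1 / (2 * n_e)."""
    return fixation_probability(n_e, s) * 2.0 * n_e

# Hypothetical selective effect of a single insertion.
s = -1e-5
for n_e in (1e3, 1e5, 1e7):
    print(f"N_e = {n_e:.0e}  |N_e s| = {abs(n_e * s):g}  "
          f"relative fixation probability = {relative_to_neutral(n_e, s):.3g}")
```

Under these assumptions the ratio stays close to one while |N_e·s| is well below 1 and collapses towards zero once |N_e·s| greatly exceeds 1, which is the behaviour the MHH relies on.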

Because of the pervasiveness of TEs and their generally neutral or slightly deleterious effect (Arkhipova 2018), their dynamics in response to changing N_e are of particular interest in the context of the MHH. TE insertions are expected to drift to fixation as neutral alleles and enlarge genome size in organisms with low N_e, while the genomes of organisms with large N_e are expected to remain streamlined as emerging TEs should be efficiently removed by purifying selection (Lynch 2007). The MHH is highly popular: indeed, it is based on general principles of population genetics and proposes a unifying explanation for the evolution of complex traits of genome architecture without recourse to specific molecular processes. The studies supporting the MHH are based on phylogenetically very diverse datasets as the goal of the theory is to explain broad patterns of complexity emergence and variation (Lynch and Conery 2003; Yi and Streelman 2005; Yi 2006). Nevertheless, other authors pointed out that the application of the MHH to such distantly related taxa could suffer from confounding factors intervening across organisms with very different biologies (Charlesworth and Barton 2004; Daubin and Moran 2004), thus raising the question whether N_e could explain genome size variation patterns at finer scales. Additionally, potential issues of robustness of the original dataset to phylogenetic control have been raised (Lynch 2011; Whitney et al. 2011; Whitney and Garland 2010). On top of that, an alternative TE-host-oriented perspective is that the accumulation of TEs in particular depends on their type of activity and dynamics, as well as on the lineage-specific silencing mechanisms evolved by host genomes (Ågren and Wright 2011).

Recent attempts have been made to assess the impact of increased genetic drift on the genomic TE content and genome size increase across closely related species with similar biological characteristics, employing both genetic diversity data and life-history traits (LHTs) as predictors of N_e. While some studies do not find any evidence supporting the role of N_e in genome size and TE content variation (Bast et al. 2016; Kapheim et al. 2015; Mackintosh et al. 2019; Roddy et al. 2021; Yang et al. 2024), others do (Chak et al. 2021; Cui et al. 2019; Lefébure et al. 2017; Mérel et al. 2021; Mérel et al. 2024). Thus, a univocal conclusion can hardly be drawn from such studies. The MHH has thus been investigated at either very wide - from prokaryotes to multicellular eukaryotes - or narrow phylogenetic scales (i.e., inter-genera or inter-population). However, no study spanning across an exhaustive set of distantly related taxa and relying on a phylogenetic framework has yet been performed to our knowledge. Such an approach would allow a systematic test of the association between N_e variation and long-term patterns of genome and TE expansion at a wider evolutionary scale, while controlling for the effect of phylogenetic inertia. Synonymous genetic diversity is commonly used to inform patterns of N_e (Lynch and Conery 2003; Romiguier et al. 2014; Buffalo 2021; Lynch et al. 2023). However, while its insights are limited to the age of current alleles (mostly less than 10N_e generations in diploid organisms), complex genomic features likely have much deeper origins. Comparative measures of divergence, like the genome-wide ratio of non-synonymous substitution rate to synonymous substitution rate (dN/dS), can quantify the level of genetic drift acting on protein-coding sequences since the last common ancestor. This approach accounts for processes occurring over a longer time scale than those responsible for genetic diversity.
In fact, polymorphism might reflect relatively recent population size fluctuations (Daubin and Moran 2004) and might even diverge from indices of long-term N_e if the selection-drift equilibrium is not reestablished (Lefébure et al. 2017; Müller et al. 2022). We therefore adopt dN/dS as an index of long-term N_e, as it is more likely to reflect the evolutionary lapse during which the deep changes in genome size and TE content that we are investigating occurred (Whitney et al. 2011; Whitney and Garland 2010).

In this study, we took advantage of 3,214 public metazoan reference genomes and C-value records (i.e., the haploid DNA content of a nucleus) to estimate genome sizes. A subset of 807 species including birds, mammals, ray-finned fishes, insects and molluscs was selected to test the predictions of the MHH, especially through the relationship among the level of drift, genome size and genomic TE content. A phylogeny was computed with metazoan-conserved genes. N_e was accounted for by the dN/dS and by life-history traits (LHTs) when available; TE contents were estimated de novo from read data. Controlling for phylogenetic non-independence, we (1) assessed the contribution of TE quantity to genome size differences and (2) evaluated the efficacy of N_e proxies in explaining genome size and TE content variation, across the whole dataset and within specific clades.

2. Results

2.1. Selection of high-quality assemblies

The reference genomes of the 3,214 metazoan species were downloaded via the NCBI genome database (Supplemental Table S1). From these, the genomes with contig N50 ≥ 50 kb and available C-value record were employed for genome size estimation (see 2.2, 4.2). Based on the assembly quality (contig N50 ≥ 50 kb), the completeness of metazoan core genes (complete BUSCO orthologs ≥ 70%) (Manni et al. 2021) and the raw sequencing data availability, a dataset of 930 genomes was then retained for downstream analyses. For reliable estimation of substitution rates, this dataset was further downsized to 807 representative genomes as species-poor, deep-branching taxa were excluded (Figure 1; Supplemental Table S2).
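The selection criteria above can be summarised in a short Python sketch. The thresholds (contig N50 ≥ 50 kb, complete BUSCO ≥ 70%, raw reads available) come from the text; the record layout, field names and example values are hypothetical and do not reflect the actual NCBI or BUSCO output formats.

```python
def contig_n50(lengths):
    """Contig N50: the length L such that contigs of length >= L
    together cover at least half of the total assembly length."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty assembly

def passes_filters(assembly, min_n50=50_000, min_busco_complete=70.0):
    """Apply the three retention criteria to one assembly record.

    `assembly` is a hypothetical dict with keys 'contig_n50' (bp),
    'busco_complete_pct' (percent) and 'has_raw_reads' (bool).
    """
    return (
        assembly["contig_n50"] >= min_n50
        and assembly["busco_complete_pct"] >= min_busco_complete
        and assembly["has_raw_reads"]
    )

# Toy records (made-up values, for illustration only).
candidates = [
    {"name": "sp_A", "contig_n50": 1_200_000, "busco_complete_pct": 96.1, "has_raw_reads": True},
    {"name": "sp_B", "contig_n50": 12_000, "busco_complete_pct": 91.0, "has_raw_reads": True},
    {"name": "sp_C", "contig_n50": 300_000, "busco_complete_pct": 64.5, "has_raw_reads": True},
    {"name": "sp_D", "contig_n50": 800_000, "busco_complete_pct": 88.0, "has_raw_reads": False},
]
retained = [a["name"] for a in candidates if passes_filters(a)]
print(retained)
```

The subsequent exclusion of species-poor, deep-branching taxa depends on the phylogeny and is not covered by this sketch.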

Figure 1. Phylogeny of the 807 species including ray-finned fishes (Actinopteri), birds (Aves), insects (Insecta), mammals (Mammalia), and molluscs (Mollusca) with bars corresponding to TE content (bp, blue), genome size (bp, green), and dN/dS estimations (values between 0 and 1, yellow).
Branch lengths are amino acid substitutions calculated on BUSCO genes. The tree was plotted with iTOL (Letunic and Bork 2021).

2.2. Genome size estimation

For the selected genomes, the genome size records available for 465 species (Supplemental Table S3) show that the assembly size is strongly positively correlated with the C-value (Pearson’s r = 0.97, p-value < 0.001). This indicates the reliability of the use of assemblies to estimate genome size. Although a non-linear model is not statistically better than the Weighted Least Squares (WLS), assembly sizes tend to underestimate genome size in comparison to C-values, an effect becoming more and more evident as genome size increases (Supplemental Figure S1). According to the metadata that we could retrieve, whether a genome was assembled using long read or uniquely short read data does not affect the slope of the WLS with assembly size as independent variable (t-test: long reads - p-value = 0.88; short reads - p-value = 0.87). Because it is not affected by sequencing biases, C-value was used when available; otherwise, the predicted C-value according to WLS was employed. The value chosen as genome size estimation is reported for each species in Supplemental Tables S1, S2.
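The rule described above (use the measured C-value when it exists, otherwise the WLS prediction from assembly size) can be sketched in Python as follows. This is a minimal illustration and not the pipeline used in the study: the weighting scheme (1 / assembly size) and the toy numbers are assumptions made only for the example, since the weights actually used are described in the Methods.

```python
def weighted_least_squares(x, y, w):
    """Fit y = intercept + slope * x by minimising sum(w * residual**2)."""
    sw = sum(w)
    mean_x = sum(wi * xi for wi, xi in zip(w, x)) / sw
    mean_y = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mean_x) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mean_x) * (yi - mean_y) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def genome_size(assembly_size, c_value, intercept, slope):
    """Measured C-value when available, otherwise the WLS-predicted C-value."""
    if c_value is not None:
        return c_value
    return intercept + slope * assembly_size

# Toy calibration set in Gb (made-up values): species having both measures.
assembly = [0.2, 0.5, 1.0, 2.0, 3.0]
c_values = [0.21, 0.54, 1.12, 2.30, 3.55]
weights = [1.0 / a for a in assembly]  # hypothetical weighting choice

intercept, slope = weighted_least_squares(assembly, c_values, weights)
print(genome_size(1.5, None, intercept, slope))  # predicted from assembly size
print(genome_size(1.5, 1.7, intercept, slope))   # measured C-value takes priority
```

A slope above 1 in such a fit would correspond to the pattern reported here, with assemblies falling increasingly short of the C-value as genomes get larger.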

2.3. Transposable elements and genome size variation

Repeat content of a subset of 29 dipteran genomes was previously estimated using EarlGrey v1.3 (Baril et al. 2024) and a wrapper around dnaPipeTE (Goubert et al. 2015), an assembly-based and a read-based pipeline, respectively. dnaPipeTE leverages the sampling of reads at low-coverage to perform de novo assembly of TE consensus sequences: this approach has the advantage of being unbiased by repeat sequences potentially missing from genome assemblies. The results of the two methods overall agree with each other across the scanned genomes (Genomic percentage of TEs: Pearson’s r = 0.88, p-value < 0.001; TE base pairs: Pearson’s r = 0.90, p-value < 0.001), with the most notable difference being the proportion of unknown elements, generally higher in dnaPipeTE estimations (Wilcoxon signed-rank test, p-value < 0.001; Supplemental Table S4; Supplemental Figure S2, S3). We therefore mined the remaining genomes with dnaPipeTE, which is much less computationally intensive. Repeat content could only be estimated for 672 of the 807 representative genomes (Supplemental Table S2): for the remaining 135, the pipeline could not be run because of unsuitable reads (e.g. only long reads available or too low coverage).

A very strong positive correlation between TE content and genome size is found both across the whole dataset and within taxa (Figure 2A; Supplemental Figure S4; Table 1; Supplemental Table S5). A notable exception concerns the avian clade that deviates from this pattern: the range of TE content is wider than the one of genome size compared to the other clades (Figure 2A), resulting in a weaker power of TEs in explaining genome size variation in this group (Table 1; Supplemental Table S5).

Figure 2. (A) Relationship between overall TE content and genome size (log-transformed). Slope = 0.718, adjusted-R² = 0.751, p-value < 0.001. (B) Relationship between genome size and dN/dS. Slope = 6.100, adjusted-R² = 0.275, p-value < 0.001. (C) Relationship between TE content and dN/dS. Slope = 4.253, adjusted-R² = 0.092, p-value < 0.001. Statistics refer to linear regression, see Tables 1 and 2 for Phylogenetic Independent Contrasts results.

Table 1. Correlation between genome size and overall TE content based on Phylogenetic Independent Contrasts. Statistics are shown relative to the overall dataset and to each clade. Variables were log-transformed prior to regression. * 0.01 ≤ p < 0.05; ** 0.001 ≤ p < 0.01; *** p < 0.001.

Table 2. PIC results for the correlations of LHTs, genome size, and TE content against dN/dS. Results for Bio++ dN/dS are shown for the full dataset and for the phylogeny deprived of the longest (> 1) and shortest (< 0.01) terminal branches. Results for Coevol dN/dS are relative to the GC3-poor geneset. Only body mass and longevity are reported as LHTs (for an overview of all traits, see Table 3). For genomic traits, statistics are reported relative to the overall dataset and to each clade. Expected significant correlations of dN/dS with LHTs and genomic traits are marked in bold black; significant correlations opposite to the expected trend are marked in bold red. * 0.01 ≤ p < 0.05; ** 0.001 ≤ p < 0.01; *** p < 0.001.
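For readers unfamiliar with Phylogenetic Independent Contrasts, the following Python sketch shows the textbook algorithm (Felsenstein's contrasts on a fully bifurcating tree, followed by a regression through the origin). It is a didactic illustration under stated assumptions, not the code used to produce the tables: the tree encoding, function names, toy tree and trait values are all hypothetical.

```python
import math

# A tree is either a tip name (str) or a tuple:
# (left_subtree, left_branch_length, right_subtree, right_branch_length).

def _contrasts(node, trait, out):
    """Post-order pass returning (estimated node value, extra branch length)."""
    if isinstance(node, str):
        return trait[node], 0.0
    left, len_left, right, len_right = node
    x_left, extra_left = _contrasts(left, trait, out)
    x_right, extra_right = _contrasts(right, trait, out)
    v_left = len_left + extra_left
    v_right = len_right + extra_right
    # Standardised contrast between the two descendant lineages.
    out.append((x_left - x_right) / math.sqrt(v_left + v_right))
    # Ancestral value: descendants weighted by inverse branch length.
    value = (x_left / v_left + x_right / v_right) / (1.0 / v_left + 1.0 / v_right)
    # Uncertainty of that estimate is added to the branch leading to this node.
    return value, v_left * v_right / (v_left + v_right)

def independent_contrasts(tree, trait):
    """Return the n - 1 standardised contrasts for a tree with n tips."""
    out = []
    _contrasts(tree, trait, out)
    return out

def slope_through_origin(x_contrasts, y_contrasts):
    """Least-squares slope of y on x with the intercept forced to zero."""
    sxy = sum(x * y for x, y in zip(x_contrasts, y_contrasts))
    sxx = sum(x * x for x in x_contrasts)
    return sxy / sxx

# Toy example (made-up tree and trait values).
toy_tree = ((("A", 1.0, "B", 1.0), 1.0, "C", 2.0), 0.5, "D", 3.5)
dnds = {"A": 0.10, "B": 0.14, "C": 0.09, "D": 0.20}
log_genome_size = {"A": 8.9, "B": 9.1, "C": 9.0, "D": 9.4}

x = independent_contrasts(toy_tree, dnds)
y = independent_contrasts(toy_tree, log_genome_size)
print(slope_through_origin(x, y))
```

Because both traits are traversed in the same order, each pair of contrasts refers to the same node, so the sign convention of the subtraction does not affect the slope.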

Alternatively to the impact of TEs on genome size, we investigated whether whole or partial genome duplications could be major factors in genome size variation among animals. BUSCO Duplicated score has indeed a slightly positive correlation with genome size, which is however much weaker than that of TEs (Slope = 6.639 · 10^-9, adjusted-R² = 0.022, p-value < 0.001). Of the 24 species with more than 30% of duplicated BUSCO genes, 13 include sturgeon, salmonids and cyprinids, known to have undergone whole genome duplication (Du et al. 2020; Li and Guo 2020; Lien et al. 2016), and five are dipteran species, where gene duplications are common (Ruzzante et al. 2019). In general, TEs appear as the main factor influencing genome size variation across species.

2.4. dN/dS and life history traits as proxies of effective population size

Intensity of effective selection acting on species can be informed by the dN/dS ratio: a dN/dS closer to 1 accounts for more frequent accumulation of mildly deleterious mutations over time due to increased genetic drift, while a dN/dS close to zero is associated with a stronger effect of purifying selection. We therefore employed this parameter as a genomic indicator of N_e, as the two are expected to scale negatively with each other. We compiled several LHTs from different sources (see 4.6) to cross-check our estimations of dN/dS. In general, dN/dS is expected to scale positively with body length, age at first birth, maximum longevity, age at sexual maturity and mass, and to scale negatively with metabolic rate, population density and depth range.

We estimated dN/dS with a mapping method (hereafter referred to as Bio++ dN/dS; Dutheil et al. 2006; Guéguen et al. 2013), and with a Bayesian approach using Coevol (hereafter referred to as Coevol dN/dS; Lartillot and Poujol 2011). The two metrics are reported in Supplemental Table S2. Note that for Coevol, we report both the results relative to dN/dS at terminal branches (Table 2) and the correlations inferred by the model (Table 3; Supplemental Table S5).

Table 3. Correlation coefficients (CC) and posterior probabilities (PP) estimated by Coevol with the GC3-poor and GC3-rich genesets for the coevolution of dN/dS with life history and genomic traits. Different LHTs are shown according to availability for a clade. Posterior probabilities lower than 0.1 indicate significant negative correlations; posterior probabilities higher than 0.9 indicate significant positive correlations. Expected significant correlations of dN/dS with LHTs and genomic traits are marked in bold black; significant correlations opposite to the expected trend are marked in bold red. ^aPantheria, ^bAnAge.

As expected, Bio++ dN/dS scales positively with body mass and longevity under Phylogenetic Independent Contrasts (PIC) (Table 2; Supplemental Figure S5). dN/dS estimation on the trimmed phylogeny deprived of short and long branches results in a stronger correlation with LHTs, suggesting that short branches might contribute to dN/dS fluctuations (Table 2; Supplemental Figure S6). Coevol dN/dS values are however highly concordant with Bio++ dN/dS (Supplemental Figure S7) and scale positively with body mass and longevity, as well (Table 2).

As for Coevol reconstruction, dN/dS covaries as expected with most of the tested LHTs: dN/dS scales positively with body length, longevity, mass, sexual maturity and depth range in fishes (Table 3); the same is found in mammals, in addition to negative correlations with population density and metabolic rate; in birds, mass, metabolic rate and sexual maturity correlate in the same way with dN/dS, although this is consistently observed only for the GC3-rich gene set (Table 3). Based on the available traits, the estimations of dN/dS ratios obtained using two different methods correspond in general to each other, supporting dN/dS as a meaningful indicator of long-term effective population size, at least for vertebrate clades. Results are not reported for molluscs and insects, as none and very few records of LHTs (seven species with at least one trait) were available, respectively.

2.5. dN/dS does not predict genome size and overall TE content across metazoans

If increased genetic drift leads to TE expansions, a positive relationship between dN/dS and TE content, and more broadly with genome size, should be observed. However, we find no statistical support for this relationship across all species in the PIC analysis. Similarly, no association is found when short and long branches are removed (Figure 2B-C; Supplemental Figure S8; Table 2). Contrary to our expectations, Coevol dN/dS scales negatively with genome size across the whole dataset (Slope = -0.287, adjusted-R² = 0.004, p-value = 0.039) and within insects (Slope = -1.241, adjusted-R² = 0.026, p-value = 0.018).

Surprisingly, different patterns are observed relative to TE content, as a negative correlation with Coevol dN/dS is detected across all species (Slope = -0.903, adjusted-R² = 0.004, p-value = 0.050) and within Mammalia (Slope = -2.113, adjusted-R² = 0.063, p-value = 0.001), while a positive correlation is found within Actinopteri (Slope = 3.407, adjusted-R² = 0.046, p-value = 0.013) (Table 2). Therefore, the only two significant positive PIC correlations found for birds and fishes are contrasted by results with an opposite trend in other groups. However, such correlations are slightly significant and their explained variance is extremely low.

Overall, we find no evidence for a recurrent association of long-term N_e variation, as approximated by dN/dS, with genome size and TE content across the analysed animal taxa. PIC analysis without the 24 species with more than 30% of duplicated BUSCO genes produced similar results with dN/dS as independent variable (genome size: slope = 0.234, adjusted-R² = 0.002, p-value = 0.102; TE content: slope = 0.819, adjusted-R² = 0.004, p-value = 0.054; Recent TE content: slope = 2.002, adjusted-R² = 0.012, p-value = 0.003), indicating that genomic duplications have a negligible effect on the missing link between dN/dS and genome size.

2.6. Population size and genome size: a complex relationship across clades

Although no strong signal is found across the full dataset using PIC, different trends within different clades are suggested by both PIC and Coevol approaches.

Coevol infers a negative correlation of dN/dS with genome size in insects (GC3-poor: CC = -0.330, PP = 0.03; GC3-rich: CC = -0.110, PP = 0.24) and TE content in mammals (GC3-poor: CC = -0.220, PP = 0.08; GC3-rich: CC = -0.249, PP = 0.06), and a positive correlation (even though below significance threshold) with TEs in fishes (GC3-poor: CC = 0.192, PP = 0.88; GC3-rich: CC = 0.167, PP = 0.84). Additionally, a negative correlation with TE content is found in birds for the GC3-rich geneset, while a positive – yet not significant – trend is found using the GC3-poor geneset (GC3-poor: CC = 0.065, PP = 0.73; GC3-rich: CC = -0.195, PP = 0.03) (Table 3). Even though available for fewer species, LHTs partially support these trends for vertebrates: Actinopteri display a positive correlation between longevity and recent TE content (see 2.7). Instead, mammalian TE content correlates positively with metabolic rate and population density, and negatively with body length, mass, sexual maturity, age at first birth and longevity (Supplemental Table S5). Within Aves, Coevol predicts opposite results for genome size and TE content: genome size associates positively with longevity and mass, as well as with dN/dS, and negatively with metabolic rate, while TEs correlate positively with metabolic rate (and negatively with dN/dS in one case) (Supplemental Table S5). In summary, N_e seems to negatively affect TE content in fishes, and positively in mammals. Importantly, genome size correlations seem to follow the same trends as TE content in these groups, although correlations are weaker and mostly non-significant. In the case of birds, genome size rather seems to be explained by N_e as expected under the MHH, but TE content is not, and might instead follow the opposite trend.

2.7. dN/dS weakly correlates with the recent TE content

The global TE content integrally reflects a long history of TE insertions and deletions. To have a glance at the dynamics of TEs on an evolutionary time comparable to that of the level of drift estimated using dN/dS, we additionally examined the quantity of the youngest elements. From the overall TE insertions, we estimated a recent TE content, defined by reads with less than 5% of divergence from consensus, and included it among the traits to model with PIC and Coevol. In PIC analysis, the variation of recent TE content weakly associates with Bio++ dN/dS across the full dataset (Slope = 1.963, adjusted-R² = 0.012, p-value = 0.003) and in mammals (Slope = 4.115, adjusted-R² = 0.024, p-value = 0.031). On the contrary, using Coevol dN/dS this correlation is found in Aves (Slope = 4.982, adjusted-R² = 0.028, p-value = 0.006) and Actinopteri (Slope = 4.365, adjusted-R² = 0.061, p-value = 0.005), but the opposite is detected in mammals (Slope = -3.151, adjusted-R² = 0.024, p-value = 0.032) (Table 2). In agreement with PIC, Coevol reconstruction retrieves a positive association of recent TE content with dN/dS and longevity only in fishes, and a relationship opposite to expectations with dN/dS (CC = -0.296, PP = 0.06) and LHTs in mammals. In contrast with PIC results, a negative correlation between recent TE content and dN/dS is found for birds using the GC3-rich genesets (CC = -0.240, PP = 0.01) (Table 3; Supplemental Table S5).
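The definition of recent TE content given above can be illustrated with a short Python sketch. Only the 5% divergence threshold comes from the text; the input layout (one pair of percent divergence and aligned base pairs per read) and the scaling from the read sample to the whole genome are simplifying assumptions for the example and do not reproduce the actual dnaPipeTE output format.

```python
def recent_te_bp(read_hits, max_divergence=5.0):
    """Sum the aligned bp of reads whose divergence from their TE consensus
    (in percent) is strictly below the threshold.

    `read_hits` is a hypothetical list of (divergence_percent, aligned_bp).
    """
    return sum(bp for divergence, bp in read_hits if divergence < max_divergence)

def scale_to_genome(te_bp_in_sample, sampled_bp, genome_size_bp):
    """Extrapolate from a low-coverage read sample to the genome, assuming
    the sample is an unbiased draw from the genome (illustrative assumption)."""
    return te_bp_in_sample / sampled_bp * genome_size_bp

# Toy read sample (made-up values).
hits = [(1.0, 100), (4.9, 50), (5.0, 30), (12.0, 200)]
recent = recent_te_bp(hits)
print(recent)
print(scale_to_genome(recent, sampled_bp=1_000, genome_size_bp=2_000_000_000))
```

A stricter or looser threshold can be explored by changing `max_divergence`, which shifts the time window that the "recent" fraction represents.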

On the whole, only a very weak positive correlation of dN/dS with recent TE insertions is observed across all species. However, considering again the taxa separately, clade-specific patterns emerge: a negative association between population size proxies and recent TE content is jointly found by the two methods only in fishes. Conversely, mammals show a positive correlation between recent TE content and population size proxies. Therefore, the coevolution patterns between population size and recent TE content are consistent with the pictures emerging from the comparison of population size proxies with genome size and overall TE content in the corresponding clades.

3. Discussion

Our results demonstrate the absence of a negative relationship between genome size and effective population size across a large dataset of animals, in contrast to the prediction of the MHH (Lynch and Conery 2003; Lynch 2007). Rather, our results highlight heterogeneous patterns within clades, with no consistent response of genome size and TE dynamics to N_e variations.

3.1. Assembly size underestimates genome size as genomes grow bigger

Assembly size is commonly used as a measure of genome size. However, the difficulty in assembling repetitive regions generally causes it to underestimate the actual genome size, in particular when only short reads are employed (Benham et al. 2024; Blommaert 2020; Peona et al. 2018). Consequently, methods that directly measure C-value such as flow cytometry (Doležel and Greilhuber 2010) and Feulgen densitometry (Hardie et al. 2002) are normally preferred as they do not rely on sequence data. Despite applying quality criteria to the assembly, the relationship between assembly size and genome size might still be questioned. However, we show that assembly size can overall approximate genome size quite well and, probably because we removed lower quality assemblies, no effect related to read type (short Illumina vs long ONT/PacBio) was detected. This suggests that the assemblies selected for our dataset can mostly provide a reliable measurement of genome size, and thus a quasi-exhaustive view of the genome architecture. On the other hand, because a gap with C-value is still present, we integrated this metric to correct assembly size estimations to their “expected C-values”. Similarly, the use of dnaPipeTE allowed us to quantify the repeat content without relying on assembly completeness. In summary, we extensively controlled for the effect of data quality on results and employed methods to minimize it.

3.2. Transposable elements are major contributors to genome size variation

Genome size and TE content have already been reported as tightly linked in eukaryotes (Elliott and Gregory 2015; Kidwell 2002), arthropods (Sproul et al. 2023; Wu and Lu 2019) and vertebrates (Chalopin et al. 2015). Our results are consistent with this perspective across all the animal species analysed, as well as at the level of ray-finned fishes (Reinar et al. 2023), insects (Heckenhauer et al. 2022; Mérel et al. 2024; Petersen et al. 2019; Sessegolo et al. 2016), mammals (Osmanski et al. 2023), and molluscs (Martelossi et al. 2023), strongly indicating TEs as major drivers of genome size variation in metazoans. It should be noted however that we mainly focused on some vertebrate groups and insects, while leaving out many animal taxa with fewer genomic resources currently available including much of the animal tree of life, such as most molluscs, annelids, sponges, cnidarians and nematodes. Even for better studied vertebrates, our datasets are far from comprehensive. For instance, the genomes of squamate reptiles are relatively stable in size but show a high variability in repeat content (Castoe et al. 2011; Pasquesi et al. 2018). A similar case is represented by bird genomes where, according to our analysis and consistently with other studies (Ji et al. 2022; Kapusta and Suh 2017), repeat content has a lower capacity to explain size compared to other clades. This could be due to satellites, whose contribution to genome size can be highly variable (Flynn et al. 2020; Pasquesi et al. 2018; Peona et al. 2023). While the remarkable conservation of avian genome sizes has prompted interpretations involving further mechanisms (see discussion below), dnaPipeTE is known to generally underestimate satellite content (Goubert et al. 2015). This bias is more relevant for those species that exhibit large fractions of satellites compared to TEs in their repeatome.
For instance, the portions of simple and low complexity repeats estimated with dnaPipeTE are consistently smaller than those reported in previous analyses based on assembly annotation for some species, such as Triatoma infestans (0.46% vs 25%; 7 Mbp vs 400 Mbp), Drosophila eugracilis (1.28% vs 10.89%; 2 Mbp vs 25 Mbp), Drosophila albomicans (0.06% vs 18 to 38%; 0.12 Mbp vs 39 to 85 Mbp) and some other Drosophila species (Pita et al. 2017; de Lima and Ruiz-Ruano 2022; Supplemental Table S2). Although the accuracy of Coevol analyses might occasionally be affected by such underestimations, the effect is likely minimal on the general trends. Inability to detect ancient TE copies is another relevant bias of dnaPipeTE. However, the strong correlation between repeat content and genome size and the consistency of dnaPipeTE and EarlGrey results, even in large genomes such as that of Aedes albopictus, indicate that the dnaPipeTE method is pertinent for our large-scale analysis. Furthermore, such an approach is especially fitting for the examination of recent TEs, as this specific analysis is not biased by very repetitive new TE families that are problematic to assemble.

Another way for genomes to grow involves genomic duplications. Although a high proportion of duplicated BUSCO genes may indicate a low haplotype resolution of the assembly, many species with a high duplication score in our dataset correspond to documented duplication cases, suggesting that such BUSCO statistics may provide an insight into this biological process. However, the contribution of duplicated genes to genome size is minimal compared to the one of TEs, and removing species with high duplication scores does not affect our results: this implies that duplication is unlikely to be the factor causing the relationship between genome size and dN/dS to deviate from the pattern expected from the MHH. Across the animal species considered here, the activity of TEs is therefore a preponderant mechanism of DNA gain, and their evolutionary dynamics appear of prime importance in driving genome size variation.

3.3. Reduced selection efficacy is not associated with increased genome size and TE content

Our dN/dS calculation included several filtering steps by branch length and topology: indeed, selecting markers by such criteria appears to be an essential step to reconcile estimations with different methodologies (M Bastian, F Bénitière, A Marino, pers. comm.; Bastian 2024). In addition, our analyses proved robust to species pruning by deviant branch lengths. Müller et al. (2022) showed that recent N_e fluctuations might perturb the expected correlation between long- and short-term estimates of N_e. According to the nearly neutral theory, alleles that start at low relative frequencies have a mean fixation time of ∼4N_e generations, under the implicit assumption of constant N_e (Kimura and Ohta 1969). This implies that dN/dS, which accounts for the accumulation of substitutions over time, has a weaker sensitivity to short-term changes in N_e compared to estimates based on polymorphism (Müller et al. 2022). Additionally, inferences on simulated and empirical data showed that N_e changes along branches could be captured and generally recapitulated by dN/dS and LHTs in a framework similar to that of Coevol (Latrille et al. 2021). Accordingly, dN/dS assessments by Bio++ and Coevol are highly concordant with each other and with LHTs. Taken together, our results point to the dN/dS found with the two methods as reliable proxies of long-term N_e.

If TEs are ascribable to nearly-neutral mutations, a negative correlation of N_e with TE expansion, and consequently with genome size – equivalent to a positive association with dN/dS – is expected. However, no such correlations are observed across the sampled species. It is important to note that failing to account for the non-independence of species traits leads to artifactual results (Figure 2B-C). For instance, mammals have on average small population sizes and the largest genomes. Conversely, insects tend to have large N_e and overall small genomes. With a high sampling power and phylogenetic inertia being taken into account, our meta-analysis clearly points to a phylogenetic structure in the data: the main clades are each confined to separate genome size ranges regardless of their dN/dS variation. The other way around, variability in genome size can be observed in insects, irrespective of their dN/dS. Relying on non-phylogenetically corrected models based on a limited number of species (such as that available at the time of the MHH proposal) can thus result in a spurious positive scaling between genome size and N_e proxies. To account for a shallower phylogenetic scale, we isolated recently active elements and at the same time explored the same relationships within each clade. Indeed, while the selective effect of elements might be slightly negative as long as they are active, TEs accumulated over long periods of time might be subject to changing dynamics: in the latter case, the pace of sequence erosion could be in the long run independent of drift and lead to different trends of TE retention and degradation in different lineages. Extracting recent elements should thus allow us to have a glimpse of the latest TE colonizations. A positive scaling between the quantity of young TEs and dN/dS found in some cases indicates that relatively recent expansions of TEs could be subject to a more effective negative selection. However, this trend is always very weak and often summarizes that of full TE content within clades. A potential limit of this analysis lies in the application of the same similarity threshold to all species to delimit recent elements. While this is not problematic when comparing species that recently split apart (e.g., Yang et al. 2024), some noise might be introduced at large scale, as the quantity of young repeats that evolved on the same time scale can vary according to the mutation rate and generation time of a species.

Interestingly, the correlation patterns between population size proxies and genomic traits emerging within single clades are distinct and sometimes opposite to the expectations of MHH. Mammals display a negative correlation of dN/dS with TE content, a pattern that is uniformly confirmed by LHTs. Not only does this result corroborate previous findings of no relationship between N_e and genome size in mammals (Roddy et al. 2021), but it supports a correlation opposite to the predictions of the MHH. On the other hand, the observed positive scaling between dN/dS and TE content in ray-finned fishes might lend support to a role of drift on genome size in this clade: this result is also consistent with a previous study which found a negative scaling between genome size and heterozygosity in this group (Yi and Streelman 2005). In birds, population size seems to negatively affect genome size and positively affect TE content, a decoupling that is however not surprising given the higher variation of TE load compared to the restricted genome size range. Contrasting signals from the two genomic traits have already been observed by Ji et al. (2022), who also reported a positive correlation between assembly size and mass, but a negative correlation between TE content and generation time. As previous studies find relatively weak correlations between TE content and genome size in birds (Ji et al. 2022; Kapusta and Suh 2017), the very narrow variation of avian genome sizes may impair the detection of consistent signals. On the other hand, it is conceivable that avian TE diversity is underappreciated due to the limits of the sequencing technologies used so far in resolving complex repeat-rich regions. For instance, long-read technologies have revealed more extended repeated regions that were previously ignored with short-read assemblies (Kapusta and Suh 2017; Benham et al. 2024).
Besides, relevant fractions of the genome might indeed consist of satellite sequences that are challenging to identify with reference- or read-based methods (Edwards et al. 2025). An “accordion” dynamic has been proposed whereby higher TE loads are paralleled by equally strong deletional pressures, which could contribute to the maintenance of remarkably small and constant genome sizes in birds, in spite of ongoing TE activity (Kapusta et al. 2017; Kapusta and Suh 2017). Finally, the widespread evidence for a positive and a negative correlation of genome size with body mass and metabolic rate, respectively, is also compatible with the adaptationist perspective of powered flight indirectly maintaining small genome sizes in birds as a consequence of metabolic needs (Wright et al. 2014; Zhang and Edwards 2012). In insects, dN/dS scales negatively with genome size, but never with TE content. As eusociality appears to bring about selection relaxation (Imrit et al. 2020; Kapheim et al. 2015; Weyna and Romiguier 2021), several studies explored the link between N_e and genome size in this taxon by focussing on social complexity as a proxy, but with contrasting outcomes: Mikhailova et al. (2023) find bigger genome size associated with eusociality in Hymenoptera, but the opposite trend in Blattodea; in contrast and partially in accordance with our findings, Kapheim et al. (2015) and Koshikawa et al. (2008) report less abundant TEs in eusocial hymenopterans and smaller genomes in eusocial termites, respectively. While the approximation of N_e based on dN/dS should allow for a quantification of selection efficacy in wider terms than sociality traits, the investigated evolutionary scale might hold an important role in the outcome of such analyses. First, and specifically relative to insects, genome size seems to be subject to different evolutionary pressures – either selective or neutral – within different insect orders (Cong et al. 2022), implying that increased drift might not necessarily produce the same effect on genome size across all insect groups. More generally, the five defined clades cover quite different time scales: insects and molluscs have much more ancient origins than mammals and birds, and such distant groups also evolve at very different evolutionary rates, making it difficult to characterize the evolution of their traits on the same evolutionary scale. Nevertheless, the results are still valuable in highlighting the absence of relationship between genetic drift and genome size variation in the long-term evolution of such broad groups, in contrast to previous work focusing on the population level or on recently diverged species (Cui et al. 2019; Mérel et al. 2021; Yang et al. 2024). At the same time, as noted by Mérel et al. (2024), comparing very distantly related species, such as the insect and molluscan species of our dataset, might overshadow any relationship between genome size and N_e, either due to the predictive power of dN/dS being weakened by branch saturation, to deep N_e fluctuations not being detected by our methods, or to additional factors affecting long-term genome size evolution.

3.4. Do lineage-specific TE dynamics affect genome size evolution?

Our findings do not support the quantity of non-coding DNA being driven in a nearly-neutral fashion by genetic drift. Notably, these results not only reject the theory of extra non-coding DNA being costly for its point mutational risk, but also challenge the more general idea of its accumulation depending on other kinds of detrimental effects, such as increased replication, pervasive transcription, or ectopic recombination. Therefore, our results can be considered more general than a mere rejection of the MHH, as they do not support any theory predicting that species with low N_e would accumulate more non-coding DNA. In agreement with previous analyses (Pasquesi et al. 2018), we find that the proliferation of TEs in particular can, under comparable drift levels, give rise to lineage-specific outcomes that mostly do not seem to depend on effective population size. These results contrast with those of other large-scale analyses which instead support the predictions of the drift-barrier hypothesis for a general impact of N_e on other genomic features, notably mutation rate (Bergeron et al. 2023; Lynch et al. 2023; Wang and Obbard 2023) and splicing accuracy (Bénitière et al. 2024). To put this in perspective, it should be emphasized that, in the framework of the MHH, the success of nearly-neutral alleles depends on the combination of both N_e and liability to mutation of non-coding DNA (Lynch et al. 2011). Overall, we studied N_e variation without accounting for the different mutagenic burden posed by non-coding DNA across different lineages. In the case of TEs, we inherently assumed the same distribution of selective effects and a constant activity in all species and among TE insertions. However, it is known that TEs are subject to waves of activity rather than a uniform pace of transposition (Arkhipova 2018).
Moreover, given the broad phylogenetic scale of our dataset, it is likely that different levels of hazard act across genomes due to different “host-parasite” dynamics in different animal groups (Ågren and Wright 2011). Such coevolutionary dynamics are, for example, determined by TE silencing mechanisms, which evolve differently across lineages and might influence the degree of genome expansion (Lechner et al. 2013; Zhou et al. 2020; Wang et al. 2023). In general, because of their complex interactions with genomes, TEs are especially likely to deviate from the assumption of gradually mutating sequences. Therefore, treating them as universally slightly deleterious alleles might be an oversimplified model. For instance, while the big genomes of salamanders are not related to small N_e, the low synonymous substitution rates and low degree of deletions due to ectopic recombination suggest weak mutational hazard of TEs that possibly contributes to the maintenance of genomic gigantism in this group (Frahry et al. 2015; Mohlhenrich and Mueller 2016; Rios-Carlos et al. 2024). Additionally, lineage-specific TE dynamics themselves might underlie different genomic architectures: for example, mammalian genomes are generally characterized by one preponderant type of active element and by a long-term retention of old TEs (Osmanski et al. 2023; Sotero-Caio et al. 2017), as in humans, where a very small proportion of active elements (<0.05%) is unlikely to impose a mutation rate causing genome size variation (Mills et al. 2007). Conversely, squamate and teleost fish genomes are smaller and characterized by several, simultaneously active and less abundant TE types (Duvernell et al. 2004; Furano et al. 2004; Novick et al. 2009; Pasquesi et al. 2018; Volff et al. 2003). These different patterns of genomic organization seem overall associated with different rates of elements’ turnover (Blass et al. 2012; Lavoie et al. 2013; Novick et al. 2009; Volff et al. 2003).
All such variables might alter the selective effect and differentiate TEs from gradually and constantly evolving alleles, eventually contributing to the lack of association between long-term N_e and genome size. Finally, Kapusta et al. (2017) showed that large-scale deletions can be as important as DNA gain in determining genome size, thus questioning the assumption that the insertion rate of elements is greater than their removal rate (Lynch 2007). This implies that the contribution of TEs constitutes just one side of the coin and that deletion bias could also drive the divergence of genome size across lineages, as suggested by several studies reporting a negative relationship between deletion rates and genome size (Frahry et al. 2015; Ji et al. 2022; Kapusta et al. 2017; Wang et al. 2014).

3.5. Perspectives

Evidence for signatures of negative selection against TE proliferation exists at various degrees. In Anolis lizards, the ability of TEs to reach fixation can vary between populations of the same species according to population size (Ruggiero et al. 2017; Tollis and Boissinot 2013). Furthermore, N_e was found to negatively correlate with genome size and TE expansion at the intraspecific level in Drosophila suzukii (Mérel et al. 2021) and at the interspecific level in fruit flies (Mérel et al. 2024), asellid isopods (Lefébure et al. 2017) and killifishes (Cui et al. 2019), supporting the role of genetic drift in determining recent differences in genome size among closely related animal species. Given the very different taxonomic scale of such works and ours, and with the perspective of lineage-specific interaction between genome and genomic parasites in mind, our negative results for the MHH at metazoan scale are not incompatible with an effect of N_e on genome size within specific clades. In a nutshell, although an increase in genetic drift seems to lead to the short-term accumulation of transposable elements, this process is not visible in the long term, suggesting that it fades over time. A general mechanism of selection preventing the proliferation of non-coding DNA and TEs in animals might exist, but its results might be detectable only over sufficiently short evolutionary times. In this sense, the lack of evidence for MHH in other clade-specific studies might be due to the methodological challenges of either estimating a suitable marker of N_e or investigating too distantly related lineages. Moreover, the contrasting outcomes of such studies might reflect an actual variability in the selective effect of TEs not compatible with a general selection mechanism.
Further reducing the phylogenetic scale under study and systematically exploring the consequences of N_e variation within independent biological systems could therefore provide an alternative way to test the impact of drift, while removing the confounding effects due to different genomic backgrounds.

4. Methods

4.1. Dataset

All the metazoan reference assemblies available as of November 14th 2021 were used, except for insect genomes which were drawn from Sproul et al. (2023), for a total of 3,214 assemblies. For each assembly, quality metrics were computed with Quast 5.0.2 (Gurevich et al. 2013) and genome completeness was assessed with BUSCO 5.2.2 using the 954 markers of the metazoa_odb10 geneset (Supplemental Table S1). Availability of raw reads was verified with SRA Explorer (https://github.com/ewels/sra-explorer). All the assemblies with either a contig N50 smaller than 50 kb, less than 70% of complete BUSCO orthologs, or without available reads were excluded from TE and dN/dS analyses. The subdivision into Actinopteri (N = 148), Aves (N = 260), Insecta (N = 189), Mammalia (N = 182), Mollusca (N = 28) was adopted to perform alignments, phylogenies, dN/dS estimation with Bio++ and Coevol runs (see below).
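As a minimal sketch of the three exclusion criteria stated above, the filter could be written as follows. The `Assembly` record, its field names and the example values are hypothetical and only illustrate the thresholds given in the text; the workflow actually used is the one in the authors' repository.

```python
from dataclasses import dataclass

@dataclass
class Assembly:
    # Hypothetical record; field names are illustrative, not taken from the study.
    name: str
    contig_n50_bp: int
    busco_complete_pct: float
    has_raw_reads: bool

def keep_for_te_and_dnds(a, min_n50_bp=50_000, min_busco_pct=70.0):
    """Return True if the assembly passes all three criteria from the text."""
    return (a.contig_n50_bp >= min_n50_bp
            and a.busco_complete_pct >= min_busco_pct
            and a.has_raw_reads)

# Made-up examples, one per exclusion criterion.
assemblies = [
    Assembly("sp_A", 1_200_000, 95.1, True),
    Assembly("sp_B", 30_000, 88.0, True),    # contig N50 below 50 kb
    Assembly("sp_C", 400_000, 65.0, True),   # fewer than 70% complete BUSCOs
    Assembly("sp_D", 900_000, 97.3, False),  # no raw reads available
]
retained = [a.name for a in assemblies if keep_for_te_and_dnds(a)]
print(retained)
```

Only the first toy assembly satisfies all three conditions, so it is the only one retained.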

4.2. Genome size estimation

Assembly sizes and C-values were jointly used to estimate genome size. C-values measured by either flow cytometry (FCM), Feulgen densitometry (FD) or Feulgen image analysis densitometry (FIA) were collected from https://www.genomesize.com/ (last accessed 6 October 2022) for all available species of our initial dataset with contig N50 ≥ 50 kb, totalling 465 measurements for 365 species (Supplemental Table S3). To assign a unique C-value, when multiple values were present for one species, the most recent one was retained and, if dates were the same, the average was used. For all the other species having contig N50 ≥ 50 kb but with no available C-value record, genome size was calculated as an expected C-value predicted from a weighted least squares (WLS) regression where the 465 FCM, FD and FIA estimations were the dependent variables and assembly sizes were the independent variables (for details see https://github.com/albmarino/Meta-analysis_scripts). Out of all the records in this training dataset for genome size, 93 correspond to ray-finned fishes, 93 to mammals, 92 to birds, 106 to insects, and 9 to molluscs, overall mirroring the taxa represented in the final dataset. For the purpose of our analysis, C-values were used for the species for which such data were available, while the expected C-value was used as genome size estimation in all the other cases, regardless of the type of sequencing data used for the assembly (Supplemental Tables S1, S3).
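The prediction step can be illustrated with a small weighted least squares fit of C-value on assembly size. This is only a sketch under stated assumptions: the weighting scheme, the toy numbers and the function names `fit_wls` and `predict_c_value` are hypothetical (the text does not specify the weights), and the model actually used is the one in the repository linked above.

```python
import numpy as np

def fit_wls(x, y, w):
    """Fit y = a + b * x by weighted least squares with non-negative weights w."""
    x, y, w = (np.asarray(v, dtype=float) for v in (x, y, w))
    X = np.column_stack([np.ones_like(x), x])   # design matrix with intercept
    XtW = X.T * w                               # same as X.T @ diag(w)
    a, b = np.linalg.solve(XtW @ X, XtW @ y)    # weighted normal equations
    return a, b

def predict_c_value(assembly_size, a, b):
    """Expected C-value for a species that has no measured C-value."""
    return a + b * assembly_size

# Toy data (hypothetical): assembly sizes in Gb and measured C-values in pg.
assembly_gb = np.array([0.2, 0.5, 1.0, 2.0, 3.0])
c_value_pg = 0.05 + 1.1 * assembly_gb           # exactly linear, for the demo only
weights = 1.0 / assembly_gb                     # assumed weights, illustration only

a, b = fit_wls(assembly_gb, c_value_pg, weights)
expected = predict_c_value(1.5, a, b)
```

Because the toy data are exactly linear, the fit recovers the intercept and slope used to generate them whatever the weights; with real measurements the weights control how much each record influences the fit.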

4.3. Gene alignment

The 954 annotated single-copy BUSCO genes were aligned with the pipeline OMM_MACSE 11.05 using MACSE 2.06 (Ranwez et al. 2018; Scornavacca et al. 2019). Alignments were performed separately within each clade - Actinopteri, Aves, Insecta, Mammalia and Mollusca.

4.4. Phylogeny

Phylogenies were computed separately for each clade with IQ-TREE 1.6.12 (Nguyen et al. 2015). The JTT+F+R10 substitution model was selected with ModelFinder (-m MFP option) (Kalyaanamoorthy et al. 2017). For reasons of computing power and time, we reconstructed the phylogenies of each clade independently and then grouped them together to create a single complete phylogenetic tree (see below). The same set of 107 concatenated BUSCO amino-acid sequences was used to calculate all the phylogenies. However, since this produced a spurious relationship in the mammalian tree with paraphyly of primates, an alternative set of randomly selected 100 genes was used instead for the phylogeny of Mammalia (Supplemental Table S6). Each phylogeny was rooted using an outgroup species belonging to its respective sister clade: the outgroup sequences were added to gene alignments with the enrichAlignment function from MACSE; the outgroup + clade gene alignments were concatenated and used to recompute the outgroup + clade phylogeny taking into account the previously computed tree topology of the clade. The outgroups were then removed, and rooted clade phylogenies were merged together manually using the tree editor program Baobab (Dutheil and Galtier 2002). Finally, 50 top-shared genes across all species were chosen among the set of 107 genes (Supplemental Table S6) to recalculate the branch lengths of the whole dataset phylogeny: with the MACSE program alignTwoProfiles the nucleotide gene alignment of one clade was joined to the one of its respective sister clade until achievement of the whole dataset alignment. Branch lengths were then estimated based on the 50-gene concatenate and the tree topology (for the detailed workflow, see https://github.com/albmarino/Meta-analysis_scripts).

4.5. dN/dS estimation

When genetic drift is strong, slightly deleterious mutations are more likely to reach fixation than under conditions of high N_e and more effective selection (Ohta 1992). The genome-wide fixation rate of non-synonymous mutations is expected to drive the dN/dS ratio due to nearly-neutral mutations responding to different N_e: a higher dN/dS accounts for more frequent accumulation of mildly deleterious mutations over time due to small N_e, while lower dN/dS is associated with a stronger effect of selection against slightly deleterious non-synonymous mutations due to high N_e (James et al. 2016; Romiguier et al. 2014; Weyna and Romiguier 2021; Woolfit and Bromham 2005). This is also supported at the polymorphism level, with higher pN/pS and accumulation of slightly deleterious mutations in smaller populations (Leroy et al. 2021; Dussex et al. 2023).
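The dependence of fixation on N_e described above can be made concrete with Kimura's diffusion approximation for the fixation probability of a new mutation. The sketch below is illustrative only and is not part of the analyses: it assumes a diploid population whose census size equals N_e, additive effects with a single selection coefficient s per mutation, and a hypothetical helper name `relative_fixation_rate`.

```python
import math

def relative_fixation_rate(ne, s):
    """Fixation probability of a new mutation with selection coefficient s,
    divided by that of a neutral mutation (1 / (2 * ne)).

    Uses Kimura's approximation u = (1 - exp(-2s)) / (1 - exp(-4 * ne * s)),
    assuming census size equals ne (simplifying assumption).
    """
    if s == 0:
        return 1.0
    x = -4.0 * ne * s
    if x > 700:            # exp(x) would overflow; the ratio is effectively zero
        return 0.0
    u = math.expm1(-2.0 * s) / math.expm1(x)
    return 2.0 * ne * u

# The same slightly deleterious mutation under strong and weak drift.
small_pop = relative_fixation_rate(1_000, -1e-4)
large_pop = relative_fixation_rate(100_000, -1e-4)
```

Under these assumptions the mutation fixes at a rate close to the neutral one in the small population and almost never in the large one, which is the behaviour that makes dN/dS informative about long-term N_e.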

Before dN/dS calculation, sequences with more than 10% of their length occupied by insertions were preemptively removed from BUSCO alignments. Estimation of dN/dS on either very long or short terminal branches might lead to loss of accuracy due to branch saturation (Weber et al. 2014) or to a higher variance of substitution rates, respectively. Furthermore, shared polymorphism can be captured in the substitution rates when closely related species are compared, and further contribute to bias dN/dS (Mugal et al. 2020). To correct for such issues, genes with deviant topology were identified and removed from every clade with PhylteR using default parameters (Comte et al. 2023). Moreover, genes exhibiting branch lengths shorter than 0.001, for which dN/dS could have a large variance, were also not integrated in the dN/dS calculation of a species.

We then used bppml and mapnh from the Bio++ libraries to estimate dN/dS on terminal branches (Dutheil et al. 2006; Guéguen et al. 2013; Guéguen and Duret 2018; Romiguier et al. 2012). bppml calculates the parameters under a homogenous codon model YN98 (F3X4). Next, mapnh maps substitutions along the tree branches and estimates dN and dS. More precisely, the substitution rate is given by the number of substitutions mapped according to the model parameters normalized by the number of substitutions of the same category (i.e. synonymous, non-synonymous) that would occur under the same neutral model (Bolívar et al. 2019). Therefore, dN and dS are calculated for each species as follows:

Where n is the number of genes, K is the substitutions count as mapped by the substitution model calculated for the gene, O is the substitutions count as mapped under the same neutral model, and l is the branch length of the given species for that gene. In addition to the gene filtering described above, Bio++ dN/dS was recalculated on a reduced dataset in order to ensure that substitution saturation and segregating polymorphism did not influence the results: terminal branches with more than 1 or less than 0.01 amino-acid substitutions per site were removed, and dN/dS was recalculated on the trimmed phylogenies with the same method described above.
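The trimming of terminal branches can be sketched as a simple filter on branch lengths. The species names, the lengths and the function name `trim_terminal_branches` below are hypothetical; only the two thresholds (0.01 and 1 amino-acid substitutions per site) come from the text.

```python
def trim_terminal_branches(branch_lengths, lower=0.01, upper=1.0):
    """Keep species whose terminal branch length lies within [lower, upper].

    branch_lengths maps species name -> terminal branch length in
    amino-acid substitutions per site.
    """
    return {sp: bl for sp, bl in branch_lengths.items() if lower <= bl <= upper}

# Made-up terminal branch lengths.
terminal = {
    "sp_short": 0.004,   # too short: risk of segregating polymorphism
    "sp_ok_1": 0.08,
    "sp_ok_2": 0.65,
    "sp_long": 1.7,      # too long: risk of substitution saturation
}
kept = trim_terminal_branches(terminal)
print(sorted(kept))
```

Species falling outside the interval would be pruned before dN/dS is recomputed on the reduced tree.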

The same metric was estimated with Coevol 1.6 (Lartillot and Poujol 2011). Coevol models the co-evolution of dN/dS and continuous traits along branches following a multivariate Brownian diffusion process, thus reducing the variance in the dN/dS of the smallest branches (Lartillot and Delsuc 2012). Bio++ dN/dS was therefore compared with dN/dS estimated by Coevol on terminal branches to verify the consistency between the two methods.

4.6. Compilation of life history traits

Life-history traits – body mass, longevity, generation time, among others – are found to be related to N_e in mammals, birds, and amniotes in general (Bolívar et al. 2019; Figuet et al. 2016; Nikolaev et al. 2007; Popadin et al. 2007). Available LHTs were assigned to the species of our dataset using information from several resources. Adult body mass, body length, maximum longevity, basal metabolic rate, age at first birth, population size and population density were assigned to mammalian species using PanTHERIA (Jones et al. 2009). For birds, body mass information was extracted from Dunning (2007). Shallow to deep depth range, longevity in the wild, body length and body mass were compiled for ray-finned fishes from www.fishbase.org using the rfishbase R package (Boettiger et al. 2012).

Additionally, age at sexual maturity, adult body mass, maximum longevity and metabolic rate were extracted from AnAge (Tacutu et al. 2018), as well as body mass and metabolic rate from AnimalTraits (Herberstein et al. 2022): such data were used to complement information when missing from the databases cited above. All the retrieved LHTs and their relative source are reported in Supplemental Table S2.

4.7. TE quantification

TEs were annotated with a pipeline employing dnaPipeTE (Goubert et al. 2015) in two rounds (https://github.com/sigau/pipeline_dnapipe). Raw reads are filtered with UrQt (Modolo and Lerat 2015) or fastp (Chen et al. 2018) and undergo a first “dirty” dnaPipeTE round. The obtained dnaPipeTE contigs are mapped against a database of organellar, fungal, bacterial and archaeal reference sequences with Minimap2 (Li 2018), and the original reads matching contaminant sequences are removed with SAMtools (Danecek et al. 2021). Finally, the quality- and contaminant-filtered reads are used to perform a second “clean” dnaPipeTE round. dnaPipeTE was configured with the Dfam 3.5 and RepBaseRepeatMaskerEdition-20181026 repeat libraries and was run with a genome coverage of 0.25.

To verify the consistency of dnaPipeTE estimations, the dnaPipeTE-based pipeline was benchmarked on a subset of 29 dipteran species against EarlGrey 1.3, an automated pipeline performing TE annotation on genome assemblies (Baril et al. 2024). EarlGrey was configured with the same libraries as dnaPipeTE and was run with ‘metazoa’ as search term. The results of the second dnaPipeTE round were used to extrapolate the total and recent TE content, the latter being defined by all the reads below 5% of divergence from the corresponding consensus. The two contents were extracted by adapting dnaPT_landscapes.sh from the dnaPT_utils repository (https://github.com/clemgoub/dnaPT_utils) to a custom R script (https://github.com/albmarino/Meta-analysis_scripts).
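The distinction between total and recent TE content can be sketched as follows. The input format (a list of divergence and aligned-length pairs), the toy values and the function name `te_fractions` are assumptions made for illustration; they do not reproduce the output format of dnaPipeTE or the custom R script, and only the 5% divergence threshold comes from the text.

```python
def te_fractions(hits, sampled_bp, recent_max_divergence=5.0):
    """Return (total, recent) TE content as fractions of the sampled bases.

    hits is a list of (divergence_percent, aligned_bp) pairs, one per read
    matching a TE consensus; reads strictly below the divergence threshold
    are counted as recent.
    """
    total_bp = sum(bp for _, bp in hits)
    recent_bp = sum(bp for div, bp in hits if div < recent_max_divergence)
    return total_bp / sampled_bp, recent_bp / sampled_bp

# Made-up hits: (divergence from consensus in %, aligned length in bp).
hits = [(1.2, 100), (4.9, 150), (5.0, 120), (12.5, 130)]
total, recent = te_fractions(hits, sampled_bp=2000)
```

In this toy case the reads at 1.2% and 4.9% divergence count as recent, while the read at exactly 5.0% does not, since the definition refers to reads below the threshold.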

4.8. Gene duplication

To account for the effect of whole genome or segmental duplications, we used the BUSCO Duplicated score: if large-scale duplication events recently took place, a higher score should be observed genome-wide even for conserved genes. As many of the genomes with BUSCO Duplicated score above 30% corresponded to reported cases of genomic duplication, we used this threshold to perform PIC analysis with and without species whose genome size is potentially more affected by duplication events.

4.9. Phylogenetic independent contrasts and Coevol reconstruction

The correlations of Bio++ and Coevol dN/dS with LHTs, as well as with genome size and TE content, were tested with PIC to correct for the covariation of traits due to the phylogenetic relatedness of species (Felsenstein 1985). PIC were performed on the whole dataset, the trimmed dataset, and within every clade with the R packages ape 5.7.1 (Paradis et al. 2004), nlme 3.1.162, and caper 1.0.1. Results were plotted with the ggplot2 3.4.2 package. Additionally, Coevol 1.6 was run to test for the coevolution of dN/dS and traits: sequence substitution processes and quantitative traits such as LHTs, genome size, TE content, are here assumed to covary along the phylogeny as a multivariate Brownian motion process. Coevol infers trait values on internal nodes and terminal branches (those used for PIC), as well as correlation coefficients and their relative posterior probabilities. Due to computational limitations, Coevol analysis was carried out separately on every clade and on a limited number of genes. Genes were selected according to their GC content at the third codon position (GC3). Indeed, mixing genes with heterogeneous base composition in the same concatenate might result in an alteration of the calculation of codon frequencies, and consequently impair the accuracy of the model estimating substitution rates (Mérel et al. 2024). Moreover, genes with different GC3 levels can reflect different selective pressures, as highly expressed genes should be enriched in optimal codons as a consequence of selection on codon usage. In Drosophila, where codon usage bias is at play, most optimal codons present G/C bases at the third position (Duret and Mouchiroud, 1999), meaning that genes with high GC3 content should evolve under stronger purifying selection than GC3-poor genes. Accordingly, Mérel et al. (2024) do find a stronger relationship between dN/dS and genome size when using GC3-poor genes, as compared to GC3-rich genes or gene concatenates of random GC3 composition.
Finally, dN/dS can be influenced by GC-biased gene conversion (Bolívar et al. 2019; Ratnakumar et al. 2010), and the strength at which such substitution bias acts can be reflected by base composition. For these reasons, two sets of 50 genes with similar GC3 content were defined in order to employ genes undergoing similar evolutionary regimes. Markers and species were subject to a more stringent filtering in order to have as much information as possible for each species. 16 species with less than 50% of single-copy orthologs were further filtered out from the dataset. In addition to the PhylteR step, only genes represented in at least 95% of the species of a clade were retained. From those, the 50 GC3-poorest and the 50 GC3-richest genes were chosen. Coevol was then run with both gene sets for every clade. Convergence of the MCMC chains was checked visually by plotting the evolution of statistics. Likelihood values and correlations were estimated, running the chains for a minimum of 1,000 steps and discarding the first 400 steps as burn-in.
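The logic of the phylogenetic independent contrasts used in this section can be illustrated with a toy reimplementation of Felsenstein's (1985) algorithm for a single trait on a fully bifurcating tree. This sketch is not the code used in the study, which relied on the R packages cited above; the tree encoding, the trait values and the function names are hypothetical.

```python
import math

def independent_contrasts(node, out):
    """Compute standardized contrasts for one trait on a binary tree.

    A leaf is (branch_length, trait_value); an internal node is
    (branch_length, left_child, right_child). Contrasts are appended to
    `out`; the function returns (node_value, adjusted_branch_length).
    """
    if len(node) == 2:
        branch_length, value = node
        return value, branch_length
    branch_length, left, right = node
    x1, v1 = independent_contrasts(left, out)
    x2, v2 = independent_contrasts(right, out)
    out.append((x1 - x2) / math.sqrt(v1 + v2))           # standardized contrast
    value = (x1 / v1 + x2 / v2) / (1 / v1 + 1 / v2)      # weighted ancestral value
    return value, branch_length + v1 * v2 / (v1 + v2)    # lengthened branch

def slope_through_origin(cx, cy):
    """Regress contrasts of y on contrasts of x, forcing a zero intercept."""
    return sum(a * b for a, b in zip(cx, cy)) / sum(a * a for a in cx)

def toy_tree(t):
    # Topology ((A:1, B:1):1, (C:1, D:1):1) with a root branch of length 0.
    return (0.0, (1.0, (1.0, t["A"]), (1.0, t["B"])),
                 (1.0, (1.0, t["C"]), (1.0, t["D"])))

x = {"A": 1.0, "B": 3.0, "C": 5.0, "D": 9.0}   # made-up trait, e.g. a N_e proxy
y = {k: 2.0 * v for k, v in x.items()}         # made-up second trait

cx, cy = [], []
independent_contrasts(toy_tree(x), cx)
independent_contrasts(toy_tree(y), cy)
slope = slope_through_origin(cx, cy)
```

A tree with four tips yields three contrasts per trait, and the association between traits is then assessed on these contrasts rather than on the raw species values, which removes the covariation due to shared ancestry.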

Data access

The accession numbers, SRA experiments and C-values used in this study are reported in Supplemental Table S1. C-values are available at https://www.genomesize.com/. The respective sources of LHTs are reported in Supplemental Table S2. Detailed commands and custom scripts can be found at https://github.com/albmarino/Meta-analysis_scripts.

Acknowledgements

We are grateful to Laurent Duret, Nicolas Lartillot, Tristan Lefébure, Mélodie Bastian and Florian Bénitière for the helpful discussions. We thank Mélodie Bastian and Florian Bénitière for methodological exchanges that helped ensure that dN/dS was accurately estimated. We thank Laurent Guéguen for his help with dN/dS analyses. Analyses were run on the servers of the Montpellier Bioinformatics Biodiversity platform (MBB). We are grateful to three anonymous reviewers whose comments helped improve the quality of this manuscript.

This research was funded by the French National Research Agency (ANR-20-CE02-0008-01 “NeGA”). A CC-BY public copyright licence has been applied by the authors to the present document and will be applied to all subsequent versions up to the Author Accepted Manuscript arising from this submission, in accordance with the grant’s open access conditions.

Additional files

Supplemental Figure S1

Supplemental Figure S2

Supplemental Figure S3

Supplemental Figure S4

Supplemental Figure S5

Supplemental Figure S6

Supplemental Figure S7

Supplemental Figure S8

Supplemental Table S1

Supplemental Table S2

Supplemental Table S3

Supplemental Table S4

Supplemental Table S5

Supplemental Table S6

References

Ågren JA, Wright SI. 2011. Co-evolution between transposable elements and their hosts: a major factor in genome size evolution? Chromosome Res 19:777–786.
1. Arkhipova IR
2018Neutral Theory, Transposable Elements, and Eukaryotic Genome EvolutionMol Biol Evol 35:1332–1337Google Scholar
1. Baril T
2. Galbraith J
3. Hayward A
2024Earl Grey: a fully automated user-friendly transposable element annotation and analysis pipelineMol Biol Evol 41:msae068https://doi.org/10.1093/molbev/msae068 Google Scholar
1. Bast J
2. Schaefer I
3. Schwander T
4. Maraun M
5. Scheu S
6. Kraaijeveld K
2016No Accumulation of Transposable Elements in Asexual ArthropodsMol Biol Evol 33:697–706Google Scholar
1. Bastian M.
2024Génomique des populations intégrative: de la phylogénie à la génétique des populationsDoctoral dissertation, Université Lyon Google Scholar
1. Benham PM
2. Cicero C
3. Escalona M
4. Beraut E
5. Fairbairn C
6. Marimuthu MPA
7. Nguyen O
8. Sahasrabudhe R
9. King BL
10. Thomas WK
11. et al.
2024Remarkably High Repeat Content in the Genomes of Sparrows: The Importance of Genome Assembly Completeness for Transposable Element DiscoveryGenome Biol Evol 16:evae067https://doi.org/10.1093/gbe/evae067 Google Scholar
1. Bénitière F
2. Necsulea A
3. Duret L
2024Random genetic drift sets an upper limit on mRNA splicing accuracy in metazoanseLife 13:RP93629https://doi.org/10.7554/eLife.93629.3 Google Scholar
1. Bennett MD
2. Riley R
1997The duration of meiosisProc Royal Soc B 178:277–299Google Scholar
1. Blass E
2. Bell M
3. Boissinot S.
2012Accumulation and Rapid Decay of Non-LTR Retrotransposons in the Genome of the Three-Spine SticklebackGenome Biol Evol 4:687–702Google Scholar
1. Bergeron LA
2. Besenbacher S
3. Zheng J
4. Li P
5. Bertelsen MF
6. Quintard B
7. Hoffman JI
8. Li Z
9. St Leger J
10. Shao C
11. et al.
2023Evolution of the germline mutation rate across vertebratesNature 615:285–291Google Scholar
1. Blommaert J
2020Genome size evolution: towards new model systems for old questionsProc R Soc B Biol Sci 287:20201441https://doi.org/10.1098/rspb.2020.1441 Google Scholar
1. Boettiger C
2. Lang DT
3. Wainwright PC
2012. rfishbase: exploring, manipulating and visualizing FishBase data from RJ Fish Biol 81:2030–2039Google Scholar
1. Bolívar P
2. Guéguen L
3. Duret L
4. Ellegren H
5. Mugal CF
2019GC-biased gene conversion conceals the prediction of the nearly neutral theory in avian genomesGenome Biol 20:1–13Google Scholar
1. Brevet M
2. Lartillot N
2021Reconstructing the History of Variation in Effective Population Size along PhylogeniesGenome Biol Evol 13:evab150https://doi.org/10.1093/gbe/evab150 Google Scholar
1. Buffalo V
2021Quantifying the relationship between genetic diversity and population size suggests natural selection cannot explain Lewontin’s ParadoxeLife 10:e67509https://doi.org/10.7554/eLife.67509 Google Scholar
Castoe TA, Hall KT, Guibotsy Mboulas ML, Gu W, de Koning APJ, Fox SE, Poole AW, Vemulapalli V, Daza JM, Mockler T, et al. 2011. Discovery of Highly Divergent Repeat Landscapes in Snake Genomes Using High-Throughput Sequencing. Genome Biol Evol 3:641–653.
Cavalier-Smith T. 1978. Nuclear volume control by nucleoskeletal DNA, selection for cell volume and cell growth rate, and the solution of the DNA C-value paradox. J Cell Sci 34:247–278.
Chak STC, Harris SE, Hultgren KM, Jeffery NW, Rubenstein DR. 2021. Eusociality in snapping shrimps is associated with larger genomes and an accumulation of transposable elements. PNAS 118:e2025051118. https://doi.org/10.1073/pnas.2025051118
Chalopin D, Naville M, Plard F, Galiana D, Volff J-N. 2015. Comparative Analysis of Transposable Elements Highlights Mobilome Diversity and Evolution in Vertebrates. Genome Biol Evol 7:567–580.
Charlesworth B, Barton N. 2004. Genome Size: Does Bigger Mean Worse? Curr Biol 14:R233–R235.
Chen S, Zhou Y, Chen Y, Gu J. 2018. fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics 34:i884–i890.
Comte A, Tricou T, Tannier E, Joseph J, Siberchicot A, Penel S, Allio R, Delsuc F, Dray S, De Vienne DM. 2023. PhylteR: efficient identification of outlier sequences in phylogenomic datasets. Mol Biol Evol 40:msad234. https://doi.org/10.1093/molbev/msad234
Cong Y, Ye X, Mei Y, He K, Li F. 2022. Transposons and non-coding regions drive the intrafamily differences of genome size in insects. iScience 25:104873. https://doi.org/10.1016/j.isci.2022.104873
Cui R, Medeiros T, Willemsen D, Iasi LNM, Collier GE, Graef M, Reichard M, Valenzano DR. 2019. Relaxed Selection Limits Lifespan by Increasing Mutation Load. Cell 178:385–399.
Danecek P, Bonfield JK, Liddle J, Marshall J, Ohan V, Pollard MO, Whitwham A, Keane T, McCarthy SA, Davies RM, Li H. 2021. Twelve years of SAMtools and BCFtools. GigaScience 10:giab008. https://doi.org/10.1093/gigascience/giab008
Daubin V, Moran NA. 2004. Comment on “The Origins of Genome Complexity.” Science 306:978–978.
de Lima LG, Ruiz-Ruano FJ. 2022. In-depth satellitome analyses of 37 Drosophila species illuminate repetitive DNA evolution in the Drosophila genus. Genome Biol Evol 14:evac064. https://doi.org/10.1093/gbe/evac064
Doležel J, Greilhuber J. 2010. Nuclear genome size: Are we getting closer? Cytometry A 77:635–642.
Du K, Stöck M, Kneitz S, Klopp C, Woltering JM, Adolfi MC, Feron R, Prokopov D, Makunin A, Kichigin I, et al. 2020. The sterlet sturgeon genome sequence and the mechanisms of segmental rediploidization. Nat Ecol Evol 4:841–852.
Duret L, Mouchiroud D. 1999. Expression pattern and, surprisingly, gene length shape codon usage in Caenorhabditis, Drosophila, and Arabidopsis. PNAS 96:4482–4487.
Dussex N, Morales HE, Grossen C, Dalén L, van Oosterhout C. 2023. Purging and accumulation of genetic load in conservation. Trends Ecol Evol 38:961–969.
Dutheil J, Gaillard S, Bazin E, Glémin S, Ranwez V, Galtier N, Belkhir K. 2006. Bio++: a set of C++ libraries for sequence analysis, phylogenetics, molecular evolution and population genetics. BMC Bioinform 7:1–6.
Dutheil J, Galtier N. 2002. BAOBAB: a Java editor for large phylogenetic trees. Bioinformatics, pp. 892–893.
Duvernell DD, Pryor SR, Adams SM. 2004. Teleost Fish Genomes Contain a Diverse Array of L1 Retrotransposon Lineages That Exhibit a Low Copy Number and High Rate of Turnover. J Mol Evol 59:298–308.
Edwards SV, Fang B, Khost D, Kolyfetis GE, Cheek RG, DeRaad D, Chen N, Fitzpatrick JW, McCormack JE, Funk WC, et al. 2025. Comparative population pangenomes reveal unexpected complexity and fitness effects of structural variants. bioRxiv. https://doi.org/10.1101/2025.02.11.637762
Elliott TA, Gregory TR. 2015. What’s in a genome? The C-value enigma and the evolution of eukaryotic genome content. Proc R Soc B Biol Sci 370:20140331. https://doi.org/10.1098/rstb.2014.0331
Felsenstein J. 1985. Phylogenies and the Comparative Method. Am Nat 125:1–15.
Figuet E, Nabholz B, Bonneau M, Mas Carrio E, Nadachowska-Brzyska K, Ellegren H, Galtier N. 2016. Life History Traits, Protein Evolution, and the Nearly Neutral Theory in Amniotes. Mol Biol Evol 33:1517–1527.
Flynn JM, Long M, Wing RA, Clark AG. 2020. Evolutionary Dynamics of Abundant 7-bp Satellites in the Genome of Drosophila virilis. Mol Biol Evol 37:1362–1375.
Frahry MB, Sun C, Chong RA, Mueller RL. 2015. Low Levels of LTR Retrotransposon Deletion by Ectopic Recombination in the Gigantic Genomes of Salamanders. J Mol Evol 80:120–129.
Furano AV, Duvernell DD, Boissinot S. 2004. L1 (LINE-1) retrotransposon diversity differs dramatically between mammals and fish. Trends Genet 20:9–14.
Galtier N. 2024. Half a Century of Controversy: The Neutralist/Selectionist Debate in Molecular Evolution. Genome Biol Evol 16:evae003. https://doi.org/10.1093/gbe/evae003
Goubert C, Modolo L, Vieira C, Valiente Moro C, Mavingui P, Boulesteix M. 2015. De Novo Assembly and Annotation of the Asian Tiger Mosquito (Aedes albopictus) Repeatome with dnaPipeTE from Raw Genomic Reads and Comparative Analysis with the Yellow Fever Mosquito (Aedes aegypti). Genome Biol Evol 7:1192–1205.
Graur D, Zheng Y, Azevedo RBR. 2015. An Evolutionary Classification of Genomic Function. Genome Biol Evol 7:642–645.
Gregory TR. 2001. Coincidence, coevolution, or causation? DNA content, cell size, and the C-value enigma. Biol Rev Biol Proc Camb Philos Soc 76:65–101.
Gregory TR. 2002. Genome size and developmental complexity. Genetica 115:131–146.
Gregory TR, Hebert PDN. 1999. The Modulation of DNA Content: Proximate Causes and Ultimate Consequences. Genome Res 9:317–324.
Guéguen L, Duret L. 2018. Unbiased Estimate of Synonymous and Nonsynonymous Substitution Rates with Nonstationary Base Composition. Mol Biol Evol 35:734–742.
Guéguen L, Gaillard S, Boussau B, Gouy M, Groussin M, Rochette NC, Bigot T, Fournier D, Pouyet F, Cahais V, et al. 2013. Bio++: Efficient Extensible Libraries and Tools for Computational Molecular Evolution. Mol Biol Evol 30:1745–1750.
Gurevich A, Saveliev V, Vyahhi N, Tesler G. 2013. QUAST: quality assessment tool for genome assemblies. Bioinformatics 29:1072–1075.
Hardie DC, Gregory TR, Hebert PDN. 2002. From Pixels to Picograms: A Beginners’ Guide to Genome Quantification by Feulgen Image Analysis Densitometry. J Histochem Cytochem 50:735–749.
Heckenhauer J, Frandsen PB, Sproul JS, Li Z, Paule J, Larracuente AM, Maughan PJ, Barker MS, Schneider JV, Stewart RJ, Pauls SU. 2022. Genome size evolution in the diverse insect order Trichoptera. GigaScience 11:giac011. https://doi.org/10.1093/gigascience/giac011
Herberstein ME, McLean DJ, Lowe E, Wolff JO, Khan MK, Smith K, Allen AP, Bulbert M, Buzatto BA, Eldridge MDB, et al. 2022. AnimalTraits - a curated animal trait database for body mass, metabolic rate and brain size. Sci Data 9:265. https://doi.org/10.1038/s41597-022-01364-9
Hoang DT, Chernomor O, von Haeseler A, Minh BQ, Vinh LS. 2018. UFBoot2: Improving the Ultrafast Bootstrap Approximation. Mol Biol Evol 35:518–522.
Imrit MA, Dogantzis KA, Harpur BA, Zayed A. 2020. Eusociality influences the strength of negative selection on insect genomes. Proc Royal Soc B 287:20201512. https://doi.org/10.1098/rspb.2020.1512
James JE, Lanfear R, Eyre-Walker A. 2016. Molecular Evolutionary Consequences of Island Colonization. Genome Biol Evol 8:1876–1888.
Ji Y, Feng S, Wu L, Fang Q, Brüniche-Olsen A, DeWoody JA, Cheng Y, Zhang D, Hao Y, Song G, et al. 2022. Orthologous microsatellites, transposable elements, and DNA deletions correlate with generation time and body mass in neoavian birds. Sci Adv 8:eabo0099. https://doi.org/10.1126/sciadv.abo0099
Jockusch EL. 1997. An evolutionary correlate of genome size change in plethodontid salamanders. Proc R Soc B Biol Sci 264:597–604.
Jones KE, Bielby J, Cardillo M, Fritz SA, O’Dell J, Orme CDL, Safi K, Sechrest W, Boakes EH, Carbone C, et al. 2009. PanTHERIA: a species-level database of life history, ecology, and geography of extant and recently extinct mammals. Ecology 90:2648–2648.
Dunning JB. 2007. CRC handbook of avian body masses. Ann Arbor, Michigan, USA: CRC Press.
Kalyaanamoorthy S, Minh BQ, Wong TKF, von Haeseler A, Jermiin LS. 2017. ModelFinder: fast model selection for accurate phylogenetic estimates. Nat Methods 14:587–589.
Kapheim KM, Pan H, Li C, Salzberg SL, Puiu D, Magoc T, Robertson HM, Hudson ME, Venkat A, Fischman BJ, et al. 2015. Genomic signatures of evolutionary transitions from solitary to group living. Science 348:1139–1143.
Kapusta A, Suh A. 2017. Evolution of bird genomes - a transposon’s-eye view. Ann N Y Acad Sci 1389:164–185.
Kapusta A, Suh A, Feschotte C. 2017. Dynamics of genome size evolution in birds and mammals. PNAS 114:E1460–E1469.
Kidwell MG. 2002. Transposable elements and the evolution of genome size in eukaryotes. Genetica 115:49–63.
Kimura M, Ohta T. 1969. The average number of generations until fixation of a mutant gene in a finite population. Genetics 61:763.
Koshikawa S, Miyazaki S, Cornette R, Matsumoto T, Miura T. 2008. Genome size of termites (Insecta, Dictyoptera, Isoptera) and wood roaches (Insecta, Dictyoptera, Cryptocercidae). Sci Nat 95:859–867.
Lartillot N, Delsuc F. 2012. Joint reconstruction of divergence times and life-history evolution in placental mammals using a phylogenetic covariance model. Evolution 66:1773–1787.
Lartillot N, Poujol R. 2011. A Phylogenetic Model for Investigating Correlated Evolution of Substitution Rates and Continuous Phenotypic Characters. Mol Biol Evol 28:729–744.
Latrille T, Lanore V, Lartillot N. 2021. Inferring long-term effective population size with mutation–selection models. Mol Biol Evol 38:4573–4587.
Lavoie CA, Platt RN, Novick PA, Counterman BA, Ray DA. 2013. Transposable element evolution in Heliconius suggests genome diversity within Lepidoptera. Mob DNA 4:1–10.
Lechner M, Marz M, Ihling C, Sinz A, Stadler PF, Krauss V. 2013. The correlation of genome size and DNA methylation rate in metazoans. Theory Biosci 132:47–60.
Lefébure T, Morvan C, Malard F, François C, Konecny-Dupré L, Guéguen L, Weiss-Gayet M, Seguin-Orlando A, Ermini L, Sarkissian CD, et al. 2017. Less effective selection leads to larger genomes. Genome Res 27:1016–1028.
Leroy T, Rousselle M, Tilak MK, Caizergues AE, Scornavacca C, Recuerda M, Fuchs J, Illera JC, De Swardt DH, Blanco G, et al. 2021. Island songbirds as windows into evolution in small populations. Curr Biol 31:1303–1310.
Letunic I, Bork P. 2021. Interactive Tree Of Life (iTOL) v5: an online tool for phylogenetic tree display and annotation. Nucleic Acids Res 49:W293–W296.
Li H. 2018. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics 34:3094–3100.
Li X, Guo B. 2020. Substantially adaptive potential in polyploid cyprinid fishes: evidence from biogeographic, phylogenetic and genomic studies. Proc Royal Soc B 287:20193008. https://doi.org/10.1098/rspb.2019.3008
Lien S, Koop BF, Sandve SR, Miller JR, Kent MP, Nome T, Hvidsten TR, Leong JS, Minkley DR, Zimin A, et al. 2016. The Atlantic salmon genome provides insights into rediploidization. Nature 533:200–205.
Lynch M. 2007. The origins of genome architecture. Sinauer Associates, Sunderland, Massachusetts, USA.
Lynch M. 2011. Statistical Inference on the Mechanisms of Genome Evolution. PLoS Genet 7:e1001389. https://doi.org/10.1371/journal.pgen.1001389
Lynch M, Ali F, Lin T, Wang Y, Ni J, Long H. 2023. The divergence of mutation rates and spectra across the Tree of Life. EMBO Rep 24:e57561. https://doi.org/10.15252/embr.202357561
Lynch M, Bobay L-M, Catania F, Gout J-F, Rho M. 2011. The Repatterning of Eukaryotic Genomes by Random Genetic Drift. Annu Rev Genomics Hum Genet 12:347–366.
Lynch M, Conery JS. 2003. The Origins of Genome Complexity. Science 302:1401–1404.
Mackintosh A, Laetsch DR, Hayward A, Charlesworth B, Waterfall M, Vila R, Lohse K. 2019. The determinants of genetic diversity in butterflies. Nat Commun 10:3466. https://doi.org/10.1038/s41467-019-11308-4
Manni M, Berkeley MR, Seppey M, Simão FA, Zdobnov EM. 2021. BUSCO Update: Novel and Streamlined Workflows along with Broader and Deeper Phylogenetic Coverage for Scoring of Eukaryotic, Prokaryotic, and Viral Genomes. Mol Biol Evol 38:4647–4654.
Martelossi J, Nicolini F, Subacchi S, Pasquale D, Ghiselli F, Luchetti A. 2023. Multiple and diversified transposon lineages contribute to early and recent bivalve genome evolution. BMC Biol 21:1–23.
Mérel V, Gibert P, Buch I, Rodriguez Rada V, Estoup A, Gautier M, Fablet M, Boulesteix M, Vieira C. 2021. The Worldwide Invasion of Drosophila suzukii Is Accompanied by a Large Increase of Transposable Element Load and a Small Number of Putatively Adaptive Insertions. Mol Biol Evol 38:4252–4267.
Mérel V, Tricou T, Burlet N, Haudry AV. 2024. Relaxed purifying selection is associated with an accumulation of transposable elements in flies. bioRxiv. https://doi.org/10.1101/2024.01.23.576885
Mikhailova AA, Rinke S, Harrison MC. 2023. Genomic signatures of eusocial evolution in insects. Curr Opin Insect Sci 61:101136. https://doi.org/10.1016/j.cois.2023.101136
Mills RE, Bennett EA, Iskow RC, Devine SE. 2007. Which transposable elements are active in the human genome? Trends Genet 23:183–191.
Modolo L, Lerat E. 2015. UrQt: an efficient software for the Unsupervised Quality trimming of NGS data. BMC Bioinform 16:1–8.
Mohlhenrich ER, Mueller RL. 2016. Genetic drift and mutational hazard in the evolution of salamander genomic gigantism. Evolution 70:2865–2878.
Mugal CF, Kutschera VE, Botero-Castro F, Wolf JBW, Kaj I. 2020. Polymorphism Data Assist Estimation of the Nonsynonymous over Synonymous Fixation Rate Ratio ω for Closely Related Species. Mol Biol Evol 37:260–279.
Müller R, Kaj I, Mugal CF. 2022. A nearly neutral model of molecular signatures of natural selection after change in population size. Genome Biol Evol 14:evac058. https://doi.org/10.1093/gbe/evac058
Nguyen L-T, Schmidt HA, von Haeseler A, Minh BQ. 2015. IQ-TREE: A Fast and Effective Stochastic Algorithm for Estimating Maximum-Likelihood Phylogenies. Mol Biol Evol 32:268–274.
Nikolaev SI, Montoya-Burgos JI, Popadin K, Parand L, Margulies EH, National Institutes of Health Intramural Sequencing Center Comparative Sequencing Program, Antonarakis SE. 2007. Life-history traits drive the evolutionary rates of mammalian coding and noncoding genomic elements. PNAS 104:20443–20448.
Novick PA, Basta H, Floumanhaft M, McClure MA, Boissinot S. 2009. The Evolutionary Dynamics of Autonomous Non-LTR Retrotransposons in the Lizard Anolis Carolinensis Shows More Similarity to Fish Than Mammals. Mol Biol Evol 26:1811–1822.
Ohta T. 1992. The Nearly Neutral Theory of Molecular Evolution. Annu Rev Ecol Evol Syst 23:263–286.
Olmo E, Capriglione T, Odierna G. 1989. Genome size evolution in vertebrates: Trends and constraints. Comp Biochem Physiol B 92:447–453.
Osmanski AB, Paulat NS, Korstian J, Grimshaw JR, Halsey M, Sullivan KAM, Moreno-Santillán DD, Crookshanks C, Roberts J, Garcia C, et al. 2023. Insights into mammalian TE diversity through the curation of 248 genome assemblies. Science 380:eabn1430. https://doi.org/10.1126/science.abn1430
Paradis E, Claude J, Strimmer K. 2004. APE: Analyses of Phylogenetics and Evolution in R language. Bioinformatics 20:289–290.
Pasquesi GIM, Adams RH, Card DC, Schield DR, Corbin AB, Perry BW, Reyes-Velasco J, Ruggiero RP, Vandewege MW, Shortt JA, et al. 2018. Squamate reptiles challenge paradigms of genomic repeat element evolution set by birds and mammals. Nat Commun 9:2774. https://doi.org/10.1038/s41467-018-05279-1
Peona V, Kutschera VE, Blom MPK, Irestedt M, Suh A. 2023. Satellite DNA evolution in Corvoidea inferred from short and long reads. Mol Ecol 32:1288–1305.
Peona V, Weissensteiner MH, Suh A. 2018. How complete are “complete” genome assemblies? - An avian perspective. Mol Ecol Resour 18:1188–1195.
Petersen M, Armisén D, Gibbs RA, Hering L, Khila A, Mayer G, Richards S, Niehuis O, Misof B. 2019. Diversity and evolution of the transposable element repertoire in arthropods with particular reference to insects. BMC Ecol Evol 19:1–15.
Petrov DA. 2001. Evolution of genome size: new approaches to an old problem. Trends Genet 17:23–28.
Pita S, Panzera F, Mora P, Vela J, Cuadrado A, Sánchez A, Palomeque T, Lorite P. 2017. Comparative repeatome analysis on Triatoma infestans Andean and Non-Andean lineages, main vector of Chagas disease. PLoS One 12:e0181635. https://doi.org/10.1371/journal.pone.0181635
Popadin K, Polishchuk LV, Mamirova L, Knorre D, Gunbin K. 2007. Accumulation of slightly deleterious mutations in mitochondrial protein-coding genes of large versus small mammals. PNAS 104:13390–13395.
Ranwez V, Douzery EJP, Cambon C, Chantret N, Delsuc F. 2018. MACSE v2: Toolkit for the Alignment of Coding Sequences Accounting for Frameshifts and Stop Codons. Mol Biol Evol 35:2582–2584.
Ratnakumar A, Mousset S, Glémin S, Berglund J, Galtier N, Duret L, Webster MT. 2010. Detecting positive selection within genomes: the problem of biased gene conversion. Philos Trans R Soc B Biol Sci 365:2571–2580.
Reinar WB, Tørresen OK, Nederbragt AJ, Matschiner M, Jentoft S, Jakobsen KS. 2023. Teleost genomic repeat landscapes in light of diversification rates and ecology. Mob DNA 14:14.
Rios-Carlos H, Segovia-Ramírez MG, Fujita MK, Rovito SM. 2024. Genomic Gigantism is not Associated with Reduced Selection Efficiency in Neotropical Salamanders. J Mol Evol, pp. 1–10.
Roddy AB, Alvarez-Ponce D, Roy SW. 2021. Mammals with Small Populations Do Not Exhibit Larger Genomes. Mol Biol Evol 38:3737–3741.
Romiguier J, Figuet E, Galtier N, Douzery EJP, Boussau B, Dutheil JY, Ranwez V. 2012. Fast and Robust Characterization of Time-Heterogeneous Sequence Evolutionary Processes Using Substitution Mapping. PLoS One 7:e33852. https://doi.org/10.1371/journal.pone.0033852
Romiguier J, Lourenco J, Gayral P, Faivre N, Weinert LA, Ravel S, Ballenghien M, Cahais V, Bernard A, Loire E, Keller L, Galtier N. 2014. Population genomics of eusocial insects: the costs of a vertebrate-like effective population size. J Evol Biol 27:593–603.
Ruggiero RP, Bourgeois Y, Boissinot S. 2017. LINE Insertion Polymorphisms are Abundant but at Low Frequencies across Populations of Anolis carolinensis. Front Genet 8:44. https://doi.org/10.3389/fgene.2017.00044
Ruzzante L, Reijnders MJMF, Waterhouse RM. 2019. Of Genes and Genomes: Mosquito Evolution and Diversity. Trends Parasitol 35:32–51.
Scornavacca C, Belkhir K, Lopez J, Dernat R, Delsuc F, Douzery EJP, Ranwez V. 2019. OrthoMaM v10: Scaling-Up Orthologous Coding Sequence and Exon Alignments with More than One Hundred Mammalian Genomes. Mol Biol Evol 36:861–862.
Sessegolo C, Burlet N, Haudry A. 2016. Strong phylogenetic inertia on genome size and transposable element content among 26 species of flies. Biol Lett 12:20160407. https://doi.org/10.1098/rsbl.2016.0407
Sotero-Caio CG, Platt RN, Suh A, Ray DA. 2017. Evolution and Diversity of Transposable Elements in Vertebrate Genomes. Genome Biol Evol 9:161–177.
Sproul JS, Hotaling S, Heckenhauer J, Powell A, Marshall D, Larracuente AM, Kelley JL, Pauls SU, Frandsen PB. 2023. Analyses of 600+ insect genomes reveal repetitive element dynamics and highlight biodiversity-scale repeat annotation challenges. Genome Res 33:1708–1717.
Tacutu R, Thornton D, Johnson E, Budovsky A, Barardo D, Craig T, Diana E, Lehmann G, Toren D, Wang J, Fraifeld VE, de Magalhães JP. 2018. Human Ageing Genomic Resources: new and updated databases. Nucleic Acids Res 46:D1083–D1090.
Tollis M, Boissinot S. 2013. Lizards and LINEs: Selection and Demography Affect the Fate of L1 Retrotransposons in the Genome of the Green Anole (Anolis carolinensis). Genome Biol Evol 5:1754–1768.
Vinogradov AE. 1995. Nucleotypic Effect in Homeotherms: Body-Mass-Corrected Basal Metabolic Rate of Mammals Is Related to Genome Size. Evolution 49:1249–1259.
Vinogradov AE. 1997. Nucleotypic Effect in Homeotherms: Body-Mass Independent Resting Metabolic Rate of Passerine Birds Is Related to Genome Size. Evolution 51:220–225.
Volff J-N, Bouneau L, Ozouf-Costaz C, Fischer C. 2003. Diversity of retrotransposable elements in compact pufferfish genomes. Trends Genet 19:674–678.
Wang J, Yuan L, Tang J, Liu J, Sun C, Itgen MW, Chen G, Sessions SK, Zhang G, et al. 2023. Transposable element and host silencing activity in gigantic genomes. Front Cell Dev Biol 11:1124374. https://doi.org/10.3389/fcell.2023.1124374
Wang X, Fang X, Yang P, Jiang X, Jiang F, Zhao D, Li B, Cui F, Wei J, Ma C, et al. 2014. The locust genome provides insight into swarm formation and long-distance flight. Nat Commun 5:2957. https://doi.org/10.1038/ncomms3957
Wang Y, Obbard DJ. 2023. Experimental estimates of germline mutation rate in eukaryotes: a phylogenetic meta-analysis. Evol Lett 7:216–226.
Weber CC, Nabholz B, Romiguier J, Ellegren H. 2014. Kr/Kc but not dN/dS correlates positively with body mass in birds, raising implications for inferring lineage-specific selection. Genome Biol 15:1–13. https://doi.org/10.1186/s13059-014-0542-8
Weyna A, Romiguier J. 2021. Relaxation of purifying selection suggests low effective population size in eusocial Hymenoptera and solitary pollinating bees. Peer Community J 1:e2. https://doi.org/10.24072/pcjournal.3
Whitney KD, Boussau B, Baack EJ, Garland Jr T. 2011. Drift and Genome Complexity Revisited. PLoS Genet 7:e1002092. https://doi.org/10.1371/journal.pgen.1002092
Whitney KD, Garland Jr T. 2010. Did Genetic Drift Drive Increases in Genome Complexity? PLoS Genet 6:e1001080. https://doi.org/10.1371/journal.pgen.1001080
Woolfit M, Bromham L. 2005. Population size and molecular evolution on islands. Proc Royal Soc B 272:2277–2282.
Wright NA, Gregory TR, Witt CC. 2014. Metabolic ‘engines’ of flight drive genome size reduction in birds. Proc Royal Soc B 281:20132780. https://doi.org/10.1098/rspb.2013.2780
Wu C, Lu J. 2019. Diversification of Transposable Elements in Arthropods and Its Impact on Genome Evolution. Genes 10:338. https://doi.org/10.3390/genes10050338
Yang H, Goubert C, Cotoras DD, Dimitrov D, Graham NR, Cerca J, Gillespie RG. 2024. Consistent accumulation of transposable elements in species of the Hawaiian Tetragnatha spiny-leg adaptive radiation across the archipelago chronosequence. Evol J Linn Soc 3:kzae005. https://doi.org/10.1093/evolinnean/kzae005
Yi S, Streelman JT. 2005. Genome size is negatively correlated with effective population size in ray-finned fish. Trends Genet 21:643–646.
Yi SV. 2006. Non-adaptive evolution of genome complexity. Bioessays 28:979–982.
Zhang Q, Edwards SV. 2012. The Evolution of Intron Size in Amniotes: A Role for Powered Flight? Genome Biol Evol 4:1033–1043.
Zhou W, Liang G, Molloy PL, Jones PA. 2020. DNA methylation enables transposable element-driven genome expansion. PNAS 117:19359–19366.

Article and author information

Author information

Alba Marino
ISEM, Univ Montpellier, CNRS, IRD, Montpellier, France
ORCID iD: 0009-0005-0984-2524
- For correspondence: alba.marino@umontpellier.fr
Gautier Debaecker
Université Claude Bernard Lyon 1, LEHNA, UMR 5023, CNRS, Villeurbanne, France
Anna-Sophie Fiston-Lavier
ISEM, Univ Montpellier, CNRS, IRD, Montpellier, France; Institut Universitaire de France (IUF), Paris, France
ORCID iD: 0000-0002-7306-6532
Annabelle Haudry
Université Claude Bernard Lyon 1, LBBE, UMR 5558, CNRS, Villeurbanne, France
ORCID iD: 0000-0001-6088-0909
Benoit Nabholz
ISEM, Univ Montpellier, CNRS, IRD, Montpellier, France; Institut Universitaire de France (IUF), Paris, France
ORCID iD: 0000-0003-0447-1451

Author Notes

Competing interests: No competing interests declared

Version history

Preprint posted: June 11, 2024
Sent for peer review: June 17, 2024
Reviewed Preprint version 1: September 11, 2024
Reviewed Preprint version 2: June 18, 2025
Version of Record published: July 18, 2025

Cite all versions

You can cite all versions using the DOI https://doi.org/10.7554/eLife.100574. This DOI represents all versions, and will always resolve to the latest one.

Copyright

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.
