1. Evolutionary Biology
  2. Genetics and Genomics
Download icon

A large genomic insertion containing a duplicated follistatin gene is linked to the pea aphid male wing dimorphism

  1. Binshuang Li
  2. Ryan D Bickel
  3. Benjamin J Parker
  4. Omid Saleh Ziabari
  5. Fangzhou Liu
  6. Neetha Nanoth Vellichirammal
  7. Jean-Christophe Simon
  8. David L Stern
  9. Jennifer A Brisson  Is a corresponding author
  1. Department of Biology, University of Rochester, United States
  2. University of Nebraska Medical Center, United States
  3. INRAE, UMR 1349 IGEPP, France
  4. Janelia Research Campus of the Howard Hughes Medical Institute, United States
Research Article
  • Cited 0
  • Views 1,047
  • Annotations
Cite this article as: eLife 2020;9:e50608 doi: 10.7554/eLife.50608

Abstract

Wing dimorphisms have long served as models for examining the ecological and evolutionary tradeoffs associated with alternative phenotypes. Here, we investigated the genetic cause of the pea aphid (Acyrthosiphon pisum) male wing dimorphism, wherein males exhibit one of two morphologies that differ in correlated traits that include the presence or absence of wings. We mapped this trait difference to a single genomic region and, using third generation, long-read sequencing, we identified a 120 kb insertion in the wingless allele. This insertion includes a duplicated follistatin gene, which is a strong candidate gene in the minimal mapped interval to cause the dimorphism. We found that both alleles were present prior to pea aphid biotype lineage diversification, we estimated that the insertion occurred millions of years ago, and we propose that both alleles have been maintained in the species, likely due to balancing selection.

Introduction

The evolutionary loss of flight ability in insects has long been leveraged to study the patterns and processes of adaptation (Zera and Denno, 1997; Roff, 1990; Harrison, 1980). The loss of flight has occurred repeatedly across insect orders, such as ants, termites, beetles, crickets, and aphids (Zera and Denno, 1997), with such evolutionary shifts to flightlessness co-occurring with entry into stable or isolated habitats where flight is unnecessary (Roff, 1990; Wagner and Liebherr, 1992). Loss of flight allows animals to shift resources toward production of offspring instead of flight appendages and fuel. Flight-capable and flight-incapable insects therefore present an evolutionary trade-off between dispersal and reproduction (Zera and Denno, 1997; Harrison, 1980). This tradeoff is sometimes observed within species, where some individuals develop with wings and others are wingless. We focus here on wing dimorphisms that are under genetic control (Roff, 1986), although some species display phenotypically plastic wing dimorphisms (Zera and Denno, 1997).

Wing dimorphisms offer an appealing system for exploring the genetic basis of adaptation because of their relatively simple genetic architectures. In species of weevils, ground beetles, and dung flies, for example, a single locus controls each species’ wing dimorphism (Roff, 1986). Other dimorphisms are polygenic, but Roff (1986) found that eight of 22 species exhibited Mendelian inheritance. Transitions between flight-capable and flight-incapable morphs have probably evolved thousands of times in insects (Wagner and Liebherr, 1992; Whiting et al., 2003). Despite the importance of alternative winged and wingless morphs in insect adaptation, none of the loci controlling wing dimorphism have yet been characterized (Saastamoinen et al., 2018).

Here, we investigated the genetic basis and evolution of the pea aphid (Acyrthosiphon pisum) male wing dimorphism. Pea aphid males exhibit distinct winged and wingless adult phenotypes (Figure 1A) and differ in a set of correlated traits. Winged males have well-developed fore and hind-wings, and direct and indirect flight muscles, while wingless males have a smaller thorax, and lack wings and associated flight musculature (Dixon and Howard, 1986; Braendle et al., 2006; Ogawa et al., 2012). The males also differ behaviorally; winged males are more active than wingless males and thus may gain more mating opportunities with females in competitive trials (Sack and Stern, 2007), while the wingless males do not fly and thus avoid the high mortality associated with flight. Unlike the aphid asexual female polyphenism where environmental cues like high density and low food quality lead to the production of winged versus wingless female morphs (Dixon, 1973), the male polymorphism is genetically determined. A single locus with two alleles on the X chromosome called aphicarus (‘aphid’ plus ‘Icarus’, api) controls male morphology (Braendle et al., 2005; Caillaud et al., 2002). Male aphids carry only a single X chromosome (females are XX, males are XO), so males carry only a single winged or wingless allele. The correlated anatomical and behavioral traits suggest that the causal locus can regulate development, and perhaps physiology, in multiple anatomical domains. We found that a large structural variant, resulting from a duplication event, underlies the winged versus wingless differences. We then explore the evolutionary history of this locus, including the maintenance of the duplication polymorphism and its distribution in present-day populations.

Figure 1 with 4 supplements see all
Linkage and association mapping of api.

(A) Winged (top) and wingless (bottom) males. (B) Top: QTL analysis revealed the presence of a single QTL peak on the X chromosome as assembled by Li et al. (2019). The blue line indicates p=0.01 determined by 1000 permutations of the phenotype data relative to the genotype data. Further RFLP analysis of recombinant F2 individuals localized api to a ~ 190 kb region, which is highlighted by a green, vertical line. Bottom: Genetic differentiation between winged and wingless males using FST values calculated from 20 kb windows across the X chromosome. (C) Dot plots showing areas of similarity among our newly assembled pea aphid genomic scaffolds containing api (either W:winged or WL:wingless, as indicated) and the scaffold containing api from a species in the Carolinaia genus.

Results and discussion

Mapping the api locus

The api locus is located on the X chromosome, which is estimated to contain a third of the pea aphid genome (Jaquiéry et al., 2018). The locus was previously localized to a 10 cM region (Braendle et al., 2005), a large region consisting of hundreds of genes. We took a two-pronged mapping approach to narrow down the api region. First, we crossed males and females from the F1 line from the original mapping population (Braendle et al., 2005) to generate additional F2 individuals. We performed multiplexed shotgun genotyping (Andolfatto et al., 2011) to simultaneously identify and score single nucleotide polymorphisms (SNPs) from 164 of these F2s. QTL analysis of these data using a recently published chromosome-level assembly (Li et al., 2019) resulted in a single LOD peak on the X chromosome (LOD 10.8, with a 1% LOD permutation threshold of 3.6, Figure 1B, top; Source data 1). Second, we performed genome-wide association mapping using 44 winged and 44 wingless males collected across the U.S. from alfalfa (Medicago sativa) fields (Supplementary file 1). We pooled the 44 winged and wingless males separately and performed Illumina resequencing to high coverage for a total of 68X and 70X coverage, respectively. FST analysis between the winged and wingless sequenced pools revealed a single, large FST value (Figure 1B, bottom; Source data 2) close to the LOD peak from the QTL analysis.

We further refined the api region by assaying restriction fragment length polymorphism (RFLP) markers on a panel of 40 F2 individuals that each carried a recombination event between the previous closest api flanking markers (Braendle et al., 2005). Two markers displayed perfect association with all 40 F2s, narrowing the api interval to a ~ 190 kb region (see the green vertical line in Figure 1B, spanning positions from 62.56M to 62.75M). Hereafter, we will use the term ‘api region’ to refer to this genetically defined region.

We then used an additional set of fully sequenced pea aphid genomes to independently verify the genomic location of api. The pea aphid is a species complex with as many as 15 host specialized lineages, called biotypes, that probably began diverging 500,000 years ago (Fazalova and Nevado, 2019, but see Peccoud et al., 2009b). Biotypes have limited gene flow between them, and the ones with the least genetic exchange have been described as incipient species (Peccoud et al., 2009a; Peccoud et al., 2009b). We used the complete genomes of 23 individual genotypes from nine different biotypes to investigate patterns of association in the api region: nine winged allele carrying genotypes from five biotypes, and 14 wingless allele carrying genotypes from six biotypes (two biotypes, pea and alfalfa, had individuals of both allele classes; we sequenced males of these individuals to separately sequence the two X chromosomes). SNPs across the api region segregated perfectly with the male winged or wingless phenotype across these biotype samples. Thus, data from linkage mapping, and the association study within and between biotypes, all confirm the same genomic location of the causative api locus.

Genome sequencing reveals a structural variant that differentiates the two alleles

Given the lack of a contiguous reference sequence in the api region in the previous assembly of the reference genome (v2.0), we separately sequenced the genomes of an F2 female homozygous for the winged (W) allele to 28x coverage and an F2 female homozygous for the wingless (WL) allele to 11x coverage using Nanopore long-read sequencing. For each allele, we assembled contiguous sequences that spanned the entire api region (WL scaffold ID = utg000342l; W scaffold ID = tig00159082l). We identified a ~ 120 kb insertion in the WL scaffold relative to the W scaffold (Figure 1C, left axis). The SNPs that originally identified this genomic region in the linkage and association mapping are adjacent to, and thus linked to, this insertion. The pool-seq reads from wingless males (generated for the association mapping discussed above) generated the same level of coverage across this insertion (including reads that cross the insertion boundaries) as the surrounding sequence, while the winged male pool-seq reads did not map to this inserted region, confirming the absence of this sequence in winged male genomes (Figure 1—figure supplement 1). This 120 kb insertion is not assembled correctly in the original pea aphid genome published in International Aphid Genomics Consortium (2010), which is homozygous for the wingless allele, so a portion of this region is present in that assembly as a separate contig. In the more recent, near-chromosome length assembly of the pea aphid genome (Li et al., 2019), the authors reported linkage of an approximately 50 kb insertion to the api locus. However, we found that the api region is incompletely assembled in this new assembly as well. We therefore use our new, wingless assembly for the remainder of this work.

To determine whether this indel represents an insertion in the wingless allele or a deletion in the winged allele, we aligned the winged and wingless api scaffolds against homologous genomic scaffolds from three other aphid species, the peach-potato aphid (Myzus persicae), an unidentified species from the Carolinaia genus (Carolinaia spp.), and the Russian wheat aphid (Diuraphis noxia) (other available aphid genomes were not used because they did not have scaffolds that spanned this region). The peach-potato aphid produces only winged males and the Russian wheat aphid produces only wingless males. We do not know the male phenotype of the Carolinaia species. We observed that the pea aphid wingless scaffold contained ~120 kb of sequence not present in other species, while the pea aphid winged scaffold exhibited nearly complete synteny across the region (Figure 1C, right axis, for comparison to the Carolinaia spp.; see Figure 1—figure supplement 2 for comparison to all three aphid species). Thus, the ~120 kb sequence in the wingless allele is an evolutionarily derived insertion that occurred in the lineage that led to pea aphids. The majority of the species closely related to the pea aphid produce only winged males (Figure 1—figure supplement 3), so the wingless phenotype is likely the derived phenotype. Thus, both the wingless male phenotype and the insertion appear to be derived characteristics of pea aphids.

To investigate the source of this inserted sequence, we used a BLASTn analysis of the ~120 kb wingless-specific region against our newly reconstructed pea aphid genome. This analysis revealed that approximately half of the inserted sequence showed similarity to another region of the reference genome (scaffold utg000238I; Figure 1—figure supplement 4), which is on an autosome. Thus, most of the inserted sequence at api is derived from a duplication of an autosomal region. The remainder of the insertion is largely composed of complex repeats which are likely the result of mobile element insertions that occurred after the duplication event.

A candidate follistatin gene in the inserted sequence

Annotation software (Augustus v.3.3.1) predicted 12 open reading frames (ORFs) in the wingless api insertion. The gene models located within the insertion in Figure 2a are from this annotation; models outside the insertion are the genome v3.0 annotations (Li et al., 2019). 11 of the 12 ORFs (Supplementary file 1) appear to be remnants of transposable element insertions. Ten show nucleotide similarity to previously identified pea aphid transposable element sequences (International Aphid Genomics Consortium, 2010). Another, ORF g6, has a DDE_Tnp4 superfamily domain (7.15e-3), which is part of the DDE superfamily of endonucleases that are likely transposases. Its predicted protein sequence is highly repetitive, with over 100 copies present across the pea aphid genome (tBLASTn evalue < −20). 38% of the pea aphid genome has been identified as derived from transposable element sequences (International Aphid Genomics Consortium, 2010), thus the api insertion is not unique in containing these elements. The remaining ORF, fs-3, is not repetitive and, importantly, is the only ORF that has homologous sequence in the insertion’s autosomal source location.

Figure 2 with 2 supplements see all
An expressed follistatin gene duplicate in the wingless-specific insertion.

(A) Gene models at the api locus, with distances shown in bp on the x-axis. The recombination mapping defined api region and the wingless-allele specific insertion are indicated. ORFs inside the wingless-specific insertion are our own annotation, while those outside the insertion are those of the genome v3.0 (Li et al., 2019). Repetitive ORFs and those outside of the api region are shown in gray. ORFs with evidence for expression either from RT-PCR or RT-qPCR are shown in black. (B) Phylogenetic tree of the aphids studied here: the soybean aphid (Aphis glycines), the bird cherry-oat aphid (Rhopalosiphum padi), the peach-potato aphid (Myzus persicae), an unidentified Carolinaia genus species, the rose aphid (Macrosiphum rosae), and the pea aphid (Acyrthosiphon pisum). The maximum likelihood tree was constructed from 2,378 bp of DNA sequence from elongation factor 1α, 18S rRNA, 12S rRNA, cytochrome b, and cytochrome c oxidase subunit I collected from Genbank. (C) Maximum likelihood tree of the different copies of follistatin (fs) nucleotide sequences. Nodes with bootstrap values < 80 are collapsed. (D) RT-PCR analysis of fs-3 and fs-2 gene expression across different developmental stages (E: embryo cDNA, 1: 1st instar nymph cDNA, 2: 2nd instar nymph cDNA; +: genomic DNA, -: no cDNA) of wingless males, winged males, daughters of crowded adult wingless females (‘winged’ females, which are predominantly winged females), daughters of uncrowded adult wingless females (‘wingless’ females, which are predominantly wingless female samples). Females here are heterozygous for the api locus, and thus have fs-3 as indicated by the positive band in the female gDNA lanes. Data for the G3PDH gene are provided as a positive control.

The ORF that originated with the insertion shows high similarity to the Drosophila melanogaster follistatin (fs) gene (blastp e-value of 4e−43). In Drosophila, fs is a secreted protein that modulates TGF-β signaling by interacting antagonistically with activin (Bickel et al., 2008; Pentek et al., 2009). The Drosophila Fs protein contains follistatin-Kazal domains necessary for activin binding (Wang et al., 2000; Keutmann et al., 2004). Three follistatin-Kazal domains are present in the pea aphid fs gene that is in the wingless-specific insertion.

We searched the pea aphid genome for fs homologs and identified three total fs homologs: the fs homolog in the polymorphic insertion (fs-3), the fs homolog in the source of the insertion (fs-2), and a third homolog (fs-1). A search of four other published aphid genomes revealed only a single fs copy in each species. The other aphid genomes are significantly diverged from the pea aphid; the most closely related genome to the pea aphid is that of the peach-potato aphid, which diverged from the pea aphid over 40 million years ago (Kim et al., 2011). We therefore used the Illumina technology to sequence the genome of the rose aphid, Macrosiphum rosae, which diverged from the pea aphid more recently (Figure 2BKim et al., 2011; Hardy et al., 2015), although the exact divergence time is unknown. We examined the relationships among the fs homologs relative to the species tree (Figure 2B,C). fs-1 has a slower rate of evolutionary change compared to the fs-2/fs-3 lineage, suggesting that fs-1 likely retains the ancestral follistatin function.

The fs gene tree (Figure 2C) shows that there have been two fs duplication events. The first duplication created fs-1 and fs-2/3. It occurred around the time that the Macrosiphini aphids represented here (the peach-potato, Carolinaia spp., rose, and pea aphid) diverged from one another about 42 mya (Kim et al., 2011). The rate of synonymous substitution (dS) between fs-1 and the fs-2/fs-3 lineage is consistent with this divergence time: the average dS between the peach-potato aphid and the pea aphid is 0.25 (Ji et al., 2016), while dS between fs-1 and fs-2 or fs-3 are very similar at 0.28 and 0.29, respectively.

Despite this ancient duplication, of the species examined here, we found fs-2 and fs-3 only in the pea aphid genome (and not the rose aphid or the Carolinaia spp. aphid). More recently, fs-2/3 duplicated to produce fs-2 (the autosomal, api-source copy) and fs-3 (the copy found solely in the pea aphid wingless allele). The most parsimonious explanation for these patterns is that the initial duplication that gave rise to the fs-2/3 lineage arose after the split of the pea aphid and the peach-potato aphid, and this duplicate copy was subsequently lost in the rose aphid lineage. The ambiguity of the timing, and the limited taxon sampling, however, do not preclude other evolutionary scenarios. Future sequencing of more aphid genomes would provide insight into the timing of the emergence of these paralogs.

All pairwise dN/dS comparisons using fs copies in Figure 2C resulted in values less than one (range of 0.03–0.21, Supplementary file 2), indicating purifying selection acting on all copies. Thus, all three fs paralogs seem to be functional genes. Although all three fs genes appear to be functional, fs-3 is only present in the WL insertion, which is not found in winged males. Therefore, fs-3 is not required for aphid survival, although it appears to provide a benefit to individuals that carry it.

To examine if fs-3 is expressed in wingless males, we performed RT-PCR on cDNA collected from winged and wingless males across three developmental stages (late stage embryos, 1st instar nymphs, and 2nd instar nymphs, Figure 2D). We did not perform RT-qPCR for this gene because our experimental design was aimed at confirming that the gene was indeed expressed, and not to perform an analysis of comparative expression levels. The winged allele lacks fs-3 and thus winged males show no expression (Figure 2D). In contrast, the wingless males expressed fs-3 (Figure 2D). Female samples are also shown for comparison. Winged and wingless males are morphologically distinct by the second nymphal instar (Ogawa et al., 2012), so the developmental timepoint of morph determination must occur before this stage. In the phenotypically plastic pea aphid females, wing morph determination occurs embryonically (Sutherland, 1969). We reasoned that the action of api, and thus morph determination, likely begins at the embryonic stage and possibly persists throughout development. fs-3 gene expression present at these early developmental stages is consistent with this expectation. fs-3 is a strong candidate gene for causing the wingless phenotype, but other genes within or flanking the WL insertion cannot be ruled out as contributing to, or causing, male winglessness. For example, there is high sequence divergence between the winged and wingless alleles in the flanking sequence. In the 10 kb to the right of the insertion we observe ~6% divergence in the alignable portion of the sequence, plus indels that range from 1 to 920 bp. Many SNPs in the flanking region are strongly associated with the male phenotype (Figure 2—figure supplement 1).

We examined whether gene expression patterns for other genes in this region are consistent with their potential role in causing male winglessness. There are five genes present in the pea aphid genome annotation that are outside of the insertion, but inside or partially inside the minimal api region (pea aphid genome annotation v3.0, Figure 2A; Supplementary file 1). We provided names to these five genes as follows: two aphid-specific genes (as1, as2; neither gene contains conserved domains and therefore their possible functions are unknown), a predicted mitochondrial sorting and assembly gene (sam50), a predicted fibroblast growth factor receptor substrate gene (frs2), and a predicted chromodomain-encoding gene (cdg). We found no nonsynonymous polymorphisms in these genes that perfectly segregate with the api phenotype across the pool-seq and biotype data. Any mutations that contribute to the winged and wingless males would, therefore, likely be regulatory. We therefore measured the expression levels of the five genes (as1, as2, sam50, frs2, and cdg) using RT-qPCR. We focused on two developmental stages, embryos and first instar nymphs, as the likely critical times as explained above. as2 is not expressed in males at these stages. Of the four other genes (as1, sam50, frs2, and cdg), none were significantly differentially expressed after multiple comparison correction (Figure 2—figure supplement 2, although as1 embryos had an uncorrected t-test p-value=0.013; Source data 3). We therefore found no strong evidence that these genes contribute to the male wing dimorphism.

Multiple lines of evidence suggest that fs-3 is the male morph determination gene. fs-3 is present in all wingless males and absent from winged males. It is not essential for aphid survival because winged males do not have it. Yet, fs-3 is under purifying selection and its coding sequence has persisted, intact, for thousands of generations. The maintenance of fs-3 suggests that it provides a selective advantage, but this advantage can be present only in individuals that have fs-3 (i.e., only wingless males or females carrying this allele). Furthermore, fs-3 is expressed at the relevant time for mediating morph determination. A transgene rescue or mutational knock-out experiment (ideally both) are ultimately required to determine whether fs-3 causes the male wing polymorphism.

The function of fs, as studied in Drosophila, provides insight into how fs-3 might function in the pea aphid male wing polymorphism. The sole Drosophila fs gene encodes a protein, Fs, that binds activins to affect TGF-β/activin signaling (Bickel et al., 2008; Pentek et al., 2009). Overexpression studies (Bickel et al., 2008; Pentek et al., 2009) revealed two potentially relevant Fs roles: regulation of cell proliferation and regulation of ecdysone receptor expression in the brain. For the former, wings were smaller in fs overexpression mutants (Bickel et al., 2008; Pentek et al., 2009). In the pea aphid wingless males, expression of an additional fs copy could negatively regulate cell proliferation in the wing and wing musculature, resulting in a lack of growth. For the latter, fs overexpression caused phenotypes that mimicked disruption of ecdysone signaling; the authors posited that there is a potentially underappreciated interaction between activin signaling and ecdysone responses (Pentek et al., 2009). This is particularly intriguing because the pea aphid plastic, female wing dimorphism is regulated, at least in part, by ecdysone signaling (Vellichirammal et al., 2017). Pea aphid fs-3 might, therefore, directly or indirectly activate ecdysone signaling in a tissue-specific manner during the development of the wingless morph, causing it to display the wingless male phenotype. So, rather than using an environmental cue to activate ecdysone signaling and to ultimately cause winged versus wingless developmental events, as in the pea aphid female polyphenism, the males may begin the process using the fs-3 gene product. Future work is required to determine whether or not fs-3 influences TGF-β/activin signaling in the pea aphid and how these changes might alter the male phenotype.

Evolution at the api locus

To investigate the evolution of the api region within the pea aphid species complex, we used the 23 individually sequenced pea aphid genomes from nine biotypes discussed above. Of these genomes, all 14 wingless individuals carried the insertion (Figure 3—figure supplement 1) and the nine winged individuals did not (Figure 3—figure supplement 2). Phylogenetic network analysis using a region of the aphid genome outside of api (Source data 4) showed that genetically distinct biotypes, such as those feeding on Lathyrus pratensis (Lap1-3) and Vicia cracca (Vc1-2), group together and are separated by relatively long branches (Figure 3A, top). The other biotypes exhibited little network structure, suggesting that there is gene flow among these biotypes, as reported previously (Peccoud et al., 2009a). In stark contrast, within the api region but just outside of the insertion (Source data 5), we found that the winged and wingless alleles are separated by a long branch, implying that biotype differentiation occurred after the origin of the WL api allele (Figure 3A, bottom). Thus, the WL api insertion must predate the biotype divergence that occurred ~500,000 ya.

Figure 3 with 2 supplements see all
The polymorphism predates radiation within the species complex.

(A) Tree networks, built with SplitsTrees, based on positions ~ 1,235 kb-1,245kb (‘Outside api’) or ~379 kb–388 kb (‘Inside api’) on the WL api scaffold. Winged individuals are colored in blue and wingless alleles in red. Abbreviations: Cs: Cytisus scoparius, Lap: Lathyrus pratensis, Mes: Melilotus spp., Ms: Medicago sativa, Os: Ononis spinosa, Ps: Pisum sativum, Tp: Trifolium pratense, and Vc: Vicia cracca. (B) Neutral divergence times, as measured by dS. Orange line shows the dS distribution for 16,367 genes, derived from comparing individuals Lap1 and Ms3. Dotted vertical lines indicate the dS values for the average of the three api-adjacent genes (as1, sam50, frs2) when winged and wingless individuals are compared, the comparison of fs-2 and fs-3, and the average rose to pea aphid dS. (C) Allele frequency estimates based on SNP frequency analysis of read counts in pool-seq data for each of the different pea aphid biotypes. Biotypes that overlap with data in (A) are shown with the same abbreviations as in (A), but some biotypes are unique to analyses in (A) or (C). Additional abbreviations: Gs: Genista sagittalis, Gt: Genista tinctoria, Lc: Lotus corniculatus, Ml: Medicago lupulina, Ov: Onobrychis viciifolia, Ss: Securigera spp., and Ts: Trifolium spp.

To further explore the age of the insertion, we used the synonymous divergence rate, dS, as a measure of neutral differences and thus as a proxy for age. We first explored the distribution of dS values for all genes between two divergent biotypes, a Lathyrus pratensis biotype genome (Lap1 from Figure 3A) and a Medicago sativa biotype genome (Ms3 from Figure 3A). Even between these divergent biotypes, most genes had few differences, with a median dS of 0.0035 (orange distribution, Figure 3B; Source data 6). In contrast, the three expressed genes with exons within the api-adjacent region had dS values between winged and wingless alleles that are an order of magnitude higher than the average of all genes: 0.045 (as1), 0.035 (sam50), and 0.036 (frs2) (vertical dashed line in Figure 3B). Even more strikingly, dS between fs-3, the follistatin copy in the WL insertion, and fs-2, the follistatin copy at the autosomal origin, was 0.068 (Figure 3B, vertical dashed line). These high dS values also support the hypothesis that the api WL insertion is considerably older than the biotype diversification.

We next estimated the age of the insertion relative to the divergence between the pea aphid and the rose aphid (Macrosiphum rosae). The pea aphid and rose aphid display an average dS of 0.08 (Figure 3B). Assuming a molecular clock, and an estimated dS and divergence time between the pea aphid and the peach-potato aphid (Myzus persicae) of 0.25 and approximately 42 million years, respectively (Ji et al., 2016), we estimate that the pea aphid and rose aphid diverged approximately 13 million years ago. We therefore conclude that the api WL insertion occurred approximately 10 millions years ago. Given the age of this insertion, the region is likely not unique to pea aphids, but rather could be responsible for the wingless male phenotype observed in other, related species as well. For example, a species not examined in the phylogeny shown in Figure 1—figure supplement 3 is Acyrthosiphon auctum, which is purportedly within the same genus as the pea aphid, but this species only produces wingless males (Blackman and Eastop, 2000). It will be interesting to interrogate other, closely related wingless and dimorphic species to determine whether or not the insertion is present in them and thus responsible for the wingless phenotype in other species.

To explore contemporary frequencies of the two alleles, we estimated the winged allele frequency from 12 different pea aphid biotypes. To do this, we focused on a noncoding SNP in the api region that segregates perfectly with the male wing phenotype across all 23 sequenced genomes from the nine biotypes, and which is 95% associated with the male phenotype in the alfalfa-biotype-specific, pool-seq data. We examined this SNP from pre-existing pool-seq data collected from each of the 12 biotypes (Guyomar et al., 2018) to estimate the winged allele frequencies. Each pool contained from 14 to 35 individuals from France, which has high host plant diversity, and each pooled individual had a distinct, multilocus genotype (Guyomar et al., 2018). The majority (9/12) of the biotype lineages exhibited both alleles, with the ancestral, winged allele frequency ranging from 5% to 94% (Figure 3C). In the remaining three biotypes (Gs, Mes, and Vc), the derived, wingless allele has gone to fixation in the samples of our dataset. This inter-biotype variation raises the intriguing idea that specialization on distinct legume hosts may influence male wing dimorphism variation.

The winged and wingless alleles have been maintained within the pea aphid lineage for millions of years. This variation may have been maintained within one or more biotypes as we see in modern-day sampling, or alternatively, the pea aphid species complex has maintained both alleles for thousands of years, but the alleles have moved between individual biotype via introgression. Unlike the female pea aphid wing dimorphism, which is well studied in terms of the investment tradeoff in reproduction versus dispersal (Zera and Denno, 1997), it is not as clear what fitness advantage a winged or wingless pea aphid male has over its counterpart. Indeed, the existence of wing dimorphisms in male insects are less well understood generally, especially given their limited energetic investment in offspring relative to females (reviewed in Langellotto et al., 2000). Hypotheses would include inbreeding avoidance and reduction of local mate competition driving the increase in abundance of winged males, while the high mortality associated with flight would drive an increase in wingless males. Future studies should, therefore, aim to understand why both morphs have existed for so long in pea aphids.

Conclusions

We have identified a derived insertion carrying a duplicated follistatin homolog that underlies the differences between winged and wingless males. This study represents an important breakthrough in understanding the molecular evolution of wing dimorphisms, which have evolved repeatedly in insects (Roff, 1986). Furthermore, this work demonstrates that a dramatic phenotypic difference is due to a relatively old insertion containing a gene duplicate, and thus that mutations of relatively large effect can play an important role in adaptive evolution (Orr and Coyne, 1992). The discovery of these types of structural variants underlying morphological adaptations have largely gone undetected (at least, compared to SNPs, although notable examples exist e.g., Lamichhaney et al., 2016; Joron et al., 2006) to date due to technological limitations. Long-read genome sequencing technologies are likely to uncover additional cases of structural variants underlying morphological evolution in the future.

Materials and methods

Linkage mapping

Request a detailed protocol

Braendle et al. (2005) previously established an api linkage mapping population. We used the F1 line from that population to generate additional F2 recombinants. Specifically, F1 asexual females (that reproduce through parthenogenesis) were placed on Vicia faba plants in an incubator at 16°C and a photoperiod of 12 hr light and 12 hr dark. After two generations of asexual reproduction, these conditions induce the production of sexual females and males. F1 males are both winged and wingless. Females produce males asexually by losing one X chromosome (Blackman, 1987). The api locus is on the X chromosome, so F1 females produce either winged or wingless males depending on which chromosome they drop. Male siblings are therefore identical on their autosomes, but they carry different X chromosomes. We crossed F1 females to F1 males, collected fertilized eggs, sterilized them in 1% calcium propionate on Whatman paper, and placed them in an incubator that alternated between 4°C for 12 hr light and 0°C for 12 hr dark. After 90 days, eggs were removed from this incubator and placed in a 19°C incubator that alternated between 16 hr light and 8 hr dark. F2 hatchlings were transferred to individual plants for asexual reproduction to establish a line of that F2 individual.

Genomic DNA from 164 F2 females was isolated, quantified, and diluted to 2 ng/µl and subjected to multiplexed shotgun genotyping (Andolfatto et al., 2011). The final amplified libraries were sequenced on an Illumina Genome Analyzer. We generated parental X-chromosome genomes for the MSG software pipeline (https://github.com/YourePrettyGood/msg; copy archived at https://github.com/elifesciences-publications/msgPinero et al., 2020) by updating a recently published assembly of the pea aphid genome (Li et al., 2019) with short sequence reads generated from wingless and winged males from the F1 strain and a custom script (available upon request). MSG produced genotype posterior probabilities that were imported directly into the genotype data structure of R-qtl (Broman et al., 2003) using a custom script (https://github.com/dstern/read_cross_msg; copy archived at https://github.com/elifesciences-publications/read_cross_msgStern, 2020). We performed QTL analysis assuming a single additive locus for winged versus wingless males using the R-qtl command scanone using Haley-Knott regression (Haley and Knott, 1992). Genome-wide significance values were calculated using 1000 permutations of the phenotype data over the genotypes.

Pool-seq

Request a detailed protocol

We used 44 winged and 44 wingless male pea aphids induced from females collected from Nebraska, New York, California and Massachusetts (Supplementary file 3). Genomic DNA (gDNA) from each male was extracted using the Qiagen DNeasy Blood and Tissue Kit. 50 ng of gDNA from each male was pooled together with males of the same phenotype. Paired-end libraries were prepared with the TruSeq DNA PCR-Free Library Preparation Kit at the University of Rochester Genomics Research Center and sequenced on an Illumina HiSeq2500 Sequencer with paired 125nt reads. First, reads were mapped to the recently published whole-X-chromosome assembly (Li et al., 2019) using bowtie2 v2.2.9 (50) with default parameters. The SAM files were converted to BAM files using samtools (v1.7) and later used for Fst analysis. In addition, the reads were mapped to our WL genome using bowtie2 (Li and Durbin, 2009) using default parameters. Reads were filtered for a mapping quality of 20 and BAM files were sorted by coordinates using samtools. The coverage of both W and WL libraries mapping to our WL genome was calculated using the samtools depth function. The pool-seq reads are deposited under BioProject PRJNA562690, SRA SRR10030338 for the winged reads and SRA SRR10030337 for the wingless reads.

Biotype genomes

Request a detailed protocol

In addition to the reference pea aphid genome which is api wingless homozygous, we obtained the sequence of 22 additional pea aphid genomes: nine carrying only the winged allele and 13 the wingless allele. These genotypes were collected in the wild as parthenogenetic females, on distinct legume species (Guyomar et al., 2018; Gouin et al., 2015). Biotypes (populations specialized on distinct legume hosts) of these genotypes were assigned based on their microsatellite profiles (Peccoud et al., 2009b) and representatives were then selected for genome resequencing (Gouin et al., 2015). Lines were sequenced to 17X to 30X coverage with Illumina 100nt paired-end reads. The reads of each sample were aligned to the pea aphid reference genome with default settings in bowtie2 (version 2.2.1) (Langmead et al., 2009). The consensus sequence of each sample was acquired using the recommended pipeline in samtools (version 1.3.1) and bcftools (version 1.3).

Nanopore sequencing and genome assembly

Request a detailed protocol

Two F2 lines, one homozygous winged and the other homozygous wingless, were generated from the api heterozygous F1 (W/WL) line. The DNA of nymphs (first through fourth instars) was isolated using the Qiagen Gentra Puregene Tissue Kit according to the manufacturer’s instructions. DNA was quantified on a Qubit and a Nanodrop and visualized on a pulse filed gel to verify its large size. For Nanopore sequencing, 2 µg of DNA was used to prepare a 1D sequencing library (SQK-LSK108) according to the manufacturer's instructions (1D gDNA long reads without BluePippin protocol), including the FFPE repair step. 75 μL of the library was immediately loaded onto an R9.4 flow cell prepared according to the manufacturer's instructions and run for approximately 48 hr. Basecalling was completed using ONT Albacore Sequencing Pipeline Software version v2.2.6. 14 GB of Nanopore reads were generated for the winged line and 7 GB reads were generated for the wingless line. OntCns, a customized version of MECAT (Xiao et al., 2017) was used to assemble the Nanopore reads of the W line. All parameters were kept the same as the example included in the package manual except for the OntCns2Consensus step, in which the coverage cutoff (-c) was set to 5. Minimap2 (Li, 2018) and Miniasm2 (Li, 2016) was used for genome assembly of the WL genome. An additional 36 GB of WL/WL Pacbio reads were combined with the 7 GB of Nanopore reads. Preset parameters (-x ava-pb) were used in the Minimap step. Assembly polishing was performed slightly differently for the two assemblies. First, 4 rounds of RACON (Vaser et al., 2016) (two rounds of Nanopore and two rounds of Pacbio) were performed to improve the WL assembly since there is no consensus step in the Miniasm assembler (there is a consensus step in the MECAT assembler). We aligned the Nanopore reads and Pacbio reads using Minimap2 with preset parameters (-x ava-ont and -x ava-pb) respectively. Next, 10 rounds of Pilon were used to further polish the WL assembly (Walker et al., 2014). We aligned 84X W and 48X WL Illumina paired-end (2 × 100 nt) reads to the assembly using bowtie2 (Langmead and Salzberg, 2012) and then ran Pilon on the sorted bam files using default settings. The same Pilon polishing process has been performed for the MECAT assembly. The only difference is that we used 68 × 100 nt F2(wl) and 44X F2(w) Illumina paired-end (2 × 100 nt) reads for the alignment. The wingless reads are deposited under accession SRR8306868 (wingless) and SRR10028116 (winged).

Other genomes

Request a detailed protocol

We sequenced the genome of the rose aphid (Macrosiphum rosae) using 2 × 150 nt Illumina reads to approximately 105x coverage. Abyss 2.13 was used to assemble the genome, with kmer set to 107 and other parameters left as default. This Whole Genome Shotgun project has been deposited at GenBank under the accession BioProject: PRJNA576965. We used publicly available data for the peach-potato (Myzus persicae), the soybean (Aphis glycines), and the bird cherry-oat (Rhopalosiphum padi) aphid genomes, available at Aphidbase.com. We also used information from a genome produced by the 10XGENOMICS company (here renamed Carolinaia spp. based on its COI sequence) available at their website: https://support.10xgenomics.com/de-novo-assembly/datasets/2.0.0/aphid).

Association and FST analysis

Request a detailed protocol

Two mpileup files were created from the W and WL male BAM files, mapping to either the genome recently published chromosome-length genome (Li et al., 2019) or to the WL genome assembly we produced here using the samtools mpileup function with the -B option to disable BAQ computation. They were further processed by the mpileup2sync.jar script in PoPoolation2 v.1.201 (Kofler et al., 2011) to generate synchronized mpileup files (sync files), with fastq type set to sanger and minimum quality set to 20. For association analyses, Fisher’s exact tests were performed on the mapping to our WL genome using the fisher-test.pl script included in PoPoolation2 with a window-size of 1, step-size of 1, minimum coverage of 2, and minimum allele count of 2. FST values were calculated on the mapping to the whole-X-chromosome assembly (Li et al., 2019) with the fst-sliding.pl script included in PoPoolation2 with a window size set to 20 kb, a step size set to 10 kb, minimum coverage set to 30, a maximum coverage set to 200, pool size set to 44. We filtered the results by the fraction of the window having sufficient coverage greater than 0.05.

Species and follistatin gene phylogenies

Request a detailed protocol

Two fs genes were identified in the A. pisum reference genome (v3.0): XM_003246488 and XM_029488805. We found an additional copy in the insertion, which in v2.0 of the genome was referred to as ACYPI065759. They were renamed as follows: XM_003246488: fs-1; XM_029488805: fs-2; ACYPI065759: fs-3. The fs homologs in the peach-potato (Myzus persicae), the soybean (Aphis glycines), and the bird cherry-oat aphids (Rhopalosiphum padi) were identified by blastp using the pea aphid fs-1. Tblastn was used to identify the homolog in the 10X genomics, Carolinaia spp. genome. Finally, a blastn was used to find the fs gene sequence from the rose aphid. MAFFT (v.7.313) was used for alignment and Raxml (v.8.2.11) was used to build a maximum likelihood tree of the aligned coding sequences. Rapid bootstrap analysis (-f a) was used to search for the best-scoring maximum likelihood tree in one program run. The GTRGAMMA model was selected (-m GTRGAMMA) and 1000 bootstrap replicates were performed. Random seed number for the starting trees (-p 123321) and Bootstrap (-x 12321) were set accordingly.

Five housekeeping genes (Elongation factor 1α, 18S ribosomal RNA, 12S ribosomal RNA, Cytochrome b and Cytochrome c oxidase subunit I) were selected to construct the phylogeny of six aphid species: the pea aphid (A. pisum), the rose aphid (Macrosiphum rosae), a Carolinaia spp. aphid, the peach-potato aphid (Myzus persicae), the bird cherry-oat aphid (Rhopalosiphum padi), and the soybean aphid (Aphis glycines). The sequences were concatenated together for each species and then aligned using MUSCLE (v.3.8.31). The aligned sequences were further curated using Gblock (http://www.phylogeny.fr/) to eliminate poorly aligned positions and divergent regions. Then the same settings of Raxml as in the follistatin phylogeny were adopted to construct the maximum likelihood tree.

Annotations

Request a detailed protocol

Augustus v.3.3.1 (Sommerfeld et al., 2009) was used for annotations. Species parameters were set to the pea aphid, Acyrthosiphon pisum, since there is a pretrained model.

SplitsTree

Request a detailed protocol

SplitsTree v.4.14.6 (Huson and Bryant, 2006) was used to generate unrooted phylogenetic networks. Two regions were analyzed for individual biotypes on the api scaffold: inside api (positions 378,979–387,730) and outside api (positions 1,235,176–1,245,122). All parameters were left as default.

dS biotype calculations

Request a detailed protocol

The bedtools (v.2.26.0) getfasta function was used to extract CDS sequences from the consensus sequences of the Lap1 and Ms3 genomes based on the annotation file AphidBase_OGS2.1b_withCDS.gff3, available at aphidbase.com. A modified perl script of SNAP (Synonymous Non-synonymous Analysis Program, v.2.1.1), (Korber, 2000) was used to calculate dS values with the Jukes-Cantor correction. Results were filtered to only include CDS lengths greater than or equal to 300 and less than or equal to 3000.

Reverse-transcriptase quantitative PCR (RT-qPCR)

Request a detailed protocol

We used the two api homozygous F2 lines to collect male embryos and first instar nymphs from winged and wingless genotypes. We dissected stage 18 (Miura et al., 2003) embryos from the females or collected first instar nymphs and verified that they were male via PCR (primers 5’ ATCGATGCTTTTGAATTGTTTTAC 3’ and 5’ TGTAGGGTCTCTTGAAGTTGTTTG 3’) followed by restriction digest. This primer set targets a heterozygous recognition sequence for Taq1alpha on the X chromosome and males only have one X chromosome; thus, they only display one of the two possible bands shown by females. Five biological replicates were included. Each embryo replicate contained 20 embryos from 6 to 10 females, while each first instar nymph replicate contained 10 individuals produced by 3 to 5 females. RNA extraction was performed using Trizol and cDNA synthesis via the BioRAD iScript cDNA kit. Quantitative PCR was performed on a Bio-Rad CFX-96 Real-Time System using 12 μL reactions of 40 ng cDNA, 1X PCR buffer, 2 nM Mg+2, 0.2 nM dNTPs, 1X EvaGreen, and 0.025 units/μL Invitrogen Taq with the following conditions: 95°C 3 min, 40x (95°C 10 s, 55°C 30 s). Primer concentrations were optimized to 100 ± 5% reaction efficiency with an R2 value of >0.99 [G3PDH (XM_001943017): 400 nM Forward primer, 350 nM Reverse primer; NADH (NM_001162323): 350 nM F, 300 nM R primer; 2281: 175 nM; 25525: 150 nM; 25532: 250 nM; and 25533: 150 nM]. Each of the five biological replicates was run on a single plate, with three technical replicates of each reaction. ΔCt values were calculated by subtracting the average CT value of the two endogenous controls (G3PDH and NADH) from the CT values of each target gene. For each pair of winged and wingless samples, ΔCT values were analyzed using two-sided t-tests after checking for normality. The endogenous control primers used were GAPDH 5’ CGGGAATTTCATTGAACGAC 3’ and 5’ TCCACAACACGGTTGGAGTA 3’; and NADH: 5’ CGAGGAGAACATGCTCTTAGAC 3’ and 5’ GATAGCTTGGGCTGGACATATAG 3’. The other gene-specific primers were frs2: 5’ AATGACACAAATGTGGCTGAAG 3’ and 5’ CAGGGCTTAACTCAAGGTATGG 3’; sam50: 5’ CCTATGACATTACGTGGTTTCC 3’ and 5’ TTCGGTTACTTTCCATGCC 3’; as1: 5’ GACCTGACGATTTTGAACACAG 3’ and 5’ TTATCTCCCATCATCAGCATTC 3’; cdg: 5’ AAGATGCTTCTTCTTCAACACC 3’ and 5’ CCCTGCAGCTTGTTCTACATTA 3’.

Reverse-transcriptase PCR (RT-PCR)

Request a detailed protocol

We used the api mapping F1 line (parental line of the two F2 lines) to collect embryonic, 1st instar, and 2nd instar males and females. For males, we verified that each individual was a male via PCR (primers 5’ TGGTACATATCAGCTATCAGCACA 3’ and 5’ ACACAAGTTATTTCAGTTGCTTAGG 3’) followed by restriction digest. This primer set targets a heterozygous recognition sequence on the X for Taq1alpha, so that males will only display one of the two possible bands depending on wing morph. We pooled groups of 20 males for RNA extraction. Wing phenotype is environmentally determined in females, so there are no genetic differences between them. To collect predominantly winged female samples, we crowded three-day-old asexual adult wingless females in groups of 12 for 24 hr. Adults were then placed on plants to lay offspring. Individuals from embryonic stage 18, the 1st nymphal instar, and the 2nd nymphal instar were collected and pooled in groups of 20 for RNA extraction. cDNA conversion was via the TransScript cDNA synthesis kit. To target fs-3, we used the primers 5' CTTTGCACTTTGCGCACGA 3' and 5' GCTGAACGCTTTTCGAAAGAAC 3' and a 65°C annealing temperature. For fs-2, we used the primers 5’ GCGTCCAACGTACATATATAGAAA 3’ and 5’ ACACACCATTGGCATCACAT 3’ and a 60°C annealing temperature. The G3PDH primers were the same as used for the RT-qPCR. 40 ng of cDNA was used for each reaction.

Winged allele frequency estimation in pea aphid biotypes

Request a detailed protocol

We analyzed the api allele frequency of pooled samples from 12 of the biotypes (Guyomar et al., 2018). Reads were mapped to the reference genome (AphidBase 2.0) using bowtie2 (v2.2.9) with default settings. Samtools (v1.7) was used to generate mpileup files and popoolation2 (v2.1201) were used to generated sync files which contain the read depth of each nucleotide at each location (all default settings except minimum mapping quality (-q for samtools and --min-qual for popoolation2) was set to 20). Allele frequency was calculated based on the nucleotide frequency at the marker location (position 55599 on scaffold GL349773).

References

  1. 1
  2. 2
  3. 3
    Reproduction, cytogenetics and development
    1. RL Blackman
    (1987)
    In: A. K Minks, P Harrewijn, editors. Aphids: Their Biology, Natural Enemies & Control, 2A. Amsterdam: Elsevier. pp. 163–195.
  4. 4
    Aphids on the World's Trees: An Identification and Information Guide
    1. RL Blackman
    2. VF Eastop
    (1994)
    Wallingford: CAB International.
  5. 5
    Aphids on the World's Crops: An Identification and Information Guide
    1. RL Blackman
    2. VF Eastop
    (2000)
    Chichester: John Wiley & Sons Ltd.
  6. 6
  7. 7
  8. 8
  9. 9
  10. 10
    Biology of Aphids
    1. AFG Dixon
    (1973)
    London: Edward Arnold Ltd.
  11. 11
    Dispersal in aphids, a problem in resource allocation
    1. AFG Dixon
    2. MT Howard
    (1986)
    In: W Danthanarayana, editors. Insect Flight: Dispersal and Migration. Berlin: Springer-Verlag. pp. 145–151.
    https://doi.org/10.1007/978-3-642-71155-8
  12. 12
  13. 13
  14. 14
  15. 15
  16. 16
  17. 17
  18. 18
  19. 19
  20. 20
  21. 21
  22. 22
  23. 23
  24. 24
  25. 25
  26. 26
    HIV signature and sequence variation analysis
    1. B Korber
    (2000)
    In: AG Rodrigo, GH Learn, editors. Computational Analysis of HIV Molecular Sequences. Dordrecht, Netherlands: Kluwer Academic Publishers. pp. 55–72.
  27. 27
  28. 28
  29. 29
  30. 30
  31. 31
  32. 32
  33. 33
  34. 34
  35. 35
  36. 36
  37. 37
  38. 38
  39. 39
  40. 40
  41. 41
  42. 42
  43. 43
  44. 44
  45. 45
  46. 46
  47. 47
  48. 48
  49. 49
  50. 50
  51. 51
  52. 52
  53. 53
  54. 54
  55. 55
  56. 56

Decision letter

  1. Mia T Levine
    Reviewing Editor; University of Pennsylvania, United States
  2. Patricia J Wittkopp
    Senior Editor; University of Michigan, United States
  3. Nancy Moran
    Reviewer; Austin, United States
  4. Leif Andersson
    Reviewer; Uppsala University, Sweden

In the interests of transparency, eLife publishes the most substantive revision requests and the accompanying author responses.

Acceptance summary:

Adaptive morphological traits have been mapped primarily to cis-regulatory elements or coding sequences. This paper offers compelling evidence that a classic winged/wingless polymorphism in male pea aphids maps instead to a large, X-linked insertion that duplicated over 10 million years ago. The discovery of a single strong candidate gene carried on the insertion highlights the importance of radical structural rearrangements in adaptive evolution.

Decision letter after peer review:

Thank you for submitting your article "A large genomic insertion containing a duplicated follistatin gene is linked to the pea aphid male wing dimorphism" for consideration by eLife. Your article has been reviewed by three peer reviewers, one of whom is a member of our Board of Reviewing Editors, and the evaluation has been overseen by Patricia Wittkopp as the Senior Editor. The following individuals involved in review of your submission have agreed to reveal their identity: Nancy Moran (Reviewer #2); Leif Andersson (Reviewer #3).

The reviewers have discussed the reviews with one another and the Reviewing Editor has drafted this decision to help you prepare a revised submission. Please note, however, that the requested changes are significant and the suitability for eLife following review of a revised version is thus more uncertain than usual. eLife

Summary:

Li et al. report their comprehensive effort to identify the genetic basis of a wing presence/absence polymorphism in the pea aphid. The authors introduce the topic well, highlighting the pervasiveness of wing loss over insect evolution and the power of within species polymorphism to map the genetic basis of wing loss. Unlike many genotype-phenotype studies published over the past decade that map to a cis-regulatory element or a nonsynonymous site, the authors report a structural variant that determines winglessness. Importantly, single- molecule sequencing was required to make this discovery, highlighting the manuscript's timeliness.

The authors validate and narrow a previously reported X-linked region controlling the wing polymorphism. They use three different methods – QTL on an F2 population, Fst analysis of US-wide populations of the pea aphid, and then an association study on so-called biotypes that are deeply diverged lineages within the species complex. Assembly of long-reads revealed a 120kb insertion only in the wingless morph. The authors go on to show that the insertion onto the X from an autosomal region is quite old. Using an additional aphid species genomes, including a previously unsequenced species, the authors infer that the insertion is unique to the pea aphid, at least among this set of species. To define a single candidate gene, the authors use several approaches to narrow the 12 genes encoded on the insertion plus those flanking the insertion. The authors identify a follistatin gene duplicate as the likely candidate and provide some evidence from Drosophila that this gene may in fact mediate wing development.

All three reviewers found the broad question compelling and the mapping to a structural variant convincing and significant. The three reviewers also determined that additional data was needed to support follistatin-3 as the likely causal locus. Suggested experiments and additional analyses below would offer either more robust support for follistatin or instead support of another gene encoded on the insertion or possibly a gene in linkage with the insertion. Additionally, interpretation of the follistatin gene tree warrants more caution and alternative methods are required to detect purifying/negative selection on the candidate causal locus. Please find specific questions/suggestions related to these two points as well as other essential comments below:

Essential revisions:

1) The authors failed to evaluate sufficiently the 11 genes encoded on the insertion and as1 outside the insertion. Addressing the questions below with additional analyses and/or experiments is necessary to identify a convincing candidate gene.

a) Do these 11 gene (in males) as well as fs-2 (in males and females) show a developmental pattern of expression different from fs-3, consistent with fs-3 winglessness-specificity? Additional RT-PCR (or better, RT-qPCR) would address these important questions. Note that the current RT-PCR data are missing a loading control and one reviewer noted the absence of a genomic DNA contamination control.

b) Do these 11 genes evolve at a faster rate than fs-3, consistent with lack of functional constraint? Are the open reading frames intact? Molecular evolution analyses would address this point. Given the ancient origins of the insertion. there has been ample time to accumulate mutations at non-functional genes.

c) The as1 expression data deserve more discussion – the differential expression data are actually compelling.

d) A more convincing approach to support fs-3 as the causal locus would be RNA FISH on the relevant developing tissue, though is not required if additional supporting data for follistatin-3 emerge from the above experiments.

e) Please acknowledge explicitly that a transgene rescue experiment is ultimately required to demonstrate causality.

2) The fs paralog tree interpretation is problematic. The support value for the fs-2/fs-3 clade is extremely low, suggesting that the topology should collapse into a polytomy. The ambiguity undermines the inference that an ancestral fs-2 was lost. Please clarify. Moreover, if the authors find additional support for the presented topology, could it be instead that the ancient fs-3 duplication represented an origin of male wing polymorphism, and that Carolinaia and M. rosae lost wingless males, as did some of the A. pisum biotypes? Please address this alternative interpretation.

3) I am not familiar with methods that allow inference of purifying selection from paralog comparisons (rather than ortholog comparisons). I don't have a great suggestion here for inferring purifying selection on a young gene duplicate (except, possibly, comparing to other genes in the insertion that may be evolving under no constraint and so have degenerated-see 1b).

4) There are weaknesses with Figure 1—figure supplement 3 and the argument that wingless males are rare in general in related aphid species (and so the wingless gene and phenotype is derived in A. pisum). I agree that this example of the wingless state is most likely derived, resulting from this insertion. However, the insertion may not be unique to A. pisum alone given the apparent age of the event. Importantly, the phylogeny in Figure 1—figure supplement 3 (from Hardy et al.) is not well-supported. It was based on an alignment of 4800 nucleotide sites but most of the taxa have missing data and no confidence metrics were included (What are units in the scale bar of Figure 1—figure supplement 3? What is '20.0')? I suggest summarizing winged/wingless males on the well-established Macrosiphini tree.

5) The last paragraph of the Introduction says that the majority of species in this group have winged males. Possibly the proportion is indeed more than half, but if you look in Blackman and Eastop (Aphids on the world's plants), one or more species in almost all of the genera of Figure 1—figure supplement 3 are known to have wingless males. By chance it seems there are fewer in the set represented in the Figure 1—figure supplement 3 tree – numerous of these are host plant-alternating species and therefore must have winged males. (Of course, if other species descend from host alternating species, then winged males have to be ancestral…). The idea that wingless males are derived seems almost certainly correct, but this point needs to be clarified in the context of the species presented to establish this point.

[Editors' note: further revisions were suggested prior to acceptance, as described below.]

Thank you for resubmitting your work entitled "A large genomic insertion containing a duplicated follistatin gene is linked to the pea aphid male wing dimorphism" for further consideration by eLife. Your revised article has been evaluated by Patricia Wittkopp as the Senior Editor and a Reviewing Editor.

The manuscript has been improved but there are some remaining issues that need to be addressed before acceptance, as outlined below:

1) The Supplementary file 1, which includes the annotation of the predicted ORFs both inside and outside the insertion, does not sufficiently address the reviewers' concerns (#1). There are no data in this file supporting the statement that all "g" ORFs represent repetitive sequences in the genome. If indeed all ORFs occur many times, additional data demonstrating this point should appear in this file or instead as a distinct supplementary file/figure. For those ORFs that are unique (if any), data on their expression in wingless males is important for supporting the ultimate focus on fs-3. It appears that the ORFs have predicted splice junctions, which could be used to support or reject the presence of a transcript. Addressing this concern is imperative given the inability to conduct knockout or transgene experiments.

2) The new RT-PCR data are certainly promising but the quality of the gel image is poor. It is difficult to discern that "…wingless males 266 expressed fs-3 increasingly across development." Either additional product should be run to make clearer the result or, as suggested in the first Decision Letter, RT-qPCR should be performed (particularly given the use of the term "increasingly"). It is surprising that the authors chose to run RT-qPCR for some of the data (i.e., the genes outside the insertion) but RT-PCR for the more important focal gene, fs-3.

3) The rejection of as1 may be warranted based on statistical criteria but given the RT-qPCR data and that it has a gene name, it would be helpful to at least know its function to further support the decision to reject it as a candidate gene.

https://doi.org/10.7554/eLife.50608.sa1

Author response

Essential revisions:

1) The authors failed to evaluate sufficiently the 11 genes encoded on the insertion and as1 outside the insertion. Addressing the questions below with additional analyses and/or experiments is necessary to identify a convincing candidate gene.

a) Do these 11 gene (in males) as well as fs-2 (in males and females) show a developmental pattern of expression different from fs-3, consistent with fs-3 winglessness-specificity? Additional RT-PCR (or better, RT-qPCR) would address these important questions. Note that the current RT-PCR data are missing a loading control and one reviewer noted the absence of a genomic DNA contamination control.

We have collected new cDNA from males and females to perform the fs-2 and fs-3 RT-PCRs. We collected a different range of stages this time: instead of collecting all stages of development, we focused specifically on embryos, 1st instar nymphs, and 2nd instar nymphs. Our rationale for this choice was twofold. First, these are likely the most relevant stages of expression, as noted in the paper. Second, this makes the results more consistent with the RTqPCR presented in Figure 2—figure supplement 2.

The results are presented in the updated Figure 2. Recall that our purpose for the RT-PCR of fs-2 was to show that it is expressed. Figure 2D shows that wingless males exhibit expression of fs-3 and winged males do not (because they do not have the insertion and thus do not have the fs-3 gene). Figure 2D also shows a positive control (G3PDH) for these same samples. This positive control also works as a negative control for genomic DNA contamination of the cDNA, since it has an amplicon size difference between genomic DNA and cDNA.

The fs-2 gene also appears to be expressed in some samples (please Figure 2D), though not in the same pattern at fs-3.

We hope this addresses the reviewers’ concerns about the fs genes. As for the other 11 genes, please see the next response.

b) Do these 11 genes evolve at a faster rate than fs-3, consistent with lack of functional constraint? Are the open reading frames intact? Molecular evolution analyses would address this point. Given the ancient origins of the insertion. there has been ample time to accumulate mutations at non-functional genes.

We understand from these critiques that we did not provide enough information about these 11 genes. In retrospect it was probably inappropriate to describe them as genes. These are predicted ORFs using automated annotation. All 11 ORFs have strong similarity to sequences that are present many times in the pea aphid genome. Thus, the 11 ORFs are highly repetitive sequences and thus are likely non-functional and unlikely to play a role in the api phenotype. We have now revised the paragraph describing the 11 genes:

“Annotation software (Augustus v.3.3.1) predicted 12 open reading frames (ORFs) in the wingless api insertion (Figure 2A), only one of which has homologous sequence in the insertion’s autosomal source location. The other 11 ORFs are each composed of sequences that are highly repetitive across the genome, with many apparently derived from transposable elements (Supplementary file 1).”

Also as noted in this added text, none of these ORFs have homologous copies present in the original, autosomal location. Therefore, we cannot do any molecular evolution analyses among paralogs as we do with the follistatin genes.

c) The as1 expression data deserve more discussion – the differential expression data are actually compelling.

After the appropriate multiple comparison correction the p-value is 0.10, so given that we tested multiple genes, this difference does not meet statistical significance, is likely due to chance, and is not compelling evidence. We pointed out in the text that the uncorrected p-value was 0.013, and then followed that by stating “We therefore found no strong evidence that these genes contribute to the male wing dimorphism.” We are therefore trying to be as transparent as possible about this result.

d) A more convincing approach to support fs-3 as the causal locus would be RNA FISH on the relevant developing tissue, though is not required if additional supporting data for follistatin-3 emerge from the above experiments.

We agree and hope to eventually do this experiment. Unfortunately, there are many candidate stages and tissues and this experiment is outside the scope of this paper. We feel the qPCR supports our conclusions.

e) Please acknowledge explicitly that a transgene rescue experiment is ultimately required to demonstrate causality.

Done. We added this sentence:

“A transgene rescue or mutational knock-out experiment (ideally both) are ultimately required to determine whether fs-3 causes the male wing polymorphism.”

2) The fs paralog tree interpretation is problematic. The support value for the fs-2/fs-3 clade is extremely low, suggesting that the topology should collapse into a polytomy. The ambiguity undermines the inference that an ancestral fs-2 was lost. Please clarify. Moreover, if the authors find additional support for the presented topology, could it be instead that the ancient fs-3 duplication represented an origin of male wing polymorphism, and that Carolinaia and M. rosae lost wingless males, as did some of the A. pisum biotypes? Please address this alternative interpretation.

Yes, it is not well supported. We have now collapsed the topology, as suggested. We collapsed two branches and now the presentation is one of a polytomy. The relevant portion of the paragraph, now reads:

“The most parsimonious explanation for these patterns is that the initial duplication that gave rise to the fs-2/3 lineage arose after the split of the pea aphid and the peach-potato aphid, and this duplicate copy was subsequently lost in the rose aphid lineage. […] Future sequencing of more aphid genomes would provide insight into the timing of the emergence of these paralogs.”

3) I am not familiar with methods that allow inference of purifying selection from paralog comparisons (rather than ortholog comparisons). I don't have a great suggestion here for inferring purifying selection on a young gene duplicate (except, possibly, comparing to other genes in the insertion that may be evolving under no constraint and so have degenerated-see 1b).

As far as we know, this is the best tool we have available. Whole genome duplication studies frequently use dN/dS of paralogs for similar reasons (e.g., Berthelot et al. 2014 Nat Comm paper on trout genome whole-genome duplication). We would be happy to do further work to address this if an appropriate analysis can be recommended.

4) There are weaknesses with Figure 1—figure supplement 3 and the argument that wingless males are rare in general in related aphid species (and so the wingless gene and phenotype is derived in A. pisum). I agree that this example of the wingless state is most likely derived, resulting from this insertion. However, the insertion may not be unique to A. pisum alone given the apparent age of the event. Importantly, the phylogeny in Figure 1—figure supplement 3 (from Hardy et al) is not well-supported. It was based on an alignment of 4800 nucleotide sites but most of the taxa have missing data and no confidence metrics were included (What are units in the scale bar of Figure 1—figure supplement 3? What is '20.0')? I suggest summarizing winged/wingless males on the well-established Macrosiphini tree.

Please see combined answer for 4 and 5 under 5 below.

5) The last paragraph of the Introduction says that the majority of species in this group have winged males. Possibly the proportion is indeed more than half, but if you look in Blackman and Eastop (Aphids on the world's plants), one or more species in almost all of the genera of Figure 1—figure supplement 3 are known to have wingless males. By chance it seems there are fewer in the set represented in the Figure 1—figure supplement 3 tree – numerous of these are host plant-alternating species and therefore must have winged males. (Of course, if other species descend from host alternating species, then winged males have to be ancestral…). The idea that wingless males are derived seems almost certainly correct, but this point needs to be clarified in the context of the species presented to establish this point.

We have now exchanged the tree in Figure 1—figure supplement 3 for the tree from a phylogenetic study of the Macrosiphini, specifically – from Choi et al. 2018 (Molecular phylogeny of Macrosiphini (Hemiptera: Aphididae): An evolutionary hypothesis for the Pterocomma-group habitat adaptation). And we agree that the insertion is likely not unique to the pea aphid given the likely age of the insertion event. We see that the text, as it was written, implied this, even though we never meant to imply this. We have thus added these sentences:

“Given the age of this insertion, the region is likely not unique to pea aphids, but rather could be responsible for the wingless male phenotype observed in other, related species as well. […] It will be interesting to interrogate other, closely related wingless and dimorphic species to determine whether or not the insertion is present in them and thus responsible for the wingless phenotype in other species.”

To the broader point – we think there have been many transitions between winged and wingless males across the aphid phylogeny. This is an idea we are actively pursuing with further work. For now, we have revised the text to say: “The majority of the species closely related to the pea aphid produce only winged males (Figure 1—figure supplement 3), so the wingless phenotype is likely the derived phenotype.”

[Editors' note: further revisions were suggested prior to acceptance, as described below.]

The manuscript has been improved but there are some remaining issues that need to be addressed before acceptance, as outlined below:

1) The Supplementary file 1, which includes the annotation of the predicted ORFs both inside and outside the insertion, does not sufficiently address the reviewers' concerns (#1). There are no data in this file supporting the statement that all "g" ORFs represent repetitive sequences in the genome. If indeed all ORFs occur many times, additional data demonstrating this point should appear in this file or instead as a distinct supplementary file/figure. For those ORFs that are unique (if any), data on their expression in wingless males is important for supporting the ultimate focus on fs-3. It appears that the ORFs have predicted splice junctions, which could be used to support or reject the presence of a transcript. Addressing this concern is imperative given the inability to conduct knockout or transgene experiments.

We have made a series of changes that we hope will help address this issue.

– As part of the pea aphid genome release in 2010, a database of pea aphid transposable element consensus sequences was created. We have used blastn to query this database and have reported the results in this supplementary table. All but two of the ORFs from the insertion hit this repeat database. One of the ORFs is our focal gene, the follistatin copy (fs-3). The other is g6. This ORF has a conserved DDE_Tnp4 domain, part of the DDE superfamily of endonucleases that are likely transposases. Although a blastn of the nucleotide sequence does not hit the pea aphid TE database, a tblastn using the g6 predicted protein sequence reveals that similar sequences are present over 100 times in the pea aphid genome. This is consistent with g6 being a low copy number variant of an extremely common superfamily of TEs present in the pea aphid genome. We have updated the text to say: “Annotation software (Augustus v.3.3.1) predicted 12 open reading frames (ORFs) in the wingless api insertion. […] The remaining ORF, fs-3, is not repetitive and, importantly, is the only ORF that has homologous sequence in the insertion’s autosomal source location.”

– We have changed the descriptions of these genes in the supplementary file. Rather than using an aggregation of blast hits, we have made the information more systematic. Now the annotations are restricted to conserved domains and their evalues. We have also added a column of information that summarized the other columns.

– We have reduced the number of genes listed in the supplementary file. Specifically, we have deleted two of the gene models (g12 and g14) that were outside of the insertion. Our previous Figure 2A showed annotations of genes from the genome annotation version 2.1b. A new annotation came out later last year and these two genes are not in that annotation. Our edits make the manuscript more consistent because we are using v3.0 for the genomic sequence elsewhere in the manuscript. We have also removed reference to these genes in the text. We have also changed out all ACYPI IDs (version 2.1b reference IDs) in the text to the new, v3.0 names of genes except in Supplementary file 1 where we have listed both names.

– In case it was previously unclear, the gene models within the insertion are our own annotations, not genome v3.0 annotations. This is because much of the insertion sequence is not found in this version of the genome. We could, therefore, not use it. We have now stated this clearly in the text and in the legend for Figure 2.

Our intention with these changes was to make the data on all the ORFs/genes more systematic and to reflect the latest official build and annotation of the genome as much as possible.

2) The new RT-PCR data are certainly promising but the quality of the gel image is poor. It is difficult to discern that "…wingless males 266 expressed fs-3 increasingly across development." Either additional product should be run to make clearer the result or, as suggested in the first Decision Letter, RT-qPCR should be performed (particularly given the use of the term "increasingly"). It is surprising that the authors chose to run RT-qPCR for some of the data (i.e., the genes outside the insertion) but RT-PCR for the more important focal gene, fs-3.

We have re-run the PCRs to present better gel images. And, importantly, we have removed reference to any sort of relative expression in the text because we should not have used terms like “increasingly”.

We ran RT-qPCR for the genes outside of the insertion because we thought it would be relevant to know the expression in the winged relative to the wingless morph. Our rationale was that a difference between the morphs could be driven by a regulatory change at one of these genes. We therefore presented RT-qPCR data in Figure 2—figure supplement 2. For the focal gene (fs-3), which is inside the insertion, this comparison is not possible; the gene is not present in the winged morph, so we cannot look at relative expression in the winged relative to the wingless morph. Any RT-qPCR, and thus any relative expression, would only be comparing expression across developmental stages. This type of RT-qPCR would be useful for beginning to unravel the functioning of this gene, but that is not the subject of this manuscript. Instead, the purpose here is to establish that this gene is expressed in the wingless males. RT-PCR establishes that it is expressed. We have added text in the manuscript about this rationale. We hope that this clearly addresses why we have performed RT-qPCR for some genes and RT-PCR for the other.

3) The rejection of as1 may be warranted based on statistical criteria but given the RT-qPCR data and that it has a gene name, it would be helpful to at least know its function to further support the decision to reject it as a candidate gene.

It would be great to know the function, but it is unknown. There are no functional domains, and all blast hits are to other aphid genes. We have now made it clear in the text that we named this gene as1. We have also now stated in the text that this gene has no known function and no conserved domains.

https://doi.org/10.7554/eLife.50608.sa2

Article and author information

Author details

  1. Binshuang Li

    Department of Biology, University of Rochester, Rochester, United States
    Contribution
    Data curation, Formal analysis, Validation, Investigation
    Contributed equally with
    Ryan D Bickel
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0002-3643-747X
  2. Ryan D Bickel

    Department of Biology, University of Rochester, Rochester, United States
    Contribution
    Conceptualization, Formal analysis, Investigation, Methodology
    Contributed equally with
    Binshuang Li
    Competing interests
    No competing interests declared
  3. Benjamin J Parker

    Department of Biology, University of Rochester, Rochester, United States
    Present address
    Department of Microbiology, University of Tennessee, Knoxville, United States
    Contribution
    Investigation, Methodology
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0002-0679-4732
  4. Omid Saleh Ziabari

    Department of Biology, University of Rochester, Rochester, United States
    Contribution
    Investigation, Methodology
    Competing interests
    No competing interests declared
  5. Fangzhou Liu

    Department of Biology, University of Rochester, Rochester, United States
    Contribution
    Investigation, Methodology
    Competing interests
    No competing interests declared
  6. Neetha Nanoth Vellichirammal

    University of Nebraska Medical Center, Omaha, United States
    Contribution
    Formal analysis, Investigation, Methodology
    Competing interests
    No competing interests declared
  7. Jean-Christophe Simon

    INRAE, UMR 1349 IGEPP, Le Rheu, France
    Contribution
    Resources, Writing - review and editing
    Competing interests
    No competing interests declared
  8. David L Stern

    Janelia Research Campus of the Howard Hughes Medical Institute, Ashburn, United States
    Contribution
    Conceptualization, Formal analysis, Investigation, Visualization, Methodology
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0002-1847-6483
  9. Jennifer A Brisson

    Department of Biology, University of Rochester, Rochester, United States
    Contribution
    Conceptualization, Data curation, Formal analysis, Supervision, Funding acquisition, Validation, Investigation, Visualization, Methodology, Project administration
    For correspondence
    jbrisso3@ur.rochester.edu
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0001-7000-0709

Funding

National Institute of General Medical Sciences (R01GM116867)

  • Jennifer Brisson

Agence Nationale de la Recherche (ANR-11-BSV7-007)

  • Jean-Christophe Simon

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Acknowledgements

We gratefully thank Jen Keister for technical assistance. This research was supported by award R01GM116867 from the National Institute of General Medical Sciences to JAB. Pool-seq sequencing was performed at the University of Rochester Genomics Research Center. JCS was supported by the Agence Nationale de la Recherche (grant ANR-11-BSV7-007) and France génomique (project 62 AAP 2009/2010). We thank Prof. Shin-Ichi Akimoto who provided samples and api phenotypes for pea aphid lines 09003A, Sap05VC7 and Iwamizawa.

Senior Editor

  1. Patricia J Wittkopp, University of Michigan, United States

Reviewing Editor

  1. Mia T Levine, University of Pennsylvania, United States

Reviewers

  1. Nancy Moran, Austin, United States
  2. Leif Andersson, Uppsala University, Sweden

Publication history

  1. Received: July 27, 2019
  2. Accepted: February 18, 2020
  3. Version of Record published: March 6, 2020 (version 1)

Copyright

© 2020, Li et al.

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.

Metrics

  • 1,047
    Page views
  • 137
    Downloads
  • 0
    Citations

Article citation count generated by polling the highest count across the following sources: Crossref, PubMed Central, Scopus.

Download links

A two-part list of links to download the article, or parts of the article, in various formats.

Downloads (link to download the article as PDF)

Download citations (links to download the citations from this article in formats compatible with various reference manager tools)

Open citations (links to open the citations from this article in various online reference manager services)

Further reading

    1. Developmental Biology
    2. Evolutionary Biology
    Neal Anthwal et al.
    Research Article
    1. Evolutionary Biology
    2. Neuroscience
    Ian W Keesey et al.
    Research Article