Colour polymorphism associated with a gene duplication in male wood tiger moths

  1. Melanie N Brien (corresponding author)
  2. Anna Orteu
  3. Eugenie C Yen
  4. Juan A Galarza
  5. Jimi Kirvesoja
  6. Hannu Pakkanen
  7. Kazumasa Wakamatsu
  8. Chris D Jiggins
  9. Johanna Mappes
  1. Organismal and Evolutionary Biology Research Program, Faculty of Biological and Environmental Sciences, University of Helsinki, Finland
  2. Department of Zoology, University of Cambridge, United Kingdom
  3. Ecology and Genetics Research Unit, University of Oulu, Finland
  4. Department of Biological and Environmental Science, University of Jyväskylä, Finland
  5. Department of Chemistry, University of Jyväskylä, Finland
  6. Institute for Melanin Chemistry, Fujita Health University, Japan

Abstract

Colour is often used as an aposematic warning signal, with predator learning expected to lead to a single colour pattern within a population. However, there are many puzzling cases where aposematic signals are also polymorphic. The wood tiger moth, Arctia plantaginis, displays bright hindwing colours associated with unpalatability, and males have discrete colour morphs which vary in frequency between localities. In Finland, both white and yellow morphs can be found, and these colour morphs also differ in behavioural and life-history traits. Here, we show that male colour is linked to an extra copy of a yellow family gene that is only present in the white morphs. This white-specific duplication, which we name valkea, is highly upregulated during wing development. CRISPR targeting valkea resulted in editing of both valkea and its paralog, yellow-e, and led to the production of yellow wings. We also characterise the pigments responsible for yellow, white, and black colouration, showing that yellow is partly produced by pheomelanins, while black is dopamine-derived eumelanin. Our results add to a growing number of studies on the genetic architecture of complex and seemingly paradoxical polymorphisms, and the role of gene duplications and structural variation in adaptive evolution.

Editor's evaluation

Through genetic mapping and analysis of WGS data, the authors identify a gene duplication co-segregating with a color polymorphism in males of the aposematic tiger moth. They name the new gene valkea and investigate its expression and function in relation to wing pigmentation. Using CRISPR to disrupt valkea, they observe changes in wing color. However, because valkea was not the only gene edited, its causal role in the color polymorphism cannot be unambiguously established.

https://doi.org/10.7554/eLife.80116.sa0

Introduction

Colour polymorphisms, defined as the presence of multiple discrete colour phenotypes within a population (Huxley, 1955), provide an ideal trait to study natural and sexual selection. Colour phenotypes can have an effect on fitness in many contexts including camouflage, mimicry, and mating success. Colour is often associated with aposematism, where it acts as a signal, warning predators of unpalatability (Cott, 1940; Cuthill et al., 2017). In such cases, predator learning should favour the most common colour pattern, leading to positive frequency-dependent selection (Endler, 1988). Despite this, aposematic polymorphisms can be stable when selection is context-dependent (Briolat et al., 2019), especially where genetic correlations between colour phenotypes and other traits lead to complex fitness landscapes (reviewed by McKinnon and Pierotti, 2010).

A variety of genetic mechanisms can underpin these types of complex polymorphisms involving multiple associated traits (Orteu and Jiggins, 2020). In many cases, such complex polymorphisms are controlled by ‘supergenes’ in which divergent alleles at several linked genes are maintained in strong linkage disequilibrium by reduced recombination. The most common mechanisms for locally reduced recombination are inversions, which range from single inversions involving a small number of genes, to multiple nested inversions covering large genomic regions (Joron et al., 2011; Wang et al., 2013; Küpper et al., 2016; Funk et al., 2021). Nonetheless, other mechanisms for reducing recombination, such as centromeres or large genomic deletions, may also play a role. An alternative mechanism is that a single regulatory gene controls variation via multiple downstream effects (Thompson and Jiggins, 2014). While there are fewer instances in which multiple phenotypes seem to be controlled by a single gene, one potential example is the common wall lizard, where colour genes have pleiotropic effects on behavioural and reproductive traits (Andrade et al., 2019). Multiple mutations within a single gene can also lead to variation in multiple traits (Linnen et al., 2013).

The wood tiger moth, Arctia plantaginis, has a complex polymorphism that has been well studied in an ecological context. Males show polymorphic aposematic hindwing colouration with discrete yellow, white, or red hindwing colour morphs found at varying frequencies in different geographic locations. In Finland, for example, both yellow and white morphs can be found, with white morphs varying in frequency from 40 to 75% (Galarza et al., 2014). In Estonia, white morphs make up 97% of the population, while yellow morphs form a completely monomorphic population in Scotland (Hegna et al., 2015; Figure 1). Long-term breeding studies of these moths have shown that male hindwing colour is a Mendelian trait controlled by a single locus with two alleles (Suomalainen, 1938; Nokelainen et al., 2022). White alleles (W) are dominant over the yellow (y). These colour genotypes also covary with behavioural and life-history traits, contributing to the maintenance of this polymorphism. Yellow males are subject to lower levels of predation in the wild (Nokelainen et al., 2012; Nokelainen et al., 2014), while white males have a positive frequency-dependent mating advantage (Gordon et al., 2015). There are differences in chemical defences, and bird reactions to these defences, between the genotypes (Rojas et al., 2017; Winters et al., 2021). Yellow morphs show reduced flight activity compared to white males, although yellows may fly at more selective times, that is, at peak female calling periods (Rojas et al., 2015). Increased reproductive success in Wy genotype females points towards strong heterozygote advantage (De Pasqual et al., 2022). In summary, there is a trade-off between natural selection through predation and reproductive success, which contributes to the maintenance of this polymorphism (Rönkä et al., 2020).

The wood tiger moth, Arctia plantaginis.

(A) Sampling locations and frequencies of yellow and white males in Finland, Scotland, and Estonia. (B) Males of the white and yellow colour morphs (credit: Samuel Waldron).

Despite the large body of research on A. plantaginis colour morphs, the genetic basis of this polymorphism is unknown. Here, we explore male hindwing colour variation using linkage mapping, whole-genome data, and gene expression analyses, with wild populations and lab crosses, to identify the locus controlling the colour polymorphism. We use CRISPR/Cas9 gene knockouts to determine the function of the identified gene, and then characterise the pigments producing yellow, white, and black colouration on the wings of male A. plantaginis. Our findings aim to provide an example of the genetic architecture controlling a trait that is part of a complex polymorphism.

Results

A narrow genomic region is associated with hindwing colour

To investigate the genetic basis of male hindwing colouration in A. plantaginis, we carried out a quantitative trait locus (QTL) mapping analysis using crosses between heterozygous Wy males and homozygous yy females. We used RADseq data aligned to the yellow A. plantaginis reference genome from 172 male offspring (90 white and 82 yellow) from four families. The QTL analysis identified a single marker associated with male hindwing colour (Figure 2A). This marker was found on scaffold YY_tarseq_206_arrow at position 9,887,968 bp (95% confidence intervals 9,349,978–9,888,009 bp) and had a LOD score of 32.8 (p<0.001). The significant marker explains around 75% of the phenotypic variation and, with one exception, yellow individuals all had a homozygous yy genotype at this marker.

Figure 2 with 5 supplements
A duplicated region in white morphs is associated with male hindwing colour.

(A) Quantitative trait locus (QTL) analysis of white and yellow F1 males (n = 172) reveals a significant 500 kb region on scaffold 206 of the yellow reference, part of linkage group 9. The dotted line indicates the significance threshold determined by permutation tests (p=0.05). (B) A genome-wide association study (GWAS) of wild samples (n = 46) showed SNPs associated with hindwing colour along the same scaffold. The dotted line shows the Bonferroni corrected significance threshold. (C) Alignment of the white and yellow reference genomes reveals an insertion in the white reference sequence that contains a copy of the yellow-e gene which we named valkea, in addition to the yellow-e present in both white and yellow morphs. (D) Mean read depth across the candidate region in all Finnish white (WW and Wy) and yellow (yy) samples.

To further narrow down this region, we ran a genome-wide association study (GWAS) using whole-genome sequences of males from four populations: polymorphic Southern Finland (5 white, 5 yellow) and Central Finland (10 white, 10 yellow), Estonia (4 white), where males are mostly white, and monomorphic Scotland (12 yellow), where all males are yellow. This identified a region of associated SNPs also on scaffold 206 (Figure 2B). Two SNPs, 137 bp apart (at positions 9,885,384 and 9,885,521), were significant above a strict Bonferroni corrected threshold. A total of 162 SNPs were over the threshold of p<0.0001 and, of these, 155 are within a 99 kb region on scaffold 206 (9,833,387–9,932,264 bp). The top SNPs are within 2.5 kb from the top QTL marker, and the SNP at this marker has a p-value<0.0001.
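The Bonferroni correction applied to the GWAS divides the chosen significance level by the number of tests performed. A minimal sketch of that calculation follows. The number of SNPs tested is not stated in this section, so `n_snps` is a purely hypothetical value, and the function names are illustrative rather than taken from the authors' analysis.

```python
import math


def bonferroni_threshold(alpha, n_tests):
    """Per-test p-value cutoff that keeps the family-wise error rate at alpha."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


def significant_snps(p_values, alpha=0.05):
    """Indices of SNPs whose p-value is below the Bonferroni cutoff."""
    cutoff = bonferroni_threshold(alpha, len(p_values))
    return [i for i, p in enumerate(p_values) if p < cutoff]


# Hypothetical number of SNPs tested (NOT a figure from the study).
n_snps = 1_000_000
cutoff = bonferroni_threshold(0.05, n_snps)
print(f"cutoff p = {cutoff:.1e}, -log10(p) = {-math.log10(cutoff):.2f}")
```

Because the cutoff shrinks with the number of SNPs, only the strongest associations pass it, which is consistent with just two SNPs exceeding the corrected threshold while many more pass the looser p<0.0001 cutoff.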

The 538 kb QTL interval contains 21 genes (Supplementary file 1A) which were annotated with reference to Drosophila melanogaster. Of these genes, four are part of the yellow gene family. The top two SNPs from the GWAS, and the top marker from the QTL, fall in a non-coding region upstream of the gene, yellow-e, and are also close to an additional yellow gene, yellow-g.
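The interval sizes quoted in this section follow directly from the reported coordinates. The short sketch below repeats that arithmetic using only positions given in the text (scaffold 206 of the yellow reference). The variable and function names are illustrative.

```python
# Positions in bp, as reported in the text.
QTL_INTERVAL = (9_349_978, 9_888_009)    # 95% confidence interval of the QTL
QTL_TOP_MARKER = 9_887_968
GWAS_TOP_SNPS = (9_885_384, 9_885_521)   # the two Bonferroni-significant SNPs
GWAS_REGION = (9_833_387, 9_932_264)     # region holding 155 associated SNPs


def span_bp(interval):
    """Length of an interval given as (start, end)."""
    start, end = interval
    return end - start


def contains(interval, position):
    """True if position lies within the interval, ends included."""
    start, end = interval
    return start <= position <= end


qtl_span = span_bp(QTL_INTERVAL)                 # 538,031 bp, about 538 kb
gwas_span = span_bp(GWAS_REGION)                 # 98,877 bp, about 99 kb
snp_gap = GWAS_TOP_SNPS[1] - GWAS_TOP_SNPS[0]    # 137 bp
snps_in_qtl = all(contains(QTL_INTERVAL, p) for p in GWAS_TOP_SNPS)

print(qtl_span, gwas_span, snp_gap, snps_in_qtl)
```

Both top GWAS SNPs and the top QTL marker fall inside the QTL confidence interval, near its upper bound.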

Identifying structural variation in this region

The trio binning method used by Yen et al., 2020 to assemble the A. plantaginis reference genome produced two reference sequences, one for a white allele and one for a yellow allele. We extracted the region containing the QTL interval from the yellow reference and aligned it against the white reference. The alignment showed a duplicated region approximately 117 kb long on scaffold 419 of the white reference from around 6,941,000–7,058,000 bp (Figure 2—figure supplement 1). The yellow-e gene and its flanking regions are within this sequence and are therefore duplicated in the white reference (Figure 2C). One copy of the gene (named jg1310 in the W annotation) has seven exons and is similar to the yellow-e gene in the yellow reference (99.7% identity in coding sequences). The second copy unique to the white scaffold (jg1308) has only the first five exons (81.8% identical to the gene in the yellow reference), possibly due to a stop codon mutation in the fifth exon. For clarity, we named this duplicated white-specific copy valkea, in reference to a Finnish word for ‘white’. While all white samples had consistent coverage of reads across the duplicated region, coverage was patchy in yellow samples, with many regions having no or very low coverage (Figure 2D). Those reads that map in the valkea region in yellow samples are likely to be mapping errors, because the sequence similarity is high and mapping quality is reduced within the duplication (Figure 2—figure supplement 2). When increasing the mapping quality filtering, read depth decreases more in yellow samples compared to white samples in this region (Figure 2—figure supplement 3). We confirmed the absence of this region in yellow individuals by designing primers within the duplication (Supplementary file 1B, Figure 2—figure supplement 4), which only amplified in WW and Wy samples, including Finnish, Estonian, and lab populations.
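The coverage comparison described above can be illustrated with a small sketch. It assumes per-base read depths are already available as a list of integers for each sample. The depth values, window size and depth cutoff below are invented for illustration and are not data from the study.

```python
def window_means(depths, window):
    """Mean depth in consecutive, non-overlapping windows."""
    if not depths or window < 1:
        raise ValueError("need at least one depth value and a window >= 1")
    means = []
    for start in range(0, len(depths), window):
        chunk = depths[start:start + window]
        means.append(sum(chunk) / len(chunk))
    return means


def low_coverage_fraction(depths, window, min_depth):
    """Fraction of windows whose mean depth falls below min_depth."""
    means = window_means(depths, window)
    return sum(m < min_depth for m in means) / len(means)


# Toy per-base depths (hypothetical): even coverage versus patchy coverage.
white_depths = [20, 22, 19, 21, 20, 23, 18, 20]
yellow_depths = [0, 0, 3, 0, 15, 0, 0, 1]

print(low_coverage_fraction(white_depths, window=2, min_depth=5))   # 0.0
print(low_coverage_fraction(yellow_depths, window=2, min_depth=5))  # 0.75
```

A sample carrying the duplication would be expected to show few or no low-coverage windows, whereas a sample lacking it would show many, with the remaining reads plausibly being mismapped from the paralogous copy.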

To confirm that both of these gene copies are related to yellow-e, we compared them to yellow-e orthologues found in Bombyx mori, Heliconius melpomene, and D. melanogaster, along with other yellow genes from A. plantaginis and B. mori. Both of the tiger moth genes were most closely related to the H. melpomene yellow-e (Figure 2—figure supplement 5). Between valkea and yellow-e, there is an additional gene which showed highest similarity to Drosophila yellow-g2 (when extracted from both the white and yellow references). This gene is not part of the duplicated sequence and is present as a single copy in both morphs. Coverage across yellow-g and yellow-e genome regions in wild samples is similar in both morphs (Figure 2D). Upstream of the duplication is an unnamed gene (listed as jg6744 in the yellow annotation and jg1307 in the white). This is the same orthologous gene in both reference genomes, having 99.3% identity. Similarly, if we look at the 150 kb upstream region, sequence identity is 99.98%. There are no non-synonymous mutations between coding sequences of yellow-e when comparing the white and yellow references, although there are differences in the first exon of yellow-g. The absence of this duplicated region in the yellow morphs means we cannot determine if there is a change in linkage disequilibrium across white and yellow morphs.

Valkea is differentially expressed between morphs

To pinpoint which of these candidate genes is associated with male wing polymorphism in A. plantaginis, we next performed gene expression analyses across several developmental stages. Based on knowledge of the expression patterns of yellow genes (Ferguson et al., 2011) and other melanin pathway genes such as pale, ebony and ddc in Lepidoptera (Zhang et al., 2017b), we hypothesised that changes in gene regulation that control the development of wing colour morphs in the tiger moth most likely occur during pupal development. Pupal development in the wood tiger moth lasts for approximately 8 days, and no colour is present in the wings until day 7, when the yellow pigment appears. A few hours later, black melanin pigmentation is deposited. We sampled two stages early in development when no colouration is present in the wings (72 hr post-pupation, and 5-day-old pupae), and two stages later in development: the point when yellow appears in yy morphs (Pre-mel, 7-day-old pupae) and the other after black melanin has also been deposited (Mel, 7–8-day-old). Forty individuals were sampled in total – five per genotype and stage.

First, we explored the general patterns of expression by mapping RNAseq reads to the white reference genome, which contains the duplicated region that includes valkea. We filtered out lowly expressed genes, retaining 10,920 genes and used multidimensional scaling (MDS), a dimensionality reduction technique, to explore which factors explain genome-wide variation in gene expression between samples. We observed that samples clustered based on their developmental stage, suggesting it is an important factor driving differences in genome-wide gene expression between samples (Figure 3—figure supplement 1). Such a pattern would be expected as many genes are involved in development and thus are likely to be differentially expressed (DE) between developmental stages. No apparent clustering can be observed among samples of the same colour morph.
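Expression in this study is reported as log2 counts per million (CPM), and lowly expressed genes were removed before the MDS analysis. The software and cutoffs used are not given in this section, so the sketch below is a simplified illustration rather than the authors' pipeline: the pseudocount, `min_cpm`, `min_samples` and the toy counts are all assumptions.

```python
import math


def cpm(counts):
    """Counts per million for one sample: count * 1e6 / library size."""
    library_size = sum(counts)
    return [c * 1_000_000 / library_size for c in counts]


def log2_cpm(counts, pseudocount=1.0):
    """log2(CPM + pseudocount); the pseudocount avoids taking log2 of zero."""
    return [math.log2(value + pseudocount) for value in cpm(counts)]


def keep_expressed(count_matrix, min_cpm=1.0, min_samples=2):
    """Indices of genes with CPM >= min_cpm in at least min_samples samples.

    count_matrix is a list of samples, each a list of per-gene read counts.
    """
    cpm_by_sample = [cpm(sample) for sample in count_matrix]
    n_genes = len(count_matrix[0])
    kept = []
    for gene in range(n_genes):
        passing = sum(sample[gene] >= min_cpm for sample in cpm_by_sample)
        if passing >= min_samples:
            kept.append(gene)
    return kept


# Toy counts (hypothetical): three samples, four genes, 1,000,000 reads each.
samples = [
    [500_000, 0, 499_999, 1],
    [600_000, 0, 399_998, 2],
    [400_000, 1, 599_999, 0],
]
print(keep_expressed(samples))  # gene 1 is dropped: it passes in only one sample
```

Filtering on CPM rather than raw counts keeps the cutoff comparable across samples that were sequenced to different depths.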

We next compared gene expression between yy and WW individuals at each of the developmental stages. Overall, 99 genes were differentially expressed (FDR < 0.05) between the two morphs (Figure 3A). Two of these DE genes, yellow-e and valkea, are among the 22 genes identified in the GWAS and QTL analysis. Valkea was overexpressed in white individuals in the pre-melanin stage with a log fold change of 10.32 and a p-value of 2.18e-06. As valkea is only fully present in the W genome, it is not expected to be expressed at all in the Y genome. Yellow-e was also overexpressed in white individuals during the pre-melanin stage with a log fold change of 3.86 and adjusted p-value of 5.62e-06. In other developmental stages, neither valkea nor yellow-e showed differences in expression between morphs (Figure 3B).
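To give a sense of scale, the reported log fold changes can be converted to linear fold differences. The text does not state the base of the logarithm. The sketch below assumes base 2, which is the usual convention in differential expression analysis and matches the log2 CPM scale used for the expression values, so the resulting figures are indicative only.

```python
def fold_change_from_log2(log2_fc):
    """Convert a log2 fold change to a linear fold change."""
    return 2 ** log2_fc


# Log fold changes reported for the pre-melanin stage (base 2 is an assumption).
reported = {"valkea": 10.32, "yellow-e": 3.86}

for gene, log2_fc in reported.items():
    print(f"{gene}: about {fold_change_from_log2(log2_fc):.0f}-fold higher in white")
```

Under that assumption, valkea expression differs by more than a thousand-fold between morphs, as expected for a gene that is absent from the yellow genome, while yellow-e differs by roughly fifteen-fold.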

Figure 3 with 2 supplements
Valkea is overexpressed in white males in the pre-melanin stage.

(A) In pink are genes that are significantly differentially expressed between yellow and white morphs at the pre-melanin stage. Valkea is the most differentially expressed (DE) gene (i.e. gene with the highest log fold change). (B) Expression of valkea across developmental timepoints shows that it has higher expression measured in Log2 CPM (counts per million) in white individuals compared to yellow ones. Expression of valkea in yellow morphs is around 0.

Of the 99 genes differentially expressed between white and yellow individuals across development, 49 were upregulated in the yy morph, while the remaining 50 were upregulated in WW individuals. The earliest developmental stage, 72 hr, was the stage with the highest number of DE genes (n = 48), while the 5-day-old stage had the fewest (n = 7). One gene which encodes a C2H2 zinc finger transcription factor in D. melanogaster, ‘jg15945’, was over-expressed in yy in the first three stages (Figure 3—figure supplement 2).

Finally, the GWAS and QTL peaks of association are situated in scaffold 419 of the WW reference assembly, which in a WW linkage map forms a linkage group along with six more scaffolds (472, 487, 515, 531, 540, and 609). We found that 12 genes present in this linkage group were differentially expressed, including valkea and yellow-e (Supplementary file 1C), and identified their orthologues in D. melanogaster.

CRISPR/Cas9 knockouts of valkea produce yellow hindwings

To confirm the function of valkea in wing colouration, we used CRISPR/Cas9 to knock out the gene in white morphs. We tested five different guides to target the first three exons of valkea, and injected Cas9/sgRNA duplexes into a total of 1223 eggs. Of 143 larvae that hatched, only six developed to adults (Supplementary file 1D). However, of the five males that did eclose, four had a visible change in phenotype. Males produced yellow scales instead of white on the dorsal side of both the forewings and the hindwings (Figure 4). Forewings were more yellow than in the wildtype yellow males, which usually have lighter forewings compared to hindwings. White scales on the ventral side of the wings also became yellow, similar to wildtype yellows. Black melanin patterning did not seem to be affected. Variation in the amount of melanin can be attributed to the populations from which the individuals originated, with the darker samples coming from the Finnish population (Figure 4—figure supplement 1). Wildtype white morphs also reflect UV, particularly on the hindwings, but this is not seen in the CRISPR males (Figure 4—figure supplement 2). This could suggest a change in scale structure, or that the yellow pigment covers the UV-reflecting structures. To quantify changes in visible and UV colour, we took spectral measurements of the mutant males and compared them to wildtype males (Figure 4—figure supplement 4). The reflectance spectra for the hindwings of the CRISPR males most closely resembled that of wildtype yellow males, in both visible and UV wavelengths.
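The survival figures above are easier to compare as rates. The sketch below uses only the counts reported in the text. The constant and function names are illustrative.

```python
# Counts reported for the CRISPR/Cas9 injections.
EGGS_INJECTED = 1223
LARVAE_HATCHED = 143
ADULTS = 6
MALES_ECLOSED = 5
MALES_WITH_PHENOTYPE = 4


def percent(part, whole):
    """part as a percentage of whole."""
    return 100 * part / whole


print(f"hatching rate:         {percent(LARVAE_HATCHED, EGGS_INJECTED):.1f}%")
print(f"larva-to-adult:        {percent(ADULTS, LARVAE_HATCHED):.1f}%")
print(f"egg-to-adult:          {percent(ADULTS, EGGS_INJECTED):.1f}%")
print(f"males with phenotype:  {percent(MALES_WITH_PHENOTYPE, MALES_ECLOSED):.1f}%")
```

Overall survival from injected egg to adult is below 1%, but most of the males that did eclose showed the mutant phenotype.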

Figure 4 with 4 supplements
CRISPR/Cas9 knockouts of valkea transform white scales into yellow scales across both hindwings and forewings.

Wildtype WW and yy morphs (top), and the dorsal and ventral sides of one of the CRISPR knockout males (bottom).

Four out of the five guides tested produced a mutant phenotype, with no differences in the male phenotype between guides. We used whole-genome sequences of the mutants to confirm that the correct sites in valkea had been targeted (Figure 4—figure supplement 3). All samples also showed evidence of editing at the corresponding yellow-e exons, which mainly involved insertions. As all genotypes have similar forewing colour in the wildtypes, we do not expect valkea to affect the forewing and thus the change in forewing colour could be attributed to a yellow-e mutation. Only one female survived to adulthood, and this had a mosaic phenotype. Female colour does not correlate with the male colour genotypes, and the forewings of females are a pale-yellow colour. This individual with a mosaic phenotype had one mutant forewing which was much more yellow/orange than the wildtype. The rest of the wings and body resembled a wildtype female (Figure 4—figure supplement 1/Figure 4—figure supplement 4). Reflectance spectra show that the mutant left forewing is closer in colour to the yellow/orange on the hindwings, than to the colour of the opposite forewing (Figure 4—figure supplement 4). Since a valkea knockout is not expected to affect female phenotypes as they always have orange/red hindwings, this could be further evidence for the effect of yellow-e on forewing colour. We also checked for potential off-target effects of the CRISPR on other yellow genes. There was no evidence of editing (insertions, deletions or mutations) in the yellow genes c, d2, f, f2, g2, h, and yellow itself.

Survival of the eggs varied between the guides, although this was largely affected by the female, as hatching rate between females ranged from 0 to 70%. Females often lay unfertilised eggs, so we expect that hatching rate will be low in some crosses. Using two guides in combination did not produce any pupae or adults.

Pigment analysis

Since the yellow gene family, to which valkea is related, is known to be responsible for the production of melanin pigments, we further investigated the identity of the wing pigmentation. First, we ruled out the presence of several non-melanin pigment types in the hindwings, including pterins and carotenoids. Pterins are commonly found in insects and, along with purine derivatives, papiliochromes and flavonoids, are soluble in strong acids and bases or in organic solvents (Umebachi, 1975; Kayser, 1985; Shamim et al., 2014). We placed wing samples from each morph in NaOH overnight, then measured the absorbance of the supernatant using a spectrophotometer. We also left wings in methanol overnight before measuring the supernatant. The spectra did not show any peaks indicative of any pigment dissolved in the sample. Similarly, we found no evidence for carotenoid pigments after dissolving in a hexane:tert-butyl methyl ether solution (Figure 5—figure supplement 1). Wings did not fluoresce under UV light, providing further evidence for the lack of fluorescent pigments including pterins, flavonoids, flavins, and papiliochromes (Umebachi, 1975; Kayser, 1985). Ommochromes are red and yellow pigments; high-performance liquid chromatography (HPLC) ruled out the presence of these pigments on the moth wings, which we compared to data from ommochrome-containing Heliconius wings and a xanthurenic acid standard (Figure 5—figure supplement 2).

HPLC analysis showed peaks characteristic of pheomelanin (Figure 5). Pheomelanins produce red-brown colour in grasshoppers and wasps (Galván et al., 2015; Jorge García et al., 2016), and orange-red colours in ants and bumblebees (Hines et al., 2017; Polidori et al., 2017). Insects generally have dopamine-derived pheomelanin and a breakdown product of this is 4-amino-3-hydroxyphenylethylamine (4-AHPEA) (Barek et al., 2018). Yellow wings showed large peaks for 4-AHPEA. White wings had around 27% of the 4-AHPEA levels seen in yellow wings, and black sections of the wings had 16%. Hydrogen iodide hydrolysis of wings produced the isomer 3-AHPEA, which may come from 3-nitrotyramine originating from the decarboxylation of 3-nitrotyrosine. Reduction of 3-nitrotyrosine produces 3-AHP, another marker of pheomelanin (Wakamatsu et al., 2002).

Figure 5 with 3 supplements
High-performance liquid chromatography (HPLC) analysis shows that the highest levels of 4-AHPEA, a breakdown product of pheomelanin, are seen in the yellow wings.

Measurements for yellow, white, and black portions of the hindwing, plus the standard (Std) are shown.

Analysis of the black portions of the wing found pyrrole-2,3-dicarboxylic acid (PDCA) and pyrrole-2,3,5-tricarboxylic acid (PTCA) (Figure 5—figure supplement 3). Both are components of eumelanin (Barek et al., 2018), suggesting that the black colouration seen in the wood tiger moth is predominantly eumelanin derived from dopamine. This is common in producing black colouration and providing structural components of the exoskeleton. In addition, dopamine is acylated to both N-β-alanyldopamine (NBAD) and N-acetyldopamine (NADA) sclerotins. NADA sclerotins are colourless and likely to be present on the white wings. This analysis of pigmentation is therefore consistent with a role for yellow family genes in regulating the colour polymorphism.

Discussion

Hindwing colouration of male Arctia plantaginis is polymorphic and these colour morphs vary in multiple behavioural and life-history traits, providing an example of a complex polymorphism. Here, we have shown that variation in male hindwing colour is associated with a duplicated sequence found only in white morphs and containing a gene from the yellow gene family. The white-specific copy, valkea, is highly expressed during pupal development, consistent with genetic dominance of the white allele. When valkea is knocked out in the white morphs, yellow pigment is produced, although due to the similarity of the sequences, yellow-e was also edited. While we cannot confirm that valkea is solely responsible for the white/yellow switch, we can rule out the role of other yellow family genes found along the same chromosome (b, d2, h, and g2).

These results add to the increasing evidence for the role of gene duplications in the evolution of adaptive genetic variation. Genes for the metabolism of proteins in Heliconius butterflies underwent several duplications, facilitating changes in diet and adaptation to pollen feeding (Smith et al., 2016). In Zerene cesonia butterflies, recent partial duplications of the transcription factor doublesex, resulting from multiple duplication events, are associated with sex-specific wing patterning. The duplicated paralog acts as a repressor of genes producing UV-reflecting wing scales in females (Rodriguez-Caro et al., 2021).

We hypothesise that the morph-specific duplication that we see in A. plantaginis provides a region of reduced recombination between morphs, as the duplicated region is effectively hemizygous and cannot recombine except in homozygote genotypes, which could contribute to the maintenance of the complex polymorphism and the linkage of multiple traits. This is similar to the genetic architecture of the Primula supergene controlling heterostyly, which involves a large duplication containing five genes (Huu et al., 2020). In polymorphic Papilio dardanus, one colour pattern morph is associated with a duplicated region, again providing physical constraints on recombination (Timmermans et al., 2014). Nonetheless, in the case of the wood tiger moth, it remains unclear how a single gene, such as valkea, can control the development of a broad array of phenotypic traits.

One possible mechanism is that there is a regulatory element along the scaffold which is controlling colour via the valkea gene, but also regulating other genes to control different phenotypic traits. We found the most significant markers and SNPs located in a non-coding region close to the yellow genes, which likely contains a cis-regulatory element (CRE) controlling transcription of valkea. In cichlids, for example, a CRE at the gene encoding agouti-related peptide 2 controls variation in stripe patterning in two closely related species (Kratochwil et al., 2018). Conserved CREs were shown to have wide-ranging effects on wing patterning in multiple Nymphalidae butterflies (Mazo-Vargas et al., 2022).

Differential expression of other genes on the same chromosome controlled by the CRE could explain variation in covarying traits. The overexpression of another gene, possibly encoding a zinc finger transcription factor, in yellow individuals in the early pupal stages suggests that there is differential expression of unlinked genes as a result of the polymorphism, although since this gene is on a different chromosome to valkea it is unlikely to be directly controlled by the CRE. Another hypothesis is that valkea itself regulates other genes. However, yellow family genes are not known to regulate transcription of other genes, unlike, for example, doublesex, which undergoes alternative splicing and controls female mimetic wing pattern polymorphism in Papilio polytes (Kunte et al., 2014; Nishikawa et al., 2015).

The yellow family genes are highly conserved throughout insects (Ferguson et al., 2011). They have been widely linked to colouration (Wittkopp et al., 2002; Miyazaki et al., 2014; Zhang et al., 2017a; Zhang et al., 2017b), as well as behaviour, sex-specific phenotypes, and reproductive maturation (Wilson et al., 1976). These genes share a common origin with the major royal jelly protein (MRJP) genes (Drapeau et al., 2006) which are crucial in caste development in honeybees. Like the MRJP genes, yellow genes in honeybees have diverse spatial and temporal expression patterns. As our focus in A. plantaginis has been on wing tissue, we are missing expression of genes in other tissues that could be linked to other traits. Thus, it is not impossible to imagine that a yellow gene could have a similar function to an MRJP in regulating the development of a complex phenotype. Recent work with Bicyclus anynana showed that yellow functions as a repressor of male courtship (Connahs, 2022). On the other hand, sex-specific behavioural phenotypes of yellow mutants in Drosophila were found to be due to pigmentation effects (Massey et al., 2019), so more evidence is needed to suggest a functional role for yellow genes outside of pigmentation.

The duplication of yellow-e and surrounding regions in the white morphs suggests that the yellow morph is the ancestral form. Valkea could have evolved in a stepwise fashion, first as a tandem duplication then with a stop codon mutation altering the gene structure. Gene duplications can facilitate adaptation and, in some examples, lead to polymorphism. The fact that the white allele is dominant also supports the hypothesis that yellow is ancestral. Such invasions of new adaptive alleles are facilitated when the new allele is dominant, as it is then also expressed when heterozygous, an effect known as Haldane’s sieve. Melanism, for example, has repeatedly evolved in mammals due to dominant and semidominant mutations in the Mc1r locus which have become fixed (Hoekstra, 2006).

Valkea could represent an example of neofunctionalisation, where the duplicated gene gains a different function to the original gene copy. In the CRISPR mutants, both forewings and hindwings became yellow, and thus we hypothesise that valkea is controlling hindwing colour while yellow-e controls forewing colour. Since we do not expect valkea to have an effect on female wing colour, the change in forewing colour in the female could be attributed to the yellow-e knockout, although knockouts of yellow-e only are needed to confirm this. Those with the mutant phenotype showed only small deletions or insertions around the target site. By combining multiple guides we may expect to see larger deletions (Mazo-Vargas et al., 2022); however, none of the eggs that were injected with more than one guide survived past the larval stage, suggesting that large deletions in yellow genes reduce fitness.

Contrary to previous work that attributed red and yellow colours to pterins in another tiger moth species (Gawne and Frederik Nijhout, 2019), we found high levels of 4-AHPEA in the yellow wings, confirming the presence of pheomelanins. These pigments have been widely associated with red and yellow colours in mammals (e.g. McGraw and Wakamatsu, 2004), but have only relatively recently been described in insects and are likely to be more widespread than previously thought. Yellow colours can also be produced by NBAD sclerotins, which are sclerotising precursor molecules made from dopamine. These have an important role in the sclerotisation pathway for hardening the insect cuticle (Andersen, 2007; Barek et al., 2017) before becoming involved in melanisation (Barek et al., 2018). Thus, we suggest that the yellow colour arises partly from the NBAD sclerotins and partly from the presence of pheomelanin pigments, as has been proposed in other Lepidoptera (Matsuoka and Monteiro, 2018). While some 4-AHPEA also occurred in white wings, this may be due to its role in production of cross-linking cuticular proteins and chitin during sclerotisation (Sugumaran, 2010). Upregulation of genes on the white allele could be acting as a repressor of the generation of yellow colour. If we may speculate, valkea could impact the catalysis of dopamine, having cascading effects down the pathway resulting in the lack of yellow pigmentation. We suspect that yellow family genes play multiple roles within the melanin production pathway. In the wood tiger moth, yellow affects the conversion of DOPA into black dopamine melanin (Galarza, 2021). Yellow-e in particular has been linked to larval colouration in B. mori (Ito et al., 2010) and adult colour in beetles (Wang et al., 2022), while another gene, yellow-f, has a role in eumelanin production (Barek et al., 2018).

In summary, we identified a structural variant which is only present in white morphs of A. plantaginis. This region contains a previously undescribed gene, valkea, which when knocked out results in yellow wings. The presence of a regulatory element controlling wing colour and other traits via multiple downstream effects could explain how multiple traits are linked to wing colouration. This complex polymorphism allows multiple beneficial phenotypes to be inherited together, whereas recombination would separate multiple loci leading to maladapted individuals. Our results provide the basis for further exploration of the genetic basis of covarying behavioural and life-history traits, and offer an intriguing example for the role of gene duplications in adaptive variation.

Methods

Sampling

Homozygous lines of white (WW) and yellow (yy) A. plantaginis moths were created from Finnish populations at the University of Jyväskylä, Finland. Larvae were fed with wild dandelion (Taraxacum sp.) and reared under natural light conditions, with an average day temperature of 25°C and night temperature between 15 and 20°C. For the crosses, a heterozygous male, created from crossing a heterozygous male with a homozygous yy female, was backcrossed with a yy female. This was repeated to obtain four families totalling 172 offspring and 8 parents (Supplementary file 1E). Samples from wild populations were caught in Southern Finland (n = 10) and Central Finland (n = 20), where male morphs are either white or yellow, Estonia (n = 4), where males are mostly white, and Scotland (n = 4), where males are yellow (Supplementary file 1F). In addition, we included eight samples which are F1 offspring of wild Scottish samples. Forty pupae with known genotypes from lab populations (20 WW and 20 yy) were used for the RNA extractions.

DNA extraction and sequencing

For the lab crosses, DNA was extracted from two legs crushed with sterilised PVC pestles using a QIAGEN DNeasy Blood & Tissue kit, following the manufacturer’s instructions. Library preparation and GBS sequencing were performed by BGI Genomics on an Illumina HiSeq X Ten. For the wild samples, DNA was extracted from the thoraces also with a QIAGEN kit. Library preparation and sequencing were performed by Novogene (Hong Kong, China). 150 bp paired-end reads were sequenced on an Illumina NovaSeq 6000 platform.

Linkage mapping analysis

FASTQ reads were mapped using bowtie v2.3.2 (Langmead and Salzberg, 2012) to the yellow A. plantaginis scaffold-level genome assembly (Yen et al., 2020). BAM files were sorted and indexed using SAMtools v.1.9 (Li et al., 2009) and duplicates removed using PicardTools MarkDuplicates (RRID:SCR_006525). Twelve samples which had aligned <30% were removed. Reads of the remaining samples had an average alignment of 94%. SNPs were called using SAMtools mpileup with minimum mapping quality set to 20 and bcftools call function. Lep-MAP3 (Rastas, 2017) was used for linkage map construction and we ran the following modules: ParentCall2 which called 105,622 markers, Filtering2, SeparateChromosomes2 with lodLimit = 5 and sizeLimit = 100, JoinSingles2All and OrderMarkers2 with recombination2 = 0 to denote the lack of female recombination. Genotypes were phased using the map2genotypes.awk script included with Lep-MAP3. Markers were named based on the genomic positions of the SNPs in the reference genome and the map.awk script, and this was used to further order the markers within the linkage groups. This resolved 30 linkage groups. Although we expect that there are 31 chromosomes in the moth genome, we suspect that the sex chromosome is missing in this assembly as the yy individual used in the genome assembly was female (Yen et al., 2020). A small number of markers which caused long gaps at the beginning or end of linkage groups were manually removed, leaving the final map 948.7 cM long with 19,803 markers. Markers were well distributed so we began the first analyses with this map. A linkage map was also assembled using sequences aligned to the white reference and this separated into 31 linkage groups.

The QTL analysis was carried out in R/qtl (Broman et al., 2003). Genotype probabilities were calculated before running a genome scan using the scanone function with the Haley–Knott method and binary model parameters, and including family as an additive covariate. The phenotype was labelled as either 0 (Wy) or 1 (yy). We ran 5000 permutations to determine the significance level for the QTL LOD scores. The bayesint function calculated the 95% Bayesian confidence intervals around the significant marker.
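The logic of the permutation step can be sketched in a few lines. The example below is a simplified illustration, not the R/qtl implementation: it assumes a toy single-marker statistic (the absolute difference in mean phenotype between two genotype classes) in place of a Haley–Knott LOD score, it ignores the family covariate, and the function names are hypothetical. It shows only how the maximum statistic across markers from each phenotype shuffle yields a genome-wide significance threshold.

```python
import random


def association_stat(genotypes, phenotypes):
    """Toy statistic (assumption): absolute difference in mean binary
    phenotype between individuals with genotype 0 and genotype 1."""
    g0 = [p for g, p in zip(genotypes, phenotypes) if g == 0]
    g1 = [p for g, p in zip(genotypes, phenotypes) if g == 1]
    if not g0 or not g1:
        return 0.0
    return abs(sum(g1) / len(g1) - sum(g0) / len(g0))


def permutation_threshold(markers, phenotypes, n_perm=1000, alpha=0.05, seed=1):
    """Shuffle phenotypes n_perm times, record the maximum statistic over all
    markers each time, and return the (1 - alpha) quantile of those maxima."""
    rng = random.Random(seed)
    shuffled = list(phenotypes)
    max_stats = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        max_stats.append(max(association_stat(m, shuffled) for m in markers))
    max_stats.sort()
    index = min(n_perm - 1, int((1 - alpha) * n_perm))
    return max_stats[index]
```

A marker whose observed statistic exceeds the returned threshold would be called significant at the chosen genome-wide level under these simplifying assumptions.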

Analysis of whole-genome sequences

FASTQ reads were mapped to the yellow A. plantaginis genome assembly (Yen et al., 2020) using BWA-MEM v0.7.17 (Li, 2013). As before, BAM files were sorted and indexed, and duplicates were removed. Genotyping and variant calling were carried out with the Genome Analysis Toolkit (GATK) (McKenna et al., 2010). Variants were called using HaplotypeCaller (v.3.7) in GVCF mode, and the gVCFs were then combined with GenomicsDBImport (v.4.0). Joint genotyping was run with GenotypeGVCFs, set with a heterozygosity of 0.01, and SNPs were extracted using SelectVariants. Finally, the set of 20,787,772 raw SNPs was filtered using VariantFiltration with the following thresholds: quality by depth (QD > 2.0), root mean square mapping quality (MQ > 50.0), mapping quality rank sum test (MQRankSum > −12.5), read position rank sum test (ReadPosRankSum > −8.0), Fisher strand bias (FS < 60.0), and strand odds ratio (SOR < 3.0). A set of 5,227,288 SNPs passed the filtering.
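The pass criteria listed above amount to a conjunction of simple comparisons. The following minimal sketch expresses them as a Python predicate; it is not GATK code, the function and dictionary names are hypothetical, and treating a missing annotation as passing is an assumption made here for illustration only.

```python
# Pass conditions quoted in the text: a SNP is kept only if all of them hold.
PASS_CONDITIONS = {
    "QD": lambda v: v > 2.0,
    "MQ": lambda v: v > 50.0,
    "MQRankSum": lambda v: v > -12.5,
    "ReadPosRankSum": lambda v: v > -8.0,
    "FS": lambda v: v < 60.0,
    "SOR": lambda v: v < 3.0,
}


def passes_hard_filters(annotations):
    """Return True if every annotation present satisfies its pass condition.

    `annotations` maps annotation names to numeric values for one SNP.
    Annotations absent from the dict are skipped (an assumption of this sketch).
    """
    return all(
        check(annotations[name])
        for name, check in PASS_CONDITIONS.items()
        if name in annotations
    )


example_snp = {"QD": 18.4, "MQ": 59.7, "MQRankSum": 0.1,
               "ReadPosRankSum": -0.3, "FS": 1.2, "SOR": 0.8}
print(passes_hard_filters(example_snp))
```

The example values are invented for demonstration and are not taken from the data set.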

We carried out a GWAS using the R package GenABEL v.1.8 (Aulchenko et al., 2007). The set of filtered SNPs was converted to BED format with PLINK2 (https://www.cog-genomics.org/plink/2.0/), keeping only biallelic SNPs. Sites which were not in Hardy–Weinberg equilibrium (p<0.01), or had a call rate of <0.5, were excluded. Following this, 381,266 sites were retained across 40 individuals (out of 57). To account for population stratification, we performed multidimensional scaling (MDS) on kinship and identity-by-state (IBS) information estimated from the data, and included this as a covariate in the association test. Significance levels were calculated using Bonferroni corrected thresholds to account for multiple testing. Central and Southern Finnish populations were pooled for this analysis, based on a previous principal component analysis of these samples (Yen et al., 2020).
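As a worked illustration of the Bonferroni correction, the sketch below computes the per-site threshold for the 381,266 retained sites. The family-wise error rate of 0.05 is an assumption, as the nominal alpha is not stated in the text, and the function name is hypothetical.

```python
import math


def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test p-value threshold that controls the family-wise error rate
    at `alpha` (alpha = 0.05 is assumed here, not taken from the paper)."""
    return alpha / n_tests


n_sites = 381_266  # sites retained after filtering, from the text
threshold = bonferroni_threshold(n_sites)
neg_log10_threshold = -math.log10(threshold)  # scale used on Manhattan plots
print(threshold, neg_log10_threshold)
```

Under this assumption the threshold is on the order of 1.3 × 10⁻⁷, that is, a −log10(p) a little below 7.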

In Yen et al., 2020, many of these samples were processed in the same way but aligned to the white genome assembly. Read depth of the W-mapped samples was calculated in 1 kb windows across the candidate region using BEDtools (v.2.20.1) multicov (Quinlan and Hall, 2010). For visualisation, lines were smoothed using LOESS and span = 0.01 within ggplot2.
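The windowing idea can be illustrated as follows. Note that BEDtools multicov reports the number of alignments overlapping each interval, whereas this sketch averages per-base depth within each 1 kb window; it is therefore a related but different calculation, shown under that stated assumption and with a hypothetical function name.

```python
from collections import defaultdict


def mean_depth_per_window(per_base_depth, window=1000):
    """Average per-base depth in consecutive fixed-size windows.

    `per_base_depth` is an iterable of (position, depth) pairs with 1-based
    positions. Positions not listed are treated as depth 0. Returns a dict
    mapping each window's 1-based start coordinate to its mean depth.
    """
    totals = defaultdict(int)
    for position, depth in per_base_depth:
        start = ((position - 1) // window) * window + 1
        totals[start] += depth
    return {start: total / window for start, total in sorted(totals.items())}
```

A roughly doubled mean depth across a run of adjacent windows in one morph relative to the other would be the kind of signal expected from a duplication, although smoothing and normalisation choices affect how this appears.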

For analysis of structural variants, sequences from the white and yellow genome assemblies were aligned using MAFFT v7.450 (Katoh and Standley, 2013) and viewed with Geneious. Our focal sequence, scaffold 419 in the white genome, is the reverse complement of scaffold 206 in the yellow genome. Figure 2C was plotted with pafr (Winter et al., 2020).

Identification of candidate genes and tree construction

To identify candidate genes in the QTL interval and GWAS region, we ran a protein BLASTP v.2.4.0 search to identify H. melpomene (Hmel2.5) proteins homologous to predicted A. plantaginis proteins in the region from the genome annotation. Informative gene names were obtained by performing a BLASTP search with the H. melpomene proteins against all D. melanogaster proteins in FlyBase v.FB2020_01 (flybase.org/blast).

For the yellow gene tree, we used Lepbase (Challis et al., 2016) to search for yellow genes in B. mori (ASM15162v1). We identified yellow-e in H. melpomene by searching for major royal jelly proteins, then comparing protein sequences of these against Drosophila proteins in FlyBase. The sequence for Dmel yellow-e was downloaded from FlyBase. To make the tree, coding sequences of all genes were aligned in Geneious using MAFFT v7.450 (Katoh and Standley, 2013), then the tree was constructed with PhyML using 10 bootstraps (Guindon et al., 2010).

Differential gene expression

We dissected the wings out of the pupae in Cambridge, UK. Pupae and larvae were sent to Cambridge from Jyväskylä and were kept between 22 and 30°C. Pupae were sexed and only males were used. Dissections were made at four different stages: 72 hr after pupation (72 hr), 5 d after pupation (5 d; counting the first 0–24 hr after pupation as day 1), pre-melanin deposition (Pre-mel), and post-melanin deposition (Mel). We sampled five individuals per stage and genotype. Hindwings and forewings were stored separately in RNA-later (Sigma-Aldrich) at 4°C for 2 wk and later transferred to –20°C, while the rest of the body was stored in pure ethanol. Only hindwings were used for RNAseq analysis.

Total RNA was extracted from hindwing tissue using a standard hybrid protocol. First, we transferred the wing tissue into Trizol Reagent (Invitrogen) and homogenised it using dounce tissue grinders (Sigma-Aldrich). Then, we performed a chloroform phase extraction, followed by DNase treatment (Ambion) for 30 min at 37°C. We measured the concentration of total RNA using Qubit Fluorometric Quantitation (Thermo Fisher) and performed a quality check using an Agilent 4200 TapeStation (Agilent). The extracted total RNA was stored at –20°C before being sent to Novogene UK for sequencing. Each individual was sequenced separately, with a total of 40 individual samples sequenced (five individuals per stage and genotype).

We performed quality control and low-quality base and adapter trimming of the sequence data using TrimGalore! We then mapped the trimmed reads to the two A. plantaginis genomes using STAR (Dobin et al., 2013). We performed a second round of mapping (2pass) including as input the output splice junctions from the first round. The A. plantaginis genome annotations WW and YY were included in each round of mapping respectively. We then used FeatureCounts to count the mapped reads. Finally, we used DESeq2 to analyse the counts and perform the DE analysis.

To identify the gene or genes controlling the development of wing colour in A. plantaginis, we performed a genome-wide differential expression analysis using limma-voom (Ritchie et al., 2015). First, we defined a categorical variable, ‘GenStage’, with eight levels containing the genotype and stage information of every individual sample (e.g. YY72h, WW72h, YY5days, etc.). Then, we built the design matrix fitting a model with GenStage as the only fixed effect factor contributing to the variance in gene expression and included family as a random effect factor (gene expression ~ 0 + GenStage + (1|Family)). We then filtered lowly expressed genes using the filterByExpr function in edgeR, which reduced the number of tested genes from 17,615 to 11,330 in the Y-mapped analysis and from 17,930 to 10,920 in the W-mapped one. Then, we normalised the expression of the genes using the calcNormFactors function with TMM normalisation in edgeR and fit the design matrix using the voom function in limma. We built a contrast matrix including the comparisons of interest, in which we compared the expression of the genotypes in each stage (i.e. h72 = WW72h-YY72h, d5 = WWd5-YYd5, Premel = WWPremel-YYPremel, Mel = WWMel-YYMel), and fit the contrast matrix to the data using the contrasts.fit function. Finally, we used the eBayes function on the fitted model and extracted the list of genes that are differentially expressed in each stage, using the Benjamini–Hochberg procedure to correct for multiple testing. We evaluated genome-wide gene expression with multidimensional scaling (MDS) using the plotMDS function of the limma package.
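To make the structure of the contrast matrix concrete, the sketch below builds it in plain Python rather than with limma. The level names follow the contrast definitions given above; the helper function is hypothetical and only illustrates that each contrast assigns +1 to the WW level and −1 to the YY level of the same stage.

```python
GENOTYPES = ("WW", "YY")
STAGES = ("72h", "d5", "Premel", "Mel")

# The eight GenStage levels, e.g. "WW72h", "YY72h", "WWd5", ...
levels = [f"{genotype}{stage}" for stage in STAGES for genotype in GENOTYPES]

# Contrast name -> (level with coefficient +1, level with coefficient -1)
contrasts = {
    "h72": ("WW72h", "YY72h"),
    "d5": ("WWd5", "YYd5"),
    "Premel": ("WWPremel", "YYPremel"),
    "Mel": ("WWMel", "YYMel"),
}


def build_contrast_matrix(levels, contrasts):
    """Return {contrast name: list of coefficients, one per level}."""
    matrix = {}
    for name, (plus, minus) in contrasts.items():
        column = [0] * len(levels)
        column[levels.index(plus)] = 1
        column[levels.index(minus)] = -1
        matrix[name] = column
    return matrix


contrast_matrix = build_contrast_matrix(levels, contrasts)
for name, column in contrast_matrix.items():
    print(name, column)
```

Each column sums to zero, so a contrast estimates the WW minus YY difference in expression within a single stage.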

Orthology assignment

To infer genome-wide orthology between A. plantaginis and D. melanogaster, we used OrthoFinder (v2.5.4) (Emms and Kelly, 2019). We used proteomes from six Lepidoptera species, Plutella xylostella (GCA_905116875_2), B. mori (GCF_014905235_1), Spodoptera frugiperda (GCF_011064685_1), Parnassius apollo (GCA_907164705_1), Pieris macdunnoughi (GCA_905332375_1), Pararge aegeria (GCF_905163445_1), and D. melanogaster (GCF_000001215_4). We ran the primary_transcript.py utility from OrthoFinder to extract only one transcript per protein, and then ran OrthoFinder with default settings.

CRISPR/Cas9 genome editing

Guide RNAs were designed within the first three exons of valkea in the white genome annotation using Geneious (v. 2022.1.1). Guides were chosen with minimal off-target effects, high activity scores, and high specificity scores based on the Geneious algorithm (Supplementary file 1G). Guides in the first two exons of valkea showed off-target sites in yellow-e; however, those in exon 3 showed no off-target sites. Guide RNAs were synthesised by Sigma-Aldrich. Moths from the greenhouse populations, originating from Finnish and Estonian populations, at the University of Helsinki, Finland, were genotyped using DNA extracted from leg tissue using the Chemagic DNA tissue kit (Chemagen) and the primers detailed below. They were paired and left to mate overnight. Females were watched over the next 3–4 d for egg laying, and the eggs were removed and injected less than 6 hr after laying. Eggs were glued to microscope slides and injected with a 1:1 sgRNA/Cas9 mix with phenol red dye using pulled borosilicate glass capillaries. The injection mix contained 1 µg/µl Cas9, 500 ng/µl sgRNA, and 0.5% phenol red. Guides and Cas9 were diluted using low concentration TE buffer. Different combinations of guides were also injected in some samples, in which case these were mixed in a 1:1 ratio. Injections were performed using a MPPI-3 pressure injector with back pressure unit (ASI). In total, we injected 1223 eggs from 18 [WW × WW] or [WW × Wy] crosses. Larvae were kept individually in Petri dishes and fed daily with dandelion leaves. After eclosion, legs were taken from adults for DNA extraction. Library preparation and whole-genome sequencing (using Illumina NovaSeq 6000) of five CRISPR mutants were performed by CeGaT (Tübingen, Germany). We used these whole-genome sequences to confirm the editing of the valkea gene in the mutants. Sequences were aligned to the white reference genome using BWA-MEM as detailed earlier. We visualised the valkea gene sequences using Geneious and looked for insertions and deletions within and around the locations of the guides. This was repeated for the other yellow family genes (b, c, d2, e, f, f2, g2, h, and yellow).

Genotyping white and yellow alleles

We used Primer3 to design primers within the duplicated region. Primers were expected to only amplify in WW and Wy individuals. The alignment of the white and yellow sequences was then used to design primers for genotyping the locus (Supplementary file 1B). We looked for short insertions or deletions that were fixed between the WW and yy within the valkea/yellow-e region, and put primers around these structural variants. Primers were tested on DNA extractions from moths of known genotypes, including both sexes, wild and lab samples. We used Sanger sequencing of the PCR product to confirm the correct sequences were amplified. A set of primers successfully amplified a 449 bp region downstream of valkea within the duplication. This amplified in WW and Wy samples, but not in yy (Figure 2—figure supplement 4—source data 1). We found that white alleles have a 35 bp deletion within an intron of the yellow-e gene. We amplified a 163 bp region around this (YY_tarseq_206_arrow:9,846,212–9,846,375) using a standard PCR protocol which allowed us to identify the allele based on the size of the PCR product. Yellow alleles produce the full 163 bp sequence, while white alleles produce a smaller 128 bp product (Figure 2—figure supplement 4). Heterozygotes have a copy of each and show both bands on a gel.
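The size-based genotype call described above can be written as a small decision rule. This is an illustrative sketch only: the function name and the ±10 bp tolerance for reading band sizes from a gel are assumptions, chosen so that the 163 bp and 128 bp products cannot be confused.

```python
YELLOW_PRODUCT_BP = 163  # full-length product from the yellow allele
WHITE_PRODUCT_BP = 128   # product from the white allele (35 bp deletion)


def call_genotype(band_sizes_bp, tolerance_bp=10):
    """Infer the genotype from the PCR band sizes observed for one sample.

    Returns "WW", "Wy", "yy", or None if neither expected band is present.
    The tolerance (an assumption) allows for imprecise size estimates.
    """
    has_yellow = any(abs(b - YELLOW_PRODUCT_BP) <= tolerance_bp for b in band_sizes_bp)
    has_white = any(abs(b - WHITE_PRODUCT_BP) <= tolerance_bp for b in band_sizes_bp)
    if has_white and has_yellow:
        return "Wy"
    if has_white:
        return "WW"
    if has_yellow:
        return "yy"
    return None


print(call_genotype([128, 163]))
```

A sample returning None would need to be re-amplified rather than assigned a genotype.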

Photography and spectrophotometry

Photographs of CRISPR mutant and wildtype moths were taken under standard lighting conditions with a Samsung NX1000 digital camera fitted with a Nikon 80 mm lens; the camera had been converted to full-spectrum with no quartz filter to enable ultraviolet (UV) sensitivity. A UV and infrared blocking filter (Baader UV/IR Cut Filter), which transmits wavelengths between 400 and 680 nm, was used for the human-visible photos. For the UV images, a UV pass filter was used (Baader U filter), which transmits wavelengths between 320 and 380 nm. Images were standardised using grey-scale reflectance standards (Avian Technologies, Micro FSS08).

Reflectance spectra of coloured regions of the forewings and hindwings of 22 lab stock (including WW, Wy, and yy genotypes) and five CRISPR mutant moths were recorded with a UV-VIS spectrometer (Ocean Insight HR4PRO) connected to a xenon light source (Ocean Insight PX-2). Measurements were normalised using a diffuse white standard (Spectralon 99%). We used the OceanView software (v.2.0.8) to record scans with a boxcar width of 5 and integration time of 5000 ms. Measurements were repeated three times and the mean used. Reflectance spectra were plotted and analysed using the R package pavo (Maia et al., 2013).

Pigment analysis

Solubility and fluorescence tests

Five hindwings from each morph were placed in two separate solvents (0.1 M NaOH and 90% MeOH) and left overnight. The supernatant was analysed with an Agilent Cary 8454 UV-Visible spectrophotometer and the spectra compared to known spectra for various pigments. A UV lamp (Philips TL8W/08F8T5/BLB) was used to test for fluorescence on the wings. The presence of carotenoids was tested by placing wings into 1 ml of pyridine and leaving at 95°C for 4 hr (McGraw et al., 2002). To these we added 1 ml of 1:1 hexane:tert-butyl methyl ether and 2 ml of water before shaking and leaving overnight. Again, the supernatant was measured with the spectrometer.

HPLC test for eumelanin and ommochrome pigments

To determine the type of melanin producing the black colour on the wings, we cut out approximately 5 mg of the black sections of the wings, from both females and males. Eumelanin analysis was carried out according to Borges et al., 2001. Each sample was added to a tube containing 820 μl 0.5 M NaOH, 80 μl 3% H2O2 and an internal standard (48 nmol phthalic acid) and heated in a boiling water bath for 20 min. Once cool, 20 μl of 10% Na2SO3 and 250 μl of 6 M HCl were added. Samples were then extracted twice with 7 ml of ethyl acetate. Ethyl acetate was dried at 45°C under a stream of nitrogen. The residue was dissolved into 0.5 ml of 0.1% formic acid.

We carried out HPLC on an Agilent 1100 HPLC. 20 μl of the sample was injected into a Waters Atlantis T3, 100 × 3.0 mm i.d. analytical column (Waters, Milford, MA). The column was set to 25°C and analytes were detected at a wavelength of 280 nm. The HPLC mobile phase consisted of two eluents: (A) UHQ-water/MeOH (98/2; v/v) with 0.1% formic acid and (B) UHQ-water/MeOH (40/60; v/v) with 0.1% formic acid. Flow rate was 0.4 ml/min. The gradient started with 100% of eluent A, ramped evenly from 0 to 15 min to 40:60 (A:B; v/v), was held at 40:60 for 6 min, and ramped evenly back to the initial eluent composition (100% A) over 5 min. We compared chromatograms obtained from the samples to the chromatograms obtained from synthetic melanin, ink from Sepia officinalis, and black human hair.
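The gradient programme for the eumelanin runs can be expressed as a piecewise-linear function of time. The sketch below is an illustration under the assumption that the second-listed eluent is eluent B; the function name is hypothetical and the code plays no part in instrument control.

```python
def percent_eluent_b(time_min):
    """Percentage of eluent B at a given time (min) in the eumelanin gradient.

    0-15 min: linear ramp from 0% to 60% B
    15-21 min: hold at 60% B
    21-26 min: linear ramp back to 0% B (100% A)
    """
    if time_min < 0:
        raise ValueError("time must be non-negative")
    if time_min <= 15:
        return 60.0 * time_min / 15
    if time_min <= 21:
        return 60.0
    if time_min <= 26:
        return 60.0 * (26 - time_min) / 5
    return 0.0


for t in (0, 7.5, 15, 18, 23.5, 26):
    print(t, percent_eluent_b(t))
```

The percentage of eluent A at any time is simply 100 minus the returned value.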

HPLC was also used to test for the possible presence of ommochrome pigments. Injection volume was 10 µl and for the separation we used the same Waters Atlantis T3 column (100 × 3.0 mm i.d.) set to 30°C. Solvent A was UHQ-water and B was acetonitrile (ACN), both containing 0.1% formic acid. Flow rate was 0.4 ml/min and the gradient was as follows: the initial ratio was 98/2 water/ACN (v/v), ramping evenly from 1 to 15 min to 30:70 water:ACN (v/v), held for 1.5 min, and then ramped evenly back to the initial eluent composition over 0.5 min. The column was stabilised for 7 min before a new run.

Pheomelanin analysis

Samples were analysed for pheomelanin content according to the method of Kolb et al., 1997 with modifications. A 2 mg sample was placed in a screw-capped tube with 100 μl water, 500 μl ~55–58% hydrogen iodide (HI), and 20 μl 50% hypophosphorous acid (H3PO2). Samples were capped tightly and hydrolysed for 20 hr at 130°C. After cooling, samples were evaporated under nitrogen flow, then dissolved in 1 ml of 0.1 M HCl and purified with solid-phase extraction. Strata SCX cartridges were preconditioned with 2 ml of methanol, 3 ml of water, and 1 ml of 0.1 M HCl. The sample was then applied to the cartridge, washed with 1 ml of 0.1 M HCl, and finally eluted with 1 ml of methanol (MeOH): 0.5 M ammonium acetate (NH4CH3CO2) (20:80 v/v).

Hydrogen iodide hydrolysis products were determined by a Dionex HPLC equipped with pulsed amperometric detection (HPLC/PAD). Separation was carried out on a Phenomenex Kinetex C18 column (150 × 4.6 mm i.d.; 5 µm particle size) with a gradient elution (Supplementary file 1H) at a flow rate of 0.9 ml min−1, using the eluents (A) sodium citrate buffer (Hines et al., 2017) in ultra-high-quality water (internal resistance ≥ 18.2 MΩ cm; Milli-Q Plus; Millipore, Bedford, MA) and (B) methanol. Detection used a Dionex ED-50 pulsed amperometric detector (Dionex, Sunnyvale, CA) equipped with a disposable working electrode, applying Dionex waveform A with the potentials presented in Supplementary file 1I. The preparation method for the 4-AHP, 3-AHPEA, and 4-AHPEA standards used in calibration is described in Wakamatsu et al., 2014.

Data availability

Scripts and data for the QTL, GWAS and DE analyses can be found at doi: https://doi.org/10.5281/zenodo.8208751. RADseq, RNAseq data, and WGS of CRISPR samples were deposited to SRA under study accession number PRJNA937225. Raw sequencing data of wild samples has previously been deposited in ENA, study accession No. PRJEB36595.

The following data sets were generated
    1. Brien MN
    2. Orteu A
    (2022) Zenodo
    Colour polymorphism associated with a gene duplication in male wood tiger moths.
    https://doi.org/10.5281/zenodo.8208751
    1. Brien MN
    2. Orteu A
    (2022) NCBI BioProject
    ID PRJNA937225. Arctia plantaginis Raw sequence reads.
The following previously published data sets were used
    1. Yen EC
    (2020) EBI European Nucleotide Archive
    ID PRJEB36595. A haplotype-resolved, de novo genome assembly for the wood tiger moth (Arctia plantaginis) through trio binning.

References

  1. Cott HB (1940) Adaptive Coloration in Animals. Methuen.
  2. Endler JA (1988) Frequency-dependent predation, crypsis and aposematic coloration. Philosophical Transactions of the Royal Society of London. B, Biological Sciences 319:505–523. https://doi.org/10.1098/rstb.1988.0062
  3. Kayser H (1985) Pigments. In: Kerkut GA, Gilbert LI, editors. Comparative Insect Physiology, Biochemistry, and Pharmacology. Oxford: Pergamon Press. pp. 367–415.

Decision letter

  1. Patrícia Beldade
    Reviewing Editor; University of Lisbon, Portugal
  2. Christian R Landry
    Senior Editor; Université Laval, Canada

Our editorial process produces two outputs: (i) public reviews designed to be posted alongside the preprint for the benefit of readers; (ii) feedback on the manuscript for the authors, including requests for revisions, shown below. We also include an acceptance summary that explains what the editors found interesting or important about the work.

Decision letter after peer review:

Thank you for submitting your article "Colour polymorphism associated with a gene duplication in male wood tiger moths" for consideration by eLife. Your article has been reviewed by three peer reviewers, and the evaluation has been overseen by a Reviewing Editor and Christian Landry as the Senior Editor. The reviewers have opted to remain anonymous.

The reviewers provided very thorough individual reviews and have also validated each other's comments. The Reviewing Editor has subsequently drafted this to help you prepare a revised submission.

Essential revisions:

1) To frame it in terms of "supergene entailing reduced recombination", the work requires quantification of "lower recombination" within the duplicated segment, and more detailed characterization of the 5' end of that segment. Alternatively, claims of "supergene"-like behavior should be explicitly stated as a hypothesis. In terms of "supergene" pleiotropic effects, it seems that the association between duplication and polymorphism is shown directly only for pigmentation, and not any other phenotypes that covary with that. The association to other traits should also be presented as hypothetical.

2) Definite proof that valkea, and not something else in the duplicated region (e.g. regulatory sequence responsible for expression differences between morphs for other genes in that linkage group), is responsible for the white phenotype requires functional analysis. Possibly, the more accessible type of analysis would involve using CRISPR-Cas9 to knock-out valkea from a white morph background. That being impossible, showing spatial patterns of valkea (and other genes in that linkage group?) expression (e.g. using in situ hybridization) in developing wings of the white morph would at least already associate valkea to that specific region of the wing and add support to it being involved in the COLOR (not scale maturation, for example) polymorphism.

3) Provide more details on the methods, including making replication and data structure clearer in the gene expression analysis (and plotting actual data points in Figure 2B).

Reviewer #1 (Recommendations for the authors):

Line:125-126. "likely to be mapping errors". What do the authors mean by 'mapping errors'? Greater specificity is needed. Importantly, I would like to see some attempt to document what you think is going on. If you filter your mapping using MAPQ > 30, when mapping across to the entire genome, does this region lose more reads in the yellow samples than what you show in Figure 2? Do all of the white individuals show this higher coverage, compared to yellow? It is not clear in Figure 2 if the read depth here is the total for all of the individuals in your collection. Did you look at other regional samples that you sequenced?

The genomic region flanking valkea is not very well characterized in the manuscript. Figure 2 is only showing a cartoon, while there are perfectly good methods for aligning these two regions and showing computationally inferred orthology for this region. More specifically, while the downstream region of yellow-g, yellow-e both look orthologous, the upstream region appears to have different loci (ie. jg6744, jg1307). This suggests that this simplified cartoon is masking a lot more complexity, and I am asking for that to be presented clearly and empirically, as this is currently … glossed over/ignored in the relevant section of the results (lines:112-126).

Lines:150-151: I can understand your reasoning, but this is because I understand quite a bit about the temporal dynamics of color deposition in Lepidoptera wings. Most readers will not. Please provide more of your reasoning here, in terms of thinking that this color change is not due to patterning genes (though nearly all, or all?, aforementioned genes associated with Lep wing color changes are not color biosynthesis genes, but regulatory/patterning genes). So, your logical step here is quite a departure from the literature; please justify it for the reader.

Gene expression patterns. I greatly appreciate that you provide an overview via a PCA-like plot to see the clustering of your samples. But Figure S3: is this an MDS plot (as per edgeR), or something else? You do not describe how you generated this figure in the methods and that should be clarified. Also, in the relevant main text, lines: 160-161, you make a very qualitative statement, and I can't tell if that's just the authors "eye-balling" the PCA-like plot.

Since the RNAseq analysis was working with WW vs. yy individuals, how do the authors envision the expression threshold of valkea to give rise to a dominant phenotype? Stated another way, if white individuals still arise from Wy males, and in those the expression of valkea is going to be much lower … how do they envision the functioning of their new gene in a heterozygous background giving rise to a binary trait?

Figures. I was surprised to see that none of the figures had a general header before the subpanels were described (i.e. a one-sentence overview). I find this very strange and suggest the authors do this.

Figure 3 could benefit from more clarity. A is I guess a restructuring of all your RNAseq data to only look at differences between the two color morphs only, grouping all tissues together? This was not really clear in the main text and is not clarified here. I assume B is only looking at valkea expression across all time points … but this should be made clear.

Line 179: this analysis is fine, but I am rather unhappy with calling this pooling of all tissues and looking for only morph differences a 'genome-wide analysis of RNAseq', as all of your analyses are looking at RNAseq data mapped to the genome. There is nothing unique here compared to what was done previously, except that tissues are pooled by morph -- but this is not described clearly in the methods (lines: 447-457). Please, revise your methods for greater clarity of your two-step approach, and revise your main text, and figure legends accordingly. Perhaps more importantly, what do you gain by doing this two-step approach? I can see the logic, that even with this type of dev stage grouping, valkea is clearly an outlier. This perspective should be shared with the reader. Having that come before the tissue-specific result works, but currently, you present the tissue-specific, then the pooled tissue, and then the figure panels are in the wrong order … it could be more linear and clear. Please revise.

Topology approach. This section appears rather rushed and should be introduced with greater clarity for the reader. Also, why are you only doing this for such a narrow region of the chromosome? Why are you not doing this for the whole region flanking the valkea insertion region? Where is the actual location of yellow-e in this figure? Again, it brings up the strange part of this manuscript, in that the authors appear again to be avoiding their 5' flanking region of the duplication … why? That should be mirroring this pattern, which would strengthen the message here, but it is not presented. In sum, one can only really appreciate S5 if you can see the larger region, the flanking loci, the repeated patterns, and some proper phylogeny explaining the alternative topologies (as I find the text description alone lacking proper clarity for the topology alternatives). Does this arise due to the low coverage of your individual WGS data?

Recombination. I find it rather strange that you discuss the potential for recombination suppression as a result of the duplication, yet conduct no measures of LD. Why? You have many whole-genome datasets from a sufficient number of individuals for some preliminary analyses at least, to provide quantitative evidence. But, upon closer reading, is this because you have too little depth per individual for this? This brings up the issue that average read depth per individual is not clearly reported, and that needs to be changed in the main text.

Where is the table of the data generated per individual, for RAD and WGS? Their genomic coverage after mapping? In the area of the text where I expected this, I found instead % of reads mapping.. that doesn't convey depth, which conveys accuracy of WGS data … please make a table for these standard metrics common to QTL and GWAS papers.

Reviewer #2 (Recommendations for the authors):

This study truly is a fantastic effort to identify the locus responsible for adaptive color polymorphism in tiger moths. In general, the paper is well-written and the figures communicate the main results quite well. Following are suggestions, concerns, and/or questions I have about the study that I believe could improve the study and paper.

As mentioned in the public review, I have concerns with the hypotheses the authors use to frame the paper. I see this study as a quite well-executed effort to identify the genetic and phenotypic basis of wing color polymorphism in these tiger moths. I do not clearly see how the study was designed to distinguish between the involvement of "large structural variants" versus "single gene mutations". I think this could be addressed through some revisions in the Introduction. Along the same lines, I don't see any need to introduce the concept of supergenes, as I don't see any efforts to directly test if a co-adapted gene complex is involved. Again, this can be addressed through limited text editing.

This study would be greatly strengthened by additional gene expression and/or functional data. Spatial expression data of valkea and yellow-e in developing hindwings could provide critical evidence of these genes being involved in the color pattern differences. Such data has been critical in the implication of other color pattern genes involved in Heliconius and Bicyclus wing development. Even further, functional confirmation through methods such as CRISPR-Cas9 editing has proven to be extremely successful in confirming the role of candidate genes in butterfly wing pattern development (see examples from Heliconius, Bicyclus, Colias, and other butterflies), including successful CRISPR edits of yellow to study gene function in other butterfly species. Recent other studies of butterfly color pattern genetics published in eLife have included such spatial expression data and/or functional data. I remain unconvinced from the tree topology analyses that valkea alone at this locus is involved in generating the color differences, or that valkea acts as the genetic switch for the color polymorphism. To find the results of this study as convincing as those other recent studies, I would need to see comparable evidence.

For the pigment analyses, after the pheomelanin is extracted from yellow wings, do the wings appear white instead of yellow? I would be curious to see an image of what the extracted wings looked like, so I could directly connect the HPLC differences with a change in yellow versus white coloration.

I feel the paper could be strengthened through some integration of the genetic and phenotypic results. The authors have a rich RNA-seq dataset that can be used to characterize clusters and networks of genes expressed in development, and differences between the color morphs. There is also a well-resolved melanin pathway, with some knowledge of specific gene functions from Drosophila and other butterfly studies. In this regard, I feel the authors have missed an opportunity to integrate their gene expression data with their phenotypic data. For instance, what other genes do valkea and yellow-e cluster with (e.g. show correlated expression pattern with) in the RNA-seq data? These clusters would reflect the network of genes that are differently expressed between color morphs. I would be interested in knowing what these genes are and if there are any genes with interesting functions or known to be in developmental pathways that involve yellow genes, or are involved in pigmentation. In the melanic pathway, it could be powerful to visualize where in the pathway the authors propose that valkea may be impacting pheomelanin production. I would urge the authors to revisit Matsuda and Monteiro 2020 as an example of how such data can be integrated to give the reader a clearer and more integrated understanding of how the genetic changes identified may be impacting the phenotype.

I quite like that the authors highlight gene duplication as a structural variant that is largely unable to properly recombine with haplotypes lacking the duplicated region. I would urge the authors to cite other examples where such duplications have been implicated in wing pattern development and adaptive evolution. For example, gene duplicates have been implicated in the adaptive evolution of pollen feeding in Heliconius butterflies (Smith et al. 2020) and sexually dimorphic color pattern development in Zerene butterflies (Rodriguez et al. 2021). This paper has an opportunity to highlight the increasing evidence of recent gene duplications in evolutionary diversification.

The duplicated region at the mapped locus needs to be further resolved. At a minimum, the authors should finely annotate the duplicated region. For instance, are there any TE insertions? Does the entire duplicated region reflect a single recent duplication? Or, are there regions duplicated more than once, and does this region appear to have experienced several instances of unequal crossovers and potential insertion/deletion events? Is the regulatory region (e.g. 5' UTR, etc.) duplicated? Does the regulatory region show elevated divergence relative to the other duplicated regions?

Similarly, further analysis of valkea would strengthen the paper. Does valkea show any evidence of adaptive molecular evolution? Are there non-synonymous substitutions with yellow-e? How old/recent is the gene duplication event?

Further analyses to address these questions could provide further resolution to the evolution and potential role of valkea in the color polymorphism.

Figure 2D. I have some reservations on interpreting the read-coverage as evidence the duplicated region is missing in all yellow samples. For instance, yellow-g shows a similar mapped reads pattern as the region just 3' of valkea in the duplicated region, yet yellow-g is not considered to be within the duplicated region. Are the regions in the duplicated region with high coverage for yellow samples potentially repetitive regions of the genome, such as TEs? If so, an annotation of this region would improve our ability to interpret the read coverage results.

Also, did the authors attempt to map RNA-seq reads from yellow individuals to a white reference genome to see if any reads mapped to valkea? This would be a quick and direct way to confirm that valkea is not present/expressed in any yellow genomes. In the methods section, it does not state which A. plantaginis genome the RNA-seq data was mapped to. If RNA-seq data for yellow individuals was only mapped to a yellow reference genome that lacks valkea, then we cannot be sure if valkea transcripts are actually absent from yellow RNA-seq samples (I honestly assume the authors are aware of the bias introduced by mapping yellow RNA-seq data to a yellow reference genome only, but I just need to check since I couldn't discern from the methods).

Reviewer #3 (Recommendations for the authors):

Specific comments to the authors:

Line 26: the limitation of recombination does not necessarily imply a supergene architecture. Furthermore, your results point to a pleiotropic effect of a single gene rather than to a combined effect of several genes, therefore departing from the classical 'supergene' hypothesis. I would recommend rephrasing this part.

Line 40: it is unclear to me what you mean by 'selection is context-dependent'; this needs to be explained in more detail.

Line 49: in mimetic butterflies, there is also a series of inversions at the supergene controlling colour pattern polymorphism in H. numata (Jay et al. 2021 Nature Genetics).

Line 59: it is unclear what you mean by 'in an ecological context', you may explain the key ecological features involved in the persistence of the polymorphism in this species.

Line 70: What is causing the mating advantage? Is it linked to female preference? If so, this raises the question of the selection promoting the evolution of such preference?

Figure 1: is this the frequency of MALE colour patterns shown on panel A?

Line 131: In my opinion figure S2 should be in the main document, it is very important to infer the ancestral state and the origin of the duplicated region. I would prefer moving panel D of figure 2 into the supplementary if space is missing.

In figure 2 panel D, I guess you compared YY HOMOZYGOUS males with WY HETEROZYGOUS males? It would be useful to provide this genotypic information in the legend.

Line 148: you may specify that the RNAseq was performed on the wing disk. Did you investigate the expression patterns in hindwings and forewings separately? This might be interesting since the level of yellow colour seems to be higher in the hindwings than in the forewings (at least from what I can see in figure 1).

Line 164: This suggests that there is no major shift in expression patterns between morphs even within the wing disk tissue. This is in apparent contradiction with the 99 DE genes found at the genomic level (lines 180-181). I think I misunderstood something here: were these first expression analyses restricted to genes located within the QTL region? This should be clarified.

Line 170-171: Did the overexpression of yellow-e occur at the same developmental stage as the overexpression of valkea (i.e. premelanin stage)? This is important to infer the putative developmental pathway inducing white colour pattern development.

Figure S5: The position of the yellow-e gene and of the valkea gene are not indicated in the figure, so it is difficult to draw conclusions from this figure at this point.

Line 196: This provides quite indirect evidence for ruling out the effect of yellow-e on the switch between white and yellow colour pattern development. The overexpression of yellow-e at the pre-melanin stage could be caused by variation in the (non-coding) regulatory region, and therefore explaining why variation in the yellow-e sequences is not specifically associated with colour pattern variation.

Line 291: In line with your conclusions, the dominance of the 'white' allele over the 'yellow' one is consistent with the white allele being a derived haplotype that invaded an ancestrally yellow population. Such invasion of a new adaptive allele is facilitated when the invading allele is dominant over the ancestral one because it is then expressed at a heterozygous state (i.e. Haldane's sieve effect).

Line 297: I have some trouble reconciling the 'neofunctionalization hypothesis' with the fact that valkea seems to be a truncated gene. Is there any example where a truncated yellow gene gained a new function in the melanin developmental pathway?

The overexpression of the valkea gene could stem from a lack of regulation of a gene with a loss of function. In that case, the switch in colour pattern might stem from variation in the non-coding region affecting the expression of other genes, like yellow-e. Is there a way you can rule out this alternative hypothesis?

[Editors’ note: further revisions were suggested prior to acceptance, as described below.]

Thank you for resubmitting your work entitled "Colour polymorphism associated with a gene duplication in male wood tiger moths" for further consideration by eLife. Your revised article has been evaluated by Christian Landry (Senior Editor) and a Reviewing Editor.

The manuscript has been improved but there are some remaining issues that need to be addressed, as outlined below:

The CRISPR experiment is important but lacks a more detailed description, as well as earlier and more explicit acknowledgement of its limitations, including that it failed to conclusively demonstrate that valkea (and not yellow-e) is responsible for the white/yellow switch. This uncertainty should be referred to earlier on (abstract?).

Relative to standard butterfly color pattern analysis, more information is necessary regarding the UV analysis (methods and wildtype phenotype), and regarding the use of "eumelanin" and "pheomelanin" which are usually reserved for vertebrates.

Reviewer #2 (Recommendations for the authors):

I have reviewed the revisions, and the authors have sufficiently addressed my previous concerns and suggestions. However, the authors' inclusion of additional CRISPR data is lacking critical information and analyses, which I detail below.

Lines 217 and 218 state that whole genome sequences of mutants were used to confirm mutants. However, there is no description of the methods used, nor can I find that those data are made available. Please add a description of the methods used for whole genome sequencing and confirming the presence of mutant alleles. I am also interested in what methods were used to test for off-target effects. It is particularly important to examine for potential off-target edits to other yellow genes.

Ln 220. Only one female survived to adulthood, and this had a mosaic phenotype. "This individual had one yellow forewing, similar to the male mutants, with the rest of the body and wings being wildtype (Figure 4 —figure supplement 1)." It is not at all clear that this female has one mutant wing. Both wings appear much more yellow than a white wildtype. I need some further phenotypic evidence (spectrophotometer readings or pigment analyses) as the phenotypic variation is not evident in the images provided. It would be ideal to see that the colors in mosaic mutant phenotypic regions are significantly different from wildtype (this can be done using spec readings from multiple wildtype wings and mutant wings). Second, there needs to be sequence verification of the mutations included in the manuscript, as previously mentioned.

Figure 4 —figure supplement 2 shows UV images. However, there are no methods provided for how these UV data were collected. Without some details of the imaging setup, I am unable to discern whether the images reflect differences in UV reflection, or may be due to variations in the imaging procedure. If possible, spectral analyses of the wings are an easy and cost-effective approach to quickly confirming changes in UV brightness on lepidoptera wings.

There is also no background information given for the wildtype UV. Lines 212-213 suggest the UV is a result of scale structures. What is the reference or evidence for this? Variation in UV reflection is known to be influenced by pigment composition in Pieris butterflies, not necessarily scale structures. To make assertions of UV being associated with scale structures, I would be interested in seeing the characterization of the putative UV related scale structures in wildtypes and mutants. This type of scale characterization (e.g. SEM and/or TEM of wing scales in wildtype and mutants) is routinely included with other functional genomic studies of similar wing colorations (for examples see Ficarrotta et al. 2022, Livraghi et al. 2022, Concha et al. 2019, Matsuoka and Monteiro 2018). At a minimum, a detailed description/characterization of the wildtype UV should be given to the readers. Along these lines, I am curious to know if the UV may be iridescent. If so, some descriptive info on the iridescence would be needed (i.e. angle of incidence). Also, if iridescent, the differences in UV between wildtype and mutants should be examined further to determine if the image differences between wt and mutants are due to changes in the angle of incidence.

Ln 335-336. I am unclear what evidence supports yellow-e having a forewing-specific effect.

Ln 337-338. The authors state that yellow-e was "likely also knocked out…". I think this is misleading, as lines 218-219 state "All samples also showed evidence of editing at the corresponding yellow-e exons, which mainly involved insertions". Based on this it seems more than "likely", and actually confirmed, that yellow-e coding was disrupted in ALL samples.

Reviewer #3 (Recommendations for the authors):

The revised version of the manuscript successfully addresses most of my previous concerns.

Results from CRISPR/Cas9 experiments targeting the valkea gene were added to the manuscript in order to validate the role of this gene in the developmental switch from the yellow to the white morph. Such CRISPR/Cas9 experiments are challenging and obtaining a high number of mutant adults is usually difficult in Lepidoptera.

Here a few male mutants and one female mutant were successfully obtained. Nevertheless, the lack of specificity of the CRISPR guides resulted in modifications in both valkea and yellow-e genes in the few mutant individuals that reached the adult stage, therefore preventing the full characterisation of the respective functional implications of these two genes in the development of hind and forewing colour patterns in males and females.

From what I understood, the main argument for ruling out yellow-e as causing the white/yellow switch in male hindwings is the phenotype observed in a single mutant female shown in panel E of the supplementary figure 4. The sentences on lines 221-224 are not entirely convincing to me. The phenotype of the mutant female is used to point at the putative role of yellow-e in forewing colour in females. Does it lead to hypothesizing a role of yellow-e in forewing colour in both sexes? And thus a role of valkea in male hindwing colour? This indirect argument should be made clearer, and further discussion on the respective roles of these two genes in hind and forewing coloration is needed.

In the supplementary figure 4 (referred to as Figure 4 —figure supplement 1): panel E shows the phenotype of the mutant female, but a picture of the wild-type female would be useful to fully evaluate the impact of the CRISPR treatment on phenotypic variation.

Line 213: This is quite interesting, did you observe differences in scale structure between wild-type yellow and white scales, and in the wild-type yellow vs. mutant yellow scales? Such observations on the respective role of pigments and scale structure in the reflected colours are also relevant to understand the developmental bases of wing colour variations.

https://doi.org/10.7554/eLife.80116.sa1

Author response

Essential revisions:

1) To frame it in terms of "supergene entailing reduced recombination", the work requires quantification of "lower recombination" within the duplicated segment, and more detailed characterization of the 5' end of that segment. Alternatively, claims of "supergene"-like behavior should be explicitly stated as a hypothesis. In terms of "supergene" pleiotropic effects, it seems that the association between duplication and polymorphism is shown directly only for pigmentation, and not any other phenotypes that covary with that. The association to other traits should also be presented as hypothetical.

2) Definite proof that valkea, and not something else in the duplicated region (e.g. regulatory sequence responsible for expression differences between morphs for other genes in that linkage group), is responsible for the white phenotype requires functional analysis. Possibly, the more accessible type of analysis would involve using CRISPR-Cas9 to knock-out valkea from a white morph background. That being impossible, showing spatial patterns of valkea (and other genes in that linkage group?) expression (e.g. using in situ hybridization) in developing wings of the white morph would at least already associate valkea to that specific region of the wing and add support to it being involved in the COLOR (not scale maturation, for example) polymorphism.

3) Provide more details on the methods, including making replication and data structure clearer in the gene expression analysis (and plotting actual data points in Figure 2B).

We have made extensive revisions to this manuscript, the main addition being a CRISPR/Cas9 experiment in which we functionally validate the valkea gene. Regarding the essential revisions:

1. We have edited the text to move the focus away from supergenes and look more generally at the possible genetic basis of complex polymorphisms and adaptive variation, presenting supergenes as one possibility. The main figures now include more of the 5’ region of the duplication which shows the sequence upstream of the duplication is highly similar in the white and yellow genomes (Figure 2C). In the discussion, we hypothesise that this morph-specific duplication will provide a region of reduced recombination because the region is effectively hemizygous and cannot recombine except in homozygote genotypes. We cannot carry out a detailed LD analysis as the sequence is not present in both morphs. We have added PCR assays as extra evidence for the lack of the duplicated region in yellow morphs (Appendix figure 1). This genotyping assay is based on a small deletion in the yellow-e gene in white morphs.

2. We have added the results of a CRISPR gene editing experiment, in which we knocked out the valkea gene leading to the white morphs becoming yellow. We suspect that these results also show a role for yellow-e in forewing colouration, as sequencing of these individuals suggested mutations in both valkea and yellow-e due to the similarity of the sequences.

3. The methods for the gene expression analysis have been restructured and clarified.

We thank the three reviewers for their detailed comments and hope that the following changes fully address their questions and concerns. New and edited text is highlighted in blue on the revised manuscript.

Reviewer #1 (Recommendations for the authors):

Lines 125-126: "likely to be mapping errors". What do the authors mean by 'mapping errors'? Greater specificity is needed. Importantly, I would like to see some attempt to document what you think is going on. If you filter your mapping using MAPQ > 30, when mapping across to the entire genome, does this region lose more reads in the yellow samples than what you show in Figure 2? Do all of the white individuals show this higher coverage, compared to yellow? It is not clear in Figure 2 if the read depth here is the total for all of the individuals in your collection. Did you look at other regional samples that you sequenced?

We have taken a more detailed look at the mapping across the whole scaffold by including a comparison of different mapping quality filters. This shows that when MQ filters are more stringent, read depth decreases more in yellow samples compared to white in the duplication region (Figure 2 supplements 2 and 3). Thus we think this shows that in yellow individuals, yellow-e reads are mapping to valkea (and in white morphs vice versa). In general, MQ is lower in the duplication in both white and yellow samples.

The patterns of higher coverage in white individuals were also seen when including samples from the non-Finnish populations (Scotland and Estonia). Figure 2D shows the mean read depth of the white and yellow samples, and this has been clarified in the figure legend. None of the white samples had 0 coverage in the valkea region.
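
As an illustration of the comparison described above, the following is a minimal sketch, not the pipeline used in the study. It assumes that the mapping quality (MAPQ) of every read overlapping the duplicated region has already been extracted for each morph; the function names and the toy MAPQ values are hypothetical. It reports the fraction of reads retained at each threshold, so a steeper drop in yellow samples would point to reads being placed ambiguously within the duplication.

```python
from typing import Dict, List


def retained_fraction(mapqs: List[int], threshold: int) -> float:
    """Fraction of reads whose mapping quality is at least `threshold`."""
    if not mapqs:
        return 0.0
    return sum(q >= threshold for q in mapqs) / len(mapqs)


def compare_morphs(
    reads_by_morph: Dict[str, List[int]], thresholds: List[int]
) -> Dict[str, Dict[int, float]]:
    """Retained read fraction per morph at each MAPQ threshold."""
    return {
        morph: {t: retained_fraction(mapqs, t) for t in thresholds}
        for morph, mapqs in reads_by_morph.items()
    }


# Hypothetical MAPQ values for reads overlapping the duplicated region;
# these are invented for illustration and are not data from the study.
toy_reads = {
    "white": [60, 60, 42, 60, 35, 60, 20, 60],
    "yellow": [3, 0, 12, 60, 5, 0, 28, 31],
}

result = compare_morphs(toy_reads, thresholds=[0, 20, 30])
for morph, fractions in result.items():
    print(morph, fractions)
```

In this toy input the yellow reads lose far more depth than the white reads as the threshold rises, which is the qualitative pattern described for the duplication region.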

The genomic region flanking valkea is not very well characterized in the manuscript. Figure 2 is only showing a cartoon, while there are perfectly good methods for aligning these two regions and showing computationally inferred orthology for this region. More specifically, while the downstream region of yellow-g, yellow-e both look orthologous, the upstream region appears to have different loci (ie. jg6744, jg1307). This suggests that this simplified cartoon is masking a lot more complexity, and I am asking for that to be presented clearly and empirically, as this is currently … glossed over/ignored in the relevant section of the results (lines:112-126).

Figure 2C has been edited with the cartoon replaced with a more specific alignment of the two regions which shows clearly the missing region in the yellow genome. This also now includes more of the region upstream of the duplication. We have added an explanation that jg6744 and jg1307 are the same gene, and all genes in the flanking regions are present in both morphs (lines 146-149).

Lines 150-151: I can understand your reasoning, but this is because I understand quite a bit about the temporal dynamics of color deposition in Lepidoptera wings. Most readers will not. Please provide more of your reasoning here, in terms of thinking that this color change is not due to patterning genes (though nearly all, or all?, aforementioned genes associated with Lep wing color changes are not color biosynthesis genes, but regulatory/patterning genes). So, your logical step here is quite a departure from the literature; please justify it for the reader.

It is confusing that we here mentioned genes associated with patterning rather than colour. Instead we have now included the example of melanin pathway genes from Zhang et al. 2017, which is a more relevant analysis of melanin production in Lepidoptera. We have also included a comparison of gene expression of yellow genes from Ferguson et al., 2011 (now lines 157-160).

Gene expression patterns. I greatly appreciate that you provide an overview via a PCA-like plot to see the clustering of your samples. But.. Figure S3: is this an MDS plot (as per edgeR), or something else? You do not describe how you generated this figure in the methods and that should be clarified. Also, in the relevant main text, lines: 160-161, you make a very qualitative statement, and I can't tell if that's just the authors "eye-balling" the PCA-like plot.

Figure S3 (now Figure 3 supplement 1) is an MDS plot generated in limma and we added this information to the methods. We have edited the figure legend to include this and a better description of the plot. We did not formally test the effect of developmental stage in explaining most of the genome-wide variation. However, the lack of clustering among samples of the same morphs could be consistent with the fact that the phenotype is controlled by a single Mendelian locus and thus few genes are DE between morphs. These results suggest that more genes are involved in expression profiles specific to the developmental stage, rather than the wing phenotype. We have added this explanation to the text.

Since the RNAseq analysis was working with WW vs. yy individuals, how do the authors envision the expression threshold of valkea to give rise to a dominant phenotype? Stated another way, if white individuals still arise from Wy males, and in those the expression of valkea is going to be much lower … how do they envision the functioning of their new gene in a heterozygous background giving rise to a binary trait?

It is unknown if the expression of valkea is lower in Wy. The W allele is fully dominant over y, with heterozygotes presenting the dominant hindwing colour phenotype and not intermediate phenotypes. Thus we expect expression of valkea to be similar in WW and Wy individuals, although there are differences in other traits between WW and Wy such as UV reflectance.

Figures. I was surprised to see that none of the figures had a general header before the subpanels were described (i.e. a one-sentence overview). I find this very strange and suggest the authors do this.

Figure legends have been edited.

Figure 3 could benefit from more clarity. A is I guess a restructuring of all your RNAseq data to only look at differences between the two color morphs only, grouping all tissues together? This was not really clear in the main text and is not clarified here. I assume B is only looking at valkea expression across all time points … but this should be made clear.

More details have been added to the figure legend to make these plots clearer. Figure 3A is showing only DE genes at the pre-melanin stage, while 3B is showing only expression of valkea across the different stages. Overall, the gene expression methods have been restructured into a more logical order for the reader.

Line 179: this analysis is fine, but I am rather unhappy with calling this pooling of all tissues and looking for only morph differences, as 'genome-wide analysis of RNAseq' … as all of your analyses are looking at RNAseq data mapped to the genome.. there is nothing unique here compared to what was done previously, except that tissues are pooled by morph -- but this is not described clearly in the methods (lines: 447-457). Please, revise your methods for greater clarity of your two-step approach, and revise your main text, and figure legends accordingly. Perhaps more importantly, what do you gain by doing this two-step approach? I can see the logic, that even with this type of dev stage grouping, valkea is clearly an outlier. This perspective should be shared with the reader. Having that come before the tissue-specific result works, but currently, you present the tissue-specific, then the pooled tissue, and then the figure panels are in the wrong order … it could be more linear and clear. Please revise.

We have modified the gene expression Results section and methods to clarify the strategy that was followed. Only one analysis was performed, in which all genes of all samples were analysed. However, initially we wanted to specify that we had a list of 22 candidate genes found in the QTL/GWAS region that we were particularly interested in, as possible cis-regulatory changes found in the associated region could be affecting gene expression of nearby genes.

Topology approach. This section appears rather rushed and should be introduced with greater clarity for the reader. Also, why are you only doing this for such a narrow region of the chromosome? Why are you not doing this for the whole region flanking the valkea insertion region? Where is the actual location of yellow-e in this figure?

It wasn’t clear here that we had excluded the duplication because the white and yellow morphs cannot be compared due to lack of sequence in yellows. Because we cannot make this comparison, we decided to remove this analysis, as it does not provide sufficient evidence for or against the role of yellow-e in hindwing colour.

Again, it brings up the strange part of this manuscript, in that the authors appear again to be avoiding their 5' flanking region of the duplication … why? That should be mirroring this pattern, which would strengthen the message here, but it is not presented. In sum, one can only really appreciate S5 if you can see the larger region, the flanking loci, the repeated patterns, and some proper phylogeny explaining the alternative topologies (as I find the text description alone lacking proper clarity for the topology alternatives). Does this arise due to the low coverage of your individual WGS data?

Initially we did not include detailed analysis of the 5’ flanking region because this did not fall within the region which was found to be significant in the QTL analysis. The genes in this region are orthologous in the white and yellow reference genomes, and we do not see any additional large structural variation. This is now clearer in Figure 2C.

Recombination. I find it rather strange that you discuss the potential for recombination suppression as a result of the duplication, yet conduct no measures of LD. Why? You have many whole-genome datasets from a sufficient number of individuals for some preliminary analyses at least, to provide quantitative evidence. But, upon closer reading, is this because you have too little depth per individual for this?

We discuss recombination suppression as one potential consequence of the duplication. The absence of the duplicated region in the yy genome indicates that the sequence is effectively hemizygous and so we cannot determine if there is a change in LD across white and yellow morphs.

This brings up the issue that average read depth per individual is not clearly reported, and that needs to be changed in the main text.

Where is the table of the data generated per individual, for RAD and WGS? Their genomic coverage after mapping? In the area of the text where I expected this, I found instead % of reads mapping. That doesn't convey depth, which conveys accuracy of WGS data … please make a table for these standard metrics common to QTL and GWAS papers.

Summary statistics have been added to the supplementary tables (S5 and S6) for the QTL and GWAS samples.
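The distinction the reviewer draws between the percentage of reads mapping and read depth can be illustrated with a minimal sketch. It assumes per-site depth is available as tab-separated lines of scaffold, position and depth (the three-column layout written by `samtools depth` for a single alignment file). The function name and the toy values are hypothetical and are not taken from the manuscript or its supplementary tables.

```python
from collections import defaultdict


def mean_depth_per_scaffold(lines):
    """Return {scaffold: mean depth} from 'scaffold<TAB>position<TAB>depth' lines.

    Assumption: every site of interest is listed, including zero-depth sites;
    otherwise the mean is taken over covered sites only and is inflated.
    """
    total_depth = defaultdict(int)  # summed depth per scaffold
    n_sites = defaultdict(int)      # number of sites seen per scaffold
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        scaffold, _position, depth = line.split("\t")
        total_depth[scaffold] += int(depth)
        n_sites[scaffold] += 1
    return {s: total_depth[s] / n_sites[s] for s in total_depth}


# Toy input (hypothetical values, for illustration only).
toy_lines = [
    "scaffold_1\t1\t10",
    "scaffold_1\t2\t12",
    "scaffold_1\t3\t0",
    "scaffold_2\t1\t4",
]
toy_means = mean_depth_per_scaffold(toy_lines)
```

A sample can have a high proportion of reads mapping and still have low mean depth if few reads were sequenced, which is why the two metrics are usually reported side by side.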

Reviewer #2 (Recommendations for the authors):

This study truly is a fantastic effort to identify the locus responsible for adaptive color polymorphism in tiger moths. In general, the paper is well-written and the figures communicate the main results quite well. Following are suggestions, concerns, and/or questions I have about the study that I believe could improve the study and paper.

As mentioned in the public review, I have concerns with the hypotheses the authors use to frame the paper. I see this study as a quite well-executed effort to identify the genetic and phenotypic basis of wing color polymorphism in these tiger moths. I do clearly see how the study was designed to distinguish between the involvement of "large structural variants" versus "single gene mutations". I think this could be addressed through some revisions in the Introduction. Along the same lines, I don't see any need to introduce the concept of supergenes, as I don't see any efforts to directly test if a co-adapted gene complex is involved. Again, this can be addressed through limited text editing.

We have edited both the introduction and discussion to move the focus away from supergenes, and instead base our hypotheses around the possible mechanisms for variation in colour polymorphisms and adaptive variation.

This study would be greatly strengthened by additional gene expression and/or functional data. Spatial expression data of valkea and yellow-e in developing hindwings could provide critical evidence of these genes involved in the color pattern differences. Such data has been critical in the implication of other color pattern genes involved in Heliconius and Bicyclus wing development. Even further, functional confirmation, through methods such as CRISPR-Cas9 editing, has proven to be extremely successful to confirm the role of candidate genes in butterfly wing pattern development (see examples from Heliconius, Bicyclus, Colias, and other butterflies), including successful CRISPR edits of yellow to study gene function in other butterfly species. Recent other studies of butterfly color pattern genetics published in eLife have included such spatial expression data and/or functional data. I remain unconvinced from the tree topology analyses that valkea alone at this locus is involved in generating the color differences, or that valkea acts as the genetic switch for the color polymorphism. To find the results of this study as convincing as those other recent studies, I would need to see comparable evidence.

We have added the results of our gene editing experiment which successfully showed that when valkea is knocked out in a white morph, yellow pigment is produced on the wings. We remain unsure of the role of yellow-e but we use these results to hypothesise that valkea controls hindwing colour, while yellow-e affects forewing colour. Sequencing showed that both valkea and yellow-e had evidence of gene editing around the target guide sequences.

We agree that the tree topology did not provide any convincing evidence for or against the role of yellow-e vs. valkea, so as mentioned earlier, we decided to remove this analysis since we cannot estimate topology at the valkea gene.

For the pigment analyses, after the pheomelanin is extracted from yellow wings, do the wings appear white instead of yellow? I would be curious to see an image of what the extracted wings looked like, so I could directly connect the HPLC differences with a change in yellow versus white coloration.

Unfortunately, the methods mean that this is not possible, as the wings are crushed to extract the pheomelanin. The hydrogen iodide treatment uses a harsh, strong acid, leaving the residues of all samples a dark brown colour.

I feel the paper could be strengthened through some integration of the genetic and phenotypic results. The authors have a rich RNA-seq dataset that can be used to characterize clusters and networks of genes expressed in development, and differences between the color morphs. There is also a well-resolved melanin pathway, with some knowledge of specific gene functions from Drosophila and other butterfly studies. In this regard, I feel the authors have missed an opportunity to integrate their gene expression data with their phenotypic data. For instance, what other genes do valkea and yellow-e cluster with (e.g. show correlated expression pattern with) in the RNA-seq data? These clusters would reflect the network of genes that are differently expressed between color morphs. I would be interested in knowing what these genes are and if there are any genes with interesting functions or known to be in developmental pathways that involve yellow genes, or are involved in pigmentation. In the melanic pathway, it could be powerful to visualize where in the pathway the authors propose that valkea may be impacting pheomelanin production. I would urge the authors to revisit Matsuda and Monteiro 2020 as an example of how such data can be integrated to give the reader a more clear and integrated understanding of how the genetic changes identified may be impacting the phenotype.

This is a very good suggestion. We looked at the functions of other DE genes but none stood out as being part of melanin or pigmentation pathways. Valkea in the white genome sits next (5’) to yellow family genes g and e. The yellow genes, together with laccase2, lie upstream of the insect melanin pathway and act as master genes involved in the catalysis of dopa-melanin and dopamine-melanin to produce black and brown melanin, respectively. The production of dopamine-melanin can be suppressed further down the pathway by the conversion of dopamine to N-β-alanyldopamine (NBAD) through the binding of β-alanine by the activity of ebony. NBAD is the precursor of yellow sclerotin, resulting in a yellow pigment.

In recent insect studies (Galván et al., 2015; Jorge García et al., 2016; Matsuoka and Monteiro, 2018; Polidori et al., 2017; Zhang et al., 2019 doi:10.3390/ijms20112728), including ours, the yellow pigmentation is also attributed, at least partly, to pheomelanin derived by the oxidation of dopamine. Both the NBAD and pheomelanin routes are final molecule products of the melanin pathways. Hence, if we may speculate, valkea could impact the catalysis of dopamine having cascading effects down the pathway resulting in (lack of) yellow pigmentation. The precise interactions with other genes in the dopamine pathway are currently unknown. Because of these unknowns we have not added any further figures relating to the pigment analysis or melanin pathway (which is described in better detail in Matsuoka and Monteiro, as the reviewer suggests).

I quite like that the authors highlight gene duplication as a structural variant that is largely unable to properly recombine with haplotypes lacking the duplicated region. I would urge the authors to cite other examples where such duplications have been implicated in wing pattern development and adaptive evolution. For example, gene duplicates have been implicated in the adaptive evolution of pollen feeding in Heliconius butterflies (Smith et al. 2020) and sexually dimorphic color pattern development in Zerene butterflies (Rodriguez et al. 2021). This paper has an opportunity to highlight the increasing evidence of recent gene duplications in evolutionary diversification.

These are great examples and we have edited the discussion to include more examples for the role of gene duplications in adaptive variation.

The duplicated region at the mapped locus needs to be further resolved. At a minimum, the authors should finely annotate the duplicated region. For instance, are there any TE insertions? Does the entire duplicated region reflect a single recent duplication? Or, are there regions duplicated more than once, and this region appears to have experienced several instances of unequal crossovers and potential insertion/deletion events? Is the regulatory region (e.g. 5' UTR, etc.) duplicated? Does the regulatory region show elevated divergence relative to the other duplicated regions?

Similarly, further analysis of valkea would strengthen the paper. Does valkea show any evidence of adaptive molecular evolution? Are there non-synonymous substitutions with yellow-e? How old/recent is the gene duplication event?

Further analyses to address these questions could provide further resolution to the evolution and potential role of valkea in the color polymorphism.

These are all interesting questions. We used RepeatMasker and found a number of TEs within the region. There are no TEs in the coding sequences of valkea, and the density of TEs in this region is not obviously different from surrounding regions. At the moment we can’t say much more about this so haven’t included any specific analysis in the manuscript.

From looking at whole scaffold alignments, we do not see evidence for further major duplications in this region.

We are lacking data that would allow us to estimate mutation rate and the age of the duplication event. From a previous study (Yen et al. 2020), we know that the population in Georgia is separated from the Finnish and Estonian populations. Georgian samples do not have the duplication suggesting a more recent timescale. Further analysis of valkea is planned with new data and we are looking to include this in future manuscripts.

Figure 2D. I have some reservations on interpreting the read-coverage as evidence the duplicated region is missing in all yellow samples. For instance, yellow-g shows a similar mapped reads pattern as the region just 3' of valkea in the duplicated region, yet yellow-g is not considered to be within the duplicated region. Are the regions in the duplicated region with high coverage for yellow samples potentially repetitive regions of the genome, such as TEs? If so, an annotation of this region would improve our ability to interpret the read coverage results.

We used PCR assays as additional evidence for the lack of valkea in yellow individuals. Although coverage of yellow-g does fluctuate across the gene, the pattern is fairly consistent in both whites and yellows. There is no pattern between the location of the TEs and the peaks in coverage.

Also, did the authors attempt to map RNA-seq reads from yellow individuals to a white reference genome to see if any reads mapped to valkea? This would be a quick and direct way to confirm that valkea is not present/expressed in any yellow genomes. In the methods section, it does not state which A. plantaginis genome the RNA-seq data was mapped to. If RNA-seq data for yellow individuals was only mapped to a yellow reference genome that lacks valkea, then we cannot be sure if valkea transcripts are actually absent from yellow RNA-seq samples (I honestly assume the authors are aware of the bias introduced by mapping yellow RNA-seq data to a yellow reference genome only, but I just need to check since I couldn't discern from the methods).

Yes, the results presented in the DE analysis section are using reads mapped to the white reference (line 168).

Reviewer #3 (Recommendations for the authors):

Specific comments to the authors:

Line 26: the limitation of recombination does not necessarily imply a supergene architecture. Furthermore, your results point a pleiotropic effect of a single gene rather than to a combined effect of several genes, therefore departing from the classical 'supergene' hypothesis. I would recommend rephrasing this part.

We have rephrased the abstract, along with the general focus of the introduction and discussion, to move away from the supergene hypothesis and talk more generally about the genetic basis of polymorphisms in wild populations, including supergenes as one of several possible mechanisms.

Line 40: it is unclear to me what you mean by 'selection is context-dependent'; this needs to be explained in more detail.

Here we are referring to genetic correlations between the colour locus and other traits which have benefits in different contexts, such as in the examples presented in the following paragraph, where white and yellow males have differences in traits which, for example, give them an advantage in mating or predator defence.

Line 49: in mimetic butterflies, there is also a series of inversions at the supergene controlling colour pattern polymorphism in H. numata (Jay et al. 2021 Nature Genetics).

A reference to the H. numata supergene has been added (line 51).

Line 59: it is unclear what you mean by 'in an ecological context', you may explain the key ecological features involved in the persistence of the polymorphism in this species.

There has been lots of research relating to the persistence of the polymorphism in this well-studied species. We have tried to make sure that the introduction covers the key studies while not becoming too detailed.

Line 70: What is causing the mating advantage? Is it linked to female preference? If so, this raises the question of the selection promoting the evolution of such preference?

This paragraph has been updated with the most recent studies in this area. While females generally prefer to mate with white males (Nokelainen et al., 2012), their preference is thought to be flexible and affected by male morph frequency, as males of either morph have a reproductive advantage when they are the most common morph (Gordon et al., 2015). Yellow males are generally less successful in their reproductive output (De Pasqual et al., 2022).

Figure 1: is this the frequency of MALE colour patterns shown on panel A?

Yes, this has been clarified.

Line 131: In my opinion figure S2 should be in the main document, it is very important to infer the ancestral state and the origin of the duplicated region. I would prefer moving panel D of figure 2 into the supplementary if space is missing.

After looking at various ways to present this section, we decided to keep this figure in the supplementary (now Figure 2 – supplement 4). However, we have added some further discussion about the possible ancestry of yellow-e (lines 328-333).

In figure 2 panel D, I guess you compared YY HOMOZYGOUS males with WY HETEROZYGOUS males? This would be useful to provide this genotypic information in the legend.

Yes, figure 2D is comparing wild white individuals which are likely a combination of WW and Wy, with wild yy. The figure legend has been edited.

Line 148: you may be precise that the RNAseq was performed on the wing disk. Did you investigate the expression patterns in hindwings and forewings separately? This might be interesting since the level of yellow colour seem to be higher in the hindwing than in the forewings (at least from what I can see in figure 1).

The RNAseq analysis used only hindwings dissected from pupae and we have clarified this in the methods.

Line 164: This suggests that there is no major shift in expression patterns between morphs even within the wing disk tissue. This is in apparent contradiction with the 99 DE genes found at the genomic level (lines 180-181). I think I misunderstood something here, these first expression analyses were restricted to genes located within the QTL region? This should be clarified.

This has been clarified with the restructuring of the DE section. There are more genes that are DE between stages than between colour morphs, thus developmental stage drives the clustering.

Line 170-171: Did the overexpression of yellow-e occur at the same developmental stage as the overexpression of valkea (i.e. premelanin stage)? This is important to infer the putative developmental pathway inducing white colour pattern development.

Yes, these are both overexpressed in the pre-melanin stage. This has been added to line 184.

Figure S5: The position of the yellow-e gene and of the valkea gene are not indicated in the figure, so it is difficult to draw conclusions from this figure at this point.

Line 196: This provides quite indirect evidence for ruling out the effect of yellow-e on the switch between white and yellow colour pattern development. The overexpression of yellow-e at the pre-melanin stage could be caused by variation in the (non-coding) regulatory region, and therefore explaining why variation in the yellow-e sequences is not specifically associated with colour pattern variation.

As mentioned earlier, we removed this figure and analysis because, as the reviewer suggests, it was difficult to draw any conclusions from this.

Line 291: In line with your conclusions, the dominance of the 'white' allele over the 'yellow' one is consistent with the white allele being a derived haplotype that invaded an ancestrally yellow population. Such invasion of a new adaptive allele is facilitated when the invading allele is dominant over the ancestral one because it is then expressed at a heterozygous state (i.e. Haldane's sieve effect).

This is a great point. We have added the Haldane’s sieve effect to the discussion (lines 328-333).

Line 297: I have some trouble reconciling the 'neofunctionalization hypothesis' with the fact that valkea seems to be a truncated gene. Is there any example where a truncated yellow gene gained a new function in the melanin developmental pathway?

The overexpression of the valkea gene could stem from a lack of regulation of a gene with a loss of function. In that case, the switch in colour pattern might stem from variation in the non-coding region affecting the expression of other genes, like yellow-e. Is there a way you can rule out this alternative hypothesis?

We believe that the CRISPR mutants provide evidence that valkea is having a direct effect on the phenotype. In the mutants, both forewings and hindwings became yellow, and thus we suspect that valkea is controlling hindwing colour while yellow-e, which was likely also knocked out due to the similarity of the sequences, controls forewing colour. However, because of this we cannot completely rule out the role of yellow-e in hindwing colouration.

[Editors’ note: what follows is the authors’ response to the second round of review.]

The manuscript has been improved but there are some remaining issues that need to be addressed, as outlined below:

The CRISPR experiment is important but lacks a more detailed description, as well as earlier and more explicit acknowledgement of its limitations, including that it failed to conclusively demonstrate that valkea (and not yellow-e) is responsible for the white/yellow switch. This uncertainty should be referred to earlier on (abstract?).

We have made it clear in the abstract that both valkea and the original yellow-e gene were knocked out in our CRISPR experiment. We also added further clarification of this in the first paragraph of the discussion. Further discussion of the role of yellow-e remains later in the manuscript (lines 350-353).

Relative to standard butterfly color pattern analysis, more information is necessary regarding the UV analysis (methods and wildtype phenotype), and regarding the use of "eumelanin" and "pheomelanin" which are usually reserved for vertebrates.

Detailed methods for the UV photography have been added at lines 576-582. Further comparison of UV reflectance in wildtype vs. mutant phenotypes can be seen in the reflectance spectra (Figure 4 – supplement 4). The use of eumelanin and pheomelanin in insects is covered in the paper by Barek et al. 2018 (doi: 10.1111/pcmr.12672), which we cite in the pigment Results section and in the discussion. Other examples of pheomelanins in insects are at lines 260-262.

Reviewer #2 (Recommendations for the authors):

I have reviewed the revisions, and the authors have sufficiently addressed my previous concerns and suggestions. However, the authors' inclusion of additional CRISPR data is lacking critical information and analyses, which I detail below.

Lines 217 and 218 states that whole genome sequences of mutants were used to confirm mutants. However, there is no description of the methods used, nor can I find that those data are made available. Please add a description of the methods used for whole genome sequencing and confirming the presence of mutant alleles.

The methods for whole genome sequencing have been added in lines 551-557. The exact indel sequences for each mutant are shown in Figure 4 – supplement 3, with the guide sequence highlighted in pink. We show the sequences for both valkea and yellow-e. The raw whole genome sequences of the CRISPR mutants were deposited in SRA with the other raw data, as detailed in the data availability statement.

I am also interested in what methods were used to test for off-target effects. It is particularly important to examine for potential off-target edits to other yellow genes.

We used the whole genome data of the CRISPR mutants to look for off-target effects in the remaining yellow genes (c, d2, f, g2, g2, h and yellow). No indels or mutations were found in these gene sequences and we have added this to the manuscript (lines 233-236).

Ln 220. Only one female survived to adulthood, and this had a mosaic phenotype. "This individual had one yellow forewing, similar to the male mutants, with the rest of the body and wings being wildtype (Figure 4 —figure supplement 1)." It is not at all clear that this female has one mutant wing. Both wings appear much more yellow than a white wildtype. I need some further phenotypic evidence (spectrophotometer readings or pigment analyses) as the phenotypic variation is not evident in the images provided. It would be ideal to see that the colors in mosaic mutant phenotypic regions are significantly different from wildtype (this can be done using spec readings from multiple wildtype wings and mutant wings).

We have added more explanation about the female colour in lines 226-232. Female colour does not correlate with the male colour genotypes, and the forewings of wildtype females are a pale-yellow colour. This mutant female had one wildtype forewing and one which was much more yellow/orange than expected. This is made clearer in figure 4 – supplement 1 and also quantified in the spectral measurements, which show that the mutant forewing is closer in colour to the hindwings than to the other forewing (Figure 4 – supplement 4). We also added a photo of a wildtype female to Figure 4 – supplement 1. This shows the pale-yellow colour of the female forewings. Hindwing colour in females varies continuously from yellow/orange to red so although the hindwing colour of the mutant looks different to the wildtype, this is not unusual.

Second, there needs to be sequence verification of the mutations included in the manuscript, as previously mentioned.

The sequences for the female are also included in Figure 4 – supplement 3. We have highlighted which is the female in the figure legend.

Figure 4 —figure supplement 2 shows images UV. However, there are no methods provided for how these UV data were collected. Without some details of the imaging setup, I am unable to discern that images reflect differences in UV reflection, or may be due to variations in the imaging procedure. If possible, spectra analyses of the wings are an easy and cost-effective approach to quickly confirming changes in UV brightness on lepidoptera wings.

All photographs were taken under standard lighting conditions and images were standardised using colour standards. Detailed methods regarding the camera and filters used have been added at lines 576-582. We took spectral measurements of a set of wildtype males and females (including WW, wy and yy genotypes), and also the 5 CRISPR mutants. These plots are found in figure 4 – supplement 4 and methods in lines 583-589. The reflectance spectra show that the mutant hindwings have lost UV reflectance compared to wildtype white wings.

There is also no background information given for the wildtype UV. Lines 212-213 suggest the UV is a result of scale structures. What is the reference or evidence for this? Variation in UV reflection is known to be influenced by pigment composition in Pieris butterflies, not necessarily scale structures. To make assertions of UV being associated with scale structures, I would be interested in seeing the characterization of the putative UV related scale structures in wildtypes and mutants. This type of scale characterization (e.g. SEM and/or TEM of wing scales in wildtype and mutants) is routinely included with other functional genomic studies of similar wing colorations (for examples see Ficarrotta et al. 2022, Livraghi et al. 2022, Concha et al. 2019, Matsuoka and Monteiro 2018). At a minimum, a detailed description/characterization of the wildtype UV should be given to the readers. Along these lines, I am curious to know if the UV may be iridescent. If so, some descriptive info on the iridescence would be needed (i.e. angle of incidence). Also, if iridescent, the differences in UV between wildtype and mutants should be examined further to determine if the image differences between wt and mutants are due to changes in the angle of incidence.

We are definitely interested to look more into scale structure differences and the development of these. However, at the moment little is known about how UV reflectance is produced in this species and so we are carrying out a more thorough study of this to be included in a separate manuscript. Hence in this manuscript, we make a suggestion that there could be scale structure changes, but make no firm conclusions about this and do not rule out the possibility that only pigments are affected. With only 5 mutants it would be difficult to make any statistical inferences regarding any changes. Preliminary measurements show that UV is not iridescent but again this will be part of a further analysis which we think will benefit from being separate from this manuscript. However, the addition of the reflectance spectra now shows more clearly that the mutant hindwings resemble more closely the yellow hindwings rather than the white wings in the UV wavelengths.

Ln 335-336. I am unclear what evidence supports yellow-e having a forewing-specific effect.

Now line 350 – we state the role of yellow-e in forewing colour as a hypothesis. We have added further explanation at this point that this suggestion comes from the change in female forewing colour, since we do not expect knockouts of valkea to have any effect on female colour. Spectral measurements show that forewing colour is similar in WW, wy and yy Finnish samples, so we don’t expect the presence/absence of valkea to control forewing colour.

Ln 337-338 The authors state that yellow-e was "likely also knocked out…". I think this is misleading, as lines 218-219 states "All samples also showed evidence of editing at the corresponding yellow-e exons, which mainly involved insertions". Based on this it seems more than "likely", and actually confirmed yellow-e coding was disrupted in ALL samples.

This has been edited at the start of the discussion and also in the abstract.

Reviewer #3 (Recommendations for the authors):

The revised version of the manuscript successfully addresses most of my previous concerns.

Results from CrispR/cas9 experiments targeting the valkea gene were added to the manuscript in order to validate the role of this gene in the developmental switch from the yellow to the white morph. Such CrispR/cas9 experiments are challenging and obtaining a high number of mutant adults is usually difficult in Lepidoptera.

Here a few male mutants and one female mutant were successfully obtained. Nevertheless, the lack of specificity of the CrispR guides resulted in modifications in both valkea and yellow-e genes in the few mutant individuals that reached the adult stage, therefore preventing the full characterisation of the respective functional implications of these two genes in the development of hind and forewing colour patterns in males and females.

From what I understood, the main argument for ruling out yellow-e as causing the white/yellow switch in male hindwings is the phenotype observed in a single mutant female shown in panel E of the supplementary figure 4. The sentences at lines 221-224 are not entirely convincing to me. The phenotype of the mutant female is used to point at the putative role of yellow-e on forewing colour in females. Does it lead to hypothesize a role of yellow-e on forewing colour in both sexes? And thus to a role of valkea in male hindwing colour? This indirect argument should be made clearer, and further discussion on the respective roles of these two genes in hind and forewing coloration is needed.

We have added some further explanation to the paragraph around lines 221-232 and hope this clarifies the hypothesis that yellow-e could affect forewing colour. We also made clearer in the discussion that knockouts of only yellow-e would be needed to confirm this hypothesis.

“As all genotypes have similar forewing colour in the wildtypes, we do not expect valkea to affect the forewing and thus the change in forewing colour could be attributed to a yellow-e mutation. Only one female survived to adulthood, and this had a mosaic phenotype. Female colour does not correlate with the male colour genotypes, and the forewings of females are a pale-yellow colour. This individual with a mosaic phenotype had one mutant forewing which was much more yellow/orange than the wildtype. The rest of the wings and body resembled a wildtype female (Figure 4 —figure supplement 1/Figure 4 —figure supplement 4). Reflectance spectra show that the mutant left forewing is closer in colour to the yellow/orange on the hindwings, than to the colour of the opposite forewing (Figure 4 —figure supplement 4). Since a valkea knockout is not expected to affect female phenotypes as they always have orange/red hindwings, this could be further evidence for the effect of yellow-e on forewing colour.”

In the supplementary figure 4 (referred to as Figure 4 —figure supplement 1): the panel E shows the phenotype of the mutant female but a picture of the wild-type female would be useful to fully evaluate the impact of the CrispR treatment on phenotypic variation.

A wildtype female has been added to this figure, and spectral data for the mosaic and wildtype females can be seen in figure 4 – supplement 4.

Line 213: This is quite interesting, did you observe differences in scale structure between wild-type yellow and white scales, and in the wild-type yellow vs. mutant yellow scales? Such observations on the respective role of pigments and scale structure in the reflected colours are also relevant to understand the developmental bases of wing colour variations.

Please see the reply to reviewer 2’s comment above regarding the UV reflectance and scale structure.

https://doi.org/10.7554/eLife.80116.sa2

Article and author information

Author details

  1. Melanie N Brien

    Organismal and Evolutionary Biology Research Program, Faculty of Biological and Environmental Sciences, University of Helsinki, Helsinki, Finland
    Contribution
    Data curation, Formal analysis, Funding acquisition, Investigation, Visualization, Writing – original draft, Writing – review and editing
    Contributed equally with
    Anna Orteu
    For correspondence
    mnbrien1@gmail.com
    Competing interests
    No competing interests declared
    ORCID: 0000-0002-3089-4776
  2. Anna Orteu

    Department of Zoology, University of Cambridge, Cambridge, United Kingdom
    Contribution
    Formal analysis, Investigation, Visualization, Writing – original draft, Writing – review and editing
    Contributed equally with
    Melanie N Brien
    Competing interests
    No competing interests declared
  3. Eugenie C Yen

    Department of Zoology, University of Cambridge, Cambridge, United Kingdom
    Contribution
    Investigation, Writing – review and editing
    Competing interests
    No competing interests declared
  4. Juan A Galarza

    Ecology and Genetics Research Unit, University of Oulu, Oulu, Finland
    Contribution
    Resources, Data curation, Writing – review and editing
    Competing interests
    No competing interests declared
  5. Jimi Kirvesoja

    Department of Biological and Environmental Science, University of Jyväskylä, Jyväskylä, Finland
    Contribution
    Investigation, Writing – original draft, Writing – review and editing
    Competing interests
    No competing interests declared
  6. Hannu Pakkanen

    Department of Chemistry, University of Jyväskylä, Jyväskylä, Finland
    Contribution
    Resources, Investigation, Methodology, Writing – review and editing
    Competing interests
    No competing interests declared
    ORCID: 0000-0002-8725-1931
  7. Kazumasa Wakamatsu

    Institute for Melanin Chemistry, Fujita Health University, Toyoake, Japan
    Contribution
    Resources, Methodology, Writing – review and editing
    Competing interests
    No competing interests declared
    ORCID: 0000-0003-1748-9001
  8. Chris D Jiggins

    Department of Zoology, University of Cambridge, Cambridge, United Kingdom
    Contribution
    Conceptualization, Resources, Supervision, Funding acquisition, Writing – review and editing
    Contributed equally with
    Johanna Mappes
    Competing interests
    No competing interests declared
    ORCID: 0000-0002-7809-062X
  9. Johanna Mappes

    1. Organismal and Evolutionary Biology Research Program, Faculty of Biological and Environmental Sciences, University of Helsinki, Helsinki, Finland
    2. Department of Biological and Environmental Science, University of Jyväskylä, Jyväskylä, Finland
    Contribution
    Conceptualization, Resources, Supervision, Funding acquisition, Writing – review and editing
    Contributed equally with
    Chris D Jiggins
    Competing interests
    No competing interests declared

Funding

Academy of Finland (343356)

  • Melanie N Brien

Academy of Finland (345091)

  • Johanna Mappes

Academy of Finland (328474)

  • Johanna Mappes

Biotechnology and Biological Sciences Research Council (046_BB_V0145X_1)

  • Chris D Jiggins

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Acknowledgements

We thank Alma Oksanen, Kaisa Suisto, and the greenhouse staff for insect rearing, and Elisa Salmivirta and Sari Viinikainen for lab assistance. Thanks to Emeritus Prof. Shosuke Ito for kindly providing pheomelanin standards, Bodo Wilts for advice on the pigment analyses, and James Barnett for help with the spectrophotometry. We thank Muktai Kuwalekar, Claudius Kratochwil, Rachel Blow, Ian Warren, Tom Generalovic, and Joe Hanly for providing equipment and advice regarding the CRISPR injections. This work was supported by Academy of Finland grants to MB (#343356) and JM (projects 345091 and 328474), and a Biotechnology and Biological Sciences Research Council (BBSRC) grant to CJ (046_BB_V0145X_1).

Senior Editor

  1. Christian R Landry, Université Laval, Canada

Reviewing Editor

  1. Patrícia Beldade, University of Lisbon, Portugal

Version history

  1. Preprint posted: May 1, 2022
  2. Received: May 9, 2022
  3. Accepted: September 5, 2023
  4. Accepted Manuscript published: October 30, 2023 (version 1)
  5. Version of Record published: November 9, 2023 (version 2)

Copyright

© 2023, Brien, Orteu et al.

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.

Cite this article

Melanie N Brien, Anna Orteu, Eugenie C Yen, Juan A Galarza, Jimi Kirvesoja, Hannu Pakkanen, Kazumasa Wakamatsu, Chris D Jiggins, Johanna Mappes (2023)
Colour polymorphism associated with a gene duplication in male wood tiger moths
eLife 12:e80116.
https://doi.org/10.7554/eLife.80116
