Speciation, wing pattern evolution and mimetic polymorphism in the Papilio polytes species group.

a. Species distributional ranges and wing colour pattern polymorphism in the female forms of the polytes species group, along with their Batesian models. Although the minute details of mimetic wing colour patterns and the presence/absence of tails vary across species and populations, the form names are generalised for the purpose of this paper and apply to multiple species. b. A secondary fossil-calibrated, dated phylogeny of the polytes species group, showing the mean age and 95% Highest Posterior Density interval of each split in million years. c. Evolution of mimetic polymorphism and doublesex inversion in relation to speciation events. Diamonds on branches show fixation or polymorphism of female wing patterns and the evolution of the accompanying dsx inversion.

dsx is a hotspot of selective sweeps in all the mimetic species.

a. Annotated genes showing selective sweeps in the five mimetic species. The alternately coloured outer bands indicate chromosomes as defined in the reference Bombyx mori genome, whereas colour-coded lines inside represent genes showing signatures of selective sweeps in the five species. b. Summary of the analysis of selective sweeps using Raised Accuracy in Sweep Detection (RAiSD). Note that a single annotated gene may contain multiple loci (SNPs) that show signatures of selective sweeps.

Allelic basis of mimetic polymorphism in the polytes species group.

Lepidopteran dsx comprises six exons, of which exon 6 is untranslated27. It has female- and male-specific transcripts (called dsx F and dsx M, respectively), with multiple female isoforms (F1, F2 and F3) in the polytes group9,10. The exon composition of each isoform is depicted at the top. Allele-specific SNPs, their positions on the CDS and protein sequence, and their corresponding amino acids are colour coded. Invariable and non-specific polymorphic sites are represented by dotted lines. The domain regions (OD) are highlighted where feasible. OD2 spans several exonic regions in the DNA sequence, where bases cannot easily be numbered because the lengths of exonic regions differ across dsx isoforms. The amino acid sequence of OD1 is conserved.

Rapid molecular evolution of mimetic dsx alleles in the polytes species group.

a. Allele-specific, fixed substitutions from Fig. 3 are shown for DNA and amino acid sequences (CDS) with respect to the evolutionary timeline of the origin of three dsx alleles (a linear increase is assumed for simplicity). The total number of fixed substitutions accumulated in each allele is shown before the new allele evolved. Colour-coded regions that cross evolutionary boundaries (dotted lines) represent the number of fixed substitutions that were inherited in the new allele from the previous allele from which it arose. For example, h has no fixed substitutions in the amino acid sequence but 16 in the DNA sequence, of which 7 were inherited in HP, of which 5 were inherited in HR. HP has 61 new, fixed DNA substitutions relative to h, of which 44 were inherited by HR. In addition, HR and h share 18 SNPs that are absent in HP, and HR has 25 unique fixed substitutions, showing a rapid accumulation of mutations in coding regions of novel mimetic alleles. See Fig. 3 for molecular details and sample sizes. b. Percentage of nucleotide substitutions in dsx sequence as observed at the genus and family levels (separated by a horizontal dotted line) for four insect orders27. Numbers in parentheses after family/genus names represent the number of species from that group used in this analysis. The three alleles of dsx in P. polytes alone (vertical dotted line) have more substitutions than the dsx sequences within several genera.

Exon swaps (recombination) between the HR and HP alleles produce a novel, rare intermediate phenotype.

a. Wing patterns and possible dsx genotypes of mimetic female forms of P. polytes, along with the two females with intermediate wing patterns that we obtained, are shown. Both intermediates had a crossed-over, hybrid dsx allele (represented as HH) with exon 1 of the universally dominant romulus allele and exons 2–5 of the second-dominant polytes allele. b. The three dsx alleles have distinctive SNPs in each exon, which are colour coded for the normal phenotypes that they produce. IBC-PT567 was heterozygous for the polytes and cyrus alleles, while IBC-PT568 was homozygous for the polytes allele, except that both specimens had exon 1 of the HR allele. c and d. Exon-specific SNPs of romulus and polytes alleles in the intermediates IBC-PT567 and IBC-PT568, highlighted in blue. In panel c, romulus-specific SNPs are marked in red, and polytes-specific SNPs in green. Non-synonymous SNPs are highlighted in yellow.

Primers used in this study.

Breakpoint- and exon-specific primers distinguish between the three female forms (considering polytes and theseus forms to be genetically indistinguishable) in the polytes species group. 1=primers designed for this study, 2=primers previously published8 but that failed in our PCRs, and 3=primers previously published8 that also worked in our samples.

Details of the significant sites associated with the mimetic f. romulus in the GWAS.

Dominance hierarchy of female forms in P. polytes.

Details of crosses with various phenotypic and genotypic combinations of parents, and the offspring produced, are shown. The three dsx alleles that produce three female forms are listed as HR (mimetic romulus allele that is universally dominant), HP (mimetic polytes allele that is recessive to romulus but dominant over cyrus), and h (non-mimetic cyrus allele that is universally recessive). The column “Larvae/Pupae” shows the number of individuals that died in early stages, so their sex and form were unknown. Only female offspring were genotyped since the polymorphic phenotype manifests only in females where genotypes and phenotypes can be correlated. Genotypes were identified using allele-specific primers (Table S2). Additional details of the mapping broods and genotyping (including PCR gel images) are provided in Table S3, and phenotypes are depicted in Fig. 1.
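The dominance hierarchy described above (HR over HP over h) can be illustrated with a minimal sketch. It assumes simple Mendelian segregation of a single dsx locus with equal transmission of both parental alleles; the function names `female_form` and `cross` are hypothetical and are not taken from the study.

```python
from collections import Counter
from itertools import product

# Dominance rank of the three dsx alleles as stated in the legend:
# HR (romulus) > HP (polytes) > h (cyrus). Higher number = more dominant.
DOMINANCE = {"HR": 2, "HP": 1, "h": 0}
FORM = {"HR": "romulus", "HP": "polytes", "h": "cyrus"}


def female_form(allele_a, allele_b):
    """Return the female form produced by a diploid dsx genotype."""
    top = max((allele_a, allele_b), key=DOMINANCE.__getitem__)
    return FORM[top]


def cross(mother, father):
    """Count female forms over the four equally likely offspring genotypes.

    Assumption: each parent transmits either allele with equal probability.
    """
    return Counter(female_form(m, f) for m, f in product(mother, father))


if __name__ == "__main__":
    # A romulus-cyrus segregating brood: HR/h mother x h/h father.
    print(cross(("HR", "h"), ("h", "h")))
```

Under these assumptions, an HR/h × h/h cross yields romulus and cyrus daughters in equal proportions, and an HR/HP female is phenotypically romulus.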

Sources of sequence data for the analyses performed in this study.

Papilio polytes distribution, as determined from the tree topology in Fig. 1, is shaded in purple (b). Numbers on butterfly images indicate sample sizes for genome sequences of each female form/male from the respective sampling site.

Time-calibrated phylogeny and divergence times of models and mimics.

a. The Papilio phylogeny showing node ages for all ingroups and outgroups considered in this work. The Papilio polytes species group and its Pachliopta aposematic models are highlighted in blue and purple, respectively. Numbers at nodes represent divergence times. All divergence times are in million years. b. The P. polytes species group, and c. the aposematic Pachliopta, are enlarged from panel a, with node ages shown at the top left corner of the nodes, and node numbers used in panel ‘d’ circled on the right side of the node. d. Node numbers, corresponding divergence times and 95% HPD min-max ranges are shown for all species of the Papilio polytes species group and their Pachliopta models to compare their relative ages.

Genotype-phenotype association between dsx and f. romulus.

a. The cross designed to generate a romulus-cyrus segregating brood to map the mimetic locus associated with f. romulus. b. Loci significantly associated with f. romulus and the scaffolds they lie on in the P. polytes genome (n=102: two parents and 50 females each of f. cyrus and f. romulus). The dotted lines indicate significance thresholds of p-values. c. The positions of all loci significantly associated with the mimetic phenotype and the genes in which they lie. Chromosome 25 of Bombyx mori contains dsx. d. Cross design to clarify dominance hierarchy (sample sizes given in Extended Data Table 2).

Inversion breakpoints for HR and HP alleles in P. polytes.

a. DNA sequence alignment of the dsx inversion breakpoints from Sanger sequencing data, where blank spaces in rows indicate that PCRs did not work for the specific samples or breakpoints. The left and right blocks of sequences represent the left and right breakpoints of the dsx inversion. The details of each sequence are given on the left, and colour coded by female forms. The black row at the bottom shows the consensus sequence, with the most prevalent base depicted at the top. The height of this row indicates sequence similarity. Inside the breakpoints, sequence similarity degrades, and the height of the black row at the bottom decreases. The precise breakpoints are marked with solid black lines and arrowheads. Inversion breakpoints for both mimetic forms are identical. b. Genomic alignments showing dsx inversion breakpoints in f. theseus.

Expression of dsx isoforms in developing mimetic and non-mimetic wing patterns.

Sex- and tissue-specific expression of the dsx gene (panel a) and its isoforms (panel b) across female forms (mimetic wings) and males (non-mimetic wings) in 3-day-old pupae. Numbers in panel b indicate sample sizes for each female form and tissue. The composition of sex-specific dsx isoforms is shown in Extended Data Figure 6.

Genetic divergence in dsx-containing scaffolds of the three P. polytes female forms.

From left to right, the figure shows plots of genetic divergence (Fst) for the non-mimetic dsx scaffold, the non-mimetic dsx gene and the mimetic dsx scaffold. The left panels highlight the dsx gene as a high Fst peak across the entire non-mimetic scaffold. Middle panels zoom into this high Fst peak to show several sites of high divergence in the pairwise comparisons of the three female forms. The right panels depict the mimetic scaffold, which contains only dsxH and shows high Fst peaks in the romulus-polytes comparison. Sample sizes for the three dsx alleles: HR=10; HP=13; and h=13.
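As a reading aid for the Fst plots, the sketch below computes a per-site Fst between two groups from their allele frequencies. It is an assumption-laden illustration: it uses the basic heterozygosity-based definition, Fst = (H_T − H_S) / H_T, with the two groups weighted equally, which may differ from the estimator actually used for this figure. The function name `site_fst` is hypothetical.

```python
def site_fst(p1, p2):
    """Fst at one biallelic site from allele frequencies in two groups.

    Assumptions: equal weighting of the two groups and the simple
    heterozygosity-based definition Fst = (H_T - H_S) / H_T.
    """
    p_bar = (p1 + p2) / 2
    h_total = 2 * p_bar * (1 - p_bar)  # expected heterozygosity, pooled
    if h_total == 0:
        return 0.0  # site is monomorphic across both groups
    h_within = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2  # mean within-group
    return (h_total - h_within) / h_total
```

A site fixed for different bases in two female forms gives Fst = 1 under this definition, whereas a site with identical frequencies in both forms gives Fst = 0, which is why allele-specific SNPs in dsx appear as sharp peaks.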

Evolution of dsx alleles in the polytes species group.

We reconstructed gene trees using the CDS (a) and the non-coding regions of dsx (b), with P. protenor as the outgroup. The dsx alleles are colour coded, and the mimetic and non-mimetic female forms they produce are depicted (see Table S1 and Fig. 1 for dsx allelic abbreviations and female forms). c. Haplotype network generated with the complete dsx CDS. Each female form clusters across species in both gene trees and the haplotype network, indicating that the allelic polymorphism of dsx is much older than the individual species in which the alleles occur.

Occurrence of romulus-polytes intermediates in nature.

Phenotypic intermediates with romulus-like forewings and polytes-like hindwings occur in nature; two such individuals were genetically characterized in Fig. 5. Image courtesy: Ashok Dey and Dattaprasad Sawant, from the Butterflies of India citizen science project (http://www.ifoundbutterflies.org/sp/603/Papilio-polytes).