Introduction

Insects often display stunning colors, and the patterns of these colors have been shown to be involved in behavior(1), immunity(2), thermoregulation(3), UV protection(4), and especially visual antagonism of predators via aposematism(5), mimicry(6), cryptic color patterns(7) or some combination of these(8). In addition, color patterns both diverge and converge among populations(9). Because of these striking visual features and highly adaptive, actively evolving phenotypes, the genetic basis and evolutionary mechanisms of color patterns have long been a topic of interest.

Insect coloration can be pigmentary, structural, or bioluminescent. Pigments are mainly synthesized by the insects themselves and form solid particles that are deposited in the cuticle of the body surface and the scales of the wings(10, 11). Interestingly, recent studies have found that biosynthesized bile pigments and carotenoid pigments are incorporated into body fluids and fill the wing membranes of two butterflies (Siproeta stelenes and Philaethria diatonica) via hemolymph circulation, providing color in the form of liquid pigments(12). Pigments produce color by selective absorption and/or scattering of light, depending on their physical properties(13). Structural color, in contrast, refers to colors, such as metallic colors and iridescence, generated by optical interference and grating diffraction of the microstructure/nanostructure of the body surface or appendages (such as scales)(14, 15). Pigment color and structural color are widely distributed in insects and can be observed by the naked eye only in illuminated environments. Some insects, such as fireflies, instead exhibit colors (green to orange) in the dark due to bioluminescence(16). Bioluminescence occurs when luciferase catalyzes the oxidation of the small molecule luciferin(17). Overall, the color patterns of insects have evolved to be highly sophisticated and are closely related to their living environments. For example, cryptic coloration can deceive other animals through high similarity to the surrounding environment. However, the molecular mechanism by which insects form precise color patterns to match their living environment is still unknown.

Recent research has identified the metabolic pathways of related pigments, such as melanins, pterins, and ommochromes, in Lepidoptera(18). A deficiency of enzymes in the pigment metabolism pathway can lead to changes in color patterns(19, 20). In addition to pigment synthesis, the microstructure/nanostructure of the body surface and wing scales are important factors that influence body color patterns. The body surface and wing scales of Lepidoptera are mainly composed of cuticular proteins (CPs) (21, 22). There are multiple genes encoding CPs in Lepidopteran genomes. For example, in Bombyx mori, over 220 genes encode CPs(23). However, there are still many unknowns regarding the function of CPs and the molecular mechanisms underlying their fine-scale localization in the cuticle.

In addition, some pleiotropic factors, such as wnt1(24), Apontic-like(25), clawless(26), abdominal-A(27), abdominal-B(28), engrailed(29), antennapedia(30), optix(31), bric à brac (bab)(32), and Distal-less (Dll)(33), play an important role in the regulation of color patterns. The molecular mechanism by which these factors participate in the regulation of body color patterns needs further study. In addition, there are many undiscovered factors in the gene regulatory network (GRN) involved in the regulation of insect color patterns. The identification of new color pattern regulatory genes and the study of their molecular mechanisms would be helpful for further understanding color patterns.

Silkworms (B. mori) have been completely domesticated for more than 5000 years and are famous for their silk fiber production(34). Because of this long-term domestication, artificial selection and genetic research, their genetic background is well characterized, and approximately 200 mutant strains with variable color patterns have been preserved(35), which provide a good resource for color pattern research. The black dilute (bd) mutant, which exhibits recessive Mendelian inheritance, has a dark gray larval body color, and the female is sterile. bdf (black dilute fertile) is an allele of bd that produces a lighter body color than bd and fertile females (Fig. 1). The two mutations were mapped to a single locus at 22.9 centimorgans (cM) of linkage group 9(36). No pigmentation-related genes have been reported at this locus. Thus, studying this locus may reveal a new color pattern-related gene, which motivated this work.

Phenotypes of bd, bdf, and wild-type Dazao larvae.

The epidermis of bd is dark gray, and the epidermis of bdf is light gray at the 5th instar (day 3) of silkworm larvae. The bar indicates 1 centimeter.

Results

Candidate gene of the bd allele

To identify the genomic region responsible for the bd alleles, positional cloning of the bd locus was performed. Because females of the bd mutant are infertile whereas females carrying the bdf allele are fertile, we used bdf and the wild-type Dazao strain as the parents for mapping analysis. A total of 1162 first-generation backcross (BC1M) individuals from bdf and Dazao were used for fine mapping with molecular markers (Fig. 2). A genomic region of approximately 390 kilobases (kb) was responsible for the bd phenotype (Table S1). In the SilkDB database(37), this region included five predicted genes (BGIBMGA012516, BGIBMGA012517, BGIBMGA012518, BGIBMGA012519 and BGIBMGA014089). In addition, we analyzed the predicted genes for this genomic region from the GenBank(38) and SilkBase(39) databases (Fig. S1). The number of predicted genes varied among the databases. We performed sequence alignment analysis of the predicted genes in the three databases to determine their correspondence. Then, real-time quantitative polymerase chain reaction (qPCR) was performed, which showed that BGIBMGA012517 and BGIBMGA012518 were significantly downregulated on day 3 of the fifth larval instar in individuals with the bd phenotype, while there was no difference in the expression levels of the other genes (Fig. S1). These two genes were predicted as a single locus (LOC101738295) in GenBank. To determine the gene structure of BGIBMGA012517 and BGIBMGA012518, we used forward primers for the BGIBMGA012517 gene and reverse primers for the BGIBMGA012518 gene to amplify cDNA from the wild-type Dazao strain. Gene cloning showed that the predicted genes BGIBMGA012517 and BGIBMGA012518 are one gene. For convenience, we temporarily refer to it as the 12517/8 gene.

Positional cloning of the bd locus.

(A) We used 532 BC1 individuals to map the bd locus between PCR markers c and i. The numbers above the DNA markers indicate recombination events.

(B) A total of 1162 BC1 individuals were used to narrow the bd locus to an approximately 400 kb genomic region.

(C) Partial enlarged view of the region responsible for bd. This region contains 4 predicted genes, BGIBMGA012517, BGIBMGA012518, BGIBMGA012519 and BGIBMGA014089.

(D) Analysis of nucleotide differences in the region responsible for bd. The green block indicates the deletion of the genome in bd mutants. The black vertical lines indicate the SNPs and indels of bdf mutants.

The 12517/8 gene produces two transcripts; the open reading frame (ORF) of the long transcript is 2397 bp, and the ORF of the short transcript is 1824 bp, which shares the same 5’-terminus as the wild-type Dazao strain (Fig. S2). The 12517/8 gene showed significantly lower expression in the bdf mutant (Fig. S3), and multiple variations were found in the nearby region of this gene by comparative genomic analysis between bdf and Dazao (Table S2). In addition, 12517/8 was completely silenced due to the deletion of DNA fragments (approximately 168 kb) from the first upstream intron and an insertion of 3560 bp in the bd mutant (Fig. S4).

To predict the function of the 12517/8 gene, we performed a BLAST search using its full-length sequence and found a transcription factor, the maternal gene required for meiosis (mamo) in D. melanogaster, that had high sequence similarity to the 12517/8 gene (Fig. S5). Therefore, we named the 12517/8 gene Bm-mamo; the long transcript was designated as Bm-mamo-L, and the short transcript was designated as Bm-mamo-S.

The mamo gene belongs to the Broad-complex, Tramtrack and bric à brac/poxvirus zinc finger protein (BTB-ZF) family. In the BTB-ZF family, the zinc finger serves as a recognition motif for DNA-specific sequences, and the BTB domain promotes oligomerization and the recruitment of other regulatory factors(40). Most of these recruited factors are transcriptional repressors, such as nuclear receptor corepressor (N-CoR) and silencing mediator for retinoid and thyroid hormone receptor (SMRT)(41), but some are activators, such as p300(42). Therefore, BTB-ZF proteins commonly serve as regulators of gene expression. mamo is enriched in embryonic primordial germ cells (PGCs) in D. melanogaster. Individuals deficient in mamo are able to undergo oogenesis but fail to execute meiosis properly, leading to female infertility in D. melanogaster(43). Bm-mamo was identified as an important candidate gene for further analysis.

Expression pattern analysis of Bm-mamo

To analyze the expression profiles of Bm-mamo, we performed quantitative PCR. The expression levels of the Bm-mamo gene were investigated in the whole body at different developmental stages, from the embryonic stage to the adult stage, in the Dazao strain. The gene was highly expressed in the molting stage of caterpillars, and its expression was upregulated in the later pupal and adult stages (Fig. 3A). This suggests that the Bm-mamo gene responds to the ecdysone titer and participates in the processes of molting and metamorphosis in silkworms. In the investigation of tissue-specific expression levels in the 5th-instar 3rd-day larvae of the Dazao strain, the midgut, head, and epidermis had high expression levels; the trachea, nerves, silk glands, testis, ovary, muscle, wing disc and Malpighian tubules had moderate expression levels; and the blood and fat bodies had low expression levels (Fig. 3B). This suggests that Bm-mamo is involved in the developmental regulation of multiple silkworm tissues. Due to the melanism of the epidermis of the bd mutant and the high expression level of the Bm-mamo gene in the epidermis, we measured the expression level of this gene in the epidermis of the 4th to 5th instar of the Dazao strain. In the epidermis, the Bm-mamo gene was upregulated in the molting period, and the highest expression was observed at the beginning of molting (Fig. 3C).

Spatiotemporal expression of Bm-mamo.

(A) Temporal expression of Bm-mamo. In the molting stage and adult stage, this gene is significantly upregulated. M: molting stage, W: wandering stage, P: pupal stage, P1: Day 1 of pupal stage, A: adult stage, 4th3d indicates 4th instar day 3. 1st to 5th denote the first instar of larvae to fifth instar of larvae, respectively.

(B) Tissue-specific expression of 4th-instar molting larvae. Bm-mamo had relatively high expression levels in the midgut, head, and epidermis. AMSG: anterior division of silk gland and middle silk gland, PSG: posterior silk gland.

(C) Detailed analysis of Bm-mamo at the 4th larval stage in the epidermis of the Dazao strain. Bm-mamo expression is upregulated during the molting stage. HCS indicates the head capsule stage. The “h” indicates the hour, 90 h: at 90 hours of the 4th instar.

Functional analyses of Bm-mamo

To study the function of the Bm-mamo gene, we carried out an RNA interference (RNAi) experiment. Short interfering RNA (siRNA) was injected into the hemolymph of silkworms, and an electroporation experiment(44) was immediately conducted. We found significant melanin pigmentation in the epidermis of the newly molted 5th-instar larvae. This indicates that Bm-mamo deficiency can cause melanin pigmentation (Fig. 4). The melanism phenotype of RNAi individuals was similar to that of bd mutants.

(A) siRNA was introduced by microinjection followed by electroporation. “+” and “-” indicate the positive and negative poles of the electrical current. Scale bars, 1 cm.

(B) Partial magnification of the siRNA experimental group and negative control group. Scale bars, 0.2 cm.

(C) and (D) Relative expression levels of Bm-mamo in the negative control and RNAi groups were determined by qPCR analysis. Data are means ± s.d. Numbers of samples are shown in the upper right of each graph. **P<0.01, paired Student’s t test (NS, not significant).

(E) Efficiency statistics of RNAi.

In addition, gene knockout was performed. Ribonucleoproteins (RNPs) generated by guide RNA (gRNA) and recombinant Cas9 protein were injected into 450 silkworm embryos. In the G0 generation, individuals with a mosaic melanization phenotype were found. These melanistic individuals were raised to moths and then crossed. The homozygous line with gene knockout was obtained through generations of screening. The gene-edited individuals had a significantly melanistic body color, and the female moths were sterile (Fig. 5).

Knockout of Bm-mamo

(A) Larval phenotype in G3 fourth-instar larvae of Dazao targeted for Bm-mamo. Scale bars, 1 cm.

(B) Partial magnification of the Bm-mamo knockout individual and control. Scale bars, 0.2 cm.

(C) Eggs 48 hours after oviposition. The pigmentation indicates that eggs produced by homozygous knockout females cannot undergo normal pigmentation and development.

(D) Genomic structure of Bm-mamo. Open reading frame (blue), untranslated region (black). The gRNA 1 and gRNA 2 sequences are shown.

(E) Sequences of the Bm-mamo knockout individuals. Lines 1 and 2 indicate deletions of 15 and 71 bp, respectively.

These results indicated that the Bm-mamo gene negatively regulates melanin pigmentation in caterpillars and participates in the reproductive regulation of female moths.

Downstream target gene analysis

Bm-mamo belongs to the zinc finger protein family, whose members recognize specific downstream DNA sequences through their zinc fingers. We identified homologous genes of mamo in multiple species and conducted a phylogenetic analysis (Fig. S6). Sequence alignment revealed that the amino acid residues of the zinc finger motif of the mamo-S protein were highly conserved among 57 species. The mamo-L protein was found in only 30 species, possibly because alternative transcripts of mamo are incompletely predicted for many species in GenBank (Fig. S7). As the mamo protein carries a tandem Cys2His2 zinc finger (C2H2-ZF) motif, it can directly bind to DNA sequences. Previous research has suggested that the ZF-DNA binding interface can be understood as a “canonical binding model”, in which each finger contacts DNA in an antiparallel manner. The binding sequence of the C2H2-ZF motif is determined by the amino acid residue sequence of its α-helical component. Considering the first amino acid residue in the α-helical region of the C2H2-ZF domain as position 1, positions –1, 2, 3, and 6 are key amino acids in the recognition and binding of DNA. The residues at positions –1, 3, and 6 specifically interact with base 3, base 2, and base 1 of the DNA sense sequence, respectively, while the residue at position 2 interacts with the complementary DNA strand(45, 46). To analyze the downstream target genes of mamo, we first predicted its DNA binding motifs using online software (http://zf.princeton.edu) based on the canonical binding model(47). The DNA-binding sequence of mamo in Drosophila (TGCGT), confirmed by electrophoretic mobility shift assay (EMSA) experiments(48), is contained within the longer predicted binding site sequence of Bm-mamo-S (GTGCGTGGC). This agreement indicates that the predicted DNA-binding sites are reliable.
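The position numbering used by the canonical binding model can be made concrete with a short sketch. The code below only illustrates the bookkeeping described above (which helix positions are read out, and which base each is said to contact); it is not the prediction method of the cited software. The function names and the seven-residue example string are hypothetical placeholders and are not taken from the Bm-mamo sequence.

```python
# Helix positions named in the text and what each contacts under the canonical model.
CONTACTS = {
    -1: "base 3 of the sense-strand triplet",
    3: "base 2 of the sense-strand triplet",
    6: "base 1 of the sense-strand triplet",
    2: "the complementary strand",
}


def helix_index(position, helix_start):
    """Convert a helix position (-1, 1, 2, ...; there is no position 0)
    into a 0-based index, where helix_start is the index of position 1."""
    if position == 0:
        raise ValueError("the helix numbering has no position 0")
    return helix_start + position - 1 if position > 0 else helix_start + position


def recognition_residues(protein, helix_start):
    """Return {position: residue} for positions -1, 2, 3 and 6 of one finger."""
    if helix_start < 1:
        raise ValueError("position -1 lies before the start of the sequence")
    return {p: protein[helix_index(p, helix_start)] for p in sorted(CONTACTS)}


if __name__ == "__main__":
    # Placeholder string covering positions -1 to 6 (assumption, for illustration only).
    example = "RSDELTR"
    for pos, residue in recognition_residues(example, helix_start=1).items():
        print(f"position {pos:>2}: {residue} -> {CONTACTS[pos]}")
```

For a tandem C2H2-ZF array, the same extraction would be repeated once per finger, with each finger contributing one triplet to the predicted site.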

Furthermore, the predicted DNA-binding sites of Bm-mamo-L and Bm-mamo-S were highly consistent with those of mamo orthologs in different species (Fig. S8). This suggests that the protein may regulate similar target genes between species.

C2H2-ZF transcription factors function by recognizing and binding to cis-regulatory sequences in the genome, which harbor cis-regulatory elements (CREs)(49). CREs are broadly classified as promoters, enhancers, silencers and insulators(50). CREs are often located near their target genes, such as enhancers, which are typically located upstream (5’) or downstream (3’) of the gene they regulate, or in introns, but approximately 12% of CREs are located far from their target gene(51). Therefore, we first investigated the genome range 2 kb upstream and downstream of the predicted genes in silkworms.

The predicted position weight matrices (PWMs) of the recognized sequences of Bm-mamo and the Find Individual Motif Occurrences (FIMO) software of MEME were used to perform silkworm whole-genome scanning for possible downstream target genes. The genomic regions 2 kb upstream and downstream of the 14,623 predicted genes in silkworms were investigated. A total of 10,622 genes contained the recognition sites of the Bm-mamo protein in the silkworm genome (Fig. S9).
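The scanning step can be sketched in a few lines. This is not FIMO and does not reproduce its statistics; it is a minimal log-odds scan of both strands, assuming a toy matrix built from the predicted Bm-mamo-S consensus quoted above (GTGCGTGGC), a uniform background, and an arbitrary score threshold. The matrix weights, threshold, and function names are all assumptions made for illustration.

```python
import math

CONSENSUS = "GTGCGTGGC"  # predicted Bm-mamo-S site quoted in the text


def toy_pwm(consensus, match=0.85):
    """Toy position weight matrix: `match` on the consensus base,
    the remainder shared equally among the other three bases."""
    other = (1.0 - match) / 3.0
    return [{b: (match if b == c else other) for b in "ACGT"} for c in consensus]


def revcomp(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def log_odds(window, pwm, background=0.25):
    """Sum of per-position log2(probability / background)."""
    return sum(math.log2(col[b] / background) for col, b in zip(pwm, window))


def scan(seq, pwm, threshold):
    """Return (start, strand, score) for each window scoring >= threshold.
    Each window is scored as-is ('+') and as its reverse complement ('-')."""
    k = len(pwm)
    hits = []
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if set(window) - set("ACGT"):
            continue  # skip windows containing N or other ambiguity codes
        for strand, w in (("+", window), ("-", revcomp(window))):
            score = log_odds(w, pwm)
            if score >= threshold:
                hits.append((i, strand, score))
    return hits


if __name__ == "__main__":
    flank = "AAAA" + CONSENSUS + "TTTT" + revcomp(CONSENSUS) + "AA"
    for start, strand, score in scan(flank, toy_pwm(CONSENSUS), threshold=15.0):
        print(start, strand, round(score, 2))
```

In a genome-wide analysis, `seq` would be the 2 kb upstream or downstream flank of each predicted gene, and a gene would be counted as a candidate target if either flank yields at least one hit.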

Moreover, we compared transcriptome data of the integument tissue between homozygotes and heterozygotes of the bd mutant at the 4th instar/beginning molting stage(52). In the integument tissue, 10,072 genes (∼69% of the total predicted genes of silkworm) were expressed in heterozygotes, and 9,853 genes (∼67% of the total predicted genes) were expressed in homozygotes of the bd mutant. In addition, comparative transcriptome analysis identified 191 genes with significant expression differences between homozygotes (bd/bd) and heterozygotes (+/bd) (Table S3)(52). Protein functional annotation was performed, and 19 CP genes were significantly differentially expressed between heterozygotes and homozygotes of bd. In addition, the orthologs of these CPs were analyzed in Danaus plexippus, Papilio xuthus and D. melanogaster (Table S4). Furthermore, we identified 53 enzyme-encoding genes, 17 antimicrobial peptide genes, 6 transporter genes, 5 transcription factor genes, 5 cytochrome genes, and others. CP genes were significantly enriched among the differentially expressed genes (DEGs), and previous studies have found that CPs can affect pigmentation(53). Therefore, we first investigated the CP genes. Among them, 18 CP genes had Bm-mamo binding sites in the upstream and downstream 2 kb genomic regions. In addition, we investigated the expression level of the 18 CP genes in the integument from the 4th instar (day 1) to the beginning of the 5th instar of Bm-mamo knockout lines (Fig. S10). The CP genes were significantly upregulated at one or several time points in homozygous (mamo-/mamo-) individuals compared with heterozygous (mamo-/+) individuals of the Bm-mamo knockout line.
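The text does not name the test behind the enrichment statement. One common way to check such a claim is a one-sided hypergeometric test, sketched below with the counts given above (19 CP genes among 191 DEGs, 14,623 predicted genes). The genome-wide number of CP genes is an assumption: the Introduction states only that over 220 genes encode CPs, so 220 is used here as a placeholder and the result is indicative only.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) when n genes are drawn without replacement from N genes,
    K of which belong to the category of interest."""
    upper = min(K, n)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / comb(N, n)


if __name__ == "__main__":
    N = 14623  # predicted silkworm genes (from the text)
    K = 220    # CP genes genome-wide (assumption: the text says "over 220")
    n = 191    # DEGs between bd/bd and +/bd (from the text)
    k = 19     # CP genes among the DEGs (from the text)

    expected = n * K / N
    fold = (k / n) / (K / N)
    print(f"expected CP genes among DEGs by chance: {expected:.1f}")
    print(f"fold enrichment: {fold:.1f}")
    print(f"one-sided hypergeometric P: {hypergeom_sf(k, N, K, n):.2e}")
```

Under these assumed counts, about 2.9 CP genes would be expected among 191 DEGs by chance, so observing 19 is consistent with the enrichment described.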

Interestingly, the CP gene BmorCPH24 was significantly upregulated at the feeding stage in Bm-mamo knockout homozygotes. Previous studies have shown that BmorCPH24 deficiency can lead to a marked decrease in pigmentation in silkworm larvae(53). Therefore, the expression of some CP genes may be necessary for color patterns in caterpillars.

In addition, the synthesis of pigment substances is an important driver of color patterns(54). Eight key genes (TH, DDC, aaNAT, ebony, black, tan, yellow and laccase2) involved in melanin synthesis(11) were investigated in heterozygous and homozygous gene knockout lines of Bm-mamo. Among them, yellow, tan, and DDC were significantly upregulated during the molting period in the homozygous Bm-mamo knockout individuals (Fig. 6). The upstream and/or downstream 2 kb genomic regions of yellow, tan and DDC contain the binding site of the Bm-mamo protein. In addition, the expression of yellow, DDC and tan can promote the generation of melanin(55).

Melanin metabolism pathway and qPCR of related genes.

(A) The melanin metabolism pathway. Blue indicates amino acids and catecholamines, red indicates the names of genes, and black indicates the names of enzymes.

(B) Relative expression levels of eight genes in the heterozygous Bm-mamo knockout group (blue) and homozygous Bm-mamo knockout group (red) as determined by qPCR analysis. Data are means ± s.d. *P<0.05, paired Student’s t test. 4th3d indicates 4th instar day 3, M indicates molting, and h indicates hours.

To explore the interaction between the Bm-mamo protein and its binding sequence, EMSA experiments were conducted. A binding site of Bm-mamo-S (CTGCGTGGT) was located approximately 70 bp upstream of the transcription initiation site of the Bm-yellow gene. The EMSA experiment showed that the Bm-mamo-S protein expressed in prokaryotes can bind to the CTGCGTGGT sequence in vitro (Fig. S11). This suggests that the Bm-mamo-S protein can bind to the upstream region of the Bm-yellow gene and regulate its transcription.

Therefore, the Bm-mamo gene may control the color pattern of caterpillars by regulating key melanin synthesis genes and 18 CP genes.

Discussion

Insects have evolved many important phenotypes during the process of adapting to the environment. Among these traits, color pattern is one of the most interesting. The biochemical metabolic pathways of pigments, such as melanin, ommochromes, and pteridines, have been identified in insects(18). However, the regulation of pigment metabolism-related genes and the processes of the transport and deposition of pigment substances are not clear. In this study, we discovered that the Bm-mamo gene negatively regulates melanin pigmentation in caterpillars. When this gene is deficient, the body color of silkworms exhibits substantial melanism, and changes in the expression of some melanin synthesis genes and some CP genes are also significant.

Deficiency of some genes in the melanin synthesis pathway, such as TH, yellow, ebony, tan, and aaNAT, can lead to variations in color patterns (11). However, they often lead only to localized pigmentation changes in later-instar caterpillars. For example, in yellow-mutant silkworms, the eye spot, lunar spot, star spot, spiracle plate, head, and some sclerotized areas appear reddish brown, and the other cuticle is consistent with that in the wild type in later instars of larvae (third instar to fifth instar)(56). This situation is highly similar to that of the tan mutant in silkworms(57). Why do the yellow and tan phenotypes appear only in a limited cuticle region in later instars of silkworm larvae? One possible reason is that the expression of yellow and tan is limited to a certain region by transcription regulators. Alternatively, other factors, such as CPs in the cuticle, may limit pigmentation by interacting with pigment substances in the cuticle.

On the one hand, we investigated the expression levels of yellow and tan in the pigmented region (lunar spot) and nonpigmented region of the epidermis during the 4th molting of the wild-type Dazao strain. The expression level of yellow was significantly upregulated in the lunar spot (the epidermis on the dorsum of the fifth body segment) compared with the nonpigmented region (the epidermis on the dorsum of the sixth body segment) at 6 hours and 12 hours after molting.

Meanwhile, tan was significantly upregulated 18 hours after molting in the lunar spot region (Fig. S12). This suggests that the upregulated expression of pigment synthesis genes at key time points may be important for pigmentation. However, yellow and tan were still moderately expressed in the nonpigmented epidermis, although they did not cause significant melanin pigmentation. This indicates that pigment synthesis alone is not sufficient to determine the color pattern of the cuticle in caterpillars.

On the other hand, synthesized pigment substances need to be transported from epidermal cells and embedded into the cuticle to allow pigmentation. Therefore, the correct cuticle structure and location of cuticular proteins in the cuticle may be important factors affecting pigmentation.

Previous studies have shown that the lack of expression of BmorCPH24, which encodes important components of the endocuticle, can lead to dramatic changes in body shape and a significant reduction in the pigmentation of caterpillars(53). We crossed Bo (BmorCPH24 null mutation) and bd to obtain F1(Bo/+Bo, bd/+), then self-crossed F1 and observed the phenotype of F2. The lunar spots and star spots decreased, and light-colored stripes appeared on the body segments, but the other areas still had significant melanin pigmentation in double mutation (Bo, bd) individuals (Fig. S13). However, in previous studies, introduction of Bo into L (ectopic expression of wnt1 results in lunar stripes generated on each body segment)(24) and U (overexpression of SoxD results in excessive melanin pigmentation of the epidermis)(58) strains by genetic crosses can remarkably reduce the pigmentation of L and U (53). Interestingly, there was a more significant decrease in pigmentation in the double mutants (Bo, L) and (Bo, U) than in (Bo, bd). This suggests that Bm-mamo has a stronger ability than wnt1 and SoxD to regulate pigmentation. On the one hand, mamo may be a stronger regulator of the melanin metabolic pathway, and on the other hand, mamo may regulate other CP genes to reduce the impact of BmorCPH24 deficiency.

How do CPs affect pigmentation? One study showed that some CPs can form “pore canals” to transport some macromolecules(59). In addition, some CPs can be crosslinked with catecholamines, which are synthesized in the melanin metabolism pathway(60). Because there are no live cells in the cuticle, melanin precursor substances may be transported by the pore canals formed by some CPs and fixed to specific positions through cross-linking with CPs. The cuticular protein TcCPR4 is needed for the formation of pore canals of the cuticle in Tribolium castaneum(61). In contrast, the vertical pore canal is lacking in the less pigmented cuticles of T. castaneum(62). This suggests that the pore canals constructed by TcCPR4 may transport pigments and contribute to cuticle pigmentation in T. castaneum. Moreover, the melanin metabolites N-acetyldopamine (NADA) and N-β-alanyldopamine (NBAD) can target and sclerotize the cuticle by cross-linking with specific cuticular proteins (57). This suggests that pigments interact with specific CPs, thereby affecting pigmentation, hardening properties and the structure of the cuticle. Interestingly, a study found that pigments, in addition to absorbing specific wavelengths, can affect cuticle polymerization, density, and refractive index, which in turn affects the reflected wavelengths that produce structural color in butterfly wing scales(63).

This implies that the interaction between pigments and CPs can be very subtle, resulting in the formation of unique nanostructures, such as those on wing scales, that produce brilliant structural colors.

Consequently, to maintain the accuracy of the color pattern, the localization of CPs in the cuticle may be very important. In a previous study employing microarray analysis, different CPs were found in differently colored areas of the epidermis in Papilio xuthus larvae(64). We investigated the CP genes highly expressed in the black region of Papilio xuthus caterpillars. Thirteen genes had orthologous genes in silkworms (Table S5). Among them, 11 genes (BmorCPR67, BmorCPR71, BmorCPR76, BmorCPR79, BmorCPR99, BmorCPR107, BmorCPT4, BmorCPH5, BmorCPFL4, BmorCPG27 and BmorCPG4) were significantly upregulated in homozygous (mamo-/mamo-) knockout individuals at some time points from the 3rd day of the 4th instar to the 5th instar; BmorCPG4 was also among the 18 previously detected CP genes, and two genes (BmorCPG38 and BmorCPR129) were not different between homozygous (mamo-/mamo-) and heterozygous (mamo-/+) individuals (Fig. S14).

The 28 CP genes mentioned above have significantly upregulated expression in homozygous (mamo-/mamo-) gene knockout individuals at some stages. These CPs may be involved in the transportation or cross-linking of melanin in the cuticle. However, some of these genes showed no difference in expression level from the control in some periods, and some genes were significantly downregulated at some time points in melanic individuals (Fig. 7). This suggests that the regulation of CP genes is complex and may involve other transcription factors and feedback effects. CPs are essential components of the insect cuticle and are involved in cuticular microstructure construction(65), body shape development(66), wing morphogenesis(67), and pigmentation(53). CP genes usually account for over 1% of the total genes in an insect genome and can be categorized into several families, including CPR, CPG, CPH, CPAP1, CPAP3, CPT, CPF and CPFL(68). The CPR family is the largest group of CPs, containing a chitin-binding domain called the Rebers and Riddiford motif (R&R)(69). The variation in the R&R consensus sequence allows subdivision into three subfamilies (RR-1, RR-2, and RR-3)(70). Among the 28 CPs, 11 RR-1 genes, 6 RR-2 genes, 4 hypothetical cuticular protein (CPH) genes, 3 glycine-rich cuticular protein (CPG) genes, 3 cuticular protein Tweedle motif (CPT) genes, and 1 CPFL (like the CPFs in a conserved C-terminal region) gene were identified. The RR-1 consensus among species is usually more variable than RR-2, which suggests that RR-1 may have a species-specific function. RR-2 often clustered into several branches, which may be due to gene duplication events in co-orthologous groups and may result in conserved functions between species(71). CPH genes are classified as such because they lack known motifs. In the epidermis of Lepidoptera, the CPH genes often have high expression levels. For example, BmorCPH24 had the highest expression level in the silkworm larval epidermis(72).
The CPG protein is rich in glycine. The CPH and CPG genes are less commonly found in insects outside the order Lepidoptera(73). This suggests that they may provide species-specific functions for the Lepidoptera. CPT contains a Tweedle motif, and the TweedleD1 mutation has a dramatic effect on body shape in D. melanogaster(74). The CPFL members are relatively conserved among species and may be involved in the synthesis of larval cuticles(75). CPT and CPFL may have relatively conserved functions among insects. The CP genes are a group of rapidly evolving genes, and their copy numbers may undergo significant changes in different species. In addition, RNAi experiments on 135 CP genes in the brown planthopper (Nilaparvata lugens) showed that deficiency of 32 CP genes leads to significant defective phenotypes, such as lethality and developmental retardation. This suggests that these 32 CP genes are indispensable and that other CP genes may have redundant and complementary functions(76). Previous studies found that the construction of the larval cuticle of silkworms requires the precise expression of over two hundred CP genes(22). The production, interaction, and deposition of CPs and pigments are complex and precise processes, and our research shows that Bm-mamo plays an important regulatory role in this process in silkworm caterpillars. To further understand the role of CPs, future work should aim to identify the function of important cuticular protein genes and their deposition mechanism in the cuticle.

Heatmap of 28 differentially expressed cuticular protein genes.

Differentially expressed cuticular protein genes between homozygous (mamo-/mamo-) and heterozygous (mamo-/+) individuals of the Bm-mamo knockout line. Red indicates upregulation in homozygous individuals. Blue indicates downregulation in homozygous individuals. White indicates no difference in expression level or no investigation. 4th1d indicates 4th instar day 1, M indicates molting, and h indicates hours.

In addition, among the 191 DEGs found in the comparative transcriptome data, we also discovered some interesting genes. For example, BGIBMGA013242, encoding a major facilitator superfamily protein (MFS), named BmMFS, which is responsible for the cheek and tail spot (cts) mutant, was significantly upregulated in the bd mutant. Deficiency of this gene (BmMFS) results in chocolate-colored head and anal plates of the silkworm caterpillar(77). In the ommochrome metabolic pathway, Bm-re, encoding an MFS protein, may function in the transportation of some amino acids, such as cysteine or methionine, into pigment granules(78). Therefore, the encoded product of BmMFS may participate in pigmentation by promoting the maturation of pigment granules. Moreover, BGIBMGA013576 and BGIBMGA013656, which encode MFS domain-containing proteins belonging to the solute carrier family 22 (SLC22) and solute carrier family 2 (SLC2) families, respectively, were significantly upregulated in bd. These two genes may participate in pigmentation in a similar manner to the BmMFS gene. In addition, the red Malpighian tubules (red) gene encodes LysM domain-containing proteins. Deficiency of red results in a significant decrease in orange pterin in the wing of Colias butterflies. Research suggests that the product of red may interact with V-ATPase to modulate vacuolar pH across a variety of endosomal organelles, thereby affecting the maturation of pigment granules in cells(79). This indicates that the maturation of pigment granules plays an important role in the coloring of Lepidoptera, and Bm-mamo may be involved in regulating the maturation of pigment granules in the epidermal cells of silkworms.

Bm-mamo may affect the synthesis of melanin in epidermis cells by regulating yellow, DDC, and tan; regulate the maturation of melanin granules in epidermis cells through BmMFS; and affect the deposition of melanin granules in the cuticle by regulating CP genes, thereby comprehensively regulating the color pattern in caterpillars.

Moreover, in D. melanogaster, mamo is needed for functional gamete production. In the silkworm, Bm-mamo has a conserved function in female reproduction. However, mamo has developed new functions in color pattern regulation in silkworm caterpillars. We found that in D. melanogaster, the mamo gene is mainly expressed during the adult stage; it is not expressed during the 1st instar or 2nd instar larval stage and has very low expression at the 3rd instar (Table S6). In addition, several binding sites of mamo were found near the TTS of yellow in D. melanogaster (Fig. S15). The yellow gene is a key melanin metabolic gene with upstream and intron sequences that have been identified as multiple CREs, and it has been considered a research model for CREs (80, 81). This may be due to a change in the expression pattern of mamo, causing this gene to develop a new function of regulating coloration in silkworm caterpillars.

It is generally believed that changes in gene expression patterns are the result of the evolution of CREs. Because CREs have important functions in the spatiotemporal expression pattern regulation of genes, many are found in the noncoding region, which is relatively prone to sequence variation compared with the sequences of coding genes(49). However, the molecular mechanism underlying the sequence change of CREs is not clear.

Transcription factors (TFs), because they recognize relatively short sequences, generally between 4 and 20 bp in length, can have many binding sites in the genome(82). Therefore, one member of the TF family has the potential to regulate almost all genes in one genome. For example, based on the PWM of the mamo protein recognition sequence, 10,622 genes (∼73% of the total number of genes) contain binding sites within 2 kb of their upstream/downstream regions in silkworms (Table S7 and Table S8). Divergence of CREs is currently considered an important cause of phenotypic evolution(83). For example, the marine form of the three-spined stickleback (Gasterosteus aculeatus) has thick armor, and the lake population (which was recently derived from the marine form) does not. Research has shown that pelvic loss in different natural populations of three-spined stickleback occurs by regulatory mutations that delete a tissue-specific enhancer (Pel) of the pituitary homeobox transcription factor 1 (Pitx1) gene. The researchers genotyped 13 pelvic-reduced populations of three-spined stickleback from disparate geographic locations. Nine of the 13 pelvic-reduced populations had sequence deletions of varying lengths, all of which were located at the Pel enhancer. The authors suggested that the Pitx1 locus of the stickleback genome may be prone to double-stranded DNA breaks that are subsequently repaired by nonhomologous end joining (NHEJ) (84). There are also examples of this in Lepidoptera. The cortex gene encodes a member of the Cdc20/fizzy family of cell cycle regulators. The cortex gene locus is frequently mutated in Lepidoptera, as in the black (carbonaria) form of the peppered moth (Biston betularia) in industrial melanism(85), the diversification of the leaf wing of oakleaf butterflies(86), the mimicry wing patterning in Heliconius butterflies(87), and the plastic wing pattern in Junonia coenia (88).
The explanation of DNA fragility is very interesting, but it still cannot explain why DNA sequence mutations occur frequently at a specific site corresponding to the adaptive phenotype. DNA fragile sites are widely distributed in the genome(89). Under the same environmental stimulus, all fragile sites would be expected to undergo frequent mutations, which would clearly impose an adverse burden on the organism.

Some research suggests that common fragile sites (CFSs) are specific regions in the genome of all individuals that are prone to DNA double-stranded breaks (DSBs) and subsequent rearrangements, and that the DSB sites are correlated with epigenetic markers for chromatin accessibility, including DNaseI HSSs, H3K4me3, and CTCF binding sites(90, 91).

Moreover, some studies have shown that there is a mutation bias in the genome; compared with the intergenic region, the mutation frequency is reduced by half inside gene bodies and by two-thirds in essential genes. That work also compared the mutation rates of genes with different functions. The mutation rate in the coding region of essential genes (such as those involved in translation) is the lowest, and the mutation rates in the coding region of specialized functional genes (such as those involved in environmental response) are the highest. These patterns are mainly affected by the features of the epigenome(92). Epigenetic modifications include DNA methylation, histone modifications, remodeling of nucleosomes and higher-order chromatin reorganization(93). TFs, such as BTB-ZF proteins, can recruit chromatin modification factors such as DNA methyltransferase 1 (DNMT1), cullin3 (CUL3), histone deacetylase 1 (HDAC1), and histone acetyltransferase 1 (HAT1) to perform chromatin remodeling at specific genomic sites based on specific DNA recognition capabilities(94). In addition, programmed events related to DSBs and repair at genome-specific sites have been reported. For example, PRDM9 can determine recombination hotspots by H3K4 and H3K36 trimethylation (H3K4me3 and H3K36me3) of nucleosomes near its DNA-binding site in meiotic recombination(95). This suggests that, on the basis of randomness, TFs and epigenomic features can determine the mutation bias at unique positions of the genome.

Materials and Methods

Silkworm strains

The bd and bdf mutant strains and the wild-type strains Dazao and N4 were obtained from the bank of genetic resources of Southwest University. Silkworms were reared on mulberry leaves or artificial diets at 25 °C and 73% relative humidity under a 12 h light/12 h dark photoperiod.

Positional cloning of the bd locus

For mapping of the bd locus, F1 heterozygous individuals were obtained from a cross between a bdf strain and a Dazao strain. Then, an F1 female was crossed with a bdf male (BC1F), and an F1 male was backcrossed with a bdf female (BC1M). A total of 1162 BC1M individuals were used for recombination analysis. Genomic DNA was extracted from the parents (Dazao, bdf and F1) and each BC1 individual using the phenol chloroform extraction method. Available DNA molecular markers were sought through polymorphism screening and linkage analysis. The primers used for mapping are listed in Table S9.

Phylogenetic analysis

To determine whether Bm-mamo orthologs existed in other species, the BlastX program of the National Center for Biotechnology Information (NCBI) (http://www.ncbi.nlm.nih.gov/BLAST/) was used. The sequences of Bm-mamo-L and Bm-mamo-S were blasted against the nonredundant protein sequence (nr) database. Sequences with a maximum score and E-value ≤ 10⁻⁴ were downloaded. The predicted amino acid sequences from multiple species were subjected to multiple sequence alignment by MUSCLE(96). Then, a phylogenetic tree was constructed using the neighbor-joining method with the MEGA7 program (Pearson model). The confidence levels for various phylogenetic lineages were estimated by bootstrap analysis (2,000 replicates).

Quantitative PCR

Total RNA was isolated from the whole body and integument of the silkworms using TRIzol reagent (Invitrogen, California, USA), purified by phenol-chloroform extraction and then reverse transcribed with a PrimeScript™ RT Reagent Kit (TAKARA, Dalian, China) according to the manufacturer’s protocol. qPCR was performed using a CFX96™ Real-Time PCR Detection System (Bio-Rad, Hercules, CA) with iTaq Universal SYBR Green Supermix (Bio-Rad). The cycling parameters were as follows: 95 °C for 3 min, followed by 40 cycles of 95 °C for 10 s and annealing for 30 s. The primers used for target genes are listed in Table S9.

Expression levels for each sample were determined with biological replicates, and each sample was analyzed in triplicate. The gene expression levels were normalized against the expression levels of the ribosomal protein L3 (RpL3). The relative expression levels were analyzed using the classical R = 2^(−ΔΔCt) method.
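
As an illustration of the ΔΔCt calculation, a minimal Python sketch follows. The function name, argument names, and Ct values are hypothetical and are not taken from this study; the sketch assumes that technical replicates are averaged before the target gene is normalized to the reference gene (RpL3 here).

```python
from statistics import mean


def relative_expression(ct_target_sample, ct_ref_sample,
                        ct_target_control, ct_ref_control):
    """Return R = 2^(-ddCt) from lists of technical-replicate Ct values.

    All argument names are illustrative; 'ref' is the reference gene
    (RpL3 in the text) and 'control' is the calibrator sample.
    """
    # dCt: target Ct minus reference Ct, within each sample
    d_ct_sample = mean(ct_target_sample) - mean(ct_ref_sample)
    d_ct_control = mean(ct_target_control) - mean(ct_ref_control)
    # ddCt: sample dCt minus calibrator dCt
    dd_ct = d_ct_sample - d_ct_control
    return 2 ** (-dd_ct)


# Hypothetical Ct values (triplicates), for illustration only
ratio = relative_expression(
    ct_target_sample=[24.0, 24.0, 24.0],
    ct_ref_sample=[18.0, 18.0, 18.0],
    ct_target_control=[22.0, 22.0, 22.0],
    ct_ref_control=[18.0, 18.0, 18.0],
)
print(ratio)
```

With these made-up values, the target gene has a ΔCt of 6 in the sample and 4 in the calibrator, so ΔΔCt = 2 and the relative expression is one quarter of the calibrator level.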

siRNA for gene knockdown

siRNAs for Bm-mamo were designed by the siDirect program (http://sidirect2.rnai.jp). The target siRNAs and negative control were synthesized by Tsingke Biotechnology Company Limited. The siRNA (5 μl, 1 μl/μg) was injected from the abdominal spiracle into the hemolymph at the fourth instar (day 3) larval stage. Immediately after injection, phosphate-buffered saline (pH 7.3) droplets were placed nearby, and a 20 V pulse of one second followed by a one-second pause was applied three times. The phenotype was observed in fifth-instar larvae. The left and right epidermis were separately dissected from the injected larvae, and RNA was extracted. Then, cDNA was synthesized, and the expression level of the gene was detected by qPCR.

sgRNA synthesis and RNP complex assembly

CRISPRdirect (http://crispr.dbcls.jp/doc/) online software was used to screen appropriate single guide RNA (sgRNA) target sequences. The sgRNAs were synthesized by Beijing Genomics Institute. sgRNA templates were transcribed using T7 polymerase with RiboMAX™ Large-Scale RNA Production Systems (Promega, Beijing, China) according to the manufacturer’s instructions. The RNA transcripts were purified using 3 M sodium acetate (pH 5.2) and anhydrous ethanol (1:30) precipitation, washed using 75% ethanol, and eluted in nuclease-free water. All injection mixes contained 300 ng/µL Cas9 nuclease (Invitrogen, California, USA) and 300 ng/µL purified sgRNA. Before injection, mixtures of Cas9 nuclease and sgRNA were incubated for 15 min at 37 °C to reconstitute active ribonucleoproteins (RNPs).

Microinjection of embryos

For embryo microinjection, microcapillary needles were manufactured using a PC-10 model micropipette puller (Narishige, Tokyo, Japan). Microinjection was performed using Eppendorf’s TransferMan NK 2 and a FemtoJet 4i system (Eppendorf, Hamburg, Germany). The eggs used for microinjection came from the mating of female and male wild-type Dazao moths. Within 4 hours of being laid, the eggs were adhered onto a clean glass slide. CRISPR/Cas9-messenger RNP mixtures with volumes of approximately 1 nL were injected into the middle of the eggs, and the wound was sealed with glue. All injected embryos were allowed to develop in a climate chamber at 25 °C and 80% humidity.

Comparative genomics

The reference genome of Dazao was downloaded from Silkbase (https://silkbase.ab.a.u-tokyo.ac.jp/cgi-bin/index.cgi). The bdf genome was obtained from the silkworm pangenome project(34).

The short reads of the bdf strains were mapped to the silkworm reference genome by BWA(55) v0.7.17 mem with default parameters. The SAMtools(56) v1.11 and Picard v2.23.5 (https://broadinstitute.github.io/picard/) programs were used to filter the unmapped and duplicated reads. A GVCF file of the samples was obtained using GATK(57) v4.1.8.1 HaplotypeCaller with the parameter -ERC GVCF. The VCF files of insertion/deletions (indels) and single-nucleotide polymorphisms (SNPs) were used for further analysis by eGPS software.
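
The mapping and variant-calling steps above can be sketched as a short Python helper that only assembles the command lines and does not run them. This is an illustrative outline, not the exact commands used in this study: all file names are hypothetical, the `picard` and `gatk` wrapper commands are assumed to be on the PATH, a read group is added because GATK requires one, and reference indexing (bwa index, samtools faidx, sequence dictionary) is assumed to have been done beforehand.

```python
def build_variant_calling_commands(ref, fq1, fq2, sample):
    """Return shell command strings for map -> filter -> GVCF (sketch only)."""
    # Read group string; bwa expands the literal "\t" into tabs itself
    rg = f"@RG\\tID:{sample}\\tSM:{sample}\\tPL:ILLUMINA"
    sorted_bam = f"{sample}.sorted.bam"
    mapped_bam = f"{sample}.mapped.bam"
    dedup_bam = f"{sample}.dedup.bam"
    return [
        # 1. Map paired-end reads and coordinate-sort the alignments
        f"bwa mem -R '{rg}' {ref} {fq1} {fq2} | samtools sort -o {sorted_bam} -",
        # 2. Drop unmapped reads (SAM flag 4)
        f"samtools view -b -F 4 -o {mapped_bam} {sorted_bam}",
        # 3. Remove duplicated reads with Picard
        f"picard MarkDuplicates I={mapped_bam} O={dedup_bam} "
        f"M={sample}.dup_metrics.txt REMOVE_DUPLICATES=true",
        # 4. Index the final BAM for GATK
        f"samtools index {dedup_bam}",
        # 5. Call variants per sample in GVCF mode
        f"gatk HaplotypeCaller -R {ref} -I {dedup_bam} "
        f"-O {sample}.g.vcf.gz -ERC GVCF",
    ]


# Hypothetical inputs, for illustration only
commands = build_variant_calling_commands(
    "reference.fa", "bdf_R1.fastq.gz", "bdf_R2.fastq.gz", "bdf"
)
for cmd in commands:
    print(cmd)
```

Keeping the commands as strings makes it easy to review them or write them to a shell script before anything is executed.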

Downstream target gene screening

Online software (http://zf.princeton.edu) was used to predict the DNA-binding site for Cys2His2 zinc finger proteins. The confident ZF domains with scores higher than 17.7 were chosen. RF regression on the B1H model was used to predict the DNA-binding sites. Then, the sequence logo and position weight matrices (PWMs) of the DNA-binding sites were obtained. The sequences 2 kb upstream and downstream of the predicted genes were extracted by a Perl script. The FIMO package of the MEME suite was used to search for binding sites in the silkworm genome.
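
The idea behind searching sequences with a PWM can be illustrated with a simplified Python sketch. It is not a reimplementation of FIMO, which reports p-values for matches; the toy code below applies a plain log-odds score threshold on both strands, and the motif counts, sequences, and threshold are made up for illustration.

```python
import math

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def pwm_log_odds(counts, background=0.25, pseudocount=1.0):
    """Convert per-position base counts into log2-odds scores."""
    pwm = []
    for col in counts:
        total = sum(col[b] for b in "ACGT") + 4 * pseudocount
        pwm.append({b: math.log2(((col[b] + pseudocount) / total) / background)
                    for b in "ACGT"})
    return pwm


def scan(sequence, pwm, threshold):
    """Return (start, strand, score) for windows scoring >= threshold.

    'start' is always a 0-based coordinate on the forward strand.
    """
    width = len(pwm)
    hits = []
    revcomp = sequence.translate(COMPLEMENT)[::-1]
    for strand, seq in (("+", sequence), ("-", revcomp)):
        for i in range(len(seq) - width + 1):
            score = sum(pwm[j][seq[i + j]] for j in range(width))
            if score >= threshold:
                # Map reverse-strand windows back to forward coordinates
                start = i if strand == "+" else len(seq) - i - width
                hits.append((start, strand, score))
    return hits


# Toy motif strongly preferring GATA (hypothetical counts)
toy_counts = [
    {"A": 0, "C": 0, "G": 10, "T": 0},
    {"A": 10, "C": 0, "G": 0, "T": 0},
    {"A": 0, "C": 0, "G": 0, "T": 10},
    {"A": 10, "C": 0, "G": 0, "T": 0},
]
toy_pwm = pwm_log_odds(toy_counts)
forward_hits = scan("CCGATACC", toy_pwm, threshold=5.0)
reverse_hits = scan("CCTATCCC", toy_pwm, threshold=5.0)
print(forward_hits, reverse_hits)
```

In the first toy sequence the motif lies on the forward strand, and in the second it lies on the reverse strand (TATC is the reverse complement of GATA), which is why both strands are scanned.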

Analysis of EMSA

Primers with binding sites and their flanking sequences were designed according to the FIMO results. A biotin label was added to one end of the upstream primer (probe), and the downstream primer was unlabeled. Primers with the same sequence as the labeled probes were used as competitive probes. The EMSA experiment was conducted with an EMSA reagent kit (Beyotime, Shanghai, China) according to the manufacturer’s protocol.

Parthenogenesis

Virgin moths were dissected in 0.9% bacterium-free physiological saline. The ovaries were extracted, and water was removed by clean filter paper. The ovaries were stored at 25 °C and 40% relative humidity for 10 hours. The eggs were treated in 46 °C warm water for 18 minutes and then transferred to room temperature for 5-10 minutes. The activated egg was protected in an environment of 15-17 °C and 18% relative humidity for 3 days. Then, it was immersed in acid immediately (with a hydrochloric acid specific gravity of 1.075 at 46 °C for 5 minutes) or transferred to the same conditions as the hibernating eggs for protection. The average hatching rate of silkworm eggs treated in this way was 50-60%.

Acknowledgements

Funding

This work was supported by the National Natural Science Foundation of China (No. 32002230), the Fundamental Research Funds for the Central Universities in China (No. SWU120024), the National Natural Science Foundation of China (No. 31830094, No. U20A2058), the China Agriculture Research System of MOF and MARA (No. CARS-18-ZJ0102, No. CARS-18-ZJ0103), the Natural Science Foundation of Chongqing, China (No. cstc2021jcyj-cxtt0005), the Yibin Academy of Southwest University (XNDX2022020015), the Southwest University Innovation Research 2035 Pilot Program (No. SWU-XDPY22011) and the High-level Talents Program of Southwest University (No. SWURC2021001).

Author contributions

FY Dai conceived and designed the experiments. SY Wu, CX Peng, JW Luo, KP Lu and CH Zhang performed the study. SY Wu, XL Tong, KP Lu, CL Li, X Ding, YR Lu, XH Duan, D Tan and H Hu analyzed the data. SY Wu wrote the paper. XL Tong and FY Dai edited and revised the manuscript.

Competing interests

The authors declare that they have no competing interests.

Data and material availability

The authors confirm that the data supporting the findings of this study are available within the article and its supplementary materials.

Additional information

Supplementary Information is available for this paper.