A transcriptomic atlas of Aedes aegypti reveals detailed functional organization of major body parts and gut regional specializations in sugar-fed and blood-fed adult females

  1. Bretta Hixson
  2. Xiao-Li Bing
  3. Xiaowei Yang
  4. Alessandro Bonfini
  5. Peter Nagy
  6. Nicolas Buchon  Is a corresponding author
  1. Cornell Institute of Host-Microbe Interactions and Disease, Department of Entomology, Cornell University, United States

Abstract

Mosquitoes transmit numerous pathogens, but large gaps remain in our understanding of their physiology. To facilitate explorations of mosquito biology, we have created Aegypti-Atlas (http://aegyptiatlas.buchonlab.com/), an online resource hosting RNAseq profiles of Ae. aegypti body parts (head, thorax, abdomen, gut, Malpighian tubules, ovaries), gut regions (crop, proventriculus, anterior and posterior midgut, hindgut), and a gut time course of blood meal digestion. Using Aegypti-Atlas, we provide insights into regionalization of gut function, blood feeding response, and immune defenses. We find that the anterior and posterior midgut possess digestive specializations which are preserved in the blood-fed state. Blood feeding initiates the sequential induction and repression/depletion of multiple cohorts of peptidases. With respect to defense, immune signaling components, but not recognition or effector molecules, show enrichment in ovaries. Basal expression of antimicrobial peptides is dominated by holotricin and gambicin, which are expressed in carcass and digestive tissues, respectively, in a mutually exclusive manner. In the midgut, gambicin and other effectors are almost exclusively expressed in the anterior regions, while the posterior midgut exhibits hallmarks of immune tolerance. Finally, in a cross-species comparison between Ae. aegypti and Anopheles gambiae midguts, we observe that regional digestive and immune specializations are conserved, indicating that our dataset may be broadly relevant to multiple mosquito species. We demonstrate that the expression of orthologous genes is highly correlated, with the exception of a ‘species signature’ comprising a few highly/disparately expressed genes. With this work, we show the potential of Aegypti-Atlas to unlock a more complete understanding of mosquito biology.

Editor's evaluation

Hixson et al. provide a large overview of gene expression level of the mosquito Aedes aegypti through the use of RNA-seq. They analyse gene expression changes in the digestive tract, as well as the 3 body regions and the ovaries in various conditions. These organ-specific transcriptomes fill a hole in our understanding of mosquito vector biology and will be an excellent starting point for many researchers to produce new projects.

https://doi.org/10.7554/eLife.76132.sa0

Introduction

Hematophagous mosquito vectors contribute substantially to the world’s disease burden through the transmission of Plasmodium parasites and arboviruses (“World Health Organisation, 2020a | Vector-borne diseases,”). With the rise of insecticide resistance (Moyes et al., 2017; Ranson et al., 2011; “WHO | Insecticide resistance,” World Health Organisation, 2020b), and the expansion of mosquito species’ ranges potentiated by climate change (Liu et al., 2020), new and more efficient means of vector control are needed, grounded in a solid understanding of mosquito physiology. The FlyAtlas project (Chintapalli et al., 2007; Leader et al., 2018) contributed to the field of insect physiology, not only by characterizing the functions associated with the major body parts and organs of Drosophila melanogaster, but by making it possible to quickly ascertain where in the body, and in what quantity, any gene of interest is transcribed. FlyAtlas also showed how the various contributions of different organs added up to constitute the whole fly transcriptome, providing an estimate of the relative transcriptional yield of the parts of the Drosophila melanogaster body. Similar anatomical datasets have been created for several species of Anopheles (Baker et al., 2011; Martínez-Barnetche et al., 2012; Sreenivasamurthy et al., 2017), but not for Aedes mosquitoes. While transcriptional profiles have been created for various Ae. aegypti body parts in diverse studies (e.g head (Ptitsyn et al., 2011), Malpighian tubules (Li et al., 2017), midgut (Angleró-Rodríguez et al., 2017; Cui and Franz, 2020; Dong et al., 2017; Hyde et al., 2020; Liu et al., 2016; Raquin et al., 2017; Xiao et al., 2017), ovaries (Akbari et al., 2013; Nag et al., 2021), carcass (Akbari et al., 2013; Choi et al., 2012), the lack of uniformity of strain and methodology between these studies makes direct comparison difficult.

The mosquito gut is of special interest to vector control, as it is central to the hematophagous lifestyle and serves as the first interface between a mosquito and the pathogens it transmits. Insect guts are highly regionalized organs, with specialized functions differentially distributed along their length (Benguettat et al., 2018; Buchon and Osman, 2015; Buchon et al., 2013; Dutta et al., 2015; Lemaitre and Miguel-Aliaga, 2013). The mosquito gut is divided into five anatomically distinct regions: foregut (comprising pharynx, dorsal diverticula, and crop), proventriculus, anterior midgut, posterior midgut, and hindgut. The crop and dorsal diverticula are muscular cuticle-lined sacs which store imbibed sugar for gradual release into the midgut (Clements, 1992). In Anopheles gambiae, the proventriculus and anterior midgut were found to exhibit an enrichment of transcripts encoding antimicrobial peptides (AMPs) and anti-Plasmodium factors (Warr et al., 2007) suggesting that these regions serve a special defensive function against orally acquired pathogens. Within the midgut, digestive functions are believed to be divided between the anterior and posterior regions, with the former specializing in the digestion of nectar but playing no direct role in the digestion of the blood meal, which is processed in the posterior midgut (Hecker, 1977). The hindgut receives and eliminates excreta from the midgut and urine from the Malpighian tubules (Clements, 1992). Genome-wide transcriptomic profiles have the potential to reveal much more detailed information about the regionalization of function in the mosquito gut. In recent years, many groups have used microarrays and RNAseq to create transcriptomic profiles of mosquito midguts, mostly in the context of infection (Baker et al., 2011; Cui and Franz, 2020; Dong et al., 2017; Hyde et al., 2020; Raquin et al., 2017; Vedururu et al., 2019; Vlachou et al., 2005; Warr et al., 2007). However, to our knowledge, only a single microarray-based study has ever characterized the transcriptomes of individual mosquito midgut regions (Warr et al., 2007) and no transcriptomic profiles have been generated for either foregut or hindgut tissues in any mosquito species.

The Ae. aegypti gut’s response to the ingestion of a blood meal has been characterized as a biphasic process. In the first phase, mRNAs encoding a cohort of ‘early’ serine endopeptidases, transcribed in response to juvenile hormone (JH) secretion during previtellogenic maturation, are translated (Bian et al., 2008; Felix et al., 1991; Jiang et al., 1997; Noriega et al., 1996; Noriega et al., 1997; Noriega et al., 2001). In the second, ecdysone signaling mediates the transcription of a ‘late’ cohort of endo and exopeptidases, with the transcripts of key peptidases and overall proteolytic activity typically peaking around 24 hr post blood meal (pbm) (Barillas-Mury et al., 1991; Brackney et al., 2010; Gooding, 1966; Graf and Briegel, 1982; Graf et al., 1998; He et al., 2021; Isoe et al., 2009b; Noriega et al., 2002). The transcriptional timeline is, however, more complex than a simple early translation/late transcription paradigm might suggest. Several studies have, by RT-qPCR and microarray, documented de novo transcription of peptidases and other genes peaking much earlier than 24 hr pbm (Brackney et al., 2010; Isoe et al., 2009b; Sanders et al., 2003). However, we found no RNAseq data to allow us to compare gut or midgut transcriptomes genome-wide in sugar-fed conditions and across the time-course of bloodmeal digestion.

Here, we aimed to complement existing datasets by, first, contextualizing the adult female gut’s transcriptome with profiles of the whole body and other major body parts and organs (head, thorax, abdomen, Malpighian tubules, ovaries) with the dual goal of (a) creating an at-a-glance reference for the anatomical distribution of transcripts, genome-wide and (b) facilitating the identification of tissue-specific marker genes which may, in the future, be useful in the construction of tissue-specific expression systems. Second, we created profiles for the five anatomically distinct regions of the adult female gut (crop, proventriculus, anterior midgut, posterior midgut, hindgut) to facilitate explorations of the regionalization of gut function. Finally, we profiled transcriptional responses to blood feeding in the anterior and posterior midgut and at multiple timepoints (6, 24, and 48 hr) after blood meal ingestion. We opted to profile Aedes aegypti, as it is among the most important insect vectors (Powell, 2018), with an increasing worldwide range (Iwamura et al., 2020), and a recently updated genome assembly (Matthews et al., 2018). All the resulting transcriptional data can be accessed at the Aegypti-Atlas online database we constructed (http://aegyptiatlas.buchonlab.com/).

With RNAseq profiles for body parts, gut regions, and blood-fed guts in hand, we launched a broad investigation into multiple aspects of mosquito biology. Our inquiries were guided by the following questions: What functions are associated with each body part? What are the functional specializations of the gut regions? How does the gut transcriptome change in the context of blood feeding, and how are those changes distributed in the midgut? What can our atlas tell us about the organization of mosquitoes' antimicrobial defenses? Finally, in a cross-species comparison between the midguts of Ae. aegypti and Anopheles gambiae (s.l.), we asked whether the gut’s structure-function relationship is conserved between two hematophagous mosquito vector species.

Results

The transcriptional landscape of an adult mosquito reflects the embryonic origins and functional signatures of its major tissues

In order to develop a transcriptomic atlas of some of the main body parts of Ae. aegypti, we generated RNAseq profiles from the whole bodies and dissected body parts (head, thorax, abdomen, gut, Malpighian tubules, and ovaries) of sucrose-fed adult female mosquitoes of a field-derived ‘Thai’ strain from a region where arbovirus transmission is endemic (League et al., 2019). For the dissected body parts (but not whole body), we omitted the anterior-most section of the thorax as well as the final segments of the abdomen to exclude transcripts from salivary glands and from sperm in the spermatheca (see Figure 1—figure supplement 1A for diagram). We first evaluated the relative variance between body part and whole-body profiles by Principal Component Analysis (PCA) (Figure 1A). All transcriptomes from the same body part clustered together closely, providing evidence for the replicability of our data. On PC1, body parts were grouped in a manner that accorded with their divergent embryonic origin (primarily mesoderm/ectoderm for head, thorax, and abdomen, endoderm for gut and Malpighian tubules) and whole-body transcriptomes were clustered in between. Ovaries, which contain the germline, strongly separated from the other, exclusively somatic, body parts on PC2. Altogether, PCA indicated that the transcriptomes of body parts are largely defined by their embryonic origin, and that germline tissues possess a highly distinctive transcriptional signature.

Figure 1 with 3 supplements see all
The transcriptomes of Aedes aegypti body parts reflect their embryonic origins and functional specializations.

(A) Principal component analysis of the transcriptomes of whole body and body parts; 3–6 replicates per body part. Circled clusters contain body parts of dominantly endodermal (left) and mesodermal/ectodermal (right) derivation. (B) Expression of body-part-specific putative tissue markers (in TPM) (C) Calculated estimates of body parts’ contributions to whole body transcriptome; Bubble plots: classic Fisher topGO gene ontology enrichment analysis of genes enriched 5 x (head, gut thorax, Malpighian tubules, abdomen) or 2 x (ovaries) relative to whole body (DESeq2, padj <0.05); Treemaps: 20 highest expressed genes per body part, grouped by category/function; percentages reflect the proportion of the transcriptome comprised by the displayed genes.

Each body part we sequenced expressed a set of unique or nearly unique transcripts some of which, we hypothesize, may be exclusive tissue markers. We conducted a census of putative tissue markers (here defined as genes expressed at five or more transcripts per million (TPM) in one body part and enriched at least 50-fold compared to all other body parts). Figure 1B presents a selection of markers from each body part (full census in Supplementary file 1). The quantity of markers ranged from 8 in the abdomen to 391 in the ovaries, which expressed more markers than all other body parts combined; nearly one in fifty of all genes in the genome qualified as an ovary marker. We further noted that ovaries simultaneously expressed the greatest number of highly enriched transcripts relative to other dissected body parts (Figure 1—figure supplement 1B) and the smallest number of highly enriched transcripts relative to whole body (Figure 1—figure supplement 1C). This apparent paradox led us to hypothesize that the ovaries make a disproportionately large contribution to the transcriptome of the whole female mosquito. A gene that is uniquely expressed in the ovaries will not be highly enriched relative to the whole mosquito body if the ovarian transcriptome comprises a large proportion of the whole body’s transcriptome.

To validate our sequencing results, we performed RT-qPCR amplification of one putative marker gene from each body part. We selected a translation factor (AAEL013144) as a reference gene for our assay over more traditional candidates (Dzaki et al., 2017), on the basis of its robust expression and relatively small variation across samples Figure 1—figure supplement 2A. The results of our RT-qPCR confirmed the anatomical specificity of each selected marker (Figure 1—figure supplement 2B Fig).

The existence of transcripts specific to each body part allowed us to calculate their relative contribution to the whole body’s transcriptome (see the ‘mosquito equation’ in Materials and methods). Expressed as percentages, the body part contributions we obtained were: head (4.7%), thorax (19.4%), abdomen (26.2%), gut (14.0%), Malpighian tubules (1.7%), and ovaries (28.8%). It should be noted that the percentage calculated for the thorax is likely to slightly underestimate its actual contribution, as a portion of the thorax was removed (Figure 1—figure supplement 1A) to eliminate salivary glands. This removal likely accounts, at least in part, for the fact that the sum of the calculated scaling factors is 94.9%, rather than 100%. We validated our estimates by using them to predict the whole-body expression of every gene in the genome, then plotting those predictions against observed values from the whole-body transcriptome (Figure 1—figure supplement 3A). Calculated and observed values were highly correlated (slope of 1.00, R2 = 0.95) after removal of a single outlying gene (AAEL018689, a mitochondrial ribosomal RNA) which was underpredicted by our calculation. This analysis validates our hypothesis that the ovaries contribute a disproportionate number of transcripts to the whole-body transcriptome. Further, our data demonstrate that, due to the preponderance of ovarian contribution, multiple tissues have only a low representation in the body, severely limiting the ability to detect tissue-specific changes in whole-mosquito experiments.

To examine the biological processes and molecular functions enriched in each body part, we performed a Gene Ontology Enrichment Analysis (GOEA) of genes 5 x (or, for ovaries, 2 x) enriched in each body part (in comparison to whole body, DESeq2 padj <0.05). An extended list of GO categories may be found in Supplementary file 2. We complemented this approach by identifying the twenty highest expressed genes in each body part (treemaps in Figure 1C), reasoning that the highest expressed genes in a transcriptome may also give insight into function. In brief, the head showed enrichment for sensory machinery and neuronal signaling, the thoracic carcass (which houses flight muscle) showed signs of enhanced metabolic activity, and the abdominal carcass (which contains mostly fat body) displayed enrichment for regulatory proteolytic enzymes (CLIP-domain serine endopeptidases), defense, and lipid transport. The gut was enriched with defense-related genes including immune-activating peptidoglycan recognition proteins (PGRPs), AMPs, and immune-modulating amidase PGRPs. It also showed enrichment for digestive enzymes, especially peptidases. Notably, its three most prevalent transcripts were the non-CLIP serine endopeptidases early trypsin (ET), female-specific chymotrypsin (CHYMO), and juvenile hormone-regulated chymotrypsin (JHA15) which, together, accounted for more than one third of its transcriptome. (Note: Vectorbase gene IDs for all genes named in this text are listed in Supplementary file 3). The Malpighian tubules, which serve a function analogous to mammalian kidneys, expressed large quantities of ion transporters, especially vacuolar ATPases, which may be required to create proton gradients for use in secondary active transport (Weng et al., 2003). Because the ovaries did not express many highly enriched genes, we adjusted our analysis to examine genes expressed 2 x higher than in the whole body and discovered categories related to cell division and germline maintenance. We also observed that a small cohort of mitochondrial genes (COX1, COX2, COX3, ATP6, and a 16 s gene) were among the highest expressed transcripts in several body parts, especially those containing tissues that carry out energy-intensive functions (head, thorax, Malpighian tubules). Overall, we concluded that the most prominent transcripts and enriched GO categories in each body part were consistent with expectations for the tissues they house. We take these results as a validation of our data set, and as confirmation that the segments of the mosquito carcass (head, thorax, abdomen) perform well as proxies for important tissues (brain/eyes, flight muscle, fat body).

Gut region-specific transcriptomes reveal functional compartmentalization of the mosquito midgut

To evaluate the regionalization of gut function, we generated RNAseq profiles for the five main regions of the mosquito digestive tract: crop and dorsal diverticula (hereafter referred to as ‘crop’ for brevity), proventriculus, anterior midgut, posterior midgut, and hindgut. A PCA of these profiles (Figure 2A) demonstrated clear segregation between regions, and close clustering of all replicates from the same region, indicating clearly distinct transcriptomes. The anterior and posterior midgut, which are predominantly derived from endoderm, clustered together opposite the crop and hindgut, which are of primarily ectodermal origin. The proventriculus, which is partially derived from both germ layers, was intermediate between these two clusters on PC1. Whole-gut transcriptomes clustered closely with posterior midgut transcriptomes, suggesting that the posterior midgut contributes more to the transcriptome of the whole gut than do the other regions. Accordingly, very few genes in the posterior midgut showed enrichment compared to the whole gut (Figure 2—figure supplement 1A). GOEA on genes enriched in each region (Figure 2B), complemented by an examination of the twenty highest expressed genes in each (Figure 2C) revealed that the ectodermally derived regions of the gut (crop and hindgut) expressed genes involved in chitin metabolism, lipid metabolism, ion transport and numerous CLIP-domain serine endopeptidases. The proventriculus was notably enriched for wingless signaling and defensive genes (lysozymes, and the highly expressed AMP gambicin, GAM1). In the anterior midgut, carbohydrate metabolism and heme-binding proteins (predominantly cytochrome P450s) were the most prominently enriched categories. The highest expressed gene in this region was a member of the MBF2 family of transcription factors of unknown function in mosquitoes. As the posterior midgut transcriptome closely resembled that of the whole gut, only a handful of genes were enriched 5 x in this region relative to whole gut. We also noted that a large proportion of the genome was either unexpressed or little-expressed in the posterior midgut relative to the other gut regions (Figure 2—figure supplement 1B). To better identify genes enriched in the posterior midgut, we calculated enrichment against an aggregate of the other four gut regions. We found that the posterior midgut is highly enriched for peptidases (especially non-CLIP serine endopeptidases) and genes involved in translation. Other enriched categories include signal recognition particle (SRP)-dependent targeting, amino acid transport and amino acid metabolism. Together these categories describe a region that is primed to translate and secrete large quantities of peptidases, and to process and absorb the resulting free amino acids. Overall, our data demonstrate that the five gut regions are highly specialized functional units.

Figure 2 with 1 supplement see all
Mosquito gut regions’ transcriptomic investments and outputs reveal digestive specializations.

(A) Principal component analysis of the transcriptomes of whole gut and gut regions; 3–6 replicates. Circled clusters contain body parts of partial endodermal (left) and ectodermal (right) derivation. (B) Bubble plots: classic Fisher topGO gene ontology enrichment analysis of regionally enriched genes (for crop, proventriculus, and anterior midgut: 5 x enriched over whole-gut, DESeq2 padj < 0.05; for posterior midgut: enriched over all other gut regions, DESeq2 padj < 0.05) (C) Treemaps: 20 highest expressed genes per region, grouped by category/function; percentages reflect the proportion of the transcriptome comprised by the displayed genes. (D) Regional investments in categories of digestive enzymes and transporters. Colored cells are scaled as percentages of the region with the highest transcriptional investment in the category (as measured by cumulative TPM). The region with the highest investment is scored as ‘100’, with its cumulative TPM displayed in the rightmost column (‘Max TPM’). (E) Estimates of regions’ transcriptional output by empirical quantification of RNA yield; statistics: One-way ANOVA, Tukey HSD. (F) Estimated output of digestive enzymes/transporters by region (categorical investments weighted by regional RNA yield).

The central function of the gut is digestion, a highly sequential process. To better understand how digestion and nutrient absorption are distributed along the gut, we examined the regional cumulative expression of enzymes and transporters putatively involved in the digestion and absorption of lipids, carbohydrates, and proteins/amino acids. We postulated that the cumulative expression of a given set of digestive enzymes/transporters of common function reflects the investment of each region in that function (what fraction of their transcriptome they dedicate to that process). We first assembled lists of lipases, lipid-transporting proteins, amylases/maltases, glucosidases, sugar transporters, peptidases, and amino acid transporters with a probable digestive function (excluding genes with anticipated non-digestive function, e.g., peptidases annotated as, or orthologous with, CLIP-domain serine endopeptidases, caspases, proteasome components, etc.). For complete lists and inclusion criteria, see Supplementary file 4. We summed the expression of the genes from each category (in TPM) and calculated their relative expression by region (Figure 2D). The highest investments in different categories of lipases were divided among several regions, with phospholipases receiving the greatest investment from crop and anterior midgut, sphingomyelinases from anterior midgut, and the remainder of lipases modestly more expressed in the posterior midgut and hindgut. The crop was the greatest investor in lipid carrier proteins and fatty acid transporters, while the anterior midgut showed the greatest investment in phospholipid and sterol transporters. The highest investments in sugar digestion/absorption and protein digestion were found in the anterior and posterior midgut, respectively. The hindgut was a disproportionate investor in amino acid transporters, suggesting that many of the products of protein digestion from the midgut are absorbed there. These data demonstrate that distinct gut regions invest different amounts of their respective transcriptomes in the digestion and absorption of different nutrients, hinting at a system of specialization/sequential processing of ingested materials.

Caution should be used in the interpretation of cumulative expression data, as investment (in TPM) is distinct from output (i.e. the number of transcripts of a given gene or category produced in a region). The five regions do not produce equal numbers of transcripts overall, so their transcriptional investments must be scaled by their total transcriptional yield to achieve an estimate of their relative contributions to specific functions. We measured the RNA content from each region (Figure 2E) and scaled the cumulative expression values from Figure 2D by yield to gain an estimate of the transcriptional output of each region with respect to each digestive category (Figure 2F). We found that the posterior midgut, by virtue of its outsized transcriptional yield, is the dominant contributor not only of peptidases, but of most categories of transcripts with digestive and absorptive functions. However, the proventriculus and anterior midgut together are the source of more than half of all amylases/maltases and glucosidases and contribute a sizeable minority of all sugar transporters. These observations cement our conclusion that the anterior midgut specializes in the digestion and absorption of carbohydrates, while the posterior midgut is responsible for most protein and lipid digestion.

Blood feeding initiates a series of transcriptomic shifts over the course of digestion

The gut of the hematophagous mosquito undergoes dramatic changes upon blood feeding. To better understand these changes at a transcriptome-wide level, we generated RNAseq profiles for sugar-fed guts as well as guts at 4–6 hr (hereafter referred to as 6 hr for brevity), 24 hr, and 48 hr after feeding on a live chicken. PCA (Figure 3A) demonstrates that the transcriptome changes throughout at least the first 24 hr pbm, before reverting to a near-basal condition by 48 hr pbm, suggesting that the transcriptional regulation of digestion is a very dynamic process.

Figure 3 with 3 supplements see all
The gut’s transcriptome shifts continuously throughout blood meal digestion.

(A) Principal component analysis of the transcriptomes of sugar-fed and blood-fed guts; 3–6 replicates per condition. (B) Classic Fisher topGO gene ontology enrichment analysis of blood-fed guts at each timepoint pbm, in comparison to prior timepoint (DESeq2, padj <0.05, minimum of 2 x fold-change). (C) Investments in categories of digestive enzymes and transporters by timepoint/condition. Colored cells are scaled as percentages of the region with the highest transcriptional investment in the category (as measured by cumulative TPM). The region with the highest investment is scored as ‘100’, with its cumulative TPM displayed in the rightmost column. (D) Proportional investment (out of whole transcriptome) in selected gene categories/families by timepoint/condition. (E) Expression of selected peptidases, scaled as a percentage of maximum expression. The region with the highest expression (in TPM) is scored as ‘100’, with its TPM displayed in the rightmost column. (F) Relative expression of selected peptidases at 0, 6, 12, 24, 36, 48, and 72 hrs post-blood meal measured by RT-qPCR; reference gene: AAEL009653; 3 replicates.

To evaluate the functional changes to the blood-fed gut’s transcriptome, we performed a GOEA (Figure 3B) comparing the gut at each time-point to its state at the time-point prior. Guts at the final timepoint (48 hr) were also compared to sugar-fed guts. The analysis was limited to genes that were up or downregulated at least two-fold (DESeq2 padj <0.05).

We found that, at 24 hr pbm, after ecdysone titers reach their peak, (Hagedorn et al., 1975), steroid receptor activity was significantly enriched. A total of five steroid receptor genes were upregulated, including EcR, USP (encoding the co-receptor to EcR), Hnf4, ftz-f1, and HR3. Multiple categories pertaining to protein synthesis and secretion were also regulated by blood-feeding: at 6 hr pbm, a small number of genes associated with mRNA splicing, protein folding, and SRP-dependent secretion were upregulated, while genes associated with miRNA inhibition of translation were downregulated. However, by 24 hr pbm, all these changes were reversed and, additionally, genes associated with peptide transport (39 genes) and translation (57 genes) were significantly downregulated. At 48 hr pbm, protein folding, peptide transport, and translation were all upregulated again. Notably, translation was one of very few categories that was significantly enriched in 48 hr blood-fed guts compared to exclusively sugar-fed guts.

Several of the categories regulated by blood feeding appear to reflect necessary adaptations to the changing luminal environment. Six hrs following blood meal ingestion, ferric iron-binding proteins (including three ferritin subunit precursors, AAEL004335, AAEL007383, and AAEL007385) were upregulated and two genes in the lysozyme activity category (LYSC11, and LYSC4) were also induced, possibly in response to the previously described increase of ROS and bacterial density pbm (Gusmão et al., 2010). At 24 hr pbm, the defense response category was also induced. Two defensins (DEFA and DEFD) were enriched, as well as two peptidoglycan-degrading amidases (PGRP-SC1, and PGRPS4), orthologs of which, in Drosophila, modulate the activation of the immune deficiency (IMD) pathway (Buchon et al., 2009). At 48 hr pbm, both the ferric iron-binding and defense response categories were downregulated relative to 24 hr, which is unsurprising as most of the luminal content had been evacuated by that time (Figure 3—figure supplement 1A). We also noted a significant induction of genes involved in cellular respiration which could reflect the metabolic expenditure required for the peristaltic evacuation of the blood bolus. GO categories related to the digestion and absorption of macronutrients were nearly all upregulated at 6 hr pbm, with some upregulated again at 24 hr but downregulated to return to baseline by 48 hr. In summary, GOEA indicates that blood feeding regulates the gut’s transcription of genes involved in diverse functions, including digestion, absorption, translation, and defense.

In adult mosquitoes, a peritrophic matrix is synthesized in the hours immediately following the ingestion of a blood meal. Previous studies have documented or proposed the involvement of several genes in peritrophic matrix synthesis in Ae. aegypti. These include the glucosamine-fructose-6-phosphate amino transferase AeGfat-1, which catalyzes the first step of chitin biosynthesis, the chitin synthase AeCs, which catalyzes the final step (Kato et al., 2006), and the peritrophins AAEL004798, AAEL006953, and Aper50 (Shao et al., 2005; Whiten et al., 2018). All five of these genes were robustly expressed in our blood-fed guts, and two (AeGfat-1 and Aper50) were significantly upregulated at 6 hr pbm. Notably, the peritrophin AAEL006953 was heavily concentrated in the proventriculus at baseline and, at a TPM of over 68,000, was that region’s top expressed gene (Figure 2B). Other candidates, orthologous to Drosophila chitin synthases (AAEL002718) and peritrophins (AAEL005702, AAEL009585) showed little expression in the gut at any timepoint. Surprisingly, DUOX, the ortholog of a dual oxidase which catalyzes the formation of a dityrosine network in the mucin layer adjacent to the An. gambiae peritrophic matrix (Kumar et al., 2010), was notably absent in the Ae. aegypti gut (Figure 3—figure supplement 1B).

To characterize the blood-fed gut’s changes in digestion/absorption in more detail, we calculated the cumulative investment of the gut in each digestive category at each timepoint (Figure 3C). Overall, cumulative transcript analysis echoed the results of our GOEA. Both showed that the gut’s transcriptional investment in sphingomyelinases and amino acid transporters peaked at 6 hr pbm, and that the expression of carbohydrate digestive enzymes was elevated at 24 hr pbm but declined again at 48 hr pbm. Our granular cumulative expression analysis revealed that aminopeptidases, dipeptidases, and dipeptidyl peptidases reached peak investment at 6 hr, and carboxypeptidases at 24 hr. Remarkably, the gut’s investment in serine endopeptidases (non-CLIP), which constitute the majority of peptidase transcripts in the gut by nearly an order of magnitude, remained steady across sugar-fed, 6 hr, and 24 hr pbm guts, before plunging by approximately one third at 48 hr pbm.

We extended our cumulative investment approach to other functions in the gut by aggregating transcripts of other transcriptionally prominent families and comparing their relative proportions across all timepoints (Figure 3D). This revealed the dramatic temporary increase in the expression of a family of twelve genes containing the insect allergen domain (IPR010629) which are orthologous to the An. gambiae G12 gene and have been found to possess hemolytic, cytolytic, and antiviral properties (Foo et al., 2021). We also observed that the family of MBF2 transcription factors - one member of which was previously noted among the highest expressed genes of the anterior midgut (Figure 2C) - surged from just over 1% of the transcriptome to more than 4% at 24 hr pbm, partially subsiding to approximately 1.7% by 48 hr pbm. Our comparison of summed transcripts yielded an apparent paradox: while GOEA demonstrated the upregulation of multiple peptidases at 6 hr pbm, the gut’s overall investment in peptidases (cumulative TPM) had actually declined at this timepoint from 39.7% to 32.7% of the transcriptome, before rising to 41.2% at 24 hr pbm. This contradiction is explained by the depletion of transcripts from the small ‘early’ cohort of peptidases (ET, CHYMO, and JHA15) which compensated for the upregulation of other peptidases at 6 hr pbm. It should be noted that, while peptidase investment did not increase much in blood-fed guts, higher RNA yield Figure 3—figure supplement 1C caused peptidase transcript output to increase significantly at 24 hr pbm (Figure 3—figure supplement 1D).

To more closely examine the dynamics of peptidase transcription in the blood-fed gut, we performed a clustering analysis (Figure 3—figure supplement 2A) which revealed that the transcript levels of many peptidases in the gut change dramatically over the course of blood meal digestion, that these changes come in the form of both induction and repression/depletion, frequently followed by a return to baseline at the subsequent timepoint, and that they manifest in multiple successive waves. Our clustering analysis divided the dynamically expressed gut peptidases into three waves of induced transcripts (rapid, intermediate, and delayed), and three corresponding waves of repressed or depleted transcripts. Among induced peptidases, the rapid and delayed waves comprise genes that peaked in our data set at 6 hr and 24 hr pbm, (23 and 40 genes respectively). Our clustering analysis implied that another, ‘intermediate’ cohort of induced peptidases (16 genes) nests somewhere between these waves. These genes show sustained induction from 6 to 24 hr, and likely peak at some time between these two timepoints. The waves of rapid, intermediate, and delayed repressed/depleted peptidases mirror the expression patterns of their induced counterparts in inverse. Figure 3E shows the expression of representative peptidases from each of the six cohorts across the blood feeding kinetic. We validated our findings by performing RT-qPCR on selected peptidases (two repressed/depleted and five induced genes) at 6, 12, 24, 36, 48, and 72 hr pbm. We selected RpS30 (AAEL009653), a housekeeping gene with robust and steady expression over the course of blood meal digestion, as a reference gene for our assay (Figure 3—figure supplement 2B). Figure 3F shows the relative expression of each peptidase (for unscaled data normalized only to the reference gene, see Figure 3—figure supplement 2C). At each timepoint we assayed, we captured one or more genes peaking sharply, further illustrating the dynamic nature of peptidase expression in the blood-fed gut. Most peptidases are organized in genomic clusters which, we hypothesized, could underlie co-regulation. We also examined the evolutionary relatedness of the peptidases from each temporal cluster by comparing their positions on a phylogenetic tree (composed using Geneious R11 software, Figure 3—figure supplement 3A). While sequence similarity corresponded closely with physical location in most cases, neither parameter appeared correlated with temporal clustering. Altogether, our data reveal that blood feeding initiates a well-orchestrated transcriptomic response in the gut, dominated by the dynamic up and down-regulation of peptidases throughout the entire process of digestion.

Regional digestive specializations are preserved in the blood-fed midgut

Considering the extensive functional regionalization of the Ae. aegypti gut and the profound reshaping of the whole-gut transcriptome upon blood feeding, we asked whether regional differences are preserved in the blood-fed gut. To evaluate the regionalized effects of blood feeding, we generated transcriptomes for dissected anterior midguts (without proventriculus) and posterior midguts at 24 hr pbm. PCA (Figure 4A) revealed that all profiles clustered both by region and by blood feeding status. As we previously noted (Figure 2A), there was little difference between sugar-fed whole gut and posterior midgut profiles. Upon blood feeding, both whole gut and posterior midgut shifted in tandem, maintaining proximity, indicating that blood feeding does not lessen the dominance of the posterior midgut’s transcriptional yield over that of the anterior. Both midgut regions shifted in response to blood feeding; moreover, they moved in the same directions with respect to PC1 and PC2, suggesting that the overall transcriptional responses of the two regions share some commonality. We noted that the relative investments of these two regions in most of the prominent functional categories did not change dramatically upon blood feeding (Figure 4B). However, two of the most remarkable changes in the transcriptome of blood-fed mosquitoes, the upregulation of MBF2 transcription factors and G12 genes (Figure 3D), were mainly localized to the posterior midgut.

Regional specializations in sugar and protein digestion are preserved in the blood-fed midgut.

(A) Principal component analysis of the transcriptomes of sugar-fed and 24 hr blood-fed whole guts and anterior and posterior midguts; 3–6 replicates per condition. (B) Proportional expression (out of whole transcriptome) of selected gene categories/families by region/condition. (C) Classic Fisher topGO gene ontology enrichment analysis of blood-fed versus sugar-fed regions (DESeq2, padj <0.05, minimum of 2 x fold-change). Categories that are regulated in both regions are shown in red. (D) Investments in categories of digestive enzymes and transporters by region/condition. Colored cells are scaled as percentages of the region/condition with the highest transcriptional investment in the category (as measured by cumulative TPM). The treatment with the highest investment is scored as ‘100’, with its cumulative TPM displayed in the rightmost column. (E) Estimates of midgut regions’ transcriptional yield by empirical quantification of RNA in sugar-fed guts versus 24 hr after feeding with an artificial blood replacement diet; statistics: unpaired t-test. (F) Estimated output of digestive enzymes/transporters by midgut region in sugar-fed and blood-fed mosquitoes (categorical investments weighted by regional RNA yield).

A GOEA of categories upregulated and downregulated by blood feeding in the anterior and posterior midgut (Figure 4C) mainly recapitulated changes already noted in blood-fed whole guts (Figure 3B). Most categories that were upregulated in the anterior midgut were similarly upregulated in the posterior, albeit to differing extents. The fact that blood feeding promotes enrichment of many of the same digestive/absorptive categories in both anterior and posterior midgut could imply that the two regions are converging on a common role with respect to macronutrient exploitation. However, an analysis of cumulative expression (Figure 4D) demonstrated that the two regions’ respective investments in digestive enzymes and transporters remained highly divergent. Consistent with the results of the GOEA, the posterior midgut substantially increased its investment in lipase transcripts, with a smaller increase in the anterior midgut. By contrast, the anterior midgut kept a substantial edge in its investment in lipid transporters and lipid carrier proteins. The posterior midgut increased its investment in amylases/maltases and glucosidases by approximately five and twofold, respectively, and the anterior midgut increased both by less than twofold. However, the difference in baseline expression was such that the blood-fed anterior midgut’s investment was still ten times that of the blood-fed posterior midgut. Neither region substantially increased its investment in sugar transporters, peptidases, or amino acid transporters. When these regional investments were weighted by the increased RNA yield observed in both blood-fed midgut regions (Figure 4E), we found that the proportional output of the posterior midgut increased across most digestive categories, but that the anterior midgut still contributed a substantial minority of transcripts for sugar-digesting enzymes and sugar transporters (Figure 4F). Altogether, we concluded that both midgut regions participate in the transcriptional response to blood feeding, and that the blood-fed anterior and posterior midgut retain their regional specializations with respect to carbohydrate and protein digestion.

Immune gene patterning reveals areas of immune activity and immune tolerance in the mosquito body and gut

As mosquitoes are prolific vectors of important human pathogens, their defensive functions are of paramount interest. We assembled a table recording the expression of immune genes in each category by body part, gut region, and blood-fed timepoint (Supplementary file 5). Many genes associated with the siRNA and piRNA pathways display a strong tropism toward the ovaries (Figure 5A), possibly indicating that, in mosquitoes, siRNAs as well as piRNAs (Siomi et al., 2011) are required to protect the germline from selfish elements. Accordingly, the gene loqs2, which has been described as an essential component of Ae. aegypti’s systemic siRNA response to Zika and dengue viruses (Olmo et al., 2018) was restricted to the ovaries to such an extent that it qualified as an ovary-specific marker (Figure 1B). The ovaries were also the site of the highest expression of genes coding for components of Toll, IMD, and JAK-STAT signaling. Remarkably, genes coding for extracellular regulators (such as most CLIP-domain serine endopeptidases) and pattern recognition proteins (PGRPs and Gram-negative bacteria-binding proteins, GNBPs) did not show this pattern of expression. Instead, they were generally highly expressed in the head, thorax and abdomen, the three compartments that contain fat body. Even more remarkably, AMPs (Figure 5B) showed negligible expression in the ovaries. Our results suggest that, in the female germline, portions of the IMD, Toll, and JAK-STAT pathways are uncoupled from upstream pattern recognition proteins and downstream AMP targets, possibly playing a regulatory role in some alternative process (e.g. development).

Immune gene patterning reveals zones of immune activity and tolerance in the mosquito body and gut.

(A) Body parts’ investments in immune pathway-related genes, scaled as a percentage of maximum expression. The body part with the highest expression (in TPM) is scored as ‘100’ with its TPM displayed in the rightmost column. Note: immune-related genes may serve an alternate, developmental role in ovaries. (B) Cumulative antimicrobial peptide expression by body part (TPM); statistics: One-way ANOVA, Tukey HSD. (C) Expression of immune activating (white) /modulating (gray) genes, scaled as a percentage of maximum expression. The region with the highest expression (in TPM) is scored as ‘100’ with its TPM displayed in the rightmost column of each half of the table. For genes with very low expression (Max TPM <1) color scale is replaced with gray. (D) Cumulative antimicrobial peptide expression by gut region (TPM); statistics: One-way ANOVA, Tukey HSD. (E) Output of immune-related genes and gene categories by gut region (TPM weighted by regional RNA yield).

An examination of the distribution of AMPs and lysozymes in the body (Figure 5B) revealed that two AMPs, holotricin (GRRP) and gambicin, together accounted for most AMP transcripts in the mosquito body. These two AMPs were expressed in different body parts in a near mutually exclusive manner. The head, thorax, and abdomen of the mosquito predominantly expressed holotricin transcripts, with very little gambicin, while the gut and Malpighian tubules were dominated by gambicin expression and expressed very few holotricin transcripts.

Immune activity in the gut is tightly regulated as this organ interfaces with both commensal and pathogenic microbes. In Drosophila, the IMD pathway is the main pathway controlling AMPs in the midgut, and its activity is strongly constrained by expression of amidase PGRPs (Buchon et al., 2009), while the Toll pathway is expressed mostly in the ectodermal portions of the gut. We found a similar pattern in the gut of Ae. aegypti (Figure 5C) with orthologs of Toll pathway recognition proteins most strongly expressed in the crop while IMD-activating PGRPs (PGRPLC and PGRP-LE) and IMD pathway components were enriched in the midgut. Investment in these IMD-activating PGRPs was highest in the anterior midgut, while immune-modulating PGRPs showed more divided expression. The amidase PGRPLB was most prominently expressed in the anterior midgut, but the short amidase PGRPs (PGRP-SC1 and PGRPS4, orthologs of Drosophila PGRP-SC1a and PGRP-SC1b) were most strongly expressed in the posterior midgut. Likewise, an ortholog of caudal (cad), a transcription factor that limits AMP expression in the Drosophila midgut (Clayton et al., 2013; Ryu et al., 2008), was profoundly enriched in the posterior midgut. Altogether, we found that the expression patterns of these key genes suggested enhanced immune vigilance in the anterior portion of the midgut, with hallmarks of immune tolerance prominent in the posterior midgut. We also observed that blood feeding enhanced the expression of the short amidase PGRPs in the posterior midgut, sharpening the regional divide.

A cumulative analysis of the expression of immune effectors (AMPs and lysozymes) in the gut showed that the proventriculus and anterior midgut, respectively, invest 3.6% and 2.3% of their transcriptomes in the expression of these effectors – especially gambicin (Figure 5D). The posterior midgut, by contrast, expressed just 110 TPM of AMPs and lysozymes combined, supporting our hypothesis that the posterior midgut is characterized by immune tolerance toward microbes. When AMP and lysozyme expression is weighted by gut regional RNA yield (Figure 5E), it is apparent that the anterior midgut and proventriculus contribute the majority of transcripts in these categories despite their relatively small transcriptional yields. Overall, the expression patterns of immune activators, modulators, and effectors suggests that the anterior portions of the midgut may exert strong selection of microbes upon entry, while the posterior midgut is a region of greater immune tolerance.

Gut digestive and defensive regionalization is well conserved between the midguts of Aedes aegypti and Anopheles gambiae

Ae. aegypti and An. gambiae are both important hematophagous vectors, and their digestive tracts bear a clear anatomical similarity. To assess how similar these organs are at the transcriptomic level, we created RNAseq profiles for whole guts (comprising all regions from crop to hindgut) and midgut regions (proventriculus, anterior midgut and posterior midgut) from An. gambiae (s.l.) G3 mosquitoes and compared them to their Ae. aegypti counterparts. In a clustering analysis (Figure 6A), we found that the profiles of midgut regions and whole guts from each species were grouped similarly, with anterior midguts intermediate between proventriculi and posterior midguts. Also, in both species, whole guts clustered between anterior and posterior midgut regions but displayed more similarity to the latter, suggesting that for An. gambiae, as for Ae. aegypti, the transcriptional yield of the posterior midgut greatly exceeds that of the anterior.

Figure 6 with 1 supplement see all
Digestive and defensive regional specializations are well conserved between the Aedes aegypti and Anopheles gambiae midguts.

(A) Clustering analyses of the transcriptomes of Aedes aegypti and Anopheles gambiae whole gut and midgut regions; 3–6 replicates (B) Regional investments in categories of digestive enzymes and transporters. Colored cells are scaled as percentages of the region with the highest transcriptional investment in the category (as measured by cumulative TPM). The region with the highest investment is scored as ‘100’, with its cumulative TPM displayed in the rightmost column of each half of the table. (C) Expression of immune activating (white)/modulating (gray) genes and effectors, scaled as a percentage of maximum expression. The region with the highest expression (in TPM) is scored as ‘100’ with its TPM displayed in the rightmost column of each half of the table. (D) Conserved regional enrichment of gene families and functions in Ae. aegypti and An. gambiae midgut regions, as evaluated by classic Fisher topGO gene ontology enrichment analysis, investment analysis, and a comparison of the twenty highest expressed transcripts.

We repeated our analysis of regional transcriptomic investment in nutrient digestion and absorption by generating lists of genes encoding enzymes and transporters, in the same categories as in Figure 2C, summing their regional expression in TPM, and calculating their relative abundance. Since we lacked transcriptomes for whole body, crop, and hindgut for An. gambiae, it was necessary to adjust the inclusion criteria we employed (Supplementary file 4) for our previous investment analysis. Complete lists of genes in each category with adjusted inclusion criteria can be found in Supplementary file 6 for Ae. aegypti and Supplementary file 7 for An. gambiae. Our comparison across the two species (Figure 6B) revealed overall conservation of gut structure-function. This was particularly evident for the anterior and posterior midguts which, respectively, remained the strongest investors in sugar and protein digestion/absorption. However, in An. gambiae, the posterior midgut appears to make a greater proportional investment in the digestion/absorption of both lipids and sugars. We also noted that, while the posterior midgut in both species was the dominant investor in peptidases, the amplitude of the Ae. aegypti investment far outstripped that of An. gambiae (Figure 6—figure supplement 1A), particularly with respect to serine endopeptidases.

To compare the distribution of defensive functions in the two midguts, we examined the expression patterns of important orthologous immune activators, modulators, and effectors between the two mosquitoes (Figure 6C) and found clear similarities. In both species, the IMD-activating PGRPLC receptor as well as the immune-modulating PGRPLB amidase were most strongly expressed in the anterior midguts, the transcription factor cad was most strongly expressed in the posterior midguts, and the orthologs of GAM1 and LYSC7B (respectively the highest expressed AMP and lysozyme in the Ae. aegypti midgut) were most strongly expressed in the proventriculi. This distribution of key genes leads us to hypothesize that the immune regionalization of the An. gambiae midgut is likely similar to that of Ae. aegypti. It should be noted, however, that the short amidase, PGRPS3, a putative negative regulator of the IMD pathway, is robustly expressed in the An. gambiae proventriculus, in striking contrast to its Ae. aegypti counterparts. We further note that, while the amplitude of GAM1 and LYSC7B/LYSC7 investment is comparable in the two species, the An. gambaie proventriculus and anterior midgut transcriptomes contained large quantities of transcripts for cecropins and defensins, rendering their overall AMP investment more than five times higher than the same regions in Ae. aegypti (Figure 6—figure supplement 1B). In both species, the posterior midgut’s investment in AMPs was negligible. Altogether, the expression of digestive (Figure 6B) and defensive genes (Figure 6C) as well as a GOEA (Figure 6—figure supplement 1C) of the An. gambiae midgut regions confirm that the midgut structure-function relationship is well conserved between the two mosquito species. Figure 6D provides a summary of enriched functions and prominently expressed gene families that are shared by the midgut regions of Ae. aegypti and An. gambiae.

A small number of disparately expressed genes differentiate Aedes aegypti and Anopheles gambiae midgut transcriptomes

The conservation of gut functional regionalization may be due either to conserved expression patterns among orthologs across evolutionary distance, or to convergent patterning of non-homologous genes of shared function. To evaluate how well the relative expression of individual orthologous genes has been maintained between Ae. aegypti and An. gambiae midgut regions, we performed a series of correlation analyses using only the 7,430 genes classified by Orthogroups Analysis (Supplementary file 8) as one-to-one orthologs. Scaled expression values for the orthologs were poorly correlated both in whole guts (Figure 7A), and in dissected regions (Figure 7—figure supplement 1A), with R2 values ranging from 0.08 (proventriculus) to 0.56 (posterior midgut). However, the distribution of data points in the correlation graphs suggested that a few highly/disparately expressed genes substantially depressed correlation. To test the effect of these genes on overall correlation, we sequentially censored genes that were strongly and significantly disparately expressed between both species (DESeq2, padj <0.05), rescaled the expression (TPM) of the remaining one-to-one orthologs, and plotted the resulting changes in slope and correlation coefficient (Figure 7B, Figure 7—figure supplement 1B). For the proventriculus, anterior midgut, posterior midgut, and whole gut the censorship of, respectively 4, 10, 20, and 35 1-to-1 orthologous pairs was sufficient to reveal strong correlation between the remaining orthologs, with slopes ranging from 0.78 to 1.1 and R2 values ranging from 0.77 to 0.89 (Figure 7C, Figure 7—figure supplement 1C). From this, we conclude that a ‘species signature’ masking the transcriptional similarity of Ae. aegypti and An. gambiae midguts is the product of a small number of highly/disparately expressed genes.

Figure 7 with 3 supplements see all
A small number of disparately expressed genes constitute a ‘species signature’ differentiating midgut transcriptomes in Aedes aegypti and Anopheles gambiae.

(A–C) Correlation of one-to-one orthologous genes before and after censorship of the 35 most disparately expressed orthologous pairs. (D) One-to-one orthologs censored in a regional correlation analysis; (^) signifies orthologous pairs censored in both proventriculus and anterior midgut, (†) signifies orthologous pairs censored in both anterior midgut and posterior midgut (*) signifies orthologous pairs censored in both proventriculus and posterior midgut. (E) 20 highest expressed transcripts per region, grouped by category/function; percentages reflect the proportion of the transcriptome comprised by the displayed genes.

Figure 7D displays the genes censored from our correlation analysis in the three midgut regions. The most disparately expressed gene in both the An. gambiae proventriculus and anterior midgut encodes an uncharacterized protein containing a domain (IPR007931) from the Tsetse EP protein which, in tsetse fles has been shown to play a role in resisting the establishment of trypanosome infections (Haines et al., 2010). MBF2 transcription factors were also censored in each region: twice they were expressed higher in Ae. aegypti (proventriculus and anterior midgut) and once in An. gambiae (posterior midgut). Among the disparately expressed orthologs, the most abundant category was ribosomal proteins. We observed that in both the anterior and posterior midgut, Ae. aegypti had significantly increased the expression of some ribosomal proteins relative to their one-to-one orthologs in An. gambiae, and vice versa. Phylogenetic analysis of ribosomal proteins (Figure 7—figure supplement 2A) confirmed that the putatively one-to-one orthologous ribosomal genes identified by Orthogroups Analysis were sister sequences and that the disparately expressed pairs of orthologous genes were widely distributed throughout the phylogenetic tree. We therefore conclude that evolutionary changes between the two species have reshuffled the expression of ribosomal proteins, significantly reducing the roles of some while increasing those of others.

Many of the genes censored in our correlation analysis were among the highest expressed genes in the three Ae. aegypti and An. gambiae midgut regions. A follow-up comparison of the 20 highest expressed genes in the midgut regions of the two species (Figures 2C and 7E) revealed striking differences in both amplitude and function. In the proventriculus, An. gambiae devoted a higher proportion of its transcriptome (63% vs 41% in Ae. aegypti) to the expression of the 20 highest expressed genes. The three highest expressed, including an uncharacterized gene (AGAP013543), the Tsetse EP gene (AGAP009313), and a cecropin (CEC2) together composed more than one third of the transcriptome. Both species’ proventriculi shared expression of two MBF2 transcription factors (AAEL005428 and AAEL013885 in Ae. aegypti, AGAP011630 and AGAP000570 in An. gambiae), as well as the AMP gambicin (GAM1). However, the An. gambiae proventriculus also expressed three cecropins (CEC2, CEC1, CECB), one defensin (DEF1) and a lysozyme (LYSC7) among its top 20 genes, together totaling more than 25% of its transcriptome. Cecropins (CEC1, CEC2) and a defensin (DEF1) were likewise among the 20 highest expressed genes in the An. gambiae but not the Ae. aegypti anterior midgut. In the posterior midgut, the highest expressed peptidases in An. gambiae were much lower expressed than their counterparts in Ae. aegypti, and the top 20 genes in the An. gambiae posterior midgut included two MBF2 transcription factors (AGAP000570, AGAP003713) that were absent among the highest expressed genes in the corresponding region in Ae. aegypti. Altogether, we find that while gut regional specializations have been broadly conserved between Ae. aegypti and An. gambiae, and while most one-to-one orthologous genes maintain closely correlated expression patterns, this correlation is lost among a few of the highest expressed genes in the midgut, creating a ‘species signature’ that differentiates the two midguts’ transcriptomes.

Discussion

Aegypti-Atlas is a new resource which contextualizes and dissects the transcriptome of the sugar and blood-fed Aedes aegypti gut. Aegypti-Atlas will allow exploration of gene expression across the main parts/regions of the female mosquito body/gut in an easily accessible online database. In addition, we have performed a first analysis of this new dataset, examining how biological functions are distributed in the parts of the mosquito body and the regions of the gut, as well as how they change in the gut over the course of blood meal digestion. Our study demonstrates the importance of estimating investment, yield, and output in transcriptomic studies. In addition, this research has yielded new insights into such diverse areas as the regionalization of digestive function, the temporal patterning of peptidase transcription, the organization of immune defenses, and the similarities and differences between the guts of two hematophagous vectors.

Assessments for highest expressed genes and categorical investment/output are valuable complements to GO enrichment analysis

In bioinformatic analysis of large-scale transcriptomic data sets, GOEA is often the preferred method for determining the functional role of a tissue or body part, or for detecting functional differences across conditions. We have employed GOEA to derive valuable information about the enrichment of functions in mosquito body parts, gut regions, and blood-fed conditions. However, enrichment analysis alone cannot capture all the complexities of comparative transcriptomics, as it is primarily sensitive not to quantities of transcripts from a given category, but to quantities of genes belonging to that category. Furthermore, GOEA is substantially less robust in non-model species, owing to incomplete annotations which impact many key genes. We propose that complementary approaches, such as identifying highest expressed genes and examining transcriptomic profiles through the lenses of investment and output, can help to strengthen conclusions and capture dynamics which might otherwise be overlooked. By examining the twenty highest expressed genes in each body part and gut region, we were able to identify genes that account for large proportions of the transcriptome, but which were missed by enrichment analysis (e.g. GAM1 in the proventriculus, MBF2 factors in the anterior midgut). Comparing the top-expressed genes across transcriptomes allowed us to see that some body parts are dominated by a small number of highly expressed genes (e.g. the gut) while others spread their transcriptome more equally over the genome (e.g. the ovaries). Comparing the relative expression of one-to-one orthologs between species allowed us to identify genes that are disproportionately expressed in one species (e.g. the Tsetse EP protein, a putative antimicrobial effector in the An. gambiae proventriculus) and to detect a reshuffling of expression among ribosomal proteins across the two species.

Another approach we adopted – summing transcripts belonging to a functional category and comparing the sums across body parts, regions, or timepoints – allowed us to assess relative investment in the given category. We confined the use of this method to categories of genes encoding proteins that are all presumed to have similar molecular function, narrowly defined (e.g. peptidases cleaving proteins, or antimicrobial peptides killing pathogens). The utility of evaluating regional or temporal specialization by summing transcripts (investment) as opposed to counting genes (enrichment), is evident when we consider that using GOEA, the highest expressed peptidase in the gut carries no more weight than the fifth highest – despite a difference of over 165,000 TPM, or 16.5% of the total gut transcriptome. Using this investment approach, we were also able to break large categories (e.g. peptidases) into smaller categories (e.g. carboxypeptidases, dipeptidases, etc.) for a fine-grained picture of where and when each was most expressed. There is, however, an important caveat to the interpretation of cumulative analysis as it fails to account for differential transcriptional yields between profiled body parts, regions, or conditions. We overcame this caveat by weighting transcriptional investment by RNA yield to calculate estimated regional/temporal output of digestive transcripts, from which we were able to infer how certain tasks (e.g., sugar digestion) are divided between the regions of the gut, as well as how peptidase transcript output changes over the course of blood meal digestion.

Spatial organization of gut function is conserved across mosquito species; few highly/disparately expressed genes drive transcriptomic differences

The Ae. aegypti gut is linearly divided into five anatomically distinct regions. With the caveat that the crop only admits sugar meals, ingested materials encounter each of these regions sequentially. The regions of the gut therefore possess an ordinal quality, corresponding to stages (either transient or prolonged) of the ingestion/digestion process. Through GOEA and quantitative evaluation of transcripts belonging to digestive categories, we discerned a clear pattern of strong investment in sugar digestion/absorption in the anterior midgut, as hypothesized by Hecker, 1977. Peptidases, by contrast, were predominantly expressed in the posterior midgut. These specializations were maintained under blood-fed conditions, and largely conserved in An. gambiae (s.l.) midguts. Other notable instances of conserved regional function are apparent in the proventriculus, where both species share enrichment of wingless signaling components (Buchon et al., 2013) and AMPS (Hao et al., 2003) with other dipteran species.

While the midgut regions of Ae. aegypti and An. gambiae shared many important functions, and their one-to-one orthologs maintained strikingly close transcriptional correlation, we noted that differences in the expression of a small number of highly expressed genes created large disparities in their overall categorical apportionment of transcripts. Most notably, the An. gambiae proventriculus and anterior midgut expressed far greater quantities of AMPs – possibly in response to some microbial presence – as well as a handful of highly expressed but poorly characterized genes, including an apparent ortholog of a tsetse midgut protein with a putative defensive function (Haines et al., 2010). Meanwhile, the An. gambiae posterior midgut expressed far fewer peptidase transcripts than its Ae. aegypti counterpart, and far more of an MBF2 family of transcription cofactors which, in the Ae. aegypti gut, was only prominent in the anterior regions of the sugar-fed midgut and, intriguingly, the blood-fed posterior midgut. In Drosophila and Bombyx mori, MBF2 factors have been shown to complex with the transcription factor FTZ-F1 and cofactor MBF1 to activate transcription of target genes (Li et al., 1997; Liu et al., 2000). We cannot say whether this role is conserved in Ae. aegypti and An. gambiae. However, the expression of MBF2 factors in both species outstripped the expression of their orthologs of FTZ-F1 (AAEL019863, AAEL026810, AGAP005661) and MBF1 (AAEL008768, AGAP004990) by several orders of magnitude in multiple conditions and/or gut regions (see Aegypti-Atlas website), suggesting MBF2 may not be confined to cofactoring FTZ-F1 in these species. Altogether, this analysis yielded multiple ‘species signature’ genes worthy of closer scrutiny and functional study.

The anterior midgut participates in the transcriptional response to blood feeding

The anterior midgut has been held to play little or no role in the process of blood meal digestion, as the blood bolus is sealed into the posterior midgut by the formation of the peritrophic matrix shortly after ingestion (Billingsley, 1990). However, late in the blood meal response (24 hr) we found that the anterior midgut had increased its investment in some categories of digestive enzymes and transporters (e.g. sphingomyelinases, sterol transporters, amylases/maltases, glucosidases, Figure 4D) as well as its overall transcriptional yield (Figure 4E). While the proportional output of the anterior midgut drops relative to the posterior midgut (Figure 4F), it still contributes approximately one third of the transcripts for sugar-digesting enzymes in the whole midgut (exclusive of proventriculus). This finding is congruent with Billingsley and Hecker’s observation that alpha-glucosidase activity was elevated in homogenates of Anopheles stephensi anterior midguts at 24 hr pbm (Billingsley and Hecker, 1991). As the majority of amylases/maltases and glucosidases in the Ae. aegypti genome possess signal peptides but lack transmembrane domains, it is possible that enzymes secreted in the anterior midgut are capable of diffusing into the extra-peritrophic area in the posterior midgut where they may participate in blood meal digestion.

Gut peptidases are dynamically up and downregulated in sequential transcriptional waves upon blood feeding

Peptidase expression in the Ae. aegypti gut unfolds in a series of phases, coordinated by rising and falling titers of JH and ecdysone. In the post-emergence/pre-vitellogenic phase, JH drives the transcription of a cohort of “early” peptidases, priming the gut for its first blood meal (Bian et al., 2008; Jiang et al., 1997; Noriega et al., 2001; Noriega et al., 1997). Within hours of blood feeding, translation of this pool of transcripts is initiated and peptidase activity begins to increase (Felix et al., 1991). Next, rising ecdysone titers drive the transcription of a “late” cohort of peptidases, which complete the digestive process (He et al., 2021). Finally, early peptidases are transcribed anew (Bian et al., 2008; Noriega et al., 1996) amid a postprandial surge of JH (Hernández-Martínez et al., 2015; Lucas et al., 2015; Shapiro et al., 1986; Zhao et al., 2016), restoring the gut to a state of readiness for its next blood meal.

Peptidase activity peaks somewhere between 18 and 36 hr pbm (Gooding, 1966; Graf and Briegel, 1982; Noriega et al., 2002), as do transcripts for some of the highest expressed ‘late’ phase peptidases (Brackney et al., 2010; Isoe et al., 2009a; Isoe et al., 2009b). However, a few publications have documented that some peptidases peak at earlier timepoints (Brackney et al., 2010; Isoe et al., 2009b; Sanders et al., 2003), and that at least one of these (SPI) is directly responsive to/dependent on ecdysone signaling (He et al., 2021). Here, our genome-wide multi-timepoint series has demonstrated that these early-peaking genes are not isolated and exceptional but are, rather, members of large transcriptional cohorts. We show that peptidases in the blood-fed Ae. aegypti gut are expressed not only in two phases: ‘early’ (translational) and ‘late’ (transcriptional), but that the ‘late’ transcriptional response manifests in multiple successive waves or shifts. We have divided these into three (‘rapid’, ‘intermediate’, and ‘delayed’), but because we only obtained RNAseq profiles for guts at baseline and three blood-fed timepoints, we can only speculate how many of these shifts are actually triggered over the course of blood meal digestion, and how synchronously any of the peptidases that our clustering analysis grouped together are actually regulated. Our timepoints were too broadly spaced to say at what time each gene reached its true peak, or whether peptidases that were apparently quiescent throughout blood meal digestion in fact participated in a shift that was too transient to be detected by our experimental design. Future work will likely uncover even more complexity than we have described.

The sharp temporal patterning of peptidase transcription in the blood-fed mosquito gut raises intriguing questions. How is this transcriptional choreography achieved? Is the transcription of all ‘late’ phase peptidases ecdysone-dependent, and are rising ecdysone levels sufficient to drive them? To what extent is the transcriptional induction of succeeding waves dependent on the activity of prior ones? Why are some ‘delayed’ peptidases so slow to show any transcriptional response when ecdysone titers are ascendent (e.g. AAEL008782, Figure 3E)?. And what mechanism effects the precipitous drop in the ‘rapid’ cohort of induced peptidases (ascendent at 6 hr pbm) at a timepoint (24 hr pbm) when ecdysone titers are still high (e.g. AAEL013715) (Hagedorn et al., 1975)? What is the adaptive significance of the rapid up and down-regulation of so many genes of shared function? We observed that the gut’s investment in aminopeptidases, dipeptidases, and carboxypeptidases has distinct temporal peaks, suggesting that polypeptides may be attacked more from one terminus or the other at different times over the course of digestion. However, most of the peptidases that participate in shift-changes are endopeptidases, and it is not clear how the sequential substitution of one set of endopeptidases for another – as opposed to simultaneous expression – helps to move the process of digestion forward. We speculate that rapidly changing conditions in the blood-fed gut may alter the stability and kinetics of specific peptidases, and that different peptidases predominate at different times because they are adapted to function in specific, transient conditions obtaining at corresponding stages of the digestive process.

The anterior and posterior midgut regions are, respectively, characterized by signs of immune activity and immune tolerance

In the midguts of both Ae. aegypti and An. gambiae, the majority of AMP transcripts are contributed by the proventriculus and anterior midgut (Figure 5D, Figure 6—figure supplement 1B) with minimal expression in the posterior midgut. Also, in both species, the transcription factor caudal which, in Drosophila (Ryu et al., 2008) and An. gambiae (Clayton et al., 2013) has been shown to repress AMP expression, is dominantly expressed in the posterior midgut. The expression of immune activating recognition proteins (PGRPLC and, in Ae. aegypti, PGRP-LE) is less profoundly patterned, but still displays some tropism toward the anterior midgut in both mosquitoes. Altogether, these patterns suggest that ingested microbes are subjected to heightened immune surveillance and antimicrobial activity in the first regions of the midgut but are thereafter well tolerated by the posterior midgut, where they may serve some mutualistic functions. We speculate that selection by gambicin in the proventriculus and anterior midgut of Ae. aegypti and An. gambiae may play an important role in determining the composition of the microbiota that seed the posterior midgut (Figure 7—figure supplement 3A).

Holotricin and gambicin are highly expressed under baseline conditions in mutually exclusive regions of the body

Our examination of the expression of immune effectors yielded the observation that at baseline, AMP expression in Ae. aegypti is dominated by two genes: holotricin and gambicin. We observed that holotricin predominated in the carcass, while gambicin was highly expressed in the gut and Malpighian tubules. While it is not uncommon for antimicrobial effectors to display strong tropisms to specific tissues, we are not aware of comparable instances where a single AMP exhibits such overwhelming transcriptional dominance. In future work, it might be interesting to compare the activity of holotricin and gambicin against different types of microbes (Gram-positive, Gram-negative, fungi, etc.), and to examine the effects that silencing these genes has on midgut communities, and on mosquitoes’ survival in the contexts of oral and systemic infection.

Conclusion

In this manuscript we introduce Aegypti-Atlas, a repository of RNAseq data, and demonstrate how these data can yield insights into tissue function, the organization of digestive specializations, regulatory networks, and immune effectors, as well as the changes wrought by diet and evolution in mosquitoes. This resource may also be useful for the creation of functional genomic tools (e.g. tissue-specific expression systems) which will afford researchers a greater degree of control in experiments and allow for more confidence in the interpretation of results. It is our hope that Aegypti-Atlas will be valuable to other investigators in many areas of mosquito biology.

Materials and methods

Mosquito provenance and rearing

Request a detailed protocol

For all experiments, we used 5- to 12-day-old female mosquitoes of the Thai strain (Ae. aegypti) or the G3 strain (An. gambiae, s.l.) kindly provided by Laura Harrington. The Thai colony has been maintained at Cornell University since 2011 and is outcrossed annually with field-collected eggs from Thailand to relieve inbreeding and maintain field relevance. Mosquitoes were reared at a density of 200 larvae per 1 liter tray. Ae. aegypti received 720 mg of fish food (Hikari #04428). An. gambiae received 50 mg of ground fish food per day from day 1 to 4 of development, and 150 mg per day thereafter until pupation. Adults were maintained in humidified chambers (RH 75% ± 5%) at 29 °C on a diet of 10% sucrose ad libitum. Females were reared and maintained in approximately equal proportion with males for a minimum of 5 days prior to blood-feeding and/or dissection and were presumed to be mated.

Blood feeding and mock blood feeding

Request a detailed protocol

For all RNA-seq experiments involving blood-fed guts, mosquitoes were starved for twenty-four hours, then blood-fed on live chickens without anesthetic in accordance with Cornell University IACUC approved protocol #01–56. For our extended blood feeding time series by RT-qPCR, mosquitoes were fed through a membrane on rooster blood treated with the anticoagulant sodium citrate (Lampire Biological Laboratories, Pipersville, PA, catalogue #7208806). For experiments where RNA yield was estimated in guts and midgut regions under blood-fed conditions, we employed an artificial formulation (SkitoSnack) (Gonzales et al., 2018) fed through a membrane. This formulation was used in order to avoid any skewing of yield which might result from RNA content in a natural blood meal. For RNA yield experiments, boluses were removed during dissection.

Dissections and RNA extraction

Request a detailed protocol

Mosquitoes were sacrificed by submersion in 70% ethanol and dissected in sterile PBS. For body part samples, the last 2–3 abdominal segments were removed to eliminate the sperm-containing spermatheca. Whole guts, Malpighian tubules, and ovaries were then dissected out. The residual carcass was divided into head, thorax, and abdomen. A small anterior section of the thoracic carcass containing the salivary glands was excised to exclude salivary gland transcripts from thoracic profiles. For RNAseq experiments, a minimum of 10–20 individuals per replicate was used for large body parts (e.g. whole body, abdominal carcass, whole guts) and >100 individuals per replicate for small gut regions (e.g. crop, proventriculus). For additional information about replicates, see Supplementary file 9. For RT-qPCR time-series experiments, a minimum of 10 individuals was dissected per replicate. For all experiments, a minimum of three replicates was prepared per part and/or condition.

All samples for RNAseq were homogenized in 600 µl TRIzol and stored at –80 °C. For RNAseq experiments, RNA was extracted via a modified phenol-chloroform method (Troha et al., 2018). In brief, samples in TRIzol were thawed and combined with 120 µl of chloroform, followed by vortexing and centrifugation for 15 min at 12,000 g, 4 °C. Between 200 and 250 µl of aqueous supernatant was removed to a fresh tube, where it was mixed first with 700 µl Buffer RLT (Qiagen RNeasy kit, cat #74004) then 500 µl 100% ethanol. This mixture was passed through a RNeasy spin column (700 µl at a time) with centrifugations of 20 s at 10,000 g, followed by a final spin for 1 min at 10,000 g to dry the column. The membrane was then washed twice with 500 µl of Buffer RPE (20 second centrifugation, 10,000 g), followed by a 1 min centrifugation at 16,000 g to dry the column. Samples were eluted in 30 µl of RNase-free water, applied directly to the membrane.

For RNA yield experiments, dissected gut regions were pooled (10 for posterior guts, 20–50 for other regions) in 1 mL Trizol, and the number in each sample was recorded. Samples were then extracted by a standard phenol-chloroform procedure. Exactly 400 µl of aqueous supernatant was recovered from each tube, and RNA was precipitated overnight with isopropanol at –20 °C. Pellets were resuspended in 20–50 µl of nuclease-free water, and concentrations were measured by Qubit. Yield was then calculated as follows: (concentration * suspension volume / # of dissected regions).

Library preparation and sequencing

Request a detailed protocol

Libraries were prepared using the Quantseq 3’ mRNA-seq prep kit from Lexogen according to the manufacturer’s instructions. Sample quality was evaluated before and after library preparation using a fragment analyzer (Advanced Analytical). Libraries were pooled and sequenced on the Illumina Nextseq 500 platform using standard protocols for 75 bp single-end read sequencing at the Cornell Life Sciences Sequencing Core. Sequences have been deposited on NCBI (BioProject ID: PRJNA789580).

Analysis pipelines

Request a detailed protocol

Quality control of raw reads was performed with fastqc (https://github.com/s-andrews/FastQC) (Andrews et al., 2020) and reads were trimmed by BBMap (https://jgi.doe.gov/data-and-tools/bbtools/) and then mapped to the Ae. aegypti transcriptome or the An. gambiae transcriptome Aedes aegypti LVP_AGWG AaegL5.2 and Anopheles gambiae PEST AgamP4.12, VectorBase, https://www.vectorbase.org/ (Giraldo-Calderón et al., 2022) using Salmon version 0.9.1 with default parameters. Salmon’s transcript-level quantifications were then aggregated to the gene level for gene-level differential expression analysis with the “tximport” package on the R version 3.5. DEseq2 (Love et al., 2014) was used for differential expression analysis and PCAs were performed using custom R scripts (available upon request). GOEA was performed using the topGO package (classic Fisher method). Scripts for Salmon, topGO, Orthogroups analysis, and DESeq2 analysis, as well as DESeq2 outputs can be accessed at GitHub (https://github.com/bingdun/RNAseq_analysis_of_Aedes_aegypti; Bing, 2022).

All phylogenetic trees were constructed using Geneious software version R11, with the following settings. Alignment type: Global alignment, Cost Matrix: Blosum62, Genetic Distance Model: Jukes-Cantor, Tree Build Method: Neighbor-Joining. Clustering was performed using the Pheatmap package in R with gene expression scaled by row. To calculate z-scores, we first censored all genes expressed at less than 2 TPM in the relevant body parts or gut regions, then employing the following expression for each gene in each body part or gut region: z = (x - µ)/s, where x is the expression in the body part or gut region, µ represents the mean expression of all body parts or gut regions, and s is the standard deviation. The peptide sequences of AaegL5.2 and AgamP4.12 were used to find orthogroups and orthologs in the two species using OrthoFinder v2.3 (Emms and Kelly, 2019). The longest transcript variant of each gene was extracted to run the OrthoFinder with the following options: ‘-M msa -S blast -A muscle -T iqtree’.

Calculating the ‘mosquito equation’

Request a detailed protocol

If X is defined as the expression of a specific gene in a given body part (in TPM), and the part’s initial stands for a scaling factor equal to the proportion of transcripts in the whole body derived from that part, the relationship between the expression of that gene in each body part versus in the whole body can be expressed as.

(XHeadH)+(XThoraxT)+(XAbdomenA)+(XGutG)+(XMTMT)+(XOvariesO)=XWholeBody

where H+T+A+G+MT+O=100% Whole Body expression.

For marker genes, the expression in other body parts may be considered to be negligible, (i.e. for a head marker, XThorax, XAbdomen, XGut, XMT, and XOvaries are all ≈ 0). Therefore.

XHeadHXWholeBody

We solved for the estimated scaling factor for each of our qualifying markers, then averaged the values we obtained for each body part to create an estimate of the percent of transcripts in the whole body that are contributed by each part.

RT-qPCR

Request a detailed protocol

For all RT-qPCR amplifications, RNA was pretreated with DNase (Quantabio 95150–01 K) and cDNA was prepared using the qScript cDNA synthesis kit (Quantabio 95047–100) and amplified using Perfecta SYBR Green Fastmix (Quantabio 95072–012) in a Bio-Rad CFX-Connect Instrument. The primer sequences used in this study are available in Supplementary file 10.

Data availability

Data have been submitted to NCBI. The BioProject ID is PRJNA789580.

The following data sets were generated
    1. Hixson B
    2. Bing XL
    3. Yang X
    4. Bonfini A
    5. Nagy P
    6. Buchon N
    (2022) NCBI BioProject
    ID PRJNA789580. A transcriptomic atlas of Aedes aegypti reveals detailed functional organization of major body parts and gut regional specializations in sugar-fed and blood-fed adult females.

References

  1. Book
    1. Clements AN
    (1992)
    The Biology of Mosquitoes Volume 1: Development, Nutrition and Reproduction
    London: Chapman & Hall.
    1. Powell JR
    (2018) Mosquito-Borne Human Viral Diseases: Why Aedes aegypti?
    The American Journal of Tropical Medicine and Hygiene 98:1563–1565.
    https://doi.org/10.4269/ajtmh.17-0866

Decision letter

  1. Bruno Lemaître
    Reviewing Editor; École Polytechnique Fédérale de Lausanne, Switzerland
  2. Utpal Banerjee
    Senior Editor; University of California, Los Angeles, United States
  3. Emilie Pondeville
    Reviewer; MRC-University of Glasgow Centre for Virus Research, United Kingdom

In the interests of transparency, eLife publishes the most substantive revision requests and the accompanying author responses.

Decision letter after peer review:

Thank you for submitting your article "A transcriptomic atlas of Aedes aegypti reveals detailed functional organization of major body parts and gut regional specializations in sugar-fed and blood-fed adult females" for consideration by eLife. Your article has been reviewed by 3 peer reviewers, and the evaluation has been overseen by a Reviewing Editor and Utpal Banerjee as the Senior Editor. The following individual involved in review of your submission has agreed to reveal their identity: Emilie Pondeville (Reviewer #2).

All reviewers were quite positive on the findings reported by the manuscript and agree that the paper contains important data that will be of use to the vector biology community. There are however some issues to address (see below) An important one is the lack of clarity/detail in the Materials and methods in regards to how the libraries were constructed and details of how the comparisons across tissues were made. However, collectively they think that the paper is a valuable contribution and if the authors can clarify/revise in regards to those issues it would be a nice paper. We therefore ask you to prepare a revised submission to address the points listed below

Essential revisions:

1. As a dry/wet bioinformatician/molecular biologist, this reviewer acknowledges that this work has been beautifully executed following standard practices for mosquito rearing and following the guidelines of RNA sequencing. Though, data reproducibility is a concern in bioinformatics since different analysis pipelines may lead to dissimilar results from same datasets. While aiming to reach a broad audience of specialists and non-specialists in RNAseq analyses, this reviewer suggests that authors deposit their command lines and scripts (even if only relevant parts) in an open-access repository such as GitHub or equivalent, so others may compare their own data without technical variation. Also, raw tables with normalized gene expression (i.e., outputs of DEseq2) could also be provided for future reference, either deposited in GEO/NCBI or as tables together with the code at GitHub.

2. One major concern relates to the normalization of transcriptomic data presented here. Authors emphasize that data from some tissues such as ovaries or specific midgut regions showed considerably distinct patterns of gene expression. Consequently, such patterns of unbalanced transcriptional levels can skew normalization and could be of concern. The tool used here for normalization of transcript counts, DEseq2, should not be severely impacted by these events since, as mentioned by its author Michael Love, "the median ratio normalization in DESeq2 doesn't have as strong of an assumption that most genes don't change" (see post at https://support.bioconductor.org/p/61604/ for details). This reviewer suggest that the authors provide some data confirming that the normalization is stable enough between samples, hopefully showing that the core of genes that do not change expression reflect the size factors estimated by DEseq2 (a good example of this quality control is given in the blog post above). If some striking differences are seen, I would suggest applying HMM-normalization or other technique to better normalize the data and increase accuracy. Example of normalization methods for such situation are given in this Review by Liu et al. Front. Bioeng. Biotechnol. 2019 – DOI 10.3389/fbioe.2019.00358.

3. A known problem in high-throughput analysis is the establishment of arbitrary thresholds or cutoffs when analyzing differential expression. Here the authors often established arbitrary thresholds/cutoffs without depicting the reason for doing so (e.g., top 20 genes, 5-fold difference or 2-fold difference, etc). This reviewer thinks that it would be of great improvement for this manuscript if authors explain in detail the relevance of their choices and how it impacted their analyses.

4) Although some studies already analysed gene expression in the different organs of Aedes mosquitoes and changes occurring after a blood meal, this study is the first to analyse gene expression in most of female tissues allowing an accurate comparison of profiles between tissues. The analysis of data is very thorough and well described, showing investment (number of transcripts) and output (number of transcripts balanced by total number of transcripts in a specific tissue) of each organ. The putative biological functions of the different organs are not new and surprising, however, the gene expression profiles and conferred biological functions of the different gut regions is original and was not previously assessed. Although this study brings a lot of information and is very valuable for the mosquito research community, it remains a very descriptive study without functional characterization/validation.

Some limits should be taken in consideration when looking at the data for instance to know in which tissue a gene is expressed or not, if a gene promoter could be used for specific gene expression system, e.g., one mosquito strain analysed, immune gene expression profiles (especially in the gut) may be affected by the microbiota, which is known to be different between labs, mating status (not controlled in the study), etc. However, this study is a good start for the creation of an atlas for Aedes aegypti and should constitute the basis for a future and larger deposition of data to complete the picture, e.g., more tissues (hemocytes for instance), more developmental stages, infection with pathogens etc. The creation of an online database is of course positive, but it is regrettable that those data are not integrated in larger databases such as vector base allowing an integrated analysis of data with previous published datasets.

5) The use of cumulative expression values and RNA yield is confusing. There is not an adequate description of library prep to assess the validity of these methods. Typically, libraries are prepped with a subset of the total RNA extracted from a sample. In the manuscript, authors write that they normalized transcript levels by RNA yield, however there is no discussion of whether the volume of tissue for each replicate was standardized. If libraries were prepped with equivalent concentrations of RNA, this normalization is unnecessary. Authors may have used the concentration of RNA used to prepare the RNA-seq libraries for this normalization but the way it is written does not make their methods clear.

6) Additionally, there are points throughout the manuscript that need further explanation to improve clarity and to allow for successful assessment of the methods used. For example, when describing the mosquito equation developed by the authors, a "scaling factor" is mentioned without proper explanation of what this scaling factor is. Furthermore, specifics on the parameters used in the differential expression analysis and the gene ontology enrichment analysis are missing from the methods section.

https://doi.org/10.7554/eLife.76132.sa1

Author response

Essential revisions:

1. As a dry/wet bioinformatician/molecular biologist, this reviewer acknowledges that this work has been beautifully executed following standard practices for mosquito rearing and following the guidelines of RNA sequencing. Though, data reproducibility is a concern in bioinformatics since different analysis pipelines may lead to dissimilar results from same datasets. While aiming to reach a broad audience of specialists and non-specialists in RNAseq analyses, this reviewer suggests that authors deposit their command lines and scripts (even if only relevant parts) in an open-access repository such as GitHub or equivalent, so others may compare their own data without technical variation. Also, raw tables with normalized gene expression (i.e., outputs of DEseq2) could also be provided for future reference, either deposited in GEO/NCBI or as tables together with the code at GitHub.

We thank the reviewer for their kind words concerning our work. We have done as requested. Our scripts and DEseq2 outputs have been deposited at GitHub and are accessible here: https://github.com/bingdun/RNAseq_analysis_of_Aedes_aegypti.

2. One major concern relates to the normalization of transcriptomic data presented here. Authors emphasize that data from some tissues such as ovaries or specific midgut regions showed considerably distinct patterns of gene expression. Consequently, such patterns of unbalanced transcriptional levels can skew normalization and could be of concern. The tool used here for normalization of transcript counts, DEseq2, should not be severely impacted by these events since, as mentioned by its author Michael Love, "the median ratio normalization in DESeq2 doesn't have as strong of an assumption that most genes don't change" (see post at https://support.bioconductor.org/p/61604/ for details). This reviewer suggest that the authors provide some data confirming that the normalization is stable enough between samples, hopefully showing that the core of genes that do not change expression reflect the size factors estimated by DEseq2 (a good example of this quality control is given in the blog post above). If some striking differences are seen, I would suggest applying HMM-normalization or other technique to better normalize the data and increase accuracy. Example of normalization methods for such situation are given in this Review by Liu et al. Front. Bioeng. Biotechnol. 2019 – DOI 10.3389/fbioe.2019.00358.

We have extracted size factors for each of the samples in the body-part vs whole body DEseq2 analyses performed (a) with all genes in the genome and (b) with the subset of genes that were not significantly different in the first analysis (p > 0.05). We found the values were nearly identical. Since size factors are median values and, in all comparisons between body parts and whole body, fewer than half of all genes were differentially expressed, this result was consistent with expectations. In Author response image 1 we plot the two sets of size factors we extracted:

Author response image 1

3. A known problem in high-throughput analysis is the establishment of arbitrary thresholds or cutoffs when analyzing differential expression. Here the authors often established arbitrary thresholds/cutoffs without depicting the reason for doing so (e.g., top 20 genes, 5-fold difference or 2-fold difference, etc). This reviewer thinks that it would be of great improvement for this manuscript if authors explain in detail the relevance of their choices and how it impacted their analyses.

We employed a variety of arbitrary cutoffs in this manuscript. These include:

a) Putative markers, Figure 1B (TPM > 5, 50x enriched versus all other body parts)

The putative markers in our manuscript serve two purposes: first, they are a suggested starting place for researchers who may want to build tissue-specific expression systems. Second, they were used as terms in the calculation of the “mosquito equation” which estimates what fraction of transcripts each body part contributes to the mosquito whole body.

We chose the >5 TPM cutoff because, for a gene to be useful as a potential tissue-specific driver, it would need to be robustly expressed. Our enrichment cutoff (50x enrichment) was chosen because we wanted to set the threshold somewhere that allowed us to identify a robust cohort of body-part specific transcripts in each body part to use in calculating the “mosquito equation”. At TPM >5 and 50x enrichment, the abdomen had the smallest number of qualifying genes (8). At the next most restrictive threshold we tried (100x) the number of qualifying genes in the abdomen fell to 3. We felt it was not desirable to base the calculation on such a small cohort.

After calculating the mosquito equation, we used it to predict the expression of each gene in the genome in whole body and plotted it against observed values (2 supp 2A). The resulting strong correlation satisfied us that 50x was a sufficiently restrictive cutoff for this purpose.

b) ‘Top 20’ (Treemaps, figures 1C, 2C, and 7E)

The purpose of the ‘Top 20’ figures was to demonstrate (a) that some transcriptomes (e.g., gut) invest heavily in a few genes while others (e.g., ovaries) spread their transcripts over more of the genome and (b) to illustrate the types of function that the top expressed genes in each body part have (e.g. peptidases dominating the gut, AMPs in the prov, etc.).

We chose to use the 20 highest-expressed genes because we found this was the threshold that best illustrated the contrast between transcriptomes: the top 20 genes contribute ~50% of all gut transcripts and ~10% of all ovary transcripts. We found that using more than 20 genes was not desirable as the cells in the treemaps became too small to distinguish.

c) GO plots: padj <0.05, 5x Enrichment (1C, 2B, 6 supp 1C); 2x Enrichment (1C (ovaries only) 3B, 4C)

For all GO analyses, we found, through trial and error, that a test set of ~200-1000 genes was optimal to obtain meaningful and statistically significantly enriched categories against a background of ~20,000 genes in the Aedes genome. We found that test sets of >2000 genes (>10% of the genome) were too large, resulting in higher and often non-significant p values. Thresholds of 2x and 5x enrichment versus whole-body yielded the following quantities of significantly upregulated genes (overly small and overly large test sets are marked with *):

Author response table 1
2x5x
Head*26641376
Thorax*2271761
Abdomen1871359
Gut*2163985
MT*25671193
Ovaries555*44
Crop*25011427
Prov1234414
Ant757188
Post*3*1
Hind1449620

We found that the 5x enrichment netted a desirable fraction of the genome in all body part and gut region comparisons except for ovaries vs whole body, which yielded only 44 genes, and posterior gut versus whole gut, which yielded a single gene.

The low number of genes in ovaries meeting the 5x threshold motivated the choice to use a 2x cut-off for ovaries only. We felt this choice was justified by the fact that ovaries are the greatest contributor (~29%) of transcripts to whole gut, making it more difficult for differentially expressed genes in this organ to meet the 5x threshold.

A different approach was required for the posterior midgut which contributes ~90% of whole gut transcripts (see Figure 2E). In effect, a comparison of posterior midgut to whole gut was nearly equivalent to comparing the posterior midgut to itself. To evaluate enrichment of function in this region, we created a test set which included all genes that were significantly enriched in the posterior midgut (padj < 0.05) in comparison to all other gut regions, in a combined DESeq2 comparison, with all replicates from other regions receiving equal weight in the analysis. This yielded a test set of 608 genes. We would like to note here that in the previous version of the manuscript (and pre-print) we mistakenly stated that the GOEA for posterior midguts in Aedes and Anopheles employed a 5x cutoff. This is incorrect. No fold-change cutoff was employed for this region in either species. We regret the error and have corrected it in all figure legends and supplementary materials.

The comparisons of blood-fed timepoints and regions did not yield as many 5x enriched genes as the body part/gut region comparisons. This is unsurprising, as we were comparing differences in condition rather than in tissue. For these comparisons, we opted to use a 2x threshold.

Author response table 2
2x5x
Upregulated
6 vs SF828228
24 vs 6911381
48 vs 24469*55
48 vs SF102*1
BF post vs Post989409
BF ant vs Ant317*95
Downregulated
6 vs SF1048371
24 vs 6741170
48 vs 24569268
48 vs SF*85*10
BF post vs Post737135
BF ant vs Ant525108

Note: in the preprint version of this manuscript, we employed a 400-gene cutoff to limit test set size only for blood-fed comparisons. However, in the interests of consistency, we have decided to eliminate this threshold. The resulting changes in the outcome of the analysis have been disseminated to figures 3 and 4, to supplemental Table S2, and to the text of the Results section.

4) Although some studies already analysed gene expression in the different organs of Aedes mosquitoes and changes occurring after a blood meal, this study is the first to analyse gene expression in most of female tissues allowing an accurate comparison of profiles between tissues. The analysis of data is very thorough and well described, showing investment (number of transcripts) and output (number of transcripts balanced by total number of transcripts in a specific tissue) of each organ. The putative biological functions of the different organs are not new and surprising, however, the gene expression profiles and conferred biological functions of the different gut regions is original and was not previously assessed. Although this study brings a lot of information and is very valuable for the mosquito research community, it remains a very descriptive study without functional characterization/validation.

Some limits should be taken in consideration when looking at the data for instance to know in which tissue a gene is expressed or not, if a gene promoter could be used for specific gene expression system, e.g., one mosquito strain analysed, immune gene expression profiles (especially in the gut) may be affected by the microbiota, which is known to be different between labs, mating status (not controlled in the study), etc. However, this study is a good start for the creation of an atlas for Aedes aegypti and should constitute the basis for a future and larger deposition of data to complete the picture, e.g., more tissues (hemocytes for instance), more developmental stages, infection with pathogens etc. The creation of an online database is of course positive, but it is regrettable that those data are not integrated in larger databases such as vector base allowing an integrated analysis of data with previous published datasets.

We agree with the reviewer that readers will need to take stock of any differences in strain, etc. when making use of our data set. The clarifications in Methods (lines 938-940) about mating status and the new supplemental table (Table S9) clarifying the details of dissections should help readers to establish the comparability of their own data/methods. We have also already noted that microbiota may affect immune gene expression in the gut (lines 805-806).

We are in contact with George Christophides and Gloria Giraldo-Calderon regarding integrating our data sets into the VectorBase transcript expression data visualization tool. Dr. Giraldo-Calderon has taken receipt of our SRA files, and forecasts that the data will be available to users as early as June of 2022.

5) The use of cumulative expression values and RNA yield is confusing. There is not an adequate description of library prep to assess the validity of these methods. Typically, libraries are prepped with a subset of the total RNA extracted from a sample. In the manuscript, authors write that they normalized transcript levels by RNA yield, however there is no discussion of whether the volume of tissue for each replicate was standardized. If libraries were prepped with equivalent concentrations of RNA, this normalization is unnecessary. Authors may have used the concentration of RNA used to prepare the RNA-seq libraries for this normalization but the way it is written does not make their methods clear.

After reviewing our Methods section, we believe we see the source of the confusion. RNA yield analyses were not performed as a correction for differences in sequencing depth between RNAseq samples (which were already normalized by the calculation of TPM). Rather, RNA yield experiments were conducted completely independently from RNAseq experiments. The RNA yield experiments serve a similar purpose (for gut regions) to the mosquito equation calculation (for body parts). The objective was to estimate what proportion of the whole gut transcriptome was derived from each gut region.

We have expanded this section of Methods to make our approach clearer (see lines 964 to 981 in the revised manuscript), but we will also briefly reiterate it here:

In RNAseq experiments: the samples extracted for the creation of libraries were diluted to the recommended concentration for library preparation, and the resulting libraries were pooled in approximately equal concentration to ensure equal depth of sequencing. Therefore, the resulting quantity of reads from each sample does not reflect the RNA yield of the gut regions from which they were extracted. The conversion of counts to TPMs then normalized the data across samples, correcting for any residual variation from dis-equal sequencing depth.

The RNA yield experiments were conducted by dissecting a known number of gut regions (20 to 50 for small ones, 10 for posterior guts) into 1 mL of Trizol, performing a standard phenol-chloroform extraction (no column purification), resuspending the resulting RNA pellet in a known volume, measuring the concentration by Qubit, and calculating the yield in ng/region (concentration * suspension volume / # of dissected regions).

The proportional expression of digestive enzymes/transporters (2F, 4F) and of immune genes (5E) was calculated by weighting the cumulative TPM from each category for each region (as determined by RNAseq) by that region’s RNA yield (as determined by Qubit measurement) to establish an estimate of what proportion of transcripts in the whole gut belonging to that category were derived from each contributing region.

6) Additionally, there are points throughout the manuscript that need further explanation to improve clarity and to allow for successful assessment of the methods used. For example, when describing the mosquito equation developed by the authors, a "scaling factor" is mentioned without proper explanation of what this scaling factor is.

The “scaling factor” for each body part is the proportion of transcripts in the whole body contributed by that part. For example, the head is the source of approximately 4.7% of all transcripts in the whole body, therefore its scaling factor is 0.047. The “mosquito equation” uses the scaling factors for the six body parts to predict the expression of a gene in the whole body, given TPM values for all body parts:

WBTPM = HeadTPM*0.047 + ThxTPM*0.194 + AbdTPM*0.262 + GutTPM*0.14 + MTTPM*0.017 + OvTPM*0.288

We have updated the ‘mosquito equation’ section of methods (lines 1014-1017) to more clearly explain the nature of the scaling factor (see bolded words below):

“If X is defined as the expression of a specific gene in a given body part (in TPM), and the part’s initial stands for a scaling factor equal to the proportion of transcripts in the whole body derived from that part, the relationship between the expression of that gene in each body part versus in the whole body can be expressed as...”

7) Furthermore, specifics on the parameters used in the differential expression analysis and the gene ontology enrichment analysis are missing from the methods section.

We have now deposited our scripts for topGO and DESeq2 with GitHub (see major revision #1). For all topGO analyses, we used the p value calculated using the classic Fisher method.

https://doi.org/10.7554/eLife.76132.sa2

Article and author information

Author details

  1. Bretta Hixson

    Cornell Institute of Host-Microbe Interactions and Disease, Department of Entomology, Cornell University, Ithaca, United States
    Contribution
    Conceptualization, Data curation, Formal analysis, Funding acquisition, Investigation, Methodology, Validation, Visualization, Writing - original draft, Writing – review and editing
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0001-7983-5104
  2. Xiao-Li Bing

    Cornell Institute of Host-Microbe Interactions and Disease, Department of Entomology, Cornell University, Ithaca, United States
    Present address
    Department of Entomology, Nanjing Agricultural University, Jiangsu, China
    Contribution
    Data curation, Formal analysis, Writing – review and editing
    Competing interests
    No competing interests declared
  3. Xiaowei Yang

    Cornell Institute of Host-Microbe Interactions and Disease, Department of Entomology, Cornell University, Ithaca, United States
    Present address
    State Key Laboratory for Biology of Plant Diseases and Insect Pests, Institute of Plant Protection, Chinese Academy of Agricultural Sciences, Beijing, China
    Contribution
    Data curation, Formal analysis, Writing – review and editing
    Competing interests
    No competing interests declared
  4. Alessandro Bonfini

    Cornell Institute of Host-Microbe Interactions and Disease, Department of Entomology, Cornell University, Ithaca, United States
    Contribution
    Formal analysis, Investigation
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0001-6642-8665
  5. Peter Nagy

    Cornell Institute of Host-Microbe Interactions and Disease, Department of Entomology, Cornell University, Ithaca, United States
    Contribution
    Formal analysis, Investigation
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0002-5053-0646
  6. Nicolas Buchon

    Cornell Institute of Host-Microbe Interactions and Disease, Department of Entomology, Cornell University, Ithaca, United States
    Contribution
    Conceptualization, Data curation, Formal analysis, Funding acquisition, Project administration, Resources, Supervision, Writing - original draft, Writing – review and editing
    For correspondence
    nicolas.buchon@cornell.edu
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0003-3636-8387

Funding

National Institute of Allergy and Infectious Diseases (1R01AI148529-02)

  • Nicolas Buchon

National Science Foundation (IOS 2024252)

  • Nicolas Buchon

National Institute of Allergy and Infectious Diseases (1R01AI148541-02)

  • Nicolas Buchon

National Institute on Aging (5R21AG065733-02)

  • Nicolas Buchon

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Acknowledgements

Special thanks to Jonathan Revah for creating the Aegypti-Atlas website, and to Laura Harrington, Sylvie Pitcher, and Garrett League for their kind provision of Thai and G3 strain mosquitoes. We also acknowledge members of the Buchon lab, Louise Huot, Kristin Michel, Courtney Murdock, Laura Harrington, and Jeffrey Scott for helpful comments on the manuscript.

Senior Editor

  1. Utpal Banerjee, University of California, Los Angeles, United States

Reviewing Editor

  1. Bruno Lemaître, École Polytechnique Fédérale de Lausanne, Switzerland

Reviewer

  1. Emilie Pondeville, MRC-University of Glasgow Centre for Virus Research, United Kingdom

Publication history

  1. Received: December 6, 2021
  2. Preprint posted: December 21, 2021 (view preprint)
  3. Accepted: April 25, 2022
  4. Accepted Manuscript published: April 26, 2022 (version 1)
  5. Version of Record published: May 17, 2022 (version 2)

Copyright

© 2022, Hixson et al.

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.

Metrics

  • 521
    Page views
  • 100
    Downloads
  • 0
    Citations

Article citation count generated by polling the highest count across the following sources: Crossref, PubMed Central, Scopus.

Download links

A two-part list of links to download the article, or parts of the article, in various formats.

Downloads (link to download the article as PDF)

Open citations (links to open the citations from this article in various online reference manager services)

Cite this article (links to download the citations from this article in formats compatible with various reference manager tools)

  1. Bretta Hixson
  2. Xiao-Li Bing
  3. Xiaowei Yang
  4. Alessandro Bonfini
  5. Peter Nagy
  6. Nicolas Buchon
(2022)
A transcriptomic atlas of Aedes aegypti reveals detailed functional organization of major body parts and gut regional specializations in sugar-fed and blood-fed adult females
eLife 11:e76132.
https://doi.org/10.7554/eLife.76132

Further reading

    1. Chromosomes and Gene Expression
    Faith C Fowler et al.
    Research Article

    DNA double-strand break (DSB) repair by homologous recombination is confined to the S and G2 phases of the cell cycle partly due to 53BP1 antagonizing DNA end resection in G1 phase and non-cycling quiescent (G0) cells where DSBs are predominately repaired by non-homologous end joining (NHEJ). Unexpectedly, we uncovered extensive MRE11- and CtIP-dependent DNA end resection at DSBs in G0 murine and human cells. A whole genome CRISPR/Cas9 screen revealed the DNA-dependent kinase (DNA-PK) complex as a key factor in promoting DNA end resection in G0 cells. In agreement, depletion of FBXL12, which promotes ubiquitylation and removal of the KU70/KU80 subunits of DNA-PK from DSBs, promotes even more extensive resection in G0 cells. In contrast, a requirement for DNA-PK in promoting DNA end resection in proliferating cells at the G1 or G2 phase of the cell cycle was not observed. Our findings establish that DNA-PK uniquely promotes DNA end resection in G0, but not in G1 or G2 phase cells, which has important implications for DNA DSB repair in quiescent cells.

    1. Chromosomes and Gene Expression
    2. Genetics and Genomics
    Mary Couvillion et al.
    Tools and Resources

    Understanding the complex network that regulates transcription elongation requires the quantitative analysis of RNA polymerase II (Pol II) activity in a wide variety of regulatory environments. We performed native elongating transcript sequencing (NET-seq) in 41 strains of S. cerevisiae lacking known elongation regulators, including RNA processing factors, transcription elongation factors, chromatin modifiers, and remodelers. We found that the opposing effects of these factors balance transcription elongation and antisense transcription. Different sets of factors tightly regulate Pol II progression across gene bodies so that Pol II density peaks at key points of RNA processing. These regulators control where Pol II pauses with each obscuring large numbers of potential pause sites that are primarily determined by DNA sequence and shape. Antisense transcription varies highly across the regulatory landscapes analyzed, but antisense transcription in itself does not affect sense transcription at the same locus. Our findings collectively show that a diverse array of factors regulate transcription elongation by precisely balancing Pol II activity.