1. Developmental Biology
  2. Genetics and Genomics
Download icon

Paths and pathways that generate cell-type heterogeneity and developmental progression in hematopoiesis

  1. Juliet R Girard
  2. Lauren M Goins
  3. Dung M Vuu
  4. Mark S Sharpley
  5. Carrie M Spratford
  6. Shreya R Mantri
  7. Utpal Banerjee  Is a corresponding author
  1. Department of Molecular, Cell and Developmental Biology, University of California, Los Angeles, United States
  2. Molecular Biology Institute, University of California, Los Angeles, United States
  3. Department of Biological Chemistry, University of California, Los Angeles, United States
  4. Eli and Edythe Broad Center of Regenerative Medicine and Stem Cell Research, University of California, Los Angeles, United States
Research Article
  • Cited 0
  • Views 528
  • Annotations
Cite this article as: eLife 2021;10:e67516 doi: 10.7554/eLife.67516

Abstract

Mechanistic studies of Drosophila lymph gland hematopoiesis are limited by the availability of cell-type-specific markers. Using a combination of bulk RNA-Seq of FACS-sorted cells, single-cell RNA-Seq, and genetic dissection, we identify new blood cell subpopulations along a developmental trajectory with multiple paths to mature cell types. This provides functional insights into key developmental processes and signaling pathways. We highlight metabolism as a driver of development, show that graded Pointed expression allows distinct roles in successive developmental steps, and that mature crystal cells specifically express an alternate isoform of Hypoxia-inducible factor (Hif/Sima). Mechanistically, the Musashi-regulated protein Numb facilitates Sima-dependent non-canonical, and inhibits canonical, Notch signaling. Broadly, we find that prior to making a fate choice, a progenitor selects between alternative, biologically relevant, transitory states allowing smooth transitions reflective of combinatorial expressions rather than stepwise binary decisions. Increasingly, this view is gaining support in mammalian hematopoiesis.

Editor's evaluation

This paper will be of interest to scientists who study hematopoiesis. The authors combine single cell RNA-seq with bulk RNA-seq of transcripts from blood cells in the Drosophila larval hematopoietic organ. They present extensive analysis of the datasets, and the pseudotime analyses present a model of how hematopoietic progenitors can differentiate along transitory paths. These datasets reveal cell-type specific isoform expression of Notch pathway regulators, and genetic experiments prove the importance of these factors in development of one lineage. These transcriptomic analyses and subsequent genetic experiences provide strong support for the major claims of the paper.

https://doi.org/10.7554/eLife.67516.sa0

Introduction

The Drosophila lymph gland is the major hematopoietic organ that develops during the larval stages for the purpose of providing blood cells during later pupal/adult periods (reviewed in Banerjee et al., 2019). Hematopoietic function for the larva itself is largely provided by a separate set of sessile or circulating blood cells outside of the lymph gland (reviewed in Letourneau et al., 2016). The only time the lymph gland provides blood cells to the circulating larval hemolymph is if the larva faces a stress or immune challenge. This study entirely concentrates on the primary/anterior lobes of the lymph gland, which display the highest hematopoietic activity during normal larval development.

Past work has identified specific functional zones. The PSC (Posterior Signaling Center) is marked by expression of Antp (Mandal et al., 2007) and knot/collier (kn/col) (Crozatier et al., 2004). The PSC signals progenitors that belong to the medullary zone (MZ) and are marked by domeMESO and Tep4 (Jung et al., 2005; Irving et al., 2005). Differentiating cells form the cortical zone (CZ), expressing Hemolectin (Hml), Peroxidasin (Pxn), lozenge (lz), and other differentiating cell markers (Jung et al., 2005). A narrow band of cells that are double positive for domeMESO and HmlΔ occupy the edge abutting these two zones in the early third instar (Sinenko et al., 2009), and is referred to as the intermediate zone (IZ), which contains intermediate progenitors (IPs) (Krzemien et al., 2010).

Invertebrates predate the evolution of the lymphoid system for adaptive immunity. Accordingly, Drosophila blood cells are all similar in function to cells of the vertebrate myeloid lineage. The most predominant class of blood cells, the plasmatocytes (PLs; 95% of all hemocytes), share a monophyletic relationship with vertebrate macrophages. PLs function in the engulfment of microbes and apoptotic cells, and they produce extracellular matrix proteins (Fessler and Fessler, 1989; Tepass et al., 1994; Franc et al., 1996). A minor (2–5%), but important class is represented by crystal cells (CCs) named for their crystalline inclusions of the pro-phenoloxidase enzymes, PPO1 and PPO2. CCs are necessary for melanization, blood clot formation, immunity against bacterial infections, and to help mitigate hypoxic stress (Rämet et al., 2002; Galko and Krasnow, 2004; Binggeli et al., 2014; Dudzic et al., 2015; Cho et al., 2018). The transcription factor Lozenge (Lz) cooperates with Notch signaling to express a number of target genes (such as hindsight/pebbled) to specify CCs (Lebestky et al., 2000; Duvic et al., 2002), whereas the Sima (vertebrate HIF-1α) protein is required for their maintenance (Mukherjee et al., 2011). The orthologue of Lz in mammals is RUNX1, with broad hematopoietic function at many developmental stages, and RUNX1 is often dysregulated in acute myeloid leukemias (de Bruijn and Speck, 2004; Ito, 2004). The third class of blood cells, lamellocytes (<1%), is usually present only during parasitization by wasps (reviewed in Letourneau et al., 2016).

In early genetic studies, the MZ appeared to consist of a fairly homogeneous group of cells, although a small number of cells clustered near the heart (dorsal vessel) are identified as pre-progenitors (Jung et al., 2005; Dey et al., 2016; Tiwari et al., 2020). More recent reports have noted considerable heterogeneity and complexity within the progenitor population (reviewed in Banerjee et al., 2019). Particularly noteworthy, in this context, is the functional distinction into a Hh-sensitive and a Hh-resistant group of progenitors within the MZ (Baldeosingh et al., 2018).

Hematopoiesis requires complex collaborations between direct cell to cell signals (e.g., Serrate/Notch), interzonal communication (e.g., Hedgehog), signals from the neighboring cardiac tube (Morin-Poulard et al., 2016; Destalminil-Letourneau et al., 2021), and systemic signals (e.g., olfactory and nutritional) (Lebestky et al., 2003; Crozatier et al., 2004; Mandal et al., 2007; Shim et al., 2012; Shim et al., 2013; Ferguson and Martinez-Agosto, 2014). An important type of interzonal signaling mechanism relevant to this paper involves multiple cell types across the zones. In brief, progenitors are maintained not only through PSC-derived signals but also through a signaling relay mediated by the differentiating cells. This backward signal from the differentiating cells to the precursors is named the Equilibrium Signal (Mondal et al., 2011; Mondal et al., 2014). In this process, Pvf1 (PDGF- and VEGF-related factor 1) produced by the PSC, trans-cytoses through the MZ to bind its receptor Pvr (PDGF/VEGF receptor), which is expressed at high levels in the CZ. This initiates a STAT-dependent but JAK-independent signaling cascade that ultimately leads to the secretion of the extracellular enzyme ADGF-A (adenosine deaminase-related growth factor A). This enzyme breaks down adenosine, preventing its mitogenic signal and proliferation of MZ progenitors. Acting together the niche and the backward signal maintain a balance between progenitor and differentiated cell types. The genetic studies broadly implicated the CZ cells as originators of this backward signal. Finer analysis, afforded by cell-separated bulk and single-cell RNA-Seq in this study, allows us to attribute this role to a smaller and more specific subset of cells.

RNA-Seq has been used recently as a technique to study Drosophila blood cells (Cattenoz et al., 2020; Cho et al., 2020; Fu et al., 2020; Ramond et al., 2020; Tattikota et al., 2020). Four of the cited studies analyze circulating blood cells that have a completely different developmental profile than the lymph gland. Cho et al., 2020 utilized the lymph gland and validated its zonal structure at the level of gene expression. Additionally, new markers and sub-zones were identified. The broader picture revealed in our current manuscript is largely consistent with Cho et al., 2020, but several important details and interpretations vary. The results and conclusions of the two independent studies are compared and contrasted later in this paper. Importantly, the primary motivation of this current study is to use the combined strategies of several RNA-Seq analyses as a tool to provide data that can be combined seamlessly with the powerful genetics available in Drosophila. This functional validation of the two approaches is an advancement over the use of transcriptomics to distinguish cell types by their expressed markers. This is a level of in vivo mechanistic analysis that is not yet available for many mammalian systems, but for which Drosophila could serve as a model. While this work also describes subzones and their characteristic markers, the primary emphasis that makes it distinct is the use of a complex strategy that allows us to extend beyond cell type identification and to dissect mechanisms that define alternate paths and pathways that were not solvable by earlier genetic methods alone.

The novel conclusions from this analysis include a clear characterization of the IZ cells (IPs), and a demonstration of the IPs as a distinct cell type; identification of two separate transitional populations that define distinct paths between progenitors and differentiated cells fates; the role of metabolism in a zone-specific developmental program; previously uncharacterized functional aspects of transcriptional regulation by the JNK and RTK pathways; the unique mechanism of CC maturation by a novel and specific isoform of Sima identified in the RNA-Seq analysis and a previously uncharacterized interaction of this Sima isoform with Notch, Numb, and Musashi, which provides a full mechanism for CC formation and maintenance.

This combination of molecular genetics and whole genome approaches makes it clear that hematopoietic cells are far more heterogeneous and diverse than previously realized by genetics alone, and helps shift our view of hematopoiesis from being a series of discrete steps to a more continuous journey of cells with similar, but not identical transcriptomic profiles along multiple paths. The multiplicity in layers of decision points creates new routes, which can each lead to a distinct differentiated endpoint, or, alternatively, follow their parallel trajectories to a single final outcome.

Results

Bulk RNA-Seq analysis of zonal patterning within the lymph gland

To better understand the distribution of gene-expression patterns in different lymph gland zones, we utilize a combination of established, directly driven, reporter constructs that mark the MZ (domeMESO enhancer driven nuclear EGFP) as well as the CZ (HmlΔ enhancer driven nuclear DsRed). These markers are not GAL4-driven and therefore allow simultaneous visualization and manipulation of different cell types (Figure 1A–A’). Lymph glands from these marked third instar larvae are dissected and the primary lobes are separated from the rest of the lymph gland. Our samples do not include any of the posterior lobes. Following dissociation of the primary lobes and FACS sorting the resulting cells, two single positive cell types for each marker and a distinct cell population that is positive for both markers are found (Figure 1B; Figure 1—figure supplement 1A). These three represent cells of the MZ, CZ, and IZ, respectively. The transitioning cells of the IZ (Sinenko et al., 2009) are referred to as IPs (Krzemien et al., 2010). Direct drivers and nuclearly localized fluorescent reporters make this double positive population easy to identify, both in the intact lymph gland (Figure 1A–A’) and in dissociated cells (Figure 1B). IPs express lower levels of the markers EGFP and DsRed than in MZ and CZ cells, respectively (Figure 1B). The three gated populations are used in bulk RNA-Seq experiments. A fourth population that is double negative for both markers is also detected in the FACS sorted populations. We have not characterized these cells in detail, as they comprise a mixed population including the PSC, which is not marked in these tissues, but is explored in the single-cell RNA-Seq (scRNA-Seq) experiments.

Figure 1 with 3 supplements see all
Analysis of subzonal patterning of the lymph gland by RNA-Seq.

(A–E) Genotype of glands used is domeMESO-GFP.nls, HmlΔ-DsRed.nls GFP (dome) is in green and DsRed (Hml) is in magenta. (A) Confocal image shows the zonal pattern of an early third instar lymph gland. Progenitors of the medullary zone (MZ, green), differentiated cells of the cortical zone (CZ, magenta), and cells in the intermediate zone (IZ, white due to colocalization of green and magenta) are seen as distinct cell types. (A’) A digital rendering of the confocal image is shown in (A). Nuclei are pseudo-colored based on their fluorescence: MZ (green), CZ (magenta), and IZ double positive cells (white). (A–A’) Scale bars, 20 μm. (B) Images of individual dissociated lymph gland cells. Brightfield (BF), DAPI/DNA (blue), MZ cells (green), CZ cells (magenta), and IZ cells (green [weak] and magenta). (C–E) Gene expression profiles from bulk RNA-Seq analysis of dissociated and sorted cells in three biological replicates. (C) The sorted MZ and CZ cells express high levels of their corresponding hallmark genes while IZ cells show low to moderate levels of expression of these genes. MZ progenitors are validated by their expression of: domeMESO-EGFP, col (collier), Tep4, shg (shotgun; E-Cad), and dome (domeless); CZ plasmatocytes (PLs) by: HmlΔ-DsRed, Hml (Hemolectin), Pxn (Peroxidasin), vkg (viking), Col4a1, and NimC1; and the crystal cells (CC; also part of CZ) by: lz (lozenge), hnt (hindsight/pebbled), PPO2, and PPO1. (D) Newly identified zone-enriched genes for MZ include QC, mthl13, and Lst. For CZ, these include Ten-m, edl, and Sr-Cl. In general, IZ cells show low to moderate levels of these MZ and CZ-specific markers. (p-values shown are from GSA analysis. Mean with SD shown.) (E) Expression of six newly identified IZ-enriched marker genes is not enriched in MZ or CZ. (F–I) Genotype: lz-GAL4, UAS-mGFP; HmlΔ-DsRed.nls. CCs expressing lz-GAL4 are marked by GFP whereas PLs express DsRed only. Lymph glands are dissociated and the cells are subjected to flow cytometric and/or bulk RNA-Seq analysis. (F) Flow cytometry identifies two distinct populations of CCs, expressing either low GFP (GFPLO) or high GFP (GFPHI). These two CC populations are referred to as iCCs and mCCs, respectively (see text). (G) PLs show high expression of Hml, Pxn, Col4a1, vkg, and NimC1 and no expression of lz> mGFP, PPO1, and PPO2. GFPLO CCs (iCCs) show moderate levels of both PL and CC specific genes. GFPHI CCs (mCCs) show high PPO1 and PPO2, no expression of NimC1 and low expression of other PL markers. (p-values shown are from GSA analysis. Mean with SD shown.) (H) DNA (Hoechst-A) measurement shows that iCCs have 2 N or 4 N DNA content, while mCCs have a significant number of cells with >4 N DNA content indicative of endoreplication. (I) Quantitation of data from four individual experiments. mCCs have higher mean FSC-A (cell size) and mean SSC-A (cellular complexity) values than iCCs. (p-values shown are from unpaired t-test. Mean with SD shown.) (J, K) Single-cell RNA-Seq analysis of dissociated cells from domeMESO-GFP.nls, HmlΔ-DsRed.nls lymph glands. (J) 2D t-SNE visualization of graph-based clustering identifies nine clusters: PSC (dark blue), MZ1 (red), MZ2 (orange), IZ (green), proPL (purple), PL (light blue), X (pink), iCC (light green), and mCC (dark red). The number of cells in each cluster is indicated on the right. (K) Expression analysis of hallmark genes shows enrichment in appropriate clusters. PSC (Antp, col, and dlp), MZ (shg, EGFP, Tep4, and dome), IZ (CG30090, lectin-24A, and MFS3), CZ (DsRed, vkg, Col4a1, Pxn, and Hml), mature PLs (NimC1), and CCs (PPO2, PPO1, lz and hnt). Delta (Dl), shg, dlp, and col show the highest enrichment in the PSC followed by MZ1. iCC, immature crystal cell; mCC, mature crystal cell.

Each bulk RNA-Seq sample utilizes cells from 100 lymph glands from mid-third instar larvae (90–96 hr after egg lay [AEL] at 25°C). Three biological replicates are analyzed for each sample and approximately 11,000 genes meet our threshold criteria for transcript expression. Previously established ‘hallmark genes,’ such as Tep4, dome, shg/E-Cad, and kn/collier (Benmimoun et al., 2015), (and EGFP), are detected in the MZ (Figure 1C). Similarly, vkg, Col4a1, Hml, Pxn, and NimC1 (and DsRed) transcripts are enriched in the CZ. The transcript expression for known markers is a validation of the Bulk-Seq approach. In addition, we identify several novel genes that are differentially expressed in the MZ (QC, mthl13, and Lst) or the CZ (Ten-m, edl, and Sr-CI) population (Figure 1D). Future genetic analysis will determine how these genes function in their specified zones.

Hml is considered a hallmark gene for PLs, however, it is also expressed in CC precursors (Goto et al., 2003). This low HmlΔ is lost in CCs expressing very high Hnt (Figure 1—figure supplement 1B-C'). Therefore, the HmlΔ-DsRed population contains both PLs and CC expressing lz, hnt (pebbled; peb), PPO1, and PPO2 (Figure 1C).

IPs do not express late differentiation markers such as NimC1 or PPO1/2, which are characteristic of mature PLs and CCs, respectively (Figure 1C). Nor do they express significant levels of very early progenitor markers such as Tep4 and kn/collier. IPs represent a transitional population between the MZ and the CZ, but the IPs are also, in themselves, a bona fide cell type, uniquely enriched in transcripts such as MFS3, CG30090, lectin-24A, CG13482, Amy-p, and CG31821 as compared with the expression of these transcripts in either MZ or CZ (Figure 1E). The collective expression of these bulk RNA-Seq derived novel IZ-enriched transcripts proved crucial in specifying a group of cells as IZ in our subsequent scRNA-Seq analysis.

We next used a genotype, combining HmlΔ-DsRed.nls and lz>mGFP (lz-GAL4, UAS-mGFP), such that CCs are separable from PLs. For this second bulk RNA-Seq, we use late wandering third instar larvae (93–117 hr AEL) at which stage CCs are more abundant than in the mid-third instar. All other conditions remain the same. Within the lz>mGFP population, two clearly separable groups, expressing high GFP (GFPHI) and lower GFP (GFPLO) are evident (Figure 1F). As expected, a large number of DsRed-positive but GFP-negative cells are sorted into a different gate and these are the PLs (Figure 1—figure supplement 1C) that express the hallmark genes: Hml, Pxn, Col4a1, vkg, and NimC1 (Figure 1G). They also do not express PPO1 or PPO2. In contrast, both GFPHI and GFPLO cells express PPO1/2 and therefore they are both CC populations.

PPO1/2 expression in GFPHI cells is much higher than in GFPLO CCs (Figure 1G). As PPO1/2 are maturity markers, this allows us to name the two GFP expressing CC classes as mature (mCC) and (relatively speaking) immature (iCC) CCs. PL-related genes are higher in iCCs than in mCCs but the pan-CC marker, hnt, is expressed at comparable levels in all CCs (Figure 1—figure supplement 1D). lz RNA is also only marginally different between the two populations, although its surrogate, lz>mGFP, is readily distinguishable. As expected, both mCC and iCC contain cells with 2N and 4N DNA content (Figure 1H). However a subset of mCCs, but not iCCs, exhibits >4N DNA content, indicative of endocycling (Krzemien et al., 2010; Terriente-Felix et al., 2013). These data suggest that endocycling is confined to the more mature, mCC subpopulation. We also find that the average forward scatter (FSC-A), a measure of cell size, and average side scatter (SSC-A), a measure of internal complexity, are higher in mCCs compared to iCCs (Figure 1I; Figure 1—figure supplement 1E, F). Thus, mCCs are larger, more mature, and more granular than iCCs.

Single-cell RNA-Seq defines subzonal patterns within the lymph gland

Bulk RNA-Seq is a useful tool for identifying the broad gene expression landscape in a relatively large group of cells with previously established canonical biomarkers. To complement and enhance these data and to characterize subpopulations within each zone, we used single-cell RNA-seq (scRNA-Seq). The same genetic background and developmental timing (90–93 hr AEL at 25oC) are used to facilitate comparison between the two approaches. Each sample utilizes 11 lymph glands to yield a concentrated cell suspension with high (85–90%) cell viability. Three biological replicates are processed in parallel and the transcriptome of about 21,200 individual cells is determined using the 10× Genomics platform and analyzed using Partek Flow software (see Materials and methods). Graph-based clustering analysis and t-distributed stochastic neighbor embedding (t-SNE) visualization in two-dimensions (2D) and three-dimensions (3D) are then performed.

Nine individual cell clusters are predicted for the lymph gland (Figure 1J), and each of these populations is present in similar proportions in all three biological replicates (Figure 1—figure supplement 2A, B). Known zone-specific markers within the differentially expressed genes assist in the assignment of unique identities to the graph-based clusters (the cluster names are justified in later sections). The PSC and IZ are each represented by single clusters. We identify two clusters (MZ1 and MZ2) with progenitor characteristics. The data suggest that in addition to IZ, a second transitional population, proPL, straddles MZ2 and the PL cluster, PL (Figure 1J; more obvious in Video 1). As in bulk RNA-Seq analysis, subclustering of CCs splits them into two populations (iCC and mCC). Validating our assignment of cell-type identity, all of the above clusters express their respective previously identified zone-specific hallmark genes (Figure 1K; see complete gene list in Supplementary file 1). Please note that a subclustering algorithm was not used to generate MZ1 and MZ2 or PL and proPL. These are products of the basic graph-based clustering process. The names MZ1 and MZ2, for example, refer to their similarities to the historical name, MZ attributed to a zone containing progenitors. In contrast, the classification as PH1, PH2, and so on, for groups of cells by Cho et al., 2020, result from true subclustering (similar to iCC and mCC identified here as subclusters of CC). In our hands, sub clustering MZ1 leads to some very small groups of cells that are not distinguishable enough to classify as separate populations.

Video 1
Movie of three-dimensional (3D) t-SNE visualization of scRNA-Seq data showing the same nine lymph gland populations and color scheme shown in Figure 1J.

In 3D, the IZ and proPL clusters are on separate planes that are largely non-adjacent to each other. However, both IZ and proPL each have adjacencies to the MZ2 and PL clusters.

The cluster designated as ‘X’ on the t-SNE exhibits high levels of mitosis and replication stress-related genes. The PSC, CC, and X clusters are distinct enough from the rest to remain as islands distant from each other and the core group of the other cell populations. The similarities and gradual transitions between the rest of the cells (belonging to MZ1, MZ2, IZ, proPL, and PL) cause their clusters to be closely associated as a core group of neighbors on the t-SNE map (Figure 1J). This organization of the t-SNE is seen in all three biological replicates (Figure 1—figure supplement 2A). Despite adjacency on the t-SNE, each cluster is distinguished by differences in differential gene expression (Supplementary file 1) and gene set/pathway enrichment (Figure 1—figure supplement 3A).

Trajectory and pseudotime analysis are used to map the timeline of progression of the identified heterogeneous population of cells through their multiple phases of maturity (Figure 2A–C). This analysis allows further groupings within the major clusters. PSC is separate in developmental origin from the rest of the lymph gland (Mandal et al., 2007) and cluster X likely represents mitotic states of several distinct cell types and therefore, although these two populations are represented in the t-SNE, they are not included for the purpose of constructing the trajectory. We find that the lymph gland cells form a branched trajectory with a total of three branch points and seven states (Figure 2C). Mapping the states back onto the t-SNE (Figure 2D) allows visualization of individual paths between related clusters. The relatedness between clusters is often easier to discern on a 3D-tSNE (Figure 2F; Video 1).

Figure 2 with 3 supplements see all
Developmental trajectory predicted by single-cell RNA-Seq data.

(A–C) Trajectory diagram shows the progression of lymph gland cells from the earliest progenitors (left) to the most mature cell types. (A) Earlier points in pseudotime (green) progress to later (magenta) developmental stages. (B) Superposition of graph-based clusters onto the trajectory shows that MZ1 appears at the beginning of the trajectory, and proceeds through MZ2, IZ/proPL, and ultimately onto the differentiated cell types iCC, mCC, and PL. (C) Trajectory diagram showing seven states, each separated by branch points of the trajectory. State 1 (MZ1 and early MZ2), state 2 (late MZ2), state 3 (late MZ2, IZ, and proPL), states 4 and 5 (IZ and proPL), state 6 (terminal CCs), and state 7 (terminal PLs). (D) Trajectory states overlaid on the 2D t-SNE plot. State 1 (dark blue) contains all MZ1 cells as well as MZ2 cells adjacent to MZ1. State 2 (dark red) is a very minor component of MZ2. State 3 (orange) lies at the border between MZ2 progenitors and the more developmentally advanced clusters (IZ, proPL, and PL). States 4 (green) and 5 (purple) represent the majority of IZ and proPL cells. State 6 (light blue) contains the majority of CCs. State 7 (pink) contains most PL cells as well as IZ and proPL cells adjacent to the PL cluster. A small number of iCC cells are in states 5 and 7, which map to the tip of the CC cluster (see M). PSC and X cells are colored gray as they were not used for trajectory analysis. (E) Volcano plot depicting the results of an ANOVA comparison between the IZ and proPL clusters. Each gene expressed in the scRNA-Seq data is represented by a dot. The X-axis depicts the magnitude of the difference in expression of each gene in IZ compared to proPL. The Y-axis indicates the statistical significance of each difference in gene expression, the false discovery rate (FDR) value for each comparison, where magenta lines represent the significance thresholds beyond which the difference in gene expression is statistically significant. IZ upregulated genes (red); proPL upregulated genes (blue); Selected statistically significant genes (black labels). (F) 3D t-SNE emphasizes that IZ (green) and proPL (purple) are distinct clusters that show little adjacency to each other because of their different 3D-planar locations. IZ and proPL possess separate connections between MZ (tan) (compare green arrow vs. purple arrow) and PL (gray) (compare green arrowhead vs. purple arrowhead). The data strongly suggest that IZ and proPL are two distinct means to connect MZ with PL. For a better view of the 3D-tSNE, see Video 1. (G) Representation of the fraction of cells from individual clusters at each pseudotime point. MZ1 (red) is found at the earliest pseudotime, while MZ2 (orange) is found in states 1, 2, and 3, slightly later in pseudotime. IZ (green) and proPL (purple) are placed primarily at intermediate pseudotime in states 3, 4, and 5. The placement of the states relative to pseudotime is indicated at the top of the graph and reveal that states 4 and 5 overlap considerably in pseudotime, indicating that spatially distinct clusters can overlap in pseudotime. Similarly, states 6 and 7 overlap and include a number of cell types. (H) Image of an early third instar lymph gland expressing CHIZ-GAL4, UAS-mGFP (green) to mark IZ cells, and domeMESO-EBFP2 (cyan) with HmlΔ-QF2, QUAS-mCherry (red) to mark proPL cells. IZ and proPL cells display distinct non-overlapping expression patterns. proPL cells can be distinguished here from PLs (which also express HmlΔ-QF2, QUAS-mCherry) by their expression of domeMESO-EBFP2. Scale bar, 10 μm. Full lobe shown in Figure 2—figure supplement 2C. (I) CHIZ-GAL4, UAS-mGFP (green) and HmlΔ-QF2, QUAS-mCherry (red) are expressed in distinct cells of the lymph gland with little colocalization. Scale bar, 10 μm. Full lobe shown in Figure 2—figure supplement 2D. (J–L) Loss of equilibrium signal with PvrRNAi increases P1 staining when driven with HmlΔ-GAL4 (L) but not with CHIZ-GAL4 (K) compared to wild-type (J). Scale bar, 20 μm. Quantifications found in Figure 2—figure supplement 2E. (M) A magnified view of the CC island from the 3D t-SNE. The boxed area is further magnified to show the identity of individual cells. iCC-6 and mCC-6 make up the majority of CCs. However, the base of the CC island (boxed part) includes small populations of iCC-5 and iCC-7 in close proximity to IZ-5 and PL-7, respectively, from which they are derived. (N) Dot plot showing the expression pattern of lz (lozenge) in iCCs and their immediate IZ-5 and PL-7 neighbors compared with IZ-5 and PL-7 cells that are not on the CC island. The ‘subpopulations’ are as follows: 1. IZ-5 cells not on the CC island. 2. IZ-5 cells on the CC island. 3. iCC-5 cells. 4. PL-7a/b cells not on CC island. 5. PL-7a/b cells on the CC island. And 6. iCC-7 cells. In each population, dot color reflects the mean level of lz expression (mean) and the dot size indicates the percentage of cells that express lz (non-zero percent). The data show that the IZ and PL cells that map to the CC island are enriched for lz unlike the rest of the IZ and PL cells. CZ, cortical zone; iCC, immature crystal cell; IZ, intermediate zone; mCC, mature crystal cell; PL, plasmatocyte; t-SNE, t-distributed stochastic neighbor embedding.

The PSC cluster

Known canonical PSC markers, such as Antp, col, and dlp, are all co-expressed at high levels in the PSC cluster (Figure 1K). Additional highly enriched genes include Pvf1, Dad, Dif, and EGFR, each of which has been shown to play a role in lymph gland development (Mondal et al., 2011; Pennetier et al., 2012; Sinenko et al., 2011; Louradour et al., 2017). Delta is expressed overall at low levels, but is enriched in the PSC with lower levels in MZ1 (Figure 1K). This pattern of expression is consistent with Cho et al., 2020. Additionally, we detect expression of Delta in mCCs.

The nature of the PSC has been extensively investigated prior to this study and GO terms related to many of the biological pathways such as TGF-β, Robo, and Wnt that are deemed important for PSC maintenance and function (Krzemień et al., 2007; Mandal et al., 2007; Sinenko et al., 2009; Morin-Poulard et al., 2016; reviewed in Luo et al., 2020) are enriched in the PSC cluster (Figure 1—figure supplement 3A). This serves as an independent validation of the RNA-Seq assisted assignment of genes for which genetics provides a specific function. Additionally, we focus later on a novel angle of the PSC cells that is related to their unique metabolic profile.

MZ clusters

Both MZ1 and MZ2 express known hallmark genes as well as a number of genes that are newly identified as MZ-enriched in the bulk RNA-Seq experiments (Figure 1K; Figure 2—figure supplement 1A). MZ1 cells are found entirely at the beginning of the trajectory at the earliest pseudotime (Figure 2A–C and G) and all MZ1 cells are contained within state 1 of the trajectory (thus formally named, MZ1-1; Figure 2D and G). MZ1 represents a small number of cells (1.3% of total cells; 4.8% of total MZ cells) with higher expression levels of several progenitor markers (e.g., shg, col/kn, dome, and Tep4) compared with MZ2 (Figure 1K; Figure 2—figure supplement 1B). MZ1 does not neighbor CZ clusters (Figure 1J) and MZ1 cells are the earliest progenitors at this time point in development. Their similarities to PSC cells (see later), could indicate additional signaling function for this cluster.

MZ2 occupies three separate trajectory states (MZ2-1, MZ2-2, and MZ2-3; Figure 2D and G). A majority is in state 1 (MZ2-1, 68% of MZ2). MZ2-2 has very few cells (2% of MZ2), whereas MZ2-3 is of significant size (29% of MZ2). The levels of the progenitor markers Tep4 and dome are highest in MZ1-1, they decrease in MZ2-1, and further decline in MZ2-2 and MZ2-3 (Figure 2—figure supplement 1B). Importantly, however, the Tep4 and dome levels in MZ2-3 are still higher than those seen in the IZ and CZ-related clusters. In contrast, we find that the expression of IZ enriched genes, such as CG30090 and MFS3, show the opposite trend compared toTep4 and dome (Figure 2—figure supplement 1B). Based on gene expression patterns, we propose that MZ2-2 and MZ2-3 are similar and represent a more mature population within MZ2. Together they represent a progenitor group of cells that mature to IZ, proPL, and PL.

MZ1 and MZ2 share 43 of the 241 differentially expressed MZ genes (Supplementary file 1). Thus MZ1 and MZ2 are closely related. However their identities are distinct. 144 of the MZ enriched genes are enriched in MZ1 and 54 are enriched in MZ2. More importantly, by multiple criteria, MZ1 and MZ2 show evidence of distinct biological functions. For instance, MZ1, but not MZ2, expresses some genes that are also found in the PSC, such as dlp and kn/col (Figure 1K). MZ1 is distinct from PSC as it lacks established markers such as Antp (Figure 1K). Surprisingly, Ubx expression seems to be a hallmark of MZ1 (Figure 2—figure supplement 1B). The expression level is low, and is similar to that seen for early PH2 progenitors in Cho et al., 2020. Please note that Ubx is also expressed in the tertiary lobe (Rodrigues et al., 2021) which was removed from our sample. Several glycolytic genes are expressed at much higher levels in MZ1 than in MZ2 suggesting distinct metabolic requirements (explored later in Figure 3).

Figure 3 with 2 supplements see all
Developmental metabolism of the lymph gland by single-cell analysis.

Expression analysis of either single genes (B) or groups of genes scored as AUCell (A, C) displayed in heatmaps with z-scores ranging from –2 (blue) to 2 (red). (A) Glycolysis pathway: enriched in PSC (z-score=5.7; corresponds to p<0.0001) and MZ1 (z-score=2.5; p=0.006). Other clusters: not statistically significant (n.s.). TCA cycle enzymes: not enriched in PSC (z-score=–4.1; p<0.0001). Other clusters: n.s. Oxidative phosphorylation: not enriched in PSC (z-score=–2.2; p=0.015), slightly enriched in MZ1 (z-score=2.0; p=0.026). Other clusters: n.s. Pentose phosphate pathway (PPP): enriched in PSC (z-score=2.4; p=0.008), slightly enriched in MZ1 (z-score=1.9; p=0.02), and mCC (z-score=2.6; p=0.004). Other clusters: n.s. Sugarbabe-dependent genes: enriched in PSC (z-score=3.4; p=0.0004). Other clusters: n.s. (B) Heatmap showing expression of assorted metabolic genes. PSC is enriched for Zw, Idh, and sug. MZ1 is enriched for srl and Myc. IZ is enriched for Mmp1, GlcT, Ect3, and CG7997. (C) Heatmap of AUCell analysis of growth-related pathways. MZ1 is slightly enriched in: srl dependent genes (z-score=2.06; p=0.020), and enriched for Myc target genes (z-score=2.44; p=0.007), TOR upregulated genes (z-score=2.64; p=0.004), and ribosome biogenesis genes (z-score=2.40; p=0.008). (D–F’’) Manipulation of the JNK pathway using CHIZ-GAL4, UAS-mGFP (green) that marks IZ cells (Spratford et al., 2020). Immunolocalization of MMP1 is shown in magenta. Images are maximum intensity projections of the middle third of a confocal z-stack of lymph glands from wandering third instar larvae. The regions boxed in (D’–F’) are shown at a higher magnification in (D”–F”). Scale bars, 10 μm. (D–D”) MMP1 protein expressed in close proximity to IZ cells in wild-type (WT). (E–E”) A microRNA-based depletion of JNKK/hep in the IZ cells results in loss of MMP1 throughout the lymph gland (including in the region neighboring the IZ). (F–F’’) Overactivation of JNK by a constitutively active isoform of JNKK/hep causes a large increase in MMP1 staining throughout the lymph gland. (G) Summary of the metabolic gene expression signatures found in the PSC, MZ1, IZ, and CC clusters. Green (up) arrows indicate enrichment of genes/pathways while red (down) arrows indicate the absence of enrichment. CC, crystal cell; IZ, intermediate zone; MZ, medullary zone.

Another distinct biological difference between MZ1 and MZ2 involves the expression of specific immunity-related genes. For example, MZ2 (but not MZ1) progenitors are enriched for all four Cecropin genes (CecA1, CecA2, CecB, and CecC) (Figure 1—figure supplement 3A; Figure 2—figure supplement 1C), which are involved in antibacterial humoral response downstream of the Toll and Imd pathways. Other Imd-related genes such as PGRP-SC2, and Toll pathway genes such as grass, are also enriched in MZ2 compared to MZ1 (Figure 1—figure supplement 3A; Figure 2—figure supplement 1C). Overall, these results indicate that the slightly more mature MZ2 cells might be better poised to respond to immune challenge than MZ1 cells, while MZ1 cells are the least mature and are more likely to respond to different metabolic and local signaling cues.

Cluster X

X is a very small cluster of cells (~1% of total) with a rather unique genomic composition. While most zones, which are much larger than X, are enriched for approximately 100–200 genes, ANOVA analysis suggests that for X this number is over 2000 (Supplementary file 1). X represents a mitotic component of the lymph gland and likely includes cells from multiple zones. Cell cycle-related proteins are enriched in cluster X (Figure 1—figure supplement 3A), and the five most represented genes are regulators of cell cycle (reviewed in Lee and Orr-Weaver, 2003; Berridge, 2014) in S and G2: cdc25/stg (30-fold), cdt1/dup (11-fold), Mcm5 (10-fold), Claspin (9-fold), and dap/p21 (9-fold). Also enriched are genes involved in DNA replication, cell division, spindle checkpoint, mitotic spindle, and kinetochores (Figure 1—figure supplement 3A). AUCell analysis, another tool for analyzing the enrichment of specific gene sets (see Materials and methods), further suggests that Cluster X shows high levels of ‘mitotic G2/M transition’ activity (Figure 2—figure supplement 1D). Interestingly, single-cell transcriptomic study of the human HSC/HSPC cells from the bone marrow (Velten et al., 2017) also found small high cell cycle activity clusters with characteristics similar to X.

Another characteristic of X is that it is the only cluster to include replication stress-related intra-S DNA damage checkpoint genes (Figure 2—figure supplement 1E). This is a characteristic of replication fork formation in transcriptionally active cells (Lee et al., 2012; Blythe and Wieschaus, 2015; Iyer and Rhind, 2017) and ‘replication stress’ is a means to control the progression of the cell cycle (Berti and Vindigni, 2016; Zou and Nguyen, 2018). In the lymph gland, the MZ cells are prolonged in their G2 state (Sharma et al., 2019) and it is attractive to hypothesize that replication stress-related S-phase events are, at least in part, responsible for the slow-down of the subsequent G2. Importantly, the enrichment of DNA damage-related genes in cluster X is not due to cells of this cluster being generally damaged or dying as quality control metrics such as percentage mitochondrial and ribosomal reads are on par with the other clusters (Figure 1—figure supplement 2C, D).

Finally, 11 out of the 12 members of the Myb complex (Myeloblastosis oncoprotein family), including the Myb transcription factor itself, are highly enriched in Cluster X (Figure 1—figure supplement 3A). Myb regulates DNA damage checkpoints and participates in the DNA repair process in cancer cells (Yang et al., 2019) with an established role, as well, in Drosophila blood cells (Davidson et al., 2005) and is similar to that in mammalian hematopoietic cells (Greig et al., 2008).

Cluster X is physically separated from the larger core group of clusters on the t-SNE and although small, the X cluster itself is separated into two islands (Figure 1J). One of the X islands (~30%) is physically closer to the MZ, while the larger one (~70%) maps in the direction of the PL cluster. The smaller island is more MZ-like (XMZ) and the larger is more CZ-like (XCZ) in gene expression (Figure 2—figure supplement 1F). Subclustering of X leads to three separate groups, one which corresponds directly to XMZ and the other two represent a split of XCZ into two subclusters, XTR (transitional) and XPL (PL-like) (Figure 2—figure supplement 1F, G). These subclusters show different patterns when profiled for hallmark zone-specific genes and markers. The clear variation in the expression of MZ and CZ genes within X provides the basis for the designation XMZ, XTR, and XPL for the three subclusters. When a trajectory diagram that only includes cells from X is constructed, XMZ is earliest in pseudotime and XPL is the latest (Figure 2—figure supplement 1H, I). XTR represents a mid-point transition into the PL state. This variation in the three X subclusters is limited to the expression of the MZ and PL hallmark genes. In contrast, cell cycle and DNA damage-related genes that are enriched in X relative to the other clusters are equally represented in XMZ, XTR, and XPL (Figure 2—figure supplement 1J).

IZ cluster

In the bulk RNA-Seq experiment, 10 genes were identified as enriched in the double positive IZ cells. Six of these 10 markers are expressed at levels that are detected in the scRNA-Seq. Of these six, five are enriched in the IZ cluster (Figure 2—figure supplement 1K; Supplementary file 1). The top IZ-enriched markers continue to be CG30090, lectin-24A, MFS3, and CG13482 (Figure 1E and K; Figure 2—figure supplement 1K). Single-cell analysis also confirms that the IZ cluster expresses lower levels of canonical CZ markers compared to the PL clusters and does not express mature blood cell markers (Figure 1K). On the t-SNE, the IZ cluster lies between MZ2 and PL (Figure 1J), consistent with its intermediate nature between progenitors and differentiated cells. Although IZ cells form a single cluster, they are found in multiple trajectory states, IZ-3, IZ-4, IZ-5, and IZ-7 that lie between MZ2 and PL/CC on the trajectory and in pseudotime (Figure 2A–D and G). IZ-3 borders and represents a step immediately after MZ2-3 (Figure 2B and D). Whereas, IZ-5 cells are either near CC clusters or placed between IZ-4 and IZ-7 (Figure 2D and M). IZ-7 is at the border with PL and therefore represents a transition to committed PLs (Figure 2D). The most parsimonious model is that IZ-5 cells have the capacity to directly become CCs, or alternatively, via IZ-7, they take on a PL fate. Finally, IZ-independent paths to mature blood cells are also observed that are described later.

proPL and PL clusters

Two separate clusters (proPL and PL) identified in scRNA-Seq both enriched for hallmark PL genes and neither cluster expresses CC-specific genes (Figure 1K). Expression of several PL markers is slightly lower in proPL than in PL (Figure 1K), and proPL appears earlier in pseudotime than PL (Figure 2A–C and G). In addition to such quantitative temporal differences in the expression of known markers, the distinction between PL and proPL clusters is highlighted by the differential expression of at least 750 genes that are represented differently between the two clusters (Supplementary file 1).

proPL cluster

The proPL population arises in multiple states on the developmental trajectory appearing first in state 3 (proPL-3), followed by proPL-4/5/7 (Figure 2A–D and G). The proPL subclasses differ from each other in their placement in pseudotime (Figure 2G), and in the expression levels of multiple mitosis- and maturation-related genes (Figure 2—figure supplement 2A, B, J).

As described earlier for IZ, placement of a group of cells in multiple trajectory states is a characteristic of a transitional population that can represent multiple distinct developmental paths. The IZ and proPL cells belong to distinct cell clusters with clear transcriptomic differences, and are largely non-adjacent on the t-SNE; this fact is easier to discern in a 3D t-SNE representation (Figure 2F; better seen in Video 1). This contrasts with the extensive direct adjacencies observed for both proPL and IZ with MZ2 and PL (Figure 1J; Figure 2F; Video 1). Both IZ and proPL cells are found in similar transitional states (3–5) on the trajectory and arise at similar points in pseudotime, although their pseudotime profiles are distinct (Figure 2G).

IZ and proPL are both transitional populations with some similarities in the scRNA-Seq, but can be genetically distinguished from each other in vivo by using a combination of the Q and GAL4 systems. In a genetic background that includes HmlΔ-QF/QUAS (Lin and Potter, 2016) and CHIZ-GAL4/UAS (Spratford et al., 2020), the IZ is marked by CHIZ-GAL4 while HmlΔ-QF marks proPL and PL. CHIZ-GAL4 is a split GAL4 construct that contains a combination of HmlΔ and domeMESO enhancers and includes the strong p65 activation domain. CHIZ-GAL4 expression overlaps with cells that are double positive for the directly driven markers domeMESO-GFP and HmlΔ-DsRed (Spratford et al., 2020). The percentage of cells marked by CHIZ-GAL4 (~17% at 96 hAEL) (Spratford et al., 2020) is consistent with the size of the IZ cluster defined by scRNA-Seq (~14–16% in individual replicates). Early proPL cells can be identified by co-expression of HmlΔ-QF and domeMESO-EBFP2 (Figure 2H; Figure 2—figure supplement 2C). HmlΔ-QF has an identical expression pattern as the HmlΔ-GAL4 from which it is derived (Lin and Potter, 2016) but shows little overlap with CHIZ-GAL4 (Figure 2H–I; Figure 2—figure supplement 2C, D). Why the HmlΔ-QF/HmlΔ-GAL4 constructs exhibit a more restricted expression pattern that does not include IZ cells, while the directly driven HmlΔ-DsRed is expressed at low levels in the IZ is unclear, but is likely due to differences in timing or level of expression. Nevertheless, the use of this complex genetic background allows simultaneous detection of IZ and proPL cells in the same lymph gland (Figure 2H–I; Figure 2—figure supplement 2C, D), and these two populations are largely distinct.

Direct comparison between IZ and proPL shows that they differ significantly in their expression of many genes (Figure 2E). Another important distinction between IZ and proPL is that proPLs (proPL-4 in particular), but not IZ cells, show high AUCell activity for a diverse set of genes that are collectively identified as participants of the backward or equilibrium signal (Pvr, STAT92E, ADGF-A, bip1, RPS8, and Nup98-96) (Figure 2—figure supplement 2A, B). Past genetic analysis had indicated that the CZ initiates the equilibrium signal via ADGF beginning in the second instar (Mondal et al., 2011). Consistent with this finding, we find the loss of equilibrium signaling with PvrRNAi in HmlΔ-GAL4+ cells results in increased differentiation, while PvrRNAi in CHIZ-GAL4+ IZ cells does not cause a similar phenotype (Figure 2J–L; Figure 2—figure supplement 2E). These genetic results, when combined with the data that show that the backward signal genes show higher AUCell activity in proPL relative to IZ and PL, suggest that it is the proPL, and not the IZ or PL, cells that largely participate in the backward signaling to progenitors in vivo. This functional difference, taken together with the trajectory and gene enrichment results, further support the notion that proPL and IZ are two separate transitional populations that follow distinct paths as they mature towards the same set of differentiated cells.

PL cluster

In contrast to the multiple pseudotime states of the proPL cells, virtually all cells of the PL cluster (>99%) are seen exclusively in state 7 (PL-7) at the terminal arm of the trajectory (Figure 2A–D). Accordingly, the highest transcript levels of the mature PL marker, NimC1, are seen in PL-7 (Figure 1K).

The high NimC1 expressing PL-7 can be further subclustered into four smaller groups that we named PL-7a, PL-7b, PL-7c, and PL-7d (or PL-7a/b and PL-7c/d for convenience; Figure 2—figure supplement 2F). PL-7a/b cells have lower NimC1 than PL-7c/d. In fact, the NimC1 levels of PL-7a/b are more similar to that in the transitioning proPL-5/7 cells than in the mature PL-7c/d (Figure 2—figure supplement 2I). By far, the highest NimC1 levels are reserved for PL-7c/d located along the edge of the t-SNE (Figure 2—figure supplement 2F, I). Pseudotime analysis shows that PL-7a/b arise earlier in development than PL-7c/d (Figure 2—figure supplement 2G). We conclude that PLs follow a maturation path from the proPL-5/7 to PL-7a/b and then to the most mature PL-7c/d cells.

A very small fraction of PL cells arise earlier as PL-3 (Figure 2G). They are represented on the t-SNE as a very thin but distinctive border separating MZ2-3 and PL-7 (Figure 2—figure supplement 2F). PL-3 (and some adjacent cells) express unexpectedly high amounts of the progenitor marker E-Cad/shg (Figure 2—figure supplement 2H), as do the PL-7 cells in the vicinity (PL-7a and PL7d). Normally, E-Cadherin expression is a characteristic of MZ, with its expression declining in IZ/proPL and PL clusters (Figure 1K). The placement of the PL-3 subpopulation on the t-SNE, the trajectory, and its high E-Cad level may suggest that PL-3 is derived directly from the MZ progenitors.

CC clusters

The cells of the CC cluster are identified by the high expression of canonical CC markers and by the complete absence of NimC1 (Figure 1K). Additionally, transcripts encoding factors that respond to stress, heat, and unfolded proteins are enriched in CCs (Figure 1—figure supplement 3A). This is consistent with genetic data on CCs as mediators of stress response (Sorrentino et al., 2002; Cho et al., 2018; Miller et al., 2017). Two subclusters of CCs, iCC and mCC, predicted by flow cytometry and bulk RNA-Seq experiments, are also distinguishable in scRNA-Seq (Figure 1J) by their differential expression of the maturation markers PPO1 and PPO2 (Figure 2—figure supplement 3A). Transcripts for lz and hnt are also higher in mCC than in iCC, while Hml shows the opposite trend (Figure 2—figure supplement 3A). Also, lz and hnt correlate positively with PPO2 in both iCCs and mCCs, whereas Hml correlates negatively, especially strongly in mCCs, with PPO2 (Figure 2—figure supplement 3B-D). These results further reinforce that the two subclusters of CCs represent an immature (iCC) and a mature (mCC) population as has been seen in recent transcriptomic studies (Cho et al., 2020; Tattikota et al., 2020).

All mCCs and the vast majority of iCCs (~95%) belong to terminal state 6 (mCC-6 and iCC-6) (Figure 2A–D and G), which is the dedicated CC arm of the trajectory, but a small (~5%) fraction of iCCs are also found in states 5 (iCC-5) and 7 (iCC-7) (Figure 2G and M). iCC-5 and iCC-7 represent distinct developmental paths in the formation of iCCs from their precursors. On the t-SNE, the broader tip of the CC island bifurcates and one arm contains iCC-5 and their adjacent small number of IZ-5 cells, while the other arm includes iCC-7 adjacent to a few PL-7 cells (Figure 2M). Interestingly, these IZ-5 and iCC-5 cells share expression of CG30090 (Figure 2—figure supplement 3E, F) and lz (Figure 2N). Similarly, these PL-7 and iCC-7 cells both express NimC1 (Figure 2—figure supplement 3E, G) and lz (Figure 2N). These results indicate that these IZ-5 and PL-7 cells are the precursors to iCC-5 and iCC-7, respectively. Both of these iCC populations then transition to iCC-6 and further to mCC-6. The IZ-5/iCC-5/iCC-6/mCC-6 path represents the straight-forward maturation of CCs. The PL-7/iCC-7/iCC-6/mCC-6 path, on the other hand, has a step that is a reversal in pseudotime. This likely represents the phenomena of dedifferentiation or transdifferentiation of CCs from PLs, which are supported by genetic data (Terriente-Felix et al., 2013; Leitão and Sucena, 2015).

Comparative gene enrichment in differentiating cells

To better understand major genetic components that control similarities and differences between the cells of IZ, proPL, PL, and CC, we performed gene enrichment analysis (Figure 1—figure supplement 3A).

An important role of differentiated hemocytes is the formation and secretion of ECM/BM components (Tepass et al., 1994; Martinek et al., 2008; reviewed in Fessler and Fessler, 1989; Pastor-Pareja, 2020). Surprisingly, genes related to ECM/BM such as vkg, Col4a1, SPARC, Laminins (A, B1, and B2), and Tiggrin are enriched not only in the PL but also in the transitory proPL and IZ populations (Figure 1—figure supplement 3A; Figure 2—figure supplement 3H). This is also true for genes involved in hydroxyproline production required for Collagen formation, such as PH4αEFB, Pdi, and Plod, which positively correlate with Col4a1 expression (Figure 2—figure supplement 3H, I). However, we found that genes involved in secretion of Collagen and ECM/BM proteins are not enriched in IZ/proPL as they are in PL (Figure 1—figure supplement 3A; Figure 2—figure supplement 3H). We conclude that while ECM/BM proteins initiate their expression and maturation in the transitory IZ/proPL cells, the secretory mechanism for these proteins likely becomes fully functional only at the PL stage.

As expected, components of common signaling pathways, known for their context-dependent function are not zone-specific, with some more broadly represented than others. The Ras/MAPK pathway, for example, is enriched in multiple zones (IZ, proPL, and PL), and is particularly not enriched in CCs (Figure 1—figure supplement 3A; Figure 2—figure supplement 3H). The Notch pathway shows the opposite pattern with enrichment in CCs compared to IZ, proPL, and PL (Figure 2—figure supplement 3H). It could be argued that while hallmark genes make good markers, the distributed ones may contain more developmental information, for example, the above trends suggest that the Notch and MAPK pathways oppose each other in the choice between CCs and PLs, respectively (see case studies later).

More novel and surprising is the finding that genes that belong to prominent metabolic pathways are enriched in the PSC. This prompted us to investigate if key metabolic pathways play unique zone-specific roles in the lymph gland.

Gene enrichment of metabolic pathways

Complete functional insight into the role of metabolism in lymph gland development will require metabolomic analysis, which is beyond the scope of this study. However, much can be gleaned from transcriptomic data since multiple components of any single metabolic pathway are often co-regulated by common transcription factors.

Glycolysis and TCA cycle

Glycolysis-related genes are significantly enriched in the PSC (AUC scores: Figure 3A; individual genes: Figure 3—figure supplement 1A, B). The gene sugarbabe (sug) encodes a transcription factor that regulates multiple glycolysis and gluconeogenesis-related genes. sug and its known downstream targets are highly enriched in the PSC (Figure 3A–B) and they exhibit a strong positive correlation with glycolytic gene expression (Figure 3—figure supplement 1E). In contrast, TCA cycle and oxidative phosphorylation-related genes are particularly not enriched in the PSC (Figure 3A). However, it is very unlikely that the bioenergetic requirements of the PSC are maintained through aerobic glycolysis (Warburg effect) as in cancer cells (reviewed in Liberti and Locasale, 2016; Drosophila example: Wang et al., 2016) because the transcript for lactate dehydrogenase (Ldh), the enzyme involved in the last step of glycolysis is not expressed in the PSC (Figure 3—figure supplement 1B). Combined with the low expression of TCA and Ox-Phos genes, we conclude that the PSC has a very low bioenergetic requirement that is characteristic of quiescent post-mitotic cells. This is further supported by the fact that the percentage of mitochondrial reads is lower in the PSC compared to other clusters (Figure 1—figure supplement 2C).

Pentose phosphate pathway

If not for energy generation, what could be the need for the high expression of glycolytic genes (other than Ldh) in the PSC? The evidence points to the importance of pentose phosphate pathway (PPP), the biosynthetic arm of glucose metabolism (Stincone et al., 2015). PPP-related genes are enriched in the PSC and MZ1 (Figure 3A). The absence of enrichment of oxidative phosphorylation genes in the PSC and their relatively higher levels in the MZ (Figure 3A) points to a higher bioenergetic status for the progenitors than that of the PSC. However, while increased mitochondrial activity facilitates ATP generation, it would also potentially raise reactive oxygen species (ROS) levels in the MZ.

NADPH and ROS

G6PD (Zw), the PPP component enzyme that catalyzes the first reaction in the PPP produces NADPH, a crucial metabolite that maintains glutathione in its reduced form (GSH), which in turn acts as a scavenger of intracellular ROS (Ying, 2008; Fan et al., 2014; Lewis et al., 2014; Kuehne et al., 2015). Unlike NADH, NADPH is produced by only a handful of enzymes, the most prominent being isocitrate dehydrogenase (Idh) (Geer et al., 1979a; Geer et al., 1979b). Both Zw and Idh are enriched in the PSC (Figure 3B; Supplementary file 1), and together they would raise NADPH, facilitating GSH formation and maintaining a low ROS level in the PSC. The lower Idh and Zw expression in the MZ suggests lesser scavenging of physiological ROS content, which has interesting biological correlates from past genetic studies (see Discussion).

Additional important enzymes involved in NADPH generation are malic enzyme (Men) and phosphogluconate dehydrogenase (Pgd) (Geer et al., 1979a; Geer et al., 1979b; reviewed in Stanton, 2012). The CCs, by far, show the highest expression for both Men and Pgd, with a considerable increase from iCCs to mCCs (Figure 3—figure supplement 1C). Presumably, ROS is kept particularly low in the mCCs to prevent premature JNK activation, which is known to promote CC bursting and melanization (Bidla et al., 2007). ROS may also be kept in check by the expression of antioxidants, such as Sod1, Catalase (Cat), Jafrac1, and Trx-2, that are higher in mCCs than in iCCs (Figure 3—figure supplement 1C). Of these, Trx-2 mutations have been shown to cause CC defects (Jin et al., 2008).

Lipids, autophagy, and chaperones

The iCCs and mCCs are also different in additional metabolic aspects. For instance, both peroxisomal and mitochondrial fatty acid beta oxidation, as well as fatty acid synthesis genes decrease in relative expression in mCCs compared to iCCs (Figure 3—figure supplement 1D), whereas glycerolipid remodeling/lipid signaling genes (e.g., Bbc, Pld, Lpin, laza, Plc21C, GK2, sws, and CG10602) are highly enriched in mCC (Figure 3—figure supplement 1D).

Autophagy related genes (such as Atg1, Atg13, and Atg17) and pathways indirectly related to autophagy (reviewed in Soto-Avellaneda and Morrison, 2020; Carra et al., 2010; Kaushik and Cuervo, 2012; Uytterhoeven et al., 2015) are strongly enriched in mCCs relative to iCCs (Figure 3—figure supplement 1D). This includes glycerolipid remodeling/lipid signaling and chaperone-mediated protein folding (e.g., Hsc70-4, Hsp67Bc, and Hsp70Bb) (Figure 3—figure supplement 1D), both of which correlate strongly (r=0.99 and 0.95, respectively) with autophagy genes (Figure 3—figure supplement 1F, G). Future genetic explorations will likely unravel the precise link between lipid signaling, chaperone-mediated autophagy, and the maturation of CCs.

Transcription factors in metabolic control

Among transcription factors that control metabolism-related genes, Spargel (srl; PGC1-α) and its targets are enriched in MZ1 (Figure 3B–C). Srl, a homolog of mammalian PGC1-α, is a transcriptional target of Myc and both Srl and Myc function downstream of the insulin receptor/TOR signaling pathways to mediate ribosome biogenesis, mitochondrial activity, and cell growth (Tiefenböck et al., 2010; Mukherjee and Duttaroy, 2013; Mukherjee et al., 2014; Teleman et al., 2008). Myc and its transcriptional targets, as well as TOR upregulated genes, and those related to ribosome biogenesis are all enriched in MZ1 (Figure 3B–C). These trends are similar in MZ2 when compared to the other clusters. In a related observation by direct comparison of cell size by FSC, we find that MZ progenitors are on average larger in size than the cells of the CZ (Figure 3—figure supplement 2A), which is consistent with the higher growth-promoting pathway activity within the MZ.

Sphingolipid metabolism

The IZ cells frequently express intermediate levels (between MZ and CZ) of most metabolic pathway genes, with the prominent exception of sphingolipid metabolism that is enriched in the IZ (Figure 1—figure supplement 3A). This further reinforces the independent cell-type identity of the IPs. For example, the gene encoding the rate-limiting enzyme for de novo ceramide synthesis pathway (spt2/lace; Kraut, 2011) is enriched in the IZ (Figure 1—figure supplement 3A; Supplementary file 1). This enzyme helps convert palmitoyl-CoA and serine to ceramide. AUCell scores for the entire de novo ceramide synthesis pathway are higher in the IZ when compared to other clusters (Figure 3—figure supplement 2B). Excess ceramide is toxic and is kept in check by enzymes of the glycosphingolipid pathway (Kohyama-Koganeya et al., 2004). Such genes include GlcT, Ect3/Beta-Gal, and CG7997/alpha-Gal that are also enriched in the IZ (Figure 1—figure supplement 3A; Figure 3B; Supplementary file 1).

Ceramide and JNK activation

Ceramide production is linked to JNK activation in Drosophila and in other organisms (Adachi-Yamada et al., 1999; reviewed in Ruvolo, 2003; Kraut, 2011) and predicted JNK/AP-1 targets are enriched in the IZ (Figure 1—figure supplement 3A). AUCell activity for predicted AP-1 target genes is also highest in the IZ relative to the other clusters (Figure 3—figure supplement 2B). Moreover, predicted AP-1 targets positively correlate with de novo ceramide synthesis in IZ cells (r=0.94; Figure 3—figure supplement 2C). Mmp1 is prominent amongst the JNK targets (Uhlirova and Bohmann, 2006; Stevens and Page-McCaw, 2012) in that it is enriched in the IZ and correlates positively (r = 0.9) with de novo ceramide synthesis (Figure 3B; Figure 3—figure supplement 2D). The gene encoding the rate-limiting enzyme in the glycosphingolipid pathway, GlcT, is also a target of the JNK pathway and its expression positively correlates (r = 0.89) with Mmp1 (Figure 3—figure supplement 2E). This suggests an opportunity for feedback inhibition whereby ceramide activates the JNK pathway, including its downstream target GlcT, which limits free ceramide levels. This would prevent uncontrolled JNK activation that can result in cell death (Kohyama-Koganeya et al., 2004).

The possibility of a link between ceramide biosynthesis, JNK pathway, and MMP1 within the transitional IZ population is intriguing from a functional standpoint, and we therefore probed this further using molecular-genetic tools. Immunolocalization using an antibody against MMP1 reveals that the expression of the protein is limited to the region of the IZ (Figure 3D–D’’). MMP1 is a secreted protein, and is detected in cells at the edge of the IZ, likely to act as a metalloprotease in reorganizing the ECM around the newly forming hemocytes. Consistent with the high representation of Mmp1 transcript in the IZ, inhibition of the JNK pathway (JNKK/hepRNAi) in the IZ (CHIZ-GAL4) alone eliminates all the diffuse MMP1 protein detected in the IZ neighbors (Figure 3E–E’’), suggesting the IPs are a source of MMP1. Likewise, a huge increase in MMP1 protein is seen when an activated form of JNKK (hepact) is expressed in the IPs (Figure 3F–F’’). Interestingly, activation of JNK in this manner does not cause extensive cell death suggesting the possible concurrent presence of a cell death inhibition mechanism operating within the IZ cells (Uhlirova et al., 2005). A schematic diagram summarizing the transcriptomic control of metabolic genes in different cell populations is shown in Figure 3G.

Synergistic combinations of genetic and transcriptomic data

Case study 1. Pointed and plasmatocyte formation

The ETS family transcription factor Pointed (Pnt) functions downstream of RTK/Ras/MAPK pathways and regulates differentiation and proliferation in multiple fly tissues including blood (Zettervall et al., 2004; Dragojlovic-Munther and Martinez-Agosto, 2013; Shwartz et al., 2013; reviewed in Vivekanand, 2018). pnt transcript is expressed in very few cells in the PSC, rising slightly in MZ1/MZ2, which is particularly noticeable in MZ2-3 and then continuing in its rising trend in IZ/proPL and PL. This suggests the possibility of multiple functions for pnt in these different cell types. Pnt levels decline significantly in CCs, particularly mCC, suggesting low RTK-related activity in these cells (Figure 4A).

Figure 4 with 1 supplement see all
Role of Pnt in lymph gland development.

(B, C, G, H, and J–O) are maximum intensity projections of the middle third, and (E, F) are single confocal slices. Lymph glands are from wandering (B, C, G, H, J–M) or early (E, F, N, O) third instar larvae. Scale bars: 10 μm. (A) scRNA-Seq analysis shows graded pnt expression in different subpopulations (see text for details). The mean level of pnt expression is represented by the dot color and the percentage of cells that expresses pnt is indicated by the dot size. (B–D) Genetic analysis of domeMESO-GAL4, UAS-GFP, HmlΔ-DsRed lymph glands. (B) Control lymph gland. domeMESO marks MZ (reported by GFP, green) and HmlΔ marks IZ and CZ (reported by DsRed, magenta). (C) Expression of pntRNAi in the MZ cells that are domeMESO positive prevents the formation of IZ and CZ cells. (D) Quantitation of data shown in (B, C) reveals a complete loss of HmlΔ-DsRed+ cells (IZ, proPL, or PL) in domeMESO-GAL4, UAS-pntRNAi. n=11. (E) Genotype, Tep4-GAL4, UAS-GFP, domeMESO-EBFP2, HmlΔ-DsRed. Late progenitors marked by arrowheads are positive for domeMESO (blue) but negative for Tep4 (green) and HmlΔ (red).(F) Genotype, CHIZ-GAL4, UAS-mGFP, domeMESO-EBFP2, Tep4-QF2, QUAS-mCherry. A population of pre-IZ late progenitors marked by arrowheads are positive for domeMESO (blue), but negative for Tep4 (green) and CHIZ (red). (G–I) Genotype, Tep4-GAL4, UAS-GFP, HmlΔ-DsRed.(G) Control lymph gland. Tep4 (green) is expressed in a subset of MZ progenitors. Hml (magenta) marks IZ/CZ. (H) pntRNAi expressed in Tep4+ MZ cells has no effect on the formation of IZ/CZ cells. (I) Quantitation of the data in (G, H) shows no significant difference in the number of HmlΔ-DsRed+ cells when pntRNAi is expressed using Tep4-GAL4 (contrast with D). WT: n=12; pntRNAi: n=10. (J, K) Genotype is the same as in (B, C). (J) Control shows nuclei (DNA, blue) and crystal cells (CCs; Hnt, yellow). (K) Depletion of pnt in domeMESO positive MZ cells does not prevent formation of Hnt+ CCs. Quantitation in Figure 4—figure supplement 1E. (L, M) Genotype, CHIZ-GAL4, UAS-mGFP (GFP not shown). (L) Control shows DNA (blue) and Hnt (yellow). (M) Depletion of pnt in IZ cells does not prevent formation of Hnt+ CCs. Quantitation in Figure 4—figure supplement 1F. (N–P) Genotype, HmlΔ-GAL4, UAS-2xEGFP. HmlΔ (green) and Hnt (magenta). (N) Control. (O) pntRNAi expressed in HmlΔ-GAL4+ cells increases the number of Hnt+ CCs. (P) Quantitation of the data in (N, O) shows a significant increase in the number of Hnt+ CCs with pntRNAi using two independent RNAi lines driven by HmlΔ-GAL4. WT: n=8; pntRNAi 1 & 2: n=10. CZ, cortical zone; IZ, intermediate zone; MZ, medullary zone; PL, plasmatocyte.

Knockdown of pnt specifically in the MZ (domeMESO-GAL4, UAS-pntRNAi) blocks the differentiation of the progenitor population (Figure 4B–C). No HmlΔ-DsRed positive IZ, proPL, or PL cells are detected (Figure 4D). There is an increase in domeMESO positive progenitors, but the complete lack of IZ and CZ results in an overall smaller lymph gland (Figure 4—figure supplement 1A, B). Published literature shows that the marker Tep4 is expressed in a limited number of MZ cells that are the least mature (Benmimoun et al., 2015; Oyallon et al., 2016; Blanco-Obregon et al., 2020). Dome expression initiates in the same cells as Tep4 but extends further within the MZ. In fact, using a combination of cell-marking methods, we clearly detect a population of cells that are dome-positive, HmlΔ-DsRed-negative, and Tep4-negative (Figure 4E) and these are also distinct from the IZ cells since they do not express an IZ specific-GAL4 driver (CHIZ-GAL4) (Figure 4F). In stark contrast to domeMESO-GAL4, UAS-pntRNAi, the same pntRNAi expressed in the high Tep4 positive early MZ progenitors (Tep4-GAL4, UAS-pntRNAi), has no observable effect on either differentiation or lymph gland size (Figure 4G–I; Figure 4—figure supplement 1C, D). Combined with the fact that pnt expression is higher in MZ2-3 than in earlier MZ subpopulations, we propose that Pnt functions in a post-Tep4 and pre-IZ population of dome-expressing cells likely within the sub-state MZ2-3, and promotes their transition into the intermediate IZ and proPL cell types. Interestingly, CCs still form when pnt is depleted in the MZ (Figure 4J–K; Figure 4—figure supplement 1E). This suggests that CC formation does not require Pnt activity and that there is a direct route (perhaps made more prominent under these mutant conditions) for a MZ cell to become a CC without first going through a HmlΔ-DsRed positive IZ/proPL/PL cell type.

Since the loss of pnt in the MZ blocks entry into IZ, the higher pnt expression in the IZ suggests yet another different and additional role in this zone. This IZ function is explored in some detail elsewhere (Spratford et al., 2020), where we demonstrate that loss of pnt in the IZ prevents these cells from exiting their transitional state and prevents PL differentiation. Together with the data presented here, we conclude that Pnt is required for both entry into and exit from the IZ. Once again, CC formation is not affected upon loss of pnt in the IZ (Figure 4L–M; Figure 4—figure supplement 1F), reinforcing the idea of a direct path between MZ and CC without an intervening HmlΔ-positive cell. This is in addition to the IZ-dependent CC formation in wild-type (WT) described earlier.

Finally, since even higher pnt levels are seen in PL (Figure 4A), we eliminated Pnt function in HmlΔ-GAL4 expressing cells (HmlΔ-GAL4, UAS-pntRNAi). This causes a large number of HmlΔ-GAL4+ cells to be converted into CCs (Figure 4N–P). Thus, loss of pnt in a HmlΔ-GAL4+ precursor alters the PL/CC fate choice. Keeping in mind that Pnt is activated by RTK/MAPK pathways and that the Serrate/Notch pathway is important for CC formation, we conclude that Notch activation directs HmlΔ positive cells towards CC differentiation, while Pnt functions antagonistically to prevent this process, driving the cell instead to a PL fate (similar antagonistic interactions between these two pathways are seen in many tissues; reviewed in Sundaram, 2005). Overall, our combined RNA-Seq and genetic analysis demonstrate that Pnt has several distinct, context-dependent functions during lymph gland development and the genetic and sequencing data mutually validate each other.

Note that the expression of pnt changes gradually as one progresses through the lymph gland. There are no quantal jumps between zones, yet the function of Pnt is distinguishable between cell types (summarized in Figure 7C). We believe that most developmentally relevant genes will not be ‘hallmark indicators’ or markers of zones, but the trend and subtle modulations of their expression could have unique functional consequences for each cell type.

Case study 2. Numb/Musashi assisted non-canonical Notch signaling in crystal cells

A canonical, Serrate-dependent Notch signal is required for CC formation from a Hml+ precursor; whereas a separate, non-canonical, ligand-independent and Sima (Hif)-dependent Notch signal is important for CC maintenance (Mukherjee et al., 2011). Mechanistic details of this complex process, which remained elusive for over a decade are described below, and could only be deciphered when genetic data are analyzed in the context of the expression profiles of a number of genes.

Notch target genes have been investigated at length using multiple functional and biochemical criteria (Krejcí et al., 2009; Terriente-Felix et al., 2013). Based on the RNA-Seq data, such a list of targets can be classified into two groups, which for simplicity, we call type I and type II. Type I targets (such as E(spl)m3-HLH, cv-c, and CG3847) are expressed at a higher level in iCCs than in mCCs (Figure 5A), while type II targets (such as CG32369, bnl, and IP3K2), are more highly enriched in mCCs than in iCCs (Figure 5B). The type I targets correlate positively with the maturity marker PPO2 in iCC but negatively in mCCs (Figure 5C). In contrast, type II targets positively correlate with PPO2 in both iCC and mCC populations (Figure 5D). As one example of validation, branchless (bnl), a type II target by its expression, is seen in only a subset of the CCs, expected to be the more mature (Figure 5—figure supplement 1A-A'; Tattikota et al., 2020). Type II Notch targets, including bnl and CG32369, have been shown in independent studies to be both Notch pathway and Sima/hypoxia-responsive (Li et al., 2013; Terriente-Felix et al., 2013; Kamps-Hughes et al., 2015; Du et al., 2017), and we find that enhancer sequences for these two genes contain combinations of both Su(H) and Sima binding sites (Figure 5—figure supplement 1B, C).

Figure 5 with 4 supplements see all
Numb promotes non-canonical Notch/Sima signaling.

(A, B, E, F, K) are from bulk RNA-Seq whereas (C, D, L) are from single-cell RNA-Seq. (A) ‘Type I’ Notch targets with highest expression in iCC, lower in mCC, and lower still in PL. (B) ‘Type II’ Notch targets have their lowest expression in PLs, increase in iCCs, and are expressed highest in mCCs. (C) Type I Notch targets correlate positively with the CC maturity marker PPO2 in iCC and negatively in mCC. (D) Type II Notch targets correlate positively with PPO2 in both iCCs and with an even higher slope in mCCs. (E) Total sima transcript levels are similar in PL and mCC. The usually major splice variant, sima-RA decreases with CC maturity. The normally minor sima-RC isoform increases from PL to iCC and is higher still in mCC. (F) Alignment counts for an exon specific to sima-RC are highest in mCC (red). The FlyBase coordinates are in Figure 5—figure supplement 1D. (G–G’) Live internalization assay in lz-GAL4, UAS-mGFP lymph glands with an antibody against the extracellular domain of Notch (NECD, magenta) to visualize uptake and stabilization of full-length Notch protein. Large Notch punctae are specifically located in mCC (GFPHI; white arrowhead) but not in iCC (GFPLO; yellow arrowhead). (H, I) Protein staining for Hnt (green) and Sima (magenta) shows numerous large Sima punctae in mCC (I, high Hnt) but not in iCC (H, low Hnt). (J–J’’) Full-length endocytosed Notch protein is visualized in a live internalization assay with an antibody against NECD (green, J’) and then fixed and stained for Sima protein (magenta, J). Numerous large NECD and Sima punctae colocalize and therefore appear white in the merged image (J’’). (K) numb transcript level is minimal in PL, increases in iCC and further increases in mCC. (L) Dot plot showing the mean level of numb expression (indicated by dot color) and the percentage of cells that express numb in each population (indicated by dot size). Compared to all cells identified by scRNA-Seq, numb transcript levels are enriched in iCC and are even higher in mCC. (M) Strong Numb protein staining (green) is restricted to mCCs (white arrowheads), with stronger Hnt staining (magenta) and not in low Hnt-expressing iCCs (yellow arrowheads). (N–N’’’) Large Sima punctae (magenta) are only seen in Hnt (gray) positive crystal cells (CCs) with high Numb staining (green). (O) Quantitation of the data in (P–Q’) showing the percentage of Hnt+ CCs that are positive or negative for Sima and Numb in wild-type (WT) and upon knockdown of numb. No Numb negative Sima+ cells are evident in either genotype. Depletion of numb causes loss of nearly all Sima+CCs. n=221 total CCs from WT and n=111 CCs for numbRNAi. (P–P’) WT lymph glands display large Sima punctae (magenta) in Hnt+ (green) CCs. (Q–Q’) The large Sima punctae are eliminated when numb is depleted in CCs using lz-GAL4 UAS-numbRNAi. Quantitation in Figure 5—figure supplement 3F. (R–R’) PPO2 protein (magenta) is high in most Hnt+ (green) CCs in WT. (S–S’) PPO2 levels (magenta) are lower in Hnt+ (green) CCs when numb is depleted using lz-GAL4 UAS-numbRNAi. Quantitation in Figure 5—figure supplement 3I. (T) Flow cytometry shows that when numb is knocked down (Genotype: lz-GAL4 UAS-numbRNAi), a large proportion of the mCCs (GFPHI) are lost while the total number of CCs does not change significantly (2883 in WT vs. 3288 in numbRNAi). Scale bars: 2 μm in (G, H, J, N, P–Q’); 5 μm in (M, R–S’). See Figure 5—figure supplement 3 for lower magnification views of lymph glands shown in (P–S’). CC, crystal cell; iCC, immature crystal cell; mCC, mature crystal cell; PL, plasmatocyte.

Our past work established a role for Sima in CC maintenance (Mukherjee et al., 2011). However, these results predate the sub-classification of CCs into iCC and mCC subpopulations, which could, potentially, provide a cellular context for the switch between the two different modes of Notch signaling. Based on the expression patterns of the type II Notch targets, Sima-dependent signaling is expected to be highest in mCC, however, we detect no significant differences in total sima transcript levels between PL and mCC populations (Figure 5E). This was not entirely surprising because Sima is known to be primarily controlled at the protein level (Bertolin et al., 2016). However, the bulk RNA-Seq experiments have sufficient depth of sequencing to identify different isoforms, and this was key to the understanding of the pathway.

Of the four proposed RNA isoforms, sima-RA, -RB, -RC, and -RD (Figure 5E–F), two, -RB and -RD, are not expressed in lymph glands. The full length and most widely studied isoform is sima-RA, which is significantly higher in its expression in PLs than in CCs (Figure 5E–F). Importantly for this study, the smaller isoform sima-RC is expressed highest in mCCs (Figure 5E–F; Figure 5—figure supplement 1D). The iCCs contain low levels of both isoforms, which explains why the total sima transcript expression is lower in iCCs than in PLs and mCCs (Figure 5E). qRT-PCR using isoform-specific primer sets confirms sima-RC, but not sima-RA, is expressed at a higher level in mCCs compared to iCCs and PLs (Figure 5—figure supplement 1E, F). Interestingly, a past study has suggested that a stabilized Sima protein is involved in the auto-regulation of sima-RC (Kamps-Hughes et al., 2015) that is consistent with the presence of Sima binding sites within the identified sima-RC enhancer element (Figure 5—figure supplement 1G).

Importantly, the predicted protein, Sima-PC, encoded by the smaller than full-length sima-RC transcript, would be truncated and lack the N-terminal motif that is essential for binding its partner Tango (Hif-1β/ARNT). The Sima/Tango heterodimer is essential for eliciting a hypoxia response (Gorr et al., 2004; Romero et al., 2008). Sima-PC retains the oxygen (and prolyl-hydroxylation)-dependent degradation (ODD) domain (Figure 5—figure supplement 1H, I; Gorr et al., 2004; Romero et al., 2008). We conclude that the Sima-dependent non-canonical Notch signaling (Mukherjee et al., 2011) involves the formation of the Tango-independent Sima-PC/Notch heterodimer that does not activate hypoxia-inducible genes. Rather, such a complex will only form in mCCs where the abundance of the Sima-RC transcript is high, and help maintain the mature CCs. Indeed, we find that mCCs, but not iCCs, contain large punctae of endocytosed and stabilized full-length Notch (Figure 5G–G’) and large punctae of Sima protein (Figure 5H–I) that colocalize in the same punctae (Figure 5J–J”). Taken together, the data show that mCCs, but not iCCs, participate in Notch/Sima signaling that is facilitated by the switch to an alternate isoform of Sima in the mature CCs.

Numb and Musashi in CC determination

The existence of punctae containing stabilized N/Sima proteins suggests that endocytic mechanisms are important for this non-canonical Notch pathway. This motivated us to focus on the gene numb, which encodes a component of the endocytic pathway that promotes internalization/trafficking of Notch (Couturier et al., 2013; Yap and Winckler, 2015; Johnson et al., 2016; Shao et al., 2016) and intracellular Numb blocks canonical Notch signaling (Frise et al., 1996; Spana and Doe, 1996). Given that CC induction requires a ligand-dependent canonical Notch signal (Lebestky et al., 2003), we found it surprising that numb RNA is enriched in CCs (Cho et al., 2020; Figure 5L). numb levels are by far the highest in both iCCs and mCCs compared to all other lymph gland cell types (Figure 5L), with a significant increase during the transition from iCC to mCC (Figure 5K–L), and correlating strongly (r=0.99) with PPO2 (Figure 5—figure supplement 1J). numb is reported to be a Notch target (Rebeiz et al., 2011), a conclusion we find is well-supported for CCs as well. Constitutively active Notch (NotchACT) expressed in CCs, raises numb levels while knockdown of Notch observed in NotchRNAi has the opposite effect (Figure 5—figure supplement 2A-G).

By far, the most spectacular control of Numb is post-transcriptional. In spite of the significant quantities of numb RNA in iCCs, Numb protein is exclusively detected in mCCs and not in iCCs (Figure 5M; also later in Figure 6A–C’). In fact, immunohistochemically detected Numb protein is a new and distinctive marker for mCC. The large Sima punctae described above are also exclusively seen in the Numb-expressing mCCs (Figure 5N–N’’’; and WT quantitation in Figure 5O). These full-length Notch/Sima punctae co-localize with Numb punctae in a live endocytosis assay and are also seen in Hrs8-2 positive early endosomes (Figure 5—figure supplement 3A-C’’’’).

Figure 6 with 1 supplement see all
Musashi (Msi) regulates Numb protein levels.

(A–C’) Three independent criteria distinguish iCCs from mCCs and validate Numb expression. A pair of cells is shown in each panel, and for each, the one on the left (yellow arrowhead) is iCC and the one on the right (white arrowhead) is mCC by the described criteria. (A–A’) iCC is low in Hnt (cyan) and PPO2 (green), whereas mCC is high for both. Numb (magenta) is only seen in mCCs. (B–B’) iCC is low and mCC high in lz> mGFP expression. Numb (magenta), is only seen in mCCs. (C–C’) iCC expresses and mCC is negative for HmlΔ>GFP (green) expression. Hnt (cyan) is lower in iCC than in mCC. Numb (magenta) is only detected in mCCs. (D) Bulk RNA-Seq shows msi transcript levels decrease with crystal cell (CC) maturity from PL to iCC to mCC. (E) Mean level of msi expression (indicated by dot color) and the percentage of cells that express msi in each population (indicated by dot size) are represented in a dot plot. scRNA-Seq shows reasonably uniform msi transcript level in all lymph gland cells with the exception of mCCs in which msi transcript is much lower. (F) msi expression negatively correlates with PPO2 levels in iCC and mCC populations. (G–G”) CCs (Hnt+; gray) with low levels of the fusion protein Msi-GFP (green; white arrowheads) have high Numb staining (magenta), indicating that they are mCCs. iCC with low Hnt staining (yellow arrowhead) shows higher levels of Msi-GFP and no Numb protein. (H–J) Genotype, lz-GAL4, UAS-mGFP. Numb (magenta) is expressed at moderate levels in wild-type (WT) lz>mGFP+ CCs (H–H’) and increases significantly with lz-GAL4, UAS-msiRNAi (I–I’). Quantitation of Numb levels in individual crystals, n=80 (J). (K–M) CCs are visualized with Hnt (gray) and HmlΔ-GAL4, UAS-2xEGFP is used as a driver. In WT (K–K’) HmlΔ>GFP positive cells (green) do not show high levels of Numb protein (magenta). Whereas upon expression of UAS-msiRNAi (L–L’), Numb protein greatly increases. Data quantitating Numb levels in individual CCs, n=60 (M). All images are single confocal slices. See Figure 6—figure supplement 1 for lower magnification views of lymph glands shown in (G–G’’, H–I’, K–L’). Scale bars: (A–C), 2 μm; (G), 5 μm; (H I, K, L), 10 μm. iCC, immature crystal cell; mCC, mature crystal cell; PL, plasmatocyte.

When trapped in endosomes, full-length Notch can promote a signal even in the absence of a ligand (Vaccari et al., 2008; reviewed in Fortini and Bilder, 2009). We propose that the exclusive and specific expression of the Numb protein in mCCs assists such a trapping of the full-length Notch/Sima complex in early endosomes. This enhances rather than inhibits signaling by this unusual and non-canonical pathway. As a direct genetic test of this model, we downregulate numb in CCs (lz-GAL4; UAS-numbRNAi; Figure 5—figure supplement 1K-M) and find a clear reduction in the large Sima punctae, as well as reduction in PPO2 expression, indicating a decrease in the number of mCCs without affecting the total number of CCs (Figure 5; Figure 5—figure supplement 3D-I). In summary, loss of Numb has no effect on initial CC specification and iCC induction; instead depletion of Numb inhibits the maturation of iCCs to the mCC state. These results are further confirmed by flow cytometric analysis, which shows that when numbRNAi is expressed in all CCs, the proportion of mCCs decreases with a corresponding increase in iCCs (Figure 5T). Flow cytometry also shows that numbRNAi exhibits smaller cell size (FSC) and less cellular complexity (SSC) in mCCs when compared to WT, but does not cause such reductions in iCCs (Figure 5—figure supplement 4A-D). As shown earlier, the most mature CCs contain >4N DNA attributed to endocycling (Figure 1H). We find that numb knockdown in CCs causes a very clear reduction in the number of cells with >4N DNA content (Figure 5—figure supplement 4E). Thus loss of numb reduces mCCs (Figure 5T), and the ones that remain show loss of endoreplication and other signs of terminal CC maturity.

To summarize, numb mRNA is expressed in all CCs, but its translation to Numb protein is blocked in iCCs and is therefore unable to inhibit the Ser/N specification signal. In this model, it is essential that this translational block be specifically eliminated in mCC, where Numb promotes the Sima/Notch non-canonical signal (summarized in Figure 7D). Investigations in mammalian systems have suggested that the RNA-binding protein Musashi (Msi) binds to numb mRNA and represses its translation (Imai et al., 2001). RNA-Seq results indicate that msi mRNA is expressed widely in the lymph gland, with the clear and notable exception of CCs, particularly in mCCs where msi is not enriched (Figure 6D–E). msi RNA shows a very strong negative correlation with PPO2 (Figure 6F). It is therefore reasonable to consider msi as a candidate for the numb translational repressor in iCCs but not in mCCs. Consistent with this proposal, the Msi protein is also expressed broadly except in the Numb positive mCCs (Figure 6G–G’’; Figure 6—figure supplement 1A-B'). A knockdown of msi (Figure 6—figure supplement 1G) in all CCs (lz-GAL4, UAS-msiRNAi) causes a strong increase in Numb protein level in individual CCs relative to WT (Figure 6H–J; Figure 6—figure supplement 1C-D'). Similarly, when msiRNAi is driven in HmlΔ expressing cells (HmlΔ-GAL4, UAS-msiRNAi) that include the iCCs but not mCCs (Figure 6C–C’), an even more dramatic increase in the Numb protein is readily evident (Figure 6K–M; Figure 6—figure supplement 1E-F'). Combining the established nature of Msi function in mammals as an RNA-binding post-transcriptional repressor of numb (Nakamura et al., 1994), with the transcriptomic and genetic data presented here, we conclude that Msi controls numb RNA post-transcriptionally in iCCs and the absence of Msi in the mCC allows Numb protein to be made in these cells (summarized in Figure 7D).

Summary of markers, case studies, and a model for the developmental progression of lymph gland cells.

(A) Table showing representative combinations of markers that can be used to distinguish between the subpopulations of cells identified in this study. Markers specific to any one cell type or cluster are rare. The entire transcriptome contributes to cellular identities. The combinations listed here are useful identifiers of the subpopulations. Alternative marker combinations are separated by ‘OR.’ (B–D) LG developmental models. Nodes at which cells make alternate fate choices (FCs) are marked by asterisks. (B) A model demonstrating the complexity and developmental progression of cell types within the Drosophila lymph gland (see Discussion for details). PSC cells (not shown) have a developmental origin distinct from the rest of the primary lobe cells. The medullary zone (MZ) cells are found in clusters MZ1 (earliest in pseudotime state 1) and MZ2. The most mature subpopulation within the progenitors is MZ2-3, which is an FC point that leads to two alternate transitional zones IZ and proPL (pPL), each replete with its own set of temporally distinct states. IZ-5 is the second FC node that allows either a plasmatocyte (PL) or a CC fate choice. The third FC node is at PL-7a/b, a state where a choice between a CC fate is weighed against maturation to a terminally differentiated PL. Additional minor paths and variations thereof are discussed in the text. The model emphasizes flexibility as its central feature that is characterized by transitional and alternate paths connecting progenitors to differentiated cells. These routes are specified by gradual trends and combinations of gene expression rather than by quantal transitions between cell types maturing towards common or distinct fates. (C) Context-specific function of Pnt is determined by its graded expression pattern that increases progressively from low (pntLO) in the early MZ2-1/2, to medium (pntMED) in the later MZ2-3 progenitors. Pnt plays a role, indicated by green arrows, in differentiation of MZ2-3 progenitors into either the IZ or proPL (pPL) cell types. Transcript levels for pnt are higher in IZ/proPL (designated MED/HI) than in the MZ. Its function is required for both the entry into and exit from the IZ state. High levels of Pnt (pntHI) promote PL fate and inhibit CC formation. (D) A combination of results from bulk and scRNA-Seq, flow cytometry, and in vivo genetics is used here to depict the mechanism by which canonical and non-canonical Notch signaling define CC fate. Precursors of CCs (preCC) receive a Serrate-dependent canonical Notch signal to be specified as iCCs. The RNA-binding protein Musashi (Msi), expressed in all preCCs and iCCs, inhibits Numb translation, allowing canonical Notch signaling. Msi is not expressed in mCCs (we speculate it is repressed by high Lz), and therefore Numb is expressed in these cells. This allows a non-canonical ligand-independent Sima-PC/Notch signal while inhibiting any residual Ser-dependent canonical Notch signaling in the mature CCs. CC, crystal cell; iCC, immature crystal cell; IZ, intermediate zone; mCC, mature crystal cell.

Discussion

Heterogeneity of cell types

The cells of the small, hematopoietic lymph gland tissue are far more complex at the genome-wide expression level than could have been anticipated by earlier marker and genetic analyses. This is now confirmed by this work, and by the earlier results of Cho et al., 2020. The first step in our analysis was to separate cells by FACS based on the canonical markers that classically define each zone within the lymph gland (reviewed in Banerjee et al., 2019). When probed for the presence of known ‘hallmark genes,’ the separated cells expressing them match up with their corresponding zones, providing early validation of the methods used. This process also allows us to identify zone enriched gene expression for less well-characterized cell types, including the IZ cells (IPs), as well as immature and mature CC types (iCC and mCC). This bulk RNA-Seq approach was further extended using scRNA-Seq and genetics to identify possible combinations of markers that identify each cell type (Figure 7A). However, the primary goal of this work is not to identify more tissue-specific hallmark genes (although several were found), but to utilize RNA-Seq as a tool with other genetic strategies to understand: cell-fate specification, the multiple developmental paths available to a cell, and the mechanistic links between expression trends and developmental function. Many individual examples, and two complete case studies are presented that solve long-standing questions in Drosophila hematopoiesis (Figure 7C–D).

The transcriptomic data are most useful in determining trends in the collective behavior of a set of related genes. At the core of this assertion is the fact that most developmentally relevant genes function in a context-dependent manner, and their individual expression is therefore not exclusively limited to a single cell type, but certain combinations of expressed genes could approximate their identities (Figure 7A). Obvious exceptions are genes marking functions of terminal states such as lz or NimC1, but even in such cases, RNA expression begins in multipotent precursors and continues in the terminal cell types. The case studies presented in this work demonstrate this concept, showing that a graded expression pattern of a transcription factor allows the identification of specific phenotypes for each developmental step. Similarly, expression of an alternate isoform for the protein Sima and the RNA-binding protein Msi explains why Numb inhibits canonical Ser/Notch function but not non-canonical Sima/Notch function in the same cell type. Thus the motivation for this study is to provide multiple examples that take advantage of the ready access to genetic tools that make Drosophila a particularly attractive system in which to establish detailed mechanistic aspects of complex pathways. Based on the long history of conservation of basic principles, it is not unreasonable to expect that parallels to such mechanisms will be found in mammalian hematopoiesis.

Employing fairly conservative criteria for cluster separation in scRNA-Seq, we identify eight primary clusters. The CCs were subclustered to yield iCC and mCC giving rise to the following nine groups of cells: a single cluster each for PSC, X (a mitosis and replication stress-related cluster), PL, and CC (subclustered into iCC and mCC). Two clusters each were identified for MZ (MZ1 and MZ2), and one for the two transitional populations (IZ and proPL). The compact arrangement of the majority of clusters implies smooth developmental transitions between them even as, from a gene-enrichment point of view, they represent different cell types. However, from a developmental biology point of view, it is the functional differences between clusters that must be used to define them as distinct cell types. It is virtually impossible to find any transcript that is 100% cell-specific, and therefore our analysis focuses on trends and enrichments in transcriptional patterns. Sometimes, as in the case of pnt, the changes in expression along each developmental step can be very small, but the trend defines its multiple functions and only functional data from mutant analysis provides validation for the gene expression patterns.

RNA-Seq is by now a commonly used technique in many fields, although its first use in lymph gland hematopoiesis was relatively recent (Cho et al., 2020). That study identified new markers and validated the expression of a representative number of the expressed genes. In Supplementary file 2, we present a detailed comparison of the transcriptional map comparing the clusters and subclusters of Cho et al., with those generated in our single-cell RNA-Seq. By comparing the sizes of the clusters/subclusters, the overlapping gene lists, and the expression patterns and genetic profiles (Supplementary file 2), we find that MZ1 is similar to the PH1 and PH2 subclusters in Cho et al.; MZ2 is similar to PH3 and PH4; IZ to PH5 and PH6; proPL to PM1; PL to PM2, PM 3, and PM4; PSC to PSC; iCC to CC1; mCC to CC2; and X is most similar to the ‘GST-rich’ cluster of Cho et al. The differences in where boundaries are drawn could arise from many sources, such as the experimental technique (drop Seq by Cho et al. vs. 10×), genetic background (Oregon R vs. w1118), and perhaps most importantly, the computational strategy (manual curation and aggregation of the clusters based on known gene expression by Cho et al. vs. unsupervised graph-based clustering in this study). Both studies provide useful data. The strength of our study is that we use FACS to sort populations defined as MZ, CZ, IZ, CC, and so on, and therefore, we are certain that the two clusters MZ1 and MZ2, for example, belong to the traditionally defined ‘MZ’ (Jung et al., 2005) and the same is true for the others. The second strength is that our strategy requires the use of multiple backgrounds and biological replicates, and the results are very consistent. Finally, given that most expression patterns represent trends rather than specific cells, and often different from the proteins they encode (such as for numb), the strongest validation of expression data, we feel is when it is in agreement with genetic strategies based on loss of function in a subset of cells (such as with pnt or Mmp1).

An updated model for lymph gland hematopoiesis

In Figure 7B, we summarize our results to present a model of lymph gland development. Our analysis is based on a single time point in development but the occupancy states in pseudotime allow us to use maturation states as a form of developmental clock. The model is largely based on adjacencies, genetic compositions, and validation by mutant analysis. Transition from pre-progenitors to progenitors, then through transitional IZ or proPL populations, finally on to PLs or CCs is a continuous process traversing gradually through a permissive landscape. It does not appear to be a set of pre-programmed, quantal decisions that a cell makes based on the expression of a single fate-specifying gene. This idea is gaining increased traction in the newer reports on mammalian hematopoiesis (Velten et al., 2017; Rodriguez-Fraticelli et al., 2018; Weinreb et al., 2020).

The developmental trajectory for Drosophila hematopoiesis is branched, and the subdivision of 9 expression-based clusters into 22 subpopulations is based on both cell type and the trajectory state in which they reside. It is important to point out that in this context, the cluster name (e.g., MZ1 or MZ2) represents cell types distinguishable by their gene-enrichment profile, whereas the ‘states‘ (such as MZ2-1, MZ2-2, and MZ2-3) represent the same cell type (MZ2), but appearing at different pseudo-times (1, 2, or 3). Although the analysis is a snapshot of a particular real-time point in development, many developmental steps of a single cell type are represented as progress in pseudotime. For example, the MZ2-3 state is composed of the most mature cells of the MZ2 cell type. The next transitions to either of the two separate transitional cell types, IZ or proPL, that define alternate developmental paths. The cell states MZ2-3, IZ-5, and PL-7a/b (marked with an asterisk in Figure 7B–D) are nodes of bifurcation based on this model. Some details of the model require further functional confirmation in vivo that is beyond the scope of the current manuscript. It is anticipated that such details of cell identity will change with future refinements. However, the model in Figure 7B provides a blueprint and a rich opportunity to study changes in signaling, cell cycle, or possible modes of cell divisions that promote alternate cell fates.

Transition zones provide alternative paths for hematopoietic development

An important finding of this study is the demonstration of alternate paths that initiate with the same progenitor types and terminate in the same differentiated fate, but they traverse through distinct transitory cell types. The distinction between transitional states such as IZ and proPL would be less remarkable, if they did not also have additional unique characteristics and functions. For example, together the genetic and RNA-Seq data suggest that proPL is likely a major source of the equilibrium signal, whereas IZ largely contributes to the JNK signal. The two cell types are largely non-overlapping and virtually non-adjacent in a 3D t-SNE representation of the clusters. These alternate routes are reminiscent of the concept of progression through alternate epigenetic landscapes proposed by Waddington (Waddington and Kacser, 1957) at the very dawn of Developmental Biology. Finally, in T cell development, there is evidence to suggest that intermediate cells bridge the major singly and doubly marked populations, but even less is known about their possible developmental roles (Kaech and Cui, 2012).

Minor paths not involving either of the two major transitional states (IZ or proPL) are consistent with, but not fully established yet by our data. For instance, the earliest PL clusters (PL-3) are sandwiched between MZ2 and PL-7 with no intervening proPL or IZ cells, suggesting a direct MZ to PL path, or perhaps one that involves X as an intermediary. As another example of a minor path, a small number of iCC cells follow the path PL-7/iCC-7/iCC-6/mCC-6. The iCC-7 to iCC-6 transition is a reversal in pseudotime. Although unexpected, this supports the concepts of transdifferentiation (Leitão and Sucena, 2015) and dedifferentiation (Terriente-Felix et al., 2013) proposed in Drosophila hematopoiesis. It will be interesting to determine in future studies if paths that are minor during homeostasis become more prominent under stress or immune challenge when a rapid and amplified response is prioritized over orderly development.

Developmental metabolism and the transcriptome

Contrary to a commonly held viewpoint, metabolic pathways are regulated in a cell-specific manner and their participation is not limited to ‘housekeeping’ roles during development. Indeed, data on both cancer and developmental metabolism show that selective use of such pathways can drive certain critical developmental decisions instead of the other way around (Pavlova and Thompson, 2016; Nagaraj et al., 2017; Miyazawa and Aulehla, 2018; Chi et al., 2020; Li and Simon, 2020; Nakamura-Ishizu et al., 2020; Tiwari et al., 2020).

The analysis presented in this paper demonstrates that in Drosophila hematopoiesis, cells within individual zones are not only defined by their position within the organ and the markers that they express, but also by their metabolic status that is foreshadowed by the content of their transcriptome. The PSC cells, as a group, for example, are well represented by most upper glycolysis genes that are then used, not for bioenergetic purposes, but to increase the PPP flux of glucose metabolism that aids in maintaining an NADPH/GSH-dependent low ROS status for these cells. This is important as high ROS in the PSC is a trigger for a specific immune response that must be repressed during homeostasis (Sinenko et al., 2011; Louradour et al., 2017). Interestingly, the immediately adjacent MZ cells are lower in NADPH-forming enzymes, and their genes controlling oxidative phosphorylation are higher than in the PSC. This would lead to higher ROS even during homeostasis. Indeed, the MZ ROS levels are high and this physiological amount is essential for progenitor differentiation (Owusu-Ansah and Banerjee, 2009). A very interesting example of metabolic control is in the IZ cluster. Surprisingly, this narrow band of cells is enriched for genes required for both synthesis and clearance of free ceramide from a cell. This is important given the known role of ceramide in the activation of the JNK pathway, and we provide genetic and immunohistochemical evidence of transient activation of JNK and MMP1 in this group of cells.

Unlike cancer metabolism, developmental metabolism is at a surprisingly early phase of research, and Drosophila hematopoiesis could be a very attractive system to study this phenomenon during homeostasis. More broadly, the results point to the continued relevance of the use of Drosophila as the singular invertebrate hematopoietic model, which provides a logical framework within which to establish less-studied concepts such as the characterization of parallel transitory populations, the roles of developmental metabolism, mechanisms of unusual signaling paradigms, and genetic dissection of pleiotropy.

Materials and methods

Key resources table
Reagent type (species) or resourceDesignationSource or referenceIdentifiersAdditional information
Genetic reagent (Drosophila melanogaster)domeMESO>GFP HmlΔ-DsRedBanerjee LabdomeMESO-GAL4, UAS-EGFP, HmlΔ-DsRed
Genetic reagent (D. melanogaster)lz-GAL4Gift from John Pollock, Lebestky et al., 2000
Genetic reagent (D. melanogaster)domeMESO-GFP.nls, HmlΔ-DsRed.nlsBanerjee Lab,
Makhijani et al., 2011
Recombinant of dmGFPnls (Banerjee Lab) and HmlDsRednls (Brückner Lab)
Genetic reagent (D. melanogaster)domeMESO-EBFP2Banerjee Lab,
Evans et al., 2014
Genetic reagent (D. melanogaster)CHIZ-GAL4, UAS-mGFPBanerjee Lab,
Spratford et al., 2020
IZ-specific split GAL4
Genetic reagent (D. melanogaster)Tep4>GFP HmlΔ-DsRedBanerjee LabRecombinant of Tep4-GAL4,UAS-EGFP, HmlΔ-DsRed
Genetic reagent (D. melanogaster)Tep4-QF2 G4HThis studyTep4-GAL4 NP7374 line (DGRC Kyoto) was converted to QF2 using the HACK method (Lin and Potter, 2016)
Genetic reagent (D. melanogaster)HmlΔ>2xEGFPBanerjee LabRecombinant of HmlΔ-GAL4, UAS-2xEGFP
Genetic reagent (D. melanogaster)UAS-NotchACTGift from Dr. Artavanis-Tsakonas,
Artavanis-Tsakonas et al., 1999
Genetic reagent (D. melanogaster)lz-Gal4, UAS-mGFPBloomington Drosophila Stock CenterBDSC:6314RRID:BDSC_6314
Genetic reagent (D. melanogaster)UAS-pntRNAiBloomington Drosophila Stock CenterBDSC:35038RRID:BDSC_35038
Genetic reagent (D. melanogaster)UAS-pntRNAiBloomington Drosophila Stock CenterBDSC:31936RRID:BDSC_31936
Genetic reagent (D. melanogaster)UAS-hepACTBloomington Drosophila Stock CenterBDSC:9306RRID:BDSC_9306
Genetic reagent (D. melanogaster)UAS-mihepBloomington Drosophila Stock CenterBDSC:35210RRID:BDSC_35210
Genetic reagent (D. melanogaster)UAS-numbRNAiBloomington Drosophila Stock CenterBDSC:35045RRID:BDSC_35045
Genetic reagent (D. melanogaster)LexAop2-6XmCherryBloomington Drosophila Stock CenterBDSC:52271RRID:BDSC_52271
Genetic reagent (D. melanogaster)UAS-NotchRNAiBloomington Drosophila Stock CenterBDSC:7077RRID:BDSC_7077
Genetic reagent (D. melanogaster)Msi-GFPBloomington Drosophila Stock CenterBDSC:61750RRID:BDSC_61750MiMIC protein trap
Genetic reagent (D. melanogaster)HmlΔ-QF2Bloomington Drosophila Stock CenterBDSC:66468RRID:BDSC_66468
Genetic reagent (D. melanogaster)10XQUAS-6XmCherryBloomington Drosophila Stock CenterBDSC: 52269RRID:BDSC_52269
Genetic reagent (D. melanogaster)bnl-LexAGift from Dr. Roy, Du et al., 2017
Genetic reagent (D. melanogaster)UAS-msiRNAi, UAS-Dcr2Gift from Dr. Wappner, Bertolin et al., 2016
Genetic reagent (D. melanogaster)HmlΔ-DsRednlsGift from Dr. Brückner, Makhijani et al., 2011
Genetic reagent (D. melanogaster)UAS-PvrRNAiGift from Dr. Shilo, Rosin et al., 2004
AntibodyAnti-Notch extracellular domain (NECD) (Mouse monoclonal)Developmental Studies Hybidoma Bank (DSHB)Cat# ABS571, RRID:AB_528408IF(1:50)
AntibodyAnti-Hrs 8-2 (Mouse monoclonal)Developmental Studies Hybidoma Bank (DSHB)Cat#, Hrs 8-2 RRID:AB_722114IF(1:100)
AntibodyAnti-MMP1 (Mouse monoclonal)Developmental Studies Hybidoma Bank (DSHB)Cat#3B8D12,RRID:AB_579781; 3A6B4, RRID:AB_579780; and 5H7B1RRID:AB_579779,IF(1:100 of a 1:1:1 mixture of 3B8D12, 3A6B4 and 5H7B1)
AntibodyAnti-Hnt(Mouse monoclonal) concentrateDevelopmental Studies Hybidoma Bank (DSHB)Cat#1G9-c,RRID:AB_528278IF(1:200)
AntibodyAnti-P1(Mouse monoclonal)Gift from Dr. Ando, Kurucz et al., 2007IF(1:100)
AntibodyAnti-PPO2(Rabbit polyclonal)Gift from Dr. Asano, Asano and Takebuchi, 2009IF(1:200)
AntibodyAnti-Numb(Guinea pig polyclonal)Gift from Jan lab, Roegiers et al., 2001IF(1:200)
AntibodyAnti-Numb(Rabbit polyclonal) preabsorbedGift of Jan lab, Rhyu et al., 1994IF(1:200)
AntibodyAnti-Sima(Guinea pig polyclonal)Banerjee Lab, Wang et al., 2016Rhyu et al., 1994preabsorbed; IF(1:100)
Sequence-based reagentSima-RA/RB_FThis studyPCR primersGCAGAACTTCAAGGTGCAATAA
Sequence-based reagentSima-RA/RB_RThis studyPCR primersCACCGTTCACCTCGATTAACT
Sequence-based reagentsima-RC_FThis studyPCR primersGAGGCGCACTAGTGACAAA
Sequence-based reagentsima-RC_RThis studyPCR primersCGAGCGAGATAGCAACGG
Sequence-based reagentmsi_FFlyPrimerBankPP29850ACGTCGTCTGACAAGCTCAAG
Sequence-based reagentmsi_RFlyPrimerBankPP29850GAATGTGATGAAACCAAAGCCG

Drosophila strains

Request a detailed protocol

The Drosophila lines from our lab stock were used in this study as follows: domeMESO>GFP HmlΔ-DsRed, domeMESO-GFP.nls HmlΔ-DsRed.nls, domeMESO-EBFP2, CHIZ-GAL4 UAS-mGFP (IZ-specific GAL4; Spratford et al., 2020), Tep4>GFP HmlΔ-DsRed, Tep4-QF2 (first used in this study), and HmlΔ>2xEGFP. The following lines were obtained from the Bloomington Drosophila Stock Center (BDSC): lz-Gal4 UAS-mGFP (#6314), UAS-pntRNAi (#35038 or #31936), UAS-hepACT (#9306), UAS-mihep (#35210), UAS-numbRNAi (#35045), LexAop2-6XmCherry (#52271), UAS-NotchRNAi (#7077), Msi-GFP MiMIC protein trap (#61750), HmlΔ-QF2 (#66468), and 10XQUAS-6XmCherry (#52269). The following lines were kind gifts from other labs: bnl-LexA from Dr. Roy (Du et al., 2017), lz-GAL4 from Dr. Pollock (Lebestky et al., 2000), UAS-NotchACT from Dr. Artavanis-Tsakonas (Artavanis-Tsakonas et al., 1999), UAS-msiRNAi UAS-Dcr2 from Dr. Wappner (Bertolin et al., 2016), HmlΔ-DsRednls from Dr. Brückner (Makhijani et al., 2011), and UAS-PvrRNAi from Dr. Shilo (Rosin et al., 2004).

The potency of the RNAi lines listed above has been confirmed. Briefly, we demonstrate loss of Numb or Notch protein expression with numbRNAi (Figure 5—figure supplement 1K-M) or NotchRNAi (Figure 5—figure supplement 2E-G), respectively. Lines such as pntRNAi are used routinely in the laboratory for diverse experiments in different tissues. For pntRNAi, we use two different and independently generated RNAi lines for pnt that both give rise to the same reproducible phenotype, both qualitatively and quantitatively, with two different GAL4 drivers (HmlΔ> and CHIZ>, see Figure 4Q, Figure 4—figure supplement 1F). The potency of the miRNA against the JNKK hep is demonstrated by the complete loss of staining for MMP1, a known downstream target of JNK pathway (Figure 3I–I’’). msiRNAi shows a statistically significant reduction of msi transcript levels by qPCR (~67% lower, Figure 6—figure supplement 1G) and phenotypes are seen with msiRNAi using multiple independent drivers (HmΔl > and lz>, Figure 6H–M). PvrRNAi was previously used and validated (Rosin et al., 2004), also in our lab (Mondal et al., 2011; Mondal et al., 2014) and the phenotype shown (Figure 2L) is the same as published.

Preparation of single-cell suspension from larval lymph glands

Request a detailed protocol

Larvae were collected at the third instar stage (90–93 hr AEL for scRNA-Seq; 90–96 hr AEL for bulk RNA-Seq of domeMESO-GFP.nls HmlΔ-DsRed.nls; 93–117 hr AEL for bulk RNA-Seq of lz>GFP, HmlΔ-DsRednls) and washed with DEPC water on a shaker to remove food traces before dissection. Pairs of lymph gland anterior/primary lobes were dissected (posterior/secondary and tertiary lobes were removed by mechanical separation using forceps) from 11 larvae, including 6 females and 5 males (for single-cell RNA sequencing) and approximately 100 larvae (for bulk RNA sequencing) in 1× modified dissecting saline (MDS) buffer (9.9 mM HEPES-KOH, 137 mM NaCl, 5.4 mM KCl, 0.17 mM NaH2PO4, 0.22 mM KH2PO4, 3.3 mM Glucose, and 43.8 mM Sucrose, pH 7.4) and lymph glands were then placed into a glass dish containing Schneider’s medium (Gibco) kept on ice. Glass dishes were pretreated with 1% BSA in phosphate-buffered saline (PBS) and rinsed prior to use to prevent adherence of primary lobes. Three biological replicates were done in parallel. The lymph glands were dissociated as previously described with some modifications (Harzer et al., 2013; Khan et al., 2016). After being washed with MDS buffer twice, these tissues were transferred to 1.5 ml DNA LoBind tubes (Eppendorf) and incubated with 200 µl of dissociation solution containing 1 mg/ml of papain (Sigma-Aldrich, P4762) and 1 mg/ml of collagenase (Sigma-Aldrich, C2674) in Schneider’s medium. They were dissociated for 15 min in a shaking incubator at 25°C, 300 rpm. Next, 500 µl of cold Schneider’s medium was added and the suspension was gently pipetted up and down using a low-binding 1000 µl tip (Olympus Plastics) 20 times for mechanical dissociation. After centrifugation at 3000 rpm for 5 min, cells were resuspended and washed with 500 µl of 1× PBS (Corning, MT21040CV) containing 0.04% of UltraPure BSA (Invitrogen, AM2616) and then passed through a 35-µm cell strainer (Falcon 352235). For preparation of the single-cell RNA-Seq sample, the cell suspension was concentrated by centrifuging and resuspending in a lower volume of PBS containing 0.04% BSA (30 µl). Cell concentration and viability were assessed using the Countess II automated cell counter (Applied Biosystems). The samples with final concentration of more than 650 cells/µl and viability of more than 85% were used for single-cell RNA-Seq.

Flow cytometry and cell sorting

Request a detailed protocol

For all RNA extractions from sorted populations, dissociated live lymph gland cells (from approximately 100 pooled larvae) were sorted using the BD FACSARIA-H. Gates and compensation were based on single-color controls. Cells were sorted into 300 µl of DNA/RNA Shield (Zymo) in DNA LoBind tubes and frozen at –80°C prior to RNA extraction.

For DNA content analysis, dissociated lymph gland cells were fixed in 1 ml of 1% formaldehyde solution in PBS after being dissociated using the above protocol. Cells were incubated in fixative in low binding tubes for 30 min at 4°C on a shaker, then were spun down and washed with PBS. Fixed cells were resuspended in a solution of PBS containing NucBlue Live ReadyProbes Reagent (Hoechst 33342) and incubated at room temperature on a shaker for 30 min. Cells were transferred to 5 ml polystyrene tubes for flow cytometry analysis on the BD LSRII. Cells were gated to exclude doublets using the FSC-H versus FSC-W and SSC-H versus SSC-W comparisons.

RNA extraction and qRT-PCR

Request a detailed protocol

For the bulk RNA-Seq and qRT-PCR, total RNA was extracted using the Quick-RNA Microprep Kit (Zymo). RNA quality control was performed using the Agilent 4150 TapeStation system. For qRT-PCR analysis, cDNA was generated using SuperScriptIV VILO Master Mix (Thermo Fisher Scientific). qRT-PCR was performed on cDNA using the PowerUp SYBR Green Master Mix (Thermo Fisher Scientific) and the StepONE Real-Time PCR system (Applied Biosystems). Primers used were as follows: sima-RA/RB Forward 5′-GCAGAACTTCAAGGTGCAATAA-3′; sima-RA/RB Reverse 5′-CACCGTTCACCTCGATTAACT-3′; sima-RC Forward 5′-GAGGCGCACTAGTGACAAA-3′; sima-RC Reverse 5′-CGAGCGAGATAGCAACGG-3′; msi Forward 5′-ACGTCGTCTGACAAGCTCAAG-3′; msi Reverse 5′-GAATGTGATGAAACCAAAGCCG-3′ (Hu et al., 2013).

Bulk RNA-Sequencing and Analysis

Request a detailed protocol

cDNA libraries were prepared using KAPA Stranded mRNA-Seq Kit (KAPA Biosystems) for the MZ/IZ/CZ bulk RNA-Seq experiment and the Universal Plus mRNA-Seq Kit (Nugen) for the CCs bulk RNA-Seq. Libraries were sequenced on two lanes of a HiSeq3000 (Illumina) or NovaSeq 6000 SP (Illumina), respectively. RNA sample and cDNA library concentration and quality control were assessed using the Agilent 4200 TapeStation system. Sequencing data was analyzed using Partek Flow, a web-based software platform. Sequences were aligned to the Drosophila melanogaster reference genome r6.22 (FlyBase) using STAR aligner with default parameters. Read counts were normalized by counts per million (CPM). Differential gene expression analysis was performed using ANOVA with a fold change cutoff of 2 and FDR<0.05 (see Supplementary file 3).

Single-cell RNA-Sequencing

Request a detailed protocol

Three samples were processed using 10× Single Cell 3′ GEX version 3 (10× Genomics) and sequenced on a NovaSeq 6000 S4 PE (Illumina) at UCLA Technology Center for Genomics & Bioinformatics. More than 8000 cells per sample were put through the Chromium Controller Instrument (10× Genomics) to partition single cells into Gel bead-in-Emulsions (GEMs) and 10× barcoded libraries were then constructed. All cDNA libraries were sequenced on one lane of NovaSeq flow cell using a symmetric run (2×150 bp).

Single-cell RNA-Seq data processing and analysis

Request a detailed protocol

Partek Flow was used to analyze single-cell sequencing data. Before alignment, each paired read was trimmed according to 10× Genomics Chromium Single Cell 3′ v3 specifications. Trimmed reads were aligned as described in the bulk RNA-Seq above. After alignment, UMIs were deduplicated and barcodes were then filtered and quantified to generate a single-cell count matrix. Genes with fewer than 30 total reads across three samples were excluded. Low-quality cells and potential doublets were filtered out by selecting out cells with high read counts (>85,000), an especially low (<1500) or high (>4500) number of genes detected, or a high percentage of mitochondrial reads (>6%). 21,157 cells passed these quality control filters across the three samples.

Read counts were normalized with the following order: (1) CPM; (2) Add 1; and (3) Log2 transformation. Genes that were not expressed in any cells were excluded and a total of 9458 genes were detected. Data was then corrected for batch effects between samples using the General Linear Model. All rRNA and three sex-related lncRNA (lncRNA:roX1, lncRNA:roX2, and lncRNA:CR40469) were filtered out. The data was imputed using the MAGIC algorithm (van Dijk et al., 2018) with the number of nearest neighbors=50.

For graph-based clustering, the top 2000 genes with the highest dispersion were used to perform principal component (PC) analysis (PCA). Based on the Scree plot, the first 13 PCs were selected to use as the input for clustering and data visualization tasks. Graph-based clustering was first performed with 50 nearest neighbors (NNs) and resolution (res) of 0.1–0.5, giving 6–14 clusters, with 8 clusters (res=0.19) showing the most marked differences in gene expression between clusters. The data was visualized using t-SNE with perplexity of 50 and PCA initialization.

Each cluster was compared to the others using an ANOVA to identify differentially expressed genes that were enriched more than 1.5-fold (with a FDR cutoff<10×e−6) (see Supplementary file 1). Cluster identity was assigned based on the presence of known zone-specific markers in the differentially expressed genes. Gene set enrichment analysis (GO, KEGG) was performed on the differentially expressed genes for each cluster (see Figure 1—figure supplement 3A).

Gene expression is shown in a couple of different ways. Heatmaps and dot plots were constructed for differentially expressed genes to show expression patterns/trends across the different subpopulations. For heatmaps, the expression values for a specific gene are converted to standardized z-scores as follows: the imputed expression value for that gene in each single cell is subtracted from the mean expression for that gene in all cells and then divided by the standard deviation for all cells using imputed expression data. All heatmaps are displayed using the z-score range of –2 to +2 (with the exception of Figure 2—figure supplement 3A, which uses –3 to +3). For dot plots, the mean expression (mean) and percentage of cells that show expression (non-zero percent) of a given gene is calculated for each subpopulation using non-imputed gene expression data.

To evaluate the developmental progression of the cells from a progenitor to a differentiated cell type, a subsequent trajectory and pseudotime analysis using Monocle 2 (Qiu et al., 2017) were performed. PSC and X clusters were both excluded from trajectory analysis. The trajectory was calculated using the top 1824 genes with the highest dispersion. For cluster X, a separate trajectory was constructed using the top 1500 highest dispersion genes.

We performed subclustering on isolated clusters (CC, X, and PL) to determine subpopulations. The subclustering analysis follows the general procedure as the initial graph-based clustering and used 2000 genes with the highest dispersion within each of the isolated clusters. Two CC subclusters were identified using 5 PC, 60 NN, and res 0.1. Three X subclusters were generated using 4 PC, 30 NN, and res 0.5. PL showed four subclusters with 7 PC, 50 NN, and res 0.2.

Bulk and scRNA-Seq data can be found at: GEO accession GSE168823.

AUCell analysis

Request a detailed protocol

To analyze the activity of a gene set in our data, we used the AUCell tool (Aibar et al., 2017), with the top 25% of genes. The minimum gene set size was generally 5, but in a few cases where the number of genes in the pathway was between 3 and 5, a smaller minimum was used. Gene lists used for AUCell analysis are found in Supplementary file 4. Each of the lists was derived from a specific source including GO terms, KEGG pathways, and published literature. The individual source is given for each list in Supplementary file 4. The Pearson’s correlation coefficient (r) shown in indicated graphs was calculated using Partek software. For AUCell analysis, z-scores ranging from –2 to +2 are displayed on heatmaps where the z-score is a measure of the standard deviations from the mean of each individual cell’s AUC score. Absolute or relative z-scores of >1.65 or <−1.65 are considered biologically relevant and statistically significant, approximately equivalent to p-values<0.05 with a one-tailed t-test.

Lymph gland dissection, immunostaining, and imaging

Request a detailed protocol

Lymph glands were dissected and processed as previously described (Jung et al., 2005). Unless indicated otherwise in the figure legend, all stainings were formed on lymph glands from wandering third instar larvae. Briefly, for MMP1 staining, lymph glands were dissected and immediately placed into 4% formaldehyde in PBS on ice and then fixed for 15 min at room temperature. For all other stainings, lymph glands were dissected into cold PBS and then fixed in 4% formaldehyde in PBS at room temp for 20 min. After fixation, tissues were washed three times in PBS with 0.3% Triton X-100 (PBST) for 10 min each, blocked in 10% normal goat serum in PBST (blocking solution) for 30 min, followed by incubation with primary antibodies in blocking solution. Primary antibodies were incubated with tissues overnight at 4°C and then washed three times in PBST for 10 min each, followed by incubation with secondary antibodies for 3 hr at room temperature. Samples were washed four times in PBST, with DAPI (1:1000, Invitrogen) added to the third wash to stain nuclei, and then placed into VectaShield mounting medium (Vector Laboratories) and mounted on glass slides. Notch internalization assays were performed as described (Mukherjee et al., 2011) using mouse anti-NECD (1:50, DSHB C458.2H). Staining for endosomal proteins was performed as described (Riedel et al., 2016) using mouse Hrs 8-2 antibody (DSHB). Lymph glands and dissociated cells were imaged using a Zeiss LSM 880 Confocal Microscope. All microscopy data are representative images from a total of approximately 10 biological replicates (n) in most cases. For each n, full z-stacks were imaged, processed, and analyzed using ImageJ or Imaris software. For MZ, CZ, and CC indexes, Imaris software was used to reconstruct a 3D volume from a z-stack and the spots feature was used to quantify individual cell types based on their fluorescence intensity values. To assess levels of protein staining in individual cells, ImageJ (e.g., for Numb and Sima staining) or Imaris (e.g., for Numb and PPO2 staining) was used. For all quantitation graphs, mean with standard deviation are shown and significance was calculated by unpaired t-test. All statistics were performed using Prism (GraphPad) software and p-values are shown in charts or figure legends as indicated. For space considerations, in some graphs p-values are represented in GraphPad style using asterisks as follows: n.s. if p>0.05; * if p≤0.05; ** if p≤0.01; *** if p≤0.001; **** if p≤0.0001.

The primary antibodies were used as follows: mouse anti-MMP1 (1:100 of a 1:1:1 mixture of 3B8D12, 3A6B4, and 5H7B1, DSHB) (Page-McCaw et al., 2003), mouse anti-P1 (1:100, kind gift from Dr. Ando) (Kurucz et al., 2007), rabbit anti-PPO2 (1:200, kind gift of Dr. Asano) (Asano and Takebuchi, 2009), mouse anti-Hnt (1:200, DSHB 1G9-c) (Yip et al., 1997), guinea pig anti-Numb (1:200, kind gift of Jan lab) (Roegiers et al., 2001), preabsorbed rabbit anti-Numb (1:200, kind gift of Jan lab) (Rhyu et al., 1994), and guinea pig anti-Sima (preabsorbed; 1:100) (Wang et al., 2016). Primary antibodies were detected with secondary antibodies conjugated to Cy3 (1:100; Jackson ImmunoResearch Laboratories), Alexa 633 (1:100), Alexa 555 (1:200), or Alexa 488 (1:200) (Invitrogen).

Data availability

Sequencing data have been deposited in GEO under Accession Code GSE168823 Complete Source Data are provided.

The following data sets were generated
    1. Girard JR
    2. Goins LM
    3. Vuu DM
    4. Banerjee U
    (2021) NCBI Gene Expression Omnibus
    ID GSE168823. Paths and Pathways that Generate Cell-Type Heterogeneity and Developmental Progression in Hematopoiesis.

References

    1. Tepass U
    2. Fessler LI
    3. Aziz A
    4. Hartenstein V
    (1994)
    Embryonic origin of hemocytes and their relationship to cell death in Drosophila
    Development 120:1829–1837.
    1. Yip ML
    2. Lamka ML
    3. Lipshitz HD
    (1997)
    Control of germ-band retraction in Drosophila by the zinc-finger protein HINDSIGHT
    Development 124:2129–2141.

Decision letter

  1. Erika A Bach
    Reviewing Editor; New York University School of Medicine, United States
  2. Anna Akhmanova
    Senior Editor; Utrecht University, Netherlands
  3. Erika A Bach
    Reviewer; New York University School of Medicine, United States

In the interests of transparency, eLife publishes the most substantive revision requests and the accompanying author responses.

Decision letter after peer review of original submision:

Thank you for submitting your article "Paths and Pathways that Generate Cell-Type Heterogeneity and Developmental Progression in Hematopoiesis" for consideration by eLife. Your article has been reviewed by 3 peer reviewers, including Erika A Bach as the Reviewing Editor and Reviewer #1, and the evaluation has been overseen by Anna Akhmanova as the Senior Editor.

The reviewers have discussed their reviews with one another, and the Reviewing Editor has drafted this to help you prepare a revised submission.

Essential revisions:

1) Address issues with AUC scores:

a) Define how you established the AUC list.

b) Clarify how you define AUC scores that are statistically significant.

c) Include a zero (0) on the y-axis of all graphs and P values for all graphs.

2) Provide a comparison of the data between the replicates.

3) Provide validation of some of the genes associated with clusters/sub-clusters, including Delta in MZ1, PSC, X and MZ2 clusters, and Ubx in posterior lymph gland lobes, and provide validation to support your model that there is no transition path for IZ to proPL.

4) Demonstrate the potency of RNAi lines, including pnt RNAi.

5) Explain the main differences and similarities between your data set and Jiwon Shim's by addressing specific issues such as Cluster "PH1" (your MZ1, PSC, X and MZ2 clusters) and Cluster X (and include a dot plot representation of Cluster X).

6) Provide additional quantification of CC in "Numb and Musashi in CC determination", additional information on how Numb intensity was quantified, and P values

7) Provide for lymph gland analyses the quantification of CC and MZ indexes and P values.

8) Consider using dot plots instead of box plots.

9) Provide in the main figures a list of the key genes defining each sub-population inside the different clusters.

10) Provide the number of cells in each cluster, including the IZ cluster.

11) Provide a schematic representation of crystal cell specification and maturation involving Notch, Numb and Sima.

12) Make extensive editorial changes, including shortening the manuscript per the reviewers' suggestions and clarifying the narrative so that the introduction covers what is actually being treated in the manuscript. For example, the introduction discusses the IZ but then the manuscript does not provide much insight into this cluster and instead focuses on CCs.

Full Reviews

Reviewer #1/Reviewing Editor:

Girard and colleagues combine bulk RNA-seq and scRNA-seq of FACS sorted blood cells from the larval lymph gland, the hematopoietic organ in Drosophila, to identify cell types and differentiation paths. The lymph gland is comprised of medullary and cortical zones. At least five cell types are found in the lymph gland: niche-like cells (termed the PSC), progenitors (termed prohemocytes) that are found in the MZ, intermediate progenitors (termed IZs) that represent a transition state between prohemocyte and terminally differentiated cells, and terminally-differentiated macrophage-like cells plasmatocytes (PLs) and crystal cells (CCs) found in the CZ. The bulked sorted populations are pure and are enriched for genes known to be expressed in these populations. Analysis of the scRNA-seq data identifies 9 clusters. Some of the clusters correspond to known populations like the PSC, IZ and PL. However, some clusters represent distinct subpopulations of MZ cells (MZ1 and MZ2) and CCs (iCCs and mCCs) and previously unknown clusters proPL and X. The pseudotime analysis identifies three "branch points" and seven "states", and t-SNEs show projected relationships between the states and clusters. Numerous graphs in multiple figures provide transcript levels of genes related to metabolism (Figure 3), pnt (Figure 4), numb (Figure 5), msi (Figure 6 ). The supplements for Figures 2 and 3 provide transcript levels for MZ markers, cluster X markers, IZ markers, CC markers, PL markers and basement membrane genes. Validation of the scRNA-seq are provided for JNK pathway (Figure 3), pnt (Figure 4), numb and sima (Figure 5), msi (Figure 6). The authors extend the transcriptomics through two "case studies". In the first one, the ETS transcription factor Pnt is required for the progression from late MZ to IZ. They also show that Pnt has a later function in PLs in preventing them from becoming CCs. In the second case study, their transcriptomics reveals a specific isoform of sima expressed in mature CCs (i.e., mCCs) that co-localizes with Notch, supporting the Banerjee lab's earlier work. The transcriptomics also reveals that numb transcripts (encoding a Notch inhibitor) are expressed in iCC but the Numb protein is highest in mCC. They author resolve this paradox by showing that the numb translational inhibitor Musashi is expressed robustly in iCCs. The authors also supply analyses comparing their scRNA-seq with one published in 2020. These transcriptomic analyses and subsequent genetic experiences provide strong support for the major claims of the paper.

There is a massive amount of data in this study. The data are of high quality, and the manuscript is well written. The results support the main conclusions. This study will be a very valuable resource to the community. I have a few suggestions for possible improvement.

1. On p. 36 (line 906), the authors write that "we surmise that Notch activation is the default pathway for early Hml-expressing cells to become CCs, and that the activation of Pnt acts antagonistically to prevent this process thus favoring instead, the plasmatocyte fate." I don't understand the logic of this. PLs represent 95% of the mature hemocytes, whereas CC represent 5%. Why would most of the differentiating hemocytes have to repress Notch signaling by expressing Pnt. Loss of Pnt from Hml+ cells would very consequential as the animal would not have PLs from the LG. What I am trying to say is that the kind of regulation proposed here is not robust and could be easily disrupted by mutation. Could the authors comment on this.

2. The manuscript is very long (the results section is thirty-five long) and reader attention spans tend to be short. Could the authors please edit the manuscript to reduce its length. For example, I don't think that the entire section about cluster X is needed. The metabolism section could be condensed. Some of the discussion is redundant with the results.

3. Please provide accession numbers the raw data at NCBI and a link to the reviewers.

Reviewer #2:

In this manuscript, Girard et al. analysed the transcriptome of the lymph gland of Drosophila using high throughput sequencing. They fist used genetic markers (Domemeso-GFP and Hml-RFP) to sort the cells from three distinct regions in feeding larvae: the medullary zone (MZ) known to be populated by progenitors, the intermediate zone (IZ) populated by intermediate precursors and the cortical zone (CZ) containing the mature hemocytes. The cells from the CZ were further subdivided into plasmatocytes, immature crystal cells and mature crystal cells with the genetic markers Hml-RFP and Lz-GFP. This subdivision was carried out at later stage to maximise the number of mature hemocytes. A comprehensive molecular signature of each region and cell type was determined with bulk RNAseq. Then, the authors analysed the transcriptome of 21200 cells of the lymph gland using 10x genomics technology. They found 9 clusters of cells displaying distinctive signatures and metabolic properties, including the cells of the Posterior Signaling center (PSC), two types of progenitors in the MZ, the intermediate precursors, the plasmatocytes as well as the immature and mature crystal cells. They predicted the filiation between the clusters using Monocle, indicating a developmental trajectory starting in MZ producing IZ and then plasmatocytes. At last, the authors validated two mechanisms of cell differentiation highlighted by the transcriptome analysis. They showed the context-specific role of the transcription factor Pnt in the differentiation of plasmatocytes. Loss of Pnt in the early progenitors has no effect whereas its loss in the late progenitors prevents the differentiation of intermediate precursors and thus mature plasmatocytes without affecting the differentiation of crystal cells. As a second study case, they showed that the interplay between Notch, Numb, Sima and Musashi is involved in the maturation of crystal cells.

Overall, this manuscript presents a tremendous amount of RNAseq and in vivo data with highly detailed interpretations. It provides very valuable and substantiated information on the molecular mechanisms involved in the development of the hemocyte lineages in the Drosophila lymph gland. The main caveats are (1) the definition of two clusters IZ and proPL, which seem to belong both to the IZ, (2) the fact that most boxplots do not include 0 in the y-axis, which strongly biases the interpretation of the graphs, (3) the definition of mitotic precursors. Finally, shortening would have made the manuscript more easily readable and would have conveyed the message more directly.

1) Most charts presenting expression levels or AUC score across clusters do not include 0 on the y-axis and disclose highly heterogeneous ranges. For example, the ranges of the AUC displayed in Figure 3 are from 37 points in A (y-axis range 0.43 to 0.8) to 3 points in I (y-axis range 0.13 to 0.16). This is highly misleading, and biases the interpretation of the data since the authors describe minor differences the same way than large differences. Few examples are:

– In the legend of Figure 3B-C, the authors write "(B) TCA cycle enzymes are expressed at exceptionally low levels in the PSC compared with the cells of other clusters. (C) Expression of oxidative phosphorylation pathway enzymes is low in the PSC, high in MZ1 and MZ2 and moderate in IZ, proPL, and PL clusters.". In B, exceptionally low level correspond to 0.42 compared to 0.46 in the other clusters. In C, low in the PSC means 0.74 and high in the MZ corresponds to 0.78. Since AUC represent frequencies, I doubt that few point can translate into such high ranges.

– In figure 3D and E, the authors mention the levels of Zw and Idh in the clusters. They explain that both are highly enriched in PSC compared to MZ. Zw presents an expression level of 1.1 in PSC, around 1.2 fold higher than in MZ and Idh presents an expression level of ~ 7 in PSC around 2 fold higher than in MZ. The difference in level of expression and fold enrichment cannot be appreciated properly due to the heterogeneous y-axes. In addition, the authors described the two enzymes in the same terms while Idh is expressed 7 time higher than Zw.

– For Pnt (Figure 4AB), the authors describe "a very low level of Pnt" in the PSC and MZ and a significant increase in the MZ2-3. The level in PSC is around 5.2, which is much higher than most genes described in this study and the significant increase is going from 5.2 in MZ2-2 to 5.5 in the MZ2-3. The significance of this increase need to be documented with a p-value and the biological relevance of this difference seem far-fetched.

2) A better explanation should be provided for the AUC scores. The authors rightly say that AUC are reflective of co-regulation however the keys to interpret the score are not described. In addition, while a difference from 0.80 to 0.60 (Figure 3A) seems plausible and sufficient to call for an enrichment of glycolytic genes in the PSC, the biological relevance of differences from 0.74 to 0.78 (Figure 3C) or from 0.08 to 0.09 (Figure 2—figure supplement 5) seems far-fetched without further explanations/justifications.

3) The distinction between the cluster IZ and proPL needs to be clarified. The enrichments of the IZ AUC and individual IZ markers are not striking (Figure 2—figure supplement 3A-H). In addition, can the authors explain why only 6 of the 9 IZ specific genes were taken to estimate the AUC score? The strong similarities between the two clusters, in terms of markers and developmental trajectories, seem to indicate that the two clusters represent transient conditions that each cell goes through on its way towards full differentiation. Indeed, the single cell analysis has been performed in feeding larvae, when cells are actively differentiating, not in a steady state condition. Longitudinal/spatial analyses in the developing lymph gland might help in the interpretation, but this is not the scope of the present manuscript.

4) The authors assess the impact of Pnt in the MZ cells using the drivers domemeso-Gal4 and Tep4-Gal4. The authors showed that domemeso>pntRNAi prevent the progression of MZ cells toward IZ cells. Some quantification would be welcome to appreciate the penetrance of the phenotype. In addition, the authors say that Tep4>pntRNAi has no observable phenotype, while the comparison between Figure 4E and Figure 4F seem to indicate that the lobe is smaller and with less Tep4 positive cells. Such difference could arise from slight stage differences. Could the authors indicate how they staged the animals, the number of samples...? At last, the potency of the pntRNAi construct should be documented.

5) In the section "Numb and Musashi in CC determination", the authors mention that Notch-ACT raises Numb levels and Notch-RNAi decreases Numb levels in the crystal cell. This interpretation should be clarified. Since N activity is modulated using a CC specific driver and the number of lz>GFP also changes upon N modulation, the observed results may arise from regulation of Numb expression and/or from regulation of CC number. CC quantification will help sustaining their statement on the role of N. In the third paragraph linked to Figure 5—figure supplement 3H-N and Figure 5O-R', the authors describe a clear reduction of Sima puncta, PPO2 expression and number of mCC. P-values should document these observations. Also, does the expression of the type II targets decrease upon Numb RNAi? At last, Figure 6H-M, the authors indicate that msi-RNAi enhance the level of Numb in crystal cells without providing information on the procedure followed to measure Numb intensity. What does Numb intensity represent in Figures 6J and 6M? Is it the average level per cell or the level in the whole lobe?

6) The authors carried out the single cell sequencing in triplicate. It would strengthen considerably the data to provide a comparison between the replicates.

7) The paper would gain from shortening. The introduction is broad and exhaustive, the results section describes the different the clusters and states as well as two study cases, the discussion elaborates on the mode of differentiation and put forward interesting models such as the gradual rather than stepwise transitions between groups of cells. Since this is a resource paper, the validation of all the single cell data is out of the scope, hence a thorough discussion of all those data could be shorten and used in subsequent studies. Furthermore, many data are already discussed in the results section, diluting the important and novel messages that the paper conveys.

8) According to the authors, Cluster X represents mitotic states of several distinct cell types, including the CZ that carries differentiated cells. This intriguing finding indicating the presence of dividing cells throughout the lymph gland deserves some clarifications. Does it imply that none of the other clusters identified by the RNAseq analysis contains cells in mitosis? Does it mean that plasmatocytes and cells of the medullary zones have similar mitotic potential? Is there any difference in the type/levels of genes associated to cell division between the cells of the cluster that express MZ markers vs. those that express the Pl markers? I understand that the spatial analysis of the cluster X cells in the lymph gland, which would help clarifying these issues, goes beyond the scope of the manuscript. It would be nevertheless useful to compare the RNAseq data with those from the laboratory of Jiwon Shim, who also identified the clusters of mitotic cells in the lymph gland. Also, a dot plot representation of the genes associated with cell division in the different cells of the cluster X (MZ, Tr, Pl) might help identifying features specific to the different subclusters.

Reviewer #3:

Using a combination of bulk RNA-Seq of FACS-sorted cells and single cell RNA-seq, the authors identify various blood cell subpopulations that compose the Drosophila hematopoietic organ called the lymph gland. This study has been performed at one developmental time point, mid third instar larvae. The authors perform a pseudo time analysis and propose a developmental trajectory with multiple paths to mature blood cells types. RNAseq data suggest that different blood cell types express genes involved in various metabolisc processes. They establish that Pointed has different roles during lymph gland hematopoiesis. Finally, they identify that Numb and Musahi are involved in a Sima dependent Notch non canonical pathway in mature crystal cells.

This analysis is of interest, however in the current version, it is too preliminary.

1. The list of the main genes defining each sub-population inside the different clusters, as well as their expression in all the other sub clusters, has to be provided in the main figures.

2. No validation of RNA seq data is provided: A spatial reconstruction in vivo by profiling the expression of a subset of genes identified by RNA seq is necessary.

3. To support the developmental progression of lymph gland cells proposed in Figure 7,

lineage tracing experiments are required.

4. Comparison between RNA seq data obtained by (Cho et al 2020) has to be given in the main text. Furthermore, discrepancies between these 2 studies have to be clarified. How the AUC list has been established? This is a key point. For example, the PH1 cluster identified by Cho et al is spread out in MZ1, PSC, X and MZ2 cluster in this study. Why are the results so different? Delta is a marker of PH1, which is validated by analyzing its in vivo expression profile. What about delta expression in the scRNA seq performed here?

5. There is a discrepancy between the introduction and the main results of this paper. In the introduction the authors focus our attention on the IZ, but in fine we don't learn much about IZ cells from this analysis. Instead of deciphering IZ identity and fate by in vivo profiling, most functional analyses performed concern crystal cell maturation. This part is developed via 2 main figures among 7, plus 5 sup figures. From my point of view, this study represents an ideal opportunity to better characterise IZ cell identity, lineage and function. Unfortunately these data are missing in the current version of the manuscript but could be added instead of the data concerning crystal cell maturation, which is somewhat out of the scope of this manuscript and could be published in a separate paper.

6. This manuscript has to be focused on the novelty given by the RNAseq data. The data concerning crystal cell maturation, which is somewhat out of the scope of this manuscript, could be published in a separate paper.

7. There are 7 main figures and 16 sup figures + 1 additional file. All these Sup figures give information and make suggestions that unfortunately are not validated by additional experiments. Overall the reader is left with a lot of observations that are not further validated and in fine one cannot rely on. Data presented in this manuscript have to be focused to avoid overloading the reader with too many side observations, which in turn lead to losing the thread of the message of this study.

8. I have concerns about the single cell RNA seq data, since essential information is missing.

– What about the cell numbers in each cluster? The IZ cluster (Dome-GFP+ and Hml+) represents a small subset of lymph gland cells based on the CHIZ expression profile (see Figure 3K); however, it corresponds surprisingly to a quite large lymph gland cell subset, as illustrated in Figure 1J. How can one explain this?

– To identify subpopulations in clusters, the authors performed sub clustering on isolated clusters for PL and CC. Why was this not done in the same way for the other 3 main clusters (MZ2, IZ and proPL)?

– For the IZ sub-cluster: The plot in Figure 2 sup 3I is very misleading, since it suggests that genes expressed in the IZ are specific to this cluster. For example, "state 3" is present both in the MZ and proPL (Figure 2), but in Figure 2 sup 3l it is only represented in the IZ and not in the MZ and proPL clusters. The same remark holds for states 4, 5 and 7. Furthermore, as I mentioned above, the list of the main genes defining each sub-population inside clusters, as well as their expression in the other sub-clusters, has to be provided in the main figures. Furthermore, a spatial reconstruction in vivo by profiling the expression of a subset of genes identified in sub populations is mandatory to validate the RNAseq data.

– Why is the plot shown in Figure 1J different from the plot shown in Figure 2 sup 5 I? The t-SNE graphic representation does not give any indications concerning whether clusters are related or not.

– Why are there discrepancies between gene expression levels and their representation on the corresponding plot? Please see for example the case of CG30090 in Figure 2 sup 3B and the corresponding plot in C. CG30090 is expressed at a similar level in proPL and PL, but its expression in proPL is lacking in the plot. Why ?

– Concerning IZ markers, among the 6 identified by bulk RNA seq, only 4 of them have been analysed in single cell experiments. What about the 2 others?

– In Figure 7, the model of developmental progression of blood cells proposes that there is no transition path between IZ and proPL. This proposition does not fit with the data. Indeed, in the pseudo time analysis there is a clear overlap between IZ and proPL, indicating that they are connected (see states 4 and 5 in Figure 2 O-P). Genes highly expressed in IZ are also highly expressed in proPL. This is observed for all the IZ markers analysed in this manuscript (please see Figure 2 sup 3B, D, G, H). Determining whether there is a transition path between IZ and proPL has to be validated by in vivo experiments.

– Figure 7: Regarding the translational link between PL7 and iCC7 (Figure 2 sup4), again this proposition has to be validated in vivo. Furthermore, how can iCC7 (more engaged in maturation) give rise to iCC6 (less engaged in differentiation). This also needs to be validated.

– Concerning the MZ1 cluster, Ubx is expressed in these cells. Ubx is specifically expressed in lymph gland posterior lobes that are composed of hematopoietic progenitors expressing markers of MZ cells (Rodrigues et al., 2021). Altogether these data strongly suggest that MZ1 cells correspond to posterior lobe hemocytes. This has to be clarified.

– For cluster X, since DNA damage markers are expressed, this strongly suggests that this cluster might correspond to unhealthy cells damaged during the experiment. What about their ribosomal content (a criteria commonly used to check for cell health)? Is the molecular signature of cluster X found in bulk RNA seq and in the Cho et al., 2020 paper?

– What is the unit given for the Y axis in Figure 3? Why is the scale different from one graph to another and does not start always at zero? For all graphs in this manuscript the p value is missing, so the reader cannot not figure out whether the differences are statistically significant or not. Concerning the AUC analysis, how is the list of genes taken into account for a signalling pathway or a function that has been established? What kind of conclusion can be drawn from analyses regarding metabolism? In other words, considering the PSC as an example of a group of cells where glycolysis genes are highly expressed, what is the impact of this on PSC function, and how does potential glycolysis in the PSC help us to understand PSC function?

– Figure 3K: the authors need to define what CHIZ is. Since there is no staining overlap between MMP1 and CHIZ-GFP, the authors cannot conclude that MMP1 is expressed in the IZ. Furthermore, MMP1 transcripts are detected at high levels not only in the IZ but also in the MZ2, proPL and PL (Figure 3J). How can these results be reconciled with the expression profile of MMP1 shown in Figure 3K?

–Figure 4: Quantifications are missing. Crystal cell and MZ indexes have to be given.

– Figure 4G and H: That tep4 is expressed in a subset of MZ progenitors defined by dome>GFP is not new, this has been established previously. Please see Benmimoun et al. 2015, and Oyallon et al 2016.

– What about the Pnt expression profile in the LG? Figure 4C-D: DomeMESO>pnt RNAi , in addition to a defect in blood cell differentiation, this leads to a smaller LG compared to the control. This defect in size has to be mentioned and quantified. Since the role of pnt in the MZ has been previously reported by Dragojlovic-Munther M et al , 2013, this paper has to be cited. Furthermore, the Dragojlovic-Munther M et al. study indicates that in addition to preventing hemocyte differentiation, pnt RNAi in the MZ leads to lamellocyte differentiation. Do lamellocytes differentiate when pnt is knocked down using domemeso, tep4 , CHIZ and hml gal4 drivers? In Figure 4F :Tep4>GFP>pntRNAi , GFP levels are decreased compared to the control. Does Pnt control the expression of the Tep4-gal4 driver? In the text, p35 line 886 "Pnt loss in MZ2.3" is an over interpretation, since no gal4 driver specific for this group of cells has been used to perform the lof experiment. p35 lines 903 "plasmatocytes to be converted into crystal cells" is erroneous, since hml-Gal4 expression is not restricted to mature plasmatocytes but is expressed both in plasmatocyte and crystal cell precursors. p35, lines 905-910 should be in the discussion, not in the result 'section.

– Figure 5: Numb and crystal cells

The paper Cho et al 2020 has to be mentioned in the text, since it established previously by immunostaining that Numb is expressed in crystal cells.

– A previous study done in Banerjee's lab, reports on the role of Sima and Notch in crystal cell survival by preventing their dissociation (Mukherjee et al., 2011). In the present manuscript, no data and comments refer to crystal cell survival depending on Numb. Thus in the current version of the manuscript, it remains unclear as to what is the role of Numb in crystal cells. Does it control iCC to mCC, or is it required for survival of mature crystal cells? The confusion is sustained by sentences such as: "depletion of Numb prevents the maturation of iCC to the mCC state", please see p 39, lines 1000. Since Numb is detected at high levels in mCC and not in iCC, the function of Numb should be in mCC once they have matured. Furthermore, it has been previously published that Sima is required for crystal cell survival. A decrease in Sima levels is observed in Lz>numbRNAi conditions, supporting the proposition that CC depleted from Numb should disrupt since they lack Sima . In conclusion, the role of Numb in either crystal cell maturation (i. e., going from iCC to mcCC) or mCC survival has to be clarified.

– Figure 5 sup2 : Quantification is missing. p 38 lines 979-981. The authors need to clarify whether there are talking about an increase in Numb levels per crystal cell, or an increase in Numb levels per LG, which in this case reflects an increase in the number of cell expressing Numb.

– Numb subcellular localisation is different from one picture to another. In Figure 5M, Figure 5 sup2 , Figure 6A-c and 6 H-L', Numb is mainly localised at the periphery of the cell at the plasma membrane, whereas in Figure 5 sup 3A-B and sup 3F-G, Numb is detected as cytoplasmic punctate dots without staining at the plasma membrane. This is confusing and clarification is necessary.

– Figure 5 sup 3: PPO2 staining shown in Figure 5 sup 3 K-L is not in agreement with the quantification given in M. p values are missing in M and N. There are also discrepancies between these figures and the text p 39: "numb RNAi expressed in crystal cells causes ...with a concomitant increase in the iCC population". Figure 5 sup 3N: Quantification of crystal cell numbers is not convincing. Since there is a lot of variation among LGs, quantification of PPO+ cells has to be given as an index (i.e., a ratio between the total number of PPO+ cells/ per total number of LG cells). In N, 6 LGs maximum per genotype have been analysed, which is far too few. Defining whether the number of crystal cells is affected in lz>numbRNAi is essential to determine whether Numb is required to allow iCC to mature into mCC, or whether it controls mCC disruption. The MM section has to be completed; it should indicate how quantifications of cell numbers and fluorescent intensity were performed.

– For Hnt staining in lz>numbRNAi, there is a discrepancy between Figure 5 sup 3 l-l" and Figure 5 sup 3 L-L". In I-I" no difference in Hnt levels compared to control (H-H) is observed, whereas in L-L" a strong decrease is observed (K-K").

– The enlargements shown in Figure 5O-R have been taken from pictures shown in Figure 5 sup 3K-L. It would be better to show independent immunostainings. This remark is even more relevant in this case because the staining in Figure 5 sup 3 L is not convincing.

– Figure 5 sup 3 H-I: lz >numb RNAi there is a decrease in Sima staining. Is it due to a decrease in sima transcription and/or Sima protein stability and/or Sima subcellular localisation?

– Figure 6 J and M ; is "the Numb intensity" referring to the intensity of Numb per cell or the total amount of Numb intensity measured per LG? This has to be clarified in the figure, in the text (p 40) and mentioned in the MM. What about the crystal cell index in lz>msi RNAi and Hml>msRNAi?

– A schematic representation of crystal cell maturation involving N (both the canonical and non-canonical signalling), Numb and Sima would be very helpful.

– In MM p 74 lines 1888 "data was corrected for batch effects between samples". The information concerning the method used has to be provided.

9. Modifications in the text are required

– Lines 1093 "the equilibrium signal from proPl". Functional data supporting this conclusion are lacking.

– Lines 1088 "JNK signalling ... is a specific property if IZ cells". Neither JNK expression nor its function has been analysed in this study.

Reviewer comments after revision

Decision letter 2

[Editors' note: further revisions were suggested prior to acceptance, as described below.]

Thank you for resubmitting your work entitled "Paths and Pathways that Generate Cell-Type Heterogeneity and Developmental Progression in Hematopoiesis" for further consideration by eLife. Your revised article has been reviewed by 3 peer reviewers, one of whom is a member of our Board of Reviewing Editors, and the evaluation has been overseen by Anna Akhmanova as the Senior Editor.

The manuscript has been improved but there are some remaining issues that need to be addressed, as outlined below:

Essential revisions:

1. The authors should temper some conclusions because the conversion of AUC scores to z-scores can obscure the actual level of enrichment. Specifically, the authors should:

a. Soften the conclusion that proPL generates the equilibrium signal and that the IZ alone activates the JNK pathway as this is based on a low number of genes (lines 959-960). Please rephrase this statement.

b. Point out in the text that JNK activation is enriched in but not specific to the IZ.

c. Clarify the in vivo definition of the IZ and the ProPL cells with respect to the Hml delta QF driver. The authors use it to specifically label the IZ but the driver's expression domain is broader than IZ cells.

2. The authors should indicate in Figure 3—figure supplement 3, 1D which AUC terms encompass "glycerolipid remodelling genes" (line 592)

3. The authors should rephrase "pnt transcript is expressed at low levels in the PSC (Figure 4A)". (Line 664). One interpretation of Figure 4A is that pnt is expressed at the same levels across the different clusters but in a smaller number of cells in the PSC (hence the smaller circle).

4. The authors should provide publications for the enhancers in Figure 5—figure supplement 1 B,C,G or explain how they defined the enhancers.

5. The authors should remove the comma in line 795 ("we find, is").

6. The authors should avoid the term de-enriched/de-enrichment (lines 525, 549, 562, 836, 135, 1136, 1159, 1435, 1437, 1438) to indicate a change in the levels of expression.

7. The authors should modify their conclusion (line 652) "establishing the IPs as MMP1 producing cells". The diffused MMP1 staining in Figure 3D-D' is not convincing.

8. The authors should address how they conclude (line 696) that "Pnt is required for exit from the IZ" since there is no defect in CC differentiation in CHIZ >pnt RNAi (Figure 4l-M). They should also comment on Pnt's role in plasmatocyte differentiation.

9. The authors should modify the introduction so that it includes a statement about the cardiac tube acting as a niche to control lymph gland hematopoiesis and supporting references.

10. The authors should add a sentence/short paragraph in the discussion saying that the proposed model/definition of states awaits functional confirmation, at least in some cases.

11. In the dot plots, the authors should indicate what is meant by 'mean' and 'non-zero percent'.

12. In Figure 2H,I, the authors should show the whole lobe and indicate the IZ and the proPL with arrowheads.

Suggested revisions:

1. It is suggested but not required that the authors provide a volcano plot to illustrate the differences between proPL and IZ.

2. It is suggested but not required that the authors provide the single table that must have been generated to produce the figures of the first submission and used to calculate the z-scores.

Detailed Reviews

Reviewer #1/Reviewing Editor:

I have read the response to authors and the revised manuscript. The authors have addressed all of the essential revisions and all of the reviewers' comments satisfactorily. They have overhauled the manuscript, making extensive changes to the figures and the text, thereby improving their study and its conclusions. The manuscript is now easy to read, follow the logic and the data support the conclusions in the text. I recommend its publication in eLife.

Reviewer #2:

The revised manuscript from Utpal Banerjee and collaborators involves substantial editing of the text, additional experiments and significant changes in the presentation of the data. The RNAseq data provide a useful resource to the community and the validations already help understanding the mechanisms controlling lymph gland development. Altogether, the revision allows a much smoother reading and answers many questions asked by the reviewers. I have few comments that do not call for additional experiments but need to be addressed.

The authors converted the AUC-scores in normalised z-scores to enhance the contrast on the heatmaps. This representation does indeed ease the interpretation of the AUC score but hides completely their actual levels of enrichment, which leads to strong conclusions based on minor differences. Here below indicative examples.

In the discussion (line 959-960), the authors state "proPL (but not IZ) generates the equilibrium signal, whereas the IZ alone activates the JNK pathway". While this statement could be deduced from the heatmaps (Figure 2—figure supplement 2A,B for the AUC Equilibrium signal and Figure3B and Figure 3—figure supplement 2B for the JNK pathway), it is far from being conclusive (compared to the initial representation of the 1st submission, old Figure 2—figure supplement 5 for the AUC Equilibrium signal and old Figure 3I,J for the JNK pathway).

In the previous representation, Mmp1 expression levels and AP-1 targets are enriched in IZ but overlap between the IZ and the proPL, suggesting that JNK activation is enriched but not specific to IZ.

Concerning the equilibrium signal, Figure 2—figure supplement 2B indicates a strong enrichment in the proPL but the IZ also present a mild enrichment. In addition, the AUC equilibrium signal displays low variability across the different clusters (old Figure 2—figure supplement 5: highest ~0.475 for pPL-4 and lowest 0.445 for PL-7). The AUC score represents the number of genes associated with the term. Thus, the score of AUC comprising low gene number should provide stronger contrast than the AUC with high gene number since the genes are less likely to be among the top 25% of genes (threshold set up by the authors). The AUC Equilibrium signal comprise 6 genes with a single one (Pvr) enriched in proPL. With such low gene number, the scores 0.475 and 0.445 both represent less than 3 genes and no striking differences across the two clusters. Therefore, it seems overstating to say that "proPL (but not IZ) generates the equilibrium signal".

To avoid any mis-interpretation and provide the reader with all the data to interpret the heatmap, I would recommend the authors to join the matrix of the AUC scores used in the manuscript across all clusters. I would also strongly recommend to rephrase the sentence line 959-960.

A clarification is appreciated on the in vivo definition of the IZ and the ProPL cells. The authors use the Hml delta QF line and CHIZ Gal4, a split Ga4 line relying on the domemeso and Hml delta enhancers that allows the identification of the IZ cells (page 16-18). The Hml delta Ga4 driver is expressed in the cortical zone and in the intermediate zone (Spratford 2020). The Hml delta QF is derived from this line and has an identical profile of expression (line 428). Yet, with no further explanation, this very line is used to label ProPL cells and distinguish them from the IZ, in combination with CHIZ-Gal4 (Figure 2H,I).

Altogether, the use of AUC and cell states adds a new dimension to the single cell analysis and can identify transient populations that appear during development, however, the manuscript largely over-emphasizes this concept and subdivides the developing lymph gland into numerous cell subpopulations in some cases without strong evidence. Beyond the fact that the analysis concerns a single time point, does not provide spatial resolution and does not sufficiently validate the described states (three issues that are anyway beyond the scope of the manuscript), an unweighted analysis can lead to over interpretations. After all, AUC are relative definitions, as detailed in the rebuttal letter. To distinguish cell groups through AUC, qualitative and quantitative information should to be somehow taken into account: the number of cells, the number of genes in each AUC, that of the members of the AUC showing differential expression, the absolute levels of gene expression, the levels of differential expression, as well as the relative 'significance' of genes in the AUC. Typically, transcription factors that control the expression of many genes or key enzymes in a biochemical pathway may have a different weight compared to other genes. Some of this information is not available in the figures, some is present in supplemental materials/can be extrapolated from other figures (whereas each figure should be self-explanatory). The lack of granularity necessary to firmly define a cell group (state, cluster...) by no mean affects the quality of the manuscript as the validation of all the cell groups in the lymph gland can await further studies. The authors should nevertheless be more cautious in the interpretations and avoid strong statements. This will overall strengthen and further simplify the message provided by the manuscript.

Reviewer #3:

The revised version of the manuscript answers my main requests. The extensive editorial changes in the text and modification of figures make the manuscript much easier to read and allow one to understand the novelty of this study.

I still have two concerns:

1. Figure 3D-D': That MMP1 staining corresponds to diffused MMP1 is not convincing. If IZ cells synthetized MMP1, why is there no MMP1 staining in these cells? Furthermore, MMP1 is expressed in plasmatocytes; thus in Figure 3E-F'(Chiz>mihep and CHIZ>hepACT) what about plamatocyte differentiation in these two conditions? The modification in MMP1 expression in these 2 genetic contexts might reflect a change in plasmatocyte differentiation and thus might explain the difference in MMP1 expression. In conclusion, data supporting that MMP1 is produced by IZ cells are missing, and the sentence on line 652 "establish that IP as MMP1 producing cells" has to be modified.

2. Line 696 :"Pnt is required for exit from the IZ " How do the authors arrive at such a conclusion since CHIZ >pnt RNAi (Figure 4l-M) shows no defect in crystal cell differentiation? What about plasmatocyte differentiation?

Additional corrections:

1. The introduction should be updated, including that the cardiac tube acts as a niche to control lymph gland hematopoiesis. This has to be added.

2. Figure 2 Sup 3 : Zscore is -3 to +3. Is it OK?

Decision letter 3; after second revision:

Dear Dr Banerjee,

Congratulations, we are pleased to inform you that your article, "Paths and Pathways that Generate Cell-Type Heterogeneity and Developmental Progression in Hematopoiesis", has been accepted for publication in eLife.

Editor's evaluation:

This paper will be of interest to scientists who study hematopoiesis. The authors combine single cell RNA-seq with bulk RNA-seq of transcripts from blood cells in the Drosophila larval hematopoietic organ. They present extensive analysis of the datasets, and the pseudotime analyses present a model of how hematopoietic progenitors can differentiate along transitory paths. These datasets reveal cell-type specific isoform expression of Notch pathway regulators, and genetic experiments prove the importance of these factors in development of one lineage. These transcriptomic analyses and subsequent genetic experiences provide strong support for the major claims of the paper.

Please take note of the points below and we hope you will continue to support eLife.

Best wishes,

Erika Bach

Reviewing Editor

Anna Akhmanova

Senior Editor

https://doi.org/10.7554/eLife.67516.sa1

Author response

Following first review:

Response to essential revisions:

1) Address issues with AUC scores:

a) Define how you established the AUC list.

We have now clarified how we established the AUC lists in the Methods section (page 68). In brief, chosen gene lists are compiled in ways that are relevant to the individual situation. If a set of genes is anticipated to be co-regulated based on some independent criterion, then we ask if that set is also enriched in any cell of the lymph gland, and if multiple cells belong to this enrichment profile, do they form a cluster or a combination of related clusters?

Such a strategy also helps us, for example, in mapping a list of genes identified as subzone-specific by Cho et al., onto our distribution of clusters, allowing for comparisons between the two approaches.

For more predictive purposes, gene sets could comprise, for example, published data sets of coregulated genes, those curated under well defined GO terms, KEGG pathway lists, or known and time-tested components of metabolic pathways that we have curated as stable, non-redundant participants across species. The individual genes used to construct the AUC lists as well as the source data for those lists can be found in Supplementary File 4.

b) Clarify how you define AUC scores that are statistically significant.

p-values directly calculated from the AUCell plots would be inflated in their significance because the number of data points (i.e. the number of cells used in the scRNA-Seq) is very large (please see Lin et al. 2013: “Too Big to Fail: Large Samples and the p-value Problem” for details). Very large sample size (# of cells) causes very small changes to gain significance, artificially lowering p-values. The more appropriate statistical criterion for such analyses is the number of standard deviations that separates two means of the sets of data as a way to assess the size of the effect. We have now addressed this issue by calculating z-scores (a representation that includes standard deviations from the mean). For consistency, we now use a single, standardized spread of z-scores that ranges from -2 to +2 for all such statistical representations throughout the paper.

Corresponding p-values can be calculated from the z-scores and are now reported in the figure legends.

For GO terms or KEGG pathway components, we first conduct Gene Set Enrichment Analysis (GSEA) to determine, in an unbiased manner, which terms or pathways are significantly enriched in each cluster. The resulting False Discovery Rate (FDR) values are meaningful, and we have now included them for the specific GO or KEGG terms in a new supplementary figure (Figure 1 figure supplement 3A).

c) Include a zero (0) on the y-axis of all graphs and P values for all graphs.

With the above in mind, and following some excellent comments and suggestions from the reviewers, we have revamped the way we display and use AUC scores; we agree that box plots are not the most convenient representations of such scores. To address concerns about comparing “highly heterogeneous ranges'' of AUC values across clusters, we now use heatmaps with standardized z-scores ranging -2 to +2 as explained above, to represent all AUC values and their variations across one standard scale (See new Figures 3A, C; Figure 2—figure supplement 2AB; Figure 2—figure supplement 3H; Figure 3—figure supplement 1D; Figure 3—figure supplement 2B). Related AUC lists are presented together on a heatmap with the z-score representing the standard deviations from the mean. Such comparative representations have been successfully utilized in previous studies of mammalian hematopoiesis (Zhu et al. 2020; Blood 136 (7): 845–856). The AUC scores in this standardized heatmap format displays individual values and thus 0’s on the y-axis are not necessary for the comparisons. Please also see a more detailed explanation with examples below (Reviewer #2, comment 1).

2) Provide a comparison of the data between the replicates.

This was an excellent suggestion with a pleasing outcome. We now include a comparison of the overall predicted cluster size and structure between the three replicates (Figure 1—figure supplement 2AB). This includes the percentage of cells belonging to each cluster and we find that each cluster contributes similarly in structure and in percentage in individual biological replicate samples (Figure 1 figure supplement 2A-B). These are true biological replicates and not simply technical ones, with high viability and high recovery rate maintained across the isolates. This adds to the robustness and reproducibility of the methods. We thank the reviewers for this valuable suggestion.

3) Provide validation of some of the genes associated with clusters/sub-clusters, including Delta in MZ1, PSC, X and MZ2 clusters, and Ubx in posterior lymph gland lobes, and provide validation to support your model that there is no transition path for IZ to proPL.

First, we thank reviewers 1/editor and 2 for their opinion that validation of the expression pattern of all identified genes in the single cell data is out of the scope of this manuscript (Reviewer 2) and yet that several individual genes have been validated (Reviewer 1) through multiple means. We whole-heartedly agree with this. We wish to briefly elaborate on this point.

In the Discussion section (pages 33-35), we have explained that the methodology employed here is quite different from those in Cho et al., and so is the nature of what we consider is the best way to validate our data, given that doing so for the entire list is beyond the scope. For bulk RNA-Seq experiments, we first separate cells based on single or double markers for a cell type (domemeso and HmlΔ single positives, double positives, and double negatives, etc). And we do this with a different genotype to separate CCs. Thus, we know which population belongs to a particular zone. Since we start with separated MZ, CZ, IZ, CC cells based on how they have been historically defined, there can be no question that what we sequence in the separated “MZ” cells are indeed MZ. So when two clusters in the scRNA-Seq show those characteristics and yet are separable, we can be confident in calling them MZ1 and MZ2. Additionally, we use lists of published hallmark genes (that have been validated through genetic analysis and/or immunohistochemistry through the last two decades of lymph gland studies) to compare to the lists of differentially expressed genes found for each cluster. Our algorithm for generating data (unsupervised graph-based clustering) is agnostic to these lists, but when mapped, they belong to the appropriate zone and cell type (such as PSC, Crystal Cells, MZ, and Plasmatocytes). This first step of isolating populations of cells by FACS, defining markers through bulk RNA-Seq, and mapping them onto scRNA-Seq is not often used in other transcriptomic analyses, but it provides a benchmark and validation for the cluster structure. The bulk RNA-Seq and scRNA-Seq are orthogonal methods conducted on different biological samples on independent days using very different techniques. Thus their remarkable agreement is also a measure of the robustness of the strategy (examples of comparisons shown in Figure 1K, Figure 5K-L, Figure 6D-E, Figure 2 figure supplement 1A, K, Figure 2—figure supplement 3A). We hope we made the distinction clear between isolating a physically and genetically defined zone of cells and seeing which of the clusters identified within it differentially express known markers and the approach in which all cells of the lymph gland are mixed together even from different time points in development, and calling a set of cells a subcluster of prohemocytes (PH1, 2, 3, etc as in Cho et al.), because they express some markers. Both are valid approaches, but the latter approach cannot guarantee a single physical localization of the subcluster within a zone (e.g. MZ, CZ, etc).

Most importantly, we include multiple individual examples of cross-checks of revealed expression patterns with loss of function genetic assays and we also include two complete case studies (which, as a reviewer says, could have made their own manuscripts); we included them here precisely because they provide functional validation for the expression patterns, and in the process provide elegant genetic solutions to problems that have eluded us in the past. The fact that an alternate isoform of a protein (Sima in case study 2), or a previously unexplored step-wise role of a pathway involving a very well studied protein (Pointed in case study 1) could be linked to novel functional cascades by following up their transcriptomic profiles, to us, is the best validation of the data, and it then successfully loops back to predict the expression of additional members of the pathway (eg., Numb and Musashi). Since we are limited to choosing a small and representative number of genes to validate, we prefer ones that also help us understand principles of blood development better, over choosing a handful of randomly selected genes to validate, for which Gal4 drivers are available. The fact that entirely new genetic pathways can be constructed relying on the RNA-Seq data predicted for their components, in our mind, is a very powerful validation of the system and is the sort of validation that only Drosophila can currently provide.

Finally, the ultimate goal of this manuscript is not to identify more “hallmark genes” and new cell-type specific markers, although such markers fall out of some of the analysis. Generally speaking, developmental genes are rarely cell-specific (genes such as Lz, NimC1, and Antp are exceptions since they represent proteins expressed at terminal stages of development). We focus more on the trends of coexpression of pathway-specific genes that are revealed by RNA-Seq, and therefore validating such pathways by loss of function is our primary approach and we believe if counted with the above criteria in mind, and combined with protein expression patterns, we have validated a fairly significant number of the identified genes in this manuscript.

With regards to Delta expression, Delta was identified as a PH1 marker by Cho et al. We also detect Delta in our analysis, but did not include it because it is expressed at very low levels. We now do so in the heatmap shown (Figure 1K). In terms of differential expression, PSC cells show the highest expression of Delta, which is lower in MZ1 and becomes insignificant in MZ2. These results are comparable with Cho et al. who do not comment much on the PSC, but, when GTraced for Delta-Gal4 expression, their data also show the highest expression in PSC. This is consistent with our data and also with the overall expression level of Delta being low. We have also included a comparison between the two studies in the discussion (page 35-36) but also retained the more detailed comparison (Supplementary file 2).

The primary MZ1 cluster in our study is small (1.3% of total cells), but likely includes the much smaller subclusters defined by Cho et al. 2020: PH1 (0.4% of total cells), PH2 (0.4% of total cells), and a subset of the PH3 cells (2.1% of total cells). In this scenario, PH1 would comprise such a small proportion of MZ1 that it is not a surprise that Delta is not identified as a differentially represented hallmark of MZ1. This illustration also justifies why we did not further subcluster the primary MZ1 cluster. Doing so, we end up with subclusters that are not very robust.

Concerning the comment, “and Ubx in posterior lymph gland lobes...”. In the primary lobe, Ubx is identified as a low expression marker with its highest representation in MZ1 as in Cho et al. 2020, who found it to be a marker of PH2. Ubx expression in MZ1 is surprising, worthy of a future genetic analysis. But for now, all we can say is that the expression data seem consistent between the two studies.

The reference to posterior lobes in the context of this manuscript is unclear. All posterior lobes are removed during dissection of the tissue and so this analysis is strictly for the anterior lobe. The reason for the expression of Ubx in the posterior lobe region was explored by our laboratory almost two decades ago (Mandal et al. 2004). The excellent Rodrigues et al. 2021 paper that Reviewer 3 refers to cites our past work and makes the following statement that we have known for a long time as well: “Ubx is specifically expressed in the tertiary lobes of third instar larval lymph glands.” Posterior, yes, but not even in the secondary, instead the tertiary lobe expresses Ubx. So there is no chance of any contamination during dissection given the intervening secondary lobes and pericardial cells. This is not to say that some unknown signaling process or tertiary lobe cells might not have contributed to turning on Ubx in early larval stages, but that would be purely speculative at this point.

Regarding “No transition path from IZ to proPL”, we apologize if we overstated our case in implying that even in principle, there can be no cross-talk between IZ and proPL. This is more properly addressed in the current revision (pages 16-18 in results; pages 37-38 in Discussion). However, it is not true, as Reviewer 3 suggests, that an overlap in trajectory/pseudotime implies a lack of independence between the two paths. To make this point clearer, we now include a new figure (Figure 2G), illustrating what overlap in pseudotime means (a different but equivalent representation can be seen in Figure 4C; Supplementary Figure 3C of the Cho et al. 2020 manuscript). We have also added new in vivo data that clearly demonstrate the distinctinction between IZ and proPL cells (Figure 2H-L, Figure 2—figure supplement 2C). Inasmuch as pseudotime represents temporal progress in development and not differences in cell types, two independent pathways can indeed overlap in pseudotime. Both IZ and proPL are found in transitional states that link MZ2 with PL, but we know that IZ and proPL are distinct clusters (Figure 1J; Figure 2F) with distinct gene expression patterns (Figure 1K; Figure 3B), and have distinct functions (Figure 2J-L; Figure 3D-F’’; Figure 2—figure supplement 2C). On a 3D tSNE (Figure 2F; Video 1), these two large clusters are not adjacent along their boundaries as they are with both MZ2 and PL. We also know that these two paths are different since IZ and proPL can be visualized with distinct genetic drivers and have differences in signaling properties in vivo (pages 16-18, Figure 2H-L; Figure 3D-F’’). Therefore IZ and proPL provide two different paths. However, since absence of evidence is not evidence of absence, we should not have implied that there is no possibility of exchange between them. We thank the reviewer for pointing this out and have made the appropriate changes listed above.

4) Demonstrate the potency of RNAi lines, including pnt RNAi.

We have now included data that demonstrate the potency of the RNAi lines in the Methods section and in individual figures and figure legends. Not all lines could be retested by RT-PCR due to our lab closures, but each line is vetted for the robustness and specificity of its activity. For example, and in the context of the reviewer’s specific query, the pntRNAi line shows a very strong and consistent phenotype as now documented with quantitation and p-values (Figure 4B-D). Additionally, we used 2 different and independently generated pre-validated RNAi lines for pnt that both give rise to the same reproducible phenotype when each is used with 2 independent drivers (Figure 4P; Figure 4—figure supplement 1F). Finally, these pntRNAi lines have been validated as well, in several projects in the past since they are used routinely in the laboratory for diverse experiments in different tissues.

5) Explain the main differences and similarities between your data set and Jiwon Shim's by addressing specific issues such as Cluster "PH1" (your MZ1, PSC, X and MZ2 clusters) and Cluster X (and include a dot plot representation of Cluster X).

A discussion of the main differences and similarities between our data set and that by Jiwon Shim’s lab (Cho et al. 2020) is now included in the main text (pages 9-12; 19; 35-36).

Briefly, from this comparative analysis we conclude that:

– The broadly defined clusters identified by Cho et al. agree fairly well with the primary clusters we identified in our experiments as seen in Supplementary File 2A-B.

– Due to inherent differences and biases of the different experimental methods employed by the two studies (e.g., DropSeq vs 10X Genomics) (page 35-36), we do not expect an exact 1:1 match of our clusters with the smaller subclusters from Cho et al. nor a complete overlap of their exact genetic profiles.

– Based on transcriptomic and genetic data as well as comparison of sizes, MZ1 cluster is most similar to the PH1 and PH2 subclusters of Cho et al. (pages 9-12, 35-36). The MZ1 cluster (1.3% of total cells) likely contains cells of the subclusters PH1 (0.4%), PH2 (0.4%), and also a small number of the PH3 cells (2.1%) (Supplementary File 2E).

– MZ2 cluster (26% of total cells) is most similar in gene expression and size to PH4 (27.2%) (Supplementary File 2C-E).

– It is important to keep in mind that PH1/2/3 etc are obtained by a “subclustering” algorithm (after the authors “aggregated cell clusters according to the expression of previously published marker genes by manual curation”), which is very different from the clusters such as MZ1, MZ2, etc obtained through unsupervised graph-based clustering methods. So we are not surprised by several subclusters mapping to our clusters.

The primary X cluster (1.1% of total cells) is most similar in gene expression and size to the primary “GST-rich” cluster (1.2% of total cells) in Cho et al. (Supplementary File 2A-B, E). The X and GST-rich clusters share many genes in common, and we find that the genes shared between Cluster X and the GST-rich cluster show strong enrichment of DNA damage response pathways (p-value = 0.00004). X and PH1 also show similarities but that can be largely attributed to common cell-cycle genes (p-value = 5.031×10-19), and Cho et al., also mention high mitotic activity for PH1. In terms of a broader comparison however, the number of shared genes is highest between X and GST-rich.

A dot plot representation of genes highly enriched in Cluster X has now been included (Figure 2 figure supplement 1J). We thank the reviewer, it does look much nicer than what we had before.

6) Provide additional quantification of CC in "Numb and Musashi in CC determination", additional information on how Numb intensity was quantified, and P values

Thanks for this important suggestion. We now provide additional quantification and p-values for the various sorts of data concerning CCs in the “Numb and Musashi in CC determination” section (Figure 5K, O; Figure 5—figure supplement 2D; Figure 5—figure supplement 3H-I; and Figure 6D, J, M). The intensity values of Numb (Figure 5—figure supplement 1M; Figure 5—figure supplement 2D; Figures 6J, M) represent “per cell” expression and not of the entire gland. However closely placed cells were quantified together to prevent skewing of the data. In general, we have now included p-values for other data outside of this CC section as well, and can be found in all relevant figures in the revised manuscript (new Figures 1, 4, 5, 6, and related supplementary figures).

7) Provide for lymph gland analyses the quantification of CC and MZ indexes and P values.

We have now provided quantification of CC and MZ indexes and p-values for lymph gland analyses (Figure 4P; Figure 5O; Figure 4—figure supplement 1A, C, E-F; Figure 5—figure supplement 2D, G; Figure 5—figure supplement 3H-I, Figure 6J, M).

8) Consider using dot plots instead of box plots.

We thank Reviewer 2 for this suggestion. We have included several dot plots (Figure 2N; Figure 4A; Figure 5L; Figure 6E; Figure 2—figure supplement 1J-K; Figure 2—figure supplement 2G). Box plots have been entirely eliminated throughout the study, including for AUC scores and replaced with heat-maps and dot plots as appropriate.

9) Provide in the main figures a list of the key genes defining each sub-population inside the different clusters.

We now provide a list of the key gene combinations that define the subpopulations (Figure 7A). As mentioned in Essential revisions #3, please note that trajectory states represent progress in pseudotime and not different cell types. Also important is our basic premise that developmental genes are rarely cell type specific. Therefore, clusters are defined by expression trends. While the entire transcriptome contributes to defining this trend, we have included (Figure 7A) some of the combinations of genes that we have investigated and validated as part of this study.

10) Provide the number of cells in each cluster, including the IZ cluster.

We have included the number of cells in each cluster including the IZ cluster (Figure 1J) and also included this number for each replicate (Figure 1—figure supplement 2B).

11) Provide a schematic representation of crystal cell specification and maturation involving Notch, Numb and Sima.

Thanks, this is an excellent suggestion. We have done so in a new Figure panel and this does improve the readability of the manuscript (Figure 7D).

12) Make extensive editorial changes, including shortening the manuscript per the reviewers' suggestions and clarifying the narrative so that the introduction covers what is actually being treated in the manuscript. For example, the introduction discusses the IZ but then the manuscript does not provide much insight into this cluster and instead focuses on CCs.

We thank the reviewers and RE for this critique. The introduction is shortened and re-focused on the main themes of the manuscript. In general, we agree that the descriptive parts were confusing and repetitive. In the annotated (changes tracked) version of the revised manuscript, we hope it is clear that we have made extensive changes to every section, shortening where appropriate, and avoiding duplications throughout the manuscript. There are too many such changes to list them here individually.

The changes do not affect the scientific conclusions but, we hope, make them more comprehensible.

Thank you for this suggestion on which all reviewers agreed and we appreciate this constructive feedback a lot. Overall the manuscript is shorter by 7.5 pages. The net decrease in figure panels is 53. This is in spite of the many new additions asked for by the reviewers.

As for the comment on IZ, this manuscript includes more details on IZ than any other published work to date. To supplement the material further, we now include more functional data on the IZ (Figure 2H-L; Figure 2—figure supplement 2C; Figure 4—figure supplement 1F). CC development is a more mature field that introduced our laboratory into hematopoiesis and naturally it is more amenable to complex genetic analyses (Rizki and Ritzki 1981, Genetics 97(Suppl. 1): s90; Lebestky et al. 2000, Science 288, 146–149).

Response to detailed comments (essential revisions and other reviewer suggestions)

Reviewer #1/Reviewing editor:

There is a massive amount of data in this study. The data are of high quality, and the manuscript is well written. The results support the main conclusions. This study will be a very valuable resource to the community. I have a few suggestions for possible improvement.

1. On p. 36 (line 906), the authors write that "we surmise that Notch activation is the default pathway for early Hml-expressing cells to become CCs, and that the activation of Pnt acts antagonistically to prevent this process thus favoring instead, the plasmatocyte fate." I don't understand the logic of this. PLs represent 95% of the mature hemocytes, whereas CC represent 5%. Why would most of the differentiating hemocytes have to repress Notch signaling by expressing Pnt. Loss of Pnt from Hml+ cells would very consequential as the animal would not have PLs from the LG. What I am trying to say is that the kind of regulation proposed here is not robust and could be easily disrupted by mutation. Could the authors comment on this.

Thank you. Calling this a “default” pathway was an oversimplification of the data presented and we have now rephrased our statements in the text (page 27-28). We hope this resolves the main issue.

But the reviewer asks a very insightful question about CC number. The answer is long and not a central focus of this manuscript. The basis is in the dynamic regulation of the ligand (as it is for all cases of Notch activation in flies), Serrate expression is dynamic, temporally and spatially, and controls how many cells receive the Notch signal. This regulation depends on multiple factors, including, for example, systemic signals triggered by the O2/CO2 balance of the local environment.

2. The manuscript is very long (the results section is thirty-five long) and reader attention spans tend to be short. Could the authors please edit the manuscript to reduce its length. For example, I don't think that the entire section about cluster X is needed. The metabolism section could be condensed. Some of the discussion is redundant with the results.

Thank you for the valuable suggestion, we have addressed this as above in the essential revisions (12). The introduction and discussion are shortened and repetitions removed. X and metabolism are telling us something novel about development that few laboratories study, but that future investigators might find valuable and so while we shorten them as you ask, we retain their prominent mention in the paper. The editing, asked for by all reviewers, was a very valuable suggestion.

3. Please provide accession numbers the raw data at NCBI and a link to the reviewers.

To review GEO accession GSE168823 (now included in Methods page 68).

Reviewer #2:

In this manuscript, Girard et al. analysed the transcriptome of the lymph gland of Drosophila using high throughput sequencing. They fist used genetic markers (Domemeso-GFP and Hml-RFP) to sort the cells from three distinct regions in feeding larvae: the medullary zone (MZ) known to be populated by progenitors, the intermediate zone (IZ) populated by intermediate precursors and the cortical zone (CZ) containing the mature hemocytes. The cells from the CZ were further subdivided into plasmatocytes, immature crystal cells and mature crystal cells with the genetic markers Hml-RFP and Lz-GFP. This subdivision was carried out at later stage to maximise the number of mature hemocytes. A comprehensive molecular signature of each region and cell type was determined with bulk RNAseq. Then, the authors analysed the transcriptome of 21200 cells of the lymph gland using 10x genomics technology. They found 9 clusters of cells displaying distinctive signatures and metabolic properties, including the cells of the Posterior Signaling center (PSC), two types of progenitors in the MZ, the intermediate precursors, the plasmatocytes as well as the immature and mature crystal cells. They predicted the filiation between the clusters using Monocle, indicating a developmental trajectory starting in MZ producing IZ and then plasmatocytes. At last, the authors validated two mechanisms of cell differentiation highlighted by the transcriptome analysis. They showed the context-specific role of the transcription factor Pnt in the differentiation of plasmatocytes. Loss of Pnt in the early progenitors has no effect whereas its loss in the late progenitors prevents the differentiation of intermediate precursors and thus mature plasmatocytes without affecting the differentiation of crystal cells. As a second study case, they showed that the interplay between Notch, Numb, Sima and Musashi is involved in the maturation of crystal cells.

Overall, this manuscript presents a tremendous amount of RNAseq and in vivo data with highly detailed interpretations. It provides very valuable and substantiated information on the molecular mechanisms involved in the development of the hemocyte lineages in the Drosophila lymph gland. The main caveats are (1) the definition of two clusters IZ and proPL, which seem to belong both to the IZ, (2) the fact that most boxplots do not include 0 in the y-axis, which strongly biases the interpretation of the graphs, (3) the definition of mitotic precursors. Finally, shortening would have made the manuscript more easily readable and would have conveyed the message more directly.

1) Most charts presenting expression levels or AUC score across clusters do not include 0 on the y-axis and disclose highly heterogeneous ranges. For example, the ranges of the AUC displayed in Figure 3 are from 37 points in A (y-axis range 0.43 to 0.8) to 3 points in I (y-axis range 0.13 to 0.16). This is highly misleading, and biases the interpretation of the data since the authors describe minor differences the same way than large differences. Few examples are:

– In the legend of Figure 3B-C, the authors write "(B) TCA cycle enzymes are expressed at exceptionally low levels in the PSC compared with the cells of other clusters. (C) Expression of oxidative phosphorylation pathway enzymes is low in the PSC, high in MZ1 and MZ2 and moderate in IZ, proPL, and PL clusters.". In B, exceptionally low level correspond to 0.42 compared to 0.46 in the other clusters. In C, low in the PSC means 0.74 and high in the MZ corresponds to 0.78. Since AUC represent frequencies, I doubt that few point can translate into such high ranges.

Many of the concerns raised by the reviewers about AUC and how to measure and interpret its statistical and biological relevance are addressed above in essential revision (1). There we provide an explanation of the importance of z-scores that are easier to see in heatmaps, and so all AUCell analyses are now converted to heat maps with a single scale.

The reviewers’ concerns with the small differences in AUC scores are genuine and we thank them for their very careful analysis of our data. As stated in essential revisions (1), we no longer have AUC data shown as box plots, but they are now heatmaps with standardized z-scores.

But we do not wish to give the impression that the data in the box plots were incorrect and somehow disappeared in a new representation. For a full explanation of AUC data, we strongly recommend (Aibar et al., 2017). Nature Methods, 14, 1083-1086; who first used AUC for transcriptomic analysis, but also the important details in https://www.bioconductor.org/packages/release/bioc/vignettes/AUCell/inst/doc/AUCell.html. Both of these resources helped us a lot. A quote from the bioconductor article is:

“The AUC is not an absolute value, but it depends on the cell type (i.e. cell size, amount of transcripts), the specific dataset (i.e. sensitivity of the measures) and the gene-set. […] Since the AUC represents the proportion of expressed genes in the signature, we can use the relative AUCs across the cells to explore the population of cells that are present in the dataset according to the expression of the gene-set.”

The above quote asserts that many factors affect AUC scores and also that its value is derived on a cell by cell basis. Most importantly, in our case, the actual AUC score will be different depending on the input data-set. As a hypothetical example, a data set-A, “signal transduction” vs set B called “EGFR signaling” (components of which are included in the “signal transduction” category as well), will give very different AUC scores even for a single cell involved in EGFR signaling. However, each set can be compared between cells (but the same cell cannot be compared between sets A and B). Thus, what matters is not AUC scores themselves, but how many standard deviations apart one group of cells is from another, when compared over all cells.

We now see and agree that box plots are a messy and confusing way to show this comparison. They are particularly troublesome when used to display comparisons of sets of data represented in arbitrary units (for all the reasons the reviewers point out). The z-score includes both difference from the mean and the standard deviation in its calculations and therefore small AUC scores with very tight distributions can give meaningful z-score values, while large mean values with very big dispersions might not.

For some of the specific cases cited by Reviewer 2, we have converted the z-scores into adjusted p-values (based on percentile deviations of the z-score distributions) and below are some numbers for zscores and p-values for each cluster for the metabolic processes referred to by Reviewer 2, read each as Cluster (z/p):

For Glycolytic genes: PSC (+5.7/ < 0.00001); MZ1 (+2.5/0.006); MZ2/IZ/proPL/ PL z ranges from +0.98 to -0.79 and p-values from 0.2-0.5. CONCLUSION: PSC and MZ1 are significantly enriched (HIGH) for glycolytic genes with PSC significantly higher than MZ1. No significant enrichment of the glycolytic gene set is seen in MZ2/IZ/proPL/PL

For TCA cycle genes: PSC (-4.2/ 0.000013); MZ1/MZ2/IZ/proPL/ PL, -score ranges from -0.46 to +0.17 and p-values from 0.3-0.4. CONCLUSION: PSC is significantly de-enriched (LOW) for TCA cycle components; No significant enrichment or de-enrichment of the TCA gene-set in MZ1/MZ2/IZ/proPL/PL.

For nuclearly encoded Ox-Phos genes: PSC (-2.2/ 0.015); MZ1 (+1.95/0.026); MZ2/IZ/proPL/ PL, zscore ranges from +1.09 to -0.47 and p-values from 0.1-0.3. Conclusion: Ox-Phos related genes are de-enriched (LOW) and are somewhat higher in MZ1 (meets 5% but short of 1% criterion for significance). There is no significant enrichment or de-enrichment of the Ox-Phos gene-set in MZ2/IZ/proPL/PL.

We now use formal statistical descriptors (such as on pages 21-22 for the above pathways).

– In figure 3D and E, the authors mention the levels of Zw and Idh in the clusters. They explain that both are highly enriched in PSC compared to MZ. Zw presents an expression level of 1.1 in PSC, around 1.2 fold higher than in MZ and Idh presents an expression level of ~ 7 in PSC around 2 fold higher than in MZ. The difference in level of expression and fold enrichment cannot be appreciated properly due to the heterogeneous y-axes. In addition, the authors described the two enzymes in the same terms while Idh is expressed 7 time higher than Zw.

We state in the paper that Idh is the most abundant dehydrogenase in the cell that maintains NADPH levels important for detoxification of free radicals, Zw (G6PDH) is another dehydrogenase that also increases NADPH but is expressed at a low level. G6PDH is a PPP pathway enzyme that is enriched in the PSC cells, which must maintain very low ROS as explained in the text.

Even at low expression, the enzyme G6PDH is critical for ROS removal because it generates reduced glutathione (GSH) in a process that consumes NADPH. Normally this is not an issue because other sources, principally Idh can make NADPH (but it makes alpha-ketoglutarate in the process, and not glutathione) that can be utilized in the G6PDH coupled reaction to make GSH. But Idh activity requires isocitrate, made by the TCA cycle and if TCA is low (as in the PSC), G6PDH gets upregulated by a small degree to raise NADPH (400 million kids get anemic from low G6PDH since the RBCs lack citrate). All this to say that we lumped Idh and Zw together not because of expression levels but because of their coupled functions. We feel this will be too much detail to include into this manuscript. We hope you agree.

– For Pnt (Figure 4AB), the authors describe "a very low level of Pnt" in the PSC and MZ and a significant increase in the MZ2-3. The level in PSC is around 5.2, which is much higher than most genes described in this study and the significant increase is going from 5.2 in MZ2-2 to 5.5 in the MZ2-3. The significance of this increase need to be documented with a p-value and the biological relevance of this difference seem far-fetched.

The pnt levels are now expressed as a dot plot as per the reviewer’s previous suggestion (Figure 4A). As explained in the text (page 28), the change in pnt expression (as, likely with many developmental genes) is gradual and needs to be considered for its trend and not stepwise changes as for hallmark genes and therefore requires functional (genetic) validation in order for the trend to mean something. The dot plot shows a trend rising from its low expression in PSC to higher in MZ1/MZ2-1/2. Its rise in MZ2-3 is apparent with further increase in IZ/proPL, and in PL and then a very clear de-enhancement in CC. This gradual modulation of expression becomes meaningful only when combined with genetics, and is the reason for the “Case Study”. This is described in the text (pages 26-28) and figures (Figure 4A-P; Figure 4—figure supplement 1A-F), which establish independent functions of pnt in late MZ, IZ, and PL.

2) A better explanation should be provided for the AUC scores. The authors rightly say that AUC are reflective of co-regulation however the keys to interpret the score are not described. In addition, while a difference from 0.80 to 0.60 (Figure 3A) seems plausible and sufficient to call for an enrichment of glycolytic genes in the PSC, the biological relevance of differences from 0.74 to 0.78 (Figure 3C) or from 0.08 to 0.09 (Figure 2—figure supplement 5) seems far-fetched without further explanations/justifications.

Please see our response to comment 1 above.

3) The distinction between the cluster IZ and proPL needs to be clarified. The enrichments of the IZ AUC and individual IZ markers are not striking (Figure 2—figure supplement 3A-H). In addition, can the authors explain why only 6 of the 9 IZ specific genes were taken to estimate the AUC score? The strong similarities between the two clusters, in terms of markers and developmental trajectories, seem to indicate that the two clusters represent transient conditions that each cell goes through on its way towards full differentiation. Indeed, the single cell analysis has been performed in feeding larvae, when cells are actively differentiating, not in a steady state condition. Longitudinal/spatial analyses in the developing lymph gland might help in the interpretation, but this is not the scope of the present manuscript.

There are several parts to this question.

– The distinction between IZ and proPL has been better clarified in the main text (pages 16-18, 3738) and in the essential revision (3).

– AUC scores are no longer used in describing IZ genes. We present them individually in the revised version (Figure 2—figure supplement 1K). 6 of the genes are picked since the others are expressed at very low levels in the scRNA-Seq (page 15).

– Importantly, 5 out of those 6 genes (all but CG31821) are present in the differentially expressed genes list for the IZ cluster (Supplementary File 1).

– The expression patterns of the individual IZ-enriched genes are now shown in a dot plot representation (Figure 2—figure supplement 1K).

– A temporal mapping of gene activity in true-time, as the reviewer points out, is beyond the scope of this manuscript, but is a critical follow-up project.

4) The authors assess the impact of Pnt in the MZ cells using the drivers domemeso-Gal4 and Tep4-Gal4. The authors showed that domemeso>pntRNAi prevent the progression of MZ cells toward IZ cells. Some quantification would be welcome to appreciate the penetrance of the phenotype. In addition, the authors say that Tep4>pntRNAi has no observable phenotype, while the comparison between Figure 4E and Figure 4F seem to indicate that the lobe is smaller and with less Tep4 positive cells. Such difference could arise from slight stage differences. Could the authors indicate how they staged the animals, the number of samples...? At last, the potency of the pntRNAi construct should be documented.

We have now repeated our experiments with carefully staged larvae and included more definitive quantification for all of the pntRNAi related data (Figures 4D, I, P; Figure 4—figure supplement 1A-F). Our analysis of all the images (n=10) confirms that Tep4>pntRNAi does not change the number of HmlΔ+ cells (Figure 4G-I), the number of Tep4+ cells (Figure 4—figure supplement 1C), or the overall lobe size (Figure 4—figure supplement 1D) to any significant level. We have replaced the previous images with those that better represent the statistical data. The potency of the pntRNAi line is discussed in essential revision (4) above, and is also clear from the quantification and statistical significance (Figure 4D; p<0.0001) of the observed phenotype. We also use two independent pntRNAi lines and find similar results with both (Figure 4P; Figure 4—figure supplement 1F).

5) In the section "Numb and Musashi in CC determination", the authors mention that Notch-ACT raises Numb levels and Notch-RNAi decreases Numb levels in the crystal cell. This interpretation should be clarified. Since N activity is modulated using a CC specific driver and the number of lz>GFP also changes upon N modulation, the observed results may arise from regulation of Numb expression and/or from regulation of CC number. CC quantification will help sustaining their statement on the role of N. In the third paragraph linked to Figure 5—figure supplement 3H-N and Figure 5O-R', the authors describe a clear reduction of Sima puncta, PPO2 expression and number of mCC. P-values should document these observations. Also, does the expression of the type II targets decrease upon Numb RNAi? At last, Figure 6H-M, the authors indicate that msi-RNAi enhance the level of Numb in crystal cells without providing information on the procedure followed to measure Numb intensity. What does Numb intensity represent in Figures 6J and 6M? Is it the average level per cell or the level in the whole lobe?

We have quantified Numb levels in NotchRNAi and NotchACT backgrounds (Figure 5—figure supplement 2D) to support our statements on the role of Notch. We have added p-values to document the reduction of Sima puncta (Figure 5O) and reduction in PPO2 expression (Figure 5—figure supplement 3H). The trend in crystal cell numbers in numbRNAi vs wild type is best appreciated in the FACS results (Figure 5T), which show that the total CC number remains constant but the distribution shows less mCC and higher proportion of iCC.

We do not know if the expression of type II targets decreases upon NumbRNAi and these experiments would require extensive FACS sorting with many different populations and qPCRs that are not possible in the current circumstances. The intensity values in Figures 6J and 6M represent “per cell” expression and not of the entire gland. However closely placed cells were quantified together to prevent skewing of the data.

6) The authors carried out the single cell sequencing in triplicate. It would strengthen considerably the data to provide a comparison between the replicates.

We are very thankful for this suggestion. We have addressed this point above in essential revision (2).

7) The paper would gain from shortening. The introduction is broad and exhaustive, the results section describes the different the clusters and states as well as two study cases, the discussion elaborates on the mode of differentiation and put forward interesting models such as the gradual rather than stepwise transitions between groups of cells. Since this is a resource paper, the validation of all the single cell data is out of the scope, hence a thorough discussion of all those data could be shorten and used in subsequent studies. Furthermore, many data are already discussed in the results section, diluting the important and novel messages that the paper conveys.

We have addressed this point above in essential revision (12).

8) According to the authors, Cluster X represents mitotic states of several distinct cell types, including the CZ that carries differentiated cells. This intriguing finding indicating the presence of dividing cells throughout the lymph gland deserves some clarifications. Does it imply that none of the other clusters identified by the RNAseq analysis contains cells in mitosis? Does it mean that plasmatocytes and cells of the medullary zones have similar mitotic potential? Is there any difference in the type/levels of genes associated to cell division between the cells of the cluster that express MZ markers vs. those that express the Pl markers? I understand that the spatial analysis of the cluster X cells in the lymph gland, which would help clarifying these issues, goes beyond the scope of the manuscript. It would be nevertheless useful to compare the RNAseq data with those from the laboratory of Jiwon Shim, who also identified the clusters of mitotic cells in the lymph gland. Also, a dot plot representation of the genes associated with cell division in the different cells of the cluster X (MZ, Tr, Pl) might help identifying features specific to the different subclusters.

Although Cluster X shows especially high levels of mitotic activity, this does not imply that the cells in the other clusters are devoid of any proliferation. When purely mitosis-related genes are shown on the t-SNE some regions that border between clusters e.g. proPL-7 cells bordering PL and IZ have z-scores for G2-M related genes that range between 1.5 and 4.3 (p-values between 0.07 and <0.00001) (Figure 2 figure supplement 1D). Where these cells are spatially located in the lymph gland will be important for follow up studies. However the stress-related DNA damage activity is exclusive to Cluster X with z-score of 8.6 (p < 0.00001) (Figure 2—figure supplement 1E), for the mitotic DNA damage category. All other clusters have z-scores for this latter category that averages to approximately 0 (p = 0.5) and so it is possible that the nature of mitotic stress related activity seen in Cluster X is somehow different from that of other subsets of mitotic cells.

With regards to mitotic potential, MZ cells show very low mitotic activity while PL cells show higher mitotic activity at this developmental time point (as judged by phospho-Histone H3 staining in previous studies and mitotic G2/M transition enrichment here). As mentioned in the main text, the majority of MZ cells are found in the G2 phase. We believe it is this lengthening of G2 that may trigger the signs of replication stress in Cluster X but as the reviewer mentions a spatial analysis of the physical location of cluster X cells within the lymph gland will be necessary and is beyond the scope of this manuscript.

Finally, as described in essential revisions (5), signatures of Cluster X are detected in the Cho et al., “GST-rich cluster”. Through data mining, we have also detected clusters with X signatures in scRNA-Seq analyses of the mammalian hematopoietic system (Velten et al. 2017; pages 13-14). Future studies will very likely find that clusters such as X play an important and specific role in blood development. But for now, we have minimized speculations in the text regarding X.

Reviewer #3:

Using a combination of bulk RNA-Seq of FACS-sorted cells and single cell RNA-seq, the authors identify various blood cell subpopulations that compose the Drosophila hematopoietic organ called the lymph gland. This study has been performed at one developmental time point, mid third instar larvae. The authors perform a pseudo time analysis and propose a developmental trajectory with multiple paths to mature blood cells types. RNAseq data suggest that different blood cell types express genes involved in various metabolic processes. They establish that Pointed has different roles during lymph gland hematopoiesis. Finally, they identify that Numb and Musahi are involved in a Sima dependent Notch non canonical pathway in mature crystal cells.

This analysis is of interest, however in the current version, it is too preliminary.

1. The list of the main genes defining each sub-population inside the different clusters, as well as their expression in all the other sub clusters, has to be provided in the main figures.

This point is addressed above in essential revision 9.

2. No validation of RNA seq data is provided: A spatial reconstruction in vivo by profiling the expression of a subset of genes identified by RNA seq is necessary.

Please see essential changes point 3.

3. To support the developmental progression of lymph gland cells proposed in Figure 7,

lineage tracing experiments are required.

Figure 7 is a model figure to help make the discussion section more comprehensible. It is the summary of the transcriptomic and genetic data resulting from the analysis. It shows a schematic representation of a snap-shot in time and the diversity of cell types. Lineage tracing involves studies through real development using specific cell drivers for each zone, which is not the focus of this paper.

4. Comparison between RNA seq data obtained by (Cho et al 2020) has to be given in the main text. Furthermore, discrepancies between these 2 studies have to be clarified. How the AUC list has been established? This is a key point. For example, the PH1 cluster identified by Cho et al is spread out in MZ1, PSC, X and MZ2 cluster in this study. Why are the results so different? Delta is a marker of PH1, which is validated by analyzing its in vivo expression profile. What about delta expression in the scRNA seq performed here?

These concerns have been addressed above in essential revisions points (1, 3, and 5).

On a practical note, Professor Shim is a close colleague and past member of our laboratory. She made her data available to us long before its publication. We have concentrated on how this transcriptomic approach in marker identification (new for lymph gland studies, but quite commonplace as a tool otherwise), can be extended to contribute to functional analysis and explain new developmental mechanisms in Drosophila. It is not our goal to repeat all their findings since the entire gene list is included.

It is reasonable to ask us to provide similarities and explain sources of differences, in the overall context of the two studies. We have now placed this in the main text as the reviewer suggests (page 3536) while not removing it from the very detailed comparisons that we had already provided in the original manuscript (Supplementary file 2). In Figure 1K, and in the essential revisions section (3), we now include Delta expression and also explain why the results are consistent with Cho et al. The Shim lab is working on Delta and there are many other genes for us to concentrate on. We also point out (essential revisions 3), that Delta is not a “marker for PH1”; it is expressed in PH1 as well as in the PSC, just as we find (page 11 and Figure 1K). The differences between subclusters in Cho et al., and clusters in this manuscript are now made clear in multiple places in the text and in these responses to the Reviewers’ comments. And we thank them for bringing this up.

5. There is a discrepancy between the introduction and the main results of this paper. In the introduction the authors focus our attention on the IZ, but in fine we don't learn much about IZ cells from this analysis. Instead of deciphering IZ identity and fate by in vivo profiling, most functional analyses performed concern crystal cell maturation. This part is developed via 2 main figures among 7, plus 5 sup figures. From my point of view, this study represents an ideal opportunity to better characterise IZ cell identity, lineage and function. Unfortunately these data are missing in the current version of the manuscript but could be added instead of the data concerning crystal cell maturation, which is somewhat out of the scope of this manuscript and could be published in a separate paper.

The manuscript about IZ that the reviewer is likely referring to is repeatedly cited in the manuscript and available in bioRxiv. It is under minor revisions for publication in the journal Development (Spratford et al. 2020). We cite the manuscript as well as the CHIZ-GAL4 line, which is carefully and thoroughly characterized there. That said, we have added some more functional data on IZ to this revision (Figure 2J-L; Figure 2—figure supplement 2C; Figure 4—figure supplement 1F), in addition to the material on IZ that was presented in the original manuscript (Figures 1, 2, 3, 4, 7 and their corresponding supplementary figures); and as it stands now, this manuscript has more data on IZ than any other published material. Author’s note: this paper is now in press in Development.

We appreciate the Reviewer’s opinion about publishing another paper with the CC data. Based on the argument presented in essential revisions (3), doing so will be contrary to how we define proper validation of transcriptomic data in Drosophila. Validation by functional analysis was one of the primary motivations for initiating this extensive study.

6. This manuscript has to be focused on the novelty given by the RNAseq data. The data concerning crystal cell maturation, which is somewhat out of the scope of this manuscript, could be published in a separate paper.

See Reviewer #3 comment 5. We are hoping that this paper will help move Drosophila analyses of transcriptomic data to a step beyond identification of markers. The novelty in this work is the coupling of transcriptomic and functional data, not so easy to do in mammalian hematopoiesis.

7. There are 7 main figures and 16 sup figures + 1 additional file. All these Sup figures give information and make suggestions that unfortunately are not validated by additional experiments. Overall the reader is left with a lot of observations that are not further validated and in fine one cannot rely on. Data presented in this manuscript have to be focused to avoid overloading the reader with too many side observations, which in turn lead to losing the thread of the message of this study.

Please see essential revisions (12). The supplementary and main figure data are treated with equal rigor. Many supplementary data panels have been consolidated.

8. I have concerns about the single cell RNA seq data, since essential information is missing.

– What about the cell numbers in each cluster? The IZ cluster (Dome-GFP+ and Hml+) represents a small subset of lymph gland cells based on the CHIZ expression profile (see Figure 3K); however, it corresponds surprisingly to a quite large lymph gland cell subset, as illustrated in Figure 1J. How can one explain this?

The cell numbers/percentages in each cluster (and each biological replicate) are now included in (Figure 1J; Figure 1—figure supplement 2B). These are in agreement with other independent studies.

Figure 3K in the original version is Figure 3D in this revision. It shows a part of the middle third section of a lymph gland, and makes a specific and quite a different point. We are not sure how the reviewer can surmise from this panel that less than 15% of cells comprise the IZ cluster. Particularly since this manuscript is the first to actually quantify the number of IZ cells and present the data in a statistically meaningful way. The IZ cluster in the scRNA-Seq represents ~15% (14 -16% between the 3 biological replicates). This is consistent with the data in Spratford et al. 2020, where CHIZ-GAL4+ cells, representing the first bona fide positive marker of IZ, account for ~17% of the total number of cells at 96h AEL. So there is no discrepancy. In fact, these data support our argument that CHIZ-GAL4+ cells (17%) are akin to cells of the IZ cluster (14-16%) and distinct from the cells of the other transitional cluster (proPL; 25-27%).

– To identify subpopulations in clusters, the authors performed sub clustering on isolated clusters for PL and CC. Why was this not done in the same way for the other 3 main clusters (MZ2, IZ and proPL)?

This is a great question. PL and CC cells are both in stages of terminal differentiation. Nearly all CCs are in pseudotime State 6 and PL cells are in State 7. These are reasonably sized clusters, which when sub clustered (using graph based unsupervised methods), can still be represented by single gene expressions, providing validation of them as being subclusters (eg., high or low lz-GFP for CC and high or low NimC1 for PL). In contrast, more dynamic clusters eg., MZ2, IZ etc are in many different states (meaning stages of development at any particular time). Subclustering non-terminal, progenitor and transitional populations that are in mixed developmental stages (States) or are particularly small, such as with MZ1, does not generally yield robust subclusters.

– For the IZ sub-cluster: The plot in Figure 2 sup 3I is very misleading, since it suggests that genes expressed in the IZ are specific to this cluster. For example, "state 3" is present both in the MZ and proPL (Figure 2), but in Figure 2 sup 3l it is only represented in the IZ and not in the MZ and proPL clusters. The same remark holds for states 4, 5 and 7. Furthermore, as I mentioned above, the list of the main genes defining each sub-population inside clusters, as well as their expression in the other sub-clusters, has to be provided in the main figures. Furthermore, a spatial reconstruction in vivo by profiling the expression of a subset of genes identified in sub populations is mandatory to validate the RNAseq data.

As for the list of genes, this has been addressed above in essential revision 9 and in response to Reviewer #3 comment 1. We have removed Figure 2 Sup 3I since the reviewer finds it confusing/misleading. The essence of this information is available in Figure 2D-E. We have also added a dot plot (Figure 2 figure supplement 1K) to make the point about IZ-enriched genes more clear. We further note that 5 out of 6 of these genes are differentially expressed (enriched) in the IZ cluster (Supplementary file 1). Many of the other questions raised have been addressed in essential revisions, and should be resolved with the three important points that we have already made above, but are perhaps worth restating:

1. It is not our expectation that we will find single gene markers for every population. In the new Figure 7A, we describe combinations that reasonably represent a cluster. What is measured is highest relative enrichment compared with other populations, not uniquely expressed markers. We have been careful to say “enriched” rather than “specific” in the context of the IZ, and even for individual genes.

2. States and subclusters are not the same. The state represents advancement in pseudotime. A number of cell types could belong to the same developmental time (state). Clusters are made on the basis of global differential gene expression and they may therefore usually represent different cell types (e.g. MZ1 or PL). The clustering process can break up heterogeneity within what was previously thought of as a single cell type (such as MZ broken into MZ1 and MZ2). The state designations such as MZ2-3 and IZ-3 are not “subclusters” of MZ or IZ, rather they are parts of two different clusters that are in developmental state 3 in pseudotime. MZ2-3 is more similar to MZ2-1 than to IZ-3 in spite of their both being in state 3. In contrast, PH1/2/3 etc of Cho et al., are subclusters not merely states in pseudotime and could, in principle, represent cell types if proven so with markers.

3. For the purposes of gene expression differences, in our study, it is best to look at clusters (e.g. MZ1 and MZ2) or true subclusters such as iCC and mCC. But while individual gene expression can vary as a cell matures through pseudotime states, these differences do not represent enough of a change in transcriptomic (and therefore in markers) to represent unique cell types. Our apologies, if this was not made clear in the text, we have tried to make these distinctions clear in the revision and in this response to reviewer’s comments that we have taken very seriously.

– Why is the plot shown in Figure 1J different from the plot shown in Figure 2 sup 5 I? The t-SNE graphic representation does not give any indications concerning whether clusters are related or not.

The t-SNE in Figure 1J is a 2D (2 dimensional) t-SNE representation of the data (i.e. the Ndimensional data reduced to 2) while the plot in the previous Figure 2—figure supplement 5I (which is now moved to a main Figure 2F) is a 3D (3 dimensional) t-SNE representation. The principal components are marked as axes of a cartesian plot on the figure (t-SNE 1, t-SNE 2, and t-SNE 3). We have included the words “3D t-SNE” at the top and color-coded arrows and arrowheads in (Figure 2F) to indicate the locations of the adjoining clusters. Finally, a true 3D rendition of the 3D t-SNE was also provided (Video 1) where the relative positions of all clusters is more obvious.

– Why are there discrepancies between gene expression levels and their representation on the corresponding plot? Please see for example the case of CG30090 in Figure 2 sup 3B and the corresponding plot in C. CG30090 is expressed at a similar level in proPL and PL, but its expression in proPL is lacking in the plot. Why?

These plots were thresholded to show highest expressing cells. We have now removed them as the dot plot (Figure 2—figure supplement 1K) replacing them makes the same point, only better.

– Concerning IZ markers, among the 6 identified by bulk RNA seq, only 4 of them have been analysed in single cell experiments. What about the 2 others?

We have now added a dot plot representation of all 6 of the IZ enriched markers identified by bulk RNA-Seq (Figure 2—figure supplement 1K), and noted that 5 out of the 6 genes are differentially expressed in the IZ cluster (page 15; Supplementary file 1).

– In Figure 7, the model of developmental progression of blood cells proposes that there is no transition path between IZ and proPL. This proposition does not fit with the data. Indeed, in the pseudo time analysis there is a clear overlap between IZ and proPL, indicating that they are connected (see states 4 and 5 in Figure 2 O-P). Genes highly expressed in IZ are also highly expressed in proPL. This is observed for all the IZ markers analysed in this manuscript (please see Figure 2 sup 3B, D, G, H). Determining whether there is a transition path between IZ and proPL has to be validated by in vivo experiments.

This point is fully addressed in essential revisions (3). The overlap in pseudotime between IZ and proPL, which we believe is now better illustrated in new Figure 2G, indicates that both distinct cell types (distinct by unsupervised graph based clustering, size distribution, and also supported by in vivo labeling and functional experiments) represent two separate transitional populations that bridge MZ2 and PL. The overlap in pseudotime implies they each follow similar states of maturation.

As for no transition between IZ and proPL, we agree with the reviewer that we overstated the case; we have revised our explanations and details are in essential revisions point 3.

The reviewer is correct that IZ and proPL have several genes in common. We had commented on this in the text (pages 20-21). Both are transitioning populations and have expressed genes in common. But their individual functions set them apart. And their minimal contact boundaries makes them separate, without precluding the possibility of some cross talk pointed out by the reviewer. Our new representation of pseudotime (Figure 2G) shows more clearly how the different zonal clusters can be overlapping in pseudotime and yet maintain their cluster identity.

– Figure 7: Regarding the translational link between PL7 and iCC7 (Figure 2 sup4), again this proposition has to be validated in vivo. Furthermore, how can iCC7 (more engaged in maturation) give rise to iCC6 (less engaged in differentiation). This also needs to be validated.

There have been several past studies that have presented convincing data for dedifferentiation and transdifferentiation of plasmatocytes to CCs (Leitão and Sucena 2015; Terriente-Felix et al. 2013) so there is genetic precedence for such reversion and in that respect, this is not an entirely new phenomenon. We cited both manuscripts. To add to that, iCC-7 cells show similarities to PL-7 that the rest of CCs do not share (Figure 2N; Figure 2—figure supplement 3G). Since virtually all PL cells are placed in state 7, and CCs mature in state 6, any conversion from PL to CC would have to represent a state 7 to state 6 transition. Note that the route to mCC-6 is from iCC-6 which does not involve such reversals.

– Concerning the MZ1 cluster, Ubx is expressed in these cells. Ubx is specifically expressed in lymph gland posterior lobes that are composed of hematopoietic progenitors expressing markers of MZ cells (Rodrigues et al., 2021). Altogether these data strongly suggest that MZ1 cells correspond to posterior lobe hemocytes. This has to be clarified.

This is addressed above in essential revision (3). The tertiary lobes were not included in this study.

– For cluster X, since DNA damage markers are expressed, this strongly suggests that this cluster might correspond to unhealthy cells damaged during the experiment. What about their ribosomal content (a criteria commonly used to check for cell health)? Is the molecular signature of cluster X found in bulk RNA seq and in the Cho et al., 2020 paper?

Part of our initial quality control in scRNA-Seq is excluding cells with especially high or low mitochondrial gene content (as a check for cell health and to exclude any damaged cells), and Cluster X cells show similar mitochondrial reads as other cells (Figure 1—figure supplement 2C). Cluster X cells also show similar levels of ribosomal content as other cell clusters (Figure 1—figure supplement 2D).

The genes we see expressed here were first identified by UV irradiating mammalian cells that are blocked in G2 until their DNA is repaired. Wieschaus amongst others has shown this also happens in normal cells in S-G2 transitions that are also transcriptionally active (Blythe and Wieschaus 2015) and display so called “replicative stress”.

As for comparison with the Cho et al. 2020 data, this signature is seen in what they refer to as the “GST-rich” cluster. X and “GST-rich” are both primary clusters that share a large number of differentially expressed genes (Supplementary file 2B) and are similar in size (1.14% vs 1.21%, respectively) (Supplementary file 2E). And when their shared genes are analyzed in GSEA, we find that genes involved in “cellular response to DNA damage stimulus” are strongly enriched in both (p-value=0.00004).

– What is the unit given for the Y axis in Figure 3? Why is the scale different from one graph to another and does not start always at zero? For all graphs in this manuscript the p value is missing, so the reader cannot not figure out whether the differences are statistically significant or not. Concerning the AUC analysis, how is the list of genes taken into account for a signalling pathway or a function that has been established? What kind of conclusion can be drawn from analyses regarding metabolism? In other words, considering the PSC as an example of a group of cells where glycolysis genes are highly expressed, what is the impact of this on PSC function, and how does potential glycolysis in the PSC help us to understand PSC function?

Please see above essential revision 1 for a discussion of AUC values, statistical significance, and p-values. Please see pages 21-22 for discussion of the impact of high glycolysis and low OxPhos on PSC function and pages 38-39 for discussion of what conclusions can be drawn from analyses regarding metabolism.

– Figure 3K: the authors need to define what CHIZ is. Since there is no staining overlap between MMP1 and CHIZ-GFP, the authors cannot conclude that MMP1 is expressed in the IZ. Furthermore, MMP1 transcripts are detected at high levels not only in the IZ but also in the MZ2, proPL and PL (Figure 3J). How can these results be reconciled with the expression profile of MMP1 shown in Figure 3K?

We have defined CHIZ-GAL4 (page 17) and included a reference to the paper that describes how this split GAL4 was constructed and validated (Spratford et al. 2020). We mention in the text (page 2526) that MMP1 is a secreted protein and so we expect to see MMP1 protein in neighboring cells. But all of this protein is altered by manipulation of JNK in CHIZ+ cells (Figure 3D-F’’). MMP1 expression is lost with loss of function and dramatically increased with gain of function of JNK in the CHIZ+ cells alone.

–Figure 4: Quantifications are missing. Crystal cell and MZ indexes have to be given.

Thanks for pointing this out. This has been addressed in essential revision (7).

– Figure 4G and H: That tep4 is expressed in a subset of MZ progenitors defined by dome>GFP is not new, this has been established previously. Please see Benmimoun et al. 2015, and Oyallon et al 2016.

We cite them for their work (page 26). Figure 4G-H (now Figure 4E-F) has a different purpose. Since IZ cells are positive for both HmlΔ and dome (authors’s note: in the context of CHIZ-GAL4), we needed to design QF and GAL4 constructs to demonstrate that some of these dome-positive, Tep4-negative cells are not IZ, i.e. there is a pre-IZ population beyond Tep4. This is a new finding and key to our interpretation of the experiments in Figure 4.

– What about the Pnt expression profile in the LG? Figure 4C-D: DomeMESO>pnt RNAi , in addition to a defect in blood cell differentiation, this leads to a smaller LG compared to the control. This defect in size has to be mentioned and quantified. Since the role of pnt in the MZ has been previously reported by Dragojlovic-Munther M et al , 2013, this paper has to be cited. Furthermore, the Dragojlovic-Munther M et al. study indicates that in addition to preventing hemocyte differentiation, pnt RNAi in the MZ leads to lamellocyte differentiation. Do lamellocytes differentiate when pnt is knocked down using domemeso, tep4 , CHIZ and hml gal4 drivers? In Figure 4F :Tep4>GFP>pntRNAi , GFP levels are decreased compared to the control. Does Pnt control the expression of the Tep4-gal4 driver? In the text, p35 line 886 "Pnt loss in MZ2.3" is an over interpretation, since no gal4 driver specific for this group of cells has been used to perform the lof experiment. p35 lines 903 "plasmatocytes to be converted into crystal cells" is erroneous, since hml-Gal4 expression is not restricted to mature plasmatocytes but is expressed both in plasmatocyte and crystal cell precursors. p35, lines 905-910 should be in the discussion, not in the result 'section.

The Dragojlovic-Munther and Martinez-Agosto 2013 is cited on page 26 (and was cited in the original submission). Unfortunately, that study did not provide the specific pntRNAi line used and there are aspects of that study that for us, needed revisiting as they do not reproduce in our hands. So we left it at that, citing them. We did not look for lamellocyte formation. The issue about MZ2-3 is now explained better (page 27). Loss of pnt has no effect on Tep4>GFP expression (Reviewer #2 comment 4). In regards to suggestion about line 903, we have replaced “plasmatocytes” with “HmlΔ-GAL4 positive cells” (page 28). The discussion item about RTK and Notch has been modified to reflect the pnt data (also in Reviewer #1 comment 1).

– Figure 5: Numb and crystal cells

The paper Cho et al 2020 has to be mentioned in the text, since it established previously by immunostaining that Numb is expressed in crystal cells.

Cho et al. 2020 showed numb::GFP expression in CCs, but did not note differences between protein and RNA expression in CC subsets. The difference in protein expression between iCC and mCC is the main point of this analysis. Nevertheless, we have now cited this observation (page 31).

– A previous study done in Banerjee's lab, reports on the role of Sima and Notch in crystal cell survival by preventing their dissociation (Mukherjee et al., 2011). In the present manuscript, no data and comments refer to crystal cell survival depending on Numb. Thus in the current version of the manuscript, it remains unclear as to what is the role of Numb in crystal cells. Does it control iCC to mCC, or is it required for survival of mature crystal cells? The confusion is sustained by sentences such as: "depletion of Numb prevents the maturation of iCC to the mCC state", please see p 39, lines 1000. Since Numb is detected at high levels in mCC and not in iCC, the function of Numb should be in mCC once they have matured. Furthermore, it has been previously published that Sima is required for crystal cell survival. A decrease in Sima levels is observed in Lz>numbRNAi conditions, supporting the proposition that CC depleted from Numb should disrupt since they lack Sima . In conclusion, the role of Numb in either crystal cell maturation (i. e., going from iCC to mcCC) or mCC survival has to be clarified.

We do not see a major contradiction between what we state and the reviewer’s comments. While the title of the Mukherjee et al. 2011 paper does not mention it, Figure S4 in that paper clearly shows in the model that Notch/Sima signaling is required for “late crystal cell maturation and survival”. The maturation data was not the focus of that paper but was clearly demonstrated and is now replicated and validated by both RNA-Seq and flow cytometry data. We chose to focus our study on the maturation aspect and not survival. This does not, in any way contradict the notion that if mCCs happen to form, sustained lack of the Notch/Sima signal will cause them to burst.

The role of Numb is schematized in the model (new Figure 7D). The observed loss of numb phenotype using the RNAi reagents (Figure 5O-T; Figure 5—figure supplement 3D-I) is fewer mCC and more iCC than in wild type (without a change in the overall number of CCs). Sima/Notch signaling in mCC cells turns on genes that then give those cells the mCC identity. So with severely attenuated Numb/Sima/Notch signaling are you an mCC that looks like an iCC or an iCC that never became an mCC? Beyond semantics, the data clearly show a decrease in the number of mCCs.

– Figure 5 sup2 : Quantification is missing. p 38 lines 979-981. The authors need to clarify whether there are talking about an increase in Numb levels per crystal cell, or an increase in Numb levels per LG, which in this case reflects an increase in the number of cell expressing Numb.

Numb quantitation has been clarified in the Methods (page 69; see also essential revisions 6).

– Numb subcellular localisation is different from one picture to another. In Figure 5M, Figure 5 sup2 , Figure 6A-c and 6 H-L', Numb is mainly localised at the periphery of the cell at the plasma membrane, whereas in Figure 5 sup 3A-B and sup 3F-G, Numb is detected as cytoplasmic punctate dots without staining at the plasma membrane. This is confusing and clarification is necessary.

These are different assays. The live endocytosis assay experiments with active internalization (Figure 5—figure supplement 3A-B) are quite different from conventional staining of fixed tissue (Figure 6A-C, H-L’; Figure 5—figure supplement 2A-C’’’), this likely accounts for the differences in Numb staining. The endocytic punctae are less prevalent but can still be seen (Figures 5M, N’) with conventional staining but the membrane staining is more prominent.

– Figure 5 sup 3: PPO2 staining shown in Figure 5 sup 3 K-L is not in agreement with the quantification given in M. p values are missing in M and N. There are also discrepancies between these figures and the text p 39: "numb RNAi expressed in crystal cells causes ...with a concomitant increase in the iCC population". Figure 5 sup 3N: Quantification of crystal cell numbers is not convincing. Since there is a lot of variation among LGs, quantification of PPO+ cells has to be given as an index (i.e., a ratio between the total number of PPO+ cells/ per total number of LG cells). In N, 6 LGs maximum per genotype have been analysed, which is far too few. Defining whether the number of crystal cells is affected in lz>numbRNAi is essential to determine whether Numb is required to allow iCC to mature into mCC, or whether it controls mCC disruption. The MM section has to be completed; it should indicate how quantifications of cell numbers and fluorescent intensity were performed.

Both the PPO2 staining in Figure 5—figure supplement 3F’’, G’’ and the quantification in H and I show that in a numbRNAi background, CCs with the highest levels of PPO2 (i.e. mCCs) are depleted (H) but the overall number of CCs is not changed (I). p-values are now included in H (p=0.0001) and I (not significant) to support the data. The effect of numbRNAi on CC formation is not based solely on staining but on three independent experiments all showing a decrease in mCCs (Sima punctae in Figure 5O-Q’ and Figure 5—figure supplement 3D-E’’; PPO2 levels in Figure 5R-S’ and Figure 5—figure supplement 3F-I; and lz>GFPHI cells in Figure 5T) with no change in total CC numbers (Figure 5T; Figure 5—figure supplement 3I). The flow cytometry analysis (Figure 5T) also shows a decrease in mCCs (GFPHI) and a concomitant increase in iCCs (GFPLO) with no decrease in total CC. All together these results fully support our model (Figure 7D). The methods section includes how quantifications of cell numbers and fluorescent intensities were performed (page 69).

– For Hnt staining in lz>numbRNAi, there is a discrepancy between Figure 5 sup 3 l-l" and Figure 5 sup 3 L-L". In I-I" no difference in Hnt levels compared to control (H-H) is observed, whereas in L-L" a strong decrease is observed (K-K").

As we note above (Reviewer #2, comment #5) and in the manuscript, the number of CCs does not change but the maturity does. Both Figure 5—figure supplement 3I-I” (now Figure 5—figure supplement 3E-E’’) and Figure 5 sup 3 L-L" (now Figure 5—figure supplement 3G-G’’’) show the same concept: that signs of CC maturity (Sima punctae in E-E’’ and PPO2 levels in G-G’”) decrease while the total number of Hnt+ CCs does not change. The Hnt staining in E’ and G’ were done with different fluorescent secondary antibodies (with different dynamic ranges) and so cannot be directly compared. The visual comparison between intensities of the fluorophores is trumped by the statistical data (Figure 5O; Figure 5—figure supplement 3H-I).

– The enlargements shown in Figure 5O-R have been taken from pictures shown in Figure 5 sup 3K-L. It would be better to show independent immunostainings. This remark is even more relevant in this case because the staining in Figure 5 sup 3 L is not convincing.

These data are quantified and the statistics shown (Figure 5O; Figure 5—figure supplement 3HI; p-values between 0.0001 and 0.0063), which should ease the reviewer’s concerns. We have indicated that images in Figure 5 sup 3K-L (now Figure 5—figure supplement 3D-G’’’) correspond to lower magnifications of Figure 5O-R (now Figure 5P-S’) in the figure legends. Showing an inset with high magnification has a different purpose and is a standard practice to help the reader hone into an area of interest in that particular picture and does not imply that we have only stained one lymph gland since the statistics are provided.

– Figure 5 sup 3 H-I: lz >numb RNAi there is a decrease in Sima staining. Is it due to a decrease in sima transcription and/or Sima protein stability and/or Sima subcellular localisation?

This question is beyond the scope of this manuscript but is an important suggestion to explore in future studies.

– Figure 6 J and M ; is "the Numb intensity" referring to the intensity of Numb per cell or the total amount of Numb intensity measured per LG? This has to be clarified in the figure, in the text (p 40) and mentioned in the MM. What about the crystal cell index in lz>msi RNAi and Hml>msRNAi?

Thanks for pointing this out. Some of the technical details of Numb intensity measurement are included in the results (page 33), the Methods (page 69), and Figure 6J, M legend, and also addressed in essential revisions (6). But to respond to the reviewer’s more important real concern, all intensities are measured on a “per cell” and not an overall LG basis.

The msiRNAi experiments focused on whether Numb protein levels changed. The expectation is that the phenotype on CC cells will reflect Numb levels, but a formal analysis of Msi’s effects on CC number will be explored in future studies.

– A schematic representation of crystal cell maturation involving N (both the canonical and non-canonical signalling), Numb and Sima would be very helpful.

Excellent idea! We have provided a schematic representation in Figure 7D.

– In MM p 74 lines 1888 "data was corrected for batch effects between samples". The information concerning the method used has to be provided.

Information concerning the method used for batch effects corrections has been provided in the Methods (page 66).

9. Modifications in the text are required

– Lines 1093 "the equilibrium signal from proPl". Functional data supporting this conclusion are lacking.

Functional data showing loss of equilibrium signaling (PvrRNAi) in HmlΔ-GAL4+ cells results in a strong phenotype (at an early time point) but no phenotype is seen when driven in CHIZ-GAL4+ cells (at any time point) are provided in new Figures 2J-L and Figure 2—figure supplement 2C (pages 17-18). Because the equilibrium signal initiates at a developmental timepoint (2nd instar) when PL cells are not present, we conclude that HmlΔ+ proPL cells participate in equilibrium signaling.

– Lines 1088 "JNK signalling ... is a specific property if IZ cells". Neither JNK expression nor its function has been analysed in this study.

We performed loss and gain of function analysis using mutant forms of JNK pathway members. MMP1 is a known target of JNK and we show that levels of MMP1 change upon manipulation of JNK signaling with loss of JNKK (hepRNAi) and activation of JNKK (hepACT) in CHIZ-GAL+ IZ cells (Figures 3D-F’’; pages 25-26). We also show that AP-1 complex (Fos/Jun) targets (which functions downstream of JNK) (including Mmp1 and GlcT, noted in the text on page 25) are also enriched in the IZ (Figure 1 figure supplement 3A; FDR=0.00243).

(Authors’ note: Following the detailed response and corresponding changes to the reviewers’ comments, the resubitted manuscript needed more clarifcation for the reviewers.)

[Editors' note: further revisions were suggested prior to acceptance, as described below.]

Essential revisions:

1. The authors should temper some conclusions because the conversion of AUC scores to z-scores can obscure the actual level of enrichment. Specifically, the authors should:

a. Soften the conclusion that proPL generates the equilibrium signal and that the IZ alone activates the JNK pathway as this is based on a low number of genes (lines 959-960). Please rephrase this statement.

On page 38, in the Discussion section, we have replaced the statement “For example, proPL (but not IZ) generates the equilibrium signal, whereas the IZ alone activates the JNK pathway” with the phrase “For example, together the genetic and RNA-Seq data suggest that proPL is likely a major source of the equilibrium signal, whereas IZ largely contributes to the JNK signal.”

b. Point out in the text that JNK activation is enriched in but not specific to the IZ.

The statement addressed in 1a. was from the Discussion Section, which we have now tempered. In the Results Section, we have not ever used the word “specific” in reporting on any transcriptomic data (as we explained in the previous rebuttal). We have consistently used the term “enriched'' since we agree that genes are not on/off in any one cell type.

This includes the results on JNK (page 25-26).

Also in results, we interpreted the genetics of the MMP1 data on the basis of its staining being completely eliminated in a CHIZ-GAL4, JNKK miRNA background and not based solely on the staining pattern of a secreted protein. Nevertheless, we have changed the phrasing.

Previously stated: "establishing the IPs as MMP1 producing cells"

Now changed to the phrase “suggesting the IPs are a source of MMP1” (page 26).

c. Clarify the in vivo definition of the IZ and the ProPL cells with respect to the Hml delta QF driver. The authors use it to specifically label the IZ but the driver's expression domain is broader than IZ cells.

We have added extra details to make this clearer (page 17).

Briefly, HmlΔ-GAL4 and HmlΔ-QF are identical in their expression and they fully overlap. What is different is CHIZ-GAL4 (in Spratford et al.), which is a split GAL4 that contains a combination of HmlΔ and domeMESO enhancers and includes the strong p65 activation domain. CHIZ-GAL4 expression colocalizes with the overlap between the directly driven Hml-DsRed and domeMESO-GFP reporters, and these Hml-DsRed, domeMESO-GFP double positive cells define the IZ in our bulk RNA-Seq. In the current manuscript, we demonstrate that CHIZ-GAL4, which marks IZ cells, and HmlΔ-QF, which marks proPL and PL cells, are non-overlapping (Figure 2H-I; Figure 2—figure supplement 2C-D). The reason is not fully clear, but is likely due to differences in timing or level of expression. We apologize if this was not stated clearly and we hope the new description is more useful (page 17).

2. The authors should indicate in Figure 3—figure supplement 3, 1D which AUC terms encompass "glycerolipid remodelling genes" (line 592)

We have included which AUC term encompasses “glycerolipid remodelling” in the AUC table (Supplementary file 4, row 23) as well as in the text (page 23-24) and the legends for Figure 3—figure supplement 1D and 1F (page 58-59).

3. The authors should rephrase "pnt transcript is expressed at low levels in the PSC (Figure 4A)". (Line 664). One interpretation of Figure 4A is that pnt is expressed at the same levels across the different clusters but in a smaller number of cells in the PSC (hence the smaller circle).

Yes, that is correct, and we have changed the sentence to “pnt transcript is expressed in very few cells in the PSC” (page 26).

4. The authors should provide publications for the enhancers in Figure 5—figure supplement 1 B,C,G or explain how they defined the enhancers.

These references are now included in the figure legends for Figure 5—figure supplement 1B-C and G (page 60).

5. The authors should remove the comma in line 795 ("we find, is").

We have removed this comma (page 31). (Author's Comment: Although it was grammatically correct with the comma).

6. The authors should avoid the term de-enriched/de-enrichment (lines 525, 549, 562, 836, 135, 1136, 1159, 1435, 1437, 1438) to indicate a change in the levels of expression.

Thank you. We have replaced these terms with “not enriched” (pages 21-22, 33, 45, and 58) or “absence of enrichment” (pages 22 and 46).

7. The authors should modify their conclusion (line 652) "establishing the IPs as MMP1 producing cells". The diffused MMP1 staining in Figure 3D-D' is not convincing.

As described above in point 1b, the MMP1 conclusion is based on the genetic data that all MMP1 expression is lost when the JNK pathway is knocked down in IZ cells in a CHIZ-GAL4; UAS-JNKK miRNA background.

Nevertheless, we have changed the previous phrase: "establishing the IPs as MMP1 producing cells" to the phrase “suggesting the IPs are a source of MMP1” (page 26).

8. The authors should address how they conclude (line 696) that "Pnt is required for exit from the IZ" since there is no defect in CC differentiation in CHIZ >pnt RNAi (Figure 4l-M). They should also comment on Pnt's role in plasmatocyte differentiation.

As explained (page 27), there are routes to CC that bypass IZ, which we demonstrated by showing normal CC formation despite the complete absence of IZ cells (or any other Hml+ cells) in a genetic background with pntRNAi driven in MZ using domeMESO-GAL4 (Figure 4B-D). We believe that these minor routes may become more dominant under mutant conditions (further elaborated on page 38-39). We have also added an additional detail to the text related to the effects of pnt loss of function on plasmatocyte differentiation. We now point out that Spratford et al. demonstrates that Pnt is required for exit from the IZ and plasmatocyte differentiation (page 27-28). Our conclusion is supported by the cited data (Spratford et al.) that shows the absence of plasmatocytes when pntRNAi is driven in IZ cells, but the data in this manuscript (Figure 4L-M) clearly shows CC differentiation still occurs in this background.

9. The authors should modify the introduction so that it includes a statement about the cardiac tube acting as a niche to control lymph gland hematopoiesis and supporting references.

OK, we have done so (page 3) and cited two Crozatier papers for this work.

10. The authors should add a sentence/short paragraph in the discussion saying that the proposed model/definition of states awaits functional confirmation, at least in some cases.

We have added this statement (page 37).

11. In the dot plots, the authors should indicate what is meant by 'mean' and 'non-zero percent'.

This has been included in the legends for Figure 2N (page 45), Figure 4A (page 47), Figure 5L (page 49), Figure 6E (page 51), Figure 2—figure supplement 1J-K (page 56), and Figure 2—figure supplement 2H (page 57). In these figure legends we say that for each of the populations shown in the dot plot, the color of the dot represents the mean expression level of a given gene (mean) and the dot size indicates the percent of cells that express that gene (non-zero percent).

12. In Figure 2H,I, the authors should show the whole lobe and indicate the IZ and the proPL with arrowheads.

This is now included in Figure 2—figure supplement 2C-D.

Suggested revisions:

1. It is suggested but not required that the authors provide a volcano plot to illustrate the differences between proPL and IZ.

Now included in Figure 2E and described in the text (page 17). We thank the reviewers for this suggestion, that has helped us visualize and better understand the differences between the two populations.

2. It is suggested but not required that the authors provide the single table that must have been generated to produce the figures of the first submission and used to calculate the z-scores.

The figures in the first submission were not based on a single table. Each figure is based on a separate table that includes the individual AUCell values for every individual cell identified in the scRNASeq along with each cell’s identity and that cell’s cluster, state, and cluster-state assignment. We therefore do not believe such a collection of tables will be helpful for the reader. But as we have included the full scRNA-Seq dataset, list of individual genes for each AUCell list, and the parameters we used for the AUC analysis, any interested researcher can reconstruct those graphs and any other analysis included in the original or re-submission. We believe that the way we now display the AUC data in heatmap format, which is a common practice for such analysis (e.g., Zhu et al. 2020; Blood 136 (7): 845–856), accurately portrays the most important aspects of the analysis which is the difference between groups and not the absolute values for the AUC lists (which we and the reviewer noted is highly dependent on the number of genes in the list, the number of genes identified in the scRNA-Seq, and the parameters set for the analysis). For example, taking mitotic G2/M genes as AUC, we find that the mean AUC score over all cells is 0.08 and the value for cluster X is 0.12 (other clusters in the 0.09 range). Looking at these data alone makes all clusters seem similar in value, and the absolute numbers are small. But the value for X is greater than 3 standard deviations (SD) away from the mean calculated for all cells taken together, while the other clusters are less than 1 SD away. The exact value of the AUC for any cluster varies from 0 to 1 and is not meaningful and cannot be compared with other AUC’s in the same table. Therefore, we think that the table would confuse the average reader, who will likely think that our conclusions are drawn from the absolute AUC scores or absolute differences between them.

https://doi.org/10.7554/eLife.67516.sa2

Article and author information

Author details

  1. Juliet R Girard

    Department of Molecular, Cell and Developmental Biology, University of California, Los Angeles, Los Angeles, United States
    Contribution
    Conceptualization, Data curation, Formal analysis, Funding acquisition, Investigation, Methodology, Project administration, Resources, Supervision, Validation, Visualization, Writing - original draft, Writing – review and editing
    Contributed equally with
    Lauren M Goins and Dung M Vuu
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0001-8274-2136
  2. Lauren M Goins

    Department of Molecular, Cell and Developmental Biology, University of California, Los Angeles, Los Angeles, United States
    Contribution
    Conceptualization, Data curation, Formal analysis, Funding acquisition, Investigation, Methodology, Project administration, Resources, Supervision, Validation, Visualization, Writing - original draft, Writing – review and editing
    Contributed equally with
    Juliet R Girard and Dung M Vuu
    Competing interests
    No competing interests declared
  3. Dung M Vuu

    Department of Molecular, Cell and Developmental Biology, University of California, Los Angeles, Los Angeles, United States
    Contribution
    Data curation, Formal analysis, Investigation, Methodology, Resources, Software, Validation, Visualization, Writing - original draft, Writing – review and editing
    Contributed equally with
    Juliet R Girard and Lauren M Goins
    Competing interests
    No competing interests declared
  4. Mark S Sharpley

    Department of Molecular, Cell and Developmental Biology, University of California, Los Angeles, Los Angeles, United States
    Contribution
    Formal analysis, Methodology, Resources, Writing – review and editing
    Competing interests
    No competing interests declared
  5. Carrie M Spratford

    Department of Molecular, Cell and Developmental Biology, University of California, Los Angeles, Los Angeles, United States
    Contribution
    Investigation, Resources, Writing – review and editing
    Competing interests
    No competing interests declared
  6. Shreya R Mantri

    Department of Molecular, Cell and Developmental Biology, University of California, Los Angeles, Los Angeles, United States
    Contribution
    Investigation, Resources
    Competing interests
    No competing interests declared
  7. Utpal Banerjee

    1. Department of Molecular, Cell and Developmental Biology, University of California, Los Angeles, Los Angeles, United States
    2. Molecular Biology Institute, University of California, Los Angeles, Los Angeles, United States
    3. Department of Biological Chemistry, University of California, Los Angeles, Los Angeles, United States
    4. Eli and Edythe Broad Center of Regenerative Medicine and Stem Cell Research, University of California, Los Angeles, Los Angeles, United States
    Contribution
    Conceptualization, Formal analysis, Funding acquisition, Project administration, Resources, Supervision, Writing – review and editing
    For correspondence
    banerjee@mbi.ucla.edu
    Competing interests
    Reviewing editor, eLife
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0001-6247-0284

Funding

National Heart, Lung, and Blood Institute (R01-HL067395)

  • Utpal Banerjee

National Cancer Institute (R01-CA217608)

  • Utpal Banerjee

National Heart, Lung, and Blood Institute (T32-HL69766)

  • Juliet R Girard

National Institute of General Medical Sciences (K12-GM106996)

  • Juliet R Girard

National Cancer Institute (T32-CA009056)

  • Lauren M Goins

National Heart, Lung, and Blood Institute (T32-HL863458)

  • Carrie M Spratford

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Acknowledgements

The authors thank Fangtao Chi for engineering the CHIZ-GAL4 construct used in this study, and past and present members of the laboratory for their help and advice. The authors gratefully acknowledge FlyBase; the Bloomington Drosophila Stock Center; the Vienna Drosophila Resource Center; the Kyoto Stock Center, and the fly community including Sougata Roy, Pablo Wappner, Dalmiro Blanco-Obregon, Katja Brückner, Tsunaki Asano, Yuh Nung Jan, Dirk Bohmann, and Andrea Page-McCaw for reagents. The authors acknowledge the help of the Broad Stem Cell Research Center (BSCRC) and Owen Witte for help and support, the MCDB/BSCRC Core Facility in Microscopy, and the BSCRC core in Flow Cytometry, particularly Felicia Codrea, Jessica Scholes, and Jeffrey Calimlim for help with cell sorting. The authors thank the UCLA TCGB center, Xinmin Li, and Michael Mashock for help in sequencing. The authors acknowledge the Partek Flow technical support team and in particular Xiaowen Wang’s help, which was crucial for data analysis. The authors thank undergraduate research scholars Peiliang Zhou, Chloe Su, and Khoi Luc for their contributions. The authors thank David Eisenberg and Vy Phan Lai for their support of DMV through the Center for Global Mentoring. UB is supported by National Institutes of Health grants R01 HL-067395 and R01 CA-217608; JRG by Ruth L Kirschstein National Research Service Award number T32HL69766 and UPLIFT (UCLA Postdocs’ Longitudinal Investment in Faculty) Award number K12GM106996; LMG by Ruth L Kirschstein Institutional National Research Service Award number T32CA009056 and by National Heart, Lung, and Blood Institute of the National Institutes of Health under award number 3R01HL067395-16S1; DMV by the Center for Global Mentoring at UCLA-DOE Institute for Genomics & Proteomics; and CMS by Ruth L Kirschstein National Research Service Award number T32HL863458.

Senior Editor

  1. Anna Akhmanova, Utrecht University, Netherlands

Reviewing Editor

  1. Erika A Bach, New York University School of Medicine, United States

Reviewer

  1. Erika A Bach, New York University School of Medicine, United States

Publication history

  1. Preprint posted: February 13, 2021 (view preprint)
  2. Received: February 13, 2021
  3. Accepted: October 22, 2021
  4. Accepted Manuscript published: October 29, 2021 (version 1)
  5. Version of Record published: November 23, 2021 (version 2)

Copyright

© 2021, Girard et al.

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.

Metrics

  • 528
    Page views
  • 163
    Downloads
  • 0
    Citations

Article citation count generated by polling the highest count across the following sources: Crossref, PubMed Central, Scopus.

Download links

A two-part list of links to download the article, or parts of the article, in various formats.

Downloads (link to download the article as PDF)

Download citations (links to download the citations from this article in formats compatible with various reference manager tools)

Open citations (links to open the citations from this article in various online reference manager services)

Further reading

    1. Developmental Biology
    2. Neuroscience
    David Sokolov et al.
    Research Article

    Despite mounting evidence that the mammalian retina is exceptionally reliant on proper NAD+ homeostasis for health and function, the specific roles of subcellular NAD+ pools in retinal development, maintenance, and disease remain obscure. Here, we show that deletion of the nuclear-localized NAD+ synthase nicotinamide mononucleotide adenylyltransferase-1 (NMNAT1) in the developing murine retina causes early and severe degeneration of photoreceptors and select inner retinal neurons via multiple distinct cell death pathways. This severe phenotype is associated with disruptions to retinal central carbon metabolism, purine nucleotide synthesis, and amino acid pathways. Furthermore, transcriptomic and immunostaining approaches reveal dysregulation of a collection of photoreceptor and synapse-specific genes in NMNAT1 knockout retinas prior to detectable morphological or metabolic alterations. Collectively, our study reveals previously unrecognized complexity in NMNAT1-associated retinal degeneration and suggests a yet-undescribed role for NMNAT1 in gene regulation during photoreceptor terminal differentiation.

    1. Developmental Biology
    2. Neuroscience
    Nishtha Ranawat, Ichiro Masai
    Research Article

    Microglia are brain-resident macrophages that function as the first line of defense in brain. Embryonic microglial precursors originate in peripheral mesoderm and migrate into the brain during development. However, the mechanism by which they colonize the brain is incompletely understood. The retina is one of the first brain regions to accommodate microglia. In zebrafish, embryonic microglial precursors use intraocular hyaloid blood vessels as a pathway to migrate into the optic cup via the choroid fissure. Once retinal progenitor cells exit the cell cycle, microglial precursors associated with hyaloid blood vessels start to infiltrate the retina preferentially through neurogenic regions, suggesting that colonization of retinal tissue depends upon the neurogenic state. Along with blood vessels and retinal neurogenesis, IL34 also participates in microglial precursor colonization of the retina. Altogether, CSF receptor signaling, blood vessels, and neuronal differentiation function as cues to create an essential path for microglial migration into developing retina.