Overview of the TaG-EM system.

A) Detailed view of the 3’ UTR of the TaG-EM constructs showing the position of the 14 bp barcode sequence (green highlight) relative to the polyadenylation signal sequences (underlined) and polyA cleavage sites (purple highlights). The pJFRC12 backbone schematic is modified from Pfeiffer et al., 2010. B) Schematic illustrating the design of the TaG-EM constructs, in which a barcode sequence is inserted into the 3’ UTR of a UAS-GFP construct that is then integrated into a specific genomic locus using PhiC31 integrase. C) Use of TaG-EM barcodes for sequencing-based population behavioral assays. D) Use of TaG-EM barcodes expressed with tissue-specific Gal4 drivers to label cell populations in vivo prior to cell isolation and single-cell sequencing.

Structured pool tests.

A) Overview of the construction of the structured pools for assessing the quantitative accuracy of TaG-EM barcode measurements. Male and female even pools were constructed and extracted in triplicate. The table shows the number of flies that were pooled for each experimental condition. B) Barcode abundance data for three independent replicates of the female even pool. C) Barcode abundance data for three independent replicates of the male even pool. D) Barcode abundance data for the female staggered pool. Inset plot shows the average observed barcode abundance among lines pooled at each level compared to the expected abundance. E) Barcode abundance data for the male staggered pool. Inset plot shows the average observed barcode abundance among lines pooled at each level compared to the expected abundance. For all plots, bars indicate the mean barcode abundance across three technical replicates of each pool; error bars are +/− S.E.M.
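The comparison of observed with expected barcode abundance in these pools can be sketched in a few lines of Python. This is a minimal illustration, not the authors' analysis code: it assumes that the expected abundance of a line is its share of the flies in the pool, that the observed abundance is the barcode's share of sequencing reads in one replicate library, and that S.E.M. is the sample standard deviation divided by the square root of the number of replicates. The function names, barcode labels, and counts below are hypothetical.

```python
import math
from statistics import mean, stdev


def expected_fractions(fly_counts):
    """Expected abundance of each barcode, assuming reads scale with the number of flies pooled."""
    total = sum(fly_counts.values())
    return {bc: n / total for bc, n in fly_counts.items()}


def observed_fractions(read_counts):
    """Convert raw barcode read counts for one replicate into fractional abundances."""
    total = sum(read_counts.values())
    return {bc: n / total for bc, n in read_counts.items()}


def mean_sem(values):
    """Mean and standard error of the mean (sample SD divided by sqrt(n))."""
    m = mean(values)
    sem = stdev(values) / math.sqrt(len(values)) if len(values) > 1 else 0.0
    return m, sem


def summarize_replicates(replicates):
    """Per-barcode mean +/- S.E.M. of fractional abundance across replicate libraries."""
    fractions = [observed_fractions(r) for r in replicates]
    barcodes = sorted(set().union(*fractions))
    # A barcode missing from a replicate is treated as zero abundance in that replicate.
    return {bc: mean_sem([f.get(bc, 0.0) for f in fractions]) for bc in barcodes}


# Hypothetical staggered pool: flies per line and read counts for three replicates.
pool = {"BC1": 5, "BC2": 10, "BC3": 20}
replicates = [
    {"BC1": 1400, "BC2": 2900, "BC3": 5700},
    {"BC1": 1500, "BC2": 2700, "BC3": 5800},
    {"BC1": 1300, "BC2": 3000, "BC3": 5700},
]
expected = expected_fractions(pool)
for bc, (m, sem) in summarize_replicates(replicates).items():
    print(f"{bc}: observed {m:.3f} +/- {sem:.3f}, expected {expected[bc]:.3f}")
```

Working with fractions rather than raw read counts makes libraries sequenced to different depths directly comparable.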

TaG-EM barcode-based behavioral measurements.

A) TaG-EM barcode lines in either a wild-type or norpA background were pooled and tested in a phototaxis assay. After 30 seconds of light exposure, flies in the tubes facing the light or dark side of the chamber were collected, DNA was extracted, and TaG-EM barcodes were amplified and sequenced. Barcode abundance values were scaled to the number of flies in each tube and used to calculate a preference index (P.I.). Average P.I. values for four different TaG-EM barcode lines in both the wild-type and norpA backgrounds are shown (n=3 biological replicates, error bars are +/− S.E.M.). B) The same eight lines used for the sequencing-based TaG-EM barcode measurements were independently tested in the phototaxis assay, and manually scored videos were used to calculate a P.I. for each genotype. Average P.I. values for each line are shown (n=3 biological replicates, error bars are +/− S.E.M.) for TaG-EM-based quantification (top) and manual video-based quantification (bottom). C) Flies carrying different TaG-EM barcodes were collected and aged for one to four weeks, after which eggs were collected and egg number and viability were manually scored for each line. In parallel, the barcoded flies from each timepoint were pooled, eggs were collected and aged, DNA was extracted, and TaG-EM barcodes were amplified and sequenced. Average number of viable eggs per female (manual counts) and average barcode abundance are shown as both a bar plot and a scatter plot (n=3 biological replicates for 3 barcodes per condition, error bars are +/− S.E.M.).
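One way to turn barcode abundance into a per-line P.I. is sketched below. It assumes, as an illustration rather than the authors' exact formula, that the number of flies of a given line in a tube is estimated as that barcode's fractional read abundance multiplied by the number of flies in the tube, and that P.I. = (light − dark) / (light + dark), so that +1 means every fly of that line chose the light side and −1 means every fly chose the dark side. The function names and example values are hypothetical.

```python
def estimated_flies(barcode_fraction, flies_in_tube):
    """Scale a barcode's fractional read abundance to an estimated fly count for one tube."""
    return barcode_fraction * flies_in_tube


def preference_index(light, dark):
    """P.I. = (light - dark) / (light + dark); ranges from -1 (all dark) to +1 (all light)."""
    total = light + dark
    if total == 0:
        raise ValueError("no flies of this line were recovered from either side")
    return (light - dark) / total


def barcode_pi(frac_light, n_light, frac_dark, n_dark):
    """P.I. for one barcoded line from its abundance in the light-side and dark-side tubes."""
    light = estimated_flies(frac_light, n_light)
    dark = estimated_flies(frac_dark, n_dark)
    return preference_index(light, dark)


# Hypothetical replicate: the barcode makes up 12% of reads from a light-side tube
# holding 60 flies and 3% of reads from a dark-side tube holding 40 flies.
print(round(barcode_pi(0.12, 60, 0.03, 40), 3))
```

Scaling by the number of flies per tube matters because each tube is sequenced as a separate library, so raw read fractions from the two sides are not on the same scale.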

TaG-EM barcode-based quantification of larval gut motility.

Schematics depicting A) manual and B) TaG-EM-based assays for quantifying food transit time in Drosophila larvae. C) Transit time of a food bolus in the presence and absence of caffeine measured using the manual assay (p = 0.0340). D) Transit time of a food bolus in the presence and absence of caffeine measured using the TaG-EM assay (p = 0.0488). A modified Chi-squared method was used for statistical testing (Hristova and Wimley, 2023).

Gal4 driven expression of GFP from TaG-EM lines.

A) Comparison of endogenous GFP expression and GFP antibody staining in the wing imaginal disc for the original pJFRC12 construct inserted in the attP2 landing site and for a TaG-EM barcode line, each driven by dpp-Gal4. Wing discs are counterstained with DAPI. B) Endogenous expression of GFP from either a TaG-EM barcode construct (left column), a hexameric GFP construct (middle column), or a line carrying both a TaG-EM barcode construct and a hexameric GFP construct (right column) driven by the indicated gut driver line (PMG-Gal4: Pan-midgut driver; EC-Gal4: Enterocyte driver; EE-Gal4: Enteroendocrine driver; EB-Gal4: Enteroblast driver).

Expression of TaG-EM genetic barcodes in larval intestinal cell types.

A) UMAP plot of Drosophila larval gut cell types. B) Annotation of cells associated with a TaG-EM barcode across all 8 multiplexed experimental conditions using data from the gene expression library and an enriched TaG-EM barcode library. C) Annotated enteroblast cells. D) Presence of TaG-EM barcode (BC6) driven by the EB-Gal4 line using data from the gene expression library and an enriched TaG-EM barcode library. Gene expression levels of enteroblast marker genes E) esg, F) klu. G) Annotated enterocyte cells. H) Presence of TaG-EM barcode (BC4) driven by the EC-Gal4 line using data from the gene expression library and an enriched TaG-EM barcode library. Gene expression levels of enterocyte marker genes I) betaTry, J) Jon99Ciii. K) Annotated enteroendocrine cells. L) Presence of TaG-EM barcode (BC9) driven by the EE-Gal4 line using data from the gene expression library and an enriched TaG-EM barcode library. Gene expression levels of enteroendocrine cell marker genes M) Dh31, N) IA-2.

Sanger sequencing identification of TaG-EM barcode lines.

A) Summary of barcode pool injections, showing each barcode sequence and the identifier of the transgenic vial in which it was identified. B) Sanger sequencing-based confirmation of the barcode sequence and PCR handle in TaG-EM transgenic lines. Because the TaG-EM barcode constructs were injected as a pool of 29 purified plasmids, some transgenic lines carried inserts of the same construct. In total, 20 unique lines were recovered from this round of injection.

Optimization of TaG-EM barcode amplification.

A) Gels showing bands produced when amplifying DNA from TaG-EM flies or from a wild-type control with the indicated polymerase, annealing temperature, and primer pair (short = B2_3’F1_Nextera/SV40_pre_R_Nextera; long = B2_3’F1_Nextera/SV40_post_R_Nextera). B-E) Mean error (root mean squared deviation, R.M.S.D., from the expected value) for the even pool amplified with the indicated primer set, input amount, and cycle number using KAPA HiFi polymerase (n=3, error bars are +/− S.E.M.). F-G) Mean error (R.M.S.D. from the expected value) for the staggered pool amplified with the indicated primer set, input amount, and cycle number using KAPA HiFi polymerase (n=3, error bars are +/− S.E.M.).
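The R.M.S.D. error metric can be illustrated with the following Python sketch. It assumes, since the legend does not spell this out, that the deviation is computed over the fractional abundances of all barcodes in a pool within one replicate library and that the per-replicate values are then averaged. Function names and values are hypothetical.

```python
import math


def rmsd(observed, expected):
    """Root mean squared deviation of observed from expected abundance across barcodes."""
    if observed.keys() != expected.keys():
        raise ValueError("observed and expected must cover the same barcodes")
    squared = [(observed[bc] - expected[bc]) ** 2 for bc in expected]
    return math.sqrt(sum(squared) / len(squared))


def mean_rmsd(replicates, expected):
    """Average the per-replicate R.M.S.D., e.g. over n=3 libraries for one condition."""
    values = [rmsd(r, expected) for r in replicates]
    return sum(values) / len(values)


# Hypothetical even pool of four barcodes (expected 0.25 each) and two replicate libraries.
expected = {"BC1": 0.25, "BC2": 0.25, "BC3": 0.25, "BC4": 0.25}
replicates = [
    {"BC1": 0.27, "BC2": 0.24, "BC3": 0.26, "BC4": 0.23},
    {"BC1": 0.22, "BC2": 0.28, "BC3": 0.25, "BC4": 0.25},
]
print(mean_rmsd(replicates, expected))
```

Because deviations are squared before averaging, a single badly over- or under-represented barcode raises the error more than several small deviations of the same total size.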

Coefficient of variation for TaG-EM structured pools.

Plot showing coefficient of variation for different groups of TaG-EM barcodes in the structured pools. Dashed line indicates the mean coefficient of variation across all conditions.
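The coefficient of variation (CV) is the standard deviation divided by the mean, which lets groups with very different abundance levels be compared on one scale. The Python sketch below assumes that a "group" is a set of abundance values, for example the barcodes pooled at the same level in a staggered pool; how the groups were actually defined is not specified here, and the group labels and values are hypothetical.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """CV = sample standard deviation / mean for one group of abundance values."""
    m = mean(values)
    if m == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return stdev(values) / m


def mean_cv(groups):
    """Average CV across groups, analogous to a single reference line across conditions."""
    cvs = [coefficient_of_variation(v) for v in groups.values()]
    return sum(cvs) / len(cvs)


# Hypothetical groups: abundances of barcodes pooled at the same level.
groups = {
    "1 fly": [0.009, 0.012, 0.010],
    "5 flies": [0.048, 0.052, 0.050],
}
for name, values in groups.items():
    print(name, round(coefficient_of_variation(values), 3))
print("mean CV", round(mean_cv(groups), 3))
```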

Oviposition tests with TaG-EM barcode lines.

Plots showing mean TaG-EM barcode abundance for adult females used in oviposition experiments (top) and eggs collected from these females (bottom). Data from two independent trials is shown (n=3 for each trial, error bars are +/− S.E.M.). Dashed lines indicate the expected abundance values.

Fecundity data for individual TaG-EM lines.

Manually collected data for mean number of viable eggs per female, barcode abundance data, and barcode abundance data normalized to adult fly barcode data for each of the TaG-EM barcode lines used in the age-dependent fecundity experiment. Scatterplots show correlations between manually collected data and barcode sequencing results. Data from two independent trials is shown (n=3 for each trial, error bars are +/− S.E.M.).
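The normalization and correlation steps can be sketched in Python as follows. This assumes, as an illustration only, that normalization is the ratio of a line's egg barcode abundance to its adult barcode abundance, which corrects for unequal numbers of females of each line in the pool. Pearson's r is used here as an example correlation statistic; the legend does not state which statistic was used. Function names and values are hypothetical.

```python
import math


def normalize_to_adults(egg_abundance, adult_abundance):
    """Divide each line's egg barcode abundance by its adult barcode abundance."""
    return {bc: egg_abundance[bc] / adult_abundance[bc] for bc in egg_abundance}


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical values for three lines: manual viable eggs per female,
# egg barcode abundance, and adult female barcode abundance.
manual = {"BC1": 12.0, "BC2": 8.5, "BC3": 4.0}
eggs = {"BC1": 0.45, "BC2": 0.35, "BC3": 0.20}
adults = {"BC1": 0.34, "BC2": 0.33, "BC3": 0.33}
normalized = normalize_to_adults(eggs, adults)
lines = sorted(manual)
print(pearson_r([manual[bc] for bc in lines], [normalized[bc] for bc in lines]))
```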

Average age-dependent fecundity data for Trial 1.

Average number of viable eggs per female (manual counts) and average barcode abundance are shown both as a bar plot and scatter plot (n=3 biological replicates for 3 barcodes per condition, error bars are +/− S.E.M.). Data from Trial 2 is shown in Figure 3C.

Larval gut motility assay parameters.

A) Images of larvae fed with blue-dyed yeast agar. B) Effect of dye concentration on food transit time. C) Effect of starvation time on feeding and uptake of the dyed food bolus. D) Effect of liquid versus solid diet on food transit time. E) Aversive effect of caffeine on food bolus uptake.

Cost comparisons for manual and TaG-EM gut motility assays.

A) Cost per data point as a function of the number of data points generated and the number of experimental conditions screened. B) Overall experiment cost and C) labor effort as a function of the number of data points generated and the number of experimental conditions screened.

Expression driven by dpp-Gal4 for 20 TaG-EM lines.

GFP antibody staining in the wing imaginal disc for the indicated TaG-EM barcode line driven by dpp-Gal4. Wing discs are counterstained with DAPI.

TaG-EM line GFP expression driven by different Gal4 drivers.

A) Comparison of endogenous GFP expression in larvae for the original pJFRC12 construct inserted in the attP2 landing site (left) or for a TaG-EM barcode line (right) expressed under the control of the indicated driver line. B) GFP expression driven by the PC-Gal4 (Precursor-Gal4) driver line in combination with either UAS-2xGFP alone or UAS-2xGFP plus a TaG-EM barcode line.

Dissociated intestinal cell viability.

A) GFP expression visualized in dissociated cells from gut driver lines crossed to a hexameric GFP line and a TaG-EM line. B) Proportion of live (left) and dead (right) cells after isolation and flow sorting, as assessed by GFP expression and propidium iodide staining.

BD FACSDiva 8.0.1 gating for sorted cells.

A) GFP gating created by analyzing a pool of GFP positive and negative cells. B) Flow gating for Drosophila gut cells with TaG-EM GFP expression driven in intestinal precursor cells (PC-Gal4) and enterocytes (EC-Gal4).

Expression of TaG-EM genetic barcodes in larval intestinal precursor cells.

UMAP plots showing gene expression levels of A) enteroblast/ISC marker genes esg, klu, and E(spl)mbeta-HLH; and B) the TaG-EM barcodes 7, 8, and 9 driven by the PC-Gal4 line.

BD FACSDiva 8.0.1 gating for sorted cells.

A) Dead cell gating created by staining the sample with propidium iodide (PI). B) Flow gating for Drosophila gut cells with TaG-EM and hexameric GFP expression driven by the pan-midgut, enteroblast, enterocyte, enteroendocrine, and precursor cell drivers.

TaG-EM-based doublet identification.

UMAP plots before doublet removal showing A) doublets uniquely identified by DoubletFinder, B) all doublets identified by DoubletFinder, C) doublets uniquely identified by TaG-EM barcodes, D) all doublets identified by TaG-EM barcodes, E) doublets identified by both TaG-EM and DoubletFinder, F) Venn diagram of overlap between doublets identified by TaG-EM and DoubletFinder.
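The logic behind barcode-based doublet calling can be sketched in Python. A droplet that captured two cells from differently barcoded samples yields reads from two distinct TaG-EM barcodes, so such cells can be flagged; a doublet of two cells carrying the same barcode cannot be detected this way. The sketch assumes a per-cell table of TaG-EM UMI counts and a minimum UMI count per barcode (`min_umis`, a hypothetical parameter whose value here is arbitrary); the set operations correspond to the regions of the Venn diagram. Cell IDs and counts are hypothetical.

```python
def tagem_doublets(cell_barcode_umis, min_umis=2):
    """Cells carrying two or more distinct TaG-EM barcodes, each with >= min_umis UMIs."""
    doublets = set()
    for cell, counts in cell_barcode_umis.items():
        detected = [bc for bc, n in counts.items() if n >= min_umis]
        if len(detected) >= 2:
            doublets.add(cell)
    return doublets


def compare_doublet_calls(tagem, doubletfinder):
    """Partition doublet calls into the three regions of a two-set Venn diagram."""
    return {
        "tagem_only": tagem - doubletfinder,
        "doubletfinder_only": doubletfinder - tagem,
        "both": tagem & doubletfinder,
    }


# Hypothetical per-cell UMI counts for TaG-EM barcodes.
cells = {
    "AAACCTG-1": {"BC4": 5},
    "AAACGGG-1": {"BC4": 3, "BC9": 4},
    "AAAGATG-1": {"BC6": 6, "BC9": 1},  # BC9 falls below min_umis, so not a doublet
}
calls = compare_doublet_calls(tagem_doublets(cells), {"AAACGGG-1", "AAAGTAG-1"})
print({k: sorted(v) for k, v in calls.items()})
```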

Clustering and automated annotation.

A) UMAP plots clustered at different resolutions. B) Clustree analysis of the effect of clustering resolution. C) Automated cell type annotation using data from the Fly Cell Atlas.

Expression of TaG-EM genetic barcodes in larval intestinal cell types.

A) UMAP plot of Drosophila larval gut cell types. B) Annotation of cells associated with a TaG-EM barcode across all 8 multiplexed experimental conditions using data from the gene expression library only. C) Annotated enteroblast cells. D) Expression level of TaG-EM barcode (BC6) driven by the EB-Gal4 line using data from the gene expression library only. Gene expression levels of enteroblast marker genes E) esg, F) klu. G) Annotated enterocyte cells. H) Expression level of TaG-EM barcode (BC4) driven by the EC-Gal4 line using data from the gene expression library only. Gene expression levels of enterocyte marker genes I) betaTry, J) Jon99Ciii. K) Annotated enteroendocrine cells. L) Expression level of TaG-EM barcode (BC9) driven by the EE-Gal4 line using data from the gene expression library only. Gene expression levels of enteroendocrine cell marker genes M) Dh31, N) IA-2.

Optimizing amplification of the TaG-EM barcode library.

A) Workflow: single-cell capture; cDNA amplification with an added spike-in primer for the TaG-EM library; SPRI size-selection clean-up; and one or more PCRs to create the sequencing library. B) Spike-in primers and amplification primers used to enrich TaG-EM barcodes. The table summarizes the different protocols tested to amplify the TaG-EM barcodes and create an enriched sequencing library. C) Percentage of on-target reads for each enriched TaG-EM barcode library.

Performance of the enriched TaG-EM barcode library.

A) Proportion of cells with at least one barcode read assigned as a function of read depth for the enriched TaG-EM barcode library. The dashed line indicates the percentage of cells with TaG-EM barcodes detected in the gene expression library. B) Number of unique UMIs observed as a function of read depth. C) Correlation between barcodes detected in the gene expression (GEX) library and the enriched TaG-EM barcode library as a function of the purity of TaG-EM barcode assignment to the corresponding cell barcode. The dashed line indicates the threshold used for TaG-EM barcode calling in the enriched TaG-EM barcode library.
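Purity-based barcode calling can be sketched in Python as follows. Here purity is taken to mean the fraction of a cell's TaG-EM UMIs that belong to its most abundant barcode, and a barcode is called only if that fraction reaches a threshold. The threshold actually used is not given in this legend, so `min_purity=0.8` is a hypothetical placeholder, as are the function names and example counts.

```python
def assign_barcode(umi_counts, min_purity=0.8, min_total=1):
    """Return the dominant TaG-EM barcode for one cell, or None if absent or ambiguous.

    Purity is the fraction of the cell's TaG-EM UMIs that belong to its top barcode.
    With min_purity > 0.5, a tie between two barcodes can never pass the threshold.
    """
    total = sum(umi_counts.values())
    if total == 0 or total < min_total:
        return None
    top_bc, top_n = max(umi_counts.items(), key=lambda kv: kv[1])
    return top_bc if top_n / total >= min_purity else None


def fraction_assigned(cells, **kwargs):
    """Proportion of cells that receive a barcode call."""
    calls = [assign_barcode(counts, **kwargs) for counts in cells.values()]
    return sum(call is not None for call in calls) / len(calls)


# Hypothetical per-cell TaG-EM UMI counts.
cells = {
    "cell1": {"BC4": 9, "BC9": 1},  # purity 0.9, called BC4
    "cell2": {"BC4": 6, "BC9": 4},  # purity 0.6, left unassigned
    "cell3": {},                    # no TaG-EM reads
    "cell4": {"BC6": 3},            # purity 1.0, called BC6
}
print(fraction_assigned(cells))
```

Raising the purity threshold trades fewer assigned cells for higher confidence in each call.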

Expression of the PMG-Gal4 driven TaG-EM barcodes.

UMAP plots showing expression of the four PMG-Gal4 driven TaG-EM barcodes (BC1, BC2, BC3, and BC7) either A) in aggregate or B) individually.

Characterization of Gal4 line expression in the larval gut.

A) Confocal images of third instar midguts showing Gal4-driven fluorophore expression (GFP or mCherry) and comparison with immunostainings of the gut cell markers Prospero (enteroendocrine), Pdm1 (enterocyte) and Esg-GFP (progenitor cell). For each image, Z projections of the stacks recorded along the length of the midgut were manually stitched together. B) Representative single-frame confocal images of a small region of the midgut showing immunostainings of the different gut cell markers and the Gal4-driven fluorophores. Quantification of overlapping and non-overlapping expression between the Gal4-driven fluorophore and the cell type marker in the anterior (A), middle (M), and posterior (P) regions for C) enteroendocrine cells (EE-Gal4), D) enterocytes (EC-Gal4), and E) precursor cells (PC-Gal4). Five specimens were examined for each Gal4 line. For the enterocyte-specific driver, only the anterior and middle regions were analyzed because the driver is largely inactive in the posterior midgut.