Improved cell barcode readout enables larger-scale pooled screens.

a, Stages of a pooled screen for high-dimensional image-based phenotypes. b, Schematic of the proximally barcoded sgRNA expression cassette. Minimizing the spacing between guide and barcode sequences reduces lentivirus template switching and simplifies cloning. c, Molecular biology workflow for reading out DNA barcodes from single cells, grouped into two stages: (i) cell fixation and in situ barcode amplification and (ii) in situ sequencing (ISS). Asterisks mark aspects of the protocol for which key optimizations were made in this work. d, Number of amplicons (y-axis) and mean fluorescence intensity (x-axis) of barcode readout from each of 113 molecular inversion probes (MIPs, gray circles), tested in a pooled experiment. e, Representative ISS images resulting from amplification protocols from reference 16 (top) and this work (bottom), performed side-by-side. Nuclei (blue, Hoechst 33342 stain) and rolling circle amplicons that incorporated fluorescently labeled nucleotides (spots in yellow, cyan, magenta, and white) are shown. f,g, Quantification of the mean number of amplicons per cell (f) and fluorescence intensity of sequencing signal (g) resulting from changes to the in situ amplification protocol. The first and fourth conditions correspond to the upper and lower images in (e), respectively.

A pooled screen for 366 genes regulating cell morphology in over one million cells.

a, Experimental workflow of the pooled screen. b, Phenotyping image (upper left) and genotyping images over nine rounds of sequencing for two example cells. c, Full phenotyping and genotyping fields of view (FOVs) at 20x and 10x magnification, respectively, aligned and overlaid. Dashed box marks the region shown in (b). d, Distribution of edit distance to a designed barcode for each of the 38 million in situ sequencing (ISS) reads. e, Histogram of the number of ISS reads per cell. f, Histogram of the number of cells assigned to each CRISPR guide. g, Phenotyping image (left) and genotyping images over four rounds of sequencing for one example cell that contains two distinct barcodes. b,c,g, Images show nuclei in blue (Hoechst 33342), actin in green (Alexa Fluor 488 phalloidin), cell segmentations outlined in white, and amplicons containing fluorescently labeled nucleotides in yellow, cyan, magenta, and white.
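The sketch below roughly illustrates how ISS reads might be matched to the designed barcode library (panel d) and tallied per cell (panels e-g). It rests on several assumptions:

- Barcodes have equal length, one base per sequencing round.
- Hamming distance stands in for the edit distance.
- A hypothetical rule discards reads whose nearest barcode is tied or lies farther than `max_dist`.

The function names and the threshold are illustrative only. They are not the authors' pipeline.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple


def hamming(a: str, b: str) -> int:
    """Count mismatched positions between two equal-length base sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def nearest_barcode(read: str, barcodes: List[str]) -> Tuple[Optional[str], int]:
    """Return (closest designed barcode, distance); barcode is None if two are tied."""
    ranked = sorted((hamming(read, bc), bc) for bc in barcodes)
    best_dist, best_bc = ranked[0]
    if len(ranked) > 1 and ranked[1][0] == best_dist:
        return None, best_dist  # ambiguous read: cannot be assigned to one guide
    return best_bc, best_dist


def call_cell_barcodes(
    cell_reads: Dict[str, List[str]], barcodes: List[str], max_dist: int = 0
) -> Dict[str, List[Tuple[str, int]]]:
    """For each cell, count reads per matched barcode, most common first.

    Reads that are ambiguous, or farther than max_dist from every designed
    barcode (a hypothetical cutoff), are dropped.
    """
    calls = {}
    for cell_id, reads in cell_reads.items():
        counts = Counter()
        for read in reads:
            bc, dist = nearest_barcode(read, barcodes)
            if bc is not None and dist <= max_dist:
                counts[bc] += 1
        calls[cell_id] = counts.most_common()
    return calls
```

Under this rule, a cell like the one in panel g would come back with two barcodes, each supported by its own reads. How such cells are handled downstream is not specified here.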

Dimensionality reduction of cell images by morphological feature-based PCA and image-based β-variational autoencoder (β-VAE).

a, Computational workflow for capturing the important axes of variation using morphological feature-based PCA (green, left) and image-based β-VAE (purple, right). b, For five example cell images (left column), de novo reconstructions by the β-VAE using the entire latent space (middle column) or only the top 25 iADs (right column). Nuclear and actin signals outside the segmented nuclei and cells, respectively, were masked out. These cells were not seen by the β-VAE during training. c, Percent of total variance of input morphological features explained by each of the top 100 fPCs. g, Percent of total KL divergence of all 512 iADs captured by each of the top 100 iADs. d-f,h-j, Visual Interpretation of Embeddings by constrained Walkthrough Sampling (VIEWS) of three fPCs (d-f) and three iADs (h-j). Top: density distribution along each dimension for all cells in the dataset, with the 1st, 16th, 50th, 84th, and 99th percentiles marked. Bottom: images of three cells that fall at each of these percentiles along the given dimension but have near-average values in every other dimension. Images show nuclei in blue (Hoechst 33342), actin in green (Alexa Fluor 488 phalloidin), and cell segmentations outlined in white. Surrounding cells are shown at 50% brightness. In (h-j), β-VAE-generated synthetic images of cells traversing each iAD are also shown below. k, Heatmap of the absolute value of the Pearson correlation, computed across all cells in the dataset, between each iAD and each fPC.
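The sketch below covers two things: the per-dimension quantities plotted in panels c and g, and a VIEWS-style cell selection for panels d-f and h-j. It assumes the following, none of which the legend specifies:

- Per-cell coordinates (fPC scores or iAD values) are already computed.
- The β-VAE encoder outputs a diagonal Gaussian (a mean and log-variance per dimension) with a standard normal prior.
- "Near-average values in every other dimension" means the smallest standardized distance from the mean over the remaining dimensions.
- Candidates are restricted to cells within a hypothetical window `tol`, in standard deviations, of the target percentile.

All function names are hypothetical.

```python
import math
import statistics


def percentile(values, q):
    """Linearly interpolated percentile, q in [0, 100]."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def percent_variance(scores, total_input_variance=None):
    """Percent of variance carried by each column of per-cell PC scores.

    When only the top PCs are kept, pass the summed variance of the input
    features so the denominator is the total, not just the retained part.
    """
    variances = [statistics.pvariance(col) for col in zip(*scores)]
    total = total_input_variance if total_input_variance is not None else sum(variances)
    return [100.0 * v / total for v in variances]


def kl_percent(mus, logvars):
    """Percent of total KL divergence to a N(0, 1) prior carried by each latent dim.

    Assumes a diagonal-Gaussian encoder: per cell, a mean and log-variance per dim.
    """
    kl = [0.0] * len(mus[0])
    for mu, lv in zip(mus, logvars):
        for j, (m, v) in enumerate(zip(mu, lv)):
            kl[j] += 0.5 * (math.exp(v) + m * m - 1.0 - v)
    total = sum(kl)
    return [100.0 * k / total for k in kl]


def views_sample(embedding, dim, percentiles=(1, 16, 50, 84, 99), n_cells=3, tol=0.1):
    """Pick cells near each percentile of `dim` that sit near the mean elsewhere.

    embedding: list of per-cell coordinate lists. tol is a hypothetical window,
    in standard deviations, around each target percentile along `dim`.
    Returns {percentile: [cell indices]}.
    """
    cols = [list(c) for c in zip(*embedding)]
    means = [statistics.fmean(c) for c in cols]
    sds = [statistics.pstdev(c) or 1.0 for c in cols]  # guard constant dims
    z = [[(x - m) / s for x, m, s in zip(cell, means, sds)] for cell in embedding]
    picks = {}
    for q in percentiles:
        target = (percentile(cols[dim], q) - means[dim]) / sds[dim]
        cand = [i for i, zi in enumerate(z) if abs(zi[dim] - target) <= tol]
        # rank candidates by squared standardized distance from the mean in all other dims
        cand.sort(key=lambda i: sum(v * v for j, v in enumerate(z[i]) if j != dim))
        picks[q] = cand[:n_cells]
    return picks
```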

Two dimensionality reduction methods identify many overlapping gene hits.

a, Venn diagram of gene hits identified using fPCA (green) and iADR (purple). b, For each hit (blue) and non-targeting control (gray) guide, the maximum -log10(p value) across all fPCs and across all iADs are plotted. Guide hits for two genes, ARHGEF7 (green) and MYL6 (orange), are highlighted. The diagonal dashed line shows equality; the square in the lower left shows the significance threshold for calling hits (p < 10⁻⁶). Eleven guides have x- or y-values higher than 30 and are not shown. p values were calculated by two-sample Kolmogorov-Smirnov test relative to control guides. c, Bottom: for each of the 37 guide hits shared by fPCA and iADR, significance (-log10(p value)) in each of the top 25 fPCs (left) and iADs (right) is plotted as a heatmap. Five non-targeting control guides are also shown. Top: the number of guides that are significant (p < 10⁻⁶) in each dimension is shown as a bar chart.
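The sketch below outlines the per-guide test in panel b. For each dimension, it compares the guide's cells with pooled control cells using a two-sample Kolmogorov-Smirnov test. It keeps the largest -log10(p value) across dimensions and calls a hit when that p value falls below 10⁻⁶.

The p value here is an assumption. It comes from the asymptotic Kolmogorov distribution with the small-sample correction used in Numerical Recipes. `scipy.stats.ks_2samp` is the usual library route and can return exact p values for small samples. Function names are hypothetical.

```python
import math


def ks_2samp_stat(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between empirical CDFs."""
    a, b = sorted(a), sorted(b)
    n, m = len(a), len(b)
    i = j = 0
    d = 0.0
    while i < n and j < m:
        x = min(a[i], b[j])
        # step past every value equal to x in both samples so ties are handled
        while i < n and a[i] <= x:
            i += 1
        while j < m and b[j] <= x:
            j += 1
        d = max(d, abs(i / n - j / m))
    return d


def ks_pvalue(d, n, m, terms=100):
    """Asymptotic p value from the Kolmogorov distribution (Numerical Recipes correction)."""
    ne = n * m / (n + m)
    lam = (math.sqrt(ne) + 0.12 + 0.11 / math.sqrt(ne)) * d
    if lam < 0.2:
        return 1.0  # the alternating series converges poorly here and Q(lam) is ~1
    q = 2.0 * sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam) for k in range(1, terms + 1))
    return min(max(q, 0.0), 1.0)


def call_hit(guide_dims, control_dims, threshold=1e-6):
    """Max -log10(p) over dimensions for one guide versus pooled control cells.

    guide_dims, control_dims: one list of per-cell values per dimension (fPC or iAD).
    Returns (max -log10 p, whether the guide passes the hit threshold).
    """
    best = 0.0
    for g, c in zip(guide_dims, control_dims):
        p = ks_pvalue(ks_2samp_stat(g, c), len(g), len(c))
        best = max(best, -math.log10(max(p, 1e-300)))  # floor p to avoid log10(0)
    return best, best > -math.log10(threshold)
```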

Definition and visualization of phenotypic shifts caused by gene knockdowns.

a, Schematic: the linear discriminant (LD) axis (thick diagonal line) is the direction that produces the best separation between two populations in high-dimensional space (gray and maroon dots). b, Histogram of Pearson correlation values between LD vectors of two non-targeting control guides (gray) or between LD vectors of two replicate guide hits (blue). LD vectors are calculated in fPCA space. c, All gene hits are listed, with color indicating whether they were detected using fPCA (green) or iADR (purple) and whether they were found by two or more replicate guides with significant p values (dark color) or by correlated LD vectors (light color). d-i, VIEWS plots of the LD vector between control cells (gray) and knockdown cells (maroon) for six different guide hits. Density plots show the distribution of each cell population projected onto the LD axis. Cell images are sampled from the entire dataset and are displayed in the same manner as in Fig. 3. n, number of cells in each population. In all panels, LD vectors are calculated using the first 25 fPCs.
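Panel a describes the LD axis only as the direction of best separation. If that direction is Fisher's two-class linear discriminant (an assumption), the LD vector is proportional to S_w^-1 (mu_kd - mu_ctrl). Here S_w is the pooled within-class covariance and mu_kd, mu_ctrl are the knockdown and control means, with each cell represented by its first 25 fPC scores.

The sketch below computes that vector in pure Python. It adds a small ridge term, a hypothetical choice for numerical stability. It also projects cells onto the axis, as in the density plots, and correlates two LD vectors, as in panel b. Function names are illustrative.

```python
import math
import statistics


def mean_vec(rows):
    return [statistics.fmean(col) for col in zip(*rows)]


def pooled_cov(a, b, ridge=1e-6):
    """Pooled within-class covariance of two cell populations plus a small ridge."""
    d = len(a[0])
    s = [[0.0] * d for _ in range(d)]
    for rows in (a, b):
        mu = mean_vec(rows)
        for r in rows:
            c = [x - m for x, m in zip(r, mu)]
            for i in range(d):
                for j in range(d):
                    s[i][j] += c[i] * c[j]
    dof = len(a) + len(b) - 2
    return [[s[i][j] / dof + (ridge if i == j else 0.0) for j in range(d)] for i in range(d)]


def solve(mat, vec):
    """Solve mat @ x = vec by Gaussian elimination with partial pivoting."""
    n = len(vec)
    aug = [list(row) + [vec[i]] for i, row in enumerate(mat)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= f * aug[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][k] * x[k] for k in range(r + 1, n))) / aug[r][r]
    return x


def ld_vector(control, knockdown):
    """Unit-length Fisher discriminant direction, oriented from control toward knockdown."""
    diff = [k - c for k, c in zip(mean_vec(knockdown), mean_vec(control))]
    w = solve(pooled_cov(control, knockdown), diff)
    norm = math.sqrt(sum(x * x for x in w))
    return [x / norm for x in w]


def project(cells, w):
    """Position of each cell along the LD axis (dot product with the unit LD vector)."""
    return [sum(x * y for x, y in zip(cell, w)) for cell in cells]


def pearson(u, v):
    """Pearson correlation between two vectors, e.g. LD vectors of two guides."""
    mu, mv = statistics.fmean(u), statistics.fmean(v)
    du, dv = [x - mu for x in u], [y - mv for y in v]
    return sum(a * b for a, b in zip(du, dv)) / math.sqrt(sum(a * a for a in du) * sum(b * b for b in dv))
```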

A morphological landscape of U2OS cells.

a, Left: heatmap of Pearson correlation between LD vectors of all pairs of sgRNAs of replicated gene hits. Values below and above the diagonal use LD vectors calculated with the first 25 fPCs and the first 15 iADs, respectively. Only guide hits shared by fPCA and iADR are shown. Dendrogram from hierarchical clustering in fPC space is shown with the four major clusters highlighted in color. Right: VIEWS of LD vectors of one guide from each gene, shown in the same order as in the heatmap and grouped into the four clusters. b, Same as (a), but showing all gene hits identified by fPCA or iADR. Genes that are not hits by fPCA are displayed in gray text on the fPCA half of the heatmap, and genes that are not hits by iADR in gray text on the iADR half. Dendrogram from hierarchical clustering in fPC space is shown with the same four major clusters highlighted in color. c, VIEWS of the LD vector describing the phenotypic shift between VCL knockdown cells (replicate guide 1) and TLN1 knockdown cells (replicate guide 1).
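The legend does not state the clustering method behind the dendrograms. The sketch below shows one plausible version. It assumes average linkage on the distance 1 - r between guides' LD vectors, followed by a flat cut into a fixed number of clusters (four in the figure). `scipy.cluster.hierarchy.linkage` with `method="average"` would be the usual library equivalent for building the full tree. The helper names here are hypothetical.

```python
import math
import statistics


def pearson(u, v):
    """Pearson correlation between two LD vectors."""
    mu, mv = statistics.fmean(u), statistics.fmean(v)
    du, dv = [x - mu for x in u], [y - mv for y in v]
    return sum(a * b for a, b in zip(du, dv)) / math.sqrt(sum(a * a for a in du) * sum(b * b for b in dv))


def correlation_matrix(vectors):
    """Pairwise Pearson correlations between LD vectors (one vector per guide)."""
    return [[pearson(u, v) for v in vectors] for u in vectors]


def average_linkage(corr, n_clusters):
    """Average-linkage agglomerative clustering on distance 1 - r.

    Merges the closest pair of clusters until n_clusters remain and returns
    them as lists of row indices (a flat cut, not the full dendrogram).
    """
    clusters = [[i] for i in range(len(corr))]

    def dist(a, b):
        return sum(1.0 - corr[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > n_clusters:
        _, x, y = min(
            (dist(clusters[x], clusters[y]), x, y)
            for x in range(len(clusters))
            for y in range(x + 1, len(clusters))
        )
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]  # y > x, so index x is unaffected
    return clusters
```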