Nonlinear transcriptional responses to gradual modulation of transcription factor dosage
Figures
Modulation and quantification of gene dosage using CRISPR and targeted multimodal single-cell sequencing.
(A) Co-expression network representation of the 92 selected genes under study. Genes (nodes) are connected by edges when their co-expression across single cells was above 0.5 (data from Morris et al., 2023). Highlighted in colour are the two control genes with constant high (GAPDH) and low (LHX3) expression, as well as the cis genes whose dosage was modulated with CRISPRi/a. (B) Design of the multimodal single-cell experiment (HTO = hashtag oligos). (C) Distribution of normalized GFI1B (left) or NFE2 (right) expression across single cells for different classes of sgRNAs (NTC = non-targeting controls, TSS = transcription start site). (D) Relative expression change (log2 fold change, log2FC) of the four cis genes upon each unique CRISPR perturbation, grouped by sgRNA class. (E) Distribution of cis gene log2FC across all sgRNA perturbations.
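The log2 fold change reported in panels D and E can be illustrated with a minimal sketch. It assumes the fold change is the ratio of mean normalized expression in cells carrying a given sgRNA to the mean in NTC cells; the legend does not spell out the exact estimator, so the function name, the optional pseudocount, and the toy values below are hypothetical.

```python
import math
from statistics import mean

def log2_fold_change(perturbed, control, pseudocount=0.0):
    """log2 ratio of mean expression in perturbed cells over control (NTC) cells.

    The pseudocount is an illustrative guard against zero means,
    not a parameter taken from the paper.
    """
    numerator = mean(perturbed) + pseudocount
    denominator = mean(control) + pseudocount
    if numerator <= 0 or denominator <= 0:
        raise ValueError("means must be positive; consider a pseudocount")
    return math.log2(numerator / denominator)

# Hypothetical normalized expression values of one cis gene.
ntc_cells = [4.0, 5.0, 3.0, 4.0]       # non-targeting controls, mean 4.0
crispri_cells = [1.0, 2.0, 1.0, 0.0]   # knockdown guide, mean 1.0
crispra_cells = [8.0, 9.0, 7.0, 8.0]   # activation guide, mean 8.0

lfc_crispri = log2_fold_change(crispri_cells, ntc_cells)  # 4-fold decrease
lfc_crispra = log2_fold_change(crispra_cells, ntc_cells)  # 2-fold increase
print(lfc_crispri, lfc_crispra)
```

On this scale a value of 0 means no change relative to NTC cells, negative values indicate knockdown, and positive values indicate activation.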
Experimental design and data processing from UMIs to expression fold change, related to Figure 1 and STAR methods.
(A) Co-expression matrix of the 76 selected GFI1B trans genes based on K562 data from Morris et al., 2023. Three clusters from the selected targeted panel show a co-expression architecture similar to that of the original clusters identified using the entire GFI1B trans-network (original clusters A in blue, B in green, and C in red). (B) Same as A for the 39 NFE2 trans genes (original clusters A in green, B in orange, C in blue, and D in red). (C) Correlation of total UMI counts per gene between 10x chip lanes. Targeted panel genes are shown in orange, and highlighted names correspond to dosage genes (NFE2, MYB, GFI1B, and TET2) and low/high expression controls (LHX3 and GAPDH). (D) Number of singlet cells carrying each sgRNA in the two CRISPR cell lines. NTC = non-targeting controls. (E) Q-Q plots from the Sceptre calibration test. (F) Distribution of normalized UMI expression of the cis gene labelled on top, for cells with single guide RNAs (sgRNAs) targeting its transcription start site (TSS) or harbouring non-targeting control (NTC) sgRNAs.
Biochemical and activity properties of different types of single guide RNAs (sgRNAs).
(A) Relationship between off-target and on-target activity of sgRNAs and the change in expression of their target cis gene. (B) Relationship between the number of cells carrying each sgRNA perturbation and the absolute fold change of the cis gene (top) or the number of trans genes differentially expressed upon the cis gene perturbation (bottom). (C) Relationship between the position of the mismatch in attenuated sgRNAs (position 1 being farthest from the protospacer adjacent motif, PAM) and their effect on cis gene expression.
Gradual effects of the single guide RNAs (sgRNAs).
(A) Distribution of normalized cis gene UMIs in single cells, grouped by unique sgRNA and ranked top to bottom by mean normalized expression. Transparent distributions correspond to non-targeting controls. (B) Distribution of the correlation in trans gene expression fold changes when the cells carrying the same sgRNA are split into those with 0 UMIs and those with >0 UMIs for the cis gene (top panel). Comparison of the strength of these correlations with the effect of that sgRNA on the cis gene (bottom panel). Dot size indicates the difference in size between the 0 UMI and >0 UMI cell groups. (C) UMAPs of the cells with GFI1B, MYB, and NFE2 guides together with non-targeting guides. The left and right clusters in each panel represent CRISPRa and CRISPRi cells, respectively. Cells are coloured by the median fold change associated with their sgRNA.
Cis determinants of dosage.
(A) Comparison of the relative expression change (log2FC) induced by the same single guide RNA (sgRNA) in the two CRISPR modalities. Vertical and horizontal bars represent CRISPRa and CRISPRi standard errors, respectively. (B) Relative expression change of the targeted cis gene as a function of sgRNA distance from the transcription start site (TSS). The top plot excludes attenuated and non-targeting control (NTC) sgRNAs, while the bottom plot also excludes enhancer sgRNAs. (C) Number of sgRNAs that overlap the different epigenetic or open chromatin peaks. (D) Expression change relative to NTC sgRNAs (log2FC) of all cis genes, depending on whether their sgRNAs fall within the different epigenetic or open chromatin peaks. P-values are from Wilcoxon rank-sum tests, with nominally significant p-values shown in black.
Trans responses of transcription factor dosage modulation.
(A) Average absolute expression change of all trans genes relative to the changes in expression of the cis genes. (B) Changes in relative expression of all trans genes (bottom heatmap) in response to GFI1B expression changes (top barplot) upon each distinct targeted single guide RNA (sgRNA) perturbation, in comparison to non-targeting control (NTC) cells. The rows of the heatmap (trans genes) are hierarchically clustered based on their expression fold change linked to alterations in GFI1B dosage. Highlighted rows are the dosage response examples shown in C. (C) Dosage response curves of the trans genes highlighted in B as a function of changes in GFI1B expression. The orange line represents the sigmoid model fit, except for GATA2, which displays a non-monotonic response and is fitted with a loess curve. (D) Illustration of the linear and sigmoid models and the equations used to fit the dosage response curves. (E) Distribution of the difference in Akaike Information Criterion (ΔAIClinear-sigmoid, the AIC of the linear fit minus the AIC of the sigmoid fit) for each trans gene upon GFI1B dosage modulation (top panel), and the direct comparison of the AIC of each fit (bottom panel).
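The model comparison in panels D and E can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline (the key resources table lists R's `lm`, `drc::drm` with `L.4()`, and `AIC`). The sigmoid is written in the four-parameter logistic form used by drc's `L.4`, and it is fitted here with a coarse grid over the two non-linear parameters combined with closed-form least squares for the two asymptotes. AIC is computed for Gaussian errors up to a constant shared by both models, so only the difference is meaningful. All data and helper names are hypothetical.

```python
import math

def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

def fit_sigmoid_grid(xs, ys, b_grid, e_grid):
    """Fit y = c + (d - c) / (1 + exp(b * (x - e))) by grid search over (b, e).

    For fixed (b, e) the model is linear in c and d, so those two are
    solved exactly from the 2x2 normal equations.
    """
    best_rss, best_params = None, None
    for b in b_grid:
        for e in e_grid:
            g = [1.0 / (1.0 + math.exp(b * (x - e))) for x in xs]
            u = [1.0 - gi for gi in g]            # model: y = c * u + d * g
            suu = sum(ui * ui for ui in u)
            sug = sum(ui * gi for ui, gi in zip(u, g))
            sgg = sum(gi * gi for gi in g)
            suy = sum(ui * yi for ui, yi in zip(u, ys))
            sgy = sum(gi * yi for gi, yi in zip(g, ys))
            det = suu * sgg - sug * sug
            if abs(det) < 1e-12:
                continue                          # (b, e) gives a degenerate design
            c = (suy * sgg - sug * sgy) / det
            d = (suu * sgy - sug * suy) / det
            rss = sum((yi - (c * ui + d * gi)) ** 2
                      for ui, gi, yi in zip(u, g, ys))
            if best_rss is None or rss < best_rss:
                best_rss, best_params = rss, (b, c, d, e)
    return best_rss, best_params

def aic_least_squares(rss, n, n_params):
    """Gaussian AIC up to an additive constant: n * ln(RSS / n) + 2k."""
    return n * math.log(rss / n) + 2 * n_params

# Hypothetical dosage response: cis gene log2FC (x) vs trans gene log2FC (y),
# generated from a sigmoid plus a small deterministic wiggle.
xs = [0.25 * i - 2.0 for i in range(13)]
ys = [-1.0 + 2.0 / (1.0 + math.exp(-3.0 * (x + 0.5))) + 0.05 * (-1) ** i
      for i, x in enumerate(xs)]

slope, intercept = fit_line(xs, ys)
lin_rss = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))

b_grid = [float(b) for b in range(-6, 7) if b != 0]
e_grid = [0.25 * i - 2.0 for i in range(13)]
sig_rss, sig_params = fit_sigmoid_grid(xs, ys, b_grid, e_grid)

n = len(xs)
delta_aic = aic_least_squares(lin_rss, n, 2) - aic_least_squares(sig_rss, n, 4)
print(delta_aic)  # positive values favour the sigmoid model
```

A positive difference means the sigmoid fit is preferred even after the penalty for its two extra parameters. A grid search is used only to keep the sketch dependency-free; a proper optimizer would refine the estimates.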
Global view of trans effects and their replication.
(A) Principal component analysis (PCA) of mean UMI normalized expression (not relative to each cell line of origin) for all genes across unique single guide RNA (sgRNA) perturbations. (B) Same as A but using relative expression fold change after normalising by the CRISPR cell line of origin. (C) Replication of trans effects of CRISPRi of CREs for GFI1B and NFE2, targeted both in this study (x-axis) and in Morris et al., 2023 (y-axis). GFI1B CRE 1 and NFE2 CRE 1 were targeted in Morris et al. data batches V1 and V2, and the effects are shown here for both separately. (D) Replication of trans effects from transcription start site (TSS) silencing in this study and in Replogle et al., 2022. The analysis uses guides from this study that target transcription start sites; these do not exactly match the guides used in Replogle et al. The effect size in Replogle et al. is quantified using their metric of Wilcox mean difference. The dashed line represents a linear regression between the x and y variables. (E) Number of differentially expressed trans genes relative to the cis gene dosage perturbation.
Trans gene responses to GFI1B dosage modulation.
(A) Changes in relative expression of all trans genes (heatmap) in response to GFI1B expression (top barplot) upon each distinct targeted single guide RNA (sgRNA) perturbation. The rows of the heatmap (trans genes) are hierarchically clustered based on their expression fold change linked to alterations in GFI1B dosage. (B) Dosage response curves are plotted for each trans gene against changes in GFI1B expression. The orange line represents the sigmoid model fit, and the blue line represents a loess curve.
Trans gene responses to MYB dosage modulation.
(A) Changes in relative expression of all trans genes (bottom heatmap) in response to MYB expression (top barplot) upon each distinct targeted MYB single guide RNA (sgRNA) perturbation. The rows of the heatmap (trans genes) are hierarchically clustered based on their expression fold change linked to alterations in MYB dosage. (B) Dosage response curves are plotted for each trans gene against changes in MYB expression. The orange line represents the sigmoid model fit.
Trans gene responses to NFE2 dosage modulation.
(A) Changes in relative expression of all trans genes (bottom heatmap) in response to NFE2 expression (top barplot) upon each distinct targeted NFE2 sgRNA perturbation. The rows of the heatmap (trans genes) are hierarchically clustered based on their expression fold change linked to alterations in NFE2 dosage. (B) Dosage response curves are plotted for each trans gene against changes in NFE2 expression. The orange line represents the sigmoid model fit.
Trans gene responses to TET2 dosage modulation.
(A) Changes in relative expression of all trans genes (bottom heatmap) in response to TET2 expression (top barplot) upon each distinct targeted TET2 single guide RNA (sgRNA) perturbation. The rows of the heatmap (trans genes) are hierarchically clustered based on their expression fold change linked to alterations in TET2 dosage. (B) Dosage response curves are plotted for each trans gene against changes in TET2 expression. The orange line represents the sigmoid model fit.
Dosage response linear and non-linear model fitting.
(A) Distribution of the difference in Akaike Information Criterion (ΔAIClinear-sigmoid) after fitting the sigmoidal or linear model for each trans gene based on the gradual expression perturbations of the four cis genes (top panel), and the direct comparison of the Akaike Information Criterion (AIC) of each fit (bottom panel). Red lines indicate the median ΔAIC. (B) Same as A but fitting the models only on those single guide RNA (sgRNA) perturbations that lead to a cis gene dosage change bounded between log2(1/2) and log2(3/2). (C) Agreement between observed and predicted trans gene expression fold change upon cis gene dosage modulation across a 10-fold cross-validation scheme. (D) Comparison of the Root Mean Square Error (RMSE) of the sigmoid model on the different trans gene dosage responses with the RMSE of the equivalent loess fit (bottom panel). Highlighted in blue are the non-monotonic responses that correspond to the top four ΔRMSEsigmoid-loess (RMSEsigmoid - RMSEloess) values (top panel).
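The cross-validation and RMSE quantities in panels C and D can be sketched as below. As a simplifying assumption, a straight line stands in for the dosage response model so the example stays dependency-free; the fold assignment (interleaved by index), the helper names, and the data are all hypothetical, and the legend does not describe how folds were drawn.

```python
import math

def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

def k_fold_cv_rmse(xs, ys, k=10):
    """Hold out each fold once, refit on the rest, and pool squared errors."""
    n = len(xs)
    squared_errors = []
    for fold in range(k):
        test_idx = [i for i in range(n) if i % k == fold]
        train_idx = [i for i in range(n) if i % k != fold]
        slope, intercept = fit_line([xs[i] for i in train_idx],
                                    [ys[i] for i in train_idx])
        for i in test_idx:
            predicted = slope * xs[i] + intercept
            squared_errors.append((ys[i] - predicted) ** 2)
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical cis gene log2FC values (x) and trans gene responses (y).
xs = [0.1 * i - 1.0 for i in range(20)]
exact_ys = [2.0 * x + 1.0 for x in xs]                          # perfectly linear
noisy_ys = [y + 0.1 * (-1) ** i for i, y in enumerate(exact_ys)]

rmse_exact = k_fold_cv_rmse(xs, exact_ys, k=10)
rmse_noisy = k_fold_cv_rmse(xs, noisy_ys, k=10)
print(rmse_exact, rmse_noisy)
```

Because every point is predicted by a model that never saw it, the pooled RMSE reflects out-of-sample agreement between observed and predicted fold changes rather than goodness of fit on the training data.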
Distribution of the fitted parameters of the sigmoidal model on dosage responses.
Cumulative distribution of the four fitted parameters (first four columns) of the sigmoid model across genes given the independent perturbation of the four transcription factors (TFs) (rows). slope_IF = slope of dosage response curve at the inflection point, min_asmp = minimum asymptote (minimum trans gene dosage level), max_asmp = maximum asymptote (maximum trans gene dosage level), x_IF = TF expression FC at the dosage response inflection point.
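To make the four parameters concrete, the sketch below derives them from a four-parameter logistic written in the form used by drc's `L.4`, f(x) = c + (d - c) / (1 + exp(b (x - e))). The mapping of the legend's names (slope_IF, min_asmp, max_asmp, x_IF) onto the letters b, c, d, e is an assumption made for illustration; the slope at the inflection point follows from differentiating f at x = e, which gives -b (d - c) / 4.

```python
import math

def sigmoid4(x, b, c, d, e):
    """Four-parameter logistic in the drc L.4 parameterization."""
    return c + (d - c) / (1.0 + math.exp(b * (x - e)))

def describe_sigmoid(b, c, d, e):
    """Translate (b, c, d, e) into the quantities named in the legend."""
    return {
        "min_asmp": min(c, d),              # lower plateau of the trans gene response
        "max_asmp": max(c, d),              # upper plateau of the trans gene response
        "x_IF": e,                          # cis gene change at the inflection point
        "slope_IF": -b * (d - c) / 4.0,     # derivative of sigmoid4 at x = e
    }

# Hypothetical fitted values for one trans gene.
b, c, d, e = -3.0, -1.0, 1.0, -0.5
summary = describe_sigmoid(b, c, d, e)

# Numerical check of the analytical slope with a central difference.
h = 1e-6
numeric_slope = (sigmoid4(e + h, b, c, d, e) - sigmoid4(e - h, b, c, d, e)) / (2 * h)
print(summary, numeric_slope)
```

At the inflection point the response sits halfway between the two asymptotes, so x_IF locates where the trans gene is most sensitive to the TF and slope_IF measures how sensitive it is there.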
Relationship between gene and dosage response properties.
(A) Predicted changes (using sigmoid or loess fits for monotonic and non-monotonic responses, respectively) in relative expression of all trans genes in response to changes in GFI1B, MYB, and NFE2 expression. Trans genes (rows) were hierarchically clustered based on their expression fold change linked to alterations in the dosage of all transcription factors (TFs). A dendrogram of the resulting clustering is shown on the left. (B) Heatmap showing the qualitative properties of each trans gene. The x-axis indicates specific gene features. The top labels specify the source of the data, while the bottom labels describe the corresponding gene properties. WBCs, platelets, RBCs, and reticulocytes refer to genome-wide association studies (GWAS) of white blood cells, platelets, red blood cells, and reticulocytes, respectively. (C) Heatmap indicating the z-scaled quantitative gene features of each trans gene. The x-axis indicates specific gene features. The top labels specify the source of the data, while the bottom labels describe the corresponding gene properties. Erythroblast, platelets, monocytes, and dendritic cells refer to cell types from Hay et al., 2018. Gray cells indicate missing data. (D) Difference in the average value of the sigmoid parameter indicated on the right between the genes in the no and yes categories of the gene properties shown in B. (E) Pearson correlation coefficient of the quantitative trans gene features (shown in C) with the sigmoid parameter value for each trans gene in the response to dosage modulation of the TF indicated on the left. The size of the points is inversely related to the significance of the correlation, and colour indicates the direction of the correlation. (F) Differences in the range of expression response for housekeeping vs. non-housekeeping trans genes with changes in the dosage of MYB, GFI1B, and NFE2. (G) Negative correlation between haploinsufficiency score (pHaplo) and the range of the response of trans genes to the modulation of MYB.
Relationship of gene properties and transcription factor (TF)-target network properties with TF dosage responses.
(A) A regulatory network constructed based on TF-target gene data (Minaeva et al., 2025) with nodes and edges coloured by betweenness. Nodes are sized by their degree. (B) Heatmap illustrating the correlation between the sigmoid parameters in response to cis-gene modulation and network centrality metrics calculated based on the regulatory networks from Minaeva et al., 2025. Point size is scaled to -log10 p-value.
Non-linearities in transcription factor (TF) dosage responses of complex traits and disease genes.
(A) Heatmap illustrating the correlation between the mean expression of cell types and the changes in expression linked to individual TF dosage perturbations. The bar plot in the top panel represents the cis gene dosage perturbation. Asterisks (*) denote correlations significant at 10% FDR. (B) Log odds ratio for enrichment of non-linear TF dosage responses (ΔAIClinear-sigmoid > 0) among disease-related genes (OMIM genes linked to one or more diseases, top panel) or among GWAS blood trait-associated genes (closest expressed gene to the lead GWAS variant, bottom panel). Log odds ratios significant by Fisher's exact test at FDR < 0.05 are highlighted in blue. (C) Examples of TF dosage response curves of genes associated with both disease (OMIM) and complex traits (blood GWAS).
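The enrichment statistic in panel B can be illustrated with a small sketch: a log odds ratio from a 2x2 table of gene counts and a two-sided Fisher's exact test computed from the hypergeometric distribution. The counts are hypothetical, the natural logarithm is an assumption (the legend does not state the base), and the multiple-testing (FDR) step is not shown.

```python
import math

def log_odds_ratio(a, b, c, d):
    """Natural log of the odds ratio (a * d) / (b * c).

    Table layout: rows = non-linear / linear response,
    columns = in gene set / not in gene set. Zero cells are not handled here.
    """
    return math.log((a * d) / (b * c))

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the table [[a, b], [c, d]]."""
    row1, col1 = a + b, a + c
    n = a + b + c + d

    def prob(k):
        # Hypergeometric probability of k counts in the top-left cell.
        return (math.comb(col1, k) * math.comb(n - col1, row1 - k)
                / math.comb(n, row1))

    k_min = max(0, row1 - (n - col1))
    k_max = min(row1, col1)
    p_observed = prob(a)
    # Sum all tables at least as unlikely as the observed one.
    return sum(p for p in (prob(k) for k in range(k_min, k_max + 1))
               if p <= p_observed * (1 + 1e-9))

# Hypothetical counts: 8 non-linear genes in the set, 2 outside it;
# 1 linear gene in the set, 5 outside it.
a, b, c, d = 8, 2, 1, 5
lor = log_odds_ratio(a, b, c, d)
p_value = fisher_exact_two_sided(a, b, c, d)
print(lor, p_value)
```

A positive log odds ratio indicates that genes with non-linear responses are over-represented in the gene set; the exact test is suited to the small counts that arise when a targeted panel is intersected with disease or GWAS gene lists.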
Transcriptional similarity among bone marrow cell types at different transcription factor (TF) dosage levels.
(A) Z-score-normalized mean expression across donors for targeted genes within each bone marrow cell type (data from the Human Cell Atlas). (B) Examples of trends in the correlation of trans gene expression with the TF change in dosage. The title specifies the cis gene and the cell type for which the trans effects of TF dosage modulation have been contrasted.
Tables
| Reagent type (species) or resource | Designation | Source or reference | Identifiers | Additional information |
|---|---|---|---|---|
| Recombinant DNA reagent | pCC_05: Lentiviral puromycin CRISPRa dCas9-VPR system | Addgene | RRID:Addgene_139090 | Used as PCR template for dCas9-VPR cassette (Legut et al., 2020). |
| Recombinant DNA reagent | pGC02: Lentiviral blasticidin CRISPRi KRAB-dCas9-MeCP2 system | other | RRID:Addgene_170068 | Sourced from Sanjana Laboratory (Morris et al., 2023). Backbone for pJDE003 construction; digested with XbaI-FD and BamHI-FD. |
| Recombinant DNA reagent | pJDE003: Lentiviral blasticidin CRISPRa dCas9-VPR system | this study | NA | Constructed by replacing KRAB-dCas9-MeCP2 cassette in pGC02 with dCas9-VPR PCR product from pCC_05; Gibson assembled (2:1 insert:vector). |
| Recombinant DNA reagent | pGC03: Lentiviral puromycin sgRNA library cloning vector | Addgene | RRID:Addgene_170069 | Used for cloning 96-sgRNA library (BsmBI digestion; NEBuilder HiFi assembly). |
| Recombinant DNA reagent | pMD2.G: Lentiviral envelope plasmid | Addgene | RRID:Addgene_12259 | Envelope plasmid for lentiviral production. |
| Recombinant DNA reagent | psPAX2: Lentiviral packaging plasmid | Addgene | RRID:Addgene_12260 | Packaging plasmid for lentiviral production. |
| Strain, strain background (Escherichia coli) | NEB 5-alpha competent cells | New England Biolabs | NEB:C2987H | Used for plasmid transformations (pJDE003 assemblies). |
| Strain, strain background (E. coli) | One Shot Stbl3 chemically competent cells | Thermo Fisher Scientific | ThermoFisher:C737303 | Used for cloning/propagating lentiviral vectors. |
| Strain, strain background (E. coli) | Endura electrocompetent cells | Lucigen | Lucigen:60242-2 | Used for sgRNA library transformation by electroporation; >2.5e5 transformants obtained. |
| Cell line (Human) | HEK293FT | Thermo Fisher Scientific | ThermoFisher:R70007; RRID:CVCL_6911 | Maintained at 37 °C, 5% CO2 in DMEM high glucose (Cytiva SH30022.01) + 10% Serum Plus II (Sigma 14009C). |
| Cell line (Human) | K562 | ATCC | ATCC:CCL-243; RRID:CVCL_0004 | Maintained at 37 °C, 5% CO2 in IMDM, GlutaMAX (ThermoFisher:31980097) + 10% Serum Plus II (Sigma 14009C). |
| Antibody | Purified anti-CRISPR (CAS9) antibody (clone 7A9) | BioLegend | BioLegend:844302; RRID:AB_2749904 | Primary antibody for western blot of dCas9. |
| Antibody | GAPDH (14C10) Rabbit monoclonal antibody | Cell Signaling Technology | CST:2118S; RRID:AB_561053 | Primary antibody for loading control western blot. |
| Antibody | IRDye 800CW goat anti-mouse IgG (H+L) | LI-COR | LI-COR:925-32212 | Secondary antibody for CAS9 western blot. |
| Antibody | IRDye 680RD goat anti-rabbit IgG (H+L) | LI-COR | LI-COR:925-68073 | Secondary antibody for GAPDH western blot. |
| Antibody | FITC anti-human CD4 antibody (clone RPA-T4) | BioLegend | BioLegend:300505; RRID:AB_314073 | Used for FACS validation of CRISPRa activation (day 4 and day 10/11 post-transduction). |
| Antibody | APC anti-human CD19 antibody (clone HIB19) | BioLegend | BioLegend:302211; RRID:AB_314241 | Used for FACS validation of CRISPRa activation. |
| Antibody | PE anti-human CD45 antibody (clone 2D1) | BioLegend | BioLegend:368509; RRID:AB_2566369 | Used for FACS validation of CRISPRa activation. |
| Commercial assay or kit | Q5 High-Fidelity 2X Master Mix | New England Biolabs | NEB:M0492L | PCR amplification of dCas9-VPR cassette. |
| Commercial assay or kit | Gibson Assembly Master Mix | New England Biolabs | NEB:E2611S | Used for Gibson assembly (2:1 insert:vector). |
| Commercial assay or kit | NEBuilder HiFi DNA Assembly kit | New England Biolabs | NEB:NEBuilder-HiFi | Used for cloning pooled sgRNA library into BsmBI-digested pGC03 (10 reactions). |
| Commercial assay or kit | Plasmid Maxiprep Kit | QIAGEN | QIAGEN:12362 | Used for plasmid DNA preparation for virus production. |
| Commercial assay or kit | Maxi Fast-Ion Plasmid Kit, Endotoxin Free | IBI Scientific | IBI:IB47123 | Used for sgRNA library plasmid maxiprep. |
| Commercial assay or kit | Steriflip-HV 0.45 µm filter | Millipore | Millipore:SE1M003M00 | Filtration of harvested lentiviral supernatant. |
| Commercial assay or kit | Lentivirus Precipitation Solution | Alstem | Alstem:VC100 | Used for lentiviral concentration (10X or 2X as described). |
| Commercial assay or kit | 10x Chromium Next GEM Single Cell 5' Reagent Kit v2 (single indexing) | 10x Genomics | 10x:PN-1000265; 10x:PN-1000190 | Used for 5' single-cell library prep (two lanes; ECCITE-seq modifications). |
| Commercial assay or kit | 10x Targeted Gene Expression protocol | 10x Genomics | 10x:PN-1000248 | Custom probe library used for targeted enrichment of genes of interest. |
| Commercial assay or kit | Illumina NextSeq 500/550 Mid Output v2.5 kit (150 cycles) | Illumina | Illumina:NextSeq-MidOutput-v2.5-150 | Sequencing of targeted gene expression, HTO and GDO libraries. |
| Commercial assay or kit | Illumina MiSeq Reagent Kit v3 (150 cycles) | Illumina | Illumina:MiSeq-v3-150 | Sequencing of dCas9 targeted enrichment and additional HTO libraries. |
| Commercial assay or kit | xGen Custom Hybridization Capture Panel (biotinylated oligos) | IDT | IDT:xGen-Custom-Panel | Custom targeted gene expression panel (final 4,405 probes; ~15% discarded during design). |
| Commercial assay or kit | LookOut Mycoplasma PCR Detection Kit | Sigma-Aldrich | Sigma:MP0035 | Routine mycoplasma testing (frequency not specified). |
| Peptide, recombinant protein | XbaI FastDigest (XbaI-FD) | Thermo Fisher Scientific | ThermoFisher:FD0685 | Restriction digest of pGC02. |
| Peptide, recombinant protein | BamHI FastDigest (BamHI-FD) | Thermo Fisher Scientific | ThermoFisher:FD0054 | Restriction digest of pGC02. |
| Peptide, recombinant protein | FastAP Thermosensitive Alkaline Phosphatase | Thermo Fisher Scientific | ThermoFisher:EF0651 | Vector dephosphorylation after restriction digest. |
| Peptide, recombinant protein | DpnI | Thermo Fisher Scientific | ThermoFisher:FD1704 | Digest PCR template plasmid (15 min) prior to Gibson assembly. |
| Other | DMEM high glucose with L-glutamine; without sodium pyruvate | Cytiva (HyClone) | Cytiva:SH30022.01 | Used for HEK293FT culture and lentivirus resuspension media. |
| Other | IMDM, GlutaMAX | Thermo Fisher Scientific | ThermoFisher:31980097 | Used for K562 culture. |
| Other | Serum Plus II medium supplement | Sigma-Aldrich | Sigma:14009C | Used at 10% supplementation for HEK293FT and K562 culture. |
| Chemical compound, drug | Polyethylenimine (PEI) linear MW 25,000 | Polysciences | Polysciences:23966 | Used for HEK293FT transfection for lentivirus production. |
| Chemical compound, drug | Blasticidin | A.G. Scientific | A.G.Scientific:B-1247 | Used at 10 µg/mL for 16 days to select dCas9-VPR K562 clones; also 5 µg/mL during sgRNA library culture as described. |
| Chemical compound, drug | Puromycin | InvivoGen | InvivoGen:ant-pr-1 | Used at 2 µg/mL for sgRNA integration selection. |
| Chemical compound, drug | GlycoBlue | Thermo Fisher Scientific | ThermoFisher:AM9515 | Used for DNA precipitation of pooled sgRNA library assemblies. |
| Chemical compound, drug | Isopropanol | other | NA | Used for DNA precipitation of pooled sgRNA library assemblies. |
| Chemical compound, drug | NaCl | other | NA | Used at 50 mM during DNA precipitation of pooled sgRNA library assemblies. |
| Chemical compound, drug | Ethanol 70% | other | NA | Used for washes during DNA precipitation cleanup. |
| Sequence-based reagent | PCR primers oJDE005 and oJDE006 | this study | NA | Used to amplify dCas9-VPR cassette from pCC_05. |
| Sequence-based reagent | 96-sgRNA library (ssDNA oligos, 60 bp) for gene dosage library | IDT | IDT:ssDNA-oligos-plate | 96 guides pooled equimolarly to 0.2 µM; cloned into pGC03. |
| Software, algorithm | FastQC | Babraham Bioinformatics | RRID:SCR_014583 | Used for QC/demultiplexing of FASTQs (version not specified). |
| Software, algorithm | Cell Ranger (cellranger count) | 10x Genomics | RRID:SCR_017344 | Used for gene expression (with targeted-panel) and guide capture analysis (Gaussian mixture model calling). |
| Software, algorithm | Seurat | Hao et al., 2021 | RRID:SCR_016341 | Used for normalization, scaling, and UMAP; Seurat v4.3 used for NormalizeData and downstream analyses. |
| Software, algorithm | Salmon/Alevin | Srivastava et al., 2020 | RRID:SCR_017036 | Used for HTO quantification (version not specified). |
| Software, algorithm | Sceptre | Barry et al., 2021 | NA | Used to validate calibration of control cells (Q-Q plots, panel E of the figure related to Figure 1). |
| Software, algorithm | R (stats: lm, loess, AIC; drc: drm(fct = L.4())) | R Foundation; drc package | RRID:SCR_001905 | Used for model fitting (linear, LOESS, 4-parameter sigmoid) and AIC calculation. |