Gene regulatory network reconstruction using single-cell RNA sequencing of barcoded genotypes in diverse environments

  1. Christopher A Jackson
  2. Dayanne M Castro
  3. Giuseppe-Antonio Saldi
  4. Richard Bonneau (corresponding author)
  5. David Gresham (corresponding author)
  1. New York University, United States
  2. Simons Foundation, United States
7 figures, 1 table and 7 additional files

Figures

Figure 1 with 1 supplement
Single-Cell RNA-Seq Experimental Workflow in Saccharomyces cerevisiae.

Schematic workflows for: (A) Growth of a transcriptionally-barcoded pool of 11 nitrogen metabolism transcription factor (TF) knockout strains and a wild-type control strain each analyzed with six …

Figure 1—figure supplement 1
Strain Construction Workflow and Validation.

(A) Construction of KanMX[BC] deletion cassettes containing degenerate barcodes in the 3’ untranslated region, followed by transformation into yeast to create a transcription factor knockout array (B

Figure 2 with 4 supplements
Gene Expression of Single Yeast Cells Cluster Based on Environmental Growth Condition.

(A) Normalized density histograms of raw UMI counts of the core glycolytic genes ENO2 and PDC1, and the alcohol respiration gene ADH2 in each environmental growth condition. Mean UMI count for each …

Figure 2—figure supplement 1
Quality Control of Single-Cell RNA Sequencing Data.

(A) The number of cells that pass all quality control and preprocessing filters for each growth condition. Each genotype has multiple independently barcoded biological replicates that are plotted …

Figure 2—figure supplement 2
Single-Cell RNA Expression Comparison.

(A) Pairwise ranked gene expression plots of cells grown in YPD to mid-log phase. Data sets are bulk counts [TRIZOL] from FY4/FY5 diploids (n = 6), 10x-based 3’ end-labeled single-cell counts …

Figure 2—figure supplement 3
Expression of Categories of Genes in Single Cells.

UMAP projection of log-transformed and batch-normalized scRNAseq data, colored by: (A) Condition, as Figure 2B (B) Total raw count of transcripts (C) Percentage of transcripts which are ribosomal …

Figure 2—figure supplement 4
Measures of Gene Variance in Each Condition.

(A) Coefficient of variation (standard deviation over mean) plotted against mean of each gene for each growth condition. Both axes are plotted on a log scale. (B) The mean Pearson residuals …

Figure 3 with 2 supplements
Cells Within Conditions Cluster According to Cell Cycle Genes.

(A) Cells from each growth condition were separately normalized and transformed into 2-dimensional space using UMAP. The log-transformed, normalized expression for each cell of (i) the G1-phase …

Figure 3—figure supplement 1
Expression of Important Genes For Clustering.

(A) Gene expression heatmap of genes that are specific for clusters of cells in YPD. Gene names are colored green for G1 phase genes, yellow for S phase genes, blue for G2 phase genes, and purple …

Figure 3—figure supplement 2
Some Conditions Have Stress Response Clusters.

Cells from each growth condition were separately normalized and transformed into 2-dimensional space using UMAP. These single cells are colored by (A) Total raw UMI count (B) Percentage of transcripts which are ribosomal genes (C) Percentage of transcripts which are induced environmental stress response (iESR) …

Figure 4 with 1 supplement
Impact of Deleting Transcription Factors on Gene Expression.

(A) Violin plots of the log2 batch-normalized expression of the general amino acid permease gene GAP1 in YPD, RAPA, ammonium-limited media, and urea-limited media. (B) Count of differentially …

Figure 4—figure supplement 1
Differential Gene Expression Varies by Condition.

(A) Distribution by strain genotype of the log2 batch-normalized expression of the ammonium permease gene MEP2 and the glutamine synthetase gene GLN1 in YPD, RAPA, and ammonium limitation. (B) …

Figure 5 with 1 supplement
Model Performance and Impact of Data Imputation, Prior Selection, and Multitask Learning on Network Inference Using the Inferelator.

(A) Model performance of Inferelator (TFA-BBSR) network inference after shuffling priors [Neg.Shuffled], on a simulated negative data set [Neg. Data], on the unaltered count matrix [No Imputation], …

Figure 5—figure supplement 1
Low-Dimensional Clustering of Imputed Data.

Scatter plot after UMAP into 2-dimensional space. Unmodified data (A) is compared to imputation methods to recover missing data using MAGIC (B), ScImpute (C), and VIPER (D).

Figure 6 with 1 supplement
Reconstruction of a Gene Regulatory Network Identifies New Regulatory Relationships.

A network inferred from the single-cell expression data using multi-task learning and the YEASTRACT TF-gene interaction prior, with a cutoff at precision >0.5. (A) Network graph with known …

Figure 6—figure supplement 1
Summary of Learned GRN.

(A) Histogram of the number of regulators per target gene in the learned and prior network (B) Hexagonal heatmap of the ranked expression of a gene against the number of regulators for that gene; r2

Figure 7 with 1 supplement
Coordinated Regulation of Nitrogen Response and Cell Cycle.

(A) A gene regulatory network showing target genes that are regulated by at least one nitrogen TF (blue) and at least one cell cycle TF (green). Target gene nodes are colored by GO slim term. Newly …

Figure 7—figure supplement 1
Cell Cycle TF Activity Clusters within Growth Conditions.

Cells grown in YPD are plotted after UMAP with (A–B) z-Score of the calculated transcription factor activity (TFA) based on the learned network for (A) select nitrogen TFs and (B) cell cycle TFs (C) …

Tables

Table 1
Environmental Growth Conditions.

Environmental growth conditions are listed with their respective nitrogen and carbon sources. Yeast Extract + Peptone (YP) is a rich, complex nitrogen source. YP + Dextrose [YPD] is standard yeast …

Growth condition                                 Abbrv.     Nitrogen source     Carbon source
Yeast Extract, Peptone, Glucose                  YPD        YP                  D-Glucose
YPD (Harvested after Post-Diauxic Shift)         DIAUXY     YP                  D-Glucose
YPD + 200 ng/mL Rapamycin                        RAPA       YP                  D-Glucose
Yeast Extract, Peptone, Ethanol                  YPEtOH     YP                  Ethanol
Minimal Media (Glucose)                          MMD        20 mM (NH4)2SO4     D-Glucose
Minimal Media (Ethanol)                          MMEtOH     20 mM (NH4)2SO4     Ethanol
Nitrogen Limited Minimal Media (with Glutamine)  NLIM-GLN   0.8 mM L-Glutamine  D-Glucose
Nitrogen Limited Minimal Media (with Proline)    NLIM-PRO   0.8 mM L-Proline    D-Glucose
Nitrogen Limited Minimal Media (with NH4)        NLIM-NH4   0.8 mM (NH4)2SO4    D-Glucose
Nitrogen Limited Minimal Media (with Urea)       NLIM-UREA  0.8 mM Urea         D-Glucose
Carbon Starvation                                CSTARVE    1 mM (NH4)2SO4      None

Additional files

Source code 1

A ‘.tar.gz’ archive containing R scripts used to generate Figures 2–7 and accompanying supplementary figures, with a README detailing the R environment needed to run them locally.

It also contains a data folder with the raw count matrix as a TSV file (103118_SS_Data.tsv.gz), the simulated negative data count matrix as a TSV file (110518_SS_NEG_Data.tsv.gz), a gene name metadata TSV file (yeast_gene_names.tsv), supplemental tables 5 (STable5.tsv) and 6 (STable6.tsv) as TSV files, and the yeast gene ontology slim mapping as a TAB file (go_slim_mapping.tab). Source code 1 also contains a priors folder with the Gold Standard, the three sets of priors data tested in this work, and the YEASTRACT comparison data, all as TSV files. Source code 1 also contains a network folder with the network learned in this paper (signed_network.tsv) as a TSV file, and the networks for each experimental condition (COND_signed_network.tsv) as 11 separate TSV files. Source code 1 also contains an inferelator folder with the Python scripts used to generate the networks for Figures 5, 6, and 7.

https://cdn.elifesciences.org/articles/51254/elife-51254-code1-v2.gz

Source code 2

The raw count matrix as a gzipped TSV file.

This file contains 38,225 observations (cells). Doublets and low-count cells have already been removed; gene expression values are unmodified transcript counts after deartifacting using UMIs (these values are directly produced by the cellranger count pipeline).
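
A gzipped TSV of this kind can be read with the Python standard library alone. The sketch below is illustrative and is not taken from the authors' scripts: the function name `read_count_matrix` is hypothetical, and it assumes the first line of the file is a header and every later line is one observation (cell). The actual column layout of the file should be checked before relying on it.

```python
import csv
import gzip


def read_count_matrix(path):
    """Read a gzipped, tab-separated count matrix.

    Assumption (not confirmed by the source): the first line is a header
    and each following line holds the values for one cell.
    Returns (header, rows), where every value is still a string.
    """
    with gzip.open(path, mode="rt", newline="") as handle:
        reader = csv.reader(handle, delimiter="\t")
        header = next(reader)            # column names
        rows = [row for row in reader]   # one list of strings per line
    return header, rows
```

If the layout assumption holds, calling `read_count_matrix` on the downloaded file and taking `len(rows)` gives the number of observations, which can be compared against the cell count stated above.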

https://cdn.elifesciences.org/articles/51254/elife-51254-code2-v2.gz

Source code 3

The network learned in this paper as a TSV file.

https://cdn.elifesciences.org/articles/51254/elife-51254-code3-v2.tsv

Source code 4

A ‘.tar.gz’ archive containing the sequences used for mapping reads.

It also contains a FASTA file containing the genotype-specific barcodes (bcdel_1_barcodes.fasta), a FASTA file containing the yeast S288C genome modified with markers (Saccharomyces_cerevisiae.R64-1-1.dna.toplevel.Marker.fa), and a GTF file containing the yeast gene annotations modified to include untranslated regions at the 5’ and 3’ end, and with markers (Saccharomyces_cerevisiae.R64-1-1.Marker.UTR.notRNA.gtf).

https://cdn.elifesciences.org/articles/51254/elife-51254-code4-v2.gz

Source code 5

A zipped HTML document containing the raw R output figures for Figures 2–7 and accompanying supplementary figures.

The R markdown file to create this document is contained in Source code 1.

https://cdn.elifesciences.org/articles/51254/elife-51254-code5-v2.zip

Supplementary file 1

An Excel file containing Supplemental Tables 1–6.

Supplemental Table 1 contains all primer sequences used in this work. Supplemental Table 2 contains all Saccharomyces cerevisiae strains used in this work. Supplemental Table 3 contains all plasmids used in this work. Supplemental Table 4 contains all media formulations used in this work. Supplemental Table 5 contains the source data for modeling performance (as AUPR) that is reported graphically in Figure 5. Supplemental Table 6 contains the gene categorizations (cell cycle stage, RP, RiBi, etc.) used in Figure 3.

https://cdn.elifesciences.org/articles/51254/elife-51254-supp1-v2.xlsx

Transparent reporting form

https://cdn.elifesciences.org/articles/51254/elife-51254-transrepform-v2.pdf
