In silico analysis of the transcriptional regulatory logic of neuronal identity specification throughout the C. elegans nervous system

  1. Lori Glenwinkel
  2. Seth R Taylor
  3. Kasper Langebeck-Jensen
  4. Laura Pereira
  5. Molly B Reilly
  6. Manasa Basavaraju
  7. Ibnul Rafi
  8. Eviatar Yemini
  9. Roger Pocock
  10. Nenad Sestan
  11. Marc Hammarlund
  12. David M Miller III
  13. Oliver Hobert (corresponding author)
  1. Department of Biological Sciences, Columbia University, Howard Hughes Medical Institute, United States
  2. Department of Cell and Developmental Biology, Vanderbilt University School of Medicine, United States
  3. Biotech Research and Innovation Centre, University of Copenhagen, Denmark
  4. Department of Neurobiology, Yale University School of Medicine, United States
  5. Department of Genetics, Yale University School of Medicine, United States
  6. Development and Stem Cells Program, Monash Biomedicine Discovery Institute and Department of Anatomy and Developmental Biology, Monash University, Australia
8 figures and 5 additional files

Figures

Figure 1
Background.

(A) Possible models for regulation of terminal gene batteries. (B) Four examples of terminal selectors (>20 markers tested, >1 binding site mutated). All genes shown are direct targets of the indicated terminal selectors (blue ovals) as evidenced by loss of reporter expression in genetic loss-of-function mutants (Etchberger et al., 2007; Kratsios et al., 2011; Masoudi et al., 2018; Wenick and Hobert, 2004). (C) Genetic and biochemical evidence of direct regulation by four terminal selectors. Blue: transcription factor (TF) binding site mutated, resulting in loss of reporter gene expression. Gray: reporter gene requires TF for expression in neuron class. White: reporter gene expression not affected in TF mutant. (D) Summary of published markers tested in putative terminal selector mutants per neuron class selectors, putative terminal selectors = other classes with genetic evidence of regulation (see Hobert, 2016a for review of markers tested). For neuron classes with more than one putative terminal selector, the largest set of markers tested for a single TF is listed.

Figure 2
Data sources overview.

(A) Top: Differentially expressed genes per single cell profile (Taylor et al., 2021). Bottom: Reporter count per neuron class from the Brain Atlas collection (a compendium of reporter expression patterns extracted from wormbase.org; Hobert et al., 2016). (B) Motif data overview. Of 230 neuronal transcription factors (TFs) from the Brain Atlas collection and single cell differential expression profiles, 136 TFs have unique motifs and 94 TFs have somewhat ambiguous motifs, where more than one Caenorhabditis elegans TF has a similar DNA binding domain as computed in Lambert et al., 2019. The majority of motifs are derived from protein binding microarrays (PBMs) (see Weirauch et al., 2014).

Figure 2—source data 1

DNA binding motif logos.

Images were downloaded from cisbp.ccbr.utoronto.ca/. See motif source information in Supplementary file 1A.

https://cdn.elifesciences.org/articles/64906/elife-64906-fig2-data1-v1.pdf
Figure 3 with 3 supplements
TargetOrtho2 development.

(A) Ortholog schematic for eight nematode genomes included in TargetOrtho2. Fifty motif features used by TargetOrtho2’s classifier to rank candidate transcription factor (TF) target genes are shown (left). TargetOrtho2 uses the FIMO scanner (Grant et al., 2011) to identify motif matches across eight nematode genomes. After assignment of each motif match to the nearest gene locus, the following features are used to rank each potential TF target gene. f1 and f2 = upstream and intronic conservation, where conservation is an integer corresponding to the number of species with at least one motif match (range 1 to 8). Other motif features used for classification are listed above. f1,1 = species 1, feature 1, corresponding to the species (labeled on right) and the motif feature (labeled above). Upstream frequency: motif match count upstream of the gene transcription start site. Intron frequency: motif match count in all introns. Upstream max position-specific scoring matrix (PSSM) score: highest scoring upstream motif match (PSSM score from FIMO motif scanner; Grant et al., 2011). Intron max PSSM score: highest scoring intronic motif match. Upstream average PSSM score: average of all upstream motif match PSSM scores. Intron average PSSM score: average of all intronic motif match PSSM scores. (B) Supervised learning for TF target gene prediction. Candidate TF target genes are rank ordered based on motif feature data per gene. Rather than ranking candidate TF regulatory target genes based on non-weighted normalized motif feature scores as in the previously published version of TargetOrtho (Glenwinkel et al., 2014), we tested and implemented a supervised learning approach in which motif features of previously in vivo validated TF target genes (Figure 1B) are used to train a classifier for predicting and ranking novel TF target genes. (C) TargetOrtho2 user interface for OS X.
The user uploads a MEME formatted DNA binding motif file (Bailey et al., 2009) for a TF of interest and selects a p-value threshold for the FIMO motif scanner. The reference genome can be set to Caenorhabditis elegans or Pristionchus pacificus so that target gene statistics are output for a specific species. The search distance can be restricted to a user-defined upstream distance. If the intergenic distance is smaller than the selected distance, the smaller distance will be searched. TargetOrtho2 is also available as a command line tool for Linux with additional adjustable parameters. (D) Snapshot of TargetOrtho2 output from the ODR-7 motif. TargetOrtho2 outputs a summary file listing each candidate target gene and its relative rank order derived from the Gaussian process classifier (GPC). Each motif feature listed in part A is output in this file. An additional file lists each motif match per candidate target gene, with the PSSM scores and p-values from the FIMO scanner (Grant et al., 2011) as well as the matching DNA sequence and genome coordinate. Motif match information is output separately for each of the eight species.
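
The per-gene motif features listed in part A can be illustrated with a minimal Python sketch. The record layout, gene names, species names, and scores below are hypothetical and are not TargetOrtho2's actual file format or code. The sketch computes the two conservation counts plus the six per-species features for a single reference species. Repeating the six per-species features for each of eight species and adding the two conservation counts would give 8 × 6 + 2 = 50 values, which is consistent with the fifty features mentioned in the caption.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical motif-match records: (species, gene, region, pssm_score).
# These values are illustrative only, not real scanner output.
matches = [
    ("C. elegans", "gene-a", "upstream", 12.5),
    ("C. elegans", "gene-a", "upstream", 9.0),
    ("C. elegans", "gene-a", "intron", 7.5),
    ("C. briggsae", "gene-a", "upstream", 11.0),
    ("C. elegans", "gene-b", "intron", 8.0),
]


def motif_features(matches, gene, reference="C. elegans"):
    """Summarize motif matches for one gene in one reference species."""
    scores_by_region = defaultdict(list)   # region -> scores in the reference species
    species_with_hit = defaultdict(set)    # region -> species with at least one match
    for species, match_gene, region, score in matches:
        if match_gene != gene:
            continue
        species_with_hit[region].add(species)
        if species == reference:
            scores_by_region[region].append(score)

    features = {}
    for region in ("upstream", "intron"):
        scores = scores_by_region[region]
        # Conservation: number of species with at least one match in this region.
        features[f"{region}_conservation"] = len(species_with_hit[region])
        # Frequency: number of matches in the reference species.
        features[f"{region}_frequency"] = len(scores)
        # Max and average PSSM score; None when the region has no match.
        features[f"{region}_max_pssm"] = max(scores) if scores else None
        features[f"{region}_avg_pssm"] = mean(scores) if scores else None
    return features


features_a = motif_features(matches, "gene-a")
```

In this toy input, gene-a has upstream matches in two species, so its upstream conservation is 2, while a region without matches reports a frequency of 0 and no scores.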

Figure 3—figure supplement 1
TargetOrtho2 development cross-validation results.

(A) Prediction of transcription factor (TF) target genes with supervised learning. Diagram of cross-validation scheme for choosing the best classifier for target gene prediction from experimentally validated target gene motif feature data. Motif feature data from TF target genes shown in Figure 1B were used to generate testing and training sets for classifier evaluation. A Gaussian process classifier (GPC) has the best recall of true positive TF target genes (see Figure 3—figure supplement 3). (B) Cross-validation results from combined motif feature data from the CHE-1 (ASE), UNC-3 (COE), and TTX-3::CEH-10 (AIY) motifs (OH motifs in Supplementary file 1A). X axis: classifiers tested. Left: Recall: recovery of true positive TF target genes. Middle: Wilcoxon test: TargetOrtho2 ranking of true positive TF target genes compared to random genes with motifs. Right: Wilcoxon test Z scores. (C) Cross-validation results from individual motifs using the Gaussian process classifier (GPC).
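
The two evaluation metrics named in this caption, recall and a Wilcoxon rank-sum comparison against random genes, can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' evaluation code: the function names and example scores are hypothetical, and the z score uses the normal approximation with average ranks for ties and no tie correction of the variance.

```python
import math


def recall(true_labels, predicted_labels):
    """Fraction of true positives that the classifier recovers."""
    pairs = list(zip(true_labels, predicted_labels))
    true_pos = sum(1 for t, p in pairs if t and p)
    false_neg = sum(1 for t, p in pairs if t and not p)
    total = true_pos + false_neg
    return true_pos / total if total else 0.0


def rank_sum_z(sample_a, sample_b):
    """Wilcoxon rank-sum z score for sample_a versus sample_b (normal approximation)."""
    pooled = sorted([(v, 0) for v in sample_a] + [(v, 1) for v in sample_b])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        # Extend j over a run of tied values so they share an average rank.
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[k] = average_rank
        i = j + 1
    n_a, n_b = len(sample_a), len(sample_b)
    rank_sum_a = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    expected = n_a * (n_a + n_b + 1) / 2
    std_dev = math.sqrt(n_a * n_b * (n_a + n_b + 1) / 12)
    return (rank_sum_a - expected) / std_dev


# Hypothetical classifier scores: validated targets versus random genes with motifs.
validated_scores = [0.9, 0.8, 0.7]
random_scores = [0.4, 0.3, 0.2, 0.1]
z_score = rank_sum_z(validated_scores, random_scores)
two_sided_p = math.erfc(abs(z_score) / math.sqrt(2))
example_recall = recall([1, 1, 1, 0], [1, 0, 1, 1])
```

A positive z score here means the validated targets tend to receive higher classifier scores than the random genes.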

Figure 3—figure supplement 2
Four bona fide terminal selectors are enriched for phylogenetically conserved transcription factor (TF) binding sites (Brain Atlas data).

Motif matches in neuron class reporter genes. UNC-3 motif (OH2011) from Kratsios et al., 2011, ASE motif (OH2007) from Etchberger et al., 2007, AIY motif (OH2004) from Wenick and Hobert, 2004, ADL motif (OH2018) from Masoudi et al., 2018. Darker shades of purple are better ranked by TargetOrtho2 as TF target genes. Lighter purple genes have weak motif matches and white genes have no motif matches. Three motifs are available for AIY: the in vivo derived AIY motif (OH2004) and the PBM-derived TTX-3 and CEH-10 motifs. AIY genes are clustered for visualization of motif match overlap. Right: The proportion of markers with a motif, fold enrichment from the hypergeometric test for enrichment, and p-values from the hypergeometric and Wilcoxon rank sums tests for motif enrichment and TargetOrtho2 ranking, respectively.

Figure 3—figure supplement 3
Comparison of in vivo and in vitro derived transcription factor (TF) DNA binding motifs.

(A) Data from single cell differential gene expression profiles. (B) Data from Brain Atlas.

Figure 4 with 4 supplements
Terminal selectors with published genetic loss-of-function data.

(A) Transcription factors (TFs) with previously described effects on select identity features are predicted to broadly control neuron type-specific gene batteries. TFs and target neuron class-specific gene batteries are listed with the proportion of markers tested in the corresponding genetic loss-of-function mutant. TFs and neuron class labels are colored according to motif signatures described in 4B. Only putative terminal selector TFs with DNA binding motifs and prior evidence of direct regulation of neuron class-specific genes are shown regardless of motif enrichment status. (B) DNA binding motif signatures in neuron class-specific gene batteries. Three distinct motif signatures among neuron class-specific gene batteries are diagramed: (1) a ‘coordinated regulatory’ motif signature (blue) in which resident genes are both significantly enriched for DNA binding motifs compared to genes across the entire genome (enrichment test p<0.05) and also significantly rank ordered (Wilcoxon rank sums test p<0.05) compared to random genes across the genome with a DNA binding motif (gene rank test); (2) a ‘piecemeal regulatory’ motif signature in which resident genes are not significantly enriched, but those genes that do have a DNA binding motif match are significantly rank ordered by TargetOrtho2 compared to random genes across the genome with a motif (green); and (3) ‘other’ (gray), a situation where many resident genes have a motif match, but these genes are not significantly rank ordered by TargetOrtho2 as likely TF target genes or where residents are neither enriched for motifs nor rank ordered by TargetOrtho2. The corresponding tests are indicated in the table to the left. *p<0.05, N.S. = not significant. Enrichment test = hypergeometric test for enrichment. Rank sums test = Wilcoxon rank sums test. 
Example neuron class-specific gene batteries are colored by TargetOrtho2 rank order per gene, where highly ranked TF target genes are dark purple and poorly ranked TF target genes are lighter purple. White boxes indicate genes with no DNA binding motif match present. TargetOrtho2 rankings are normalized from full genome DNA binding motif matches, where the best predicted TF targets are assigned a value of 100 and the worst are assigned a value of 1. (C) Four bona fide terminal selectors are enriched for phylogenetically conserved TF binding sites. DNA binding motif matches in differentially expressed neuron class genes colored by normalized TargetOrtho2 rank order (see color bar in part B). UNC-3 motif (OH2011) from Kratsios et al., 2011, ASE motif (OH2007) from Etchberger et al., 2007, AIY motif (OH2004) from Wenick and Hobert, 2004, ADL motif (OH2018) from Masoudi et al., 2018. Darker shades of purple are better ranked by TargetOrtho2 as TF target genes. Lighter purple genes have weak motif matches and white genes have no motif matches. Three motifs are available for AIY: the in vivo derived AIY motif (OH2004) and the PBM-derived TTX-3 and CEH-10 motifs. AIY genes are clustered for visualization of motif match overlap. Right: The proportion of markers with a motif, fold enrichment from the hypergeometric test for enrichment, and p-values from the hypergeometric and Wilcoxon rank sums tests for motif enrichment and TargetOrtho2 ranking, respectively.
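
The decision logic behind the three motif signatures in part B can be sketched as follows. This is an illustrative reading of the caption, not the authors' code: the function names and the toy counts are hypothetical, the enrichment p-value is computed as the upper tail of a hypergeometric distribution, and the rank sums p-value is taken as a precomputed input.

```python
from math import comb


def hypergeom_enrichment_p(genome_size, genome_with_motif, battery_size, battery_with_motif):
    """Upper-tail probability of seeing at least battery_with_motif motif-bearing
    genes when battery_size genes are drawn from the genome without replacement."""
    total = comb(genome_size, battery_size)
    upper = min(battery_size, genome_with_motif)
    tail = sum(
        comb(genome_with_motif, k) * comb(genome_size - genome_with_motif, battery_size - k)
        for k in range(battery_with_motif, upper + 1)
    )
    return tail / total


def motif_signature(enrichment_p, rank_p, alpha=0.05):
    """Assign one of the three signatures described in the caption."""
    if rank_p < alpha:
        # Significantly rank ordered: enrichment decides coordinated vs piecemeal.
        return "coordinated" if enrichment_p < alpha else "piecemeal"
    # Not significantly rank ordered, whether or not motifs are enriched.
    return "other"


# Toy numbers: 100 genes in the genome, 10 with a motif; a 5-gene battery with 3 motif genes.
toy_enrichment_p = hypergeom_enrichment_p(100, 10, 5, 3)
toy_signature = motif_signature(toy_enrichment_p, rank_p=0.01)
```

With these toy counts the battery is enriched for the motif, so a significant rank sums result yields the coordinated signature.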

Figure 4—figure supplement 1
Proportion of predicted coordinated regulators versus number of markers tested.

(A) Single cell RNA-seq differential gene expression data. (B) Brain Atlas reporter data.

Figure 4—figure supplement 2
Comparison of in vivo and in vitro derived TF DNA binding motifs.

(A) Data from single cell differential gene expression profiles. (B) Data from Brain Atlas.

Figure 4—figure supplement 3
Four bona fide terminal selectors are enriched for phylogenetically conserved TF binding sites (Brain Atlas data).

Motif matches in neuron class reporter genes. UNC-3 motif (OH2011) from Kratsios et al., 2011, ASE motif (OH2007) from Etchberger et al., 2007, AIY motif (OH2004) from Wenick and Hobert, 2004, ADL motif (OH2018) from Masoudi et al., 2018. Darker shades of purple are better ranked by TargetOrtho2 as TF target genes. Lighter purple genes have weak motif matches and white genes have no motif matches. Three motifs are available for AIY: the in vivo derived AIY motif (OH2004) and the PBM-derived TTX-3 and CEH-10 motifs. AIY genes are clustered for visualization of motif match overlap. Right: The proportion of markers with a motif, fold enrichment from the hypergeometric test for enrichment, and p-values from the hypergeometric and Wilcoxon rank sums tests for motif enrichment and TargetOrtho2 ranking, respectively.

Figure 4—figure supplement 4
Motif presence in published, TF-dependent reporter genes.
Figure 5
Experimental validation of putative terminal selectors.

(A) Genetic loss-of-function mutant analysis for three transcription factors (TFs) with little previous evidence of direct regulation. ODR-7 is required for expression of ins-1 and pgp-2 reporters in AWA. CFI-1 is required for expression of unc-17 and cho-1 in PVC, and UNC-86 is required for expression of flp-19 and gcy-33 in URX. (B–D) Validation of cis-regulatory motifs. (B) Schematic of UNC-42 (green rectangle), LIM-4 (orange rectangle), and UNC-86 (blue rectangle) binding sites in the cho-1 locus. White rectangles: binding site deleted (UNC-42 and LIM-4) or mutated (UNC-86). Black lines: upstream intergenic region. (C) cho-1 reporter expression in SMD and SMB is lost when the LIM-4/UNC-42 binding site is deleted. Deletion of the more proximal UNC-42 binding site results in loss of cho-1 reporter expression in SMD only. A lad-2 reporter was used to identify SMD and SMB neuron classes. Mutation of the UNC-86 binding site results in loss of expression in URX. A 1 kb fragment is sufficient to drive cho-1 expression in URX and ADF. A cho-1 fosmid reporter was used to identify URX. White dashed rectangle: enlargement with brightness increased to show dim gfp expression of cho-1 in ADF. (D) Quantification of reporter expression in adults.

Figure 6 with 1 supplement
Motif signatures in the nervous system from single cell data.

(A) Overview of all transcription factor (TF) motif signatures in differentially expressed neuron class genes across the nervous system. Most neuron classes have a candidate ‘coordinated regulator’ (see Figure 4B). Distribution of regulatory signatures by neuron class (top) and by TF (bottom). (B) Overview of predicted terminal selectors in the nervous system from single cell differential gene expression data. Terminal selectors refer to candidate coordinated regulators summarized in 6A.

Figure 6—figure supplement 1
Motif presence in published, transcription factor (TF)-dependent reporter genes.
Figure 7
Terminal selector combinations inferred from motif co-occurrences.

(A) Co-motif signatures from pairwise cofactor analysis. (1) ‘co-occurring coordinated signatures’ (orange), in which not only is each TF individually characterized as a ‘coordinated regulator’ (Figure 4A, blue), but the observed DNA binding motif matches from each of two TFs are significantly co-enriched in common neuron class-specific genes compared to expected by chance (co-enrichment test p<0.05) and these common genes are each significantly rank ordered by TargetOrtho2 compared to random genes with DNA binding motif matches in the genome (gene rank tests, p<0.05 for each motif); (2) ‘common regulators’ (yellow), in which significant co-enrichment and TargetOrtho2 rank order is observed as in case 1, but one or both of the two TFs may be a piecemeal regulator; (3) ‘independent regulators’ (gray), in which either significant co-enrichment and/or TargetOrtho2 rank order is not observed for one or both motifs. (B) Motif patterns among cofactors with genetic loss-of-function data. Colored labels on the right correspond to motif signatures of the individual TFs (right two colored columns, see Figure 4B) as well as the co-motif signature (left colored column, see part 7A). Lower: Example of cofactor DNA binding motif matches in an orthologous gene set. (C) Cofactor analysis overview in the nervous system. Most cases where motif 1 and motif 2 have ‘coordinated regulatory’ motif signatures show common master regulator co-motif signatures. The master regulator TF pair targets common effector genes in a given neuron class gene battery (orange). Example TF pairs are shown to the right. (D) Overview of cofactor motif signatures in the nervous system. Colors correspond to co-motif signatures described in part A. (E) Several examples of candidate terminal selectors whose binding sites are co-enriched in a given neuron class.
The chord diagrams (generated using the Chord package for Python) indicate which combinations of candidate terminal selectors show joint enrichment in neuron type-specific gene batteries. Chord diagrams of all other neurons in which co-enrichment of coordinated regulators was observed are shown in Figure 7—source data 1. TFs with genetic evidence for terminal selector function are boxed in red. All combinations are also listed in Supplementary File S2G. Note that TF expression is based on scRNA data and may contain false negative/positive signals (in contrast to transcripts, the UNC-86 protein is not expressed in ADL) and also includes broadly/ubiquitously expressed TFs (e.g. ztf-3). This may inflate the number of putative terminal selectors.
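
The three co-motif signatures defined in part A amount to a small decision rule, sketched below in Python. This is one reading of the caption, not the authors' implementation: the function and argument names are hypothetical, and all p-values are assumed to be computed elsewhere because the caption does not specify the co-enrichment test. One further assumption is labeled in the code: a jointly supported pair in which either TF is not individually ‘coordinated’ is treated as ‘common regulators’.

```python
def co_motif_signature(signature_1, signature_2, co_enrichment_p,
                       rank_p_1, rank_p_2, alpha=0.05):
    """Classify a TF pair from its individual signatures and pairwise test results."""
    jointly_supported = (
        co_enrichment_p < alpha and rank_p_1 < alpha and rank_p_2 < alpha
    )
    if not jointly_supported:
        # Case 3: co-enrichment and/or rank order missing for one or both motifs.
        return "independent regulators"
    if signature_1 == "coordinated" and signature_2 == "coordinated":
        # Case 1: both TFs are individually coordinated regulators.
        return "co-occurring coordinated"
    # Case 2 (assumption): any other jointly supported pair, e.g. one or both piecemeal.
    return "common regulators"


# Hypothetical p-values for three TF pairs.
pair_a = co_motif_signature("coordinated", "coordinated", 0.01, 0.02, 0.03)
pair_b = co_motif_signature("coordinated", "piecemeal", 0.01, 0.02, 0.03)
pair_c = co_motif_signature("coordinated", "coordinated", 0.20, 0.02, 0.03)
```

Checking joint support first mirrors the caption, where cases 1 and 2 share the same co-enrichment and rank order requirements and differ only in the individual signatures.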

Figure 7—source data 1

Chord diagrams that show predicted terminal selector combinations (with joint enrichment of binding sites in terminal gene batteries) for all neuron classes.

https://cdn.elifesciences.org/articles/64906/elife-64906-fig7-data1-v1.pdf
Figure 8
Comparison of Brain Atlas reporter-based motif analysis and single cell RNA-sequencing derived results.

(A) The proportion of motifs per transcription factor (TF) per neuron class is significantly correlated between the two datasets. (B) Motif signature model assignments per TF per neuron class are significantly correlated between the two datasets.

Additional files

Supplementary file 1

Data sources.

(A) Motif information. Log-odd PSSM in MEME format for non-CISBP source motifs can be found at the end of (A). (B) Single cell differentially expressed genes as binary expression matrix. (C) Brain Atlas binary expression matrix.

https://cdn.elifesciences.org/articles/64906/elife-64906-supp1-v1.xlsx
Supplementary file 2

Motif analysis results from single cell differential expression data.

(A) Results from transcription factors (TFs) with genetic loss-of-function data, including results that were further validated in vivo. (B) Results for four well-characterized terminal selectors. (C) Nervous system-wide motif analysis results from single cell differential gene expression data. (D) Cofactor results from motif analysis of single cell differential gene expression data. (E) All predicted coordinated regulators (candidate terminal selectors) by neuron class. (F) Candidate target neuron class gene batteries by coordinated regulatory TF. (G) All co-occurring coordinated signatures (candidate terminal selector co-regulators).

https://cdn.elifesciences.org/articles/64906/elife-64906-supp2-v1.xlsx
Supplementary file 3

Motif analysis stats from Brain Atlas reporter data.

(A) Brain Atlas single factor motif analysis results. (B) Cofactor results. (C) Agreement of top two regulators between datasets.

https://cdn.elifesciences.org/articles/64906/elife-64906-supp3-v1.xlsx
Supplementary file 4

Strain information.

https://cdn.elifesciences.org/articles/64906/elife-64906-supp4-v1.xlsx
Transparent reporting form
https://cdn.elifesciences.org/articles/64906/elife-64906-transrepform-v1.docx


  1. Lori Glenwinkel
  2. Seth R Taylor
  3. Kasper Langebeck-Jensen
  4. Laura Pereira
  5. Molly B Reilly
  6. Manasa Basavaraju
  7. Ibnul Rafi
  8. Eviatar Yemini
  9. Roger Pocock
  10. Nenad Sestan
  11. Marc Hammarlund
  12. David M Miller III
  13. Oliver Hobert
(2021)
In silico analysis of the transcriptional regulatory logic of neuronal identity specification throughout the C. elegans nervous system
eLife 10:e64906.
https://doi.org/10.7554/eLife.64906