Systematic creation and phenotyping of Mendelian disease models in C. elegans: towards large-scale drug repurposing

  1. Institute of Clinical Sciences, Imperial College London, London, UK
  2. MRC London Institute of Medical Sciences, London, UK

Peer review process

Revised: This Reviewed Preprint has been revised by the authors in response to the previous round of peer review; the eLife assessment and the public reviews have been updated where necessary by the editors and peer reviewers.


Editors

  • Reviewing Editor
    Patrick Hu
    Vanderbilt University Medical Center, Nashville, United States of America
  • Senior Editor
    Tony Yuen
    Icahn School of Medicine at Mount Sinai, New York, United States of America

Reviewer #1 (Public review):

Summary:

As the scientific community identifies increasing numbers of genes and genetic variants that cause rare human diseases, a challenge in the field is to quickly identify pharmacological interventions to address known deficits. The authors point out that defining phenotypic outcomes required for drug screen assays is often a bottleneck, and emphasize how invertebrate models can be used for quick ID of compounds that may address genetic deficits. A major contribution of this work is to establish a framework for potential intervention drug screening based on quantitative imaging of morphology and mobility behavior, using methods that the authors show can define subtle phenotypes in a high proportion of disease gene knockout mutants. Overall, the work constitutes an elegant combination of previously developed high-volume imaging with highly detailed quantitative phenotyping (and some paring down to specific phenotypes) to establish proof of principle on how the combined applications can contribute to screens for compounds that may address specific genetic deficits, which can, in turn, suggest both mechanism and therapy.

In brief, the authors selected 25 genes for which loss of function is implicated in human neuro-muscular disease and engineered deletions in the corresponding C. elegans homologs. The authors then imaged morphological features and behaviors prior to, during, and after blue light stimuli, quantitating features, and clustering outcomes as they elegantly developed previously (PMID 35322206; 30171234; 30201839). In doing so, phenotypes in 23/25 tested mutants could be separated enough to distinguish WT from mutant and half of those with adequate robustness to permit high-throughput screens, an outcome that supports the utility of related general efforts to ID phenotypes in C. elegans disease orthologs. A detailed discussion of 4 ciliopathy gene defects and NALCN-related channelopathy mutants reveals both expected and novel phenotypes, validating the basic approach to modeling vetted targets and underscoring that quantitative imaging approaches reiterate known biology.

The authors then screened a library of nearly 750 FDA-approved drugs for the capacity to shift the unc-80 NALCN channel-disrupted phenotype closer to the wild type. Top "mover" compounds shift the outcome in the experimental outcome space and also reveal how "side effects" can be evaluated to prioritize compounds that confer the fewest changes of other parameters away from the center.

Strengths:

Although the imaging and data analysis approaches have been reported and the screen is restricted in scope and intervention exposure, it is impressive, encouraging and important that the authors strongly combine tools to demonstrate how quantitative imaging phenotypes can be integrated with C. elegans genetics to accelerate the identification of potential modulators of disease (easily extendable to other goals). Generation of deletion alleles and documentation of their associated phenotypes (available in supplemental data) provide potentially useful reagents/data to the field. The capacity to identify "over-shooting" of compound applications with suggestions for scale back and to sort efficacious interventions to minimize other changes to behavioral and physical profiles is a strong contribution.

Weaknesses:

The work does not have major weaknesses, and in revision, the authors have expanded the discussion to potential utility and application in the field.

The authors have also taken into account minor modifications in writing.

Reviewer #2 (Public review):

Summary and strengths:

O'Brien et al. present a compelling strategy to both understand rare disease that could have a neuronal focus and discover drugs for repurposing that can affect rare disease phenotypes. Using C. elegans, they optimize the Brown lab worm tracker and Tierpsy analysis platform to look at movement behaviors of 25 knockout strains. These gene knockouts were chosen based on a process to identify human orthologs that could underlie rare diseases. I found the manuscript interesting and a powerful approach to make genotype-phenotype connections using C. elegans. Given the rate at which rare Mendelian diseases are found and candidate genes suggested, human geneticists need to consider orthologous approaches to understand the disease and seek treatments on a rapid time scale. This approach is one such way. Overall, I have a few minor suggestions and some specific edits.

Weaknesses:

(1) Throughout the text on figures, labels are nearly impossible to read. I had to zoom into the PDF to determine what the figure was showing. Please make text in all figures a minimum of 10 point font. Similarly, Figure 2D point type is impossible to read. Points should be larger in all figures. Gene names should be in italics in all figures, following C. elegans convention.

(2) I have a strong bias against the second point in Figure 1A. Sequencing of trios, cohorts, or individuals NEVER identifies causal genes in the disease. This technique proposes a candidate gene. Future experiments (oftentimes in model organisms) are required to make those connections to causality. Please edit this figure and parts of the text.

(3) How were the high-confidence orthologs filtered from 767 to 543 (lines 128-131)? Also, the choice of the final list of 25 genes is not well justified. Please expand more about how these choices were made.

(4) Figures 3 and 4, why show all 8289 features? It might be easier to understand and read if only the 256 Tierpsy features were plotted in the heat maps.

(5) The unc-80 mutant screen is clever. In the feature space, it is likely better to focus on the 256 less-redundant Tierpsy features instead of just a number of features. It is unclear to me how many of these features are correlated and not providing more information. In other words, the "worsening" of less-redundant features is far more of a concern than "worsening" of 1000 correlated features.

Reviewer #3 (Public review):

In this study, O'Brien et al. address the need for scalable and cost-effective approaches to finding lead compounds for the treatment of the growing number of Mendelian diseases. They used state-of-the-art phenotypic screening based on an established high-dimensional phenotypic analysis pipeline in the nematode C. elegans.

First, a panel of 25 C. elegans models was created by generating CRISPR/Cas9 knock-out lines for conserved human disease genes. These mutant strains underwent behavioral analysis using the group's published methodology. Clustering analysis revealed common features for genes likely operating in similar genetic pathways or biological functions. The study also presents results from a more focused examination of ciliopathy disease models.

Subsequently, the study focuses on the NALCN channel gene family, comparing the phenotypes of mutants of nca-1, unc-77, and unc-80. This initial characterization identifies three behavioral parameters that exhibit significant differences from the wild type and could serve as indicators for pharmacological modulation.

As a proof-of-concept, O'Brien et al. present a drug repurposing screen using an FDA-approved compound library, identifying two compounds capable of rescuing the behavioral phenotype in a model with UNC80 deficiency. The relatively short time and low cost associated with creating and phenotyping these strains suggest that high-throughput worm tracking could serve as a scalable approach for drug repurposing, addressing the multitude of Mendelian diseases. Interestingly, by measuring a wide range of behavioural parameters, this strategy also simultaneously reveals deleterious side effects of tested drugs that may confound the analysis.

Considering the wealth of data generated in this study regarding important human disease genes, it is regrettable that the data is not made accessible to researchers less versed in data analysis methods. This diminishes the study's utility. It would have a far greater impact if an accessible and user-friendly online interface were established to facilitate data querying and feature extraction for specific mutants. This would empower researchers to compare their findings with the extensive dataset created here.

Another technical limitation of the study is the use of single alleles. Large deletion alleles were generated by CRISPR/Cas9 gene editing. At first glance, this seems like a good idea because it limits the risk that background mutations, present in chemically-generated alleles, will affect behavioral parameters. However, these large deletions can also remove non-coding RNAs or other regulatory genetic elements, as found, for example, in introns. Therefore, it would be prudent to validate the behavioral effects by testing additional loss-of-function alleles produced through early stop codons or targeted deletion of key functional domains.

Author response:

The following is the authors’ response to the original reviews.

Public Reviews:

Reviewer #1 (Public Review):

Summary:

As the scientific community identifies increasing numbers of genetic variants that cause rare human diseases, a challenge is how the field can most quickly identify pharmacological interventions to address known deficits. The authors point out that defining phenotypic outcomes required for drug screen assays is often challenging, and emphasize how invertebrate models can be used for quick ID of compounds that may address genetic deficits. A major contribution of this work is to establish a framework for potential intervention drug screening based on quantitative imaging of morphology and mobility behavior, using methods that the authors show can define subtle phenotypes in a high proportion of disease gene knockout mutants.

Overall, the work constitutes an elegant combination of previously developed high-volume imaging with highly detailed quantitative phenotyping (and some paring down to specific phenotypes) to establish proof of principle on how the combined applications can contribute to screens for compounds that may address specific genetic deficits, which can suggest both mechanism and therapy.

In brief, the authors selected 25 genes for which loss of function is implicated in human neuro-muscular disease and engineered deletions in the corresponding C. elegans homologs. The authors then imaged morphological features and behaviors prior to, during, and after blue light stimuli, quantitating features, and clustering outcomes as they elegantly developed previously (PMID 35322206; 30171234; 30201839). In doing so, phenotypes in 23/25 tested mutants could be separated enough to distinguish WT from mutant and half of those with adequate robustness to permit high-throughput screens, an outcome that supports the utility of general efforts to ID phenotypes in C. elegans disease orthologs using this approach. A detailed discussion of 4 ciliopathy gene defects and NALCN-related channelopathy mutants reveals both expected and novel phenotypes, validating the basic approach to modeling vetted targets and underscoring that quantitative imaging approaches reiterate known biology. The authors then screened a library of nearly 750 FDA-approved drugs for the capacity to shift the unc-80 NALCN channel-disrupted phenotype closer to the wild type. Top "mover" compounds shift the outcome in the experimental outcome space and also reveal how "side effects" can be evaluated to prioritize compounds that confer the fewest changes of other parameters away from the center.

Strengths:

Although the imaging and data analysis approaches have been reported and the screen is limited in scope and intervention exposure, it is important that the authors strongly combine individual approach elements to demonstrate how quantitative imaging phenotypes can be integrated with C. elegans genetics to accelerate the identification of potential modulators of disease (easily extendable to other goals). Generation of deletion alleles and documentation of their associated phenotypes (available in supplemental data) provide potentially useful reagents/data to the field. The capacity to identify "over-shooting" of compound applications with suggestions for scale back and to sort efficacious interventions to minimize other changes to behavioral and physical profiles is a strong contribution.

Weaknesses:

The work does not have major weaknesses, although it may be possible to expand the discussion to increase utility in the field:

(1) Increased discussion of the challenges and limitations of the approach may enhance successful adaptation and application in the field.

It is quite possible that morphological and behavioral phenotypes have nothing to do with disease mechanisms and rather reflect secondary outcomes, such that positive hits will address "off-target" consequences.

This is possible and can only be determined with human data. We now discuss the possibility in the discussion.

The deletion approach is adequately justified in the text, but the authors may make the point somewhere that screening target outcomes might be enhanced by the inclusion of engineered alleles that match the human disease condition. Their work on sod-1 alleles (PMID 35322206) might be noted in this discussion.

We agree and now mention this work in the discussion. We are currently working on a collection of strains with patient-specific mutations.

Drug testing here involved a strikingly brief exposure to a compound, which holds implications for how a given drug might engage in adult animals. The authors might comment more extensively on extended treatments that include earlier life or more extended targeting. The assumption is that administering different exposure periods and durations is feasible, but if the authors are aware of challenges associated with more prolonged applications, larger scale, etc., it would be useful to note them.

More prolonged applications are definitely possible. We chose short treatments for this screen to model the potential for changing neural phenotypes once developmental effects of the mutation have already occurred. We now briefly discuss this choice and the potential of longer treatments in the discussion.

(2) More justification of the shift to only a few target parameters for judging compound effectiveness.

- In the screen in Figure 4D and the text around line 313, 3 selected core features of the unc-80 mutant (fraction that pause under blue light, speed, and curvature) were used to avoid the high replicate requirements to identify subtle phenotypes. Although this strategy was successful as reported in Figure 5, the pared-down approach seems a bit at odds with the emphasis on the range of features that can be compared between mutant and WT with the authors' powerful image analysis. Adding details about the reduced statistical power upon multiple comparisons, with a concrete example calculated, might help interested scientists better assess how to apply this tool in experimental design.

To empirically test the effect of including more features on the subsequent screen, we have repeated the analysis using increasing numbers of features. In a new supplementary figure we find increasing the number of features reduces our power to detect rescue. At 256 features, we would not be able to detect any compounds that rescued the disease model phenotype.
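
As a rough illustration of why testing more features lowers the power to detect rescue, the sketch below computes the approximate power of a two-sided, two-sample z-test for a single feature when the significance threshold is Bonferroni-corrected across all tested features. The effect size, the number of replicates, and the choice of Bonferroni correction are assumptions made for illustration; they are not values or methods taken from the study.

```python
from statistics import NormalDist


def bonferroni_power(effect_size, n_per_group, n_features, alpha=0.05):
    """Approximate power of a two-sided two-sample z-test for one feature
    when alpha is Bonferroni-corrected across n_features tests.

    effect_size is a standardized mean difference (hypothetical value);
    n_per_group is the number of replicates in each group (hypothetical).
    """
    z = NormalDist()
    alpha_per_test = alpha / n_features           # Bonferroni threshold
    z_crit = z.inv_cdf(1 - alpha_per_test / 2)    # two-sided critical value
    shift = effect_size * (n_per_group / 2) ** 0.5
    # Probability that the test statistic falls in either rejection region
    return (1 - z.cdf(z_crit - shift)) + z.cdf(-z_crit - shift)


# Feature counts mentioned in the reviews: 3 core, 256 reduced, 8289 full
for n_features in (3, 256, 8289):
    print(n_features, round(bonferroni_power(1.0, 12, n_features), 3))
```

With a fixed effect size and replicate number, the computed power falls as the number of tested features grows, which is the qualitative trend the authors describe.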

(3) More development of the side-effect concept. The side effects analysis is interesting and potentially powerful. Prioritization of an intervention because of minimal perturbation of other phenotypes might be better documented and discussed a bit further; how reliably does the metric of low side effects correlate with drug effectiveness?

Ultimately this can only be determined with clinical trial data on multiple drugs, but there are currently no therapeutic options for UNC80 deficiency in humans. We have included some extra discussion of the side effect concept.
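
One way to make the side-effect idea concrete is sketched below. It assumes each feature has been expressed as a z-score relative to wild type (so 0 means wild-type-like), scores rescue as the drop in mean absolute deviation across the core features, and counts as side effects any non-core features that were wild-type-like in the untreated mutant but not after treatment. The function name, the threshold, the non-core feature names, and all numbers are hypothetical; this is not the authors' scoring procedure.

```python
def score_compound(mutant, treated, core, z_cut=2.0):
    """Return (rescue, n_side_effects) for one compound.

    mutant, treated: dict mapping feature name -> z-score relative to
    wild type (0 = wild type). core: features used to define rescue.
    z_cut is an assumed cutoff for calling a feature different from wild type.
    """
    before = sum(abs(mutant[f]) for f in core) / len(core)
    after = sum(abs(treated[f]) for f in core) / len(core)
    rescue = before - after  # positive means closer to wild type
    side_effects = [
        f for f in treated
        if f not in core and abs(mutant[f]) < z_cut and abs(treated[f]) >= z_cut
    ]
    return rescue, len(side_effects)


# Invented z-scores, for illustration only
mutant = {"speed": -3.0, "curvature": 2.5, "pause_fraction": 3.5,
          "length": 0.2, "width": -0.4}
treated = {"speed": -1.0, "curvature": 1.0, "pause_fraction": 1.5,
           "length": 0.1, "width": -2.6}
core = ["speed", "curvature", "pause_fraction"]

rescue, n_side = score_compound(mutant, treated, core)
print(rescue, n_side)
```

Compounds could then be ranked by rescue first and by the side-effect count second, so that two compounds with similar rescue are separated by how much they perturb otherwise normal features.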

Reviewer #2 (Public Review):

Summary and strengths:

O'Brien et al. present a compelling strategy to both understand rare disease that could have a neuronal focus and discover drugs for repurposing that can affect rare disease phenotypes. Using C. elegans, they optimize the Brown lab worm tracker and Tierpsy analysis platform to look at the movement behaviors of 25 knockout strains. These gene knockouts were chosen based on a process to identify human orthologs that could underlie rare diseases. I found the manuscript interesting and a powerful approach to making genotype-phenotype connections using C. elegans. Given the rate at which rare Mendelian diseases are found and candidate genes suggested, human geneticists need to consider orthologous approaches to understand the disease and seek treatments on a rapid time scale. This approach is one such way. Overall, I have a few minor suggestions and some specific edits.

Weaknesses:

(1) Throughout the text on figures, labels are nearly impossible to read. I had to zoom into the PDF to determine what the figure was showing. Please make text in all figures a minimum of 10-point font. Similarly, the Figure 2D point type is impossible to read. Points should be larger in all figures. Gene names should be in italics in all figures, following C. elegans convention.

We have updated all figures with larger labels and, where necessary, split figures to allow for better readability. We’ve also corrected italicisation.

(2) I have a strong bias against the second point in Figure 1A. Sequencing of trios, cohorts, or individuals NEVER identifies causal genes in the disease. This technique proposes a candidate gene. Future experiments (oftentimes in model organisms) are required to make those connections to causality. Please edit this figure and parts of the text.

We have removed references to causation. We were thinking of cases where a known variant is found in a patient where causality has already been established rather than cases of new variant discovery.

(3) How were the high-confidence orthologs filtered from 767 to 543 (lines 128-131)? Also, the choice of the final list of 25 genes is not well justified. Please expand more about how these choices were made.

We now explain the extra keyword filtering step. For the final filtering step, we simply examined the list and chose 25. There is therefore little justification to provide and we acknowledge these cannot be seen as representative of the larger set according to well-defined rules. The choice was based on which genes we thought would be interesting using their descriptions or our prior knowledge (“subjective interestingness” in the main text).

(4) Figures 3 and 4, why show all 8289 features? It might be easier to understand and read if only the 256 Tierpsy features were plotted in the heat maps.

In this case, we included all features because they were all tested for differences between mutants and controls. By consistently using all features for each fingerprint we can be sure that the features that are different that we want to highlight in box plots can be referred to in the fingerprint.

(5) The unc-80 mutant screen is clever. In the feature space, it is likely better to focus on the 256 less-redundant Tierpsy features instead of just a number of features. It is unclear to me how many of these features are correlated and not providing more information. In other words, the "worsening" of less-redundant features is far more of a concern than the "worsening" of 1000 correlated features.

This is a good point. We’ve redone the analysis using the Tierpsy 256 feature set and included this as a supplementary figure. We find that the same trend exists when looking at this reduced feature set.

Reviewer #3 (Public Review):

In this study, O'Brien et al. address the need for scalable and cost-effective approaches to finding lead compounds for the treatment of the growing number of Mendelian diseases. They used state-of-the-art phenotypic screening based on an established high-dimensional phenotypic analysis pipeline in the nematode C. elegans.

First, a panel of 25 C. elegans models was created by generating CRISPR/Cas9 knock-out lines for conserved human disease genes. These mutant strains underwent behavioral analysis using the group's published methodology. Clustering analysis revealed common features for genes likely operating in similar genetic pathways or biological functions. The study also presents results from a more focused examination of ciliopathy disease models.

Subsequently, the study focuses on the NALCN channel gene family, comparing the phenotypes of mutants of nca-1, unc-77, and unc-80. This initial characterization identifies three behavioral parameters that exhibit significant differences from the wild type and could serve as indicators for pharmacological modulation.

As a proof-of-concept, O'Brien et al. present a drug repurposing screen using an FDA-approved compound library, identifying two compounds capable of rescuing the behavioral phenotype in a model with UNC80 deficiency. The relatively short time and low cost associated with creating and phenotyping these strains suggest that high-throughput worm tracking could serve as a scalable approach for drug repurposing, addressing the multitude of Mendelian diseases. Interestingly, by measuring a wide range of behavioural parameters, this strategy also simultaneously reveals deleterious side effects of tested drugs that may confound the analysis.

Considering the wealth of data generated in this study regarding important human disease genes, it is regrettable that the data is not actually made accessible. This diminishes the study's utility. It would have a far greater impact if an accessible and user-friendly online interface were established to facilitate data querying and feature extraction for specific mutants. This would empower researchers to compare their findings with the extensive dataset created here. Otherwise, one is left with a very limited set of exploitable data.

We have now made the feature data available on Zenodo (https://doi.org/10.5281/zenodo.12684118) as a matrix of feature summaries and individual skeleton timeseries data (the feature matrix makes it more straightforward to extract the data from particular mutants for reanalysis). We have also created a static html version of the heatmap in Figure 2 containing the entire behavioural feature set extracted by Tierpsy. This can be opened in a browser and zoomed for detailed inspection. Mousing over the heatmap shows the names of features at each position making it easier to arrive at intuitive conclusions like ‘strain A is slow’ or ‘strain B is more curved’.

Another technical limitation of the study is the use of single alleles. Large deletion alleles were generated by CRISPR/Cas9 gene editing. At first glance, this seems like a good idea because it limits the risk that background mutations, present in chemically-generated alleles, will affect behavioral parameters. However, these large deletions can also remove non-coding RNAs or other regulatory genetic elements, as found, for example, in introns. Therefore, it would be prudent to validate the behavioral effects by testing additional loss-of-function alleles produced through early stop codons or targeted deletion of key functional domains.

We have added a note in the main text on limitations of deletion alleles. We like the idea of making multiple alleles in future studies, especially in cases where a project is focussed on just one or a few genes.

Recommendations for the authors

Reviewer #1 (Recommendations For The Authors):

Note that none of the above suggestions or the one immediately below are considered mandatory.

One additional minor point: The dual implication of mevalonate perturbations for NALCN deficiencies is striking. At the same time, the mevalonate pathway is critical for embryo viability among other things, which prompts questions about how reproductive physiology is integrated in this screen approach. It appears that sterilization protocols are not used to prepare screen target animals, but it would be useful to know if there were a signature associated with drug-induced sterility that might help identify one potential common non-interesting outcome of compound treatments in general. In this work, the screen treatment is only 4 hours, which is probably too short to compromise reproduction, but as noted above, it is likely users would intend to expose test subjects for much longer than 4-hour periods.

This is an interesting point. In its current form our screen doesn’t assess reproductive physiology. This is something that we will consider in ongoing projects.

Figures

Figure 1D might be omitted or moved to supplement.

We have removed 1D and moved figure 1E as a standalone table (Table 1) to improve readability.

In the Figure 2D "key", it is hard to make out size differences for prestim, bluelight, and poststim; more distinctive symbols should be used.

We have increased the size of the symbols so that the key is easier to read.

Line 412 unc-25 should be in italics

Corrected

Reviewer #2 (Recommendations For The Authors):

Specific edits:

All of the errors below have been corrected.

Line 47, "loss of function" should be hyphenated because it is a compound adjective that modifies mutations.

Line 50, "genetically-tractable" should not be hyphenated because it is not a compound adjective. It is an adverb-adjective pair. Line 102 has the same grammatical issue.

Line 85, "rare genetic diseases" do not "affect nervous system function". The disease might have deficits in this function, but the disease does not do anything to function.

Line 86, it should be mutations not mutants. Mutations are changes to DNA. Mutants are individuals with mutations.

Throughout, wild-type should be hyphenated when it is used as a compound adjective.

Figure 4, asterisks is spelled incorrectly.

Reviewer #3 (Recommendations For The Authors):

- As stated in the public review, the utility of the study is limited by the lack of access to the complete dataset. The wealth of data produced by the study is one of its major outputs.

We have made the data publicly available on Zenodo. We appreciate the request.

- Describe the exact break-points of the different alleles, because it was not readily feasible to derive them from the gene fact sheets provided in the supplementary materials.

We have now provided the start position and total length of deletion for each gene in the gene fact sheets.

- Figure 1C: what does "Genetic homology"/"sequence identity" refer to? How were these values calculated?

UNC-49 is clearly not 95% identical to vertebrate GABAR subunits at the protein level.

We have changed the axis label to “BLAST % Sequence Identity” to clarify that these values are calculated from BLAST sequence alignments on the WormBase and Alliance of Genome Resources webpages.
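
For readers unfamiliar with how a percent identity value arises from an alignment, the toy sketch below counts identical positions over the length of a pairwise alignment. It is a simplified illustration only: BLAST reports identity over the local alignments it finds, and the sequences and function name here are invented, not taken from the study or from WormBase.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity over the full length of a pairwise alignment.

    Both inputs must be the same length, with '-' marking gaps.
    A gap never counts as a match.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    matches = sum(
        1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-"
    )
    return 100.0 * matches / len(aligned_a)


# Invented six-column alignment: four identical positions out of six
print(percent_identity("MKT-LV", "MRTALV"))
```

Because the denominator is the alignment length, a short, well-conserved local alignment can give a high percentage even when the full-length proteins are much less similar.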

- Figure 1E: The data presented in Figure 1E appears somewhat unreliable. For example, a cursory check showed:

(1) Wrong human ortholog: unc-49 is a GABA receptor, not a Glycine receptor as indicated in the second column.

(2) Wrong disease association: dys-1 is not associated with Bardet-Biedl syndrome; overall the data indicated in the table does not seem to fully match the HPO database.

(3) Inconsistent disease association: why don't the avr-14 and glc-2 (and even unc-49) profiles overlap/coincide given that they present overlapping sets of human orthologs?

Thank you for catching this! We have corrected gene names which were mistakenly pasted. We have also made this a standalone table (Table 1) for improved readability.

- Error in legend to figure 4I : "with ciliopathies and N2" > ciliopathies should be "NALCN disease".

- Error at line 301: "Figures 2E-H" should be "Figures 4E-H".

Corrected.
