CriSNPr: a single interface for the curated and de novo design of gRNAs for CRISPR diagnostics using diverse Cas systems
Abstract
CRISPR-based diagnostics (CRISPRDx) have improved clinical decision-making, especially during the COVID-19 pandemic, by detecting nucleic acids and identifying variants. This has been accelerated by the discovery of new and engineered CRISPR effectors, which have expanded the portfolio of diagnostic applications to include a broad range of pathogenic and non-pathogenic conditions. However, each diagnostic CRISPR pipeline necessitates customized detection schemes based on the fundamental principles of the Cas protein used, its guide RNA (gRNA) design parameters, and the assay readout. This is especially relevant for variant detection, where CRISPR-based diagnostics offer a low-cost alternative to sequencing-based approaches but no in silico pipeline for their ready-to-use design currently exists. In this manuscript, we fill this gap with a unified webserver, CriSNPr (CRISPR-based SNP recognition), which lets the user design gRNAs de novo based on six CRISPRDx proteins of choice (Fn/enFnCas9, LwCas13a, LbCas12a, AaCas12b, and Cas14a) and query for ready-to-use oligonucleotide sequences for validation on relevant samples. Furthermore, we provide a database of curated pre-designed gRNAs, along with their targets and off-targets, for all human and SARS-CoV-2 variants reported thus far. CriSNPr has been validated on multiple Cas proteins, demonstrating its broad and immediate applicability across multiple detection platforms. CriSNPr can be found at http://crisnpr.igib.res.in/.
Data availability
The current manuscript is a computational study, so no new data has been generated for this manuscript. Experimental validation results have been presented in figures in the manuscript. The source code and related datasets have been indicated in the manuscript and also uploaded here: http://crisnpr.igib.res.in/download. All other validation data have been presented in the main manuscript itself.
- dbSNP: The NCBI database of genetic variation.
- Database Resources of the National Genomics Data Center, China National Center for Bioinformation in 2021. Nucleic Acids Research, doi.org/10.1093/nar/gkaa1022.
Article and author information
Funding
CSIR (HCP23)
- Souvik Maiti
- Debojyoti Chakraborty
EMBO (GAP252)
- Debojyoti Chakraborty
Lady Tata Memorial Trust (GAP198)
- Debojyoti Chakraborty
The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.
Copyright
© 2023, Ansari et al.
This article is distributed under the terms of the Creative Commons Attribution License permitting unrestricted use and redistribution provided that the original author and source are credited.