Base editing strategies to convert CAG to CAA diminish the disease-causing mutation in Huntington's disease

Doo Eun Choi; Jun Wan Shin; Sophia Zeng; Eun Pyo Hong; Jae-Hyun Jang; Jacob M. Loupe; Vanessa C. Wheeler; Hannah E. Stutzman; Benjamin P. Kleinstiver; Jong-Min Lee

doi:10.7554/eLife.89782.1

eLife assessment

This proof-of-concept study focuses on an A->G DNA base editing strategy that converts CAG repeats to CAA repeats in the human HTT gene, which causes Huntington's disease (HD). These studies are conducted in human HEK293 cells engineered with a 51 CAG canonical repeat and in HD knock-in mice harboring 105+ CAG repeats. The findings of this study are valuable for the HD field, applying state-of-the-art techniques. However, the key experiments have yet to be performed in neuronal systems or brains of these mice: actual disease-rectifying effects relevant to patients have yet to be observed, leaving the work incomplete.

https://doi.org/10.7554/eLife.89782.1.sa3

Significance of findings

valuable: Findings that have theoretical or practical implications for a subfield

Strength of evidence

incomplete: Main claims are only partially supported

During the peer-review process the editor and reviewers write an eLife assessment that summarises the significance of the findings reported in the article (on a scale ranging from landmark to useful) and the strength of the evidence (on a scale ranging from exceptional to inadequate).

Abstract

An expanded CAG repeat in the huntingtin gene (HTT) causes Huntington’s disease (HD). Since the length of the uninterrupted CAG repeat, not polyglutamine, determines the age-at-onset in HD, base editing strategies to convert CAG to CAA are anticipated to delay onset by shortening the uninterrupted CAG repeat. Here, we developed base editing strategies to convert CAG in the repeat to CAA and determined their molecular outcomes and effects on relevant disease phenotypes. Base editing strategies employing combinations of cytosine base editors and gRNAs efficiently converted CAG to CAA at various sites in the CAG repeat without generating significant indels, off-target edits, or transcriptome alterations, demonstrating their feasibility and specificity. Candidate BE strategies converted CAG to CAA on both expanded and non-expanded CAG repeats without altering HTT mRNA and protein levels. In addition, somatic CAG repeat expansion, which is the major disease driver in HD, was significantly decreased by treatment with a candidate BE strategy in HD knock-in mice carrying canonical CAG repeats. Notably, CAG repeat expansion was abolished entirely in HD knock-in mice carrying CAA-interrupted repeats, supporting the therapeutic potential of CAG-to-CAA conversion base editing strategies in HD and potentially other repeat expansion disorders.

Introduction

Huntington’s disease (HD; MIM #143100) ^1–3 is one of many trinucleotide repeat disorders caused by expansions of CAG repeats ^4–8. Although the underlying causative genes, pathogenic mechanisms, clinical features, and target tissues may be different ⁷^; ⁹^; ¹⁰, these disorders share a cardinal feature: an inverse relationship between age-at-onset and respective CAG repeat length ⁴^; ⁷^; ^11–19. To explain this striking genotype-phenotype correlation that is common to many trinucleotide repeat expansion disorders, a universal mechanism in which length-dependent somatic repeat expansion occurs toward a pathological threshold has been proposed ²⁰. This mechanism explains the relationship between CAG repeat length and age-at-onset in HD well because 1) the HTT CAG repeat shows increased repeat length mosaicism in the target brain region ^21–23, 2) somatic instability is repeat length-dependent ²³^; ²⁴, and 3) the levels of repeat instability correlate with cell type-specific vulnerability and age-at-onset ²²^; ²³^; ²⁵. In addition, somatic repeat instability of an expanded HTT CAG repeat appears to play a major role in modifying HD since our genome-wide association studies have revealed that the majority of onset modification signals represent instability-related DNA repair genes ^26–29. Together, these data support the critical importance of CAG repeat length and somatic instability in determining the timing of HD onset.

Recent large-scale genetic analyses of HD subjects have revealed that different DNA repeat sequence polymorphisms have an impact on age-at-onset. Most HD subjects carry an uninterrupted glutamine-encoding CAG repeat followed by a glutamine-encoding CAA-CAG codon doublet (referred to as a canonical repeat; CR) ²⁴^; ²⁷^; ³⁰. However, expanded CAG repeats lacking the CAA interruption (loss of interruption; LI) or carrying two consecutive CAA-CAG (duplicated interruption; DI) ²⁴^; ²⁷^; ³⁰ also exist (S. Figure 1). Surprisingly, the age-at-onset of HD subjects carrying LI or DI alleles is best explained by the length of their uninterrupted CAG repeat, not the encoded polyglutamine length ²⁴^; ²⁷^; ³⁰. These human genetics data indicate that introducing CAA interruption(s) into the HTT CAG repeat to reduce the length of the uninterrupted repeat is a potential therapeutic avenue to delay the onset of HD. Importantly, a genome engineering technology called base editing (BE) was recently developed, permitting the C-to-T conversion (cytosine base editors; CBEs) or A-to-G conversion (adenine base editors; ABEs) ^31–34, where CBEs could in principle be applied to convert CAG codons to CAA to shorten the uninterrupted CAG repeat without altering polyglutamine length or introducing different amino acids. In view of the strong human genetic evidence for the role of the uninterrupted CAG repeat length in determining HD onset, we have conceived BE strategies of converting CAG codons to CAA within the repeat and evaluated their therapeutic potential in HD.

Effects of CAA interruption on HD age-at-onset.
(A-C) Least square approximation was performed to estimate the additional effects of LI (red circles in panels B and C) and DI (green circles in panels C and D) on age-at-onset. We varied the CAG length of HD participants carrying LI or DI, and subsequently calculated the sum of squares (SS) to identify the CAG repeat length that explained the maximum variance in age-at-onset of these allele carriers. The Y-axis and X-axis represent age-at-onset and CAG repeat length, respectively. Grey circles and black trend lines represent HD participants with CR alleles and their onset-CAG relationship, respectively.
(D) To illustrate the magnitude of the impact of a therapeutic base editing strategy of converting a CR allele to a DI allele by changing CAG to CAA, an example of a CR of 43 CAG (45 glutamine) with a mean observed onset of 48 years is displayed. In this example, therapeutic conversion of the 42nd CAG to CAA by BE would produce a DI allele of 41 CAG (45 glutamine). Considering the additional effect of DI alleles in HD patients, a 41 CAG / 45 glutamine DI allele would produce an onset similar to a CR allele of 40 CAG / 42 glutamine, with a mean onset age of 60. Therefore, CAG-to-CAA conversion in HD subjects with 43 CR repeats could delay onset by 12 years.

Materials and Methods

Study approval

Subject consents and the overall study were approved by the Mass General Brigham IRB and described previously ²⁷. Experiments involving mice were approved by the Mass General Brigham Institutional Animal Care and Use Committee.

Age-at-onset of HD subjects carrying LI or DI

Detailed experimental procedures for sequencing of the HTT CAG repeat region and determination of the CAG repeat length were described previously ²⁷. We compared age-at-onset of HD subjects carrying LI or DI to that of HD subjects carrying CR. Expected age-at-onset from the CAG repeat length of CR was based on the onset-CAG regression model that we reported previously ¹⁹^; ²⁶^; ²⁷. For expected age-at-onset based on the polyglutamine length, the same regression model was modified by replacing CAG repeat length with CAG repeat length plus 2 because the glutamine length equals CAG + 2 in CR.

Least square approximation to estimate the additional effects of LI and DI on age-at-onset

Carriers of LI and DI alleles showed slightly earlier and later onset age, respectively, compared to those with CR alleles of the same uninterrupted CAG repeat lengths, suggesting that LI and DI alleles confer additional effects. Thus, we attempted to determine the levels of additional effects of LI and DI that were not explained by the uninterrupted CAG size by taking a mathematical approach that is similar to least square approximation. Briefly, we predicted age-at-onset of HD subjects carrying LI or DI alleles using our CAG-onset regression model for CR ¹⁹, and subsequently calculated the residual by subtracting predicted onset from observed onset age. We then calculated the sum of squares (SS) for LI and DI carriers based on each participant’s true uninterrupted CAG repeat length using the following formula.

Sum of squares (SS) = ∑ (observed age-at-onset − predicted age-at-onset)²

Subsequently, we gradually increased and decreased the CAG repeat length for LI and DI allele carriers, respectively, and calculated SS again to identify CAG repeat size that generated the smallest SS. The differences between true CAG and CAG repeat length that produced the smallest SS were considered as additional effects of LI and DI alleles on age-at-onset.
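
The search over shifted CAG lengths can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the analysis code used in the study: `predict_onset` is a hypothetical stand-in for the published onset-CAG regression model for CR alleles (its coefficients are invented for the example), and the carrier data are synthetic.

```python
import math

def predict_onset(cag):
    # Hypothetical stand-in for the CR onset-CAG regression model;
    # the coefficients are illustrative, not the published ones.
    return math.exp(6.0 - 0.05 * cag)

def sum_of_squares(carriers, shift):
    # carriers: list of (true uninterrupted CAG length, observed age-at-onset)
    return sum((onset - predict_onset(cag + shift)) ** 2 for cag, onset in carriers)

def best_shift(carriers, shifts):
    # Return the CAG shift that yields the smallest SS among the candidates.
    return min(shifts, key=lambda s: sum_of_squares(carriers, s))

# Synthetic carriers whose onset behaves as if the repeat were 3 CAGs longer.
toy_carriers = [(cag, predict_onset(cag + 3)) for cag in (40, 42, 45, 48)]
estimated_shift = best_shift(toy_carriers, range(0, 7))
```

In this toy setting the shift that minimizes SS is read out as the additional effect, in CAG units, that the uninterrupted repeat length alone does not explain.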

HEK293 cell culture, gRNA cloning, and transfection

HEK293 cells (https://www.atcc.org/products/crl-1573) were maintained in DMEM containing L-glutamine supplemented with 10% (v/v) FBS and 1% Penicillin-Streptomycin (10,000 U/ml). Cells were maintained at 37 °C and 5% CO2. TrypLE Express (Life Technologies) was used to detach cells for sub-culture. The PX552 vector (Addgene #60958) was digested using SapI (Thermo) and purified by gel purification (QIAquick Gel Extraction Kit). A pair of oligos for each sgRNA was phosphorylated (T4 Polynucleotide Kinase, Thermo) and annealed by incubating at 37 °C for 30 min and 95 °C for 5 min, then ramping to 4 °C. Annealed oligos were diluted (1:50) and ligated into the digested PX552 vector (T7 ligase, Enzymatics) at room temperature for 15 min. Then, transformation was performed (One Shot Stbl3, Invitrogen). The inserted gRNA sequences were confirmed by Sanger sequencing. For transfection, cells were seeded in 6-well plates at approximately 65% confluence and treated with 1.66 µg of CBE and 0.7 µg of gRNA plasmids on the following day using Lipofectamine 3000 (Invitrogen) according to the manufacturer’s protocol. Three days after transfection, cells were harvested for molecular analyses. Genomic DNA was extracted using the DNeasy Blood & Tissue kit (QIAGEN). AccuPrime GC-Rich DNA Polymerase (Invitrogen) was used to amplify a region containing the CAG repeat (35 cycles). The PCR product was purified using the QIAquick PCR Purification Kit (QIAGEN) and subjected to MiSeq (Center for Computational and Integrative Biology DNA Core, Massachusetts General Hospital) and/or Sanger sequencing analysis (Center for Genomic Medicine Genomics Core, Massachusetts General Hospital). Primers for MiSeq sequencing were ATGAAGGCCTTCGAGTCCC and GGCTGAGGAAGCTGAGGA; primers for Sanger sequencing analysis were CAAGATGGACGGCCGCTCAG and GCAGCGGGCCCAAACTCA.

MiSeq data analysis to determine indels and conversion types

Sequence data from MiSeq sequencing were subjected to quality control (QC) by removing sequence reads 1) with a mean base Phred quality score below 20, 2) showing a discrepancy between the forward and reverse reads of a pair, 3) containing fewer than 6 CAGs, or 4) lacking the full primer sequences. QC-passed data revealed that HEK293 cells carry 16/17 CAG canonical repeats and therefore are expected to produce 18/19 polyglutamine segments. For QC-passed sequence reads, we determined the proportion of sequence reads containing indels, revealing that most indels were sequencing errors. Subsequently, we focused on sequence reads without indels to determine the types of conversion. For each sequence read (not including the CAA-CAG interruption), we counted the CAA, CAC, CAG, CCG, CGG, CTG, AAG, GAG, and TAG trinucleotides to determine the types and levels of conversion.
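
The four QC criteria and the trinucleotide tally can be sketched as follows. This is an illustrative Python sketch, not the pipeline used in the study. The primer sequences are the MiSeq primers given above; everything else is an assumption made for the example: both reads of a pair are taken to be already oriented on the coding strand, criterion 2 is interpreted as requiring an exact match between the two reads, and CAGs are counted between the primers.

```python
from collections import Counter

FWD_PRIMER = "ATGAAGGCCTTCGAGTCCC"   # MiSeq primers quoted in the text
REV_PRIMER = "GGCTGAGGAAGCTGAGGA"

def revcomp(seq):
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def passes_qc(fwd_read, rev_read, quals, min_q=20, min_cag=6):
    # Apply the four QC criteria to one read pair (assumed coding-strand orientation).
    rev_site = revcomp(REV_PRIMER)
    has_primers = fwd_read.startswith(FWD_PRIMER) and fwd_read.endswith(rev_site)
    insert = fwd_read[len(FWD_PRIMER):len(fwd_read) - len(rev_site)]
    return (sum(quals) / len(quals) >= min_q      # 1) mean Phred score of at least 20
            and fwd_read == rev_read              # 2) pair agreement (assumed: exact match)
            and insert.count("CAG") >= min_cag    # 3) at least 6 CAGs
            and has_primers)                      # 4) full primer sequences present

def codon_counts(repeat_tract):
    # Tally in-frame trinucleotides; the caller excludes the CAA-CAG interruption.
    return Counter(repeat_tract[i:i + 3] for i in range(0, len(repeat_tract) - 2, 3))

# Synthetic read: primers flanking a short repeat with one CAG-to-CAA conversion.
read = FWD_PRIMER + "CAG" * 5 + "CAA" + "CAG" * 2 + revcomp(REV_PRIMER)
short_read = FWD_PRIMER + "CAG" * 2 + revcomp(REV_PRIMER)
counts = codon_counts("CAG" * 5 + "CAA" + "CAG" * 2)
```

Summing such per-read tallies over a sample and dividing by the total number of trinucleotides would give the level of each conversion type.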

MiSeq data analysis of HEK293 cells treated with BE strategies to determine the sites of conversion

Sequence analysis revealed that BE strategies using CBEs produced mostly CAG-to-CAA conversion. CAG-to-TAG conversion was detected in all samples regardless of BE strategies, suggesting that this type of conversion is also due to amplification/sequencing errors. Therefore, we focused on sequence reads of 16/17 CAG repeats containing only CAG or CAA to determine the sites of CAG-to-CAA conversion. Briefly, we recorded the sites of CAG-to-CAA conversion for each sequence read and summed the number of conversions at a given site for a given sample. For example, 30% CAG-to-CAA conversion at the second CAG means that 70% and 30% of all sequence reads contain CAG and CAA at the second CAG position, respectively.
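
As a minimal sketch of this per-site bookkeeping (illustrative Python with synthetic reads; the function name and the representation of a read as a list of codons are assumptions made for the example):

```python
def site_conversion_rates(reads):
    # reads: list of reads, each a list of codons ("CAG" or "CAA") of equal length.
    n_sites = len(reads[0])
    converted = [0] * n_sites
    for codons in reads:
        for site, codon in enumerate(codons):
            if codon == "CAA":
                converted[site] += 1
    # Percentage of reads carrying CAA at each repeat position.
    return [100.0 * c / len(reads) for c in converted]

# Synthetic sample: 10 reads of 16 CAGs, 3 of which carry CAA at the second CAG.
unedited = ["CAG"] * 16
edited = ["CAG", "CAA"] + ["CAG"] * 14
rates = site_conversion_rates([unedited] * 7 + [edited] * 3)
```

Here the second position reports 30% conversion, matching the interpretation given in the text.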

Quantification of duplicated interruption and multiple conversion

The proportion of duplicated interruptions was determined from HEK293 cells treated with different combinations of CBEs and gRNAs. Briefly, we counted sequence reads containing duplicated interruption and divided them by the number of all QC-passed sequence reads to calculate the proportion of DI alleles. Similarly, we calculated the proportion of sequence reads containing duplicated interruption and CAG-to-CAA conversions at other sites. We also determined the levels of multiple CAG-to-CAA conversion for each BE strategy. For each sequence read in a sample, we counted the number of CAG-to-CAA conversions regardless of their locations to generate a distribution of numbers of multiple conversion for each sample. Since we counted the conversions regardless of their positions, multiple conversions do not necessarily mean consecutive conversions.
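
These read-level summaries can be sketched as below. This is an illustrative Python sketch with synthetic reads, not the study's code. It assumes each read is represented as a list of codons that ends with the native CAA-CAG doublet, and that a DI read is one whose last four codons are CAA-CAG-CAA-CAG; separating DI reads with and without conversions at other sites is one reading of the description.

```python
from collections import Counter

DI_TAIL = ["CAA", "CAG", "CAA", "CAG"]

def summarize_reads(reads):
    # reads: QC-passed reads, each a list of codons ending in the native CAA-CAG.
    n = len(reads)
    di_only = sum(1 for r in reads if r[-4:] == DI_TAIL and "CAA" not in r[:-4])
    di_plus = sum(1 for r in reads if r[-4:] == DI_TAIL and "CAA" in r[:-4])
    # Conversions per read, wherever they occur; the native CAA is excluded.
    multiplicity = Counter(r[:-2].count("CAA") for r in reads)
    return di_only / n, di_plus / n, multiplicity

cr = ["CAG"] * 16 + ["CAA", "CAG"]                    # unedited canonical repeat
di = ["CAG"] * 14 + DI_TAIL                           # duplicated interruption
di_extra = ["CAG", "CAA"] + ["CAG"] * 12 + DI_TAIL    # DI plus one other conversion
di_fraction, di_plus_fraction, multiplicity = summarize_reads([cr, cr, di, di_extra])
```

Because conversions are counted irrespective of position, a read with two conversions (such as `di_extra`) need not carry them at adjacent codons.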

Determination of the transfection efficiency

To determine the effects of transfection efficiency on patterns of base editing, we transfected HEK293 cells with gRNA 2 in combination with different base editors and performed cell staining. Transfected HEK293 cells were fixed with paraformaldehyde (4%) and permeabilized with Triton X-100 (0.5%). Then, cells were stained with DAPI (0.5 µM) and incubated for 30 min before being washed with PBS. The eGFP (enhanced green fluorescent protein) and DAPI (4’,6-diamidino-2-phenylindole) images from eight areas in each well were captured using a fluorescence inverted microscope (Nikon Eclipse TE2000-U). The ImageJ analysis program was used to measure the size of a single cell expressing eGFP; we randomly selected 20 cells for each image and averaged their sizes to be used as a reference. We counted the number of pixels covered by eGFP-positive signals, and subsequently divided it by the average cell size to obtain the number of eGFP-positive cells in each image. This was repeated with the DAPI staining images. The percent transfected was calculated by dividing the number of eGFP-positive cells by that of DAPI-positive cells and multiplying by 100.
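
The area-based cell counting reduces to simple arithmetic, sketched here in Python. The pixel counts and reference sizes are invented for illustration, and the use of a separate reference size for DAPI-stained nuclei is an assumption, since the text only states that the procedure was repeated for the DAPI images.

```python
def estimate_cell_count(positive_pixels, mean_object_area):
    # Number of cells approximated as signal-covered area / average single-object area.
    return positive_pixels / mean_object_area

def percent_transfected(gfp_pixels, gfp_cell_area, dapi_pixels, dapi_nucleus_area):
    gfp_cells = estimate_cell_count(gfp_pixels, gfp_cell_area)
    all_cells = estimate_cell_count(dapi_pixels, dapi_nucleus_area)
    return 100.0 * gfp_cells / all_cells

# Illustrative numbers only: 12,000 eGFP pixels at 400 px per cell (30 cells),
# 10,000 DAPI pixels at 100 px per nucleus (100 cells).
efficiency = percent_transfected(12000, 400, 10000, 100)
```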

Base editing in HD patient-derived iPSC and differentiated neurons

An iPSC line carrying adult-onset CAG repeats (42 CAG) was derived from a lymphoblastoid cell line in our internal collection by the Harvard Stem Cell Institute iPS Core Facility (http://ipscore.hsci.harvard.edu/) ³⁵^; ³⁶. HD iPSCs were dissociated into single cells with Accutase (STEMCELL Tech) and plated on a Matrigel-coated 24-well plate in mTeSR Plus media containing CloneR (STEMCELL Tech) to increase cell viability. The following day, cells at 60–70% confluence were transfected with 1.8 µg of BE4max and 0.6 µg of gRNA plasmids using Lipofectamine STEM (Invitrogen) according to the manufacturer’s protocol. Cells were incubated at 37 °C and 5% CO2 for 5 days for sequencing analysis.

The same iPSC line was differentiated into neurons using a previously described method ³⁷. Briefly, the iPSC line was plated on growth factor reduced Matrigel (Corning) in mTeSR Plus media (STEMCELL Technologies). When cells reached ~80% confluence, differentiation was initiated by switching to DMEM-F12/Neurobasal media (2:1) supplemented with N2 and retinol-free B27 (N2B27 RA−; Gibco). For the first ten days, cells were supplemented with SB431542 (10 µM; Tocris), LDN-193189 (100 nM; Stemgent), and dorsomorphin (200 nM; Tocris). SB431542 was removed from the media on day 5. Cells were maintained in N2B27 RA− supplemented with activin A (25 ng/ml; R&D) on day 9. On day 22, cells were split using Accutase (STEMCELL Technologies) and seeded on a poly-D-lysine/laminin plate with N2B27 media supplemented with BDNF and GDNF (10 ng/ml each; Peprotech). Media were changed the next day to facilitate neuronal maturation and survival. Cells were fed with new media every two days. For neuronal marker staining, cells were fixed, permeabilized, and blocked using the Image-iT Fix-Perm kit (Invitrogen). Subsequently, cells were stained with anti-TUBB3 (tubulin beta 3; Biolegend Inc, Cat# 801202) in a blocking solution overnight at 4 °C. Then, cells were washed with PBS three times for 5 minutes, followed by incubation with Alexa Fluor 594 secondary antibodies (Invitrogen) for 1 hour. Finally, cells were washed with PBS three times for 5 minutes and mounted with Vectashield mounting medium with DAPI (Vector Laboratories). Images were captured with a Leica fluorescence microscope. Differentiated neurons were transfected with 1.8 µg of BE4max and 0.6 µg of gRNA plasmids using Lipofectamine 3000 according to the manufacturer’s protocol. Cells were incubated at 37 °C and 5% CO2 for 7 days for sequencing analysis.

Off-target prediction and experimental validation

Potential off-targets were predicted by Off-Spotter (https://cm.jefferson.edu/Off-Spotter/) for 8 BE strategies using the gRNA sequences. We allowed a maximum of 4 mismatches to identify potential off-targets that are flanked by the NGG PAM. Given decreased single base specificity at the PAM-proximal sites in CRISPR-Cas9 genome engineering ³⁸ and the abundance of CAG repeat-carrying genes in the human genome, many of our gRNAs (except gRNAs 1 and 2) are predicted to hybridize with many CAG repeat sequences in the genome, generating increased numbers of predicted off-targets. Thus, we performed experimental validation of 1) predicted off-targets for BE strategies 1 and 2 (described here) and 2) genes that cause polyglutamine disorders. For the experimental validation of predicted off-targets, we analyzed HEK293 DNA samples that were used for MiSeq analysis. Briefly, we focused on predicted off-targets in the protein-coding genes for gRNAs 1 and 2 with two mismatches. One and four potential off-targets were predicted for gRNAs 1 (MINK1) and 2 (PINK1, ZNF704, WBP1L, C20orf112), respectively. We amplified predicted off-target sites of gRNAs 1 and 2 (35 cycles) using the following primers:

MINK1, AGCATGCCTACCTCAAGTCC and CTGGTTTGTCAGCGGGATTC;

PINK1, CTGTACCCTGCGCCAGTA and GGATGTTGTCGGATTTCAGGT;

ZNF704, GGACGGGTTGGACTGGTC and GGGTCCTGGCACTGACTGTG;

WBP1L, CCGACCTCCAACTCCTCCC and GCTGCTCTGTGCCCCCTG; and

C20orf112, GATCTCCGTGGGGCTGAG and CCTACTTCCCTCTCCACAGG.

Amplified DNA samples were analyzed by MiSeq sequencing.

Experimental validation of off-targets in genes causing polyglutamine diseases

Similarly, we amplified genomic regions (35 cycles) containing CAG repeat regions in the genes causing polyglutamine diseases using the following primers:

ATXN1, CCTGCTGAGGTGCTGCTG and CAACATGGGCAGTCTGAGC;

ATXN2, CGGGCTTGCGGACATTGG and GTGCGAGCCGGTGTATGG;

ATXN3, GAATGGTGAGCAGGCCTTAC and TTCAGACAGCAGCAAAAGCA;

CACNA1A, CCTGGGTACCTCCGAGGGC and ACGTGTCCTATTCCCCTGTG;

ATXN7, GAAAGAATGTCGGAGCGGG and CTTCAGGACTGGGCAGAGG;

TBP, AAGAGCAACAAAGGCAGCAG and AGCTGCCACTGCCTGTTG;

ATN1, CCAGTCTCAACACATCACCAT and AGTGGGTGGGGAAATGCTC; and

AR, CTCCCGGCGCCAGTTTGCTG and GAACCATCCTCACCCTGCTG.

Sequencing data analysis was focused on calculating the proportion of sequence reads that contain the CAG-to-CAA conversions.

RNAseq analysis

To determine the molecular consequences of candidate BE strategies, we performed RNAseq analysis. We transfected HEK293 cells with BE4max+empty vector, BE4max+gRNA 1, or BE4max+gRNA 2 for 72 hours. Subsequently, genomic DNA for MiSeq analysis and cell pellets for RNAseq analysis were generated from replica plates. Genome-wide RNAseq analysis (Tru-Seq strand specific large insert RNA sequencing) was performed by the Broad Institute. Sequence data were processed by STAR aligner ³⁹ as part of the Broad Institute’s standard RNAseq analysis pipeline. For differential gene expression (DGE) analysis, we used transcripts per million (TPM) data computed by the TPMCalculator (https://github.com/ncbi/TPMCalculator) ⁴⁰. Expression levels in approximately 19,000 protein-coding genes based on Ensembl (ftp://ftp.ensembl.org/pub/release-75/gtf/homo_sapiens/) were normalized. The DGE analysis was performed with a generalized linear model using the “glm” function in R v3.3.1 (https://www.r-project.org/) after adjustment for two principal components based on RNAseq data, followed by multiple test correction using an FDR method. A multiple test corrected p-value less than 0.05 was considered statistically significant.
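
The multiple test correction step can be illustrated with a short Python sketch. The text does not state which FDR method was used, so the Benjamini-Hochberg procedure is assumed here; the p-values are invented, and the sketch covers only the correction step, not the generalized linear model or the principal component adjustment.

```python
def benjamini_hochberg(pvalues):
    # Return Benjamini-Hochberg adjusted p-values in the original order.
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest to the smallest p-value, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

toy_pvalues = [0.01, 0.04, 0.03, 0.20]   # invented per-gene p-values
adjusted = benjamini_hochberg(toy_pvalues)
significant = [q < 0.05 for q in adjusted]
```

With these toy values only the first gene remains below the 0.05 threshold after correction.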

Generation and validation of HEK293-51 CAG cells carrying an expanded CAG repeat

HD patient-derived iPSC and neurons showed modest conversion efficiencies, making it technically difficult to characterize molecular consequences of CAG-to-CAA conversion strategies. Thus, we generated HEK293 cells carrying an expanded repeat by replacing one of the non-expanded HTT CAG repeats with a 51 CAG repeat. Briefly, we cloned a gRNA (CAGAGCGCAGAGAATGCGCG) into the PX459 vector (Addgene #62988) to express SpCas9 and gRNA for CRISPR-Cas9 targeting at the HTT CAG repeat region. The donor template for homologous recombination was generated by PCR amplification of a human DNA sample carrying a 51 CAG repeat and cloning into the pCR-Blunt II TOPO plasmid (Invitrogen). HEK293 cells were then transfected with PX459 and pCR-Blunt II TOPO plasmids by Lipofectamine 3000 (Invitrogen) for 72 hr. Subsequently, cells were treated with G-418 (Gibco) for 21 days, and surviving cells were re-plated onto 100 mm dishes. After 10 days, visible colonies were picked and maintained separately. Single cell clonal lines were validated by PCR analysis using AccuPrime GC-Rich DNA Polymerase and a primer set (ATGAAGGCCTTCGAGTCCC and GGCTGAGGAAGCTGAGGA). The PCR conditions were initial denaturation (95 °C, 3 min), 30 cycles of denaturation (95 °C, 30 sec), annealing (55 °C, 30 sec), and extension (72 °C, 40 sec), and final extension (72 °C, 10 min). The PCR products were resolved on a 1.5% agarose gel containing GelRed (Biotium) and visualized under UV light to distinguish expanded from non-expanded CAG repeats. We also performed RT-PCR and immunoblot analysis to confirm the correct integration of the expanded CAG repeat. Briefly, 1 μg of total RNA from the targeted clonal line was subjected to reverse transcription with SuperScript IV Reverse Transcriptase (Invitrogen) according to the manufacturer’s instructions followed by PCR analysis using a primer set (ATGAAGGCCTTCGAGTCCC and GGCTGAGGAAGCTGAGGA).
For HTT immunoblot analysis, cells were lysed with RIPA Lysis/Extraction Buffer (Thermo) supplemented with Halt Protease and Phosphatase Inhibitor Cocktail (Thermo). Whole cell lysate was then separated on a NuPAGE 3–8% Tris-Acetate gel (Invitrogen) and transferred to a polyvinylidene fluoride membrane. The membrane was blocked with 5% nonfat dry milk in Tris-buffered saline for 1 h and incubated with primary antibodies for HTT (MAB2166, Sigma-Aldrich) for 12 h at 4 °C. The membrane was washed for 1 h, and blots were incubated with a peroxidase-conjugated secondary antibody for 1 h and then washed for 1 h. The bands were visualized by enhanced chemiluminescence (Thermo). Similar to HEK293 cells, HEK293-51CAG cells were treated with BE4max and candidate gRNAs (i.e., gRNA 1 and gRNA 2) to determine the levels of CAG-to-CAA conversion and the total HTT protein levels. For gRNA 1, we determined the levels of in-frame insertion and deletion right after treatment using methods previously described (Lee 2015).

AAV treatment for a candidate BE strategy and CAG repeat instability in mice

For AAV injection experiments, we used the split-intein base editor (v5 AAV) ³⁴. Forward and reverse oligos (CACCGCTGCTGCTGCTGCTGCTGGA and AAACTCCAGCAGCAGCAGCAGC) (IDT) for gRNA 2 were cloned into the BsmBI site of pCbh_v5 AAV-CBE C-terminal (Addgene #137176) and pCbh_v5 AAV-CBE N-terminal (Addgene #137175). Cloned vectors were validated by Sanger sequencing, and subsequently packaged into the AAV9 serotype by the UMass Viral Vector Core. HttQ111 HD knock-in mice ⁴¹ were maintained on an FVB/N background ⁴²; AAV9 injections were performed in heterozygous HttQ111/+ mice at 6–11 weeks of age. Animal husbandry was performed under controlled temperature and light/dark cycles. After anesthesia was induced using isoflurane, an insulin syringe was inserted into the medial canthus with the bevel of the needle facing down from the eyeball and advanced until the needle tip was at the base of the eye. We injected HD knock-in mice with AAV9 mix (200 μl containing C-terminal and N-terminal split-intein base editor, 1 × 10¹² vg for each) (experimental group) or PBS (200 μl, control group) by retro-orbital (RO) injection. Ten weeks later, liver and tail samples were collected for instability analysis ⁴³^; ⁴⁴. Briefly, DNA samples were amplified using a primer set (6-FAM-ATGAAGGCCTTCGAGTCCCTCAAGTCCTTC and GGCGGCTGAGGAAGCTGAGGA) and analyzed by ABI3730 to determine the sizes of fragments. Quantification of repeat expansion was based on the expansion index method that we developed previously. The expansion index method robustly quantifies the levels of repeat instability by eliminating potential noise in the fragment analysis results based on the relative peak height threshold ⁴³^; ⁴⁴. To quantify the expansion index in control and BE-treated mice, we applied a 10% threshold, and the expansion index was calculated based on the highest peak in the tail DNA.
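
A rough sketch of an expansion index calculation is given below in Python. It follows one common formulation and is an assumption rather than a reproduction of the published method: peaks below the relative height threshold are discarded, the remaining peak heights are normalized by their sum, and each peak longer than the main allele contributes its normalized height multiplied by its distance from the main allele. The peak data are invented, and `main_allele` stands for the repeat size of the highest peak in the matched tail DNA.

```python
def expansion_index(peaks, main_allele, threshold=0.10):
    # peaks: dict mapping repeat size (in CAG units) to peak height.
    tallest = max(peaks.values())
    # Drop low peaks relative to the tallest peak to suppress noise (assumed rule).
    kept = {size: h for size, h in peaks.items() if h >= threshold * tallest}
    total = sum(kept.values())
    # Only peaks longer than the main allele count as expansions.
    return sum((h / total) * (size - main_allele)
               for size, h in kept.items() if size > main_allele)

# Invented liver trace; the main allele (110) is taken from the tail DNA.
liver_peaks = {110: 1000, 111: 500, 112: 250, 113: 50}
index = expansion_index(liver_peaks, main_allele=110)
```

In this toy trace the 113-repeat peak falls below the 10% threshold and is ignored, so only the 111 and 112 peaks contribute.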

Repeat instability in HD knock-in mice carrying interrupted CAG repeat

To determine the maximal effects of CAG-to-CAA interruption, we compared HD knock-in mice carrying an interrupted CAG repeat (namely, interrupted repeat mice; https://www.jax.org/strain/027418) to HD knock-in mice carrying an uninterrupted repeat (namely, pure repeat mice; https://www.jax.org/strain/027417). The repeats in the interrupted repeat mice and pure repeat mice comprise 21 copies of [CAGCAACAGCAACAA] and 105 copies of [CAG], respectively. Both mouse lines were expected to produce huntingtin protein with 105 polyglutamine. Repeat instability in these mice was determined (5 months) by fragment analysis as described previously ⁴⁴.
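
The two repeat structures can be checked with a few lines of Python. This is an illustration only; the helper names are hypothetical. Both sequences encode 105 glutamines, but the longest uninterrupted CAG run differs sharply.

```python
def codons(seq):
    return [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]

def glutamine_count(seq):
    # CAG and CAA both encode glutamine.
    return sum(codon in ("CAG", "CAA") for codon in codons(seq))

def longest_cag_run(seq):
    # Length of the longest stretch of consecutive in-frame CAG codons.
    best = run = 0
    for codon in codons(seq):
        run = run + 1 if codon == "CAG" else 0
        best = max(best, run)
    return best

interrupted = "CAGCAACAGCAACAA" * 21   # interrupted repeat mice
pure = "CAG" * 105                     # pure repeat mice
```

The interrupted allele never has more than one consecutive CAG, whereas the pure allele has an uninterrupted run of 105.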

Statistical analysis and software

Statistical analysis of RNAseq data was performed using generalized linear regression analysis. Multiple test correction was performed using false discovery rate using R 3.5.3 ⁴⁵. R 3.5.3 was also used to produce plots.

Results

Effects of CAA-CAG codon doublet on age-at-onset in HD patients

Previously, we and others reported that most HD subjects carry canonical repeats (CR) comprising an uninterrupted expanded CAG repeat followed by CAA-CAG ²⁴^; ²⁷^; ³⁰. Although infrequent, uninterrupted CAG repeats followed by 1) no CAA-CAG (LI; 0.23% in our previous GWA data) and 2) two CAA-CAG codon doublets (DI; 0.76% in our previous GWA data) also exist (S. Figure 1). In HD subjects carrying LI alleles, the lengths of the CAG repeat and polyglutamine segment are identical. However, the polyglutamine length is greater by 2 and 4, respectively, compared to the CAG repeats in CR and DI alleles (S. Figure 1). Since CR, LI, and DI alleles with the same uninterrupted CAG repeat lengths have different polyglutamine sizes, they have provided a powerful tool to investigate the relative importance of the CAG repeat in DNA vs. polyglutamine in protein in determining onset age. For example, if polyglutamine length played an important role in determining age-at-onset, onset of LI and DI allele carriers, who respectively have 2 fewer and 2 more glutamines compared to CR allele carriers, would be significantly later and earlier compared to CR allele carriers with the same uninterrupted CAG repeats (S. Figures 2A and 2B). In stark contrast to these predictions, the onset ages of LI or DI allele carriers are best explained by their respective CAG repeat sizes, not polyglutamine length (S. Figures 2C and 2D). Furthermore, age-at-onset of DI allele carriers is significantly delayed compared to that of LI allele carriers with the same uninterrupted CAG repeat size even though DI alleles encode 4 more glutamines than LI alleles (Student's t-test p-value, 1.007E-12) (S. Figure 2D). Together, the data indicate that age-at-onset in HD is determined primarily by the uninterrupted length of the CAG repeat, but there may also be additional effects of different CAA-interruption structures since the CAG repeat length does not fully explain age-at-onset in LI and DI allele carriers (S. Figure 2D) ⁴⁶. Therefore, we performed least square approximation to calculate the magnitudes of the additional effects of LI and DI alleles on age-at-onset. Briefly, we varied the individual CAG repeat length to identify the repeat size that best explains the observed age-at-onset of carriers of these LI and DI alleles relative to CR alleles. The age-at-onset of the LI allele carriers is best explained when 3 CAGs are added to the true CAG repeat length (Figures 1A and 1B; S. Figure 3D) while the DI allele carriers behave with respect to age-at-onset as if they have one fewer CAG than their true CAG repeat length (Figures 1B and 1C; S. Figure 3F). These data suggest that switching a CR allele to a DI allele would delay onset by 1) shortening the uninterrupted CAG repeat by two CAG repeats and 2) conferring an additional effect comparable to a reduction in length of one CAG. For example, if a DI allele were generated from a CR allele with 43 uninterrupted CAGs by converting the 42nd CAG to CAA using base editing strategies, the age-at-onset is predicted to be delayed by approximately 12 years (Figure 1D), illustrating the robustness of therapeutic base editing strategies.
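
The arithmetic behind this example can be written out explicitly. The Python sketch below is illustrative only; the function name is hypothetical, and the two mean onset ages (48 and 60 years) are the values quoted in the legend of Figure 1D rather than outputs of a model.

```python
def convert_cr_to_di(cr_cag):
    # Converting the second-to-last CAG of a CR allele to CAA creates a second
    # CAA-CAG doublet: the uninterrupted repeat shortens by 2, polyglutamine is unchanged.
    polyglutamine = cr_cag + 2          # CR: glutamines = CAG + 2
    di_cag = cr_cag - 2                 # uninterrupted CAGs left in the DI allele
    effective_cag = di_cag - 1          # DI behaves as if one CAG shorter
    return {"di_cag": di_cag, "polyglutamine": polyglutamine, "effective_cag": effective_cag}

result = convert_cr_to_di(43)

# Mean onset ages quoted for a 43 CAG CR allele and a 40 CAG CR allele.
onset_cr_43, onset_cr_40 = 48, 60
predicted_delay = onset_cr_40 - onset_cr_43
```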

Cytosine base editors and gRNAs for CAG-to-CAA conversion in HD.
(A) Constituents of base editing are displayed.
(B) Schematic of cytosine base editors (CBEs) that can generate C-to-T edits within a finite edit window at a fixed distance from the PAM.
(C) CBE variants described in the literature and used in this study (lower 4) are shown, including the evoCDA1-based SpG CBE that should function more efficiently in GC nucleotide contexts. Protospacer-adjacent motif (PAM), guide RNA (gRNA), uracil glycosylase inhibitor (UGI), rat APOBEC1 deaminase domain (rAPO1), evolved CDA1 cytosine deaminase domain (evoCDA).
(D) The target region, gRNAs, and expected hybridization sites of the 8 gRNAs are shown.

Levels of CAG-to-CAA conversion by BE strategies.
Only CAG-to-CAA conversion showed significantly increased levels over the baseline sequencing errors. Thus, we calculated the percentage of CAA in the cells that were treated with a combination of cytosine base editors (A, BE4max; B, BE4-NG; C, BE4-SpG; and D, evo-SpG) and gRNAs. HEK293 cells without any treatment (i.e., Cell) were combined (n=8) and plotted for each base editor. EV represents HEK293 cells treated with a base editor and empty vector for gRNA. *, significant by Bonferroni corrected p-value < 0.05 (8 tests for each base editor).

Cytosine base editors and gRNAs to convert CAG to CAA in the HTT CAG repeat

Recent advancements in genome editing technologies have led to the development of CBEs that are capable of efficient C-to-T conversion (Figure 2A) ³¹; ³²; ⁴⁷–⁴⁹. In principle, CR can be converted to DI if CBEs target the non-coding strand of the HTT CAG repeat (Figure 2B). In this study, we tested 4 CBEs composed of various cytosine deaminases and SpCas9 enzymes with different protospacer-adjacent motif (PAM) specificities to explore the feasibility of CAG-to-CAA conversion as a putative treatment for HD. BE4 is the fourth-generation base editor, which was engineered from BE3 to increase editing efficiency and decrease the frequency of undesired by-products (Figure 2C) ⁴⁷. BE4 exhibited high levels of C-to-T editing activity on target sites harboring NGG PAMs ⁴⁷. The activity window of BE4 spans positions 4-8, counting from the PAM-distal end of the spacer (where the PAM occupies positions 21-23) (Figure 2B) ⁴⁸. We tested BE4max (Addgene #112093) in this study, which is a codon-optimized version of BE4 with improved nuclear localization ⁴⁸. Because NGG PAM sites are sparse near the CAG repeat and absent within it, CAG-to-CAA conversion using BE4 was expected to be somewhat limited. Therefore, we also explored engineered CBEs containing SpCas9 variants that target an expanded range of PAM sequences, including SpCas9-NG ⁵⁰ and SpG ⁵¹ (Figure 2C). Since these variants are capable of targeting sites with NGN PAMs, they might permit higher-density targeting near or within the CAG repeat. The nucleotide preceding the target cytosine also affects the C-to-T conversion efficiency of CBEs, with efficiency typically reduced when a G precedes the C ³¹; ⁵²; ⁵³. Thus, engineered deaminase domains have been explored to improve C-to-T conversion in GC contexts ³²; ⁴⁹. 
For instance, an evolved CDA1-based BE4max variant (evoCDA1) showed substantially higher editing on GC targets ⁴⁹, which is relevant to the nucleotide context on the non-coding strand of the HTT CAG repeat (CTGCTG). Therefore, we explored the use of canonical BE4max-SpCas9, BE4max-SpCas9-NG, BE4max-SpG, and evoCDA1-BE4max-SpG (henceforth referred to as BE4max, BE4-NG, BE4-SpG, and evo-SpG, respectively) (Figure 2C).
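The strand logic above can be checked with a short sketch: a C-to-T edit on the non-coding strand (CTG context, where a G precedes the target C inside the repeat) reads as a G-to-A change on the coding strand, turning CAG into CAA. The helper names and the 3-codon toy sequence are illustrative assumptions.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]

def edit_c_to_t(seq, index):
    """Apply a single C-to-T edit at a 0-based index, as a CBE would."""
    if seq[index] != "C":
        raise ValueError("target base is not a cytosine")
    return seq[:index] + "T" + seq[index + 1:]

coding = "CAGCAGCAG"                      # toy 3-codon CAG repeat
noncoding = reverse_complement(coding)    # CTGCTGCTG

# Edit the C at index 3 of the non-coding strand; it is preceded by a G.
edited_noncoding = edit_c_to_t(noncoding, 3)
edited_coding = reverse_complement(edited_noncoding)

print(noncoding, edited_noncoding, edited_coding)
```

The middle codon of the coding strand becomes CAA while the flanking CAG codons are unchanged.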

To achieve CAG-to-CAA conversion in the HTT CAG repeat, we designed 3 groups of gRNAs (S. Table 1) based on the sites of predicted hybridization (Figure 2D). Aiming at converting CAGs at the front-end of the repeat, gRNAs 1 and 2 were designed to hybridize with a region that includes sequence upstream of the repeat and to use conventional NGG PAMs. The gRNAs 1 and 2 contain 10 and 2 non-CAG bases at the PAM-proximal ends, respectively (S. Table 1; S. Figure 4). Considering the activity window of BE4 (i.e., the 13th-17th nucleotides from the PAM) (S. Figure 4, green boxes in the gRNAs), BE4max-gRNA 1 and BE4max-gRNA 2 were predicted to convert the 1st/2nd and 4th/5th CAGs to CAA, respectively (S. Figure 4; sequences with green highlight). The gRNAs 3, 4, and 5 comprised the CAG repeat sequence (S. Table 1) and were therefore predicted to hybridize throughout the HTT CAG repeat and potentially with other CAG repeat-containing genes (S. Figure 4). The gRNAs 3, 4, and 5 were predicted to utilize NAA/NTG, NGA/NCT, and NGG/NGC PAMs, respectively (S. Figure 4). Lastly, gRNAs 6, 7, and 8 were designed to convert CAGs at the back-end of the repeat (S. Table 1). Available PAM sites for these gRNAs are NCT, NGC, and NTG (S. Figure 4). Considering the predicted gRNA-target hybridization sites and conversion windows, these three gRNAs might generate the duplicated interruption that is found in HD patients (S. Figure 4).
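The two ways of describing the BE4 activity window (positions 4-8 from the PAM-distal end, or the 13th-17th nucleotides from the PAM) are the same window under the 20-nt spacer numbering used above, in which the PAM occupies positions 21-23. A small worked conversion, with a hypothetical helper name:

```python
SPACER_LENGTH = 20  # spacer positions 1-20; the PAM occupies positions 21-23

def nth_from_pam(position_from_distal_end):
    """Convert a spacer position (1 = PAM-distal end) to its rank counted from the PAM."""
    return SPACER_LENGTH + 1 - position_from_distal_end

# BE4 activity window: positions 4-8 counted from the PAM-distal end.
window_from_pam = sorted(nth_from_pam(p) for p in range(4, 9))
print(window_from_pam)
```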

Sites of CAG-to-CAA conversion by BE strategies.
We calculated the percentage of sequence reads containing CAA at specific sites relative to all sequence reads. For example, 27.7% conversion at the 2nd CAG by BE4max-gRNA 1 (top left panel, red) means that 27.7% of all sequence reads from the 16 or 17 CAG alleles have CAA at the 2nd CAG. The x-axis and y-axis represent the position of the CAG and the percent conversion, respectively. Each panel represents a tested gRNA. Plots were based on the mean of 3 independent transfection experiments in HEK293 cells after subtracting corresponding empty vector (EV)-treated cell data. Red, blue, purple, and cyan traces represent BE4max, BE4-NG, BE4-SpG, and evo-SpG, respectively.

Predominant CAG-to-CAA conversion without significant indels by BE strategies for HD

We then characterized 32 BE strategies (i.e., combinations of 4 CBEs and 8 gRNAs). We first determined whether BE strategies for HD produced indels. Since low base editing efficiencies might result in proportionally low levels of indels, leading to an underestimation of their frequencies, we used HEK293 cells, which show high levels of base editing efficiency ⁵⁴; ⁵⁵. Our MiSeq sequence analyses revealed that HEK293 cells carry two CRs (16 and 17 CAGs) and showed a basal indel level of approximately 10% (‘Cell’ in S. Figure 5), which reflects errors due to the difficulty in sequencing the CAG repeat. Nevertheless, transfection of plasmids for BE strategies did not significantly increase the levels of indels compared to cells without any treatment (Cell) or cells treated with empty vector (EV) (S. Figure 5). The lack of significant indel formation was expected because the cytosine base editors that we tested use nickases (Figure 2C) ⁴⁷; ⁴⁸.

Allele specificity and molecular outcomes of candidate BE strategies.
(A) To overcome the limitations of patient-derived iPSC and differentiated neurons, we developed HEK293 cells carrying an adult-onset CAG repeat by replacing one of the normal repeats with a 51 CAG canonical repeat (namely HEK293-51 CAG). Red and green bars represent mutant and normal *HTT*, respectively, in HEK293-51 CAG cells.
(B and C) The HEK293-51 CAG cells were treated with BE4max-gRNA 1 and analyzed to determine the levels of in-frame insertion (B) and in-frame deletion (C) at the time of treatment.
(D) The HEK293-51 CAG cells were treated with BE4max-gRNA 1 and analyzed by MiSeq to determine the levels of allele specificity. Conversion efficiency on the Y-axis indicates the percentage of sequence reads containing the CAG-to-CAA conversion at the target site. * represents uncorrected p-value < 0.05 by Student’s t-test.
(E) Original HEK293 cells and HEK293-51 CAG cells were treated with empty vector (EV) or candidate BE strategies (BE4max-gRNA 1 and BE4max-gRNA 2) and subjected to immunoblot analysis; a representative blot is shown.
(F) Four independent experiments were performed, and a one-sample t-test was used to determine whether BE-treated cells show different total HTT protein levels compared to EV-treated cells. No comparison was significant at p-value < 0.05.

Since most sequence reads containing indels might be sequencing errors, we focused on sequence reads without indels to determine the types of base conversions. HEK293 cells without any treatment (Cell) or cells treated with empty vector (EV) showed low but detectable levels of CAG-to-CAA and CAG-to-TAG conversions (S. Table 2; S. Figure 6), also reflecting sequencing errors. However, the levels of CAG-to-CAA conversion were significantly increased over baseline sequencing errors in cells treated with some BE strategies (S. Table 2). For example, BE4max in combination with gRNAs 1, 2, 5, and 7 resulted in efficient CAG-to-CAA conversion (Figure 3A). Given the availability of NGG PAMs (S. Figure 4), robust CAG-to-CAA conversion by gRNAs 1, 2, and 5 was somewhat anticipated for BE4max. However, high levels of CAG-to-CAA conversion by the BE4max-gRNA 7 combination (Figure 3A; S. Figure 6A) were unexpected because the anticipated hybridization site does not provide the NGG PAM (S. Figure 4) that is required for the optimal activity of BE4max. BE4-NG robustly produced CAG-to-CAA conversions with gRNAs 1, 2, and 3; although not significant, gRNAs 5 and 7 also generated high levels of CAG-to-CAA conversions (Figure 3B; S. Figure 6B). BE4-SpG in combination with gRNAs 1 and 2 resulted in significant levels of CAG-to-CAA conversions (Figure 3C; S. Figure 6C). Overall, CAG-to-CAA conversion was higher with evo-SpG than with the other base editors; gRNAs 1, 2, 4, and 8 produced significant CAG-to-CAA conversions (Figure 3D; S. Figure 6D). These data indicated that our BE strategies primarily generated CAG-to-CAA conversion without significant indel formation. Patterns of conversions also indicated that sites with NGG PAMs (gRNAs 1, 2, and 5) permitted the highest levels of CAG-to-CAA conversion for BE4max and CBEs with relaxed PAM specificities.
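The significance calls for conversion levels rely on a Bonferroni correction over the 8 gRNA tests performed for each base editor. A minimal sketch of that correction is shown below; the raw p-values are invented for illustration and are not results from the study.

```python
def bonferroni(p_values):
    """Bonferroni-adjust p-values: multiply each by the number of tests, capped at 1."""
    n_tests = len(p_values)
    return [min(1.0, p * n_tests) for p in p_values]

# Hypothetical raw p-values for gRNAs 1-8 tested with one base editor.
raw = [0.001, 0.004, 0.20, 0.03, 0.01, 0.60, 0.005, 0.45]
adjusted = bonferroni(raw)
significant = [i + 1 for i, p in enumerate(adjusted) if p < 0.05]

print(significant)  # gRNA numbers passing the corrected threshold
```

Note that a raw p-value of 0.01 no longer passes after correction for 8 tests, which is why some high conversion levels can be reported as not significant.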

RNAseq analysis of BE strategies confirms the lack of transcriptome alteration.
(A) HEK293 cells were treated with empty vector (EV) or candidate BE strategies such as BE4max-gRNA 1 (gRNA 1) and BE4max-gRNA 2 (gRNA 2) for RNAseq analysis. MiSeq analysis was also performed to judge the levels of CAG-to-CAA conversion. ****, p-value < 0.0001 by Student’s t-test (n=4).
(B) Confirming the lack of significantly altered genes in BE4max-gRNA 1 or BE4max-gRNA 2, we compared all BE-treated samples (n=8) with all EV-treated samples (n=4) to increase the power in the RNAseq differential gene expression analysis. Each circle in the volcano plot represents a gene analyzed in the RNAseq; *HTT* is indicated by a filled red circle. A red horizontal line represents false discovery rate of 0.05, showing that none was significantly altered by candidate BE strategies.
(C) We also compared two groups of randomly assigned samples (6 samples vs. 6 samples) to understand the shape of the volcano plot when there were no significant genes.

Sites of CAG-to-CAA conversion

Subsequently, we determined conversion sites for different BE strategies. The patterns of conversion sites were similar for BE4max, BE4-NG, and BE4-SpG in gRNA 1, showing the most conversion at the second CAG with decreased levels of conversion at the first CAG (Figure 4A; S. Table 3). In contrast, evo-SpG-gRNA 1 combination showed higher editing efficiencies with the maximum conversion at the second CAG with comparable levels of conversions at the first and third CAGs (Figure 4A, cyan; S. Table 3). The gRNA 2 showed similar patterns as gRNA 1 except that conversion sites were shifted to the right; the highest conversion occurred at the 4th CAG by BE4max, BE4-NG, and BE4-SpG (Figure 4B; S. Table 3).
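The per-site conversion values discussed here are percentages of reads carrying CAA at a given codon position, after subtraction of the empty vector baseline. The sketch below illustrates that calculation on toy reads; it assumes indel-free reads already trimmed to start at the first codon of the repeat, and the function names are hypothetical.

```python
def codons(read):
    """Split a read into consecutive in-frame triplets, dropping any partial tail."""
    usable = len(read) - len(read) % 3
    return [read[i:i + 3] for i in range(0, usable, 3)]

def percent_caa_by_site(reads, n_sites):
    """Percentage of reads with CAA at each of the first n_sites codon positions."""
    counts = [0] * n_sites
    for read in reads:
        for site, codon in enumerate(codons(read)[:n_sites]):
            if codon == "CAA":
                counts[site] += 1
    return [100.0 * count / len(reads) for count in counts]

# Toy data: 4 treated reads and 4 empty-vector (EV) reads, 3 codons each.
treated = ["CAGCAACAG", "CAGCAACAG", "CAACAGCAG", "CAGCAGCAG"]
ev = ["CAGCAGCAG"] * 4

treated_pct = percent_caa_by_site(treated, 3)
ev_pct = percent_caa_by_site(ev, 3)
net_pct = [t - e for t, e in zip(treated_pct, ev_pct)]

print(net_pct)
```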

The gRNAs 3 and 4, which were designed to hybridize throughout the CAG repeat, did not generate CAG-to-CAA conversion in combination with BE4max (Figures 4C and 4D, red; S. Table 3) because of the lack of an NGG PAM. Although modest, BE4-NG and BE4-SpG converted the 5th CAG to CAA (Figures 4C and 4D; S. Table 3), possibly because the NAA (gRNA 3) and NGA (gRNA 4) PAMs supported the base editing activity of BE4-NG and BE4-SpG. The gRNAs 3 and 4 again produced higher levels of CAG-to-CAA conversion with evo-SpG (Figures 4C and 4D, cyan), and interestingly, CAG-to-CAA conversions were not limited to the 5th CAG (S. Table 3). The gRNA 5 with BE4max efficiently converted the 6th CAG (Figure 4E, red), which was unexpected; conversions by other base editors were lower but widespread throughout the repeat (Figure 4E).

BE strategies designed to convert CAGs at the back-end of the repeat were tested using gRNAs 6, 7, and 8. Although less robust, the patterns of conversion by gRNA 6 (Figure 4F) were similar to those of gRNA 4 (Figure 4D). Since only one nucleotide is different between gRNA 6 and gRNA 4, it appeared that gRNA 6 behaved like gRNA 4 despite one mismatch, favoring the NGA PAM instead of the less optimal NCT PAM (S. Figure 7A). The same explanation might account for the similar patterns of conversion sites for gRNA 7 (Figure 4G) and gRNA 5 (Figure 4E); efficient conversion at the 6th CAG by BE4max-gRNA 7 might be due to the interaction of gRNA 7 at the target site of gRNA 5 (with one mismatch) in favor of the NGG PAM (S. Figure 7B). The gRNA 8 generated CAG-to-CAA conversions only in evo-SpG. Although this group of gRNAs was designed to hybridize with the back-end of the CAG repeat, higher levels of conversion were observed at the front-end CAGs and throughout the repeat (Figures 4F-4H). These results suggest that one PAM-distal mismatch might be tolerated by base editors in favor of target sites harboring more robust PAMs. Also, our data revealed that, as expected, BE4max is highly dependent on the NGG PAM, resulting in CAG-to-CAA conversion at specific CAG sites, while evo-SpG is more efficient in conversion, leading to broader targeting due to its relaxed PAM requirement.

Impacts of CAA interruption on CAG repeat instability.
(A - C) DNA samples (liver and tail) of BE-treated mice were analyzed to quantify somatic repeat expansion. We performed linear regression analysis to model the levels of repeat expansion as a function of treatment, CAG repeat in tail (A), age (B), and other covariates (i.e., experimental batch, sex, tail CAG and age). The statistical analysis is summarized in panel C.
(D and E) To determine the maximal impacts of CAA interruption on the repeat expansion, HD knock-in mice carrying CAA interrupted repeats were analyzed. Liver samples of 105 uninterrupted CAG repeat (D) and interrupted repeat (E) were analyzed at 5 months. Representative fragment analysis is displayed. Red arrows indicate the modal alleles representing inherited CAG repeats; peaks at the right side of the modal peaks (red arrows) represent expanded repeats.

Generation of duplicated interruption by BE strategies

Next, we determined the levels of duplicated interruption in the same HEK293 cell MiSeq data. As shown in S. Figure 8, BE4max and evo-SpG did not produce significant amounts of the DI that is found in humans (S. Figure 1). However, BE4-NG (S. Figure 8B) and BE4-SpG (S. Figure 8C) produced modest but significant levels of DI in combination with gRNAs 5 and 7 (0.5-1% increase over the basal levels). The modest levels of DI alleles compared to conversions at other sites might be due to the lack of canonical PAMs (e.g., NGG) at the specific site (approximately 18 nucleotides upstream of the CAA-CAG interruption). We also observed that gRNAs 5 and 7 increased the relative number of sequence reads containing both DI and CAG-to-CAA conversions at other sites (S. Figures 9B and 9C), indicating that CTG trinucleotides on the non-coding strand of the repeat contributed to modest but widespread CAG-to-CAA conversion throughout the repeat. Similarly, CAG-to-CAA conversion by evo-SpG in combination with gRNAs 3-8 was not confined to specific sites, as DI alleles generated by these strategies also contained CAG-to-CAA conversions at other sites (S. Figure 9D). Since the increased conversion efficiency of evo-SpG could not be explained by transfection efficiency (S. Figure 10), these data indicate that evo-SpG has a significantly wider conversion window. In agreement with this, the most frequent number of conversions in a given sequence read by evo-SpG was greater than that of other base editors (S. Figure 11; S. Table 4).

Evaluation of off-target effects

We then evaluated the levels of off-target conversions using Off-Spotter. As summarized in S. Table 5, gRNAs 1 and 2 showed relatively smaller numbers of predicted off-targets due to unique sequences near the PAMs. As expected, gRNAs that were designed to hybridize throughout the CAG repeat showed increased numbers of predicted off-targets. Similarly, gRNAs to convert CAG at the back-end of the repeat showed larger numbers of predicted off-targets, potentially because their unique sequences are distal to the PAMs. Subsequently, we performed two sets of follow-up off-target validations. For gRNAs 1 and 2, we experimentally evaluated predicted off-targets focusing on protein-encoding genes; one and four genes were predicted off-target sites for gRNA 1 and gRNA 2, respectively, and all showed low levels of conversion compared to on-target (S. Table 6). We also characterized the levels of off-target conversion in other CAG repeat-containing genes, focusing on 8 polyglutamine disease genes (S. Table 7). As predicted, gRNAs 1 and 2 generally showed low-level conversions in the CAG repeats of other polyglutamine disease genes (S. Table 8; S. Figure 12). In contrast, gRNAs 3-8 produced variable but higher levels of conversion in some polyglutamine disease genes depending on the availability of preferred PAMs (S. Figure 12).

Allele specificities and molecular outcomes of candidate BE strategies

Subsequently, we evaluated the allele specificity of candidate BE strategies (BE4max-gRNA 1 and BE4max-gRNA 2) in patient-derived induced pluripotent stem cells (iPSC carrying 41 CAG CR) ³⁵; ³⁶ and differentiated neurons (S. Figure 13). As shown in S. Table 9, transfection of gRNA 1 and gRNA 2 produced modest CAG-to-CAA conversion on both mutant and normal HTT (less than approximately 3%). Overall, low conversion efficiencies by transfection and transduction of AAV (adeno-associated virus; data not shown) reflect difficulties in delivery in these cell types ³⁵; ⁵⁶, posing a challenge to determining the allele specificity of BE strategies. To overcome these technical difficulties, we developed a HEK293 clonal line carrying an expanded HTT CAG repeat (Figure 5A) by replacing one of the normal CAG repeats with a 51 CAG canonical repeat (namely HEK293-51 CAG) (S. Figure 14). A candidate BE strategy (i.e., BE4max-gRNA 1) did not increase the levels of in-frame insertion/deletion in the mutant or normal HTT repeat (Figures 5B and 5C). Subsequent analysis revealed that BE4max-gRNA 1 produced high levels of CAG-to-CAA conversions on both expanded and non-expanded canonical repeats (Figure 5D). Although the difference was very modest, conversion was significantly higher in the non-expanded repeat (uncorrected p-value, 0.04996), which might be explained by slightly reduced conversion on the mutant HTT due to higher GC content in the expanded CAG repeat. Importantly, the candidate BE strategies did not alter huntingtin protein levels (Figures 5E and 5F) at the time of treatment, supporting the safety of candidate BE strategies. We also performed RNAseq analysis to identify genes whose expression levels were altered by candidate BE strategies in HEK293 cells. 
Candidate strategies such as BE4max-gRNA 1 and BE4max-gRNA 2 produced significant on-target CAG-to-CAA conversions (Figure 6A), but the levels of HTT mRNA were not altered by either treatment (filled red circles in S. Figure 15 and Figure 6). In addition, RNAseq data analysis showed that neither BE strategy induced significant expression changes in any gene (false discovery rate, 0.05) (S. Figure 5). When comparing all HEK293 samples treated with either BE strategy (n=8) to those treated with EV (n=4), the shape of the volcano plot mimicked that of a random sample comparison (Figures 6B and 6C), implying a lack of impact of candidate BE strategies on the transcriptome.
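The RNAseq result is stated in terms of a false discovery rate threshold of 0.05. The text does not specify which procedure was used to control the FDR; the sketch below assumes the widely used Benjamini-Hochberg adjustment purely for illustration, with invented p-values.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original input order."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone adjusted values.
    for rank in range(n, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * n / rank)
        adjusted[index] = running_min
    return adjusted

# Hypothetical per-gene p-values from a differential expression test.
gene_p = [0.01, 0.04, 0.03, 0.5]
gene_q = benjamini_hochberg(gene_p)
n_significant = sum(q < 0.05 for q in gene_q)

print(gene_q, n_significant)
```

In a volcano plot, genes passing such a threshold would sit above the horizontal FDR line; a result like the one reported corresponds to no gene crossing it.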

Effects of base conversion on the CAG repeat instability in vivo

The limited cargo capacity of AAV has been circumvented by the intein-split base editor, and the feasibility of BE strategies targeting non-repetitive sequences has been demonstrated in mouse models of human diseases ³⁴; ⁵⁷; ⁵⁸. Taking advantage of the split BE system, we determined whether a candidate HD BE strategy could target the CAG repeat and result in a decrease in somatic repeat expansion, which was hypothesized to be the major disease driver ²⁰. Since striatal and liver repeat instability share certain underlying mechanisms ⁵⁹–⁶⁶, and in vivo delivery might be more efficient in the liver compared to the brain ³⁴, we used AAV9 to evaluate a candidate BE strategy in the liver. As expected, the somatic CAG repeat expansion index in the liver of HD knock-in mice carrying around 110 CAGs showed a positive correlation with the inherited CAG repeat length (as represented in tail DNA) and the age of mice (Figures 7A and 7B) ¹⁶; ²³; ⁴¹; ⁴⁴; ⁶⁷; ⁶⁸. Unfortunately, we could not determine the sequence modification in treated mice by sequencing because of 1) very long CAG repeats in these mice, 2) modest levels of base conversion, and 3) high levels of errors when sequencing the CAG repeat (S. Figure 6). However, when the effects of the tail CAG repeat size and age of mice were accounted for, retro-orbital injection of AAV9 for split CBE (v5 AAV-CBE) and gRNA 2 significantly decreased the levels of repeat expansion (Figure 7C; p-value, 1.78E-6). Nevertheless, the expansion indices in treated and control mice largely overlapped (Figure 7A), suggesting that the effects of BE treatment were very modest. We speculate that 1) insufficient dosage due to difficulty in producing high-titer viral packages for large cargo (i.e., 5 kb) ³⁴; ⁶⁹, 2) limited delivery ⁷⁰, and/or 3) difficulty in targeting the very long CAG repeat resulted in modest effects. 
Given those limitations, we also analyzed a mouse model carrying an interrupted repeat to determine the maximum effects of the interruption on repeat expansion. HD knock-in mice carrying an interrupted 105 CAG repeat (https://www.jax.org/strain/027418) showed complete loss of repeat expansion compared to mice carrying an uninterrupted 105 CAG repeat (https://www.jax.org/strain/027417) (Figures 7D and 7E), suggesting that CAA interruption could completely suppress the most important disease modifier (i.e., CAG repeat expansion).
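The covariate-adjusted comparison described above (expansion modeled as a function of treatment, tail CAG length, and age) can be illustrated with an ordinary least-squares fit. This is a sketch under stated assumptions: the data are synthetic and noise-free, the coefficients are invented, batch and sex are omitted for brevity, and the helper functions are not the study's analysis code.

```python
def solve(matrix, rhs):
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x

def ols(predictors, response):
    """Least-squares coefficients (intercept first) via the normal equations."""
    design = [[1.0] + list(row) for row in predictors]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * y for d, y in zip(design, response)) for i in range(k)]
    return solve(xtx, xty)

# Synthetic mice: (treated 0/1, tail CAG length, age in months).
mice = [(0, 105, 3), (1, 107, 3), (0, 110, 5), (1, 112, 5),
        (0, 108, 4), (1, 111, 6), (0, 113, 6), (1, 106, 4)]
# Invented relationship: treatment lowers the expansion index by 0.5 units.
expansion = [1.0 - 0.5 * t + 0.1 * cag + 0.3 * age for t, cag, age in mice]

intercept, treatment_effect, cag_effect, age_effect = ols(mice, expansion)
print(round(treatment_effect, 6), round(cag_effect, 6), round(age_effect, 6))
```

The point of the adjustment is that a treatment effect can be estimated even when raw expansion values of treated and control animals overlap, because variation explained by inherited repeat length and age is removed first.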

Discussion

Recent advances in genome engineering provide powerful tools to interrogate the relationships among genes, functions, and diseases. For example, CRISPR-Cas9-based editing approaches have revolutionized the investigation of individual genes of interest and have also begun to be applied to humans to treat diseases ⁷¹–⁷⁵. Base editing (BE), which can convert a single nucleotide to another, represents a newly developed and highly versatile genome engineering technology ³¹; ³³. BE has advantages over other genome engineering approaches with respect to safety and clinical applicability. BE employing nickase Cas9 does not intentionally create double-stranded DNA breaks (DSBs) ³¹; ³³, minimizing potential adverse effects. Also, BE with low off-targeting is being actively developed, adding an additional layer of safety ⁷⁶. The majority of well-characterized disease-causing mutations are point mutations, and therefore many genetic disorders can be addressed by BE strategies ⁷⁷. The robustness of BE has been demonstrated in models of genetic disorders caused by point mutations ⁵⁷; ⁵⁸; ⁷⁷; ⁷⁸, and the first human trial employing base editing has already started ⁷⁹; ⁸⁰. However, many human disorders are caused by other types of mutations, such as expansions of DNA repeats ⁷; ¹⁶; ⁸¹, for which BE may not seem like an ideal tool. In contrast to this commonly held notion, we show that BE strategies could also address diseases that are caused by expanded repeats, broadening their target space and applicability.

In HD, multiple studies have shown that the uninterrupted CAG repeat length in the HTT gene, not the polyglutamine length in huntingtin protein, determines age-at-onset ²⁴; ²⁷; ³⁰. Age-at-onset of HD subjects carrying DI alleles not only supports this notion directly, but also points to novel therapeutic strategies. For example, converting CAG to CAA would decrease the length of the uninterrupted CAG repeat without changing the length of polyglutamine or altering huntingtin protein. Indeed, our candidate BE strategies could shorten the length of the uninterrupted CAG repeat by converting CAG to CAA at various sites in the CAG repeat without causing significant indels or off-target effects. In support, our candidate BE strategy modestly but significantly reduced the levels of CAG repeat expansion in mice, and HD knock-in mice carrying the CAA-interrupted repeats showed virtually zero repeat expansion. Given the role of the uninterrupted CAG repeat length as the most important disease determinant and a pivotal role for repeat instability in the modification of HD ²⁶; ²⁷, our data support the therapeutic potential of CAG-to-CAA conversion BE strategies in HD.
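The central point, that a CAG-to-CAA conversion shortens the uninterrupted CAG tract while leaving the polyglutamine length unchanged, can be verified on a toy allele. The sketch assumes a canonical repeat of 43 CAGs followed by a CAA-CAG tail and converts the 42nd CAG, mirroring the example given earlier in the text; the function names are illustrative.

```python
GLUTAMINE_CODONS = {"CAG", "CAA"}  # both codons encode glutamine (Q)

def codons(seq):
    """Split an in-frame sequence into triplets."""
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]

def longest_cag_run(seq):
    """Length, in codons, of the longest uninterrupted run of CAG."""
    best = run = 0
    for codon in codons(seq):
        run = run + 1 if codon == "CAG" else 0
        best = max(best, run)
    return best

def polyglutamine_length(seq):
    """Number of glutamine codons (CAG or CAA) in the sequence."""
    return sum(codon in GLUTAMINE_CODONS for codon in codons(seq))

def convert_to_caa(seq, codon_number):
    """Convert the given 1-based codon from CAG to CAA."""
    triplets = codons(seq)
    if triplets[codon_number - 1] != "CAG":
        raise ValueError("target codon is not CAG")
    triplets[codon_number - 1] = "CAA"
    return "".join(triplets)

canonical = "CAG" * 43 + "CAACAG"        # assumed canonical repeat structure
edited = convert_to_caa(canonical, 42)   # yields a duplicated CAA-CAG interruption

print(longest_cag_run(canonical), longest_cag_run(edited))
print(polyglutamine_length(canonical), polyglutamine_length(edited))
```

The uninterrupted tract drops from 43 to 41 CAGs, while the encoded glutamine count stays at 45 before and after the edit.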

Our data are relevant for a number of reasons. Firstly, genetically supported targets significantly increase the success rate in clinical development ⁸²; our BE strategies derive directly from human genetic observations in HD individuals ²⁴; ²⁷; ³⁰ that point to the uninterrupted CAG repeat length in HTT as the most direct therapeutic target in HD. Based on these human data, CAG-to-CAA conversion even near the 5’-end of the expanded CAG repeat may produce robust onset-delaying effects. In addition, if BE strategies are applied to fully penetrant 40 or 41 CAG canonical repeats, the repeats can be brought into the reduced-penetrance range (i.e., 36-39). Similarly, BE strategies may be able to convert some reduced-penetrance CAG repeats (e.g., 36 and 37 CAG) to non-pathogenic lengths (CAG < 36), which could prevent the manifestation of the disease. Secondly, lessons from the recent huntingtin-lowering clinical trial ⁸³; ⁸⁴ implied the importance of allele-specific approaches ³⁵; ³⁶. Although CAG-to-CAA conversions in our experiments occurred on both mutant and normal HTT, our candidate BE strategies are expected to produce mutant allele-specific consequences. The amino acid sequence and levels of huntingtin were not altered, while the length of the uninterrupted CAG repeat was shortened. On the mutant allele, this shortening is expected to reduce the somatic instability of the repeat, reducing its disease-producing potential. On the normal allele, inherited variation in the length of the CAG repeat has not been associated with an abnormal phenotype ¹⁹, so the shortening of the CAG repeat is expected to be benign. Importantly, the mutant allele-specific consequences can be achieved without relying on individual genetic variations beyond the CAG repeat. Various SNP-targeting allele-specific approaches have been proposed ³⁵; ³⁶; ⁸⁵; ⁸⁶, but most of these can be applied only to a subset of the HD population depending on heterozygosity at the target site. 
Our BE strategies can achieve allele-specific consequences without targeting SNPs and, therefore, can be applied to all HD subjects, representing a major advantage over SNP-targeting allele-specific strategies. Lastly, BE with relaxed PAM requirements ⁵¹; ⁷⁷ has increased the applicability of BE, and our study has further expanded the target space of this powerful technology. BE is appropriately viewed as a tool to correct disease-causing point mutations or to modify gene expression by introducing early stop codons or altering splice sites ⁷⁷; ⁸⁷; ⁸⁸. However, our study demonstrates that base conversion can address disease-causing repeat expansion mutations without involving DSBs. The ramifications apply not only to HD but also to numerous other diseases that are caused by expansions of repeats ⁴–⁸, offering alternative therapeutic approaches for the repeat expansion disorders.

Although promising, hurdles must be overcome before CAG-to-CAA conversion BE strategies are applied to humans. The BE strategies that we evaluated did not robustly generate the DI alleles that are found in humans, possibly because the PAMs at the specific sites required to generate DI did not sufficiently support the activity of the CBEs that we tested. Therefore, new CBEs that can efficiently generate DI alleles will greatly facilitate the development of rational treatments for HD. Also, the inability to directly target alternative toxic species such as RAN translation or exon 1A huntingtin fragment ⁸⁹–⁹² may represent one of the limitations of our BE strategies for HD. Still, if the levels of those alternative toxic species are dependent on the length of the uninterrupted CAG repeat, CAG-to-CAA conversion strategies may be able to ameliorate alternative toxic species-mediated HD pathogenesis. Although BE strategies can address the primary disease driver in principle, they may not produce any significant clinical benefits if they are applied too late in the disease. As we previously speculated, the timing of treatment might have negatively impacted the outcomes of the first ASO HTT lowering trial ³⁵; ³⁶; ⁸³; ⁸⁴. Considering the evidence for significant levels of neurodegeneration at the onset of characteristic clinical manifestations ⁹³, CAG-to-CAA conversion treatments may not produce any clinical improvements if applied late. With the expectation of mutant-specific consequences, we reason that BE strategies can be applied quite early without involving deficiency-related adverse effects ⁹⁴–⁹⁷ because CAG-to-CAA conversion is predicted to neither alter the amino acid sequence nor change the expression levels of HTT. Regardless, these temporal aspects and safety features have to be determined. Finally, like other gene targeting strategies, the development of effective delivery methods is critical for applying BE therapeutically. 
The expansion-decreasing effects of our initial AAV injection experiments, while significant, may have been limited compared to cell culture systems by inefficient delivery and difficulty in targeting the repeat sequence. For the successful application of BE strategies to human HD, efficient delivery methods will be critical.

Given the lack of effective treatments for HD and the premature terminations of highly anticipated HTT-lowering clinical trials such as GENERATION-HD1 ⁸³ and VIBRANT-HD (https://www.hda.org.uk/media/4418/novartis-vibrant-hd-community-letter-final-pdf.pdf), aiming at the most relevant target is becoming increasingly important. Our data reveal relevant strategies for addressing the target most strongly supported by human HD genetic data, the uninterrupted CAG repeat in HTT, and therefore offer new opportunities for blocking the disorder at its cause. Although both great promise and significant hurdles exist for the clinical application of BE strategies in HD, our data demonstrate the proof-of-concept of this technology as the basis for developing a rational treatment for HD and, potentially, for other repeat expansion disorders.

Supporting information

S. Figures

S. Tables

Abbreviations

HD: Huntington’s disease
HTT: huntingtin
Q: glutamine
CR: canonical repeat
LI: loss of interruption
DI: duplicated interruption
BE: base editing
CBE: cytosine base editor
PAM: protospacer adjacent motif
gRNA: guide RNA
iPSC: induced pluripotent stem cell
EV: empty vector

Acknowledgements

We thank Drs. Marcy E. MacDonald, James F. Gusella, and David Liu for helpful discussion. This work was supported by grants from Harvard NeuroDiscovery Center, NIH (NS105709, NS119471, NS091161, NS049206), and CHDI Foundation. B.P.K. was also supported by an MGH ECOR Howard M. Goodman Award and a CHDI Research Agreement (14962).

Declaration of Interests

V.C.W. was a founding scientific advisory board member with financial interest in Triplet Therapeutics Inc. Her financial interests were reviewed and are managed by Massachusetts General Hospital and Mass General Brigham in accordance with their conflict of interest policies. V.C.W. is a scientific advisory board member of LoQus23 Therapeutics Ltd. and has provided paid consulting services to Acadia Pharmaceuticals Inc., Alnylam Inc., Biogen Inc. and Passage Bio. V.C.W. has received research support from Pfizer Inc. B.P.K is an inventor on patents and/or patent applications filed by Mass General Brigham that describe genome engineering technologies. B.P.K. is a consultant for EcoR1 capital and is a scientific advisory board member of Acrigen Biosciences, Life Edit Therapeutics, and Prime Medicine. J-ML consults for Life Edit Therapeutics and serves in the advisory board of GenEdit Inc.

Data availability

RNAseq data of control and targeted iPSC clones have been deposited in Dryad (https://doi.org/10.5061/dryad.k3j9kd5cb).

References

1.
1. Huntington G
1872. On chorea. Med Surg Rep 26:320–321
2.
1. Huntington’s Disease Collaborative Research Group.
1993. A novel gene containing a trinucleotide repeat that is expanded and unstable on Huntington’s disease chromosomes. Cell 72:971–983
3.
1. Bates G.P.
2. Dorsey R.
3. Gusella J.F.
4. Hayden M.R.
5. Kay C.
6. Leavitt B.R.
7. Nance M.
8. Ross C.A.
9. Scahill R.I.
10. Wetzel R.
11. et al.
2015. Huntington disease. Nature reviews Disease primers 1:15005
4.
1. Gusella J.F.
2. MacDonald M.E
2000. Molecular genetics: unmasking polyglutamine triggers in neurodegenerative disease. Nature reviews Neuroscience 1:109–115
5.
1. Ross C.A
2002. Polyglutamine pathogenesis: emergence of unifying mechanisms for Huntington’s disease and related disorders. Neuron 35:819–822
6.
1. Di Prospero N.A.
2. Fischbeck K.H.
2005. Therapeutics development for triplet repeat expansion diseases. Nature reviews Genetics 6:756–765
7.
1. Orr H.T.
2. Zoghbi H.Y
2007. Trinucleotide repeat disorders. Annual review of neuroscience 30:575–621
8.
1. Depienne C.
2. Mandel J.L
2021. 30 years of repeat expansion disorders: What have we learned and what are the remaining challenges? American journal of human genetics 108:764–785
9.
1. Paulson H.L.
2. Bonini N.M.
3. Roth K.A
2000. Polyglutamine disease and neuronal cell death. Proceedings of the National Academy of Sciences of the United States of America 97:12957–12958
10.
1. Gatchel J.R.
2. Zoghbi H.Y
2005. Diseases of unstable repeat expansion: mechanisms and common principles. Nature reviews Genetics 6:743–755
11.
1. Orr H.T.
2. Chung M.Y.
3. Banfi S.
4. Kwiatkowski T.J.
5. Servadio A.
6. Beaudet A.L.
7. McCall A.E.
8. Duvick L.A.
9. Ranum L.P.
10. Zoghbi H.Y
1993. Expansion of an unstable trinucleotide CAG repeat in spinocerebellar ataxia type 1. Nature genetics 4:221–226
12.
1. Pulst S.M.
2. Nechiporuk A.
3. Nechiporuk T.
4. Gispert S.
5. Chen X.N.
6. Lopes-Cendes I.
7. Pearlman S.
8. Starkman S.
9. Orozco-Diaz G.
10. Lunkes A.
11. et al.
1996. Moderate expansion of a normally biallelic trinucleotide repeat in spinocerebellar ataxia type 2. Nature genetics 14:269–276
13.
1. Stevanin G.
2. Durr A.
3. Brice A
2000. Clinical and molecular advances in autosomal dominant cerebellar ataxias: from genotype to phenotype and physiopathology. European journal of human genetics: EJHG 8:4–18
14.
1. Zoghbi H.Y.
2. Orr H.T
2000. Glutamine repeats and neurodegeneration. Annual review of neuroscience 23:217–247
15.
1. Schols L.
2. Bauer P.
3. Schmidt T.
4. Schulte T.
5. Riess O
2004. Autosomal dominant cerebellar ataxias: clinical features, genetics, and pathogenesis. The Lancet Neurology 3:291–304
16.
1. Pearson C.E.
2. Nichol Edamura K.
3. Cleary J.D
2005. Repeat instability: mechanisms of dynamic mutations. Nature reviews Genetics 6:729–742
17.
1. Andrew S.E.
2. Goldberg Y.P.
3. Kremer B.
4. Telenius H.
5. Theilmann J.
6. Adam S.
7. Starr E.
8. Squitieri F.
9. Lin B.
10. Kalchman M.A.
11. et al.
1993. The relationship between trinucleotide (CAG) repeat length and clinical features of Huntington’s disease. Nature genetics 4:398–403
18.
1. Duyao M.
2. Ambrose C.
3. Myers R.
4. Novelletto A.
5. Persichetti F.
6. Frontali M.
7. Folstein S.
8. Ross C.
9. Franz M.
10. Abbott M.
11. et al.
1993. Trinucleotide repeat length instability and age of onset in Huntington’s disease. Nature genetics 4:387–392
19.
1. Lee J.M.
2. Ramos E.M.
3. Lee J.H.
4. Gillis T.
5. Mysore J.S.
6. Hayden M.R.
7. Warby S.C.
8. Morrison P.
9. Nance M.
10. Ross C.A.
11. et al.
2012. CAG repeat expansion in Huntington disease determines age at onset in a fully dominant fashion. Neurology 78:690–695
20.
1. Kaplan S.
2. Itzkovitz S.
3. Shapiro E
2007. A universal mechanism ties genotype to phenotype in trinucleotide diseases. PLoS computational biology 3:e235
21.
1. Telenius H.
2. Kremer B.
3. Goldberg Y.P.
4. Theilmann J.
5. Andrew S.E.
6. Zeisler J.
7. Adam S.
8. Greenberg C.
9. Ives E.J.
10. Clarke L.A.
11. et al.
1994. Somatic and gonadal mosaicism of the Huntington disease gene CAG repeat in brain and sperm. Nature genetics 6:409–414
22.
1. Shelbourne P.F.
2. Keller-McGandy C.
3. Bi W.L.
4. Yoon S.R.
5. Dubeau L.
6. Veitch N.J.
7. Vonsattel J.P.
8. Wexler N.S.
9. U.S.-Venezuela Collaborative Research Group
10. Arnheim N.
11. et al.
2007. Triplet repeat mutation length gains correlate with cell-type specific vulnerability in Huntington disease brain. Human molecular genetics 16:1133–1142
23.
1. Mouro Pinto R.
2. Arning L.
3. Giordano J.V.
4. Razghandi P.
5. Andrew M.A.
6. Gillis T.
7. Correia K.
8. Mysore J.S.
9. Grote Urtubey D.M.
10. Parwez C.R.
11. et al.
2020. Patterns of CAG repeat instability in the central nervous system and periphery in Huntington’s disease and in spinocerebellar ataxia type 1. Human molecular genetics 29:2551–2567
24.
1. Ciosi M.
2. Maxwell A.
3. Cumming S.A.
4. Hensman Moss D.J.
5. Alshammari A.M.
6. Flower M.D.
7. Durr A.
8. Leavitt B.R.
9. Roos R.A.C.
2019. A genetic association study of glutamine-encoding DNA sequence structures, somatic CAG expansion, and DNA repair gene variants, with Huntington disease clinical outcomes. EBioMedicine 48:568–580
25.
1. Swami M.
2. Hendricks A.E.
3. Gillis T.
4. Massood T.
5. Mysore J.
6. Myers R.H.
7. Wheeler V.C
2009. Somatic expansion of the Huntington’s disease CAG repeat in the brain is associated with an earlier age of disease onset. Human molecular genetics 18:3039–3047
26.
1. GeM-HD Consortium
2015. Identification of Genetic Factors that Modify Clinical Onset of Huntington’s Disease. Cell 162:516–526
27.
1. GeM-HD Consortium
2019. CAG Repeat Not Polyglutamine Length Determines Timing of Huntington’s Disease Onset. Cell 178:887–900
28.
1. Hong E.P.
2. MacDonald M.E.
3. Wheeler V.C.
4. Jones L.
5. Holmans P.
6. Orth M.
7. Monckton D.G.
8. Long J.D.
9. Kwak S.
10. Gusella J.F.
11. et al.
2021. Huntington’s Disease Pathogenesis: Two Sequential Components. Journal of Huntington’s disease 10:35–51
29.
1. Lee J.M.
2. Chao M.J.
3. Harold D.
4. Abu Elneel K.
5. Gillis T.
6. Holmans P.
7. Jones L.
8. Orth M.
9. Myers R.H.
10. Kwak S.
11. et al.
2017. A modifier of Huntington’s disease onset at the MLH1 locus. Human molecular genetics 26:3859–3867
30.
1. Wright G.E.B.
2. Collins J.A.
3. Kay C.
4. McDonald C.
5. Dolzhenko E.
6. Xia Q.
7. Becanovic K.
8. Drogemoller B.I.
9. Semaka A.
10. Nguyen C.M.
11. et al.
2019. Length of Uninterrupted CAG, Independent of Polyglutamine Size, Results in Increased Somatic Instability, Hastening Onset of Huntington Disease. American journal of human genetics 104:1116–1126
31.
1. Komor A.C.
2. Kim Y.B.
3. Packer M.S.
4. Zuris J.A.
5. Liu D.R
2016. Programmable editing of a target base in genomic DNA without double-stranded DNA cleavage. Nature 533:420–424
32.
1. Nishida K.
2. Arazoe T.
3. Yachie N.
4. Banno S.
5. Kakimoto M.
6. Tabata M.
7. Mochizuki M.
8. Miyabe A.
9. Araki M.
10. Hara K.Y.
11. et al.
2016. Targeted nucleotide editing using hybrid prokaryotic and vertebrate adaptive immune systems. Science 353
33.
1. Gaudelli N.M.
2. Komor A.C.
3. Rees H.A.
4. Packer M.S.
5. Badran A.H.
6. Bryson D.I.
7. Liu D.R
2017. Programmable base editing of A•T to G•C in genomic DNA without DNA cleavage. Nature 551:464–471
34.
1. Levy J.M.
2. Yeh W.H.
3. Pendse N.
4. Davis J.R.
5. Hennessey E.
6. Butcher R.
7. Koblan L.W.
8. Comander J.
9. Liu Q.
10. Liu D.R
2020. Cytosine and adenine base editing of the brain, liver, retina, heart and skeletal muscle of mice via adeno-associated viruses. Nature biomedical engineering 4:97–110
35.
1. Shin J.W.
2. Hong E.P.
3. Park S.S.
4. Choi D.E.
5. Seong I.S.
6. Whittaker M.N.
7. Kleinstiver B.P.
8. Chen R.Z.
9. Lee J.M.
2022. Allele-specific silencing of the gain-of-function mutation in Huntington’s disease using CRISPR/Cas9. JCI insight 7
36.
1. Shin J.W.
2. Shin A.
3. Park S.S.
4. Lee J.M
2022. Haplotype-specific insertion-deletion variations for allele-specific targeting in Huntington’s disease. Molecular therapy Methods & clinical development 25:84–95
37.
1. Fjodorova M.
2. Li M
2018. Robust Induction of DARPP32-Expressing GABAergic Striatal Neurons from Human Pluripotent Stem Cells. Methods in molecular biology 1780:585–605
38.
1. Hsu P.D.
2. Scott D.A.
3. Weinstein J.A.
4. Ran F.A.
5. Konermann S.
6. Agarwala V.
7. Li Y.
8. Fine E.J.
9. Wu X.
10. Shalem O.
11. et al.
2013. DNA targeting specificity of RNA-guided Cas9 nucleases. Nature biotechnology 31:827–832
39.
1. Dobin A.
2. Davis C.A.
3. Schlesinger F.
4. Drenkow J.
5. Zaleski C.
6. Jha S.
7. Batut P.
8. Chaisson M.
9. Gingeras T.R
2013. STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29:15–21
40.
1. Vera Alvarez R.
2. Pongor L.S.
3. Marino-Ramirez L.
4. Landsman D.
2019. TPMCalculator: one-step software to quantify mRNA abundance of genomic features. Bioinformatics 35:1960–1962
41.
1. Wheeler V.C.
2. Auerbach W.
3. White J.K.
4. Srinidhi J.
5. Auerbach A.
6. Ryan A.
7. Duyao M.P.
8. Vrbanac V.
9. Weaver M.
10. Gusella J.F.
11. et al.
1999. Length-dependent gametic CAG repeat instability in the Huntington’s disease knock-in mouse. Human molecular genetics 8:115–122
42.
1. Lloret A.
2. Dragileva E.
3. Teed A.
4. Espinola J.
5. Fossale E.
6. Gillis T.
7. Lopez E.
8. Myers R.H.
9. MacDonald M.E.
10. Wheeler V.C
2006. Genetic background modifies nuclear mutant huntingtin accumulation and HD CAG repeat instability in Huntington’s disease knock-in mice. Human molecular genetics 15:2015–2024
43.
1. Lee J.M.
2. Zhang J.
3. Su A.I.
4. Walker J.R.
5. Wiltshire T.
6. Kang K.
7. Dragileva E.
8. Gillis T.
9. Lopez E.T.
10. Boily M.J.
11. et al.
2010. A novel approach to investigate tissue-specific trinucleotide repeat instability. BMC systems biology 4:29
44.
1. Lee J.M.
2. Pinto R.M.
3. Gillis T.
4. St Claire J.C.
5. Wheeler V.C
2011. Quantification of age-dependent somatic CAG repeat instability in Hdh CAG knock-in mice reveals different expansion dynamics in striatum and liver. PloS one 6:e23647
45.
1. Benjamini Y.
2. Drai D.
3. Elmer G.
4. Kafkafi N.
5. Golani I
2001. Controlling the false discovery rate in behavior genetics research. Behavioural brain research 125:279–284
46.
1. McAllister B.
2. Donaldson J.
3. Binda C.S.
4. Powell S.
5. Chughtai U.
6. Edwards G.
7. Stone J.
8. Lobanov S.
9. Elliston L.
10. Schuhmacher L.N.
11. et al.
2022. Exome sequencing of individuals with Huntington’s disease implicates FAN1 nuclease activity in slowing CAG expansion and disease onset. Nature neuroscience 25:446–457
47.
1. Komor A.C.
2. Zhao K.T.
3. Packer M.S.
4. Gaudelli N.M.
5. Waterbury A.L.
6. Koblan L.W.
7. Kim Y.B.
8. Badran A.H.
9. Liu D.R.
2017. Improved base excision repair inhibition and bacteriophage Mu Gam protein yields C:G-to-T:A base editors with higher efficiency and product purity. Science advances 3:eaao4774
48.
1. Koblan L.W.
2. Doman J.L.
3. Wilson C.
4. Levy J.M.
5. Tay T.
6. Newby G.A.
7. Maianti J.P.
8. Raguram A.
9. Liu D.R.
2018. Improving cytidine and adenine base editors by expression optimization and ancestral reconstruction. Nature biotechnology 36:843–846
49.
1. Thuronyi B.W.
2. Koblan L.W.
3. Levy J.M.
4. Yeh W.H.
5. Zheng C.
6. Newby G.A.
7. Wilson C.
8. Bhaumik M.
9. Shubina-Oleinik O.
10. Holt J.R.
11. et al.
2019. Continuous evolution of base editors with expanded target compatibility and improved activity. Nature biotechnology 37:1070–1079
50.
1. Nishimasu H.
2. Shi X.
3. Ishiguro S.
4. Gao L.
5. Hirano S.
6. Okazaki S.
7. Noda T.
8. Abudayyeh O.O.
9. Gootenberg J.S.
10. Mori H.
11. et al.
2018. Engineered CRISPR-Cas9 nuclease with expanded targeting space. Science 361:1259–1262
51.
1. Walton R.T.
2. Christie K.A.
3. Whittaker M.N.
4. Kleinstiver B.P
2020. Unconstrained genome targeting with near-PAMless engineered CRISPR-Cas9 variants. Science 368:290–296
52.
1. Kim Y.B.
2. Komor A.C.
3. Levy J.M.
4. Packer M.S.
5. Zhao K.T.
6. Liu D.R
2017. Increasing the genome-targeting scope and precision of base editing with engineered Cas9-cytidine deaminase fusions. Nature biotechnology 35:371–376
53.
1. Gehrke J.M.
2. Cervantes O.
3. Clement M.K.
4. Wu Y.
5. Zeng J.
6. Bauer D.E.
7. Pinello L.
8. Joung J.K.
2018. An APOBEC3A-Cas9 base editor with minimized bystander and off-target activities. Nature biotechnology 36:977–982
54.
1. Fu J.
2. Li Q.
3. Liu X.
4. Tu T.
5. Lv X.
6. Yin X.
7. Lv J.
8. Song Z.
9. Qu J.
10. Zhang J.
11. et al.
2021. Human cell based directed evolution of adenine base editors with improved efficiency. Nature communications 12:5897
55.
1. Xu L.
2. Zhang C.
3. Li H.
4. Wang P.
5. Gao Y.
6. Mokadam N.A.
7. Ma J.
8. Arnold W.D.
9. Han R
2021. Efficient precise in vivo base editing in adult dystrophic mice. Nature communications 12:3719
56.
1. Duong T.T.
2. Lim J.
3. Vasireddy V.
4. Papp T.
5. Nguyen H.
6. Leo L.
7. Pan J.
8. Zhou S.
9. Chen H.I.
10. Bennett J.
11. et al.
2019. Comparative AAV-eGFP Transgene Expression Using Vector Serotypes 1-9, 7m8, and 8b in Human Pluripotent Stem Cells, RPEs, and Human and Rat Cortical Neurons. Stem cells international 2019:7281912
57.
1. Villiger L.
2. Grisch-Chan H.M.
3. Lindsay H.
4. Ringnalda F.
5. Pogliano C.B.
6. Allegri G.
7. Fingerhut R.
8. Haberle J.
9. Matos J.
10. Robinson M.D.
11. et al.
2018. Treatment of a metabolic liver disease by in vivo genome base editing in adult mice. Nature medicine 24:1519–1525
58.
1. Koblan L.W.
2. Erdos M.R.
3. Wilson C.
4. Cabral W.A.
5. Levy J.M.
6. Xiong Z.M.
7. Tavarez U.L.
8. Davison L.M.
9. Gete Y.G.
10. Mao X.
11. et al.
2021. In vivo base editing rescues Hutchinson-Gilford progeria syndrome in mice. Nature 589:608–614
59.
1. Mangiarini L.
2. Sathasivam K.
3. Mahal A.
4. Mott R.
5. Seller M.
6. Bates G.P
1997. Instability of highly expanded CAG repeats in mice transgenic for the Huntington’s disease mutation. Nature genetics 15:197–200
60.
1. Manley K.
2. Shirley T.L.
3. Flaherty L.
4. Messer A
1999. Msh2 deficiency prevents in vivo somatic instability of the CAG repeat in Huntington disease transgenic mice. Nature genetics 23:471–473
61.
1. Kovtun I.V.
2. McMurray C.T
2001. Trinucleotide expansion in haploid germ cells by gap repair. Nature genetics 27:407–411
62.
1. Kennedy L.
2. Evans E.
3. Chen C.M.
4. Craven L.
5. Detloff P.J.
6. Ennis M.
7. Shelbourne P.F
2003. Dramatic tissue-specific mutation length increases are an early molecular event in Huntington disease pathogenesis. Human molecular genetics 12:3359–3367
63.
1. Kovalenko M.
2. Dragileva E.
3. St Claire J.
4. Gillis T.
5. Guide J.R.
6. New J.
7. Dong H.
8. Kucherlapati R.
9. Kucherlapati M.H.
10. Ehrlich M.E.
11. et al.
2012. Msh2 acts in medium-spiny striatal neurons as an enhancer of CAG instability and mutant huntingtin phenotypes in Huntington’s disease knock-in mice. PloS one 7:e44273
64.
1. Pinto R.M.
2. Dragileva E.
3. Kirby A.
4. Lloret A.
5. Lopez E.
6. St Claire J.
7. Panigrahi G.B.
8. Hou C.
9. Holloway K.
10. Gillis T.
11. et al.
2013. Mismatch repair genes Mlh1 and Mlh3 modify CAG instability in Huntington’s disease mice: genome-wide and candidate approaches. PLoS genetics 9:e1003930
65.
1. Ament S.A.
2. Pearl J.R.
3. Grindeland A.
4. St Claire J.
5. Earls J.C.
6. Kovalenko M.
7. Gillis T.
8. Mysore J.
9. Gusella J.F.
10. Lee J.M.
11. et al.
2017. High resolution time-course mapping of early transcriptomic, molecular and cellular phenotypes in Huntington’s disease CAG knock-in mice across multiple genetic backgrounds. Human molecular genetics 26:913–922
66.
1. Loupe J.M.
2. Pinto R.M.
3. Kim K.H.
4. Gillis T.
5. Mysore J.S.
6. Andrew M.A.
7. Kovalenko M.
8. Murtha R.
9. Seong I.
10. Gusella J.F.
11. et al.
2020. Promotion of somatic CAG repeat expansion by Fan1 knock-out in Huntington’s disease knock-in mice is blocked by Mlh1 knock-out. Human molecular genetics 29:3044–3053
67.
1. Kacher R.
2. Lejeune F.X.
3. Noel S.
4. Cazeneuve C.
5. Brice A.
6. Humbert S.
7. Durr A
2021. Propensity for somatic expansion increases over the course of life in Huntington disease. eLife 10
68.
1. Kennedy L.
2. Shelbourne P.F
2000. Dramatic mutation instability in HD mouse striatum: does polyglutamine load contribute to cell-specific vulnerability in Huntington’s disease? Human molecular genetics 9:2539–2544
69.
1. Wu Z.
2. Yang H.
3. Colosi P
2010. Effect of genome size on AAV vector packaging. Molecular therapy: the journal of the American Society of Gene Therapy 18:80–86
70.
1. Carvalho L.S.
2. Turunen H.T.
3. Wassmer S.J.
4. Luna-Velez M.V.
5. Xiao R.
6. Bennett J.
7. Vandenberghe L.H
2017. Evaluating Efficiencies of Dual AAV Approaches for Retinal Targeting. Frontiers in neuroscience 11:503
71.
1. Doudna J.A.
2. Charpentier E
2014. Genome editing. The new frontier of genome engineering with CRISPR-Cas9
72.
1. Hsu P.D.
2. Lander E.S.
3. Zhang F
2014. Development and applications of CRISPR-Cas9 for genome engineering. Cell 157:1262–1278
73.
1. Stadtmauer E.A.
2. Fraietta J.A.
3. Davis M.M.
4. Cohen A.D.
5. Weber K.L.
6. Lancaster E.
7. Mangan P.A.
8. Kulikovskaya I.
9. Gupta M.
10. Chen F.
11. et al.
2020. CRISPR-engineered T cells in patients with refractory cancer. Science 367
74.
1. Gillmore J.D.
2. Gane E.
3. Taubel J.
4. Kao J.
5. Fontana M.
6. Maitland M.L.
7. Seitzer J.
8. O’Connell D.
9. Walsh K.R.
10. Wood K.
11. et al.
2021. CRISPR-Cas9 In Vivo Gene Editing for Transthyretin Amyloidosis. The New England journal of medicine 385:493–502
75.
1. Wang B.
2. Iriguchi S.
3. Waseda M.
4. Ueda N.
5. Ueda T.
6. Xu H.
7. Minagawa A.
8. Ishikawa A.
9. Yano H.
10. Ishi T.
11. et al.
2021. Generation of hypoimmunogenic T cells from genetically engineered allogeneic human induced pluripotent stem cells. Nature biomedical engineering 5:429–440
76.
1. Neugebauer M.E.
2. Hsu A.
3. Arbab M.
4. Krasnow N.A.
5. McElroy A.N.
6. Pandey S.
7. Doman J.L.
8. Huang T.P.
9. Raguram A.
10. Banskota S.
11. et al.
2022. Evolution of an adenine base editor into a small, efficient cytosine base editor with low off-target activity. Nature biotechnology
77.
1. Rees H.A.
2. Liu D.R
2018. Base editing: precision chemistry on the genome and transcriptome of living cells. Nature reviews Genetics 19:770–788
78.
1. Newby G.A.
2. Yen J.S.
3. Woodard K.J.
4. Mayuranathan T.
5. Lazzarotto C.R.
6. Li Y.
7. Sheppard-Tillman H.
8. Porter S.N.
9. Yao Y.
10. Mayberry K.
11. et al.
2021. Base editing of haematopoietic stem cells rescues sickle cell disease in mice. Nature 595:295–302
79.
1. Kingwell K.
2022. Base editors hit the clinic. Nature reviews Drug discovery 21:545–547
80.
1. Eisenstein M
2022. Base editing marches on the clinic. Nature biotechnology 40:623–625
81.
1. McMurray C.T
2010. Mechanisms of trinucleotide repeat instability during human development. Nature reviews Genetics 11:786–799
82.
1. Nelson M.R.
2. Tipney H.
3. Painter J.L.
4. Shen J.
5. Nicoletti P.
6. Shen Y.
7. Floratos A.
8. Sham P.C.
9. Li M.J.
10. Wang J.
11. et al.
2015. The support of human genetic evidence for approved drug indications. Nature genetics 47:856–860
83.
1. Roche Group Media Relations
2021. Roche provides update on tominersen programme in manifest Huntington’s disease
84.
1. Sheridan C
2021. Questions swirl around failures of disease-modifying Huntington’s drugs. Nature biotechnology 39:650–652
85.
1. Shin J.W.
2. Kim K.H.
3. Chao M.J.
4. Atwal R.S.
5. Gillis T.
6. MacDonald M.E.
7. Gusella J.F.
8. Lee J.M
2016. Permanent inactivation of Huntington’s disease mutation by personalized allele-specific CRISPR/Cas9. Human molecular genetics
86.
1. Monteys A.M.
2. Ebanks S.A.
3. Keiser M.S.
4. Davidson B.L
2017. CRISPR/Cas9 Editing of the Mutant Huntingtin Allele In Vitro and In Vivo. Molecular therapy: the journal of the American Society of Gene Therapy 25:12–23
87.
1. Kuscu C.
2. Parlak M.
3. Tufan T.
4. Yang J.
5. Szlachta K.
6. Wei X.
7. Mammadov R.
8. Adli M
2017. CRISPR-STOP: gene silencing through base-editing-induced nonsense mutations. Nature methods 14:710–712
88.
1. Kim K.
2. Ryu S.M.
3. Kim S.T.
4. Baek G.
5. Kim D.
6. Lim K.
7. Chung E.
8. Kim S.
9. Kim J.S
2017. Highly efficient RNA-guided base editing in mouse embryos. Nature biotechnology 35:435–437
89.
1. Yang S.
2. Yang H.
3. Huang L.
4. Chen L.
5. Qin Z.
6. Li S.
7. Li X.J
2020. Lack of RAN-mediated toxicity in Huntington’s disease knock-in mice. Proceedings of the National Academy of Sciences of the United States of America 117:4411–4417
90.
1. Neueder A.
2. Landles C.
3. Ghosh R.
4. Howland D.
5. Myers R.H.
6. Faull R.L.M.
7. Tabrizi S.J.
8. Bates G.P
2017. The pathogenic exon 1 HTT protein is produced by incomplete splicing in Huntington’s disease patients. Scientific reports 7:1307
91.
1. Banez-Coronel M.
2. Ayhan F.
3. Tarabochia A.D.
4. Zu T.
5. Perez B.A.
6. Tusi S.K.
7. Pletnikova O.
8. Borchelt D.R.
9. Ross C.A.
10. Margolis R.L.
11. et al.
2015. RAN Translation in Huntington Disease. Neuron 88:667–677
92.
1. Sathasivam K.
2. Neueder A.
3. Gipson T.A.
4. Landles C.
5. Benjamin A.C.
6. Bondulich M.K.
7. Smith D.L.
8. Faull R.L.
9. Roos R.A.
10. Howland D.
11. et al.
2013. Aberrant splicing of HTT generates the pathogenic exon 1 protein in Huntington disease. Proceedings of the National Academy of Sciences of the United States of America 110:2366–2370
93.
1. Paulsen J.S.
2. Langbehn D.R.
3. Stout J.C.
4. Aylward E.
5. Ross C.A.
6. Nance M.
7. Guttman M.
8. Johnson S.
9. MacDonald M.
10. Beglinger L.J.
11. et al.
2008. Detection of Huntington’s disease decades before diagnosis: the Predict-HD study. Journal of neurology, neurosurgery, and psychiatry 79:874–880
94.
1. Lopes F.
2. Barbosa M.
3. Ameur A.
4. Soares G.
5. de Sa J.
6. Dias A.I.
7. Oliveira G.
8. Cabral P.
9. Temudo T.
10. Calado E.
11. et al.
2016. Identification of novel genetic causes of Rett syndrome-like phenotypes. Journal of medical genetics 53:190–199
95.
1. Rodan L.H.
2. Cohen J.
3. Fatemi A.
4. Gillis T.
5. Lucente D.
6. Gusella J.
7. Picker J.D
2016. A novel neurodevelopmental disorder associated with compound heterozygous variants in the huntingtin gene. European journal of human genetics: EJHG 24:1826–1827
96.
1. Dietrich P.
2. Johnson I.M.
3. Alli S.
4. Dragatsis I
2017. Elimination of huntingtin in the adult mouse leads to progressive behavioral deficits, bilateral thalamic calcification, and altered brain iron homeostasis. PLoS genetics 13:e1006846
97.
1. Wang G.
2. Liu X.
3. Gaertig M.A.
4. Li S.
5. Li X.J
2016. Ablation of huntingtin in adult neurons is nondeleterious but its depletion in young mice causes acute pancreatitis. Proceedings of the National Academy of Sciences of the United States of America 113:3359–3364

Article and author information

Author information

Doo Eun Choi
Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA, Department of Neurology, Harvard Medical School, Boston, MA 02115, USA
Jun Wan Shin
Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA, Department of Neurology, Harvard Medical School, Boston, MA 02115, USA
Sophia Zeng
Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA
Eun Pyo Hong
Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA, Department of Neurology, Harvard Medical School, Boston, MA 02115, USA, Medical and Population Genetics Program, The Broad Institute of M.I.T. and Harvard, Cambridge, MA 02142, USA
Jae-Hyun Jang
Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA, Department of Neurology, Harvard Medical School, Boston, MA 02115, USA
Jacob M. Loupe
Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA, Department of Neurology, Harvard Medical School, Boston, MA 02115, USA
Vanessa C. Wheeler
Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA, Department of Neurology, Harvard Medical School, Boston, MA 02115, USA
Hannah E. Stutzman
Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA, Department of Pathology, Massachusetts General Hospital, Boston, MA 02114, USA
Benjamin P. Kleinstiver
Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA, Department of Pathology, Massachusetts General Hospital, Boston, MA 02114, USA, Department of Pathology, Harvard Medical School, Boston, MA 02115, USA
ORCID iD: 0000-0002-5469-0655
Jong-Min Lee
Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA, Department of Neurology, Harvard Medical School, Boston, MA 02115, USA, Medical and Population Genetics Program, The Broad Institute of M.I.T. and Harvard, Cambridge, MA 02142, USA
ORCID iD: 0000-0001-5799-0787
- Corresponding Author: Jong-Min Lee 185 Cambridge Street, Boston, MA 02114, USA Phone: 617-726-9724 Email: jlee51@mgh.harvard.edu

Version history

Preprint posted: April 28, 2023
Sent for peer review: July 14, 2023
Reviewed Preprint version 1: December 19, 2023
Version of Record published: June 13, 2024
Version of Record updated: October 31, 2024

Cite all versions

You can cite all versions using the DOI https://doi.org/10.7554/eLife.89782. This DOI represents all versions, and will always resolve to the latest one.

Copyright

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.
