Effects of CAA interruption on HD age-at-onset.

(A-C) Least-squares approximation was used to estimate the additional effects on age-at-onset of LI alleles (red circles in panels B and C) and DI alleles (green circles in panels C and D). We varied the CAG length assigned to HD participants carrying LI or DI alleles and, for each value, calculated the sum of squares (SS) to identify the CAG repeat length that explained the maximum variance in age-at-onset of these allele carriers. The y-axis and x-axis represent age-at-onset and CAG repeat length, respectively. Grey circles and black trend lines represent HD participants with CR alleles and their onset-CAG relationship, respectively.
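The shift search described for panels A-C can be sketched as follows. This is a minimal illustration, not the authors' code. It assumes that SS is the residual sum of squares between observed onset and the CR trend line evaluated at the shifted CAG length, that the CR trend is log-linear in CAG length (the legend does not state the functional form), and that integer shifts from -5 to +5 are searched. The helper names (`fit_cr_trend`, `best_cag_shift`) and the toy data are hypothetical.

```python
import numpy as np

def fit_cr_trend(cag_cr, onset_cr):
    """Fit log(onset) = a + b * CAG to CR carriers (assumed functional form)."""
    b, a = np.polyfit(cag_cr, np.log(onset_cr), 1)  # polyfit returns [slope, intercept]
    return lambda cag: np.exp(a + b * np.asarray(cag, dtype=float))

def best_cag_shift(cag_var, onset_var, cr_trend, shifts=range(-5, 6)):
    """Shift the CAG length of LI/DI carriers; return the shift with minimal SS."""
    shifts = list(shifts)
    cag_var = np.asarray(cag_var, dtype=float)
    onset_var = np.asarray(onset_var, dtype=float)
    # Residual sum of squares against the CR trend for each candidate shift
    ss = [float(np.sum((onset_var - cr_trend(cag_var + s)) ** 2)) for s in shifts]
    best = int(np.argmin(ss))
    return shifts[best], ss

# Toy, noise-free data (hypothetical, not from the study)
cag_cr = np.arange(40, 50)
onset_cr = np.exp(6.0 - 0.05 * cag_cr)
trend = fit_cr_trend(cag_cr, onset_cr)
cag_di = np.array([41, 42, 43])
onset_di = trend(cag_di - 1)  # toy DI carriers behave like CR alleles 1 CAG shorter
shift, ss = best_cag_shift(cag_di, onset_di, trend)
print(shift)  # -1
```

A negative best shift means the variant carriers behave like CR carriers with fewer CAG repeats, i.e., later onset; a positive shift would indicate earlier onset.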

(D) To illustrate the magnitude of the impact of a therapeutic base editing strategy that converts a CR allele to a DI allele by changing a CAG to CAA, an example CR allele of 43 CAG (45 glutamine) with a mean observed onset of 48 years is displayed. In this example, BE-mediated conversion of the 42nd CAG to CAA would produce a DI allele of 41 CAG (45 glutamine). Given the additional effect of DI alleles in HD patients, a 41 CAG / 45 glutamine DI allele would produce an onset similar to that of a CR allele of 40 CAG / 42 glutamine, with a mean onset age of 60 years. Therefore, CAG-to-CAA conversion in HD subjects carrying a 43-CAG CR allele could delay onset by 12 years.
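The codon arithmetic in this example can be checked with a short sketch. It models only the glutamine-encoding stretch of the allele as (CAG)n CAA CAG and ignores the downstream sequence; that simplification and all function names are assumptions made for illustration.

```python
def cr_allele(n_cag):
    """Glutamine-encoding stretch of a CR allele: (CAG)n CAA CAG (simplified)."""
    return ["CAG"] * n_cag + ["CAA", "CAG"]

def base_edit(codons, k):
    """Convert the k-th codon (1-based) from CAG to CAA."""
    edited = list(codons)
    if edited[k - 1] != "CAG":
        raise ValueError(f"codon {k} is {edited[k - 1]}, not CAG")
    edited[k - 1] = "CAA"
    return edited

def pure_cag_length(codons):
    """Length of the uninterrupted CAG tract at the start of the stretch."""
    n = 0
    for codon in codons:
        if codon != "CAG":
            break
        n += 1
    return n

def glutamine_count(codons):
    """CAG and CAA both encode glutamine."""
    return sum(codon in ("CAG", "CAA") for codon in codons)

cr = cr_allele(43)
di = base_edit(cr, 42)
print(pure_cag_length(cr), glutamine_count(cr))  # 43 45
print(pure_cag_length(di), glutamine_count(di))  # 41 45
print(di[41:])  # ['CAA', 'CAG', 'CAA', 'CAG']
```

The edit shortens the pure CAG tract by two while leaving the glutamine count unchanged, and the resulting 3' end (CAA CAG CAA CAG) is the duplicated-interruption structure.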

Cytosine base editors and gRNAs for CAG-to-CAA conversion in HD.

(A) Components of the base editing system are shown.

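(B) Schematic of cytosine base editors (CBEs), which generate C-to-T edits within a limited editing window at a fixed distance from the PAM. CAG-to-CAA conversion on the sense strand therefore corresponds to a C-to-T edit of the complementary CTG on the antisense strand (CTG to TTG).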
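The relationship between an antisense-strand C-to-T edit and sense-strand CAG-to-CAA conversion can be sketched as below. The editing window of protospacer positions 4-8 (counting from the PAM-distal end) is an assumed, idealized window for illustration; real windows vary by editor and sequence context. The protospacer and helper names are hypothetical, and every C in the window is edited, which also shows how neighbouring CAGs can be converted.

```python
# Complement table for DNA bases
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def revcomp(seq):
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]

def editable_positions(protospacer, window=(4, 8)):
    """1-based protospacer positions of C bases inside the assumed edit window."""
    lo, hi = window
    return [i for i, base in enumerate(protospacer, start=1)
            if base == "C" and lo <= i <= hi]

def apply_cbe(protospacer, window=(4, 8)):
    """Idealized CBE outcome: every C inside the window becomes T."""
    bases = list(protospacer)
    for i in editable_positions(protospacer, window):
        bases[i - 1] = "T"
    return "".join(bases)

# Hypothetical 20-nt protospacer on the antisense strand (CTG is the revcomp of CAG)
antisense = "CTG" * 6 + "CT"
edited = apply_cbe(antisense)
print(editable_positions(antisense))  # [4, 7]
print(revcomp(antisense))  # AGCAGCAGCAGCAGCAGCAG
print(revcomp(edited))     # AGCAGCAGCAGCAACAACAG
```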

(C) CBE variants described in the literature are shown (the lower four were used in this study), including the evoCDA1-based SpG CBE, which is expected to function more efficiently in GC nucleotide contexts. Protospacer-adjacent motif (PAM), guide RNA (gRNA), uracil glycosylase inhibitor (UGI), rat APOBEC1 deaminase domain (rAPO1), evolved CDA1 cytosine deaminase domain (evoCDA).

(D) The target region and the expected hybridization sites of the 8 gRNAs are shown.

Levels of CAG-to-CAA conversion by BE strategies.

Only CAG-to-CAA conversion showed levels significantly above baseline sequencing errors. We therefore calculated the percentage of CAA in cells treated with each combination of cytosine base editor (A, BE4max; B, BE4-NG; C, BE4-SpG; and D, evo-SpG) and gRNA. Data from untreated HEK293 cells (labeled Cell) were pooled (n=8) and plotted for each base editor. EV represents HEK293 cells treated with a base editor and an empty gRNA vector. *, significant at Bonferroni-corrected p-value < 0.05 (8 tests for each base editor).
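The Bonferroni correction over 8 tests per base editor can be sketched as follows: each raw p-value is multiplied by the number of tests and capped at 1, which is equivalent to requiring a raw p-value below 0.05/8 = 0.00625. The raw p-values below are hypothetical placeholders, and the legend does not state which test produced them.

```python
def bonferroni(pvalues):
    """Bonferroni-adjusted p-values: multiply by the number of tests, cap at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]

# Hypothetical raw p-values for the 8 gRNAs tested with one base editor
raw = [0.001, 0.004, 0.007, 0.02, 0.3, 0.5, 0.9, 0.06]
adjusted = bonferroni(raw)
significant = [p < 0.05 for p in adjusted]
print(significant)  # [True, True, False, False, False, False, False, False]
```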

Sites of CAG-to-CAA conversion by BE strategies.

For each CAG position, we calculated the percentage of all sequence reads that carry CAA at that position. For example, 27.7% conversion at the 2nd CAG by BE4max-gRNA 1 (top left panel, red) means that 27.7% of all sequence reads from the 16- and 17-CAG alleles have CAA at the 2nd CAG. The x-axis and y-axis represent CAG position and percent conversion, respectively. Each panel represents one tested gRNA. Plots show the mean of 3 independent transfection experiments in HEK293 cells after subtraction of the corresponding empty vector (EV)-treated cell data. Red, blue, purple, and cyan traces represent BE4max, BE4-NG, BE4-SpG, and evo-SpG, respectively.
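The per-site calculation can be sketched as follows, assuming reads have already been trimmed so that they start at the first CAG of the repeat (an assumption; real reads carry flanking sequence). The denominator is all reads, as described above, so positions beyond the end of a shorter allele simply count as unconverted. Averaging replicates before subtracting the EV mean is also an assumption; the function names and reads are hypothetical.

```python
import numpy as np

def site_conversion(reads, n_positions):
    """Percent of all reads carrying CAA at each CAG position 1..n_positions."""
    counts = np.zeros(n_positions)
    for read in reads:
        for k in range(n_positions):
            if read[3 * k:3 * k + 3] == "CAA":
                counts[k] += 1
    return 100.0 * counts / len(reads)

def ev_corrected_profile(treated_replicates, ev_replicates, n_positions):
    """Mean treated profile minus mean EV profile across replicate transfections."""
    treated = np.mean([site_conversion(r, n_positions) for r in treated_replicates], axis=0)
    ev = np.mean([site_conversion(r, n_positions) for r in ev_replicates], axis=0)
    return treated - ev

# Hypothetical trimmed reads covering 3 CAG positions
reads = ["CAGCAACAG", "CAGCAGCAG", "CAGCAACAG", "CAGCAGCAG"]
print(site_conversion(reads, 3).tolist())  # [0.0, 50.0, 0.0]
ev_reads = ["CAGCAGCAG"] * 4
profile = ev_corrected_profile([reads] * 3, [ev_reads] * 3, 3)
```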

Allele specificity and molecular outcomes of candidate BE strategies.

(A) To overcome the limitations of patient-derived iPSCs and differentiated neurons, we generated HEK293 cells carrying an adult-onset CAG repeat by replacing one of the normal repeats with 51 canonical CAG repeats (HEK293-51 CAG). Red and green bars represent mutant and normal HTT, respectively, in HEK293-51 CAG cells.

(B and C) The HEK293-51 CAG cells were treated with BE4max-gRNA 1 and analyzed to determine the levels of in-frame insertion (B) and in-frame deletion (C) at the time of treatment.

(D) HEK293-51 CAG cells were treated with gRNA 1 and analyzed by MiSeq sequencing to determine allele specificity. Conversion efficiency on the y-axis indicates the percentage of sequence reads containing the CAG-to-CAA conversion at the target site. *, uncorrected p-value < 0.05 by Student's t-test.
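Allele-specific conversion can be sketched by assigning each read to the mutant or normal allele by the length of its glutamine-codon run and then computing conversion separately for each allele. The cutoff of 36 glutamine codons, the trimmed read structure, and the 1-based target position counted from the first CAG are all assumptions for illustration; the reads and names are hypothetical.

```python
GLN = ("CAG", "CAA")

def gln_codon_run(read):
    """Number of consecutive glutamine codons (CAG or CAA) from the read start."""
    n = 0
    for i in range(0, len(read) - 2, 3):
        if read[i:i + 3] not in GLN:
            break
        n += 1
    return n

def conversion_by_allele(reads, position, threshold=36):
    """Percent of reads with CAA at `position` (1-based), split by allele."""
    hits = {"mutant": 0, "normal": 0}
    totals = {"mutant": 0, "normal": 0}
    for read in reads:
        allele = "mutant" if gln_codon_run(read) >= threshold else "normal"
        totals[allele] += 1
        if read[3 * (position - 1):3 * position] == "CAA":
            hits[allele] += 1
    return {a: 100.0 * hits[a] / totals[a] if totals[a] else float("nan")
            for a in totals}

# Hypothetical trimmed reads: 51-CAG mutant (one edited at position 2) and 17-CAG normal
mutant = "CAG" * 51 + "CAACAG" + "CCG"
mutant_edited = "CAG" + "CAA" + "CAG" * 49 + "CAACAG" + "CCG"
normal = "CAG" * 17 + "CAACAG" + "CCG"
result = conversion_by_allele([mutant, mutant_edited, normal, normal], position=2)
print(result)  # {'mutant': 50.0, 'normal': 0.0}
```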

(E) Parental HEK293 cells and HEK293-51 CAG cells were treated with empty vector (EV) or the candidate BE strategies (BE4max-gRNA 1 and BE4max-gRNA 2) and subjected to immunoblot analysis; a representative blot is shown.

(F) Across four independent experiments, a one-sample t-test was used to determine whether BE-treated cells differed from EV-treated cells in total HTT protein levels. No comparison was significant at p-value < 0.05.
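A one-sample t-test of this kind can be sketched as follows, assuming that HTT levels in BE-treated cells are expressed as ratios to EV-treated cells and tested against 1 (the legend does not specify the normalization). With four experiments there are 3 degrees of freedom, and the two-sided critical value at alpha = 0.05 is about 3.182. The ratios are hypothetical.

```python
import math
import statistics

# Two-sided critical t value for alpha = 0.05 with df = 3 (n = 4 experiments)
T_CRIT_DF3 = 3.182

def one_sample_t(values, mu0=1.0):
    """t statistic testing whether the mean of `values` differs from mu0."""
    n = len(values)
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1 denominator)
    return (mean - mu0) / (sd / math.sqrt(n))

# Hypothetical HTT levels of BE-treated cells, normalized to EV-treated cells
ratios = [0.95, 1.05, 1.02, 0.98]
t = one_sample_t(ratios)
print(abs(t) > T_CRIT_DF3)  # False: not significant
```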

RNAseq analysis of BE strategies confirms the lack of transcriptome alteration.

(A) HEK293 cells were treated with empty vector (EV) or the candidate BE strategies BE4max-gRNA 1 (gRNA 1) and BE4max-gRNA 2 (gRNA 2) for RNAseq analysis. MiSeq analysis was also performed to measure the levels of CAG-to-CAA conversion. ****, p-value < 0.0001 by Student's t-test (n=4).

(B) To confirm the lack of significantly altered genes in BE4max-gRNA 1- or BE4max-gRNA 2-treated cells, we compared all BE-treated samples (n=8) with all EV-treated samples (n=4) to increase the power of the RNAseq differential gene expression analysis. Each circle in the volcano plot represents a gene analyzed by RNAseq; HTT is indicated by a filled red circle. The red horizontal line marks a false discovery rate of 0.05, showing that no gene was significantly altered by the candidate BE strategies.
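The legend does not name the differential expression tool or its FDR procedure. As an assumption, FDR control is illustrated here with the Benjamini-Hochberg procedure, a common choice. The example shows how a raw p-value below 0.05 can fail to reach significance after adjustment; the p-values are hypothetical.

```python
import numpy as np

def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (q-values), returned in input order."""
    p = np.asarray(pvalues, dtype=float)
    m = p.size
    order = np.argsort(p)
    ranked = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity: each value is the minimum over itself and all larger ranks
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.minimum(ranked, 1.0)
    return adjusted

# Hypothetical raw p-values for four genes
adj = benjamini_hochberg([0.2, 0.5, 0.04, 0.9])
print(np.round(adj, 3).tolist())  # [0.4, 0.667, 0.16, 0.9]
print(int(np.sum(adj < 0.05)))    # 0 genes pass FDR < 0.05
```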

(C) To show the expected shape of a volcano plot when no genes are significant, we also compared two groups of randomly assigned samples (6 samples vs. 6 samples).
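Random assignment of the 12 samples into two groups of 6, regardless of treatment, can be sketched as below. The sample names and the fixed seed are hypothetical; any differential expression analysis of the two resulting groups would then serve as a null comparison.

```python
import numpy as np

def random_split(sample_ids, n_per_group, seed=0):
    """Randomly assign samples to two groups of size n_per_group, ignoring treatment."""
    rng = np.random.default_rng(seed)
    shuffled = [str(s) for s in rng.permutation(sample_ids)]
    return shuffled[:n_per_group], shuffled[n_per_group:2 * n_per_group]

# Hypothetical sample names: 4 EV, 4 BE4max-gRNA 1, and 4 BE4max-gRNA 2 samples
samples = [f"{group}_{i}" for group in ("EV", "gRNA1", "gRNA2") for i in range(1, 5)]
group_a, group_b = random_split(samples, 6)
print(group_a)
print(group_b)
```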

Impacts of CAA interruption on CAG repeat instability.

(A-C) Liver and tail DNA samples from BE-treated mice were analyzed to quantify somatic repeat expansion. We performed linear regression analysis to model the level of repeat expansion as a function of treatment together with tail CAG repeat length (A), age (B), and other covariates (experimental batch, sex, tail CAG length, and age). The statistical analysis is summarized in panel C.
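A regression of this form can be sketched with ordinary least squares via `numpy.linalg.lstsq`. The sketch assumes sex and experimental batch are coded as single 0/1 indicators (i.e., two batches), and it returns coefficients only, omitting the standard errors and p-values a full analysis would report. The toy data are noise-free and hypothetical, so the fit recovers the coefficients used to generate them.

```python
import numpy as np

COVARIATES = ["intercept", "treatment", "tail_cag", "age", "sex", "batch"]

def fit_expansion_model(expansion, treated, tail_cag, age, sex, batch):
    """OLS fit: expansion ~ treatment + tail CAG + age + sex + batch."""
    X = np.column_stack([np.ones(len(expansion)), treated, tail_cag,
                         age, sex, batch]).astype(float)
    coef, *_ = np.linalg.lstsq(X, np.asarray(expansion, dtype=float), rcond=None)
    return dict(zip(COVARIATES, coef))

# Toy data for 8 hypothetical mice (not from the study)
treated = np.array([0, 1, 0, 1, 0, 1, 0, 1])
tail_cag = np.array([110, 112, 115, 108, 111, 113, 109, 114])
age = np.array([3, 3, 4, 4, 5, 5, 6, 6])   # months
sex = np.array([0, 0, 1, 1, 0, 1, 1, 0])   # arbitrary 0/1 coding
batch = np.array([0, 0, 0, 0, 1, 1, 1, 1])  # assumes two batches
expansion = 2.0 - 1.5 * treated + 0.1 * tail_cag + 0.8 * age + 0.3 * sex + 0.5 * batch
coefs = fit_expansion_model(expansion, treated, tail_cag, age, sex, batch)
print(round(coefs["treatment"], 6))  # -1.5
```

The treatment coefficient estimates the change in repeat expansion attributable to BE treatment while holding the other covariates fixed.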

(D and E) To determine the maximal impact of CAA interruption on repeat expansion, HD knock-in mice carrying CAA-interrupted repeats were analyzed. Liver samples from mice carrying 105 uninterrupted CAG repeats (D) or interrupted repeats (E) were analyzed at 5 months of age. Representative fragment analysis traces are displayed. Red arrows indicate the modal alleles, representing the inherited CAG repeats; peaks to the right of the modal peaks represent expanded repeats.
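The legend does not state how expansion is quantified from fragment analysis traces. One common summary, assumed here, is an expansion index: peak heights are normalized to the total of all peaks above a relative height threshold (here 10% of the modal peak), each peak to the right of the modal peak is weighted by its distance from the mode in repeat units, and the weighted values are summed. The peak data are hypothetical.

```python
def expansion_index(peaks, threshold=0.10):
    """Expansion index from a dict mapping repeat length -> peak height."""
    modal = max(peaks, key=peaks.get)  # inherited (modal) allele
    cutoff = threshold * peaks[modal]
    kept = {r: h for r, h in peaks.items() if h >= cutoff}  # drop low peaks
    total = sum(kept.values())
    # Weight each expanded peak by its distance from the modal allele
    return sum(h * (r - modal) for r, h in kept.items() if r > modal) / total

# Hypothetical trace: modal peak at 105 repeats with expansion peaks to its right
peaks = {104: 30, 105: 100, 106: 40, 107: 20, 108: 5}
print(round(expansion_index(peaks), 4))  # 0.4211
```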