Propensity for somatic expansion increases over the course of life in Huntington disease
Abstract
Recent work on Huntington disease (HD) suggests that somatic instability of CAG repeat tracts, which can expand into the hundreds in neurons, explains clinical outcomes better than the length of the inherited allele. Here, we measured somatic expansion in blood samples collected from the same 50 HD mutation carriers over a twenty-year period, along with post-mortem tissue from 15 adults and 7 fetal mutation carriers, to examine somatic expansions at different stages of life. Post-mortem brains, as previously reported, had the greatest expansions, but fetal cortex had virtually none. Somatic instability in blood increased with age, despite blood cells being short-lived compared to neurons, and was driven mostly by CAG repeat length, then by age at sampling and by interaction between these two variables. Expansion rates were higher in symptomatic subjects. These data lend support to a previously proposed computational model of somatic instability-driven disease.
Introduction
Most mutations are stably transmitted from parent to offspring. This reliable genetic principle does not hold, however, for dynamic mutation disorders such as Fragile X syndrome or Huntington disease (HD). In these diseases, a sequence such as a CAG repeat tract can expand during transmission, likely through mechanisms involving replication or transcription (Khristich and Mirkin, 2020). In general, the longer the repeat, the earlier the patient develops overt symptoms and the more aggressive the disease is likely to be (Koshy and Zoghbi, 1997). Thus, in HD, modest expansions of 40 repeats in the huntingtin gene (HTT) are associated with the appearance of motor, cognitive, and psychiatric disturbances in mid- or late adulthood, whereas large expansions of over 80 repeats cause childhood onset with additional features such as epilepsy and a more rapidly fatal course (Bates et al., 2015b; Sun et al., 2017). Yet on an individual subject basis, we cannot predict the disease course just from the size of the repeat tract: two individuals with the same length repeat expansion in HTT may experience disease onset decades apart (Andrew et al., 1993). The inherited pathological CAG repeat size accounts for about 42–71% of the variance in age at onset in HD (Squitieri et al., 2006), though the confidence limits narrow for tracts longer than 50 CAGs (Andrew et al., 1993; Bates et al., 2015b; Langbehn et al., 2004; Rubinsztein et al., 1997; Wexler et al., 2004).
Part of the reason for such variability could be that HD is still thought of primarily as a movement disorder, so age at onset in HD is typically defined as the point at which motor symptoms become unequivocal. But abnormalities in the brain are present from early development (Barnat et al., 2020), and mutation carriers may experience cognitive deficits, psychiatric disturbances, or even subtle motor impairments years before diagnosis (Bates et al., 2015b). It is challenging, and somewhat misleading, to pinpoint age at onset in a disease that evolves insidiously like HD.
A more interesting explanation takes into account the fact that CAG repeats do not just expand in the germline. They are also somatically unstable, such that different CAG expansions can be identified in the same sample tissue from various organs and brain regions. Somatic mosaicism occurs both in mouse models of HD (Kennedy and Shelbourne, 2000; Larson et al., 2015; Lee et al., 2010) and in humans with HD (Kennedy et al., 2003; Swami et al., 2009; Telenius et al., 1994). The greatest increases in CAG tract length have been observed in the brain regions most affected in HD, the cerebral cortex and the striatum, whose neurons can harbor repeat tract expansions in the hundreds (Gonitel et al., 2008; Kennedy et al., 2003; Møllersen et al., 2010; Mouro Pinto et al., 2020; Shelbourne et al., 2007). Repeat expansions likely result from the formation of unusual DNA structures that predispose the tract to errors in mismatch repair (Khristich and Mirkin, 2020; Tabrizi et al., 2020). In fact, variants in several different DNA repair genes are associated with somatic instability in both animal models of HD (Dragileva et al., 2009; Pinto et al., 2013; Tomé et al., 2013) and HD patients (Ciosi et al., 2019; Flower et al., 2019; Lee et al., 2019).
Mounting evidence suggests that somatic repeat lengths better explain age at onset than the germline repeat, as their propensity to expand relates to both the baseline allele length and age (Ciosi et al., 2019; Lee et al., 2019). Interestingly, these data lend support to a mathematical model put forth over a decade ago (Kaplan et al., 2007). In brief, Kaplan et al. proposed that the onset and progression of triplet repeat diseases, including HD, are determined by the rate of somatic expansion in disease-relevant cells. Symptoms manifest when a critical proportion of cells (say, 20%) pass a pathogenic threshold, which would differ for different cell types. Their modeling suggests the threshold in striatal neurons for HD would be ~115 repeats. They further posited that at birth, nearly all cells would carry just the inherited number of repeats, but that over time the mutant alleles would further expand at a rate that increased linearly with the number of repeats. The rate of expansion would thus determine how rapidly the pathological state is reached, and thus should influence disease onset and progression.
The Kaplan model is quite compelling, but to test its predictions requires longitudinal data to study the evolution of somatic instability over time within patients. Given that the HTT mutation was discovered less than thirty years ago, such a study is only now becoming feasible. Even so, there are limits to how much of the model can be tested in humans. We cannot, for example, sample neurons over the life span to see how many come to exceed 115 repeats, or tally the proportion of neurons that reach a pathogenic threshold before phenoconversion. Nevertheless, we have been able to measure somatic repeat expansions in blood samples from HD carriers and patients over a twenty-year period and examine cortical tissue from mutation-carrying fetuses and deceased adults. By characterizing the degree of somatic expansion at these different stages, we were able to analyze associations between changes in the somatic expansion, age, and inherited CAG repeat length.
Results
Determination of somatic expansion index in HD carriers
We collected biological samples from 72 HD mutation carriers across the life span: 7 fetuses, 50 adults, and 15 post-mortem brains (see Tables 1, 2 and 3). For all samples, we calculated an expansion index (EI) based on a specific PCR followed by fragment sizing to identify the peaks corresponding to different numbers of CAG repeats, or (CAG)n (Lee et al., 2010; Mouro Pinto et al., 2020). The expanded allele has a characteristic PCR profile with one particularly prominent peak, which provides the CAG repeat size given for diagnosis (Figure 1—figure supplement 1A, see 'Materials and methods'). This ‘reference peak’ is flanked by additional peaks that reveal the various repeat lengths in a given tissue, which we refer to as mosaicism or somatic instability. The fluorescence intensity of each peak reflects the proportion of cells bearing each somatic expansion, but it is worth noting that PCR is biased toward alleles containing smaller repeats. Because peaks to the left of the reference peak can be generated by polymerase slippage during PCR, we used only those to the right of the main peak to calculate the EI (Figure 1—figure supplement 1B). We normalized the heights of the somatic expansion peaks to the height of the reference peak, excluding any that were less than 3% of the main peak height. We multiplied each peak’s height by its position to account for the increased repeat length, then summed the peak heights for each sample. An EI of 0 indicates no expansion beyond the inherited allele, and an index >0 indicates mosaicism of the CAG repeat expansion in the tissue.
The CAG expansion is usually followed by a CAACAG cassette that can be duplicated or, in some cases, deleted (Ciosi et al., 2019). The reference sequence NG_009378.1 contains 21 CAG repeats followed by the cassette CAACAGCCGCCA, seven CCG repeats, and two CCT repeats. When the cassette is changed to CAGCAGCCGCCA by loss of the CAA interruption (Wright et al., 2019), the tract becomes less stable and more prone to expansion (Khristich and Mirkin, 2020; Rolfsmeier et al., 2000; Xu et al., 2020). We did not detect this variant in our samples. We excluded one patient from the original cohort who had an additional CAA interruption in the CAG expansion.
The somatic expansion index increases over the life span in both blood and brain samples
Because of a long period of clinical prospective follow-up of HD patients at the Pitié-Salpêtrière Hospital, we were able to analyze blood samples that were collected during clinical visits at different ages for 50 HD patients (31 women, 19 men; mean reference (CAG)n 44.6 ± 3.5 [range 39–54]) (Table 2). With up to three samples (n = 50 for t1 and t2, n = 12 for t3), taken on average 12 and 7 years apart, respectively, we could analyze the progression of somatic instability over quite a long period of time. The EI increased over time (Figure 1A), with the aggregate EI increasing from t1 (0.620 ± 0.655) to t2 (0.881 ± 0.929) to t3 (0.967 ± 0.841) (Table 2). Regression and Pearson’s correlation showed a significant linear relationship between EI and reference (CAG)n in the blood at t1 (r = 0.816, slope = 0.155, p=5.0e-13), t2 (r = 0.880, slope = 0.237, p<2.2e-16), and t3 (r = 0.901, slope = 0.203, p=6.3e-5) (Figure 1B, left). It is interesting to note that in our cohort, the lowest index value associated with a symptomatic subject was 0.137; this patient had a reference repeat of 39 CAGs and showed overt motor signs at the age of 49 (Table 2).
We were able to evaluate cortices from a separate group of 15 deceased patients (Figure 1B, right). As expected from previous studies (Shelbourne et al., 2007; Telenius et al., 1994), these tissues had the highest EI in our cohort (3.361 ± 2.390, range: 1.288 to 9.094) (Table 3), which correlated with the CAG repeat length (r = 0.615, slope = 0.492, p=0.015). We also had a post-mortem brain from a juvenile-onset case with a reference CAG repeat size of 128. The extreme mosaicism in this tissue, however, made it difficult to determine a main CAG peak or calculate an EI using the PCR profile, so we did not include it in our analyses (Figure 1—figure supplement 1C).
Because severe neuronal loss could skew the detection of expansions (Mouro Pinto et al., 2020), we were particularly interested in examining brain tissue from early development. We analyzed fetal cortical samples from seven HD gene carriers at 13 weeks’ gestation (CAG: 40–46, Table 1; Figure 1B,C; Barnat et al., 2020). Although the adult HD cortex has been consistently found to bear the greatest somatic expansions, the fetal cortex showed almost no mosaicism: the somatic EIs were very small, ranging from 0.043 to 0.060 (0.050 ± 0.006), though they still correlated with CAG repeat length (p=0.023) (Table 1). These indices were extremely close to those from trophoblast tissues that were analyzed for prenatal diagnosis between 11 and 12 weeks' gestation (Figure 1C, left). Yet blood samples taken from their premanifest carrier parents at the same time (n = 6, CAG: 42.8 ± 2.5, 40–45; Table 1; these adults were not part of the longitudinal cohort) showed somatic expansions, with a mean EI of 0.256 ± 0.11 (range: 0.107 to 0.369; Figure 1C, left).
To better visualize these differences between parental blood and fetal tissue, we graphed somatic mosaicism in fetal cortices, trophoblasts, and premanifest parents for four different reference CAG lengths and estimated the percent of mutant alleles harboring each somatic expansion length (Figure 1C, right). There is clearly more variability in the parental blood (dark orange bars) than in the fetal brain tissue (green bars). Similarly, comparison of somatic mosaicism in three of the fetal brains, the blood samples (across three timepoints) from three patients in our longitudinal cohort, and three adult post-mortem cortices (Figure 2) clearly shows that mosaicism increased over time in blood cells but was even more marked in the adult brain, with more additional CAGs for a given reference CAG repeat size.
Determination of somatic expansion rate
We next asked whether the propensity to expand grows over time, and whether an ‘expansion rate’ (ER) that estimates the average annual expansion growth for each patient would correlate with the available clinical outcomes. To this end, we first ruled out the possibility of a sex effect by verifying that there was no sex difference in age at onset (AO) (female: n = 31, 41.9 ± 8.5 [range 25–61]; male: n = 19, 42.4 ± 14.3 [range 25–80]; Wilcoxon rank-sum test, p=0.899) or in age at death (AD) (female: n = 7, 59.4 ± 9.9 [range 49–77]; male: n = 7, 65.7 ± 16.7 [range 39–91]; Wilcoxon rank-sum test, p=0.442).
We then calculated an ER for each of the 50 subjects using the slope of the regression line for the EI on ages at visits (0.024 ± 0.033 units per year [range −0.0003 to 0.1367], Table 2). Because calculating a rate entails having a baseline, we chose to extrapolate a plausible, if theoretical, EI at AO (EI-AO). To do so we used the slope and the intercept (estimated EI at birth) for each patient to estimate EI-AO (see 'Materials and methods'). A Pearson’s correlation coefficient of r = 0.861 (p=2.1e-15) showed a strong association between the reference CAG repeat size and the somatic ER (Figure 3—figure supplement 1). Also, a Pearson’s correlation coefficient of r = 0.847 (p=1.6e-14) showed a strong association between the reference CAG repeat size and EI-AO (Figure 3—figure supplement 1).
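As an illustration of the calculation described above, the following minimal sketch fits an ordinary least-squares line to one subject's EI values against age at each visit, takes the slope as the ER, and evaluates the fitted line at AO to obtain EI-AO. The function names and the example values are hypothetical and are not taken from the study data or code.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of ys on xs; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def expansion_rate_and_ei_at_onset(ages, eis, age_at_onset):
    """Return (ER, EI-AO) for one subject.

    ER is the slope of EI regressed on age at visit (EI units per year).
    EI-AO is the fitted line evaluated at the age at onset; the intercept
    plays the role of the estimated EI at birth.
    """
    slope, intercept = fit_line(ages, eis)
    return slope, intercept + slope * age_at_onset


# Hypothetical subject sampled at ages 40 and 50, with onset at 45.
er, ei_ao = expansion_rate_and_ei_at_onset([40, 50], [0.5, 0.8], 45)
print(f"ER = {er:.3f} per year, EI-AO = {ei_ao:.3f}")
```

With only two or three visits per subject, the slope is sensitive to measurement noise in each EI, so the values are best read as rough per-subject estimates.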
EI and ER correlations with age at onset, age at death, and disease manifest status
To determine whether EI or ER could explain the variation in AO not explained by the reference repeat, we first needed to calculate how (CAG)n correlates with AO in our sample. In our longitudinal cohort of 50 subjects, (CAG)n accounted for 47.6% of variance in AO, which is at the low end of the published ranges (~42–71%) (Squitieri et al., 2006; Figure 3A, left). This is likely due to our small sample relative to many such studies, which can include hundreds to thousands of patients. CAG repeat length accounted for 68% of variance in age at death (AD) (Figure 3A, right). Nevertheless, we proceeded to analyze the relationships between EI-AO, ER, AO, and AD. EI-AO had an inverse correlation with AO (r = −0.437, p=1.7e-03) and AD (r = −0.666, p=9.3e-03) (Figure 3—figure supplement 1) and accounted for 20.7% of the variance in AO and 49.7% of variance in AD from the longitudinal group (n = 14 patients who died during the study) (Figure 3B). ER had an inverse correlation with AO (r = −0.541, p=5.9e-05) and AD (r = −0.261, p=3.7e-01) (Figure 3—figure supplement 1); it accounted for 33% of the variance in AO and did not account for the variance in AD from the longitudinal group (Figure 3C). Notably, ER explained a larger proportion of AO variance than EI-AO. EI-AO accounted for more of the variance in AD than did ER, but it is difficult to draw conclusions based on the small sample of patients for the AD data.
We then took an alternative approach to understanding variation in AO. We classified individuals into three groups indicating expected AO, earlier- or later-than-expected AO, as defined by the model errors in the linear regression of AO and reference CAG repeat size (Figure 3A, left; see 'Materials and methods'). Neither EI-AO nor ER accounted for the differences in AO among these groups, despite a trend for lower ER in the later-than-expected group (Kruskal-Wallis test, rate: p=0.181, EI-AO: p=0.810, Figure 4A). Given the difficulties inherent in pinpointing AO, we asked whether we could see an influence of residual ER on the more general classification of premanifest vs manifest. Here we found significant differences between groups in both residual EI and residual ER (Wilcoxon test, p=3.5e-05 and p=0.023 respectively, Figure 4B).
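The grouping by model error can be sketched as follows. This is a hedged illustration only: the cutoff separating 'expected' from 'earlier' or 'later' onset is not stated in this section, so the sketch assumes a cutoff of one standard deviation of the residuals (the parameter `k`), and all names and example values are hypothetical.

```python
from statistics import mean, pstdev


def classify_onset(cags, onsets, k=1.0):
    """Label each subject by the residual of AO regressed on CAG length.

    ASSUMPTION: a residual beyond k standard deviations of all residuals
    marks earlier- or later-than-expected onset; the study's actual
    cutoff may differ.
    """
    mx, my = mean(cags), mean(onsets)
    slope = (sum((x - mx) * (y - my) for x, y in zip(cags, onsets))
             / sum((x - mx) ** 2 for x in cags))
    intercept = my - slope * mx
    # Residual = observed AO minus AO predicted from CAG length.
    residuals = [y - (intercept + slope * x) for x, y in zip(cags, onsets)]
    cutoff = k * pstdev(residuals)
    labels = []
    for r in residuals:
        if r < -cutoff:
            labels.append("earlier")
        elif r > cutoff:
            labels.append("later")
        else:
            labels.append("expected")
    return residuals, labels


# Hypothetical cohort: three subjects at 40 CAGs and three at 45 CAGs.
residuals, labels = classify_onset([40, 40, 40, 45, 45, 45],
                                   [50, 60, 70, 30, 40, 50])
print(labels)
```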
We next asked whether we could find correlations based on the post-mortem cortices (Table 3). A previous study on post-mortem HD brains showed that, after accounting for reference CAG repeat size, greater somatic expansions in the cortex correlated significantly with earlier AO (Swami et al., 2009). Since we did not have information on AO for the 14 subjects in the post-mortem group, we asked whether EI (from the postmortem samples) or ER (from the blood sample group) correlated with residual AD. We calculated the residual AD after accounting for the effect of the reference CAG repeat length, compared to the ER derived from the blood measures (p=0.028, R2adj = 28.6%; Figure 4—figure supplement 1A) or to the EI derived from the postmortem cohort (p=0.578, R2adj < 0, Figure 4—figure supplement 1B). With the caveat that we do not know the cause of death in all cases (which could be due to causes other than HD), EI from brain samples did not correlate with the residual AD, but ER from blood samples correlated weakly with residual AD. A larger sample would likely reveal stronger correlations.
Within-subject variation in somatic mosaicism depends on (CAG)n, age, and the interaction of these two variables
We next sought to understand the relative contributions of the reference repeat size and age on the tendency toward somatic expansions. To account for the repeated measurements for each patient, a linear mixed-effects model (LMM) was fitted to the EI data on a log scale. Based on the fixed effects of the derived model, we found significant effects of age (coefficient = 0.028, SE = 0.001, p=3.4e-34) and number of CAG repeats (coefficient = 0.276, SE = 0.012, p=1.6e-30). In addition, the significant interaction between age and CAG repeat length suggests that, as CAG repeat length increases, the expansions become greater each year (coefficient = 0.002, SE = 3.8e-4, p=2.2e-7) (Table 4). As both age and CAG were mean-centered in the model, the exponentiated intercept indicates a predicted EI of 0.547 (intercept = −0.603, SE = 0.048, p=1.0e-16) for a hypothetical patient carrying the mean characteristics of the cohort (i.e., average age in the cohort of 44.6 years and mean CAG repeat size of 44.7). Sex was again included as a covariate and did not show any significant effect on EI (coefficient = 0.028, SE = 0.078, p=0.719).
Finally, the contribution of each fixed effects term explaining the EI, given by t values (estimate divided by SE) in descending order of importance, was as follows: t(CAG)=24.0, t(age)=22.8 and t(age ×CAG)=5.7. Based on the fixed effects estimation extracted from the LMM, we plotted trajectories for the EI as a function of age (one trajectory for each CAG repeat length). The predicted values of each EI are shown on the original scale after back-transformation from the logarithmic scale over the same age intervals from the patient cohort for each (CAG)n (Figure 5). This model provides a glimpse of how instability evolves with (CAG)n, age, and the interaction between these two factors.
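To make the back-transformation concrete, the sketch below evaluates the fixed-effects part of the model using the rounded coefficients reported above. It assumes a natural-log scale (consistent with exp(−0.603) ≈ 0.547), omits the non-significant sex term and the per-patient random effects, and uses hypothetical function and constant names; it is an approximation of the population-level trajectory, not the fitted model itself.

```python
import math

# Rounded fixed-effects estimates as reported in the text.
INTERCEPT = -0.603
B_AGE = 0.028
B_CAG = 0.276
B_AGE_X_CAG = 0.002
MEAN_AGE = 44.6   # centering value for age (years)
MEAN_CAG = 44.7   # centering value for reference CAG repeat size


def predicted_ei(age, cag):
    """Population-level predicted EI on the original scale.

    ASSUMPTIONS: natural-log link, mean-centered predictors, sex term and
    random effects ignored.
    """
    age_c = age - MEAN_AGE
    cag_c = cag - MEAN_CAG
    log_ei = (INTERCEPT
              + B_AGE * age_c
              + B_CAG * cag_c
              + B_AGE_X_CAG * age_c * cag_c)
    return math.exp(log_ei)


# One trajectory per CAG length, evaluated at a few ages.
for cag in (40, 44, 48):
    trajectory = [round(predicted_ei(age, cag), 3) for age in (30, 45, 60)]
    print(cag, trajectory)
```

Because the interaction coefficient is positive, the log-scale slope with respect to age grows with CAG length, which is why the trajectories fan out for longer repeats.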
Discussion
Our longitudinal study provides data that support the Kaplan model in several ways (Kaplan et al., 2007). First, the model predicts that at the beginning of life, all disease-relevant cells begin with the inherited repeat and negligible somatic instability. This turns out to be the case: we found almost no somatic mosaicism at the fetal stage. One might have expected that the high number of mitoses at this stage of brain development would make neural precursors sensitive to double-strand breaks and replication errors (Leija-Salazar et al., 2018; Schwer et al., 2016), but somatic expansions occur through different mechanisms than germline expansions (Khristich and Mirkin, 2020; Tabrizi et al., 2020). Although we did not have samples from later stages of development, there are such data for other diseases caused by repeat expansions. For instance, in Friedreich ataxia, which is caused by an expanded GAA repeat in the first intron on both alleles of the FXN gene, levels of instability found in tissues from an 18-week-old fetus were very low compared to adult-derived tissues (De Biase et al., 2007). In myotonic dystrophy, caused by a non-coding CTG repeat expansion in the DMPK gene, repeat instability was not observed at 13 weeks in fetal tissues, but a difference between tissues became detectable after 16 weeks (Martorell et al., 1997). All these studies suggest that, early in life, somatic instability is minimal.
Second, Kaplan et al. posited that somatic expansions should progress with age, even prior to disease onset. This also turns out to be correct: the presymptomatic carrier parents of fetal mutation carriers already showed somatic instability in the blood at the time of the pregnancy. It is remarkable, in fact, that increases in ER were evident despite our limited sample size and despite the fact that we had to derive this calculation from blood cells, which are not involved in HD pathogenesis and turn over completely every six months or so. Unfortunately, for this very reason, the EI from the blood is not sufficient to predict AO, which is influenced by not only repeat length and somatic instability but other factors such as variants in DNA repair factors (see below).
Third, the model predicts the rate of allele expansion should increase with time and be a function of the repeat length at that time. This is indeed what we found: not just greater somatic expansions with age and reference repeat length (as represented by EI), but a greater propensity to expand with age (as represented by ER).
One prediction of the Kaplan model we could not test is that there should be different thresholds of somatic expansions that must be reached for different brain regions to become pathological. It is hard to imagine how this particular prediction of the model could be tested, other than by performing extensive neuropathological studies on a great many mice at many different disease stages. In terms of correlation between somatic instability and disease progression, we did find group-level differences in EI and ER between the premanifest and manifest state. We could not establish a correlation at the individual subject level, however, likely because of the limited sample size as well as the difficulty of pinpointing phenoconversion in a disease that continues to unfold over many years. Stronger evidence on this point came from a large study of nearly 750 HD mutation carriers, which showed that larger somatic expansions are associated with worse clinical outcomes (earlier AO, higher motor and progression scores) in HD (Ciosi et al., 2019).
The most interesting questions that remain to be answered have to do with what drives somatic instability. The brain regions that have the greatest repeat expansions in HD, the striatum and cortex, are hypermetabolic from early in the disease course (Tereshchenko et al., 2020), and neurons show greater somatic instability than glial cells in models and post mortem brains (Gonitel et al., 2008; Shelbourne et al., 2007). Metabolic stress may also lead to mitochondrial dysfunction and energy deficit in HD (Mochel et al., 2012; Roze et al., 2008; Tabrizi et al., 1999). An excess of excitatory glutamatergic inputs and NMDA receptor activation creates energy demands that are not sustainable in a context of diminished energy capacity, and may lead to cell death (Milnerwood et al., 2010; Mochel and Haller, 2011).
In fact, an excitotoxicity model of neurodegeneration was proposed for HD many years before the discovery of the genetic basis of the disease (Coyle and Schwarcz, 1976; Mcgeer and Mcgeer, 1976). The medium spiny neurons of the caudate and putamen, which are the most vulnerable in HD, receive their main input from cortical glutamatergic neurons; they are thus particularly susceptible to excitation and, in fact, HD can be mimicked by administering glutamate analogues to the striatum (Coyle and Schwarcz, 1976; Estrada Sánchez et al., 2008; Mcgeer and Mcgeer, 1976). In this context it is worth noting that variants in the GluR6 kainate receptor locus were found to account for 13% of variation in AO that was not explained by CAG repeat number (Rubinsztein et al., 1997). Along similar lines, a recent study showed that absence of the aryl hydrocarbon receptor (AhR), which protects mice from excitotoxicity, greatly reduced behavioral deficits in the R6/1 transgenic model of HD (Angeles-López et al., 2021).
Hypermetabolism would also contribute to oxidative stress, which can cause DNA damage (Iyer and Pluciennik, 2021; Leija-Salazar et al., 2018). Large-scale studies have linked somatic CAG expansions in patients’ blood to the presence of variants in DNA repair genes, not just in HD (Ciosi et al., 2019; Lee et al., 2019) but in other polyglutamine diseases as well (Bettencourt et al., 2016). In HD, somatic instability is influenced by polymorphisms in MSH3, MLH1, MLH3, and FAN1, which are all involved in DNA repair (Ciosi et al., 2019). Counterintuitively, loss of function of some DNA repair factors can be protective: knockout of Msh2 or Msh3 in a knock-in model of HD prevents CAG expansions in the striatum (Pinto et al., 2013). The reason for this may be that transcriptionally active genes elicit mismatch repair activity to guard genomic integrity, but long repeat tracts are difficult to repair accurately (Iyer and Pluciennik, 2021). A different mechanism is at work for FAN1, which actually stabilizes the CAG repeat in HD (Goold et al., 2019); loss of FAN1 function increases repeat instability (Kim et al., 2020; Loupe et al., 2020). Interestingly, there is evidence that double-strand break repair is dysregulated in HD: ATM (ataxia-telangiectasia mutated) is upregulated in brain tissue from HD mice and patients, and its heterozygous loss of function is protective in both mouse and Drosophila models of HD (Lu et al., 2014). It could be that the decline in DNA repair capacity or efficiency that comes with age (Gorbunova et al., 2007) contributes to the increasing somatic instability in blood cells, which, as we noted above, seem too short-lived to accumulate expansions as they do. An extended longitudinal study of the effect of DNA repair gene variants on somatic instability would be of great interest.
Given that somatic instability influences disease progression, targeting the repeat instability is a very appealing disease-modifying strategy (Khristich and Mirkin, 2020). One possibility is to introduce DNA-stabilizing interruptions into the repeat tract via gene editing (Ciosi et al., 2019). Another is to modulate DNA repair activity in HD to retard somatic expansions (Dragileva et al., 2009; Pinto et al., 2013), but this might also run the risk of increasing overall genomic instability. A recent approach using a small molecule that specifically binds CAG slip-out structures was able to contract the expansions and reduce protein aggregates in the striatum of R6/2 mice (Nakamori et al., 2020). Further efforts to stabilize or contract somatic expansions are warranted, particularly if expansions within brain tissue can be reduced. Last but not least, there is much more work to be done to understand the mechanisms that trigger somatic expansions, whether they relate to excitotoxicity, and how they lead to neurodegeneration.
Materials and methods
Sample collection
Longitudinal study
We recruited HD patients through the Department of Genetics of the Pitié-Salpêtrière University Hospital (Paris, France). The inclusion criterion was a pathological CAG repeat expansion of more than 38 repeats in the HTT gene. Age at disease onset was defined as the presence of a clinically significant movement disorder consistent with HD. We obtained blood samples with written informed consent according to the French legislation (approval from local ethics committees on 19/12/1990, 10/11/1992, followed by the Ethics committee Ile de France II on 30/9/2004 and 18/2/2010). All tested subjects were offered long-term follow-up and signed an informed consent prior to clinical examination and interview. We determined AO by taking the earlier of the self-reported age at onset and the age at which motor signs were first observed on examination by a neurologist.
Post-mortem cortical samples
Brain samples were collected as part of a program of ‘Brain Donation for Research’ (National Neuro-CEB Brain Bank, GIE Neuro-CEB BB-0033–00011). Brains were dissected in the neuropathological department of the Pitié-Salpêtrière University Hospital (Paris, France) to isolate samples from the frontal cortex.
Fetal samples
Approximately 20% of HD mutation carriers request prenatal diagnosis. After analysis of the fetal DNA, obtained by chorionic villus sampling, if the fetus carries the mutation the parents can request termination of the pregnancy, which is performed by manual vacuum aspiration under general anesthesia. Typically, the termination occurs at gestational week 13. We used standard obstetric protocols in accordance with the French guidelines for clinical practice. Prenatal visits and psychological support were provided for all couples participating, as standard practice, and no additional visits were planned due to participation in this study. The women signed an informed consent during a prenatal visit agreeing to the collection of fetal tissue following the eventual termination of the pregnancy. The study complied with all relevant ethical regulations, with approval from the French Agency of biomedicine (n°PFS17-001; 24/01/2017). The brain tissue analyzed was from the developing cortex.
DNA extraction
Post-mortem brains and fetal tissues were rapidly frozen and stored at -80°C until DNA extraction. DNA was extracted from brain tissues using the QIAamp Fast DNA Tissue Kit (Qiagen S.A., Courtaboeuf Cedex, France), according to manufacturer’s instructions. For blood samples, DNA was extracted using the Maxwell RSC Blood DNA kit, according to manufacturer’s instructions (Promega, France EURL). Finally, we measured DNA yields using a NanoDrop 8000 spectrophotometer (ThermoScientific, Illkirch Cedex, France).
Determination of the CAG length on huntingtin exon 1
Amplification of the CAG repeat in the HTT gene was performed as follows: in a final volume of 25 µl, each PCR contained 200 µM of each dNTP, 5 pmoles of each primer (see table below), 200 ng of genomic DNA and 1x PCR-Buffer, 1x Q-Solution, and 1 unit of Taq DNA polymerase pu (Qiagen S.A., Courtaboeuf Cedex, France). After an initial denaturation for 10 min at 96°C, samples were subjected to 35 cycles of 45 s of denaturation at 96°C, 2 min 30 s of annealing-extension at 70°C, followed by a final extension for 7 min at 72°C. Each amplification product was mixed with Hi-Di Formamide and Genescan-400HD Rox size standard (Applied Biosystems, Foster City, CA). Fragments were separated on an Applied Biosystems 3730XL DNA Analyzer. We scored alleles with GeneMapper software v5.0 (Applied Biosystems). We used two sets of primers (see sequence below): HD-F2 with HD-WR2-hex to determine the CAG repeat length and instability, and HD-F2 with HD-WCAAM4-R-fam to determine the presence of an additional CAA interruption. We excluded any patients with a CAA interruption from this study (n = 1).
Sequences of the primers used for determination of CAG length by PCR
Primer | Sequence |
---|---|
HD-F2 | 5'-GGGAGACCGCCATGGCGACCCTGGA-3' |
HD-WR2-hex | 5'-[HEX]GGCGGTGGCGGCTGTTGCTGCTGCT-3' |
HD-WCAAM4-R-fam | 5'-[6FAM]GGCGGTGGCGGCTGTTGCTGTTGAT-3' |
To visualize the fragments, the primers used for the PCR contain a fluorescent tag, so that the fluorescence intensity is proportional to the number of amplified fragments.
Measuring somatic CAG repeat expansions and calculating the somatic expansion index
We used the GeneMapper software v5.0 (Applied Biosystems) to analyze the somatic CAG repeat expansions. For any individual, the majority of PCR products peak around a main signal representing the reference CAG repeat size. Signals to the left of this peak include PCR ‘stutter’ inherent in the assay, whereas PCR products to the right represent only somatically expanded CAG repeats; we therefore restricted the analysis to these right-hand peaks. From the GeneMapper ‘sample plot view,’ we exported a data table for each sample containing the following information: sample name, called CAG allele, peak size in base pairs (bp), peak height, area under the peak, and data point/scan number of the highest point of the peak. Based on the main expanded CAG peak size, we used an internal standard to assign, on a per-plate basis, a main CAG length to each sample. We used peak heights to quantify mosaicism from GeneMapper traces. To calculate the proportion of expanded products for each sample, we normalized the height of each expanded peak to the height of the main CAG peak and multiplied the result by the position of the peak. We applied a relative threshold of 0.03 of the main peak height and excluded peaks falling below this threshold from the analysis. We selected this threshold based on the additional peaks in fetal tissues, which were low in intensity but clearly distinguishable from background by the software. Finally, we summed all peak values to generate an expansion index.
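The calculation described above can be illustrated with a minimal Python sketch. It is not the authors' code: the function name `expansion_index`, the dictionary input, and the example peak heights are hypothetical, and the sketch assumes that the "position of the peak" means the number of CAG repeats by which a peak exceeds the main allele.

```python
def expansion_index(peaks, main_cag, threshold=0.03):
    """Sum of normalized, position-weighted heights of expanded peaks.

    peaks: dict mapping CAG repeat count -> peak height (hypothetical input).
    main_cag: repeat count of the main (modal) expanded allele.
    threshold: peaks below this fraction of the main peak height are dropped.
    """
    main_height = peaks[main_cag]
    index = 0.0
    for cag, height in peaks.items():
        # ASSUMPTION: "position" = repeats gained relative to the main peak.
        offset = cag - main_cag
        if offset <= 0:
            continue  # skip the main peak and left-hand peaks (PCR stutter)
        relative_height = height / main_height
        if relative_height < threshold:
            continue  # below the 0.03 relative threshold
        index += relative_height * offset
    return index


# Hypothetical trace: main allele at 42 CAG with three expanded peaks.
example_peaks = {41: 500, 42: 1000, 43: 400, 44: 100, 45: 20}
example_ei = expansion_index(example_peaks, main_cag=42)
```

In this made-up trace the peak at 45 CAG falls below the threshold (20/1000 = 0.02) and is ignored, so only the peaks at 43 and 44 CAG contribute to the index.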
Statistical analyses
We conducted all statistical analyses using R version 3.6.1 (R Development Core Team, 2019; https://www.R-project.org/), and we generated plots with the ggplot2 R package (Wickham, 2009) (ggplot2_3.3.0). We generated correlation plots using the corrplot R package (corrplot_0.84). The level of statistical significance was set at p<0.05 for all tests.
Descriptive statistics
We report descriptive statistics for the demographics and disease characteristics of the HD patients (sex, age, somatic expansion index, and Unified HD Rating Scale total motor score [UHDRS-TMS]), determined at each visit that included blood sample collection. We defined AO as the onset of motor signs, as reported by the patient and their family, or the first neurological exam at which they were considered symptomatic, whichever was earlier. Patients with a UHDRS-TMS greater than 5, which indicates motor signs of HD, were considered to have ‘manifest’ HD. We summarized the data as n (number of available values), mean ± SD, and range (minimum and maximum) for quantitative variables, and as frequency counts and percentages for categorical variables.
Relationship between somatic CAG expansions and germline CAG repeat length
For samples collected from post-mortem brains (carrier fetuses and adult brain) or blood (two to three samples per patient), we studied the relationship between the CAG somatic expansions and the CAG repeat length by linear regression. We then determined the strength of association using Pearson’s correlation coefficient (r), the slope, and the p-value of the regression line.
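As an illustration of these quantities, the following Python sketch computes the slope, intercept, and Pearson's r of a simple linear regression from first principles. The function name `linear_fit` and the example values are hypothetical and are not taken from the study. The sketch stops at the t statistic of the slope; turning it into a p-value requires the t distribution with n − 2 degrees of freedom, which a statistics package would supply.

```python
import math


def linear_fit(x, y):
    """Ordinary least-squares fit of y on x.

    Returns (slope, intercept, pearson_r, t_statistic_of_slope).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)

    # t statistic for the slope (equivalently for r), with n - 2 degrees
    # of freedom; infinite when the fit is perfect.
    if abs(r) < 1.0:
        t_stat = r * math.sqrt((n - 2) / (1.0 - r * r))
    else:
        t_stat = math.copysign(math.inf, r)
    return slope, intercept, r, t_stat


# Hypothetical data: CAG repeat length vs. expansion index.
cag_lengths = [40, 42, 44, 46]
expansion_indices = [1.0, 2.0, 2.0, 4.0]
slope, intercept, r, t_stat = linear_fit(cag_lengths, expansion_indices)
```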
Regression analysis of disease characteristics with CAG repeat and somatic expansion measures
Expansion index (EI)
Prior to regression analysis, we transformed AO and AD values by the natural logarithm to better meet the linear model assumptions of normality and homoscedasticity (constant variance) of the residuals. Because we were able to collect blood samples at two or three time points for each patient in the longitudinal part of the study, we calculated corresponding EIs for each time point.
Expansion rate (ER)
From these EIs, we were able to derive a rate of change in expansion over time (expansion rate or ER) in addition to the single time point measures. To investigate whether somatic instability itself evolves, i.e., whether the tendency to expand increases with age, we fitted a linear regression of EI on age for each individual and extracted both the slope and the intercept coefficients. We used the slope to calculate the expansion rate of change (ER), while the intercept (EI-intercept) indicated a theoretical baseline value (age 0, i.e., at birth) for the expansion index.
Expansion index at age at onset (EI-AO)
Even though the EI-intercept is too distant in time from the visits to be a realistic estimate of CAG instability at birth, we used the slope and the intercept for each patient to extrapolate a plausible (albeit theoretical) expansion index at AO (EI-AO).
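The per-patient derivation of ER, EI-intercept, and EI-AO can be sketched as follows. This is a minimal illustration under the assumption that each patient's EI values are regressed on age at sampling; the function name `expansion_trajectory` and the example ages and EI values are hypothetical.

```python
def expansion_trajectory(ages, eis, age_at_onset):
    """Fit EI = intercept + slope * age for one patient.

    ages: ages at blood sampling (two or three visits).
    eis: expansion index measured at each visit.
    Returns (ER, EI-intercept, EI-AO).
    """
    n = len(ages)
    mean_age = sum(ages) / n
    mean_ei = sum(eis) / n
    sxx = sum((a - mean_age) ** 2 for a in ages)
    sxy = sum((a - mean_age) * (e - mean_ei) for a, e in zip(ages, eis))

    er = sxy / sxx                         # slope: expansion rate (per year)
    ei_intercept = mean_ei - er * mean_age  # theoretical EI at age 0
    ei_ao = ei_intercept + er * age_at_onset  # extrapolated EI at onset
    return er, ei_intercept, ei_ao


# Hypothetical patient sampled at 30, 40, and 50 years, with onset at 45.
er, ei_intercept, ei_ao = expansion_trajectory(
    ages=[30, 40, 50], eis=[1.0, 1.5, 2.0], age_at_onset=45
)
```

For this made-up patient the intercept comes out negative, which shows why an extrapolation to age 0 should be read as a theoretical baseline rather than a measured value.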
In a first analysis, we performed linear regressions to model the values of log-AO and log-AD, respectively, as a function of CAG repeat length, EI-AO, and ER. We used the p-value of the slope and adjusted R squared (R²adj) values to determine all associations. Sex differences in AO and AD were also assessed using Wilcoxon rank-sum tests. Finally, we generated a correlation matrix plot summarizing all pairwise correlations between the variables from the longitudinal cohort.
Since the CAG repeat length is a well-established predictor of AO, we carried out the following analyses to understand whether combining information from the CAG repeat length and evolution of the somatic CAG instability could better characterize the disease onset.
Determination of earlier-than-expected, as expected, or later-than-expected age at onset
Since AO, EI, and ER are all CAG length-dependent to a great extent, we sought a way to dissociate their contributions. To this end, we divided HD patients into three groups according to whether their motor symptom onset occurred earlier or later than the AO predicted by CAG repeat number [(CAG)n]. Following Swami et al., 2009, we calculated the residuals from a linear regression with log-AO as the dependent variable and (CAG)n as the independent variable to evaluate the differences between the observed and predicted AO. We standardized residuals to have mean zero and unit variance and defined onset groups as ‘earlier’ for residual values less than −0.5, ‘later’ for residual values greater than 0.5, and ‘as expected’ otherwise. We then performed a Kruskal-Wallis test to compare the ER and EI-AO values among these groups.
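The grouping rule can be illustrated with a short Python sketch. It is a hedged reconstruction rather than the authors' script: the function name `onset_groups` and the example data are hypothetical, and the sketch assumes that residuals are standardized with the sample standard deviation (n − 1 denominator), which the text does not specify.

```python
import math
import statistics


def onset_groups(cag, age_at_onset, cutoff=0.5):
    """Classify patients by standardized residual of log-AO regressed on CAG."""
    log_ao = [math.log(a) for a in age_at_onset]
    n = len(cag)
    mean_x = sum(cag) / n
    mean_y = sum(log_ao) / n
    sxx = sum((c - mean_x) ** 2 for c in cag)
    sxy = sum((c - mean_x) * (y - mean_y) for c, y in zip(cag, log_ao))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # Observed minus predicted log-AO: negative means earlier than predicted.
    residuals = [y - (intercept + slope * c) for c, y in zip(cag, log_ao)]

    # ASSUMPTION: standardize with the sample standard deviation.
    mean_res = statistics.fmean(residuals)
    sd_res = statistics.stdev(residuals)
    z_scores = [(r - mean_res) / sd_res for r in residuals]

    groups = []
    for z in z_scores:
        if z < -cutoff:
            groups.append("earlier")
        elif z > cutoff:
            groups.append("later")
        else:
            groups.append("as expected")
    return groups, z_scores


# Hypothetical cohort: three patients at 40 CAG and three at 44 CAG.
groups, z_scores = onset_groups(
    cag=[40, 40, 40, 44, 44, 44],
    age_at_onset=[40.0, 50.0, 62.5, 24.0, 30.0, 37.5],
)
```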
Relationship between somatic expansion and residual age at death
As a complementary analysis, similarly to the previous AO study, we used data from the 14 deceased patients in the longitudinal cohort, and data measured in the 14 postmortem brains (Tables 2 and 3). Based on the residual AD (i.e., AD after subtracting the effect of (CAG)n using linear regression), we performed an association study with ER (blood samples) and EI at AD (postmortem samples) using linear regressions. We reported associations using the p-value of the slope and adjusted R squared (R²adj) values.
Influence of disease status on EI and ER in blood samples
The cohort had a sufficient number of subjects in the premanifest and manifest stages at the first visit to study the influence of disease status on the residual EI and ER after using linear regression to subtract the contribution of (CAG)n. Since we could correlate EI with disease status only at the first visit (too many patients had phenoconverted by the second visit), and because the contributions of premanifest/manifest status, CAG repeat length, and age cannot be clearly distinguished, this was an exploratory study prior to modeling using the complete expansion data with age and CAG repeat length. Comparisons of EI and ER with disease status were performed using Wilcoxon rank-sum tests.
Distinguishing the determinants of somatic instability in blood samples: linear mixed-effects model
To investigate the longitudinal association of CAG repeat length and age with the somatic expansion in blood, we employed a linear mixed-effects model (LMM) including the variables age, CAG, and the age × CAG interaction term as fixed effects, the subject identifier as a random effect to account for the within-subject correlation among visits (‘random intercept only model’), and sex as a cofactor for adjustment. Prior to modeling, the somatic expansion values were transformed by the natural logarithm to better meet the model assumptions of linearity, normality, and constant variance of the residuals. The LMM was fitted using restricted maximum-likelihood estimation (REML) with the function lmer in the lme4 R package (Bates et al., 2015a) (lme4_1.1-21). For the retained model, we reported the coefficient estimates of fixed effects with standard errors and standardized regression coefficients (t values), and the standard deviation of random effects. The t values were obtained by dividing each coefficient estimate by its standard error and were used as a measure of the relative strength of association of each term with somatic expansion in blood. Significance of fixed effects (p-values adjusted for sex) was obtained with the lmerTest R package (lmerTest_3.1-1) using Satterthwaite’s approximation for degrees of freedom. As age and CAG repeat length were mean-centered for modeling, the estimate for the model intercept can be interpreted as the level of somatic expansion for a virtual subject with average characteristics for all patients in the study (mean age and mean CAG repeat length). Curves for the age-trajectories of somatic expansion in blood (one trajectory per CAG value, Figure 5) were plotted from the fixed effects component of the model.
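To make the interpretation of the fixed effects concrete, the Python sketch below evaluates the fixed-effects part of such a model for a given age and CAG repeat length. It does not fit the model, which was done with lmer in R. The coefficient values, cohort means, and function names are hypothetical placeholders, not estimates from this study, and the sex cofactor and the random intercept are omitted for simplicity.

```python
import math


def t_value(estimate, std_error):
    """t value of a fixed effect: coefficient estimate over its standard error."""
    return estimate / std_error


def predict_expansion(age, cag, coef, mean_age, mean_cag):
    """Fixed-effects prediction of somatic expansion on the original scale.

    The model is linear in log(expansion) with mean-centered age and CAG:
    log(EI) = b0 + b_age*age_c + b_cag*cag_c + b_int*age_c*cag_c
    """
    age_c = age - mean_age
    cag_c = cag - mean_cag
    log_ei = (
        coef["intercept"]
        + coef["age"] * age_c
        + coef["cag"] * cag_c
        + coef["age:cag"] * age_c * cag_c
    )
    return math.exp(log_ei)  # back-transform from the natural log scale


# HYPOTHETICAL coefficients and cohort means, for illustration only.
coef = {"intercept": 0.5, "age": 0.02, "cag": 0.1, "age:cag": 0.001}
mean_age, mean_cag = 45.0, 43.0

# One age trajectory per CAG value, as in a fixed-effects trajectory plot.
trajectory_cag45 = [
    predict_expansion(age, 45, coef, mean_age, mean_cag)
    for age in range(20, 81, 10)
]
```

Because both predictors are centered, evaluating the function at the mean age and mean CAG repeat length returns exp(intercept), which corresponds to the "virtual average subject" reading of the intercept described above.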
Data availability
All patient data generated and analyzed are included in the manuscript and available in Tables 1, 2 and 3.
References
- Fitting linear Mixed-Effects models using lme4. Journal of Statistical Software 67:1–48. https://doi.org/10.18637/jss.v067.i01
- Excitotoxic Neuronal Death and the Pathogenesis of Huntington's Disease. Archives of Medical Research 39:265–276. https://doi.org/10.1016/j.arcmed.2007.11.011
- FAN1 modifies Huntington’s disease progression by stabilizing the expanded HTT CAG repeat. Human Molecular Genetics 28:650–661. https://doi.org/10.1093/hmg/ddy375
- Changes in DNA repair during aging. Nucleic Acids Research 35:7466–7474. https://doi.org/10.1093/nar/gkm756
- DNA Mismatch Repair and its Role in Huntington’s Disease. Journal of Huntington's Disease 10:75–94. https://doi.org/10.3233/JHD-200438
- A Universal Mechanism Ties Genotype to Phenotype in Trinucleotide Diseases. PLOS Computational Biology 3:e235. https://doi.org/10.1371/journal.pcbi.0030235
- Dramatic tissue-specific mutation length increases are an early molecular event in Huntington disease pathogenesis. Human Molecular Genetics 12:3359–3367. https://doi.org/10.1093/hmg/ddg352
- On the wrong DNA track: Molecular mechanisms of repeat-mediated genome instability. Journal of Biological Chemistry 295:4134–4170. https://doi.org/10.1074/jbc.REV119.007678
- Genetic and Functional Analyses Point to FAN1 as the Source of Multiple Huntington Disease Modifier Effects. The American Journal of Human Genetics 107:96–110. https://doi.org/10.1016/j.ajhg.2020.05.012
- Review: Somatic mutations in neurodegeneration. Neuropathology and Applied Neurobiology 44:267–285. https://doi.org/10.1111/nan.12465
- Targeting ATM ameliorates mutant huntingtin toxicity in cell and animal models of Huntington's disease. Science Translational Medicine 6:268ra178. https://doi.org/10.1126/scitranslmed.3010523
- Somatic Instability of the Myotonic Dystrophy (CTG)n Repeat during Human Fetal Development. Human Molecular Genetics 6:877–880. https://doi.org/10.1093/hmg/6.6.877
- Early Alterations of Brain Cellular Energy Homeostasis in Huntington Disease Models. Journal of Biological Chemistry 287:1361–1370. https://doi.org/10.1074/jbc.M111.309849
- Energy deficit in Huntington disease: why it matters. Journal of Clinical Investigation 121:493–499. https://doi.org/10.1172/JCI45691
- R: A Language and Environment for Statistical Computing (software). R Foundation for Statistical Computing, Vienna, Austria.
- Pathophysiology of Huntington’s disease: from huntingtin functions to potential treatments. Current Opinion in Neurology 21:497–503. https://doi.org/10.1097/WCO.0b013e328304b692
- Triplet repeat mutation length gains correlate with cell-type specific vulnerability in Huntington disease brain. Human Molecular Genetics 16:1133–1142. https://doi.org/10.1093/hmg/ddm054
- The search for cerebral biomarkers of Huntington's disease: a review of genetic models of age at onset prediction. European Journal of Neurology 13:408–415. https://doi.org/10.1111/j.1468-1331.2006.01264.x
- Huntington’s Disease: Relationship Between Phenotype and Genotype. Molecular Neurobiology 54:342–348. https://doi.org/10.1007/s12035-015-9662-8
- Huntington disease: new insights into molecular pathogenesis and therapeutic opportunities. Nature Reviews Neurology 16:529–546. https://doi.org/10.1038/s41582-020-0389-4
- Length of Uninterrupted CAG, Independent of Polyglutamine Size, Results in Increased Somatic Instability, Hastening Onset of Huntington Disease. The American Journal of Human Genetics 104:1116–1126. https://doi.org/10.1016/j.ajhg.2019.04.007
Article and author information
Funding
Agence Nationale de la Recherche (ANR-16-COEN-0006-02)
- Sandrine Humbert
- Alexandra Durr
Fondation pour la Recherche Médicale (DEQ20170336752)
- Sandrine Humbert
The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.
Acknowledgements
We express our deepest gratitude to the patients and families participating in this study. Many thanks to Triplet Therapeutics for fruitful discussions and financial support. We thank Vicky Brandt for helpful discussions and editing the manuscript. We also thank research assistants Lynda Benammar, Marie Biet, and Rania Hilab for their engaged help and Ludmila Jornea and Philippe Martin Hardy for their laboratory skills. We received support from i-crin Neuroscience (NEUROLOP). This work was supported by grants from the Agence Nationale de la Recherche (Network of centers of excellence in neurodegeneration COEN, AD, SH) and the Fondation pour la Recherche Médicale (DEQ20170336752, SH).
Ethics
Human subjects: We obtained blood samples with written informed consent according to the French legislation (Approval from local ethics committees on 19/12/1990, 10/11/1992, followed by the Ethics committee Ile de France II on 30/9/2004 and 18/2/2010). Brain samples were collected as part of a program of 'Brain Donation for Research' (National Neuro-CEB Brain Bank, GIE Neuro-CEB BB-0033-00011).
Copyright
© 2021, Kacher et al.
This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.