Native functions of short tandem repeats
Abstract
Over a third of the human genome is comprised of repetitive sequences, including more than a million short tandem repeats (STRs). While studies of the pathologic consequences of repeat expansions that cause syndromic human diseases are extensive, the potential native functions of STRs are often ignored. Here, we summarize a growing body of research into the normal biological functions for repetitive elements across the genome, with a particular focus on the roles of STRs in regulating gene expression. We propose reconceptualizing the pathogenic consequences of repeat expansions as aberrancies in normal gene regulation. From this altered viewpoint, we predict that future work will reveal broader roles for STRs in neuronal function and as risk alleles for more common human neurological diseases.
Introduction
At least a third of the human genome is comprised of repetitive sequences (de Koning et al., 2011; Gemmell, 2021; Britten and Kohne, 1968). Some of the first genomic repetitive elements were discovered in association with disease. As a result, pathogenic roles of repeats were well studied, while potential native functions of these repeats were largely dismissed. However, the conservation of genomic repeats among different eukaryotic species (Eichler et al., 1995; Sulovari et al., 2019; Liquori et al., 2003) and their high polymorphism rates compared to other types of genetic variations (Willems et al., 2016) suggest that repeats may have important biological functions in addition to the pathogenic ones. A growing body of research has revealed complex biological and evolutionary functions for repeats across the genome. Here, we summarize the important functions of one type of genomic repeat, short (2–6 base pair) tandem repeats (STRs), in DNA, RNA, and as proteins. We then reframe STR toxicity observed in repeat expansion disorders (REDs) as an aberrancy of native STR functions, rather than as solely an emergent property disconnected from the native repeat. Finally, we discuss how this alternative view of STR toxicity can improve our understanding of roles of STRs in neuronal function and human health.
A brief history of repetitive DNA
Repetitive elements in DNA were first discovered by Barbara McClintock, who observed the presence of ‘controlling elements’ randomly dispersed throughout the maize (Zea mays) genome (Comfort, 2001; Ravindran, 2012; McClintock, 1950). These interspersed repeats, which would come to be known as transposable elements (TEs), use flanking repetitive sequences to ‘jump’ around to different locations in the genome, often resulting in duplications of genetic material.
In contrast to interspersed repeats, tandem repeats (TRs) are regions in which repeating units lie in parallel (or in tandem) and are classified by size of the repeating unit as satellites (>60 base pairs), minisatellites (10–60 base pairs), or microsatellites (<9 base pairs). Short (2–6 base pair) tandem repeats (STRs) comprise between 1% and 3% of the human genome (Gymrek et al., 2016; Wyner et al., 2020; Lander et al., 2001). In the early 1990s, a series of STR expansions were causally linked with human diseases, including spinobulbar muscular atrophy (La Spada et al., 1992), Fragile X Syndrome (Fu et al., 1991; Heitz et al., 1991; Oberlé et al., 1991; Verkerk et al., 1991; Yu et al., 1991), Huntington’s disease (MacDonald et al., 1993), and myotonic dystrophy (Brook et al., 1992; Buxton et al., 1992; Harley et al., 1992; Aslanidis et al., 1992; Fu et al., 1992; Mahadevan et al., 1992). As such, much of the research on STRs to date has centered on the mechanisms by which repeat expansions trigger neuronal toxicity. We will use the Fragile X locus as an exemplar of this now extensive body of literature, which is reviewed in more detail elsewhere (Malik et al., 2021a; Hagerman et al., 2017; Hagerman and Hagerman, 2015; Jacquemont et al., 2003; Glineburg et al., 2018), as it helps us understand how STRs might function normally in the absence of expansion.
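To make the size-based definition of an STR concrete, the following is a minimal sketch of how perfect tandem repeats with 2–6 base pair units could be located in a DNA string. The function name `find_strs`, the `min_copies` threshold, and the choice to skip single-base runs are assumptions made for illustration; dedicated STR genotyping tools use more elaborate models that tolerate interruptions and sequencing errors.

```python
import re

# Illustrative assumption: an STR is a 2-6 bp unit repeated at least
# `min_copies` times in a row with no interruptions.
def find_strs(seq, min_copies=3):
    """Return (start, unit, copies) for perfect tandem repeats of 2-6 bp units."""
    seq = seq.upper()
    # Lazy {2,6}? prefers the shortest unit, so CGCGCG is reported as (CG)3.
    pattern = re.compile(r"([ACGT]{2,6}?)\1{%d,}" % (min_copies - 1))
    hits = []
    for match in pattern.finditer(seq):
        unit = match.group(1)
        if len(set(unit)) == 1:
            # Skip homopolymer runs such as AAAAAA, which are not 2-6 bp STRs.
            continue
        copies = len(match.group(0)) // len(unit)
        hits.append((match.start(), unit, copies))
    return hits


if __name__ == "__main__":
    # Hypothetical toy sequence containing five CGG units.
    print(find_strs("TTACGGCGGCGGCGGCGGAT"))
```

The 0-based start position, the repeat unit, and the copy number are returned for each hit, which is enough to compare repeat lengths between two alleles of the same locus.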
Fragile X-associated disorders: the discovery of pathogenic short tandem repeats
Fragile X Syndrome (FXS), the most common monogenic form of intellectual disability, was one of the first genetic diseases linked to an STR expansion (Fu et al., 1991; Heitz et al., 1991; Oberlé et al., 1991; Verkerk et al., 1991; Yu et al., 1991). In 1943, Julia Bell and James Purdon Martin described an X-linked intellectual disability, primarily affecting people assigned male at birth, that could be inherited from a carrier female parent or affected male parent (Martin and Bell, 1943). Karyotypes of affected individuals show a folate-sensitive fragile site on the X chromosome, which causes the chromosome to bend or break at one arm (Lubs, 1969; Proops and Webb, 1981; Chudley and Hagerman, 1987; Hagerman et al., 1986). The fragile site associated with FXS is located at the Fragile X messenger ribonucleoprotein 1 (FMR1) gene, which contains a large CGG repeat in the 5’ UTR of affected individuals (>200 repeats) (Fu et al., 1991; Heitz et al., 1991; Oberlé et al., 1991; Verkerk et al., 1991; Yu et al., 1991). In addition to intellectual disability, FXS patients commonly present with hyperactivity, anxiety, and seizures (Hagerman et al., 2017; Chudley and Hagerman, 1987; Hagerman and Hagerman, 2002a). Other chromosomal fragile sites also contain STRs, some of which are linked to other diseases (Glover, 2006; Debacker and Kooy, 2007; Schwartz et al., 2006). For example, Fragile XE syndrome (FRAXE), caused by a CGG repeat expansion in the FMR2 gene (Knight et al., 1993; Gu et al., 1996; Gecz et al., 1996), manifests as an X-linked intellectual disability similar to FXS (Mulley et al., 1995; Gecz, 2000).
While studying pedigrees of Fragile X families, Stephanie Sherman and colleagues observed incomplete penetrance of mental impairment, affecting only 79% of males and 35% of females (Sherman et al., 1984; Sherman et al., 1985). This ‘Sherman paradox’ suggested a generational risk factor in Fragile X mental impairment, as unaffected ‘normal transmitting’ males (NTMs) passed on a mutant allele to unaffected female children, with disease manifestation in affected (predominantly) male grandchildren. Subsequent studies of CGG repeat length variation found that individuals from non-Fragile X families have 6–54 CGG repeats, while some unaffected individuals in Fragile X families have 55–200 repeats, a ‘pre-mutation’ associated with increased risk of further repeat expansion during oogenesis (Fu et al., 1991).
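The repeat-length ranges described above can be summarized as a simple lookup. The sketch below is illustrative only: the function name and the category labels are assumptions, the cut-offs are the ones quoted in this section (6–54, 55–200, and >200 repeats), and it is not a clinical classification tool.

```python
def classify_fmr1_cgg(repeat_count):
    """Map an FMR1 CGG repeat count to the ranges quoted in this section."""
    if repeat_count > 200:
        # Expansions of this size are found in individuals affected by FXS.
        return "full mutation"
    if repeat_count >= 55:
        # 55-200 repeats: the 'pre-mutation' range.
        return "premutation"
    if repeat_count >= 6:
        # 6-54 repeats: range reported for non-Fragile X families.
        return "normal range"
    # Counts below 6 fall outside the ranges given in the text.
    return "below reported range"


if __name__ == "__main__":
    for count in (30, 90, 450):
        print(count, classify_fmr1_cgg(count))
```

The boundaries are applied inclusively, so an allele with exactly 200 repeats is still placed in the premutation range.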
Subsequent work with Fragile X families revealed that Fragile X premutation expansion carriers often manifest clinically distinct disorders that are caused by the CGG repeat. Fragile X-associated tremor/ataxia syndrome (FXTAS) is an age-linked neurodegenerative disorder characterized by progressive intention tremor and ataxia, parkinsonism, and cognitive decline (Hagerman and Hagerman, 2015; Jacquemont et al., 2003; Hagerman and Hagerman, 2004; Hagerman et al., 2001; Hagerman and Hagerman, 2002b; Leehey et al., 2003; Brunberg et al., 2002). As an X-linked disorder, FXTAS primarily affects people assigned male at birth. People with two X chromosomes may develop FXTAS, but are also at risk for developing Fragile X-associated premature ovarian insufficiency (FXPOI), a disorder characterized by absent or irregular menstrual cycles, early onset of menopause, and fertility issues (Hagerman and Hagerman, 2002a; Allingham-Hawkins et al., 1999; Murray et al., 2000a; Murray et al., 2000b). As ‘premutation disorders’, FXTAS and FXPOI are thought to share similar molecular mechanisms by which the premutation CGG repeat expansion causes cytotoxicity and dysfunction.
More than 50 REDs discovered to date show common mechanisms of molecular pathology (Malik et al., 2021a; Glineburg et al., 2018; Paulson, 2018; Rodriguez and Todd, 2019). FXS, FXTAS, and FXPOI, collectively referred to as Fragile X-associated disorders, are revisited throughout this review to exemplify the mechanisms by which STRs can cause cellular dysfunction and toxicity. However, the often-stereotyped manifestations of REDs, in addition to the abundance of repetitive elements throughout the genome, suggest that STRs could have native functions which become aberrant in the setting of repeat expansions. We will focus most of the rest of the review on this supposition.
Native STR functions
Although overshadowed by disease-centered research, investigation of the functional consequences of repeat polymorphisms has been ongoing for decades. Studies of individual or small groups of genes showed phenotypic consequences of repeat length variation on flocculation and cell adhesion in yeast (Voynov et al., 2006; Levdansky et al., 2007; Verstrepen et al., 2005), on limb and skull morphology in dogs (Fondon and Garner, 2004), and on behavioral traits in voles (Hammock and Young, 2005). Recent advances in sequencing technology and STR-conscious alignment techniques now permit the detection and characterization of thousands of new STRs and their variation across the human genome, and have enabled genome-wide study of the effect of repeat length polymorphisms on gene expression (Willems et al., 2016; Payseur et al., 2011; McIver et al., 2011; O’Dushlaine and Shields, 2008; McIver et al., 2013; Willems et al., 2014; Mallick et al., 2016). As thousands of single-nucleotide polymorphisms (SNPs) have been linked with disease risk in Genome Wide Association Studies (GWAS; Tam et al., 2019; Uffelmann et al., 2021), ongoing studies of human genomes aim to link variation in STR length to phenotypic outcomes (Gymrek et al., 2016; Fotsing et al., 2019; Mitra et al., 2021). In homage to the expression quantitative trait loci (eQTL) identified in traditional GWAS (Tam et al., 2019; Uffelmann et al., 2021), STRs associated with differences in expression of nearby genes are called eSTRs. In the following sections, we will first discuss evidence for evolutionary constraint on STRs across phylogeny and in humans. We will then showcase the mechanisms by which variation in STR length affects gene expression.
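The idea behind an eSTR, an association between repeat length and expression of a nearby gene, can be illustrated with a simple least-squares fit. The sketch below is a simplified stand-in and not necessarily the procedure used in the cited genome-wide studies, which typically also account for covariates and multiple testing. The function name and the toy values are hypothetical and chosen only so that the arithmetic is easy to follow.

```python
def slope_and_correlation(repeat_lengths, expression):
    """Least-squares slope and Pearson r of expression versus repeat length."""
    n = len(repeat_lengths)
    mean_x = sum(repeat_lengths) / n
    mean_y = sum(expression) / n
    sxx = sum((x - mean_x) ** 2 for x in repeat_lengths)
    syy = sum((y - mean_y) ** 2 for y in expression)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(repeat_lengths, expression))
    slope = sxy / sxx                  # change in expression per repeat unit
    r = sxy / (sxx * syy) ** 0.5       # strength of the linear association
    return slope, r


if __name__ == "__main__":
    # Hypothetical toy data: one value per individual, not real measurements.
    lengths = [10, 11, 12, 13, 14]
    expr = [1.0, 1.2, 1.4, 1.6, 1.8]
    print(slope_and_correlation(lengths, expr))
```

A positive slope would correspond to a repeat whose longer alleles are associated with higher expression, and a negative slope to the opposite relationship.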
Repetitive DNA regulates transcription
Repetitive elements can impact the transcription of neighboring genes or the genes in which they reside by regulating chromatin structure and epigenetic markers. A role for repetitive DNA in facilitating 3D folding of the genome was first observed with TE-dependent formation of chromatin loops across multiple species, including yeast, Drosophila, and mammals (Figure 1(i); Cournac et al., 2016; Bourque, 2009; Lu et al., 2021). Contact maps generated using chromosome conformation capture (Hi-C) show high co-localization of repetitive elements in nuclear space in humans, mice, and Drosophila, demonstrating a structural function of repetitive DNA (Cournac et al., 2016). The enrichment of transcription factor binding sites in proximity to spatially associated repeats suggests that repeat-mediated 3D DNA packaging may allow for context-dependent co-transcription of linearly remote genes (Cournac et al., 2016).
STRs play critical roles in maintaining chromatin structure (Nikumbh and Pfeifer, 2017; Sun et al., 2018; Volle and Delaney, 2012). For example, short CAG/CTG tracts avidly incorporate nucleosomes (Figure 1 (ii)), which are a basic subunit of chromatin packaging (Volle and Delaney, 2012). Nucleosome position varies with differences in STR length and flanking sequence context (Volle and Delaney, 2012), influencing chromatin structure and transcription of nearby genes. Other STRs, including CGG repeats, have the opposite effect and exclude nucleosomes in their native states, creating more open chromatin states near the transcription start sites of genes that favor local transcription (Wang et al., 1996; Wang, 2007). This feature may underlie the enrichment of CGG repeats in promoters and 5’UTRs (Uesaka et al., 2014).
STRs can also influence chromatin structure by modulating DNA methylation. Some STRs are prone to methylation, which can lead to gene silencing and the absence of transcription (Quilez et al., 2016; Garg et al., 2021; Pappalardo and Barra, 2021). One common example of repeat-mediated gene silencing involves CpG islands, which are repeating di-nucleotide CpG sequences ranging from around 500 to 3000 base pairs that are located in ~40% of gene promoters across mammalian genomes (Deaton and Bird, 2011; Janitz and Janitz, 2011; Thomson et al., 2010; Clouaire et al., 2012; Blackledge et al., 2013). STRs are commonly located near CpG islands (Sun et al., 2018), and may influence their methylation states (Figure 1 (iii)). Moreover, other STRs can contain CpGs within their repetitive sequence that can undergo methylation (Bolton et al., 2013).
A genome-wide study in yeast (Vinces et al., 2009) estimated that as many as 25% of promoters contain tandem repeats (TRs). Generally, expression of genes with TRs in their promoters increased with increasing repeat size. TRs in promoters may increase gene expression by increasing transcription factor binding (Figure 1 (iv)), blocking or reducing nucleosome density, or in the case of AT-rich repeats, by facilitating DNA melting (Vinces et al., 2009). However, stable secondary structures formed by TRs can also inhibit transcription by impeding progression and access of transcriptional machinery (Figure 1 (v); Grabczyk and Fishman, 1995; Belotserkovskii et al., 2010; Usdin and Woodford, 1995). For example, the evolutionarily conserved THO complex is recruited to actively transcribed genes (Kim et al., 2004; Abruzzi et al., 2004; Strässer et al., 2002), and facilitates elongation of RNA polymerase (Fan et al., 1996; Prado et al., 1997; Fan et al., 2001; Jimeno et al., 2002; Chávez et al., 2000) through super-helical structures formed by long GC-rich TRs (Voynov et al., 2006; Chávez et al., 2001). Yeast strains with mutations to THO complex subunits exhibited lower levels of TR-containing FLO11 mRNA. Reduced FLO11 mRNA coincided with an accumulation of RNA polymerase at the beginning of the gene. Removal of the TR, or overexpression of topoisomerase I to enhance unwinding of the structured DNA, rescued the reduction in FLO11 mRNA in THO complex mutants (Voynov et al., 2006).
As in yeast, human STRs can either enhance or inhibit transcription of associated genes depending on their sequence and location, and can also affect gene expression via changes in gene methylation and chromatin structure (Gymrek et al., 2016; Fotsing et al., 2019; Quilez et al., 2016; Garg et al., 2021; Jakubosky et al., 2020). Together, these studies demonstrate numerous mechanisms by which TRs can enhance or inhibit gene expression.
STRs in RNA regulate pre-mRNA processing and RNA localization
Transcribed repetitive elements regulate numerous aspects of RNA biology. STRs in RNAs form complex higher order structures, including G-quadruplexes and hairpins (Krzyzosiak et al., 2012; Sobczak et al., 2003; Malgowska et al., 2014; Sobczak et al., 2010), which are thought to exert broad influence over pre-mRNA splicing (Figure 1 (vi)) (Muro et al., 1999; Tu et al., 2000; Black, 2003; Solnick and Lee, 1987). An analysis of human introns found that sites of alternative splicing are enriched for STRs (Lian and Garner, 2005). STRs can facilitate alternative splicing by complementary pairing of intronic repeats, bringing exonic regions into close proximity (Lian and Garner, 2005). Structure-forming STRs can inhibit or enhance alternative splicing by blocking or facilitating the recruitment of splicing factors, respectively (Lian and Garner, 2005). For example, alternative splicing of the EIIIB exon in the well-conserved fibronectin gene is regulated by an intronic TGCATG repeat (Huh and Hynes, 1994; Lim and Sharp, 1998). Contractions in this STR reduce EIIIB exon inclusion, while overexpression of a specific splicing factor, SRp40, stimulates inclusion. While the TGCATG repeat differs from SRp40’s consensus binding site, it can form a strong hairpin structure, which is a key feature of SRp40 binding site motifs (Tacke et al., 1997). This suggests that the TGCATG repeat may modulate alternative splicing by recruiting the SRp40 splicing factor to the intron/exon boundary (Lim and Sharp, 1998).
Some STRs in RNA regulate splicing in trans, by binding to and sequestering splicing factors and blocking their functions (Figure 1 (vii)). A recent study identified a group of novel long non-coding RNAs (lncRNAs) with multiple predicted RNA binding motifs (Yap et al., 2018), a subset of which contained long stretches of STRs (‘strRNAs’). One strRNA called the pyrimidine-rich noncoding transcript (PNCTR) contains numerous stretches of (TC)n repeats, avidly binds to the polypyrimidine tract-binding protein (PTBP1) in cells, and negatively regulates PTBP1-mediated splicing (Yap et al., 2018). As such, PNCTR overexpression was sufficient to trigger mis-splicing of PTBP1 targets and trigger programmed cell death (Yap et al., 2018). In this way, STRs in RNA can regulate the global availability of other RNA-binding proteins (RBPs) with other functions, exerting profound control over numerous aspects of cell biology.
STRs in 3’ UTRs can also serve as RNA localization signals, and via interactions with RBPs, facilitate the transport of RNAs to specified cellular compartments (Figure 1 (viii)). A program called REPFIND was developed to analyze 3’ UTRs of localized mRNAs in Xenopus oocytes and identified various CAC-containing repeat motifs that serve as localization elements (Betley et al., 2002). Mutating these CAC-containing repeats was sufficient to abolish normal RNA localization. CAC-containing repeats were also found in zebrafish and human 3’ UTRs of transcripts that are known to be specifically localized within cells, suggesting that CAC-containing repeats are conserved localization elements in chordates (Betley et al., 2002). REPFIND was subsequently used to generate a database of repeating motifs in 3’ UTRs of mammalian genes from the Mammalian Gene Collection (MGC) that revealed hundreds of human genes containing short CAC- and CAG-rich repeats in their 3’ UTRs (Lim and Sharp, 1998). Intriguingly, these elements facilitate RNA localization to neurites in rat hippocampal neurons (Andken et al., 2007).
Repetitive RNA regulates translation
STRs located in 5’ UTRs and coding regions impact mRNA translational efficiency. GC-rich STRs form stable RNA structures (Krzyzosiak et al., 2012; Sobczak et al., 2003; Malgowska et al., 2014; Sobczak et al., 2010), which can impede the processivity of scanning translational complexes (Figure 1 (ix)) (Kozak, 1980; Kozak, 1986; Kudla et al., 2009; Tuller et al., 2011; Ding et al., 2012; Bentele et al., 2013; Weinberg et al., 2016). For example, a native GGN repeat in the 5’ UTR of the potassium 2-pore domain leak channel Task3 mRNA forms a G-quadruplex structure in vivo (Maltby et al., 2020). This G-quadruplex is inhibitory to translation of Task3 mRNA, but can be overcome by DHX36 helicase activity to improve ribosome processivity through the stable structure (Maltby et al., 2020).
Indeed, libraries of synthetic (Millette et al., 2022) and naturally occurring (Li et al., 2017; Niederer et al., 2022) hairpin sequences placed within 5’ UTRs can be used to precisely control translational transgene output, with potential implications for gene therapy dosing. These studies show how single unit variations in STRs can precisely modulate protein expression, generally permitting more and faster translation of mRNAs with smaller STRs, and less and slower translation of mRNAs with larger STRs.
Repeats in proteins facilitate multi-protein complex formation and structural flexibility
Eukaryotic proteins are more likely to have repeats than prokaryotic proteins, and proteins containing repeats are often unique to eukaryotes and eukaryotic functions (Marcotte et al., 1999). There are numerous long repeating motifs in proteins (>20 amino acids/repeat) with loose homology between repeats that form complex tertiary structures (Andrade et al., 2001). These protein repeat domains are characterized by the structures they form, as all-β (e.g. β-propellers, β-trefoils), all-α (e.g. HEAT and tetratricopeptide repeats (TPRs)), or mixed α/β (e.g. leucine-rich repeats, ankyrin repeats; Andrade et al., 2001). Although their specific functions vary, protein repeat domains typically serve as binding sites, and are thought to have evolved in eukaryotes to aid in the formation of multi-protein complexes with advanced cellular functions (Andrade et al., 2001; Kajava, 2012; Sharma and Pandey, 2015).
STRs translated into proteins are thought to have functions similar to those of these larger repeat-based protein domains, serving as sites for protein-protein interactions (Figure 1 (x); Schaefer et al., 2012; Faux, 2012). CAG repeats are enriched in coding regions and are most frequently found in the polyglutamine (polyQ) reading frame, suggesting that polyQ stretches in proteins have a native function (Schaefer et al., 2012). PolyQ stretches are enriched in proteins that are components of multi-protein complexes, and have functions in transcriptional control, phosphatidylinositol (PI) signaling, protein degradation, and chromatin remodeling. Evolutionary sequence comparison reveals that the location of polyQs within a protein is not always conserved (Schaefer et al., 2012). This suggests that polyQ stretches have evolved multiple times, and do not directly confer a protein’s function, but rather modulate the protein-protein interactions necessary for those functions (Schaefer et al., 2012; Orr, 2012). Other CG-containing STRs (e.g. CUGs and CGGs) show similar patterns of overrepresentation in coding regions (Schaefer et al., 2012), and likely serve similar complex-scaffolding functions (Nasrallah et al., 2012).
STRs when translated into proteins can be critical for proper protein folding. For example, translation of a CAG repeat in the huntingtin gene (HTT) produces a polyQ tract in the HTT protein which serves as a flexible hinge, allowing the neighboring domains to fold into close proximity (Figure 1 (xi)) (Caron et al., 2013). HTT protein structure is altered with repeat expansion, demonstrating the importance of the flexibility conferred by this STR (Caron et al., 2013).
Pathogenic consequences of STRs: A Fragile X case study
In the previous section, we summarized how STRs in DNA, RNA, and when translated into proteins can affect gene expression and protein function. In the following section, we will draw parallels from these native functions of STRs to pathogenic mechanisms in REDs (Figure 2). These parallels demonstrate how STR toxicity can be viewed as aberrancies of native processes, rather than emergent dysfunctions. For this analysis, we will largely use the Fragile X locus discussed earlier as a well-characterized case study, although many of these principles also apply to other REDs and a few specific examples are included here (reviewed in broader detail in Malik et al., 2021a; Glineburg et al., 2018; Paulson, 2018; Rodriguez and Todd, 2019).
Epigenetic and transcriptional dysfunction of STRs in DNA
The functional consequences of STRs on genome organization and transcription are evident when dysfunction is observed in REDs (Dion and Wilson, 2009; López-Martínez et al., 2020; Yin et al., 2020; Usdin, 2008). Repeat expansions can alter local genome architecture and expression of neighboring genes. A prime example is observed at a CTG repeat in the 3’UTR of the DMPK gene associated with myotonic dystrophy type 1 (DM1), expansion of which alters local chromatin structure and suppresses transcription of the neighboring gene, Six5 (Winchester et al., 1999; Brouwer et al., 2013; López Castel et al., 2011). Repeat expansions also cause global alterations in chromatin structure. CGG repeat expansions in FXS patients cause severe disruptions in chromatin boundaries (Figure 2 (i); Sun et al., 2018). These disruptions may explain delayed DNA replication (Subramanian et al., 1996), activation of DNA replication stress pathways (Chakraborty et al., 2020) and altered local DNA replication patterns (Gerhardt et al., 2014) observed at CGG repeat expansions and the Fragile X locus in particular.
As genomic repeats influence native DNA methylation, some STRs are aberrantly methylated only upon expansion (Figure 2(ii); Otten and Tapscott, 1995; Steinbach et al., 1998; Herman et al., 2006; Greene et al., 2007; Belzil et al., 2013; Xi et al., 2013). When the CGG repeat in the 5’ UTR of FMR1 expands beyond 200 repeats, it is susceptible to DNA methylation of both the CpG elements within the repeat and at a CpG element within the FMR1 promoter (Oberlé et al., 1991; Sutcliffe et al., 1992; Pieretti et al., 1991; McConkie-Rosell et al., 1993; Hansen et al., 1992; Coffee et al., 2002; Colak et al., 2014; Willemsen et al., 2002). This hypermethylation is associated with FMR1 gene silencing, with a resulting absence of FMR1 mRNA and FMRP, a critical RBP involved in synaptic plasticity and neuronal function (Oberlé et al., 1991; Hagerman et al., 2017; Quartier et al., 2017; Myrick et al., 2014). How exactly repeat expansion triggers methylation, and the relationship between expansion, methylation, and epigenetic silencing, are not fully understood, but the locus remains transcriptionally active and unmethylated in human embryonic stem cells even in the presence of very large repeat expansions, with silencing occurring during differentiation. Some studies suggest that FMR1 silencing requires co-transcriptional binding of CGG repeat mRNA directly to the FMR1 promoter region as an RNA-DNA heteroduplex (Colak et al., 2014; Groh et al., 2014).
STR expansions can enhance or inhibit mRNA production from nearby genes. At FMR1, premutation range CGG repeats which cause FXTAS or FXPOI (and which are unmethylated) result in elevated transcription of FMR1 mRNA (Tassone et al., 2000a; Tassone et al., 2000b; Entezam et al., 2007; Brouwer et al., 2008; Kenneson et al., 2001). This may result from use of additional upstream transcription start sites (Beilina et al., 2004; Tassone et al., 2011), or be associated with enrichment of acetylated histones or other chromatin activating factors at the premutation allele (Todd et al., 2010). It is possible that both hypo-expression and hyper-expression of FMR1 stem from the complex structures formed by these CGG repeats as DNA (Usdin and Woodford, 1995; Fry and Loeb, 1994; Kettani et al., 1995; Patel et al., 2000). As seen in native STRs, different structures formed by expanded STRs could facilitate or block binding of histone-modifying methylases, demethylases, acetylases, deacetylases, and even entire nucleosomes to affect downstream gene expression (Figure 2 (iii-v)) (Wang et al., 1996; Usdin and Kumari, 2015).
Repeat expansions cause defects in pre-mRNA processing and mRNA localization
The native roles of STRs in RNA in regulating splicing mirror splicing dysfunction observed in numerous REDs (Figure 2 (vi)). Splicing of the HTT huntingtin gene, which contains a CAG repeat, is altered at expanded repeats associated with Huntington’s Disease, resulting in the production of a transcript containing only exon 1 and the production of an exon 1 HTT protein (Gipson et al., 2013; Sathasivam et al., 2013; Neueder et al., 2017; Neueder et al., 2018; Franich et al., 2019). The exon 1 HTT protein is found in patient tissues and is toxic in model systems (Gipson et al., 2013; Sathasivam et al., 2013; Neueder et al., 2017; Neueder et al., 2018; Franich et al., 2019). Incomplete splicing of HTT with the CAG repeat expansion increased with overexpression and decreased with knockdown of splicing factor SRSF6. SRSF6 is predicted to bind to the 5’ end of HTT transcripts via the CAG repeat, suggesting that SRSF6-CAG repeat interactions interfere with spliceosome formation at the nearby splice site (Neueder et al., 2018).
Global splicing defects in REDs result from sequestration of critical splicing factors that bind to STR-containing RNA (López-Martínez et al., 2020; Mykowska et al., 2011; Botta et al., 2008). STRs can be binding sites for RBPs, regulating their availability throughout the cell. RNAs with longer STRs bind more RBPs, which can cause a global cellular depletion of these factors (Figure 2 (vii)) (Malik et al., 2021a; Glineburg et al., 2018; Rodriguez and Todd, 2019). In myotonic dystrophy type 1 (DM1), the expanded CTG repeat in the 3’ UTR of the DMPK gene (López-Martínez et al., 2020; Korade-Mirnics et al., 1998; Udd and Krahe, 2012) binds to muscleblind-like splicing regulator 1 protein (MBNL1) among other RBPs (Paul et al., 2011; Jiang et al., 2004; Fardaei et al., 2001; Mankodi et al., 2001; Miller et al., 2000), resulting in depletion of critical splicing factors (Botta et al., 2008; Jiang et al., 2004; Pascual et al., 2006; Jog et al., 2012) and global splicing defects (López-Martínez et al., 2020; Paul et al., 2011; Jiang et al., 2004; Du et al., 2010; Philips et al., 1998). RBP depletion by the CTG repeat in DM1 can also impact other aspects of pre-mRNA processing, including polyadenylation (Thomas et al., 2017; Goodwin et al., 2015).
In FXTAS, premutation expansion mRNA sequesters and depletes multiple RBPs that bind to the CGG repeat RNA either directly, such as DGCR8 (Sellier et al., 2013), Purα (Jin et al., 2007), and hnRNP A2/B1 (Jin et al., 2007; Iwahashi et al., 2006; Muslimov et al., 2011; Sofola et al., 2007), or indirectly via binding to CGG-bound proteins, such as Drosha (Sellier et al., 2013) and Sam68 (Sellier et al., 2010). These RBPs are involved in a variety of functions that are affected by their sequestration, including miRNA processing (DGCR8, Drosha), mRNA transport (hnRNP A2/B1, Purα) (Figure 2 (viii)), and splicing (hnRNP A2/B1, Sam68). Splicing defects have been observed in CGG premutation expansion models (Sellier et al., 2010; He et al., 2014), but compensatory overexpression of CGG repeat-sequestered RBPs (Jin et al., 2007; Sofola et al., 2007; He et al., 2014; Qurashi et al., 2011) or blocking RBP binding to CGG repeats (Disney et al., 2012; Verma et al., 2019; Verma et al., 2020; Verma et al., 2022) can improve these disease-associated defects. This pathogenic sequestration of RBPs by expanded repeats mirrors the native role for STRs in RNA as RBP reserves, mediating fine-tuned dosing of RBP availability with repeat length.
In addition to the depletion of RBPs and consequent defects in RNA splicing and localization and miRNA processing, expanded STRs in RNAs may also cause toxicity by self-association (gelation) (Figure 2(ix); Glineburg et al., 2018; Sellier et al., 2010; He et al., 2014; Jain and Vale, 2017; Ciesiolka et al., 2017; Fay et al., 2017; Tassone et al., 2004). Yet, these processes also occur on RNAs with shorter STRs that are below the pathological threshold for disease, suggesting that such phase separation properties of specific RNA motifs and their associated RBPs may exist on a spectrum from physiologic to pathologic.
RNAs containing expanded STRs can be mis-localized or retained in the nucleus instead of being transported to their functional locations in the cell (Davis et al., 1997; Mastroyiannopoulos et al., 2010; Sun et al., 2015). This may be mediated by splicing defects (Sun et al., 2015), via export-inhibiting RBP interactions (Smith et al., 2007), or via a larger dysfunction of nucleocytoplasmic transport (Zhang et al., 2015; Zhang et al., 2016; Jovičić et al., 2015; Freibaum et al., 2015; Grima et al., 2017; Gasset-Rosa et al., 2017; Sellier et al., 2017). For example, SRSF proteins bind to CGG and G4C2 repeats and appear critical to their transport out of the nucleus to the cytoplasm (Malik et al., 2021b; Hautbergue et al., 2017). In this context, lowered expression of SRSF proteins or inhibition of the SRSF protein kinase SRPK1, which regulates SRSF nuclear entry, suppresses CGG repeat exit to the cytoplasm and reduces toxicity in Drosophila and neuronal model systems (Malik et al., 2021b). Together, these studies show that expanded STRs in RNA can induce toxicity via RBP depletion or by direct RNA dysfunction.
Aberrant translation of expanded STRs
Scanning translational complexes are more likely to stall at stable secondary structures formed by expanded STRs (Figure 2 (x)), resulting in aberrant translation initiation upstream of or within the repeat in a process known as repeat-associated non-AUG (RAN) translation (Figure 2 (xi)). RAN translation produces toxic peptides that contribute to expanded STR toxicity and neurodegeneration in numerous REDs (Glineburg et al., 2018; Ciesiolka et al., 2017; Cleary and Ranum, 2017; Kearse and Todd, 2014; Kearse et al., 2016; Todd et al., 2013; Wojciechowska et al., 2014; Mori et al., 2013a; Ash et al., 2013; Mori et al., 2013b; Bañez-Coronel et al., 2015; Zu et al., 2011; Zu et al., 2017; Soragni et al., 2018; Ishiguro et al., 2017).
The mechanisms underlying RAN translation likely vary across different STRs and different genetic contexts (Malik et al., 2021a; Gao et al., 2017). At the CGG repeat of FMR1, RAN translation occurs in all three reading frames to produce polyarginine (FMRpolyR) (+0-frame relative to the AUG of FMR1), polyglycine (FMRpolyG) (+1-frame), and polyalanine (FMRpolyA) (+2-frame) peptides at different efficiencies (Kearse et al., 2016; Todd et al., 2013). STR-induced stalling of translation machinery is also responsible for a reduction in downstream production of the main protein produced by FMR1 translation, FMRP, in CGG premutation carriers (Tassone et al., 2000b; Kenneson et al., 2001).
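The three homopolymeric products described above follow directly from reading a pure CGG repeat at three different offsets. The sketch below is a minimal illustration under stated assumptions: the function name `translate_repeat` and the reduced codon table are hypothetical, the table covers only the three codons that occur in an uninterrupted CGG tract, and the offsets are counted from the first base of the repeat itself rather than from the AUG of FMR1, so flanking sequence and initiation-site choice are not modeled.

```python
# Minimal codon table: only the codons that can occur in a pure CGG repeat
# (standard genetic code: CGG = Arg, GGC = Gly, GCG = Ala).
CODON_TO_AA = {"CGG": "R", "GGC": "G", "GCG": "A"}


def translate_repeat(unit: str, copies: int, offset: int) -> str:
    """Translate `unit` repeated `copies` times, starting `offset` bases in.

    Incomplete codons at the 3' end are dropped. Illustrative helper, not a
    model of translation initiation.
    """
    seq = unit * copies
    codons = [seq[i:i + 3] for i in range(offset, len(seq) - 2, 3)]
    return "".join(CODON_TO_AA[codon] for codon in codons)


if __name__ == "__main__":
    for offset in range(3):
        # Offset 0 reads CGG (polyR), offset 1 reads GGC (polyG),
        # offset 2 reads GCG (polyA).
        print(offset, translate_repeat("CGG", 10, offset))
```

Because the repeat unit is three bases long, each offset yields a single repeated amino acid, which is why one STR can encode three distinct homopolymers depending on where translation begins.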
Repetitive proteins have pathogenic consequences
Repeat-containing peptides, produced via canonical translation of STRs in coding regions or via RAN translation, contribute to toxicity in REDs. At CGG repeats, both FMRpolyG and FMRpolyA are present within intranuclear neuronal inclusions in patient tissues (Sellier et al., 2017; Todd et al., 2013; Buijsen et al., 2014; Krans et al., 2019; Ma et al., 2019), and are toxic in model systems (Sellier et al., 2017; Todd et al., 2013; Derbis et al., 2018; Gohel et al., 2019; Hoem et al., 2019). FMRpolyG, the most abundant CGG RAN product, is necessary for CGG repeat toxicity and inclusion formation (Sellier et al., 2017; Todd et al., 2013; Oh et al., 2015) in overexpression models. Numerous RAN or homopolymeric peptides generated in other REDs are essential for their toxicity and formation of proteinaceous inclusions (Figure 2 (xii); Yamamoto et al., 2000; Schilling et al., 1999; Ordway et al., 1997; Bäuerlein et al., 2017; Paulson et al., 1997; Mizielinska et al., 2014; May et al., 2014; Zu et al., 2013). Overall, dysfunctional aggregation of repeat-derived protein products mirrors the native function of STRs in proteins as facilitators of protein-protein interactions.
Translation through large STRs that form stable secondary structures likely induces ribosome stalls and elongation errors. A growing body of work shows that disease-associated STRs undergo stall-induced translational frameshifting to produce novel chimeric polypeptides (Gaspar et al., 2000; Toulouse et al., 2005; Davies and Rubinsztein, 2006; Tabet et al., 2018; McEachin et al., 2020; Wright et al., 2022), and several of these studies have shown that these frameshift products have distinct contributions to neuronal dysfunction in disease (Tabet et al., 2018; McEachin et al., 2020; Wright et al., 2022). While there is evidence that polymeric peptides contribute to toxicity observed in REDs via aggregation, the mechanistic details of homo- and di-polymeric peptide toxicity and chimeric polypeptide toxicity remain under investigation.
Antisense transcripts contribute to REDs via multiple mechanisms
Antisense transcription from the FMR1 locus generates multiple long-noncoding asFMR1 mRNAs, with some including the repeat (Ladd et al., 2007; Khalil et al., 2008; Elizur et al., 2016; Pastori et al., 2014). One antisense transcript, FMR4, is thought to play a critical role in regulating the cell cycle and apoptosis (Khalil et al., 2008). Another antisense transcript, FMR6, is upregulated in premutation women, with increased repeat length correlating to elevated RNA levels and reduced number of oocytes, suggesting a relationship between antisense transcript expression and toxicity (Elizur et al., 2016). FMR1 antisense transcription in general is upregulated in Fragile X premutation disorders and lost in FXS, like the sense FMR1 mRNA (Ladd et al., 2007). Moreover, asFMR1 mRNAs containing the CCG repeats can undergo RAN translation, producing additional homopolymeric proteins with toxic potential (Kearse et al., 2016). STR-containing antisense transcripts likely contribute to toxicity observed in many REDs, but this is best characterized in C9ALS/FTD and SCA8, where antisense transcripts are found in toxic RNA foci and contribute to RAN peptide production (Mori et al., 2013a; Zu et al., 2011; Zu et al., 2013; Moseley et al., 2006; Gendron et al., 2013).
Mechanisms of STR toxicity reveal novel native functions of STRs
Studies over the past three decades have delineated numerous mechanisms by which repeat expansions trigger cellular toxicity. Yet, there are striking parallels between the pathologic drivers of dysfunction elicited by repeat expansions and the native functions of STRs in regulating gene expression. In this section, we provide examples of how mechanisms initially identified as causing STR toxicity directly inform our understanding of native functions of STRs more broadly. We also discuss how emergent pathogenic properties associated with repeat expansions might inform additional native functions of repeats that are not yet well understood.
RAN translation occurs at native repeat lengths and has native functions
While CGG repeats in the FMR1 gene were primarily studied for their disease association, the CGG repeat is present in all humans at nonpathogenic lengths (<55 repeats) and conserved across mammals (Eichler et al., 1995; Sellier et al., 2017). Some studies suggest phenotypes associated with low CGG repeat numbers at this allele in humans, including memory difficulties and language dysfluency (Klusek et al., 2018; Mailick et al., 2014). Our group observed that CGG RAN translation, originally thought to be an aberrant toxic event, occurs in reporters with native repeat lengths (25 repeats) (Kearse et al., 2016), suggesting CGG repeats and/or translation of those repeats may have a native function in addition to the pathogenic one. CGG RAN translation at native and expanded STRs acts as an overlapping upstream open reading frame (uORF), inhibiting translation of the downstream main ORF (mORF) and thereby reducing FMRP synthesis (Rodriguez et al., 2020). Furthermore, this RAN uORF-like regulation of FMRP synthesis was critical for facilitating translational changes associated with stimulation of metabotropic glutamate receptors (mGluRs) in neurons (Rodriguez et al., 2020).
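As a rough illustration of how repeat length at this locus might be measured from a sequence, the sketch below finds the longest uninterrupted CGG tract and compares it with the nonpathogenic range quoted above (<55 repeats). The function names and the toy sequences are hypothetical, and treating only uninterrupted tracts as "the repeat" is a simplifying assumption: real alleles can carry interruptions, and dedicated genotyping tools handle those cases.

```python
import re

# Threshold taken from the text above: nonpathogenic alleles have <55 repeats.
NONPATHOGENIC_UPPER_BOUND = 55


def longest_cgg_run(seq: str) -> int:
    """Return the number of units in the longest uninterrupted CGG tract."""
    runs = re.findall(r"(?:CGG)+", seq.upper())
    return max((len(run) // 3 for run in runs), default=0)


def is_within_nonpathogenic_range(repeat_count: int) -> bool:
    """True if the repeat count is below the <55 bound quoted in the text."""
    return repeat_count < NONPATHOGENIC_UPPER_BOUND


if __name__ == "__main__":
    # Toy sequence (hypothetical): 25 CGG units, the reporter length in the text.
    toy = "AT" + "CGG" * 25 + "TGA"
    n = longest_cgg_run(toy)
    print(n, is_within_nonpathogenic_range(n))
```

A single interrupting triplet splits the tract, so `longest_cgg_run` reports the longer pure segment rather than the total; whether that is the appropriate measure depends on the question being asked.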
Upstream open-reading frames (uORFs) are well-characterized regulatory elements in eukaryotes that influence expression of protein produced from the main open reading frame (mORF) on the same transcript, and are typically inhibitory to downstream mORF translation (Hinnebusch et al., 2016). In this way, uORFs resulting from RAN translation of STRs may play a global role in regulating mRNA translation, presenting another mechanism by which STRs influence gene expression.
STRs facilitate protein function and localization
Expanded STRs in coding regions can fundamentally change the functions of the proteins within which they reside. In spinocerebellar ataxia type 1 (SCA1), a CAG repeat expansion in the ataxin 1 (ATXN1) gene changes the localization of ATXN1 protein (Irwin et al., 2005). ATXN1 normally shuttles between the nucleus and the cytoplasm, but an expanded polyQ region shifts ATXN1 localization to the nucleus (Figure 2(xiii); Irwin et al., 2005). Aberrant nuclear localization of ATXN1 underlies dysfunction in SCA1 (Lam et al., 2006; Lai et al., 2011; Klement et al., 1998; Emamian et al., 2003; Duvick et al., 2010), as modifications that favor nuclear localization are sufficient to elicit disease relevant phenotypes in the absence of the repeat expansion in mouse models.
PolyQ-associated nuclear translocation is also central to pathology in spinal and bulbar muscular atrophy (SBMA), where ligand binding and translocation to the nucleus of the expanded polyQ-containing androgen receptor is required to elicit disease-associated transcriptional defects and cytotoxicity (Katsuno et al., 2006; Katsuno et al., 2002; Montie et al., 2009; Palazzolo et al., 2007). However, within the normal range of polyQ lengths observed in humans, androgen receptor CAG repeat size inversely correlates with the receptor’s transactivational activity and linearly correlates with infertility and decreased sperm function (Choong and Wilson, 1998; Osadchuk et al., 2022; Pan et al., 2016). These findings suggest that the CAG repeats play a normal role in testosterone-activated gene cascades that becomes aberrant at larger repeat sizes.
STRs facilitate mRNA transport to dendrites
An investigation into dendritic mRNA localization identified a localization pathway dependent on the interaction of a CGG repeat-interacting RBP, hnRNP A2, with a GA dendritic targeting element of an RNA (Muslimov et al., 2011). This GA targeting motif was competed for by CGG repeat-containing RNAs, including FMR1 mRNA. In addition to a native function of CGG repeats as a dendritic localization factor, this study revealed that elevated levels of CGG repeat mRNA caused by the CGG premutation expansion sequester hnRNP A2, resulting in global dysfunction in the transport of hnRNP A2-target mRNAs (Muslimov et al., 2011). Another study seeking to reveal transcriptome-wide impacts of C(C)UG repeat-mediated MBNL depletion on splicing in myotonic dystrophy (DM) also uncovered a global role for MBNL in mRNA localization (Wang et al., 2012).
PolyQ containing proteins regulate autophagy
Numerous REDs are caused by CAG repeat expansions, including those in the huntingtin gene in Huntington’s disease (HD) and in ataxin 3 in spinocerebellar ataxia type 3 (SCA3), with toxicity largely attributed to the aggregation of long polyQ-containing proteins. Autophagy induction results in clearance of these aggregates, attenuating their toxicity (Rubinsztein, 2006; Ravikumar et al., 2004; Menzies et al., 2010). PolyQ tracts in ataxin 3, a deubiquitinase, interact with beclin 1, a key initiator of autophagy (Ashkenazi et al., 2017). Ataxin 3 then deubiquitinates beclin 1, protecting it from degradation and permitting autophagy initiation. Ataxin 3 activity and interaction with beclin 1 are competitively inhibited by other polyQ tract-containing proteins in a length-dependent manner (Ashkenazi et al., 2017). As such, polyQ tracts may actively engage protein quality control pathways basally, but these interactions become aberrant after STR expansion, in this case inhibiting autophagy and clearance of toxic proteins. Together, these studies suggest that the pathology of disease-associated STR expansions reveals native functions of STRs, just as an improved understanding of the native functions of STRs can inform on dysfunctions in disease.
Tetranucleotide, pentanucleotide, and biallelic repeat expansion disorders
Tetranucleotide and pentanucleotide STRs are rare within coding sequences, presumably because changes in their repeat number would trigger translational frameshifts. However, they are relatively common within introns, where their expansion causes several neurological disorders that likely act through pathogenic mechanisms similar to those exhibited by non-coding trinucleotide STRs. For example, myotonic dystrophy type 2 (DM2) results from a dominantly inherited intronic CCTG STR expansion in ZNF9 (Liquori et al., 2001). CCTG STRs form RNA secondary structures that are like those generated by CTG STRs, and like the 3’ UTR CTG repeat in DM1, the DM2 repeat binds to and sequesters the RBP muscleblind (Botta et al., 2008; Paul et al., 2011; Fardaei et al., 2001; Mankodi et al., 2001; Miller et al., 2000; Du et al., 2010; Philips et al., 1998). This shared mechanism explains the significant overlap in their clinical phenotypes. Perhaps more interesting, however, is how subtle differences between these repeats underlie the phenotypic differences in these conditions. In particular, CCTG expansions in DM2 do not trigger genetic anticipation or congenital forms of disease as occurs in DM1, despite the presence of very large CCTG expansions in DM2. These phenotypic differences are thought to occur for two reasons. First, these repeats exhibit differences in how they interact with other RBPs, such as rbFOX, that modulate the effects of muscleblind sequestration (Sellier et al., 2018). Second, differences in the genic positioning (intron versus 3’ UTR) and temporal expression of the two STRs alter their relative abilities to disrupt early developmental processes (Thomas et al., 2017; Cerro-Herreros et al., 2017).
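The frameshift argument made at the start of this section is simple modular arithmetic, sketched below. The function name `frame_shift` is hypothetical, and the calculation assumes that whole repeat units are gained or lost within an open reading frame; it says nothing about how such frameshifted products would behave in a cell.

```python
def frame_shift(unit_len: int, delta_units: int) -> int:
    """Reading-frame offset (0, 1, or 2) after gaining or losing repeat units.

    `delta_units` is positive for expansions and negative for contractions.
    An offset of 0 means the downstream reading frame is preserved.
    """
    return (unit_len * delta_units) % 3


if __name__ == "__main__":
    for unit_len in range(2, 7):  # STR unit sizes of 2 to 6 bases
        shifts = [frame_shift(unit_len, d) for d in (1, 2, 3)]
        print(unit_len, shifts)
```

Only unit lengths divisible by three (tri- and hexanucleotide repeats) preserve the frame for every change in copy number. Tetra- and pentanucleotide repeats preserve it only when the copy number changes by a multiple of three units, consistent with their rarity in coding sequence.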
An intriguing feature observed in multiple pentanucleotide repeat expansion disorders, including complex TTTTA and TTTCA repeats that cause benign adult familial myoclonic epilepsy (BAFME) in multiple genes (Ishiura et al., 2018), ATTTC repeats in spinocerebellar ataxia (SCA) type 31 (Sato et al., 2009), and AAGGG repeats that cause cerebellar ataxia, neuropathy, vestibular areflexia syndrome (CANVAS) (Nakamura et al., 2020; Cortese et al., 2019; Tsuchiya et al., 2020), is that the pathogenic alleles represent non-reference STRs. That is, the repeat element is not only expanded in size, but it has a different sequence than the normal allele. For example, in CANVAS, an AAAAG pentanucleotide STR normally resides within the first intron of RFC1. However, the pathological repeat is a qualitatively different and expanded AAGGG repeat. Moreover, CANVAS can also occur with a third pentanucleotide repeat, ACAGG, at this same genomic location. In all these cases, these pentanucleotide repeats occur within the polyA region of an Alu transposable element. Active Alu transposition requires pure polyA elements at their 3’ ends (Deininger, 2011). As such, there is strong evolutionary selection pressure favoring mutations that disrupt these pure polyA sequences. This suggests that both the reference and non-reference STRs arose initially through a protective process that disrupted the polyA element and prevented continued transposition. However, stochastic differences in these interrupting mutations created some STRs that were more prone to expansion, resulting in pathogenic alleles that either create toxic STR RNAs or that interfere with local gene expression.
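Deciding whether an allele carries a non-reference motif requires comparing repeat units independently of where the tract happens to start, since a tandem array of AAAAG read one base later looks like AAAGA. One common convention, sketched here with hypothetical helper names, is to reduce each motif to its lexicographically smallest rotation before comparison. This sketch ignores strand (reverse complements are treated as different motifs), which is an assumption made only to keep the example short.

```python
def canonical_rotation(motif: str) -> str:
    """Return the lexicographically smallest cyclic rotation of a motif."""
    motif = motif.upper()
    return min(motif[i:] + motif[:i] for i in range(len(motif)))


def same_repeat_unit(a: str, b: str) -> bool:
    """True if two motifs describe the same tandem repeat up to rotation."""
    return len(a) == len(b) and canonical_rotation(a) == canonical_rotation(b)


if __name__ == "__main__":
    reference = "AAAAG"  # reference RFC1 motif named in the text
    for motif in ("AAGAA", "AAGGG", "ACAGG"):
        print(motif, canonical_rotation(motif), same_repeat_unit(reference, motif))
```

Under this convention, AAGAA collapses to the reference AAAAG motif, whereas AAGGG and ACAGG remain distinct from the reference and from each other, matching the description of these alleles as qualitatively different repeats rather than longer copies of the reference.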
Conclusions and open questions
Native functions of STRs from an evolutionary perspective
Evolutionary pressures on STR copy number predict that repeat expansions will be either tolerated or selected for until an upper, deleterious limit is reached. If STRs were intrinsically deleterious, then there would be selective pressure towards repeat contractions, leading to global reductions in STR size or their selective elimination. However, several recent studies suggest that there is selective STR expansion across phylogeny, especially in primates. This is particularly true in 5’ UTRs and coding regions, where constraints on repeat expansion and contraction are greatest (Gymrek et al., 2017). At the same time, many intergenic STRs show correlations between their size and the expression of neighboring genes. These expression STRs (eSTRs) contribute meaningfully to population variance in gene expression profiles and to disease-associated expression quantitative trait loci (eQTLs) in human populations (Fotsing et al., 2019; Gymrek et al., 2017). In general, these eSTRs are largely unconstrained unless neighboring or embedded within a gene already under strong constraint. This mutation-selection balance suggests some inherent native functions of STRs within transcript and protein space while also implying that intrinsic STR instability may allow for more rapid variation and acquisition of traits through local perturbations in gene expression than could be accomplished through single nucleotide mutations (Figure 3A). The highly variable methylation status and mRNA and protein expression patterns elicited by differences in FMR1 CGG repeat lengths typify the potential for repeat variation within populations to influence gene expression (Figure 3B). As even subtle changes in repeat size can tune gene expression and protein function and have downstream impacts on simple and complex phenotypes, they may be an important component of the genetic differences between humans and other species, and among humans themselves.
Revisiting our understanding of and approach to Repeat Expansion Disorders
Historically, repetitive elements within human genomes have been viewed as mostly unregulated ‘junk DNA’ that is not under selective evolutionary pressure. In this view, expansions of these repetitive elements are unfortunate accidents that become apparent and important only when they elicit highly penetrant and syndromic human diseases. Consistent with this line of reasoning, the field of REDs has largely focused on emergent toxic mechanisms as drivers of disease only in the setting of large STR expansions rather than considering their pathology as alterations in the native functions played by these repeats in their normal genomic contexts. Here, we propose re-framing the discussion around repetitive elements in general, and STRs in particular, within human genomes. For each STR associated with a human disease, we suggest first considering whether it has any native functions at its ‘normal’ size. If a native function exists, then expansion of the STR can be viewed primarily as an aberrancy of that native function, with coincident predictable impacts on gene expression dysregulation above certain repeat lengths. This reframing aligns with the approach typically taken in studying gain-of-function and loss-of-function mutations in disease-associated single amino acid mutations and better ties the native functions of STRs to their pathology. It also suggests that shared regulatory rules will likely apply across REDs.
This approach to thinking about REDs leads to specific predictions. First, we predict that more REDs will be discovered in the future. For example, multiple recently described REDs are linked to CGG repeats, including neuronal intranuclear inclusion disease (NIID) (Ishiura et al., 2019; Sone et al., 2019), oculopharyngodistal myopathy (OPDM) and leukodystrophy (OPML) (Ishiura et al., 2019; Deng et al., 2020; Ogasawara et al., 2020; Tian et al., 2019), adult onset leukoencephalopathy (Okubo et al., 2019), and autism/intellectual disability (Annear et al., 2022). Most of these new CGG repeatopathies reside within 5’ UTRs, like the CGG repeat in FMR1, and there is already evidence that the disease mechanisms triggered by these new repeats converge with those already established in Fragile X disorders. In one particularly notable example, a CGG repeat expansion in NOTCH2NLC leads to the creation of an AUG-initiated upstream open reading frame in the 5’ UTR that generates a polyglycine-containing protein akin to FMRpolyG in FMR1 (Liu et al., 2022; Boivin et al., 2021). This polyglycine protein is found within inclusions in patients with NIID and its generation is required to trigger inclusion formation and behavioral phenotypes in a mouse model of NOTCH2NLC-associated NIID. As such, we know that this motif in this location within neuronally expressed genes can elicit dysfunction through predictable mechanisms. This means that we should expect other CGG repeat expansions to emerge that mirror the pathologic processes established for the FMR1 locus and now extended to a large set of loci. Similarly, given evidence that the CGG repeat in the FMR1 5’ UTR can serve as a functional element that regulates transcription, mRNA localization and translation, we predict that native CGG repeat elements in these disease-associated alleles may have normal functions akin to those observed for FMR1, and as such represent a functional motif shared among many genes.
However, these new REDs may not all fit the typical model observed to date, where highly penetrant STR expansions lead to syndromic disorders. Instead, smaller changes in repeat size at multiple loci, impacting expression of the genes in which they reside or neighboring genes, will serve as risk alleles for common conditions. This risk-allele model is already apparent, as intermediate CAG repeat expansions in ATXN1, ATXN2, and HTT are associated with sporadic ALS and some other common neurodegenerative disorders (Elden et al., 2010; Rosas et al., 2020). Indeed, a fair proportion of the unexplained signal within genome-wide association studies (GWAS) can be explained by variations within neighboring STRs (Gymrek et al., 2016; Gymrek, 2017; Hannan, 2018). To date, numerous STR variants have been linked to ASD (Mitra et al., 2021; Trost et al., 2020) and schizophrenia (Mojarad et al., 2022). As PCR-free and long-read whole genome data become more abundant and available (reviewed in Mitsuhashi and Matsumoto, 2020), it will become increasingly easy to detect these dynamic repeat size/disease relationships, creating a whole new class of STR-associated conditions that will likely extend beyond neurological conditions.
Second, we predict that long-read whole genome sequencing datasets will improve our understanding of the native roles of STRs in humans, and reveal a ubiquitous impact of repeat length variation on gene expression. Once we create accurate maps of STR variation across the genome and link this variation to neighboring gene loci expression, we will be able to better discern the mechanisms by which STRs influence gene expression across cell types. We predict that many genes whose expression is affected by neighboring repeat length variation will play critical functions in the nervous system. Most known REDs present with neurological symptoms. If REDs reflect the native functions of STRs, then the overrepresentation of neurological dysfunctions linked to STR expansions suggests that STRs may play roles relevant to neuronal health and function. It is also possible that neurons, as terminally differentiated cells, may be more prone to somatic instability, leading to repeat expansion and the emergence of associated dysfunction with age.
Finally, we predict that the native functions of STRs will inform our understanding of how STR expansions cause disease and vice versa. A deeper understanding of the native functions of both disease-associated STRs and STRs in general will reveal the pathways altered in REDs, and these pathways may be areas for therapeutic intervention that can be applicable across all REDs. By studying the mechanisms by which STRs elicit disease, we can also surmise key elements of how they might function normally within nervous systems (see examples in previous section, “Mechanisms of STR toxicity reveal novel native functions of STRs”). Ultimately, research into native functions of STRs will reveal both mechanisms by which they regulate neuronal function and therapeutic targets by which their toxicity in REDs can be mitigated.
References
- Fragile X premutation is a significant risk factor for premature ovarian failure: the International collaborative POF in fragile X study -- preliminary data. American Journal of Medical Genetics 83:322–325.
- Protein repeats: structures, functions, and evolution. Journal of Structural Biology 134:117–131. https://doi.org/10.1006/jsbi.2001.4392
- Efficient translation initiation dictates codon usage at gene start. Molecular Systems Biology 9:675. https://doi.org/10.1038/msb.2013.32
- A ubiquitous and conserved signal for RNA localization in chordates. Current Biology 12:1756–1761. https://doi.org/10.1016/s0960-9822(02)01220-4
- Mechanisms of alternative pre-messenger RNA splicing. Annual Review of Biochemistry 72:291–336. https://doi.org/10.1146/annurev.biochem.72.121801.161720
- CpG island chromatin is shaped by recruitment of ZF-CxxC proteins. Cold Spring Harbor Perspectives in Biology 5:a018648. https://doi.org/10.1101/cshperspect.a018648
- Transposable elements in gene regulation and in the evolution of vertebrate genomes. Current Opinion in Genetics & Development 19:607–612. https://doi.org/10.1016/j.gde.2009.10.013
- Fragile X premutation carriers: characteristic MR imaging findings of adult male patients with progressive cerebellar and cognitive dysfunction. AJNR. American Journal of Neuroradiology 23:1757–1766.
- Hpr1 is preferentially required for transcription of either long or G+C-rich DNA sequences in Saccharomyces cerevisiae. Molecular and Cellular Biology 21:7054–7064. https://doi.org/10.1128/MCB.21.20.7054-7064.2001
- Trinucleotide repeats in the human androgen receptor: a molecular basis for disease. Journal of Molecular Endocrinology 21:235–257. https://doi.org/10.1677/jme.0.0210235
- Fragile X syndrome. The Journal of Pediatrics 110:821–831. https://doi.org/10.1016/s0022-3476(87)80392-x
- Structural characteristics of simple RNA repeats associated with disease and their deleterious protein interactions. Frontiers in Cellular Neuroscience 11:97. https://doi.org/10.3389/fncel.2017.00097
- New developments in RAN translation: insights from multiple diseases. Current Opinion in Genetics & Development 44:125–134. https://doi.org/10.1016/j.gde.2017.03.006
- Histone modifications depict an aberrantly heterochromatinized FMR1 gene in fragile X syndrome. American Journal of Human Genetics 71:923–932. https://doi.org/10.1086/342931
- The 3D folding of metazoan genomes correlates with the association of similar repetitive elements. Nucleic Acids Research 44:245–255. https://doi.org/10.1093/nar/gkv1292
- Polyalanine and polyserine frameshift products in Huntington’s disease. Journal of Medical Genetics 43:893–896. https://doi.org/10.1136/jmg.2006.044222
- CpG islands and the regulation of transcription. Genes & Development 25:1010–1022. https://doi.org/10.1101/gad.2037511
- Fragile sites and human disease. Human Molecular Genetics 16 Spec No. 2:R150–R158. https://doi.org/10.1093/hmg/ddm136
- Expansion of GGC repeat in GIPC1 is associated with oculopharyngodistal myopathy. American Journal of Human Genetics 106:793–804. https://doi.org/10.1016/j.ajhg.2020.04.011
- Weak 5’-mRNA secondary structures in short eukaryotic genes. Genome Biology and Evolution 4:1046–1053. https://doi.org/10.1093/gbe/evs082
- Instability and chromatin structure of expanded trinucleotide repeats. Trends in Genetics 25:288–297. https://doi.org/10.1016/j.tig.2009.04.007
- Aberrant alternative splicing and extracellular matrix gene expression in mouse models of myotonic dystrophy. Nature Structural & Molecular Biology 17:187–193. https://doi.org/10.1038/nsmb.1720
- Evolution of the cryptic FMR1 CGG repeat. Nature Genetics 11:301–308. https://doi.org/10.1038/ng1195-301
- FMR6 may play a role in the pathogenesis of fragile X-associated premature ovarian insufficiency. Gynecological Endocrinology 32:334–337. https://doi.org/10.3109/09513590.2015.1116508
- High-copy-number expression of Sub2p, a member of the RNA helicase superfamily, suppresses hpr1-mediated genomic instability. Molecular and Cellular Biology 21:5459–5470. https://doi.org/10.1128/MCB.21.16.5459-5470.2001
- In vivo co-localisation of MBNL protein with DMPK expanded-repeat transcripts. Nucleic Acids Research 29:2766–2771. https://doi.org/10.1093/nar/29.13.2766
- Single amino acid and trinucleotide repeats: function and evolution. Advances in Experimental Medicine and Biology 769:26–40. https://doi.org/10.1007/978-1-4614-5434-2_3
- The impact of short tandem repeat variation on gene expression. Nature Genetics 51:1652–1659. https://doi.org/10.1038/s41588-019-0521-9
- Phenotype onset in Huntington’s disease knock-in mice is correlated with the incomplete splicing of the mutant huntingtin gene. Journal of Neuroscience Research 97:1590–1605. https://doi.org/10.1002/jnr.24493
- Pervasive cis effects of variation in copy number of large tandem repeats on local DNA methylation and gene expression. American Journal of Human Genetics 108:809–824. https://doi.org/10.1016/j.ajhg.2021.03.016
- CAG tract of MJD-1 may be prone to frameshifts causing polyalanine accumulation. Human Molecular Genetics 9:1957–1966. https://doi.org/10.1093/hmg/9.13.1957
- Repetitive DNA: genomic dark matter matters. Nature Reviews. Genetics 22:342. https://doi.org/10.1038/s41576-021-00354-8
- FMRpolyG alters mitochondrial transcripts level and respiratory chain complex assembly in fragile X associated tremor/ataxia syndrome. Biochimica et Biophysica Acta. Molecular Basis of Disease 1865:1379–1388. https://doi.org/10.1016/j.bbadis.2019.02.010
- A long purine-pyrimidine homopolymer acts as a transcriptional diode. The Journal of Biological Chemistry 270:1791–1797. https://doi.org/10.1074/jbc.270.4.1791
- Mechanisms of transcriptional dysregulation in repeat expansion disorders. Biochemical Society Transactions 42:1123–1128. https://doi.org/10.1042/BST20140049
- A genomic view of short tandem repeats. Current Opinion in Genetics & Development 44:9–16. https://doi.org/10.1016/j.gde.2017.01.012
- Interpreting short tandem repeat variations in humans using mutational constraint. Nature Genetics 49:1495–1501. https://doi.org/10.1038/ng.3952
- An analysis of autism in fifty males with the fragile X syndrome. American Journal of Medical Genetics 23:359–374. https://doi.org/10.1002/ajmg.1320230128
- Book: Fragile X Syndrome: Diagnosis, Treatment, and Research. United States: Taylor & Francis. https://doi.org/10.56021/9780801868436
- The fragile X premutation: into the phenotypic fold. Current Opinion in Genetics & Development 12:278–283. https://doi.org/10.1016/s0959-437x(02)00299-x
- Fragile X-associated Tremor/Ataxia Syndrome (FXTAS). Mental Retardation and Developmental Disabilities Research Reviews 10:25–30. https://doi.org/10.1002/mrdd.20005
- Fragile X-associated tremor/ataxia syndrome. Annals of the New York Academy of Sciences 1338:58–70. https://doi.org/10.1111/nyas.12693
- Tandem repeats mediating genetic plasticity in health and disease. Nature Reviews. Genetics 19:286–298. https://doi.org/10.1038/nrg.2017.115
- Methylation analysis of CGG sites in the CpG island of the human FMR1 gene. Human Molecular Genetics 1:571–578. https://doi.org/10.1093/hmg/1.8.571
- TDP-43 suppresses CGG repeat-induced neurotoxicity through interactions with hnRNP A2/B1. Human Molecular Genetics 23:5036–5051. https://doi.org/10.1093/hmg/ddu216
- Histone deacetylase inhibitors reverse gene silencing in Friedreich’s ataxia. Nature Chemical Biology 2:551–558. https://doi.org/10.1038/nchembio815
-
Regulation of alternative pre-mRNA splicing by a novel repeated hexanucleotide elementGenes & Development 8:1561–1574.https://doi.org/10.1101/gad.8.13.1561
-
RNA association and nucleocytoplasmic shuttling by ataxin-1Journal of Cell Science 118:233–242.https://doi.org/10.1242/jcs.01611
-
Fragile X premutation tremor/ataxia syndrome: molecular, clinical, and neuroimaging correlatesAmerican Journal of Human Genetics 72:869–878.https://doi.org/10.1086/374321
-
BookChapter 12 - Assessing epigenetic informationIn: Tollefsbol T, editors. Handbook of Epigenetics. San Diego: Academic Press. pp. 173–181.https://doi.org/10.1016/B978-0-12-375709-8.00012-5
-
Tandem repeats in proteins: from sequence to structureJournal of Structural Biology 179:279–288.https://doi.org/10.1016/j.jsb.2011.08.009
-
Solution structure of a DNA quadruplex containing the fragile X syndrome triplet repeatJournal of Molecular Biology 254:638–656.https://doi.org/10.1006/jmbi.1995.0644
-
Myotonic dystrophy: molecular windows on a complex etiologyNucleic Acids Research 26:1363–1368.https://doi.org/10.1093/nar/26.6.1363
-
Neuropathology of RAN translation proteins in fragile X-associated tremor/ataxia syndrome. Acta Neuropathologica Communications 7:152. https://doi.org/10.1186/s40478-019-0782-7
-
Triplet repeat RNA structure and its role as pathogenic agent and therapeutic target. Nucleic Acids Research 40:11–26. https://doi.org/10.1093/nar/gkr729
-
14-3-3 binding to ataxin-1 (ATXN1) regulates its dephosphorylation at Ser-776 and transport to the nucleus. The Journal of Biological Chemistry 286:34606–34616. https://doi.org/10.1074/jbc.M111.238527
-
The fragile X premutation presenting as essential tremor. Archives of Neurology 60:117–121. https://doi.org/10.1001/archneur.60.1.117
-
Coding tandem repeats generate diversity in Aspergillus fumigatus genes. Eukaryotic Cell 6:1380–1391. https://doi.org/10.1128/EC.00229-06
-
Alternative splicing of the fibronectin EIIIB exon depends on specific TGCATG repeats. Molecular and Cellular Biology 18:3900–3906. https://doi.org/10.1128/MCB.18.7.3900
-
Myotonic dystrophy type 2: human founder haplotype and evolutionary conservation of the repeat tract. American Journal of Human Genetics 73:849–862. https://doi.org/10.1086/378720
-
Composition of the intranuclear inclusions of fragile X-associated tremor/ataxia syndrome. Acta Neuropathologica Communications 7:143. https://doi.org/10.1186/s40478-019-0796-1
-
Low-normal FMR1 CGG repeat length: phenotypic associations. Frontiers in Genetics 5:309. https://doi.org/10.3389/fgene.2014.00309
-
Distinctive structural motifs of RNA G-quadruplexes composed of AGG, CGG and UGG trinucleotide repeats. Nucleic Acids Research 42:10196–10207. https://doi.org/10.1093/nar/gku710
-
Molecular mechanisms underlying nucleotide repeat expansion disorders. Nature Reviews. Molecular Cell Biology 22:589–607. https://doi.org/10.1038/s41580-021-00382-6
-
SRSF protein kinase 1 modulates RAN translation and suppresses CGG repeat toxicity. EMBO Molecular Medicine 13:e14163. https://doi.org/10.15252/emmm.202114163
-
Muscleblind localizes to nuclear foci of aberrant RNA in myotonic dystrophy types 1 and 2. Human Molecular Genetics 10:2165–2170. https://doi.org/10.1093/hmg/10.19.2165
-
A census of protein repeats. Journal of Molecular Biology 293:151–160. https://doi.org/10.1006/jmbi.1999.3136
-
A pedigree of mental defect showing sex-linkage. Journal of Neurology and Psychiatry 6:154–157. https://doi.org/10.1136/jnnp.6.3-4.154
-
Tackling the pathogenesis of RNA nuclear retention in myotonic dystrophy. Biology of the Cell 102:515–523. https://doi.org/10.1042/BC20100040
-
Evidence that methylation of the FMR-I locus is responsible for variable phenotypic expression of the fragile X syndrome. American Journal of Human Genetics 53:800–809.
-
Long-read sequencing for rare human genetic diseases. Journal of Human Genetics 65:11–19. https://doi.org/10.1038/s10038-019-0671-8
-
Genome-wide tandem repeat expansions contribute to schizophrenia risk. Molecular Psychiatry 27:3692–3698. https://doi.org/10.1038/s41380-022-01575-x
-
FRAXE and mental retardation. Journal of Medical Genetics 32:162–169. https://doi.org/10.1136/jmg.32.3.162
-
Regulation of fibronectin EDA exon alternative splicing: possible role of RNA secondary structure for enhancer display. Molecular and Cellular Biology 19:2657–2671. https://doi.org/10.1128/MCB.19.4.2657
-
Reproductive and menstrual history of females with fragile X expansions. European Journal of Human Genetics 8:247–252. https://doi.org/10.1038/sj.ejhg.5200451
-
No evidence for parent of origin influencing premature ovarian failure in fragile X premutation carriers. American Journal of Human Genetics 67:253–254. https://doi.org/10.1086/302963
-
Spatial code recognition in neuronal RNA targeting: role of RNA-hnRNP A2 interactions. The Journal of Cell Biology 194:441–457. https://doi.org/10.1083/jcb.201010027
-
CAG repeats mimic CUG repeats in the misregulation of alternative splicing. Nucleic Acids Research 39:8938–8951. https://doi.org/10.1093/nar/gkr608
-
Fragile X syndrome due to a missense mutation. European Journal of Human Genetics 22:1185–1189. https://doi.org/10.1038/ejhg.2013.311
-
Differential effects of a polyalanine tract expansion in Arx on neural development and gene expression. Human Molecular Genetics 21:1090–1098. https://doi.org/10.1093/hmg/ddr538
-
Regulatory mechanisms of incomplete huntingtin mRNA splicing. Nature Communications 9:3955. https://doi.org/10.1038/s41467-018-06281-3
-
CGG expansion in NOTCH2NLC is associated with oculopharyngodistal myopathy with neurological manifestations. Acta Neuropathologica Communications 8:204. https://doi.org/10.1186/s40478-020-01084-4
-
GGC repeat expansion of NOTCH2NLC in adult patients with leukoencephalopathy. Annals of Neurology 86:962–968. https://doi.org/10.1002/ana.25586
-
Polyglutamine neurodegeneration: expanded glutamines enhance native functions. Current Opinion in Genetics & Development 22:251–255. https://doi.org/10.1016/j.gde.2012.01.001
-
Androgen receptor gene CAG repeat length varies and affects semen quality in an ethnic-specific fashion in young men from Russia. International Journal of Molecular Sciences 23:10594. https://doi.org/10.3390/ijms231810594
-
Akt blocks ligand binding and protects against expanded polyglutamine androgen receptor toxicity. Human Molecular Genetics 16:1593–1603. https://doi.org/10.1093/hmg/ddm109
-
Losing DNA methylation at repetitive elements and breaking bad. Epigenetics & Chromatin 14:25. https://doi.org/10.1186/s13072-021-00400-z
-
The muscleblind family of proteins: an emerging class of regulators of developmentally programmed alternative splicing. Differentiation; Research in Biological Diversity 74:65–80. https://doi.org/10.1111/j.1432-0436.2006.00060.x
-
Cation-dependent conformational switches in d-TGGCGGC containing two triplet repeats of fragile X syndrome: NMR observations. Biochemical and Biophysical Research Communications 278:833–838. https://doi.org/10.1006/bbrc.2000.3878
-
Expanded CUG repeats dysregulate RNA splicing by altering the stoichiometry of the muscleblind 1 complex. The Journal of Biological Chemistry 286:38427–38438. https://doi.org/10.1074/jbc.M111.255224
-
Chapter 9 - Repeat expansion diseases. In: Geschwind DH, Paulson HL, Klein C, editors. Handbook of Clinical Neurology, Neurogenetics, Part I. Elsevier. pp. 105–123. https://doi.org/10.1016/B978-0-444-63233-3.00009-9
-
A genomic portrait of human microsatellite variation. Molecular Biology and Evolution 28:303–312. https://doi.org/10.1093/molbev/msq198
-
Intragenic FMR1 disease-causing variants: a significant mutational mechanism leading to fragile-X syndrome. European Journal of Human Genetics 25:423–431. https://doi.org/10.1038/ejhg.2016.204
-
New pathologic mechanisms in nucleotide repeat expansion disorders. Neurobiology of Disease 130:104515. https://doi.org/10.1016/j.nbd.2019.104515
-
Spinocerebellar ataxia type 31 is associated with “inserted” penta-nucleotide repeats containing (TGGAA)n. American Journal of Human Genetics 85:544–557. https://doi.org/10.1016/j.ajhg.2009.09.019
-
Evolution and function of CAG/polyglutamine repeats in protein-protein interaction networks. Nucleic Acids Research 40:4273–4287. https://doi.org/10.1093/nar/gks011
-
The molecular basis of common and rare fragile sites. Cancer Letters 232:13–26. https://doi.org/10.1016/j.canlet.2005.07.039
-
Expansion and function of repeat domain proteins during stress and development in plants. Frontiers in Plant Science 6:1218. https://doi.org/10.3389/fpls.2015.01218
-
The marker (X) syndrome: a cytogenetic and genetic analysis. Annals of Human Genetics 48:21–37. https://doi.org/10.1111/j.1469-1809.1984.tb00830.x
-
Defining early steps in mRNA transport: mutant mRNA in myotonic dystrophy type I is blocked at entry into SC-35 domains. The Journal of Cell Biology 178:951–964. https://doi.org/10.1083/jcb.200706048
-
RNA structure of trinucleotide repeats associated with human neurological diseases. Nucleic Acids Research 31:5469–5482. https://doi.org/10.1093/nar/gkg766
-
Structural diversity of triplet repeat RNAs. Journal of Biological Chemistry 285:12755–12764. https://doi.org/10.1074/jbc.M109.078790
-
Amount of RNA secondary structure required to induce an alternative splice. Molecular and Cellular Biology 7:3194–3198. https://doi.org/10.1128/mcb.7.9.3194-3198.1987
-
Repeat-associated non-ATG (RAN) translation in Fuchs’ endothelial corneal dystrophy. Investigative Ophthalmology & Visual Science 59:1888. https://doi.org/10.1167/iovs.17-23265
-
The DMPK gene of severely affected myotonic dystrophy patients is hypermethylated proximal to the largely expanded CTG repeat. American Journal of Human Genetics 62:278–285. https://doi.org/10.1086/301711
-
Large domains of apparent delayed replication timing associated with triplet repeat expansion at FRAXA and FRAXE. American Journal of Human Genetics 59:407–416.
-
DNA methylation represses FMR-1 transcription in fragile X syndrome. Human Molecular Genetics 1:397–400. https://doi.org/10.1093/hmg/1.6.397
-
Benefits and limitations of genome-wide association studies. Nature Reviews. Genetics 20:467–484. https://doi.org/10.1038/s41576-019-0127-1
-
Elevated levels of FMR1 mRNA in carrier males: a new mechanism of involvement in the fragile-X syndrome. American Journal of Human Genetics 66:6–15. https://doi.org/10.1086/302720
-
Differential usage of transcriptional start sites and polyadenylation sites in FMR1 premutation alleles. Nucleic Acids Research 39:6172–6185. https://doi.org/10.1093/nar/gkr100
-
Disrupted prenatal RNA processing and myogenesis in congenital myotonic dystrophy. Genes & Development 31:1122–1133. https://doi.org/10.1101/gad.300590.117
-
Expansion of human-specific GGC repeat in neuronal intranuclear inclusion disease-related disorders. American Journal of Human Genetics 105:166–176. https://doi.org/10.1016/j.ajhg.2019.05.013
-
Ribosomal frameshifting on MJD-1 transcripts with long CAG tracts. Human Molecular Genetics 14:2649–2660. https://doi.org/10.1093/hmg/ddi299
-
RFC1 repeat expansion in Japanese patients with late-onset cerebellar ataxia. Journal of Human Genetics 65:1143–1147. https://doi.org/10.1038/s10038-020-0807-x
-
Predicted changes in pre-mRNA secondary structure vary in their association with exon skipping for mutations in exons 2, 4, and 8 of the HPRT gene and exon 51 of the fibrillin gene. Mutation Research/Mutation Research Genomics 432:15–32. https://doi.org/10.1016/S1383-5726(99)00011-4
-
The myotonic dystrophies: molecular, clinical, and therapeutic challenges. The Lancet. Neurology 11:891–905. https://doi.org/10.1016/S1474-4422(12)70204-1
-
Genome-wide association studies. Nature Reviews Methods Primers 1:1–21. https://doi.org/10.1038/s43586-021-00056-9
-
Intragenic tandem repeats generate functional variability. Nature Genetics 37:986–990. https://doi.org/10.1038/ng1618
-
Long CCG triplet repeat blocks exclude nucleosomes: a possible mechanism for the nature of fragile sites in chromosomes. Journal of Molecular Biology 263:511–516. https://doi.org/10.1006/jmbi.1996.0593
-
Chromatin structure of repeating CTG/CAG and CGG/CCG sequences in human disease. Frontiers in Bioscience 12:4731–4741. https://doi.org/10.2741/2422
-
The landscape of human STR variation. Genome Research 24:1894–1904. https://doi.org/10.1101/gr.177774.114
-
Population-scale sequencing data enable precise estimates of Y-STR mutation rates. American Journal of Human Genetics 98:919–933. https://doi.org/10.1016/j.ajhg.2016.04.001
-
CGG repeats trigger translational frameshifts that generate aggregation-prone chimeric proteins. Nucleic Acids Research 50:8674–8689. https://doi.org/10.1093/nar/gkac626
-
Hypermethylation of the CpG island near the G4C2 repeat in ALS with a C9orf72 expansion. American Journal of Human Genetics 92:981–989. https://doi.org/10.1016/j.ajhg.2013.04.017
Article and author information
Funding
National Institute of Neurological Disorders and Stroke (F31NS113513)
- Shannon E Wright
Eunice Kennedy Shriver National Institute of Child Health and Human Development (P50HD104463)
- Peter K Todd
National Institute of Neurological Disorders and Stroke (R01NS099280)
- Peter K Todd
National Institute of Neurological Disorders and Stroke (R01NS086810)
- Peter K Todd
Veterans Administration Medical Center (BX004842)
- Peter K Todd
The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.
Acknowledgements
We thank current and former members of the Todd lab for helpful discussions and commentary. This work was funded by grants from the NIH to SEW (T32-NS076401 and NRSA F31NS113513) and PKT (P50HD104463, R01NS099280 and R01NS086810). PKT was also supported by the VA (BLRD BX004842) and by private philanthropic support.
Copyright
This is an open-access article, free of all copyright, and may be freely reproduced, distributed, transmitted, modified, built upon, or otherwise used by anyone for any lawful purpose. The work is made available under the Creative Commons CC0 public domain dedication.