
Widespread premature transcription termination of Arabidopsis thaliana NLR genes by the spen protein FPA

  1. Matthew T Parker
  2. Katarzyna Knop
  3. Vasiliki Zacharaki
  4. Anna V Sherwood
  5. Daniel Tomé
  6. Xuhong Yu
  7. Pascal GP Martin
  8. Jim Beynon
  9. Scott D Michaels
  10. Geoffrey J Barton
  11. Gordon G Simpson (corresponding author)
  1. School of Life Sciences, University of Dundee, United Kingdom
  2. School of Life Sciences, University of Warwick, United Kingdom
  3. Department of Biology, Indiana University, United States
  4. The James Hutton Institute, United Kingdom
Research Article
Cite this article as: eLife 2021;10:e65537 doi: 10.7554/eLife.65537

Abstract

Genes involved in disease resistance are some of the fastest evolving and most diverse components of genomes. Large numbers of nucleotide-binding, leucine-rich repeat (NLR) genes are found in plant genomes and are required for disease resistance. However, NLRs can trigger autoimmunity, disrupt beneficial microbiota or reduce fitness. It is therefore crucial to understand how NLRs are controlled. Here, we show that the RNA-binding protein FPA mediates widespread premature cleavage and polyadenylation of NLR transcripts, thereby controlling their functional expression and impacting immunity. Using long-read Nanopore direct RNA sequencing, we resolved the complexity of NLR transcript processing and gene annotation. Our results uncover a co-transcriptional layer of NLR control with implications for understanding the regulatory and evolutionary dynamics of NLRs in the immune responses of plants.

Introduction

In plants and animals, NLR (nucleotide-binding, leucine-rich repeat) proteins function to detect the presence and activity of pathogens (Barragan and Weigel, 2020; Jones et al., 2016; Tamborski and Krasileva, 2020). Plant genomes can encode large numbers of NLR genes, which often occur in physical clusters (Jiao and Schneeberger, 2020; Wei et al., 2016). Powerful selective pressure drives the rapid birth and death of NLR genes, resulting in intraspecific diversity in NLR alleles and gene number. Consequently, the near-complete repertoire of Arabidopsis NLR genes was only recently revealed using long-read DNA sequencing of diverse Arabidopsis accessions (Van de Weyer et al., 2019).

In plants, NLR proteins generally comprise an N-terminal Toll/interleukin receptor (TIR), coiled-coil (CC) or RPW8 domain that facilitates signalling; a central nucleotide-binding NB-ARC domain that acts as a molecular switch; and C-terminal leucine-rich repeats (LRRs) that interact with target proteins. NLRs can recognise pathogen effectors either directly by binding to them through LRR domains or indirectly by detecting modifications to host proteins caused by effector action. In some cases, domains of host proteins targeted by pathogen effectors have been incorporated into NLRs as integrated domains (or decoys) (Le Roux et al., 2015). NLRs that interact directly with effectors are under high levels of diversifying selection to modify their recognition specificities, resulting in significant allelic polymorphism (Prigozhin and Krasileva, 2021). Genomic variation also yields diversity in NLR protein organisation, through domain swapping or truncating mutations, and NLR isoforms that lack NB-ARC or LRR domains can function in plant immune responses (Nishimura et al., 2015; Swiderski et al., 2009; Zhang and Gassmann, 2007). The consequence of this diversity is that there is no one-size-fits-all explanation of how NLR proteins function (Barragan and Weigel, 2020).

The benefit of NLRs to the host is disease resistance, but the costs of increased NLR diversity or activity can include detrimental autoimmunity (Rodriguez et al., 2016), reduced association with beneficial microbes (Yang et al., 2010) and a general reduction in fitness (Tian et al., 2003). In some cases, autoimmunity caused by epistatic interactions involving NLRs can cause hybrid necrosis (Chae et al., 2014). Therefore, a key question is how NLRs are regulated to enable limited expression for pathogen surveillance but enhanced expression during defence responses. This problem is compounded by the evolutionary dynamics of NLRs because regulatory processes must keep pace with the emergence of new NLR genes and gain or loss of function in others. Consequently, the regulation of NLRs is one of the most important and difficult challenges faced by plants.

NLR control measures occur at different stages of gene expression (Lai and Eulgem, 2018). For example, microRNAs limit the expression of many NLRs by targeting conserved regions encoded in NLR mRNAs and triggering cascades of phased siRNAs that broadly suppress NLR activity (Cai et al., 2018; Canto-Pastor et al., 2019; Shivaprasad et al., 2012; Zhai et al., 2011). Alternative splicing, which promotes the simultaneous expression of more than one NLR isoform, is required for the functions of both the N gene which provides resistance to tobacco mosaic virus (Dinesh-Kumar and Baker, 2000), and RECOGNITION OF PSEUDOMONAS SYRINGAE 4 (RPS4), which confers resistance to Pseudomonas syringae DC3000 in Arabidopsis (Zhang and Gassmann, 2007). Alternative polyadenylation at intragenic heterochromatin controls the expression of Arabidopsis RECOGNITION OF PERONOSPORA PARASITICA 7 (RPP7), with functional consequences for immunity against the oomycete pathogen Hyaloperonospora arabidopsidis (Tsuchiya and Eulgem, 2013). Finally, RNA surveillance pathways control NLRs. For example, null mutants defective in nonsense-mediated RNA decay (NMD) are lethal in Arabidopsis because they trigger NLR RPS6-dependent autoimmunity (Gloggnitzer et al., 2014). Conversely, mutations in the RNA exosome, which degrades RNAs in a 3′ to 5′ direction, suppress RPS6-dependent autoimmune phenotypes (Takagi et al., 2020). Fine tuning of different levels of NLR control may be integrated to produce quantitative patterns of disease resistance (Corwin and Kliebenstein, 2017), but our understanding of how this occurs globally is fragmentary and incomplete (Adachi et al., 2019).

The RNA-binding protein FPA was first identified as a factor required for the control of Arabidopsis flowering time (Koornneef et al., 1991). Loss-of-function fpa mutants flower late due to elevated levels of the floral repressor, FLC (Schomburg et al., 2001). However, this cannot be the only function of FPA because it is much more widely conserved than FLC. FPA is a member of the spen family of proteins, which are defined by three N-terminal RNA recognition motifs and a C-terminal protein interaction SPOC domain (Ariyoshi and Schwabe, 2003). We previously showed that FPA controls the site of cleavage and polyadenylation in some mRNAs, including autoregulation of FPA pre-mRNA (Duc et al., 2013; Hornyik et al., 2010; Lyons et al., 2013). These findings were extended to show that FPA can affect poly(A) site choice at genes with intronic heterochromatin, including RPP7 (Deremetz et al., 2019). The poly(A) site selection mechanism used by FPA remains unclear. FPA might mediate poly(A) site choice either directly by recruiting the RNA 3′ end processing machinery to sensitive sites or indirectly, for example by influencing splicing, chromatin modifications or the rate of transcription by RNA Polymerase II (Pol II). We previously used Helicos Biosciences direct RNA sequencing (Helicos DRS) to map the 3′ ends of Arabidopsis polyadenylated transcripts and identify genes affected by transcriptome-wide loss of FPA function (Duc et al., 2013; Sherstnev et al., 2012). A limitation of this approach was that it could only identify RNA 3′ end positions, and so could not resolve other potential roles of FPA in gene expression.

In this study, we used two approaches to gain a clearer understanding of how FPA functions. We first investigated which proteins FPA associates with inside living plant cells. Next, we analysed the global impact of different levels of FPA activity on gene expression. For this, we combined Helicos DRS with short-read Illumina RNA-Seq and Oxford Nanopore Technologies (Nanopore) DRS, which can reveal the authentic processing and modification of full-length mRNAs (Parker et al., 2020). Using these combined data together with new computational approaches to study RNA processing, we found that the predominant role of FPA is to promote poly(A) site choice. In addition, we uncovered an unusual degree of complexity in the processing of NLR mRNAs, which is sensitive to FPA. The finding that premature transcription termination functions as an additional layer of NLR expression control has implications for understanding the dynamics of NLR regulation and evolution.

Results

FPA co-purifies with proteins that mediate mRNA 3′ end processing

In order to understand how FPA controls the site of mRNA 3′ end formation, we used in vivo interaction proteomics–mass spectrometry (IVI-MS) to identify which proteins FPA associates with inside living plant cells. First, we fixed molecular interactions using formaldehyde infiltration of Arabidopsis seedlings expressing FPA fused to YFP (35S::FPA:YFP). Wild-type Columbia-0 (Col-0) seedlings treated in the same way were used as a negative control. We then purified nuclei and performed GFP-trap immunopurification followed by liquid chromatography–tandem mass spectrometry (LC-MS/MS) to identify FPA-associated proteins. By comparing the proteins detected in three biological replicates of 35S::FPA:YFP and Col-0, we identified 203 FPA co-purifying proteins with a median log2 fold change in abundance of greater than two (Figure 1—figure supplement 1). At least 56% (113) of the enriched proteins are poly(A)+ mRNA-binding proteins as established by orthogonal RNA-binding proteome analysis (Bach-Pages et al., 2020; Reichel et al., 2016).

Consistent with FPA control of mRNA 3′ end formation, 14 highly conserved cleavage and polyadenylation factors (CPFs) co-purified with FPA (Figure 1A, Supplementary file 1). These include members of the cleavage and polyadenylation specificity factor (CPSF) complex, cleavage stimulating factor (CstF) complex, and cleavage factor I and II (CFIm/CFIIm) complexes. The U2AF and U2 spliceosome components that interact with CFIm–CPSF to mediate terminal exon definition were also detected (Kyburz et al., 2006; Figure 1B, Supplementary file 1). We additionally detected both subunits of Pol II. Characteristically, Serine5 of the Pol II C-terminal domain (CTD) heptad repeat is phosphorylated when Pol II is at the 5′ end of genes, and Ser2 is phosphorylated when Pol II is at the 3′ end (Komarnitsky et al., 2000). The position-specific phosphorylation of these sites alters the RNA processing factors which are recruited to the CTD at the different stages of transcription. We found that the kinase CDKC;2, which phosphorylates Ser2 (Wang et al., 2014), and the phosphatase CPL1 (homolog of yeast Fcp1), which dephosphorylates Ser5 (Koiwa et al., 2004), co-purified with FPA. We also detected the homolog of the human exonuclease XRN2 (known as XRN3 in Arabidopsis), which mediates Pol II transcription termination (Krzyszton et al., 2018).

Figure 1 with 2 supplements
FPA associates with proteins that function to process the 3′ ends of Pol II-transcribed RNAs and promote transcription termination.

(A–D) Volcano plots representing proteins co-purifying with FPA using IVI-MS. Only proteins detected in all three biological replicates of the 35S::FPA:YFP line are shown (light grey). The following classes are highlighted: (A) CPFs in dark blue; (B) Pol II-associated factors in green; terminal exon definition factors in dark orange; (C) autonomous pathway components in yellow and factors controlling alternative polyadenylation in light orange; and (D) m6A writer complex components in light blue. (E) ChIP-Seq metagene profile showing the normalised occupancy of FPA (green) and Pol II phosphorylated at Ser5 (pink) and Ser2 (brown) of the CTD (Yu et al., 2019) relative to the major 3′ position of each gene, as measured using Helicos DRS. Only long genes (>2.5 kb) are included (n = 10,215).

A second major class of proteins that co-purified with FPA are components of the autonomous flowering pathway (Andrés and Coupland, 2012; Simpson, 2004; Figure 1C, Supplementary file 1). FPA functions in the autonomous pathway to limit expression of the floral repressor FLC. FPA activity is associated with alternative polyadenylation of long non-coding RNAs that are transcribed antisense to the FLC locus (Hornyik et al., 2010; Liu et al., 2007). Consistent with this, conserved CPF proteins such as FY (WDR33) (Simpson et al., 2003), PCFS4 (Xing et al., 2008), CSTF64 and CSTF77 (Liu et al., 2010) were previously identified in late flowering mutant screens. Other detected autonomous pathway factors are proteins with established roles in pre-mRNA processing, including HLP1 (Zhang et al., 2015), FLK (Mockler et al., 2004) and EMB1579/RSA1 (Zhang et al., 2020b). Notably, FLK has been found to associate with PEP, HUA1, and HEN4 (Zhang et al., 2015), and we identified all four of these as FPA co-purifying proteins. In addition to regulating FLC, the FLK–PEP complex has been shown to control alternative polyadenylation within pre-mRNA encoding the floral homeotic transcription factor AGAMOUS (Rodríguez-Cazorla et al., 2015). Their co-purification with FPA suggests that this role may be more global and involve direct interactions at RNA 3′ ends.

A third group of proteins that co-purified with FPA are conserved members of the mRNA N6-methyladenosine (m6A) writer complex (Růžička et al., 2017; Figure 1D, Supplementary file 1). The m6A modification mediated by this complex is predominantly targeted to the 3′ untranslated region (UTR) of Arabidopsis protein-coding mRNAs (Parker et al., 2020). The co-purification of FPA with m6A writer complex components may be explained by either a direct role for FPA in m6A modification or, more simply, because both CPF and m6A writer proteins are found at RNA 3′ ends.

The picture that emerges from this analysis is that FPA is located in proximity to proteins that promote cleavage, polyadenylation, transcription termination and RNA modification at the 3′ end of Pol II-transcribed genes.

FPA co-localises with RNA Pol II Ser2 at the 3′ end of Arabidopsis genes

We next used an orthogonal approach to investigate the association of FPA with proteins that function at the 3′ end of Pol II-transcribed genes. We performed chromatin immunoprecipitation sequencing (ChIP-Seq) using antibodies against FPA and Pol II phosphorylated at either Ser5 or Ser2 of the CTD heptad repeat (Yu et al., 2019). Our metagene analysis revealed that FPA is enriched at the 3′ end of genes and co-localises with Pol II phosphorylated at Ser2 of the CTD (Figure 1E, Figure 1—figure supplement 1). We found that FPA occupancy at 3′ ends was well correlated with Pol II Ser2 occupancy (Spearman’s ρ = 0.67, p < 2 × 10^−308, 95% confidence interval [0.66, 0.68]). The close relationship between FPA and Pol II Ser2 is reinforced by changes in the distribution of Pol II isoforms in fpa mutants. For example, we previously showed that FPA is required for 3′ end processing at PIF5 (Duc et al., 2013). Pol II Ser2 was enriched at the 3′ end of PIF5 in Col-0 but depleted from this region in fpa-7 mutants (Figure 1—figure supplement 2). Together, these orthogonal ChIP-Seq and IVI-MS analyses reveal the close association of FPA with proteins involved in 3′ end processing and transcription termination at the 3′ end of Arabidopsis genes.

FPA predominantly promotes poly(A) site choice

We next asked which RNA processing events are controlled by FPA. We used a combination of Illumina RNA-Seq and Helicos and Nanopore DRS technologies to analyse three different genetic backgrounds expressing different levels of FPA activity: wild-type Col-0, loss-of-function fpa-8 and a line overexpressing FPA fused to YFP (35S::FPA:YFP). In combination, these orthogonal sequencing technologies can reveal different features of transcriptomes: Helicos DRS short reads identify the 3′ ends of mRNAs, but cannot reveal the full properties of the corresponding transcripts (Ozsolak et al., 2009); Illumina RNA-Seq produces short reads derived from all expressed regions, meaning that changes in RNA 3′ end processing can only be detected by differences in coverage (Xia et al., 2014); and Nanopore DRS long reads define the 3′ ends of mRNAs in the context of reads that can correspond to full-length transcripts (Parker et al., 2020). For each genotype, we performed three biological replicates with Helicos DRS, six with Illumina RNA-Seq and four with Nanopore DRS. The resultant sequencing statistics are detailed in Supplementary file 1.

We first assessed the utility of the three sequencing technologies to map changes in mRNA processing by focusing on the FPA locus. FPA autoregulates its expression by promoting premature cleavage and polyadenylation within intron 1 of FPA pre-mRNA (Duc et al., 2013; Hornyik et al., 2010). Consistent with this, a proximal poly(A) site in the first intron and distal sites in the terminal intron and exon of FPA could be mapped in Col-0 using Nanopore and Helicos DRS (Figure 2A). Using all three data types, we detected a quantitative shift towards selection of distal poly(A) sites in the loss-of-function fpa-8 mutant and a strong shift to proximal poly(A) site selection when FPA is overexpressed (35S::FPA:YFP). Nanopore DRS provided the clearest picture of alternative polyadenylation events because full-length reads reveal poly(A) site choice in the context of other RNA processing events.

Figure 2 with 4 supplements
FPA-dependent poly(A) site selection.

Loss of FPA function is associated with the preferential selection of distal poly(A) sites, whereas FPA overexpression leads to the preferential selection of proximal poly(A) sites. (A) Illumina RNA-Seq, Helicos DRS and Nanopore DRS reveal FPA-dependent RNA 3′ end processing changes at the FPA (AT2G43410) locus. The 35S::FPA:YFP construct has alternative transgene-derived untranslated regions, so mRNAs derived from the transgene do not align to the native FPA 5′UTR and 3′UTR. (B) Histograms showing change in mean RNA 3′ end position for significantly alternatively polyadenylated loci (EMD >25, FDR < 0.05) in fpa-8 (left panel) and 35S::FPA:YFP (right panel) compared with Col-0, as detected using Nanopore DRS. Orange and green shaded regions indicate sites with negative and positive RNA 3′ end position changes, respectively. (C) Effect size of significant proximal (orange) and distal (green) alternative polyadenylation events in fpa-8 (left panel) and 35S::FPA:YFP (right panel) compared with Col-0 in the Nanopore DRS data, as measured using the EMD. (D) Histograms showing change in mean RNA 3′ end position for significantly alternatively polyadenylated loci (EMD >25, FDR < 0.05) in fpa-8 (left panel) and 35S::FPA:YFP (right panel) compared with Col-0, as detected using Helicos DRS. Orange and green shaded regions indicate sites with negative and positive RNA 3′ end position changes, respectively. (E) Effect size of significant proximal (orange) and distal (green) alternative polyadenylation events in fpa-8 (left panel) and 35S::FPA:YFP (right panel) compared with Col-0 in the Helicos DRS data, as measured using the EMD. (F) Boxplots showing the effect size (absolute log2 fold change (logFC)) of alternatively processed loci identified using Illumina RNA-Seq in fpa-8 (left panel) and 35S::FPA:YFP (right panel), respectively. Down- and upregulated loci are shown in orange and green, respectively. For each locus, the region with the largest logFC was selected to represent the locus. Loci with both up- and downregulated regions contribute to both boxes. (G) Boxplots showing the effect size (absolute logFC) of loci with alternative splice junction usage identified using Illumina RNA-Seq in fpa-8 (left panel) and 35S::FPA:YFP (right panel), respectively. Down- and upregulated loci are shown in orange and green, respectively. For each locus, the junction with the largest logFC was selected to represent the locus. Loci with both up- and downregulated junctions contribute to both boxes.

Figure 2—source data 1

Nanopore StringTie assembly [Linked to Figure 2A–B].

https://cdn.elifesciences.org/articles/65537/elife-65537-fig2-data1-v3.tds
Figure 2—source data 2

Differential 3′ processing results for fpa-8 vs Col-0, as identified by Nanopore DRS [Linked to Figure 2B–C].

https://cdn.elifesciences.org/articles/65537/elife-65537-fig2-data2-v3.tds
Figure 2—source data 3

Differential 3′ processing results for 35S::FPA:YFP vs Col-0, as identified by Nanopore DRS [Linked to Figure 2B–C].

https://cdn.elifesciences.org/articles/65537/elife-65537-fig2-data3-v3.tds
Figure 2—source data 4

Differential 3′ processing results for fpa-8 vs Col-0, as identified by Helicos DRS [Linked to Figure 2D–E].

https://cdn.elifesciences.org/articles/65537/elife-65537-fig2-data4-v3.tds
Figure 2—source data 5

Differential 3′ processing results for 35S::FPA:YFP vs Col-0, as identified by Helicos DRS [Linked to Figure 2D–E].

https://cdn.elifesciences.org/articles/65537/elife-65537-fig2-data5-v3.tds
Figure 2—source data 6

Differentially expressed regions results for fpa-8 vs Col-0, as identified by Illumina RNA-Seq [Linked to Figure 2F].

https://cdn.elifesciences.org/articles/65537/elife-65537-fig2-data6-v3.tds
Figure 2—source data 7

Differentially expressed regions results for 35S::FPA:YFP vs Col-0, as identified by Illumina RNA-Seq [Linked to Figure 2F].

https://cdn.elifesciences.org/articles/65537/elife-65537-fig2-data7-v3.tds
Figure 2—source data 8

Differential splice junction usage results for fpa-8 vs Col-0, as identified by Illumina RNA-Seq [Linked to Figure 2G].

https://cdn.elifesciences.org/articles/65537/elife-65537-fig2-data8-v3.tds
Figure 2—source data 9

Differential splice junction usage results for 35S::FPA:YFP vs Col-0, as identified by Illumina RNA-Seq [Linked to Figure 2G].

https://cdn.elifesciences.org/articles/65537/elife-65537-fig2-data9-v3.tds

We next asked how transcriptome-wide RNA processing is affected by FPA activity. Since mutations in FPA cause readthrough of annotated 3′UTRs (Duc et al., 2013), we applied the software tool StringTie2 (Pertea et al., 2015) to create a bespoke reference annotation with Nanopore DRS reads from Col-0, fpa-8 and 35S::FPA:YFP. We then measured how changes in FPA expression altered the 3′ end distribution at each locus using the earth mover’s distance (EMD; also known as the Wasserstein distance). EMD indicates the ‘work’ required to transform one normalised distribution into another based on the proportion of 3′ ends that would have to be moved and by what distance. We used an EMD permutation test, in which reads are randomly shuffled between conditions, to estimate p-values for each locus. Loci with an EMD greater than 25 and a false discovery rate (FDR) less than 0.05 were considered differentially polyadenylated.
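The EMD comparison and permutation test described above can be illustrated with a minimal sketch. This is not the authors' software: the function names, the pooling of reads into one list per condition, the number of permutations and the toy read positions are all assumptions made for illustration. Only the general idea (EMD between normalised 3′ end distributions, with reads shuffled between conditions to estimate a p-value) comes from the text.

```python
import bisect
import random


def emd_1d(ends_a, ends_b):
    """Earth mover's distance between two samples of 3' end positions.

    Each sample is treated as a normalised distribution, so the result is in
    the same units as the positions (nucleotides).
    """
    a, b = sorted(ends_a), sorted(ends_b)
    points = sorted(set(a) | set(b))
    total = 0.0
    # Integrate |CDF_a - CDF_b| over the intervals between observed positions.
    for left, right in zip(points, points[1:]):
        cdf_a = bisect.bisect_right(a, left) / len(a)
        cdf_b = bisect.bisect_right(b, left) / len(b)
        total += abs(cdf_a - cdf_b) * (right - left)
    return total


def emd_permutation_test(ends_a, ends_b, n_perm=199, seed=0):
    """Return (observed EMD, permutation p-value) for one locus."""
    rng = random.Random(seed)
    observed = emd_1d(ends_a, ends_b)
    pooled = list(ends_a) + list(ends_b)
    n_a = len(ends_a)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # randomly reassign reads to the two conditions
        if emd_1d(pooled[:n_a], pooled[n_a:]) >= observed:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical locus: every control read ends at a proximal site (position 1000);
# in the mutant, half of the reads read through to a site 500 nt downstream.
control = [1000] * 50
mutant = [1000] * 25 + [1500] * 25
emd, p_value = emd_permutation_test(control, mutant)
```

In this toy case half of the 3′ ends move by 500 nt, so the EMD is 250, well above the threshold of 25 used in the text. Correcting the per-locus p-values for multiple testing to obtain an FDR is a separate step that is not shown here.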

Using this approach on Nanopore DRS data, we identified 285 and 293 loci with alternative polyadenylation events in fpa-8 and 35S::FPA:YFP, respectively (Figure 2B). In all, 77.9% (222) of loci with alternative polyadenylation in fpa-8 displayed a positive change in the mean 3′ end position, indicating a predominant shift to distal poly(A) site selection (Figure 2B, left panel). These loci also had greater effect sizes than those with shifts towards proximal poly(A) sites (Figure 2C, left panel). In contrast, 56.7% (166) of loci with alternative polyadenylation in 35S::FPA:YFP displayed a negative change in the mean 3′ end position, indicating a shift towards proximal poly(A) sites (Figure 2B, right panel). These loci had greater effect sizes than those with positive changes in 3′ end profile (Figure 2C, right panel). A total of 16 loci displayed a shift to distal poly(A) site selection in fpa-8 and to proximal poly(A) site selection in 35S::FPA:YFP (hypergeometric test p = 3.9 × 10^−7), demonstrating that loss of function versus overexpression of FPA can result in reciprocal patterns of poly(A) site choice.
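A hypergeometric overlap test of the kind reported above can be sketched as follows. The function name and the background size used in the usage lines are hypothetical: the text does not state how many loci were tested, so the example is not expected to reproduce the reported p-value.

```python
from math import comb


def hypergeom_overlap_pvalue(population, set_a, set_b, overlap):
    """P(X >= overlap) when set_b loci are drawn at random, without
    replacement, from `population` loci of which set_a are marked."""
    max_overlap = min(set_a, set_b)
    total = comb(population, set_b)
    tail = sum(
        comb(set_a, k) * comb(population - set_a, set_b - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total


# Set sizes from the text (222 distal-shifted loci in fpa-8, 166
# proximal-shifted loci in 35S::FPA:YFP, 16 shared); the background of
# 10,000 tested loci is a made-up placeholder.
p_overlap = hypergeom_overlap_pvalue(10_000, 222, 166, 16)
```

The result depends strongly on the assumed background: a larger set of tested loci makes the same overlap look more surprising.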

We used the same approach to identify loci with FPA-dependent alternative polyadenylation in Helicos DRS data. We identified 319 and 299 genes with alternative polyadenylation events in fpa-8 and 35S::FPA:YFP, respectively (Figure 2D and E). Consistent with Nanopore DRS analysis, the predominant shifts in fpa-8 and 35S::FPA:YFP were towards distal (79.0% or 252 loci) and proximal (75.3% or 225 loci) poly(A) sites, respectively. In all, 44 loci displayed a shift to distal poly(A) sites in fpa-8 and to proximal poly(A) sites in 35S::FPA:YFP (hypergeometric test p = 4.8 × 10^−30), again demonstrating reciprocal poly(A) site selection depending on FPA activity. Of the 222 loci identified with shifts to distal poly(A) sites in fpa-8 using Nanopore DRS, 39.6% (88) were also detected using Helicos DRS (Figure 2—figure supplement 1). Likewise, 44.0% of loci (73) with proximal polyadenylation detected in 35S::FPA:YFP using Nanopore DRS were also detected using Helicos DRS. Across the DRS datasets, we identified 59 loci for which reciprocal poly(A) site selection depending on FPA activity could be detected by Nanopore DRS and/or Helicos DRS.

In order to analyse the Illumina RNA-Seq data, we developed annotation-agnostic software for detecting alternative RNA 3′ end processing events, using a similar approach to the existing tools DERfinder (Collado-Torres et al., 2017), RNAprof (Tran et al., 2016), and DEXSeq (Anders et al., 2012). We segmented Illumina RNA-Seq data by coverage and relative expression in fpa-8 or 35S::FPA:YFP compared with Col-0. Segmented regions were grouped into transcriptional loci using the annotations generated from Nanopore DRS datasets. Differential usage of regions within each locus was then tested using DEXSeq. Using this approach, we identified 2535 loci with differential RNA processing events in fpa-8: 1792 were upregulated, 390 were downregulated, and 353 had both upregulated and downregulated regions (FDR < 0.05, absolute logFC >1; Figure 2F, left panel). A total of 1747 loci with differential RNA processing events were identified in 35S::FPA:YFP: 997 were upregulated, 532 were downregulated, and 218 had both upregulated and downregulated regions (Figure 2F, right panel). The median effect size for differentially processed regions was greater for upregulated regions than for downregulated regions in fpa-8. This is consistent with an increase in transcriptional readthrough events and elevated expression of intergenic regions and downstream genes. In contrast, the median effect size for differentially processed regions was similar for up- and downregulated regions in 35S::FPA:YFP. This is consistent with an increase in the relative expression of proximal exonic and intronic regions, and loss of expression of distal exonic regions caused by preferential selection of proximal poly(A) sites. Similar results were seen for differential splice junction usage analysis (Figure 2G), suggesting that changes in splicing are the indirect effects of altered 3′ end processing in fpa-8, rather than direct effects of FPA on splice site choice. Evidence of this can be seen at the PIF5 locus, where readthrough results in increased cryptic and canonical splicing of downstream PAO3 (Figure 2—figure supplement 2).

We next asked whether FPA influences RNA modification. Our IVI-MS analysis had revealed that conserved members of the Arabidopsis m6A writer complex co-purify with FPA (Figure 1D, Supplementary file 1). The human proteins most closely related to FPA are RBM15/B, which co-purify with the human m6A writer complex and are required for m6A deposition (Patil et al., 2016). We used LC-MS/MS to analyse the m6A/A (adenosine) ratio in mRNA purified from Col-0, fpa-8, 35S::FPA:YFP and a mutant defective in the m6A writer complex component VIR (vir-1). Consistent with previous reports, the level of mRNA m6A in the hypomorphic vir-1 allele was reduced to approximately 10% of wild-type levels (Parker et al., 2020; Růžička et al., 2017; Figure 2—figure supplement 3). However, we detected no differences in the m6A level between genotypes with altered FPA activity. Therefore, we conclude that FPA does not influence global levels of mRNA m6A methylation.

Finally, we asked whether the FPA-dependent global changes in alternative polyadenylation result from an indirect effect on chromatin state. We previously showed that FPA controls the expression of histone demethylase IBM1 by promoting proximal polyadenylation within IBM1 intron 7 (Duc et al., 2013). IBM1 functions to restrict H3K9me2 levels, and ibm1 mutants accumulate ectopic heterochromatic marks in gene bodies, which affects RNA processing at certain loci (Miura et al., 2009; Saze et al., 2008). When we analysed two independent ChIP-Seq datasets of H3K9me2 in ibm1-4 mutants (Inagaki et al., 2017; Lai et al., 2020), we found that only 10.6% of loci with altered poly(A) site choice in 35S::FPA:YFP have altered H3K9me2 in ibm1 mutants compared with 14.2% of all loci tested (hypergeometric p = 0.97; Figure 2—figure supplement 4). This result suggests that FPA-dependent poly(A) site choice is not an indirect consequence of FPA control of IBM1.

Overall, these analyses reveal that the primary function of FPA is to control poly(A) site choice. FPA predominantly promotes poly(A) site selection; hence, fpa loss-of-function backgrounds exhibit readthrough at sites used in the wild type, whereas FPA overexpression results in increased selection of proximal poly(A) sites.

NLRs are major targets of FPA-sensitive alternative poly(A) site selection

We next asked which groups of genes are sensitive to FPA-dependent alternative polyadenylation. We used InterPro annotations (Mitchell et al., 2019) to perform protein family domain enrichment analysis of the loci affected by FPA (revealed by the Nanopore and Helicos DRS analyses). We found that sequences encoding NB-ARC, Rx-like coiled coil (CC), and/or LRR domains were enriched amongst the loci with increased proximal polyadenylation in 35S::FPA:YFP (Figure 3A and B). This combination of domains is associated with NLR disease resistance proteins.

Figure 3 with 2 supplements
Nanopore and Helicos DRS identify NLR genes regulated by alternative polyadenylation.

(A–B) Protein domain enrichment analysis for loci with increased proximal poly(A) site selection in 35S::FPA:YFP line, as detected using (A) Nanopore DRS or (B) Helicos DRS. (C) Nanopore DRS reveals the complexity of RNA processing at RPS6. Protein domain locations (shown in grey) represent collapsed InterPro annotations. The novel TIR domain was annotated using InterProScan (Mitchell et al., 2019). (D) Protein alignment of the predicted TIR domain from the novel gene downstream of RPS6, with the sequence of the TIR domains from RPS6 and RPS4. Helix and strand secondary structures (from UniProt: RPS4, Q9XGM3) are shown in blue and yellow, respectively. Residues are shaded according to the degree of conservation.

The Col-0 accession contains at least 206 genes encoding some combination of TIR, CC, RPW8, NB-ARC, and LRR domains, which might be classified as NLRs or partial NLRs (Van de Weyer et al., 2019). In general, these can be grouped according to their encoded N-terminal domain as TIR (TNLs), CC (CNLs), or RPW8 (RNLs) genes. We manually examined these NLR genes to identify those with alternative polyadenylation. Reannotation of some loci was required to interpret the effects of FPA activity. For example, we found that the TNL gene AT5G46490, located in the RPS6 cluster, is incorrectly annotated as two loci, AT5G46490 and AT5G46500 (Figure 3—figure supplement 1). Nanopore DRS evidence indicates that this is actually a single locus with a previously unrecognised 2.7 kb intron containing a proximal poly(A) site, the use of which is controlled by FPA. This interpretation is supported by nanoPARE data (Schon et al., 2018), which showed no evidence of capped 5′ ends originating from the annotated downstream gene. Use of the distal poly(A) site introduces an additional ~400 amino acids to the C-terminus of the protein. This C-terminal region has homology to other NLRs in the RPS6 cluster and is predicted to introduce additional LRR repeats (Martin et al., 2020; Figure 3—figure supplement 2).

Notably, we could also reannotate the chromosomal region around RPS6 itself. The extreme autoimmunity phenotypes of NMD mutants and mitogen-activated kinase pathway mutants require RPS6 but the mechanisms involved are not understood (Gloggnitzer et al., 2014; Takagi et al., 2020). Nanopore DRS indicates that the 3′UTR of RPS6 is complex, with multiple splicing events and poly(A) sites (Figure 3C). We also detected transcripts expressed from this region that do not appear to be contiguous with RPS6 3′UTR reads. Instead, these reads correspond to an independent unannotated gene that overlaps the RPS6 3′UTR. This interpretation is supported by capped RNA 5′ ends detected in this region by nanoPARE (Schon et al., 2018). In addition, Nanopore DRS analysis of the RNA exosome mutant hen2-2 (Parker et al., 2021) revealed that this unannotated gene is expressed at relatively high levels, but that the transcripts are subject to degradation. Consequently, steady-state levels of RNA expressed from this locus are relatively low in Col-0. The gene encodes a TIR domain similar to that of RPS6 (Figure 3D). Therefore, use of the distal RPS6 poly(A) site constitutes readthrough into the downstream TIR-domain-only NLR. Based on these analyses, we conclude that long-read Nanopore DRS data have the potential to correct NLR gene annotation at complex loci that cannot be resolved by genome annotation software or short-read Illumina RNA-Seq.

Widespread premature transcription termination of NLRs includes frequent selection of poly(A) sites in protein-coding exons

Of the 206 NLR genes examined, 124 had a sufficient level of expression to identify alternative polyadenylation in the Nanopore DRS data; of these 124, 62 (50.0%) were found to have FPA-dependent alternative polyadenylation (Tables 1–3). Of the 74 expressed NLRs located in major clusters, 44 (59.5%) were sensitive to FPA activity (χ² test, p=0.02) (Lee and Chae, 2020). The localisation of NLRs to large genomic clusters is known to facilitate diversification (Barragan and Weigel, 2020). Consistent with this, 20 (71.4%) of the 28 expressed NLRs reported to be under high levels of diversifying selection were sensitive to FPA activity (χ² test, p=0.02) (Prigozhin and Krasileva, 2021). In addition, FPA-sensitive NLRs tended to be located in regions with higher levels of synteny diversity (Jiao and Schneeberger, 2020), although in this case the association was not significant (t-test p=0.09; Figure 4—figure supplement 1). Overall, these findings suggest that FPA-dependent alternative polyadenylation is associated with rapidly evolving NLRs.
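The cluster and diversifying-selection associations reported above can be checked approximately by arranging the counts in a 2×2 table and applying a χ² test. The sketch below is an illustration under stated assumptions, not the authors' analysis script: the function name `chi2_2x2`, the comparison of each group against the remaining expressed NLRs, and the use of Yates' continuity correction are all our assumptions.

```python
import math


def chi2_2x2(a, b, c, d, yates=True):
    """Chi-squared test of independence for the 2x2 table [[a, b], [c, d]].

    Returns (statistic, p_value) for 1 degree of freedom.
    Yates' continuity correction is an assumption; the text does not say
    whether a correction was applied.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column total must be non-zero")
    diff = abs(a * d - b * c)
    if yates:
        diff = max(diff - n / 2, 0.0)
    stat = n * diff ** 2 / (row1 * row2 * col1 * col2)
    # For 1 degree of freedom the chi-squared survival function
    # reduces to erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Counts taken from the text: 124 expressed NLRs, 62 of them FPA-sensitive.
# Clustered NLRs: 44 sensitive of 74; the remaining 50 are assumed to be
# the comparison group (18 sensitive, 32 insensitive).
cluster_stat, cluster_p = chi2_2x2(44, 30, 18, 32)

# NLRs under high diversifying selection: 20 sensitive of 28; the remaining
# 96 expressed NLRs (42 sensitive, 54 insensitive) are the assumed comparison.
selection_stat, selection_p = chi2_2x2(20, 8, 42, 54)
```

Under these assumptions both p values round to 0.02, in line with the values reported, although the exact contingency tables and correction used in the study are not stated here.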

Table 1
Readthrough and chimeric RNA formation events at FPA-sensitive NLR genes.

| Gene ID | Gene name | NLR class | Chimeric pair (upstream–downstream) |
|---|---|---|---|
| AT1G12220 | RPS5 | CNL | AT1G12220–AT1G12230 |
| AT1G58848 | RPP7a/b | TNL | AT1G58848–AT1G58889 |
| AT1G59218 | RPP7a/b | TNL | AT1G59218–AT1G59265 |
| AT1G61190 | - | CNL | ncRNA–AT1G61190 |
| AT1G63730 | - | TNL | AT1G63730–AT1G63740 |
| AT1G63740 | - | TNL | AT1G63730–AT1G63740 |
| AT3G46730 | - | CNL | AT3G46740–AT3G46730 |
| AT4G16860 | RPP4 | TNL | AT4G16860–AT4G16870–AT4G16857 |
| AT4G16960 | SIKIC3 | TNL | AT4G16970–AT4G16960–AT4G16957 |
| AT4G19060 | - | NB only | AT4G19070–AT4G19060 |
| AT4G19530 | - | TNL | AT4G19530–AT4G19540 |
| AT5G38850 | - | TNL | AT5G38850–AT5G38860 |
| AT5G40090 | CHL1 | TNL | ncRNA–AT5G40090 |
| AT5G44510 | TAO1 | TNL | AT5G44520–AT5G44510 |
| AT5G45490 | - | CNL | AT5G45472–AT5G45490 |
| AT5G46470 | RPS6 | TNL | AT5G46470–TIR gene |
| AT5G48780 | - | TNL | AT5G48775–AT5G48780 |

Table 2
Intronic proximal polyadenylation events at FPA-sensitive NLR genes.

| Gene ID | Gene name | NLR class | Predicted function | Protein isoform |
|---|---|---|---|---|
| AT1G12210 | RFL1 | CNL | non-coding (5′UTR) | - |
| AT1G58602 | RPP7 | CNL | non-coding (5′UTR); alternative 3′UTR | - |
| AT1G63750 | WRR9 | TNL | protein coding | TIR only |
| AT1G63880 | RLM1B | TNL | protein coding; non-stop | TIR only |
| AT1G69550 | - | TNL | protein coding | LRR truncation |
| AT3G44480 | RPP1 | TNL | protein coding | LRR truncation |
| AT3G50480 | HR4 | RPW8 | protein coding | RPW8 truncation |
| AT4G16860 | RPP4 | TNL | protein coding | TIR only |
| AT4G16900 | - | TNL | protein coding | LRR truncation |
| AT4G19510 | RPP2B | TNL | alternative 3′UTR | - |
| AT5G17890 | DAR4/CHS3 | TNL | protein coding | TIR only |
| AT5G40910 | - | TNL | protein coding | TIR only |
| AT5G43730 | RSG2 | CNL | non-coding (5′UTR) | - |
| AT5G43740 | - | CNL | non-coding (5′UTR) | - |
| AT5G46270 | - | TNL | protein coding | TIR/NB-ARC only; LRR truncation |
| AT5G46470 | RPS6 | TNL | alternative 3′UTR | - |
| AT5G46490 | - | TNL | protein coding; non-stop | TIR/NB-ARC only; LRR truncation |

Table 3
Exonic proximal polyadenylation events at FPA-sensitive NLR genes.

| Gene ID | Gene name | NLR class | Predicted function | Protein isoform |
|---|---|---|---|---|
| AT1G10920 | LOV1 | CNL | protein coding* | CC-only* |
| AT1G27180 | - | TNL | non-stop | - |
| AT1G31540 | RAC1 | TNL | non-stop; protein coding† | LRR truncation† |
| AT1G33560 | ADR1 | RNL | non-stop | - |
| AT1G53350 | - | CNL | non-stop | - |
| AT1G56510 | WRR4A | TNL | non-stop | - |
| AT1G56520 | - | TNL | non-stop | - |
| AT1G58602 | RPP7 | CNL | non-stop | - |
| AT1G58807 | RF45 | CNL | non-stop | - |
| AT1G58848 | RPP7a/b | CNL | non-stop | - |
| AT1G59124 | RDL5 | CNL | non-stop | - |
| AT1G59218 | RPP7a/b | CNL | non-stop | - |
| AT1G61300 | - | CNL | non-stop | - |
| AT1G62630 | - | CNL | non-stop | - |
| AT1G63360 | - | CNL | non-stop | - |
| AT1G63730 | - | TNL | non-stop | - |
| AT1G63860 | - | TNL | non-stop | - |
| AT1G63880 | RLM1B | TNL | non-stop | - |
| AT1G72840 | - | TNL | non-coding (5′UTR) | - |
| AT2G14080 | RPP28 | TNL | non-stop | - |
| AT3G44480 | RPP1 | TNL | non-stop; protein coding | LRR truncation |
| AT3G44630 | - | TNL | non-stop | - |
| AT3G44670 | - | TNL | non-stop; protein coding | TIR only |
| AT3G46530 | RPP13 | CNL | non-stop | - |
| AT4G16860 | RPP4 | TNL | non-stop | - |
| AT4G16890 | SNC1 | TNL | non-stop | - |
| AT4G16900 | - | TNL | non-stop | - |
| AT4G19520 | - | TNL | non-stop | - |
| AT4G19530 | - | TNL | non-stop | - |
| AT4G36140 | - | TNL | non-stop | - |
| AT5G17890 | DAR4/CHS3 | TNL | non-stop | - |
| AT5G35450 | - | CNL | non-stop | - |
| AT5G38850 | - | TNL | non-stop | - |
| AT5G40060 | - | TNL | protein coding* | TIR only* |
| AT5G40910 | - | TNL | non-stop | - |
| AT5G43470 | RPP8 | CNL | non-stop | - |
| AT5G43740 | - | CNL | non-stop | - |
| AT5G44510 | TAO1 | TNL | non-stop; protein coding | LRR truncation |
| AT5G44870 | LAZ5 | TNL | non-stop | - |
| AT5G45050 | RRS1B | TNL | non-stop | - |
| AT5G45250 | RPS4 | TNL | protein coding | LRR truncation |
| AT5G45260 | RRS1 | TNL | non-stop | - |
| AT5G46270 | - | TNL | non-stop; protein coding | LRR truncation |
| AT5G48620 | - | CNL | non-stop | - |
| AT5G58120 | DM10 | TNL | non-stop; protein coding | LRR truncation |

* indicates loci where exonic proximal polyadenylation generates transcripts that may be protein coding due to upstream ORFs.

† indicates loci where exonic proximal polyadenylation coupled with intron retention results in a protein-coding ORF.

The effects of FPA activity can be broadly classified into three modes of control involving (i) readthrough and chimeric RNAs, (ii) intronic poly(A) sites, and (iii) poly(A) sites within protein-coding exons. At certain complex loci, FPA can affect poly(A) site choice using combinations of these different modes of regulation.
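To make the three modes concrete, the sketch below assigns a poly(A) site to a category from its position relative to a gene model. It is a simplified illustration and not the procedure used in the study, where NLR loci were examined manually: the function name `classify_poly_a_site`, the coordinate conventions (plus strand, 1-based inclusive intervals) and the toy gene model are all hypothetical.

```python
def classify_poly_a_site(position, exons, cds):
    """Classify a poly(A) site on a plus-strand gene (hypothetical helper).

    position: genomic coordinate of the cleavage site
    exons:    sorted list of (start, end) exon intervals, inclusive
    cds:      (start, end) of the coding region, inclusive
    """
    gene_start, gene_end = exons[0][0], exons[-1][1]
    if position < gene_start:
        raise ValueError("site lies upstream of the annotated gene")
    if position > gene_end:
        # Beyond the annotated 3' end: readthrough, possibly chimeric RNA.
        return "readthrough"
    if not any(start <= position <= end for start, end in exons):
        # Inside the gene but between exons.
        return "intronic"
    if cds[0] <= position <= cds[1]:
        # Inside an exon and inside the coding region.
        return "exonic_coding"
    # Inside an exon but in a 5' or 3' UTR.
    return "exonic_utr"


# Toy gene model (invented coordinates, for illustration only).
toy_exons = [(1, 100), (201, 300), (401, 500)]
toy_cds = (50, 450)
labels = [classify_poly_a_site(p, toy_exons, toy_cds) for p in (150, 250, 480, 600)]
```

A real analysis would also need to handle minus-strand genes, multiple isoforms and the reannotated loci described above, which is one reason manual inspection was used for these complex genes.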

For 17 NLR genes, we found that a change in FPA activity altered the formation of readthrough or chimeric RNAs containing one or more NLR loci (Table 1). The duplicated RPP7a/b-like genes AT1G58848 and AT1G59218 (which form part of the RPP7 cluster containing five CNL-class NLRs) displayed increased readthrough into downstream transposable elements (TEs) in fpa-8 (Figure 4A). EMD tests could not be performed at these loci due to the multi-mapping of reads at these duplicated genes (AT1G58848 and AT1G59218). Loss of FPA function can also lead to clusters of two or more NLR genes being co-transcribed as a single transcriptional unit. For example, the TNL-class gene AT1G63730, located in the B4/RLM1 cluster, forms chimeric RNA with the downstream TNL-class gene AT1G63740 in fpa-8 (Helicos EMD = 1099, FDR = 0.02; Figure 4—figure supplement 2).

Figure 4 (with 4 supplements)
FPA-dependent alternative polyadenylation of NLR transcripts.

FPA controls (A) readthrough and chimeric RNA formation at AT1G58848 (unique mapping of short Helicos DRS reads was not possible due to the high homology of AT1G58848 to tandemly duplicated NLR loci in the same cluster); (B) intronic polyadenylation at AT1G69550, resulting in transcripts encoding a protein with a truncated LRR domain; (C) exonic polyadenylation at AT2G14080, resulting in stop-codonless transcripts; and (D) exonic polyadenylation at AT5G40060, resulting in transcripts encoding a TIR-domain-only protein due to an upstream ORF.

We identified another 17 NLR genes with intronic polyadenylation controlled by FPA (Table 2). Of these, four contained poly(A) sites in 5′UTR introns (which would result in non-coding transcripts) and three contained alternative poly(A) sites after the stop codon (which could alter potential regulatory sequences contained in 3′UTRs). The remainder contained poly(A) sites in introns between protein-coding exons. Selection of these poly(A) sites introduces premature stop codons that result in truncated open reading frames (ORFs). For example, we identified a proximal poly(A) site within the third intron of AT1G69550, which encodes a TNL-type singleton NLR (Figure 4B). Use of this poly(A) site results in mRNAs with a premature stop codon; the encoded protein lacks most of the predicted LRR domain. In fpa-8, readthrough at this poly(A) site is increased (Helicos EMD = 1271, FDR = 1.2 × 10⁻⁴), resulting in an increase in the relative number of full-length transcripts.

The most common form of FPA-dependent NLR regulation was premature termination within exons (Table 3). We identified 45 NLRs controlled in this way: at 44 of these loci, termination occurred within protein-coding exons. In most cases, this results in stop-codonless transcripts that are predicted targets of non-stop decay (Szádeczky-Kardoss et al., 2018). Many of these proximal exonic poly(A) sites could be identified at lower levels in Col-0. For example, at RPP28 (AT2G14080), which encodes a TNL-class singleton NLR, we detected multiple exonic poly(A) sites located within the second and fourth exons, which encode the NB-ARC and LRR domains, respectively (Figure 4C). Selection of these exonic poly(A) sites was increased in 35S::FPA:YFP (Helicos EMD = 859, FDR = 5.4 × 10⁻⁹) and decreased in fpa-8 (Helicos EMD = 912, FDR = 7.6 × 10⁻⁹). FPA was also found to promote premature termination in the protein-coding sequence of single-exon, intronless NLR genes. For example, at RPP13 (AT3G46530), which encodes a CNL-class NLR protein, FPA overexpression causes selection of proximal poly(A) sites located within the region encoding the LRR domain (Helicos EMD = 228, FDR = 1.8 × 10⁻⁴; Figure 4—figure supplement 3).
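The notion of a stop-codonless transcript can be stated simply: cleavage occurs before the ribosome would reach any in-frame stop codon. The sketch below checks this for a transcript sequence truncated at a poly(A) site. It is a minimal illustration, not part of the study's pipeline; the function name `is_non_stop`, the assumption that the reading frame starts at a known start codon, and the toy sequences are all ours.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def is_non_stop(transcript, cds_start=0):
    """Return True if no in-frame stop codon occurs before the 3' end.

    transcript: mRNA or cDNA sequence ending at the cleavage site
                (the poly(A) tail itself is not included)
    cds_start:  0-based index of the first base of the start codon
    """
    seq = transcript.upper().replace("U", "T")
    # Walk the reading frame codon by codon; an incomplete final codon is ignored.
    for i in range(cds_start, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return False
    return True


# Toy sequences (invented): a full-length ORF and the same ORF cleaved early.
full_length = "ATGAAATTTTAA"      # ATG AAA TTT TAA, ends with a stop codon
prematurely_cleaved = "ATGAAATTT"  # cleaved before the stop codon
out_of_frame_stop = "ATGATAGAAA"   # contains TAG, but not in frame
```

In this toy case `is_non_stop(prematurely_cleaved)` is true, whereas the full-length sequence is not classified as non-stop; a stop codon that is present only out of frame does not rescue the transcript.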

Although the most frequent consequence of FPA selection of exonic poly(A) sites was stop-codonless transcripts, we also identified examples where the protein-coding potential was altered. For example, AT5G40060 encodes a TNL-class NLR but has a premature stop codon between the TIR and NB-ARC domains. Consequently, full-length transcription results in an mRNA with an upstream ORF (uORF) encoding the TIR domain and a larger downstream ORF encoding NB-ARC and LRR domains (Figure 4D). However, transcripts with such large uORFs are targets of NMD in plants (Nyikó et al., 2009). Therefore, FPA-dependent proximal polyadenylation in the region encoding the NB-ARC domain results in a transcript containing only the uORF, which is not a predicted NMD target and may be more efficiently translated into a TIR-only protein.

In seven of the identified genes, exonic proximal polyadenylation is associated with retention of an upstream intron (Table 3). As a result, premature stop codons are introduced, resulting in a truncated coding region. For example, the TNL-type NLR RPS4 was previously shown to be regulated by alternative splicing induced by the effector AvrRps4 (Zhang and Gassmann, 2007). We identified an increase in RPS4 intron 3 retention in 35S::FPA:YFP compared with Col-0 that was associated with proximal polyadenylation events in exon 4 (Helicos EMD = 34, not significant; Figure 4—figure supplement 4). Therefore, inter-dependence between splicing and poly(A) site choice may explain RPS4 control.

FPA controlled NLR poly(A) site selection at 16 complex loci with combinations of intronic, exonic, and readthrough sites. One example is RPP4 (AT4G16860), a TNL-class NLR known to mediate Arabidopsis resistance to Hpa isolate Emoy2 (Hpa-Emoy2) (van der Biezen et al., 2002). RPP4 is part of the RPP5 cluster, comprising seven TNL-class NLRs. In agreement with a previous study (Wang and Warren, 2010), we found that in wild-type Col-0, RPP4 can be transcribed as a chimeric RNA together with the downstream AtCOPIA4 TE (AT4G16870) through selection of one of the two distal poly(A) sites located within the TE (Figure 5; Wang and Warren, 2010) or selection of a third poly(A) site in the downstream gene AT4G16857. Use of the proximal poly(A) site within the TE is associated with an approximately 8 kb cryptic splicing event between the 5′ splice site of the first exon of RPP4 and a 3′ splice site located within the TE. Both Nanopore DRS and Illumina RNA-Seq data provide evidence for this cryptic splicing event, which skips all RPP4 exons downstream of exon 1, removing most of the RPP4 coding sequence and introducing a stop codon (Figure 5, Inset 1). The resulting transcript is predicted to encode a TIR-domain-only protein. Loss of FPA function decreases chimeric RNA formation by shifting poly(A) site selection towards a proximal poly(A) site located within the protein-coding region of the final exon (Figure 5—figure supplement 1). This results in the production of RPP4 transcripts lacking in-frame stop codons (Figure 5, Inset 2). Furthermore, in 35S::FPA:YFP, we observed increased selection of a proximal poly(A) site located within the first intron of RPP4, which would also encode a truncated RPP4 protein. We conclude that FPA-dependent alternative polyadenylation at RPP4 produces transcripts with unusually long 3′UTRs, alternative protein isoforms and transcripts that cannot be efficiently translated.

Figure 5 (with 1 supplement)
Complex FPA-dependent patterns of alternative polyadenylation at RPP4.

FPA-dependent intronic, exonic and readthrough poly(A) site selection in RPP4. (Inset 1) A magnified view of TIR-domain-only RPP4 transcripts detected in 35S::FPA:YFP caused by proximal polyadenylation in intron 1, and distal polyadenylation within the TE associated with cryptic splicing. (Inset 2) A magnified view of the stop-codonless transcripts produced within the protein-coding RPP4 region in fpa-8.

FPA controls RPP7 by promoting premature termination within protein-coding exon 6

To examine the functional impact of FPA on the regulation of NLRs, we focused on RPP7. RPP7 encodes a CNL-class NLR protein which is necessary for resistance to Hpa isolate Hiks1 (Hpa-Hiks1) in Col-0 (McDowell et al., 2000). The full-length expression of RPP7 is controlled by elongation factors that interact with H3K9me2, which is associated with the COPIA-type retrotransposon (COPIA-R7) located in RPP7 intron 1 (Saze et al., 2013). Using Nanopore and Helicos DRS data, we identified at least two poly(A) sites within the COPIA-R7 element, both of which were selected more frequently in fpa-8 (Figure 6A, Figure 6—figure supplement 1). We also identified two poly(A) sites within the second intron of RPP7. The use of both sites is reciprocally sensitive to FPA activity, with a moderate decrease in fpa-8 and an increase in 35S::FPA:YFP. All these intronic proximal poly(A) sites are located before the start of the RPP7 ORF and generate transcripts that do not encode RPP7 protein. At the 3′ end of RPP7, we found three alternative poly(A) sites located in the terminal intron, in addition to the previously reported most distal and most commonly used poly(A) site in the terminal exon (Figure 6A, Inset 1) (Tsuchiya and Eulgem, 2013). Selection of each of these poly(A) sites is associated with alternative splicing events that lead to the generation of four possible 3′UTR sequences. Termination at the 3′UTR intronic poly(A) sites is suppressed by FPA: their usage is increased in fpa-8 and decreased in 35S::FPA:YFP. These data indicate that FPA influences RPP7 intronic polyadenylation at a larger number of poly(A) sites than previously supposed.

Figure 6 (with 1 supplement)
FPA promotes premature cleavage and polyadenylation within RPP7 protein-coding exon 6 that compromises plant immunity against Hpa-Hiks1.

(A) FPA-dependent RNA 3′ end formation changes at the RPP7 (AT1G58602) locus. (Inset 1) Magnified view of the RPP7 3′UTR region with alternative RNA 3′ ends. (Inset 2) Magnified view of the stop-codonless transcripts produced in protein-coding RPP7 exon 6. (B) RNA gel blot visualising RPP7 transcripts in Col-0, fpa-8 and 35S::FPA:YFP. Probe location in second exon is shown on (A) (light brown). Beta-TUBULIN was used as an internal control. (C) FPA-dependent premature exonic termination of RPP7 compromises immunity against Hpa-Hiks1. Point plot showing median number of sporangiophores per plant calculated 4 days after Hpa-Hiks1 inoculation. Error bars are 95% confidence intervals. Each experimental replicate was generated from 7 to 45 plants per genotype.

Figure 6—source data 1

Hpa-Hiks1 susceptibility results for the Col-0, Ksk-1, fpa-7, fpa-8, pFPA::FPA and 35S::FPA:YFP lines [Linked to Figure 6C].

https://cdn.elifesciences.org/articles/65537/elife-65537-fig6-data1-v3.csv

The major effect of FPA on RPP7 is within protein-coding exon 6, where we identified three poly(A) sites (Figure 6A, Inset 2): two at the end of the region encoding the NB-ARC domain and one within the region encoding the LRR repeats. Cleavage and polyadenylation at these sites result in transcripts without in-frame stop codons, thereby disrupting the coding potential of RPP7 mRNA. These poly(A) sites were identified in both Helicos and Nanopore DRS data, indicating that they are unlikely to be caused by alignment errors. The relative selection of exon 6 poly(A) sites depends on FPA expression: in Col-0, 25% of RPP7 Nanopore DRS reads terminate at one of these exon 6 poly(A) sites; and when FPA is overexpressed, this figure increases to 63%. Consistent with this, a relative drop in coverage at exon 6 was also observed in 35S::FPA:YFP Illumina RNA-Seq data. Consequently, only 23% of RPP7 transcripts are expected to encode RPP7 protein in the FPA-overexpressing line. In contrast, 4% of RPP7 Nanopore DRS reads identified in fpa-8 terminate in exon 6, and 79% of transcripts are expected to be protein coding. In an orthogonal approach, we used RNA gel blot analysis to visualise RPP7 mRNAs in Col-0, fpa-8, and 35S::FPA:YFP backgrounds and detected a clear decrease in signal corresponding to full-length RPP7 transcripts in 35S::FPA:YFP (Figure 6B). These data support previous evidence of FPA-dependent control of RPP7 (Deremetz et al., 2019) but reveal that the predominant mechanism is via exonic transcription termination.

RPP7-dependent immunity to the biotrophic pathogen Hpa is sensitive to FPA expression

We next asked whether FPA-dependent premature transcription termination at RPP7 exon 6 has a functional consequence. Since FPA reduced the level of full-length protein-coding RPP7 transcripts, we asked whether increased FPA activity might compromise RPP7-dependent immunity. To test this hypothesis, we carried out pathogenesis assays using the oomycete strain Hpa-Hiks1. RPP7 function is required for immunity to Hpa-Hiks1 in Col-0 (McDowell et al., 2000). The Keswick (Ksk-1) accession is susceptible to Hpa-Hiks1 (Lai et al., 2019) and we used it as a control in these studies.

We inoculated Arabidopsis seedlings with Hpa-Hiks1 spores in three independent experiments. Four days after inoculation, we checked susceptibility by counting the number of sporangiophores. With the exception of Ksk-1, all of the lines we tested were in a Col-0 background. As expected, Col-0 plants were resistant to infection (median: 0 sporangiophores per plant), and Ksk-1 plants were sensitive to infection (median: 5 sporangiophores per plant; p=1.7 × 10⁻³²; Figure 6C). fpa-7 mutants were as resistant to infection as Col-0 (median: 0 sporangiophores per plant, p=0.19). This is consistent with our finding that full-length RPP7 transcript expression is not reduced in the absence of FPA. fpa-8 mutants were also resistant to infection (median: 0 sporangiophores per plant); however, there was slight variability in their resistance compared to fpa-7 (p=2.4 × 10⁻¹²). This variability was not restored by complementation with a pFPA::FPA transgene (p=0.23), indicating that it is not caused by loss of FPA function and is likely to result from other mutations in the fpa-8 background. In contrast, 35S::FPA:YFP plants were significantly more sensitive to Hpa-Hiks1 than pFPA::FPA (median: 3 sporangiophores per plant; p=3.8 × 10⁻⁹), indicating that overexpression of FPA compromises immunity. We conclude that FPA control of poly(A) site selection can modulate NLR function, with a functional consequence for immunity.

Discussion

We have identified a novel role for the RNA-binding protein FPA in the control of plant innate immunity. Using IVI-MS proteomics and ChIP-Seq, we showed that FPA is closely associated with proteins involved in RNA 3′ processing and co-localises with Ser2 phosphorylated Pol II at the 3′ ends of genes. Integrative analysis using three RNA sequencing technologies confirmed that the major effect of modulating FPA activity is to alter poly(A) site selection. An unexpected finding was that half of expressed NLR loci were sensitive to FPA activity. In most cases, FPA promoted the use of poly(A) sites within protein-coding exons of NLR genes. At RPP7, an increase in exonic polyadenylation caused by FPA overexpression was shown to compromise immunity to Hpa-Hiks1. The widespread nature of this control mechanism suggests that transcription termination plays an important role in the regulatory and evolutionary dynamics of NLR genes.

Uncovering protein assemblies that mediate 3′ end processing in living plant cells

We used an in vivo formaldehyde cross-linking approach to identify proteins that co-localise with FPA inside living plant cells. These data provide in-depth knowledge of the proteins involved in Arabidopsis RNA 3′ end processing and clues to the function of the uncharacterised proteins identified here. Components of the m6A writer complex also co-purify with FPA. However, unlike related proteins in human and Drosophila (Knuckles et al., 2018; Patil et al., 2016), we found that FPA is not required to maintain global levels of m6A modification in Arabidopsis.

Two Arabidopsis PCF11 paralogs with Pol II CTD-interacting domains (CIDs), PCFS2 and PCFS4, co-purified with FPA, but two paralogs lacking CIDs, PCFS1 and PCFS5, did not. PCF11 was previously shown to have functionally separable roles in transcription termination and cleavage and polyadenylation (Sadowski et al., 2003): the N-terminal PCF11 CID is required for transcription termination, whereas the C-terminal domains are required for cleavage and polyadenylation. The specific interaction of FPA with CID-containing PCF11 paralogs suggests that FPA controls alternative polyadenylation by altering Pol II speed and transcription termination. The human SPOC domain protein PHF3 can bind to two adjacent Ser2 phosphorylated heptads of the CTD of Pol II via two electropositive patches on the surface of its SPOC domain (Appel et al., 2020). One of these patches, and the key amino acid residues within it, is conserved in the structure of the FPA SPOC domain (Zhang et al., 2016). Consequently, FPA might also interact with the CTD, possibly in conjunction with CID domains of PCFS2 and PCFS4. Such interactions could account for the global correlation between FPA and Pol II Ser2 occupancy and explain how FPA is able to associate with terminating Pol II at the 3′ ends of most expressed genes.

Widespread control of NLR transcription termination by FPA

An unanticipated finding of this study is that Arabidopsis NLR genes were enriched amongst loci with FPA-sensitive poly(A) sites. NLRs function in the immune response and, consistent with this crucial role, they are under powerful and dynamic selective pressure. Defining the inventory of Arabidopsis NLRs depended on long-read DNA sequencing of diverse accessions (Van de Weyer et al., 2019). Here, we show that long-read Nanopore DRS provides insight into the authentic complexity of NLR mRNA processing and enables the accurate annotation of NLR genes. For example, our reannotation of the RPS6 locus is essential to understand the recurring role of RPS6 in autoimmunity. The autoimmune phenotypes of mutants defective in NMD or the mitogen-activated kinase pathway are RPS6 dependent, but the mechanisms involved are unclear (Gloggnitzer et al., 2014; Takagi et al., 2020). We found that RPS6 is transcribed through a previously unrecognised downstream gene that encodes an RPS6-like TIR domain. We showed that steady-state levels of transcripts from the downstream gene are limited by degradation that depends on the RNA exosome component HEN2. In addition, mutations in HEN2 were recently identified as suppressors of RPS6-dependent autoimmune phenotypes (Takagi et al., 2020). It is clear that accurate annotation of complex NLR loci facilitates the interpretation of basic features of NLR function.

Of the 124 NLRs with detectable expression in Nanopore DRS data, 62 were sensitive to FPA activity. FPA controls 3′ end formation of NLR genes in three different transcript locations (Figure 7): (i) 3′UTRs, where it can prevent readthrough and chimeric RNA formation; (ii) introns, where it promotes proximal polyadenylation; and (iii) protein-coding exons, where it promotes stop-codonless transcript formation. The consequences of such complex control of RNA 3′ end formation are wide-ranging and likely to be context dependent (Mayr, 2019).

Figure 7
Functional consequences of FPA-dependent alternative polyadenylation at NLR loci.

Model diagram showing how FPA-dependent alternative polyadenylation at NLR loci might affect the regulatory and evolutionary dynamics of plant disease resistance.

Where FPA controls readthrough and chimeric RNA formation, it affects 3′UTR length, sequence composition and cryptic splice site usage. Long or intron-containing 3′UTRs are targeted by NMD, leading to RNA decay or suppressed translatability. Long, unstructured 3′UTRs influence intermolecular RNA interactions and phase separation, changing the subcellular localisation of mRNAs (Ma et al., 2020). The close proximity of mRNAs in the resulting granules may enable co-translational protein complex formation. Readthrough transcription may also disrupt the expression of downstream genes by transcription interference (Proudfoot, 1986).

FPA-dependent premature transcription termination at intronic poly(A) sites can introduce novel stop codons, resulting in transcripts that encode truncated NLR proteins with altered functions. For example, some TIR-domain-only proteins are known to be active in NLR regulation, either through constitutive signalling activity (Zhang et al., 2004) or by acting as competitive inhibitors that titrate full-length NLR protein (Williams et al., 2014). In other cases, TIR-domain-only proteins are sufficient for pathogen recognition (Nishimura et al., 2017). The TE-containing 3′UTR of RPP4 appears to be required for resistance to the pathogen Hpa-Emoy2, although the mechanism involved is unclear (Wang and Warren, 2010). We discovered that cryptic splicing of RPP4 exon 1 to a novel 3′ splice site within the TE can produce a unique transcript that encodes only the RPP4 TIR domain. It will be interesting to examine whether the TIR-only RPP4 isoform is required for full pathogen resistance. We also found that intron retention at RPS4, which is essential for RPS4-dependent resistance against P. syringae DC3000 (Zhang and Gassmann, 2007), is linked to exonic proximal polyadenylation. Intron retention without accompanying proximal polyadenylation will result in transcripts with long 3′UTRs that are likely to be sensitive to NMD, whereas proximally polyadenylated transcripts could be translated into truncated protein. Therefore, a combination of alternative polyadenylation and splicing probably underpins RPS4 control. In future, sensitive proteomic analyses will be important to determine the impact of alternative polyadenylation on NLR protein isoform expression.

A remarkable finding was that FPA mostly targets the protein-coding exons of NLR genes and even controls premature transcription termination within the ORF of single-exon NLR genes such as RPP13. Premature transcription termination in protein-coding exons results in the production of stop-codonless transcripts that cannot be efficiently translated into protein. These truncated transcripts may be subject to decay by RNA surveillance mechanisms (e.g. the non-stop decay pathway) or act as non-coding RNA decoys to titrate the levels of regulatory microRNAs (Shivaprasad et al., 2012). Increased rates of NLR transcription in plants under pathogen attack could promote elongation through such ‘regulatory’ poly(A) sites. In this way, the expression of NLR proteins might be restricted during pathogen surveillance but kept poised for rapid activation during infection.

Since the evolution of cis-regulatory elements controlling poly(A) site choice within introns or 3′UTRs is free from the constraints of protein-coding functionality, why should protein-coding exons be targeted so frequently? One possibility is that this enables the expression of newly created NLR genes to be kept under tight control, thereby facilitating rapid evolution whilst reducing the chances of autoimmunity (Figure 7). This hypothesis is strengthened by the finding that many NLRs with high allelic diversity (Prigozhin and Krasileva, 2021) are sensitive to FPA activity. Alternative polyadenylation might also function to hide NLR genes from negative selection and contribute to cryptic genetic variation in a similar way to the mechanism proposed for NMD- and microRNA-mediated NLR control (Raxwal and Riha, 2016; Shivaprasad et al., 2012). Cryptically spliced chimeric RNAs, with subsequent retrotransposition, can be a source of new genes (Akiva et al., 2006). Therefore, the control of transcription termination could directly facilitate the neofunctionalisation of NLRs. In the future, it will be important to compare patterns of transcription termination at NLRs across Arabidopsis accessions. For example, analysis of transcriptomic data will determine whether proximal polyadenylation is conserved in NLRs with high allelic diversity, whilst an integrative analysis of transcriptomic and genomic data will establish whether chimeric NLR transcripts identified in some accessions are found as retrotransposed genes in others.

At least two distinct patterns of alternative polyadenylation mediate RPP7 regulation, one involving intronic heterochromatin (Tsuchiya and Eulgem, 2013) and another involving FPA-dependent termination in exon 6. The latter mechanism is conserved across all NLR genes of the Col-0 RPP7 locus (Table 3). Alleles of these RPP7-like NLR genes have been identified as the causes of specific cases of hybrid necrosis (Barragan et al., 2019; Chae et al., 2014; Li et al., 2020). In these cases, autoimmunity is explained by allele-specific physical interactions between RPP7 protein and the RPW8-only protein HR4 (Li et al., 2020). We found that not only are RPP7-like genes targeted by FPA-dependent premature transcript termination, but so too is HR4 (Table 2). This raises the possibility that FPA could rescue hybrid necrosis by limiting the expression of these proteins. FPA also appears to control the proximal polyadenylation of DANGEROUS MIX 10 (DM10), producing transcripts that could encode a protein with truncated LRR repeats. DM10 alleles with LRR truncations have been demonstrated to cause autoimmunity in specific crosses (Barragan et al., 2021), suggesting that in other cases FPA overexpression could trigger or enhance autoimmune phenotypes. Consequently, modulation of transcription termination may shift the balance of costs and benefits associated with NLR gene expression. This phenomenon is not likely to be restricted to FPA because mutations in the RNA 3′ processing factor CPSF30 can also suppress autoimmunity (Bruggeman et al., 2014).

The impact of FPA overexpression on gene expression and immunity revealed here derives from artificial transgene expression. However, pathogens could similarly modulate NLR activity by evolving effectors that target the expression or activity of factors controlling NLR poly(A) site choice. Consistent with this idea, the HopU1 effector of P. syringae targets the RNA-binding protein AtGRP7 (Fu et al., 2007), which co-purified with FPA. In addition, the Pi4089 effector of the oomycete pathogen Phytophthora infestans targets the KH-domain RNA-binding protein StKRBP1 in potato; as a result, the abundance of StKRBP1 increases and infection by P. infestans is enhanced (Wang et al., 2015). This precedent reveals that effector-mediated increases in RNA-binding protein abundance can transform host RNA-binding proteins into susceptibility factors. Phylogenetic analysis of StKRBP1 suggests that a direct homolog is absent in Brassicaceae. However, the most closely related Arabidopsis proteins are FLK and PEP (Zhang et al., 2020a), both of which co-purify with FPA and have been shown to regulate poly(A) site choice (Rodríguez-Cazorla et al., 2015). FPA, GRP7, FLK and PEP, along with other RNA-binding proteins, act in concert to fine-tune the timing of flowering through the regulation of FLC. In a similar way, RNA-binding protein-dependent modulation of NLR expression might explain how quantitative disease resistance occurs (Corwin and Kliebenstein, 2017).

New ways to analyse RNA processing

An essential feature of our study was the introduction of new approaches to study RNA processing and 3′ end formation. The use of long-read Nanopore DRS transformed our understanding of the complexity of NLR gene expression by providing insight that short-read Illumina RNA-Seq and Helicos DRS could not. We recently showed that Nanopore DRS mapping of RNA 3′ ends closely agrees with short-read Helicos DRS, and that Nanopore DRS is not compromised by internal priming artefacts (Parker et al., 2020). Consequently, we used Nanopore DRS to quantify alternative patterns of cleavage and polyadenylation. We also introduced a new approach to analyse alternative polyadenylation by applying the EMD metric. EMD incorporates information on both the relative abundance of alternative poly(A) sites and the genomic distance between them. This is valuable because large distances between poly(A) sites are more likely to impact the mRNA coding potential or trigger mRNA surveillance compared with subtle changes in 3′UTR length.

A limitation of short-read analyses of RNA processing is their dependence upon reference transcript annotations because these may be incomplete. For example, in disease or mutant conditions, RNA processing often occurs at novel sites that are not present in reference transcriptomes (as was the case here for NLR genes). For this reason, using long-read sequencing data to generate bespoke reference transcriptomes for the genotypes under analysis can increase the value of short-read sequencing data. Until the throughput of long-read sequencing matches that of short-read technologies, a combined approach is likely to be generally useful in interpreting transcriptomes.

Concluding remarks

It is difficult to identify alternative polyadenylation from conventional short-read RNA-Seq data. As a result, the impact of alternative polyadenylation is probably under-reported. Here we show that premature transcription termination of NLR genes is widespread. Using Nanopore DRS, we improved the accuracy of NLR annotation and revealed a layer of NLR gene regulation that may also influence the dynamics of NLR evolution. The continued development of approaches that reveal full-length native RNA molecules is likely to provide new insight into other important, but previously unrecognised, aspects of biology.

Materials and methods

Key resources table
Reagent type (species) or resource | Designation | Source or reference | Identifiers | Additional information
--- | --- | --- | --- | ---
Strain (Arabidopsis thaliana) | Columbia (Col-0) | NA | ABRC: CS22625 | Country of Origin: USA
Strain (Arabidopsis thaliana) | Keswick (Ksk-1) | Lai and Eulgem, 2018 | ABRC: CS1634 | Country of Origin: UK
Gene (Arabidopsis thaliana) | FPA | NA | TAIR/ABRC: AT2G43410 | -
Gene (Arabidopsis thaliana) | RPP7 | NA | TAIR/ABRC: AT1G58602 | -
Genetic reagent (Arabidopsis thaliana) | fpa-7 | Duc et al., 2013 | ABRC: SALK_021959C | T-DNA insertion mutant in Col-0 background. Gifted by R. Amasino, UW-Madison.
Genetic reagent (Arabidopsis thaliana) | fpa-8 | Bäurle et al., 2007 | TAIR: 4515120225 | EMS point mutation in Col-0 background. Gifted by C. Dean, John Innes Centre
Genetic reagent (Arabidopsis thaliana) | 35S::FPA:YFP fpa-8 | Bäurle et al., 2007 | NA | Transgenic line in fpa-8 background, gifted by C. Dean, John Innes Centre
Genetic reagent (Arabidopsis thaliana) | pFPA::FPA fpa-8 | Zhang et al., 2016 | NA | Transgenic line in fpa-8 background.
Genetic reagent (Arabidopsis thaliana) | vir-1 | Růžička et al., 2017 | TAIR: 6532672723 | EMS point mutant in Col-0 background. Gifted by K. Růžička, Brno.
Commercial assay, kit | RNeasy Plant Mini kit | QIAGEN | Cat#: 74904 | -
Commercial assay, kit | SuperScript III Reverse Transcriptase | Thermo Fisher Scientific | Cat#: 18080044 | -
Commercial assay, kit | NEBNext Ultra Directional RNA Library Prep Kit for Illumina | New England Biolabs | Cat#: E7420 | -
Commercial assay, kit | Dynabeads mRNA Purification Kit | Thermo Fisher Scientific | Cat#: 61006 | -
Commercial assay, kit | Nanopore Direct RNA sequencing kit | Oxford Nanopore Technologies | Cat#: SQK-RNA001 | -
Commercial assay, kit | MinION Flow cell r9.4 | Oxford Nanopore Technologies | Cat#: FLO-MIN106 | -
Peptide, recombinant protein | T4 DNA ligase | New England Biolabs | Cat#: M0202 | -
Commercial assay, kit | Quick Ligase reaction buffer | New England Biolabs | Cat#: B6058S | -
Commercial assay, kit | Agencourt RNAclean XP magnetic beads | Beckman Coulter | Cat#: A63987 | -
Commercial assay, kit | Qubit RNA BR Assay Kit | Thermo Fisher Scientific | Cat#: Q10210 | -
Commercial assay or kit | RNA ScreenTape System | Agilent | Cat#: 5067–5576 - 5067–5578 | -
Antibody | FPA antibody | Covance | NA | Rabbit polyclonal antibody. Raised against FPA amino acids 536–901.
Chemical compound | [γ−32P]-ATP | Perkin Elmer | Cat#: BLU012H250UC | -
Commercial assay or kit | DECAprime II DNA labelling kit | Thermo Fisher Scientific | Cat#: AM1455 | -
Commercial assay or kit | Illustra MicroSpin G-50 Columns | GE Healthcare | Cat#: 27-5330-01 | -
Commercial assay or kit | RiboRuler High Range RNA Ladder | Thermo Fisher Scientific | Cat#: SM1821 | -
Peptide, recombinant protein | FastAP Thermosensitive Alkaline Phosphatase | Thermo Fisher Scientific | Cat#: EF0651 | -
Peptide, recombinant protein | T4 Polynucleotide Kinase | Thermo Fisher Scientific | Cat#: EK0031 | -
Peptide, recombinant protein | Nuclease P1 | Merck | Cat#: N8630-1VL | -
Peptide, recombinant protein | Calf Intestinal Alkaline Phosphatase | New England Biolabs | Cat#: M0290S | -
Chemical compound | N6-Methyladenosine (m6A), Modified adenosine analog | Abcam | Cat#: ab145715 | -
Chemical compound | Adenosine, Endogenous P1 receptor agonist | Abcam | Cat#: ab120498 | -
Commercial assay or kit | GFP-Trap Agarose | Chromotek | Cat#: gta-20 | -
Software, algorithm | d3pendr | 10.5281/zenodo.4319112 | NA | Scripts to perform differential 3' end analysis using Nanopore DRS or Helicos DRS data
Software, algorithm | Simpson_Barton_FPA_NLRs | 10.5281/zenodo.4319108 | NA | All pipelines, scripts and notebooks used for analyses in this manuscript.

Plants

Plant material and growth conditions

The wild-type Col-0 accession and fpa-7 were obtained from the Nottingham Arabidopsis Stock Centre. The fpa-8 mutant (Col-0 background) and 35S::FPA:YFP in fpa-8 (Bäurle et al., 2007) were provided by C. Dean (John Innes Centre). Generation of the pFPA::FPA line was previously described (Zhang et al., 2016). Surface-sterilised seeds were sown on MS10 medium plates containing 2% agar, stratified at 4°C for 2 days, germinated in a controlled environment at 20°C under 16 hr light/8 hr dark conditions and harvested 14 days after transfer to 20°C.

IVI-MS

Preparation of IVI-MS samples

Seedlings were harvested 14 days after germination and cross-linked with 1% (v/v) formaldehyde under vacuum. The cross-linking reaction was stopped after 15 min by the addition of glycine to a final concentration of 0.125 M, and the seedlings were returned to vacuum for a further 5 min. Nuclei were isolated from frozen ground plant tissue using Honda buffer (20 mM Hepes-KOH pH 7.4, 10 mM MgCl2, 440 mM sucrose, 1.25% (w/v) Ficoll, 2.5% (w/v) Dextran T40, 0.5% (v/v) Triton X-100, 5 mM DTT, 1 mM PMSF, 1% (v/v) Protease Inhibitor Cocktail (Sigma)) and collected by centrifugation at 2000 g for 17 min at 4°C. Nuclei were washed twice with Honda buffer (centrifugation at 1500 g for 15 min at 4°C between washes) and lysed in nuclear lysis buffer (50 mM Tris-HCl pH 8, 10 mM EDTA, 1% (w/v) SDS, 1 mM PMSF, 1% (v/v) Protease Inhibitor Cocktail) by sonication for four cycles of 30 s pulses with low power and 60 s cooling between pulses using a Bioruptor UCD-200 (Diagenode). Following centrifugation (16,100 g for 10 min at 4°C), the supernatant was diluted 10-fold with sample dilution buffer (16.7 mM Tris-HCl pH 8, 167 mM NaCl, 1.1% (v/v) Triton X-100, 1% (v/v) Protease Inhibitor Cocktail). Cross-linked protein complexes were isolated with GFP-trap agarose beads (Chromotek) and incubated at 4°C with constant rotation for 5 hr, followed by centrifugation (141 g for 3 min at 4°C). Beads were washed three times with washing buffer (150 mM NaCl, 20 mM Tris-HCl pH 8, 2 mM EDTA pH 8, 1% (v/v) Triton X-100, 0.1% (w/v) SDS, 1 mM PMSF) with centrifugation between washes (400 g for 3 min at 4°C). Samples were incubated at 90°C for 30 min to reverse the cross-linking prior to SDS-PAGE. Each biological replicate was separated into five fractions following SDS-PAGE, subjected to in-gel digestion with trypsin and submitted for LC-MS/MS analysis (LTQ Orbitrap Velos Pro mass spectrometer; Thermo Fisher Scientific). Three biological replicates were performed for each genotype.

IVI-MS data analysis

Raw peptide data files from IVI-MS were analysed by MaxQuant software (version 1.6.10.43) (Cox and Mann, 2008). Peptide tables were then loaded using Proteus (version 0.2.14) (Gierlinski et al., 2018) and summarised to protein level counts using the hi-flyer method (mean of the top three most abundant peptides). Because wild-type plants lacking GFP were used as controls, a large number of the proteins enriched by immunoprecipitation were below the detection threshold in the control. This group of proteins can be classified as ‘missing not at random’ (MNAR). In all proteomics experiments, there will also be a number of proteins which are not detected purely by chance: these are referred to as ‘missing at random’ (MAR). We treated proteins that were missing from all replicates of a condition as MNAR, and proteins that were missing only from a subset of replicates as MAR. Using the imputeLCMD package (version 2.0) (Lazar, 2015), a k-nearest neighbours strategy was used to impute MAR examples, and a quantile regression imputation of left-censored data (QRILC) approach was used to impute MNAR examples. Differential expression analysis was performed on imputed data using limma (version 3.40.0) (Ritchie et al., 2015). Because imputation is not deterministic (i.e. will lead to different outcomes every time), we improved the robustness of the analysis by performing 999 bootstraps of the imputation and differential expression, and summarising the results using the median log2 fold change and harmonic mean p value.
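
The final summarisation step can be illustrated with a minimal Python sketch. It assumes that each bootstrap yields one log2 fold change and one p value per protein, and that the harmonic mean is the plain unweighted form; the function name and the example values are hypothetical and are not taken from the published pipeline.

```python
import statistics


def summarise_bootstraps(log2_fold_changes, p_values):
    """Collapse per-bootstrap results for one protein into a single estimate.

    Assumption: both lists hold one value per bootstrap of the
    imputation and differential expression steps.
    """
    if len(log2_fold_changes) != len(p_values) or not p_values:
        raise ValueError("need one log2 fold change and one p value per bootstrap")
    median_lfc = statistics.median(log2_fold_changes)
    # Unweighted harmonic mean of the p values (assumed form).
    harmonic_mean_p = statistics.harmonic_mean(p_values)
    return median_lfc, harmonic_mean_p


# Hypothetical results from three bootstraps of one protein.
lfc, p = summarise_bootstraps([2.0, 2.4, 1.8], [0.01, 0.02, 0.04])
```

The harmonic mean is dominated by the smaller p values, so a protein that is strongly supported in most bootstraps is not masked by a few unfavourable imputations.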

ChIP-Seq

Preparation of libraries for ChIP-Seq

ChIP against FPA and Pol II phosphorylated at either Ser5 or Ser2 of the CTD heptad repeat was performed as previously described (Yu et al., 2019). Polyclonal antibodies against FPA amino acids 536–901 were raised in rabbit by Covance.

ChIP-Seq data processing

FPA and Pol II ChIP-Seq data are available at ENA accession PRJNA449914. H3K9me2 ChIP-Seq data were downloaded from ENA accessions PRJDB5192 (Inagaki et al., 2017) and PRJNA427432 (Lai et al., 2020). Reads were aligned to the TAIR10 reference genome using Bowtie2 (version 2.3.5.1) (Langmead and Salzberg, 2012) with the parameters --mm --very-sensitive --maxins 800 --no-mixed --no-discordant. Counts per million normalised coverage profiles were generated using deepTools (version 3.4.3) (Ramírez et al., 2014). For 3′ end centred metagene profiles, we determined the major 3′ position per gene using the Araport11 annotation and existing Col-0 Helicos DRS data (Sherstnev et al., 2012). Metagenes centred on these positions were then generated in Python 3.6 using pyBigWig (version 0.3.17) (Ramírez et al., 2014), Numpy (version 1.18.1) (Harris et al., 2020) and Matplotlib (version 3.1.3) (Hunter, 2007). For differential H3K9me2 analysis, read counts per gene (including intronic regions) were generated using pysam (version 0.16.0), and differential expression analysis was performed using edgeR (version 3.22.5) (Robinson et al., 2010).

RNA

Total RNA isolation

Total RNA was isolated using RNeasy Plant Mini kit (QIAGEN) and treated with TURBO DNase (Thermo Fisher Scientific) according to the manufacturers’ instructions. The total RNA concentration was measured using a Qubit 1.0 Fluorometer and Qubit RNA BR Assay Kit (Thermo Fisher Scientific), whilst RNA quality and integrity was assessed using a NanoDrop 2000 spectrophotometer (Thermo Fisher Scientific) and Agilent 2200 TapeStation System (Agilent).

Nanopore DRS

Preparation of libraries for DRS using nanopores

Total RNA was isolated from Col-0, fpa-8 and 35S::FPA:YFP seedlings as described above. mRNA was isolated and Nanopore DRS libraries prepared (using the SQK-RNA001 Nanopore DRS Kit; Oxford Nanopore Technologies) as previously described (Parker et al., 2020). Libraries were loaded onto R9.4 flow cells (Oxford Nanopore Technologies) and sequenced using a 48 hr runtime. Four biological replicates were performed for each genotype.

Nanopore DRS data processing

Nanopore DRS reads were basecalled using the Guppy (version 3.6.0) high-accuracy model. Reads were mapped to the Arabidopsis TAIR10 genome (Arabidopsis Genome Initiative, 2000) using minimap2 (version 2.17) with the parameters -a -L --cs=short -x splice -G20000 --end-seed-pen=12 --junc-bonus=12 -uf. Spliced alignment was guided using junctions from the Araport11 annotation (Cheng et al., 2017). Nanopore DRS reads can suffer from ‘oversplitting’ – where the signal originating from a single RNA molecule is incorrectly interpreted as two or more reads (Parker et al., 2020). These errors can be systematic and result in false positive 3′ ends. To filter these errors, we identified reads that were sequenced consecutively through the same pore and also mapped contiguously on the genome (within 1 kb of each other). In this way, we filtered all except the most 3′ reads, which should contain the genuine RNA 3′ end. Pipelines for processing Nanopore DRS data were built using Snakemake (Köster and Rahmann, 2012).
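
The oversplitting filter can be sketched in Python as follows. This is an illustration under stated assumptions rather than the published pipeline: the `Read` record, its field names (in particular `read_number` as the per-pore sequencing order) and the requirement that linked reads share chromosome and strand are all hypothetical choices made for the example.

```python
from dataclasses import dataclass

MAX_GAP = 1000  # "within 1 kb of each other"


@dataclass
class Read:
    read_id: str
    channel: int      # pore the read was sequenced through
    read_number: int  # order of the read within that pore (assumed field)
    chrom: str
    strand: str       # "+" or "-"
    start: int        # alignment start, 0-based
    end: int          # alignment end, exclusive


def genomic_gap(a, b):
    """Distance between two alignments; 0 if they overlap."""
    return max(0, max(a.start, b.start) - min(a.end, b.end))


def three_prime_pos(read):
    """Larger value means a more 3' end, whatever the strand."""
    return read.end if read.strand == "+" else -read.start


def filter_oversplit(reads):
    """Keep only the most 3' read of each run of oversplit reads."""
    kept, chain = [], []
    for read in sorted(reads, key=lambda r: (r.channel, r.read_number)):
        prev = chain[-1] if chain else None
        linked = (
            prev is not None
            and read.channel == prev.channel
            and read.read_number == prev.read_number + 1
            and read.chrom == prev.chrom
            and read.strand == prev.strand
            and genomic_gap(prev, read) <= MAX_GAP
        )
        if not linked and chain:
            kept.append(max(chain, key=three_prime_pos))
            chain = []
        chain.append(read)
    if chain:
        kept.append(max(chain, key=three_prime_pos))
    return kept
```

Reads that are not part of any consecutive, contiguous run form chains of length one and pass through unchanged.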

Helicos DRS

Preparation of samples for Helicos DRS

Total RNA was isolated from the Col-0, fpa-8 and 35S::FPA:YFP seedlings as described above. Samples were processed by Helicos BioSciences as previously described (Ozsolak et al., 2009; Sherstnev et al., 2012). Three biological replicates were performed for each genotype.

Helicos DRS data processing

Helicos DRS reads were mapped to the Arabidopsis TAIR10 genome using Heliosphere (version 1.1.498.63) as previously described (Sherstnev et al., 2012). Reads were filtered to remove those with insertions or deletions of >4 nt and to mask regions with low complexity, as determined using DustMasker (Camacho et al., 2009) (from BLAST+ suite version 2.10.1) set at DUST level 15 (Sherstnev et al., 2012).

Differential 3′ end analysis of Nanopore and Helicos DRS datasets

Transcriptional loci were first identified in Col-0, fpa-8 and 35S::FPA:YFP Nanopore DRS reads using the long-read transcript assembly tool StringTie2 version 2.1.1 (Pertea et al., 2015). Novel transcriptional loci were merged with annotated loci from the Araport11 reference (Cheng et al., 2017). To detect sites with altered 3′ end distributions in fpa-8 and 35S::FPA:YFP, we pooled the replicates of either Nanopore or Helicos DRS data and identified reads overlapping each transcriptional locus. These reads were used to build distributions of 3′ end locations. The difference in 3′ end distributions between the treatment and control (Col-0) was measured using EMD. To identify loci with statistically significant differences in 3′ distributions, we performed an EMD permutation test using 999 bootstraps: for this, reads for each locus were randomly shuffled between the treatment and control samples to create null distributions, and the EMD recalculated. The histogram of null EMDs was fitted using a gamma distribution, and the p-value (probability of achieving the observed EMD or greater by chance) was calculated from the distribution. p-Values were corrected for multiple testing using the Benjamini–Hochberg method. Genes with an EMD >25 and an FDR < 0.05 were considered to be differentially alternatively polyadenylated, and the directionality of change was identified using the difference in mean 3′ position. Software developed to perform differential 3′ analysis is available on GitHub at https://github.com/bartongroup/d3pendr and 10.5281/zenodo.4319113, and can be used with Nanopore DRS, Helicos DRS, or Illumina 3′ tag-based datasets.
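
The test described above can be sketched for a single locus in plain Python. This is not the d3pendr implementation: function names are hypothetical, and the p value here is the empirical fraction of shuffles with an EMD at least as large as the observed one, a simplification that stands in for the gamma-distribution fit. The thresholds (EMD >25, FDR < 0.05) are those given in the text.

```python
import random
from bisect import bisect_right


def emd_1d(xs, ys):
    """Earth mover's distance between two sets of 3' end positions.

    For one-dimensional data this is the area between the two
    empirical cumulative distribution functions.
    """
    if not xs or not ys:
        raise ValueError("both samples must contain reads")
    xs, ys = sorted(xs), sorted(ys)
    points = sorted(set(xs) | set(ys))
    total = 0.0
    for left, right in zip(points, points[1:]):
        cdf_x = bisect_right(xs, left) / len(xs)
        cdf_y = bisect_right(ys, left) / len(ys)
        total += abs(cdf_x - cdf_y) * (right - left)
    return total


def permutation_p_value(control, treatment, n_perm=999, seed=0):
    """Shuffle reads between conditions to build a null set of EMDs."""
    rng = random.Random(seed)
    observed = emd_1d(control, treatment)
    pooled = list(control) + list(treatment)
    n_ctrl = len(control)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if emd_1d(pooled[:n_ctrl], pooled[n_ctrl:]) >= observed:
            hits += 1
    # Empirical p value (assumption: replaces the gamma fit in the text).
    return observed, (hits + 1) / (n_perm + 1)


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p values, returned in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def is_differentially_polyadenylated(emd, fdr):
    """Thresholds as stated in the text."""
    return emd > 25 and fdr < 0.05
```

With a control in which all reads end at one site and a treatment in which half of the reads end 500 nt downstream, `emd_1d` returns 250: half of the probability mass is moved by 500 nt.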

Illumina RNA sequencing

Preparation of libraries for Illumina RNA sequencing

Total RNA was isolated from the Col-0, fpa-8 and 35S::FPA:YFP seedlings as described above. mRNA was isolated and sequencing libraries prepared using the NEBNext Ultra Directional RNA Library Prep Kit for Illumina (New England Biolabs) by the Centre for Genomic Research (University of Liverpool). 150 bp paired-end sequencing was carried out on Illumina HiSeq 4000. Six biological replicates were performed for each genotype.

Illumina RNA sequencing data processing

Illumina RNA-Seq data were assessed for quality using FastQC (version 0.11.9) and MultiQC (version 1.8) (Andrews, 2010; Ewels et al., 2016). Reads were mapped to the TAIR10 genome using STAR (version 2.7.3a) (Dobin et al., 2013) with a splice junction database generated from the Araport11 reference annotation (Cheng et al., 2017). Counts per million normalised coverage tracks were created using samtools (version 1.10) and deepTools (version 3.4.3) (Ramírez et al., 2014). To identify expressed regions in each locus, the coverage profiles of each treatment and control replicate were first extracted using pyBigWig (version 0.3.17) (Ramírez et al., 2014). These were normalised such that the area under each profile was equal to the mean area under the profiles. A normalised coverage threshold of 1 was used to identify expressed regions of the loci. These regions were further segmented when at least two-fold differences in expression within a 25-nt window were found between control and treatment conditions (and then regions smaller than 50 nt removed). Expression of the segmented regions was then calculated using featureCounts (version 2.0.0) (Liao et al., 2013). Each read pair was counted as one fragment, and only properly paired, concordant and primary read pairs were considered. Differential usage within transcriptional loci was assessed using DEXSeq (version 1.32.0) (Reyes et al., 2013). Loci were considered to be differentially processed if they had a locus-level FDR < 0.05 and at least one region with an absolute logFC >1 and FDR < 0.05. For differential splice junction usage analysis, counts of splice junctions annotated in the bespoke Nanopore DRS-derived annotation, plus Araport11 annotation, were generated for each locus using pysam (version 0.16.0). Differential splice junction usage was assessed using DEXSeq (version 1.32.0) (Reyes et al., 2013). Loci were considered to be differentially spliced if they had a locus-level FDR < 0.05 and at least one junction with an absolute logFC >1 and FDR < 0.05.
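
The coverage normalisation and thresholding steps can be illustrated with a short Python sketch. It assumes per-nucleotide coverage lists of equal length for one locus, treats a position as expressed when its normalised coverage is at least 1 (the text does not say whether the comparison is strict) and omits the later two-fold segmentation and 50 nt size filter. Function names are hypothetical.

```python
def normalise_profiles(profiles):
    """Scale each profile so its area equals the mean area of all profiles."""
    areas = [sum(profile) for profile in profiles]
    mean_area = sum(areas) / len(areas)
    return [
        [value * mean_area / area for value in profile] if area > 0 else list(profile)
        for profile, area in zip(profiles, areas)
    ]


def expressed_regions(profile, threshold=1.0):
    """Return half-open (start, end) intervals where coverage >= threshold."""
    regions = []
    start = None
    for pos, value in enumerate(profile):
        if value >= threshold and start is None:
            start = pos
        elif value < threshold and start is not None:
            regions.append((start, pos))
            start = None
    if start is not None:
        regions.append((start, len(profile)))
    return regions


# Two hypothetical replicates that differ only in sequencing depth.
normalised = normalise_profiles([[0, 2, 2, 0], [0, 4, 4, 0]])
regions = expressed_regions(normalised[0])
```

After normalisation the two replicates have identical profiles, so differences in sequencing depth do not change which positions pass the threshold.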

Gene tracks

Gene track figures were generated in Python 3.6 using Matplotlib (version 3.1.3) (Hunter, 2007). For gene tracks where any condition had >200 Nanopore DRS read alignments, 200 representative alignments were selected by random sampling without replacement (except for the FPA gene track figure, where 500 read alignments were sampled). nanoPARE data (Schon et al., 2018) were processed as previously described (Parker et al., 2020). For reannotated gene loci, domains were predicted using the InterproScan web client (Mitchell et al., 2019) and LRRs were predicted using LRRpredictor web client (Martin et al., 2020). Protein alignments were created and visualised in Jalview (version 2.11) (Waterhouse et al., 2009) using T-Coffee (Notredame et al., 2000).

Protein domain family enrichment analysis

To conduct protein domain enrichment analysis, InterPro domain annotations of Arabidopsis proteins were downloaded from BioMart (Smedley et al., 2009) and converted to genomic co-ordinates using the Araport11 annotation (Cheng et al., 2017). Domain families overlapping each locus tested for alternative polyadenylation using either Nanopore or Helicos DRS were identified using pybedtools (version 0.8.1) (Dale et al., 2011). To identify enriched domain families, domains were randomly shuffled between tested loci in 19,999 bootstraps, and the number of times that each domain class overlapped by chance with significantly alternatively polyadenylated loci was recorded. This was compared with the observed overlap of each domain family with alternatively polyadenylated loci to calculate p-values, which were corrected for multiple testing using the Benjamini–Hochberg method.
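
The shuffling test can be sketched for a single domain family as follows. The sketch assumes that each tested locus is represented by the set of domain families overlapping it and by a flag marking significant alternative polyadenylation; shuffling domain sets between loci is then equivalent, for one family, to shuffling its presence/absence vector. The function name, the one-sided comparison and the example data are assumptions, not the published code.

```python
import random


def domain_enrichment_p(domains_per_locus, significant, family,
                        n_perm=19999, seed=0):
    """Permutation p value for enrichment of one domain family.

    domains_per_locus: list of sets of domain family names, one per locus.
    significant: list of bools, True if the locus is alternatively
    polyadenylated. Both lists are assumed to be in the same locus order.
    """
    rng = random.Random(seed)
    has_family = [family in domains for domains in domains_per_locus]
    observed = sum(h and s for h, s in zip(has_family, significant))
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(has_family)
        shuffled = sum(h and s for h, s in zip(has_family, significant))
        if shuffled >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical data: 20 tested loci, 5 significant, all 5 carrying NB-ARC.
domains = [{"NB-ARC"}] * 5 + [{"kinase"}] * 15
flags = [True] * 5 + [False] * 15
observed, p_value = domain_enrichment_p(domains, flags, "NB-ARC")
```

The resulting p values, one per domain family, would then be corrected with the Benjamini–Hochberg method as described.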

Manual annotation of alternatively polyadenylated NLR genes

To identify which of the 206 previously annotated NLR genes present in Col-0 were alternatively polyadenylated in fpa-8 and 35S::FPA, we devised a standard operating procedure for visual inspection. Genes that had Nanopore DRS read coverage in at least two conditions were considered to be expressed. Genes were considered to be alternatively polyadenylated if they had multiple 3′ end locations with each supported by at least four Nanopore DRS reads, and if there was a clear difference in Nanopore DRS coverage in the treatment condition compared with Col-0. Helicos and Illumina corroboration of poly(A) sites and coverage changes was also taken into consideration.

Genomic organisation of alternatively polyadenylated NLR genes

To test whether expressed NLR genes with FPA-dependent alternative polyadenylation were associated with NLR gene clusters, we used previously produced cluster assignments for Col-0 NLR genes (Lee and Chae, 2020). We also tested the association of FPA-dependent alternative polyadenylation with previously produced hypervariable NLR classifications (Prigozhin and Krasileva, 2021). The association of alternatively polyadenylated genes with both major NLR gene clusters and hypervariable NLRs was assessed using a chi-squared test. To test whether FPA-sensitive NLRs are found in regions with high synteny diversity, we used 5 kb sliding window estimates of synteny diversity calculated from seven diverse Arabidopsis ecotypes (Jiao and Schneeberger, 2020). For each expressed NLR gene, the window with the largest overlap was used as the estimate of synteny diversity. The association with alternatively polyadenylated genes was assessed using a t-test.
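
A chi-squared test of association on a 2 × 2 table (for example, alternatively polyadenylated or not, against clustered or not) can be sketched in plain Python. The sketch assumes Pearson's statistic without continuity correction, which the text does not specify, and the counts in the example are invented for illustration only.

```python
import math


def chi_squared_2x2(a, b, c, d):
    """Pearson chi-squared test for the 2 x 2 table [[a, b], [c, d]].

    Returns (statistic, p value). With one degree of freedom the
    upper-tail probability is erfc(sqrt(statistic / 2)).
    Assumption: no continuity correction is applied.
    """
    n = a + b + c + d
    row1, row2, col1, col2 = a + b, c + d, a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column must contain observations")
    statistic = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Invented counts: rows = alternatively polyadenylated yes/no,
# columns = in a major NLR cluster yes/no.
statistic, p_value = chi_squared_2x2(20, 5, 5, 20)
```

For these invented counts the statistic is 18.0, which corresponds to a p value well below 0.001.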

RNA gel blot analysis of RPP7 mRNAs

RNA gel blot analysis was carried out as previously described (Quesada et al., 2003) with minor modifications. RPP7 mRNA was detected using a probe annealing to the second exon of the RPP7 (AT1G58602) gene (200 bp PCR product amplified with the following primers: Forward: 5′-TCGGGGACTACTACTACTCAAGA-3′ and Reverse: 5′-TCTTGATGGTGTGAAAGAATCTAGT-3′). β-TUBULIN mRNA was used as a loading control and visualised by a probe annealing to the third exon of the β-TUBULIN (AT1G20010) gene (550 bp PCR product amplified with the following primers: Forward: 5′-CTGACCTCAGGAAACTCGCG-3′ and Reverse: 5′-CATCAGCAGTAGCATCTTGG-3′). The probes were 5′ labelled using [γ-32P]-ATP (Perkin Elmer) and DECAprime II DNA labelling kit (Thermo Fisher Scientific) and purified on illustra G-50 columns (GE Healthcare Life Sciences). mRNA isoforms were visualised and quantified using an Amersham Typhoon Gel and Blot Imaging System (GE Healthcare Bio-Sciences AB). The RiboRuler High Range RNA Ladder (Thermo Fisher Scientific), used to identify the approximate size of RNA bands, was first dephosphorylated using FastAP Thermosensitive Alkaline Phosphatase (Thermo Fisher Scientific) and then labelled with [γ-32P]-ATP (Perkin Elmer) using T4 Polynucleotide Kinase (Thermo Fisher Scientific) before gel loading.

m6A LC-MS/MS

Total RNA was isolated and checked as described above. mRNA was extracted twice from approximately 75 μg of total RNA using the Dynabeads mRNA Purification Kit (Thermo Fisher Scientific) according to the manufacturer’s instructions. The quality and quantity of mRNA was assessed using a NanoDrop 2000 spectrophotometer (Thermo Fisher Scientific) and Agilent 2200 TapeStation System (Agilent). Samples for m6A LC-MS/MS were prepared as previously described (Huang et al., 2018) with several modifications. First, 100 ng mRNA was diluted in a total volume of 14 µl nuclease-free water (Thermo Fisher Scientific) and digested by nuclease P1 (1 U, Merck) in 25 µl buffer containing 20 mM NH4OAc (pH 5.3) at 42°C for 2 hr. Next, 3 µl freshly made 1 M NH4HCO3 and calf intestinal alkaline phosphatase (1 U, New England Biolabs) were added, and samples were incubated at 37°C for 2 hr. The samples were then diluted to 50 µl with nuclease-free water and filtered (0.22 μm pore size, 4 mm diameter; Millipore). LC-MS/MS was carried out by the FingerPrints Proteomics facility at the University of Dundee. m6A/A ratio quantification was performed in comparison with the curves obtained from pure adenosine (endogenous P1 receptor agonist, Abcam) and m6A (modified adenosine analog, Abcam) nucleoside standards. Statistical analysis was performed using a two-way t-test.

Pathogenesis assays

Pathogenesis assays were carried out as previously described (Tomé et al., 2014). The Hpa-Hiks1 isolate was maintained by weekly sub-culturing on Ksk-1 plants. A solution containing Hpa-Hiks1 spores was used to inoculate 14-day-old Col-0, Ksk-1, fpa-7, fpa-8, pFPA::FPA, and 35S::FPA:YFP seedlings. Sporangiophores were counted 4 days after inoculation. The experiment was repeated three times using up to 45 plants per genotype per repeat. Statistical analysis was performed with negative binomial regression using Statsmodels (version 0.11.0) (Seabold and Perktold, 2010); plants were grouped by experimental repeat during testing to control for variation between repeats.

Data availability

IVI-MS data is available from the ProteomeXchange Consortium via the PRIDE partner repository with the dataset identifier PXD022684 (Perez-Riverol et al., 2019, DOI: https://doi.org/10.1093/nar/gky1106). FPA and Pol II ChIP-Seq data is available from ENA accession PRJNA449914. Col-0 nanopore DRS data is available from ENA accession PRJEB32782. fpa-8 and 35S::FPA:YFP nanopore DRS data is available from ENA accession PRJEB41451. hen2-2 nanopore DRS data is available from ENA accession PRJEB41381. Col-0, fpa-8 and 35S::FPA:YFP Helicos DRS data is available from Zenodo DOI 10.5281/zenodo.4309752. Col-0, fpa-8 and 35S::FPA:YFP Illumina RNA-Seq data is available from ENA accession PRJEB41455. All pipelines, scripts and notebooks used to generate figures are available from GitHub at https://github.com/bartongroup/Simpson_Barton_FPA_NLRs and Zenodo at https://doi.org/10.5281/zenodo.4319109. The software tool developed for detecting changes in poly(A) site choice in Nanopore and Helicos DRS data is available from GitHub at https://github.com/bartongroup/d3pendr and Zenodo at https://doi.org/10.5281/zenodo.4319113.

The following data sets were generated
    1. Parker MT
    2. Knop K
    3. Zacharaki V
    4. Sherwood AV
    5. Tome D
    6. Yu X
    7. Martin P
    8. Beynon J
    9. Michaels S
    10. Barton GJ
    11. Simpson GG
    (2020) ENA
    ID PRJEB41451. Nanopore direct RNA sequencing of FPA mutants and overexpressors.
    1. Parker MT
    (2020) Zenodo
    bartongroup/Simpson_Barton_FPA_NLRs: preprint version.
    https://doi.org/10.5281/zenodo.4319109
    1. Parker MT
    (2020) Zenodo
    bartongroup/d3pendr: preprint release.
    https://doi.org/10.5281/zenodo.4319113
The following previously published data sets were used
    1. Parker MT
    2. Knop K
    3. Barton GJ
    4. Simpson GG
    (2020) ENA
    ID PRJEB41381. Nanopore direct RNA sequencing of hen2-2 mutants.
    1. Parker MT
    2. Knop K
    3. Zacharaki V
    4. Sherwood AV
    5. Tome D
    6. Yu X
    7. Martin P
    8. Beynon J
    9. Michaels S
    10. Barton GJ
    11. Simpson GG
    (2020) ENA
    ID PRJEB32782. Nanopore Direct RNA Sequencing Maps the Arabidopsis m6A Epitranscriptome.
    1. Yu X
    2. Martin PGP
    3. Michaels SD
    (2019) ENA
    ID PRJNA449914. Genome-wide occupancy of BDR1, BDR2 and FPA (ChIP-seq).
    1. Schon MA
    2. Kellner MJ
    3. Plotnikova A
    4. Hofmann F
    5. Nodine MD
    (2018) ENA
    ID PRJNA449355. nanoPARE: Parallel analysis of RNA 5' ends from low input RNA.
    1. Inagaki S
    2. Takahashi M
    3. Hosaka A
    4. Ito T
    5. Toyoda A
    6. Fujiyama A
    7. Tarutani Y
    8. Kakutani T
    (2017) ENA
    ID PRJDB5192. The gene-body chromatin modifications dynamics mediates epigenome differentiation in Arabidopsis.
    1. Lai Y
    2. Lu XM
    3. Daron J
    4. Pan S
    5. Wang J
    6. Wang W
    7. Tsuchiya T
    8. Holub E
    9. McDowell JM
    10. Slotkin RK
    11. Le RKG
    12. Eulgem T
    (2020) ENA
    ID PRJNA427432. Genome-wide profilings of EDM2-mediated effects on H3K9me2 and transcripts in Arabidopsis thaliana.

References

  1. Software
    1. Lazar C
    (2015) imputeLCMD: A Collection of Methods for Left-Censored Missing Data Imputation
    imputeLCMD: A Collection of Methods for Left-Censored Missing Data Imputation.
    1. Mayr C
    (2019) What are 3' UTRs doing?
    Cold Spring Harbor Perspectives in Biology 11:a034728.
    https://doi.org/10.1101/cshperspect.a034728
    1. Raxwal VK
    2. Riha K
    (2016) Nonsense mediated RNA decay and evolutionary capacitance
    Biochimica Et Biophysica Acta (BBA) - Gene Regulatory Mechanisms 1859:1538–1543.
    https://doi.org/10.1016/j.bbagrm.2016.09.001
  2. Conference
    1. Seabold S
    2. Perktold J
    (2010)
    Statsmodels: econometric and statistical modeling with Python
    Proceedings of the 9th Python in Science Conference. pp. 92–96.

Decision letter

  1. Hao Yu
    Reviewing Editor; National University of Singapore & Temasek Life Sciences Laboratory, Singapore
  2. Detlef Weigel
    Senior Editor; Max Planck Institute for Developmental Biology, Germany
  3. Chae Eunyoung
    Reviewer
  4. Blake C Meyers
    Reviewer; Donald Danforth Plant Science Center, United States

In the interests of transparency, eLife publishes the most substantive revision requests and the accompanying author responses.

Acceptance summary:

In this study, the authors examined the function of the RNA-binding protein FPA by analyzing its protein interactome and its global impact on gene expression using a combined approach of Nanopore DRS, Helicos DRS, and short-read Illumina RNA-Seq. The combined datasets and new computational approaches developed by the authors showed a predominant role of FPA in promoting poly(A) site choice. The authors further revealed that FPA mediates widespread premature cleavage and polyadenylation of the transcripts of NLR genes, which act as important plant immune regulators. Overall, this study suggests that control of transcription termination processes mediated by FPA provides an additional layer of the regulatory dynamics of NLRs in plant immune responses.

Decision letter after peer review:

Thank you for submitting your article “Widespread premature transcription termination of Arabidopsis thaliana NLR genes by the spen protein FPA” for consideration by eLife. Your article has been reviewed by 3 peer reviewers, and the evaluation has been overseen by Hao Yu as the Reviewing Editor and Detlef Weigel as the Senior Editor. The following individuals involved in review of your submission have agreed to reveal their identity: Chae Eunyoung (Reviewer #1); Blake C Meyers (Reviewer #3).

Essential Revisions:

While there was agreement that the topic is timely and the findings relevant, there were some concerns regarding manuscript structure, inconsistency among some results, and interpretation of biological relevance of the data, as listed below, that need to be addressed to support the conclusions.

1. The manuscript presents an extensive body of studies in analyzing FPA interacting proteins and its potential RNA targets including NLRs. Although the overall results cover a series of observations, many of them are descriptive and divert the audience's attention from understanding the novelty and significance of the findings. Thus, we suggest that the authors re-organize the manuscript into a more coherent story and focus on the most important data pertaining to NLR control as shown in the title and the abstract.

2. The authors should address or explain some inconsistencies in the results as mentioned by reviewers. For example, the authors found that at least for some pathogens, "loss of FPA function does not reduce plant resistance". This is not consistent with the hypothesis that FPA is important for regulating NLR immune response genes, and the observation that premature exonic termination of RPP7 caused by FPA has a functional consequence for Arabidopsis immunity against Hpa-Hiks1.

3. The significance of this study will be strengthened by analysis of the biological relevancy of the alternative polyadenylation events mediated by FPA pertaining to NLR functions. We suggest that the authors consider either providing new experimental data or clearly interpreting existing results, such as those relevant to regulation of RPP7, to provide better insights into biological significance of the data presented in this manuscript.

Please also take into consideration the other specific comments from the reviewers below to revise the manuscript.

Reviewer #1:

The manuscript by Parker and colleagues presents an extensive body of work on characterizing the role of FPA in the choice of polyadenylation sites in transcripts of A. thaliana. Investigation of the mechanistic details of how FPA engages in mRNA processing was first initiated with the in vivo pull-down followed by LC-MS/MS, which revealed its protein interactome relevant for 3'-end processing. The main dataset pertaining to the manuscript title comes from the comparative transcriptome analysis of Col-0, fpa-8 mutant and the overexpressor of FPA, 35S:FPA:YFP. The strength of this work lies in the use of nanopore DRS to demonstrate the layers of FPA-dependent transcripts, including its own, and its comparison to datasets from Illumina RNA-Seq and Helicos DRS. The systematic analysis uncovered unexpected complexity in the A. thaliana NLR transcriptome under the control of FPA and thus delivers a new insight on NLR biology. Several studies have anecdotally reported the importance of using genomic DNA, rather than a single cDNA species, for addressing the full functionality of NLR genes. Recent advances in NLRome sequencing from multiple genomes of a species and NLR structure/function studies also highlight the importance of understanding the modular nature of NLRs. As alluded to by the modular diversity of NLRs kept in the genomes of a species in recent studies, NLR genes are prone to reshuffle in the genome to generate different variants, including partial entities with the loss of some parts of the proteins or even chimeras, supposedly maximizing the repertoire for defense. This work adds the level of transcript diversity to that of genomic diversity; FPA, an essential determinant of transcription termination, targets numerous NLRs to control the layers of the NLR transcriptome of an individual plant.
Although it is yet to be clarified for the regulatory significance of FPA-mediated NLR transcript changes under biotic or abiotic conditions, the authors succeeded in employing fine genetic schemes utilizing FPA-defective vs. -overexpressing lines along with long-read nanopore DRS technology for the first time to uncover the breadth of differential transcript generation focused on 3'-end choices. This work is timely and impactful for NLR research owing to the above-mentioned recent advances in NLR field.

As this work is the first of its kind in utilizing nanopore DRS to address the NLR transcriptome, several technical concerns can be addressed to corroborate the claims made in the manuscript, which the authors can find in the following section (1-8). Regarding the organization of the manuscript, the authors may consider rebalancing the two parts: FPA interactome vs. FPA targets and NLRs. Overall, the manuscript can be seen as combining two stories: the first characterizes FPA function in 3'-end processing of transcripts as inferred from interacting proteomes and meta-analysis of ChIP-seq data; the second includes detailed analysis of NLR transcripts and others. Although the first half of the analysis is a necessary prelude to the following NLR analysis, the current title and academic novelty mainly lie, or were intended by the authors to lie, in the NLR analysis. However, the current manuscript has a relatively enlarged first section, with the NLR analysis packed into a series of supplementary datasets. If the authors wish to opt for highlighting the NLR analysis, the following suggestions would help (9-14).

1) Earth mover distance (EMD) has been applied to identify a locus with alternative polyadenylation. What is the basis of using EMD value of 25 as a cutoff? According to Figure 4 B,D, EMD can range from 0-4000. One would also wonder if the distance unit equals bp.

In addition, EMD values of some genes (e.g. FPA and representative NLRs) can be specified in the main dataset so that significance of the cut-off values shall be appreciated.

2) Regarding the manual annotation of alternatively polyadenylated NLR genes (L1160-):

Genes with alternative polyadenylation were identified and the ending location was supported when there were minimum four DRS reads. It would be relevant to provide the significance of "the four" based on read coverage statistics, for example, with average read number covering an annotated NLR transcript with the specification of an average size.

3) Figure 4E shows that the Illumina RNA-seq dataset detects a number of loci that differs by an order of magnitude from the other two methods. The reference-agnostic pipeline is appreciated; however, the method used might have inflated the counting of paralogous reads mapped to locations other than their true origin. Along with paralogous read collapsing, this is always a problem with tandemly repeated genes, such as NLRs by and large. For example, NLR paralogs in a complex cluster with conserved TIR/NBS but diversified LRRs would have higher coverage in the first two domains but drop in the diversified parts. The authors need to specify their bioinformatic consideration to avoid such problems.

Although the tone of the Illumina read section was careful and the main 3'-end processing conclusion was made by nanopore DRS, the authors are also advised to clearly state the limitation of using Illumina-RNAseq to address alternative polyadenylating sites at the beginning of the section, for example what to be maximally taken out from Figure 4 E and 4F. This will give relative weights to each dataset generated by different methods. One advantage of using Illumina data would be that the expression level changes can be associated with changes in processing, it seems.

4) At the RPP7 locus, At1g58848 is identical in sequences with At1g59218 as is At1g58807 with At1g59214 (two twins in the RPP7 cluster by tandem duplication). It would be good to check whether the TE At1g58889 readthrough indeed occurs in the sister duplicate with a potential TE in the downstream of At1g59218. If not, it can be used as an example of duplication and neofunctionalization through an alternative polyadenylation site choices.

5) HMM search shall be revisited to confirm if they are to detect the TIR domain. Given that a large proportion of NLRs in A. thaliana carry TIR at their N-terminal ends and the specified examples included TIR-NLR, it is surprising to see no TIR domain in Figure 5.

6) L659-668: how does the new data relate to the previously TAIR annotated At1g58602.1 vs At1g58602.2 (Figure 6, Inset 1)? It would be good to see these clearly stated in the main text as compared to newly identified ones. From the nanopore profiling, At1g58602.2 appears to be the dominant form.

7) One thing to note is that in the overexpressor of which Hiks1 R is suppressed, there was hardly any At1g58602.1 produced in addition to the large reduction of At1g58602.2. Thus, relative functional importance of the two transcripts shall be discussed in line with the Hpa resistance data. Accordingly, L740-741 phrasing shall be revised to include the possibility of absolute or relative "depletion" of functional transcript(s) contributing to the compromise in Hpa resistance.

8) It would be necessary to state in the main text the implication of phosphorylation on the two Ser residues on Pol II at L245. A clear description distinguishing the effect of the two phosphorylation and the specificity of the antibodies is desirable, as the data was interpreted as if the two sites made differences, such that Ser2 was heavily emphasized (e.g. subtitle). Albeit low level, Ser5 data also shows an overlap with FPA ChIP-seq coverage at the 3' end. If there is a statistical significance to be taken account to interpret the coverage, please state it. Given that elongation occurs progressively, I wonder how much should be taken out from the distinction.

9) Figures presentation for RPP4 and RPP7 are great in detailing the FPA-dependent NLR transcript complexity. To make the functional link more evident, the authors may consider bringing up parts of the Figure 5-supplement to a main Figure to detail the revised annotation of NLRs. Given recent advances in NLR structure and function studies, extra domain fusion, fission and truncated versions of NLRs require a great deal of attention. For example, potential functional link to the NMD-mediated autoimmunity and revised annotation of At5g46470 (RPS6) needs a clear visual guidance preferably with a main figure (Figure 5-Sup3).

10) The section "FPA controls the processing of NLR transcripts" includes dense information and can be broken down to several categories. To this end, Supplement File 3 (NLR list) shall be revised to deliver the categorical classes and further details and converted to a main table.

For NLR audience, for example, it would be important to associate the information to raw reads to assess where the premature termination would occur. At least, the ways to retrieve dataset or to curate the termination sites shall be guided.

On the contrary, there is no need to include other genes in Figure 4-Figure Supplement 4-8 under this section. They are not NLRs.

11) Figure 7 and IBM1 section can be spared to supplement.

12) The list of "truncated NLR transcripts" in particular, either by premature termination within protein-coding or with intronic polyadenylation, should be made as a main table. The table can be preferably carrying details in which degree the truncation is predicted to be made. With current sup excel files, it is difficult to assess the breadth of the FPA effect on the repertoire of NLRs and their function. This way, functional implication of differential NLRs transcriptome can be better emphasized.

13) FPA-mediated NLR transcript controls, as to promote transcript diversity, is expected to exert its maximum effect if FPA level or activity is subject to the environmental stresses, such as biotic or abiotic stresses. The discussion on effectors targeting RNA-binding proteins (L909-918) is a great attempt in broadening the impact of this research. In addition, if anything is known to modulate FPA activity, such as biotic or abiotic stresses or environmental conditions, please include in the discussion.

14) NLR transcript diversity as source of cryptic variation contributing to NLR "evolution" is an interesting concept, however, evolutionary changes require processes of genic changes affecting transcript layers or stabilizing transcriptome diversity. In the authors' proposition in looking into accessions, potential evolutionary processes can be further clarified.

Reviewer #2:

Parker et al attempted to show that the FPA protein functions to regulate the widespread premature transcription termination of the Arabidopsis NLR genes. Using in vivo interaction proteomic-mass spectrometry, FPA was shown to co-purify with the mRNA 3' end processing machinery. Metagene analysis was used to show that FPA co-localized with Pol II phosphorylated at Ser2 of the CTD heptad repeat at the 3' end of Arabidopsis genes. Using a combination of Illumina RNA-Seq, Helicos, and nanopore DRS technologies, FPA was found to affect RNA processing by promoting poly(A) site choice, and hence controls the processing of NLR transcripts, whereas this process is independent of IBM1.

Overall, it is a potentially important research. The data is rich and could be useful. However, the biological stories described are not thoroughly supported by the data presented, especially when the authors tried to touch on several aspects without some important validations and strong connections among different parts. Some special comments are provided below:

(1) The title of this manuscript is "The expression of Arabidopsis NLR immune response genes is modulated by premature transcription termination and this has implications for understanding NLR evolutionary dynamics". Therefore, the readers will expect some functional connections between the FPA and the novel NLR isoforms due to premature transcription termination. However, the transcript levels of plant NLR genes are under strict regulation (e.g. Mol. Plant Pathol. 19:1267). Since the functions of NLR genes are related to effector-triggered immunity, it is more important to study the function of FPA on premature transcription termination when the plants are challenged with pathogens. In this manuscript, most transcript analyses are based on samples under normal growth conditions. It is therefore a weak link between the genomic studies and the functional aspects. For instance, it is more important to identify unique NLR isoforms produced upon pathogen challenges that are regulated by FPA. The authors will need to provide some of these data to fill this gap.

(2) Since the function of FPA is to regulate NLR immune response genes, we should expect a change in plant defense phenotype in FPA loss-of-function mutants. Could the authors provide more information on this? On the contrary, in line 728 of this manuscript, the authors found that at least for some pathogens, "loss of FPA function does not reduce plant resistance". It is not consistent with the hypothesis that FPA is important to regulate NLR immune response genes.

(3) Furthermore, the authors mentioned in lines 729-731 "Greater variability in pathogen susceptibility was observed in the fpa-8 mutant and was not restored by complementation with pFPA::FPA, possibly indicating background EMS mutations affecting susceptibility." Does it mean that fpa-8 contains other mutations? Will these additional mutations complicate the results of the RNA processing? Could the authors outcross the fpa-8 mutation to a clean background?

(4) In line 318, the authors found 285 and 293 APA events in the fpa-8 mutant and the 35S::FPA:YFP construct respectively, but only 59 loci (line 347) exhibited opposite APA events (about one fifth). The low overlapping frequency suggests that some results could be false positive.

(5) In line 732-736: "In contrast, 35S::FPA:YFP plants exhibited a similar level of sporulation to the pathogen-sensitive Ksk-1 accession (median 3 sporangiophores per plant). This suggests that the premature exonic termination of RPP7 caused by FPA has a functional consequence for Arabidopsis immunity against Hpa-Hiks1." It is contradictory to the statement in line 728 that "loss of FPA function does not reduce plant resistance". Is it possible that overexpression of FPA:YFP had generated an artificial condition that is not related to the natural function of FPA?

(6) The fpa-8 mutant has a delayed flower phenotype (Plant Cell 13:1427). Could the 35S::FPA:YFP fusion protein construct reverse this phenotype and the plant defense response phenotype? It is important to interpret the data when the 35S::FPA:YFP construct was used to represent the overexpression of FPA.

(7) Under the subheading "FPA co-purifies with the mRNA 3' end processing machinery", the results were based on in vivo interaction proteomics-mass spectrometry. MS is prone to false positives and needs proper controls and validations. Have the authors added the control of 35S:YFP instead of just the untransformed Col-0? At least for the putative interacting partners in Figure 1A, could the authors perform validations of some important targets, using techniques such as reverse co-IP, or show direct protein-protein interaction between FPA and a few of the important targets by in vitro pull-down, BiFC, or FRET, etc.?

(8) In Fig. 3, the data show that the last exon of the FPA gene is missing in the FPA transcripts generated from the 35S::FPA:YFP construct. Will the missing of this exon affect the function of the transcript and the encoded protein?

(9) The function of FPA is still ambiguous. There was a quantitative shift toward the selection of distal poly(A) sites in the loss-of-function fpa-8 mutant and a strong shift to proximal poly(A) site selection when FPA is overexpressed (35S::FPA:YFP) in some cases (Fig. 3, Fig. 5, Fig. 8). But the situation could be kind of reversed in other cases (Fig. 6). What is the mechanism behind it?

(10) Under the subheading: "The impact of FPA on NLR gene regulation is independent of its role in controlling IBM1 expression". IBM1 is a common target of FPA and IBM2. Indeed, FPA and IBM2 share several common targets (Plant Physiol. 180:392). It may be more meaningful to compare the impact of FPA and IBM2 on NLR gene instead.

(11) In lines 423-425, the authors described "Consistent with previous reports, the level of mRNA m6A in the hypomorphic vir-1 allele was reduced to approximately 10% of wild-type levels (Parker et al., 2020b; Ruzicka et al., 2017) (Figure 4-figure supplement 3)." This data could not be found.

(12) In line 426: "However, we did not detect any differences in the m6A level between genotypes with altered FPA activity." Which data this statement is referring to?

Reviewer #3:

In the article "Widespread premature transcription termination of Arabidopsis thaliana NLR genes by the spen protein FPA", the authors describe the function of FPA as a mediator of premature cleavage and polyadenylation of transcripts. They also focused their study on NLR-encoding transcripts, as that was their most novel observation, describing an additional layer of control.

In general, the article is well written and clear. The experimental design is good, they didn't seem to over-interpret the results, the controls were solid, and the nanopore data were quite informative for their work. It is rather descriptive maybe bordering on dry in parts - but the results will be helpful for those working on NLRs, and demonstrate the utility of bulk long-read transcript data. The authors were able to string together a number of descriptive observations or vignettes into an informative paper. Overall, it is solid science, but maybe not monumental.

One minor complaint is that the authors don't focus on NLRs until line 436, and then they have extensive results on NLRs; by the time I got to the discussion, I'd forgotten about the early focus on the m6A. While the first part of the article is necessary, I would suggest a more concise results section to give the paper more focus on the NLR control (since that is emphasized in the abstract and the title of the manuscript).

[Editors' note: further revisions were suggested prior to acceptance, as described below.]

Thank you for resubmitting your work entitled "Widespread premature transcription termination of Arabidopsis thaliana NLR genes by the spen protein FPA" for further consideration by eLife. Your revised article has been evaluated by Detlef Weigel (Senior Editor), Hao Yu (Reviewing Editor) and three reviewers who reviewed the last version of the manuscript.

We feel that this revised manuscript has been significantly improved, but there are some remaining issues that need to be addressed, as indicated in the following comments given by Reviewers 1 and 2.

Reviewer #1 (Recommendations for the authors):

The authors made great efforts to reorganize the manuscript to address comments from all three reviewers. Current manuscript supports the main claim on FPA modulating the NLR regulation with a series of graphic illustration as main figures with supporting supplements. These encompass the breadth of regulatory roles of FPA on different NLR genes, in particular. Their quantitative assessment of the FPA effects on clustered or hypervariable NLR genes have been performed in a sound way, taking on the latest research outcomes (2020-2021 publications) on NLR diversity and evolution.

Reviewer #2 (Recommendations for the authors):

Overall, it is a piece of interesting research supported with rich data. The authors have addressed much of the concerns in the revised version and through further explanations. Some remaining questions could be addressed via clarification, strengthened comparison, and additional discussions.

1. In relation to my original Question 1. Since the title of this manuscript is "Widespread premature transcription termination of Arabidopsis thaliana NLR genes by the spen protein FPA" and some NLR gene expressions are responsive to pathogen attack, the readers may be interested to know the changes in NLR genes under pathogen attack conditions that are regulated by FPA. If the authors have these data, it will be great to share.

2. In relation to my original Question 2 and Question 5. Since overexpression of FPA only partially reduces the level of functional RPP7 transcripts, is it possible that FPA overexpression also acts on other NLR transcripts, leading to loss of resistance?

3. In relation to my original Question 4. Is it possible to make a comparison directly between the 35S::FPA:YFP line and the fpa-8 mutant to investigate whether all disappeared premature transcriptional terminations have returned to the level of Col-0 or even more?

4. In relation to my original Question 6. The authors showed that overexpression FPA will decrease the overall FLC transcripts. Is the FPA acting on the pre-mature transcriptional termination of FLC too? Any data to support this?

5. In relation to my original Question 7. Does the anti-FPA ChIP data match well with the proximal APA in Col-0?

6. In relation to my original Question 9 and Question 10. IBM1 is a common target of FPA and EDM2, indicating the possible coordination of the FPA and EDM2 functions. There have been several studies on EDM2; could the authors compare the targets of FPA and EDM2, and also address whether FPA also targets TEs in introns of functional genes, similar to EDM2?

Reviewer #3 (Recommendations for the authors):

I am satisfied with the authors' response to the reviewers, including the valuable points raised by the other reviewers. The extensive changes that the authors made to the manuscript have substantially improved the work.

https://doi.org/10.7554/eLife.65537.sa1

Author response

Essential Revisions:

While there was agreement that the topic is timely and the findings relevant, there were some concerns regarding manuscript structure, inconsistency among some results, and interpretation of biological relevance of the data, as listed below, that need to be addressed to support the conclusions.

1. The manuscript presents an extensive body of studies in analyzing FPA interacting proteins and its potential RNA targets including NLRs. Although the overall results cover a series of observations, many of them are descriptive and divert the audience's attention from understanding the novelty and significance of the findings. Thus, we suggest that the authors re-organize the manuscript into a more coherent story and focus on the most important data pertaining to NLR control as shown in the title and the abstract.

We have restructured the manuscript to address this concern. Following the specific comments of the reviewers we have:

– Carried out a major rewrite of the Results section.

– Reduced the detailed descriptions in the proteomics and Illumina RNA-seq sections, so as to reach the NLR focus of the study more quickly.

– Created a new main text Figure explaining RNA processing around RPS6.

– Summarized the different FPA-dependent effects on NLRs into 3 new tables to provide a broader view that complements the examples detailed in the text and the detailed data provided as supplementary materials.

– We have removed non-NLR examples of FPA regulation from the manuscript, to focus on the NLR aspect.

– We have changed the text-based descriptions of FPA-NLR control, to rationalize the logic and focus of illustrated examples.

– We have moved the Figure on IBM1 from the main text to a Supplementary Figure.

– We have edited the section on RPP7-dependent pathogen testing, to clarify apparent misunderstanding.

2. The authors should address or explain some inconsistencies in the results as mentioned by reviewers. For example, the authors found that at least for some pathogens, "loss of FPA function does not reduce plant resistance". This is not consistent with the hypothesis that FPA is important for regulating NLR immune response genes, and the observation that premature exonic termination of RPP7 caused by FPA has a functional consequence for Arabidopsis immunity against Hpa-Hiks1.

There is a straightforward misunderstanding here, possibly because our text in the relevant section was not sufficiently clear.

We tested the impact of different activity levels of Arabidopsis FPA on NLR function by investigating the NLR, RPP7. We chose RPP7 because features of its function and regulation are relatively well characterised. RPP7 provides disease resistance to the oomycete pathogen Hyaloperonospora arabidopsidis (Hpa) strain Hiks1. The reference Arabidopsis accession, Col-0, encodes a functional RPP7 gene and hence is resistant to Hpa-Hiks1 infection. Not all Arabidopsis accessions are resistant to all Hpa strains. For example, the Duc-1 and Ksk-1 accessions have been reported as susceptible to Hpa-Hiks1 infection, likely due to the lack of a functional RPP7 gene (Lai et al., 2019). It was for this reason that we incorporated the Ksk-1 accession as an infection-sensitive positive control accession in our pathogen tests.

The question we were addressing was: Does FPA-dependent premature cleavage and polyadenylation in RPP7 exon 6 compromise RPP7 function? To address this question, we applied Hpa-Hiks1 to our different genetic lines. Neither Col-0 nor the fpa-8 mutant (which is in the Col-0 genetic background) was sensitive to infection. This is consistent with our hypothesis because the poly(A) site used in exon 6 in Col-0 is used significantly less in fpa-8. Hence, there is no compromise in the expression of full-length RPP7 in fpa-8 mutants. As Col-0 is already resistant to Hpa-Hiks1, we would therefore expect fpa-8 to also be resistant and indeed, this is what we found.

This was also true when we tested an independent allele, fpa-7, that is also in the Col-0 background. However, when we tested the line that was over-expressing FPA, which was introduced into an fpa-8 background (and hence, ultimately Col-0), we found that resistance was lost and Hpa-Hiks1 was able to infect these plants.

Therefore, the findings from this experiment are completely consistent “with the hypothesis that FPA is important for regulating NLR immune response genes, and the observation that premature exonic termination of RPP7 caused by FPA has a functional consequence for Arabidopsis immunity against Hpa-Hiks1.” We have clarified the text in this section to make our hypothesis and findings clearer.

3. The significance of this study will be strengthened by analysis of the biological relevancy of the alternative polyadenylation events mediated by FPA pertaining to NLR functions. We suggest that the authors consider either providing new experimental data or clearly interpreting existing results, such as those relevant to regulation of RPP7, to provide better insights into biological significance of the data presented in this manuscript.

To more clearly address the predicted consequences of FPA regulated alternative polyadenylation, we have added tables for the three classes of FPA-regulated alternative polyadenylation events to the main text and made predictions of the functional consequences for intronic and exonic polyadenylation events. Since there are several documented examples of TIR-only or LRR-truncated NLRs regulating resistance and cell death, we have also changed some of the example genes used to focus on those proximal polyadenylation events which can alter protein coding potential.

Please also take into consideration the other specific comments from the reviewers below to revise the manuscript.

Reviewer #1:

[...] If the authors wish to opt for highlighting the NLR analysis, the following suggestions would help (9-14).

1. Earth mover distance (EMD) has been applied to identify a locus with alternative polyadenylation. What is the basis of using EMD value of 25 as a cutoff? According to Figure 4 B,D, EMD can range from 0-4000. One would also wonder if the distance unit equals bp. In addition, EMD values of some genes (e.g. FPA and representative NLRs) can be specified in the main dataset so that significance of the cut-off values shall be appreciated.

We found that for some very highly expressed loci, we were able to detect statistically significant changes in poly(A) site usage with very small effect sizes which were unlikely to represent functionally important changes. An EMD threshold was therefore required for removing these small effect size loci. The EMD is informally described as the minimum amount of “work” required to turn one distribution into another – it represents the percentage of the distribution moved multiplied by the distance moved. For example, an EMD of 25 could describe a situation where 10% of the transcripts have shifted by 250 nt, or 50% of the transcripts have shifted by 50 nt. A threshold of 25 gives a good trade-off between the percentage of proximal/distal site switching, and the distances between sites (since larger changes in distance are more likely to result in functional changes). We have included EMD values for example NLRs in the main text to give an idea of effect sizes of these genes.
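The arithmetic in the two examples above can be checked with a short sketch. This is only an illustration of a one-dimensional earth mover's distance between two sets of 3' end positions, not the implementation used in the study; the function name `emd_1d` and the toy read positions are assumptions made for the example.

```python
from collections import Counter

def emd_1d(sites_a, sites_b):
    """Earth mover's distance (in nt) between two empirical
    distributions of 3' end positions, via the area between CDFs."""
    counts_a, counts_b = Counter(sites_a), Counter(sites_b)
    n_a, n_b = len(sites_a), len(sites_b)
    positions = sorted(set(counts_a) | set(counts_b))
    emd = 0.0
    cdf_a = cdf_b = 0.0
    # Walk along the gene; between neighbouring positions the CDFs are flat,
    # so the area is |CDF difference| * distance to the next position.
    for left, right in zip(positions, positions[1:]):
        cdf_a += counts_a[left] / n_a
        cdf_b += counts_b[left] / n_b
        emd += abs(cdf_a - cdf_b) * (right - left)
    return emd

# Hypothetical control: all 100 reads end at position 1000.
control = [1000] * 100
# 10% of reads shifted by 250 nt.
shift_small_fraction = [1000] * 90 + [1250] * 10
# 50% of reads shifted by 50 nt.
shift_large_fraction = [1000] * 50 + [1050] * 50

print(emd_1d(control, shift_small_fraction))
print(emd_1d(control, shift_large_fraction))
```

Both toy comparisons come out at approximately 25 nt, which is why a single threshold can capture either a small fraction of transcripts moving a long way or a large fraction moving a short distance.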

2. Regarding the manual annotation of alternatively polyadenylated NLR genes (L1160-): Genes with alternative polyadenylation were identified and the ending location was supported when there were minimum four DRS reads. It would be relevant to provide the significance of "the four" based on read coverage statistics, for example, with average read number covering an annotated NLR transcript with the specification of an average size.

We have previously demonstrated that both Helicos and Nanopore DRS reads are able to capture the true 3’ ends of single RNA molecules. However, both techniques have some technical limitations which may result in artefacts – for example, the over-splitting of nanopore signal from a single molecule into multiple reads, or the incorrect alignment of low-quality basecalls at the ends of reads. For this reason, and also to standardise our approach to manually identifying FPA-regulated NLRs, we developed a standard operating procedure. We chose to identify poly(A) sites using a minimum of four nanopore read alignments, as a trade-off between sensitively detecting genuine alternative polyadenylation events, and ignoring events caused by poor alignment of low-quality reads or over-splitting. We also looked for evidence of events seen in nanopore data in other sequencing datasets, particularly the Helicos DRS alignments, to corroborate our findings. We have improved the language of the relevant methods section to clarify this.
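The effect of a minimum-read rule can be sketched as follows. The annotation described above was manual, so this is not the procedure itself; the function `supported_sites`, the `max_gap` grouping window of 10 nt, and the toy read ends are all assumptions introduced for illustration. Only the threshold of four reads comes from the text.

```python
def supported_sites(read_ends, min_reads=4, max_gap=10):
    """Group read 3' ends lying within max_gap nt of the previous end and
    keep groups supported by at least min_reads alignments.
    Returns (first_position, last_position, read_count) tuples."""
    clusters = []
    for pos in sorted(read_ends):
        if clusters and pos - clusters[-1][-1] <= max_gap:
            clusters[-1].append(pos)  # extend the current cluster
        else:
            clusters.append([pos])    # start a new cluster
    return [(c[0], c[-1], len(c)) for c in clusters if len(c) >= min_reads]

# Hypothetical 3' end positions of reads aligned to one gene.
ends = [500, 502, 503, 505, 900, 1500, 1501, 1502]
print(supported_sites(ends))
```

In this toy input the single read ending at 900 and the three reads near 1500 are discarded, which mirrors the intent of ignoring sporadic ends that could arise from over-split signal or poorly aligned read ends.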

3. Figure 4E shows that the Illumina RNA-seq dataset detects a number of loci that differs by an order of magnitude from the other two methods. The reference-agnostic pipeline is appreciated; however, the method used might have inflated the counting of paralogous reads mapped to locations other than their true origin. Along with paralogous read collapsing, this is always a problem with tandemly repeated genes, such as NLRs by and large. For example, NLR paralogs in a complex cluster with conserved TIR/NBS but diversified LRRs would have higher coverage in the first two domains but drop in the diversified parts. The authors need to specify their bioinformatic consideration to avoid such problems.

Although the tone of the Illumina read section was careful and the main 3'-end processing conclusion was made by nanopore DRS, the authors are also advised to clearly state the limitation of using Illumina-RNAseq to address alternative polyadenylating sites at the beginning of the section, for example what to be maximally taken out from Figure 4 E and 4F. This will give relative weights to each dataset generated by different methods. One advantage of using Illumina data would be that the expression level changes can be associated with changes in processing, it seems.

The reviewer is correct that multimapping reads are an issue at NLR genes and may lead to uneven coverage of uniquely and multi-mapped reads when some regions of a gene are divergent, and others are not. Although it is the relative change in coverage of exons or expressed regions which is important in DEXSeq analysis (rather than absolute coverage), it is possible that changes in processing that cause relative expression changes at one NLR locus may have impacts on the relative expression of multimapping regions at other paralogous NLR loci. We addressed this issue when quantifying the expression of expressed regions by running featureCounts using the --primary option, which counts only primary alignments, but we failed to mention this in the methods. We have updated the methods to clarify this.
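What restricting a count to primary alignments means can be sketched as follows. In the SAM format, bit 0x100 of the FLAG field marks a secondary (non-primary) alignment, so a multimapping read contributes one primary record and any number of secondary records. The helper names and the toy records below are hypothetical, and the sketch only illustrates the flag test; it does not reproduce featureCounts or the analysis reported here.

```python
SECONDARY = 0x100  # SAM FLAG bit: "not primary alignment"


def is_primary(flag):
    """Return True if the SAM FLAG does not have the secondary bit set."""
    return (flag & SECONDARY) == 0


def count_primary(sam_lines):
    """Count primary alignment records in an iterable of SAM text lines."""
    n = 0
    for line in sam_lines:
        if line.startswith("@"):  # skip header lines
            continue
        flag = int(line.split("\t")[1])  # FLAG is the second SAM column
        if is_primary(flag):
            n += 1
    return n


# Toy records (truncated to four columns): each read has one primary
# alignment and one secondary alignment at a paralogous position.
toy_sam = [
    "@HD\tVN:1.6",
    "read1\t0\tChr1\t100",
    "read1\t256\tChr1\t900",
    "read2\t16\tChr1\t200",
    "read2\t272\tChr1\t950",
]
print(count_primary(toy_sam))
```

Counting only the primary record means that each read is counted at most once, at a single locus, although the choice of which paralogous position is reported as primary is made by the aligner.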

4. At the RPP7 locus, At1g58848 is identical in sequences with At1g59218 as is At1g58807 with At1g59214 (two twins in the RPP7 cluster by tandem duplication). It would be good to check whether the TE At1g58889 readthrough indeed occurs in the sister duplicate with a potential TE in the downstream of At1g59218. If not, it can be used as an example of duplication and neofunctionalization through an alternative polyadenylation site choices.

The tandem duplication of AT1G58848 and AT1G58807 in Col-0 makes the RPP7 locus complex to analyse even with long read sequencing data. We find that even with nanopore DRS data, nearly all reads mapping to AT1G58807 multimap at AT1G59124. There is clear evidence of exonic proximal polyadenylation in these transcripts, but the locus of origin is not determinable. In the case of AT1G58848 and AT1G59218, we find a mixture of uniquely mapping and multimapping reads at both genes, and both genes have uniquely mapped reads indicating exonic proximal polyadenylation in 35S::FPA, and chimeric RNA formation in fpa-8. This suggests that RNA processing of these loci is very similar, and so we opted only to show AT1G58848 as an example. Due to the much shorter length of Helicos DRS reads, we applied much more stringent filtering to remove incorrectly mapping or multimapping reads, meaning that there were not enough uniquely mapped reads at the AT1G58848 and AT1G58807 loci to perform Helicos EMD tests. We have updated the text to explain this more clearly.

5. HMM search shall be revisited to confirm if they are to detect the TIR domain. Given that a large proportion of NLRs in A. thaliana carry TIR at their N-terminal ends and the specified examples included TIR-NLR, it is surprising to see no TIR domain in Figure 5.

The absence of the TIR domain annotation from Figure 5C (now Figure 4A in the revised manuscript) is a mistake on our part: the domain is present in the InterPro annotation. We have now corrected the figure and all other gene tracks to make sure that all InterPro annotations are shown.

6. L659-668: how does the new data relate to the previously TAIR annotated At1g58602.1 vs At1g58602.2 (Figure 6, Inset 1)? It would be good to see these clearly stated in the main text as compared to newly identified ones. From the nanopore profiling, At1g58602.2 appears to be the dominant form.

AT1G58602.2 from the Araport11 annotation contains the most distal annotated isoform of RPP7, whilst AT1G58602.1 contains a slightly more proximal 3’UTR. The reviewer is correct that AT1G58602.2 is the more dominant isoform in our Col-0 data. We have added a sentence that acknowledges this to the section on RPP7 3’UTR isoforms.

7. One thing to note is that in the overexpressor of which Hiks1 R is suppressed, there was hardly any At1g58602.1 produced in addition to the large reduction of At1g58602.2. Thus, relative functional importance of the two transcripts shall be discussed in line with the Hpa resistance data. Accordingly, L740-741 phrasing shall be revised to include the possibility of absolute or relative "depletion" of functional transcript(s) contributing to the compromise in Hpa resistance.

While we agree that, in principle, the change in relative expression of the two annotated distal isoforms of RPP7 could have functional consequences, given that both of these isoforms can encode a protein, the functional impact of this relative change is much less likely to be the cause of the loss of Hpa resistance in FPA overexpressing plants, compared to the larger change in exonic proximal polyadenylation, which produces transcripts which are unlikely to express protein. Given that we have not demonstrated conclusively that it is the increase in exonic polyadenylation of RPP7 that causes reduced immunity in 35S::FPA:YFP, we have made the language of our conclusions in the section “FPA modulates RPP7-dependent, race-specific pathogen susceptibility” more careful.

8. It would be necessary to state in the main text the implication of phosphorylation on the two Ser residues on Pol II at L245. A clear description distinguishing the effect of the two phosphorylation and the specificity of the antibodies is desirable, as the data was interpreted as if the two sites made differences, such that Ser2 was heavily emphasized (e.g. subtitle). Albeit low level, Ser5 data also shows an overlap with FPA ChIP-seq coverage at the 3' end. If there is a statistical significance to be taken account to interpret the coverage, please state it. Given that elongation occurs progressively, I wonder how much should be taken out from the distinction.

It is well established in the literature that Pol II phosphorylated at Ser5 of the C-terminal domain is a hallmark of initiating and elongating Pol II, whilst Ser2 is a hallmark of terminating Pol II (Phatnani and Greenleaf, 2006). This was first established in yeast, where it was shown that Ser5 phosphorylation is necessary for the recruitment of the mRNA capping machinery (Cho et al., 1997; Ho and Shuman, 1999). The yeast homolog of the 5’-to-3’ exonuclease that is required for termination (West et al., 2004) was also shown to interact specifically with Pol II phosphorylated at Ser2 via an accessory protein (Kim et al., 2004). Therefore, comparing FPA occupancy to relative levels of Ser2 and Ser5 phosphorylated Pol II is an important validation of the location of FPA binding. We have added a sentence to the relevant Results section describing why CTD phosphorylation varies through the gene body. Arabidopsis ChIP-seq experiments from the literature which profile all Pol II (not just phosphorylated versions) indicate that in Arabidopsis, the highest occupancy is over the terminator (Yu et al., 2019). This may explain why there is also a peak of Ser5 at the terminator (i.e. if there are low levels of Ser5 in a region of higher occupancy, or if there is cross-reactivity of the antibody with Ser2 or unphosphorylated Pol II).

9. Figures presentation for RPP4 and RPP7 are great in detailing the FPA-dependent NLR transcript complexity. To make the functional link more evident, the authors may consider bringing up parts of the Figure 5-supplement to a main Figure to detail the revised annotation of NLRs. Given recent advances in NLR structure and function studies, extra domain fusion, fission and truncated versions of NLRs require a great deal of attention. For example, potential functional link to the NMD-mediated autoimmunity and revised annotation of At5g46470 (RPS6) needs a clear visual guidance preferably with a main figure (Figure 5-Figure Supplement 3).

We thank the reviewer for this comment, and we agree that these figures deserve to be made more visible. This is one of the reasons that we have chosen to submit our manuscript to eLife, since supplementary figures are displayed alongside linked main text figures in an image slider which allows easy access to each gene track. We believe that this will also make it much easier to examine individual gene tracks, without having to compress them to fit them into a single figure panel. However, we do agree that RPS6 is particularly interesting and deserves to be a main figure. We have therefore split the NLR figure into two new figures and incorporated RPS6 gene tracks into the first of these.

10. The section "FPA controls the processing of NLR transcripts" includes dense information and can be broken down to several categories. To this end, Supplement File 3 (NLR list) shall be revised to deliver the categorical classes and further details and converted to a main table.

For NLR audience, for example, it would be important to associate the information to raw reads to assess where the premature termination would occur. At least, the ways to retrieve dataset or to curate the termination sites shall be guided.

On the contrary, there is no need to include other genes in Figure 4-Figure Supplement 4-8 under this section. They are not NLRs.

We have created main-text tables for each of the three classes of FPA-regulated NLR genes, as suggested by the reviewer. We have also removed the examples of non-NLR genes regulated by FPA from the paper, to streamline the story. All the datasets analysed in the study are already available on ENA with database identifiers provided in the Data Availability section to guide readers.

11. Figure 7 and IBM1 section can be spared to the supplement.

We have followed the reviewer’s suggestion and this figure now appears as Figure 2 supplement 4. We have moved the results section on IBM1 up to join it with the global analysis of FPA function in RNA processing.

12. The list of "truncated NLR transcripts" in particular, either by premature termination within protein-coding or with intronic polyadenylation, should be made as a main table. The table can be preferably carrying details in which degree the truncation is predicted to be made. With current sup excel files, it is difficult to assess the breadth of the FPA effect on the repertoire of NLRs and their function. This way, functional implication of differential NLRs transcriptome can be better emphasized.

We have followed the reviewer’s suggestion here and prepared this information into main-text tables 1-3, including predictions of the functional consequences for intronic/exonic poly(A) site choice.

13. FPA-mediated NLR transcript controls, as to promote transcript diversity, is expected to exert its maximum effect if FPA level or activity is subject to the environmental stresses, such as biotic or abiotic stresses. The discussion on effectors targeting RNA-binding proteins (L909-918) is a great attempt in broadening the impact of this research. In addition, if anything is known to modulate FPA activity, such as biotic or abiotic stresses or environmental conditions, please include in the discussion.

We are not aware of any literature reporting the modulation of FPA activity by biotic or abiotic stresses. This is certainly an interesting question which we would like to examine. However, the analysis of FPA activity is complicated by a number of factors. RNA-level expression is often used as a proxy for overall activity. The RNA-level expression of FPA is not necessarily indicative of FPA activity, however, since the proximally polyadenylated isoform of FPA does not produce functional FPA protein. To get a clear picture of FPA activity during infection will therefore require high-depth Illumina RNA-Seq, nanopore direct RNA sequencing or proteomics analysis.

14. NLR transcript diversity as source of cryptic variation contributing to NLR "evolution" is an interesting concept, however, evolutionary changes require processes of genic changes affecting transcript layers or stabilizing transcriptome diversity. In the authors' proposition in looking into accessions, potential evolutionary processes can be further clarified.

We agree with the reviewer that a species-wide transcriptome analysis would provide an invaluable insight into how transcription can affect evolutionary changes. For example, we find that NLRs with high levels of allelic diversity are more likely to be regulated by proximal polyadenylation in Col-0, and so a species-wide approach will reveal whether this regulation is conserved or tailored to environmental conditions. An integrative analysis of genomic and transcriptomic data will also help to identify whether chimeric RNAs present in some accessions are found as retrotransposed genes in others. We have added these specific example experiments to the relevant discussion section.

Reviewer #2:

[...] Overall, it is a potentially important research. The data is rich and could be useful. However, the biological stories described are not thoroughly supported by the data presented, especially when the authors tried to touch on several aspects without some important validations and strong connections among different parts. Some special comments are provided below:

1. The title of this manuscript is "The expression of Arabidopsis NLR immune response genes is modulated by premature transcription termination and this has implications for understanding NLR evolutionary dynamics". Therefore, the readers will expect some functional connections between the FPA and the novel NLR isoforms due to premature transcription termination. However, the transcript levels of plant NLR genes are under strict regulation (e.g. Mol. Plant Pathol. 19:1267). Since the functions of NLR genes are related to effector-triggered immunity, it is more important to study the function of FPA on premature transcription termination when the plants are challenged with pathogens. In this manuscript, most transcript analyses are based on samples under normal growth conditions. It is therefore a weak link between the genomic studies and the functional aspects. For instance, it is more important to identify unique NLR isoforms produced upon pathogen challenges that are regulated by FPA. The authors will need to provide some of these data to fill this gap.

To clarify, the title of this manuscript is not as stated here by the reviewer but is “Widespread premature transcription termination of Arabidopsis thaliana NLR genes by the spen protein FPA”. We do indeed describe a functional pathogen test to examine the functional impact of FPA. We show that overexpression of FPA reduces the functional expression of RPP7 transcripts, and that this impacts upon the ability of plants to resist Hpa-hiks1. We agree with the referee that it will be very interesting to investigate, not just FPA, but changes in 3’ processing during infection by different pathogens. However, key questions on NLRs extend to how they function, how they evolve, how they trigger hyperimmunity and how they are controlled to limit impact on fitness, all of which may be impacted by the control of RNA 3’ processing.

2. Since the function of FPA is to regulate NLR immune response genes, we should expect a change in plant defense phenotype in FPA loss-of-function mutants. Could the authors provide more information on this? On the contrary, in line 728 of this manuscript, the authors found that at least for some pathogens, "loss of FPA function does not reduce plant resistance". It is not consistent with the hypothesis that FPA is important to regulate NLR immune response genes.

There is a misunderstanding here, which we clarify in response to Essential Revisions 2.

3. Furthermore, the authors mentioned in lines 729-731 "Greater variability in pathogen susceptibility was observed in the fpa-8 mutant and was not restored by complementation with pFPA::FPA, possibly indicating background EMS mutations affecting susceptibility." Does it mean that fpa-8 contains other mutations? Will these additional mutations complicate the results of the RNA processing? Could the authors outcross the fpa-8 mutation to a clean background?

Given that the fpa-8 mutant was generated using EMS treatment, it is probable that it does contain other mutations besides the one that removes FPA function (this is likely to be the case with most mutants – whether they are generated with EMS or T-DNA insertions). These mutations are likely to be the source of the slightly greater variability in susceptibility to Hpa-hiks1 in fpa-8 compared to the fpa-7 T-DNA mutant. These potential off-target mutations are unlikely to be the cause of the RNA 3’ processing changes seen in the fpa-8 mutant, however, for three reasons: (i) we have previously published Helicos DRS data from fpa-7 mutants which shows that they have the same RNA 3’ processing defects as fpa-8 mutants, for example at PIF5 and IBM1 (Duc et al., 2013) indicating that changes in 3’ processing in fpa-8 and fpa-7 are caused by the common loss of FPA function; (ii) our Illumina RNA-Seq data for the FPA complementing line shows that an FPA transgene restores 3’ processing effects seen in the fpa-8 mutant, for example at PIF5 (Author response image 1), but does not restore the variability in susceptibility of fpa-8 to Hpa-hiks1 (Figure 6C) (iii) many of the genes with altered poly (A) site choice in fpa-8, including RPP7, show reciprocal changes in processing in the FPA overexpressing line. Taken together, these findings strongly indicate that the loss of FPA is what causes altered poly (A) site choice in an fpa-8 mutant.

Author response image 1
A pFPA::FPA transgene complements chimeric RNA formation found in the fpa-8 mutant at PIF5.

Illumina RNA-Seq data showing that the expression of PIF5 (AT3G59060) - PAO3 (AT3G59050) chimeric RNAs in fpa-8 is lost in pFPA::FPA fpa-8 complemented lines.

4. In line 318, the authors found 285 and 293 APA events in the fpa-8 mutant and the 35S::FPA:YFP construct respectively, but only 59 loci (line 347) exhibited opposite APA events (about one fifth). The low overlapping frequency suggests that some results could be false positive.

The level of reciprocal alternative polyadenylation cannot be used to determine false positive rate. For a gene to show reciprocal effects, when comparing the results of fpa-8 vs Col-0, and 35S::FPA:YFP vs Col-0, requires at least two poly(A) sites to be used at high levels in Col-0. For example, at RPP7, high levels of proximal exonic polyadenylation are detectable in Col-0, meaning that a shift to distal site usage is detectable in fpa-8, as well as the shift to proximal site selection in 35S::FPA:YFP. However, there are many loci where this is not the case. For example, the abundant chimeric RNAs found at the PIF5 locus in fpa-8 are undetectable in Col-0, meaning that overexpression of FPA has no effect on PIF5 when compared to Col-0. Consequently, PIF5 is not amongst those genes with reciprocal regulation, despite the effect of FPA on PIF5 RNA processing being very clear in multiple datasets.

5. In line 732-736: "In contrast, 35S::FPA:YFP plants exhibited a similar level of sporulation to the pathogen-sensitive Ksk-1 accession (median 3 sporangiophores per plant). This suggests that the premature exonic termination of RPP7 caused by FPA has a functional consequence for Arabidopsis immunity against Hpa-Hiks1." It is contradictory to the statement in line 728 that "loss of FPA function does not reduce plant resistance". Is it possible that overexpression of FPA:YFP had generated an artificial condition that is not related to the natural function of FPA?

There is a misunderstanding here that may be due to the wording that we used in this section and we explain this above. Col-0 is resistant to Hpa-Hiks1 because it has a functional RPP7 gene. In fpa-8 mutants, the expression of full-length RPP7 transcripts is not compromised relative to Col-0 and hence it is as resistant to Hpa-Hiks1 as Col-0. In contrast, 35S::FPA:YFP promotes the use of a poly(A) site within exon 6, reducing the amount of full-length RPP7 detected. This poly(A) site is used in the Col-0 wildtype line but is not detectably selected in the loss-of-function fpa-8 mutant line. Together, these findings reveal that this poly(A) site is chosen in the Col-0 reference strain and that this requires FPA. Therefore, the selection of this site is the natural function of FPA and not simply generated by an artificial condition. We have re-worded the text in this section to clarify this misunderstanding.

6. The fpa-8 mutant has a delayed flower phenotype (Plant Cell 13:1427). Could the 35S::FPA:YFP fusion protein construct reverse this phenotype and the plant defense response phenotype? It is important to interpret the data when the 35S::FPA:YFP construct was used to represent the overexpression of FPA.

As we report in the Materials & Methods section, a line expressing 35S::FPA:YFP was obtained from Caroline Dean. Published evidence that this line complements the late flowering phenotype of fpa-8 is provided in the corresponding publication (Baurle et al., 2007) as Figure S5. In our growth conditions, these lines flower early like wild-type compared to the very late flowering of fpa-8. The late flowering phenotype of fpa-8 mutants is explained by elevated levels of the floral repressor FLC. The Illumina RNA-Seq, Helicos DRS and nanopore DRS data that we release here all show reduced levels of FLC in the 35S::FPA:YFP line compared to fpa-8 consistent with complementation (Author response image 2).

Author response image 2
A 35S::FPA:YFP transgene complements elevated expression of FLC found in the fpa-8 mutant.

Illumina RNA-Seq data showing the overexpression of FLC (AT5G10140) in fpa-8 is restored to around wild type levels in pFPA::FPA and 35S::FPA:YFP complemented lines.

7. Under the subheading "FPA co-purifies with the mRNA 3' end processing machinery", the results were based on in vivo interaction proteomics-mass spectrometry. MS is prone to false positives and needs proper controls and validations. Have the authors added the control of 35S:YFP instead of just the untransformed Col-0? At least for the putative interacting partners in Figure 1A, could the authors validate some important targets, using techniques such as reverse co-IP, or show direct protein-protein interaction between FPA and a few of the important targets by in vitro pull-down, BiFC, or FRET, etc.?

FP fusions are widely used in IP experiments, but we are not aware of any study that reports 3’ processing factors to be recurrent contaminants in such experiments. We had anticipated submitting an additional proteomics study at around the same time as this study but aspects of this additional work were disrupted by control measures associated with Covid-19. What we do show here, is that an orthogonal approach (ChIP) with different antibodies (anti-FPA) also localises FPA to the 3’ end of Arabidopsis genes together with Pol II phosphorylated on Ser2 of the CTD. These orthogonal datasets are therefore consistent with our interpretation that FPA co-purifies with Pol II and multiple factors involved in the processing of RNA 3’ ends and are also supported by our transcriptomic analyses of fpa mutants and overexpressors which have altered 3’ processing.

8. In Fig. 3, the data show that the last exon of the FPA gene is missing in the FPA transcripts generated from the 35S::FPA:YFP construct. Will the missing of this exon affect the function of the transcript and the encoded protein?

As we state in the Materials & Methods section, this line was obtained from Caroline Dean and the details of its construction were previously described (Baurle et al., 2007). The transgene construct has a different promoter (CaMV 35S) and associated 5’UTR sequence and the sequence downstream of the stop codon is replaced by a transgene-derived 3’UTR. Consequently, these regions of the transgene-derived FPA do not align to the Col-0 reference. We have added new text to the Figure legend to clarify this point. Given that the 35S::FPA:YFP transgene complements the flowering time phenotype of fpa-8 mutants, and causes widespread changes in 3’ processing, there is no evidence that the lack of the canonical 3’UTR has a deleterious impact on the function of the FPA protein.

9. The function of FPA is still ambiguous. There was a quantitative shift toward the selection of distal poly(A) sites in the loss-of-function fpa-8 mutant and a strong shift to proximal poly(A) site selection when FPA is overexpressed (35S::FPA:YFP) in some cases (Fig. 3, Fig. 5, Fig. 8). But the situation could be kind of reversed in other cases (Fig. 6). What is the mechanism behind it?

Using different sequencing technologies, we clearly show that the predominant effect of FPA is to promote proximal poly(A) site selection and indeed that these cases are associated with the largest effect sizes. The mechanism involved is not studied here. One possibility is that genes which display an increase in distal polyadenylation when FPA is overexpressed are indirect targets of FPA. This would be unsurprising given that FPA regulates the alternative polyadenylation of a number of other factors involved in 3’ processing. Another possibility is that FPA can associate with different complexes of 3’ processing factors at different locations, resulting in opposing effects on 3’ processing. A future goal for us, in dissecting the mechanism by which FPA mediates NLR transcription termination will be to relate poly(A) site choice to direct RNA binding site interactions mapped by iCLIP, for example.

10. Under the subheading: "The impact of FPA on NLR gene regulation is independent of its role in controlling IBM1 expression". IBM1 is a common target of FPA and IBM2. Indeed, FPA and IBM2 share several common targets (Plant Physiol. 180:392). It may be more meaningful to compare the impact of FPA and IBM2 on NLR gene instead.

IBM2/ASI1 is an RNA and chromatin binding protein that regulates the expression of IBM1 by promoting elongation through intronic heterochromatic marks, as part of a complex with EDM2 and AIPP1. As a result, edm2, ibm2, and aipp1 mutants fail to produce full length IBM1 transcripts, resulting in phenotypes similar to the ibm1 mutant. Mutations in FPA were recently identified as suppressors of the phenotypes of ibm2 mutants. This is likely because FPA promotes the proximal polyadenylation of IBM1 transcripts.

Since FPA regulates the proximal polyadenylation of IBM1, we asked if it was possible that some of the targets of FPA overexpression identified by nanopore and Helicos DRS were caused by indirect effects on chromatin state resulting from a decrease in full length IBM1 expression. However, there is no indication that FPA acts to promote alternative polyadenylation of IBM2. We therefore consider it unlikely that proximal polyadenylation of NLRs in the 35S::FPA:YFP line is caused by indirect effects on IBM2.

11. In lines 423-425, the authors described "Consistent with previous reports, the level of mRNA m6A in the hypomorphic vir-1 allele was reduced to approximately 10% of wild-type levels (Parker et al., 2020b; Ruzicka et al., 2017) (Figure 4 - supplement 3)." This data could not be found.

We have re-checked the submitted article. These data are indeed there: page 46, line 1510 and correctly labelled as Figure 4 supplement 3. In the revised manuscript these data are included as Figure 2-figure supplement 3, and the raw data is also available as Figure 2 source data 11.

12. In line 426: "However, we did not detect any differences in the m6A level between genotypes with altered FPA activity." Which data is this statement referring to?

This statement refers to the data in Figure 2-figure supplement 3 of the revised manuscript.

Reviewer #3:

[...] One minor complaint is that the authors don't focus on NLRs starting on line 436, and then they have extensive results on NLRs; by the time I got to the discussion, I'd forgotten about the early focus on the M6A. While the first part of the article is necessary, I would suggest a more concise results section to give the paper more focus on the NLR control (since that is emphasized in the abstract and the title of the manuscript).

We thank the reviewer for their comments. We agree that the paper is dichotomous due to the initial focus on the function of FPA and subsequent identification of the effect on NLRs. We have reduced the length of the initial results sections, particularly the proteomics results, so as to come to our findings on NLR genes more quickly.

[Editors' note: further revisions were suggested prior to acceptance, as described below.]

Reviewer #2 (Recommendations for the authors):

Overall, it is a piece of interesting research supported with rich data. The authors have addressed much of the concerns in the revised version and through further explanations. Some remaining questions could be addressed via clarification, strengthened comparison, and additional discussions.

We thank the reviewer for these remarks. We have addressed their questions below.

1. In relation to my original Question 1. Since the title of this manuscript is "Widespread premature transcription termination of Arabidopsis thaliana NLR genes by the spen protein FPA" and some NLR gene expressions are responsive to pathogen attack, the readers may be interested to know the changes in NLR genes under pathogen attack conditions that are regulated by FPA. If the authors have these data, it will be great to share.

The question of whether FPA (or other RNA binding proteins) alter the 3’ processing of NLR transcripts during infection is something that we would like to explore. Whilst some microarray and RNA sequencing datasets collected during infection conditions are available, these are generally designed for analysing expression changes at the gene level. As a result, they are underpowered for the analysis of RNA processing, which generally requires higher sequencing depth and longer reads – for example 50-100 million 2 x 150bp reads per replicate, with 6 or more biological replicates, as was used in our Illumina experiment. As far as we are aware, no experiments using nanopore direct RNA sequencing to identify RNA processing changes during infection have yet been generated or published. It is clear that the absence of detailed analyses of NLR transcript processing and fate during pathogen infection represents a gap in understanding for the immunity field as a whole, and so generating these data is a goal for our future enquiries.

2. In relation to my original Question 2 and Question 5. Since overexpression of FPA only partially reduces the level of functional RPP7 transcripts, is it possible that FPA overexpression also acts on other NLR transcripts, leading to loss of resistance?

We cannot rule out the possibility that FPA-dependent proximal polyadenylation at other NLR loci besides RPP7 may contribute quantitatively to the loss of immunity to Hpa-hiks1 seen in the 35S::FPA:YFP line. We have therefore reworded our conclusions in the relevant Results section. We now state: “We conclude that FPA control of poly(A) site selection can modulate NLR function, with a functional consequence for immunity.”

3. In relation to my original Question 4. Is it possible to make a comparison directly between the 35S::FPA:YFP line and the fpa-8 mutant to investigate whether all disappeared premature transcriptional terminations have returned to the level of Col-0 or even more?

We compared fpa-8 and 35S::FPA:YFP nanopore DRS data directly using the Earth mover distance (EMD) method, as suggested by the reviewer. We found that 80.0% of the loci with significantly increased distal poly(A) site choice in fpa-8 when compared with Col-0 were also significant when compared to 35S::FPA:YFP (hypergeometric p = 2.2 × 10⁻¹⁷²). This indicates that the 35S::FPA:YFP transgene is able to reverse the readthrough at these loci displayed in fpa-8. Furthermore, 77.2% of loci with significantly increased proximal poly(A) site choice in 35S::FPA:YFP when compared to Col-0 were also significant when compared to fpa-8 (hypergeometric p = 1.9 × 10⁻¹¹⁹). Of the loci with altered poly(A) site choice when comparing 35S::FPA:YFP to either Col-0 or fpa-8, 85.9% had a larger EMD when compared to fpa-8, suggesting that there are reciprocal changes beyond Col-0 levels in 35S::FPA:YFP and fpa-8 at these loci.
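For readers unfamiliar with the distance itself, the following sketch shows how an Earth mover distance between two sets of read 3' end positions can be computed in one dimension, where it equals the area between the two empirical cumulative distributions. The function name `emd_1d` and the toy positions are assumptions for illustration. The statistical testing built around the distance in our analysis is not reproduced here.

```python
from bisect import bisect_right


def emd_1d(positions_a, positions_b):
    """Earth mover distance between two samples of 1D positions.

    Each sample is treated as an empirical distribution with equal weight
    per read. In one dimension the EMD is the integral of the absolute
    difference between the two cumulative distribution functions.
    """
    a, b = sorted(positions_a), sorted(positions_b)
    points = sorted(set(a) | set(b))
    total = 0.0
    for left, right in zip(points, points[1:]):
        # Fraction of reads in each sample ending at or before `left`
        cdf_a = bisect_right(a, left) / len(a)
        cdf_b = bisect_right(b, left) / len(b)
        total += abs(cdf_a - cdf_b) * (right - left)
    return total


# Toy example: all reads shift from a proximal site (100) to a distal
# site (300), so every read "moves" 200 nt.
print(emd_1d([100, 100, 100, 100], [300, 300, 300, 300]))  # 200.0
# Only a quarter of the reads shift, so the distance is 0.25 * 200.
print(emd_1d([100, 100, 300, 300], [100, 300, 300, 300]))  # 50.0
```

The distance therefore reflects both the fraction of reads that change poly(A) site and how far apart the sites are, which is why a larger EMD indicates a larger shift in 3' end profile.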

4. In relation to my original Question 6. The authors showed that overexpression of FPA decreases overall FLC transcript levels. Does FPA also act on premature transcription termination of FLC? Are there any data to support this?

There is no evidence that sense FLC transcripts are targeted by FPA-dependent proximal polyadenylation (Duc et al., 2013; Hornyik et al., 2010). Instead, there is a significant body of literature on the role of FPA and other 3’ processing factors influencing long non-coding antisense RNAs at the FLC locus (Hornyik et al., 2010). The ratio of proximal to distal antisense RNAs correlates negatively with sense FLC expression (Duc et al., 2013; Hornyik et al., 2010).

5. In relation to my original Question 7. Do the anti-FPA ChIP data match well with the proximal APA in Col-0?

To test whether there was an overlap between the sites of FPA-associated chromatin and loci with FPA-sensitive poly(A) site choice, we called peaks from the FPA ChIP-seq data using MACS2 (Zhang et al., 2008). This resulted in the identification of 1120 unstranded peaks. We then assigned peaks to loci by identifying the closest (or overlapping) transcribed locus in an upstream orientation to each peak, using bedtools (Quinlan and Hall, 2010). Where there were multiple tied loci that could be assigned to a single peak (e.g. at convergent terminators or overlapping loci), peaks were assigned to all tied loci. We then compared the loci with identified FPA peaks to those loci with FPA-dependent alternative polyadenylation, identified using nanopore DRS. We found that of the 222 loci with increased distal polyadenylation in fpa-8, 38 were also associated with an FPA ChIP-seq peak (hypergeometric p = 3.5 × 10⁻⁴). Of the 166 loci with increased proximal polyadenylation in 35S::FPA:YFP, 27 were also associated with an FPA ChIP-seq peak (hypergeometric p = 3.3 × 10⁻³). The lack of FPA ChIP-seq peaks on many genes with FPA-dependent RNA processing may be explained by low Pol II occupancy. In agreement with this, loci with FPA-dependent alternative polyadenylation that did not have an associated FPA peak were more weakly expressed than those that did have an FPA peak (Mann-Whitney U p = 7.2 × 10⁻⁴). Notably, 95.6% of genes that were associated with FPA ChIP-seq peaks did not have FPA-sensitive alternative polyadenylation under our experimental conditions, and global FPA ChIP-seq signal at 3’ ends was well correlated with Pol II Ser2 signal (Spearman’s ρ = 0.67, p < 2 × 10⁻³⁰⁸, 95% CIs [0.66, 0.68]). This suggests that FPA is able to associate with terminating Pol II at most loci but is necessary for poly(A) site choice at a relatively smaller number of loci under our experimental conditions.
We have added our findings on the correlation of FPA and Pol II ChIP-seq signal to the relevant Results section and updated the Discussion section “Uncovering protein assemblies that mediate 3ʹ end processing in living plant cells”. We now state:

“Such interactions [between FPA and Pol II Ser2] could account for the global correlation between FPA and Pol II Ser2 occupancy and explain how FPA is able to associate with terminating Pol II at the 3’ ends of most expressed genes.”
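
The overlap tests reported in this response can be sketched as a one-sided hypergeometric test. The example below is illustrative only: the function name `hypergeom_overlap_p` is hypothetical, and the background size and the number of peak-associated loci are made-up values, because the response does not state them. Only the 222 loci and the overlap of 38 are taken from the text.

```python
from math import comb


def hypergeom_overlap_p(overlap, population, marked, drawn):
    """P(X >= overlap) for a hypergeometric random variable X.

    population: total number of loci considered (background)
    marked:     loci in the background carrying the feature (e.g. a ChIP-seq peak)
    drawn:      loci in the set of interest (e.g. loci with altered poly(A) choice)
    overlap:    loci in the set of interest that also carry the feature
    """
    upper = min(marked, drawn)
    # Exact integer arithmetic; math.comb returns 0 when k > n, so impossible
    # terms drop out of the sum automatically.
    favourable = sum(
        comb(marked, k) * comb(population - marked, drawn - k)
        for k in range(overlap, upper + 1)
    )
    return favourable / comb(population, drawn)


# 222 loci and an overlap of 38 come from the response; the background of
# 20000 expressed loci and 2000 peak-associated loci are ASSUMED values.
p = hypergeom_overlap_p(overlap=38, population=20000, marked=2000, drawn=222)
print(f"one-sided overlap p-value under assumed background: {p:.3g}")
```

Because the result depends strongly on the choice of background, the value printed here should not be expected to match the p-values reported above.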

6. In relation to my original Question 9 and Question 10. IBM1 is a common target of FPA and EDM2, indicating the possible coordination of FPA and EDM2 functions. There have been several studies on EDM2. Could the authors compare the targets of FPA and EDM2, and also address whether FPA targets TEs in the introns of functional genes, similar to EDM2?

Previous studies have indicated that FPA and the EDM2/IBM2/AIPP1 complex act antagonistically to regulate the expression of IBM1 (Deremetz et al., 2019). As a result, mutations disrupting FPA were identified in a genetic screen to isolate suppressors of the ibm2 mutation, the phenotype of which is caused by dysregulation of IBM1. This finding is consistent with our previous discovery that FPA controls the proximal polyadenylation of IBM1 (Duc et al., 2013). Other genes regulated by EDM2/IBM2/AIPP1 include RPP7, RPP4, AT3G05410, and AT1G11270 (Lai et al., 2020; Wang et al., 2013). We have shown in this manuscript that FPA-dependent alternative polyadenylation of RPP7 occurs via an exon 6 poly(A) site that is independent of the one regulated by EDM2 (which is in intron 1). At AT3G05410, loss of EDM2 function causes proximal polyadenylation near the 5’ splice site of intron 3, resulting in complete loss of expression of the distal exon. We find that loss of FPA causes a slight increase in distal polyadenylation of AT3G05410, including at a poly(A) site in a cryptic final exon in intron 3 (Author response image 3A). This may explain the findings of Deremetz et al., who showed that loss of FPA function partially rescued the proximal polyadenylation of AT3G05410 in ibm2 mutants (Deremetz et al., 2019). At AT1G11270, loss of EDM2 function causes proximal polyadenylation near the 5’ splice site of intron 3, also resulting in complete loss of expression of the distal exon (Duan et al., 2017). In comparison, loss of FPA function does not have a strong effect on AT1G11270, though it may cause a slight increase in intronic proximal polyadenylation at a cryptic final exon in intron 3 (Author response image 3B). This suggests that any control of AT1G11270 by FPA occurs via a mechanism independent of the regulation by EDM2/IBM2/AIPP1. Finally, at RPP4, loss of EDM2 function suppresses readthrough into the downstream COPIA retrotransposon (Lai et al., 2020).
Similar suppression of readthrough is seen in the fpa-8 mutant (Figure 5), indicating that at RPP4, EDM2 and FPA do not act antagonistically. FPA also controls poly(A) site choice at a large number of genes that have no evidence of intronic transposons or intragenic heterochromatin. In addition, FPA predominantly co-purifies with RNA 3’ processing factors rather than histone readers in our proteomics dataset. None of the known interactors of EDM2 and IBM2, which include AIPP1, AIPP2, AIPP3, and CPL2 (Duan et al., 2017), were found associated with FPA. We conclude that regulation of 3’ processing by FPA occurs independently of EDM2-regulated 3’ processing, at a different set of poly(A) sites, and using different mechanisms.

Author response image 3
Alternative polyadenylation of genes with intronic heterochromatin in fpa-8 and 35S::FPA:YFP lines.

(A-B) Gene track showing poly(A) site choice of (A) AT3G05410 and (B) AT1G11270 in fpa-8 and 35S::FPA:YFP lines.

https://doi.org/10.7554/eLife.65537.sa2

Article and author information

Author details

  1. Matthew T Parker

    School of Life Sciences, University of Dundee, Dundee, United Kingdom
    Contribution
    Conceptualization, Resources, Data curation, Software, Formal analysis, Supervision, Funding acquisition, Validation, Investigation, Visualization, Methodology, Writing - original draft, Project administration, Writing - review and editing
    Contributed equally with
    Katarzyna Knop and Vasiliki Zacharaki
    Competing interests
    No competing interests declared
    ORCID: 0000-0002-0891-8495
  2. Katarzyna Knop

    School of Life Sciences, University of Dundee, Dundee, United Kingdom
    Contribution
    Conceptualization, Data curation, Software, Formal analysis, Funding acquisition, Validation, Investigation, Visualization, Methodology, Writing - original draft, Project administration, Writing - review and editing
    Contributed equally with
    Matthew T Parker and Vasiliki Zacharaki
    Competing interests
    No competing interests declared
    ORCID: 0000-0002-2636-9450
  3. Vasiliki Zacharaki

    School of Life Sciences, University of Dundee, Dundee, United Kingdom
    Contribution
    Conceptualization, Funding acquisition, Validation, Investigation, Visualization, Methodology, Writing - original draft, Project administration, Writing - review and editing
    Contributed equally with
    Matthew T Parker and Katarzyna Knop
    Competing interests
    No competing interests declared
    ORCID: 0000-0002-5543-2332
  4. Anna V Sherwood

    School of Life Sciences, University of Dundee, Dundee, United Kingdom
    Contribution
    Conceptualization, Validation, Investigation, Methodology, Project administration, Writing - review and editing
    Competing interests
    No competing interests declared
  5. Daniel Tomé

    School of Life Sciences, University of Warwick, Coventry, United Kingdom
    Contribution
    Conceptualization, Formal analysis, Validation, Investigation, Methodology, Project administration, Writing - review and editing
    Competing interests
    No competing interests declared
  6. Xuhong Yu

    Department of Biology, Indiana University, Bloomington, United States
    Contribution
    Formal analysis, Validation, Investigation, Methodology, Writing - review and editing
    Competing interests
    No competing interests declared
  7. Pascal GP Martin

    Department of Biology, Indiana University, Bloomington, United States
    Present address
    INRAE, Univ. Bordeaux, Villenave d'Ornon, France
    Contribution
    Data curation, Formal analysis, Validation, Investigation, Methodology
    Competing interests
    No competing interests declared
    ORCID: 0000-0002-4271-658X
  8. Jim Beynon

    School of Life Sciences, University of Warwick, Coventry, United Kingdom
    Contribution
    Resources, Formal analysis, Supervision, Investigation, Project administration
    Competing interests
    No competing interests declared
  9. Scott D Michaels

    Department of Biology, Indiana University, Bloomington, United States
    Contribution
    Conceptualization, Resources, Supervision, Project administration
    Competing interests
    No competing interests declared
    ORCID: 0000-0001-5248-3487
  10. Geoffrey J Barton

    School of Life Sciences, University of Dundee, Dundee, United Kingdom
    Contribution
    Conceptualization, Resources, Supervision, Funding acquisition, Project administration, Writing - review and editing
    Competing interests
    No competing interests declared
    ORCID: 0000-0002-9014-5355
  11. Gordon G Simpson

    1. School of Life Sciences, University of Dundee, Dundee, United Kingdom
    2. The James Hutton Institute, Invergowrie, United Kingdom
    Contribution
    Conceptualization, Resources, Supervision, Funding acquisition, Writing - original draft, Project administration, Writing - review and editing
    For correspondence
    g.g.simpson@dundee.ac.uk
    Competing interests
    No competing interests declared
    ORCID: 0000-0001-6744-5889

Funding

Biotechnology and Biological Sciences Research Council (BB/M010066/1)

  • Geoffrey J Barton
  • Gordon G Simpson

Biotechnology and Biological Sciences Research Council (BB/J00247X/1)

  • Geoffrey J Barton
  • Gordon G Simpson

Biotechnology and Biological Sciences Research Council (BB/M004155/1)

  • Geoffrey J Barton
  • Gordon G Simpson

H2020 Marie Skłodowska-Curie Actions (799300)

  • Katarzyna Knop

Wellcome Trust (097945/B/11/Z)

  • Geoffrey John Barton
  • Gordon G Simpson

National Institutes of Health (GM075060)

  • Scott D Michaels

FP7-PEOPLE (609398)

  • Pascal GP Martin

National Science Foundation (2001115)

  • Scott D Michaels

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Acknowledgements

We thank Paul Birch and Ingo Hein for comments on the manuscript and David Baulcombe, Ian Henderson and Wenbo Ma for helpful NLR discussions. We thank Abdelmadjid Atrih (Centre for Advanced Scientific Technologies, School of Life Sciences) for the m6A LC-MS/MS analysis. This work was supported by awards from the BBSRC (BB/M010066/1; BB/J00247X/1; BB/M004155/1), the University of Dundee Global Challenges Research Fund, a University of Dundee PhD studentship to V.Z. and the European Union Horizon 2020 research and innovation programme under Marie Skłodowska-Curie grant agreement No. 799300 to K.K. P.G.P.M. received the support of the EU in the framework of the Marie-Curie FP7 COFUND People Programme, through the award of an AgreenSkills+ fellowship (grant agreement no. 609398). This work was supported by grants to S.D.M. from the National Institutes of Health (GM075060) and National Science Foundation (2001115). The FingerPrints Proteomics Facility of the University of Dundee is supported by a Wellcome Trust Technology Platform Award (097945/B/11/Z).

Senior Editor

  1. Detlef Weigel, Max Planck Institute for Developmental Biology, Germany

Reviewing Editor

  1. Hao Yu, National University of Singapore & Temasek Life Sciences Laboratory, Singapore

Reviewers

  1. Chae Eunyoung
  2. Blake C Meyers, Donald Danforth Plant Science Center, United States

Publication history

  1. Received: December 7, 2020
  2. Accepted: April 26, 2021
  3. Accepted Manuscript published: April 27, 2021 (version 1)
  4. Version of Record published: May 12, 2021 (version 2)
  5. Version of Record updated: June 4, 2021 (version 3)

Copyright

© 2021, Parker et al.

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.


