Compensatory sequence variation between trans-species small RNAs and their target sites

  1. Nathan R Johnson
  2. Claude W dePamphilis
  3. Michael J Axtell  Is a corresponding author
  1. The Pennsylvania State University, United States


Trans-species small regulatory RNAs (sRNAs) are delivered to host plants from diverse pathogens and parasites and can target host mRNAs. How trans-species sRNAs can be effective on diverse hosts has been unclear. Multiple species of the parasitic plant Cuscuta produce trans-species sRNAs that collectively target many host mRNAs. Confirmed target sites are nearly always in highly conserved, protein-coding regions of host mRNAs. Cuscuta trans-species sRNAs can be grouped into superfamilies that have variation in a three-nucleotide period. These variants compensate for synonymous-site variation in host mRNAs. By targeting host mRNAs at highly conserved protein-coding sites, and simultaneously expressing multiple variants to cover synonymous-site variation, Cuscuta trans-species sRNAs may be able to successfully target multiple homologous mRNAs from diverse hosts.


Small regulatory RNAs (sRNAs) produced in one organism can sometimes function to silence mRNAs in another organism. These trans-species sRNAs seem especially prominent in plant/pathogen and plant/parasite interactions. Fungal plant pathogens produce sRNAs with complementarity to host mRNAs (Weiberg et al., 2013) and host plants produce trans-species sRNAs that silence mRNAs in both pathogenic fungi (Zhang et al., 2016; Cai et al., 2018) and oomycetes (Hou et al., 2019). The parasitic plant Cuscuta campestris produces trans-species microRNAs (miRNAs) which silence mRNAs in multiple host plants (Shahid et al., 2018). Silencing by plant trans-species sRNAs relies on extensive complementarity between the sRNA and target mRNA, similar to normal endogenous plant miRNAs (Liu et al., 2014).

Trans-species silencing is expected to benefit the source organism while being detrimental to the target organism in parasitic/pathogenic relationships. This implies that target sites are not under purifying selection to maintain complementarity to trans-species sRNAs. How could such a system be stable over evolutionary time and/or be useful against multiple species? One suggestion is a ‘shotgun’ strategy, in which a very diverse set of trans-species sRNAs is deployed to hit target mRNAs randomly. The plant response to Phytophthora may make use of this strategy (Hou et al., 2019). However, the fact that the trans-species sRNAs delivered to hosts from the parasitic plant C. campestris are miRNAs (Shahid et al., 2018) argues against the shotgun hypothesis in this case. MiRNAs are defined by the precise excision of a single mature, functional small RNA (Axtell and Meyers, 2018), which implies selection for the miRNA to target a particular sequence or closely related set of sequences. We examined Cuscuta trans-species sRNAs and their targets in detail to shed light on how this system may be evolutionarily stable and robust against diverse hosts.


We analyzed sRNA expression from four Cuscuta species (Figure 1A). Specimens from two or three distinct populations of C. pentagona and C. gronovii, respectively, were included, making a total of seven separate sRNA expression studies (identified with acronyms for brevity; Figure 1A–B, Supplementary files 12). All four Cuscuta species are generalists with documented hosts spanning multiple plant families (Figure 1—figure supplement 1). RNA samples (three biological replicates each) from host-parasite interfaces and parasite stems growing on the host Arabidopsis thaliana were obtained and used for sRNA sequencing (Figure 1B). Libraries were condensed to highly expressed sRNA variants and filtered to remove any sRNAs that came from the host (Figure 1—figure supplement 2). Differential expression analysis revealed several hundred Cuscuta sRNAs in each experiment that were significantly up-regulated in the interface tissue relative to parasite stems (FDR < 0.1) (Supplementary file 3); we dubbed these haustorially-induced (HI) sRNAs (Figure 1A; Supplementary file 4). HI-sRNAs are mostly 21 or 22 nucleotides long (Figure 1A), sizes consistent with either miRNAs or short interfering RNAs (siRNAs). Distinguishing miRNAs from siRNAs requires a genome assembly (Axtell and Meyers, 2018), a criterion met so far for only one of the four species (C. campestris) included in this study (Shahid et al., 2018; Vogel et al., 2018). Approximately half of the C. campestris HI-sRNAs (208/408) come from MIRNA hairpins (Supplementary file 5). C. campestris-derived HI-sRNAs were recovered from 40 of the 42 novel MIRNA loci described by Shahid et al. (2018), including representatives from all five miRNA families previously demonstrated to target host mRNAs.

Figure 1 with 2 supplements see all
Haustorium-induced small RNAs (HI-sRNAs) are present in multiple Cuscuta species.

(A) Phylogeny of select Cuscuta species. Size distribution of HI-sRNAs for each sequenced isolate and acronyms are shown. (B) Sampling and sequencing schematic to discern HI-sRNAs. (C) HI-sRNA family counts and membership for each isolate, showing only the top 15 groups. Families were grouped strictly using a maximum edit distance of one nucleotide. Yellow indicates families present in a single isolate.

We next examined conservation of HI-sRNA accumulation between isolates and species. Some canonical plant miRNAs are highly conserved, with several ubiquitous families found in multiple plant orders, or even broadly in all land plants (Cuperus et al., 2011; Chávez Montes et al., 2014). Surprisingly, when using a strict cutoff (maximum edit distance of 1) we found that the majority of HI-sRNAs were not observed in more than one species (Figure 1C). In many cases HI-sRNAs were unique to single isolates of a single species (Figure 1C). This result implies that HI-sRNAs could be rapidly differentiating in expression, sequence, or both within these Cuscuta species.

Our previous work showed that C. campestris HI-sRNAs can target host mRNAs in several hosts (Shahid et al., 2018). We thus looked for evidence of interactions between our broader sets of HI-sRNAs with host (A. thaliana) mRNAs using two complementary methods: secondary siRNA accumulation (Shahid et al., 2018) and degradome analysis (Addo-Quaye et al., 2008). Secondary siRNAs can accumulate from mRNAs as a result of an initial miRNA- or siRNA-directed targeting event, especially when the initiating sRNA is 22 nucleotides long (Cuperus et al., 2010). A large portion of HI-sRNAs are 22-nt or are clustered from some sRNAs which are 22-nt in length (Figure 1—figure supplement 2), allowing us to detect their targeting with this approach. Degradome analysis made use of the NanoPARE method (Schon et al., 2018), which recovers 5’ ends of both capped and uncapped mRNAs. NanoPARE libraries were made from just one isolate from each of the four Cuscuta species, and comprised three biological replicates from the host portion of the interface (Supplementary file 2). A. thaliana xrn4 mutants were used as hosts for these experiments because they over-accumulate 5’ remnants of sRNA-mediated mRNA cleavage (Rymarquis et al., 2011; Schon et al., 2018). The CRCK2 mRNA is an example with both degradome and secondary siRNA evidence of targeting by a C. pentagona HI-sRNA (Figure 2A–E). Altogether these two analyses yielded a set of 61 target sites over 54 A. thaliana mRNAs targeted by Cuscuta HI-sRNAs confirmed by a single method and seven more confirmed by both (Figure 2F, Figure 2—figure supplement 1A, Supplementary file 6). Based on RNA-seq analysis, accumulation of confirmed target mRNAs is generally down-regulated in parasitized host stems (Figure 3). This greatly expands on the set of six mRNAs previously identified as host targets of C. campestris miRNAs (Shahid et al., 2018), and demonstrates that trans-species sRNAs are used by multiple Cuscuta species. Target predictions show that C. campestris homologs of targeted A. thaliana mRNAs invariably have lower complementarity to HI-sRNAs (Figure 4). Repeating the analysis pipeline to examine possible self-targeting of C. campestris mRNAs by HI-sRNAs found only four confirmed targeting interactions, an indication that HI-sRNAs may largely function in trans in the host (Figure 4—figure supplement 1).

Figure 2 with 2 supplements see all
Host targets of Cuscuta HI-sRNAs.

(A) Modeled sRNA-target interaction for A. thaliana CRCK2. (B) Secondary siRNA accumulation from CRCK2. (C) Phasing analysis of secondary siRNAs from CRCK2. Expected phase for cut-site shown in red. (D) Size distribution of CRCK2 secondary siRNAs. (E) Frequency of 5’ ends from the CRCK2 mRNA, with the predicted HI-sRNA cut site shown in red. (F) Host mRNAs with confirmed targeting by a Cuscuta HI-sRNA. Full details in Figure 2—figure supplement 1 and Supplementary file 6.

Analysis of mRNA accumulation in host-parasite interfaces.

Cumulative density plots of interface/control stem ratios for host mRNAs expressed in Cuscuta-host interfaces, assessed by RNA-seq. All mRNAs shown with black line. Colored lines and dots indicate mRNAs which are confirmed targets of HI-sRNAs in the indicated Cuscuta isolates.

Figure 4 with 1 supplement see all
Predicted trans-species and self-targeting in C. campestris homologs of target A. thaliana mRNAs.

Target prediction scores for confirmed A. thaliana mRNA targets (black) and best-blast-hit homologs in C. campestris (red). All sRNAs with predicted targeting are shown.

Some mRNAs were confirmed to be targeted by HI-sRNAs from multiple species, with the most frequent interaction being with SEOR1 (Figure 2F, Figure 2—figure supplement 1A). SEOR1 encodes a phloem protein that acts to reduce sap loss after wounding (Knoblauch et al., 2014). C. campestris growth is enhanced when A. thaliana seor1 mutants are used as hosts (Shahid et al., 2018). However, the majority of mRNAs confirmed as targets are unique to a single Cuscuta species or isolate (Figure 2F, Figure 2—figure supplement 1A). A possible explanation could be that Cuscuta trans-species sRNAs function to regulate similar host processes, while not necessarily the same target mRNAs. Additionally, our current analysis is likely to have missed many targets, both due to lack of sensitivity of our methods (secondary siRNA accumulation and/or degradome analysis both can miss true targets), and because A. thaliana is unlikely to be a major host of Cuscuta in nature.

Numerous target mRNAs are known to be involved in the same processes, both on a gene ontology level (Figure 2—figure supplement 2) and when manually examining known pathways. Genes involved in auxin signaling repeatedly appear, including the previously identified targets TIR1, AFB2, and AFB3 (Shahid et al., 2018) and new targets PXY (Etchells et al., 2012) and ARK2 (Sankaranarayanan and Samuel, 2015) with a proposed role in auxin response. Auxin signaling is involved in many processes in the plant, and is potentially connected to defense against Cuscuta through its role in glucosinolate production (Smith et al., 2016; Salehin et al., 2019). Phloem protein mRNAs are targeted, adding OPS (Truernit et al., 2012) to the previously identified SEOR1. A receptor-like kinase (CuRe1) from tomato is a resistance gene that prevents Cuscuta reflexa infestation (Hegenauer et al., 2016). Receptor-like kinases and kinases in general are well represented in the set of HI-sRNA targets, including several that are involved in defense responses. This includes the well-known defense regulator MPK3 (Asai et al., 2002) and previously discovered BIK1 (Veronese et al., 2006). Another targeted pathway is brassinosteroid (BR) signaling, with targets BRI1 (Planas-Riverola et al., 2019), MAPKKK5 (Yan et al., 2018), and PICKLE (Zhang et al., 2014). BR has a clear role in defense, with connections to both BIK1 and MPK3 (Zheng et al., 2018). An overall theme of targeting host immunity and vascular system function emerges from this set of confirmed targets.

Plant miRNAs that initially seem unrelated based on divergent sequences can sometimes be grouped into superfamilies (Xia et al., 2013). To discover potential superfamilies among Cuscuta HI-sRNAs, we clustered them with a cutoff of five substitutions and barring indels (Figure 5—figure supplement 1). This clustering strategy gives low rates of grouping by random chance (Figure 5—figure supplement 2). Many superfamilies of Cuscuta HI-sRNAs were found, with a substantial portion of them shared between species and isolates (Figure 5A). 19 superfamilies were shared between all isolates except C. indecora, and another 14 superfamilies were present in at least one isolate each of C. campestris, C. pentagona, and C. gronovii. Leveraging the prior C. campestris miRNA annotations, we can extrapolate that 158 out of 332 superfamilies which contain C. campestris HI-sRNAs are likely miRNAs. Furthermore, we extrapolate that of the superfamilies present in C. campestris with proven target relationships, 22 out of 23 are likely to be miRNAs (Figure 2—figure supplement 1B).

Figure 5 with 2 supplements see all
Cuscuta HI-sRNAs form superfamilies that co-vary with target sites across eudicots.

(A) sRNA superfamily count and membership for each Cuscuta isolate. Colors indicate general groupings of superfamilies. (B) An example HI-sRNA superfamily aligned to target sites from homologs in 36 eudicot genomes. Nucleotide and amino acid Shannon entropy from the alignments are shown as bits. Vertical red lines indicate the frame. Dots indicate the number of possible synonymous nucleotides at each codon. 17 additional examples in supplementary file 7. (C) Average conservation of target sites from homologs. Confirmed target site shown (red point), with all other possible sites shown by 25–75% quartiles (black line) and median (black point).

HI-sRNAs within a superfamily vary at several positions both between and within species (Figure 5B). In many cases variation within superfamilies occurred in a visible three-nucleotide period (e.g. SupFam_24, SupFam_37; Figure 5B, Supplementary file 7). This pattern led us to investigate nucleotide variation in corresponding target sites among possible hosts. All four Cuscuta species in this study are generalists that parasitize eudicot hosts, so we aligned homologous target mRNAs from 36 eudicot species (Supplementary file 8). Analysis of translated target site conservation shows that HI-sRNAs target highly-conserved protein-coding positions (Figure 5C). Positional variations in HI-sRNA superfamilies precisely correspond to variable positions in homologous target sequences (Figure 5B). This variation is frequently apparent at synonymous sites, accounting for the three-nucleotide periodicity of superfamily variation. Modeling correlation of positional variation between HI-sRNA superfamilies and eudicot target sites found 18 significant (p-value<0.05, Pearson correlation) examples of this type of co-variation (Supplementary file 7, Figure 2—figure supplement 1C). Importantly, HI-sRNA superfamily variation occurs within single Cuscuta species (Figure 5B, Supplementary file 7), such that multiple HI-sRNA variants are commonly deployed by a given parasite during infestation. By targeting conserved sites, and making several HI-sRNA variants that collectively cover many/most possible synonymous target variants, Cuscuta may ensure successful targeting across a wide range of hosts.

Using sRNA-seq libraries made from C. campestris attachments on Nicotiana benthamiana (Shahid et al., 2018), we found evidence of targeting in transcripts homologous to known A. thaliana targets (Supplementary file 9). Additionally, N. benthamiana target mRNAs were generally down-regulated in interface tissues (Figure 6A, Supplementary file 10. Comparing targeting of TIR1 in A. thaliana and N. benthamiana homologs by SupFam_27 sRNAs illustrates differential complementarity of superfamily members to different mRNAs . The N. benthamiana TIR1 target sites encode identical amino acids, but vary at synonymous positions. Some SupFam_27 variants are more complementary than others from each of the different homologs (Figure 6B). This provides a direct example where variation in a Cuscuta sRNA superfamily accommodates synonymous-site variation in confirmed target mRNAs from different plant species.

Superfamilies compensate for variation in N.benthamiana target homologs.

(A) Accumulation of N. benthamiana target mRNAs. Interface (IN, red) and control stem (CS, black) are shown relative to average CS expression. Points represent biological replicates (N = 5 to 6). P values comparing IN to CS are displayed above the x axis; Wilcoxon rank-sum tests, unpaired, one-tailed. Accumulation was normalized to NbTIP41-L (Niben101Scf03385g06003) and NbPP2A (Niben101Scf09716g01002). (B) sRNA-target alignments of SupFam_27 sRNAs with TIR1 family members from N. benthamiana and A. thaliana. Complementarity scores (Allen et al., 2005) are shown in the heatplot. The strongest predicted interactions are shown on the right; highlighted nucleotides are synonymous variants relative to AtTIR1.

HI-sRNA superfamily diversity could also enable repression of multiple mRNAs with homologous target-sites within a single host. We examined target predictions within A. thaliana and found ten examples of gene family-specific motifs potentially targeted by Cuscuta HI-sRNA superfamilies (Supplementary file 11). These include a HI-sRNA superfamily predicted to target the mRNA region encoding the eponymous WRKY motifs within the well-known family of defense-related transcription factors (Pandey and Somssich, 2009). Several of the targets have been experimentally confirmed by secondary siRNA accumulation or degradome analysis but most remain predictions, including the WRKY family. However, false negatives are common with these methods of confirmation. The striking patterns of sequence covariation between the HI-sRNA superfamilies and their possible target mRNA families make a strong case for the reality of these interactions.


We conclude that multiple Cuscuta species use trans-species HI-sRNAs to target a substantial number of host mRNAs. Many if not most of these HI-siRNAs are likely to be miRNAs. Host genes involved in pathogen defense, hormone signaling, and vascular system function are common targets of Cuscuta trans-species HI-sRNAs. Cuscuta trans-species HI-sRNAs nearly always interact with highly conserved target sites within the coding sequences of host mRNAs. HI-sRNAs can often be grouped into superfamilies that have nucleotide diversity that corresponds with target site variation primarily at synonymous sites. It seems likely that host target sites are under purifying selection because they code for critical amino acids that have little variation even among distantly related eudicots. By targeting these already constrained protein-coding sites, and deploying an array of sRNA variants that cover most possible permutations of synonymous site variation, Cuscuta HI-sRNAs are likely to be robust against the evolution of host resistance by target site sequence changes. This is also a suitable strategy for a generalist parasite that interacts with diverse hosts. The strategy used by Cuscuta provides a novel paradigm for the molecular evolution of trans-species sRNA targeting during parasitism.

Materials and methods

Key resources table
Reagent type (species) or
DesignationSource or
Genetic reagent (A. thaliana)xrn4Rymarquis et al., 2011xrn4-5; CS68822; SAIL_681_E01T-DNA insertion mutation
in Col-0 background
Commercial assay or kitNextera DNAflex kitIlluminaProduct: 20018704
assay or kit
NEB primers set 1New England BiolabsProduct: E7335S
Commercial assay or kitNEB primers set 2New England BiolabsProduct: E7500S
Commercial assay or kitNEB primers set 3New England BiolabsProduct: E7710S
Commercial assay or kitNEB primers set 4New England BiolabsProduct: E7730S
ShortStack(Johnson et al., 2016)v3.8.5
Software, algorithmDESeq2(Love et al., 2014)v1.24.0
Biological sample
(C. campestris)
ccmShahid et al., 2018; Jim Westwood, Virginia Tech‘doddi’
Biological sample
(C. pentagona)
cpe-2017Ebay, seller: eden_wilds2017 collection
Biological sample (C. pentagona)cpe-2015Ebay, seller: eden_wilds2015 collection
Biological sample
(C. gronovii)
cgr-dpClaude dePamphilis, Penn StateProvenance unknown
Biological sample
(C. gronovii)
cgr-massJim Westwood, Virginia Techmassachusetts isolateOrigin: A Massachusetts cranberry bog
Biological sample
(C. gronovii)
cgr-pmWild collectionpurdue mountain isolateOrigin: Roadside near State College, PA (Coordinates: 40.866 N, 77.888 W)
Biological sample
(C. indecora)
cinwww.ars-grin.govPI 675068Origin: Texas

Seed sources

Request a detailed protocol

Cuscuta campestris (isolate ‘doddi’) was originally acquired from a tomato field in California, followed by several generations of selfing in the Westwood laboratory and provided to us as a gift. C. gronovii (isolate ‘DP’) was a gift C. dePamphilis, from an unknown source. C. gronovii (isolate ‘mass’) was collected in Massachusetts and provided as a gift by J. Westwood. C. gronovii (isolate ‘PM’) was collected from a road-side near State College, PA by M. Axtell in October 2017 (Coordinates: 40.866 N, 77.888 W). C. pentagona (isolates ‘eden-2015’ and ‘eden-2017’) were purchased from ebay seller eden_wilds in 2018, both collected from locations in upstate New York. C. indecora (isolate cin/GRIN) was acquired from the U.S. national plant germplasm system ( under the accession: PI 675068.

Genotyping Cuscuta

Request a detailed protocol

Cuscuta seed were scarified by nicking with a razor blade under dissecting microscope and germinated on wet paper towel under growth lighting at ~28°C, harvesting seedlings after 3–5 days of growth for DNA extraction. DNA was extracted using Edwards method (Kasajima et al., 2004). 2 uL of template was used in a 20 uL PCR reaction using Taq polymerase using 0.5 uM final concentration of each primer (Primers 'c' and 'f', Supplementary file 12) (Taberlet et al., 1991). PCR was performed for 30 cycles and enzymatically cleaned-up using 0.5 uL of Exo1 (NEB) and 1 uL of Antarctic Phosphatase (NEB) with 5 uL PCR product, followed by incubation (15 min at 37°C) and inactivation (15 min at 80°C). Sanger sequencing was performed by the Penn State genomics core using primer 'c' (Taberlet et al., 1991). Sequences were trimmed of low quality bases and aligned using MUSCLE (Edgar, 2004) to published TrnL-F sequences (Stefanovic et al., 2007; Costea et al., 2015). Nucleotide phylogeny was constructed using MEGA7 (Kumar et al., 2016) with a maximum likelihood method and 500 bootstrap replicates (Supplementary file 13).

Growth conditions

Request a detailed protocol

Host A. thaliana (Col-0 and xrn4) was sown on wet potting medium, followed by with 3 days of stratification at 4°C. Plants were placed into long day (16 day/8 night) growth conditions at ~23°C under cool-white-fluorescent lamps. Hosts were allowed to grow to maturity (4–5 weeks old), ready for attachment when first inflorescences were longer than 5 cm.

Cuscuta seeds were scarified and germinated as above. Seedlings were ready for attachment once completely emerged from their seed husk and roughly ~2 cm in length, 3–5 days depending on the species. Seedlings were placed in soil next to the primary bolts of host plants. House-built far-red supplementary LED lighting was used to induce attachment under fluorescent lights, allowing 4–5 days for attachment of the parasite. Once attached, parasitized hosts were removed from far-red lighting to prevent secondary attachment. For C. indecora, experimental attachments came from tendrils from a previously established C. indecora colony. 5 cm tendril tips were cut off of the colony and affixed to primary bolts of host plants with scotch tape. Plants were allowed to grow for 10 days after attachment followed by tissue harvest.

Tissue collection and RNA extraction

Request a detailed protocol

All tissues were collected by the following methods and immediately submerged in liquid nitrogen to preserve RNA stability. Guide to tissues gathered is found in Supplementary file 2. Interface (IN) tissue was collected by taking both the host and parasite portions of the interface, trimming away any stems above and below the connection. Parasite stem (PS) was harvested ~4 cm above the interface, approximately 4 cm long each. In NanoPARE experiment using xrn4 as a host, we collected only host interface (HIN); similar method to interface collection, except removing any parasite tissue which can be pulled away. Control stem (CS) tissues were harvested from non-parasitized A. thaliana, collecting stems from the same region where Cuscuta would have been attached. 1–3 tissues were pooled for each biological replicate. RNA was extracted by grinding tissue in a liquid nitrogen cooled mortar, with Tri-reagent (Sigma) added while still cold. Tri-reagent extraction was performed as per the manufacturer's suggestions with a second sodium-acetate–ethanol precipitation and wash step.

Sequencing library preparation

Request a detailed protocol

All small-RNA-seq libraries were prepared using a protocol based on the NEBnext small-RNA library kit (NEB), described as follows. (Step 1) 3’ SR Adaptor (NJ410) was pre-adenylated using 5’ adenylation kit (NEB) as per manufacturer’s instructions. (Step 2) 500 ng of total RNA, 1 μL adenylated adapter (5 μM) and water to 5.25 μL were denatured for 2 min at 70°C and immediately moved to ice. Entire reaction was combined with premixed 100 u RNA Ligase 2, truncated KQ (NEB), 1 μL 10x T4 RNA reaction buffer, 10 u RNAse inhibitor and 3 μL 50% PEG8000, to a total volume of 10 μL and incubated 1 hr at 25°C. (Step 3) Primer hybridization was performed adding 0.5 μL SR RT primer (NJ391, 10 μM) and 2.25 μL water to the prior reaction and incubated as follows: 5 min at 75°C, 15 min at 37°C, 15 min at 25°C, and holding at 4°C. (Step 4) 5’ SR RNA adaptor (NJ411) was diluted to 10 μM and denatured for 2 min at 70°C, moved to ice and used immediately for ligation. Ligation was performed combining the prior reaction with 0.5 μL denatured adapter (NJ411), 5 u RNA Ligase 1 (NEB), 0.25 μL 10x RNA ligase buffer, 10 u RNAse inhibitor, 0.5 μL ATP (10 mM), and water to 15 uL and incubated for 1 hr at 25°C. (Step 5) Reverse transcription was performed immediately following ligation, combining the prior reaction with 100 u Protoscript II reverse transcriptase (NEB), 4.5 μL 5x first strand synthesis buffer, 1 μL dNTPs (10 uM), 1.5 μL DTT (0.1 M), and 10 u RNAse inhibitor equaling 23 μL in total volume and incubated for 1 hr at 50°C followed by heat-killing for 15 min at 70°C. (Step 6) Library amplification was performed, combining 5 μL of cDNA with 25 μL LongAmp Taq 2x master mix (NEB), 1.25 μL SR primer (NJ412, 10 μM), 1.25 μL barcode primer (‘NEB’ primers, 10 μM), and water to 50 μL total volume. Reaction was performed as follows: 30 s initial denature at 94°C, 15 cycles of 15 s at 94°C, 30 s at 62°C, and 15 s at 70°C, followed by final extension of 5 min at 70°C. Reactions were purified and size selected for sRNAs 15–40 nt in length by PAGE. Extracted bands were quantified by qPCR and quality-controlled by high-sensitivity DNA chip (Agilent). Sequencing was performed on a NextSeq550 (Illumina) with the high-output kit (75 nt, single-end, single barcode) by the Penn State genomics core. Sequencing libraries were de-multiplexed and adaptor trimmed using cutadapt (Martin, 2011) (cutadapt -a AGATCGGAAGA -m 15 j 8 -o output.fq input.fq).

NanoPARE and mRNA-seq libraries were prepared using the protocol described in Schon et al. (2018), with the following details: NanoPARE and mRNA-seq were performed on interfaces (IN) of four isolates (ccm, cpe-2015, cgr-dp, and cin) and control stems (CS), grown on Col-0 A. thaliana. NanoPare was also performed on host interfaces (HIN) of the same isolates and control stem grown on xrn4 mutant Col-0 A. thaliana. The Nextera DNA flex kit (Illumina) was used for tagmentation of 110 ng pre-amplified PCR product. Libraries were amplified using different barcoded i7 and i5 primer sets, described in Supplementary file 12), allowing for either amplification of 5’ ends (NanoPARE) or all tagged entities (mRNA-seq). The sequencing of NanoPARE data made use of custom read one sequencing and i5 index sequencing primers (NJ395 and NJ416, reverse complements of each other), which sequence out from the template switching oligo adapter. Sequencing was performed on a NextSeq550 (Illumina) with the high-output kit (75 nt, single-end, double barcoded) by the Penn State genomics core. Sequencing libraries were de-multiplexed and NanoPARE libraries were trimmed using an in-house script to remove any residual untemplated 5’ nucleotides caused by reverse transcription of the template-switching oligo.

Genome-free sRNA discovery

Request a detailed protocol

Genome-free sRNA discovery was performed using a set of in-house scripts, corresponding to the following pipeline (Figure 1—figure supplement 2). Reads were filtered by size, retaining lengths of 20 to 24 nt, and condensed to unique sequences with a count of abundances for each tissue. For each Cuscuta species, unique reads were further condensed by sequence similarity to their most abundant variants. This process found similar variants for sRNAs in rank order of abundance, clustering sRNAs with a Levenshtein edit distance of 2 or less. Reads which do not cluster to a variant with abundance of 0.5 reads per million (RPM) or higher are discarded. Most abundant sRNA variants of each cluster are reported, with the abundance as the combined abundance of all clustered reads. Host sRNAs were then filtered, removing an sRNA if it met one of the following criteria: (1) it is closely similar to an annotated miRNA; (2) it aligns perfectly to the A. thaliana genome or transcriptome (Cheng et al., 2017); (3) it is present in non-parasitized A. thaliana control libraries at an RPM greater than 1/100 its expression in parasite libraries. Differential expression analysis was then performed with DEseq2 (Love et al., 2014) to identify sRNAs up-regulated in the interface tissue relative to the parasite stem, using the command the ‘results’ command with a false-discovery rate of 0.1 (Benjamini–Hochberg correction). This pipeline resulted in our list of HI-sRNAs.

Superfamilies were constructed using an in-house script that corresponds to the following pipeline. All by all comparisons of HI-sRNA sequences were performed, measuring modified hamming distance (Figure 5—figure supplement 1), and sequences with a distance of 5 or less were clustered together, ordered by overall size of the superfamily. To test this distance cutoff, HI-sRNA sequences were shuffled using Ushuffle (Jiang et al., 2008), set to retain di-nucleotide structure (10 random replicates) (Figure 5—figure supplement 2).

Target confirmation

Request a detailed protocol

Target prediction of HI-sRNAs was performed using the script (, 2014; copy archived at under default settings, using HI-sRNA from a given isolate as the query and A. thaliana ARAPORT11 transcriptome (Cheng et al., 2017) as the subject.

To find secondary siRNAs produced from targeting of A. thaliana genes, sRNA-annotation was performed on the A. thaliana genome (Arabidopsis Genome Initiative, 2000), using ShortStack (Johnson et al., 2016) ( with gene locations from the ARAPORT11 annotation (Cheng et al., 2017) as the basis for sRNA loci. Differential expression analysis was performed with DEseq2 (Love et al., 2014) to identify loci up-regulated in the interface (IN) relative to control stem (CS), using the ‘results’ command with a false-discovery rate of 0.1 (Benjamini–Hochberg correction). Up-regulated loci were then filtered to retain loci which met the following criteria: (1) have a strong predicted HI-sRNA target site (complementarity score (Allen et al., 2005) six or less); (2) are unstranded; (3) have a predominant sRNA length of 21/22 nt; (4) have a minimum depth of 20 reads. Plots of loci with predicted targets, radar plots of sRNA phasing, and length distribution plots to find examples where HI-sRNAs are clearly causative in the locus.

To confirm targeting of A. thaliana genes using degradome data, host stem of interface (HIN) NanoPARE libraries were aligned to the ARAPORT11 (Cheng et al., 2017) transcriptome using bowtie (Langmead et al., 2009) (bowtie -p 8 f -v 3 S -a). Using an in-house script, frequency at 5’ positions of alignments were intersected with predicted HI-sRNA target sites, retaining interactions which met the following criteria: (1) have a strong prediction score (complementarity score six or less; Allen et al., 2005) six or less; (2) the target site is greater than 100 nt from the start of the transcript (to avoid miscalls with the transcriptional start site); (3) the target is not in an organellar genome; (4) the target peak is greater than the median peak depth in the gene; (5) the target peak is found in all three replicates; (6) the target peak is at least 10 fold higher than detected in control stems. Candidates were then examined by eye, filtering out hits with low expression compared to the rest of the gene, hits which appear to be in the transcriptional start site, or hits with low prominence compared to surrounding peaks. Gene-ontology analysis was performed on confirmed targets using blast2go (Götz et al., 2008).

Confirmation of targeting in C. campestris genes (Figure 4—figure supplement 1) was performed using the similar methods as above, with the following changes. Secondary siRNAs were annotated with ShortStack (Johnson et al., 2016) to the C. campestris genome (Vogel et al., 2018), using gene annotations as the basis for loci. Different NanoPARE libraries were used, coming from mixed host-parasite interface (IN), and were aligned to the C. campestris transcriptome (Vogel et al., 2018). No direct control was present to compare peak expression, so the few confirmed examples could not be subjected to this filter.

mRNA-seq analysis

Request a detailed protocol

mRNA-seq libraries were aligned to the A. thaliana genome (Arabidopsis Genome Initiative, 200) using HISAT2 (Kim et al., 2015) (hisat2 -p 2 –max-intronlen 5000 -x genome.fa -U library.fq.gz). Gene expression was quantified by minBamCov (Barnett et al., 2011) (multiBamCov -bams alignment.bam -bed annotation.gff) using the ARAPORT11 annotation (Cheng et al., 2017). Deseq2 (Love et al., 2014) was used to accurately estimate log fold change of mRNAs for each condition, using the ‘lfcShrink’ command (type = apeglm).

Identification of C. campestris miRNAs

Request a detailed protocol

To identify sRNAs that were derived from miRNA hairpins, de novo annotation of sRNA loci in the C. campestris genome (Vogel et al., 2018) was performed using ShortStack (Johnson et al., 2016). Next, loci containing a HI-sRNA from C. campestris were extracted and screened by eye to find miRNAs with the criteria that they have a clear concise hairpin with two matching regions of expression which have a clear two nt offset (factors consistent with miRNA processing). Superfamilies were annotated to identify which contained confirmed miRNAs.

Discovery of target homologs in eudicots

Request a detailed protocol

cDNA and CDS libraries of 36 eudicot species (Supplementary file 8) available in Phytozome v12.1.6 (Goodstein et al., 2012) were downloaded for local analysis. Nucleotide queries of A. thaliana target transcripts were searched against translated CDS libraries from eudicots using blastx (Camacho et al., 2009) (blastx -query target.fa -db eudicot.db -outfmt 6 -num_threads 6 -evalue 0.001 -task blastx-fast), extracting the best hit for each species based on bit score. Conservation of target site and coding sequence of homologs was calculated by aligning their translated coding sequences using MUSCLE (Edgar, 2004) and measuring the average conservation (Shannon entropy) of every eight amino acid window, flagging the window which corresponds to the target site. RNA superfamilies and transcripts of best-hit homologs of target were each aligned using MUSCLE (Edgar, 2004) and oriented to each other using in-house scripts. Positional nucleotide conservation for target site in homologs and superfamily sRNAs were calculated and used to construct a linear regression model in R (lm function and resulting p-values). For each comparison, n equals the number of correlating nucleotide positions in the interaction (n = 20–24).

Discovery of conserved motifs in A. thaliana targets

Request a detailed protocol

Conserved motifs targeted by sRNA superfamilies in A. thaliana were found by first extracting all targets of a superfamily with very strong predicted targeting (complementarity score [Allen et al., 2005] three or less). Using an in-house script, sequences of target sites were translated for the correct frame and clustered using a greedy algorithm with a maximum edit distance in a cluster of three or less. Conservation of target sites and surrounding nucleotide sequences were then calculated and oriented adjacent to multiple sequence alignments of the targeting superfamily, highlighting confirmed interactions.

Target analysis in N. benthamiana

Request a detailed protocol

sRNA sequencing libraries of C. campestris parasitizing N. benthamiana were retrieved from SRA bioproject: PRJNA408115 (Supplementary file 2) (Shahid et al., 2018). Parasite stem and interface libraries were used as input in the genome-free sRNA discovery pipeline explained above. Secondary siRNA producing loci were identified using the pipeline explained above with the N. benthamiana Genome v1.0.1 (Bombarely et al., 2012). Transcripts that were identified as containing HI-sRNA induced secondary siRNA loci were then compared to C. campestris HI-sRNA targets in A. thaliana, identifying possible homologs of these targets (Supplementary file 10).

For the quantification of target mRNA expression, C. campestris was attached to 2–3 week old N. benthamiana plants, and allowed to grow for 11–15 days. Total RNA was extracted from the interface of parasitized (IN) and the stem of unparasitized plants (CS), using Tri-reagent (Sigma) and the double-precipitation method explained above. cDNA was synthesized from 1 μg of total RNA using the ProtoScript II (NEB) reverse transcriptase with random primers, as per manufacturer instructions. qRT-PCR primers (Supplementary file 12) were designed for these transcripts, sometimes targeting several homologous mRNAs in N. benthamiana equally well. The amplicons were designed to bridge the best predicted cut sites among causative sRNAs. Best-blast-hit homologs of housekeeping genes PP2A (AT1G13320) and TIP41-L (AT4G34270) in N. benthamiana were used for normalization of PCR accumulation between cDNAs, averaging expression relative to control stem. qPCR resulted in expression from 5 to 6 biological replicates in IN and CS samples.

Code availability

Request a detailed protocol

ShortStack (Johnson et al., 2016) and StrucVis are both freely available at (Axtell, 2018; copy archived at is freely available at Muscle (Edgar, 2004) is freely available at MEGA7 (Kumar et al., 2016) is freely available at Blast-suite (Camacho et al., 2009) is freely available at Cutadapt (Martin, 2011) is freely available at Bamtools (Barnett et al., 2011) is freely available at The R package DEseq2 (Love et al., 2014) is freely available at HISAT2 (Kim et al., 2015) and bowtie (Langmead et al., 2009) are both freely available at Ushuffle (Jiang et al., 2008) is freely available at Blast2go (Götz et al., 2008) is available with a limited free version at

Data availability

Request a detailed protocol

sRNA-seq data from this work are available at the NCBI SRA under BioProject PRJNA543296.

Data availability

Sequencing data have been deposited in NCBI SRA under Bioproject number PRJNA543296.

The following data sets were generated
    1. Johnson NR
    2. Axtell MJ
    (2019) NCBI SRA
    ID PRJNA543296. Small RNA-seq from multiple Cuscuta species parasitizing Arabidopsis.


Decision letter

  1. Yijun Qi
    Reviewing Editor; School of Life Sciences, Tsinghua University, China
  2. Christian S Hardtke
    Senior Editor; University of Lausanne, Switzerland
  3. Sascha Laubinger
    Reviewer; University of Tübingen, Germany

In the interests of transparency, eLife publishes the most substantive revision requests and the accompanying author responses.

Acceptance summary:

Small RNAs (sRNAs) produced in one species can move into another to regulate gene expression and interactions between species. It has been previously reported that the parasitic plant Cuscuta produces microRNAs (miRNAs) to silence host plant target mRNAs. In this study, the authors identified many more haustorially-induced sRNAs (HI-sRNAs) in several Cuscuta species and confirmed targets of some HI-sRNAs. A subset of the HI-sRNAs are shared between Cuscuta species and isolates and can be grouped into superfamilies. Interestingly, the authors found that HI-sRNAs within a superfamily have variation in a three-nucleotide period and their target mRNAs contain variations at the synonymous sites within the target sites. The authors propose that such co-variation may enable the HI-sRNAs to target homologous mRNAs in different host plants. These findings may have implications in the other parasite/pathogen-host interactions.

Decision letter after peer review:

Thank you for submitting your article "Compensatory sequence variation between trans-species small RNAs and their target sites" for consideration by eLife. Your article has been reviewed by three peer reviewers, one of whom is a member of our Board of Reviewing Editors, and the evaluation has been overseen by Christian Hardtke as the Senior Editor. The following individual involved in review of your submission has agreed to reveal their identity: Sascha Laubinger (Reviewer #2).

The reviewers have discussed the reviews with one another and the Reviewing Editor has drafted this decision to help you prepare a revised submission.

The reviewers agreed that such findings should in principle be published at the highest levels. While the reviewers furthermore agreed that the data presented are clear and thorough, they also agreed that some additional experiments would be essential to improve the manuscript.

Essential revisions:

Some experimental data are required to show that Cuscuta HS-sRNAs indeed use sequence variations to target homologous genes with corresponding sequence variations in different plant species or members within the same gene family in the same host plant. The following experiments are suggested.

1) Generate complementation lines of seor1, or apk2b, or other 1-2 of the 19 examples that contain synonymous mutations. The lines contain the mutated sequences in the synonymous sites but not cleaved by the Hi-sRNAs, and compare the infection results with the mutant and normal complementation lines in Arabidopsis. The reviewers realized that generating stable transgenic lines takes a long time and thus suggested the authors to provide results if they already have the lines.

2) Perform 5' RACE experiments in non-Arabidopsis plants infected by different Cuscuta species to confirm that the predicted targets are cleaved by HI-sRNAs.

3) Examine the cleavage of target sequences from different hosts by a few representative Cuscuta HI-sRNAs using reporter assays in tobacco leaves.

4) Perform qRT-PCR or RNA-seq analysis in Arabidopsis and other hosts infected by Cuscuta species to show that the target genes are indeed down-regulated by Cuscuta HI-sRNAs.

Author response

Essential revisions:

Some experimental data are required to show that Cuscuta HS-sRNAs indeed use sequence variations to target homologous genes with corresponding sequence variations in different plant species or members within the same gene family in the same host plant. The following experiments are suggested.

Overall we've addressed this by adding new data and analyses that demonstrate targeting of homologous mRNAs, with synonymous sequence variants, from Nicotiana benthamiana. The newly added data are in new Figure 6, and Supplementary files 9 and 10.

1) Generate complementation lines of seor1, or apk2b, or other 1-2 of the 19 examples that contain synonymous mutations. The lines contain the mutated sequences in the synonymous sites but not cleaved by the Hi-sRNAs, and compare the infection results with the mutant and normal complementation lines in Arabidopsis. The reviewers realized that generating stable transgenic lines takes a long time and thus suggested the authors to provide results if they already have the lines.

We agree that these experiments would be informative. We have started producing the plant materials for these experiments, but they are not ready in time for this manuscript.

2) Perform 5' RACE experiments in non-Arabidopsis plants infected by different Cuscuta species to confirm that the predicted targets are cleaved by HI-sRNAs.

Confirmation of targeting in Nicotiana benthamiana was shown for Cuscuta campestris miRNAs by 5’ RACE and secondary siRNA identification in our prior publication (Shahid et al., 2018). To address this question in the context of this project, we re-analyzed the prior sRNA-seq data with our genome-free pipeline, confirming targeting for many of these transcripts through the presence of secondary siRNAs (now shown in Supplementary files 9 and 10). We show a more in-depth example of a sRNA superfamily targeting multiple TIR1 homologs in N. benthamiana (Figure 6), which illustrates a clear case where synonymous variants of a target (comparing N. benthamiana and A. thaliana) are compensated for by miRNA superfamily variation.

3) Examine the cleavage of target sequences from different hosts by a few representative Cuscuta HI-sRNAs using reporter assays in tobacco leaves.

Thank you for the suggestion. While potentially informative, we feel this experiment is outside of the scope of our present study. We do worry that artificial reporter assays might not reflect the true situation during parasite infestation, primarily because of the different modes of microRNA delivery.

4) Perform qRT-PCR or RNA-seq analysis in Arabidopsis and other hosts infected by Cuscuta species to show that the target genes are indeed down-regulated by Cuscuta HI-sRNAs.

Done. In the modified Figure 3 we show an analysis of confirmed targets based on mRNA-seq in Arabidopsis, confirming a general trend of target down-regulation during parasite infestation. We have also now performed qRT-PCR experiments from C. campestris – N. benthamiana samples for several confirmed targets. In general these N. benthamiana targets also trend toward down regulation, with the TIR1 example significantly down-regulated (newly added Figure 6).

Article and author information

Author details

  1. Nathan R Johnson

    1. Intercollege PhD Program in Plant Biology, Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, United States
    2. Department of Biology, The Pennsylvania State University, University Park, United States
    Conceptualization, Data curation, Formal analysis, Investigation, Methodology
    Competing interests
    No competing interests declared
  2. Claude W dePamphilis

    1. Intercollege PhD Program in Plant Biology, Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, United States
    2. Department of Biology, The Pennsylvania State University, University Park, United States
    Conceptualization, Funding acquisition, Methodology
    Competing interests
    No competing interests declared
  3. Michael J Axtell

    1. Intercollege PhD Program in Plant Biology, Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, United States
    2. Department of Biology, The Pennsylvania State University, University Park, United States
    Conceptualization, Supervision, Funding acquisition, Methodology, Project administration
    For correspondence
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0001-8951-7361


National Institute of Food and Agriculture (2018-67013-28514)

  • Michael J Axtell
  • Claude W dePamphilis

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.


We thank T Phifer for help generating preliminary data; LS Berghard for greenhouse support; C Praul for next-generation sequencing support; J Westwood (Virginia Tech) for providing multiple Cuscuta seed stocks and thoughtful insight into the work; C Depew for informing us about C. gronovii locations in the wild; and M Schon and M Nodine for early access to the NanoPARE method. This work was supported by an award from the United States Department of Agriculture - National Institute of Food and Agriculture [grant number 2018-67013-285] and a Graduate Research Initiative grant (GRI) from the Huck Institutes of the Life Sciences at Penn State.

Senior Editor

  1. Christian S Hardtke, University of Lausanne, Switzerland

Reviewing Editor

  1. Yijun Qi, School of Life Sciences, Tsinghua University, China


  1. Sascha Laubinger, University of Tübingen, Germany

Publication history

  1. Received: June 27, 2019
  2. Accepted: November 25, 2019
  3. Version of Record published: December 17, 2019 (version 1)


© 2019, Johnson et al.

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.


  • 1,888
    Page views
  • 209
  • 16

Article citation count generated by polling the highest count across the following sources: Crossref, PubMed Central, Scopus.

Download links

A two-part list of links to download the article, or parts of the article, in various formats.

Downloads (link to download the article as PDF)

Open citations (links to open the citations from this article in various online reference manager services)

Cite this article (links to download the citations from this article in formats compatible with various reference manager tools)

  1. Nathan R Johnson
  2. Claude W dePamphilis
  3. Michael J Axtell
Compensatory sequence variation between trans-species small RNAs and their target sites
eLife 8:e49750.

Further reading

    1. Plant Biology
    Josephine HR Maidment, Motoki Shimizu ... Mark J Banfield
    Research Article

    A subset of plant intracellular NLR immune receptors detect effector proteins, secreted by phytopathogens to promote infection, through unconventional integrated domains which resemble the effector’s host targets. Direct binding of effectors to these integrated domains activates plant defenses. The rice NLR receptor Pik-1 binds the Magnaporthe oryzae effector AVR-Pik through an integrated heavy metal-associated (HMA) domain. However, the stealthy alleles AVR-PikC and AVR-PikF avoid interaction with Pik-HMA and evade host defenses. Here, we exploited knowledge of the biochemical interactions between AVR-Pik and its host target, OsHIPP19, to engineer novel Pik-1 variants that respond to AVR-PikC/F. First, we exchanged the HMA domain of Pikp-1 for OsHIPP19-HMA, demonstrating that effector targets can be incorporated into NLR receptors to provide novel recognition profiles. Second, we used the structure of OsHIPP19-HMA to guide the mutagenesis of Pikp-HMA to expand its recognition profile. We demonstrate that the extended recognition profiles of engineered Pikp-1 variants correlate with effector binding in planta and in vitro, and with the gain of new contacts across the effector/HMA interface. Crucially, transgenic rice producing the engineered Pikp-1 variants was resistant to blast fungus isolates carrying AVR-PikC or AVR-PikF. These results demonstrate that effector target-guided engineering of NLR receptors can provide new-to-nature disease resistance in crops.

    1. Cell Biology
    2. Plant Biology
    Han Nim Lee, Jenu Varghese Chacko ... Marisa S Otegui
    Research Article Updated

    The ubiquitin-binding NBR1 autophagy receptor plays a prominent role in recognizing ubiquitylated protein aggregates for vacuolar degradation by macroautophagy. Here, we show that upon exposing Arabidopsis plants to intense light, NBR1 associates with photodamaged chloroplasts independently of ATG7, a core component of the canonical autophagy machinery. NBR1 coats both the surface and interior of chloroplasts, which is then followed by direct engulfment of the organelles into the central vacuole via a microautophagy-type process. The relocalization of NBR1 into chloroplasts does not require the chloroplast translocon complexes embedded in the envelope but is instead greatly enhanced by removing the self-oligomerization mPB1 domain of NBR1. The delivery of NBR1-decorated chloroplasts into vacuoles depends on the ubiquitin-binding UBA2 domain of NBR1 but is independent of the ubiquitin E3 ligases SP1 and PUB4, known to direct the ubiquitylation of chloroplast surface proteins. Compared to wild-type plants, nbr1 mutants have altered levels of a subset of chloroplast proteins and display abnormal chloroplast density and sizes upon high light exposure. We postulate that, as photodamaged chloroplasts lose envelope integrity, cytosolic ligases reach the chloroplast interior to ubiquitylate thylakoid and stroma proteins which are then recognized by NBR1 for autophagic clearance. This study uncovers a new function of NBR1 in the degradation of damaged chloroplasts by microautophagy.