Compensatory sequence variation between trans-species small RNAs and their target sites

  1. Nathan R Johnson
  2. Claude W dePamphilis
  3. Michael J Axtell (corresponding author)
  1. The Pennsylvania State University, United States
6 figures, 1 table and 14 additional files

Figures

Figure 1 with 2 supplements
Haustorium-induced small RNAs (HI-sRNAs) are present in multiple Cuscuta species.

(A) Phylogeny of select Cuscuta species. Size distribution of HI-sRNAs for each sequenced isolate and acronyms are shown. (B) Sampling and sequencing schematic to discern HI-sRNAs. (C) HI-sRNA family counts and membership for each isolate, showing only the top 15 groups. Families were grouped strictly using a maximum edit distance of one nucleotide. Yellow indicates families present in a single isolate.

Figure 1—figure supplement 1
Host preference in Cuscuta species in the United States.

(A) Pipeline for processing herbaria data from the Mid-Atlantic Herbaria Consortium (MAHC; http://midatlanticherbaria.org) on interactions with each Cuscuta species of interest. (B) Ranked list of the most frequently identified host families for each species. The top 15 are shown for each species, with the top 10 overall identified with consistent colors (all others in black). (C) Geographical listings within the United States for each sample for which latitude and longitude or a searchable county were found.

Figure 1—figure supplement 2
Genome-free HI-sRNA discovery pipeline.

(A) Discovery of HI-sRNAs in Cuscuta isolates. The three major steps are condensing reads to representative sRNAs in a genome-free manner, filtering out reads that could have originated from A. thaliana, and performing differential expression analysis with DESeq2 to find reads up-regulated in the interface tissue (FDR < 0.1; null hypothesis: sRNA not differentially expressed). (B) Example of a C. campestris sRNA discovered by this method, with the top 25 constituent sRNA sequences ranked by expression. The most highly expressed read is taken as the representative sRNA sequence and is marked with a black box. Green boxes show variations from the representative sequence, with the total distance shown to the left. (C) Same as B but for a known miRNA, which shows variation similar to that of the novel sRNA in B. (D) Comparison of the proportion of reads present in annotated miRNAs using genome-alignment (ShortStack) and genome-free approaches. Reads are ranked by size, with the canonical miRNA (blue) and the variants (grey) showing the proportion of reads they make up in the sRNA. Reads grouped in the locus by the genome-free method that are absent in the alignment approach are shown in green.

Figure 2 with 2 supplements
Host targets of Cuscuta HI-sRNAs.

(A) Modeled sRNA-target interaction for A. thaliana CRCK2. (B) Secondary siRNA accumulation from CRCK2. (C) Phasing analysis of secondary siRNAs from CRCK2. Expected phase for cut-site shown in red. (D) Size distribution of CRCK2 secondary siRNAs. (E) Frequency of 5’ ends from the CRCK2 mRNA, with the predicted HI-sRNA cut site shown in red. (F) Host mRNAs with confirmed targeting by a Cuscuta HI-sRNA. Full details in Figure 2—figure supplement 1 and Supplementary file 6.

Figure 2—figure supplement 1
Summary of Cuscuta HI-sRNA and host gene target relationships.

(A) Complete list of target interactions between sRNAs and host genes. The confirmation status diagram indicates the species in which the interaction is confirmed. Target gene information includes the number of homologs found in 36 eudicot transcriptomes. sRNA counts in superfamilies and the presence of a confirmed miRNA in the family are shown (NoAl: ccm sRNA failed to align to ccm genome). Target interaction columns indicate the conservation at the translated target site in an alignment of found homologs (5’/3’ UTR: not considered for conservation analysis). The correlation coefficient and P-value for variation in positions in target and sRNA superfamily are shown. (B) Breakdown of superfamilies with confirmed targeting by the presence of a confirmed miRNA, where possible. (C) Correlation of positional variation in target sites and their sRNAs, indicating the interactions with a significant correlation.

Figure 2—figure supplement 2
Most common GO terms for confirmed target genes.

(A) GO terms for molecular function with a nodescore ≥5.0. Colored bars indicate the species for which the interaction is confirmed. Locations where bars overlap indicate genes where both species have confirmed targeting. (B) Same as A, but for biological processes.

Figure 3
Analysis of mRNA accumulation in host-parasite interfaces.

Cumulative density plots of interface/control stem ratios for host mRNAs expressed in Cuscuta-host interfaces, assessed by RNA-seq. All mRNAs shown with black line. Colored lines and dots indicate mRNAs which are confirmed targets of HI-sRNAs in the indicated Cuscuta isolates.

Figure 4 with 1 supplement
Predicted trans-species and self-targeting in C. campestris homologs of target A. thaliana mRNAs.

Target prediction scores for confirmed A. thaliana mRNA targets (black) and best-blast-hit homologs in C. campestris (red). All sRNAs with predicted targeting are shown.

Figure 4—figure supplement 1
Experimental flowchart for confirming self-targeting of C. campestris mRNAs by HI-sRNAs.

(A) Pipeline for confirmation by the presence of secondary siRNAs. (B) Pipeline for confirmation by the 5’ transcript sequencing (NanoPARE). (C) List of all mRNAs with strong evidence for self-targeting.

Figure 5 with 2 supplements
Cuscuta HI-sRNAs form superfamilies that co-vary with target sites across eudicots.

(A) sRNA superfamily count and membership for each Cuscuta isolate. Colors indicate general groupings of superfamilies. (B) An example HI-sRNA superfamily aligned to target sites from homologs in 36 eudicot genomes. Nucleotide and amino acid Shannon entropy from the alignments are shown as bits. Vertical red lines indicate the frame. Dots indicate the number of possible synonymous nucleotides at each codon. Seventeen additional examples are shown in Supplementary file 7. (C) Average conservation of target sites from homologs. The confirmed target site is shown (red point), with all other possible sites shown by 25–75% quartiles (black line) and median (black point).
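The per-position Shannon entropy in panel B can be illustrated with a minimal sketch. This is not the authors' code: the function names are hypothetical, the toy alignment is invented for illustration, and skipping gap characters is an assumption that the caption does not address.

```python
import math
from collections import Counter

def column_entropy_bits(column):
    """Shannon entropy (bits) of one alignment column.

    Gap characters ("-") are ignored; this is an assumption of the sketch.
    """
    symbols = [s for s in column if s != "-"]
    if not symbols:
        return 0.0
    counts = Counter(symbols)
    total = len(symbols)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

def alignment_entropy(aligned_seqs):
    """Per-column entropy for a list of equal-length aligned sequences."""
    return [column_entropy_bits(col) for col in zip(*aligned_seqs)]

# Hypothetical toy alignment, not data from the study.
toy = ["ACGT", "ACGA", "ACTA", "ACTC"]
profile = alignment_entropy(toy)
```

A fully conserved column scores 0 bits, and a nucleotide column with all four bases at equal frequency scores the maximum of 2 bits. The same function works on amino acid columns, where the maximum is log2(20) bits.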

Figure 5—figure supplement 1
Clustering method for forming HI-sRNA superfamilies.

(A) Example demonstrating the ‘modified Hamming distance’ (mHD) used to compare strings. Levenshtein edit distance tolerates insertions and deletions, whereas the mHD does not allow these operations. Strings that contain insertional errors therefore incur a high penalty, while shift errors are penalized the same. (B) Example of clustering seven HI-sRNAs into three superfamilies using mHD. Species are indicated by color; clustering is independent of species. Edges close enough to form a cluster (solid line, red distance number) and inadequate edges (dashed line, black distance number) connect HI-sRNA nodes. The cutoff for clustering is an mHD of five or less. Not every pair of nodes in a cluster must meet this threshold; a node needs only one adequate edge to join a cluster.
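The clustering rule in panel B (a node joins a cluster if it has at least one edge at or below the cutoff) amounts to single-linkage clustering, which can be sketched as follows. The caption does not give the full mHD definition, so `shifted_hamming` below is an assumed stand-in rather than the authors' measure: it compares strings without gaps over a few small shifts and adds one unit per shifted position. The `max_shift` parameter and the toy sequences are hypothetical.

```python
def shifted_hamming(a, b, max_shift=2):
    """Assumed stand-in for mHD: best gap-free comparison over small shifts.

    For each offset, count mismatches in the overlapping region and add one
    per shifted position. Insertions and deletions are never modeled.
    """
    best = None
    for shift in range(-max_shift, max_shift + 1):
        # Slide one string against the other and compare only the overlap.
        if shift >= 0:
            pairs = zip(a[shift:], b)
        else:
            pairs = zip(a, b[-shift:])
        cost = sum(x != y for x, y in pairs) + abs(shift)
        best = cost if best is None else min(best, cost)
    return best

def single_linkage(seqs, cutoff=5, distance=shifted_hamming):
    """Group sequences so that each member has one edge <= cutoff to its cluster."""
    parent = list(range(len(seqs)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(seqs)):
        for j in range(i + 1, len(seqs)):
            if distance(seqs[i], seqs[j]) <= cutoff:
                parent[find(i)] = find(j)

    clusters = {}
    for i, s in enumerate(seqs):
        clusters.setdefault(find(i), []).append(s)
    return sorted(clusters.values(), key=len, reverse=True)

# Hypothetical toy sequences, not HI-sRNAs from the study.
s1 = "T" * 21
s2 = "AAA" + "T" * 18
s3 = "AAAAAA" + "T" * 15
s4 = "G" * 21
clusters = single_linkage([s1, s2, s3, s4], cutoff=5)
```

In this toy case s1 and s3 are farther apart than the cutoff, yet they share a cluster because each has an adequate edge to s2. This is the chaining behavior the caption describes.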

Figure 5—figure supplement 2
Testing distance cutoff parameters for superfamily formation.

(A) Experimental pipeline for testing cutoff. sRNA libraries are shuffled using UShuffle maintaining dinucleotide composition. (B) Number of superfamilies formed for real HI-sRNAs and shuffled libraries by maximum distance allowed for cluster formation. Smaller count of superfamilies means that more HI-sRNAs are successfully clustering with each other. (C) The same analysis as in B, except demonstrating the cumulative density of superfamilies by the number of sRNAs grouped in them. Larger cutoffs yield larger superfamilies, with shuffled libraries remaining unable to form clusters larger than one or two.

Figure 6
Superfamilies compensate for variation in N. benthamiana target homologs.

(A) Accumulation of N. benthamiana target mRNAs. Interface (IN, red) and control stem (CS, black) are shown relative to average CS expression. Points represent biological replicates (N = 5 to 6). P values comparing IN to CS are displayed above the x axis; Wilcoxon rank-sum tests, unpaired, one-tailed. Accumulation was normalized to NbTIP41-L (Niben101Scf03385g06003) and NbPP2A (Niben101Scf09716g01002). (B) sRNA-target alignments of SupFam_27 sRNAs with TIR1 family members from N. benthamiana and A. thaliana. Complementarity scores (Allen et al., 2005) are shown in the heatplot. The strongest predicted interactions are shown on the right; highlighted nucleotides are synonymous variants relative to AtTIR1.
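The statistics in panel A can be sketched in plain Python: values are scaled to the mean of the control stem group, then compared with an unpaired, one-tailed rank-sum test. This is an illustrative sketch, not the authors' analysis. The function names and the numbers are hypothetical, the direction of the alternative (interface lower than control) is an assumption the caption does not state, and the exact p-value is obtained by enumeration, which is only practical for small groups such as N = 5 to 6.

```python
from itertools import combinations
from statistics import mean

def relative_to_control(values, control):
    """Scale values by the mean of the control group."""
    ref = mean(control)
    return [v / ref for v in values]

def midranks(values):
    """Rank values from 1, giving tied values their average rank."""
    order = sorted(values)
    rank_of = {}
    i = 0
    while i < len(order):
        j = i
        while j < len(order) and order[j] == order[i]:
            j += 1
        rank_of[order[i]] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        i = j
    return [rank_of[v] for v in values]

def rank_sum_p_less(x, y):
    """Exact one-tailed p-value for the alternative 'x tends to be lower than y'."""
    ranks = midranks(list(x) + list(y))
    observed = sum(ranks[:len(x)])
    # Enumerate every way of assigning len(x) of the pooled ranks to group x.
    sums = [sum(c) for c in combinations(ranks, len(x))]
    return sum(s <= observed for s in sums) / len(sums)

# Hypothetical expression values, not data from the study.
cs = [1.1, 0.9, 1.0, 0.95, 1.05, 1.2]   # control stem, N = 6
inter = [0.4, 0.5, 0.3, 0.6, 0.45]      # interface, N = 5
cs_rel = relative_to_control(cs, cs)
in_rel = relative_to_control(inter, cs)
p = rank_sum_p_less(in_rel, cs_rel)
```

Because every interface value here is below every control value, the observed rank sum is the smallest possible, and the p-value is one over the number of ways to choose 5 ranks from 11.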

Tables

Key resources table
| Reagent type (species) or resource | Designation | Source or reference | Identifiers | Additional information |
| --- | --- | --- | --- | --- |
| Genetic reagent (A. thaliana) | xrn4 | Rymarquis et al., 2011 | xrn4-5; CS68822; SAIL_681_E01 | T-DNA insertion mutation in Col-0 background |
| Commercial assay or kit | Nextera DNAflex kit | Illumina | Product: 20018704 | |
| Commercial assay or kit | NEB primers set 1 | New England Biolabs | Product: E7335S | |
| Commercial assay or kit | NEB primers set 2 | New England Biolabs | Product: E7500S | |
| Commercial assay or kit | NEB primers set 3 | New England Biolabs | Product: E7710S | |
| Commercial assay or kit | NEB primers set 4 | New England Biolabs | Product: E7730S | |
| Software, algorithm | ShortStack | (Johnson et al., 2016) | v3.8.5 | https://github.com/MikeAxtell/ShortStack |
| Software, algorithm | DESeq2 | (Love et al., 2014) | v1.24.0 | https://bioconductor.org/packages/release/bioc/html/DESeq2.html |
| Biological sample (C. campestris) | ccm | Shahid et al., 2018; Jim Westwood, Virginia Tech | ‘doddi’ | |
| Biological sample (C. pentagona) | cpe-2017 | Ebay, seller: eden_wilds | 2017 collection | |
| Biological sample (C. pentagona) | cpe-2015 | Ebay, seller: eden_wilds | 2015 collection | |
| Biological sample (C. gronovii) | cgr-dp | Claude dePamphilis, Penn State | | Provenance unknown |
| Biological sample (C. gronovii) | cgr-mass | Jim Westwood, Virginia Tech | massachusetts isolate | Origin: A Massachusetts cranberry bog |
| Biological sample (C. gronovii) | cgr-pm | Wild collection | purdue mountain isolate | Origin: Roadside near State College, PA (Coordinates: 40.866 N, 77.888 W) |
| Biological sample (C. indecora) | cin | www.ars-grin.gov | PI 675068 | Origin: Texas |

Additional files

Supplementary file 1

Unabridged phylogeny of Cuscuta.

Phylogeny based on TrnL-F sequencing using vouchered samples and primers (Stefanovic et al., 2007; Costea et al., 2015). Isolates used in this study are in bold and indicated with arrows. Samples identified as members of species examined in this study are highlighted with color: red - C. campestris, purple - C. pentagona, green - C. gronovii, pink - C. indecora. Format: PDF

https://cdn.elifesciences.org/articles/49750/elife-49750-supp1-v1.pdf
Supplementary file 2

List of all libraries and tissues prepared or used in this study.

All libraries are available under the SRA BioProject: PRJNA543296. Format: xlsx

https://cdn.elifesciences.org/articles/49750/elife-49750-supp2-v1.xlsx
Supplementary file 3

Testing alternative p-value cutoffs for HI-sRNA detection.

Format: xlsx

https://cdn.elifesciences.org/articles/49750/elife-49750-supp3-v1.xlsx
Supplementary file 4

Comprehensive list of haustorium-induced small RNAs (HI-sRNAs) discovered in this study.

Format: xlsx

https://cdn.elifesciences.org/articles/49750/elife-49750-supp4-v1.xlsx
Supplementary file 5

Predicted secondary structures of miRNA hairpins producing HI-sRNAs in C. campestris.

Predicted RNA secondary structures and expression profiles of loci that produce HI-sRNAs and have an apparent miRNA hairpin. Format: PDF

https://cdn.elifesciences.org/articles/49750/elife-49750-supp5-v1.pdf
Supplementary file 6

Target confirmation data for every confirmed HI-sRNA-target interaction.

Details of confirmed HI-sRNA targets including HI-sRNA-target complementarity, site, score, superfamily and the status of C. campestris superfamily members as a confirmed miRNA. Targeting confirmation for target mRNA is shown in upper right, with confirmed interactions in species highlighted in red. sRNA distribution at target locus is shown for experimental interface and control, demonstrating secondary siRNA phasing and size distribution for up-regulated loci. Degradome sequencing is shown where confirmed hits were discovered in NanoPARE data. Format: PDF

https://cdn.elifesciences.org/articles/49750/elife-49750-supp6-v1.pdf
Supplementary file 7

Target interactions with significant correlation of variation in superfamily and target site.

Multiple sequence alignments of HI-sRNA superfamilies which have significant correlations between sRNA positional variation and target site variation. Alignment of eudicot homologs around target site also shown, with nucleotide and amino acid Shannon entropy shown as bits. Vertical red lines indicate the frame. Dots indicate the number of possible synonymous nucleotides at a position for the confirmed target’s sequence. Nucleotide positions are in reference to the position in the multiple sequence alignment. Format: PDF

https://cdn.elifesciences.org/articles/49750/elife-49750-supp7-v1.pdf
Supplementary file 8

Eudicot genomic resources used in this study.

All available in Phytozome version v12.1.6. Format: xlsx

https://cdn.elifesciences.org/articles/49750/elife-49750-supp8-v1.xlsx
Supplementary file 9

Target confirmation data for every confirmed HI-sRNA-target interaction in N. benthamiana.

Details of confirmed C. campestris HI-sRNA targets in N. benthamiana, including HI-sRNA-target complementarity, site, score, superfamily and the status of C. campestris superfamily members as a confirmed miRNA. sRNA distribution at target locus is shown for experimental interface and control, demonstrating secondary siRNA phasing and size distribution for up-regulated loci. Format: PDF

https://cdn.elifesciences.org/articles/49750/elife-49750-supp9-v1.pdf
Supplementary file 10

N. benthamiana targets of HI-sRNAs.

Based on N. benthamiana genome v1.0.1. Format: xlsx

https://cdn.elifesciences.org/articles/49750/elife-49750-supp10-v1.xlsx
Supplementary file 11

Target interactions of A. thaliana homologs with conserved target motifs.

Multiple sequence alignments of sRNA superfamilies and conserved target motifs found in the Arabidopsis transcriptome, with nucleotide and amino acid Shannon entropy shown as bits. Vertical red lines indicate the frame. Dots indicate the number of possible synonymous nucleotides at a position for the confirmed target’s sequence. Nucleotide positions are in reference to the position in the multiple sequence alignment. Color of gene names indicates whether there is evidence for targeting in NanoPARE data (black - 0 replicates; orange - 1 or 2 replicates; red - 3 replicates, confirmed interaction). Format: PDF

https://cdn.elifesciences.org/articles/49750/elife-49750-supp11-v1.pdf
Supplementary file 12

List of primers used in this study.

Format: xlsx

https://cdn.elifesciences.org/articles/49750/elife-49750-supp12-v1.xlsx
Supplementary file 13

Alignment of TrnL-F sequences from Cuscuta.

These were the basis for the phylogenetic tree presented in Supplementary file 1. Format: FASTA (plain text).

https://cdn.elifesciences.org/articles/49750/elife-49750-supp13-v1.fasta
Transparent reporting form
https://cdn.elifesciences.org/articles/49750/elife-49750-transrepform-v1.docx

Johnson NR, dePamphilis CW, Axtell MJ (2019) Compensatory sequence variation between trans-species small RNAs and their target sites. eLife 8:e49750. https://doi.org/10.7554/eLife.49750