Figures and data in Compensatory sequence variation between trans-species small RNAs and their target sites

6 figures, 1 table and 14 additional files

Figures

Figure 1 with 2 supplements

Haustorium-induced small RNAs (HI-sRNAs) are present in multiple *Cuscuta* species.

(A) Phylogeny of select *Cuscuta* species. Size distribution of HI-sRNAs for each sequenced isolate and acronyms are shown. (B) Sampling and sequencing schematic to discern HI-sRNAs. (C) HI-sRNA family counts and membership for each isolate, showing only the top 15 groups. Families were grouped strictly using a maximum edit distance of one nucleotide. Yellow indicates families present in a single isolate.
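
The one-nucleotide edit-distance criterion in panel C can be illustrated with a standard Levenshtein distance. This is a minimal sketch, not the authors' code: the function names `edit_distance` and `same_family` are hypothetical, and the caption does not state how pairwise comparisons were assembled into families, so only the pairwise test is shown.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a two-row dynamic programme."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion from a
                            curr[j - 1] + 1,      # insertion into a
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def same_family(a: str, b: str, max_dist: int = 1) -> bool:
    """True if two sRNA sequences differ by at most max_dist edits.

    max_dist=1 mirrors the cutoff named in the caption; everything else
    here is an illustrative assumption.
    """
    return edit_distance(a, b) <= max_dist
```

With this definition, a single substitution, insertion, or deletion keeps two sequences within the cutoff, while two or more edits do not.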

Figure 1—figure supplement 1

Host preference in *Cuscuta* species in the United States.

(A) Pipeline for processing herbarium data from the Mid-Atlantic Herbaria Consortium (MAHC; http://midatlanticherbaria.org) on interactions with each *Cuscuta* species of interest. (B) Ranked list of the most frequently identified host families for each species. The top 15 are shown for each species, with the top 10 overall identified by consistent colors (all others in black). (C) Geographical listings within the United States for each sample for which latitude and longitude or a searchable county are recorded.

Figure 1—figure supplement 2

Genome-free HI-sRNA discovery pipeline.

(A) Discovery of HI-sRNAs in *Cuscuta* isolates. The three major steps are condensing reads to representative sRNAs in a genome-free manner, filtering out reads that could have originated from *A. thaliana*, and performing differential expression analysis with DESeq2 to find reads up-regulated in the interface tissue (FDR < 0.1, null hypothesis: sRNA not differentially expressed). (B) Example of a *C. campestris* sRNA discovered by this method, with the top 25 constituent sRNA sequences ranked by expression. The most highly expressed read is taken as the representative sRNA sequence and is shown with a black box. Green boxes show variations from the representative sequence, with the total distance shown to the left. (C) Same as B but with a known miRNA, showing similar variation to the novel sRNA in B. (D) Comparison of the proportion of reads present in annotated miRNAs, using both genome-alignment (ShortStack) and genome-free approaches. Reads are ranked by size, with the canonical miRNA (blue) and the variants (grey) showing the proportion of reads they make up in the sRNA. Reads grouped in the locus by the genome-free method that are absent in the alignment approach are shown in green.

Figure 2 with 2 supplements

Host targets of *Cuscuta* HI-sRNAs.

(A) Modeled sRNA-target interaction for *A. thaliana CRCK2*. (B) Secondary siRNA accumulation from *CRCK2*. (C) Phasing analysis of secondary siRNAs from *CRCK2*. Expected phase for cut-site shown in red. (D) Size distribution of *CRCK2* secondary siRNAs. (E) Frequency of 5’ ends from the *CRCK2* mRNA, with the predicted HI-sRNA cut site shown in red. (F) Host mRNAs with confirmed targeting by a *Cuscuta* HI-sRNA. Full details in Figure 2—figure supplement 1 and Supplementary file 6.
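
The phasing analysis in panel C can be pictured as a tally of siRNA 5’ ends by register relative to the cut site. The following is a minimal sketch under stated assumptions, not the authors' procedure: it assumes a 21-nt phase length, considers positions on one strand only, takes `cut_site` to be the first position downstream of the predicted cleavage, and uses the hypothetical function name `phase_register_counts`.

```python
from collections import Counter


def phase_register_counts(five_prime_positions, cut_site, phase_length=21):
    """Count siRNA 5' ends falling in each phase register.

    Register 0 holds positions that are a whole number of phase_length
    steps away from cut_site, i.e. the register expected if secondary
    siRNA production starts at the cut site (an assumption of this sketch).
    """
    counts = Counter((pos - cut_site) % phase_length
                     for pos in five_prime_positions)
    return [counts.get(register, 0) for register in range(phase_length)]
```

If most 5’ ends fall in register 0, the siRNA population is in phase with the predicted cut site; a flat distribution across registers would indicate no phasing.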

Figure 2—figure supplement 1

Summary of *Cuscuta* HI-sRNA and host gene target relationships.

(A) Complete list of target interactions between sRNAs and host genes. The confirmation status diagram indicates the species in which the interaction is confirmed. Target gene information includes the number of homologs found in 36 eudicot transcriptomes. sRNA counts in superfamilies and the presence of a confirmed miRNA in the family are shown (NoAl: *ccm* sRNA failed to align to *ccm* genome). Target interaction columns indicate the conservation at the translated target site in an alignment of found homologs (5’/3’ UTR: not considered for conservation analysis). The correlation coefficient and P-value for variation in positions in the target and sRNA superfamily are shown. (B) Breakdown of superfamilies with confirmed targeting by the presence of a confirmed miRNA, where possible. (C) Correlation of positional variation in target sites and their sRNAs, indicating the interactions with a significant correlation.

Figure 2—figure supplement 2

Most common GO terms for confirmed target genes.

(A) GO terms for molecular function with a nodescore ≥5.0. Colored bars indicate the species for which the interaction is confirmed. Locations where bars overlap indicate genes where both species have confirmed targeting. (B) Same as A, but for biological processes.

Figure 3

Analysis of mRNA accumulation in host-parasite interfaces.

Cumulative density plots of interface/control stem ratios for host mRNAs expressed in *Cuscuta*-host interfaces, assessed by RNA-seq. All mRNAs shown with black line. Colored lines and dots indicate mRNAs which are confirmed targets of HI-sRNAs in the indicated *Cuscuta* isolates.

Figure 4 with 1 supplement

Predicted *trans*-species and self-targeting in *C. campestris* homologs of target *A. thaliana* mRNAs.

Target prediction scores for confirmed *A. thaliana* mRNA targets (black) and best-blast-hit homologs in *C. campestris* (red). All sRNAs with predicted targeting are shown.

Figure 4—figure supplement 1

Experimental flowchart for confirming self-targeting of *C. campestris* mRNAs by HI-sRNAs.

(A) Pipeline for confirmation by the presence of secondary siRNAs. (B) Pipeline for confirmation by 5’ transcript sequencing (NanoPARE). (C) List of all mRNAs with strong evidence for self-targeting.

Figure 5 with 2 supplements

*Cuscuta* HI-sRNAs form superfamilies that co-vary with target sites across eudicots.

(A) sRNA superfamily count and membership for each *Cuscuta* isolate. Colors indicate general groupings of superfamilies. (B) An example HI-sRNA superfamily aligned to target sites from homologs in 36 eudicot genomes. Nucleotide and amino acid Shannon entropy from the alignments are shown as bits. Vertical red lines indicate the frame. Dots indicate the number of possible synonymous nucleotides at each codon. 17 additional examples are given in Supplementary file 7. (C) Average conservation of target sites from homologs. The confirmed target site is shown as a red point, with all other possible sites summarized by the 25–75% quartile range (black line) and median (black point).
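
The per-position Shannon entropy in panel B can be computed column by column from a multiple sequence alignment. This is a minimal sketch, assuming equal-length aligned sequences and treating every character (including any gap symbol) as an ordinary state; the function names are hypothetical and the authors' treatment of gaps is not stated in the caption.

```python
import math
from collections import Counter


def column_entropy(column) -> float:
    """Shannon entropy, in bits, of the symbols in one alignment column."""
    counts = Counter(column)
    total = sum(counts.values())
    # H = sum over symbols of p * log2(1 / p)
    return sum((n / total) * math.log2(total / n) for n in counts.values())


def alignment_entropy(aligned_seqs):
    """Entropy for every column of a list of equal-length aligned sequences."""
    return [column_entropy(col) for col in zip(*aligned_seqs)]
```

A fully conserved column scores 0 bits, and a nucleotide column with all four bases at equal frequency scores the maximum of 2 bits, so lower values mark more conserved positions.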

Figure 5—figure supplement 1

Clustering method for forming HI-sRNA superfamilies.

(A) Example demonstrating implementation of the ‘modified Hamming distance’ (mHD) when comparing strings. Levenshtein edit distance tolerates insertions and deletions, whereas the mHD does not allow these operations; strings containing insertional errors therefore incur a high penalty under mHD, while shift errors are penalized the same. (B) Example of clustering seven HI-sRNAs into three superfamilies using mHD. Species are indicated by color; clustering is independent of species. HI-sRNA nodes are connected by edges that are either close enough to form a cluster (solid line, red distance number) or inadequate (dashed line, black distance number). The cutoff for clustering is an mHD of five or less. Not every pair of nodes in a cluster must meet this threshold; a node needs only one adequate edge to join a cluster.
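
The joining rule in panel B (one adequate edge is enough to join a cluster) corresponds to single-linkage clustering, which can be sketched with a union-find structure. This is an illustration under stated assumptions, not the authors' implementation: `positional_mismatches` is a plain stand-in distance and is NOT the mHD, whose exact definition is not given in this caption; `single_linkage` is a hypothetical helper name; only the cutoff of five comes from the caption.

```python
from itertools import combinations


def positional_mismatches(a: str, b: str) -> int:
    """Stand-in distance (not the authors' mHD): mismatches at shared
    positions plus the difference in length, with no indels allowed."""
    return sum(x != y for x, y in zip(a, b)) + abs(len(a) - len(b))


def single_linkage(seqs, dist=positional_mismatches, cutoff=5):
    """Group sequences so that each member has at least one neighbour
    within `cutoff` in its cluster (single linkage via union-find)."""
    parent = list(range(len(seqs)))

    def find(i):
        # Follow parent links to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(seqs)), 2):
        if dist(seqs[i], seqs[j]) <= cutoff:
            parent[find(i)] = find(j)  # merge the two clusters

    clusters = {}
    for i, seq in enumerate(seqs):
        clusters.setdefault(find(i), []).append(seq)
    return sorted(clusters.values(), key=len, reverse=True)
```

Because a single adequate edge suffices, two sequences farther apart than the cutoff can still share a cluster when an intermediate sequence links them.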

Figure 5—figure supplement 2

Testing distance cutoff parameters for superfamily formation.

(A) Experimental pipeline for testing the cutoff. sRNA libraries are shuffled using UShuffle, maintaining dinucleotide composition. (B) Number of superfamilies formed for real HI-sRNAs and shuffled libraries, by the maximum distance allowed for cluster formation. A smaller count of superfamilies means that more HI-sRNAs are clustering with each other. (C) The same analysis as in B, but showing the cumulative density of superfamilies by the number of sRNAs grouped in them. Larger cutoffs yield larger superfamilies, whereas shuffled libraries remain unable to form clusters larger than one or two.

Figure 6

Superfamilies compensate for variation in *N. benthamiana* target homologs.

(A) Accumulation of *N. benthamiana* target mRNAs. Interface (IN, red) and control stem (CS, black) are shown relative to average CS expression. Points represent biological replicates (N = 5 to 6). P values comparing IN to CS are displayed above the x axis; Wilcoxon rank-sum tests, unpaired, one-tailed. Accumulation was normalized to *NbTIP41-L* (Niben101Scf03385g06003) and *NbPP2A* (Niben101Scf09716g01002). (B) sRNA-target alignments of SupFam_27 sRNAs with *TIR1* family members from *N. benthamiana* and *A. thaliana*. Complementarity scores (Allen et al., 2005) are shown in the heatplot. The strongest predicted interactions are shown on the right; highlighted nucleotides are synonymous variants relative to *AtTIR1*.

Tables

Key resources table

Reagent type (species) or resource	Designation	Source or reference	Identifiers	Additional information
Genetic reagent (A. thaliana)	xrn4	Rymarquis et al., 2011	xrn4-5; CS68822; SAIL_681_E01	T-DNA insertion mutation in Col-0 background
Commercial assay or kit	Nextera DNAflex kit	Illumina	Product: 20018704
Commercial assay or kit	NEB primers set 1	New England Biolabs	Product: E7335S
Commercial assay or kit	NEB primers set 2	New England Biolabs	Product: E7500S
Commercial assay or kit	NEB primers set 3	New England Biolabs	Product: E7710S
Commercial assay or kit	NEB primers set 4	New England Biolabs	Product: E7730S
Software, algorithm	ShortStack	(Johnson et al., 2016)	v3.8.5	https://github.com/MikeAxtell/ShortStack
Software, algorithm	DESeq2	(Love et al., 2014)	v1.24.0	https://bioconductor.org/packages/release/bioc/html/DESeq2.html
Biological sample (C. campestris)	ccm	Shahid et al., 2018; Jim Westwood, Virginia Tech	‘doddi’
Biological sample (C. pentagona)	cpe-2017	eBay, seller: eden_wilds	2017 collection
Biological sample (C. pentagona)	cpe-2015	eBay, seller: eden_wilds	2015 collection
Biological sample (C. gronovii)	cgr-dp	Claude dePamphilis, Penn State		Provenance unknown
Biological sample (C. gronovii)	cgr-mass	Jim Westwood, Virginia Tech	massachusetts isolate	Origin: A Massachusetts cranberry bog
Biological sample (C. gronovii)	cgr-pm	Wild collection	purdue mountain isolate	Origin: Roadside near State College, PA (Coordinates: 40.866 N, 77.888 W)
Biological sample (C. indecora)	cin	www.ars-grin.gov	PI 675068	Origin: Texas

Additional files

Supplementary file 1 Unabridged phylogeny of Cuscuta. Phylogeny based on TrnL-F sequencing using vouchered samples and primers (Stefanovic et al., 2007; Costea et al., 2015). Isolates used in this study are in bold and indicated with arrows. Samples identified as members of species examined in this study are highlighted with color; red - C. campestris, purple - C. pentagona, green - C. gronovii, pink - C. indecora. Format: PDF: https://cdn.elifesciences.org/articles/49750/elife-49750-supp1-v1.pdf
Supplementary file 2 List of all libraries and tissues prepared or used in this study. All libraries are available under the SRA BioProject: PRJNA543296. Format: xlsx: https://cdn.elifesciences.org/articles/49750/elife-49750-supp2-v1.xlsx
Supplementary file 3 Testing alternative p-value cutoffs for HI-sRNA detection. Format: xlsx: https://cdn.elifesciences.org/articles/49750/elife-49750-supp3-v1.xlsx
Supplementary file 4 Comprehensive list of haustorium-induced small RNAs (HI-sRNAs) discovered in this study. Format: xlsx: https://cdn.elifesciences.org/articles/49750/elife-49750-supp4-v1.xlsx
Supplementary file 5 Predicted secondary structures of miRNA hairpins producing HI-sRNAs in C. campestris. Predicted RNA secondary structures and expression profiles of loci that produce HI-sRNAs and have an apparent miRNA hairpin. Format: PDF: https://cdn.elifesciences.org/articles/49750/elife-49750-supp5-v1.pdf
Supplementary file 6 Target confirmation data for every confirmed HI-sRNA-target interaction. Details of confirmed HI-sRNA targets including HI-sRNA-target complementarity, site, score, superfamily and the status of C. campestris superfamily members as a confirmed miRNA. Targeting confirmation for target mRNA is shown in upper right, with confirmed interactions in species highlighted in red. sRNA distribution at target locus is shown for experimental interface and control, demonstrating secondary siRNA phasing and size distribution for up-regulated loci. Degradome sequencing is shown where confirmed hits were discovered in NanoPARE data. Format: PDF: https://cdn.elifesciences.org/articles/49750/elife-49750-supp6-v1.pdf
Supplementary file 7 Target interactions with significant correlation of variation in superfamily and target site. Multiple sequence alignments of HI-sRNA superfamilies which have significant correlations between sRNA positional variation and target site variation. Alignment of eudicot homologs around target site also shown, with nucleotide and amino acid Shannon entropy shown as bits. Vertical red lines indicate the frame. Dots indicate the number of possible synonymous nucleotides at a position for the confirmed target’s sequence. Nucleotide positions are in reference to the position in the multiple sequence alignment. Format: PDF: https://cdn.elifesciences.org/articles/49750/elife-49750-supp7-v1.pdf
Supplementary file 8 Eudicot genomic resources used in this study. All available in Phytozome version v12.1.6. Format: xlsx: https://cdn.elifesciences.org/articles/49750/elife-49750-supp8-v1.xlsx
Supplementary file 9 Target confirmation data for every confirmed HI-sRNA-target interaction in N. benthamiana. Details of confirmed C. campestris HI-sRNA targets in N. benthamiana, including HI-sRNA-target complementarity, site, score, superfamily and the status of C. campestris superfamily members as a confirmed miRNA. sRNA distribution at target locus is shown for experimental interface and control, demonstrating secondary siRNA phasing and size distribution for up-regulated loci. Format: PDF: https://cdn.elifesciences.org/articles/49750/elife-49750-supp9-v1.pdf
Supplementary file 10 N. benthamiana targets of HI-sRNAs. Based on N. benthamiana genome v1.0.1. Format: xlsx: https://cdn.elifesciences.org/articles/49750/elife-49750-supp10-v1.xlsx
Supplementary file 11 Target interactions of A. thaliana homologs with conserved target motifs. Multiple sequence alignments of sRNA superfamilies and conserved target motifs found in the Arabidopsis transcriptome, with nucleotide and amino acid Shannon entropy shown as bits. Vertical red lines indicate the frame. Dots indicate the number of possible synonymous nucleotides at a position for the confirmed target’s sequence. Nucleotide positions are in reference to the position in the multiple sequence alignment. Color of gene names indicates whether there is evidence for targeting in NanoPARE data (black - zero replicates; orange - one or two replicates; red - three replicates, confirmed interaction). Format: PDF: https://cdn.elifesciences.org/articles/49750/elife-49750-supp11-v1.pdf
Supplementary file 12 List of primers used in this study. Format: xlsx: https://cdn.elifesciences.org/articles/49750/elife-49750-supp12-v1.xlsx
Supplementary file 13 Alignment of TrnL-F sequences from Cuscuta. These were the basis for the phylogenetic tree presented in Supplementary file 1. Format: FASTA (plain text): https://cdn.elifesciences.org/articles/49750/elife-49750-supp13-v1.fasta
Transparent reporting form: https://cdn.elifesciences.org/articles/49750/elife-49750-transrepform-v1.docx