1. Genetics and Genomics
  2. Plant Biology
Download icon

Genetic basis of sRNA quantitative variation analyzed using an experimental population derived from an elite rice hybrid

  1. Jia Wang
  2. Wen Yao
  3. Dan Zhu
  4. Weibo Xie
  5. Qifa Zhang  Is a corresponding author
  1. Huazhong Agricultural University, China
Research Article
  • Cited 3
  • Views 1,359
  • Annotations
Cite this article as: eLife 2015;4:e03913 doi: 10.7554/eLife.03913

Abstract

We performed a genetic analysis of sRNA abundance in flag leaf from an immortalized F2 (IMF2) population in rice. We identified 53,613,739 unique sRNAs and 165,797 sRNA expression traits (s-traits). A total of 66,649 s-traits mapped 40,049 local-sQTLs and 30,809 distant-sQTLs. By defining 80,362 sRNA clusters, 22,263 sRNA cluster QTLs (scQTLs) were recovered for 20,249 of all the 50,139 sRNA cluster expression traits (sc-traits). The expression levels for most of s-traits from the same genes or the same sRNA clusters were slightly positively correlated. While genetic co-regulation between sRNAs from the same mother genes and between sRNAs and their mother genes was observed for a portion of the sRNAs, most of the sRNAs and their mother genes showed little co-regulation. Some sRNA biogenesis genes were located in distant-sQTL hotspots and showed correspondence with specific length classes of sRNAs suggesting their important roles in the regulation and biogenesis of the sRNAs.

https://doi.org/10.7554/eLife.03913.001

eLife digest

Genes within the DNA of a plant or animal contain instructions to make molecules called RNAs. Some RNA molecules can be decoded to make proteins, whereas others have different roles. A single gene often contains the instructions to make both protein-coding RNAs and non-coding RNAs.

Molecules called small RNAs (or sRNAs) do not code for proteins. Instead, sRNAs can control protein-coding RNA molecules or chemically alter the DNA itself; this allows them to perform many different roles in living organisms. In plants, for example, these molecules affect how the plant grows, the shapes and structures it forms, and how likely it is to survive challenges such as drought and diseases. Often different plants of the same species have different amounts of sRNAs, but the reasons for this remain unclear.

Now, Wang, Yao et al. have made use of a technique called ‘expression quantitative locus’ analysis to look at how sRNAs in rice plants are controlled by additional information encoded within DNA. The analysis identified over 53 million sRNA molecules from a population of rice plants. Many of these sRNAs varied in their abundance between different plants within the population. Wang, Yao et al. also found many thousands of individual instructions within the DNA of the rice that can either increase or reduce the abundance of their associated sRNA.

Some of the abundant sRNAs were influenced by instructions within their own genes; some were influenced by instructions from other genes; and some were influenced by both. Wang, Yao et al. also found that the control of protein-coding RNAs was not necessarily related to the control of sRNAs encoded by the same gene. Further work is now needed to identify which specific DNA sequences regulate the abundance of sRNA molecules in plants and other organisms.

https://doi.org/10.7554/eLife.03913.002

Introduction

Small RNAs (sRNAs) are non-coding RNAs mainly 18–30 nt in length that regulate a wide range of biological processes in eukaryotic organisms (Carthew and Sontheimer, 2009; Axtell, 2013). According to their origin, sRNAs can be grouped into two major types: hpRNAs that are derived from single-stranded precursors with a hairpin structure (such as microRNAs [miRNAs]) and short interfering RNA (siRNAs) that are derived from double-stranded RNA precursors such as heterochromatic small interfering RNAs (hc-siRNA) and trans-acting siRNAs (ta-siRNA). There has been an explosion of interest in recent years in studies of miRNAs and siRNAs on their identification, biogenesis, and functioning in diverse biological processes. In plants, sRNAs function in regulating growth, development (Juarez et al., 2004; Zhu and Helliwell, 2011), architecture (Jiao et al., 2010; Miura et al., 2010), yield (Zhang et al., 2013), and response to biotic and abiotic stresses (Lu et al., 2008; Shukla et al., 2008). Such regulations are usually achieved by mediating endogenous mRNA cleavage and decay, DNA methylation of source and target loci, and chromatin modification and transcriptional silencing (Arikit et al., 2013).

Although the sRNAs differ in length, sequences, and functions, the pathways for their biogenesis and functioning from precursor transcription, processing, maturation, and action are relatively conserved, which involve the activities of a number of enzymes including RNA polymerase II (Pol II), RNA-dependent RNA polymerases (RDRs), Dicer-like proteins (DCLs), and Argonautes (AGOs) (Chen, 2009; Ghildiyal and Zamore, 2009). It is also known that the abundance of sRNA species can be highly variable among individuals within a species, and the possible regulatory role of such quantitative difference has been assumed (He et al., 2010; Groszmann et al., 2011b). It is not known whether the quantitative variation of sRNA species between genotypes is related to the biological machinery, as quantitative variation of sRNAs has not been assayed at the population level and their genetic control has yet to be elucidated.

The recently developed expression quantitative trait locus (eQTL) analysis has provided an approach for determining the genetic control of the expression level of a gene, including cis- and trans-eQTLs, as well as epistatic effects (Becker et al., 2012). This approach can also be applied to the genetic analysis of quantitative variation of sRNAs by regarding the abundance of the sRNAs in the population as quantitative traits. Once the QTLs are identified, subsequent studies can be pursued very much the same way as the analysis of genes and regulatory networks underpinning phenotypic QTLs (Xing and Zhang, 2010).

There have been several studies focusing on the genetic regulation of known and validated miRNAs and small nucleolar RNAs in specific tissues/cells from samples of the human population (Borel et al., 2011; Gamazon et al., 2012; Parts et al., 2012; Civelek et al., 2013; Jin and Lee, 2013; Siddle et al., 2014). These studies detected a number of cis- and trans-miQTLs that could also influence the expression of the mRNA targets, which may be associated with phenotype difference.

Here, we performed a whole genome QTL analysis of the entire sRNA kingdom consisted of 18-nt to 26-nt sRNAs from flag leaf of rice using an experimental genetic population. The analysis revealed features of the genetic controls of sRNA abundance showing both commonality and distinction with their precursor transcripts. It was also shown that the abundance of sRNAs is probably related to proteins constituting the machinery for sRNA biogenesis and functioning.

Results

Patterns and distributions of sRNAs

The genetic materials consisted of 98 hybrids obtained by paired crosses of 196 recombinant inbred lines (RILs) derived by single seed descent from a cross between Zhenshan 97 and Minghui 63, the parents of Shanyou 63 that was the most widely cultivated rice hybrid in the 1980s and 1990s and still used with reduced area in recent years in China. Because the genetic composition of these crosses resembled individuals in an F2 population, they were referred to as an immortalized F2 (IMF2) population (Hua et al., 2002, 2003), and each hybrid in the population was hereafter referred to as an IMF2 for ease of description. An sRNA library was constructed using RNA extracted from flag leaf at the day of full expansion for each IMF2, and two biological replicates were obtained for each of the two parental lines and their F1 hybrid, producing a total of 104 libraries.

The read size of the raw sequencing data obtained using Illumina Hiseq2000 varied from 3 nt to 44 nt (Figure 1—figure supplement 1). Sequences of 18–26 nt in length that appeared to be the more abundant than others were kept for the analyses after filtering out low-quality reads and eliminating ones matching tRNAs, rRNAs, snRNAs, and snoRNAs (Figure 1—figure supplement 2). The numbers of resulting reads varied from 14.52 million to 27.73 million per library (2.08 billion in total) with non-redundant reads ranging from 3.00 million to 6.86 million per library (0.52 billion in total) (Supplementary file 1 in Dryad [Wang et al., 2015], Table 1, Figure 1—figure supplement 2). The 24 nt sRNAs were the most numerous in both redundant and distinct reads (Figure 1—figure supplement 2).

Table 1

Average number of reads obtained for the 104 libraries from flag leaves of the IMF2 population and the parental lines

https://doi.org/10.7554/eLife.03913.003
Quality filtering*Size filteringNcRNA filteringGenome mapping§Total#
Redundant reads22,864,06722,318,92819,968,5669,816,9331,020,961,021
Distinct reads5,106,6545,035,0734,986,1842,499,23353,613,739
  1. *

    Reads after filtering out low-quality reads.

  2. Reads of 18–26 nt in length.

  3. Reads after eliminating ones matching tRNAs, rRNAs, snRNAs, and snoRNAs.

  4. §

    Reads mapped to the SNP-replaced reference genomes of the parents with unique locations allowing no mismatch.

  5. #

    The total reads of 104 libraries used to identify s-traits.

We mapped the reads to the SNP-replaced reference genomes of the parents (‘Materials and methods’), with unique location allowing no mismatch. The reads could be divided into four categories: (1) approximately 82.28% of the sRNAs had identical sequences between parents, (2) 0.71% of the sRNAs had SNPs between the two parents, (3) 8.35% were only mapped to the Zhenshan 97 genome, and (4) 8.66% were specifically mapped to the Minghui 63 genome (Supplementary file 2 in Dryad [Wang et al., 2015]). A total of 53,613,739 unique sRNA sequences including ones with SNPs between the parents were identified by combining all the reads from 104 libraries (Table 1). Approximately 84.7% of sRNAs were found in no more than 5 IMF2s, while only 0.13% were present in all 98 IMF2s (Figure 1A).

Figure 1 with 5 supplements see all
The distribution of sRNAs and s-traits in different genomic regions across the IMF2 population.

(A) The distribution of sRNAs across the IMF2 population. (B) The distribution of sRNAs aligned to the genic region, 2-kb upstream, and 500-bp downstream of annotated genes, as well as intergenic regions. TE: transposons; NTE: non-transposon genes. (C) The percentages of sRNAs of different sizes. (D) The distribution of sRNAs of different sizes in different genomic regions. (E) The distribution of sRNAs of different sizes in different portions of genic regions. (F) The percentages of s-traits of different sizes. (G) The distribution of s-traits aligned to the genic region, 2-kb upstream, and 500-bp downstream of annotated genes, as well as intergenic regions. The legend is the same as in (B). (H) The distribution of s-traits of different size in different genomic regions. The legend is the same as in (D). (I) The distribution of s-traits of different sizes in different portions of genic regions. The legend is the same as in (E).

https://doi.org/10.7554/eLife.03913.004

The distributions of the sRNAs in different portions of the genome were not random. Approximately 51.1% of sRNAs originated from the 2-kb upstream and the genic regions of non-transposon genes (Figure 1B). In addition, sRNAs occurred in high frequencies near the transcription start sites compared to other regions of the promoters (Figure 1—figure supplement 3). About 34.4% of sRNAs originated from intergenic regions, which account for no more than 20% of the whole genome length (Figure 1B). About 43.9% of sRNAs were 24 nt, while 12.1% were 21 nt (Figure 1C).

Different species of sRNAs differed in their origins of genomic regions. The genic regions of non-transposon genes held the highest number of 21 nt sRNAs, while 24 nt sRNAs were most common in the intergenic regions (Figure 1D). sRNAs of 21 nt in genic regions were mostly derived from the exon of non-transposon genes, while 24 nt sRNAs in genic regions were mainly from the intron of non-transposon genes (Figure 1E). The 24-nt sRNAs mainly consist of endogenous heterochromatic siRNAs (Axtell, 2013) and tend to be produced from repeats and transposable elements as well as intergenic regions (Chen, 2009). The 24-nt sRNAs are components of the epigenome that target the homologous genomic regions for de novo DNA methylation through RNA-directed DNA methylation to maintain genome stability by transcriptional gene silencing (Groszmann et al., 2011a, 2013; Pooggin, 2013).

Next, the abundance of each sRNA in a library was normalized to number of reads per millions (RPM) (‘Materials and methods’) to quantify the sRNA levels in each library, which were subject to analyses and comparisons. An sRNA with RPM value ≥0.6 was regarded as expressed, and an sRNA identified as expressed in more than 25 of the 98 IMF2s was regarded as an sRNA expression trait (s-trait). In this way, a total of 165,797 s-traits were recovered (Supplementary file 3 in Dryad [Wang et al., 2015]). A total of 87.9% of s-traits were 24 nt (Figure 1F). The s-traits mainly originated from the intergenic regions, 2-kb upstream, and genic regions of non-transposon genes (Figure 1G). Although the 2-kb upstream and the genic regions of non-transposon genes contained almost equal number of sRNAs (Figure 1B), the number of s-traits derived from the 2-kb upstream of non-transposon genes was more than twice of the number of s-traits in genic regions of non-transposon genes (Figure 1G). The distribution of different species s-traits in different genomic regions was similar to that of sRNAs (Figure 1D,H). Although sRNAs of various lengths from genic regions were mostly from exons of non-transposon genes (Figure 1E), the s-traits of different sizes were mainly derived from introns of non-transposon genes (Figure 1I). The coefficients of variation of the 165,797 s-traits (Figure 1—figure supplement 4) showed a very wide range of distribution of the sRNA levels in the population. We selected 9 s-traits randomly from all the 165,797 s-traits, and the distributions of the expression values for most of these s-traits across the IMF2 population were more or less normal (Figure 1—figure supplement 5).

Expression correlations between sRNAs that originated from the same genes

To test whether s-traits originated from the same gene were transcribed together, the correlation coefficients between the expression values of s-traits originated from the same gene were calculated. A total of 3,739,873 correlation coefficients involving 15,078 genes and 96,914 s-traits were obtained, 2,593,503 (69.3%) and 507,528 (13.6%) of which were contributed by 2278 s-traits from genes LOC_Os03g01360 and 1008 s-traits from LOC_Os07g01240, respectively (Figure 2). The correlation coefficients naturally fell into three classes: strong negative correlations, no correlation, and strong positive correlations with obvious dividing points at −0.3 and 0.3 (Figure 2A), which were far above 0.25, the threshold for statistical significance at p < 0.05 determined by simulation data. While 85.8% of these strong correlations (positive or negative) were contributed by s-traits originated from LOC_Os03g01360 (Figure 2B), most of s-traits originated from gene LOC_Os07g01240 showed low (or no) correlations with each other (Figure 2C). Although the expression values of s-traits originated from the same regions of the remaining 15,076 genes in the whole genome were mostly slightly positively correlated, the correlation coefficients were significantly distinct from that of the simulation of random data (Mann–Whitney U test, p-value <2.2e-16, Figure 2D), indicating various degrees of correlations. We found that the correlations were affected by the sizes of s-traits (Figure 2—figure supplement 1). Expression correlations between s-traits of the same size were stronger than that between s-traits of different sizes. In particular, the expression correlations between 21-nt sRNAs and 25-nt sRNAs from the same genes were very weak. The correlations between overlapped s-traits were slightly stronger than that between s-traits not overlapping with each other (Figure 2—figure supplement 2A,B). The correlations between s-traits from the same transposons were slightly different from that of s-traits from the same non-transposons genes (Figure 2—figure supplement 2C,D). Most of the s-traits originating from LOC_Os03g01360 were significantly correlated (positive or negative) with each other (Figure 2—figure supplement 3), and the correlations between different s-traits were mainly dependent on their genomic positions rather than their sizes, which were mostly 21, 22, and 24 nt, accounting for 40.2%, 34.4%, and 19.4% of the 2278 s-traits. This gene could be separated as three parts, the 2 kb-upstream, 5′ UTR, the first intron and the first coding DNA sequence (CDS) as the first part, the most of the second intron comprising the second part, and the second CDS and part of the second intron as the third part (Figure 2—figure supplement 3). s-traits within each part were positively correlated with each other, while s-traits from the first part were mostly negatively correlated with those from the other two parts. The correlations between the expression levels of s-traits from the second parts and s-traits from the third parts were mostly positive. Thus, although very strong positive and negative correlations were found between sRNAs from the same genic regions of specific genes, this was not a common phenomenon. Taken together, this correlation analysis indicated that most of the sRNAs derived from the same genes were correlated with each other to various extents, implying that nearby sRNAs were possibly jointly transcribed. Consecutively overlapped sRNAs were found from introns of specific genes and intergenic regions, indicating that sRNAs were not necessarily jointly transcribed from the genes.

Figure 2 with 5 supplements see all
Expression correlations between s-traits originating from the same genes.

(A) The distribution of expression correlations between s-traits originating from the same genes. (B) The distribution of expression correlations between s-traits originating from LOC_Os03g01360. (C) The distribution of expression correlations between s-traits originating from LOC_Os07g01240. (D) Comparison of the distribution of expression correlations for the simulation data and the true data excluding s-traits from LOC_Os03g01360 and LOC_Os07g01240.

https://doi.org/10.7554/eLife.03913.010

Expression correlations between sRNAs that originated from the same sRNA cluster

As many sRNAs may be derived from a single primary transcript, we defined sRNA clusters to quantify sRNA expression (Castel and Martienssen, 2013), based on our data of all 104 libraries. An sRNA island was defined as a genomic region composed of consecutive genomic positions matched by at least 30× sRNA read coverage. Nearby sRNA islands resided within 1000 nt from each other were merged and regarded as the same sRNA cluster. As a result, 80,362 sRNA clusters were recovered with an average size of 1927 bp (ranging from 51 bp to 47,021 bp) (Figure 2—figure supplement 4, Supplementary file 4 in Dryad [Wang et al., 2015]). A total of 46,293 (57.6%) sRNA clusters overlapped with genic regions, 23,143 (28.8%) of which were entirely derived from genic regions. In addition, the distances between another 19,164 (23.8%) sRNA clusters and the genic regions were smaller than 1 kb.

The abundance of each sRNA cluster was normalized using DEseq (‘Materials and methods’) (Anders and Huber, 2010). An sRNA cluster with normalized expression value ≥6 was regarded as expressed, and an sRNA cluster expressed in more than 25 of all the 98 IMF2s was regarded as an sRNA cluster expression trait (sc-trait). A total of 50,139 (62.4%) sRNA clusters were determined as sc-traits, 34,762 of which were expressed in all of the 98 IMF2s.

We next calculated the correlation coefficients between the expression values of s-traits originating from the same sRNA clusters to infer whether they were transcribed together. A total of 4,282,587 correlation coefficients involving 21,571 sRNA clusters and 157,994 s-traits were obtained, 72.5% of which were contributed by s-traits from clusters chr03-261504-273325 and chr07-136331-141317 (Figure 2—figure supplement 5). These two sRNA clusters corresponded to the two genes, LOC_Os03g01360 and LOC_Os07g01240, mentioned above. Of all the correlations, 2,776,296 were relatively strong (above 0.3 or below −0.3 based on a threshold of 0.25 determined by simulation data with p <0.05). However, 81.5% of these strong correlations were contributed by the 2280 s-traits from cluster chr03-261504-273325 (Figure 2—figure supplement 5B). A total of 1008 s-traits originated from cluster chr07-136331-141317, most of which were only slightly correlated with each other (Figure 2—figure supplement 5C). The distribution of correlation coefficients of s-traits from the same sRNA clusters was quite similar to that of s-traits from the same genes (Figure 2—figure supplement 5D). Although the s-traits originating from the same region of the remaining 215,69 sRNA clusters in the whole genome were only slightly positively correlated, the correlation coefficients were significantly different from that of simulated random data (Mann–Whitney U test, p-value <2.2e-16, Figure 2—figure supplement 5E).

Thus, again very strong positive and negative expression correlations were only observed for sRNAs from certain sRNA clusters, while most sRNAs originating from the same sRNA clusters were only slightly positively correlated with each other.

Expression correlations between sRNAs and their mother genes

mRNA sequencing was conducted using the same samples as in the sRNA sequencing with one biological replicate for the IMF2 population and two biological replicates for each of the two parental lines and their F1 hybrid, producing a total of 104 libraries. After removal of low-quality sequencing data, reads were mapped to the Nipponbare reference genome using TopHat (Figure 3—figure supplement 1, Supplementary file 5 in Dryad [Wang et al., 2015]) (Trapnell et al., 2009). Most of the reads were mapped to the CDS and UTR regions of non-transposon genes (Figure 3—figure supplement 2).

The abundance of each mRNA in a library was normalized to the number of fragments per kilobase of transcript per million mapped fragments (FPKM) using Cufflinks (Trapnell et al., 2010) to quantify the expression levels of mRNAs in each library. An mRNA with FPKM value ≥1 was regarded as expressed; 35,819 (54.2%) mRNAs were expressed in none of the 98 IMF2s, while 16,849 (25.5%) were expressed in all the 98 IMF2s. An mRNA identified as expressed in more than 25 of the 98 IMF2s was regarded as an e-trait. In this way, a total of 24,987 e-traits were obtained from a total of 66,123 mRNAs in the whole genome.

For ease of description, we hereafter refer to genic sequences including both exons and introns that encoded the s-traits as ‘mother genes’. To assess whether sRNAs and their mother genes were transcribed together, the correlation coefficients of the expression values of s-traits and their mother genes were calculated. A total of 77,640 correlation coefficients were obtained, involving 9968 mRNAs and 56,185 s-traits. Among all these correlations, 13,749 (17.7%) were strong correlations (above 0.3 or below −0.3, based on a threshold of 0.23 determined by simulation data with p < 0.05). However, 45.6% of these strong correlations were again contributed by gene LOC_Os03g01360 (Figure 3A). Correlations involving LOC_Os07g01240 were mostly weak (Figure 3B). The correlation coefficients involving rest of the genes in the whole genome were centered around zero (Figure 3C).

Figure 3 with 2 supplements see all
Correlations between the s-traits and the transcripts of their mother genes.

(A) Correlations between s-traits and e-traits derived from LOC_Os03g01360. (B) Correlations between s-traits and e-traits derived from LOC_Os07g01240. (C) Correlations between s-traits and e-traits originating from the same genes excluding the above two loci. (D) Correlations between s-traits and different e-traits derived from LOC_Os03g01360.

https://doi.org/10.7554/eLife.03913.016

In-depth analysis of s-traits from gene LOC_Os03g01360 revealed that the expression values of s-traits from the first part of LOC_Os03g01360 were positively correlated with two transcripts, LOC_Os03g01360.1 and LOC_Os03g01360.4, and negatively correlated with the other LOC_Os03g01360.2 (Figure 3D). On the other hand, the expression values of most s-traits from the third part of LOC_Os03g01360 were negatively correlated with LOC_Os03g01360.1 and LOC_Os03g01360.4 but positively correlated with LOC_Os03g01360.2 (Figure 3D). Whereas both negative and positive expression correlations were found between s-traits from the second part of LOC_Os03g01360 and these three transcripts (Figure 3D).

Thus, although strong expression correlations, either positive or negative, were observed between specific sRNAs and their mother gene, the expression levels of most sRNAs were not correlated with their mother genes, suggesting independent transcription of sRNAs and mRNAs.

QTL analysis of s-traits, sc-traits, and e-traits using the IMF2 population

QTL analysis was performed for the 165,797 s-traits based on the ultrahigh-density SNP map, which contained 1556 recombination events and was composed of 1568 bins with an average size of 238 kb (ranging from 6 kb to 7947 kb) (Figure 4—figure supplement 1, Figure 4—figure supplement 2, Supplementary file 6 in Dryad [Wang et al., 2015], ‘Materials and methods’) (Xie et al., 2010; Yu et al., 2011), using composite interval mapping (CIM) in R/qtl with 1000 permutations (Haley and Knott, 1992; Broman and Speed, 2002; Manichaikul et al., 2009). With a false discovery rate (FDR) set at 5%, a total of 70,858 sQTLs were recovered for 66,649 s-traits (Supplementary file 7 in Dryad [Wang et al., 2015]). These sQTLs were classified as local-sQTLs and distant-sQTLs, which were also referred to as cis- and trans-QTLs, respectively, in previous studies (Wang et al., 2010, 2014), according to their locations relative to the s-traits. A local-sQTL indicated the existence of local functional polymorphism(s) that could influence the abundance of the s-trait, and a distant-sQTL meant the s-trait expression variation in the population was controlled by regulatory element(s) distant from the s-trait precursor sequence. For defining a local-sQTL, we adopted a 1.5 LOD-drop support interval of the corresponding sQTL or no more than 250 kb from the closest marker in a recombination sparse region. Otherwise, it was regarded as a distant-sQTL (Yvert et al., 2003; Morley et al., 2004; Keurentjes et al., 2007). Such classification resulted in 40,049 local-sQTLs (shown in the diagonal of Figure 4) and 30,809 distant-sQTLs (off-diagonal of Figure 4) (Supplementary file 7 in Dryad [Wang et al., 2015]). Expression variations of s-traits explained by local-sQTLs were substantially higher than that by distant-sQTLs (Figure 4—figure supplement 3). The LOD values for 80.4% of the local-sQTLs and 34.5% of distant-sQTLs were larger than 10.

Figure 4 with 8 supplements see all
sQTLs for the 66,649 s-traits.

The color key shows the LOD value. X-axis, the physical position of sQTLs along the genome of 12 chromosomes. Y-axis, the physical position of s-traits. QTLs with LOD value <5 are not included in the presentation.

https://doi.org/10.7554/eLife.03913.019

Next, we performed QTL analysis for 50,139 sc-traits based on the same genetic map as in sQTL analysis using CIM in R/qtl with 1000 permutations. A total of 22,263 scQTLs including 11,476 local-scQTLs and 10,787 distant-sQTLs were recovered for 20,049 sc-traits utilizing the same definitions of local- and distant-QTLs (Figure 4—figure supplement 4, Supplementary file 7 in Dryad [Wang et al., 2015]). Again, the LOD values of local-scQTLs and expression variations explained by local-scQTLs were much higher than distant-scQTLs (Figure 4—figure supplement 5).

We also performed QTL analysis for the 24,987 e-traits based on the same genetic map, using the same algorithm and parameters as in the QTL analysis of s-traits and sc-traits. With an FDR set at 5%, a total of 6423 eQTLs were recovered for 6123 e-traits (Supplementary file 7 in Dryad [Wang et al., 2015]). These eQTLs were classified as local-eQTLs and distant-eQTLs according to the same rule applied in the sQTL analysis. As a result, 2964 local-eQTLs (shown in the diagonal of Figure 4—figure supplement 6) and 3459 distant-eQTLs (off-diagonal of Figure 4—figure supplement 6) were detected (Supplementary file 7 in Dryad [Wang et al., 2015]). Expression variations of mRNAs explained by local-eQTLs were significantly higher than that by distant-eQTLs (Figure 4—figure supplement 7).

Local-QTLs regulating the expression of large numbers of traits were observed for s-traits and sc-traits but not for e-traits (Figure 4, Figure 4—figure supplement 4 and Figure 4—figure supplement 6), probably due to the consecutive transcription of sRNAs and sRNA clusters, which was distinct from the transcription of mRNAs. In the extreme case, a total of 1660 s-traits were regulated by a single local-sQTL (Bin358) on chromosome 3 (Supplementary file 8 in Dryad [Wang et al., 2015]). Each of another five local-sQTLs (Bin359, Bin1183, Bin969, Bin286, and Bin166) explained the expression variations of more than 200 s-traits. The local-scQTL (Bin1183) with the largest effect regulated the expression of 93 sc-traits. Each of another 29 local-scQTLs was responsible for the expression variations of more than 30 sc-traits. In comparison, a total of 24 e-traits were regulated by Bin3, which is the local-eQTL explaining the expression variations of the largest number of e-traits. Distant-QTLs regulating many traits across all chromosomes were very common for s-traits and sc-traits but relatively rare for e-traits. A total of 15 bins were identified as distant-sQTLs, each of which regulated more than 200 s-traits; in one case, 514 s-traits were regulated by a single distant-sQTL (Bin1209). We observed three distant-scQTLs (Bin634, Bin635, and Bin636), each of which regulated more than 100 sc-traits, and another 21 distant-scQTLs, each of which explained the expression variations of more than 30 sc-traits. Whereas in case of mRNAs, only 9 bins were each responsible for the expression of more than 30 e-traits.

We further inspected the QTL results of 22,142 s-traits expressed (RPM >0) in only one of the two parents. In all, 11,194 and 10,948 s-traits were expressed only in the genome of Zhenshan 97 and Minghui 63, respectively. A total of 15,037 QTLs, 11,781 of which were local-sQTLs, were obtained for 14,139 of these s-traits. This proportion was significantly higher than the whole genome average level (Fisher's exact test, p value <2.2e-16), implying that the expression variations of these s-traits were mostly controlled by local polymorphisms. Examples of s-traits expressed in only one of the two parents were shown in Figure 4—figure supplement 8.

Identification of QTL hotspots

We next identified bins enriched with traits or QTLs based on the relative densities of the distributions in the genome (‘Materials and methods’). 40 bins were identified as hotspots for s-traits (Supplementary file 8 in Dryad [Wang et al., 2015], Figure 5). A total of 132 and 129 bins were identified as local-sQTL and distant-sQTL hotspots, respectively (Supplementary file 8 in Dryad [Wang et al., 2015], Figure 5). 21 bins were both s-trait and local-sQTL hotspots, while 64 bins were both local-sQTL and distant-sQTL hotspots (Supplementary file 8 in Dryad [Wang et al., 2015], Figure 5); 123 were local-scQTL hotspots and 149 distant-scQTL hotspots (Supplementary file 8 in Dryad [Wang et al., 2015]). In all, 76 bins were local-QTL hotspots and 80 bins were distant-QTL hotspots for both s-traits and sc-traits. On the other hand, 116 and 126 bins were identified as local- and distant-eQTL hotspots. The identified hotspots of sQTLs and scQTLs were quite similar but distinct from that of eQTLs (Figure 5). Totally, 33 bins were local-QTL hotspots and 36 bins were distant-QTL hotspots for both s-traits and e-traits. Consecutive distant-QTL hotspots in adjacent genomic regions could explain the expression variations of a large number of s-traits, sc-traits, or e-traits (Supplementary file 8 in Dryad [Wang et al., 2015]). Four consecutive distant-QTL hotspots on chromosome 4, 5, 6, and 9 were identified for both s-traits and sc-traits (Figure 5A,B). The genomic positions of consecutive distant-eQTLs hotspots were quite distinct from that of sQTL or scQTLs (Figure 5). A cluster of the consecutive distant-eQTL hotspots (chr04: 1.713 Mb–2.495 Mb) was located close to one end of chromosome 4, while another cluster of consecutive distant-QTL hotspots (chr04: 21.997 Mb–23.342 Mb) for both s-traits and sc-traits was about 10 Mb distant from the other end of chromosome 4. A cluster of consecutive distant-eQTL hotspots (chr07: 28.058 Mb–28.557 Mb) was located close to one end of chromosome 7, while no apparent consecutive distant-QTL hotspots for s-traits and sc-traits were on chromosome 7. A total of 12 bins (Bin1199–Bin1210) on chromosome 9 were identified as consecutive distant-QTL hotspots for both s-traits and sc-traits but not for mRNA e-traits. Only the consecutive distant-QTL hotspot on chromosome 6 (chr06: 2.847 Mb–4.023 Mb) was responsible for the expression variations of s-traits, sc-traits, and e-traits.

Figure 5 with 2 supplements see all
The distribution of traits and QTLs in the 1568 bins.

(A) s-traits and sQTLs. (B) sc-traits and scQTLs. (C) e-traits and e-QTLs. The 1568 bins are arranged from left to right according to their genomic position. The width of bar represents the size of the bin. The chromosome identifiers are labeled below each plot. Bins in red color were trait or QTL hotspots. Locations for a number of interesting regions like sRNA biogenesis genes, Bin358 (LOC_Os03g01360) and Bin 969 (LOC_Os07g01240), are also indicated, in addition to the hotspots.

https://doi.org/10.7554/eLife.03913.028

The s-trait hotspot with the highest s-trait density was Bin969 (chr07: 0.121 Mb–0.155 Mb), holding 1025 s-traits, of which 66.6% obtained at least one sQTL (Figure 5A, Supplementary file 7 in Dryad [Wang et al., 2015]). Bin969 was also a local-sQTL hotspot with the third highest local-sQTL density. The local-sQTL hotspot with the highest local-sQTL density was Bin358 (chr03: 0.266 Mb–0.368 Mb), which held 2338 s-traits, and an overwhelming majority of these s-traits (2245) had at least one sQTL (Figure 5A, Supplementary file 8 in Dryad [Wang et al., 2015]).

sQTL hotspots and sRNA biogenesis genes

sRNAs are usually generated by activities of Dicer-like proteins, Argonautes, and RNA-dependent RNA polymerases (RDRs) (Kapoor et al., 2008; Axtell, 2013). In rice, there are 43 transcripts of 32 genes for sRNA biogenesis including eight encoding Dicer-like proteins (OsDCLs), 19 Argonautes (OsAGOs), and 5 OsRDRs (Supplementary file 9 in Dryad [Wang et al., 2015]) (Kapoor et al., 2008). In this study, 26 transcripts for 20 genes were detected (FPKM ≥1) in at least one of the parents. Six transcripts of five genes (OsDCL2a, OsDCL2b, OsAGO1d, OsAGO13, and OsAGO18) were significantly differentially expressed between Zhenshan 97 and Minghui 63 (Supplementary file 9 in Dryad [Wang et al., 2015]). Comparison of DNA sequences between Zhenshan 97 and Minghui 63 revealed that 23 of the 32 sRNA biogenesis genes had polymorphisms of either SNPs or indels (Figure 5—figure supplement 1, Supplementary file 9 in Dryad [Wang et al., 2015]).

We compared the genomic positions of these sRNA biogenesis genes and distant-sQTL hotspots. OsRDR2 (in Bin636 and Bin637) was located in the consecutive distant-sQTL hotspots (Bin632–Bin637) on chromosome 4 regulating 1416 s-traits. OsDCL2b (in Bin1205) was located in the consecutive distant-sQTL hotspots (Bin1199–Bin1210) on chromosome 9 (Figure 5A, Supplementary file 10 in Dryad [Wang et al., 2015]), which regulated the expression variation of 3759 s-traits. OsDCL2a was located in Bin448, which regulated the expression of 213 s-traits, although this bin was not identified as a distant-sQTL hotspot due to its large size (1.254 Mb). However, four neighboring bins (Bin449–Bin451, Bin453) of Bin448 were identified as distant-sQTL hotspots (Supplementary file 8 in Dryad [Wang et al., 2015]). Eight adjacent bins (Bin734–Bin741) on chromosome 5 were found to harbor 364 distant-sQTLs, and OsDCL1c was in Bin738, which contained 33 distant-sQTLs (Supplementary file 8 in Dryad [Wang et al., 2015]).

The expression correlations between all the e-traits in the distant-sQTL hotspots (Bin632–Bin637) on chromosome 4 and all the s-traits whose expression variations were controlled by QTLs in these bins were calculated. Most of these correlations including those between the e-traits of OsRDR2 and all the s-traits were weak (Figure 5—figure supplement 2A,B). For s-traits regulated by QTLs in the consecutive distant-sQTL hotspots (Bin1199–Bin1210) on chromosome 9 (Figure 5A), most of the correlations between the s-traits and the e-traits in these bins were weak, while the correlations between s-traits and the e-traits of OsDCL2b were relatively strong (Figure 5—figure supplement 2C). About 79.7%, 17.4%, and 1.2% of the s-traits whose expression levels were regulated by the consecutive distant-sQTL hotspots harboring OsRDR2 on chromosome 4 were 24 nt, 23 nt, and 22 nt, respectively. On the other hand, 51.0% of the s-traits regulated by the consecutive distant-sQTL hotspots harboring OsDCL2b on chromosome 9 were 22 nt while 35.8% were 24 nt. For s-traits regulated by distant-QTLs in Bin448–Bin453 (containing OsDCL2a), 40.8% of them were 21 nt and 24.4% were 22 nt and 24.1% were 24 nt.

Five local-eQTLs and six distant-eQTLs for nine transcripts of these sRNA biogenesis genes were identified (Supplementary file 10 in Dryad [Wang et al., 2015]). We also compared the genomic positions of these eQTLs with the distant-sQTL hotspots. A distant-eQTL explaining the expression variation of OsDCL2a was located in Bin1207, which was in the consecutive distant-sQTL hotspots (Bin1199–Bin1210) on chromosome 9 (Supplementary file 10 in Dryad [Wang et al., 2015]). The expression correlations between s-traits regulated by QTLs in these bins and the e-traits of OsDCL2a were mostly strong (Figure 5—figure supplement 2D). The functional annotations and DNA polymorphisms between the two parents for all the genes in the distant-sQTL hotspots harboring OsRDR2, OsDCL2b, and OsDCL2a were listed in Supplementary file 11 in Dryad (Wang et al., 2015).

The distant-sQTL hotspots on chromosome 5 (Bin770–Bin777) and 6 (Bin850–Bin860) were also surveyed. Approximately 98.4% and 90.5% of all the s-traits regulated by the distant-sQTL hotspots on chromosome 5 and chromosome 6 were 24 nt. The functional annotations of genes in the distant-sQTL hotspots on chromosome 5 were investigated, and no apparent relationship to the biogenesis of sRNA was found, implicating the existence of unknown mechanisms of sRNA biogenesis and/or regulation (Supplementary file 12 in Dryad [Wang et al., 2015]). A gene annotated as RNA Pol II subunit Rpb7 was found in the distant-sQTL hotspots on chromosome 6 (Supplementary file 12 in Dryad [Wang et al., 2015]). Study in yeast reveals that Rpb7 has a specific role in pre-siRNA transcription (Djupedal et al., 2005).

Genetic co-regulation of s-traits originating from the same mother genes

To test whether the expression of s-traits originating from the same mother gene was regulated by the same genetic mechanism, the genomic positions of QTLs for s-traits originated from the same mother gene were compared with each other. S-traits originated from LOC_Os03g01360 and LOC_Os07g01240 were not included in this calculation. We calculated correlations for the following comparisons: 97,040 pairs of s-traits that were both controlled by local-sQTLs (Figure 6A), 36,123 pairs of s-traits that were both controlled by distant-sQTLs (Figure 6B), and 51,081 pairs of s-traits in which one was controlled by local-sQTL and the other one by distant-sQTL (Figure 6C). For a pair of s-traits originated from the same gene controlled by local-sQTL, the correlation coefficients between them were mostly strong (Figure 6A). The correlation coefficients between s-traits controlled by the same or nearby distant-sQTLs were also strong (Figure 6B), while the correlation coefficients between s-traits controlled by distant-sQTLs distant from each other were relatively weak (Figure 6B), which represented the majority (75.0%) of the s-traits pairs. Most of the correlations between the s-traits of local- and distant-regulation were also quite high (Figure 6C). For comparison, we also calculated correlations for all pairs of s-traits from the entire genome not necessarily the same genes that were regulated by the same sQTLs; their expression correlations were also mostly strong, either positive or negative (Figure 6D), indicating that the strong correlations between the s-traits had common genetic basis.

Genetic co-regulation between s-traits originating from the same mother genes.

(A) Expression correlations between pairs of s-traits originating from the same mother genes and both regulated by local-sQTLs. (B) Expression correlations between pairs of s-traits originating from the same mother genes and both regulated by distant-sQTLs. Green: distance between the two distant-sQTLs was within two 2 Mb; red: distance between the distant-sQTLs was larger than 2 Mb. (C) Expression correlations between pairs of s-traits from the same mother genes with one regulated by distant-sQTL and another one regulated by local-sQTL. (D) Expression correlations between pairs of s-traits from the whole genome regulated by distant-QTLs located in ≤2 Mb regions.

https://doi.org/10.7554/eLife.03913.031

Genetic co-regulation of s-traits and their mother genes

To infer the commonality of the genetic controls of the expression variations for sRNAs and their mother genes, we compared the map positions of the QTLs for sRNAs and their mother genes. Again s-traits and e-traits originating from LOC_Os03g01360 and LOC_Os07g01240 were not included in this calculation. We calculated correlations for the following comparisons: 4018 s-traits and 1084 corresponding mother genes that were both controlled by local-QTLs (Figure 7A), 1721 s-traits and 695 mother genes both regulated by distant-QTLs (Figure 7B), 1840 s-traits controlled by local-sQTLs and 544 corresponding mother genes regulated by distant-eQTLs (Figure 7C), and 1110 s-traits and 582 mother genes controlled by distant-sQTLs and local-eQTLs, respectively (Figure 7D). The expression correlations between s-traits and their mother genes that were both controlled by local-QTLs or the same and/or nearby distant-QTLs were mostly strong (Figure 7A,B). On the other hand, the correlations between s-traits and their mother genes that were regulated by different/distant-QTLs were relatively weak (Figure 7B–D). For all the pairs of e-traits and s-traits regulated by the same QTL position with the s-traits not necessarily from the genomic region of the e-traits, their expression correlations were mostly strong (Figure 7E), again indicating a common genetic basis.

Genetic co-regulation between s-traits and their mother genes.

(A) Expression correlations between s-traits and their mother genes that were both controlled by local-QTLs. (B) Expression correlations between s-traits regulated by distant-sQTLs and their mother genes also controlled by distant-eQTLs. Green, the distance between distant-QTLs for s-traits and the mother genes is within 2 Mb. Red, the distance between distant-QTLs is larger than 2 Mb. (C) Expression correlations between s-traits regulated by local-sQTLs and their mother genes controlled by distant-eQTLs. (D) Expression correlations between s-traits regulated by distant-sQTLs and their mother genes controlled by local-eQTLs. (E) Expression correlations between s-traits and e-traits from the whole genome regulated by distant-QTLs located in ≤2 Mb.

https://doi.org/10.7554/eLife.03913.032

The genetic effects of the sQTLs and scQTLs

There were three genotypes at each polymorphic locus in the IMF2 population: AA, Aa, and aa, where A represents the allele from one parent and a is the allele from the other parent. This allowed estimating of additive and dominant genetic effects for each of the sQTLs. Additive effect for sQTL was half of the sRNA expression difference between the two homozygotes, and dominant effect was the difference between the heterozygote and the average of the two homozygotes.

Of all the s-traits analyzed, s-traits regulated by 35,235 (49.7%) of the sQTLs exhibited higher expression levels in the Zhenshan 97 genotype than Minghui 63 genotype (Supplementary file 13 in Dryad [Wang et al., 2015]). We identified sQTLs with significant dominance effects using h-test (1000 permutations, p < 0.05) (Huang et al., 2006; Zhou et al., 2012). The analysis detected 23,018 sQTLs (32.5%) with significant dominant effects, of which the majority (71.8%) showed negative dominance such that the levels of the sRNAs were lower in heterozygotes than the mid-parent values (Supplementary file 13 in Dryad [Wang et al., 2015]). Moreover, 3842 (5.4%) sQTLs exhibited negative overdominance, in which sRNA levels of the heterozygotes were lower than both homozygotes. Approximately 92.4% of the sQTLs showing significant overdominance effects, either positive or negative, were distant-sQTLs.

The genetic effects of scQTLs were quite similar to that of sQTLs. Of all the 6913 scQTLs exhibiting significant dominant effects, 5140 (74.3%) showed negative dominance (Supplementary file 13 in Dryad [Wang et al., 2015]). A total of 1167 scQTLs (5.2%) exhibited negative overdominance, 89.3% of which were distant-scQTLs.

A closer look of LOC_Os03g01360 and LOC_Os07g01240

We took a closer look at the two loci that contributed extremely high numbers of sRNAs. Approximately 97.4% (2278) of the s-traits produced by Bin358 were encoded by the 2-kb upstream and genic region of the locus LOC_Os03g01360, where the sRNA sequences showed extensive and consecutive overlapping (Figure 8A). LOC_Os03g01360 did not have homologs in Arabidopsis but had a homolog in maize with unknown function (http://rice.plantbiology.msu.edu/cgi-bin/ORF_infopage.cgi?orf=LOC_Os03g01360). Three isoforms of this gene were significantly differentially expressed as measured in FPKM between Zhenshan 97 and Minghui 63 (Figure 8—figure supplement 1A). The sRNAs of Zhenshan 97 were mainly derived from the second and the third part of LOC_Os03g01360, while sRNAs of Minghui 63 originated from all the three parts with the expression level from the third part lower than that of sRNAs of Zhenshan 97 (Figure 8A). The expression levels of LOC_Os03g01360.1 and LOC_Os03g01360.4 in the genome of Minghui 63 were higher than Zhenshan 97 (Figure 8—figure supplement 1A). On the contrary, LOC_Os03g01360.2 was higher expressed in Zhenshan 97 than in Minghui 63. The expression of all the three transcripts in the hybrid was down-regulated compared with the mid-parent value (Figure 8—figure supplement 1A). We checked the cytosine methylation levels in this region, and the genomic DNA showed high cytosine methylation, especially from the start to the third exon (Figure 8A, ‘Materials and methods’). This high methylation might be due to the enrichment of sRNAs resulting in RNA-directed DNA methylation. In the differentially expressed region of sRNAs between Zhenshan 97 and Minghui 63, such as the third exon according to the annotated gene model, Minghui 63 had lower methylation and higher mRNA transcript level than Zhenshan 97, suggesting a negative correlation between the transcript level and methylation. The expression variation of three isoforms of LOC_Os03g01360 in the IMF2 population was controlled by local-eQTLs (Supplementary file 7 in Dryad [Wang et al., 2015]). At least one sQTL each was detected for 2200 of the 2278 s-traits located in the gene LOC_Os03g01360 region, of which 2181 (84.5%) were local-sQTLs (Supplementary file 7 in Dryad [Wang et al., 2015]). The expression variation of different transcripts of LOC_Os03g01360 probably corresponded to sRNAs from specific parts of the genic region and is regulated by the same genetic factor. The majority of distant-sQTLs regulating the expression of s-traits originating from LOC_Os03g01360 was for 21 and 22 nt s-traits and was attributed to Bin442–Bin453 on chromosome 3.

Figure 8 with 1 supplement see all
The expression levels of sRNA, mRNA, and the DNA methylation levels of the hybrid and its parents in the region of LOC_Os03g01360 and LOC_Os07g01240.

(A) The structure of LOC_Os03g01360 is shown in the top panel, with black color representing exons and red color indicating introns and UTRs. The number of mRNA reads is shown in blue color. The sRNAs expression level is shown in orange. The DNA methylation level is displayed in the bottom three panels. ZS, Zhenshan 97; MH, Minghui 63; HY, Hybrid. (B) The structure of LOC_Os07g01240 is shown in the top panel, with black color representing exons and green color indicating introns and UTRs. The meaning of different colors in the other panels is the same as in (A).

https://doi.org/10.7554/eLife.03913.033

The sRNAs in Bin969 were almost entirely derived from the intron of LOC_Os07g01240, of which 617 (61.2%) were 21-nt and 22-nt sRNAs and 280 (27.8%) were 24-nt sRNAs (Figure 8B). LOC_Os07g01240 had a homolog in Arabidopsis with unknown function and in Maize annotated as uncharacterized GPI-anchored protein (http://rice.plantbiology.msu.edu/cgi-bin/ORF_infopage.cgi?orf=LOC_Os07g01240) and was reported to modulate rice leaf rolling by regulating the formation of bulliform cells (Xiang et al., 2012). In all, 677 s-traits mapped 841 sQTLs, of which 514 were local-sQTLs (Supplementary file 7 in Dryad [Wang et al., 2015]). However, no significant expression difference of the two isoforms of LOC_Os07g01240 was detected between the parents (Figure 8—figure supplement 1B), and thus no eQTL for the e-traits was detected. The majority of the 327 distant-sQTLs was for 21 and 22 nt s-traits and was again attributed to Bin442–Bin453 on chromosome 3. Some of the distant-sQTLs for 24 nt s-traits were attributed to Bin770–Bin775 on chromosome 5. The expression correlation between e-traits of LOC_Os07g01240 and s-traits originating from LOC_Os07g01240 was weak (Figure 3B). This result was similar to that of Tong et al. (2013) who showed that intronic mRNAs had a distinct expression pattern from their host genes, suggesting that intronic sRNAs might have genetic regulatory mechanisms independent of their mother genes.

Discussion

The variation of sRNA abundance in a population depends on a range of factors: the levels of transcription of the precursors that is influenced by local elements mostly residing in the promoter regions of the sRNA itself; the mother gene or long non-coding RNAs where the sRNA is produced; polymorphisms in sequence influencing their biogenesis; and variation in distant regulatory factors located distantly from where the sRNA is generated. Our results indicated that almost one third of the detected local-sQTLs exhibited much larger effects than distant-sQTLs, consistent with the studies for eQTL of mRNA expression (Kliebenstein, 2009), suggesting that local elements are a major regulatory factor for sRNA variation. Local-QTLs regulating the expression of large numbers of traits were observed for s-traits but not for e-traits, resulting from the enrichment of s-traits in the region produced by the extensive and consecutive overlapping transcripts from the mother genes. Hotspots were found for both distant-QTLs of mRNA e-traits and sRNA s-traits, both of which could influence the expression variations of a large number of traits.

Our results showed that quantitative variation for a large portion of sRNAs transcribed from genic regions did not share the genetic control of the expression with the corresponding mother genes, although by common sense those sRNAs have to be transcribed along and regulated by the promoters of the mother genes. Similar results were reported in previous studies of miRNAs and their host genes, which were regulated or processed independently from their respective regulatory elements (Siddle et al., 2014). The results of Monteys et al. (2010) and Ozsolak et al. (2008) suggested that the transcription of sRNAs may also be regulated by their own promoters, independent of the mother genes. The widespread occurrence of such independent genetic controls between the sRNAs and the mother genes suggested that independent transcription of the sRNAs might be a common phenomenon, rather than only special circumstances. Moreover, the large number of distant-sQTLs that were not collocated with the distant-eQTLs for the corresponding mother genes supported the independent regulatory basis of the transcription of the sRNAs.

Our results showed that some of the sRNA biogenesis genes such as DCLs, AGOs, and RDRs were probably responsible for the quantitative variation of a large number of sRNAs. OsRDR2 was found in a region of consecutive distant-sQTL hotspots explaining the expression variation of many sRNAs, most of which were 24 nt. This was also a region of consecutive distant-scQTL hotspots with the highest number of distant-scQTLs. OsDCL2b was also found in consecutive distant-sQTL hotspots regulating the expression of sRNAs, most of which were 22 and 24 nt. Although OsDCL2a and OsDCL1c were not in distant-sQTL hotspots due to lack of local recombination in that region, they were also in bins regulating the expression of a large number of s-traits. This is in accordance with the reports that RDRs function on the upstream of DCLs in the process of sRNA biogenesis (Chapman and Carrington, 2007). Studies in sRNA pathways in Arabidopsis showed that DCL2 is responsible for the synthesis of 22 nt or 24 nt siRNAs, while RDR2 functions in the production of endogenous 24 nt siRNAs and the conversion of ssRNA template into dsRNAs that serve as substrates for DCLs (Arikit et al., 2013), which is in good agreement with the sQTLs found in this study. However, it should also be noted that these genes were associated with s-QTLs for only a small portions of the s-traits, while the quantitative variation of the sRNA abundance for majority of the s-traits was independent of the sRNA biogenesis genes.

One of the most interesting finding perhaps concerns the sRNAs from the two loci, LOC_Os03g01360 and LOC_Os07g01240, and their regulation patterns. Although both loci produced thousands of sRNAs, which were the most numerous in the genome, they showed sharp contrast in where the sRNAs were generated and the regulatory mechanisms with which the sRNAs were produced. sRNAs from LOC_Os03g01360 mostly originated from the 2-kb upstream and genic regions and were tightly co-regulated with each other and also with the mother gene. By contrast, sRNAs from LOC_Os07g01240 were mostly produced in the intronic region and loosely co-regulated with each other and not co-regulated with the mother gene; in fact, the s-trait variation and sQTLs were detected even without the expression variation of the mother gene. Since neither of the genes has been previously identified as related to the production and function of sRNAs, their roles in sRNA biogenesis warrant further investigation.

Another noticeable finding is the widespread negative dominance of sRNA levels detected in the IMF2 population such that heterozygotes had lower level of the sRNAs than the means of the two homozygotes (negative partial dominance) or the lower homozygote (negative overdominance). Such negative dominance was also observed in comparative sRNA profiling of hybrids relative to the parents in crosses of rice and Arabidopsis (Groszmann et al., 2011b, 2013; He et al., 2013), revealing a predominant-negative regulation of siRNA expression in hybrids (He et al., 2010; Chodavarapu et al., 2012). Our genetic analysis revealed that the sQTLs showing negative dominance of the sRNA levels are mostly distant-sQTLs, indicating regulation by distant elements likely at the transcriptional level. Studies of transcript levels of genes (mRNAs) in the hybrid against the parents in the same rice cross also revealed more negative dominance than positive dominance, also indicating down-regulation in the hybrid relative to the parents (Huang et al., 2006). How the down-regulation of these two classes of transcripts was related to each other and how the two classes of down-regulation are related to the hybrid performance present great challenges for future studies.

Materials and methods

Plant materials and growth conditions

Request a detailed protocol

The plant materials consisted of an IMF2 population of 98 hybrids produced by paired crosses of 196 RILs (Supplementary file 14 in Dryad [Wang et al., 2015]), the two parental lines Zhenshan 97 and Minghui 63 and their hybrid. The plants were grown under normal agricultural conditions at experimental farm of Huazhong Agricultural University in the rice-growing season (May to September) in Wuhan, China. A flag leaf at the day of full expansion from each of three random plants per replicate was harvested between 17:00 and 18:00 for library construction.

Library construction and sequencing

Request a detailed protocol

Total RNA was isolated from the leaf tissue using TRIzol reagent (Invitrogen, Waltham, Massachusetts), the process to convert total RNA into template suitable for high throughput DNA sequencing of mRNA-seq and sRNA-seq libraries using Sample Preparation Kit (Illumina, San Diego, California) followed the manufacturer's guide. Bisulfite sequencing (WGBS) libraries of the parents and the hybrid were made from genomic DNA isolated from the same leaf tissues as used for RNA-seq libraries. Sequencing was performed on Illumina HiSeq 2000 at BGI (http://www.genomics.cn/index, Shenzhen, China) (50 bp single end for mRNA-seq and 100 bp pair end for BS-seq).

sRNA data processing and SNPs-replaced reference genomes of the parents

Request a detailed protocol

Poor-quality reads were removed using fastq_quality_filter in FASTX-Toolkit (http://hannonlab.cshl.edu/fastx_toolkit/) with parameters q = 20 and p = 85. Reads shorter than 18 nt or longer than 26 nt were excluded from further analysis. Reads mapped to rice tRNA, rRNA, snRNA, snoRNA obtained from fRNAdb (http://www.ncrna.org/frnadb/), NONCODE, GtRNAdb (http://lowelab.ucsc.edu/GtRNAdb/Osati/), and Rfam (ftp://ftp.sanger.ac.uk/pub/databases/Rfam/11.0/) were also removed.

There were 2,305,391 high-quality SNPs between Nipponbare (Oryza sativa ssp. japonica) and Zhenshan 97 and Minghui 63 (Oryza sativa ssp. indica) in the whole genome, of which 971,883 were the specific SNPs between Zhenshan 97 and Minghui 63 (data from http://211.69.128.148/rice/). We corrected the SNP sites of the Nipponbare reference genome (http://rice.plantbiology.msu.edu, The MSU Rice Database release 7.0) according to the sequences of Zhenshan 97 and Minghui 63 to reconstruct ‘SNPs-replaced reference genomes’ for the two parents.

For sRNA analysis, all filtered sRNA-seq reads from Zhenshan 97 libraries were specifically mapped to the SNPs-replaced Zhenshan 97 genome and reads from Minghui 63 libraries were mapped to SNPs-replaced Minghui 63 genome. The sRNAs from heterozygous materials of F1 and IMF2 were simultaneously mapped to the Zhenshan 97- and Minghui 63-replaced genomes. Bowtie (Langmead et al., 2009) was used to align short reads to each genome at unique genome location with no mismatch allowed. The reads from heterozygous materials could be divided into three groups: reads specifically mapped to Zhenshan 97, reads only mapped to Minghui 63 genome, and reads mapped to both Zhenshan 97 and Minghui 63 at the same position. In the third class, the sequences of the sRNAs from a monomorphic site were indistinguishable between the parents and thus obtained one count in quantitation of the abundance, whereas the two counts of the sRNAs from a polymorphic site were put together.

Quantification of sRNA or sRNA cluster expression level

Request a detailed protocol

The expression level of an sRNA in a specific library was defined as the number of this sRNA divided by the total number (in millions) of genome-mapped sRNAs in this library, which was designated as ‘RPM’. The R package DESeq (Anders and Huber, 2010) was used to quantify sRNA cluster expression level. The number of sRNA reads in each sRNA cluster for each sample was calculated and integrated as a count table, with each line representing an sRNA cluster and each column representing a sample. Then, the effective library size for each sample was estimated using the ‘estimateSizeFactors’ function in the DESeq package. Each column of the count table was divided by the corresponding library size to get the normalized read count, which was regarded as the expression level of the sRNA cluster.

Processing of mRNA sequencing data

Request a detailed protocol

The removal of poor-quality reads for mRNA-seq reads was done in the same way as sRNA analysis. The sequences from 104 libraries were mapped to the O. sativa ssp. japonica (cv. Nipponbare) version 7 reference genome using TopHat (Trapnell et al., 2009) with default parameters. Cufflinks (Trapnell et al., 2010) were utilized to estimate gene expression levels according to the Nipponbare version 7 reference annotation.

The analysis of bisulfite sequencing

Request a detailed protocol

For bisulfite sequencing, trimmomatic (Lohse et al., 2012) was used to remove low-quality reads. Bismark (Krueger and Andrews, 2011) was performed to align bisulfite-treated reads to the SNP-replaced genomes, allowing no mismatch in the seed of 40 nucleotides and up to two good alignments. This bisulfate mapping tool aims to find unique alignment through running four alignment processes simultaneously (Krueger and Andrews, 2011). Then, the de-duplication tool provided by Bismark was applied to remove potential PCR duplicates. Methylation calls were extracted for every single cytosine analyzed depending on its context (CpG, CHG, or CHH).

QTL analysis

Request a detailed protocol

The ultrahigh-density bin map constructed by genotyping the RILs with population sequencing (Xie et al., 2010; Yu et al., 2011) was used. The 1568 bin genotypes of each cross in the IMF2 population were deduced from the parental genotypes (Supplementary file 6 in Dryad [Wang et al., 2015], Figure 4—figure supplement 2). CIM in R/qtl (Haley and Knott, 1992; Broman and Speed, 2002; Manichaikul et al., 2009) was employed to map QTLs with 1000 permutations. Additive and dominant effects were decided by ‘effectscan’ function in R/qtl. Variation explained by the QTL was determined using the linear QTL model as described by Yu et al. (2011). The same genetic map and program parameters were used in the QTL analysis for s-traits, sc-traits, and e-traits.

Definition of QTL and trait hotspots

Request a detailed protocol

The density of s-trait and sQTL was defined as the number of s-traits and sQTLs in each bin divided by the bin size (Mb), respectively. The density of sc-traits, scQTLs, e-traits, and eQTLs was calculated in the same way. Bins with s-trait density larger than three times of the whole genome average level were defined as s-trait hotspots, while bins with sQTL density higher than six times of the whole genome average level were designated as sQTL hotspots. The process of identification of e-traits and eQTL hotspots was the same as that of s-traits and sQTLs. The definition of scQTL hotspots was identical to sQTL hotspots, while sc-traits hotspots were defined as bins with sc-traits density higher than the twice of the whole genome average.

References

  1. 1
  2. 2
  3. 3
  4. 4
  5. 5
  6. 6
  7. 7
  8. 8
  9. 9
  10. 10
  11. 11
  12. 12
  13. 13
  14. 14
  15. 15
  16. 16
  17. 17
  18. 18
  19. 19
  20. 20
  21. 21
  22. 22
  23. 23
    Genetic dissection of an elite rice hybrid revealed that heterozygotes are not always advantageous for performance
    1. JP Hua
    2. YZ Xing
    3. CG Xu
    4. XL Sun
    5. SB Yu
    6. Q Zhang
    (2002)
    Genetics 162:1885–1895.
  24. 24
  25. 25
  26. 26
  27. 27
  28. 28
  29. 29
  30. 30
  31. 31
  32. 32
  33. 33
  34. 34
  35. 35
  36. 36
  37. 37
  38. 38
  39. 39
  40. 40
  41. 41
  42. 42
  43. 43
  44. 44
  45. 45
  46. 46
  47. 47
    Data from: Genetic basis of sRNA quantitative variation analyzed using an experimental population derived from an elite rice hybrid
    1. J Wang
    2. W Yao
    3. D Zhu
    4. W Xie
    5. Q Zhang
    (2015)
    Dryad Data Repository, 10.5061/dryad.9d030.
  48. 48
  49. 49
  50. 50
  51. 51
  52. 52
  53. 53
  54. 54
  55. 55
  56. 56
  57. 57

Decision letter

  1. Justin Borevitz
    Reviewing Editor; Australian National University, Australia

eLife posts the editorial decision letter and author response on a selection of the published articles (subject to the approval of the authors). An edited version of the letter sent to the authors after peer review is shown, indicating the substantive concerns or comments; minor concerns are not usually shown. Reviewers have the opportunity to discuss the decision before the letter is sent (see review process). Similarly, the author response typically shows only responses to the major concerns raised by the reviewers.

Thank you for sending your work entitled “Genetic basis of sRNA quantitative variation in rice” for consideration at eLife. Your article has been favorably evaluated by Detlef Weigel (Senior editor) and three reviewers, including Justin Borevitz as guest Reviewing editor.

The Reviewing editor and the other reviewers discussed their comments before we reached this decision, and the Reviewing editor has assembled the following comments to help you prepare a revised submission.

You present a large body of new work, with the identification of sRNA eQTL being of wide interest. The explicit questions asked are whether sRNAs are regulated by the genes of origin, and what the genetic architecture of miRNA expression is, by making use of immortalized F2s (RILs).

The full mRNA eQTL analysis must be included. Based on this, specific tests looking for genetic co regulation of sRNA+mRNA and specific loci should be done, with independence being the null hypothesis. Also, you need to follow up the positive associations to candidate genes as suggested by other reviewers.

We would be happy to re-review a substantially revised version.

Full comments from the Reviewing editor:

Wang et al (2014) use an immortalized F2 (IMF2) population generated by crossing RILs from a cross between Zhenshan 97 and Minghui 63 created previously in their laboratory (Hua et al. 2002; Hua et al. 2003) to look into the variation of small RNA expression (from RNA sequencing data). Although numerous eQTL analyses have been previously performed, the novelty is here that they look specifically to the abundance of small RNAs and that has never been done before. Moreover, the added value of using the IMF2 population in comparison to conventional RILs is that this population allows the investigation of additive and dominance effects, and heterosis. The manuscript is well written and will be comprehensible to a broad readership

The authors give a very detailed description about the number of sRNA expression traits (they refer to as e-traits), the number of cis and trans-sQTL that they find, and the overlap of the e-traits with the mRNA expression of their mother genes. The major findings are that the sRNAs are mostly independently regulated from the sRNA biogenesis genes, that the SRNA are differently transcribed from their mother genes and that most of the heterozygotes show negative dominance. Although these results on the genetic regulation of sRNAs are interesting and valuable to the plant science community, I am not yet convinced that these results have enough added value to the literature for inclusion in eLife.

My major point of criticism is that the analyses are very descriptive and although many sQTLs are found, the genetic regulators of sRNAs are still unknown. Furthermore, the authors might want to go into further detail for the majority of their findings.

Major points:

1) The cis-sQTL hotspots on chr 3 and 7 are not elaborated on and remain purely descriptive analyses. The authors could at least give information on what the function of these genes is, whether they are co-expressed with other genes, do they have homologs in Arabidopsis or other plants, etc. Trans-sQTL hotspots are also not elaborated on, except the more detailed analysis of the sRNA biogenesis genes.

2) The OsDCL2 gene that is found to underlie a trans-sQTL and that has a cis-eQTL is also not analyzed further. Is there a correlation in expression between the eQTL and the sQTL? What happens if the OsDCL2 gene is silenced or over-expressed to the sRNA expression?

3) The authors published data on yield and heterosis for the same IMF2 population previously (Hua et al. 2002; Hua et al. 2003) and it would be very interesting to see whether the negative dominance effects are shared and correlated among the lines. Is there an effect of different sRNA expression on yield?

4) I further miss an analysis of the variation in the population, the heritability and a detailed comparison of parents and hybrid.

The other reviewer also largely agrees.

Decision recommendation:

In a population derived from crossing RIL lines, that were developed from a cross between two distinct rice inbreds (Zhenshan 97 and Minghui 63), Jian Wang and colleagues investigate: a) expression levels for small RNAs between 18 and 26 nt in length, b) mRNA expression levels and c) (genome-wide) DNA methylation by bisulfite sequencing.

Wang et al subject the small RNA expression data to an eQTL analysis and report dozens of thousands of QTLs for a similar number of e-traits.

The population consisted of 98 individuals. Hence, to accumulate the data for this project the authors produced 300 or more second generation sequencing libraries.

Zhenshan 97 and Minghui 63 are the parents of Shanyou 63, a very successful rice hybrid in China. First generation offspring of original parents (the hybrid) is heterozygous over the entire genome. But a population of individuals derived from crossing RILs provides a means to dissect the genetic interactions of this hybrid as F1s derived from different RILs will represent a different set of genes/alleles in homozygous/heterozygous states.

Hence, the chosen population is indeed suitable to elucidate genetic interactions underpinning small RNA expression; at least in this particular hybrid.

The first criticism hence concerns the title. It makes too broad a claim.

The study uses a very suitable population and a very suitable approach and technology. The study is potentially of high value for our understanding of the biology of hybrids, however, my recommendation at this point is to reject the paper. We will take another look at a revised version.

Main reasons:

I feel the study lacks a clear focus and clear messages. It is mainly a large collection of data spiced up with examples of which the reader doesn't know how representative they are.

Despite the large amount of data it is not really a resource, because the data is not accessible. Lists of QTLs is not accessible data. If a resource, then the interested reader may want to explore expression levels and methylation status himself, genome-wide, to explore the QTLs.

The low (arbitrary?) thresholds for QTL detection and arbitrary filters for including or excluding e-traits do not help in making it a resource. The reader is left with tens of thousands of QTLs of low LOD scores (see Figure 2).

I found the paper a difficult read, which was mainly due to the many different layers and concepts that one has to constantly keep separate in ones head: sRNA, siRNA, miRNA, eQTL, sQTL, e-trait, mother gene, target genes, TE genic, non-TE genic, etc., convoluted by ratios and percentages and 20(!) supplementary tables, all long list of numbers. Tens of thousands of e-traits and QTL (eQTLs? sQTLs?) described does not make it easier.

It is unclear why siRNA and miRNA are dealt with together. Given the different roles, genesis and modes of operation of the small RNAs (miRNAs and siRNA), they should be kept separate as much as possible. miRNAs are post-transcriptional control and siRNA (24mers) are part of the epigenetic machinery. It is probably not expected that they and/or their genetic control have anything in common, and the study finds ample differences. Strict separation of the 2, best in 2 different papers, would reduce confusion.

Also it is unclear why each sRNA is treated as a separate trait. Based on their biogenesis all that come from the same mother gene are jointly transcribed at least. Different abundance of sRNAs from the same ‘mother gene’ points to either interesting biology or technical artifacts. This is not explored nor discussed.

My main criticism, however, is that the presented results do not make the use of most of the data. Besides the sRNA abundance data, there is mRNA expression data (by RNAseq) for all lines, and even genome wide methylation data (by bisulfite sequencing) of all lines. These data are only mentioned in passing and selectively used in (arbitrary?) examples. If a whole genome view on sRNA control is to be presented, then the study needs to integrate over all 3 data sets (mRNA, sRNA, Methylome) in a genome wide manner.

Please address these points in a major revision focusing on highlighting the major findings supporting your sRNA regulation of mother genes. I will take a look and make a decision.

[Editors' note: further revisions were requested prior to acceptance, as described below.]

Thank you for resubmitting your work entitled “Genetic basis of sRNA quantitative variation analyzed using an experimental population derived from an elite rice hybrid” for further consideration at eLife. Your revised article has been favorably evaluated by Detlef Weigel (Senior editor) and a guest Reviewing editor. The manuscript has been improved but there are some remaining issues that need to be addressed before acceptance, as outlined below:

Figure 2 should be flipped along the diagonal so that QTL are along the x axis and traits on the y. This would show the vertical trans bands as is standard practice. A higher cutoff for the image or log scale coloring may highlight the major loci above the general noise.

In the Results section, cis-QTL may show allele specific expression. The term should be reserved for allelic tests and ‘local’ should be used when eQTL map near to their source gene.

https://doi.org/10.7554/eLife.03913.035

Author response

You present a large body of new work, with the identification of sRNA eQTL being of wide interest. The explicit questions asked are whether sRNAs are regulated by the genes of origin, and what the genetic architecture of miRNA expression is, by making use of immortalized F2s (RILs).

The full mRNA eQTL analysis must be included. Based on this, specific tests looking for genetic co regulation of sRNA+mRNA and specific loci should be done, with independence being the null hypothesis. Also, you need to follow up the positive associations to candidate genes as suggested by other reviewers.

Your suggestion has led to major change in the focus of the paper, which required almost complete reanalysis of the data. In the revised version, the main focus was on the co-regulation of the sRNAs, sRNAs and the mRNAs of their mother genes, mostly by correlation and QTL analysis. The manuscript now appears to be more focused. We devoted a section in the Results and a paragraph in Discussion on the two loci that produced huge numbers of sRNAs, also spent more words on the sRNA biogenesis genes. We also analyzed the possible roles of the sQTLs by relating the sQTLs especially the hotspots of trans-QTLs with sRNA biogenesis genes and also the length classes of the sRNAs. These changes have led to re-organizing the materials and rewriting for most of the Results section. We also made effort to unify the terminology to ease the reading.

Major points:

1) The cis-sQTL hotspots on chr 3 and 7 are not elaborated on and remain purely descriptive analyses. The authors could at least give information on what the function of these genes is, whether they are co-expressed with other genes, do they have homologs in Arabidopsis or other plants, etc. Trans-sQTL hotspots are also not elaborated on, except the more detailed analysis of the sRNA biogenesis genes.

We devoted a section in the Results and a paragraph in the Discussion on the two loci that produced huge numbers of sRNAs, LOC_Os03g01360 and LOC_Os07g01240 respectively. LOC_Os03g01360 does not have homolog in Arabidopsis but has homolog in Maize with unknown function (http://rice.plantbiology.msu.edu/cgi-bin/ORF_infopage.cgi?orf=LOC_Os03g01360). LOC_Os07g01240 has homologs in Arabidopsis with unknown function and in Maize annotated as “uncharacterized GPI-anchored protein” (http://rice.plantbiology.msu.edu/cgi-bin/ORF_infopage.cgi?orf=LOC_Os07g01240). This gene was reported to modulate rice leaf rolling by regulating the formation of bulliform cells (Xiang et al. 2012).

We enhanced the trans-QTL analysis in the revised version, including the relationship the QTL hotspots with the sRNA biogenesis genes, correspondence of trans-sQTLs with the species (length class) of the sRNAs which made the sQTLs more meaningful.

2) The OsDCL2 gene that is found to underlie a trans-sQTL and that has a cis-eQTL is also not analyzed further. Is there a correlation in expression between the eQTL and the sQTL? What happens if the OsDCL2 gene is silenced or over-expressed to the sRNA expression?

We added the analysis to in the revised version: “OsDCL2b (in Bin1205) was located in the consecutive trans-sQTL hotspots on chromosome 9 (Figure 5, Supplementary file 10 in Dryad (Wang et al. 2015)”. For s-traits regulated by QTLs in the consecutive trans-sQTL hotspots (Bin1199–Bin1210) on chromosome 9, most of the correlations between them and the e-traits in these bins were weak while the correlations between them and the e-traits of OsDCL2b were relatively strong (Figure 5—figure supplement 2C). About 51.0% of the s-traits regulated by the consecutive trans-sQTL hotspots harboring OsDCL2b on chromosome 9 were 22 nt while 35.8% were 24 nt. The DCL2 gene of Arabidopsis was reported to be responsible for the synthesis of 22 nt or 24 nt siRNA. We have not performed the silencing or over-expression of OsDCL2b at present.

3) The authors published data on yield and heterosis for the same IMF2 population previously (Hua et al. 2002; Hua et al. 2003) and it would be very interesting to see whether the negative dominance effects are shared and correlated among the lines. Is there an effect of different sRNA expression on yield?

The crosses were not the same as in the previous studies, although the RILs are. So correlations cannot be performed as suggested.

4) I further miss an analysis of the variation in the population, the heritability and a detailed comparison of parents and hybrid.

We added the analysis of coefficients of variation for the 165,797 s-traits (Figure 1—figure supplement 4) and selected 9 s-traits randomly from all the 165,797 s-traits to illustrate the distributions of the expression values across the IMF2 population (Figure 1—figure supplement 5). We did have data of DNA methylation, mRNA expression and sRNA expression of parents and hybrid. However, we found it hard to relate this comparison to the “genetic basis of sRNA expression” and including that part would divert the focus and make the paper unnecessarily lengthy. Since the manuscript was already criticized by one of the reviewers as having too much data, which we also agree, we decided not to include this part in the revised manuscript.

The other reviewer also largely agrees. […]

The first criticism hence concerns the title. It makes too broad a claim.

We changed it to “Genetic basis of sRNA quantitative variation analyzed using an experimental population derived from an elite rice hybrid” in the revised manuscript.

The study uses a very suitable population and a very suitable approach and technology. The study is potentially of high value for our understanding of the biology of hybrids, however, my recommendation at this point is to reject the paper. We will take another look at a revised version.

Main reasons:

I feel the study lacks a clear focus and clear messages. It is mainly a large collection of data spiced up with examples of which the reader doesn't know how representative they are.

We had re-analyzed all the data and re-drafted the manuscript. In the revised version the main focus was on the co-regulation of the sRNAs, sRNAs and the mRNAs of their mother genes, mostly by correlation and QTL analysis. The manuscript now appears to be more focused.

Despite the large amount of data it is not really a resource, because the data is not accessible. Lists of QTLs is not accessible data. If a resource, then the interested reader may want to explore expression levels and methylation status himself, genome-wide, to explore the QTLs.

We made the data available in a genome browser (http://srna.ncpgr.cn/), and will submit to a public database upon acceptance of this paper.

The low (arbitrary?) thresholds for QTL detection and arbitrary filters for including or excluding e-traits do not help in making it a resource. The reader is left with tens of thousands of QTLs of low LOD scores (see Figure 2).

We re-analyzed the whole data. Actually, 53,613,739 unique sRNA reads were filtered to 165,797 s-traits. An sRNA with RPM value ≥0.6 was regarded as expressed, and an sRNA identified as expressed in more than 25 of the 98 IMF2s was regarded as an sRNA expression trait (s-trait). Both of the criteria are very stringent, if the huge numbers are taken into account. We performed 1,000 permutations (FDR set as 5%) for each QTL scanning and got LOD value thresholds for each QTL detected. Again we feel that the criteria are highly stringent, and the QTL scanning results in the revised manuscript were quite different from the results in the previous manuscript (see Figure 4 of the revised manuscript).

I found the paper a difficult read, which was mainly due to the many different layers and concepts that one has to constantly keep separate in ones head: sRNA, siRNA, miRNA, eQTL, sQTL, e-trait, mother gene, target genes, TE genic, non-TE genic, etc., convoluted by ratios and percentages and 20(!) supplementary tables, all long list of numbers. Tens of thousands of e-traits and QTL (eQTLs? sQTLs?) described does not make it easier.

Thanks for this comment. We restructured the writing by focusing on the genetics of co-regulation of the sRNAs (s-traits) and between sRNAs and the mother genes. We made efforts to unify the terminology used throughout. We deleted the part for miRNAs and condensed the writing to some extent, which reduced the text, figures and data files. We hope the paper now becomes easier to read.

It is unclear why siRNA and miRNA are dealt with together. Given the different roles, genesis and modes of operation of the small RNAs (miRNAs and siRNA), they should be kept separate as much as possible. miRNAs are post-transcriptional control and siRNA (24mers) are part of the epigenetic machinery. It is probably not expected that they and/or their genetic control have anything in common, and the study finds ample differences. Strict separation of the 2, best in 2 different papers, would reduce confusion.

We are grateful for this suggestion. The analysis on miRNA was removed from the revised manuscript and may be dealt with in a future paper.

Also it is unclear why each sRNA is treated as a separate trait. Based on their biogenesis all that come from the same mother gene are jointly transcribed at least. Different abundance of sRNAs from the same ‘mother gene’ points to either interesting biology or technical artifacts. This is not explored nor discussed.

Yes, sRNAs are jointly transcribed. However, sRNAs are not necessarily jointly transcribed from a mother gene since sRNAs were also found in the intergenic region. In this revision, we treated sRNAs transcribed from the same genomic region as an sRNA cluster, which was adopted by several previous studies (subsection headed “Expression correlations between sRNAs that originated from the same sRNA cluster”) (Shen et al. 2012). We also showed that the expression levels of sRNAs from the same mother genes or from the same clusters were slightly positively correlated (Figure 2 and Figure 2–figure supplement 5).

My main criticism, however, is that the presented results do not make the use of most of the data. Besides the sRNA abundance data, there is mRNA expression data (by RNAseq) for all lines, and even genome wide methylation data (by bisulfite sequencing) of all lines. These data are only mentioned in passing and selectively used in (arbitrary?) examples. If a whole genome view on sRNA control is to be presented, then the study needs to integrate over all 3 data sets (mRNA, sRNA, Methylome) in a genome wide manner.

In the revised version, we added eQTL analysis of the mRNAs, and used this for the co-regulation analysis of the sRNAs and their mother genes (subsection headed “Expression correlations between sRNAs and their mother genes”). We have data of DNA methylation for parents and hybrid, not the population, thus could not do genetic analysis.

[Editors' note: further revisions were requested prior to acceptance, as described below.]

Figure 2 should be flipped along the diagonal so that QTL are along the x axis and traits on the y. This would show the vertical trans bands as is standard practice. A higher cutoff for the image or log scale coloring may highlight the major loci above the general noise.

The QTL figures were already presented in the way as you suggested, the QTLs are in x-axis and the traits are in y-axis. We changed the presentation with a higher cutoff so that LOD less than 5 were not included in the presentation.

In the Results section, cis-QTL may show allele specific expression. The term should be reserved for allelic tests and ‘local’ should be used when eQTL map near to their source gene.

We changed the cis-QTL to local-QTL throughout the manuscript as suggested, which is more accurate. Accordingly, we also changed trans-QTL to distant-QTL.

https://doi.org/10.7554/eLife.03913.036

Article and author information

Author details

  1. Jia Wang

    1. National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, China
    2. National Center of Plant Gene Research, Huazhong Agricultural University, Wuhan, China
    Contribution
    JW, Acquisition of data, Analysis and interpretation of data, Drafting or revising the article
    Contributed equally with
    Wen Yao
    Competing interests
    The authors declare that no competing interests exist.
  2. Wen Yao

    1. National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, China
    2. National Center of Plant Gene Research, Huazhong Agricultural University, Wuhan, China
    Contribution
    WY, Analysis and interpretation of data, Drafting or revising the article
    Contributed equally with
    Jia Wang
    Competing interests
    The authors declare that no competing interests exist.
  3. Dan Zhu

    1. National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, China
    2. National Center of Plant Gene Research, Huazhong Agricultural University, Wuhan, China
    Contribution
    DZ, Acquisition of data
    Competing interests
    The authors declare that no competing interests exist.
  4. Weibo Xie

    1. National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, China
    2. National Center of Plant Gene Research, Huazhong Agricultural University, Wuhan, China
    Contribution
    WX, Analysis and interpretation of data
    Competing interests
    The authors declare that no competing interests exist.
  5. Qifa Zhang

    1. National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, China
    2. National Center of Plant Gene Research, Huazhong Agricultural University, Wuhan, China
    Contribution
    QZ, Conception and design, Analysis and interpretation of data, Drafting or revising the article
    For correspondence
    qifazh@mail.hzau.edu.cn
    Competing interests
    The authors declare that no competing interests exist.

Funding

National Natural Science Foundation of China (31330039)

  • Qifa Zhang

The 863 Program of China (2012AA10A304)

  • Qifa Zhang

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Reviewing Editor

  1. Justin Borevitz, Australian National University, Australia

Publication history

  1. Received: July 7, 2014
  2. Accepted: March 30, 2015
  3. Accepted Manuscript published: March 30, 2015 (version 1)
  4. Version of Record published: May 5, 2015 (version 2)

Copyright

© 2015, Wang et al.

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.

Metrics

  • 1,359
    Page views
  • 308
    Downloads
  • 3
    Citations

Article citation count generated by polling the highest count across the following sources: Scopus, PubMed Central, Crossref.

Download links

A two-part list of links to download the article, or parts of the article, in various formats.

Downloads (link to download the article as PDF)

Download citations (links to download the citations from this article in formats compatible with various reference manager tools)

Open citations (links to open the citations from this article in various online reference manager services)

Further reading

    1. Evolutionary Biology
    2. Genetics and Genomics
    M Florencia Camus et al.
    Research Article
    1. Evolutionary Biology
    2. Genetics and Genomics
    Lauren N Booth et al.
    Research Article Updated