Predicting effective microRNA target sites in mammalian mRNAs

  1. Vikram Agarwal
  2. George W Bell
  3. Jin-Wu Nam
  4. David P Bartel (corresponding author)
  1. Howard Hughes Medical Institute, Whitehead Institute for Biomedical Research, United States
  2. Massachusetts Institute of Technology, United States
  3. Whitehead Institute for Biomedical Research, United States
  4. Hanyang University, Korea
7 figures, 3 tables and 3 additional files

Figures

Figure 1 with 5 supplements
Inefficacy of recently reported non-canonical sites.

(A) Response of mRNAs to the loss of miRNAs, comparing mRNAs that contain either a canonical or nucleation-bulge site to miR-430 to those that do not contain a miR-430 site. Plotted are cumulative distributions of mRNA fold changes observed when comparing embryos that lack miRNAs (MZDicer) to those that have miRNAs (WT), focusing on mRNAs possessing a single site of the indicated type in their 3′ UTR. Similarity of site-containing distributions to the no-site distribution was tested (one-sided Kolmogorov–Smirnov [K–S] test, P values); the number of mRNAs analyzed in each category is listed in parentheses. See also Figure 1—figure supplement 1C and Figure 1—figure supplement 4A. (B and C) Response of mRNAs to the loss of miR-155, focusing on mRNAs that contain either a single canonical or ≥1 CLIP-supported non-canonical site to miR-155. These panels are as in (A), but compare fold changes for mRNAs with the indicated site type following genetic ablation of mir-155 in either T cells (B) or Th1 cells (C). See also Figure 1—figure supplement 2. (D and E) Response of mRNAs to the knockdown of miR-92a, focusing on mRNAs that contain either a single canonical or ≥1 CLASH-identified non-canonical site to miR-92a. These panels are as in (A), except CLASH-supported non-canonical sites were the same as those defined previously (Helwak et al., 2013) and thus were permitted to reside in any region of the mature mRNA, and these panels compare fold changes for mRNAs with the indicated site type following either knockdown of miR-92a (D) or combined knockdown of miR-92a and 24 other miRNAs (E) in HEK293 cells. See also Figure 1—figure supplement 3A,B. (F) As in (D), but focusing on mRNAs that contain ≥1 chimera-identified site. See also Figure 1—figure supplement 3C–E and Figure 1—figure supplement 4B. (G) Response of mRNAs to the transfection of 16 miRNAs, focusing on mRNAs that contain either a canonical or MIRZA-predicted non-canonical site. 
This panel is as in (A), but compares the fold changes for mRNAs with the indicated site type after introducing miRNAs, aggregating results from 16 individual transfection datasets. Fold changes are plotted for the top 100 non-canonical predictions for each of 16 miRNAs compiled either before (MIRZA, top 100) or after (MIRZA, no 6mers) removing mRNAs containing 6mer or offset-6mer 3′-UTR sites. (H) Response of mRNAs to a transfection of miR-522, focusing on mRNAs that contain either a single canonical or ≥1 IMPACT-seq–supported non-canonical site to miR-522. These panels are as in (A), except IMPACT-seq–supported non-canonical sites were the same as those defined previously (Tan et al., 2014) and thus were permitted in any region of the mature mRNA. (I) Response of ribosomes to the loss of miR-155, focusing on mRNAs that contain either a single canonical or ≥1 CLIP-supported non-canonical site to miR-155. This panel is as in (B and C) but compares the response of mRNAs using ribosome-footprint profiling (Eichhorn et al., 2014) after genetic ablation of mir-155 in B cells. Ribosome-footprint profiling captures changes in both mRNA stability and translational efficiency through the high-throughput sequencing of ribosome-protected mRNA fragments (RPFs).
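The comparisons in these panels rest on cumulative distributions of fold changes and a one-sided K–S test. A minimal Python sketch of that comparison follows. The log2 fold-change values are hypothetical, and the p value uses the large-sample approximation for the one-sided statistic, which is an assumption of this sketch rather than the procedure necessarily used for the figure (the approximation is rough for samples this small).

```python
import math

def ecdf(sample, x):
    """Empirical cumulative distribution: fraction of values <= x."""
    return sum(v <= x for v in sample) / len(sample)

def ks_one_sided(first, second):
    """One-sided two-sample K-S statistic D+ = max_x [F_first(x) - F_second(x)].

    A large D+ means `first` is shifted toward lower values than `second`.
    The p value is the asymptotic approximation exp(-2 * D+^2 * n*m/(n+m)).
    """
    points = sorted(set(first) | set(second))
    d_plus = max(0.0, max(ecdf(first, x) - ecdf(second, x) for x in points))
    n, m = len(first), len(second)
    p_approx = math.exp(-2.0 * d_plus ** 2 * n * m / (n + m))
    return d_plus, p_approx

# Hypothetical log2 fold changes after introducing a miRNA (placeholders only).
site_fc = [-0.8, -0.5, -0.4, -0.3, -0.1]
no_site_fc = [-0.2, -0.1, 0.0, 0.1, 0.2]
d_plus, p_approx = ks_one_sided(site_fc, no_site_fc)
```

For loss-of-miRNA experiments, in which targets are expected to shift toward higher values, the two arguments would be swapped.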

https://doi.org/10.7554/eLife.05005.003
Figure 1—figure supplement 1
Inefficacy of nucleation-bulge sites.

(A and B) These panels are as in Figure 1A but compare the response of cognate site-containing mRNAs in a compendium of either 11 miRNA transfection datasets (A) or 74 sRNA transfection datasets (B). The datasets were pre-processed (Figure 3) and are provided in Supplementary file 1. (C) This panel is as in Figure 1A but compares the response of mRNAs in MZDicer embryos in which miR-430 has been injected. (D–F) These panels are as in Figure 1A but compare the response of mRNAs with the indicated miR-124 site types after transfecting miR-124 into HEK293 cells (D), HeLa cells (E), or Huh7 cells (F).

https://doi.org/10.7554/eLife.05005.004
Figure 1—figure supplement 2
Inefficacy of CLIP-supported non-canonical miR-155 sites.

(A and B) These panels are as in Figure 1B but compare the response of mRNAs after genetic ablation of miR-155 in Type 2 helper T cells (Th2, A) or B cells (B).

https://doi.org/10.7554/eLife.05005.005
Figure 1—figure supplement 3
Inefficacy of CLASH- and chimera-supported non-canonical sites.

(A–D) These panels are as in Figure 1D but compare the response of mRNAs with sites cognate to any one of four miRNA families (miR-15/16, miR-19, miR-17/20/93/106, or miR-25/92), for either all CLASH-supported targets (A), mRNAs with CLASH-supported 3′-UTR sites (B), all chimera-supported targets (C), or mRNAs with chimera-supported 3′-UTR sites (D). These four miRNA families were chosen because their predicted targets were the most responsive to knockdown of the 25 miRNAs. p values reflect the median p value (as evaluated by a K–S test) across 100 trials in which a no-site control cohort with matched 3′-UTR lengths was chosen for each site-containing distribution. Length-matched no-site controls were required for this analysis because longer 3′ UTRs had a greater chance of containing additional sites to at least one of the many miRNAs that were knocked down, and thus had a greater chance of being derepressed as a result of interactions otherwise not considered in the analysis. To populate each control cohort, 500 different no-site mRNAs were chosen, considering the 3′-UTR length of each site-containing mRNA and selecting (without replacement) control mRNAs from among the 10 no-site mRNAs with the most similar 3′-UTR lengths. Shown is the response of a control cohort for mRNAs containing non-canonical sites. mRNAs with 3′ UTRs >2000 nt were excluded from the analysis because so many of the 3′ UTRs >2000 nt had a site to at least one of the four miRNA families, making it impossible to select appropriate length-matched controls. (E) This panel is as in Figure 1F but compares the response of mRNAs with the indicated miR-302 site types after knocking down miR-302/367 in hESCs.
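The length-matching step described in this legend can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' code: the function name, the mRNA identifiers, and the lengths are hypothetical, and the sketch assumes that the 10 nearest candidates are taken from the no-site mRNAs not yet used in the cohort.

```python
import random

def pick_length_matched_controls(site_lengths, no_site_lengths, n_nearest=10, rng=None):
    """For each site-containing mRNA, draw one unused no-site mRNA at random
    from the n_nearest remaining no-site mRNAs with the most similar 3'-UTR length.

    site_lengths: list of 3'-UTR lengths of site-containing mRNAs.
    no_site_lengths: dict mapping no-site mRNA id -> 3'-UTR length.
    """
    rng = rng or random.Random()
    available = dict(no_site_lengths)  # copy, so selection is without replacement
    chosen = []
    for length in site_lengths:
        if not available:
            break  # ran out of candidate controls
        nearest = sorted(
            available, key=lambda g: (abs(available[g] - length), g)
        )[:n_nearest]
        pick = rng.choice(nearest)
        chosen.append(pick)
        del available[pick]
    return chosen

# Hypothetical inputs: 40 no-site mRNAs with 3'-UTR lengths of 100, 150, ..., 2050 nt.
no_site = {f"ctrl{i}": 100 + 50 * i for i in range(40)}
site_utr_lengths = [300, 310, 1500]
cohort = pick_length_matched_controls(site_utr_lengths, no_site, rng=random.Random(0))
```

Repeating the draw with different random seeds would give the repeated trials over which a median p value could be taken.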

https://doi.org/10.7554/eLife.05005.006
Figure 1—figure supplement 4
Inefficacy of non-canonical sites in mediating translational repression.

(A) This panel is as in Figure 1A but compares the response of mRNAs using ribosome footprint profiling (Bazzini et al., 2012), which captures changes in both mRNA stability and translational efficiency through the high-throughput sequencing of ribosome-protected mRNA fragments (RPFs). (B) This panel is as in (Figure 1I) but compares protein fold changes for chimera-supported targets, as evaluated by pulsed SILAC (Selbach et al., 2008) after transfection of miR-155 in HeLa cells.

https://doi.org/10.7554/eLife.05005.007
Figure 1—figure supplement 5
Re-evaluating conservation of chimera-supported non-canonical sites.

(A) Conservation of chimera-supported non-canonical sites detected in an analysis modeled after that of Grosswendt et al. (2014) but modified to control for background conservation. Plotted for the indicated miRNAs is the average conservation of chimera-supported non-canonical sites, as measured by branch-length score (BLS), compared to the average conservation of 100 equally sized cohorts of controls; error bars, standard deviation of cohort averages; **, p < 0.01; *, p < 0.05, one-sided Z test. We considered chimera-supported non-canonical sites that mapped within 3′ UTRs and contained a single mismatch to the 6 nt seed of the miRNA. This set of sites mirrored that analyzed previously (Grosswendt et al., 2014), and excluded offset 6mers, which as a class was already known to mediate repression and exhibit preferential conservation (Friedman et al., 2009). Cohorts of control sites were generated such that for each chimera-supported site, each control cohort contained a single example of the identical 6 nt motif that was present in the indicated region (either an AGO cluster or 3′ UTR) but not supported by chimeric reads. To control for local background conservation and thereby avoid treating sites within slowly evolving 3′ UTRs the same as those within rapidly evolving 3′ UTRs, we used the binning procedure developed for calculating PCT scores (Friedman et al., 2009); 3′ UTRs were partitioned into 10 conservation bins (based on the median BLS of the nucleotides of the human sequence), and control sites were randomly selected (with replacement) from 3′ UTRs in the same bin as the actual site. Control AGO clusters were collected as was done previously (Grosswendt et al., 2014), using genome-wide data downloaded from clipz.unibas.ch and derived from multiple AGO PAR-CLIP experiments performed in HEK293 cells (Kishore et al., 2011). 
The union of AGO clusters for all experiments was computed and filtered for overlap with Ensembl-annotated 3′ UTRs, using the ‘merge’ and ‘intersectBED’ utilities, respectively, found in BEDTools v2.20.1 (parameter ‘-s’) (Quinlan and Hall, 2010). (B) Attribution of the conservation signal to the confounding effects of conserved regions. Considered are 1443 non-canonical chimera-supported sites selected as in (A) but including sites of all miRNA families. For each chimera-supported site, a z score was generated using the distribution of BLSs for 100 control sites chosen as in panel (A) from either AGO clusters or 3′ UTRs, as indicated. Each z score reflected how the conservation of the actual site differed from that of its controls. Compared are cumulative distributions of the z scores for sites of broadly conserved miRNAs and those of less conserved miRNAs, using the previously defined sets of broadly and less conserved miRNAs (Friedman et al., 2009). If the chimera-supported non-canonical sites were preferentially conserved because of their function in mediating repression, then sites of broadly conserved miRNAs would be expected to have a right-shifted distribution compared to sites of less conserved miRNAs. However, no significant difference was discerned between each pair of z-score distributions. The remainder of this legend outlines the rationale for the analysis of this panel. One way to reconcile the conservation signal observed in panel (A) with our conclusion that a large majority if not all of these sites bind miRNA but do not mediate repression is to consider the potentially confounding biochemical properties of conserved regions, which are illustrated by the observation that artificial siRNAs preferentially target sites that are evolutionarily conserved over those that are not (Nielsen et al., 2007). 
Because these siRNAs are not natural (and do not share a seed with conserved miRNAs) the evolutionary conservation of these preferred sites could not have arisen because they function to mediate sRNA-guided repression. Instead, some other function of these 3′-UTR regions, such as greater accessibility to RNA-binding factors, must explain their preferential conservation and also endow them with properties that favor sRNA binding (Nielsen et al., 2007). To examine whether confounding properties of conserved 3′-UTR regions might similarly explain the elevated conservation of chimera-supported sites, we compared the z scores for sites bound by broadly conserved miRNAs (miRNAs in families conserved beyond mammals, as listed in TargetScan7) with those bound by less conserved miRNAs. MicroRNAs conserved among mammals but not more broadly were grouped with the less conserved miRNAs because canonical 6mer and 7mer sites to these miRNAs have no conservation signal above background, presumably because these miRNAs have not been present long enough for the number of preferentially conserved 6mer and 7mer sites to rise above the background (Friedman et al., 2009); we reasoned that the same would be true of non-canonical sites, to the extent that any are preferentially conserved. If the conservation signal observed in panel (A) were related to miRNA binding, we would have expected a difference between the scores for the sites of broadly and less conserved miRNAs. The lack of a significant difference supports the idea that chimera-supported non-canonical sites tend to be conserved for the same reason that functional sites to artificial siRNAs tend to be conserved.
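The binning and z-score steps described in this legend can be sketched as below. The sketch assumes equal-count conservation bins and a pre-assembled pool of control-site BLS values per bin; both are simplifications introduced here, and all names and numbers are hypothetical placeholders.

```python
import random
import statistics

def assign_conservation_bins(utr_median_bls, n_bins=10):
    """Map each 3' UTR id to a bin index (0..n_bins-1) by the rank of its median BLS.
    Equal-count bins are an assumption of this sketch."""
    ranked = sorted(utr_median_bls, key=lambda u: (utr_median_bls[u], u))
    return {u: i * n_bins // len(ranked) for i, u in enumerate(ranked)}

def site_z_score(site_bls, site_bin, controls_by_bin, n_controls=100, rng=None):
    """z score of one site's BLS relative to control sites drawn, with
    replacement, from 3' UTRs in the same conservation bin."""
    rng = rng or random.Random()
    draws = [rng.choice(controls_by_bin[site_bin]) for _ in range(n_controls)]
    sd = statistics.stdev(draws)
    if sd == 0:
        return 0.0  # degenerate case: every drawn control has the same BLS
    return (site_bls - statistics.mean(draws)) / sd

# Hypothetical data: 20 UTRs with increasing median BLS, identical control pools.
utr_bls = {f"utr{i}": i / 20 for i in range(20)}
bins = assign_conservation_bins(utr_bls)
controls = {b: [0.1, 0.2, 0.3] for b in range(10)}
z = site_z_score(0.9, bins["utr7"], controls, rng=random.Random(1))
```

A positive z score indicates a site that is more conserved than its bin-matched controls; the panel then compares the distributions of such scores between miRNA classes.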

https://doi.org/10.7554/eLife.05005.008
Figure 2 with 2 supplements
Confirmation of experimentally identified non-canonical miRNA binding sites.

(A) Sequence logos corresponding to motifs enriched in dCLIP clusters that either appear following transfection of miR-124 into HeLa cells (Chi et al., 2009) (top) or disappear following knockout of miR-155 in T cells (Loeb et al., 2012) (bottom). Shown to the right of each logo is its E-value among clusters lacking a seed-matched or offset-6mer canonical site and the fraction of these clusters that matched the logo. Shown below each logo are the complementary regions of the cognate miRNA family, highlighting nucleotides 2–8 in capital letters. (B) Position of the top-ranked motif corresponding to non-canonical sites enriched in CLASH (Helwak et al., 2013) (left) or chimera (Grosswendt et al., 2014) (right) data for each human miRNA family supported by at least 50 interactions without a seed-matched or offset-6mer canonical site. For each family the most enriched logo was aligned to the reverse complement of the miRNA. In cases in which a logo mapped to multiple positions along the miRNA, the positions with the best and second best scores are indicated (red and blue, respectively). (C) Sequence logos of motifs enriched in chimera interactions that lack canonical sites. As in (A), but displaying sequence logos identified in the chimera data of panel (B) for a sample of nine human miRNAs. Logos identified for the other human miRNAs are also provided (Figure 2—figure supplement 1B). A nucleotide that differs between miRNA family members is indicated as a black ‘n’.

https://doi.org/10.7554/eLife.05005.009
Figure 2—figure supplement 1
Comparison of CLASH and chimera data and identification of motifs enriched in human chimera interactions that lack canonical sites.

(A) Comparison of CLASH (left) and chimera (right) reads from human cells, showing the proportion possessing a canonical site (blue) and overlapping 3′ UTRs (red). In total, 18,514 CLASH and 10,567 chimera interactions were analyzed. (B) Sequence logos of motifs enriched in chimera interactions that lack canonical sites. This panel is as in Figure 2C but displays the remaining motifs identified from the chimera data analyzed in Figure 2B. In cases of alignment ambiguity, both alignments are shown below the logo. For some miRNA families, multiple motifs were significantly enriched (E ≤ 0.001) and are shown separately. Significantly enriched motifs (or a top-ranked motif matching the miRNA) were not found for miR-21, and miR-3168 was excluded from the analysis due to poor support for its authenticity as a miRNA. (C) Sequence logos of motifs that do not match the cognate miRNA but are nonetheless enriched in miR-124 dCLIP (Chi et al., 2009) and miR-522 IMPACT-seq (Tan et al., 2014) clusters that lack canonical sites to the miRNA. The miR-124 logo was nearly identical to a non-specific motif previously identified as enriched in CLIP data from the mouse brain (Chi et al., 2012). The miR-522 logo was found instead of the previously reported miRNA-matching logo (Tan et al., 2014).

https://doi.org/10.7554/eLife.05005.010
Figure 2—figure supplement 2
Identification of motifs enriched in mouse and nematode chimera interactions that lack canonical sites.

(A) Sequence logos of motifs enriched in M. musculus chimera interactions that lack canonical sites; otherwise as in Figure 2C. Significantly enriched motifs (or a top-ranked motif matching the miRNA) were not found for let-7 and miR-142-3p. (B) Sequence logos of motifs enriched in C. elegans chimera interactions that lack canonical sites; otherwise as in Figure 2C. Significantly enriched motifs (or a top-ranked motif matching the miRNA) were not found for miR-1.

https://doi.org/10.7554/eLife.05005.011
Figure 3 with 1 supplement
Pre-processing the microarray datasets to minimize nonspecific effects and technical biases.

(A) Example of the correlated response of mRNAs after transfecting two unrelated sRNAs (sRNA 1 and 2, respectively). Results for mRNAs containing at least one canonical 7–8 nt 3′-UTR site for either sRNA 1, sRNA 2, or both sRNAs are highlighted in red, blue, and green, respectively. Values for mRNAs without such sites are in grey. All mRNAs were used to calculate the Spearman correlation (r_s). (B) Correlated responses observed in a compendium of 74 transfection experiments from six studies (colored as indicated in the publications list). For each pair of experiments, the r_s value was calculated as in panel (A), colored as indicated in the key, and used for hierarchical clustering. (C) Study-dependent relationships between the responses of mRNAs to the transfected sRNA and either 3′-UTR length or 3′-UTR AU content, focusing on mRNAs without a canonical 7–8 nt 3′-UTR site to the sRNA. Boxplots indicate the median r_s (bar), 25th and 75th percentiles (box), and the minimum of either 1.5 times the interquartile range or the most extreme data point (whiskers), with the width of the box proportional to the number of datasets used from each study. The studies are colored as in panel (B), abbreviating the first author and year. (D) Reduced correlation between the responses of mRNAs to unrelated sRNAs after applying the PLSR technique. This panel is as in (A) but plots the normalized mRNA fold changes. (E) Reduced correlations in results of the compendium experiments after applying the PLSR technique. This panel is as in (B) but plots the correlations after normalizing the mRNA fold changes. (F) Reduced study-dependent relationships between mRNA responses and either 3′-UTR length or 3′-UTR AU content. This panel is as in (C) but plots the correlations after normalizing the mRNA fold changes. 
(G and H) Cumulative distributions of fold changes for mRNAs containing at least one canonical 7–8 nt 3′-UTR site or no site either before normalization (raw) or after normalization (normalized). Panel (G) plots the results from experiments shown in (A) and (D), and (H) plots results from all 74 datasets.
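The Spearman correlation used in panels (A), (B), (D), and (E) is the Pearson correlation of ranks. A small self-contained Python sketch is given below for orientation; the fold-change vectors are hypothetical, and the PLSR normalization itself is not reproduced here.

```python
import math

def average_ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical log2 fold changes of the same four mRNAs in two transfections.
fc_srna1 = [-0.5, -0.2, 0.0, 0.3]
fc_srna2 = [-0.4, -0.3, 0.1, 0.2]
r_s = spearman(fc_srna1, fc_srna2)
```

Computing this value for every pair of experiments would give the matrix that is clustered in panels (B) and (E).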

https://doi.org/10.7554/eLife.05005.012
Figure 3—figure supplement 1
Reduced biases from derepression of endogenous miRNA targets.

(A) Pie chart reflecting the relative proportions of reads for the indicated miRNA families observed when sequencing small RNAs from HeLa cells. Relative miRNA levels were quantified as described previously (Denzler et al., 2014). (B and C) Cumulative distributions of fold changes for mRNAs with at least one canonical 7–8 nt 3′-UTR site to the indicated miRNA family in the compendium of 74 sRNA transfection datasets, either before (B) or after (C) normalization. p values were computed using a one-sided Wilcoxon rank-sum test, comparing each of the site-containing distributions to the no-site distribution. This test was a more stringent alternative to the K–S test, which led to highly significant p values for very slight differences, due to the large number of mRNAs in each distribution. To account for multiple hypotheses, an appropriate Bonferroni-corrected significance threshold would be p < 0.005, which was not achieved for most comparisons in panel (C).

https://doi.org/10.7554/eLife.05005.013
Figure 4
Developing a regression model to predict miRNA targeting efficacy.

(A) Optimizing the scoring of predicted structural accessibility. Predicted RNA structural accessibility scores were computed for variable-length windows within the region centered on each canonical 7–8 nt 3′-UTR site. The heatmap displays the partial correlations between these values and the repression associated with the corresponding sites, determined while controlling for local AU content and other features of the context+ model (Garcia et al., 2011). (B) Performance of the models generated using stepwise regression compared to that of either the context-only or context+ models. Shown are boxplots of r² values for each of the models across all 1000 sampled test sets, for mRNAs possessing a single site of the indicated type. For each site type, all groups significantly differ (P < 10^−15, paired Wilcoxon signed-rank test). Boxplots are as in Figure 3C. (C) The contributions of site type and each of the 14 features of the context++ model. For each site type, the coefficients for the multiple linear regression are plotted for each feature. Because features are each scored on a similar scale, the relative contribution of each feature in discriminating between more or less effective sites is roughly proportional to the absolute value of its coefficient. Also plotted are the intercepts, which roughly indicate the discriminatory power of site type. Dashed bars indicate the 95% confidence intervals of each coefficient.
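As a reminder of what a partial correlation measures, the single-covariate case can be written in a few lines of Python. This is a simplification: the analysis in panel (A) controls for several features at once, which would instead require correlating regression residuals. The input correlations below are arbitrary illustrative numbers.

```python
import math

def partial_correlation(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y, controlling for one covariate z:
    (r_xy - r_xz * r_yz) / sqrt((1 - r_xz^2) * (1 - r_yz^2))."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# x: accessibility score, y: repression, z: local AU content (illustrative values only).
r_partial = partial_correlation(0.5, 0.5, 0.5)
```

When the covariate is uncorrelated with both variables, the partial correlation reduces to the ordinary correlation.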

https://doi.org/10.7554/eLife.05005.015
Figure 4—source data 1

Coefficients of the trained context++ model corresponding to each site type.

Using these coefficients and corresponding scaling factors (Table 3), context++ scores can be computed essentially as illustrated in Supplementary Figure 5 of Garcia et al. (2011).
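Because the model is a multiple linear regression fit separately for each site type, a score has the general form of an intercept plus a sum of coefficient-weighted, scaled features. The Python sketch below shows only that form. The two-parameter scaling (lo, hi), the feature subset, and every number are hypothetical placeholders and are not values from Table 3 or the source data.

```python
def context_score(site_type, raw_features, coefficients, scaling):
    """Linear score: intercept + sum(coefficient * scaled feature value).

    coefficients[site_type]: dict with an "intercept" and one entry per feature.
    scaling[site_type][feature]: (lo, hi) pair used as (value - lo) / (hi - lo);
    this scaling form is an assumption of the sketch.
    """
    model = coefficients[site_type]
    score = model["intercept"]
    for name, value in raw_features.items():
        lo, hi = scaling[site_type][name]
        score += model[name] * (value - lo) / (hi - lo)
    return score

# Placeholder parameters for illustration only.
coefficients = {"8mer": {"intercept": -0.5, "local_AU": -0.2, "len_3UTR": 0.1}}
scaling = {"8mer": {"local_AU": (0.3, 0.8), "len_3UTR": (2.0, 4.0)}}
features = {"local_AU": 0.55, "len_3UTR": 3.0}
score = context_score("8mer", features, coefficients, scaling)
```

Keeping coefficients and scaling factors in separate lookups mirrors the way the legend refers to them as separate resources.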

https://doi.org/10.7554/eLife.05005.016
Figure 5 with 1 supplement
Performance of target prediction algorithms on a test set of seven experiments in which miRNAs were individually transfected into HCT116 cells.

(A) Average number of targets predicted by the indicated algorithm for each of the seven miRNAs in the test set (let-7c, miR-16, miR-103, miR-106b, miR-200b, miR-200a, and miR-215). The numbers of predictions with at least one canonical 7–8 nt 3′-UTR site to the transfected miRNA (dark blue) are distinguished from the remaining predictions (light blue). Names of algorithms are colored according to whether they consider only sequence or thermodynamic features of site pairing (grey), only site conservation (orange), pairing and contextual features of a site (red), or pairing, contextual features, and site conservation (purple). The most recently updated predictions were downloaded, with the year in which those predictions were released indicated in parentheses. (B and C) Extent to which the predictions explain the mRNA fold changes observed in the test set. For predictions tallied in panel (A), the explanatory power, as evaluated by the r² value for the relationship between the scores of the predictions and the observed mRNA fold changes in the test set, is plotted for either mRNAs with 3′ UTRs containing at least one canonical 7–8 nt 3′-UTR site (B) or other mRNAs (C). Algorithms designed to evaluate only targets with seed-matched 7–8 nt 3′-UTR sites are labeled ‘N/A’ in (C). (D) Repression of the top predictions of the context++ model and of our previous two models, focusing on an average of 16 top predicted targets per miRNA in the test set. The dotted lines indicate the median fold-change value for each distribution, otherwise as in Figure 1A. (E and F) Median mRNA fold changes observed in the test set for top-ranked predicted targets, considering either all predictions (E) or only those with 3′ UTRs lacking a canonical 7–8 nt site (F). For each algorithm listed in panel (A), all reported predictions for the seven miRNAs were ranked according to their scores, and the indicated sliding threshold of top predictions was implemented. 
For example, at the threshold of 4, the 28 predictions with the top scores were identified (an average of 4 predictions per miRNA, allowing miRNAs with more top scores to contribute more predictions), mRNA fold-change values from the cognate transfections were collected, and the median value was plotted. When the threshold exceeded the number of reported predictions, no value was plotted. Also plotted is the median mRNA fold change for all mRNAs with at least one cognate canonical 7–8 nt site in their 3′ UTR (dashed line; an average of 1366 mRNAs per miRNA), the median fold change for all mRNAs with at least one conserved cognate canonical 7–8 nt site in their 3′ UTR (dotted line; an average of 461 mRNAs per miRNA), and the 95% interval for the median fold change of randomly selected mRNAs, determined using 1000 resamplings (without replacement) at each cutoff (shading). Conserved sites were defined as in TargetScan6, with conservation cutoffs for each site type set at different branch-length scores (cutoffs of 0.8, 1.3, and 1.6 for 8mer, 7mer-m8, and 7mer-A1 sites, respectively).
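The sliding-threshold evaluation in panels (E) and (F) can be sketched in Python as follows. The sketch pools predictions across miRNAs, ranks them by score, and takes the median fold change of the top (threshold × number of miRNAs) entries. It assumes that lower scores denote stronger predicted repression, which is a convention chosen for this illustration; the function name and data are hypothetical.

```python
import statistics

def median_fc_of_top(predictions, threshold, n_mirnas):
    """Median fold change of the top-ranked predictions.

    predictions: list of (score, log2_fold_change) pairs pooled over all miRNAs.
    threshold: average number of top predictions per miRNA.
    Returns None when the threshold exceeds the number of reported predictions,
    mirroring the legend's "no value was plotted".
    """
    n_top = threshold * n_mirnas  # e.g. threshold 4 with 7 miRNAs gives 28
    if n_top > len(predictions):
        return None
    ranked = sorted(predictions, key=lambda p: p[0])  # lowest score first (assumption)
    return statistics.median(fc for _, fc in ranked[:n_top])

# Hypothetical pooled predictions for a single miRNA.
pooled = [(-0.9, -1.0), (-0.8, -0.6), (-0.1, 0.1), (-0.7, -0.8)]
top3_median = median_fc_of_top(pooled, threshold=3, n_mirnas=1)
too_many = median_fc_of_top(pooled, threshold=5, n_mirnas=1)
```

Because ranking is done on the pooled list, miRNAs with more high-scoring predictions contribute more entries, as the legend notes.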

https://doi.org/10.7554/eLife.05005.017
Figure 5—figure supplement 1
Performance of miRNA prediction algorithms on the test set.

(A) This panel is as in Figure 5D, but shows the results for all algorithms evaluated in Figure 5A. Algorithm names are listed in the order of the median fold change for their top predictions, with each name colored using the color used for its cumulative distribution. (B and C) These panels are as in Figure 5E–F, respectively, but compare mean fold changes instead of median fold changes.

https://doi.org/10.7554/eLife.05005.018
Figure 6
Response of predictions and mRNAs with experimentally supported canonical binding sites.

(A–E) Comparison of the top TargetScan7 predicted targets to mRNAs with canonical sites identified from dCLIP in either HeLa cells with and without transfected miR-124 (Chi et al., 2009) or lymphocytes with and without miR-155 (Loeb et al., 2012). Plotted are cumulative distributions of mRNA fold changes after transfection of miR-124 in HeLa cells (A), or after genetic ablation of miR-155 in T cells (B), Th1 cells (C), Th2 cells (D), or B cells (E) (one-sided K–S test, P values). For genes with alternative last exons, the analysis considered the score of the most abundant alternative last exon, as assessed by 3P-seq tags (as is the default for TargetScan7 when ranking predictions). Each dCLIP-identified mRNA was required to have a 3′-UTR CLIP cluster with at least one canonical site to the cognate miRNA (including 6mers but not offset 6mers). Each intersection mRNA (red) was found in both the dCLIP set and top TargetScan7 set. Similarity between performance of the TargetScan7 and dCLIP sets (purple and green, respectively) and TargetScan7 and intersection sets (blue and red, respectively) was tested (two-sided K–S test, P values); the number of mRNAs analyzed in each category is in parentheses. TargetScan7 scores for mouse mRNAs were generated using human parameters for all features. (F–H) Comparison of top TargetScan7 predicted targets to mRNAs with canonical binding sites identified using photoactivatable-ribonucleoside-enhanced CLIP (PAR-CLIP) (Hafner et al., 2010; Lipchina et al., 2011). Plotted are cumulative distributions of mRNA fold changes after either transfecting miR-7 (F) or miR-124 (G) into HEK293 cells, or knocking down miR-302/367 in hESCs (H). Otherwise these panels are as in (A–E). (I) Comparison of top TargetScan7 predicted targets to mRNAs with canonical sites identified using CLASH (Helwak et al., 2013). Plotted are cumulative distributions of mRNA fold changes after knockdown of 25 miRNAs from 14 miRNA families in HEK293 cells. 
For each of these miRNA families, a cohort of top TargetScan7 predictions was chosen to match the number of mRNAs with CLASH-identified canonical sites, and the union of these TargetScan7 cohorts was analyzed. The total number of TargetScan7 predictions did not match the number of CLASH-identified targets due to slightly different overlap between mRNAs targeted by different miRNAs. Otherwise this panel is as in (A–E). (J) Comparison of top TargetScan7 predicted targets to mRNAs with chimera-identified canonical sites (Grosswendt et al., 2014). Otherwise this panel is as in (I). (K) Comparison of top TargetScan7 predicted targets to mRNAs with canonical binding sites within 3′ UTRs of mRNAs identified using pulldown-seq (Tan et al., 2014). Plotted are cumulative distributions of mRNA fold changes after transfecting miR-522 into triple-negative breast cancer (TNBC) cells. Otherwise this panel is as in (A–E). (L) Comparison of top TargetScan7 predicted targets to mRNAs with canonical sites identified using IMPACT-seq (Tan et al., 2014). Otherwise this panel is as in (K).

https://doi.org/10.7554/eLife.05005.019
Figure 7 with 1 supplement
Example display of TargetScan7 predictions.

The example shows a TargetScanHuman page for the 3′ UTR of the LRRC1 gene. At the top is the 3′-UTR profile, showing the relative expression of tandem 3′-UTR isoforms, as measured using 3P-seq (Nam et al., 2014). Shown on this profile is the end of the longest Gencode annotation (blue vertical line) and the total number of 3P-seq reads (339) used to generate the profile (labeled on the y-axis). Below the profile are predicted conserved sites for miRNAs broadly conserved among vertebrates (colored according to the key), with options to display conserved sites for mammalian conserved miRNAs, or poorly conserved sites for any set of miRNAs. Boxed are the predicted miR-124 sites, with the site selected by the user indicated with a darker box. The multiple sequence alignment shows the species in which an orthologous site can be detected (white highlighting) among representative vertebrate species, with the option to display site conservation among all 84 vertebrate species. Below the alignment is the predicted consequential pairing between the selected miRNA and its sites, showing also for each site its position, site type, context++ score, context++ score percentile, weighted context++ score, branch-length score, and PCT score.

https://doi.org/10.7554/eLife.05005.020
Figure 7—figure supplement 1
Flowchart of the computational pipeline used to build the TargetScan7 database.
https://doi.org/10.7554/eLife.05005.021

Tables

Table 1

The 26 features considered in the models, highlighting the 14 robustly selected through stepwise regression (bold)

https://doi.org/10.7554/eLife.05005.014
The last four columns give the frequency with which each feature was chosen for the indicated site type.

| Feature | Abbreviation | Description | 8mer | 7mer-m8 | 7mer-A1 | 6mer |
|---|---|---|---|---|---|---|
| miRNA | | | | | | |
| 3′-UTR target-site abundance * | TA_3UTR | Number of sites in all annotated 3′ UTRs (Arvey et al., 2010; Garcia et al., 2011) | 100% | 100% | 100% | 100% |
| ORF target-site abundance | TA_ORF | Number of sites in all annotated ORFs (Garcia et al., 2011) | 9.4% | 0.7% | 68.1% | 93.4% |
| Predicted seed-pairing stability * | SPS | Predicted thermodynamic stability of seed pairing (Garcia et al., 2011) | 100% | 100% | 100% | 100% |
| sRNA position 1 * | sRNA1 | Identity of nucleotide at position 1 of the sRNA | 68% | 100% | 99.7% | 97.7% |
| sRNA position 8 * | sRNA8 | Identity of nucleotide at position 8 of the sRNA | 0% | 0.8% | 100% | 100% |
| Site | | | | | | |
| Site position 1 | site1 | Identity of nucleotide at position 1 of the site | N/A | 57.1% | N/A | 2% |
| Site position 8 * | site8 | Identity of nucleotide at position 8 of the site | 0.8% | 95.1% | 99.4% | 100% |
| Site position 9 | site9 | Identity of nucleotide at position 9 of the site (Lewis et al., 2005; Nielsen et al., 2007) | 15.4% | 7.1% | 0.9% | 93.7% |
| Site position 10 | site10 | Identity of nucleotide at position 10 of the site (Nielsen et al., 2007) | 0.1% | 100% | 8.5% | 26.3% |
| Local AU content * | local_AU | AU content near the site (Grimson et al., 2007; Nielsen et al., 2007) | 100% | 100% | 100% | 100% |
| 3′ supplementary pairing * | 3P_score | Supplementary pairing at the miRNA 3′ end (Grimson et al., 2007) | 42.5% | 100% | 100% | 100% |
| Distance from stop codon | dist_stop | log10(Distance of site from stop codon) | 62.4% | 10.8% | 8.7% | 25.7% |
| Predicted structural accessibility * | SA | log10(Probability that a 14 nt segment centered on the match to sRNA positions 7 and 8 is unpaired) | 100% | 100% | 100% | 100% |
| Minimum distance * | min_dist | log10(Minimum distance of site from stop codon or polyadenylation site) (Gaidatzis et al., 2007; Grimson et al., 2007; Majoros and Ohler, 2007) | 99.9% | 100% | 87.4% | 100% |
| Probability of conserved targeting * | PCT | Probability of site conservation, controlling for dinucleotide evolution and site context (Friedman et al., 2009) | 100% | 100% | 100% | 20.8% |
| mRNA | | | | | | |
| 5′-UTR length | len_5UTR | log10(Length of the 5′ UTR) | 98.2% | 8.2% | 4.6% | 17.2% |
| ORF length * | len_ORF | log10(Length of the ORF) | 100% | 100% | 100% | 100% |
| 3′-UTR length * | len_3UTR | log10(Length of the 3′ UTR) (Hausser et al., 2009) | 100% | 100% | 100% | 100% |
| 5′-UTR AU content | AU_5UTR | Fraction of AU nucleotides in the 5′ UTR | 13% | 38.9% | 91.1% | 31.3% |
| ORF AU content | AU_ORF | Fraction of AU nucleotides in the ORF | 1.2% | 72.4% | 28.4% | 35.8% |
| 3′-UTR AU content | AU_3UTR | Fraction of AU nucleotides in the 3′ UTR (Robins and Press, 2005; Hausser et al., 2009) | 5.4% | 73.3% | 65.3% | 80.6% |
| 3′-UTR offset-6mer sites * | off6m | Number of offset-6mer sites in the 3′ UTR (Friedman et al., 2009) | 65.9% | 89.6% | 99.8% | 100% |
| ORF 8mer sites * | ORF8m | Number of 8mer sites in the ORF (Lewis et al., 2005; Reczko et al., 2012) | 99.5% | 99.1% | 100% | 100% |
| ORF 7mer-m8 sites | ORF7m8 | Number of 7mer-m8 sites in the ORF (Reczko et al., 2012) | 4.7% | 4.3% | 85.3% | 100% |
| ORF 7mer-A1 sites | ORF7A1 | Number of 7mer-A1 sites in the ORF (Reczko et al., 2012) | 68.4% | 34.2% | 97.8% | 98.4% |
| ORF 6mer sites | ORF6m | Number of 6mer sites in the ORF (Reczko et al., 2012) | 91% | 13.3% | 0.7% | 36.7% |

* Robustly selected feature (shown in bold in the original table).
  1. The feature description does not include the scaling performed (Table 3) to generate more comparable regression coefficients.
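
Several of the mRNA-level features in Table 1 are simple sequence statistics. As a minimal sketch (the function names and the toy sequence are hypothetical, and this is not the authors' pipeline), the log10 length and the AU fraction of a region could be computed as follows:

```python
import math


def log10_length(seq: str) -> float:
    """log10 of a region's length, as described for len_5UTR, len_ORF and len_3UTR."""
    if not seq:
        raise ValueError("sequence must be non-empty")
    return math.log10(len(seq))


def au_fraction(seq: str) -> float:
    """Fraction of A or U nucleotides, as described for AU_5UTR, AU_ORF and AU_3UTR.

    Assumption: T is counted as U so that DNA-style sequences are also handled.
    """
    if not seq:
        raise ValueError("sequence must be non-empty")
    s = seq.upper()
    return sum(base in "AUT" for base in s) / len(s)


toy_utr = "AUGCAUUUAA"  # hypothetical 10-nt sequence, for illustration only
toy_len = log10_length(toy_utr)
toy_au = au_fraction(toy_utr)
```

These raw values would still need the scaling referred to in the footnote before they are comparable to regression coefficients.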

Table 2

Summary of datasets analyzed in this study, and corresponding figures using the datasets

https://doi.org/10.7554/eLife.05005.022
Figure | Gene expression omnibus (GEO) ID, ArrayExpress ID, or data source | Reference
Figure 1A, Figure 1—figure supplement 4A | GSM854425, GSM854430, GSM854431, GSM854436, GSM854437, GSM854442, GSM854443 | (Bazzini et al., 2012)
Figure 1B, Figure 6B | GSM1012118, GSM1012119, GSM1012120, GSM1012121, GSM1012122, GSM1012123 | (Loeb et al., 2012)
Figure 1C, Figure 1—figure supplement 2A, Figure 6C,D | E-TABM-232 | (Rodriguez et al., 2007)
Figure 1D,F | GSM1122217, GSM1122218, GSM1122219, GSM1122220, GSM1122221, GSM1122222, GSM1122223, GSM1122224, GSM1122225, GSM1122226 | (Helwak et al., 2013)
Figure 1E, Figure 1—figure supplement 3A–D, Figure 6I,J | GSM538818, GSM538819, GSM538820, GSM538821 | (Hafner et al., 2010)
Figure 1G | GSM156524, GSM156532, GSM210897, GSM210898, GSM210901, GSM210903, GSM210904, GSM210907, GSM210909, GSM210911, GSM210913, GSM37599, http://psilac.mdc-berlin.de/download/ (let7b_32h, miR-30_32h, miR-155_32h, miR-16_32h) | (Lim et al., 2005; Grimson et al., 2007; Linsley et al., 2007; Selbach et al., 2008)
Figure 1H, Figure 6K,L | E-MTAB-2110 | (Tan et al., 2014)
Figure 1I, Figure 1—figure supplement 2B, Figure 6E | GSM1479572, GSM1479576, GSM1479580, GSM1479584 | (Eichhorn et al., 2014)
Figure 1—figure supplement 1A | GSM210897, GSM210898, GSM210901, GSM210903, GSM210904, GSM210907, GSM210909, GSM210911, GSM210913, GSM37599, GSM37601 | (Lim et al., 2005; Grimson et al., 2007)
Figure 1—figure supplement 1B, Figure 3, Figure 3—figure supplement 1B,C, Figure 4 | 74 datasets compiled in Supplementary data 4 of Garcia et al. (2011), used as is or after normalization (Supplementary file 1); GSM119707, GSM119708, GSM119710, GSM119743, GSM119745, GSM119746, GSM119747, GSM119749, GSM119750, GSM119759, GSM119761, GSM119762, GSM119763, GSM133685, GSM133689, GSM133699, GSM133700, GSM134325, GSM134327, GSM134466, GSM134480, GSM134483, GSM134485, GSM134511, GSM134512, GSM134551, GSM210897, GSM210898, GSM210901, GSM210903, GSM210904, GSM210907, GSM210909, GSM210911, GSM210913, GSM37599, GSM37601; E-MEXP-1402 (1595297366, 1595297383, 1595297389, 1595297394, 1595297399, 1595297422, 1595297427, 1595297432, 1595297491, 1595297496, 1595297501, 1595297507, 1595297513, 1595297518, 1595297524, 1595297530, 1595297535, 1595297564, 1595297588, 1595297595, 1595297605, 1595297614, 1595297621, 1595297627, 1595297644, 1595297650, 1595297662); E-MEXP-668 (16012097016666, 16012097016667, 16012097016668, 16012097016669, 16012097017938, 16012097017939, 16012097017952, 16012097017953, 16012097018568, 251209725411) | (Lim et al., 2005; Birmingham et al., 2006; Schwarz et al., 2006; Jackson et al., 2006a; Jackson et al., 2006b; Grimson et al., 2007; Anderson et al., 2008)
Figure 1—figure supplement 1C | GSM95614, GSM95615, GSM95616, GSM95617, GSM95618, GSM95619 | (Giraldez et al., 2006)
Figure 1—figure supplement 1D,F | GSM1269344, GSM1269345, GSM1269348, GSM1269349, GSM1269350, GSM1269351, GSM1269354, GSM1269355, GSM1269356, GSM1269357, GSM1269360, GSM1269361, GSM1269362, GSM1269363 | (Nam et al., 2014)
Figure 1—figure supplement 3E, Figure 6H | http://icb.med.cornell.edu/faculty/betel/lab/betelab_v1/Data.html | (Lipchina et al., 2011)
Figure 1—figure supplement 4B | http://psilac.mdc-berlin.de/media/database/release-1.0/protein/pSILAC_all_protein_ratios_OE.txt (miR155) | (Selbach et al., 2008)
Figure 3—figure supplement 1A | GSM416753 | (Mayr and Bartel, 2009)
Figure 5, Figure 5—figure supplement 1 | GSM156522, GSM156580, GSM156557, GSM156548, GSM156533, GSM156532, GSM156524, processed and normalized (Supplementary file 2) | (Linsley et al., 2007)
Figure 6A | GSM37601 | (Lim et al., 2005)
Figure 6F,G | GSM363763, GSM363766, GSM363769, GSM363772, GSM363775, GSM363778 | (Hausser et al., 2009)

Table 3

Scaling parameters used to normalize data to the (0, 1) interval

https://doi.org/10.7554/eLife.05005.023
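Feature | 8mer 5th % | 8mer 95th % | 7mer-m8 5th % | 7mer-m8 95th % | 7mer-A1 5th % | 7mer-A1 95th % | 6mer 5th % | 6mer 95th %
3P_score | 1.000 | 3.500 | 1.000 | 3.500 | 1.000 | 3.500 | 1.000 | 3.500
SPS | −11.130 | −5.520 | −11.130 | −5.490 | −8.410 | −3.330 | −8.570 | −3.330
TA_3UTR | 3.113 | 3.865 | 3.067 | 3.887 | 3.145 | 3.887 | 3.113 | 3.887
Len_3UTR | 2.392 | 3.637 | 2.409 | 3.615 | 2.413 | 3.630 | 2.405 | 3.620
Len_ORF | 2.788 | 3.753 | 2.773 | 3.729 | 2.773 | 3.730 | 2.775 | 3.731
Min_dist | 1.415 | 3.113 | 1.491 | 3.096 | 1.431 | 3.117 | 1.477 | 3.106
Local_AU | 0.308 | 0.814 | 0.277 | 0.782 | 0.342 | 0.801 | 0.295 | 0.772
SA | −4.356 | −0.661 | −5.218 | −0.725 | −4.230 | −0.588 | −5.082 | −0.666
PCT | 0.000 | 0.816 | 0.000 | 0.364 | 0.000 | 0.449 | 0.000 | 0.193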
  1. Provided are the 5th and 95th percentile values for continuous features that were scaled, after the values of the feature were appropriately transformed as indicated (Table 1).
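
Table 3 lists only the percentile values, not the scaling formula. The sketch below assumes a linear map that sends the 5th percentile to 0 and the 95th percentile to 1, without clipping values outside that range. The function and dictionary names are hypothetical, and only three of the 8mer entries from Table 3 are copied in for illustration.

```python
def scale_feature(value: float, p5: float, p95: float) -> float:
    """Linearly rescale so that p5 maps to 0 and p95 maps to 1.

    Assumption: values below p5 or above p95 are not clipped, so they
    fall outside the (0, 1) interval.
    """
    if p95 == p5:
        raise ValueError("5th and 95th percentiles must differ")
    return (value - p5) / (p95 - p5)


# (5th percentile, 95th percentile) for 8mer sites, copied from Table 3.
SCALING_8MER = {
    "3P_score": (1.000, 3.500),
    "Local_AU": (0.308, 0.814),
    "PCT": (0.000, 0.816),
}

# A 3P_score of 2.25 lies halfway between the two percentiles.
scaled_3p = scale_feature(2.25, *SCALING_8MER["3P_score"])  # 0.5
```

Features such as Len_3UTR or SA would first be log10-transformed as indicated in Table 1 and then passed through the same function with their own percentile pair.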

Additional files

Supplementary file 1

Normalized values for fold changes (log2) of mRNAs detectable in the compendium of 74 sRNA transfection datasets.

https://doi.org/10.7554/eLife.05005.024
Supplementary file 2

Normalized values for fold changes (log2) of mRNAs detectable in the seven datasets examining the response of transfecting miRNAs into HCT116 cells.

https://doi.org/10.7554/eLife.05005.025
Supplementary file 3

Genomic coordinates of CLIP clusters that appeared in annotated 3′ UTRs after transfecting miR-124 into HeLa cells.

https://doi.org/10.7554/eLife.05005.026

Cite this article

Vikram Agarwal, George W Bell, Jin-Wu Nam, David P Bartel (2015)
Predicting effective microRNA target sites in mammalian mRNAs
eLife 4:e05005.
https://doi.org/10.7554/eLife.05005