Variation in ubiquitin system genes creates substrate-specific effects on proteasomal protein degradation
Peer review process
This article was accepted for publication as part of eLife's original publishing model.
History
- Version of Record published
- Accepted Manuscript published
- Accepted
- Received
- Preprint posted
Decision letter
-
Magnus NordborgReviewing Editor; Gregor Mendel Institute, Austria
-
David RonSenior Editor; University of Cambridge, United Kingdom
-
Magnus NordborgReviewer; Gregor Mendel Institute, Austria
Our editorial process produces two outputs: (i) public reviews designed to be posted alongside the preprint for the benefit of readers; (ii) feedback on the manuscript for the authors, including requests for revisions, shown below. We also include an acceptance summary that explains what the editors found interesting or important about the work.
Decision letter after peer review:
Thank you for submitting your article "Variation in Ubiquitin System Genes Creates Substrate-Specific Effects on Proteasomal Protein Degradation" for consideration by eLife. Your article has been reviewed by 4 peer reviewers, , including Magnus Nordborg as the Reviewing Editor and Reviewer #1 and the evaluation has been overseen by David Ron as the Senior Editor.
The reviewers have discussed their reviews with one another and agreed this work is elegant, and absolutely deserves to be published. The Reviewing Editor has drafted this to help you prepare a revised submission – please also see the individual reviews for further suggestions for improvements.
There is an overall need to quantify several statements (see individual reviews). This should be straightforward and requires no additional experiments.
In addition, we would like to encourage you to put a bit more thought into the evolutionary analysis. For example, the issue of pleiotropy could be discussed further (as noted by Reviewer 1), and so could the persistent shifts in effect between the strains (Reviewer 4). It may be worth considering Hunter Fraser's use of sign tests to detect selection, e.g.,
https://www.pnas.org/doi/10.1073/pnas.0912245107 in this context.
Reviewer #1 (Recommendations for the authors):
I trust more mechanistic reviewers to comment on your experiments. From my point of view, I have only a few suggestions for improvements:
1) You argue (in connection with Figure 2)1 that "many QTLs were found only for individual pathways", but this is surely sensitive to significance thresholds. A more sophisticated analysis of pleiotropy would be in order.
2) You never quantify how much of the total variation your QTL explains.
3) You never quantify how much of each QTL your identified causal SNPs explain.
4) When dissecting your UBR1 alleles, would be nice to pay homage to this classic:
Laurie-Ahlberg, C. C., and L. F. Stam. 1987. "Use of P-Element-Mediated Transformation to Identify the Molecular Basis of Naturally Occurring Variants Affecting Adh Expression in Drosophila melanogaster." Genetics 115 (1): 129-40.
Reviewer #2 (Recommendations for the authors):
The only thing I would have appreciated is the validation of some of their findings using orthogonal approaches. There are a lot of readily established biochemical assays they can use to support their claims.
Reviewer #3 (Recommendations for the authors):
1. Figure 1 was super helpful in explaining the question and the methods used in this paper.
2. In this sentence, I am not sure that the second portion follows from the first: "These results are consistent with our observation that RM had higher UPS activity for 15 of 20 N-degrons (Figure 1D, Supplementary Table 1), suggesting that the QTLs we have mapped underlie a substantial portion of the heritable UPS activity difference between BY and RM." Maybe you missed the majority of the genetic variation that influences UPS activity, but you got lucky and the portion you detected just happened to be consistent with the overall trend. I think the authors need a different type of analysis to draw a conclusion about whether they sampled "a substantial portion" of heritable differences or whether there is missing heritability here. Can the authors use their data to calculate the narrow sense heritability and then report how much of that heritability is explained by the detected variants?
3. Perhaps reconsider using the term N-degron in the abstract. It's a bit specific. The intro and figures do a great job at defining an N-degron.
4. I did not know that each of the 20 amino acids is dealt with by one of two UPS systems. This was eventually clear in figure 1B where it seems Trp and Met N-degrons are degraded by different systems. Perhaps make this clearer in the text where N-degrons are described and defined.
5. I was mildly confused by the "italics vs plain" statement in figure 3D. Is it intended for only this panel or for the whole figure, or for the whole paper?
6. Some of the focus of the manuscript might be shifted away from general "straw man" questions, like whether this trait is mendelian and controlled by large effect variants (intro lines 55 – 66). The discovery of smaller effect variants is presented as a major finding, but it seems obvious that these exist. Perhaps instead focus on a more quantitative analysis of the power of the current study. How much heritability is explained by previously known genes or variants, and how much additional heritability is explained by the previously unknown genes/variants detected here? Even if a heritability analysis is not feasible, it think a shift in focus from a qualitative statement – we detect small effect variants – to a more quantitative or nuanced statement, would be appropriate.
Reviewer #4 (Recommendations for the authors):
I'm interested in the overall pattern that BY seems to have systematically lower UPS activity than RM. BY carries a rare allele at 3 out of the 4 examples in Figure S5, which hints that perhaps there has been an adaptation of BY for lower UPS. It would be interesting to explore this hypothesis further, or at least to discuss it. Has there been an overall adaptation for BY to be less transcriptionally active, coupled with lower degradation rates? Perhaps this might be explored using the previous data sets on mRNA in these lines – presumably, changes in transcription under this model could be either cis or trans. (That analysis is distinct from the analysis reported at L312, which focuses on specific trans-effects of the UPS variants.) And perhaps the authors may have other ideas as well to explore the evolutionary questions.
The paper reports 149 loci, but if I am reading this correctly it looks like this may double-count shared loci. It would be nice to discuss a bit more about likely sharing (ie pleiotropy) of the hits. (It's also a bit hard to see from 2B which of these are likely shared.)
L143: it should be possible to make this statement quantitative.
Para 303, 312: It seems likely that you may be looking for an effect that is small at individual genes but widespread. I would suggest looking more carefully at the overall distribution: eg in the protein data is there an upward shift in the mean? You could also use a more sophisticated method like ashR to study the distribution of changes. For Figure 5 you could also consider adding information about global patterns in means and correlations, which are difficult to read from these plots.
https://doi.org/10.7554/eLife.79570.sa1Author response
Reviewer #1 (Recommendations for the authors):
I trust more mechanistic reviewers to comment on your experiments. From my point of view, I have only a few suggestions for improvements:
1) You argue (in connection with Figure 2)1 that "many QTLs were found only for individual pathways", but this is surely sensitive to significance thresholds. A more sophisticated analysis of pleiotropy would be in order.
The revised manuscript includes an expanded analysis of pleiotropy and how the significance threshold used for QTL detection influences QTL pathway specificity. The high degree of QTL pathway specificity we observed was robust to these additional analyses.
As described in the revised manuscript (pg. 6, para. 3, line 180), we considered a QTL's effect common to multiple N-degrons when QTL peaks for distinct N-degrons were within 100 kb of each other and had the same direction of effect. Using these criteria, the 149 QTLs detected for the 20 N-degrons correspond to 35 distinct QTL regions that each affect between 1 and 11 N-degrons. At the LOD score threshold of 4.5 used in the manuscript, 23 of these 35 QTL regions specifically affected N-degrons from an individual pathway in the N-end Rule, with 11 regions affecting only Ac/N-degrons and 12 regions affecting only Arg/N-degrons. Five of the 23 distinct, pathway-specific QTL regions exclusively affected individual N-degrons. The remaining 12 of the 35 distinct QTL regions have the same direction of effect for at least one reporter each from the Arg/N-end and Ac/N-end pathways. The revised manuscript includes additional text in the Results section (pg. 6, para. 3) discussing pleiotropy among QTLs and a new table (Figure 2—figure supplement 2), which provides detailed information on the 35 distinct QTL regions, including their pathway- and N-degron specificity.
To understand the influence of significance threshold on these results, we analyzed how relaxing the LOD score threshold affected the number of pathway- or N-degron-specific QTL regions. The results of this analysis are summarized in Author response table 1. Supplementary file 2 shows the LOD score and allele frequency difference traces for all 20 N-degrons at the 23 N-end Rule pathway- or N-degron-specific QTL regions at multiple significance thresholds. Author response table 2 and Supplementary file 2 provide detailed information on how the N-end Rule pathway- and N-degron-specificity of each of the 35 distinct QTL regions is influenced by LOD score threshold.
Generally, altering the significance threshold does not affect the conclusions that at least half of the detected QTLs are specific to an individual N-end Rule pathway and that approximately 10% of QTLs are specific to individual N-degrons. In the revised Results section (pg. 6, para. 3, line 18), the qualitative statement mentioned by the reviewer is replaced with the quantitative information in Author response table 1 for the 4.5 LOD threshold column.
We note that N-end Rule pathway-specificity for 3 QTL regions (IVb, IXa, and XVc) is subject to the following considerations that may not be immediately obvious from reviewing Supplementary file 2.
The Arg/N-end-specific chromosome IVb and IXa QTL regions overlap the leftmost shoulders of QTLs detected with Ac/N-degrons, creating the appearance that these QTLs are not Arg/N-degron-specific when plotted as in Supplementary file 2. However, the peaks of these Arg/N-degron and Ac/N-degron QTLs are not within 100 kb and their confidence intervals do not overlap. We therefore interpret their effects as pathway-specific.
The Ac/N-degron-specific chromosome XVc QTL contains peaks from both glutamate Arg/N-degron replicates. However, the direction of effect of these two peaks is not consistent between replicates (the only such instance in our dataset). Because we could not unambiguously assign a direction of effect for the glutamate N-degron at this region, we do not include it in our set of QTLs and the chromosome XVc region is deemed to be Ac/N-degron-specific. In the revised manuscript, we report the chromosome IVb, IXa, and XVc QTL regions as pathway-specific.
We also note that the pathway-specific QTL regions displayed on pages 7 and 8 (QTL regions containing UBR1) as well as 20 and 21 of Supplementary file 2 are highly overlapping. However, these overlapping regions have opposing directions of effect on the sets of reporters they affect. We, therefore, continue to report these QTL regions as distinct in the revised manuscript, an interpretation supported by our fine-mapping data (see e.g., Figure 3C).
2) You never quantify how much of the total variation your QTL explains.
Our bulk segregant analysis QTL mapping method is based on comparing allele frequencies obtained from whole-genome sequencing pools of cells with extreme phenotypes. The genotypes of individual segregants, which would be needed for calculating explained variance, are not ascertained using this approach. Thus, it is not readily possible to calculate the proportion of variance explained by our QTLs.
3) You never quantify how much of each QTL your identified causal SNPs explain.
Author response table 3 displays the variance in ubiquitin-proteasome system (UPS) activity explained by each tested causal gene allele and variant.
We note that these estimates are obtained from experiments in near-isogenic strains that differ only at the tested causal gene allele or variant. The fraction of variance explained is thus inflated relative to what would be observed in the segregant populations used for QTL mapping and should not be used to estimate the variance explained by a QTL region. Therefore, we do not include these estimates in the revised manuscript.
4) When dissecting your UBR1 alleles, would be nice to pay homage to this classic: Laurie-Ahlberg, C. C., and L. F. Stam. 1987. "Use of P-Element-Mediated Transformation to Identify the Molecular Basis of Naturally Occurring Variants Affecting Adh Expression in Drosophila melanogaster." Genetics 115 (1): 129-40.
We thank the reviewer for the suggestion to include this important work on identifying causal variants for enzyme activity in highly polymorphic genomic regions. The work appears as reference 62 (pg. 8, para 3, line 218) in the revised manuscript.
Reviewer #2 (Recommendations for the authors): The only thing I would have appreciated is the validation of some of their findings using orthogonal approaches. There are a lot of readily established biochemical assays they can use to support their claims.
Previously published theoretical and empirical observations have demonstrated that tandem fluorescent timers (TFTs) provide precise and sensitive measures of protein degradation kinetics. In particular, the TFT system has been extensively used to measure differences in the degradation rate of UPS N-end Rule substrates (Khmelinskii et al., 2012, Khmelinskii and Knop, 2014, Kats et al., 2018). Given the well-established validity of the TFT system for measuring N-end Rule activity and the comparatively lower precision, sensitivity, and throughput of conventional biochemical measurements of protein degradation (in particular, pulse-chase Western blotting and cycloheximide chase analysis [Kong et al., 2021]), we argue that further experimentation is not needed to establish the claims made in our work.
Reviewer #3 (Recommendations for the authors):
1. Figure 1 was super helpful in explaining the question and the methods used in this paper.
Thank you!
2. In this sentence, I am not sure that the second portion follows from the first: "These results are consistent with our observation that RM had higher UPS activity for 15 of 20 N-degrons (Figure 1D, Supplementary Table 1), suggesting that the QTLs we have mapped underlie a substantial portion of the heritable UPS activity difference between BY and RM." Maybe you missed the majority of the genetic variation that influences UPS activity, but you got lucky and the portion you detected just happened to be consistent with the overall trend. I think the authors need a different type of analysis to draw a conclusion about whether they sampled "a substantial portion" of heritable differences or whether there is missing heritability here. Can the authors use their data to calculate the narrow sense heritability and then report how much of that heritability is explained by the detected variants?
As noted in our response to Reviewer 1, because of the pooled nature of our genetic mapping method, it is not readily possible to calculate heritability from our QTL mapping data. Author response table 3 presents the variance explained by the tested causal alleles and variants. However, as noted in our response to Reviewer 1, these estimates are inflated because they are derived from near-isogenic cell populations that differ only at the tested alleles / variants. Given these considerations, the second clause in the sentence mentioned by the reviewer has been removed from the revised manuscript.
3. Perhaps reconsider using the term N-degron in the abstract. It's a bit specific. The intro and figures do a great job at defining an N-degron.
The term “N-degron” has been removed from the abstract and replaced with more general terms.
4. I did not know that each of the 20 amino acids is dealt with by one of two UPS systems. This was eventually clear in figure 1B where it seems Trp and Met N-degrons are degraded by different systems. Perhaps make this clearer in the text where N-degrons are described and defined.
We thank the reviewer for this suggestion to improve the manuscript’s clarity. The revised introduction indicates that the N-end Rule contains two distinct targeting complexes when introducing the N-end Rule (pg. 3, para. 2, line 105) and references Figure 1, which illustrates the two pathways of the N-end rule.
5. I was mildly confused by the "italics vs plain" statement in figure 3D. Is it intended for only this panel or for the whole figure, or for the whole paper?
The “italics vs. plain” statement is intended only for panels 3D and 4B/F/J. To improve clarity, we have added additional annotation to these panels.
6. Some of the focus of the manuscript might be shifted away from general "straw man" questions, like whether this trait is mendelian and controlled by large effect variants (intro lines 55 – 66). The discovery of smaller effect variants is presented as a major finding, but it seems obvious that these exist. Perhaps instead focus on a more quantitative analysis of the power of the current study. How much heritability is explained by previously known genes or variants, and how much additional heritability is explained by the previously unknown genes/variants detected here? Even if a heritability analysis is not feasible, it think a shift in focus from a qualitative statement – we detect small effect variants – to a more quantitative or nuanced statement, would be appropriate.
The revised manuscript provides a more nuanced introduction to the genetics of UPS activity, in particular, emphasizing the expectation that UPS activity, like most traits, is genetically complex. We agree with the reviewer that the relative contributions of individual QTLs to the heritability of UPS activity would be an interesting and informative analysis. However, as noted in our response to Reviewer 1, it is not readily feasible to perform such an analysis. Instead, as suggested by the reviewer, several qualitative statements have been replaced by quantitative descriptions. In particular, the qualitative statement mentioned by the reviewer is replaced with a quantitative statement regarding QTL effect sizes (pg. 6, para. 1, line 164).
Reviewer #4 (Recommendations for the authors):
Specific comments:
I'm interested in the overall pattern that BY seems to have systematically lower UPS activity than RM. BY carries a rare allele at 3 out of the 4 examples in Figure S5, which hints that perhaps there has been an adaptation of BY for lower UPS. It would be interesting to explore this hypothesis further, or at least to discuss it. Has there been an overall adaptation for BY to be less transcriptionally active, coupled with lower degradation rates? Perhaps this might be explored using the previous data sets on mRNA in these lines – presumably, changes in transcription under this model could be either cis or trans. (That analysis is distinct from the analysis reported at L312, which focuses on specific trans-effects of the UPS variants.) And perhaps the authors may have other ideas as well to explore the evolutionary questions.
We thank the reviewer for these interesting suggestions, which we have addressed with a series of new analyses. In brief, we did not detect evidence for lineage-specific selection on UPS gene expression using eQTL data or on individual N-end reporters.
To explore whether the consistent differences in UPS activity between the BY and RM strains might reflect adaptive changes in these lineages, we performed several additional analyses. We first applied the sign test (Fraser et al., 2010; https://doi.org/10.1073/pnas.0912245107) to a recent comprehensive BY / RM eQTL mapping dataset, which became available after the original sign test publication. This newer eQTL dataset comprises 36,498 eQTLs mapped for 5,643 genes in a panel of 1,000 recombinant offspring from the BY / RM cross (Albert et al., 2018; https://doi.org/10.7554/eLife.35471). The results of this analysis are presented in Author response table 4. We performed the analysis at 11 different LOD thresholds (ranging from 2.5 to 50) to examine the influence of QTL effect size on the results. Across all genes, we do not find evidence for lineage-specific selection except at LOD thresholds of 40 and 45. Given the large fraction of eQTLs that are excluded at these high thresholds (94.6% and 95.3%), the large fraction of genes excluded (70.5% and 73.8%), and especially the marginally significant p-values (0.038 and 0.026) obtained at these two thresholds, we conclude that there is, at best, limited evidence from the sign test for lineage-specific selection on overall mRNA transcript abundance in the BY / RM cross. Future work is needed to reconcile discrepancies in the results of the sign test as obtained in different eQTL datasets from the same cross.
Abbreviations: “LOD”: LOD score threshold for calling cis / trans eQTL pairs, "n_eQTLs": number of eQTLs, "n_genes": number of genes with an eQTL, "n_pairs": number of cis / trans eQTL pairs, "reinf_BY_up": number of cis / trans eQTLs where the BY allele of the cis and trans eQTLs increases expression (reinforcing pairs), "reinf_RM_up": number of cis / trans eQTLs where the RM allele of the cis and trans eQTLs increases expression (reinforcing pairs), "oppos_BY_up": number of cis / trans eQTLs where the BY allele of the cis eQTL increases expression and the RM allele of the trans eQTL increases expression (opposing pairs), "oppos_RM_up": number of cis / trans eQTLs where the RM allele of the cis eQTL increases expression and the BY allele of the trans eQTL increases expression (opposing pairs), "excess_reinforcing_pairs", the number of excess reinforcing eQTL pairs calculated as in Fraser et al., 2010, "chi_sq_p": p-value of the chi-square test for enrichment of reinforcing pairs.
As in the original manuscript sign test manuscript, we performed a gene ontology (GO) enrichment analysis on the sets of reinforcing cis / trans eQTL pairs from the set of eQTLs to determine whether such pairs are enriched for UPS genes. We were able to replicate the previously-described enrichment for genes of the ergosterol biosynthesis pathway (Fraser et al., 2010, Author response table 5). However, there was no enrichment for ubiquitin system or proteasome GO terms in the sets of reinforcing eQTL pairs at any of the tested LOD score thresholds (Author response table 5).
Abbreviations: "GOBPID": gene ontology biological process ID, "ExpCount": expected number of genes for a given GOBPID, "LOD": LOD score threshold for including a gene with a cis / trans eQTL pair.
Because the results of our GO enrichment could be affected by the reference gene set (the list of all genes with at least one cis and one trans eQTL), we devised a complementary strategy to test for lineage-specific selection on UPS gene expression. Using the same set of BY / RM eQTLs, we applied the sign test to the sets of UPS genes, proteasome genes, ubiquitin system genes, E3 ligases, and proteasome chaperone genes. We did not detect lineage-specific selection in any of these gene sets (Author response table 6).
Abbreviations: “LOD”: LOD score threshold for calling cis / trans eQTL pairs, "n_eQTLs": number of eQTLs, "n_genes": number of genes with an eQTL, "n_pairs": number of cis / trans eQTL pairs, "reinf_BY_up": number of cis / trans eQTLs where the BY allele of the cis and trans eQTLs increases expression (reinforcing pairs), "reinf_RM_up": number of cis / trans eQTLs where the RM allele of the cis and trans eQTLs increases expression (reinforcing pairs), "oppos_BY_up": number of cis / trans eQTLs where the BY allele of the cis eQTL increases expression and the RM allele of the trans eQTL increases expression (opposing pairs), "oppos_RM_up": number of cis / trans eQTLs where the RM allele of the cis eQTL increases expression and the BY allele of the trans eQTL increases expression (opposing pairs), "excess reinforcing pairs" the number of excess reinforcing pairs, calculated as in Fraser et al., 2010, "chi_sq_p": p-value of the chi-square test for enrichment of reinforcing pairs.
Although we did not detect lineage-specific selection on global or UPS gene mRNA abundance, we note that these analyses do not detect lineage-specific selection on factors other than transcript abundance. For example, lineage-specific selection may occur through causal missense variants, such as the ones we identified here. For example, the DOA10 and NTA1 genes each contain multiple causal missense variants that alter UPS activity, but neither gene's transcript abundance is affected by a local eQTL. Integration of the effects of missense variants with those that alter gene expression in selection tests remains an interesting avenue for future work.
We note that the significant excess of UPS QTLs at which the RM allele increases UPS activity (pg. 6, para. 2, line 170) provides evidence that N-end Rule activity could have been subject to lineage-specific selection. To test whether this result was due to a strong enrichment of RM alleles at certain individual reporters, we applied this analysis to the sets of QTLs obtained for the 20 individual N-degrons. We did not detect an enrichment of any of the individual N-degrons, suggesting that the enrichment of RM QTLs that increase UPS activity is a general effect (Author response table 7).
Abbreviations: "N_QTLs": Number of QTLs for the indicated N-degron, "N_RM_up": Number of QTLs where the RM allele increases UPS activity for the indicated N-degron, "N_BY_up": Number of QTLs where the BY allele increases UPS activity for the indicated N-degron. "p_value": p-value of the binomial test for the set of QTLs for the indicated N-degron.
The paper reports 149 loci, but if I am reading this correctly it looks like this may double-count shared loci. It would be nice to discuss a bit more about likely sharing (ie pleiotropy) of the hits. (It's also a bit hard to see from 2B which of these are likely shared.)
The reviewer is correct. As described in our response to Reviewer 1, the revised manuscript now indicates that the 149 instances of QTL detection correspond to 35 distinct QTL regions (pg. 6, para. 3, line 185). Figure 2—figure supplement 2 provides detailed information on these regions, including which reporters each region affects. The revised manuscript includes an extended discussion of pleiotropy among the set of N-end Rule QTLs (pg. 6, para. 3).
L143: it should be possible to make this statement quantitative.
As noted in our response to Reviewer 1, it is not readily possible to calculate the amount of variance explained by each QTL due to the pooled nature of our bulk segregant analysis QTL mapping method. Accordingly, we have removed the statement mentioned by the reviewer from the revised Results section.
Para 303, 312: It seems likely that you may be looking for an effect that is small at individual genes but widespread. I would suggest looking more carefully at the overall distribution: eg in the protein data is there an upward shift in the mean? You could also use a more sophisticated method like ashR to study the distribution of changes.
We appreciate the reviewer’s suggestion, which we suspect may have been prompted by our wording of the following sentence from the Results section (page 23, paragraph 2, lines 307-310 of the original manuscript):
“This result is consistent with recent observations that suggest that altering UBR1 expression exerts broad effects on protein degradation or related processes controlling protein abundance and that protein sequences, rather than function or subcellular localization, are the primary determinants of degradation rates.”
Our intent was to convey that substrates of E3 ligases such as Ubr1 are more likely to share sequence features than they are to share a function or subcellular localization. In other words, E3 ligases typically target sets of proteins that are functionally diverse but that share common sequence features, e.g., an N-degron. The wording of this sentence may have unintentionally suggested that the causal UBR1 promoter variant should affect the abundance of many proteins. However, E3 ligases influence the abundance of distinct sets of dozens to several hundred proteins (Kong et al., 2021, Christiano et al., 2020). The moderate effect of the causal UBR1 promoter variant on UBR1 expression is, therefore, not expected to create widespread effects on the proteome but instead to affect a small subset of Ubr1 substrates and related proteins. We have revised this section to more clearly articulate these ideas. We have also re-analysed our data as described below.
The causal UBR1 variant significantly altered the abundance of 39 of 3,047 detected proteins at a 0.1 false discovery rate (FDR) threshold. Following the reviewer’s suggestion, we computed the overall median log2 fold change and found that it was -0.012 for all 3,047 detected proteins (that is, very close to zero) and 0.37 for the set of 39 differentially abundant proteins (that is, an average increase for proteins with significantly different abundance). The number of differentially abundant proteins, the average upward shift in log2 fold change for differentially abundant proteins, and the significant fraction of differentially abundant proteins exhibiting increased abundance (28 / 39, 72%, binomial p = 9.5e-3) are all consistent with the causal variant’s moderate effect on UBR1 expression.
The causal variant also did not have widespread effects on the transcriptome (median log2 fold change = -0.0024 [again, very close to zero]). As reported in our initial submission, 78 genes were differentially expressed at an FDR of 0.1. For these genes, the overall effect of the causal BY allele, which decreases UBR1 expression, was to decrease transcript abundance (median log2 fold change = -0.18, 60 / 78 decreased expression [77%, binomial p = 2e-6]).
Following the reviewer’s suggestion, we used ashr to explore how using the false sign rate (FSR, Stephens, 2017; https://doi.org/10.1093/biostatistics/kxw041) to call differentially expressed genes influenced our analysis of the causal UBR1 variant’s effects. Using an FSR threshold of 0.1, we detected 86 genes with altered mRNA transcript abundance. For these genes, the BY allele tended to decrease transcript abundance (median log2 fold change = -0.2, 65 / 86 [76%, binomial p = 2e-6]). All differentially expressed genes detected using the FDR were also detected using the FSR. Eight additional differentially expressed genes were detected using the FSR, but not the FDR. Each of these eight genes had lower absolute log2 fold changes than the 78 differentially expressed genes detected using the FDR. Therefore, we conclude that while the FSR is slightly more permissive, the FDR and FSR both capture the same global patterns of effects in our RNA-seq data.
In contrast to these consistent results in RNA-seq, ashr produced different results than the FDR when applied to our proteomics data. As reported in our initial submission, a 0.1 FDR threshold applied to abundance differences reported by the Proteome Discoverer software results in 39 differentially abundant proteins. The derived BY UBR1 allele increased the abundance of 28 of these 39 (72%, binomial p = 9.5e-3) differentially abundant proteins. At a 0.1 FSR threshold, we identified 56 differentially abundant proteins, with the BY allele increasing the abundance of 13 / 56 (23%, binomial p = 7.3e-5).This difference arises from the fact that there are considerable discrepancies among the sets of differentially abundant proteins called by FDR and FSR. We note that of the 39 differentially abundant proteins reported by Proteome Discoverer, 12 were estimated to have large absolute fold changes (>= 0.5, the 99th percentile observed in our data) but were not reported as significant by FSR (Author response image 1). Notably, these 12 genes included the known Ubr1-regulated proteins Tma10 and Adh2 (Kong et al., 2021, Christiano et al., 2020). For 8 of these 12 proteins, the BY allele increased protein abundance. These 12 proteins had relatively high standard errors and, as expected, their fold changes were considerably lower following the adaptive shrinkage that is part of ashr (Author response image 1). Second, ashr identified an additional 48 differentially abundant proteins. The BY allele decreases the abundance of 41 of these 48 proteins (Author response image 1).
Given that ashr strongly depends on correctly estimated standard errors, its application to mass-spectrometry data with a small sample size could be potentially problematic in a manner that does not seem to apply to RNA-seq, which, particularly at the very high sequencing depth used here, produces more accurate estimates of the standard error. We have therefore elected to continue to use the FDR applied to p-values reported by Proteome Discoverer to call differentially expressed genes at the protein and RNA levels. We note that our general conclusion that the causal UBR1 promoter variant affects the expression of dozens of genes at the protein and RNA levels remains unchanged irrespective of the method used to call differentially expressed genes.
For Figure 5 you could also consider adding information about global patterns in means and correlations, which are difficult to read from these plots.
The revised Figure 5 contains the median log2 fold changes for our proteomics and RNA-seq data (pg. 14). The revised Results section reports the correlation in log2 fold change for genes detected in both our proteomics (pg. 13, para. 4, line 362) and RNA-seq data (pg. 13, para. 5, line 377).
https://doi.org/10.7554/eLife.79570.sa2