Peer review process
Not revised: This Reviewed Preprint includes the authors’ original preprint (without revision), an eLife assessment, public reviews, and a provisional response from the authors.
Editors
- Reviewing Editor: Diethard Tautz, Max Planck Institute for Evolutionary Biology, Plön, Germany
- Senior Editor: Aleksandra Walczak, CNRS, Paris, France
Reviewer #1 (Public review):
Summary:
In this manuscript, the authors present a method to detect natural selection on transcription factor binding sites (TFBSs), which is an upgraded version of a previously published method (Liu and Robinson-Rechavi, 2020). This upgraded version of the test implements more explicit models of evolution and is shown to outperform its predecessor in terms of both power and false positive rate. I think this method can be a valuable resource for the community and can be helpful not only to studies of TFBSs but also broader evolutionary questions related to genotype-phenotype maps or fitness landscapes.
Major comments:
(1) Questions related to Figure 1
Figure 1, along with the first section of the Results, shows that the SVM score and its sensitivity to mutations are generally correlated with the strength of ChIP-seq signals. It is not very clear to me, however, what the motivation is behind this part of the paper. It seems that the model used to predict binding strength is a pre-existing one, and it is unclear what is new in this section. Was the prediction model retrained using different data? Was its validity confirmed using new data? I would appreciate some more elaboration on how these results differ from what was presented in the previous study of Liu and Robinson-Rechavi (2020).
The existence of weak or negative correlations between SVM and coverage, which reportedly reflects low-quality peaks, seems applicable not only to this paper, but also to previous ones, so I would like to have it confirmed whether the question and the authors' answers apply to previous studies as well.
It is reported that SVM scores capture TF binding signals better than conservation-based statistics do. My intuitive interpretation is that both ChIP-seq peaks and SVM scores are supposed to reflect binding strength, whereas conservation is supposed to reflect selection (i.e., different definitions of "function", as discussed in comment 4a). It is not explicitly explained in the Results, however, what the difference indicates, leaving only an impression that the SVM score is "better" than the conservation statistics.
In summary, I think further elaboration on the above problems would make the flow of thought of this paper easier to follow.
(2) Lack of directional selection for low binding affinity
In the analysis of Drosophila melanogaster ChIP-seq peaks, there were more cases of directional selection for higher binding affinity than of directional selection for lower binding affinity. The authors suggested that this observation is "likely biological" because the same pattern was not seen in simulations (lines 412-413). I wonder if this could have resulted from a difference in the distribution of ancestral binding affinity across TFBSs between real and simulated data. If binding affinity was generally low in the common ancestor of D. melanogaster and D. simulans, selection for low binding affinity would manifest mainly as purifying selection against mutations that increase affinity instead of directional selection. Ancestral sequences for simulations, if I understood correctly, are observed peaks in D. melanogaster (lines 715-719), which would include a high fraction of sequences that could be rarer among the real ancestral sequences.
The description of this particular result does not refer to a figure or table, nor is it revisited in the Discussion. Figure 5 treats peaks under directional selection as a single category. Taken together, it is hard to tell how this observation should be interpreted. If the authors consider this result biologically meaningful, I would suggest adding more details (e.g., the number of peaks under selection in each direction).
(3) Selection in non-focal lineages
Regarding the detected signals of directional selection for stronger binding in certain tissues (Figure 6), I wonder if it is the focal species or those very tissues that are "special": did the human lineage undergo more adaptive regulatory evolution than the chimpanzee lineage, or do nervous and male reproductive systems have a high "propensity" for adaptive regulatory evolution? Assuming that the binding preference of the same TF did not undergo a significant change since human-chimpanzee split (which, I believe, is a built-in assumption in both RegEvo and the permutation test), it should be possible to perform the same test using chimpanzee sequences that are homologous to the human ChIP-seq peak regions. In the case of coding sequences, for example, Bakewell et al. (2007) found that it was the chimpanzee that had more genes under positive selection than humans; I wonder if TFBSs show the same or a different pattern.
(4) Comments on terminology
a) Meaning of "function"
The word "function" has had different meanings in the biology literature, with some authors using "functional" to refer to anything with a phenotypic effect and some using it only for targets of selection. A (putative) TFBS would be considered "functional" as long as it has TF binding affinity if we follow the effect-based definition, but only if its binding affinity is under selection if we follow the selection-based definition. In this manuscript, the term "function" appears to have been used to refer to TF binding but not selection, most notably in the first Results section. There are also places where it is less clear what "function" means exactly (e.g., "deeply conserved elements that are likely to be functionally important" on line 61). Since this paper is about evolution, it is likely that many readers prefer the selection-based definition or assume that the selection-based definition would be used. Thus, using "function" to refer to just TF binding could be confusing. To this end, I would suggest that the authors drop the word "function" or give an explicit definition early in this paper.
b) Directional selection in different directions
In this paper, selection for increased TF binding affinity is referred to as "positive directional selection", and selection in the opposite direction is called "negative directional selection" (as exemplified in Figure 2). I understand that using such shorthand names would make the text less clumsy, but these two terms could potentially be confusing, as "positive selection" and "negative (purifying) selection" are also terms referring to specific types of selection and have some connection to directional and stabilizing selection. Therefore, I suggest that the authors use something like "selection for increased/decreased binding affinity" instead, or note explicitly in the text that "positive/negative directional selection" would be used as shorthand.
Reviewer #2 (Public review):
Summary:
The manuscript by Laverre et al. provides an interesting new test of selection on TF binding. Rather than focusing on sequence changes, this test is specifically for changes in predicted TF binding affinity. The authors report directional selection on 5.1% of tested regions in Drosophila, as well as a signal of selection on CTCF binding in the human CNS and male reproductive system.
Strengths:
Overall, I think this represents an important direction for the field of molecular evolution: now that TF binding can be predicted fairly well from sequence, it can be a very useful focus for tests of selection.
Weaknesses:
As mentioned several times in the manuscript, Jiang and Zhang (2024) pointed out some issues with a previous permutation-based version of this test. Foremost among these was the issue of ascertainment bias: when testing only experimentally supported TF binding sites from a focal species, and then asking what type of selection (or lack of selection) led to those sites, one is guaranteed to find more substitutions that increase affinity, simply because the sites were selected in the first place as those with maximum (empirically measured) affinity.
To address this issue, the authors simulated Drosophila CTCF peaks evolving neutrally and then tested different ascertainment cutoffs in Figure 4D. It was not entirely clear to me what is shown in Figure 4D: the text says the bins were stratified by derived delta-SVM, whereas the figure says SVM, and the legend says derived SVM (both without the delta). I was unable to find any clarification of this in the Methods section. In any case, I am not really convinced by this, for two main reasons. First, when analyzing empirical ChIP-seq data, I would guess that only a tiny fraction of the genome is bound (far less than 1%, especially in mammalian genomes). However, the most extreme bin in Figure 4D is taking the top 10% of (delta?) SVM values. What would Figure 4D look like at bins of the highest 0.1%, 0.001%, etc? My guess is there would be a strong uptick in the FPR. The second reason is actually more important and fundamental than the first. As long as this method is working as described, I cannot see any way that it would ‘not’ be impacted by ascertainment bias. As an extreme case, imagine that all TF binding sites tested had the maximum possible SVM scores; then none of them would have any chance of showing directional selection against binding, while even those that evolved neutrally would appear to have directional selection in favor of binding. Of course, real empirical data are not as extreme as this, but the same concept applies in less extreme scenarios.
This bias could explain patterns observed in the real data. For example: "We observe much more positive than negative directional selection, a pattern likely biological rather than methodological, since it is absent from simulations." This is exactly the pattern predicted under ascertainment bias (in the extreme-scenario thought experiment above). I suspect it is absent from simulations simply because the authors did not properly account for this bias in their simulations.
If the main result reported by the authors had been a lack of any directional selection in favor of binding, and instead only neutrality or directional selection against binding, then this ascertainment bias would not be an issue; it would only have made their results conservative. Unfortunately, this is not the case, and the directional selection in favor of binding, which is the main result emphasized from the empirical analysis, could be inflated by this bias.
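The ascertainment argument above can be illustrated with a toy simulation. This is only a sketch under strong simplifying assumptions and is not the authors' simulation or the RegEvo model: each site gets a single Gaussian "affinity score" standing in for an SVM score, neutral evolution is modeled as adding symmetric Gaussian noise to the ancestral score, and all names (`simulate_neutral_sites`, `mean_delta_in_top`, `drift_sd`) are hypothetical. Sites are then ascertained by keeping only those with the highest derived scores, mimicking the selection of empirically bound peaks in the focal species.

```python
import random
import statistics


def simulate_neutral_sites(n_sites=100_000, drift_sd=0.5, seed=0):
    """Return (ancestral, derived) score pairs evolving with no selection.

    Assumption: scores are Gaussian and neutral change is symmetric noise,
    so the expected change in score over all sites is zero.
    """
    rng = random.Random(seed)
    sites = []
    for _ in range(n_sites):
        ancestral = rng.gauss(0.0, 1.0)
        derived = ancestral + rng.gauss(0.0, drift_sd)
        sites.append((ancestral, derived))
    return sites


def mean_delta_in_top(sites, top_fraction):
    """Mean (derived - ancestral) among sites with the highest derived scores."""
    ranked = sorted(sites, key=lambda pair: pair[1], reverse=True)
    n_top = max(1, int(len(ranked) * top_fraction))
    return statistics.fmean(derived - ancestral for ancestral, derived in ranked[:n_top])


sites = simulate_neutral_sites()
for fraction in (1.0, 0.1, 0.01, 0.001):
    delta = mean_delta_in_top(sites, fraction)
    print(f"top {fraction:>6.1%} by derived score: mean change = {delta:+.3f}")
```

In this toy model the mean change is close to zero when all sites are kept and becomes increasingly positive as the ascertainment cutoff becomes more stringent, even though no site is under selection. Whether the effect is large enough to matter for real ChIP-seq peaks depends on quantities the sketch does not model, such as the true distribution of ancestral affinities and the amount of divergence.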
Minor point:
The following statement: "In contrast, phastCons and phyloP scores lack such enrichment and have a lower dynamic range, suggesting that the conservation scores are less sensitive to fine-scale variation of TF occupancy and thus regulatory region function" is only true if one assumes that TF binding is the only function of this region. One could even turn this around and say the fact that the sites affecting TF binding are not the most conserved is actually evidence that TF binding is not a good indicator of these regions' entire function. I suggest the authors soften this claim that conservation scores are less sensitive to regulatory region function.