Peer review process
Revised: This Reviewed Preprint has been revised by the authors in response to the previous round of peer review; the eLife assessment and the public reviews have been updated where necessary by the editors and peer reviewers.
Read more about eLife’s peer review process.

Editors
- Reviewing Editor: Arun SP, Indian Institute of Science Bangalore, Bangalore, India
- Senior Editor: Yanchao Bi, Beijing Normal University, Beijing, China
Reviewer #2 (Public review):
The strengths of this paper are clear: The authors are asking a novel question about geometric representation that would be relevant to a broad audience. Their question has a clear grounding in pre-existing mathematical concepts that have been only minimally explored in cognitive science. Moreover, the data themselves are quite striking, such that my only concern would be that the data seem almost too perfect. It is hard to know what to make of that, however. From one perspective, this is even more reason the results should be published. Yet I am of the (perhaps unorthodox) opinion that reviewers should voice these gut reactions, even if it does not influence the evaluation otherwise. I have a few additional comments:
(1) The authors have now explained their theoretical position in a much more thorough and accessible way. I applaud them for that.
(2) Although I continue to believe that the manipulation in Experiment 1 is imperfect, I am convinced by the authors that the subsequent evidence is more convincing, and thus that the merit of this work lies mostly in those data.
If these results are robust, I believe the authors have discovered something of great value. While this paper stops short of providing definitive evidence in support of the Erlangen program (just as most work in vision science has stopped short of providing definitive evidence in support of its favored view), the data are sufficiently novel and provocative that these theories are worth entertaining further.
Author response:
The following is the authors’ response to the original reviews.
eLife Assessment
This important study proposes a framework to understand and predict generalization in visual perceptual learning in humans based on form invariants. Using behavioral experiments in humans and by training deep networks, the authors offer evidence that the presence of stable invariants in a task leads to faster learning. However, this interpretation is promising but incomplete. It can be strengthened through clearer theoretical justification, additional experiments, and the ruling out of alternative explanations.
We sincerely thank the editors and reviewers for their thoughtful feedback and constructive comments on our study. We have taken significant steps to address the points raised, particularly the concern regarding the incomplete interpretation of our findings.
In response to Reviewer #1, we have included long-term learning curves from the human experiments to provide a clearer demonstration of the differences in learning rates across invariants, and have incorporated a new experiment to investigate location generalization within each invariant stability level. These new findings have shifted the focus of our interpretation from learning rates to the generalization patterns both within and across invariants, which, alongside the observed weight changes across DNN layers, support our proposed framework based on the Klein hierarchy of geometries and the Reverse Hierarchy Theory (RHT).
We have also worked to clarify the conceptual foundation of our study and strengthen the theoretical interpretation of our results in light of the concerns raised by Reviewers #1 and #2. We have further expanded the discussion linking our findings to previous work on VPL generalization, and addressed alternative explanations raised by Reviewer #1.
Reviewer #1 (Public Review):
Summary:
Visual Perceptual Learning (VPL) results in varying degrees of generalization to tasks or stimuli not seen during training. The question of which stimulus or task features predict whether learning will transfer to a different perceptual task has long been central in the field of perceptual learning, with numerous theories proposed to address it. This paper introduces a novel framework for understanding generalization in VPL, focusing on the form invariants of the training stimulus. Contrary to a previously proposed theory that task difficulty predicts the extent of generalization - suggesting that more challenging tasks yield less transfer to other tasks or stimuli - this paper offers an alternative perspective. It introduces the concept of task invariants and investigates how the structural stability of these invariants affects VPL and its generalization. The study finds that tasks with high-stability invariants are learned more quickly. However, training with low-stability invariants leads to greater generalization to tasks with higher stability, but not the reverse. This indicates that, at least based on the experiments in this paper, an easier training task results in less generalization, challenging previous theories that focus on task difficulty (or precision). Instead, this paper posits that the structural stability of stimulus or task invariants is the key factor in explaining VPL generalization across different tasks.
Strengths:
- The paper effectively demonstrates that the difficulty of a perceptual task does not necessarily correlate with its learning generalization to other tasks, challenging previous theories in the field of Visual Perceptual Learning. Instead, it proposes a significant and novel approach, suggesting that the form invariants of training stimuli are more reliable predictors of learning generalization. The results consistently bolster this theory, underlining the role of invariant stability in forecasting the extent of VPL generalization across different tasks.
- The experiments conducted in the study are thoughtfully designed and provide robust support for the central claim about the significance of form invariants in VPL generalization.
Weaknesses:
- The paper assumes a considerable familiarity with the Erlangen program and the definitions of invariants and their structural stability, potentially alienating readers who are not versed in these concepts. This assumption may hinder the understanding of the paper's theoretical rationale and the selection of stimuli for the experiments, particularly for those unfamiliar with the Erlangen program's application in psychophysics. A brief introduction to these key concepts would greatly enhance the paper's accessibility. The justification for the chosen stimuli and the design of the three experiments could be more thoroughly articulated.
We appreciate your feedback regarding the accessibility of our paper, particularly concerning the Erlangen Program and its associated concepts. We have revised the manuscript to include a more detailed introduction to Klein’s Erlangen Program in the second paragraph of the Introduction. It provides clear descriptions and illustrative examples for the three invariants within the Klein hierarchy of geometries, as well as the nested relationships among them (see revised Figure 1). We believe this addition will enhance the accessibility of the theoretical framework for readers who may not be familiar with these concepts.
In the revised manuscript, we have also expanded the descriptions of the stimuli and experimental design for psychophysics experiments. These additions aim to clarify the rationale behind our choices, ensuring that readers can fully understand the connection between our theoretical framework and experimental approach.
- The paper does not clearly articulate how its proposed theory can be integrated with existing observations in the field of VPL. While it acknowledges previous theories on VPL generalization, the paper falls short in explaining how its framework might apply to classical tasks and stimuli that have been widely used in the VPL literature, such as orientation or motion discrimination with Gabors, vernier acuity, etc. It also does not provide insight into the application of this framework to more naturalistic tasks or stimuli. If the stability of invariants is a key factor in predicting a task's generalization potential, the paper should elucidate how to define the stability of new stimuli or tasks. This issue ties back to the earlier mentioned weakness: namely, the absence of a clear explanation of the Erlangen program and its relevant concepts.
We thank you for highlighting the necessity of integrating our proposed framework with existing observations in VPL research.
Prior VPL studies have not concurrently examined multiple geometrical invariants with varying stability levels, making direct comparisons challenging. However, we have identified tasks from the literature that align with specific invariants. For example, orientation discrimination with Gabors (e.g., Dosher & Lu, 2005) and the texture discrimination task (e.g., Wang et al., 2016) involve Euclidean invariants, and circle versus square discrimination (e.g., Kraft et al., 2010) involves affine invariants. On the other hand, our framework does not apply to studies using stimuli that are unrelated to geometric transformations, such as motion discrimination with Gabors or random dots, depth discrimination, vernier acuity, spatial frequency discrimination, or contrast detection or discrimination.
By focusing on geometrical properties of stimuli, our work addresses a gap in the field and introduces a novel approach to studying VPL through the lens of invariant extraction, echoing Gibson’s ecological approach to perceptual learning.
In the revised manuscript, we have added a clearer explanation of Klein’s Erlangen Program, including the definition of geometrical invariants and their stability (see the second paragraph of the Introduction). Additionally, we have expanded the Discussion section to draw more explicit comparisons between our results and previous studies on VPL generalization, highlighting both similarities and differences, as well as potential shared mechanisms.
- The paper does not convincingly establish the necessity of its introduced concept of invariant stability for interpreting the presented data. For instance, consider an alternative explanation: performing in the collinearity task requires orientation invariance. Therefore, it's straightforward that learning the collinearity task doesn't aid in performing the other two tasks (parallelism and orientation), which do require orientation estimation. Interestingly, orientation invariance is more characteristic of higher visual areas, which, consistent with the Reverse Hierarchy Theory, are engaged more rapidly in learning compared to lower visual areas. This simpler explanation, grounded in established concepts of VPL and the tuning properties of neurons across the visual cortex, can account for the observed effects, at least in one scenario. This approach has previously been used/proposed to explain VPL generalization, as seen in (Chowdhury and DeAngelis, Neuron, 2008), (Liu and Pack, Neuron, 2017), and (Bakhtiari et al., JoV, 2020). The question then is: how does the concept of invariant stability provide additional insights beyond this simpler explanation?
We appreciate your thoughtful alternative explanation. While this explanation accounts for why learning the collinearity task does not transfer to the orientation task—which requires orientation estimation—it does not explain why learning the collinearity task fails to transfer to the parallelism task, which requires orientation invariance rather than orientation estimation. The asymmetric transfer observed in our study is, however, well explained within the framework of the Klein hierarchy of geometries.
According to the Klein hierarchy, invariants with higher stability are more perceptually salient and detectable, and they are nested hierarchically, with higher-stability invariants encompassing lower-stability invariants (as clarified in the revised Introduction). In our invariant discrimination tasks, participants need only extract and utilize the most stable invariant to differentiate stimuli, optimizing their ability to discriminate that invariant while leaving the less stable invariants unoptimized.
For example:
In the collinearity task, participants extract the most stable invariant, collinearity, to perform the task. Although the stimuli also contain differences in parallelism and orientation, these lower-stability invariants are not utilized or optimized during the task.
In the parallelism task, participants optimize their sensitivity to parallelism, the highest-stability invariant available in this task, while orientation, a lower-stability invariant, remains irrelevant and unoptimized.
In the orientation task, participants can only rely on differences in orientation to complete the task. Thus, the least stable invariant, orientation, is extracted and optimized.
This hierarchical process explains why training on a higher-stability invariant (e.g., collinearity) does not transfer to tasks involving lower-stability invariants (e.g., parallelism or orientation). Conversely, tasks involving lower-stability invariants (e.g., orientation) can aid in tasks requiring higher-stability invariants, as these higher-stability invariants inherently encompass the lower ones, resulting in a low-to-high-stability transfer effect.
This unique perspective underscores the importance of invariant stability in understanding generalization in VPL, complementing and extending existing theories such as the Reverse Hierarchy Theory. To help readers understand our proposed theory, we have revised the Introduction and Discussion sections.
- While the paper discusses the transfer of learning between tasks with varying levels of invariant stability, the mechanism of this transfer within each invariant condition remains unclear. A more detailed analysis would involve keeping the invariant's stability constant while altering a feature of the stimulus in the test condition. For example, in the VPL literature, one of the primary methods for testing generalization is examining transfer to a new stimulus location. The paper does not address the expected outcomes of location transfer in relation to the stability of the invariant. Moreover, in the affine and Euclidean conditions one could maintain consistent orientations for the distractors and targets during training, then switch them in the testing phase to assess transfer within the same level of invariant structural stability.
We thank you for this good suggestion. Using one of the primary methods for testing generalization, we performed a new psychophysics experiment to specifically examine how VPL generalizes to a new test location within a single invariant stability level (see Experiment 3 in the revised manuscript). The results show that the collinearity task exhibits greater location generalization compared to the parallelism task. This finding suggests the involvement of higher-order visual areas during high-stability invariant training, aligning with our theoretical framework based on the Reverse Hierarchy Theory (RHT). We attribute the unexpected location generalization observed in the orientation task to an additional requirement for spatial integration in its specific experimental design (as explained in the revised Results section “Location generalization within each invariant”). Moreover, based on previous VPL studies that have reported location specificity in orientation discrimination (Fiorentini and Berardi, 1980; Schoups et al., 1995; Shiu and Pashler, 1992), along with the substantial weight changes observed in lower layers of DNNs trained on the orientation task (Figure 9B, C), we infer that under a more controlled experimental design—such as the two-interval, two-alternative forced choice (2I2AFC) task employed in DNN simulations, where spatial integration is not required for any of the three invariants—the plasticity for orientation tasks would more likely occur in lower-order areas.
In the revised manuscript, we have discussed how these findings, together with the observed asymmetric transfer across invariants and the distribution of learning across DNN layers, collectively reveal the neural mechanisms underlying VPL of geometrical invariants.
- In the section detailing the modeling experiment using deep neural networks (DNN), the takeaway was unclear. While it was interesting to observe that the DNN exhibited a generalization pattern across conditions similar to that seen in the human experiments, the claim made in the abstract and introduction that the model provides a 'mechanistic' explanation for the phenomenon seems overstated. The pattern of weight changes across layers, as depicted in Figure 7, does not conclusively explain the observed variability in generalizations. Furthermore, the substantial weight change observed in the first two layers during the orientation discrimination task is somewhat counterintuitive. Given that neurons in early layers typically have smaller receptive fields and narrower tunings, one would expect this to result in less transfer, not more.
We appreciate your suggestion regarding the clarity of DNN modeling. While the DNN employed in our study recapitulates several known behavioral and physiological VPL effects (Manenti et al., 2023; Wenliang and Seitz, 2018), we acknowledge that the claim in the abstract and introduction suggesting the model provides a ‘mechanistic’ explanation for the phenomenon may have been overstated. The DNN serves primarily as a tool to generate important predictions about the underlying neural substrates and provides a promising testbed for investigating learning-related plasticity in the visual hierarchy.
In the revised manuscript, we have made significant improvements in explaining the weight change across DNN layers and its implication for understanding “when” and “where” learning occurs in the visual hierarchy. Specifically, in the Results ("Distribution of learning across layers") and Discussion sections, we have provided a more explicit explanation of the weight change across layers, emphasizing its implications for understanding the observed variability in generalizations and the underlying neural mechanisms.
Regarding the substantial weight change observed in the first two layers during the orientation discrimination task, we interpret this as evidence that VPL of this least stable invariant relies more on the plasticity of lower-level brain areas, which may explain the poorer generalization performance to new locations or features observed in the previous literature (Fiorentini and Berardi, 1980; Schoups et al., 1995; Shiu and Pashler, 1992). However, this does not imply that learning effects of this least stable invariant cannot transfer to more stable invariants. From the perspective of Klein’s Erlangen program, the extraction of more stable invariants is implicitly required when processing less stable ones, which leads to their automatic learning. Additionally, within the framework of the Reverse Hierarchy Theory (RHT), plasticity in lower-level visual areas affects higher-level areas that receive the same low-level input, due to the feedforward anatomical hierarchy of the visual system (Ahissar and Hochstein, 2004, 1997; Markov et al., 2013; McGovern et al., 2012). Therefore, the improved signal from lower-level plasticity resulting from training on less stable invariants can enhance higher-level representations of more stable invariants, facilitating the transfer effect from low- to high-stability invariants.
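To make the layer-wise analysis concrete for readers, the sketch below shows one common way to quantify per-layer plasticity: the L2 distance between a layer's weights before and after training, normalized by the pre-training weight norm. This is an illustrative metric only (the function name and toy data are our own, and the exact formula used in the paper's analysis may differ); it captures the qualitative pattern of "where learning happens" across layers.

```python
import numpy as np

def layer_weight_change(pre_weights, post_weights):
    """Normalized per-layer weight change: ||W_post - W_pre|| / ||W_pre||.

    pre_weights, post_weights: lists of arrays, one array per layer,
    taken before and after training. Larger values indicate layers
    where more learning-related plasticity occurred.
    """
    return [np.linalg.norm(w1 - w0) / np.linalg.norm(w0)
            for w0, w1 in zip(pre_weights, post_weights)]

# Toy example: a 3-"layer" network where training mostly alters layer 0,
# mimicking the large early-layer changes seen for the orientation task.
rng = np.random.default_rng(0)
pre = [rng.standard_normal((8, 8)) for _ in range(3)]
post = [pre[0] + 0.5 * rng.standard_normal((8, 8)),   # large change in layer 0
        pre[1] + 0.01 * rng.standard_normal((8, 8)),  # small change in layer 1
        pre[2].copy()]                                # no change in layer 2

print(layer_weight_change(pre, post))  # decreasing values across layers
```

Plotting these values layer by layer for each trained invariant reproduces the kind of comparison shown in the weight-change figures.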
Reviewer #2 (Public Review):
The strengths of this paper are clear: The authors are asking a novel question about geometric representation that would be relevant to a broad audience. Their question has a clear grounding in pre-existing mathematical concepts that, to my knowledge, have been only minimally explored in cognitive science. Moreover, the data themselves are quite striking, such that my only concern would be that the data seem almost *too* clean. It is hard to know what to make of that, however. From one perspective, this is even more reason the results should be publicly available. Yet I am of the (perhaps unorthodox) opinion that reviewers should voice these gut reactions, even if it does not influence the evaluation otherwise. Below I offer some more concrete comments:
(1) The justification for the designs is not well explained. The authors simply tell the audience in a single sentence that they test projective, affine, and Euclidean geometry. But despite my familiarity with these terms -- familiarity that many readers may not have -- I still had to pause for a very long time to make sense of how these considerations led to the stimuli that were created. I think the authors must, for a point that is so central to the paper, thoroughly explain exactly why the stimuli were designed the way that they were and how these designs map onto the theoretical constructs being tested.
We thank you for reminding us to better justify our experimental designs. In response, we have provided a detailed introduction to Klein’s Erlangen Program, describing projective, affine, and Euclidean geometries, their associated invariants, and the hierarchical relationships among them (see revised Introduction and Figure 1).
All experiments in our study employed stimuli with varying structural stability (collinearity, parallelism, and orientation; see revised Figures 2 and 4), enabling us to investigate the impact of invariant stability on visual perceptual learning. Experiment 1 was adapted from paradigms studying the "configural superiority effect," commonly used to assess the salience of geometric invariants. This paradigm was chosen to align with and build upon related research, thereby enhancing comparability across studies. To address the limitations of Experiment 1 (as detailed in our Results section), Experiments 2, 3, and 4 employed a 2AFC (two-alternative forced choice)-like paradigm, which is more common in visual perceptual learning research. Additionally, we have expanded the descriptions of our stimuli and designs, aiming to ensure clarity and accessibility for all readers.
(2) I wondered if the design in Experiment 1 was flawed in one small but critical way. The goal of the parallelism stimuli, I gathered, was to have a set of items that is not parallel to the other set of items. But in doing that, isn't the manipulation effectively the same as the manipulation in the orientation stimuli? Both functionally involve just rotating one set by a fixed amount. (Note: This does not seem to be a problem in Experiment 2, in which the conditions are more clearly delineated.)
We appreciate your insightful observation regarding the design of Experiment 1 and the potential similarity between the manipulations of the parallelism and orientation stimuli.
The parallelism and orientation stimuli in Experiment 1 were originally introduced by Olson and Attneave (1970) to support line-based models of shape coding and were later adapted by Chen (1986) to measure the relative salience of different geometric properties. In the parallelism stimuli, the odd quadrant differs from the others in line slope, while in the orientation stimuli, the odd quadrant contains identical line segments but differs in the direction pointed by their angles. The faster detection of the odd quadrant in the parallelism stimuli compared to the orientation stimuli has traditionally been interpreted as evidence supporting line-based models of shape coding. However, as Chen (1986, 2005) proposed, the concept of invariants over transformations offers a different interpretation: in the parallelism stimuli, the fact that line segments share the same slope essentially implies that they are parallel, and the discrimination may be actually based on parallelism. This reinterpretation suggests that the superior performance with parallelism stimuli reflects the relative perceptual salience of parallelism (an affine invariant property) compared to the orientation of angles (a Euclidean invariant property).
In the collinearity and orientation tasks, the odd quadrant and the other quadrants differ in their corresponding geometries, such as being collinear versus non-collinear. However, in the parallelism task, participants could rely either on the non-parallel relationship between the odd quadrant and the other quadrants or on the difference in line slope to complete the task, which can be seen as effectively similar to the manipulation in the orientation stimuli, as you pointed out. Nonetheless, this set of stimuli and the associated paradigm have been used in prior studies to address questions about Klein’s hierarchy of geometries (Chen, 2005; Wang et al., 2007; Meng et al., 2019). Given its historical significance and the importance of ensuring comparability with previous research, we adopted this set of stimuli despite its imperfections. Other limitations of this paradigm are discussed in the Results section (“The paradigm of ‘configural superiority effects’ with reaction time measures”), and optimized experimental designs were implemented in Experiments 2, 3, and 4 to produce more reliable results.
(3) I wondered if the results would hold up for stimuli that were more diverse. It seems that a determined experimenter could easily design an "adversarial" version of these experiments for which the results would be unlikely to replicate. For instance: In the orientation group in Experiment 1, what if the odd-one-out was rotated 90 degrees instead of 180 degrees? Intuitively, it seems like this trial type would now be much easier, and the pattern observed here would not hold up. If it did hold up, that would provide stronger support for the authors' theory.
It is not enough, in my opinion, to simply have some confirmatory evidence of this theory. One would have to have thoroughly tested many possible ways that theory could fail. I'm unsure that enough has been done here to convince me that these ideas would hold up across a more diverse set of stimuli.
We thank you for this suggestion to validate our results using more diverse stimuli. However, the limitations of Experiment 1 make it less suitable for rigorous testing of diverse or "adversarial" stimuli. In addition to the limitation discussed in response to (2), another issue is that participants may rely on grouping effects among shapes in the quadrants, rather than solely extracting the geometrical invariants that are the focus of our study. As a result, the reaction times measured in this paradigm may not exclusively reflect the extraction time of geometrical invariants but could also be influenced by these grouping effects.
Therefore, we have shifted our focus to the improved design used in Experiment 2 to provide stronger evidence for our theory. Building on this more robust design, we have extended our investigations to study location generalization (revised Experiment 3) and long-term learning effects (revised Figure 6—figure supplement 2). These enhancements allow us to provide stronger evidence for our theory while addressing potential confounds present in Experiment 1.
While we did not explicitly test the 90-degree rotation scenario in Experiment 1, future studies could employ a more diverse set of stimuli within the Experiment 2 framework to better understand the limits and applicability of our theoretical predictions. We appreciate this suggestion, as it offers a valuable direction for further research.
Reviewer #1 (Recommendations For The Authors):
Major comments:
- A concise introduction to the Erlangen program, geometric invariants, and their structural stability would greatly enhance the paper. This would not only clarify these concepts for readers unfamiliar with them but also provide a more intuitive explanation for the choice of tasks and stimuli used in the study.
- I recommend adding a section that discusses how this new framework aligns with previous observations in VPL, especially those involving more classical stimuli like Gabors, random dot kinematograms, etc. This would help in contextualizing the framework within the broader spectrum of VPL research.
- Exploring how each level of invariant stability transfers within itself would be an intriguing addition. Previous theories often consider transfer within a condition. For instance, in an orientation discrimination task, a challenging training condition might transfer less to a new stimulus test location (e.g., a different visual quadrant). Applying a similar approach to examine how VPL generalizes to a new test location within a single invariant stability level could provide insightful contrasts between the proposed theory and existing ones. This would be particularly relevant in the context of Experiment 2, which could be adapted for such a test.
- I suggest including some example learning curves from the human experiment for a clearer demonstration of the differences in the learning rates across conditions. Easier conditions are expected to be learned faster (i.e., plateau faster at a higher accuracy level). The learning speed is reported for the DNN but not for the human subjects.
- In the modeling section, it would be beneficial to focus on offering an explanation for the observed generalization as a function of the stability of the invariants. As it stands, the neural network model primarily demonstrates that DNNs replicate the same generalization pattern observed in human experiments. While this finding is indeed interesting, the model currently falls short of providing deeper insights or explanations. A more detailed analysis of how the DNN model contributes to our understanding of the relationship between invariant stability and generalization would significantly enhance this section of the paper.
Minor comments:
- Line 46: "it is remains" --> "it remains"
- Larger font sizes for the vertical axis in Figure 6B would be helpful.
We thank you for your detailed and constructive comments, which have significantly helped us improve the clarity and rigor of our manuscript. Below, we provide a response to each point raised.
Major Comments
(1) A concise introduction to the Erlangen program, geometric invariants, and their structural stability:
We appreciate your suggestion to provide a clearer introduction to these foundational concepts. In the revised manuscript, we have added a dedicated section in the Introduction that offers a concise explanation of Klein’s Erlangen Program, including the concept of geometric invariants and their structural stability. This addition aims to make the theoretical framework more accessible to readers unfamiliar with these concepts and to better justify the choice of tasks and stimuli used in the study.
(2) Contextualizing the framework within the broader spectrum of VPL research:
We have expanded the Discussion section to better integrate our framework with previous VPL studies that reported generalization, including those using classical stimuli such as Gabors (Dosher and Lu, 2005; Hung and Seitz, 2014; Jeter et al., 2009; Liu and Pack, 2017; Manenti et al., 2023) and random dot kinematograms (Chang et al., 2013; Chen et al., 2016; Huang et al., 2007; Liu and Pack, 2017). In particular, we now discuss the similarities and differences between our findings and these earlier studies, exploring potential shared mechanisms underlying VPL generalization across different types of stimuli. These additions aim to contextualize our framework within the broader field of VPL research and highlight its relevance to existing literature.
(3) Exploring transfer within each invariant stability level:
In response to this insightful suggestion, we have added a new psychophysics experiment in the revised manuscript (Experiment 3) to examine how VPL generalizes to a new test location within the same invariant stability level. This experiment provides an opportunity to further explore the neural substrates underlying VPL of geometrical invariants, offering a contrast to existing theories and strengthening the connection between our framework and location generalization findings in the VPL literature.
(4) Including example learning curves from the human experiments:
We appreciate your suggestion to include learning curves for human subjects. In the revised manuscript, we have added learning curves of long-term VPL (see revised Figure 6—figure supplement 2) to track the temporal learning processes across invariant conditions. Interestingly, and in contrast to the results reported in the DNN simulations, these curves show that less stable invariants are learned faster and exhibit greater magnitudes of learning. We interpret this discrepancy as a result of differences in initial performance levels between humans and DNNs, as discussed in the revised Discussion section.
(5) Offering a deeper explanation of the DNN model's findings:
We acknowledge your concern that the modeling section primarily demonstrates that DNNs replicate human generalization patterns without offering deeper mechanistic insights. To address this, we have expanded the Results and Discussion sections to more explicitly interpret the weight change patterns observed across DNN layers in relation to invariant stability and generalization. We now discuss how the model helps explain the observed generalization within and across invariants of differing stability, and how it generates testable predictions about the neural mechanisms underlying these effects.
Minor Comments
(1) Line 46: Correction of “it is remains” to “it remains”:
We have corrected this typo in the revised manuscript.
(2) Vertical axis font size in Figure 6B:
We have increased the font size of the vertical axis labels in revised Figure 8B for improved readability.
Reviewer #2 (Recommendations For The Authors):
(1) There are many details throughout the paper that are confusing, such as the caption for Figure 4, which does not appear to correspond to what is shown (and is perhaps a copy-paste of the caption for Experiment 1?). Similarly, I wasn't sure about many methodological details, like: How did participants make their second response in Experiment 2? It says somewhere that they pressed the corresponding key to indicate which one was the target, but I didn't see anything explaining what that meant. Also, I couldn't tell whether the items in the figures were representative of all trials; the stimuli were described only minimally in the paper.
(2) The language in the paper felt slightly off at times, in minor but noticeable ways. Consider the abstract. The word "could" in the first sentence is confusing, and, more generally, that first sentence is actually quite vague (i.e., it just states something that would appear to be true of any perceptual system). In the following sentence, I wasn't sure what was meant by "prior to be perceived in the visual system". Though I was able to discern what the authors were intending to say most times, I was required to "read between the lines" a bit. This is not to fault the authors. But these issues need to be addressed, I think.
(1) We sincerely apologize for the oversight regarding the caption for (original) Figure 4, and thank you for pointing out this error. In the revised manuscript, we have corrected the caption for Figure 4 (revised Figure 5) and ensured it accurately describes the content of the figure. Additionally, we have strengthened the descriptions of the stimuli and tasks in both the Materials and Methods section and the captions for (revised) Figures 4 and 5 to provide a clearer and more comprehensive explanation of Experiment 2. These revisions aim to help readers fully understand the experimental design and methodology.
(2) We appreciate your feedback regarding the clarity and precision of the language in the manuscript. We acknowledge that some expressions, particularly in the abstract, were unclear or imprecise. In the revised manuscript, we have rewritten the abstract to improve clarity and ensure that the statements are concise and accurately convey our intended meaning. Additionally, we have thoroughly reviewed the entire manuscript to address any other instances of ambiguous language, aiming to eliminate the need for readers to "read between the lines." We are grateful for your suggestions, which have helped us enhance the overall readability of the paper.