Peer review process
Not revised: This Reviewed Preprint includes the authors’ original preprint (without revision), an eLife assessment, public reviews, and a provisional response from the authors.
Read more about eLife’s peer review process.
Editors
- Reviewing Editor: Timothy Hanks, University of California, Davis, Davis, United States of America
- Senior Editor: Michael Frank, Brown University, Providence, United States of America
Reviewer #1 (Public review):
I have to preface my evaluation with a disclosure that I lack the mathematical expertise to fully assess what seems to be the authors' main theoretical contribution. I am providing this assessment to the best of my ability, but I cannot substitute for a reviewer with more advanced mathematical/physical training.
Summary:
This paper describes a new theoretical framework for measuring parsimony preferences in human judgments. The authors derive four metrics that they associate with parsimony (dimensionality, boundary, volume, and robustness) and measure whether human adults are sensitive to these metrics. In two tasks, adults had to choose which of two flower beds a statistical sample was generated from, with or without explicit instruction to choose the flower bed perceptually closest to the sample. The authors conduct extensive statistical analyses showing that humans are sensitive to most of the derived quantities, even when the instructions encouraged participants to choose based only on perceptual distance. The authors complement their study with a computational neural network model that learns to make judgments about the same stimuli with feedback. They show that the computational model is sensitive to the task communicated by feedback and uses the parsimony-associated metrics only when feedback trains it to do so.
Strengths:
(1) The paper derives and applies new mathematical quantities associated with parsimony. The mathematical rigor is very impressive and is much more extensive than in most other work in the field, where studies often adopt only one metric (such as the number of causes or parameters). These formal metrics can be very useful for the field.
(2) The studies are preregistered, and the statistical analyses are strong.
(3) The computational model complements the behavioral findings, showing that the derived quantities are not simply equivalent to maximum-likelihood inference in the task.
(4) The speculations in the discussion section (e.g., the idea that human sensitivity is driven by the computational demands each metric requires) are intriguing and could usefully guide future work.
Weaknesses:
(1) The paper is very hard to understand. Many of the key details of the derived metrics are in the appendix, with very little accessible explanation in the main text. The figures helped me understand the metrics somewhat, although I am still not sure how some of them (such as boundary or robustness as measured here) are linked to parsimony. I understand that this is addressed by the derivations in the appendix, but as a computational cognitive scientist, I would have benefited from more accessible explanations. Important aspects of the human studies are also missing from the main text, such as the sample size for Experiment 2.
(2) It is not fully clear whether participants' sensitivity to some of the quantities, which is convincingly reported here, actually means that participants preferred shapes according to the corresponding aspect of parsimony. The title and framing suggest that parsimony "guides" human decision-making, which may lead readers to conclude that humans prefer more parsimonious shapes. I am not sure the sensitivity findings alone support this framing, but this might just be my misunderstanding of the analyses.
(3) The stimulus set included only four combinations of shapes, each designed to diagnostically target one of the theoretical quantities. It is unclear whether the results are robust or specific to these particular 4 stimuli.
(4) The study is framed as measuring "decision-making," but the task resembles statistical inference (e.g., which shape generated the data) or perceptual judgment. This is a minor point since "decision-making" is not well defined in the literature, yet the current framing in the title gave me the initial impression that humans would be making preference choices and learning about them over time with feedback.
Reviewer #2 (Public review):
This manuscript presents a sophisticated investigation into the computational mechanisms underlying human decision-making, and it presents evidence for a preference for simpler explanations (Occam's razor). The authors dissect the simplicity bias into four different components, and they design experiments to target each of them by presenting choices whose underlying models differ only in one of these components. In the learning tasks, participants must infer a "law" (a logical rule) from observed data in a way that operationalizes the process of scientific reasoning in a controlled laboratory setting. The tasks are complex enough to be engaging but simple enough to allow for precise computational modeling.
As a further novel feature, the authors derive an additional term in the expansion of the log-evidence, which arises from boundary effects. This is combined with a choice model, which is what is tested in the experiments. Experiments are run both with humans and with artificial intelligence agents, showing that humans have an enhanced preference for simplicity compared to artificial neural networks.
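For context, a minimal sketch of how expansions of this kind are usually written follows; this is based on standard results in the literature and is not necessarily the manuscript's exact expression. In particular, the boundary correction B is indicated only schematically, since its derivation is the authors' novel contribution and is not reproduced here.

```latex
% Sketch of a standard Fisher-information approximation to the negative
% log-evidence of a model M given data D. Here \hat{\theta} is the
% maximum-likelihood estimate, d the number of parameters, N the number of
% observations, g the Fisher information metric, h the observed Fisher
% information, and B(\hat{\theta}) a schematic boundary correction.
-\log P(D \mid M) \;\approx\;
  -\log P(D \mid \hat{\theta})                                                    % goodness of fit
  + \underbrace{\tfrac{d}{2}\log\tfrac{N}{2\pi}}_{\text{dimensionality}}
  + \underbrace{\log \int \mathrm{d}\theta\, \sqrt{\det g(\theta)}}_{\text{volume}}
  + \underbrace{\tfrac{1}{2}\log\tfrac{\det h(\hat{\theta})}{\det g(\hat{\theta})}}_{\text{robustness}}
  + \underbrace{B(\hat{\theta})}_{\text{boundary}}
```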
Overall, the work is well written, interesting, and timely, bridging concepts in statistical inference and human decision making. Although technical details are rather elaborate, my understanding is that they represent the state of the art.
I have only one main comment that I think deserves further discussion. Computing the complexity penalty of models may be hard, and it is unlikely that humans can perform such a calculation on the fly. As the authors discuss in the final section, while the dimensionality term may be easier to compute, others (e.g., the volume term, which requires an integral) may be considerably harder (it is true that they need only be computed once and for all for each task, but still...). I wonder why the sensitivity of human decision-making to the different terms differs so much, and in particular whether it aligns with the computational simplicity of each term, or with the possibility of approximating each term by a simple heuristic. Indeed, the sensitivity to the volume term is significantly and systematically lower than that to the other terms. I wonder whether this relationship could be made more quantitative using neural networks, taking as a proxy for computational hardness the number of training samples needed to reach a given error level when learning each of these terms.
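To make this suggestion concrete, the sketch below illustrates one way such a sample-complexity proxy could be set up. Everything in it is an illustrative assumption: the target functions are synthetic placeholders standing in for the easier and harder complexity terms, and the network size, error criterion, and sample-size grid are arbitrary choices, not values taken from the manuscript.

```python
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)

def samples_to_criterion(target_fn, max_n=20_000, step=500, tol=0.01, n_test=2_000):
    """Smallest training-set size at which a small MLP reaches test MSE <= tol."""
    X_test = rng.uniform(-1, 1, size=(n_test, 4))
    y_test = target_fn(X_test)
    for n in range(step, max_n + 1, step):
        X_train = rng.uniform(-1, 1, size=(n, 4))
        y_train = target_fn(X_train)
        net = MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=2_000, random_state=0)
        net.fit(X_train, y_train)
        if mean_squared_error(y_test, net.predict(X_test)) <= tol:
            return n
    return None  # criterion not reached within max_n samples

# Synthetic placeholder targets of (presumably) increasing difficulty; in a real
# analysis these would be the actual complexity terms computed for each stimulus.
easy_term = lambda X: X[:, 0]                                    # linear, "count-like"
hard_term = lambda X: np.sin(3 * X[:, 0]) * np.cos(3 * X[:, 1])  # nonlinear, "integral-like"

print("samples to criterion (easy term):", samples_to_criterion(easy_term))
print("samples to criterion (hard term):", samples_to_criterion(hard_term))
```

The comparison of interest would then be whether the terms that require more samples to learn are also the ones to which human choices are least sensitive.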
Reviewer #3 (Public review):
Summary:
This is a very interesting paper documenting that humans use a variety of factors that penalize model complexity and that involve integrating over the possible set of parameters within each model. By comparison, trained neural networks also use these biases, but only on tasks where model selection was part of the reward structure. When training instead emphasized maximum-likelihood decisions, neural networks, but not humans, were able to adapt their decision-making; humans continued to rely on the simplicity biases that arise from integrating over model parameters.
Strengths:
This study used a pre-registered plan for analyzing the human data, which exceeds the standards of most other current studies.
The results are technically correct.
Weaknesses:
The presentation of the results could be improved.