Peer review process
Not revised: This Reviewed Preprint includes the authors’ original preprint (without revision), an eLife assessment, public reviews, and a provisional response from the authors.
Read more about eLife’s peer review process.

Editors
- Reviewing Editor: Joshua Gold, University of Pennsylvania, Philadelphia, United States of America
- Senior Editor: Joshua Gold, University of Pennsylvania, Philadelphia, United States of America
Reviewer #1 (Public Review):
Summary:
An extensive literature in cognitive neuroscience examines how humans and animals adjudicate between conflicting goals. However, despite decades of research on the topic, a clear computational account of control has been difficult to pin down. In this project, Petri, Musslick, & Cohen attempt to formalize and quantify the problem of control in the context of toy neural networks performing conflicting tasks.
This manuscript builds on the formalism introduced in Petri et al. (2021), "Topological limits to the parallel processing capability of network architectures", which describes a set of tasks as a graph in which input nodes (stimuli) are connected to output nodes (responses). Each edge in this graph links an input node to an output node, representing a "task"; e.g., a word-reading task connects the input node "word" to the output node "read". Cleverly, patterns of interference and conflict between tasks can be quantified from this graph. In the current manuscript, the authors extend this framework by converting these graphs into neural networks and a) allowing edges to be continuous rather than binary; b) introducing "hidden layers" of units between input and output nodes; and c) introducing a "control" signal that modulates edge weights. The authors then examine how, in such a network, optimal behavior may involve serial versus parallel execution of different sets of tasks.
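To make the tasks-as-edges idea concrete for other readers, here is a minimal sketch of the graph formalism and of structural interference (my own illustration, with invented task names; not the authors' code):

```python
from itertools import combinations

# Hypothetical Stroop-like task set (illustrative, not from the paper):
# each task is an edge from a stimulus (input) node to a response (output) node.
tasks = {
    "color_naming":   ("color", "verbal"),
    "word_reading":   ("word",  "verbal"),
    "color_pointing": ("color", "manual"),
}

def interfering_pairs(task_set):
    """Pairs of tasks that share an input or an output node.

    In the graph formalism, this structural overlap is what generates
    interference and limits which tasks can be executed in parallel.
    """
    pairs = []
    for (a, (in_a, out_a)), (b, (in_b, out_b)) in combinations(task_set.items(), 2):
        if in_a == in_b or out_a == out_b:
            pairs.append((a, b))
    return pairs

print(interfering_pairs(tasks))
# [('color_naming', 'word_reading'), ('color_naming', 'color_pointing')]
```

In this toy example, color naming conflicts with word reading (shared "verbal" output) and with color pointing (shared "color" input), while word reading and color pointing share no node and could in principle run in parallel.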
Strengths:
There is a longstanding belief in cognitive neuroscience that "control" manages conflicts by scheduling tasks to be executed in parallel versus serially; I applaud the efforts of the authors to give these intuitions a more concrete computational grounding.
My main scientific concern is that the authors focus on what seems like an arbitrary set of network architectures. The networks considered here are derived by converting task graphs, which represent a multitasking problem, into networks for _performing_ that multitasking problem. Frankly, these networks do not look like any neural network a computer scientist would use to actually solve a problem, nor do they seem biologically realistic. Furthermore, adding hidden layers to these networks only ever seems to make performance worse (Figures 4 and 11), introducing unnecessary noise and interference; it would seem more informative to study a network architecture in which hidden layers fulfilled some useful purpose (as they do in the brain and in machine learning).
However, this scientific concern is secondary to the major problem with this paper, which is clarity.
Major problem: A lack of clarity
I found this paper extremely difficult to read. To illustrate my difficulty, I will describe a subset of my confusion.
The authors define the "entropy" of an action in equation 1, but the content of the equation gives what is sometimes referred to as the "surprisal" of the action. Conventionally (as per Wikipedia and any introductory textbook I am familiar with), entropy is the "expected surprisal" of a random variable, not the surprisal of a single action. This creates immediate confusion going into the results. Furthermore, defining "entropy" this way means that "information" is functionally equivalent to accuracy for the purposes of this paper, in which case I do not know what has been gained by this excursion into (non-standard) information-theoretic terminology.
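To spell out the standard textbook convention I am referring to (standard notation, not the paper's): equation 1 matches the surprisal of a single outcome, whereas entropy is the expectation of that quantity over the distribution:

```latex
% Surprisal of a single action a:
I(a) = -\log p(a)
% Entropy of the random variable A, i.e. its expected surprisal:
H(A) = \mathbb{E}\left[ I(A) \right] = -\sum_{a} p(a) \log p(a)
```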
They next assert that equation 1 is the information _cost_ of an action. No motivation is given for this statement and I do not know what it means. In what sense is a "cost" associated with the negative logarithm of a probability?
In the next section, II.B, the authors introduce a new formalism in which responses are represented by task graph nodes _R_. What is the relationship between an action _a_ and the responses _R_? Later, in section II.C, edges _f_ in the task graph are seemingly used as drop-in replacements for actions _a_.
I simply have no idea what is going on in equations 31 through 33. Where are the functions _R_ (not to be confused with the response nodes _R_) and _S_ defined? Or how are they approximated? What does the variable _t_ mean and why does it appear and disappear from equations seemingly at random?
Response times seem to be important, but as far as I can tell, nowhere do the authors actually describe how response times are calculated for the simulated networks.
Similar issues persist through the rest of the paper: unconventional formalism is regularly introduced using under-explained notation and without a clear relationship to the scientific questions at hand. As a result, the content and significance of the findings are largely inscrutable to me, and I suspect also to the vast majority of readers.
Reviewer #2 (Public Review):
Summary:
The authors develop a normative account of automaticity-control trade-offs using the mathematics of information theory, which they apply to abstract neural networks. They use this framework to derive optimal trade-off solutions under particular task conditions.
Strengths:
On the positive side, I appreciate the effort to rigorously synthesize ideas about multi-tasking within an information-theoretic framework. There is potentially a lot of promise in this approach. The analysis is quite comprehensive and careful.
Weaknesses:
Generally speaking, the paper is very long and dense. I don't in principle mind reading long and dense papers (though conciseness is a virtue); it becomes more of a slog when it's not clear what new insights are gained from laboring through the math. For example, after reading the Stroop section, I wasn't sure what new insight was provided by the information-theoretic formalism that goes beyond earlier models. Is this just an elegant formalism for expressing previously conceived ideas, or is there something fundamentally new here that's not predicted by other frameworks? The authors cite multiple related frameworks addressing the same kinds of data, but there is no systematic comparison of predictions or theoretical interpretations. Even in the Discussion, where related work is directly addressed, I didn't see much explanation of how different models make different predictions, or even what predictions any of them make.
After a discussion of the Stroop task early in the paper, the analysis quickly becomes disconnected from any empirical data. The analysis could be much more impactful if it were more tightly integrated with relevant empirical data.