Correspondence of neural network and task graph representations of two tasks.

Neural network level: shows the processing units in a neural network that implement two tasks, each of which maps the two stimulus units in each of two separate stimulus sets to the two response units of a shared response set. Graph: shows the corresponding task graph, in which each node corresponds to the processing units in a given set (outlined in blue) in the neural network, and each edge (red and green arrows) represents the set of connections (shown in gray) between the sets of processing units represented by the corresponding nodes. Note that in this task graph, we illustrate only the mapping from internal (“hidden”) representations corresponding to two sources of stimulus information to the set of response representations that are shared by them. For a more complete treatment that includes the relationship of external stimuli to their internal representations, see [57].

Task Graphs for the Stroop Paradigm.

Examples of task graphs showing the relationship between representations of stimulus and response sets in: (a) simple version of the standard Stroop paradigm, involving color naming and word reading; and (b) an extended version that includes an additional word pointing task. Note that shaded boxes correspond to nodes of the task graph (light gray for input nodes and pale green for output nodes), with each node comprised of a set of processing units (dark gray circles) that represent individual stimuli or responses in the set, corresponding to the processing units of the neural network implementation, and edges representing the aggregate set of associations (connection weights) between processing units that comprise a task processing pathway (see text and Figure 1 for further explanation).

Performance cost and control for single tasks.

(a) Efficacy p[P] for single task execution as a function of network depth for various values of β. (b) Automatic processing cost Ψ (solid lines) and ΔΨ (dots) for single task execution as a function of network depth for various values of β. Note that, for single tasks and sufficiently small values of β, the processing cost approaches and even reaches 0 (ΔΨ = Ψ). (c) Relationship between the processing cost reduction ΔΨ and the total amount of applied inhibition Δβ, which represents the control applied to modulate the basal inhibitions β.

Information theoretic analysis of automatic performance in the Stroop paradigm.

Panel a): Neural network model of automatic processing in the Stroop task (adapted from [27], showing the processing pathways for each task, but not the mechanism responsible for control). Excitatory connections are shown as black lines, inhibitory connections as orange lines, and line thickness represents connection weight (corresponding to |ω| at the network level; see Figure 1). The pathway on the left implements the stimulus-response mappings for the color naming task, and the pathway on the right the mappings for word reading (see text for fuller description). All units have an inhibitory bias (corresponding to β) = 500. Panels b) and c): Efficacy values and processing costs, respectively, calculated for automatic performance under each of the three task conditions (see text) using the information theoretic formulation of the task corresponding to the network shown in Panel a) (see text and Equations 9-13). Network parameters are reported in Table I in Appendix C.

Possible configurations of interference among tasks.

Blue nodes show stimulus and response sets, and colored edges show tasks formed by various mappings between them. (Note that here, as in Figure 1, we label the input nodes of the task graph as Hi, on the assumption that each receives a mapping from a different stimulus set (as in the model of the Stroop paradigm shown in Figure 4), and thus is sufficient to capture the neural network-level effects of shared representations at the hidden and/or response levels.) Green and yellow edges designate tasks that do not share representations with (i.e., are independent of) one another and thus can be performed in parallel. Red edges designate mappings that share a set of representations with another task, and thus either introduce dependencies among those tasks (panels b, d and e), or do not constitute a legal (independent) task (panel c; see Section II B or [57]). Different panels show examples of mappings associated with different types of dependencies (see text), arranged from left to right according to extent of representational sharing.

Task efficacies in multitask settings.

Efficacies for individual tasks (calculated using Eq. 18) in the multitask settings shown in Figure 5d (reproduced in panel a) and 5e (reproduced in panel d), as a function of the strength (ω) of the tasks (shown in red) responsible for structural and/or functional interference in each configuration. Panels b) and c) refer to the configuration in panel a). Panels e) and f) refer to the configuration in panel d). The efficacy curves for each task are shown in the corresponding colors in the plots. For completeness and for comparison with the bottom row, the right panel of the top row also shows the curve for f12, although it is zero for all values of ω21. Note that in this case the presence of f12 changes the efficacy for f22. For these examples, β = 1.2, ω11 = ω22 = 1, ω12 = .5.

Dependencies among task pathways in multilayered task graphs.

Examples of two task pathways (green and orange) in a task graph that also implements other task pathways (blue). Panels show configurations illustrating different dependency relationships between the green and orange pathways: a) independence; b) structural dependence, due to a shared node (outlined in red); c) functional dependence, due to other pathways that connect them (red dashed edges); and d) both types of dependence.
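
The distinctions drawn in this caption can be sketched in a few lines of Python. This is only an illustrative simplification, not the paper's procedure: it assumes each pathway is given as a list of directed edges, treats structural dependence as any shared node, and treats functional dependence as any edge outside both pathways that directly links a node of one to a node of the other. The function names and node labels are hypothetical.

```python
from typing import Iterable, List, Set, Tuple

Edge = Tuple[str, str]


def pathway_nodes(pathway: Iterable[Edge]) -> Set[str]:
    """Collect every node touched by the edges of a pathway."""
    nodes: Set[str] = set()
    for source, target in pathway:
        nodes.add(source)
        nodes.add(target)
    return nodes


def classify_dependence(graph_edges: List[Edge],
                        path_a: List[Edge],
                        path_b: List[Edge]) -> str:
    """Label a pair of pathways (assumed, simplified criteria).

    structural: the pathways share at least one node.
    functional: some edge that belongs to neither pathway directly
                connects a node of one pathway to a node of the other.
    """
    nodes_a = pathway_nodes(path_a)
    nodes_b = pathway_nodes(path_b)
    structural = bool(nodes_a & nodes_b)

    own_edges = set(path_a) | set(path_b)
    functional = any(
        (u in nodes_a and v in nodes_b) or (u in nodes_b and v in nodes_a)
        for (u, v) in graph_edges
        if (u, v) not in own_edges
    )

    if structural and functional:
        return "both"
    if structural:
        return "structural"
    if functional:
        return "functional"
    return "independent"


# Hypothetical three-layer example with two pathways.
green = [("S1", "H1"), ("H1", "R1")]
orange = [("S2", "H2"), ("H2", "R2")]

print(classify_dependence(green + orange, green, orange))
# -> independent
print(classify_dependence(green + orange + [("H2", "R1")], green, orange))
# -> functional
```

Only single-edge cross links are checked here; longer connecting pathways in deeper graphs would need a reachability search instead.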

Frequency of dependency types for multitasks as a function of graph depth.

Each point shows the mean proportion (with its standard deviation) of multitasks composed of two tasks that are independent (white), structurally dependent (black), or functionally dependent (gray). Results are for 1000 simple graphs with 10 nodes per layer and edges between units across adjacent layers assigned uniformly at random, with a fixed density per layer pair. Note that the proportion of independent pairs decreases rapidly with graph depth, and the proportion of dependent pairs increases correspondingly.
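
One way such random layered graphs could be generated is sketched below. It is a minimal sketch under stated assumptions: the density value of 0.2 is a placeholder (the caption does not report the value used), "fixed density" is read as a fixed number of edges per adjacent layer pair, and the function name and node encoding are hypothetical.

```python
import random
from itertools import product
from typing import List, Optional, Tuple

Node = Tuple[int, int]          # (layer index, unit index within layer)
Edge = Tuple[Node, Node]


def random_layered_graph(depth: int,
                         width: int = 10,
                         density: float = 0.2,
                         seed: Optional[int] = None) -> List[Edge]:
    """Sample a simple layered graph with edges only between adjacent layers.

    For each adjacent layer pair, round(density * width**2) distinct edges
    are drawn uniformly at random without replacement, so no edge repeats.
    """
    rng = random.Random(seed)
    edges_per_pair = round(density * width * width)
    edges: List[Edge] = []
    for layer in range(depth - 1):
        candidates = list(product(range(width), repeat=2))
        for i, j in rng.sample(candidates, edges_per_pair):
            edges.append(((layer, i), (layer + 1, j)))
    return edges


# Example: one graph with 4 layers of 10 nodes each.
graph = random_layered_graph(depth=4, seed=0)
print(len(graph))  # 3 layer pairs x 20 edges each
```

Repeating this for 1000 seeds per depth and classifying task pairs in each graph would give the kind of proportions summarized in the figure.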

Pathway selection in a multitask graph.

a) The full task graph for a multitask graph, 𝒰, showing the pathways (in orange and green) for two tasks in a specified multitask, ℳ, and the mappings for other tasks that introduce functional dependence between them (shown in red) and others that do not pose a risk of conflict (shown in blue). b) The pathways in the graph relevant for the multitask ℳ: the two highlighted paths (orange and green, respectively) plus the edges in red, which are included because of the functional dependence they introduce. Irrelevant pathways are shown faded.

Examples of multitasks involving dependent tasks with different patterns of pathway strengths.

Two multitask configurations, ℳ, each involving two tasks (orange and green pathways): a) the two pathways are composed of a mixture of stronger and weaker edges (SW in text), or b) the two pathways are composed of edges all of which have weak weights (WW in text). In both cases, the two tasks in ℳ are functionally dependent as a result of another pathway (red dashed edges).

Performance costs for multitasks involving tasks with different relative strengths.

Performance costs (automatic: Ψ; control-optimized: Ψ) for multitasking pairs of tasks chosen to have: a) both strong and weak pathways (SW; see example in Fig. 10a); and b) only weak pathways (WW; see example in Fig. 10b). In each case, performance costs are shown for strictly automatic execution (darker points) and control-optimized execution (lighter points). Note that, for graphs of all sizes, costs are lower (performance efficacy is greater) for SW pairs relative to WW pairs, both for automatic and control-optimized execution. However, c) the fractional reduction in performance cost (i.e., improvement in efficacy) as a function of control (given by Eq. 30) is considerably greater for WW pairs (violet and light blue) than for SW pairs (red and orange), although SW pairs exhibit greater variance in this effect.

Differences in optimal control policies.

Distributions of optimal control parameter sets {Δβ} (panel a) and } (panel b) for the WW and SW cases.

Optimal serialization for three tasks.

Frequency of types of partitioning of three tasks into subsets (Θi covers) to minimize overall performance cost Ψ (i.e., optimize efficacy) in graphs of various depths (3-7 layers), in which the balance of strength among the three tasks is varied in a manner similar to Figure 12: a) two strong and one weak; or b) one strong and two weak. Each plot shows the frequency with which each type of partitioning of the three tasks (cover Θi) yields the best Ψ: all tasks in a single set and thus executed in parallel (3 tasks); two tasks in one subset executed in parallel, and in series with the remaining task executed on its own (2 + 1 tasks); or all three tasks executed in series (1 + 1 + 1 tasks). Results were obtained from 1000 graph instances for each depth, with randomly assigned weights and sets of tasks of each type chosen from them.
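
The three partition types compared in this figure can be enumerated directly. The sketch below is illustrative only: it lists every way of splitting a task set into subsets, labels each by its block sizes, and picks the one that minimizes a user-supplied cost. The `cost` callable is a hypothetical stand-in for the overall performance cost Ψ, which is not reproduced here, and the order in which serial subsets run is ignored.

```python
from collections import Counter
from typing import Callable, Iterator, List, Sequence

Partition = List[List[str]]


def set_partitions(items: Sequence[str]) -> Iterator[Partition]:
    """Yield every partition of items into non-empty, unordered blocks."""
    items = list(items)
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        # Option 1: the first item runs on its own.
        yield [[first]] + partition
        # Option 2: the first item joins one of the existing blocks.
        for i in range(len(partition)):
            yield partition[:i] + [[first] + partition[i]] + partition[i + 1:]


def partition_type(partition: Partition) -> str:
    """Label a partition by block sizes, e.g. '2 + 1 tasks'."""
    sizes = sorted((len(block) for block in partition), reverse=True)
    return " + ".join(str(size) for size in sizes) + " tasks"


def best_partition(tasks: Sequence[str],
                   cost: Callable[[Partition], float]) -> Partition:
    """Return the partition with the lowest cost (cost is user-supplied)."""
    return min(set_partitions(tasks), key=cost)


tasks = ["T1", "T2", "T3"]
counts = Counter(partition_type(p) for p in set_partitions(tasks))
print(counts["3 tasks"], counts["2 + 1 tasks"], counts["1 + 1 + 1 tasks"])
# -> 1 3 1
```

For three tasks there are five candidate partitions in total, so exhaustive comparison is cheap; the count grows quickly for larger task sets.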

Total cumulative reward accrued under different automatization policies in the extended Stroop task, with a fixed learning rate of λ = .001.

a) Task graph for the extended Stroop task (based on the graph shown in Figure 2b) in which color naming relies on S1 → H1 → R1 (T1) and word pointing on S2 → H2 → R2 (T2, with functional dependence induced by word reading due to edge H2 → R1); but here with the addition of an internal representation node (H3) that provides an alternative, initially weak pathway for word pointing, S2 → H3 → R2 (T3), that is independent of the color naming pathway T1. b) Plots of the total accumulated reward (R) over executions (proportional to T) for different practice policies (see text for description). Note that R is expressed in log units, and thus linear accumulation of reward (e.g., in the no practice condition) follows a standard logarithmic form.
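
The remark about log units can be made explicit with a one-line derivation. As an illustrative assumption (not stated in the caption), suppose reward accrues at a constant amount r per execution, where r is a hypothetical symbol introduced only for this sketch:

```latex
R(T) = r\,T
\quad\Longrightarrow\quad
\log R(T) = \log r + \log T
```

Plotted against T on a linear axis, the log of a linearly growing reward therefore traces a logarithmic curve, shifted vertically by log r.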

Choice of execution policy for different time horizons and discount rates, with a fixed learning rate of λ = .001 (same as used in Figure 14).

Frequency of execution of the serial policy (T1 and T2 in series) versus frequency of execution of the new parallel pathway (T1 and T3) (see text for description) for an agent with four possible horizon times (Th = 0, 20, 100, 500) and three discount rates (α = 0, 0.2, 0.5, in panels a, b and c, respectively). For longer horizons the preferred policies shift toward automatization (i.e., parallel execution of T1, T3), but this interacts with discount rate, such that automatization is favored to a greater extent when future value is discounted less.

Edge list for the Stroop task network in Figure 4. Here ωw = 1.5, ωm = 2, ωs = 3, and κ = 1.2. For each realization, we add .5 ∗ ϵ, where ϵ is noise sampled uniformly from (0, 1). The β values were all set to 3 in this case.

Edge weights for the graph in Figure 14. Here ωw = 1.5, ωs = 6, and κ = 1.8. For each realization, we add .5 ∗ ϵ, where ϵ is noise sampled uniformly from (0, 1).
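
The noise scheme described in these table captions might be sketched as follows. Several details are assumptions: the edge names are hypothetical (the tables themselves are not reproduced here), noise is drawn independently for each edge, and κ is left out because its role is not described in the captions. Note also that Python's `random()` samples from [0, 1) rather than the open interval (0, 1).

```python
import random
from typing import Dict, Optional, Tuple

Edge = Tuple[str, str]


def noisy_weights(base_weights: Dict[Edge, float],
                  noise_scale: float = 0.5,
                  seed: Optional[int] = None) -> Dict[Edge, float]:
    """Return one realization: each base weight plus noise_scale * epsilon.

    epsilon is drawn uniformly from [0, 1), independently per edge
    (an assumption; the captions do not say how noise is shared).
    """
    rng = random.Random(seed)
    return {edge: weight + noise_scale * rng.random()
            for edge, weight in base_weights.items()}


# Hypothetical edge list using the weak and strong weights quoted for Figure 14.
OMEGA_W, OMEGA_S = 1.5, 6.0
base = {
    ("S1", "H1"): OMEGA_S,
    ("H1", "R1"): OMEGA_S,
    ("S2", "H3"): OMEGA_W,
    ("H3", "R2"): OMEGA_W,
}

realization = noisy_weights(base, seed=1)
for edge, weight in realization.items():
    print(edge, round(weight, 3))
```

Calling `noisy_weights` with different seeds gives the separate realizations over which results would be averaged.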