Individuality transfer: Predicting human decision-making across tasks

Hiroshi Higashi

doi:10.7554/eLife.107163.1

1 Introduction

Humans (and other animals) exhibit substantial commonalities in their decision-making processes. However, considerable variability is also frequently observed in how individuals perform perceptual and cognitive decision-making tasks [5, 1]. This variability arises from differences in underlying cognitive mechanisms. For example, individuals may vary in their ability or tendency to retain past experiences [12, 8], respond to events with both speed and accuracy [52, 45], or explore novel actions [17]. If these factors can be meaningfully disentangled, they would enable a concise characterization of individual decision-making processes, yielding a low-dimensional, parameterized representation of individuality. Such a representation could, in turn, be leveraged to predict future behaviours at an individual level. Shifting from population-level predictions to an individual-based approach would mark a significant advancement in domains where precise behaviour prediction is essential, such as social and cognitive sciences. Beyond prediction, this approach offers a framework for parametrizing and clustering individuals, thereby facilitating the visualization of behavioural heterogeneity, which has applications in psychiatric analysis [31, 10]. Furthermore, this parameterization offers a promising pathway toward computational modeling at the individual level—that is, replicating the cognitive and functional characteristics of individuals in silico [41].

Cognitive modelling is a standard approach for reproducing and predicting human behaviour [29, 3, 57], often implemented within a reinforcement learning framework (e.g., [30, 9, 54]). However, because these cognitive models are manually designed by researchers, their ability to accurately fit behavioural data may be limited [16, 44, 28, 13]. A data-driven approach using artificial neural networks (ANNs) offers an alternative [11, 33, 39]. Unlike cognitive models, which rely on predefined behavioural assumptions [37], ANNs require minimal prior assumptions and can learn complex patterns directly from data. For instance, convolutional neural networks (CNNs) have successfully replicated human choices and reaction times in various visual tasks [25, 35, 15]. Similarly, recurrent neural networks (RNNs) [42, 7] have been applied to model value-guided decision-making tasks such as the multi-armed bandit problem [56, 10]. A promising approach to capturing individual decision-making tendencies while preserving behavioural consistency is to tune ANN weights using a parameterized representation of individuality.

This idea was first proposed by Dezfouli et al. [10], who employed an RNN to solve a two-armed bandit task. Their study utilized an autoencoder framework [38, 48], in which behavioural recordings from a single session of the bandit task, performed by an individual, were fed into an encoder. The encoder produced a low-dimensional vector, interpreted as a latent representation of the individual. Similar to hypernetworks [20, 22], a decoder then took this low-dimensional vector as input and generated the weights of the RNN. This framework successfully reproduced behavioural recordings from other sessions of the same bandit task while preserving individual characteristics. However, since this individuality transfer has only been validated within the bandit task, it remains unclear whether the extracted latent representation captures an individual’s intrinsic tendencies across a variety of tasks.

To address this question, we aim to make the low-dimensional representation, referred to as the individuality index, robust to variations across individuals and tasks, thereby enhancing its generalizability. Specifically, we propose a framework that predicts an individual’s behaviours, not only in the same task but also in similar yet distinct tasks and environments. If the individuality index indeed serves as a low-dimensional representation of an individual’s decision-making process, then extracting it from one task could facilitate the prediction of that individual’s behaviours in another.

In this study, we define the problem of individuality transfer across tasks as follows (also illustrated in Figure 1). We assume access to a behavioural dataset from multiple individuals performing two tasks: a source task and a target task. We train an encoder that takes behavioural data from the source task as input and outputs an individuality index. This index is then fed into a decoder, which generates the weights of an ANN, referred to as a task solver, that reproduces behaviours in the target task. For testing, a new individual provides behavioural data from the source task, allowing us to infer their individuality index. Using this index, a task solver is constructed to predict how the test individual will behave in the target task. Importantly, this prediction does not require any behavioural data from the test individual performing the target task. We refer to this framework as EIDT, an acronym for encoder, individuality index, decoder and task solver.

The EIDT (encoder, individuality index, decoder, and task solver) framework for individuality transfer across tasks.
The encoder maps action(s) α, provided by an individual K performing a specific problem ϕ in the source task A, into an individuality index (represented as a point in the two-dimensional space in the center). The individuality index is then fed into the decoder, which generates the weights for a task solver. The task solver predicts the behaviour of the same individual K in the target task B. During training, a loss function evaluates the discrepancy between the predicted behaviour and the actual recorded behaviour β of individual K. The encoder’s input is referred to as an *action sequence*, the form of which depends on the task. For example, in a sequential Markov decision process (MDP) task, an action sequence consists of an environment (state transition probabilities) and a sequence of actions over multiple episodes. For a digit recognition task, it consists of a stimulus digit image and the corresponding chosen response.

We evaluated whether the proposed EIDT framework can effectively transfer individuality in both value-guided sequential decision-making tasks and perceptual decision-making tasks. To assess its generalizability across individuals, meaning its ability to predict the behaviour of previously unseen individuals, we tested the framework using a test participant pool that was not included in the dataset used for model training. To determine how well our framework captures each individual’s unique behavioural patterns, we compared the prediction performance of a task solver specifically designed for a given individual with the performance of task solvers designed for other individuals. Our results indicate that the proposed framework successfully mimics decision-making while accounting for individual differences.

2 Results

We evaluated our individuality transfer framework, which consists of an encoder, a decoder, a task solver, and an individuality index, using two transfer problems: a value-guided sequential decision-making task (an MDP task) and a perceptual decision-making task (a handwritten digit recognition task, referred to as the MNIST task). The datasets for both tasks include two or more distinct experimental settings. Since each human participant completed all settings, we tested transfer problems in which one setting served as the source task, while another served as the target task.

2.1 Markov decision process (MDP) task

The dataset consisted of behavioural data from 98 participants, as detailed in Section 4.2.2. Participants performed the MDP task under two different conditions: a 2-step task and a 3-step task. Each participant completed three sequence blocks for both the 2- and 3-step tasks, with each sequence comprising 50 episodes. Thus, the dataset contained a total of N = 6 (sequences) × 98 (participants) = 588 (action sequences). For model training, sequence data from 70% of the participants (68 participants) were used to train the encoder and decoder. An additional 10% (10 participants) were designated as validation data, used to determine the optimal number of training iterations. The remaining 20% (20 participants) were allocated for evaluating the performance of the model.
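The participant-level split described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `split_participants`, the shuffling seed, and the rounding rule (round the test and validation shares, give the remainder to training) are assumptions, chosen because they reproduce the reported 68/10/20 counts.

```python
import random

def split_participants(participant_ids, valid_frac=0.1, test_frac=0.2, seed=0):
    """Split participant IDs into disjoint train/valid/test pools.

    Assumed rule: round the test and validation shares to the nearest
    integer and assign all remaining participants to training.
    """
    ids = list(participant_ids)
    random.Random(seed).shuffle(ids)  # shuffle participants, not sequences
    n_test = round(test_frac * len(ids))
    n_valid = round(valid_frac * len(ids))
    test = ids[:n_test]
    valid = ids[n_test:n_test + n_valid]
    train = ids[n_test + n_valid:]
    return train, valid, test

train, valid, test = split_participants(range(98))
print(len(train), len(valid), len(test))  # 68 10 20

# 6 sequences per participant (3 blocks x 2 task conditions)
n_action_sequences = 6 * 98
print(n_action_sequences)  # 588
```

Splitting by participant rather than by sequence keeps every sequence of a test participant out of training, which is what makes the test pool "previously unseen".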

Neural network exhibits superior performance in behaviour prediction

We evaluated our model under two experimental situations, summarized in Table 1. The first situation, Situation SX (no transfer), does not involve individuality transfer across tasks. Since individuality is not explicitly considered in this situation, we designed average models as baselines, described below.

Simulated experimental situations.
The “Availability” column indicates data availability for a given participant pool and task during model training. In the “Participant pool” column, “train”, “valid”, and “test” refer to the training, validation, and test participant pools, respectively.

Cognitive model (CG)

We implemented a cognitive model based on Q-learning, as detailed in Section 4.2.3. The model parameters—including the learning rate q_lr, forgetting factor q_ff, discount rate q_dr, and inverse temperature q_it—were estimated using a maximum likelihood approach based on behavioural data from the target task in the training and validation participant pools. Using these estimated parameters, we constructed a Q-learning agent to predict the behaviour of participants in the test participant pool.

Task solver (TS)

We employed a task solver TS(·), as defined in (1). Instead of using the EIDT framework, the weights of the task solver network were directly trained using data from the training participant pool. To determine the optimal stopping point during training, we used early stopping based on the loss measured on the validation participant pool—training was terminated when the validation loss reached its minimum. The trained task solver was then used to predict the behaviour of participants in the test participant pool.

For further details on the model training procedure, refer to Section 4.2.4.
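To make the cognitive baseline concrete, the following sketch computes the negative log-likelihood of a choice sequence under a Q-learning agent with the four parameters named above (q_lr, q_ff, q_dr, q_it). The exact update rule is given in Section 4.2.3 and is not reproduced here; the softmax policy, the decay of unchosen Q-values toward zero, and the trial format are assumptions made for illustration.

```python
import math

def softmax(q_values, q_it):
    """Action probabilities from Q-values with inverse temperature q_it."""
    top = max(q_values)  # subtract the max for numerical stability
    exps = [math.exp(q_it * (q - top)) for q in q_values]
    total = sum(exps)
    return [e / total for e in exps]

def negative_log_likelihood(trials, n_states, n_actions, q_lr, q_ff, q_dr, q_it):
    """Sum of -log p(chosen action) over trials under the assumed update rule.

    Each trial is (state, action, reward, next_state); next_state is None
    when the episode ends.
    """
    q = [[0.0] * n_actions for _ in range(n_states)]
    nll = 0.0
    for state, action, reward, next_state in trials:
        probs = softmax(q[state], q_it)
        nll -= math.log(probs[action])
        future = 0.0 if next_state is None else max(q[next_state])
        target = reward + q_dr * future
        for a in range(n_actions):
            if a == action:
                q[state][a] += q_lr * (target - q[state][a])
            else:
                q[state][a] *= 1.0 - q_ff  # assumed forgetting: decay toward 0
    return nll

# Two rewarded repetitions of the same choice in a single-state toy problem.
trials = [(0, 0, 1.0, None), (0, 0, 1.0, None)]
nll_random = negative_log_likelihood(trials, 1, 2, q_lr=0.5, q_ff=0.1, q_dr=0.9, q_it=0.0)
nll_greedy = negative_log_likelihood(trials, 1, 2, q_lr=0.5, q_ff=0.1, q_dr=0.9, q_it=5.0)
print(nll_random > nll_greedy)  # True
```

Maximum likelihood estimation would then minimise this quantity over the four parameters, for example with a grid search or a numerical optimiser.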

Under Situation SX, we evaluated whether a neural network-based behaviour prediction model (i.e., the task solver) outperforms the cognitive model in predicting average performance across individuals. The result (Figure 2A) confirmed this. A two-way (model: CG/TS, task: 2/3) repeated-measures (RM) ANOVA with Greenhouse-Geisser correction (significance level 0.05) revealed a significant effect of model (F_1,19 = 29.944, p < 0.001), indicating that the task solver provided significantly better predictions than the cognitive model. We found no significant effect of task (F_1,19 = 0.162, p = 0.692) and no significant interaction effect between model and task (F_1,19 = 3.144, p = 0.092).

Prediction error results for the MDP task.
A Under Situation SX. The box represents the interquartile range (IQR), with the central line indicating the median. Whiskers extend to the minimum and maximum values. Connected dots represent data from the same participant. B Under Situation SY. C Comparison between the transfer-to-original participant (Original) and transfer-to-other-participants (Others) settings.

EIDT framework excels in individuality transfer

Under Situation SY (transfer across tasks), we assumed that behavioural data from the source task of the test participant pool could be leveraged to predict behaviour in the target task. This situation was designed to evaluate individuality transfer across tasks, which is the primary focus of this study. We tested two models:

Cognitive model (CG)

We implemented a Q-learning-based cognitive model, as described in Section 4.2.3. The parameters (q_lr, q_ff, q_dr, and q_it) for a participant in the test pool were estimated using a maximum likelihood approach based on behavioural data from that participant’s source task. Using these estimated parameters, we constructed a Q-learning agent capable of solving the target task (not the source task) and used it to predict the participant’s behaviour in the target task.

EIDT

We employed our EIDT framework, which was trained using data from both the source and target tasks of the training participant pool. As with the TS model under Situation SX, data from the validation participant pool were used for early stopping during training. To predict behaviour in the target task for a participant K in the test participant pool, we first input each sequence from 𝒜_K (behavioural data from participant K in task A) into the encoder. The individuality index for participant K was computed as the average of the encoder outputs: $z_K = \frac{1}{|\mathcal{A}_K|} \sum_{(\alpha, \phi) \in \mathcal{A}_K} \mathrm{ENC}(\alpha, \phi; \Theta_{\mathrm{ENC}})$. The weights of the task solver for that participant were then generated using the decoder output: $\Theta_{\mathrm{TS}} = \mathrm{DEC}(z_K; \Theta_{\mathrm{DEC}})$. The task solver, incorporating these weights, was used to predict behaviour in response to a given problem ψ in ℬ_K.

For details on the model training procedure, refer to Section 4.2.4.
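The test-time procedure for the EIDT model (encode each source sequence, average the resulting indices, decode the weights, predict) can be sketched as below. The function names and the toy encoder, decoder, and solver are hypothetical stand-ins that only make the sketch executable; they are not the networks used in the study.

```python
def average_index(indices):
    """Element-wise mean of a list of individuality indices."""
    n = len(indices)
    return [sum(z[d] for z in indices) / n for d in range(len(indices[0]))]

def predict_for_participant(source_sequences, target_problems,
                            encoder, decoder, task_solver):
    """Infer the index from source-task data only, then predict target-task behaviour."""
    z_k = average_index([encoder(phi, alpha) for phi, alpha in source_sequences])
    weights = decoder(z_k)  # weights of the participant-specific task solver
    return z_k, [task_solver(psi, weights) for psi in target_problems]

# Hypothetical toy modules, for illustration only.
def toy_encoder(phi, alpha):
    return [sum(alpha) / len(alpha), float(len(alpha))]

def toy_decoder(z):
    return {"bias": z[0]}

def toy_solver(psi, weights):
    return weights["bias"]

z_k, predictions = predict_for_participant(
    [("env_1", [0, 1, 1, 1]), ("env_2", [0, 0, 1, 1])],  # source-task sequences
    ["env_b"],                                            # target-task problems
    toy_encoder, toy_decoder, toy_solver,
)
print(z_k, predictions)  # [0.625, 4.0] [0.625]
```

Note that no target-task behaviour of the test participant enters this function; only the problems ψ are supplied.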

We compared the prediction error between models, as shown in Figure 2B. A two-way (model: CG/EIDT, transfer task set: 2→3/3→2) RM ANOVA revealed a significant effect of model (F_1,19 = 21.372, p < 0.001) and a significant interaction effect (F_1,19 = 5.053, p = 0.037). However, there was no significant effect of transfer task set (F_1,19 = 0.330, p = 0.573). Pairwise comparisons indicated that the cognitive model exhibited significantly greater prediction error than the EIDT model for both transfer task sets: 3→2 (t₁₉ = 3.634, p = 0.002 with Bonferroni correction) and 2→3 (t₁₉ = 4.254, p < 0.001). These results demonstrate that the EIDT model predicts human behaviour more accurately (that is, with a lower negative likelihood) than the cognitive model.

Individuality index forms clusters for each individual

We visualized the individuality indices in Figure 3 and evaluated them using the within-individual and between-individual distances, as defined in Section 4.4.1. For both transfer task sets, the within-individual distance was significantly shorter than the between-individual distance (t₁₉ = −5.515, p < 0.001 for 3→2 and t₁₉ = −2.991, p = 0.008 for 2→3). Since the within-individual distance is shorter than the between-individual distance, the individuality indices from the same individual are positioned closer together in the individuality index space, forming a distinct cluster for each individual.
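One plausible way to compute such distances is sketched below. The precise definitions are those of Section 4.4.1, which is not reproduced here; this sketch assumes Euclidean distance, with the within-individual distance taken as the mean over pairs of indices from the same participant and the between-individual distance as the mean over pairs involving a different participant.

```python
import math
from itertools import combinations

def within_between(indices_by_person):
    """Return {person: (within, between)} mean Euclidean distances.

    Requires at least two indices per person and at least two persons.
    """
    result = {}
    for person, own in indices_by_person.items():
        within = [math.dist(a, b) for a, b in combinations(own, 2)]
        others = [z for p, zs in indices_by_person.items() if p != person for z in zs]
        between = [math.dist(a, b) for a in own for b in others]
        result[person] = (sum(within) / len(within), sum(between) / len(between))
    return result

# Two hypothetical participants with two indices each.
distances = within_between({
    "p1": [[0.0, 0.0], [0.0, 1.0]],
    "p2": [[3.0, 0.0], [3.0, 1.0]],
})
within_p1, between_p1 = distances["p1"]
print(within_p1 < between_p1)  # True
```

A paired comparison of the two values across participants then tests whether indices cluster by individual.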

Individuality indices in the test participant pool for the MDP tasks.
Dots represent the average individuality index for each participant, while shaded areas show confidence ellipses, enclosing 98.9% of the points for each participant.

We also evaluated the extent to which the variables of the individuality index are disentangled, using the Pearson correlation coefficient. A test of the null hypothesis of no correlation showed that the two variables (z₀ and z₁) are significantly correlated for the 3→2 transfer (r = 0.687, p < 0.001) and the 2→3 transfer (r = −0.323, p = 0.012).
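For reference, the correlation test can be sketched as follows: the Pearson coefficient r is converted to a t-statistic with n − 2 degrees of freedom. The sample size n = 60 used in the example is an assumption (20 test participants with 3 source sequences each); it is not stated in the text.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def t_statistic(r, n):
    """t = r * sqrt((n - 2) / (1 - r^2)), with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))

# Reported coefficient for the 2->3 transfer, with the assumed n = 60.
t_value = t_statistic(-0.323, 60)
print(round(t_value, 2))  # -2.6
```

Converting the t-statistic to a p-value requires the cumulative distribution of Student's t, which a statistics library would provide.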

Individuality is preserved across tasks

We evaluated whether the models could predict unique behavioural patterns for each individual. To do so, we compared prediction accuracy for a given participant’s behaviour using a task solver trained specifically for that participant with its accuracy using task solvers trained for other participants. That is, we examined accuracy for the original task solver (Original) versus accuracy for other task solvers (Others). If the task solver effectively captures individuality, it should perform significantly better for Original than for Others. For details on this comparison, refer to Section 4.4.2.
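The Original-versus-Others comparison can be illustrated with a loss matrix. This is one plausible reading of the comparison (the actual procedure is specified in Section 4.4.2); the matrix values below are invented for illustration.

```python
def original_vs_others(loss):
    """Summarise a square loss matrix.

    loss[i][j] is the prediction loss of the task solver built for
    participant i, evaluated on participant j's target-task behaviour.
    Returns per-participant (Original, Others) losses.
    """
    n = len(loss)
    original = [loss[j][j] for j in range(n)]
    others = [sum(loss[i][j] for i in range(n) if i != j) / (n - 1)
              for j in range(n)]
    return original, others

# Hypothetical losses for three participants.
loss = [
    [0.4, 0.7, 0.6],
    [0.8, 0.3, 0.7],
    [0.6, 0.9, 0.5],
]
original, others = original_vs_others(loss)
print(all(o < x for o, x in zip(original, others)))  # True
```

If the solvers capture individuality, the diagonal entries (Original) should be systematically lower than the off-diagonal averages (Others).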

The cognitive model exhibited this tendency (Figure 2C, left). A two-way (transfer task set: 3→2/2→3, transfer-to participant(s): Original/Others) RM ANOVA revealed a significant effect of transfer task set (F_1,19 = 5.383, p = 0.032) and a significant effect of transfer-to participant(s) (F_1,19 = 25.169, p < 0.001), while the interaction effect (F_1,19 = 0.047, p = 0.831) was not significant.

The EIDT model exhibited a similar pattern (Figure 2C, right). A two-way RM ANOVA revealed a significant effect of transfer-to participant(s) (F_1,19 = 26.976, p < 0.001) and a significant interaction effect (F_1,19 = 20.238, p < 0.001). There was no significant effect of transfer task set (F_1,19 = 0.201, p = 0.659). Bonferroni-corrected pairwise comparisons confirmed that prediction loss for Original was significantly lower than for Others in both transfer task sets: 3→2 (t₁₉ = 4.278, p < 0.001) and 2→3 (t₁₉ = 5.282, p < 0.001). While the task solver, utilizing the individuality index of the original participant, predicted that participant’s behaviours more accurately than the cognitive model, it was less accurate in predicting the behaviour of other participants. This suggests that the EIDT model successfully captures individual characteristics, leading to personalized behaviour predictions.

2.2 Handwritten digit recognition (MNIST) task

The dataset was originally collected and published by Rafiei et al. [34]. It contains behavioural data from 60 human participants who performed a digit discrimination task. The experiment followed a 2 × 2 factorial design with two factors: task difficulty (easy versus difficult images, controlled by noise level) and speed pressure (accuracy versus speed focus, controlled by experimental instruction). This design resulted in four experimental settings: EA (easy, accuracy focus), ES (easy, speed focus), DA (difficult, accuracy focus), and DS (difficult, speed focus). Our individuality transfer experiments evaluated whether the proposed framework could transfer individuality from one setting to another. Each setting included 120 unique images, and each participant made a decision for each image twice, resulting in a total of 960 trials per participant. Behavioural data from 70% of participants (42 participants) were used to train the encoder and decoder. An additional 10% of participants (6 participants) were assigned as validation data for early stopping during training. The remaining 20% of participants (12 participants) were used to evaluate model performance.
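The bookkeeping of this design can be checked with a few lines. The snippet enumerates the four settings of the 2 × 2 design, the ordered (source, target) pairs that can serve as transfer problems, and the trial count per participant; the variable names are illustrative.

```python
from itertools import permutations, product

difficulty = {"E": "easy", "D": "difficult"}
focus = {"A": "accuracy", "S": "speed"}

# Four settings of the 2 x 2 factorial design.
settings = [d + f for d, f in product(difficulty, focus)]
print(settings)  # ['EA', 'ES', 'DA', 'DS']

# Ordered (source, target) pairs with source != target.
transfer_sets = list(permutations(settings, 2))
print(len(transfer_sets))  # 12

# 120 unique images per setting, each judged twice.
trials_per_participant = len(settings) * 120 * 2
print(trials_per_participant)  # 960
```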

Our average model outperforms a no-fitting model

Situation SX, a no-transfer situation, was used to evaluate how well our task solver fits human behaviour compared to an existing model. Ideally, this comparison would involve models that can be explicitly fitted to human behaviour. However, no readily adaptable model was available. As an alternative, we used an average model, RTNet [34], which predicts human-like decisions and reaction times.

RTNet (RN)

RTNet accumulates evidence generated by a CNN (see Section 4.3.2) to produce stochastic, human-like decisions and response times.

Task solver (TS)

The task solver TS(·), as defined in Section 4.3, was trained using data from the training participant pool.
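The evidence-accumulation idea attributed to RTNet above can be illustrated with a toy race model. This is not the RTNet implementation: Gaussian noise stands in for the CNN-generated evidence, and the function name, threshold, and step limit are assumptions made for illustration.

```python
import random

def accumulate_to_threshold(mean_evidence, noise_sd, threshold, rng, max_steps=1000):
    """Add noisy per-step evidence for each choice until one total reaches threshold.

    Returns (choice, number_of_steps); the step count plays the role of a
    response time. If no total reaches the threshold within max_steps,
    the current leader is returned.
    """
    totals = [0.0] * len(mean_evidence)
    best = 0
    for step in range(1, max_steps + 1):
        for choice, mu in enumerate(mean_evidence):
            totals[choice] += rng.gauss(mu, noise_sd)
        best = max(range(len(totals)), key=totals.__getitem__)
        if totals[best] >= threshold:
            return best, step
    return best, max_steps

rng = random.Random(0)
choice, steps = accumulate_to_threshold([0.1, 0.3, 0.1], noise_sd=0.5,
                                        threshold=3.0, rng=rng)
print(choice, steps)
```

Weaker or noisier evidence tends to produce both slower and less consistent choices, which is why such a mechanism yields stochastic decisions and response times together.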

Our results confirmed that the task solver with fitting outperforms RTNet without fitting, as shown in Figure 4. A two-way (model: RN/TS, task: EA/ES/DA/DS) RM ANOVA revealed significant effects of model (F_3,33 = 104.744, p < 0.001), task (F_1,11 = 322.550, p < 0.001), and their interaction (F_3,33 = 72.733, p < 0.001). Bonferroni-corrected pairwise comparisons showed significant differences between EA and DA, ES and DA, EA and DS, and ES and DS (p < 0.001 for all). However, no significant differences were found between EA and ES or DA and DS. Reproduction accuracy may correlate with the percentage of correct responses in the MNIST task. The original study [34] reported that human accuracy (and RTNet’s performance) depended on task difficulty but not on focus condition. In our case, the task solvers achieved correct response rates of 77.7% for EA, 79.0% for ES, 62.0% for DA, and 61.6% for DS. These results are consistent with those reported for RTNet and suggest that reproduction accuracy depends primarily on task difficulty, i.e., uncertainty in response.

The prediction error under Situation SX (no transfer) in the MNIST task.

Individuality index forms clusters for each individual

We visualized the individuality indices and evaluated them using the within-individual and between-individual distances (see Section S1, Supplementary materials). For most transfer task sets, the within-individual distance was significantly shorter than the between-individual distance, indicating that individuality indices from the same participant tended to cluster together. A test of the null hypothesis of no correlation revealed that the two variables of the individuality index (z₀ and z₁) were significantly correlated for all transfer task sets.

Individuality is preserved across tasks

Situation SY (transfer across tasks) was used to evaluate whether our EIDT model could predict the unique behaviour of each individual in a previously unencountered (target) task. As in Section 2.1, we compared the prediction accuracy in the target task for a given participant using a task solver trained for that participant (Original) with its accuracy using task solvers trained for other participants (Others).

As shown in Figure 5A, the percentage of correct responses was approximately 78% for the easy difficulty level and 60% for the difficult level, aligning with the accuracy levels of human participants and RTNet [34]. A two-way (transfer task set: 12 sets (see x-axis in Figure 5), transfer-to participant(s): Original/Others) RM ANOVA revealed a significant effect of transfer task set (F_1,121 = 402.851, p < 0.001), no significant effect of transfer-to participant(s) (F_1,11 = 0.999, p = 0.339), and no significant interaction effect (F_1,121 = 0.478, p = 0.718).

Results for the transfer-to-original-participant (Original) and transfer-to-other-participants (Others) settings in the MNIST task.
A Percentage of correct responses. B Prediction error (likelihood). C Percentage of matches to actual behaviour.

A comparison using individual behavioural data revealed differences between Original and Others. Figure 5B shows the prediction accuracy for each transfer task set. A two-way RM ANOVA revealed significant effects of transfer task set (F_1,121 = 84.899, p < 0.001) and transfer-to participant(s) (F_1,11 = 18.180, p = 0.001), and a significant interaction effect (F_11,121 = 2.665, p = 0.047). Bonferroni-corrected pairwise comparisons indicated that the prediction loss for Original was significantly lower than for Others in the transfer task sets DA→EA, EA→ES, DS→ES, and DA→DS. However, no significant differences were found in the other eight transfer sets (see Table S1, Supplementary materials). The percentage of matches to actual behaviour is shown in Figure 5C. A two-way RM ANOVA revealed significant effects of transfer task set (F_1,121 = 113.049, p < 0.001) and transfer-to participant(s) (F_1,11 = 14.626, p = 0.003), and no significant interaction (F_11,121 = 1.789, p = 0.143). These results for prediction error and percentage of matches indicate that a task solver specialized for a specific individual using the EIDT model does not predict the behaviour of other individuals well.

Figure 6 visualizes the percentage of correct responses for each digit in the MNIST tasks, highlighting behavioural tendencies reproduced by the model. For example, Participant #33 exhibited a lower percentage for digit 3 compared to other digits, which was captured by the model. Participant #47 showed a drastically lower percentage for digit 1, a pattern also reflected in the model’s predictions.

Percentage of correct responses for each stimulus digit in human behaviour and model predictions. Model predictions were conducted under the transfer experiment from Setting ES to Setting EA.

3 Discussion

We proposed an EIDT framework for modeling the unique decision-making process of each individual. This framework enables the transfer of an individuality index from a (source) task to a different (target) task, allowing a task solver to predict behaviours in the target task. Several neural network techniques, such as autoencoders [38, 48], hypernetworks [20], and learning-to-learn [53, 43], facilitate this transfer. Our experiments, conducted on both value-guided sequential and perceptual decision-making tasks, demonstrated the potential of the proposed EIDT framework in individuality transfer across tasks.

EIDT framework transfers individuality across tasks

The EIDT framework was originally proposed by Dezfouli et al. [10] for a bandit task. Their model was designed to predict behaviour within a session performed by a given participant, leveraging behavioural data from other sessions for model training. We extended this idea in three key ways. First, we validated that the framework is effective for previously unseen individuals who were not included in model training. Although these individuals provided behavioural data in the source task to identify their individuality indices, their data were not used for model training. Second, we demonstrated that individuality indices can be transferred even when the source and target tasks differ, indicating that decision-making tendencies are transferable across tasks. Third, while the original work focused on value-guided tasks, we validated the framework’s applicability to perceptual decision-making tasks, specifically the MNIST task. These findings establish that the EIDT framework effectively captures individual differences across both tasks and individuals.

Interpreting the individuality index remains challenging

The interpretation of the individuality index remains an open question. Since interpretation often requires task-specific considerations [13], it falls outside the primary scope of this study, whose aim is to develop a general framework for individuality transfer. Previous research [28, 18] has explored associating neural network parameters with cognitive or functional meanings. Approaches such as disentangling techniques [2] and cognitive model integration [19, 49, 44, 14] could aid in better understanding the cognitive and functional significance of the individuality index.

Regarding the individuality index, while disentanglement and separation losses [10] during model training could enhance interpretability, we used only the reproduction loss, as defined in (5). Our results showed that the variables of the individuality index were correlated, suggesting that they do not independently influence behaviour. Although this dependency complicates the interpretation of the individuality index, we decided not to introduce explicit disentanglement constraints because interpretable parameters in cognitive models (e.g., [9]) are not necessarily independent (e.g., an individual with a high learning rate may also have a high inverse temperature [27], so that these two parameters could be represented by a single variable).

Why can the encoder extract individuality for unseen individuals?

Our experiments, which divided participants into training and test participant pools, demonstrated that the framework successfully extracts individuality for completely new individuals. This generalization likely relies on the fact that individuality indices form clusters and individuals similar to new participants exist in the training participant pool [57]. The success of individuality transfer in our study suggests that individuals can be clustered based on behavioural patterns. Behavioural clustering has been widely discussed in relation to psychiatric conditions, medication effects, and gender-based differences (e.g., [31, 50, 40]). Our results could contribute to a deeper discussion of behavioural characteristics by clustering not only these groups but also healthy controls.

Which processes contribute to individuality?

In the MNIST task, we assumed that individuality emerged primarily from the decision-making process (implemented by an RNN [45, 6]), rather than from the visual processing system (implemented by a CNN [55]). The CNN was pretrained, and the decoder did not tune its weights. Our results do not rule out the possibility that the visual system also exhibits individuality [24, 47]; however, they imply that individual differences in perceptual decision-making can be explained primarily by variations in the decision-making system [36, 51, 57, 21]. This assumption provides valuable insights for research on human perception.

Limitations

One limitation is that the source and target tasks were relatively similar. Even the cognitive model, despite its lower prediction accuracy, captured individual tendencies to some extent. Thus, our findings do not fully evaluate the generalizability of individuality transfer across diverse task domains. To broaden applicability, future studies should validate transfer across more distinct tasks that share underlying principles of individuality.

The effectiveness of individuality transfer may be influenced by dataset volume. As discussed earlier, prediction performance may depend on whether similar individuals exist in the training participant pool. In our study, fewer than 100 participants (98 in the MDP task and 60 in the MNIST task) were sufficient for effective transfer. However, tasks involving greater behavioural diversity may require a substantially larger dataset.

As discussed earlier, the interpretability of the individuality index requires further investigation. Furthermore, the optimal dimensionality of the individuality index remains unclear. This likely depends on the complexity of tasks involved—specifically, the number of factors needed to represent the diversity of behaviour observed in those tasks. While these factors have been explored in cognitive modeling research (e.g., [23, 13]), a clear understanding at the individual level is still lacking. Integrating cognitive modeling with data-driven neural network approaches [10, 19] could help identify key factors underlying individual differences in decision-making.

Future directions

To further generalize our framework, a large-scale dataset is necessary, as discussed in the limitations. This dataset should include a large number of participants to ensure prediction performance for diverse individuals [32]. All participants should perform the same set of tasks, which should include a variety of tasks [56]. Building upon our framework, where the encoder currently accepts action sequences from only a single task, a more generalizable encoder should be able to process behavioural data from multiple tasks to generate a more robust individuality index. To enhance the encoder, a multi-head neural network architecture [4] could be utilized. A generalized individuality index would enable transfer to a wider variety of tasks and allow accurate and detailed parameterization of individuals using data from only a single task.

Robust and generalizable parameterization of individuality enables computational modeling at the individual level. This approach, in turn, makes it possible to replicate individuals’ cognitive and functional characteristics in silico [41]. We anticipate that it offers a promising pathway toward a new frontier: artificial intelligence endowed with individuality.

4 Methods

4.1 General framework for individuality transfer across tasks

We formulate the problem of individuality transfer, which involves extracting an individuality index from a source task and predicting behaviour in a target task while preserving individuality. We consider two tasks, A and B, which are different but related. For example, task A might be a 2-step MDP task, while task B is a 3-step MDP task.

The individuality transfer across tasks is defined as follows. An individual K performs a problem within task A, with their behaviour recorded as 𝒜_K. Our objective is to predict ℬ_K, which represents K’s behaviour when performing task B. To achieve this, we extract an individuality index z from 𝒜_K, capturing the individual’s behavioural characteristics. This index z is then used to construct a task solver, enabling it to mimic K’s behaviour in task B. Since task A provides data for estimating the individuality index and task B is the target of behaviour prediction, we refer to them as the source task and target task, respectively.

Our proposed framework for the individuality transfer consists of three modules:

Task solver predicts behaviour in the target task B.

Encoder extracts the individuality index from the source task A.

Decoder generates the weights of the task solver based on the individuality index.

These modules are illustrated in Figure 1. We refer to this framework as EIDT, an acronym for encoder, individuality index, decoder and task solver.

4.1.1 Data representation

For training, we assume that behavioural data are available from a participant pool 𝒫 (K ∉ 𝒫), in which each participant has performed both tasks A and B. These datasets are represented as 𝒜 = {𝒜_n}_n∈𝒫 and ℬ = {ℬ_n}_n∈𝒫.

For each individual n, the set 𝒜_n consists of one or more sets, each containing a problem instance ϕ (stimuli, task settings, or environment in task A) and a sequence of action(s) α (recorded behavioural responses). For example, in an MDP task, ϕ represents the Markov process (state-action-reward transition) and α consists of choices over multiple trials. In a simple object recognition task, ϕ is a visual stimulus and α is the participant’s response to the stimulus.

Similarly, ℬ_n consists of a problem instance ψ and an action sequence β.

4.1.2 Task solver

The task solver predicts the action sequence for task B as

$$\hat{\beta} = \mathrm{TS}(\psi; \Theta_{\mathrm{TS}}), \tag{1}$$

where β̂ is the predicted action sequence, ψ is a specific problem in task B, and Θ_TS represents the solver’s weights. The task solver architecture is tailored to task B. For example, in an MDP task, the task solver outputs a sequence of actions in response to ψ. In a simple object recognition task, it produces an action based on a visual stimulus ψ.

4.1.3 Encoder

The encoder processes one or more action sequences α and generates an individuality index z ∈ ℝ^M as

$$z = \mathrm{ENC}(\alpha, \phi; \Theta_{\mathrm{ENC}}), \tag{2}$$

where ϕ is a problem in task A, Θ_ENC represents the encoder’s weights, and M is the dimensionality of the individuality index. The encoder architecture is task-specific and designed for task A.

4.1.4 Decoder

The decoder receives the individuality index z and generates the task solver’s weights as

$$\Theta_{\mathrm{TS}} = \mathrm{DEC}(z; \Theta_{\mathrm{DEC}}), \tag{3}$$

where Θ_DEC represents the decoder’s weights. Since the decoder determines the task solver’s weights, it functions as a hypernetwork [20, 22].

4.1.5 Training objective

Although tasks A and B differ, an individual’s decision-making system remains consistent across tasks. We model this using the individuality index z, linking it to the task solver via the encoder and decoder. For training, we use a behavioural dataset {𝒜_n, ℬ_n}_n∈𝒫 from an individual pool 𝒫.

Let α be an action sequence representing individual n’s behaviour on the source task, i.e., (α, ϕ) ∈ 𝒜_n, n ∈ 𝒫. The individuality index is derived as z = ENC(α, ϕ; Θ_ENC). The weights of the task solver are then given by Θ_TS = DEC(z; Θ_DEC). Subsequently, the task solver, with the given weights, predicts an action sequence for task B as β̂ = TS(ψ; Θ_TS), where (β, ψ) ∈ ℬ_n. We then measure the prediction error between β̂ and β as:

$$L(\alpha, \phi, \beta, \psi) = O(\hat{\beta}, \beta), \tag{4}$$

where β is an action sequence in ℬ_n recorded along with the problem ψ, and O(·,·) is a suitable loss function (e.g., likelihood-based loss for probabilistic outputs). Using the datasets containing the behaviour of the individual pool 𝒫, the weights of the encoder and decoder, Θ_ENC and Θ_DEC, are optimized by minimizing the total loss:

$$\min_{\Theta_{\mathrm{ENC}},\, \Theta_{\mathrm{DEC}}} \; \sum_{n \in \mathcal{P}} \; \sum_{(\alpha, \phi) \in \mathcal{A}_n} \; \sum_{(\beta, \psi) \in \mathcal{B}_n} L(\alpha, \phi, \beta, \psi). \tag{5}$$

This section provides a general formulation of individuality transfer across two tasks. For specific details on task architectures and loss functions, see Sections 4.2 and 4.3.
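The data flow formulated in this section can be illustrated with a toy example. The following is a minimal sketch under strong simplifying assumptions and is not the model used in our experiments: the encoder is a hypothetical linear map applied to empirical action frequencies, the decoder is a linear hypernetwork, the task solver is a stateless softmax policy that ignores the content of ψ, and the loss is a mean negative log-likelihood. All function names and numerical values are illustrative assumptions.

```python
import math

def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def softmax(v):
    top = max(v)
    e = [math.exp(x - top) for x in v]
    total = sum(e)
    return [x / total for x in e]

def encoder(alpha, theta_enc, n_actions=2):
    """Toy ENC: source-task action frequencies -> individuality index z."""
    freq = [alpha.count(k) / len(alpha) for k in range(n_actions)]
    return matvec(theta_enc, freq)

def decoder(z, theta_dec):
    """Toy DEC (hypernetwork): z -> task-solver weights Theta_TS."""
    return matvec(theta_dec, z)

def task_solver(psi, theta_ts):
    """Toy TS: one action distribution per target-task trial."""
    return [softmax(theta_ts) for _ in psi]

def loss(beta, beta_hat):
    """Mean negative log-likelihood of the observed target-task actions."""
    return -sum(math.log(p[b]) for b, p in zip(beta, beta_hat)) / len(beta)

def total_loss(dataset, theta_enc, theta_dec):
    """Sum the loss over participants and over all source/target pairs."""
    total = 0.0
    for A_n, B_n in dataset:
        for alpha, _phi in A_n:
            theta_ts = decoder(encoder(alpha, theta_enc), theta_dec)
            for beta, psi in B_n:
                total += loss(beta, task_solver(psi, theta_ts))
    return total

# Two hypothetical participants with opposite action preferences.
dataset = [
    ([([0, 0, 0, 1], None)], [([0, 0, 1, 0], [None] * 4)]),
    ([([1, 1, 1, 0], None)], [([1, 1, 0, 1], [None] * 4)]),
]
zeros = [[0.0, 0.0], [0.0, 0.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
scaled = [[3.0, 0.0], [0.0, 3.0]]

loss_flat = total_loss(dataset, zeros, identity)      # encoder ignores behaviour
loss_indiv = total_loss(dataset, identity, scaled)    # encoder passes preferences on
```

In this toy setting, the weights that let the individuality index carry each participant's action preference into the task solver yield a lower total loss than weights that discard it, which is the signal that training on the total loss exploits.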

4.2 Experiment on MDP task

We validated our individuality transfer framework using two different decision-making tasks: the MDP task and the MNIST task. This section focuses on the MDP task, a dynamic multi-step decision-making task.

4.2.1 Task

At the beginning of each episode, an initial state-cue is presented to the participant. For human participants, the state-cue is represented by animal images (Figure 7). For the cognitive model (Q-learning agent) and neural network-based model, the state-cue is represented numerically (e.g., (2, 1) for the first task state in the second choice). The participant makes a binary decision (denoted as action C₁ or C₂) for each step. In the human experiment, these actions correspond to pressing the left or right cursor key. With a certain probability (either 0.8/0.2 or 0.6/0.4), known as the state-action transition probability, the participant transitions to one of two subsequent task states. This process repeats two times for the 2-step MDP task and three times in the 3-step MDP task. After the final step, the participant receives an outcome: either a reward (r = 1) or no reward (r = 0). For human participants, rewards were displayed as symbols, as shown in Figure 7. Each sequence from initial state-cue presentation to reward delivery constitutes an episode.

Figure 7. The 3-step MDP task. (A) Tree diagram illustrating state-action transitions. (B) Flow of a single episode in the behavioural experiment for human participants.

The state-action transition probability T(s, a, s^′) from a task state s to a subsequent state s^′ given an action a varies gradually across episodes. With probability p_trans, one of the transition probabilities switches to a new set chosen from {(0.8, 0.2), (0.2, 0.8), (0.6, 0.4), (0.4, 0.6)}. Consequently, participants must adjust their decision-making strategy in response to these shifts in transition probabilities to keep maximizing reward.
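The switching rule can be sketched as follows. This is an illustrative simulation under stated assumptions rather than the experiment code: we assume that a switch is tested once per episode, that the entry to switch is chosen uniformly at random, and that the new pair differs from the current one. The state labels, the helper names, and the value p_trans = 0.1 are hypothetical.

```python
import random

# The four transition-probability pairs listed in the text.
ALLOWED = [(0.8, 0.2), (0.2, 0.8), (0.6, 0.4), (0.4, 0.6)]

def init_transitions(keys, rng):
    """Assign an initial pair to every (state, action) key."""
    return {key: rng.choice(ALLOWED) for key in keys}

def maybe_switch(transitions, p_trans, rng):
    """With probability p_trans, give one randomly chosen entry a different pair."""
    if rng.random() < p_trans:
        key = rng.choice(sorted(transitions))
        options = [pair for pair in ALLOWED if pair != transitions[key]]
        transitions[key] = rng.choice(options)
        return key
    return None

def next_state(transitions, state, action, successors, rng):
    """Sample the successor of (state, action) from its current pair."""
    p_first, _ = transitions[(state, action)]
    return successors[0] if rng.random() < p_first else successors[1]

rng = random.Random(0)
keys = [("s1", action) for action in ("C1", "C2")]
transitions = init_transitions(keys, rng)
n_switches = 0
visited = []
for episode in range(200):
    if maybe_switch(transitions, p_trans=0.1, rng=rng) is not None:
        n_switches += 1
    visited.append(next_state(transitions, "s1", "C1", ("s2a", "s2b"), rng))
```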

4.2.2 Behavioural data collection

We recruited 123 participants via Prolific. All participants provided their informed consent online. This study was approved by the Committee for Human Research at the Graduate School of Engineering, The University of Osaka, and complied with the Declaration of Helsinki. Participants received a base compensation of £4 for completing the entire experiment. A performance-based bonus (£0 to £2, average: £1) was awarded based on rewards earned in the MDP task.

Each participant completed 3 sequences for each step condition (2-step and 3-step MDP tasks), with each sequence comprising 50 episodes. The order of the 2-step and 3-step MDP tasks was randomized across sequences. State-cue assignments (animal images) were randomly determined for each sequence. Participants took a mandatory break (≥ 1 minute) between sequences.

To ensure data quality, we excluded participants based on response time, action bias, and task performance, as follows. Participants whose average response time was below 0.75 s or whose standard deviation of response time exceeded 10 s were excluded. Additionally, participants exhibiting a strong action bias (where 85% or more of actions were either left or right presses) were excluded. Furthermore, participants with an average reward below (first quartile − 1.5 × interquartile range) were excluded. This screening resulted in a final sample of 98 participants.
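The screening criteria can be expressed as a short filter. The sketch below is illustrative only: the field names and the toy participant pool are hypothetical, the quartiles are computed over all recruited participants with the default method of Python's `statistics.quantiles`, and the original analysis may have used a different quartile convention or order of exclusion.

```python
import statistics

def screen(participants):
    """Return the ids of participants who pass all exclusion criteria."""
    rewards = [p["mean_reward"] for p in participants]
    q1, _median, q3 = statistics.quantiles(rewards, n=4)
    reward_floor = q1 - 1.5 * (q3 - q1)  # first quartile - 1.5 x IQR
    kept = []
    for p in participants:
        too_fast = p["mean_rt"] < 0.75          # mean response time (s)
        too_variable = p["sd_rt"] > 10.0        # SD of response time (s)
        biased = max(p["left_frac"], 1.0 - p["left_frac"]) >= 0.85
        low_reward = p["mean_reward"] < reward_floor
        if not (too_fast or too_variable or biased or low_reward):
            kept.append(p["id"])
    return kept

def make(pid, mean_rt=0.9, sd_rt=1.0, left_frac=0.5, mean_reward=0.5):
    """Build one hypothetical participant record."""
    return {"id": pid, "mean_rt": mean_rt, "sd_rt": sd_rt,
            "left_frac": left_frac, "mean_reward": mean_reward}

pool = [
    make(1, mean_reward=0.50),
    make(2, mean_rt=0.5, mean_reward=0.52),    # responds too quickly
    make(3, sd_rt=12.0, mean_reward=0.55),     # response time too variable
    make(4, left_frac=0.9, mean_reward=0.53),  # strong left bias
    make(5, mean_reward=0.51),
    make(6, mean_reward=0.54),
    make(7, mean_reward=0.56),
    make(8, mean_reward=0.10),                 # reward outlier
]
kept = screen(pool)
```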

4.2.3 Cognitive model: Q-learning

To model decision-making in the MDP task, we employed a Q-learning agent [46]. At each step t, the agent was presented with the current task state s_t and selected an action a_t. The agent maintained Q-values, denoted as Q(s, a), for all state-action pairs, where s was a state in the set of all possible task states 𝒮 and a was an action in the set of available actions in that state 𝒞_s. The probability of selecting action a was determined by a softmax policy:

$$P(a_t = a \mid s_t) = \frac{\exp\left(q_{it}\, Q(s_t, a)\right)}{\sum_{a' \in \mathcal{C}_{s_t}} \exp\left(q_{it}\, Q(s_t, a')\right)}, \tag{6}$$

where q_it > 0 was a parameter called the inverse temperature or reward sensitivity, controlling the balance between exploration and exploitation.

After selecting action a_t, the agent received an outcome r_t ∈ {0, 1} and transitioned to a new state s_t+1. The Q-value for the selected action was updated by

$$Q(s_t, a_t) \leftarrow Q(s_t, a_t) + q_{lr} \left( r_t + q_{dr} \max_{a \in \mathcal{C}_{s_{t+1}}} Q(s_{t+1}, a) - Q(s_t, a_t) \right), \tag{7}$$

where q_lr ∈ (0, 1) was the learning rate, determining how much newly acquired information replaced existing knowledge, and q_dr ∈ (0, 1) was the discount rate, governing the extent to which future rewards influenced current decisions. For actions not selected, the agent applied a forgetting factor q_ff to their Q-values.
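A compact sketch of such an agent is given below. It is illustrative rather than the fitted model: the class and method names are hypothetical, the update for the selected action assumes the standard Q-learning rule with a maximum over next-state actions, and the forgetting step assumes a multiplicative decay by (1 − q_ff) applied to the unselected actions in the current state, which may differ from the exact form used in our analysis.

```python
import math
import random

class QAgent:
    """Q-learning agent with a softmax policy and forgetting (sketch)."""

    def __init__(self, q_it, q_lr, q_dr, q_ff, n_actions=2):
        self.q_it, self.q_lr, self.q_dr, self.q_ff = q_it, q_lr, q_dr, q_ff
        self.n_actions = n_actions
        self.q = {}  # maps state -> list of Q-values, one per action

    def values(self, state):
        return self.q.setdefault(state, [0.0] * self.n_actions)

    def policy(self, state):
        """Softmax over Q-values scaled by the inverse temperature q_it."""
        scaled = [self.q_it * v for v in self.values(state)]
        top = max(scaled)
        weights = [math.exp(v - top) for v in scaled]
        total = sum(weights)
        return [w / total for w in weights]

    def act(self, state, rng):
        return rng.choices(range(self.n_actions), weights=self.policy(state))[0]

    def update(self, state, action, reward, next_state=None):
        q = self.values(state)
        # No future value after the final step of an episode.
        future = 0.0 if next_state is None else max(self.values(next_state))
        q[action] += self.q_lr * (reward + self.q_dr * future - q[action])
        for other in range(self.n_actions):
            if other != action:
                q[other] *= 1.0 - self.q_ff  # ASSUMED forgetting form

agent = QAgent(q_it=2.0, q_lr=0.5, q_dr=0.9, q_ff=0.1)
p_before = agent.policy("s")
agent.update("s", action=0, reward=1.0)
p_after = agent.policy("s")
first_action = agent.act("s", random.Random(0))
```

After a single rewarded choice, the policy shifts toward the rewarded action in proportion to q_it, which is how the inverse temperature trades off exploration against exploitation.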

4.2.4 EIDT model

This section describes the specific models used for individuality transfer in the MDP task.

Data representation

Since MDP tasks involve sequential decision-making, each action sequence consists of multiple actions within a single session. In our experiment, each participant completed L trials per session, with L = 100 for the 2-step MDP task and L = 150 for the 3-step MDP task. The action sequence is represented as [(s₁, a₁, r₁), …, (s_L, a_L, r_L)], where s_t denotes the task state at trial t, a_t ∈ 𝒞 represents the action selected from the set 𝒞 = {C₁, …, C_K} (with K = 2 in our task), and r_t ∈ {0, 1} indicates whether a reward was received. In the M-step MDP task described in Section 4.2, each task state is represented as (m, c_m), where m denotes the current step within the episode (m ∈ {1, …, M}) and c_m corresponds to the cue presented to the participant. The action sequence, denoted as α or β, consists of a sequence of selected actions (a₁, …, a_L), while a problem, denoted as ϕ or ψ, is represented as ((s₁, …, s_L), (r₁, …, r_L)).

Task solver

Before describing the encoder and decoder, we define the architecture of the task solver, which generates actions for the M-step MDP task. The task solver is implemented using a gated recurrent unit (GRU) [7] with Q cells, where Q = 4 for the 2-step task and Q = 8 for the 3-step task. At time-step t, the GRU takes as input the previous hidden state h_t−1 ∈ ℝ^Q, the previous task state s_t−1, the previous action a_t−1, the previous reward r_t−1, and the current task state s_t. It then updates the hidden state as

$$h_t = \mathrm{GRU}(h_{t-1}, s_{t-1}, a_{t-1}, r_{t-1}, s_t; \Phi), \tag{9}$$

where Φ represents the GRU’s weights. The updated hidden state is then used to predict the probability of selecting each action through a fully-connected feed-forward layer:

$$v_t = W h_t, \tag{10}$$

where v_t represents the logit scores for each action (unnormalised probabilities), and W ∈ ℝ^K×Q is the weight matrix. The probabilities of each action are computed using a softmax layer:

$$\pi(a_t = C_k) = \frac{\exp([v_t]_k)}{\sum_{i=1}^{K} \exp([v_t]_i)}, \tag{11}$$

where π(a_t = C_k) represents the probability of selecting action C_k at time t, and [v_t]_i denotes the ith element of v_t.

For input encoding, we used a 1-of-K scheme. The step of the MDP task is encoded as [1, 0, 0] for step 1, [0, 1, 0] for step 2, and [0, 0, 1] for step 3. Each task state s_m is represented as [1, 0] or [0, 1] to distinguish the two state cues at each step. The participant’s action is encoded as C₁: [1, 0] or C₂: [0, 1], while the reward is represented as 0: [1, 0] or 1: [0, 1]. These encodings are concatenated to form input sequences.
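The encoding can be sketched as follows for the 3-step task. This is an illustrative sketch: the helper names and the order of concatenation (previous state, previous action, previous reward, current state) are assumptions, and the resulting input length of 14 follows from that assumed layout rather than from a value stated in the text.

```python
def one_hot(index, size):
    """Return a 1-of-K vector with a 1 at the given index."""
    vector = [0] * size
    vector[index] = 1
    return vector

def encode_state(step, cue, n_steps=3):
    """Encode a task state (m, c_m): step in 1..n_steps, cue in {0, 1}."""
    return one_hot(step - 1, n_steps) + one_hot(cue, 2)

def encode_input(prev_state, prev_action, prev_reward, state, n_steps=3):
    """Concatenate the encodings that form one GRU input (assumed order)."""
    return (encode_state(*prev_state, n_steps)
            + one_hot(prev_action, 2)    # C1 -> [1, 0], C2 -> [0, 1]
            + one_hot(prev_reward, 2)    # r = 0 -> [1, 0], r = 1 -> [0, 1]
            + encode_state(*state, n_steps))

# Previous state: step 1, first cue; action C2; no reward; now step 2, second cue.
x = encode_input(prev_state=(1, 0), prev_action=1, prev_reward=0, state=(2, 1))
```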

The task solver TS(ψ; Θ_TS) generates a sequence of predicted action probabilities, β̂, using the GRU, the fully-connected layer W, and the softmax layer. The problem ψ defines the MDP environment, specifying state transitions and reward outcomes in response to the selected actions.

To evaluate prediction accuracy, the loss function defined in (4) compares the human-performed actions {β, ψ} with those predicted by the task solver, β̂. Notably, the problem ψ is not executed with the task solver; instead, the task solver predicts action probabilities based on the same task state and reward history as in the human behavioural data.
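This evaluation scheme, in which the recorded history rather than the model's own choices drives the recurrent state, can be sketched in plain Python. The code below is a minimal illustration and not the trained model: it uses a textbook GRU cell, randomly initialised weights, random binary vectors standing in for the encoded history, a mean negative log-likelihood as the loss, and hypothetical function names throughout.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def affine(W, x, b):
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]

def softmax(v):
    top = max(v)
    e = [math.exp(x - top) for x in v]
    total = sum(e)
    return [x / total for x in e]

def gru_step(p, x, h):
    """One GRU update: reset gate, update gate, candidate state."""
    xh = x + h
    reset = [sigmoid(v) for v in affine(p["Wr"], xh, p["br"])]
    update = [sigmoid(v) for v in affine(p["Wu"], xh, p["bu"])]
    gated = x + [r * hi for r, hi in zip(reset, h)]
    cand = [math.tanh(v) for v in affine(p["Wn"], gated, p["bn"])]
    return [(1.0 - u) * c + u * hi for u, c, hi in zip(update, cand, h)]

def init_params(n_in, n_hidden, n_actions, scale, rng):
    def mat(rows, cols):
        return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]
    p = {name: mat(n_hidden, n_in + n_hidden) for name in ("Wr", "Wu", "Wn")}
    p.update({name: [0.0] * n_hidden for name in ("br", "bu", "bn")})
    p["W"] = mat(n_actions, n_hidden)  # output layer, v_t = W h_t
    return p

def teacher_forced_nll(p, inputs, actions):
    """Mean negative log-likelihood of recorded actions given recorded inputs."""
    h = [0.0] * len(p["br"])
    total = 0.0
    for x, a in zip(inputs, actions):
        h = gru_step(p, x, h)
        probs = softmax(affine(p["W"], h, [0.0] * len(p["W"])))
        total -= math.log(probs[a])
    return total / len(actions)

rng = random.Random(0)
inputs = [[float(rng.randint(0, 1)) for _ in range(14)] for _ in range(6)]
actions = [0, 1, 1, 0, 1, 0]  # hypothetical recorded choices
nll_zero = teacher_forced_nll(init_params(14, 4, 2, 0.0, rng), inputs, actions)
nll_rand = teacher_forced_nll(init_params(14, 4, 2, 0.5, rng), inputs, actions)
```

With all weights at zero the solver predicts each of the two actions with probability 0.5, so the loss equals ln 2, which provides a convenient chance-level reference.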

Encoder and decoder

The encoder ENC(α, ϕ; Θ_ENC) extracts an individuality index z from a sequence of actions α corresponding to a given environment ϕ. The first module of the encoder is a GRU, similar to the task solver, with R = 32 cells. The final hidden state h_L ∈ ℝ^R serves as the basis for computing the individuality index z ∈ ℝ^M using a fully-connected feed-forward network with four layers d(·) as z = d(h_L).

The decoder takes the individuality index z as input and generates the weights for the task solver by Θ_TS = DEC(z; Θ_DEC). The decoder is implemented as a single-layer linear network.
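The role of the decoder as a hypernetwork can be illustrated by showing how a flat output vector is split into the task solver's weight matrices. This is a sketch under stated assumptions: the shape dictionary, the omission of bias terms, the input size of 14, the two-dimensional individuality index, and the decoder weights are all hypothetical choices made for illustration.

```python
def task_solver_shapes(n_in, n_hidden, n_actions):
    """Shapes of the weight matrices of a small GRU task solver (biases omitted)."""
    gate = (n_hidden, n_in + n_hidden)
    return {"Wr": gate, "Wu": gate, "Wn": gate, "W": (n_actions, n_hidden)}

def n_weights(shapes):
    return sum(rows * cols for rows, cols in shapes.values())

def linear_decoder(z, theta_dec):
    """Single linear layer: one output per task-solver weight."""
    return [sum(w * zi for w, zi in zip(row, z)) for row in theta_dec]

def unpack(flat, shapes):
    """Split a flat weight vector into named matrices (lists of rows)."""
    out, pos = {}, 0
    for name, (rows, cols) in shapes.items():
        out[name] = [flat[pos + r * cols: pos + (r + 1) * cols] for r in range(rows)]
        pos += rows * cols
    if pos != len(flat):
        raise ValueError("flat vector length does not match the shapes")
    return out

shapes = task_solver_shapes(n_in=14, n_hidden=4, n_actions=2)
# Hypothetical decoder weights for a two-dimensional individuality index.
theta_dec = [[0.01 * (i + 1), -0.01 * (i + 1)] for i in range(n_weights(shapes))]
theta_ts = unpack(linear_decoder([1.0, 0.5], theta_dec), shapes)
```

Because the decoder is linear, the number of its parameters grows with the product of the index dimensionality and the number of task-solver weights, which is one reason a small task solver is convenient in this design.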

4.3 Experiment on MNIST task

This section describes the specific models used for individuality transfer in the handwritten digit recognition (MNIST) task.

4.3.1 Task

The dataset used in this experiment was originally collected and published by Rafiei et al. [34]. In this task, participants were presented with a stimulus image depicting a handwritten digit and were required to respond by pressing the corresponding number key, as illustrated in Figure 1. For further details regarding the task design and data collection, refer to [34].

4.3.2 EIDT model

Data representation

An action sequence, denoted as α or β, consists of a single action a and its corresponding response time b. The associated problem, represented as ϕ or ψ, corresponds to a stimulus image. The action a is selected from a set {C₁, … C_K}. Since the task involves recognizing digits ranging from 0 to 9, the number of possible actions is K = 10. The stimulus image, ϕ or ψ, is an image of size H× W. In this experiment, we adopted the same resolution as [34], setting H = W = 227.

Task solver

The task solver for the handwritten digit recognition task is based on the model proposed by [34]. Their model consists of a CNN and an evidence accumulation module. However, since their model represents average human behaviour and does not account for individual differences, we replace the accumulation module with a GRU [6] to capture individuality. The CNN module processes the input image and produces an evidence vector e = CNN(ψ), where e ∈ ℝ^K and CNN(·) is based on the AlexNet architecture [26]. The weights of the CNN are sampled from a Bayesian neural network (BNN), introducing stochasticity in the output. This stochasticity enables the model to generate human-like, probabilistic decisions.

The stimulus image is fed into the CNN S times, generating S evidence vectors e_t ∈ ℝ^K, one at each time step t = 0, …, S − 1. In this study, we set S = 16 to match the maximum response time, as described later. Since the CNN weights are stochastically sampled from the BNN, the CNN’s output varies even when the same image is input multiple times. To model individuality in decision-making, we introduce a GRU with Q cells (Q = 4 in our setup). The GRU receives as input the previous hidden state h_t−1 ∈ ℝ^Q and the current evidence e_t, updating its hidden state as

$$h_t = \mathrm{GRU}(h_{t-1}, e_t; \Phi),$$

where Φ represents the GRU’s network weights. The updated hidden state is passed through a dense layer (as defined in (10)) and a softmax layer (as defined in (11)) to generate the probability distribution over possible digit classifications [P_t(C₁), …, P_t(C_K)] at each time step t.

To evaluate the prediction error, we compare the action sequences generated by human participants, {β, ψ}, with those predicted by the task solver, incorporating response times into this analysis. The actual response time b is converted into an integer time step using the formula: . For example, a response time of b = 0.765 sec is converted to . The likelihood of the observed decision is then calculated as , where a is the actual digit chosen by the participant.

In this task solver, the CNN (driven by BNN) models a visual processing system, while the RNN represents the decision-making system. We assume that the visual system (implemented by CNN and BNN) is shared across all individuals, whereas the decision-making system (implemented by RNN) captures individual differences. Based on this assumption, the CNN and BNN are pretrained using the MNIST dataset [26], and their weight distributions are fixed across individuals. The pretraining procedure followed the original methodology [34].

Encoder and decoder

Since each action sequence contains only a single action, it does not form a true “sequence.” This makes it challenging to extract individuality from a single data point. To address this, the encoder takes a set of single action sequences as input rather than a single sequence. Specifically, the encoder extracts the individuality index z from U sets of stimulus images ϕ_u and their corresponding responses α_u, where u = 1, …, U.

Here, ϕ_u represents the stimulus presented in the uth trial, and α_u represents the corresponding response. The number of action sequences U corresponds to the number of samples available for each individual in the dataset. Since the outputs for these action sequences are simply averaged, U can be adjusted flexibly. The encoder architecture consists of a single CNN module, a single GRU, and a fully-connected feed-forward network. The CNN module is identical to the one used in the task solver. Given an input ϕ_u, let e_t,u represent the evidence output from the CNN at time step t. The GRU, which consists of R cells (R = 16 in our setup), updates its hidden state based on the previous state, the current CNN evidence, and an encoding of the response action by

where Ψ represents the network weights. The function outputs the one-hot encoded action a if , and zeros otherwise. The value represents the converted response time, obtained from the original response time b in the action sequence α_u. After processing all U sequences, the final hidden states are averaged across sequences: h = (1/U) Σ_u h_u, where h_u denotes the final hidden state for the uth sequence. The individuality index is then computed as z = d(h), where d(·) represents a single-layer fully-connected feed-forward network. The decoder, implemented as a single linear layer, takes the individuality index z as input and outputs the weights for the task solver.
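The averaging step is what allows U to vary and makes the encoder insensitive to the order of trials. A minimal sketch, with a hypothetical function name and made-up two-dimensional hidden states in place of the R = 16 cells, is:

```python
def average_hidden(final_states):
    """Average the final hidden states of the U single-trial sequences."""
    n_sequences = len(final_states)
    return [sum(column) / n_sequences for column in zip(*final_states)]

# Made-up final hidden states for U = 3 trials.
final_states = [[0.2, -0.4], [0.6, 0.0], [0.1, 0.1]]
h_all = average_hidden(final_states)
h_shuffled = average_hidden(final_states[::-1])  # same trials, reversed order
h_two = average_hidden(final_states[:2])         # a smaller U also works
```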

4.4 Analysis

4.4.1 Within- and between-individual distances for evaluating the individuality index

To evaluate the disentanglement of the individuality index, we computed two types of distances: within-individual distance and between-individual distance. Let z(α, ϕ) denote the individuality index derived from an action sequence (α, ϕ). The within-individual distance quantifies the average distance between the individuality index for each action sequence and the average individuality index for a given individual. It is defined as

where z̄_n represents the average individuality index for individual n, computed as z̄_n = (1/|𝒜_n|) Σ_(α, ϕ)∈𝒜_n z(α, ϕ). The between-individual distance measures the distance between the average individuality index for each individual and the overall average individuality index across all individuals. It is defined as

where z̄ represents the average individuality index across all individuals, given by z̄ = (1/|𝒫|) Σ_n∈𝒫 z̄_n. If the within-individual distance κ is smaller than the between-individual distance λ_n, this suggests that the individuality indices derived from different behavioural data of individual n are closely clustered, indicating consistency in the captured decision-making tendencies. Comparing these distances provides insight into whether the individuality index effectively represents stable and distinctive characteristics of an individual’s decision-making process.
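The two distances can be computed as in the following sketch. It is illustrative only: the Euclidean norm, the averaging of the between-individual distance over individuals, the function names, and the toy two-dimensional indices are assumptions and may not match the exact definitions used in our analysis.

```python
import math

def mean_vector(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def within_distance(indices):
    """Mean Euclidean distance from one individual's indices to their mean."""
    centre = mean_vector(indices)
    return sum(math.dist(z, centre) for z in indices) / len(indices)

def between_distance(indices_by_person):
    """Mean Euclidean distance from each individual's mean to the overall mean."""
    centres = [mean_vector(v) for v in indices_by_person.values()]
    overall = mean_vector(centres)
    return sum(math.dist(c, overall) for c in centres) / len(centres)

# Toy indices: two individuals, two action sequences each.
indices_by_person = {
    "n1": [[0.0, 0.0], [0.0, 2.0]],
    "n2": [[10.0, 0.0], [10.0, 2.0]],
}
within = {n: within_distance(v) for n, v in indices_by_person.items()}
between = between_distance(indices_by_person)
```

In this toy example each individual's indices lie 1 unit from their own mean while the individual means lie 5 units from the overall mean, the pattern expected when indices cluster by individual.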

4.4.2 Evaluation for uniqueness in task solvers

To assess whether a task solver generates unique behaviour for a specific participant, we compared two prediction loss scores. The first score, referred to as Original, represents the standard prediction loss for a specific participant when using the task solver designed for that participant. It is formulated as

where Θ_ENC and Θ_DEC are omitted for simplicity, as in (4). The second score, referred to as Others, measures the prediction loss for a specific participant when using task solvers designed for other participants. It is formulated as

where 𝒫 represents the set of participant indices in the test participant pool. If the task solvers successfully capture the unique decision-making tendency of each participant, the Original score L_K should be lower than the Others score ϒ_K for all participants K ∈ 𝒫.
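Given a table of prediction losses, the comparison can be sketched as below. The table layout, the function name, the loss values, and the use of a simple mean over the other participants for the Others score are assumptions made for illustration.

```python
def original_and_others(loss, k):
    """Return (Original, Others) scores for participant k.

    loss[k][n] is the prediction loss on participant k's target-task data
    when the task solver is generated from participant n's individuality index.
    """
    original = loss[k][k]
    others = [loss[k][n] for n in loss[k] if n != k]
    return original, sum(others) / len(others)

# Made-up loss table for three test participants.
loss = {
    "K1": {"K1": 0.40, "K2": 0.70, "K3": 0.60},
    "K2": {"K1": 0.65, "K2": 0.45, "K3": 0.55},
    "K3": {"K1": 0.80, "K2": 0.75, "K3": 0.50},
}
scores = {k: original_and_others(loss, k) for k in loss}
n_unique = sum(original < others for original, others in scores.values())
```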

5 Data availability

The behavioural data for the MDP task has been made publicly available at https://github.com/hgshrs/indiv_trans

Acknowledgements

This work was supported in part by the Japan Society for the Promotion of Science (JSPS) KAKENHI, grant numbers 22H05163 and 24K15047, and Japan Science and Technology Agency (JST) Advanced International Collaborative Research Program (AdCORP), grant number JPMJKB2307.

Additional information

6 Code availability

All code and trained models have been made publicly available at https://github.com/hgshrs/indiv_trans

Author contributions

H.H. designed and performed the research, collected and analyzed the data, and drafted and edited the paper.
