Introduction

Natural behavior is a time-continuous process that depends on past events, reward contingencies, expectations, and social context. To behave adaptively in uncertain environments, individuals collect and evaluate dynamic sensory evidence. Perceptual decision-making encompasses two main dimensions: competence (accuracy of the percept) and confidence in the perceptual experience. Perceptual confidence, the individual’s belief regarding the accuracy of their own perception via metacognitive assessment of the perceptual history, plays a key role in perceptual decision-making (Fleming and Lau, 2014; Kepecs and Mainen, 2012; Yeung and Summerfield, 2012). Social settings can change decision competence and confidence (Bahrami et al., 2010; Bang et al., 2017; Esmaily et al., 2023; Pescetelli et al., 2016; Pescetelli and Yeung, 2022). In a social interaction, the assessment of one’s own and others’ behavior depends on the individuals’ skill level: incompetent individuals not only perform poorly but are also unable to recognize this (Kruger and Dunning, 1999). However, it remains uncertain whether this metacognitive bias applies to basic perceptual phenomena, specifically the relationship between perceptual accuracy and perceptual confidence. Furthermore, it is not clear how real-time social influence manifests when individual decisions are combined with social signals. To address these gaps, we investigate whether, under which conditions, and to what extent unconstrained social information exchange affects the continuous accuracy and confidence reports of individuals perceiving ambiguous visual stimuli in a decision-making paradigm.

Perceptual decision-making is susceptible to both informational and normative social influences (Frith and Singer, 2008; Takagaki and Krug, 2020; Terenzi et al., 2021; Toelch and Dolan, 2015; Van Den Bos et al., 2013). For instance, by integrating the perceptual report of a partner with one’s own subjective experience, the quality of perceptual judgments can be optimized and result in collective benefits, but only under certain conditions (Bahrami et al., 2010, 2012a; Baumgart et al., 2020). In addition, social information can speed up decision-making by reducing exploration time, but it can also introduce biases and false beliefs (Bang and Frith, 2017). For example, social conformity biases information uptake towards majority choices in perceptual decision-making (Germar et al., 2016; Toelch et al., 2018). Confidence reports in particular have been shown to have a major influence during social exchange. Judging the competence of others is difficult because it requires performance monitoring over time. Therefore, agents often use the confidence signals of a partner as a surrogate for assessing the competence of others (Bang and Frith, 2017). Furthermore, confidence signals could serve as a channel for social information transfer. For instance, the expressed confidence range can adapt to specific social partners to achieve more optimal group decisions (Bang et al., 2017). Access to choices coupled with the confidence of another player has been reported to modulate post-decision wagers: dyadic choice agreement increased confidence levels, while disagreement reduced them. Furthermore, compared to solo performance, the gains in confidence and accuracy under dyadic agreement exceeded the losses under disagreement (Pescetelli et al., 2016).
Accuracy and confidence measures generally covary (Baumgart et al., 2020; Khalvati et al., 2021; Pescetelli et al., 2016); however, how real-time social feedback during continuous evidence accumulation affects each measure is poorly understood.

Previous studies of social perceptual decision-making share the following characteristics. First, sensory stimuli were presented in a serial and random manner, whereas sensory perception is an ongoing and correlated process. Second, trial-based, discrete paradigms have been used, with delays between perceptual and metacognitive reports. Such delays allow metacognitive judgments to incorporate information that was not available at the time of the perceptual report (Navajas et al., 2016; Yeung and Summerfield, 2012). Third, choices were limited to two options, e.g., left vs right (Esmaily et al., 2023; Kiani and Shadlen, 2009) or first vs second interval (Bahrami et al., 2010; Pescetelli and Yeung, 2022), not allowing the report of graded sensory perception. Fourth, social exchange and joint decision-making were imposed, resulting in interdependent dyadic performance (Bahrami et al., 2010; Bang et al., 2017; Baumgart et al., 2020). Furthermore, choices have been recorded in a serial manner, with individual choices preceding joint choices. To understand how decision confidence and accuracy evolve during the course of dynamic decision-making and how both factors are shaped by the availability of social information, novel experimental methods allowing unrestricted access to continuous reports are needed. Indeed, recent work has begun to demonstrate the influence of continuous confidence reports during dyadic co-action, although still in a two-alternative and serial design. Dynamic, mutually visible confidence reports elicited a greater increase of perceptual confidence during agreement and less reduction during disagreement, compared to static reports (Pescetelli and Yeung, 2020), and also resulted in confidence alignment between dyadic human partners (Pescetelli and Yeung, 2022).

To overcome these limitations, a new approach that partly reconciles the mismatch between real-life continuous perception and experiments has recently emerged: continuous psychophysics (Bonnen et al., 2017, 2015; Huk et al., 2018). This technique quantifies various sensorimotor and cognitive processes in real-time via continuous perceptual reports (CPR). CPRs have been shown to be viable alternatives for estimating perceptual uncertainty parameters (Straub and Rothkopf, 2022). As perceptual reports are displayed continuously, we argue that CPRs are particularly useful to measure interacting processes that unfold over time, such as the integration of noisy sensory and social information. To that end, we developed a novel CPR paradigm that allows the assessment of individual decision-making during social (dyadic) and non-social (solo) decision-making. Notably, our social setting did not enforce dyadic payoff interdependence: the payoffs of the two players did not influence each other. Participants were free to monitor and opportunistically incorporate information provided by the other player, without being compelled to do so by the task design.

Our task was set up to dissociate the perceptual report and the associated confidence via real-time (“peri-decision”) wagering, using standardized, easily comparable visual cues (see Methods). Previous work resorted to post-decision confidence assessments such as (i) numerical ratings (Boldt and Yeung, 2015; Fleming et al., 2010), (ii) post-decision wagering (Moreira et al., 2018; Persaud et al., 2007), or (iii) opt-out/decline response options (Gail et al., 2004; Hanks and Summerfield, 2017; Khalvati et al., 2021; Kiani and Shadlen, 2009; Komura et al., 2013; Smith, 1997). Here, in contrast, we use continuous wagering to allow immediate access to the full perceptual report of others, including accuracy and confidence. With this approach, we aimed to answer the following questions: First, do humans express perceptual confidence in real-time? Second, do accuracy and the expression of confidence depend on the social setting? We hypothesize that humans signal perceptual confidence based on a real-time evaluation of the visual scene, and incorporate others’ perceptual choice and confidence to better interpret ambiguous sensory information. Specifically, we expect the weighted integration of sensory and social evidence to result in more accurate and confident perceptual reports. We conjectured that a partner’s task competence is a key driver of information integration during social interaction, with more dyadic benefit for participants performing worse in the solo condition.

Results

To assess the integration of social information, we developed a behavioral paradigm (Figure 1, Supplementary Figure 1a and Supplementary Video 1, Continuous Perceptual Report task, ‘CPR’, see Methods) that enables continuous “peri-decision” wagering on the accuracy of perceptual judgments about the direction of a noisy random dot pattern (RDP). Participants used a joystick to signal their perceived direction (angle) and confidence (eccentricity, the deviation from the central position). Their task was to maximize the monetary reward score by following the direction of the RDP as accurately as possible, so that their cursor (partial circle, ‘response arc’) would overlap with occasionally presented small reward targets appearing in the direction of the coherent motion (Supplementary Figure 1b). Notably, increasing joystick eccentricity shortened the arc length, making it harder to hit the target. In case of a successful target hit, the reward score was calculated as the product of report accuracy and confidence (and was zero otherwise, Figure 1d). This reward scheme was intended to elicit continuous peri-decision wagering, by awarding large rewards for accurate and confident reports, small rewards for less accurate and/or less confident reports, and omitting rewards for inaccurate but confident reports.
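The reward rule described above can be illustrated with a minimal Python sketch. It is not the task code: the function names, the linear mapping from eccentricity to arc width, the arc-width limits, and the normalization of accuracy as one minus the angular error divided by 180° are all assumptions made for illustration. Only the general rule (score = accuracy × confidence on a hit, zero otherwise, with a narrower arc at higher eccentricity) comes from the text.

```python
def angular_error(report_deg, target_deg):
    """Smallest absolute angle between two directions, in degrees (0..180)."""
    diff = (report_deg - target_deg) % 360.0
    return min(diff, 360.0 - diff)


def arc_width(eccentricity, max_width_deg=180.0, min_width_deg=15.0):
    """Hypothetical linear mapping: a more eccentric joystick gives a narrower arc."""
    return max_width_deg - eccentricity * (max_width_deg - min_width_deg)


def reward_score(report_deg, eccentricity, target_deg):
    """Score = accuracy * eccentricity if the arc covers the target, else 0.

    eccentricity is assumed to be normalized to [0, 1].
    """
    err = angular_error(report_deg, target_deg)
    if err > arc_width(eccentricity) / 2.0:
        return 0.0  # target missed: no reward
    accuracy = 1.0 - err / 180.0  # assumed normalization of accuracy to [0, 1]
    return accuracy * eccentricity


# Same 10 degree error, different wagers:
cautious = reward_score(10.0, 0.5, 0.0)  # wide arc: hit, moderate reward
risky = reward_score(10.0, 1.0, 0.0)     # narrow arc: miss, no reward
```

Under these assumed parameters, the same tracking error earns a moderate reward with a cautious wager and nothing with a maximal one, which is the trade-off the reward scheme is meant to create.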

Continuous perceptual report task and behavioral response profile in solo experiments.

(a) Experimental setup: Two participants were seated in adjacent experimental booths in the same room. Subjects played a motion tracking game with a joystick either alone (‘solo’) or together with a partner (‘dyadic’, mixed order: see Supplementary Figure 1a). In dyadic experiments, subjects watched the same visual stimulus on a screen and their joystick responses as well as feedback were mutually visible in real-time. (b) Stimulus layout in dyadic condition: Random dot pattern (‘RDP’) with circular aperture and blacked out central fixation area was continuously presented for intervals of about 1 minute. Subjects were instructed to look at the central fixation cross. Joystick-controlled cursors (‘response arcs’, color-coded) are located at the edge of the fixation area. The stimulus motion signal was predictive of the location of behaviorally relevant reward targets (gray disc). Alignment of the cursor arc with the target resulted in target collection (‘hit’) and reward. The joystick eccentricity was linked to the size of the cursor: Central position – wide; Eccentric position – narrow. (c) Example trace of stimulus motion signal and dyadic responses: the stimulus frequently changed in motion direction and coherence level. Participants tracked the stimulus direction with the joystick to obtain rewards when collecting reward targets. Joystick eccentricity is illustrated with the hue of the colored example trace. Darker hues indicate a more central joystick position. Target presentation occurred at pseudorandom time intervals (see Supplementary Figure 1b). The visual and auditory feedback representing the reward score was provided after each target hit. (d) Reward contingencies of the CPR game. Top: Upon target hit, the reward score was calculated based on joystick eccentricity and accuracy, individually for each player. Bottom: Joystick responses at the time of target presentation for an example dyad.
Each dot corresponds to the accuracy-eccentricity combination during target presentation. The shaded greyscale background illustrates the non-zero reward map. The non-shaded, white area denotes missed targets with no reward (score = 0). The accuracy and eccentricity responses of both dyadic players are summarized with the histograms (color-coded). The positive trend of reward scores, shown in Supplementary Figure 1c, indicates perceptual learning over time.

Participants played the game in different experimental settings: alone (solo) and together with a partner (dyadic). In dyadic experiments, we continuously presented the perceptual (joystick) reports – perceived motion direction and associated confidence – of both participants in a mutually visible manner. In the context of this paper, we define social information as the perceptual report of the partner. In addition, ‘task competence’ refers to perceptual accuracy, while ‘performance’ relates to the behavioral outcome (reward score).

Humans express perceptual confidence in real-time through peri-decision wagering

Confidence measures have been shown to scale with evidence accumulation time and perceptual performance (Balsdon et al., 2021, 2020; Kiani and Shadlen, 2009). Our task design intended to capture perceptual confidence changes via real-time wagering behavior. First, we verified that participants used peri-decision wagering while playing the CPR game. To that end, we quantified if the overall quality of the solo report depended on the RDP motion coherence. The following parameters were considered: joystick position (accuracy & eccentricity), the proportion of successfully collected targets (‘hit rate’, Figure 2a), and joystick response lag after an RDP direction change (Figure 2b-c, Supplementary Figure 1d).

Solo behavior during the continuous perceptual report.

(a) Coherence-dependent modulation of hit rates (left), accuracy (center) and eccentricity (right). Gray lines show averages for individual participants. Red shading illustrates the 99% confidence intervals of the mean across participants. Bold, black lines show the mean across the population. Data was first averaged within-subject, before pooling coherence conditions across-subjects. (b) Estimation of the response lag after a stimulus direction change with a cross-correlation between stimulus and joystick signal for an example subject. The lag of the maximum cross-correlation coefficient was set to be the response lag. Low coherence levels resulted in a breakdown of response reliability, indicated by the low cross-correlation coefficients (see also Supplementary Figure 1d). (c) Top: average population response lag displayed with a violin plot. Black data points show average data of individuals. White dot displays population median. Bottom: variability of the population lags, displayed by the standard deviation. Response lags and response lag reliability after stimulus change depended on stimulus coherence (color-coded).

Participants responded, on average, 643 ms ± 78 ms (Mean ± IQR, coherence pooled) after an RDP direction change, with higher motion coherence causing faster stimulus-following responses. Low RDP coherence resulted in a breakdown of motion tracking, increasing the variance of cross-correlation peaks. This motion tracking profile reflects the average evidence integration time, with faster, more reliable responses suggesting higher levels of confidence.
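The lag estimate can be sketched as follows: correlate the stimulus signal with the joystick signal at a range of delays and take the delay with the highest correlation coefficient. This is an illustrative sketch only. The function names, the sampling rate, and the treatment of both signals as ordinary (non-circular) time series are assumptions; the actual preprocessing is described in the Methods and may differ.

```python
import random
from statistics import mean


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den if den > 0 else 0.0


def response_lag(stimulus, response, sample_rate_hz, max_lag_samples):
    """Return (lag in ms, peak coefficient) of the best-matching response delay.

    Both signals must have the same length and sampling rate.
    """
    best_lag, best_r = 0, float("-inf")
    for lag in range(max_lag_samples + 1):
        n = len(stimulus) - lag
        # Compare stimulus[t] with response[t + lag].
        r = pearson(stimulus[:n], response[lag:lag + n])
        if r > best_r:
            best_lag, best_r = lag, r
    return 1000.0 * best_lag / sample_rate_hz, best_r


# Toy check: a white-noise "stimulus" and a copy delayed by 6 samples at 10 Hz.
rng = random.Random(0)
stimulus = [rng.gauss(0.0, 1.0) for _ in range(200)]
response = [0.0] * 6 + stimulus[:-6]
lag_ms, peak_r = response_lag(stimulus, response, sample_rate_hz=10.0,
                              max_lag_samples=20)
```

In this toy case the recovered lag is 600 ms with a peak coefficient close to 1. A noisy or unreliable tracking response would lower the peak coefficient, which corresponds to the breakdown reported for low coherence levels.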

Despite high inter-subject variability, we found that motion coherence robustly impacts all behavioral response measures: hit rates, joystick accuracy and joystick eccentricity (Linear mixed effects model – see Supplementary Table 1 to Supplementary Table 4). This suggests that participants were able to adapt their behavioral response to varying stimulus difficulty to maximize monetary outcome. Hence, participants utilized the contingencies of the game and were able to wager on their percepts. In addition to this consistent joystick response modulation, we further demonstrate that hit rates, while generally increasing with motion coherence, dropped for 24 of 38 participants (63%) during the most salient RDP coherence, suggesting overconfident or risk-seeking joystick placement. These findings imply faster evidence integration resulting in higher perceptual accuracy and confidence levels.

To assess whether subjects used joystick eccentricity as a proxy for perceptual confidence, we adapted the metacognitive performance analysis of the area under the receiver operating characteristics curve (AUC) for confidence ratings to our continuous joystick responses (Fleming and Lau, 2014; Maniscalco and Lau, 2014, 2012). We used the distributions of joystick response measurements to infer whether there is a relationship between response accuracy and eccentricity, separately for each RDP coherence level. To that end, we median-split the joystick responses into high- and low-eccentricity distributions. We then analyzed the AUC to quantify whether accuracy differed between these distributions. High accuracy was indeed related to higher eccentricity, suggesting an ongoing metacognitive assessment of the perceptual report that is reflected in the eccentric joystick placement (Supplementary Figure 2). Thus, participants optimized joystick placement in both response dimensions (accuracy and eccentricity) when the task was easier, which provides further evidence for real-time peri-decision wagering, and the link between joystick eccentricity and perceptual confidence.
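A minimal sketch of this median-split AUC analysis is given below. It assumes paired accuracy and eccentricity samples for a single coherence level, and it computes the AUC as the probability that a randomly drawn high-eccentricity accuracy value exceeds a randomly drawn low-eccentricity one (ties count half). The function names and the handling of values exactly at the median are assumptions.

```python
from statistics import median


def auc(group_a, group_b):
    """P(random a > random b), with ties counted as 0.5 (equivalent to ROC AUC)."""
    if not group_a or not group_b:
        raise ValueError("both groups need at least one sample")
    wins = 0.0
    for a in group_a:
        for b in group_b:
            if a > b:
                wins += 1.0
            elif a == b:
                wins += 0.5
    return wins / (len(group_a) * len(group_b))


def metacognitive_auc(accuracy, eccentricity):
    """Median-split by eccentricity, then compare the accuracy of the two halves.

    Values above 0.5 mean that accuracy tends to be higher when eccentricity is high.
    Samples exactly at the median are assigned to the low group (an assumption).
    """
    cut = median(eccentricity)
    high = [a for a, e in zip(accuracy, eccentricity) if e > cut]
    low = [a for a, e in zip(accuracy, eccentricity) if e <= cut]
    return auc(high, low)


# Toy data: accuracy and eccentricity rise together.
toy_accuracy = [0.9, 0.8, 0.4, 0.3]
toy_eccentricity = [0.9, 0.8, 0.2, 0.1]
toy_auc = metacognitive_auc(toy_accuracy, toy_eccentricity)  # 1.0
```

An AUC of 0.5 would indicate that eccentricity carries no information about accuracy; the toy data are perfectly separated and therefore give 1.0.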

In summary, the solo CPR data indicate that participants positioned the joystick to actively wager on their own percept. As the eccentric joystick position was the only response dimension that could be chosen freely via metacognitive assessment of the current perceptual process, it can be treated as a proxy measure of subjective perceptual confidence. Usage and range of this response parameter varied widely between participants, suggesting individual confidence ranges. These findings imply that our CPR game makes it possible to continuously assess participants’ perceptual processes and associated confidence based on the behavioral report. In the next section, we examine whether and to what extent participants incorporated the perceptual report of a second player into their own decision-making process and how this affected perceptual accuracy and confidence.

Social setting changes perceptual confidence during real-time decision-making

Previous studies have shown that, under certain conditions, two participants are more successful in perceptual decision-making than the more competent player on their own (Bahrami et al., 2012b; Pescetelli and Yeung, 2022). We asked whether participants performed the CPR task better when a second player was playing along, and if so, whether changes in perceptual competence or confidence were the driving forces behind any improvement. To that end, we developed a two-player (dyadic) CPR task (Figure 1a) to assess whether and how participants incorporated information from another player into their own perceptual report. Dyadic conditions could be real, with two human participants playing simultaneously (‘HH dyad’; participants were introduced to the other player beforehand); or simulated, with one participant (who was led to believe they were playing with another participant) performing alongside a computer agent (‘HC dyad’). This section covers the results of human-human dyadic experiments. Both participants reacted to the same RDP and could observe both cursors and feedback of immediate and cumulative scores for themselves and the other player (see Methods). Importantly, we did not enforce a competitive or cooperative context – the individual payoff did not directly depend on the performance of the partner. Thus, participants could freely choose whether to use or ignore the perceptual report of the other player.

We pooled all experimental sessions of each participant according to social context (solo vs dyadic, within-subject). Average response lags after a direction change were significantly different in solo and dyadic experiments (Solo: 643 ms ± 78 ms [Mean ± IQR]; Dyadic: 662 ms ± 105 ms; Wilcoxon signed rank test, n = 34, Z = −2.98, p < 0.01), although the difference was small. We also found a small but significant improvement in average individual score between solo and dyadic experiments (Solo: 0.2357 ± 0.0696 [Mean ± IQR]; Dyadic: 0.247 ± 0.049; Wilcoxon signed rank test, n = 34 (subjects), Z = 2.31, p < 0.05). Compared to the solo CPR, 68% of participants (23/34) achieved a higher score when co-acting with a partner (Figure 3a). On the level of a dyad, the combined score of two players working together was higher than that of the same two players working alone (Mean difference = 0.023, Wilcoxon signed rank test, n = 50 (dyads), Z = −3.36, p < 0.001). Thus, social context seemed to improve overall task outcome (‘reward score’).

Dyadic vs solo social modulation.

(a) Reward score in dyadic vs solo sessions. Scores were averaged across all targets, including misses. All solo and (human-human) dyadic sessions were pooled within-subject. Inset: coherence-wise averaging of reward scores. Stimulus coherence is color-coded, see legend on the right in panel c. Each subject contributes one data point per stimulus coherence level. The median score across all subjects for each coherence condition is overlaid in brighter color hues. Error bars show 99% confidence intervals of the median in solo and dyadic conditions. (b) Social modulation between solo and dyadic experiments, measured as AUC, for joystick accuracy (top) and eccentricity (bottom, Wilcoxon rank sum test, Bonferroni-corrected) of individual participants. Coherence was pooled within-subjects. See Supplementary Figure 3a for average performance in dyadic experiments and Supplementary Figure 3c for three examples of social modulation and how the AUC captures the directionality of the behavioral change. (c) Coherence dependent modulation of hit rate (top row), accuracy (center row) and eccentricity (bottom row) in dyadic vs solo setting. First column: quantification of dyadic vs solo change in behavior for each participant and coherence condition. All dyadic and solo sessions were pooled, respectively. AUC was used to quantify the direction and magnitude of social modulation. A value of 0.5 corresponds to perfect overlap between solo and dyadic response distributions, 1 and 0 imply perfect separation between experimental conditions (see Supplementary Figure 3c). Gray lines correspond to the AUC of individual participants. Red shading illustrates the 99% confidence intervals of the mean across participants. Bold, black lines show the mean across the population. Data were averaged within-subject first, before pooling across coherence conditions. Second column: average social modulation displayed for different solo performance quartiles. Coherence is color-coded. 
The grouping into quartiles was done for each corresponding response dimension separately. Please see Supplementary Figure 3b for a comparison of raw solo joystick responses with social modulation. See Supplementary Figure 4 for quartile grouping across response dimensions. Third column: statistical comparison between joystick accuracy and eccentricity in solo and dyadic experiments, for each coherence condition. Sessions were pooled according to experimental condition within-subject. The percentage of participants with significantly different distributions in solo and dyadic sessions is displayed (Wilcoxon rank sum test, Bonferroni-corrected). The directionality of the significant effect in each subject was established with the AUC shown in the first column.

Next, we assessed whether changes in perceptual competence (discrete: hit rate; continuous: accuracy) or confidence were the driving factors behind this improvement. Hit rates were unaffected by social setting (Solo: 0.3887 ± 0.1054 [Mean ± IQR]; Dyadic: 0.3909 ± 0.0638; Wilcoxon signed rank test, n = 34 (subjects), Z = 0.6582, p = 0.51). Furthermore, average response accuracy in dyadic experiments did not change for 94% (32/34) of participants (Figure 3b; within subjects: Wilcoxon rank-sum test, coherence pooled, number of tests: 34 (subjects), Bonferroni-corrected significance threshold = 0.0015; across subjects: Wilcoxon signed rank test, Median AUC = 0.49, Z = −1.94, p = 0.0523). Thus, access to the reports of others did not improve average CPR competence. However, 94% of participants (32/34) placed their joysticks at a significantly different eccentric position when playing in a dyadic setting (Figure 3b; within subjects: Wilcoxon rank-sum test, coherence pooled, number of tests: 34 (subjects), Bonferroni-corrected significance threshold = 0.0015; across subjects: Wilcoxon signed rank test, Median AUC = 0.57, Z = 2.47, p < 0.05), suggesting altered confidence reports in a social setting. We estimated the directionality of the dyadic vs solo social modulation with the area under the receiver-operating characteristic (AUC) for each player (Figure 3c + Supplementary Figure 3, see Methods). Of all participants with significantly different joystick eccentricity in solo and dyadic conditions, 38% - 56% (min and max across coherence levels) increased their joystick eccentricity when playing with a partner, while 12% - 26% changed to a more central position, i.e., played more conservatively (Figure 3c, Wilcoxon rank-sum test, number of tests: 34 (subjects) * 7 (coherence levels) = 238, Bonferroni-corrected significance threshold = 2.1008e-04).
Thus, most participants achieved a better score by signaling, on average, more perceptual confidence during their continuous perceptual report in a social setting. The observed bidirectional effect might be explained by dyadic convergence, as initially less confident participants seemed to increase their perceptual wagers, while accurate individuals declined in accuracy (Figure 3c, plot of social modulation vs solo performance, see also Supplementary Figure 4). We further explore this hypothesis in the next section.
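The two Bonferroni-corrected thresholds reported above are consistent with dividing a significance level by the number of tests. The short check below assumes an uncorrected level of 0.05, which is not stated explicitly in this section but reproduces both reported values; the function name is illustrative.

```python
def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test significance threshold after Bonferroni correction.

    alpha = 0.05 is an assumption inferred from the reported thresholds.
    """
    return alpha / n_tests


per_subject = bonferroni_threshold(34)                    # one test per subject
per_subject_and_coherence = bonferroni_threshold(34 * 7)  # subjects x coherence levels

print(f"{per_subject:.4f}")                # 0.0015
print(f"{per_subject_and_coherence:.4e}")  # 2.1008e-04
```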

Performance difference between participants determines dyadic effect

When comparing all solo vs all dyadic sessions, our results suggest that perceptual confidence but not competence is modulated by the social setting. We next asked whether the observed social modulation can be explained by within-dyad differences in solo behavior. Intuitively, we hypothesized that a larger difference in solo performance between subjects would lead to a stronger modulation by the social context, because no additional information can be derived from observing a very similar partner (Figure 4a, ‘bow-tie’). On the other hand, earlier work suggested less successful perceptual decision-making in dyadic settings when perceptual sensitivities or confidence differ strongly between participants (Bahrami et al., 2010; Pescetelli and Yeung, 2022). To test these rival hypotheses, we analyzed the social modulation by contrasting the dyadic vs solo AUC for each player (all solo sessions of this subject vs specific dyadic session, RDP coherence pooled). Overall, we observed three behavioral patterns, across the range of individual solo differences (Figure 4a): (i) both participants improved in a social setting (AUC > 0.5; Accuracy: 14% of dyads, Eccentricity: 48% of dyads), (ii) both participants got worse (AUC < 0.5; Accuracy: 38%, Eccentricity: 16%) and (iii) one player improved while the other got worse (Accuracy: 48%, Eccentricity: 36%). Across the participants, the confidence increased while accuracy slightly decreased in the dyadic setting (Figure 4a – histogram, Wilcoxon signed rank test (n = 100), Accuracy: Median = 0.48, Z = −2.93, p < 0.01; Eccentricity: Median = 0.58, Z = 2.92, p < 0.01), confirming earlier within-subject findings for confidence (Figure 3). The significant shift of the median from 0.5 indicates asymmetric social modulation. Crucially, in contrast to earlier studies, our data does not reveal systematic dyadic benefits for dyads with similar perceptual accuracy or confidence.
Instead, there was a positive (but not significant) correlation between the average social modulation within a dyad and solo difference between players (Supplementary Figure 5a).

Social modulation in human-human dyads.

Left column: schematic depiction of hypothesized effects; middle and right column: actual data for eccentricity (confidence) and accuracy. (a) Left: social modulation might increase monotonically with solo performance difference (‘bow-tie’). Alternatively, similar performance has been shown to result in dyadic improvements (‘Gaussian’). Middle and right: within-dyad social modulation (dyadic vs solo), measured by AUC, as a function of the within-dyad solo difference. Dyads consist of two participants: player 1 (P1, filled circle) and player 2 (P2, open circle), which are connected with a gray line. Histograms summarize social modulation across all participants. Red lines illustrate the median of the distribution. (b) Absolute social modulation difference between dyadic players, corresponding to the length of the line connecting player 1 and player 2 in (a). Each dyad is represented by one data point. We expected a U-shaped function if the social modulation differences were larger in more heterogeneous dyads. The data were fitted with a 2nd order polynomial function (red). A moving average (window size = 12, black) confirmed the fits. (c) Signed social modulation difference between players for eccentricity (middle) and accuracy (right). Each dyad is represented by one data point. The red line illustrates the correlation between solo difference and social modulation (Linear regression, Accuracy: r = −0.6, p < 0.01; Eccentricity: r = −0.7, p < 0.05; P-values significantly different from randomly permuted data). We hypothesized a negative correlation if the worse solo player had a larger AUC value than the dyadic partner (left). (d) Absolute difference between partners in solo and dyadic setting for eccentricity (middle) and accuracy (right). Each dyad is represented by one data point. Dyads show convergence when differences between players in dyadic setting are smaller than differences in solo experiments (left).

To further examine the relationship between solo and dyadic performance, we tested the social modulation difference between the two players in the dyad, again as a function of individual differences in the solo condition. The absolute social modulation difference increased with larger difference in solo performance, as indicated by the running averages and by the better fit of a U-shaped (quadratic) function compared to a linear function (Figure 4b, Adj. R²: Accuracy: Linear = 0.0335 vs. Quadratic = 0.0620; Eccentricity: Linear = −0.0186 vs. Quadratic = 0.2271). Thus, more dissimilar solo performance elicits larger differences in social modulation.
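The comparison of linear and quadratic fits by adjusted R² can be sketched as below. This is a generic least-squares illustration in plain Python rather than the analysis code used for Figure 4b; the function names and the toy data are assumptions. Adjusted R² penalizes the extra parameter of the quadratic model, and it can become negative when a model explains less variance than expected by chance, as for the linear eccentricity fit reported above.

```python
def polyfit(x, y, degree):
    """Least-squares polynomial coefficients (lowest order first), normal equations."""
    m = degree + 1
    a = [[sum(xi ** (i + j) for xi in x) for j in range(m)] for i in range(m)]
    b = [sum(yi * xi ** i for xi, yi in zip(x, y)) for i in range(m)]
    # Gaussian elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, m):
            f = a[row][col] / a[col][col]
            for k in range(col, m):
                a[row][k] -= f * a[col][k]
            b[row] -= f * b[col]
    # Back substitution.
    coef = [0.0] * m
    for i in reversed(range(m)):
        s = sum(a[i][j] * coef[j] for j in range(i + 1, m))
        coef[i] = (b[i] - s) / a[i][i]
    return coef


def adjusted_r2(x, y, degree):
    """Adjusted R^2 of a polynomial fit with `degree` predictors plus intercept."""
    coef = polyfit(x, y, degree)
    pred = [sum(c * xi ** k for k, c in enumerate(coef)) for xi in x]
    y_mean = sum(y) / len(y)
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, pred))
    ss_tot = sum((yi - y_mean) ** 2 for yi in y)
    n = len(y)
    return 1.0 - (ss_res / ss_tot) * (n - 1) / (n - degree - 1)


# Toy U-shaped data: the quadratic model fits, the linear model does not.
toy_x = [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
toy_y = [xi ** 2 for xi in toy_x]
linear_adj_r2 = adjusted_r2(toy_x, toy_y, 1)     # negative
quadratic_adj_r2 = adjusted_r2(toy_x, toy_y, 2)  # close to 1
```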

Furthermore, concerning the direction of this relationship, we hypothesized that the worse solo player would benefit from a social setting, relative to the better solo player (here and elsewhere, “worse”/“better” mean less/more confident/accurate, correspondingly). Figure 4c (right column, see also Figure 4a) illustrates two possible scenarios: for dyad x, the individually worse player benefits from social context while the better solo player gets worse. For dyad y, both players get better, but the individually worse one improves more. We correlated the signed social modulation difference between the two members of each dyad with the difference in solo behavior. For both eccentricity and accuracy, a negative correlation was found between the within-dyad difference in social modulation and the difference in solo task (Figure 4c, Pearson correlation, n = 50, Eccentricity: r = −0.74, p < 0.05; Accuracy: r = −0.6, p < 0.01; compared to permuted dyads, see Methods). Thus, in line with our prediction, worse solo players either improve more or at least get less bad, relative to the initially better partner (who is either improving less or getting worse).
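One generic way to compare an observed correlation against permuted data is sketched below. It is an illustration only: the procedure actually used for the permuted dyads is described in the Methods and may differ (for example, by re-pairing players into surrogate dyads). Here the pairing between the two per-dyad quantities is simply shuffled, and all names, the number of permutations, and the seed are assumptions.

```python
import random


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den if den > 0 else 0.0


def permutation_p_value(solo_diff, modulation_diff, n_permutations=1000, seed=0):
    """Two-sided permutation p-value for the correlation between two variables.

    The pairing is destroyed by shuffling one variable; the p-value is the share
    of shuffles whose |r| is at least as large as the observed |r|.
    """
    rng = random.Random(seed)
    observed = pearson_r(solo_diff, modulation_diff)
    shuffled = list(modulation_diff)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if abs(pearson_r(solo_diff, shuffled)) >= abs(observed) - 1e-12:
            extreme += 1
    # Add-one correction so the estimate is never exactly zero.
    return observed, (extreme + 1) / (n_permutations + 1)


# Toy data with a perfect negative relationship.
toy_solo_diff = [float(i) for i in range(20)]
toy_modulation_diff = [-2.0 * v + 1.0 for v in toy_solo_diff]
r_obs, p_perm = permutation_p_value(toy_solo_diff, toy_modulation_diff)
```

A negative observed coefficient with a small permutation p-value would correspond to the pattern described above, in which the individually worse player shows the larger social modulation.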

The observed effects could be explained by confidence convergence during dyadic interaction, which has been reported to impact perceptual accuracy of individual and dyadic decisions (Bang et al., 2017; Pescetelli and Yeung, 2022). To directly test this, we contrasted the absolute eccentricity and accuracy differences between the two players in dyadic vs solo setting. Compared to solo experiments, 74% of dyads (37/50) exhibit a smaller eccentricity difference when playing together (Figure 4d, Wilcoxon signed rank test, n = 50, Z = 3.46, p < 0.001). This finding and the relative benefit for the worse solo player (Figure 4c) suggest dyadic convergence for metacognitive confidence, and to a smaller extent for accuracy, in a social context. This convergence is asymmetric because initially more confident players were affected less, and more accurate players were affected more, than their counterparts (Figure 3c), resulting in overall positive shift for confidence and slight negative shift for accuracy (Figure 4a).

Lastly, we compared the average eccentricity and the average accuracy correlations between the two players, in solo and dyadic context. As expected, there was no correlation between solo reports, but the participants’ behavior became significantly correlated when they played together (Supplementary Figure 5c, Pearson’s correlation, n = 50, Accuracy: Solo: r = 0.19, p = 0.18, Dyadic: r = 0.48, p < 0.001; Eccentricity: Solo: r = 0.11, p = 0.44, Dyadic: r = 0.54, p < 0.001).

Perceptual accuracy improves with reliable social signaling

As expected, the quality of the solo perceptual report declined in a comparable fashion across participants for low stimulus coherence (Figure 2). We wondered if this perceptual breakdown led to the relatively small accuracy modulation by the social context we described above (Figure 3 and Figure 4). Based on earlier work on Bayesian integration and social conformity (De Martino et al., 2017; Germar et al., 2016; Khalvati et al., 2021; Park et al., 2017), we expected that information from a partner would be weighted by how reliably accurate the partner’s reports were. We hypothesized that participants would integrate more social evidence when it was reliably accurate, regardless of the stimulus noise. Furthermore, we asked whether incorporating social signals into human decision-making requires graded, accuracy-dependent confidence signaling by others.

To that end, we developed a computer player that was programmed to accurately represent the nominal RDP direction (± Gaussian noise; note that even 0% coherence had a correct “nominal” direction), with a fixed “confidence” of 0.5 (± Gaussian noise in a.u.) across all coherence levels at all times (Supplementary Figure 6). In such human-computer (HC) dyads, the computer player was physically impersonated by one of the experimenters who pretended to be the partner. Thus, participants believed that they played the game with another human. The computer player was set up to report motion direction with a constant, human-like latency (508 ms ± Gaussian noise). The computer response was not affected by the cursor of the human participants, resulting in a situation where the social signals might only unilaterally affect the human player. Crucially, unlike real human partners, the computer player did not provide useful information regarding its tracking confidence. With this condition we aimed to evaluate whether human participants would integrate social cues about the motion direction into their own reports, while the partner’s confidence report was uninformative. We nevertheless expected a response accuracy improvement, especially when sensory evidence became degraded at low coherences. Furthermore, we hypothesized that the reliable nature of the computer partner would result in riskier, more eccentric joystick placement in human players.
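The behavior of such a computer player can be sketched as follows. The mean confidence (0.5) and mean latency (508 ms) are taken from the text; the three noise standard deviations are not reported here, so the default values below are purely hypothetical placeholders, as is the function name.

```python
import random


def computer_player_report(nominal_direction_deg, rng,
                           dir_sd_deg=5.0, conf_sd=0.05, lag_sd_ms=50.0):
    """Return one simulated report: (direction in deg, confidence in a.u., latency in ms).

    The noise SDs are hypothetical; only the means (nominal direction, 0.5, 508 ms)
    come from the description of the computer player.
    """
    # Direction follows the nominal RDP direction, independent of coherence.
    direction = (nominal_direction_deg + rng.gauss(0.0, dir_sd_deg)) % 360.0
    # Confidence stays near 0.5 at every coherence level, so it carries no information.
    confidence = min(1.0, max(0.0, 0.5 + rng.gauss(0.0, conf_sd)))
    # Human-like response latency around 508 ms.
    latency_ms = max(0.0, 508.0 + rng.gauss(0.0, lag_sd_ms))
    return direction, confidence, latency_ms
```

Because the report depends only on the nominal stimulus direction and never on the human cursor, any influence in such a dyad can only flow from the simulated partner to the human player.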

By accurately representing the stimulus direction, the computer player triggered profound behavioral effects (Supplementary Table 2 to Supplementary Table 4). In this setting, participants collected more targets with a higher score (Figure 5a). Compared to human-human dyads, 48% of participants (16/33) improved their accuracy (while none became worse) across all coherence levels (Figure 5b-d, within-subjects: Wilcoxon rank-sum test, coherence pooled, number of tests = 33 (subjects), Bonferroni-corrected significance threshold = 0.0015; across subjects: Wilcoxon signed rank test, Median AUC = 0.57, Z = 4.85, p < 0.001; see Supplementary Figure 7 for comparison to solo behavior). In particular, 52% of participants (17/33) showed a significant accuracy boost at 0% coherence (Wilcoxon rank-sum test, number of tests: 33 (subjects) * 7 (coherence levels) = 231, Bonferroni-corrected significance threshold = 2.1645e-04, Median AUC across subjects at 0% coherence = 0.73). Thus, participants integrated reliable sensory-social direction cues to improve their task performance, especially when stimulus was ambiguous, suggesting a unilateral convergence towards the computer player. But despite more accurate task performance, most participants did not improve in the eccentricity dimension (Figure 5b-d). In fact, compared to an interaction with a real human counterpart, 64% of participants showed less confidence while only 18% improved when playing with the computer player (within-subjects: Wilcoxon rank-sum test, coherence pooled, number of tests = 33 (subjects), Bonferroni-corrected significance threshold = 0.0015, across subjects: Wilcoxon signed rank test, Median AUC = 0.45, Z = −3.15, p < 0.01). Subjects were particularly affected when the task was easy (98% coherence, Median AUC across subjects = 0.36). This too seems to suggest confidence convergence to the relatively invariant low confidence computer player, even with otherwise reliably accurate direction signaling.
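The two quantities used in these within-subject comparisons can be reproduced with a short sketch: the Bonferroni-corrected threshold and the AUC effect size. The AUC is written here in its rank-based form (the probability that a sample from the first condition exceeds a sample from the second, with ties counted as one half), which is equivalent to the area under the ROC curve; the function names are illustrative and the base significance level of 0.05 is assumed.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold after Bonferroni correction."""
    return alpha / n_tests


def auc(first_condition, second_condition):
    """Area under the ROC curve between two samples.

    0.5 means fully overlapping distributions; values above 0.5 mean that
    values in the first condition tend to be larger than in the second.
    """
    wins = 0.0
    for a in first_condition:
        for b in second_condition:
            if a > b:
                wins += 1.0
            elif a == b:
                wins += 0.5
    return wins / (len(first_condition) * len(second_condition))


# Thresholds quoted in the text, assuming alpha = 0.05.
pooled = bonferroni_threshold(0.05, 33)             # one test per subject
per_coherence = bonferroni_threshold(0.05, 33 * 7)  # subjects x coherence levels
```

With 33 tests the threshold is 0.05 / 33 ≈ 0.0015, and with 231 tests it is 0.05 / 231 ≈ 2.1645e-04, matching the values reported above.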

Comparison of social modulation in human-human (HH) dyads and human-computer (HC) dyads.

(a) Left: average subject-wise score in the two dyadic experiments compared to the score of the same participant in solo experiments. Average score (middle) and hit rate (right) of the population in different experimental conditions. Error bars correspond to the 99% confidence intervals of the mean. (b) Comparison of average hit rate (top), accuracy (middle) and eccentricity (bottom) for each participant and each stimulus coherence (color-coded) in human-human and human-computer dyads. Individual data (subject-wise, averaged across several HH sessions for each subject) are shown in darker hue. Medians are overlaid for each coherence condition with bright colors. Error bars show 99% confidence intervals of the median. (c) Response difference between HH and HC dyadic experiments measured by AUC. All HH and HC sessions were pooled, respectively. A value of 0.5 corresponds to perfect overlap between HH and HC dyadic response distributions, 1 and 0 imply perfect separation between experimental conditions (see Supplementary Figure 3c). Gray lines correspond to the AUC of individual participants. Bold black lines and red shading illustrate the mean and 99% confidence intervals of the mean across participants. Data were averaged within-subject first, before pooling across coherence conditions. (d) Population comparison of social modulation (between solo and HH experiments, top) and HH and HC contrast (bottom).

Social modulation of confidence and accuracy co-varies across dyads

So far, we analyzed the accuracy and the eccentricity modulations independently. Here we investigate the link between the two report dimensions. In both dyadic conditions, subject-wise social modulation of perceptual confidence correlated with the change in accuracy (Figure 6, HH: Pearson’s correlation, n = 100, r = 0.55, p < 0.001; HC: n = 33, r = 0.43, p < 0.05), suggesting that the gain or the loss in accuracy leads to reappraisal of confidence. Interestingly, partners’ initial confidence (mis)match in solo experiments did not correlate with dyadic improvements or deteriorations in accuracy (Supplementary Figure 8, Pearson’s correlation, n = 50, r = 0.24, p = 0.09), unlike in the previous study that investigated a similar relationship (Pescetelli and Yeung, 2022).

Relationship between social modulation of accuracy and eccentricity.

Left: Human-Human (HH) dyads vs. solo; Middle: Human-Computer (HC) dyads vs solo; Right: Correlation of eccentricity vs accuracy difference between dyadic experiments (HC vs HH). Values above 0.5 correspond to increased accuracy or confidence in the “first condition” (e.g. HH dyad) compared to the “second condition” (e.g. Solo), and vice versa.

Despite the positive covariation between social modulation of accuracy and eccentricity, the human-computer experiment demonstrated that social evidence integration does not require metacognitively sensitive, graded confidence signaling by the partner. However, the apparent dissociation between the average improvement of accuracy and the average decline of confidence is still grounded in a lawful relationship between the two response dimensions. Players who gained substantially more accuracy tended to show a confidence increase, or at least a smaller confidence decrease, compared to players who did not change much in accuracy (Figure 6, HC vs HH: n = 98, r = 0.62, p < 0.001). To conclude, both individually varied, coherence-dependent human reports and reliably accurate direction reports coupled with uninformative confidence expression by the simulated partner influenced human perceptual decisions and resulted in converging, socially conforming behavior.

Discussion

In this study, we assessed continuous human perceptual decision-making in social and non-social settings, with a newly developed paradigm, where subjects wager on the correctness of their motion percept in real-time. Overall, in dyadic settings, we find higher perceptual confidence but no gain in accuracy. We demonstrate that convergence during dyadic co-action underlies this net effect: the magnitude and directionality of the behavioral change depends on competence and confidence of the social partners.

In contrast to the general increase in confidence we have observed, previous studies did not report overall rise in confidence (Bang et al., 2017; Pescetelli et al., 2016; Pescetelli and Yeung, 2022). However, some earlier work demonstrated gains in dyadic competence – i.e. when the joint decision-making outperforms even the better of two partners – after partners exchanged their individual confidence (Bahrami et al., 2010; Pescetelli et al., 2016). Likewise, competence increases even during dyadic co-action, especially for participants with similar confidence (Pescetelli and Yeung, 2022). Despite presenting subjective confidence continuously and saliently as part of the perceptual report, our dyadic task did not evoke overall improvements in accuracy (a continuous measure of competence) or hit rate (a discrete measure of competence), suggesting a metacognitive bias in social conditions. The resulting score (which combines accuracy, hit rate, and confidence), however, did improve in the dyadic setting, suggesting a form of dyadic benefit dissociated from perceptual competence. This indicates that real-time social feedback boosts confidence in one’s perception, without a corresponding enhancement in competence.

This apparent disconnect between confidence (which can be construed as perceived competence) and actual competence is further reflected in the differences in social modulation as a function of solo performance. We found that individually least confident players exhibited the largest increase in confidence in dyadic settings. Conversely, the most individually accurate players declined in accuracy. Reminiscent of previous work on metacognitive biases in individuals (Kruger and Dunning, 1999), we show disproportional effects of social context on perceptual confidence in less competent players. Less skilled individuals benefit from the social information of their superior partners, while proficient individuals give little weight to the information of their less skilled partners.

Along these lines, we found convergence in accuracy and especially confidence between dyadic partners. Direction and magnitude of this effect were determined by the difference in performance of the two players, with the worse solo player changing more and in a more beneficial way during dyadic experiments. Larger solo differences between participants resulted in larger difference in social modulation between the partners. Dyadic confidence convergence (Esmaily et al., 2023; Pescetelli and Yeung, 2022) and confidence matching (Bang et al., 2017) have been described before. Here, we also show systematic social modulation based on not only accuracy and confidence differences between participants but also on their initial solo performance. However, unlike the previous work, where similar confidence (Pescetelli and Yeung, 2022) or perceptual sensitivity (Bahrami et al., 2012a, 2010; Baumgart et al., 2020) correlated with higher dyadic competence, we do not find systematic dyadic competence benefits (in our case, average accuracy within a dyad) for subjects with similar task competence or confidence. We speculate that the type and modality of social feedback and interaction might underlie these differences. Explicit verbal communication (Bahrami et al., 2012a, 2010) or periods of metacognitive introspection in which prior individual decisions are evaluated (Pescetelli and Yeung, 2022) might have elicited competence improvements.

The control experiment with a reliably accurate, simulated dyadic partner who exhibited constant intermediate level of confidence irrespective of the task difficulty evoked vastly improved hit rate and accuracy, especially when sensory information was ambiguous. At the same time, human confidence reports became more conservative when playing with the “conservative” simulated partner, especially at the high stimulus coherence. Improvements in competence together with declining confidence further support dyadic convergence, because the human player gravitated towards the report of the simulated partner. Thus, instead of using the reliably accurate information provided by the computer player to be more accurate and more confident, convergence interfered with fully maximizing the reward score. High-accuracy, low-confidence simulated partners have been recently shown to elicit more conservative confidence reports during binary dyadic decision-making (Esmaily et al., 2023). Beyond these results, our experiments demonstrate that humans do not require sensible confidence expression to recognize and utilize differences in task competence. Our findings indicate a possible dissociation between the direction of accuracy and confidence alignment. At the same time, we observe a positive covariation of accuracy and confidence modulation by the dyadic context, suggesting a retention of metacognitive sensitivity under social influence. We conclude that performance history and temporal reliability might be important factors in addition to explicitly signaled confidence, especially when these information streams are not congruent.

Systematic changes towards group consensus (‘social conformity’) have been shown to bias decision-making towards majority choices (De Martino et al., 2017; Germar et al., 2016; Park et al., 2017; Toelch and Dolan, 2015). The dyadic convergence we and others observe might be the basis for social conformity in larger group settings. Critically, dyadic convergence bilaterally affects both partners, but asymmetrically. For instance, the more confident players adjust their confidence very little but such individuals may disproportionally influence group consensus.

In our experiment, participants were not instructed to cooperate or compete with one another. They did not jointly reach a perceptual decision but instead co-acted under independent reward contingencies. This difference to previous reports (Bahrami et al., 2012b, 2012a, 2010; Pescetelli et al., 2016) is crucial for interpreting our results, since participants could ignore the overt behavior of the other player. Therefore, any social modulation or correlated behavior observed in our experiment can be attributed to a spontaneous, self-regulated process. We interpret our findings as evidence that in social situations most people spontaneously and opportunistically integrate the judgment of others into their own decisions, even when social interaction is not incentivized or enforced. In line with this argument, humans seem to naturally follow gaze signals and choice preferences of others, suggesting the utilization of others’ thoughts and intentions (Bayliss et al., 2007; Madipakkam et al., 2019; Mitsuda and Masaki, 2018). Furthermore, human co-action seems to result in attentional attraction or withdrawal in some dyads (Dosso et al., 2018). As a next step, it would be very interesting to test whether face-to-face co-action through a transparent shared visual display induces even stronger social effects than co-action in separate experimental booths (Moeller et al., 2023).

The advent of new techniques such as time-continuous decision-making (Bonnen et al., 2015; Huk et al., 2018; Noel et al., 2023, 2022) and hyperscanning (Babiloni and Astolfi, 2014; Czeszumski et al., 2020) allows us to ask how evolving decisional variables are represented in neural circuitry underlying flexible behaviors. This is an important step beyond the traditional approach based on discrete, trial-based decisions. Adapting this approach, we demonstrate real-time influence of social information on human perceptual decisions. Studies investigating the neuronal correlates of similar perceptual decisions have demonstrated faster and more accurate behavioral responses when the sensory evidence resulted in earlier and more reliable neuronal changes (Fan et al., 2018; Gold and Stocker, 2017; Kiani and Shadlen, 2009). It has also been shown that inputs elicited by microstimulation or optogenetics can be integrated into perceptual decisions (Fetsch et al., 2018, 2014; Salzman et al., 1990). Along these lines, we propose that reliably accurate real-time social information is multiplexed with sensory signals, possibly resulting in enhanced encoding already in cortical neurons representing relevant sensory dimensions.

In summary, we show that the presence of a co-acting social partner adaptively changes continuous perceptual decisions, resulting in mutual but asymmetric convergence and a net dyadic benefit. This is particularly apparent in a strong improvement of competence, confidence, and reward score of the worse partner. The better partner, on the other hand, gets less accurate and/or slightly loses confidence. These lawful relationships between confidence and competence modulations demonstrate the importance of concurrently considering these two measures, both within each participant and across interacting partners. These results advance our understanding of how humans evaluate and incorporate social information, especially in real-time decision-making situations not permitting slow and careful deliberation.

Methods

Study design and participants

Data were recorded from 38 human participants (Median age: 26.17 years, IQR 4.01; 13 of whom had corrected vision) between January 2022 and August 2023. Prior to the experimental sessions, each participant was trained on two occasions. During the experimental phase, participants played three variations of the experimental paradigm: alone (solo), with a human player (Human-Human dyad), and with a computer player (Human-Computer dyad). The experimental order was mixed and largely determined by the availability of participants (Supplementary Figure 1a). Most participants in this study originated from central Europe or South Asia. All procedures performed in this study were approved by the Ethics Board of the University of Göttingen (Application 171/292).

Experimental setup

Participants sat in separate experimental booths with identical hardware. They were instructed to rest their head on a chinrest, placed 57 cm away from the screen (Asus XG27AQ, 27” LCD). A single-stick joystick (adapted analog multifunctional joystick (Sasse), Resolution: 10-bit, 100 Hz) was anchored to an adjustable platform placed in front of the participants, at a height of 75 cm from the floor. The joystick was calibrated before data acquisition to ensure comparable readouts. Screens were calibrated to be isoluminant. Two speakers (Behringer MS16), one for each setup, were used to deliver auditory feedback at 70 dB SPL.

The experimental paradigm was programmed in MWorks (Version 0.10 - https://mworks.github.io). Two iMac Pro computers (Apple, MacOS Mojave 10.14.6) served as independent servers for each setup booth. These computers were controlled by a third iMac Pro (Apple, MacOS Mojave 10.14.6). Custom-made plugins for MWorks were used to generate and display the stimuli, to handle the data acquisition from the joystick (10 ms sampling interval), and to incorporate all data from both servers into a single data file.

Continuous perceptual report (CPR) game

Participants were instructed to maximize monetary outcome in a motion tracking game. In this game, subjects watched a frequently changing random dot pattern on the screen and used a joystick (Szul et al., 2020) to indicate their current motion direction perception. The joystick controlled an arc-shaped response cursor on the screen (partial circle with fixed eccentricity, Solo: 2 degrees of visual angle (‘dva’) radius from the center of the screen; Dyadic: 1.8 dva and 2 dva radius). The angular direction of the joystick was linked to the cursor’s polar center position. In addition, the joystick eccentricity was permanently coupled to the cursor’s width (see below, 13 to 180 degrees). This resulted in a continuous representation of the joystick position along its two axes. By moving the joystick, participants could rotate and shape the cursor. At unpredictable times (1% probability every 10 ms), a small white disc (‘reward target’, diameter: 0.5 dva) appeared on the screen for a duration of 50 ms at 2.5 dva eccentricity, congruently with the motion direction of the stimulus. Whenever a target appeared in line with the cursor, the target was considered collected (‘hit’) and the score of the participant increased. To help participants perform this alignment to the best of their perceptual abilities, a small triangular reference point was added to the center point of the cursor. Throughout the experiment, participants were required to maintain gaze fixation on a central fixation cross (2.5 dva radius tolerance window) or the cursor would disappear and no targets could be collected until fixation was resumed.
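The coupling between joystick position, cursor shape, and target collection can be sketched as follows. Two assumptions are made that the text does not spell out: the cursor width shrinks linearly from 180 to 13 degrees as joystick eccentricity goes from 0 to 1, and a target counts as a hit when its direction lies within half the cursor width of the cursor center. All function names are illustrative.

```python
def angular_distance_deg(a, b):
    """Smallest absolute angle between two directions, in degrees (0 to 180)."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def cursor_width_deg(eccentricity, min_width=13.0, max_width=180.0):
    """Arc width for a joystick eccentricity in [0, 1].

    Assumption: linear mapping, widest arc at the center position (eccentricity 0),
    narrowest arc at maximal joystick deflection (eccentricity 1).
    """
    ecc = min(1.0, max(0.0, eccentricity))
    return max_width - ecc * (max_width - min_width)


def is_hit(target_dir_deg, joystick_dir_deg, eccentricity):
    """True if the target direction falls inside the arc (assumed hit criterion)."""
    half_width = cursor_width_deg(eccentricity) / 2.0
    return angular_distance_deg(target_dir_deg, joystick_dir_deg) <= half_width
```

Under these assumptions a wide arc collects targets even when the reported direction is off by tens of degrees, whereas a narrow arc only collects targets when the direction report is precise, which is what makes the arc width usable as a confidence wager.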

In the solo experiments, the cursor was always red. In dyadic conditions, the two cursors, present on screen simultaneously, were red and green (isoluminant at 17.5 cd/m2 ± 1 cd/m2). During dyadic experiments, the position of the two cursors switched between stimulus cycles, with the red cursor always starting above, but not overlapping the green cursor. Each cursor color was permanently associated with one of the two experimental booths. After the mid-session break, participants switched booths, contributing an equal amount of data for each setup (600 reward targets, ∼20 min, up to 17 stimulus cycles). Players initiated new stimulus cycles with a joystick movement. Each stimulus cycle could last up to 75 seconds, during which the RDP’s motion direction and coherence changed at pseudorandomized intervals, resulting in the presentation of 30 stimulus states per cycle.

Random dot pattern (RDP)

We used a circular RDP (8 dva radius) with white dots on a black background. Each dot had a diameter of 0.1 dva, moved with 8 dva/s and had a lifetime of 25 frames (208 ms). The overall dot density was 2.5 dots/dva. The stimulus patch was centrally located on the screen. The central part of the stimulus (5 dva diameter) was blacked out. In this area we presented the fixation cross and the response arc. The RDP motion direction was randomly seeded and set to change instantly by either 15 deg, 45 deg, 90 deg or 135 deg after a pseudo-randomized time interval of 1250 ms to 2500 ms. Whether the signal moved in clockwise or counterclockwise direction was random. Only signal dots altered their direction. The dot coherence changed pseudo-randomly after 10 RDP direction changes to the coherence level that was presented least. Seven coherence conditions were tested: 0, 8, 13, 22, 36, 59, 98%.
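The stimulus schedule described above can be sketched as a simple generator. The sketch assumes uniform sampling of state durations between 1250 and 2500 ms, an equal probability for each step size and rotation sign, and random tie-breaking among the least-presented coherence levels; none of these sampling details are specified in the text, and all names are hypothetical.

```python
import random

COHERENCES = [0, 8, 13, 22, 36, 59, 98]   # percent
STEPS_DEG = [15, 45, 90, 135]             # possible direction changes


def generate_cycle(rng, counts, n_states=30, changes_per_coherence=10):
    """Generate one stimulus cycle as a list of (direction_deg, coherence, duration_ms).

    counts: dict mapping coherence -> how often it has been presented so far;
    it is updated in place so that later cycles favor rarely shown levels.
    """
    states = []
    direction = rng.uniform(0.0, 360.0)   # randomly seeded start direction
    coherence = None
    for k in range(n_states):
        if k % changes_per_coherence == 0:
            # Switch to the least-presented level, excluding the current one.
            options = [c for c in COHERENCES if c != coherence]
            fewest = min(counts[c] for c in options)
            coherence = rng.choice([c for c in options if counts[c] == fewest])
            counts[coherence] += 1
        if k > 0:
            sign = rng.choice([-1, 1])    # clockwise or counterclockwise
            direction = (direction + sign * rng.choice(STEPS_DEG)) % 360.0
        states.append((direction, coherence, rng.uniform(1250.0, 2500.0)))
    return states
```

With 30 states of at most 2500 ms each, a cycle generated this way cannot exceed 75 seconds, in agreement with the maximal cycle duration stated for the game.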

Gaze control

Participants were required to maintain gaze fixation at the center of the screen throughout each stimulus cycle. We used a white cross (0.3 dva diameter) as anchor point for the participants’ gaze. The diameter of the fixation window was set to 5 dva. An eye tracker (SR research, EyeLink 1000 Plus) was used to control gaze position in real-time. If the gaze position left the fixation window for more than 300 ms, the player’s arc would disappear from the screen, preventing target collection. In addition to this, an increase of the fixation cross’ size, together with a change in color (white to red), signaled to the participants that fixation was broken. As soon as the gaze entered the fixation window again, visual parameters were reset to the original values and the arc would reappear, allowing the player to continue target collection.
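The gaze-contingent rule can be written as a small state machine. This is a sketch under the assumption that gaze position is expressed in dva relative to the fixation cross and that the 300 ms tolerance is measured from the first sample outside the window; the function name and the sample format are hypothetical.

```python
import math


def cursor_visibility(gaze_samples, window_radius_dva=2.5, grace_ms=300.0):
    """Return, for each gaze sample, whether the response arc is visible.

    gaze_samples: list of (t_ms, x_dva, y_dva), ordered in time.
    The arc disappears once gaze has been outside the window for more than
    grace_ms and reappears as soon as gaze re-enters the window.
    """
    visible = []
    outside_since = None
    for t, x, y in gaze_samples:
        inside = math.hypot(x, y) <= window_radius_dva
        if inside:
            outside_since = None          # fixation resumed, reset the timer
            visible.append(True)
        else:
            if outside_since is None:
                outside_since = t         # first sample outside the window
            visible.append(t - outside_since <= grace_ms)
    return visible
```

The default radius of 2.5 dva corresponds to the 5 dva diameter of the fixation window given above.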

Reward score

Participants were incentivized to maximize their monetary outcome by collecting as many targets as possible with the highest possible score. The minimum polar distance between the arc’s center position and the target’s center (‘accuracy’) as well as the angular width of the arc (‘eccentricity’) at the moment of collection were taken into account when calculating the score:

where RDPdir refers to the direction of the random dot pattern and JSdir refers to the direction of the joystick at sample i.

where Eccentricity varies between 0 and 1 for minimal and maximal eccentric joystick positions, respectively.

Thus, narrower and more accurately placed cursors caused higher reward scores.
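Since the score equations are not reproduced in this text, the following sketch only illustrates one form that is consistent with the surrounding definitions. It assumes that accuracy is the angular error between RDP and joystick direction normalized to the range 0 to 1, and that the score is the product of accuracy and eccentricity; both formulas and all function names are assumptions rather than the published equations.

```python
def angular_error_deg(rdp_dir, js_dir):
    """Minimum polar distance between stimulus and joystick direction (0 to 180 deg)."""
    d = abs(rdp_dir - js_dir) % 360.0
    return min(d, 360.0 - d)


def accuracy(rdp_dir, js_dir):
    """Assumed accuracy: 1 for a perfect direction report, 0 for the opposite direction."""
    return 1.0 - angular_error_deg(rdp_dir, js_dir) / 180.0


def reward_score(rdp_dir, js_dir, eccentricity):
    """Assumed score: accuracy weighted by joystick eccentricity (0 to 1)."""
    return accuracy(rdp_dir, js_dir) * eccentricity
```

Under this assumed form, a perfectly aligned cursor at half eccentricity earns half of the maximal score, and the same direction report earns more when the arc is narrower (higher eccentricity), provided the target is still collected.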

Feedback signals

Various feedback signals were provided throughout the experiment to inform participants about their short- and long-term performance. All feedback signals were mutually visible.

Immediate feedback

Immediately after a target was collected, visual and acoustic signals were provided simultaneously. The auditory feedback consisted of a 200 ms long sinusoidal pure tone at a frequency determined by the score. Each tone corresponded to a reward range of 12.5%, with lower pitch corresponding to lower reward score. We used 8 notes from the C5 major scale (523, 587, 659, 698, 784, 880, 988, 1047 Hz). Sounds were on- and off-ramped using a 50 ms Hanning window. No sound feedback was given for missed targets. In solo experiments, the visual feedback consisted of a 2 dva wide circle, filled in proportion to the score with the same color as the player’s arc. The circle was presented in the center of the screen, behind the fixation cross for 150 ms. In dyadic conditions, the visual feedback consisted of half a disc for each player and both color-coded halves were mutually visible.
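The mapping from score to tone can be sketched as a lookup over eight equally sized score bins. The sketch assumes a score normalized to the range 0 to 1 and left-closed bin edges (a score of exactly 0.125 falls into the second bin); the function name is illustrative.

```python
# Frequencies of the eight notes listed in the text, in Hz.
NOTES_HZ = [523, 587, 659, 698, 784, 880, 988, 1047]


def feedback_tone_hz(score):
    """Return the tone frequency for a score in [0, 1].

    Each note covers a score range of 1/8 = 12.5%; higher scores give higher pitch.
    """
    s = min(1.0, max(0.0, score))
    index = min(int(s * len(NOTES_HZ)), len(NOTES_HZ) - 1)  # a score of 1.0 maps to the top note
    return NOTES_HZ[index]
```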

Short-term feedback

During each stimulus cycle, a running average of the reward score was displayed for each player with a 0.9 dva wide, color-coded ring around the circumference of the RDP (18.2 dva and 19.4 dva diameter). After every target presentation, the filled portion of the ring updated. To avoid spatial biasing, the polar zero position of the ring changed randomly with every stimulus cycle.

Long-term feedback

Cumulative visual feedback was provided after each stimulus cycle (during the inter-cycle intervals) for 2000 ms. It displayed the total reward score accumulated across all cycles as a colored bar graph located at the center of the screen. A grey bar (2 dva wide, max height: 10 dva) indicated the maximal possible cumulative score after each cycle. A colored bar next to it (same dimensions) showed how much was collected by the player so far. In dyadic experiments, red and green bars would be shown on either side of the grey bar. In solo experiments, a red bar was shown to the left of the grey bar. The configuration of the visual stimuli and task parameters is illustrated in Figure 1 and as a supplementary video file (Supplementary Video 1).

Behavioral analysis

Performance metrics were extracted and averaged in time windows of 30 frames (∼250 ms) that were either target- or state-aligned. Target-alignment refers to time windows prior to the first target presentation of each stimulus state. Target-aligned data were only considered if the first target appeared at least 1000 ms after the direction change. This analysis approach was chosen to (i) allow adequate time for a response and (ii) avoid the prediction of motion direction based on earlier target locations. State-alignment refers to a 30-frame time window before a motion direction change of the stimulus. Stimulus states in which fixation breaks exceeded 10% of the state duration were excluded from the performance analyses.
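The selection of target-aligned analysis windows can be sketched as follows. The frame duration assumes a 120 Hz display, inferred from the statement that 25 frames correspond to 208 ms; the function name and argument layout are hypothetical.

```python
FRAME_MS = 1000.0 / 120.0  # assumed 120 Hz refresh rate (25 frames = 208 ms)


def target_aligned_window(first_target_ms, state_duration_ms, fixation_break_ms,
                          n_frames=30, min_delay_ms=1000.0, max_break_fraction=0.10):
    """Return (start_ms, end_ms) of the analysis window, or None if the state is excluded.

    All times are relative to the direction change that started the stimulus state.
    """
    # Exclude states in which fixation breaks exceed 10% of the state duration.
    if fixation_break_ms > max_break_fraction * state_duration_ms:
        return None
    # Require the first target to appear at least 1000 ms after the direction change.
    if first_target_ms is None or first_target_ms < min_delay_ms:
        return None
    # The window covers the 30 frames immediately before the first target.
    return (first_target_ms - n_frames * FRAME_MS, first_target_ms)
```

For a state-aligned window, the same arithmetic would apply with the next direction change taking the place of the first target.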

Statistical analysis

For population analyses, data were first averaged within-subject. Bootstrapped 95% confidence intervals were estimated with 1000 repetitions (Matlab: bootci). Differences between experimental conditions were tested with a two-sided Wilcoxon signed rank test (Matlab: signrank, for paired samples) and a two-sided Wilcoxon rank sum test (Matlab: ranksum, for independent samples). Bonferroni correction was applied for multiple testing. Whether or not coherence was pooled is indicated for each test. Within-subject effect size was estimated with the area under the receiver operating characteristic (‘AUC’, Matlab: perfcurve). AUC values of 0.5 indicated similar distributions. AUC of 0 and 1 suggested perfectly separated distributions. Directionality of social modulation was inferred by AUC change (larger vs smaller than 0.5). Correlation coefficients were calculated with a Pearson correlation (Matlab: corrcoef). Average response lags between stimulus and response were estimated with the maximum cross-correlation coefficient (Matlab: xcorr). Social modulation differences between dyadic players were compared to the baseline modulation of shuffled dyadic partners (Matlab: randperm).

We fitted three Generalized Linear Mixed Models (GLMM; Baayen, 2008) which differed in their response variable and the size of the data set analyzed but had identical fixed effects structures and largely identical random effects structures. We fitted one model each for the probability of a target hit (model 1a), joystick eccentricity (model 1b), and joystick accuracy (model 1c) as the response. All three aimed at estimating the extent to which the respective response variable was affected by the fixed effects of experimental condition (solo or human-computer dyad), random dot pattern coherence, stimulus duration, stimulus number, block number, and day number. We hypothesized that the effect of coherence depended on the condition; thus, we included the interaction between these two predictors into the fixed effects part of the model. To avoid pseudo-replication and account for the possibility that the response was influenced by several layers of non-independence, we included three random intercept effects, namely those of the ID of the participant, the ID of the test day (nested in participant; hereafter ‘day ID’), and the ID of the block (nested in participant and day; hereafter ‘block ID’). The reason for including the latter two was that it could be reasonably assumed that the performance of participants varied between test days and also between blocks tested on the same day. To avoid an ‘overconfident model’ and keep the type I error rate at the nominal level of 0.05, we included all theoretically identifiable random slopes (Barr et al., 2013; Schielzeth and Forstmeier, 2009). These were those of condition, coherence, their interaction, stimulus duration, stimulus number, block number, and day number within participant; coherence, stimulus duration, stimulus number, and block number within day ID; and finally coherence, stimulus duration, and stimulus number within block ID. Originally we also included estimates of the correlations among random intercepts and slopes into each model, but due to convergence and identifiability problems (recognizable by absolute correlation parameters being close to 1; Matuschek et al., 2017) we had to exclude all or several of these estimates from the full models (see Supplementary Table 1 for detailed information).

For each model we conducted a full-null model comparison, which aims at avoiding ‘cryptic multiple testing’ and keeping the type I error rate at the nominal level of 0.05 (Forstmeier and Schielzeth, 2011). As we had a genuine interest in all predictors present in the fixed effects part of each model, the null models comprised only the intercept in the fixed effects part but were otherwise identical to the respective full model. This full-null model comparison utilized a likelihood ratio test (Dobson, 2001). Tests of individual effects were also based on likelihood ratio tests, comparing the full model with each of a set of reduced models that lacked one fixed effect at a time.

Model implementation

We fitted all models in R (version 4.3.2; R Core Team, 2023). In model 1a, we included the response as a two-column matrix with the number of targets hit and not hit in the first and second column, respectively (Baayen, 2008). The model was fitted with a binomial error structure and logit link function (McCullagh and Nelder, 1989). In essence, such models model the proportion of targets hit. We are aware that, in principle, one would need an ‘observation level random effect’ which would link the number of targets hit and not hit in a given stage. However, in a relatively large proportion of stages (19.7%) only a single target appeared, and in the largest share of stages (47.0%) only two targets appeared, making it unlikely that a respective random effect could be fitted successfully.
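To make the two-column response concrete, the sketch below evaluates the binomial log-likelihood contribution of a single stage under a logit link. It is an illustrative sketch only, not the fitting routine used here; the function names and the linear predictor value `eta` are hypothetical.

```python
import math


def inv_logit(eta: float) -> float:
    """Map a linear predictor to a hit probability (inverse logit link)."""
    return 1.0 / (1.0 + math.exp(-eta))


def binomial_loglik(hits: int, misses: int, eta: float) -> float:
    """Log-likelihood of one stage with `hits` targets hit and `misses` not hit."""
    n = hits + misses
    p = inv_logit(eta)
    return (
        math.log(math.comb(n, hits))
        + hits * math.log(p)
        + misses * math.log(1.0 - p)
    )
```

A stage with a single target contributes only one Bernoulli outcome to this likelihood, which illustrates why stages with one or two targets carry little information for estimating an additional stage-level variance component.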

Models 1b and 1c were fitted with a beta error distribution and logit link function (Bolker, 2008). Models fitted with a beta error distribution cannot cope with values in the response being exactly 0 or 1. Hence, when such values were present in a given response variable, we transformed them as suggested by Smithson and Verkuilen (2006). Model 1a was fitted using the function glmer of the package lme4 (version 1.1-34; Bates et al., 2015), and models 1b and 1c were fitted using the function glmmTMB of the equally named package (version 1.1.8; Brooks et al., 2017). We determined model stability by dropping levels of the random effects factors, one at a time, fitting the full model to each of the subsets, and finally comparing the range of fixed effects estimates obtained from the subsets with those obtained from the model fitted on the respective full data set. This revealed all models to be of good stability. We estimated 95% confidence limits of model estimates and fitted values by means of parametric bootstraps (N=1000 bootstraps; function bootMer of the package lme4 for model 1a and function simulate of the package glmmTMB for models 1b and 1c). In none of the models was the response overdispersed (maximum dispersion parameter: 1.0).
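The boundary transformation proposed by Smithson and Verkuilen (2006) compresses the response slightly towards 0.5, y' = (y(N - 1) + 0.5) / N, with N being the sample size. A minimal sketch follows; the function name is hypothetical, and taking N as the length of the supplied vector is an assumption of this example.

```python
def squeeze_unit_interval(values):
    """Move responses in [0, 1] into the open interval (0, 1).

    Applies y' = (y * (N - 1) + 0.5) / N, where N is the number of
    observations passed in (an assumption of this sketch).
    """
    n = len(values)
    return [(y * (n - 1) + 0.5) / n for y in values]
```

With three observations 0, 0.5 and 1, the transformed values are approximately 0.167, 0.5 and 0.833, so the midpoint is preserved and the boundary values become admissible for a beta model.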

Acknowledgements

This work was supported by the German Research Foundation (Deutsche Forschungsgemeinschaft, DFG), SFB 1528 - Cognition of Interaction, project A01, and the Leibniz Collaborative Excellence grant K265/2019 “Neurophysiological mechanisms of primate interactions in dynamic sensorimotor settings” (PRIMAINT). We thank Fred Wolf for useful discussions.

Additional information

Author contribution

Conceptualization: F.S., A.C., A.G., I.K., S.T.; Methodology: F.S., A.C., S.T.; Investigation: F.S.; Analysis: F.S., R.M.; Software: F.S.; Writing – Original Draft: F.S., A.C., I.K.; Writing – Review & Editing: all authors; Funding Acquisition: A.G., I.K., S.T.; Resources: S.T.; Supervision: I.K., S.T.

Declaration of Interests

The authors declare no competing interests.

Lead contact

Further information and requests for resources should be directed to and will be fulfilled by the lead contact, Felix Schneider (fschneider@dpz.eu).

Data and code availability

The dataset generated during this study is available at CLOUD INFORMATION (LINK). The MATLAB code generated during this study is available at GitHub (https://github.com/SocCog-Team/CPR/tree/main/Publications/2023_perceptual_confidence).

Supplementary Figures

Additional information regarding experiments and joystick responses of individual participants.

(a) Number and identity of experimental sessions for each subject. A session comprised two experimental blocks that were recorded in different setups. The order of the session type (solo or dyadic) was mixed (color-coded). All participants, except two, contributed data to each session type, specifically, solo CPR as well as both dyadic conditions. (b) Statistics of target occurrences during stimulus presentation. Distributions of inter-target intervals and target count per stimulus state for an example session. Targets were flashed with a 1% probability every 10 ms. Once a target was presented, it remained on the screen for 50 ms, followed by a minimum inter-target interval of 300 ms. (c) Final reward scores of participants over the course of the individuals’ data acquisition period (gray lines). Reward score increased over time. A linear regression was fitted to the cumulative scores of each experimental block for each participant (black lines). Note that each experimental session comprised two blocks, one in each setup. Independent of the session type (see panel (a) for more details), scores increased over time, likely due to perceptual learning. The final cumulative score comprises the hit rate, accuracy and eccentricity, all of which are affected by the number of the experimental block, with later sessions resulting in higher hit rates as well as more accurate and eccentric responses (Supplementary Table 1 - Supplementary Table 4). (d) Normalized cross-correlation coefficients between random dot motion direction and joystick response direction illustrated for each subject. Lighter hues indicate higher cross-correlation coefficients at the respective signal lag; darker hues suggest low correlation between stimulus and joystick response at that lag. Cross-correlations broke down consistently with lower stimulus coherence.
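The target schedule described in panel (b) can be illustrated with a short simulation. This is a sketch under stated assumptions and not the experimental code: the function name and the seed are hypothetical, and the 300 ms minimum inter-target interval is assumed to start at target offset, which the description above leaves open.

```python
import random


def simulate_target_onsets(duration_ms, p_target=0.01, step_ms=10,
                           target_ms=50, min_gap_ms=300, seed=0):
    """Return target onset times (ms) for one simulated stimulus stream."""
    rng = random.Random(seed)
    onsets = []
    next_allowed = 0  # earliest time at which a new target may appear
    for t in range(0, duration_ms, step_ms):
        # A target can only be drawn once the previous one has ended
        # and the minimum gap has elapsed (assumed to start at offset).
        if t >= next_allowed and rng.random() < p_target:
            onsets.append(t)
            next_allowed = t + target_ms + min_gap_ms
    return onsets
```

Under these assumptions, consecutive onsets are separated by at least 350 ms, and the waiting time beyond that dead period is geometrically distributed with a mean of about 1 s.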

Joystick eccentricity is a proxy measure of perceptual confidence.

(a) Metacognitive sensitivity of joystick response for one example subject. Left: Distribution of joystick accuracy for low (gray) vs high eccentricity stimulus states (colored, median split). Accuracy and eccentricity were averaged for the last 30 frames (250 ms) prior to a stimulus direction change. Coherence is color-coded. Right: Corresponding receiver-operating characteristics (‘ROC’) between the two distributions for each coherence level (color-coded). An ROC curve along the diagonal would indicate similar accuracy distributions between hits and misses, suggesting no metacognitive sensitivity. (b) Population AUC values (black dots) are consistently above 0.5 (p<0.001, two-sided Wilcoxon signed rank test for distribution with median 0.5), demonstrating that high eccentricity was more often associated with high accuracy, suggesting metacognitive-sensitive confidence readouts.
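The AUC measure described above can be sketched as follows. The example is illustrative and is not the MATLAB code used for the analysis: function names are hypothetical, the AUC is computed as the probability that a randomly drawn high-eccentricity state has higher accuracy than a randomly drawn low-eccentricity state (ties counted as one half), and assigning states at exactly the median to the low group is an assumption of this sketch.

```python
from statistics import median


def auc(low, high):
    """Probability that a value from `high` exceeds a value from `low`."""
    wins = 0.0
    for h in high:
        for l in low:
            if h > l:
                wins += 1.0
            elif h == l:
                wins += 0.5  # ties count as half
    return wins / (len(high) * len(low))


def metacognitive_auc(accuracy, eccentricity):
    """Median-split states by eccentricity and compare their accuracy."""
    cut = median(eccentricity)
    low = [a for a, e in zip(accuracy, eccentricity) if e <= cut]
    high = [a for a, e in zip(accuracy, eccentricity) if e > cut]
    return auc(low, high)
```

A value of 0.5 indicates that accuracy does not differ between low- and high-eccentricity states, whereas values above 0.5 indicate that higher eccentricity tends to accompany higher accuracy.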

Additional information for human-human dyadic performance.

(a) Performance summary in (human-human) dyadic experiments. Hit rate (left), accuracy (center) and eccentricity (right) illustrated for each subject contributing human-human dyadic data. Same conventions as in Figure 2a. (b) Comparison of social modulation with response measures of solo experiments. Hit rate differences (left, compared to solo hit rate) and social modulation of accuracy (center, compared to solo accuracy) and eccentricity (right, compared to solo eccentricity) are displayed for the entire population. Joystick data was recorded in a normalized fashion for both accuracy (180° difference = 0; perfect match of stimulus direction = 1) and eccentricity (center = 0; max. eccentricity = 1). Each participant contributes one data point per coherence condition (color-coded). The median score across all subjects is overlaid for each coherence condition in brighter color hues. Error bars show 99% confidence intervals of the median in solo and dyadic conditions. (c) Social modulation of the eccentricity responses for three example participants. Data for solo (light, left) and human-human dyadic experiments (dark, right) are displayed for each stimulus coherence level (color-coded). Each dot corresponds to the time-window average for a single stimulus state. All sessions of the same experimental condition are pooled. Corresponding AUC values, used to quantify the direction and magnitude of social modulation between dyadic and solo experiments, are shown below. A value of 0.5 corresponds to perfect overlap between solo and dyadic response distributions; 1 and 0 imply perfect separation between experimental conditions.

Social modulation grouped by different metrics.

Illustration of average social modulation for hit rates (first row), accuracy (second row) and eccentricity (third row) based on differently grouped solo performance quartiles: first column - hit rate quartiles, second column - accuracy quartiles, third column - eccentricity quartiles. Coherence is color-coded. Same conventions as in Figure 3c. Social modulation between dyadic and solo experiments was measured by AUC. An AUC value of 0.5 corresponds to perfect overlap between solo and dyadic response distributions. AUC > 0.5 implies better accuracy or higher eccentricity in dyadic experiments. AUC < 0.5 implies better accuracy or higher eccentricity in solo experiments.

Additional information regarding the relationship between dyadic and solo behavior.

(a) Average (across dyad members) social modulation for eccentricity (top) and accuracy (bottom) displayed as a function of the absolute solo differences in eccentricity (left) and accuracy (right) between dyadic partners. Each data point corresponds to one dyad. Average social modulation did not correlate with absolute solo difference between players. (b) Dyadic performance, defined as average (across dyad members) score (top) and average hit rate (bottom), displayed as a function of the absolute solo differences in eccentricity (left) and accuracy (right) between dyadic partners. Dyadic performance did not correlate with absolute solo difference between players. (c) Correlations between the accuracy (left) and eccentricity (right) of dyadic partners in solo and dyadic settings. Accuracy and eccentricity correlated between players only in dyadic settings.

Additional information regarding computer player performance.

(a) Illustration of computer player behavior in an example session. Each dot corresponds to the accuracy-eccentricity combination during target presentation. Distribution of computer behavior is summarized with histograms. Same conventions as in Figure 1d. Coherence is color-coded. (b) Cross-correlation between stimulus direction changes and cursor responses of computer player. Similar human-like response lag was built in for all coherence conditions (color-coded). See Figure 2c for comparison. (c) Average computer player performance (reward score, hit rate, accuracy, and eccentricity) as a function of stimulus coherence. Shaded background corresponds to the 99% confidence intervals of the median. See Figure 2a for comparison to human behavior.

Social modulation in human-computer (HC) dyads vs solo.

(a) Comparison of average hit rate (top), accuracy (center) and eccentricity (bottom) for each stimulus coherence in solo and human-computer dyads, color-coded for coherence level. Individual data are shown in darker hues. Each subject contributes one data point per coherence condition. Medians across subjects are overlaid for each coherence condition (bright color). Error bars show 99% confidence intervals of the median. Same conventions as in Figure 5b. (b) Social modulation of humans between dyadic (human-computer, HC) and solo experiments. Same conventions as in Figure 5c.

Absolute solo eccentricity difference between dyadic partners did not correlate with average (across dyad members) dyadic accuracy modulation.

Supplementary Tables

Random effects structure and sample size of each model

Results of the full model with hit probability as response

Results of the full model with eccentricity being the response

Results of the full model with accuracy being the response