Illustration of how pairwise correlations can affect the weight of evidence (logLR) for the generative source of an observation. a Computing the logLR when the observation (x) is a single sample from one of two one-dimensional Gaussian distributions (labeled A and B), with means ±μg and equal variances ($\sigma_g^2$) (Gold and Shadlen, 2001). b Computing the logLR when the observation (x1, x2) is a pair of samples from one of two pairs of one-dimensional Gaussian distributions (labeled A and B), with means ±μg, equal variances ($\sigma_g^2$), and correlation ρ between the two Gaussians. c The normative scale factor applied to the observation (the $1/(1+\rho)$ term in b), plotted as a function of correlation sign and magnitude. The dashed horizontal line corresponds to a scale factor of 1, which occurs at ρ = 0. The insets show three example pairs of distributions with different correlations, as indicated. The dotted lines in a and b, and in the insets in c, indicate the optimal decision boundary separating evidence for A versus B.
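The logLR expressions in panels a and b can be checked numerically. The sketch below assumes the setup described above (A has means +μg, B has means −μg, common variance σg², pair correlation ρ); the function names are illustrative and not taken from the study's code. It compares the closed-form pair logLR, 2μg(x1 + x2)/(σg²(1 + ρ)), with the log ratio of the two bivariate Gaussian densities computed directly, which also shows that the pair logLR depends on the data only through x1 + x2.

```python
import math

def loglr_single(x, mu_g, sigma_g):
    """logLR (A vs. B) for one sample; A has mean +mu_g, B has mean -mu_g, both variance sigma_g**2."""
    return 2.0 * mu_g * x / sigma_g**2

def scale_factor(rho):
    """Normative correlation-dependent scale factor 1/(1 + rho); equals 1 at rho = 0."""
    return 1.0 / (1.0 + rho)

def loglr_pair(x1, x2, mu_g, sigma_g, rho):
    """Closed-form logLR for a correlated pair; depends on the data only through x1 + x2."""
    return 2.0 * mu_g * (x1 + x2) / sigma_g**2 * scale_factor(rho)

def bivariate_logpdf(x1, x2, m, sigma, rho):
    """Log density of a bivariate Gaussian with both means m, both SDs sigma, correlation rho."""
    z1, z2 = (x1 - m) / sigma, (x2 - m) / sigma
    q = (z1 * z1 - 2.0 * rho * z1 * z2 + z2 * z2) / (1.0 - rho * rho)
    return -math.log(2.0 * math.pi * sigma**2 * math.sqrt(1.0 - rho * rho)) - 0.5 * q

# Compare the closed form with the direct log density ratio (A: +mu_g, B: -mu_g).
x1, x2, mu_g, sigma_g, rho = 0.3, -0.1, 1.0, 1.0, 0.5
direct = bivariate_logpdf(x1, x2, mu_g, sigma_g, rho) - bivariate_logpdf(x1, x2, -mu_g, sigma_g, rho)
print(loglr_pair(x1, x2, mu_g, sigma_g, rho), direct)  # both approximately 0.2667
```

Because the scale factor is 1/(1 + ρ), it exceeds 1 for negative correlations and falls below 1 for positive ones, matching the shape described for panel c.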

Task. a Human observers viewed pairs of stars (updated every 0.2 s) and were asked to decide whether the stars were generated by a source on the left or the right side of the screen. An example star pair is shown. The horizontal positions of each star pair were drawn from a bivariate Gaussian distribution whose mean and correlation varied from trial to trial. b Because the normative correlation-dependent scale factor that converts observations to evidence (logLR) increases as the correlation decreases, we manipulated the mean of the generative distribution so that the expected logLR (evidence strength) was fixed across correlation conditions. c The generative distributions of the sum of individual star pairs, for three example correlation conditions. Decreasing the correlation decreases the standard deviation of the sum distribution. By adjusting each correlation-specific generative mean (μρ) in proportion to the correlation-dependent change in the standard deviation relative to the zero-correlation condition (i.e., $\mu_\rho = \mu_0\sqrt{1+\rho}$, where μ0 is the generative mean at ρ = 0), the true logLR distribution (i.e., that of an ideal observer) is invariant to the correlation, and thus evidence strength remains fixed. Note that the sum-of-pairs distribution is equivalent to the bivariate distribution for the purposes of computing the logLR (see Methods).
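A short sketch of the mean adjustment in panel c, assuming each star's horizontal position has standard deviation σ (sigma below) and that μ0 (mu0) is the generative mean at ρ = 0; both names are assumptions for illustration. With the pair logLR 2μρ(x1 + x2)/(σ²(1 + ρ)) and E[x1 + x2] = 2μρ under the correct source, the expected logLR is 4μρ²/(σ²(1 + ρ)), which reduces to the constant 4μ0²/σ² once μρ = μ0√(1 + ρ).

```python
import math

def generative_mean(mu0, rho):
    """Per-star generative mean mu_rho, scaled by the SD ratio sqrt(1 + rho) relative to rho = 0."""
    return mu0 * math.sqrt(1.0 + rho)

def sum_sd(sigma, rho):
    """SD of x1 + x2 when each star has SD sigma and the pair has correlation rho."""
    return sigma * math.sqrt(2.0 * (1.0 + rho))

def expected_loglr(mu0, sigma, rho):
    """Expected true logLR per star pair under the correct source."""
    mu_rho = generative_mean(mu0, rho)
    mean_sum = 2.0 * mu_rho  # E[x1 + x2] under the correct source
    return 2.0 * mu_rho * mean_sum / (sigma**2 * (1.0 + rho))

for rho in (-0.6, 0.0, 0.6):
    print(rho, round(sum_sd(1.0, rho), 3), round(expected_loglr(0.2, 1.0, rho), 6))
```

For these illustrative values the expected logLR stays at 4μ0²/σ² = 0.16 in every condition, while the SD of the sum shrinks as ρ decreases.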

Effects of correlations on choice and RT. a Data from an example participant from the 0.6 correlation-magnitude group. Top: choices plotted as a function of evidence strength (abscissa) and correlation condition (see legend). Middle, Bottom: mean RTs for correct and error trials, respectively. Error bars are within-participant standard errors of the mean (SEM). b Same as a, but with data averaged across all participants. For each condition, evidence strength was standardized to the mean evidence strength (expected logLR) across participants. RT was standardized by subtracting each participant's mean RT in the zero-correlation condition, separately for correct and error trials. Points and error bars are across-participant means and SEMs, respectively.
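A minimal sketch of the RT standardization described above, assuming trials are stored as dictionaries with the hypothetical keys participant, rho, correct, and rt (seconds). Each trial's RT is referenced to the same participant's mean RT at ρ = 0, computed separately for correct and error trials.

```python
from collections import defaultdict
from statistics import mean

def standardize_rts(trials):
    """Return copies of trials with 'rt_std' = rt minus the participant's zero-correlation mean RT."""
    baseline = defaultdict(list)
    for t in trials:
        if t["rho"] == 0:
            # Baselines are kept separately for correct and error trials.
            baseline[(t["participant"], t["correct"])].append(t["rt"])
    base_mean = {key: mean(rts) for key, rts in baseline.items()}
    return [dict(t, rt_std=t["rt"] - base_mean[(t["participant"], t["correct"])]) for t in trials]

trials = [
    {"participant": "p1", "rho": 0.0, "correct": True, "rt": 1.0},
    {"participant": "p1", "rho": 0.0, "correct": True, "rt": 1.2},
    {"participant": "p1", "rho": 0.6, "correct": True, "rt": 0.9},
    {"participant": "p1", "rho": 0.0, "correct": False, "rt": 1.5},
    {"participant": "p1", "rho": -0.6, "correct": False, "rt": 1.8},
]
for t in standardize_rts(trials):
    print(t["rho"], t["correct"], round(t["rt_std"], 3))
```

This sketch raises a KeyError if a participant has no zero-correlation trials of a given outcome; how such cases were handled in the study is not stated here.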

RTs were consistent with a bound on (approximate) logLR. a RTs measured from an example participant for the weaker (left) and stronger (right) evidence conditions. Unfilled points are data from individual trials. Filled points are means; lines are linear fits to those means. b Summary of mean RT versus correlation for all participants and conditions. Correlation-magnitude group is indicated at the top of each panel. Lines are data from individual participants. c Summary of slopes of linear fits to mean RT versus correlation for individual participants (as in a). Box-and-whisker plots show the median, interquartile range, 90th percentiles, and outliers as a function of correlation-magnitude group. Colored lines are the predicted relationships for decisions based on accumulating evidence to a fixed bound, with the weight of evidence computed as either the correlation-independent (naïve) or the correlation-dependent (true) logLR. The data are roughly consistent with decision processes that, on average, used a correlation-dependent logLR but based on a slight underestimate of the correlation-dependent scale factor (computed using ; black dashed lines).

A drift-diffusion model (DDM) captures normative evidence weighting via bound-height adjustments. a In the DDM, sensory observations are modeled as samples from a Gaussian distribution (in the continuum limit). Evidence is accumulated over time as the decision variable until it reaches one of the two bounds, which terminates the decision in favor of the choice corresponding to that bound (here, for simplicity, we show fixed bounds, but the fits detailed below use collapsing bounds). For pairs of correlated observations, altering the correlation between the pairs is equivalent to changing the standard deviation of the generative distribution of the sum of each pair, which affects both the drift rate and the scaling of the bound height (see Methods). Normative evidence weighting in the DDM corresponds to correlation-dependent adjustments of the bound height (the decision rule) that account for the changes in the generative standard deviation. These changes in the bound height are functionally equivalent to scaling the observations to compute the true logLR. b Predictions from the DDM. Colors correspond to three simulated correlation conditions (see legend). Other parameters were chosen to approximate those found in fits to human data. Each column depicts predictions based on the same form of correlation-dependent bound scaling (see a and below) but with a different subjective correlation (i.e., the correlation assumed by the observer), computed as a proportion of the objective correlation ρ (on Fisher-z-transformed correlations that were then back-transformed). Given equal expected logLR across correlation conditions, underestimating the correlation (a subjective correlation smaller in magnitude than ρ; first three columns) leads to performance differences between the conditions, and the magnitude of these differences depends on the degree of underestimation. Only a subjective correlation equal to ρ (rightmost column) produces equal predicted performance across conditions.
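The equivalence between scaling observations and adjusting the bound can be illustrated with a discrete-time simulation. This is a simplified sketch, not the fitted DDM: it assumes fixed bounds (the fits use collapsing bounds), one sample per star pair, illustrative parameter values (mu0, sigma, bound), and an observer who accumulates the correlation-free logLR 2μρ(x1 + x2)/σ² and stops at a bound scaled by (1 + ρ̂), where ρ̂ (rho_hat) is a hypothetical name for the subjective correlation. Because 1 + ρ̂ > 0, this stopping rule is identical to accumulating evidence scaled by 1/(1 + ρ̂) to a fixed bound. The helper subjective_rho mimics the proportional, Fisher-z-based construction of the subjective correlation described above.

```python
import math
import random

def subjective_rho(rho, proportion):
    """Subjective correlation as a proportion of rho, applied in Fisher-z space and back-transformed."""
    return math.tanh(proportion * math.atanh(rho))

def simulate(rho, rho_hat, mu0=0.2, sigma=1.0, bound=2.0, n_trials=2000, seed=1):
    """Return (accuracy, mean number of samples) for one correlation condition.

    The source is always A (positive mean), so a choice is correct when the upper bound is hit.
    """
    rng = random.Random(seed)  # same seed per condition: differences reflect weighting, not noise
    mu_rho = mu0 * math.sqrt(1.0 + rho)  # mean adjusted so that expected true logLR is fixed
    mean_sum = 2.0 * mu_rho  # mean of x1 + x2 under source A
    sd_sum = sigma * math.sqrt(2.0 * (1.0 + rho))  # SD of x1 + x2
    scaled_bound = bound * (1.0 + rho_hat)  # bound adjustment == scaling evidence by 1/(1 + rho_hat)
    n_correct, total_steps = 0, 0
    for _ in range(n_trials):
        dv, steps = 0.0, 0
        while abs(dv) < scaled_bound:
            s = rng.gauss(mean_sum, sd_sum)  # the pair enters the logLR only through its sum
            dv += 2.0 * mu_rho * s / sigma**2  # correlation-free ("naive") logLR of this pair
            steps += 1
        n_correct += dv > 0
        total_steps += steps
    return n_correct / n_trials, total_steps / n_trials

for rho in (-0.6, 0.0, 0.6):
    ideal = simulate(rho, rho_hat=rho)
    naive = simulate(rho, rho_hat=subjective_rho(rho, 0.0))  # proportion 0 -> assumes zero correlation
    print(rho, ideal, naive)
```

With rho_hat equal to rho, the accumulated quantity is the true logLR, whose per-sample distribution is the same in every condition by design, so accuracy and mean sample count match across conditions. With a subjective correlation of zero, the effective bound on the true logLR becomes bound/(1 + ρ), which is higher for negative and lower for positive correlations.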

A DDM accounts for human behavior. a Model comparison: mean AIC (top) and protected exceedance probability (PEP; bottom), across all task conditions, for four different models, as labeled (see text for details). b Model comparison within each correlation-magnitude group, showing the difference in AIC between the bound and bound+drift models (top) and PEP (bottom). Bar colors in the PEP plots correspond to the model colors in the top panel of a. c Predictions from the DDM (lines) plotted against participant data (points) for choice (top) and RT (bottom) for each correlation-magnitude group (columns, labels at top). Predictions and data are averaged across participants. Colors correspond to the three correlation conditions (see legend). Error bars are SEM. Model predictions are derived from the best-fitting model at each magnitude (see b). The inset for the 0.8 group compares the fit of the bound+drift model (solid lines) to the bound model (dashed lines).

Participants used near-optimal correlation estimates, with slight biases away from extreme values. a The fitted subjective correlation from the DDM as a function of the objective correlation (ρ). Open circles are the fits from individual participants. Closed circles are the averages per correlation condition (averages were computed on Fisher-z-transformed values and then back-transformed). Error bars (not visible in most cases) are SEM. The dashed line is the unity line. b The same data as in a, but plotted with the correlation-dependent bound scale factor on the ordinate. The orange dashed line corresponds to the normative scale factor. The green dashed line is the scale factor for a naïve observer that assumes zero correlation.
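The condition averages in panel a were computed in Fisher-z space and then back-transformed. A minimal sketch of that averaging (the function name is illustrative):

```python
import math

def fisher_mean(rs):
    """Average correlations via the Fisher z-transform (atanh), then back-transform with tanh."""
    z = [math.atanh(r) for r in rs]
    return math.tanh(sum(z) / len(z))

# Averaging in z space gives more weight to values near +/-1 than a plain arithmetic mean.
print(fisher_mean([0.2, 0.8]), (0.2 + 0.8) / 2)
```

The transform makes the sampling distribution of a correlation closer to Gaussian, which is the usual reason for averaging correlations this way rather than directly.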

Participants used stable estimates of the correlations, even as they adjusted other components of the decision process over the course of a session. Each panel shows a scatterplot of DDM parameters estimated using the first (abscissa) versus second (ordinate) half of trials from a given participant, from the best-fitting model per correlation-magnitude group. Points are data from individual participants. Columns are correlation conditions; rows show a drift rate (k0) and b bound height (B). c Estimates of positive (squares) and negative (diamonds) subjective correlations. P-values are for a Wilcoxon rank-sum test of H0: median difference between the first- and second-half parameter estimates across participants = 0, uncorrected for multiple comparisons.