Introduction

Sensation and action are traditionally thought to involve separate brain circuits serving distinct functions: activity in early sensory areas is driven nearly exclusively by the corresponding sensory input, whereas activity in motor areas is related exclusively to movement. Recent work in mice, a major mammalian model system in neuroscience, has called for a reevaluation of this distinction, given demonstrations that activity in mouse primary visual cortex (V1) depends as much on whether the mouse is running or stationary as on which visual stimulus is shown [1]. Neurons in V1 of virtually all mammals, mice included [4], are selective for simple image features, a presumably critical early step of image processing that continues throughout a hierarchy of visual brain areas [2, 3]. The observation that running modulates V1 activity with a magnitude comparable to visually driven activity has motivated substantial effort in the field to understand the biological mechanisms and functional consequences of this powerful interaction between sensation and action [5–16].

However, these observations have all been made in rodents; similar measurements have not been made in primates. Although rodents certainly rely on vision for important behaviors [17, 18], primates are more fundamentally visual organisms, with exquisite acuity and specialized functional characteristics such as foveas and corresponding high-resolution representations of the central visual field in V1 [19]. And while experiments that allow subjects to run while viewing visual stimuli may now be commonplace in mice, analogous experiments in nonhuman primates have not appeared technically possible. It has thus remained unclear whether the large effect of running on early visual processing is a general property of mammalian brains revealed by work in mice, or whether the early stages of primate visual processing are less affected by nonvisual factors. Here, we fill this major gap in cross-species understanding by taking advantage of the relatively small size and peaceable nature of the common marmoset (Callithrix jacchus), which allowed us to place animals on a custom-designed treadmill and to record with high-channel-count electrode arrays, including Neuropixels probes.

Results

We tested for running-related modulations in V1 of the common marmoset, a highly visual New World primate. Marmosets were head-fixed, placed on a wheel-based treadmill suited to their arboreal nature (Fig. 1a), and alternated between running and not running while we presented various visual stimuli designed to assess the properties and responsiveness of V1 neurons (Fig. 1b). We recorded from foveal and parafoveal neurons in 2 marmosets (using chronically implanted N-Form 3D electrode arrays), and in one marmoset we were also able to record simultaneously from both foveal and peripheral V1 using Neuropixels 1.0 probes. To support precise comparison to rodent V1, we applied the same analysis pipeline to a publicly available mouse dataset that used matching stimuli in a treadmill paradigm [20]. This let us perform direct quantitative and statistical comparisons of the effects of running on V1 activity in a rodent and a primate.

Recording from marmoset V1 during active locomotion.

a) Apparatus for recording from marmoset V1 while presenting visual stimuli on a high-resolution display, monitoring gaze using an eye tracker, on a toroidal treadmill that allowed the marmoset to run or not run. b) Schematic example of variables of interest. Visual stimuli were presented (top row). Rasters show activity from a V1 array (second row). Gaze was monitored (third row, x and y timeseries plotted in black and grey), saccades were detected (red), and pupil size was also measured (fourth row). Running speed was measured using a rotary encoder attached to the treadmill (fifth row). c) Before the main experiments, receptive fields were mapped using sparse noise [21]. The array of pseudocolor images shows 3 examples of V1 receptive fields (2 foveal and 1 peripheral neuron). d) The main experiment involved presenting full-field sinusoidal gratings that drifted in one of 12 directions (top row), at a variety of spatial frequencies (vertical axis at left). Rasters show example V1 activity during stimulus presentations when running (red) or stationary (black). e) Summary of receptive field (RF) locations in the mouse dataset (orange, top), and f) our data from marmosets (blue and green, bottom). In both marmosets, we recorded from a portion of V1 accessible at the dorsal surface of the brain using chronically implanted arrays, which yielded neurons with foveal receptive fields (green RFs). We also recorded from one marmoset using Neuropixels probes, allowing us to simultaneously access both peripheral and foveal V1 (blue RFs; peripheral units are analyzed separately, see text). g) Examples of mouse V1 orientation tuning curves, for cells with weak, moderate, and strong orientation tuning. h) Same, for marmoset V1. i,j) Histograms of orientation-selectivity indices (OSIs) for mice (i) and marmosets (j). Marmoset OSIs are likely lower than previously reported because we used full-field stimuli not optimized to the spatial frequency tuning of each neuron.
Regardless, the marmoset V1 neurons had strong visual responses and qualitatively conventional tuning. k,l) Running speeds in mice (k) and marmosets (l). Marmosets were acclimated to the treadmill and motivated to run with fluid rewards yoked to traveling a criterion distance.

First, we mapped the receptive fields of marmoset V1 neurons using reverse-correlation techniques adapted to free viewing [21] while measuring gaze with a video-based eye tracker (Fig. 1c). In V1 of both marmosets, we found receptive fields within the central few degrees of vision, with sizes expected at those eccentricities (1-5 deg, Fig. 1f, blue and green; these can be compared to those in mouse, Fig. 1e). As expected for primary visual cortex, marmoset V1 units (both well-isolated single units and well-tuned multi-unit clusters) responded robustly to oriented gratings and exhibited orientation- (and sometimes direction-) selectivity [22, 23], similar to that in the mouse V1 dataset (Fig. 1g,h). Orientation tuning spanned a range from weak to strong, with many units exhibiting strong and conventional tuning curves (Fig. 1i,j).

As a first test for effects of running on V1 activity, we assessed whether running speed was correlated with aggregate V1 activity by comparing the time series of these variables throughout each session. In the mouse, such modulations are readily apparent when inspecting the time series of neural activity and running: when the mouse runs, V1 spiking often increases substantially. Fig. 2a,b shows example mouse sessions with the maximal and median correlation between the time series of running speed and a generic low-dimensional representation of the population activity (the first principal component [PC] of the simultaneously recorded V1 trial spike counts). This correlation was evident whether running and stationary periods alternated on slow (Fig. 2a) or fast (Fig. 2b) time scales.

Mice and marmosets exhibit different correlations between V1 activity and running speed.

a) Mice show visually compelling correlations between V1 trial spike counts and running speed. Example session with the highest correlation between running and V1 activity. Raster at top shows spiking activity of all mouse V1 neurons recorded. Population activity is summarized below the raster as the first principal component of the V1 array activity (“First Neural PC”, orange trace); running speed is plotted underneath it on the same time axis (grey trace). The two curves are clearly highly similar. b) Same, for an example mouse session chosen to have the median correlation between running and V1 activity. In this example, running speed and neural activity rise and fall together on a faster time scale than in a). c,d) Marmosets show smaller, and typically negative, correlations between V1 spiking activity and running speed. Format same as the mouse data in (a,b), with example sessions chosen to show the maximal and the median correlations between V1 activity and running speed. The (anticorrelated) similarity between the V1 activity (First Neural PC) and Running Speed curves is harder to discern in the marmoset. e,f) Correlations between V1 activity and running in the mouse (e) had a median > 0 (median=0.407, p=9.04 × 10−5, stat=308, n=25, Mann-Whitney U Test). Many individual sessions had significant correlations with running (filled bars), and all such significant sessions had positive correlations (significance determined via permutation to remove effects of autocorrelation [24]). In the marmosets (f), the distribution of correlations was slightly but reliably negative (median=-0.033, p=0.034, stat=101, n=27, Mann-Whitney U Test), and all significantly modulated individual sessions (5/27) exhibited negative correlations.

Visual inspection of marmoset V1 neurons representing the central visual field gives a starkly different impression. The relationship between their activity and running was considerably weaker and less compelling, even in the sessions with the maximal and median relationships (Fig. 2c,d). In these sessions, V1 activity did not track running speed as clearly, although activity did tend to increase when the marmoset stopped running, which explains the modest negative correlations. We then quantified these relationships across all experiments on a session-by-session basis, in both species. For mice, this confirmed a strong positive correlation (Fig. 2e; median=0.407, n=25, p=9.04 × 10−5, stat=308, Mann-Whitney U Test). For marmosets, the distribution of correlations between V1 activity and running was subtly but reliably negative (Fig. 2f, median=-0.033, p=0.034, stat=101, n=27, Mann-Whitney U Test). The correlation with running differed significantly between the two species (p=6.93 × 10−7, stat=934, Mann-Whitney U Test). This session-level analysis confirmed that running modulations in mice are large and mostly reflect increases in response. By contrast, running modulations in marmoset foveal V1 are slightly suppressive.

To perform additional quantitative tests at the level of individual V1 units, we divided the spike-rate responses to drifting gratings according to whether or not the animal was running (Fig. 3). In mouse, this analysis confirmed a tendency for large response increases during running, both to the preferred stimulus (Fig. 3a, geometric mean ratio [running/stationary] = 1.523, 95% CI [1.469, 1.579], n=743 tuned units) and to all visual stimuli (Fig. 3b, 1.402 [1.365, 1.440], n=1168). Many individual units had significant running modulations, which were more often increases than decreases (803/1168 [69%] increased firing rate and 115/1168 [10%] decreased, bootstrapped t-test). In marmoset V1, there was again a modest decrease in the response to the preferred stimulus (Fig. 3c; geometric mean ratio [running/stationary] = 0.899, 95% CI [0.851, 0.949], n=228 tuned units). This suppression was less evident in responses aggregated across all stimuli (Fig. 3d, 1.011 [0.995, 1.027], n=786). The number of significantly modulated units was relatively small and more evenly balanced between decreases and increases in firing rate (172/786 [22%] increased and 161/786 [20%] decreased, bootstrapped t-test). We performed these quantitative comparisons on subsets of the data for which the stimuli were nearly identical across species, and we used the same analysis code to calculate response metrics. These analyses therefore solidly confirm a substantial difference in the form of running modulations of V1 activity in mouse versus marmoset. The log ratio of running:stationary responses differed significantly between mouse and marmoset both for all units (p=6.62 × 10−99, stat=1399874, Mann-Whitney U Test) and for tuned units (p=4.69 × 10−57, stat=4030135). Thus, the aggregate impacts of running on V1 responses were larger in mice and of opposite sign in marmosets.
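The geometric mean ratio summarizes running/stationary modulation on a log scale, so a halving and a doubling count as equal and opposite. The sketch below makes several assumptions:

- Inputs are per-unit mean rates in each condition.
- The 95% CI comes from a percentile bootstrap over units. The resampling scheme actually used is not specified above.
- Units with a zero rate in either condition are dropped, because their log ratio is undefined.

The function name is hypothetical.

```python
import math
import random
import statistics


def geometric_mean_ratio(running, stationary, n_boot=2000, alpha=0.05, seed=0):
    """Geometric mean of per-unit running/stationary rate ratios with a percentile bootstrap CI."""
    # Work in log space: exp(mean(log ratio)) is the geometric mean of the ratios.
    logs = [math.log(r / s) for r, s in zip(running, stationary) if r > 0 and s > 0]
    point = math.exp(statistics.fmean(logs))
    rng = random.Random(seed)
    # Resample units with replacement and recompute the geometric mean each time.
    boot = sorted(
        math.exp(statistics.fmean(rng.choices(logs, k=len(logs)))) for _ in range(n_boot)
    )
    lo = boot[round(alpha / 2 * (n_boot - 1))]
    hi = boot[round((1 - alpha / 2) * (n_boot - 1))]
    return point, (lo, hi)
```

Averaging log ratios rather than raw ratios keeps a few strongly suppressed or strongly enhanced units from dominating the summary in one direction.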

Running strongly increases mouse V1 activity and subtly decreases marmoset V1 activity, evidenced at the level of individual units.

Mouse data points are plotted in orange and marmoset data in blue. a) Scatterplot (log-log) shows firing rate to preferred stimulus for tuned units (OSI > 0.2), during running (y-axis) and stationary (x-axis). Histogram summarizes the projections onto the line of unity, and shows a clear shift indicating increases in response during running (geometric mean ratio [running/stationary] = 1.523 [1.469, 1.579], n=743). Dark-shaded symbols indicate individually significant units. Dashed lines indicate doubling (2×) and halving (0.5×) of response. b) Same format, but now showing the response aggregated over all stimuli, for all units (geometric mean ratio [running/stationary] = 1.402 [1.365, 1.440], n=1168). A similar pattern reflecting primarily large increases is evident. c,d) V1 units in marmoset show a very different pattern. Responses of tuned units to preferred stimuli (c) cluster more closely to the line of unity, with a small but significant shift indicating a subtle decrease in response (geometric mean ratio [running/stationary] = 0.899 [0.851, 0.949], n=228). Responses to all stimuli for all units (d) show even less running-related modulation (geometric mean ratio [running/stationary] = 1.011 [0.995, 1.027], n=786).

Given these apparently categorical differences between the two species at the levels of both experimental sessions and individual units, a key question is whether mouse and marmoset visual cortices are modulated by nonvisual input in fundamentally different ways. To answer this, we employed more powerful model-based neuronal population analyses that inferred trial-to-trial variations in shared gain modulations across V1 (Fig. 4a,d) [25], in a manner entirely agnostic to running (or any other aspect of behavior). For all sessions, this model improved descriptions of the population data over simpler models that took into account only the stimulus and slow drifts in baseline firing rate (Fig. 4b,c; marmoset p=1.52 × 10−82, stat=27174, n=754, Wilcoxon signed rank test; mouse p=4.64 × 10−181, stat=25966, n=1257). This was true in both species, bolstering the emerging notion that population-level gain modulations are a general principle of mammalian V1 function [25–29]. The shared gain term varied more strongly in mice than in marmosets (Fig. 4e, std. dev. in mouse = 2.170 [2.106, 2.245], marmoset = 1.188 [1.072, 1.274], p<1 × 10−9, stat=1013202, Mann-Whitney U Test). Furthermore, in the mouse, shared gain was higher during running than during stationary stimulus presentations (mean difference 0.970 [0.761, 1.225], p=4.73 × 10−9, stat=8.017, t test). This demonstrates that a substantial portion of the modulations of mouse V1 can be explained by a shared gain term that increases with running (Fig. 4f, orange point). In marmoset, shared gain was slightly but reliably lower during running (mean difference = -0.125 [-0.203, -0.059], p=0.002, stat=-3.360, t test; Fig. 4f, blue point), a quantitatively very different relation to running than in mouse (p=8.77 × 10−9, stat=6.615, 2 sample t test).
Thus, a common mechanism (shared gain) explains running modulations in both species, but with quantitatively different relationships to behavior and hence distinct downstream impacts on perception and action.
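The running comparison for the shared gain reduces to two steps: a per-session difference of means, then a test across sessions. The sketch assumes the inferred gain and a per-trial running label are already available for each session. Both function names are hypothetical.

```python
import math
import statistics


def session_gain_difference(gain, running):
    """Mean shared gain on running trials minus mean gain on stationary trials (one session)."""
    run = [g for g, r in zip(gain, running) if r]
    still = [g for g, r in zip(gain, running) if not r]
    return statistics.fmean(run) - statistics.fmean(still)


def one_sample_t(values, mu=0.0):
    """t statistic and degrees of freedom for the null hypothesis mean(values) == mu."""
    n = len(values)
    t = (statistics.fmean(values) - mu) / (statistics.stdev(values) / math.sqrt(n))
    return t, n - 1
```

Applying `session_gain_difference` to each session and passing the resulting list to `one_sample_t` gives the across-session statistic. With SciPy installed, `scipy.stats.ttest_1samp` returns the same t statistic together with a p-value.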

Shared gain model accounts for fluctuations in both mouse and marmoset V1, and explains species differences.

a) Structure of the shared modulator model. In addition to the effects of the stimulus (and slow drift in responsiveness, not rendered), the model allows for a shared gain/multiplicative term (green). Each simultaneously recorded neuron is fitted with a weight to the latent gain term. b) The resulting model provides a better account of both mouse and marmoset V1 responses than a simpler model that fits only stimulus and slow drift terms. Points show variance explained (r²) on test data for each session under each of the two models, plotted against one another. c) Variance explained for individual units was significantly improved in both species (marmoset: gain model [median r²=0.2504] significantly higher than stim+drift [median r²=0.1220], p=1.52 × 10−82, stat=27174, Wilcoxon signed rank test; mouse: gain model [median r²=0.4420] significantly better than stim+drift [median r²=0.1697], p=4.64 × 10−181, stat=25966, Wilcoxon signed rank test). d) Example of the relationship between neural responses (top raster, blue), the shared gain (green), and running speed (black trace). Visual inspection similar to that in Figure 2 can be performed. e) Gain modulations span a larger range in mice than in marmosets. Orange, gain term from each mouse session; blue, gain term from each marmoset session. Triangles indicate medians (mouse = 2.17 [2.11, 2.25], marmoset = 1.19 [1.07, 1.27]). f) The shared gain term is larger during running for mouse data, but slightly smaller during running for marmoset data (difference is plotted on y-axis; mouse = 0.970 [0.761, 1.225], p=4.73 × 10−9, stat=8.017, 1 sample t test; marmoset = -0.125 [-0.203, -0.059], p=0.002, stat=-3.360, 1 sample t test).

Although our marmoset dataset focused on V1 neurons representing the central portion of the visual field, we were also able to record simultaneously from neurons with peripheral and foveal receptive fields. We did this by advancing a Neuropixels probe through both the superficial portion of V1 (foveal) and the calcarine sulcus (peripheral). This yielded simultaneous recordings of 110 and 147 stimulus-driven units representing the central and peripheral portions of the visual field, respectively. Analyzing neurons with peripheral receptive fields separately revealed a difference in running modulations between these retinotopically distinct portions of V1. Peripheral neurons had slightly higher stimulus-driven responses during running (aggregating over all stimuli, geometric mean ratio [running/stationary] = 1.129 [1.068, 1.194], n=147; difference was significant, p=2.1 × 10−3, stat=12376, Mann-Whitney U Test). In addition, the two sessions in which we were able to perform these measurements had more positive correlations than any session in our foveal V1 dataset (assessed by correlating running speed either with the First Neural PC or with the shared gain term). Whereas the foveal representation in V1, which is relatively specific to primates, is slightly suppressed by running, running modulations in the peripheral representation appear to differ quantitatively. Regardless, these measurements show that correlations with running in primate peripheral V1 are still small relative to those in mouse V1 (median spike rate modulation by running significantly different between mouse and marmoset calcarine recordings: p=7.639 × 10−11, stat=7967825, Mann-Whitney U Test).

Discussion

In short, running does not affect V1 activity in marmosets as it does in mice. The large, typically positive correlations between running and V1 activity often found in mice are simply not evident in marmosets. Although we matched our experimental protocol to mouse experiments and used the same metrics and analysis pipeline, the species difference was close to categorical. We hypothesize that this difference is at the level of taxonomic order, distinguishing how much behavioral state interacts with early stages of visual processing in primates versus rodents.

However, our population-level analyses did reveal a deeper cross-species generalization. The same shared-gain model improved accounts of both mouse and marmoset V1 activity. These population-level gain modulations likely reflect modulatory inputs associated with behavioral state and arousal. This commonality points to mechanistic insights into how V1 activity is modulated. The primate-rodent difference in the magnitude and sign of V1 gain modulations we observed is in fact consistent with known differences in arousal-related neuromodulatory inputs to rodent and primate V1 [30, 31]. In primates, the locations of ACh receptors allow cholinergic inputs to increase the activity of the majority of GABAergic neurons and hence suppress net activity via inhibition [32, 33]. In rodents, by contrast, pharmacologically and anatomically distinct cholinergic influences likely exert more complex effects on net activity, including disinhibition, which can increase net activity [14, 16, 34]. Our population-level analyses also lay groundwork for connections to indirect and aggregate measures of neural activity made in humans under related conditions [35–37]. They also connect to the typically small modulations in primate visual cortex elicited by carefully controlled attentional tasks, which are clearer when population-level modulations are considered [38–40].

In mice, the large effects on V1 activity are likely to affect all subsequent stages of processing [8], but in marmosets, the small effects are less likely to have pronounced downstream effects. That said, running may directly and more strongly interact with later stages of visual processing in primates. This would be consistent with differences in where canonical computations occur across species with different numbers of visual areas [3, 41, 42], and our V1 results pave the way for similar investigations in the large number of primate extrastriate visual areas.

Although our analysis of aggregate V1 activity demonstrates a substantial difference in function across species, finer-grained parcellation of multiple effects correlated with running [9] and circuit- and cell-type-specific inquiries [1, 10, 14, 15] may reveal more nuanced effects in primate V1. For example, our findings in peripheral V1 now strongly motivate further consideration of the differences in feedback circuits across the visual field representation [43]. Finally, it is important to recognize that larger effects of behavioral state may still be found in primate V1: other behaviors that more directly recruit active vision may reveal stronger modulations. In mice, running may have a more direct functional relation to visual processing, and/or may be a more direct readout of large changes in arousal. Such possibilities can now be explored with elaborations of the experimental protocol and techniques described here, and the ensuing results will enrich the cross-species connections we have drawn at the level of primate V1. The population-level commonality of shared gain modulations we identified may support further cross-species generalizations that transcend simpler observations of empirical similarity or dissimilarity [44, 45].

Materials and methods

We performed electrophysiological recordings in V1 of two common marmosets (1 male, “marmoset G”, and 1 female, “marmoset B”, both aged 2 years). Both subjects had chronically implanted N-Form arrays (Modular Bionics, Berkeley, CA) inserted into left V1. Implantations were performed with standard surgical procedures for chronically implanted arrays in primates. Additional recordings were performed using Neuropixels 1.0 probes [46] acutely inserted into small craniotomies (procedure described below). All experimental protocols were approved by The University of Texas Institutional Animal Care and Use Committee and were in accordance with National Institutes of Health standards for the care and use of laboratory animals.

Subjects stood quadrupedally on a 12” diameter wheel while head-fixed facing a 24” LCD (BenQ) monitor (resolution = 1920×1080 pixels, refresh rate = 120 Hz) corrected to have a linear gamma function, at a distance of 36 cm (pixels per degree = 26.03) in a dark room. Eye position was recorded via an Eyelink 1000 eye tracker (SR Research) sampling at 1 kHz. A syringe pump-operated reward line was used to deliver liquid reward to the subject. Timing events were generated using a Datapixx I/O box (VPixx) for precise temporal registration. All of these systems were integrated in and controlled by MarmoView. Stimuli were generated using MarmoView, custom code based on the PLDAPS [47] system using Psychophysics Toolbox in MATLAB (Mathworks). For the electrophysiology data gathered from the N-Form arrays, neural responses were recorded using two Intan C3324 headstages attached to the array connectors which sent output to an Open Ephys acquisition board and GUI on a dedicated computer. In electrophysiology data gathered using Neuropixels probes, data was sent through Neuropixels headstages to a Neuropixels PXIe acquisition card within a PXIe chassis (National Instruments). The PXIe chassis sent outputs to a dedicated computer running Open Ephys with an Open Ephys acquisition board additionally attached to record timing events sent from the Datapixx I/O box. Spike sorting on data acquired using N-Form arrays was performed using in-house code to track and merge data from identified single units across multiple recording sessions [49]. Spike sorting for data acquired using Neuropixels probes was performed using Kilosort 2.5.

Chronic N-Form array recordings

Chronic array recordings were performed using 64-channel chronically implanted 3D N-Form arrays consisting of 16 shanks arrayed in a 4×4 grid with shanks evenly spaced 0.4 mm apart (Modular Bionics, Berkeley, CA, USA). Iridium oxide electrodes are located at 1, 1.125, 1.25, and 1.5 mm (tip) along each shank, forming a 4×4×4 grid of electrodes. Arrays were chronically inserted into the left dorsal V1 of marmosets G and B at 1.5 and 4 degrees eccentricity in the visual field, respectively (confirmed via post-hoc spatial RF mapping). Well-isolated single units remained detectable on the arrays more than 6 months after the initial implantation procedure.
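The array geometry can be written out as site coordinates, which is convenient when relating units to lateral position or position along the shank. The sketch uses the shank spacing and site positions given above.

The channel ordering (row-major over shanks, then along each shank) is hypothetical. The actual mapping from sites to headstage channels comes from the manufacturer's channel map.

```python
from itertools import product

SHANK_PITCH_MM = 0.4                         # shanks evenly spaced 0.4 mm apart (4x4 grid)
SITE_POSITIONS_MM = (1.0, 1.125, 1.25, 1.5)  # site positions along each shank; 1.5 mm is the tip


def nform_site_coordinates():
    """(x, y, position along shank) in mm for all 64 sites; the ordering is hypothetical."""
    return [
        (col * SHANK_PITCH_MM, row * SHANK_PITCH_MM, pos)
        for row, col, pos in product(range(4), range(4), SITE_POSITIONS_MM)
    ]
```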

Acute Neuropixels recordings

Acute Neuropixels recordings were performed using standard Neuropixels 1.0 probes (IMEC, Leuven, Belgium). Each probe has 384 recording channels that can be individually configured to record signals from 960 selectable sites along a 10 mm long straight shank with a 70 × 24 µm cross-section. For a single session of experiments, probes were lowered into right dorsal V1 of marmoset G via one of 3 burr holes spaced irregularly along the AP axis, 4-5 mm from the midline. Natural images were shown to provide visual stimulation and to keep the subject occupied and awake during insertion and probe settling. The temporary seal on the burr hole was removed, the intact dura was nicked with a thin needle, and the burr hole was filled with saline. The probe was then lowered through the dural slit at 500 µm/minute, allowing 5 minutes for settling every 1000 µm of total insertion. During insertion, the whole-probe LFP visualization was monitored for the band of increased LFP amplitude that is characteristic of cortical tissue. The probe was inserted until this band was visible on the electrodes nearest the tip, indicating that the probe tip had passed through the dorsal cortex and was within the white matter. The probe was then advanced until a second band became visible on the electrodes nearest the tip, indicating that the tip had exited through the cortex of the calcarine sulcus. The probe was then advanced slightly further until the entire second LFP band was visible. This ensured that the electrodes covered the full depth of the calcarine cortex and that the tip was confidently located within the CSF of the sulcus. The probe was then allowed to settle for 10 minutes. Active electrode sites were configured to span both dorsal and calcarine cortex simultaneously. Post-hoc receptive field reconstruction confirmed that visually driven, tuned V1 neurons were recorded at both foveal and peripheral eccentricities.
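The insertion schedule above determines how long a penetration takes. The sketch below assumes a 5-minute pause for each complete 1000 µm advanced, followed by the 10-minute final settling period. The function name and default parameters are illustrative.

```python
def insertion_minutes(depth_um, speed_um_per_min=500, pause_every_um=1000,
                      pause_min=5, final_settle_min=10):
    """Total time (minutes) to lower the probe to depth_um and let it settle.

    Assumes one pause per complete pause_every_um advanced, plus a final settle.
    """
    travel = depth_um / speed_um_per_min
    pauses = (depth_um // pause_every_um) * pause_min
    return travel + pauses + final_settle_min
```

Under these assumptions, a hypothetical 3000 µm penetration takes 6 minutes of travel, 15 minutes of pauses, and 10 minutes of final settling, for 31 minutes in total.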

Mouse dataset from Allen Institute

Mouse data were downloaded from the publicly available Visual Coding database at https://portal.brain-map.org/explore/circuits/visual-coding-neuropixels. We used the same analysis code to analyze these data and the marmoset data we collected.

General experimental procedure

Marmoset recording sessions began with eye tracking calibration. Once calibration was completed, the wheel was unlocked and the subject was allowed to locomote freely, head-fixed, while free-viewing stimuli. Trials for all stimuli were 20 sec long with a 500 ms ITI, and a 20 sec natural image trial was interleaved every fifth trial to keep the subject engaged. Stimuli were shown in 10-minute blocks. A typical recording session consisted of 50 trials of calibration, followed by 1 or 2 blocks of a drifting grating stimulus and 1 block each of the two mapping stimuli. To elicit sufficiently reliable and frequent running behavior, subjects were rewarded at set locomotion distance intervals unrelated to the stimulus or gaze behavior. Typical rewards were 50-70 µL, and the distance required to earn a reward usually varied between 20 and 75 cm; reward amounts and intervals were adjusted daily to maximally motivate the subject.
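The distance-yoked reward scheme can be sketched as a counter on the treadmill's rotary encoder. The 12-inch wheel diameter comes from the apparatus description above. The remaining choices are assumptions:

- The encoder resolution (1024 counts per revolution) is hypothetical.
- Travel is measured at the wheel's circumference.
- Only forward rotation counts toward the criterion.
- Leftover distance carries over to the next interval.

The class name is hypothetical.

```python
import math

WHEEL_DIAMETER_CM = 12 * 2.54   # 12-inch treadmill wheel
COUNTS_PER_REV = 1024           # hypothetical rotary-encoder resolution


class DistanceReward:
    """Signal a reward each time cumulative forward travel crosses a criterion distance.

    Backward rotation is ignored and leftover distance carries over (both assumptions).
    """

    def __init__(self, criterion_cm, counts_per_rev=COUNTS_PER_REV):
        self.criterion_cm = criterion_cm
        self.cm_per_count = math.pi * WHEEL_DIAMETER_CM / counts_per_rev
        self.accumulated_cm = 0.0

    def update(self, delta_counts):
        """Add a signed encoder count change; return how many rewards were earned."""
        if delta_counts > 0:
            self.accumulated_cm += delta_counts * self.cm_per_count
        n_rewards = int(self.accumulated_cm // self.criterion_cm)
        self.accumulated_cm -= n_rewards * self.criterion_cm
        return n_rewards
```

One full revolution of a 12-inch wheel covers about 95.8 cm. At a 50 cm criterion, one revolution therefore earns one reward and leaves about 45.8 cm toward the next.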

Eye tracking calibration

While the wheel was locked, subjects were allowed to free-view a sequence of patterns of marmoset faces. Marmosets naturally direct their gaze toward the faces of other marmosets when allowed to free-view with little to no training, which allowed the experimenter to adjust the calibration offset and gain manually between pattern presentations. Faces were 1.5 degrees in diameter and were presented for 3 sec with a 2 sec ISI between patterns. Some of the presented patterns were asymmetric across both the X and Y axes of the screen, to disambiguate possible axis sign flips in the calibration. Fifty trials were presented before each recording session to verify and refine the calibration. Calibration drift between sessions was minimal, requiring only minor (<1 deg) adjustments over 1-2 months of recordings.

Drifting grating stimuli

The primary stimulus consisted of full-field drifting gratings. To drive marmoset V1 effectively, gratings were presented at 3 spatial frequencies (1, 2, and 4 cycles per degree), two drift speeds (1 or 2 degrees per second), and 12 directions (evenly spaced at 30-degree intervals). Each trial consisted of multiple grating presentations, each with a randomized spatial frequency, drift speed, and direction. Gratings were displayed for 833 ms, followed by a randomly jittered inter-stimulus interval of 249-415 ms. After each 20 second trial there was a longer 500 ms inter-trial interval. Every fifth trial was replaced with a natural image to keep subjects engaged and to allow visual assessment of calibration stability on the experimenter's display.
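At the display's 120 Hz refresh rate, the 833 ms presentation corresponds to 100 frames and the 249-415 ms jitter to roughly 30-50 frames, so a trial can be scheduled in whole frames. The sketch below generates one trial's sequence. It assumes spatial frequency, speed, and direction are sampled independently and uniformly, and that the gap is a uniformly distributed whole number of frames; this sampling scheme is an assumption.

Each grating's temporal frequency follows from drift speed times spatial frequency. For example, a 2 deg/s drift at 4 cycles/deg is an 8 Hz modulation.

```python
import random

FRAME_RATE_HZ = 120
SPATIAL_FREQS_CPD = (1, 2, 4)
SPEEDS_DPS = (1, 2)
DIRECTIONS_DEG = tuple(range(0, 360, 30))  # 12 directions, 30 deg apart
STIM_FRAMES = 100                          # 100 frames at 120 Hz ~ 833 ms
ISI_FRAMES = (30, 50)                      # ~250-417 ms jittered gap (assumed uniform)
TRIAL_FRAMES = 20 * FRAME_RATE_HZ          # 20 s trial


def grating_trial(rng):
    """Return one trial's grating presentations as dicts with onset/offset frames."""
    events, frame = [], 0
    while frame + STIM_FRAMES <= TRIAL_FRAMES:
        sf = rng.choice(SPATIAL_FREQS_CPD)
        speed = rng.choice(SPEEDS_DPS)
        events.append({
            "sf_cpd": sf,
            "speed_dps": speed,
            "direction_deg": rng.choice(DIRECTIONS_DEG),
            "temporal_freq_hz": sf * speed,  # cycles/deg * deg/s = cycles/s
            "onset_frame": frame,
            "offset_frame": frame + STIM_FRAMES,
        })
        frame += STIM_FRAMES + rng.randint(*ISI_FRAMES)
    return events
```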

Mapping of receptive fields

A spatiotemporal receptive field mapping stimulus consisting of sparse dot noise was shown during each recording session. One hundred 1-degree white and black dots were presented at 50% contrast at random locations on the screen. Dots had a lifetime of 2 frames (16.67 ms at 120 Hz). Marmosets freely viewed the stimulus, and we corrected for eye position offline to estimate the spatial receptive fields using forward correlation [21].
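Because marmosets free-viewed the noise, each dot must be re-expressed relative to the current gaze position before it is correlated with spikes. The simplified sketch below makes the following assumptions:

- Per-frame dot lists and gaze samples are already aligned to display frames.
- A single stimulus-to-spike lag is used.
- There is no normalization by how often each bin was stimulated. A full analysis would use several lags and normalize.

The function name, lag, and bin size are hypothetical.

```python
from collections import defaultdict


def gaze_corrected_rf(frames, gaze, spikes, lag=1, bin_deg=0.5):
    """Forward-correlation map of dot positions in gaze-centered coordinates.

    frames: per-frame list of (x_deg, y_deg, polarity) dots in screen coordinates,
            with polarity +1 for white and -1 for black dots
    gaze:   per-frame (x_deg, y_deg) eye position
    spikes: per-frame spike counts for one unit
    lag:    frames between a stimulus and the spikes it is assumed to evoke (hypothetical)
    Returns {(ix, iy): weight} with bins indexed in units of bin_deg.
    """
    rf = defaultdict(float)
    for t in range(len(frames) - lag):
        n = spikes[t + lag]
        if n == 0:
            continue
        gx, gy = gaze[t]
        for x, y, polarity in frames[t]:
            # Re-express the dot relative to where the eye was pointing, then bin it.
            key = (round((x - gx) / bin_deg), round((y - gy) / bin_deg))
            rf[key] += polarity * n
    return dict(rf)
```

Bins with large positive weights mark ON-responsive retinotopic locations, and bins with large negative weights mark OFF-responsive ones.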

Necessary differences between mouse and marmoset experiments

Although we sought to make the marmoset experiments as similar as possible to the mouse experiments, differences in the two species' visual systems and behavior required some changes. Because the spatial frequency tuning of marmoset and mouse V1 neurons differs starkly, we used stimuli with considerably higher spatial frequencies than in the mouse experiments. Relatedly, marmoset V1 receptive fields are much smaller than those in mouse. Because we used full-field stimuli (to match the mouse experiments), responses in marmoset V1 were likely affected by substantial surround suppression, which would reduce overall responses. We also learned that, although the marmosets were comfortable perched on the wheel treadmill, they did not naturally run enough for our experimental purposes. We therefore incorporated a reward scheme to motivate the subjects to run more frequently. Finally, the mouse dataset we analyzed comprised a large number of mice with a small number of sessions per mouse. As is standard in work with nonhuman primates, we were limited to a smaller number of subjects (N=2) and ran many experimental sessions with each animal.

Session and cell inclusion criteria

For the analyses shown in Figure 2, sessions were included if they contained more than 250 trials and the proportion of running trials was between 10% and 90%. This yielded 25/32 sessions for the mouse dataset and 27/34 sessions for the marmoset dataset. For the unit-wise analyses in Figure 3, super-sessioned units were included if they had more than 300 trials of data and a mean firing rate above 1 spike/s. This yielded 1168/2015 units in mouse and 786/1837 units in marmoset.

For the analyses shown in Figure 4, sessions were included using the same trial-count and running criteria as in Figure 2. Only units that were well fit by the stimulus + slow drift model (i.e., whose cross-validated performance exceeded that of the null model; see 'Shared modulator model') were included, and sessions were excluded if fewer than 10 units met this criterion. This resulted in 31/32 sessions for mouse and 28/34 sessions for marmoset.
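The inclusion rules above can be written as simple predicates. This sketch assumes per-trial running labels are available as a boolean array and, for the Figure 4 rule, that "better than the null" means a lower held-out mean squared error than a null model; the `Session` class and the function names are hypothetical.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class Session:
    running: np.ndarray  # (n_trials,) boolean, True if the animal was running


def session_passes(sess, min_trials=250, run_lo=0.10, run_hi=0.90):
    """Figure 2 rule: more than 250 trials and 10% <= fraction running <= 90%."""
    if sess.running.size <= min_trials:
        return False
    frac = sess.running.mean()
    return run_lo <= frac <= run_hi


def unit_passes(n_trials, mean_rate_hz, min_trials=300, min_rate=1.0):
    """Figure 3 rule for super-sessioned units: > 300 trials and > 1 spike/s."""
    return n_trials > min_trials and mean_rate_hz > min_rate


def session_passes_fig4(sess, test_mse_model, test_mse_null, min_units=10):
    """Figure 4 rule: the Figure 2 rule plus at least 10 units whose held-out
    error is lower than the null model's (assumed meaning of 'better')."""
    good = np.asarray(test_mse_model) < np.asarray(test_mse_null)
    return session_passes(sess) and int(good.sum()) >= min_units
```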

Analysis of tuning

We counted spikes in a window from 50 ms after grating onset to 50 ms after grating offset and divided by the window duration to obtain a firing rate on each trial. To calculate orientation tuning curves, we computed the mean firing rate for each combination of orientation and spatial frequency. Because the number of trials in each condition (i.e., running or stationary) was determined by the animal's behavior, we computed orientation tuning as a weighted average across spatial frequencies, with weights set by the spatial frequency tuning. We used the resulting curves for all analyses of tuning, and confirmed that the results did not change qualitatively if we instead used only the best spatial frequency or marginalized across spatial frequency.
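The rate window and the spatial-frequency-weighted tuning curve can be sketched as follows. The weighting scheme is an assumption: each spatial frequency is weighted by its mean rate (averaged over orientation), clipped at zero and normalized to sum to 1. Every orientation × spatial frequency condition is assumed to have at least one trial, and the function names are hypothetical.

```python
import numpy as np


def presentation_rate(spike_times_s, onset_s, offset_s, lag_s=0.05):
    """Spike rate in a window running from onset + 50 ms to offset + 50 ms."""
    t0, t1 = onset_s + lag_s, offset_s + lag_s
    n = np.count_nonzero((spike_times_s >= t0) & (spike_times_s < t1))
    return n / (t1 - t0)


def orientation_tuning(rates, ori_idx, sf_idx, n_ori, n_sf):
    """Orientation tuning as an SF-weighted average of per-SF tuning curves.

    rates           : (n_trials,) firing rates
    ori_idx, sf_idx : (n_trials,) integer condition labels
    """
    counts = np.zeros((n_ori, n_sf))
    sums = np.zeros((n_ori, n_sf))
    np.add.at(counts, (ori_idx, sf_idx), 1.0)
    np.add.at(sums, (ori_idx, sf_idx), rates)
    if np.any(counts == 0):
        raise ValueError("every orientation x spatial frequency condition needs a trial")
    mean_rate = sums / counts                               # (n_ori, n_sf)
    # Assumed weights: non-negative SF tuning, normalized to sum to 1.
    sf_tuning = np.clip(mean_rate.mean(axis=0), 0.0, None)
    weights = sf_tuning / sf_tuning.sum()                   # needs some positive response
    return mean_rate @ weights                              # (n_ori,)
```

For a unit whose response is separable in orientation and spatial frequency, this weighting returns the orientation profile scaled by a constant, so the shape of the tuning curve is preserved.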

Orientation selectivity index was calculated using the following equation

where θ is the orientation and r is the baseline-subtracted vector of rates across orientations.
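One widely used vector-sum definition consistent with these symbols is $\mathrm{OSI} = \left|\sum_\theta r(\theta)\,e^{2i\theta}\right| / \sum_\theta r(\theta)$. The Python sketch below assumes that form; the index used in this paper may differ, and clipping negative baseline-subtracted rates to zero is an extra assumption that keeps the value between 0 and 1. `vector_osi` is a hypothetical name.

```python
import numpy as np


def vector_osi(theta_deg, r):
    """Vector-sum orientation selectivity index (assumed form).

    OSI = |sum_k r_k exp(2 i theta_k)| / sum_k r_k, where r is the
    baseline-subtracted rate at each orientation theta_k (in degrees).
    Negative rates are clipped to zero (assumption).
    """
    r = np.clip(np.asarray(r, dtype=float), 0.0, None)
    theta = np.deg2rad(np.asarray(theta_deg, dtype=float))
    total = r.sum()
    if total == 0:
        return 0.0
    return float(np.abs(np.sum(r * np.exp(2j * theta))) / total)
```

Doubling the angle maps opposite drift directions onto the same point, so a unit responding equally to 0 and 180 degrees is still fully orientation selective (OSI = 1), while a flat tuning curve gives 0.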

Shared modulator model

To capture shared modulatory signals in an unsupervised manner, we fit our neural populations with a latent variable model [50]. The goal of this model is to summarize population activity with a low-dimensional shared signal that operates as a gain on stimulus processing (e.g., [26, 28]). In its general form, the model decomposes the response of an individual neuron, $r_i$, on trial $t$ into a stimulus response, a gain modulator, and additive offsets:

where $f_i[s(t)]$ is the tuning curve, $g_i(t)$ is a neuron-specific gain on the stimulus response, $h_i(t)$ is an additive noise term for the trial, and $b_i$ is the baseline firing rate. To scale this to a shared population model, we constrained the gain, $g$, to be rank 1, so that it decomposes into a trial-wise vector of gains and a neuron-wise vector of loadings that map the trial latent into a modulatory signal for each neuron. Similar models have been used to describe population responses in V1 in several species [25–28].
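Assembling the terms defined above gives the per-neuron decomposition below, together with the offset-and-rectified rank-1 gain described in the following paragraph, where $w_{g,i}$ is the $i$-th entry of the loading vector. This is a sketch reconstructed from the definitions in the text; the exact equation used in the analysis may differ, for instance in whether $h_i(t)$ is retained in the full model.

```latex
\begin{aligned}
r_i(t) &= g_i(t)\, f_i[s(t)] + h_i(t) + b_i, \\
g_i(t) &= \mathrm{ReLU}\!\left[1 + z_g(t)\, w_{g,i}\right].
\end{aligned}
```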

To capture the stimulus tuning curves, we represented the stimulus on each trial as an $m$-dimensional 'one-hot' vector, $s(t)$, where $m$ is the number of possible conditions (orientation × spatial frequency); on each trial all elements are zero except the one corresponding to the condition shown. Thus, $f[s(t)]$ is a linear projection of the stimulus onto the tuning curves, $A s(t)$, where $A$ is an $n \times m$ matrix of tuning weights. We decomposed the gain for each neuron on each trial into a rank-1 matrix that was offset by one and rectified, $g(t) = \mathrm{ReLU}[1 + z_g(t) w_g]$, where $w_g$ is an $n$-dimensional vector of loadings that maps the 1-dimensional trial latent, $z_g(t)$, to a population-level signal, $z_g(t) w_g$. Because this signal is offset by 1 and rectified, the gain is never negative, and a loading weight of zero yields a gain of 1.0.
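The forward pass of this parameterization can be sketched in a few lines of NumPy. The sketch omits the additive term $h_i(t)$ (an assumption) and uses hypothetical function names; it shows how the one-hot stimulus selects a column of $A$ and how the rank-1 latent scales it.

```python
import numpy as np


def one_hot(cond_idx, m):
    """(T,) integer condition labels -> (T, m) one-hot stimulus matrix s(t)."""
    S = np.zeros((cond_idx.size, m))
    S[np.arange(cond_idx.size), cond_idx] = 1.0
    return S


def predicted_rates(A, S, z_g, w_g, b):
    """Shared-gain model rates, r(t) = ReLU[1 + z_g(t) w_g] * (A s(t)) + b.

    A   : (n, m) tuning weights (one column per orientation x SF condition)
    S   : (T, m) one-hot stimuli
    z_g : (T,) shared 1-D gain latent
    w_g : (n,) gain loadings
    b   : (n,) or (T, n) baseline offsets
    Returns (T, n) rates; the additive term h_i(t) is omitted (assumption).
    """
    stim_drive = S @ A.T                              # (T, n): A s(t) on each trial
    gain = np.maximum(0.0, 1.0 + np.outer(z_g, w_g))  # rank-1 gain, offset by 1, rectified
    return gain * stim_drive + b
```

Setting either the latent or a unit's loading to zero leaves that unit at its stimulus-plus-baseline response, which is why a zero loading corresponds to a gain of 1.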

Thus, the full model describes the population response as

Thus, the parameters of the model are the stimulus tuning parameters, $A$; the trial-wise shared gain latent, $z_g$; the gain loadings, $w_g$; and the offsets, $b$. To capture any unit-specific slow drifts in firing rate, we further parameterized $b$ as a linear combination of 5 b0-splines evenly spaced across the experiment [51]. The baseline firing rate of each neuron, $i$, was therefore a linear combination of 5 'tent' basis functions spaced evenly across the experiment, $b_i(t) = \sum_j b_{ij}\,\phi_j(t)$.
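The tent basis for the slow-drift baseline can be sketched as triangular functions over trial index. This sketch assumes the first and last knots sit on the first and last trials and that drift is indexed by trial rather than by clock time; the function names are hypothetical.

```python
import numpy as np


def tent_basis(n_trials, n_basis=5):
    """Evenly spaced triangular ('tent') basis functions over trial index.

    Returns Phi with shape (n_trials, n_basis). Column j equals 1 at the j-th
    knot and falls linearly to 0 at the neighbouring knots, so the columns sum
    to 1 on every trial. Knots at the first and last trial are an assumption.
    """
    t = np.arange(n_trials, dtype=float)
    knots = np.linspace(0.0, n_trials - 1, n_basis)
    spacing = knots[1] - knots[0]
    return np.clip(1.0 - np.abs(t[:, None] - knots[None, :]) / spacing, 0.0, None)


def baseline_drift(Phi, B):
    """b_i(t) = sum_j B[i, j] * phi_j(t); returns an (n_trials, n_units) array."""
    return Phi @ B.T
```

Because the tents sum to 1 on every trial, equal weights give a flat baseline, and unequal weights interpolate linearly between knot values.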

We first fit a baseline model with only stimulus and baseline parameters

Following Whiteway and Butts (2017), we initialized $A$ and $b$ using fits from the model without latent variables, and initialized the latent variables using an autoencoder [52, 53]. We then fit the gain, loadings, and stimulus parameters by iterative optimization with L-BFGS, minimizing the mean squared error (MSE) between the observed spikes and the model rates. The model parameters were regularized with a modest L2 penalty whose strength was set by cross-validation on the training set. The latent variables were penalized with a small squared-derivative penalty to impose some smoothness across trials; this penalty was kept small and fixed to the same value across all sessions. If the MSE on a validation set did not improve during fitting, we reverted the model to the autoencoder initialization.
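The fitting procedure can be sketched with NumPy and SciPy's L-BFGS-B (`scipy.optimize.minimize`). Several pieces are assumptions or substitutions rather than the authors' code. The baseline is held fixed at the stimulus + slow-drift fit because the text lists only the gain, loadings, and stimulus parameters as refit. The first principal component of the baseline-model residuals stands in for the autoencoder initialization. The L2 and smoothness strengths are fixed rather than cross-validated, and validation error is used only for the revert step, not for early stopping. All function names are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize


def fit_baseline(Y, S, Phi, train):
    """Stimulus + slow-drift model, fit per unit by least squares on training entries."""
    X = np.hstack([S, Phi])                                   # (T, m + k) design matrix
    coefs = np.stack([np.linalg.lstsq(X[train[:, i]], Y[train[:, i], i], rcond=None)[0]
                      for i in range(Y.shape[1])], axis=1)     # (m + k, n)
    m = S.shape[1]
    return coefs[:m].T, Phi @ coefs[m:]                       # A (n, m), baseline (T, n)


def unpack(x, n, m):
    """Split the flat parameter vector into A (n, m), loadings w (n,), latent z (T,)."""
    return x[: n * m].reshape(n, m), x[n * m : n * m + n], x[n * m + n :]


def predict(A, w, z, S, base):
    return np.maximum(1.0 + np.outer(z, w), 0.0) * (S @ A.T) + base


def loss_and_grad(x, Y, S, base, train, lam_l2, lam_smooth):
    """Masked MSE + L2 on (A, w) + squared-derivative penalty on z, and its gradient."""
    n, m = Y.shape[1], S.shape[1]
    A, w, z = unpack(x, n, m)
    D = S @ A.T                                   # stimulus drive, (T, n)
    pre = 1.0 + np.outer(z, w)
    G = np.maximum(pre, 0.0)                      # rectified rank-1 gain
    E = (G * D + base - Y) * train                # residuals on training entries only
    N = train.sum()
    dz = np.diff(z)
    loss = (E**2).sum() / N + lam_l2 * ((A**2).sum() + (w**2).sum()) + lam_smooth * (dz**2).sum()
    Q = 2.0 * E / N                               # d loss / d prediction
    U = Q * D * (pre > 0)                         # back-propagate through the ReLU gain
    gA = (Q * G).T @ S + 2 * lam_l2 * A
    gw = U.T @ z + 2 * lam_l2 * w
    gz = U @ w
    gz[:-1] -= 2 * lam_smooth * dz
    gz[1:] += 2 * lam_smooth * dz
    return loss, np.concatenate([gA.ravel(), gw, gz])


def masked_mse(pred, Y, mask):
    return float(((pred - Y) ** 2)[mask].mean())


def fit_shared_gain(Y, S, Phi, train, val, lam_l2=1e-5, lam_smooth=1e-2):
    """Fit A, w, z with L-BFGS; the baseline stays at the baseline-model fit."""
    n, m = Y.shape[1], S.shape[1]
    A0, base = fit_baseline(Y, S, Phi, train)
    # Stand-in for the autoencoder initialization (assumption): the first principal
    # component of the baseline-model residuals on training entries.
    R = np.where(train, Y - (S @ A0.T + base), 0.0)
    z0 = np.linalg.svd(R, full_matrices=False)[0][:, 0]
    x0 = np.concatenate([A0.ravel(), np.zeros(n), z0 / z0.std()])  # w = 0: gain of 1
    res = minimize(loss_and_grad, x0, jac=True, method="L-BFGS-B",
                   args=(Y, S, base, train.astype(float), lam_l2, lam_smooth))
    val_init = masked_mse(predict(*unpack(x0, n, m), S, base), Y, val)
    val_fit = masked_mse(predict(*unpack(res.x, n, m), S, base), Y, val)
    if val_fit >= val_init:                       # revert if validation error did not improve
        return unpack(x0, n, m), base, val_init, val_init
    return unpack(res.x, n, m), base, val_init, val_fit
```

Starting the loadings at zero makes the initial model identical to the baseline model, so the optimizer can only add shared gain where it reduces the training error.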

We cross-validated the model using a speckled holdout pattern [54], in which each neuron's response on each trial was withheld with probability p = 0.25. We further divided the withheld data into a validation set and a test set by randomly assigning each withheld unit-trial entry to either set with probability 0.5. The validation loss was used to stop the iterative optimization, and the test set was used to evaluate the models.
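The speckled holdout can be sketched as three disjoint boolean masks over the trial × unit response matrix. The probabilities follow the text; the function name and the use of NumPy's `default_rng` are assumptions for illustration.

```python
import numpy as np


def speckled_masks(n_trials, n_units, p_holdout=0.25, p_val=0.5, seed=None):
    """Boolean (trial x unit) masks for a speckled holdout.

    Each (trial, unit) entry is withheld with probability p_holdout; withheld
    entries are then split into validation and test sets with probability p_val.
    Returns (train, val, test), which are disjoint and together cover every entry.
    """
    rng = np.random.default_rng(seed)
    held = rng.random((n_trials, n_units)) < p_holdout
    to_val = rng.random((n_trials, n_units)) < p_val
    train = ~held
    val = held & to_val
    test = held & ~to_val
    return train, val, test
```

Because a unit is withheld only on some trials, the shared latent for a trial can still be estimated from the other units recorded on that trial, and the withheld entries then test whether the model predicts that unit's response.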

Acknowledgements

We thank Allison Laudano for animal and colony management and care, Christopher Badillo for apparatus design and fabrication, and Nika Hazen for assistance with animal work. Cris Niell, Cory Miller, Jude Mitchell, and Anne Churchland all provided valuable feedback on drafts of this paper. We thank the Visual Coding team at the Allen Institute for sharing the mouse data used in this paper (https://portal.brainmap.org/explore/circuits/visual-coding-neuropixels).

Funding

National Institutes of Health / BRAIN Initiative grant U01-UF1NS116377 (A.H.)

National Science Foundation grant NSC-FO 2123605 (D.B.)

National Institutes of Health grant K99EY032179-02 (J.Y.).

Author contributions

Conceptualization: J.L.Y., D.A.B., J.P.L., A.C.H.

Methodology: J.P.L., D.P.R., T.T.K.N., J.-O.M., D.A.B., J.L.Y., A.C.H.

Investigation: J.P.L., D.P.R.

Visualization: J.L.Y.

Funding acquisition: A.C.H., D.A.B., J.L.Y.

Analysis: J.L.Y.

Project administration: A.C.H.

Supervision: A.C.H., J.L.Y., D.A.B.

Writing – original draft: A.C.H., J.L.Y., J.P.L.

Writing – review & editing: J.P.L., D.P.R., T.T.K.N., J.-O.M., D.A.B., J.L.Y., A.C.H.

Competing interests

Authors declare that they have no competing interests.

Data and materials availability

All data in the main text or the supplementary materials are available upon request, and will be posted publicly at the time of publication.