Sharing neurophysiology data from the Allen Brain Observatory
Abstract
Nullius in verba (‘trust no one’), chosen as the motto of the Royal Society in 1660, implies that independently verifiable observations—rather than authoritative claims—are a defining feature of empirical science. As the complexity of modern scientific instrumentation has made exact replications prohibitive, sharing data is now essential for ensuring the trustworthiness of one’s findings. While embraced in spirit by many, in practice open data sharing remains the exception in contemporary systems neuroscience. Here, we take stock of the Allen Brain Observatory, an effort to share data and metadata associated with surveys of neuronal activity in the visual system of laboratory mice. Data from these surveys have been used to produce new discoveries, to validate computational algorithms, and as a benchmark for comparison with other data, resulting in over 100 publications and preprints to date. We distill some of the lessons learned about open surveys and data reuse, including remaining barriers to data sharing and what might be done to address these.
Editor's evaluation
This article presents an important review of data-sharing efforts in neurophysiology, with a focus on data released by the Allen Institute for Brain Science. The article offers perspectives from the users of such shared data, and makes a compelling case that data sharing has already advanced research in neuroscience. There are valuable insights here for producers and users of neurophysiology data, as well as the funders that support all those efforts.
https://doi.org/10.7554/eLife.85550.sa0
Introduction
Why share data? The central nervous system is among the most complex organs under investigation. Accordingly, the tools to study it have become intricate and costly, generating ever-growing torrents of data that need to be ingested, quality-controlled, and curated for subsequent analysis. Not every lab has the financial or personnel resources to accomplish this. Moreover, while many scientists relish running experiments, others find their passion in analysis. Data collection requires a different skillset than analysis, especially as the field demands more comprehensive and higher-dimensional datasets, which, in turn, necessitate more advanced analytical methods and software infrastructure. A scientific ecosystem in which data is extensively shared and reused would give researchers more freedom to focus on their favorite parts of the discovery process.
Sharing data brings other benefits as well. It increases the number of eyes on each dataset, making it easier to spot potential outlier effects (Button et al., 2013). It encourages meta-analyses that integrate data from multiple studies, providing the opportunity to reconcile apparently contradictory results or expose the biases inherent in specific analysis pipelines (Botvinik-Nezer et al., 2020; Mesa et al., 2021). It also gives researchers a chance to test hypotheses on existing data, refining and updating their ideas before embarking on the more costly process of running new experiments.
Without a doubt, reanalysis of neurophysiology data has already facilitated numerous advances. Electrophysiological recordings from nonhuman primates, which require tremendous dedication to collect, are often reused in multiple high-impact publications (Churchland et al., 2010; Murray et al., 2014). Data from ‘calibration’ experiments, in which activity of individual neurons is monitored via two modalities at once, have been extremely valuable for improving data processing algorithms (GENIE Project, 2015; Henze et al., 2009; Huang et al., 2021; Neto et al., 2016). A number of these datasets have been shared via the website of CRCNS (Teeters et al., 2008), a far-sighted organization focused on aggregating data for computational neuroscience within the same searchable database. To date, CRCNS hosts 150 datasets, including extensive neurophysiology recordings from a variety of species, as well as fMRI, EEG, and eye movement datasets. This is especially impressive given that CRCNS was launched by a single lab in 2008. The repository does not enforce formatting standards, and thus each dataset differs in its packaging conventions, as well as in the level of preprocessing that has been applied to the data. The website includes a list of 111 publications and preprints based on CRCNS data. Our own meta-analysis of these articles shows that 28 of the 150 datasets have been reused at least once, with four reused more than 10 times each.
More recently, an increasing number of researchers are choosing to make data public via generalist repositories such as Figshare, Dryad, and Zenodo, or the neuroscience-specific G-Node Infrastructure. In addition, the lab of György Buzsáki maintains a databank of recordings from more than 1000 sessions from freely moving rodents (Petersen et al., 2020). As data can be hosted on these repositories for free, they greatly lower the barriers to sharing. However, the same features that reduce the barriers for sharing can also increase the barriers for reuse. With no restrictions on the data format or level of documentation, learning how to analyze diverse open datasets can take substantial effort, and scientists are limited in their ability to perform meta-analyses across datasets. Further, with limited and nonstandard documentation, finding relevant datasets can be challenging.
Since its founding, the Allen Institute has made open data one of its core principles. Specifically, it has become known for generating and sharing survey datasets within the field of neuroscience, taking inspiration from domains such as astronomy where such surveys are common. (As a community, astronomers have developed a far more comprehensive and coherent data infrastructure than biology. One obvious reason is the existence of a single sky with an agreed-upon coordinate system and associated standards such as the Flexible Image Transport System; Borgman et al., 2016; York et al., 2000; Zuiderwijk and Spiers, 2019.) The original Allen Mouse Brain Atlas (Lein et al., 2007) and subsequent surveys of gene expression (Bakken et al., 2016; Hawrylycz et al., 2012; Miller et al., 2014), mesoscale connectivity (Harris et al., 2019; Oh et al., 2014), and in vitro firing patterns (Gouwens et al., 2019) have become essential resources across the field. These survey datasets (1) are collected in a highly standardized manner with stringent quality controls, (2) comprise a volume of data much larger than that of typical individual studies within their particular disciplines, and (3) are collected without a specific hypothesis, in order to facilitate a diverse range of use cases.
Starting a decade ago, we began planning the first surveys of in vivo physiology in mouse cortex with single-cell resolution (Koch and Reid, 2012). Whereas gene expression and connectivity are expected to change relatively slowly, neural responses in awake subjects can vary dramatically from moment to moment, even during apparently quiescent periods (McCormick et al., 2020). Therefore, an in vivo survey of neural activity poses new challenges, requiring many trials and sessions to account for both intra- as well as inter-subject variability. We first used two-photon calcium imaging and later Neuropixels electrophysiology to record spontaneous and evoked activity in visual cortex and thalamus of awake mice that were passively exposed to a wide range of visual stimuli (known as ‘Visual Coding’ experiments). A large number of subjects, highly standardized procedures, and rigorous quality control criteria distinguished these surveys from typical small-scale neurophysiology studies. More recently, the Institute carried out surveys of single-cell activity in mice performing a visually guided behavioral task (known as ‘Visual Behavior’ experiments). In all cases, the data was shared even before we published our own analyses of it. We reflect here on the lessons learned concerning the challenges of data sharing and reuse in the neurophysiology space. Our primary takeaway is that the widespread mining of our publicly available resources demonstrates a clear community demand for open neurophysiology data and points to a future in which data reuse becomes more commonplace. However, more work is needed to make data sharing and reuse practical (and ideally the default) for all laboratories practicing systems neuroscience.
Overview of the Allen Brain Observatory
The Allen Brain Observatory consists of a set of standardized instruments and protocols designed to carry out surveys of cellular-scale neurophysiology in awake brains (de Vries et al., 2020; Siegle et al., 2021a). Our initial focus was on neuronal activity in the mouse visual cortex (Koch and Reid, 2012). Vision is the most widely studied sensory modality in mammals, but much of the foundational work is based on recordings with hand-tuned stimuli optimized for individual neurons, typically investigating a single area at a time (Hubel and Wiesel, 1998). The field has lacked the sort of unbiased, large-scale surveys required to rigorously test theoretical models of visual function (Olshausen and Field, 2005). The laboratory mouse is an advantageous model animal given the extensive ongoing work on mouse cell types (BRAIN Initiative Cell Census Network (BICCN), 2021; Tasic et al., 2018; Yao et al., 2021; Zeisel et al., 2015), as well as access to a well-established suite of genetic tools for observing and manipulating neural activity via driver and reporter lines or viruses (Gerfen et al., 2013; Madisen et al., 2015).
Our two-photon calcium imaging dataset (Allen Institute MindScope Program, 2016) leveraged transgenic lines to drive the expression of a genetically encoded calcium indicator (Chen et al., 2013) in specific populations of excitatory neurons (often constrained to a specific cortical layer) or GABAergic interneurons. In total, we recorded activity from over 63,000 neurons across 6 cortical areas, 4 cortical layers, and 14 transgenic lines (Figure 1). The Neuropixels electrophysiology dataset (Allen Institute MindScope Program, 2019) used silicon probes (Jun et al., 2017) to record simultaneously from the same six cortical areas targeted in the two-photon dataset, as well as additional subcortical regions (Durand et al., 2022). While cell type specificity was largely lost, transgenic lines did enable optotagging of specific inhibitory interneurons. The Neuropixels dataset included recordings from over 40,000 units passing quality control across more than 14 brain regions and 4 mouse lines (Figure 1). In both surveys, mice were passively exposed to a range of visual stimuli. These included drifting and flashed sinusoidal gratings to measure traditional spatial and temporal tuning properties, sparse noise or windowed gratings to map spatial receptive fields, images and movies that have natural spatial and temporal statistics, and epochs of mean luminance to capture neurons’ spontaneous activity. These stimuli were selected to provide a broad survey of visual physiological activity and compare the organization of visual responses across brain regions and cell types. Mice were awake during these experiments and head-fixed on a spinning disk that permitted them to run in a self-initiated and unguided manner. Subsequent surveys of neural activity in mice performing a behavioral task are not discussed here as it is too soon to begin evaluating their impact on the field.
Approach to data distribution
Once the data was collected, we wanted to minimize the friction required for external groups to access it and mine it for insights. This is challenging! Providing unfettered access to the data can be accomplished by providing a simple download link; yet, unless the user understands what is contained in the file and has installed the appropriate libraries for parsing the data, its usefulness is limited. At the other extreme, a web-based analysis interface that does not require any downloading or installation can facilitate easy data exploration, but this approach has high upfront development costs and imposes limitations on the analyses that can be carried out.
These conflicting demands are apparent in our custom tool, the AllenSDK, a Python package that serves as the primary interface for downloading data from these surveys as well as other Allen Institute resources. In the case of the Allen Brain Observatory, the AllenSDK provides wrapper functions for interacting with the Neurodata Without Borders (NWB) files (Rübel et al., 2022; Teeters et al., 2015) in which the data is stored. Intuitive functions enable users to search metadata for specific experimental sessions and extract the relevant data assets. Whereas our two-photon calcium imaging survey was accompanied by a dedicated web interface that displayed summary plots for every cell and experiment (observatory.brain-map.org/visualcoding), we discontinued this practice because of its associated development costs and because most users preferred to directly access the data in their own analysis environment.
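For readers unfamiliar with this workflow, the sketch below shows how the AllenSDK is typically used to search the two-photon survey's metadata and pull one session into an analysis environment. The functions shown are AllenSDK calls, but the filter values (area, Cre line, stimulus) and the manifest path are illustrative choices, not prescribed ones.

```python
from allensdk.core.brain_observatory_cache import BrainObservatoryCache

# The cache downloads NWB files on demand and tracks them in a local manifest
boc = BrainObservatoryCache(manifest_file="brain_observatory/manifest.json")

# Search metadata: experiment containers in VISp for one Cre line (values illustrative)
containers = boc.get_experiment_containers(
    targeted_structures=["VISp"], cre_lines=["Cux2-CreERT2"]
)

# Find sessions from the first container that include drifting gratings
experiments = boc.get_ophys_experiments(
    experiment_container_ids=[containers[0]["id"]], stimuli=["drifting_gratings"]
)

# Download one session's NWB file and extract traces and stimulus timing
data_set = boc.get_ophys_experiment_data(experiments[0]["id"])
timestamps, dff = data_set.get_dff_traces()  # dF/F trace per segmented cell
stim_table = data_set.get_stimulus_table("drifting_gratings")  # aligned stimulus epochs
```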
One challenge with sharing cellular neurophysiology data is that it includes multiple high-dimensional data streams. Many other data modalities (e.g., gene expression) can be reduced to a derived metric and easily shared in a tabular format (e.g., cell-by-gene table). In contrast, neurophysiological data is highly varied, with researchers taking different approaches to both data processing (e.g., spike sorting or cell segmentation) and analysis. While these data can be analyzed as a large collection of single-cell recordings, they can also be approached as population recordings, leveraging the fact that hundreds to thousands of neurons are recorded simultaneously. Thus, particularly for a survey-style dataset not designed to test a particular hypothesis, it is hard to reduce these recordings to a simple set of derived metrics that encapsulate the full range of neural and behavioral states. Even when it is possible (e.g., we could have shared a table of single-cell receptive field and tuning properties as the end product), this confines any downstream analyses to those specific metrics, severely restricting the space of possible use cases. At the same time, if we had only shared the raw data, few researchers would have the resources or the inclination to build their own preprocessing and packaging pipelines.
Therefore, we aimed to share our data in a flexible way to facilitate diverse use cases. For every session, we provided either spike times or fluorescence traces, temporally aligned stimulus information, the mouse’s running speed and pupil tracking data, as well as intermediate, derived data constructs, such as ROI masks, neuropil traces, and pre- and post-demixing traces for two-photon microscopy, and waveforms across channels for Neuropixels. All are contained within the NWB files. In addition, we uploaded the more cumbersome, terabyte-scale raw imaging movies and voltage traces to the public cloud for users focused on data processing algorithms (Figure 2).
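As a concrete sketch of how these per-session contents are exposed for the Neuropixels survey, the accessors below follow the AllenSDK's EcephysProjectCache; the session choice and manifest path are placeholders.

```python
from allensdk.brain_observatory.ecephys.ecephys_project_cache import EcephysProjectCache

# The cache mirrors session metadata and NWB files locally as needed
cache = EcephysProjectCache.from_warehouse(manifest="ecephys/manifest.json")

# Browse session-level metadata, then download one session (arbitrary choice here)
sessions = cache.get_session_table()
session = cache.get_session_data(sessions.index[0])

units = session.units                # QC-passing units with brain region labels
spikes = session.spike_times         # dict: unit_id -> array of spike times (s)
stim_table = session.get_stimulus_table("drifting_gratings")  # aligned stimulus epochs
running = session.running_speed      # running speed over the session
```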
Three families of use cases
The first round of two-photon calcium imaging data was released in July 2016, followed by three subsequent releases that expanded the dataset (green triangles in Figure 3). The Neuropixels dataset became available in October 2019 (yellow triangle in Figure 3). At the end of 2022, there were 104 publications or preprints that reuse these two datasets, with first authors at 50 unique institutions. This demonstrates the broad appeal of applying a survey-style approach to the domain of in vivo neurophysiology.
We found three general use cases of Allen Brain Observatory data in the research community:
Generating novel discoveries about brain function
Validating new computational models and algorithms
Comparing with experiments performed outside the Allen Institute
Below, we highlight some examples of these three use cases, for both the two-photon calcium imaging and Neuropixels datasets. All these studies were carried out by groups external to the Allen Institute, and frequently without any interaction with us, speaking to the ease with which the data can be downloaded and analyzed.
Making discoveries
Sweeney and Clopath, 2020 used Allen Brain Observatory two-photon imaging data to explore the stability of neural responses over time. They had previously found that neurons in a recurrent network model with high inherent plasticity showed more variability in their stimulus selectivity than those with low plasticity, and that highly plastic neurons also exhibited stronger population coupling. To test whether these two properties were related, they analyzed calcium-dependent fluorescence traces from the Allen Brain Observatory, asking whether population coupling and response variability were correlated. They found that population coupling is indeed correlated with the change in orientation and direction tuning of neurons over the course of a single experiment, an unexpected result linking population activity to individual neural responses.
Bakhtiari et al., 2021 examined whether a deep artificial neural network (ANN) could model both the ventral and dorsal pathways of the visual system within a single network trained with a single cost function. They trained two networks, one with a single pathway and the other with two parallel pathways, using a Contrastive Predictive Coding loss function. Comparing the representations of these networks with the neural responses in the two-photon imaging dataset, they found that the single-pathway network produced ventral-like representations but failed to capture the representational similarity of the dorsal areas. The parallel-pathway network, in contrast, developed distinct representations that mapped onto the ventral/dorsal division. This work illustrates how large-scale data can guide the development of neural network models and, conversely, how such models can inform our understanding of cortical function.
Fritsche et al., 2022 analyzed the time course of stimulus-specific adaptation in 2365 neurons in the Neuropixels dataset and discovered that a single presentation of a drifting or static grating at a specific orientation leads to a reduction in the response to the same visual stimulus up to eight trials (22 s) in the future. This stimulus-specific, long-term adaptation persists despite intervening stimuli and is seen in all six visual cortical areas, but not in the visual thalamic areas (LGN and LP), which return to baseline after one or two trials. This is a remarkable example of a discovery that was not envisioned when designing our survey, but for which our stimulus set was well suited.
At least three publications have taken advantage of the fact that every Neuropixels insertion targeting visual cortex and thalamus also passed through the intervening hippocampus and subiculum. Nitzan et al., 2022 analyzed the local field potential from these electrodes to detect the onset of sharp-wave ripples, fast oscillations believed to mediate offline information transfer out of the hippocampus (Girardeau and Zugaro, 2011). They found that sharp-wave ripples coincided with a transient, cortex-wide increase in functional connectivity with the hippocampus. Jeong et al., 2023 examined the topography of this functional connectivity and found that distinct but intermingled classes of visual cortex neurons were preferentially modulated by ripples originating in dorsal hippocampus, while others were more coupled to ripples in intermediate hippocampus. Purandare and Mehta, 2023 analyzed the responses of hippocampal neurons to natural movies and found that many displayed highly selective ‘movie fields’ that were often as robust as those of neurons in visual cortex. However, in contrast to visual cortex, the movie fields in the hippocampus disappeared if the movie frames were shuffled (thereby disrupting the learned temporal sequence). Although the Allen Brain Observatory experiments were not originally designed to test hypotheses of hippocampal function, the Neuropixels dataset turned out to be attractive for understanding the interactions between this structure and visual cortical and thalamic regions.
Validating models and algorithms
Many researchers used the numerous and diverse fluorescence movies in the two-photon imaging dataset to validate image processing algorithms. As the different transgenic lines used in the dataset target different populations of neurons, they have different labeling densities. As a result, there are some very sparse movies with only a dozen neurons within the field of view and others with up to ~400 neurons. This makes the dataset a rich resource for benchmarking methods for cell segmentation (Bao et al., 2021; Inan et al., 2021; Kirschbaum et al., 2020; Petersen et al., 2018; Soltanian-Zadeh et al., 2019), matching neurons across multiple sessions (Sheintuch et al., 2017), and removing false transients in the fluorescence traces (Bao et al., 2022; Gauthier et al., 2022).
Montijn et al., 2021 used the Neuropixels survey to showcase a novel method for identifying statistically significant changes in neural activity. Their method, called ZETA (Zenith of Event-based Time-locked Anomalies), detects whether a cell is responsive to stimulation without the need to tune parameters, such as spike bin size. As an example, they analyzed the ‘optotagging’ portion of the Neuropixels experiments carried out in Vip-Cre × ChR2 mice, in which Vip+ interneurons were activated with brief pulses of blue light. Although this protocol was intended to aid the identification of genetically defined cell types at the end of each recording session, the authors showed how these recordings can also be exploited to test the network-level impact of triggering a particular class of interneurons. ZETA identifies not only Vip+ neurons that are directly activated by the light pulses, but also nearby cortical neurons that are inhibited on short timescales and disinhibited over longer timescales.
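The authors distribute a reference implementation (the zetapy package); purely as a conceptual sketch of the core idea, under the simplifying assumption that jittering event onsets is an adequate null, one might write:

```python
import numpy as np

def zeta_like_test(spike_times, event_times, window=1.0, n_null=250, rng=None):
    """Conceptual sketch of a ZETA-style test (not the reference implementation):
    measure how far the cumulative distribution of event-aligned spike times
    deviates from the uniform distribution expected of a non-responsive neuron,
    then calibrate that deviation against jittered event onsets."""
    rng = np.random.default_rng(rng)

    def max_deviation(events):
        # Pool spike times relative to each event onset, within the analysis window
        rel = np.concatenate([
            spike_times[(spike_times >= t) & (spike_times < t + window)] - t
            for t in events
        ])
        if rel.size == 0:
            return 0.0
        rel = np.sort(rel)
        ecdf = np.arange(1, rel.size + 1) / rel.size
        # 'Zenith': the largest deviation between the empirical CDF and uniformity
        return np.max(np.abs(ecdf - rel / window))

    observed = max_deviation(np.asarray(event_times))
    # Null distribution: jittered onsets destroy any stimulus locking
    null = np.array([
        max_deviation(np.asarray(event_times) + rng.uniform(-window, window, len(event_times)))
        for _ in range(n_null)
    ])
    p_value = (np.sum(null >= observed) + 1) / (n_null + 1)
    return observed, p_value
```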
Buccino et al., 2020 used raw data from the Neuropixels survey to validate SpikeInterface, a Python package that runs multiple spike sorting algorithms in parallel and compares their outputs. We originally performed spike sorting with one such algorithm, Kilosort 2 (Pachitariu et al., 2016). The authors of this paper used SpikeInterface to compare the performance of Kilosort 2 and five additional algorithms. In one example session, over 1000 distinct units were detected by only one sorter, while only 73 units were detected by five or more sorters. At first glance, this finding seems to indicate a high level of disagreement among the algorithms. However, when comparing these results with those from simulations, it became clear that the low-agreement units were mainly false positives, while the true positive units were highly consistent across algorithms. This finding, and the SpikeInterface package in general, will be essential for improving the accuracy of spike sorting in the future.
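A minimal sketch of such a multi-sorter comparison with SpikeInterface might look as follows; the recording path is a placeholder, and some argument names (e.g., the output folder keyword) vary across SpikeInterface versions.

```python
import spikeinterface.full as si

# Load a recording (path and format are placeholders for a real session)
recording = si.read_spikeglx("/path/to/session")

# Run several sorters; each returns a Sorting object
# (the keyword may be `output_folder` in older SpikeInterface releases)
names = ["kilosort2", "spykingcircus", "tridesclous"]
sortings = [si.run_sorter(name, recording, folder=f"out_{name}") for name in names]

# Compare outputs: units found by two or more sorters are far more likely
# to be true positives than units found by a single sorter
comparison = si.compare_multiple_sorters(sorting_list=sortings, name_list=names)
consensus = comparison.get_agreement_sorting(minimum_agreement_count=2)
print(f"{len(consensus.get_unit_ids())} consensus units")
```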
Comparisons with other datasets
Kumar et al., 2021 used supervised and semi-supervised learning algorithms to classify cortical visual areas based on either spontaneous activity or visually evoked responses. Cortical visual areas, defined based on retinotopic maps, are thought to serve distinct visual processing functions. Rather than compare tuning properties of neurons across the areas, as many studies (including our own) have done, the authors trained classifiers to determine area membership and boundaries from the neural responses to visual stimuli. They compared the performance of these algorithms between their own wide-field imaging dataset and our two-photon imaging dataset. This extends and validates their results under conditions in which single-cell responses are available.
Muzzu and Saleem, 2021 performed electrophysiological recordings in mouse cortex to examine ‘mismatch’ responses, in which neurons respond to discrepancies between visual cues and motor signals generated by running. The authors argued that these responses derive from visual features rather than from the mismatch itself, showing that these perturbation responses might be explained by preferential tuning to low temporal frequencies. They used our two-photon imaging dataset to demonstrate a difference in temporal frequency tuning across cortical layers, with neurons in superficial layers tuned to lower frequencies, consistent with the observation that mismatch responses are predominantly found in superficial layers. While this use case is perhaps one of the simplest, it elegantly demonstrates how open data can corroborate implications that emerge from one’s own experiments.
Stringer et al., 2021 compared spiking activity from the Neuropixels dataset to calcium-dependent fluorescence changes recorded in their laboratory. Their analysis focused on the precision with which the orientation of static gratings can be decoded from activity in visual cortex. Using their own two-photon calcium imaging dataset that consisted of up to 50,000 simultaneously recorded neurons, they found that it was possible to use neural activity to discriminate orientations that differ by less than 0.4°, about a factor of 100 better than reported behavioral thresholds in mice. As an important control, they showed that the trial-to-trial variability in evoked responses to static gratings was nearly identical between their two-photon data and our Neuropixels electrophysiology data, indicating that their main result was not likely to depend on the recording modality. This use case is noteworthy because the preprint containing this comparison appeared less than a month after our dataset became publicly available.
Schneider et al., 2021 directly compared the Allen Neuropixels dataset with Neuropixels recordings from LGN and V1 carried out locally. They first analyzed gamma-band coherence between these two structures in the Allen Brain Observatory dataset and found evidence in support of their hypothesis that inter-regional coherence is primarily driven by afferent inputs. This contrasts with the ‘communication through coherence’ hypothesis (Fries, 2015), which posits that pre-existing inter-regional coherence is necessary for information transfer. They then performed a separate set of Neuropixels recordings in which they found that silencing cortex (via optogenetic activation of somatostatin-positive interneurons) did not change the degree of coherence between LGN and V1, indicating that V1 phase-locking is inherited from LGN, further supporting their hypothesis. This is an insightful example of how a survey dataset can be used to test a hypothesis, followed by a set of more specific follow-up experiments that refine the initial findings.
Use in education
These surveys have also been used in a variety of educational contexts. Many computational neuroscience summer courses have presented them as a potential source of student projects. These include the Allen Institute’s own Summer Workshop on the Dynamic Brain; the Cold Spring Harbor Neural Data Science and Computational Neuroscience: Vision courses; the Brains, Minds, and Machines Summer Course at the Marine Biological Laboratory; and the Human Brain Project Education Program. Indeed, in some cases these projects have led to publications (Christensen and Pillow, 2022; Conwell et al., 2022). Beyond summer courses, these datasets are discussed in undergraduate classrooms, enabling students to learn computational methods with real data rather than toy models. This includes classes at the University of Washington, Brown University, and the University of California, San Diego.
User experience
To gain additional insight into the perspectives of end users, we interviewed eight scientists who published papers based on Allen Brain Observatory data. There were three primary reasons why users chose to analyze these datasets: (1) they were interested in the datasets’ unique features, such as the number of recorded regions; (2) they lacked the ability to collect data from a particular modality (e.g., an imaging lab wanted to analyze electrophysiology data); or (3) they wanted to validate their own findings using an independent dataset. Although most users initially tried to access the data via the AllenSDK Python package, several found it easier to download the NWB files directly after exporting a list of URLs, particularly if they were using Matlab for analysis. Common challenges included slow data download speeds, understanding the details of preprocessing steps, and data format changes (e.g., the original Neuropixels files were subsequently updated to adhere to the latest NWB standard, which broke compatibility with older versions of the AllenSDK). In most cases, reaching out to scientists at the Allen Institute cleared up these issues. Users also encountered obstacles related to the scale of the data: some scientists needed to learn how to submit jobs to their local high-performance computing cluster to speed up analysis, or to develop new methods for retrieving and organizing data. But the size of the dataset was also one of its biggest advantages:
“From a scientific perspective, facing such a rich dataset can be overwhelming at the beginning—there are so many questions that could be addressed with it and it’s easy to get lost. In my case, it was a blessing rather than a curse; my initial question was simply how different areas included in the dataset are modulated by ripples. Having such a wide coverage of the hippocampal axis, I later asked myself whether ripples recorded on different probes differentially modulate neuronal activity outside of the hippocampus, which led me to some interesting and unexpected findings.” (Noam Nitzan, NYU)
In general, researchers were enthusiastic about this resource:
“I looked at several open data sets, and I quickly realized that the Allen Brain Observatory Neuropixels data set was the best documented open data set I found. The intuitive packaging in the NWB format, as well as the systematic repetition of experiments with a comparatively high number of mice and single units in various visual areas, made the decision to use the Allen dataset very easy.” (Marius Schneider, Ernst Strüngmann Institute)
Journal referees seemed to respond positively to the use of Allen Brain Observatory data, although one user reported that a reviewer was concerned about their ability to adequately validate data they did not collect themselves. For future data releases, several users requested experiments with different types of visual stimuli, ideally chosen through interactions with the wider community.
Discussion
Although it is too early to assess the long-term relevance of the first two Allen Brain Observatory datasets, the more than 100 publications that mined this data over the last 6 years testify to its immediate impact. Our data has been used for a wide array of applications, many of which we did not envision when we designed the surveys. We attribute this success to several factors, including the scale of the dataset (tens of thousands of neurons across hundreds of subjects), our extensive curation and documentation efforts (in publications, white papers, and websites), a robust software kit for accessing and analyzing the data (the AllenSDK), and a well-organized outreach program (involving tutorials at conferences and a dedicated summer workshop).
One key lesson we learned is to facilitate different types of data reuse, as illustrated by the examples above. While many users primarily care about spike times or fluorescence traces, others require raw data. Because of this, it was fortuitous that we provided access to both (Figure 2). Sharing the data in a way that is flexible and well documented reduces constraints on which questions can be addressed, and is thus paramount for facilitating reuse. Indeed, while many papers leveraged the datasets to examine the visual functional properties of neurons or brain areas, many others used these data in a way that was agnostic to the visual context of the underlying experiments.
We hope to see the sharing of both raw and processed cellular physiology data soon become ubiquitous. However, we know that our surveys were contingent on the efforts of a large team, including scientists from multiple disciplines, hardware and software engineers, research associates, and project managers. Assembling similar resources is untenable for most academic labs. Fortunately, there are ongoing developments that will lower the barriers to sharing and reusing data: increased standardization and cloud-based analysis tools.
Increased standardization
The success of data reuse rests on the FAIR Principles: data must be Findable, Accessible, Interoperable, and Reusable (Wilkinson et al., 2016). In other words, prospective analysts must be able to easily identify datasets appropriate to their needs and know how to access and use the data assets. This is best accomplished if data is stored in standardized formats, with common conventions for rich metadata and easy-to-use tools for search and visualization.
The Allen Institute has invested heavily in developing and promoting Neurodata Without Borders (NWB) as a standard data model and interchange format for neurophysiology data (Rübel et al., 2022; Teeters et al., 2015). NWB has been criticized for being both too restrictive (as it often takes a dedicated programmer to generate format-compliant files from lab-specific data) and not restrictive enough (as it does not enforce sufficient metadata conventions, especially related to behavioral tasks). Nevertheless, there are overwhelming advantages to having common, language-agnostic formatting conventions across the field. Building a rich ecosystem of analysis and visualization tools based on NWB will incentivize additional labs to store their data in this format and even to directly acquire data in NWB files to make data immediately shareable (this is already possible for electrophysiological recordings using the Open Ephys GUI; Siegle et al., 2017). We envision a future in which it will require less effort for neurophysiologists to comply with community-wide standards than to use their own idiosyncratic conventions because standardized formats serve as a gateway to a host of pre-existing, carefully validated analysis packages.
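To give a sense of the effort involved, here is a minimal sketch of packaging sorted spike times into an NWB file with pynwb; all metadata values are fabricated for illustration.

```python
from datetime import datetime, timezone
from uuid import uuid4

import numpy as np
from pynwb import NWBFile, NWBHDF5IO

# Minimal session-level metadata (values fabricated for this example)
nwbfile = NWBFile(
    session_description="example recording session",
    identifier=str(uuid4()),
    session_start_time=datetime.now(timezone.utc),
)

# The units table holds one row per sorted unit, with its spike times
nwbfile.add_unit_column(name="quality", description="sorting quality label")
nwbfile.add_unit(
    spike_times=np.sort(np.random.uniform(0, 100, 500)),  # placeholder spike train
    quality="good",
)

# Write a standards-compliant HDF5 file that any NWB-aware tool can open
with NWBHDF5IO("session.nwb", "w") as io:
    io.write(nwbfile)
```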
Standardized metadata conventions are also critical for promoting data reuse. Our surveys are accompanied by extensive white papers, code repositories, and tutorials that detail the minutiae of our methods and tools, beyond the standard ‘Methods’ section in publications (see Box 1 for links). For the community at large, a more scalable solution is needed. Standardized and machine-readable metadata needs to extend beyond administrative metadata (describing authors, institutions, and licenses) to include thorough and detailed experimental conditions and parameters in a self-contained manner. As data sharing becomes more widespread, standardization of metadata will be particularly important for reducing ‘long tail’ effects in which a small number of datasets are reused extensively, while others are disregarded, as observed in the reuse of CRCNS data. To avoid a situation in which publicly available datasets from more focused studies are overlooked, all these studies should be indexed by a single database that can be filtered by relevance, making it much easier for researchers to identify data that is appropriate for their needs. The recently launched Distributed Archives for Neurophysiology Data Integration (DANDI) addresses this concern by enforcing the use of NWB for all shared datasets (Rübel et al., 2022). In the two years since the first dataset was uploaded, the archive has grown to host more than 100 NWB-formatted datasets, accessible via download links, a command-line interface, or within a cloud-based JupyterHub environment.
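Programmatic access through the DANDI Python client looks roughly like the sketch below; the dandiset ID shown is illustrative rather than a pointer to a specific archive entry.

```python
from dandi.dandiapi import DandiAPIClient

# Browse a dandiset and list its NWB assets (the ID "000021" is illustrative)
with DandiAPIClient() as client:
    dandiset = client.get_dandiset("000021")
    for asset in dandiset.get_assets():
        print(asset.path, asset.size)
```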
Web resources for Allen Brain Observatory Visual Coding datasets
White papers describing the surveys
2P – http://help.brain-map.org/display/observatory/Documentation
Neuropixels – https://portal.brain-map.org/explore/circuits/visual-coding-neuropixels
Code repositories
AllenSDK – https://github.com/alleninstitute/allensdk
2P – https://github.com/AllenInstitute/visual_coding_2p_analysis
Neuropixels – https://github.com/AllenInstitute/neuropixels_platform_paper
Tutorials
2P – https://allensdk.readthedocs.io/en/latest/brain_observatory.html
Neuropixels – https://allensdk.readthedocs.io/en/latest/visual_coding_neuropixels.html
The human neuroimaging field has faced similar challenges. To address the lack of standardization across public datasets, the community spearheaded the development of the Brain Imaging Data Structure (BIDS), a set of schemas for storing MRI volumes along with associated behavioral and physiological measures (Gorgolewski et al., 2016). NWB shares many features of the BIDS standard, including a hierarchical structure, separation of raw and derived data, and support for extensions. BIDS was essential for the success of OpenNeuro, a public neuroimaging data archive which, as of 2021, included data from over 20,000 subjects (Markiewicz et al., 2021). Given the related aims of OpenNeuro and DANDI, there are many opportunities for the leaders and maintainers of these resources to learn from one another.
While the adoption of consistent data formatting conventions is a welcome development, there are also benefits to greater standardization of protocols, hardware, and software used for data collection. One way this can be achieved is through coordinated cross-laboratory experiments, such as those implemented by the International Brain Laboratory (IBL), a consortium that uses Neuropixels to survey responses across the entire mouse brain in a visual decision-making task (Abbott et al., 2017; Ashwood et al., 2022). It can also be beneficial to carry out smaller-scale studies on infrastructure built for surveys, as we have done as part of the ‘OpenScope’ project. OpenScope allows members of the wider community to propose experiments to be run by Allen Brain Observatory staff (Gillon et al., 2023; Mayner et al., 2022; Prince et al., 2021). This lowers the barriers to generating high-quality, standards-compliant data, especially for labs whose work is primarily computational. Similarly, the IBL is now entering a phase in which member laboratories conduct more focused studies that take advantage of existing rigs and data pipelines.
To encourage data sharing, the field of neurophysiology also needs greater standardization in the way data mining is tracked and credited. Digital object identifiers (DOIs) are an essential first step; we regret not making them an integral part of the Visual Coding data releases. However, they have not solved the problem of discovering reuse as they are not always included in publications. It is more common to include a reference to the original paper in which the dataset was described, but this makes it difficult to distinguish instances of reuse from other types of citations. Currently the onus is on those releasing the data to keep track of who accesses it. To take one example, the cai-1 calcium indicator calibration dataset from the Svoboda Lab at HHMI Janelia Research Campus (GENIE Project, 2015) only has five citations tracked in Google Scholar. Yet a deeper dive into the literature reveals that this dataset has been reused in a wide range of publications and conference papers that benchmark methods for inferring spike rate from calcium fluorescence signals, of which there are likely over 100 in total. Many of these papers only cite the original publication associated with this dataset (Chen et al., 2013), refer to the repository from which the data was downloaded (CRCNS), or do not cite the data source at all. The lack of an agreed-upon method for citing datasets (as we have for journal articles) is a loss for the community, as it hinders our ability to give appropriate credit to those responsible for collecting widely used datasets. A simple, widely accepted method for citing data would benefit all authors: in the astrophysics community, publications that provide links to the underlying data have been shown to gain more citations on average than those that do not (Dorch et al., 2015).
Cloud-based analysis tools
To enable more efficient data mining, end users should ideally not need to download data at all. This is particularly true as the volume of data keeps growing (e.g., a single Allen Brain Observatory Neuropixels session generates about 1.2 TB of raw data). Therefore, the goal should be to bring users to the data, rather than the data to users. This is supported by our interviews with end users, who cited slow download speeds as a key challenge.
Generic analysis tools, such as Amazon’s SageMaker and Google’s Colab, already make it possible to set up a familiar coding environment in the cloud. However, we are most excited about tools that lower the barriers and the costs of cloud analysis for scientists. Some of the most promising tools include DataJoint (Yatsenko et al., 2015), DandiHub, NeuroCAAS (Abe et al., 2022), Binder, and Code Ocean (many of which are built on top of the powerful Jupyter platform). All of these are aimed at improving the reproducibility of scientific analyses, while shielding users from the details of configuring cloud services.
Cloud-based analysis is not a panacea. Although individual tools can be vendor-agnostic, there will be a push to centralize around a single cloud platform, given the high cost of transferring data out of cloud storage. This could lead to a single company monopolizing the storage of neurophysiology data; it would therefore be prudent to invest in a parallel distribution system that is controlled by scientists (Saunders and Davis, 2022). In addition, it is (perhaps not surprisingly) notoriously easy for unwary users to provision expensive cloud computing resources; without safeguards in place, a single long-running analysis on a powerful cloud workstation could exhaust a lab’s entire annual budget. Despite these drawbacks, we believe that a move to cloud-based analysis will be essential for reducing the friction involved in adopting new datasets. We plan to move toward supporting a cloud-native sharing model more directly in our upcoming data releases.
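One already-practical pattern for bringing users to the data is streaming NWB files over HTTP instead of downloading them. Below is a hedged sketch combining fsspec, h5py, and pynwb; the URL is a placeholder, and exact keyword support varies across pynwb versions.

```python
import fsspec
import h5py
from pynwb import NWBHDF5IO

# Stream an NWB file from cloud storage without a local copy (URL is a placeholder)
url = "https://example-bucket.s3.amazonaws.com/session.nwb"

with fsspec.open(url, "rb") as f:          # lazy, range-request-backed file object
    with h5py.File(f, "r") as h5:          # h5py reads directly from the remote file
        with NWBHDF5IO(file=h5, load_namespaces=True) as io:
            nwb = io.read()                # only accessed datasets are fetched
            print(nwb.session_description)
```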
Fostering a culture of data reuse
The value of open data is best realized when it is conveniently accessible. Whether this involves new discoveries or comparing results across studies, data mining is vital for progress in neuroscience, especially as the field as a whole shifts toward more centralized ‘Observatories’ for mice and non-human primates (Koch et al., 2022). The BRAIN Initiative has invested considerable resources in advancing instruments and methods for recording a large number of neurons in more sophisticated behavioral contexts. Yet the analytical methods for understanding and interpreting large datasets are lagging, as many of our theoretical paradigms emerged from an era of small-scale recordings (Urai et al., 2022). In order to develop theories that can explain brain-wide cellular neurophysiology data, it is critical to maximize data reuse.
This poses a set of challenges. Any time a scientist uses a new dataset, they must both learn how to access and manipulate it and decide whether it is appropriate for their question. The latter is the actual scientific challenge, and is where scientists should expend the bulk of their energy. To facilitate this, we first need increased compliance with standards and enhanced tooling around those standards. The more straightforward and intuitive it is to analyze a particular dataset, the more likely it is to be reused. The full burden of refining and adhering to these standards should not fall on the good intentions of individual researchers; instead, we need funding agencies and institutions to recognize the value of open data and allocate resources to facilitate the use of such standards. Everyone benefits when scientists can focus on actual biology rather than on the technical challenges of sharing and accessing data. Second, we need our evaluation of data reuse to ensure that researchers have identified data assets pertinent to their questions and have accounted for the limitations of an experimental paradigm. For instance, we have shown that a naïve comparison of cellular properties measured in the same visual areas across our Neuropixels and two-photon calcium imaging datasets reveals substantial discrepancies (Siegle et al., 2021b). These can only be reconciled by accounting for the bias inherent in each recording modality, as well as the data processing steps leading to the calculation of functional metrics. Effective data reuse requires that we, as a field, focus more of our energies on better communicating these important technical factors and holding researchers accountable for understanding them when they analyze someone else’s data.
Neuroscientists have traditionally been taught to address questions by collecting new data. As data sharing becomes more prevalent, neuroscientists’ first instinct should instead be to search for existing data that may offer insights into the problem at hand, whether or not it was originally intended for this purpose. Even in situations where the ‘perfect’ dataset does not yet exist, it is likely that researchers can exploit available data to refine a broad question into one that is more focused, and thus experimentally more tractable. Just as young scientists are trained to discover, interpret, and cite relevant publications, it is imperative that they are also taught to effectively identify, evaluate, and mine open datasets.
Data availability
The Allen Brain Observatory Visual Coding datasets are available at https://portal.brain-map.org/explore/circuits. The list of papers reusing these datasets is provided in Source data 1.
References
- Mice alternate between discrete strategies during perceptual decision-making. Nature Neuroscience 25:201–212. https://doi.org/10.1038/s41593-021-01007-z
- Segmentation of neurons from fluorescence calcium recordings beyond real-time. Nature Machine Intelligence 3:590–600. https://doi.org/10.1038/s42256-021-00342-x
- The durability and fragility of knowledge infrastructures: lessons learned from astronomy. Proceedings of the Association for Information Science and Technology 53:1–10. https://doi.org/10.1002/pra2.2016.14505301057
- Power failure: why small sample size undermines the reliability of neuroscience. Nature Reviews Neuroscience 14:365–376. https://doi.org/10.1038/nrn3475
- Stimulus onset quenches neural variability: a widespread cortical phenomenon. Nature Neuroscience 13:369–378. https://doi.org/10.1038/nn.2501
- The data sharing advantage in astrophysics. Proceedings of the International Astronomical Union 11:172–175. https://doi.org/10.1017/S1743921316002696
- Brief stimuli cast a persistent long-term trace in visual cortex. The Journal of Neuroscience 42:1999–2010. https://doi.org/10.1523/JNEUROSCI.1350-21.2021
- Data: Simultaneous imaging and loose-seal cell-attached electrical recordings from neurons expressing a variety of genetically encoded calcium indicators. Collaborative Research in Computational Neuroscience. https://doi.org/10.6080/K02R3PMN
- Hippocampal ripples and memory consolidation. Current Opinion in Neurobiology 21:452–459. https://doi.org/10.1016/j.conb.2011.02.005
- Data: Simultaneous intracellular and extracellular recordings from hippocampus region CA1 of anesthetized rats. Collaborative Research in Computational Neuroscience. https://doi.org/10.6080/K02Z13FP
- Next-generation brain observatories. Neuron 110:3661–3666. https://doi.org/10.1016/j.neuron.2022.09.033
- Neuromodulation of brain state and behavior. Annual Review of Neuroscience 43:391–415. https://doi.org/10.1146/annurev-neuro-100219-105424
- A hierarchy of intrinsic timescales across primate cortex. Nature Neuroscience 17:1661–1663. https://doi.org/10.1038/nn.3862
- Validating silicon polytrodes with paired juxtacellular recordings: method and dataset. Journal of Neurophysiology 116:892–903. https://doi.org/10.1152/jn.00103.2016
- How close are we to understanding V1? Neural Computation 17:1665–1699. https://doi.org/10.1162/0899766054026639
- SCALPEL: extracting neurons from calcium imaging data. The Annals of Applied Statistics 12:2430–2456. https://doi.org/10.1214/18-AOAS1159
- Open Ephys: an open-source, plugin-based platform for multichannel electrophysiology. Journal of Neural Engineering 14:045003. https://doi.org/10.1088/1741-2552/aa5eea
- Data sharing for computational neuroscience. Neuroinformatics 6:47–55. https://doi.org/10.1007/s12021-008-9009-y
- The Sloan Digital Sky Survey: technical summary. The Astronomical Journal 120:1579–1587. https://doi.org/10.1086/301513
- Sharing and re-using open data: a case study of motivations in astrophysics. International Journal of Information Management 49:228–241. https://doi.org/10.1016/j.ijinfomgt.2019.05.024
Article and author information
Author details
Funding
Allen Institute
- Saskia EJ de Vries
- Joshua H Siegle
- Christof Koch
The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.
Acknowledgements
We thank all the members of Transgenic Colony Management, Lab Animal Services, Neurosurgery & Behavior, Imaging and Neuropixels Operations Teams, Materials & Process Engineering, Information Technology, and Program Management that cared for and trained the animals, built and staffed the instruments, processed the brains, and wrangled the data streams. We thank Allan Jones for providing an environment that nurtured our efforts and the Allen Institute founder, Paul G Allen, for his vision, encouragement, and support. This research was funded by the Allen Institute. We thank Amazon Web Services for providing free cloud data storage as part of the Open Data Registry program. We thank Hilton Lewis for his insights into data sharing in the astronomy community and Fritz Sommer for providing information about CRCNS. We thank Bénédicte Rossi for illustrating the different use cases for our data. We thank Huijeong Jeong, Chinmay Purandare, Noam Nitzan, Carsen Stringer, Shabab Bakhtiari, Aman Saleem, Marius Schneider, and Jorrit Montijn for providing feedback about their user experiences. We thank Karel Svoboda, David Feng, Jerome Lecoq, Shawn Olsen, Stefan Mihalas, Anton Arkhipov, and Michael Buice for feedback on the manuscript.
Copyright
© 2023, de Vries, Siegle et al.
This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.