MouseBytes, an open-access high-throughput pipeline and database for rodent touchscreen-based cognitive assessment

Abstract

Open Science has changed research by making data accessible and shareable, contributing to replicability to accelerate and disseminate knowledge. However, for rodent cognitive studies, tools to share and disseminate data are scarce. Automated touchscreen-based tests enable systematic cognitive assessment with easily standardised outputs that can facilitate data dissemination. Here we present an integration of touchscreen cognitive testing with an open-access public database repository (mousebytes.ca), as well as a Web platform for knowledge dissemination (https://touchscreencognition.org). We complement these resources with the largest dataset of age-dependent high-level cognitive assessment of mouse models of Alzheimer’s disease, expanding knowledge of affected cognitive domains from male and female mice of three strains. We envision that these new platforms will enhance sharing of protocols, data availability and transparency, allowing meta-analysis and reuse of mouse cognitive data to increase the replicability/reproducibility of datasets.

Introduction

The public nature of research and increased rigor applied to research outputs have encouraged new approaches to enhance transparency, data sharing, and reproducibility (Button et al., 2013). Over the past 10 years, Open Science initiatives featuring increased data sharing and high-throughput automated data collection have increased the efficiency, quality, integrity and reproducibility of data gathering (Johnson and O'Donnell, 2009; Rahman and Watabe, 2018). In genomics, for example, researchers have made major progress in understanding the genetic basis of diseases by establishing multi-research site consortia and by providing access to these data through different open repositories (Boucas, 2018; Diehl and Boyle, 2016; Gerstein, 2012). In neuroimaging, data sharing and large open-access databases have enabled the development of new analytic tools allowing researchers to address questions that could not be answered using single data sets (Biswal et al., 2010; Poldrack and Gorgolewski, 2014).

Recently, there have been several attempts to build databases of rodent behaviour data. The Jackson Laboratory has developed the Mouse Phenome Database, a repository of mouse data taken from several studies (Grubb et al., 2009). Additionally, resources such as the International Mouse Phenotyping Resource of Standardised Screens provide different pipelines for the characterisation of mouse lines (Koscielny et al., 2014). Although these databases represent a necessary and fundamental shift in the availability of data, these repositories provide only limited information on high-level cognitive testing in mouse models.

Conventional cognitive assessments in mouse models are subject to large variation (Kafkafi et al., 2018; Wahlsten et al., 2003), which may be in part the result of lack of automation. Additionally, the methodology used for cognitive assessments can significantly vary among research groups. For example, a recent analysis focusing on transgenic mouse models of Alzheimer’s disease found a significant amount of variation in the parameters used in the Morris Water Maze, including pool size, pool temperature, number of trials per day, and number of acquisition days (Egan et al., 2016). Further evidence shows that even when protocol parameters are controlled for, different experimenters can still obtain different behavioural results in conventional behavioural tasks (Chesler et al., 2002; Crabbe et al., 1999; Kafkafi et al., 2018). Overall, there are converging domains of evidence to suggest that non-automated and non-standardised conventional behavioural assessments may be prone to several sources of bias.

Efforts to address these important gaps in data collection, automation, and translational research led to the development of the rodent touchscreen testing method (Horner et al., 2013; Mar et al., 2013; Oomen et al., 2013). This technology allows the use of tests in rodents that are highly similar, and in some cases identical, to human cognitive tests (Heath et al., 2019; Nilsson et al., 2016; Nithianantharajah et al., 2015; Romberg et al., 2013a). Touchscreen testing systems have standardised behavioural protocols that are under the control of a computer system, allowing for increased standardisation of outcomes. Furthermore, the automation of high-level cognitive testing can provide significant reductions in experimenter and environmental influence, by providing a standard operant environment and standardised output file formats (Horner et al., 2013; Mar et al., 2013). This feature makes results generated amenable to storage in a central repository, allowing for data categorisation, searching and comparison between multiple laboratories.

Here we used data obtained with male and female mice from three distinct mouse lines commonly used in Alzheimer's disease (AD) research to highlight the use of a new repository and Web-based software, MouseBytes (mousebytes.ca). We reveal longitudinal heterogeneity as well as commonalities in cognitive function between the various strains modelling AD. For example, 3xTG-AD mice, both males and females, present early attention deficits (at 3–6 months of age) when compared to their age-matched controls, demonstrating reproducibility of earlier results. Overall, our cognitive assessment suggests which mouse models can be used to model cognitive phenotypes consistent with Alzheimer’s disease.

MouseBytes is available to researchers worldwide (mousebytes.ca), so they can pre-process their data, run automated quality control scripts, and store, visualise, and analyse their data alone or alongside other researchers’ stored data. Moreover, researchers can use a knowledge-sharing tool (https://touchscreencognition.org) to disseminate community-driven information, including standard operating procedures (SOPs) and protocols. We foresee this repository for touchscreen data as a major step towards increasing the availability of datasets, including negative results, that can serve to evaluate reproducibility, decrease publication bias, and bring high-level cognitive assessment into the Open Science era.

Results and discussion

Open-access database and repository

To highlight the potential strengths of MouseBytes, we acquired data from male and female mice from three commonly used transgenic mouse models that have pathological similarities to AD at three ages in two different laboratories on three clinically-relevant touchscreen-based cognitive tests: attention [5-choice serial reaction time task (5-CSRTT)] (Beraldo et al., 2015; Mar et al., 2013; Romberg et al., 2011), behavioural flexibility [pairwise visual discrimination reversal (PD)] (Bussey et al., 2008; Kolisnyk et al., 2013; Mar et al., 2013) and long-term learning and memory [paired-associates learning (PAL)] (Al-Onaizi et al., 2016; Bartko et al., 2011; Horner et al., 2013). The mouse lines (3xTG-AD, 5xFAD and APP/PS1) were chosen due to their extensive use in AD research, as well as their differences in pathology development and AD familial genetic mutations (Egan et al., 2016; Jankowsky et al., 2004; Lee and Han, 2013; Oakley et al., 2006; Oddo et al., 2003). Moreover, the 3xTG-AD mouse line had been tested before using touchscreen attention tests providing a framework for reproducibility testing (Romberg et al., 2011).

Following completion of individual experiments, Extensible Markup Language (XML) files were generated using the Animal Behaviour Environment Test II (ABET II by Campden Instruments Ltd, Loughborough, England) software. XML files were uploaded into MouseBytes and screened using an automated quality control (QC) procedure, which is a tool available at MouseBytes.ca (mousebytesQC). The rules and code for the QC are available for download and modification in GitHub (GitHub_Touchscreen_Pipeline; copy archived at https://github.com/elifesciences-publications/Mousebytes-An-open-access-high-throughput-pipeline-and-database-for-rodent-touchscreen-based-data) (Memar et al., 2019). Files that did not meet the QC criteria were automatically identified. Following QC, only XML files (one mouse unique ID/session) that passed QC were automatically uploaded to the database (mousebytes.ca) and integrated into the analytics tool TIBCO Spotfire to generate an interactive visualisation platform for 5-CSRTT, PAL, and PD (Figure 1, see also the online data visualisation - https://mousebytes.ca/data-visualization). Briefly, to navigate through the data in the MouseBytes visualisation (Spotfire), users first select the cognitive task in the dropdown menu. After the selection of the cognitive task, the corresponding features are selected (e.g. 5-CSRTT Probe trial for the 5-CSRTT data, etc.). A glossary with the description of the training and probe phases is found in MouseBytes (mousebytes.ca_description). Moreover, the user can check or uncheck the filter boxes on the right side of the page to define the data to be visualised and export or analyse specific graphs using the side tabs. This allows selected features, such as site, mouse strain, genotype, sex and age, to be quickly compared (for more information on how to use the data visualisation please check the methods - Data Quality Control and availability).
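The screening step described above can be illustrated with a minimal sketch. It is not the MouseBytes QC code (which is available in the GitHub repository cited above); the element names `AnimalID`, `Schedule` and `SessionDate`, and the rule that each must be present and non-empty, are hypothetical stand-ins chosen only to show the pattern of separating session files that pass from those that fail.

```python
import xml.etree.ElementTree as ET

# Hypothetical field names: the real ABET II export schema and the actual
# MouseBytes QC rules are not reproduced here.
REQUIRED_FIELDS = ("AnimalID", "Schedule", "SessionDate")


def qc_session(xml_text):
    """Return a list of QC problems for one session file (empty list = pass)."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as err:
        return [f"malformed XML: {err}"]
    problems = []
    for field in REQUIRED_FIELDS:
        node = root.find(field)
        # A field fails if the element is absent or contains only whitespace.
        if node is None or not (node.text or "").strip():
            problems.append(f"missing {field}")
    return problems


def screen(files):
    """Split {filename: xml_text} into a passed list and a failed dict."""
    passed, failed = [], {}
    for name, text in files.items():
        problems = qc_session(text)
        if problems:
            failed[name] = problems
        else:
            passed.append(name)
    return passed, failed
```

In a sketch like this, only the files in `passed` would move on to the database, while `failed` keeps the reason each file was flagged so that it can be corrected or discarded.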

Figure 1

Schematic overview of the automated touchscreen cognition platform.

Males and females of three different AD mouse lines were each evaluated in three different touchscreen tasks. The mice were food restricted and tested longitudinally and at two different sites (The University of Western Ontario (UWO) and University of Guelph (UoG) - Canada). Data were submitted to an automated QC process. Following automated QC, data were uploaded to an open-access database (mousebytes.ca) for post-processing analysis and visualisation using the analytics tool TIBCO Spotfire.

Our pipeline enabled collection of extensive amounts of data from mice of different ages, strains, and sexes. In total, we tested 652 different mice and generated 62,880 XML files (27,440 for 5-CSRTT, 17,548 for PAL, and 17,892 for PD). Importantly, by scanning the files through our automated QC procedure, we identified 487 files (0.8%) that did not meet the QC system criteria. The remaining 62,393 XML files (99.2%) passed the automated QC criteria and were transferred to the database. After QC, files that did not meet the criteria, or could not be fixed, were automatically discarded and were not used for analysis (see Materials and methods).

Due to the amount of data we generated in this work, presenting all the information contained in these datasets in a classical graphical format would require close to 30–40 figures with 6–10 panels each, depending on the kind of comparisons being performed. This situation underscores the need for online and on-the-fly data assessment, multidimensional visualisation, and comparison using online visualisation tools such as TIBCO Spotfire (Dunn et al., 2016; Pechter et al., 2016), which we used here.

Currently, our system has been optimised for the intake of data from the Bussey-Saksida Operant Chamber systems (Campden Instruments, Lafayette Instruments). There are alternative commercial touchscreen systems available (e.g. Med Associates K-Limbic touchscreen operant chambers), as well as several open source alternatives (O'Leary et al., 2018; Pineño, 2014; Wolf et al., 2014). In order to open the MouseBytes platform to all researchers using touchscreens, we have made code available for download and modification in GitHub (GitHub codes; Memar et al., 2019) to easily convert the output XML files from other systems to the format used in MouseBytes.

High-level cognitive testing in AD mouse models

The sample sizes for all experiments/tasks can be found in Supplementary file 1. Key parameters that were analysed for each experiment can be seen for 5-CSRTT: 5-CSRTT MouseBytes data link, PAL: PAL MouseBytes data link, and PD: PD MouseBytes data link. One of the features of this open-access database is the possibility of downloading a standardised dataset (using a hyperlink generated by MouseBytes) related to particular experiments (e.g. linked to a particular figure of a paper) to perform customised analyses. A series of videos that demonstrates how to use MouseBytes is available on the website (MouseBytes-Guidelines).

Statistical analysis of the performance of distinct AD mouse models was conducted using R, taking advantage of the fact that CSV files can be generated for specific datasets using MouseBytes, which facilitates the use of open-source statistical packages. A summary of the split-plot ANOVA of all behavioural measures for each genotype can be found in Supplementary file 2 (5-CSRTT), Supplementary file 3 (PAL), and Supplementary file 4 (PD). In addition, a second set of planned ANOVAs was conducted separately isolating cohorts by age and sex to identify potential genotype effects within select subpopulations of mice. Summary information with the complete secondary ANOVA statistics for all three tasks can be found in Supplementary file 5, Supplementary file 6 and Supplementary file 7. A summary of the results of these statistical analyses can be found in Table 1 (5-CSRTT) and Table 2 (PD and PAL). Specific analyses and links to each dataset for figures are presented below.
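As an illustration of how an exported CSV file can feed open-source analysis, the sketch below computes the mean and s.e.m. of one measure per genotype and stimulus duration. It is a descriptive summary only, not the split-plot ANOVA reported here (which was run in R), and the column names `genotype`, `stimulus_duration` and `accuracy` are assumptions rather than the actual MouseBytes export headers.

```python
import csv
import io
import math
import statistics
from collections import defaultdict


def summarise(csv_text, value_col="accuracy"):
    """Mean, s.e.m. and n of value_col per (genotype, stimulus_duration) cell.

    Column names are hypothetical; adapt them to the exported file.
    """
    cells = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        key = (row["genotype"], row["stimulus_duration"])
        cells[key].append(float(row[value_col]))

    summary = {}
    for key, values in cells.items():
        n = len(values)
        mean = statistics.mean(values)
        # s.e.m. = sample SD / sqrt(n); undefined when a cell has one animal.
        sem = statistics.stdev(values) / math.sqrt(n) if n > 1 else float("nan")
        summary[key] = (mean, sem, n)
    return summary
```

A table of this kind is a convenient sanity check on cell sizes and means before fitting a model with genotype as the between-subject factor and stimulus duration as the within-subject factor.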

Table 1

5-CSRTT analyses.

Summary of conventional genotype analyses on the 5-CSRTT task. Summary results were based on simple 2 (genotype) x 4 (stimulus duration) split-plot ANOVA. Impairment or Facilitation was determined by looking for a significant genotype effect or interaction. (3x – 3xTG-AD, 5x – 5xFAD and APP – APP/PS1) mouse lines. Impairment (↓), Improvement (↑) No Effect (-). See also Supplementary file 2 and 5.

		Accuracy			Omissions			Premature Responses			Perseverative Responses			Touch Latency			Reward Latency
Sex	Age (months)	3x	5x	APP	3x	5x	APP	3x	5x	APP	3x	5x	APP	3x	5x	APP	3x	5x	APP
Female	3-6	↓	-	↑	↓	-	↑	↑	-	-	↑	-	↓	↓	-	↑	-	↓	-
	7-10	↓	↓	-	-	-	↑	-	-	-	↑	↑	-	↓	-	-	-	↓	-
	11-13	↓	↓	-	-	-	-	-	-	-	↑	↑	-	↓	-	-	-	↓	-
Male	3-6	↓	-	-	-	-	-	-	-	-	-	-	↓	↓	-	-	↓	↓	-
	7-10	↓	-	-	-	-	-	-	-	-	-	↑	↓	↓	↓	-	↓	↓	-
	11-13	↓	↓	↑	↓	↓	-	-	-	-	-	↑	↓	↓	↓	-	-	↓	↓

Table 2

PD and PAL analyses.

Summary of conventional genotype analyses on the PD and PAL tasks. Summary results were based on simple 2 (genotype) x 4 (stimulus duration) split-plot ANOVA. Impairment or Facilitation was determined by looking for a significant genotype effect or interaction. (3x – 3xTG-AD, 5x – 5xFAD and APP – APP/PS1) mouse lines. Impairment (↓), Improvement (↑) No Effect (-). See also Supplementary file 3, 4, 6, 7.

			Accuracy			Correction Trials			Touch Latency			Reward Latency
Task	Sex	Age (months)	3x	5x	APP	3x	5x	APP	3x	5x	APP	3x	5x	APP
PD	Female	3-6	-	-	↓	-	↓	↓	↓	↓	↑	-	↓	↑
		7-10	-	↓	-	-	↓	↓	↓	↓	↑	-	↓	↑
		11-13	-	-	-	-	-	-	-	↓	↑	-	↓	-
	Male	3-6	↑	-	↓	↑	-	-	-	-	-	-	-	-
		7-10	-	-	-	-	-	-	-	-	-	-	↓	-
		11-13	-	-	-	-	-	-	↑	↓	-	-	↓	-
PAL	Female	3-6	-	-	↑	-	↓	↑	-	-	↑	↑	↓	↑
	Female	11-13	-	-	-	-	↓	-	-	↓	-	↑	↓	↓
	Male	3-6	↓	-	↓	↓	↓	↓	↓	↑	↑	↓	↓	-
	Male	11-13	-	↓	-	-	↓	-	-	-	-	↓	↓	-

Reliability in touchscreen testing

Variability of mouse performance in behavioural tests across different laboratories is an important issue for replicability (Crabbe et al., 1999; Kafkafi et al., 2018; Wahlsten et al., 2003). The use of automated and standardised testing can help decrease variability, although a wide range of factors including colony genetic drift (Zeldovich, 2017), light-dark cycle, types of cages and housing (single housed or group-housed), source of food, type and amount of reward, different types of environmental enrichment and colony room temperature/humidity conditions can still potentially contribute to variability (Kim et al., 2017). Furthermore, even though touchscreen tasks are automated and standardised, there is some level of flexibility in these tasks. We are aware that researchers, depending on the scientific question, may modify the experimental design (set of images, length of inter-trial intervals, number of trials and sessions per day, type and/or amount of reward, etc.), which can increase the number of variables for analysis. To keep track of these variables, we included in MouseBytes features that allow users to describe them as metadata. For example, when uploading XML files, the user must check boxes indicating the light cycle and whether mice were single or group housed. In addition, in the experimental description users can describe how mice were tested (e.g. number of trials and sessions per day). Furthermore, users can also link the digital object identifier (DOI) of their published article to datasets. With these additional sources of metadata, one can begin to determine which variables influence behavioural performance within the touchscreen environment.

To directly assess potential site variability in the current dataset stored in MouseBytes, we included site as a factor in our 5-CSRTT analyses. The 5-CSRTT task was chosen due to the larger cohorts of mice used in these experiments at the two sites. Throughout the analyses of 5-CSRTT measurements (see Materials and methods), no consistent pattern of main effects or interactions emerged for site between age, sex, genotype, strain, or measure (stimulus length) (Supplementary file 2, tabs 1, 2 and 3). For example, for interactions between test site, genotype, sex, age and stimulus length, only APP/PS1 mice presented a statistical difference in accuracy, whereas all other parameters (% omissions, number of premature responses, number of perseverative responses, reward collection latency and correct touch latency) for the three strains were not significantly different (Supplementary file 2, tabs 1, 2 and 3, lines 28 #a). Interactions with test site that were significant typically had a small effect size and lacked consistency across behavioural domains and mouse strains (Supplementary file 2, tabs 1, 2 and 3, #a). Overall, the evidence suggests low site-to-site variability and high replicability for touchscreen test performance. For example, low variability was observed between sites when we compared longitudinally the performance of wild-type female mice (B6129SF2/J) and their AD-mouse model counterpart, 3xTG-AD, in the 5-CSRTT task (attention). We observed a difference in accuracy (0.6 s stimulus duration) in wild-type females at 3–6 months of age between the two sites (Figure 2A, dataset 1). Other than that, no statistically significant differences between the two sites were found in either accuracy or omissions for B6129SF2/J females (Figure 2B, dataset 2; Figure 2C, dataset 3; Figure 2D, dataset 4) or 3xTG-AD females (Figure 2E and F, dataset 5 and dataset 6; Figure 2G and H, dataset 7 and dataset 8). We observed similarly reproducible results for mouse touchscreen performance across the other AD models and their wild-type counterparts (see mousebytes.ca for more comparisons). As an important feature, MouseBytes allows the generation of dataset hyperlinks to easily identify and download the raw data used to generate each figure panel (dataset 1, 2, etc.).

Figure 2

Performance and response measures of Male and Female mice during 5-CSRTT probe trials.

Mice were subjected to a series of probe trials and the averages of accuracy (% correct), omissions (%) and premature responses (number) were plotted at different ages. The plots were generated with data downloaded from MouseBytes and the links (datasets) for the individual analysis can be found in the results section. (**A-D**) Longitudinal site comparison of the performance (accuracy and omissions) of female wild-type controls (B6129SF2/J) at 3–6 and 11–13 months of age; (**E-H**) longitudinal site comparison of the performance (accuracy and omissions) of female 3xTG-AD mice at 3–6 and 11–13 months of age; (**I-L**) comparison of the performance (accuracy and omissions) of 3xTG-AD male mice and their wild-type controls (B6129SF2/J) at 3–6 and 11–13 months of age; (**M-P**) comparison of the performance (accuracy and omissions) of 3xTG-AD female mice and their wild-type controls (B6129SF2/J) at 3–6 and 11–13 months of age. Results are presented as means ± s.e.m.; data were analysed and compared using repeated-measures two-way ANOVA and Bonferroni multiple comparisons post-hoc test; *p<0.05, compared with control.

Previous experiments have detected robust attentional deficits in 11-month-old male 3xTG-AD mice, with lower accuracy in the 5-CSRTT and no differences in omissions compared to wild-type controls (Romberg et al., 2011). We tested male 3xTG-AD mice at the same age and reproduced the cognitive signature pattern of attentional deficit as previously published for male mice (Figure 2J, dataset 9 for accuracy; Figure 2L, dataset 10 for omissions). In addition, we also tested female mice and, similar to the males, 3xTG-AD female mice presented lower accuracy (Figure 2N, dataset 11) and no difference in omissions (Figure 2P, dataset 12) when compared to the wild-type controls. Moreover, both male and female 3xTG-AD mice that were tested starting at 4 months of age also presented lower accuracy (Figure 2I, dataset 13 and Figure 2M, dataset 14) and no difference in omissions (Figure 2K, dataset 15 and Figure 2O, dataset 16) when compared to controls (Table 1 and Supplementary file 2 and Supplementary file 5 - Tab 1, #b). We also examined vigilance (the ability to maintain concentrated attention over a prolonged period of time), which was previously reported to be affected in this mouse line (Romberg et al., 2011), by characterising performance across blocks of 10 trials. A complete breakdown of all the vigilance analyses can be found in Supplementary file 8 (Tab 1). Reduced vigilance across trials was reflected in a deficit in accuracy in 3xTG-AD males (10–11-month-old mice used as example, Figure 3A–D and Supplementary file 8 - Tab 1, #c) and in 3xTG-AD female mice (3–6-month-old mice used as example, Figure 3I–L and Supplementary file 8 - Tab 1, #c). No differences were observed for omissions (Figure 3E–H, M–P and Supplementary file 8 - Tab 1). These experiments support the replicability we observed between sites and suggest that 3xTG-AD mice present robust attentional deficits that can be observed across several laboratories even when a different genetic background is used. Because genetic drift can potentially affect reproducibility in mouse behaviour testing (Zeldovich, 2017), identification of robust deficits of high-level cognition resulting from AD-related pathology is important for developing drug treatments. Attention deficit in the 3xTG-AD line appears to be one such outcome.

Figure 3

Sustained attention (vigilance) of 3xTG-AD male and female mice during the 5-CSRTT probe trials.

Response accuracy and omissions in wild-type and 3xTG-AD male (A-D for accuracy and E-H for omissions) and female (I-L for accuracy and M-P for omissions) mice were analysed across 10-trial blocks within the daily sessions of 50 trials with different stimulus durations. Results are presented as means ± s.e.m.; data were analysed and compared using repeated-measures two-way ANOVA and Bonferroni multiple comparisons post-hoc test; **p<0.01 compared with control.
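The block-wise vigilance analysis in this legend can be sketched as follows. The sketch assumes the conventional 5-CSRTT definitions, which are not spelled out in this section: accuracy is the percentage of correct responses among trials with a response, and omissions are the percentage of all trials with no response. The outcome labels and the function name are illustrative, not taken from the MouseBytes code.

```python
def blockwise_measures(outcomes, block_size=10):
    """Accuracy (%) and omissions (%) per consecutive block of trials.

    outcomes: per-trial labels in session order, each one of
    'correct', 'incorrect' or 'omission' (hypothetical labels).
    """
    results = []
    for start in range(0, len(outcomes), block_size):
        block = outcomes[start:start + block_size]
        correct = block.count("correct")
        incorrect = block.count("incorrect")
        omitted = block.count("omission")
        responded = correct + incorrect
        results.append({
            "block": start // block_size + 1,
            # Accuracy is undefined if every trial in the block was omitted.
            "accuracy": 100.0 * correct / responded if responded else None,
            "omissions": 100.0 * omitted / len(block),
        })
    return results
```

For a 50-trial session this yields five blocks; a decline in accuracy from the first to the last block, without a matching rise in omissions, is the pattern described in the text as reduced vigilance.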

The ability to compare mouse performance between sites can provide important insights on sources of variability for experiments. Replication experiments are the gold standard to validate scientific discoveries, but particularly in conventional rodent cognitive testing, variability of results is an issue. For example, different mouse models of AD present behavioural changes that are quite variable between laboratories when using conventional behaviour testing (Arendash et al., 2001; Clinton et al., 2007; Ding et al., 2008; Holcomb et al., 1999; Ostapchenko et al., 2015; Stevens and Brown, 2015). The combination of touchscreen cognitive testing and MouseBytes may help to identify sources of variability to overcome issues of replication in rodent high-level cognitive analysis.

In order to expand and further refine our understanding of the cognitive deficits in 3xTG-AD mice, we conducted two additional touchscreen-based cognitive assessments. Of particular interest was the PAL task, a relevant test for AD progression as the CANTAB version of PAL has been found to be predictive of conversion from mild cognitive impairment to AD (Junkkila et al., 2012). Moreover, forebrain cholinergic dysfunction, which is found in AD, impairs performance of mice in the PAL test (Al-Onaizi et al., 2016). We observed a small but significant deficit in the PAL task for 3xTG-AD male mice at four months of age (Table 2, Supplementary file 3 and Supplementary file 6 - Tab 1, #d). 3xTG-AD mice did not show any sign of deficits in visual discrimination learning or behavioural flexibility in PD (Supplementary file 4 and Supplementary file 7 - Tab 1). Overall, the cognitive phenotype of these mice resembled that of patients with early AD, who present early deficits in sustained attention (Perry et al., 2000) and visual-spatial learning (Blackwell et al., 2004), but not in behavioural flexibility (Sahakian et al., 1989).

In addition to testing 3xTG-AD mice, we also tested APP/PS1 and the widely used 5xFAD mouse line. We chose to use the 5xFAD mice in a mixed genetic background (C57Bl6 and Swiss Jim Lambert, SJL), as this was the original background in which this mouse line was generated (Oakley et al., 2006), and it is the most commonly used background across several laboratories (Qosa and Kaddoumi, 2016). However, the SJL genetic background carries the Pdeb^rd1 mutation (Clapcote et al., 2005) (see Materials and methods), which can lead to retinal degeneration and causes severe visual impairment in homozygous animals. Given that some of the 5xFAD mice could be heterozygous for the Pdeb^rd1 mutation, we evaluated whether carrying one Pdeb^rd1 allele affected the performance of mice in touchscreens using the PD task, which directly measures visual discrimination. The performance of mice carrying one Pdeb^rd1 allele did not differ from that of mice that did not (Figure 2—figure supplement 1A, B and C). Moreover, as touchscreen testing requires food restriction for motivation, we also assessed whether the food restriction protocols used for touchscreen cognitive testing affected amyloid production in 3xTG-AD and 5xFAD mice. We found no differences in amyloid production and deposition between food-restricted and non-food-restricted animals (Figure 2—figure supplement 2 and Figure 2—figure supplement 3).

The 5xFAD transgenic mouse line displayed a complex cognitive phenotype. Female 5xFAD mice displayed deficits in sustained attention in the 5-CSRTT task beginning at 7 months, while males showed deficits by 11 months (Table 1, Supplementary file 2 and Supplementary file 5 - Tab 2, #e). Initial training on the PAL task did not generate robust results as both 5xFAD mice and their controls (both male and female) were poor performers (Supplementary file 3 and Supplementary file 6 – Tab 2). This highlights the utility of MouseBytes in assessing cognitive testing of a given mouse line by comparing with other lines in the database. However, a simplified version of the test revealed significant visual-spatial deficits at 10 months of age for both male and female 5xFAD mice (Supplementary file 3 and Supplementary file 6 – Tab 2, #f). No deficits in behavioural flexibility or visual discrimination learning were observed for 5xFAD mice when compared to their respective controls (Supplementary file 4 and Supplementary file 7 – Tab 2). The 5xFAD mouse line displays a subtler behavioural phenotype than the 3xTG-AD line, but one that is still consistent with the impairments observed in AD. Interestingly, amyloidosis has been reported to start earlier and to be more aggressive in the 5xFAD line (Oakley et al., 2006) than in the 3xTG-AD line (Oddo et al., 2003). However, our results showed earlier development of deficits in 3xTG-AD mice compared to the 5xFAD line, suggesting that this could be related to abnormal Tau function in the 3xTG-AD mouse line (Oddo et al., 2003).

APP/PS1 mice (male or female) did not show any sign of attentional deficits in the 5-CSRTT task at any age (Supplementary file 2 and Supplementary file 5 – Tab 3), similar to what was observed independently in a different background for this strain (Shepherd et al., 2019). However, APP/PS1 male mice presented an early deficit in visual-spatial integration learning in the PAL task, which is consistent with the 3xTG-AD mouse phenotype (Supplementary file 3 and Supplementary file 6 – Tab 3, #g). Furthermore, an early deficit in behavioural flexibility was observed for female APP/PS1 mice at four months of age in the PD reversal task (Supplementary file 4 and Supplementary file 7 – Tab 3, #h). This is interesting from the standpoint of the translational utility of this mouse model, as behavioural flexibility deficits are not typically associated with AD at early stages of disease progression (Sahakian et al., 1989).

Although each touchscreen task is generally run across labs using the same set of task-specific stimuli, task stimuli are being optimised continuously (Horner et al., 2013; Mar et al., 2013). Furthermore, some researchers have run tasks multiple times within a cohort of mice using different stimuli (Bartko et al., 2011) and have found that the performance of animals may vary with the stimulus set used (Bussey et al., 2008). We extracted cross-site data from mice using different stimulus sets to show that, depending on the image set used in PD or PAL, mice can indeed reach higher or lower levels of discrimination accuracy (Figure 1—figure supplement 1). Our data indicate that, for PAL and PD or other touchscreen tasks using complex visual stimuli, longitudinal testing should be preceded by appropriate control experiments to avoid potential bias with image sets. We anticipate that when more data are available in MouseBytes, the touchscreen community will be able to compare a larger number of image sets and identify optimal stimulus combinations.

Genetic background and touchscreen performance

The choice of background strain for mouse models of disease can have major implications for cognitive assessment (Sittig et al., 2016). However, due to the absence of a framework within which to aggregate behavioural data, comparison of the performance of different mouse strains has been limited. For example, in previous work data acquisition needed to be standardised across laboratories to gather information on how genetic background influences performance (Graybeal et al., 2014). We compared the performance of mice from three different wild-type strains in the initial dataset deposited in MouseBytes (B6129SF2/J, B6SJLF1/J and C57BL/6J background). To directly assess strain variability in touchscreen test performance, data from 5-CSRTT experiments were extracted from MouseBytes and analysed (similar analyses can be performed for other tests by extracting the datasets from MouseBytes). Interestingly, both female (Figure 4A, dataset 17) and male (Figure 4C, dataset 18) B6129SF2/J mice presented higher levels of accuracy on the 5-CSRTT at 3–6 months of age, but not at 11–13 months of age (Figure 4B, dataset 19 and Figure 4D, dataset 20), when compared to the other two wild-type strains tested (B6SJLF1/J and C57BL/6J). Moreover, both male and female B6SJLF1/J mice (3–6 and 11–13 months of age) were found to engage in more premature responses than the B6129SF2/J and C57BL/6J lines (Figure 4E–H, datasets 21, 22, 23 and 24). This suggests a general phenotype of impulsiveness inherent to these B6SJLF1/J mice. We envision that with multiple users depositing their data in MouseBytes, it will be relatively easy to make comparisons of performance for thousands of mice from different strains. Ultimately, these overarching analyses could help to inform the choice of background strains for new mouse lines used to investigate specific high-level cognitive domains, for example, models that can now be generated using new genome-editing techniques such as CRISPR/Cas.

Figure 4

Performance and response measures of male and female mice during the 5-CSRTT probe trials.

(**A-D**) Strain/mouse background comparison (accuracy) of female and male wild-type controls (B6129SF2/J, B6SJLF1/J, C57BL/6J) at 3–6 and 11–13 months of age; (**E-H**) Strain/mouse background comparison (premature responses) of female and male wild-type controls (B6129SF2/J, B6SJLF1/J, C57BL/6J) at 3–6 and 11–13 months of age; (**I-L**) Sex comparison (accuracy) of B6129SF2/J and 3xTG-AD females and males at 3–6 and 11–13 months of age. Results are presented as means ± s.e.m.; data were analysed and compared using repeated-measures two-way ANOVA and Bonferroni multiple-comparisons post-hoc test; *p<0.05, **p<0.01 and ***p<0.001 compared with control.

Sex variability

Recognition that behavioural rodent research is biased towards using male mice has led funding institutions to establish specific guidelines for the choice of animals for research (McCullough et al., 2014). Several neurobiological differences are present between male and female brains (Grissom and Reyes, 2019; Ruigrok et al., 2014). In the context of AD, there are sex differences in the pathological development of plaques and tangles (Corder et al., 2004), and sex steroid hormone levels can contribute to some of these effects (Carroll et al., 2010; Carroll et al., 2007). To highlight the potential for sex comparisons in high-level cognitive assessment, we initially compared 3xTG-AD mice, a mouse line that presents sex variability in pathology (Carroll et al., 2010; Carroll et al., 2007). We found no major differences in attentional performance (accuracy) between male and female mice in the 5-CSRTT, as shown for the B6129SF2/J controls at 3–6 months (Figure 4I, dataset 25) or 11–13 months of age (Figure 4J, dataset 26). Similarly, no differences were found when 3xTG-AD males and females were compared at 3–6 (Figure 4K, dataset 27) or 11–13 months of age (Figure 4L, dataset 28). These results suggest little or no difference in attentional performance between male and female mice of these lines. To the best of our knowledge, the dataset presented here and deposited in MouseBytes provides the most extensive evaluation of the performance of female mice in touchscreen tests.

Unbiased analysis of behavioural performance

While conventional ANOVA-based statistics can be employed to measure changes in behaviour, large datasets can make this approach difficult. To address these challenges, alternative types of analysis may be necessary. One potential solution is to employ machine learning algorithms. To provide an example of the high-level analyses that can describe large behavioural datasets extracted from MouseBytes, we generated a summary of the touchscreen behavioural data using a k-mean classification approach. This approach belongs to a class of unsupervised learning algorithms that can identify group clustering without any bias towards the behavioural measures or the sample identifiers, such as genotype, age, sex, or test site. Recently, researchers have used longitudinal k-mean algorithms to subcategorise different cohorts of AD patient populations that had previously been classified as one large group based on disease progression (Genolini et al., 2016). We used the R package kml3d (Genolini et al., 2013) to conduct longitudinal k-mean unsupervised grouping of all the data for the three mouse lines. For all tasks, data were grouped into three categories that loaded onto a similar progression of behavioural metrics: high performance, moderate performance, and low performance (Figure 5—figure supplement 1). We chose three groups for all tasks in order to be consistent across domains, as well as to capture more subtlety in the clustering of behavioural characteristics. Because we tested animals at two or three temporally separated intervals, we treated each animal’s observations at each testing interval as independent. This allowed mice to change membership across the k-mean groups. We expected that changes in k-mean group membership would likely indicate cognitive decline in our animal populations.
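To make the grouping step concrete, the following is a minimal sketch of plain k-means clustering (Lloyd's algorithm) into three performance groups. It is not the longitudinal kml3d procedure used for the analyses here, and the feature values (accuracy and a rescaled correction-trial count), the function names and the farthest-point initialisation are illustrative assumptions rather than part of the MouseBytes pipeline.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def farthest_point_init(points, k):
    """Deterministic seeding: start at the first point, then repeatedly add
    the point that lies farthest from all centroids chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        )
    return [list(c) for c in centroids]


def kmeans(points, k, max_iter=100):
    """Plain k-means: alternate assignment and centroid update until stable."""
    centroids = farthest_point_init(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:
            break
        labels = new_labels
        # Update step: each centroid moves to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical per-animal features: (accuracy, rescaled correction trials).
points = [
    (0.90, 0.10), (0.88, 0.12), (0.92, 0.08),  # high accuracy, few corrections
    (0.70, 0.40), (0.72, 0.38), (0.68, 0.42),  # intermediate
    (0.50, 0.80), (0.52, 0.78), (0.48, 0.82),  # low accuracy, many corrections
]
labels, centroids = kmeans(points, k=3)
print(labels)
```

Cluster labels are arbitrary integers, so in practice the groups would be named (high, moderate, low) only after inspecting the centroids. Features measured on different scales would usually be standardised before clustering.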
Following k-mean grouping, we used Fisher’s exact test to determine whether k-mean group membership differed significantly between transgenic mice and their respective control strains. To separate potential sex effects from genotype variation, we conducted these comparisons separately for male and female mice. Because observations were treated as independent across ages, we also conducted the analyses separately for each testing period. To account for multiple comparisons with Fisher’s exact test, the Benjamini-Hochberg correction for false discovery rate was applied (Table 3). Visualisation of the k-mean group memberships by strain, task, age, and sex can be found in Figure 5.
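The two statistical steps described above can be sketched as follows: an exact test on a 2 x 3 table of group counts (wild type versus transgenic across the high, mid and low groups), followed by a Benjamini-Hochberg adjustment of the resulting p-values. This is an illustrative pure-Python sketch, not the code used to produce Table 3; the function names, the example counts and the raw p-values are all hypothetical.

```python
from itertools import product
from math import comb


def fisher_exact_2xk(wild_type, transgenic):
    """Exact two-sided p-value for a 2 x k table of counts (Freeman-Halton).

    Every table sharing the observed row and column totals is enumerated; the
    p-value is the summed probability of the tables that are no more likely
    than the observed one under the null hypothesis of no association.
    """
    col_totals = [w + t for w, t in zip(wild_type, transgenic)]
    n_wt = sum(wild_type)

    def weight(first_row):
        # Unnormalised hypergeometric probability of a candidate first row.
        w = 1
        for total, count in zip(col_totals, first_row):
            w *= comb(total, count)
        return w

    observed = weight(wild_type)
    extreme = 0
    for row in product(*(range(t + 1) for t in col_totals)):
        if sum(row) == n_wt and weight(row) <= observed:
            extreme += weight(row)
    return extreme / comb(sum(col_totals), n_wt)


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical counts of mice in the (high, mid, low) k-mean groups.
p_single = fisher_exact_2xk(wild_type=[6, 5, 1], transgenic=[1, 4, 7])

# Hypothetical raw p-values from several strain/sex/age comparisons.
raw_p = [0.01, 0.04, 0.03, 0.005]
adjusted = benjamini_hochberg(raw_p)
print(p_single, adjusted)
```

Enumerating all tables is practical only for small group sizes such as those in a single strain, sex and age cell; larger tables would call for an established statistics library.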

Figure 5 with 1 supplement

Heatmap visualisations of k-mean group membership across experiments.

The percentage of group representation per strain is shown by the cell color. Cell color closer to red indicates a higher representation of mice in the k-mean grouping. Mice are divided by sex (male and female), genotype (W for wild type and T for transgenic), and age (3–6, 7–10, and 11–13 months). Analysis of the PAL (A) task data uncovered early group membership variability in the APP/PS1 mouse line, as well as later group membership differences in the male 5xFAD mice. In the PD (B) experiment, only APP/PS1 transgenic mice showed an increase in membership of the low-performing group compared to wild type. See also S5. In the 5-CSRTT (C), transgenic 3xTG-AD and 5xFAD mice were found to shift to the lowest-performing group of mice.

Table 3

p-values from Fisher’s exact test for k-mean group membership.

Group differences between wild-type and transgenic mice across behavioural experiments. Fisher’s exact test was conducted to compare the % membership of the high, mid, and low k-mean groups between wild-type and transgenic mice for each strain, sex, and age. All p-values have been adjusted with the Benjamini-Hochberg correction.

		3xTG-AD		5xFAD		APP/PS1
Task	Age (months)	Female	Male	Female	Male	Female	Male
5-CSRTT	3-6	.03	.01	.03	.06	.17	.13
	7-10	.19	.06	.02	.01	.77	.07
	11-13	.01	.39	.02	<0.001	1.00	.19
PD	3-6	.34	.25	<0.001	.92	.02	.69
	7-10	.81	.16	<0.001	.06	.42	.61
	11-13	1.00	.36	<0.001	.03	.18	.81
PAL	3-6	.59	.19	.33	.19	.06	.03
	11-13	.77	.95	.03	.01	1.00	.19

In the PAL task, the behaviour of high-performing mice was characterised by high accuracy (% correct) and low numbers of correction trials (Figure 5—figure supplement 1A and B). Mid-performing mice had reduced accuracy and slow correct-response and reward-collection latencies (Figure 5—figure supplement 1A,C and D). The low-performing mice showed consistently low accuracy and high numbers of correction trials (errors) (Figure 5—figure supplement 1A and B). The number of mice belonging to each cluster can be found in Figure 5—figure supplement 1E. No significant differences in k-mean group membership were found for the 3xTG-AD mice at any age (Figure 5 and Table 3). Differences in membership were significant for 5xFAD mice at 10–11 months of age: more male 5xFAD transgenic mice were in the low-performing group, whereas more female 5xFAD transgenic mice were in the mid-performing group (Table 3). Interestingly, at four months of age, most 5xFAD wild-type and transgenic mice were low performers, suggesting that the background strain may affect performance on this task, consistent with our observation from the traditional analysis. Significant group membership differences were observed for APP/PS1 mice at four months of age: more female transgenic mice were found in the high-performing group (Figure 5), while male APP/PS1 transgenic mice were in the low-performing group (Figure 5 and Table 3). These data suggest that while the PAL task might be a good behavioural predictor in human AD, further studies are needed to establish whether this effect is consistently observed across multiple AD mouse models. This is consistent with the small effect sizes in PAL, except for 10-month-old 5xFAD mice (Supplementary file 3 and Supplementary file 6, Tab 2).

In the PD task, the behaviour of high-performing mice was characterised by high response accuracy (% correct) and a low number of correction trials (errors) (Figure 5—figure supplement 1F and G). The typical behaviour of the mid-performing mice included slower correct-response and reward-collection latencies to the test stimuli (Figure 5—figure supplement 1H and I). Low-performing mice showed a pattern of low accuracy (% correct) and a high number of correction trials (Figure 5—figure supplement 1F and G). The number of mice belonging to each cluster can be found in Figure 5—figure supplement 1J. No significant group composition differences were observed for 3xTG-AD mice (Table 3). Fisher’s exact test revealed significant differences in k-mean group composition for 5xFAD mice, as more transgenic mice occupied the mid-performing group, while wild-type control mice occupied the low-performing group (Table 3). For APP/PS1 mice, a significant difference in group composition was observed only for females at four months, with more transgenic than wild-type mice in the low-performing group (Table 3). Overall, the different pattern of results in PD suggests that this task was not particularly sensitive to AD-related pathological changes. Separation of performance between transgenic and wild-type mice in the expected direction was only observed for the female APP/PS1 mice.

In the 5-CSRTT, high-performing mice were characterised by high accuracy (% correct), low omissions, and higher perseverative responses (Figure 5—figure supplement 1K, L and M). Mid-performing mice had average response accuracy and rates of omissions (Figure 5—figure supplement 1K, L), but showed an increase in premature responses (Figure 5—figure supplement 1N). The low-performing mice showed low accuracy, high omissions, slow response times and slow reward collection latency (Figure 5—figure supplement 1K, L, O and P). The number of mice belonging to each cluster can be found in Figure 5—figure supplement 1Q. Significant k-mean membership differences were observed for 3xTG-AD mice, consistently at 4 months of age, and transgenic mice were usually clustered as low performers (Figure 5 and Table 3). Fisher’s exact test revealed significant k-mean membership differences between 5xFAD transgenic mice and their respective control mice across all ages (Table 3), and the 5xFAD mice showed consistently low performance. These results suggest that both 5xFAD and 3xTG-AD transgenic mice are consistently lower performers (shifted to the lower performance group) than their WT counterparts (shifted to the higher and moderate performance groups) in the 5-CSRTT, and that this test might therefore be a good candidate for screening cognitive symptoms in these two mouse models of AD. Interestingly, these differences were not observed for the APP/PS1 mice. In fact, there was no difference in the performance of female APP/PS1 mice at any age tested (3–6, 7–10 or 11–13 months). Surprisingly, however, APP/PS1 male mice tended to shift towards the higher and moderate performance groups, while the WT mice shifted towards the lower performance group.

Curiously, across all touchscreen paradigms, male and female 5xFAD mice consistently displayed increased reward collection latency (Tables 1 and 2, Supplementary file 2, 3, 4, 5, 6 and 7 – Tab 2, #i), suggesting that these mice may have reduced motivation or abnormal motor function at the ages tested, as has been described previously (O'Leary et al., 2018).

Collectively, the data show different patterns of cognitive abnormalities between the three AD models, which may be related to the different human AD mutations they carry and the pathophysiology associated with them, including the tau mutation in the 3xTG-AD mouse line. Overall, two of the three lines showed a consistent deficit in attention, and all lines presented modest but significantly lower performance in PAL. In addition, we observed sex differences in PAL for 3xTG-AD and APP/PS1 (Supplementary file 3, tabs 1 and 3 #d and #g respectively), which could be due to differences in cellular and molecular mechanisms in brain development (Grissom and Reyes, 2019; Ruigrok et al., 2014) and/or to differences in AD-type pathology, disease onset and progression rate between males and females.

Conclusions and next steps

Here we introduce an open-access high-throughput pipeline and a Web application database that facilitate the storage, searching, and analysis of touchscreen data. The MouseBytes data integration platform introduces quality control for high-throughput touchscreen testing in an open-source platform for the dissemination of high-level cognitive data. Including standardised data from different laboratories around the world will bring the advantages of open-access data sharing and greatly enhance validation, comparison and post-publication analysis of large datasets by independent researchers. Furthermore, this approach facilitates collaboration to increase replicability/reproducibility and re-use of cognitive data, and should ultimately increase the accuracy of predictions regarding cognitive phenotypes and outcomes in drug efficacy studies.

Currently, several different species can be tested using touchscreens for cognitive assessment, including rats, primates, monkeys, birds, and dogs (Bussey et al., 2008; Charles et al., 2004; Guigueno et al., 2015; Horner et al., 2013; Kangas and Bergman, 2012; Mar et al., 2013; Nagahara et al., 2010; Rodriguez et al., 2011; Schmitt, 2018; Steurer et al., 2012; Wallis, 2017). Our current scripts can facilitate the formatting of files from such studies, so that data from different species, including rats, can ultimately be incorporated into MouseBytes or similar platforms. Moreover, one can easily envision outputs from de-identified human touchscreen cognitive testing being stored and accessed using similar repositories and Web applications. Given the potential for identical touchscreen tests in mice and humans, these data may prove valuable for understanding the consequences of specific mutations for high-level cognition (Nithianantharajah et al., 2015).

A major publication bias is the lack of published null datasets, which are important for avoiding wasted resources. MouseBytes provides a platform for the dissemination of datasets for touchscreen cognitive assessment even when results show no change in high-level cognition. We anticipate that researchers using automated touchscreen tests will benefit by making their original data available to the community as an integral part of the scientific record and publication. This database will become increasingly valuable as data from more strains of mouse models of disease, drug treatments and genetic manipulations are deposited. Furthermore, as an open-source project, MouseBytes will be built as a platform where the research community can contribute new features and share new code for data analysis. Indeed, MouseBytes is part of a large open initiative for the touchscreen/cognitive behaviour community, which also includes touchscreencognition.org, a knowledge-sharing platform that allows storage of protocols and community-driven discussions.

We envision that with MouseBytes it will become easier to connect transcriptomic data and different modalities of imaging data from mouse models to their cognitive performance. Ultimately, the integration of current and new touchscreen tests with the use of MouseBytes will change how cognitive function is evaluated in rodents, facilitating the discovery of new therapeutic approaches for neurodegenerative and neuropsychiatric disorders.

Materials and methods

Key resources table

Reagent type (species) or Resource	Designation	Source or reference	Identifiers	Additional Information
Antibody	6E10 Primary Antibody (Human Monoclonal)	Covance	RRID:AB_2564652 Lot#: D13EF01399 Cat#: SIG-39320	IF (1:200)
Antibody	488 Goat Anti-Mouse Secondary Antibody (Mouse Polyclonal)	Invitrogen	RRID:AB_2534069 Cat#: A-11001	IF(1:1000)
Commercial Assay Kit	Amyloid Beta 42 Human ELISA Kit – Ultrasensitive	Invitrogen	Cat#:KHB3544
Strain	B6.Cg-Tg[APPswe,PSEN1dE9]85Dbo/Mmja (Mouse, Male, Female)	Jackson Laboratories	RRID:MGI:034832-JAX Stock#: 034832-JAX
Strain	(B6;129-Psen1^tm1MpmTg[APPSwe,tauP301L-1Lfa 0 (Mouse, Male, Female)	Jackson Laboratories	RRID:MGI:101045-JAX Stock#: 101045-JAX
Strain	B6SJL-Tg(APPSwFlLon,PSEN1M146LL286V)6799Vas/Mmja (Mouse, Male, Female)	Jackson Laboratories	RRID:MGI:034840-JAX Stock#: 034840-JAX
Strain	C57BL/6 (Mouse, Male, Female)	Jackson Laboratories	RRID:MGI:000664-JAX Stock#: 000664
Strain	B6129SF1/J (Mouse, Male, Female)	Jackson Laboratories	RRID:MGI:101043 Stock#: 101043
Strain	B6SJLF1/J (Mouse, Male, Female)	Jackson Laboratories	RRID:MGI:100012 Stock#: 100012
Software	ABET II Touch	Lafayette Neuroscience	Model#: 89505
Software	Spotfire	TIBCO	https://www.tibco.com/products/tibco-spotfir
Software	Touchscreen Quality Control System	BrainsCAN	https://github.com/srmemar/Mousebytes-QualityControl

Author details

Flavio H Beraldo
Daniel Palmer
Sara Memar
David I Wasserman
Wai-Jane V Lee
Shuai Liang
Samantha D Creighton
Benjamin Kolisnyk
Matthew F Cowan
Justin Mels
Talal S Masood
Chris Fodor
Mohammed A Al-Onaizi
Robert Bartha
Tom Gee
Lisa M Saksida
Timothy J Bussey
Stephen S Strother
Vania F Prado
Boyer D Winters
Marco AM Prado