Hierarchical Bayesian modeling of multiregion brain cell count data

  1. Sydney Dimmock (corresponding author)
  2. Benjamin MS Exley
  3. Gerald Moore
  4. Lucy Menage
  5. Alessio Delogu
  6. Simon R Schultz
  7. E Clea Warburton
  8. Conor J Houghton (corresponding author)
  9. Cian O'Donnell (corresponding author)
  1. School of Engineering Mathematics and Technology, University of Bristol, Michael Ventris Building, United Kingdom
  2. School of Physiology, Pharmacology and Neuroscience, University of Bristol, Biomedical Sciences Building, University Walk, United Kingdom
  3. Centre for Neurotechnology and Department of Bioengineering, Imperial College London, South Kensington, United Kingdom
  4. Department of Basic and Clinical Neuroscience, Institute of Psychiatry, Psychology and Neuroscience, King’s College London, United Kingdom
  5. School of Computing, Engineering and Intelligent Systems, Ulster University, United Kingdom
14 figures, 4 tables and 1 additional file

Figures

Figure 1
Introduction.

(A) Each of N animals produces a cell count from a total of R brain regions of interest. Cell-count data are typically undersampled, with N ≪ R. Scientists analyze the brain sections from the experiment for positive signals. Here, an example section is shown where teal points mark cells expressing the immediate early gene c-Fos (green and red lines indicate regions labeled as damaged). The final cell count is equal to the sum of these individual items. Sagittal brain map taken from the Allen mouse brain atlas: https://mouse.brain-map.org. (B) Partial pooling is a hierarchical structure that jointly models observations as draws from a shared population distribution. It is a continuum that depends on the value of the population standard deviation τ. When τ=0, there is no variation in the population, and each individual observation is modeled as a conditionally independent estimate of some fixed population mean θ (complete pooling). As τ tends to infinity, observations do not combine inferential strength; each instead informs an independent estimate γi (no pooling). In between the two extremes, each observation can contribute to the population estimate while simultaneously supporting a local one to effectively model the variance in the data. The observed data quantities, y1 to yn, are highlighted with a thick line in the model diagrams. (C) An example of partial pooling on simulated count data. As the population standard deviation increases on the x-axis, the individual estimates exp(γi) trace a path from a completely pooled estimate to an unpooled estimate. Circular points give the raw data values. Parameters are exponentiated because the outcomes are Poisson and so parameters are fit on the log scale.
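The continuum described in panels B and C can be illustrated with a minimal Python sketch. It is not the authors' code: it assumes a single Poisson observation y with rate exp(γ), a prior γ ~ Normal(θ, τ²) with θ held fixed (in the full model θ is estimated too), and it reports the conditional posterior mode rather than the posterior mean. The function name `pooled_mode` and the example counts are hypothetical.

```python
import math

def pooled_mode(y, theta, tau, iters=200):
    """Conditional posterior mode of gamma for y ~ Poisson(exp(gamma)),
    gamma ~ Normal(theta, tau^2), found by bisection on the gradient."""
    if y <= 0 or tau <= 0:
        raise ValueError("this sketch assumes a positive count and tau > 0")

    def grad(g):
        # Derivative of the log posterior with respect to gamma.
        return y - math.exp(g) - (g - theta) / tau**2

    # The mode lies between the prior mean and the unpooled estimate log(y):
    # grad(lo) >= 0 and grad(hi) <= 0, so the root is bracketed.
    lo, hi = sorted((theta, math.log(y)))
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if grad(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

counts = [3, 10, 40]                          # hypothetical raw counts
theta = math.log(sum(counts) / len(counts))   # fixed population mean, log scale
for tau in (0.01, 0.3, 1.0, 100.0):
    estimates = [math.exp(pooled_mode(y, theta, tau)) for y in counts]
    print(tau, [round(e, 2) for e in estimates])
```

For small τ every estimate sits near exp(θ), and for large τ each one approaches its own raw count, which is the path traced in panel C.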

Figure 2
Methods.

A table of partial pooling behavior for different likelihood and prior combinations. Rows are the two prior choices for the population distribution, and columns the two distributions for the data. Within each cell, the expectation of the marginal posterior p(exp(γi)|θ,τ,y) is plotted as a function of τ. The solid black line is the expectation of the marginal posterior p(θ|τ,y) with one standard deviation highlighted in gray. Top left: Combining a normal prior for the population with a Poisson likelihood is unsatisfactory in the presence of a zero observation. The zero observations influence the population mean in an extreme way owing to their high importance under the Poisson likelihood. Bottom left: By changing to a horseshoe prior, the problematic zero observations can escape the regularization machinery. However, regularization of the estimates with positive observations is much less impactful. Top right: A zero-inflated Poisson likelihood accounts for the zero observations with an added process, reducing the burden on the population estimate to compromise between extreme values. Bottom right: No model.
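To see why a zero observation weighs so heavily under a Poisson likelihood, and how zero inflation relieves that pressure, the two probability mass functions can be compared directly. The sketch below uses the standard zero-inflated Poisson mixture; the rate of 20 and the inflation probability of 0.1 are illustrative values, not estimates from the paper.

```python
import math

def poisson_pmf(y, lam):
    """Poisson probability of count y at rate lam > 0."""
    return math.exp(-lam + y * math.log(lam) - math.lgamma(y + 1))

def zip_pmf(y, lam, pi):
    """Zero-inflated Poisson: with probability pi emit a structural zero,
    otherwise draw from Poisson(lam)."""
    base = (1.0 - pi) * poisson_pmf(y, lam)
    return pi + base if y == 0 else base

lam = 20.0                                   # illustrative rate
p_zero_poisson = poisson_pmf(0, lam)         # equals exp(-lam), vanishingly small
p_zero_zip = zip_pmf(0, lam, pi=0.1)         # dominated by the inflation term
print(p_zero_poisson, p_zero_zip)
```

Under the plain Poisson model, the only way to make a zero plausible is to pull the rate toward zero, which is what distorts the population estimate; the extra process in the zero-inflated model absorbs the zero instead.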

Figure 3
Recognition memory circuit.

Schematic of the recognition memory network adapted from Exley, 2019. Bold arrows show the assumed two-way connection between the medial prefrontal cortex (MPFC) and the hippocampus (HPC) facilitated by the nucleus reuniens (NRe). Colors highlight the HPC (red), MPFC (blue), and specific areas of the rhinal cortex (yellow). The NRe was lesioned in the experiment.

Figure 4
Results - Case study 1.

(A) Heatmap of the raw log cell count data. Each row corresponds to a single animal, columns correspond to brain regions. Animals are grouped into lesion-familiar (LF), lesion-novel (LN), sham-familiar (SF), and sham-novel (SN). (B, C) log2-fold differences for each surgery type: B shows differences between SF and SN groups; C shows differences between LF and LN groups. The 95% Bayesian highest density interval (HDI) is given in green, and the 95% confidence interval calculated from a Welch’s t-test in orange. Horizontal lines within the intervals mark the posterior mean of the Bayesian results, and the raw data means in the t-test case. The x-axis is ordered in terms of decreasing p-value from the significance test and ticks have been color-paired with the nodes in the recognition memory circuit diagram (Figure 3). Black ticks are not present in the circuit because they are the control regions in the experiment.
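As a rough illustration of how a 95% HDI on a log2-fold difference can be obtained from posterior draws, the Python sketch below takes the narrowest interval containing 95% of the samples. It assumes a unimodal posterior and group-level parameters fitted on the natural-log scale. The draws are synthetic stand-ins, not output from the paper's models, and the helper `hdi` is a hypothetical name (the software table lists the R package HDInterval for the actual analysis).

```python
import math
import random

def hdi(samples, mass=0.95):
    """Narrowest interval containing `mass` of the samples (unimodal case)."""
    xs = sorted(samples)
    n = len(xs)
    k = max(1, math.ceil(mass * n))          # number of points inside the interval
    widths = [xs[i + k - 1] - xs[i] for i in range(n - k + 1)]
    start = widths.index(min(widths))
    return xs[start], xs[start + k - 1]

rng = random.Random(0)
# Synthetic posterior draws of the log-scale mean for two groups (assumed values).
theta_novel = [rng.gauss(3.0, 0.2) for _ in range(4000)]
theta_familiar = [rng.gauss(2.5, 0.2) for _ in range(4000)]

# Difference on the natural-log scale, converted to log2.
log2_fold = [(a - b) / math.log(2) for a, b in zip(theta_novel, theta_familiar)]
low, high = hdi(log2_fold)
print(round(low, 2), round(high, 2))
```

An interval that excludes zero suggests a difference between groups for that region, which is how the green intervals in panels B and C are read.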

Figure 5
Results - Case study 2.

(A) Heatmap of the raw log cell count data. Each row corresponds to a single animal, columns correspond to brain regions. L and R denote left and right hemispheres, respectively. (B) log2 fold differences in green fluorescent protein (GFP) positive cells between mouse genotypes, heterozygous (HET), and knockout (KO), for each of the 50 recorded brain regions spread across two rows. The 95% Bayesian highest density intervals (HDIs) are given in purple for the Bayesian horseshoe model and in pink for the zero-inflated model. The 95% confidence interval calculated from a Welch’s t-test is in orange. Horizontal lines within the intervals mark the posterior mean of the Bayesian results and the data estimate for the t-test. The x-axis is ordered in terms of decreasing p-value from the significance test.
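For comparison with the Bayesian intervals, the ingredients of the Welch interval can be sketched as follows. This is a generic implementation of Welch's unequal-variance t statistic with Welch-Satterthwaite degrees of freedom. The two groups of values are hypothetical, and the scale on which the authors ran the test is not stated in this caption, so log counts are an assumption.

```python
import math
from statistics import mean, variance

def welch(a, b):
    """Welch's t statistic, Welch-Satterthwaite degrees of freedom,
    and the standard error of the difference in means."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    se = math.sqrt(va + vb)
    t = (mean(a) - mean(b)) / se
    df = (va + vb) ** 2 / (va**2 / (len(a) - 1) + vb**2 / (len(b) - 1))
    return t, df, se

# Hypothetical log counts for one region in two genotypes.
het = [4.1, 4.4, 3.9, 4.6, 4.2]
ko = [3.2, 3.8, 2.9, 3.5, 3.6]
t_stat, dof, se = welch(het, ko)
print(round(t_stat, 2), round(dof, 2), round(se, 3))
```

A 95% confidence interval is then the difference in means plus or minus t_crit times `se`, where t_crit is the 0.975 quantile of Student's t with `dof` degrees of freedom (for example `scipy.stats.t.ppf(0.975, dof)` if SciPy is available). With only a handful of animals per group, a single zero or outlier moves both the mean and the variance, which widens this interval considerably.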

Figure 6
Example data and inferences highlighting model discrepancies.

On the left under ‘data’: boxplots with medians and interquartile ranges for the raw data for four example brain regions. The shape of each point pairs left and right hemisphere readings in each of the five animals. On the right under ‘inference’: highest density intervals (HDIs) and confidence intervals are plotted. Purple is the Bayesian horseshoe model, pink is the Bayesian zero-inflated Poisson (ZIP) model, and orange is the sample mean. The Bayesian estimates are not strongly influenced by the zero-valued observations (medial preoptic nucleus [MPN], suprachiasmatic nucleus [SCH], dorsal tuberomammillary nucleus [TMd]) or large-valued outliers (medial habenula [MH]) and have means close to the data median. This explains the advantage of the Bayesian results over the confidence interval.

Appendix 1—figure 1
Diagnostics - Poisson.

Standard Poisson model - Case study 1.

Appendix 1—figure 2
Diagnostics - Horseshoe.

Horseshoe model - Case study 2.

Appendix 1—figure 3
Diagnostics - ZIPoisson.

Zero-inflated Poisson - Case study 2.

Appendix 1—figure 4
PPC - Poisson.

Posterior predictive check for the standard Poisson model in Case study 1. (A) The proportion of zeroes in the data matches the proportion of zeroes in posterior predictive samples. This proportion is zero. (B) The distribution of standard deviations computed over a number of posterior predictive datasets (histogram) aligns with the standard deviation of the data.

Appendix 1—figure 5
PPC - Horseshoe.

Posterior predictive check for the standard horseshoe model in Case study 2. (A) The proportion of zeroes in the data is larger than that found in posterior predictive datasets. This is expected because the likelihood is still a Poisson distribution, which assigns very little probability to a zero count when the fitted rate is well above zero. (B) The distribution of standard deviations computed over a number of posterior predictive datasets (histogram) aligns with the standard deviation of the data.
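The two summary statistics used in these posterior predictive checks, the proportion of zeroes and the standard deviation, can be computed from replicated datasets as in the sketch below. The observed counts and the "posterior draws" of Poisson rates are made up for illustration, and the sampler is Knuth's simple multiplication method, which is adequate only for modest rates.

```python
import math
import random
from statistics import pstdev

def sample_poisson(lam, rng):
    """Knuth's multiplication method for a Poisson draw (modest rates only)."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

def ppc(observed, rate_draws, rng):
    """Simulate one replicate dataset per posterior draw and record its
    proportion of zeroes and its standard deviation."""
    zero_props, sds = [], []
    for rates in rate_draws:
        rep = [sample_poisson(lam, rng) for lam in rates]
        zero_props.append(sum(v == 0 for v in rep) / len(rep))
        sds.append(pstdev(rep))
    obs_zero = sum(v == 0 for v in observed) / len(observed)
    return obs_zero, zero_props, pstdev(observed), sds

rng = random.Random(1)
observed = [0, 0, 18, 22, 25, 19, 0, 21]     # hypothetical counts with zeroes
# Hypothetical posterior draws: one rate per observation, all near 20.
rate_draws = [[20.0 * math.exp(rng.gauss(0.0, 0.1)) for _ in observed]
              for _ in range(200)]
obs_zero, zero_props, obs_sd, sds = ppc(observed, rate_draws, rng)
print(obs_zero, max(zero_props), round(obs_sd, 2))
```

With rates near 20 a plain Poisson likelihood almost never generates a zero, so the replicated proportion of zeroes falls short of the observed one, which is the mismatch described in panel A.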

Appendix 1—figure 6
PPC - ZIPoisson.

Posterior predictive check for the zero-inflated Poisson model in Case study 2. (A) The proportion of zeroes in the data matches the proportion of zeroes in posterior predictive samples. (B) The distribution of standard deviations computed over a number of posterior predictive datasets (histogram) aligns with the standard deviation of the data.

Appendix 1—figure 7
Horseshoe densities.

(A) Conditional posterior. (B) MCMC pair plots. Divergent samples are colored in pink, non-divergent in blue.

Appendix 1—figure 8
Modified horseshoe densities.

(A) The conditional posterior p(γ̃,κ|θ,τ,y) when y = 0 (left) and y ≠ 0 (right). (B) MCMC pair plots of samples from the marginal posterior density p(γ̃,κ|y).

Tables

Table 1
Parameter table for the hierarchical model.

| Parameter | Description |
| --- | --- |
| Ei | Exposure |
| κi | Horseshoe inflation |
| π | Zero inflation |
| γi | Random effect for observation i |
| θrg | Fixed effect for region r in group g |
| τrg | Scale of random effects for region r in group g |
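One way the parameters in Table 1 could fit together is sketched below. This is only a reading consistent with the descriptions in the table, not the authors' stated specification. In particular, the way the exposure enters (here as an offset log E_i), the non-centered form of the random effect, and the placement of the zero-inflation probability π are all assumptions, and the role of the horseshoe inflation κ_i is left out because the table does not define it further.

```latex
\begin{align}
  y_i &\sim
    \begin{cases}
      0 & \text{with probability } \pi, \\
      \mathrm{Poisson}(\lambda_i) & \text{with probability } 1 - \pi,
    \end{cases} \\
  \log \lambda_i &= \log E_i + \theta_{rg} + \tau_{rg}\,\gamma_i, \\
  \gamma_i &\sim \mathcal{N}(0, 1).
\end{align}
```

Setting π = 0 recovers a standard hierarchical Poisson model, in which τ_rg controls how far each observation may depart from the region-by-group mean θ_rg.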
Appendix 1—table 1
Software packages used.

| R Libraries | Version | Description |
| --- | --- | --- |
| rstan | 2.26.3 | complete Stan library |
| cmdstanr | 0.5.2 | lightweight Stan library |
| HDInterval | 0.2.2 | calculating HDI in R |
| ggplot2 | 3.4.1 | plotting |
| bayesplot | 1.9.0 | plotting |
| tidyverse | 1.3.1 | tibble, tidyr, readr, purrr, dplyr, stringr, forcats |

  1. R version 4.2.1 - ‘Funny-looking-kid’.
  2. Computation was performed locally on a Dell XPS 13 7390 laptop. Intel i7-10510U @ 1.80 GHz, 16 GB of RAM, Ubuntu 20.04.4 LTS.
  3. Panels composed using Inkscape version 1.2.2.


Appendix 1—table 2
Acronyms for the brain regions in Case study 1.

| Term | Definition |
| --- | --- |
| ACC | Anterior cingulate cortex |
| DCA1/3 | Dorsal CA1/3 |
| DDG | Dorsal dentate gyrus |
| DPC | Dorsal peduncular cortex |
| DSUB | Dorsal subiculum |
| HPC | Hippocampus |
| ICA1/3 | Intermediate CA1/3 |
| IDG | Intermediate dentate gyrus |
| IFC | Infralimbic cortex |
| LENT | Lateral entorhinal cortex |
| MOC | Medial orbital cortex |
| MPFC | Medial prefrontal cortex |
| M2C | Motor cortex M2 |
| NRe | Nucleus reuniens |
| PRL | Prelimbic cortex |
| PRH | Perirhinal cortex |
| PSTC | Postrhinal cortex |
| TE2 | Temporal association cortex |
| VCA1/3 | Ventral CA1/3 |
| VDG | Ventral dentate gyrus |
| VOC | Ventral orbital cortex |
| VSUB | Ventral subiculum |
| V2C | Visual cortex V2 |

Appendix 1—table 3
Acronyms for the brain regions in Case study 2.

| Term | Definition | Term | Definition |
| --- | --- | --- | --- |
| AHN | Anterior hypothalamic nucleus | PP | Peripeduncular nucleus |
| ARH | Arcuate hypothalamic nucleus | PR | Perireunensis nucleus |
| CL | Central lateral nucleus of the thalamus | PVa | Periventricular hypothalamic nucleus, anterior part |
| DMH | Dorsomedial nucleus of the hypothalamus | PVH | Paraventricular hypothalamic nucleus |
| FF | Fields of Forel | PVHd | Paraventricular hypothalamic nucleus, descending division |
| IGL | Intergeniculate leaflet of the lateral geniculate complex | PVi | Periventricular hypothalamic nucleus, intermediate part |
| LD | Lateral dorsal nucleus of thalamus | PVp | Periventricular hypothalamic nucleus, posterior part |
| LM | Lateral mammillary nucleus | RCH | Retrochiasmatic area |
| LGv | Ventral part of the lateral geniculate complex | RT | Reticular nucleus of the thalamus |
| LGd | Dorsal part of the lateral geniculate complex | SBPV | Subparaventricular zone |
| LH | Lateral habenula | SCH | Suprachiasmatic nucleus |
| LHA | Lateral hypothalamic area | SGN | Suprageniculate nucleus |
| LP | Lateral posterior nucleus of the thalamus | SPFm | Subparafascicular nucleus, magnocellular part |
| MD | Mediodorsal nucleus of thalamus | SPFp | Subparafascicular nucleus, parvicellular part |
| MGd | Medial geniculate complex, dorsal part | SUM | Supramammillary nucleus |
| MGv | Medial geniculate complex, ventral part | TMd | Tuberomammillary nucleus, dorsal part |
| MGm | Medial geniculate complex, medial part | TMv | Tuberomammillary nucleus, ventral part |
| MH | Medial habenula | TU | Tuberal nucleus |
| MMme | Medial mammillary nucleus, median part | VAL | Ventral anterior-lateral complex of the thalamus |
| MPN | Medial preoptic nucleus | VMH | Ventromedial hypothalamic nucleus |
| PH | Posterior hypothalamic nucleus | VM | Ventral medial nucleus of the thalamus |
| PMd | Dorsal premammillary nucleus | VPL | Ventral posterolateral nucleus of the thalamus |
| PMv | Ventral premammillary nucleus | VPM | Ventral posteromedial nucleus of the thalamus |
| PO | Posterior complex of the thalamus | VPMpc | Ventral posteromedial nucleus of the thalamus, parvicellular part |
| POL | Posterior limiting nucleus of the thalamus | ZI | Zona incerta |


Sydney Dimmock, Benjamin MS Exley, Gerald Moore, Lucy Menage, Alessio Delogu, Simon R Schultz, E Clea Warburton, Conor J Houghton, Cian O'Donnell (2025)
Hierarchical Bayesian modeling of multiregion brain cell count data
eLife 13:RP102391.
https://doi.org/10.7554/eLife.102391.3