Hierarchical Bayesian modeling of multiregion brain cell count data
Figures
Introduction.
(A) Each animal produces a cell count for each of the brain regions of interest. Cell-count data is typically undersampled, with fewer animals than brain regions. Scientists analyze the brain sections from the experiment for positive signals. Here, an example section is shown where teal points mark cells expressing the immediate early gene c-Fos (green and red lines indicate regions labeled as damaged). The final cell count is equal to the sum of these individual items. Sagittal brain map taken from the Allen mouse brain atlas: https://mouse.brain-map.org. (B) Partial pooling is a hierarchical structure that jointly models observations as draws from a shared population distribution. It is a continuum that depends on the value of the population variance. When the variance is zero, there is no variation in the population, and each individual observation is modeled as a conditionally independent estimate of some fixed population mean (complete pooling). As the variance tends to infinity, observations do not combine inferential strength; each informs an independent estimate (no pooling). In between the two extremes, the two behaviors combine: each observation can contribute to the population estimate while simultaneously supporting a local one, so that the model captures the variance in the data. The observed data quantities are highlighted with a thick line in the model diagrams. (C) An example of partial pooling on simulated count data. As the population standard deviation increases, the individual estimates trace a path from a completely pooled estimate to an unpooled estimate. Circular points give the raw data values. Parameters are exponentiated because the outcomes are Poisson, so parameters are fit on the log scale.
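The pooling continuum described in panels B and C can be illustrated with a minimal sketch. It is not the model fitted in the paper: it assumes a single count per unit, a fixed population mean `mu` on the log scale, and a population standard deviation `tau` (both names are illustrative), and it reports the posterior mode rather than a full posterior. The counts are made up for the example.

```python
import math

def pooled_log_rate(y, mu, tau, iters=200):
    """Posterior mode of theta for y ~ Poisson(exp(theta)), theta ~ Normal(mu, tau^2)."""
    def grad(theta):
        # Derivative of the log posterior with respect to theta.
        return y - math.exp(theta) - (theta - mu) / tau**2

    # grad is strictly decreasing, so bracket its single root and bisect.
    if y > 0:
        lo, hi = min(mu, math.log(y)), max(mu, math.log(y))
    else:
        lo, hi = mu - tau**2 * math.exp(mu), mu
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if grad(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

counts = [0, 3, 8, 20]                      # hypothetical counts, not from the paper
mu = math.log(sum(counts) / len(counts))    # assumed fixed population mean (log scale)

for tau in (0.01, 1.0, 100.0):
    # Exponentiate because the parameters live on the log scale.
    estimates = [math.exp(pooled_log_rate(y, mu, tau)) for y in counts]
    print(tau, [round(e, 3) for e in estimates])
```

For a very small `tau` every estimate sits near the shared mean (complete pooling); for a very large `tau` each estimate returns to its own count (no pooling); intermediate values give partially pooled estimates.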
Methods.
A table of partial pooling behavior for different likelihood and prior combinations. Rows are the two prior choices for the population distribution, and columns are the two distributions for the data. Within each cell, the solid black line is the expectation of the marginal posterior, with one standard deviation highlighted in gray. Top left: Combining a normal prior for the population with a Poisson likelihood is unsatisfactory in the presence of zero observations. The zero observations influence the population mean in an extreme way owing to their high importance under the Poisson likelihood. Bottom left: By changing to a horseshoe prior, the problematic zero observations can escape the regularization machinery. However, the estimates for positive observations are regularized much less strongly. Top right: A zero-inflated Poisson likelihood accounts for the zero observations with an added process, reducing the burden on the population estimate to compromise between extreme values. Bottom right: No model.
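To make the "added process" of the zero-inflated Poisson likelihood concrete, the following sketch writes out the standard zero-inflated Poisson probability mass function. The parameter names `lam` (Poisson rate) and `pi` (probability of a structural zero) and their values are assumptions for illustration; the paper's own parameterization may differ.

```python
import math

def poisson_pmf(k, lam):
    """Probability of count k under a Poisson distribution with rate lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)

def zip_pmf(k, lam, pi):
    """Zero-inflated Poisson: with probability pi the count is a structural zero,
    otherwise it is drawn from Poisson(lam)."""
    base = (1.0 - pi) * poisson_pmf(k, lam)
    return pi + base if k == 0 else base

lam, pi = 5.0, 0.2  # illustrative values only
print("P(0) under Poisson:", poisson_pmf(0, lam))
print("P(0) under ZIP:    ", zip_pmf(0, lam, pi))
```

Under the plain Poisson likelihood a zero is very unlikely when the rate is moderate, so a zero observation drags the rate estimate down. The extra mixture component gives zeros a separate explanation, which leaves the rate free to describe the positive counts.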
Recognition memory circuit.
Schematic of the recognition memory network adapted from Exley, 2019. Bold arrows show the assumed two-way connection between the medial prefrontal cortex (MPFC) and the hippocampus (HPC), facilitated by the nucleus reuniens (NRe). Colors highlight the HPC (red), the MPFC (blue), and specific areas of the rhinal cortex (yellow). The NRe was lesioned in the experiment.
Results - Case study 1.
(A) Heatmap of the raw log cell count data. Each row corresponds to a single animal, and columns correspond to brain regions. Animals are grouped into lesion-familiar (LF), lesion-novel (LN), sham-familiar (SF), and sham-novel (SN). (B, C) log2-fold differences for each surgery type: B shows differences between SF and SN groups; C shows differences between LF and LN groups. The 95% Bayesian highest density interval (HDI) is given in green, and the 95% confidence interval calculated from a Welch’s t-test in orange. Horizontal lines within the intervals mark the posterior mean for the Bayesian results and the raw data means for the t-test. Brain regions are ordered by decreasing p-value from the significance test, and axis ticks have been color-paired with the nodes in the recognition memory circuit diagram (Figure 3). Regions with black ticks do not appear in the circuit because they are the control regions in the experiment.
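As an illustration of how a 95% HDI can be obtained from posterior draws, the sketch below finds the shortest interval that contains the requested share of the samples. It is a generic sample-based approach and is not claimed to reproduce the exact routine used for the figures. The draws are synthetic stand-ins for posterior samples of a log2-fold difference.

```python
import math
import random

def hdi(samples, mass=0.95):
    """Shortest interval containing at least `mass` of the samples."""
    xs = sorted(samples)
    n = len(xs)
    k = max(1, math.ceil(mass * n))  # number of samples the interval must cover
    # Slide a window of k consecutive sorted samples and keep the narrowest one.
    start = min(range(n - k + 1), key=lambda i: xs[i + k - 1] - xs[i])
    return xs[start], xs[start + k - 1]

random.seed(0)
# Synthetic draws standing in for MCMC samples of a log2-fold difference.
draws = [random.gauss(0.5, 0.3) for _ in range(4000)]
lower, upper = hdi(draws)
print(round(lower, 2), round(upper, 2))
```

For a symmetric, unimodal posterior the HDI is close to the central interval; for skewed posteriors, which are common with count data, it shifts toward the region of highest density.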
Results - Case study 2.
(A) Heatmap of the raw log cell count data. Each row corresponds to a single animal, and columns correspond to brain regions. L and R denote left and right hemispheres, respectively. (B) log2-fold differences in green fluorescent protein (GFP) positive cells between the two mouse genotypes, heterozygous (HET) and knockout (KO), for each of the 50 recorded brain regions, spread across two rows. The 95% Bayesian highest density intervals (HDIs) are given in purple and pink for the Bayesian horseshoe and zero-inflated models, respectively. The 95% confidence interval calculated from a Welch’s t-test is in orange. Horizontal lines within the intervals mark the posterior mean for the Bayesian results and the data estimate for the t-test. Brain regions are ordered by decreasing p-value from the significance test.
Example data and inferences highlighting model discrepancies.
On the left under ‘data’: boxplots with medians and interquartile ranges for the raw data for four example brain regions. The shape of each point pairs left and right hemisphere readings in each of the five animals. On the right under ‘inference’: highest density intervals (HDIs) and confidence intervals are plotted. Purple is the Bayesian horseshoe model, pink is the Bayesian ZIP model, and orange is the sample mean. The Bayesian estimates are not strongly influenced by the zero-valued observations (medial preoptic nucleus [MPN], suprachiasmatic nucleus [SCH], dorsal tuberomammillary nucleus [TMd]) or large-valued outliers (medial habenula [MH]) and have means close to the data median. This explains the advantage of the Bayesian results over the confidence interval.
PPC - Poisson.
Posterior predictive check for the standard Poisson model in Case study 1. (A) The proportion of zeroes in the data matches the proportion of zeroes in posterior predictive samples. This proportion is zero. (B) The distribution of standard deviations computed over a number of posterior predictive datasets (histogram) aligns with the standard deviation of the data.
PPC - Horseshoe.
Posterior predictive check for the standard horseshoe model in Case study 2. (A) The proportion of zeroes in the data is larger than that found in the posterior predictive datasets. This is expected, because the likelihood is still a Poisson distribution. (B) The distribution of standard deviations computed over a number of posterior predictive datasets (histogram) aligns with the standard deviation of the data.
PPC - ZIPoisson.
Posterior predictive check for the zero-inflated Poisson model in Case study 2. (A) The proportion of zeroes in the data matches the proportion of zeroes in posterior predictive samples. (B) The distribution of standard deviations computed over a number of posterior predictive datasets (histogram) aligns with the standard deviation of the data.
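The two statistics used in these posterior predictive checks, the proportion of zeroes and the standard deviation, can be computed as in the sketch below. Everything in it is an assumption made for illustration: the observed counts are invented, a single shared Poisson rate is used, and draws from its conjugate Gamma posterior under a flat prior stand in for MCMC output from the fitted models.

```python
import math
import random
import statistics

def sample_poisson(lam, rng):
    """Draw one Poisson count with Knuth's method (adequate for small rates)."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

def prop_zero(ys):
    """Share of observations that are exactly zero."""
    return sum(y == 0 for y in ys) / len(ys)

rng = random.Random(1)
observed = [0, 0, 0, 2, 5, 7, 0, 9, 4, 0]   # hypothetical counts, not from the paper
n = len(observed)

# Stand-in posterior draws: Gamma(sum(y) + 1, rate n) for one shared Poisson rate.
rate_draws = [rng.gammavariate(sum(observed) + 1, 1.0 / n) for _ in range(500)]

rep_zero, rep_sd = [], []
for lam in rate_draws:
    y_rep = [sample_poisson(lam, rng) for _ in range(n)]   # one replicated dataset
    rep_zero.append(prop_zero(y_rep))
    rep_sd.append(statistics.stdev(y_rep))

obs_zero = prop_zero(observed)
# Share of replicated datasets with at least as many zeroes as the data.
p_zero = sum(z >= obs_zero for z in rep_zero) / len(rep_zero)
print("observed zero proportion:", obs_zero)
print("mean replicated zero proportion:", round(statistics.mean(rep_zero), 3))
print("observed sd:", round(statistics.stdev(observed), 2),
      "mean replicated sd:", round(statistics.mean(rep_sd), 2))
```

With these invented counts the plain Poisson replicates contain far fewer zeroes than the data, which is the kind of mismatch panel A is designed to expose.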
Horseshoe densities.
(A) Conditional posterior. (B) MCMC pair plots. Divergent samples are colored in pink, non-divergent in blue.
Tables
Parameter table for the hierarchical model.
| Parameter | Description |
|---|---|
| Exposure | |
| Horseshoe inflation | |
| Zero inflation | |
| Random effect for observation i | |
| Fixed effect for region r in group g | |
| Scale of random effects for region r in group g | |
Software packages used.
| R Libraries | Version | Description |
|---|---|---|
| rstan | 2.26.3 | complete Stan library |
| cmdstanr | 0.5.2 | lightweight Stan library |
| HDInterval | 0.2.2 | calculating HDI in R |
| ggplot2 | 3.4.1 | plotting |
| bayesplot | 1.9.0 | plotting |
| tidyverse | 1.3.1 | tibble, tidyr, readr, purrr, dplyr, stringr, forcats |
- R version 4.2.1 (‘Funny-Looking Kid’).
- Computation was performed locally on a Dell XPS 13 7390 laptop (Intel i7-10510U @ 1.80 GHz, 16 GB of RAM, Ubuntu 20.04.4 LTS).
- Panels composed using Inkscape version 1.2.2.
Acronyms for the brain regions in Case study 1.
| Term | Definition |
|---|---|
| ACC | Anterior cingulate cortex |
| DCA1/3 | Dorsal CA1/3 |
| DDG | Dorsal dentate gyrus |
| DPC | Dorsal peduncular cortex |
| DSUB | Dorsal subiculum |
| HPC | Hippocampus |
| ICA1/3 | Intermediate CA1/3 |
| IDG | Intermediate dentate gyrus |
| IFC | Infralimbic cortex |
| LENT | Lateral entorhinal cortex |
| MOC | Medial orbital cortex |
| MPFC | Medial prefrontal cortex |
| M2C | Motor cortex M2 |
| NRe | Nucleus reuniens |
| PRL | Prelimbic cortex |
| PRH | Perirhinal cortex |
| PSTC | Postrhinal cortex |
| TE2 | Temporal association cortex |
| VCA1/3 | Ventral CA1/3 |
| VDG | Ventral dentate gyrus |
| VOC | Ventral orbital cortex |
| VSUB | Ventral subiculum |
| V2C | Visual cortex V2 |
Acronyms for the brain regions in Case study 2.
| Term | Definition | Term | Definition |
|---|---|---|---|
| AHN | Anterior hypothalamic nucleus | PP | Peripeduncular nucleus |
| ARH | Arcuate hypothalamic nucleus | PR | Perireunensis nucleus |
| CL | Central lateral nucleus of the thalamus | PVa | Periventricular hypothalamic nucleus, anterior part |
| DMH | Dorsomedial nucleus of the hypothalamus | PVH | Paraventricular hypothalamic nucleus |
| FF | Fields of Forel | PVHd | Paraventricular hypothalamic nucleus, descending division |
| IGL | Intergeniculate leaflet of the lateral geniculate complex | PVi | Periventricular hypothalamic nucleus, intermediate part |
| LD | Lateral dorsal nucleus of thalamus | PVp | Periventricular hypothalamic nucleus, posterior part |
| LM | Lateral mammillary nucleus | RCH | Retrochiasmatic area |
| LGv | Ventral part of the lateral geniculate complex | RT | Reticular nucleus of the thalamus |
| LGd | Dorsal part of the lateral geniculate complex | SBPV | Subparaventricular zone |
| LH | Lateral habenula | SCH | Suprachiasmatic nucleus |
| LHA | Lateral hypothalamic area | SGN | Suprageniculate nucleus |
| LP | Lateral posterior nucleus of the thalamus | SPFm | Subparafascicular nucleus, magnocellular part |
| MD | Mediodorsal nucleus of thalamus | SPFp | Subparafascicular nucleus, parvicellular part |
| MGd | Medial geniculate complex, dorsal part | SUM | Supramammillary nucleus |
| MGv | Medial geniculate complex, ventral part | TMd | Tuberomammillary nucleus, dorsal part |
| MGm | Medial geniculate complex, medial part | TMv | Tuberomammillary nucleus, ventral part |
| MH | Medial habenula | TU | Tuberal nucleus |
| MMme | Medial mammillary nucleus, median part | VAL | Ventral anterior-lateral complex of the thalamus |
| MPN | Medial preoptic nucleus | VMH | Ventromedial hypothalamic nucleus |
| PH | Posterior hypothalamic nucleus | VM | Ventral medial nucleus of the thalamus |
| PMd | Dorsal premammillary nucleus | VPL | Ventral posterolateral nucleus of the thalamus |
| PMv | Ventral premammillary nucleus | VPM | Ventral posteromedial nucleus of the thalamus |
| PO | Posterior complex of the thalamus | VPMpc | Ventral posteromedial nucleus of the thalamus, parvicellular part |
| POL | Posterior limiting nucleus of the thalamus | ZI | Zona incerta |