Introduction

Access to education is codified as a fundamental human right with immense societal and economic benefits 1,2. Individuals who experience more education tend to show a wide variety of beneficial health, cognitive, and neural outcomes 3. Correlational evidence across disparate contexts, countries, and demographics offers overwhelming support for these findings 410.

Education not only increases learned skills and knowledge, but also reasoning and fluid intelligence abilities 1114. One plausible mechanism underlying these widespread benefits from education is the long-term reorganization of brain structure. Lifespan theories on heterogenous neurodevelopment place particular emphasis on education 1519 and correlative work further support the role of brain structure as a mechanism 7,2022. For instance, more educated individuals show higher mean cortical thickness in later life – taken as evidence of ‘increased brain reserve’ lessening the neurodegenerative effects of aging 23,24.

Yet, strong causal evidence for the effect of education on brain structure does not yet exist. Both ethical and practical constraints mean that the effect of education cannot be experimentally tested. This makes it unclear if any findings from education are causal, or if instead, they reflect a complex web of preexisting, or amplifying, sociodemographic and individual characteristics 3,25 – many of which have been associated with brain structure (e.g., intelligence 26, parental income 27, neighborhood pollution 28). Among the plethora of environmental factors, there is also a substantial (∼40% heritability) genetic component implicated in educational attainment, further confounding potential effects 29,30. In addition, access to higher education involves substantial selection processes at the level of the individual and educational system which further complicate the causal pathways involved.

One solution to this problem is using a natural experiment – a ‘random-like’ (exogenous) external event – allowing causal inference in observational phenomena 31,32. A crucial feature of this design is an assignment mechanism outside a participant’s control, usually of natural (e.g., geographical, weather events) or governmental (e.g., policy decisions, cutoff rules) origin. For instance, a law that increases the number of mandatory school years affects everyone equally, in turn, decoupling individual characteristics and other unmeasured confounders from the effects of additional education 33. One of the most widely used causal inference analysis techniques is regression discontinuity (RD), which is applicable when treatment is assigned via a cutoff of a particular score or running variable 31,32. Recent advances in the field of econometrics have optimized inference and largely standardized regression discontinuity (RD) analysis 32,3436. These methods have provided robust evidence that additional education causes an increase in intelligence (0.14 – 0.35 SD) 12. Despite the many strengths and potential causal insights natural experiments offer the field of cognitive neuroscience they have not been used to address such questions.

On September 1st 1972, the minimum mandatory age to leave school was raised from 15 to 16 years of age in England, Scotland, and Wales (called here ‘ROSLA’) 37. The consequence of this law change was substantial: it resulted in almost 100% of children aged 15 staying in school for an additional year, in turn, increasing formal qualifications, income, and cognition 3843.

However, whether this substantial intervention also affected the long-term brain structure of those born right around the cut-off remains an open question. This is unfortunate, as regression discontinuity is a powerful tool to study phenomena that cannot (ethically or practically) be randomized. Recent developments of increased large population-based neuroimaging cohorts provide the sample size needed to make use of these cross-disciplinary methods. One such cohort, the UK BioBank, has recruitment criteria matching the geographic and birth window of the natural experiment ROSLA 44. This provides the ideal opportunity to test, for the first time, the causal effect of a year of education on long-term structural neuroimaging properties.

Using a preregistered design (https://osf.io/rv38z/) with over 30,000 participants we evaluate if an additional year of education, as mandated by ROSLA, causes changes in six global neural properties (total surface area, average cortical thickness, normalized total brain volume, mean weighted fractional anisotropy, white matter hyperintensities, and normalized cerebral spinal fluid volume) as compared to individuals born before the cut-off. It is also possible an effect from education could manifest only in specific regions, in the absence of a broad, macro-neural effects. For this reason, we further tested 66 individual cortical regions for surface area and cortical thickness, 27 white matter fiber tracks for fractional anisotropy, and subcortical gray matter volume in 18 regions. Taken together, the combination of cutting-edge quantitative methods, a large sample, a well-validated natural experiment and high-quality imaging allows us to examine, for the first time, if an additional year of education causes long-term structural reorganization in the brain.

Results

To test if an additional year of education causes long-lasting changes in neural properties we used fuzzy local-linear regression discontinuity (RD) 34,35. This continuity-based technique exploits the fact that ROSLA affected individuals based on a date of birth cutoff (September 1st, 1957). More specifically, adolescents born after this date had to spend one more year in school than those born only one day earlier. Local-linear RD analysis tests this by comparing the limits of two non-parametric functions, one fit using only participants right before the policy change to another function fit on participants right after 32. If an additional year of education affects neural outcomes, we assume that brain structure will be discontinuous exactly at the cutoff. In contrast, if ROSLA does not affect long-term neural outcomes the functions should be continuous, or smooth around the cutoff.

Using fuzzy local-linear RD we fit a series of regressions to various measures of brain structure (Fig 1) to test for any discontinuity at the cutoff. To determine the optimal number of participants to include, we use mean square error optimized bandwidths to a maximal range of 10 years before and after the cutoff (N > 30,000). Doing so, we observed no evidence of an effect from additional education on any of our preregistered global neuroimaging measures (p’s > .05; Sup. Table 2). In other words, the relationship between the year of birth and neural outcomes was continuous around ROSLA’s cutoff, indicating no differences in global structural measures from an extra year of education. The optimized bandwidths used in this analysis included participants born 20 to 35 months around the cutoff (average N = 5124). The absence of a causal effect of education was observed for all our global neuroimaging metrics: total surface area (SA), average cortical thickness (CT), normalized total brain volume (TBV), mean weighted fractional anisotropy (wFA), white matter hyperintensities (WMh), or normalized cerebral spinal fluid volume (CSF) (Figure 1; Sup. Fig. 3). These results did not change when imputing missing (∼4%) covariate data.

Regression discontinuity (RD) plot of monthly averaged a) total surface area, b) average cortical thickness, c) total brain volume, and d) weighted fractional anisotropy plotted by the participant’s date of birth in months (our running variable). Each dot reflects the average value for individuals born in that month. The dashed line corresponds to Sept. 1957 the date of birth inclusion cutoff for an additional year of mandatory education from ROSLA. We found no evidence of an effect from an additional year of education on any structural neuroimaging measures – illustrated here by a continuous line around the cutoff. Dark blue dots represent all individuals within the mean-squared error-optimized bandwidths, in turn, reflecting participants used for analysis with a local-linear fuzzy RD approach. Third-order polynomials (dark red lines) are fit on either side of the cutoff only for illustration purposes. Sup. Fig. 3 illustrates the two other preregistered global neuroimaging outcomes (total white matter hyperintensities and cerebral spinal fluid volume).

These findings strongly suggest that an additional year of education did not lead to changes detectable by MRI decades later. However, to ensure the validity of our (causal) inferences, a critical step in any regression discontinuity approach is to test the validity of the design 32. For instance, if participants can manipulate their treatment by sorting around the cutoff this severely limits the strength of causal claims. In the context of ROSLA, this is highly unlikely since the assignment was based on date of birth 37,45. However, for completeness we tested this question 46, finding no evidence that individuals were somehow able to adjust their enrollment (Tq = -0.72, p = .47; Sup. Fig. 2). A second validation approach is to employ placebo outcome tests 32. This approach uses variables that should not be causally related to your treatment (e.g., an additional year of education), to ensure the absence of spurious effects arising through unknown mechanisms. To accomplish this, we used all of our neuroimaging covariates (e.g., sex, head motion, site) as placebo outcomes, under the assumption that ROSLA should not affect these variables. Other than one covariate (summer) which was deterministically related to ROSLA (therefore excluded), none of the placebo outcomes were related to ROSLA (Sup. Table 1). Our findings add to the existing body of prior work 38,40,41,43 further solidifying ROSLA as a valid natural experiment.

Next, we tested whether there may be any regionally specific neuroimaging effects. It is possible that an additional year of education caused localized neural changes that are not picked up globally. As preregistered, we used the Desikan-Killiany cortical atlas 47 to test the effect of an additional year of education on 33 bilateral regions for both cortical thickness (CT) and surface area (SA). These analyses included on average 5080 effective participants (N range = 3884 – 7771) for CT and 5392 participants (N range = 3739 – 8727) for SA. Despite these relatively high participant numbers, we did not find an additional year of education to cause changes in any regions for CT or SA (p’sFDR > .05). This was also the case for weighted mean fractional anisotropy in all 27 tracks tested 48 (p’sFDR > .05; mean n = 4766). Lastly, we tested the volume of 18 subcortical regions, finding none to be related to an additional year of education (p’sFDR > .05, mean n = 5174). To summarize, there was no evidence of an additional year of education affecting any regional neuroimaging measures with fuzzy local-linear regression discontinuity (Sup. Figure 4).

Bayesian Local Randomization Robustness Analysis

As an additional preregistered robustness test, we used a slightly different approach, often referred to as ‘local randomization’, to analyze natural experiments. This approach works under the assumption that individuals close to the cutoff are exchangeable and similar except for the treatment (in our case an additional year of school) 32,49. To implement this analysis, we compared participants born exactly right before the cutoff (August 1957; n ≈ 130) to those born right after the cutoff (September 1957; n ≈ 100). An additional benefit is that we implemented this analysis in a Bayesian framework, allowing us to more readily interpret the strength of evidence either in favor of the null or alternative hypothesis. Our preregistered default point null Bayes factors were too wide, arguably providing relatively strong evidence in favor of the null. Taking a more conservative approach, we report these analyses with a narrower normal prior (mean=0, sd=1). Doing so, we replicated and extended our findings from fuzzy local-linear RD analysis, observing strong evidence in favor of the null hypothesis for total surface area (BF01 = 18.21), average cortical thickness (BF01 = 15.09), total brain volume (BF01 = 13.61), weighted fractional anisotropy (BF01 = 11.63), white matter hyperintensities (BF01 = 13.26), and cerebral spinal fluid volume (BF01 = 14.50). In addition, we tested across a range of priors which did not meaningfully affect our inferences, as each showed evidence in favor of the null (Sup. Table 4 & 6).

The two quantitative RD approaches described here (local-linear and local randomization) have strengths and challenges similar to the widespread bias versus variance tradeoff. As the number of months on either side of the cutoff increases, bias is introduced as participants become less similar, yet at the same time, the sample size increases, thereby lowering the variance of the estimate. Our local randomization approach runs a negligible risk of bias, but at the cost of a relatively modest sample size: One could argue the 230 participants included in our 1-month window are too few for neuroimaging outcomes. To examine the consequences of this tradeoff, we therefore expanded the boundary to a 5-month window around September 1st, 1957 (n per group ≈ 600). This larger participant pool provided further evidence in support of the null hypothesis of education not affecting global neural measures (Figure 2 & 3, Sup. Table 6). Lastly, we repeated our placebo outcome tests for both a one- and five-month window local randomization analysis, finding no associations (Sup Table 5 & 7), demonstrating the robustness of the natural experiment ROSLA and our analysis approach.

Bayes factors for surface area per region using a local randomization analysis with a 5-month window around the onset of ROSLA (September 1st, 1957). Illustrating widespread evidence against the effect of a year of education on total surface area. The regionally specific analysis of these bayes factors [reported prior: normal(0, 1)] was not preregistered and serves to illustrate our global neural findings.

Correlational Effect of Education

The above analyses employed an RD design allowing us to investigate the hypothesized causal effect of (additional) education on differences in brain structure independent of confounding pathways. To ensure our sample is at least in principle sensitive to observing brain-behavior associations (cf. 50), we reran the analysis as a simple association instead. This allows us to examine whether more years of education are associated with differences in brain structure. Notably, such an observational analysis (and resulting parameter estimate) would reflect an indeterminate mixture of causal effects as well as any indirect, sociodemographic, and individual pathways. These associations would still be of considerable potential scientific interest, but could not be interpreted as causal effects of education on brain structure. Crucially, this was done using the same subset of participants as in the local randomization analysis. Resulting in an estimate of how much one additional year of education correlates with brain structure.

First, we estimated the association between education (in years of attainment) and global neuroimaging measures using the same sample of participants from the one-month window local randomization analysis (i.e., August & September 1957; n ∼ 230). Similar to the causal approach, five of the six measures showed evidence in support of the null hypothesis (Sup. Table 4). In contrast, total surface area showed weak evidence in support of a positive association of education (BF10 = 2.3, n = 229). To increase power, we expanded our observational analysis to participants born in a five-month window around the ROSLA cutoff (Fig. 3). This considerably increased the strength of evidence in support of a positive association between years of education and total surface area (BF10 = 41.7, N = 1185). In addition, the larger pool of participants provided extreme evidence in support of an association between years of education and cerebral spinal fluid volume (BF10 = 80.7, n = 1193). This analysis highlights the stark difference between a causal and associational approach, while also providing evidence of sensitivity to brain-behavioral associations in the same sample. Of the remaining four global measures, three meaningful decreased in their strength of evidence (in terms of Jeffrey criteria51) in favor of the null when compared to the causal estimate (Fig.3; Sup. Table 6). The only global neuroimaging measure that provided a similar amount of evidence (in favor of the null) between a causal and correlative approach was mean cortical thickness (BF01 = 7.22 & BF01 = 8.81 respectively).

Bayesian Heat Plot:

Bayesian evidentiary strength (x-axis) for causal (circles) and correlational (triangles) estimates reflecting the impact of one additional year of education on global neuroimaging measures (y-axis; average cortical thickness [CT], white matter hyperintensities [WMh], total brain volume normalized for head size [TBV], mean weighted fractional anisotropy [wFA], total surface area [SA], and cerebral spinal fluid volume normalized for head size [CSF]). Stripped bands reflect the strength of evidence using Jeffrey’s criteria 1961. For the causal estimates, positive Bayes Factors indicated support for the alternative hypothesis that an additional year of education affects the brain, while negative values indicate support for the null hypothesis of no effect. The causal and correlational parameters come from the same set of participants (n ≈ 1200) born from April 1957 until Jan 1958. The causal parameter is an estimate of the effect of ROSLA with a 5-month window local-randomization analysis, the correlational parameter is an estimate of the association between a participant’s self-reported educational attainment in years and (global) neuroimaging measures.

Discussion

In a large, preregistered study we find converging evidence against a causal effect of education on long-term structural neuroimaging outcomes. This null result is present across imaging modalities, different regions, and analysis strategies. We find no issues with the design of the 1972 ROSLA, substantiating it as a valid natural experiment, in agreement with prior work 38,4043. Despite a large sample (min N = 4238), we find no evidence of an effect of education on any of the global neuroimaging measures with a continuity-based RD analysis. Confirming this result, we find strong evidence in support of the null hypothesis for these global neuroimaging measures using a Bayesian local randomization analysis. Moreover, we find no regionally specific effect of education on local mean cortical thicknesses or surface area across 66 cortical regions. This lack of localized effects was further confirmed in weighted mean fractional anisotropy for 27 white matter tracks, as well as subcortical gray matter volume in 18 regions. Moreover, we demonstrate the ability to find strong evidence in favor of observational associations between education and brain structure at this resolution, suggesting our findings are not due to lack of sensitivity more generally.

Our robust null result is seemingly at odds with causal inference findings of education’s positive behavioral effect on intelligence 11,14,39 – which is sustained throughout decades 12,38. This juxtaposition suggests that, to the extent that the additional year of education induced long-term changes in cognitive abilities, the neural manifestation is at a level of resolution not detectable with conventional MRI field strengths. However, this would seem to contrast with a range of influential findings demonstrating that high-intensity experimental behavioral interventions (e.g. juggling, studying, memory training) lead to measurable differences in brain structure (with similar imaging pipelines) in much smaller samples 5254. Moreover, compared to even these high-intensity interventions, a year of education is an extensive period of learning. The 1972 ROSLA was well implemented, schools had time to prepare and were given additional funding, increasing standardized formal qualifications of those affected 41,42. This leaves open the question of how to interpret this constellation of findings.

One potential explanation to account for this discrepancy is the concept of expansion-renormalization 55,56, which posits following a period of skill acquisition, the cortex initially expands and then renormalizes over the course of a few months. In our context, this would suggest that the additional year would have manifested at a level detectable in MRI when the difference in educational exposure between children pre- and post-ROSLA was most pronounced and recent. In other words, MRI effects at the macro-scale might have been detectable immediately post-ROSLA in 16-year-old adolescents, before renormalizing, to a micro-scale, leaving in place permanent, but microstructural changes. Possible cellular candidates for initial experience-dependent plasticity are an increase in dendritic spines, the swelling of astrocytes, and intracortical myelin adaptations 57,58. These structural changes may be detectable using other approaches such as in vivo cellular work (cf. 59), extreme high field strengths 56,60, or postmortem histology 61,62.

Additionally, the long period of time between additional education and neuroimaging offers both strengths and weaknesses for our design. First, it could be the case that 46 years is too long and any potential effect faded out over the years. That is, rather than having a micro-neural effect, it may be that there simply are no lingering effects at the brain level at all. In this case, it may be better to think of the (causal) impact of additional education as more akin to fitness or strength interventions which are also unlikely to persist across such a period. However, we note that prominent aging-related theories of heterogeneity argue directly against this rationale, instead positing life course experiences offer a reserve or ‘brain buffer’ that leads to an increasing cascade of processes limiting adverse aging effects 1517. Here, the timing between our intervention (ROSLA) and scanning makes our design particularly well-powered to test these theories, since education would contribute to an initial brain buffer (intercept) and any cumulative educational effect (slope) over 46 years. While our results are at odds with prior conceptual and observational work, a recent longitudinal study found prior education to not affect the rate of brain aging 63 – in alignment with our findings.

Lastly, to demonstrate the importance of controlling for unobserved confounders we conducted a simple correlation analysis – using the same subset of participants – relating education to differences in brain structure. We found evidence to support an association between more years of education and greater surface area and cerebral spinal fluid volume. This result emphasizes the need for caution in attributing causation in non-causal designs, as unobserved confounders can masquerade as an effect of interest. For instance, more education is frequently highlighted as offering behavioral and neural protection against the adverse effects of aging 1517– while we replicate this inference associationally, we find no causal evidence for any neuroprotective effects. This suggests a more complex pathway of effects unfolding over time. Environmental causes are most likely very small and additive, which makes them not only difficult to study 64 but also equally hard to adequately control65. Our findings suggest that to truly understand the neural and behavioral processes that unfold after interventions such as education we need a multipronged, mixed methods approach that combines deep phenotyping, longitudinal imaging and behavioral follow-up 6668, as well as more sophisticated models that can capture gene-by-environment-interplay 14,64,69. Only then will we be able to identify idiosyncratic environmental effects and individual characteristics underlying heterogenous lifespan development.

Here, we report a lack of causal evidence of a year of school on long-term neural outcomes in thousands of participants. An additional year of education is a substantial intervention and our preregistered findings are robust across imaging modalities, different regions, and analysis strategies. While our design cannot inform us of any short-term neural effects of education, our results call into question sustained experience-dependent plasticity, with significant ramifications for prominent theories of aging-related heterogeneity. The recent availability of large neuroimaging cohort data paired with cutting-edge methods from econometrics offers new avenues in studying neural effects. Causal inference is a new tool to the neuroimager’s toolkit – opening novel, societally relevant phenomena – with the potential to move the field of population neuroscience from one of association to one of causation.

Methods

On September 1st, 1972 the minimum age to leave school was increased from 15 years of age to 16 in England, Scotland, and Wales 37. This law, henceforth ROSLA, mandated children born after September 1st, 1957 to stay in school for an additional year. In contrast, a child born only one day earlier was unaffected and legally allowed to stop formal schooling at 15 years of age. The consequence of this law change was substantial: it resulted in almost 100% of children aged 15 staying in school for an additional year, in turn, increasing formal qualifications (Sup. Fig 2) 40,41. Crucially for our purposes, ROSLA is a well-studied natural experiment with (commonly agreed, e.g. 38,4042) high design validity. A sizable body of prior work has found behavioral effects from the 1972 ROSLA 38,4043.

In this study, we leverage ROSLA to study the causal effect of an additional year of education on the brain. To do so, we will use the neuroimaging sub-sample of the UK BioBank – the largest neuroimaging study to date – which also lines up perfectly with geographic and birth window (∼1935-1971) requirement characteristics for ROSLA 70. We followed our preregistration (https://osf.io/rv38z) closely, yet some minor deviations were necessary and explicitly outlined in ‘deviations from preregistration’ (code: https://github.com/njudd/eduBrain).

Structural neuroimaging outcomes

It is very plausible that a broad intervention like education could affect the brain in either a global, or more regionally specific manner. For this reason, we examined both whole-brain averaged measures in addition to regional specificity in atlas-based regions and tracts. We tested the following global measures; total surface area (SA), average cortical thickness (CT), total brain volume normalized for head size (TBV), mean weighted fractional anisotropy (wFA), white matter hyperintensities (WMh), and cerebral spinal fluid volume normalized for head size (CSF). Next, we examined cortical thickness (CT) and surface area (SA) regionally using the Desikan-Killiany Atlas 47. The temporal pole was not included by the UK BioBank making the total number of regions 66. We will also test weighted mean fractional anisotropy (wFA) with a global average and, regionally on 27 white matter tracks 48. Lastly, we examine subcortical volume in 18 regions 71.

All measures are derived from the image preprocessing pipeline from the UKB, further preprocessing details are outlined elsewhere 44,72. All outliers were identified if they were either above quartile 3 plus 1.5 times the interquartile range (IQR) or below quartile 1 minus 1.5 times the IQR and brought to the fence by manually recoding them to this limit (Tukey/Boxplot Method). Our alpha level for global neuroimaging measures (mean CT, total SA, average FA, WM hyper-intensities, normalized TBV & normalized CSF) is 0.05. The regional metrics (SA, CT, wFA & subcortical structures) are false discovery rate (FDR) corrected using the number of regions per modality (e.g., 66 regions for SA) with a q value less than 0.05 considered significant. Neuroimaging measures are reported in raw units.

Continuity-based framework: Local-linear fuzzy regression discontinuity

Regression discontinuity (RD) is a technique we use to estimate the effect of an intervention on an outcome where assignment (usually binary) is based on a cutoff of a running or ‘forcing’ variable 31,32 – in our case, age in months. One major design issue in RD is if participants select into (or out of) the treatment group by sorting around to the cutoff of the running variable. However, as the 1972 ROSLA law was not pre-announced, generally strictly enforced, and affected teenagers, such alternative explanations are highly improbable 45. Nevertheless, we conducted a density test of the running variable to check for bunching near the cutoff 46. Although the design validity of the 1972 ROSLA is well established 38,4043 we still tested a variety of placebo outcomes (outlined in ‘covariates of no interest’) – outcomes implausible to be affected by the intervention.

RD analysis broadly falls into two separate but complementary frameworks 36,49. The first, continuity-based approach defines the estimand as the difference between the limits of two continuous non-parametric functions: one fit using only participants right before the policy change to another function fit on participants right after. The other, so-called local randomization approaches assume participants are ‘as if random’ in a small window (ω) around the cutoff (described more in detail in the section “Bayesian Local Randomization analysis”). In this case, the estimand is the mean group difference between participants before and after the cutoff within ω. As ω approaches zero around the cutoff, the estimand becomes conceptually more similar to continuity-based approaches.

To empirically test whether an additional year of education caused long-lasting global and regional neural changes we used a fuzzy local linear regression discontinuity design (a continuity-based approach) with robust confidence intervals from the RDHonest package 31,34,35. Our outcome variables are the neuroimaging metrics described above, which were adjusted to increase statistical precision (see section ‘covariates of no interest’). The running variable (X) is a participant’s date of birth in months (mDOB), as is convention, it was centered at zero around the birth cutoff of ROSLA (September 1st, 1957). Our first-stage (fuzzy) outcome was a dummy coded variable of whether the participant completed at least 16 years of education. Participants who indicated they completed college were not asked this question, therefore we recorded their response as 21 years 38. Our choice of using the RDHonest package was primarily due to its ability to provide accurate inference with discrete running variables (in our case mDOB). We used the default settings of local-linear analysis on MSE-derived bandwidths with triangular kernels 34,35.

We included participants born in England, Scotland, or Wales 10 years on either side of the September 1st, 1957 cutoff (dob Sept. 1st, 1947 – Aug 31st, 1967) with neuroimaging data. The range of the running variable (age in our case) to include on either side of the cutoff (known as the bandwidth; z) is one of the most consequential analytical decisions in regression discontinuity designs. Prior work using ROSLA in the UK BioBank has analyzed bandwidths as large as 10 years 43 to as small as a year 38. Large bandwidths include more participants (decreasing variance) yet these participants are also further away from the cutoff, in turn, potentially increasing bias in the estimand 36,49. Conversely, smaller bandwidths provide a less biased, yet noisier estimates. State-of-the-art continuity-based RD methods use data-driven bandwidth estimation such as mean squared error (MSE) optimized bandwidths 73. Since we used this approach the optimal bandwidth range and, in turn, the number of included participants, will differ per fitted model. This also makes it logically incompatible to sensitivity test our bandwidths 36, since they are mean squared error derived – widening them will lower variance and, in turn, increase bias while tightening them will have the opposite effect. Lastly, the use of triangular kernels means participants further away from the cutoff will be weighted less than those closer 36. Both MSE-optimized bandwidths and triangular kernels determine the number of ‘effective observations’ to be fit by a fuzzy local linear RD model.

For our global continuity-based analysis we made sure the results did not change due to missing covariate data. This was accomplished by imputing missing covariate data (≈4%) with classification and regression trees from the MICE package 74. We imputed using information from only the variables included in each analysis and did not use the running variable (DOB in months) or our first-stage instrument for prediction. We did this ten times checking the estimates and inferences across each iteration to ensure robustness.

Bayesian Local Randomization Analysis

As a robustness test and to provide evidence for the null hypothesis, we conducted a Bayesian analysis using the local randomization framework for RD 32. This alternative framework assumes a small window (ω) around the cutoff (c) where the running variable is treated “as if random” 49. While the local randomization framework invokes stricter assumptions on the assignment mechanism, placing more importance on the window around the cutoff (ω = [c − w, c + w]) it handles discrete running variables well 49. As the number of months on either side of the cut-off increases, bias is introduced as subjects become less similar, yet at the same time, the sample size increases, thereby lowering the variance of the estimate.

As recommended 32,49, we included participants within the smallest window possible (ω = 1 month; August vs September 1957), then expanded to a 5-month window around September 1st, 1957. A dummy variable (ROSLA) was constructed to reflect if a participant was impacted by the policy change. We then tested the effect of this variable on our six global neuroimaging measures while correcting for the covariates listed above. A few neuroimaging covariates did not have sufficient observations to be included (see ‘deviations from preregistration’).

Models were fit in R (v. 4.3.2) with rstanarm with Markov Chain Monte Carlo sampling of 80,000 iterations over 4 chains 75. All priors (p) used a normal distribution centered at 0 with autoscaling [p*sd(y)/sd(x)]. Our preregistration referred to using the ‘default’ weakly informative prior of STAN (i.e., 2.5 SDs). However, this is a relatively wide prior for point null Bayesian hypothesis testing, and at odds with the defaults from packages meant for this purpose (e.g., BayesFactor). We therefore deviated from our preregistration and reported Bayes Factors with a normal prior centered at 0 with a standard deviation of 1 (medium informative). We also report strongly informative (SD = .5) and weakly informative (SD = 1.5) normal priors.

Model diagnostics were checked with trace plots, posterior distributions, and rhat values (<1.05) 76. We then computed log Bayes Factors using the bayestestR package 77 using the Savage-Dickey density ratio with a point-null of 0 for each of the 3 priors. The strength of evidence was interpreted on a graded scale using the criteria preregistered 51. If the two frameworks disagree our primary inference will be based on the continuity-based framework.

Similarly to our continuity-based approach, we conducted placebo outcome tests within both windows (ω = 1 & 5 months) using each included covariate as the outcome being predicted by ROSLA. In addition, we preregistered four placebo cutoffs – an analysis where the cutoff is artificially moved to test the specificity of the effect – yet null findings made this test no longer necessary (see ‘deviations from preregistration’).

Correlational Analysis

To compare our results to a correlational approach, we tested self-reported years of total education (EduYears) on the six global neuroimaging measures using the same participants as the local randomization analysis (ω = 1 & 5 months). The analysis pipeline is identical to the local randomization approach, except the dummy coded term ROSLA was substituted for continuous EduYears.

Covariates of no interest

In neuroimaging it is common to include covariates. However, for identification in a valid regression discontinuity design covariates by definition should be equal on either side of the cutoff 36. We used standard neuroimaging covariates for two purposes, 1) to further test the validity of the 1972 ROSLA and 2) to increase statistical precision in our RD analysis. For instance, there are large sex differences in certain neural measures mostly related to head size 78, therefore we included a dummy variable for sex. We expect ROSLA to not affect the proportion of males and females, yet including this measure as a covariate will increase the estimation precision for measures sensitive to sex-related head size differences (e.g., surface area). [As expected, we found a 24% precision gain for total Surface Area, measured as the difference in CI width of the RD parameter pre and post-correction. Precision increased for all global measures (3-7%, 24%)].

Children born in July and August could technically leave school at 15 after 1972 ROSLA due to the exam period ending in mid-June and the school year starting on September 1st 40. This is an artifact of asking ‘What age did you complete full-time education?’, rather than ‘How many years of education did you complete?’. This artifact predates and is not influenced by ROSLA 79, yet to account for this we included a dummy variable for children born in July or August.

Lastly, Neuro-UKB recruitment is still ongoing and started in April 2014 70. In turn, there is a wide range of intervals in terms of the period between when someone was born and the age at which they were scanned. We control for this discrepancy by including a variable ‘date of scanning’ (DoS). This was done by coding the earliest date a (included) participant was scanned to 0 and counting the number of days until the last included participant. This resulted in a value for each participant that was the number of days they were scanned after the first participant. We also include this variable squared, so we can model quadratic effects, resulting in two variables (DoS & DoS2). Other (preregistered) neuroimaging quality control measures were included such as scanning site, head motion, whether a T2 FLAIR sequence was used, T1 intensity scaling, and 3 diffusion measures recommended 80. No additional covariates other than those preregistered (https://osf.io/rv38z) were added.

Our first aim, known as a placebo outcome test, involved testing the effect of an additional year of education on each covariate. This was accomplished identically to our neuroimaging outcomes (see section ‘Continuity-based framework’) using RDHonest. We expect there to be no effect of ROSLA on any covariate. This was also tested for the Local-Randomization Framework (ω = 1 & 5 months).

For our second aim, increasing the precision of the RD estimand we used covariate-adjusted outcomes (Y). This was done by fitting a local linear regression for each MSE-derived bandwidth z with a matrix of covariates (C′) on the unadjusted outcome (Y) using Equation 1 below. Let X denote the running variable (date of birth in months) which is centered around the ROSLA cutoff (i.e., September 1st, 1957 = 0). In the second step, this matrix of covariates is then multiplied by the fitted coefficients and subtracted from the unadjusted outcome (Y) to make a covariate-adjusted outcome (Y). Since our assignment mechanism is probabilistic (i.e., fuzzy RD) we also corrected our first-stage outcome, a dummy coded variable reflecting if the participant stayed in school until 16, in an identical manner. This method was used after personal communication with Prof. Michal Kolesar the package creator of RDHonest. For the Bayesian local randomization analysis and correlational analysis, we simply included the covariates in the model.

Deviations from the preregistration

The study closely followed the preregistration pipeline (https://osf.io/rv38z), yet in a few minor cases it was not possible to follow. For instance, our initial plan involved placebo cutoffs – a sensitivity test where the cutoff is artificially moved to check if a significant result may also appear (for whatever reason) at other, non-hypothesized dates. Due to our lack of findings, we did not conduct this analysis as it was not necessary.

Some covariates lacked variance in specific specifications due to very few observations and therefore could not be included. In the global continuity analysis, this only impacted weighted fractional anisotropy, where the variable ‘T2_FLAIR’ was not used. This dummy-coded variable, indicating if a participant had a T2-weighed MRI scan, was also not used in the regional continuity-based analysis nor for the local-randomization analysis. The local randomization analysis included a small number of participants around September 1st 1957 therefore we did not include the covariate “summer” as this would have been isomorphic to our effect of interest (ROSLA). Lastly, for the one-month window analysis, there were not enough observations to include imaging center 11028.

Choice of priors

The Savage-Dickey density ratio is the height of the posterior divided by the height of the prior at a particular point (0 in our case for Point null Bayes Factors). This makes them particularly sensitive to the prior used. Our preregistration referred to using a ‘default weakly informative prior’. While not specified this was referencing the default prior of STAN (i.e., normal of 2.5 SDs). However, this is arguably too wide for adequate point null Bayesian hypothesis testing and at odds with the defaults from packages meant for this purpose (e.g., BayesFactor package). If we used the default prior from STAN it would have given us unrealistically strong support for the null hypothesis. We therefore deviated from our preregistration and reported Bayes Factors with a normal prior centered at 0 with a standard deviation of 1 (mediumly informative). We also report strongly informative (SD = .5) and weakly informative (SD = 1.5) normal priors both also centered at 0. All of our priors supported the null hypothesis they just varied in the amount of evidence in support of the null. Lastly, for illustration purposes, we ran the 5-month local randomization analysis for surface area regions using the mediumly informative prior (Figure 2).

Code and Data Availability

All code is publicly available (https://github.com/njudd/eduBrain). The data is also publicly available yet must be accessed via the centralized UK BioBank repository (https://www.ukbiobank.ac.uk). This research has been conducted using the UK Biobank resource under application number 23668.

Funding & Acknowledgments

Kievit was supported by a Hypatia fellowship from the RadboudUMC. This research has been conducted using the UK Biobank resource under application number 23668.

Ethical Statement

UK Biobank has ethical approval from the North West Multi-centre Research Ethics Committee (MREC) as a Research Tissue Bank (RTB) (approval number: 11/NW/0382). This consortium has its own independent ethics advisory committee (https://www.ukbiobank.ac.uk/learn-more-about-uk-biobank/governance/ethics-advisory-committee), that assures the UK Biobank abiders by the ethics and governance framework (https://www.ukbiobank.ac.uk/media/0xsbmfmw/egf.pdf). Written informed consent was obtained from all participants.

Supplementary Information

Raincloud plot of the participant’s age at neuroimaging (mean = 61.89). For anonymity reasons date of birth (DOB) is measured in months, yet the scan date was measured to the day. To derive age at neuroimaging we subtracted the number of months between DOB and the scan date setting each participant’s day of birth to the first of the month.

This regression discontinuity plot shows the effect of the law on the percentage of students staying an additional year in school (Y). This is the first stage in the two-stage fuzzy regression discontinuity analysis, this increased attendance by roughly 10%. See “first.stage” in Supp. Table 2 for coefficient estimates.

Regression discontinuity plot of monthly averaged a) cerebrospinal fluid volume and, b) white matter hyperintensities plotted by the participant’s date of birth in months (X; our running variable). The dashed line corresponds to Sept. 1957 when ROSLA came into effect. Dark blue dots represent the mean-squared error-optimized bandwidths, while 3rd order polynomials (in dark red) were fit on either side of the cutoff for illustration purposes.

Raincloud plots of a) the effective number of observations and b) uncorrected p-value of a local-linear fuzzy RD per region per modality [cortical thickness (CT), Surface Area (SA), Subcortical regions, and weighted mean fractional anisotropy (wFA)]. No regions were significant (p < .05) following FDR correction per modality.

Fuzzy RD Placebo Outcome results

[Summer is a dummy coding variable for if the participants date of birth was in July or August. This makes it inherently weighed, by design, against ROSLA (which happened on September 1st). These children can be seen in Sup. Fig 2, see methods for further details].

Fuzzy RD Global Neuroimaging Results

Fuzzy RD Uncorrected Global Neuroimaging Results

One month window Bayesian Analysis

Only participants born in August and September 1957 are included. Bayesian analysis of global neuroimaging measures (Y) using a local randomization RD (“ROSLA”; dummy coding for participants born in September 1957) and the associational effect (EduAge) of the amount of attained education in years. The estimate is the median of the posterior reported in raw units. Std. Est is the standardized estimate (for EduAge this is a standardized continuous variable, while ROSLA is a dummy coded variable with 1 corresponding to being impacted by the policy). The estimate, CI & BF are reported for a normal prior (mean = 0, SD = 1). We also report Bayes factors (BF) for other priors. Two measures CSF & TBV are normalized for head size, this is reflected with “_norm”.

One Month window Test of Covariates

A Bayesian local randomization analysis of placebo outcomes with participants born in August and September 1957 included. ROSLA dummy codes participants born in September 1957. Placebo outcomes are a common method to falsify an RD design, as seen above, by definition they should be unrelated to the natural experiment (ROSLA). The estimate is the median of the posterior. The estimate, CI & BF are reported for a normal prior (mean = 0, SD = 1).

Five Month window Bayesian Analysis

Bayesian analysis of global neuroimaging measures (Y) using a local randomization RD (“ROSLA”). ROSLA using dummy coding TRUE for participants born from September 1957 until Jan 1958 and FALSE for participants born April-August 1957). EduAge is the associational effect of the amount of attained education in years of the same group of participants. The estimate is the median of the posterior reported in raw units. Std. Est is the standardized estimate (for EduAge this is a standardized continuous variable, while ROSLA is a dummy coded variable with 1 corresponding to being impacted by the policy). The estimate, CI & BF are reported for a normal prior (mean = 0, SD = 1). We also report Bayes factors (BF) for other priors. The suffix “_norm” means normalized for head size.

Five Month window Test of Covariates

Five-month bandwidth Bayesian local randomization analysis of placebo outcomes. Placebo outcomes are a common method to falsify an RD design, by definition, they should be unrelated to the natural experiment (ROSLA). ROSLA dummy codes participants born after September 1st, 1957. The estimate is the median of the posterior. The estimate, CI & BF are reported for a normal prior (mean = 0, SD = 1).