Generative network modeling reveals quantitative definitions of bilateral symmetry exhibited by a whole insect brain connectome
Abstract
Comparing connectomes can help explain how neural connectivity is related to genetics, disease, development, learning, and behavior. However, making statistical inferences about the significance and nature of differences between two networks is an open problem, and such analysis has not been extensively applied to nanoscale connectomes. Here, we investigate this problem via a case study on the bilateral symmetry of a larval Drosophila brain connectome. We translate notions of ‘bilateral symmetry’ to generative models of the network structure of the left and right hemispheres, allowing us to test and refine our understanding of symmetry. We find significant differences in connection probabilities both across the entire left and right networks and between specific cell types. By rescaling connection probabilities or removing certain edges based on weight, we also present adjusted definitions of bilateral symmetry exhibited by this connectome. This work shows how statistical inferences from networks can inform the study of connectomes, facilitating future comparisons of neural structures.
Editor's evaluation
This important work demonstrates a significant asymmetry between the connectivity statistics of the left and right hemispheres of the Drosophila larva brain. The evidence supporting the conclusions is compelling and represents a first step toward the development of statistical tests for comparing pairs of connectomes more generally. This work will therefore be of interest to the broad neuroscience community.
https://doi.org/10.7554/eLife.83739.sa0
Introduction
Connectomes – maps of neural wiring – have become increasingly important in neuroscience, and are thought to be an important window into studying how connectivity relates to neural activity, evolution, disease, genetics, and learning (Vogelstein et al., 2019; Abbott et al., 2020; Barsotti et al., 2021; Galili et al., 2022). However, many of these pursuits in connectomics depend on being able to compare networks. For instance, to understand how memory relates to connectivity, one would need to map a connectome which has learned something and one which has not, and then assess whether and how the two networks are different. To understand how a gene affects connectivity, one would need to map a connectome from an organism with a genetic mutation and one from a wild-type organism, and then assess whether and how the two networks are different. Authors have advocated for comparing connectomes across the phylogenetic tree of life (Barsotti et al., 2021; Galili et al., 2022), disease states (Abbott et al., 2020), life experiences (Galili et al., 2022; Abbott et al., 2020), development (Galili et al., 2022), and sex (Galili et al., 2022).
Several recent works have already started toward this goal of comparative connectomics. Gerhard et al., 2017, compared the connections in the nerve cord (the insect equivalent of a spinal cord) of the L1 and L3 stages of the larval Drosophila melanogaster to understand how these connections change as the animal develops. Similarly, Witvliet et al., 2021, collected connectomes from Caenorhabditis elegans at various life stages, and examined which connections were stable and which were dynamic across development. Cook et al., 2019, generated connectomes for both a male and hermaphrodite C. elegans worm to understand which aspects of this organism’s wiring diagram differ between the sexes. Valdes-Aleman et al., 2021, made genetic perturbations to different individual D. melanogaster fly larvae, and examined how these perturbations affected the connectivity of a local circuit in the organism’s nerve cord. Viewed through the lens of the wiring diagrams alone (i.e. ignoring morphology, subcellular structures, etc.), these pursuits all amount to comparing two or more networks.
In addition to those described above, one comparison that has been prevalent in the connectomics literature is to assess the degree of left/right structural similarity of a nervous system. Bilateria is a group of animals which have a left/right structural symmetry. This clade is thought to have emerged around 550 million years ago (Fedonkin and Waggoner, 1997), making it one of the oldest groups of animals. Most organisms studied in neuroscience (including C. elegans, D. melanogaster, mice, rats, monkeys, and humans) are bilaterians. While functional asymmetries in the brain have been discovered, this axis of structural symmetry is generally thought to extend to the brain (Hobert, 2014).
Connectomic studies have investigated this structural similarity in various ways. The degree of left/right symmetry in a single connectome has often been studied as a proxy or lower bound for the amount of stereotypy that one could expect between connectomes of different individuals. Lu et al., 2009, reconstructed the connectome of the axons projecting to the interscutularis muscle on the left and right sides of two individual mice. They found that the branching patterns of axons between the left and right sides within one animal were no more similar than a comparison between the two animals, and also no more similar than two random branching patterns generated by a null model. In contrast, Schlegel et al., 2021, found a striking similarity between the morphologies of neurons (as measured by NBLAST; Costa et al., 2016) in the left and right hemispheres of the D. melanogaster antennal lobe, and a similar level of stereotypy between the antennal lobes of two different individuals. Cook et al., 2019, used the observed level of left-right variability in a C. elegans hermaphrodite connectome as a proxy for the amount of variability in connectivity between individuals, assuming that one should expect the connectomes of the left and right to be the same up to developmental and experiential variability. Conversely, they also point out the fact that the ASEL neuron (on the left side) projects more strongly to neuron class AWC than the analogous version on the right, verifying this difference via fluorescent labeling in another animal. Similarly, in confocal imaging studies in Drosophila, the vast majority of genetically defined cell types were found to have bilaterally symmetric morphologies, but with one notable exception. 
A population of neurons projecting to the aptly named asymmetric body were found to preferentially target this structure on the right hemisphere in most animals (Jenett et al., 2012; Wolff and Rubin, 2018), and this bias was even found to be related to function (Pascual et al., 2004). These studies highlight the complicated relationship between neuroscientists and bilateral symmetry: at times, we may assume that the left and right sides of a nervous system are in some sense the same in expectation, but at other times we find marked, reproducible differences between them. To date, no study (to our knowledge) has framed this question of bilateral symmetry of connectivity as a statistical hypothesis comparing two networks.
In this work, we compare the connectivity of the left and the right hemispheres of an insect connectome from the perspective of statistical hypothesis testing. Motivated by the discussion above, we make three major contributions: (1) we formally state several notions of bilateral symmetry for connectomes as statistical hypotheses, (2) we present test procedures for each of these hypotheses of bilateral symmetry, and (3) we demonstrate the utility of these tests for understanding the significance and nature of bilateral symmetry/asymmetry in the brain of a D. melanogaster larva. In doing so, we provide a framework and methodology for any neuroscientist wishing to compare two networks, facilitating future work in comparative connectomics. We also provide Python implementations and documentation for the statistical tests for network comparison developed in this work.
Results
Connectome of a larval Drosophila brain
Recently, authors mapped a connectome of the brain of a D. melanogaster larva (Winding et al., 2023). To understand how the neurons in this brain were connected to each other, the authors first imaged this brain using electron microscopy (Ohyama et al., 2015), and then manually reconstructed each neuron and its pre- and post-synaptic contacts. This synaptic wiring diagram consists of 3016 neurons and over 548,000 synapses. We represent this connectome as a network, with nodes representing neurons and edges representing some number of synapses between them (Figure 1). Importantly, this work yielded a complete reconstruction of both the left and right hemispheres of the brain. In order to assess bilateral symmetry, we focused on the left-to-left and right-to-right (ipsilateral) induced subgraphs. While there are conceivable ways to define bilateral symmetry which include the contralateral connections, we did not consider them here in order to restrict our methods to the more widely applicable case of two-network-sample testing. More details on how we created the networks to compare here are available in Network construction. This process yielded a 1506 neuron network for the left hemisphere, and a 1506 neuron network for the right (note that the number of nodes in the two hemispheres need not have been exactly the same).
We sought to understand whether these two networks were significantly different according to some definition, in order to characterize whether this brain was bilaterally symmetric. As with any statistical hypothesis test, this required that we make some modeling assumptions about the nature of the networks being compared. We stress that our subsequent results should be interpreted in light of these models and what they do (and do not) tell us about these networks (see Váša and Mišić, 2022, for an excellent discussion of this point in network neuroscience, and see Limitations for a discussion of alternative modeling assumptions). For all of our models, we treated the networks as directed (since we knew the direction of synapses), unweighted (creating an edge when there was one or more synapse between neurons unless otherwise specified), and loopless (since we ignored any observed self-loops). We made no assumptions about whether individual neurons in the left hemisphere correspond with individual neurons in the right hemisphere. Next, we detail a series of more specific models, what aspects of the networks they characterize, and how we construct a hypothesis test from each.
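The preprocessing described above (ipsilateral induced subgraphs, treated as directed, unweighted, and loopless) can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the function name `hemisphere_subgraphs`, the `"L"`/`"R"` label encoding, and the toy synapse counts are hypothetical and are not taken from the published implementation.

```python
import numpy as np

def hemisphere_subgraphs(weights, side):
    """Split a weighted adjacency matrix into left and right ipsilateral networks.

    weights[i, j] is the synapse count from neuron i to neuron j (assumed layout);
    side[i] is "L" or "R" (assumed encoding of hemisphere labels).
    """
    weights = np.asarray(weights)
    side = np.asarray(side)
    networks = {}
    for label in ("L", "R"):
        idx = np.flatnonzero(side == label)
        sub = weights[np.ix_(idx, idx)]      # induced subgraph: ipsilateral edges only
        binary = (sub >= 1).astype(int)      # unweighted: edge if one or more synapses
        np.fill_diagonal(binary, 0)          # loopless: ignore self-loops
        networks[label] = binary
    return networks["L"], networks["R"]

# Toy example with four neurons (made-up synapse counts).
weights = np.array([
    [2, 3, 0, 1],
    [0, 0, 5, 0],
    [1, 0, 0, 4],
    [0, 2, 0, 7],
])
side = np.array(["L", "L", "R", "R"])
left, right = hemisphere_subgraphs(weights, side)
```

Contralateral entries (for example, the count from neuron 1 to neuron 2 in the toy matrix) are simply dropped, which mirrors the restriction to two-network-sample testing.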
Density test
Our first test of bilateral symmetry was based on perhaps the simplest network model, the Erdős-Rényi (ER) model (Gilbert, 1959; Erdős and Rényi, 1960), which models each potential edge as independently generated with the same probability, $p$. Comparing two networks under the ER model amounts to simply comparing their densities (Figure 2A), that is, testing the null hypothesis $H_0: p^{(L)} = p^{(R)}$ against the alternative $H_A: p^{(L)} \neq p^{(R)}$ (Equation 1).
This comparison of probabilities can be tested using well-established statistical machinery for two-sample tests under the binomial distribution (see ER model and density testing for more details). We refer to this procedure as the density test.
Figure 2B shows the comparison of the network densities between the left and right hemisphere networks. The densities of the left and right are ~0.016 and ~0.017, respectively, making the density of the left ~0.93 that of the right. To determine whether this is a difference likely to be observed by chance under the ER model, we ran a two-sided chi-squared test, which tests whether the probabilities of two independent binomials are significantly different. This test yielded an extremely small p-value, suggesting that we have strong evidence to reject this version of our hypothesis of bilateral symmetry. Although the estimated densities differ only modestly (a ratio of ~0.93), the p-value is this small because of the large sample size for this comparison: there are 2,266,530 potential edges on each of the left and right sides.
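To make the density test concrete, the following is a minimal sketch using SciPy's `chi2_contingency` on a 2×2 table of edges and non-edges. The helper names (`density_test`, `sample_er`) and the simulated 300-node networks are assumptions for illustration only; the real edge counts are not reproduced here. The sketch also checks the count of potential edges quoted above: a directed, loopless network on 1506 nodes has 1506 × 1505 = 2,266,530 ordered node pairs.

```python
import numpy as np
from scipy.stats import chi2_contingency

def density_test(adj_left, adj_right):
    """Two-sided chi-squared test comparing the densities of two directed,
    loopless binary networks (diagonals are assumed to be zero)."""
    table = []
    densities = []
    for adj in (adj_left, adj_right):
        adj = np.asarray(adj)
        n = adj.shape[0]
        n_possible = n * (n - 1)             # ordered pairs, no self-loops
        n_edges = int(np.count_nonzero(adj))
        table.append([n_edges, n_possible - n_edges])
        densities.append(n_edges / n_possible)
    stat, pvalue, _, _ = chi2_contingency(table, correction=False)
    return densities, stat, pvalue

def sample_er(n, p, rng):
    """Sample a directed, loopless Erdős-Rényi network (illustrative helper)."""
    adj = (rng.random((n, n)) < p).astype(int)
    np.fill_diagonal(adj, 0)
    return adj

rng = np.random.default_rng(0)
left = sample_er(300, 0.016, rng)    # simulated stand-ins, not the connectome
right = sample_er(300, 0.017, rng)
densities, stat, pvalue = density_test(left, right)

# Number of potential edges in each 1506-node hemisphere network.
n_potential = 1506 * 1505
```

With only 300 simulated nodes, a density ratio near 0.93 will often not be significant; the very large number of potential edges in the real networks is what gives the test its power.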
To our knowledge, when neuroscientists have considered the question of bilateral symmetry, they have not meant such a simple comparison of network densities. In many ways, the ER model is too simple to be an interesting description of connectome structure. However, it is also striking that perhaps the simplest network comparison produced a significant difference between brain hemispheres for this brain. It is unclear whether this difference in densities is biological (e.g. a result of slightly differing rates of development for this individual), an artifact of how the data was collected (e.g. technological limitations causing slightly lower reconstruction rates on the left hemisphere), or something else entirely. Still, in addition to highlighting a simple departure from symmetry in this dataset, the density test result also provides important considerations for other tests. More complicated models of symmetry could compare other network statistics – say, the clustering coefficients, the number of triangles, and so on. These statistics, as well as the model-based parameters we will consider in this paper, are strongly related to the network density (Suarez et al., 2022; Chen et al., 2021). Thus, if the densities are different, it is likely that tests based on any of these other test statistics will also reject the null hypothesis of bilateral symmetry. Later, we describe methods for adjusting for a difference in density in other tests for bilateral symmetry.
Group connection test
To understand whether this broad difference between the hemispheres can be localized to a specific set of connections, we next tested bilateral symmetry by making an assumption that the left and right hemispheres both come from a stochastic block model (SBM). Under the SBM, each neuron is assigned to a group, and the probability of any potential edge is a function of the groups to which the source and target neurons belong. For instance, the probability of a connection from a neuron in group $k$ to a neuron in group $l$ is set by the parameter $B_{kl}$, where $B$ is a $K \times K$ matrix of connection probabilities if there are $K$ groups. Here, we used broad cell type categorizations from Winding et al., 2023, to determine each neuron’s group (see Figure 3—figure supplement 1 for the number of neurons in each group in each hemisphere, see Table 1 for naming conventions). Alternatively, there are many methods for estimating these assignments to groups for each neuron which we do not explore here (see Limitations for discussion on this point). Under the SBM with a fixed group assignment for each node, testing for bilateral symmetry amounts to testing whether the group-to-group connection probability matrices, $B^{(L)}$ and $B^{(R)}$, are the same, that is, testing $H_0: B^{(L)} = B^{(R)}$ against $H_A: B^{(L)} \neq B^{(R)}$ (Equation 2).
Rather than having to compare one probability as in Equation 1, we were interested in comparing all group-to-group connection probabilities between the SBM models for the left and right hemispheres. We developed a novel statistical hypothesis test for this comparison, which uses many tests to compare each of the group-to-group connection probabilities, followed by appropriate correction for multiple comparisons (when examining the individual group-to-group connections) or combination of p-values (when assessing the overall null hypothesis in Equation 2). Details on the methodology used here are provided in SBM and group connection testing, and the procedure is shown as a schematic in Figure 3A. We refer to this procedure as the group connection test.
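The structure of the group connection test (one two-sample test per group-to-group block, then either multiple-comparisons correction or p-value combination) can be sketched as follows. Several choices here are assumptions rather than details confirmed by the text above: the per-block test is a chi-squared test, the correction is Holm's step-down method, the combination uses Fisher's method via `scipy.stats.combine_pvalues`, and blocks are skipped when either hemisphere has no edges. All function names and the simulated data are hypothetical.

```python
import numpy as np
from scipy.stats import chi2_contingency, combine_pvalues

def block_counts(adj, groups, labels):
    """Edge counts and potential-edge counts for each ordered pair of groups."""
    adj, groups = np.asarray(adj), np.asarray(groups)
    k = len(labels)
    edges = np.zeros((k, k), dtype=int)
    possible = np.zeros((k, k), dtype=int)
    for a, src in enumerate(labels):
        rows = np.flatnonzero(groups == src)
        for b, tgt in enumerate(labels):
            cols = np.flatnonzero(groups == tgt)
            edges[a, b] = np.count_nonzero(adj[np.ix_(rows, cols)])
            # Loopless network: a within-group block has n * (n - 1) potential edges.
            possible[a, b] = len(rows) * (len(cols) - int(a == b))
    return edges, possible

def holm(pvals):
    """Holm step-down adjusted p-values (assumed choice of correction)."""
    pvals = np.asarray(pvals, dtype=float)
    m = len(pvals)
    adjusted = np.empty(m)
    running = 0.0
    for rank, i in enumerate(np.argsort(pvals)):
        running = max(running, (m - rank) * pvals[i])
        adjusted[i] = min(1.0, running)
    return adjusted

def group_connection_test(adj_l, groups_l, adj_r, groups_r, labels):
    e_l, m_l = block_counts(adj_l, groups_l, labels)
    e_r, m_r = block_counts(adj_r, groups_r, labels)
    k = len(labels)
    pvalues = np.full((k, k), np.nan)
    for a in range(k):
        for b in range(k):
            table = np.array([[e_l[a, b], m_l[a, b] - e_l[a, b]],
                              [e_r[a, b], m_r[a, b] - e_r[a, b]]])
            # Skip blocks where a comparison of probabilities is not meaningful.
            if e_l[a, b] == 0 or e_r[a, b] == 0 or (table.sum(axis=0) == 0).any():
                continue
            pvalues[a, b] = chi2_contingency(table, correction=False)[1]
    tested = ~np.isnan(pvalues)
    corrected = np.full((k, k), np.nan)
    corrected[tested] = holm(pvalues[tested])
    # Fisher's method assumes independent component tests (an assumption here).
    _, overall = combine_pvalues(pvalues[tested], method="fisher")
    return pvalues, corrected, overall

def sample_sbm(groups, B, labels, rng):
    """Sample a directed, loopless SBM network (illustrative helper)."""
    idx = np.array([labels.index(g) for g in groups])
    probs = B[idx[:, None], idx[None, :]]
    adj = (rng.random(probs.shape) < probs).astype(int)
    np.fill_diagonal(adj, 0)
    return adj

rng = np.random.default_rng(1)
labels = ["a", "b"]
groups = np.array(["a"] * 30 + ["b"] * 40)
B = np.array([[0.3, 0.1], [0.05, 0.2]])     # made-up block probabilities
adj_l = sample_sbm(groups, B, labels, rng)
adj_r = sample_sbm(groups, B, labels, rng)
pvalues, corrected, overall = group_connection_test(adj_l, groups, adj_r, groups, labels)
```

Because each block is tested separately, the matrix of corrected p-values points to which group-to-group connections differ, while the combined p-value addresses the overall null hypothesis.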
Figure 3B shows both of the estimated group-to-group probability matrices, $\hat{B}^{(L)}$ and $\hat{B}^{(R)}$. From a visual comparison of $\hat{B}^{(L)}$ and $\hat{B}^{(R)}$, the group-to-group connection probabilities appear qualitatively similar. Note also that some group-to-group connection probabilities are zero, making it nonsensical to do a comparison of probabilities. We highlight these elements in the matrices with explicit ‘0’s, and note that we did not run the corresponding test in these cases. Figure 3C shows the p-values from all 285 tests that were run to compare each element of these two matrices. After multiple comparisons correction, seven tests produced p-values less than the significance level $\alpha = 0.05$, indicating that we could reject the null hypothesis that those specific connection probabilities are the same between the two hemispheres. We also combined all (uncorrected) p-values into an overall p-value for the entire null hypothesis (Equation 2) of equivalence of group-to-group connection probabilities, which also led us to reject this null hypothesis.
Taken together, these results suggest that while the group-to-group connections are roughly similar between the two hemispheres, they are not the same under this model. Notably, there are seven group-to-group connections which were significantly different, including Kenyon cells (KC) → KC, lateral horn neurons (LHN) → other, other → LHN, other → other, projection neurons (PN) → LHN, and somatosensory projection neurons → other. We stress that, as with any statistical test, a lack of a significant difference (e.g. in other subgraphs) could be the result of the null hypothesis of no difference being true, or simply from a lack of power against a particular alternative (see Figure 3—figure supplement 2 and Figure 3—figure supplement 3 for analysis of the power of this test in simulation, and Helwegen et al., 2023, for an excellent discussion on this point). Nevertheless, knowing some neuron groups which are wired significantly differently between the two hemispheres highlights the interpretability of this test. If a neuroscientist wanted to study mechanisms which could cause bilateral asymmetries in the brain, these seven group-to-group connections would be prime candidates for investigation.
However, in Density test, we saw that the densities of the two networks are significantly different. The density of the network, $p$, can be thought of as a weighted average of the individual group-to-group connection probabilities, $B_{kl}$. Should we then be surprised that if the density is different, the group-to-group connection probabilities are, too? Interestingly, for all the group-to-group connection probabilities which are different, the probability on the right hemisphere (which has the greater density) is higher (Figure 3D). We consider this phenomenon in the next section.
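The weighted-average relationship between the overall density and the block-wise connection probabilities can be checked numerically. In the sketch below, each estimated block probability is weighted by that block's share of all potential edges; the random network and the two-group labeling are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 12
group = np.array([0] * 5 + [1] * 7)           # made-up group labels
adj = (rng.random((n, n)) < 0.3).astype(int)  # made-up directed network
np.fill_diagonal(adj, 0)                      # loopless

total_possible = n * (n - 1)
density = adj.sum() / total_possible

weighted_average = 0.0
for k in range(2):
    for l in range(2):
        rows, cols = group == k, group == l
        # Potential edges in this block (no self-loops within a group).
        possible = int(rows.sum()) * (int(cols.sum()) - int(k == l))
        b_hat = adj[np.ix_(rows, cols)].sum() / possible
        weighted_average += (possible / total_possible) * b_hat
```

The two quantities agree exactly (up to floating-point error), which is why a difference in density is expected to show up as differences in at least some block probabilities.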
Density-adjusted group connection test
Next, we examined whether the group-to-group connection probabilities on the right are simply a ‘scaled-up’ version of those on the left. Figure 3D showed that for all the individual connections which are significant, the connection probability on the right hemisphere is higher. This is consistent with the hypothesis stated above, which predicts that the connection probabilities in $B^{(R)}$ should be consistently higher than those in $B^{(L)}$.
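We thus created a test for this notion of bilateral symmetry in group-to-group connections (up to a density adjustment), with null hypothesis $H_0: B^{(L)} = c B^{(R)}$ and alternative $H_A: B^{(L)} \neq c B^{(R)}$, where $c$ is a density-adjusting constant (Equation 3).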
Note that these adjusted hypotheses do not test whether the densities of all subgraphs of the left and right hemisphere networks are the same; rather, they ask whether a single scaling factor ($c$ in Equation 3) makes any significant density differences disappear from our previous comparison. To implement this hypothesis test, we first computed the density-correcting constant $c$, which is simply the ratio of the left to the right hemisphere densities, finding that $c \approx 0.93$. Then, we replaced each of the component tests in the group connection test with a modified version of the standard chi-squared test for non-unity probability ratios (see Density-adjusted group connection testing for more details) (Miettinen and Nurminen, 1985). We refer to this procedure as the density-adjusted group connection test (Figure 4A). The p-values for each of the component tests for the density-adjusted group connection test are shown in Figure 4B. After correction for multiple comparisons, there are two group-to-group connections which are significantly different (at significance level 0.05): KC → convergence neurons (CN) and KC → mushroom body output neurons (MBON). Thus, all significant differences between the hemispheres under this version of the SBM are associated with the Kenyon cells.
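To illustrate what a component test of a non-unity probability ratio looks like, the sketch below tests whether a left-hemisphere block probability equals a scaling constant (called `c` here) times the right-hemisphere one. Note the substitution: this is a likelihood-ratio test with a numerically optimized null, not the modified chi-squared (score) test of Miettinen and Nurminen used in the paper, and all counts are made up. It assumes each block has at least one edge and one non-edge.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import chi2

def binom_loglik(x, n, p):
    """Binomial log-likelihood up to an additive constant."""
    return x * np.log(p) + (n - x) * np.log1p(-p)

def ratio_lrt(x_l, n_l, x_r, n_r, c):
    """Likelihood-ratio test of H0: p_left = c * p_right for two binomials.

    Assumes 0 < x < n on both sides so every log-likelihood is finite.
    """
    loglik_alt = binom_loglik(x_l, n_l, x_l / n_l) + binom_loglik(x_r, n_r, x_r / n_r)
    eps = 1e-12
    upper = min(1.0, 1.0 / c)                # keeps both p_right and c * p_right below 1
    res = minimize_scalar(
        lambda p_r: -(binom_loglik(x_l, n_l, c * p_r) + binom_loglik(x_r, n_r, p_r)),
        bounds=(eps, upper - eps),
        method="bounded",
        options={"xatol": 1e-10},
    )
    stat = max(0.0, 2.0 * (loglik_alt + res.fun))   # res.fun is minus the null log-likelihood
    return stat, chi2.sf(stat, df=1)

# Made-up whole-network counts, used only to illustrate the scaling constant.
edges_left, edges_right, possible = 930, 1000, 60000
c = (edges_left / possible) / (edges_right / possible)   # left density / right density

# One made-up block that follows the global scaling, and one that does not.
stat_same, p_same = ratio_lrt(93, 1000, 100, 1000, c)
stat_diff, p_diff = ratio_lrt(200, 1000, 100, 1000, c)
```

The first block is sparser on the left by exactly the global ratio, so it is not flagged; the second departs from the global scaling and is. This is the sense in which the adjusted test asks only about differences beyond a single network-wide factor.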
Removing Kenyon cells
Based on the results of Figure 4C, we sought to verify that the remaining differences in group-to-group connection probabilities after adjusting for a difference in density can be explained by asymmetry that is isolated to the Kenyon cells. To confirm this, we simply removed the Kenyon cells (i.e. all Kenyon cell nodes and edges to or from those nodes) from both the left and right hemisphere networks, and then re-ran each of the tests for bilateral symmetry presented here (Figure 5A). We observed significant differences between the left and right hemispheres for the density and group connection tests when excluding Kenyon cells (Figure 5B and C). However, for the density-adjusted group connection test, the p-value was ~0.60, indicating that we no longer rejected bilateral symmetry under this definition when the Kenyon cells are excluded from the analysis (Figure 5D). This sequence of results suggests that the difference between the left and right hemispheres (at least in terms of the high-level network statistics studied here) can be explained as the combination of a global effect (the difference in density) and a cell type-specific effect (the difference in Kenyon cell projection probabilities).
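Removing a cell type before re-running the tests amounts to dropping the corresponding rows and columns of each adjacency matrix. A minimal sketch, with a hypothetical helper name `remove_group` and made-up labels:

```python
import numpy as np

def remove_group(adj, groups, label):
    """Drop all nodes with the given group label, and every edge to or from them."""
    adj = np.asarray(adj)
    groups = np.asarray(groups)
    keep = groups != label
    return adj[np.ix_(keep, keep)], groups[keep]

# Toy network with made-up labels; "KC" marks the nodes to remove.
adj = np.array([
    [0, 1, 1, 0],
    [1, 0, 0, 1],
    [0, 1, 0, 1],
    [1, 0, 0, 0],
])
groups = np.array(["KC", "PN", "KC", "MBON"])
adj_no_kc, groups_no_kc = remove_group(adj, groups, "KC")
```

The same call would be applied to the left and right networks separately before repeating each test.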
It is noteworthy that the Kenyon cells were the specific cell type where we detected asymmetry after correcting for the density difference. Kenyon cells are involved in associative learning in Drosophila and other insects (Heisenberg, 2003; Aso et al., 2014; Eichler et al., 2017). Other studies have suggested that certain connections (specifically from antennal lobe projection neurons to Kenyon cells) are random (Caron et al., 2013; Eichler et al., 2017). The marked lack of symmetry we observed specifically in the Kenyon cells in the current study could be the result of these features, which suggest their uniquely non-stereotyped patterns of connectivity in this nervous system.
Edge weight thresholds
Next, we sought to examine how the definition of an edge used to construct our binary network affects the degree of symmetry under each of the definitions considered here. For the networks considered in the previous sections, we considered an edge to exist if one or more synapses from neuron $i$ to neuron $j$ were in the dataset. To understand how our analysis might change based on this assumption, we considered two types of edge weight threshold schemes for creating a binary network before testing: the first based simply on a threshold on the number of synapses, and the second based on a threshold of the proportion of a downstream neuron’s input (Figure 6A). By varying the threshold in both schemes, we were able to evaluate many hypotheses about bilateral symmetry, where higher thresholds meant that we only considered the symmetry present in strong edges (Figure 6B).
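The two thresholding schemes can be sketched as below. The conventions are assumptions for illustration: `weights[i, j]` holds the synapse count from neuron i to neuron j, edges are kept when the weight is greater than or equal to the threshold, and a downstream neuron's total input is taken as its column sum within the given matrix (the original analysis may define total input differently).

```python
import numpy as np

def threshold_by_synapse_count(weights, threshold):
    """Keep an edge when it has at least `threshold` synapses."""
    weights = np.asarray(weights)
    return (weights >= threshold).astype(int)

def threshold_by_input_proportion(weights, threshold):
    """Keep an edge when it supplies at least `threshold` of the target's input."""
    weights = np.asarray(weights, dtype=float)
    total_input = weights.sum(axis=0, keepdims=True)   # column j: all input onto neuron j
    proportion = np.divide(
        weights, total_input, out=np.zeros_like(weights), where=total_input > 0
    )
    return ((proportion >= threshold) & (weights > 0)).astype(int)

# Toy synapse-count matrix (made-up data).
weights = np.array([
    [0, 1, 4],
    [2, 0, 0],
    [6, 3, 0],
])
by_count = threshold_by_synapse_count(weights, 2)
by_proportion = threshold_by_input_proportion(weights, 0.5)

# Fraction of the original edges (one or more synapses) removed by each threshold.
n_original = np.count_nonzero(weights)
removed_by_count = 1 - by_count.sum() / n_original
removed_by_proportion = 1 - by_proportion.sum() / n_original
```

In the toy matrix, the two-synapse edge survives the count threshold but not the proportion threshold, because it supplies only a quarter of its target's input. Sweeping the threshold and re-running each test on the resulting binary networks gives one p-value per threshold.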
Before running the tests for each of these notions of symmetry, we first examined the distributions of edge weights to get a sense for how we should expect these tests to perform. Figure 6C and D display the distribution (total count) for the synapse count or input proportion edge weights, respectively. The right hemisphere has more connections than the left for all synapse count values (Figure 6C), hinting that the density of the right hemisphere will be slightly higher for any potential edge weight threshold using this definition. Conversely, the distributions of weights as an input percentage show a different trend. For edge weights less than ~1.25%, the right appears to have more edges, but past this threshold, the counts of edges between left and right appear comparable (Figure 6D).
Figure 6E and F show the effect of varying these thresholds on the p-values from each of our tests of bilateral symmetry. We observed that for either thresholding scheme (synapse count or input proportion), the p-value for each test generally increased as a function of the threshold – in other words, the left and right hemisphere networks became less significantly different (under the definitions of ‘different’ we have presented here) as the edge weight threshold increased. Previous works have shown that higher-weight edges are more likely to have that corresponding edge present on the other side of the nervous system (Gerhard et al., 2017; Ohyama et al., 2015). Here, we provide evidence that considering networks formed from only strong edges also decreases asymmetry at a broad, network-wide level.
To make these two thresholding schemes more comparable, we also examined these results as a function of the proportion of edges from the original network which that threshold removed (Figure 6E and F, lower x-axis). We found that when thresholding based on synapse counts, the majority (~60%) of the edges of the networks need to be removed for any test (in this case the density-adjusted group connection test) to yield non-significant p-values. Conversely, for the thresholds based on input proportion, the density-adjusted group connection test yielded a p-value greater than 0.05 after removing only the weakest ~20% of edges. Strikingly, we observed that when considering only the strongest ~60% of edges in terms of input proportion, even the density test had a high p-value (>0.05), while for the synapse-based thresholds we examined, this never occurred. We observed similar trends when running a thresholding experiment in isolation on the KC → KC subgraph (Figure 6—figure supplement 1).
These findings are consistent with previous work in connectomics which has hinted at the importance of input proportion as a meaningful ‘edge weight.’ Gerhard et al., 2017, compared the connectivity of select neurons in the nerve cord between L1 and L3 stages of the larva. They observed that while the number of synapses from the mdIV cell type onto various nerve cord local neurons can grow ~3- to 10-fold from L1 to L3, the proportion of that downstream neuron’s input stays relatively conserved. Based on this finding, the authors suggested that perhaps the nervous system evolved to keep this parameter constant as the organism develops. An analysis of wiring in the olfactory system of the adult Drosophila suggested a similar interpretation after examining a projection neuron cell type with an asymmetric number of neurons on the two sides of the brain (Tobin et al., 2017). Here, we provide further evidence based on the entire brain of the Drosophila larva that while the left and right hemispheres may appear significantly different when considering all observed connections, the networks formed by only the strongest edges (especially in terms of input proportion) are not significantly different between the hemispheres when viewed through the lens of the models considered in this work.
Discussion
Summary
We began with what was on its face a very simple question: is the connectivity on the left and the right side of this brain ‘different?’ We then described several ways that one could mathematically formalize notions of ‘different’ from the perspective of network model parameters: difference in density of connections across the entire network (Density test), difference in group connection probabilities (Group connection test), or difference in group connection probabilities while adjusting for a difference in density (Density-adjusted group connection test). We proposed a test procedure corresponding with each of these notions, relying on well-established statistical techniques for evaluating contingency tables and combining p-values to construct our tests. The results of these different test procedures varied markedly (Table 2). Specifically, we saw that the network densities were significantly different between the hemispheres. The group connection test also detected a difference, highlighting seven group-to-group connections which had significantly differing connection probabilities when comparing the hemispheres. However, when we added an adjustment to the group connection comparison to account for the difference in network density, this test had only two significant group connections, and both were projections from the Kenyon cells. Thus, the asymmetry observed (at least when viewed through the lens of these high-level network statistics) between the hemispheres can be thought of as a global density difference in addition to a cell type-specific effect shown in the Kenyon cells. We confirmed this finding by simply removing the Kenyon cells, and showing that the density-adjusted group connection test no longer rejected (Removing Kenyon cells). Finally, we examined whether the left and right hemisphere networks would become less dissimilar when only high-edge-weight edges were considered (Edge weight thresholds).
We found that whether thresholding based on number of synapses or the proportion of input to the post-synaptic neuron, p-values generally increased for each test (i.e. less significant asymmetry was detected) as the edge weight threshold grew. However, we observed that thresholds based on neuron input proportion could achieve symmetry while removing fewer edges (only ~20% for some tests). These results are consistent with the idea that the nervous system evolved to preserve a relative balance of inputs to individual neurons, which has been suggested by previous studies on specific subcircuits in the larval and adult Drosophila nervous system (Gerhard et al., 2017; Tobin et al., 2017; Berck et al., 2016).
Limitations
As with any statistical inference, our conclusions are valid under particular model assumptions. Therefore, it is important to highlight the assumptions which motivated each of our tests in order to understand what each p-value means (and what it does not). We highlight several of these assumptions below, and comment on alternative assumptions that one could make in each case.
What model?
First, while we motivated the tests presented here by assuming that some statistical model produced the connectivity of the left and the right hemispheres, these models do not literally describe the process which generated these networks. However, without knowledge of how genes and development give rise to the connectome, we know of no more correct model for how this connectome was generated (Vogelstein et al., 2019; Witvliet et al., 2021; Barabási and Barabási, 2020) (and even this would still be just a model). Without an agreed-upon definition of bilateral symmetry, we chose to start from the simplest definition of what one could mean by bilateral symmetry. From this simplest network model, we iteratively added complexity to the definition of bilateral symmetry until we found the simplest model for which the Drosophila larva connectome displayed no significant asymmetry. We also note that previous studies have found associations between the test statistic we study here (graph or subgraph density) and various other biological properties, such as development (Witvliet et al., 2021), neurodegeneration (Pfeiffer et al., 2020), and phylogenetics (Suarez et al., 2022).
However, many other network models could have been applied to examine different definitions of bilateral symmetry. For instance, SBM may fail to capture certain features of an empirically observed network, such as degree distributions. This led to the development of the popular degree-corrected SBM (Karrer and Newman, 2011), which adds parameters to account for heterogeneous node degrees. A modified group connection test which also compares these degree correction parameters would be a natural extension of the current work, but requires further study to establish as a valid statistical test. Tests based on the random dot product graph model (Tang et al., 2017; Athreya et al., 2018; Chung et al., 2022) would allow us to compare connection probabilities between hemispheres without assuming that neurons belong to a finite number of groups. Bravo-Hermsdorff et al., 2021, showed that a two-network-sample test could be constructed from subgraph counts, which they argue characterize a network’s ‘texture’ rather than its ‘backbone’ as studied in this work. We also did not use network models that incorporate edge weights, as two-network-sample tests for this case are even less developed than for the unweighted case. Further, a variety of neuroscience-specific network models (such as those which incorporate spatial information) have been proposed (Váša and Mišić, 2022). Nevertheless, we note that even if one is concerned with these more elaborate notions of symmetry, they are still related to the simple models studied here. For instance, the network density would affect a network’s latent positions under the random dot product graph model, as well as the count of any possible subgraph. Thus, even if one prefers a different definition of bilateral symmetry, the definitions presented here were worth testing.
What is a cell type?
Second, even if these networks were generated from SBMs, alternative groupings of neurons could have been used. We used broad cell type categorizations from previous literature (Winding et al., 2023) to partition our network into groups. However, we could have used a coarser partition, categorizing neurons as sensory, interneuron, and descending/output. Conversely, we could have used a finer partition, splitting the cell types used here into subgroups (such as whether a sensory neuron receives odor or visual information). As these different partitions likely lead to different subgraph sizes and connection probabilities, the statistical power of the group connection test would also be affected by these choices (Helwegen et al., 2023). Thus, the results presented for any group connection test need to be interpreted in terms of the specific cell type groupings used.
Further, a rich literature exists on inferring the partition for an SBM from the observed connectivity (Lee and Wilkinson, 2019; Peixoto, 2014; Peixoto, 2017; Rohe et al., 2011; Sussman et al., 2012; Funke and Becker, 2019) – this is one perspective for clustering neurons based on their observed connectivity, much like clustering procedures are used to predict meaningful groups of neurons based on morphology, activity, or gene expression. Applying these techniques to a connectome would yield alternative groupings of neurons (as in Winding et al., 2023) to use for a group connection test, which again could change its conclusions. However, this approach requires further study, as it introduces a new source of uncertainty since more model parameters are estimated from the data.
What about neuron pairs?
Third, we assumed that the two networks we observed were unmatched – that is, the tests we applied did not use any pairing of individual neurons between hemispheres. In Drosophila, this 1-to-1 neuron correspondence is known to exist for most neurons, particularly in the larva. GAL-4 lines are able to reliably label bilateral neuron pairs on the basis of their gene expression (Jenett et al., 2012; Eschbach et al., 2020). These neurons tend to be similar in terms of their morphology and their connectivity (Winding et al., 2023; Ohyama et al., 2015; Pedigo et al., 2022; Schlegel et al., 2021; Eschbach et al., 2020; Gerhard et al., 2017; Schneider-Mizell et al., 2016). Methods which use this pairing (e.g. Tang et al., 2017; Ghoshdastidar and Von Luxburg, 2018; Bhadra et al., 2019, as well as tests based on correlated ER and SBM models) would be able to evaluate symmetry in light of edge correspondences between the two networks, and could have higher power at detecting certain asymmetries. However, these methods assume that the matching of nodes is perfect and complete – if even one neuron pairing is a mistake, or if even one neuron does not have a partner in the opposite hemisphere, then these tests could be invalid or inapplicable. We note that graph matching techniques could estimate a correspondence between nodes for all neurons (Fishkind et al., 2019; Vogelstein et al., 2015; Saad-Eldin et al., 2021; Winding et al., 2023; Pedigo et al., 2022); however, the statistical consequences of first learning this (likely imperfect) alignment prior to using a method which assumes the alignment is known and exact have not been thoroughly studied, so we did not explore it further here.
Outlook
We presented the first statistical comparison of bilateral networks in a neuron-level brain connectome. While we focused on the larval Drosophila brain connectome, these techniques could be applied to future connectomes to evaluate bilateral symmetry in other individuals or organisms. More generally, we presented several notions that can be used to compare two networks, a particularly relevant problem in the current age of connectomics. Human (macroscale) connectomics has seen an explosion in the number of network samples that can be obtained, allowing for different approaches for comparing connectomes across populations, from simple comparisons of edges (Ingalhalikar et al., 2014) to low-rank and sparse regressions across networks (Xia et al., 2020). However, nanoscale connectomics is still technologically limited in its acquisition rate, often to only one or at best a few (<10, e.g. Witvliet et al., 2021) individuals for a given experiment. Nevertheless, we wish to make valid inferences and comparisons between these connectomes (Vogelstein et al., 2019; Barsotti et al., 2021; Abbott et al., 2020; Galili et al., 2022). The framework for two-network-sample testing presented here will facilitate these kinds of comparisons. To make these comparisons more practical to neuroscientists, we demonstrated the importance of adjustments to simple null hypotheses – as we saw, even a difference in something as simple as a network density can be related to other network comparisons. For example, take the problem of comparing the connectome of the larval and adult Drosophila. Since the adult Drosophila brain has orders of magnitude more nodes (Raji and Potter, 2021; Winding et al., 2023; Bates et al., 2020), the density of this network is likely to be smaller than that of the larva. Therefore, we may want to consider a more subtle question – are the connectomes of the adult and larva different (and if so, how) after adjusting for this difference in density? 
These kinds of biologically motivated adjustments to out-of-the-box statistical hypotheses will be key to drawing valid inferences from connectomes which are also relevant to meaningful questions in neuroscience.
Methods
Network construction
Here, we explain how we generated networks for the bilateral symmetry comparison. We started from a network of all neurons in the brain and sensory neurons which project into it for a larval Drosophila (Winding et al., 2023). As in Winding et al., 2023, we removed neurons which were considered partially differentiated. From this network, we selected only the left-to-left (ipsilateral) induced subgraph, and likewise for the right-to-right. We ignored a pair of neurons which had no left/right designation, as their cell bodies lie on the midline (Winding et al., 2023). To ensure that the network for each hemisphere was connected, we took the largest weakly connected component of neurons on the left, and likewise on the right.
With this selection for our nodes of interest, we then chose our set of edges to be:
Unweighted: we only considered the presence or absence of a connection, creating a binary network. For most analyses except where explicitly indicated, this meant we considered an edge to exist if there was at least one synapse from the source to the target neuron. For this connectome, four edge types are available: axo-axonic, axo-dendritic, dendro-dendritic, and dendro-axonic. We made no distinction between these four edge types when constructing the binary networks. One could consider notions of bilateral symmetry for a weighted network, but we focused on the unweighted case for simplicity and the fact that most network models are for binary networks. We studied the effect of varying the edge weight requirement (i.e. the threshold) for an edge to exist in Edge weight thresholds.
Directed: we allowed for a distinction between edges which go from neuron $i$ (pre-synaptic) to neuron $j$ (post-synaptic) and the reverse.
Loopless: we removed any edges which go from neuron $i$ to neuron $i$, as the theory on network testing typically makes this assumption. We note that while ~18% of neurons have a connection to themselves, these self-loops comprise only ~0.7% of edges.
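As a rough illustration of the construction described above, the sketch below binarizes a table of synapse counts into a directed, loopless edge set and then keeps the largest weakly connected component. The function names, the dictionary input format, and the toy data are assumptions made for illustration and are not taken from the paper's code.

```python
from collections import deque


def binarize(synapse_counts, threshold=1):
    """Directed, loopless, unweighted edges: keep (i, j) if i != j and count >= threshold."""
    return {
        (i, j)
        for (i, j), count in synapse_counts.items()
        if i != j and count >= threshold
    }


def largest_weakly_connected_component(nodes, edges):
    """Largest set of nodes that is connected when edge direction is ignored."""
    neighbors = {node: set() for node in nodes}
    for i, j in edges:
        neighbors[i].add(j)
        neighbors[j].add(i)
    visited, largest = set(), set()
    for start in nodes:
        if start in visited:
            continue
        # Breadth-first search over the undirected version of the network.
        component, queue = {start}, deque([start])
        while queue:
            node = queue.popleft()
            for other in neighbors[node] - component:
                component.add(other)
                queue.append(other)
        visited |= component
        if len(component) > len(largest):
            largest = component
    return largest


# Hypothetical toy data: (pre-synaptic, post-synaptic) -> synapse count.
toy_counts = {("a", "b"): 3, ("b", "c"): 1, ("c", "c"): 5, ("d", "e"): 2}
toy_nodes = ["a", "b", "c", "d", "e"]
toy_edges = binarize(toy_counts)
kept_nodes = largest_weakly_connected_component(toy_nodes, toy_edges)
kept_edges = {(i, j) for (i, j) in toy_edges if i in kept_nodes and j in kept_nodes}
```

In this toy example the self-loop on `c` is dropped and the smaller component `{d, e}` is discarded, mirroring the loopless and connected-component steps in the text.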
When comparing two networks, methods may make differing assumptions about the nature of the two networks being compared. One of the most important is whether the method assumes a correspondence between nodes (Tantardini et al., 2019). Some methods (matched comparisons, also called known node correspondence) require that the two networks being compared have the same number of nodes, and that for each node in network 1, there is a known node in network 2 which corresponds to it. Other methods (unmatched comparisons, also called unknown node correspondence) do not have this requirement. To make an analogy to the classical statistical literature on two-sample testing, this distinction is similar to that between an unpaired (unmatched) and a paired (matched) t-test. We focused on the unmatched case in this work, where we say nothing about whether any neurons on the left correspond with any specific neurons on the right.
Two-network-sample testing
Here, we describe in more detail the methods used to evaluate bilateral symmetry, each of which is based on some generative statistical model for the network. For each model, we formally define the model, describe how its parameters can be estimated from observed data, and then explain the test procedure motivated by the model. A more thorough review of these models can be found in Chung et al., 2021.
Independent edge random networks
Many statistical network models fall under the umbrella of independent edge random networks, sometimes called the inhomogeneous ER model. Under this model, the elements of the network’s adjacency matrix $A$ are sampled independently from a Bernoulli distribution:
$$A_{ij} \sim \mathrm{Bernoulli}(P_{ij}).$$
If $n$ is the number of nodes, the matrix $P$ is an $n \times n$ matrix of probabilities with elements in $[0, 1]$. Depending on how the matrix $P$ is constructed, we can create different models. We next describe several of these choices. Note that for each model, we assume that there are no loops, or in other words the diagonal of the matrix $P$ will always be set to zero.
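To make the independent edge model concrete, the following minimal sketch samples a directed, loopless adjacency matrix from a matrix of edge probabilities. The function name and the toy parameter values are hypothetical; setting every off-diagonal probability to the same value gives the ER special case discussed next.

```python
import random


def sample_independent_edge_network(P, rng):
    """Sample A[i][j] ~ Bernoulli(P[i][j]) independently, forcing a zero diagonal."""
    n = len(P)
    return [
        [1 if i != j and rng.random() < P[i][j] else 0 for j in range(n)]
        for i in range(n)
    ]


rng = random.Random(0)
n_nodes, p_global = 5, 0.3  # illustrative values only
P_er = [[p_global] * n_nodes for _ in range(n_nodes)]
A = sample_independent_edge_network(P_er, rng)
```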
ER model and density testing
Perhaps the simplest model of a network is the ER model. This model treats every potential edge in the network as equally likely to occur. Thus, for all $(i, j)$ with $i \neq j$, and with $i$ and $j$ both running from $1$ to $n$, the probability of the edge $(i, j)$ occurring is
$$P_{ij} = p,$$
where $p$ is the global connection probability.
Thus, for this model, the only parameter of interest is the global connection probability, $p$. This is sometimes also referred to as the network density. For a directed, loopless network with $n$ nodes, there are $n(n-1)$ unique potential edges (since we ignore the elements on the diagonal of the adjacency matrix). If the observed network $A$ has $m$ total edges, then the estimated density is simply
$$\hat{p} = \frac{m}{n(n-1)}.$$
In order to compare two networks $A^{(L)}$ and $A^{(R)}$ under this model, we simply need to compute these estimated network densities ($\hat{p}^{(L)}$ and $\hat{p}^{(R)}$), and then run a statistical test to see if these densities are significantly different. Under this model, the total number of edges $m$ comes from a $\mathrm{Binomial}(n(n-1), p)$ distribution. This is because the number of edges is the sum of independent Bernoulli trials with the same probability. If $m^{(L)}$ is the number of edges on the left hemisphere, and $m^{(R)}$ is the number of edges on the right, then we have:
$$m^{(L)} \sim \mathrm{Binomial}\left(n^{(L)}(n^{(L)}-1),\, p^{(L)}\right),$$
and independently,
$$m^{(R)} \sim \mathrm{Binomial}\left(n^{(R)}(n^{(R)}-1),\, p^{(R)}\right).$$
To compare the two networks, we are interested in a comparison of $p^{(L)}$ vs. $p^{(R)}$. Formally, we are testing:
$$H_0: p^{(L)} = p^{(R)}, \quad H_A: p^{(L)} \neq p^{(R)}.$$
Fortunately, the problem of testing for equal proportions under the binomial is well studied. In our case, we used a chi-squared test (Agresti, 2013) to run this test for the null and alternative hypotheses above.
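The density test can be sketched in a few lines. The example below estimates each density and applies a Pearson chi-squared test for equal proportions, written out with the standard library so that it is self-contained. It assumes no continuity correction, which may differ from the exact variant used in the paper. The edge and node counts are made-up illustrative values, not the connectome's. In practice a library routine such as `scipy.stats.chi2_contingency` would be a natural choice.

```python
import math


def estimated_density(n_edges, n_nodes):
    """Estimated density of a directed, loopless network: m / (n (n - 1))."""
    return n_edges / (n_nodes * (n_nodes - 1))


def density_test(m_left, n_left, m_right, n_right):
    """Chi-squared test (1 degree of freedom) of H0: p_left == p_right."""
    possible_left = n_left * (n_left - 1)
    possible_right = n_right * (n_right - 1)
    pooled = (m_left + m_right) / (possible_left + possible_right)
    if pooled in (0.0, 1.0):
        raise ValueError("test is undefined when both networks are empty or complete")
    variance = pooled * (1 - pooled) * (1 / possible_left + 1 / possible_right)
    difference = m_left / possible_left - m_right / possible_right
    statistic = difference**2 / variance
    # Survival function of a chi-squared variable with 1 degree of freedom.
    pvalue = math.erfc(math.sqrt(statistic / 2))
    return statistic, pvalue


# Hypothetical counts: 100 nodes per hemisphere.
stat_same, p_same = density_test(500, 100, 500, 100)
stat_diff, p_diff = density_test(600, 100, 500, 100)
```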
SBMs and group connection testing
An SBM is a popular statistical model of networks (Holland et al., 1983). Put simply, this model treats the probability of an edge occurring between node $i$ and node $j$ as purely a function of the communities or groups that nodes $i$ and $j$ belong to. This model is parameterized by:
An assignment of each node in the network to a group. Note that this assignment can be considered to be deterministic or random, depending on the specific framing of the model one wants to use. Here, we assume the assignment is fixed (non-random), and represent it by an $n$-length vector $\tau$. If there are $K$ groups, $\tau$ has elements in $\{1, \ldots, K\}$. If the $i$-th element of $\tau$ is equal to $k$, then neuron $i$ is assigned to group $k$.
A set of group-to-group connection probabilities. We represent these probabilities by the matrix $B \in [0, 1]^{K \times K}$, where the element $(k, l)$ of this matrix represents the probability of an edge from a neuron in group $k$ to one in group $l$.
Thus, the probability of any specific edge $(i, j)$ can be found by looking up the appropriate element of $B$:
$$P_{ij} = B_{\tau_i \tau_j}.$$
In our case, we assume $\tau$ is known. In the case where it is not, or one simply wishes to estimate an alternative partition of the network, many methods exist for estimating $\tau$. But with $\tau$ known, estimating $B$ becomes simple, amounting to doing subgraph density estimates. Specifically, let $m_{kl}$ be the number of edges from nodes in group $k$ to nodes in group $l$. We then compute the density of this subgraph for each $(k, l)$ pair (ignoring self-loops):
$$\hat{B}_{kl} = \begin{cases} \dfrac{m_{kl}}{n_k n_l} & k \neq l, \\[2ex] \dfrac{m_{kk}}{n_k (n_k - 1)} & k = l, \end{cases}$$
where $n_k$ is the number of nodes in group $k$, and likewise for $n_l$.
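The subgraph density estimates described above can be sketched as follows, assuming the group labels are known. The function name, the dictionary-based representation, and the toy labels are hypothetical. Blocks with no possible edges (a single-node group connecting to itself) are returned as NaN in this sketch.

```python
import math


def estimate_block_probabilities(edges, labels):
    """Estimate B[k, l] as (edges from group k to group l) / (possible such edges)."""
    groups = sorted(set(labels.values()))
    group_sizes = {group: 0 for group in groups}
    for group in labels.values():
        group_sizes[group] += 1
    edge_counts = {(k, l): 0 for k in groups for l in groups}
    for i, j in edges:
        if i != j:  # ignore self-loops
            edge_counts[(labels[i], labels[j])] += 1
    B_hat = {}
    for (k, l), count in edge_counts.items():
        # Within-group blocks exclude the diagonal of the adjacency matrix.
        if k == l:
            possible = group_sizes[k] * (group_sizes[k] - 1)
        else:
            possible = group_sizes[k] * group_sizes[l]
        B_hat[(k, l)] = count / possible if possible > 0 else math.nan
    return B_hat


# Hypothetical toy network with two groups.
toy_labels = {"a": "sensory", "b": "sensory", "c": "output"}
toy_edges = {("a", "b"), ("a", "c"), ("b", "c")}
B_hat = estimate_block_probabilities(toy_edges, toy_labels)
```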
Assuming the SBM, we are interested in comparing the group-to-group connection probability matrices, $B$, for the left and right hemispheres. The null hypothesis of bilateral symmetry becomes
$$H_0: B^{(L)} = B^{(R)}, \quad H_A: B^{(L)} \neq B^{(R)}. \tag{4}$$
Rather than having to compare one proportion as in ER model and density testing, we are now interested in comparing all $K^2$ probabilities between the SBM models for the left and right hemispheres. The hypothesis test above can be decomposed into $K^2$ hypotheses. $B^{(L)}$ and $B^{(R)}$ are both $K \times K$ matrices, where each element $B_{kl}$ represents the probability of a connection from a neuron in group $k$ to one in group $l$. We also know that group $k$ for the left network corresponds with group $k$ for the right. In other words, the groups are matched. Thus, we are interested in testing, for $k$ and $l$ both running from $1$ to $K$:
$$H_0: B_{kl}^{(L)} = B_{kl}^{(R)}, \quad H_A: B_{kl}^{(L)} \neq B_{kl}^{(R)}. \tag{5}$$
Now, we are left with $K^2$ p-values from Equation 5, each of which bears upon the overall null hypothesis in Equation 4. We therefore require some method of combining these p-values into one, or otherwise making a decision about the hypothesis in Equation 4. Many methods for combining p-values have been proposed. This problem of combining p-values can itself be viewed as a hypothesis testing problem. Denoting the $i$th p-value from Equation 5 as $p_i$, we are testing
$$H_0: p_i \sim \mathrm{Uniform}(0, 1) \quad \text{for all } i = 1, \ldots, K^2$$
versus the alternative hypothesis that at least one of the p-values is distributed according to some non-uniform, non-increasing density with support $[0, 1]$ (Birnbaum, 1954; Heard and Rubin-Delanchy, 2018). Birnbaum, 1954, showed that no method of combining these p-values can be optimal in general to all alternatives, so we are left with a decision to make (with no universally preferred answer) about which methods to use to combine p-values (Heard and Rubin-Delanchy, 2018). Here, we select Tippett’s method (Tippett, 1931; Heard and Rubin-Delanchy, 2018) due to its ubiquity, simplicity, and power against various alternatives to bilateral symmetry under a simulation described in Power and validity of group connection test under various alternatives (Figure 3—figure supplement 4). In future work, specific classes of alternatives may motivate different methods for combining p-values, as described in Heard and Rubin-Delanchy, 2018.
We also examined the p-values from each of the individual tests after Holm-Bonferroni correction for multiple comparisons. As in ER model and density testing, we used chi-squared tests (Agresti, 2013) to perform each of the individual hypothesis tests in Equation 5. Note also that in some cases, an element of $\hat{B}^{(L)}$ and/or $\hat{B}^{(R)}$ could be 0; in each of these cases, we did not run that specific comparison between elements, as the notion of testing for proportions being the same becomes nonsensical. We indicated these tests in Figure 3C, Figure 4C, and Figure 5C–D, and note that these tests were not included when computing the number of comparisons for the Holm-Bonferroni correction. We also note that when few edges (say, <10) are present in a given subgraph, exact tests (e.g. Fisher’s exact test; Agresti, 2013) may be more appropriate, as they do not rely on asymptotic approximations. We found that in the current work, this choice of test did not substantially affect the results (Figure 3—figure supplement 5).
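The pieces of the group connection test can be sketched together as below: one chi-squared test per block, skipping blocks where either hemisphere has no edges, followed by Tippett's combination and a Holm-Bonferroni adjustment. This is a minimal standard-library sketch under stated assumptions (no continuity correction, dictionary inputs keyed by block, made-up counts) and is not the paper's implementation. Library routines such as `scipy.stats.combine_pvalues(..., method="tippett")` and `statsmodels.stats.multitest.multipletests(..., method="holm")` cover the last two steps in practice.

```python
import math


def chi2_pvalue(m1, n1, m2, n2):
    """Chi-squared (1 df) p-value for equal proportions; assumes 0 < pooled < 1."""
    pooled = (m1 + m2) / (n1 + n2)
    variance = pooled * (1 - pooled) * (1 / n1 + 1 / n2)
    statistic = (m1 / n1 - m2 / n2) ** 2 / variance
    return math.erfc(math.sqrt(statistic / 2))


def tippett(pvalues):
    """Tippett's method: under H0, the minimum of k uniform p-values is Beta(1, k)."""
    return 1 - (1 - min(pvalues)) ** len(pvalues)


def holm(pvalues):
    """Holm-Bonferroni adjusted p-values, returned in the input order."""
    k = len(pvalues)
    order = sorted(range(k), key=lambda i: pvalues[i])
    adjusted, running_max = [0.0] * k, 0.0
    for rank, i in enumerate(order):
        running_max = max(running_max, min(1.0, (k - rank) * pvalues[i]))
        adjusted[i] = running_max
    return adjusted


def group_connection_test(edges_left, possible_left, edges_right, possible_right):
    """Return (combined p-value, Holm-adjusted p-value per tested block)."""
    raw = {}
    for block in edges_left:
        m_left, m_right = edges_left[block], edges_right[block]
        if m_left == 0 or m_right == 0:
            continue  # comparison not run, and not counted in the corrections
        raw[block] = chi2_pvalue(
            m_left, possible_left[block], m_right, possible_right[block]
        )
    blocks = list(raw)
    values = [raw[block] for block in blocks]
    return tippett(values), dict(zip(blocks, holm(values)))


# Hypothetical edge counts per (source group, target group) block.
blocks = [("s", "s"), ("s", "o"), ("o", "s")]
edges_left = dict(zip(blocks, [30, 50, 0]))
edges_right = dict(zip(blocks, [30, 80, 5]))
possible = dict(zip(blocks, [400, 400, 400]))
combined, adjusted = group_connection_test(edges_left, possible, edges_right, possible)
```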
Density-adjusted group connection testing
In the density-adjusted group connection test, we considered the null hypothesis that the left hemisphere connection probabilities under the SBM are a scaled version of those on the right:
$$H_0: B^{(L)} = c B^{(R)}, \quad H_A: B^{(L)} \neq c B^{(R)}.$$
The scale $c$ for this comparison is the ratio of the densities between the left and the right hemisphere networks:
$$c = \frac{\hat{p}^{(L)}}{\hat{p}^{(R)}}.$$
Analogous to the group connection testing in Equation 5, this means that the individual group connection hypotheses become
$$H_0: B_{kl}^{(L)} = c B_{kl}^{(R)}, \quad H_A: B_{kl}^{(L)} \neq c B_{kl}^{(R)},$$
where $c$ can be viewed as a probability ratio:
$$\frac{B_{kl}^{(L)}}{B_{kl}^{(R)}} = c.$$
In essence, we wish to test whether this probability ratio for each subgraph matches a prespecified hypothesized value, $c$. To test these hypotheses, we used a modified score test (Miettinen and Nurminen, 1985), which aims to determine whether the ratio of two proportions is significantly different from some known constant, $c$. Note that this test reduces to the standard chi-squared test when the probability ratio $c = 1$. We used this score test in the individual group connection tests, with all other machinery (e.g. for combining p-values or correcting for multiple comparisons) remaining the same as in SBMs and group connection testing. We found that the results using this score test agreed well with an intuitive approach to performing the density adjustment wherein we randomly removed edges from the right hemisphere to set the densities of the networks equal, and then re-ran the standard group connection test over many resamples (Figure 4—figure supplement 1). Again, it is worth noting that when testing on very sparse subgraphs, exact versions of this test may be advisable, though these are computationally more difficult to implement (Chan, 1998).
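To illustrate how a test of a probability ratio can work, the sketch below implements a score-type test of $H_0: p_1 / p_2 = c$ for two binomial samples. It follows the general recipe of such tests (estimate the proportions under the null constraint, then standardize the observed discrepancy), but it is an assumption-laden sketch rather than the exact Miettinen and Nurminen procedure: in particular, it omits their small-sample variance factor, so that it reduces exactly to the uncorrected chi-squared test when $c = 1$. All names and counts are hypothetical.

```python
import math


def ratio_score_test(x1, n1, x2, n2, ratio=1.0):
    """Score-type test of H0: p1 / p2 == ratio, given x1 of n1 and x2 of n2 successes."""
    # The constrained maximum likelihood estimate of p2 under H0 is the smaller
    # root of a * p^2 + b * p + c = 0 (from the binomial score equation).
    a = ratio * (n1 + n2)
    b = -(ratio * n1 + x1 + n2 + ratio * x2)
    c = x1 + x2
    discriminant = max(0.0, b * b - 4 * a * c)
    p2_null = (-b - math.sqrt(discriminant)) / (2 * a)
    p1_null = ratio * p2_null
    # Variance of (p1_hat - ratio * p2_hat), evaluated at the constrained estimates.
    variance = (
        p1_null * (1 - p1_null) / n1 + ratio**2 * p2_null * (1 - p2_null) / n2
    )
    statistic = (x1 / n1 - ratio * x2 / n2) ** 2 / variance
    pvalue = math.erfc(math.sqrt(statistic / 2))  # chi-squared, 1 df
    return statistic, pvalue


# With ratio = 1 the statistic matches the pooled chi-squared statistic.
stat_unit, p_unit = ratio_score_test(50, 400, 80, 400, ratio=1.0)
# Hypothetical block where the left probability is exactly 1.5 times the right.
stat_scaled, p_scaled = ratio_score_test(60, 400, 40, 400, ratio=1.5)
```

In the second call the observed proportions (0.15 and 0.10) differ, yet the adjusted null with a ratio of 1.5 is not rejected, which is the sense in which the adjustment accounts for a global difference in density.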
Edge weight thresholds
To examine how the choice of edges used to define the left and right networks affects the p-values from each test, we varied the edge weight threshold used to define our binary networks for comparison. Given a set of edges (i.e. $(i, j)$ pairs) with corresponding weights $w_{ij}$, a thresholding simply selects the subset of those edges for which $w_{ij}$ is greater than or equal to some threshold.
Let $s_{ij}$ be the observed number of synapses from neuron $i$ to neuron $j$. We considered two thresholding schemes: the first was to simply use the number of synapses from neuron $i$ to $j$ as the edge weight, and the second was to consider the edge weight from neuron $i$ to $j$ to be the number of synapses from $i$ to $j$ divided by the total number of observed synapses onto neuron $j$. We stress that the number of synapses onto neuron $j$ is not necessarily equal to the weighted degree of neuron $j$. This is simply because we consider all annotated post-synaptic contacts onto neuron $j$, and some number of those contacts may not be connected to another neuron in the current networks considered here. We denote the number of synapses onto neuron $j$ as $D_j$. To summarize:
Synapse number threshold: $w_{ij} = s_{ij}$
Input proportion threshold: $w_{ij} = s_{ij} / D_j$
Given either definition of the weighting scheme, we formed a series of networks by varying the edge weight threshold. We stress that edge weights were used only for the purposes of defining the edges to consider for our (binary) networks; the edge weights themselves were not used in the statistical tests. We then re-ran the density, group connection, and density-adjusted group connection tests for each network. The p-values for these tests are plotted against the weight thresholds and the proportion of edges removed in Figure 6E and F for the synapse number and input proportion thresholds, respectively.
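The two thresholding schemes can be sketched with a single helper. The function name, argument names, and toy counts below are hypothetical. Note that the total input onto a neuron is supplied separately, since (as stressed above) it counts all annotated post-synaptic contacts and so need not equal the sum of synapses within the network.

```python
def threshold_edges(synapse_counts, threshold, inputs_per_neuron=None):
    """Return the binary, loopless edge set whose weights are >= threshold.

    If inputs_per_neuron is None, the weight is the synapse count (synapse number
    threshold); otherwise it is the count divided by the total number of synapses
    onto the post-synaptic neuron (input proportion threshold).
    """
    edges = set()
    for (i, j), count in synapse_counts.items():
        if i == j:
            continue
        weight = count if inputs_per_neuron is None else count / inputs_per_neuron[j]
        if weight >= threshold:
            edges.add((i, j))
    return edges


# Hypothetical counts; neuron "c" has 20 annotated input synapses in total,
# only 6 of which come from neurons in this toy network.
toy_counts = {("a", "b"): 1, ("a", "c"): 4, ("b", "c"): 2}
toy_inputs = {"b": 2, "c": 20}
by_number = threshold_edges(toy_counts, threshold=2)
by_proportion = threshold_edges(toy_counts, threshold=0.15, inputs_per_neuron=toy_inputs)
```

The two schemes keep different edges here: a single synapse onto a neuron with few inputs survives the proportion threshold, while two synapses onto a heavily innervated neuron do not.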
Power and validity of group connection test under various alternatives
In SBMs and group connection testing, we considered the group connection test, where the goal was to test
$$H_0: B^{(1)} = B^{(2)}, \quad H_A: B^{(1)} \neq B^{(2)}. \tag{9}$$
We saw that this set of hypotheses could be decomposed into $K^2$ (where $K$ is the number of groups) different hypotheses
$$H_0: B_{kl}^{(1)} = B_{kl}^{(2)}, \quad H_A: B_{kl}^{(1)} \neq B_{kl}^{(2)},$$
yielding a p-value for the $i$th test, $p_i$. We now consider the problem of trying to combine these p-values into one which bears on the overall hypotheses in Equation 9. We proposed using Tippett’s method for combining p-values (Tippett, 1931), and we now demonstrate the utility of this method against various alternatives.
To do so, we performed the following simulation experiment. First, we consider two hypothetical group connection matrices, $B^{(1)}$ and $B^{(2)}$. We set $B^{(1)} = \hat{B}^{(L)}$, the group connection probabilities estimated for the left hemisphere network. We also consider the matrix $M$, a $K \times K$ matrix denoting the number of possible edges in each block of an SBM. Here, we again set $M = M^{(L)}$; in other words, we use the number of potential edges for each block observed for the left hemisphere network. To analyze the sensitivity of Tippett’s method to different alternatives, we conducted the following simulation. Let $t$ be the number of probabilities to perturb, and let $\delta$ represent the strength of the perturbation. We performed experiments over a range of values of $t$ and $\delta$ (note that if $t = 0$ or $\delta = 0$, then we are under the null hypothesis in Equation 9). For each $(t, \delta)$, we ran 50 replicates of the simulation below:
Randomly select $t$ probabilities without replacement from the elements of $B^{(1)}$.
For each of the selected elements, set $B_{kl}^{(2)} = TN(B_{kl}^{(1)}, \delta B_{kl}^{(1)})$, where $TN$ is a truncated normal distribution with support $[0, 1]$.
For each of the unselected elements, set $B_{kl}^{(2)} = B_{kl}^{(1)}$.
For each block $(k, l)$, sample the number of edges in that block for network 1: $m_{kl}^{(1)} \sim \mathrm{Binomial}(M_{kl}, B_{kl}^{(1)})$.
Sample the number of edges in each block similarly for network 2, but using $B_{kl}^{(2)}$.
For each block $(k, l)$, compare $\hat{B}_{kl}^{(1)}$ and $\hat{B}_{kl}^{(2)}$ using chi-squared tests as in SBMs and group connection testing. This yields one p-value for each comparison.
Apply Tippett’s method to combine the p-values into one p-value for the overall hypotheses.
We observed that the p-values obtained from Tippett’s method were valid: they controlled the probability of Type I error for any significance level (Figure 3—figure supplement 4A). Further, we observed that Tippett’s method was also powerful against differing alternatives to the null hypothesis (Figure 3—figure supplement 4B). Tippett’s method had a power of 1 against an alternative consisting of a small number of large perturbations. It also had a power of ~0.8 against an alternative consisting of a large number of small perturbations. Thus, we concluded that Tippett’s method is a reasonable choice of method for combining p-values for our group connection test.
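The simulation procedure above can be sketched in miniature as follows. This is an illustrative standard-library version under several assumptions: a small made-up set of block probabilities stands in for the left hemisphere estimates, the second parameter of the truncated normal is treated as a standard deviation, degenerate blocks (all or no edges in both networks) are skipped, and the specific values of the number of perturbed blocks and the perturbation strength are hypothetical. It is not the paper's code and will not reproduce its figures.

```python
import math
import random


def chi2_pvalue(m1, m2, n):
    """Chi-squared (1 df) p-value for equal proportions with n possible edges each."""
    pooled = (m1 + m2) / (2 * n)
    variance = pooled * (1 - pooled) * (2 / n)
    statistic = (m1 / n - m2 / n) ** 2 / variance
    return math.erfc(math.sqrt(statistic / 2))


def tippett(pvalues):
    return 1 - (1 - min(pvalues)) ** len(pvalues)


def truncated_normal(mean, sd, rng):
    """Rejection-sample a normal draw restricted to [0, 1]."""
    if sd == 0:
        return mean
    while True:
        draw = rng.gauss(mean, sd)
        if 0 <= draw <= 1:
            return draw


def binomial(n, p, rng):
    return sum(rng.random() < p for _ in range(n))


def simulate_pvalue(B1, M, t, delta, rng):
    """One replicate: perturb t blocks, sample edge counts, test, and combine."""
    B2 = list(B1)
    for index in rng.sample(range(len(B1)), t):
        B2[index] = truncated_normal(B1[index], delta * B1[index], rng)
    pvalues = []
    for b1, b2, n in zip(B1, B2, M):
        m1, m2 = binomial(n, b1, rng), binomial(n, b2, rng)
        if 0 < m1 + m2 < 2 * n:  # skip degenerate blocks
            pvalues.append(chi2_pvalue(m1, m2, n))
    return tippett(pvalues) if pvalues else 1.0


def rejection_rate(B1, M, t, delta, n_replicates=50, alpha=0.05, seed=0):
    rng = random.Random(seed)
    rejections = sum(
        simulate_pvalue(B1, M, t, delta, rng) < alpha for _ in range(n_replicates)
    )
    return rejections / n_replicates


# Hypothetical flattened 3 x 3 block model: 9 blocks, 400 possible edges each.
B1 = [0.1] * 9
M = [400] * 9
null_rate = rejection_rate(B1, M, t=0, delta=0.5)  # under the null hypothesis
alt_rate = rejection_rate(B1, M, t=9, delta=0.5)  # every block perturbed
```

Comparing `null_rate` with the significance level gives a rough check of validity, and `alt_rate` gives a rough estimate of power for that one alternative.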
Code and data
Analyses relied on graspologic (Chung et al., 2019), NumPy (Harris et al., 2020), SciPy (Virtanen et al., 2020), Pandas (McKinney, 2010), statsmodels (Seabold and Perktold, 2010), and NetworkX (Hagberg et al., 2008). Plotting was performed using matplotlib (Hunter, 2007) and Seaborn (Waskom, 2021).
Data availability
The code to perform all analyses in this paper (Python 3) can be found at https://github.com/neurodata/bilateral-connectome (MIT license) and viewed as a JupyterBook (Executable Books Community, 2020) at http://docs.neurodata.io/bilateral-connectome. The version for this submission is archived at https://doi.org/10.5281/zenodo.7733481 (Pedigo, 2023). All data analyzed in this study were generated in Winding et al., 2023. These data are also included for convenience in the code repository linked above and as Figure 1—source data 1.
References
- Statistical inference on random dot product graphs: a survey. Journal of Machine Learning Research 18:1–92.
- Neural architectures in the light of comparative connectomics. Current Opinion in Neurobiology 71:139–149. https://doi.org/10.1016/j.conb.2021.10.006
- Combining independent tests of significance. Journal of the American Statistical Association 49:559–574. https://doi.org/10.1080/01621459.1954.10483521
- Same STATs, different graphs: exploring the space of graphs in terms of graph properties. IEEE Transactions on Visualization and Computer Graphics 27:2056–2072. https://doi.org/10.1109/TVCG.2019.2946558
- Statistical connectomics. Annual Review of Statistics and Its Application 8:463–492. https://doi.org/10.1146/annurev-statistics-042720-023234
- Book: On the Evolution of Random Graphs. In: Publication of the Mathematical Institute of the Hungarian Academy of Sciences.
- Recurrent architecture for adaptive regulation of learning in the insect brain. Nature Neuroscience 23:544–555. https://doi.org/10.1038/s41593-020-0607-9
- Connectomics and the neural basis of behaviour. Current Opinion in Insect Science 54:100968. https://doi.org/10.1016/j.cois.2022.100968
- Random graphs. The Annals of Mathematical Statistics 30:1141–1144. https://doi.org/10.1214/aoms/1177706098
- Conference: Exploring Network Structure, Dynamics, and Function using NetworkX. Proceedings of Seventh Python in Science Conference SciPy.
- Choosing between methods of combining p-values. Biometrika 105:239–246. https://doi.org/10.1093/biomet/asx076
- Mushroom body memoir: from maps to models. Nature Reviews. Neuroscience 4:266–275. https://doi.org/10.1038/nrn1074
- Statistical power in network neuroscience. Trends in Cognitive Sciences 27:282–301. https://doi.org/10.1016/j.tics.2022.12.011
- Stochastic blockmodels: first steps. Social Networks 5:109–137. https://doi.org/10.1016/0378-8733(83)90021-7
- Matplotlib: a 2D graphics environment. Computing in Science & Engineering 9:90–95. https://doi.org/10.1109/MCSE.2007.55
- Stochastic blockmodels and community structure in networks. Physical Review. E, Statistical, Nonlinear, and Soft Matter Physics 83:016107. https://doi.org/10.1103/PhysRevE.83.016107
- Conference: Data Structures for Statistical Computing in Python. Proceedings of the 9th Python in Science Conference. pp. 56–61.
- Comparative analysis of two rates. Statistics in Medicine 4:213–226. https://doi.org/10.1002/sim.4780040211
- Efficient Monte Carlo and greedy heuristic for the inference of stochastic block models. Physical Review. E, Statistical, Nonlinear, and Soft Matter Physics 89:1–8. https://doi.org/10.1103/PhysRevE.89.012804
- A pathoconnectome of early neurodegeneration: network changes in retinal degeneration. Experimental Eye Research 199:108196. https://doi.org/10.1016/j.exer.2020.108196
- Spectral clustering and the high-dimensional stochastic blockmodel. The Annals of Statistics 39:AOS887. https://doi.org/10.1214/11-AOS887
- Conference: Statsmodels: econometric and statistical modeling with Python. Python in Science Conference. pp. 92–96. https://doi.org/10.25080/Majora-92bf1922-011
- A semiparametric two-sample hypothesis testing problem for random graphs. Journal of Computational and Graphical Statistics 26:344–354. https://doi.org/10.1080/10618600.2016.1193505
- Comparing methods for comparing networks. Scientific Reports 9:17557. https://doi.org/10.1038/s41598-019-53708-y
- Null models in network neuroscience. Nature Reviews. Neuroscience 23:493–504. https://doi.org/10.1038/s41583-022-00601-9
- Connectal coding: discovering the structures linking cognitive phenotypes to individual histories. Current Opinion in Neurobiology 55:199–212. https://doi.org/10.1016/j.conb.2019.04.005
- Seaborn: statistical data visualization. Journal of Open Source Software 6:3021. https://doi.org/10.21105/joss.03021
- Neuroarchitecture of the Drosophila central complex: a catalog of nodulus and asymmetrical body neurons and a revision of the protocerebral bridge catalog. The Journal of Comparative Neurology 526:2585–2611. https://doi.org/10.1002/cne.24512
- Multi-scale network regression for brain-phenotype associations. Human Brain Mapping 41:2553–2566. https://doi.org/10.1002/hbm.24982
Article and author information
Funding
National Science Foundation (DGE1746891)
- Benjamin D Pedigo
National Science Foundation (1942963)
- Joshua T Vogelstein
National Science Foundation (2014862)
- Joshua T Vogelstein
National Institutes of Health (1RF1MH123233-01)
- Carey E Priebe
- Joshua T Vogelstein
The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.
Acknowledgements
BDP was supported by the NSF Graduate Research Fellowship (Grant no. DGE1746891). JTV was supported by the NSF CAREER Award (Grant no. 1942963). JTV was supported by the NSF NeuroNex Award (Grant no. 2014862). JTV and CEP were supported by the NIH BRAIN Initiative (Grant no. 1RF1MH123233-01). The authors thank members of the NeuroData lab for helpful feedback.
Copyright
© 2023, Pedigo et al.
This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.