Nanoscale organization of rotavirus replication machineries
Abstract
Rotavirus genome replication and assembly take place in cytoplasmic electron-dense inclusions termed viroplasms (VPs). Previous conventional optical microscopy studies of the intracellular distribution of rotavirus proteins and their organization in VPs have lacked molecular-scale spatial resolution, because of the inherent resolution limits of these techniques. In this work we employed superresolution microscopy to reveal the nanometric-scale organization of VPs formed during rotavirus infection, and quantitatively describe the structural organization of seven viral proteins within and around the VPs. The observed viral components are spatially organized as five concentric layers, in which NSP5 localizes at the center of the VPs, surrounded by a layer of NSP2 and NSP4 proteins, followed by an intermediate zone comprising VP1, VP2 and VP6. In the outermost zone, we observed a ring of VP4 and finally a layer of VP7. These findings show that rotavirus VPs are highly organized organelles.
https://doi.org/10.7554/eLife.42906.001
eLife digest
Rotaviruses are small viruses that can infect cells in the intestine. They are responsible for most cases of severe infectious diarrhea, the most common cause of death among young children in developing countries. Controlling the spread of rotavirus infections is difficult, even with high levels of hygiene, so effective treatments are essential to curtail the virus’ infections. Understanding how new rotaviral particles are made in infected cells is one of the first steps toward developing new therapies.
Once rotaviruses enter the cells, proteins from the virus and the cell aggregate into compact spheres called viroplasms to make new viral particles. Studying these viroplasms used to be difficult because they are too small to see with the resolution of standard microscopes. In recent years, advances in microscopy and mathematical methods have focused on breaking the existing resolution limits, leading to the development of superresolution microscopy. This new technique has made it possible to study objects with sizes in the order of a billionth of a meter, known as nanoscopic structures, including viroplasms.
Garcés et al. use superresolution microscopy to determine how viral proteins are arranged in the viroplasm and gain a better understanding of how the viruses are assembled. The images revealed that, in infected monkey kidney cells, rotavirus proteins inside the viroplasm form highly organized concentric layers. This arrangement is reliably repeated in viroplasms of different sizes, indicating that the organization of the proteins is likely set up when the viroplasm starts to form.
These findings make use of new microscopy, image analysis and statistical tools to study rotaviruses, providing a new framework to understand many aspects of rotaviral biology. Additionally, the result showing that proteins organize consistently in viroplasms is a first step towards understanding how the machinery that makes new rotaviruses works, which could lead to future treatments for severe infectious diarrhea.
https://doi.org/10.7554/eLife.42906.002
Introduction
Rotavirus is a nonenveloped virus composed of three concentric layers of proteins that enclose a genome constituted by eleven segments of double-stranded RNA (dsRNA) that encode six structural proteins (VP1 to VP4, VP6 and VP7) and six nonstructural proteins (NSP1 to NSP6). The inner layer is formed by dimers of VP2 that enclose the viral genome and small numbers of molecules of the viral RNA-dependent RNA polymerase (RdRp), VP1, and the capping enzyme, VP3. This nucleoprotein complex constitutes the core of the virus, which is surrounded by an intermediate protein layer of trimers of VP6, to form double-layered particles (DLPs). The surface of the virion is occupied by two polypeptides, VP7, a glycoprotein, and VP4, which forms spikes that protrude from the VP7 shell (Estes and Greenberg, 2013). Replication of the rotavirus genome and assembly of DLPs take place in cytoplasmic electron-dense inclusions termed viroplasms (VPs) (Estes and Greenberg, 2013). Once the double-shelled particles are assembled, they bud from the cytoplasmic VPs into the adjacent endoplasmic reticulum (ER). During this process, which is mediated by the interaction of DLPs with the ER transmembrane viral protein NSP4, the particles acquire a temporary lipid bilayer, modified by VP7 and NSP4, which, after being removed in the lumen of the ER by an unknown mechanism, yields the mature triple-layered virions (Estes and Greenberg, 2013). It has been reported that VP4 is located between the VP and the ER membrane and that it is incorporated into triple-layered particles (TLPs) during the budding process and maturation of the virus particle inside the ER (Estes and Greenberg, 2013; Navarro et al., 2016).
The viral nonstructural proteins NSP2 and NSP5 serve a nucleation role that is essential for the biogenesis of VPs (Fabbretti et al., 1999; Silvestri et al., 2004; Vascotto et al., 2004; Campagna et al., 2005). In addition to viral proteins and genomic dsRNA, cellular proteins such as ER chaperones (Maruri-Avidal et al., 2008), proteins associated with lipid droplets (Cheung et al., 2010), and ribonuclear proteins (Dhillon et al., 2018) have been shown to colocalize with VPs. Several studies have characterized the intracellular distribution of the rotavirus proteins (González et al., 2000; Petrie et al., 1982; Petrie et al., 1984; Richardson et al., 1986). Immunofluorescence studies, based upon epifluorescence or confocal microscopy, have described the viral proteins that make up the VPs; however, the images are inherently diffraction-limited to a spatial resolution in the range of hundreds of nanometers, precluding the identification of the molecular-scale organization of VPs (González et al., 1998; González et al., 2000; Eichwald et al., 2004; López et al., 2005b; Criglar et al., 2014; Martin et al., 2011; Contin et al., 2010). On the other hand, transmission electron microscopy (TEM) studies often provide images with nanometric resolution; nevertheless, immunoelectron microscopy is challenging when the localization of more than a single protein is sought (Altenburg et al., 1980; Petrie et al., 1982; Petrie et al., 1984). Over the past 15 years, a variety of superresolution microscopy (SRM) techniques have been developed to observe subcellular structures beneath the diffraction limit of optical microscopes, with resolutions in the tens of nanometers (Schnitzbauer et al., 2017; Deschout et al., 2014; Cox et al., 2011). In this work, we determined the organization of rotaviral proteins within and around VPs through the ‘Bayesian Blinking and Bleaching’ (3B) SRM technique. 
We developed a segmentation algorithm to automatically analyze and quantify the relative distribution of seven viral proteins, and propose a model that describes their relative spatial distribution. Also, we present a dependency model that explains the relationship between the viral proteins. This work establishes a structural framework for VP organization that future mechanistic and functional studies must take into account, and establishes key methodologies for future investigations on this subject.
Results
Qualitative analysis of VP morphology and structure through SRM
Rotavirus VPs are complex signaling hubs composed of viral and cellular proteins, packed together with viral RNAs. By TEM, they roughly resemble circular electron-dense structures whose internal components lack an obvious degree of spatial organization (Altenburg et al., 1980; Eichwald et al., 2012). In this work, we determined the relative spatial distribution of VP components by immunofluorescence and SRM in MA104 cells infected with the rhesus rotavirus strain RRV at 6 hr postinfection (hpi), using protein-specific antibodies. Due to their important role as nucleating factors during VP biogenesis, we selected either NSP2 or NSP5 as the spatial reference for the distribution of the VP1, VP2 and VP6 proteins. VPs were optically sectioned through total internal reflection fluorescence microscopy (TIRF), with an excitation depth of field restricted to 200 $nm$ from the coverslip. This approach avoids excitation of fluorophores marking structural components located away from this plane, that is, towards the inner cellular milieu. Additionally, NSP2 was also coimmunostained with the viral outer layer protein VP4 as well as with the ER resident proteins NSP4 and VP7, all of which have been reported to form separate ring-like structures that closely associate with VPs (González et al., 2000). In order to gain more insight into the morphogenesis of rotavirus, we analyzed the distribution of both VP7 monomers (VP7Mon) and trimers (VP7Tri), since this protein is assembled into virus particles in the latter form (Kabcenell et al., 1988). The nanoscale distribution of VPs was then analyzed through 3B SRM, with improvements in the technique, developed in the present work, to resolve nanoscopic structures (see ‘Stochastic model fitted for 3B super resolution microscopy’ in Appendix 1). By different methods of analysis, VPs exhibit a roughly circular shape (Figure 1A–E). 
However, unlike in the diffraction-limited image (Figure 1B), structural details of the VP can be appreciated by superresolution microscopy, such as the different layer distributions of viral components with respect to NSP2 (Figure 1C–E). In addition to VPs, by diffraction-limited TIRF microscopy we detected in the cytoplasm several small and dispersed puncta of fluorescence (Figure 1B), and in these images it is also sometimes possible to differentiate the distribution of NSP2 from that of VP4, a closely viroplasm-associated viral protein (see also González et al., 2000); in this case, VP4 is detected as a ring-like structure that surrounds the VP. Nevertheless, the small size of the VPs effectively precludes measurement of component distribution for the majority of its structural elements, as their separation is below the spatial resolution of typical optical microscopes. In contrast, images obtained by 3B SRM do allow the study of the relative distribution of the VP components (Figure 1C–E). In the case of SRM images of VP4 (Figure 1C), we observed that this protein forms a ring-like structure that does not colocalize with NSP2, and also ribbon-like projections that extend towards the cytoplasm, details that were not apparent in images captured with conventional fluorescence microscopy (Figure 1B). Additionally, we observed that the small puncta of proteins detected in the cytoplasm were in fact ribbon-like structures composed of various viral proteins that may represent different organization forms of the viroplasmic proteins (Figure 1C). In this regard, it is interesting to note that both NSP2 and VP4 have been reported to have at least two different intracellular distributions (González et al., 2000; Nejmeddine et al., 2000; Criglar et al., 2014). An examination of 3B SRM images of VPs (Figure 1C–E) revealed that the viral components form ring-like structures within the VPs and are arrayed as rather discrete concentric layers. 
As seen in Figure 1C–E, we find that although the structural proteins VP1, VP2 and VP6 partially overlap in position with NSP2, the bulk of the proteins form separate and distinct layers. Also, the monomeric as well as the trimeric forms of VP7 are clearly distinguished from NSP2, forming an outer ring. Of interest, the spatial distribution of NSP4 colocalized with that of NSP2, an unexpected result since, as mentioned, NSP4 is an ER integral membrane protein (see the Discussion section), and as such it was expected to colocalize with VP7 rather than with an internal viroplasmic protein (Petrie et al., 1984). With regard to NSP5, it was observed distributed inside the ring formed by NSP2 (Figure 1E).
Quantitative characterization of VPs structure by a novel segmentation algorithm
A qualitative analysis of the distribution of the VP components through 3B SRM suggested that these are arranged as concentric spherical shells; thus, we set out to quantitatively validate the circularity of the VP shape. For this, we developed a segmentation algorithm based on a least squares approach, which we called ‘Viroplasm Direct Least Squares Fitting Circumference’ (VPsDLSFC) (see ‘Segmentation Algorithm’ in Appendix 1), to measure the spatial distribution of the components within individual VPs by fitting concentric circumferences. This method is automatic, deterministic, easy to implement, and has a linear computational complexity. The performance of VPsDLSFC was tested on approximately 40,000 ‘ground truth’ (GT) synthetic images, showing a high robustness to noise and partial occlusion scenarios. Additionally, we compared our method with two alternative methods (Gander et al., 1994), and our approach displayed an improved performance (see ‘Algorithm Validation’ in Appendix 1). Based on this new algorithm, we find that the mean radius of the NSP5 distribution was smaller than that of NSP2, suggesting that NSP5 is located in the innermost section, as a component of the core of VPs (Figure 2A). On the other hand, the distributions of the structural proteins VP1, VP2 and VP6 exhibit slightly larger mean radii than that of NSP2, and are thus primarily localized in a zone surrounding NSP2. Continuing further towards the outer regions of the VP, we observed a region occupied by the spike protein VP4. Finally, the two different forms of VP7 (VP7Mon and VP7Tri) were located together, close to the most external region of the VPs (Figure 2A). The distribution of the glycoprotein NSP4 showed a similar mean radius to that of NSP2 (around $0.4\mu m$), suggesting, as described above, that these two proteins are located in the same structural layer of the VP (Figure 2A).
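To make the idea of a direct (non-iterative) least-squares circumference fit concrete, the following is a minimal sketch of a generic algebraic circle fit. It is not the VPsDLSFC implementation described in Appendix 1: the function names `solve_3x3` and `fit_circle` are hypothetical, and the sketch fits a single circumference to a set of boundary points rather than concentric circumferences to paired proteins.

```python
import math


def solve_3x3(m, v):
    """Solve the 3x3 linear system m * x = v by Gaussian elimination."""
    a = [row[:] + [rhs] for row, rhs in zip(m, v)]  # augmented matrix
    n = 3
    for col in range(n):
        # Partial pivoting: pick the row with the largest entry in this column.
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("degenerate point set (for example, collinear points)")
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= factor * a[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = a[i][n] - sum(a[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / a[i][i]
    return x


def fit_circle(points):
    """Algebraic least-squares circle fit (illustrative, not VPsDLSFC).

    Minimizes sum (x^2 + y^2 + D*x + E*y + F)^2 over D, E, F in one pass
    over the points, so the cost is linear in the number of points.
    Returns (center_x, center_y, radius).
    """
    m = [[0.0] * 3 for _ in range(3)]  # normal matrix A^T A
    v = [0.0] * 3                      # right-hand side A^T b
    for x, y in points:
        row = (x, y, 1.0)
        target = -(x * x + y * y)
        for i in range(3):
            v[i] += row[i] * target
            for j in range(3):
                m[i][j] += row[i] * row[j]
    d, e, f = solve_3x3(m, v)
    cx, cy = -d / 2.0, -e / 2.0
    return cx, cy, math.sqrt(cx * cx + cy * cy - f)
```

Because the fit reduces to a single linear solve, it is deterministic and needs no initial guess, which matches the properties listed above. Noiseless points sampled from only part of a circumference still recover the full circle, which gives some intuition for robustness to partial occlusion.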
In order to confirm our preliminary observations and clarify the nanoscopic organization of the VPs, we evaluated the relative separation between NSP2 and each accompanying protein. Again, the results show a remarkable degree of organization in the structure of the VP (Figure 2B). As predicted from Figure 2A, we found that NSP5 is located in the internal part of the VP, in close proximity ($\approx 0.05\mu m$) to the area occupied by proteins NSP2 and NSP4, which themselves show the closest association. After the NSP2-NSP4 region, VP6 occupies a middle region at $\approx 0.05\mu m$ from NSP2, followed by the VP4 protein, which was located at a distance of $\approx 0.18\mu m$. Finally, VP7Mon and VP7Tri were situated at $\approx 0.38\mu m$ from NSP2 (Figure 2B). A Mann-Whitney test showed that the distances of the various viral components in relation to NSP2 were significantly different (Figure 2B), suggesting that they are situated in specific areas of the VPs. The two forms of VP7 were located at the same distance from NSP2, suggesting that the formation of trimers of VP7 takes place at the ER membrane, where the VP7 monomers should also be located. Note that in Figure 2B the relative distance of VP1 and VP2 to NSP2 was not included, since the radii obtained for NSP2 in these two combinations were significantly smaller than those found when it was determined in combination with the other VP components (see ‘Supplementary Exploratory Analysis’). In addition to this, we found no significant difference between the distances of VP1 and VP2 to NSP2 (Figure 2C). Nonetheless, based on the inferential analysis, we could place these two proteins in the same layer as VP6 (see below).
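As an illustration of the kind of comparison described above, the sketch below computes a two-sided Mann-Whitney U test from two samples of distances. It is a simplified version under stated assumptions: it uses the normal approximation for the p-value, assigns midranks to ties, and applies neither a tie correction to the variance nor a continuity correction. The function name `mann_whitney_u` is hypothetical, and any input values would need to come from the measured distances, which are not reproduced here.

```python
import math


def mann_whitney_u(sample_a, sample_b):
    """Two-sided Mann-Whitney U test (normal approximation, illustrative).

    Returns (U, p_value), where U is the smaller of the two U statistics.
    """
    combined = sorted([(v, 0) for v in sample_a] + [(v, 1) for v in sample_b])
    n = len(combined)
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the run of tied values starting at position i.
        j = i
        while j + 1 < n and combined[j + 1][0] == combined[i][0]:
            j += 1
        midrank = (i + j) / 2.0 + 1.0  # average of the 1-based ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[k] = midrank
        i = j + 1

    n_a, n_b = len(sample_a), len(sample_b)
    rank_sum_a = sum(r for r, (_, group) in zip(ranks, combined) if group == 0)
    u_a = rank_sum_a - n_a * (n_a + 1) / 2.0
    u = min(u_a, n_a * n_b - u_a)

    mean_u = n_a * n_b / 2.0
    sd_u = math.sqrt(n_a * n_b * (n_a + n_b + 1) / 12.0)
    z = (u - mean_u) / sd_u
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided tail probability
    return u, p_value
```

For small samples or heavily tied data, an exact test or a tie-corrected variance would be more appropriate than this approximation.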
Next, through a hierarchical cluster analysis, we studied the relationship between the components of the VP, taking into account multiple variables at the same time, namely the mean distance to NSP2 [‘Mean(Dist)’], the standard deviation of the distance to NSP2 [‘Std(Dist)’], the mean radius of NSP2 [‘Mean(NSP2)’], and the radii of the other proteins [‘Mean(Other)’] (Figure 2D). Note that the proteins within a cluster should be as similar as possible and proteins in one cluster should be as dissimilar as possible from proteins in another. Because our variables are related to the distance to NSP2 and the radii of the proteins, this is a non-parametric analysis that should provide evidence about the spatial distribution/order of the viral proteins in the VP. As we are considering the distance to NSP2, VP1 and VP2 were not included in this analysis. The first level of the hierarchical agglomerative clustering (Figure 2D, left) partitioned the VPs and the surrounding proteins into five clusters, composed of NSP4, NSP5, VP6, VP4 and {VP7Mon, VP7Tri}, which suggests that these five proteins compose different layers of the VP. The second agglomerative level merged the proteins NSP4 and NSP5 into the same group, while VP6 and VP4 remained independent clusters, which indicates that NSP5 and NSP4 are closer to each other than to VP6 and VP4 in the VP. In the third level, VP6 and VP4 are clustered in the same group, and as a consequence are more closely related to each other than to the other viral proteins. The subsequent groups in the clustering analysis indicate that VP7 remains an independent layer with respect to the other proteins. Based on this analysis, the viral proteins seem to be highly organized, with VP7 forming the most external layer, while NSP5, NSP4, VP6 and VP4 are distributed very close to each other but as independent layers. The NSP5-NSP4 and VP6-VP4 clusters suggest that the two proteins in each pair form contiguous layers in the VP.
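The agglomerative procedure described above can be sketched as follows. This is a minimal illustration, assuming Euclidean distance between feature vectors and average linkage, neither of which is specified in the text; the function names are hypothetical, and the feature vectors would be the per-protein summaries listed above (for example Mean(Dist) and Std(Dist)).

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def average_linkage(cluster_1, cluster_2, features):
    """Mean pairwise distance between the members of two clusters."""
    total = sum(
        euclidean(features[a], features[b]) for a in cluster_1 for b in cluster_2
    )
    return total / (len(cluster_1) * len(cluster_2))


def agglomerate(features):
    """Bottom-up clustering (illustrative; linkage choice is an assumption).

    `features` maps a label to its feature vector. Returns the merges in the
    order they occur, as (cluster_1, cluster_2, linkage_distance) tuples.
    """
    clusters = [(name,) for name in features]  # start with singletons
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = average_linkage(clusters[i], clusters[j], features)
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges
```

Reading the returned merges from first to last corresponds to reading a dendrogram from the leaves upward: items merged early are the most similar, and an item that merges last remains the most distinct group.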
The relative spatial organization of VPs is maintained regardless of their size
The scatterplot between the radius of the spatial distribution of NSP2 (independent variable, xaxis) and the radius of the distribution of other viral components (response variable, yaxis) showed a strong linear relationship (Figure 3A). The distribution of NSP5 grows $0.87\mu m$ for each $1\mu m$ increase in the radius of NSP2 (slope interpretation), whereas the radius of the distribution of NSP4 increases $0.99\mu m$ (Figure 3B). These findings indicate that NSP5 is distributed in a proportionally smaller region than NSP2 regardless of the absolute size of the VP, supporting our observation that NSP5 is a constituent of the core of the VP. Moreover, the fact that the increase in the radius of the fitted distribution of NSP4 is directly proportional to the same parameter measured for NSP2 supports the idea that these proteins are both constituents of a putative second layer. VP1, VP2 and VP6 exhibit similar slopes which diverge between 0.03 and 0.05 μm (Figure 3B and Appendix 1—table 6); thus, these results confirm that VP1, VP2 and VP6 are components of the same layer in the VPs which, from the data in Figure 3, is located just after the layer of NSP2 and NSP4. Finally, as noted in our quantitative analysis, VP4 and VP7 form consecutive external layers with a slope of 1.39 and 1.94 μm, respectively (Figure 3B and Appendix 1—table 6). These findings indicate that the spatial distribution of the viral components in the VPs and in the surrounding areas is conserved regardless of their absolute size, and also form the basis of a predictive model, where, for a given radius of distribution of NSP2, it is possible to predict the radii of the remaining VP components (NSP5, NSP4, VP1, VP2, VP6) and of VP4 and VP7 proteins. This predictive model is available as a web app at https://yasel.shinyapps.io/Nanoscale_organization_of_rotavirus_replication_machineries/. 
The mathematical details and the residual analysis that validate these linear models are available in the Appendix 1, section ‘Linear dependency between the viral components’, Appendix 1—table 6 and Appendix 1—figure 9.
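The predictive use of these linear models can be sketched as below. The slopes for NSP5, NSP4, VP4 and VP7 are the values quoted in the text; the intercepts are reported in Appendix 1—table 6 and are not reproduced here, so this sketch assumes zero intercepts unless the caller supplies them. The names `SLOPES`, `predict_radii` and `fit_line` are hypothetical and do not refer to the published web app.

```python
# Slopes quoted in the text (change in radius per 1 um increase in NSP2 radius).
SLOPES = {"NSP5": 0.87, "NSP4": 0.99, "VP4": 1.39, "VP7": 1.94}


def predict_radii(nsp2_radius_um, slopes=SLOPES, intercepts=None):
    """Predict component radii (um) from the NSP2 radius (um).

    ASSUMPTION: intercepts default to 0.0 because the fitted values are
    not given in the main text; pass them explicitly for real predictions.
    """
    intercepts = intercepts or {}
    return {
        name: intercepts.get(name, 0.0) + slope * nsp2_radius_um
        for name, slope in slopes.items()
    }


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x
```

Under the zero-intercept assumption, a slope below 1 places a component inside the NSP2 distribution for any VP size and a slope above 1 places it outside, which is how the slopes relate to the proposed ordering of layers.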
The structural organization of VPs is independent of the reference protein chosen for pairwise comparison
In order to confirm the observed structural organization of VPs, we analyzed two more experimental conditions in which we chose a different reference protein for pairwise comparisons. The first was based on the distribution of NSP5 and its comparison with the relative localizations of VP6 and VP4, and the second considered NSP4 as the reference protein to compare with the distribution of VP6. We found that both analyses produced an identical structural organization for the VPs, with a comparative localization error of approximately $0.05\mu m$ between models (close to the effective resolution limit of the 3B algorithm; see ‘NSP5 and NSP4 as reference proteins’ in Appendix 1). An extensive quantitative validation regarding the congruence between the NSP2, NSP5 and NSP4 models is available in the Appendix 1.
Based on our extensive quantitative, descriptive and inferential statistical analyses, we propose that the VP and the surrounding viral proteins form an ordered biological structure composed of at least five concentric layers organized as depicted in Figure 4. In this structure, NSP5 constitutes the innermost layer, followed by a {NSP2-NSP4} layer. Then, there is a layer composed of {VP1-VP2-VP6} and two consecutive external layers formed by VP4 and VP7. The different layers of proteins are most likely highly porous, to allow the entry of positive-sense single-stranded viral RNA (+RNA) during genome replication and also of the antibodies used for VP staining.
Discussion
VPs have been previously studied using electron and fluorescence microscopy, however, due to the limited resolution of classic fluorescence microscopy techniques, and the difficulty of analysis of immunoelectron microscopy, the existence of any complex structural organization of the viral elements inside the VPs has not been reported. In recent years, the development of SRM has facilitated research into the nanoscale organization of a diverse range of cellular structures (Grant et al., 2018; Reznikov et al., 2018), however, until now SRM had not been applied to study the replication cycle of rotavirus. In this work, thanks to the use of the 3B SRM algorithm, we visualized and determined quantitatively the location of several viroplasmic proteins, leading us to propose a detailed model of the VP that should be of great value for understanding virus morphogenesis.
Other SRM approaches have been used to study the organization of viral and cellular structures showing concentric arrangements, such as those proposed by Laine et al. (2015) and Manetsberger et al. (2015). The main similarity between those studies and our approach is the use of conics, such as circles or ellipses, to fit structures showing concentric organization. The method provided by Manetsberger et al. (2015) could in principle be applied to our data set, and would be expected to produce similar results. This method could also provide information about the degree of asymmetry within the VP, which may be valuable to establish functional relationships between the protein distribution belts that shape these intriguing structures. The selection of the 3B SRM algorithm over other superresolution approaches was based on the fact that this method can handle samples with a high density of labeling, obtaining data with a reasonable resolution, although at the cost of a higher computational effort.
The quantification of the viral protein distribution within the VPs was possible thanks to a novel segmentation algorithm (VPsDLSFC) that was proven to be robust and efficient in noisy and partial occlusion scenarios. The manual presegmentation step of this algorithm was necessary in our case because we did not want to introduce any bias in the isolation of the VPs through an automatic approach. Setting aside the manual presegmentation step, the VPsDLSFC algorithm is automatic, deterministic, noniterative and has a linear computational complexity.
Previous reports have suggested that VPs have a spherical-like structure (Eichwald et al., 2004; Cabral-Romero and Padilla-Noriega, 2006; Campagna et al., 2013); in this study we confirmed this suggestion by comparing the VPsDLSFC approach with a similar approach based on an ellipse fit (Garcés et al., 2016). The results showed no statistically significant differences between these two models, and as a consequence we can confidently model the structure of the VPs as a circumference. We also confirmed that the displacement between the centers of the circumferences fitted to two paired proteins is not statistically significant.
Our study indicates that the viral components in the VPs, as well as VP4 and VP7, are arranged as largely discrete concentric layers (note that we are describing the structure of viroplasms, not of virus particles). This organization, however, does not preclude interactions among VP components that this model places in separate layers since, for instance, NSP5 has been shown by different biochemical methods to interact with NSP2 (Eichwald et al., 2004; Poncet et al., 1997; Afrikanova et al., 1998; Jiang et al., 2006), VP1 (Afrikanova et al., 1998) and VP2 (Berois et al., 2003). In this regard, based on the super resolution microscopy images, it seems clear that there is some overlap between different protein layers, as is the case for NSP2 and NSP5 in Figure 1E, but also between NSP2 and VP1, VP2, VP6 and NSP4 in Figure 1. These general overlapping zones between different proteins are most likely relevant for coordinating genome replication and virion assembly, as suggested. Of interest, we observed the presence of ‘spike-like’ protein projections from different viral shells that could also contribute to the interaction of proteins mapped to different layers (Appendix 1—figure 8). Although our present analysis is limited to a general characterization of the spatial distribution of the viral proteins within VPs, and does not address specific details of the interactions between proteins in different layers, it could be used as a starting point to analyze these interactions. Taking the result of the VPsDLSFC algorithm and the SRM image as an initial solution, it is possible to employ other segmentation approaches, like deformable/active contours (snakes) (Kass et al., 1988), level-set (Osher and Fedkiw, 2003), or region growing methods (Mehnert and Jackway, 1997; Synthuja et al., 2012), to evolve the circular contour and fit precisely the spatial distribution of the viral proteins. 
Then, establishing a polar coordinate system in the VP, and considering the results of both segmentation algorithms, it would be possible to quantify the angular position at which a spike projects from the central distribution of one viral protein and interacts with a different protein. It would also be possible to determine how strong these interactions are (intersection between two segmentation curves), and to study whether the spikes are randomly distributed between layers or whether a specific pattern exists in the connections between different protein layers. In the latter case, this could allow us to explore whether these patterns influence the assembly of the virus-like particles or only provide a skeleton that maintains the structure of the VP. The results obtained could also be used to study topological changes that the VP might experience at different times post infection, and associate these changes with maturation of the subviral particles. In this regard, in preliminary experiments carried out at three hpi, the viral proteins in the VP have been found to have a similar ‘layered’ organization to that shown for the mature VP at six hpi (data not shown). This observation indicates that this organization is already present when the formation of viral particles has not yet taken place, suggesting that it might be important for the assembly of DLPs within VPs. In an additional application, SRM could also be used to observe the assembly of the virus particles and the interactions of these particles that may occur in different layers of the VP. Nevertheless, to develop this idea it would be important to establish an experimental protocol to observe the viral particles during the early stages of the assembly process, to distinguish simultaneously the layers of the viroplasm and the viral particles, and to collect the SRM images with a very short acquisition time and a very high resolution (25–30 nm), which makes this experimental plan a challenge.
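The polar-coordinate analysis proposed at the start of this paragraph could be sketched along the following lines. This is a hypothetical illustration of an analysis that was not performed in this work: it assumes that a refined contour is available as a list of (x, y) points, that a circumference with known center and radius has already been fitted, and that a spike is any contour point lying at least `min_excess` beyond that radius. All names and the threshold are assumptions.

```python
import math


def to_polar(points, center):
    """Convert (x, y) points to (rho, theta_degrees) about `center`.

    Angles are reported in the range [0, 360).
    """
    cx, cy = center
    return [
        (math.hypot(x - cx, y - cy), math.degrees(math.atan2(y - cy, x - cx)) % 360.0)
        for x, y in points
    ]


def spike_angles(points, center, radius, min_excess):
    """Angles (degrees) of contour points that protrude beyond the fitted circle.

    ASSUMPTION: a point counts as part of a spike when its radial distance
    exceeds the fitted radius by at least `min_excess` (same units as radius).
    """
    return sorted(
        theta
        for rho, theta in to_polar(points, center)
        if rho - radius >= min_excess
    )
```

The resulting list of angles could then be compared between layers, for instance to ask whether protrusions from one protein layer coincide in angle with protrusions from an adjacent layer more often than expected by chance.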
Previous studies based on conventional microscopy techniques have reported that NSP5 and NSP2 colocalize (González et al., 2000; Eichwald et al., 2004; Fabbretti et al., 1999); in contrast, we found that although NSP5 and NSP2 are located in close proximity, their positions in the VP were separable. This difference is attributable to the increased spatial resolution in the final image created by the superresolution techniques employed in our study. Here, NSP5 was found to represent the innermost layer of the VPs, suggesting that this protein might serve as the core scaffold upon which the subsequent viroplasmic proteins are assembled to form the VPs. This finding contrasts with a report by Eichwald et al. (2004), who described that NSP5 locates to a region external to NSP2. In addition to the superior spatial resolution obtainable through 3B SRM, compared to the traditional confocal microscopy employed in the previous report, the difference might be due to the fact that in our study we characterized the endogenous structures produced during virus replication, while Eichwald et al. characterized VP-like structures formed by transiently expressed proteins fused to GFP.
Immediately outside the NSP5 core, we observed a layer composed of NSP2 and NSP4 proteins. The finding that NSP4 is located in the inner part of VPs was unexpected, since it is known that NSP4 is an integral membrane protein of the ER and since it has been reported to function as a receptor for the new DLPs located at the periphery of the VPs, during their budding towards the lumen of the ER (Chasey, 1980; Petrie et al., 1982; Petrie et al., 1984; Au et al., 1989). Furthermore, it has been shown that NSP4 associates with VP4 and VP7 to form a hetero-oligomeric complex that could be involved in the last steps of rotavirus morphogenesis (Maass and Atkinson, 1990). Based on these findings, NSP4 was expected to locate close to VP4 and VP7, in the surroundings of the VP. On the other hand, and in line with our observations, previous confocal microscopy studies have shown that a portion of NSP4 also shows a limited colocalization with NSP2 (González et al., 2000).
The dual location of NSP4 as an integral glycoprotein of the ER membrane and as internal to VPs, as our results indicate, is not easy to reconcile; however, a previous work suggested that there are three pools of intracellular NSP4 molecules. The first pool is represented by NSP4 localized in the ER, a second minor pool is localized in the ERGIC compartment, and the third pool is distributed in cytoplasmic vesicular structures associated with the autophagosomal marker LC3 (Berkova et al., 2006). Furthermore, in that work the authors suggested that NSP4 and LC3-positive vesicles may serve as a lipid membrane scaffold for the formation of large VPs by recruiting early VPs or VP-like structures formed by NSP2 and NSP5 (Berkova et al., 2006). This observation is in line with our model that NSP4 lies in an internal protein shell within VPs.
An additional, and very interesting, possibility to explain the internal location of NSP4 in VPs is the hypothesis that VP morphogenesis occurs on the surface of lipid droplets (LDs) (Cheung et al., 2010). In that work, it was proposed that LDs serve as a platform to which NSP2 and NSP5 proteins attach to form VP-like structures; NSP2 octamers, in turn, associate with the viral polymerase VP1 and rotavirus +RNAs. The assorted RNA complex containing NSP2, VP1, the capping enzyme VP3 and viral +ssRNA is predicted to nucleate VP2 core assembly. In this model, core assembly results in the displacement of +RNA-bound NSP2 octamers, while VP1 within newly formed cores directs dsRNA synthesis, using +RNAs as templates (Cheung et al., 2010; Borodavka et al., 2017; Borodavka et al., 2018). These events are followed by incorporation of the middle virus capsid protein VP6 to form DLPs. At some stage, these assemblies become VPs containing cores and DLPs and may lose some or all of their lipids (Cheung et al., 2010). In this regard, it is important to bear in mind that the currently accepted model for LD biogenesis is that neutral lipids are synthesized between the leaflets of the ER membrane, and the mature LD is then thought to bud from the ER membrane to form an independent organelle that is contained within a limiting monolayer of phospholipids and associated LD proteins (Walther and Farese, 2012). Thus, during budding of the LDs from the ER membrane they could take along rotavirus NSP4 (topologically oriented towards the cell's cytoplasm), which could serve as a scaffold on the surface of LDs for the assembly of other rotavirus viroplasmic proteins, and thereby end up localized in the interior of VPs.
Further support for our model of localization of at least one pool of NSP4 molecules inside the VPs is the observation that knocking down the expression of NSP4 by RNA interference significantly reduces the number and size of VPs present in the cell, as well as the production of DLPs (López et al., 2005a). That study also showed that during RNAi inhibition of NSP4 expression the NSP2 and NSP5 proteins maintained an intracellular distribution restricted to VPs, while the VP2, VP4, VP6 and VP7 proteins failed to locate to VPs. Based on these observations, it is tempting to suggest that, in addition to the role NSP4 has in the budding of DLPs into the ER lumen, it may also play an important role as a regulator of VP assembly.
After the NSP2/NSP4 layer, we observed a middle zone composed of the structural proteins VP1, VP2 and VP6. Their location in the same zone is expected given their close association in the assembled DLPs (Estes and Greenberg, 2013). Also, the fact that VP1, VP2 and VP6 form a complex with NSP2 that has replicase activity (Aponte et al., 1996) suggests that the production of new DLPs could take place in this zone of the VP.
Finally, we found that VP4 and VP7 form independent layers just external to the viroplasmic proteins. The position of these two proteins agrees with the proposed model of rotavirus morphogenesis in which VP4 is assembled first on DLPs, and subsequently VP7 binds the particles and locks VP4 in place (Trask and Dormitzer, 2006). Furthermore, the fact that VP7Mon and VP7Tri occupied the same layer in our model indicates that in the ER sites into which the DLPs bud, VP7 is already organized as trimers, which are subsequently assembled into the virus particles. Of interest, VP4 has been reported to exist in two different forms in infected cells. One of them is associated with microtubules (Nejmeddine et al., 2000), while the other has been reported to be found between the VP and the ER membrane (González et al., 2000). In this regard, based on our findings, we suggest that the latter form of VP4 can actually be considered an integral component of the VP. Since several studies have found different cellular proteins and lipids in association with VPs (Maruri-Avidal et al., 2008; Cheung et al., 2010; Dhillon et al., 2018), it will be interesting to study the relative localization of these components using the methodologies described here.
Materials and methods
Cell and virus
The rhesus monkey kidney epithelial cell line MA104 (ATCC) was grown in Dulbecco’s Modified Eagle Medium-Reduced Serum (DMEM-RS) (Thermo-Scientific HyClone, Logan, UT) supplemented with 5% heat-inactivated fetal bovine serum (FBS) (Biowest, Kansas City, MO) at 37°C in a 5% $CO_{2}$ atmosphere. The cells were confirmed to be free of mycoplasma by testing with the INTRON Mycoplasma PCR Detection Kit (#25234). Rhesus rotavirus (RRV) was obtained from H. B. Greenberg (Stanford University, Stanford, Calif.) and propagated in MA104 cells as described previously (Pando et al., 2002). Prior to infection, RRV was activated with trypsin (10 μg/ml; Gibco, Life Technologies, Carlsbad, CA) for 30 min at 37°C.
Antibodies
Monoclonal antibodies (MAbs) to VP2 (3A8), VP4 (2G4), VP6 (255/60), VP7 (60) and VP7 (159) were kindly provided by H. B. Greenberg (Stanford University, Stanford, CA) (Shaw et al., 1986; Greenberg et al., 1983). The rabbit polyclonal sera to NSP2, NSP4 and NSP5, and the mouse polyclonal serum to NSP2 were produced in our laboratory (González et al., 1998). The hyperimmune serum to NSP4 (C239) was generated in our laboratory by immunizing New Zealand white rabbits with a recombinant protein expressed in E. coli with a histidine tail, representing the carboxy-terminal end (amino acids 120 to 175) of the rhesus rotavirus RRV NSP4 protein; see also Maruri-Avidal et al. (2008), in which this serum was used. The hyperimmune serum to VP1 was also generated in our laboratory by immunizing BALB/c mice with a recombinant protein expressed in E. coli with a histidine tail, representing amino acids 227 to 539 of the rhesus rotavirus RRV VP1 protein. Goat anti-mouse Alexa-488- and goat anti-rabbit Alexa-568-conjugated secondary antibodies were purchased from Molecular Probes (Eugene, Oreg.).
Immunofluorescence
MA104 cells grown on glass coverslips were infected with rotavirus RRV at a multiplicity of infection (MOI) of 1. Six hours post-infection, the cells were fixed and processed for immunofluorescence as described (Silva-Ayala et al., 2013). Finally, the coverslips were mounted onto the center of glass slides with STORM solution (1.5% glucose oxidase + 100 mM $\beta$-mercaptoethanol) to induce the blinking of the fluorophores (Dempsey et al., 2011; Heilemann et al., 2009).
Transmission electron microscopy
Cells grown in 75 $cm^{2}$ flasks were infected with rotavirus RRV at an MOI of 3 as described above. Six hours post-infection the cells were fixed in 2.5% glutaraldehyde-0.1 M cacodylate (pH 7.2), post-fixed with 1% osmium tetroxide, and embedded in Epon 812 resin. The ultrathin sections obtained were stained with 2% uranyl acetate-1% lead citrate (Reynolds mix). The grids were examined with a Zeiss EM-900 electron microscope at 80 kV.
Set up of the optical microscope
All super-resolution imaging measurements were performed on an Olympus IX81 inverted microscope configured for total internal reflection fluorescence (TIRF) excitation (Olympus, cellTIRF™ Illuminator). The incidence angle was set such that the evanescent field had a penetration depth of ~200 nm (Xcellence software v1.2, Olympus soft imaging solution GMBH). The samples were continuously illuminated using excitation sources chosen according to the fluorophore in use. Alexa Fluor 488 and Alexa Fluor 568 dyes were excited with 488 nm and 568 nm diode-pumped solid-state lasers, respectively. Beam selection and modulation of laser intensities were controlled via Xcellence software v.1.2. A full multiband laser cube set was used to discriminate the selected light sources (LF 405/488/561/635 AOMF, Bright Line; Semrock). Fluorescence was collected using an Olympus UApo N 100x/1.49 numerical aperture, oil-immersion objective lens, with an extra 1.6x intermediate magnification lens. All movies were recorded onto a 128 × 128-pixel region of an electron-multiplying charge-coupled device (EMCCD) camera (iXon 897, Model No: DU897ECS0#BV; Andor) at 100 nm per pixel, with a 50 ms interval between frames (300 images per fluorescent excitation).
Bayesian analysis of the blinking and bleaching
Sub-diffraction images were derived from the Bayesian analysis of the stochastic Blinking and Bleaching (3B) of the Alexa Fluor 488 dye (Cox et al., 2011). For each super-resolution reconstruction, 300 images were acquired at 20 frames per second with an exposure time of 50 ms at full laser power, spreading the bleaching of the sample over the length of the entire acquisition time. The maximum laser power coming out of the optical fiber, measured at the back focal plane of the objective lens for the 488 nm laser line, was 23.1 mW. The image sequences were analyzed with the 3B algorithm considering a pixel size of 100 nm and a full width at half maximum of the point spread function of 270 nm (for Alexa Fluor 488), measured experimentally with 0.17 μm fluorescent beads (PS-Speck™ Microscope Point Source Kit, Molecular Probes, Inc). All other parameters were set to their default values. The 3B analysis was run over 200 iterations, as recommended by the authors in Cox et al. (2011), and the final super-resolution reconstruction was created at a pixel size of 10 nm with the ImageJ plugin for 3B analysis (Rosten et al., 2013), using parallel computing as described in Hernández et al. (2016). With 3B analysis, the resolution achieved in our imaging setup was up to five times below Abbe’s limit (~50 nm). The resolution provided by 3B was improved by computing the photophysical properties of the Alexa Fluor 488 and Alexa Fluor 568 dyes, which were provided to the 3B algorithm as an input parameter encompassing the transition probability matrix between the fluorophore’s states. The method was validated with 40 nm GATTA-PAINT nanorulers (PAINT 40RG, GATTAquant, Inc) labeled with ATTO 655/ATTO 542 dyes (see ‘3B Algorithm’ in Appendix 1).
Code and statistical analysis
The segmentation algorithm (VPs-DLSFC) was developed in Matlab R2018a (9.4.0.813654). A detailed explanation of each of the developed methods is available in Appendix 1. Statistical analyses were performed using R version 3.4.4 (2018-03-15). All the code is available at https://github.com/Yasel88/Nanoscale_organization_of_rotavirus_replication_machineries (Garcés Suárez, 2019; copy archived at https://github.com/elifesciencespublications/Nanoscale_organization_of_rotavirus_replication_machineries).
Appendix 1
This appendix is divided into seven sections. In –Segmentation algorithm–, we discuss the mathematical details of the algorithm 'Viroplasm Direct Least Square Fitting Circumference' (VPs-DLSFC). In Section –Algorithm validation–, we compare the algorithm VPs-DLSFC with two other classic methods for fitting circumferences; this analysis relied on approximately 40 000 synthetic 'ground truth' images. In Section –Model Considerations–, we show that the studied viral elements follow a concentric-circumference spatial distribution. Sections –Exploratory Analysis– and –Linear Regression Model– contain, respectively, a supplementary exploratory analysis of the spatial distribution of the viral elements, and details about the results and the residual error analysis of the linear regression models. In Section –NSP5 and NSP4 as reference proteins–, we repeat the study that was made with NSP2, but considering NSP5 and NSP4 as reference proteins, and we show that there are no statistically significant differences between the distributions of the viral elements when the reference protein is changed. Finally, in Section –3B Algorithm–, we explain the fitting model for the 3B algorithm, in which the transition matrix is modeled by ordinary differential equations (ODE).
Segmentation algorithm
The use of primitive models for the segmentation of SRM images has many benefits, such as computational efficiency, ease of implementation, and interpretable information about the segmented objects. In this regard, we developed a simple method for fitting a circumference to scattered data, which we called ‘Direct Least Squares Fitting Circumference’ (DLSFC). This approach uses basic tools of mathematical analysis for computing the extreme values of a continuous function of N variables.
Direct least squares fitting circumference (DLSFC)
The spatial distribution of a viral protein in the VP can be seen as a set of points $P=\left\{({x}_{i},{y}_{i}),\ i=\overline{1:N}\right\}\subset {\mathbb{R}}^{2}$ (scattered data in the plane). Taking into account the implicit equation of a circumference, ${(x-{x}_{c})}^{2}+{(y-{y}_{c})}^{2}={r}^{2}$, where $({x}_{c},{y}_{c})$ is the center and $r$ is the radius, the problem of fitting a circumference to $P$ is an optimization problem given by:
$$\min_{{x}_{c},{y}_{c},r}\ \sum_{i=1}^{N}{\left[{({x}_{i}-{x}_{c})}^{2}+{({y}_{i}-{y}_{c})}^{2}-{r}^{2}\right]}^{2} \tag{1}$$
The center of mass of $P$ is given by:
$$(\overline{x},\overline{y})=\left(\frac{1}{N}\sum_{i=1}^{N}{x}_{i},\ \frac{1}{N}\sum_{i=1}^{N}{y}_{i}\right) \tag{2}$$
Then, considering Equation (2), the problem (Equation (1)) can be rewritten as:
$$\min_{{u}_{c},{v}_{c},\alpha}\ \sum_{i=1}^{N}{\left[{({u}_{i}-{u}_{c})}^{2}+{({v}_{i}-{v}_{c})}^{2}-\alpha\right]}^{2} \tag{3}$$
where ${u}_{i}={x}_{i}-\overline{x}$, ${v}_{i}={y}_{i}-\overline{y}$ (for all $i=1,2,\dots,N$), $\alpha ={r}^{2}$, and $({u}_{c},{v}_{c})$ is the center of the circumference after the change of variable.
For simplicity, we define ${f}_{{u}_{c},{v}_{c},\alpha}({u}_{i},{v}_{i}):={({u}_{i}-{u}_{c})}^{2}+{({v}_{i}-{v}_{c})}^{2}-\alpha $ as the distance function, and $F({u}_{c},{v}_{c},\alpha ):=\sum _{i=1}^{N}{f}_{{u}_{c},{v}_{c},\alpha}{({u}_{i},{v}_{i})}^{2}$ as the objective function of our minimization problem.
Note 1 Note that, with this notation, the problem (Equation (3)) is equivalent to:
$$\min_{{u}_{c},{v}_{c},\alpha}\ F({u}_{c},{v}_{c},\alpha ) \tag{4}$$
A simple way to solve Equation (4) is to consider the extreme values of $F({u}_{c},{v}_{c},\alpha )$, that is:
$$\frac{\partial F}{\partial {u}_{c}}=0,\qquad \frac{\partial F}{\partial {v}_{c}}=0,\qquad \frac{\partial F}{\partial \alpha}=0 \tag{5}$$
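The partial derivatives of the function $F({u}_{c},{v}_{c},\alpha )$ are:
1. For $\alpha $:
$$\frac{\partial F}{\partial \alpha}=-2\sum_{i=1}^{N}{f}_{{u}_{c},{v}_{c},\alpha}({u}_{i},{v}_{i})$$
Taking into account the condition given in Equation (5) for $\frac{\partial F}{\partial \alpha}$, we obtain that:
$$\sum_{i=1}^{N}\left[{({u}_{i}-{u}_{c})}^{2}+{({v}_{i}-{v}_{c})}^{2}-\alpha \right]=0 \tag{6}$$
2. For ${u}_{c}$:
$$\frac{\partial F}{\partial {u}_{c}}=-4\sum_{i=1}^{N}({u}_{i}-{u}_{c})\,{f}_{{u}_{c},{v}_{c},\alpha}({u}_{i},{v}_{i}) \tag{7}$$
3. For ${v}_{c}$: the process is the same as for ${u}_{c}$, and the final result is
$$\frac{\partial F}{\partial {v}_{c}}=-4\sum_{i=1}^{N}({v}_{i}-{v}_{c})\,{f}_{{u}_{c},{v}_{c},\alpha}({u}_{i},{v}_{i}) \tag{8}$$
Expanding Equation (7), and noting that the term proportional to $\sum_{i}{f}_{{u}_{c},{v}_{c},\alpha}({u}_{i},{v}_{i})$ vanishes wherever Equation (6) holds, we obtain:
$$\frac{\partial F}{\partial {u}_{c}}=-4\left[\sum_{i=1}^{N}{u}_{i}^{3}-2{u}_{c}\sum_{i=1}^{N}{u}_{i}^{2}+\sum_{i=1}^{N}{u}_{i}{v}_{i}^{2}-2{v}_{c}\sum_{i=1}^{N}{u}_{i}{v}_{i}+({u}_{c}^{2}+{v}_{c}^{2}-\alpha )\sum_{i=1}^{N}{u}_{i}\right]$$
and taking into account that ${u}_{i}={x}_{i}-\overline{x}\Rightarrow \sum _{i=1}^{N}{u}_{i}=\sum _{i=1}^{N}{x}_{i}-N\overline{x}=\sum _{i=1}^{N}{x}_{i}-N\frac{\sum _{i=1}^{N}{x}_{i}}{N}=0$, we obtain that:
$$\frac{\partial F}{\partial {u}_{c}}=-4\left[\sum_{i=1}^{N}{u}_{i}^{3}+\sum_{i=1}^{N}{u}_{i}{v}_{i}^{2}-2{u}_{c}\sum_{i=1}^{N}{u}_{i}^{2}-2{v}_{c}\sum_{i=1}^{N}{u}_{i}{v}_{i}\right] \tag{9}$$
Let ${C}_{({u}^{k},{v}^{t})}:=\sum _{i=1}^{N}{u}_{i}^{k}{v}_{i}^{t}$; then Equation (9) can be written as:
$$\frac{\partial F}{\partial {u}_{c}}=-4\left[{C}_{({u}^{3},{v}^{0})}+{C}_{({u}^{1},{v}^{2})}-2{u}_{c}{C}_{({u}^{2},{v}^{0})}-2{v}_{c}{C}_{({u}^{1},{v}^{1})}\right]$$
and considering the extreme condition $\frac{\partial F}{\partial {u}_{c}}=0$ (see Equation (5)) we obtain:
$${u}_{c}\,{C}_{({u}^{2},{v}^{0})}+{v}_{c}\,{C}_{({u}^{1},{v}^{1})}=\frac{1}{2}\left[{C}_{({u}^{3},{v}^{0})}+{C}_{({u}^{1},{v}^{2})}\right] \tag{10}$$
Following the same process, but now considering Equation (8), we obtain:
$${u}_{c}\,{C}_{({u}^{1},{v}^{1})}+{v}_{c}\,{C}_{({u}^{0},{v}^{2})}=\frac{1}{2}\left[{C}_{({u}^{0},{v}^{3})}+{C}_{({u}^{2},{v}^{1})}\right] \tag{11}$$
The extreme values $\widehat{{u}_{c}}$ and $\widehat{{v}_{c}}$ of the function $F({u}_{c},{v}_{c},\alpha )$ are obtained as the solution of the linear system formed by Equation (10) and Equation (11).
Note 2 The translation of the coordinate system to the center of mass of the original data allows the simplification of Equation (9).
The radius of the circumference is obtained by expanding Equation (6):
$$\sum_{i=1}^{N}{({u}_{i}-{u}_{c})}^{2}+\sum_{i=1}^{N}{({v}_{i}-{v}_{c})}^{2}-N\alpha =0$$
and solving for $\alpha $, we obtain that:
$$\widehat{\alpha}=\frac{1}{N}\sum_{i=1}^{N}\left[{({u}_{i}-\widehat{{u}_{c}})}^{2}+{({v}_{i}-\widehat{{v}_{c}})}^{2}\right] \tag{12}$$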
The point $(\widehat{{u}_{c}},\widehat{{v}_{c}},\widehat{\alpha})$ is a critical point of the function $F({u}_{c},{v}_{c},\alpha )$. Finally, we need to compute the Hessian matrix of $F({u}_{c},{v}_{c},\alpha )$ to test whether $(\widehat{{u}_{c}},\widehat{{v}_{c}},\widehat{\alpha})$ is a maximum, a minimum, or a saddle point. Ordering the variables as $(\alpha ,{u}_{c},{v}_{c})$ and evaluating at the critical point (where Equation (6) holds and $\sum_{i}{u}_{i}=\sum_{i}{v}_{i}=0$):
$$H=\begin{pmatrix}2N & -4N\widehat{{u}_{c}} & -4N\widehat{{v}_{c}}\\ -4N\widehat{{u}_{c}} & 8\sum_{i}{({u}_{i}-\widehat{{u}_{c}})}^{2} & 8\sum_{i}({u}_{i}-\widehat{{u}_{c}})({v}_{i}-\widehat{{v}_{c}})\\ -4N\widehat{{v}_{c}} & 8\sum_{i}({u}_{i}-\widehat{{u}_{c}})({v}_{i}-\widehat{{v}_{c}}) & 8\sum_{i}{({v}_{i}-\widehat{{v}_{c}})}^{2}\end{pmatrix}$$
The principal minors of $H$ are:
$${H}_{1}=2N,\qquad {H}_{2}=16N\sum_{i=1}^{N}{u}_{i}^{2},\qquad {H}_{3}=128N\left[\sum_{i=1}^{N}{u}_{i}^{2}\sum_{i=1}^{N}{v}_{i}^{2}-{\left(\sum_{i=1}^{N}{u}_{i}{v}_{i}\right)}^{2}\right]$$
Note 3 By the Cauchy-Schwarz inequality, ${H}_{3}\ge 0$, with equality if and only if there exists a real value $r$ such that ${u}_{i}r+{v}_{i}=0$ for all $i=1,2,\dots,N$, that is, all the points lie on a straight line. In our problem, the proteins do not follow a straight-line distribution in space, so we can guarantee the strict inequality. Note that ${H}_{2}=0$ if and only if ${u}_{i}=0$ for all $i=1,2,\dots,N$, which is a particular case of a straight line.
The principal minors of the matrix $H$ are greater than zero, so the point $(\widehat{{u}_{c}},\widehat{{v}_{c}},\widehat{\alpha})$ is a local minimum; since this point is the only critical point of the function $F({u}_{c},{v}_{c},\alpha )$, it is consequently a global minimum.
The center of the fitted circumference in the original coordinate system is $({x}_{c},{y}_{c})=(\widehat{{u}_{c}},\widehat{{v}_{c}})+(\overline{x},\overline{y})$, and the radius is $R=\sqrt{\widehat{\alpha}}$.
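The closed-form fit described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' Matlab implementation: the function name `dlsfc`, the moment variable names and the collinearity tolerance are assumptions made for this example.

```python
import math

def dlsfc(points):
    """Fit a circle to 2D points; returns (x_c, y_c, R)."""
    n = len(points)
    # Center of mass of the data (Equation (2)).
    x_bar = sum(x for x, _ in points) / n
    y_bar = sum(y for _, y in points) / n
    # Translate so that sum(u) = sum(v) = 0.
    u = [x - x_bar for x, _ in points]
    v = [y - y_bar for _, y in points]
    # Moment sums C_(u^k, v^t) = sum(u_i**k * v_i**t).
    c20 = sum(a * a for a in u)
    c02 = sum(b * b for b in v)
    c11 = sum(a * b for a, b in zip(u, v))
    c30 = sum(a ** 3 for a in u)
    c03 = sum(b ** 3 for b in v)
    c12 = sum(a * b * b for a, b in zip(u, v))
    c21 = sum(a * a * b for a, b in zip(u, v))
    # 2x2 linear system for the center (Equations (10) and (11)),
    # solved with Cramer's rule.
    det = c20 * c02 - c11 * c11  # zero only for collinear points
    if abs(det) < 1e-12 * max(1.0, c20 * c02):  # tolerance is an assumption
        raise ValueError("points are (nearly) collinear")
    rhs_u = 0.5 * (c30 + c12)
    rhs_v = 0.5 * (c03 + c21)
    u_c = (rhs_u * c02 - rhs_v * c11) / det
    v_c = (c20 * rhs_v - c11 * rhs_u) / det
    # Squared radius from the condition dF/d(alpha) = 0.
    alpha = u_c ** 2 + v_c ** 2 + (c20 + c02) / n
    return x_bar + u_c, y_bar + v_c, math.sqrt(alpha)
```

Because the center comes from a 2x2 linear system and the radius from a single average, the cost is one pass over the data, with no iteration and no initial guess.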
Note 4 This method is based on simple and very popular mathematical techniques (specifically, differential calculus). Even though we did not find a published paper with this approach, we prefer to be careful and do not claim this technique as original to our work.
Note 5 Note that, by definition, the distance function ${f}_{{u}_{c},{v}_{c},\alpha}({u}_{i},{v}_{i})$ is not a geometric distance to the circle, which is probably more intuitive for this kind of problem. Consider for a moment the geometric distance of a set of points $P$ to a circle, which is given by:
$$G({x}_{c},{y}_{c},r)=\sum_{i=1}^{N}{\left({d}_{i}-r\right)}^{2},\qquad {d}_{i}=\sqrt{{({x}_{i}-{x}_{c})}^{2}+{({y}_{i}-{y}_{c})}^{2}}$$
where $({x}_{c},{y}_{c})$ is the center of the circumference and $r$ is the radius. Considering the extreme condition for the function $G({x}_{c},{y}_{c},r)$, it is obtained that:
$$\frac{\partial G}{\partial r}=2Nr-2\sum_{i=1}^{N}{d}_{i},\qquad \frac{\partial G}{\partial {x}_{c}}=-2N(\overline{x}-{x}_{c})+2r\sum_{i=1}^{N}\frac{{x}_{i}-{x}_{c}}{{d}_{i}},\qquad \frac{\partial G}{\partial {y}_{c}}=-2N(\overline{y}-{y}_{c})+2r\sum_{i=1}^{N}\frac{{y}_{i}-{y}_{c}}{{d}_{i}}$$
where $\overline{x}$ and $\overline{y}$ are the mean values of the variables $x$ and $y$, respectively. Simultaneously equating these partial derivatives to zero does not produce closed-form solutions for ${x}_{c}$, ${y}_{c}$ and $r$. The optimization of these parameters can be carried out with numerical methods, but these alternatives are iterative and highly sensitive to the initial values, whereas our proposal does not have these undesirable characteristics. In any case, in Appendix 1 Section –Algorithm validation– we compare our approach with an algorithm based on the geometric distance and explain in detail the benefits of each one.
The final segmentation algorithm is composed of three steps (Appendix 1—figure 1). The first is a manual pre-segmentation that guarantees the existence of one and only one viroplasm per image (Appendix 1—figure 1A). In the second step, we fit a circumference to the reference protein (recall that these are paired experiments, where NSP2 is present in all the combinations as reference protein) with the algorithm DLSFC (Appendix 1—figure 1B). Finally, the radius of the accompanying protein is fitted using Equation (12). Note that the center of the accompanying protein is the same as the center of the reference protein (concentric model). Apart from the manual pre-segmentation step, this algorithm is automatic, deterministic, non-iterative, and has linear computational complexity.
Note 6 The hypothesis that the viral proteins in the viroplasm are distributed as concentric circumferences is examined in Appendix 1 Section –Model Considerations–.
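The third step can be illustrated with a short Python sketch. It assumes that Equation (12) is applied as the mean squared distance of the accompanying protein's points to the fixed center obtained for the reference protein; that reading, and the function name `concentric_radius`, are assumptions of this example rather than details confirmed by the text.

```python
import math

def concentric_radius(points, center):
    """Radius of the circle with a fixed center that best fits the points.

    With the center held fixed, setting dF/d(alpha) = 0 gives
    alpha = mean squared distance to the center, and R = sqrt(alpha).
    """
    cx, cy = center
    alpha = sum((x - cx) ** 2 + (y - cy) ** 2 for x, y in points) / len(points)
    return math.sqrt(alpha)

# Hypothetical usage: the center would come from the fit to the
# reference protein (NSP2); here it is simply given.
ring = [(1 + 2 * math.cos(t), 1 + 2 * math.sin(t)) for t in (0.0, 1.0, 2.5, 4.0)]
radius = concentric_radius(ring, (1.0, 1.0))
```

Under this reading, only one pass over the points of the accompanying protein is needed, which is consistent with the linear complexity stated above.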
Algorithm Validation
The algorithm DLSFC was compared with two other approaches proposed by Gander et al. (1994). For a clearer comparison, we refer to these two methods as ‘Algebraic Least Square Fitting Circle’ (ALSFC) and ‘Geometric Least Square Fitting Circle’ (GLSFC).
The algorithm ALSFC considers the algebraic representation of a circle in the plane:
$$F(p)=A\,{p}^{T}p+{B}^{T}p+C=0$$
where $p=(x,y)$ is a point in ${\mathbb{R}}^{2}$, $B=({B}_{1},{B}_{2})\in {\mathbb{R}}^{2}$, $A\ne 0$, and $C$ is the independent term. Then, let ${p}_{i}=({x}_{i},{y}_{i})\in {\mathbb{R}}^{2},i=1,2,\dots,N$ be the set of points that we want to fit. The objective function for this minimization problem is $\sum _{i=1}^{N}F{({p}_{i})}^{2}$, which can be represented in matrix form as:
$$\sum_{i=1}^{N}F{({p}_{i})}^{2}={\parallel \mathrm{\Psi}u\parallel}^{2},\qquad \mathrm{\Psi}=\begin{pmatrix}{x}_{1}^{2}+{y}_{1}^{2} & {x}_{1} & {y}_{1} & 1\\ \vdots & \vdots & \vdots & \vdots \\ {x}_{N}^{2}+{y}_{N}^{2} & {x}_{N} & {y}_{N} & 1\end{pmatrix},\qquad u={(A,{B}_{1},{B}_{2},C)}^{T}$$
The final optimization problem is given by:
$$\min_{\parallel u\parallel =1}\ \parallel \mathrm{\Psi}u\parallel \tag{16}$$
where $\parallel u\parallel =1$ is a constraint that avoids the trivial solution $u=(0,0,0,0)$. The notation $\Vert v\Vert =\sqrt{{v}_{1}^{2}+{v}_{2}^{2}+\dots +{v}_{m}^{2}}$ represents the Euclidean norm. Finally, the problem (Equation (16)) is solved by taking the right singular vector associated with the smallest singular value of $\mathrm{\Psi}$ (Gander et al., 1994).
On the other hand, the algorithm GLSFC deals with the minimization of the geometric distance:
$$\min_{{x}_{c},{y}_{c},r}\ \sum_{i=1}^{N}{\left(\parallel ({x}_{c},{y}_{c})-{p}_{i}\parallel -r\right)}^{2} \tag{17}$$
Note that ${\left(\parallel ({x}_{c},{y}_{c})-{p}_{i}\parallel -r\right)}^{2}$ is the squared geometric distance of the point ${p}_{i}$ to the circle ${(x-{x}_{c})}^{2}+{(y-{y}_{c})}^{2}={r}^{2}$. This problem is nonlinear, and Gander et al. (1994) solve it with a Gauss-Newton algorithm. The Gauss-Newton method is an iterative algorithm that depends on an initial approximation; in this case, Gander and collaborators take the solution of the problem (Equation (16)) as the initial value for the iterative process. More details about the algorithms ALSFC and GLSFC can be found in Gander et al. (1994). In summary, the method ALSFC is very simple and computationally efficient, but the algebraic distance might not accurately reflect the real distance between the points and the circumference and, as a consequence, the parameters of the model might be biased. In contrast, the algorithm GLSFC is highly precise, but it is an iterative process (it demands more computation time) and depends on good initial values to achieve convergence. The Matlab codes for the algorithms GLSFC and ALSFC were obtained from Brown (2007).
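To make the dependence of GLSFC on an initial value concrete, the following Python sketch runs a plain Gauss-Newton iteration on the geometric residuals $d_i - r$. It is an illustration under stated assumptions, not the Brown (2007) code used in the study: the names `solve3` and `glsfc`, the iteration cap and the stopping tolerance are hypothetical, and the initial guess is passed in by the caller instead of being computed with ALSFC.

```python
import math

def solve3(a, b):
    """Solve a 3x3 linear system a x = b (Gaussian elimination, partial pivoting)."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, 3):
            f = m[r][col] / m[col][col]
            for k in range(col, 4):
                m[r][k] -= f * m[col][k]
    x = [0.0] * 3
    for i in (2, 1, 0):
        x[i] = (m[i][3] - sum(m[i][k] * x[k] for k in range(i + 1, 3))) / m[i][i]
    return x

def glsfc(points, x0, y0, r0, iters=50, tol=1e-12):
    """Gauss-Newton minimization of sum((d_i - r)**2) from the guess (x0, y0, r0)."""
    xc, yc, r = x0, y0, r0
    for _ in range(iters):
        jtj = [[0.0] * 3 for _ in range(3)]
        jtr = [0.0] * 3
        for x, y in points:
            d = math.hypot(x - xc, y - yc)  # fails if the center hits a data point
            row = (-(x - xc) / d, -(y - yc) / d, -1.0)  # gradient of d - r
            res = d - r
            for i in range(3):
                jtr[i] += row[i] * res
                for j in range(3):
                    jtj[i][j] += row[i] * row[j]
        # Normal equations of the linearized problem: (J^T J) step = -J^T res.
        step = solve3(jtj, [-g for g in jtr])
        xc, yc, r = xc + step[0], yc + step[1], r + step[2]
        if max(abs(s) for s in step) < tol:
            break
    return xc, yc, r
```

Each iteration needs a full pass over the data and a linear solve, and nothing guarantees convergence from a poor starting point, which is the trade-off summarized above.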
For the validation, we simulated the spatial distribution of viral elements as circumferences whose parametric form is known (‘ground truth’ dataset) (Appendix 1—figure 2A). The use of ‘ground truth’ allows us to quantify the fit error of the algorithms DLSFC, ALSFC and GLSFC at different noise levels (Appendix 1—figure 2B) and partial occlusion conditions (Appendix 1—figure 2C). We generated over 40 000 images (size $512\times 512$ pixels) with different partial occlusion angles and different levels of additive white Gaussian noise (AWGN), the latter in order to account for autofluorescence and for the error in the localization of the fluorophores by the algorithm 3B-ODE (see Appendix 1 Section –3B Algorithm–). The radii and positions of the synthetic viroplasms were generated randomly from a uniform distribution. The code for this simulation is available at https://github.com/Yasel88/Nanoscale_organization_of_rotavirus_replication_machineries (Garcés Suárez, 2019; copy archived at https://github.com/elifesciencespublications/Nanoscale_organization_of_rotavirus_replication_machineries).
Noise Generation
Let $\mathcal{I}$ be a synthetic image and ${p}_{i}\in {\mathbb{R}}^{2},i=1,2,\dots,N$ the points of the ‘ground truth’ circumference $\mathcal{C}\in \mathcal{I}$; the AWGN over ${p}_{i},\ \forall i=\overline{1,N}$, was generated as:
$${p}_{i}^{c}={p}_{i}+\sigma {X}_{i} \tag{18}$$
where ${p}_{i}^{c}$ is the corrupted point after the contamination with AWGN, $\sigma $ is a scalar that represents the level of noise, and ${X}_{i}$ is a trial of a random variable $X\sim N(0,1)$ (standard Gaussian distribution). In our experimental design we included 20 different levels of noise $\sigma =0:0.5:10$ (increasing by 0.5 from 0 to 10). Note that the noise was added to the points of the circumferences and not to the images. This is because we are working with SRM images, in which, consequently, there is no background noise.
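A minimal Python sketch of this contamination step follows. It assumes that an independent standard Gaussian draw is applied to each coordinate of each point, which the text does not state explicitly; the function name `add_awgn` and the `seed` parameter are hypothetical and only serve to make the example reproducible.

```python
import random

def add_awgn(points, sigma, seed=None):
    """Return a noisy copy of the points: p_c = p + sigma * X, with X ~ N(0, 1).

    Assumption: X is drawn independently for the x and y coordinates.
    """
    rng = random.Random(seed)
    return [(x + sigma * rng.gauss(0.0, 1.0), y + sigma * rng.gauss(0.0, 1.0))
            for x, y in points]

# Hypothetical usage with a noise level of 0.5 on three points.
clean = [(10.0, 0.0), (0.0, 10.0), (-10.0, 0.0)]
noisy = add_awgn(clean, sigma=0.5, seed=1)
```

Because the perturbation acts on the point coordinates, $\sigma$ is expressed in the same units as the coordinates themselves.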
Partial Occlusion Generation
Partial occlusion can be seen as incomplete information about the objects of interest, for example, an image in which only half of a VP is visible. In our view, it is important that a segmentation algorithm performs well even in these situations.
Definition 1 We call an angle $\theta $ a partial occlusion angle for a set of points ${p}_{i}=({x}_{i},{y}_{i})\in {\mathbb{R}}^{2},i=1,2,\dots,N$, if there do not exist $j\in \{1,2,\dots,N\}$, $r\in \mathbb{R}$ and $\beta \in [\alpha ,\alpha +\theta ]$ such that:
$$({x}_{j},{y}_{j})=(\overline{x},\overline{y})+r\,(\cos \beta ,\sin \beta )$$
where $\alpha $ is an angle in $[0,2\pi ]$, and $(\overline{x},\overline{y})=(\frac{1}{N}\sum_{i=1}^{N}{x}_{i},\frac{1}{N}\sum_{i=1}^{N}{y}_{i})$ is the center of mass of the points ${p}_{i}$.
Note 7 Note that in Definition 1 we considered the interval $[\alpha ,\alpha +\theta ]$ for a partial occlusion angle $\theta $, but the same occlusion angle can be obtained with $[\alpha -\theta ,\alpha ]$. For our validation we took $\alpha =0$ for simplicity; this has no consequences for our experiments, because we generate the points as a circumference.
The relationship between Cartesian and polar coordinates is given by:
$$x=a+r\cos \beta ,\qquad y=b+r\sin \beta \tag{19}$$
where $(a,b)$ is the circumference center, $r$ is the radius and $\beta $ is the angular coordinate. Then, for example, to produce a partial occlusion of $\theta $, it suffices to avoid generating points in the angular range $[0,\theta ]$ or $[2\pi -\theta ,2\pi ]$ through Equation (19). Appendix 1—figure 3 shows this strategy; in this example, half of the information is removed from the ‘ground truth’ circumference.
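The occlusion strategy can be sketched in Python as follows, assuming that Equation (19) is the usual parametrization $x = a + r\cos\beta$, $y = b + r\sin\beta$ and that the occluded sector is $[0, \theta)$. The function names, the even angular spacing and the helper `conserved_fraction` are assumptions of this example; the study itself draws radii and positions at random.

```python
import math

def occluded_circle(a, b, r, theta, n):
    """n points on the circle of center (a, b) and radius r, with beta in [theta, 2*pi)."""
    span = 2 * math.pi - theta
    betas = [theta + span * k / n for k in range(n)]
    return [(a + r * math.cos(t), b + r * math.sin(t)) for t in betas]

def conserved_fraction(theta):
    """Fraction of the circumference that remains after occluding an angle theta."""
    return 1 - theta / (2 * math.pi)

# Hypothetical usage: occlude half of a unit circle centered at the origin.
half = occluded_circle(0.0, 0.0, 1.0, math.pi, 100)
```

With $\theta = \pi$ only the lower half of the circle is generated, which corresponds to the 50% occlusion condition.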
In general, the percentage of the points that are conserved after the generation of a partial occlusion of angle $\theta $ is given by:
$$\left(1-\frac{\theta}{2\pi}\right)\times 100\%$$
In the validation of the algorithm DLSFC, we considered four different partial occlusion conditions: 1– No partial occlusion: all the points of the ‘ground truth’ circumference are conserved; 2– 25% of partial occlusion: partial occlusion angle $\theta =\pi /2$; 3– 50% of partial occlusion: partial occlusion angle $\theta =\pi $; 4– 75% of partial occlusion: partial occlusion angle $\theta =3\pi /2$.
In the experiments, we represent the noise through the mean and the standard deviation of the distance of the points ${p}_{i}^{c},i=1,2,\dots,n$ to the ‘ground truth’ circumference $\mathcal{C}$, that is:
where ${d}_{i\mathcal{C}}=\left|\,\parallel \mu -{p}_{i}^{c}\parallel -r\,\right|$ is the geometric distance between the point ${p}_{i}^{c}$ and the circumference $\mathcal{C}$, $|\cdot |$ represents the absolute value, $\mu =({x}_{c},{y}_{c})$ is the center of the circumference and $r$ is the radius. The mean and standard deviation are a useful way to understand the amount of dispersion generated by a noise level $\sigma $ in a circumference $\mathcal{C}$.
Appendix 1—figure 4 shows the fit obtained with the algorithms DLSFC, GLSFC and ALSFC for one ‘ground truth’ circumference (solid black line) that was corrupted by different levels of AWGN and partial occlusion angles. As expected, the performance of the algorithms seems to worsen as the level of partial occlusion increases, but the three algorithms show good results for high AWGN levels. This example suggests that the algorithm GLSFC (solid red line) performs better than DLSFC (solid blue line) and ALSFC (solid green line) when the partial occlusion is at most $\pi $ (columns A-C), but when $\theta =3\pi /2$ this method breaks down, and more markedly so as the level of noise increases. The algorithm ALSFC shows relatively good results for all the experimental conditions, but it is interesting to note that, in most cases, the fit is worse than with the DLSFC and GLSFC alternatives, probably as a consequence of the algebraic distance (see Appendix 1 Section –Algorithm validation–). This example is intended only as a first visual analysis; a complete study of the results obtained by the three algorithms over the 40 000 synthetic images is presented below.
A circumference can be represented as a vector $\mathcal{C}=(r,{x}_{c},{y}_{c})\in {\mathbb{R}}^{3}$, where $r$ is the radius, and $({x}_{c},{y}_{c})$ the center. Let $\mathcal{C}=(r,{x}_{c},{y}_{c})$ and $\widehat{\mathcal{C}}=(\widehat{r},\widehat{{x}_{c}},\widehat{{y}_{c}})$ be the ‘ground truth’ and the fitted circumference, respectively. The fit error committed by the algorithms can be quantified as:
$$\epsilon (\widehat{\mathcal{C}},\mathcal{C})=\parallel \widehat{\mathcal{C}}-\mathcal{C}\parallel =\sqrt{{(\widehat{r}-r)}^{2}+{(\widehat{{x}_{c}}-{x}_{c})}^{2}+{(\widehat{{y}_{c}}-{y}_{c})}^{2}} \tag{21}$$
Now, thanks to the use of ‘ground truth’ circumferences, it is possible to compare the obtained outcome with the ideal result. We consider a fit good enough if the estimation error of each component of the circumference $\mathcal{C}$ is less than 0.1 μm. This value is more than twofold smaller than the theoretical Abbe’s limit ($\approx 250$ nm), and taking into account that the resolution of the synthetic images varies between 4.6 and 25.8 μm on average (see Appendix 1—figure 5), we are demanding that the algorithm has nanoscopic precision (less than 100 nm) even when the images have a very poor resolution. Combining this threshold with Equation (21), we obtain the final condition based on the Euclidean norm of the difference between these two vectors:
$$\epsilon (\widehat{\mathcal{C}},\mathcal{C})\le \sqrt{3\times {0.1}^{2}}\approx 0.1732\ \mu m \tag{22}$$
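As a sketch of how this acceptance criterion can be evaluated, the Python fragment below computes the Euclidean distance between the fitted and ‘ground truth’ parameter vectors and compares it with $\sqrt{3}\times 0.1 \approx 0.1732$ μm, the norm obtained when each of the three components is off by exactly 0.1 μm. The names `fit_error`, `is_good_fit` and `THRESHOLD_UM` are hypothetical.

```python
import math

# Per-component tolerance of 0.1 um on (r, x_c, y_c) gives a norm bound of sqrt(3) * 0.1.
THRESHOLD_UM = math.sqrt(3) * 0.1

def fit_error(fitted, truth):
    """Euclidean distance between two circles given as (r, x_c, y_c), in micrometers."""
    return math.dist(fitted, truth)

def is_good_fit(fitted, truth):
    """True when the fit error does not exceed the 0.1732 um threshold."""
    return fit_error(fitted, truth) <= THRESHOLD_UM

# Hypothetical usage: a fit that is off by 0.05 um in every component is accepted.
accepted = is_good_fit((5.05, 3.05, -2.05), (5.0, 3.0, -2.0))
```

Note that the norm condition is slightly more permissive than the per-component one: a fit may exceed 0.1 μm in one component and still be accepted if the other two are accurate.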
Appendix 1—figure 5(a) shows the descriptive analysis (through boxplots) of the errors generated by the algorithms DLSFC, GLSFC and ALSFC over the 40 000 synthetic images. In some experiments we obtained extremely good fits ($\epsilon \approx {10}^{-15}\ \mu m$), but in general 75% of the errors lie in the interval $[{10}^{-3},1]\ \mu m$. For all occlusion angles, the algorithm GLSFC behaved poorly in a wide range of experiments, with an error of $\approx {10}^{10}\ \mu m$. This was expected for large partial occlusion angles (for example $\theta =3\pi /2$), but in conditions without partial occlusion ($\theta =0$), the poor fit of GLSFC is evidence that this algorithm is not a good alternative for this kind of problem. A similar situation arises with the algorithm ALSFC, but in this case the maximum error is $\approx {10}^{3}\ \mu m$, which is also high, but significantly lower than that obtained by GLSFC. The algorithm DLSFC does not show these problems; the maximum error is under $1\ \mu m$ for all experimental conditions, even when $\theta =3\pi /2$. Now, if we set aside the outliers and take into account only 75% of the experiments (boxes), a direct relationship between the error of each algorithm and the partial occlusion angle is observed (a larger occlusion angle implies a larger error). However, the increase of the mean error with the occlusion angle is small, which suggests that these three methods are robust to partial occlusion. The blue shaded band highlights the occlusion angle $\left(\theta =3\pi /2\right)$ for which the algorithms have a median error greater than $0.1732\ \mu m$. Therefore, at least 50% of the data is needed to obtain a good fit (with an error under $0.1732\ \mu m$) in 75% of the experiments.
Appendix 1—figure 5(b) displays the mean error obtained by each algorithm in relation to the resolution of the images (mean and standard deviation of the distance of the points to the ‘ground truth’ circumference, Equation (20)). This graph shows again that the method GLSFC did not converge in some experiments (recall that GLSFC is based on an iterative optimization algorithm), even for small partial occlusion angles, but it is interesting to point out that in all these cases (at least for the occlusion angles $\theta =\{0,\frac{\pi}{2},\pi \}$), these high errors ($\approx {10}^{10}\ \mu m$) seem to be related to bad initial values. Note that in these experiments the algorithm ALSFC does not perform well (recall that the result of ALSFC is the initial value for GLSFC), which is evidence of the sensitivity of GLSFC to the initial values. For the occlusion angle $\theta =\frac{3\pi}{2}$, many bad fits were obtained with the algorithm GLSFC. In our opinion, even though the algorithm GLSFC can reach an extremely good fit, its sensitivity to initial values, its computational cost (iterative process), and the possibility of non-convergence are strong reasons to consider that this alternative is not viable for this kind of problem.
In Appendix 1—figure 5(c), the mean errors of the algorithms DLSFC and ALSFC are shown separately to see the performance of each one more clearly. For small partial occlusion angles $\left(\theta =\{0,\frac{\pi}{2}\}\right)$ the algorithm DLSFC has errors $\le 0.1732\ \mu m$, while ALSFC gives a poor fit in high-noise scenes (black arrows). When $\theta =\pi $, the method ALSFC has errors $\ge 0.1732\ \mu m$ in many cases, which increase as the level of noise grows. Both algorithms generate errors $\ge 0.1732\ \mu m$ when $\theta =\frac{3\pi}{2}$, although DLSFC had errors $<0.1732\ \mu m$ in some experiments, even for high noise levels. Appendix 1—figure 5(d) shows only the performance of the algorithm DLSFC. As noted, the method DLSFC is extremely robust and stable with respect to noise and partial occlusion; note that for $\theta =\{0,\frac{\pi}{2},\pi \}$ the mean error does not increase when the noise level is increased. In extreme experimental conditions, when only 25% of the data remain in the image $\left(\theta =\frac{3\pi}{2}\right)$, DLSFC has a mean error in the range $[0.3,0.5]\ \mu m$ with a 95% confidence interval equal to $[0.1,0.9]\ \mu m$.
So far we have shown the behaviour of the three algorithms under several noise levels and partial occlusion angles. However, it is also of interest to assess, through a hypothesis test (inferential statistics), the performance of DLSFC and ALSFC over the experiments we carried out (samples). We do not study GLSFC here because, as mentioned above, its sensitivity to initial values and its computational cost (iterative process) are strong reasons to consider it not viable for this kind of problem. Taking into account the threshold in Equation (22), each experiment has two possible outcomes:
From this point of view, each experiment follows a Bernoulli distribution, and therefore the number of successes over all the experiments for the same experimental condition (partial occlusion angle) follows a Binomial distribution. The generated samples are independent (random generation), the decision rule (Equation (23)) is dichotomous, and our experimental design is a fair representation of the population (we considered many experimental conditions in more than 40000 images); it is therefore possible to apply a ‘Binomial Test’ (Prybutok, 1989; Howell, 1982) to evaluate the population proportion of success, which we denote by $p$. The null (H0) and alternative (H1) hypotheses for this test are:
The null hypothesis is that the population proportion with an error less than $0.1732\mu m$ is 0.7, and the alternative hypothesis is that the proportion is greater than 0.7. The significance level for these tests is fixed at $\alpha =0.05$. For more information about this test and its implementation see (R Development Core Team, 2017; Prybutok, 1989; Howell, 1982). Online documentation is available at https://stat.ethz.ch/R-manual/R-devel/library/stats/html/00Index.html.
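The one-sided test described above can be illustrated with a short sketch. The Python below is an illustration only and is not the implementation used in this study (the text refers to R for that); the function names and the sample error values are hypothetical. The tail probability is summed in log space so that it remains numerically stable for large numbers of trials.

```python
from math import exp, lgamma, log

ERROR_THRESHOLD_UM = 0.1732  # success threshold quoted in the text


def count_successes(errors_um, threshold=ERROR_THRESHOLD_UM):
    """Number of fits whose error does not exceed the threshold."""
    return sum(1 for e in errors_um if e <= threshold)


def binomial_p_value_greater(successes, trials, p0=0.7):
    """Exact one-sided p-value P(X >= successes), X ~ Binomial(trials, p0).

    Tests H0: p = p0 against H1: p > p0. Requires 0 < p0 < 1.
    """
    def log_pmf(k):
        log_choose = lgamma(trials + 1) - lgamma(k + 1) - lgamma(trials - k + 1)
        return log_choose + k * log(p0) + (trials - k) * log(1.0 - p0)

    tail = sum(exp(log_pmf(k)) for k in range(successes, trials + 1))
    return min(1.0, tail)


# Hypothetical fitting errors (micrometres) for one experimental condition.
errors = [0.05, 0.09, 0.12, 0.30, 0.08, 0.11, 0.15, 0.07, 0.10, 0.06]
k = count_successes(errors)  # 9 of the 10 hypothetical fits are successes
p_value = binomial_p_value_greater(k, len(errors))
reject_h0 = p_value <= 0.05  # significance level alpha = 0.05
```

With only ten hypothetical trials the test has little power; the experiments described in the text use far larger samples.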
Appendix 1—table 1 shows the results of the ‘Binomial Test’ for DLSFC and ALSFC. For small occlusion angles $\theta =\{0,\frac{\pi}{2}\}$, the results of both methods are very similar and indicate high performance. In these cases the null hypothesis was rejected with a type I error probability of $\approx {10}^{-16}\le \alpha $, and the 95% confidence intervals are equal to $[0.99,1]$, which means that, at the population level, we expect at least 99% of the fits to have an error $\le 0.1732\mu m$. For $\theta =\pi $, DLSFC has a probability of success greater than 0.7 with a p-value of $\approx {10}^{-16}\le \alpha $. In contrast, for the same occlusion angle, the test revealed that the null hypothesis cannot be rejected for ALSFC. Finally, as analyzed above, the performance of the algorithms decreases significantly for $\theta =\frac{3\pi}{2}$, and we cannot reject the null hypothesis; in other words, we do not have enough evidence to conclude that in this case the probability of success is greater than 0.7.
The study carried out in this section shows that GLSFC performs well and is capable of achieving a low fitting error, but it is sensitive to the initial values (Appendix 1—figure 5 (a,b)) and, like any iterative process, consumes more computational resources. ALSFC behaved well under many experimental conditions, although on several occasions it produced a bad fit. Also, the performance of this algorithm was always inferior to that of DLSFC (Appendix 1—figure 5(c) and Appendix 1—table 1). Finally, DLSFC showed high performance and stability under all the experimental conditions, even for extreme partial occlusion angles (Appendix 1—figure 5 and Appendix 1—table 1). This method proved to be robust in high-noise conditions, and it is a deterministic rather than an iterative algorithm. For all these reasons we consider DLSFC the best choice, compared with GLSFC and ALSFC, for fitting the spatial distribution of the viral elements.
Model Considerations
To test the hypothesis that the viral elements of the VPs can be approximated by circumferences, we carried out a series of experiments comparing the circumference obtained by DLSFC with a least-squares ellipse obtained by the ‘Direct Least Square Fitting Ellipse’ (DLSFE) algorithm (Fitzgibbon et al., 1999). DLSFE has been used previously to fit the viral replication centers of adenoviruses in diffraction-limited fluorescence images (Garcés et al., 2016), and has also been shown to be robust to noise, computationally efficient, and easy to implement (Fitzgibbon et al., 1999).
Let us denote by $({x}_{i},{y}_{i}),i=1,2,\mathrm{\dots},N$ the set of N points corresponding to the protein $\mathcal{P}$, and by ${C}_{\mathcal{P}}$ and ${E}_{\mathcal{P}}$ the fitted implicit form of the conics obtained by the algorithms DLSFC and DLSFE respectively. The variable ${C}_{\mathcal{P}}=(r,{x}_{c},{y}_{c})$ is a vector where $r$ is the radius and $({x}_{c},{y}_{c})$ is the center of the circumference. The implicit form of the ellipse, in turn, can be represented as ${E}_{\mathcal{P}}=(a,b,{x}_{c}^{E},{y}_{c}^{E},\theta )$, where $\{a,b\}$ are the semi-major and semi-minor axes respectively, $({x}_{c}^{E},{y}_{c}^{E})$ is the center of the ellipse, and $\theta $ is the rotation angle. Under the assumption that the protein $\mathcal{P}$ can be approximated by a circumference, we expect no statistically significant differences between the radius of the circumference $(r)$ and each of the ellipse semi-axes $(a,b)$.
Note 8 Note that this approach is independent of which protein we choose to test the circularity hypothesis.
For each protein combination, we performed a Shapiro-Wilk hypothesis test to assess whether each of the three distributions (circumference radius ($r$), semi-major axis ($a$) and semi-minor axis ($b$)) came from a normally distributed population (Shapiro and Wilk, 1965). The results revealed that our data do not seem to follow a Gaussian distribution and, as a consequence, a parametric test (for example, Student's t-test) is not appropriate. Given these conditions, we used the Mann-Whitney test (Mann and Whitney, 1947; Hollander et al., 2013) to evaluate the differences between the radii of the circumferences and each of the ellipse semi-axes. This is a non-parametric test that can be applied when the observations are independent (our variables are independent because the radius and the semi-axes were obtained by different algorithms) and the variables are at least ordinal (our variables meet this requirement because they are numeric). We test whether the radius of the circumferences and each of the ellipse semi-axes are equal (null hypothesis) or not (alternative hypothesis).
The statistical analysis reveals no statistically significant difference between ${R}_{C}$ and $a$ (see Appendix 1—table 2) at a significance level of $\alpha =0.05$. The same result was obtained for ${R}_{C}$ and $b$ (see Appendix 1—table 3); as a consequence, it does not matter whether we use circumferences or ellipses to fit the spatial distribution of the viral proteins.
We also carried out a study to validate the hypothesis that the viral elements are concentric, based on the distance between the centers of the fitted circumferences of both proteins. For all the proteins, the median value is very close to or smaller than $0.05\mu m$ (Appendix 1—figure 6 and Appendix 1—table 4 (Column 2)), which is in accordance with the resolution limit of the algorithm 3B-ODE ($0.04-0.05\mu m$) (Cox et al., 2011). Also, the 95% confidence interval (Appendix 1—table 4, Column 3) shows that the distance between the centers is, in each case, on the order of ${10}^{-2}\mu m$ 95% of the time, which we consider a small error given the limitations of the algorithm 3B-ODE.
Note 9 The Mann-Whitney test was used to study the differences in population medians (difference in location) (Mann and Whitney, 1947).
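For readers unfamiliar with the test, the following Python sketch shows how a Mann-Whitney comparison between a set of radii and a set of semi-axes could be computed. It is a simplified illustration and not the procedure used in this study: it uses the large-sample normal approximation without tie or continuity correction (an assumption that is crude for small samples), and the function name and data are hypothetical.

```python
from math import erfc, sqrt


def mann_whitney_u(sample_a, sample_b):
    """Return (U_a, two-sided p-value) using the normal approximation.

    U_a counts the pairs (a, b) with a > b; ties count as 1/2.
    No tie or continuity correction is applied to the variance.
    """
    n_a, n_b = len(sample_a), len(sample_b)
    u_a = 0.0
    for a in sample_a:
        for b in sample_b:
            if a > b:
                u_a += 1.0
            elif a == b:
                u_a += 0.5
    mean_u = n_a * n_b / 2.0
    sd_u = sqrt(n_a * n_b * (n_a + n_b + 1) / 12.0)
    z = (u_a - mean_u) / sd_u
    p_two_sided = erfc(abs(z) / sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return u_a, p_two_sided


# Hypothetical circle radii and ellipse semi-major axes (micrometres).
radii = [0.31, 0.35, 0.40, 0.42, 0.38, 0.36]
semi_major = [0.33, 0.36, 0.41, 0.44, 0.39, 0.37]
u_stat, p_value = mann_whitney_u(radii, semi_major)
same_location = p_value > 0.05  # fail to reject equality at alpha = 0.05
```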
Supplementary Exploratory Analysis
The algorithm VPs-DLSFC allows the quantification of the radii of two proteins in the same VP, as well as the relative distance between them. In a first stage, we analyzed approximately 590 images (see histogram in Appendix 1—figure 7A) obtained by the pre-segmentation step of VPs-DLSFC (Appendix 1—figure 1). In agreement with previous reports (Eichwald et al., 2012; Eichwald et al., 2004; Carreño-Torres et al., 2010), we found wide heterogeneity in the size of VPs measured by NSP2, which exhibited radii ranging from 0.2 to 0.75 μm. In order to compare the distribution of each protein taking NSP2 as reference, we selected a subset of the individual VPs such that there was no statistically significant difference between the radii of NSP2 across conditions.
In accordance with this criterion, we were able to adjust the size of NSP2 such that the mean of the radius distribution was similar when this protein was paired with NSP5, NSP4, VP6, VP4 and VP7 (Appendix 1—figure 7B). Unfortunately, in the case of NSP2~VP1 and NSP2~VP2, even though we could select a group of VPs in which the mean radius of NSP2 (0.35 μm) was closer to the size of NSP2 in the other conditions ($0.4\mu m$), the analysis of the NSP2 radius distributions by the Mann-Whitney hypothesis test (Appendix 1—figure 7B) revealed that the VPs imaged with VP1 and VP2 were significantly smaller than the viroplasms imaged with the other proteins. As a consequence, it was not possible to compare directly the position of VP1 and VP2 with the other viral elements through an exploratory analysis. However, this problem did not limit the inferential study that we developed.
The use of superresolution microscopy and the algorithm VPs-DLSFC allows the detection of small differences (nanometer resolution) in the relative position of the viral elements. To validate that the different viral elements were in fact located in a different region from NSP2, we performed a Mann-Whitney test to assess whether there are significant differences between the radii of NSP2 and the radii of each of the other viral elements. In accordance with our previous results (see main text), this test revealed that only NSP4 has a spatial distribution similar to that of NSP2 (Appendix 1—figure 7C and Appendix 1—table 5), while the other viral elements have different radii. Moreover, the small negative difference in location of NSP5 turned out to be significantly different with respect to NSP2 (the p-values of the test are available in Appendix 1—figure 7C and Appendix 1—table 5). These results support that NSP5 is located in the innermost part of the viroplasm and that NSP4 shares the same layer with NSP2. The structural proteins VP1, VP2 and VP6 seem to be in the same layer of the viroplasm, with similar values, but with a statistically significant difference in location from NSP2 at the level $\alpha =0.01$. Also, this test suggests that VP4 and VP7 form independent layers in the outermost zone of the VPs (Appendix 1—figure 7C and Appendix 1—table 5), which is in accordance with our proposed model.
Linear dependency between the viral components
The spatial distribution of the viral elements in the VPs is conserved regardless of their size; that is, the radius of each protein shows a linear dependency with the radius of NSP2.
Let ${P}_{i}=({x}_{i},{y}_{i})\in {\mathbb{R}}^{2},i=1,2,\mathrm{\dots},N$ be a set of points, where ${x}_{i}$ and ${y}_{i}$ are the radii of NSP2 and the accompanying viral element Y, respectively. The linear model can be fitted by solving the following optimization problem:
The model (Equation (25)) does not include an offset term because we do not have any quantification of the radius of the accompanying viral element when NSP2 has been silenced.
The solution of this problem is given by Kiefer (1987):
where $cor(x,y)$ is the linear correlation between x and y, $cov(x,y)$ is the covariance between x and y, Var(x) is the variance of x, and $\overline{x}$ is the mean value of the variable $X$.
The ‘Residual Standard Error’ (RSE or $\left({\sigma}^{2}\right)$) is a measure of fit of the linear regression model.
where ${\epsilon}_{i}={y}_{i}-\widehat{\beta}{x}_{i}$. The ‘Standard Error’ $\left({\sigma}_{\widehat{\beta}}^{2}\right)$ in the estimation of the slope $\widehat{\beta}$ is:
Consider the statistic ${t}_{\beta}=\frac{\widehat{\beta}-\beta}{{\sigma}_{\widehat{\beta}}}$ and the null (H0) and alternative (H1) hypotheses:
Under the null hypothesis, ${t}_{0}=\frac{\widehat{\beta}}{{\sigma}_{\widehat{\beta}}}$ follows a Student's t distribution with $N-1$ degrees of freedom. The p-value of this test is computed as:
where $P(T\le {t}_{0})$ is the probability that a Student's t variable with $N-1$ degrees of freedom is less than or equal to ${t}_{0}$. This hypothesis test makes it possible to evaluate whether the slope $\beta $ is significant in explaining the linear relationship between the variables.
The variance of the dependent variable $y$ is:
where ${\widehat{y}}_{i}=\widehat{\beta}{x}_{i}$. In accordance with Equation (31), the proportion of variance explained by the regression model is the regression variance divided by the total variance, that is:
Let $\mathrm{\Psi}$ be the linear regression model fitted to the data $({x}_{i},{y}_{i})$, and let ${R}^{2}$ be the associated R-squared value; then $100\times {R}^{2}$ is the percentage of the data variance explained by the model. For strong linear dependencies, the expected ${R}^{2}$ value should be close to 1. For more information about linear regression models see (Kiefer, 1987).
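The quantities defined in this subsection can be illustrated with a minimal Python sketch of a regression without offset. The closed forms below are the standard least-squares expressions for a no-intercept model (slope $\sum {x}_{i}{y}_{i}/\sum {x}_{i}^{2}$, residual variance with $N-1$ degrees of freedom, and the uncentred ${R}^{2}$); they are assumptions that may differ in presentation from the equations referenced above, and the function name and data are hypothetical.

```python
from math import sqrt


def fit_through_origin(x, y):
    """Least-squares fit of y = beta * x (no offset term).

    Returns the slope, the residual standard error (RSE), the standard
    error of the slope, the statistic t0 = beta / se_beta, and R^2.
    """
    n = len(x)
    sxx = sum(xi * xi for xi in x)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    syy = sum(yi * yi for yi in y)
    beta = sxy / sxx
    rss = sum((yi - beta * xi) ** 2 for xi, yi in zip(x, y))
    rse = sqrt(rss / (n - 1))          # one parameter fitted: N - 1 d.o.f.
    se_beta = rse / sqrt(sxx)
    t0 = beta / se_beta if se_beta > 0 else float("inf")
    r_squared = 1.0 - rss / syy        # uncentred R^2 for a no-offset model
    return {"beta": beta, "rse": rse, "se_beta": se_beta,
            "t0": t0, "r_squared": r_squared}


# Hypothetical radii (micrometres): reference protein vs accompanying protein.
x_ref = [0.25, 0.30, 0.35, 0.40, 0.50, 0.60]
y_acc = [0.30, 0.35, 0.42, 0.47, 0.59, 0.71]
fit = fit_through_origin(x_ref, y_acc)
explained_percent = 100.0 * fit["r_squared"]
```

A p-value for the slope would additionally require the cumulative distribution of Student's t with $N-1$ degrees of freedom, which is omitted here.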
The results of the regression models are available in Appendix 1—table 6. In all regression models, the standard errors are on the order of ${10}^{-2}\mu m$, with a p-value $<{10}^{-33}$, which is strong evidence of the linear relationship between NSP2 and the other seven viral proteins. Also, the RSE values are on the order of ${10}^{-2}\mu m$, except for VP7, which has an RSE of $0.15\mu m$; this could suggest a greater dispersion of VP7 compared with the other viral proteins, which is expected since, according to our model, VP7 is the outermost protein.
The average errors are under 8% in all models (Appendix 1—table 6), which confirms the capacity of our linear models to predict the radius of the described viral elements with high accuracy. In addition, the percentage of variance explained by the regression models ($100\times {R}^{2}$, see Equation (32)) was greater than 96% in all cases (Appendix 1—table 6), demonstrating again the strong linear dependency between NSP2 and the other viral elements in the viroplasm.
NSP5 and NSP4 as reference proteins
As was mentioned in the main text, we carried out a similar study using NSP5 and NSP4 as reference proteins. Since these experiments were developed to validate our previous results with NSP2, they were not as deep and extensive as in the case of the NSP2 model.
The number of VPs studied was 38 and 23 for NSP5~VP6 and NSP5~VP4 respectively (Appendix 1—figure 10A.1). The distribution of NSP5 when paired with VP6 and with VP4 did not show statistically significant differences according to the Mann-Whitney test, and the radius of VP6 was smaller than that of VP4, which is in accordance with our model of the viroplasm (Appendix 1—figure 10A). Furthermore, the distances of VP6 and VP4 to NSP5 show that VP6 is closer to NSP5 than VP4 is, with a significance level $\le 0.001$ (Appendix 1—figure 10B). We also found a linear relationship between the radii of NSP5 and {VP6, VP4} (Appendix 1—figure 10 C-E). The linear models of both VP6 and VP4 explain approximately 97% of the data variability and exhibit an RSE on the order of ${10}^{-2}\mu m$ (see Appendix 1—figure 10 (D,E) and Appendix 1—table 7). Additionally, we performed a Mann-Whitney test to study the differences between the radii of NSP5 and each of the accompanying proteins (VP6 and VP4). This test showed that the difference in location between NSP5 and VP6 is $\approx 0.1\mu m$, while for VP4 it is $\approx 0.23\mu m$, with p-values on the order of ${10}^{-7}$ and ${10}^{-8}$ respectively (see Appendix 1—table 8), indicating that although these proteins are near each other, they occupy different layers of the VPs.
In the case of the model based on NSP4, we studied 60 viroplasms with the algorithm VPs-DLSFC. The results show that the radius of VP6 was larger than the radius of NSP4 by approximately 0.04 μm (Appendix 1—figure 11 A-B). This difference is statistically significant (p-value $\le 0.05$) according to the Mann-Whitney test (Appendix 1—figure 11A). As in the models based on NSP2 and NSP5, we found a linear relationship between NSP4 and VP6 (Appendix 1—figure 11C). This linear model was validated through a residual analysis (Appendix 1—figure 11D and Appendix 1—table 9), obtaining $RSE=0.079\mu m$ and ${R}^{2}=0.97$.
Consistency Between Models
As described, our study involves three different reference proteins (NSP2, NSP5 and NSP4). Since the three models should reflect a unique spatial organization of the viral elements in the VP, we expect to obtain similar results independently of the selected model.
The equivalence of the different models (models based on NSP2 and NSP5) can be tested by comparing the distances between NSP5 and VP6. If the two models describe the same distribution of the VPs, we expect the distance from NSP5 to VP6 computed by the NSP5 model to be similar to the distance between NSP5 and NSP2 plus the distance from NSP2 to VP6, both computed by the model based on NSP2. Hence:
The subindex (‘name’) in ${d}_{name}(\cdot )$ specifies the model employed to compute the distance. Therefore, replacing the corresponding values (available in Appendix 1—table 5 and Appendix 1—table 8) in the above equation, we obtain that:
Since the precision of our radius estimates for the viral components is limited by the resolution of the algorithm 3B ($0.04-0.05\mu m$), we set this value as a threshold for the errors when comparing the results obtained with each model. For example, the error between the models based on NSP2 and NSP5 for VP6 is $\approx 0.016\mu m$ (see Equation (34)), which is under the resolution limit that can be reached by the algorithm 3B; as a consequence, the error is associated with the limitations of the optics and not with the reference protein selected for the estimation of the radius of VP6.
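The additive check described above amounts to comparing one direct distance with the sum of two partial distances and testing the discrepancy against the resolution limit. A minimal Python sketch follows; the function names are hypothetical, the threshold is taken as the lower end of the quoted range (an assumption), and the distances are placeholder values chosen only to reproduce a discrepancy close to the quoted $\approx 0.016\mu m$, not the values in the tables.

```python
RESOLUTION_LIMIT_UM = 0.04  # lower end of the 0.04-0.05 um range quoted for 3B


def additive_discrepancy(d_direct, d_leg_1, d_leg_2):
    """|d_direct - (d_leg_1 + d_leg_2)|, in micrometres."""
    return abs(d_direct - (d_leg_1 + d_leg_2))


def within_resolution(discrepancy, limit=RESOLUTION_LIMIT_UM):
    """True when the discrepancy cannot be resolved by the imaging method."""
    return discrepancy <= limit


# Placeholder distances (micrometres); NOT the values from the tables.
d_nsp5_to_vp6_nsp5_model = 0.100   # direct distance, NSP5-based model
d_nsp5_to_nsp2_nsp2_model = 0.030  # first leg, NSP2-based model
d_nsp2_to_vp6_nsp2_model = 0.054   # second leg, NSP2-based model

error = additive_discrepancy(d_nsp5_to_vp6_nsp5_model,
                             d_nsp5_to_nsp2_nsp2_model,
                             d_nsp2_to_vp6_nsp2_model)
consistent = within_resolution(error)
```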
This same approach was applied for VP4. In this case, and considering again the results in Appendix 1—table 5 and Appendix 1—table 8 we obtain:
which again is an error under the resolution limit of the algorithm 3B.
On the other hand, taking into account the models based on NSP2 and NSP4, and the results in Appendix 1—table 5 and Appendix 1—figure 11A, we obtain that:
As before, the error is under the resolution limit of the algorithm 3B. Altogether, the comparison of the three models shows that different experimental approaches describe the same spatial distribution of the viral elements inside the viroplasm.
The previous analysis only considers the consistency of the models based on the difference in the radii of the viral elements; in addition, we carried out a similar study using the linear regression models.
Again, consider the models based on NSP5 and NSP2 as predictors of VP6:
where ${\beta}_{\text{NSP2}}^{\text{VP6}}$ and ${\beta}_{\text{NSP5}}^{\text{VP6}}$ are the slopes associated with VP6 in the models based on NSP2 and NSP5 respectively. Also, the radius of NSP5 can be estimated with the model based on NSP2, that is:
Substituting Equation (39) into Equation (38) we obtain:
and taking into account Equation (37), we can assert that:
Note 10 The relative error was computed under the assumption that the slope of the model based on NSP2 is the ideal value. This consideration is based on the fact that we have more evidence of a linear relationship in the NSP2 model (see the ${R}^{2}$ and RSE in Appendix 1—table 6) than in the models based on NSP5 and NSP4 (Appendix 1—table 7 and Appendix 1—table 9 respectively).
For the prediction of VP6 through the models NSP2 and NSP5, the relative error is (Equation (43)):
The relative error associated with the prediction of the radius of VP4 through the models NSP2 and NSP5 is:
Finally, in the case of VP6 considering the models NSP2 and NSP4 the relative error is:
The relative errors represent the average error of the slopes computed with the NSP5 and NSP4 models relative to the NSP2 model. These results indicate that the model based on NSP5 has an error of 6.5% for the prediction of the radius of VP6, while for VP4 the error is around 1.3%, compared with the prediction of the model based on NSP2. If we instead use the model based on NSP4, we get a difference of approximately 7.3% in the radius of VP6. As in the previous analysis, these errors are small and could be associated with experimental variation and with the resolution limit of the algorithm 3B.
Although the relative error provides important information about the differences in prediction between different models, it does not take into account the standard error associated with the slopes of each model. In the case of the prediction of VP6 through the models NSP2 and NSP5 (all other cases follow the same derivation), if we consider ${\beta}_{\text{NSP5}}^{\text{VP6}}$ and ${\beta}_{\text{NSP2}}^{\text{NSP5}}$ as random variables with standard errors $\sigma \left({\beta}_{\text{NSP5}}^{\text{VP6}}\right)$ and $\sigma \left({\beta}_{\text{NSP2}}^{\text{NSP5}}\right)$ respectively, the variance of the variable $\left({\beta}_{\text{NSP5}}^{\text{VP6}}\times {\beta}_{\text{NSP2}}^{\text{NSP5}}\right)$ can be computed through the ‘Delta Method’ (Oehlert, 1992; Ver Hoef, 2012). The ‘Delta Method’ makes it possible to obtain an approximation of the variance of a function $f({X}_{1},{X}_{2},\mathrm{\dots},{X}_{n})$ as:
where $Cov({X}_{i},{X}_{j})$ denotes the covariance between ${X}_{i}$ and ${X}_{j}$ (Lawrence, 1953).
Note 11 The standard error is defined as:
but in our case $n=1$, because we only have the final slope estimate and the associated standard error; hence the standard error is equal to the standard deviation (Altman and Bland, 2005).
Because ${\beta}_{\text{NSP5}}^{\text{VP6}}$ and ${\beta}_{\text{NSP2}}^{\text{NSP5}}$ are independent (obtained by linear regression analysis of the segmentation results in different experiments and images), the covariance $Cov({\beta}_{\text{NSP5}}^{\text{VP6}},{\beta}_{\text{NSP2}}^{\text{NSP5}})=0$. This follows because $Cov(X,Y)=\mathbf{E}(XY)-\mathbf{E}(X)\mathbf{E}(Y)$, where $\mathbf{E}(\cdot )$ is the expected value. If X and Y are independent, $\mathbf{E}(XY)=\mathbf{E}(X)\mathbf{E}(Y)\Rightarrow Cov(X,Y)=0$. Therefore, Equation (47) can be simplified to:
Now, considering the function $f({p}_{1},{p}_{2})={p}_{1}{p}_{2}$, we can rewrite Equation (48) as:
and expressing the variance in terms of the standard deviation (which equals the standard error in this case, see Note 11), we have:
The 99% confidence interval for ${p}_{1}{p}_{2}$ can be computed as:
Set ${p}_{1}={\beta}_{\text{NSP5}}^{\text{VP6}}$ and ${p}_{2}={\beta}_{\text{NSP2}}^{\text{NSP5}}$. We can compute the confidence interval of ${\beta}_{\text{NSP5}}^{\text{VP6}}\times {\beta}_{\text{NSP2}}^{\text{NSP5}}$ through Equation (49) and Equation (50).
Now, considering Equation (50) we obtain:
The slope of VP6 in the NSP2 model (${\beta}_{\text{NSP2}}^{\text{VP6}}$, see Appendix 1—table 6) is $1.1806335\in [1.009652,1.197519]$, which proves that, in reference to VP6, the models based on NSP2 and NSP5 are consistent.
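As an illustration of the derivation above, the first-order Delta Method for the product of two independent slopes gives $\sigma ({p}_{1}{p}_{2})\approx \sqrt{{p}_{2}^{2}{\sigma}_{1}^{2}+{p}_{1}^{2}{\sigma}_{2}^{2}}$. The Python sketch below applies it and builds a symmetric confidence interval. The use of the normal quantile 2.5758 for the 99% level is an assumption (the quantile actually used is not stated here), and the slope and standard error values are hypothetical.

```python
from math import sqrt

Z_99 = 2.5758  # two-sided 99% quantile of the standard normal (assumed)


def product_standard_error(p1, se1, p2, se2):
    """First-order Delta Method SE of p1 * p2 for independent p1 and p2."""
    return sqrt((p2 * se1) ** 2 + (p1 * se2) ** 2)


def product_confidence_interval(p1, se1, p2, se2, z=Z_99):
    """Symmetric interval p1*p2 -/+ z * SE(p1*p2)."""
    centre = p1 * p2
    half_width = z * product_standard_error(p1, se1, p2, se2)
    return centre - half_width, centre + half_width


def slope_is_consistent(reference_slope, interval):
    """True when the reference slope lies inside the interval."""
    lower, upper = interval
    return lower <= reference_slope <= upper


# Hypothetical slopes and standard errors for two chained models.
interval = product_confidence_interval(p1=1.30, se1=0.03, p2=0.85, se2=0.02)
consistent = slope_is_consistent(1.12, interval)
```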
The remaining combinations proceed in the same way. In the case of VP4 and the models based on NSP2 and NSP5, we obtained:
Again, the slope of VP4 in the NSP2 model is $1.3887083\in [1.267769,1.546092]$ (see Appendix 1—table 6), and as a consequence the models are consistent with each other.
Finally, the stability between the models NSP4 and NSP2 based on the protein VP6 was evaluated.
In this case the slope ${\beta}_{\text{NSP2}}^{\text{VP6}}=1.1806335\notin [1.036190,1.151645]$, which is surprising given that the relative error of using NSP4 as predictor of VP6 instead of the model based on NSP2 is around 7.3% (see Equation (46)). Although this percentage does not represent a large difference, it could be enough to generate small variations that place the slope ${\beta}_{\text{NSP2}}^{\text{VP6}}$ outside the confidence interval given in Equation (56). Moreover, note that the difference between the slope between NSP2 and VP6 and the upper endpoint of the confidence interval is approximately $0.03\mu m$, which is under the resolution limit of the 3B algorithm. A deeper study with different antibodies for NSP4, and the comparison of NSP4 with other viral elements, would help to clarify our results.
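The three comparisons reported above can be cross-checked numerically. The Python sketch below uses only the reference slopes and interval endpoints quoted in the text; it assumes that each interval is symmetric about the composed slope, so that the midpoint recovers that slope. Under this assumption the relative errors round to the 6.5%, 1.3% and 7.3% quoted earlier. The names are hypothetical.

```python
# (reference slope in the NSP2 model, lower endpoint, upper endpoint)
REPORTED = {
    "VP6 via NSP5": (1.1806335, 1.009652, 1.197519),
    "VP4 via NSP5": (1.3887083, 1.267769, 1.546092),
    "VP6 via NSP4": (1.1806335, 1.036190, 1.151645),
}


def summarize(reference_slope, lower, upper):
    """Midpoint of the interval, relative error (%), and membership flag."""
    midpoint = (lower + upper) / 2.0  # assumed equal to the composed slope
    relative_error_pct = 100.0 * abs(midpoint - reference_slope) / reference_slope
    inside = lower <= reference_slope <= upper
    return midpoint, relative_error_pct, inside


summary = {name: summarize(*values) for name, values in REPORTED.items()}

# Gap between the NSP2 slope and the upper endpoint in the NSP4 comparison.
gap_nsp4 = REPORTED["VP6 via NSP4"][0] - REPORTED["VP6 via NSP4"][2]
```

The gap computed on the last line is about 0.029, in line with the value of approximately 0.03 mentioned above.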
Stochastic model fitted for 3B super resolution microscopy
The algorithm 3B (Cox et al., 2011) creates a super resolution image from experimental data (hundreds of images acquired by the microscope). The 3B algorithm generates a probability map of fluorophore location through Markov Chains and Bayesian inference. The implementation of the algorithm requires a priori information on the dynamics of the photophysical properties of the fluorophores under study; however, the software does not provide a method to calculate it. In this section we propose a method to obtain this information from the experimental data.
3B microscopy generates a probability map of fluorophore localization by deriving a weighted average over all possible models; the set of models includes varying numbers of emitters, emitter localizations and temporal dynamics of the states of the fluorophores. The information stored in the image sequence is iteratively compared with the models using Bayesian inference, producing a SRM image. This technique allows the reduction of the number of necessary images, so it reduces the photo damage in the sample. However, the necessary computation time is significant (Hu et al., 2013); therefore, we employed 3B parallelization techniques for a cluster and PC from Hernández et al. (2016) in order to decrease the time to enhance the SR images.
The dynamics of the photophysical properties of a fluorophore are modeled in the 3B algorithm as a Markov chain in which, at each time point, the fluorophore can be in one of three possible states: the emitting state ${S}_{0}$, the non-emitting state ${S}_{1}$, and the bleached state ${S}_{2}$ (Appendix 1—figure 12). Hence, over the complete sequence of images a fluorophore can transit between these states with a probability ${P}_{i}$.
In this way, the temporal stack of images acquired by the microscope contains information on the dynamics of an unknown number of fluorophores transiting between these three states with a probability ${P}_{i}$. This information is represented as a transition matrix:
where the entry in the $i$th row and $j$th column is the probability that a fluorophore in the state ${S}_{i}$ moves to the state ${S}_{j}$.
The initial probability distribution
describes the probability of a fluorophore to be in each energy state in the beginning of the process: ${\varphi}_{0}(i)$ corresponds to the state ${S}_{i}$.
The 3B algorithm considers the Markov chain determined by the transition matrix (Equation (57)) and the initial probability distribution (Equation (58)). Then,
are the probabilities that a fluorophore is in the states ${S}_{0},{S}_{1}$ and ${S}_{2}$ at times $1,\mathrm{\dots},n$; that is, for all $i=1,\mathrm{\dots},n$, ${\overline{\varphi}}_{i}$ is the 3-vector whose entries are the probabilities that the fluorophore is in each energy state at time $i$.
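For a discrete-time Markov chain, these state probabilities are obtained by repeatedly multiplying the current probability vector by the transition matrix, ${\overline{\varphi}}_{i}={\overline{\varphi}}_{i-1}P$. The Python sketch below illustrates this propagation; the matrix entries and the initial distribution are hypothetical placeholders, and the absorbing last row (a bleached fluorophore stays bleached) is an assumption made for the example.

```python
def step(phi, transition):
    """One step of the chain: returns phi * P for a row vector phi."""
    size = len(phi)
    return [sum(phi[i] * transition[i][j] for i in range(size))
            for j in range(size)]


def state_probabilities(phi0, transition, n_steps):
    """List [phi_1, ..., phi_n] of state-probability vectors."""
    history = []
    phi = list(phi0)
    for _ in range(n_steps):
        phi = step(phi, transition)
        history.append(phi)
    return history


# Hypothetical transition matrix over (emitting, non-emitting, bleached).
P = [
    [0.50, 0.50, 0.00],
    [0.25, 0.50, 0.25],
    [0.00, 0.00, 1.00],  # assumed absorbing bleached state
]
phi_0 = [1.0, 0.0, 0.0]  # hypothetical: fluorophore starts in the emitting state
trajectory = state_probabilities(phi_0, P, n_steps=100)
```

With an absorbing bleached state, the probability mass in the third entry can only grow over time, which mirrors the decay of mean intensity along an acquisition.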
The 3B algorithm has the transition matrix (Equation (57)) as an input, together with the initial probability distribution (Equation (58)). However, the software does not provide a method to calculate the transition matrix from the experimental data.
We deal with this issue by proposing a method to determine $P$ from the data. Namely, we propose to adjust the experimental data (the stack of images) with an ordinary differential equation (ODE) model that allows us to calculate the probabilities of the system.
Following the Jablonski diagram for a given fluorophore, we propose a model with three states (${S}_{0},{S}_{1}$ and ${S}_{2}$) and three transition probabilities (${k}_{isc},{k}_{T}$ and ${k}_{b}$) between them (Appendix 1—figure 13).
For $i=0,1,2$, we define the dynamics of the energy state ${S}_{i}$ as the time evolution of the concentration $[{E}_{i}]$ of molecules (fluorophores) emitting photons:
The sequence of images taken over time keeps track of the state the fluorophore is in. If in the $i$th image the fluorophore has high intensity (the value of the pixel approaches the maximum value), then the fluorophore is in its emitting state; in any other case, the fluorophore is off, and it can be in one of two states: the non-emitting state or the bleached state. We normalize the data and compute the mean intensity at time t of the fluorophores in the emitting state: by definition, the initial mean value is one; after that, it decays depending on the acquisition protocol.
Following classical regression methods, we fit the experimental data to the proposed system of differential equations. The estimates of the rate constants ${k}_{isc}$, ${k}_{T}$ and ${k}_{b}$ from the stack of images are obtained following the example taken from Rawlings and Ekerdt (2013).
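A minimal Python sketch of such a fit objective is given below. The reaction topology is an assumption made for illustration (emitting to non-emitting with rate ${k}_{isc}$, non-emitting back to emitting with rate ${k}_{T}$, and non-emitting to bleached with rate ${k}_{b}$); the forward Euler integrator, the function names and the parameter values are likewise hypothetical. The objective compares the simulated emitting fraction with the normalized mean intensity, and the choice of optimizer is left open.

```python
def derivatives(state, k_isc, k_t, k_b):
    """Right-hand side of the assumed three-state kinetic model."""
    e0, e1, _e2 = state
    return (-k_isc * e0 + k_t * e1,
            k_isc * e0 - (k_t + k_b) * e1,
            k_b * e1)


def simulate_states(k_isc, k_t, k_b, n_frames, dt=1.0, substeps=100):
    """Forward Euler integration; returns one (E0, E1, E2) tuple per frame."""
    state = (1.0, 0.0, 0.0)  # normalized: all fluorophores emitting at t = 0
    h = dt / substeps
    frames = [state]
    for _ in range(n_frames - 1):
        for _ in range(substeps):
            rates = derivatives(state, k_isc, k_t, k_b)
            state = tuple(s + h * r for s, r in zip(state, rates))
        frames.append(state)
    return frames


def sum_squared_residuals(params, observed_intensity, dt=1.0):
    """Least-squares objective: simulated E0 vs normalized mean intensity."""
    k_isc, k_t, k_b = params
    frames = simulate_states(k_isc, k_t, k_b, len(observed_intensity), dt)
    return sum((f[0] - obs) ** 2 for f, obs in zip(frames, observed_intensity))


# Synthetic "observations" generated from hypothetical rate constants.
true_params = (0.10, 0.05, 0.01)
observed = [f[0] for f in simulate_states(*true_params, n_frames=50)]
cost_at_truth = sum_squared_residuals(true_params, observed)
cost_elsewhere = sum_squared_residuals((0.20, 0.05, 0.01), observed)
```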
Let us consider the matrix $P$ from Equation (57) as the transition matrix of our model. Then, by the definition of our model, ${P}_{2}={k}_{isc}$, ${P}_{3}={k}_{T}$ and ${P}_{5}={k}_{b}$. Also, by the general properties of the probability function, the sum of the entries of each row of this matrix is one; so we obtain the transition matrix as follows:
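As an illustration of this construction, the following Python sketch assembles a row-stochastic matrix from the three rates. It assumes that the entries ${P}_{1},\mathrm{\dots},{P}_{5}$ are numbered row by row over the non-zero positions of the first two rows, that direct transitions from the emitting to the bleached state have probability zero, and that the bleached state is absorbing. This layout is consistent with ${P}_{2}={k}_{isc}$, ${P}_{3}={k}_{T}$ and ${P}_{5}={k}_{b}$, but it is an assumption; the function name and the numerical values are hypothetical.

```python
def transition_matrix(k_isc, k_t, k_b):
    """Build a 3x3 row-stochastic matrix from the fitted rates.

    Assumed layout (states: emitting, non-emitting, bleached):
        [ 1 - k_isc      k_isc            0   ]
        [ k_t            1 - k_t - k_b    k_b ]
        [ 0              0                1   ]
    """
    if not 0.0 <= k_isc <= 1.0:
        raise ValueError("k_isc must be a probability in [0, 1]")
    if k_t < 0.0 or k_b < 0.0 or k_t + k_b > 1.0:
        raise ValueError("k_t and k_b must be non-negative with k_t + k_b <= 1")
    return [
        [1.0 - k_isc, k_isc, 0.0],
        [k_t, 1.0 - k_t - k_b, k_b],
        [0.0, 0.0, 1.0],
    ]


# Hypothetical per-frame rates, for illustration only.
P = transition_matrix(k_isc=0.10, k_t=0.05, k_b=0.01)
row_sums = [sum(row) for row in P]
```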
As we mentioned earlier, the estimate of the transition matrix depends on the acquisition protocol. There are image sequences for which the fitting does not converge; this happens when the average intensity decays exponentially. However, when the average intensity decreases linearly, the adjustment of the data to the converging model results in a transition matrix that fits the data.
After the fitting to the experimental data we simulate a Markov chain for the calculated transition matrix (Appendix 1—figure 14). We also observe that greater number of fluorophores leads to better concordance between the simulation and the experimental data.
To corroborate that the proposed fitting is adjusted properly we use the experimental data provided by the authors of the 3B SRM microscopy algorithm. The podosomes data is available at the ThreeB plugin for ImageJ (Rosten et al., 2013), is a time serie of 300 images. Taking this stack as initial experimental data, we simulate Markov chains for 1000 fluorophores with the original transition matrix and the transition matrix fitted by our model. Although we found that our model fits better to the experimental data (Appendix 1—figure 15); the comparision between the SR images of the podosomes are not conclusive about the resolution enhancement with our model (Appendix 1—figure 16).
Furthermore, in order to validate our method, we use Gattapaint nanorulers, DNA molecules label with 488, 550 and 655 fluorophores at a distance of $40nm$ (GATTAPAINT 40RG/40B, immobilized in buffer on glassslide) to resolve the resolution enhacement of our method.
As in the previous analysis, we find that our model fits better to the experimental data obtained from the different ATTOS. Because our model involves a greater number of fluorophores, there is more information of the fluorophores localization. For example, in the Appendix 1—figure 17A.1, the information in the SR image with the original matrix, resolves only one fluorophore; however, with our method, three fluorophores in the same area can be resolved. In this way, we observed that in all cases the distance between the fluorophores (each peak) of the grahps shown in panels A.1A.3 from Appendix 1—figure 17 is approximately 40 nm; this coincides with the information provided by the manufacturers of the samples. All together, these results demonstrate that the new proposed method improves the resolution capacity of 3B algorithm by reflecting better the photophysical behavior of the fluorophores.
Once the transition matrix for the nanorulers was calculated, and given that the data consist of hundreds of images, we parallelized the analysis on a cluster and on a personal computer (Hernández et al., 2016) to reduce the computation time. The cluster has 314 nodes running RedHat Enterprise Linux version 6.2, with 2 Intel Xeon Sandy Bridge E5-2670 processors at 2.6 GHz with 8 cores each, and 64 GB of RAM.
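One possible way to distribute such an analysis is to cut the image stack into tiles and process them concurrently. The sketch below is an assumption-laden illustration: the tiling scheme, the placeholder worker `mean_intensity` and all function names are hypothetical and do not reproduce the implementation of Hernández et al., 2016.

```python
from concurrent.futures import Executor, ThreadPoolExecutor


def tile_bounds(height, width, tile):
    """Yield (row0, row1, col0, col1) rectangles that cover the image."""
    for r in range(0, height, tile):
        for c in range(0, width, tile):
            yield r, min(r + tile, height), c, min(c + tile, width)


def crop(stack, bounds):
    """Cut the same rectangle out of every frame (stack is a list of 2D lists)."""
    r0, r1, c0, c1 = bounds
    return [[row[c0:c1] for row in frame[r0:r1]] for frame in stack]


def mean_intensity(sub_stack):
    """Placeholder worker standing in for the expensive per-tile analysis."""
    values = [v for frame in sub_stack for row in frame for v in row]
    return sum(values) / len(values)


def analyse_in_parallel(stack, tile, worker, executor: Executor):
    """Apply worker to every tile; returns a dict mapping bounds to results."""
    height, width = len(stack[0]), len(stack[0][0])
    bounds = list(tile_bounds(height, width, tile))
    results = executor.map(worker, (crop(stack, b) for b in bounds))
    return dict(zip(bounds, results))


# Tiny synthetic stack: 3 frames of 4x4 pixels (illustrative values only).
demo_stack = [[[f + r + c for c in range(4)] for r in range(4)] for f in range(3)]
with ThreadPoolExecutor(max_workers=2) as pool:
    demo = analyse_in_parallel(demo_stack, 2, mean_intensity, pool)
```

The executor is passed in so that the same function can run with threads, as here, or with `concurrent.futures.ProcessPoolExecutor`, which is usually the better fit for CPU-bound work on a multi-core machine.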
Data availability
All data generated or analysed during this study are included in the manuscript and supporting files.
References

Rotavirus NSP5 phosphorylation is upregulated by interaction with NSP2. Journal of General Virology 79 Pt 11:2679–2686. https://doi.org/10.1099/0022131779112679

Ultrastructural study of Rotavirus replication in cultured cells. Journal of General Virology 46:75–85. https://doi.org/10.1099/0022131746175

Recovery and characterization of a replicase complex in rotavirus-infected cells by using a monoclonal antibody against NSP2. Journal of Virology 70:985–991.

Receptor activity of Rotavirus nonstructural glycoprotein NS28. Journal of Virology 63:4553–4562.

Rotavirus nonstructural protein NSP5 interacts with major core protein VP2. Journal of Virology 77:1757–1763. https://doi.org/10.1128/JVI.77.3.17571763.2003

Genome packaging in multi-segmented dsRNA viruses: distinct mechanisms with similar outcomes. Current Opinion in Virology 33:106–112. https://doi.org/10.1016/j.coviro.2018.08.001

Association of Rotavirus viroplasms with microtubules through NSP2 and NSP5. Memórias Do Instituto Oswaldo Cruz 101:603–611. https://doi.org/10.1590/S007402762006000600006

RNA interference of Rotavirus segment 11 mRNA reveals the essential role of NSP5 in the virus replicative cycle. Journal of General Virology 86:1481–1487. https://doi.org/10.1099/vir.0.805980

Investigation of immunoperoxidase-labelled Rotavirus in tissue culture by light and electron microscopy. Journal of General Virology 50:195–200. https://doi.org/10.1099/00221317501195

Rotavirus NSP5 orchestrates recruitment of viroplasmic proteins. Journal of General Virology 91:1782–1793. https://doi.org/10.1099/vir.0.0191330

Characterization of Rotavirus NSP2/NSP5 interactions and the dynamics of viroplasm formation. Journal of General Virology 85:625–634. https://doi.org/10.1099/vir.0.196110

Book: Rotaviruses. In: Knipe D, Howley P, Cohen J, Griffin D, Lamb R, Martin M, Racaniello V, Roizman B, editors. Fields Virology (Sixth Edition). Philadelphia: Wolters Kluwer Health/Lippincott Williams & Wilkins. pp. 1347–1401.

Two non-structural Rotavirus proteins, NSP2 and NSP5, form viroplasm-like structures in vivo. Journal of General Virology 80 Pt 2:333–339. https://doi.org/10.1099/00221317802333

Direct least square fitting of ellipses. IEEE Transactions on Pattern Analysis and Machine Intelligence 21:476–480. https://doi.org/10.1109/34.765658

In vivo interactions among Rotavirus nonstructural proteins. Archives of Virology 143:981–996. https://doi.org/10.1007/s007050050347

Serological analysis of the subgroup protein of Rotavirus, using monoclonal antibodies. Infection and Immunity 39:91–99.

Super-resolution imaging with small organic fluorophores. Angewandte Chemie International Edition 48:6903–6908. https://doi.org/10.1002/anie.200902073

Book: Parallelizing the Bayesian Analysis of Blinking and Bleaching for Super-Resolution Microscopy. In: Gitler I, Klapp J, editors. High Performance Computer Applications, 595. Springer. pp. 356–366. https://doi.org/10.1007/9783319322438_25

Two forms of VP7 are involved in assembly of SA11 Rotavirus in endoplasmic reticulum. Journal of Virology 62:2929–2941.

Snakes: active contour models. International Journal of Computer Vision 1:321–331. https://doi.org/10.1007/BF00133570

Book: Introduction to Statistical Inference (First Edition). Springer Texts in Statistics.

Structural analysis of herpes simplex virus by optical super-resolution imaging. Nature Communications 6:2058–2066. https://doi.org/10.1038/ncomms6980

Silencing the morphogenesis of Rotavirus. Journal of Virology 79:184–192. https://doi.org/10.1128/JVI.79.1.184192.2005

Reduced expression of the Rotavirus NSP5 gene has a pleiotropic effect on virus replication. Journal of General Virology 86:1609–1617. https://doi.org/10.1099/vir.0.808270

Rotavirus proteins VP7, NS28, and VP4 form oligomeric structures. Journal of Virology 64:2632–2641.

On a test of whether one of two random variables is stochastically larger than the other. The Annals of Mathematical Statistics 18:50–60. https://doi.org/10.1214/aoms/1177730491

Structural organisation of the Rotavirus nonstructural protein NSP5. Journal of Molecular Biology 413:209–221. https://doi.org/10.1016/j.jmb.2011.08.008

Variations of box plots. The American Statistician 32:12–16. https://doi.org/10.1080/00031305.1978.10479236

An improved seeded region growing algorithm. Pattern Recognition Letters 18:1065–1071. https://doi.org/10.1016/S01678655(97)001311

Rotavirus replication and reverse genetics. Viral Gastroenteritis pp. 121–143. https://doi.org/10.1016/B9780128022412.000079

Book: Particle Level Set Method. In: Level Set Methods and Dynamic Implicit Surfaces. Applied Mathematical Sciences. Springer. pp. 79–86.

Localization of Rotavirus antigens in infected cells by ultrastructural immunocytochemistry. Journal of General Virology 63:457–467. https://doi.org/10.1099/00221317632457

In vivo and in vitro phosphorylation of Rotavirus NSP5 correlates with its localization in viroplasms. Journal of Virology 71:34–41.

Software: R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria.

Book: Chemical Reactor Analysis and Design Fundamentals (Second Edition). Nob Hill Pub.

Intracellular localization of rotaviral proteins. Archives of Virology 88:251–264. https://doi.org/10.1007/BF01310879

Super-resolution microscopy with DNA-PAINT. Nature Protocols 12:1198–1228. https://doi.org/10.1038/nprot.2017.024

Image segmentation using seeded region growing. International Conference on Computing, Electronics and Electrical Technologies 2012:576–583.

Assembly of highly infectious Rotavirus particles recoated with recombinant outer capsid proteins. Journal of Virology 80:11293–11304. https://doi.org/10.1128/JVI.0134606

Effects of intrabodies specific for Rotavirus NSP5 during the virus replicative cycle. Journal of General Virology 85:3285–3290. https://doi.org/10.1099/vir.0.800750

Who invented the Delta method? The American Statistician 66:124–127. https://doi.org/10.1080/00031305.2012.687494

Lipid droplets and cellular lipid metabolism. Annual Review of Biochemistry 81:687–714. https://doi.org/10.1146/annurevbiochem061009102430
Article and author information
Funding
Dirección General de Asuntos del Personal Académico, Universidad Nacional Autónoma de México (IG200317)
 Susana López
 Carlos F Arias
Dirección General de Asuntos del Personal Académico, Universidad Nacional Autónoma de México (IA202417)
 Adán Guerrero
Universidad Nacional Autónoma de México (SC151IR89)
 Adán Guerrero
Consejo Nacional de Ciencia y Tecnología (252213)
 Adán Guerrero
Dirección General de Asuntos del Personal Académico, Universidad Nacional Autónoma de México (IN202312)
 Haydee Olinca Hernández
Universidad Nacional Autónoma de México (SC161IR102)
 Adán Guerrero
The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.
Acknowledgements
YG received a postdoctoral fellowship from DGAPA-UNAM at the Institute of Biotechnology (IBt-UNAM). AG thanks DGTIC-UNAM for generous computing time on the Miztli supercomputer (grant numbers SC151IR89 and SC161IR102). JLM and DTH are recipients of scholarships from CONACyT. HOH received a grant from the Programa de Apoyo a Proyectos de Investigación e Innovación Tecnológica (PAPIIT-UNAM), IN202312. Microscopy equipment was provided and maintained through CONACyT grants 123007, 232708, 260541, 280487 and 293624. AG thanks CONACyT (No. 252213) and DGAPA-PAPIIT (No. IA202417), and SL and CFA thank DGAPA-PAPIIT (grant IG200317), for funding. We are thankful to IBt-UNAM for providing access to the computer cluster and to Jerome Verleyen for his support while using it. We are also thankful to Arturo Pimentel, Andrés Saralegui and Xochitl Alvarado from LNMA-UNAM for their helpful discussions.
Copyright
© 2019, Garcés Suárez et al.
This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.