Glycan processing in the Golgi as optimal information coding that constrains cisternal number and enzyme specificity

  1. Alkesh Yadav
  2. Quentin Vagne
  3. Pierre Sens
  4. Garud Iyengar (corresponding author)
  5. Madan Rao (corresponding author)
  1. Raman Research Institute, India
  2. Laboratoire Physico Chimie Curie, Institut Curie, CNRS UMR168, France
  3. Industrial Engineering and Operations Research, Columbia University, United States
  4. Simons Centre for the Study of Living Machines, National Centre for Biological Sciences, India
16 figures, 3 tables and 1 additional file

Figures

Living cells display a complex glycan distribution.

(a) 3-Gaussian mixture model (GMM) and 20-GMM approximations of the relative abundance of glycans, taken from mass spectrometry coupled with determination of molecular structure (MSMS) data of planaria Schmidtea mediterranea, Hydra magnipapillata, and human neutrophils. (b) The change in the Kullback–Leibler (KL) divergence $D(p_T \parallel p_{\mathrm{GMM}}^{(m)})$ as a function of the number of GMM components $m$. The KL divergence saturates at $m=5$ for planaria, at $m=11$ for hydra, and at $m=20$ for human cells. Thus, the number of components required to approximate the glycan profile correlates well with the complexity of the organism. Details are given in Appendix 1.
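As a rough illustration of the comparison in panel (b), the sketch below evaluates the KL divergence between a binned target profile and a Gaussian mixture evaluated on the same bins. The target here is a hypothetical three-peaked profile, not the MSMS data, and the mixture parameters are chosen by hand rather than fitted; the function names are illustrative only.

```python
import math

def gaussian_pdf(x, mean, std):
    """Density of a normal distribution at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def gmm_on_bins(bins, weights, means, stds):
    """Evaluate a Gaussian mixture at the bin centres and renormalise to sum to 1."""
    raw = [sum(w * gaussian_pdf(x, m, s) for w, m, s in zip(weights, means, stds))
           for x in bins]
    total = sum(raw)
    return [r / total for r in raw]

def kl_divergence(p, q):
    """D(p || q) = sum_k p_k log(p_k / q_k); terms with p_k = 0 contribute zero."""
    return sum(pk * math.log(pk / qk) for pk, qk in zip(p, q) if pk > 0.0)

# Hypothetical target: a three-peaked profile on 60 bins (illustrative, not MSMS data).
bins = list(range(60))
target = gmm_on_bins(bins, [0.5, 0.3, 0.2], [10.0, 30.0, 48.0], [3.0, 2.0, 4.0])

# A single broad component cannot capture three peaks; three matched components can.
coarse = gmm_on_bins(bins, [1.0], [25.0], [15.0])
fine = gmm_on_bins(bins, [0.5, 0.3, 0.2], [10.0, 30.0, 48.0], [3.0, 2.0, 4.0])

kl_coarse = kl_divergence(target, coarse)
kl_fine = kl_divergence(target, fine)
print(kl_coarse, kl_fine)
```

In practice the mixture parameters would be fitted (for example by expectation maximisation) for each number of components $m$, and the divergence tracked as $m$ grows.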

Enzymatic reaction and transport network in the secretory pathway.

Represented here is the array of Golgi cisternae (blue), indexed by $j=1,\ldots,N_C$ and situated between the endoplasmic reticulum (ER) and the plasma membrane (PM). Glycan-binding proteins $\mathrm{P}c_1^{(1)}$ are injected from the ER into cisterna 1 at rate $q$. Superimposed on the Golgi cisternae is the transition network of chemical reactions (columns) and inter-cisternal transfer (rows), the latter with rates $\mu^{(j)}$. $\mathrm{P}c_k^{(j)}$ denotes the acceptor substrate in compartment $j$, and the glycosyl donor $c_0$ is chemostated in each cisterna. This results in a distribution (relative abundance) of glycans displayed at the PM (red curve), which is representative of the cell type.
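The network in this caption can be sketched as a steady-state flux balance. The code below is a simplified illustration that assumes first-order (linear) kinetics with given effective rates, which is an assumption of this sketch and not necessarily the kinetics used in the article. The names `steady_state`, `r_eff`, and the numerical values are hypothetical.

```python
def steady_state(q, mu, r_eff):
    """Steady-state concentrations c[j][k] for a linear reaction/transport chain.

    q     : injection rate of the first glycan into cisterna 1
    mu    : mu[j] is the transport rate out of cisterna j
    r_eff : r_eff[j][k] is the first-order rate of the reaction k -> k+1 in
            cisterna j (assumed kinetics; the last species does not react further)
    """
    n_c = len(mu)
    n_s = len(r_eff[0]) + 1
    c = []
    for j in range(n_c):
        row = []
        for k in range(n_s):
            # Inflow by transport: ER injection for cisterna 1, else cisterna j-1.
            if j == 0:
                inflow = q if k == 0 else 0.0
            else:
                inflow = mu[j - 1] * c[j - 1][k]
            # Inflow by reaction from species k-1 within the same cisterna.
            if k > 0:
                inflow += r_eff[j][k - 1] * row[k - 1]
            # Outflow: transport onward plus conversion to species k+1.
            out_rate = mu[j] + (r_eff[j][k] if k < n_s - 1 else 0.0)
            row.append(inflow / out_rate)
        c.append(row)
    return c

# Two cisternae, four glycan species, illustrative rates.
mu = [1.0, 1.0]
r_eff = [[2.0, 1.0, 0.5], [0.5, 1.0, 2.0]]
c = steady_state(q=1.0, mu=mu, r_eff=r_eff)

# Relative abundance displayed at the PM: the normalised profile of the last cisterna.
total = sum(c[-1])
displayed = [ck / total for ck in c[-1]]
print(displayed)
```

Because each balance equation depends only on the previous species and the previous cisterna, the system can be solved by forward substitution; the flux leaving the last cisterna equals the injection rate $q$.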

Trade-offs amongst the glycan synthesis parameters, enzyme specificity $\sigma$, cisternal number $N_C$, and enzyme number $N_E$, to achieve a complex target distribution $c^*$.

(a, b) Normalised Kullback–Leibler distance $\bar{D}(\sigma,N_E,N_C,c^*)$ as a function of $\sigma$ and $N_C$ (for fixed $N_E=3$); (c, d) $\bar{D}(\sigma,N_E,N_C,c^*)$ as a function of $\sigma$ and $N_E$ (for fixed $N_C=3$), with the target distribution $c^*$ set to the 3-Gaussian mixture model (GMM) (less complex) and 20-GMM (more complex) approximations of the human T-cell mass spectrometry coupled with determination of molecular structure (MSMS) data. $\bar{D}(\sigma,N_E,N_C,c^*)$ is a convex function of $\sigma$ for each $(N_E,N_C,c^*)$, decreasing in $N_C$ and $N_E$ for each $(\sigma,c^*)$, and increasing in the complexity of $c^*$ for fixed $(\sigma,N_E,N_C)$. The specificity $\sigma_{\min}(c^*,N_E,N_C)=\arg\min_{\sigma}\bar{D}(\sigma,N_E,N_C,c^*)$ that minimizes the error for given $(N_E,N_C,c^*)$ is an increasing function of $N_C$, $N_E$, and the complexity of the target distribution $c^*$.
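The normalised distance used in these panels is a KL divergence divided by the entropy of the target (see the table of symbols). A minimal sketch of that quantity, for a single candidate profile and without the minimisation over the synthesis parameters, might look as follows; the two example distributions are made up for illustration.

```python
import math

def entropy(p):
    """Shannon entropy H(p) = -sum_k p_k log p_k (natural logarithm)."""
    return -sum(pk * math.log(pk) for pk in p if pk > 0.0)

def kl_divergence(p, q):
    """D(p || q); infinite if q vanishes where p does not."""
    total = 0.0
    for pk, qk in zip(p, q):
        if pk > 0.0:
            if qk <= 0.0:
                return math.inf
            total += pk * math.log(pk / qk)
    return total

def fidelity(target, synthesized):
    """KL divergence from target to synthesized profile, normalised by H(target)."""
    return kl_divergence(target, synthesized) / entropy(target)

# Hypothetical four-species target and a flat synthesized profile.
target = [0.1, 0.4, 0.4, 0.1]
flat = [0.25, 0.25, 0.25, 0.25]

f_flat = fidelity(target, flat)
f_exact = fidelity(target, target)
print(f_flat, f_exact)
```

A value of zero means the synthesized profile reproduces the target exactly; larger values indicate a poorer match.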

Fidelity of glycan distribution and optimal enzyme properties to achieve a complex target distribution.

The target $c^*$ is taken from the 3-Gaussian mixture model (GMM) (less complex) and 20-GMM (more complex) approximations of the human T-cell mass spectrometry coupled with determination of molecular structure (MSMS) data. (a, b) Optimum fidelity $\min_{\sigma}\bar{D}(\sigma,N_E,N_C,c^*)$ as a function of $(N_E,N_C)$. More complex distributions require either a larger $N_E$ or a larger $N_C$. The marginal impact of increasing $N_E$ and $N_C$ on the fidelity $\bar{D}$ is approximately equal. (c, d) Enzyme specificity $\sigma_{\min}$ that achieves $\min_{\sigma}\bar{D}(\sigma,N_E,N_C,c^*)$ as a function of $(N_E,N_C)$. $\sigma_{\min}$ increases with increasing $N_E$ or $N_C$. Synthesizing the more complex 20-GMM approximation with high fidelity requires enzymes with higher specificity $\sigma_{\min}$ than those needed to synthesize the broader, less complex 3-GMM approximation.

Optimal enzyme partitioning in cisternae.

(a) Heat map of the effective reaction rates in each cisterna (representing the optimal enzyme partitioning) and the steady-state concentration in the last compartment, $c^{(N_C)}$, for the 20-Gaussian mixture model (GMM) target distribution. Here, $N_E=5$, $N_C=7$, and the normalized divergence is $D(T^{(20)} \parallel c^{(N_C)})/H(T^{(20)})=0.11$. (b) Effective reaction rates after swapping the optimal enzymes of the fourth and second cisternae. The displayed glycan profile is considerably altered from the original profile.

Stiff and sloppy directions in the optimization parameters.

(a) Eigenvectors of the Hessian matrix $\left.\partial^2 F/\partial X_i\,\partial X_j\right|_{X_{\min}}$ for $(N_E,N_C)=(4,4)$. The x-axis indexes the $N_C+2N_EN_C+1=37$ eigenvectors, the y-axis indexes the $N_C+2N_EN_C+1$ components of the eigenvectors, and the greyscale denotes the absolute value of the component in the range $[0,1]$. The components are grouped according to $(\mu,R,L,\sigma)$, and the eigenvectors are ordered according to the most dominant component in the eigenvector ($\mu$, orange; $R$, blue; $L$, green; $\sigma$, purple). There is some mixing of the different components ($R$ and $\mu$, or $\sigma$ and $\mu$), but this is usually small. (b) The distribution of eigenvalues $\lambda_i$ of the Hessian matrix $\left.\partial^2 F/\partial X_i\,\partial X_j\right|_{X_{\min}}$. Each stripe represents an eigenvalue, and the location of the stripe on the x-axis indicates whether the dominant component of the associated eigenvector belongs to the $\mu$, $R$, $L$, or $\sigma$ direction. (c) The average stiffness along the $\mu$, $R$, $L$, or $\sigma$ directions, defined as the log of the average of the eigenvalues corresponding to the eigenvectors in the respective group, as a function of $N_C$ for fixed $N_E=4$. (d) Total average stiffness $\langle\lambda\rangle=\log\left(\frac{\sum_i\lambda_i}{N_C+2N_EN_C+1}\right)$ as a function of $N_E$ and $N_C$.
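To make the stiff/sloppy terminology concrete, the sketch below counts the optimization parameters as in the caption and estimates a Hessian by central finite differences. The objective is a toy quadratic chosen for illustration, not the fidelity $F$ of the article, and the helper names are hypothetical. A large curvature marks a stiff direction and a small one a sloppy direction.

```python
def parameter_count(n_e, n_c):
    """mu contributes N_C entries, R and L contribute N_E*N_C each, sigma one."""
    return n_c + 2 * n_e * n_c + 1

def hessian(f, x, h=1e-4):
    """Central finite-difference Hessian of a scalar function f at the point x."""
    n = len(x)
    hess = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            def shifted(di, dj):
                # Evaluate f with coordinates i and j displaced by di*h and dj*h.
                y = list(x)
                y[i] += di * h
                y[j] += dj * h
                return f(y)
            hess[i][j] = (shifted(1, 1) - shifted(1, -1)
                          - shifted(-1, 1) + shifted(-1, -1)) / (4.0 * h * h)
    return hess

def toy_objective(x):
    """Toy quadratic with one stiff (curvature 100) and one sloppy (curvature 1) axis."""
    return 50.0 * (x[0] - 1.0) ** 2 + 0.5 * (x[1] - 2.0) ** 2

n_params = parameter_count(4, 4)            # 4 + 2*4*4 + 1
hess = hessian(toy_objective, [1.0, 2.0])   # evaluated at the minimum
print(n_params, hess)
```

For a diagonal Hessian such as this one the eigenvalues are simply the diagonal entries; a general Hessian would need an eigendecomposition from a linear-algebra library.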

Strategies for achieving high glycan diversity.

Diversity versus $N_C$ and transport rate $\mu$ at various values of specificity $\sigma$ for fixed $N_E=3$. (a) Diversity vs. $N_C$ at optimal transport rate $\mu$. Diversity initially increases with $N_C$, but eventually levels off. The levelling off starts at a higher $N_C$ when $\sigma$ is increased. These curves are bounded by the $\sigma=0$ curve. (b) Diversity vs. cisternal residence time ($\mu^{-1}$) in units of the reaction time ($R_{\min}^{-1}$) at various values of $\sigma$, for fixed $N_C=4$ and $N_E=10$.

Appendix 1—figure 1
The binned mass spectrometry (MS) data (blue) approximates the raw MS data (red) very well.

We use this binned data for Gaussian mixture model (GMM) approximation of the MS data.

Appendix 1—figure 2
Log likelihood vs. number of components (N) in the Gaussian mixture model (GMM).

We see that the log likelihood saturates at around $N=20$; thus, the 20-GMM is a very good representation of the mass spectrometry (MS) data from human T-cells. The different symbols are for (a) different values of the maximum intensity, $I_{\max}=50,100,200$, and (b) different numbers of i.i.d. samples, $N_i=500,1000,2000$, showing the insensitivity of the log likelihood to the values of $I_{\max}$ and $N_i$.

Appendix 4—figure 1
Flow chart showing the optimization schemes for Optimization A and B.

We prove that $D_{\min}^{A}=D_{\min}^{B}$ by showing that the set of all $c$ is equal to the set of all $v$. We additionally establish that the optima coincide, $v_{\min}=c_{\min}$.

Appendix 6—figure 1
Glycan concentration profile calculated from the model using (a) formula (31) for $N_E=N_C=1$ and (b) formulae (32)–(36) for $N_E=1$, $N_C=2$.

Appendix 6—figure 2
Glycan profile $\{c_k : k=1,\ldots,N_s\}$ as a function of specificity $\sigma$ (a, c) and reaction rate $R$ (b, d).

(a) $N_E=N_C=1$, $(R=50,\ \mu=1,\ l=10)$. $c_k$ decreases exponentially with $k$ for very low and very high $\sigma$; however, the decay rate is lower at low $\sigma$. For intermediate values of $\sigma$, the distribution has exactly two peaks, one of which is at $k=0$, and eventually decays exponentially. The width of the distribution is a decreasing function of $\sigma$. (b) $N_E=N_C=1$, $(\sigma=0.1,\ \mu=1,\ l=10)$. At low $R$, $c_k$ is concentrated at low $k$. The proportion of higher-index glycans is an increasing function of $R$. (c) $N_E=N_C=2$, $(R=40,\ \mu=1,\ [l_1^{(1)},l_2^{(1)},l_1^{(2)},l_2^{(2)}]=[10,30,50,70])$. As $\sigma$ increases, the distribution becomes more complex, going from a single peak at low $\sigma$ to as many as four peaks at high $\sigma$. The peaks get sharper and better defined as $\sigma$ increases. (d) $N_E=N_C=2$, $(R=40,\ \mu=1,\ [l_1^{(1)},l_2^{(1)},l_1^{(2)},l_2^{(2)}]=[10,30,50,70])$. As in (b), increasing $R$ shifts the peaks towards higher-index glycans, and the proportion of higher-index glycans increases.
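The qualitative role of $\sigma$ can be explored with a small sketch for the single-enzyme, single-cisterna case using the parameter values quoted in panel (a). Two assumptions are made here that the captions do not state: the enzyme's rate on glycan $k$ is taken to fall off as $R\,e^{-\sigma|k-l|}$ around its ideal substrate length $l$, and the kinetics are taken to be first order. The number of species (40) and the function names are also hypothetical.

```python
import math

def effective_rates(sigma, rates, lengths, n_s):
    """Assumed form: R_eff(k) = sum_a R_a * exp(-sigma * |k - l_a|), k = 1..n_s-1."""
    return [sum(r * math.exp(-sigma * abs(k - l)) for r, l in zip(rates, lengths))
            for k in range(1, n_s)]

def single_cisterna_profile(q, mu, r_eff):
    """Normalised steady-state profile of one cisterna with first-order kinetics."""
    n_s = len(r_eff) + 1
    c = [q / (mu + r_eff[0])]                 # species 1 is fed by injection
    for k in range(1, n_s):
        # The last species has no onward reaction, so it only leaves by transport.
        out_rate = mu + (r_eff[k] if k < n_s - 1 else 0.0)
        c.append(r_eff[k - 1] * c[k - 1] / out_rate)
    total = sum(c)
    return [ck / total for ck in c]

# Parameters from panel (a): R = 50, mu = 1, l = 10; n_s = 40 is an arbitrary choice.
profiles = {}
for sigma in (0.0, 0.1, 1.0):
    r_eff = effective_rates(sigma, rates=[50.0], lengths=[10], n_s=40)
    profiles[sigma] = single_cisterna_profile(q=1.0, mu=1.0, r_eff=r_eff)

for sigma, profile in profiles.items():
    print(sigma, [round(ck, 4) for ck in profile[:12]])
```

At $\sigma=0$ the assumed rate is the same for every substrate, so the enzyme is completely non-specific; increasing $\sigma$ concentrates its activity around $k=l$.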

Appendix 7—figure 1
Optimum fidelity $\bar{D}_{KL}$ as a function of $(N_E,N_C)$ for different values of $b/N_s$, where $b$ bounds the deformation in the ideal length $l_\alpha^{(0)}$ of an enzyme $\alpha=1,\ldots,N_E$.

Small values of $b$ prevent the enzymes from acting in every cisterna and on every substrate, whereas large values of $b$ remove this constraint.

Appendix 8—figure 1
Recovering the σ values for different target distributions.

Note that, barring four data points, all other optimized σ values (red dots) exactly overlap with the corresponding target σ (diamonds).

Appendix 8—figure 2
D¯ for various initial conditions, sorted in increasing order for clarity.

This clearly shows the fraction of initial conditions for which the optimized D¯ is small (see Appendix 8—table 1).

Appendix 10—figure 1
Diversity vs. $N_C$ for different values of $\sigma$, keeping $N_E=1$ fixed, for three different values of the threshold, $c_{th}=\frac{1}{N_s},\ \frac{1}{2N_s},\ \frac{1}{4N_s}$.

Changing the value of the threshold cth only changes the saturation value of the diversity curve.
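One simple way to read the threshold is sketched below, assuming that diversity is the number of glycan species whose relative abundance exceeds $c_{th}$. This working definition and the example profile are assumptions of the sketch and are not taken from the article.

```python
def diversity(profile, c_th):
    """Count the glycan species whose relative abundance exceeds the threshold c_th."""
    total = sum(profile)
    return sum(1 for ck in profile if ck / total > c_th)

# Hypothetical profile over N_s = 8 glycan species.
profile = [0.40, 0.25, 0.15, 0.10, 0.05, 0.03, 0.015, 0.005]
n_s = len(profile)

# The three thresholds used in the caption: 1/N_s, 1/(2 N_s), 1/(4 N_s).
counts = [diversity(profile, 1.0 / (factor * n_s)) for factor in (1, 2, 4)]
print(counts)
```

Lowering the threshold can only admit more species, so the count is non-decreasing as $c_{th}$ is reduced.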

Tables

Appendix 2—table 1
Enzyme parameters taken from Table 3 in Umaña and Bailey, 1997 that we use to calculate the bounds on the reaction rate R.

Here, $K_M^{(\alpha)}$ and $V_{\max}^{(\alpha)}$ denote the Michaelis constant and $V_{\max}$ of the $\alpha$th enzyme.

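| $\alpha$ | $K_M^{(\alpha)}$ (µmol) | $V_{\max}^{(\alpha)}$ (pmol/$10^6$ cell-min) |
|---|---|---|
| 1 | 100 | 5 |
| 2 | 260 | 7.5 |
| 3 | 200 | 5 |
| 4 | 100 | 5 |
| 5 | 190 | 2.33 |
| 6 | 130 | 0.16 |
| 7 | 3400 | 0.16 |
| 8 | 4000 | 9.66 |
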
Appendix 8—table 1
Distribution of local minima.
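| $N_E$ | $N_C$ | $\min \bar{D}_{KL}$ | $\max \bar{D}_{KL}$ | Fraction of initial conditions within $\bar{D}_{KL} \le 0.0228$ |
|---|---|---|---|---|
| 1 | 1 | 0.0228 | 0.44 | 0.56 |
| 2 | 1 | 0.0081 | 0.44 | 0.73 |
| 1 | 2 | 0.0051 | 0.29 | 0.70 |
| 2 | 2 | 1.17e-4 | 0.29 | 0.84 |
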
Appendix 11—table 1
Table of symbols and their definitions.
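| Symbol | Definition |
|---|---|
| $c_k^{(j)}$ | Concentration of the $k$th glycan in the $j$th compartment |
| $\mu^{(j)}$ | Transport rate from the $j$th to the $(j+1)$th compartment |
| $\sigma$ | Specificity of the enzymes |
| $l_\alpha^{(j)}$ | Ideal substrate length for the $\alpha$th enzyme in the $j$th compartment |
| $M(j,k,\alpha)$ | Enzyme parameter related to the Michaelis constant $K_M$ |
| $V(j,k,\alpha)$ | Enzyme parameter related to $V_{\max}$ |
| $R(j,\alpha)$ | Reaction parameter for Optimization B |
| $D(c^* \parallel c)$ | KL divergence between $c^*$ and $c$ |
| $F(c^* \parallel c)$ | Fidelity, KL divergence normalized by the entropy of the target $c^*$ |
| $\bar{D}(\sigma,N_E,N_C,c^*)$ | $\min_{\mu,R,L} F(c^* \parallel c)$ |
| $R_{\mathrm{eff}}(j,k)$ | Effective reaction rate of the $k$th reaction in the $j$th compartment |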


Alkesh Yadav, Quentin Vagne, Pierre Sens, Garud Iyengar, Madan Rao (2022) Glycan processing in the Golgi as optimal information coding that constrains cisternal number and enzyme specificity. eLife 11:e76757. https://doi.org/10.7554/eLife.76757