A single, continuous metric to define tiered serum neutralization potency against HIV
Abstract
HIV-1 Envelope (Env) variants are grouped into tiers by their neutralization-sensitivity phenotype. This helped to recognize that tier 1 neutralization responses can be elicited readily, but do not protect against new infections. Tier 3 viruses are the least sensitive to neutralization. Because most circulating viruses are tier 2, vaccines that elicit neutralization responses against them are needed. While tier classification is widely used for viruses, a way to rate serum or antibody neutralization responses in comparable terms is needed. Logistic regression of neutralization outcomes summarizes serum or antibody potency on a continuous, tier-like scale. It also tests significance of the neutralization score, to indicate cases where serum response does not depend on virus tiers. The method can standardize results from different virus panels, and could lead to high-throughput assays, which evaluate a single serum dilution, rather than a dilution series, for more efficient use of limited resources to screen samples from vaccinees.
https://doi.org/10.7554/eLife.31805.001
Introduction
Comparing antibody neutralization activity of different sera against genetically and antigenically diverse viral strains requires standardization. ID50 (or ID80) values, the inhibitory dilutions at which 50% (or 80%) neutralization is attained, are determined for a panel of viruses, using the TZM-bl neutralization assay (Sarzotti-Kelsoe et al., 2014). Serum breadth and potency are two measures used to characterize neutralization responses across virus diversity. Breadth is the proportion of pseudoviruses with an ID50 score above the threshold of detection, and potency is the geometric mean ID50 (Hraber et al., 2014; Rademeyer et al., 2016). At least half of the variation in neutralization assay results from large panels can be explained by the averaged responses per serum, Env, and the entire panel, overall (Hraber et al., 2014). Serum breadth and potency therefore depend strongly on the Env panels used, which can vary markedly between studies.
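The two summary measures defined above can be sketched in a few lines of Python. This is only an illustration: the function name, the detection threshold of 20, and the substitute value of 10 for undetected titers are assumptions made for the example, not settings prescribed by the assay.

```python
import math

def breadth_and_potency(id50s, detection_threshold=20.0, censored_value=10.0):
    """Return (breadth, geometric mean ID50) for one serum across a virus panel.

    detection_threshold and censored_value are illustrative assumptions.
    """
    # Breadth: fraction of viruses with a detectable titer.
    detected = [t for t in id50s if t >= detection_threshold]
    breadth = len(detected) / len(id50s)
    # Potency: geometric mean, with undetected titers set to a low constant.
    adjusted = [t if t >= detection_threshold else censored_value for t in id50s]
    log_mean = sum(math.log(t) for t in adjusted) / len(adjusted)
    return breadth, math.exp(log_mean)

# Hypothetical titers for one serum against four viruses.
breadth, potency = breadth_and_potency([10, 40, 160, 10])
print(breadth, round(potency, 1))
```

Both quantities change when viruses are added to or removed from the list, which is the panel dependence noted above.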
Virus neutralization sensitivity to panels of sera from chronically infected individuals represents a continuum (Seaman et al., 2010). Characterizing Envs in tiers involves partitioning large neutralization panels into three or four groups with similar sensitivity (Rademeyer et al., 2016; Seaman et al., 2010). Antibodies able to neutralize only tier 1 (most sensitive) viruses are readily elicited by HIV Env gp120 immunogens, but such tier-1 responses are not protective; in human vaccine efficacy trials, such responses have been unable to confer protection against the viruses that continue to fuel the pandemic (Gilbert et al., 2010; Montefiori et al., 2012). Tier 2 viruses are more difficult to neutralize than tier 1, and represent the majority of viruses that are transmitted to establish new infections (Rademeyer et al., 2016; Seaman et al., 2010). Tier 3 viruses are the most resistant to neutralization.
One difficulty with the tiered scheme for labeling viruses (i.e. tiers 1A, 1B, 2, and 3) is that it simplifies a continuous distribution into three or four categories (Seaman et al., 2010), despite wide variation within each category. Moreover, while the system categorizes viruses, it does not help compare serum neutralization potency. For example, a serum that neutralizes one tier 3 virus but only a few tier 2 viruses might subjectively be designated a ‘tier 3 neutralizing serum,’ while one that neutralizes no tier 3 viruses but many tier 2 viruses would be designated a ‘tier 2 serum.’ The latter serum is likely more potent (protective) in real-world scenarios despite being designated with a lower tier. A metric to rate sera for neutralization potency would be useful, for example to down-select vaccine candidates for further evaluation in clinical trials. Such a metric should be objective and continuous, rather than category-based. It should also provide biologically meaningful and interpretable values that are consistent with expectations of tiered viruses from terminology used by practitioners in this field.
Here, we describe an objective, quantitative metric for serum classification, and apply it to characterize serum neutralization activity against both large and smaller panels of pseudoviruses. It uses logistic regression to establish a numerical value for a given serum, based on its ability to neutralize viruses of different tiers. We describe a statistically motivated Neutralization Potency (NP) score, which represents serum neutralization tier on a continuous, rather than categorical, scale. That scale is designed to be intuitively meaningful to HIV researchers, such that sera with a low score (near 1) are able to neutralize only tier-1 viruses, while sera with scores ranging from 2 to 3 reflect increasing capacity to neutralize tier 2 and 3 viruses. A continuum of NP values enables comparisons between sera. Rather than suggesting most sera can neutralize tier 2 viruses, NP values can distinguish between, say, ‘tier 2.1’ and ‘tier 2.5’ sera, the higher score indicating a better neutralization outcome. The potency comparisons are similar to comparing geometric mean neutralization titers, but instead are represented in tier-like terms.
Because this approach is based on the outcome of yes-or-no neutralization evaluations from a single dilution of serum, it can be used to evaluate large numbers of sera in novel high-throughput designs. The examples here use a threshold ID50 of 1:50, that is, a binary assignment of whether or not a serum neutralizes 50% of a virus at a dilution of 1:50. This means the NP could be calculated from a single serum dilution, as opposed to a full eight-point titration series. The metric lends itself to high-throughput methods to compare neutralization potencies of many sera.
Results
To define a metric that can compare neutralization potencies of different sera, we assigned a single neutralization index (NI) value to each HIV-1 envelope-pseudotyped virus (Materials and methods). The logarithm of the geometric mean ID50 against 205 sera was linearly rescaled to correspond with tier designations. The resulting NI values thus capture envelope neutralization sensitivity across 205 sera and provide a continuous scale of sensitivity that roughly corresponds to their tier designations. For logistic regression analyses below, the Envs were evaluated according to their NI. We note that the neutralization index could be defined in other ways, for example by using area-under-curve (AUC) values from the dilution series (Yu et al., 2012).
Tier-scaled virus NIs can be used to quantify serum neutralization activity. As higher-tier viruses are more difficult to neutralize, the expectation for a typical serum is that it can more potently neutralize lower-tier viruses than higher-tier viruses. In an oversimplification to illustrate the concept, we consider cases where a single threshold NI value cleanly separates outcomes, such that only those viruses above that NI resist neutralization by that serum. The serum would be assigned the threshold NI as a measure of neutralization potency. For example, one hypothetical serum that neutralizes tier 1 viruses up to tier 1.1 would receive a neutralization score of 1.1 (Figure 1a). Another hypothetical serum that neutralizes tier 2 viruses up to tier 2.8 receives a score of 2.8 (Figure 1b). Thus, in these simple examples, the scores for sera directly reflect their ability to neutralize viruses up to a rescaled virus tier value.
In practice, neutralization responses are noisy and not clearly resolved as in these two examples. Instead, the viruses neutralized by a particular serum are more scattered, and the overall trend becomes apparent in the aggregate view across many viruses, by considering how the probability of neutralization depends on rescaled tier values (Figure 1c). The probability for a serum to neutralize should decrease as virus NI values increase. We define serum neutralization potency (NP) as the NI that gives equal probability for viruses sensitive and resistant to neutralization by that serum, which we estimate using logistic regression (Figure 1d and Materials and methods).
Serum neutralization potencies from 225 Envs
From the large, multi-clade, M group neutralization panel (Hraber et al., 2014), we computed log-geometric mean neutralization ID50 titers per virus and serum, then transformed the log-means onto the interval from at least 1 to below 4, to obtain tiered neutralization indices (NIs). This transformation gives an inverse relationship between tier-scaled NIs and geometric mean titer for viruses, as higher titers correspond to lower-tier viruses (Figure 2a).
Conversely for sera, higher titers averaged across viruses indicate greater probability of neutralizing higher-tier, neutralization-resistant viruses (Figure 2b). That is, tier 3 sera (NP >3) are better able to neutralize tier 3 viruses than are tier 1 sera (NP <2). A tier 2.5 serum should generally be able to neutralize viruses up to that point on the neutralization sensitivity continuum, although there could be some viruses that are neutralized above and a few resistant below. Low scatter, as illustrated in Figure 1a and b, gives a steep slope in logistic regression, indicating a sharp boundary between viruses neutralized and not neutralized. More scatter, that is, lower contrast between viruses neutralized and not neutralized with increasing mean ID50, as illustrated in Figure 1c, gives a lower slope. Inability to resolve between neutralization outcomes would give no slope, quantified by a high probability of a false positive (p-value) from rejecting the null hypothesis that the slope is zero, as is true for some NP outcomes in Figure 2b. This is most common at the low end of the serum neutralization continuum, where the ability to neutralize a virus is constant (and low) across the range of virus sensitivities.
Resampling (with replacement) sets of 225 Envs from the M group panel indicates that NP values are robust to sampling variation, save for a few sera with a slope of zero (Figure 2—figure supplement 1a). Variation may increase slightly among resampled NP values at the extremes of the neutralization scale even when the slope is nonzero (Figure 2—figure supplement 1b).
Neutralization responses for a typical serum, such as SA.C37, which has median potency among the sera we studied, appear in Figure 3. For each virus, the neutralization outcome is shown as a function of the tier-transformed geometric mean ID50 (Figure 3a). Serum neutralization potency computed from the 225-Env panel is 2.5. Separation, with overlap, between viruses neutralized and not neutralized is apparent in Figure 3b.
Serum neutralization potencies from small Env panels
As described in Materials and methods, we identified smaller panels and evaluated whether sets of 11 of the 12-Env global panel (deCamp et al., 2014), or either 10 or 20 Envs identified by lasso (Tibshirani, 1996; Friedman et al., 2010) could estimate neutralization indices more efficiently than the full set of 225 Envs. Together, five Envs were shared by all three small panels (Figure 4–figure supplement 1) and four additional Envs were common to both the global panel and the 20-Env lasso panel, while the two hand-selected Envs were specific only to the global panel. The Envs that were selected to infer NP from smaller panels represent a range of neutralization sensitivities, favoring more sensitive and disfavoring the insensitive viruses (Figure 4–figure supplement 1).
A review of the ability of each panel to model NP for the serum SA.C37 (Figure 4) suggests that the 20-Env panel, as might be expected, is better able to resolve between neutralization outcomes (p=0.000162, Figure 4c) than smaller panels (Figure 4a and b; p=0.00298 and p=0.00151 for global and 10-Env panels, respectively). In this example, the standard global panel performed as well as the lasso panel of 10. This one case may not indicate the responses when tested across more sera.
NP values inferred from smaller panels are correlated with NP values from larger sets of holdout, non-panel Envs, and also with each other, across panels (Figure 4–figure supplement 2). The lasso-selected panel of 10 and the global panel perform similarly, so the global panel performs reasonably well for a panel of that size. The global panel may offer greater resolution for tier 1A and 1B sera, because it includes more Envs with NI below 2.0 than either of the panels selected by lasso. With panel Envs, inferred serum NP values were roughly limited to range from 1 to 4. When more Envs are tested, the NP values can fall below 1 or above 4, most likely because the most extreme Envs on the neutralization continuum allow the logistic regression to fit parameters outside the range typically expected.
Across all 205 sera, the 20-Env panel is better powered to detect a nonzero slope than the smaller 10- and 11-Env panels (Figure 4–figure supplement 3), giving p>0.05 in about half as many cases as the smaller panels, likely because greater statistical power results from having nearly twice as many measurements to compute logistic regression parameters. Overall, rather than recommending use of a single panel to compute NP, it seems NP can be computed using a reasonable choice of Envs that represent a range of neutralization sensitivities, and panels with more Envs are more likely to yield statistically significant NP values than panels with fewer Envs.
Neutralization responses in progressors and long-term nonprogressors
Using previously reported neutralization assay results (Doria-Rose et al., 2010), we computed geometric mean ID50 titers from 20 Envs and 103 donor sera, of which 25 were previously found to be long-term nonprogressors (LTNP) (Migueles et al., 2002; Migueles et al., 2008; Doria-Rose et al., 2009). We identified 10 Envs in this panel that were also tested in the M group panel, two from subtype C (DU422.1, DU156.12), seven from subtype B (BG1168.1, 6101.10, TRJO4551.58, PVO.4, CAAN5342.A2, THRO4156.18, TRO.11), and one from subtype A (Q769.D22). We computed NP values from these 10 Envs, using a cutoff ID50 of 50 for positive neutralization outcome, and a p-value cutoff from χ^{2} testing of 0.1 to indicate a significant NP score. This cutoff excluded 4 of 25 LTNP sera and 20 of 78 progressors; the proportions of excluded NP values were not significantly different between progressors and nonprogressors (Fisher’s exact test, p=0.42). NP values are highly correlated with geometric mean ID50s (Figure 5), regardless of whether or not NP outcomes with high p-values from χ^{2} testing are excluded (Kendall’s τ, p<2.2 × 10^{−16}).
While the range of NP values in Figure 5 may seem large, a score of 4.1 indicates sera that neutralized all ten Envs (ID50 ≥50), and the lowest-scoring sera neutralized none. To help interpret this range, consider the serum with a nominal NP of 4.6, which had a high corresponding p-value of 0.24. This particular serum neutralized 9 of 10 Envs, but the NP score is not significant. The only Env this serum failed to neutralize was DU156.12, which should be the most readily neutralized of these 10 Envs. (Its NI is 2.04, versus a mean NI of 2.91 and range from 2.46 to 3.34 among the other nine.) In this case, DU156.12 may contain one or more mutations that altered an epitope targeted by this serum.
As anticipated, based on the established studies (Doria-Rose et al., 2010), the geometric mean ID50s differed significantly between the progressors and nonprogressors (Wilcoxon test, p=2.1 × 10^{−11}). We noted the same outcome for breadth, defined as the percentage of 20 Envs neutralized per serum with an ID50 of at least 50 (Wilcoxon p=7.2 × 10^{−11}). Similarly, NP values differed significantly between these groups (n = 79, Wilcoxon p=4.3 × 10^{−9}). This NP comparison therefore agrees with established findings (Doria-Rose et al., 2010), but notably, the NP comparisons involved half as many Envs and could use many fewer dilutions than required to compute mean ID50s among 20 Envs. To repeat the NP analysis using single-dilution assays, rather than a five-point (or greater) dilution series, the entire comparison would require at most one-tenth the number of neutralization reactions and material per sample, compared with the experiment using mean ID50 titers. Also, as seen above, testing more than 10 Envs per serum could increase statistical power, and reduce the number of sera excluded because their NP scores were deemed insufficiently significant.
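The one-tenth figure follows from simple counting, which the short Python sketch below spells out. The helper name is hypothetical, and replicate wells and controls are ignored, which is an assumption of the example rather than a statement about the assay layout.

```python
def reactions_per_sample(n_envs, n_dilutions):
    """Count neutralization reactions for one serum (replicates and controls ignored)."""
    return n_envs * n_dilutions

titration = reactions_per_sample(n_envs=20, n_dilutions=5)        # mean-ID50 design
single_dilution = reactions_per_sample(n_envs=10, n_dilutions=1)  # NP design
print(titration, single_dilution, single_dilution / titration)
```

With a dilution series longer than five points, the ratio drops below one-tenth, which is why the text says "at most".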
Vaccinee sera
Recent work to stabilize the clade A BG505 SOSIP.664 trimer and reduce conformational changes in the CD4-bound state has introduced mutations that increase hydrophobic packing of the V3 loop region, increase sensitivity to known neutralizing antibodies, and add disulfide bonds between gp120 and gp41 subunits within or between protomers (Sanders et al., 2013; Pugach et al., 2015; Julien et al., 2015; de Taeye et al., 2015; Ringe et al., 2017). These mutations have been introduced to several Env isolates in addition to BG505, including a C-clade Env, ZM197M, to demonstrate general suitability of the approach (Pugach et al., 2015; Julien et al., 2015; de Taeye et al., 2015). A recent study (Torrents de la Peña et al., 2017) evaluated antigenicity and immunogenicity of these next-generation modified SOSIP trimers, designated v4 through v6.
We analyzed the postvaccination serological data by computing neutralization potency scores from ID50 neutralization titers in 50 rabbits sampled 22 weeks after the first vaccination (boosted at weeks 4 and 20). Fifteen viruses from that study overlapped with the set for which we have already computed NI values, which we tabulate from most to least sensitive (Table 1). For comparison, we computed breadth as the fraction of these 15 Envs that were neutralized with an ID50 of at least 50 reciprocal dilutions. We computed geometric mean titers with censored values fixed at a low constant (i.e. <20 was treated as 10). Where two different laboratories tested the same Env, we used the more complete set of results (i.e. those with fewer missing values).
Results complement the original findings, and add to interpretation of the original assay results. Using a cutoff ID50 of 50, all but two of 50 rabbits tested yielded p-values below 0.05 (Table 2). These two animals (1586 and 1591 from Study C02215, vaccinated with BG505 SOSIP.v5.1 and BG505 SOSIP.v5.2, respectively) neutralized only WITO, which is a relatively resistant Env. Among animals that showed similar breadth (7%, or 1 of 15 Envs neutralized) and potency (geometric mean ID50 of 13.3 and 12.8, respectively), the NP values from animals 1586 and 1591 are lower and of more questionable significance (NP = 0.87, χ^{2} p=0.0831 for both) than that of animal 1588, vaccinated with BG505 SOSIP.v5.1 (NP = 1.73, χ^{2} p=0.0171).
Similarly, comparing outcomes in Study C012015 (Table 2), three of five animals vaccinated with ZM197M SOSIP.v5.2 (with IDs of 1875, 1876, and 1878) showed 20% breadth (3 of 15 Envs neutralized), geometric mean titers of 19.3, 19.1, and 21.9, and NP values above 2.0, with χ^{2} p-values below 0.05. This immunogen induced tier 2 responses in 4 of 5 rabbits, and yielded the most promising outcome among the refined SOSIP immunogens studied, though the clade-A BG505 trimers may have been at a disadvantage because there were fewer clade-A Envs and A-related recombinants in the set utilized than clade-C Envs and C-related recombinants (Table 1). NP analysis agreed with the other neutralization metrics considered, and was able to resolve apparent ties between animals with similar neutralization responses to different Envs.
Antibody combinations
The same methods can be used with data from titration experiments using monoclonal antibodies. Although the scale of measurement is reversed (low IC50 values indicate high potency), a cutoff value can again be used to obtain a yes-or-no neutralization response. Using data from an earlier study (Kong et al., 2015), we explored the behavior of NP values from broadly neutralizing antibodies (bnAbs), both alone (monoclonal) and in combinations of up to four bnAbs. The NP and also the slope of the logistic function increase as bnAbs are combined in most cases, but not all (Figure 6). These results suggest that, all else equal, sera with a higher number of antibody specificities could have higher NP and slope values.
Discussion
We have described a simple method to quantify and compare serum neutralization probabilities. The method uses logistic regression to model the probability that a serum neutralizes a virus with an ID50 titer above some cutoff. The neutralization potency (NP) identifies where the probabilities of neutralizing and not neutralizing a virus are equal. It provides a continuous measure for sera, which builds upon established tier categories now used to rate virus sensitivity. The NP statistic defines the greatest virus tier that a serum can be expected to neutralize. Thus, an NP of (roughly) 1.8 to 2.8 is ‘tier 2-like’, and an NP above 2.8 is ‘tier 3-like’. Sera with NP values below 1 are unable to neutralize even tier 1A Envs. The NP values are not absolute and depend on the ID50 cutoff used.
Defining a neutralization potency by testing a serum against a panel of 225 Envs on a routine basis is impractical and costly. We identified subsets of these Envs that largely reproduce the results from testing all 225. This makes assignment of NP values to a set of sera far more tractable than testing against all Envs. The already established 12-virus global panel may suffice to characterize NP values, although larger panels tend to give more significant outcomes.
Evaluating neutralization assay outcomes against the continuum of neutralization sensitivity among viral variants provides more context to interpret results, because it considers not only the proportion of tier 2 Envs neutralized (breadth), but which Envs should most likely be neutralized. This helps to interpret differences between sera that neutralize the same number of Envs, each of which have different sensitivities. It also helps to compare sera where titers may be averaged over many outcomes below the limit of assay quantification, as was the case for sera from vaccinated rabbits (Table 2) (Torrents de la Peña et al., 2017).
We also explored results from experiments that utilized different Env panels and found they can be compared on the same neutralization scale (not shown). The ability to do this requires only that some number of Envs in each panel have available tiered neutralization scores, computed from available data. Better comparative results are obtained when using more Envs, because resulting NP values are less likely to be undefined.
A web-based utility – http://hiv.lanl.gov/content/sequence/NI/ni.html – at the Los Alamos HIV database computes NPs for sera tested against subsets of M group Envs. In addition to the analysis described here, it can also compute and report NPs for clade C Envs (Rademeyer et al., 2016) and for antibody IC50s.
Broadly neutralizing monoclonal antibodies are similarly characterized when isolated, and TZM-bl assay inhibitory concentration IC50 and IC80 scores are generally determined across large pseudovirus panels, for example, to characterize a newly isolated antibody (Wu et al., 2010). We applied the same analytic methods to IC50s from antibodies. Our analysis of data from experiments that combined broadly neutralizing antibodies (bnAbs) having distinct specificities suggests that the NP increases with the number or variety of distinct antibody specificities in the sample.
The NP, as defined here, is a single metric to compare serum potencies. However, logistic regression provides another parameter, the slope. The slope indicates agreement between serum neutralization outcomes and average potency (geometric mean ID50) among serum samples used to compute the NI per Env. We hypothesize that the slope should be low for sera with limited epitope breadth. In the extreme case of a serum that targets a single epitope (e.g. a monoclonal response dominates), only Envs with that epitope would be neutralized, and neutralization outcomes should be widely scattered among viruses tested, independent of an overall sensitivity of the virus to neutralization, resulting in a slope near zero. Thus, the slope may indicate diversity of epitopes targeted by the test serum. Consistent with this, we found single monoclonal bnAbs had lower slopes than mixtures of monoclonal bnAbs. Such a finding might help characterize the mixtures of antibody specificities in polyclonal sera, to complement the methods for computational neutralization fingerprinting that have recently been advanced (Georgiev et al., 2013; Doria-Rose et al., 2017). To evaluate this idea, subsequent work could identify serum samples with similar NP values but different slopes and map the epitopes therein.
In addition to establishing a metric for serum neutralization, a primary advantage of this approach is that it suggests a strategy to simplify neutralization assay experiments, for more cost-effective screening of responses in large-scale vaccination studies. Logistic regression utilizes a collection of binary (‘true’ or ‘false’) responses, rather than the actual neutralization titers. Because its derivation relies only on a single-dilution analysis, rather than the current eight-point dilution series, it can enable a larger-scale screening throughput. Sera that score well using this screening metric could be prioritized for a more thorough dilution-series analysis. Thus, our approach provides a relatively simple, quantitative metric by which serum neutralization of HIV can be compared. Together with the slope (which may indicate breadth of epitopes targeted), this approach could be useful to down-select vaccine candidates and move forward with regimens that are able to elicit, on average, greater NP values.
In summary, we propose a way to simplify the comparison of neutralization potency of antisera. The resulting metric is continuous, scaled to provide an easily interpreted value, and may provide a formal method for ranking antisera. It can be computed from single-dilution assay data, which suits it to a high-throughput platform.
Materials and methods
Env tier-scaled neutralization index (NI)
To obtain tier-scaled neutralization scores for sera, we first transformed for each virus the geometric mean ID50s against a panel of chronic sera to a range of values that correspond to tiers, based on previously published results from testing multiple sera against many different Envs (Hraber et al., 2014; Rademeyer et al., 2016; Seaman et al., 2010). In an earlier study that inferred tiers using Envs and unpooled sera (Rademeyer et al., 2016), the greatest geometric mean ID50 against tier 3 Envs was 26.7 and the greatest geometric mean ID50 against tier 2 Envs was 117.6. We used these two values to set the boundaries between tiers 2 and 3 and between tiers 1 and 2, respectively. A linear transformation then scaled the logarithm of the virus geometric mean ID50 to virus neutralization tier. (The log-mean is appropriate because the distribution of means is skewed, and log-transformation provides a more symmetric, normalized outcome.) We call this single transformed value for each virus the neutralization index (NI), and computed NI from Env geometric mean ID50s as 2 – [log(geometric mean ID50) – log(117.6)] / [log(117.6) – log(26.7)].
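As a quick check of the transformation, the formula above can be written as a small Python function. The function and constant names are illustrative; only the two boundary titers (117.6 and 26.7) come from the text.

```python
import math

TIER_1_2_BOUNDARY_ID50 = 117.6  # greatest geometric mean ID50 against tier 2 Envs
TIER_2_3_BOUNDARY_ID50 = 26.7   # greatest geometric mean ID50 against tier 3 Envs

def neutralization_index(geometric_mean_id50):
    """Map a virus geometric mean ID50 onto the tier-like NI scale."""
    numerator = math.log(geometric_mean_id50) - math.log(TIER_1_2_BOUNDARY_ID50)
    denominator = math.log(TIER_1_2_BOUNDARY_ID50) - math.log(TIER_2_3_BOUNDARY_ID50)
    return 2.0 - numerator / denominator

for titer in (500.0, 117.6, 26.7):
    print(titer, round(neutralization_index(titer), 2))
```

By construction, a virus with a geometric mean ID50 of 117.6 maps to NI = 2 and one with 26.7 maps to NI = 3, while more sensitive viruses (higher titers) fall below 2. Because the formula is a ratio of log differences, the base of the logarithm does not affect the result.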
The NI is a continuously valued quantity, which can be interpreted intuitively on the tier scale; for example, tier 1 viruses have higher mean neutralization ID50s and lower NI than tier 3 viruses. The resulting scale is not intended as an absolute or rigid standard, but rather as a guideline for interpretation. Such a transformation should best be held constant to compute and compare neutralization indices. The cutoffs would likely be different for different background neutralization data, such as a large Env panel against pooled sera (Seaman et al., 2010). Any linear transformation of the log-transformed geometric mean ID50s, or leaving them as they stand, would yield results identical to our findings, but on a modified numerical scale. The NI transformation makes it possible to interpret neutralization scores in terms of the familiar tier system.
Serum neutralization potency (NP)
To define binomial (yes/no) outcomes for any serum, we set an ID50 threshold value, and consider a virus as neutralized or resistant to that serum, depending on whether or not ID50 was above the threshold, respectively. Here, we use a cutoff dilution of 1:50, though this could be changed as conditions merit. While 1:50 is a good working choice for sera from natural infection cases, in a vaccine setting, a more generous choice (say 1:20) might be desirable. Changing the cutoff could introduce inconsistent interpretations among results from different cutoffs. The M group neutralization data (Hraber et al., 2014) have a median ID50 of 28, and 37% of observations are above the 1:50 cutoff value.
For each serum, we use logistic regression to model the probability of neutralization as a function of the tier-transformed virus neutralization score (NI), using the glm function in R (version 3.4.0). In cases where parameter estimation did not converge on a solution, we used the function glm2 (version 1.1.2) and more iterations, to ensure optimization converges on the solution. The glm2 function was designed to overcome the convergence limitations of its predecessor (Marschner, 2011).
The logistic function is defined as p(x)=1/[1 + exp(b_{0} + b_{1} x)], with x the independent variable, and two parameters: intercept (b_{0}) and slope (b_{1}). We define neutralization potency (NP) as –b_{0}/b_{1}. This corresponds to the NI that has equal odds of being neutralized or not (Crawley, 2002). Thus, the NP assigns each serum a tier-transformed neutralization score, which best separates the neutralized and non-neutralized viruses. In practice, we found it useful to include two hypothetical viruses with extreme phenotypes, one always sensitive to neutralization by any serum, with a tiered score of 0, and one always resistant to neutralization, with a tiered score of 5. These extremes help to define the NP and ensure the regression calculations perform as expected.
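To make the NP calculation concrete, here is a minimal, dependency-free Python sketch; it is not the authors' R code. It fits the two-parameter logistic model by Newton's method in place of R's glm, and it uses the common sign convention p(x) = 1/[1 + exp(–(b0 + b1 x))], so the fitted slope is negative when neutralization becomes less likely at higher NI; the ratio –b0/b1 is the same under either sign convention. All function names and the example NI values and titers are hypothetical.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(x, y, max_iter=100, tol=1e-10):
    """Fit P(y=1) = sigmoid(b0 + b1*x) by Newton's method; return (b0, b1)."""
    b0 = b1 = 0.0
    for _ in range(max_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = sigmoid(b0 + b1 * xi)
            w = p * (1.0 - p)
            g0 += yi - p             # score for the intercept
            g1 += (yi - p) * xi      # score for the slope
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        if det < 1e-12:              # information matrix near singular; stop
            break
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1

def neutralization_potency(ni_values, id50s, cutoff=50.0):
    """Return (NP, slope) for one serum from per-virus NI values and ID50 titers."""
    # Append the two anchor pseudo-viruses: NI 0 (always neutralized), NI 5 (never).
    x = list(ni_values) + [0.0, 5.0]
    y = [1 if t >= cutoff else 0 for t in id50s] + [1, 0]
    b0, b1 = fit_logistic(x, y)
    if b1 == 0.0:
        return float("nan"), b1      # NP is undefined without a slope
    return -b0 / b1, b1

# Hypothetical panel: ten viruses with their NI values and one serum's ID50 titers.
ni = [1.2, 1.6, 1.9, 2.1, 2.3, 2.4, 2.6, 2.8, 3.0, 3.3]
id50 = [900, 400, 210, 120, 30, 75, 60, 25, 20, 20]
np_score, slope = neutralization_potency(ni, id50)
print(round(np_score, 2), slope < 0)
```

Only the yes/no outcome at the cutoff enters the fit, so the same function would accept results from a single-dilution assay recoded as titers above or below the cutoff.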
An important caveat remains to be addressed. Because it would require division by zero, NP is undefined if the slope is zero. This occurs when the probability of neutralization is independent of the tier-transformed NI values for viruses (i.e. is a constant equal to the breadth of the serum), meaning there is no consistent NI value that can separate neutralized from non-neutralized viruses. This might result in cases of low statistical power, a non-representative selection of viruses (they are assumed random and independent), or a serum with a response that is otherwise atypical of sera used to compute mean virus neutralization titers. In such cases, one might report the slope and intercept separately, and refrain from interpreting NP. Because the slope is a statistical inference, a formalism exists to evaluate the null hypothesis of no slope. This is achieved by a likelihood-ratio test, which computes a p-value from the χ^{2} distribution for the reduction in deviance that results from adding a slope to the regression model (Crawley, 2002). A small p-value indicates a significant, nonzero slope, and that the NP is well defined. NP values with high p-values should be interpreted with caution.
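The likelihood-ratio test described above can be sketched in plain Python as follows. This is an illustrative reimplementation, not the R workflow used in the study: the logistic fit uses Newton's method, the function names are hypothetical, and the χ^{2} upper tail for one degree of freedom is computed with the identity P(χ² > s) = erfc(√(s/2)).

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(x, y, max_iter=100, tol=1e-10):
    """Fit P(y=1) = sigmoid(b0 + b1*x) by Newton's method; return (b0, b1)."""
    b0 = b1 = 0.0
    for _ in range(max_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = sigmoid(b0 + b1 * xi)
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        if det < 1e-12:
            break
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1

def deviance(x, y, b0, b1):
    """Binomial deviance (-2 * log-likelihood) for binary outcomes."""
    total = 0.0
    for xi, yi in zip(x, y):
        p = min(max(sigmoid(b0 + b1 * xi), 1e-12), 1.0 - 1e-12)
        total += math.log(p) if yi else math.log(1.0 - p)
    return -2.0 * total

def slope_p_value(x, y):
    """Likelihood-ratio p-value for adding a slope to an intercept-only model."""
    b0, b1 = fit_logistic(x, y)
    mean_y = min(max(sum(y) / len(y), 1e-12), 1.0 - 1e-12)
    null_b0 = math.log(mean_y / (1.0 - mean_y))   # intercept-only fit
    stat = max(deviance(x, y, null_b0, 0.0) - deviance(x, y, b0, b1), 0.0)
    return math.erfc(math.sqrt(stat / 2.0))       # chi-square upper tail, 1 df

# Hypothetical outcomes that track NI, including the two anchor pseudo-viruses.
x = [0.0, 1.2, 1.6, 1.9, 2.1, 2.3, 2.4, 2.6, 2.8, 3.0, 3.3, 5.0]
y = [1, 1, 1, 1, 1, 0, 1, 1, 0, 0, 0, 0]
print(slope_p_value(x, y) < 0.05)
```

When outcomes show no trend with NI, the deviance barely changes, the p-value approaches 1, and the NP for that serum should not be interpreted.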
Panel selection and validation
It would be impractical to require ID50s from 200 distinct Envs to compute serum NP values. We sought to develop and validate smaller, representative Env panels useful to estimate serum NP. Recently a global panel of 12 viruses was developed for its ability to model the median of the distribution of the magnitude-breadth curve (deCamp et al., 2014). That panel was selected using lasso to identify nine Envs for their ability to model the median area under the dilution-series curve (AUC) for many sera. To these Envs, three were added manually, to include some neutralization response profiles not included in the nine. Here, we used this Env set as a candidate panel to infer the full-panel NPs. Because of missing values among the NSDP panel data, we omitted one virus (clade A, 398F1_F6_20), which was missing values from 39 of 205 sera. We did this because the missing values would have caused different effective sample sizes across sera and may have reduced the apparent robustness of resulting NP values. This procedure yielded a panel of 11 Envs.
Because our approach to compute NP values uses mean ID50 as an input variable, we again used lasso (Tibshirani, 1996) to select alternative small panels. Here, instead of modeling median AUC, we sought predictors to model the logarithm of the geometric mean ID50 per serum, using the glmnet R package (Friedman et al., 2010), version 2.0–10, to obtain panels of 10 and 20 Envs. We used bootstrap resampling (with replacement) to assess NP robustness, and summarized the results as median and interquartile range per serum. To evaluate panel robustness, we compared the NP values from the panel with those from the remaining held-out Envs, testing for correlations between them. We also evaluated correlations among NP values from alternative panels, and the distribution of p-values from logistic regression.
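The lasso selection step can be illustrated with a small coordinate-descent sketch. It is a stand-in for the glmnet call used in the study, not a reproduction of it: the data are synthetic, the penalty values are arbitrary, and the helper names (`soft_threshold`, `lasso_cd`, `lambda_max`, `select_panel`) are hypothetical. The response is the mean of log10 ID50 per serum (the log of the geometric mean), and Envs with non-zero coefficients form the candidate panel.

```python
import random


def soft_threshold(z, lam):
    """Shrink z toward zero by lam; values within [-lam, lam] become 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def _center(X, y):
    """Center the response and each column so that no intercept is needed."""
    n, p = len(X), len(X[0])
    y_mean = sum(y) / n
    col_mean = [sum(row[j] for row in X) / n for j in range(p)]
    Xc = [[row[j] - col_mean[j] for j in range(p)] for row in X]
    yc = [yi - y_mean for yi in y]
    return Xc, yc


def lasso_cd(X, y, lam, sweeps=200):
    """Minimize (1/2n)||y - Xb||^2 + lam * ||b||_1 by coordinate descent."""
    n, p = len(X), len(X[0])
    Xc, resid = _center(X, y)
    col_ss = [sum(Xc[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    beta = [0.0] * p
    for _ in range(sweeps):
        for j in range(p):
            if col_ss[j] == 0.0:
                continue
            # Covariance of column j with the residual that excludes column j.
            rho = sum(Xc[i][j] * resid[i] for i in range(n)) / n
            rho += col_ss[j] * beta[j]
            new = soft_threshold(rho, lam) / col_ss[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= delta * Xc[i][j]
                beta[j] = new
    return beta


def lambda_max(X, y):
    """Smallest penalty at which every coefficient is exactly zero."""
    n, p = len(X), len(X[0])
    Xc, yc = _center(X, y)
    return max(abs(sum(Xc[i][j] * yc[i] for i in range(n)) / n)
               for j in range(p))


def select_panel(X, y, lam):
    """Indices of Envs with non-zero lasso coefficients."""
    return [j for j, b in enumerate(lasso_cd(X, y, lam)) if b != 0.0]


# Synthetic log10 ID50 matrix: rows are sera, columns are Envs.
rng = random.Random(0)
n_sera, n_envs = 40, 8
potency = [rng.gauss(2.0, 0.5) for _ in range(n_sera)]
offset = [rng.gauss(0.0, 0.3) for _ in range(n_envs)]
log_id50 = [[potency[i] + offset[j] + rng.gauss(0.0, 0.2)
             for j in range(n_envs)] for i in range(n_sera)]
# Log of the geometric mean ID50 per serum is the mean of the log titers.
log_gmt = [sum(row) / n_envs for row in log_id50]

lam_top = lambda_max(log_id50, log_gmt)
panel = select_panel(log_id50, log_gmt, 0.5 * lam_top)
print("selected Env columns:", panel)
```

Lowering the penalty admits more Envs, so a panel of a target size (10 or 20 in the study) would be found by scanning the penalty until the desired number of non-zero coefficients is reached.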
References

Proportion data: binomial errors. In: Statistical Computing: An Introduction to Data Analysis Using S-Plus. Hoboken: Wiley & Sons. pp. 513–536.

Regularization paths for generalized linear models via coordinate descent. Journal of Statistical Software 33:1–22. https://doi.org/10.18637/jss.v033.i01

Magnitude and breadth of a nonprotective neutralizing antibody response in an efficacy trial of a candidate HIV-1 gp120 vaccine. The Journal of Infectious Diseases 202:595–605. https://doi.org/10.1086/654816

Impact of clade, geography, and age of the epidemic on HIV-1 neutralization by antibodies. Journal of Virology 88:12623–12643. https://doi.org/10.1128/JVI.0170514

glm2: fitting generalized linear models with convergence problems. The R Journal 3:12–15.

Magnitude and breadth of the neutralizing antibody response in the RV144 and Vax003 HIV-1 vaccine efficacy trials. Journal of Infectious Diseases 206:431–441. https://doi.org/10.1093/infdis/jis367

A native-like SOSIP.664 trimer based on an HIV-1 subtype B env gene. Journal of Virology 89:3380–3395. https://doi.org/10.1128/JVI.0347314

Reducing V3 antigenicity and immunogenicity on soluble, native-like HIV-1 Env SOSIP trimers. Journal of Virology 91:e0067717. https://doi.org/10.1128/JVI.0067717

Optimization and validation of the TZM-bl assay for standardized assessments of neutralizing antibodies against HIV-1. Journal of Immunological Methods 409:131–146. https://doi.org/10.1016/j.jim.2013.11.022

Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society. Series B, Statistical Methodology 1:267–288.

Statistical approaches to analyzing HIV-1 neutralizing antibody assay data. Statistics in Biopharmaceutical Research 4:1–13. https://doi.org/10.1080/19466315.2011.633860
Decision letter

Quarraisha Abdool Karim, Reviewing Editor; University of KwaZulu-Natal, South Africa
In the interests of transparency, eLife includes the editorial decision letter and accompanying author responses. A lightly edited version of the letter sent to the authors after peer review is shown, indicating the most substantive concerns; minor comments are not usually included.
Congratulations, we are pleased to inform you that your article, "A Single, Continuous Metric to Define Tiered Serum Neutralization Potency against HIV", has been accepted for publication in eLife.
This well-written manuscript from Hraber and colleagues describes a novel method for applying a single score to quantify polyclonal antibody responses directed to HIV Envelope that result in blocking of virus in vivo, called virus neutralization. This approach to classifying HIV-positive serum samples is analogous to what has been done for viral isolates. Neutralization results from the combined activities of multiple single monoclonals in a given individual, each of which possesses potentially different affinities and activities. It has been a challenge for the vaccine field to evaluate the activity of antibodies in a consistent manner. The proposed approach will be extremely useful in providing an objective, quantitative, high-throughput approach for evaluating the potential of vaccine regimens geared towards neutralizing antibodies and/or classifying patient plasma or antibody potency and breadth. Advantages of this approach include: factoring in the sensitivity of the variants; cutting down on the time and cost needed to perform these evaluations; and, importantly, enhancing standardized comparisons between studies.
The statistical approach consists of cascading the neutralization sensitivity level of HIV-1 variants to generate a neutralization index score (NI) to report a serum neutralization potency (NP) score from 1 to 4, with 1 corresponding to the most sensitive variants and scores > 2 indicating the ability to neutralize the more relevant patient-derived variants. In summary, a vaccine approach that achieves a higher potency score would be considered more promising and worthy of further investigation, compared to those that elicited lower scores.
The authors provide a link to a tool that they will make available for calculating NP on the Los Alamos HIV Database website. This tool includes precalculated NI values for several hundred envelope variants that are widely available and would be commonly used by labs that do this type of work. The acceptability of IC50 or binary data increases the utility of the tool.
The value of this approach cannot be underscored sufficiently, as it has been extraordinarily challenging to compare serum responses from infected humans, monkeys and other animals immunized with various vaccines or infected with SHIV (monkeys) or HIV (humans). At present, the evaluation of neutralizing antibodies requires that a set of virus isolates be tested individually, as the authors note, in a dilution series, in a cumbersome although painstakingly validated assay. Despite the use of common 'panels' of viruses, the results are difficult to interpret, hence the need for a simpler method to advance the field, which these authors provide. The proposed approach also works on smaller virus panels and may provide a higher-throughput and more cost-effective method for screening sera. Importantly, as part of the development of the methodology, they have studied the longitudinal responses in HIV-infected subjects (n=2 shown in the manuscript) to demonstrate how this value might be applied to understanding the changing landscape of humoral immunity to escape variants in vivo.
Given its significance to the field it is important to share these findings and methodology with the research community for further validation in different groups and settings.
The paper could be strengthened by: i) The application of a retrospective analysis to understand the immunity generated in humans and monkeys following vaccination, and to further correlate these findings with protection outcomes.
ii) Inclusion of data from a panel of patient serum/plasma samples that had been previously ranked for breadth and potency by other methods to see how this approach compares.
iii) Clarifying how this approach is better than simply using the geometric mean titer (GMT). Figure 5 shows very clearly that the NP and GMT track very closely so more discussion of the added value of using the NP is needed.
iv) Introduction, first paragraph: Suggest replacing "complicated" with "requires standardization".
v) Last line of the introductory paragraph: Since Env panels vary a lot between labs, the inclusion of something to the effect that "serum breadth and potency therefore depend strongly on the Env panels used that vary markedly between studies" would make that sentence more complete.
vi) Second paragraph of the Introduction: Minimise the repetition in discussing that tier 2 viruses are more difficult to neutralize.
vii) Subsection “Case studies”, last paragraph: Make it clear that this data is using monoclonal antibodies.
viii) Clarify what a concentration series is?
ix) Figure 1: It would be helpful to show the NP scores in blocks a and b as is done for block d. Is the scale between NP 1, 2 and 3 not linear? The spacing between the numbers on the x-axis suggests they are not.
x) Figure 6: The labelling is confusing. The same symbol is used to identify individual antibodies as well as sets of antibodies.
https://doi.org/10.7554/eLife.31805.016
Author response
The paper could be strengthened by: i) The application of a retrospective analysis to understand the immunity generated in humans and monkeys following vaccination, and to further correlate these findings with protection outcomes.
We looked for suitable data sets among recent publications; there is little available to date in terms of Tier 2 neutralization responses with breadth in response to vaccination in humans and monkeys. Thus, we have added an analysis of vaccinated rabbit sera from Rogier Sanders’ lab. Those neutralization data included 15 Envs that we could use to compute NP values. As detailed in the revised Results: “This [ZM197M SOSIP.v5.2] immunogen induced Tier 2 responses in 4 of 5 test subjects, and yielded the most promising outcome among the SOSIP immunogens studied […] NP analysis agreed with the other neutralization metrics considered, and was able to resolve apparent ties between animals with similar neutralization responses to different Envs.” We look forward to being able to further correlate these findings with protection outcomes in future work.
ii) Inclusion of data from a panel of patient serum/plasma samples that had been previously ranked for breadth and potency by other methods to see how this approach compares.
We agree and used for this purpose a panel of serum neutralization data from a comparative study of breadth in progressors and long-term non-progressors (LTNP). As discussed in the revised text, that study compared 25 LTNP sera with 78 progressors, and noted conspicuously lower breadth and geometric mean ID50 neutralization titers in LTNPs than in progressors. Using 10 of the 20 Envs published in that study, we computed NP values and found them highly correlated with potency and breadth. Comparing NP values between groups similarly indicated highly significant differences. The strong correlation between NP values and geometric mean ID50s indicates that NP values provide a useful measure, consistent with the gold-standard neutralization assay currently in use. Moreover, repeating the experiments used for this comparison with the NP method would require merely one-tenth the original number of neutralization reactions and sample material, given a five-point dilution series and testing 10 Envs, not 20 (that is, 10 Envs at a single dilution, or 10 reactions per serum, rather than 20 Envs × 5 dilutions = 100 reactions per serum).
iii) Clarifying how this approach is better than simply using the geometric mean titer (GMT). Figure 5 shows very clearly that the NP and GMT track very closely so more discussion of the added value of using the NP is needed.
Beyond being able to use greatly simplified and so more cost-effective experimental procedures to screen large-scale vaccine studies, as discussed above, we added this paragraph to the Discussion:
“Evaluating neutralization assay outcomes against the continuum of neutralization sensitivity among viral variants provides more context to interpret results, because it considers not only the proportion of Tier 2 Envs neutralized (breadth), but which Envs should most likely be neutralized. […] It helps also to compare sera where titers may be averaged over many outcomes below the limit of assay quantification, as was the case for sera from vaccinated rabbits.”
iv) Introduction, first paragraph: Suggest replacing "complicated" with "requires standardization".
Done.
v) Last line of the introductory paragraph: Since Env panels vary a lot between labs, the inclusion of something to the effect that "serum breadth and potency therefore depend strongly on the Env panels used that vary markedly between studies" would make that sentence more complete.
Done.
vi) Second paragraph of the Introduction: Minimise the repetition in discussing that tier 2 viruses are more difficult to neutralize.
Done. (Deleted “which are more difficult to neutralize”)
vii) Subsection “Case studies”, last paragraph: Make it clear that this data is using monoclonal antibodies.
We added “monoclonal” to the final Results paragraph, the relevant Discussion paragraph, and the legend for Figure 6.
viii) Clarify what a concentration series is?
Replaced “antibody concentration series” with “antibody titration experiments”.
ix) Figure 1: It would be helpful to show the NP scores in blocks a and b as is done for block d. Is the scale between NP 1, 2 and 3 not linear? The spacing between the numbers on the x-axis suggests they are not.
We agree and have reformatted this notional figure to present the concept more clearly. The x-axis scale is based on rank, but purely speculative, so we have taken your advice.
x) Figure 6: The labelling is confusing. The same symbol is used to identify individual antibodies as well as sets of antibodies.
We have added symbols to this figure and explanatory text in the figure legend to identify which antibodies occur where.
https://doi.org/10.7554/eLife.31805.017
Article and author information
Author details
Funding
Bill and Melinda Gates Foundation (OPP1146996)
 Peter Hraber
 Bette Korber
 Kshitij Wagh
 David Montefiori
National Institute of Allergy and Infectious Diseases
 Mario Roederer
The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.
Acknowledgements
We thank Gabriella Scarlatti, affiliates of the Global HIV Vaccine Enterprise, and participants of the workshop on appropriate use of tiered virus panels when assessing HIV-1 vaccine-elicited neutralizing antibodies, for the discussions that inspired this study. Nicole Doria-Rose and Mark Connors kindly shared their serum neutralization data. We thank authors of the other cited reports for making their neutralization data available at the time of publication. This work was supported by the Bill and Melinda Gates Foundation [OPP1146996]. MR was supported by the Intramural Research Program of the Vaccine Research Center (VRC), National Institute of Allergy and Infectious Diseases (NIAID), National Institutes of Health (NIH).
Reviewing Editor
 Quarraisha Abdool Karim, University of KwaZulu-Natal, South Africa
Publication history
 Received: September 6, 2017
 Accepted: January 16, 2018
 Accepted Manuscript published: January 19, 2018 (version 1)
 Version of Record published: January 29, 2018 (version 2)
Copyright
This is an open-access article, free of all copyright, and may be freely reproduced, distributed, transmitted, modified, built upon, or otherwise used by anyone for any lawful purpose. The work is made available under the Creative Commons CC0 public domain dedication.