Consistent and correctable bias in metagenomic sequencing experiments

  1. Michael R McLaren
  2. Amy D Willis
  3. Benjamin J Callahan (corresponding author)
  1. North Carolina State University, United States
  2. University of Washington, United States
7 figures, 3 tables and 1 additional file

Figures

Figure 1
Bias arises throughout an MGS workflow, creating systematic error between the observed and actual compositions.

Panel A illustrates a hypothetical marker-gene measurement of an even mixture of three taxa. The observed composition differs from the actual composition due to the bias at each step in the …

https://doi.org/10.7554/eLife.46923.002
Figure 2
Consistent multiplicative bias causes systematic error in taxon ratios, but not taxon proportions, that is independent of sample composition.

The even community from Figure 1 and a second community containing the same three taxa in different proportions are measured by a common MGS protocol. Measurements of both samples are subject to the …

https://doi.org/10.7554/eLife.46923.003
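
The claim in the Figure 2 title can be checked numerically. The following is a minimal sketch, assuming that a measurement multiplies each taxon's actual proportion by a fixed, taxon-specific efficiency and then renormalizes. The efficiency values and the second community's proportions are hypothetical and chosen only for illustration; they are not taken from the figure.

```python
def measure(actual, efficiency):
    """Multiply each taxon's proportion by its efficiency, then renormalize."""
    biased = [a * e for a, e in zip(actual, efficiency)]
    total = sum(biased)
    return [b / total for b in biased]


def fold_error_in_proportions(observed, actual):
    """Per-taxon fold error: observed proportion / actual proportion."""
    return [o / a for o, a in zip(observed, actual)]


def fold_error_in_ratio(observed, actual, i, j):
    """Fold error in the ratio of taxon i to taxon j."""
    return (observed[i] / observed[j]) / (actual[i] / actual[j])


# Hypothetical relative efficiencies for three taxa (illustrative only).
efficiency = [1.0, 18.0, 6.0]

even = [1 / 3, 1 / 3, 1 / 3]        # even mixture of three taxa
uneven = [1 / 15, 1 / 15, 13 / 15]  # same taxa, different proportions

obs_even = measure(even, efficiency)
obs_uneven = measure(uneven, efficiency)

prop_err_even = fold_error_in_proportions(obs_even, even)
prop_err_uneven = fold_error_in_proportions(obs_uneven, uneven)
ratio_err_even = fold_error_in_ratio(obs_even, even, 1, 0)
ratio_err_uneven = fold_error_in_ratio(obs_uneven, uneven, 1, 0)

# Errors in proportions depend on which sample was measured.
print(prop_err_even, prop_err_uneven)
# Errors in the taxon ratio both equal efficiency[1] / efficiency[0].
print(ratio_err_even, ratio_err_uneven)
```

Under this assumed model, the renormalization constant cancels in any ratio of two taxa, which is why the ratio error does not depend on the rest of the sample.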
Figure 3 with 3 supplements
Our model of bias explains the systematic error observed in the Brooks et al. (2015) cell-mixture experiment.

The top row compares the observed proportions of individual taxa to the actual proportions (Panel A) and to those predicted by our fitted bias model (Panel B). Panel A shows significant error across …

https://doi.org/10.7554/eLife.46923.004
Figure 3—figure supplement 1
The observed error in taxon ratios for all three mixture experiments.

The observed error in taxon ratios (colored dots) against the fitted model prediction (black cross) for the three mixture experiments of Brooks et al. (2015).

https://doi.org/10.7554/eLife.46923.005
Figure 3—figure supplement 2
Observed vs. expected proportions under no bias, copy-number bias only, and the estimated bias.

Comparison of the observed proportions with three types of expected proportions—the actual proportions, the proportions predicted from the estimated 16S copy numbers in Table 3, and the proportions …

https://doi.org/10.7554/eLife.46923.006
Figure 3—figure supplement 3
Comparison between the simple linear model, the linear interactions model of Brooks et al. (2015), and our model.

Comparison of the model fits for the simple linear model, the linear interactions model of Brooks et al. (2015), and our model. The simple linear model has a poor fit, while the Brooks et al. (2015) …

https://doi.org/10.7554/eLife.46923.007
Figure 4 with 1 supplement
Bias of the mock spike-in in the Costea et al. (2017) experiment is consistent across samples with varying background compositions.

Panel A shows the variation in bacterial composition across protocols and specimens (Labels 1 through 8 denote fecal specimens; M denotes the mock-only specimen) and Panel B shows the relative …

https://doi.org/10.7554/eLife.46923.009
Figure 4—figure supplement 1
Estimated bias for the mock taxa for the three protocols.

Estimated bias for the 10 mock taxa for the three protocols in the Costea et al. (2017) experiment. Bias is shown as relative to the average taxon; that is, the efficiency of each taxon is divided …

https://doi.org/10.7554/eLife.46923.010
Figure 5 with 1 supplement
Calibration can remove bias and make MGS measurements from different protocols quantitatively comparable.

For the sub-community defined by the mock spike-in of the Costea et al. (2017) dataset, we estimated bias from three specimens (the estimation set ‘Est’) and used the estimate to calibrate all …

https://doi.org/10.7554/eLife.46923.012
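
The idea of calibration described in the Figure 5 caption can be illustrated as follows. This is a minimal sketch, not the estimation procedure used for the figure: it assumes a multiplicative bias model, a single noise-free control sample of known composition, and hypothetical efficiencies and compositions. The function names (`measure`, `estimate_bias`, `calibrate`) are introduced here for illustration only.

```python
import math


def normalize(values):
    """Rescale non-negative values so that they sum to 1."""
    total = sum(values)
    return [v / total for v in values]


def measure(actual, efficiency):
    """Assumed measurement model: multiplicative bias, then renormalize."""
    return normalize([a * e for a, e in zip(actual, efficiency)])


def estimate_bias(observed, actual):
    """Estimate relative efficiencies from one control of known composition.

    Bias is only identifiable up to a constant factor, so the estimate
    is scaled to have a geometric mean of 1.
    """
    ratios = [o / a for o, a in zip(observed, actual)]
    geo_mean = math.exp(sum(math.log(r) for r in ratios) / len(ratios))
    return [r / geo_mean for r in ratios]


def calibrate(observed, bias):
    """Divide out the estimated bias and renormalize."""
    return normalize([o / b for o, b in zip(observed, bias)])


# Hypothetical values; in practice the true efficiencies are unknown.
true_efficiency = [2.0, 0.5, 1.0]

control_actual = [1 / 3, 1 / 3, 1 / 3]
control_observed = measure(control_actual, true_efficiency)
bias_hat = estimate_bias(control_observed, control_actual)

sample_actual = [0.7, 0.2, 0.1]
sample_observed = measure(sample_actual, true_efficiency)
sample_calibrated = calibrate(sample_observed, bias_hat)

print(sample_observed)    # distorted by bias
print(sample_calibrated)  # matches sample_actual in this noise-free sketch
```

With real data the control measurements are noisy, so the bias would be estimated from several control samples rather than one.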
Figure 5—figure supplement 1
Precision in the bias estimate vs. the number of control samples for Protocol H.

Precision in the bias estimate improves with the number of control samples and depends on the noise associated with each taxon. For Protocol H in the Costea et al. (2017) experiment, Panel A shows the …

https://doi.org/10.7554/eLife.46923.013
Figure 6 with 1 supplement
In the Brooks et al. (2015) experiment, bias is primarily driven by DNA extraction and is not substantially reduced by 16S copy-number (CN) correction.

Panel A shows the bias estimate for each step in the experimental workflow (DNA extraction, PCR amplification, and sequencing + (bio)informatics), as well as the bias imposed by performing 16S CN …

https://doi.org/10.7554/eLife.46923.014
Figure 6—figure supplement 1
PCR bias and total bias vs. bias predicted by 16S copy number.

In the Brooks et al. (2015) experiment, variation in 16S copy number is moderately predictive of PCR bias (A) but not of total bias (B). The grey line corresponds to y=x, or perfect agreement between …

https://doi.org/10.7554/eLife.46923.015

Tables

Table 1
Estimated bias for the three Brooks et al. (2015) mixture experiments.

The first three columns show the bias estimated in each mixture experiment; the second three columns show the bias estimated for individual protocol steps from the mixture estimates. In each case, …

https://doi.org/10.7554/eLife.46923.008
                                        Mixtures                             Steps
Taxon                            Cells         DNA   PCR prod.  Extraction         PCR   Seq.+Inf.
Lactobacillus iners                4.7         2.3         1.2         2.0         1.9         1.2
Sneathia amnii                     4.6         2.4         1.3         1.9         1.8         1.3
Lactobacillus crispatus            2.3         0.5         0.9         4.3         0.6         0.9
Prevotella bivia                   1.8         0.4         0.9         4.6         0.4         0.9
Atopobium vaginae                  0.3         1.1         1.0         0.3         1.0         1.0
Streptococcus agalactiae           0.2         2.0         0.9         0.1         2.2         0.9
Gardnerella vaginalis              0.2         0.4         0.8         0.4         0.5         0.8
Max pairwise bias                 29.3         6.1         1.6        36.6         5.2         1.6
Avg. pairwise bias                 5.6         2.7         1.2         5.5         2.3         1.2
Avg. pairwise noise                1.2         1.2         1.3
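
The step columns of Table 1 appear consistent with simple ratios of the mixture columns: Extraction ≈ Cells / DNA, PCR ≈ DNA / PCR prod., and Seq.+Inf. = PCR prod. This relationship is inferred here from the tabulated values and is an assumption, since the caption above is truncated. A short sketch using two rows copied from the table:

```python
# Mixture-experiment bias estimates for two taxa, copied from Table 1.
mixtures = {
    "Lactobacillus iners": {"cells": 4.7, "dna": 2.3, "pcr_prod": 1.2},
    "Sneathia amnii": {"cells": 4.6, "dna": 2.4, "pcr_prod": 1.3},
}


def step_bias(m):
    """Assumed decomposition: each step's bias is a ratio of mixture biases."""
    return {
        "extraction": m["cells"] / m["dna"],   # cell mixture vs. DNA mixture
        "pcr": m["dna"] / m["pcr_prod"],       # DNA mixture vs. PCR-product mixture
        "seq_inf": m["pcr_prod"],              # PCR-product mixture alone
    }


steps = {taxon: step_bias(m) for taxon, m in mixtures.items()}

for taxon, s in steps.items():
    print(taxon, {name: round(value, 1) for name, value in s.items()})
```

Because the table reports rounded values, rows with small entries (for example 0.2 or 0.4) reproduce less exactly than the two rows used here.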
Table 2
Estimated bias and differential bias among the spike-in taxa for the three protocols (Protocols H, Q, and W) in the Costea et al. (2017) experiment.

The first three columns show the bias of the given protocol for the 10 mock taxa; the second three columns show the differential bias between protocols for the 10 mock taxa and the contaminant. In …

https://doi.org/10.7554/eLife.46923.011
                                      Protocol           Protocol/Reference
Taxon                                H       Q       W     H/Q     H/W     Q/W
Prevotella melaninogenica         2.81    1.55    1.37    1.82    2.05    1.12
Clostridium perfringens           1.18    0.49    1.14    2.41    1.04    0.43
Salmonella enterica               1.77    2.29    0.79    0.77    2.25    2.90
Clostridium difficile             0.11    0.09    0.22    1.24    0.49    0.40
Lactobacillus plantarum           0.02    0.77    0.35    0.02    0.05    2.18
Vibrio cholerae                   2.10    1.16    0.89    1.81    2.37    1.31
Clostridium saccharolyticum       1.21    1.44    0.59    0.84    2.05    2.45
Yersinia pseudotuberculosis       1.46    1.35    0.66    1.08    2.21    2.05
Blautia hansenii                  0.54    0.74    1.80    0.72    0.30    0.41
Fusobacterium nucleatum          46.29    4.98   16.51    9.30    2.80    0.30
Contaminant                                               0.89    2.13    2.38
Max pairwise bias                 2751      56      74     428      59      10
Avg. pairwise bias                 9.7     3.2     3.5     4.7     3.9     2.8
Avg. pairwise noise                1.3     1.5     1.1     1.5     1.3     1.5
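
The differential-bias columns of Table 2 can be read as ratios of the per-protocol bias columns, for example H/Q = H ÷ Q. The sketch below checks this reading against two rows copied from the table, allowing for rounding in the published values. It also checks that the contaminant's three differential values are mutually consistent, since (H/W) ÷ (H/Q) should equal Q/W. The helper name `differential_bias` is introduced here for illustration.

```python
import math

# Protocol bias estimates (columns H, Q, W) copied from Table 2.
bias = {
    "Fusobacterium nucleatum": {"H": 46.29, "Q": 4.98, "W": 16.51},
    "Salmonella enterica": {"H": 1.77, "Q": 2.29, "W": 0.79},
}

# Differential bias of the contaminant, copied from Table 2.
contaminant = {"H/Q": 0.89, "H/W": 2.13, "Q/W": 2.38}


def differential_bias(taxon_bias, protocol, reference):
    """Bias of `protocol` relative to `reference` for one taxon."""
    return taxon_bias[protocol] / taxon_bias[reference]


for taxon, b in bias.items():
    print(
        taxon,
        round(differential_bias(b, "H", "Q"), 2),
        round(differential_bias(b, "H", "W"), 2),
        round(differential_bias(b, "Q", "W"), 2),
    )

# Q/W implied by the other two contaminant values.
implied_q_w = contaminant["H/W"] / contaminant["H/Q"]
print(round(implied_q_w, 2), contaminant["Q/W"])
```

Differential bias needs no knowledge of the actual composition, which is why it can be reported for the contaminant even though its bias under each individual protocol is left blank.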
Table 3
Estimated genome size and 16S copy number for the seven mock taxa in the Brooks et al. (2015) experiment (Materials and methods).
https://doi.org/10.7554/eLife.46923.016
Taxon                       Genome size (Mbp)  Copy number
Atopobium vaginae                        1.44           2*
Gardnerella vaginalis                    1.64           2
Lactobacillus crispatus                  2.04           4
Lactobacillus iners                      1.28           5*
Prevotella bivia                         2.52           4*
Sneathia amnii                           1.33           3*
Streptococcus agalactiae                 2.16           7

* Denotes copy numbers that were instead estimated to be 1 by Brooks et al. (2015).
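
For readers unfamiliar with 16S copy-number (CN) correction, the arithmetic can be sketched as follows. This assumes the common form of the correction, in which each taxon's read proportion is divided by its copy number and the result is renormalized; whether the paper applied exactly this form is not stated on this page. The copy numbers are those in Table 3, and the observed proportions are a hypothetical input constructed so that reads are exactly proportional to copy number.

```python
# 16S copy numbers copied from Table 3.
copy_number = {
    "Atopobium vaginae": 2,
    "Gardnerella vaginalis": 2,
    "Lactobacillus crispatus": 4,
    "Lactobacillus iners": 5,
    "Prevotella bivia": 4,
    "Sneathia amnii": 3,
    "Streptococcus agalactiae": 7,
}


def cn_correct(observed):
    """Divide each proportion by the taxon's copy number, then renormalize."""
    adjusted = {taxon: p / copy_number[taxon] for taxon, p in observed.items()}
    total = sum(adjusted.values())
    return {taxon: value / total for taxon, value in adjusted.items()}


# Hypothetical measurement of an even mixture in which copy number is the
# only source of bias, so reads are proportional to copy number.
total_cn = sum(copy_number.values())
observed = {taxon: cn / total_cn for taxon, cn in copy_number.items()}

corrected = cn_correct(observed)
for taxon, proportion in corrected.items():
    print(f"{taxon}: {proportion:.3f}")
```

In this idealized input the correction recovers the even mixture exactly. The Figure 6 title reports that in the Brooks et al. (2015) data, where extraction dominates the bias, CN correction does not substantially reduce the error.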

Additional files
