Replication Study: Fusobacterium nucleatum infection is prevalent in human colorectal carcinoma

  1. John Repass
  2. Reproducibility Project: Cancer Biology (corresponding author)
  1. ARQ Genetics, United States
Replication Study
Cite this article as: eLife 2018;7:e25801 doi: 10.7554/eLife.25801

Abstract

As part of the Reproducibility Project: Cancer Biology, we published a Registered Report (Repass et al., 2016), that described how we intended to replicate an experiment from the paper ‘Fusobacterium nucleatum infection is prevalent in human colorectal carcinoma’ (Castellarin et al., 2012). Here we report the results. When measuring Fusobacterium nucleatum DNA by qPCR in colorectal carcinoma (CRC), adjacent normal tissue, and separate matched control tissue, we did not detect a signal for F. nucleatum in most samples: 25% of CRCs, 15% of adjacent normal, and 0% of matched control tissue were positive based on quantitative PCR (qPCR) and confirmed by sequencing of the qPCR products. When only samples with detectable F. nucleatum in CRC and adjacent normal tissue were compared, the difference was not statistically significant, while the original study reported a statistically significant increase in F. nucleatum expression in CRC compared to adjacent normal tissue (Figure 2; Castellarin et al., 2012). Finally, we report a meta-analysis of the result, which suggests F. nucleatum expression is increased in CRC, but is confounded by the inability to detect F. nucleatum in most samples. The difference in F. nucleatum expression between CRC and adjacent normal tissues was thus smaller than the original study, and not detected in most samples.

https://doi.org/10.7554/eLife.25801.001

Introduction

The Reproducibility Project: Cancer Biology (RP:CB) is a collaboration between the Center for Open Science and Science Exchange that seeks to address concerns about reproducibility in scientific research by conducting replications of selected experiments from a number of high-profile papers in the field of cancer biology (Errington et al., 2014). For each of these papers a Registered Report detailing the proposed experimental designs and protocols for the replications was peer reviewed and published prior to data collection. The present paper is a Replication Study that reports the results of the replication experiments detailed in the Registered Report (Repass et al., 2016), for a paper by Castellarin et al., and uses a number of approaches to compare the outcomes of the original experiments and the replications.

In 2012, Castellarin et al. reported that overall abundance of Fusobacterium nucleatum (F. nucleatum) RNA was increased by approximately 79 fold in colorectal carcinoma (CRC) as compared to adjacent normal biopsies, as determined by RNA sequencing. Next, they measured F. nucleatum DNA abundance in 99 CRC and adjacent normal tissue biopsies from a patient cohort and found that the presence of F. nucleatum DNA was 415 times greater in CRC tissue than adjacent normal tissue. These results, combined with earlier studies, provided evidence for a link between tissue-associated bacteria and tumorigenesis.

The Registered Report for the paper by Castellarin et al. (2012) described the experiments to be replicated (Figure 2), and summarized the current evidence for these findings. Since that publication there have been additional studies investigating F. nucleatum abundance in CRC. F. nucleatum DNA abundance was reported to be enriched in 88 out of 101 CRC tissue samples compared to adjacent normal tissues in Chinese patients (Li et al., 2016). The proportion of CRC tissue that were found to be F. nucleatum DNA positive increased from rectal cancers to cecal cancers (Gao et al., 2015; Mima et al., 2016a), increased according to histological grade (Ito et al., 2015), and varied depending on diet (Mehta et al., 2017). An association of F. nucleatum DNA with clinical outcome in CRC patients has also been investigated, but with mixed outcomes. In one study, F. nucleatum positive cases resulted in a worse outcome (Mima et al., 2016b), while two separate studies found no association with cancer-specific mortality (Ito et al., 2015; Mima et al., 2015).

Many of these recent studies utilized the same TaqMan primer/probe set as Castellarin et al. (2012) and reported 12–13% of CRC samples as positive for F. nucleatum (Mehta et al., 2017; Mima et al., 2016a; 2016b; Mima et al., 2015), compared to 3.4% positive in adjacent normal tissue (Mima et al., 2015). Another study, which utilized a custom-made TaqMan assay, reported 56% of CRC samples as positive, and that a subset of these positive CRC samples had lower F. nucleatum expression in the adjacent normal tissue (Ito et al., 2015). Ito and colleagues also examined F. nucleatum expression in normal mucosa samples and reported 15% were positive, but with low expression of F. nucleatum (Ito et al., 2015). Importantly, these studies utilized formalin-fixed paraffin-embedded (FFPE) tissue instead of fresh frozen tissue samples. While FFPE has the advantage of preserving morphology, it makes analysis of biomolecules, particularly nucleic acids, challenging due to formation of crosslinks as compared to fresh frozen tissue (Bradley et al., 2015; Van Allen et al., 2014).

In addition to using quantitative PCR (qPCR), F. nucleatum has also been visualized within CRC tissue using fluorescence in situ hybridization (FISH) (Kostic et al., 2013; 2012; Li et al., 2016; McCoy et al., 2013). Another method of detecting bacterial infection is through a humoral immune response. F. nucleatum specific antibodies were detectable in samples from CRC patients and have been reported as a potential diagnostic biomarker for CRC (Wang et al., 2016). Finally, using pyrosequencing, Gao and colleagues reported a higher abundance of bacteria from the genus Fusobacterium in CRC patients than in separate healthy individuals (10.08% vs 0.01%, respectively) (Gao et al., 2015).

The outcome measures reported in this Replication Study will be aggregated with those from the other Replication Studies to create a dataset that will be examined to provide evidence about reproducibility of cancer biology research, and to identify factors that influence reproducibility more generally.

Results and discussion

Quantitative PCR of F. nucleatum DNA abundance from colorectal carcinoma, adjacent normal tissue, and matched normal human colon tissue

We sought to independently replicate an experiment testing the hypothesis that F. nucleatum is overrepresented in CRC tissue compared to adjacent normal tissue by qPCR. This experiment is similar to what was reported in Figure 2 of Castellarin et al. (2012). While it is common practice to use adjacent normal tissue as a control to reduce the effect of genetic background, it is widely accepted that adjacent normal tissue, while observationally normal, may have genetic alterations that make it distinct from very distant somatic tissue (Braakhuis et al., 2004). To this end, gene expression profiling in adjacent normal tissue has even been used to predict recurrence in patients with rectal cancer (Schneider et al., 2006) and to predict recurrence as well as overall survival time in breast cancer patients (Troester et al., 2016). Thus, as an additional control, this replication attempt included a group of normal colorectal tissues from age/ethnicity matched patients. Extracted genomic DNA was analyzed for abundance of F. nucleatum, which was determined using the same primers as the original study targeting an F. nucleatum specific gene. The human PGT gene (updated gene symbol: solute carrier organic anion transporter family member 2A1 (SLCO2A1)) served as the reference control, with the ratio of these two genes giving an estimate of the ratio of F. nucleatum DNA to human DNA in each sample. Two independent qPCR runs were performed.

The cycle threshold (Ct) values, a measure of the concentration of the target sequence in the PCR reaction, observed in this replication attempt ranged from 22.4 to 40 for adjacent normal tissue, 22.1 to 40 for CRC tissue, and 21.8 to 40 for matched normal tissue (Figure 1—figure supplement 1, Figure 1—figure supplement 2). This compares to the original study that reported Ct values that ranged from 25.5 to 40 for adjacent normal tissue and 21.4 to 40 for CRC tissue. However, these ranges include all Ct values observed (PGT and F. nucleatum). We observed that while the raw Ct values for PGT were well within the normal acceptable range (<30) (Karlen et al., 2007), the raw Ct values for F. nucleatum were very high across both independent qPCR runs for all samples. For PGT, the first independent run (Figure 1—figure supplement 1A) yielded a median raw Ct value of 22.7 (IQR = 22.6–22.9) in adjacent normal tissue, 22.8 (IQR = 22.6–22.8) in CRC tissue, and 22.8 (IQR = 22.7–23.0) in matched normal tissue. However, F. nucleatum did not give a signal in most samples. In these samples, the Ct was set to 40, even though no amplicon was observed (also known as ‘non-detects’). While others have suggested methods to reduce the bias introduced when non-detects are set to Ct values of 40, such as setting the value to 35 (McCall et al., 2014), or using imputation methods and hierarchical models to deal with non-detects (McCall et al., 2014; Boyer et al., 2013), the approach should be based on evidence (Caraguel et al., 2011) and introduced prior to data collection, such as in a pre-registered analysis plan, to minimize confirmation bias (Wagenmakers et al., 2012). Furthermore, in the samples in which F. nucleatum was detectable, it was often at the edge of detectability, requiring more than 30 cycles of PCR to be detected (Figure 1—figure supplement 1B). This makes the data difficult to interpret considering the noise associated with very high Ct values. 
A similar observation was made with data from the second independent run (Figure 1—figure supplement 2B). There was high concordance for the PGT Ct values between the two runs with a correlation coefficient (ρ) of 0.84, 0.93, and 0.96 for adjacent normal, CRC, and matched normal tissue, respectively. Similar high concordance was observed for the F. nucleatum Ct values with a ρ of 0.99, 0.96, and 0.81 for adjacent normal, CRC, and matched normal tissue, respectively.

Although not pre-specified, to assess if the signal for F. nucleatum from the qPCR assay produced a specific product for F. nucleatum, we sequenced the amplicons generated. When we examined the 10 adjacent normal and 16 CRC tissue samples that gave a signal for F. nucleatum from the qPCR assay, we observed that at higher Ct values (Ct >35) a specific sequence was detected in some samples (3 CRC), while others only produced non-specific amplification (4 adjacent normal; 6 CRC). The non-specific amplification observed could be due to poor DNA quality or poor cleanup of the DNA samples during the sequencing process. In all of the samples with Ct values less than 35, specific sequences were observed (6 adjacent normal; 7 CRC). Interestingly, there were six paired samples, all with Ct values less than 35, where a specific sequence was observed in the adjacent normal as well as in the CRC tissue. These samples were further analyzed to determine if the relative abundance of F. nucleatum DNA (normalized to PGT) differed between the tissues (Figure 1). Performing an exploratory analysis on the mean normalized expression of F. nucleatum DNA (normalized to PGT) from the two independent runs, we found that the median fold change in F. nucleatum DNA between CRC and adjacent normal tissue [n = 6] was 1.42 (IQR = 0.72–2.80) with the mean fold change equal to 2.78 [SD = 3.46]. This difference was not statistically significant (two-tailed paired t test: t(5) = 1.067, p = 0.335, r = 0.43, 95% CI [−0.59, 0.92]) when only samples in which F. nucleatum was detected in both CRC and adjacent normal tissue were considered. Considering only the data in which a specific product from the qPCR assay was able to be confirmed, F. nucleatum expression was higher in the CRC tissue compared to the adjacent normal tissue in 8 of the 40 examined samples, while expression was higher in the adjacent normal tissue compared to CRC tissue in two samples.
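The effect size and confidence interval quoted for the paired t test can be checked by hand. The following is a minimal sketch, assuming the common conversion r = sqrt(t² / (t² + df)) and a confidence interval from Fisher's z transformation with standard error 1/sqrt(n − 3); the function names are illustrative and are not taken from the study's analysis scripts.

```python
import math

def r_from_t(t, df):
    """Convert a t statistic to the effect size r = sqrt(t^2 / (t^2 + df))."""
    return math.copysign(math.sqrt(t * t / (t * t + df)), t)

def fisher_ci(r, n, z_crit=1.959964):
    """95% CI for r via Fisher's z transformation (SE = 1 / sqrt(n - 3))."""
    z = math.atanh(r)
    half_width = z_crit / math.sqrt(n - 3)
    return math.tanh(z - half_width), math.tanh(z + half_width)

# Values reported for the exploratory paired t test (n = 6 pairs, df = 5).
r = r_from_t(1.067, 5)
ci_low, ci_high = fisher_ci(r, n=6)
print(f"r = {r:.2f}, 95% CI [{ci_low:.2f}, {ci_high:.2f}]")
```

Under these assumptions the output agrees with the reported r = 0.43, 95% CI [−0.59, 0.92] to two decimals. The very wide interval reflects the small number of pairs (n = 6).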

Figure 1
Relative abundance of F. nucleatum in colorectal carcinoma versus adjacent normal biopsies in samples with detectable F. nucleatum.

Only samples in which F. nucleatum was detected in both colorectal carcinoma (CRC) and adjacent normal tissue by quantitative PCR (qPCR) and confirmed by sequencing of the qPCR product are shown (n = 6). The mean relative abundance of F. nucleatum (normalized to PGT expression) in CRC tissue versus adjacent normal tissue of both independent runs is reported for each patient sample and error bars represent SEM. The y-axis represents mean fold gene expression change (2^(−ΔΔCt)) while the x-axis represents patient samples. Exploratory analysis: two-tailed paired t test: t(5) = 1.067, p = 0.335, r = 0.43, 95% CI [−0.59, 0.92]. Additional details for this experiment can be found at https://osf.io/rb4yq/.

https://doi.org/10.7554/eLife.25801.002

To facilitate a direct comparison of these results to the original study we included all of the samples in the analysis, as was done in the original study, and specified in the confirmatory analysis plan of the Registered Report (Repass et al., 2016). Samples in this case were identified as positive based upon whether a product was amplified in the qPCR reaction, irrespective of whether the amplicon was confirmed to be F. nucleatum by sequencing. While allowing us to compare qPCR signal intensity among samples, this approach also introduces error into the comparison because, as noted above, we know that some of the amplification products from the qPCR assay were non-specific. The relative abundance of F. nucleatum DNA (normalized to PGT) between CRC and adjacent normal tissue was determined for each independent run (Figure 2—figure supplement 1) and averaged for the analysis (Figure 2). We found that the median fold change in F. nucleatum DNA between CRC and adjacent normal tissue [n = 40] was 1.13 (IQR = 0.91–2.84) with the mean fold change equal to 4.97 [SD = 14.2]. This compares to the original study, which reported an estimated median fold change of F. nucleatum DNA in CRC to adjacent normal tissue [n = 99] of 3.15 (IQR = 1.04–14.49) with the mean fold change equal to 378.9 [SD = 1980] (Castellarin et al., 2012). Analysis of the mean normalized expression of F. nucleatum DNA (normalized to PGT) from the two independent runs was statistically significant (two-tailed Wilcoxon signed-rank test: Z = 2.14, p = 0.032), which suggests that F. nucleatum DNA is overrepresented in CRC compared to adjacent normal tissue. Collectively, 5% (2 out of 40 samples) of the matched normal tissue samples, 25% (10 out of 40 samples) of the adjacent normal tissue, and 40% (16 out of 40 samples) of the CRC tissue gave qPCR products for F. nucleatum (i.e. amplification of a product after fewer than 40 cycles in both independent repeats, some of which were confirmed to have amplified F. nucleatum sequence and some that were non-specific, as noted above). There were three samples that gave qPCR products for F. nucleatum in the adjacent normal tissue and not in the CRC tissue, nine samples that gave qPCR products in the CRC tissue and not in the adjacent normal tissue, and seven samples that gave qPCR products in the adjacent normal as well as in the CRC tissue. Considering all of the data in which a product was amplified in the qPCR reaction, and irrespective of whether the qPCR products were confirmed to be specific or non-specific upon sequencing, F. nucleatum expression was higher in the CRC tissue compared to adjacent normal tissue in 13 of the 40 examined samples, while expression was higher in the adjacent normal tissue compared to CRC tissue in six samples. The difference in the observed fold change between this replication attempt and the original study could be due to a number of factors that differed between the studies. In particular, with so many samples around the threshold of detection for F. nucleatum, very small changes in signal can lead to large changes in fold-change values (Cui et al., 2015).

Figure 2
Relative abundance of F. nucleatum by qPCR in colorectal carcinoma versus adjacent normal biopsies.

Fold change values are shown for all paired specimens based on differences in Ct values, irrespective of whether the qPCR products were confirmed to be specific or non-specific upon sequencing. Tissue was collected from colorectal carcinoma (CRC) and adjacent normal tissue (n = 40). qPCR was performed independently two times and averaged. The mean relative abundance of F. nucleatum (normalized to PGT expression) in CRC tissue versus adjacent normal tissue of both independent runs is reported for each patient sample and error bars represent SEM. The y-axis represents mean fold gene expression change (2^(−ΔΔCt)) while the x-axis represents patient samples. Two-tailed Wilcoxon signed-rank test: Z = 2.14, p = 0.032. Additional details for this experiment can be found at https://osf.io/rb4yq/.

https://doi.org/10.7554/eLife.25801.005

Meta-analyses of original and replicated effects

We performed a meta-analysis using a random-effects model to combine each of the effects from the original study and this replication described above as pre-specified in the confirmatory analysis plan (Repass et al., 2016). To directly compare and combine the results of both studies, we used the qPCR results for this replication irrespective of whether the amplicon was confirmed to be F. nucleatum by sequencing. To provide a standardized measure of the effect, a common effect size was calculated for the original and replication studies. The effect size r is a standardized measure of the correlation (strength and direction) of the association between two variables, in this case tissue type and normalized F. nucleatum DNA levels. Since F. nucleatum was not detected in most samples, this confounds the effect size for this meta-analysis and makes it difficult to compare the replication data to the original study, which did not report the number of samples below the limit of detection.

The comparison of CRC tissue to adjacent normal tissue resulted in r = 0.45, 95% CI [0.28, 0.60] for the effect size estimated a priori from the p-value and sample size reported in Figure 2 of the original study (Castellarin et al., 2012). This compares to r = 0.24, 95% CI [−0.08, 0.51] for the comparison of all of the CRC to adjacent normal tissue samples reported in this study, which spans zero and implies that the null hypothesis cannot be rejected. A random effects meta-analysis (Figure 3) of both the replication and original effects resulted in r = 0.38, 95% CI [0.17, 0.56], which was statistically significant (p = 5.86 × 10^−4). Using the estimate of the effect size of one study, as well as the associated uncertainty (i.e. confidence interval), and comparing it to the effect size of the other study provides another approach to compare the original and replication results (Errington et al., 2014; Valentine et al., 2011). Importantly, the width of the confidence interval for each study is a reflection of not only the confidence level (e.g. 95%), but also variability of the sample (e.g. SD) and sample size. Thus, both studies observed higher F. nucleatum levels in CRC tissue. The point estimate of the replication effect size was not within the confidence interval of the original result; however, the point estimate of the original effect size was within the confidence interval of the replication.
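To make the pooling step concrete, the sketch below combines the two rounded effect sizes on Fisher's z scale with a random-effects model. It rests on assumptions that are not in the study: it uses the DerSimonian-Laird estimator of between-study variance, whereas the study used the metafor R package, whose estimator may differ, and it starts from the rounded values of r rather than the underlying data. The output should therefore only land close to the reported pooled estimate, not match it exactly.

```python
import math

def random_effects_r(rs, ns, z_crit=1.959964):
    """Pool correlations on Fisher's z scale (DerSimonian-Laird, assumed here)."""
    zs = [math.atanh(r) for r in rs]
    vs = [1.0 / (n - 3) for n in ns]           # sampling variance of each z
    w = [1.0 / v for v in vs]                  # fixed-effect weights
    fixed = sum(wi * zi for wi, zi in zip(w, zs)) / sum(w)
    q = sum(wi * (zi - fixed) ** 2 for wi, zi in zip(w, zs))
    c = sum(w) - sum(wi * wi for wi in w) / sum(w)
    tau2 = max(0.0, (q - (len(rs) - 1)) / c)   # between-study variance
    w_re = [1.0 / (v + tau2) for v in vs]      # random-effects weights
    pooled = sum(wi * zi for wi, zi in zip(w_re, zs)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    p = math.erfc(abs(pooled / se) / math.sqrt(2.0))  # two-sided normal p-value
    return {
        "r": math.tanh(pooled),
        "ci": (math.tanh(pooled - z_crit * se), math.tanh(pooled + z_crit * se)),
        "p": p,
        "tau2": tau2,
    }

# Rounded effect sizes and sample sizes as reported: original (n = 99), replication (n = 40).
result = random_effects_r([0.45, 0.24], [99, 40])
print(result)

# Cross-check of the point estimates against the other study's reported interval.
original_in_replication_ci = -0.08 <= 0.45 <= 0.51
replication_in_original_ci = 0.28 <= 0.24 <= 0.60
print(original_in_replication_ci, replication_in_original_ci)
```

With only two studies the between-study variance is poorly determined, so the pooled interval is sensitive to the choice of estimator.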

Figure 3
Meta-analyses of each effect.

Effect size (r) and 95% confidence interval are presented for Castellarin et al. (2012), this replication attempt (RP:CB), and a meta-analysis to combine the two effects. To directly compare and combine the results of both studies, the qPCR results for RP:CB were used irrespective of whether the amplicon was confirmed to be F. nucleatum by sequencing. The effect size r is a standardized measure of the correlation (strength and direction) of the association between tissue type and normalized F. nucleatum DNA levels, with a larger positive value indicating CRC tissue is correlated with a higher F. nucleatum expression level. Sample sizes used in Castellarin et al. (2012) and this replication attempt are reported under the study name. Random effects meta-analysis of the abundance of F. nucleatum in CRC tissue compared to adjacent normal tissue (meta-analysis p = 5.86 × 10^−4). Additional details for this meta-analysis can be found at https://osf.io/kup8d/.

https://doi.org/10.7554/eLife.25801.007

This direct replication provides an opportunity to understand the present evidence of these effects. Any known differences, including reagents and protocol differences, were identified prior to conducting the experimental work and described in the Registered Report (Repass et al., 2016). However, this is limited to what was obtainable from the original paper, which means there might be particular features of the original experimental protocol that could be critical, but unidentified. So while some aspects, such as method of qPCR and primer and probe sequences were maintained, others were unknown or not easily controlled for, including similarities and differences in patient characteristics (Klevorn and Teague, 2016), as well as influence of diet on the gut microbiota (Mehta et al., 2017). Whether these or other factors influence the outcomes of this study is open to hypothesizing and further investigation, which is facilitated by direct replications and transparent reporting.

Materials and methods

As described in the Registered Report (Repass et al., 2016), we attempted a replication of the experiment reported in Figure 2 of Castellarin et al. (2012). A detailed description of all protocols can be found in the Registered Report (Repass et al., 2016). Additional detailed experimental notes, data, and analyses are available on OSF (RRID:SCR_003238) (https://osf.io/v4se2/; Repass et al., 2017). This includes the R Markdown file (https://osf.io/fmp6u/) that was used to compose this manuscript, which is a reproducible document linking the results in the article directly to the data and code that produced them (Hartgerink, 2017).

Clinical specimens

Patient tissue samples were obtained from iSpecimen (Lexington, Massachusetts) as flash-frozen in liquid nitrogen shortly after harvest. This included 40 CRC samples along with adjacent normal tissue from the same patient, as well as 40 age, sex, and ethnicity matched non-diseased control colorectal tissues. Patient phenotypes (age, gender, ethnicity, diagnosis, result, and histopathology report) are available at https://osf.io/tc3jc/. The original study did not report patient phenotypes, thus it is unknown how the patient populations compared between the two studies. Approval was obtained from the iSpecimen institutional review board (protocol # 2011–332) and shared samples and data were de-identified for this study.

Quantitative PCR

Genomic DNA was isolated from all samples using the Gentra Puregene genomic DNA extraction kit (Qiagen, cat# 158389) with Proteinase K (Roche, cat# 0311587900) as outlined in the Registered Report (full protocol details are available at https://osf.io/y8eum/). Each sample was approximately the same size, with the amount of Proteinase K and other additives used based on volume calculations and assumed tissue density of 1.04 g/cm³. Following an initial qPCR run, genomic DNA was re-purified using the Qiagen DNeasy Blood and Tissue Kit (Qiagen, cat# 69504) according to the manufacturer’s instructions and the amount of starting material was increased in an attempt to optimize the detection of F. nucleatum. DNA was quantified using a Nanodrop spectrophotometer (Thermo Fisher Scientific, cat# ND-1000). qPCR was performed on the ABI 7900HT Fast Real Time PCR System (Applied Biosystems, cat# 4351405) using assays specific for each gene of interest as outlined in the Registered Report (Repass et al., 2016). For both assays, each reaction well contained 10 µL of TaqMan Universal Master Mix II (Applied Biosystems, cat# 4440039), 60 ng (first independent run) or 100 ng (second independent run) total DNA, at 30 ng/µL, and 1 µL of each assay (final forward/reverse primer concentration 900 nM and probe concentration 250 nM) in a reaction volume of 20 µL. Cycling conditions were as follows: 50°C for 2 min for UNG activation, 95°C for 10 min for polymerase activation, followed by 40 cycles of 95°C for 15 s and 60°C for 1 min. Ct values were obtained using Sequence Detection System software from Applied Biosystems, version 2.4. The reaction for the F. nucleatum assay, using six serial logarithmic dilutions of purified genomic DNA from F. nucleatum subsp. nucleatum Strain VPI 4355 (ATCC, cat# 25586D-5), was found to have an efficiency of 99% with a sensitivity to detect at least 1 pg DNA.
The reaction for the PGT assay, using four serial logarithmic dilutions of human genomic DNA, was found to have an efficiency of 96% with a sensitivity to detect at least 100 pg DNA. Standard curve data are available at https://osf.io/32e8q/. For all qPCR runs, the F. nucleatum assay had a baseline of cycles 3–30 and a threshold of 0.2 and the PGT assay had a baseline of cycles 3–20 and a threshold of 0.1. For qPCR preprocessing all non-detects were set to 40. Technical duplicates were averaged for each sample. Fold difference in F. nucleatum abundance in tumor versus normal tissue was determined using the ∆∆Ct method (Pfaffl, 2001). For each run, ∆Ct (F. nucleatum expression normalized to PGT expression) was calculated for each tissue sample and then the fold difference (2^(−∆∆Ct)) in F. nucleatum abundance in tumor versus adjacent normal tissue was determined. There was high concordance of normalized expression between the two runs with a correlation coefficient (ρ) of 0.99, 0.98, and 0.97 for adjacent normal, CRC, and matched normal tissue, respectively. The mean ∆Ct was calculated for each sample across the two independent qPCR runs and used for statistical analysis. Additional information, including all raw qPCR data, is available at https://osf.io/rb4yq/.
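The preprocessing and ∆∆Ct steps described above can be sketched for a single tumor/adjacent-normal pair in one run. This is a minimal illustration, not the study's analysis script: the function names and the Ct values in the example are hypothetical, and a non-detect is represented as `None` before being set to 40.

```python
from statistics import mean

NON_DETECT_CT = 40.0  # non-detects are set to the final cycle number

def mean_ct(replicates):
    """Average technical replicates; None marks a non-detect and is set to 40."""
    return mean(NON_DETECT_CT if ct is None else ct for ct in replicates)

def delta_ct(target_cts, reference_cts):
    """Delta Ct: target (F. nucleatum) Ct minus reference (PGT) Ct."""
    return mean_ct(target_cts) - mean_ct(reference_cts)

def fold_change(tumor, normal):
    """Fold difference 2^(-ddCt), tumor relative to adjacent normal.

    Each argument is a (target_cts, reference_cts) pair of replicate lists.
    """
    ddct = delta_ct(*tumor) - delta_ct(*normal)
    return 2.0 ** (-ddct)

# Hypothetical pair with a detectable signal in both tissues.
detected = fold_change(
    tumor=([33.0, 33.2], [22.8, 22.8]),
    normal=([35.1, 34.9], [22.7, 22.7]),
)

# Hypothetical pair with non-detects in both tissues and equal PGT Ct values.
both_non_detect = fold_change(
    tumor=([None, None], [22.8, 22.8]),
    normal=([None, None], [22.8, 22.8]),
)
print(detected, both_non_detect)
```

In the second example the fold change is exactly 1 because both tissues are assigned the same Ct of 40. Pairs like this pull the median fold change toward 1 when most samples are non-detects.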

Sequencing PCR products

Multiple methods for sequencing PCR products from F. nucleatum amplicons were attempted, utilizing both patient tissue samples and a control of purified genomic DNA from F. nucleatum subsp. nucleatum Strain VPI 4355. In the first approach, PCR products were amplified from the appropriate template using AmpliTaq Gold DNA Polymerase (Thermo Fisher Scientific, cat# 4311806) and F. nucleatum specific forward and reverse primers and amplification conditions as described above. No products were seen by agarose gel electrophoresis. In a second approach, amplification was attempted using TaqMan Universal Master Mix with UNG (Thermo Fisher Scientific, cat# 4440038) with the same primers and amplification conditions as above. PCR products of the correct size (~125 bp) were seen by agarose gel electrophoresis and were then used with the TOPO TA Cloning Kit (Thermo Fisher Scientific, cat# K457540) in an attempt to generate plasmids containing the F. nucleatum-derived PCR products for Sanger sequencing analysis. Several attempts resulted in no colonies positive for the correct insert. Finally, PCR products were generated using the same approach described in the second attempt (with TaqMan) and analyzed by direct Sanger sequencing using the F. nucleatum specific forward amplification primer. Automated DNA sequencing was performed using industry standard, capillary-based Applied Biosystems 3730/3730XL DNA analyzers (Applied Biosystems, Inc., Foster City, California) and BigDye Terminator 3.1 cycle sequencing kit (Thermo Fisher Scientific, cat# 4337455).

Statistical analysis

Statistical analysis was performed with R software (RRID:SCR_001905), version 3.4.2 (R Core Team, 2017). All data, csv files, and analysis scripts are available on the OSF (https://osf.io/v4se2/). Confirmatory statistical analysis was pre-registered (https://osf.io/2wmfb/) before the experimental work began as outlined in the Registered Report (Repass et al., 2016). All additional analysis reported is exploratory. Data were checked to ensure assumptions of statistical tests were met. A meta-analysis of a common original and replication effect size, with confidence intervals determined by Fisher’s z’ transformation (Rosenthal and DiMatteo, 2001), was performed with a random effects model and the metafor R package (Viechtbauer, 2010) (available at: https://osf.io/kup8d/). Concordance between the two independent runs (raw Ct values as well as normalized F. nucleatum expression) was determined by computing Pearson correlation coefficients (ρ). The original study data was extracted a priori from the published figure during preparation of the experimental design. The extracted data was published in the Registered Report (Repass et al., 2016) and the published p-value was used in the power calculations to determine the sample size for this study.

Deviations from the registered report

The proposed analysis of age/ethnicity matched samples compared to CRC samples could not be performed because of undetectable F. nucleatum expression in the matched normal tissue. After an initial attempt, the purification and qPCR conditions were optimized in an effort to detect F. nucleatum gene expression. This included re-purification of genomic DNA using the Qiagen DNeasy Blood and Tissue Kit and increasing the starting material from 5 ng to 60 or 100 ng of total DNA. To assess if the signal for F. nucleatum from the qPCR assay produced a specific product for F. nucleatum, we sequenced the amplicons generated as described above. Furthermore, an additional exploratory analysis was performed to test if F. nucleatum expression was different between CRC and adjacent normal tissue only in the samples with a detectable F. nucleatum signal. Additional materials and instrumentation not listed in the Registered Report, but needed during experimentation, are also listed.

References

  6. R Core Team (2017). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Austria.
  25. Repass J, Iorns E, Denis A, Williams SR, Perfito N, Errington TM (2017). Study 50: Replication of Castellarin et al., 2012 (Genome Research). Open Science Framework. doi: 10.17605/OSF.IO/V4SE2.
  31. Viechtbauer W (2010). Conducting Meta-Analyses in R with the metafor Package. Journal of Statistical Software 36. doi: 10.18637/jss.v036.i03.

Decision letter

  1. Cynthia L Sears
    Reviewing Editor; Johns Hopkins University, United States

In the interests of transparency, eLife includes the editorial decision letter and accompanying author responses. A lightly edited version of the letter sent to the authors after peer review is shown, indicating the most substantive concerns; minor comments are not usually included.

Thank you for submitting your article "Replication Study: Fusobacterium nucleatum infection is prevalent in human colorectal carcinoma" for consideration by eLife. Your article has been reviewed by three peer reviewers, one of whom served as a Guest Reviewing Editor, and the evaluation has been overseen by Sean Morrison as the Senior Editor. The reviewers have opted to remain anonymous.

This paper reflects a pretty straightforward replication of one piece of the Castellarin 2012 paper. Each reviewer has some comments to be addressed in a revised version.

Reviewer #1:

The authors tried to replicate a selected experiment from the paper "Fusobacterium nucleatum infection is prevalent in human colorectal carcinoma" (Castellarin et al., 2012). However, the authors obtained undetectable Fusobacterium nucleatum DNA by qPCR in all 3 different types of samples, despite having no problem measuring the control gene PGT. The authors further discussed several approaches taken by others to combat this issue. It is not clear to me whether the authors have taken one of the existing approaches or just used the measurement for F. nucleatum without normalizing using PGT. Since it is an important issue, it would help reviewers to evaluate the conclusion if the authors present in detail how they solved this problem and the rationale of their solutions.

Subsection “Quantitative PCR of F. nucleatum DNA abundance from colorectal carcinoma, adjacent normal tissue, and matched normal human colon tissue”. The mean fold change (4.97) is much lower than reported by the original study (378.9). It would be nice to have some discussion on this.

Subsection “Meta-analyses of original and replicated effects” – r = 0.24, 95% CI [-0.08, 0.51] spans 0, indicating that there is no significant correlation. In contrast, the original study showed a significant positive correlation (r = 0.45, 95% CI [0.28, 0.60]) (page 157-158). It would be nice to have some discussion on this.

Statistical analysis section: For some reason, I could not find the csv files or analysis scripts on the OSF.

Subsection “Quantitative PCR of F. nucleatum DNA abundance from colorectal carcinoma, adjacent normal tissue, and matched normal human colon tissue” – Is it correct that the relative abundance was calculated without using control gene PGT?

Could you please clarify how the mean normalized expression is calculated, e.g., using control gene PGT?

Please explain how 20-40 is within the normal acceptable range (<30). Perhaps the authors meant that the normal acceptable range is <=40 or 20-30 is within the normal acceptable range (<30).

Reviewer #2:

The authors conducted the quantitative PCR of Fusobacterium nucleatum DNA abundance from colorectal carcinoma, normal tissue adjacent to carcinoma, and age/ethnicity-matched normal human colon tissue. This is an important topic, given the high interest and the potential to open new study areas. The Materials and methods are well described. The results are clearly demonstrated in the figures, and clearly summarized in the manuscript. There are some useful findings in the present paper. The following points should be considered.

In the literature, some studies used FFPE tissue and others used fresh frozen tissue. That makes a difference in quality of DNA as well as tumor neoplastic cellularity. In general, FFPE tissue can give higher tumor cellularity than fresh frozen tissue if tumor dissection from FFPE tissue is properly performed. When discussing the literature, this point needs to be kept in mind.

At the end of the Results and Discussion section, as the authors appropriately discuss, some unmeasured factors can contribute to differences in results between studies. One such factor may be diet, which can very likely influence the gut microbiota. Recently, a new study has provided evidence for differing influences of fiber-rich diet on incidence of F. nucleatum-positive colorectal carcinoma (stronger inhibition than on that of F. nucleatum-negative carcinoma) (Mehta et al., 2017). That study by Mehta et al. can be discussed as evidence that pre-analytic factors such as diet may explain the difference in study findings.

The authors may add a table/figure that shows the raw data of Ct in colorectal carcinoma, adjacent normal tissue, and matched normal human colon tissue.

The authors may discuss why the point estimate of the replication effect size was not within the 95% confidence interval of the original effect size, while the point estimate of the original effect size was within the 95% CI of the replication effect size. Does this make sense?

Reviewer #3:

A few comments for the authors to address:

1) The Materials and methods state that patient phenotypes are available online. However, a description regarding the similarities and differences or unknowns between the original population described by Castellarin and the group assayed in this paper should be provided to the reader.

2) The Introduction of the paper fails to mention one approach that has directly visualized clusters of Fusobacterium associated, at least, with a subset of CRC, i.e., FISH of tumors. This data helps to support the idea that, at least for some patients, Fusobacterium nucleatum is tumor-associated. This should be included in the manuscript in this reviewer's opinion.

3) The authors should present the concordance between the 2 qPCR runs conducted. In addition they should comment as to whether the two runs identified the same patients as Fn+ using a Ct cutoff value of 35.

[Editors' note: additional revisions were requested prior to acceptance.]

Thank you for submitting your article "Replication Study: Fusobacterium nucleatum infection is prevalent in human colorectal carcinoma" for consideration by eLife. Your article has been reviewed by Sean Morrison as the Senior Editor.

The new data has really improved the paper. We're almost there but it's still written in a way that makes it hard to extract the key observations. Below are some suggestions that could be addressed with minor changes to the text that would really improve readability:

1) In the Abstract you wrote "…5% of matched control tissue, 25% of adjacent normal, and 40% of CRC gave a positive signal". My understanding is that the new sequencing data show that many of these "positive signals" didn't actually have detectable F. nucleatum because the amplification products were non-specific. If that's true, it's misleading to write that 40% of CRCs gave a positive signal, knowing that many of these samples weren't actually positive for the bacterium. Instead, you should give the numbers of each type of sample that were confirmed to be positive based on sequencing of the amplicon, or at least that generated amplification products within a range of Ct values that were consistent with specific amplification based on the sequencing data.

2) In addition to indicating what fraction of samples of each type were truly positive for F. nucleatum, you should indicate in the Abstract whether there was a statistically significant difference in F. nucleatum levels among samples in which there was confirmed positivity for F. nucleatum in the CRC AND/OR adjacent normal sample.

3) You do a good job of mentioning all of the other studies that compared F. nucleatum between CRC and control tissue. However, you usually don't state what fraction of samples from the other studies had detectable F. nucleatum. That's the key piece of information that's required to compare your results with theirs. If that is known for some of the other studies (e.g. Gao et al., 2015 and Mima et al., 2016a) that information should be included when you first mention the studies.

4) In subsection “Quantitative PCR of F. nucleatum DNA abundance from colorectal carcinoma, adjacent normal tissue, and matched normal human colon tissue”, paragraph five, "this criteria" should be "this criterion".

5) In that same paragraph you should always state the denominator (the total number of samples analyzed) when you state the number that were positive.

6) Rather than going all the way through the paper, with much discussion of positive and negative samples, before presenting the key piece of information why not state up front the number of positive samples, taking into account both the PCR data and the sequencing data. As currently presented, much of the discussion, and the numbers in the Abstract and Results seem misleading when you finally acknowledge that many of the "positive signals" were actually non-specific – with no detectable F. nucleatum.

7) In the same section you wrote "In samples with Ct values less than 35…". If it would be accurate to write "In ALL of the samples with Ct values less than 35…" this clarification would be helpful.

8) Figure 1 is unnecessarily unclear about what is presented. Rather than graphing "2-ddCt" why not just show fold change? You should also state explicitly on the Y-axis what is divided by what (CRC/adjacent normal?) rather than leaving it for readers to try to figure out.

https://doi.org/10.7554/eLife.25801.011

Author response

Reviewer #1:

The authors tried to replicate a selected experiment from the paper "Fusobacterium nucleatum infection is prevalent in human colorectal carcinoma" (Castellarin et al., 2012). However, the authors obtained undetectable Fusobacterium nucleatum DNA by qPCR in all 3 different types of samples, despite having no problem measuring the control gene PGT. The authors further discussed several approaches taken by others to combat this issue. It is not clear to me whether the authors have taken one of the existing approaches or just used the measurement for F. nucleatum without normalizing using PGT. Since it is an important issue, it would help reviewers to evaluate the conclusion if the authors present in detail how they solved this problem and the rationale of their solutions.

To compare the replication results directly with the original study, we performed the normalization (F. nucleatum normalized to PGT) in a similar manner to the original study, despite the concerns raised. We have revised the manuscript (Abstract and relevant section of the Results/Discussion section) to further explain our approach. Further, we have reordered the paragraphs so that our own approach is described first, before the approaches taken by others are discussed.

Subsection “Quantitative PCR of F. nucleatum DNA abundance from colorectal carcinoma, adjacent normal tissue, and matched normal human colon tissue”. The mean fold change (4.97) is much lower than reported by the original study (378.9). It would be nice to have some discussion on this.

We have revised the manuscript to include some discussion on these differences.

Subsection “Meta-analyses of original and replicated effects” – r = 0.24, 95% CI [-0.08, 0.51] spans 0, indicating that there is no significant correlation. In contrast, the original study showed a significant positive correlation (r = 0.45, 95% CI [0.28, 0.60]) (page 157-158). It would be nice to have some discussion on this.

We have added additional discussion in the meta-analysis section of the revised manuscript.
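
As an illustration of how an interval such as r = 0.24, 95% CI [-0.08, 0.51] relates to its point estimate, the following is a minimal sketch using the Fisher z-transformation, a common way to build a confidence interval around a correlation-type effect size. It is not taken from the study's analysis scripts: the function name `fisher_ci` is hypothetical, and the sample size `n = 40` is an assumption chosen only for the example.

```python
import math
from statistics import NormalDist


def fisher_ci(r, n, level=0.95):
    """Approximate confidence interval for a correlation r from n pairs.

    Uses the Fisher z-transformation: z = atanh(r), with standard
    error 1 / sqrt(n - 3), then back-transforms the bounds with tanh.
    """
    if n <= 3:
        raise ValueError("n must be greater than 3")
    z = math.atanh(r)
    se = 1.0 / math.sqrt(n - 3)
    # Two-sided critical value of the standard normal distribution.
    crit = NormalDist().inv_cdf(0.5 + level / 2.0)
    return math.tanh(z - crit * se), math.tanh(z + crit * se)


# r is the replication estimate quoted above; n = 40 is a hypothetical
# sample size used only for illustration.
lower, upper = fisher_ci(0.24, 40)
spans_zero = lower < 0.0 < upper
print(f"95% CI: [{lower:.2f}, {upper:.2f}], spans zero: {spans_zero}")
```

An interval that includes zero, as in this sketch, means the data are compatible with no correlation at the chosen confidence level; it does not by itself show that the effect is absent.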

Statistical analysis section: For some reason, I could not find the csv files or analysis scripts on the OSF.

The files (all data files, analysis/figure scripts, etc) associated with this study are currently private. They will be made public at the time of publication.

Subsection “Quantitative PCR of F. nucleatum DNA abundance from colorectal carcinoma, adjacent normal tissue, and matched normal human colon tissue”. Is it correct that the relative abundance was calculated without using control gene PGT?

The relative abundance was calculated by normalizing the F. nucleatum expression to PGT, similar to the original study. We revised the manuscript to make this more explicit.

Could you please clarify how the mean normalized expression is calculated, e.g., using control gene PGT?

We revised the manuscript to include that F. nucleatum expression was normalized to PGT.

Please explain how 20-40 is within the normal acceptable range (<30). Perhaps the authors meant that the normal acceptable range is <=40 or 20-30 is within the normal acceptable range (<30).

The range reported (~20-40) is for all Ct values observed (PGT and F. nucleatum), which are not in the normal acceptable range of less than 30. While the PGT Ct values were in the normal range, the F. nucleatum was not. We have made this more explicit in the revised manuscript.

Reviewer #2:

The authors conducted the quantitative PCR of Fusobacterium nucleatum DNA abundance from colorectal carcinoma, normal tissue adjacent to carcinoma, and age/ethnicity-matched normal human colon tissue. This is an important topic, given the high interest and the potential to open new study areas. The Materials and methods are well described. The results are clearly demonstrated in the figures, and clearly summarized in the manuscript. There are some useful findings in the present paper. The following points should be considered.

In the literature, some studies used FFPE tissue and others used fresh frozen tissue. That makes a difference in quality of DNA as well as tumor neoplastic cellularity. In general, FFPE tissue can give higher tumor cellularity than fresh frozen tissue if tumor dissection from FFPE tissue is properly performed. When discussing the literature, this point needs to be kept in mind.

Thank you for raising this point. We have highlighted this aspect in the revised manuscript.

At the end of the Results and Discussion section, as the authors appropriately discuss, some unmeasured factors can contribute to differences in results between studies. One such factor may be diet, which can very likely influence the gut microbiota. Recently, a new study has provided evidence for differing influences of fiber-rich diet on incidence of F. nucleatum-positive colorectal carcinoma (stronger inhibition than on that of F. nucleatum-negative carcinoma) (Mehta et al., 2017). That study by Mehta et al. can be discussed as evidence that pre-analytic factors such as diet may explain the difference in study findings.

Thank you for bringing this to our attention. We have included this in the revised manuscript.

The authors may add a table/figure that shows the raw data of Ct in colorectal carcinoma, adjacent normal tissue, and matched normal human colon tissue.

We included distribution plots of the raw Ct counts for PGT and F. nucleatum for the three tissue types we analyzed. They are displayed in Figure 1—figure supplement 1B,C (for the first qPCR run) and Figure 1—figure supplement 2B,C (for the second qPCR run).

The authors may discuss why the point estimate of the replication effect size was not within the 95% confidence interval of the original effect size, while the point estimate of the original effect size was within the 95% CI of the replication effect size. Does this make sense?

We have added additional discussion in the meta-analysis section of the revised manuscript.

Reviewer #3:

A few comments for the authors to address:

1) The Materials and methods state that patient phenotypes are available online. However, a description regarding the similarities and differences or unknowns between the original population described by Castellarin and the group assayed in this paper should be provided to the reader.

The characteristics of the patient population used in the original study were not reported or made available, thus we are unable to compare them to this replication attempt. We have revised the relevant section in the Materials and methods to describe how this is unknown.

2) The Introduction of the paper fails to mention one approach that has directly visualized clusters of Fusobacterium associated, at least, with a subset of CRC, i.e., FISH of tumors. This data helps to support the idea that, at least for some patients, Fusobacterium nucleatum is tumor-associated. This should be included in the manuscript in this reviewer's opinion.

We have included this in the Introduction of the revised manuscript.

3) The authors should present the concordance between the 2 qPCR runs conducted. In addition they should comment as to whether the two runs identified the same patients as Fn+ using a Ct cutoff value of 35.

Thank you for these suggestions. We have included this information in the revised manuscript. There was high concordance of the raw Ct values between both runs, as well as the gene expression. Additionally, the same patients were identified with a Ct cutoff of 35, with the exception of one patient that was over the cutoff for the first run (Ct = 37.1) and just under it for the second run (Ct = 34.9). Interestingly, all patients that were identified as F. nucleatum positive in their adjacent normal tissue were also identified positive in their CRC tissue. We have included this additional analysis in the revised manuscript.
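
The run-to-run comparison described above can be sketched as follows. This is a minimal illustration, not the analysis script used for the study: the helper names `is_positive` and `compare_runs`, the patient labels, and all Ct values except the 37.1 and 34.9 pair quoted above are hypothetical. A sample is called positive when its Ct is below the cutoff of 35, and an undetermined Ct is represented as `None`.

```python
CT_CUTOFF = 35.0  # Ct values below this cutoff are called positive


def is_positive(ct, cutoff=CT_CUTOFF):
    """Return True when a Ct value was measured and is below the cutoff."""
    return ct is not None and ct < cutoff


def compare_runs(run1, run2, cutoff=CT_CUTOFF):
    """Split patients seen in both runs into concordant and discordant calls."""
    concordant, discordant = [], []
    for patient in sorted(run1.keys() & run2.keys()):
        same_call = is_positive(run1[patient], cutoff) == is_positive(
            run2[patient], cutoff
        )
        (concordant if same_call else discordant).append(patient)
    return concordant, discordant


# Hypothetical Ct values; only the P02 pair (37.1 and 34.9) comes from the text.
run1 = {"P01": 31.2, "P02": 37.1, "P03": None}
run2 = {"P01": 30.8, "P02": 34.9, "P03": 38.5}

concordant, discordant = compare_runs(run1, run2)
print("concordant:", concordant)
print("discordant:", discordant)
```

A fixed threshold makes the call sensitive to small shifts near the cutoff, which is why a patient measured at 37.1 in one run and 34.9 in the other changes category even though the two values are close.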

[Editors' note: additional revisions were requested prior to acceptance.]

1) In the Abstract you wrote "…5% of matched control tissue, 25% of adjacent normal, and 40% of CRC gave a positive signal". My understanding is that the new sequencing data show that many of these "positive signals" didn't actually have detectable F. nucleatum because the amplification products were non-specific. If that's true, it's misleading to write that 40% of CRCs gave a positive signal, knowing that many of these samples weren't actually positive for the bacterium. Instead, you should give the numbers of each type of sample that were confirmed to be positive based on sequencing of the amplicon, or at least that generated amplification products within a range of Ct values that were consistent with specific amplification based on the sequencing data.

We agree with this suggestion and have revised the Abstract to reflect only the numbers and results of the confirmed positive samples.

2) In addition to indicating what fraction of samples of each type were truly positive for F. nucleatum, you should indicate in the Abstract whether there was a statistically significant difference in F. nucleatum levels among samples in which there was confirmed positivity for F. nucleatum in the CRC AND/OR adjacent normal sample.

We agree with this suggestion and have revised the Abstract to reflect only the statistical analysis performed on the confirmed positive samples from CRC and adjacent normal.

3) You do a good job of mentioning all of the other studies that compared F. nucleatum between CRC and control tissue. However, you usually don't state what fraction of samples from the other studies had detectable F. nucleatum. That's the key piece of information that's required to compare your results with theirs. If that is known for some of the other studies (e.g. Gao et al., 2015 and Mima et al., 2016a) that information should be included when you first mention the studies.

We included most of this information in the Results/Discussion section; however, we agree that including it when the studies are first mentioned in the Introduction is more fitting, and we have revised the manuscript accordingly.

4) In subsection “Quantitative PCR of F. nucleatum DNA abundance from colorectal carcinoma, adjacent normal tissue, and matched normal human colon tissue”, paragraph five, "this criteria" should be "this criterion".

Thank you for catching this error. We have fixed it in the revised manuscript.

5) In that same paragraph you should always state the denominator (the total number of samples analyzed) when you state the number that were positive.

Thank you for the suggestion. We have included the total number of samples analyzed for each of the numbers given in this section.

6) Rather than going all the way through the paper, with much discussion of positive and negative samples, before presenting the key piece of information why not state up front the number of positive samples, taking into account both the PCR data and the sequencing data. As currently presented, much of the discussion, and the numbers in the Abstract and Results seem misleading when you finally acknowledge that many of the "positive signals" were actually non-specific – with no detectable F. nucleatum.

We agree with this suggestion and have revised the Results/Discussion to start with the observation of the Ct values, then discuss the sequencing and the number of positive samples taking into account both the PCR data and the sequencing data, and ending with the comparison to the original study (only using the PCR data), before going into the meta-analysis section. We have also reordered the figures to include the subset analysis (confirmed F. nucleatum positive) as Figure 1 and the complete analysis of all samples (just PCR data) as Figure 2.

7) In the same section you wrote "In samples with Ct values less than 35…". If it would be accurate to write "In ALL of the samples with Ct values less than 35…" this clarification would be helpful.

We agree and have revised the text as suggested in the manuscript.

8) Figure 1 is unnecessarily unclear about what is presented. Rather than graphing "2-ddCt" why not just show fold change? You should also state explicitly on the Y-axis what is divided by what (CRC/adjacent normal?) rather than leaving it for readers to try to figure out.

For each Replication Study, we have presented the results in the same way as the original study to provide a direct comparison. This is how the original study presented the results; however, we agree it could be made clearer and have revised the y-axis of Figure 1 and Figure 2 to better reflect what is being plotted. It is the fold change in F. nucleatum abundance (normalized to PGT) in CRC versus adjacent normal tissue.
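
The quantity plotted on the y-axis can be sketched as a short calculation. This is a minimal example of the standard 2^-ΔΔCt method, assuming 100% amplification efficiency for both the target and the reference gene; the function name `fold_change` and the Ct values in the example are hypothetical and are not data from this study.

```python
def fold_change(ct_target_crc, ct_ref_crc, ct_target_normal, ct_ref_normal):
    """Fold change of a target in CRC versus adjacent normal tissue.

    Each target Ct is first normalized to the reference gene (delta Ct),
    the normal-tissue delta Ct is then subtracted from the CRC delta Ct
    (delta-delta Ct), and the result is converted to a fold change as
    2 ** (-ddCt). Assumes the signal doubles with every PCR cycle.
    """
    delta_crc = ct_target_crc - ct_ref_crc
    delta_normal = ct_target_normal - ct_ref_normal
    ddct = delta_crc - delta_normal
    return 2.0 ** (-ddct)


# Hypothetical Ct values: F. nucleatum (target) and PGT (reference).
fc = fold_change(
    ct_target_crc=30.0,
    ct_ref_crc=24.0,
    ct_target_normal=33.0,
    ct_ref_normal=24.0,
)
print(fc)  # 8.0: the target crosses threshold three cycles earlier in CRC
```

A value above 1 means more F. nucleatum signal relative to PGT in CRC than in the paired adjacent normal tissue; a value below 1 means less.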

https://doi.org/10.7554/eLife.25801.012

Article and author information

Author details

  1. John Repass

    ARQ Genetics, Bastrop, United States
    Contribution
    Acquisition of data, Analysis and interpretation of data, Drafting or revising the article
    Competing interests
    ARQ Genetics is a Science Exchange associated lab.
  2. Reproducibility Project: Cancer Biology

    Contribution
    Analysis and interpretation of data, Drafting or revising the article
    For correspondence
    1. tim@cos.io
    2. nicole@scienceexchange.com
    Competing interests
    EI, NP: Employed by and hold shares in Science Exchange Inc.
    1. Elizabeth Iorns, Science Exchange, Palo Alto, United States
    2. Alexandria Denis, Center for Open Science, Charlottesville, United States
    3. Stephen R Williams, Center for Open Science, Charlottesville, United States
    4. Nicole Perfito, Science Exchange, Palo Alto, United States
    5. Timothy M Errington, Center for Open Science, Charlottesville, United States

Funding

Laura and John Arnold Foundation

  • Reproducibility Project: Cancer Biology

The funder had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Acknowledgements

The Reproducibility Project: Cancer Biology would like to thank Courtney Soderberg at the Center for Open Science for assistance with statistical analyses and the following companies and labs for generously donating reagents to the Reproducibility Project: Cancer Biology: American Type Culture Collection (ATCC), Applied Biological Materials, BioLegend, Charles River Laboratories, Corning Incorporated, DDC Medical, EMD Millipore, Harlan Laboratories, LI-COR Biosciences, Mirus Bio, Novus Biologicals, Sigma-Aldrich, and System Biosciences (SBI).

Ethics

Human subjects: Patient tissue samples were obtained from iSpecimen (Lexington, Massachusetts) and had been flash-frozen in liquid nitrogen shortly after harvest. Approval was obtained from the iSpecimen institutional review board (protocol # 2011-332), and shared samples and data were de-identified for this study.

Reviewing Editor

  1. Cynthia L Sears, Johns Hopkins University, United States

Publication history

  1. Received: February 8, 2017
  2. Accepted: January 26, 2018
  3. Version of Record published: March 13, 2018 (version 1)

Copyright

© 2018, Repass et al.

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.
