
Registered report: Senescence surveillance of pre-malignant hepatocytes limits liver cancer development

  1. Samrrah Raouf
  2. Claire Weston
  3. Nora Yucel
  4. Reproducibility Project: Cancer Biology (corresponding author)
  1. University of California–Davis
  2. Reveal Biosciences
  3. Stanford University
Registered Report
Cite as: eLife 2015;4:e04105 doi: 10.7554/eLife.04105

Abstract

The Reproducibility Project: Cancer Biology seeks to address growing concerns about reproducibility in scientific research by conducting replications of 50 papers in the field of cancer biology published between 2010 and 2012. This Registered Report describes the proposed replication plan of key experiments from ‘Senescence surveillance of pre-malignant hepatocytes limits liver cancer development’ by Kang et al. (2011), published in Nature. The experiments that will be replicated are those reported in Figures 2I, 3B, 3C, 3E, and 4A. In these experiments, Kang et al. (2011) demonstrate the phenomenon of oncogene-induced cellular senescence and immune-mediated clearance of senescent cells after intrahepatic injection of NRAS (Figures 2I, 3B, 3C, and 3E). Additionally, Kang et al. (2011) show the specific necessity of CD4+ T cells for immunoclearance of senescent cells (Figure 4A). The Reproducibility Project: Cancer Biology is a collaboration between the Center for Open Science and Science Exchange, and the results of the replications will be published by eLife.

https://doi.org/10.7554/eLife.04105.001

Introduction

Cellular senescence—a permanent state of proliferative arrest—has been shown to be an important failsafe mechanism against tumor development in vivo. Aberrant activation of oncogenes can force normal cells into a senescent state of stable cell cycle arrest (Narita and Lowe, 2005). Senescence is often associated with a pre-malignant state, and a wide range of cancers have been shown to harbor populations of senescent cells (Quintanilla et al., 1986; Braig et al., 2005; Collado et al., 2005; Bennecke et al., 2010). An ongoing debate in the field of cancer biology has been the phenomenon of ‘immune surveillance’, or the ability of the immune system to specifically identify and eliminate nascent pre-cancerous (e.g., senescent) cells before they can cause harm (Swann and Smyth, 2007). The possibility of tailored, pro-senescent therapies for cancer treatment has provoked considerable interest.

Kang et al. (2011) report that pre-malignant, senescent hepatocytes are recognized and cleared through a CD4+ T-cell-specific immune response and that this immunosurveillance is crucial for tumor suppression in vivo. The authors attempted to mimic aberrant oncogene activation by stably delivering oncogenic NrasG12V into mouse livers in vivo. Hydrodynamic injection of transposable elements resulted in mosaic generation of Nras-expressing hepatocytes that displayed senescent markers. Time course analyses revealed a progressive loss of NrasG12V expressing senescent cells within 2 months after stable intrahepatic delivery of oncogenic NrasG12V. Studies were carried out in wild-type mice, as well as immunocompromised mice (SCID/beige and CD4−/−). Importantly, all analyses were paralleled using an NrasG12V effector loop mutant (NrasG12V/D38A) incapable of signaling to downstream pathways (Kang et al., 2011).

In Figures 2I, 3B, 3C, and 3E, Kang et al. (2011) tested whether intrahepatic expression of oncogenic NrasG12V induced cellular senescence in affected hepatocytes (as indicated by the expression of pro-senescent markers p16 and p21), and if senescent cells were eventually cleared over a time course spanning 60 days. Further, Kang et al. demonstrated the necessity of a functional adaptive immune response in clearing senescent cells, as mice harboring mutations in adaptive immunity (SCID/beige) showed significant attenuation in the ability to clear pre-malignant cells. These key experiments, which test the major underlying hypotheses of the paper, are replicated with Protocol 1.

In Figure 4A, Kang et al. demonstrated the specific necessity of CD4+ T lymphocytes in undertaking senescence surveillance. They showed that CD4−/− mice displayed significantly abrogated clearing of Nras-positive hepatocytes 12 days following intrahepatic injection of NrasG12V; no immunoclearance was observed for either CD4−/− mice or wild-type controls expressing the defective signaling mutant NrasG12V/D38A. These key results suggest that an intact CD4+ T-cell-mediated adaptive immune response is necessary for senescence surveillance of pre-malignant hepatocytes. These experiments will be replicated in Protocol 2.

To date, reports of Ras-associated senescence have been conflicting. Using a mouse model with conditional pancreatic expression of Kras, Kennedy et al. (2011) demonstrated that activated KrasG12D induced senescence in pancreatic cells. However, using an intrahepatic injection system similar to that of Kang et al., Ho et al. (2012) did not detect positive markers for senescence in the livers of Nras-injected wild-type mice as compared to controls. Multiple studies have observed immunoclearance of senescent cells similar to the results reported by Kang et al. Xue et al. (2007) reported rapid clearance of senescent cells by the immune system, and Rakhra et al. (2010) reported similar findings and also demonstrated that CD4+ T cells were necessary for clearing senescent cells. Similarly, Acosta et al. (2013) demonstrated clearance of Nras-positive cells in wild-type mice, but impaired clearance in mice treated with various immunosuppressive drugs. Interestingly, co-injection of NrasG12V along with other oncogenes failed to manifest senescence surveillance, and instead resulted in an aggressive tumor phenotype (Brinkhoff et al., 2014). Likewise, intrahepatic activation of SV40 large T antigen resulted in hepatocellular carcinoma development and did not activate immunosurveillance (Willimsky and Blankenstein, 2005).

Materials and methods

Unless otherwise noted, all protocol information was derived from the original paper, references from the original paper, or information obtained directly from the authors. An asterisk (*) indicates data or information provided by the Reproducibility Project: Cancer Biology core team. A hashtag (#) indicates information provided by the replicating lab.

Protocol 1: generation of oncogene-induced senescence and immunosurveillance in murine hepatocytes

This protocol assesses whether intrahepatic expression of oncogenic NrasG12V induces cellular senescence, and if such pre-malignant senescent hepatocytes are eventually eradicated over time, as is depicted in Figures 2I, 3B, 3C, and 3E. This protocol also determines the necessity of a functional adaptive immune response in clearing senescent cells by measuring the abrogation of Nras-, p21-, and p16-positive hepatocytes in severely immune-compromised mice (SCID/beige) injected with either NrasG12V or NrasG12V/D38A.

Sampling

  • These experiments will analyze a minimum of five mice per treatment group, for a total power of between 89.8% and 99.9%.

    1. See ‘Power calculations’ section for details.

    2. In order to account for variability in hydrodynamic injections that might contribute to exclusion of study animals (estimated as a 20% failure rate by Kang et al.), the initial starting sample size will be seven mice per treatment group. The inclusion of two additional animals will help ensure that at least five animals are available for analysis at the conclusion of the study.

  • Outline of experimental conditions:

    1. Mouse Cohort 1 (tissue processed at 6 days post-injection).

      • 14 female C.B-17 wild-type mice (4–6 weeks old).

        • a. Seven mice injected with pPGK-SB13 + pT/CaggsNrasG12V

        • b. Seven mice injected with pPGK-SB13 + pT/CaggsNrasG12V/D38A

      • 14 female C.B-17 SCID/beige mice (4–6 weeks old).

        • a. Seven mice injected with pPGK-SB13 + pT/CaggsNrasG12V

        • b. Seven mice injected with pPGK-SB13 + pT/CaggsNrasG12V/D38A

    2. Mouse Cohort 2 (tissue processed at 30 days post-injection).

      • 14 female C.B-17 wild-type mice (4–6 weeks old).

        • a. Seven mice injected with pPGK-SB13 + pT/CaggsNrasG12V

        • b. Seven mice injected with pPGK-SB13 + pT/CaggsNrasG12V/D38A

      • 14 female C.B-17 SCID/beige mice (4–6 weeks old).

        • a. Seven mice injected with pPGK-SB13 + pT/CaggsNrasG12V

        • b. Seven mice injected with pPGK-SB13 + pT/CaggsNrasG12V/D38A
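As a rough check on the seven-mouse starting group size, the sketch below treats each hydrodynamic injection as an independent trial with the 20% failure rate attributed to Kang et al. and computes the chance that at least five animals per group remain analyzable. The independence assumption and all function and variable names are illustrative and are not part of the registered protocol.

```python
from math import comb


def prob_at_least(k_min: int, n: int, p_success: float) -> float:
    """Binomial probability of at least k_min successes in n independent trials."""
    return sum(
        comb(n, k) * p_success**k * (1 - p_success) ** (n - k)
        for k in range(k_min, n + 1)
    )


# Values taken from the Sampling section; independence of injections is an assumption.
START_PER_GROUP = 7
MIN_PER_GROUP = 5
FAILURE_RATE = 0.20

p_group_ok = prob_at_least(MIN_PER_GROUP, START_PER_GROUP, 1 - FAILURE_RATE)

# Protocol 1 layout: 2 time points x 2 strains x 2 constructs.
n_groups = 2 * 2 * 2
total_mice = n_groups * START_PER_GROUP

print(f"P(at least {MIN_PER_GROUP} of {START_PER_GROUP} usable) = {p_group_ok:.3f}")
print(f"Total mice at the start of Protocol 1 = {total_mice}")
```

Under these assumptions a single group retains five or more animals in roughly 85% of cases, so some groups may still fall short of five.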

Materials and reagents

Reagent | Type | Manufacturer | Catalog # | Comments
pPGK-SB13 transposase vector | DNA Vector | N/A | N/A | Original reagent obtained from authors
pT/CaggsNrasG12V transposon vector | DNA Vector | N/A | N/A | Original reagent obtained from authors
pT/CaggsNrasG12V/D38A transposon vector | DNA Vector | N/A | N/A | Original reagent obtained from authors
GenElute Endotoxin-free Plasmid Maxiprep Kit | Reagent | Sigma | PLEX15-1KT | This kit replaces the Qiagen Endo-free Maxiprep kit used by the original authors
Ketamine HCl | Anesthetic | | | Specific brand information will be left up to the discretion of the provider lab and recorded later
Acepromazine maleate | Anesthetic | | |
Butorphanol tartrate | Anesthetic | | |
Dulbecco's Phosphate Buffered Saline | Reagent | Sigma–Aldrich | D1408 | Original brand not specified
Paraformaldehyde | Reagent | Sigma–Aldrich | 158127 | Original brand not specified
Mouse anti-Nras (F155) | Antibody | Santa Cruz | Sc-31 | Same as original
Mouse anti-p21 (SXM30) | Antibody | BD Pharmingen | 556431 | Same as original
Rabbit anti-p16 (M156) | Antibody | Santa Cruz | sc-1207 | This clone is no longer available. We will work with the provider lab to determine a suitable replacement
Biotinylated goat anti-mouse IgG1-conjugate | Antibody | Thermo-Fisher | TM-060-BN | Same as original
Biotinylated goat anti-rabbit IgG-conjugate | Antibody | Thermo-Fisher | TR-060-BN | Same as original
IgG1 mouse isotype control antibody | Antibody | Sigma–Aldrich | M5284 | Originally not included
IgG rabbit isotype control antibody | Antibody | | | We will obtain a suitable rabbit IgG control antibody if necessary
Streptavidin-HRP conjugate | Reagent | Sigma–Aldrich | GERPN1231 | Original not specified
Diaminobenzidine (DAB) | IHC Stain | | | Specific brand information will be left up to the discretion of the provider lab and recorded later
Haematoxylin | IHC Stain | | |
Eosin | IHC Stain | | |
4-week-old female Fox Chase CB17 (C.BKa-lghb/lcrCrl) | Mouse | Charles River | Strain 251 |
4-week-old female Fox Chase SCID Beige (CB17.Cg-PrkdcscidLystbg-J/Crl) | Mouse line | Charles River | Strain 250 |

Procedure

Note: The following procedures (steps 1–5) are presented in greater detail in Bell et al. (2007). Please consult this resource for enhanced experimental protocol detail and description.

  1. Grow and prepare endotoxin-free plasmid constructs according to the manufacturer's protocol for the GenElute Endotoxin-free Plasmid Maxiprep Kit. Store DNA at ∼1–2 µg/µl in 10 mM Tris–HCl, pH 7.2, 0.1 mM ethylenediaminetetraacetic acid (EDTA) at 4°C.

    • a. Prepare >400 µg of pPGK-SB13 transposase vector.

    • b. Prepare >1000 µg of pT/CaggsNrasG12V transposon vector.

    • c. Prepare >1000 µg of pT/CaggsNrasG12V/D38A transposon vector.

  2. Sequence plasmids to confirm identity and run on gel to confirm vector integrity, as well as Nras mutational status.

    • a. Use the following pT/CaggsNras sequencing primers:

      • i. Forward: tgtgaccggcggctctaga.

      • ii. Reverse: cgaggctgatccttgaaagtggctctt.

  3. Obtain 4-week-old female C.B-17 wild-type and C.B-17 SCID/beige mice.

    • a. Mice should be randomized into strain-based treatment and control groups and co-housed so that mice of the same strain receiving either NrasG12V or NrasG12V/D38A injections are housed together.

    • b. House mice in individually ventilated cages (IVC) that are specific pathogen free (SPF). Allow mice 1 week to acclimate to new housing prior to injection.

  4. Prepare 5-week-old mice for injection.

    • a. Mix anesthetic cocktail: 8 mg/ml ketamine HCl, 0.1 mg/ml acepromazine maleate, and 0.01 mg/ml butorphanol tartrate.

    • b. Weigh mice.

    • c. Administer anesthetic to mice; avoid rendering them unconscious (mice should appear ‘drowsy’).

      • i. For mice 25–30 g, inject 50 µl of cocktail i.p.

      • ii. For mice <20 g, inject 25–30 µl of cocktail i.p.

  5. Inject 5-week-old mice with transposon/transposase:

    • a. Mix at a 5:1 molar ratio of transposon to transposase-encoding plasmid for a total injection mass of 30 µg.

      • i. 25 µg pT/CaggsNrasG12V or pT/CaggsNrasG12V/D38A.

      • ii. 5 µg pPGK-SB13.

    • b. Weigh mice.

    • c. Suspend 30 µg of plasmids in 0.9% NaCl at a final volume of 10% of the animal's body weight (upper limit is 2.5 ml injection volume).

    • d. Blindly deliver DNA solution by hydrodynamic tail vein injection within a period of 4–7 s.

      • i. Exclusion criteria for injection: as soon as resistance is encountered during tail-vein injection, the respective animal cannot be included for studies. Injections taking >10 s are considered unsuccessful and are not included in test groups.

  6. At days 6 and 30 post-injection of DNA, euthanize mice and harvest liver tissue.

    • a. Tissues should be randomized and blinded and shipped to Reveal Biosciences for downstream analysis.

  7. Fix tissue and embed in paraffin using the following procedure:

    • a. Fix tissues with 4% paraformaldehyde (PFA) overnight at 4°C.

    • b. Rinse tissues with phosphate-buffered saline (PBS) (five washes for 1 hr each at 4°C).

    • c. Dehydrate tissues using a step-wise gradient from 70–100% ETOH and xylene using an automatic tissue processor.

    • d. Embed tissues in paraffin.

  8. Section tissues and perform immunohistochemistry using the following procedure:

    • a. Use a microtome to prepare 2–3 µm sections.

    • b. Mount sections on slides.

    • c. Deparaffinize sections using xylene and ETOH and rehydrate.

    • d. Perform heat-induced epitope retrieval using sodium citrate, pH 6.0.

    • e. Rinse sections with tris-buffered saline (TBS) (one wash for 3 min).

    • f. Block tissues with 3% hydrogen peroxide (H2O2) for 10 min.

    • g. Rinse sections with TBS (one wash for 3 min).

    • h. Block sections with blocking solution for 5 min.

    • i. Rinse sections with TBS (one wash for 3 min).

    • j. Incubate sections with primary antibody overnight at 4°C.

    • k. Rinse sections with TBS (one wash for 3 min).

    • l. Incubate sections with secondary antibody for 30 min at RT.

    • m. Rinse sections with TBS (one wash for 3 min).

    • n. Incubate sections with Streptavidin-HRP for 10–15 min at RT.

    • o. Rinse sections with TBS (one wash for 3 min).

    • p. Incubate sections with 3,3′ Diaminobenzidine (DAB) + DAB substrate.

    • q. Rinse sections with ddH2O.

    • r. Counterstain sections with Haematoxylin for 30 s.

    • s. Rinse sections with ddH2O for 10–15 min.

    • t. Dehydrate and mount sections for imaging.

  9. Perform immunohistochemistry staining on each tissue section with the following:

    • a. Primary antibodies for N-ras, p21, and p16.

      • i. mouse-anti-Nras, 1:100 dilution.

      • ii. mouse-anti-p21, 1:50 dilution.

      • iii. rabbit-anti-p16, 1:50 dilution.

    • b. Secondary antibodies.

      • i. Biotinylated goat anti-mouse IgG1-conjugate.

      • ii. Biotinylated goat anti-rabbit IgG-conjugate.

    • c. Isotype control primary antibody + secondary antibody.

      • i. IgG1 mouse isotype control antibody.

      • ii. IgG rabbit isotype control antibody.

    • d. Secondary antibody only.

  10. Blindly examine five 200× fields from two stained liver sections from each mouse.

    • a. Count 200 total cells per field and record the number of Nras-, p21-, and p16-positively stained cells for each image.
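The dosing arithmetic and exclusion criteria in step 5 can be sketched as a few helper functions. This is an illustrative sketch only: the function names are hypothetical, and converting "10% of body weight" to a volume assumes that 1 g of body weight corresponds to 1 ml of saline.

```python
TRANSPOSON_UG = 25.0     # pT/CaggsNrasG12V or pT/CaggsNrasG12V/D38A (step 5a-i)
TRANSPOSASE_UG = 5.0     # pPGK-SB13 (step 5a-ii)
MAX_VOLUME_ML = 2.5      # upper limit on injection volume (step 5c)
MAX_INJECTION_S = 10.0   # injections taking longer are unsuccessful (step 5d-i)


def injection_volume_ml(body_weight_g: float) -> float:
    """Volume equal to 10% of body weight, capped at 2.5 ml.

    Assumes 1 g of body weight corresponds to 1 ml of saline.
    """
    return min(0.10 * body_weight_g, MAX_VOLUME_ML)


def dna_concentration_ug_per_ml(body_weight_g: float) -> float:
    """Concentration of the 30 ug plasmid mix in the final injection volume."""
    total_dna_ug = TRANSPOSON_UG + TRANSPOSASE_UG
    return total_dna_ug / injection_volume_ml(body_weight_g)


def injection_is_valid(duration_s: float, resistance_felt: bool) -> bool:
    """Apply the exclusion criteria: no resistance and no more than 10 s."""
    return (not resistance_felt) and duration_s <= MAX_INJECTION_S


if __name__ == "__main__":
    for weight_g in (18.0, 20.0, 30.0):
        volume = injection_volume_ml(weight_g)
        conc = dna_concentration_ug_per_ml(weight_g)
        print(f"{weight_g:.0f} g mouse: {volume:.2f} ml at {conc:.1f} ug/ml")
```

Recording the measured weight, computed volume, and injection duration for each animal would also cover the mouse health records listed among the deliverables.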

Deliverables

  • Data to be collected:

    1. Sequencing information and gel-verification of pT/CaggsNras plasmids.

    2. Mouse health records (weight at time of injection, duration(s) of injection, health over course of experiment, etc).

    3. Images of all stained sections, including controls. (Compare visually to Figure 2I).

    4. Raw total cell counts and Nras-, p21-, and p16-positively stained cells for each field examined.

    5. Bar graph of Nras-, p21-, and p16-positively stained cells as a percentage of total cells in field for each condition. (Compare to Figures 3B, 3C, and 3E).

  • Samples delivered for further analysis:

    1. Purified transposon/transposase plasmids will be used in Protocol 2.
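The raw field counts in deliverable 4 can be turned into the per-mouse percentages plotted in deliverable 5 along the following lines. This is a minimal sketch: averaging the field percentages within each mouse is an assumption (the protocol does not state how fields are aggregated), and the counts shown are made-up placeholders, not data.

```python
from statistics import mean


def percent_positive(positive: int, total: int) -> float:
    """Percentage of positively stained cells in one field."""
    if total <= 0:
        raise ValueError("total cell count must be positive")
    return 100.0 * positive / total


def mouse_percent_positive(fields: list[tuple[int, int]]) -> float:
    """Mean percentage across fields; each field is (positive, total).

    Averaging field percentages per mouse is an assumed aggregation rule.
    """
    return mean(percent_positive(pos, tot) for pos, tot in fields)


# Hypothetical counts for one marker in one mouse: five fields of 200 cells.
example_fields = [(30, 200), (28, 200), (34, 200), (31, 200), (27, 200)]
example_result = mouse_percent_positive(example_fields)
print(f"Mean percent positive: {example_result:.1f}%")
```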

Confirmatory analysis plan

This replication attempt will perform the statistical analyses listed below. First, the replication effect size will be computed and compared to the original effect size for each analysis. Second, the original effect size and the replication effect size will be combined into a single effect size using a meta-analytic approach and will be presented as a forest plot.

  • Statistical Analysis of the Replication Data:

    1. Three-way ANOVA (2 × 2 × 2), comparing the percent of Nras-positive cells in C.B-17 wild-type and C.B-17 SCID/beige mouse livers 6 and 30 days after stable delivery of NrasG12V or NrasG12V/D38A.

      • Two-way ANOVA comparing the percent of Nras-positive cells in C.B-17 wild-type and C.B-17 SCID/beige mouse livers 6 and 30 days after stable delivery of NrasG12V.

        • a. Planned comparisons with the Bonferroni correction:

          • i. C.B-17 wild-type mouse livers 6 days after delivery compared to C.B-17 wild-type mouse livers 30 days after delivery.

          • ii. C.B-17 wild-type mouse livers 30 days after delivery compared to C.B-17 SCID/beige mouse livers 30 days after delivery.

      • Two-way ANOVA comparing the percent of Nras-positive cells in C.B-17 wild-type and C.B-17 SCID/beige mouse livers 6 and 30 days after stable delivery of NrasG12V/D38A.

    2. Three-way ANOVA (2 × 2 × 2), comparing the percent of p21-positive cells in C.B-17 wild-type mouse livers vs C.B-17 SCID/beige mouse livers 6 and 30 days after stable delivery of NrasG12V or NrasG12V/D38A.

      • Two-way ANOVA comparing the percent of p21-positive cells in C.B-17 wild-type and C.B-17 SCID/beige mouse livers 6 and 30 days after stable delivery of NrasG12V.

        • a. Planned comparisons with the Bonferroni correction:

          • i. C.B-17 wild-type mouse livers 6 days after delivery compared to C.B-17 wild-type mouse livers 30 days after delivery.

          • ii. C.B-17 wild-type mouse livers 30 days after delivery compared to C.B-17 SCID/beige mouse livers 30 days after delivery.

      • Two-way ANOVA comparing the percent of p21-positive cells in C.B-17 wild-type and C.B-17 SCID/beige mouse livers 6 and 30 days after stable delivery of NrasG12V/D38A.

    3. Three-way ANOVA (2 × 2 × 2), comparing the percent of p16-positive cells in C.B-17 wild-type mouse livers vs C.B-17 SCID/beige mouse livers 6 and 30 days after stable delivery of NrasG12V or NrasG12V/D38A.

      • Two-way ANOVA comparing the percent of p16-positive cells in C.B-17 wild-type and C.B-17 SCID/beige mouse livers 6 and 30 days after stable delivery of NrasG12V.

        • a. Planned comparisons with the Bonferroni correction:

          • i. C.B-17 wild-type mouse livers 6 days after delivery compared to C.B-17 wild-type mouse livers 30 days after delivery.

          • ii. C.B-17 wild-type mouse livers 30 days after delivery compared to C.B-17 SCID/beige mouse livers 30 days after delivery.

      • Two-way ANOVA comparing the percent of p16-positive cells in C.B-17 wild-type and C.B-17 SCID/beige mouse livers 6 and 30 days after stable delivery of NrasG12V/D38A.

Known differences from the original study

The replication will only include evaluation of mice at days 6 and 30 post-injection. The original study also included evaluations at days 12 and 60 post-injection, as well as further analysis of liver tissue at 7 months post-injection. The replication compares only wild-type and SCID/beige mice; SCID mice were also included in the original study. The replication includes staining for Nras, p16, and p21. The original study also included pERK staining. All known differences in reagents are noted in the Comments column of the materials and reagents section above. The replacement reagents have the same capabilities as the originals and are not expected to alter the experimental design.

Provisions for quality control

Each of the transposon and transposase plasmids will be verified for sequence identity and DNA integrity. All immunohistochemistry experiments will include isotype antibody controls and secondary-only controls. All mice will be handled and housed in accordance with the Institutional Animal Care and Use Committee (IACUC) and the University of California-Davis. Exclusion criteria for successful hydrodynamic injections will be adhered to, as detailed above. According to the original authors, there is an expected failure rate of ∼20% for hydrodynamic injection. Therefore, we are including extra animals in each treatment group so that we can maintain the sample sizes necessary to achieve high statistical power (see ‘Power calculations’). All of the raw data, including immunohistochemistry controls and complete mouse health records, will be uploaded to the project page on the OSF (https://osf.io/82nfe/) and made publicly available.

Protocol 2: determining the significance of CD4+ T lymphocyte cells in senescence surveillance

This protocol assesses the specific necessity for CD4+ T lymphocytes in immune surveillance of pre-malignant senescent hepatocytes. Oncogenic NrasG12V and non-oncogenic NrasG12V/D38A will be intrahepatically delivered to either wild-type or CD4−/− mice via hydrodynamic injection. Senescence surveillance will be determined by measuring the clearance of Nras-positive cells from liver tissue sections 12 days following injection. This protocol replicates experiments reported in Figure 4A.

Sampling

  1. These experiments will analyze a minimum of five mice per treatment group, for a total power of 94.6%.

    • See ‘Power calculations’ section for details.

    • In order to account for variability in hydrodynamic injections that might contribute to exclusion of study animals (estimated as a 20% failure rate by Kang et al.), the initial starting sample size will be seven mice per treatment group. The inclusion of two additional animals will help ensure that at least five animals are available for analysis at the conclusion of the study.

  2. Outline of experimental conditions: (tissue processed at 12 days post-injection).

    • 14 female C57/BL6 (H-2b) wild-type mice (4–6 weeks old).

      • a. Seven mice injected with pPGK-SB13 + pT/CaggsNrasG12V.

      • b. Seven mice injected with pPGK-SB13 + pT/CaggsNrasG12V/D38A.

    • 14 female C57/BL6 CD4−/− (4–6 weeks old).

      • a. Seven mice injected with pPGK-SB13 + pT/CaggsNrasG12V.

      • b. Seven mice injected with pPGK-SB13 + pT/CaggsNrasG12V/D38A.

Materials and reagents

Reagent | Type | Manufacturer | Catalog # | Comments
pPGK-SB13 transposase vector | DNA Vector | N/A | N/A | Original reagent obtained from authors
pT/CaggsNrasG12V transposon vector | DNA Vector | N/A | N/A | Original reagent obtained from authors
pT/CaggsNrasG12V/D38A transposon vector | DNA Vector | N/A | N/A | Original reagent obtained from authors
Ketamine HCl | Anesthetic | | | Specific brand information will be left up to the discretion of the provider lab and recorded later
Acepromazine maleate | Anesthetic | | |
Butorphanol tartrate | Anesthetic | | |
Dulbecco's Phosphate Buffered Saline | Reagent | Sigma–Aldrich | D1408 | Original brand not specified
Paraformaldehyde | Reagent | Sigma–Aldrich | 158127 | Original brand not specified
C57/BL6J (H-2b) | Mouse line | Jackson Laboratory | Strain #000664 |
C57/BL6 CD4−/− (H-2b) (B6.129S2-CD4tm1Mak/J) | Mouse line | Jackson Laboratory | Strain #002663 |
Mouse anti-Nras (F155) | Antibody | Santa Cruz | Sc-31 |
Biotinylated goat anti-mouse IgG1-conjugate | Antibody | Thermo-Fisher | TM-060-BN |
IgG1 mouse isotype control antibody | Antibody | Sigma–Aldrich | M5284 |
Diaminobenzidine (DAB) | IHC Stain | | | Specific brand information will be left up to the discretion of the provider lab and recorded later
Haematoxylin | IHC Stain | | |
Eosin | IHC Stain | | |

Procedure

  1. Obtain 4-week-old female C57/BL6 wild-type and C57/BL6 CD4−/− mice.

    • a. House mice in individually ventilated cages (IVC) that are specific pathogen free (SPF). Allow mice 1 week to acclimate to new housing prior to injection.

    • b. Mice should be randomized into strain-based treatment and control groups and co-housed so that mice of the same strain receiving either NrasG12V or NrasG12V/D38A injections are housed together.

  2. Using plasmids prepared in Protocol 1, randomly inject 5-week-old mice with transposon/transposase. Use detailed protocol provided in Protocol 1, as well as in the reference (Bell et al., 2007).

  3. At 12 days post-injection of DNA, harvest liver tissue from mice.

    • a. Tissues should be randomized and blinded and shipped to Reveal Biosciences for downstream analysis.

  4. Fix, process, embed, and section tissues as described in Protocol 1.

  5. Perform immunohistochemistry staining on each tissue section with the following:

    • a. Primary antibody for N-ras.

      • i. Mouse anti-Nras, 1:100 dilution.

    • b. Secondary antibody.

      • i. Biotinylated goat anti-mouse IgG1-conjugate.

    • c. Isotype control primary antibody + secondary antibody.

      • i. IgG1 mouse isotype control antibody.

    • d. Secondary antibody only.

  6. Blindly examine five 200× fields from two stained liver sections from each mouse.

    • a. Count 200 total cells per field and record number of Nras-positively stained cells for each image.

Deliverables

  • Data to be collected:

    1. Mouse records (weight at time of injection, duration(s) of injection, health over course of experiment, etc).

    2. Images of all stained sections, including controls.

    3. Raw total cell counts and Nras-positively stained cells for each field examined.

    4. Graph of Nras-positively stained cells as a percentage of total cells in field for each condition. (Compare to Figure 4A).

Confirmatory analysis plan

This replication attempt will perform the statistical analyses listed below. First, the replication effect size will be computed and compared to the original effect size for each analysis. Second, the original effect size and the replication effect size will be combined into a single effect size using a meta-analytic approach and will be presented as a forest plot.

  • Statistical analysis of the replication data:

    1. Two-way ANOVA, comparing the percent of Nras-positive cells in both wild-type and CD4−/− mice injected with NrasG12V vs wild-type and CD4−/− mice injected with NrasG12V/D38A.

      • Planned comparisons with the Bonferroni correction.

        • a. The percent of Nras-positive cells in wild-type mice injected with NrasG12V compared to the percent of Nras-positive cells in wild-type mice injected with NrasG12V/D38A.

        • b. The percent of Nras-positive cells in wild-type mice injected with NrasG12V compared to the percent of Nras-positive cells in CD4−/− mice injected with NrasG12V.

    2. Note: In order to enable a direct comparison to the original data, an unpaired, two-tailed t-test (performed outside the framework of an ANOVA), comparing the percent of Nras-positive cells in wild-type mice injected with NrasG12V vs CD4−/− mice injected with NrasG12V will also be performed.
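For orientation, the unpaired t statistic and the Bonferroni-adjusted threshold for the two planned comparisons can be computed from summary statistics as sketched below. The inputs are the original Figure 4A values tabulated in the Power calculations section (wild-type NrasG12V: 6.1 ± 1.7, n = 5; CD4−/− NrasG12V: 18.6 ± 3.7, n = 5). The function names are illustrative, equal variances are assumed, and the p-value is deliberately left to a statistics package.

```python
from math import sqrt


def pooled_t(mean1: float, sd1: float, n1: int,
             mean2: float, sd2: float, n2: int) -> tuple[float, int]:
    """Student's unpaired t statistic and degrees of freedom from summary data.

    Assumes equal variances in the two groups.
    """
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df
    std_err = sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / std_err, df


def bonferroni_alpha(alpha: float, n_comparisons: int) -> float:
    """Per-comparison significance threshold under the Bonferroni correction."""
    return alpha / n_comparisons


# CD4-/- vs wild-type, both injected with NrasG12V (original Figure 4A summary).
t_stat, df = pooled_t(18.6, 3.7, 5, 6.1, 1.7, 5)
alpha_per_test = bonferroni_alpha(0.05, 2)

print(f"t({df}) = {t_stat:.3f}, per-comparison alpha = {alpha_per_test}")
```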

Known differences from the original study

The replication will only assess genetically null CD4−/− mice and wild-type controls at a time point of 12 days following intrahepatic injections. The original study also evaluated α-CD4 antibody-depleted mice, as well as genetically null and antibody-depleted CD8−/− mice and genetically null Cd1d−/− mice. The original study also evaluated CD4−/− mice and wild-type mice 7 months after injection. All known differences in reagents are noted in the Comments column of the materials and reagents section above. The replacement reagents have the same capabilities as the originals and are not expected to alter the experimental design.

Provisions for quality control

Each of the transposon and transposase plasmids will be verified for sequence identity and DNA integrity. All immunohistochemistry experiments will include isotype antibody controls and secondary-only controls. All mice will be handled and housed in accordance with the Institutional Animal Care and Use Committee (IACUC) and the University of California-Davis. Exclusion criteria for successful hydrodynamic injections will be adhered to, as detailed above. According to the original authors, there is an expected failure rate of ∼20% for hydrodynamic injection. Therefore, we are including extra animals in each treatment group so that we can maintain the sample sizes necessary to achieve high statistical power (see ‘Power calculations’). All of the raw data, including immunohistochemistry controls and complete mouse health records, will be uploaded to the project page on the OSF (https://osf.io/82nfe/) and made publicly available.

Power calculations

Protocol 1

Summary of original data from Figure 3B (Kang et al., 2011):

Nras-positive hepatocytes (Figure 3B) | Mean | SD | N
CB17 wild-type injected with NrasG12V (Day 6) | 15.4 | 3.9 | 4
CB17 wild-type injected with NrasG12V (Day 30) | 1.2 | 0.7 | 4
SCID/beige injected with NrasG12V (Day 6) | 16.4 | 2.8 | 4
SCID/beige injected with NrasG12V (Day 30) | 12.8 | 2.9 | 4

Test family

2-way ANOVA: Fixed effects, special, main effects and interactions, alpha error = 0.05.

  • Power calculations were performed with effects reported in original study using G*Power software (version 3.1.7) (Faul et al., 2007).

  • ANOVA F statistic calculated with GraphPad Prism 6.0.

  • Partial η2 calculated from Lakens (2013).
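The interaction F statistic, partial η2, and effect size f for a balanced 2 × 2 design can be recovered from cell means, SDs, and the per-cell N. The sketch below applies the standard balanced-design formulas to the Figure 3B summary data above; it is an illustration of the arithmetic, not the GraphPad Prism or G*Power procedure itself, and the function name is hypothetical.

```python
from math import sqrt


def interaction_stats(cells, n_per_cell: int):
    """Interaction test for a balanced 2 x 2 design from summary statistics.

    cells = ((m11, sd11), (m12, sd12), (m21, sd21), (m22, sd22)),
    ordered as strain 1 at time 1, strain 1 at time 2, strain 2 at time 1,
    strain 2 at time 2.
    """
    (m11, s11), (m12, s12), (m21, s21), (m22, s22) = cells
    contrast = m11 - m12 - m21 + m22
    ss_interaction = n_per_cell * contrast**2 / 4   # 1 degree of freedom
    ms_error = (s11**2 + s12**2 + s21**2 + s22**2) / 4
    df_error = 4 * (n_per_cell - 1)
    f_stat = ss_interaction / ms_error
    partial_eta_sq = f_stat / (f_stat + df_error)   # F*df1 / (F*df1 + df2), df1 = 1
    cohen_f = sqrt(partial_eta_sq / (1 - partial_eta_sq))
    return f_stat, df_error, partial_eta_sq, cohen_f


# Figure 3B summary data (mean, SD), N = 4 per cell.
nras_cells = ((15.4, 3.9), (1.2, 0.7), (16.4, 2.8), (12.8, 2.9))
f_stat, df_error, eta_sq, cohen_f = interaction_stats(nras_cells, 4)

print(f"F(1, {df_error}) = {f_stat:.4f}")
print(f"partial eta squared = {eta_sq:.6f}, effect size f = {cohen_f:.6f}")
```

The effect size f is the quantity entered into G*Power; the power itself depends on the noncentral F distribution and is not computed here.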

Power calculations for replication

Group | F (DFn, DFd) | Partial η2 | Effect size f | A priori power | Total sample size
NrasG12V | F(1, 12) = 14.0670 (interaction) | 0.539648 | 1.082705 | 90.6%* | 12* (3/group)

* 20 total (5/group) will be used based on the p21 planned comparison calculations, making the power 99.5%.

Test family

  • 2 tailed t test, difference between two independent means, Bonferroni's correction: alpha error = 0.025.

Power calculations (performed with G*Power software, version 3.1.7 [Faul et al., 2007]).

Group 1 | Group 2 | Effect size d | A priori power | Group 1 sample size | Group 2 sample size
CB17/G12V/Day 6 | CB17/G12V/Day 30 | 5.068197 | 96.5%* | 3* | 3*
CB17/G12V/Day 30 | SCIDBeige/G12V/Day 30 | 5.498927 | 98.3%* | 3* | 3*

* Five in each group will be used based on the p21 planned comparison calculations, making the power 99.9%.
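The effect sizes d for the two planned comparisons (wild-type day 6 vs wild-type day 30, and wild-type day 30 vs SCID/beige day 30) follow from the Figure 3B summary data. The sketch below assumes Cohen's d with the SD pooled as the root mean square of the two group SDs, which is appropriate for equal group sizes; the function name is illustrative.

```python
from math import sqrt


def cohens_d(mean1: float, sd1: float, mean2: float, sd2: float) -> float:
    """Cohen's d for two equal-sized groups.

    The pooled SD is the root mean square of the two group SDs.
    """
    pooled_sd = sqrt((sd1**2 + sd2**2) / 2)
    return abs(mean1 - mean2) / pooled_sd


# Figure 3B summary data for NrasG12V-injected mice (mean, SD).
d_wt6_vs_wt30 = cohens_d(15.4, 3.9, 1.2, 0.7)
d_wt30_vs_scid30 = cohens_d(1.2, 0.7, 12.8, 2.9)

print(f"Wild-type day 6 vs day 30: d = {d_wt6_vs_wt30:.6f}")
print(f"Wild-type day 30 vs SCID/beige day 30: d = {d_wt30_vs_scid30:.6f}")
```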

Summary of original data from Figure 3C (Kang et al., 2011):

p21-positive hepatocytes (Figure 3C) | Mean | SD | N
CB17 wild-type injected with NrasG12V (Day 6) | 15.5 | 2.0 | 4
CB17 wild-type injected with NrasG12V (Day 30) | 0.8 | 1.7 | 4
SCID/beige injected with NrasG12V (Day 6) | 15.3 | 1.9 | 4
SCID/beige injected with NrasG12V (Day 30) | 8.8 | 3.9 | 4

Test family

2-way ANOVA: Fixed effects, special, main effects and interactions, alpha error = 0.05.

  • Power calculations were performed with effects reported in original study using G*Power software (version 3.1.7) (Faul et al., 2007).

  • ANOVA F statistic calculated with GraphPad Prism 6.0.

  • Partial η2 calculated from Lakens (2013).

Power calculations for replication

Group | F (DFn, DFd) | Partial η2 | Effect size f | A priori power | Total sample size
NrasG12V | F(1, 12) = 10.4613 (interaction) | 0.465748 | 0.9336894 | 80.8%* | 12* (3/group)

* 20 total (5/group) will be used based on the planned comparison calculations, making the power 97.5%.

Test family

  • 2 tailed t test, difference between two independent means, Bonferroni's correction: alpha error = 0.025.

Power calculations (performed with G*Power software, version 3.1.7 [Faul et al., 2007]).

| Group 1 | Group 2 | Effect size d | A priori power | Group 1 sample size | Group 2 sample size |
|---|---|---|---|---|---|
| CB17/G12V/Day 6 | CB17/G12V/Day 30 | 7.919955 | 99.9%* | 3* | 3* |
| CB17/G12V/Day 6 | SCIDBeige/G12V/Day 30 | 2.659290 | 89.8% | 5 | 5 |

* Five in each group will be used based on the planned comparison calculations making the power 99.9%.
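The planned-comparison effect sizes can be checked against the Figure 3C summary data. This is a minimal sketch assuming Cohen's d with a pooled SD for equal group sizes; the helper name `cohens_d` is illustrative. Under this formula the first tabulated value is reproduced by the CB17 Day 6 versus Day 30 groups, and the second tabulated value, 2.659290, is what the formula returns for the two Day 30 groups (CB17 versus SCID/beige).

```python
import math

def cohens_d(mean1, sd1, mean2, sd2):
    """Cohen's d for two independent groups of equal size."""
    # With equal n, the pooled variance is the plain average of the two variances.
    pooled_sd = math.sqrt((sd1 ** 2 + sd2 ** 2) / 2)
    return abs(mean1 - mean2) / pooled_sd

# Two planned comparisons share a familywise alpha of 0.05 (Bonferroni).
BONFERRONI_ALPHA = 0.05 / 2

d_cb17_time = cohens_d(15.5, 2.0, 0.8, 1.7)     # CB17 Day 6 vs CB17 Day 30
d_day30_strain = cohens_d(0.8, 1.7, 8.8, 3.9)   # CB17 Day 30 vs SCID/beige Day 30

print(f"alpha per comparison = {BONFERRONI_ALPHA}")
print(f"d (CB17, Day 6 vs Day 30) = {d_cb17_time:.6f}")
print(f"d (Day 30, CB17 vs SCID/beige) = {d_day30_strain:.6f}")
```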

Summary of original data from Figure 3E (Kang et al., 2011):

| p16-positive hepatocytes (Figure 3E) | Mean | SD | N |
|---|---|---|---|
| CB17 wild-type injected with NrasG12V (Day 6) | 14.5 | 1.9 | 4 |
| CB17 wild-type injected with NrasG12V (Day 30) | 0 | 0 | 4 |
| SCID/beige injected with NrasG12V (Day 6) | 15.8 | 2.0 | 4 |
| SCID/beige injected with NrasG12V (Day 30) | 10.6 | 4.0 | 4 |

Test family

2-way ANOVA: Fixed effects, special, main effects and interactions, alpha error = 0.05.

  • Power calculations were performed with effects reported in original study using G*Power software (version 3.1.7) (Faul et al., 2007).

  • ANOVA F statistic calculated with GraphPad Prism 6.0.

  • Partial η² calculated from Lakens (2013).

Power calculations for replication

| Group | F (DFn, DFd) | Partial η² | Effect size f | A priori power | Total sample size |
|---|---|---|---|---|---|
| NrasG12V | F(1, 12) = 14.6531 (interaction) | 0.549771 | 1.105030 | 91.6%* | 12* (3/group) |

* 20 total (5/group) will be used based on the p21 planned comparison calculations making the power 99.6%.

Test family

  • 2 tailed t test, difference between two independent means, Bonferroni's correction: alpha error = 0.025.

Power calculations (performed with G*Power software, version 3.1.7 [Faul et al., 2007]).

| Group 1 | Group 2 | Effect size d | A priori power | Group 1 sample size | Group 2 sample size |
|---|---|---|---|---|---|
| CB17/G12V/Day 6 | CB17/G12V/Day 30 | 10.79268 | 99.9%* | 3* | 3* |
| CB17/G12V/Day 6 | SCIDBeige/G12V/Day 30 | 3.747666 | 80.1% | 3 | 3 |

* Five in each group will be used based on the p21 planned comparison calculations making the power 99.9%.

Five in each group will be used based on the p21 planned comparison calculations making the power 99.6%.

Protocol 2

Summary of original data from Figure 4A (Kang et al., 2011):

| Nras-positive hepatocytes (Figure 4A) | Mean | SD | N |
|---|---|---|---|
| BL/6 wild-type injected with NrasG12V | 6.1 | 1.7 | 5 |
| BL/6 wild-type injected with NrasG12V/D38A | 16.2 | 2.3 | 5 |
| CD4−/− injected with NrasG12V | 18.6 | 3.7 | 5 |
| CD4−/− injected with NrasG12V/D38A | 18.4 | 4.9 | 5 |

Test family

2-way ANOVA: Fixed effects, special, main effects, and interactions, alpha error = 0.05.

  • Power calculations were performed with effects reported in original study using G*Power software (version 3.1.7) (Faul et al., 2007).

  • ANOVA F statistic calculated with GraphPad Prism 6.0.

  • Partial η² calculated from Lakens (2013).

Power calculations for replication

| F (DFn, DFd) | Partial η² | Effect size f | A priori power | Total sample size |
|---|---|---|---|---|
| F(1, 16) = 11.5617 (interaction) | 0.419484 | 0.850062 | 87.7% | 16* (4/group) |

* 20 total (5/group) will be used to account for additional variance making the power 94.6%.
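As a worked check, the Figure 4A interaction statistics follow from the summary data by hand. The derivation below is a sketch that assumes a balanced 2×2 design with n = 5 per cell and the usual 1-df interaction contrast L; the symbols L, SS, and MS are notation introduced here, not taken from the original report.

```latex
\begin{align*}
L &= (6.1 - 16.2) - (18.6 - 18.4) = -10.3 \\
SS_{\text{interaction}} &= \frac{n L^{2}}{4} = \frac{5 \times 10.3^{2}}{4} = 132.6125 \\
MS_{\text{error}} &= \frac{1.7^{2} + 2.3^{2} + 3.7^{2} + 4.9^{2}}{4} = 11.47 \\
F(1, 16) &= \frac{SS_{\text{interaction}}}{MS_{\text{error}}} = \frac{132.6125}{11.47} \approx 11.5617 \\
\eta_{p}^{2} &= \frac{F}{F + 16} \approx 0.419484 \\
f &= \sqrt{\frac{\eta_{p}^{2}}{1 - \eta_{p}^{2}}} \approx 0.850062
\end{align*}
```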

Test family

  • 2 tailed t test, difference between two independent means, Bonferroni's correction: alpha error = 0.025.

Power calculations (Performed with G*Power software, version 3.1.7 [Faul et al., 2007]).

| Group 1 | Group 2 | Effect size d | A priori power | Group 1 sample size | Group 2 sample size |
|---|---|---|---|---|---|
| BL/6 G12V | BL/6 G12V/D38A | 4.994129 | 96.0%* | 3* | 3* |
| BL/6 G12V | CD4−/− G12V | 4.341429 | 90.0%* | 3* | 3* |

* Five in each group will be used to account for additional variance making the power 99.9%.

References

  15. Swann JB, Smyth MJ (2007). Immune surveillance of tumors. The Journal of Clinical Investigation 117:1137–1146. https://doi.org/10.1172/JCI31405

Decision letter

  1. Ronald N Germain
    Reviewing Editor; National Institute of Allergy and Infectious Diseases, United States

eLife posts the editorial decision letter and author response on a selection of the published articles (subject to the approval of the authors). An edited version of the letter sent to the authors after peer review is shown, indicating the substantive concerns or comments; minor concerns are not usually shown. Reviewers have the opportunity to discuss the decision before the letter is sent (see review process). Similarly, the author response typically shows only responses to the major concerns raised by the reviewers.

Thank you for sending your work entitled “Registered report: Senescence surveillance of pre-malignant hepatocytes limits liver cancer development” for consideration at eLife. Your article has been favorably evaluated by Sean Morrison (Senior editor), a Reviewing editor, and 4 reviewers, one of whom is a statistician.

The Reviewing editor and the reviewers discussed their comments before we reached this decision, and the Reviewing editor has assembled the following comments to help you prepare a revised submission. All believe this is a generally very well-conceived plan for replication, but also raised some specific issues that need to be addressed before a protocol is finalized, and the registered report accepted for publication in eLife.

Major issues to be addressed in a revised submission:

Technical points:

1) As the proposed reproduction experiments involve hydrodynamic tail vein injection, it is crucial that these experiments are conducted in a laboratory highly experienced with this technique. It is important to note that the efficiency of DNA uptake is influenced by blood pressure and heart rate. Preferably, the technique is conducted without anesthesia, as in this way there will be no drop in blood pressure, and the cardiovascular system of the mouse can fully adapt to the volume challenge by hydrodynamic tail vein injection. In any case, two studies can only be compared if the same anesthesia is used. The anesthesia proposed for the reproduction experiment involves, in addition to ketamine, acepromazine. As acepromazine is known to induce vasodilation and decrease blood pressure, it cannot be ruled out that acepromazine impacts the efficiency of hydrodynamic DNA delivery. This is important to keep in mind, as in the original paper by Kang et al. I could not find information that acepromazine was used.

2) To ensure comparability of two studies it is furthermore important that mice of the same age and weight are subjected to hydrodynamic delivery. Kang et al. describe that mice 4–6 weeks of age were used for hydrodynamic delivery while in the outline of the reproducibility project it is stated that mice 4–6 weeks or 6–8 weeks of age are ordered and used after one week of adjustment time. It should be ensured that mice are injected at the same age as described by Kang et al. Likewise, the hydrodynamic injections are a critical step in this experiment. The researcher who conducts these injections should learn this method in a laboratory in which this protocol is well-established, if possible the Zender lab itself.

3) Even though the mice are being caged under SPF conditions, there is no evidence that the research team will test for the presence of viruses that are largely confined to mice with highly significant immunodeficiencies. These can often have significant effects on the anti-tumor activity of innate immunity. At a minimum, the investigators should test for the presence of Norovirus in their SCID/Beige mice as a marker for such infections, especially since Norovirus is found in many SPF colonies across the country. The experimental groups should also be co-caged to control, at least partially, for strain-specific microbiota that can also have profound influences on natural anti-tumor activity.

4) With respect to Protocol 2, Kang et al. went to extreme lengths to carefully look at the role of class II restricted CD4 T cells, including using CD4−/− mice, Class II–/– mice, and wt mice depleted of CD4+ cells using CD4 specific mAb. All of their results agree that CD4+ T cells are playing a major role in surveillance. However, in the proposed study, the research team only proposes to use CD4−/− mice. There may be effectors or Tregs that emerge from these mice that can still influence the response, and at least one more of the previously employed models of mice lacking CD4 T cells should also be used to substantiate the effects that may or may not be seen with CD4−/− mice.

Statistical issues:

1) The authors state that: “The original authors of the study performed this [statistical] analysis, and we are therefore including it in the replication analysis plan. We are also including the more statistically appropriate tests detailed below”. The used wording implies that the authors of the replication study regard the statistical analyses by Kang et al. as not adequate. More detailed information should be provided as to why the authors think the analyses by Kang et al. are not adequate.

2) Each protocol has statistics mentioned in the confirmatory analysis plan and the power calculation sections. In protocol 1 the analyses will be two-way ANOVA applied to three datasets. In each ANOVA are the effects treatment and time? The summary data from the original report suggests this. Means and SD are reported for each of the four groups within each experiment but the factor specific effects are not reported. A two-way ANOVA can consist of up to three tests, the treatment, the time, and the treatment–time interaction. The power calculations use a single effect size, which we do not understand. As the experiment is trying to replicate an already detected effect, then each such effect can be clearly stated. The power calculation is reported for one degree of freedom suggesting only one of these terms (treatment?) is expected to be significant though time clearly seems to have a strong effect in one group.

3) The plans for protocol 2 are easier to understand in that simple one-way ANOVAs and t-tests are planned, though with just 5 per group this will rely on the very strong assumption of normality. Prior experience involving tumor in vivo studies shows that such experiments have considerable mouse-to-mouse variation. It would be a shame if conclusions are limited by too small sample numbers, and in light of the statistical issue just raised, it seems important to increase the animal numbers per group to n = 10–15 because of the low projected 81% confidence level at n = 5. Also, the power calculations seem to justify 3 in each sample, yet 5 in each group are planned. The authors need to better explain the power calculations that led them to conclude that only 3 animals per group would be sufficient and why they then chose 5 animals per group. Is this to cover some results lost due to failure? In any case, the power calculations for these comparisons need to state what specific effect sizes will be viewed as a replication of the original study.

4) In a related vein, in a replication study it is important to be clear what specific cut offs will be used to declare a replication or lack of replication. So the explicit effect sizes for each factor and for each of the 6 experiments need to be stated. While it is a good idea to combine the two results (original and replication) in a forest plot, this will give an inflated estimate of any effect sizes due to the publication selection operating on the first article. The decision regarding whether there has been replication should be based on the results of the replication study alone. So it may be that smaller effect sizes than those reported in the original article would still be viewed as strong evidence of the anticipated effect. In such a case, as stated above (point #3), a larger sample size would be justified and should be strongly considered at this stage.

5) Regarding the power calculations, the proposed power calculations use a combination of data reported in the original study and data from an alternative relevant study. This is fine at this stage, however, we suggest the following improvements:

a) Cross-study variation should be taken into account to determine expected power in the proposed study, since loss of power can be expected from the original study. This is hard to estimate, but papers by Giovanni Parmigiani and collaborators at the Dana–Farber provide some estimates about cross-study variation that could be used for this purpose. The authors should budget some additional variability because of cross-study reproducibility, and increase the sample size on-the-fly, as they deem appropriate.

b) The final report on the replicated study should report the actual power of the tests, based on numerical summaries of the data in the replicated study.

[Editors' note: further revisions were requested prior to acceptance, as described below.]

Thank you for resubmitting your article entitled “Registered report: Senescence surveillance of pre-malignant hepatocytes limits liver cancer development” for further consideration at eLife. Your revised Registered report has been favorably evaluated by Sean Morrison (Senior editor), a member of the Board of Reviewing Editors, and by someone with relevant expertise in statistics.

The Registered report is essentially ready for acceptance, but we would ask you to respond to the remaining statistical issues and make additional minor modifications as needed:

1) We do not think it is a good idea to report a combined estimate of any effect size with the original. However, displaying results on a forest plot is helpful.

2) With respect to the sample size calculations for the two-way ANOVA, have you used the within group SDs instead of the within group variances in the formulae? If so, this means that you would be over-estimating the power. With 5 in each group the authors should just have 80% power to replicate what was seen in the original study.

https://doi.org/10.7554/eLife.04105.002

Author response

1) As the proposed reproduction experiments involve hydrodynamic tail vein injection, it is crucial that these experiments are conducted in a laboratory highly experienced with this technique. It is important to note that the efficiency of DNA uptake is influenced by blood pressure and heart rate. Preferably, the technique is conducted without anesthesia, as in this way there will be no drop in blood pressure, and the cardiovascular system of the mouse can fully adapt to the volume challenge by hydrodynamic tail vein injection. In any case, two studies can only be compared if the same anesthesia is used. The anesthesia proposed for the reproduction experiment involves, in addition to ketamine, acepromazine. As acepromazine is known to induce vasodilation and decrease blood pressure, it cannot be ruled out that acepromazine impacts the efficiency of hydrodynamic DNA delivery. This is important to keep in mind, as in the original paper by Kang et al. I could not find information that acepromazine was used.

We thank the reviewers for this suggestion. According to correspondence with the original authors, Kang and colleagues followed the protocol of Bell et al. (2007) to perform their hydrodynamic injections. This protocol clearly outlines a drug cocktail of 8 mg ml−1 ketamine HCl, 0.1 mg ml−1 acepromazine maleate and 0.01 mg ml−1 butorphanol tartrate. We agree with the reviewers that deviating from the exact drug cocktail used in the original study may skew the results; therefore, we will also follow the protocol of Bell et al. and use the same anesthetics as used in the original study.

We have included a biographical sketch of the scientists directly in charge of performing the hydrodynamic injections. Ms. Lynette Bower has over ten years of experience in laboratory animal procedures, and is certified proficient in performing HPTV injections. She has performed over 50 injections on four separate projects, with a >80% success rate. Ms. Bower will be overseen by Dr. Kristin Evans, who is one of the directors of the Mouse Biology Program at the University of California, Davis.

To be conservative in our approach, we have increased the sample sizes for each cohort of mice receiving hydrodynamic injections from 5 animals to 7. In this way, we feel confident that we will have enough end-point animals to achieve high statistical power in our analyses.

2) To ensure comparability of two studies it is furthermore important that mice of the same age and weight are subjected to hydrodynamic delivery. Kang et al. describe that mice 4–6 weeks of age were used for hydrodynamic delivery while in the outline of the reproducibility project it is stated that mice 4–6 weeks or 6–8 weeks of age are ordered and used after one week of adjustment time. It should be ensured that mice are injected at the same age as described by Kang et al. Likewise, the hydrodynamic injections are a critical step in this experiment. The researcher who conducts these injections should learn this method in a laboratory in which this protocol is well-established, if possible the Zender lab itself.

We thank the reviewers for calling our attention to this apparent discrepancy. While our initial correspondence with Kang and colleagues implied that two age ranges of mice were used, subsequent communication with Dr. Tae Won Kang has revealed that for the particular experiments outlined in this replication, all mice should be 4–6 weeks of age.

We have clarified the exact ages of mice that will be used for each protocol in the Registered report. For both Protocol 1 and 2, female mice will be ordered at 4 weeks of age, acclimated for 1 week, and injected at 5 weeks of age. The precise ages and weights of mice at the time of injection will be documented.

With regard to performing hydrodynamic injections, we will follow the procedures outlined in Bell et al. (2007), as followed by Kang and colleagues. The replicating scientists are highly experienced in the practice of hydrodynamic injections (please see above response). Additionally, we are increasing the sample sizes for all animals receiving injections, so as to ensure the statistical power of our downstream analyses (see above response).

3) Even though the mice are being caged under SPF conditions, there is no evidence that the research team will test for the presence of viruses that are largely confined to mice with highly significant immunodeficiencies. These can often have significant effects on the anti-tumor activity of innate immunity. At a minimum, the investigators should test for the presence of Norovirus in their SCID/Beige mice as a marker for such infections, especially since Norovirus is found in many SPF colonies across the country. The experimental groups should also be co-caged to control, at least partially, for strain-specific microbiota that can also have profound influences on natural anti-tumor activity.

We thank the reviewers for these suggestions. We have attached a sample health report from the University of California–Davis Mouse Biology Program (UCD-MBP) that details the testing parameters in their vivarium. As noted, murine norovirus (MNV) is routinely tested for via serum ELISA. UCD-MBP has not detected any pathogens in their colonies.

We agree with the reviewers’ suggestion to co-cage experimental and control groups of mice within the same strain. Therefore, we have updated the Registered report to indicate that mice of the same strain receiving either NrasG12V or NrasG12V/D38A will be co-caged both before and after receiving injections. As far as co-caging mice between strains, we believe this introduces a major procedural difference that did not occur in the original study, as strain mixing may introduce variation in mouse behavior and/or microbiota (as noted by the reviewer). To minimize such differences in our replication, we will continue to house mice strain-specifically.

4) With respect to Protocol 2, Kang et al. went to extreme lengths to carefully look at the role of class II restricted CD4 T cells, including using CD4−/− mice, Class II–/– mice, and wt mice depleted of CD4+ cells using CD4 specific mAb. All of their results agree that CD4+ T cells are playing a major role in surveillance. However, in the proposed study, the research team only proposes to use CD4−/− mice. There may be effectors or Tregs that emerge from these mice that can still influence the response, and at least one more of the previously employed models of mice lacking CD4 T cells should also be used to substantiate the effects that may or may not be seen with CD4−/− mice.

We thank the reviewers for this insightful comment, and we agree that this is an important issue to consider during data interpretation. Although we plan to obtain mice from the same vendor as the original authors (Jackson Laboratory), there may be subtle differences that emerge even from seemingly genetically identical animals. The raised concern may be a reason why the experimental effect, or size of the effect, is not replicated and thus will be acknowledged as a potential factor in the Replication Study. Alternatively, the raised concern may result in the same observed effect, but not due to the role of CD4 cells. This too will be acknowledged in the Replication Study.

We recognize that all of the experiments included in the original study are important, and choosing which experiments to replicate has been one of the great challenges of this project. We agree that the exclusion of certain experiments limits the scope of what can be analyzed by the project, but we are attempting to balance breadth of sampling for general inference with sensible investment of resources on replication projects. The Reproducibility Project: Cancer Biology aims to replicate a selection of experiments as faithfully as possible, not necessarily the main conclusions that are drawn from the many experiments in any given paper.

Statistical issues:

1) The authors state that: “The original authors of the study performed this [statistical] analysis, and we are therefore including it in the replication analysis plan. We are also including the more statistically appropriate tests detailed below”. The used wording implies that the authors of the replication study regard the statistical analyses by Kang et al. as not adequate. More detailed information should be provided as to why the authors think the analyses by Kang et al. are not adequate.

We have updated the language in the Registered report to better reflect the strategy for this analysis. As outlined in Nieuwenhuis et al. (2011), the authors did not report an interaction effect (Nras mutational status–immune defect), which the researchers would have needed to report as significant to support their claim. Thus, we will be performing the two-way ANOVA to determine if the interaction is significant before performing pair-wise comparisons. However, we also plan to perform a t-test outside the framework of an ANOVA, similar to the original paper, to allow for a direct comparison to what was originally reported (although performing a planned comparison within the framework of an ANOVA is more powerful than performing a separate t-test if the assumptions of ANOVA are valid).

2) Each protocol has statistics mentioned in the confirmatory analysis plan and the power calculation sections. In protocol 1 the analyses will be two-way ANOVA applied to three datasets. In each ANOVA are the effects treatment and time? The summary data from the original report suggests this. Means and SD are reported for each of the four groups within each experiment but the factor specific effects are not reported. A two-way ANOVA can consist of up to three tests, the treatment, the time, and the treatment-time interaction. The power calculations use a single effect size, which we do not understand. As the experiment is trying to replicate an already detected effect, then each such effect can be clearly stated. The power calculation is reported for one degree of freedom suggesting only one of these terms (treatment?) is expected to be significant though time clearly seems to have a strong effect in one group.

We thank the reviewer for these helpful comments. We have revised our confirmatory analysis plan for Protocol 1, as updated in the Registered report. We have also updated the Power Calculations section to better reflect both the analysis we performed on the original data, as well as our future analysis plans. The original data had a significant treatment–time interaction, which we are powered for and that we have more clearly labeled. Thus, the experiment is designed to test the originally observed effect of time differing between the treatments. However, we do see value in adding planned comparisons between the two times for each treatment as this will evaluate the size of the effect in each group. Therefore, we have included such comparisons in the updated version of our confirmatory analysis plan for Protocol 1.

3) The plans for protocol 2 are easier to understand in that simple one-way ANOVAs and t-tests are planned, though with just 5 per group this will rely on the very strong assumption of normality. Prior experience involving tumor in vivo studies shows that such experiments have considerable mouse-to-mouse variation. It would be a shame if conclusions are limited by too small sample numbers, and in light of the statistical issue just raised, it seems important to increase the animal numbers per group to n = 10–15 because of the low projected 81% confidence level at n = 5. Also, the power calculations seem to justify 3 in each sample, yet 5 in each group are planned. The authors need to better explain the power calculations that led them to conclude that only 3 animals per group would be sufficient and why they then chose 5 animals per group. Is this to cover some results lost due to failure? In any case, the power calculations for these comparisons need to state what specific effect sizes will be viewed as a replication of the original study.

We thank the reviewers for these insightful suggestions. We agree that there is the potential for unanticipated variation in the experimental design that would lead to the exclusion of animals in a given group. Regarding hydrodynamic injections, the original study authors communicated variation in injections that resulted in a ∼20% injection failure rate, leading to exclusion of animals in downstream analyses. To account for variation in injections, we have increased the sample size to seven mice per treatment group for the experiments performed in Protocol 1, as well as the 12-day experimental arm of Protocol 2. We believe this increased sample size will allow us to analyze a minimum of five mice per group in downstream analyses. We have updated the Power Calculations to indicate the statistical power achieved by including five mice per group (89.9–99.9% for Protocol 1; 94.6–99.9% for Protocol 2).
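To illustrate the reasoning behind injecting seven mice to retain at least five, the chance of ending with enough usable animals can be estimated with a binomial model. This is only a sketch: it assumes each injection fails independently with probability exactly 0.2 (the approximate rate communicated by the original authors), and the helper name `prob_at_least` is illustrative.

```python
from math import comb

def prob_at_least(k, n, p_success):
    """Probability of at least k successes in n independent attempts."""
    return sum(
        comb(n, i) * p_success ** i * (1 - p_success) ** (n - i)
        for i in range(k, n + 1)
    )

P_SUCCESS = 0.8  # assumed per-injection success rate (1 minus ~20% failure)

p_seven_injected = prob_at_least(5, 7, P_SUCCESS)  # inject 7, need 5 usable
p_five_injected = prob_at_least(5, 5, P_SUCCESS)   # inject 5, need all 5 usable

print(f"P(>=5 usable | 7 injected) = {p_seven_injected:.3f}")
print(f"P(>=5 usable | 5 injected) = {p_five_injected:.3f}")
```

Under these assumptions, injecting seven animals per group raises the probability of analyzing at least five from about one in three to roughly 85%.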

We believe the reviewers have highlighted an important concern regarding the potential for mouse-to-mouse variation in tumor development in the 7-month experimental arm of Protocol 2. Given that we cannot calculate the effect size of the original experimental data in Figure 4B (specifically the CD4−/− arm), and that we also do not know the variance or the distribution of the data for this arm, it is difficult to properly power this replication with a large enough sample size that also remains within the feasibility of the project. Based on these factors, we have reconsidered the inclusion of the 7-month in vivo tumor formation experiment in this replication study. Given that the original effect size is unknown, we feel that a replication of this experiment would be considered exploratory in nature; thus, we have reallocated our resources toward increasing sample sizes for, and therefore improving, those arms of the study that can be statistically compared to the original data.

4) In a related vein, in a replication study it is important to be clear what specific cut offs will be used to declare a replication or lack of replication. So the explicit effect sizes for each factor and for each of the 6 experiments need to be stated. While it is a good idea to combine the two results (original and replication) in a forest plot, this will give an inflated estimate of any effect sizes due to the publication selection operating on the first article. The decision regarding whether there has been replication should be based on the results of the replication study alone. So it may be that smaller effect sizes than those reported in the original article would still be viewed as strong evidence of the anticipated effect. In such a case, as stated above (point #3), a larger sample size would be justified and should be strongly considered at this stage.

We agree that the replication effect size and the original effect size (with their corresponding 95% confidence intervals) should be compared to each other. Therefore, each confirmatory analysis plan section begins with text describing the comparison of the two effect sizes and their combination. We have slightly modified the text in the Registered report to clarify this plan. We plan to present the effect sizes of each factor for all the experiments this way and to combine each factor (original and replication) using a meta-analytic approach. This will allow direct comparison of the original and replication effect sizes, and the combination of the two will give an estimate of the current knowledge for that given effect size. We also agree that the direct comparison will allow one to determine if the evidence for an anticipated effect is still there, even if the size of that effect, or the precision of that effect size estimate, is different.

5) Regarding the power calculations, the proposed power calculations use a combination of data reported in the original study and data from an alternative relevant study. This is fine at this stage, however, we suggest the following improvements:

a) Cross-study variation should be taken into account to determine expected power in the proposed study, since loss of power can be expected from the original study. This is hard to estimate, but papers by Giovanni Parmigiani and collaborators at the Dana–Farber provide some estimates about cross-study variation that could be used for this purpose. The authors should budget some additional variability because of cross-study reproducibility, and increase the sample size on-the-fly, as they deem appropriate.

We thank the reviewers for these suggestions. Approaches that account for cross-study variation, such as those that use the 95% confidence interval of the effect size, can be useful for planning sample sizes adequate to detect the true population effect size, which requires considering a range of possible observed effect sizes. However, the Reproducibility Project: Cancer Biology is designed to conduct replications that have at least 80% power to detect the point estimate of the originally reported effect size. While this approach is underpowered to detect effects smaller than those originally reported, it standardizes the design across all studies in the project. Thus, performing power calculations during or after data collection is not necessary in this replication attempt, as all studies included are already designed to meet a minimum power or are identified beforehand as being underpowered and thus are not included in the confirmatory analysis plan. The papers by Giovanni Parmigiani and collaborators highlight the importance of accounting for variability that can occur across different studies, specifically for gene expression data. While the variance may differ between the originally reported results and the replication data, this will be reflected in the presentation of the data and discussed as a possible reason for obtaining a different effect size estimate.

b) The final report on the replicated study should report the actual power of the tests, based on numerical summaries of the data in the replicated study.

As described above, we do not see the value in performing post-hoc power calculations on the obtained data. However, we agree that it is important to report the actual power of the tests to detect the originally reported effect size estimate, based on the sample size analyzed in the replication study, and we will report it.
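The distinction drawn here, power to detect the originally reported effect size at the sample size actually analyzed, can be illustrated with a minimal sketch. It is not the project's analysis: it swaps in a normal approximation for a simple two-group, two-sided comparison rather than the two-way ANOVA used in this protocol, and the effect size shown is a hypothetical value, not one taken from Kang et al. (2011).

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, SD 1)


def approx_power(effect_size_d, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-group comparison.

    Normal approximation: the test statistic is treated as normal with
    mean d * sqrt(n / 2). The tiny opposite-tail contribution is ignored.
    """
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    shift = effect_size_d * math.sqrt(n_per_group / 2)
    return _STD_NORMAL.cdf(shift - z_crit)


def approx_n_per_group(effect_size_d, power=0.80, alpha=0.05):
    """Smallest per-group n reaching the target power (same approximation)."""
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_power = _STD_NORMAL.inv_cdf(power)
    return math.ceil(2 * ((z_crit + z_power) / effect_size_d) ** 2)


if __name__ == "__main__":
    original_d = 1.0  # hypothetical "originally reported" effect size
    n_planned = approx_n_per_group(original_d)
    print("planned n per group:", n_planned)
    # Power to detect the original effect at the n actually analyzed,
    # e.g. if one animal per group were lost before analysis.
    print("power at planned n:", round(approx_power(original_d, n_planned), 3))
    print("power at n - 1:", round(approx_power(original_d, n_planned - 1), 3))
```

Because the effect size is held fixed at the original point estimate, the reported power changes only with the analyzed sample size and not with the effect observed in the replication, which is what separates it from a post-hoc power calculation.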

[Editors’ note: further revisions were requested prior to acceptance, as described below.]

1) We do not think it is a good idea to report a combined estimate of any effect size with the original. However displaying results on a forest plot is helpful.

We disagree. As described in Valentine et al., 2011, and Cumming, 2012, combining multiple studies using a meta-analytic approach is a statistical option that can be employed to describe all of the available evidence about a given effect size. Specifically, we will use a random effects meta-analysis because this approach assumes that effects vary due to known and unknown characteristics of the studies.
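As a rough illustration of how a random effects meta-analysis combines an original and a replication estimate, the sketch below uses the DerSimonian-Laird estimator of between-study variance. The choice of estimator is an assumption made for illustration, since the response does not name one, and the input values are hypothetical rather than data from either study.

```python
import math


def random_effects_meta(effects, variances, z=1.96):
    """Pool effect sizes with a DerSimonian-Laird random effects model.

    effects:   per-study effect size estimates
    variances: per-study sampling variances of those estimates
    z:         normal quantile for the confidence interval (1.96 ~ 95%)
    """
    if len(effects) != len(variances) or len(effects) < 2:
        raise ValueError("need matching lists with at least two studies")

    k = len(effects)
    weights = [1.0 / v for v in variances]  # inverse-variance weights
    sum_w = sum(weights)
    fixed = sum(w * y for w, y in zip(weights, effects)) / sum_w

    # Cochran's Q measures heterogeneity around the fixed-effect mean.
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, effects))
    c = sum_w - sum(w * w for w in weights) / sum_w
    tau2 = max(0.0, (q - (k - 1)) / c)  # between-study variance, floored at 0

    # Random effects weights add tau2 to each study's sampling variance.
    re_weights = [1.0 / (v + tau2) for v in variances]
    sum_re = sum(re_weights)
    pooled = sum(w * y for w, y in zip(re_weights, effects)) / sum_re
    se = math.sqrt(1.0 / sum_re)
    return {
        "pooled": pooled,
        "se": se,
        "tau2": tau2,
        "ci": (pooled - z * se, pooled + z * se),
    }


if __name__ == "__main__":
    # Hypothetical original and replication estimates with their variances.
    print(random_effects_meta([1.2, 0.6], [0.20, 0.15]))
```

When the two estimates agree closely, the between-study variance is estimated as zero and the result matches a fixed-effect analysis; when they disagree, the confidence interval widens to reflect the heterogeneity.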

2) With respect to the sample size calculations for the two-way ANOVA, have you used the within group SDs instead of the within group variances in the formulae? If so, this means that you would be over-estimating the power. With 5 in each group the authors should just have 80% power to replicate what was seen in the original study.

We used the SD when we calculated the F statistic, which is the input for GraphPad Prism’s software when the original data values are not known. We also obtained the same F values and partial eta squared values when using the R software package ‘rpsychi’ and the function ‘ind.twoway.second’, which analyzes a two-way design from published summary statistics (the mean, SD, and n) (http://cran.r-project.org/package=rpsychi). Please see the “Study 34 2-way ANOVAs.R” file.
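To make the role of the SD concrete, the following sketch computes F values and partial eta squared for a two-way between-subjects design from cell means, SDs, and a common n. It is an illustration written for this purpose, not the contents of the R file named above and not a port of ‘rpsychi’; it assumes a balanced design with equal n per cell, and the example inputs are hypothetical.

```python
def twoway_from_summary(means, sds, n):
    """Two-way between-subjects ANOVA from summary statistics.

    means, sds: a x b nested lists of cell means and cell SDs
    n:          number of observations per cell (balanced design assumed)
    """
    a, b = len(means), len(means[0])
    grand = sum(sum(row) for row in means) / (a * b)
    row_means = [sum(row) / b for row in means]
    col_means = [sum(means[i][j] for i in range(a)) / a for j in range(b)]

    ss_a = n * b * sum((m - grand) ** 2 for m in row_means)
    ss_b = n * a * sum((m - grand) ** 2 for m in col_means)
    ss_ab = n * sum(
        (means[i][j] - row_means[i] - col_means[j] + grand) ** 2
        for i in range(a)
        for j in range(b)
    )

    # The error term pools the cell variances, i.e. the SDs are squared here.
    ss_within = (n - 1) * sum(sd ** 2 for row in sds for sd in row)
    df_within = a * b * (n - 1)
    ms_within = ss_within / df_within

    def summarize(ss, df):
        return {
            "F": (ss / df) / ms_within,
            "df": (df, df_within),
            "partial_eta_sq": ss / (ss + ss_within),
        }

    return {
        "A": summarize(ss_a, a - 1),
        "B": summarize(ss_b, b - 1),
        "AxB": summarize(ss_ab, (a - 1) * (b - 1)),
    }


if __name__ == "__main__":
    # Hypothetical 2 x 2 design with 5 animals per group.
    result = twoway_from_summary([[10.0, 12.0], [20.0, 35.0]],
                                 [[3.0, 4.0], [5.0, 6.0]], n=5)
    for effect, stats in result.items():
        print(effect, stats)
```

Entering the SDs where the formula expects variances (or the reverse) changes the error term and therefore every F value, which is the concern the reviewers raised.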

References:

Valentine JC, Biglan A, Boruch RF, Castro FG, Collins LM, Flay BR, Kellam S, Moscicki EK, Schinke SP. 2011. Replication in prevention science. Prev Sci. 12(2): 103-17. doi:10.1007/s11121-011-0217-6.

Cumming G. 2012. Understanding The New Statistics: Effect Sizes, Confidence Intervals, and Meta-Analysis. Routledge. ISBN: 978-0-415-87968-2.

https://doi.org/10.7554/eLife.04105.003

Article and author information

Author details

  1. Samrrah Raouf

    Mouse Biology Program, University of California–Davis, Davis
    Contribution
    SR, Drafting or revising the article
    Competing interests
    SR: The UC-Davis Mouse Biology Program is a Science Exchange affiliated lab.
  2. Claire Weston

    Reveal Biosciences, San Diego
    Contribution
    CW, Drafting or revising the article
    Competing interests
    CW: Reveal Biosciences is a Science Exchange affiliated lab.
  3. Nora Yucel

    Stanford University, Stanford
    Contribution
    NY, Drafting or revising the article
    Competing interests
    No competing interests declared.
  4. Reproducibility Project: Cancer Biology

    Contribution
    RP:CB, Conception and design, Drafting or revising the article
    For correspondence
    joelle@scienceexchange.com
    Competing interests
    RP:CB: EI, FT and JL are employed by and hold shares in Science Exchange, Inc.
    1. Elizabeth Iorns, Science Exchange, Palo Alto, California
    2. William Gunn, Mendeley, London, United Kingdom
    3. Fraser Tan, Science Exchange, Palo Alto, California
    4. Joelle Lomax, Science Exchange, Palo Alto, California
    5. Timothy Errington, Center for Open Science, Charlottesville, Virginia

Funding

Laura and John Arnold Foundation

  • Reproducibility Project: Cancer Biology

The Reproducibility Project: Cancer Biology is funded by the Laura and John Arnold Foundation, provided to the Center for Open Science in collaboration with Science Exchange. The funder had no role in study design or the decision to submit the work for publication.

Acknowledgements

The Reproducibility Project: Cancer Biology core team would like to thank the original authors, in particular Lars Zender and Tae-Won Kang, for generously sharing critical information and reagents to ensure the fidelity and quality of this replication attempt. We are grateful to Courtney Soderberg at the Center for Open Science for assistance with statistical analyses. We would also like to thank the following companies for generously donating reagents to the Reproducibility Project: Cancer Biology: American Type Culture Collection (ATCC), BioLegend, Cell Signaling Technology, Charles River Laboratories, Corning Incorporated, DDC Medical, EMD Millipore, Harlan Laboratories, LI-COR Biosciences, Mirus Bio, Novus Biologicals, Sigma–Aldrich, and System Biosciences (SBI).

Reviewing Editor

  1. Ronald N Germain, Reviewing Editor, National Institute of Allergy and Infectious Diseases, United States

Publication history

  1. Received: July 21, 2014
  2. Accepted: December 14, 2014
  3. Version of Record published: January 26, 2015 (version 1)

Copyright

© 2015, Raouf et al.

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.
