Extent, impact, and mitigation of batch effects in tumor biomarker studies using tissue microarrays
Abstract
Tissue microarrays (TMAs) have been used in thousands of cancer biomarker studies. To what extent batch effects, measurement error in biomarker levels between slides, affect TMA-based studies has not been assessed systematically. We evaluated 20 protein biomarkers on 14 TMAs with prospectively collected tumor tissue from 1,448 primary prostate cancers. In half of the biomarkers, more than 10% of biomarker variance was attributable to between-TMA differences (range, 1-48%). We implemented different methods to mitigate batch effects (R package batchtma), tested in plasmode simulation. Biomarker levels were more similar between mitigation approaches than they were to uncorrected values. For some biomarkers, associations with clinical features changed substantially after addressing batch effects. Batch effects and resulting bias are not an error of an individual study but an inherent feature of TMA-based protein biomarker studies. They always need to be considered during study design and addressed analytically in studies using more than one TMA.
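To make the idea of "variance attributable to between-TMA differences" concrete, the following is a minimal sketch, not the estimator used in the study and not the batchtma API. It assumes a simple sum-of-squares ratio (eta squared) as the measure of between-batch variance and a naive mean-centering correction; the function names `between_batch_fraction` and `center_batches` and the toy data are hypothetical.

```python
from collections import defaultdict
from statistics import fmean


def _group_by_batch(values, batches):
    """Collect biomarker values per batch (here: per TMA)."""
    groups = defaultdict(list)
    for value, batch in zip(values, batches):
        groups[batch].append(value)
    return groups


def between_batch_fraction(values, batches):
    """Share of the total sum of squares explained by batch means.

    Assumption: a plain eta-squared ratio, used here only as a
    simple stand-in for a between-batch variance proportion.
    """
    grand_mean = fmean(values)
    groups = _group_by_batch(values, batches)
    ss_total = sum((v - grand_mean) ** 2 for v in values)
    ss_between = sum(
        len(g) * (fmean(g) - grand_mean) ** 2 for g in groups.values()
    )
    return ss_between / ss_total if ss_total > 0 else 0.0


def center_batches(values, batches):
    """Naive correction: shift every batch mean to the grand mean.

    Assumption: batches contain comparable tumors, so differences
    in batch means reflect measurement error only.
    """
    grand_mean = fmean(values)
    batch_means = {
        b: fmean(g) for b, g in _group_by_batch(values, batches).items()
    }
    return [v - batch_means[b] + grand_mean for v, b in zip(values, batches)]


# Hypothetical toy data: six tumors on two TMAs.
levels = [1.0, 2.0, 3.0, 3.0, 4.0, 5.0]
tma = ["A", "A", "A", "B", "B", "B"]

before = between_batch_fraction(levels, tma)   # 0.6
corrected = center_batches(levels, tma)        # [2.0, 3.0, 4.0, 2.0, 3.0, 4.0]
after = between_batch_fraction(corrected, tma) # 0.0
```

Mean-centering removes all between-batch differences, including real ones. If TMAs differ in tumor characteristics, a correction that accounts for those characteristics would be needed, which this sketch does not attempt.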
Data availability
The batchtma R package is available at https://stopsack.github.io/batchtma and on the Comprehensive R Archive Network (CRAN). Code used to produce the results in this manuscript is at https://github.com/stopsack/batchtma_manuscript. Data are available for analysis on the Harvard FAS computing cluster through a project proposal for the Health Professionals Follow-up Study (https://sites.sph.harvard.edu/hpfs/for-collaborators).
Article and author information
Author details
Funding
National Cancer Institute (U01 CA167552)
- Lorelei A Mucci
DOD Prostate Cancer Research Program (W81XWH-18-1-0330)
- Konrad H Stopsack
Prostate Cancer Foundation (Young Investigator Award)
- Konrad H Stopsack
- Kathryn L Penney
- Stephen P Finn
- Tamara L Lotan
- Lorelei A Mucci
National Cancer Institute (P50 CA090381)
- Philip W Kantoff
- Massimo Loda
- Lorelei A Mucci
National Cancer Institute (P50 CA211024)
- Massimo Loda
National Cancer Institute (P30 CA008748)
- Konrad H Stopsack
- Philip W Kantoff
National Cancer Institute (P30 CA006516)
- Massimo Loda
- Lorelei A Mucci
National Cancer Institute (5R37 CA227190-02)
- Svitlana Tyekucheva
- Kathryn L Penney
- Giovanni Parmigiani
- Lorelei A Mucci
National Cancer Institute (R03 CA212799)
- Molin Wang
National Cancer Institute (R35 CA212799)
- Molin Wang
National Cancer Institute (R01 CA131945)
- Massimo Loda
The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.
Ethics
Human subjects: Written informed consent was obtained from all participants, and the study protocol was approved by the institutional review boards of the Brigham and Women's Hospital and Harvard T.H. Chan School of Public Health (IRB 19-1430), and those of participating registries as required.
Copyright
© 2021, Stopsack et al.
This article is distributed under the terms of the Creative Commons Attribution License permitting unrestricted use and redistribution provided that the original author and source are credited.
Metrics
- 780 views
- 141 downloads
- 8 citations
Views, downloads and citations are aggregated across all versions of this paper published by eLife.
Citations by DOI
- 8 citations for umbrella DOI https://doi.org/10.7554/eLife.71265