Multi-syndrome, multi-gene risk modeling for individuals with a family history of cancer with the novel R package PanelPRO

  1. Gavin Lee
  2. Jane W Liang
  3. Qing Zhang
  4. Theodore Huang
  5. Christine Choirat
  6. Giovanni Parmigiani
  7. Danielle Braun (corresponding author)
  1. Swiss Data Science Center, ETH Zürich and EPFL, Switzerland
  2. Department of Biostatistics, Harvard T.H. Chan School of Public Health, United States
  3. Department of Data Sciences, Dana-Farber Cancer Institute, United States
  4. Broad Institute of MIT and Harvard, United States

Abstract

Identifying individuals who are at high risk of cancer due to inherited germline mutations is critical for effective implementation of personalized prevention strategies. Most existing models focus on a few specific syndromes; however, recent evidence from multi-gene panel testing shows that many syndromes are overlapping, motivating the development of models that incorporate family history on several cancers and predict mutations for a comprehensive panel of genes.

We present PanelPRO, a new, open-source R package providing a fast, flexible back-end for multi-gene, multi-cancer risk modeling with pedigree data. It includes a customizable database with default parameter values estimated from published studies and allows users to select any combinations of genes and cancers for their models, including well-established single syndrome BayesMendel models (BRCAPRO and MMRPRO). This leads to more accurate risk predictions and ultimately has a high impact on prevention strategies for cancer and clinical decision making. The package is available for download for research purposes at https://projects.iq.harvard.edu/bayesmendel/panelpro.

eLife digest

Genetic mutations that increase cancer risk can be passed down from parents to their children, which can affect families across many generations. In these families, multiple members may be affected by different types of cancer, and these cancers often develop at an early age. Unaffected family members are often referred to genetic counselling, where they can explore their own risk of cancer. Clinicians and genetic counselors can provide recommendations to minimize cancer risk and inform personal choices on how to manage that risk, such as opting for preventative surgeries or participating in regular screening.

In genetic counselling sessions, highly trained clinicians and specialists use software that takes an individual’s family history of cancer and uses it to estimate their individual risk of carrying certain genetic mutations. These estimates can in turn help to predict their future risk of cancer. Many existing software packages are limited to estimating risks based on mutations in well-known cancer-related genes, such as BRCA1 and BRCA2 in breast and ovarian cancer. However, emerging evidence suggests that many of the genes associated with cancer risk work as part of a complex and overlapping network. Since current risk-profiling software packages are only designed to consider such genes in isolation, they cannot generate the most robust, accurate or comprehensive cancer risk profiles.

To address this challenge, Lee, Liang et al. have developed a new risk-profiling software that can integrate a large number of gene mutations and a wide range of potential cancer types to provide more accurate estimates of individual cancer risk. This software, called PanelPRO, uses evidence identified from extensive literature reviews to model the complex interplay between genes and cancer risk. The software not only calculates risks based on known genes, but also allows other developers to integrate new cancer-related genes that may be identified in the future. Importantly, the software is compatible with genetic counselling applications, since it returns answers within seconds when reasonable family and gene database sizes are used.

PanelPRO is a new, modern, flexible and efficient software package that provides an important advance towards modelling the vast genetic and biological complexity that contributes to inherited cancer risk. This software is designed to provide a more accurate and comprehensive estimate of cancer risk for individuals with family histories of cancer.

As open-source software, it is freely available for research purposes, and can be licensed by software companies and healthcare organizations to integrate electronic patient records and rapidly identify at-risk individuals across larger patient groups. Ultimately, this software has the potential to improve cancer prevention strategies and optimize the personalized decision-making processes around cancer risk.

Introduction

In the last decade, DNA sequencing has changed dramatically. Tests have become faster and more affordable, leading to discovery of a growing number of germline pathogenic variants associated with increased cancer risk. Multi-gene panels are routinely available and include varying combinations of genes (Plichta et al., 2016). Evidence is accruing that gene mutations, which were typically believed to be only associated with one or two types of hereditary cancers, may in fact increase the risk for a wider range of syndromes. These advancements have changed the genetic counseling landscape by introducing a need to consider a wider set of individual genes and cancers to accurately assess overall risks. In the context of genetic counseling, the importance of accurate estimates of carrier probabilities is well-known (Nelson et al., 2014). With the number of genes of interest and their combinations increasing, efficient calculation of these estimates becomes crucial in clinical settings.

In genetic counseling, an individual may be suspected of inherited cancer susceptibility if their family history exhibits certain patterns. For example, if two or more relatives have the same type of cancer on the same side of the family, or if cancer diagnoses in the family are particularly early, they may be referred to testing for mutations in genes associated with increased risk for those specific cancers. In the case of hereditary breast cancer, guidelines in the US were established to identify patients who have higher likelihoods of benefiting from germline genetic testing. Thresholds for testing were set high initially, since genetic testing was very expensive at the time (Manahan et al., 2019). Although cost of testing has decreased and guidelines are constantly changing, accurate calculation of carrier probabilities, given family history, is essential in supporting the decision for further testing, preventative treatment, or family planning (Chen et al., 2004).

Existing Mendelian models consider a relatively narrow subset of cancers and genes. For example, BRCAPRO, available in the BayesMendel R package (Chen et al., 2004), considers two cancers (breast and ovarian) and two genes (BRCA1 and BRCA2). Boadicea v4 Beta (Centre for Cancer Genetic Epidemiology, 2020) considers BRCA1, BRCA2, PALB2, CHEK2, and ATM mutations in the same cancers. To comprehensively incorporate cancers, genes, and their interactions, we introduce PanelPRO, an R package which aims to efficiently and flexibly scale to the demands of germline panel testing. The newly developed package has the following key advantages:

  • Customizable model specification, including the choice of genes and cancers included in the model;

  • Customizable model parameters, including the allele frequencies and penetrances;

  • Accurate default parameter estimates, curated from an extensive literature search;

  • Flexibility to incorporate cancer risk modifiers such as prophylactic surgeries;

  • Comprehensive user input checks for pedigrees;

  • Speed of computation, through an optimized C++ implementation of an efficient algorithm (Madsen et al., 2018).

In general, PanelPRO can handle models with K genes and R cancers, where K and R are arbitrary (subject to reasonable run-times and memory constraints). We expect K and R to grow as more research on gene-cancer associations becomes available. The package contains a comprehensive collection of functions designed to efficiently calculate carrier probabilities and future cancer risk for individuals, given detailed information about their family history.

PanelPRO is designed to be backward-compatible with the existing BayesMendel package; individual models within that package (for example, BRCAPRO, MMRPRO, BRCAPRO5, or BRCAPRO6) can be called directly from PanelPRO by passing the corresponding model specification to the main function call. We expect that users of BayesMendel will migrate to this generalized and customizable enhancement, and that PanelPRO will attract new users interested in broader cross-syndrome modeling in the current landscape of clinical cancer risk assessment. In the current release, there are minor differences in how BayesMendel and PanelPRO handle peer-reviewed data, in particular for cancer penetrance calculations.

The PanelPRO R package is freely available for research purposes. The BayesMendel R package is likewise freely available for research purposes, and is currently licensed for clinical commercial use to CRA Health, CancerGene Connect, Progeny, FamHis, MagView, Igentify, CancerIQ, and Finch genetics. Family history has long been understood to be a key component for identifying risk and preventing heritable diseases, and clinical tools such as the ones which license BayesMendel are becoming more readily available (Welch et al., 2018). For BayesMendel, we leave clinical integration to the software licensees, including the integration of electronic medical records such as EPIC. We envision a similar dissemination plan for PanelPRO.

Methods: package workflow

The workflow of the package includes four main parts: the input, including user and model input; pre-processing of the inputs, including user input checks and a database build; running the peeling-paring algorithm; and outputting the results. Figure 1 shows the workflow. Additionally, Appendix 1—figure 1 shows detailed sub-routines within the package.

Figure 1. PanelPRO package workflow.

Input

User input

The main input from the user is their pedigree. This is in the form of an R data.frame which contains detailed information about known family members, such as their ages and previous cancer diagnoses. The pedigree structure is defined by the ID, MotherID and FatherID columns. Previous cancer diagnoses and their ages of diagnosis are stored in the isAff* and Age* columns, respectively, where * represents a cancer type, designated according to a standard nomenclature of two- to four-letter tags which are also used in visualizations. Cancers currently considered in PanelPRO are listed in Appendix 1—table 1. Note, this list will be expanded for future versions of the package as more information on risk for various genes and cancers becomes available in the literature. Risk modifiers, such as prophylactic surgeries, can be incorporated to adjust the likelihood calculation. Previous genetic testing history can also be incorporated. The current version of the package, v0.2.0, contains thirteen sample pedigrees, called test_fam_X, where X goes from 1 to 12, and err_fam_1. The test_fam_X pedigrees provide realistic and extreme examples of the data that can be included in the user input, whereas err_fam_1 is an example of a family that does not pass PanelPRO’s preprocessing pedigree check and therefore cannot be evaluated by the model. A clipped version (with a subset of the necessary columns) of test_fam_1 pedigree is shown below. Key figures relevant to the pedigree can be found in Appendix 1—table 2.

head(test_fam_1)
##   ID Sex MotherID FatherID isProband CurAge isAffBC isAffOC AgeBC AgeOC isDead
## 1  1   0       NA       NA         0     93       1       0    65    NA      1
## 2  2   1       NA       NA         0     80       0       0    NA    NA      1
## 3  3   0        1        2         0     72       1       1    40    NA      0
## 4  4   1        1        2         0     65       0       0    NA    NA      1
## 5  5   1        1        2         0     65       0       0    NA    NA      0
## 6  6   0        1        2         1     55       0       0    NA    NA      0

The full specification of the pedigree structure is shown in Table 1. The family tree can also be visualized using the visPed package, as in Figure 2. This external package is available at https://github.com/bayesmendel/visPed (version 0.1.0; Lee, 2021) and is based on the kinship2 package available on CRAN (Sinnwell et al., 2014). PanelPRO itself does not contain pedigree plotting functionality, but users can install visPed separately.

Table 1
Pedigree structure in PanelPRO.
| Column | Definition | Value |
|---|---|---|
| ID | Unique numeric identifier of each individual | Non-repeated strictly positive integer |
| MotherID | ID of one’s mother | Strictly positive integer or NA (missing) |
| FatherID | ID of one’s father | Strictly positive integer or NA (missing) |
| Sex | Sex of the individual: 1 for male, 0 for female | One of {0, 1} |
| isProband | Indicates the proband or counselee by 1 and 0 otherwise; multiple probands can be specified | One of {0, 1} |
| CurAge | Age of censoring: either the current age or death age, depending on isDead status | Positive integer or NA (missing) |
| isAff* | Affection status of cancer * | One of {0, 1} |
| Age* | Affection age of cancer * | Positive integer or NA (missing) |
| isDead | Whether someone has died | One of {0, 1, NA} |
| race | Race of individual (used to modify penetrance) | One of All_Races, AIAN, Asian, Black, White, Hispanic, WH, WNH, NA |
| Ancestry | Ancestry of individual (used to modify allele frequencies) | One of AJ, nonAJ, Italian, NA |
| Twins | Identifies siblings who are identical twins or multiple births | Each set is identified by a unique integer, and 0 otherwise |
| riskmod | Preventative interventions which modify penetrance | List, combination of "mastectomy", "hysterectomy", and "oophorectomy" |
| InterAge | Age at each preventative intervention | List, combination of integers |
| Gene name from GENE_TYPES | Germline testing result | One of {0, 1, NA} |
| Marker name from CK14, CK5.6, ER, PR, HER2, MSI | Marker testing result | One of {0, 1, NA} |

Figure 2. test_fam_1 sample pedigree as included in the PanelPRO package, plotted using the external visPed package.

The colors refer to cancer diagnoses in the legend. The age of diagnosis is shown below the individual if it is known.

Prophylactic surgeries (mastectomy, oophorectomy and hysterectomy) act as risk modifiers. They are specified in a column of lists in the user input pedigree. Including these risk modifiers changes the resulting carrier probability and future risk outputs (see the Output section). Previous history of biomarker testing for breast and colorectal cancers can also be included in the model.
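
As a minimal sketch of the input format in Table 1, the base R code below builds a small three-person pedigree by hand. The family, the ages, and the object name ped are hypothetical; only the column names come from Table 1, and the BC and OC tags follow the clipped test_fam_1 listing. It is meant to show the shape of the data, including the list columns used for risk modifiers, rather than a pedigree that has been validated by the package.

```r
# Hypothetical three-person pedigree (mother, father, daughter as proband).
# Column names follow Table 1; all values are invented for illustration.
ped <- data.frame(
  ID        = c(1L, 2L, 3L),
  Sex       = c(0L, 1L, 0L),          # 1 = male, 0 = female
  MotherID  = c(NA, NA, 1L),
  FatherID  = c(NA, NA, 2L),
  isProband = c(0L, 0L, 1L),
  CurAge    = c(70L, 72L, 45L),
  isAffBC   = c(1L, 0L, 0L),          # breast cancer affection status
  AgeBC     = c(52L, NA, NA),         # age at breast cancer diagnosis
  isAffOC   = c(0L, 0L, 0L),          # ovarian cancer affection status
  AgeOC     = c(NA_integer_, NA_integer_, NA_integer_),
  isDead    = c(0L, 0L, 0L)
)

# Risk modifiers are list columns: one list element per person.
ped$riskmod  <- list("oophorectomy", character(0), character(0))
ped$InterAge <- list(60L, integer(0), integer(0))

str(ped[, c("ID", "MotherID", "FatherID", "isProband")])
```

Using list columns lets a single individual carry zero, one, or several interventions, each paired with its own age.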

Model input

Calculation of carrier probabilities requires information about allele frequencies and penetrances for the mutations and cancers requested in the function call. They are derived from peer-reviewed studies whose results are cataloged in the PanelPRODatabase. At each function call, the code extracts the appropriate subset of gene-cancer combinations. These combinations are specified in the main PanelPRO function call, where the user should indicate the cancers for which family history in the pedigree should be used, as well as the genes for which carrier probabilities are requested.

PanelPRO(pedigree = test_fam_1,
         cancers = c("Breast", "Ovarian"),
         genes = c("BRCA1", "BRCA2", "ATM", "MSH2"))

If no genes or cancers are specified, PanelPRO will default to all the supported genes in the version at that time, with all the cancers in the pedigree.

Users are free to change the database defaults for their own purposes. The structure of this database is an R list. A partial output is provided below.

str(PanelPRODatabase)
## $ Penetrance: num [1:18, 1:26, 1:8, 1:2, 1:94, 1:2] 3.98e-05 2.80e-07 0.00 5.00e-08 0.00 ...
## ..- attr(*, 'dimnames')=List of 6
## .. ..$ Cancer: chr [1:18] 'Brain' 'Breast' 'Cervical' 'Colorectal' ...
## .. ..$ Gene: chr [1:26] 'APC_hetero_anyPV' 'ATM_hetero_anyPV' 'BARD1_hetero_anyPV' ...
## .. ..$ Race: chr [1:8] 'All_Races' 'AIAN' 'Asian' 'Black' ...
## .. ..$ Sex: chr [1:2] 'Female' 'Male'
## .. ..$ PenetType: chr [1:2] 'Net' 'Crude'
## $ AlleleFrequency: num [1:24, 1:3] 1.45e-04 1.90e-03 3.41e-04 2.17e-05 1.37e-02 ...
## ..- attr(*, 'dimnames')=List of 2
## .. ..$ Gene: chr [1:24] 'APC_anyPV' 'ATM_anyPV' 'BARD1_anyPV' 'BMPR1A_anyPV' ...
## .. ..$ Ancestry: chr [1:3] 'AJ' 'nonAJ' 'Italian'
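
Because the database is an ordinary R list of named arrays, default values can in principle be changed with standard indexing by dimension name. The sketch below illustrates this on a toy stand-in, not on the real PanelPRODatabase: the object names toy_db and my_db and all numbers are hypothetical, and only the Gene and Ancestry labels are taken from the str() output. How a modified list is then supplied to the main function is not shown here.

```r
# Toy stand-in for the allele frequency table; NOT the real PanelPRODatabase.
# Labels mirror the str() output, the numbers are invented.
toy_db <- list(
  AlleleFrequency = matrix(
    c(1e-3, 2e-3,
      1e-3, 2e-3,
      1e-3, 2e-3),
    nrow = 2,
    dimnames = list(Gene     = c("ATM_anyPV", "BARD1_anyPV"),
                    Ancestry = c("AJ", "nonAJ", "Italian"))
  )
)

# Work on a copy so the defaults stay available for comparison.
my_db <- toy_db
my_db$AlleleFrequency["ATM_anyPV", "nonAJ"] <- 1.5e-3

my_db$AlleleFrequency["ATM_anyPV", ]
```

Indexing by name rather than by position keeps such edits robust if genes or ancestries are added in a later version of the database.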

In the current PanelPRO database, cancer penetrances are taken from data included in the BayesMendel package when available: the BRCA1 and BRCA2 estimates for the probability of developing breast or ovarian cancer (Chen et al., 2020); the MLH1, MSH2, and MSH6 estimates for the probability of developing colorectal or endometrial cancer (Wang et al., 2020; Felton et al., 2007); and the CDKN2A estimates for the probability of developing melanoma (Wang et al., 2010; Begg et al., 2005; Bishop, 2002). All other cancer penetrances are pulled from the All Syndromes Known to Man Evaluator (ASK2ME) clinical tool (Braun et al., 2018).

For allele frequencies, we use the non-Ashkenazi, Ashkenazi Jewish, and Italian BRCA1 and BRCA2 allele frequency estimates from BRCAPRO (Chen et al., 2004; Antoniou et al., 2002); for MLH1, MSH2, and MSH6, we use the allele frequency estimates from MMRPRO (Chen et al., 2004; Chen et al., 2006); and for CDKN2A, we use the allele frequency estimate from MelaPRO (Chen et al., 2004; Berwick, 2006). Allele frequency estimates for ATM, CHEK2, and PALB2 are taken from Lee et al. (2016). The allele frequencies of the remaining genes are estimated based on a 25-gene panel study of 252,223 individuals (Rosenthal et al., 2017) that did not adjust for ascertainment. In this case, we rescale the reported estimates by the ratio of the ascertained and unascertained allele frequencies for a gene reported in both our existing database and the study.
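
The rescaling step can be illustrated with a short worked example. This is a sketch under stated assumptions: all numbers and the gene labels GENE_A and GENE_B are hypothetical, and we read the ratio as the database frequency of the shared reference gene divided by its frequency in the panel study, which is one plausible interpretation of the description above.

```r
# Hypothetical numbers throughout; only the rescaling idea comes from the text.
ref_db_freq    <- 0.0010   # reference gene, frequency in the existing database
ref_panel_freq <- 0.0040   # same gene, unadjusted frequency in the panel study
scale_factor   <- ref_db_freq / ref_panel_freq   # assumed direction of the ratio

# Unadjusted panel-study frequencies for genes not yet in the database.
panel_freq    <- c(GENE_A = 0.0020, GENE_B = 0.0008)
adjusted_freq <- panel_freq * scale_factor
adjusted_freq
```

A single scale factor assumes that ascertainment inflates the observed frequency of every gene on the panel by roughly the same proportion.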

New genes and cancers will be added to PanelPRO based on regular literature reviews as conducted in ASK2ME (Braun et al., 2018). The ASK2ME approach identifies best-available studies that adjust for ascertainment; since many papers report odds ratios or relative risks, it then calculates absolute age-specific cancer penetrances when necessary.

The user can also select other options in the function call which are relevant at run-time. Examples include the maximum number of simultaneous gene mutations considered for a given individual, whether a parallelized version of the algorithm is performed, and the number of imputations in case of missing age data (see the Missing Data section). Many of the useful options are listed in Table 2.

Table 2
List of model options that the user can pass to PanelPRO, along with their defaults.
| Option | Default value | Possible values | Description |
|---|---|---|---|
| max.mut | NULL | Integers up to the number of genes | Maximum number of simultaneous mutations, also known as the paring parameter. If no integer is supplied, it defaults to 2. |
| iterations | 20 | Integers from 1 upwards | In case of missing current or cancer ages in the pedigree, this is the number of times those ages will be imputed. |
| parallel | TRUE | TRUE or FALSE | If age imputations are needed, this parameter can be set to utilize multiple cores on one’s machine. |
| net | FALSE | TRUE or FALSE | Determines whether net or crude penetrances are used to compute future risk of cancer. Net penetrances exclude all other causes of death, apart from the affected cancer. |
| age.by | 5 | Integers from 1 upwards | The intervals of age used to report the future risk of cancer. |

Passing these options to the function call is simple, as shown below.

PanelPRO(pedigree = test_fam_1,
         cancers = c("Breast", "Ovarian"),
         genes = c("BRCA1", "BRCA2", "ATM", "MSH2"),
         max.mut = 1,
         parallel = FALSE)

Instead of specifying a set of cancers and genes, users can call models corresponding to those in the BayesMendel package, as well as other predefined models. For example, the two calls below are equivalent.

bayesMendelCall <- BRCAPRO6(pedigree = test_fam_1)
panelProCall <- PanelPRO(pedigree = test_fam_1,
                         cancers = c("Breast", "Ovarian"),
                         genes = c("BRCA1", "BRCA2", "MLH1", "MSH2", "MSH6", "CDKN2A"))
all.equal(bayesMendelCall, panelProCall)
## [1] TRUE

Preprocessing

Pedigree check

First, PanelPRO checks the structure of the user-supplied R data.frame containing the family history to be evaluated, using a call to the checkFam function. A description can be found in Table 3. Messages or warnings are given to the user if values have been automatically changed to rectify conflicts. For example, test_fam_1 contains some ancestry and race inconsistencies.

Table 3
List of main functions in PanelPRO.
| Category | Name | Description |
|---|---|---|
| Pre-processing | checkFam | Checks family structure as defined by the user. The inputs are a data.frame specifying the pedigree and a built database returned by buildDatabase. The output is a modified data.frame pedigree and list of imputed ages, if missing ages were imputed (see the Missing Data section). |
| Pre-processing | buildDatabase | Subsets the internal database PanelPRODatabase depending on the cancers and genes selected. The input is the list PanelPRODatabase. The output is another list which is a subset of PanelPRODatabase. |
| Algorithm | PanelPROCalc | Estimates the posterior carrier probabilities and future risks of the proband. The inputs are the outputs of checkFam. The outputs are lists of posterior probabilities and future risks for the proband. |
| Main function | PanelPRO | Runs main function. The inputs are the user-specified pedigree, a vector of cancers in the model, a vector of genes in the model, and other optional parameters. The output is a list of estimates of posterior carrier probabilities for each genotype, along with future cancer risks and ranges for each of these. |

checkFam(test_fam_1)
## Your model has two cancers - Breast, Ovarian and 24 genes - APC_hetero_anyPV, ATM_hetero_anyPV ...
## Germline testing results for BRCA1 are assumed to be for default variant BRCA1_hetero_anyPV.
## Germline testing results for BRCA2 are assumed to be for default variant BRCA2_hetero_anyPV.
## ID 3 ’s Ancestry has been changed to nonAJ to meet heredity consistency
## ID 10 ’s race has been changed to All_Races to meet heredity consistency
## ID 9 ’s Ancestry has been changed to nonAJ to meet heredity consistency
## ID 16,17 ’s Ancestry has been changed to nonAJ to meet heredity consistency
## ID 29,30 ’s Ancestry has been changed to nonAJ to meet heredity consistency
## ID 33,34 ’s race has been changed to All_Races to meet heredity consistency
## ID 33,34 ’s Ancestry has been changed to nonAJ to meet heredity consistency

Errors are given if inconsistencies or ambiguities in the pedigree cannot be resolved such that the pedigree can be safely passed into downstream functions. Most of the checks are for the presence of required information; whether the pedigree variables have values in the expected range/set; and for consistency between cancers and sex and in terms of features with hereditary assumptions among parents/children and twins. The pedigree is also checked for ‘loops’, which PanelPRO currently does not support. For example, within err_fam_1, there are two sets of male and female siblings (four individuals) who have mated with the corresponding siblings in another family, as shown in Figure 3. This mating configuration results in a loop. For a more detailed definition of loops, see Fernando et al., 1993. In addition, disconnected family members are detected and removed from the pedigree if they will not influence the counselee’s results.
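
To make the kinds of checks described above concrete, the following base R sketch tests a pedigree for a few of them: unique IDs, valid sex codes, parents present in the pedigree, and parent sex consistent with the MotherID and FatherID columns. It is a simplified stand-in and not the package's checkFam; the function name check_pedigree_basic and the toy pedigrees are hypothetical, and loop detection is deliberately left out.

```r
# Simplified stand-in for a few of the checks described above.
# This is NOT the package's checkFam; the function name is hypothetical.
check_pedigree_basic <- function(ped) {
  problems <- character(0)
  if (anyDuplicated(ped$ID) > 0) {
    problems <- c(problems, "IDs are not unique")
  }
  if (!all(ped$Sex %in% c(0, 1))) {
    problems <- c(problems, "Sex must be 0 (female) or 1 (male)")
  }
  # Every non-missing parent ID must refer to someone in the pedigree.
  parents <- c(ped$MotherID, ped$FatherID)
  if (!all(is.na(parents) | parents %in% ped$ID)) {
    problems <- c(problems, "a parent ID is missing from the pedigree")
  }
  # Mothers must be recorded as female, fathers as male.
  mothers <- ped$Sex[match(ped$MotherID, ped$ID)]
  fathers <- ped$Sex[match(ped$FatherID, ped$ID)]
  if (any(mothers == 1, na.rm = TRUE) || any(fathers == 0, na.rm = TRUE)) {
    problems <- c(problems, "parent sex is inconsistent with MotherID/FatherID")
  }
  problems
}

ok_ped  <- data.frame(ID = 1:3, Sex = c(0, 1, 0),
                      MotherID = c(NA, NA, 1), FatherID = c(NA, NA, 2))
bad_ped <- transform(ok_ped, MotherID = c(NA, NA, 2))  # recorded mother is male

check_pedigree_basic(ok_ped)    # empty: no problems found
check_pedigree_basic(bad_ped)   # one problem: parent sex inconsistency
```

Collecting all problems in a vector, instead of stopping at the first one, lets the user correct a pedigree in a single pass.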

Figure 3. The sample pedigree err_fam_1, which contains a pedigree loop due to the mating pattern of the siblings aged 59 and 55 with the siblings aged 60 and 62, respectively.

The two circles linked by a dotted line represent the same individual.

Build database

Depending on the configuration of the model requested (cancers in the family, genes considered), the buildDatabase function creates a subset of PanelPRODatabase, or of a user-modified database, and passes it on for further calculation. A description of this function can be found in Table 3.

Algorithm

The checked pedigree and PanelPRODatabase subset, as well as any user options, are then passed to the ‘peeling-paring’ algorithm, which approximates Equation 2 in the Genotype probabilities section. It is based on the ‘peeling’ algorithm as introduced by Elston and Stewart, 1971 with its implementation based on Fernando et al., 1993. The ‘paring’ aspect of the algorithm limits the number of simultaneous mutations allowed. This is called the paring parameter and has a default value of 2, which results in an approximation which has been shown to be adequate for clinical purposes (Madsen et al., 2018). When the paring parameter is set equal to the number of distinct genes to be considered, the calculation is exact (assuming no other missing information about the pedigree). Future cancer risks are then calculated based on the law of total probability, using the previously calculated posterior carrier probabilities, as described in the Methods: Mendelian modeling section. These two calculations are performed in PanelPROCalc, as listed in Table 3. The underlying algorithm is written in Rcpp using the RcppArmadillo package (Eddelbuettel and Sanderson, 2014; Eddelbuettel and Francois, 2011). It uses, as much as possible, optimized data structures, vectorized operations and in-place modifications to be both time and memory efficient. See the Discussion section for benchmarks on the run-time of the implementation.
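
The effect of the paring parameter on the size of the genotype space can be checked with a few lines of base R. The sketch assumes one carrier state per gene (as in the hetero_anyPV variants used by default), so that a genotype is simply the set of mutated genes; the function name n_genotypes is hypothetical.

```r
# Count the genotypes kept when at most max_mut of n_genes genes
# may be mutated at once (one carrier state per gene is assumed).
n_genotypes <- function(n_genes, max_mut) {
  sum(choose(n_genes, 0:max_mut))
}

n_genotypes(4, 2)    # 11: noncarrier + 4 single + 6 double carriers
n_genotypes(24, 2)   # 301
n_genotypes(24, 24)  # 16777216 = 2^24, i.e. no paring (exact calculation)

# Enumerate the genotype labels for four genes with the paring parameter at 2.
genes  <- c("BRCA1", "BRCA2", "ATM", "MSH2")
labels <- c("noncarrier", genes,
            combn(genes, 2, FUN = paste, collapse = "."))
length(labels)       # 11
```

With four genes and a paring parameter of 2, the count of 11 agrees with the number of rows in the posterior.prob table shown in the Output section, while the jump from 301 to more than 16 million genotypes for 24 genes shows why the approximation matters for run-time.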

The recursive nature of the peeling-paring algorithm allows for multiple counselees to be specified in the function call without significant increase in the computational time. This is an advantage when multiple family members are at high risk and would benefit from knowing their carrier probabilities and future cancer risks.
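
The future risk step, which combines posterior carrier probabilities with genotype-specific risks through the law of total probability, can be sketched as a weighted sum. All numbers below are hypothetical, as are the object names; the genotype-specific risks stand in for quantities that the package would derive from the penetrances in its database.

```r
# Hypothetical posterior carrier probabilities for a proband (sum to 1).
post <- c(noncarrier = 0.70, BRCA1 = 0.29, BRCA2 = 0.01)

# Hypothetical genotype-specific risks of developing the cancer within
# the next 5 and 10 years, given that the proband is unaffected now.
risk_given_genotype <- rbind(
  noncarrier = c(0.01, 0.02),
  BRCA1      = c(0.08, 0.16),
  BRCA2      = c(0.06, 0.12)
)
colnames(risk_given_genotype) <- c("in5yrs", "in10yrs")

# Law of total probability: weight each genotype-specific risk by its
# posterior probability and sum over genotypes.
future_risk <- drop(post %*% risk_given_genotype[names(post), ])
future_risk
```

Indexing the risk matrix by names(post) guards against the two objects listing genotypes in a different order.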

Missing data

PanelPRO calculates mutation carrier probabilities for one or more counselees. The peeling-paring algorithm requires both parents of the counselee to be present in the pedigree in order to link individuals who are non-founders. When there is only data for a single parent (whose children influence the results), we add a pseudo-parent who has the same prior allele frequencies as the parent for whom we do have information. This allows the peeling-paring algorithm to run and serves as an approximation of the final results.
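
A minimal sketch of the pseudo-parent idea is given below. It is not the package's implementation: the helper name add_pseudo_parents and the toy pedigree are hypothetical, the helper assumes a pedigree with only the five columns shown, and copying the known parent's Ancestry is used as a stand-in for "same prior allele frequencies", on the assumption that ancestry is what selects the allele frequencies.

```r
# Hypothetical helper: add one placeholder parent for each known parent whose
# children lack the other parent. Assumes only the five columns used below.
add_pseudo_parents <- function(ped) {
  for (known in c("MotherID", "FatherID")) {
    other      <- setdiff(c("MotherID", "FatherID"), known)
    known_ids  <- ped[[known]]
    one_parent <- !is.na(known_ids) & is.na(ped[[other]])
    for (p in unique(known_ids[one_parent])) {
      kids   <- which(one_parent & known_ids == p)
      new_id <- max(ped$ID) + 1
      ped[[other]][kids] <- new_id          # link children to the new parent
      pseudo <- data.frame(
        ID       = new_id,
        Sex      = if (other == "FatherID") 1 else 0,
        MotherID = NA,
        FatherID = NA,
        Ancestry = ped$Ancestry[match(p, ped$ID)]  # same ancestry as known parent
      )
      ped <- rbind(ped, pseudo)
    }
  }
  ped
}

# Mother (ID 1) with two children whose father is not recorded.
toy_ped <- data.frame(
  ID       = 1:3,
  Sex      = c(0, 0, 1),
  MotherID = c(NA, 1, 1),
  FatherID = c(NA, NA, NA),
  Ancestry = c("AJ", "AJ", "AJ")
)
completed <- add_pseudo_parents(toy_ped)
completed
```

In this sketch, siblings who share the known parent are given the same pseudo-parent, so they remain full siblings in the completed pedigree; this is a modeling choice of the example rather than documented package behavior.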

When the current age or age of cancer diagnosis of a family member is unknown, we use a multiple imputation procedure to repeatedly sample their age (Biswas et al., 2013). Unknown current ages are sampled based on the current ages of the relatives, and unknown ages of cancer diagnosis are sampled from the cancer penetrances, using the current age as an upper bound. The optional impute.times argument in the main PanelPRO function can be used to set the number of samples taken. The value labeled ‘estimate’ in the output is the average of the results over the sampled ages, whilst the ‘lower’ and ‘upper’ bounds are the minimum and maximum values over the respective samples (whether it be for the posterior probabilities or future risks).
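
The way the ‘estimate’, ‘lower’ and ‘upper’ values arise from repeated imputation can be sketched as follows. The function toy_carrier_prob is an invented stand-in for a full model run, and the unknown age is sampled uniformly for simplicity, whereas the package samples from the cancer penetrances; only the summary step (mean, minimum, maximum) follows the description above.

```r
set.seed(1)

# Stand-in for one model run given an imputed age; the formula is invented
# purely so that the result depends on the sampled age.
toy_carrier_prob <- function(age) 1 / (1 + exp((age - 50) / 10))

# Sample an unknown diagnosis age several times, bounded above by current age.
current_age  <- 60
impute_times <- 20
sampled_ages <- sample(30:current_age, impute_times, replace = TRUE)
runs <- vapply(sampled_ages, toy_carrier_prob, numeric(1))

# Summaries reported in the output: mean, minimum and maximum over the runs.
imputation_summary <- c(estimate = mean(runs),
                        lower    = min(runs),
                        upper    = max(runs))
imputation_summary
```

Note that ‘lower’ and ‘upper’ defined this way describe the range across imputations and should not be read as a confidence interval.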

When impute.times is high (say, 50 or more), it is recommended to set the parameter parallel to TRUE. The algorithm will then use the foreach package and the existing cores in one’s machine to execute the imputations in a parallel fashion, instead of sequentially, thereby speeding up the computation.

Output

For each proband in the pedigree, the output consists of:

  • estimates of carrier probabilities,

  • lower and upper bound estimates of carrier probabilities if imputations were made for missing data,

  • estimates of future risks of cancers in 5-year intervals (the user can also change the length of the intervals),

  • lower and upper bound estimates of future risks of cancers in 5-year intervals if imputations were made for missing data.

Messages or warnings generated from checkFam have been omitted in the example below for brevity.

output <- PanelPRO(pedigree = test_fam_1,
                   cancers = c("Breast", "Ovarian"),
                   genes = c("BRCA1", "BRCA2", "ATM", "MSH2"),
                   max.mut = 2,
                   parallel = FALSE)
## Your model has two cancers - Breast, Ovarian and four genes - BRCA1_hetero_anyPV ...
output
## $posterior.prob
## $posterior.prob$`6`
##    genes                                 estimate     lower        upper
## 1  noncarrier                            6.895857e-01 6.852634e-01 6.901750e-01
## 2  BRCA1_hetero_anyPV                    3.028050e-01 3.009048e-01 3.030639e-01
## 3  BRCA2_hetero_anyPV                    3.486877e-04 2.817221e-04 4.216045e-04
## 4  ATM_hetero_anyPV                      4.027689e-03 3.904178e-03 5.532214e-03
## 5  MSH2_hetero_anyPV                     9.814770e-04 5.697486e-04 4.574016e-03
## 6  BRCA1_hetero_anyPV.BRCA2_hetero_anyPV 1.462717e-04 1.181799e-04 1.768601e-04
## 7  BRCA1_hetero_anyPV.ATM_hetero_anyPV   1.689866e-03 1.638046e-03 2.321095e-03
## 8  BRCA2_hetero_anyPV.ATM_hetero_anyPV   7.125399e-07 6.081245e-07 9.481190e-07
## 9  BRCA1_hetero_anyPV.MSH2_hetero_anyPV  4.118459e-04 2.390780e-04 9.481190e-07
## 10 BRCA2_hetero_anyPV.MSH2_hetero_anyPV  4.118459e-04 1.361573e-07 9.481190e-07
## 11 ATM_hetero_anyPV.MSH2_hetero_anyPV    2.541503e-06 1.637872e-06 9.481190e-07
##
##
## $future.risk
## $future.risk$`6`
## $future.risk$`6`$Breast
##
##   ByAge estimate   lower      upper
## 1 60    0.04694233 0.04691714 0.04707834
## 2 65    0.09340834 0.09336108 0.09367413
## 3 70    0.13709254 0.13702686 0.13747579
## 4 75    0.17464537 0.17456501 0.17513057
## 5 80    0.20483919 0.20474732 0.20541009
## 6 85    0.22687210 0.22677195 0.22750817
## 7 90    0.23927170 0.23916714 0.23994509
##
## $future.risk$`6`$Ovarian
##   ByAge estimate   lower      upper
## 1 60    0.03600079 0.03598267 0.17954734
## 2 65    0.07186345 0.07183188 0.17954734
## 3 70    0.10464159 0.10460004 0.10477485
## 4 75    0.13238354 0.13233432 0.13253217
## 5 80    0.15520230 0.13233432 0.15536227
## 6 85    0.17183044 0.17177152 0.15536227
## 7 90    0.17960786 0.17954734 0.15536227
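
Since the result is a nested list indexed by proband ID, individual quantities can be pulled out with ordinary list and data.frame indexing. The sketch below uses a mock object (output_mock, a hypothetical name) that mimics the nesting of the printed output and reuses its first three rows, so that it runs without the package; it then derives the probability of carrying at least one pathogenic variant in any of the modeled genes.

```r
# Mock object with the same nesting as the printed output; the three rows are
# copied from the listing, the remaining rows are left out for brevity.
output_mock <- list(
  posterior.prob = list(
    "6" = data.frame(
      genes    = c("noncarrier", "BRCA1_hetero_anyPV", "BRCA2_hetero_anyPV"),
      estimate = c(6.895857e-01, 3.028050e-01, 3.486877e-04),
      lower    = c(6.852634e-01, 3.009048e-01, 2.817221e-04),
      upper    = c(6.901750e-01, 3.030639e-01, 4.216045e-04)
    )
  )
)

pp <- output_mock$posterior.prob[["6"]]   # the list name is the proband ID

# Probability of carrying at least one pathogenic variant in any modeled gene.
any_carrier <- 1 - pp$estimate[pp$genes == "noncarrier"]
any_carrier   # 0.3104143
```

Taking the complement of the noncarrier row avoids summing the single and multiple carrier rows by hand.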

The package includes the function visRisk to visualize the output graphically. Figure 4 demonstrates this usage for test_fam_1. The visRisk function was implemented using the plotly (Sievert, 2020) package, so that the output can be rendered interactively and display the exact probabilities upon hovering.

Figure 4. Sample output using the visRisk function.

Additional examples

In this section, we display some of the other test pedigrees included in PanelPRO and their corresponding output from the visRisk function, as well as a comparison to some other platforms and models. We compared PanelPRO to: BRCAPRO and MMRPRO in BayesMendel (Chen et al., 2004), CanRisk (Lee et al., 2019; Carver et al., 2021), IBIS (Tyrer et al., 2004), and PREMM-5 (Kastrinos et al., 2017), which all support different cancers, genes and model assumptions. Table 4 summarizes the supported inputs and outputs of each model. If a particular model or platform does not support certain cancers, the irrelevant family history is simply omitted from input. For brevity, Figures 5 to 9 provide visualizations of the pedigrees of these additional examples, and the corresponding model outputs are reported as figure supplements. In all cases, we used the default model settings (if any). Many of the aforementioned platforms have long reports as outputs, so we have only included the portions concerned with carrier probabilities and future risks. The same information contained in the PanelPRO sample pedigrees is input to the other models; however, not all of the features are used by these other platforms. Conversely, there are some inputs for the other models that PanelPRO does not include.

Table 4
Comparison between supported cancers and genes in PanelPRO and other platforms.
| Model or platform name | Version | Supported cancer input types | Supported gene carrier probability outputs | Supported future cancer risk outputs |
|---|---|---|---|---|
| PanelPRO | 0.2.0 | Brain, breast, cervical, colorectal, endometrial, gastric, kidney, leukemia, melanoma, ovarian, osteosarcoma, pancreatic, small intestine, soft tissue sarcoma, thyroid, urinary bladder, hepatobiliary | APC, ATM, BARD1, BMPR1A, BRCA1, BRCA2, BRIP1, CDH1, CDK4, CDKN2A, CHEK2, EPCAM, MLH1, MSH2, MSH6, MUTYH, NBN, PALB2, PMS2, PTEN, RAD51C, RAD51D, STK11, TP53 | Same as cancer inputs |
| BRCAPRO | 2.1-7 | Breast, ovarian | BRCA1, BRCA2 | Same as cancer inputs |
| MMRPRO | 2.1-7 | Colorectal, endometrial | MLH1, MSH2, MSH6 | Same as cancer inputs |
| IBIS | 0.8b | Breast | NA | Breast |
| CanRisk | 1.2.3 | Breast, contralateral breast, ovarian, prostate, pancreatic | BRCA1, BRCA2, PALB2, CHEK2, ATM, RAD51D, RAD51C, BRIP1 | Breast, ovarian |
| PREMM-5 | NA | Colorectal, endometrial, other (group of ovarian, stomach, small intestine, urinary tract/bladder/kidney, bile ducts, brain, pancreas, sebaceous gland skin) | MLH1, MSH2, MSH6, PMS2, EPCAM | NA |
Figure 5 (with 3 supplements)
Sample pedigree test_fam_5.
Figure 6 (with 5 supplements)
Sample pedigree test_fam_6.
Figure 7 (with 3 supplements)
Sample pedigree test_fam_7.
Figure 8 (with 5 supplements)
Sample pedigree test_fam_10.
Figure 9 (with 5 supplements)
Sample pedigree test_fam_11.

Notably, the cancers and genes supported by each of these other models are subsets of those supported in PanelPRO, except for PREMM-5, which also considers other Lynch syndrome-associated cancers that are not currently included in PanelPRO (bile duct and sebaceous gland). However, PREMM-5 does not provide future risk estimates for these additional cancers; it reports only carrier probabilities for MLH1, MSH2, MSH6, PMS2, and EPCAM.

Comparing PanelPRO with BRCAPRO and MMRPRO, we see that PanelPRO offers carrier probability estimates for a larger set of genes, as well as a graphical output of the future risk. IBIS does not give any estimates for carrier probabilities; however, it gives a text summary of the future risks, relative to population averages. Finally, PREMM-5 gives a single estimate of the probability of carrying a mutation in any of five genes (MLH1, MSH2, MSH6, PMS2, or EPCAM), whereas PanelPRO gives an estimate for each of those genes individually. PREMM-5 also does not give estimates for future risks of cancer.

Two pedigrees that illustrate the differences between PanelPRO and PREMM-5 are test_fam_7 (Figure 7) and test_fam_11 (Figure 9). For test_fam_7, PanelPRO estimates a 42.3% probability of carrying an MLH1, MSH2, MSH6, PMS2, or EPCAM mutation (when excluding the possibility of multiple simultaneous gene mutations), compared to 32% for PREMM-5. This pedigree only contains family history of colorectal and endometrial cancers, which PREMM-5 uses as key risk factors, leading to broadly similar results. In contrast, the mutation probability estimates from the same two models for test_fam_11 are quite different. This pedigree contains history of endometrial, small intestine, ovarian, and pancreatic cancers, but PREMM-5 groups the latter three cancers into a single risk factor for any other Lynch syndrome-associated cancers. The differences in the model approaches and assumptions result in PanelPRO giving a 93% estimate for a mutation in any of the aforementioned genes (without multiple simultaneous gene mutations), while PREMM-5 returns 3.2%. test_fam_11 is an extreme pedigree, but it nonetheless illustrates the flexibility of PanelPRO for incorporating very detailed pedigree information with a high clinical impact.
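The phrase "when excluding the possibility of multiple simultaneous gene mutations" matters for how a combined probability is formed. If each individual can carry at most one mutated gene, the single-gene genotypes are mutually exclusive, so the probability of carrying a mutation in any gene of a set is the sum of the single-gene posterior probabilities. The following Python sketch illustrates that arithmetic only. The posterior values are invented placeholders, not PanelPRO output, and `prob_any_carrier` is a hypothetical helper rather than a function of the package.

```python
# Hypothetical posterior over genotypes for one counselee, assuming at most
# one mutated gene per individual. Values are invented for illustration and
# sum to 1; they are NOT PanelPRO results.
posterior = {
    "noncarrier": 0.60,
    "MLH1": 0.15,
    "MSH2": 0.12,
    "MSH6": 0.05,
    "PMS2": 0.02,
    "EPCAM": 0.01,
    "BRCA1": 0.03,
    "BRCA2": 0.02,
}

# The five genes that PREMM-5 reports on jointly.
LYNCH_GENES = ("MLH1", "MSH2", "MSH6", "PMS2", "EPCAM")


def prob_any_carrier(posterior, genes):
    """Probability of carrying a mutation in any of `genes`.

    Valid only when genotypes are mutually exclusive, i.e. when multiple
    simultaneous mutations are excluded.
    """
    return sum(posterior[gene] for gene in genes)


p_lynch = prob_any_carrier(posterior, LYNCH_GENES)
print(f"P(mutation in any Lynch syndrome gene) = {p_lynch:.3f}")
```

If multiple simultaneous mutations were allowed, the multi-gene genotypes that include any of the listed genes would also have to be added to the sum.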

Implementation summary

We list the key functions with their input(s) and output(s) in Table 3. The PanelPRO function calls the pre-processing functions and the algorithm engine in the back-end, so we expect that most users will only need to use this main function. However, the other functions can be called separately if desired. For example, users can call buildDatabase to inspect the database of model parameters or run checkFam to examine the pedigree after it has been checked.

Methods: Mendelian modeling

In this section, we give the mathematical details of the main PanelPROCalc engine, which encompasses approximating genotype distributions of counselees and their future cancer risks.

Genotype probabilities

PanelPRO predicts an individual’s probability of having a specified genotype. We use the notation in Table 5. Without loss of generality, let the subscript i=1 represent the counselee (i.e. the individual who is counseled). For simplicity, we only consider one counselee, although the model can handle multiple counselees in a computationally efficient manner. The counselee’s genotype probability is: 

$$
P(\mathbf{G}_1 \mid \mathbf{H}, \mathbf{U}). \tag{1}
$$
Table 5
Notation for Mendelian Modeling for a model with K genes and R cancers and a family of I members.

The subscript $i$ denotes the $i$-th family member.

| Variable and notation | Description | R object from user input, if applicable |
| --- | --- | --- |
| Genotypes | | |
| $\mathbf{G}_i = (G_{ki})_{k=1}^{K}$ | Genotype of individual $i$, where $G_{ki}$ is the binary indicator for carrying a deleterious mutation in the $k$-th gene | |
| $\mathbf{G} = (\mathbf{G}_i)_{i=1}^{I}$ | Genotypes of all family members $i = 1, \dots, I$ | |
| Sex | | |
| $U_i$ | Binary indicator that individual $i$ is male | Sex |
| $\mathbf{U} = (U_i)_{i=1}^{I}$ | Binary male indicators for all family members $i = 1, \dots, I$ | |
| Cancer history | | |
| $T_{ri}$ | Age of diagnosis of the $r$-th cancer for individual $i$ | AgeXX |
| $C_i$ | Individual $i$’s censoring age (current age or age of death) | CurAge |
| $\delta_{ri} = I(T_{ri} \le C_i)$ | Binary indicator that cancer $r$ occurs before the censoring age for individual $i$ | |
| $\mathbf{H}_{ri} = \begin{cases} (C_i, \delta_{ri}) & \text{if } \delta_{ri} = 0 \\ (C_i, \delta_{ri}, T_{ri}) & \text{if } \delta_{ri} = 1 \end{cases}$ | Observed history of the $r$-th cancer for individual $i$, not including risk modifiers and interventions | |
| $\mathbf{H}_i = (\mathbf{H}_{ri})_{r=1}^{R}$ | All observed history for individual $i$ | |
| $\mathbf{H} = (\mathbf{H}_i)_{i=1}^{I}$ | Observed histories for all family members $i = 1, \dots, I$ | |
| $T_{d,ri}$ | Individual $i$’s age of death from causes other than cancer $r$ | |
| $T^*_{ri} = \min(T_{ri}, T_{d,ri})$ | Individual $i$’s age of first outcome, either cancer $r$ or death from causes other than cancer $r$ | |
| $J_{ri} = I(T^*_{ri} = T_{ri})$ | Binary indicator that individual $i$ develops the $r$-th cancer | isAffXX |

Using Bayes’ rule, the law of total probability and the assumption of independence of family phenotypes given genotypes and sex, this can be written as

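$$
\begin{aligned}
P(\mathbf{G}_1 \mid \mathbf{H}, \mathbf{U})
&\propto P(\mathbf{G}_1) \sum_{\mathbf{G}_2, \dots, \mathbf{G}_I} \prod_{i=1}^{I} P(\mathbf{H}_i \mid \mathbf{G}_i, U_i) \, P(\mathbf{G}_2, \dots, \mathbf{G}_I \mid \mathbf{G}_1) \\
&= P(\mathbf{G}_1) \sum_{\mathbf{G}_2, \dots, \mathbf{G}_I} \prod_{r=1}^{R} \prod_{i=1}^{I} P(\mathbf{H}_{ri} \mid \mathbf{G}_i, U_i) \, P(\mathbf{G}_2, \dots, \mathbf{G}_I \mid \mathbf{G}_1).
\end{aligned}
\tag{2}
$$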

From this representation of the posterior probability, we can clearly see the model and user inputs to PanelPRO. $P(\mathbf{G}_1)$ represents the allele frequencies for each gene in the model. $P(\mathbf{H}_{ri} \mid \mathbf{G}_i, U_i)$ are derived from the cancer penetrances $P(T_{ri} = t \mid \mathbf{G}_i, U_i)$. Explicitly,

$$
P(\mathbf{H}_{ri} \mid \mathbf{G}_i, U_i) =
\begin{cases}
1 - \sum_{s=1}^{C_i} P(T_{ri} = s \mid \mathbf{G}_i, U_i) & \text{if } \delta_{ri} = 0 \\
P(T_{ri} = T_{ri}^{\text{obs}} \mid \mathbf{G}_i, U_i) & \text{if } \delta_{ri} = 1
\end{cases}
$$

where $T_{ri}$ is the random variable and $T_{ri}^{\text{obs}}$ is the observed cancer age. By default, the allele frequencies and penetrances are obtained from existing peer-reviewed studies and estimates, but are completely customizable within PanelPRO.
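As a concrete reading of the two cases above, the following Python sketch evaluates the likelihood contribution of one cancer history from a vector of age-specific penetrances, and then applies Bayes' rule for the simplest possible family, a counselee with no relatives (Equation 2 with $I = 1$, where the sum over relatives' genotypes disappears). It is not the PanelPRO implementation: the function names, the flat toy penetrances and the prior are all assumptions made for illustration.

```python
from typing import Dict, Optional, Sequence


def history_likelihood(
    penetrance: Sequence[float],
    censor_age: int,
    diagnosis_age: Optional[int] = None,
) -> float:
    """P(H | G, U) for one cancer and one individual.

    penetrance[s - 1] holds P(T = s | G, U) for ages s = 1, 2, ...
    diagnosis_age=None encodes delta = 0 (no diagnosis by the censoring age).
    """
    if diagnosis_age is None:
        # delta = 0: probability of remaining cancer-free through censor_age.
        return 1.0 - sum(penetrance[:censor_age])
    if diagnosis_age > censor_age:
        raise ValueError("diagnosis age cannot exceed the censoring age")
    # delta = 1: penetrance at the observed age of diagnosis.
    return penetrance[diagnosis_age - 1]


def posterior_single_person(
    prior: Dict[str, float], likelihood: Dict[str, float]
) -> Dict[str, float]:
    """Posterior genotype probabilities for a counselee with no relatives."""
    unnormalized = {g: prior[g] * likelihood[g] for g in prior}
    total = sum(unnormalized.values())
    return {g: weight / total for g, weight in unnormalized.items()}


# Toy inputs (invented): flat yearly penetrance up to age 80 and a 1% prior.
penetrance = {"noncarrier": [0.001] * 80, "carrier": [0.01] * 80}
prior = {"noncarrier": 0.99, "carrier": 0.01}

# Counselee currently aged 45 who was diagnosed at age 40 (delta = 1).
likelihood = {
    g: history_likelihood(p, censor_age=45, diagnosis_age=40)
    for g, p in penetrance.items()
}
posterior = posterior_single_person(prior, likelihood)
print(posterior)
```

For a real pedigree the likelihood terms of all relatives enter through the sum over their genotypes, which is the part handled by the peeling-paring algorithm.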

Since the genotype space $\{(\mathbf{G}_2, \dots, \mathbf{G}_I) : \mathbf{G}_i \in \{0,1\}^K, \; i = 2, \dots, I\}$ is large for large values of $K$, we use the peeling-paring algorithm (Madsen et al., 2018) as an approximation, only allowing a pre-specified number of mutations to be simultaneously present in the same individual. The pedigree structure from the user input is used to derive the $P(\mathbf{G}_2, \dots, \mathbf{G}_I \mid \mathbf{G}_1)$ term in Equation 2 using Mendelian laws of inheritance.
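To give a sense of how much the paring restriction shrinks the per-individual genotype space, the sketch below counts the binary genotype vectors over $K$ genes that carry at most a given number of mutations, $\sum_{j=0}^{m} \binom{K}{j}$, and enumerates them for a small case. With $K = 24$ genes and at most 2 simultaneous mutations this gives 301 genotypes per individual, compared with $2^{24} = 16{,}777{,}216$ without the restriction. The function names are hypothetical and the code illustrates only the counting argument, not the peeling-paring algorithm itself.

```python
from itertools import combinations
from math import comb


def count_genotypes(n_genes: int, max_mutations: int) -> int:
    """Number of binary genotype vectors with at most max_mutations ones."""
    top = min(n_genes, max_mutations)
    return sum(comb(n_genes, j) for j in range(top + 1))


def enumerate_genotypes(genes, max_mutations):
    """Yield each allowed genotype as a tuple of 0/1 carrier indicators."""
    for j in range(min(len(genes), max_mutations) + 1):
        for carried in combinations(genes, j):
            yield tuple(int(gene in carried) for gene in genes)


# Pared versus full genotype space for one individual and 24 genes.
print(count_genotypes(24, 2), 2 ** 24)

# Small example that can be listed in full: three genes, at most two mutations.
for genotype in enumerate_genotypes(["BRCA1", "BRCA2", "ATM"], 2):
    print(genotype)
```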

Future cancer risk

PanelPRO also estimates future cancer risk, based on the previously calculated genotype distribution of the individual. Suppose the counselee has not developed the $r$-th cancer by their current age. Then the risk of developing the $r$-th cancer within $t_0$ years is

$$
P(T^*_{r1} \le C_1 + t_0, J_{r1} = 1 \mid \mathbf{H}, \mathbf{U}) = \sum_{\mathbf{G}_1} P(T^*_{r1} \le C_1 + t_0, J_{r1} = 1 \mid \mathbf{G}_1, \mathbf{U}) \, P(\mathbf{G}_1 \mid \mathbf{H}, \mathbf{U}). \tag{3}
$$

Equation 3 produces so-called ‘crude’ risk, since competing risks of death from causes other than the specified cancer are accounted for. Thus, the reported future risk is the probability that the counselee develops the $r$-th cancer within the next $t_0$ years and does not die from other causes beforehand, given the cancer history and sexes of the family. $P(T^*_{r1} \le C_1 + t_0, J_{r1} = 1 \mid \mathbf{G}_1, \mathbf{U})$ is the crude penetrance and is also a model input with default values estimated from the literature.
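Equation 3 is a mixture: the genotype-specific risks are averaged with the posterior genotype probabilities as weights. The Python sketch below evaluates that sum for two horizons $t_0$, which is how a risk curve over time can be built up. All numbers are invented placeholders, and `future_risk` is a hypothetical helper, not part of the package.

```python
def future_risk(posterior, risk_by_genotype):
    """Posterior-weighted average of genotype-specific future risks."""
    return sum(posterior[g] * risk_by_genotype[g] for g in posterior)


# Invented posterior genotype probabilities for the counselee.
posterior = {"noncarrier": 0.70, "BRCA1": 0.20, "BRCA2": 0.10}

# Invented genotype-specific risks of one cancer within t0 years, keyed by t0.
risk_by_horizon = {
    5: {"noncarrier": 0.01, "BRCA1": 0.08, "BRCA2": 0.06},
    10: {"noncarrier": 0.02, "BRCA1": 0.17, "BRCA2": 0.13},
}

risk_curve = {
    t0: future_risk(posterior, risks) for t0, risks in risk_by_horizon.items()
}
for t0, risk in risk_curve.items():
    print(f"{t0}-year risk: {risk:.3f}")
```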

PanelPRO also provides the option to report ‘net’ future risk, which is the probability that the counselee develops the $r$-th cancer in a hypothetical world where they cannot die from other causes, given the cancer history and sexes of the family. This risk type is less realistic, but some clinicians find it useful because it focuses on the specified cancer and lets them qualitatively factor in patient-specific covariates that may affect the patient’s risk. To report net future risk, PanelPRO uses the net penetrances $P(T_{ri} = t \mid \mathbf{G}_i, U_i)$. Note that the genotype probabilities in Equation 2 are calculated using net penetrances, as we do not collect death from other causes as a user input.
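The difference between the two risk types can be made concrete for a single genotype with a small discrete-time calculation. The sketch below is not how PanelPRO obtains its crude penetrances, which the text describes as model inputs estimated from the literature. It assumes, purely for illustration, that cancer onset and death from other causes are independent, that the counselee is alive and cancer-free at the current age, and that a tie in the same year counts as cancer occurring first. The function names and the flat toy hazards are hypothetical.

```python
from typing import Sequence


def net_risk(cancer_pmf: Sequence[float], current_age: int, horizon: int) -> float:
    """P(C < T <= C + t0 | T > C), ignoring death from other causes.

    cancer_pmf[s - 1] holds the net penetrance P(T = s).
    """
    cancer_free = 1.0 - sum(cancer_pmf[:current_age])
    window = cancer_pmf[current_age:current_age + horizon]
    return sum(window) / cancer_free


def crude_risk(
    cancer_pmf: Sequence[float],
    death_pmf: Sequence[float],
    current_age: int,
    horizon: int,
) -> float:
    """Risk of cancer within the horizon before dying of other causes.

    death_pmf[s - 1] holds P(T_d = s). Assumes T and T_d are independent.
    """
    cancer_free = 1.0 - sum(cancer_pmf[:current_age])
    alive = 1.0 - sum(death_pmf[:current_age])
    total = 0.0
    for age in range(current_age + 1, current_age + horizon + 1):
        # Probability of not having died of other causes before this age.
        alive_before = 1.0 - sum(death_pmf[:age - 1])
        total += cancer_pmf[age - 1] * alive_before
    return total / (cancer_free * alive)


# Toy inputs (invented): flat yearly probabilities up to age 100.
cancer_pmf = [0.002] * 100
death_pmf = [0.01] * 100

net = net_risk(cancer_pmf, current_age=40, horizon=10)
crude = crude_risk(cancer_pmf, death_pmf, current_age=40, horizon=10)
print(f"net 10-year risk:   {net:.4f}")
print(f"crude 10-year risk: {crude:.4f}")
```

Under these assumptions the crude risk can never exceed the net risk, and the two coincide when there is no other-cause mortality.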

Discussion

PanelPRO is a highly flexible package which provides an interface to efficiently calculate carrier probabilities for a wide array of cancer susceptibility genes, as well as future cancer risks. It is designed for R users. Similarly to the BayesMendel package, it can provide the computational engine behind clinical and counseling decision support tools.

It excels in being fully customizable. Any combination of the 24 genes and 18 cancers currently in version 0.2.0 of the package can be included in the model. New genes and cancers can easily be added, and in fact the code allows for an arbitrary number of genes and cancers. Risk modifiers have been included for certain procedures, and more can be added as additional information becomes available. The user can also change the internal database of parameter values.

The package includes a comprehensive check on the input pedigree to ensure users are informed of potentially inconsistent or infeasible data entries. When it is possible to do so safely, the data is automatically remedied and the user is then notified. Otherwise, the program will halt with an informative error message. Once the pedigree is pre-processed, the posterior probabilities are calculated efficiently. For example, test_fam_1, which has 19 members and family history of 2 cancers, runs with all the default settings in a few seconds as shown in Figure 10. The polynomial run-time of the peeling-paring algorithm is alleviated with PanelPRO’s Rcpp implementation. Even when relaxing the maximum mutations (paring) parameter, the C++ implementation is able to handle the calculations efficiently. Run-times in these ranges are certainly appropriate for clinical use, as well as use in a research setting where possibly hundreds of pedigrees have to be processed through PanelPRO. Moreover, the peeling-paring algorithm run-time scales linearly in the number of family members in the pedigree and can handle hundreds of members in an inter-generational configuration easily.

Figure 10
Sample run-times for test_fam_1 evaluated by PanelPRO on the default settings, as a function of the number of genes considered.

The paring parameter is set to 2. These run time experiments were performed on a 2020 Linux machine with an 11th Gen Intel(R) i7-1165G7 chip at 2.80 GHz.

PanelPRO has two main limitations. First, the initial release does not handle pedigrees that contain loops. This functionality would be desirable in future releases, although loops in pedigrees are infrequent. Several studies propose either exact or approximate computations for pedigrees with loops (Stricker et al., 1995; Totir et al., 2009). Second, the polynomial scaling of peeling-paring in the number of genes becomes significant when many genes are incorporated. This is a concern because we aim for future releases to contain far more than 24 genes as data become available. Alternative algorithms with different time complexity properties, such as the Lander-Green family of algorithms (Lander and Green, 1987), should be explored. These algorithms scale linearly in the number of genes considered, but exponentially in the number of family members in the pedigree (Gao et al., 2009). A future objective for this package is to offer a choice of carrier probability calculation method, and ideally to select automatically the most efficient one for a given family size and total number of genes. Appropriate thresholds for these two parameters need to be determined by a comprehensive benchmarking exercise.

Appendix 1

In-depth package workflow

Cancer name mapping

Appendix 1—figure 1
PanelPRO in-depth package workflow.
Appendix 1—table 1
Abbreviations of cancers in PanelPRO.

| Short name | Long name | Short name | Long name |
| --- | --- | --- | --- |
| BRA | Brain | OC | Ovarian |
| BC | Breast | OST | Osteosarcoma |
| CER | Cervical | PANC | Pancreas |
| COL | Colorectal | PROS | Prostate |
| ENDO | Endometrial | SI | Small Intestine |
| GAS | Gastric | STS | Soft Tissue Sarcoma |
| KID | Kidney | THY | Thyroid |
| LEUK | Leukemia | UB | Urinary Bladder |
| MELA | Melanoma | HEP | Hepatobiliary |

Sample pedigrees

Appendix 1—table 2
Summary of sample pedigrees provided within the package.
| Pedigree name | Number of family members | Cancers present |
| --- | --- | --- |
| test_fam_1 | 19 | BC, OC |
| test_fam_2 | 25 | ENDO, PANC, SI |
| test_fam_3 | 50 | BRA, BC, COL, ENDO, GAS, KID, MELA, OC, PANC, PROS, SI |
| test_fam_4 | 9 | BC, OC, BRA, COL, PROS, ENDO, SI |
| test_fam_5 | 17 | BC, MELA |
| test_fam_6 | 19 | BC, ENDO, MELA, PANC |
| test_fam_7 | 19 | COL, ENDO |
| test_fam_8 | 20 | COL, PROS |
| test_fam_9 | 19 | BC, PROS |
| test_fam_10 | 21 | BC, COL |
| test_fam_11 | 16 | BC, ENDO, OC, PANC, SI |
| test_fam_12 | 21 | All cancers in Appendix 1—table 1 |
| err_fam_1 | 10 | BC, PANC |

Data availability

This manuscript introduces PanelPRO, a multi-gene, multi-cancer Mendelian model. Software for this model, including the model parameter database, is available to download for research use at https://projects.iq.harvard.edu/bayesmendel/panelpro.

The following data sets were generated
    Lee G, Liang JW, Zhang Q, Huang T, Choirat C, Parmigani G, Braun D (2021) PanelPRO R Package. ID projects.iq.harvard.edu/bayesmendel/panelpro.

References

    Plichta JK, Griffin M, Thakuria J, Hughes KS (2016) What’s new in genetic testing for cancer susceptibility? Oncology 30:787–799.
    Sievert C (2020) Interactive Web-Based Data Visualization with R, Plotly, and Shiny. Chapman and Hall/CRC.

Decision letter

  1. Goutham Narla
    Reviewing Editor; University of Michigan, United States
  2. Eduardo Franco
    Senior Editor; McGill University, Canada

In the interests of transparency, eLife publishes the most substantive revision requests and the accompanying author responses.

Acceptance summary:

The authors have done an admirable job responding to the reviewers and editorial board members. They have increased the number of cases/pedigrees analyzed using this novel modeling tool. They have also outlined how to practically use this program for users.

Decision letter after peer review:

Thank you for submitting your article "PanelPRO: An R package for multi-syndrome, multi-gene risk modeling for individuals with a family history of cancer" for consideration by eLife. Your article has been reviewed by 2 peer reviewers, one of whom is a member of our Board of Reviewing Editors, and the evaluation has been overseen by a Senior Editor. The reviewers have opted to remain anonymous.

The reviewers have discussed their reviews with one another, and the Editors have drafted this to help you prepare a revised submission.

Essential Revisions:

Overall the reviewers felt that this is an outstanding manuscript with potential broad impact in the cancer genetics and oncology community as highlighted in the reviews below. There are a couple of key revisions that are recommended prior to the manuscript being acceptable for publication.

1) Additional examples of cases and the output from the analysis performed on these example cases with comparison when appropriate to existing programs.

2) More details about the practical uses of the program in the clinic and discussion about the ability to import data from existing clinical platforms into PanelPRO.

Reviewer #1:

This is an outstanding paper that develops and validates a new germline analysis tool for inputting family history and germline sequencing data into a user friendly interface that can be then used to calculate the risk architecture of a given family based upon both ascertained family history information as well as DNA sequencing results.

There are a number of strengths in this manuscript including:

1. The work and program generated here is novel and has a number of advantages to the existing software / platforms currently available to the cancer genetics community – it is more comprehensive in genes that can be inputted and in the family history architecture that can be ascertained.

2. The paper is well written and can be clearly followed, with the methods laid out thoroughly and comprehensively.

3. The impact of this work to the cancer genetics and oncology community will be immediate and I can envision and predict widespread adoption of this software platform in the near term.

One weakness to be noted is that more example cases could be included as test and validation examples and to show the results from the analysis being performed. When appropriate and relevant perhaps a comparison of the results to currently available platforms to analyze germline sequencing data would be of value and strengthen the paper.

Reviewer #2:

This work describes an R package called PanelPRO that seeks to assist in identifying individuals at increased risk of cancer due to inherited germline mutations in multiple genes. This tool integrates pedigree data and, importantly, has the ability to be updated as new cancer predisposition genes are discovered and peer-reviewed data on cancer risk becomes available. Patient factors such as risk-reducing surgeries and tumor biomarkers can also be incorporated as part of the risk evaluation. This allows for more personalized risk predictions compared to existing risk prediction models.

Strengths:

This R package offers many advantages over existing programs with similar intended uses. The authors highlight this by noting the limited cancer types and genes considered by existing tools and contrasting this to the capabilities of PanelPRO. Specifically, the authors provide clear examples of this software's ability to incorporate family histories with several different cancer types and to provide an output for prediction of finding a germline pathogenic variant in multiple genes as well as individualized future cancer risk. It also allows users to include individual level risk factors and biomarkers, resulting in a more personalized risk assessment when compared to existing tools. There are several other features that can be changed based on user preference, providing users with a high level of customization. The flexibility to make updates to the R package to reflect changes in the field of cancer genetics is another strength of this work. PanelPRO is designed to be compatible with an existing risk-modeling package, BayesMendel.

The authors provide a clear workflow explaining the use of the package and provide the necessary information to access it in its currently available form. The different variables, notations, and customizable features of the program are clearly explained in the manuscript. Many important variables from the family history that cannot be included in risk assessments using currently available tools can be incorporated into PanelPRO. There are several examples showing different features and capabilities of the program that improve on existing tools. Instructions for use with pedigree information appear to be clear for those familiar with R software. Some limitations of the technology are noted but suggestions to address the described limitations are included.

Weaknesses:

While the strengths of the PanelPRO package are evident, discussion around its use in practice is lacking. By excluding this aspect, potential limitations related to implementation and use are not addressed here, but are key in determining the potential impact of this software in the field. It is not noted if this new package is intended for use in clinical practice, research, or both. Proposed users include users of the existing BayesMendel package, so providing information about users of the BayesMendel package (clinicians versus researchers, volumes, etc) could help readers determine possible applicability of PanelPRO to their own practice. While the authors posit new users may be interested in this software due to its enhanced abilities, further information on new potential users is not included. Due to the absence of discussion of use of this package in a clinical and/or research setting, the likely uptake and impact of this work on the field is difficult to determine. Broader audiences may benefit from more context surrounding existing cancer risk models and their use in cancer genetics to better appreciate the improvements noted here compared to traditional risk modeling programs.

One of the strengths of PanelPRO is the capability for software updates as new knowledge about hereditary cancer syndromes and associated pathogenic variants becomes available. Although this software has the ability to customize the allele frequency and penetrance for a requested gene, meaning it can accommodate predictions for genes not built into its software, built-in gene data relies on published, peer-reviewed data for allele frequencies and penetrance. Additional context surrounding how new genes get added to PanelPRO would help readers understand the significance of the work.

While the improvements compared to existing programs are clear, context around the current use of available risk models in practice, and specific examples of intended use, would help the reader better appreciate the potential significant impact of PanelPRO in clinical and/or research cancer genetics settings. Information about the ability to import data directly from popular pedigree programs would also help determine the potential uptake and impact.

https://doi.org/10.7554/eLife.68699.sa1

Author response

Essential Revisions:

Overall the reviewers felt that this is an outstanding manuscript with potential broad impact in the cancer genetics and oncology community as highlighted in the reviews below. There are a couple of key revisions that are recommended prior to the manuscript being acceptable for publication.

1) Additional examples of cases and the output from the analysis performed on these example cases with comparison when appropriate to existing programs.

We appreciate the reviewer's thoughtful comments. We agree that additional example cases and comparisons would strengthen the manuscript. We have added eight additional example pedigrees to PanelPRO version 0.2.0, with a total of thirteen pedigrees now provided as examples in the package (see revised manuscript lines 103-108 in the User Input subsection). These pedigrees cover a broad range of realistic and extreme examples that illustrate the utility of the PanelPRO model. In the manuscript, we have added a new subsection to the Methods section titled Additional Examples (see revised manuscript lines 236-273). We visualize five of these sample pedigrees, as well as their outputs based on PanelPRO with all the default settings (Figures 5-9 and their corresponding figure supplements). Where appropriate, depending on the cancers reported in the example pedigree, we compare these outputs to the outputs of BRCAPRO (breast and ovarian) and MMRPRO (colorectal and endometrial), both from the BayesMendel R package (Chen et al., 2004); CanRisk (breast and ovarian), also known as BOADICEA (Lee et al., 2019; Carver et al., 2021); IBIS (breast and ovarian) (Tyrer et al., 2004); and PREMM-5 (colorectal, endometrial, and any other Lynch syndrome-associated cancers) (Kastrinos et al., 2017). We have added Table 3 to the main text which summarizes the cancer inputs and output for each of these models. The advantage of PanelPRO is that it takes as input multiple cancers; the output includes both carrier probability and future risk estimates; future risks of cancers are shown over time; and the entire output figure is interactive, meaning that numerical estimates can be sought by hovering over the relevant data point(s).

2) More details about the practical uses of the program in the clinic and discussion about the ability to import data from existing clinical platforms into PanelPRO.

PanelPRO is currently available for research purposes (https://projects.iq.harvard.edu/bayesmendel/panelpro) and we may license it in the future for commercial clinical use. The existing BayesMendel package is also available for research purposes (https://projects.iq.harvard.edu/bayesmendel/bayesmendel-r-package) and is currently licensed for commercial clinical use to CRA Health, CancerGene Connect, Progeny, FamHis, MagView, Igentify, CancerIQ, and Finch genetics. For BayesMendel, we leave clinical integration to the software licensees, including the integration into electronic health records using software such as EPIC. We envision a similar dissemination plan for PanelPRO. Since R is capable of reading in a wide range of data types, from flat files to compressed data objects, there are many approaches for importing and formatting PanelPRO-compatible pedigrees from existing clinical platforms. We do hope, funding permitting, to develop an R Shiny app that provides a user interface to the package, but we leave integration of the PanelPRO model into existing clinical platforms to the companies who specialize in this. We have added a discussion on the practical usage of PanelPRO to the end of the Introduction section in lines 78-85.

Reviewer #1:

One weakness to be noted is that more example cases could be included as test and validation examples and to show the results from the analysis being performed. When appropriate and relevant perhaps a comparison of the results to currently available platforms to analyze germline sequencing data would be of value and strengthen the paper.

To address this weakness, we have added eight additional pedigrees, for a total of thirteen example cases in PanelPRO version 0.2.0 (lines 103-108 in the User Input subsection) (see also the response to reviewing editor, comment 1). These pedigrees cover a broad range of realistic and extreme examples that illustrate the utility of the PanelPRO model. In the manuscript, we provide visualizations for five of these sample pedigrees, as well as the results from applying PanelPRO with all the default settings to these pedigrees (Figures 5-9 and their corresponding figure supplements). We compare the PanelPRO results for these pedigrees to those of BRCAPRO and MMRPRO (from the BayesMendel R package) (Chen et al., 2004), CanRisk (also known as BOADICEA) (Lee et al., 2019; Carver et al., 2021), IBIS (Tyrer et al., 2004) and PREMM-5 (Kastrinos et al., 2017), when relevant (i.e. for pedigrees with cancer family history supported by both models). The output from PanelPRO provides both carrier probability and future risk estimates; future risks of cancers are shown over time; and the entire figure is interactive, meaning that numerical estimates can be sought by hovering over the relevant data point(s). Lines 236-273 and Table 3 in the newly added “Additional Examples” section summarize this analysis and discussion.

A pair of such examples illustrates the differences between PanelPRO and PREMM-5 for estimating the risk for gene mutations associated with Lynch syndrome. For the pedigree `test_fam_7`, the probability of a mutation in any of the genes MLH1, MSH2, MSH6, PMS2 or EPCAM is 42.3% according to PanelPRO (when excluding the possibility of multiple simultaneous gene mutations), whereas it is 32% for PREMM-5. This pedigree only contains history of colorectal and endometrial cancers, which PREMM-5 uses as key risk factors in their model, leading to similar results (though PanelPRO gives estimates of each individual gene in addition). In contrast, the mutation probability estimates between the same two models for `test_fam_11` are quite different. This pedigree is an extreme example for illustration purposes and is unlikely to appear in clinical settings. Nevertheless, it contains endometrial, small intestine, ovarian and pancreatic cancers, all of which are associated with Lynch syndrome, but the latter three are grouped, along with other cancers, into a single risk factor in the PREMM-5 model. It also contains a case of breast cancer in a second-degree relative, but for Lynch syndrome this is not a significant risk factor. The differences in the model approaches, along with differences in assumptions, result in PanelPRO giving a 93% estimate for a mutation in any of the aforementioned genes (without multiple simultaneous gene mutations), whilst PREMM-5 gives 3.2%. This is clearly an extreme pedigree in terms of the number and nature of the relevant cancers, and both models give a recommendation for further genetic evaluation at a cut-off of 2.5%; however, it illustrates the flexibility of PanelPRO for incorporating very detailed pedigree information that will have a high clinical impact.

Reviewer #2:

Weaknesses:

While the strengths of the PanelPRO package are evident, discussion around its use in practice is lacking. By excluding this aspect, potential limitations related to implementation and use are not addressed here, but are key in determining the potential impact of this software in the field. It is not noted if this new package is intended for use in clinical practice, research, or both. Proposed users include users of the existing BayesMendel package, so providing information about users of the BayesMendel package (clinicians versus researchers, volumes, etc) could help readers determine possible applicability of PanelPRO to their own practice. While the authors posit new users may be interested in this software due to its enhanced abilities, further information on new potential users is not included. Due to the absence of discussion of use of this package in a clinical and/or research setting, the likely uptake and impact of this work on the field is difficult to determine. Broader audiences may benefit from more context surrounding existing cancer risk models and their use in cancer genetics to better appreciate the improvements noted here compared to traditional risk modeling programs.

The reviewer raises excellent points about the practical usage of PanelPRO (see also the response to reviewing editor, comment 2). The existing BayesMendel package is used in research, as well as integrated into clinical tools such as CancerGene (Chen et al., 2004) (over 4,000 users in more than 75 countries), CRA Health (over 15,000 users per month), Progeny, FamHis, MagView, Igentify, CancerIQ, and Finch genetics. Similarly, we hope PanelPRO will be used by both clinicians and researchers. Family history has long been understood to be a key component for identifying risk and preventing heritable diseases, and clinical tools such as the ones which license BayesMendel are becoming more readily available (Welch et al., 2018). Cancer risk models like PanelPRO and the existing BayesMendel models incorporate family history and other information to quantify risk in terms of carrier probabilities and the future probability of disease development. Beyond current users of the BayesMendel package, we hope that PanelPRO's generality and relevance to panel testing will attract a broader audience in the current landscape for cancer clinical risk assessment. This additional context has been added to the end of the Introduction section in lines 74-75 and 81-85.

One of the strengths of PanelPRO is the capability for software updates as new knowledge about hereditary cancer syndromes and associated pathogenic variants becomes available. Although this software has the ability to customize the allele frequency and penetrance for a requested gene, meaning it can accommodate predictions for genes not built into its software, built-in gene data relies on published, peer-reviewed data for allele frequencies and penetrance. Additional context surrounding how new genes get added to PanelPRO would help readers understand the significance of the work.

As the reviewer notes, one must obtain estimates of the pathogenic variant allele frequency and the penetrance for at least one cancer in the database in order to add a new gene. Therefore, new genes will be added to PanelPRO based on regular literature reviews as conducted in the All Syndromes Known to Man Evaluator (ASK2ME) clinical tool (Braun et al., 2018). The ASK2ME approach identifies best-available studies that adjust for ascertainment; since many papers report odds ratios or relative risks, it then calculates absolute age-specific cancer penetrances when necessary.

In the current PanelPRO database, we incorporate cancer penetrances from data included in the BayesMendel package (primarily from meta-analyses) when available: the BRCA1 and BRCA2 estimates for the probability of developing breast or ovarian cancer (Chen et al., 2020); the MLH1, MSH2, and MSH6 estimates for the probability of developing colorectal or endometrial cancer (Wang et al., 2020; Felton et al., 2007); and the CDKN2A estimates for the probability of developing melanoma (Wang et al., 2010; Begg et al., 2005; Bishop et al., 2002). All other cancer penetrances were taken from ASK2ME (Braun et al., 2018).

For allele frequencies, we also perform a literature review. In the current PanelPRO database, we used the non-Ashkenazi, Ashkenazi Jewish, and Italian BRCA1 and BRCA2 allele frequency estimates from BRCAPRO (Chen et al., 2004; Antoniou et al., 2002); for MLH1, MSH2, and MSH6, we used the allele frequency estimates from MMRPRO (Chen et al., 2004, 2006); and for CDKN2A, we used the allele frequency estimate from MelaPRO (Chen et al., 2004; Berwick et al., 2006). Allele frequency estimates for ATM, CHEK2, and PALB2 were taken from Lee et al. (2016). The allele frequencies of the remaining genes were estimated from a 25-gene panel study of 252,223 individuals (Rosenthal et al., 2017) that did not adjust for ascertainment. For these genes, we rescaled the reported estimates by the ratio of the ascertained and unascertained allele frequencies for a gene reported in both our existing database and the study. We now include notes on adding new genes and obtaining model parameter estimates, including new citations, in lines 131-150 of the Model Input subsection.
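The rescaling step described above can be sketched as follows. This is one reading of that description, not the authors' code: it assumes the scale factor is the database (unascertained) frequency of a reference gene divided by its frequency in the ascertained panel study, so that the reference gene maps back to its database value. The function name and all frequencies are hypothetical.

```python
def rescale_allele_frequency(study_freq, ref_study_freq, ref_database_freq):
    """Rescale an ascertained allele frequency using a reference gene.

    Assumption (for illustration only): ascertainment inflates every
    gene's frequency by the same factor, estimated from a reference gene
    that appears in both the database and the panel study.
    """
    if ref_study_freq <= 0:
        raise ValueError("reference study frequency must be positive")
    scale = ref_database_freq / ref_study_freq
    return study_freq * scale


# Hypothetical values: the reference gene is 10 times more frequent in
# the ascertained study than in the database.
adjusted = rescale_allele_frequency(
    study_freq=0.004,         # new gene, as reported in the panel study
    ref_study_freq=0.010,     # reference gene, panel study
    ref_database_freq=0.001,  # reference gene, existing database
)
print(adjusted)
```

Applying the function to the reference gene itself returns its database frequency, which is a quick sanity check on the direction of the ratio.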

While the improvements compared to existing programs are clear, context around the current use of available risk models in practice, and specific examples of intended use, would help the reader better appreciate the potential significant impact of PanelPRO in clinical and/or research cancer genetics settings. Information about the ability to import data directly from popular pedigree programs would also help determine the potential uptake and impact.

We thank the reviewer for their recommendations (see also the response to reviewing editor, comment 2 and the response to reviewer 2, comment 1). As mentioned in previous responses, for BayesMendel, we leave clinical integration to the software licensees, including integration with electronic medical record systems such as EPIC. We envision a similar dissemination plan for PanelPRO. Since R is capable of reading in a wide range of data types, from flat files to compressed data objects, there are many approaches for directly importing pedigrees from existing clinical platforms. We do hope, funding permitting, to develop an R Shiny app that makes the package usable with less technical skill, but we leave integration of the PanelPRO model into existing clinical platforms to the companies that specialize in this. We have added a discussion on the practical usage of PanelPRO to the end of the Introduction section in lines 78-85.

References

Antoniou, A., Pharoah, P., McMullan, G., Day, N., Stratton, M., Peto, J., Ponder, B., and Easton, D. (2002). A comprehensive model for familial breast cancer incorporating BRCA1, BRCA2 and other genes. British journal of cancer, 86(1):76.

Begg, C. B., Orlow, I., Hummer, A. J., Armstrong, B. K., Kricker, A., Marrett, L. D., Millikan, R. C., Gruber, S. B., Anton-Culver, H., Zanetti, R., et al. (2005). Lifetime risk of melanoma in CDKN2A mutation carriers in a population-based sample. Journal of the National Cancer Institute, 97(20):1507–1515.

Berwick, M., Orlow, I., Hummer, A. J., Armstrong, B. K., Kricker, A., Marrett, L. D., Millikan, R. C., Gruber, S. B., Anton-Culver, H., Zanetti, R., et al. (2006). The prevalence of CDKN2A germ-line mutations and relative risk for cutaneous malignant melanoma: an international population-based study. Cancer Epidemiology and Prevention Biomarkers, 15(8):1520–1525.

Bishop, D. T., Demenais, F., Goldstein, A. M., Bergman, W., Bishop, J. N., Paillerets, B. B.-d., Chompret, A., Ghiorzo, P., Gruis, N., Hansson, J., et al. (2002). Geographical variation in the penetrance of CDKN2A mutations for melanoma. Journal of the National Cancer Institute, 94(12):894–903.

Braun, D., Yang, J., Griffin, M., Parmigiani, G., and Hughes, K. S. (2018). A clinical decision support tool to predict cancer risk for commonly tested cancer-related germline mutations. Journal of genetic counseling, 27(5):1187–1199.

Carver, T., Hartley, S., Lee, A., Cunningham, A. P., Archer, S., de Villiers, C. B., Roberts, J., Ruston, R., Walter, F. M., Tischkowitz, M., et al. (2021). CanRisk Tool: a web interface for the prediction of breast and ovarian cancer risk and the likelihood of carrying genetic pathogenic variants. Cancer Epidemiology and Prevention Biomarkers, 30(3):469–473.

Chen, J., Bae, E., Zhang, L., Hughes, K., Parmigiani, G., Braun, D., and Rebbeck, T. R. (2020). Penetrance of breast and ovarian cancer in women who carry a BRCA1/2 mutation and do not use risk-reducing salpingo-oophorectomy: An updated meta-analysis. JNCI cancer spectrum, 4(4):pkaa029.

Chen, S., Wang, W., Broman, K. W., Katki, H. A., and Parmigiani, G. (2004). BayesMendel: an R environment for Mendelian risk prediction. Statistical applications in genetics and molecular biology, 3(1):1–19.

Chen, S., Wang, W., Lee, S., Nafa, K., Lee, J., Romans, K., Watson, P., Gruber, S. B., Euhus, D., Kinzler, K. W., et al. (2006). Prediction of germline mutations and cancer risk in the Lynch syndrome. JAMA, 296(12):1479–1487.

Felton, K., Gilchrist, D., and Andrew, S. (2007). Constitutive deficiency in DNA mismatch repair: is it time for Lynch III? Clinical genetics, 71(6):499–500.

Kastrinos, F., Uno, H., Ukaegbu, C., Alvero, C., McFarland, A., Yurgelun, M. B., Kulke, M. H., Schrag, D., Meyerhardt, J. A., Fuchs, C. S., et al. (2017). Development and validation of the PREMM5 model for comprehensive risk assessment of Lynch syndrome. Journal of clinical oncology, 35(19):2165.

Lee, A., Mavaddat, N., Wilcox, A. N., Cunningham, A. P., Carver, T., Hartley, S., de Villiers, C. B., Izquierdo, A., Simard, J., Schmidt, M. K., et al. (2019). BOADICEA: a comprehensive breast cancer risk prediction model incorporating genetic and nongenetic risk factors. Genetics in Medicine, 21(8):1708–1718.

Lee, A. J., Cunningham, A. P., Tischkowitz, M., Simard, J., Pharoah, P. D., Easton, D. F., and Antoniou, A. C. (2016). Incorporating truncating variants in PALB2, CHEK2, and ATM into the BOADICEA breast cancer risk model. Genetics in Medicine, 18(12):1190.

Rosenthal, E. T., Bernhisel, R., Brown, K., Kidd, J., and Manley, S. (2017). Clinical testing with a panel of 25 genes associated with increased cancer risk results in a significant increase in clinically significant findings across a broad range of cancer histories. Cancer genetics, 218:58–68.

Tyrer, J., Duffy, S. W., and Cuzick, J. (2004). A breast cancer prediction model incorporating familial and personal risk factors. Statistics in medicine, 23(7):1111–1130.

Wang, C., Wang, Y., Hughes, K. S., Parmigiani, G., and Braun, D. (2020). Penetrance of colorectal cancer among mismatch repair gene mutation carriers: A meta-analysis. JNCI Cancer Spectrum.

Wang, W., Niendorf, K. B., Patel, D., Blackford, A., Marroni, F., Sober, A. J., Parmigiani, G., and Tsao, H. (2010). Estimating CDKN2A carrier probability and personalizing cancer risk assessments in hereditary melanoma using MelaPRO. Cancer research, 70(2):552–559.

Welch, B. M., Wiley, K., Pflieger, L., Achiangia, R., Baker, K., Hughes-Halbert, C., Morrison, H., Schiffman, J., and Doerr, M. (2018). Review and comparison of electronic patient-facing family health history tools. Journal of genetic counseling, 27(2):381–391.

https://doi.org/10.7554/eLife.68699.sa2

Article and author information

Author details

  1. Gavin Lee

    Swiss Data Science Center, ETH Zürich and EPFL, Lausanne, Switzerland
    Contribution
    Software, Investigation, Methodology, Writing - original draft
    Contributed equally with
    Jane W Liang
    Competing interests
    No competing interests declared
    ORCID: 0000-0003-2659-1163
  2. Jane W Liang

    1. Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, United States
    2. Department of Data Sciences, Dana-Farber Cancer Institute, Boston, United States
    Contribution
    Software, Formal analysis, Validation, Methodology, Writing - review and editing
    Contributed equally with
    Gavin Lee
    Competing interests
    No competing interests declared
    ORCID: 0000-0002-2302-3809
  3. Qing Zhang

    Broad Institute of MIT and Harvard, Cambridge, United States
    Contribution
    Software, Validation, Investigation, Methodology
    Competing interests
    No competing interests declared
  4. Theodore Huang

    1. Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, United States
    2. Department of Data Sciences, Dana-Farber Cancer Institute, Boston, United States
    Contribution
    Conceptualization, Data curation, Software, Investigation, Methodology, Writing - review and editing
    Competing interests
    No competing interests declared
  5. Christine Choirat

    Swiss Data Science Center, ETH Zürich and EPFL, Lausanne, Switzerland
    Contribution
    Supervision, Investigation, Methodology, Project administration, Writing - review and editing
    Competing interests
    No competing interests declared
  6. Giovanni Parmigani

    1. Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, United States
    2. Department of Data Sciences, Dana-Farber Cancer Institute, Boston, United States
    Contribution
    Conceptualization, Formal analysis, Supervision, Methodology, Project administration, Writing - review and editing
    Competing interests
    No competing interests declared
  7. Danielle Braun

    1. Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, United States
    2. Department of Data Sciences, Dana-Farber Cancer Institute, Boston, United States
    Contribution
    Conceptualization, Supervision, Methodology, Project administration, Writing - review and editing
    For correspondence
    bmendel@jimmy.harvard.edu
    Competing interests
    No competing interests declared
    ORCID: 0000-0002-5177-8598

Funding

National Institutes of Health (5T32CA009337)

  • Jane W Liang
  • Theodore Huang

National Institutes of Health (2T32CA009001)

  • Theodore Huang

National Institutes of Health (4P30CA006516)

  • Giovanni Parmigani

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Acknowledgements

We gratefully acknowledge support from the National Cancer Institute at the National Institutes of Health grants 5T32CA009337 (JWL and TH), 2T32CA009001 (TH), and 4P30CA006516 (GP).

Senior Editor

  1. Eduardo Franco, McGill University, Canada

Reviewing Editor

  1. Goutham Narla, University of Michigan, United States

Publication history

  1. Received: March 24, 2021
  2. Accepted: August 16, 2021
  3. Accepted Manuscript published: August 18, 2021 (version 1)
  4. Version of Record published: September 28, 2021 (version 2)

Copyright

© 2021, Lee et al.

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.

Cite this article

Gavin Lee, Jane W Liang, Qing Zhang, Theodore Huang, Christine Choirat, Giovanni Parmigani, Danielle Braun (2021)
Multi-syndrome, multi-gene risk modeling for individuals with a family history of cancer with the novel R package PanelPRO
eLife 10:e68699.
https://doi.org/10.7554/eLife.68699
