Multi-syndrome, multi-gene risk modeling for individuals with a family history of cancer with the novel R package PanelPRO

  1. Gavin Lee
  2. Jane W Liang
  3. Qing Zhang
  4. Theodore Huang
  5. Christine Choirat
  6. Giovanni Parmigiani
  7. Danielle Braun (corresponding author)
  1. Swiss Data Science Center, ETH Zürich and EPFL, Switzerland
  2. Department of Biostatistics, Harvard T.H. Chan School of Public Health, United States
  3. Department of Data Sciences, Dana-Farber Cancer Institute, United States
  4. Broad Institute of MIT and Harvard, United States
11 figures, 7 tables and 1 additional file

Figures

Figure 1
PanelPRO package workflow.
Figure 2
test_fam_1 sample pedigree as included in the PanelPRO package, plotted using the external visPed package.

The colors refer to cancer diagnoses in the legend. The age of diagnosis is shown below the individual if it is known.

Figure 3
The sample pedigree err_fam_1, which contains a pedigree loop due to the mating of the siblings aged 59 and 55 with the siblings aged 60 and 62, respectively.

The two circles linked by a dotted line represent the same individual.

Figure 4
Sample output from the visRisk function.
Figure 5 with 3 supplements
Sample pedigree test_fam_5.
Figure 5—figure supplement 1
PanelPRO output with test_fam_5 as pedigree input.

Graphical output from the visRisk function in PanelPRO for test_fam_5.

Figure 5—figure supplement 2
BRCAPRO output with test_fam_5 as pedigree input.

Text output from the BRCAPRO model for test_fam_5.

Figure 5—figure supplement 3
IBIS output with test_fam_5 as pedigree input.

Screenshot of output from IBIS for test_fam_5.

Figure 6 with 5 supplements
Sample pedigree test_fam_6.
Figure 6—figure supplement 1
PanelPRO output with test_fam_6 as pedigree input.

Graphical output from the visRisk function in PanelPRO for test_fam_6.

Figure 6—figure supplement 2
BRCAPRO output with test_fam_6 as pedigree input.

Text output from the BRCAPRO model for test_fam_6.

Figure 6—figure supplement 3
MMRPRO output with test_fam_6 as pedigree input.

Text output from the MMRPRO model for test_fam_6.

Figure 6—figure supplement 4
IBIS output with test_fam_6 as pedigree input.

Screenshot of output from IBIS for test_fam_6.

Figure 6—figure supplement 5
PREMM-5 output with test_fam_6 as pedigree input.

Screenshot of output from PREMM-5 for test_fam_6.

Figure 7 with 3 supplements
Sample pedigree test_fam_7.
Figure 7—figure supplement 1
PanelPRO output with test_fam_7 as pedigree input.

Graphical output from the visRisk function in PanelPRO for test_fam_7.

Figure 7—figure supplement 2
MMRPRO output with test_fam_7 as pedigree input.

Text output from the MMRPRO model for test_fam_7.

Figure 7—figure supplement 3
PREMM-5 output with test_fam_7 as pedigree input.

Screenshot of output from PREMM-5 for test_fam_7.

Figure 8 with 5 supplements
Sample pedigree test_fam_10.
Figure 8—figure supplement 1
PanelPRO output with test_fam_10 as pedigree input.

Graphical output from the visRisk function in PanelPRO for test_fam_10.

Figure 8—figure supplement 2
BRCAPRO output with test_fam_10 as pedigree input.

Text output from the BRCAPRO model for test_fam_10.

Figure 8—figure supplement 3
MMRPRO output with test_fam_10 as pedigree input.

Text output from the MMRPRO model for test_fam_10.

Figure 8—figure supplement 4
IBIS output with test_fam_10 as pedigree input.

Screenshot of output from IBIS for test_fam_10.

Figure 8—figure supplement 5
PREMM-5 output with test_fam_10 as pedigree input.

Screenshot of output from PREMM-5 for test_fam_10.

Figure 9 with 5 supplements
Sample pedigree test_fam_11.
Figure 9—figure supplement 1
PanelPRO output with test_fam_11 as pedigree input.

Graphical output from the visRisk function in PanelPRO for test_fam_11.

Figure 9—figure supplement 2
BRCAPRO output with test_fam_11 as pedigree input.

Text output from the BRCAPRO model for test_fam_11.

Figure 9—figure supplement 3
MMRPRO output with test_fam_11 as pedigree input.

Text output from the MMRPRO model for test_fam_11.

Figure 9—figure supplement 4
IBIS output with test_fam_11 as pedigree input.

Screenshot of output from IBIS for test_fam_11.

Figure 9—figure supplement 5
PREMM-5 output with test_fam_11 as pedigree input.

Screenshot of output from PREMM-5 for test_fam_11.

Figure 10
Sample run-times for test_fam_1 evaluated by PanelPRO on the default settings, as a function of the number of genes considered.

The paring parameter is set to 2. These run-time experiments were performed on a 2020 Linux machine with an 11th Gen Intel(R) i7-1165G7 chip at 2.80 GHz.

Appendix 1—figure 1
PanelPRO in-depth package workflow.

Tables

Table 1
Pedigree structure in PanelPRO.
| Column | Definition | Value |
| --- | --- | --- |
| ID | Unique numeric identifier of each individual | Non-repeated strictly positive integer |
| MotherID | ID of one's mother | Strictly positive integer or NA (missing) |
| FatherID | ID of one's father | Strictly positive integer or NA (missing) |
| Sex | Sex of the individual: 1 for male, 0 for female | One of {0, 1} |
| isProband | 1 for the proband (counselee) and 0 otherwise; multiple probands can be specified | One of {0, 1} |
| CurAge | Age of censoring: either the current age or death age, depending on isDead status | Positive integer or NA (missing) |
| isAff* | Affection status of cancer * | One of {0, 1} |
| Age* | Affection age of cancer * | Positive integer or NA (missing) |
| isDead | Whether someone has died | One of {0, 1, NA} |
| race | Race of individual (used to modify penetrance) | One of All_Races, AIAN, Asian, Black, White, Hispanic, WH, WNH, NA |
| Ancestry | Ancestry of individual (used to modify allele frequencies) | One of AJ, nonAJ, Italian, NA |
| Twins | Identifies siblings who are identical twins or multiple births | Each set is identified by a unique integer, and 0 otherwise |
| riskmod | Preventive interventions that modify penetrance | List, combination of "mastectomy", "hysterectomy", and "oophorectomy" |
| InterAge | Age at each preventive intervention | List, combination of integers |
| Gene name from GENE_TYPES | Germline testing result | One of {0, 1, NA} |
| Marker name from CK14, CK5.6, ER, PR, HER2, MSI | Marker testing result | One of {0, 1, NA} |
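The column constraints in Table 1 can be illustrated with a small validation sketch. This is not PanelPRO code (the package is written in R and performs its own checks in checkFam). The function name check_pedigree is hypothetical, None stands in for NA, and two rules are assumptions rather than statements from the table: that a non-missing parent ID must belong to the pedigree, and that at least one proband must be flagged.

```python
def check_pedigree(rows):
    """Return a list of problems found; an empty list means none were detected."""
    problems = []
    ids = [row.get("ID") for row in rows]
    if len(set(ids)) != len(ids):
        problems.append("ID values must not repeat")
    for row in rows:
        pid = row.get("ID")
        if not (isinstance(pid, int) and pid > 0):
            problems.append(f"ID {pid!r} is not a strictly positive integer")
        for col in ("MotherID", "FatherID"):
            parent = row.get(col)  # None stands in for NA (missing)
            # Assumption: a non-missing parent ID must refer to a pedigree member.
            if parent is not None and parent not in ids:
                problems.append(f"{col} {parent!r} of individual {pid!r} is not in the pedigree")
        for col in ("Sex", "isProband"):
            if row.get(col) not in (0, 1):
                problems.append(f"{col} of individual {pid!r} must be 0 or 1")
        cur_age = row.get("CurAge")
        if cur_age is not None and not (isinstance(cur_age, int) and cur_age > 0):
            problems.append(f"CurAge of individual {pid!r} must be a positive integer or missing")
    # Assumption: at least one proband must be flagged.
    if not any(row.get("isProband") == 1 for row in rows):
        problems.append("no proband is flagged")
    return problems


# Illustrative three-person family; isAffBC/AgeBC follow the isAff*/Age* pattern for breast cancer (BC).
family = [
    {"ID": 1, "MotherID": None, "FatherID": None, "Sex": 0, "isProband": 0,
     "CurAge": 70, "isAffBC": 1, "AgeBC": 52},
    {"ID": 2, "MotherID": None, "FatherID": None, "Sex": 1, "isProband": 0,
     "CurAge": 72, "isAffBC": 0, "AgeBC": None},
    {"ID": 3, "MotherID": 1, "FatherID": 2, "Sex": 0, "isProband": 1,
     "CurAge": 45, "isAffBC": 0, "AgeBC": None},
]
print(check_pedigree(family))  # prints []
```

Returning a list of messages rather than raising on the first failure lets a user correct every problem in one pass.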
Table 2
List of model options that the user can pass to PanelPRO, along with their defaults.
| Option | Default value | Possible values | Description |
| --- | --- | --- | --- |
| max.mut | NULL | Integers up to the number of genes | Maximum number of simultaneous mutations, also known as the paring parameter. If no integer is supplied, it defaults to 2. |
| iterations | 20 | Integers from 1 upwards | In case of missing current or cancer ages in the pedigree, this is the number of times those ages will be imputed. |
| parallel | TRUE | TRUE or FALSE | If age imputations are needed, this parameter can be set to utilize multiple cores on one's machine. |
| net | FALSE | TRUE or FALSE | Determines whether net or crude penetrances are used to compute future risk of cancer. Net penetrances exclude all other causes of death, apart from the affected cancer. |
| age.by | 5 | Integers from 1 upwards | The intervals of age used to report the future risk of cancer. |
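To see why the paring parameter max.mut matters, consider how many genotypes remain once the number of simultaneous mutations is capped. The sketch below assumes each gene is a binary carrier indicator, so that with K genes and at most m simultaneous mutations the count is the sum of binomial coefficients C(K, 0) + ... + C(K, m). The function name n_genotypes is hypothetical and is not part of PanelPRO.

```python
from math import comb


def n_genotypes(n_genes, max_mut=None):
    """Count binary carrier genotypes with at most max_mut simultaneous mutations."""
    if max_mut is None:
        max_mut = 2  # mirrors the documented fallback when no integer is supplied
    max_mut = min(max_mut, n_genes)
    return sum(comb(n_genes, m) for m in range(max_mut + 1))


print(n_genotypes(24))      # prints 301 (1 + 24 + 276)
print(n_genotypes(24, 24))  # prints 16777216 (2 ** 24, no paring)
```

Under these assumptions, paring a 24-gene model to two simultaneous mutations reduces the genotype space from about 16.8 million combinations to 301.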
Table 3
List of main functions in PanelPRO.
| Category | Name | Description |
| --- | --- | --- |
| Pre-processing | checkFam | Checks family structure as defined by the user. The inputs are a data.frame specifying the pedigree and a built database returned by buildDatabase. The output is a modified data.frame pedigree and a list of imputed ages, if missing ages were imputed (see the Missing Data section). |
| Pre-processing | buildDatabase | Subsets the internal database PanelPRODatabase depending on the cancers and genes selected. The input is the list PanelPRODatabase. The output is another list which is a subset of PanelPRODatabase. |
| Algorithm | PanelPROCalc | Estimates the posterior carrier probabilities and future risks of the proband. The inputs are the outputs of checkFam. The outputs are lists of posterior probabilities and future risks for the proband. |
| Main function | PanelPRO | Runs the main function. The inputs are the user-specified pedigree, a vector of cancers in the model, a vector of genes in the model, and other optional parameters. The output is a list of estimates of posterior carrier probabilities for each genotype, along with future cancer risks and ranges for each of these. |
Table 4
Comparison between supported cancers and genes in PanelPRO and other platforms.
| Model or platform name | Version | Supported cancer input types | Supported gene carrier probability outputs | Supported future cancer risk outputs |
| --- | --- | --- | --- | --- |
| PanelPRO | 0.2.0 | Brain, breast, cervical, colorectal, endometrial, gastric, kidney, leukemia, melanoma, ovarian, osteosarcoma, pancreatic, small intestine, soft tissue sarcoma, thyroid, urinary bladder, hepatobiliary | APC, ATM, BARD1, BMPR1A, BRCA1, BRCA2, BRIP1, CDH1, CDK4, CDKN2A, CHEK2, EPCAM, MLH1, MSH2, MSH6, MUTYH, NBN, PALB2, PMS2, PTEN, RAD51C, RAD51D, STK11, TP53 | Same as cancer inputs |
| BRCAPRO | 2.1–7 | Breast, ovarian | BRCA1, BRCA2 | Same as cancer inputs |
| MMRPRO | 2.1–7 | Colorectal, endometrial | MLH1, MSH2, MSH6 | Same as cancer inputs |
| IBIS | 0.8b | Breast | NA | Breast |
| CanRisk | 1.2.3 | Breast, contralateral breast, ovarian, prostate, pancreatic | BRCA1, BRCA2, PALB2, CHEK2, ATM, RAD51D, RAD51C, BRIP1 | Breast, ovarian |
| PREMM-5 | NA | Colorectal, endometrial, other (group of ovarian, stomach, small intestine, urinary tract/bladder/kidney, bile ducts, brain, pancreas, sebaceous gland skin) | MLH1, MSH2, MSH6, PMS2, EPCAM | NA |
Table 5
Notation for Mendelian Modeling for a model with K genes and R cancers and a family of I members.

The subscript i denotes the ith family member.

| Variable and notation | Description | R object from user input, if applicable |
| --- | --- | --- |
| Genotypes | | |
| $\mathbf{G}_i = (G_{ki})_{k=1}^{K}$ | Genotype of individual $i$, where $G_{ki}$ is the binary indicator for carrying a deleterious mutation in the $k$th gene | |
| $\mathbf{G} = (\mathbf{G}_i)_{i=1}^{I}$ | Genotypes of all family members $i = 1, \dots, I$ | |
| Sex | | |
| $U_i$ | Binary indicator that individual $i$ is male | Sex |
| $\mathbf{U} = (U_i)_{i=1}^{I}$ | Binary male indicators for all family members $i = 1, \dots, I$ | |
| Cancer history | | |
| $T_{ri}$ | Age of diagnosis of the $r$th cancer for individual $i$ | AgeXX |
| $C_i$ | Individual $i$'s censoring age (current age or age of death) | CurAge |
| $\delta_{ri} = I(T_{ri} \le C_i)$ | Binary indicator that cancer $r$ occurs before the censoring age for individual $i$ | |
| $\mathbf{H}_{ri} = \begin{cases} (C_i, \delta_{ri}) & \text{if } \delta_{ri} = 0 \\ (C_i, \delta_{ri}, T_{ri}) & \text{if } \delta_{ri} = 1 \end{cases}$ | Observed history of the $r$th cancer for individual $i$, not including risk modifiers and interventions | |
| $\mathbf{H}_i = (\mathbf{H}_{ri})_{r=1}^{R}$ | All observed history for individual $i$ | |
| $\mathbf{H} = (\mathbf{H}_i)_{i=1}^{I}$ | Observed histories for all family members $i = 1, \dots, I$ | |
| $T_{d,ri}$ | Individual $i$'s age of death from causes other than cancer $r$ | |
| $T_{ri}^{*} = \min(T_{ri}, T_{d,ri})$ | Individual $i$'s age of first outcome, either cancer $r$ or death from causes other than cancer $r$ | |
| $J_{ri} = I(T_{ri}^{*} = T_{ri})$ | Binary indicator that individual $i$ develops the $r$th cancer | isAffXX |
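The cancer-history notation in Table 5 can be made concrete with a short sketch. It assumes the censoring indicator is δ = I(T ≤ C), represents an unobserved diagnosis age as None, and uses hypothetical function names (observed_history, first_outcome) that are not part of PanelPRO.

```python
def observed_history(cens_age, dx_age=None):
    """Build the observed history of one cancer for one individual.

    cens_age is the censoring age C; dx_age is the diagnosis age T,
    or None when no diagnosis has been observed.
    """
    delta = int(dx_age is not None and dx_age <= cens_age)
    if delta == 1:
        return (cens_age, delta, dx_age)  # (C, delta, T) when the cancer is observed
    return (cens_age, delta)              # (C, delta) otherwise


def first_outcome(dx_age, death_other_age):
    """Return (T*, J): the age of first outcome and whether it was the cancer."""
    t_star = min(dx_age, death_other_age)
    return t_star, int(t_star == dx_age)


print(observed_history(60, 45))  # prints (60, 1, 45)
print(observed_history(60))      # prints (60, 0)
print(first_outcome(70, 65))     # prints (65, 0)
```

In the last call, death from other causes at age 65 precedes the cancer at age 70, so the first outcome is not the cancer and J is 0.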
Appendix 1—table 1
Abbreviations of cancers in PanelPRO.
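| Short name | Long name | Short name | Long name |
| --- | --- | --- | --- |
| BRA | Brain | OC | Ovarian |
| BC | Breast | OST | Osteosarcoma |
| CER | Cervical | PANC | Pancreas |
| COL | Colorectal | PROS | Prostate |
| ENDO | Endometrial | SI | Small Intestine |
| GAS | Gastric | STS | Soft Tissue Sarcoma |
| KID | Kidney | THY | Thyroid |
| LEUK | Leukemia | UB | Urinary Bladder |
| MELA | Melanoma | HEP | Hepatobiliary |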
Appendix 1—table 2
Summary of sample pedigrees provided within the package.
| Pedigree name | Number of family members | Cancers present |
| --- | --- | --- |
| test_fam_1 | 19 | BC, OC |
| test_fam_2 | 25 | ENDO, PANC, SI |
| test_fam_3 | 50 | BRA, BC, COL, ENDO, GAS, KID, MELA, OC, PANC, PROS, SI |
| test_fam_4 | 9 | BC, OC, BRA, COL, PROS, ENDO, SI |
| test_fam_5 | 17 | BC, MELA |
| test_fam_6 | 19 | BC, ENDO, MELA, PANC |
| test_fam_7 | 19 | COL, ENDO |
| test_fam_8 | 20 | COL, PROS |
| test_fam_9 | 19 | BC, PROS |
| test_fam_10 | 21 | BC, COL |
| test_fam_11 | 16 | BC, ENDO, OC, PANC, SI |
| test_fam_12 | 21 | All cancers in Appendix 1—table 1 |
| err_fam_1 | 10 | BC, PANC |

Additional files

Cite this article

  1. Gavin Lee
  2. Jane W Liang
  3. Qing Zhang
  4. Theodore Huang
  5. Christine Choirat
  6. Giovanni Parmigiani
  7. Danielle Braun
(2021)
Multi-syndrome, multi-gene risk modeling for individuals with a family history of cancer with the novel R package PanelPRO
eLife 10:e68699.
https://doi.org/10.7554/eLife.68699