BreakLoops: A New Feature for the Multi-Gene, Multi-Cancer Family History-Based Model, Fam3Pro

  1. Department of Biostatistics, Harvard T.H. Chan School of Public Health, Cambridge, United States
  2. Department of Data Science, Dana Farber Cancer Institute, Boston, United States
  3. École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland

Peer review process

Not revised: This Reviewed Preprint includes the authors’ original preprint (without revision), an eLife assessment, public reviews, and a provisional response from the authors.


Editors

  • Reviewing Editor
    Goutham Narla
    University of Michigan, Ann Arbor, United States of America
  • Senior Editor
    Eduardo Franco
    McGill University, Montreal, Canada

Reviewer #1 (Public review):

Summary:

Although consanguinity is rare in clinical practice, it is effectively a failure state for pedigree analysis algorithms because it introduces loops that prevent accurate risk estimation. Kubista et al. therefore developed the graph-based "breakloops" function so that their PanelPRO risk estimator (PMID 34406119) can process consanguineous pedigrees.

Strengths:

This function first identifies a loop in a pedigree, then decides which of two algorithms, Prim's or greedy, to apply in order to optimize the introduction of clones that break the loop. Because the function is automatic, it improves on previous similar algorithms and allows the optimal algorithm to be chosen. The pseudocode included in the manuscript provides a succinct summary of this logic and greatly enhances understanding of the function for readers who are not computationally inclined.

After simulating a variety of consanguineous possibilities, the authors leveraged clinical pedigree data to validate their function. Integration of clinical pedigrees was extremely helpful in demonstrating the real-life applicability of this update. The successful inclusion of these clinical data justifies the claims they make regarding the ability to assess cancer risk in a wider range of family structures.

Weaknesses:

As consanguinity is inextricably linked with autosomal recessive disease, the discussion on the clinical implications of this new function is lacking.

Reviewer #2 (Public review):

Summary:

This paper introduces a new function within the Fam3Pro package that addresses the problem of breaking loops in family structures. When a loop is present, standard genotype peeling algorithms fail, as they cannot update genotypes correctly. The solution is to break these loops, but until now, this could not be done automatically and optimally.

The manuscript provides useful background on constructing graphs and trees from family data, detecting loops, and determining how to break them optimally when no loop involves multiple matings. When multiple matings are present, the algorithm switches from Prim's algorithm to a simple greedy approach and still provides a solution. In that case, however, an optimal solution is not guaranteed.

The theoretical foundations, such as the representation of families as graphs or trees and the identification of loops, are clearly explained and well illustrated with example pedigrees. The practical utility of the new function is demonstrated by applying it to a dataset containing families with loops.
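To make the loop-detection idea concrete, here is a minimal sketch in Python. It is not the Fam3Pro implementation. It assumes a pedigree encoded as an undirected graph in which each individual and each mating is a node, with edges joining parents to their mating node and the mating node to each child. The function name `find_loop_edges` and the node labels are hypothetical. A union-find structure flags every edge that closes a cycle, which is where a loop would need to be broken.

```python
def find_loop_edges(edges):
    """Return the edges that close a cycle in an undirected pedigree graph.

    Assumed encoding (illustrative only): individuals and matings are both
    nodes; each parent and each child is linked to its mating node.
    """
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    loop_edges = []
    for u, v in edges:
        root_u, root_v = find(u), find(v)
        if root_u == root_v:
            # Both endpoints are already connected, so this edge closes a loop.
            loop_edges.append((u, v))
        else:
            parent[root_u] = root_v
    return loop_edges


# First-cousin mating: C and D share grandparents G1 and G2.
cousin_edges = [
    ("G1", "M1"), ("G2", "M1"), ("M1", "A"), ("M1", "B"),
    ("A", "M2"), ("X", "M2"), ("M2", "C"),
    ("B", "M3"), ("Y", "M3"), ("M3", "D"),
    ("C", "M4"), ("D", "M4"), ("M4", "E"),
]

# Same family, but C mates with an unrelated individual Z instead of D.
unrelated_edges = [e for e in cousin_edges if e != ("D", "M4")] + [("Z", "M4")]

print(find_loop_edges(cousin_edges))     # one loop-closing edge
print(find_loop_edges(unrelated_edges))  # no loops
```

In this encoding, a pedigree is loop-free exactly when the function returns an empty list. Which edge is reported for a given loop depends on the order in which edges are processed.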

This work has the potential for considerable impact, especially for medical researchers and individuals from families with loops. These families could previously not be analysed automatically and optimally. The new function changes that, enabling risk assessments and genetic calculations that were previously infeasible.

Strengths:

(1) The theoretical explanation of graphs, trees, and loop detection is clear and well-structured.

(2) The idea of switching between algorithms is original and appears effective.

(3) The function is well implemented, with minimal additional computational cost.

Weaknesses:

(1) In cases with multiple matings, the notion of a "close-to-optimal" solution is not clearly defined. It would be helpful to explain what this means: whether it refers to empirical performance, theoretical bounds, or something else.

(2) In the example pedigree discussed, multiple options exist for breaking loops, but it is unclear which is optimal.

(3) No example is provided where the optimal solution is demonstrably not reached.

(4) It is also unclear whether the software provides a warning when the solution might not be optimal.

Author response:

Response to Reviewer #1:

We plan to extend the discussion section to cover the clinical implications of this new function. We will note the algorithm's applicability to broader genetic counseling contexts beyond cancer risk assessment.

Response to Reviewer #2:

We will clarify the four points raised:

(1) "Close-to-optimal" definition: We will explain that in multiple-mating cases, finding the global optimum is NP-hard (equivalent to the Weighted Feedback Vertex Set problem). We will clarify that our greedy algorithm provides practically efficient solutions suitable for clinical use, though without theoretical optimality guarantees.

(2) Example clarity: We will improve Figure 1's caption to explain the cost calculations and note that with equal weights, both shown solutions are equivalent.

(3) Non-optimal examples: We will describe scenarios where the greedy algorithm may not achieve the global optimum, particularly in multiple-mating cases with heterogeneous weights.

(4) Warning message: The current version does not provide a warning when the solution might be non-optimal. A warning may be added to the function in the future.
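Points (1) and (3) can be illustrated with a small, hedged sketch. It is not the greedy rule used in Fam3Pro. It assumes a plain "cheapest vertex first" heuristic for the weighted feedback vertex set problem on a simple undirected graph, and compares it with an exhaustive search. The graph, the weights, and the function names are all hypothetical. Removing a vertex here stands in for cloning an individual at a given cost.

```python
from itertools import combinations


def two_core(nodes, edges):
    """Strip nodes of degree <= 1 until none remain; empty result means no cycles."""
    nodes = set(nodes)
    while True:
        degree = {n: 0 for n in nodes}
        for u, v in edges:
            if u in nodes and v in nodes:
                degree[u] += 1
                degree[v] += 1
        leaves = {n for n in nodes if degree[n] <= 1}
        if not leaves:
            return nodes
        nodes -= leaves


def greedy_break(nodes, edges, weight):
    """Assumed heuristic: repeatedly remove the cheapest node left in the 2-core."""
    removed = []
    core = two_core(nodes, edges)
    while core:
        pick = min(sorted(core), key=lambda n: weight[n])
        removed.append(pick)
        core = two_core(core - {pick}, edges)
    return removed


def optimal_break(nodes, edges, weight):
    """Exhaustive search over all subsets; feasible only for tiny graphs."""
    best_cost, best_set = None, None
    ordered = sorted(nodes)
    for k in range(len(ordered) + 1):
        for subset in combinations(ordered, k):
            if not two_core(set(ordered) - set(subset), edges):
                cost = sum(weight[n] for n in subset)
                if best_cost is None or cost < best_cost:
                    best_cost, best_set = cost, list(subset)
    return best_cost, best_set


# Two cycles sharing node H, with heterogeneous (illustrative) weights.
nodes = ["H", "a", "b", "c", "d"]
edges = [("H", "a"), ("a", "b"), ("b", "H"), ("H", "c"), ("c", "d"), ("d", "H")]
weight = {"H": 3, "a": 2, "b": 2, "c": 2, "d": 2}

greedy = greedy_break(nodes, edges, weight)
greedy_cost = sum(weight[n] for n in greedy)
best_cost, best_set = optimal_break(nodes, edges, weight)
print(greedy, greedy_cost)  # greedy removes two cheap nodes
print(best_set, best_cost)  # removing the shared node is cheaper overall
```

On this toy graph the cheapest-first rule pays 4 by breaking each cycle separately, while removing the shared node costs 3. With equal weights the same rule would reach the optimum here, which matches the observation that heterogeneous weights are where a greedy approach is most likely to fall short.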

We appreciate your feedback and suggestions to help improve the manuscript.
