(A) Comparing the number of wells supporting each mutation type in run D1110 confirms that, as hypothesized, the majority of UTDs are present in only a few wells. Horizontal lines represent the median and boxes represent the interquartile range. Whiskers show the range excluding outliers, which are defined as values lying more than 1.5 times the interquartile range beyond the box. (B) Similarly, the dual-allelic compartment rate of UTDs appears to be significantly higher than that of true variants. This value was calculated by dividing the number of wells in which the variant and reference alleles are both present by the total number of wells with evidence for the variant allele. (C) A diagram showing the main challenges in analyzing DigiPico data using ANNs. Each circle/star indicates one variant. Red lines show the behavior of the classification model. All variants above and/or to the left of the lines are predicted to be true variants by the model. The analysis of a sample without clone-specific variants would result in a precise separation between real and artefactual mutations. In contrast, the analysis of a sample with true clone-specific mutations would result in a suboptimal model, which could lead to over-fitting against true UTDs. This would enforce a model that removes all FP calls at the cost of losing nearly all clone-specific variants. (D) A diagram showing the two-step training process in MutLX. The first training step identifies some of the mislabeled true mutations (grey circles) among UTDs. All potentially mislabeled data points (colored in black) are temporarily removed from the analysis in the second training step so that a better model is obtained for assigning a probability score to all mutations. Finally, combining the probability scores obtained from the model with the uncertainty estimate of these probability scores (as described in E) allows for effective elimination of FP calls while maintaining an excellent sensitivity for true clone-specific variants.
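The dual-allelic compartment rate described in (B) can be illustrated with a minimal sketch. The per-well input format (a pair of reference and variant read counts) and the rule that a single read counts as evidence for an allele are assumptions made for illustration only; they are not taken from the DigiPico or MutLX pipelines.

```python
def dual_allelic_rate(wells):
    """Fraction of variant-supporting wells that also contain the reference allele.

    `wells` is a hypothetical list of (ref_reads, alt_reads) tuples, one per well.
    Assumption: any read count above zero counts as evidence for an allele.
    """
    # Wells with evidence for the variant allele (the denominator).
    variant_wells = [(ref, alt) for ref, alt in wells if alt > 0]
    if not variant_wells:
        return None  # rate is undefined when no well supports the variant
    # Wells where variant and reference alleles are both present (the numerator).
    dual_wells = sum(1 for ref, _ in variant_wells if ref > 0)
    return dual_wells / len(variant_wells)


# Four wells: three support the variant, two of those also carry the reference.
example_wells = [(5, 3), (0, 4), (7, 0), (2, 1)]
rate = dual_allelic_rate(example_wells)
```

In this toy example the rate is 2 out of 3 variant-supporting wells. A real analysis would likely need a minimum read-depth threshold rather than a simple non-zero test.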
(E) A diagram showing the test-time drop-out analysis used to compute the uncertainty estimate of the probability scores. Neurons shown in black were turned off during the drop-out analysis. Accepting only variants with a high probability score and a low uncertainty score should allow for elimination of FP variant calls. (F) ROC curves of the MutLX output for runs D1110, D1111, DE011, and GM12885. Circles represent the default cut-off values determined by MutLX. (G) Bar plots representing the number of passed UTDs in the output of SCcaller, Platypus, and MutLX. Since no true UTDs are expected to be present in runs D1110, DE011, and GM12885, the number of UTDs in these runs represents the FP rate for each analysis method. Values for Platypus are based on DigiPico-specific filtering criteria prior to the application of MutLX.
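The idea behind the test-time drop-out analysis in (E) can be sketched as follows: the same variant is scored repeatedly while random hidden neurons are switched off, and the spread of the resulting scores serves as an uncertainty estimate. This is only an illustration under stated assumptions. The one-hidden-layer network, the use of the standard deviation as the uncertainty measure, and the `min_prob` and `max_uncertainty` thresholds are all hypothetical and are not the MutLX architecture or its default cut-offs.

```python
import math
import random
import statistics


def forward_with_dropout(x, w_hidden, w_out, drop_rate, rng):
    """One stochastic forward pass through a toy one-hidden-layer network."""
    hidden = []
    for weights in w_hidden:
        activation = max(0.0, sum(w * xi for w, xi in zip(weights, x)))  # ReLU
        # Switch the neuron off with probability drop_rate; rescale the survivors.
        keep = rng.random() >= drop_rate
        hidden.append(activation / (1.0 - drop_rate) if keep else 0.0)
    logit = sum(w * h for w, h in zip(w_out, hidden))
    return 1.0 / (1.0 + math.exp(-logit))  # sigmoid probability score


def score_with_uncertainty(x, w_hidden, w_out, drop_rate=0.5, n_passes=100, seed=0):
    """Mean probability and its spread over repeated drop-out passes."""
    rng = random.Random(seed)
    probs = [forward_with_dropout(x, w_hidden, w_out, drop_rate, rng)
             for _ in range(n_passes)]
    # Assumption: the population standard deviation is used as the uncertainty.
    return statistics.mean(probs), statistics.pstdev(probs)


def passes_filter(mean_prob, uncertainty, min_prob=0.9, max_uncertainty=0.1):
    """Accept only variants with a high score and a low uncertainty (hypothetical cut-offs)."""
    return mean_prob >= min_prob and uncertainty <= max_uncertainty


# Toy weights and features, for illustration only.
w_hidden = [[0.8, -0.2], [0.5, 0.9], [-0.3, 0.7]]
w_out = [1.2, 0.6, -0.4]
mean_prob, uncertainty = score_with_uncertainty([1.0, 0.5], w_hidden, w_out)
accepted = passes_filter(mean_prob, uncertainty)
```

With drop-out disabled (`drop_rate=0.0`) every pass is identical and the uncertainty collapses to zero, which is why the neurons must stay stochastic at test time for the estimate to be informative.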