1. Structural Biology and Molecular Biophysics
Download icon

DeepFRET, a software for rapid and automated single-molecule FRET data classification using deep learning

  1. Johannes Thomsen
  2. Magnus Berg Sletfjerding
  3. Simon Bo Jensen
  4. Stefano Stella
  5. Bijoya Paul
  6. Mette Galsgaard Malle
  7. Guillermo Montoya
  8. Troels Christian Petersen
  9. Nikos S Hatzakis  Is a corresponding author
  1. Department of Chemistry and Nanoscience Centre, University of Copenhagen, Denmark
  2. Structural Molecular Biology Group, Novo Nordisk Foundation Centre for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Denmark
  3. Niels Bohr Institute, University of Copenhagen, Denmark
  4. Novo Nordisk Foundation Centre for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Denmark
Tools and Resources
  • Cited 0
  • Views 1,429
  • Annotations
Cite this article as: eLife 2020;9:e60404 doi: 10.7554/eLife.60404

Abstract

Single-molecule Förster Resonance energy transfer (smFRET) is an adaptable method for studying the structure and dynamics of biomolecules. The development of high throughput methodologies and the growth of commercial instrumentation have outpaced the development of rapid, standardized, and automated methodologies to objectively analyze the wealth of produced data. Here we present DeepFRET, an automated, open-source standalone solution based on deep learning, where the only crucial human intervention in transiting from raw microscope images to histograms of biomolecule behavior, is a user-adjustable quality threshold. Integrating standard features of smFRET analysis, DeepFRET consequently outputs the common kinetic information metrics. Its classification accuracy on ground truth data reached >95% outperforming human operators and commonly used threshold, only requiring ~1% of the time. Its precise and rapid operation on real data demonstrates DeepFRET’s capacity to objectively quantify biomolecular dynamics and the potential to contribute to benchmarking smFRET for dynamic structural biology.

eLife digest

Proteins are folded into particular shapes in order to carry out their roles in the cell. However, their structures are not rigid: proteins bend and rotate in response to their environment. Identifying these movements is an important part of understanding how proteins work and interact with each other. Unfortunately, when researchers study the structures of proteins, they often look at the ‘average’ shape a protein takes, missing out on other conformations the protein might only be in temporarily.

An important technique for studying protein flexibility is known as single molecule Förster resonance energy transfer (FRET). In this technique, two light-sensitive tags are attached to the same protein molecule and give off a signal when they come into close contact. This nano-scale sensor allows structural biologists to get information from individual protein movements that can be lost when looking at the average conformations of proteins.

Advances in the instruments used to perform FRET have made observing the motion of individual proteins more widely accessible to non-specialists, but the analysis of the data that these instruments produce still requires a high level of expertise. To lower the barrier for non-specialists to use the technology, and to ensure that experiments can be reproduced on different instruments and by different researchers, Thomsen et al. have developed a new way to automate the data analysis. They used machine learning technology to recognize, filter and characterize data so as to produce reliable results, with the user only needing to perform a couple of steps.

This new analysis approach could help expand the use of single-molecule FRET to different fields , allowing researchers to investigate the importance of protein flexibility for certain diseases, or to better understand the roles that proteins have in a cell.

Introduction

Single-molecule Förster resonance energy transfer (smFRET) combined with TIRFm (total internal reflection fluorescence microscopy) is a key powerful method to study the structure of biomolecules and provide a dynamic perspective in structural biology (Lerner et al., 2018). Capturing the real-time readouts of nanometer-scale distances of individual biomolecules by smFRET allows the direct observations of dynamics, interactions, and intermediates of stochastic non-accumulating events, as well as dynamic equilibria between unsynchronized molecules, all of which are obscured in ensemble averaging techniques (Dimura et al., 2016; Hellenkamp et al., 2018; Holmstrom et al., 2019; Juette et al., 2016; Newton et al., 2019; Preus et al., 2015; Roy et al., 2008; Schuler and Eaton, 2008; Stella et al., 2018). The high fidelity and proficiency of smFRET established it as a key toolbox for the accurate characterization of mechanisms, biomolecular interactions function, and even structures of biomolecules (Craggs and Kapanidis, 2012; Dulin et al., 2018; Kalinin et al., 2012; Kilic et al., 2018; Ratzke et al., 2014), under both in vitro (Schluesche et al., 2007; Sharma et al., 2008; Stein et al., 2011) and in vivo (Okamoto et al., 2020; Sakon and Weninger, 2010) conditions. Despite its great quantitative utility and profound impact on structural biology, smFRET is not a direct imaging modality and data treatment for extracting quantitative dynamic information relies on multiple layers of preprocessing: raw image treatment, trace selection, and data analysis. Raw image treatment (Greenfeld et al., 2012; Hon and Gonzalez, 2019; Juette et al., 2016; Preus et al., 2015; Stella et al., 2018) and data analysis of the selected smFRET traces is in general well-standardized and relies on well-defined methodologies with strong theoretical backing (Hon and Gonzalez, 2019; Schmid et al., 2016).

The actual trace selection can be time-consuming but crucial due to the presence of undesired phenomena at the single-molecule scale, such as sample aggregation, fluorescent contaminants, incomplete or incorrect sample labeling, complex photophysical behaviors, and high noise, to mention a few (Algar et al., 2019; Hellenkamp et al., 2018; Roy et al., 2008). Existing software (Greenfeld et al., 2012; Hellenkamp et al., 2018; Hon and Gonzalez, 2019; Juette et al., 2016; Preus et al., 2015; Stella et al., 2018) by single-molecule labs can simplify the tedious and time-consuming selection of traces and were recently expanded to large datasets (Juette et al., 2016) albeit requiring some form of manual supervision and hyper-parameter tuning selecting the proper thresholds by an expert user. This need for human intervention could potentially be subjected to cognitive biases especially by less experienced users and could limit the expansion of smFRET to classic biology labs. The increasing expansion of smFRET to structural biology labs would benefit from rapid and benchmarked methodologies, reproducible across laboratories, with minimal human intervention. This is highlighted by several initiatives to standardize the smFRET field (Greenfeld et al., 2012; Hellenkamp et al., 2018; Lerner et al., 2018; Sali et al., 2015).

Recent advances in machine learning (ML) and specifically deep learning (DL) (LeCun et al., 2015), have radically improved our capacity to access and extract information from abstract and noisy inputs independently of human interventions as we (ATLAS collaboration, 2014) and others have shown (Berg et al., 2019; Christiansen et al., 2018; Falk et al., 2019; Gómez-García et al., 2018; Jones, 2019; Ouyang et al., 2018; Smith et al., 2019; Zhang et al., 2018). DL implementations are providing high-level robust performances and have been successfully used to analyze and augment a wide range of the fluorescence microscopy analysis pipeline including assessing microscope image quality (Yang et al., 2018), in-silico cell labeling (Christiansen et al., 2018), single-cell morphology analysis (Berg et al., 2019; Falk et al., 2019), detecting single molecules (White et al., 2020; Wu and Rifkin, 2015) and linking smFRET experiments with molecular dynamics simulations (Matsunaga and Sugita, 2018), amongst others (Berg et al., 2019; Christiansen et al., 2018; Falk et al., 2019; Gómez-García et al., 2018; Jones, 2019; Ouyang et al., 2018; Smith et al., 2019; Zhang et al., 2018).

Deep learning-based analysis has several advantages over other approaches: It recognizes abstract patterns and learns useful features directly from the raw input data, which allows the implementation of analysis routines that do not require extensive data preprocessing or empirically defined rules, and thus offer reproducible and less opinionated evaluation of single-molecule data; It is significantly faster than human annotation for large single-molecule datasets; it comes close to, or outperforms human performance; and its performance is increased when increasing dataset size constituting an ideal case for evaluating the large datasets obtained from single-molecule data (Berg et al., 2019; Christiansen et al., 2018; Falk et al., 2019; Gómez-García et al., 2018; Jones, 2019; Ouyang et al., 2018; Smith et al., 2019; Zhang et al., 2018). Especially important are convolutional deep neural networks (DNN), artificial neural networks that learn to approximate the underlying function that transforms input to associated output through multiple rounds of optimization. The strength of DNN is the ability to learn arbitrarily complex functions to best recognize particular aspects of the given input data and model complex nonlinear relationships. The DNN is then able to classify data into predefined classes based on the provided training labels. While the training of a DNN is generally a computationally intensive process, once trained the final model can easily be shared and used for making predictions at almost no computational cost to end-users.

Here we provide DeepFRET, an all-inclusive analysis software with a pre-trained DNN at its core, for a rapid, objective, and accurate assessment of smFRET data for quantifying biomolecular dynamics. The fully automated analysis software operates with minimal crucial human intervention and requires only a threshold on the data quality confidence, as an initial step, to output detailed quantification of structural dynamic from raw images. This is attained by an intuitive and user-friendly interface that integrates and automates common smFRET analysis procedures (Greenfeld et al., 2012; Hellenkamp et al., 2018; Hon and Gonzalez, 2019; Juette et al., 2016; Preus et al., 2015; Stella et al., 2018) from raw image analysis and background-corrected intensity trace extraction (Thomsen et al., 2019), to sophisticated trace classification, statistical analysis of single-molecule data and production of publication-quality figures of dynamic structural biology insights (see Materials and methods). DeepFRET comes as a free-to-use standalone executable for both Windows and Mac users allowing end-users with limited programming skills to easily operate it. A script-based version implemented entirely in Python enables experts to adjust features pipelining the analysis specified for their needs. We anticipate DeepFRET to take full advantage of the widespread digitization and open repository of smFRET data and form a reference point setting a bar for the data quality and data classification performance metrics, offering additional benchmarking the field for dynamic structural biology.

Results

DeepFRET software package

DeepFRET is an open-source software package that implements a neural network model architecture for data evaluation integrating into a user-friendly platform all common procedures for smFRET analysis (Figure 1, Figure 1—figure supplements 12). The neural network model architecture used here is based on a deep convolutional neural network to recognize particular spatial features present in the data. The model first passes the data through several layers of convolutions of different lengths, and in the process learns to recognize which particular elements of a sample are present at different length scales, to best classify it correctly. This has previously been used to label time-series data such as electrocardiographs (Hannun et al., 2019) or electrical readouts in DNA sequencing (Wick et al., 2018). Additionally, we added a long short-term memory (LSTM) layer after the convolutional layers, as this will also help the model to learn temporality in the data and propagate to the later frames the learned information (Karim et al., 2018; Oh et al., 2018). A detailed description of the model hyperparameters and architecture can be found in the Materials and methods section.

Figure 1 with 11 supplements see all
Overview of smFRET evaluation and analysis using DeepFRET.

(a) Cartoon of the typical heterogeneous data acquired in smFRET experiments. Varying criteria for data selection for downstream analysis may yield different structural and kinetic information. (b) Screenshots of the provided standalone software that integrates deep learning and reduces the selection to a single-user-adjustable criterion: the DEEP FRET confidence threshold. The simple and intuitive GUI integrates all the features of our approach for rapid traces extraction from raw images to filtering of traces based on the predicted classification, treatment of smFRET data to extraction of publication-quality figures. (c) End-to-end sequence classification of smFRET traces by deep learning. Raw signals of donor-donor, donor-acceptor, and acceptor-acceptor intensities in the form of ASCII files can also be loaded with the DeepFRET software. The pre-trained DNN will classify individual frames into one of six different categories: bleached, static smFRET, dynamic smFRET, aggregate, noisy, and scrambled. A final smFRET confidence score is calculated, depending on each of the categories, that is used for threshold.

To ensure that the predictions of DeepFRET would generalize to a wide range of experimentally observable behaviors independently of biological systems or experimental conditions, we provide a fully pre-trained DNN model. The implemented DNN is pre-trained on 150,000 simulated traces that uniformly sample all possible FRET states, their respective lifetimes, occupancies, and transition pathways, as well as all possible noise levels, ensuring that the data represents all theoretically possible configurations (see Figure 1, Figure 1—figure supplements 35, Materials and methods for software and algorithms). As such DeepFRET does not require the selection of any direct initial guesses of FRET values or user-defined parameter pretraining. However, we do provide both a script-based method for simulating smFRET data and a simple graphical interface for expert end-users to adjust simulation distribution parameters (see Figure 1b, Figure 1—figure supplement 6 and Materials and methods) for model re-training if needed (e.g. for specific circumstances or stricter criteria). This offers experts the possibility to benchmark the impact of, for example, one’s sorting criteria, noise, and optical correction factors.

We built DeepFRET to treat both alternating laser excitation (ALEX) and non-ALEX FRET data. DeepFRET imports raw microscope images and performs colocalization of the two channels, to extract background corrected intensity traces of DD (donor excitation; donor emission), DA (donor excitation; acceptor emission), their respective stoichiometry, and in the case of ALEX data, also AA (acceptor excitation; acceptor emission; see Figure 1a, Figure 1—figure supplement 4). Alternatively, one can directly load and process previously-obtained time-traces ASCII format as exported from the popular software package iSMS (Preus et al., 2015) without their associated videos.

For a given time trace the DNN predicts and outputs six softmaxed probabilities pi to each time frame (Figure 1c, Figure 1—figure supplement 5 and Materials and methods), representing the six categories it has been trained to recognize: bleached (B), static smFRET within the experimental time frame (S), dynamic smFRET (D), aggregate (A), noisy (N), and all other types of non treatable data smFRET data defined here as scrambled (X) (Figure 1c, Figure 1—figure supplements 5 and 7). Both static and dynamic traces are included for further analysis. Given these probabilities, which sums to one, a simple sliding window then searches for frames predicted by the DNN to be bleached (pB >0.5, see (Figure 1, Figure 1—figure supplements 58) for evaluation accuracy, and blinking exclusion). When bleaching is found, the rest of the trace is removed to exclude the photobleached frames part of a trace from further analysis. If the trace still contains a minimum number of viable frames (here set to 15 but adjustable), the probabilities are summed up over all remaining time frames for each of the five remaining categories and divided by the number of frames for normalization (see Materials and methods and Figure 1c, Figure 1—figure supplements 58). We define the summaries of the combined static and dynamic trace scores as the ‘DeepFRET score’, representing the DNN model confidence that a trace is truly smFRET. The user-friendly interface displays all the categories and their associated probabilities, and offers the option for expert users to manually revise the classified traces.

If the DeepFRET score is above the user-defined threshold, the trace is accepted for the subsequent analysis (see Figure 1, Figure 1—figure supplement 8 and Materials and methods). Subsequent analysis involves two-channel fitting of idealized FRET traces using Hidden Markov modeling HMM (using the open-source package pomegranate); data and statistical evaluation of the abundance of FRET states and lifetimes; application of correction factors; and transition density plots (see Figure 1b, Figure 1—figure supplements 6 and 911). The number of underlying FRET distributions is automatically determined using Bayesian information criterion (BIC), offering the unbiased analysis of distribution of biomolecular distances (see Figure 1b, Figure 1—figure supplements 6 and 911). All data can be directly exported in publication-quality figures or extracted as data for user specific analysis if required.

Performance of DeepFRET

To test our DeepFRET performance in practice, we initially compared it with commonly used threshold values. To be on the safe site, we simulated 200 ground truth smFRET traces and merged them with a dataset containing 5000 random, non-smFRET traces (too noisy, aggregates of multiple molecules, aberrant single-molecule behavior, see Materials and methods for parameter descriptions). The obtained overall FRET distribution would be akin to what one would observe experimentally before any preprocessing of smFRET data on a low purity protein sample (Figure 2a). Common procedures for pre-selecting valid data for treatment often rely on an initial automatic threshold for discarding this large fraction of non-smFRET data (see Figure 1a). This is based on any number of combinations of the anticorrelated signal of the donor and acceptor, fluorophore bleaching, noise levels, or certain ranges of fluorophore stoichiometry, if recorded using ALEX methods (Hellenkamp et al., 2018; Hohlbein et al., 2014; Juette et al., 2016; Lee et al., 2005; McKinney et al., 2006; Preus et al., 2015; van de Meent et al., 2014). We first removed photobleaching and then accepted or rejected traces based on commonly used thresholds of median stoichiometry and max intensity (but not anticorrelation, see Materials and methods) without any manual post-inspection of the data. Figure 2b displays ground truth distribution (green) and the distribution of the accepted traces (pink) for varying the above thresholds. We recovered a poorly-defined FRET distribution that even at the tightest threshold does not recapitulate the underlying ground truth two-state conformational equilibrium. We calculated the common model evaluation metrics ‘precision’ and ‘recall’ (see Materials and methods) to quantify the quality of the predictions. The precision and recall, though improved by tightening the threshold, remain around 0.22 and 0.40, in the best case for simple thresholding (Figure 2b). The fact that out of 366 selected traces only 80 were true positive, while 286 were false positive and 120 false negative, highlights the need for human intervention as many traces are indistinguishable with simple statistical characterization (selected examples are shown below the histograms [Figure 2b]).

Figure 2 with 3 supplements see all
High quality FRET data evaluation.

(a) Simulated dynamic smFRET traces transitioning between FRET states 0.3 and 0.7 (left, ‘ground truth’) were mixed with a larger number of traces not showing smFRET (center). The overall distribution (right, ‘combined’) shows how the desired data can be drowned out in non-smFRET contaminant traces. The distribution would correspond to a raw distribution as extracted from raw image analysis of smFRET on low purity protein sample before any trace selection. (b) Automatic selection of data based on median stoichiometry, single-molecule intensity and bleaching. The number n designates the number of traces accepted by the model. Tightening the selection thresholds results in slight improvement of the poor overlap of the selected data with ground truth data, highlighting the need for a time-consuming and prone to potential cognitive biases human intervention. (c) Automatic classification of all traces of the combined set by DeepFRET, based only on the DeepFRET score threshold variation. Even at a low threshold DeepFRET selection follows the ground truth data. Increasing the score threshold further increases the fidelity of data selection. DeepFRET correctly assigns the dynamic, bleaching and aggregate behavior on the same smFRET traces as in (b) (see Figure 1—figure supplement 5 for more data). The single-user adjustable score threshold outperforms commonly used thresholds offering rapid, cross-lab reproducibility, and fully automatic data treatment. P: precision, R: recall, TN: true negatives, FP: false positives, FN: false negatives, TP: true positives.

DeepFRET, on the other hand, allowed the high-fidelity recovery of the underlying ground truth distribution reaching a precision of 0.91 when setting a DeepFRET score of 0.85 (Figure 2c) without the need for human intervention. The virtually identical FRET distributions, matching the ground truth data, that are derived for a large fraction of score thresholds (0.5– 0.85) show no systematic biases originating from data evaluation and illustrate the minimum impact of human interventions when using DeepFRET (see Figure 2, Figure 2—figure supplement 1). As expected, the fidelity of DeepFRET pertained to correctly identifying single or complex multistate FRET distributions (see Figure 2, Figure 2—figure supplements 12) reaching precision 0.91 as compared to just 0.22 for standard threshold setting in the absence of further human intervention. The practically identical precision and recall for single, double or triple, state FRET distributions independently of threshold further support the wide applicability to multiple biological systems.

Quantification of precision and recall of the selection for various DeepFRET score thresholds displays the tradeoffs in recovering a high fraction of useful data. Thresholding data with scores in the regime 0.8–0.9 appears optimal for maintaining sufficient and high-fidelity data (Figure 2, Figure 2—figure supplement 1) for the trace characteristics here. Based on these data, we employed a DeepFRET score threshold of 0.85 as optimal for maintaining high precision at reasonable recall values. We note that depending on datasets, imaging condition, noise levels, etc., users may need to adjust the threshold. The power of DeepFRET is further highlighted by the classification for the traces that were assigned as false negative and false positive by commonly accepted thresholds (traces in Figure 2b and Figure 2c, Figure 2—figure supplement 3). In summary, the fidelity of classification accuracy appears to supersede currently used simple thresholding, without human interventions. This was achieved in a fraction of the time required for data classification by human operators (~1 min for 10,000 traces on a normal laptop, as compared to potentially days for manually inspected traces). This improved classification was also achieved entirely without any preprocessing or post-inspection of data, illustrating the power of DeepFRET to operate without human interventions and its potential to benchmark the reproducibility of smFRET data acquisition methods for multiple biomolecular systems across laboratories.

We quantified and displayed using confusion matrix the discordance between the ground truth data and the data selected and classified by DeepFRET (Figure 3). In the confusion matrices, displayed in Figure 3, each row represents the predicted classification of traces while each column represents the ground truth data. The high classification accuracy for the annotation of individual frames is highlighted by the clear diagonal feature. We found similar classification performance for a DNN trained on non-ALEX FRET (by a DNN with only DD and DA inputs, which we will refer to as ‘ALEX-disabled’; Figure 3 right panels) signifying the applicability of the DeepFRET approach to both ALEX and non-ALEX FRET data. The misclassification between static and dynamic smFRET traces is practically non-existent and consists of <3% dynamic traces being misclassified as static, for both model types. This is important for accurately quantifying the abundance of static and dynamic subpopulations within the experimental time frame, which has been shown to have a clear experimental impact (He et al., 2019; Kilic et al., 2018; Osuka et al., 2018; Wood et al., 2012).

Figure 3 with 1 supplement see all
Confusion matrices of DeepFRET classification based on the ground truth data test.

(a) Classification accuracy of data in the six categories for the ALEX-enabled model, or the ALEX-disabled model. The absolute number of frames is shown while the fractions for each classification is displayed in parentheses (as calculated row-wise for each true label). The diagonal percentages show the accurate classification of DeepFRET (b) per-trace classification accuracy based on accepting only traces that are classified as smFRET (static/dynamic), and non-FRET data.

DeepFRET was found to classify bleached or aggregated frames with a 98% the true positives for ALEX-FRET model enabled (97% for the non-ALEX model), whereas only 89% (and 83% for ALEX-disabled) of the scrambled traces were correctly classified (see also Figure 2—figure supplement 3 for a detailed breakdown of the precision and recall). We note that the model is trained with a noise contribution that is drawn from a normal distribution of varying width (σ uniformly distributed between 0.01 and 0.30, multiplied by the maximum single fluorophore intensity) with a small contribution of gamma-distributed noise. As such, traces with σ above 0.25 are characterized by the employed DNN as ‘noisy’ (see Figure 2, Figure 2—figure supplement 3 and Materials and methods). In a regime where transition rates are similar to the imaging temporal resolution, traces may be incorrectly classified as noisy by the model. To allow experts to accept more noisy traces or traces with fast transition rates that may appear as noisy for given imaging conditions, we integrated into the DeepFRET a visual trace simulator. This user-friendly simulator allows the generation of traces with ground truth labels of traces where all parameters are tunable to integrate the specific needs of each lab (see Figure 1, Figure 1—figure supplements 6 and 11).

We found the classification accuracy of each frame to be consistent with the classification accuracy on each trace, later derived from the overall most probable class given all predictions of individual frames of a trace (Figure 3a, Figure 3b, Figure 1, Figure 1—figure supplement 6 and Materials and methods). This is achieved by adding a bidirectional long short-term memory, LSTM, layer at the end of the DNN (Figure 1, Figure 1—figure supplement 1). The LSTM layer allows coherent predictions throughout the trace and forward propagation of information detected in the first frames, such as fluorophore detection or bleaching, to the predictions for later frames. By collapsing the per-trace confusion matrix into a binary ‘smFRET’ and ‘non-smFRET’ (as shown by the cross-lines in Figure 3b), DeepFRET was found to be very balanced overall, with a true-positive rate of 94% for smFRET traces, and a true-negative rate of non-smFRET traces (Figure 3c), resulting in an overall balanced classification accuracy of 94% for the ALEX-enabled model and 93% for the ALEX-disabled model.

We then compared the classification accuracy of DeepFRET to the accuracy of three different human operators working with smFRET to evaluate the feasibility of manually inspecting and making decisions about smFRET examples. We simulated 1000 ground truth traces, of which only 46 contained actual smFRET, at different, randomly chosen levels of noise. The participants were neither informed about the underlying distribution nor the true number of smFRET traces. The test revealed that the average performance of the human operators, scoring 0.76 ± 0.10 in precision and 0.83 ± 0.14 in recall, was close to the precision-recall curve of the DNN, on a relatively small dataset (Figure 3, Figure 3—figure supplement 1). Notably, one participant scored slightly better than the model in both precision and recall but spent an average of 5 s per trace, which would significantly increase data treatment time, thus making this unfeasible in a high-throughput setting. The large spread on precision and recall attained by human operators on these data furthermore suggest a large possible spread in experimental outcomes and highlights the advantages of unifying, reproducible methodologies independent of human interventions. We, therefore, argue that DeepFRET is equally good, or better, as careful manual inspection while offering orders of magnitude faster data evaluation.

DeepFRET performance on real data compared to the existing robust software for smFRET analysis

The model’s generalizability was initially examined by evaluation on real experimental smFRET data previously published by us (Stella et al., 2018). The selected published dataset contains thousands of traces that included aggregates and incomplete labeled molecules, due to low labeling efficiency. Our pre-screening (using median stoichiometry and intensity distributions) and subsequent manual inspection of the data resulted in 214 traces to exhibit smFRET. Applying our trained model with a threshold of 0.85, without any other parameter tuning, recovered 228 traces, with a FRET distribution very closely matching manual selection (Figure 4, Figure 4—figure supplement 1). The DeepFRET score of human versus machine selection displays the importance of quantitative and reproducible assessment of trace scores (Figure 4, Figure 4—figure supplement 2). The total data evaluation time of <50 ms per trace (on a recent laptop) free of human intervention highlights the potential of DeepFRET to rapidly and reliably evaluate high throughput smFRET data. Most importantly, the trace selection is deterministic, strictly relies on the score threshold, and is thus independent of potential human cognitive bias. This demonstrates the ability of the DNN to generalize to a completely new set of experimental data, without any prior expectations for signal-to-noise ratio, anti-correlation, underlying FRET distribution, etc., offering the possibility to rapidly analyze smFRET data for structural biology insights.

Figure 4 with 4 supplements see all
Method evaluation on real previously published smFRET data.

(a) Raw FRET distribution as it would look before any sorting to remove incomplete or multi-labeled proteins, aggregates, cross talk, etc. (b, c) Comparison of DeepFRET data selection with published distributions for 0.7 in (b) and 0.85 in (c) thresholds. At the DeepFRET score threshold of 0.85, a high-fidelity data selection is achieved resulting in a similar distribution as compared to manual selection.

To further evaluate the performance of DeepFRET we compared it to the existing, robust, and widely used software packages for classification and treatment of smFRET data, iSMS, and SPARTAN (Juette et al., 2016; Preus et al., 2015), as well as ebFRET and HAMMY that focus on the kinetic analysis (McKinney et al., 2006; van de Meent et al., 2014). The performance was initially tested on simulated, ground truth data and further evaluated on published and publicly available experimental data (Kilic et al., 2018; Hellenkamp et al., 2018). For the simulated data, 200 ground truth smFRET traces were merged with 1800 simulated, non-smFRET traces and sorted by the various software packages. SPARTAN and iSMS both have sophisticated tools for automated sorting of traces (intensity, stoichiometry, and FRET thresholds in iSMS, and 26 parameter thresholds in SPARTAN including donor/acceptor correlation, SNR, intensity, FRET lifetime, and exclusion of photoblinking), while ebFRET and HAMMY relies on simple thresholds on the intensity and FRET. Here, practically the commonly used thresholds were employed in all software packages. DeepFRET was found to sort traces at least similarly to, or better than, both SPARTAN and iSMS without any parameter tuning, closely matching the underlying ground truth distribution, while ebFRET and HAMMY would require further sorting by manual selection or additional software packages for optimal results (see Figure 4—figure supplement 3). We then compared the performance of the three software packages offering advanced sorting on experimental data from other groups (Kilic et al., 2018; Hellenkamp et al., 2018). To ensure proper testing, the performance was evaluated on both ALEX and non-ALEX data. Our data show that all software packages were able to reproduce published FRET distributions from raw tif files with a little discrepancy (see Figure 4—figure supplement 4). We used practically the default settings in both software. We note that expert users are well trained to navigate through all thresholds and define their own to further, and even more accurately, optimize data selection. This task however may become more challenging on data where the ground truth distribution is unknown and fine-tuning parameters could be subject to bias, especially for non-specialized users. The use of a single threshold and thus minimal required expertise, offered by DeepFRET may be crucial for a greater number of scientists to take advantage of this tool.

To ensure the facile operation of DeepFRET by non-machine learning experts and users without any programming skills, we provided a standalone executable along with simple and detailed instructions on how to use it (see Materials and methods). DeepFRET implements and automates in a user-friendly and intuitive platform all common procedures for smFRET analysis: sophisticated raw image analysis from raw. tiff files; particle and signal detection and localization; pixel intensity extraction for each biomolecule on both spectral channels and background corrected fluorescence and FRET trace trajectories; automatic trace classification and sorting; unbiased analysis of number of FRET states based on BIC analysis; 2-channel fitting of idealized FRET traces using HMM analysis based on calculated number of states by BIC; data and statistical evaluation of abundance of FRET states and lifetimes; application of correction factors; and transition density plots (see Figure 1, Figure 1—figure supplements 911). DeepFRET furthermore offers interoperability and backward-compatible trace loading from. txt files exported from the popular iSMS software package. The software can export all results to publication-ready quality figures and also allows the extraction of data for further user-specific downstream analysis if desired.

The freely available open-source code and the underlying mathematical operations that are based on many commonly used packages (e.g. NumPy, SciPy, Matplotlib) will allow expert users to adjust features pipelining the analysis depending on their needs (see Code Availability). The DNN model is trained using Keras/TensorFlow, one of the most popular frameworks for deep learning. While the DNN is pre-trained with DeepFRET, we also provide the option for simulating new data with additional parameters offering the possibility of DNN model re-training to meet the specialized needs of trained users (e.g. multicolor FRET). The programming interface, on the other hand, allows the convenient implementation of additional scripts pipelining the analysis and to potentially expand it to additional single-molecule time-series analysis.

Discussion

smFRET is a powerful toolkit, key for exploring dynamic structural biology, but to meet its full potential, automated standardized and user-independent analysis of data is essential. Because the experimental conditions, instruments, and biological systems drastically vary across laboratories, the treatment of data based on semi-automatic methods and simplified assumptions could yield different conclusions. DeepFRET is designed to fill this void and analyze data independently of any assumptions and reproducibly across laboratories. Our experiments show that a neural network classifier trained on purely simulated smFRET time-series accurately and efficiently recognizes and classifies smFRET both in simulated ground truth data and in a real-world dataset. DeepFRET classification accuracy consistently outperformed trace selection using commonly published thresholds. Similarly, it supersedes the selection accuracy of human operators and importantly, only requiring a fraction of the time (minutes versus weeks if traces are manually selected). Such a drastic reduction of analysis times will allow the acquisition of even larger datasets expanding the field for high throughput analysis and improving the robustness of conclusion. The fact that DeepFRET does so only requiring a score threshold, as a sole human intervention, demonstrates its strength as a novel smFRET analysis method and its potential to form a reference against which the quality of the data and the structural biology insights are benchmarked. DeepFRET was found to perform accurately for both ALEX and non-ALEX smFRET data (Figure 4, Figure 4—figure supplements 34) highlighting its precise classification and applicability across laboratories and methods. The limited effect of human operators on data selection on the other hand illustrates its potential to contribute to the standardization of the field, increasing reproducibility across laboratories. We anticipate that DeepFRET, combined with the advent of commercial single-molecule instruments, will contribute in materializing the smFRET as a robust mainstream toolkit for structural biology labs.

DeepFRET is currently trained and thoroughly tested to operate on 2-color ALEX and non-ALEX smFRET data. Other experimental techniques such as 3- and 4-color FRET (Stein et al., 2011; Yoo et al., 2020), protein-induced fluorescence enhancement (PIFE; Hwang and Myong, 2014), metal-induced energy transfer (MIET; Chizhik et al., 2014) or intraparticle surface energy transfer (i-SET; Zhou et al., 2020) could be implemented in the future through additional simulation of training data with subsequent DNN re-training and software optimization (see Materials and methods and data availability for instructions). The simulation of data in DeepFRET is based on the assumption that protein dynamics as observed by smFRET follows a Markov process. Based on this assumption, practically any kind of experimental data can be simulated, using the implemented visual trace simulator, to meet specific requirements. This may include additional noise levels above 0.30, transition probabilities larger than 0.2, traces with more than 4 FRET states, slower or faster lifetimes that may be classified as noisy, etc. In its current version, DeepFRET operates on surface-immobilized particles only, however implementing single-particle tracking as described in Bohr et al., 2019 would allow simultaneous tracking and extraction of dynamics through smFRET on freely diffusing particles open up exciting possibilities, for example, live cell imaging with single-particletracking (Singh et al., 2020) and in vivo high throughput smFRET studies (White et al., 2020).

DeepFRET’s neural network is trained to operate for smFRET data but our approach of time-series classification and sequence annotation can conveniently be extended to consider a spectrum of stochastic single-molecule trajectories of individual turnovers including tracking (Bohr et al., 2019; Ferro et al., 2019; Lu et al., 1998; Persson et al., 2013), constant force measurements (Goldman et al., 2015) and blinking of individual molecules (Durisic et al., 2014; Wang et al., 2019) using either simulated or high-quality annotated experimental data for training. Consequently, we expect the neural network of DeepFRET or similar approaches to be a paradigm shift and enable new fully automated analysis methodologies related to biomolecular recognition, protein folding and dynamics, and super resolution. Such advances are paramount for increasing the breadth and impact of single-molecule studies to be fully exploited in structural biology.

Materials and methods

We first define a nomenclature that will be used throughout the text and plots: DD, DA, AA (donor excitation→donor emission, donor excitation→acceptor emission, acceptor excitation→acceptor emission, respectively). A separate background signal is not considered, as we assume all model inputs to be background-corrected (i.e. background is 0).

Synthetic smFRET data generation

Request a detailed protocol

Deep learning requires large amounts of diverse data in order to generalize well to unseen data. We have developed a method to generate the required thousands of ground truth traces to cover every type of empirically observed trace, with a dedicated user interface option (Figure 1—figure supplements 2 and 6). This algorithm includes the generation of TIRF-microscopy smFRET traces of ALEX or non-ALEX data. The traces sample any given FRET value with tunable dye photobleaching lifetime, signal noise, dye blinking, donor bleedthrough, aggregates (i.e. more than one donor/acceptor fluorophore) of any given size, as well as a ‘scrambling’ feature, to account for fluorophore phenomena that could not be classified as stemming from smFRET.

In order to generate traces, for each pair, we first generate the underlying FRET states from an adjustable Hidden Markov model and assume unscaled unit-intensities for DD, DA, AA. Then, if the energy transferred is defined by

(1) FRET=DA / (DD+DA)

the remaining intensity of the donor is

(2) DD=1FRET

and from (1), the transferred intensity is

(3) DA=(DD  FRET) / (FRET1)

In a perfectly-aligned setup, one can expect that

(4) DD + DA=AA

such that the stoichiometry S will be exactly

(5) S=(DD+DA) / (DD+DA+AA)=0.5

Initially, all fluorophores are simulated with an intensity of 1 (with absolute scaling only adjusted after applying all other parameters). Additionally, the intensity of AA should always be 1, regardless of the current FRET state. In practice, the AA intensity may not be exactly half of DD+DA (and consequently one might observe S that deviates slightly from 0.5). To account for this, we uniformly sample ‘AA-mismatch’ as a percentage of the unit intensity signal. Upon fluorophore photobleaching, with lifetimes sampled from an exponential distribution, either DD or DA/AA is set to 0. Noise, AA-mismatch, and donor bleedthrough are added to the ground truth signals to obtain the observable DD, DA, and AA, we can calculate realistic, observable values for E and S. For each synthetic trace, the noise is drawn from a Normal (μ = 0, σ) distribution of varying σ. We found that, on top of the normally distributed noise, we could add the noise from a (centered) Gamma(k = 1, θ = 1.1) distribution multiplied with the noise amplitude at each frame (and is thus controlled via the noise parameter). This did not visually alter the spread of the distribution significantly but improved the robustness of predictions on real data, as we found empirically that the noise of experimental data never exactly followed a pure normal distribution.

State-of-the-art neural networks can achieve human-like or better performance on a wide range of classification tasks. Recently, however, it has been demonstrated how small modifications to the input can lead to wildly inaccurate outputs (Goodfellow et al., 2014). During the development of our smFRET classification model, we observed how photophysical artifacting (described as ‘interesting effects’ by TJ Ha’s group [Roy et al., 2008]) would lead the model to make confident yet very inaccurate predictions to fix this, our trace generation algorithm contains extensive ‘scrambling’; we found that by randomly flipping one of the channels, creating strong correlations by multiplication of the channels or adding bursts of high noise and long dark states we could avoid ‘adversarial-like’ predictions. We note that scrambled data is not meant to mimic observable data but instead to make the model robust against mispredictions on highly aberrant data that does not fall into the other observable categories.

We generated ground truth traces, where every frame of the sequence was labeled as one of five categories: ‘(B) bleached’, ‘(A) aggregate’, ‘(N) noisy’, ‘(X) scrambled’, ‘(S) static smFRET’, or ‘(D) dynamic smFRET’ (see Figure 1—figure supplements 57). Additionally, we applied label smoothing with a strength of 0.05, as this has been shown to greatly improve model robustness and prediction confidence (Shafahi et al., 2019).

A central element of model training is the uniform sampling of the infinite number of possible permutations of FRET data (FRET states, occupancies, lifetimes, transition pathways, and noise). For training the model, we set the following parameters (easily adjustable in the interface, see Figure 1—figure supplement 6):

  • Up to four distinct FRET states within each trace uniformly sampled between FRET values 0 and 1 with a minimum distance of 0.1 FRET between states to be able to distinguish actual transitions from noise fluctuations. The uniformly sampled occupancies as well as FRET values are shown in Figure 1—figure supplement 3a.

  • A transition probability from one state to another, at any frame, uniformly sampled between 0 and 0.2. In a 4-state system, the maximum sampled transition probability is thus 0.2 between each of the four states yielding a total transition probability of 0.2*(4-1)=0.6 and thus a 1–0.6 = 0.4 probability of not transitioning at any given frame. The lifetime distribution of dynamic FRET states is an evenly weighted average over the exponential decays for each possible number of FRET states and transition probabilities. A Monte Carlo simulation on 10,000 traces sampling transition probabilities uniformly between 0 and 0.2 on 2–4 state traces, verifies that our training data follows the underlying model (see Figure 1—figure supplement 3b).

  • 0.15 probability that the trace is an aggregate.

  • Transition pathway sampling of an unbiased fraction of the entire smFRET space as shown in Figure 1—figure supplement 3c, where a transition density plot of 1000 simulated traces is plotted displaying a random transition pathway with no measurable bias.

  • 0.20 probability that a non-aggregate trace contains photoblinking.

  • 0.15 probability that a trace is scrambled, and in this case 0.90 probability that the scrambling is due to incorrectly colocalized fluorophores.

  • Acceptor-only mismatch between 70% and 130% of the donor intensity.

  • Donor-bleedthrough between 0% and 15% the donor intensity into the acceptor channel.

  • Noise drawn from a Normal(0, σ) distribution with σ values uniformly distributed between 0.01 and 0.30 (see Figure 1—figure supplement 4).

  • 0.8 probability that the noise has an additional layer of gamma noise on top, to mimic shot noise.

  • Individual trace duration of 300 frames.

  • Exponentially decaying photobleaching lifetime centered around 500 frames (which will generate a fraction of traces that do not contain any photobleaching).

  • 0.1 probability that the molecule will fall off the surface at a time given by an exponentially decaying lifetime centered around 500 frames (so it might not happen during the time of observation).

With these parameters, directly applicable as input for the algorithm (see Code availability), we randomly initially generated 250.000 traces of 300 frames each of randomized configurations. We then under-sampled data to balance the labels (as neural network classifiers perform worse if trained on highly class-imbalanced datasets) based on the first frame of each trace. This resulted in approximately 150,000 traces, roughly equally distributed over the five possible classes (bleaching being present in most traces naturally ends up accounting for a higher fraction of the overall frames). We used 80% for training the classifier and the remaining 20% for validation. After the training procedure, we generated an additional test set with 33.000 new traces and under-sampled it as previously, to roughly 20.000 traces, and based our statistical analysis on those alone.

We supplied only the raw features DD, DA, and AA to the model (or only DD and DA for the non-ALEX-enabled model), where for each trace, signals were normalized to the max of all signals in that trace to preserve the relative intensities between donor and acceptor. In this way, predictions done on individual smFRET traces are fully independent from every other in a given experiment, and also independent of non-standardized instrument intensity units (i.e. ‘arbitrary units’).

Neural network model setup and hyperparameters

An LSTM-RNN (long short-term memory recurrent neural network) classifier was implemented in Keras with TensorFlow as backend. The structure of the network (Figure 1—figure supplement 1) was inspired by a recent sequence classifier for ECG time-series (Hannun et al., 2019) that employs both skip connections and batch normalization as means to prevent overfitting. Here we replaced the global pooling layer with stride-1 max pooling layers, and added a bidirectional LSTM layer before the final fully connected layer, which we found lead to more temporally causal and context-sensitive predictions (e.g. if the model spots multiple bleaching steps in the beginning of a trace, this information is propagated throughout, so the whole trace can be confidently marked as aggregated).

Each residual block (‘Res’ in Figure 1—figure supplement 1) contains n = 2x filters, where x is five and is incremented by 1 at every 4th block. The kernel size k starts from 16 and is reduced by 4 at every 4th Res block, so as to learn larger-scale features and gradually smaller ones. The initial convolution has the same hyper-parameters as the first residual block. A 1 × k convolution is added in each residual block for efficiency (He et al., 2016). To avoid problems with vanishing gradients throughout such a deep model, each residual block keeps a copy of the input vector and adds it to the output vector (denoted by the ‘+” symbol). The long short-term memory (LSTM) cell is bidirectional and contains 16 units, and has a dropout rate of 0.4 applied to the outputs. For each frame, the outputs are distributed among six different classes by a dense layer with softmax activation.

The model loss was minimized in batches of 32 samples with the Adam optimizer, using the default parameters and the default learning rate of 0.001. The learning rate was decreased by a factor of 10 if validation loss showed no improvement over two consecutive epochs. The training was stopped early if no improvement in the validation loss was observed over five consecutive epochs. Convolutional kernels were initialized as proposed by He et al., 2015. Other layer configurations were left at their Keras defaults. The final model output is passed through a softmax layer, thus that for each frame the probabilities between all classes sum up to exactly 1. Further experimentation with optimizers and learning rates showed no significant improvement over the above configuration.

Bleaching detection

Request a detailed protocol

In order to avoid having single-frame bleaching triggering the remainder of the trace being marked as bleached, we employ a sliding window over the whole trace. In each window, at least 4 out of 7 frames must be marked as bleached with >0.5 probability by the model. If this condition is satisfied, all frames in the window and every frame onwards is marked as bleached, and excluded from the calculation of smFRET confidence. The model predicts with >99% accuracy bleaching (Figure 3). Additionally, if bleaching happens faster than the first 15 frames, the whole trace is classified as bleached, regardless of model classification, as the DeepFRET score would otherwise end up being artificially inflated (see below).

For stoichiometry-based thresholding (Figure 2), we employed a similar sliding window but instead marked frames as bleached if the stoichiometry was outside of the range (0.3, 0.7).

Precision and recall

Request a detailed protocol

We use precision and recall to quantify classifier performance. These are defined as,

(6) Precision:P=Tp / (Tp+Fp)
(7) Recall: R=Tp / (Fn+Tp)

where Tp, Fp, Fn are True positive, False positive, and False negative, respectively.

DeepFRET score calculation and trace classification

Request a detailed protocol

In order to calculate the confidence score, probabilities for all categories for each frame are first predicted by the model, and bleached frames (see above) excluded from the score calculation. The average probability pi over all frames t, for each of the remaining five categories is calculated, resulting in five category scores Pi for each category i.

Pi=tpittpitPi=tpittpit

Static smFRET (S), dynamic smFRET (D) scores are summed into the final DeepFRET score, and aggregate (A), noisy (N), and scrambled (X) scores ignored for calculation of this (but retained and displayed for explainability for the user). See Figure 1—figure supplements 58 for examples on all trace types.

Model performance evaluation

Noise level of synthetic data

Request a detailed protocol

We changed the label of traces to ‘noisy’ if the initial noise was drawn from a normal(μ = 0, σ) with σ above 0.25. Traces above this level of noise could no longer statistically be approximated as normally distributed by D’Agostino-Pearson two-sided test for normality (Figure 1—figure supplement 4; which is a requirement for fitting the correct number of FRET states in a trace, using a mixture model). Although a σ of 0.20 also fulfilled the p<0.05 test statistic, we chose to opt for a limit of 0.25, as we found that the neural network would otherwise tend to discard less noisy data too frequently.

Trends in human versus machine selection

Request a detailed protocol

To test for differences in the way a human versus our trained model would select traces, three participants partook in the manual selection of generated data (Figure 3—figure supplement 1), similar to that of Figure 2, only this time with 1000 traces, wherein 46 were true smFRET traces and 954 non-usable traces. The number of true smFRET traces and underlying distributions were unknown to the participants.

Performance test and comparison with existing software

Request a detailed protocol

For testing simple thresholding versus DeepFRET (Figure 2, Figure 2—figure supplements 12 and Figure 4—figure supplement 4) we generated data with the following parameters:

  • Acceptor-only mismatch between 70% and 130% of the donor intensity.

  • Donor-bleedthrough of 5% of the donor intensity into the acceptor channel.

  • Noise drawn from a normal(σ = 0.11) distribution

  • 1 (0.5 FRET), 2 (0.3, 0.7 FRET), or 3 FRET states (0.2, 0.5, 0.8 FRET)

  • Transition probability of 0.1 between states, at each frame.

Other parameters were set to the same value as what is used to generate training data. Furthermore, all generated ground truth traces that did not bleach were discarded.

Our definition of ‘simple thresholding’ is based on single-molecule intensity, median stoichiometry, and the presence of bleaching. Here we chose not to use any values for anti-correlation as this assumes that all molecules of interest are equally dynamic, when smFRET studies have shown that this may not always be the case (He et al., 2019; Kilic et al., 2018; Osuka et al., 2018).

Extra features of the software platform

Hidden Markov model and statistical analysis

Request a detailed protocol

The DeepFRET GUI has the option to fit traces with a Hidden Markov model, with adjustable strictness on the number of states, according to recent best practices for smFRET data analysis, including the ability to switch between predicting states from raw fluorescence intensities or EFRET values (Kelly et al., 2012). We fit the traces using the pomegranate implementation of the Baum-Welch algorithm (Schreiber, 2018). We further provide the option to predict state values directly from the Markov Model or from the median of the classified frames for each trace, to maintain compatibility and comparability with current results in the field. We provide clustering of subsequent transition density plots, lifetime estimates with detection of degenerate states, and publication-ready plots for Pearson's correlation coefficients, DD/DA histograms, and EFRET-stoichiometry histograms.

The Hidden Markov model was verified on externally available data from the kinSoft challenge, as well as simulated data produced within DeepFRET.

Data availability

Request a detailed protocol

All data used for model training and instructions on how to use it, is available at https://github.com/hatzakislab/DeepFRET-Model (Thomsen, 2020; copy archived at https://github.com/elifesciences-publications/DeepFRET-Model).

Code availability

Request a detailed protocol

We provide DeepFRET as an accessible GUI for everyone, as well as the Python source code for expert users. The code for the GUI as well as compiled executables, with instructions for how to edit and recompile the GUI is located at https://github.com/hatzakislab/DeepFRET-GUI.

References

  1. 1
  2. 2
  3. 3
  4. 4
  5. 5
  6. 6
  7. 7
  8. 8
  9. 9
  10. 10
  11. 11
  12. 12
  13. 13
  14. 14
  15. 15
  16. 16
  17. 17
  18. 18
  19. 19
  20. 20
  21. 21
  22. 22
  23. 23
  24. 24
  25. 25
  26. 26
  27. 27
  28. 28
  29. 29
  30. 30
  31. 31
  32. 32
  33. 33
  34. 34
  35. 35
  36. 36
  37. 37
  38. 38
  39. 39
  40. 40
  41. 41
  42. 42
  43. 43
  44. 44
  45. 45
  46. 46
  47. 47
  48. 48
  49. 49
  50. 50
  51. 51
    Pomegranate: fast and flexible probabilistic modeling in Python
    1. J Schreiber
    (2018)
    Journal of Machine Learning Research 18:1–6.
  52. 52
  53. 53
  54. 54
  55. 55
  56. 56
  57. 57
  58. 58
  59. 59
  60. 60
  61. 61
  62. 62
  63. 63
  64. 64
  65. 65
  66. 66
  67. 67
  68. 68
  69. 69
  70. 70

Decision letter

  1. Sebastian Deindl
    Reviewing Editor; Uppsala University, Sweden
  2. Suzanne R Pfeffer
    Senior Editor; Stanford University School of Medicine, United States
  3. Shixin Liu
    Reviewer; The Rockefeller University, United States

In the interests of transparency, eLife publishes the most substantive revision requests and the accompanying author responses.

Acceptance summary:

DeepFRET, a new deep learning-based platform for standardized, automated, and unbiased single-molecule FRET data analysis, has great potential to lower the threshold for smFRET expertise, allowing for a greater number of scientists to take advantage of this powerful technique.

Decision letter after peer review:

Thank you for submitting your article "DeepFRET: Rapid and automated single molecule FRET data classification using deep learning" for consideration by eLife. Your article has been reviewed by two peer reviewers, and the evaluation has been overseen by Sebastian Deindl as the Reviewing Editor and Suzanne Pfeffer as the Senior Editor. The following individual involved in review of your submission has agreed to reveal their identity: Shixin Liu (Reviewer #1).

The reviewers have discussed the reviews with one another and the Reviewing Editor has drafted this decision to help you prepare a revised submission.

Summary:

A major bottleneck in many single-molecule FRET experiments is the need to sort through and classify the single-molecule time traces in order to separate the good traces in an experiment from ones that contain artifacts. In this work, Thomsen et al. describe a new platform for standardized, automated, and unbiased single-molecule FRET data analysis. The software is based on deep learning and is intended to act as the sole tool required to go from smFRET data acquisition all the way to quantitative analyses and publication-quality figure production. Rapid and automated analysis is enabled following a single user-input parameter (quality threshold), ensuring minimal user intervention in the data analysis process. A user-friendly GUI is provided to further simplify analysis. Importantly, the platform is open source written in Python, so users may adjust the code for their specific needs.

If successful, this platform could lower the threshold for smFRET expertise, allowing for a greater number of scientists to take advantage of this powerful tool. However, in its current form the manuscript could benefit from important clarifications to convince the readers of the novelty and superiority of this platform compared to the numerous previously published smFRET data analysis software packages. One weakness of the validation is that it was done only on a single real data set from one experiment performed by their lab.

Essential revisions:

1) It is unclear how pre-training the model can account for the infinite possible FRET states/lifetimes/occupancies/transition pathways/noise etc. How can this not bias the results to look for traces that are similar SNR etc. to the training data? For example, the 0.2 max probability of transition between states in the training dataset could bias analysis toward long-lived FRET states. The authors should comment on this.

2) Comparison of DeepFRET to human accuracy in picking "clean traces" does not seem to be an appropriate comparison (and is obviously faster). Manual trace selection is generally no longer a standard means to analyze smFRET data given the freely available open-source automated alternatives (e.g. HAMMY, ebFRET, SPARTAN, etc.). The comparison to other available software packages is important to convince users of the superior or at least equivalent performance of DeepFRET in automated trace selection. The authors should include such a comparison in the revised version of the manuscript.

3) Along the same lines, a better way to prove DeepFRET's trace analysis power is to take several datasets and compare analyses from HAMMY, ebFRET etc. vs. DeepFRET. The authors should include such comparative analyses on more than one dataset in the revised version of the manuscript.

4) It would be helpful if, in the Discussion section, the authors could provide a discussion of the tool's limitations. A discussion that talks about cases where their tool might fail would be useful for researchers who want to use their tool or build upon it. For example, in the discussion, the Materials and methods section notes that photophysical effects that are sometimes observed in smFRET experiments can be problematic for the method (it seems like the tool would likely classify these as non-useful traces even though they might reflect the "true" signal from the experiment [e.g. observation of PIFE in work from TJ Ha's group]).

https://doi.org/10.7554/eLife.60404.sa1

Author response

Essential revisions:

1) It is unclear how pre-training the model can account for the infinite possible FRET states/lifetimes/occupancies/transition pathways/noise etc. How can this not bias the results to look for traces that are similar SNR etc. to the training data? For example, the 0.2 max probability of transition between states in the training dataset could bias analysis toward long-lived FRET states. The authors should comment on this.

This is indeed a very valid comment and has been central for the process as there is an infinite number of possible permutations of FRET data, and improper training of the model could lead to biased trace selection. We thank the reviewers for allowing us to further comment on this as it may not have been clear in the manuscript.

To introduce as little bias as possible towards specific FRET states, we strived to generate a representative fraction of the entire smFRET space sampling uniformly a large number of the infinite possible permutations of data (FRET states/lifetimes/occupancies/transition pathways/noise). To verify this and further characterize the training data, we took a series of cautious steps which are outlined below:

a) FRET states and occupancies. The number of FRET states in a given trace was sampled uniformly from 1 to 4 states. Each state was randomly assigned a FRET value by uniform sampling between 0 and 1. For traces with more than one state, a minimum distance of 0.1 FRET was required between states (i.e. random, uniform sampling was repeated until the requirement was fulfilled) to be able to distinguish actual transitions from noise fluctuations. We verified that all FRET states as well as occupancies of the 150 k traces in the training data were uniformly distributed between 0 and 1 as displayed in Figure 1—figure supplement 3A. We realised that this may not have been clear in the original submission, so we have detailed this further in the Materials and methods subsection “Synthetic smFRET data generation”, in the revised version. Also, we corrected the legend of Figure 1—figure supplement 3A to “smFRET traces were generated with 1-4 randomly defined FRET states […]” as due to a phrasing error it stated that traces were generated with either 1, 2 or 3 states in the original submission.

b) Transition probability and lifetimes. We thank the reviewers for noticing the phrasing error in the original submission “with below 0 and 0.2 probability of transitioning”. The transition probability for each frame of the trace, from a given state to another, is uniformly sampled between 0 and 0.2. The transition probabilities are relevant between states, such that in a 4-state system with maximum allowed transition probabilities of 0.2, the combined transition probability at any given frame is 0.2*(4-1 states)=0.6. On the other hand, in a 2-state system with 0.01 transition probabilities between states, the combined transition probability at any given frame is 0.01*(2-1 states)=0.01. The mean lifetimes of the above mentioned systems are thus 1/0.6=1.7 frames and 1/0.01=100 frames, respectively. The lifetime distribution of the entire training data is an evenly weighted average over the exponential decays for each possible number of FRET states and transition probabilities as shown in Figure 1—figure supplement 3B. A Monte Carlo simulation on 10,000 traces sampling transition probabilities uniformly between 0 and 0.2 on 2-4 state traces, verifies that our training data follows the underlying model (Figure 1—figure supplement 3B). Hence, we sample a broad range of transition probabilities uniformly covering both long- and short-lived FRET states, striving to resemble most experimental data without introducing bias towards specific lifetimes. Acknowledging that this may not have been clear in the manuscript, we have added the new Figure 1—figure supplement 3B with lifetime distributions and the Monte Carlo simulation together with a brief description of how these were derived in the Materials and methods subsection “Synthetic smFRET data generation”.

c) The transition pathway follows a Markov chain which is randomly generated from a transition matrix with probabilities sampled as described above. The Markov chain of each simulated trace is generated using the Hidden Markov Model implementation of the opensource Python package named pomegranate. To verify that the training data sample a subset of all possible transition pathways uniformly, we have plotted a transition density plot of n=10,000 simulated traces. This convincingly illustrates a completely random and uniform transition pathway as expected. Acknowledging that this critical information was not implemented in the original submission, we have added the plot as a new Figure 1—figure supplement 3C, and made further clarifications in the Materials and methods subsection “Synthetic smFRET data generation”.

d) Noise level. Experimental data can sample a wide range of noise levels dependent on the biological system, instrumentation, experimental setup etc. To cover a broad range of possible noise levels, we took the following steps: First, donor and acceptor intensities of each trace were generated from the underlying true FRET values. Then, noise was added to the intensities by sampling from a normal distribution with σ values uniformly distributed between 0.01 and 0.30. To mimic shot noise, an additional layer of gamma noise was added on top with 0.8 probability. To justify the chosen range of σ values, we had plotted simulated FRET distributions at various noise levels as shown in Figure 1—figure supplement 4. The broad range of noise levels that we sample in the training data resemble most experimental data without introducing bias towards specific SNR due to uniform sampling supporting our training data and thus the model is unbiased towards specific SNR within the given range of σ values. We have clarified in the revised version both in the subsection “Performance of DeepFRET” and Discussion that in a regime where transition rates are similar to the imaging temporal resolution, dynamic smFRET traces may be incorrectly classified as noisy by the model. Trace simulation and model re-training (or changing imaging settings) would solve this.

We acknowledge that despite our rigorous attempts to introduce as little bias as possible by sampling all parameters uniformly, specialised users may have a better judgement and knowledge of their specific systems, noise levels, state lifetimes or transition probabilities amongst other parameters. When training a model, there might be some kind of bias defining the interval limits of the sampled parameters. If therefore, an expert user wants to tailor the model to better meet their specific needs, DeepFRET implements a user-friendly trace simulation interface where new FRET traces can easily be simulated based on user-defined parameters (see Figure 1—figure supplement 6) and used for retraining of the DNN model following our instructions (https://github.com/hatzakislab/DeepFRETModel). We have further highlighted this in the new paragraph in the Discussion section.

2) Comparison of DeepFRET to human accuracy in picking "clean traces" does not seem to be an appropriate comparison (and is obviously faster). Manual trace selection is generally no longer a standard means to analyze smFRET data given the freely available open-source automated alternatives (e.g. HAMMY, ebFRET, SPARTAN, etc.). The comparison to other available software packages is important to convince users of the superior or at least equivalent performance of DeepFRET in automated trace selection. The authors should include such a comparison in the revised version of the manuscript.

We wish to highlight that the main the scope of this manuscript is to provide an intuitive platform requiring minimal human intervention, that as the reviewers commented “[…] could lower the threshold for smFRET expertise, allowing for a greater number of scientists to take advantage of this powerful tool”, rather than prove wrong the existing, robust, software packages developed and operated by experts in the field. We also acknowledge that manual trace selection should not be the standard means to analyse smFRET data, but despite the wide range of available software packages for smFRET data analysis, only some implement advanced automatic sorting of traces. HAMMY and ebFRET as the reviewers suggested focus on kinetic rate extraction and offer simple thresholds based on intensity and FRET values. Cleaning up traces beyond these simple thresholds often requires extra manual selection. iSMS offers more advanced sorting by donor/acceptor intensity, mean FRET and mean stoichiometry for ALEX data as well as automatic detection of photobleaching. SPARTAN offers more extensive implementations for automated sorting (26 parameters in total) and is optimised for non-ALEX data. The actual threshold criteria however may vary significantly for each group and experimental system (Fessl et al., 2018; Gouge et al., 2017; Schärfen and Schlierf, 2019; Tsuboyama et al., 2018; Yao et al., 2015). Specialised groups are well trained to navigate through these multiple criteria and accurately define their own, which are optimised to operate on their specific systems (Aznauryan et al., 2016; Fessl et al., 2018; Gouge et al., 2017; Schärfen and Schlierf, 2019; Tsuboyama et al., 2018; Wu et al., 2018; Yao et al., 2015). However the diversity of these sorting criteria might introduce unnecessary bias in an already complicated series of data processing, especially since the advent of commercial instruments has rapidly expanded the smFRET field.

To directly address the comments of the reviewers we performed two types of experimental validations. We firstly compared DeepFRET directly with HAMMY, ebFRET, SPARTAN and iSMS sorting capabilities on simulated data where the ground truth is known. We then compared SPARTAN and iSMS, that have advanced sorting options, on experimental data sets published by other groups.

In the first case, we merged 200 simulated, ground truth smFRET traces with 1800 simulated, nonsmFRET traces (Figure 4—figure supplement 3A and the Materials and methods for FRET distributions and parameter descriptions, respectively) and reverse engineered tif files that would correspond to raw smFRET data. We simulated both ALEX and non-ALEX data as iSMS is optimized for ALEX data while HAMMY, ebFRET and SPARTAN operate optimally on non-ALEX data. The tif files were loaded into the respective software packages and used for extraction and sorting of traces applying a quality score of 0.80 in DeepFRET, default sorting parameters in SPARTAN (except background noise threshold), intensity, stoichiometry and FRET thresholds in iSMS, intensity thresholds in HAMMY and FRET thresholds in ebFRET. We found DeepFRET, SPARTAN and iSMS to recover the underlying ground truth FRET distribution to various levels of detail, while the simple intensity and FRET thresholds of HAMMY and ebFRET would require further sorting for optimal results. Notably, DeepFRET was found to sort traces at least similarly to, or better than, both SPARTAN and iSMS without any parameter tuning (Figure 4—figure supplement 3B). We strongly note that expert users would be able to fine-tune all possible thresholds to better match the ground truth data. However, in a real-world experiment where the ground truth is unknown the task becomes more challenging and finetuning parameters may be subject to bias, especially for non-specialised users. The single threshold-based classification offered by DeepFRET may be crucial for a greater number of scientists to take advantage of this tool.

In the second case, we compared the performance of the three software packages offering advanced sorting on experimental data published by other groups. Selecting non-ALEX and ALEX datasets published by Kilic et al. (Kilic et al., 2018) and Hellenkamp et al. (Hellenkamp et al., 2018), respectively ensures proper testing in diverse FRET settings and data sets. We used practically the default settings in both softwares and cropped data to the first 10 frames of each trace, minimising bleaching without using the default hard threshold at FRET < 0.2 in SPARTAN. Our analysis illustrates that all three software packages are able to reproduce the published FRET distributions from raw tif files with little discrepancy (Figure 4—figure supplement 4). DeepFRET displays equivalent or superior performance to existing sophisticated software packages also on experimental data. We emphasize that existing software packages are very robust and expert users would be able to navigate through and optimize all of the required settings for the individual datasets. The power of DeepFRET is its capacity to analyse both ALEX and non-ALEX data in a reproducible manner only requiring minimal human intervention, and thus minimal expertise in threshold setting, allowing for a greater number of scientists to take advantage of this powerful tool. Acknowledging the lack of comparison to existing software, we have added the new Figure 4—figure supplements 3-4. We have also renamed the sub section “DeepFRET performance on real data,” to “DeepFRET performance on real data, comparison to existing robust softwares for smFRET analysis” and also added a new paragraph in the section explicitly discussing the comparison on simulated ground truth and published data.

3) Along the same lines, a better way to prove DeepFRET's trace analysis power is to take several datasets and compare analyses from HAMMY, ebFRET etc. vs. DeepFRET. The authors should include such comparative analyses on more than one dataset in the revised version of the manuscript.

To address the comment of the reviewers we directly compared the performance of DeepFRET to published results on two experimental datasets (ALEX and non-ALEX) across different groups (Figure 4—figure supplement 4 and answer to comment 2). We found that DeepFRET was able to reproduce the published smFRET distributions using a simple quality threshold of 0.80 without further human intervention, and furthermore so equally or better than existing software, (see also answer to reviewer comment 2). These data further validate the performance and power of DeepFRET as compared to other existing software requiring user-defined thresholds that may require specialised expertise of its users. We have explained the comparison in the main text and added the new Figure 4—figure supplement 4 in the manuscript.

4) It would be helpful if, in the Discussion section, the authors could provide a discussion of the tool's limitations. A discussion that talks about cases where their tool might fail would be useful for researchers who want to use their tool or build upon it. For example, in the discussion, the Materials and methods section notes that photophysical effects that are sometimes observed in smFRET experiments can be problematic for the method (it seems like the tool would likely classify these as non-useful traces even though they might reflect the "true" signal from the experiment [e.g. observation of PIFE in work from TJ Ha's group]).

Following the comment of the reviewers, we added a new paragraph to the Discussion section outlining the limitations of the current version of DeepFRET and what could be done in the future to improve it. As outlined in the previous sections, DeepFRET performs accurately on both simulated and experimental 2-color smFRET data across multiple laboratories. The limitations that are discussed in the revised version of the manuscript can be addressed by expert users through simulation of new training data and re-training of the DNN model following our instructions (https://github.com/hatzakislab/DeepFRET-Model) as described in the Materials and methods section.

References:

Aznauryan, M., Søndergaard, S., Noer, S.L., Schiøtt, B., Birkedal, V., 2016. A direct view of the complex multi-pathway folding of telomeric G-quadruplexes. Nucleic Acids Res. 44, 11024– 11032. doi:10.1093/nar/gkw1010

Fessl, T., Watkins, D., Oatley, P., Allen, W.J., Corey, R.A., Horne, J., Baldwin, S.A., Radford, S.E., Collinson, I., Tuma, R., 2018. Dynamic action of the Sec machinery during initiation, protein translocation and termination. eLife 7. doi:10.7554/eLife.35112

Gouge, J., Guthertz, N., Kramm, K., Dergai, O., Abascal-Palacios, G., Satia, K., Cousin, P., Hernandez, N., Grohmann, D., Vannini, A., 2017. Molecular mechanisms of Bdp1 in TFIIIB assembly and RNA polymerase III transcription initiation. Nat. Commun. 8, 130.

doi:10.1038/s41467-017-00126-1

Schärfen, L., Schlierf, M., 2019. Real-time monitoring of protein-induced DNA conformational changes using single-molecule FRET. Methods 169, 11–20.

doi:10.1016/j.ymeth.2019.02.011

Tsuboyama, K., Tadakuma, H., Tomari, Y., 2018. Conformational activation of argonaute by distinct yet coordinated actions of the hsp70 and hsp90 chaperone systems. Mol. Cell 70, 722-729.e4. doi:10.1016/j.molcel.2018.04.010

Wu, S., Liu, J., Wang, W., 2018. Dissecting the Conformational Dynamics-Modulated Enzyme Catalysis with Single-Molecule FRET. J. Phys. Chem. B 122, 6179–6187.

doi:10.1021/acs.jpcb.8b02374

Yao, C., Sasaki, H.M., Ueda, T., Tomari, Y., Tadakuma, H., 2015. Single-Molecule Analysis of the Target Cleavage Reaction by the Drosophila RNAi Enzyme Complex. Mol. Cell 59, 125–132. doi:10.1016/j.molcel.2015.05.015

https://doi.org/10.7554/eLife.60404.sa2

Article and author information

Author details

  1. Johannes Thomsen

    Department of Chemistry and Nanoscience Centre, University of Copenhagen, Copenhagen, Denmark
    Contribution
    Conceptualization, Data curation, Software, Formal analysis, Validation, Investigation, Visualization, Methodology, Writing - original draft, Writing - review and editing
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0002-0148-9114
  2. Magnus Berg Sletfjerding

    Department of Chemistry and Nanoscience Centre, University of Copenhagen, Copenhagen, Denmark
    Contribution
    Data curation, Software, Formal analysis, Validation, Methodology, Writing - review and editing
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0001-8669-4039
  3. Simon Bo Jensen

    Department of Chemistry and Nanoscience Centre, University of Copenhagen, Copenhagen, Denmark
    Contribution
    Data curation, Formal analysis, Validation, Investigation, Visualization, Writing - review and editing
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0002-8946-3831
  4. Stefano Stella

    Structural Molecular Biology Group, Novo Nordisk Foundation Centre for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
    Contribution
    Data curation, Investigation, Writing - review and editing
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0002-9078-4659
  5. Bijoya Paul

    Structural Molecular Biology Group, Novo Nordisk Foundation Centre for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
    Contribution
    Data curation, Investigation, Writing - review and editing
    Competing interests
    No competing interests declared
  6. Mette Galsgaard Malle

    Department of Chemistry and Nanoscience Centre, University of Copenhagen, Copenhagen, Denmark
    Contribution
    Validation, Writing - review and editing
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0003-3722-502X
  7. Guillermo Montoya

    Structural Molecular Biology Group, Novo Nordisk Foundation Centre for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
    Contribution
    Resources, Data curation, Supervision, Funding acquisition
    Competing interests
    No competing interests declared
  8. Troels Christian Petersen

    Niels Bohr Institute, University of Copenhagen, Copenhagen, Denmark
    Contribution
    Resources, Data curation, Software, Formal analysis, Supervision, Funding acquisition, Validation, Investigation, Visualization, Methodology, Writing - original draft, Writing - review and editing
    Competing interests
    No competing interests declared
  9. Nikos S Hatzakis

    1. Department of Chemistry and Nanoscience Centre, University of Copenhagen, Copenhagen, Denmark
    2. Novo Nordisk Foundation Centre for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
    Contribution
    Conceptualization, Resources, Data curation, Supervision, Funding acquisition, Validation, Investigation, Visualization, Methodology, Writing - original draft, Project administration, Writing - review and editing
    For correspondence
    hatzakis@chem.ku.dk
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0003-4202-0328

Funding

Carlsbergfondet (CF16-0797)

  • Johannes Thomsen
  • Simon Bo Jensen
  • Nikos S Hatzakis

Velux Fonden (10099)

  • Nikos S Hatzakis

Velux Fonden (18333)

  • Magnus Berg Sletfjerding
  • Mette Galsgaard Malle
  • Nikos S Hatzakis

Novo Nordisk (NNF14CC0001)

  • Stefano Stella
  • Bijoya Paul
  • Guillermo Montoya
  • Nikos S Hatzakis

Novo Nordisk (NNF0024386)

  • Guillermo Montoya

Novo Nordisk (NNF17SA0030214)

  • Guillermo Montoya

Novo Nordisk (NNF18OC0055061)

  • Guillermo Montoya

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Acknowledgements

We thank Soeren SR Bohr and Henrik Pinholt for fruitful discussions. GM and NSH are members of the Integrative Structural Biology Cluster (ISBUC) at the University of Copenhagen.

Senior Editor

  1. Suzanne R Pfeffer, Stanford University School of Medicine, United States

Reviewing Editor

  1. Sebastian Deindl, Uppsala University, Sweden

Reviewer

  1. Shixin Liu, The Rockefeller University, United States

Publication history

  1. Received: June 25, 2020
  2. Accepted: October 2, 2020
  3. Version of Record published: November 3, 2020 (version 1)

Copyright

© 2020, Thomsen et al.

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.

Metrics

  • 1,429
    Page views
  • 160
    Downloads
  • 0
    Citations

Article citation count generated by polling the highest count across the following sources: Crossref, PubMed Central, Scopus.

Download links

A two-part list of links to download the article, or parts of the article, in various formats.

Downloads (link to download the article as PDF)

Download citations (links to download the citations from this article in formats compatible with various reference manager tools)

Open citations (links to open the citations from this article in various online reference manager services)

  1. Further reading

Further reading

    1. Structural Biology and Molecular Biophysics
    Andres López-Perrote et al.
    Research Article Updated

    Nonsense-mediated mRNA decay (NMD) is a surveillance pathway that degrades aberrant mRNAs and also regulates the expression of a wide range of physiological transcripts. RUVBL1 and RUVBL2 AAA-ATPases form an hetero-hexameric ring that is part of several macromolecular complexes such as INO80, SWR1, and R2TP. Interestingly, RUVBL1-RUVBL2 ATPase activity is required for NMD activation by an unknown mechanism. Here, we show that DHX34, an RNA helicase regulating NMD initiation, directly interacts with RUVBL1-RUVBL2 in vitro and in cells. Cryo-EM reveals that DHX34 induces extensive changes in the N-termini of every RUVBL2 subunit in the complex, stabilizing a conformation that does not bind nucleotide and thereby down-regulates ATP hydrolysis of the complex. Using ATPase-deficient mutants, we find that DHX34 acts exclusively on the RUVBL2 subunits. We propose a model, where DHX34 acts to couple RUVBL1-RUVBL2 ATPase activity to the assembly of factors required to initiate the NMD response.

    1. Cell Biology
    2. Structural Biology and Molecular Biophysics
    Lukas Ded et al.
    Research Article Updated

    Out of millions of ejaculated sperm, a few reach the fertilization site in mammals. Flagellar Ca2+ signaling nanodomains, organized by multi-subunit CatSper calcium channel complexes, are pivotal for sperm migration in the female tract, implicating CatSper-dependent mechanisms in sperm selection. Here using biochemical and pharmacological studies, we demonstrate that CatSper1 is an O-linked glycosylated protein, undergoing capacitation-induced processing dependent on Ca2+ and phosphorylation cascades. CatSper1 processing correlates with protein tyrosine phosphorylation (pY) development in sperm cells capacitated in vitro and in vivo. Using 3D in situ molecular imaging and ANN-based automatic detection of sperm distributed along the cleared female tract, we demonstrate that spermatozoa past the utero-tubal junction possess the intact CatSper1 signals. Together, we reveal that fertilizing mouse spermatozoa in situ are characterized by intact CatSper channel, lack of pY, and reacted acrosomes. These findings provide molecular insight into sperm selection for successful fertilization in the female reproductive tract.