Mars, a molecule archive suite for reproducible analysis and reporting of single-molecule properties from bioimages

  1. Nadia M Huisjes
  2. Thomas M Retzer
  3. Matthias J Scherr
  4. Rohit Agarwal
  5. Lional Rajappa
  6. Barbara Safaric
  7. Anita Minnen
  8. Karl E Duderstadt (corresponding author)
  1. Structure and Dynamics of Molecular Machines, Max Planck Institute of Biochemistry, Germany
  2. Physik Department, Technische Universität München, Germany

Abstract

The rapid development of new imaging approaches is generating larger and more complex datasets, revealing the time evolution of individual cells and biomolecules. Single-molecule techniques, in particular, provide access to rare intermediates in complex, multistage molecular pathways. However, few standards exist for processing these information-rich datasets, posing challenges for wider dissemination. Here, we present Mars, an open-source platform for storing and processing image-derived properties of biomolecules. Mars provides Fiji/ImageJ2 commands written in Java for common single-molecule analysis tasks using a Molecule Archive architecture that is easily adapted to complex, multistep analysis workflows. Three diverse workflows involving molecule tracking, multichannel fluorescence imaging, and force spectroscopy demonstrate the range of analysis applications. A comprehensive graphical user interface written in JavaFX enhances biomolecule feature exploration by providing charting, tagging, region highlighting, scriptable dashboards, and interactive image views. The interoperability of ImageJ2 ensures Molecule Archives can easily be opened in multiple environments, including those written in Python using PyImageJ, for interactive scripting and visualization. Mars provides a flexible solution for reproducible analysis of image-derived properties, facilitating the discovery and quantitative classification of new biological phenomena with an open data format accessible to everyone.

Editor's evaluation

This is a valuable paper that reports an open-source platform for the storage and processing of single-molecule, camera-based, imaging data. The development and testing of the platform are very compelling and the platform will facilitate data sharing and reproducibility and will be of great interest to practitioners of single-molecule imaging experiments, both experienced and new to the field. The work represents significant and important steps towards unifying and standardizing how the field stores and processes data and expanding the base of researchers who can easily employ single-molecule imaging methods.

https://doi.org/10.7554/eLife.75899.sa0

Introduction

Reproducible analysis of bioimaging data is a major challenge slowing scientific progress. New imaging techniques generate datasets with increasing complexity that must be efficiently analyzed, classified, and shared. This challenge has gained wide recognition (Carpenter et al., 2012; Eliceiri et al., 2012; Lerner et al., 2021; Meijering et al., 2016; Ouyang and Zimmer, 2017), which has led to the development of software frameworks for curating images (Allan et al., 2012; Kvilekval et al., 2010) and analysis workflows (Rubens et al., 2020), metadata reporting standards (Goldberg et al., 2005) and the creation of public archives to increase data availability (Williams et al., 2017). However, few standards exist for the reproducible analysis and reuse of image-derived properties, and those that have been developed for single-molecule fluorescence (Greenfeld et al., 2015; Ingargiola et al., 2016) offer limited options for adaptation to other experimental configurations. There is an increasingly powerful toolkit to quantitatively follow the time evolution of the position, shape, composition, and conformation of individual cells and complexes, but these precious information-rich observations are generated in heterogeneous formats that do not provide easy, transparent access to key features. As a consequence, the promise of new technologies is often unrealized, and new biological phenomena remain undiscovered in existing datasets, due to the lack of tools that enable robust classification and interactive exploration.

Single-molecule techniques provide access to rare intermediates in complex, multistage molecular pathways. Multicolor fluorescence imaging has revealed the conformational dynamics of membrane transport (Akyuz et al., 2013; Erkens et al., 2013), molecular states underlying assembly and transcription by RNA polymerase (Baek et al., 2021; Duchi et al., 2016), and DNA replication dynamics (Duderstadt et al., 2016; Scherr et al., 2018; Ticau et al., 2015). These approaches have been combined with spatial tracking on biological structures to clarify how exchange events and conformational changes modulate the function of motor proteins, as well as replication, transcription, and DNA repair machineries (Crickard et al., 2020; Lewis et al., 2020; Niekamp et al., 2021; Scherr et al., 2022). Biological macromolecules are routinely attached to microspheres, which allow for controlled studies of the forces and torques involved in basic biological reactions (Agarwal and Duderstadt, 2020; Dulin et al., 2013; Neuman and Nagy, 2008; Revyakin et al., 2006). Finally, improvements in the sensitivity of camera sensors, fluorophore brightness, and new illumination strategies (Gao et al., 2012; Tokunaga et al., 2008) have enabled time-resolved studies of biological processes in live cells. These developments have revealed frequent exchange of factors during normal operation of replisomes (Beattie et al., 2017; Kapadia et al., 2020) and the dynamics of the transcription-factor target site search (Chen et al., 2014).

The discovery of new biological phenomena from these multidimensional observations depends on long, multistage image analysis workflows, followed by careful grouping and feature classification. Images are typically preprocessed to correct for non-uniform beam profiles and filtered to enhance detection of individual molecules and structures over the background. Additional tools are then used to follow the properties of individual biomolecules through time. These often provide some form of tabular data containing aggregated results, which are either manually evaluated or filtered using a feature unique to the desired group of observations. Typically, at this point, results can no longer be re-evaluated in the context of the original images due to unidirectional transformations or lack of interactive image evaluation software. The final data are frequently migrated to another platform better suited to the generation of publication-quality figures. This necessitates further data restructuring during transfers between software platforms. Large datasets are downsized to cope with limited storage by removal of rejected observations in ways that irreversibly alter datasets. As rejection criteria change and additional data must be incorporated, precious time and reproducibility are lost. This process is often repeated for each experimental application and research group, leading to substantial duplication of efforts.

Recognizing these issues in common single-molecule image processing workflows, we developed Mars, which provides a collection of Fiji/ImageJ2 (Pietzsch et al., 2012; Rueden et al., 2017; Schindelin et al., 2012) commands and integrations with well-established Fiji workflows for processing images and image-derived properties based on a versatile Molecule Archive architecture (Figure 1). This architecture allows for seamless virtual storage, merging, and multithreaded processing of very large datasets. A simple, yet powerful interface allows for fast and easy access to subsets of records based on any biomolecule property. This framework allows for the same data structure to be used from initial image processing all the way to the generation of final figures. All image metadata and biomolecule records are assigned universally unique ids (uids) upon creation, which, together with comprehensive logging, ensures that the history of each record remains traceable through long and complex analysis workflows involving numerous data merging steps. Non-destructive tag-based filtering ensures no observations are lost, and updates to rejection criteria only require retagging of processed records. These design principles facilitate the reproduction of analysis workflows and ensure Molecule Archives provide a comprehensive, transparent format for the deposition of final datasets.

Figure 1
Overview of Mars workflows.

The process of reproducible data analysis with Mars starting from image processing to iterative rounds of classification and filtering to the final stage of data exploration and deposition into a public database.

The modular design of commands and minimal definitions of image metadata and biomolecule records ensure flexibility that facilitates the development of varied Mars workflows. To illustrate the range of image processing and kinetic analysis tasks that can be accomplished, we present the results of three applications. We demonstrate the versatility of Mars record types in representing both single molecules and large macromolecular structures by tracking single RNA polymerases on long DNAs. We benchmark a multichannel fluorescence integration workflow using well-established single-molecule FRET frameworks. And finally, we show how high-throughput imaging of DNA-tethered microspheres reveals the results of complex topological transformations induced by controlled forces and torques. In each example, we highlight image processing commands and methods for accessing and manipulating Molecule Archives using scripts. Each workflow provides a basic framework that can be further adapted to custom applications by the introduction of additional processing steps and modification of the scripts provided. The graphical user interface provided by the Mars Rover facilitates workflow refinement through interactive data exploration and manual classification with multichannel image views. Taken together, these features make Mars a powerful platform for the development of image-processing workflows for the discovery and quantitative characterization of new biological phenomena in a format that is open to everyone.

Results

Common pitfalls in single-molecule image-processing workflows

Single-molecule imaging platforms provide an increasingly powerful toolkit to observe complex biological systems, but obtaining quantitative information depends on multistage analysis workflows with many potential pitfalls. The core design principles of Mars were developed to avoid common issues. Before introducing the architecture of Mars in depth, we will illustrate the benefit of Mars with a practical example. For those not yet experienced with single-molecule imaging, this example should help to clarify how Mars improves data readability and workflow reproducibility to avoid issues that frequently arise during dataset transformations when analysis steps are not properly documented and annotated.

Many common single-molecule experiments involve recording images of the position and intensity of fluorescent labels attached to biomolecules as a function of time. For example, consider the simple experiment of fluorescently labeled polymerases traveling along DNA molecules during synthesis (covered in greater depth in workflow 1). Experimenters typically want to track polymerase position and fluorescence intensity in consecutive images. From the raw images, they would like to determine the polymerase synthesis rate, product sizes, and number of polymerases on each DNA molecule. This is accomplished by finding the locations of fluorescent spots in individual images, followed by linking spots from the same molecules over time. In a typical single-molecule workflow, tracking results are generated in a single table with columns for the image number, molecule position, and an index number for each track. Several filters are used to select moving polymerases that travel a minimum distance and have an intensity consistent with a single dye. After this automated filtering, tracks are manually checked to remove those with tracking errors. In a typical workflow, these filtering steps remove rejected tracks from the table. Finally, a conversion factor that relates DNA length to polymerase coordinates is used to calculate the synthesis rate and product length. The process is then repeated for many videos generating a final merged table with reindexed global track numbers.
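The spot-linking and distance-filtering steps described above can be sketched in a few lines of Python. This is a minimal illustration using greedy nearest-neighbour linking, which is a deliberately simple stand-in and not the algorithm used by any particular tracking software; the function names, the `max_distance` parameter, and the example coordinates are hypothetical.

```python
import math

def link_spots(frames, max_distance):
    """Greedy nearest-neighbour linking of spots into tracks.

    frames: list (one entry per image) of lists of (x, y) spot positions.
    Returns a list of tracks; each track is a list of (frame_index, x, y).
    """
    tracks = []   # all tracks found so far
    active = []   # indices of tracks that were extended in the previous frame
    for t, spots in enumerate(frames):
        unclaimed = list(spots)
        next_active = []
        for track_index in active:
            if not unclaimed:
                break
            _, last_x, last_y = tracks[track_index][-1]
            nearest = min(unclaimed, key=lambda s: math.dist(s, (last_x, last_y)))
            if math.dist(nearest, (last_x, last_y)) <= max_distance:
                tracks[track_index].append((t, nearest[0], nearest[1]))
                unclaimed.remove(nearest)
                next_active.append(track_index)
        for x, y in unclaimed:   # spots without a predecessor start new tracks
            tracks.append([(t, x, y)])
            next_active.append(len(tracks) - 1)
        active = next_active
    return tracks

def distance_traveled(track):
    """Net displacement between the first and last point of a track."""
    (_, x0, y0), (_, x1, y1) = track[0], track[-1]
    return math.hypot(x1 - x0, y1 - y0)

# Hypothetical spot positions in three consecutive images.
frames = [
    [(10.0, 5.0), (40.0, 5.0)],
    [(10.2, 6.5), (40.1, 5.1)],
    [(10.1, 8.0), (39.9, 5.0)],
]
tracks = link_spots(frames, max_distance=3.0)
moving = [trk for trk in tracks if distance_traveled(trk) >= 2.0]
```

Here the list comprehension that builds `moving` discards the stationary track, which is exactly the kind of irreversible filtering step discussed next.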

This typical workflow has several pitfalls that increase the analysis time, make discovery of new phenomena more difficult, lead to irreproducibility, and, in the worst case, result in mistakes. The most significant problem is that every step is unidirectional. Tracks rejected during automated or manual filtering are removed. As a consequence, if any filtering criteria change, the analysis must be repeated from the start. Moreover, no framework is provided to easily document which scripts and thresholds were used and why tracks were manually rejected. A further serious pitfall arises from merging multiple videos and reindexing track numbers. This makes it impossible to evaluate the final dataset in the context of the original raw videos and their metadata. For example, when a mistake is discovered in a subset of experiments, such as poor polymerase labeling, incorrect temperature, illumination, or contamination, there is no way to easily remove the subset of experiments from the final dataset since the tracks have no identity linking them to the original videos. Beyond these immediate practical pitfalls, the structure of the workflow makes it difficult to ask complex questions that relate one biomolecule property to another and quickly re-evaluate the dataset based on a new model. For example, how does the synthesis rate depend on the number of polymerases? How does polymerase number influence the frequency of pauses? What if there are large differences in DNA extension between molecules or videos that require unique correction factors? In each of these cases, the entire analysis and filtering would need to be done again.

Mars was developed to address these common pitfalls that arise in workflows like the example provided. First, Mars provides image processing commands streamlined for single-molecule applications that generate Molecule Archives containing a record of the analysis steps. Filtering is done using tags without the removal of rejected data to avoid unidirectional workflows. When filter criteria change, only the tags need to be updated without redoing the analysis. Different subsets of existing tags may also be used to study complex differences between subpopulations such as polymerase clusters. The uids given to records upon creation, together with unique metadata ids, ensure a traceable history when datasets are merged, so that final datasets can always be evaluated in the context of the original raw images. The biomolecule storage and graphical interface in Mars are built to simplify the exploration of complex relationships based on different biomolecule properties. If experimental problems are discovered in a subset of videos, they can easily be filtered using the metadata ids stored with the records. Together, these elements allow the same dataset to be resliced in many different ways, overcoming common pitfalls in single-molecule image processing workflows.
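The difference between destructive filtering and tag-based filtering can be made concrete with a short Python sketch. It does not use the Mars API; the `Record` class, the `retag` and `select` helpers, and the identifiers such as `mol-a` and `video-1` are hypothetical names chosen only to illustrate the principle.

```python
from dataclasses import dataclass, field

@dataclass
class Record:
    uid: str            # identifier assigned once and never reindexed
    metadata_uid: str   # links the record back to its source video
    distance: float     # hypothetical image-derived parameter
    tags: set = field(default_factory=set)

def retag(records, tag, predicate):
    """Recompute one tag for every record without deleting anything."""
    for record in records:
        record.tags.discard(tag)
        if predicate(record):
            record.tags.add(tag)

def select(records, required=(), excluded_metadata=()):
    """Return records that carry all required tags and come from accepted videos."""
    return [r for r in records
            if set(required) <= r.tags and r.metadata_uid not in excluded_metadata]

records = [
    Record("mol-a", "video-1", 12.0),
    Record("mol-b", "video-1", 3.0),
    Record("mol-c", "video-2", 9.0),
]
retag(records, "moving", lambda r: r.distance >= 10.0)
first_pass = select(records, required=["moving"])

# The criterion changes: only the tag is recomputed, all records remain.
retag(records, "moving", lambda r: r.distance >= 5.0)
second_pass = select(records, required=["moving"])

# A problem is discovered in video-2: exclude it by its metadata id.
without_video_2 = select(records, required=["moving"], excluded_metadata={"video-2"})
```

Because no record is ever removed, each selection is a view on the same dataset and can be recomputed at any time.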

Molecule Archive architecture

Molecule Archives provide a flexible standard for storing and processing image-derived properties adaptable to a broad range of experimental configurations. Molecule Archives contain three record types: Properties, Metadata, and Molecule (Figure 2). Each type is defined in an interface, independent of implementation details. This abstraction ensures Molecule Archives support a variety of biomolecule, metadata, and property implementations. Moreover, this allows for the creation of new implementations that seamlessly work with the existing code base, algorithms, and user interface. To simplify the mechanics of record retrieval and the process of dataset merging, all Molecule and Metadata records are assigned human-readable, base58-encoded universally unique ids (uids). Storing records using uids reduces indexing requirements, facilitates scalable processing using uid-to-record maps that support multithreaded operations, and ensures the traceability of records through analysis workflows.
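To illustrate why uid-keyed records simplify merging, the following Python sketch generates base58-encoded random identifiers and merges two uid-to-record maps. It is an assumption-laden illustration and not the Mars implementation: the alphabet shown is the commonly used Bitcoin base58 alphabet, and the helper names `base58_encode`, `new_uid`, and `merge` are hypothetical.

```python
import uuid

# Commonly used base58 alphabet (omits 0, O, I and l to avoid confusion).
BASE58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"

def base58_encode(number):
    """Encode a non-negative integer using a 58-character alphabet."""
    if number == 0:
        return BASE58_ALPHABET[0]
    digits = []
    while number > 0:
        number, remainder = divmod(number, 58)
        digits.append(BASE58_ALPHABET[remainder])
    return "".join(reversed(digits))

def new_uid():
    """Random 128-bit identifier rendered in base58 (at most 22 characters)."""
    return base58_encode(uuid.uuid4().int)

def merge(*archives):
    """Merge uid-to-record maps; records keep their uids, so nothing is reindexed."""
    merged = {}
    for archive in archives:
        overlap = merged.keys() & archive.keys()
        if overlap:
            raise ValueError(f"duplicate uids: {sorted(overlap)}")
        merged.update(archive)
    return merged

archive_1 = {new_uid(): {"video": "video-1"} for _ in range(3)}
archive_2 = {new_uid(): {"video": "video-2"} for _ in range(2)}
combined = merge(archive_1, archive_2)
```

Since every key is globally unique, a record in `combined` can still be traced to the archive and video it came from.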

Figure 2 with 1 supplement
Molecule Archive structure.

Schematic representation of the structure of Molecule Archives consisting of three types of records: Properties, Metadata, and Molecule. The single Properties record contains global information about the Molecule Archive contents, the Metadata records store information about the images used for biomolecule analysis (e.g. image dimensions, the analysis log), and the Molecule records store molecule-specific information (e.g. position over time, intensity).

Molecule Archives contain a single Properties record with the type of the Molecule Archive and global information about the Molecule Archive contents. This includes the number of Metadata and Molecule records and unique names used for tags, parameters, regions, positions, and table columns. The Properties record stores global Molecule Archive comments, which typically contain important information about the analysis strategy and naming scheme for tags, parameters, regions, positions, and other fields to orient a new researcher who did not perform the original analysis. To improve the organization and readability of comments, the Mars Rover also provides a convenient Markdown editor.

Metadata records contain experimental information about the images used for biomolecule analysis. To ensure compatibility with the broadest set of image formats and maximum reusability, image dimensionality, timepoints, channels, filters, camera settings, and other microscope information are stored in OME format (Goldberg et al., 2005). Molecule Archives do not store the raw images, only the image-derived properties and the source directory containing the project. Nevertheless, interactive image views linked to molecule records are supported through BigDataViewer (BDV) integration (Pietzsch et al., 2015) with HDF5 and N5 (Saalfeld et al., 2022) formatted images. The image file locations, coordinate transforms, and other settings are stored in Metadata records. Global events in time influencing all biomolecules are documented using Metadata regions and positions that are available for kinetic analysis and scripting. Global experimental conditions are stored as parameters in the form of key-value pairs (e.g. buffer composition and temperature). Metadata records are categorized using tags to filter biomolecules by whole experiments. The Metadata records also contain a log where all commands and settings used for processing are recorded, thus maintaining the entire history of data processing throughout the analysis.

Molecule records contain fields for convenient storage of common image-derived properties of biomolecules. This includes a table that typically contains the position and intensity over time, the uid of the Metadata record containing the primary experimental information, and the index of the image containing the biomolecule. Events of interest in time can be marked with regions and positions (e.g. activity bursts, dye bleaching), which allow for event-specific calculations and kinetic analyses. Calculated global biomolecule properties are stored as parameters in the form of key-value pairs (e.g. mean intensity, distance traveled, and mean position). Mars comes with kinetic change point (KCP) commands for unbiased identification of distinct linear regimes and steps (e.g. polymerase synthesis rate, FRET states, and dye bleaching) (Hill et al., 2018), which are stored in segment tables. And finally, Molecule records are categorized for later analysis using custom tags and notes for manual assessments.
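The idea of storing detected steps in a segment table can be illustrated with a deliberately simplified Python sketch. It finds a single step by minimizing the summed squared residuals of two constant segments; this is a substitute technique chosen for brevity and is not the KCP algorithm of Hill et al., 2018. The function name `single_step`, the segment field names, and the example trace are hypothetical.

```python
def single_step(values):
    """Find the split that best divides values into two constant segments.

    Returns (index, mean_before, mean_after), where index is the first
    point of the second segment. Requires at least two values.
    """
    def sse(segment):
        # Sum of squared deviations from the segment mean.
        mean = sum(segment) / len(segment)
        return sum((v - mean) ** 2 for v in segment)

    best_index = min(range(1, len(values)),
                     key=lambda i: sse(values[:i]) + sse(values[i:]))
    before, after = values[:best_index], values[best_index:]
    return best_index, sum(before) / len(before), sum(after) / len(after)

# Hypothetical intensity trace with a bleaching step after the fifth point.
trace = [100.2, 99.8, 100.1, 99.9, 100.0, 0.3, -0.2, 0.1, 0.0, -0.1]
index, mean_before, mean_after = single_step(trace)

# A segment table in the spirit described above: one row per regime.
segments = [
    {"start": 0, "end": index - 1, "mean": mean_before},
    {"start": index, "end": len(trace) - 1, "mean": mean_after},
]
```

Keeping the fitted segments next to the raw table means that event-specific quantities, such as the time of bleaching, can be recomputed without refitting.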

Molecule Archives can be created, saved, and reloaded in a variety of formats in multiple environments, both desktop-based and without a user interface for parallel processing on high-performance clusters. Molecule Archives are saved in JSON format using the field schema outlined in Figure 2. By default, Molecule Archives are written to single files with a yama extension using Smile encoding and compression of molecule table data to reduce file size, but can also be saved and reopened in plaintext JSON. Mars supports processing of very large datasets that typically do not fit in physical memory using a virtual storage mode in which records are retrieved only on demand, supported by a simple filesystem-backed record hierarchy. This architecture provides a powerful and flexible framework for multistep analysis workflows involving very large datasets.
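The principle of a filesystem-backed store with on-demand retrieval can be sketched in Python as follows. This is a minimal illustration under stated assumptions and does not reproduce the actual Mars file layout or schema: the directory names, the file naming, and the JSON field names (`metadataUid`, `tags`, `table`, `numberOfMolecules`, `log`) are all hypothetical.

```python
import json
import tempfile
from pathlib import Path

def write_store(directory, properties, metadata, molecules):
    """Write one JSON file per record under a simple directory hierarchy."""
    root = Path(directory)
    (root / "Metadata").mkdir(parents=True, exist_ok=True)
    (root / "Molecules").mkdir(parents=True, exist_ok=True)
    (root / "properties.json").write_text(json.dumps(properties))
    for uid, record in metadata.items():
        (root / "Metadata" / f"{uid}.json").write_text(json.dumps(record))
    for uid, record in molecules.items():
        (root / "Molecules" / f"{uid}.json").write_text(json.dumps(record))

class VirtualStore:
    """Reads molecule records from disk only when they are requested."""

    def __init__(self, directory):
        self.root = Path(directory)
        self.loaded = {}   # records currently held in memory

    def uids(self):
        # Listing uids only inspects file names, not file contents.
        return sorted(p.stem for p in (self.root / "Molecules").glob("*.json"))

    def get(self, uid):
        if uid not in self.loaded:
            path = self.root / "Molecules" / f"{uid}.json"
            self.loaded[uid] = json.loads(path.read_text())
        return self.loaded[uid]

# Hypothetical dataset of 100 small molecule records from one video.
molecules = {
    f"mol-{i}": {"metadataUid": "video-1", "tags": [],
                 "table": {"T": [0, 1], "x": [i, i + 1]}}
    for i in range(100)
}
with tempfile.TemporaryDirectory() as directory:
    write_store(directory, {"numberOfMolecules": len(molecules)},
                {"video-1": {"log": []}}, molecules)
    store = VirtualStore(directory)
    record = store.get("mol-42")
    loaded_count = len(store.loaded)
    total = len(store.uids())
```

Only the requested record is parsed, so memory use scales with the records in use and not with the size of the dataset.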

Mars Rover–interactive molecule feature exploration and image views

The discovery of new biological phenomena using single-molecule techniques often relies on manual exploration of individual biomolecules in primary images together with image-derived measurements. To simplify this process, we developed a user interface in JavaFX, called the Mars Rover, that provides access to all information stored in Molecule Archives. The Mars Rover is integrated into Fiji with windows available for all open Molecule Archives. Open Molecule Archives are available as inputs and outputs in Fiji/ImageJ2 commands and in supported scripting languages.

In the Mars Rover, Molecule Archive windows contain tabs and subpanels that provide complete access to all fields of Molecule and Metadata records as well as global properties, comments, and interface settings (Figure 2—figure supplement 1). A customizable global dashboard provides information about the Molecule Archive contents and scriptable chart widgets that can be adapted to specific workflows. The Molecule tab features an interactive chart panel that reveals the time evolution of biomolecule properties and provides region and position highlighting tools. To ensure image-derived measurements of biomolecule properties are evaluated together with primary observations, interactive image views linked to molecule records are supported through BDV integration (Pietzsch et al., 2015).

The Mars Rover was written to provide extensive possibilities for customization. All tabs and panels are defined in interfaces, independent of implementation details. This facilitates extension of the Mars Rover to support custom icons and display elements based on Molecule Archive type. This will enable further refinement and the development of workflow-specific displays by extending the core architecture in the future.

Commands for image processing and biomolecule analysis

Mars comes with a collection of several dozen Fiji/ImageJ2 commands for common single-molecule image processing and analysis tasks (Table 1). This includes commands to find, fit, integrate, and track intensity peaks and objects in images through time, as well as commands to correct for non-uniform excitation beam profiles and to transform region-of-interest peak collections with optional colocalization filtering. In addition to image processing, there are Molecule Archive commands for opening, merging, and transforming datasets as well as kinetic change point analysis. Finally, Mars has commands devoted to interoperability with other common formats. This includes importers for TrackMate (see the next section), single molecule dataset (SMD) (Greenfeld et al., 2015), and LUMICKS h5 files from optical tweezers experiments.
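For readers unfamiliar with peak finding and intensity integration, the following Python sketch shows the basic idea on a small array. It is a simplified illustration, not the implementation of the Mars commands: peaks are taken as pixels above a threshold that are strictly brighter than all neighbours, no Gaussian fitting or background subtraction is performed, and the function names and parameters are hypothetical.

```python
def find_peaks(image, threshold, min_distance=1):
    """Return (row, column) positions of local maxima at or above threshold.

    image: list of equally long rows of pixel intensities.
    A pixel counts as a peak when it is strictly brighter than every
    other pixel within min_distance (measured in rows and columns).
    """
    rows, cols = len(image), len(image[0])
    peaks = []
    for r in range(rows):
        for c in range(cols):
            value = image[r][c]
            if value < threshold:
                continue
            neighbours = [
                image[rr][cc]
                for rr in range(max(0, r - min_distance), min(rows, r + min_distance + 1))
                for cc in range(max(0, c - min_distance), min(cols, c + min_distance + 1))
                if (rr, cc) != (r, c)
            ]
            if all(value > n for n in neighbours):
                peaks.append((r, c))
    return peaks

def integrate(image, peak, radius=1):
    """Sum the intensity in a square of the given radius around a peak."""
    r, c = peak
    return sum(image[rr][cc]
               for rr in range(max(0, r - radius), min(len(image), r + radius + 1))
               for cc in range(max(0, c - radius), min(len(image[0]), c + radius + 1)))

# Hypothetical 4 x 6 image with two bright spots.
image = [
    [1, 1, 1, 1, 1, 1],
    [1, 9, 2, 1, 1, 1],
    [1, 2, 1, 1, 7, 1],
    [1, 1, 1, 1, 1, 1],
]
peaks = find_peaks(image, threshold=5)
intensities = [integrate(image, p) for p in peaks]
```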

Table 1
Mars commands.

Description of Fiji/ImageJ2 commands supporting the analysis of image-derived biomolecule data in Mars. Detailed documentation can be found on the Mars documentation website (https://duderstadt-lab.github.io/mars-docs/).

| Category | Command | Description |
|---|---|---|
| Image | Peak Finder | Finds high-intensity pixel clusters (peaks) in an image. Additionally, the sub-pixel position can be determined utilizing a 2D Gaussian fit. |
| Image | DNA Finder | Finds vertically aligned DNA molecules in an image. Additionally, the sub-pixel position of both ends of the molecule can be determined utilizing a 2D Gaussian fit. |
| Image | Peak Tracker | Finds, fits, and tracks peaks in images. |
| Image | Object Tracker | Identifies unspecified objects in images utilizing classification by segmentation and tracks their center of mass. |
| Image | Molecule Integrator | Integrates the intensity of a peak over all frames. |
| Image | Molecule Integrator (multiview) | Integrates the intensity of a peak over all frames in an image stack with multiview images. |
| Image | Beam Profile Corrector | Corrects for the beam profile-generated image intensity deviations. |
| Image | Gradient Calculator | Calculates the gradient of consecutive pixels from top to bottom or from left to right to identify long linear objects such as DNA molecules. |
| Image | Overlay channels | Combines several individual videos into one creating a single video with the information stored along the ‘Channel (C)’ dimension. |
| Molecule | Open Archive | Opens a Molecule Archive. |
| Molecule | Open Virtual Store | Opens a virtual Molecule Archive. |
| Molecule | Build Archive from Table | Converts an opened table with a ‘molecule’ index column into a Molecule Archive. |
| Molecule | Build DNA Archive | Builds a DNA Molecule Archive from a single Molecule Archive and a list of DNA ROIs in the ROI Manager. It uses the location of the DNA molecules to search for molecules in the single Molecule Archive that overlap with (parts of) this location. |
| Molecule | Merge Archives | Merges multiple Molecule Archives (placed in a single folder) into one. |
| Molecule | Merge Virtual Stores | Merges multiple virtual Molecule Archives (placed in a single folder) into one. |
| Molecule | Add Time | Adds a column to the molecule tables to convert time points (T) to real time values as specified in the metadata or by a user-defined time increment. |
| Molecule | Drift Corrector | Calculates and corrects for the sample drift given a Molecule Archive and a tag corresponding to all immobile molecules in the dataset. Generates new columns for each molecule table. |
| Molecule | Region Difference Calculator | Calculates the difference between the regions specified for all molecules in the Molecule Archive and adds the outcome as a molecule parameter. |
| Molecule | Variance Calculator | Calculates the variance on a specified molecule table column and adds the outcome as a molecule parameter. |
| Table | Open Table | Imports a comma or tab-delimited table to the MarsTable format. |
| Table | Sort | Sorts a MarsTable based on values in a specified column. |
| Table | Filter | Filters the rows of a MarsTable based on the specified criteria. |
| Table | Import IJ1 Table | Imports any ImageJ1 table to the MarsTable format. |
| Table | Import TableDisplay | Imports any SciJava table to the MarsTable format. |
| KCP | Change Point Finder | Detects linear regions or steps in single-molecule traces. This command generates molecule segments tables listing endpoints and fits for linear regions. |
| KCP | Single Change Point Finder | Detects a single change point in a single-molecule trace. The output is a segments table with the end points and fit or the position. |
| KCP | Sigma Calculator | Calculates the error value in a specific region of interest in all single-molecule traces that can be used as input for the change point calculation commands. |
| ROI | Transform ROIs | Transforms peak ROIs from one region of a multiview image to another. |
| Import | LUMICKS h5 | Opens optical tweezer data in HDF5 (h5 file extension) format collected using a LUMICKS instrument and converts the data to Molecule Archive format. |
| Import | Single-molecule dataset (SMD) | Opens SMD files in plaintext json format and converts the data to Molecule Archive format. |
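As an example of how a tag that marks immobile molecules can be used for drift correction, consider the following Python sketch. It is a simplified illustration and not the Drift Corrector implementation: drift is estimated as the mean displacement of the immobile molecules relative to their first time point, every track is assumed to start at the same time point, and all names and coordinates are hypothetical.

```python
def drift_per_frame(immobile_tracks):
    """Mean displacement of immobile molecules relative to their first time point.

    immobile_tracks: list of dicts mapping time point T -> (x, y).
    Returns a dict mapping T -> (dx, dy).
    """
    sums = {}
    for track in immobile_tracks:
        x0, y0 = track[min(track)]   # position at the earliest time point
        for t, (x, y) in track.items():
            dx_sum, dy_sum, count = sums.get(t, (0.0, 0.0, 0))
            sums[t] = (dx_sum + x - x0, dy_sum + y - y0, count + 1)
    return {t: (dx / n, dy / n) for t, (dx, dy, n) in sums.items()}

def correct(track, drift):
    """Return a drift-corrected copy; the input track is left unchanged."""
    return {t: (x - drift[t][0], y - drift[t][1]) for t, (x, y) in track.items()}

# Two molecules tagged as immobile drift by 0.5 pixels per frame along x.
immobile = [
    {0: (10.0, 10.0), 1: (10.5, 10.0), 2: (11.0, 10.0)},
    {0: (30.0, 20.0), 1: (30.5, 20.0), 2: (31.0, 20.0)},
]
moving = {0: (5.0, 5.0), 1: (5.5, 6.0), 2: (6.0, 7.0)}
drift = drift_per_frame(immobile)
corrected = correct(moving, drift)
```

Returning a corrected copy mirrors the non-destructive approach in Table 1, where corrected values are written to new columns and the raw positions are retained.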

Commands appear in a submenu of the plugins menu of Fiji when Mars is installed using the update site. When a command is selected, users are presented with a dialog to choose the desired options. The active image in Fiji is used as input by image processing commands. Some provide preview possibilities prior to processing full image sequences. Tracking and fluorescence integration commands generate Molecule Archives as outputs which open in new windows when the command finishes. Biomolecules can then be explored using the Mars Rover user interface presented in all open Molecule Archive windows and all analysis work can be saved to disk as two files: one file with a yama extension containing the primary Molecule Archive data and a second file with the yama.rover extension containing Mars Rover settings used to restore the window state and dashboard widgets. The typical Mars workflow starts with the creation of a Molecule Archive by processing an image sequence, followed by biomolecule feature analysis and tagging. This typically involves the creation of custom analysis scripts in Fiji that manipulate Molecule Archives. Final publication quality figures are often rendered by opening the yama file in a Jupyter notebook and using one of the Python charting libraries. Comprehensive reference materials and detailed step-by-step guides for common workflows are freely available on the Mars documentation website.

The UI-agnostic format of ImageJ2 commands ensures they can be used in many different contexts with or without a graphical user interface available. To facilitate these applications, methods have been added for all required settings, and example scripts are provided for commands. Commands can be combined into larger scripts that support multistep analysis workflows runnable on high-performance computing clusters. This facilitates a smooth transition from a dialog-based workflow development phase using the graphical user interface of Fiji to high-performance parallel processing of many experiments in environments lacking a graphical user interface.

TrackMate interoperability

Fiji provides a comprehensive open-source platform for scientific image analysis containing well-established software for common image processing tasks. These technologies are integrated in a modular fashion as plugins that can be freely combined. Mars commands and data structures are fully integrated into Fiji, simplifying interoperability with these technologies. The applications presented below provide examples of how Fiji plugins can be combined with Mars commands. To illustrate how this interoperability can be further extended, we developed an action to export TrackMate results to Molecule Archive format (source code available at https://github.com/duderstadt-lab/mars-trackmate). TrackMate is a Fiji plugin for single-particle tracking that offers several tracking algorithms with a powerful user interface with many spot filtering and track editing tools (Tinevez et al., 2017). The action we developed adds an export option in the final TrackMate panel called ‘go to Mars’ that opens a Molecule Archive with the converted results. This feature is installed with Mars and requires no additional configuration. The Mars Peak Tracker is ideal for everyday single-molecule tracking problems with a few simple options in a single dialog but does not offer all the capabilities of TrackMate. This extension gives users more possibilities for complex problems such as tracking the shape and position of objects using machine learning algorithms (Ershov et al., 2021) and exporting the results from TrackMate to a Mars ObjectArchive.

Applications

To demonstrate the range of analysis tasks that can be performed with Mars, we have developed three workflows based on real-world applications. In the first workflow, we determine the rate of transcription of single RNA polymerases by tracking their position as a function of time on long DNAs based on Scherr et al., 2022. In the second workflow, we demonstrate how to accurately analyze images containing observations of single-molecule FRET based on Hellenkamp et al., 2018 and dynamic FRET arising from conformational switching of Holliday junctions (Hyeon et al., 2012). And finally, we highlight the virtual storage capability of Mars working with very large datasets from high-throughput imaging of DNA-tethered microspheres manipulated with forces and torques based on Agarwal and Duderstadt, 2020. Detailed step-by-step instructions for each workflow can be found on the Mars documentation website and the raw data used in each workflow are freely available on either GitHub or Zenodo. Jupyter notebooks, sample archives, and scripts used in the workflows are available in the mars-tutorials repository.

Workflow 1–tracking RNA polymerase position during transcription

Single-molecule total internal reflection fluorescence (smTIRF) microscopy has become an indispensable tool to study biomacromolecular structure and function, allowing the observation of, e.g., molecule position and dynamics. Examples of such studies include kinetic studies of DNA replication (Dequeker et al., 2022; Ha et al., 2002; Lewis et al., 2020), studies of the polymerization of structural elements like actin (Amann and Pollard, 2001), the direct observation of flagellar motor rotation (Sowa et al., 2005), as well as tracking of processes in vivo (Vizcay-Barrena et al., 2011). To illustrate the use of Mars for the analysis of such datasets, this example shows a typical Mars workflow for smTIRF studies of the kinetics of a fluorescently-labeled RNA polymerase transcribing on an immobilized, promoter-containing, 21 kb DNA molecule (Scherr et al., 2022; Figure 3A). In the presence of all four nucleotides, RNA polymerase could initiate transcription from the promoter and progress on the DNA, which could be temporally and spatially visualized by measuring fluorescent emission upon excitation. After transcription was completed, DNA was poststained with SYTOX orange to reveal the position of the DNA molecules in the last frames of the video. By correlating the RNA polymerase movement with the position of the DNA molecule, information about the polymerase processivity and progression rates was obtained.

Workflow for tracking RNA polymerase position during transcription.

(A) Schematic of the RNA polymerase assay. Promoter-containing surface-immobilized 21 kb DNA was incubated with fluorescently-labeled RNA polymerase after which transcription was tracked over time. (B) Representation of the analysis pathway showing the analysis steps starting from the raw image stack on the left to a final plot on the right. First, the Peak Tracker extracted position vs time information from each fluorescent RNA polymerase creating a single Molecule Archive. In parallel, the DNA Finder located the long, line-shaped DNA molecules and generated a list of start and end positions. The information yielded from both tools was merged into a final DNA Molecule Archive. A classification and sorting process was applied resulting in a final plot showing the abundance of tracked molecules at various transcription rates (nt/s). A Gaussian fit to the population with rates > 40 nt/s revealed a population average transcription rate of 53±3.6 nt/s. Here n is the number of molecules.

To analyze the data quantitatively, a beam profile correction was first applied to remove the non-uniform laser excitation in the field of view due to the Gaussian beam profile of frequently employed light sources in TIRF microscopy. The Peak Tracker (Figure 3B) then determined the location of each fluorescent spot throughout the progressing frames and stored this information in a Single Molecule Archive. The identified positions were subsequently corrected with the Drift Corrector for sample drift that occurred over the course of the measurement. In parallel, the last frames of the video, showing the DNA molecules, were fitted with the DNA Finder to generate a coordinate list with the positions of all DNA molecules. This information was correlated with the positional information of the polymerases and a DNA Molecule Archive was created. The obtained DNA Molecule Archive contains the positional information of all polymerase molecules found to be on the DNA and serves as the basis for further kinetic studies. Distinguishing between sub-populations is possible by sorting the molecule records by parameter values, assigned molecule tags, or both. Concluding the analysis, data exploration revealed a population-specific rate distribution (Figure 3B, right) showing an observed transcription rate of 53±3.6 nt/s, which is well in line with previous studies reporting transcription rates between 40–80 nt/s (Thomen et al., 2008).

Workflow 2–measuring intramolecular distances with smFRET

Single-molecule Förster Resonance Energy Transfer (smFRET) microscopy is used extensively to study protein dynamics (Lerner et al., 2018; Mazal and Haran, 2019; Michalet et al., 2006), RNA (Seidel and Dekker, 2007; Shaw et al., 2014; Xiaowei, 2005) and DNA (Seidel and Dekker, 2007) interactions, to elucidate enzyme mechanisms (Smiley and Hammes, 2006) as well as to study protein structure (Dimura et al., 2016; Schuler and Eaton, 2008) and molecular machines (Hildebrandt et al., 2014; Stein et al., 2011). Mars comes with multi-color fluorescence integration commands well-suited for the analysis of such datasets. Here we present a typical Mars workflow for dual-color smFRET data collected using alternating laser excitation with TIRF microscopy. This workflow starts with integration of fluorescence intensities and performs all stages of analysis to obtain corrected FRET efficiency (E) and stoichiometry (S) distributions that provide intramolecular distance information and the kinetics of the molecule dynamics. The workflow is applied to both static and dynamic FRET datasets and was adapted to several collection strategies.

To illustrate how Mars is used to analyze static smFRET populations, we processed the publicly available dataset from the benchmark study of Hellenkamp et al., 2018 to determine the FRET efficiency (E) and stoichiometry (S) of two short dsDNA-based samples labeled with donor (Atto550) and acceptor (Atto647N) fluorophores at two different inter-fluorophore distances (23 bps [1-lo] and 15 bps [1-mid], Figure 4A). Data were gathered by recording the emission from the surface-attached DNA molecules upon alternating donor and acceptor excitation (ALEX) to ensure accurate FRET measurements. Images were analyzed as follows: the Peak Finder was used to identify the locations of DNA molecules, followed by integration of the fluorescence intensity using the molecule integrator (multiview) to generate Molecule Archives with intensity vs time (I vs T) information (Figure 4B). To allow for the calculation of all correction factors, three Single Molecule Archives were generated: (i) a FRET archive including all DNA molecules that were found to fluoresce in both emission regions, (ii) an acceptor only (AO) archive containing all molecules that only have fluorescent acceptors excited with direct acceptor excitation, and (iii) a donor only (DO) archive containing all molecules that only exhibit donor emission. Separating these three species at the start of analysis facilitated easier downstream processing and data correction calculations. All three Molecule Archives were tagged and merged into a single master Molecule Archive containing the information of all three DNA molecule population types before further data corrections were applied.

Workflow for a static smFRET experiment.

(A) Schematic of the FRET assay. The FRET efficiency between two coupled dyes (donor, shown in blue, and acceptor, shown in red) on a short, immobilized, dsDNA oligo was measured. The inter-fluorophore distance was probed for two constructs: 23 bps (1-lo) or 15 bps (1-mid). (B) Representation of the analysis pathway starting from the raw image stack on the left to a stoichiometry vs FRET efficiency plot on the far right. First, the molecule integrator is used to integrate intensity vs time traces for each molecule (Aex: acceptor excitation, Aem: acceptor emission, Dem: donor emission, Dex: donor excitation) resulting in three Single Molecule Archives (FRET archive, donor only [DO] archive, and acceptor only [AO] archive). After merging, the data are corrected for background and other photo-physical effects and classified according to the observed molecular features. Finally, the single-molecule data are displayed in a scatterplot with the stoichiometry (S) and FRET efficiency (E) information for both FRET samples (1-lo and 1-mid) as well as the AO and DO populations. The accompanying histograms plot the data from the 1-lo and 1-mid populations in gray bars and corresponding population-specific Gaussian fits as a solid black line. A detailed step-by-step guide to this workflow is available on the Mars documentation website.

First, a position-specific excitation correction was applied normalizing donor and acceptor intensities in relation to their specific position in the field of view. A subsequent kinetic change point analysis procedure (Hill et al., 2018) identified all intensity transitions associated with bleaching events. Next, molecules exhibiting all the expected smFRET features were selected. For example, molecules were not selected if they had more or fewer than exactly one donor and one acceptor, displayed large intensity fluctuations not caused by the bleaching event, or had low signal. Molecules were background corrected by subtracting the mean background intensity after bleaching from the measured intensity at each time point of the respective molecule. Furthermore, corrections accounting for leakage of donor emission into the acceptor region, quantum yield normalization, and direct excitation of the acceptor were applied to obtain accurate FRET parameters (see Methods for details). Finally, the corrected traces allowed for the calculation of E and S values for each molecule. The average corrected E values obtained for 1-lo and 1-mid were E1-lo = 0.16 ± 0.11 and E1-mid = 0.53 ± 0.13 (mean ± SD) with 246 and 250 accepted molecules included in the final calculations, respectively. These values are in agreement with the results of the multilab study by Hellenkamp et al. (E1-lo = 0.15 ± 0.02 and E1-mid = 0.56 ± 0.03). The differences in precision are due to differences in error estimation: we report the standard deviation resulting from Gaussian fitting of each population, whereas Hellenkamp et al. report the standard deviation of the best estimates from many labs. When calculated with the standard error of the mean, our estimates become E1-lo = 0.16 ± 0.01 and E1-mid = 0.53 ± 0.01.
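The conversion between the two error estimates can be checked with a short calculation. The sketch below assumes the standard error of the mean was obtained as SD/√n from the reported standard deviations and molecule counts; the function name is illustrative and not part of Mars.

```python
import math

def standard_error(sd: float, n: int) -> float:
    """Standard error of the mean from a standard deviation and a sample size."""
    return sd / math.sqrt(n)

# Standard deviations and molecule counts reported above for the static samples.
sem_lo = standard_error(0.11, 246)
sem_mid = standard_error(0.13, 250)

print(f"E1-lo  = 0.16 ± {sem_lo:.2f}")
print(f"E1-mid = 0.53 ± {sem_mid:.2f}")
```

Both values round to 0.01, matching the estimates quoted in the text.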

Our analysis of the static FRET dataset illustrates how Mars can be used to determine the FRET efficiency (E) and stoichiometry (S) of fixed FRET populations and allowed for direct benchmarking against published values. However, the majority of smFRET studies contain transitions between FRET states resulting from distance changes during the observation time within each molecule. To illustrate how this workflow can be used to analyze samples that exhibit transitions between states, we imaged surface-immobilized Holliday junctions labeled with donor and acceptor fluorophores attached to different DNA arms with ALEX (Figure 5A). To enhance conformational switching of the arms, we introduced a buffer containing 50 mM magnesium as previously described (Hyeon et al., 2012). We applied the smFRET workflow described above for the static FRET case to obtain the corrected FRET efficiency and stoichiometry as a function of time. Frequent transitions between high and low FRET states are observed from Holliday junctions switching between iso-I and iso-II conformations (Figure 5B). A scatterplot of stoichiometry vs FRET efficiency for all accepted molecules has two well-defined populations (Figure 5C) which we fit with a double Gaussian distribution to obtain Eiso-I=0.16 ± 0.12 and Eiso-II=0.64 ± 0.13 (mean ± SD, for n=601 molecules). Finally, we determined the dwell time distributions using a simple threshold-based two state model from which we obtained the transition timescales τiso-I = 0.31 ± 0.03 seconds and τiso-II = 0.37 ± 0.04 seconds (mean ± standard error) (Figure 5D). We used this simple model based on our prior knowledge that the system exhibited two state behavior. Alternatively, when the number of states is not known and transitions occur on timescales longer than the imaging rate, the Change Point Finder can be used to identify transitions in an unbiased manner. Molecule Archives can also be opened in Python, where other kinetic analysis methods, such as Hidden Markov Modeling, can be applied.
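A threshold-based two-state dwell time analysis of the kind described above can be sketched in a few lines of Python. This is an illustration only and not the script used in the workflow: the threshold, the frame time, the example trace, and the choice to discard the truncated first and last dwell are all assumptions, and the time constant is taken as the mean dwell time (the maximum-likelihood estimate for a single exponential) rather than from a fit to a histogram.

```python
import numpy as np

def dwell_times(fret, threshold, frame_time):
    """Return (low_dwells, high_dwells) in seconds for a two-state FRET trace.

    The first and last dwell are dropped because their true durations are
    cut off by the observation window.
    """
    state = np.asarray(fret) > threshold            # True = high FRET state
    # Frame indices at which the state differs from the previous frame.
    edges = np.flatnonzero(np.diff(state.astype(int))) + 1
    bounds = np.concatenate(([0], edges, [state.size]))
    lengths = np.diff(bounds) * frame_time
    states = state[bounds[:-1]]                     # state of each dwell
    lengths, states = lengths[1:-1], states[1:-1]   # discard truncated dwells
    return lengths[~states], lengths[states]

def time_constant(dwells):
    """Mean dwell time and its standard error, assuming a single exponential."""
    dwells = np.asarray(dwells, dtype=float)
    tau = dwells.mean()
    return tau, tau / np.sqrt(dwells.size)

# Hypothetical trace sampled every 0.1 s with a threshold of E = 0.4.
trace = [0.2, 0.2, 0.7, 0.7, 0.7, 0.2, 0.2, 0.7, 0.2]
low, high = dwell_times(trace, threshold=0.4, frame_time=0.1)
tau_high, tau_high_err = time_constant(high)
```

In practice the dwells from all accepted molecules would be pooled before estimating the time constants, and dwells close to the frame time would need extra care because of the finite sampling rate.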

Workflow results for dynamic smFRET.

(A) Schematic of the dynamic FRET substrate. The FRET efficiency between two dyes (donor, shown in blue, and acceptor, shown in red) attached to the arms of a Holliday junction exhibiting rapid interconversion between low and high FRET states was measured. Biotin attachment to the surface used during the experiment was omitted from the cartoon for clarity. (B) FRET efficiency as a function of time for representative molecules. (C) Scatterplot showing the stoichiometry vs FRET efficiency for all timepoints of accepted molecules for FRET, AO, and DO populations. One-dimensional histograms displayed along each axis are fitted with single or double Gaussian models for stoichiometry and FRET efficiency, respectively. (D) Dwell time distributions from a two-state model for the high and low FRET states. Time scales are from exponential fits with standard deviation and n is the number of dwells taken from 601 molecules. A detailed step-by-step guide to this workflow is available on the Mars documentation website.

The FRET datasets presented thus far demonstrate the advantages of the ALEX collection strategy. Alternating excitation offers acceptor emission information from direct excitation that is used to calculate the stoichiometry of the dyes in each molecule and exclude traces with fluctuations in the acceptor signal that are independent of FRET. This ensures a more robust determination of correction factors and greater reliability. However, it comes at a cost when imaging with smTIRF. The FRET sampling rate is lower and the observation time is reduced due to limited emission from the acceptor before photobleaching. As a consequence, there are good reasons to omit the acceptor excitation pulses when considering the FRET collection strategy. Therefore, we have developed a third smFRET workflow with the Holliday junction dataset but without using the direct acceptor excitation information to calculate the correction factors. A detailed step-by-step guide to this workflow is available on the Mars documentation website. In this case, the stoichiometry is not calculated due to the lack of direct acceptor information.

Molecule selection is one of the most important steps during smFRET analysis and also the one most susceptible to bias. Therefore, a consistent set of criteria must be adhered to throughout the selection process and the final set of accepted molecules should be evaluated against known smFRET features. These procedures ensure molecules with improper labeling, dye fluctuations independent of FRET, and other photophysical artifacts are removed from the analysis. Jupyter notebooks that generate an automated validation report for all smFRET examples are included in the mars-tutorials repository. In the report, the stability of the sum of fluorescence signals is evaluated using the coefficient of variation, the anti-correlation of donor and acceptor emission is quantified using the Pearson correlation coefficient, and the median values of stoichiometry and FRET efficiency are compared with expected values for different regions of FRET traces, among other validation tests. The results of these evaluations for the dynamic FRET dataset are displayed in Figure 5—figure supplement 1 with suggested rejection thresholds. This provides an extra layer to improve the quality of final datasets and provides additional quantitative criteria to reduce bias.
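Two of the validation metrics named above can be illustrated with NumPy. The sketch below is not the code from the mars-tutorials notebooks; it assumes corrected donor and acceptor intensity traces for a single molecule, and the synthetic trace, noise level, and function names are hypothetical.

```python
import numpy as np

def sum_signal_cv(donor, acceptor):
    """Coefficient of variation of the summed donor and acceptor signal."""
    total = np.asarray(donor, dtype=float) + np.asarray(acceptor, dtype=float)
    return total.std(ddof=1) / total.mean()

def donor_acceptor_correlation(donor, acceptor):
    """Pearson correlation coefficient between donor and acceptor emission."""
    return np.corrcoef(donor, acceptor)[0, 1]

# Synthetic two-state molecule: constant total signal, E switching 0.2 <-> 0.6.
rng = np.random.default_rng(0)
frames = np.arange(200)
efficiency = np.where((frames // 10) % 2 == 0, 0.2, 0.6)
acceptor = 1000 * efficiency + rng.normal(0, 20, frames.size)
donor = 1000 * (1 - efficiency) + rng.normal(0, 20, frames.size)

cv = sum_signal_cv(donor, acceptor)
r = donor_acceptor_correlation(donor, acceptor)
```

For a well-behaved molecule the summed signal is stable (low coefficient of variation) and the donor and acceptor signals are strongly anti-correlated (Pearson coefficient close to -1); molecules failing either expectation are candidates for rejection.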

Workflow 3–characterizing the kinetics of DNA topology transformations

Force spectroscopy methods, utilizing the high precision tracking of beads attached to biomolecules, have led to great insights into biological processes. Magnetic tweezers, in particular, are routinely used to study the physical behavior of DNA (Nomidis et al., 2017; Strick et al., 1998) as well as nucleic acid transformations by essential types of cellular machinery, such as structural maintenance of chromosomes complexes (Eeftens et al., 2017), the DNA replication machinery (Burnham et al., 2019; Hodeib et al., 2016; Manosas et al., 2012; Seol et al., 2016) and topoisomerases (Charvin et al., 2005; Gore et al., 2006; Nöllmann et al., 2007; Strick et al., 2000). The throughput of this approach has recently been improved by combining magnetic tweezers with DNA flow stretching to create an instrument called flow magnetic tweezers (FMT) (Agarwal and Duderstadt, 2020). The very large datasets generated by this instrument pose a challenge for analysis and initially led to the development of Mars. To illustrate the advantages offered by Mars in the analysis of these types of data, we present a workflow studying the behavior of DNA gyrase, a topoisomerase from E. coli (Nöllmann et al., 2007). This workflow illustrates the analysis steps required from raw tracking data to a well-structured single-molecule dataset with a traceable processing history utilizing the virtual storage infrastructure of Mars.

In the experimental setup of FMT, surface-immobilized DNA molecules are attached to paramagnetic polystyrene beads in a flow cell (Figure 6A). Block magnets placed above the flow cell create a magnetic field that orients the beads and applies a vertical force. A constant flow through the flow cell results in a drag force on the beads and the DNAs. Similar to a conventional magnetic tweezers setup, rotation of the magnets will change the topology of the DNA according to the direction of rotation. Furthermore, by inverting the flow direction, the DNA tether will flip and reveal the location of surface attachment. A low magnification telecentric lens makes it possible to image a massive field of view providing high throughput observations. At low applied forces, changes in DNA topology induced through magnet rotation result in the formation of DNA supercoils and DNA compaction. This results in a decrease in the projected length of the DNA molecule observable as bead motion in the image. Despite the low magnification of the lens required for high throughput, subpixel fitting allows for high resolution tracking of DNA length over time. The tracking results are rich in information that allows for detailed molecule classification and quantification of enzyme kinetics.

Workflow for gyrase characterization using flow magnetic tweezers.

(A) Schematic of flow magnetic tweezers (FMT). The projected length of the surface-immobilized DNA molecule attached to a magnetic bead was measured under different flow as well as magnet height and rotation conditions to study changes in DNA topology. (B) Representation of the analysis workflow starting from the raw image stack on the left to the fully analyzed plots on the right. First, positional information is extracted by the Peak Tracker to yield a Single Molecule Archive. Regions assigned to specific parts of the experiment are highlighted in the example trace (‘reversal’, ‘singly tethered’, ‘force’, ‘coiling’, and ‘gyrase reaction’) and are used to calculate different DNA-related properties and parameters. Subsequent classification and tagging allows for easy exploration of subpopulations. The top graph shows the rate distribution (enzymatic cycles/s) found for gyrase activity resolving positive supercoils (orange) and introducing negative supercoils (blue), respectively. The lower graph shows a box plot of the delay between the introduction of the enzyme to the system (T=0) and the observed enzymatic activity. Plots were calculated from 2,406 individual molecules.

In the experiment presented in Figure 6B, first, a series of predetermined flow and magnet transformations were executed to check tether quality. Then gyrase was introduced, resolving positive supercoils and subsequently introducing negative supercoils. Regions were assigned to the trace in which parameters were calculated and tags were allocated to the molecules based on these parameter values. In this particular example, four final categories discriminated molecules that were either retained for analysis, immobile, nicked, or rejected for various other reasons (e.g. attachment of multiple DNA molecules to one magnetic bead). The molecules retained for analysis were background corrected and the rate of positive relaxation and negative introduction was calculated and plotted (Figure 6B, right). A positive burst rate of 1.59 cycles/s, as well as a negative burst rate of 0.80 cycles/s, were found in this experiment. These values are in agreement with those reported previously (Nöllmann et al., 2007).

This workflow illustrates how Mars can be used to analyze large datasets virtually with on-the-fly data retrieval. Careful documentation is even more critical for large datasets where many stages of analysis must be entirely automated. The documentation framework provided in Mars ensures that the analysis is done reproducibly by entering each step in the log of the metadata record. In combination with keeping all observations in the Molecule Archive, including those rejected from the final analysis, this ensures a fully reversible workflow. Mars can be of value for other major force spectroscopy methods, even though many rely on z-directional movement instead of x-y tracking. Several options for external initial image processing are available, from which results could be imported to Mars either in tabular form or using a scripting environment. Furthermore, this example shows that Mars is flexible and can be used for any camera-based data where single molecules are localized or tracked. Mars then provides a platform with improved classification options, documentation, and data reusability for downstream analysis. The expandability of Mars will allow diversification of the workflow to integrate a broader range of force spectroscopy input data, for example, from optical and traditional magnetic tweezers.

Discussion

Rapid improvements in bioimaging technologies have led to powerful new approaches to follow the time evolution of complex biological systems with unprecedented spatial and temporal resolution, generating vast information-rich datasets. These observations must be efficiently and reproducibly analyzed, classified, and shared to realize the full potential of recent technological advances and ensure new biological phenomena are discovered and faithfully quantified. Single-molecule imaging approaches, in particular, have moved from obscurity in specialized physics laboratories to the forefront of molecular biology research. Nevertheless, surprisingly few reporting standards and common formats exist. While significant progress has been made in establishing common file formats for single-molecule FRET datasets that can store raw photon information from point detectors (Greenfeld et al., 2015) and time-binned trajectories from images (Ingargiola et al., 2016), they offer limited options for adaptation to other experimental modalities. Mars provides a solution to bridge this gap in the form of a common set of commands for single-molecule image processing, a graphical user interface for molecule exploration, and a Molecule Archive file format for flexible storage and reuse of image-derived datasets adaptable to a broad range of experiment types. To ensure Mars is accessible to a large community, Mars is developed as open-source software and is freely available as a collection of SciJava commands distributed through a Fiji update site. The Mars project is a member of the Scientific Community Image Forum (Rueden et al., 2019) which provides a vibrant platform for new users to get help and for advanced users to find solutions to difficult image analysis problems.

The workflows described illustrate how the basic collection of modular commands and Molecule Archive transformations provided can be reused to analyze data from very different experimental configurations. Nevertheless, we recognize the commands are ultimately limited and will not address all problems easily. Therefore, we provide interoperability between Mars and other platforms available in Fiji. In particular, to provide access to a broader range of particle tracking options, results can be exported from TrackMate to Molecule Archive format. We plan to expand interoperability to include other common formats generated from single-molecule imaging experiments as they emerge. Additionally, Mars uses the SCIFIO framework (Hiner et al., 2016) to convert different image formats into OME format and includes a specialized image reader for more comprehensive support of images recorded using Micromanager (Edelstein et al., 2010) frequently used for single-molecule imaging. This will allow Mars to process new image formats as they are developed and SCIFIO or Bio-Formats readers are written (Linkert et al., 2010). These are only a few examples of the many workflow options that integrate with core Fiji technologies. In case these integrations do not provide a solution, Mars has several built-in extension mechanisms. New Molecule Archive types and custom user interface elements can be added separately and discovered by Mars at runtime using the SciJava discovery mechanism. Mars was written with script and command development in mind to allow for the analysis of observations resulting from new approaches beyond single-molecule applications by extending the existing framework. Finally, Molecule Archives can easily be opened in Python environments by directly loading them as JSON files or using PyImageJ (Curtis Rueden et al., 2021), which provides access to many other data manipulation and visualization libraries.

Single-molecule imaging approaches have gained widespread usage and have become an indispensable tool for the discovery of new biological mechanisms. Unfortunately, data reporting standards have lagged far behind. Raw images together with a record of the processing history for reproduction are rarely provided. The development of new formats like the Molecule Archive format presented here will make it easier for researchers to faithfully report their results and ensure reproducibility. This will increase the level of confidence and quantitative accuracy of findings and allow for broader reuse of existing information-rich datasets. However, Mars only provides a framework to aid reproducibility and does not ensure it. Individual researchers are ultimately responsible for maintaining reporting standards sufficient for reproduction and following best practice recommendations for single molecule imaging experiments (Lerner et al., 2021). For example, scripts developed with Mars should be version controlled, made publicly accessible, and report all essential parameters to the processing log. Moreover, the version numbers and settings of all software integrated into workflows must be documented and reported to ensure reproducibility.

The discovery of new biological phenomena from single-molecule observations often depends on time-consuming manual classification of individual molecules and behaviors. Machine learning algorithms are now offering the possibility to automate these tasks (Kapadia et al., 2021; Thomsen et al., 2020), but their accuracy depends on robust training datasets. The powerful record tagging tools provided with Mars provide the ideal platform for the creation of large training datasets for machine learning based classifiers. Future work will focus on further development of interoperability of Mars with other platforms and machine learning workflows.

Methods

Mars installation

To get Mars, start by downloading a copy of Fiji and installing it. Open Fiji, go to the help menu, select update… and click the manage update sites button and activate the Mars update site by checking the box where you find Mars in the list of available update sites. Apply all required changes. This should install a large number of jar files that includes all core Mars software components and all dependencies required. To complete the installation process, quit and reopen Fiji. Now you are ready to start using Mars by running the commands in the Plugins menu in the Mars submenu. We suggest installing Mars into a new copy of Fiji to avoid incompatibility issues with older copies of Fiji. We will ensure Mars works with future copies of Fiji using the installation procedure outlined.

Getting help with Mars

Mars is a community partner on the Scientific Community Image Forum (Rueden et al., 2019) where users can report their problems in posts with the mars tag to get feedback and troubleshooting support. Don’t be a stranger! If you have a problem, no matter how small, we would love to hear about it and try to help. Solving your problem will likely help other users.

Workflow 1–tracking RNA polymerase position during transcription

Specific details about protein purification and labeling, the microscope set-up, sample preparation, and the imaging procedure can be found in the publication by Scherr et al., 2022. The raw data accompanying this example workflow are freely available through the Mars tutorials GitHub repository.

Extensive background information about all described Mars commands, specific settings used, and screenshots of example Molecule Archives and algorithm outcomes can be found on the Mars documentation pages. Scripts and example Molecule Archives accompanying this analysis can be found in the Mars tutorials GitHub repository.

Workflow 2–measuring intramolecular distances with smFRET

In this workflow, parameter nomenclature, as well as data correction and calculation procedures, were performed as described by Hellenkamp et al., 2018 to facilitate comparisons and to illustrate the flexibility of Mars workflow creation. Specifics on sample design, preparation, and the data acquisition procedure for the static FRET dataset presented in Figure 4 can be found in the original work (Hellenkamp et al., 2018). The raw image data is freely available on Zenodo. Dynamic FRET from conformational changes of a Holliday junction labeled with donor and acceptor dyes was collected for this study and deposited on Zenodo, where it is freely available for download. Full experimental details can be found below. Detailed step-by-step instructions for three smFRET workflows are available on the Mars documentation website for static FRET, dynamic FRET, and dynamic FRET without direct acceptor excitation. In addition to the detailed step-by-step guides on the workflow documentation pages linked above, we have also created a YouTube channel with many tutorials and included a video showing how to perform the dynamic FRET workflow.

The most significant difference between the three examples is the method used to calculate gamma (γ), the normalization of effective fluorescence quantum yields, and detection efficiencies of the acceptor and donor. For datasets collected using ALEX, the γ correction factor is calculated using a linear fit of one over the stoichiometry vs FRET efficiency as described previously (Lee et al., 2005). In the case of static FRET, this is conducted with the combined 1-lo and 1-mid datasets. For dynamic FRET, the two populations within each molecule are separated and fit. In the absence of direct acceptor excitation, γ is calculated using the ratio of the changes in intensities before and after acceptor photobleaching as described previously (McCann et al., 2010).
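The ALEX-based calculation of γ can be sketched as follows. This is a minimal illustration and not the script used in the workflow. It assumes that E and S have already been corrected for background, leakage, and direct excitation, and that accepted molecules carry one donor and one acceptor, in which case 1/S = a + bE with a = 1 + βγ and b = β(1 − γ), where β is the excitation normalization factor. The function name and the test values are hypothetical.

```python
import numpy as np

def gamma_beta_from_alex(e_app, s_app):
    """Fit 1/S = a + b*E and return (gamma, beta).

    With a = 1 + beta*gamma and b = beta*(1 - gamma):
        beta  = a + b - 1
        gamma = (a - 1) / (a + b - 1)
    """
    e_app = np.asarray(e_app, dtype=float)
    inverse_s = 1.0 / np.asarray(s_app, dtype=float)
    b, a = np.polyfit(e_app, inverse_s, 1)   # slope first, then intercept
    beta = a + b - 1.0
    gamma = (a - 1.0) / beta
    return gamma, beta

# Noise-free synthetic check with hypothetical correction factors.
true_gamma, true_beta = 1.2, 0.9
e_values = np.array([0.15, 0.18, 0.22, 0.50, 0.55, 0.60])
s_values = 1.0 / ((1 + true_beta * true_gamma)
                  + true_beta * (1 - true_gamma) * e_values)
gamma, beta = gamma_beta_from_alex(e_values, s_values)
```

The fit needs at least two populations with well-separated FRET efficiencies to constrain the slope, which is why the static case combines the 1-lo and 1-mid datasets and the dynamic case uses the two states within each molecule.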

The analysis starts with the identification of the fluorescent DNA molecule positions and subsequent intensity vs time trace extractions. The Mars image processing commands used in the smFRET workflow were written for videos where different excitation wavelengths are stored in different channels which is standard practice throughout the imaging community. However, the static FRET sample data (https://doi.org/10.5281/zenodo.1249497) was saved as a single image sequence without channel information despite alternating acceptor and donor excitation for each frame. Therefore, we wrote a script for Fiji to convert to the multichannel format expected by Mars that de-interleaves the frames into two channels: acceptor excitation (C=0) and donor excitation (C=1). The intensity peaks were identified using the Peak Finder and their locations are exported to the ImageJ ROI Manager. To account for the dual-view of the camera which separates the acceptor emission and donor emission wavelengths to different regions, the coordinates of the peaks were transformed to the other half of the dual-view using an Affine2D matrix. This matrix was calculated with the bead data provided on Zenodo using the ‘descriptor-based registration (2d/3d) (Preibisch et al., 2010) plugin in Fiji as described in the tutorial on the Mars documentation site . Next, the molecule integrator (multiview) was used to extract the intensity vs time traces of all molecules at all specified emission and excitation colors. This generated a single Molecule Archive containing molecule records with intensity traces for each identified molecule. 
For easier downstream analysis, the described procedure was repeated, thereby generating three different Molecule Archives from each video: (i) FRET archive–a Molecule Archive containing molecules that have both donor and acceptor emission; (ii) AO archive–a Molecule Archive containing molecules with acceptor emission after acceptor excitation only; and (iii) DO archive–a Molecule Archive containing molecules with only donor emission after donor excitation. The metadata records of these Molecule Archives were tagged accordingly before merging them into a master Molecule Archive using the Merge Archives command.
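
The de-interleaving and coordinate transformation steps can be illustrated with a minimal NumPy sketch. This is not the Fiji script or the Mars Transform ROIs command. The function names, the assumption that the first frame of the sequence was recorded under acceptor excitation, and the example matrix are all hypothetical and serve only to show the idea.

```python
import numpy as np

def deinterleave(frames, n_channels=2):
    """Reshape an interleaved (T, Y, X) stack into (T // n_channels, C, Y, X).

    Assumes frame 0 belongs to channel 0 (here: acceptor excitation),
    frame 1 to channel 1 (donor excitation), and so on.
    """
    frames = np.asarray(frames)
    usable = (frames.shape[0] // n_channels) * n_channels  # drop an incomplete last cycle
    return frames[:usable].reshape(-1, n_channels, *frames.shape[1:])

def transform_peaks(xy, matrix):
    """Apply a 2x3 affine matrix [[m00, m01, m02], [m10, m11, m12]] to (N, 2) points."""
    xy = np.asarray(xy, dtype=float)
    matrix = np.asarray(matrix, dtype=float)
    return xy @ matrix[:, :2].T + matrix[:, 2]

# Toy example: 5 interleaved 2x2 frames and a pure 256-pixel shift in x.
stack = np.arange(5 * 2 * 2).reshape(5, 2, 2)
channels = deinterleave(stack)
other_half = transform_peaks([[10.0, 20.0]], [[1, 0, 256], [0, 1, 0]])
```

In practice the matrix would come from the bead registration rather than being written by hand, and it would generally include small rotation and scaling terms in addition to the shift.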

In the master Molecule Archive, after the metadata tags (FRET, AO, and DO) were added to all molecule records, a position-specific excitation correction was applied that normalizes the donor and acceptor intensities depending on their location in the field of view. Next, for each intensity trace, the Single Change Point Finder was used to automatically detect large intensity shifts indicating donor or acceptor bleaching events. Subsequent manual tagging of molecule traces with the ‘accepted’ tag allowed for the exclusion of certain molecules from further analysis, for example: (i) molecules that did not have exactly one donor and one acceptor fluorophore, (ii) molecules showing large intensity changes other than the bleaching event, and (iii) molecules with a signal-to-noise ratio that was too low. This yielded a Molecule Archive with tagged molecules to be included in the subsequent calculations and data correction steps.
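
To illustrate what detecting a single bleaching step involves, the sketch below finds the split point that minimizes the summed squared deviation from the mean on either side. This least-squares criterion is a stand-in chosen for brevity and should not be read as the algorithm implemented in the Mars Single Change Point Finder. The function name is hypothetical.

```python
import numpy as np

def single_change_point(y):
    """Return index k so that y[:k] and y[k:] are best described by two constants."""
    y = np.asarray(y, dtype=float)
    n = y.size
    if n < 2:
        raise ValueError("need at least two points")
    csum = np.cumsum(y)
    csum2 = np.cumsum(y ** 2)
    k = np.arange(1, n)  # candidate split positions
    # Sum of squared errors around the mean for the left and right segments.
    left_sse = csum2[k - 1] - csum[k - 1] ** 2 / k
    right_sum = csum[-1] - csum[k - 1]
    right_sse = (csum2[-1] - csum2[k - 1]) - right_sum ** 2 / (n - k)
    return int(k[np.argmin(left_sse + right_sse)])

# Toy trace: 30 frames of signal followed by 20 frames of background.
trace = np.array([100.0] * 30 + [5.0] * 20)
bleach_frame = single_change_point(trace)
```

A real trace is noisy, so a practical detector would also test whether the best split is statistically significant before reporting a bleaching event.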

Next, the fluorescence emission from the donor and acceptor was corrected and the FRET efficiency and stoichiometry values were calculated. All steps were combined in one script. The corrections and calculations performed by this script are described in the steps below. Subscripts, superscripts, and variables are defined as follows:

  • i-iii: (i) the uncorrected intensity; (ii) intensity after background correction; (iii) intensity after background, α and δ corrections.

  • D or A: donor or acceptor.

  • Aem|Dex: intensity in the acceptor channel upon donor excitation.

  • Dem|Dex: intensity in the donor channel upon donor excitation.

  • Aem|Aex: intensity in the acceptor channel upon acceptor excitation.

  • app: apparent values that include systematic experimental offsets.

  • DO/AO: donor-only/acceptor-only species.

  • S: stoichiometry, approximately the ratio of the donor dyes to all dyes.

  • E: FRET efficiency.

  • Iemission|excitation: intensity for the specified excitation and emission.

  • FA|D: acceptor emission upon donor excitation fully corrected for background, leakage, and direct excitation.

  • FD|D: donor emission upon donor excitation fully corrected for background, and differences in fluorescence quantum yields and detection efficiencies of the acceptor and donor.

  • FA|A: acceptor emission upon acceptor excitation fully corrected for background, and differences in excitation intensities and cross-sections of the acceptor and donor.

  • α: leakage of donor fluorescence into the acceptor region.

  • δ: direct acceptor excitation by the donor excitation laser.

  • β: normalization of excitation intensities and cross-sections of the acceptor and donor.

  • γ: normalization of effective fluorescence quantum yields and detection efficiencies of the acceptor and donor.

Step 1: The background correction step subtracts the mean background intensity after bleaching as measured from the traces in the respective FRET Molecule Archive from all prebleaching intensities in a trace-wise fashion. This yielded the background-corrected intensity values (iiIemission|excitation). These corrected intensity values were used to calculate iiEapp and iiSapp (Equation 1).

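(1) ${}^{ii}S_{app} = \dfrac{{}^{ii}I_{Aem|Dex} + {}^{ii}I_{Dem|Dex}}{{}^{ii}I_{Aem|Dex} + {}^{ii}I_{Dem|Dex} + {}^{ii}I_{Aem|Aex}}$ and ${}^{ii}E_{app} = \dfrac{{}^{ii}I_{Aem|Dex}}{{}^{ii}I_{Aem|Dex} + {}^{ii}I_{Dem|Dex}}$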
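
As a worked illustration of Equation 1, the following sketch computes the apparent FRET efficiency and stoichiometry from background-corrected intensities. The function and argument names are hypothetical. The inputs may be single values or arrays holding one value per time point.

```python
import numpy as np

def apparent_e_s(i_aem_dex, i_dem_dex, i_aem_aex):
    """Apparent E and S (Equation 1) from background-corrected intensities."""
    i_aem_dex = np.asarray(i_aem_dex, dtype=float)
    i_dem_dex = np.asarray(i_dem_dex, dtype=float)
    i_aem_aex = np.asarray(i_aem_aex, dtype=float)
    donor_ex_total = i_aem_dex + i_dem_dex  # all emission upon donor excitation
    e_app = i_aem_dex / donor_ex_total
    s_app = donor_ex_total / (donor_ex_total + i_aem_aex)
    return e_app, s_app

# Toy values in arbitrary intensity units.
e_app, s_app = apparent_e_s(300.0, 700.0, 1000.0)
```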

Step 2–3: Next, the leakage of donor fluorescence into the acceptor channel (α) and direct acceptor excitation by the donor excitation laser (δ) were calculated from the DO and AO molecules, respectively, according to Equation 2. FA|D stores the fully corrected intensity from acceptor emission upon donor excitation for each molecule at each time point before the first photobleaching event (Equation 3).

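(2) $\alpha = \dfrac{{}^{ii}E_{app}^{(DO)}}{1 - {}^{ii}E_{app}^{(DO)}}$ and $\delta = \dfrac{{}^{ii}S_{app}^{(AO)}}{1 - {}^{ii}S_{app}^{(AO)}}$
(3) $F_{A|D} = {}^{ii}I_{Aem|Dex} - \alpha\,{}^{ii}I_{Dem|Dex} - \delta\,{}^{ii}I_{Aem|Aex}$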
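
Equations 2 and 3 can be sketched as follows. The function names are hypothetical. Averaging the apparent values over the donor-only and acceptor-only populations before applying Equation 2 is an assumption made here for illustration.

```python
import numpy as np

def leakage_alpha(e_app_do):
    """Leakage factor from apparent E of donor-only molecules (Equation 2)."""
    mean_e = float(np.mean(e_app_do))  # assumption: population mean is used
    return mean_e / (1.0 - mean_e)

def direct_excitation_delta(s_app_ao):
    """Direct excitation factor from apparent S of acceptor-only molecules (Equation 2)."""
    mean_s = float(np.mean(s_app_ao))  # assumption: population mean is used
    return mean_s / (1.0 - mean_s)

def corrected_f_ad(i_aem_dex, i_dem_dex, i_aem_aex, alpha, delta):
    """Acceptor emission upon donor excitation after leakage and direct excitation removal (Equation 3)."""
    return i_aem_dex - alpha * i_dem_dex - delta * i_aem_aex

alpha = leakage_alpha([0.1, 0.1])
delta = direct_excitation_delta([0.2])
f_ad = corrected_f_ad(300.0, 700.0, 1000.0, alpha=0.1, delta=0.05)
```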

Step 4–5: To then account for the normalization of excitation intensities and cross-sections of the acceptor and donor (β) and the normalization of effective fluorescence quantum yields and detection efficiencies of the acceptor and donor (γ), the respective correction factors were determined based on the relationship between the 1-lo and 1-mid populations. To do so, iiEapp and iiSapp values were averaged in a molecule-wise fashion and linear regression against the values of the entire population yielded correction factors β and γ (Equation 4) that were applied to calculate FA|A and FD|D (Equation 5), where FA|A is the corrected acceptor emission upon acceptor excitation and FD|D is the corrected donor emission upon donor excitation. The gamma correction factor was calculated based on the assumption that placing the same dyes at different bases along the DNA molecule does not influence the ratio of the acceptor and donor fluorescence quantum yields. These subsequently yielded the fully corrected E and S parameters for each molecule (Equation 6). A population-specific molecule average revealed the respective population average values.

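(4) $\dfrac{1}{{}^{iii}S_{app}^{(FRET)}} = b\,{}^{iii}E_{app}^{(FRET)} + a$

where $\beta = a + b - 1$ and $\gamma = \dfrac{a - 1}{a + b - 1}$

(5) $F_{D|D} = \gamma\,{}^{ii}I_{Dem|Dex}$ and $F_{A|A} = \dfrac{1}{\beta}\,{}^{ii}I_{Aem|Aex}$
(6) $E = \dfrac{F_{A|D}}{F_{D|D} + F_{A|D}}$ and $S = \dfrac{F_{A|D} + F_{D|D}}{F_{D|D} + F_{A|D} + F_{A|A}}$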
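
A minimal sketch of Equations 4 to 6 is given below, assuming that molecule-averaged apparent E and S values for the FRET population are already available. The function names are hypothetical, and `numpy.polyfit` is used here simply as a convenient least-squares line fit.

```python
import numpy as np

def beta_gamma_from_fit(e_app, s_app):
    """Fit 1/S = b*E + a (Equation 4) and convert the line parameters to beta and gamma."""
    inv_s = 1.0 / np.asarray(s_app, dtype=float)
    b, a = np.polyfit(np.asarray(e_app, dtype=float), inv_s, 1)  # slope, intercept
    beta = a + b - 1.0
    gamma = (a - 1.0) / beta
    return beta, gamma

def corrected_e_s(f_ad, i_dem_dex, i_aem_aex, beta, gamma):
    """Fully corrected E and S (Equations 5 and 6)."""
    f_dd = gamma * i_dem_dex
    f_aa = i_aem_aex / beta
    e = f_ad / (f_dd + f_ad)
    s = (f_ad + f_dd) / (f_dd + f_ad + f_aa)
    return e, s

# Synthetic points lying exactly on the line for beta = 1.5 and gamma = 0.8,
# that is a = 1 + gamma*beta = 2.2 and b = (1 - gamma)*beta = 0.3.
e_points = np.array([0.2, 0.4, 0.6])
s_points = 1.0 / (0.3 * e_points + 2.2)
beta, gamma = beta_gamma_from_fit(e_points, s_points)
e, s = corrected_e_s(180.0, 700.0, 1000.0, beta=2.0, gamma=0.5)
```

Because the fit relies on a line through only a few population centers, the resulting β and γ are sensitive to how well those populations are separated.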

More information regarding the derivation of the formulas discussed, as well as information about the applied corrections, can be found in the publication by Lee et al., 2005. We developed a second workflow using the Holliday junction dataset but without using the direct acceptor excitation information for corrections and calculations. In the absence of direct acceptor excitation, γ is calculated using the ratio of the changes in intensities before and after acceptor photobleaching as described previously (McCann et al., 2010).

(7) $\gamma = \dfrac{AI_{Pre} - AI_{Post}}{DI_{Post} - DI_{Pre}}$

where AIPre is the mean acceptor intensity before photobleaching, AIPost is the mean background of the acceptor spot after photobleaching, DIPre is the mean intensity of the donor before acceptor photobleaching, and DIPost is the mean intensity of the donor after acceptor photobleaching before donor photobleaching (the donor recovery period).
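
Equation 7 can be illustrated per molecule with the sketch below. It assumes that the acceptor and donor bleaching frames have already been detected. The function name and the frame-index arguments are hypothetical.

```python
import numpy as np

def gamma_from_bleaching(acceptor, donor, acceptor_bleach, donor_bleach):
    """Molecule-wise gamma from acceptor photobleaching (Equation 7).

    acceptor_bleach and donor_bleach are the first frame indices after the
    respective bleaching events, with acceptor_bleach < donor_bleach.
    """
    acceptor = np.asarray(acceptor, dtype=float)
    donor = np.asarray(donor, dtype=float)
    ai_pre = acceptor[:acceptor_bleach].mean()
    ai_post = acceptor[acceptor_bleach:].mean()           # acceptor background
    di_pre = donor[:acceptor_bleach].mean()
    di_post = donor[acceptor_bleach:donor_bleach].mean()  # donor recovery period
    return (ai_pre - ai_post) / (di_post - di_pre)

# Toy traces: acceptor bleaches at frame 10, donor at frame 20.
acceptor = [800.0] * 10 + [50.0] * 20
donor = [300.0] * 10 + [1050.0] * 10 + [0.0] * 10
gamma = gamma_from_bleaching(acceptor, donor, 10, 20)
```

This only works for molecules in which the acceptor bleaches before the donor, since the donor recovery period is otherwise not observed.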

The dwell time distributions for the dynamic FRET workflow examples using the Holliday junction were determined using a simple two-state model run on the final archive that adds segment tables to all molecule records. The segment tables with the state fits from the script are used in the Jupyter notebook to calculate the timescales of conformational changes displayed in Figure 5.
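
The idea of extracting dwell times from a two-state description of a trace can be sketched as follows. Assigning states with a fixed efficiency threshold is a simplification chosen for illustration and is not necessarily how the two-state model in the Mars script assigns states. The function name and threshold are hypothetical.

```python
import numpy as np

def two_state_dwell_times(time, e, threshold):
    """Return {0: [...], 1: [...]} dwell times for low (0) and high (1) FRET states."""
    time = np.asarray(time, dtype=float)
    states = (np.asarray(e, dtype=float) > threshold).astype(int)
    edges = np.flatnonzero(np.diff(states)) + 1  # index where each new run begins
    dwells = {0: [], 1: []}
    # Runs between consecutive transitions are fully observed. The first and
    # last runs are cut off by the observation window and are skipped.
    for start, stop in zip(edges[:-1], edges[1:]):
        dwells[int(states[start])].append(time[stop] - time[start])
    return dwells

# Toy trace sampled every 0.1 s.
t = np.arange(8) * 0.1
e_trace = [0.2, 0.2, 0.8, 0.8, 0.8, 0.2, 0.2, 0.8]
dwells = two_state_dwell_times(t, e_trace, threshold=0.5)
```

For a simple two-state system, the pooled dwell times of each state are expected to be exponentially distributed, so the inverse of the mean dwell time gives an estimate of the rate of leaving that state.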

To assess the quality of the selected molecules in the final Molecule Archives, we developed several validation criteria and included validation reports for all three smFRET examples that can be found in the FRET section of the mars-tutorials repository within the subfolder for each example workflow. In the report, the stability of the sum of fluorescence signals is evaluated using the coefficient of variation, anti-correlation of donor and acceptor emission is quantified using the Pearson correlation coefficient, and the median values of stoichiometry and FRET efficiency are compared with expected values for different regions of FRET traces, among other validation tests. The results of these evaluations for the dynamic FRET dataset are displayed in Figure 5—figure supplement 1 with suggested rejection thresholds. The Jupyter notebooks provided only report on these validation parameters. We also provide a script to filter the selected molecules based on thresholds that can be used to automate part of the selection process. We offer this as a starting point, but care must be taken when performing the selection process and we expect additional criteria may need to be added to ensure robust automated filtering.
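
Two of the validation metrics named above can be sketched in a few lines. The function names and the threshold values are hypothetical placeholders and are not the rejection thresholds suggested in the validation reports.

```python
import numpy as np

def coefficient_of_variation(values):
    """Standard deviation divided by the mean."""
    values = np.asarray(values, dtype=float)
    return float(values.std() / values.mean())

def donor_acceptor_correlation(donor, acceptor):
    """Pearson correlation coefficient between donor and acceptor traces."""
    return float(np.corrcoef(donor, acceptor)[0, 1])

def passes_validation(donor, acceptor, max_cv=0.2, max_r=-0.5):
    """Accept a molecule when the summed signal is stable and the traces are anti-correlated."""
    total = np.asarray(donor, dtype=float) + np.asarray(acceptor, dtype=float)
    cv = coefficient_of_variation(total)
    r = donor_acceptor_correlation(donor, acceptor)
    return bool(cv <= max_cv and r <= max_r)

# Toy dynamic trace: donor and acceptor switch in opposite directions.
donor = [700.0, 300.0, 700.0, 300.0]
acceptor = [300.0, 700.0, 300.0, 700.0]
accepted = passes_validation(donor, acceptor)
```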

Furthermore, information about all described Mars commands, specific settings used in the built-in tools, and screenshots of expected outcomes can be found on the Mars documentation pages. Scripts, Jupyter notebooks, and Molecule Archives accompanying this analysis can be found in the Mars tutorials GitHub repository.

Dynamic smFRET data collection

Dynamic single-molecule FRET datasets were obtained using a four-stranded Holliday junction assembled as previously described (Hyeon et al., 2012). Briefly, HPLC-purified oligos R_branch_bio (5ʹ-biotin-TTTTTTTTCCCACCGCTCGGCTCAACTGGG-3ʹ), H_branch_Cy3 (5ʹ-Cy3-CCGTAGCAGCGCGAGCGGTGGG-3ʹ), X_branch (5ʹ-GGGCGGCGACCTCCCAGTTGAGCGCTTGCTAGGG-3ʹ), and B_branch_Alexa647 (5ʹ-Alexa647-CCCTAGCAAGCCGCTGCTACGG-3ʹ) obtained from Eurofins Genomics GmbH were mixed to a final concentration of 10 μM each in annealing buffer (30 mM Hepes pH 7.5, 100 mM potassium acetate), heated to 90°C, and annealed by slow cooling to 4°C over 90 min. Imaging was performed using an RM21 micromirror TIRF microscope from Mad City Labs (MCL, Madison, Wisconsin, USA) with custom modifications as previously described (Larson et al., 2014) equipped with an Apo N TIRF 60× oil-immersion objective (NA 1.49, Olympus). Dyes were excited with OBIS 532 nm LS 120 mW and 637 nm LX 100 mW lasers from Coherent at full power, expanded to fill the field of view. Scattered light from excitation was removed and signals were separated with emission filter sets (ET520/40m and ZET532/640m, Chroma). Emission light was split at 635 nm (T635lpxr, Chroma) with an OptoSplit II dualview (Cairn Research, UK) and collected on an iXon Ultra 888 EMCCD camera (Andor).

Imaging surfaces were prepared as follows. Glass coverslips (22×22 mm, Marienfeld) were cleaned with a Zepto plasma cleaner (Diener Electronic) and incubated in acetone containing 2% (v/v) 3-aminopropyltriethoxysilane for 5 min. Silanized coverslips were rinsed with ddH2O, dried, and baked at 110°C for 30 min. Coverslips were then covered with a fresh solution of 0.4% (w/v) Biotin-PEG-Succinimidyl Carbonate (MW 5,000) and 15% (w/v) mPEG-Succinimidyl Carbonate (MW 5,000) in fresh 0.1 M NaHCO3 and incubated overnight at room temperature. Coverslips were rinsed with ddH2O, dried, and incubated again with a fresh Biotin-PEG/mPEG solution as described above. Functionalized PEG-Biotin microscope slides were again washed and dried and finally stored under vacuum.

A functionalized PEG-Biotin microscope slide was covered with 0.2 mg/ml streptavidin in blocking buffer (20 mM Tris-HCl, pH 7.5, 50 mM NaCl, 2 mM EDTA, 0.2 mg/ml BSA, 0.005% (v/v) Tween20) for 15 min. Flow cells were assembled using ~0.1 mm thick double-sided tape containing an ~0.5 mm wide flow lane. The double-sided tape was sandwiched between the cover glass on one side and a 1 mm thick piece of glass on the other containing entry and exit holes. Polyethylene tubes (inner diameter 0.58 mm) were inserted into the holes and the entire assembly was sealed with epoxy.

Flow cells were flushed with blocking buffer and left for 30 min. Holliday junctions were incubated in the flow chamber at a concentration of 1–5 pM for 2 min in TN Buffer (10 mM Tris pH 8.0, 50 mM NaCl). Excess Holliday junctions were removed by washing with TN buffer. Finally, imaging was performed in RXN Buffer (10 mM Tris pH 8.0, 50 mM MgCl2, 1 mM Trolox (aged for 5–8 min under UV), 2.5 mM PCA and 0.21 U/ml PCD). Tubes were sealed with clips several minutes prior to imaging to allow time for oxygen removal. Before exciting the dyes in each field of view, an 808 nm laser was used to obtain focus. Collection was performed using an ALEX approach in which the 532 nm and 637 nm lasers were alternated and 40 ms exposures were collected in burst acquisition mode using a custom BeanShell collection script with Micro-Manager 2.0. Ten to twenty fields of view were collected sequentially for 3 min each using an automated microDrive stage from MCL with autofocus steps preceding each collection. Image sequences are available on Zenodo (https://doi.org/10.5281/zenodo.6659531).

Workflow 3–characterizing the kinetics of DNA topology transformations

The raw video is available through Zenodo (https://zenodo.org/record/3786442#.YTns2C2B1R0). This video is a reduced dataset from one of the FMT experiments investigating the topological changes gyrase performs on the DNA. The Groovy scripts for analyzing the dataset can be found on GitHub. The first three scripts were used to create a CSV file and the data was then plotted with a Python script. The data analysis procedure using Mars has been described in Agarwal and Duderstadt, 2020.

After tracking the molecules in the dataset and generating a single Molecule Archive, the traces were sorted and classified. The experiment was designed such that steps in the assay could be used as indicators. Based on these indicators, molecules were classified as immobile (not mobile, not coilable), nicked (mobile, not coilable), or fit for analysis (mobile, coilable). A fourth category contains all molecules rejected for other reasons, such as multiple DNA attachments on a single bead or getting stuck during the experiment before enzymatic activity was detected. Such tethers can appear to be mobile and coilable but fail certain thresholds, for example by being coilable at high force. Specifically, the main indicators for this dataset are derived from: (i) the observation of a positional change of the DNA-bead after flow direction reversal (stuck or not stuck), (ii) a test for DNA molecule coilability by rotating the magnets at high force (single or multiple DNA molecules attached to a single bead), (iii) a molecular force measurement investigating the force on the DNA, and (iv) a second rotating step at lower force to introduce positive twist, which is resolved by gyrase. The introduction of positive and negative supercoiling is used to calibrate the extension change of each DNA molecule to the change of twist during the analysis. The response of the molecule to these indicator tests is determined by parameter calculation scripts that only consider a certain assigned region of interest in the trace. When a certain threshold for a parameter is met, a tag is added to the molecule record. The final set of tags present in the molecule record determines in which group the molecule is classified.
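
The tag-based grouping can be pictured with the small sketch below. The tag names and the function are hypothetical and are not taken from the analysis scripts. Only the mapping from the mobile and coilable properties to the groups follows the description above.

```python
def classify(tags):
    """Map the set of tags on a molecule record to one of four groups."""
    tags = set(tags)
    if "rejected" in tags:
        return "rejected"
    mobile = "mobile" in tags
    coilable = "coilable" in tags
    if mobile and coilable:
        return "fit for analysis"
    if mobile:
        return "nicked"
    if not coilable:
        return "immobile"
    # Coilable but not mobile is not one of the three described groups.
    return "rejected"

groups = [classify(t) for t in (["mobile", "coilable"], ["mobile"], [])]
```
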
A sliding window approach is applied to all relevant molecules revealing kinetic information such as coiling and compaction rates as well as influences on these parameters after the introduction of gyrase. The size of the window depends on whether gyrase is relaxing positive twist or introducing negative twist. For the positive relaxation a window of 12.5 s and for the negative introduction a window of 25 s is used. In this case, one cycle means the linking number of the DNA is changed by –2. Since the DNA has to be relaxed to get negatively supercoiled, the burst position for positive relaxation comes earlier than the negative introduction. The time points have been corrected such that gyrase introduction coincides with T=0. The calculated information was either exported to CSV format or was directly interpreted with Python. This flexibility enables data visualization on various platforms.
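
A sliding window rate estimate of the kind described above can be sketched as a series of local line fits. The function name and the synthetic trace are hypothetical. The sketch assumes that the time values are sorted in ascending order and that the slope of a least-squares line within each window is an adequate rate estimate.

```python
import numpy as np

def sliding_window_slopes(time, y, window_s):
    """Fit a line in each window of length window_s and return window centers and slopes."""
    time = np.asarray(time, dtype=float)
    y = np.asarray(y, dtype=float)
    centers, slopes = [], []
    for t0 in time:
        if t0 + window_s > time[-1]:
            break  # stop once the window would run past the end of the trace
        mask = (time >= t0) & (time <= t0 + window_s)
        if mask.sum() < 2:
            continue  # a line fit needs at least two points
        slope, _intercept = np.polyfit(time[mask], y[mask], 1)
        centers.append(t0 + window_s / 2)
        slopes.append(slope)
    return np.array(centers), np.array(slopes)

# Synthetic trace changing at a constant rate of -0.02 units per second.
t = np.arange(0, 50, 0.5)
position = -0.02 * t + 3.0
centers, slopes = sliding_window_slopes(t, position, window_s=12.5)
```

Converting such slopes into enzymatic cycles per second would additionally require the per-molecule calibration between extension change and twist mentioned above.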

The entire analysis, including more background information, can be found on the website. Furthermore, the Mars commands are explained in great detail on the documentation site (https://duderstadt-lab.github.io/mars-docs/docs/).

Data availability

The analysis software described is publicly available in several repositories on GitHub at https://github.com/duderstadt-lab. The core library used for the analysis and storage of data is contained in the mars-core repository. The graphical user interface is contained in the mars-fx repository. The videos used in all workflows have been made available in public databases. Links to datasets can be found in the Methods section for each workflow. Extensive documentation and links to many additional resources and scripts used in all workflows can be found at https://duderstadt-lab.github.io/mars-docs/.

The following previously published data sets were used
    1. Hugel T
    (2018) Zenodo
    Datasets for "Precision and accuracy of single-molecule FRET measurements - a multi-laboratory benchmark study".
    https://doi.org/10.5281/zenodo.1249497
    1. Agarwal R
    2. Duderstadt KE
    (2020) Zenodo
    Flow Magnetic Tweezers example video.
    https://doi.org/10.5281/zenodo.3786442
    1. Duderstadt KE
    (2022) Zenodo
    Dynamic FRET example videos related to "Mars, a molecule archive suite for reproducible analysis and reporting of single-molecule properties from bioimages".
    https://doi.org/10.5281/zenodo.6659531

Decision letter

  1. Ruben L Gonzalez
    Reviewing Editor; Columbia University, United States
  2. Volker Dötsch
    Senior Editor; Goethe University, Germany
  3. Eitan Lerner
    Reviewer; The Hebrew University of Jerusalem, Israel
  4. Jingyi Fei
    Reviewer; University of Chicago, United States

Our editorial process produces two outputs: (i) public reviews designed to be posted alongside the preprint for the benefit of readers; (ii) feedback on the manuscript for the authors, including requests for revisions, shown below. We also include an acceptance summary that explains what the editors found interesting or important about the work.

Decision letter after peer review:

Thank you for submitting your article "Mars, a molecule archive suite for reproducible analysis and reporting of single molecule properties from bioimages" for consideration by eLife. Your article has been reviewed by 2 peer reviewers, and the evaluation has been overseen by Ruben Gonzalez as the Reviewing Editor and Volker Dötsch as the Senior Editor. The following individuals involved in review of your submission have agreed to reveal their identity: Eitan Lerner (Reviewer #1); Jingyi Fei (Reviewer #2).

The reviewers have discussed their reviews with one another, and the Reviewing Editor has drafted this to help you prepare a revised submission.

Essential revisions:

1) The authors should further develop the Mars platform to: (i) expand the flexibility of the smFRET input data files in the manner described by Reviewer 2 and, if necessary, to do the same for the other various types of data files (e.g., single-molecule force-extension data from laser tweezer data, particle tracking data from DNA curtain experiments, etc.) and (ii) include validation information/reports of the kind described by Reviewer 1.

2) The authors should comprehensively test and troubleshoot all of the functions of the platform using different computers and operating systems to ensure the robust performance of the platform. Reviewer 2 and her student have kindly provided a detailed, but not exhaustive, list of technical problems that they encountered and that should be resolved.

3) The sample dataset of smFRET trajectories from Hellenkamp et al., (2017) that is analyzed for the work reported in this manuscript is limited in the sense that the trajectories do not exhibit transitions between different FRET efficiency states as a function of time. Given that the majority of smFRET studies typically contain trajectories that do exhibit such transitions, it is important that the authors include a second sample dataset of smFRET trajectories that do exhibit such transitions as a function of time. The authors are encouraged to analyze an already existing dataset from their own laboratory or a collaborator's laboratory or, if that is somehow not possible, to record such a dataset de novo. When adding these new analyses and revising the smFRET section of the manuscript, the authors should take care to also address Reviewer 1's comments regarding the smFRET example(s).

4) The authors should revise the manuscript to include: (i) a beginners-level, practical discussion of why a platform such as Mars is needed to increase the reproducibility of data storage and processing practices (including examples of real-world problems that Mars would address, minimize, or eliminate) and (ii) an easy-to-follow, brief guide explaining the use of Mars within the manuscript aimed at readers who may not necessarily be ready to use the platform or the associated Jupyter notebooks and/or interactive features of the website.

5) The step-by-step instructions for using Mars that are included on the website should be expanded to include further details, particularly in terms of the output files, as well as to include a general troubleshooting guide for users.

Reviewer #1 (Recommendations for the authors):

This report constitutes the review of the manuscript titled "Mars, a molecule archive suite for reproducible analysis and reporting of single molecule properties from bioimages", by Huisjes, Retzer et al. With the ever increasing interest in these techniques, accompanied by ever increasing experimental schemes and analytical frameworks it becomes difficult to report such experimental results in a manner that is universal, one which can adapt to any type of experiment and analysis, one which assists in reproducibility, and one which will keep the data as compact as possible, yet its usage as efficient as possible. This work introduces a novel platform for the reproducible archiving of camera-based single-molecule imaging experiments. This work will appeal to practitioners of single-molecule imaging experiments, both experienced and newbies. The readers of this work would benefit from understanding how to employ a rational data archiving process using Mars, following three examples the authors provide, which exhibit the generality of the platform and its ease of use. Readers who might want to employ Mars for their own single-molecule imaging measurements can also experience an intuitive guide on Mars GitHub or in Jupyter Notebooks the authors provide. However, I believe that it would even be better if the authors could provide a guidance chapter in the manuscript itself, in addition and before sending the readers to the guide online – this way, readers would already have an idea of what to expect when they try constructing their own Mars data archiving.

I judge that for the most part, the manuscript shows concisely and clearly how to use Mars for proper archiving of single-molecule imaging data, in a rational manner, that can be easily read, well understood, without loss of information and with the ability to easily track back and perform changes. The manuscript is very well polished, the figures provide a self-explanatory graphical guide – overall I enjoyed reading it and see how much this tool can advance the field. The importance of this work is very clear to experienced practitioners, who already know what it takes to provide extensive reports on their results, after being analyzed, corrected, filtered and analyzed in one combination of ways out of many others. Although I am very positive about this manuscript, I have a few suggestions for the authors:

1. Readers of the manuscript who are not yet experienced with single-molecule imaging could benefit from a pragmatical discussion that will explain the need for Mars for reproducibility and/or readability of the data: the different types of filtrations a practitioner may perform, all are context dependent, and what might happen if those are not properly documented and annotated. I suggest providing user examples of what might go wrong, and how Mars minimizes such cases.

2. The authors provide a very nice interactive as well as notebook guidance for readers of the manuscript who might want to get acquainted with Mars. However, I think readers of the manuscript would benefit from such a guide already in the manuscript, which will assist them in understanding what to expect when they move to try using Mars. This way, readers who are not yet ready to test Mars, would already be able to follow the guide in text, and then decide if they move to test Mars in action for their datasets.

3. The analysis pipeline provides a nice abstraction all the way from the raw data to the filtered information. However, another layer of validation should be added afterwards. For example, procedures that test all the filtered smFRET data for features that should exist in the data: (1) sum of all donor-excited signals should be constant after corrections were applied; (2) sum of all intensities should be constant, after corrections were applied, (3) transitions between states (FRET dynamics, or photobleaching) should exhibit donor and acceptor anti-correlation; (4) transitions between Stoichiometry states should exhibit donor and acceptor anti-correlation. Adding a filtered data validation procedure will enhance the procedure even better, and provide another layer of credibility, perhaps even in an automated report as an outcome of the validation step applied on all molecules.

4. Comments on the smFRET example:

a. The authors chose to show the TIRF camera-based smFRET data from Hellenkamp et al., (2017), of a static mixture of two DNA FRET constructs with two different mean FRET efficiencies. The procedure they describe was designed to fit the treatment of the static mixture of two types of immobilized molecules. Therefore, the analysis pipeline employed the change point finder only for the identification of photo-bleaching steps, and not for identifying dynamic FRET transitions. While this might be fine for the static heterogeneity of this sample, for the analysis framework to be as generic as possible for analyzing smFRET measurements, it is important to include a step of identification of state dwells within each single molecule FRET trajectory. Importantly, the FRET calculations should be performed on dwells that have been proven to belong to a single state.

b. The analysis pipeline was designed to filter out traces with large intensity changes that are not due to photo-bleaching. Large intensity changes might be interesting if one of the FRET dyes experiences quantum-yield changes that are unrelated to FRET (e.g., smFRET involving Cy3 and Cy5). It could be interesting in the proper context, and hence should allow the user to change the data rejection criteria, in a context-dependent manner.

c. Correction factors 1: the text relies on the correction factors presented by the multi-lab work of Hellenkamp et al., (2017). That approach uses the Lee et al., (2005) approach for calculating the γ correction factor, which relies on the linear dependence of the inverse of the mean stoichiometry on the mean proximity ratio, of multiple single-population smFRET-ALEX measurements. While this correction factor procedure is used by many and is very common, it does suffer from assuming that both donor and acceptor fluorescence quantum yields change in the same manner when positioning the same donor and acceptor dyes at different positions along the same molecule (a dsDNA in this case). This might not be true, as different labelling positions introduce different dye microenvironments, which may introduce different quenching rates that are unrelated to FRET. In other words, it is possible (and not uncommon at all) that different molecules with different FRET values will also be associated with different γ factors. There are alternatives that can assist in calculating per-population γ factors, which require knowledge of the donor and acceptor fluorescence lifetimes per population. However, this is not yet possible in immobilized TIRF camera-based smFRET. Therefore, in page 10 of the bioRxiv preprint, when the authors present the γ correction factor, I suggest they add: “… , under the assumption that placing the same dyes at different bases along the DNA molecule does not influence the ratio of the acceptor and donor fluorescence quantum yields”.

d. Correction factors 2: the text relies on the correction factors presented by the multi-lab work of Hellenkamp et al., (2017). That approach uses the Lee et al., (2005) approach for calculating the γ correction factor. However, Hellenkamp et al., (2017) and not Lee et al., (2005) is cited. I ask the authors to cite Lee et al., (2005), when providing the explanation on the β- and γ-factor calculations.

e. When explaining the FRET-related calculations, sometimes the variables are not explained: (i) in steps 4 and 5, explicitly explain the readers what are FA|A and FD|D; (ii) "FA|D stores the fully corrected acceptor fluorescence intensity value"; (iii) explain for FRET novices what do the variables in equations 1 and 2 mean.

f. At the bottom of page 6 of the bioRxiv preprint, the authors provide their estimations of the FRET efficiency values, and it seems that the accuracy is high, if comparing the values to the mean values reported by Hellenkamp et al., (2017). However, the precision of the reported values (E1-lo = 0.14 ± 0.13 and E1-mid = 0.51 ± 0.09) are low compared to the precisions in Hellenkamp et al., (E1-lo = 0.15 ± 0.02 and E1-mid = 0.56 ± 0.03). Why such imprecisions? Please explain the readers in the text.

Reviewer #2 (Recommendations for the authors):

Technical issues encountered:

1. Overall the step-wise tutorial on the website is well organized, but can be further improved. Particularly, the expected results/output files/windows should also be included, similar as the parameter setting for the input part. Sometimes it’s unclear what the correct output format should be from each operation.

2. We noticed Mac and Windows systems could encounter different issues. Taking the FRET analysis as an example, we had to switch between the Windows and Mac systems to get it through completely, because certain steps worked in one but not the other. For example:

(1) When we tried to convert the example FRET video to contain channel information using script 1, the Windows system generated a blank image. However, it ran correctly on Mac.

(2) There was some problem with transforming the ROIs to the right part of the split view on a Mac system. We used the same parameters as the instruction, but all the coordinates disappeared after doing Transform ROIs. But it worked well on a Windows system.

(3) On a Mac system, after running peak tracer or molecular integrator, we were not able to generate a GUI interface containing the actual output, but a console saying the analysis is successful.

3. It’s very confusing that the output file of the converted FRET data is a composite image of two channels. It is not explained what each of the channels is until explaining the output file of the molecular integrator step. Echoing my point 1, it is necessary to explain to the users the format of the output files. The composite image makes it appear that two channels are the same without any explanation, and each channel is the sum of the red and green signal.

https://doi.org/10.7554/eLife.75899.sa1

Author response

Essential revisions:

1) The authors should further develop the Mars platform to: (i) expand the flexibility of the smFRET input data files in the manner described by Reviewer 2 and, if necessary, to do the same for the other various types of data files (e.g., single-molecule force-extension data from laser tweezer data, particle tracking data from DNA curtain experiments, etc.) and (ii) include validation information/reports of the kind described by Reviewer 1.

(i) We have revised Mars to expand the flexibility of the smFRET input types as requested by Reviewer #2:

1. We added support for multiview setups having more than a dualview. The Transform ROIs and Molecule Integrator (multiview) commands now support tripleview, quadview, and more.

2. We added two new example smFRET workflows using a new dataset we collected that has dynamic FRET from a Holliday junction substrate. One example demonstrates a complete ALEX analysis workflow of dynamic smFRET data with alternating red and green pulses. (https://duderstadt-lab.github.io/mars-docs/examples/Dynamic_FRET/)

3. As requested by the reviewer, the second new example demonstrates smFRET analysis when there is no direct acceptor excitation. In this case, we provide new scripts that perform molecule-by-molecule γ calculation and added several other modifications for this type of input data. (https://duderstadt-lab.github.io/mars-docs/examples/No_aex_FRET/)

4. We have added instructions for processing smFRET data collection using setups with two different cameras.

Expand the flexibility of Mars input types to include single-molecule force-extension data from laser tweezer data, particle tracking from DNA curtain experiments:

1. We created a new repository called mars-lumicks (https://github.com/duderstadt-lab/mars-lumicks) with an import command for lumicks h5 optical tweezer datasets. This import command reads the h5 file created by the widely used lumicks optical tweezer setups and converts it into a Molecule Archive. The Molecule Archive with the converted data then opens directly in Fiji. This new import command is now included in the Mars update site that is part of normal Mars installations. Detailed instructions on use can be found in the repository readme.

2. Many performance enhancements were required in mars-fx (the Mars GUI we call the Mars Rover) to nicely handle the converted lumicks datasets, which had substantially higher time resolution than typical mars datasets. For example, one dataset has 10s of millions of points, so a new down sampler was added to allow seamless zooming in and out. Memory handling was also improved to better support these datasets.

3. We created a new repository called mars-smd (https://github.com/duderstadt-lab/mars-smd) with an import command for the Single-molecule Dataset (SMD) format. This format was proposed for single-molecule FRET datasets. The importer was tested on the example data provided in the original data format specification. We failed to locate newer datasets with this format for further testing. We hope users will get in touch if they have this data type and would like to use Mars.

4. The Mars Peak Tracker supports particle tracking from DNA curtain experiments. The tracking example provided on the mars-docs website can be used for these datasets, the only difference compared to the sample data provided is the more compact organization of DNA molecules in the DNA curtains.
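The down sampler mentioned in item 2 above is not described in detail here. As a rough illustration only, one common approach for charting very long traces is min/max bucket decimation, sketched below in Python. This is an assumption about the general technique, not the algorithm used in mars-fx, and the function name `minmax_downsample` is hypothetical.

```python
def minmax_downsample(xs, ys, n_buckets):
    """Reduce a long trace by keeping the min and max y of each bucket.

    Illustrative sketch only; not the Mars implementation.
    """
    if n_buckets <= 0:
        raise ValueError("n_buckets must be positive")
    n = len(ys)
    # Nothing to gain if the trace is already small enough.
    if n <= 2 * n_buckets:
        return list(xs), list(ys)
    out_x, out_y = [], []
    for b in range(n_buckets):
        lo = b * n // n_buckets
        hi = (b + 1) * n // n_buckets
        idx = range(lo, hi)
        i_min = min(idx, key=lambda i: ys[i])
        i_max = max(idx, key=lambda i: ys[i])
        # Emit the two extremes in their original x order.
        for i in sorted({i_min, i_max}):
            out_x.append(xs[i])
            out_y.append(ys[i])
    return out_x, out_y


# Hypothetical 12-point trace reduced to 4 buckets (at most 8 points).
xs = list(range(12))
ys = [0, 5, 1, 2, 2, 9, 3, -4, 3, 7, 6, 8]
small_x, small_y = minmax_downsample(xs, ys, 4)
```

Keeping both extremes per bucket preserves spikes that plain striding would drop, which matters when zooming out on traces with tens of millions of points.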

Reviewer #2 also asked how Mars could handle raw data in different formats. For example, with red and green channels as separate videos. As a start, we improved the script we wrote for conversion of the Hellenkamp et al. Nature Methods (2018) dataset that adds the channel information Mars expects to the raw video (https://github.com/duderstadt-lab/mars-tutorials/blob/master/Example_workflows/FRET/scripts/static_FRET_reformat_video.groovy). In addition, we tested Mars with more video formats. Mars can support all formats that can be opened in Fiji. This includes the hundreds of formats supported by Bio-Formats. We also previously wrote a micro-manager 1.4 and 2.0 reader that is included with Mars and nicely extracts all metadata from these kinds of files (https://github.com/duderstadt-lab/mars-scifio). This includes the new Holliday junction sample data we collected with micro-manager 2.0.

When writing Mars, we made a huge effort to fully comply with and support the OME format. We strongly feel that members of the single-molecule imaging community should make a greater effort to collect data with this common and universally understood format. However, we understand this is unlikely to happen anytime soon.

Nevertheless, we are confident that videos collected with any strategy can be converted to a format Mars understands due to the robustness of the OME format. For the specific case of separate red and green videos mentioned by Reviewer#2, both videos could be opened in Fiji and combined as different channels using the Merge Channels ImageJ command (Image>Color>Merge Channels). The scripts and instructions included with Mars can help in conversion of other video types. However, without specific sample data to use for testing, we can only guess about the many different methods people could use for collecting their data when they decide to ignore common image format standards generally accepted in the microscopy community. We are very happy to provide further conversion scripts that accept other collection types we have not yet seen. Mars is a community partner on the Scientific Community Image Forum. This is the perfect place for users to post about different video input types and we can help provide conversion scripts.

We have also added a general discussion of supported image formats on the website in the documentation section under ‘Supported Image Formats’ to further address this comment (https://duderstadt-lab.github.io/mars-docs/docs/). This includes all types supported by Fiji, but some workflows depend on using multiple channels that separate excitation/filter differences. We provide instructions for converting where required.

(ii) Include validation information/reports of the kind described by Reviewer #1:

We have added validation information and a Jupyter notebook validation report based on the suggestions from Reviewer #1 to the smFRET example workflows and scripts. In particular, we now calculate the coefficient of variation of the sum of the donor and acceptor signals and of the sum of all signals to check for the expected constant fluorescence. We calculate the Pearson correlation coefficient in the FRET region to check for anti-correlation of the donor and acceptor fluorescence. We check that the E and S values match expected theoretical values for distinct regions of the FRET trace. These metrics have been added by the smFRET scripts as molecule parameters and are then used in the Jupyter notebook validation report. The report notebook can be used with any dataset processed with the smFRET scripts we provide. We have generated sample notebooks for all three smFRET workflows based on the sample Molecule Archives we provide in the mars-tutorials repository:

dynamic FRET example:

https://github.com/duderstadt-lab/mars-tutorials/blob/master/Example_workflows/FRET/dynamic/dynamic_FRET_validation_report.ipynb

dynamic FRET example no acceptor excitation:

https://github.com/duderstadt-lab/mars-tutorials/blob/master/Example_workflows/FRET/no_acceptor_excitation/no_acceptor_excitation_FRET_validation_report.ipynb

static FRET example:

https://github.com/duderstadt-lab/mars-tutorials/blob/master/Example_workflows/FRET/static/static_FRET_validation_report.ipynb

All these notebooks render nicely when viewed in Firefox on the GitHub website without the need to install and run them locally (we noticed Safari doesn’t render them as nicely due to issues with the nbviewer developed on GitHub). They can be run with other datasets just by changing the file paths at the top.

We have included a new Figure 5 —figure supplement 1 with the results of validation for the new Holliday junction dataset.

These many new features and performance enhancements have greatly improved Mars. We were happy the editors and reviewers asked for these changes.

2) The authors should comprehensively test and troubleshoot all of the functions of the platform using different computers and operating systems to ensure the robust performance of the platform. Reviewer 2 and her student have kindly provided a detailed, but not exhaustive, list of technical problems that they encountered and that should be resolved.

We are very grateful to Reviewer #2 and her student for reporting the issues they encountered. We were very sad to hear about these problems. We have resolved the technical problems they encountered. In the process, we have also completely remade the smFRET workflows to try and avoid further issues. This involved streamlining all the steps and creating a uniform set of scripts for all workflows. We also rewrote the static FRET conversion script that gave problems (https://github.com/duderstadt-lab/mars-tutorials/blob/master/Example_workflows/FRET/scripts/static_FRET_reformat_video.groovy). We have tested the workflows on Mac and Windows. A new member of our lab has also been testing Mars on Windows more extensively for several months.

To help further with these issues and make it clear what is expected for different commands, we have created a YouTube channel (https://www.youtube.com/channel/UCkkYodMAeotj0aYxjw87pBQ) that has video tutorials for working with Mars. Many are short tutorials, but we have also added an in-depth video going through the dynamic FRET example step-by-step (https://www.youtube.com/watch?v=JsyznI8APlQ). This addresses a comment by reviewer #1 asking for further information for users that are not yet ready to try Mars. In that case, they can now watch the videos to see exactly how it works in use. We hope these videos will also help with issues because steps can now be precisely followed.

Despite all our efforts, we imagine there are computer configurations we have not yet tested on which new issues could occur. As mentioned above, Mars is a community partner on the Scientific Community Image Forum. This is the perfect place for users to post about problems they encounter or errors they see.

3) The sample dataset of smFRET trajectories from Hellenkamp et al., (2017) that is analyzed for the work reported in this manuscript is limited in the sense that the trajectories do not exhibit transitions between different FRET efficiency states as a function of time. Given that the majority of smFRET studies typically contain trajectories that do exhibit such transitions, it is important that the authors include a second sample dataset of smFRET trajectories that do exhibit such transitions as a function of time. The authors are encouraged to analyze an already existing dataset from their own laboratory or a collaborator's laboratory or, if that is somehow not possible, to record such a dataset de novo. When adding these new analyses and revising the smFRET section of the manuscript, the authors should take care to also address Reviewer 1's comments regarding the smFRET example(s).

This is an excellent suggestion. We recorded a new dataset de novo in our laboratory to address this point. We chose a Holliday junction substrate very similar to the one reported by Hyeon et al., 2012, Nat. Chem. (https://www.nature.com/articles/nchem.1463) that exhibits dynamic changes between low FRET (iso-II) and high FRET (iso-I) conformations on the 100s of milliseconds timescale. We collected on many fields of view in ALEX format with our dualview microscope. We have deposited the new dataset on zenodo (https://doi.org/10.5281/zenodo.6659531) to make it publicly available. We developed two new smFRET example workflows using the new sample data:

(https://duderstadt-lab.github.io/mars-docs/examples/Dynamic_FRET/) and

(https://duderstadt-lab.github.io/mars-docs/examples/No_aex_FRET/). In the second one we ignore the direct acceptor excitation information to provide scripts needed to process datasets where that hasn’t been collected. We also provided a script for kinetic analysis of dwell time distributions and revised the smFRET workflow text of the manuscript accordingly to include a description of the newly collected dataset. We have included a new Figure 5 showing the results of analysis of the new datasets and included the experimental details in the methods section of the revised manuscript.

4) The authors should revise the manuscript to include: (i) a beginners-level, practical discussion of why a platform such as Mars is needed to increase the reproducibility of data storage and processing practices (including examples of real-world problems that Mars would address, minimize, or eliminate) and

This is a great suggestion. In this revised manuscript, we have included a new section right after the Introduction called “Common pitfalls in single-molecule image-processing workflows” that provides a beginners-level, practical discussion of why Mars is needed and how it addresses real world reproducibility and data storage issues specific to single-molecule datasets.

(ii) an easy-to-follow, brief guide explaining the use of Mars within the manuscript aimed at readers who may not necessarily be ready to use the platform or the associated Jupyter notebooks and/or interactive features of the website.

We understand that many readers of our manuscript might not be ready to try mars and read through the other documentation and example materials we have created. For these readers, we see that a brief guide explaining the use of Mars would be helpful. We have expanded the ‘Commands for image processing and biomolecule analysis’ to include such a guide. There are limitless ways to combine the commands. We have also created a YouTube channel with tutorials and example workflow videos (https://www.youtube.com/channel/UCkkYodMAeotj0aYxjw87pBQ). We think this might be the very best way for people to quickly get a feeling for the mechanics of Mars.

5) The step-by-step instructions for using Mars that are included on the website should be expanded to include further details, particularly in terms of the output files, as well as to include a general troubleshooting guide for users.

Based on the feedback from the reviewers, we have extensively revised and updated the Mars documentation website. We have included a new FAQ page to help with troubleshooting and more extensive troubleshooting sections in the smFRET workflow examples. We also created a YouTube channel that has video tutorials and workflow examples to help clarify the expected inputs and outputs.

As mentioned above, it is very hard for us to anticipate all problems that could arise depending on the dataset used and computer configuration. The Scientific Community Image Forum is the very best place for many of these discussions. Forum posts are public and provide very helpful troubleshooting information for all users. Over time, as more posts are made, a much broader range of questions is addressed. We can only hope users will reach out for help there.

Reviewer #1 (Recommendations for the authors):

This report constitutes the review of the manuscript titled "Mars, a molecule archive suite for reproducible analysis and reporting of single molecule properties from bioimages", by Huisjes, Retzer et al. With the ever increasing interest in these techniques, accompanied by ever increasing experimental schemes and analytical frameworks it becomes difficult to report such experimental results in a manner that is universal, one which can adapt to any type of experiment and analysis, one which assists in reproducibility, and one which will keep the data as compact as possible, yet its usage as efficient as possible. This work introduces a novel platform for the reproducible archiving of camera-based single-molecule imaging experiments. This work will appeal to practitioners of single-molecule imaging experiments, both experienced and newbies. The readers of this work would benefit from understanding how to employ a rational data archiving process using Mars, following three examples the authors provide, which exhibit the generality of the platform and its ease of use. Readers who might want to employ Mars for their own single-molecule imaging measurements can also experience an intuitive guide on Mars GitHub or in Jupyter Notebooks the authors provide. However, I believe that it would even be better if the authors could provide a guidance chapter in the manuscript itself, in addition and before sending the readers to the guide online – this way, readers would already have an idea of what to expect when they try constructing their own Mars data archiving.

I judge that to the most part, the manuscript shows concisely and clearly how to use Mars for proper archiving of single-molecule imaging data, in a rational manner, that can be easily read, well understood, without loss of information and with the ability to easily track back and perform changes. The manuscript is very well polished, the figures provide a self-explanatory graphical guide – overall I enjoyed reading it and see how much this tool can advance the field. The importance of this work is very clear to experienced practitioners, who already know what it takes to provide extensive reports on their results, after being analyzed, corrected, filtered and analyzed in one combination of ways out of many others. Although I am very positive about this manuscript, I have a few suggestions for the authors:

1. Readers of the manuscript who are not yet experienced with single-molecule imaging could benefit from a pragmatical discussion that will explain the need for Mars for reproducibility and/or readability of the data: the different types of filtrations a practitioner may perform, all are context dependent, and what might happen if those are not properly documented and annotated. I suggest providing user examples of what might go wrong, and how Mars minimizes such cases.

We are very happy the reviewer appreciates the central goal of Mars to create a common format that helps to enforce reproducible analysis workflows that can easily be shared with others. We very much appreciate the positive feedback on the work that has been a long-term dream of our lab.

In this revised manuscript, we have included a new section right after the Introduction called “Common pitfalls in single-molecule image-processing workflows” that provides a beginners-level, practical discussion of why Mars is needed and how it addresses real world reproducibility and data storage issues specific to single-molecule datasets. We kept the explanation as brief as possible while still addressing the comment. Several pages could be devoted to this topic, but we would surely lose the reader in the process.

We thank the reviewer for this suggestion. This improves the manuscript for the broader audience beyond single-molecule experts.

2. The authors provide a very nice interactive as well as notebook guidance for readers of the manuscript who might want to get acquainted with Mars. However, I think readers of the manuscript would benefit from such a guide already in the manuscript, which will assist them in understanding what to expect when they move to try using Mars. This way, readers who are not yet ready to test Mars, would already be able to follow the guide in text, and then decide if they move to test Mars in action for their datasets.

We understand that many readers of our manuscript might not be ready to try mars and read through the other documentation and example materials we have created. For these readers, we see that a brief guide explaining the use of Mars would be helpful. We have revised the “Commands for image processing and biomolecule analysis” section to include a brief guide that describes the use of Mars starting from using menus and dialogs. We have also created a YouTube channel with tutorials and example workflow videos (https://www.youtube.com/channel/UCkkYodMAeotj0aYxjw87pBQ). We think this might be the very best way for people to quickly get a feeling for the mechanics of Mars without using it themselves or looking at the example pages.

As noted by the reviewer, we have put tremendous effort into the Mars documentation website and materials included in the mars-tutorials repository. These serve as living documentation for Mars that goes far beyond the scope of what can be covered in the manuscript, which is focused on introducing the conceptual framework for Mars. Therefore, we have included this new section in the methods of the revised manuscript. We feel this is the most appropriate place for these details if they are to be included in the manuscript.

3. The analysis pipeline provides a nice abstraction all the way from the raw data to the filtered information. However, another layer of validation should be added afterwards. For example, procedures that test all the filtered smFRET data for features that should exist in the data:

This is a fantastic suggestion that we very much appreciate. We have revised the smFRET workflow scripts to include additional table columns and molecule parameters that provide validation information. We have also written a validation Jupyter notebook that assesses all the key criteria described below. We have included a new Figure 5 —figure supplement 1 with the validation results from the new dynamic smFRET dataset. We explain our validation calculations point-by-point below.

1) sum of all donor-excited signals should be constant after corrections were applied;

In the smFRET scripts used for calculating all the correction factors we have added a section that introduces a new table column called SUM_Dex that is the sum of the donor and acceptor signals. To assess whether it is constant, we tested many different metrics and finally settled on the coefficient of variation. We calculate the coefficient of variation for the FRET region and add this as a new molecule parameter called SUM_Dex_FRET_Coefficient_of_Variation. Valid traces seem to have values in the 0 to 0.4 range.

Author response image 1
Histogram comparing the Coefficient of Variation of SUM_Dex for Accepted FRET molecules compared to all FRET molecules for the dynamic FRET example.

Lower values reflect valid constant signal. The SUM_Dex can also be plotted and directly inspected using the Mars Rover.
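As a rough illustration of this metric, the coefficient of variation is the standard deviation divided by the mean of the summed signal over the FRET region. The Python sketch below assumes plain lists of intensities and a population standard deviation; the function names are hypothetical and the actual Groovy script may differ in detail.

```python
from statistics import mean, pstdev


def coefficient_of_variation(values):
    """Return std / mean for a sequence of intensities (population std)."""
    mu = mean(values)
    if mu == 0:
        raise ValueError("mean intensity is zero; CV is undefined")
    return pstdev(values) / mu


def sum_dex_cv(donor, acceptor, start, end):
    """CV of donor + acceptor emission on donor excitation, frames [start, end)."""
    sum_dex = [d + a for d, a in zip(donor[start:end], acceptor[start:end])]
    return coefficient_of_variation(sum_dex)


# Hypothetical anti-correlated trace: the sum stays constant at 1000.
donor = [800, 300, 800, 300, 800, 300]
acceptor = [200, 700, 200, 700, 200, 700]
print(sum_dex_cv(donor, acceptor, 0, 6))  # 0.0
```

A perfectly constant sum gives a value of 0, and noisier or drifting sums give larger values, which is why lower values indicate a more trustworthy trace.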

2) sum of all intensities should be constant, after corrections were applied,

In the smFRET scripts used for calculating all the correction factors we have added a section that introduces a new table column called SUM_signal that is the sum of all signals (donor and acceptor emissions on donor excitation as well as acceptor emission on acceptor excitation). We calculate the coefficient of variation for the FRET region and add this as a new molecule parameter called SUM_signal_FRET_Coefficient_of_Variation. Valid traces seem to have values in the 0 to 0.3 region.

Author response image 2
Histogram comparing the Coefficient of Variation of SUM_signal for Accepted FRET molecules compared to all FRET molecules for the dynamic FRET example.

Lower values reflect valid constant signal. The SUM_signal can also be plotted and directly inspected using the Mars Rover.

3) transitions between states (FRET dynamics, or photobleaching) should exhibit donor and acceptor anti-correlation;

In the smFRET scripts used for calculating all the correction factors we have added a calculation of the Pearson correlation coefficient (comparing donor and acceptor emission due to FRET with donor excitation). We calculate the Pearson correlation coefficient for the FRET region and add this as a new molecule parameter called FRET_Pearsons_Correlation. We see strong anti-correlation for the dynamic FRET example in the range of -1 to -0.25.

Author response image 3
Histogram comparing the Pearson correlation coefficient for donor and acceptor emission during FRET for Accepted FRET molecules compared to all FRET molecules for the dynamic FRET example.

Valid molecules should have strong anti-correlation as seen for the accepted molecules.

As expected, we don’t see anti-correlation for the static FRET example because there are no dynamic changes, so this metric is not useful in that case. We made a note of this in the corresponding validation notebook.
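For readers unfamiliar with the metric, the anti-correlation check can be sketched in a few lines of Python. This is a minimal illustration of the standard Pearson formula applied to hypothetical donor and acceptor lists; the function name is an assumption and it is not the code used in the smFRET scripts.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant signal")
    return cov / sqrt(var_x * var_y)


# Hypothetical dynamic trace: the acceptor rises whenever the donor drops.
donor = [800, 300, 800, 300]
acceptor = [200, 700, 200, 700]
print(pearson(donor, acceptor))  # -1.0
```

For a static trace the fluctuations are dominated by uncorrelated noise, so the coefficient sits near zero, consistent with the note above that the metric is not informative in that case.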

4) transitions between Stoichiometry states should exhibit donor and acceptor anti-correlation.

We have added a section in the validation notebooks that assesses the expected values of S depending on the region of the FRET trace. Here we note that this is somewhat different for TIRF smFRET data as compared to in-solution smFRET data collected from bursts seen in a confocal microscope. We include the region between dye bleach events for this analysis.

Author response image 4
Histograms comparing stoichiometry values for FRET, donor bleaches first, or acceptor bleaches first for Accepted FRET molecules compared to all FRET molecules for the dynamic FRET example.

Regions in between dye bleach events are often very short leading to broader distributions.
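The expected S values per region can be illustrated with the standard apparent (uncorrected) stoichiometry ratio. The Python sketch below deliberately omits the correction factors applied in the workflow, uses made-up intensities, and the function and variable names are hypothetical.

```python
def apparent_stoichiometry(i_dd, i_da, i_aa):
    """Apparent S from donor-excitation and acceptor-excitation intensities.

    i_dd: donor emission on donor excitation
    i_da: acceptor emission on donor excitation
    i_aa: acceptor emission on acceptor excitation
    Correction factors are omitted in this sketch.
    """
    total = i_dd + i_da + i_aa
    if total == 0:
        raise ValueError("no signal in any channel")
    return (i_dd + i_da) / total


# Hypothetical mean intensities for the three trace regions.
regions = {
    "FRET (both dyes active)": (400, 600, 1000),
    "acceptor bleaches first": (1000, 0, 0),
    "donor bleaches first": (0, 0, 1000),
}
s_values = {name: apparent_stoichiometry(*vals) for name, vals in regions.items()}
```

With balanced excitation, S sits near 0.5 while both dyes are active, moves toward 1 once only the donor remains, and toward 0 once only the acceptor remains.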

Adding a filtered data validation procedure will enhance the procedure even better, and provide another layer of credibility, perhaps even in an automated report as an outcome of the validation step applied on all molecules.

The new table columns and molecule parameters are calculated and introduced by one of two scripts depending on the workflow used:

https://github.com/duderstadt-lab/mars-tutorials/blob/master/Example_workflows/FRET/scripts/FRET_workflow_4_alex_corrections.groovy

or

https://github.com/duderstadt-lab/mars-tutorials/blob/master/Example_workflows/FRET/scripts/FRET_workflow_6_corrections_without_aex.groovy

The molecule parameters are then used in the jupyter notebook validation reports. The report notebook can be used with any dataset processed with the smFRET scripts we provide. We have generated sample notebooks for all three smFRET workflows based on the sample Molecule Archives we provide in the mars-tutorials repository:

dynamic FRET example:

https://github.com/duderstadt-lab/mars-tutorials/blob/master/Example_workflows/FRET/dynamic/dynamic_FRET_validation_report.ipynb

dynamic FRET example no acceptor excitation:

https://github.com/duderstadt-lab/mars-tutorials/blob/master/Example_workflows/FRET/no_acceptor_excitation/no_acceptor_excitation_FRET_validation_report.ipynb

static FRET example:

https://github.com/duderstadt-lab/mars-tutorials/blob/master/Example_workflows/FRET/static/static_FRET_validation_report.ipynb

All these notebooks render nicely when viewed in Firefox on the GitHub website without the need to install and run them locally (we noticed Safari doesn’t render them as nicely due to issues with the nbviewer developed on GitHub). They can be run with other datasets just by changing the file paths at the top.

Finally, these provide rough guesses about thresholds for filtering. However, we did not perform any filtering in our analysis workflow, but only plotted the results from our manual assessment of the traces. We include a final script that could be used to filter the data based on several of the criteria we assess in the validation steps. We provide this as a starting point for users that would like to pursue this strategy, but we feel extra care must be taken. We have made a note of this.

Validation filtering script:

https://github.com/duderstadt-lab/mars-tutorials/blob/master/Example_workflows/FRET/scripts/FRET_workflow_7_validation_filter.groovy
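To show the idea behind such a filter, the Python sketch below applies the approximate ranges quoted in this response to molecules represented as plain dictionaries. The dictionary representation and the function name are assumptions for illustration, the thresholds are the rough guesses given above rather than recommended cutoffs, and the linked Groovy script may apply different criteria.

```python
# Approximate ranges quoted in this response; treat them as starting points.
THRESHOLDS = {
    "SUM_Dex_FRET_Coefficient_of_Variation": (0.0, 0.4),
    "SUM_signal_FRET_Coefficient_of_Variation": (0.0, 0.3),
    "FRET_Pearsons_Correlation": (-1.0, -0.25),
}


def passes_validation(params, thresholds=THRESHOLDS):
    """Return True if every required parameter is present and within range."""
    for name, (low, high) in thresholds.items():
        value = params.get(name)
        if value is None or not (low <= value <= high):
            return False
    return True


# Hypothetical molecules, each a dict of parameter name to value.
good = {
    "SUM_Dex_FRET_Coefficient_of_Variation": 0.15,
    "SUM_signal_FRET_Coefficient_of_Variation": 0.12,
    "FRET_Pearsons_Correlation": -0.7,
}
noisy = dict(good, SUM_Dex_FRET_Coefficient_of_Variation=0.9)
accepted = [m for m in (good, noisy) if passes_validation(m)]
```

Because the anti-correlation criterion only applies to dynamic data, a filter for static FRET would need to drop the FRET_Pearsons_Correlation entry from the thresholds.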

Since the initial submission, we have worked very hard to integrate Conda Python 3 support directly into Mars in Fiji without using a Jupyter notebook or switching platforms (https://forum.image.sc/t/fiji-conda/59618). A beta version of this is now working. However, it requires launching Fiji from Python following these instructions (https://github.com/duderstadt-lab/marspylib). This will allow for validation reports to be generated directly in Molecule Archives open in Fiji using Python charting libraries. As part of this work the comments tab now supports multiple documents and rendering of Python fenced code blocks. Also, Python dashboard widgets are supported, so that Python charts can be used directly in the Mars Rover. This is an ongoing collaboration with the PyImageJ/Fiji developers that is not yet finished and has required the creation of several new repositories: (https://github.com/scijava/scripting-python) and (https://github.com/duderstadt-lab/marspylib). Sadly, this is beyond the scope of the current work, but will soon provide a much more powerful reporting toolkit.

We would like to thank the reviewer again for this excellent suggestion. We feel this greatly improved Mars and we learned a lot about smFRET analysis in the process.

4. Comments on the smFRET example:

a. The authors chose to show the TIRF camera-based smFRET data from Hellenkamp et al., (2017), of a static mixture of two DNA FRET constructs with two different mean FRET efficiencies. The procedure they describe was designed to fit the treatment of the static mixture of two types of immobilized molecules. Therefore, the analysis pipeline employed the change point finder only for the identification of photo-bleaching steps, and not for identifying dynamic FRET transitions. While this might be fine for the static heterogeneity of this sample, for the analysis framework to be as generic as possible for analyzing smFRET measurements, it is important to include a step of identification of state dwells within each single molecule FRET trajectory. Importantly, the FRET calculations should be performed on dwells that have been proven to belong to a single state.

We agree! We recorded a new dataset de novo in our laboratory to address this point. We chose a Holliday junction substrate very similar to the one reported by Hyeon et al., 2012, Nat. Chem. (https://www.nature.com/articles/nchem.1463) that exhibits dynamic changes between low FRET (iso-II) and high FRET (iso-I) conformations on the 100s of milliseconds timescale. We collected on many fields of view in ALEX format with our dualview microscope. We have deposited the new dataset on zenodo (https://doi.org/10.5281/zenodo.6659531) to make it publicly available. We developed two new smFRET example workflows using the new sample data:

(https://duderstadt-lab.github.io/mars-docs/examples/Dynamic_FRET/) and

(https://duderstadt-lab.github.io/mars-docs/examples/No_aex_FRET/). In the second one we ignore the direct acceptor excitation information to provide scripts needed to process datasets where that hasn’t been collected. We also provided a script for kinetic analysis of dwell time distributions (https://github.com/duderstadt-lab/mars-tutorials/blob/master/Example_workflows/FRET/scripts/FRET_workflow_5_two_state_dwell_times.groovy) and some suggestions for use of change point and other types of kinetic analysis.

In this case, we used a very simple two-state model based on a threshold to generate the dwell time distributions shown in panel D of Figure 5. For multi-state FRET cases where the number and E values of the states are not known, and their lifetimes are longer than the sampling interval, the change point command included in Mars is ideally suited. The notebooks provide an example of how to calculate dwell time distributions from those results. For all examples there are notebooks showing how the E value distributions are calculated as well as kinetic plots:

For dynamic FRET:

https://github.com/duderstadt-lab/mars-tutorials/blob/master/Example_workflows/FRET/dynamic/dynamic_FRET_example.ipynb

For No aex dynamic FRET:

https://github.com/duderstadt-lab/mars-tutorials/blob/master/Example_workflows/FRET/no_acceptor_excitation/no_acceptor_excitation_FRET_example.ipynb
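The threshold-based two-state idea described above can be sketched as follows. This is a simplified Python illustration with hypothetical names and made-up data, not a translation of the Groovy dwell-time script. It assigns each frame to the low or high state by comparing E to a threshold and discards the first and last dwell of a trace because their true durations are unknown.

```python
def two_state_dwell_times(times, efficiencies, threshold):
    """Return (low_dwells, high_dwells) for a thresholded FRET trace.

    The first and last dwell are skipped because they are truncated by the
    start and end of the observation window.
    """
    states = [e >= threshold for e in efficiencies]
    low, high = [], []
    dwell_start = None  # None until the first transition has been seen
    for i in range(1, len(states)):
        if states[i] != states[i - 1]:
            if dwell_start is not None:
                duration = times[i] - dwell_start
                (high if states[i - 1] else low).append(duration)
            dwell_start = times[i]
    return low, high


# Hypothetical trace: time in frames, E alternating around a 0.5 threshold.
times = list(range(10))
eff = [0.2, 0.2, 0.8, 0.8, 0.8, 0.2, 0.2, 0.8, 0.8, 0.2]
low_dwells, high_dwells = two_state_dwell_times(times, eff, 0.5)
print(low_dwells, high_dwells)  # [2] [3, 2]
```

A fixed threshold only works when the two states are known in advance and well separated, which is the prior knowledge used for the Holliday junction data.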

In the example pages on the Mars documentation website, we have provided some ideas for integrating HMM algorithms commonly used for smFRET analysis into Mars workflows using Python. We created an issue (https://github.com/duderstadt-lab/mars-tutorials/issues/1) to track this and hope users reach out and request this feature. Then we can explore it further. This is a very big topic and we feel it goes far beyond the scope of the current manuscript.

We have revised the smFRET workflow text of the manuscript accordingly to include a description of the newly collected dataset and treatment of dynamic smFRET. We have included new figures showing the results of analysis of the new datasets and included the experimental details in the methods scripts of the revised manuscript.

b. The analysis pipeline was designed to filter out traces with large intensity changes that are not due to photo-bleaching. Large intensity changes might be interesting if one of the FRET dyes experiences quantum-yield changes that are unrelated to FRET (e.g., smFRET involving Cy3 and Cy5). It could be interesting in the proper context, and hence should allow the user to change the data rejection criteria, in a context-dependent manner.

This is an interesting point. Two change point commands are included in Mars. One looks only for the single most likely change point position in the trace and is used to detect the bleaching positions. The other, the normal change point command, detects all state transitions based on a user-provided acceptable false positive rate and the expected noise level. The normal change point command can be used on the whole trace or on marked regions to perform exactly the type of analysis you have mentioned.
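
To make the idea of a single most likely change point concrete, the following is a generic least-squares sketch in Python. It is not the algorithm implemented in the Mars commands, whose statistical test is not described in this response. The function name `single_change_point` is hypothetical, and the sketch assumes a step change in mean intensity, such as a bleaching event.

```python
import numpy as np

def single_change_point(values):
    """Return the index that best splits `values` into two constant-mean segments.

    Illustrative only: the split minimizing the summed squared deviation
    from the two segment means is taken as the change point.
    """
    y = np.asarray(values, dtype=float)
    n = y.size
    if n < 2:
        raise ValueError("need at least two points")
    best_index, best_cost = None, np.inf
    for k in range(1, n):
        left, right = y[:k], y[k:]
        cost = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if cost < best_cost:
            best_index, best_cost = k, cost
    return best_index
```

The returned index is the first frame of the second segment, so for a bleaching trace it marks the first frame after the intensity drop.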

In the case of the new dynamic FRET example dataset, we found that leveraging the prior knowledge that two-state behavior was expected yielded more accurate dwell time distributions using a simple threshold-based script. We have designed the workflows and Mars so that change point analysis can easily be included and used for other types of analysis.

The change point tables generated could be used to develop further specific rejection criteria. The code used for generation of the dwell time distributions provides a starting point. This could also be easily accomplished in a Groovy script. We hope we understood the reviewer's comment properly.

c. Correction factors 1: the text relies on the correction factors presented by the multi-lab work of Hellenkamp et al. (2017). That approach uses the Lee et al., (2005) approach for calculating the γ correction factor, which relies on the linear dependence of the inverse of the mean stoichiometry on the mean proximity ratio, of multiple single-population smFRET-ALEX measurements. While this correction factor procedure is used by many and is very common, it does suffer from assuming that both donor and acceptor fluorescence quantum yields change in the same manner when positioning the same donor and acceptor dyes at different positions along the same molecule (a dsDNA in this case). This might not be true, as different labeling positions introduce different dye microenvironments, which may introduce different quenching rates that are unrelated to FRET. In other words, it is possible (and not uncommon at all) that different molecules with different FRET values will also be associated with different γ factors. There are alternatives that can assist in calculating per-population γ factors, which require knowledge of the donor and acceptor fluorescence lifetimes per population. However, this is not yet possible in immobilized TIRF camera-based smFRET. Therefore, in page 10 of the bioRxiv preprint, when the authors present the γ correction factor, I suggest they add: "… , under the assumption that placing the same dyes at different bases along the DNA molecule does not influence the ratio of the acceptor and donor fluorescence quantum yields".

We thank the reviewer for pointing out this important assumption in our analysis. We have added the suggested text to the description of the corrections on page 10.
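
For readers unfamiliar with the procedure under discussion, the following Python sketch shows how β and γ can be obtained from a linear fit of the inverse mean stoichiometry against the mean proximity ratio. It assumes the commonly stated relation 1/S = a + b·E with β = a + b − 1 and γ = (a − 1)/(a + b − 1), where E and S are the apparent population means after background, leakage, and direct excitation corrections. The function name `beta_gamma_from_fit` is hypothetical, and the sketch inherits the assumption quoted above: a single γ shared by all populations.

```python
import numpy as np

def beta_gamma_from_fit(e_app, s_app):
    """Fit 1/S = a + b*E across populations and return (beta, gamma).

    `e_app` and `s_app` hold one mean apparent E and S value per population;
    at least two populations with different E values are required.
    """
    e = np.asarray(e_app, dtype=float)
    inv_s = 1.0 / np.asarray(s_app, dtype=float)
    b, a = np.polyfit(e, inv_s, 1)  # polyfit returns slope first, then intercept
    beta = a + b - 1.0
    gamma = (a - 1.0) / (a + b - 1.0)
    return beta, gamma
```

If the dye microenvironment differs between labeling positions, the populations no longer fall on one line, and the fitted γ is only an effective average.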

d. Correction factors 2: the text relies on the correction factors presented by the multi-lab work of Hellenkamp et al., (2017). That approach uses the Lee et al., (2005) approach for calculating the γ correction factor. However, Hellenkamp et al., (2017) and not Lee et al., (2005) is cited. I ask the authors to cite Lee et al., (2005), when providing the explanation on the β- and γ-factor calculations.

This was an oversight on our part. We have added the Lee et al., (2005) reference in the revised manuscript. We thank the reviewer for pointing this out.

e. When explaining the FRET-related calculations, sometimes the variables are not explained: (i) in steps 4 and 5, explicitly explain the readers what are FA|A and FD|D; (ii) "FA|D stores the fully corrected acceptor fluorescence intensity value"; (iii) explain for FRET novices what do the variables in equations 1 and 2 mean.

Thank you for pointing out these errors. We have included more explanations about each of the variables introduced in the FRET calculations section of the revised manuscript.

f. At the bottom of page 6 of the bioRxiv preprint, the authors provide their estimations of the FRET efficiency values, and it seems that the accuracy is high, if comparing the values to the mean values reported by Hellenkamp et al. (2017). However, the precision of the reported values (E1-lo = 0.14 ± 0.13 and E1-mid = 0.51 ± 0.09) is low compared to the precision in Hellenkamp et al. (E1-lo = 0.15 ± 0.02 and E1-mid = 0.56 ± 0.03). Why such imprecision? Please explain this to the readers in the text.

First of all, we apologize for a mistake in the text where these values are reported that has added to this confusion. It is not the standard error of the mean we reported, but simply the standard deviation. We reported the standard deviation from a Gaussian fit of the two E value distributions for 1-lo and 1-mid. This precision estimate reflects the breadth of the E value distributions themselves. The wide E value distributions we observe are similar to those of Hellenkamp et al. This can be seen clearly by comparing our E value distributions to the ones displayed in their figures. However, they do not report this standard deviation. Instead, Hellenkamp et al. report the standard deviation of the mean values of E reported by many labs. The individual values used for this calculation of precision each represent the best estimate or center position of the E value peaks obtained for a given lab. As a consequence, the standard deviation of these values is small.

In the process of revising the smFRET examples, we reanalyzed the static FRET dataset with the new workflow and obtained the updated values E1-lo = 0.16 ± 0.11 and E1-mid = 0.53 ± 0.13 (standard deviation). We have revised the manuscript to include these values because they reflect the results from the revised workflow and newly deposited example Molecule Archive in the mars-tutorials repository. We have also included the same values but with the estimate of the standard error of the mean: E1-lo = 0.16 ± 0.01 and E1-mid = 0.53 ± 0.01 for the ~250 molecules in each dataset.
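
The relation between the two precision estimates can be checked with a short calculation, since the standard error of the mean is the standard deviation divided by the square root of the number of molecules. The sketch below uses the standard deviations quoted above and takes N = 250 as a stand-in for the approximate molecule count; the function name is hypothetical.

```python
import math

def standard_error_of_mean(standard_deviation, n_molecules):
    """Standard error of the mean for n independent measurements."""
    return standard_deviation / math.sqrt(n_molecules)

# Standard deviations from the text; N = 250 approximates "~250 molecules"
for label, sd in (("E1-lo", 0.11), ("E1-mid", 0.13)):
    sem = standard_error_of_mean(sd, 250)
    print(f"{label}: SD = {sd:.2f}, SEM = {sem:.2f}")
```

Both values round to 0.01, consistent with the standard errors quoted above.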

We have included an explanation of the differences in precision we observe compared to Hellenkamp et al., due to difference in the calculations performed. We thank the reviewer for pointing this out.

Reviewer #2 (Recommendations for the authors):

Technical issues encountered:

1. Overall the step-wise tutorial on the website is well organized, but can be further improved. Particularly, the expected results/output files/windows should also be included, similar as the parameter setting for the input part. Sometimes it's unclear what the correct output format should be from each operation.

We have revised the workflow to further clarify the input and output of each of the steps. We have created an overview figure and split the process into two phases. The first phase is focused on image processing to obtain three Molecule Archives (FRET, DO, and AO). The second phase performs a series of operations on the merged Molecule Archive with a streamlined set of numbered scripts (https://github.com/duderstadt-lab/mars-tutorials/tree/master/Example_workflows/FRET/scripts). We have included screenshots of the dialogs presented by the scripts and made it clearer what the expected outputs are. We hope the full example video on YouTube clarifies any further questions (https://www.youtube.com/watch?v=JsyznI8APlQ).

2. We noticed Mac and Windows systems could encounter different issues. Taking the FRET analysis as an example, we had to switch between the Windows and Mac systems to get it through completely, because certain steps worked in one but not the other.

We are very sad to hear you had to switch between computer systems. This problem never occurred during our tests. We apologize for the issues. We have worked to improve and address each step you have mentioned below.

One frequent issue people encounter is an out-of-date copy of Fiji with incorrect jars. Therefore, to avoid those issues, we would strongly suggest downloading and installing a new copy of Fiji and updating that with the Mars update site as described in the installation instructions (now in the manuscript as well as on the Mars documentation page). See our point-by-point comments for the other issues below.

For example:

(1) When we tried to convert the example FRET video to contain channel information using script 1, the Windows system generated a blank image. However, it ran correctly on Mac.

This particular script was written using the ImageJ1 API. This could account for the problems you encountered. We rewrote the script using the ImageJ2 API and Imglib2 (https://github.com/duderstadt-lab/mars-tutorials/blob/master/Example_workflows/FRET/scripts/static_FRET_reformat_video.groovy).

This improved the performance of the script and we hope it is now more reliable. We have tested it on Windows and Mac without problems. Just in case, when you observed a blank image, are you sure the script didn't work, or was the contrast just not adjusted? A black image can appear simply because of the wrong contrast settings. This can be adjusted using Image > Adjust > Brightness/Contrast…
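
One way to tell a truly blank frame from one that only looks black is to inspect the pixel values directly. The Python sketch below assumes the frame is already available as a NumPy array, for example when working from a Python environment. The function name `describe_frame` and the percentile limits are hypothetical choices, not part of Mars or ImageJ.

```python
import numpy as np

def describe_frame(pixels, low_pct=0.5, high_pct=99.5):
    """Report whether a frame is truly blank and suggest a display range.

    A frame is treated as blank only if every pixel has the same value.
    Otherwise a percentile-based display range is returned.
    """
    data = np.asarray(pixels, dtype=float)
    lowest, highest = float(data.min()), float(data.max())
    if lowest == highest:
        return {"blank": True, "display_range": None}
    d_low, d_high = np.percentile(data, [low_pct, high_pct])
    return {"blank": False, "display_range": (float(d_low), float(d_high))}
```

A frame whose values span only a narrow range, such as 100 to 110 in a 16-bit image, is reported as not blank even though it would appear black with default display settings.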

(2) There was some problem with transforming the ROIs to the right part of the split view on a Mac system. We used the same parameters as the instruction, but all the coordinates disappeared after doing Transform ROIs. But it worked well on a windows system.

We have done further testing with the Transform ROIs command in the process of revising the smFRET example. We have now also added an inverse transform option, so the coordinates don't need to be changed during the workflow. We hope this addresses the problem. We haven't seen any differences in how the command works when comparing Windows and Mac.

(3) On a Mac system, after running peak tracer or molecular integrator, we were not able to generate a GUI interface containing the actual output, but a console saying the analysis is successful.

This is very unfortunate and unexpected. This can occur if JavaFX or some of the Mars jars are not installed on the Mac system. We have worked really hard with the Fiji development team and now JavaFX is included with Fiji for Mac. We would suggest trying a new copy of Fiji, freshly downloaded from https://fiji.sc with Mars installed as described in the installation instructions. However, these problems should have generated error messages.

If the detection threshold is too high, peaks might not be tracked for many time points, and this could result in no molecules being found with tracks that are long enough. In this case, it will report that the tool ran successfully, but no Molecule Archive will be created because no results were found. Could this have been the issue?

The Molecule Integrator requires a set of ROIs in the ROI Manager to work. Was a set of ROIs added to the ROI Manager using the Peak Finder before running the command? If not, the message in the console should report that no ROIs were found.

The success statement at the bottom of the log after a command has finished just indicates that it ended without obvious serious errors. It doesn't guarantee an output if the settings don't produce one.

3. It's very confusing that the output file of the converted FRET data is a composite image of two channels. It is not explained what each of the channels is until explaining the output file of the molecular integrator step. Echoing my point 1, it is necessary to explain to the users the format of the output files. The composite image makes it appear that two channels are the same without any explanation, and each channel is the sum of the red and green signal.

We now see our mistake in not explaining this clearly. We have revised the documentation to include a better description of the video input with the two channels in the OME section. We have also included a YouTube video for the dynamic FRET example that has a similar video input. In the video we explain the different channels and what information they contain. Thank you very much for pointing out this issue.

https://doi.org/10.7554/eLife.75899.sa2

Article and author information

Author details

  1. Nadia M Huisjes

    Structure and Dynamics of Molecular Machines, Max Planck Institute of Biochemistry, Martinsried, Germany
    Contribution
    Conceptualization, Software, Formal analysis, Investigation, Visualization, Writing - original draft, Writing – review and editing
    Contributed equally with
    Thomas M Retzer
    Competing interests
    No competing interests declared
    ORCID iD: 0000-0002-4182-7229
  2. Thomas M Retzer

    1. Structure and Dynamics of Molecular Machines, Max Planck Institute of Biochemistry, Martinsried, Germany
    2. Physik Department, Technische Universität München, Garching, Germany
    Contribution
    Conceptualization, Software, Formal analysis, Investigation, Visualization, Methodology, Writing - original draft, Writing – review and editing
    Contributed equally with
    Nadia M Huisjes
    Competing interests
    No competing interests declared
    ORCID iD: 0000-0002-7353-2708
  3. Matthias J Scherr

    Structure and Dynamics of Molecular Machines, Max Planck Institute of Biochemistry, Martinsried, Germany
    Contribution
    Conceptualization, Data curation, Software, Investigation, Visualization, Writing – review and editing
    Competing interests
    No competing interests declared
    ORCID iD: 0000-0003-0110-3709
  4. Rohit Agarwal

    1. Structure and Dynamics of Molecular Machines, Max Planck Institute of Biochemistry, Martinsried, Germany
    2. Physik Department, Technische Universität München, Garching, Germany
    Contribution
    Data curation, Software, Formal analysis, Methodology, Writing – review and editing
    Competing interests
    No competing interests declared
  5. Lional Rajappa

    Structure and Dynamics of Molecular Machines, Max Planck Institute of Biochemistry, Martinsried, Germany
    Contribution
    Resources, Investigation
    Competing interests
    No competing interests declared
  6. Barbara Safaric

    Structure and Dynamics of Molecular Machines, Max Planck Institute of Biochemistry, Martinsried, Germany
    Contribution
    Validation, Visualization, Writing – review and editing
    Competing interests
    No competing interests declared
  7. Anita Minnen

    Structure and Dynamics of Molecular Machines, Max Planck Institute of Biochemistry, Martinsried, Germany
    Contribution
    Conceptualization, Methodology, Writing – review and editing
    Competing interests
    No competing interests declared
  8. Karl E Duderstadt

    1. Structure and Dynamics of Molecular Machines, Max Planck Institute of Biochemistry, Martinsried, Germany
    2. Physik Department, Technische Universität München, Garching, Germany
    Contribution
    Conceptualization, Resources, Data curation, Software, Formal analysis, Supervision, Funding acquisition, Validation, Investigation, Visualization, Methodology, Writing - original draft, Project administration, Writing – review and editing
    For correspondence
    duderstadt@biochem.mpg.de
    Competing interests
    No competing interests declared
    ORCID iD: 0000-0002-1279-7841

Funding

European Research Council (ERC-StG-804098)

  • Nadia M Huisjes
  • Karl E Duderstadt
  • Matthias J Scherr
  • Barbara Safaric

Deutsche Forschungsgemeinschaft (SFB863-11166240)

  • Thomas M Retzer
  • Rohit Agarwal

Max Planck Society

  • Lional Rajappa
  • Anita Minnen
  • Karl E Duderstadt

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Acknowledgements

We are grateful to Giovanni Cardone (MPIB imaging facility) and Evelyn Plötz for their helpful feedback on the project. We would like to thank Prof. Thorsten Hugel for providing the Igor source code used for the analysis of smFRET data (Hellenkamp et al., 2018). We would like to thank Andreas Walbrun and Prof. Matthias Rief for providing the sample data used to develop the Mars LUMICKS h5 importer. We are grateful for the recent development efforts in the ImageJ2 and Fiji communities that have greatly enhanced the applications across platforms. We would like to thank Curtis Rueden, Jan Eglinger, Tobias Pietzsch, and Jean-Yves Tinevez for taking the time to answer the many questions that have come up during development and considering our pull requests. We thank Curtis Rueden for helping to configure our repositories for deployment to SciJava maven. This work was funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation)–SFB863–11166240, a starting grant from the European Research Council (Project Number: 804098, REPLISOMEBYPASS), and the Max Planck Society.

Senior Editor

  1. Volker Dötsch, Goethe University, Germany

Reviewing Editor

  1. Ruben L Gonzalez, Columbia University, United States

Reviewers

  1. Eitan Lerner, The Hebrew University of Jerusalem, Israel
  2. Jingyi Fei, University of Chicago, United States

Publication history

  1. Preprint posted: November 26, 2021
  2. Received: November 26, 2021
  3. Accepted: August 19, 2022
  4. Version of Record published: September 13, 2022 (version 1)

Copyright

© 2022, Huisjes, Retzer et al.

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.

Cite this article

  1. Nadia M Huisjes
  2. Thomas M Retzer
  3. Matthias J Scherr
  4. Rohit Agarwal
  5. Lional Rajappa
  6. Barbara Safaric
  7. Anita Minnen
  8. Karl E Duderstadt
(2022)
Mars, a molecule archive suite for reproducible analysis and reporting of single-molecule properties from bioimages
eLife 11:e75899.
https://doi.org/10.7554/eLife.75899
