Bayesian machine learning analysis of single-molecule fluorescence colocalization images

  1. Yerdos A Ordabayev
  2. Larry J Friedman
  3. Jeff Gelles (corresponding author)
  4. Douglas L Theobald (corresponding author)
  1. Department of Biochemistry, Brandeis University, United States
9 figures, 5 tables and 7 additional files

Figures

Figure 1
Example CoSMoS experiment.

(A) Experiment schematic. DNA target molecules labeled with a blue-excited fluorescent dye (blue star) are tethered to the microscope slide surface. RNA polymerase II (Pol II) binder molecules …

Figure 2 with 2 supplements
Depiction of the cosmos probabilistic image model and model parameters.

(A) Example AOI image (from Data set A in Table 1). The AOI image is a matrix of 14 × 14 pixel intensities which is shown here as both a 2-D grayscale image and as a 3-D intensity plot. The image …

Figure 2—figure supplement 1
Extended graphical representation of the generative probabilistic model.

Directed factor graph representation (Bishop, 2006) of model parameters and parameter distributions. This diagram is a more complete version of the graphical model shown in Figure 2D; it includes …

Figure 2—figure supplement 2
Prior distributions for the x and y spot position parameters.

Prior distributions of x and y for specific and non-specific binding. Probability densities for x and y are defined in the range [-(P+1)/2,(P+1)/2] relative to the target molecule and are conditional on the …

Figure 3 with 4 supplements
Tapqir analysis and inferred model parameters.

(A,B) Tapqir was applied to simulated data (lamda0.5 parameter set in Supplementary file 1) (A) and to experimental data (Data set A in Table 1) (B). (A) and (B) each show a short extract from a …

Figure 3—figure supplement 1
Calculated spot probabilities.

The data sets used for panels A and B are identical to those in Figure 3A and B; the first two rows and the p(specific) (green) graph are reproduced from that figure. Blue graphs show the probability of …

Figure 3—figure supplement 2
Reproduction of experimental data by posterior predictive sampling.

Example frames are shown from Data set A (A: SNR = 1.61), Data set B (B: SNR = 3.77), Data set C (C: SNR = 4.23), and Data set D (D: SNR = 3.06) in Table 1. In each panel the top row shows AOI …

Figure 3—figure supplement 3
Tapqir analysis of image data simulated using a broad range of global parameters.

Simulations (see Materials and methods) consist of 16 data sets where values of global parameters (π,λ, σxy, and g) were randomly generated for each data set (Supplementary file 2). Simulated data …

Figure 3—figure supplement 4
Effect of AOI size on analysis of experimental data.

(A) and (B) each show a short extract from a single target location (AOI 163 in (A) and AOI 0 in (B)) from Data set A (Table 1; SNR = 1.61). Tapqir was applied to the data set using AOI image sizes P of 14 × 14 (first row), 10 × 10 (second row), and 6 × 6 (third row) pixels. The corresponding output p(specific) probabilities are plotted in the graph. Image contrasts in (A) and (B) are different. Unattended calculation times on an AMD Ryzen Threadripper 2990WX with an Nvidia GeForce RTX 2080Ti GPU using CUDA version 11.5 for the different AOI sizes were 7 h 40 min (P = 14), 3 h 5 min (P = 10), and 2 h 40 min (P = 6).

Figure 4 with 1 supplement
Tapqir performance on simulated data with different SNRs or different non-specific binding densities.

(A–D) Analysis of simulated data over a range of SNR. SNR was varied in the simulations by changing spot intensity h while keeping other parameters constant (Supplementary file 3). (A) Example …

Figure 4—figure supplement 1
False negative spot misidentifications by Tapqir and the spot-picker method.

The same λ = 1 simulated data set used in Figure 4E–H (lamda1 in Supplementary file 1) was analyzed by Tapqir and spot-picker. The data set contained 418 AOI images containing target-specific …

Figure 5
Tapqir analysis of association/dissociation kinetics and thermodynamics.

(A) Chemical scheme for a one-step association/dissociation reaction at equilibrium with pseudo-first-order binding and dissociation rate constants kon and koff, respectively. (B) A simulation of the …

Figure 6 with 3 supplements
Extraction of target-binder association kinetics from example experimental data.

Data are from Data set B (SNR = 3.77, λ = 0.1575; see Table 1). (A) Probabilistic rastergram representation of Tapqir-calculated target-specific spot probabilities p(specific) (color scale). AOIs were …

Figure 6—figure supplement 1
Additional example showing extraction of target-binder association kinetics from experimental data.

Data are from Data set A (SNR = 1.61, λ = 0.2943; see Table 1). Results are plotted as in Figure 6, except that for clarity only every second frame and every third AOI is shown in (A).

Figure 6—figure supplement 2
Additional example showing extraction of target-binder association kinetics from experimental data.

Data are from Data set C (SNR = 4.23, λ = 0.0876; see Table 1). Results are plotted as in Figure 6, except that for clarity only every tenth frame is shown in (A).

Figure 6—figure supplement 3
Additional example showing extraction of target-binder association kinetics from experimental data.

Data are from Data set D (SNR = 3.06, λ = 0.0437; see Table 1). Results are plotted as in Figure 6, except that for clarity only every thirteenth frame and every second AOI is shown in (A).

Extraction of AOI images from raw images.
Pseudocode representation of cosmos model.
Pseudocode representation of cosmos guide.

Tables

Table 1
Experimental data sets.
Data set size (a) | SNR | π [95% CI] | λ [95% CI] | g [95% CI] | σxy [95% CI] | Compute time
Data set A: Binder, SNAPf-tagged S. cerevisiae RNA polymerase II labeled with DY549; Target, transcription template DNA containing 5× Gal4 upstream activating sequences and CYC1 core promoter; Conditions, yeast nuclear extract supplemented with Gal4-VP16 activator and NTPs. From Rosen et al., 2020.
N = 331, Nc = 526, F = 790 | 1.61 | 0.0951 [0.0936, 0.0966] | 0.2943 [0.2924, 0.2963] | 6.645 [6.643, 6.647] | 0.577 [0.573, 0.580] | 7 h 40 m (b); 3 h 50 m (c)
Data set B: Binder, 0.1 nM E. coli σ54 RNA polymerase labeled with Cy3; Target, 852 bp DNA containing the glnALG promoter; Conditions, physiological buffer, no NTPs. From (Fig. 1E) of Friedman et al., 2013.
N = 102, Nc = 127, F = 4407 | 3.77 | 0.0846 [0.0835, 0.0857] | 0.1575 [0.1569, 0.1583] | 11.861 [11.856, 11.865] | 0.476 [0.474, 0.479] | 7 h 40 m (b)
Data set C: Binder, 0.4 nM E. coli σ54 RNA polymerase labeled with Cy3; Target, 3,591 bp DNA containing the glnALG promoter; Conditions, physiological buffer, no NTPs. From (Fig. 3D) of Friedman et al., 2013.
N = 122, Nc = 157, F = 3855 | 4.23 | 0.0267 [0.0262, 0.0273] | 0.0876 [0.0869, 0.0883] | 16.777 [16.773, 16.782] | 0.404 [0.399, 0.408] | 9 h 15 m (b)
Data set D: Binder, 0.15 nM E. coli Cy3-GreB; Target, reconstituted backtracked EC-6 E. coli transcription elongation complex; Conditions, physiological buffer, no NTPs. Randomly selected subset of data set from Tetone et al., 2017.
N = 200, Nc = 200, F = 5622 | 3.06 | 0.0038 [0.0036, 0.0039] | 0.0437 [0.0434, 0.0440] | 18.727 [18.724, 18.731] | 0.451 [0.438, 0.463] | 11 h (b)
  1. (a) N, number of on-target AOIs; Nc, number of control off-target AOIs; F, number of frames.

  2. (b) Unattended calculation time on an AMD Ryzen Threadripper 2990WX with an Nvidia GeForce RTX 2080Ti GPU using CUDA version 11.5.

  3. (c) Unattended calculation time on an Intel Xeon CPU with an Nvidia Tesla V100-SXM2-16GB GPU using CUDA version 11.2 in a Google Colab Pro account.

Table 2
The effect of AOI size on classification accuracy*.
AOI dimension, P (pixels) | MCC | Compute time
14 | 0.951 | 2 h 10 m
10 | 0.948 | 1 h 25 m
6 | 0.939 | 1 h 20 m
  1. *Tapqir was applied to the same simulated data set (height1000 parameter set in Supplementary file 3; SNR = 1.25) using different AOI sizes.

  2. The width (w) of the simulated spots (one standard deviation of the 2-D Gaussian) is equal to 1.4 pixels.

  3. Unattended calculation time on an AMD Ryzen Threadripper 2990WX with an Nvidia GeForce RTX 2080Ti GPU using CUDA version 11.5.
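The MCC column reports the Matthews correlation coefficient, a standard summary of binary classification accuracy. As a reminder of how it is computed from confusion-matrix counts, here is a minimal sketch. The function name and the example counts are hypothetical and are not taken from the paper or from Tapqir.

```python
import math


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from confusion-matrix counts.

    tp/tn/fp/fn are true positives, true negatives, false positives,
    and false negatives. The result lies in [-1, 1].
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Common convention: report 0 when a row or column of the
        # confusion matrix is empty and the ratio is undefined.
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical counts, for illustration only.
print(matthews_corrcoef(tp=90, tn=880, fp=10, fn=20))
```

An MCC of 1 means every image was classified correctly, 0 is no better than chance, and -1 means every image was misclassified.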

Table 3
Variables used in the Tapqir model.
Symbol | Meaning | Domain |
K | Maximum number of spots per image | |
N | Number of on-target AOIs | |
Nc | Number of off-target control AOIs | |
F | Number of frames | |
P | Size of the AOI image in pixels | |
g | Camera gain | ℝ>0 |
σxy | Proximity | (0, (P+1)/√12) |
π | Average target-specific binding probability | [0, 1] |
λ | Target-nonspecific binding density | ℝ>0 |
μb | Mean background intensity across AOI | ℝ>0 | AOI[N]
σb | Standard deviation of background intensity across AOI | ℝ>0 | AOI[N]
b | Background intensity | ℝ>0 | AOI[N]×frame[F]
z | Target-specific spot presence | {0, 1} | AOI[N]×frame[F]
θ | Target-specific spot index | {0, 1, …, K} | AOI[N]×frame[F]
m | Spot presence indicator | {0, 1} | spot[K]×AOI[N]×frame[F]
h | Integrated spot intensity | ℝ>0 | spot[K]×AOI[N]×frame[F]
w | Spot width | [0.75, 2.25] | spot[K]×AOI[N]×frame[F]
x | Center of the spot on the x-axis | ℝ | spot[K]×AOI[N]×frame[F]
y | Center of the spot on the y-axis | ℝ | spot[K]×AOI[N]×frame[F]
μS | 2-D Gaussian spot | ℝ>0 | spot[K]×AOI[N]×frame[F]×pixelX[P]×pixelY[P]
μI | Ideal image w/o offset | ℝ>0 | AOI[N]×frame[F]×pixelX[P]×pixelY[P]
δ | Offset signal | ℝ>0 | AOI[N]×frame[F]×pixelX[P]×pixelY[P]
I | Observed image w/o offset signal | ℝ>0 | AOI[N]×frame[F]×pixelX[P]×pixelY[P]
D | Observed image (I+δ) | ℝ>0 | AOI[N]×frame[F]×pixelX[P]×pixelY[P]
xtarget | Target molecule position on the x-axis | [P/2 − 1, P/2] | AOI[N]×frame[F]
ytarget | Target molecule position on the y-axis | [P/2 − 1, P/2] | AOI[N]×frame[F]
i | Pixel index on the x-axis | {0, …, (P − 1)} | pixelX[P]
j | Pixel index on the y-axis | {0, …, (P − 1)} | pixelY[P]
W | Width of the raw microscope images in pixels | |
H | Height of the raw microscope images in pixels | |
Draw | Raw microscope images | ℝ>0 | frame[F]×pixelX[H]×pixelY[W]
xtarget,raw | Target molecule position in raw images on the x-axis | [−0.5, H − 0.5] | AOI[N]×frame[F]
ytarget,raw | Target molecule position in raw images on the y-axis | [−0.5, W − 0.5] | AOI[N]×frame[F]
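To make the roles of b, m, h, w, x, and y concrete, the sketch below builds a noise-free P × P image from a background level plus 2-D Gaussian spots. It is an illustration under stated assumptions, not the Tapqir implementation: it assumes the ideal image is the background plus the sum of the spots whose presence indicator is 1, that w is one standard deviation of the Gaussian, and that pixel centers sit at integer coordinates. The function names and all numerical values are hypothetical.

```python
import math


def gaussian_spot(P, h, w, cx, cy):
    """P x P image of a 2-D Gaussian spot.

    h is the integrated intensity, w the width (one standard deviation),
    and (cx, cy) the spot center in pixel coordinates.
    """
    norm = h / (2 * math.pi * w**2)
    return [
        [norm * math.exp(-((i - cx) ** 2 + (j - cy) ** 2) / (2 * w**2))
         for j in range(P)]
        for i in range(P)
    ]


def ideal_image(P, b, spots):
    """Assumed composition: background b plus every spot with m == 1.

    spots is a list of (m, h, w, cx, cy) tuples.
    """
    img = [[b] * P for _ in range(P)]
    for m, h, w, cx, cy in spots:
        if not m:
            continue  # absent spots contribute nothing
        spot = gaussian_spot(P, h, w, cx, cy)
        for i in range(P):
            for j in range(P):
                img[i][j] += spot[i][j]
    return img


# Hypothetical values: one present spot near the AOI center, one absent spot.
image = ideal_image(14, 150.0, [(1, 3000.0, 1.4, 6.5, 6.5),
                                (0, 3000.0, 1.4, 2.0, 2.0)])
```

Summing the image above the background recovers approximately h for a spot that lies well inside the AOI, which is what "integrated spot intensity" refers to.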
Table 4
Probability distributions used in the model.
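Distribution | PDF
$x \sim \text{AffineBeta}(\mu, \nu, a, b)$ | $\dfrac{y^{\alpha-1}(1-y)^{\beta-1}}{B(\alpha,\beta)}$ where $\alpha = \dfrac{\nu(\mu-a)}{b-a}$, $\beta = \dfrac{\nu(b-\mu)}{b-a}$, and $y = \dfrac{x-a}{b-a}$
$x \sim \text{Bernoulli}(\pi)$ | $\pi^{x}(1-\pi)^{1-x}$
$x \sim \text{Beta}(\alpha, \beta)$ | $\dfrac{x^{\alpha-1}(1-x)^{\beta-1}}{B(\alpha,\beta)}$
$x \sim \text{Categorical}(p)$ | $\prod_{i=1}^{k} p_i^{[x=i]}$
$x \sim \text{Empirical}(z, p)$ | $\prod_{i=1}^{k} p_i^{[x=z_i]}$
$x \sim \text{Exponential}(\lambda)$ | $\lambda e^{-\lambda x}$
$x \sim \text{Gamma}(\mu, \sigma)$ | $\dfrac{\beta^{\alpha}}{\Gamma(\alpha)} x^{\alpha-1} e^{-\beta x}$ where $\alpha = \dfrac{\mu^2}{\sigma^2}$ and $\beta = \dfrac{\mu}{\sigma^2}$
$x \sim \text{HalfNormal}(\sigma)$ | $\dfrac{\sqrt{2}}{\sigma\sqrt{\pi}} \exp\left(-\dfrac{x^2}{2\sigma^2}\right)$ for $x > 0$
$k \sim \text{TruncPoisson}(\lambda, K)$ | $\begin{cases} 1 - e^{-\lambda} \sum_{i=0}^{K-1} \dfrac{\lambda^i}{i!} & \text{if } k = K \\ \dfrac{\lambda^k e^{-\lambda}}{k!} & \text{otherwise} \end{cases}$
$x \sim \text{Uniform}(a, b)$ | $\dfrac{1}{b-a}$ for $x \in [a, b]$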
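Three of these distributions use non-standard parameterizations: Gamma by mean and standard deviation, TruncPoisson with the tail mass collected at k = K, and AffineBeta by mean, concentration, and support. The sketch below evaluates the tabulated formulas directly with the Python standard library. The function names are hypothetical and this is not the Tapqir code; it is only meant to show how the parameters map onto the formulas.

```python
import math


def gamma_pdf(x, mu, sigma):
    """Gamma density parameterized by mean mu and standard deviation sigma."""
    alpha = mu**2 / sigma**2
    beta = mu / sigma**2
    return beta**alpha / math.gamma(alpha) * x ** (alpha - 1) * math.exp(-beta * x)


def trunc_poisson_pmf(k, lam, K):
    """Poisson pmf on {0, ..., K}; all mass for counts >= K is assigned to k = K."""
    if k == K:
        return 1.0 - math.exp(-lam) * sum(lam**i / math.factorial(i) for i in range(K))
    return lam**k * math.exp(-lam) / math.factorial(k)


def affine_beta_pdf(x, mu, nu, a, b):
    """Tabulated AffineBeta expression, written in the rescaled variable y.

    mu is the mean, nu the concentration, and [a, b] the support.
    Note: as tabulated this is the Beta density of y = (x - a) / (b - a);
    a density with respect to x would carry an extra factor 1 / (b - a).
    """
    alpha = nu * (mu - a) / (b - a)
    beta = nu * (b - mu) / (b - a)
    y = (x - a) / (b - a)
    log_B = math.lgamma(alpha) + math.lgamma(beta) - math.lgamma(alpha + beta)
    return y ** (alpha - 1) * (1 - y) ** (beta - 1) / math.exp(log_B)


# The truncated Poisson probabilities sum to one over k = 0, ..., K.
total = sum(trunc_poisson_pmf(k, lam=0.5, K=2) for k in range(3))
```

With mu = sigma the Gamma reduces to an exponential with rate 1/mu, and with mu at the midpoint of [a, b] and nu = 2 the AffineBeta expression is flat, which gives two quick sanity checks on the parameter mapping.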
Table 5
The effect of mapping precision on classification accuracy*.
σxy (true) | σxy (fit) [95% CI] | MCC | σxy prior
0.2 | 0.21 [0.20, 0.22] | 0.989 | Exponential(1)
1 | 0.96 [0.90, 1.02] | 0.939 | Exponential(1)
1.5 | 1.49 [1.40, 1.59] | 0.890 | Exponential(1)
2 | 1.96 [1.84, 2.09] | 0.834 | Exponential(1)
2 | 1.97 [1.84, 2.09] | 0.834 | Uniform(0, (P+1)/√12)
  1. *Data were simulated over a range of proximity parameter σxy values at fixed π = 0.15 and λ = 0.15 (Supplementary file 6).

Additional files

Supplementary file 1

Varying non-specific binding rate simulation parameters and corresponding fit values.

https://cdn.elifesciences.org/articles/73860/elife-73860-supp1-v2.xlsx
Supplementary file 2

Randomized simulation parameters and corresponding fit values.

https://cdn.elifesciences.org/articles/73860/elife-73860-supp2-v2.xlsx
Supplementary file 3

Varying intensity (SNR) simulation parameters and corresponding fit values.

https://cdn.elifesciences.org/articles/73860/elife-73860-supp3-v2.xlsx
Supplementary file 4

No target-specific binding and varying non-specific binding rate simulation parameters and corresponding fit values.

https://cdn.elifesciences.org/articles/73860/elife-73860-supp4-v2.xlsx
Supplementary file 5

Kinetic simulation parameters and corresponding fit values.

https://cdn.elifesciences.org/articles/73860/elife-73860-supp5-v2.xlsx
Supplementary file 6

Varying proximity simulation parameters and corresponding fit values.

https://cdn.elifesciences.org/articles/73860/elife-73860-supp6-v2.xlsx
Transparent reporting form
https://cdn.elifesciences.org/articles/73860/elife-73860-transrepform1-v2.docx
