Bisulfite treatment and single-molecule real-time sequencing reveal D-loop length, position, and distribution

  1. Shanaya Shital Shah
  2. Stella R Hartono
  3. Frédéric Chédin
  4. Wolf-Dietrich Heyer (corresponding author)
  1. Department of Microbiology and Molecular Genetics, University of California, Davis, United States
  2. Department of Molecular and Cellular Biology, University of California, Davis, United States
5 figures, 1 table and 5 additional files

Figures

Figure 1
Schematic of a novel D-loop mapping assay (DMA).

(A) Schematic of an in vitro D-loop reaction with a supercoiled dsDNA donor. Substrates have different lengths of homology (blue lines) to the supercoiled dsDNA donor. Each substrate also carries a 98 bp non-homologous duplex DNA at its 5′ end to mimic physiological invading DNA. The supercoiling density of the donor DNA is expected to restrict D-loops to ~210 ± 50 nt. (B) Schematic of the D-loop mapping assay (DMA) (for details, see Materials and methods). For a D-loop on supercoiled dsDNA, the dsDNA is cropped here to depict only the strand invasion. Primers specific to the donor outside the region of homology (brown arrowheads) and carrying an additional universal primer sequence (green) were used. Initially, only the donor-specific portion of the primer (brown arrowheads) anneals to the target DNA. Barcodes were added to the amplicons in a second round of PCR using the universal primer sequence (green arrowhead indicates primer location). Hairpin adaptors (orange) were added during library preparation for single-molecule real-time sequencing. (C) Schematic depiction of the five internal controls for the DMA.
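The footprints that DMA reads out rest on bisulfite deaminating cytosines that are accessible in single-stranded DNA, such as the displaced strand of a D-loop, while cytosines in duplex DNA remain protected; after PCR, deaminated cytosines are read as T. The toy simulation below illustrates that principle on one strand. It is a minimal sketch under simplifying assumptions (one sharply bounded single-stranded window, a uniform per-cytosine conversion probability, complete protection elsewhere); the function name and its `efficiency` parameter are hypothetical and not part of the published assay.

```python
import random


def simulate_bisulfite_read(reference: str, ss_start: int, ss_end: int,
                            efficiency: float = 0.95, seed: int = 0) -> str:
    """Toy model of one bisulfite-treated, PCR-amplified strand.

    Hypothetical helper: assumes reference[ss_start:ss_end] was single-stranded
    during treatment. Each cytosine in that window is converted (read as T)
    with probability `efficiency`; cytosines outside it stay as C.
    """
    rng = random.Random(seed)  # seeded so the toy output is reproducible
    out = []
    for i, base in enumerate(reference.upper()):
        in_ss_window = ss_start <= i < ss_end
        if base == "C" and in_ss_window and rng.random() < efficiency:
            out.append("T")  # C -> U by bisulfite, U -> T after PCR
        else:
            out.append(base)  # protected or non-C base is read unchanged
    return "".join(out)
```

For example, `simulate_bisulfite_read("ACGTCCGACC", 3, 7, efficiency=1.0)` converts only the two cytosines inside positions 3 to 6 and returns `"ACGTTTGACC"`.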

Figure 2 with 1 supplement
Mapping D-loops formed on a supercoiled donor by DMA.

(A, B, C) Footprint maps showing reads with a D-loop footprint. Reads were derived from in vitro D-loop reactions performed with a supercoiled donor and the ds98-931, ds98-607, or ds98-197-78ss substrate, respectively. Only top-strand reads of the dsDNA donor that contain a footprint are shown. Here and in all subsequent figures with a footprint map, each horizontal line represents one read molecule (amplicon). Vertical yellow lines indicate the position of each cytosine along the read sequence. Cytosine status is color-coded: green marks C-to-T conversions, and the color changes to red where the C-to-T conversions cross the peak threshold and are defined as a D-loop footprint. Unless otherwise mentioned, the peak threshold is t40w50 (at least 40% of cytosines converted to thymine within a stretch of 50 consecutive cytosines). Reads are clustered by footprint position in the 5′ to 3′ direction; faintly colored boxes indicate the clusters. The blue dotted box marks the region homologous to the invading substrate. The text below each map gives the number of reads containing a footprint (depicted in the map) and the total number of top-strand reads analyzed for that sample. Some footprints appear to extend slightly beyond the region of homology, possibly because converted cytosines from DNA breathing close to the D-loop are included. Scale bar, 100 nt. (D) Table summarizing, for each strand, the total number of reads containing a footprint (‘peak’) and the total number of reads analyzed (‘total’). ‘% Peak’ is the number of reads containing a footprint divided by the total number of reads for that strand, expressed as a percentage. The data represent cumulative counts from >3 independent replicates. (E) Comparison of D-loop levels quantified from the gels in Figure 2—figure supplement 1 with the percentage of reads with a D-loop footprint observed by DMA. Mean ± SD (n = 3).
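The t40w50 peak threshold can be expressed as a sliding window over the cytosines of the reference. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation (their analysis code is provided as R scripts in Source code 1 and 2): it assumes each read is already aligned to the reference base-for-base without indels, counts a T at a reference C as a conversion, and reports footprints as half-open (start, end) coordinates. The names `call_footprints` and `percent_peak` are hypothetical.

```python
from typing import List, Tuple


def call_footprints(reference: str, read: str, t: int = 40, w: int = 50
                    ) -> List[Tuple[int, int]]:
    """Hypothetical tXwY footprint caller.

    A window of `w` consecutive reference cytosines passes when at least
    `t` percent of them are read as T. Overlapping (or directly touching)
    passing windows are merged into one footprint, reported as a half-open
    (start, end) interval from the first to just past the last cytosine.
    """
    if len(reference) != len(read):
        raise ValueError("read must be aligned to reference base-for-base")
    c_pos = [i for i, base in enumerate(reference.upper()) if base == "C"]
    converted = [read[i].upper() == "T" for i in c_pos]
    footprints: List[Tuple[int, int]] = []
    for k in range(len(c_pos) - w + 1):
        # integer comparison avoids floating-point rounding at the cutoff
        if sum(converted[k:k + w]) * 100 >= t * w:
            start, end = c_pos[k], c_pos[k + w - 1] + 1
            if footprints and start <= footprints[-1][1]:
                prev_start, prev_end = footprints[-1]
                footprints[-1] = (prev_start, max(prev_end, end))
            else:
                footprints.append((start, end))
    return footprints


def percent_peak(n_peak: int, n_total: int) -> float:
    """'% Peak': reads containing a footprint / total reads, as a percentage."""
    return 100.0 * n_peak / n_total if n_total else 0.0
```

In a toy all-cytosine reference of 200 nt whose positions 80 to 139 are converted, this sketch calls one footprint, `(50, 170)`. It is wider than the 60 converted bases because, in this sketch, any 50-cytosine window containing at least 20 conversions passes.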

Figure 2—figure supplement 1
Gel-based assay for in vitro formed D-loops.

(A, B, and C) SYBR Gold-stained agarose gels of D-loops formed in vitro on a supercoiled donor, before DMA. * indicates concatenated plasmid. About 10% of the donor is nicked and topologically relaxed. Less than 5% consists of multi-invasions, in which the substrate invades more than one donor molecule, similar to previous observations (Wright and Heyer, 2014). (D) Percentage of D-loops longer than 350 nt, calculated for each substrate type as the fraction of all D-loops detected by DMA whose length exceeds 350 nt.
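The percentage in panel (D) is a simple ratio over the D-loop lengths measured by DMA. A minimal sketch, assuming each footprint is stored as a half-open (start, end) interval so that its length is end - start; the helper name and this length convention are assumptions, and the authors' R script for length analysis (Source code 1) may measure length differently.

```python
from typing import Iterable, Tuple


def percent_longer_than(footprints: Iterable[Tuple[int, int]],
                        cutoff: int = 350) -> float:
    """Hypothetical helper: percentage of footprints longer than `cutoff` nt.

    Length is taken as end - start for a half-open (start, end) interval.
    Returns 0.0 when no footprints are given.
    """
    lengths = [end - start for start, end in footprints]
    if not lengths:
        return 0.0
    n_long = sum(length > cutoff for length in lengths)
    return 100.0 * n_long / len(lengths)
```

A footprint of exactly 350 nt is not counted, matching "longer than 350 nt".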

Figure 3
Distribution of D-loop lengths and positions on a supercoiled donor.

(A, B, C) Distribution of D-loop footprints across the region of homology for the ds98-931, ds98-607, and ds98-197-78ss substrates, respectively. Here and in all subsequent graphs, the distribution is measured by binning each footprint non-exclusively into 100 nt bins across the region of homology. The numbers next to the plotted line mark the beginning and end of the exact region of homology for each substrate on the reference sequence. (D) Dot plot of the D-loop lengths observed by DMA for each substrate type. Mean ± SD (n = 3) in red.
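Binning non-exclusively is read here as follows: a footprint is not assigned to a single bin but is counted in every 100 nt bin it overlaps, so a long D-loop contributes to several bins. A minimal sketch of that counting, assuming half-open (start, end) footprint coordinates on the reference and bins that tile the region of homology from its first coordinate; the function name is hypothetical, and the published graphs may normalize the counts in ways not shown here.

```python
from typing import Dict, Iterable, Tuple


def bin_footprints(footprints: Iterable[Tuple[int, int]], homology_start: int,
                   homology_end: int, bin_size: int = 100) -> Dict[int, int]:
    """Hypothetical non-exclusive binning of footprints.

    Bins of `bin_size` nt tile [homology_start, homology_end); the last bin is
    truncated at homology_end. Each footprint increments every bin it overlaps.
    Keys are bin start coordinates.
    """
    counts = {b: 0 for b in range(homology_start, homology_end, bin_size)}
    for start, end in footprints:
        for b in counts:
            bin_end = min(b + bin_size, homology_end)
            if start < bin_end and end > b:  # half-open intervals overlap
                counts[b] += 1
    return counts
```

Because one footprint can increment several bins, the bin counts can sum to more than the number of footprints.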

Figure 4 with 3 supplements
Characteristics of the D-loops formed on a linear donor.

(A) Schematic of the in vitro D-loop reaction with a linear dsDNA donor and various substrates. Blue lines indicate the homology between substrate and donor. These substrates can form different D-loops depending on homology length, homology position, and flanking heterology. Linear donors lack topological restrictions. (B) Table summarizing, for each strand, the total number of reads containing a footprint (‘peak’) and the total number of reads analyzed (‘total’). ‘% Peak’ is the percentage of reads containing a footprint. The data represent cumulative counts from >3 independent replicates. Refer to Figure 4—figure supplement 1. (C) Footprint map of the D-loop sample formed with ds98-931 and a linear donor. Refer to Figure 4—figure supplement 2 for other footprint maps. (D) Comparison of the distribution of D-loop positions between an invading substrate with (ds98-915-78ss) and without (ds98-915) a non-homologous 3′ end (n = 3). (E) Dot plot of D-loop lengths observed by DMA with the three substrates having ~900 nt of homology to the donor. Mean ± SD (n = 3) in red. *** indicates p-value < 0.0005; ns, not significant (two-tailed Student’s t-test). (F) Dot plot of D-loop lengths observed by DMA with substrates having different lengths of homology to the donor. Mean ± SD (n = 3) in red. ** indicates p-value < 0.005 and *** p-value < 0.0005 (two-tailed Student’s t-test).
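The length comparisons above report mean ± SD and a two-tailed Student’s t-test. A minimal standard-library sketch of both quantities follows. It assumes the classic equal-variance (pooled) form of Student’s t-test; whether the tests were run on individual D-loop lengths or on replicate means is not stated here. Function names are hypothetical.

```python
import math
import statistics
from typing import Sequence, Tuple


def mean_sd(values: Sequence[float]) -> Tuple[float, float]:
    """Mean and sample standard deviation (n - 1 denominator)."""
    return statistics.mean(values), statistics.stdev(values)


def students_t(a: Sequence[float], b: Sequence[float]) -> Tuple[float, int]:
    """Pooled-variance two-sample t statistic and its degrees of freedom.

    The two-tailed p-value is P(|T| >= |t|) for a t distribution with `df`
    degrees of freedom; computing it needs a t-distribution CDF, which the
    Python standard library does not provide.
    """
    n1, n2 = len(a), len(b)
    if n1 < 2 or n2 < 2:
        raise ValueError("each group needs at least two values")
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * statistics.variance(a)
                  + (n2 - 1) * statistics.variance(b)) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (statistics.mean(a) - statistics.mean(b)) / se, df
```

With SciPy installed, `scipy.stats.ttest_ind(a, b)` uses the pooled-variance form by default and returns the same statistic together with a two-sided p-value.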

Figure 4—figure supplement 1
Quantification of D-loops formed on a linear donor from the gel-based assay and DMA.

(A) SYBR Gold-stained agarose gel of D-loops formed by each substrate type on a linear donor. (B) Comparison of D-loop levels quantified from the gel in (A) with those from DMA. D-loops were measured relative to the donor DNA. The peak threshold used is t40w50. Mean ± SD (n ≤ 3).

Figure 4—figure supplement 2
Characterization of D-loops formed on a linear donor by DMA.

(A, B, C) Footprint maps of D-loops formed with a linear donor and the ds98-915-78ss, ds98-915, and ds98-607 substrates, respectively. In (B), footprints detected on the bottom strand of the ds98-915 D-loop sample are shown along with the top-strand footprints. The footprints detected outside the homology, to the right, appear to arise from DNA breathing in a C-rich region and are therefore detected on both strands. (D, E) Distribution of D-loop footprints across the region of homology for the ds98-931 and ds98-607 substrates, respectively. (F) Schematic representation of D-loops with multiple invasions of one or more substrates within a single donor molecule.

Figure 4—figure supplement 3
Characterization of D-loops formed using human recombinant proteins by DMA.

(A) Schematic of an in vitro D-loop reaction with the ds98-931 substrate, human recombinant proteins, and a supercoiled donor. The supercoiling density of the donor is expected to restrict nascent D-loops to ~210 ± 50 nt. (B) Footprint map of the D-loop sample from the reaction in (A). (C) Schematic of an in vitro D-loop reaction with the ds98-931 substrate, human recombinant proteins, and a linear donor. These D-loops are not restricted by topology and are limited only by the length of homology. (D) Footprint map of the D-loop sample from the reaction in (C). (E) Distribution of D-loop footprints across the region of homology for (B) and (D). (F) Dot plot of D-loop lengths from the samples in (B) and (D). Mean ± SD (n = 2) in red. (G) SYBR Gold-stained agarose gels of D-loop samples from in vitro D-loop reactions with human recombinant proteins. D-loops formed with RAD51 alone or with RAD51 and RAD54 are shown for the ds98-931 and ds98-915-78ss substrates. The ds98-931 samples were subjected to DMA. Labels (A) and (C) below the lanes indicate samples from the reactions shown in schematics (A) and (C). (H) Comparison of D-loop levels, relative to the donor DNA, from the gel in (G) with those from DMA (B and D). Mean ± SD (n = 2).

Figure 5 with 2 supplements
Effect of varying peak thresholds on D-loop levels, length, and position.

(A) Percentage of reads with a footprint, called with different peak thresholds, for D-loops formed with ds98-931 and a linear donor. In each peak threshold, ‘t’ is the minimum cytosine conversion frequency and ‘w’ is the window size, that is, the number of consecutive cytosines over which that frequency must be reached. Refer to Figure 5—figure supplement 2. (B) Schematic representation of potentially short bottom-strand footprints derived from the D-loop junctions. (C) Changes in the distribution of D-loop lengths defined with different peak thresholds, for D-loops formed with ds98-931 and a linear donor. Mean ± SD (n = 3) in red. (D) Distribution of the positions of D-loop footprints defined with different peak thresholds.
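The ‘tXwY’ labels pack both parameters into one string. A small parser makes the trade-off explicit: the minimum number of converted cytosines a passing window must contain is ceil(t × w / 100), so t40w50 requires at least 20 converted cytosines out of 50. The function names are hypothetical, and the sketch assumes t is written as a whole-number percentage, as in t40w50.

```python
import re
from typing import Tuple

# 't' followed by a percentage, then 'w' followed by a cytosine count
_THRESHOLD = re.compile(r"t(\d+)w(\d+)")


def parse_threshold(label: str) -> Tuple[int, int]:
    """Split a label such as 't40w50' into (t percent, w cytosines)."""
    m = _THRESHOLD.fullmatch(label.strip().lower())
    if m is None:
        raise ValueError(f"not a tXwY threshold label: {label!r}")
    return int(m.group(1)), int(m.group(2))


def min_converted_cytosines(label: str) -> int:
    """Smallest number of converted cytosines that satisfies the threshold."""
    t, w = parse_threshold(label)
    return -(-t * w // 100)  # integer ceiling of t * w / 100
```

Raising t or w makes a call stricter, which is consistent with stricter thresholds yielding fewer reads with a footprint.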

Figure 5—figure supplement 1
Optimizing bisulfite treatment conditions.

(A and B) Footprint maps of top-strand reads showing the D-loops formed with the ds98-607 substrate and a supercoiled or linear donor, respectively. Note the gray area at the 3′ end of the footprint map for the sample treated at room temperature (RT) for 4 hr; it reflects the poor quality of reads obtained initially from the RS-II sequencing system (see Materials and methods). (C) Percentage of reads containing D-loop footprints in (A) and (B) after bisulfite treatment at different temperatures (n = 2, except RT 4 hr, n = 1, and RT 3 hr, n = 3). The peak threshold used here was t40w50.

Figure 5—figure supplement 2
Optimizing the peak threshold for DMA.

(A) Dot plot of the cytosine conversion frequency of a fully single-stranded DNA (ssDNA) sample (pBSKS(-) circular DNA) after DMA. The ssDNA was spiked into an in vitro D-loop reaction to test the cytosine conversion efficiency of the bisulfite treatment. Mean ± SD. (B) Footprint map of cytosine-converted footprints on pBSKS(-) ssDNA that was spiked into an in vitro D-loop sample and subjected to DMA (for details, see Materials and methods). All 21 reads sequenced and analyzed contained a footprint. (C) Footprint maps of D-loops formed with the ds98-931 substrate and a linear donor, as defined with various peak thresholds. Top- and bottom-strand reads containing footprints that crossed the threshold are shown. No footprint map is shown for peak thresholds at which no bottom-strand footprints were detected.
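The conversion frequency of the ssDNA control can be computed per read as the fraction of reference cytosines reported as T; for a fully single-stranded template such as pBSKS(-), values close to 1 indicate near-complete conversion. A minimal sketch, assuming reads are already aligned base-for-base to the reference; the helper names are hypothetical.

```python
import statistics
from typing import Iterable, Tuple


def conversion_frequency(reference: str, read: str) -> float:
    """Fraction of reference cytosines that the aligned read reports as T."""
    if len(reference) != len(read):
        raise ValueError("read must be aligned to reference base-for-base")
    c_pos = [i for i, base in enumerate(reference.upper()) if base == "C"]
    if not c_pos:
        raise ValueError("reference contains no cytosines")
    converted = sum(read[i].upper() == "T" for i in c_pos)
    return converted / len(c_pos)


def summarize(freqs: Iterable[float]) -> Tuple[float, float]:
    """Mean and sample SD of per-read conversion frequencies (needs >= 2 reads)."""
    values = list(freqs)
    return statistics.mean(values), statistics.stdev(values)
```

Applying `conversion_frequency` to every spiked-in read and passing the results to `summarize` gives the kind of mean ± SD shown for the ssDNA control.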

Tables

Key resources table
Reagent type (species) or resource | Designation | Source or reference | Identifiers | Additional information
Recombinant DNA reagent (plasmid) | pBSKS (-) strand | Wright and Heyer, 2014 | | Used to test bisulfite conversion efficiency
Recombinant DNA reagent (plasmid) | pBSphix1200 | Wright and Heyer, 2014 | | Amp; used as dsDNA donor in D-loop assay in supercoiled or linear form
Recombinant protein (S. cerevisiae) | Rad54 | Wright and Heyer, 2014 | |
Recombinant protein (S. cerevisiae) | Rad51 | Van Komen et al., 2006 | |
Recombinant protein (S. cerevisiae) | RPA | Binz et al., 2006 | |
Oligonucleotide | UNI+Donor-PB-F | This paper | | GCAGTCGAACATGTAGCTGACTCAGGTCACTCACACTTCCTGGTTGATGG
Oligonucleotide | UNI+ PhiX-PB-R | This paper | | TGGATCACTTGTGCAAGCATCACATCGTAGATCTACACGACGGGGAGTCA
Oligonucleotide | pBS-915nt-subs-R | This paper | | GGTATCGATAAGCTTCCATGgcatttgtttcagggttatttg
Oligonucleotide | pBS-51-931nt-subs-F | This paper | | CGTATCTAGACTGCAgaacggaaaacatccttcatag
Oligonucleotide | pBS-1013nt subs-R | This paper | | CGTATCTAGACTGCAgaagtcatgattgaatcg
Oligonucleotide | pBS-1013nt subs-F | This paper | | GGTATCGATAAGCTTCCATGgttaatgccactcctctcccga
Oligonucleotide | 3′-non-tailed-NcoI-915 | This paper | | GATAAGCTTCCATGGCAT
Oligonucleotide | UNI+pBSKS-F | This paper | | GCAGTCGAACATGTAGCTGACTCAGGTTTTTGATTTATAAGGGATTTTG
Oligonucleotide | UNI+pBSKS-R | This paper | | TGGATCACTTGTGCAAGCATCACATCGTAGTTTATTTTTCTAAATACATTCAAATAT
Oligonucleotide | 100-mer | Wright and Heyer, 2014 | | ctggtcataatcatggtggcgaataagtacgcgttcttgcaaatcaccagaaggcggttcctgaatgaatgggaagccttcaagaaggtgataagcagga
Oligonucleotide | ds98-197-78ss | Wright and Heyer, 2014 | | Homologous sequence (197 nt): ctggtcataatcatggtggcgaataagtacgcgttcttgcaaatcaccagaaggcggttcctgaatgaatgggaagccttcaagaaggtgataagcaggagaaacatacgaaggcgcataacgataccactgaccctcagcaatcttaaacttcttagacgaatcaccagaacggaaaacatccttcatagaaattt
Oligonucleotide | ds98-607 | Wright and Heyer, 2014 | | Homologous sequence (607 nt): gaagtcatgattgaatcgcgagtggtcggcagattgcgataaacggtcacattaaatttaacctgactattccactgcaacaactgaacggactggaaacactggtcataatcatggtggcgaataagtacgcgttcttgcaaatcaccagaaggcggttcctgaatgaatgggaagccttcaagaaggtgataagcaggagaaacatacgaaggcgcataacgataccactgaccctcagcaatcttaaacttcttagacgaatcaccagaacggaaaacatccttcatagaaatttcacgcggcggcaagttgccatacaaaacagggtcgccagcaatatcggtataagtcaaagcacctttagcgttaaggtactgaatctctttagtcgcagtaggcggaaaacgaacaagcgcaagagtaaacatagtgccatgctcaggaacaaagaaacgcggcacagaatgtttataggtctgttgaacacgaccagaaaactggcctaacgacgtttggtcagttccatcaacatcatagccagatgcccagagattagagcgcatgacaagtaaaggacggttgtcagcgtcataagaggttttac
Oligonucleotide | ds98-931 | This paper | | Homologous sequence (931 nt): gaacggaaaacatccttcatagaaatttcacgcggcggcaagttgccatacaaaacagggtcgccagcaatatcggtataagtcaaagcacctttagcgttaaggtactgaatctctttagtcgcagtaggcggaaaacgaacaagcgcaagagtaaacatagtgccatgctcaggaacaaagaaacgcggcacagaatgtttataggtctgttgaacacgaccagaaaactggcctaacgacgtttggtcagttccatcaacatcatagccagatgcccagagattagagcgcatgacaagtaaaggacggttgtcagcgtcataagaggttttacctccaaatgaagaaataacatcatggtaacgctgcatgaagtaatcacgttcttggtcagtatgcaaattagcataagcagcttgcagacccataatgtcaatagatgtggtagaagtcgtcatttggcgagaaagctcagtctcaggaggaagcggagcagtccaaatgtttttgagatggcagcaacggaaaccataacgagcatcatcttgattaagctcattagggttagcctcggtacggtcaggcatccacggcgctttaaaatagttgttatagatattcaaataaccctgaaacaaatgcttagggattttattggtatcagggttaatcgtgccaagaaaagcggcatggtcaatataaccagtagtgttaacagtcgggagaggagtggcattaacaccatccttcatgaacttaatccactgttcaccataaacgtgacgatgagggacataaaaagtaaaaatgtctacagtagagtcaatagcaaggccacgacgcaatggagaaagacggagagcgccaacggcgtccatctcgaaggagtcgccagcgataaccggagtagttgaaatggtaataagac
Oligonucleotide | ds98-915 | This paper | | Homologous sequence (915 nt): gaagtcatgattgaatcgcgagtggtcggcagattgcgataaacggtcacattaaatttaacctgactattccactgcaacaactgaacggactggaaacactggtcataatcatggtggcgaataagtacgcgttcttgcaaatcaccagaaggcggttcctgaatgaatgggaagccttcaagaaggtgataagcaggagaaacatacgaaggcgcataacgataccactgaccctcagcaatcttaaacttcttagacgaatcaccagaacggaaaacatccttcatagaaatttcacgcggcggcaagttgccatacaaaacagggtcgccagcaatatcggtataagtcaaagcacctttagcgttaaggtactgaatctctttagtcgcagtaggcggaaaacgaacaagcgcaagagtaaacatagtgccatgctcaggaacaaagaaacgcggcacagaatgtttataggtctgttgaacacgaccagaaaactggcctaacgacgtttggtcagttccatcaacatcatagccagatgcccagagattagagcgcatgacaagtaaaggacggttgtcagcgtcataagaggttttacctccaaatgaagaaataacatcatggtaacgctgcatgaagtaatcacgttcttggtcagtatgcaaattagcataagcagcttgcagacccataatgtcaatagatgtggtagaagtcgtcatttggcgagaaagctcagtctcaggaggaagcggagcagtccaaatgtttttgagatggcagcaacggaaaccataacgagcatcatcttgattaagctcattagggttagcctcggtacggtcaggcatccacggcgctttaaaatagttgttatagatattcaaataaccctgaaacaaatgc
Oligonucleotide | ds98-915-78ss | This paper | | Homologous sequence (915 nt): gaagtcatgattgaatcgcgagtggtcggcagattgcgataaacggtcacattaaatttaacctgactattccactgcaacaactgaacggactggaaacactggtcataatcatggtggcgaataagtacgcgttcttgcaaatcaccagaaggcggttcctgaatgaatgggaagccttcaagaaggtgataagcaggagaaacatacgaaggcgcataacgataccactgaccctcagcaatcttaaacttcttagacgaatcaccagaacggaaaacatccttcatagaaatttcacgcggcggcaagttgccatacaaaacagggtcgccagcaatatcggtataagtcaaagcacctttagcgttaaggtactgaatctctttagtcgcagtaggcggaaaacgaacaagcgcaagagtaaacatagtgccatgctcaggaacaaagaaacgcggcacagaatgtttataggtctgttgaacacgaccagaaaactggcctaacgacgtttggtcagttccatcaacatcatagccagatgcccagagattagagcgcatgacaagtaaaggacggttgtcagcgtcataagaggttttacctccaaatgaagaaataacatcatggtaacgctgcatgaagtaatcacgttcttggtcagtatgcaaattagcataagcagcttgcagacccataatgtcaatagatgtggtagaagtcgtcatttggcgagaaagctcagtctcaggaggaagcggagcagtccaaatgtttttgagatggcagcaacggaaaccataacgagcatcatcttgattaagctcattagggttagcctcggtacggtcaggcatccacggcgctttaaaatagttgttatagatattcaaataaccctgaaacaaatgc
Commercial enzyme | BsaI | New England Biolabs | Catalog: #R0535S | To linearize pBSphix1200
Commercial enzyme | Phusion-U polymerase | Thermo Fisher | Catalog: #PN-F555S |
Commercial kit | EpiTect Bisulfite Kit | Qiagen | Catalog: #59104 | DMA
Commercial kit | SMRTbell Template Prep Kit 1.0 | Pacific Biosciences | Catalog: #100-259-100 | DMA
Commercial reagent | Sera-Mag SpeedBead Carboxylate-Modified Magnetic particles (Hydrophobic) | Sigma | Catalog: #PN-65152105050250 | DMA
Commercial reagent | AMPure PB | Pacific Biosciences | Catalog: #100-265-900 | DMA
Commercial reagent | SYBR Gold Nucleic Acid Stain | Invitrogen | Catalog: #S11494 |

Additional files

Source code 1

R script for D-loop length analysis.

https://cdn.elifesciences.org/articles/59111/elife-59111-code1-v2.rtf.zip
Source code 2

R script for the analysis of D-loop position and distribution.

https://cdn.elifesciences.org/articles/59111/elife-59111-code2-v2.rtf.zip
Source data 1

Source data for all figures.

https://cdn.elifesciences.org/articles/59111/elife-59111-data1-v2.xlsx
Supplementary file 1

Total reads with human proteins.

Table summarizing, for each strand, the total number of reads containing a footprint (‘peak’) and the total number of reads analyzed (‘total’). ‘% Peak’ indicates the percentage of reads containing a footprint. The data represent cumulative counts from two independent replicates.

https://cdn.elifesciences.org/articles/59111/elife-59111-supp1-v2.docx
Transparent reporting form
https://cdn.elifesciences.org/articles/59111/elife-59111-transrepform-v2.docx

Cite this article

Shanaya Shital Shah, Stella R Hartono, Frédéric Chédin, Wolf-Dietrich Heyer (2020) Bisulfite treatment and single-molecule real-time sequencing reveal D-loop length, position, and distribution. eLife 9:e59111. https://doi.org/10.7554/eLife.59111